From: Juan Quintela
Subject: Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
Date: Thu, 27 Apr 2023 16:36:14 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Fiona Ebner <f.ebner@proxmox.com> wrote:
> On 27.04.23 at 13:03, Kevin Wolf wrote:
>> On 26.04.2023 at 16:31, Fiona Ebner wrote:
>>> On 20.04.23 at 08:55, Paolo Bonzini wrote:

Hi

> Our function is a custom variant of saving a snapshot and uses
> qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
> is there. I looked for inspiration for how upstream does things and it
> turns out that upstream QEMU v8.0.0 has essentially the same issue with
> snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
> of the main thread, the situation is the same: after
> qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
> 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
> (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].
>
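In other words, a hypothetical sketch of the pattern described above (not
the actual savevm code; only qemu_mutex_unlock_iothread(),
qemu_get_current_aio_context() and the block-gen.h assertion are taken
from the report):

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "block/aio.h"

/* Bottom half scheduled on the main AioContext, but possibly dispatched
 * from a vCPU thread (see the backtrace in [0]). */
static void save_state_bh_sketch(void *opaque)
{
    qemu_mutex_unlock_iothread();

    /*
     * In a vCPU thread, qemu_get_current_aio_context() now returns NULL,
     * so any generated coroutine wrapper reached from here (in [0] it is
     * bdrv_writev_vmstate(), via qemu_savevm_state()) fails
     *     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
     * in block/block-gen.h.
     */

    qemu_mutex_lock_iothread();
}
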
>
> So all bottom halves scheduled for the main thread's AioContext can
> potentially get to run in a vCPU thread and need to be very careful with
> things like qemu_mutex_unlock_iothread.
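
A sketch of how that can happen, following the backtrace in [0] (the
bottom-half and helper names here are hypothetical;
snapshot_save_job_bh() is scheduled on the main AioContext in a similar
way):

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "block/aio.h"

static void my_main_loop_bh(void *opaque)
{
    /* Meant to run in the main loop thread... */
}

static void schedule_main_loop_bh(void)
{
    /*
     * ...but aio_bh_poll() dispatches it from whichever thread polls the
     * main AioContext.  In [0] that is a vCPU thread holding the BQL:
     *   pflash_update() -> blk_pwrite()            (generated co-wrapper)
     *     -> BDRV_POLL_WHILE() -> aio_poll(main AioContext, true)
     *       -> aio_bh_poll() -> snapshot_save_job_bh()
     */
    aio_bh_schedule_oneshot(qemu_get_aio_context(), my_main_loop_bh, NULL);
}
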
>
> Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
> looked into why it happens yet. Does there need to be a way to drop the
> BQL without also giving up the main thread's AioContext or would it be
> enough to re-acquire the context?
>
> CC-ing Juan as the migration maintainer.

This is the world backwards.
The tradition is that migration people blame block layer people for
breaking things and ask them for help, not the other way around O:-)

> Best Regards,
> Fiona
>
> [0]:
>> Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted.
>> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> 50   ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> #1  0x00007f9027b3e537 in __GI_abort () at abort.c:79
>> #2 0x00007f9027b3e40f in __assert_fail_base (fmt=0x7f9027cb66a8
>> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>> assertion=0x558ed44fcec0 "qemu_get_current_aio_context() ==
>> qemu_get_aio_context()", file=0x558ed44fce80
>> "/home/febner/repos/qemu/block/block-gen.h", line=43,
>> function=<optimized out>) at assert.c:92
>> #3 0x00007f9027b4d662 in __GI___assert_fail
>> (assertion=0x558ed44fcec0 "qemu_get_current_aio_context() ==
>> qemu_get_aio_context()", file=0x558ed44fce80
>> "/home/febner/repos/qemu/block/block-gen.h", line=43,
>> function=0x558ed44fcf80 <__PRETTY_FUNCTION__.14> "bdrv_poll_co") at
>> assert.c:101
>> #4  0x0000558ed412df5f in bdrv_poll_co (s=0x7f8ffcff37a0) at 
>> /home/febner/repos/qemu/block/block-gen.h:43
>> #5  0x0000558ed412f4cd in bdrv_writev_vmstate (bs=0x558ed60536a0, 
>> qiov=0x7f8ffcff3840, pos=0) at block/block-gen.c:809
>> #6 0x0000558ed3df36d0 in qio_channel_block_writev
>> (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0,
>> flags=0, errp=0x7f8ffcff39c0) at ../migration/channel-block.c:89
>> #7 0x0000558ed40feedb in qio_channel_writev_full
>> (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0,
>> flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:108
>> #8 0x0000558ed40ff3c3 in qio_channel_writev_full_all
>> (ioc=0x7f8ff40ac060, iov=0x7f8ff4648040, niov=1, fds=0x0, nfds=0,
>> flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:263
>> #9 0x0000558ed40ff2e4 in qio_channel_writev_all (ioc=0x7f8ff40ac060,
>> iov=0x7f8ff4648040, niov=1, errp=0x7f8ffcff39c0) at
>> ../io/channel.c:242
>> #10 0x0000558ed3dee4dc in qemu_fflush (f=0x7f8ff4640000) at 
>> ../migration/qemu-file.c:302
>> #11 0x0000558ed4050f91 in ram_save_setup (f=0x7f8ff4640000, 
>> opaque=0x558ed4ca34c0 <ram_state>) at ../migration/ram.c:3302
>> #12 0x0000558ed3e141c8 in qemu_savevm_state_setup (f=0x7f8ff4640000) at 
>> ../migration/savevm.c:1266
>> #13 0x0000558ed3e14eed in qemu_savevm_state (f=0x7f8ff4640000, 
>> errp=0x558ed68c5238) at ../migration/savevm.c:1626
>> #14 0x0000558ed3e1755e in save_snapshot (name=0x558ed72af790
>> "snap0", overwrite=false, vmstate=0x558ed6708ce0 "scsi0",
>> has_devices=true, devices=0x558ed66d6a60, errp=0x558ed68c5238) at
>> ../migration/savevm.c:2954
>> #15 0x0000558ed3e17fb1 in snapshot_save_job_bh (opaque=0x558ed68c5170) at 
>> ../migration/savevm.c:3253
>> #16 0x0000558ed42f050a in aio_bh_call (bh=0x558ed671ae00) at 
>> ../util/async.c:155
>> #17 0x0000558ed42f0615 in aio_bh_poll (ctx=0x558ed5c62910) at 
>> ../util/async.c:184
>> #18 0x0000558ed42d47b8 in aio_poll (ctx=0x558ed5c62910, blocking=true) at 
>> ../util/aio-posix.c:721
>> #19 0x0000558ed412df1c in bdrv_poll_co (s=0x7f8ffcff3eb0) at 
>> /home/febner/repos/qemu/block/block-gen.h:43
>> #20 0x0000558ed4130c3a in blk_pwrite (blk=0x558ed5ed4f60,
>> offset=230912, bytes=512, buf=0x7f8ffc438600, flags=0) at
>> block/block-gen.c:1650
>> #21 0x0000558ed3ba9078 in pflash_update (pfl=0x558ed5eb7b30, offset=230912, 
>> size=1) at ../hw/block/pflash_cfi01.c:394
>> #22 0x0000558ed3ba9749 in pflash_write (pfl=0x558ed5eb7b30,
>> offset=231232, value=0, width=1, be=0) at
>> ../hw/block/pflash_cfi01.c:522
>> #23 0x0000558ed3ba9cda in pflash_mem_write_with_attrs
>> (opaque=0x558ed5eb7b30, addr=231232, value=0, len=1, attrs=...) at
>> ../hw/block/pflash_cfi01.c:681
>> #24 0x0000558ed402a36a in memory_region_write_with_attrs_accessor
>> (mr=0x558ed5eb7ef0, addr=231232, value=0x7f8ffcff40c8, size=1,
>> shift=0, mask=255, attrs=...) at ../softmmu/memory.c:514
>> #25 0x0000558ed402a4a9 in access_with_adjusted_size (addr=231232,
>> value=0x7f8ffcff40c8, size=1, access_size_min=1, access_size_max=4,
>> access_fn=0x558ed402a270 <memory_region_write_with_attrs_accessor>,
>> mr=0x558ed5eb7ef0, attrs=...) at ../softmmu/memory.c:555
>> #26 0x0000558ed402d5de in memory_region_dispatch_write
>> (mr=0x558ed5eb7ef0, addr=231232, data=0, op=MO_8, attrs=...) at
>> ../softmmu/memory.c:1522
>> #27 0x0000558ed403a6f4 in flatview_write_continue
>> (fv=0x558ed66d62c0, addr=4291004224, attrs=..., ptr=0x7f9029957028,
>> len=1, addr1=231232, l=1, mr=0x558ed5eb7ef0) at
>> ../softmmu/physmem.c:2641
>> #28 0x0000558ed403a857 in flatview_write (fv=0x558ed66d62c0,
>> addr=4291004224, attrs=..., buf=0x7f9029957028, len=1) at
>> ../softmmu/physmem.c:2683
>> #29 0x0000558ed403ac07 in address_space_write (as=0x558ed4ca2b20
>> <address_space_memory>, addr=4291004224, attrs=...,
>> buf=0x7f9029957028, len=1) at ../softmmu/physmem.c:2779
>> #30 0x0000558ed403ac74 in address_space_rw (as=0x558ed4ca2b20
>> <address_space_memory>, addr=4291004224, attrs=...,
>> buf=0x7f9029957028, len=1, is_write=true) at
>> ../softmmu/physmem.c:2789
>> #31 0x0000558ed40cea88 in kvm_cpu_exec (cpu=0x558ed622a910) at 
>> ../accel/kvm/kvm-all.c:2989
>> #32 0x0000558ed40d179a in kvm_vcpu_thread_fn (arg=0x558ed622a910) at 
>> ../accel/kvm/kvm-accel-ops.c:51
>> #33 0x0000558ed42d925f in qemu_thread_start (args=0x558ed5c68c80) at 
>> ../util/qemu-thread-posix.c:541
>> #34 0x00007f9028ab7ea7 in start_thread (arg=<optimized out>) at 
>> pthread_create.c:477
>> #35 0x00007f9027c18a2f in clone () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

To see if I am understanding this right:

- you create a thread
- that calls a memory_region operation
- that calls a device write function
- that calls the block layer
- that creates a snapshot
- that calls the migration code
- that calls the block layer again

Without further investigation, I have no clue what is going on here,
sorry.

Later, Juan.



