From: Juan Quintela
Subject: Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
Date: Fri, 28 Apr 2023 09:47:27 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Fiona Ebner <f.ebner@proxmox.com> wrote:
> Am 27.04.23 um 16:36 schrieb Juan Quintela:
>> Fiona Ebner <f.ebner@proxmox.com> wrote:
>>> Am 27.04.23 um 13:03 schrieb Kevin Wolf:
>>>> Am 26.04.2023 um 16:31 hat Fiona Ebner geschrieben:
>>>>> Am 20.04.23 um 08:55 schrieb Paolo Bonzini:
>> 
>> Hi
>> 
>>> Our function is a custom variant of saving a snapshot and uses
>>> qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
>>> is there. I looked for inspiration for how upstream does things and it
>>> turns out that upstream QEMU v8.0.0 has essentially the same issue with
>>> snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
>>> of the main thread, the situation is the same: after
>>> qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
>>> 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
>>> (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].
>>>
>>>
>>> So all bottom halves scheduled for the main thread's AioContext can
>>> potentially get to run in a vCPU thread and need to be very careful with
>>> things like qemu_mutex_unlock_iothread.
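
For readers who don't have the macro in their head: both effects come from
the non-home-thread branch of AIO_WAIT_WHILE_INTERNAL() in
include/block/aio-wait.h.  Roughly, abridged from memory of the v8.0 code,
so details may differ:

    /* Abridged sketch of AIO_WAIT_WHILE_INTERNAL(ctx, cond, unlock);
     * not the literal macro body. */
    if (ctx_ && in_aio_context_home_thread(ctx_)) {
        while ((cond)) {
            aio_poll(ctx_, true);
        }
    } else {
        /* Polling from a "foreign" thread is only allowed while that
         * thread counts as the main context; this is the assertion that
         * trips once qemu_get_current_aio_context() returns NULL. */
        assert(qemu_get_current_aio_context() ==
               qemu_get_aio_context());
        while ((cond)) {
            /* This aio_poll() on the main context is also how a vCPU
             * thread ends up running bottom halves that were scheduled
             * for the main thread's AioContext. */
            aio_poll(qemu_get_aio_context(), true);
        }
    }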
>>>
>>> Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
>>> looked into why it happens yet. Does there need to be a way to drop the
>>> BQL without also giving up the main thread's AioContext or would it be
>>> enough to re-acquire the context?
>>>
>>> CC-ing Juan as the migration maintainer.
>> 
>> This is the world upside down.
>> The tradition is that migration people blame block layer people for
>> breaking things and ask them for help, not the other way around O:-)
>
> Sorry if I didn't provide enough context/explanation. See below for my
> attempt to reiterate. I CC'ed you because the issue happens as part of
> snapshot-save, and in particular the qemu_mutex_unlock_iothread call in
> qemu_savevm_state is one of the ingredients leading to the problem.

This was a joke O:-)

>> To see whether I am understanding this right:
>> 
>> - you create a thread
>> - that calls a memory_region operation
>> - that calls a device write function
>> - that calls the block layer
>> - that creates a snapshot
>> - that calls the migration code
>> - that calls the block layer again
>> 
>> Without further investigation, I have no clue what is going on here,
>> sorry.
>> 
>> Later, Juan.
>> 
>
> All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended
> ways, I promise! In particular, I'm doing two things at the same time
> repeatedly:
> 1. Write to a pflash drive from within the guest.
> 2. Issue a snapshot-save QMP command (in a way that doesn't lead to an
> early error).
>
> (I actually also used a debugger to break on pflash_update and
> snapshot_save_job_bh, manually continuing until I triggered the
> problematic situation. It's very racy, because it depends on the host OS
> switching threads at just the right time.)

I think the block layer is the problem (famous last words)

>
> Now we need to be aware of two things:
> 1. As discussed earlier in the mail thread, if the host OS switches
> threads at an inconvenient time, it can happen that a bottom half
> scheduled for the main thread's AioContext can be executed as part of a
> vCPU thread's aio_poll.
> 2. Generated coroutine wrappers for block layer functions spawn the
> coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish.
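
In case it helps others following along: the generated wrappers
(block/block-gen.c, emitted by scripts/block-coroutine-wrapper.py) have
roughly this shape -- paraphrased, with the struct and entry-point names
made up for illustration:

    /* Rough shape of the generated bdrv_writev_vmstate() wrapper;
     * identifiers simplified. */
    int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
                            int64_t pos)
    {
        if (qemu_in_coroutine()) {
            /* Already in coroutine context: call the implementation. */
            return bdrv_co_writev_vmstate(bs, qiov, pos);
        } else {
            WrapperState s = { .bs = bs, .qiov = qiov, .pos = pos,
                               .ret = -EINPROGRESS };
            Coroutine *co = qemu_coroutine_create(bdrv_writev_vmstate_entry,
                                                  &s);
            bdrv_coroutine_enter(bs, co);
            /* BDRV_POLL_WHILE() expands to AIO_WAIT_WHILE(), i.e. the
             * aio_poll()/assert combination sketched earlier. */
            BDRV_POLL_WHILE(bs, s.ret == -EINPROGRESS);
            return s.ret;
        }
    }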
>
> What happens in the backtrace above is:
> 1. The write to the pflash drive uses blk_pwrite which leads to an
> aio_poll in the vCPU thread.
> 2. The snapshot_save_job_bh bottom half, that was scheduled for the main
> thread's AioContext, is executed as part of the vCPU thread's aio_poll.
> 3. qemu_savevm_state is called.
> 4. qemu_mutex_unlock_iothread is called. Now
> qemu_get_current_aio_context returns 0x0. Usually, snapshot_save_job_bh
> runs in the main thread, in which case qemu_get_current_aio_context
> still returns the main thread's AioContext at this point.
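
That matches what qemu_get_current_aio_context() does nowadays
(util/async.c); roughly, again from memory of the v8.0 code:

    AioContext *qemu_get_current_aio_context(void)
    {
        /* Thread-local context: set for the main loop thread and for
         * iothreads, but never for vCPU threads. */
        AioContext *ctx = get_my_aiocontext();
        if (ctx) {
            return ctx;
        }
        if (qemu_mutex_iothread_locked()) {
            /* A vCPU thread holding the BQL counts as the main context. */
            return qemu_get_aio_context();
        }
        /* A vCPU thread that has dropped the BQL gets NULL, which is the
         * 0x0 seen in step 4. */
        return NULL;
    }

So it is not that the context goes away: the vCPU thread never had one of
its own, and without the BQL it no longer gets to borrow the main one.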

I am perhaps a bit naive here, but is there a way to convince qemu that
snapshot_save_job_bh *HAS* to run on the main thread?

> 5. bdrv_writev_vmstate is executed as part of the usual savevm setup.
> 6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses
> AIO_WAIT_WHILE.
> 7. The assertion to have the main thread's AioContext inside the
> AIO_WAIT_WHILE macro fails.
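
For context, the window where all of this bites is the unlock/lock pair
around the setup phase in qemu_savevm_state() (migration/savevm.c);
schematically, not the literal code:

    static int qemu_savevm_state(QEMUFile *f, Error **errp)
    {
        ...
        qemu_mutex_unlock_iothread();  /* from a vCPU thread, the current
                                        * AioContext is NULL from here on */
        qemu_savevm_state_header(f);
        qemu_savevm_state_setup(f);    /* ends up in bdrv_writev_vmstate(),
                                        * i.e. in AIO_WAIT_WHILE() */
        qemu_mutex_lock_iothread();
        ...
    }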

Several problems here:
a- There is no "tests/qtest/snapshot-test" around.
   Hint, hint.
   Basically snapshots are the bastard sibling of migration, and nobody
   really tests them.
b- In an ideal world, migration shouldn't need a bottom half at all.
   Remember, it has its own thread.  But there are functions that can
   only be called from the main thread.  And no, I don't remember
   which ones; I just try very hard not to touch that part of the code.
c- At that point the vCPUs are stopped, so for migration it doesn't
   matter much(*) that we have to use a bottom half.
d- Snapshots are a completely different beast that doesn't really stop
   the guest in the same way at that point, and sometimes that shows in
   subtle details like this one.

I am sorry that I can't be of more help.  I still think that the block
layer people should have a clue about what is going on here.

Later, Juan.

*: That is not completely true.  There is no way to get back from the
   bottom half to the iterative stage of migration.  If something
   happens at that point, we just have to cancel the migration.  But we
   will leave this discussion for another day.



