qemu-block

Re: QMP (without OOB) function running in thread different from the main


From: Kevin Wolf
Subject: Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
Date: Thu, 27 Apr 2023 13:03:19 +0200

On 26.04.2023 at 16:31, Fiona Ebner wrote:
> On 20.04.23 at 08:55, Paolo Bonzini wrote:
> > 
> > 
> > On Thu, 20 Apr 2023, 08:11 Markus Armbruster <armbru@redhat.com> wrote:
> > 
> >     So, splicing in a bottom half unmoored monitor commands from the main
> >     loop.  We weren't aware of that, as our commit messages show.
> > 
> >     I guess the commands themselves don't care; all they need is the BQL.
> > 
> >     However, did we unwittingly change what can get blocked?  Before,
> >     monitor commands could block only the main thread.  Now they can also
> >     block vCPU threads.  Impact?
> > 
> > 
> > Monitor commands could always block vCPU threads through the BQL(*).
> > However, aio_poll() only runs in the vCPU threads in very special cases;
> > typically associated to resetting a device which causes a blk_drain() on
> > the device's BlockBackend. So it is not a performance issue.
> > 
> 
> AFAIU, all generated coroutine wrappers use aio_poll. In my backtrace
> aio_poll happens via blk_pwrite for a pflash device. So a bit more
> often than "very special cases" ;)

Yes, it's a common thing for devices that start requests from the vcpu
thread when handling I/O (as opposed to devices that use an eventfd or
similar mechanisms).

> > However, liberal reuse of the main block layer AioContext could indeed
> > be a *correctness* issue. I need to re-read Fiona's report instead of
> > stopping at the first three lines because it's the evening. :)
> 
> For me, being called in a vCPU thread caused problems with a custom QMP
> function patched in by Proxmox. The function uses a newly opened
> BlockBackend and calls qemu_mutex_unlock_iothread() after which
> qemu_get_current_aio_context() returns 0x0 (when running in the main
> thread, it still returns the main thread's AioContext). It then calls
> blk_pwritev which is also a generated coroutine wrapper and the
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch of the AIO_WAIT_WHILE_INTERNAL macro fails.
> 
> Sounds like there's room for improvement in our code :/ I'm not aware
> of something similar in upstream QEMU.

Yes, even if it didn't crash immediately, calling blk_*() without
holding the required lock is invalid. In many cases, that lock is the
BQL. If you don't hold it while calling the function from a vcpu thread,
you could run into races with the main thread, which would probably be
very painful to debug.
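
To illustrate the locking rule, here is a minimal standalone sketch using
plain pthreads. It is not QEMU code: big_lock stands in for the BQL,
fake_blk_write() for a blk_*() call, and backend_bytes for block layer
state shared with the main thread. All of these names are hypothetical
and exist only for this example. The stand-in asserts that the calling
thread holds the lock, so a caller that dropped the lock first (as in the
report above) would fail loudly instead of racing silently.

```c
/* Standalone pthread model of the rule above. NOT QEMU code:
 * every name here is a hypothetical stand-in. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITERS 16

/* Stands in for the BQL. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* Per-thread flag recording whether this thread holds big_lock. */
static _Thread_local bool big_lock_held;
/* Stands in for block layer state shared between threads. */
static long backend_bytes;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/* Stands in for a blk_*() call: only valid with the lock held. */
static void fake_blk_write(long len)
{
    assert(big_lock_held);   /* fail loudly rather than race silently */
    backend_bytes += len;
}

struct writer_args {
    int iterations;
};

/* Models a vcpu thread issuing requests while handling I/O. */
static void *writer_thread(void *opaque)
{
    const struct writer_args *args = opaque;

    for (int i = 0; i < args->iterations; i++) {
        big_lock_acquire();
        fake_blk_write(1);
        big_lock_release();
    }
    return NULL;
}

/* Runs nthreads writers and returns the final byte count, or -1 if the
 * arguments are out of range or a thread could not be created. */
long run_guarded_writers(int nthreads, int iterations)
{
    pthread_t threads[MAX_WRITERS];
    struct writer_args args = { .iterations = iterations };
    int started = 0;
    long total;

    if (nthreads < 1 || nthreads > MAX_WRITERS || iterations < 0) {
        return -1;
    }

    big_lock_acquire();
    backend_bytes = 0;
    big_lock_release();

    while (started < nthreads) {
        if (pthread_create(&threads[started], NULL, writer_thread, &args) != 0) {
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    if (started != nthreads) {
        return -1;
    }

    big_lock_acquire();
    total = backend_bytes;
    big_lock_release();
    return total;
}
```

Because every update happens under the lock, run_guarded_writers(4, 1000)
returns 4000. Removing the acquire/release pair in writer_thread() makes
the assertion in fake_blk_write() fire on the first call, which is the
cheap kind of failure; without the assertion the updates would simply
race.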

Kevin



