qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB


From: Wolfgang Bumiller
Subject: Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Date: Mon, 22 Mar 2021 12:08:47 +0100
User-agent: NeoMutt/20180716

On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
> If OOB is disabled, events received in monitor_qmp_event will be handled
> in the main context. Thus, we must not acquire a qmp_queue_lock there,
> as the dispatcher coroutine holds one over a yield point, where it
> expects to be rescheduled from the main context. If a CHR_EVENT_CLOSED
> event is received just then, it can race and block the main thread by
> waiting on the queue lock.
> 
> Run monitor_qmp_cleanup_queue_and_resume in a BH on the iohandler
> thread, so the main thread can always make progress during the
> reschedule.
> 
> The delaying of the cleanup is safe, since the dispatcher always moves
> back to the iothread afterward, and thus the cleanup will happen before
> it gets to its next iteration.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---

This is a tough one. It *may* be fine, but I wonder if we can approach
this differently:

>From what I can gather we have the following call stacks & contexts:

Guarded lock (lock & release):
  * monitor_qmp_cleanup_queue_and_resume
    by monitor_qmp_event
    by file handler (from I/O loop)
    ^ iohandler_context (assuming that's where the file handling happens...)
    (after this patch as BH though)

  * handle_qmp_command
    a) by the json parser (which is also re-initialized by
       monitor_qmp_event btw., haven't checked if that can also
       "trigger" its methods immediately)
    b) by monitor_qmp_read
    by file handler (from I/O loop)
    ^ iohandler_context

Lock-"returning":
  * monitor_qmp_requests_pop_any_with_lock
    by coroutine_fn monitor_qmp_dispatcher_co
    ^ iohandler_context

Lock-releasing:
  * coroutine_fn monitor_qmp_dispatcher_co
    ^ qemu_aio_context

The only *weird* thing that immediately pops out here is
`monitor_qmp_requests_pop_any_with_lock()` keeping a lock while
switching contexts.
This is done in order to allow `AIO_WAIT_WHILE` to work while making
progress on the events, but do we actually already need to be in this
context for the OOB `monitor_resume()` call or can we defer the context
switch to after having done that and released the lock?
`monitor_resume()` itself seems to simply schedule a BH which should
work regardless if I'm not mistaken. There's also a
`readline_show_prompt()` call, but that *looks* harmless?
`monitor_resume()` is also called without the lock later on, so even if
it needs to be in this context at that point for whatever reason, does
it need the lock?




reply via email to

[Prev in Thread] Current Thread [Next in Thread]