qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations


From: Kevin Wolf
Subject: Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
Date: Fri, 11 Feb 2022 12:54:30 +0100

Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> is not a good idea: the callback might be called when running
> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> handle that case, resulting in assertion failure.

I remembered that we talked about this only recently on IRC, but it
didn't make any sense to me again when I read this commit message. So I
think we need --verbose.

The .drained_begin callback was always meant to run outside of coroutine
context, so the unexpected part isn't that it calls a function that
can't run in coroutine context, but that it is already called itself in
coroutine context.

The problematic path is bdrv_replace_child_noperm() which then calls
bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
context is dangerous, it can cause deadlocks because the caller of the
coroutine can't make progress. So I believe this call is already wrong
in coroutine context.

Now I don't know the call path up to bdrv_replace_child_noperm(), but as
far as I remember, that was another function that was originally never
run in coroutine context. Maybe we have good reason to change this, I
can't point to anything that would be inherently wrong with it, but I
would still be curious in which context it does run in a coroutine now.

Anyway, whatever the specific place is, I believe we must drop out of
coroutine context _before_ calling bdrv_parent_drained_begin_single(),
not only in callbacks called by it.

> Instead, bdrv_do_drained_begin with no recursion and poll
> will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
> but will firstly check if we are already in a coroutine, and exit
> from that via bdrv_co_yield_to_drain().
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]