[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
From: |
Stefan Hajnoczi |
Subject: |
Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() |
Date: |
Thu, 10 Feb 2022 14:33:40 +0000 |
On Tue, Feb 08, 2022 at 10:36:54AM -0500, Emanuele Giuseppe Esposito wrote:
> This test uses a callback of an I/O function (blk_aio_preadv)
> to modify the graph, using bdrv_attach_child.
> This is simply not allowed anymore. I/O cannot change the graph.
>
> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
> and introduce bdrv_drained_begin_no_poll", the test would simply
> be at risk of failure, because if bdrv_replace_child_noperm()
> (called to modify the graph) would call a drain,
> then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce,
> that specifically asserts that we are not in a coroutine.
>
> Now that we fixed the behavior, the drain will invoke a bh in the
> main loop, so we don't have such problem. However, this test is still
> illegal and fails because we forbid graph changes from I/O paths.
>
> Once we add the required subtree_drains to protect
> bdrv_replace_child_noperm(), the key problem in this test is in:
>
> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
> /* Drain and check the expected result */
> bdrv_subtree_drained_begin(parent_b);
>
> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
> modifies the graph and is invoked during bdrv_subtree_drained_begin().
> The call stack is the following:
> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
> and enters the coroutine running blk_aio_read_entry()
> 2. blk_aio_read_entry() performs the read and then schedules a bh to
> complete (blk_aio_complete)
> 3. at this point, subtree_drained_begin() kicks in and waits for all
> in_flight requests, polling
> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
> 5. blk_aio_complete *first* invokes the callback
> (detach_by_parent_aio_cb) and then decrements the in_flight counter
> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
> so both bdrv_unref_child() and bdrv_attach_child() will have
> subtree_drains inside. And this causes a deadlock, because the
> nested drain will wait for in_flight counter to go to zero, which
> is only happening once the drain itself finishes.
>
> Different story is test_detach_by_driver_cb(): in this case,
> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
> but it is instead called as a bh running in the main loop by
> detach_by_driver_cb_drained_begin(), the callback for
> .drained_begin().
>
> This test was added in 231281ab42 and part of the series
> "Drain fixes and cleanups, part 3"
> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
> but as explained above I believe that it is not valid anymore, and
> can be discarded.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> tests/unit/test-bdrv-drain.c | 46 +++++++++---------------------------
> 1 file changed, 11 insertions(+), 35 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
signature.asc
Description: PGP signature
- Re: [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains, (continued)