qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] block/stream: Drain subtree around graph change


From: John Snow
Subject: Re: [PATCH v2] block/stream: Drain subtree around graph change
Date: Thu, 24 Mar 2022 14:49:48 -0400

On Thu, Mar 24, 2022 at 10:09 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
> When the stream block job cuts out the nodes between top and base in
> stream_prepare(), it does not drain the subtree manually; it fetches the
> base node, and tries to insert it as the top node's backing node with
> bdrv_set_backing_hd().  bdrv_set_backing_hd() however will drain, and so
> the actual base node might change (because the base node is actually not
> part of the stream job) before the old base node passed to
> bdrv_set_backing_hd() is installed.
>
> This has two implications:
>
> First, the stream job does not keep a strong reference to the base node.
> Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g.
> because some other block job is drained to finish), we will get a
> use-after-free.  We should keep a strong reference to that node.

Does that introduce any possibility of deadlock if we're now keeping a
strong reference? I guess not, the job can just delete its own
reference and everything's ... fine, right?

>
> Second, even with such a strong reference, the problem remains that the
> base node might change before bdrv_set_backing_hd() actually runs and as
> a result the wrong base node is installed.

ow

>
> Both effects can be seen in 030's TestParallelOps.test_overlapping_5()
> case, which has five nodes, and simultaneously streams from the middle
> node to the top node, and commits the middle node down to the base node.
> As it is, this will sometimes crash, namely when we encounter the
> above-described use-after-free.


Is there a BZ# or is this an occasional failure in 030 you saw? What
does failure look like? Does it require anything special to trigger?

>
> Taking a strong reference to the base node, we no longer get a crash,
> but the resuling block graph is less than ideal: The expected result is
> obviously that all middle nodes are cut out and the base node is the
> immediate backing child of the top node.  However, if stream_prepare()
> takes a strong reference to its base node (the middle node), and then
> the commit job finishes in bdrv_set_backing_hd(), supposedly dropping
> that middle node, the stream job will just reinstall it again.
>
> Therefore, we need to keep the whole subtree drained in
> stream_prepare(), so that the graph modification it performs is
> effectively atomic, i.e. that the base node it fetches is still the base
> node when bdrv_set_backing_hd() sets it as the top node's backing node.
>
> Verify this by asserting in said 030's test case that the base node is
> always the top node's immediate backing child when both jobs are done.
>

(Off-topic: Sometimes I dream about having a block graph rendering
tool that can update step-by-step, so I can visualize what these block
operations look like. My "working memory" is kind of limited and I get
sidetracked too easily tracing code. That we have the ability to
render at a single point is pretty nice, but it's still hard to get
images from intermediate steps when things happen in tight sequence
without the chance for intervention.)

> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> v2:
> - Oops, the base can be NULL.  Would have noticed if I had ran all test
>   cases from 030, and not just test_overlapping_5()...
>   That means that keeping a strong reference to the base node must be
>   conditional, based on whether there even is a base node or not.
>   (I mean, technically we do not even need to keep a strong reference to
>   that node, given that we are in a drained section, but I believe it is
>   better style to do it anyway.)
> ---
>  block/stream.c         | 15 ++++++++++++++-
>  tests/qemu-iotests/030 |  5 +++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/block/stream.c b/block/stream.c
> index 3acb59fe6a..694709bd25 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -64,7 +64,13 @@ static int stream_prepare(Job *job)
>      bdrv_cor_filter_drop(s->cor_filter_bs);
>      s->cor_filter_bs = NULL;
>
> +    bdrv_subtree_drained_begin(s->above_base);
> +
>      base = bdrv_filter_or_cow_bs(s->above_base);
> +    if (base) {
> +        bdrv_ref(base);
> +    }
> +
>      unfiltered_base = bdrv_skip_filters(base);
>
>      if (bdrv_cow_child(unfiltered_bs)) {
> @@ -75,14 +81,21 @@ static int stream_prepare(Job *job)
>                  base_fmt = unfiltered_base->drv->format_name;
>              }
>          }
> +
>          bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>          ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, 
> false);
>          if (local_err) {
>              error_report_err(local_err);
> -            return -EPERM;
> +            ret = -EPERM;
> +            goto out;
>          }
>      }
>
> +out:
> +    if (base) {
> +        bdrv_unref(base);
> +    }
> +    bdrv_subtree_drained_end(s->above_base);
>      return ret;
>  }
>
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 567bf1da67..14112835ed 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -436,6 +436,11 @@ class TestParallelOps(iotests.QMPTestCase):
>          self.vm.run_job(job='node4', auto_dismiss=True)
>          self.assert_no_active_block_jobs()
>
> +        # Assert that node0 is now the backing node of node4
> +        result = self.vm.qmp('query-named-block-nodes')
> +        node4 = next(node for node in result['return'] if node['node-name'] 
> == 'node4')
> +        self.assertEqual(node4['image']['backing-image']['filename'], 
> self.imgs[0])
> +
>      # Test a block-stream and a block-commit job in parallel
>      # Here the stream job is supposed to finish quickly in order to reproduce
>      # the scenario that triggers the bug fixed in 3d5d319e1221 and 
> 1a63a907507
> --
> 2.35.1
>

Seems reasonable, but the best I can give right now is an ACK because
I'm a little rusty with block graph ops ...

--js




reply via email to

[Prev in Thread] Current Thread [Next in Thread]