qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 10/12] block.c: add subtree_drains where needed


From: Emanuele Giuseppe Esposito
Subject: Re: [PATCH 10/12] block.c: add subtree_drains where needed
Date: Wed, 2 Feb 2022 16:37:43 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 01/02/2022 15:47, Vladimir Sementsov-Ogievskiy wrote:
> 18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:
>> Protect bdrv_replace_child_noperm, as it modifies the
>> graph by adding/removing elements to .children and .parents
>> list of a bs. Use the newly introduced
>> bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
>> that and be free from the aiocontext lock.
>>
>> One important criteria to keep in mind is that if the caller of
>> bdrv_replace_child_noperm creates a transaction, we need to make sure
>> that the
>> whole transaction is under the same drain block. This is imperative,
>> as having
>> multiple drains also in the .abort() class of functions causes
>> discrepancies
>> in the drained counters (as nodes are put back into the original
>> positions),
>> making it really hard to retourn all to zero and leaving the code very
>> buggy.
>> See https://patchew.org/QEMU/20211213104014.69858-1-eesposit@redhat.com/
>> for more explanations.
>>
>> Unfortunately we still need to have bdrv_subtree_drained_begin/end
>> in bdrv_detach_child() releasing and then holding the AioContext
>> lock, since it later invokes bdrv_try_set_aio_context() that is
>> not safe yet. Once all is cleaned up, we can also remove the
>> acquire/release locks in job_unref, artificially added because of this.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   block.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
>>   1 file changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index fcc44a49a0..6196c95aae 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
>>       BlockDriverState *old_bs = (*childp)->bs;
>>         assert(qemu_in_main_thread());
>> +    if (old_bs) {
>> +        /*
>> +         * TODO: this is called by job_unref with lock held, because
>> +         * afterwards it calls bdrv_try_set_aio_context.
>> +         * Once all of this is fixed, take care of removing
>> +         * the aiocontext lock and make this function _unlocked.
>> +         */
>> +        bdrv_subtree_drained_begin(old_bs);
>> +    }
>> +
>>       bdrv_replace_child_noperm(childp, NULL, true);
>>   +    if (old_bs) {
>> +        bdrv_subtree_drained_end(old_bs);
>> +    }
>> +
>>       if (old_bs) {
>>           /*
>>            * Update permissions for old node. We're just taking a
>> parent away, so
>> @@ -3154,6 +3168,7 @@ BdrvChild
>> *bdrv_root_attach_child(BlockDriverState *child_bs,
>>       Transaction *tran = tran_new();
>>         assert(qemu_in_main_thread());
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>>         ret = bdrv_attach_child_common(child_bs, child_name, child_class,
>>                                      child_role, perm, shared_perm,
>> opaque,
>> @@ -3168,6 +3183,7 @@ out:
>>       tran_finalize(tran, ret);
>>       /* child is unset on failure by bdrv_attach_child_common_abort() */
>>       assert((ret < 0) == !child);
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>>         bdrv_unref(child_bs);
>>       return child;
>> @@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>         assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(parent_bs);
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>> +
>>       ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
>> child_class,
>>                                      child_role, &child, tran, errp);
>>       if (ret < 0) {
>> @@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>   out:
>>       tran_finalize(tran, ret);
>>       /* child is unset on failure by bdrv_attach_child_common_abort() */
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>> +    bdrv_subtree_drained_end_unlocked(parent_bs);
>> +
>>       assert((ret < 0) == !child);
>>         bdrv_unref(child_bs);
>> @@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>         assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(bs);
>> +    if (backing_hd) {
>> +        bdrv_subtree_drained_begin_unlocked(backing_hd);
>> +    }
>> +
>>       ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
>>       if (ret < 0) {
>>           goto out;
>> @@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>       ret = bdrv_refresh_perms(bs, errp);
>>   out:
>>       tran_finalize(tran, ret);
>> +    if (backing_hd) {
>> +        bdrv_subtree_drained_end_unlocked(backing_hd);
>> +    }
>> +    bdrv_subtree_drained_end_unlocked(bs);
>>         return ret;
>>   }
>> @@ -5266,7 +5297,8 @@ static int
>> bdrv_replace_node_common(BlockDriverState *from,
>>         assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>       assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
>> -    bdrv_drained_begin(from);
>> +    bdrv_subtree_drained_begin_unlocked(from);
>> +    bdrv_subtree_drained_begin_unlocked(to);
>>         /*
>>        * Do the replacement without permission update.
>> @@ -5298,7 +5330,8 @@ static int
>> bdrv_replace_node_common(BlockDriverState *from,
>>   out:
>>       tran_finalize(tran, ret);
>>   -    bdrv_drained_end(from);
>> +    bdrv_subtree_drained_end_unlocked(to);
>> +    bdrv_subtree_drained_end_unlocked(from);
>>       bdrv_unref(from);
>>         return ret;
>> @@ -5345,6 +5378,9 @@ int bdrv_append(BlockDriverState *bs_new,
>> BlockDriverState *bs_top,
>>         assert(!bs_new->backing);
>>   +    bdrv_subtree_drained_begin_unlocked(bs_new);
>> +    bdrv_subtree_drained_begin_unlocked(bs_top);
>> +
>>       ret = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
>>                                      &child_of_bds,
>> bdrv_backing_role(bs_new),
>>                                      &bs_new->backing, tran, errp);
>> @@ -5360,6 +5396,8 @@ int bdrv_append(BlockDriverState *bs_new,
>> BlockDriverState *bs_top,
>>       ret = bdrv_refresh_perms(bs_new, errp);
>>   out:
>>       tran_finalize(tran, ret);
>> +    bdrv_subtree_drained_end_unlocked(bs_top);
>> +    bdrv_subtree_drained_end_unlocked(bs_new);
>>         bdrv_refresh_limits(bs_top, NULL, NULL);
>>   @@ -5379,8 +5417,8 @@ int bdrv_replace_child_bs(BdrvChild *child,
>> BlockDriverState *new_bs,
>>       assert(qemu_in_main_thread());
>>         bdrv_ref(old_bs);
>> -    bdrv_drained_begin(old_bs);
>> -    bdrv_drained_begin(new_bs);
>> +    bdrv_subtree_drained_begin_unlocked(old_bs);
>> +    bdrv_subtree_drained_begin_unlocked(new_bs);
>>         bdrv_replace_child_tran(&child, new_bs, tran, true);
>>       /* @new_bs must have been non-NULL, so @child must not have been
>> freed */
>> @@ -5394,8 +5432,8 @@ int bdrv_replace_child_bs(BdrvChild *child,
>> BlockDriverState *new_bs,
>>         tran_finalize(tran, ret);
>>   -    bdrv_drained_end(old_bs);
>> -    bdrv_drained_end(new_bs);
>> +    bdrv_subtree_drained_end_unlocked(new_bs);
>> +    bdrv_subtree_drained_end_unlocked(old_bs);
>>       bdrv_unref(old_bs);
>>         return ret;
>>

>From the other thread:
> So you mean that backing_hd is modified as its parent is changed, do I 
> understand correctly?

Yes

> 
> As we started to discuss in a thread started with my "[PATCH] block:
> bdrv_set_backing_hd(): use drained section", I think draining the whole
> subtree when we modify some parent-child relation is too much. In-flight
> requests in subtree don't touch these relations, so why to wait/stop
> them? Imagine two disks A and B with same backing file C. If we want to
> detach A from C, what is the reason to drain in-fligth read request of B
> in C?

If I am not mistaken, bdrv_set_backing_hd adds a new node (yes removes
the old backing_hd, but let's not think about this for a moment).
So we have disk B with backing file C, and new disk A that wants to have
backing file C.

I think I understand what you mean, so in theory the operation would be
- create new child
- add child to A->children list
- add child to C->parents list

So in theory we need to
* drain A (without subtree), because it can't happen that child nodes of
  A have in-flight requests that look at A status (children list), right?
  In other words, if A has another node X, can a request in X inspect
  A->children
* drain C, as parents can inspect C status (like B). Same assumption
  here, C->children[x]->bs cannot have requests inspecting C->parents
  list?

> Modifying children/parents lists should be protected somehow. Currently
> it's protected by aio-context lock.. If we drop it, we probably need
> some new mutex for it. Also, graph modification should be protected from
> each other, so that one graph modifiction is not started until another
> is in-flight, probably we need some global mutex for graph modification.
> But that's all not about drains.

The idea was to use BQL + drain, to obtain the same effect of aiocontext
lock. BQL should prevent concurrent graph modifications, drain
concurrent I/O reading the state while being modificated.

Emanuele
> 
> Drains are about in-flight requests. And when we switch a child of node
> X, we should do it in drained section for X, as in-flight requests in X
> can touch its children. But another in-flight requests in subtree are
> safe and should not be awaited.
> 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]