qemu-block

Re: [PATCH 10/12] block.c: add subtree_drains where needed


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 10/12] block.c: add subtree_drains where needed
Date: Fri, 4 Feb 2022 12:49:49 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0

02.02.2022 18:37, Emanuele Giuseppe Esposito wrote:


On 01/02/2022 15:47, Vladimir Sementsov-Ogievskiy wrote:
18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:
Protect bdrv_replace_child_noperm, as it modifies the
graph by adding/removing elements to the .children and .parents
lists of a bs. Use the newly introduced
bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
that and be free from the aiocontext lock.

One important criterion to keep in mind is that if the caller of
bdrv_replace_child_noperm creates a transaction, we need to make sure
that the whole transaction is under the same drain block. This is
imperative, because having multiple drains also in the .abort() class
of functions causes discrepancies in the drained counters (as nodes
are put back into their original positions), making it really hard to
return them all to zero and leaving the code very buggy.
See https://patchew.org/QEMU/20211213104014.69858-1-eesposit@redhat.com/
for more explanations.
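
The intended shape is roughly as follows (just a sketch; the point is
that the drained section encloses tran_finalize(), so that the
.commit()/.abort() callbacks still run drained):

    bdrv_subtree_drained_begin_unlocked(bs);
    tran = tran_new();
    ret = ...;                  /* record all graph changes in @tran */
    tran_finalize(tran, ret);   /* .commit()/.abort() run while still drained */
    bdrv_subtree_drained_end_unlocked(bs);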

Unfortunately, in bdrv_detach_child() we still need the locked
bdrv_subtree_drained_begin/end variants, releasing and then re-acquiring
the AioContext lock, since it later invokes bdrv_try_set_aio_context(),
which is not safe yet. Once all of this is cleaned up, we can also remove
the acquire/release locks in job_unref, artificially added because of this.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index fcc44a49a0..6196c95aae 100644
--- a/block.c
+++ b/block.c
@@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
     BlockDriverState *old_bs = (*childp)->bs;
 
     assert(qemu_in_main_thread());
+    if (old_bs) {
+        /*
+         * TODO: this is called by job_unref with lock held, because
+         * afterwards it calls bdrv_try_set_aio_context.
+         * Once all of this is fixed, take care of removing
+         * the aiocontext lock and make this function _unlocked.
+         */
+        bdrv_subtree_drained_begin(old_bs);
+    }
+
     bdrv_replace_child_noperm(childp, NULL, true);
 
+    if (old_bs) {
+        bdrv_subtree_drained_end(old_bs);
+    }
+
     if (old_bs) {
         /*
          * Update permissions for old node. We're just taking a parent away, so
@@ -3154,6 +3168,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
     Transaction *tran = tran_new();
 
     assert(qemu_in_main_thread());
+    bdrv_subtree_drained_begin_unlocked(child_bs);
 
     ret = bdrv_attach_child_common(child_bs, child_name, child_class,
                                    child_role, perm, shared_perm, opaque,
@@ -3168,6 +3183,7 @@ out:
     tran_finalize(tran, ret);
     /* child is unset on failure by bdrv_attach_child_common_abort() */
     assert((ret < 0) == !child);
+    bdrv_subtree_drained_end_unlocked(child_bs);
 
     bdrv_unref(child_bs);
     return child;
@@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
     assert(qemu_in_main_thread());
 
+    bdrv_subtree_drained_begin_unlocked(parent_bs);
+    bdrv_subtree_drained_begin_unlocked(child_bs);
+
     ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name, child_class,
                                    child_role, &child, tran, errp);
     if (ret < 0) {
@@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 out:
     tran_finalize(tran, ret);
     /* child is unset on failure by bdrv_attach_child_common_abort() */
+    bdrv_subtree_drained_end_unlocked(child_bs);
+    bdrv_subtree_drained_end_unlocked(parent_bs);
+
     assert((ret < 0) == !child);
 
     bdrv_unref(child_bs);
@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 
     assert(qemu_in_main_thread());
 
+    bdrv_subtree_drained_begin_unlocked(bs);
+    if (backing_hd) {
+        bdrv_subtree_drained_begin_unlocked(backing_hd);
+    }
+
     ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
     if (ret < 0) {
         goto out;
@@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     ret = bdrv_refresh_perms(bs, errp);
 out:
     tran_finalize(tran, ret);
+    if (backing_hd) {
+        bdrv_subtree_drained_end_unlocked(backing_hd);
+    }
+    bdrv_subtree_drained_end_unlocked(bs);
 
     return ret;
 }
@@ -5266,7 +5297,8 @@ static int bdrv_replace_node_common(BlockDriverState *from,
 
     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
     assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
-    bdrv_drained_begin(from);
+    bdrv_subtree_drained_begin_unlocked(from);
+    bdrv_subtree_drained_begin_unlocked(to);
 
     /*
      * Do the replacement without permission update.
@@ -5298,7 +5330,8 @@ static int bdrv_replace_node_common(BlockDriverState *from,
 out:
     tran_finalize(tran, ret);
 
-    bdrv_drained_end(from);
+    bdrv_subtree_drained_end_unlocked(to);
+    bdrv_subtree_drained_end_unlocked(from);
     bdrv_unref(from);
 
     return ret;
@@ -5345,6 +5378,9 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
 
     assert(!bs_new->backing);
 
+    bdrv_subtree_drained_begin_unlocked(bs_new);
+    bdrv_subtree_drained_begin_unlocked(bs_top);
+
     ret = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
                                    &child_of_bds, bdrv_backing_role(bs_new),
                                    &bs_new->backing, tran, errp);
@@ -5360,6 +5396,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     ret = bdrv_refresh_perms(bs_new, errp);
 out:
     tran_finalize(tran, ret);
+    bdrv_subtree_drained_end_unlocked(bs_top);
+    bdrv_subtree_drained_end_unlocked(bs_new);
 
     bdrv_refresh_limits(bs_top, NULL, NULL);
 
@@ -5379,8 +5417,8 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     assert(qemu_in_main_thread());
 
     bdrv_ref(old_bs);
-    bdrv_drained_begin(old_bs);
-    bdrv_drained_begin(new_bs);
+    bdrv_subtree_drained_begin_unlocked(old_bs);
+    bdrv_subtree_drained_begin_unlocked(new_bs);
 
     bdrv_replace_child_tran(&child, new_bs, tran, true);
     /* @new_bs must have been non-NULL, so @child must not have been freed */
@@ -5394,8 +5432,8 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
 
     tran_finalize(tran, ret);
 
-    bdrv_drained_end(old_bs);
-    bdrv_drained_end(new_bs);
+    bdrv_subtree_drained_end_unlocked(new_bs);
+    bdrv_subtree_drained_end_unlocked(old_bs);
     bdrv_unref(old_bs);
 
     return ret;


 From the other thread:
So you mean that backing_hd is modified as its parent is changed, do I 
understand correctly?

Yes


As we started to discuss in a thread started with my "[PATCH] block:
bdrv_set_backing_hd(): use drained section", I think draining the whole
subtree when we modify some parent-child relation is too much. In-flight
requests in the subtree don't touch these relations, so why wait for or
stop them? Imagine two disks A and B with the same backing file C. If we
want to detach A from C, what is the reason to drain the in-flight read
requests of B in C?
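
For reference, the graph in that example is:

    A --backing--> C <--backing-- B

Detaching A from C only changes the A->C edge; a read that B has in
flight in C does not look at that edge at all.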

If I am not mistaken, bdrv_set_backing_hd adds a new node (yes, it removes
the old backing_hd, but let's not think about that for a moment).
So we have disk B with backing file C, and a new disk A that wants to have
backing file C.

I think I understand what you mean, so in theory the operation would be:
- create new child
- add child to A->children list
- add child to C->parents list

So in theory we need to
* drain A (without subtree), because it can't happen that child nodes of
  A have in-flight requests that look at A's state (children list), right?
  In other words, if A has another child node X, can a request in X inspect
  A->children?

It should not happen. In my understanding, IO requests never access the
parents of a node.

* drain C, as its parents (like B) can inspect C's state. Same assumption
  here: C->children[x]->bs cannot have requests inspecting the C->parents
  list?

No, I think we should not drain C. IO requests don't inspect the parents
list. And if some _other_ operations do inspect it, drain will not help, as
drain is only about IO requests.


Modifying the children/parents lists should be protected somehow. Currently
it's protected by the aio-context lock. If we drop it, we probably need
some new mutex for it. Also, graph modifications should be protected from
each other, so that one graph modification is not started while another
is in-flight; we probably need some global mutex for graph modification.
But that's all not about drains.

The idea was to use BQL + drain to obtain the same effect as the aiocontext
lock: the BQL should prevent concurrent graph modifications, and drain should
prevent concurrent I/O from reading the state while it is being modified.
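
In code, a graph writer in the main loop would then roughly follow the
pattern this series applies (a sketch, using the _unlocked helpers it
introduces):

    assert(qemu_in_main_thread());              /* BQL: no concurrent writers */
    bdrv_subtree_drained_begin_unlocked(bs);    /* no in-flight I/O sees the change */
    /* ... modify bs->children / bs->parents ... */
    bdrv_subtree_drained_end_unlocked(bs);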


Concurrent I/O should not touch the bs->parents list, so we don't need to drain C.


Drains are about in-flight requests. And when we switch a child of node
X, we should do it in a drained section for X, as in-flight requests in X
can touch its children. But other in-flight requests in the subtree are
safe and should not be awaited.
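
So by this reasoning, a plain (non-subtree) drained section on X alone
would be enough when switching one of its children, roughly (a sketch):

    bdrv_drained_begin(x);    /* in-flight requests in X may touch x->children */
    /* ... replace x->children[i]->bs ... */
    bdrv_drained_end(x);      /* the rest of the subtree keeps running */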





--
Best regards,
Vladimir


