From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH] blk: postpone request execution on a context protected with "drained section"
Date: Wed, 13 Mar 2019 17:04:31 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On 14.12.2018 at 12:54, Denis Plotnikov wrote:
> On 13.12.2018 15:20, Kevin Wolf wrote:
> > On 13.12.2018 at 12:07, Denis Plotnikov wrote:
> >> It sounds like it should work that way, but it doesn't, and here is
> >> why: when doing a mirror, we may resume the postponed coroutines too
> >> early, while the underlying bs is still protected from writing, and
> >> thus we run into the assertion on write request execution in
> >> bdrv_co_write_req_prepare when resuming the postponed coroutines.
> >>
> >> The thing is that the bs is protected for writing before the
> >> execution of bdrv_replace_node in mirror_exit_common, and
> >> bdrv_replace_node calls bdrv_replace_child_noperm which, in turn,
> >> calls child->role->drained_end. One of these callbacks is
> >> blk_root_drained_end, which checks if (--blk->quiesce_counter == 0)
> >> and runs the postponed requests (coroutines) if the condition is
> >> true.
> > 
> > Hm, so something is messed up with the drain sections in the mirror
> > driver. We have:
> > 
> >      bdrv_drained_begin(target_bs);
> >      bdrv_replace_node(to_replace, target_bs, &local_err);
> >      bdrv_drained_end(target_bs);
> > 
> > Obviously, the intention was to keep the BlockBackend drained during
> > bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
> > inside bdrv_replace_node() when target_bs is drained?
> > 
> > Looking at bdrv_replace_child_noperm(), it seems that the function has
> > a bug: Even if old_bs and new_bs are both drained, the quiesce_counter
> > for the parent reaches 0 for a moment because we call .drained_end for
> > the old child first and .drained_begin for the new one later.
> > 
> > So it seems the fix would be to reverse the order and first call
> > .drained_begin for the new child and then .drained_end for the old
> > child. Sounds like a good new testcase for tests/test-bdrv-drain.c, too.
> Yes, it's true, but it's not enough...

Did you ever implement the changes suggested so far, so that we could
continue from there? Or should I try and come up with something myself?
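
In case it helps, here is a rough, untested sketch of the reordering I
suggested above for bdrv_replace_child_noperm() (function and callback
names as in the current code, everything in between simplified):

    static void bdrv_replace_child_noperm(BdrvChild *child,
                                          BlockDriverState *new_bs)
    {
        BlockDriverState *old_bs = child->bs;

        /* Quiesce the parent through the new child first, so that the
         * parent's quiesce_counter can never drop to 0 in between */
        if (new_bs && new_bs->quiesce_counter && child->role->drained_begin) {
            child->role->drained_begin(child);
        }

        /* ... detach child from old_bs and attach it to new_bs ... */

        /* Only then end the drain that was active through old_bs */
        if (old_bs && old_bs->quiesce_counter && child->role->drained_end) {
            child->role->drained_end(child);
        }
    }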

> In mirror_exit_common() we actively manipulate block driver states.
> When we have replaced a node, as in the snippet you showed, we can't
> allow the postponed coroutines to run, because the block tree isn't
> ready to receive the requests yet.
> To be ready, we need to insert a proper block driver state into the
> block backend, which is done here:
> 
>      blk_remove_bs(bjob->blk);
>      blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
>      blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
> 
>      bs_opaque->job = NULL;
> 
>      bdrv_drained_end(src);

Did you actually encounter a bug here, or is this just theory? bjob->blk
is the BlockBackend of the job and isn't in use any more at this point.
We only insert the old node in it again because block_job_free() must
set bs->job = NULL, and it gets bs with blk_bs(bjob->blk).

So if there is an actual bug here, I don't understand it yet.
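
For context, my (simplified) understanding of why the old node needs to
be inserted back at all: block_job_free() clears the job pointer through
the job's BlockBackend, roughly like this (sketch, not the verbatim
code):

    /* Sketch: the BDS whose ->job pointer must be cleared is found
     * through the job's BlockBackend, so blk_bs(bjob->blk) must not
     * return NULL at this point */
    void block_job_free(Job *job)
    {
        BlockJob *bjob = container_of(job, BlockJob, job);

        blk_bs(bjob->blk)->job = NULL;
        blk_unref(bjob->blk);
    }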

> If the tree isn't ready and we resume the coroutines, we'll end up
> with the requests landing in the wrong block driver state.
> 
> So we should explicitly stop all activity on all the block driver
> states and their parents, and only resume it when everything is ready
> to go.
> 
> Why explicitly? Because the block driver states may belong to
> different block backends at the moment the manipulation begins.
> 
> So it seems we need to disable all their contexts until the
> manipulation ends.

If there actually is a bug, it is certainly not solved by calling
aio_disable_external() (it is bad enough that this even exists), but by
keeping the node drained.
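
In other words, keep all the involved nodes inside one drained section
for the whole graph manipulation, along these lines (sketch only, node
names as in the mirror code quoted above):

    /* No parent's quiesce_counter can drop to 0 while the graph is
     * being changed */
    bdrv_drained_begin(src);
    bdrv_drained_begin(target_bs);

    bdrv_replace_node(to_replace, target_bs, &local_err);
    /* ... any further graph manipulation ... */

    bdrv_drained_end(target_bs);
    bdrv_drained_end(src);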

Kevin


