qemu-block

From: Max Reitz
Subject: Re: [Qemu-block] [PATCH v2 0/9] block: Delay poll when ending drained sections
Date: Wed, 17 Jul 2019 15:20:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2

On 16.07.19 18:37, Kevin Wolf wrote:
> Am 16.07.2019 um 18:24 hat Max Reitz geschrieben:
>> On 16.07.19 16:40, Kevin Wolf wrote:
>>> Am 19.06.2019 um 17:25 hat Max Reitz geschrieben:
>>>> Hi,
>>>>
>>>> This is v2 to “block: Keep track of parent quiescing”.
>>>>
>>>> Please read this cover letter, because I’m very unsure about the design
>>>> in this series and I’d appreciate some comments.
>>>>
>>>> As Kevin wrote in his reply to that series, the actual problem is that
>>>> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
>>>> may cause graph changes, which is actually forbidden.
>>>>
>>>> To solve that problem, this series makes the drain code construct a list
>>>> of undrain operations that have been initiated, and then polls all of
>>>> them on the root level once graph changes are acceptable.
>>>>
>>>> Note that I don’t like this list concept very much, so I’m open to
>>>> alternatives.
>>>
>>> So drain_end is different from drain_begin in that it wants to wait only
>>> for all bdrv_drain_invoke() calls to complete, but not for other
>>> requests that are in flight. Makes sense.
>>>
>>> Though instead of managing a whole list, wouldn't a counter suffice?
>>>
>>>> Furthermore, all BdrvChildRoles with BDS parents have a broken
>>>> .drained_end() implementation.  The documentation clearly states that
>>>> this function is not allowed to poll, but it does.  So this series
>>>> changes it to a variant (using the new code) that does not poll.
>>>>
>>>> There is a catch, which may actually be a problem, I don’t know: The new
>>>> variant of that .drained_end() does not poll at all, ever.  As
>>>> described above, now every bdrv_drain_invoke() returns an object that
>>>> describes when it will be done and which can thus be polled for.  These
>>>> objects are just discarded when using BdrvChildRole.drained_end().  That
>>>> does not feel quite right.  It would probably be more correct to let
>>>> BdrvChildRole.drained_end() return these objects so the top level
>>>> bdrv_drained_end() can poll for their completion.
>>>>
>>>> I decided not to do this, for two reasons:
>>>> (1) Doing so would spill the “list of objects to poll for” design to
>>>>     places outside of block/io.c.  I don’t like the design very much as
>>>>     it is, but I can live with it as long as it’s constrained to the
>>>>     core drain code in block/io.c.
>>>>     This is made worse by the fact that currently, those objects are of
>>>>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>>>>     type that is externally visible (we only need the AioContext and
>>>>     whether bdrv_drain_invoke_entry() is done).
>>>>
>>>> (2) It seems to work as it is.
>>>>
>>>> The alternative would be to add the same GSList ** parameter to
>>>> BdrvChildRole.drained_end() that I added in the core drain code in patch
>>>> 2, and then let the .drained_end() implementation fill that with objects
>>>> to poll for.  (Which would be accomplished by adding a frontend to
>>>> bdrv_do_drained_end() that lets bdrv_child_cb_drained_end() pass the
>>>> parameter through.)
>>>>
>>>> Opinions?
>>>
>>> I think I would add an int* to BdrvChildRole.drained_end() so that we
>>> can just increase the counter wherever we need to.
>>
>> So you mean just polling the @bs for which a caller gave poll=true until
>> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).
> 
> Yes, that's what I have in mind.
> 
> We expect graph changes to happen during the polling, but I think the
> caller is responsible for making sure that the top-level @bs stays
> around, so we don't need to be extra careful here.
> 
> Also, bdrv_drain_invoke() is always called in the same AioContext as the
> top-level drain operation, so the whole aio_context_acquire/release
> stuff from this series should become unnecessary, and we don't need
> atomics to access the counter either.
> 
> So I think this should really simplify the series a lot.
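
To make the counter idea concrete, here is a minimal single-threaded C sketch.  It is a toy model, not QEMU code: ToyNode, toy_poll_once(), toy_drained_end_no_poll() and toy_drained_end() are hypothetical names, and the toy_pending array loosely stands in for the bdrv_drain_invoke_entry() work that bdrv_drain_invoke() starts.  The recursive part only bumps an int counter and never polls; the top-level call is the one place that polls until the counter is back at 0.  Since everything runs in one context (the assumption stated above for bdrv_drain_invoke()), a plain int without atomics suffices.

```c
#include <assert.h>

/* Toy model only: every name here is hypothetical, not QEMU's API. */
#define TOY_MAX_CHILDREN 4
#define TOY_MAX_PENDING 16

typedef struct ToyNode {
    struct ToyNode *children[TOY_MAX_CHILDREN];
    int n_children;
    int quiesce_counter;
} ToyNode;

/* Completions that have been started but not yet run.  Each entry points
 * at the counter it must decrement when it finishes. */
static int *toy_pending[TOY_MAX_PENDING];
static int toy_n_pending;

/* One event-loop iteration: runs the oldest pending completion, if any.
 * Returns 1 if it made progress, 0 if nothing was pending. */
static int toy_poll_once(void)
{
    if (toy_n_pending == 0) {
        return 0;
    }
    (*toy_pending[0])--;
    for (int i = 1; i < toy_n_pending; i++) {
        toy_pending[i - 1] = toy_pending[i];
    }
    toy_n_pending--;
    return 1;
}

/* Ends the drained section on @node and, recursively, on its children.
 * Never polls: each node starts one completion and bumps *counter. */
static void toy_drained_end_no_poll(ToyNode *node, int *counter)
{
    assert(node->quiesce_counter > 0);
    assert(toy_n_pending < TOY_MAX_PENDING);
    node->quiesce_counter--;
    toy_pending[toy_n_pending++] = counter;
    (*counter)++;
    for (int i = 0; i < node->n_children; i++) {
        toy_drained_end_no_poll(node->children[i], counter);
    }
}

/* Top-level drained_end: the only place that polls.  Returns how many
 * completions it waited for. */
static int toy_drained_end(ToyNode *root)
{
    int counter = 0;
    int polls = 0;

    toy_drained_end_no_poll(root, &counter);
    while (counter > 0 && toy_poll_once()) {
        polls++;
    }
    assert(counter == 0);
    return polls;
}
```

For a three-node chain that was drained once, toy_drained_end() on the root waits for exactly three completions, one per node, without any nested poll.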

Hm.  Unfortunately, not all nodes in a chain always have the same
AioContext.

I think they generally should, but there is at least one exception:
bdrv_set_aio_context*() itself.  bdrv_set_aio_context_ignore() drains
the node, then moves the other members of the subgraph into the new
AioContext, and finally moves the node itself.

Now say this recursion reaches the bottom node.  That node will not
recurse any further; it only changes its own AioContext, inside a
drained section.  So when that section ends, the bottom node will be in
a different AioContext than the nodes above it, which have not been
moved yet.
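
A toy model of that recursion shows the problem.  This is a simplified sketch, not QEMU code: ChainNode, toy_set_ctx(), toy_drained_begin(), toy_drained_end() and the int ctx field are hypothetical stand-ins for a BDS, bdrv_set_aio_context_ignore(), the drain functions and the AioContext, and the recursion only walks down a linear chain instead of visiting children and parents.  Each call opens its own drained section, moves the node below it, then moves itself, and only then ends its section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a ChainNode stands in for a BDS and the int
 * ctx for its AioContext.  Not QEMU's data structures. */
typedef struct ChainNode {
    struct ChainNode *child;  /* next node down, NULL at the bottom */
    int ctx;
    int quiesce_counter;
} ChainNode;

static ChainNode *chain_root;   /* top of the chain being moved */
static int mixed_ctx_ends;      /* drained sections that ended with mixed contexts */

static void toy_drained_begin(ChainNode *n)
{
    n->quiesce_counter++;
}

/* Where the real drained_end would poll; here we only record whether
 * the node and the top of the chain disagree on their context. */
static void toy_drained_end(ChainNode *n)
{
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
    if (n->ctx != chain_root->ctx) {
        mixed_ctx_ends++;
    }
}

/* Order as described in the mail: drain the node, move the rest of the
 * subgraph (here: the node below), move the node itself, end the drain. */
static void toy_set_ctx(ChainNode *n, int new_ctx)
{
    toy_drained_begin(n);
    if (n->child != NULL) {
        toy_set_ctx(n->child, new_ctx);
    }
    n->ctx = new_ctx;
    toy_drained_end(n);
}
```

On a three-node chain, two drained sections (the bottom and the middle node) end while the top node is still in the old context; only the top node's own section ends with all contexts in agreement.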

So, er, well.  I have three ideas:

(1) Skip the polling in the top-level drained_end if the node is still
quiesced by an outer drained section (i.e. its quiesce_counter is still
non-zero).  Sounds a bit too error-prone to me.

(2) Drop the drained sections in bdrv_set_aio_context_ignore().  Instead,
require the root caller to have drained the whole subtree.  That way,
drained_end will never be invoked while nodes in the subtree are in
different AioContexts.

(3) I need a list after all (one that only contains AioContexts, but still).
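
Idea (3) could look roughly like the following C sketch, assuming the only state needed per entry is the context to poll.  ToyCtx, CtxList, ctx_list_add(), toy_poll() and ctx_list_poll_all() are hypothetical names; toy_poll() merely decrements a pending count where real code would call aio_poll() on the context.  Duplicates are skipped so each context appears once, and the outer loop repeats until a full pass makes no progress, since finishing work in one context may in general schedule more work in another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: just a count of outstanding
 * drain_invoke-style completions that polling it would run. */
typedef struct ToyCtx {
    int pending;
} ToyCtx;

#define CTX_LIST_MAX 8

/* The "list of AioContexts" from idea (3), with duplicates suppressed. */
typedef struct CtxList {
    ToyCtx *ctxs[CTX_LIST_MAX];
    size_t n;
} CtxList;

/* Records @ctx unless it is already in the list. */
static void ctx_list_add(CtxList *list, ToyCtx *ctx)
{
    for (size_t i = 0; i < list->n; i++) {
        if (list->ctxs[i] == ctx) {
            return;
        }
    }
    assert(list->n < CTX_LIST_MAX);
    list->ctxs[list->n++] = ctx;
}

/* Rough stand-in for polling a context: runs one pending completion.
 * Returns 1 if it made progress. */
static int toy_poll(ToyCtx *ctx)
{
    if (ctx->pending == 0) {
        return 0;
    }
    ctx->pending--;
    return 1;
}

/* Polls every recorded context until a full pass makes no progress.
 * Returns the total number of completions run. */
static int ctx_list_poll_all(CtxList *list)
{
    int total = 0;
    int progress;

    do {
        progress = 0;
        for (size_t i = 0; i < list->n; i++) {
            while (toy_poll(list->ctxs[i])) {
                total++;
                progress = 1;
            }
        }
    } while (progress);
    return total;
}
```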


I like (3) as little as I did in this series.  (1) seems wrong.  I’ll
try (2) first.
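
Idea (2) can be sketched in a similar toy style, again assuming a linear chain and hypothetical names (ToyBds, toy_subtree_drained_begin(), toy_subtree_drained_end(), toy_set_ctx_drained(), toy_change_ctx()) with an int standing in for the AioContext.  The context switch itself opens no drained sections and only asserts that the caller has already drained the subtree; the single drained_end happens after every node has moved, so at that point all nodes agree on their context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of idea (2); names are illustrative only. */
typedef struct ToyBds {
    struct ToyBds *child;   /* next node down the chain, NULL at the bottom */
    int ctx;                /* stands in for the node's AioContext */
    int quiesce_counter;
} ToyBds;

/* Drains @bs and everything below it. */
static void toy_subtree_drained_begin(ToyBds *bs)
{
    for (; bs != NULL; bs = bs->child) {
        bs->quiesce_counter++;
    }
}

/* Ends the subtree's drained section.  Returns 1 if every node shares the
 * root's context at this point (where the real code would poll). */
static int toy_subtree_drained_end(ToyBds *root)
{
    int uniform = 1;

    for (ToyBds *bs = root; bs != NULL; bs = bs->child) {
        assert(bs->quiesce_counter > 0);
        bs->quiesce_counter--;
        if (bs->ctx != root->ctx) {
            uniform = 0;
        }
    }
    return uniform;
}

/* Moves the subtree without opening drained sections of its own; it only
 * checks that the caller has drained it already. */
static void toy_set_ctx_drained(ToyBds *bs, int new_ctx)
{
    assert(bs->quiesce_counter > 0);
    if (bs->child != NULL) {
        toy_set_ctx_drained(bs->child, new_ctx);
    }
    bs->ctx = new_ctx;
}

/* Root caller: drain everything, move everything, then end the section. */
static int toy_change_ctx(ToyBds *root, int new_ctx)
{
    toy_subtree_drained_begin(root);
    toy_set_ctx_drained(root, new_ctx);
    return toy_subtree_drained_end(root);
}
```

Here toy_change_ctx() on a three-node chain returns 1: by the time the only drained section ends, the whole chain is in the new context.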

Max
