[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 1/3] block: add BDS field to count in-flight req

From: Paolo Bonzini
Subject: Re: [Qemu-block] [PATCH 1/3] block: add BDS field to count in-flight requests
Date: Tue, 11 Oct 2016 18:45:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0

On 11/10/2016 17:25, Kevin Wolf wrote:
> Am 11.10.2016 um 16:09 hat Paolo Bonzini geschrieben:
>> Is what you call "a BDS-level bdrv_isolate_begin/end" the same as my
>> "bdrv_drained_begin(bs) and bdrv_drained_end(bs), which quiesce all root
>> BdrvChildren reachable from bs"?  Anyway I think we agree.
> Ah, is your bdrv_drained_begin() working in child-to-parent direction,
> too?


>> I mentioned block jobs, block migration and the NBD server.  They do use
>> a BB (as you said, they wouldn't compile after the BdrvChild
>> conversion), but they don't have their own *separate* BB as far as I can
>> tell, with separate throttling state etc.  How far are we from that?
> They have separate BBs, we're already there.

Hey, you're right!  I don't know how I missed that!

> bdrv_drain() should only be called from bdrv_close(), yes, and I would
> agree with inactivating an image as the first step before we close it.
> However, the .bdrv_drained_begin/end callbacks in BlockDriver are still
> called in the context of isolation and that's different from
> inactivating.

Agreed, .bdrv_drained_begin/end is separate from inactivation.  That's
just "bdrv_drain" (which we agreed has to be called from bdrv_close only).

> I think my point was that you don't have to count requests at the BB
> level if you know that there are no requests pending on the BB level
> that haven't reached the BDS level yet.

I need to count requests at the BB level because the blk_aio_*
operations have a separate bottom half that is invoked if either 1) they
never reach BDS (because of an error); or 2) the bdrv_co_* call
completes without yielding.  The count must be >0 when blk_aio_*
returns, or bdrv_drain (and thus blk_drain) won't loop.  Because
bdrv_drain and blk_drain are conflated, the counter must be the BDS one.

In turn, the BDS counter is needed because of the lack of isolated
sections.  The right design would be for blk_isolate_begin to call
blk_drain on *other* BlockBackends reachable in a child-to-parent visit.
 Instead, until that is implemented, we have bdrv_drained_begin that
emulates that through the same-named callback, followed by a
parent-to-child bdrv_drain that is almost always unnecessary.

> If you move the restarting to
> throttled requests to blk_drain(), shouldn't that be all you need to do
> on the BB level for now?
> Hm, or actually, doesn't the BdrvChild callback still do this work after
> your patch?

Yes, it does.  The only difference is that the callback is not needed
anymore for bdrv_requests_pending to answer true.  With my patch,
bdrv_requests_pending always answers true if there are throttled
requests.  Without my patch, it relies on blk_root_drained_begin's call
to throttle_group_restart_blk.

The long-term view is that there would be a difference between blk_drain
and the BdrvChild callback.  blk_drain waits for throttled requests
(using the same algorithm as bdrv_drain and a BB-level counter only).
Instead, the BdrvChild callback can merely arrange to be ignored by the
throttle group.

(By the way, I need to repost this series anyway, but let's finish the
discussion first to understand what you'd like to have in 2.8).


> Maybe I'm failing to understand something important about
> the patch...
>> All in all I don't think this should be done as a single patch series
>> and certainly wouldn't be ready in time for 2.8.  These two patches
>> (plus the blockjob_drain one that I posted) are needed for me to get rid
>> of RFifoLock and fix bdrv_drain_all deadlocks.  I'd really love to do
>> that in 2.8, and the patches for that are ready to be posted (as RFC I
>> guess).
> Agreed, I don't want to make the full thing a requirement.
> Kevin

reply via email to

[Prev in Thread] Current Thread [Next in Thread]