qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/3] block: add BDS field to count in-flight req


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 1/3] block: add BDS field to count in-flight requests
Date: Tue, 11 Oct 2016 17:25:56 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 11.10.2016 um 16:09 hat Paolo Bonzini geschrieben:
> > Anyway, let's see which of the existing bdrv_drained_begin/end users
> > would use this (please correct):
> > 
> > * Block jobs use during completion
> >
> > * The QMP transaction commands to start block jobs drain as well, but
> >   they don't have a BlockBackend yet. Those would call a BDS-level
> >   bdrv_isolate_begin/end then, right?
> > 
> > * Quorum wants to isolate itself from new parent requests while adding a
> >   new child, that's another user for bdrv_isolate_begin/end
> > 
> > * bdrv_snapshot_delete(), probably the save BDS-level thing
> 
> Is what you call "a BDS-level bdrv_isolate_begin/end" the same as my
> "bdrv_drained_begin(bs) and bdrv_drained_end(bs), which quiesce all root
> BdrvChildren reachable from bs"?  Anyway I think we agree.

Ah, is your bdrv_drained_begin() working in child-to-parent direction,
too? Then I misunderstood and it's the same, yes.

> >> blk_drain and blk_co_drain can simply count in-flight requests, exactly
> >> as done in patch 1.  Sure, I'm adding it at the wrong layer because not
> >> every user of the block layer has already gotten its own personal BB.
> > 
> > Really? Where is it still missing? I was pretty sure I had converted all
> > users.
> 
> I mentioned block jobs, block migration and the NBD server.  They do use
> a BB (as you said, they wouldn't compile after the BdrvChild
> conversion), but they don't have their own *separate* BB as far as I can
> tell, with separate throttling state etc.  How far are we from that?

They have separate BBs, we're already there.

> > As for .bdrv_inactivate/invalidate, I agree that it should ideally
> > imply .bdrv_drained_begin/end, though in practice we don't assert
> > cleared BDRV_O_INACTIVE on reads, so I think it might not hold true.
> > 
> > I'm not so sure if .bdrv_drained_begin/end should imply inactivation,
> > though.
> 
> Again, bad naming hurts...  See above where you said that bdrv_drain
> should be called from bdrv_close only, and not be recursive.  And indeed
> in qcow2_close you see:
> 
>     if (!(s->flags & BDRV_O_INACTIVE)) {
>         qcow2_inactivate(bs);
>     }
> 
> so this in bdrv_close:
> 
>     bdrv_flush(bs);
>     bdrv_drain(bs); /* in case flush left pending I/O */
> 
> should be replaced by a non-recursive bdrv_inactivate, and bdrv_flush
> should be the default implementation of bdrv_inactivate.  The QED code
> added in patch 3 can also become its .bdrv_inactivate callback.  In
> addition, stopping the VM could also do a "full flush" without setting
> BDRV_O_INACTIVE instead of using bdrv_flush_all.

bdrv_drain() should only be called from bdrv_close(), yes, and I would
agree with inactivating an image as the first step before we close it.

However, the .bdrv_drained_begin/end callbacks in BlockDriver are still
called in the context of isolation and that's different from
inactivating.

> > Why don't we start with adding a proper blk_drain so that we don't have
> > to introduce the ugly intermediate state? This would mostly mean
> > splitting the throttling drain into one version called by blk_drain that
> > restarts all requests while ignoring the throttling, and changing the
> > BdrvChild callback to only stop sending them. Once you're there, I think
> > that gets you rid of the BDS request count manipulations from the BB
> > layer because the BDS layer already counts all requests.
> 
> I guess that would be doable.  However I don't think it's so easy.  I
> also don't think it's very interesting (that's probably where we disagree).
> 
> First of all, the current implementation of bdrv_drain is nasty,
> especially the double aio_poll and the recursive bdrv_requests_pending
> are hard to follow.  My aim for these two patches was to have an
> implementation of bdrv_drain that is more easily understandable, so that
> you can _then_ move things to the correct layer.  So even if I
> implemented a series to add a proper blk_drain, I would start with these
> two patches and then move things up.
> 
> Second, there's the issue of bdrv_drained_begin/end, which is the main
> reason why I placed the request count at the BDS level.  BlockBackend
> isolation is a requirement before you can count requests exclusively at
> the BB level.  At which point, implementing isolated sections properly
> (including migration from aio_disable/enable_external to two new
> BlockDevOps, and giving NBD+jobs+migration a separate BB) is more
> interesting than cobbling together the minimum that's enough to
> eliminate bs->in_flight.

I think my point was that you don't have to count requests at the BB
level if you know that there are no requests pending on the BB level
that haven't reached the BDS level yet. If you move the restarting o
throttled requests to blk_drain(), shouldn't that be all you need to do
on the BB level for now?

Of course, having the full thing would be nice, but I don't think it's a
requirement.

Hm, or actually, doesn't the BdrvChild callback still do this work after
your patch? Maybe I'm failing to understand something important about
the patch...

> All in all I don't think this should be done as a single patch series
> and certainly wouldn't be ready in time for 2.8.  These two patches
> (plus the blockjob_drain one that I posted) are needed for me to get rid
> of RFifoLock and fix bdrv_drain_all deadlocks.  I'd really love to do
> that in 2.8, and the patches for that are ready to be posted (as RFC I
> guess).

Agreed, I don't want to make the full thing a requirement.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]