qemu-block

Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only


From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
Date: Mon, 17 Sep 2018 14:53:57 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

On 17.09.2018 at 14:38, Paolo Bonzini wrote:
> On 17/09/2018 13:48, Kevin Wolf wrote:
> > On 14.09.2018 at 19:38, Paolo Bonzini wrote:
> >> On 14/09/2018 19:14, Kevin Wolf wrote:
> >>>> As you mention, you could have a nested aio_poll() in the main thread,
> >>>> for example invoked from a bottom half, but in that case I'd rather
> >>>> track the caller that is creating the bottom half and see if it lacks a
> >>>> bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
> >>>> missing).
> >>> I went back to the commit where I first added the patch (it already
> >>> contained the ref/unref pair) and tried if I could reproduce a bug with
> >>> the pair removed. I couldn't.
> >>>
> >>> I'm starting to think that maybe I was just overly cautious with the
> >>> ref/unref. I may have confused the nested aio_poll() crash with a
> >>> different situation. I've dealt with so many crashes and hangs while
> >>> working on this series that it's quite possible.
> >>
> >> Are you going to drop the patch then?
> > 
> > I think I can drop the ref/unref pair, but not the whole patch (whose
> > main point is reordering dec_in_flight vs. the AIO callback).
> 
> You're right, though I think I did that on purpose back in the day.
> IIRC it was related to bdrv_drain, which might never complete if called
> from an AIO callback.

Hm... This seems to be becoming a common pattern: it's the same as for
the job completion callbacks (which were only improved enough for the
bug at hand to disappear, rather than properly fixed, in "blockjob: Lie
better in child_job_drained_poll()").

Either you say there is no activity even though a callback is still
pending; then bdrv_drain() called from elsewhere will return too early
and we get a bug. Or you say there is activity; then any nested drain
inside that callback will deadlock and we get a bug, too.
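
To make the two failure modes concrete, here is a toy model of the
dilemma. It is only a sketch, not QEMU code: ToyBackend, toy_poll(),
toy_drain() and toy_scenario() are made-up names, and a "deadlock" is
approximated by a bounded number of poll iterations without progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in QEMU. */
typedef enum { DEC_BEFORE_CB, DEC_AFTER_CB } Order;
typedef enum { DRAIN_OK, DRAIN_TOO_EARLY, DRAIN_DEADLOCK } DrainResult;

typedef struct {
    int in_flight;    /* what drain treats as "activity" */
    bool cb_pending;  /* completion callback has not finished yet */
    bool in_cb;       /* the drain caller is that very callback */
} ToyBackend;

#define MAX_POLLS 8   /* stand-in for "would spin forever" */

/* One event loop iteration: let the pending callback finish if it can. */
static void toy_poll(ToyBackend *b, Order order)
{
    if (b->in_cb) {
        return;       /* callback is blocked inside drain, no progress */
    }
    if (b->cb_pending) {
        b->cb_pending = false;
        if (order == DEC_AFTER_CB) {
            assert(b->in_flight > 0);
            b->in_flight--;   /* counter drops only after the callback */
        }
    }
}

/* Poll until the counter says "no activity", then judge the outcome. */
static DrainResult toy_drain(ToyBackend *b, Order order)
{
    for (int i = 0; b->in_flight > 0; i++) {
        if (i == MAX_POLLS) {
            return DRAIN_DEADLOCK;
        }
        toy_poll(b, order);
    }
    /* A callback that is still pending for someone else was missed. */
    return (b->cb_pending && !b->in_cb) ? DRAIN_TOO_EARLY : DRAIN_OK;
}

/* One request whose callback is pending when the drain starts. */
static DrainResult toy_scenario(Order order, bool drain_from_callback)
{
    ToyBackend b = { .in_flight = 1, .cb_pending = true, .in_cb = false };

    if (order == DEC_BEFORE_CB) {
        b.in_flight--;        /* "no activity" reported before the callback */
    }
    b.in_cb = drain_from_callback;
    return toy_drain(&b, order);
}
```

In this model, toy_scenario(DEC_BEFORE_CB, false) reports the early
return and toy_scenario(DEC_AFTER_CB, true) reports the deadlock, while
the other two combinations complete normally. Neither ordering is safe
for both callers.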

So I suppose we need some way to know which activities to ignore during
drain, depending on who is the caller? :-/
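
One possible shape for this, purely as an illustration: tag every
pending activity with an owner and let the drain poll skip whatever
belongs to the caller. ToyActivity, TOY_NO_CALLER and
toy_drained_poll() are hypothetical names, not an existing or proposed
QEMU interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch; these names are not part of QEMU. */
#define TOY_NO_CALLER 0       /* drain issued by an unrelated caller */

typedef struct {
    int owner_id;             /* who is responsible for this activity */
    bool pending;             /* still counts as in-flight work */
} ToyActivity;

/*
 * Returns true while the caller still has to wait. Activities owned by
 * the caller itself are ignored, so a drain nested inside a callback
 * does not wait for that same callback to return.
 */
static bool toy_drained_poll(const ToyActivity *acts, size_t n,
                             int caller_id)
{
    for (size_t i = 0; i < n; i++) {
        if (acts[i].pending && acts[i].owner_id != caller_id) {
            return true;
        }
    }
    return false;
}
```

With a single pending activity owned by caller 1, a poll on behalf of
caller 1 sees nothing to wait for, whereas a poll with TOY_NO_CALLER
still sees activity and keeps waiting.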

Kevin


