qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] block-backend: delay application of request queuing


From: Paolo Bonzini
Subject: Re: [PATCH 3/3] block-backend: delay application of request queuing
Date: Wed, 12 Apr 2023 14:03:52 +0200

On Wed, Apr 12, 2023 at 1:54 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> On 05.04.23 18:31, Paolo Bonzini wrote:
> > +    if (busy || blk->in_flight) {
> > +        return true;
> > +    }
> > +
> > +    if (qatomic_read(&blk->request_queuing) == BLK_QUEUE_READY) {
> > +        qatomic_set(&blk->request_queuing, BLK_QUEUE_QUIESCENT);
> > +    }
> > +    return false;
> >   }
>
> This implicitly relies on nobody increasing blk->in_flight (or
> dev_ops->drained_poll() returning `true` again) while the BB is starting
> to be drained, because if the function were to be called again after it
> has returned `false` once per drained section (not sure if that’s
> possible![1]), then we’d end up in the original situation, with
> in_flight elevated and queuing enabled.

Yes, it does.

> Is that really strictly guaranteed somehow or is it rather a complex
> conglomerate of many cases that in the end happen to work out
> individually?  I mean, I could imagine that running
> BlockDevOps.drained_begin() is supposed to guarantee that, but it can’t,
> because only NBD seems to implement it.  The commit message talks about
> IDE being fine (by accident?) because it needs BQL availability to
> submit new requests.  But that’s very complex and I’d rather have a
> strict requirement to guarantee correctness.

It's a conglomerate of three cases each of which is sufficient (BQL,
aio_disable_external, bdrv_drained_begin---plus just not using
blk_inc_in_flight could be a fourth, of course). Of these,
aio_disable_external() is going away in favor of the
.bdrv_drained_begin callback; and blk_inc_in_flight() is used rarely
in the first place so I thought it'd be not too hard to have this
requirement.

> [1] If the blk_root_drained_poll() isn’t called anymore after returning
> `false`, all will be good, but I assume it will be, because we have a
> quiesce_counter, not a quiesce_bool.  We could kind of emulate this by
> continuing to return `false` after blk_root_drained_poll() has returned
> `false` once, until the quiesce_counter becomes 0.
> We could also have blk_root_drained_poll(), if it sees in_flight > 0 &&
> request_queuing == BLK_QUEUE_QUIESCENT, revert request_queuing to
> BLK_QUEUE_READY and resume all queued requests.

The intended long term fix is to remove request queuing and, if a
request is submitted while BLK_QUEUE_QUIESCENT, give an assertion
failure.

But since the hang requires blk_inc_in_flight() in the device, perhaps
in the short term documenting it in blk_inc_in_flight() may be enough?

Paolo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]