qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] block-backend: delay application of request queuing


From: Hanna Czenczek
Subject: Re: [PATCH 3/3] block-backend: delay application of request queuing
Date: Wed, 12 Apr 2023 16:33:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

On 12.04.23 14:03, Paolo Bonzini wrote:
On Wed, Apr 12, 2023 at 1:54 PM Hanna Czenczek <hreitz@redhat.com> wrote:
On 05.04.23 18:31, Paolo Bonzini wrote:
+    if (busy || blk->in_flight) {
+        return true;
+    }
+
+    if (qatomic_read(&blk->request_queuing) == BLK_QUEUE_READY) {
+        qatomic_set(&blk->request_queuing, BLK_QUEUE_QUIESCENT);
+    }
+    return false;
   }
This implicitly relies on nobody increasing blk->in_flight (or
dev_ops->drained_poll() returning `true` again) while the BB is starting
to be drained, because if the function were to be called again after it
has returned `false` once per drained section (not sure if that’s
possible![1]), then we’d end up in the original situation, with
in_flight elevated and queuing enabled.
Yes, it does.

Is that really strictly guaranteed somehow or is it rather a complex
conglomerate of many cases that in the end happen to work out
individually?  I mean, I could imagine that running
BlockDevOps.drained_begin() is supposed to guarantee that, but it can’t,
because only NBD seems to implement it.  The commit message talks about
IDE being fine (by accident?) because it needs BQL availability to
submit new requests.  But that’s very complex and I’d rather have a
strict requirement to guarantee correctness.
It's a conglomerate of three cases each of which is sufficient (BQL,
aio_disable_external, bdrv_drained_begin---plus just not using
blk_inc_in_flight could be a fourth, of course). Of these,
aio_disable_external() is going away in favor of the
.bdrv_drained_begin callback; and blk_inc_in_flight() is used rarely
in the first place so I thought it'd be not too hard to have this
requirement.

Does IDE’s BQL requirement work for nested drains, though, i.e. when you have a drained_begin, followed by another?  The commit message doesn’t say whether it’s impossible for IDE to create a new request in between the two.

I’m a bit afraid that these cases are too complicated for me to fully comprehend.

[1] If the blk_root_drained_poll() isn’t called anymore after returning
`false`, all will be good, but I assume it will be, because we have a
quiesce_counter, not a quiesce_bool.  We could kind of emulate this by
continuing to return `false` after blk_root_drained_poll() has returned
`false` once, until the quiesce_counter becomes 0.
We could also have blk_root_drained_poll(), if it sees in_flight > 0 &&
request_queuing == BLK_QUEUE_QUIESCENT, revert request_queuing to
BLK_QUEUE_READY and resume all queued requests.
The intended long term fix is to remove request queuing and, if a
request is submitted while BLK_QUEUE_QUIESCENT, give an assertion
failure.

Yep, that would be a nice obvious requirement.

But since the hang requires blk_inc_in_flight() in the device, perhaps
in the short term documenting it in blk_inc_in_flight() may be enough?

Technically it needs a blk_inc_in_flight() whose blk_dec_in_flight() depends on a different request that can be queued (which is only the case in IDE), so I suppose we could document exactly that in those functions’ interfaces, i.e. that users must take care not to use blk_inc_flight() while the BlockBackend is (being) drained, when the associated blk_dec_in_flight() may depend on an I/O request to the BB.

I think that should be enough, yes.  Well, as long as you can guarantee that IDE will indeed fulfill that requirement, because I find it difficult to see/prove...

Hanna




reply via email to

[Prev in Thread] Current Thread [Next in Thread]