From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Date: Tue, 16 Jul 2019 16:40:16 +0200
User-agent: Mutt/1.11.3 (2019-02-01)
On 19.06.2019 at 17:25, Max Reitz wrote:
> Hi,
>
> This is v2 to “block: Keep track of parent quiescing”.
>
> Please read this cover letter, because I’m very unsure about the design
> in this series and I’d appreciate some comments.
>
> As Kevin wrote in his reply to that series, the actual problem is that
> bdrv_drain_invoke() polls on every node whenever ending a drain. This
> may cause graph changes, which is actually forbidden.
>
> To solve that problem, this series makes the drain code construct a list
> of undrain operations that have been initiated, and then polls all of
> them on the root level once graph changes are acceptable.
>
> Note that I don’t like this list concept very much, so I’m open to
> alternatives.
So drain_end is different from drain_begin in that it wants to wait only
for all bdrv_drain_invoke() calls to complete, but not for other
requests that are in flight. Makes sense.
Though instead of managing a whole list, wouldn't a counter suffice?
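A minimal standalone sketch of the counter idea, only to illustrate the shape of it. Node, fake_poll_once() and all function names here are made up for the example and are not the real block layer API; the real code would poll the AioContext instead of the fake loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a graph node; not the real BlockDriverState. */
typedef struct Node {
    struct Node *children[4];
    size_t n_children;
    int *pending_counter;   /* non-NULL while an undrain callback is in flight */
} Node;

/* Start the asynchronous undrain callback for one node; never polls. */
static void node_drained_end_start(Node *n, int *counter)
{
    n->pending_counter = counter;
    (*counter)++;
}

/* Completion of that callback, as an event loop would run it later. */
static void node_drained_end_complete(Node *n)
{
    assert(n->pending_counter != NULL);
    (*n->pending_counter)--;
    n->pending_counter = NULL;
}

/* Recurse through the subtree, only counting what has been started. */
static void subtree_drained_end(Node *n, int *counter)
{
    node_drained_end_start(n, counter);
    for (size_t i = 0; i < n->n_children; i++) {
        subtree_drained_end(n->children[i], counter);
    }
}

/* Stand-in for one event loop iteration: complete one pending callback. */
static bool fake_poll_once(Node **all, size_t n_all)
{
    for (size_t i = 0; i < n_all; i++) {
        if (all[i]->pending_counter) {
            node_drained_end_complete(all[i]);
            return true;
        }
    }
    return false;
}

/*
 * Root level: start everything first, then poll until the counter is back
 * at zero. Returns the number of poll iterations that were needed.
 */
static int root_drained_end(Node *root, Node **all, size_t n_all)
{
    int counter = 0;
    int iterations = 0;

    subtree_drained_end(root, &counter);
    while (counter > 0) {
        if (!fake_poll_once(all, n_all)) {
            break;  /* nothing left to run; would be a bug in real code */
        }
        iterations++;
    }
    return iterations;
}
```

The point is that the root only needs to know how many callbacks are still outstanding, not which ones, so a single int replaces the list.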
> Furthermore, all BdrvChildRoles with BDS parents have a broken
> .drained_end() implementation. The documentation clearly states that
> this function is not allowed to poll, but it does. So this series
> changes it to a variant (using the new code) that does not poll.
>
> There is a catch, which may actually be a problem, I don’t know: The new
> variant of that .drained_end() does not poll at all, never. As
> described above, now every bdrv_drain_invoke() returns an object that
> describes when it will be done and which can thus be polled for. These
> objects are just discarded when using BdrvChildRole.drained_end(). That
> does not feel quite right. It would probably be more correct to let
> BdrvChildRole.drained_end() return these objects so the top level
> bdrv_drained_end() can poll for their completion.
>
> I decided not to do this, for two reasons:
> (1) Doing so would spill the “list of objects to poll for” design to
> places outside of block/io.c. I don’t like the design very much as
> it is, but I can live with it as long as it’s constrained to the
> core drain code in block/io.c.
> This is made worse by the fact that currently, those objects are of
> type BdrvCoDrainData. But it shouldn’t be a problem to add a new
> type that is externally visible (we only need the AioContext and
> whether bdrv_drain_invoke_entry() is done).
>
> (2) It seems to work as it is.
>
> The alternative would be to add the same GSList ** parameter to
> BdrvChildRole.drained_end() that I added in the core drain code in patch
> 2, and then let the .drained_end() implementation fill that with objects
> to poll for. (Which would be accomplished by adding a frontend to
> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> parameter through.)
>
> Opinions?
I think I would add an int* to BdrvChildRole.drained_end() so that we
can just increase the counter wherever we need to.
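Roughly like this, as a simplified standalone sketch. The types are cut-down stand-ins invented for the example (the real BdrvChildRole has more callbacks and takes a BdrvChild), and the parameter name drained_end_counter and the scheduled_ops field are assumptions of mine.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Child Child;

/* Hypothetical, simplified version of the callback table. */
typedef struct ChildRole {
    /*
     * Must not poll. Increments *drained_end_counter once for every
     * background operation it schedules; whoever completes such an
     * operation is expected to decrement it again.
     */
    void (*drained_end)(Child *c, int *drained_end_counter);
} ChildRole;

struct Child {
    const ChildRole *role;
    int scheduled_ops;  /* how many operations this parent schedules */
};

/* Example implementation: only counts, never waits. */
static void example_parent_drained_end(Child *c, int *drained_end_counter)
{
    *drained_end_counter += c->scheduled_ops;
}

static const ChildRole example_role = {
    .drained_end = example_parent_drained_end,
};

/* The caller passes one counter to all parents and gets the total back. */
static int parents_drained_end(Child *children, size_t n)
{
    int drained_end_counter = 0;

    for (size_t i = 0; i < n; i++) {
        children[i].role->drained_end(&children[i], &drained_end_counter);
    }
    return drained_end_counter;
}
```

Nothing outside the core drain code has to know about a list type this way; the callback only sees a plain int pointer.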
> And then we have bdrv_replace_child_noperm(), which actually wants a
> polling BdrvChildRole.drained_end(). So this series adds
> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
> is any polling to do).
>
> Note that if I implemented the alternative described above
> (.drained_end() gets said GSList ** parameter), a
> .drained_end_unquiesce() wouldn’t be necessary.
> bdrv_parent_drained_end_single() could just poll the list returned by
> .drained_end() by itself.
The split between .drained_end/.drained_end_unquiesce feels wrong. It
shouldn't be the job of the BdrvChildRole to worry about this. Polling
should be handled inside bdrv_parent_drained_end_single(), like we do in
bdrv_parent_drained_begin_single(), so that the BdrvChildRole never has
to poll.
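As a standalone sketch of that layering, under the assumption that the callback gets a counter parameter: the names Child, ChildRole, fake_poll() and parent_drained_end_single_no_poll() are invented for the example, and fake_poll() merely stands in for whatever the real code would use to run the event loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Child Child;

/* Hypothetical, simplified callback table; the callback never polls. */
typedef struct ChildRole {
    void (*drained_end)(Child *c, int *counter);
} ChildRole;

struct Child {
    const ChildRole *role;
    int *pending;   /* counter to decrement on completion, or NULL */
};

/* Example callback: schedules one operation and accounts for it. */
static void demo_role_drained_end(Child *c, int *counter)
{
    c->pending = counter;
    (*counter)++;
}

static const ChildRole demo_role = {
    .drained_end = demo_role_drained_end,
};

/* Stand-in for one event loop iteration: completes the pending operation. */
static bool fake_poll(Child *c)
{
    if (!c->pending) {
        return false;
    }
    (*c->pending)--;
    c->pending = NULL;
    return true;
}

/* Non-polling variant, for use while recursing through the graph. */
static void parent_drained_end_single_no_poll(Child *c, int *counter)
{
    c->role->drained_end(c, counter);
}

/*
 * Polling variant, for callers that need the parent to be fully running
 * again on return. The polling lives here, not in the callback.
 */
static void parent_drained_end_single(Child *c)
{
    int counter = 0;

    parent_drained_end_single_no_poll(c, &counter);
    while (counter > 0 && fake_poll(c)) {
        /* keep running the event loop until everything has completed */
    }
    assert(counter == 0);
}
```

With this split there is only one callback, and a second .drained_end_unquiesce() is not needed.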
> Finally, patches 1, 8, and 9 are unmodified from v1.
> [...]
>
> include/block/block.h | 22 +++++-
> include/block/block_int.h | 23 ++++++
> block.c | 24 +++---
> block/io.c | 155 ++++++++++++++++++++++++++++++-------
> python/qemu/__init__.py | 5 +-
> tests/qemu-iotests/040 | 40 +++++++++-
> tests/qemu-iotests/040.out | 4 +-
> tests/qemu-iotests/255 | 2 +-
> 8 files changed, 231 insertions(+), 44 deletions(-)
I feel this series should add something to tests/test-bdrv-drain.c, too.
qemu-iotests can only test high-level block job commands that happen to
trigger the bug today, but that code may change in the future. Unit
tests allow us to test the problematic cases more directly.
Kevin