[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4 00/25] block layer: split block APIs in global state and I
Re: [PATCH v4 00/25] block layer: split block APIs in global state and I/O
Mon, 15 Nov 2021 17:03:28 +0100
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0
On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
Currently, block layer APIs like block-backend.h contain a mix of
functions that are either running in the main loop and under the
BQL, or are thread-safe functions and run in iothreads performing I/O.
The functions running under BQL also take care of modifying the
block graph, by using drain and/or aio_context_acquire/release.
This makes it very confusing to understand where each function
runs, and what assumptions it provided with regards to thread
We call the functions running under BQL "global state (GS) API", and
distinguish them from the thread-safe "I/O API".
The aim of this series is to split the relevant block headers in
global state and I/O sub-headers.
Despite leaving quite some comments, the series and the split seem
reasonable to me overall. (This is a pretty big series, after all, so
those “some comments” stack up against a majority of changes that seem
OK to me. :))
One thing I noticed while reviewing is that it’s really hard to verify
that no I/O function calls a GS function. What would be wonderful is
some function marker like coroutine_fn that marks GS functions (or I/O
functions) and that we could then verify the call paths. But AFAIU
we’ve always wanted precisely that for coroutine_fn and still don’t have
it, so this seems like extremely wishful thinking... :(
I think most of the issues I found can be fixed (or are even
irrelevant), the only thing that really worries me are the two places
that are clearly I/O paths that call permission functions: Namely first
block_crypto_amend_options_generic_luks() (part of the luks block
driver’s .bdrv_co_amend implementation), which calls
bdrv_child_refresh_perms(); and second fuse_do_truncate(), which calls
In the first case, we need this call so that we don’t permanently hog
the WRITE permission for the luks file, which used to be a problem, I
believe. We want to unshare the WRITE permission (and apparently also
CONSISTENT_READ) during the key update, so we need some way to
temporarily update the permissions.
I only really see four solutions for this:
(1) We somehow make the amend job run in the main context under the BQL
and have it prevent all concurrent I/O access (seems bad)
(2) We can make the permission functions part of the I/O path (seems
wrong and probably impossible?)
(3) We can drop the permissions update and permanently require the
permissions that we need when updating keys (I think this might break
existing use cases)
(4) We can acquire the BQL around the permission update call and perhaps
I don’t know how (4) would work but it’s basically the only reasonable
solution I can come up with. Would this be a way to call a BQL function
from an I/O function?
As for the second case, the same applies as above, with the differences
that we have no jobs, so this code must always run in the block device’s
AioContext (I think), which rules out (1); but (3) would become easier
(i.e. require the RESIZE permission all the time), although that too
might have an impact on existing users (don’t think so, though). In any
case, if we could do (4), that would solve the problem here, too.
And finally, another notable thing I noticed is that the way how
create-related functions are handled is inconsistent. I believe they
should all be GS functions; qmp_blockdev_create() seems to agree with me
on this, but we currently seem to have some bugs there. It’s possible
to invoke blockdev-create on a block device that’s in an I/O thread, and
then qemu crashes. Oops. (The comment in qmp_blockdev_create() says
that the block drivers’ implementations should prevent this, but
apparently they don’t...?) In any case, that’s a pre-existing bug, of
course, that doesn’t concern this series (other than that it suggests
that “create” functions should be classified as GS).