[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/11] block: Filtered children
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions |
Date: |
Tue, 7 May 2019 13:33:42 +0000 |
07.05.2019 16:15, Max Reitz wrote:
> On 07.05.19 11:32, Vladimir Sementsov-Ogievskiy wrote:
>> 24.04.2019 19:36, Max Reitz wrote:
>>> On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
>>>> 17.04.2019 19:22, Max Reitz wrote:
>>>>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>>>> What bs->file and bs->backing mean depends on the node. For filter
>>>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>>>> accesses. For format nodes, bs->file contains metadata and data, and
>>>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>>>> bs->file. Usually.
>>>>>>>
>>>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>>>> our currently limited form of expression. It is better to introduce
>>>>>>> some functions that actually guarantee a meaning:
>>>>>>>
>>>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>>> filtered through COW. That is, reads may or may not be forwarded
>>>>>>> (depending on the overlay's allocation status), but writes never
>>>>>>> go to
>>>>>>> this child.
>>>>>>>
>>>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>>> filtered through some very plain process. Reads and writes
>>>>>>> issued to
>>>>>>> the parent will go to the child as well (although timing, etc.
>>>>>>> may be
>>>>>>> modified).
>>>>>>>
>>>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>>> block layer anyway) always only have one of these children: All
>>>>>>> read
>>>>>>> requests must be served from the filtered_rw_child (if it
>>>>>>> exists), so
>>>>>>> if there was a filtered_cow_child in addition, it would not
>>>>>>> receive
>>>>>>> any requests at all.
>>>>>>> (The closest here is mirror, where all requests are passed on to
>>>>>>> the
>>>>>>> source, but with write-blocking, write requests are "COWed" to the
>>>>>>> target. But that just means that the target is a special child
>>>>>>> that
>>>>>>> cannot be introspected by the generic block layer functions, and
>>>>>>> that
>>>>>>> source is a filtered_rw_child.)
>>>>>>> Therefore, we can also add bdrv_filtered_child() which returns
>>>>>>> that
>>>>>>> one child (or NULL, if there is no filtered child).
>>>>>>>
>>>>>>> Also, many places in the current block layer should be skipping filters
>>>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>>>> through a block node chain. They do not do that currently, but this
>>>>>>> patch makes them.
>>>>>>>
>>>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>>>> look at the COW elements in the graph. The change to iotest 204's
>>>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>>>> patch, the allocation in the base image is reported correctly.
>>>>>>>
>>>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>>>> bs->backing directly. This is whenever the operation in question is not
>>>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>>>> not. This is the case in functions such as bdrv_open_backing_file() or
>>>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>>>> blockdev option, which is that it does not default to null like all
>>>>>>> other child references do.
>>>>>>>
>>>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>>>> are modified to return any filtered child under "backing", not just
>>>>>>> bs->backing or COW children. This is so that filters do not interrupt
>>>>>>> the reported backing chain. This changes the output of iotest 184, as
>>>>>>> the throttled node now appears as a backing child.
>>>>>>>
>>>>>>> Signed-off-by: Max Reitz <address@hidden>
>>>>>>> ---
>>>>>>> qapi/block-core.json | 4 +
>>>>>>> include/block/block.h | 1 +
>>>>>>> include/block/block_int.h | 40 +++++--
>>>>>>> block.c | 210
>>>>>>> +++++++++++++++++++++++++++------
>>>>>>> block/backup.c | 8 +-
>>>>>>> block/block-backend.c | 16 ++-
>>>>>>> block/commit.c | 33 +++---
>>>>>>> block/io.c | 45 ++++---
>>>>>>> block/mirror.c | 21 ++--
>>>>>>> block/qapi.c | 30 +++--
>>>>>>> block/stream.c | 13 +-
>>>>>>> blockdev.c | 88 +++++++++++---
>>>>>>> migration/block-dirty-bitmap.c | 4 +-
>>>>>>> nbd/server.c | 6 +-
>>>>>>> qemu-img.c | 29 ++---
>>>>>>> tests/qemu-iotests/184.out | 7 +-
>>>>>>> tests/qemu-iotests/204.out | 1 +
>>>>>>> 17 files changed, 411 insertions(+), 145 deletions(-)
>>>>>>
>>>>>> really huge... didn't you consider conversion file-by-file?
>>>>>
>>>>> Frankly, no, I just didn’t consider it.
>>>>>
>>>>> Hm. I don’t know, 30-patch series always look so frightening.
>>>>>
>>>>>>> diff --git a/block.c b/block.c
>>>>>>> index 16615bc876..e8f6febda0 100644
>>>>>>> --- a/block.c
>>>>>>> +++ b/block.c
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>>>
>>>>>>> @@ -3467,14 +3469,17 @@ static int
>>>>>>> bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>>>> /*
>>>>>>> * Find the "actual" backing file by skipping all links that
>>>>>>> point
>>>>>>> * to an implicit node, if any (e.g. a commit filter node).
>>>>>>> + * We cannot use any of the bdrv_skip_*() functions here because
>>>>>>> + * those return the first explicit node, while we are looking for
>>>>>>> + * its overlay here.
>>>>>>> */
>>>>>>> overlay_bs = bs;
>>>>>>> - while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit)
>>>>>>> {
>>>>>>> - overlay_bs = backing_bs(overlay_bs);
>>>>>>> + while (overlay_bs->backing &&
>>>>>>> bdrv_filtered_bs(overlay_bs)->implicit) {
>>>>>>
>>>>>> So, you don't want to skip implicit filters with 'file' child? Then, why
>>>>>> not to use
>>>>>> child_bs(overlay_bs->backing), like in following if condition?
>>>>>
>>>>> I think it was an artifact of writing the patch. I started with
>>>>> bdrv_filtered_bs() and then realized this depends on ->backing,
>>>>> actually. There was no functional difference so I left it as it was.
>>>>>
>>>>> But you’re right, it is more clear to use child_bs(overlay_bs->backing)
>>>>> isntead.
>>>>>
>>>>>> Could we instead make backing-based filters equal to file-based, to make
>>>>>> it possible
>>>>>> to use file-based filters in backing-chain related scenarios (like
>>>>>> upcoming copy-on-read
>>>>>> filter for stream)? So, to expand backing-chain concept to include
>>>>>> filters with file child?
>>>>>
>>>>> If I understand you correctly, that’s basically the purpose of this
>>>>> series and especially this patch here. As far as it is possible and
>>>>> reasonable, I want filters that use bs->backing and bs->file behave the
>>>>> same.
>>>>>
>>>>> However, there are cases where this is not possible and
>>>>> bdrv_reopen_parse_backing() is one such case. bs->backing and bs->file
>>>>> correspond to QAPI names, namely 'backing' and 'file'. If that
>>>>> distinction was already visible to the user, we cannot change it now.
>>>>>
>>>>> We definitely cannot make file-based filters use bs->backing now because
>>>>> you can create them over QAPI and they use 'file' as their child name.
>>>>> Can we make backing-based filters use bs->file? Seems more likely,
>>>>> because all of them are implicit nodes, so the user usually doesn’t see
>>>>> them. But usually isn’t always; they do become user-visible once the
>>>>> user specifies a node-name for mirror or commit.
>>>>>
>>>>> I found it more reasonable to introduce new functions that explicitly
>>>>> express what kind of child they expect and then apply them everywhere as
>>>>> I saw fit, instead of making the mirror/commit filter drivers use
>>>>> bs->file and hope it works; not least because I’d still have to go
>>>>> through the whole block layer and check every instance of bs->backing to
>>>>> see whether it really needs bs->backing or whether it should use either
>>>>> of bs->backing or bs->file.
>>>>>
>>>>>>> + overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>>>> }
>>>>>>>
>>>>>>> /* If we want to replace the backing file we need some extra
>>>>>>> checks */
>>>>>>> - if (new_backing_bs != backing_bs(overlay_bs)) {
>>>>>>> + if (new_backing_bs != child_bs(overlay_bs->backing)) { >
>>>>>>> /* Check for implicit nodes between bs and its backing file */
>>>>>>> if (bs != overlay_bs) {
>>>>>>> error_setg(errp, "Cannot change backing link if '%s'
>>>>>>> has "
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>>>> BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>>>> BlockDriverState *bs)
>>>>>>> {
>>>>>>> - while (active && bs != backing_bs(active)) {
>>>>>>> - active = backing_bs(active);
>>>>>>> + while (active && bs != bdrv_filtered_bs(active)) {
>>>>>>
>>>>>> hmm and here you actually support backing-chain with file-child-based
>>>>>> filters in it..
>>>>>
>>>>> Yes, because this is not about the QAPI 'backing' link. This function
>>>>> should continue to work even if there are filters in the backing chain.
>>>>
>>>> this is a generic function to find overlay in backing chain and it may be
>>>> used from different places,
>>>> for example it is used in Andrey's series about filter for block-stream.
>>>
>>> Well, all places that use it accept backing chains with filters inside
>>> of them.
>>>
>>>> It is used from qmp_block_commit, isn't it about QAPI?
>>>
>>> By "QAPI 'backing' link" I mean the user-visible block graph. Hm. I
>>> wrote in my other mail that you could use query-named-block-nodes to see
>>> that graph; apparently you can’t. So besides x-debug-query-block-graph,
>>> we still don’t have any facility to query the block graph? I don’t know
>>> what to say.
>>>
>>> Anyway, you can still construct the graph with blockdev-add, so it is
>>> user-visible. And in that block graph, there is a 'backing' link, and
>>> there is a 'file' link -- this is what I mean with "QAPI link".
>>>
>>> We have commands that are abstract and don’t work on specific graph
>>> links. For instance, block-commit commits across a backing chain, so it
>>> doesn’t matter whether the graph link is called 'backing' or whatever,
>>> what is important is that it’s a COW link. But we should also ignore
>>> filters on the way, so this patch makes block-commit and others use
>>> those more abstract child access functions.
>>>
>>> But whenever it is about exactly the "file" or the "backing" link, we
>>> have to use bs->file and bs->backing, respectively. That's just how it
>>> currently is.
>>>
>>>>>>> + active = bdrv_filtered_bs(active);
>>>>>>> }
>>>>>>>
>>>>>>> return active;
>>>>>>> @@ -4226,11 +4231,11 @@ bool
>>>>>>> bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState
>>>>>>> *base,
>>>>>>> {
>>>>>>> BlockDriverState *i;
>>>>>>>
>>>>>>> - for (i = bs; i != base; i = backing_bs(i)) {
>>>>>>> + for (i = bs; i != base; i = child_bs(i->backing)) {
>>>>>>
>>>>>> and here don't..
>>>>>
>>>>> Yes, because this function is about the QAPI 'backing' link.
>>>>
>>>> And this again a generic thing, that may be used in same places as
>>>> bdrv_find_overlay,
>>>
>>> But it isn’t.
>>>
>>>> and it is used in series about block-stream filter too. So, for further
>>>> developments
>>>> we'll have to keep in mind all these differences between generic block
>>>> layer functions,
>>>> which supports .file children inside backing chain and which are not...
>>>
>>> I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
>>> wrote in my other (previous) mail).
>>>
>>> But for example bdrv_set_backing_hd() always has to use bs->backing,
>>> because that’s what it’s about (and I do change its descriptive comment
>>> to reflect that, so you don’t need to keep it in mind). Same for
>>> bdrv_open_backing_file().
>>>
>>> Hm, what other cases are there...
>>>
>>> bdrv_reopen_parse_backing(): Fundamentally, this too is about the
>>> user-visible "backing" link (as specified through x-blockdev-reopen).
>>> But the loop it contains is more difficult to translate than I had
>>> thought. At some point, there needs to be a bs->backing link, because
>>> that is what this function is about, but it should also skip all
>>> implicit filters in the way, I think. So e.g. this should be recognized:
>>>
>>> bs ---backing--> COR ---file--> base
>>>
>>> @overlay_bs should be COR, I think...? I mean, as long as COR is an
>>> implicit node. So the loop really should use bdrv_filtered_bs()
>>> everywhere, and then the same afterwards. I think that we should also
>>> ensure that @bs can support a ->backing child, but how would I check
>>> that? Maybe it’s safe to just omit such a check...
>>>
>>> But then another issue comes in: The link to replace (in the above case
>>> from "COR" to "base") is no longer necessarily a backing link. So
>>> bdrv_reopen_commit() has to be capable of replacing both bs->backing and
>>> bs->file.
>>>
>>> Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
>>> bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
>>> and ->new_backing_bs. It doesn’t communicate anything about overlay_bs.
>>> bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
>>> replaces bs->backing. So it seems to just fail on the implicit nodes
>>> that bdrv_reopen_parse_backing() took care to skip...
>>>
>>>
>>> OK, what else... bdrv_reopen_prepare() checks
>>> reopen_state->bs->backing, which I claim is correct because while there
>>> may be implicit filters in the chain, the first link has to be a
>>> ->backing link.
>>
>> [sorry for a long delay]
>> Are you working on next version or waiting for more reviews?
>
> I haven’t worked on the next version yet, but that’s just because other
> things were more important, not because of reviews.
>
>> Why first link should be backing? We want to skip all implicit filters,
>> including
>> file-child-based in following call to bdrv_reopen_parse_backing(). So, don't
>> we
>> want something like bdrv_backing_chain_next() here? But then a question,
>> could
>> reopen_state->bs be filter itself...
>
> Because this function is about the 'backing' option. As I explained
> above, this must correspond to a bs->backing child. If there is an
> implicit filter, it will still be under bs->backing.
>
Aha, ok, understand.
>
>>> bdrv_backing_overridden() has to query bs->backing because this function
>>> is used when it is about a specific characteristic of the backing link:
>>> There is a non-null default (given by the image header), so if the
>>> current bs->backing matches this default, you do not have to specify the
>>> backing filename in either blockdev-add or a filename. Same in
>>> bdrv_refresh_filename().
>>>
>>>
>>> I hope that was all...?
>>>
>>> Max
>>>
>>
>>
>
>
--
Best regards,
Vladimir
Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions, Vladimir Sementsov-Ogievskiy, 2019/05/17
Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions, Max Reitz, 2019/05/31