Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions

From: Max Reitz
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
Date: Tue, 7 May 2019 15:15:58 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 07.05.19 11:32, Vladimir Sementsov-Ogievskiy wrote:
> 24.04.2019 19:36, Max Reitz wrote:
>> On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.04.2019 19:22, Max Reitz wrote:
>>>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>>> bs->file.  Usually.
>>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>>> our currently limited form of expression.  It is better to introduce
>>>>>> some functions that actually guarantee a meaning:
>>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>>      filtered through COW.  That is, reads may or may not be forwarded
>>>>>>      (depending on the overlay's allocation status), but writes never go
>>>>>>      to this child.
>>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>>      filtered through some very plain process.  Reads and writes issued
>>>>>>      to the parent will go to the child as well (although timing, etc.
>>>>>>      may be modified).
>>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>>      block layer anyway) always only have one of these children: All read
>>>>>>      requests must be served from the filtered_rw_child (if it exists),
>>>>>>      so if there was a filtered_cow_child in addition, it would not
>>>>>>      receive any requests at all.
>>>>>>      (The closest here is mirror, where all requests are passed on to the
>>>>>>      source, but with write-blocking, write requests are "COWed" to the
>>>>>>      target.  But that just means that the target is a special child that
>>>>>>      cannot be introspected by the generic block layer functions, and
>>>>>>      that source is a filtered_rw_child.)
>>>>>>      Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>>      one child (or NULL, if there is no filtered child).
>>>>>> Also, many places in the current block layer should be skipping filters
>>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>>> through a block node chain.  They do not do that currently, but this
>>>>>> patch makes them.
>>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>>> patch, the allocation in the base image is reported correctly.
>>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>>> blockdev option, which is that it does not default to null like all
>>>>>> other child references do.
>>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>>> are modified to return any filtered child under "backing", not just
>>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>>> the throttled node now appears as a backing child.
>>>>>> Signed-off-by: Max Reitz <address@hidden>
>>>>>> ---
>>>>>>     qapi/block-core.json           |   4 +
>>>>>>     include/block/block.h          |   1 +
>>>>>>     include/block/block_int.h      |  40 +++++--
>>>>>>     block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>>     block/backup.c                 |   8 +-
>>>>>>     block/block-backend.c          |  16 ++-
>>>>>>     block/commit.c                 |  33 +++---
>>>>>>     block/io.c                     |  45 ++++---
>>>>>>     block/mirror.c                 |  21 ++--
>>>>>>     block/qapi.c                   |  30 +++--
>>>>>>     block/stream.c                 |  13 +-
>>>>>>     blockdev.c                     |  88 +++++++++++---
>>>>>>     migration/block-dirty-bitmap.c |   4 +-
>>>>>>     nbd/server.c                   |   6 +-
>>>>>>     qemu-img.c                     |  29 ++---
>>>>>>     tests/qemu-iotests/184.out     |   7 +-
>>>>>>     tests/qemu-iotests/204.out     |   1 +
>>>>>>     17 files changed, 411 insertions(+), 145 deletions(-)
>>>>> really huge... didn't you consider conversion file-by-file?
>>>> Frankly, no, I just didn’t consider it.
>>>> Hm.  I don’t know, 30-patch series always look so frightening.
>>>>>> diff --git a/block.c b/block.c
>>>>>> index 16615bc876..e8f6febda0 100644
>>>>>> --- a/block.c
>>>>>> +++ b/block.c
>>>>> [..]
>>>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>>>         /*
>>>>>>          * Find the "actual" backing file by skipping all links that point
>>>>>>          * to an implicit node, if any (e.g. a commit filter node).
>>>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>>>> +     * those return the first explicit node, while we are looking for
>>>>>> +     * its overlay here.
>>>>>>          */
>>>>>>         overlay_bs = bs;
>>>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>>> So, you don't want to skip implicit filters with 'file' child? Then, why not to use
>>>>> child_bs(overlay_bs->backing), like in following if condition?
>>>> I think it was an artifact of writing the patch.  I started with
>>>> bdrv_filtered_bs() and then realized this depends on ->backing,
>>>> actually.  There was no functional difference so I left it as it was.
>>>> But you’re right, it is more clear to use child_bs(overlay_bs->backing)
>>>> instead.
>>>>> Could we instead make backing-based filters equal to file-based, to make it possible
>>>>> to use file-based filters in backing-chain related scenarios (like upcoming copy-on-read
>>>>> filter for stream)? So, to expand backing-chain concept to include filters with file child?
>>>> If I understand you correctly, that’s basically the purpose of this
>>>> series and especially this patch here.  As far as it is possible and
>>>> reasonable, I want filters that use bs->backing and bs->file to behave
>>>> the same.
>>>> However, there are cases where this is not possible and
>>>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>>>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>>>> distinction was already visible to the user, we cannot change it now.
>>>> We definitely cannot make file-based filters use bs->backing now because
>>>> you can create them over QAPI and they use 'file' as their child name.
>>>> Can we make backing-based filters use bs->file?  Seems more likely,
>>>> because all of them are implicit nodes, so the user usually doesn’t see
>>>> them.  But usually isn’t always; they do become user-visible once the
>>>> user specifies a node-name for mirror or commit.
>>>> I found it more reasonable to introduce new functions that explicitly
>>>> express what kind of child they expect and then apply them everywhere as
>>>> I saw fit, instead of making the mirror/commit filter drivers use
>>>> bs->file and hope it works; not least because I’d still have to go
>>>> through the whole block layer and check every instance of bs->backing to
>>>> see whether it really needs bs->backing or whether it should use either
>>>> of bs->backing or bs->file.
>>>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>>>         }
>>>>>>         /* If we want to replace the backing file we need some extra checks */
>>>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>>>>             /* Check for implicit nodes between bs and its backing file */
>>>>>>             if (bs != overlay_bs) {
>>>>>>                 error_setg(errp, "Cannot change backing link if '%s' has "
>>>>> [..]
>>>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>>>     BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>>>                                         BlockDriverState *bs)
>>>>>>     {
>>>>>> -    while (active && bs != backing_bs(active)) {
>>>>>> -        active = backing_bs(active);
>>>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>>> hmm and here you actually support backing-chain with file-child-based filters in it..
>>>> Yes, because this is not about the QAPI 'backing' link.  This function
>>>> should continue to work even if there are filters in the backing chain.
>>> this is a generic function to find overlay in backing chain and it may be used from different places,
>>> for example it is used in Andrey's series about filter for block-stream.
>> Well, all places that use it accept backing chains with filters inside
>> of them.
>>> It is used from qmp_block_commit, isn't it about QAPI?
>> By "QAPI 'backing' link" I mean the user-visible block graph.  Hm.  I
>> wrote in my other mail that you could use query-named-block-nodes to see
>> that graph; apparently you can’t.  So besides x-debug-query-block-graph,
>> we still don’t have any facility to query the block graph?  I don’t know
>> what to say.
>> Anyway, you can still construct the graph with blockdev-add, so it is
>> user-visible.  And in that block graph, there is a 'backing' link, and
>> there is a 'file' link -- this is what I mean with "QAPI link".
>> We have commands that are abstract and don’t work on specific graph
>> links.  For instance, block-commit commits across a backing chain, so it
>> doesn’t matter whether the graph link is called 'backing' or whatever,
>> what is important is that it’s a COW link.  But we should also ignore
>> filters on the way, so this patch makes block-commit and others use
>> those more abstract child access functions.
>> But whenever it is about exactly the "file" or the "backing" link, we
>> have to use bs->file and bs->backing, respectively.  That's just how it
>> currently is.
>>>>>> +        active = bdrv_filtered_bs(active);
>>>>>>         }
>>>>>>         return active;
>>>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>>>     {
>>>>>>         BlockDriverState *i;
>>>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>>> and here don't..
>>>> Yes, because this function is about the QAPI 'backing' link.
>>> And this again a generic thing, that may be used in same places as bdrv_find_overlay,
>> But it isn’t.
>>> and it is used in series about block-stream filter too. So, for further developments
>>> we'll have to keep in mind all these differences between generic block layer functions,
>>> which supports .file children inside backing chain and which are not...
>> I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
>> wrote in my other (previous) mail).
>> But for example bdrv_set_backing_hd() always has to use bs->backing,
>> because that’s what it’s about (and I do change its descriptive comment
>> to reflect that, so you don’t need to keep it in mind).  Same for
>> bdrv_open_backing_file().
>> Hm, what other cases are there...
>> bdrv_reopen_parse_backing(): Fundamentally, this too is about the
>> user-visible "backing" link (as specified through x-blockdev-reopen).
>> But the loop it contains is more difficult to translate than I had
>> thought.  At some point, there needs to be a bs->backing link, because
>> that is what this function is about, but it should also skip all
>> implicit filters in the way, I think.  So e.g. this should be recognized:
>> bs  ---backing-->  COR ---file-->  base
>> @overlay_bs should be COR, I think...?  I mean, as long as COR is an
>> implicit node.  So the loop really should use bdrv_filtered_bs()
>> everywhere, and then the same afterwards.  I think that we should also
>> ensure that @bs can support a ->backing child, but how would I check
>> that?  Maybe it’s safe to just omit such a check...
>> But then another issue comes in: The link to replace (in the above case
>> from "COR" to "base") is no longer necessarily a backing link.  So
>> bdrv_reopen_commit() has to be capable of replacing both bs->backing and
>> bs->file.
>> Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
>> bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
>> and ->new_backing_bs.  It doesn’t communicate anything about overlay_bs.
>> bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
>> replaces bs->backing.  So it seems to just fail on the implicit nodes
>> that bdrv_reopen_parse_backing() took care to skip...
>> OK, what else...  bdrv_reopen_prepare() checks
>> reopen_state->bs->backing, which I claim is correct because while there
>> may be implicit filters in the chain, the first link has to be a
>> ->backing link.
> [sorry for a long delay]
> Are you working on next version or waiting for more reviews?

I haven’t worked on the next version yet, but that’s just because other
things were more important, not because of reviews.

> Why first link should be backing? We want to skip all implicit filters, including
> file-child-based in following call to bdrv_reopen_parse_backing(). So, don't we
> want something like bdrv_backing_chain_next() here? But then a question, could
> reopen_state->bs be filter itself...

Because this function is about the 'backing' option.  As I explained
above, this must correspond to a bs->backing child.  If there is an
implicit filter, it will still be under bs->backing.
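
To make the distinction concrete, here is a toy sketch in plain C.  It is
not the QEMU code: the Node struct and the functions filtered_cow_child(),
filtered_rw_child(), filtered_child() and find_backing_overlay() are made
up for illustration and only loosely modelled on the accessors this patch
describes.  The assumption it encodes is the one stated above: the first
link has to be ->backing, and only after that may implicit nodes be
skipped through whichever filtered child they happen to have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT the real BlockDriverState. */
typedef struct Node {
    const char *name;
    bool is_filter;        /* forwards reads and writes unchanged */
    bool implicit;         /* inserted by a job, not by the user */
    struct Node *file;     /* the 'file' link */
    struct Node *backing;  /* the 'backing' link */
} Node;

/* COW child: may receive reads, never writes.  Filters have none. */
Node *filtered_cow_child(const Node *n)
{
    return n->is_filter ? NULL : n->backing;
}

/* R/W child: receives all reads and writes.  Only filters have one. */
Node *filtered_rw_child(const Node *n)
{
    if (!n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

/* The single filtered child, whichever kind it is (or NULL). */
Node *filtered_child(const Node *n)
{
    Node *cow = filtered_cow_child(n);
    Node *rw = filtered_rw_child(n);

    /* In this model a node never has both kinds at once. */
    assert(!(cow && rw));
    return cow ? cow : rw;
}

/*
 * Hypothetical helper: return the node whose filtered child is the
 * "actual" backing file of @bs.  The first link must be ->backing;
 * after that, implicit nodes are skipped through either link.
 * Returns NULL if @bs has no ->backing link at all.
 */
Node *find_backing_overlay(Node *bs)
{
    Node *overlay = bs;

    if (!bs->backing) {
        return NULL;
    }
    while (filtered_child(overlay) && filtered_child(overlay)->implicit) {
        overlay = filtered_child(overlay);
    }
    return overlay;
}
```

For a chain bs --backing--> COR --file--> base with COR an implicit
filter, find_backing_overlay(bs) returns COR in this model, even though
the link from COR to base is a 'file' link.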


>> bdrv_backing_overridden() has to query bs->backing because this function
>> is used when it is about a specific characteristic of the backing link:
>> There is a non-null default (given by the image header), so if the
>> current bs->backing matches this default, you do not have to specify the
>> backing filename in either blockdev-add or a filename.  Same in
>> bdrv_refresh_filename().
>> I hope that was all...?
>> Max
