qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functi


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
Date: Thu, 23 May 2019 16:49:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 17.05.19 13:50, Vladimir Sementsov-Ogievskiy wrote:
> 07.05.2019 18:13, Max Reitz wrote:
>> On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.04.2019 23:20, Max Reitz wrote:
>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>> nodes, both signify a node that will eventually receive all R/W
>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>> bs->file.  Usually.
>>>>
>>>> In any case, it is not trivial to guess what a child means exactly with
>>>> our currently limited form of expression.  It is better to introduce
>>>> some functions that actually guarantee a meaning:
>>>>
>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>>     (depending on the overlay's allocation status), but writes never go to
>>>>     this child.
>>>>
>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>     filtered through some very plain process.  Reads and writes issued to
>>>>     the parent will go to the child as well (although timing, etc. may be
>>>>     modified).
>>>>
>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>     block layer anyway) always only have one of these children: All read
>>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>>     if there was a filtered_cow_child in addition, it would not receive
>>>>     any requests at all.
>>>>     (The closest here is mirror, where all requests are passed on to the
>>>>     source, but with write-blocking, write requests are "COWed" to the
>>>>     target.  But that just means that the target is a special child that
>>>>     cannot be introspected by the generic block layer functions, and that
>>>>     source is a filtered_rw_child.)
>>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>>     one child (or NULL, if there is no filtered child).
>>>>
>>>> Also, many places in the current block layer should be skipping filters
>>>> (all filters or just the ones added implicitly, it depends) when going
>>>> through a block node chain.  They do not do that currently, but this
>>>> patch makes them.
>>>>
>>>> One example for this is qemu-img map, which should skip filters and only
>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>> reference output shows how using blkdebug on top of a COW node used to
>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>> patch, the allocation in the base image is reported correctly.
>>>>
>>>> Furthermore, a note should be made that sometimes we do want to access
>>>> bs->backing directly.  This is whenever the operation in question is not
>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>> whenever we have to deal with the special behavior of @backing as a
>>>> blockdev option, which is that it does not default to null like all
>>>> other child references do.
>>>>
>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>> are modified to return any filtered child under "backing", not just
>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>> the throttled node now appears as a backing child.
>>>>
>>>> Signed-off-by: Max Reitz <address@hidden>
>>>> ---
>>>>    qapi/block-core.json           |   4 +
>>>>    include/block/block.h          |   1 +
>>>>    include/block/block_int.h      |  40 +++++--
>>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>>    block/backup.c                 |   8 +-
>>>>    block/block-backend.c          |  16 ++-
>>>>    block/commit.c                 |  33 +++---
>>>>    block/io.c                     |  45 ++++---
>>>>    block/mirror.c                 |  21 ++--
>>>>    block/qapi.c                   |  30 +++--
>>>>    block/stream.c                 |  13 +-
>>>>    blockdev.c                     |  88 +++++++++++---
>>>>    migration/block-dirty-bitmap.c |   4 +-
>>>>    nbd/server.c                   |   6 +-
>>>>    qemu-img.c                     |  29 ++---
>>>>    tests/qemu-iotests/184.out     |   7 +-
>>>>    tests/qemu-iotests/204.out     |   1 +
>>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 7ccbfff9d0..dbd9286e4a 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2502,6 +2502,10 @@
>>>>    # On successful completion the image file is updated to drop the 
>>>> backing file
>>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>    #
>>>> +# In case @device is a filter node, block-stream modifies the first 
>>>> non-filter
>>>> +# overlay node below it to point to base's backing node (or NULL if @base 
>>>> was
>>>> +# not specified) instead of modifying @device itself.
>>>> +#
>>>
>>> Is it necessary, why we can't keep it as is, modifying exactly device node? 
>>> May be,
>>> user wants to use filter in stream process, throttling for example.
>>
>> That wouldn't make any sense.  Say you have this configuration:
>>
>> throttle -> top -> base
>>
>> Now you stream from base to throttle.  The data goes from base through
>> throttle to top.  You propose to then make throttle point to base:
>>
>> throttle -> base
>>
>> This will discard all the data in top.
>>
>> Filters don’t store any data.  You need to keep the top data storing
>> image, i.e. the first non-filter overlay.
> 
> Ah, yes, good reason.
> 
>>
>>>>    # @job-id: identifier for the newly-created block job. If
>>>>    #          omitted, the device name will be used. (Since 2.7)
>>>>    #
>>
>> [...]
>>
>>>> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
>>>> BlockDriverState *backing_hd,
>>>>        bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>>>>            bdrv_inherits_from_recursive(backing_hd, bs);
>>>>    
>>>> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
>>>> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
>>>
>>> If we support file-filters for frozen backing chain, could it go through 
>>> file child here?
>>> Hmm, only in case when we are going to set backing hd for file-filter.. 
>>> Hmm, could filter have
>>> both file and backing children?
>>
>> No.  A filter passes through data from its children, so it can only have
>> a single child, or it is quorum.
>>
>> The file/backing combination is reserved for COW overlays.  file is
>> where the current layer’s data is, backing is the filtered child.
> 
> My backup-top has two children - backing and target.. So, I think, we can 
> state that
> filter should not have both file and backing children, but may have any other 
> special
> children he wants, invisible for backing-child/file-child generic logic.

Ah, yes, sorry, that’s what I meant.  A filter can have only a single
filtered child, but other than that, they’re free to have whatever.

[...]

>>> Here we don't want to check the chain, we exactly want to check backing 
>>> link, so it should be
>>> something like
>>>
>>> if (bs->backing && bs->backing->frozen) {
>>>      error_setg("backig exists and frozen!");
>>>      return;
>>> }
>>>
>>>
>>> Hmm, on the other hand, if we have frozen backing chain, going through file 
>>> child, we must not add
>>> backing child to the node with file child, as it will change backing chain 
>>> (which by default goes
>>> through backing)..
>>>
>>> Anyway, we don't need to check the whole backing chain, as we may find 
>>> other frozen backing subchain,
>>> far away of bs.. So, we possibly want to check
>>>
>>> if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozed) {
>>>     ERROR
>>> }
>>>
>>>
>>> ....
>>>
>>> also, we'll need to check for frozen file child, when we want to replace it.
>>
>> I don’t quite understand.  It sounds to me like you’re saying we don’t
>> need to check the whole chain here but just the immediate child.  But
>> isn’t that true regardless of this series?
> 
> If we restrict adding backing child to filter with file child, all becomes 
> simpler and seems to be correct.

OK. :-)

> Should we add check for frozen file child to bdrv_replace_child() ?

Argh.  You mean move it from bdrv_set_backing_hd()?  That actually makes
a lot of sense to me.  The problem is that bdrv_replace_child()
currently cannot return an error, which may be a problem for
bdrv_detach_child().  Hm.  But that’s effectively only called from
functions where the child is unref’d, and you have to know that your own
child is not frozen before you unref it.  So I guess we should be good
to pass an &error_abort there.

[...]

>>>> @@ -2208,7 +2218,7 @@ static int coroutine_fn 
>>>> bdrv_co_block_status_above(BlockDriverState *bs,
>>>>        bool first = true;
>>>>    
>>>>        assert(bs != base);
>>>> -    for (p = bs; p != base; p = backing_bs(p)) {
>>>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>>>            ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, 
>>>> map,
>>>>                                       file);
>>>
>>> Interesting that for filters who use bdrv_co_block_status_from_backing and
>>> bdrv_co_block_status_from_file we will finally call .bdrv_co_block_status of
>>> underalying real node two or more times.. It's not wrong but obviously not 
>>> optimal.
>>
>> Hm.  If @p is a filter, we could skip straight to *file.  Would that work?
> 
> No, as file may be not in backing chain:
> 
> filter
>     |
>     v
> qcow2 -> file
>     |
>     v
> qcow2
> 
> So, we shouldn't redirect the whole loop to file..

But qcow2 is not a filter.  I meant skipping to *file only if the
current node is a filter.  And I don’t mean bs->file, I mean *file --
like, what bdrv_co_block_status() returns.

You say in your other mail that filters can have an own implementation
of .bdrv_co_block_status(), but I don’t think that makes sense,
actually.  They should always pass the status of their filtered child.

blkdebug is the only filter I know that has an own implementation, and
the only thing besides passing the data through is add an alignment
assertion.  If it simplifies everything else, I’m very much willing to
break that.

Max

> May be the correct solution should be introducing additional handler
> .bdrv_co_block_status_above with different logic..

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]