qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 03/12] block: Filtered children access functi


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v3 03/12] block: Filtered children access functions
Date: Fri, 15 Feb 2019 17:45:38 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0

On 14.02.19 03:25, Eric Blake wrote:
> On 2/13/19 4:53 PM, Max Reitz wrote:
>> What bs->file and bs->backing mean depends on the node.  For filter
>> nodes, both signify a node that will eventually receive all R/W
>> accesses.  For format nodes, bs->file contains metadata and data, and
>> bs->backing will not receive writes -- instead, writes are COWed to
>> bs->file.  Usually.
>>
>> In any case, it is not trivial to guess what a child means exactly with
>> our currently limited form of expression.  It is better to introduce
>> some functions that actually guarantee a meaning:
>>
>> - bdrv_filtered_cow_child() will return the child that receives requests
>>   filtered through COW.  That is, reads may or may not be forwarded
>>   (depending on the overlay's allocation status), but writes never go to
>>   this child.
>>
>> - bdrv_filtered_rw_child() will return the child that receives requests
>>   filtered through some very plain process.  Reads and writes issued to
>>   the parent will go to the child as well (although timing, etc. may be
>>   modified).
>>
>> - All drivers but quorum (but quorum is pretty opaque to the general
>>   block layer anyway) always only have one of these children: All read
>>   requests must be served from the filtered_rw_child (if it exists), so
>>   if there was a filtered_cow_child in addition, it would not receive
>>   any requests at all.
>>   (The closest here is mirror, where all requests are passed on to the
>>   source, but with write-blocking, write requests are "COWed" to the
>>   target.  But that just means that the target is a special child that
>>   cannot be introspected by the generic block layer functions, and that
>>   source is a filtered_rw_child.)
>>   Therefore, we can also add bdrv_filtered_child() which returns that
>>   one child (or NULL, if there is no filtered child).
>>
>> Also, many places in the current block layer should be skipping filters
>> (all filters or just the ones added implicitly, it depends) when going
>> through a block node chain.  They do not do that currently, but this
>> patch makes them.
>>
>> One example for this is qemu-img map, which should skip filters and only
>> look at the COW elements in the graph.  The change to iotest 204's
>> reference output shows how using blkdebug on top of a COW node used to
>> make qemu-img map disregard the rest of the backing chain, but with this
>> patch, the allocation in the base image is reported correctly.
>>
>> Furthermore, a note should be made that sometimes we do want to access
>> bs->backing directly.  This is whenever the operation in question is not
>> about accessing the COW child, but the "backing" child, be it COW or
>> not.  This is the case in functions such as bdrv_open_backing_file() or
>> whenever we have to deal with the special behavior of @backing as a
>> blockdev option, which is that it does not default to null like all
>> other child references do.
>>
>> Finally, the query functions (query-block and query-named-block-nodes)
>> are modified to return any filtered child under "backing", not just
>> bs->backing or COW children.  This is so that filters do not interrupt
>> the reported backing chain.  This changes the output of iotest 184, as
>> the throttled node now appears as a backing child.
>>
>> Signed-off-by: Max Reitz <address@hidden>
>> ---
> 
>> +++ b/qapi/block-core.json
>> @@ -2417,6 +2417,10 @@
>>  # On successful completion the image file is updated to drop the backing 
>> file
>>  # and the BLOCK_JOB_COMPLETED event is emitted.
>>  #
>> +# In case @device is a filter node, block-stream modifies the first 
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if @base 
>> was
>> +# not specified) instead of modifying @device itself.
> 
> Maybe s/In case/If/

Works for me.

>> +/*
>> + * For a backing chain, return the first non-filter backing image.
>> + */
>> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
>> +{
>> +    return 
>> bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
> 
> Quite a mouthful, but looks correct.
> 
> 
>> +++ b/block/io.c
>> @@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const 
>> BlockLimits *src)
>>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>>      BlockDriver *drv = bs->drv;
>> +    BlockDriverState *storage_bs;
>> +    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
> 
> If the backing file is filtered by a blkdebug layer that intentionally
> is trying to advertise alternative block sizes...
> 
>>      Error *local_err = NULL;
>>  
>> +    /*
>> +     * FIXME: There should be a function for this, and in fact there
>> +     * will be as of a follow-up patch.
>> +     */
>> +    storage_bs =
>> +        child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
>> +
>>      memset(&bs->bl, 0, sizeof(bs->bl));
>>  
>>      if (!drv) {
>> @@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
>> **errp)
>>                                  drv->bdrv_aio_preadv) ? 1 : 512;
>>  
>>      /* Take some limits from the children as a default */
>> -    if (bs->file) {
>> -        bdrv_refresh_limits(bs->file->bs, &local_err);
>> +    if (storage_bs) {
>> +        bdrv_refresh_limits(storage_bs, &local_err);
>>          if (local_err) {
>>              error_propagate(errp, local_err);
>>              return;
>>          }
>> -        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
>> +        bdrv_merge_limits(&bs->bl, &storage_bs->bl);
>>      } else {
>>          bs->bl.min_mem_alignment = 512;
>>          bs->bl.opt_mem_alignment = getpagesize();
>> @@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
>> **errp)
>>          bs->bl.max_iov = IOV_MAX;
>>      }
>>  
>> -    if (bs->backing) {
>> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
>> +    if (cow_bs) {
>> +        bdrv_refresh_limits(cow_bs, &local_err);
> 
> ...then this change means that the active layer no longer picks up the
> blkdebug block sizes, but the original COW layer sizes.

How so?

I don't see how this changes current behavior with just COW/blkdebug
nodes at all.  For blkdebug, bs->file is the storage_bs.  For COW nodes,
bs->file is the storage_bs and bs->backing is the cow_bs.  So these code
changes do nothing in that case.

(They only change something for filter nodes that use bs->backing for
their filtered node; because that is translated to storage_bs now.)


Note that bdrv_filtered_cow_bs() does not skip filters (i.e. the
blkdebug node).  That would be bdrv_backing_chain_next().

> Is that
> intentional?  I don't think it is fatal to the patch (as blkdebug is not
> used in production, but only in testing), but it may cause some
> head-scratching when trying to test behaviors of a COW child with
> different block sizes than the active layer by using blkdebug on top of
> the COW child.  I guess I'll find out soon enough (on my todo list is
> fixing NBD to never split an NBD_CMD_READ or NBD_CMD_BLOCK_STATUS reply
> below the granularity advertised at the active layer, even if the
> backing file has a smaller granularity - and using blkdebug to force the
> backing file granularity was my original plan of attack - but it is not
> until this patch is applied that NBD can even locate the bitmap in a
> backing file when a filter is interposed)
> 
> Reviewed-by: Eric Blake <address@hidden>

Thanks!

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]