Re: [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA

From:	Max Reitz
Subject:	Re: [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
Date:	Tue, 13 Aug 2019 18:21:30 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 13.08.19 17:41, Kevin Wolf wrote:
> Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
>> On 13.08.19 13:04, Kevin Wolf wrote:
>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>> returned file. But is it correct behavior at all? If returned file
>>>> itself has a backing file, we may report as totally unallocated and
>>>> area which actually has data in bottom backing file.
>>>>
>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>> by following commit with a test. Let's make raw-format behave more
>>>> correctly returning BDRV_BLOCK_DATA.
>>>>
>>>> Suggested-by: Max Reitz <address@hidden>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>
>>> After some reading, I think I came to the conclusion that RAW is the
>>> correct thing to do. There is indeed a problem, but this patch is trying
>>> to fix it in the wrong place.
>>>
>>> In the case where the backing file contains some data, and we have a
>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>> block is not defined by the queried backing file layer, so it is
>>> completely correct that bdrv_is_allocated() returns false,like it would
>>> if you queried the qcow2 layer directly.
>>
>> I disagree.  The queried backing file layer is the raw node.  As I said,
>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>> have an offset option), nor in how they are generally used (as a format).
>>
>> The raw format does not support backing files.  Therefore, everything on
>> a raw node is allocated.
>>
>> (That is, like, my opinion.)
>>
>>>                                          If it returned true, we would
>>> copy everything, which isn't right either (the test cases should may add
>>> the qemu-img map output of the target so this becomes visible).
>>
>> It is right.
> 
> So we don't even agree what mirroring the raw node should even mean.
> 
> I can the see your point when you say that the raw node has no backing
> file, so everything should be copied. But I can also see the point that
> the raw node can really just be used as a filter that limits the data
> exposed from the qcow2 layer, and you want to keep the copy a COW
> overlay over the same backing file.

But it is not a filter.  If you limit the data by using an offset, that
precisely makes it not a filter.

> Both are valid use cases in principle and there is no single right or
> wrong.

I don’t get what you mean because raw is not a filter.

If it were a filter, it’d be clear: Regarding allocation information, we
skip it.  But it isn’t.  Hence we cannot skip it.

I don’t see why you’re trying to make the point with raw, when it simply
is not a filter.

> We don't currently support the latter use case because we have only
> sync=full or sync=top, but if you could specify a base node instead, we
> could probably suport the case without all of the special-casing filter
> nodes and backing file childs.

This sounds like it’d be difficult to special-case filter nodes.  It isn’t.

Whenever the user specifies some node that should refer to a backing
chain element, all you do is call bdrv_skip_rw_filters() and you land at
the corresponding COW node.

> You would call bdrv_co_block_status_above() with the right base node and
> it would just recurse whereever the data is stored, be it bs->backing,
> bs->file or even driver-specific children. This would allow you to find
> out whether some block in the top node came from the base node that
> we're going to keep. If yes, skip it; if no, copy it.
> 
> This way we wouldn't have to decide whether raw is a filter or not,

Well, it isn’t.

> because it wouldn't make a difference. The behaviour would only depend
> on the base node given by the user. If you specified the top-level qcow2
> file as the base, you get your full copy; if you specified the backing
> qcow2, you get the partial copy where the target still uses the same
> backing file.

I simply do not see your point.

For two reasons:

(1) I don’t think anyone should use is_allocate on nodes that are not
COW nodes.  They are likely to be doing something wrong then.

You don’t seem to agree, because you say this makes the callers
complicated.  What I think is that we don’t have that many callers, and
it’s probably a good thing when they have to think about the backing
chain structure.

But anyway.

(2) What I understand is that you propose adding some way to find out
the next element in the COW chain if is_allocated returns false.  Well,
my “Deal with filters” series adds two ways to do that:

- bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)) will return the first
child behind the COW node, but that may be another filter.

- bdrv_backing_chain_next(bs) will return “the first non-filter backing
image of the first non-filter image.”.  It’s just
bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs))).

So the thing is, we don’t need to modify block_status to return that
information.  If is_allocated returns that something is not allocated,
the caller can just call one of these and thus get the next node where
to look.

When it comes to bdrv_is_allocated_above(), my series fixes that anyway,
no?  (In the sense that it does not choke on filters.)

So I cannot see how it does not come down to the callers.

Max

signature.asc
Description: OpenPGP digital signature

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE, (continued)
- [Qemu-block] [PATCH 2/2] iotests: test mirroring qcow2 under raw format, Vladimir Sementsov-Ogievskiy, 2019/08/12
  - Re: [Qemu-block] [PATCH 2/2] iotests: test mirroring qcow2 under raw format, Kevin Wolf, 2019/08/13
    - Re: [Qemu-block] [PATCH 2/2] iotests: test mirroring qcow2 under raw format, Vladimir Sementsov-Ogievskiy, 2019/08/13
    - Re: [Qemu-block] [PATCH 2/2] iotests: test mirroring qcow2 under raw format, Kevin Wolf, 2019/08/13
- Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW, Max Reitz, 2019/08/12
  - Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW, Max Reitz, 2019/08/12
    - Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW, Vladimir Sementsov-Ogievskiy, 2019/08/13
    - Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW, Vladimir Sementsov-Ogievskiy, 2019/08/13
    - Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW, Vladimir Sementsov-Ogievskiy, 2019/08/13
    - Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW, Vladimir Sementsov-Ogievskiy, 2019/08/13

Prev by Date: Re: [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
Next by Date: Re: [Qemu-block] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
Previous by thread: Re: [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
Next by thread: [Qemu-block] [PATCH 2/2] iotests: test mirroring qcow2 under raw format
Index(es):
- Date
- Thread