Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-block] [PATCH 0/2] deal with BDRV_BLOCK_RAW
Date: Tue, 13 Aug 2019 13:00:44 +0000

13.08.2019 14:51, Kevin Wolf wrote:
> Am 13.08.2019 um 13:14 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
>>>> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 22:50, Max Reitz wrote:
>>>>>> On 12.08.19 21:46, Max Reitz wrote:
>>>>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> Hi all!
>>>>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>>>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>>>>>>> or somehow forbidden (just forbid backing-file supporting node to be
>>>>>>>> file child of raw-format node).
>>>>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>>>>>> (And not even them, they should just be handled transparently by
>>>>>>> bdrv_co_block_status().  But that’s something for later.)
>>>>>>>> Vladimir Sementsov-Ogievskiy (2):
>>>>>>>>     block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>>>>>>     iotests: test mirroring qcow2 under raw format
>>>>>>>>    block/raw-format.c         |  2 +-
>>>>>>>>    tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>>>>>>    tests/qemu-iotests/263.out | 12 ++++++++++
>>>>>>>>    tests/qemu-iotests/group   |  1 +
>>>>>>>>    4 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>>>    create mode 100755 tests/qemu-iotests/263
>>>>>>>>    create mode 100644 tests/qemu-iotests/263.out
>>>>>>> Thanks, applied to my block-next branch:
>>>>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
>>>>>> Oops, maybe not.  221 needs to be adjusted.
>>>>> Hmm yes, I forgot to run the tests. Areas which were zero become
>>>>> data|zero, which doesn't look good.
>>>>> So, it's not quite right to report DATA | RECURSE; we should actually
>>>>> report DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE,
>>>>> as otherwise DATA will be set in the final result (the generic layer
>>>>> must not drop it, obviously).
>>>>> ALLOCATED is never returned by drivers, but it seems it should be.
>>>>> I'll think a bit and resend something new.
>>>> Hmmm.. So, we have a raw node, and assume a backing chain under it. Who
>>>> should loop through it, generic code or the raw driver?
>>>> Now it all looks like generic code is responsible for looping through
>>>> the filtered chain (backing files and filters) and the driver is
>>>> responsible for all its children except for the filtered child.
>>>> Or, the driver may return something that tells the generic code to
>>>> handle the whole backing chain of the returned file at once, as it's
>>>> another backing chain. And it seems even RECURSE doesn't work correctly,
>>>> as it doesn't handle the backing chain in this recursion. Why does it
>>>> work better than RAW? Just because we return it together with the DATA
>>>> flag, and this DATA flag is kept anyway, independently of finding zeros
>>>> or not.
>>> Hmm, so, is it correct that we return DATA | RECURSE if we are not
>>> really sure that it is data?
>>> If we look at
>>>    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>>> it seems like we should report DATA only if there is an allocation..
>>>    *  t    t        t       sectors read as zero, returned file is zero at offset
>>>    *  t    f        t       sectors read as valid from file at offset
>>>    *  f    t        t       sectors preallocated, read as zero, returned file not
>>> so, ZERO alone doesn't guarantee that we may safely read?
>>> So, for qcow2 metadata-preallocated images, what about zero-init? We
>>> report DATA, and probably get ZERO from file, and finally have
>>> DATA | ZERO, which guarantees that a read will return zeros, but is it
>>> true?
>>> Finally, what does "DATA" mean? That space is allocated and occupies
>>> disk space? Or does it mean only ALLOCATED, i.e. "read from this layer,
>>> not from backing", and add additional meaning to ZERO when used
>>> together, that a read will return zeros for sure?
> I think DATA means that the data for this block is provided by *file. I
> wouldn't necessarily understand it to mean that the data actually takes
> up physical disk space there.
>> Continue self-discussion.
>> Consider closer the following case:
>>   >   *  f    t        t       sectors preallocated, read as zero, returned file not
>> It actually means that we must not read, as read will return wrong
>> data, when clusters are actually zero for guest.
> It means that you need to read from bs itself to get the correct data
> (which will be zero). Even though OFFSET_VALID is set, reading from
> *file (typically bs->file->bs) at the returned offset might not give the
> right result.
>> It's OK, when for ex. qcow2 returns this combination and link to its
>> file child: it means that if you read from qcow2 node, you'll see
>> correct zeros, as qcow2 has special metadata which shows that these
>> clusters are zero. But if you read from file directly at returned
>> offset you'll see garbage, so don't do it.
> Correct.
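
The three quoted table rows, read the way described above, can be sketched as two small predicates. This is only an illustration: the helper names are hypothetical, the flag values are stand-ins defined locally for the sketch rather than taken from QEMU's headers, and combinations not listed in the quoted rows are treated conservatively as "do not read from *file".

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for this sketch only (assumption, not copied
 * from QEMU's headers). */
#define BDRV_BLOCK_DATA         0x01u
#define BDRV_BLOCK_ZERO         0x02u
#define BDRV_BLOCK_OFFSET_VALID 0x04u

/*
 * Hypothetical helper: may the caller read *file at the returned offset
 * and get the same bytes as reading from bs itself?
 *   DATA ZERO OFFSET_VALID
 *    t    t        t     -> yes (returned file is zero at offset)
 *    t    f        t     -> yes (valid data in file at offset)
 *    f    t        t     -> no  (bs reads as zero, file may hold garbage)
 * Anything without OFFSET_VALID is answered "no" (conservative assumption).
 */
static inline bool can_read_from_returned_file(uint32_t status)
{
    if (!(status & BDRV_BLOCK_OFFSET_VALID)) {
        return false;
    }
    return (status & BDRV_BLOCK_DATA) != 0;
}

/* Hypothetical helper: does reading from bs itself return zeroes? */
static inline bool reads_as_zero(uint32_t status)
{
    return (status & BDRV_BLOCK_ZERO) != 0;
}
```

Reading from the node itself stays correct in all three rows; only the shortcut through *file depends on DATA being set.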
>> But what if some node returns this combination with file == itself? It
>> actually means that you must not read, but you should call
>> block-status to understand that there are zeros. So, if some format
>> can return this combination with file == itself it means that you must
>> not read it directly, but only after checking block status.
> This doesn't make sense to me. Reading from a node is always correct.
> But you're right that DATA seems to mean something slightly different at
> the protocol level because *file cannot have a meaningful value for the
> lower layer there. In this case, DATA still seems to mean that the data
> is fetched from the lower layer (i.e. the block device on which the file
> system resides). For holes, this is not the case.
>> And file-posix is an example of such a driver. But file-posix holes are
>> guaranteed to read as zero, so we can report DATA | ZERO.
>> But this will break the user experience, which assumes that DATA means
>> occupation of real disk space.
> With the above explanation, DATA shouldn't be set for holes.
> But it's still kind of inconsistent because OFFSET_VALID and the offset
> refer to bs itself and not to the lower layer.
>> ...
>> Let me go and re-read what we've documented in the NBD protocol about
>> block status...
>> "DATA" turns into NBD_STATE_HOLE, which formally means nothing, and just
>> notes that there is probably no disk space occupied for this region.
>> So it's about disk space allocation and says nothing about correctness
>> of read, and NBD_STATE_ZERO guarantees that the region reads as all
>> zeroes.
>> Look at the code in nbd/server.c.. Aha, it calls block_status_above and
>> turns !ALLOCATED into HOLE. Which means that it will never return HOLE
>> for file-posix..
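
A minimal sketch of the mapping described above, not the actual nbd/server.c code. The function name is hypothetical and the BDRV_BLOCK_* values are local stand-ins for the sketch. Only the "!ALLOCATED becomes HOLE" part comes from the text; deriving NBD_STATE_ZERO directly from the ZERO flag is an assumption.

```c
#include <stdint.h>

/* Stand-in block-status flag values for this sketch only (assumption). */
#define BDRV_BLOCK_ZERO      0x02u
#define BDRV_BLOCK_ALLOCATED 0x10u

/* Extent flags of the NBD "base:allocation" metadata context. */
#define NBD_STATE_HOLE (1u << 0)
#define NBD_STATE_ZERO (1u << 1)

/* Hypothetical helper: translate block-status bits into NBD extent flags. */
static inline uint32_t nbd_extent_flags(uint32_t status)
{
    uint32_t flags = 0;

    /* From the discussion: a region that is not ALLOCATED is reported
     * as HOLE. */
    if (!(status & BDRV_BLOCK_ALLOCATED)) {
        flags |= NBD_STATE_HOLE;
    }
    /* Assumption: ZERO is passed through as NBD_STATE_ZERO. */
    if (status & BDRV_BLOCK_ZERO) {
        flags |= NBD_STATE_ZERO;
    }
    return flags;
}
```

With this mapping, a driver that always reports its regions as ALLOCATED can only ever produce 0 or NBD_STATE_ZERO, never NBD_STATE_HOLE, which is the file-posix situation noted above.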
> Hm... This is a mess. :-)

I'm afraid these are all consequences of really different usages of
block status:

1. For backing-chain-aware block jobs, to work with overlays. So backing
children are different from the others, as they represent overlays, which
may be used separately, while other children are not overlays and are
"more owned" by their parents.

2. To understand where the zeroes are and improve the performance of
copying loops (not copying zeroes, not sending zeroes over the wire, using
efficient write_zeroes).

3. To show the file mapping (qemu-img map).

4. Mirror uses it to do DISCARD if "UNALLOCATED", but that seems wrong to
me now. For which driver will bdrv_block_status_above(source, NULL,...)
return UNALLOCATED? Seems neither file-posix nor qcow2.

Anything else?

Maybe we should split these use cases instead of trying to combine them
using a lot of returned flags and parameters?

Best regards,
