qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Tue, 19 Nov 2019 12:49:58 +0000

19.11.2019 15:34, Vladimir Sementsov-Ogievskiy wrote:
> 19.11.2019 15:32, Vladimir Sementsov-Ogievskiy wrote:
>> 19.11.2019 15:17, Vladimir Sementsov-Ogievskiy wrote:
>>> 19.11.2019 15:05, Kevin Wolf wrote:
>>>> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> Hi all!
>>>>>
>>>>> I wanted to understand, what is the real difference between
>>>>> bdrv_block_status_above and bdrv_is_allocated_above, IMHO
>>>>> bdrv_is_allocated_above should work through bdrv_block_status_above..
>>>>>
>>>>> And I found the problem: bdrv_is_allocated_above considers space after
>>>>> EOF as UNALLOCATED for intermediate nodes..
>>>>>
>>>>> UNALLOCATED is not about allocation at fs level, but about should we
>>>>> go to backing or not.. And it seems incorrect for me, as in case of
>>>>> short backing file, we'll read zeroes after EOF, instead of going
>>>>> further by backing chain.
>>>>
>>>> We actually have documentation what it means:
>>>>
>>>>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>>>>   *                       layer rather than any backing, set by block layer
>>>>
>>>> Say we have a short overlay, like this:
>>>>
>>>> base.qcow2:     AAAAAAAA
>>>> overlay.qcow2:  BBBB
>>>>
>>>> Then of course the content of block 5 (the one after EOF of
>>>> overlay.qcow2) is still determined by overlay.qcow2, which can be easily
>>>> verified by reading it from overlay.qcow2 (produces zeros) and from
>>>> base.qcow2 (produces As).
>>>>
>>>> So the correct result when querying the block status of block 5 on
>>>> overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
>>>>
>>>> Interestingly, we already fixed the opposite case (large overlay over
>>>> short backing file) in commit e88ae2264d9 from May 2014 according to the
>>>> same logic.
>>>>
>>>>> This leads to the following effect:
>>>>>
>>>>> ./qemu-img create -f qcow2 base.qcow2 2M
>>>>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
>>>>>
>>>>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
>>>>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
>>>>>
>>>>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
>>>>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
>>>>> read 1048576/1048576 bytes at offset 1048576
>>>>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
>>>>>
>>>>> But after commit guest visible state is changed, which seems wrong for me:
>>>>> ./qemu-img commit top.qcow2 -b mid.qcow2
>>>>>
>>>>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
>>>>> Pattern verification failed at offset 1048576, 1048576 bytes
>>>>> read 1048576/1048576 bytes at offset 1048576
>>>>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
>>>>>
>>>>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
>>>>> read 1048576/1048576 bytes at offset 1048576
>>>>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
>>>>>
>>>>>
>>>>> I don't know, is it a real bug, as I don't know, do we support backing
>>>>> file larger than its parent. Still, I'm not sure that this behavior of
>>>>> bdrv_is_allocated_above don't lead to other problems.
>>>>
>>>> I agree it's a bug.
>>>>
>>>> Your fix doesn't look right to me, though. You leave the buggy behaviour
>>>> of bdrv_co_block_status() as it is and then add four patches to work
>>>> around it in some (but not all) callers of it.

The only caller is bdrv_co_block_status_above.

>>>>
>>>> All that it should take to fix this is making the bs->backing check
>>>> independent from want_zero and let it set ALLOCATED. What I expected
>>>> would be something like the below patch.
>>>>
>>>> But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in
>>>> qemu-io shows that the range is now considered allocated), so probably
>>>> there is still a separate bug in bdrv_is_allocated_above().
>>>>
>>>> And I think we'll want an iotests case for both cases (short overlay,
>>>> short backing file).
>>>>
>>>> Kevin
>>>>
>>>>
>>>> diff --git a/block/io.c b/block/io.c
>>>> index f75777f5ea..5eafcff01a 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>> @@ -2359,16 +2359,17 @@ static int coroutine_fn 
>>>> bdrv_co_block_status(BlockDriverState *bs,
>>>>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>>>>           ret |= BDRV_BLOCK_ALLOCATED;
>>>> -    } else if (want_zero) {
>>>> -        if (bdrv_unallocated_blocks_are_zero(bs)) {
>>>> -            ret |= BDRV_BLOCK_ZERO;
>>>> -        } else if (bs->backing) {
>>>> -            BlockDriverState *bs2 = bs->backing->bs;
>>>> -            int64_t size2 = bdrv_getlength(bs2);
>>>> -
>>>> -            if (size2 >= 0 && offset >= size2) {
>>>> +    } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) {
>>>> +        ret |= BDRV_BLOCK_ZERO;
>>>> +    } else if (bs->backing) {
>>>> +        BlockDriverState *bs2 = bs->backing->bs;
>>>> +        int64_t size2 = bdrv_getlength(bs2);
>>>> +
>>>> +        if (size2 >= 0 && offset >= size2) {
>>>> +            if (want_zero) {
>>>>                   ret |= BDRV_BLOCK_ZERO;
>>>>               }
>>>> +            ret |= BDRV_BLOCK_ALLOCATED;
>>>
>>> No, I think it's wrong, as it's not allocated at bs, but at bs->backing->bs.
>>>
>>>
>>> So, bdrv_co_block_status works correct, what we can change about it, is not
>>> to return pnum=0 if requested range after eof, but return pnum=bytes, 
>>> together
>>> with BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO | BDRV_BLOCK_EOF.
>>
>> But this will break users which rely on pnum=0.
> 
> Possibly, we may add flag to bdrv_co_block_status, to behave in such manner, 
> and
> set it to true when call from bdrv_block_status_above/bdrv_is_allocated_above.
> 
>>
>>>
>>> And actual problem is in bdrv_block_status_above and 
>>> bdrv_is_allocated_above, which
>>> I'm fixing.
>>>
>>>
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]