qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
Date: Tue, 28 May 2019 17:47:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 28.05.19 08:39, Vladimir Sementsov-Ogievskiy wrote:
> 27.05.2019 18:13, Max Reitz wrote:
>> On 08.04.19 18:26, Vladimir Sementsov-Ogievskiy wrote:
>>> drv_co_block_status digs bs->file for additional, more accurate search
>>> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
>>>
>>> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
>>> knows, where are holes and where is data. But every block_status
>>> request calls lseek additionally. Assume a big disk, full of
>>> data, in any iterative copying block job (or img convert) we'll call
>>> lseek(HOLE) on every iteration, and each of these lseeks will have to
>>> iterate through all metadata up to the end of file. It's obviously
>>> ineffective behavior. And for many scenarios we don't need this lseek
>>> at all.
>>>
>>> However, lseek is needed when we have metadata-preallocated image.
>>>
>>> So, let's detect metadata-preallocation case and don't dig qcow2's
>>> protocol file in other cases.
>>>
>>> The idea is to compare allocation size in POV of filesystem with
>>> allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
>>> significantly lower, consider it as metadata-preallocation case.
>>>
>>> 102 iotest changed, as our detector can't detect shrinked file as
>>> metadata-preallocation, which don't seem to be wrong, as with metadata
>>> preallocation we always have valid file length.
>>>
>>> Other two iotests tiny changed QMP output sequence, which should be
>>> exactly because skipped lseek at mirror beginning.
>>>
>>> Suggested-by: Denis V. Lunev <address@hidden>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>> ---
>>>   block/qcow2.h              |  4 ++++
>>>   include/block/block.h      |  8 +++++++-
>>>   block/io.c                 |  9 ++++++++-
>>>   block/qcow2-refcount.c     | 32 ++++++++++++++++++++++++++++++++
>>>   block/qcow2.c              | 11 +++++++++++
>>>   tests/qemu-iotests/102     |  2 +-
>>>   tests/qemu-iotests/102.out |  3 ++-
>>>   tests/qemu-iotests/141.out |  2 +-
>>>   tests/qemu-iotests/144.out |  2 +-
>>>   9 files changed, 67 insertions(+), 6 deletions(-)
>>
>> For me, this patch breaks iotests 141 (for qed) and 211 (for vdi):
>>
>>> 141 1s ...        [17:11:53] [17:11:53] - output mismatch (see 141.out.bad)
>>> --- tests/qemu-iotests/141.out 2019-05-27 17:11:43.327664282 +0200
>>> +++ build/tests/qemu-iotests/141.out.bad       2019-05-27 
>>> 17:11:53.949439880 +0200
>>> @@ -42,9 +42,9 @@
>>>   {"return": {}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
>>> "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
>>> "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
>>> -{"return": {}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
>>> "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
>>> "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 
>>> 0, "speed": 0, "type": "commit"}}
>>> +{"return": {}}
>>>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block 
>>> device is in use by block job: commit"}}
>>>   {"return": {}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
>>> "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
>>
>> and
>>
>>> 211 5s ...        [17:11:54] [17:11:58] - output mismatch (see 211.out.bad)
>>> --- tests/qemu-iotests/211.out 2019-05-22 19:58:34.870273427 +0200
>>> +++ build/tests/qemu-iotests/211.out.bad       2019-05-27 
>>> 17:11:58.259348827 +0200
>>> @@ -55,8 +55,7 @@
>>>   virtual size: 32 MiB (33554432 bytes)
>>>   cluster_size: 1048576
>>>   
>>> -[{ "start": 0, "length": 3072, "depth": 0, "zero": false, "data": true, 
>>> "offset": 1024},
>>> -{ "start": 3072, "length": 33551360, "depth": 0, "zero": true, "data": 
>>> true, "offset": 4096}]
>>> +[{ "start": 0, "length": 33554432, "depth": 0, "zero": false, "data": 
>>> true, "offset": 1024}]
>>>   
>>>   === Invalid BlockdevRef ===
>>
>> Doesn’t look too bad, but still, broken iotests are broken iotests. :/
>>
> 
> 
> You are right, thanks for pointing to this! So, here is my investigation:
> 
> about 211: aha, looks like I just didn't implement metadata preallocation 
> detection for vdi,
> and default behavior is changed. I don't know vdi code, but after fast look, 
> it seems that
> it may be as simple as (and it fixes the test):
> 
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index d7ef6628e7..a72333dcf4 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -542,7 +542,9 @@ static int coroutine_fn 
> vdi_co_block_status(BlockDriverState *bs,
>       *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
>           index_in_block;
>       *file = bs->file->bs;
> -    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +
> +    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID |
> +            (s->header.image_type == VDI_TYPE_STATIC ? BDRV_BLOCK_RECURSE : 
> 0);
>   }

That would be the same as what this patch implements for qcow2, yes.

> Or, we can rollback vdi behavior to pre-patch state:
> 
> 
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index d7ef6628e7..074256d845 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -542,7 +542,7 @@ static int coroutine_fn 
> vdi_co_block_status(BlockDriverState *bs,
>       *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
>           index_in_block;
>       *file = bs->file->bs;
> -    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_RECURSE;
>   }
> 
>   static int coroutine_fn
> 
> 
> 
> 
> about 141: aha, it's a difference between qcow2 and qed, because of changed 
> io sequence for
> qcow2, as Kevin described "What happens is that 
> qcow2_detect_metadata_preallocation() causes
> additional I/O by reading in the refcount block.". I think correct way to fix 
> it is to filter
> {"return": {}} elements, as we can't be sure where they occur among events:
> 
> 
> 
> 
> diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
> index 2197a82d45..2704695921 100755
> --- a/tests/qemu-iotests/141
> +++ b/tests/qemu-iotests/141
> @@ -58,16 +58,20 @@ test_blockjob()
>                 }}}" \
>           'return'
> 
> +    # We may get {"return": {}} result of the following command among events 
> in
> +    # the following output or after all events in the next output (if $2 is 
> an
> +    # event, not 'return'). So, filter it here and in the following output to
> +    # avoid race in test output.
>       _send_qemu_cmd $QEMU_HANDLE \
>           "$1" \
>           "$2" \
> -        | _filter_img_create
> +        | _filter_img_create | _filter_qmp_empty_return
> 
>       # We want this to return an error because the block job is still running
>       _send_qemu_cmd $QEMU_HANDLE \
>           "{'execute': 'blockdev-del',
>             'arguments': {'node-name': 'drv0'}}" \
> -        'error' | _filter_generated_node_ids
> +        'error' | _filter_generated_node_ids | _filter_qmp_empty_return
> 
>       _send_qemu_cmd $QEMU_HANDLE \
>           "{'execute': 'block-job-cancel',
> diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
> index 4d71d9dcae..dbd3bdef6c 100644
> --- a/tests/qemu-iotests/141.out
> +++ b/tests/qemu-iotests/141.out
> @@ -10,7 +10,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/m.
>   Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> @@ -27,7 +26,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/t.
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 
> 0, "type": "mirror"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block 
> device is in use by block job: mirror"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> @@ -42,7 +40,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/t.
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 
> 0, "type": "commit"}}
>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block 
> device is in use by block job: commit"}}
> @@ -61,7 +58,6 @@ wrote 1048576/1048576 bytes at offset 0
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> @@ -77,7 +73,6 @@ wrote 1048576/1048576 bytes at offset 0
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> diff --git a/tests/qemu-iotests/common.filter 
> b/tests/qemu-iotests/common.filter
> index 35fddc746f..8e9235d6fe 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -219,5 +219,10 @@ _filter_nbd()
>           -e 's#\(foo\|PORT/\?\|.sock\): Failed to .*$#\1#'
>   }
> 
> +_filter_qmp_empty_return()
> +{
> +    grep -v '{"return": {}}'
> +}
> +
>   # make sure this script returns success
>   true

I’m not really a fan of this, but I don’t think there is an alternative
for a bash test.  So that does look like the best solution for me.

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]