qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unalign


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
Date: Fri, 28 Apr 2017 22:48:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.0

On 27.04.2017 03:46, Eric Blake wrote:
> We've already improved discards to operate efficiently on the tail
> of an unaligned qcow2 image; it's time to make a similar improvement
> to write zeroes.  The special case is only valid at the tail
> cluster of a file, where we must recognize that any sectors beyond
> the image end would implicitly read as zero, and therefore should
> not penalize our logic for widening a partial cluster into writing
> the whole cluster as zero.
> 
> Update test 154 to cover the new scenarios, using two images of
> intentionally differing length.
> 
> While at it, fix the test to gracefully skip when run as
> ./check -qcow2 -o compat=0.10 154
> since the older format lacks zero clusters already required earlier
> in the test.
> 
> Signed-off-by: Eric Blake <address@hidden>
> 
> ---
> v10: rebase to better reporting of preallocated zero clusters
> v9: new patch
> ---
>  block/qcow2.c              |   7 +++
>  tests/qemu-iotests/154     | 112 
> ++++++++++++++++++++++++++++++++++++++++++++-
>  tests/qemu-iotests/154.out |  83 +++++++++++++++++++++++++++++++++
>  3 files changed, 200 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 5c1573c..8038793 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, 
> int64_t start,
>      BlockDriverState *file;
>      int64_t res;
> 
> +    if (start + count > bs->total_sectors) {
> +        count = bs->total_sectors - start;
> +    }
> +
>      if (!count) {
>          return true;
>      }
> @@ -2467,6 +2471,9 @@ static coroutine_fn int 
> qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>      uint32_t tail = (offset + count) % s->cluster_size;
> 
>      trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, 
> count);
> +    if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
> +        tail = 0;
> +    }
> 
>      if (head || tail) {
>          int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
> diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
> index 7ca7219..005f09f 100755
> --- a/tests/qemu-iotests/154
> +++ b/tests/qemu-iotests/154
> @@ -2,7 +2,7 @@
>  #
>  # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 
> 034)
>  #
> -# Copyright (C) 2016 Red Hat, Inc.
> +# Copyright (C) 2016-2017 Red Hat, Inc.
>  #
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -42,7 +42,10 @@ _supported_proto file
>  _supported_os Linux
> 
>  CLUSTER_SIZE=4k
> -size=128M
> +size=$((128 * 1024 * 1024))
> +
> +# This test requires zero clusters, added in v3 images
> +_unsupported_imgopts compat=0.10
> 
>  echo
>  echo == backing file contains zeros ==
> @@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | 
> _filter_qemu_io
> 
>  $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> 
> +echo
> +echo == unaligned image tail cluster, no allocation needed ==
> +
> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))

Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
actually think that would be a better test because as it is, the image's
"unaligned" tail is exactly one cluster (so it isn't really unaligned).

> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# With no backing file, write to all or part of unallocated partial cluster
> +
> +# Backing file: 128m: ... | 00 --

Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
so this is just a single cluster; which will be converted from
unallocated to a zero cluster because the tail reads as zero.

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | -- 00

Same here, but now it's the head that reads as zero (and it's already a
zero cluster so nothing changes here).

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

But I suppose you mean $((size + 512))?

> +
> +# Backing file: 128m: ... | 00 00

And finally this doesn't change anything either -- but then again this
is a fully aligned request, so I don't quite see the point in testing
this here.

> +$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# No offset should be allocated, although the cluster itself is considered
> +# allocated due to the zero flag
> +$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
> +
> +# With backing file that reads as zero, write to all or part of entire 
> cluster
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 00 00
> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 -- --

How so? Shouldn't this just stay a zero cluster because the rest of the
cluster already reads as zero?

> +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | -- -- 00 00

Same here.

> +$QEMU_IO -c "write -z $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | -- 00 00 --

Same here.

Also, I'm not quite sure what you are testing by just issuing three
consecutive write requests without checking the result each time. You
assume it to be something which you wrote in the comments above each,
but you can't be sure (yes, if something goes wrong in between, the
following final map should catch it, but it's not for certain).

> +$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
> +
> +# No offset should be allocated, although the cluster itself is considered
> +# allocated due to the zero flag
> +$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +# Preallocated cluster is not freed when unmap is not requested
> +
> +# Backing file: 128m: ... | XX -- => ... | XX 00
> +$QEMU_IO -c "write $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | XX 00 => ... | 00 00
> +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map |
> +    sed 's/"offset": [0-9]*/"offset": OFFSET/g'
> +
> +echo
> +echo == unaligned image tail cluster, allocation required ==
> +
> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# Write beyond backing file must COW
> +# Backing file: 128m: ... | XX --

Here, the "--" is an unallocated cluster...

> +# Active layer: 128m: ... | -- -- 00 --

...while here it is normally allocated data. Or is "--" supposed to mean
you just don't write to that area...?

> +
> +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# Writes at boundaries of (partial) cluster must not lose mid-cluster data
> +# Backing file: 128m: ... | -- XX
> +# Active layer: 128m: ... | 00 -- -- 00
> +
> +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | 
> _filter_qemu_io
> +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> 
> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io

[1]

> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# Partial write must not lose data
> +# Backing file: 128m: ... | -- --
> +# Active layer: 128m: ... | -- -- XX 00

Well, you have basically tested this in [1] already. After the first
write -z to the active layer, the final four sectors are fully allocated
(as they are in a single cluster) and look like this:

00 01 00 00 = XX XX XX XX

(Because the 00s are just written as data.)

Now the write -z in [1] just overwrites the last of those four sectors
with zero data (albeit it being zero already).

But if you want to test it again, I'm not stopping you. :-)

Max

> +
> +$QEMU_IO -c "write -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
>  # success, all done
>  echo "*** done"
>  rm -f $seq.full
> diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
> index da9eabd..f8a09af 100644
> --- a/tests/qemu-iotests/154.out
> +++ b/tests/qemu-iotests/154.out
> @@ -282,4 +282,87 @@ read 1024/1024 bytes at offset 76800
>  { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
>  { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, 
> "offset": 24576},
>  { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": 
> false}]
> +
> +== unaligned image tail cluster, no allocation needed ==
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
> backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +1024/1024 bytes allocated at offset 128 MiB
> +[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}]
> +wrote 2048/2048 bytes at offset 134217728
> +2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": 
> false}]
> +wrote 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134218752
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134218240
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +2048/2048 bytes allocated at offset 128 MiB
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": 
> false}]
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false},
> +{ "start": 134217728, "length": 1024, "depth": 0, "zero": true, "data": 
> false, "offset": OFFSET}]
> +
> +== unaligned image tail cluster, allocation required ==
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
> backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134218752
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1536/1536 bytes at offset 134218240
> +1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": 
> true, "offset": 20480}]
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
> backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1024/1024 bytes at offset 134218752
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134219264
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1024/1024 bytes at offset 134218752
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": 
> true, "offset": 20480}]
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
> backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134218752
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134219264
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134218752
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134219264
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": 
> true, "offset": 20480}]
>  *** done
> 


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]