Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unalign

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unalign

From:	Eric Blake
Subject:	Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
Date:	Fri, 28 Apr 2017 16:24:34 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.0

On 04/28/2017 03:48 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We've already improved discards to operate efficiently on the tail
>> of an unaligned qcow2 image; it's time to make a similar improvement
>> to write zeroes.  The special case is only valid at the tail
>> cluster of a file, where we must recognize that any sectors beyond
>> the image end would implicitly read as zero, and therefore should
>> not penalize our logic for widening a partial cluster into writing
>> the whole cluster as zero.
>>
>> Update test 154 to cover the new scenarios, using two images of
>> intentionally differing length.
>>
>> While at it, fix the test to gracefully skip when run as
>> ./check -qcow2 -o compat=0.10 154
>> since the older format lacks zero clusters already required earlier
>> in the test.
>>
>> Signed-off-by: Eric Blake <address@hidden>
>>

>>
>> +echo
>> +echo == unaligned image tail cluster, no allocation needed ==
>> +
>> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> 
> Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
> actually think that would be a better test because as it is, the image's
> "unaligned" tail is exactly one cluster (so it isn't really unaligned).

Uhhh - copy-and-paste?  You're right, that 1024 is too small for what
I'm actually doing with it.  :(

> 
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# With no backing file, write to all or part of unallocated partial cluster
>> +
>> +# Backing file: 128m: ... | 00 --
> 
> Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
> so this is just a single cluster; which will be converted from
> unallocated to a zero cluster because the tail reads as zero.
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

I'm doing a write of 512 bytes, so I was trying to show that in the
cluster starting at 128m, we have a write request for one sector to be
written to 00, while the rest of the sector is left uninitialized. The
obvious intent is that we note that the uninitialized portion reads as
zero, so we can optimize the entire cluster (even though it is a partial
cluster) to be a cluster write-zero instead of a sector-based allocating
write.  But since you've pointed out that my setup is flawed, I'll have
to double-check that I'm actually testing what I thought I was (I know I
hit the right breakpoints in gdb, but its the iotest expected output
that needs to show the difference between pre- and post-patched qemu
with regards to the partial-tail-cluster optimizations).

>> +
>> +# Backing file: 128m: ... | -- 00
> 
> Same here, but now it's the head that reads as zero (and it's already a
> zero cluster so nothing changes here).
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> 
> But I suppose you mean $((size + 512))?

Umm. Yeah.  (Hey - you're not the only one that writes patches late at
night).

> 
>> +
>> +# Backing file: 128m: ... | 00 00
> 
> And finally this doesn't change anything either -- but then again this
> is a fully aligned request, so I don't quite see the point in testing
> this here.

At one point during my development, I had a bug where a partial write
request at the head of the cluster behaved differently from a partial
write request at the end (one allocated while the other did not).

Maybe it's best if I do a single write, then inspect the map, then reset
the image, rather than doing all three writes in a row and proving that
the net result of the three writes had no allocation.  In v9, when I was
trying to use unallocated clusters as a shortcut on files with no
backing image, this actually worked (after all three writes, the cluster
would still be unallocated); but since Kevin convinced me that v10 has
to set the zero flag on all three writes, I'm back to the point of
needing to clear the zero flag between writes.

Okay, I'll definitely have to fix it.

>> +# With backing file that reads as zero, write to all or part of entire 
>> cluster
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 00 00
>> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 -- --
> 
> How so? Shouldn't this just stay a zero cluster because the rest of the
> cluster already reads as zero?

Same problem as above.  In v9, I was trying to exploit code that left a
cluster unallocated; but that's not viable, so I'll have to reset the
image between steps.

> 
> Also, I'm not quite sure what you are testing by just issuing three
> consecutive write requests without checking the result each time. You
> assume it to be something which you wrote in the comments above each,
> but you can't be sure (yes, if something goes wrong in between, the
> following final map should catch it, but it's not for certain).

Well, it helped me during development, but you're absolutely right that
we can't be certain in the long-run.  3 separate checks, with resets
between, is indeed the way to go.

>> +echo
>> +echo == unaligned image tail cluster, allocation required ==
>> +
>> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))

Here, my backing cluster size is intentionally small, so that
bdrv_get_block_status() can easily return sane results for the backing
file portions, so only the active layer has a partial cluster. But
making it obvious in the comments can't hurt, given that my earlier
tests were invalidated when my size choice didn't actually match with
the operations I was attempting.

>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# Write beyond backing file must COW
>> +# Backing file: 128m: ... | XX --
> 
> Here, the "--" is an unallocated cluster...
> 
>> +# Active layer: 128m: ... | -- -- 00 --
> 
> ...while here it is normally allocated data. Or is "--" supposed to mean
> you just don't write to that area...?

If we had subclusters, then it would be unallocated data.  But we don't.
 Since I just reset the image, I'm doing a write onto an unallocated
cluster in the current layer; the write request is to a subset of the
cluster, and the code checking whether the rest of the cluster reads as
zeros will see that a COW (and corresponding allocation) is required.

> 
>> +
>> +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
>> +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# Writes at boundaries of (partial) cluster must not lose mid-cluster data
>> +# Backing file: 128m: ... | -- XX
>> +# Active layer: 128m: ... | 00 -- -- 00
>> +
>> +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | 
>> _filter_qemu_io
>> +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> 
>> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
> 
> [1]
> 
>> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# Partial write must not lose data
>> +# Backing file: 128m: ... | -- --
>> +# Active layer: 128m: ... | -- -- XX 00
> 
> Well, you have basically tested this in [1] already. After the first
> write -z to the active layer, the final four sectors are fully allocated
> (as they are in a single cluster) and look like this:
> 
> 00 01 00 00 = XX XX XX XX
> 
> (Because the 00s are just written as data.)
> 
> Now the write -z in [1] just overwrites the last of those four sectors
> with zero data (albeit it being zero already).
> 
> But if you want to test it again, I'm not stopping you. :-)
> 

Well, you've already proven that I need to revisit this test with a
fine-tooth comb, especially if I split the series to do the blkdebug
stuff separately from the qcow2 stuff.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-block] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length, (continued)
- [Qemu-block] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting, Eric Blake, 2017/04/26
  - Re: [Qemu-block] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting, Max Reitz, 2017/04/28
    - Re: [Qemu-block] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting, Eric Blake, 2017/04/28
- [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster, Eric Blake, 2017/04/26
  - Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster, Max Reitz, 2017/04/28
    - Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster, Eric Blake <=
- [Qemu-block] [PATCH v10 12/17] blkdebug: Sanity check block layer guarantees, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 13/17] blkdebug: Refactor error injection, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 15/17] blkdebug: Simplify override logic, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 14/17] blkdebug: Add pass-through write_zero and discard support, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 16/17] blkdebug: Add ability to override unmap geometries, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count, Eric Blake, 2017/04/26
- [Qemu-block] [PATCH v10 17/17] tests: Add coverage for recent block geometry fixes, Eric Blake, 2017/04/26

Prev by Date: Re: [Qemu-block] RFC: some problems with bdrv_get_block_status
Next by Date: Re: [Qemu-block] [Qemu-devel] [PULL 14/34] iotests: Launch qemu-nbd with -e 42
Previous by thread: Re: [Qemu-block] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
Next by thread: [Qemu-block] [PATCH v10 12/17] blkdebug: Sanity check block layer guarantees
Index(es):
- Date
- Thread