[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection
Date: Tue, 20 Apr 2021 19:52:12 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.9.1

20.04.2021 18:04, Kevin Wolf wrote:
Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
15.04.2021 18:22, Kevin Wolf wrote:
In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
like non-zero data if the end of the checked area isn't aligned. This
can improve the efficiency of the conversion and was introduced in
commit 8dcd3c9b91a.

However, it comes with a correctness problem: qemu-img convert is
supposed to sparsify areas that contain only zeros, which it doesn't do
any more. It turns out that this even happens when not only the
unaligned area is zeroed, but also the blocks before and after it. In
the bug report, conversion of a fragmented 10G image containing only
zeros resulted in an image consuming 2.82 GiB even though the expected
size is only 4 KiB.

As a tradeoff between both, let's ignore zeroed sectors only after
non-zero data to fix the alignment, but if we're only looking at zeros,
keep them as such, even if it may mean additional RMW cycles.

Hmm.. If I understand correctly, we are going to do unaligned
write-zero. And that helps.

This can happen (mostly raw images on block devices, I think?), but
usually it just means skipping the write because we know that the target
image is already zeroed.

What it does mean is that if the next part is data, we'll have an
unaligned data write.

Doesn't that mean that alignment is wrongly detected?

The problem is that you can have bdrv_block_status_above() return the
same allocation status multiple times in a row, but *pnum can be
unaligned for the conversion.

We only look at a single range returned by it when detecting the
alignment, so it could be that we have zero buffers for both 0-11 and
12-16 and detect two misaligned ranges, when both together are a
perfectly aligned zeroed range.

In theory we could try to do some lookahead and merge ranges where
possible, which should give us the perfect result, but it would make the
code considerably more complicated. (Whether we want to merge them
doesn't only depend on the block status, but possibly also on the
content of a DATA range.)


Oh, I understand now the problem, thanks for explanation.

Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align 
it down, to be possibly "merged" with next chunk if it is zero too.

But it's still good to align zeroes down, if data starts somewhere inside the 
buf, isn't it?

what about something like this:

diff --git a/qemu-img.c b/qemu-img.c
index babb5573ab..d1704584a0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int 
n, int *pnum,
+ if (i == n) {
+        /*
+         * The whole buf is the same.
+         *
+         * if it's data, just return it. It's the old behavior.
+         *
+         * if it's zero, just return too. It will work good if target is alredy
+         * zeroed. And if next chunk is zero too we'll have no RMW and no 
+         * to write data.
+         */
+        *pnum = i;
+        return !is_zero;
+    }
     tail = (sector_num + i) & (alignment - 1);
     if (tail) {
         if (is_zero && i <= tail) {
-            /* treat unallocated areas which only consist
-             * of a small tail as allocated. */
+            /*
+             * For sure next sector after i is data, and it will rewrite this
+             * tail anyway due to RMW. So, let's just write data now.
+             */
             is_zero = false;
         if (!is_zero) {
-            /* align up end offset of allocated areas. */
+            /* If possible, align up end offset of allocated areas. */
             i += alignment - tail;
             i = MIN(i, n);
         } else {
-            /* align down end offset of zero areas. */
+            /*
+             * For sure next sector after i is data, and it will rewrite this
+             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
+             * to aligned bound.
+             */
             i -= tail;

Best regards,

reply via email to

[Prev in Thread] Current Thread [Next in Thread]