Re: [PATCH v2 1/4] qemu-img: implement compare --stat

qemu-block


From:	Hanna Reitz
Subject:	Re: [PATCH v2 1/4] qemu-img: implement compare --stat
Date:	Tue, 26 Oct 2021 10:47:47 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0

On 26.10.21 09:53, Vladimir Sementsov-Ogievskiy wrote:

25.10.2021 19:40, Hanna Reitz wrote:

On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:

With the new option, qemu-img compare will not stop at the first mismatch,
but will instead calculate statistics: how many clusters have different
data, how many clusters have equal data, how many clusters were unallocated
but became data, and so on.

We compare images chunk by chunk. The chunk size depends on what
block_status returns for both images. It may return less than a cluster
(remember qcow2 subclusters), or more than a cluster (if several
consecutive clusters share the same status). Finally, the images may have
different cluster sizes. All this leads to ambiguity in how exactly to
compare the data.

What we can say for sure is that, when we compare two qcow2 images with
the same cluster size, we should compare data clusters separately.
Otherwise, if we compare, for example, 10 consecutive clusters of data
where only one byte differs, we'll report 10 different clusters. The
expected result in this case is 1 different cluster and 9 equal ones.
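To make the 10-cluster example concrete, here is a small standalone sketch (not the patch's code) that compares two buffers one cluster at a time with memcmp() and counts equal and different clusters. The helper name count_clusters() is hypothetical. A single memcmp() over the whole 10-cluster range could only say "different", while the per-cluster loop attributes the mismatch to exactly one cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: compare two equally sized buffers cluster by
 * cluster and count how many clusters are equal and how many differ.
 * A trailing partial cluster is compared as a shorter chunk.
 */
static void count_clusters(const uint8_t *a, const uint8_t *b, size_t len,
                           size_t cluster_size,
                           size_t *n_equal, size_t *n_diff)
{
    assert(cluster_size > 0);
    *n_equal = 0;
    *n_diff = 0;
    for (size_t off = 0; off < len; off += cluster_size) {
        size_t n = len - off < cluster_size ? len - off : cluster_size;
        if (memcmp(a + off, b + off, n) == 0) {
            (*n_equal)++;
        } else {
            (*n_diff)++;
        }
    }
}
```

For example, with two zero-filled 40-byte buffers, a (toy) cluster size of 4 and one byte changed at offset 13, count_clusters() reports 9 equal clusters and 1 different cluster.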

So, to serve this case, and just to have some defined rule, let's do the
following:

1. Select a block size for the compare procedure. In this commit it must
    be specified by the user; the next commit will add some automatic
    logic and make --block-size optional.

2. Go chunk by chunk using block_status as we do now, with only one
    difference:
    If block_status() returns a DATA region that intersects a
    block-size-aligned boundary, crop the region at that boundary.
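The cropping rule in step 2 can be sketched as a small function. This is an illustration under stated assumptions, not the patch's implementation: crop_chunk() is a hypothetical name, `bytes` is assumed to already be the chunk length agreed for both images, and `is_data` is assumed to tell whether the chunk is a DATA region. Non-DATA regions are returned unchanged, so large unallocated or zero ranges are still handled in one step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the current offset and the length of the
 * chunk reported by block status, return how many bytes to compare now.
 * DATA chunks are cropped at the next block_size-aligned boundary, so
 * at most one block of data is compared at a time; a chunk that starts
 * mid-block (e.g. at a subcluster) is cropped at the end of that block.
 */
static int64_t crop_chunk(int64_t offset, int64_t bytes, bool is_data,
                          int64_t block_size)
{
    assert(offset >= 0 && bytes > 0 && block_size > 0);
    if (!is_data) {
        return bytes;
    }
    /* Distance from offset to the next multiple of block_size. */
    int64_t to_boundary = block_size - offset % block_size;
    return bytes < to_boundary ? bytes : to_boundary;
}
```

With a 64 KiB block size, a 640 KiB DATA chunk starting at offset 0 is cropped to 64 KiB, one starting at offset 4096 is cropped to 61440 bytes, and a 2 KiB subcluster-sized DATA chunk is left as is.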

This way it's still possible to compare less than a cluster and report
subcluster-level accuracy, but we never compare more than one cluster
of data.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
  docs/tools/qemu-img.rst |  18 +++-
  qemu-img.c              | 206 +++++++++++++++++++++++++++++++++++++---
  qemu-img-cmds.hx        |   4 +-
  3 files changed, 212 insertions(+), 16 deletions(-)


Looks good to me overall!  Just some technical comments below.

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index d58980aef8..21164253d4 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -159,6 +159,18 @@ Parameters to compare subcommand:
    Strict mode - fail on different image size or sector allocation
+.. option:: --stat
+
+  Instead of exit on first mismatch compare the whole images and print
+  statistics on amount of different pairs of clusters, based on their
+  block-status and are they equal or not.


I’d phrase this as:

Instead of exiting on the first mismatch, compare the whole images and print statistics on how much they differ in terms of block status (i.e. are blocks allocated or not, do they contain data, are they marked as containing only zeroes) and block content (a block of data that contains only zeroes still has the same content as a marked-zero block).

To me, the rest starting from "and block content" sounds unclear; it doesn't seem to add any information to the previous part (i.e. are blocks allocated ...).

By “block content” I meant what you said by “equal or not”, i.e. what is returned when reading from the block.

Now that I think about it again, I believe we should go with your original “equal or not”, though, because that reflects what qemu-img --stat prints, like so perhaps:

Instead of exiting on the first mismatch, compare the whole images and print statistics on the amount of different pairs of blocks, based on their block status and whether they are equal or not.

I’d still like to add something like what I had in parentheses, though, because as a user, I’d find the “block status” and “equal or not” terms to be a bit handwave-y. I don’t think “block status” is a common term in our documentation, so I wanted to add some examples; and I wanted to show by example that “equal” blocks don’t need to have the same block status.


[...]

@@ -1304,6 +1306,107 @@ static int check_empty_sectors(BlockBackend *blk, int64_t offset,
      return 0;
  }
+#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
+                             BDRV_BLOCK_ALLOCATED)
+#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)
+
+typedef struct ImgCmpStat {
+    /* stat: [ret: 0 is equal, 1 is not][status1][status2] -> n_bytes */
+    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];

`IMG_CMP_STATUS_MAX` isn’t packed tightly because it only has four bits set (0x33). That in itself isn’t a problem, but it means that `IMG_CMP_STATUS_MAX + 1` is 52, and so this array’s size is 52 * 52 * 2 * sizeof(uint64_t) = 43264. Again, that isn’t a problem in itself (although it is a bit sad that this could fit into 16 * 16 * 2 * 8 = 4 kB), but in `img_compare()` [1], you put this structure on the stack, and I believe it’s too big for that.
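The size arithmetic above can be checked at compile time. This sketch copies the four BDRV_BLOCK_* flag values locally so it builds without QEMU; the values are assumed to match QEMU's block-layer headers (they reproduce the 0x33 quoted above). SparseStat mirrors the patch's layout, and DenseStat is a hypothetical name for the 4-bits-per-status layout.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Local copies of the block-status flags (assumed QEMU values). */
#define BDRV_BLOCK_DATA      0x01
#define BDRV_BLOCK_ZERO      0x02
#define BDRV_BLOCK_ALLOCATED 0x10
#define BDRV_BLOCK_EOF       0x20

#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
                             BDRV_BLOCK_ALLOCATED)
#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)

/* Patch layout: indexed directly by the raw (sparse) flag values. */
typedef struct SparseStat {
    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];
} SparseStat;

/* Hypothetical dense layout: four status bits give indices 0..15. */
typedef struct DenseStat {
    uint64_t stat[2][16][16];
} DenseStat;

static_assert(IMG_CMP_STATUS_MAX == 0x33, "four bits set");
static_assert(IMG_CMP_STATUS_MAX + 1 == 52, "sparse dimension");
static_assert(sizeof(SparseStat) == 43264, "52 * 52 * 2 * 8 bytes");
static_assert(sizeof(DenseStat) == 4096, "16 * 16 * 2 * 8 bytes");
```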
Hmm. Maybe it's better to just use GHashTables and not bother with these sparse arrays.

Or we could use our own bits here (ALLOCATED = (1 << 2), EOF = (1 << 3)) and have a small function that translates BDRV_BLOCK_* values into them.
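A minimal sketch of such a translation function, assuming DATA and ZERO keep bits 0 and 1 while ALLOCATED and EOF move to bits 2 and 3 as suggested. The IMG_CMP_* names and img_cmp_status() are hypothetical, and the BDRV_BLOCK_* values are local copies assumed to match QEMU's headers. Any other status bits are simply ignored, so the result always fits a 16-entry dimension.

```c
#include <assert.h>

/* Local copies of the block-status flags (assumed QEMU values). */
#define BDRV_BLOCK_DATA      0x01
#define BDRV_BLOCK_ZERO      0x02
#define BDRV_BLOCK_ALLOCATED 0x10
#define BDRV_BLOCK_EOF       0x20

/* Hypothetical compact bits for the statistics table. */
#define IMG_CMP_DATA         (1 << 0)
#define IMG_CMP_ZERO         (1 << 1)
#define IMG_CMP_ALLOCATED    (1 << 2)
#define IMG_CMP_EOF          (1 << 3)
#define IMG_CMP_STATUS_COUNT 16   /* 4 bits -> indices 0..15 */

/* Map a BDRV_BLOCK_* status word to a dense index in 0..15. */
static unsigned img_cmp_status(int bdrv_status)
{
    unsigned s = 0;

    if (bdrv_status & BDRV_BLOCK_DATA) {
        s |= IMG_CMP_DATA;
    }
    if (bdrv_status & BDRV_BLOCK_ZERO) {
        s |= IMG_CMP_ZERO;
    }
    if (bdrv_status & BDRV_BLOCK_ALLOCATED) {
        s |= IMG_CMP_ALLOCATED;
    }
    if (bdrv_status & BDRV_BLOCK_EOF) {
        s |= IMG_CMP_EOF;
    }
    assert(s < IMG_CMP_STATUS_COUNT);
    return s;
}
```

With this mapping, the statistics array can be declared as `uint64_t stat[2][16][16]`, i.e. the 4 kB variant mentioned above.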

In any case, I don’t mind the sparseness too much, it’s just that it shouldn’t go on the stack.
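Keeping the sparse layout but moving it off the stack could look like the sketch below. The helper names img_cmp_stat_new() and img_cmp_stat_add() are hypothetical; calloc() is used so the example stays self-contained, whereas QEMU code would more typically use GLib's g_new0() and g_free(). IMG_CMP_STATUS_MAX is hard-coded to the 0x33 value discussed above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IMG_CMP_STATUS_MAX 0x33   /* value quoted in the review */

typedef struct ImgCmpStat {
    /* stat: [0 = equal, 1 = different][status1][status2] -> n_bytes */
    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];
} ImgCmpStat;

/* Allocate a zeroed table on the heap (about 42 KiB); NULL on failure. */
static ImgCmpStat *img_cmp_stat_new(void)
{
    return calloc(1, sizeof(ImgCmpStat));
}

/* Account `bytes` compared under the given pair of statuses. */
static void img_cmp_stat_add(ImgCmpStat *s, int different,
                             unsigned status1, unsigned status2,
                             uint64_t bytes)
{
    assert(status1 <= IMG_CMP_STATUS_MAX && status2 <= IMG_CMP_STATUS_MAX);
    s->stat[different ? 1 : 0][status1][status2] += bytes;
}
```

The caller owns the table and releases it with free() once the statistics have been printed.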


Hanna


Current Thread


[PATCH v2 0/4] qemu-img compare --stat, Vladimir Sementsov-Ogievskiy, 2021/10/21
- [PATCH v2 1/4] qemu-img: implement compare --stat, Vladimir Sementsov-Ogievskiy, 2021/10/21
  - Re: [PATCH v2 1/4] qemu-img: implement compare --stat, Hanna Reitz, 2021/10/25
    - Re: [PATCH v2 1/4] qemu-img: implement compare --stat, Vladimir Sementsov-Ogievskiy, 2021/10/26
    - Re: [PATCH v2 1/4] qemu-img: implement compare --stat, Hanna Reitz <=
    - Re: [PATCH v2 1/4] qemu-img: implement compare --stat, Vladimir Sementsov-Ogievskiy, 2021/10/26
    - Re: [PATCH v2 1/4] qemu-img: implement compare --stat, Hanna Reitz, 2021/10/27
- [PATCH v2 2/4] qemu-img: make --block-size optional for compare --stat, Vladimir Sementsov-Ogievskiy, 2021/10/21
  - Re: [PATCH v2 2/4] qemu-img: make --block-size optional for compare --stat, Hanna Reitz, 2021/10/26
    - Re: [PATCH v2 2/4] qemu-img: make --block-size optional for compare --stat, Vladimir Sementsov-Ogievskiy, 2021/10/28
- [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare, Vladimir Sementsov-Ogievskiy, 2021/10/21
  - Re: [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare, Hanna Reitz, 2021/10/26
- [PATCH v2 4/4] iotests: add qemu-img-compare-stat test, Vladimir Sementsov-Ogievskiy, 2021/10/21
  - Re: [PATCH v2 4/4] iotests: add qemu-img-compare-stat test, Hanna Reitz, 2021/10/26
