Re: [PATCH v20 3/4] qcow2: add zstd cluster compression

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v20 3/4] qcow2: add zstd cluster compression

From:	Denis Plotnikov
Subject:	Re: [PATCH v20 3/4] qcow2: add zstd cluster compression
Date:	Mon, 27 Apr 2020 22:26:19 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1



On 27.04.2020 15:35, Max Reitz wrote:

On 21.04.20 10:11, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
                   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
                   [zlib|zstd]_compressed.img uncompressed.img

            compression               decompression
          zlib       zstd           zlib         zstd
------------------------------------------------------------
real     65.5       16.3 (-75 %)    1.9          1.6 (-16 %)
user     65.0       15.8            5.3          2.5
sys       3.3        0.2            2.0          2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov <address@hidden>
Reviewed-by: Vladimir Sementsov-Ogievskiy <address@hidden>
Reviewed-by: Alberto Garcia <address@hidden>
QAPI part:
Acked-by: Markus Armbruster <address@hidden>
---
  docs/interop/qcow2.txt |   1 +
  configure              |   2 +-
  qapi/block-core.json   |   3 +-
  block/qcow2-threads.c  | 157 +++++++++++++++++++++++++++++++++++++++++
  block/qcow2.c          |   7 ++
  5 files changed, 168 insertions(+), 2 deletions(-)

[...]

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0525718704 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
  #define ZLIB_CONST
  #include <zlib.h>

+#ifdef CONFIG_ZSTD

+#include <zstd.h>
+#include <zstd_errors.h>
+#endif
+
  #include "qcow2.h"
  #include "block/thread-pool.h"
  #include "crypto.h"
@@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t 
dest_size,
      return ret;
  }

+#ifdef CONFIG_ZSTD

+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *          -ENOMEM destination buffer is not enough to store compressed data
+ *          -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+                                   const void *src, size_t src_size)
+{
+    ssize_t ret;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };

Minor style note: I think it’d be nicer to use designated initializers here.

+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+        return -EIO;
+    }
+    /*
+     * Use the zstd streamed interface for symmetry with decompression,
+     * where streaming is essential since we don't record the exact
+     * compressed size.
+     *
+     * In the loop, we try to compress all the data into one zstd frame.
+     * ZSTD_compressStream2 potentially can finish a frame earlier
+     * than the full input data is consumed. That's why we are looping
+     * until all the input data is consumed.
+     */
+    while (input.pos < input.size) {
+        size_t zstd_ret;
+        /*
+         * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+         * with ZSTD_e_end until it returns 0, at which point you are
+         * free to start a new frame". We assume that "start a new frame"
+         * means call ZSTD_compressStream2 in the very beginning or when
+         * ZSTD_compressStream2 has returned with 0.
+         */
+        do {
+            zstd_ret = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);

The spec makes it sound to me like ZSTD_e_end will always complete in a
single call if there’s enough space in the output buffer.  So the only
team we have to loop would be when there isn’t enough space anyway:

It says this about ZSTD_e_end:

flush operation is the same, and follows same rules as calling
ZSTD_compressStream2() with ZSTD_e_flush.

Those rules being:

Note that, if `output->size` is too small, a single invocation with
ZSTD_e_flush might not be enough (return code > 0).

So it seems like it will only return a value > 0 if the output buffer is
definitely too small.

The spec also notes that the return value is greater than 0 if:

0 if some data still present within internal buffer (the value is

minimal estimation of remaining size),

So it’s a minimum estimate.  That’s another point that heavily implies
to me that if the return value were less than what’s left in the buffer,
the function wouldn’t return but still try to write it out, until it
realizes that there isn’t enough space in the output buffer, and then
return a value that exceeds the remaining output buffer size.

(Because if the function just played it safe, I would expect it to
return a maximum estimate.)


OTOH, if it were actually possible for ZSTD_e_end to finish a frame
earlier than the end of the input, I think it would make more sense to
use ZSTD_e_continue until the input is done and then finish with
ZSTD_e_end, like the spec seems to propose.  That way, we’d always end
up with a single frame to make decompression simpler (and I think it
would also make more sense overall).


But anyway.  From how I understand the spec, this code simply always
ends up creating a single frame or erroring out, without looping ever.
So it isn’t exactly wrong, it just seems overly complicated.  (Again,
assuming I understand the spec correctly.  Which seems like a tough
thing to assume, because the spec is not exactly obvious to read...)

(Running some quick tests by converting some images with zstd
compression seems to confirm that whenever ZSTD_compressStream2()
returns, either zstd_ret > output.size - output.pos, or zstd_ret == 0
and input.pos == input.size.  So none of the loops ever loop.)

Max


So, what should we do?

1. Rely on the test that there's no need for the loop:
   * make one ZSTD_compressStream2() call
   * make sure it returned with zstd_ret == 0 and
     input.pos == input.size.
     if so, return with the size
   * if not, check that zstd_ret > output.size - output.pos
     if so, return with -ENOMEM
   * if none above return with -EIO

   This should cover the majority of the compressing cases

2. Leave the loop as is, because of the documentation:

"You *must* continue calling ZSTD_compressStream2() with ZSTD_e_enduntil it returns 0,

    at which point you are free to start a new frame."

   This fulfills the documentation requirements.

3. Any other option?

Denis

+
+            if (ZSTD_isError(zstd_ret)) {
+                ret = -EIO;
+                goto out;
+            }
+            /* Dest buffer isn't big enough to store compressed content */
+            if (zstd_ret > output.size - output.pos) {
+                ret = -ENOMEM;
+                goto out;
+            }
+        } while (zstd_ret);
+    }
+    /* make sure we can safely return compressed buffer size with ssize_t */
+    assert(output.pos <= SSIZE_MAX);
+    ret = output.pos;
+out:
+    ZSTD_freeCCtx(cctx);
+    return ret;
+}

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v20 0/4] qcow2: Implement zstd cluster compression methodi, Denis Plotnikov, 2020/04/21
- [PATCH v20 2/4] qcow2: rework the cluster compression routine, Denis Plotnikov, 2020/04/21
  - Re: [PATCH v20 2/4] qcow2: rework the cluster compression routine, Max Reitz, 2020/04/27
- [PATCH v20 3/4] qcow2: add zstd cluster compression, Denis Plotnikov, 2020/04/21
  - Re: [PATCH v20 3/4] qcow2: add zstd cluster compression, Max Reitz, 2020/04/27
    - Re: [PATCH v20 3/4] qcow2: add zstd cluster compression, Denis Plotnikov <=
    - Re: [PATCH v20 3/4] qcow2: add zstd cluster compression, Max Reitz, 2020/04/28
    - Re: [PATCH v20 3/4] qcow2: add zstd cluster compression, Denis Plotnikov, 2020/04/28
    - Re: [PATCH v20 3/4] qcow2: add zstd cluster compression, Max Reitz, 2020/04/28
- [PATCH v20 1/4] qcow2: introduce compression type feature, Denis Plotnikov, 2020/04/21
  - Re: [PATCH v20 1/4] qcow2: introduce compression type feature, Alberto Garcia, 2020/04/21
  - Re: [PATCH v20 1/4] qcow2: introduce compression type feature, Max Reitz, 2020/04/27
- [PATCH v20 4/4] iotests: 287: add qcow2 compression type test, Denis Plotnikov, 2020/04/21
  - Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test, Vladimir Sementsov-Ogievskiy, 2020/04/21
  - Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test, Max Reitz, 2020/04/27
    - Re: [PATCH v20 4/4] iotests: 287: add qcow2 compression type test, Denis Plotnikov, 2020/04/28

Prev by Date: Re: [RFC PATCH] qom: Implement qom-get HMP command
Next by Date: Re: [PATCH v2 0/2] hw/pci/pcie: Forbid hot-plug if it's disabled on the slot
Previous by thread: Re: [PATCH v20 3/4] qcow2: add zstd cluster compression
Next by thread: Re: [PATCH v20 3/4] qcow2: add zstd cluster compression
Index(es):
- Date
- Thread