On 21.04.20 10:11, Denis Plotnikov wrote:
zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.
The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G
The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.
compress cmd:
time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
src.img [zlib|zstd]_compressed.img
decompress cmd
time ./qemu-img convert -O qcow2
[zlib|zstd]_compressed.img uncompressed.img
compression decompression
zlib zstd zlib zstd
------------------------------------------------------------
real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %)
user 65.0 15.8 5.3 2.5
sys 3.3 0.2 2.0 2.0
Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G
Signed-off-by: Denis Plotnikov <address@hidden>
Reviewed-by: Vladimir Sementsov-Ogievskiy <address@hidden>
Reviewed-by: Alberto Garcia <address@hidden>
QAPI part:
Acked-by: Markus Armbruster <address@hidden>
---
docs/interop/qcow2.txt | 1 +
configure | 2 +-
qapi/block-core.json | 3 +-
block/qcow2-threads.c | 157 +++++++++++++++++++++++++++++++++++++++++
block/qcow2.c | 7 ++
5 files changed, 168 insertions(+), 2 deletions(-)
[...]
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0525718704 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
#define ZLIB_CONST
#include <zlib.h>
+#ifdef CONFIG_ZSTD
+#include <zstd.h>
+#include <zstd_errors.h>
+#endif
+
#include "qcow2.h"
#include "block/thread-pool.h"
#include "crypto.h"
@@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void
*dest, size_t dest_size,
return ret;
}
+#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ * -ENOMEM destination buffer is not enough to store
compressed data
+ * -EIO on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
+{
+ ssize_t ret;
+ ZSTD_outBuffer output = { dest, dest_size, 0 };
+ ZSTD_inBuffer input = { src, src_size, 0 };
Minor style note: I think it’d be nicer to use designated initializers
here.
+ ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+ if (!cctx) {
+ return -EIO;
+ }
+ /*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * In the loop, we try to compress all the data into one zstd
frame.
+ * ZSTD_compressStream2 potentially can finish a frame earlier
+ * than the full input data is consumed. That's why we are looping
+ * until all the input data is consumed.
+ */
+ while (input.pos < input.size) {
+ size_t zstd_ret;
+ /*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+ * with ZSTD_e_end until it returns 0, at which point you are
+ * free to start a new frame". We assume that "start a new
frame"
+ * means call ZSTD_compressStream2 in the very beginning or
when
+ * ZSTD_compressStream2 has returned with 0.
+ */
+ do {
+ zstd_ret = ZSTD_compressStream2(cctx, &output, &input,
ZSTD_e_end);
The spec makes it sound to me like ZSTD_e_end will always complete in a
single call if there’s enough space in the output buffer. So the only
team we have to loop would be when there isn’t enough space anyway:
It says this about ZSTD_e_end:
flush operation is the same, and follows same rules as calling
ZSTD_compressStream2() with ZSTD_e_flush.
Those rules being:
Note that, if `output->size` is too small, a single invocation with
ZSTD_e_flush might not be enough (return code > 0).
So it seems like it will only return a value > 0 if the output buffer is
definitely too small.
The spec also notes that the return value is greater than 0 if:
0 if some data still present within internal buffer (the value is
minimal estimation of remaining size),
So it’s a minimum estimate. That’s another point that heavily implies
to me that if the return value were less than what’s left in the buffer,
the function wouldn’t return but still try to write it out, until it
realizes that there isn’t enough space in the output buffer, and then
return a value that exceeds the remaining output buffer size.
(Because if the function just played it safe, I would expect it to
return a maximum estimate.)
OTOH, if it were actually possible for ZSTD_e_end to finish a frame
earlier than the end of the input, I think it would make more sense to
use ZSTD_e_continue until the input is done and then finish with
ZSTD_e_end, like the spec seems to propose. That way, we’d always end
up with a single frame to make decompression simpler (and I think it
would also make more sense overall).
But anyway. From how I understand the spec, this code simply always
ends up creating a single frame or erroring out, without looping ever.
So it isn’t exactly wrong, it just seems overly complicated. (Again,
assuming I understand the spec correctly. Which seems like a tough
thing to assume, because the spec is not exactly obvious to read...)
(Running some quick tests by converting some images with zstd
compression seems to confirm that whenever ZSTD_compressStream2()
returns, either zstd_ret > output.size - output.pos, or zstd_ret == 0
and input.pos == input.size. So none of the loops ever loop.)
Max