bug#67022: Gzip decompression can be 60% faster using zlib's CRC32


From: Young Mo Kang
Subject: bug#67022: Gzip decompression can be 60% faster using zlib's CRC32
Date: Thu, 9 Nov 2023 12:40:24 -0500
User-agent: Mozilla Thunderbird

Hello,


I have noticed that GNU Gzip's CRC32 calculation is the main bottleneck in decompression, and that decompression runs more than 60% faster if we replace it with the crc32 function from zlib.


I tested the decompression speed on a Linux source code tar.gz file before and after replacing the CRC32 computation. On an AMD 7735HS system, I get:

GNU Gzip unmodified
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.11
GNU Gzip with CRC32 from zlib
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.16


I saw an even larger improvement when testing on an Apple Silicon M1 system:

GNU Gzip unmodified
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.83
GNU Gzip with CRC32 from zlib
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.72
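
As a rough sketch of the arithmetic behind these numbers, the ratio of the elapsed times can be computed as follows. The `speedup` helper is a name made up for this illustration and is not part of gzip or zlib; it only assumes an `awk` is available.

```shell
# Hypothetical helper: ratio of the unmodified elapsed time to the
# patched elapsed time, both given in seconds.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 5.11 3.16   # AMD 7735HS, prints 1.62
speedup 6.83 3.72   # Apple Silicon M1, prints 1.84
```

That is, roughly 62% and 84% more throughput, respectively.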


Since GNU Gzip and zlib are written by the same authors, I was wondering whether GNU Gzip could share zlib's CRC32 calculation and obtain this performance gain. I am not sure whether there would be a license issue, though.


The following bash script should reproduce the result:

```

# download GNU Gzip and zlib
wget -O- https://ftp.gnu.org/gnu/gzip/gzip-1.13.tar.gz | tar xzf -
wget -O- https://zlib.net/zlib-1.3.tar.gz | tar xzf -

# download the Linux source code as a test file for decompression speed
wget -O- https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.1.tar.xz | xz -d | gzip > linux.tar.gz

# compile zlib
cd zlib-1.3
CFLAGS="-O2 -g" ./configure --static && make -j
cd ..

# compile GNU Gzip
cd gzip-1.13
CFLAGS="-O2 -g" ./configure && make -j

# measure decompression speed
/usr/bin/time -v ./gzip -d < ../linux.tar.gz > linux.tar 2> ../gzip1.time

# use crc32 from zlib
cat > util.diff << EOF
@@ -27,6 +27,7 @@
 #include <stdlib.h>
 #include <errno.h>

+#include "crc32.h"
 #include "tailor.h"
 #include "gzip.h"
 #include <dirname.h>
@@ -136,25 +137,15 @@ copy (int in, int out)
 ulg
 updcrc (uch const *s, unsigned n)
 {
-    register ulg c;         /* temporary variable */
-
-    if (s == NULL) {
-        c = 0xffffffffL;
-    } else {
-        c = crc;
-        if (n) do {
-            c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
-        } while (--n);
-    }
-    crc = c;
-    return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
+    crc = crc32(crc, s, n);
+    return crc;
 }

 /* Return a current CRC value.  */
 ulg
 getcrc ()
 {
-  return crc ^ 0xffffffffL;
+  return crc;
 }

 #ifdef IBM_Z_DFLTCC
EOF
patch < util.diff util.c

# create header file
cat > crc32.h << EOF
#pragma once

unsigned long  crc32(unsigned long crc, const unsigned char  *buf,
                            unsigned int len);
EOF

# copy crc32 object file from zlib
cp ../zlib-1.3/crc32.o .

# re-compile GNU Gzip
gcc -O2 -g -c util.c -Ilib
gcc -O2 -g *.o lib/libgzip.a -o gzip

# measure decompression speed
/usr/bin/time -v ./gzip -d < ../linux.tar.gz > linux.tar 2> ../gzip2.time

# print out time difference
cd ..
echo
echo "GNU Gzip unmodified"
grep Elapsed gzip1.time
echo "GNU Gzip with CRC32 from zlib"
grep Elapsed gzip2.time
```
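
Because the patch replaces the checksum code, it may be worth confirming that the rebuilt binary still detects CRC errors. The following is a minimal sketch, assuming a POSIX-like shell with `mktemp`, `dd`, and `od` available. `crc_check_works` is a hypothetical helper name and is not part of the script above.

```shell
# Hypothetical helper: succeed only if the gzip binary "$1" accepts a
# valid file and rejects one whose stored CRC32 has been altered.
crc_check_works() {
    bin=$1
    tmp=$(mktemp -d) || return 1
    f=$tmp/sample.gz

    # Create a small valid archive with the binary under test.
    printf 'hello, gzip\n' | "$bin" -c > "$f" || { rm -rf "$tmp"; return 1; }
    "$bin" -t "$f" || { rm -rf "$tmp"; return 1; }

    # The gzip trailer is CRC32 (4 bytes) followed by ISIZE (4 bytes),
    # so the first CRC byte sits 8 bytes before the end of the file.
    size=$(wc -c < "$f")
    off=$((size - 8))
    old=$(dd if="$f" bs=1 skip="$off" count=1 2>/dev/null | od -An -tu1 | tr -d ' \n')
    new=$(( (old + 1) % 256 ))

    # Overwrite that byte in place with a different value.
    printf "\\$(printf '%03o' "$new")" |
        dd of="$f" bs=1 seek="$off" conv=notrunc 2>/dev/null

    # A working CRC check must now report an error.
    if "$bin" -t "$f" 2>/dev/null; then status=1; else status=0; fi
    rm -rf "$tmp"
    return "$status"
}
```

From the gzip-1.13 directory, this could be run as `crc_check_works ./gzip && echo "CRC check OK"`.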





