bug-grep

From: Gilles Espinasse
Subject: Re: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting
Date: Sun, 4 Mar 2012 10:38:01 +0100

----- Original Message ----- 
From: "Jim Meyering" <address@hidden>
To: "GNU" <address@hidden>
Sent: Saturday, March 03, 2012 11:14 AM
Subject: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting


...
> From 4b2224681fbc297bf585630b679d8540a02b78d3 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <address@hidden>
> Date: Sat, 3 Mar 2012 10:51:11 +0100
> Subject: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting
>
> * cfg.mk (XZ_OPT): Use -6e (determined empirically, see comments).
> This sacrifices a meager 60 bytes of compressed tarball size for a
> 55-MiB decrease in the memory required during decompression.  I.e.,
> using -9e would shave off only 60 bytes from the tar.xz file, yet
> would force every decompression process to use 55 MiB more memory.
> ---
...
> +export XZ_OPT = -6e
> +
>  old_NEWS_hash = 347e90ee0ec0489707df139ca3539934
>
-9 should be used only when the file to compress is really big enough to
benefit from the larger dictionary.
-6 is xz's default compression setting.
-6e approximately doubles the compression time, for about a 1% size gain.

-6 and -6e work well for a file of roughly the same size as
grep-2.11.tar, but compressing a much bigger .tar with them may not give
good results.
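The preset trade-offs can be checked directly. A sketch (a generated
stand-in file is used here, so nothing beyond the xz tool itself is
assumed; with -vv, xz reports its memory requirements on stderr):

```shell
# Compare xz presets on a throwaway input; -vv makes xz print the
# compressor and decompressor memory needs on stderr.
head -c 1000000 /dev/zero > sample.bin
for opt in -6 -6e -9e; do
  printf '%s:\n' "$opt"
  xz "$opt" -vv < sample.bin > sample.xz 2> xz.log
  grep -i 'memory' xz.log      # memory required to compress / decompress
  wc -c < sample.xz            # compressed size for this preset
done
rm -f sample.bin sample.xz xz.log
```

(A file of zeros compresses to almost nothing, so the size column is not
meaningful here; the point is the memory figures each preset implies.)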

rm -f dummy
for i in 1 2 3 4 5; do
  echo " $i x grep-2.11.tar size"
  cat grep-2.11.tar >> dummy
  xz -vv -6 < dummy > /dev/null
done
rm dummy
 1 x grep-2.11.tar size
xz: Filter
chain: --lzma2=dict=8MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 94 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 9 MiB of memory.
  100 %     1112.9 KiB / 9240.0 KiB = 0.120   746 KiB/s       0:12
 2 x grep-2.11.tar size
xz: Filter
chain: --lzma2=dict=8MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 94 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 9 MiB of memory.
  100 %       2130.4 KiB / 18.0 MiB = 0.115   721 KiB/s       0:25
 3 x grep-2.11.tar size
xz: Filter
chain: --lzma2=dict=8MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 94 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 9 MiB of memory.
  100 %       3147.8 KiB / 27.1 MiB = 0.114   708 KiB/s       0:39
 4 x grep-2.11.tar size
xz: Filter
chain: --lzma2=dict=8MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 94 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 9 MiB of memory.
  100 %       4165.2 KiB / 36.1 MiB = 0.113   709 KiB/s       0:52
 5 x grep-2.11.tar size
xz: Filter
chain: --lzma2=dict=8MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 94 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 9 MiB of memory.
  100 %       5182.4 KiB / 45.1 MiB = 0.112   707 KiB/s       1:05

So with -6, the memory required for decompression can exceed the size of
the compressed file itself.
Contrast that with setting the dictionary size explicitly (I know this
benchmark is extreme).  The theoretical upper limit worth using for the
dictionary is the size of the file being compressed; the 3/4 factor here
is fully arbitrary and only trims the memory requirement a bit.

for i in 1 2 3 4 5; do
  cat grep-2.11.tar >> dummy
  XZ_OPT=--lzma2=dict=$(du -h dummy | awk '{ printf "%dMiB", $1 / 4 * 3 }') \
    xz -vv < dummy > /dev/null
done
rm dummy
xz: Filter
chain: --lzma2=dict=6MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 75 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 7 MiB of memory.
  100 %     1118.2 KiB / 9240.0 KiB = 0.121   748 KiB/s       0:12
xz: Filter
chain: --lzma2=dict=14MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 167 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 15 MiB of memory.
  100 %       1114.1 KiB / 18.0 MiB = 0.060   752 KiB/s       0:24
xz: Filter
chain: --lzma2=dict=21MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 265 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 22 MiB of memory.
  100 %       1115.5 KiB / 27.1 MiB = 0.040   739 KiB/s       0:37
xz: Filter
chain: --lzma2=dict=27MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 322 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 28 MiB of memory.
  100 %       1116.8 KiB / 36.1 MiB = 0.030   752 KiB/s       0:49
xz: Filter
chain: --lzma2=dict=34MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 389 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 35 MiB of memory.
  100 %       1118.2 KiB / 45.1 MiB = 0.024   751 KiB/s       1:01

Here, appending the input four more times gives the same compressed file
size (within a 1% range).

Perhaps tar should learn to set the xz dictionary size to the size of
the .tar when using -J?
That would be the most efficient way to compress without wasting memory.
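Until then, the idea can be approximated by hand. A sketch (the archive
name is hypothetical; stat -c %s is GNU, with a BSD fallback), sizing
the dictionary to the whole tar, rounded up to whole MiB, so the entire
archive fits in one window:

```shell
# Layer an exact-fit dictionary on top of the -6e preset.  In the lzma2
# filter string, preset= must come first, since it resets the other
# lzma2 options.
tarball=src.tar                                   # hypothetical archive
bytes=$(stat -c %s "$tarball" 2>/dev/null || stat -f %z "$tarball")
dict=$(( (bytes + 1048575) / 1048576 ))MiB        # round up to whole MiB
XZ_OPT="--lzma2=preset=6e,dict=$dict" xz -vv -f "$tarball"
```

This avoids parsing du -h output, which only works by accident when the
file size happens to land in the MiB range.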

Gilles



