bug-grep

Re: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting


From: Gilles Espinasse
Subject: Re: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting
Date: Sun, 4 Mar 2012 21:05:31 +0100

----- Original Message -----
From: "Jim Meyering" <address@hidden>
To: "Gilles Espinasse" <address@hidden>
Cc: "GNU" <address@hidden>
Sent: Sunday, March 04, 2012 5:04 PM
Subject: Re: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting

...
>
> I am happy to tell xz to spend a few more seconds (use -e) and save 1% for
> everyone who downloads a grep tarball.
>
> > -6{,e} works well with a file of approximately the same size as
> > grep-2.11.tar, but compressing a bigger .tar may not give a good
> > result.
>
> Yes, I too would like to automate the xz-preset selection process.
>
>
> I have just experimented a little with coreutils, using this adjusted
> rule in the top-level Makefile:
>
> gl_distdir_kb_ = $$(du -sk $(distdir) | awk '{ printf "%dKiB", $$1 * 3 / 4 }')
> gl_xz_opt_ = --lzma2=dict=$(gl_distdir_kb_) --memlimit-compress=512MiB
> dist-xz: distdir
>       tardir=$(distdir) && $(am__tar) | XZ_OPT=$${XZ_OPT-$(gl_xz_opt_)} \
>         xz -c >$(distdir).tar.xz
>       $(am__post_remove_distdir)
>
> However, your heuristic (even when I added --memlimit-compress=512MiB)
> left me with a tarball nearly 2% larger than the one compressed with -8e.
>
> If you come up with a heuristic that is competitive, please let us know.
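
(As an aside, the `XZ_OPT=$${XZ_OPT-...}` part of the quoted rule relies on the shell's `${var-default}` expansion, so a packager can still override the computed preset. A minimal sketch of that expansion; the `default` value here is illustrative, not the exact string make would produce:)

```shell
# ${XZ_OPT-$default} expands to the caller's XZ_OPT when it is set,
# and to the computed default otherwise.
default='--lzma2=dict=33993KiB --memlimit-compress=512MiB'
unset XZ_OPT
echo "${XZ_OPT-$default}"   # prints the computed default
XZ_OPT=-8e
echo "${XZ_OPT-$default}"   # prints -8e
```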

I tested against coreutils-8.15.tar, as that was simpler for me to test.

xz -vv -8e < coreutils-8.15.tar >/dev/null
xz: Filter chain: --lzma2=dict=32MiB,lc=3,lp=0,pb=2,mode=normal,nice=273,mf=bt4,depth=512
xz: 370 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 33 MiB of memory.
  100 %       4832.6 KiB / 44.2 MiB = 0.107   361 KiB/s       2:05

I noticed that when only the dictionary size is set, nice and depth end up
with different values.
When I asked Lasse for advice a long time ago, his answer was to set the
dictionary size to the size of the file to be compressed, with
nice=273,depth=512.

So, using:

xz -vv --lzma2=dict=$(du -sk coreutils-8.15.tar | awk '{ printf "%dKiB", $1 * 3 / 4 }'),nice=273,depth=512 < coreutils-8.15.tar >/dev/null
xz: Filter chain: --lzma2=dict=33993KiB,lc=3,lp=0,pb=2,mode=normal,nice=273,mf=bt4,depth=512
xz: 381 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 34 MiB of memory.
  100 %       4833.4 KiB / 44.2 MiB = 0.107   358 KiB/s       2:06

That setting loses only 0.8 KiB here and should work with inputs of various
sizes.
I saw no difference in size or time when trying the -e option with
nice=273,depth=512 set.
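
The dictionary-size arithmetic can be checked in isolation (a sketch; `dict_kib` is a hypothetical helper name, not part of xz):

```shell
# Compute an xz --lzma2 dict= value as size_in_KiB * n / d,
# mirroring the du | awk pipeline used in the commands here.
dict_kib() {
  awk -v kb="$1" -v n="$2" -v d="$3" 'BEGIN { printf "%dKiB", kb * n / d }'
}

# coreutils-8.15.tar is 45324 KiB on disk; 3/4 of that:
dict_kib 45324 3 4; echo    # prints 33993KiB, matching the filter chain above
```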

If you really want minimal size, it looks like a bigger dictionary helps
more than using a '3/4 of the size to be compressed' factor.

xz -vv --lzma2=dict=$(du -sk coreutils-8.15.tar | awk '{ printf "%dKiB", $1 }'),nice=273,depth=512 < coreutils-8.15.tar >/dev/null
xz: Filter chain: --lzma2=dict=45324KiB,lc=3,lp=0,pb=2,mode=normal,nice=273,mf=bt4,depth=512
xz: 486 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 45 MiB of memory.
  100 %       4832.4 KiB / 44.2 MiB = 0.107   358 KiB/s       2:06

Why a bigger dictionary is required, when setting the dictionary size
manually, to match the same result is a mystery to me (with -8e the
dictionary size was 32 MiB; with printf "%dKiB", $1 it is 45324 KiB).
That would be a question to ask Lasse.

xz is full of mysteries:

xz -vv --lzma2=dict=$(du -sk coreutils-8.15.tar | awk '{ printf "%dKiB", $1 * 4 / 5 }'),nice=273,depth=512 < coreutils-8.15.tar >/dev/null
xz: Filter chain: --lzma2=dict=36259KiB,lc=3,lp=0,pb=2,mode=normal,nice=273,mf=bt4,depth=512
xz: 402 MiB of memory is required. The limit is 17592186044416 MiB.
xz: Decompression will need 36 MiB of memory.
  100 %       4831.7 KiB / 44.2 MiB = 0.107   357 KiB/s       2:06

Wow, this time the size is even smaller than what bare -9e gives.
It is funny that using a smaller dictionary here ($1 * 4 / 5) gives a
better result than $1.
Maybe 4/5 is the golden ratio for xz (and the coreutils workload)?
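
To hunt for a good factor automatically, one could sweep a few ratios and compare the compressed sizes (a sketch; it uses a generated sample file in place of a real release tarball, and assumes xz is on PATH):

```shell
# Try several dict-size factors and report the resulting compressed size.
sample=/tmp/xz-factor-sample.txt
seq 1 20000 > "$sample"              # stand-in for a release tarball
kb=$(du -k "$sample" | awk '{ print $1 }')

for frac in "1 1" "4 5" "3 4" "2 3"; do
  set -- $frac
  dict=$(awk -v kb="$kb" -v n="$1" -v d="$2" 'BEGIN { printf "%dKiB", kb * n / d }')
  bytes=$(xz -c --lzma2=dict=$dict,nice=273,depth=512 < "$sample" | wc -c)
  printf '%s/%s dict=%s -> %s bytes\n' "$1" "$2" "$dict" "$bytes"
done > /tmp/xz-factor-results.txt

cat /tmp/xz-factor-results.txt
```

On a real tarball the differences would be small, as in the runs above, so comparing byte counts rather than the rounded ratios xz prints is the point of the sweep.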

Gilles



