Re: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting


From: Jim Meyering
Subject: Re: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting
Date: Sun, 04 Mar 2012 17:04:34 +0100

Gilles Espinasse wrote:
> ----- Original Message ----- 
> From: "Jim Meyering" <address@hidden>
> To: "GNU" <address@hidden>
> Sent: Saturday, March 03, 2012 11:14 AM
> Subject: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting
> ...
>> From 4b2224681fbc297bf585630b679d8540a02b78d3 Mon Sep 17 00:00:00 2001
>> From: Jim Meyering <address@hidden>
>> Date: Sat, 3 Mar 2012 10:51:11 +0100
>> Subject: [PATCH 2/2] maint: use an optimal-for-grep xz compression setting
>>
>> * cfg.mk (XZ_OPT): Use -6e (determined empirically, see comments).
>> This sacrifices a meager 60 bytes of compressed tarball size for a
>> 55-MiB decrease in the memory required during decompression.  I.e.,
>> using -9e would shave off only 60 bytes from the tar.xz file, yet
>> would force every decompression process to use 55 MiB more memory.
>> ---
> ...
>> +export XZ_OPT = -6e
>> +
>>  old_NEWS_hash = 347e90ee0ec0489707df139ca3539934
>>
> -9 should be set only when the file to compress is really big enough.
> -6 is xz's default compression setting.
> -6e approximately doubles the time required to compress (for about a 1% size gain).

I am happy to tell xz to spend a few more seconds (use -e) and save 1% for
everyone who downloads a grep tarball.
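
Roughly, that size/memory trade-off can be reproduced with something like
the following sketch (grep-2.11.tar here is just a stand-in for whatever
tarball is at hand; xz -vv reports the decompression memory requirement
on stderr):

for opt in 6 6e 9e; do
  echo "== xz -$opt =="
  # -c keeps the input and writes to stdout; -vv shows the filter chain
  # and the memory needed for compression and for decompression
  xz -vv -$opt -c grep-2.11.tar > grep-2.11.tar.xz.$opt
done
ls -l grep-2.11.tar.xz.*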

> -6{,e} work well with a file of approximately the same size as
> grep-2.11.tar.
> But if a bigger .tar is compressed, that may not give a good compression
> result.

Yes, I too would like to automate the xz-preset selection process.

> rm -f dummy; for i in 1 2 3 4 5; do echo " $i x grep-2.11.tar size"; cat grep-2.11.tar >>dummy; xz -vv -6 < dummy >/dev/null; done; rm dummy
>  1 x grep-2.11.tar size
> xz: Filter chain: --lzma2=dict=8MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
> xz: 94 MiB of memory is required. The limit is 17592186044416 MiB.
> xz: Decompression will need 9 MiB of memory.
>   100 %     1112.9 KiB / 9240.0 KiB = 0.120   746 KiB/s       0:12
>  2 x grep-2.11.tar size
...
>
> So using -6 can require more memory for decompression than the size of
> the resulting compressed file.
> Contrast that with setting the dictionary size explicitly (I know this
> benchmark is extreme).
> The useful dictionary size is theoretically limited by the size of the
> file being compressed; the 3/4 factor here is entirely arbitrary and only
> decreases the memory requirement a bit.
>

> Probably tar should learn to set the xz dictionary size to the size of the
> .tar when using -J?
> That would be the most efficient way to compress without wasting memory.

That would be fine if the tarball were formed on disk before
invoking xz, but that's not how the current process works.
Currently, tar's output is piped to xz, and it seems wasteful
to create the full tar file on disk first, and also wasteful
to run the tar-file-creation process separately just to determine
the size of the tarball.  If there is a way to make tar tell us
the size of the tarball it would have created, that would be best.
However, we can approximate that by running du -sk $(distdir).

I have just experimented a little with coreutils, using this adjusted
rule in the top-level Makefile:

gl_distdir_kb_ = $(shell du -sk $(distdir) | awk '{ printf "%dKiB", $$1 * 3 / 4 }')
gl_xz_opt_ = --lzma2=dict=$(gl_distdir_kb_) --memlimit-compress=512MiB
dist-xz: distdir
        tardir=$(distdir) && $(am__tar) | XZ_OPT=$${XZ_OPT-$(gl_xz_opt_)} \
          xz -c >$(distdir).tar.xz
        $(am__post_remove_distdir)
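
With that rule in place, comparing against a fixed preset is just a matter
of overriding XZ_OPT, since the recipe's $${XZ_OPT-$(gl_xz_opt_)} falls
back to the computed dictionary setting only when XZ_OPT is unset:

make dist-xz                # uses the du-based --lzma2=dict=... default
XZ_OPT=-8e make dist-xz     # an exported XZ_OPT takes precedence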

However, your heuristic (even when I added --memlimit-compress=512MiB)
left me with a tarball nearly 2% larger than the one compressed with -8e.

If you come up with a heuristic that is competitive, please let us know.
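
For experimenting outside the build system, a hand-rolled version of the
same comparison might look like this (a sketch only; the 3/4 factor and
the 512MiB limit are just the figures from this thread, and the output
file names are arbitrary):

# dictionary ~= 3/4 of the uncompressed size, in KiB
dict_kb=$(du -k grep-2.11.tar | awk '{ printf "%d", $1 * 3 / 4 }')
xz -vv --lzma2=preset=6e,dict=${dict_kb}KiB --memlimit-compress=512MiB \
  -c grep-2.11.tar > heuristic.tar.xz
xz -vv -8e -c grep-2.11.tar > preset-8e.tar.xz
ls -l heuristic.tar.xz preset-8e.tar.xz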


