
Re: [Lzip-bug] Source code repository for lzip

From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Source code repository for lzip
Date: Mon, 03 Mar 2014 19:09:47 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905

Michał Górny wrote:
> So in fact the 'underlying stream' in lzip file format is incompatible
> with the 'underlying stream' in xz? Am I understanding this correctly?

Yes. The sad thing is that the 'underlying stream' in the lzip file format is identical to the 'underlying stream' in lzma-alone. It was xz that threw away the code in lzma-utils and rewrote it from scratch, changing the license and the stream, and adding some features that are dangerous for Free Software and mostly useless on GNU/Linux systems.

Just try a test: does Gentoo use any of the "advanced" features of xz ("cheaper" recompression of already compressed files, binary filters, user-defined filters, etc.)?

BTW, the simpler stream in lzip files is what makes lziprecover possible.

> I agree with you that xz is unnecessarily complex and therefore you
> could say that it moves in that regard, but I guess I don't understand
> lzip enough to see what the arguments are in favor of it instead,
> and that's what I'm trying to get a grasp on, what the key benefits are.

Lzip is just a compressor like gzip or bzip2, only it compresses more. A wise person would choose lzip by default and switch to xz only after verifying that they need the additional complexity.

In practice, and surprisingly, the contrary happens: people choose xz, complain about its unnecessary complexity, and then ask what the benefits are of using something simpler that does the job as well or even better.

> It occurs to me that if data safety was my top priority, I'd use a tool
> dedicated to just that task, like PAR2.

Normally one compresses the files before using a tool like parchive on them. See, for example, how the zfec[1] page (another package implementing an "erasure code") describes the process:

"a Unix-style tool like "zfec" does only one thing -- in this case erasure coding -- and leaves other tasks to other tools. Other Unix-style tools that go well with zfec include GNU tar for archiving multiple files and directories into one file, lzip for compression, and GNU Privacy Guard for encryption or sha256sum for integrity. It is important to do things in order: first archive, then compress, then either encrypt or integrity-check, then erasure code."
[1] https://pypi.python.org/pypi/zfec

But it is usually much easier to just store two or more copies of your important files on different media and use lziprecover to merge the copies if all of them get damaged.
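The merge idea can be sketched in a few lines: given several copies of the same file, each damaged in different places, a byte-wise majority vote recovers the original, which can then be verified against a known-good checksum. This is only a toy illustration of the principle behind lziprecover's merge mode; the function name and the use of SHA-256 are my own, not lziprecover's actual algorithm:

```python
import hashlib

def merge_copies(copies, good_sha256):
    """Byte-wise majority vote across equal-length damaged copies,
    then verify the result against a known-good checksum.
    A toy sketch of the merge idea, not lziprecover's real algorithm."""
    merged = bytes(
        max(set(col), key=col.count)   # most common byte at each offset
        for col in zip(*copies)
    )
    if hashlib.sha256(merged).hexdigest() == good_sha256:
        return merged
    return None                        # damage overlapped; vote failed
```

With three copies, each damaged at different offsets, at least two copies agree at every position, so the vote reconstructs the original exactly.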

> But if I just need a tool to compress my sources for distribution, I
> can safely assume that something else will be responsible for
> ensuring the integrity of my data.

Then I would prefer the simplest tool that does the job: lzip.

> Another technical concern I have, is regarding memory. How does lzip
> compare in regards to xz? If the peak memory use is determined
> by the dictionary size, doesn't this make efficient use of memory
> a matter of better implementation rather than the format?

Peak memory use is mainly a matter of choosing the right dictionary size, and lzip is much better than xz in this regard because 'xz -9' always uses a 64 MiB dictionary (even to compress a very small file), while lzip automatically uses the smallest possible dictionary size for each file.
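The difference can be sketched numerically. Lzip never uses a dictionary larger than the file being compressed, down to its documented 4 KiB minimum; the following toy model of that choice is my own illustration, not lzip's actual code:

```python
def effective_dict_size(requested, file_size):
    """Toy model: clamp the requested dictionary to the input size.
    Lzip's real rounding rules differ; this only shows the idea."""
    MIN_DICT = 4 * 1024                      # lzip's smallest valid dictionary
    return max(MIN_DICT, min(requested, file_size))

# A 10 KiB file compressed at the maximum level (64 MiB requested):
effective_dict_size(64 * 1024 * 1024, 10 * 1024)   # -> 10240 bytes, not 64 MiB
```

In this model a small file gets a small dictionary automatically, while 'xz -9' would allocate the full 64 MiB dictionary regardless of the input size.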

Xz is the only compressor that caused problems in the 'dist' target of automake. This is why it is the only compressor that automake does not invoke with option '-9' by default. Once again, a wise person would just use lzip; there is no reason to waste 674 MiB to compress a small file as xz does.

