Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip

lzip-bug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip

From:	Antonio Diaz Diaz
Subject:	Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Date:	Thu, 18 May 2017 22:07:54 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Damir wrote:

But calculating the CRC is just a small part of the total decompression
time. So, even if you accelerate it, the total speed gain is small.
(Probably smaller than 5%). For compression the speed gain is even smaller.


I can cite your own lzip benchmark, when comparing uncompression peformance
of lunzip vs busybox unxz, enabling crc in unxz (by using xz with crc32)
gives performance penalty of 16.7% (9.723s vs 8.331s). That's more
convincing number than 5%.

Take into account that busybox, gzip and lzip use the classical Sarwatealgorithm because it is simple, safe, and runs on all architectures. Inthe case of lzip, the CRC calculation takes about 10-19% of the totaldecompression time (depending on dictionary size and compression ratio).

Maybe the fair thing to do is to compare hardware acceleration withsoftware acceleration. This paper[1] states that:

"First, a 'slicing-by-4' algorithm doubles the performance of existingsoftware-based, table-driven CRC implementations based on the Sarwate(August 1988) algorithm while using a 4K cache footprint. Second, a'slicing-by-8' algorithm triples the performance of existingsoftware-based CRC implementations while using an 8K cache footprint."

[1]http://www.researchgate.net/publication/4167696_A_systematic_approach_to_building_high_performance_software-based_CRC_generators

I do not have any figures about how much hardware accelerationaccelerates the calculation of the CRC, but if the paper above is right,slicing-by-8 can reduce the software calculation time to a 3.5-7% of thetotal decompression time, leaving little margin for further improvements.

CRC32C has a slightly larger Hamming distance than ethernet CRC32 for
"small" packet sizes (see pags 3,4 of [1]). But beyond some size perhaps
not much larger than 128 KiB, both have the same HD of 2. For files
larger than that (uncompressed) size, there is little diference between
both CRCs.


That's not accurate at all. According to Koopman's crc32 zoo, crc32c gives
hd=4 on 2 gigabits, while ethernet crc is 92 kilobits.
http://users.ece.cmu.edu/~koopman/crc/crc32.html

You are right, the paper I cited was old and incomplete. But if Iunderstand CRC Zoo correctly as explained at[1], from 2 gigabits to 4gigabits ethernet CRC32 has a HD of 3 while CRC32C has a HD of 2. Thesesizes (268 MB to 537 MB) are also well within typical lzip usage.

Also, at a size of 4294967263 bits (537 MB), ethernet CRC32 has beenconfirmed to feature the longest dataword at HD 3 for all possible CRC32polynomials. Twice that of CRC32C.


[1] http://users.ece.cmu.edu/~koopman/crc/notes.html

That's a valid point. There is no good error model describing corruption in
uncompressed file resulting from typical errors in compressed files (BER,
burst errors, nand page error, hdd sector error). Still, better HD on
typical sizes is preferred.

In my experience with LZMA compression, there are two kinds of errorsnot caught by the decoder:1) Without error multiplication. The error affects a single byte and isguaranteed to be detected by both CRCs (ethernet and CRC32C).2) With error multiplication. The total number of bits flipped is largerthan 4, which both CRCs can detect about equally well.

I have performed many millions of trial decompressions and never foundany indication that the current CRC is not perfectly adequate.

  New decompressor can decompress both old files and new ones, and old
decompressor can decompress new ones, but can't check its integrity. Large
downside, but not very large.


It does not work like that:

$ lzip -t v2.lz
  v2.lz: Version 2 member format not supported.


Best regards,
Antonio.

[Prev in Thread]

Current Thread

[Next in Thread]

[Lzip-bug] Selection of CRC32 Polynomial for lzip, Damir, 2017/05/17
- Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip, Antonio Diaz Diaz, 2017/05/17
  - Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip, Damir, 2017/05/18
    - Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip, Antonio Diaz Diaz <=
    - Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip, Damir, 2017/05/19
    - Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip, Antonio Diaz Diaz, 2017/05/19

Prev by Date: Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Next by Date: Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Previous by thread: Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Next by thread: Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Index(es):
- Date
- Thread