Re: spec compliance: header CRC?
From: Greg Roelofs
Subject: Re: spec compliance: header CRC?
Date: Mon, 5 Jul 2010 09:42:54 -0700

Hi Paul,
>> Are there any plans to address this?
> Not until you mentioned it, but I just now installed a patch for this;
> please see the end of this message.
Awesome! Many thanks.
> Can you please help out by supplying some test cases?
I can certainly provide one, currently part of a not-quite-final patch at
https://issues.apache.org/jira/browse/HADOOP-6835
http://issues.apache.org/jira/secure/attachment/12448469/HADOOP-6835.v5.trunk-hadoop-common.patch
I've copied it here:
http://gregroelofs.com/test/testCompressThenConcat.txt.gz
This was hand-built, but I've verified that zlib > 1.2.1.2 reads it
correctly--that is, using the regular zlib inflateInit2() API, not
the gz* one, which ignores the CRC but otherwise also handles it OK.
(Versions prior to 1.2.1.2 forgot to compute the CRC on the terminating
NUL bytes of the filename and comment fields.) I don't recall if I've
verified it yet with Sun's JDK--I've made myself a note to do so
sometime this week. (They're not exactly swift on gzip-related fixes
in any case. ;-) http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4691425)
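
For reference, the header check discussed above can be sketched in Python.
This is only an illustration based on the RFC 1952 header layout, not the
code from the gzip or zlib patches; header_crc_ok and the hand-built sample
header are hypothetical names made up for the example. The point it shows
is that the CRC16 covers every header byte before it, including the NUL
terminators of the filename and comment fields.

```python
import struct
import zlib

# Flag bits from RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def header_crc_ok(data: bytes) -> bool:
    """Return True if the first member's header CRC16 matches its header."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    if not flags & FHCRC:
        raise ValueError("header has no FHCRC field")
    pos = 10  # fixed part: magic, CM, FLG, MTIME, XFL, OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    for flag in (FNAME, FCOMMENT):
        if flags & flag:
            # Step past the terminating NUL so that it is covered by the CRC.
            pos = data.index(b"\x00", pos) + 1
    (stored,) = struct.unpack_from("<H", data, pos)
    # The CRC16 is the low 16 bits of the CRC-32 of all preceding header bytes.
    return stored == (zlib.crc32(data[:pos]) & 0xFFFF)


# Hypothetical sample: a header with a filename and a header CRC, no body.
header = b"\x1f\x8b\x08" + bytes([FHCRC | FNAME]) + b"\x00" * 6 + b"a.txt\x00"
sample = header + struct.pack("<H", zlib.crc32(header) & 0xFFFF)
print(header_crc_ok(sample))
```

A decoder that leaves the terminating NULs out of the running CRC would
compute a different value for the same header and reject it.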
> Are there
> examples on the net of gzip files that gzip 1.4 won't decompress, due
> to this problem?
Not to my knowledge; we've got a bit of a chicken-and-egg problem there,
insofar as most people avoid generating gzip'd files that can't be decoded
with standard gzip. Neither the JDK nor zlib minigzip provides a mechanism
to generate arbitrary header fields, AFAIK. Possibly something like 7-Zip
does, but I suspect not.
> If not, can you please generate some? As things
> stand, I feel that I haven't tested it in any real-world way. Thanks.
I'll try to do so, yes. We're putting together a more extensive test plan
for the Hadoop patch, and the ideal suite would include all possible header
combos (with/without extra field, filename, comment, CRC). I'm not sure
I'll have time--we're approaching an internal code freeze shortly--but I'll
do what I can.
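
A rough sketch of how such a suite could be generated, assuming Python and
its zlib module are acceptable for building test data (this is not the
Hadoop test code). build_member, all_header_combos, and the extra-field
subfield ID "Zz" are hypothetical and chosen only for illustration. It
writes one gzip member per combination of extra field, filename, comment,
and header CRC, 16 in all.

```python
import itertools
import struct
import zlib

# Flag bits from RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def build_member(payload, extra=None, name=None, comment=None, hcrc=False):
    """Hand-build a single gzip member with the requested optional fields."""
    flags = ((FEXTRA if extra is not None else 0)
             | (FNAME if name is not None else 0)
             | (FCOMMENT if comment is not None else 0)
             | (FHCRC if hcrc else 0))
    # magic, CM=8 (deflate), FLG, then MTIME/XFL/OS all zero
    header = b"\x1f\x8b\x08" + bytes([flags]) + b"\x00" * 6
    if extra is not None:
        header += struct.pack("<H", len(extra)) + extra
    if name is not None:
        header += name + b"\x00"
    if comment is not None:
        header += comment + b"\x00"
    if hcrc:
        # CRC16 = low 16 bits of the CRC-32 over the header so far.
        header += struct.pack("<H", zlib.crc32(header) & 0xFFFF)
    raw = zlib.compressobj(wbits=-15)  # raw deflate, no zlib/gzip wrapper
    body = raw.compress(payload) + raw.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def all_header_combos(payload):
    """Return {label: gzip bytes} for all 16 optional-field combinations."""
    fields = ("extra", "name", "comment", "hcrc")
    cases = {}
    for on in itertools.product((False, True), repeat=4):
        label = "-".join(f for f, used in zip(fields, on) if used) or "plain"
        cases[label] = build_member(
            payload,
            extra=b"Zz\x02\x00hi" if on[0] else None,  # made-up subfield
            name=b"test.txt" if on[1] else None,
            comment=b"header CRC test" if on[2] else None,
            hcrc=on[3],
        )
    return cases


cases = all_header_combos(b"hello, gzip\n")
for label, blob in cases.items():
    # wbits=31 asks zlib to expect a gzip wrapper.
    assert zlib.decompress(blob, wbits=31) == b"hello, gzip\n", label
```

Each blob could be written to its own .gz file and fed to the decoder under
test; concatenating several of them would also exercise multi-member input.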
> Here's the patch. I'll add a NEWS entry shortly.
Thank you! I'll also test this at work this week--it will make my own
testing easier.
Greg