bug-gzip

Re: RFC: fixing the 32-bit size and time limits in gzip file format


From: Paul Eggert
Subject: Re: RFC: fixing the 32-bit size and time limits in gzip file format
Date: Thu, 19 Aug 2010 23:54:54 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6

On 08/17/10 07:03, Mark Adler wrote:

> it does not seem possible to both have interoperability and to
> always provide a correct uncompressed length in the stream.

In that case I'm afraid that we need to give up on the goal of always
providing a correct uncompressed length.  At this point the gzip
format is so widely used that an incompatible change to it would cause
far more trouble than the relatively minor problem of gzip -l
reporting the wrong length.  Instead, it might be better to leave the
format alone, and to change gzip -l so that it decompresses the data
in order to report the uncompressed data length.
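To make the trade-off concrete, here is a minimal Python sketch (not gzip's own code) that compares the length stored in the trailer with the length obtained by actually decompressing. It relies on the RFC 1952 layout, where the last four bytes of a member are ISIZE, the uncompressed size modulo 2^32, stored little-endian. The function names are made up for illustration.

```python
import gzip
import io
import struct


def trailer_isize(data: bytes) -> int:
    """Return the ISIZE field: the last 4 bytes of the file, little-endian.

    This is what a trailer-only listing can see. It is the size modulo 2**32
    of the final member only.
    """
    if len(data) < 4:
        raise ValueError("too short to contain a gzip trailer")
    return struct.unpack("<I", data[-4:])[0]


def true_uncompressed_length(data: bytes, chunk_size: int = 65536) -> int:
    """Decompress everything and count the output bytes.

    GzipFile reads through concatenated members, so the total covers the
    whole file, at the cost of a full decompression pass.
    """
    total = 0
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

For a file made of two concatenated members, the two functions disagree: the trailer reflects only the last member, while the decompressing count covers both.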

On 08/17/10 07:40, Mark Adler wrote:

> The format of the extra field at the end could be similar to the one
> in the header but with smaller sizes and fewer id's:
> 
> n (n == 0 permitted) occurrences of:
> 
>    1-byte sub-field id, 1-byte length, then that many bytes
> 
> followed by:
> 
>    1-byte end-of-extra-field id, 1-byte total length of extra field
>    including following crc, 2-byte crc of entire extra field except
>    of course the crc.

That 255-byte limit for the total length of all the sub-fields is a bit
tight; perhaps that should be increased to a 2-byte total length?
Otherwise this looks fine, except for the major problem of its being
an incompatible change.
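As an illustration of the layout quoted above and of where the 255-byte ceiling comes from, here is a rough Python sketch. Several details are assumptions on my part and not part of the proposal: the end-of-extra-field id value (0 here), little-endian byte order, and the 2-byte CRC being the low 16 bits of CRC-32 (the convention RFC 1952 uses for the header CRC16).

```python
import struct
import zlib

END_ID = 0x00  # hypothetical end-of-extra-field id; the proposal names no value


def crc16(data: bytes) -> int:
    """Low 16 bits of CRC-32 (assumed, by analogy with RFC 1952's header CRC16)."""
    return zlib.crc32(data) & 0xFFFF


def build_trailer_extra(subfields):
    """Encode (id, payload) pairs, then end id, total length, and CRC."""
    body = bytearray()
    for sf_id, payload in subfields:
        if len(payload) > 255:
            raise ValueError("sub-field payload exceeds 1-byte length")
        body += bytes([sf_id, len(payload)]) + payload
    total = len(body) + 4  # + end id, total-length byte, 2-byte CRC
    if total > 255:
        raise ValueError("extra field exceeds 1-byte total length")
    body += bytes([END_ID, total])
    return bytes(body) + struct.pack("<H", crc16(bytes(body)))


def parse_trailer_extra(buf: bytes):
    """Parse an extra field that ends exactly at the end of buf.

    The total length sits near the end, so a reader can start from the end
    of the file and step backward to the start of the field.
    """
    if len(buf) < 4:
        raise ValueError("buffer too short")
    end_id, total = buf[-4], buf[-3]
    (stored_crc,) = struct.unpack("<H", buf[-2:])
    if end_id != END_ID or total < 4 or total > len(buf):
        raise ValueError("malformed extra field")
    field = buf[len(buf) - total:]
    if crc16(field[:-2]) != stored_crc:
        raise ValueError("CRC mismatch")
    subfields, pos, end = [], 0, total - 4
    while pos < end:
        if pos + 2 > end:
            raise ValueError("truncated sub-field header")
        sf_id, length = field[pos], field[pos + 1]
        pos += 2
        if pos + length > end:
            raise ValueError("truncated sub-field payload")
        subfields.append((sf_id, field[pos:pos + length]))
        pos += length
    return subfields
```

With a 1-byte total that also counts the four bytes of terminator, length, and CRC, only 251 bytes remain for all sub-fields together, headers included. A 2-byte total would lift that ceiling to 65531 bytes under the same accounting.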

> There is another problem that could be solved with this, which is
> the inability to know about concatenated gzip streams in a file
> without decompressing.  Another sub-field in the extra field at the
> end could be the number of bytes back to the start of the current
> gzip stream.  Then you could step back through the headers and
> trailers of all of the gzip streams and find out what the
> uncompressed length *really* would be.

That would be nice, yes.

One possible way forward would be to add a new option to gzip, "-F
FORMAT" say, which specifies which format to use when generating a
gzip file.  The default would be "-F rfc-1952", which would result in
the current format, without the extra field at the end.  If the user
specified (say) "-F gnu-2010" gzip would generate the new format.
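
A rough sketch of how such an option might behave, written in Python rather than as a patch to gzip itself. The program name "gzip-sketch" and the helper names are invented for illustration; the option and format names are just the examples suggested above, and no such option exists in gzip today.

```python
import argparse

# Format name -> whether to write the proposed extra field at the end.
FORMATS = {
    "rfc-1952": False,  # current format, the default
    "gnu-2010": True,   # hypothetical new format
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gzip-sketch")
    parser.add_argument(
        "-F",
        dest="format",
        metavar="FORMAT",
        choices=sorted(FORMATS),
        default="rfc-1952",
        help="output format to generate (default: %(default)s)",
    )
    return parser


def wants_trailer_extra(argv) -> bool:
    """Return True if the chosen format adds the extra field at the end."""
    args = build_parser().parse_args(argv)
    return FORMATS[args.format]
```

Keeping "rfc-1952" as the default means existing scripts and pipelines would keep producing files that every current decompressor accepts; only users who opt in would get the new format.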



