RFC: fixing the 32-bit size and time limits in gzip file format
From: Paul Eggert
Subject: RFC: fixing the 32-bit size and time limits in gzip file format
Date: Mon, 16 Aug 2010 02:25:47 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6

The most frequently reported bug for GNU gzip is that gzip -l reports
sizes modulo 2**32 instead of full sizes. This is because the gzip
format specifies a 4-byte (32-bit) size field.
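
To illustrate the wraparound, here is a minimal Python sketch. The helper names stored_isize and isize_for are made up for this example and are not part of gzip; the only format fact relied on is that RFC 1952 stores ISIZE as the last four bytes of a member, in little-endian order.

```python
import gzip
import struct


def stored_isize(gz_bytes):
    """Return ISIZE, the last 4 bytes of a single gzip member (little-endian)."""
    (isize,) = struct.unpack("<I", gz_bytes[-4:])
    return isize


def isize_for(true_size):
    """Value that a 4-byte ISIZE field holds for a given uncompressed size."""
    return true_size % 2**32


data = b"x" * 1000
print(stored_isize(gzip.compress(data)))  # 1000
print(isize_for(5 * 2**30))               # a 5 GiB input is recorded as 1073741824
```

Any tool that reads only this trailer field cannot tell a 5 GiB input from a 1 GiB one.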

A similar problem in the gzip format is that it supports only nonzero
32-bit time stamps, which limits it to the range from 1970-01-01
00:00:01 through 2106-02-07 06:28:15 UTC. OK, so this is not as
pressing a bug, but it wouldn't hurt to fix it while we're at it.
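
The two endpoints of that range can be checked with a short Python sketch; mtime_to_utc is a hypothetical helper written for this illustration, not a gzip function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def mtime_to_utc(mtime):
    """Convert a gzip MTIME value (seconds since the epoch) to a UTC datetime."""
    return EPOCH + timedelta(seconds=mtime)


print(mtime_to_utc(1))          # 1970-01-01 00:00:01+00:00
print(mtime_to_utc(2**32 - 1))  # 2106-02-07 06:28:15+00:00
```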

I am thinking that we should fix both problems by putting full sizes
and time stamps into the header, as follows:
* If the file size is 2**32 or larger, gzip should emit an extra field
that records the size divided by 2**32 (discarding fractions). gzip
-l should read this field when reporting the size.
* We want to do this in a way that is compatible with all the
other gzip implementations out there, including old versions of GNU
gzip. So, we use the already-existing mechanism for extra fields,
namely FLG.FEXTRA as per RFC 1952. We use SI1='H', SI2='S' (this is
short for High-order bits of the Size). LEN is the length of the
high-order bits field, and the field's value contains the high-order
bits, represented as usual in little-endian order. A missing HS
field is treated as zero.
* Similarly, we use SI1='H', SI2='M' (High-order Modification time)
for the high-order bits of the modification time, when a time stamp
is less than 1 or greater than 2**32 - 1. There are a few extra
goodies here, though. If the leading bit of the high-order time
field is 1, then the entire time stamp (including the lower order
bits) is treated as a negative number, using two's complement.
Also, if the high-order bits are present but are all zero, the time
stamp is considered to be zero rather than missing.
* This approach will allow us to represent sizes up to about 2**524280
(LEN counts bytes, and the 2-byte XLEN of RFC 1952 caps the whole extra
field at 65535 bytes), which should be enough for quite some time.
Similarly, representable times would range from about 2**524279 seconds
before 1970 to 2**524279 seconds after 1970, which would handle all
file-system formats that I know of.
* This approach is backward-compatible with older versions of gzip,
with any decompressor that conforms to Internet RFC 1952, and with
all implementations of gzip decompressors that I know of.
* This approach does not address the issue of sub-second time stamp
resolution, as I thought that would make the proposal too complicated.
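
The proposal above can be sketched in Python as follows. This is only one possible reading of the scheme, not code from GNU gzip: all function names are hypothetical, the high-order bits are written with the fewest bytes that fit (the proposal does not fix a length), and "leading bit" is taken to mean the most significant bit of the little-endian value. The subfield layout (SI1, SI2, 2-byte LEN, data) and the FLG.FEXTRA bit come from RFC 1952.

```python
import gzip
import struct
import zlib

LOW_MASK = 2**32 - 1


def _subfield(si, payload):
    # RFC 1952 subfield: SI1, SI2, 2-byte little-endian LEN, then LEN data bytes.
    return si + struct.pack("<H", len(payload)) + payload


def encode_size(size):
    """Return (ISIZE, extra bytes); no HS subfield is emitted below 2**32."""
    high = size >> 32
    if high == 0:
        return size, b""
    payload = high.to_bytes((high.bit_length() + 7) // 8, "little")
    return size & LOW_MASK, _subfield(b"HS", payload)


def encode_mtime(t):
    """Return (MTIME, extra bytes); HM is emitted only outside 1 .. 2**32 - 1."""
    if 1 <= t <= LOW_MASK:
        return t, b""
    high = t >> 32  # floor shift, so negative times keep their sign here
    # One byte more than bit_length // 8 always leaves room for the sign bit.
    payload = high.to_bytes(high.bit_length() // 8 + 1, "little", signed=True)
    return t & LOW_MASK, _subfield(b"HM", payload)


def parse_subfields(extra):
    """Split an FEXTRA payload into {b'XY': data}."""
    fields, pos = {}, 0
    while pos + 4 <= len(extra):
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        fields[extra[pos:pos + 2]] = extra[pos + 4:pos + 4 + length]
        pos += 4 + length
    return fields


def decode_size(isize, extra):
    # A missing HS field is treated as zero.
    high = int.from_bytes(parse_subfields(extra).get(b"HS", b""), "little")
    return (high << 32) | isize


def decode_mtime(mtime, extra):
    payload = parse_subfields(extra).get(b"HM")
    if payload is None:
        return mtime or None  # plain RFC 1952: MTIME 0 means "no time stamp"
    # If HM is present, even all zero, the time stamp is a real value.
    high = int.from_bytes(payload, "little", signed=True)
    return (high << 32) | mtime


def gzip_member(data, extra, mtime=0):
    """Build one gzip member by hand with FLG.FEXTRA (0x04) set."""
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, mtime, 0, 255)
    deflater = zlib.compressobj(wbits=-15)  # raw deflate stream, no zlib wrapper
    body = deflater.compress(data) + deflater.flush()
    trailer = struct.pack("<II", zlib.crc32(data), len(data) % 2**32)
    return header + struct.pack("<H", len(extra)) + extra + body + trailer


isize, extra = encode_size(5 * 2**30)
print(isize, extra)  # 1073741824 b'HS\x01\x00\x01'

low, hm = encode_mtime(2**33)
# An existing decompressor (here Python's gzip module) skips the extra field.
assert gzip.decompress(gzip_member(b"hello", hm, low)) == b"hello"
```

The last two lines check the backward-compatibility claim against one existing decoder only; they do not say anything about other implementations.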
Comments are welcome; please CC: to <address@hidden>.