bug#48424: bug in "gzip -lv gzip-file"
From: Adler, Mark
Subject: bug#48424: bug in "gzip -lv gzip-file"
Date: Fri, 14 May 2021 19:53:22 +0000
Robert,
No, it’s not that the gzip utility implementation is using the wrong size
integer. This is because the gzip utility is using the gzip-format trailer to
guess at the uncompressed length. That trailer has a four-byte length, which is
the uncompressed length of the last member modulo 2^32. Sometimes the guess is
wrong.
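To make the modulo behaviour concrete, here is a minimal Python sketch (not how gzip itself is written) that reads the eight-byte trailer, a CRC-32 followed by the four-byte length, both little-endian. The helper name trailer_fields is made up for this illustration.

```python
import gzip
import struct
import zlib


def trailer_fields(gz_bytes: bytes) -> tuple[int, int]:
    """Return (crc32, isize) from the last 8 bytes of a gzip stream.

    Hypothetical helper for illustration only.
    """
    crc32, isize = struct.unpack("<II", gz_bytes[-8:])
    return crc32, isize


payload = b"a" * 1000
compressed = gzip.compress(payload)
crc, isize = trailer_fields(compressed)
print(isize)                        # 1000
print(crc == zlib.crc32(payload))   # True

# The length field holds only 32 bits, so a 5 GiB input wraps around:
wrapped = (5 * 1024**3) % 2**32
print(wrapped)                      # 1073741824, i.e. exactly 1 GiB
```

The last line reproduces the 1073741824 that appears in Robert's listing below.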
The only way around this limitation, which is built into the gzip format, would
be to decode the entire file to determine the actual uncompressed length.
pigz will do this on request with the -lt option.
There is no way to both rapidly and reliably get the uncompressed length.
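As a rough sketch of the slow-but-reliable route, the following Python decodes the whole stream and counts the bytes. It relies on the standard gzip module, which reads through concatenated members; the function name true_uncompressed_length and the chunk size are assumptions made for this example, and this is not what pigz -lt does internally.

```python
import gzip
import io


def true_uncompressed_length(fileobj, chunk_size=1 << 20) -> int:
    """Decode an entire gzip stream and return the total output size.

    Hypothetical helper: reads in chunks so the output is never held
    in memory all at once.
    """
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Three members concatenated into one stream, as the gzip format allows.
stream = (
    gzip.compress(b"x" * 100)
    + gzip.compress(b"y" * 14)
    + gzip.compress(b"z" * 300)
)
total_length = true_uncompressed_length(io.BytesIO(stream))
print(total_length)  # 414
```

The cost is proportional to the uncompressed size, which is exactly the trade-off described above.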
What’s more, an uncompressed length of 4 GiB or more is not the only way for
gzip -l to be wrong. gzip streams can consist of multiple members, in which
case gzip -l will report the length from only the last member. Here is an
example, first correctly enumerated by pigz -ltv:
% pigz -ltv mult.gz
method check timestamp compressed original reduced name
gzip 8 66007dba Mar 21 2005 54405 152089 64.2% alice
gzip 8 b56c3f9d Mar 21 2005 13 14 7.1% <...>
gzip 8 8efc3b00 Mar 21 2005 71667 296960 75.9% <...>
gzip -lv will give information only from the last member:
% gzip -lv mult.gz
method crc date time compressed uncompressed ratio uncompressed_name
defla 8efc3b00 Feb 2 09:30 126145 296960 57.5% mult
pigz -lv looks only at the trailer for the crc and length, just like gzip, and
also gets it wrong:
% pigz -lv mult.gz
method check timestamp compressed original reduced name
gzip 8 8efc3b00 Mar 21 2005 126121 296960 57.5% alice
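The multi-member case can be imitated in a few lines of Python. This is only a sketch under the assumption that the stream contains nothing but well-formed members; member_lengths is a made-up name, and the three small members here merely stand in for the ones in mult.gz.

```python
import gzip
import struct
import zlib


def member_lengths(data: bytes) -> list[int]:
    """Decode each gzip member in turn and return its uncompressed length.

    Hypothetical helper; assumes no trailing garbage after the last member.
    """
    lengths = []
    while data:
        d = zlib.decompressobj(wbits=31)  # 31 selects the gzip wrapper
        out = d.decompress(data) + d.flush()
        lengths.append(len(out))
        data = d.unused_data              # bytes belonging to later members
    return lengths


stream = (
    gzip.compress(b"x" * 100)
    + gzip.compress(b"y" * 14)
    + gzip.compress(b"z" * 300)
)

lengths = member_lengths(stream)
print(lengths)            # [100, 14, 300]

# The final trailer describes only the last member.
_, trailer_isize = struct.unpack("<II", stream[-8:])
print(trailer_isize)      # 300
print(sum(lengths))       # 414
```

Reading only the final trailer yields 300, while the stream actually decodes to 414 bytes, which is the same kind of discrepancy as in the listings above.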
Mark
On May 14, 2021, at 11:01 AM, Robert Urban
<robert.urban@stromasys.com> wrote:
Hello,
gzip (at least my version, v1.10 running on Fedora 33) apparently uses an
unsigned 32-bit value when displaying the uncompressed size of a gzipped file.
This demonstrates the problem:
Create a 5GiB test file:
$ fallocate -l $((5*1024*1024*1024)) fatfile
Compress it:
$ gzip -c fatfile > fatfile.gz
List the contents:
$ gzip -lv fatfile.gz
method crc date time compressed uncompressed ratio uncompressed_name
defla 193838c3 May 14 19:53 5857306 1073741824 99.5% fatfile
As you can see, the value in the "uncompressed" column is exactly 1GiB.
Regards,
Robert Urban
Please cc me in replies, as I'm not a subscriber of the list.