--- Begin Message ---
Subject: gzip huge filesize problem
Date: Sat, 15 Aug 2015 23:42:20 +0200
User-agent: Roundcube Webmail/1.1-git
Hi Gzip team,
I compressed a 500 GB file (a raw HDD image) using gzip 1.6 under Ubuntu
14.10 (64-bit). Uncompressing it again yields a 500 GB file
(checked).
But "gzip -l" shows a wrong (far too small) uncompressed size and a wrong
ratio (-5167%).
Below you can see some details, but I think it is a general bug.
Thanks for help, Alexander
gzip -l asus.gz
   compressed  uncompressed    ratio  uncompressed_name
  99630975185    1891655680  -5166.9% asus
gzip --version
gzip 1.6
Linux myname 3.16.0-43-generic #58-Ubuntu SMP Fri Jun 19 11:04:02 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
The two files (compressed about 93 GiB, uncompressed about 466 GiB, i.e. 500 GB):
-rwxrwx--- 1 root plugdev 99630975185 Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 500107862016 Aug 14 09:00 sdc.raw
-rwxrwx--- 1 root plugdev 93G Aug 15 21:39 asus.gz
-rwxrwx--- 1 root plugdev 466G Aug 14 09:00 sdc.raw
--- End Message ---
--- Begin Message ---
Subject: Re: bug#21270: gzip huge filesize problem
Date: Sun, 16 Aug 2015 21:44:54 -0600
tags 21270 notabug
thanks
On Sun, Aug 16, 2015 at 1:58 AM, Mark Adler <address@hidden> wrote:
> Alexander,
>
> Thank you for your report. This is a well-known limitation of the gzip
> format. The -l function makes use of the uncompressed length stored in the
> last four bytes of a gzip stream. Therein lies the rub, since four bytes can
> represent no more than 4 GiB - 1, so the stored value is the uncompressed
> length modulo 2^32.
>
> There is another problem with that approach, in that a valid gzip file may
> consist of a series of concatenated gzip streams, in which case -l will
> report only on the last one. In that case, even if the entire stream
> decompresses to less than 4 GB, the result will still be incorrect.
>
> The only reliable way to determine the uncompressed size of a gzip file is to
> decompress the entire file (which can be done without storing the result).
> This in fact is what "pigz -lt file.gz" does. It will correctly report the
> uncompressed length, but takes much longer than "gzip -l".
>
> -l remains useful however in most cases, so it remains a gzip and pigz option.
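
The numbers in the original report are consistent with this explanation. As a quick check, the short Python sketch below takes the two file sizes from the report and derives what "gzip -l" would be expected to print. The ratio formula (1 - compressed/uncompressed) is an assumption about how gzip computes the column; it happens to reproduce the reported value.

```python
# File sizes in bytes, copied from the "ls -l" output in the report.
compressed_size = 99630975185     # asus.gz
uncompressed_size = 500107862016  # sdc.raw

# The gzip trailer keeps only the low 32 bits of the input length
# (the ISIZE field), so anything above 4 GiB - 1 wraps around.
isize = uncompressed_size % 2**32

# Assumed ratio formula: fraction of space saved, as a percentage.
ratio_percent = (1 - compressed_size / isize) * 100

print(isize)                    # 1891655680
print(f"{ratio_percent:.1f}%")  # -5166.9%
```

Both values match the "gzip -l" line in the report, so the output is the expected result of the 32-bit length field rather than a sign of a damaged archive.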
Thank you for replying, Mark.
I've marked this as "notabug" with the in-line comment above, and am
closing the auto-created issue with the "-done" part of the debbugs
email recipient address.
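
For readers without pigz at hand, the approach Mark describes (decompress everything, keep only a byte count) can be sketched in Python with the standard zlib module. This is an illustrative sketch, not how gzip or pigz implement it: the function names gzip_uncompressed_size and stored_isize are made up for this example, and the code assumes a well-formed file with no trailing garbage after the last member.

```python
import struct
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tell zlib to expect a gzip header and trailer


def stored_isize(path):
    """Return the 32-bit length field in the last four bytes of the file."""
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 = seek relative to end of file
        return struct.unpack("<I", f.read(4))[0]  # little-endian unsigned


def gzip_uncompressed_size(path, chunk_size=64 * 1024):
    """Count decompressed bytes over all concatenated members, discarding output."""
    total = 0
    decomp = zlib.decompressobj(GZIP_WBITS)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                total += len(decomp.decompress(chunk))
                if decomp.eof:
                    # One member ended; the rest of the chunk starts the next one.
                    chunk = decomp.unused_data
                    decomp = zlib.decompressobj(GZIP_WBITS)
                else:
                    chunk = b""
    return total
```

On a file made of several concatenated gzip streams, gzip_uncompressed_size returns the sum over all of them, while stored_isize returns only the length of the last stream modulo 2^32, which is the figure "gzip -l" relies on. The full scan reads the whole file, so on a 93 GiB archive it takes correspondingly long.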
--- End Message ---