bug#29089: Truncated size of big file
From: Mark Adler
Subject: bug#29089: Truncated size of big file
Date: Tue, 31 Oct 2017 11:20:29 -0700
Alex,
This is inherent in the gzip format and is not really a bug in gzip, though
gzip could notice the problem and avoid displaying a large negative
compression ratio.
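The gzip format stores the uncompressed length at the end of the file in a
four-byte trailer field (ISIZE), which can only represent values up to 2^32-1;
larger sizes are stored modulo 2^32. So what you are seeing is the low 32 bits
of 18962535424, which is indeed 1782666240 (18962535424 - 4 * 2^32). When gzip
uses that truncated value to compute a compression ratio, it gets a nonsensical
result: a compressed file about twice the reported uncompressed size yields
-104.5%.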
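To make the arithmetic concrete, here is a minimal Python sketch, assuming a single-member gzip file. It reads the 4-byte little-endian ISIZE trailer (RFC 1952 defines it as the uncompressed size modulo 2^32) and reproduces both the truncated size and the bogus ratio from the report below. The helper names are hypothetical, and approx_ratio is only an approximation of what "gzip -l" prints (gzip also discounts header and trailer bytes, which barely changes the result here).

```python
import os
import struct

def read_isize(path):
    """Return the ISIZE trailer of a gzip file: the uncompressed length
    modulo 2**32. For a multi-member file this is only the LAST member's size."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE occupies the final 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian uint32
    return isize

def truncated_size(true_size):
    # What a 32-bit ISIZE field ends up holding for a given real size.
    return true_size % 2**32

def approx_ratio(compressed, uncompressed):
    # Space savings in percent, roughly as "gzip -l" reports it.
    return 100.0 * (uncompressed - compressed) / uncompressed

real = 18962535424
shown = truncated_size(real)
print(shown)                                        # 1782666240
print(f"{approx_ratio(3645968323, shown):.1f}%")    # -104.5%
print(f"{approx_ratio(3645968323, real):.1f}%")     # 80.8%
```

Pointed at the archive from this report, read_isize would be expected to return the same 1782666240 that "gunzip -l" shows, since it reads the same truncated field.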
Unfortunately, the only way to get the real uncompressed length and compute a
real ratio is to decompress the entire file. (In fact, pigz does this with
"pigz -lt", which tests the entire file without storing the result and reports
the correct uncompressed size and compression ratio. "pigz -l" shows the same
wrong size that "gzip -l" does for uncompressed sizes of 4 GiB (2^32 bytes) or
more, though it reports "unk" for questionable ratios, i.e. expansions of the
data beyond what would be expected for incompressible data.)
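A rough Python analogue of the decompress-and-count approach described above: stream-decompress the whole file, discard the output, and count the bytes. This is a sketch using Python's standard gzip module, not how pigz is implemented, and the function names are hypothetical. It is exactly as slow as the text warns, since every byte must be inflated.

```python
import gzip
import io
import os

CHUNK = 1 << 20  # read 1 MiB of decompressed data at a time

def true_uncompressed_size(source):
    """Decompress everything (all members), discard the data, count bytes.
    `source` may be a path or a binary file object. Unlike the ISIZE
    trailer, this count is not truncated to 32 bits."""
    total = 0
    with gzip.open(source, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total

def report(path):
    # Print columns similar to "gzip -l", but with the real size.
    compressed = os.path.getsize(path)
    uncompressed = true_uncompressed_size(path)
    ratio = 100.0 * (uncompressed - compressed) / uncompressed if uncompressed else 0.0
    print(f"{compressed:>15} {uncompressed:>15} {ratio:6.1f}% {path}")

if __name__ == "__main__":
    # Quick in-memory check: gzip.open also accepts a file object.
    sample = io.BytesIO(gzip.compress(b"x" * 5_000_000))
    print(true_uncompressed_size(sample))  # 5000000
```

Reading fixed-size chunks keeps memory use flat regardless of file size, which matters for an 18 GB payload like the one in this report.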
Mark
> On Oct 31, 2017, at 10:59 AM, Alex Peshkoff <address@hidden> wrote:
>
> Before decompressing a copy of the database, I decided to take a look at its
> size:
>
> localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
>          compressed        uncompressed  ratio uncompressed_name
>          3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK
>
> The uncompressed size is reported as 1.7 GB, which is clearly as unreal as
> the -104.5% compression ratio.
>
> The actual size after decompressing is:
>
> localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
> localhost stg # ls -l SWHTOROLT_20171019.GBK
> -rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK
>
> Luckily I had enough disk space, but I won't attach the problematic archive
> to this email; I suppose it's easier to reproduce this locally ;)
>
> Alex.