gzip --force bug
From: Mark Adler
Subject: gzip --force bug
Date: Tue, 2 Feb 2010 08:21:54 -0800
bug-gzip,
I got a report of a behavior of gzip that is not replicated in pigz. In the
process of investigating that, I found a bug in gzip (all versions including
1.4). Here's the deal.
The behavior is that if you use --force and --stdout with --decompress, gzip
will behave like cat if it doesn't recognize any compressed data magic headers.
This is so that zcat can act as a replacement for cat, automatically detecting
and decompressing compressed data. (pigz doesn't currently do that, which I
need to fix.) Another behavior of gzip is that it will decompress concatenated
gzip streams. Combining those two behaviors, gzip -cfd on a gzip stream
followed by non-gzip data should give you the decompressed data from the stream
followed by the non-gzip data copied.
gzip doesn't do that, at least not correctly.
What it does for a small example is write the decompressed data, write the
initial gzip stream without decompressing it (!), and then write the non-gzip
data. The stuff in the middle is the result of this code in gzip.c:
    } else if (force && to_stdout && !list) { /* pass input unchanged */
        method = STORED;
        work = copy;
        inptr = 0;
        last_member = 1;
    }
(By the way, the tabs should be removed from all of the gzip source code.)
The culprit is the "inptr = 0". It resets the input back to the beginning of
the current input buffer (wherever that happens to be) and copies from there.
That works fine if you start the input with non-gzip data, but messes up in the
case of non-gzip data after a gzip stream.
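To make the effect of the rewind concrete, here is a toy model in C. It is not gzip's code: passthrough(), member_end and rewind_to_zero are hypothetical names, and the whole input is assumed to fit in one buffer, with the first member_end bytes standing in for a complete gzip member.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model, not gzip's code: buf holds one input buffer in which the
 * first member_end bytes are a complete gzip member and the rest is
 * non-gzip data. Writes what the pass-through copy would emit into out
 * and returns the number of bytes written.
 */
static size_t passthrough(const unsigned char *buf, size_t insize,
                          size_t member_end, int rewind_to_zero,
                          unsigned char *out)
{
    /* rewind_to_zero models "inptr = 0"; the intended starting point
       is the first byte after the member that was just decompressed. */
    size_t inptr = rewind_to_zero ? 0 : member_end;
    size_t n = insize - inptr;

    memcpy(out, buf + inptr, n);
    return n;
}
```

With an 8-byte buffer whose first 4 bytes stand in for the member, the rewinding variant copies all 8 bytes, so the still-compressed member is written again ahead of the trailing data, while the other variant copies only the 4 trailing bytes.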
I have not developed a fix, since it is non-trivial. You can't just restore a
saved inptr, since it is possible for the two-byte magic header to be split on
a buffer boundary. That is, reading the first byte of the magic header empties
the input buffer, so that reading the second byte of the magic header fills the
input buffer, overwriting the first byte.
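The split can be reproduced with a small stand-alone reader. This is only a sketch under stated assumptions: struct reader, fill(), get_byte() and read_magic() are hypothetical and are not gzip's functions, and BUFSZ is made tiny on purpose. It also shows one possible direction, which is to keep the two bytes in a local array as they are read so they can still be written out after a refill. That is not a tested patch.

```c
#include <stddef.h>
#include <string.h>

#define BUFSZ 4  /* deliberately tiny so a split is easy to provoke */

struct reader {
    const unsigned char *src;    /* stand-in for the input file */
    size_t srclen, srcpos;
    unsigned char inbuf[BUFSZ];  /* the input buffer */
    size_t insize, inptr;        /* valid bytes, next unread index */
};

/* Refill inbuf from src, overwriting its previous contents. */
static int fill(struct reader *r)
{
    size_t n = r->srclen - r->srcpos;

    if (n == 0)
        return -1;               /* end of input */
    if (n > BUFSZ)
        n = BUFSZ;
    memcpy(r->inbuf, r->src + r->srcpos, n);
    r->srcpos += n;
    r->insize = n;
    r->inptr = 0;
    return 0;
}

/* Return the next input byte, refilling when the buffer is empty. */
static int get_byte(struct reader *r)
{
    if (r->inptr == r->insize && fill(r) != 0)
        return -1;
    return r->inbuf[r->inptr++];
}

/*
 * Read two candidate magic bytes, saving them in magic[] as they are
 * read. Returns 1 if a refill happened between the two bytes (the
 * first one is no longer in inbuf), 0 if not, -1 on end of input.
 */
static int read_magic(struct reader *r, unsigned char magic[2])
{
    int b0 = get_byte(r);
    int b1 = (b0 < 0) ? -1 : get_byte(r);

    if (b1 < 0)
        return -1;
    magic[0] = (unsigned char)b0;
    magic[1] = (unsigned char)b1;
    /* inptr == 1 means the second byte was the first of a new buffer. */
    return r->inptr == 1;
}
```

When read_magic() returns 1, the slot that held the first byte has been refilled, so no saved inptr can point at it any more; only the copy in magic[] still has it.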
If you want, I can try to come up with a patch for that, or you could have that
pleasure.
Mark