Re: gzip --force bug
From: Jim Meyering
Subject: Re: gzip --force bug
Date: Wed, 03 Feb 2010 06:30:44 +0100
Mark Adler wrote:
> I got a report of a behavior of gzip that is not replicated in pigz. In the
> process of investigating that, I found a bug in gzip (all versions including
> 1.4). Here's the deal.
>
> The behavior is that if you use --force and --stdout with --decompress, gzip
> will behave like cat if it doesn't recognize any compressed data magic
> headers. This is so that zcat can act as a replacement for cat,
> automatically detecting and decompressing compressed data. (pigz doesn't
> currently do that, which I need to fix.) Another behavior of gzip is that it
> will decompress concatenated gzip streams. Combining those two behaviors,
> gzip -cfd on a gzip stream followed by non-gzip data should give you the
> decompressed data from the stream followed by the non-gzip data copied.
>
> gzip doesn't do that, at least not correctly.
>
> What it does for a small example is write the decompressed data, write the
> initial gzip stream without decompressing it (!), and then write the non-gzip
> data. The stuff in the middle is the result of this code in gzip.c:
>
>   } else if (force && to_stdout && !list) { /* pass input unchanged */
>       method = STORED;
>       work = copy;
>       inptr = 0;
>       last_member = 1;
>   }
>
> (By the way, the tabs should be removed from all of the gzip source code.)
>
> The culprit is the "inptr = 0". It resets the input back to the beginning of
> the current input buffer (wherever that happens to be) and copies from there.
> That works fine if you start the input with non-gzip data, but messes up in
> the case of non-gzip data after a gzip stream.
>
> I have not developed a fix, since it is non-trivial. You can't just restore
> a saved inptr, since it is possible for the two-byte magic header to be split
> on a buffer boundary. That is, reading the first byte of the magic header
> empties the input buffer, so that reading the second byte of the magic header
> fills the input buffer, overwriting the first byte.
>
> If you want, I can try to come up with a patch for that, or you could have
> that pleasure.
Thanks for the report.
I'm adding a test to exercise that, currently expected to fail:
From 026eb1815d339e73102e3ae5a61543049ae9423a Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 2 Feb 2010 08:19:36 +0100
Subject: [PATCH 1/2] gzip -cdf mishandles some concatenated input streams: test it
* tests/mixed: Exercise "gzip -cdf" bug.
* Makefile.am (XFAIL_TESTS): Add it.
Mark Adler reported the bug.
---
Makefile.am | 3 +++
tests/mixed | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 0 deletions(-)
create mode 100644 tests/mixed
diff --git a/Makefile.am b/Makefile.am
index b4e75fc..4263b1d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -99,6 +99,9 @@ check-local: $(FILES_TO_CHECK) $(bin_PROGRAMS) gzip.doc.gz
done
@echo 'Test succeeded.'
+XFAIL_TESTS = \
+ tests/mixed
+
TESTS = \
tests/helin-segv \
tests/hufts \
diff --git a/tests/mixed b/tests/mixed
new file mode 100644
index 0000000..0ca8e80
--- /dev/null
+++ b/tests/mixed
@@ -0,0 +1,52 @@
+#!/bin/sh
+# Ensure that gzip -cdf handles mixed compressed/not-compressed data
+# Before gzip-1.5, it would produce invalid output.
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+# limit so don't run it by default.
+
+if test "$VERBOSE" = yes; then
+ set -x
+ zgrep --version
+fi
+
+: ${srcdir=.}
+. "$srcdir/tests/init.sh"
+
+printf 'xxx\nyyy\n' > exp2 || framework_failure
+printf 'aaa\nbbb\nccc\n' > exp3 || framework_failure
+
+fail=0
+
+(echo xxx; echo yyy) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+# Uncompressed input, followed by compressed data.
+(echo xxx; echo yyy|gzip) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+# Compressed input, followed by regular (not-compressed) data.
+(echo xxx|gzip; echo yyy) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+(echo xxx|gzip; echo yyy|gzip) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+Exit $fail
--
1.7.0.rc1.167.gdb08