

[Bug-gnubg] Downloads from gnubg.org appear to be compressed twice

From: Michael Petch
Subject: [Bug-gnubg] Downloads from gnubg.org appear to be compressed twice
Date: Tue, 08 Feb 2011 07:21:11 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20101207 Thunderbird/3.1.7


This is interesting. I have confirmed that the issue is not a matter of CRLF processing. What IS occurring with Firefox is that when the files are downloaded from www.gnubg.org/media/sources they appear to be compressed a second time, and the doubly compressed file is stored by Firefox.

Here is the Firefox HTTP request/response:

GET /media/sources/gnubg-source-SNAPSHOT-20110207.tar.gz HTTP/1.1
Host: www.gnubg.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0E)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.gnubg.org/media/sources/

HTTP/1.1 200 OK
Date: Tue, 08 Feb 2011 13:21:14 GMT
Server: Apache
Last-Modified: Mon, 07 Feb 2011 03:50:09 GMT
ETag: "10f3-d8a97b-49ba920ce1e40"
Accept-Ranges: bytes
Cache-Control: max-age=2419200
Expires: Tue, 08 Mar 2011 13:21:14 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=200
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-gzip

I'm beginning to think that chunked encoding combined with gzip is adding the extra compression. I took the Firefox download (which tar can't process directly) and asked gunzip to tell me about the contents of the file. It said:

gunzip -ltv ~mpetch/Desktop/gnubg-source-SNAPSHOT-20110207.tar.gz
method  crc      date  time   compressed  uncompressed  ratio  uncompressed_name
defla   54b9e403 Feb  8 04:54   14194658      14199163   0.0%  /home/mpetch/Desktop/gnubg-source-SNAPSHOT-20110207.tar
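The 14199163-byte "uncompressed" size above can be cross-checked against the ETag in the HTTP response. Assuming Apache's ETag layout of hex inode, size and modification time in microseconds (an assumption about this server's FileETag setting), the Python sketch below decodes it; `parse_apache_etag` is a hypothetical helper.

```python
import datetime


def parse_apache_etag(etag: str) -> dict:
    """Split an 'inode-size-mtime' ETag into integers; every field is hex."""
    # Assumption: the server's FileETag setting includes INode, Size and MTime,
    # which Apache emits in this order; other settings yield fewer fields.
    inode, size, mtime_us = etag.strip('"').split("-")
    return {"inode": int(inode, 16), "size": int(size, 16), "mtime_us": int(mtime_us, 16)}


# ETag copied from the HTTP response in this thread.
fields = parse_apache_etag('"10f3-d8a97b-49ba920ce1e40"')
modified = datetime.datetime.fromtimestamp(
    fields["mtime_us"] // 1_000_000, tz=datetime.timezone.utc
)
print(fields["size"], modified.isoformat())
```

The size field decodes to 14199163 bytes and the timestamp to 2011-02-07 03:50:09 UTC, the same instant as the `Last-Modified` header. If the layout assumption holds, the payload inside the outer gzip layer of the Firefox download is exactly as large as the file on the server.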

Note that the compression ratio is near 0% and that gunzip reports the archive's contents as a .tar file. A 0% compression ratio tells me that the Firefox download is not a gzipped tarball but a gzipped, gzipped tarball.

So I decided to gunzip the Firefox tarball, rename the resulting .tar file to .tar.gz, and then ask gunzip to tell me the contents. This is what I see:

(rename the file from tar to tar.gz)
mv gnubg-source-SNAPSHOT-20110207.tar gnubg-source-SNAPSHOT-20110207.tar.gz
(ask gunzip to tell me what is in the file)
gunzip -l gnubg-source-SNAPSHOT-20110207.tar
         compressed        uncompressed  ratio uncompressed_name
           14199163            23439360  39.4% gnubg-source-SNAPSHOT-20110207.tar

Sure enough, it now says the contents are a tarball, with 39.4% compression. So this file was clearly compressed twice! When you download with wget you get a .tar.gz that has been compressed only once. wget, though, does NOT use chunked encoding. I am going to guess that chunked encoding combined with Apache's on-the-fly gzip is somehow causing this.
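The manual gunzip-then-rename sequence amounts to peeling gzip layers until the data no longer starts with the gzip magic number `1f 8b` (RFC 1952); in a shell, `gunzip -c FILE | gunzip -c | tar tvf -` does the same for exactly two layers. The Python sketch below generalizes it; `peel_gzip_layers` is a hypothetical helper, and the `limit` guard is an arbitrary safety cap.

```python
import gzip
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def peel_gzip_layers(data: bytes, limit: int = 10) -> Tuple[bytes, int]:
    """Decompress repeatedly while the data still starts with the gzip magic number.

    Returns the innermost payload and the number of gzip layers removed.
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < limit:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


tar_like = b"not really a tar file\n" * 100             # hypothetical stand-in
served_to_firefox = gzip.compress(gzip.compress(tar_like))
inner_payload, n_layers = peel_gzip_layers(served_to_firefox)
print(n_layers)
```

For the Firefox-style download the loop strips two layers; for a correctly saved .tar.gz it strips one.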

Is it possible to configure Apache on gnubg.org not to compress files that are already compressed (i.e. not to compress anything ending in .gz)?
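With Apache's mod_deflate, the usual mechanism is to set the `no-gzip` environment variable for matching requests, e.g. following the documentation's `SetEnvIfNoCase Request_URI ... no-gzip dont-vary` pattern; the exact rule for gnubg.org depends on how its server is configured. The Python function below is only a hypothetical model of the decision such a rule encodes, with an assumed list of suffixes and content types.

```python
# Hypothetical lists; extend them to whatever archive types the site serves.
ALREADY_COMPRESSED_SUFFIXES = (".gz", ".tgz", ".bz2", ".xz", ".zip")
ALREADY_COMPRESSED_TYPES = {
    "application/x-gzip",
    "application/gzip",
    "application/x-bzip2",
    "application/zip",
}


def should_gzip_on_the_fly(path: str, content_type: str, accept_encoding: str) -> bool:
    """Return True only when on-the-fly gzip is both requested and worthwhile."""
    # Crude check that the client advertises gzip (q-values such as gzip;q=0 are ignored).
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    if "gzip" not in codings:
        return False
    # Skip anything that is already compressed, by name or by media type.
    if path.lower().endswith(ALREADY_COMPRESSED_SUFFIXES):
        return False
    return content_type.split(";")[0].strip().lower() not in ALREADY_COMPRESSED_TYPES
```

Under this policy the tarball request from this thread is served as-is, while HTML pages such as the `/media/sources/` listing can still be compressed.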

As for why tar can't decompress the download (with the 'z' option) while gunzip works, the answer appears to be simple. gunzip performs exactly one decompression, producing a true .tar.gz file (but with a .tar extension). tar appears to be smart enough to recognize that its input is a compressed tarball, so even without the 'z' option on the command line it can still decompress it. What tar can't do by itself is decompress the file twice.
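The same one-layer limit shows up in other tools with transparent decompression. The Python sketch below uses the standard library's `tarfile` in `"r:*"` mode as a stand-in for tar's autodetection (it is not GNU tar itself); the archive contents and helper names are hypothetical.

```python
import gzip
import io
import tarfile


def make_tar() -> bytes:
    """Build a tiny one-file tar archive in memory (stand-in for the gnubg tarball)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        body = b"hello\n"
        info = tarfile.TarInfo("README")
        info.size = len(body)
        tf.addfile(info, io.BytesIO(body))
    return buf.getvalue()


def can_list(data: bytes) -> bool:
    """True if tarfile's transparent-compression mode can list the archive."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            tf.getnames()
        return True
    except tarfile.TarError:
        return False


tar_bytes = make_tar()
single = gzip.compress(tar_bytes)  # what wget saved: one gzip layer
double = gzip.compress(single)     # what Firefox saved: two gzip layers

# Autodetection removes one layer only; after one manual gunzip it works again.
print(can_list(single), can_list(double), can_list(gzip.decompress(double)))
```

This mirrors the behaviour described above: the singly compressed file and the once-gunzipped Firefox file both open, the doubly compressed one does not.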

