From: | John Arbash Meinel |
Subject: | Re: [Gnu-arch-users] Free space wasting when handling binary files |
Date: | Thu, 24 Mar 2005 18:15:59 -0600 |
User-agent: | Mozilla Thunderbird 1.0 (Macintosh/20041206) |
Adrian Irving-Beer wrote:
> On Thu, Mar 24, 2005 at 12:42:46PM -0600, John Arbash Meinel wrote:
> > Remember, the tarball is compressed, so you do get a little bit of
> > delta compression even though there are 2 copies in there.
>
> Negative on that, AFAICT...
>
> % dd if=/dev/urandom of=file1 bs=1k count=1k
> 1024+0 records in
> 1024+0 records out
> 1048576 bytes transferred in 0.338545 seconds (3097302 bytes/sec)
> % cp file1 file2
> % ls -l file1 file2
> -rw-r--r-- 1 wisq wisq 1048576 2005-03-24 18:50 file1
> -rw-r--r-- 1 wisq wisq 1048576 2005-03-24 18:51 file2
> % tar -zcf files.tar.gz file1 file2
> % ls -l files.tar.gz
> -rw-r--r-- 1 wisq wisq 2097810 2005-03-24 18:51 files.tar.gz
It does depend on the size of the file versus the size of the compression window:

$ dd if=/dev/random of=file1 bs=1k count=10
10+0 records in
10+0 records out
$ cp file1 file2
$ tar czf files.tar.gz file1 file2
$ ls -l file1 file2 files.tar.gz
-rw-r--r-- 1 jameinel jameinel 10240 Mar 24 18:07 file1
-rw-r--r-- 1 jameinel jameinel 10240 Mar 24 18:07 file2
-rw-r--r-- 1 jameinel jameinel 10586 Mar 24 18:07 files.tar.gz

$ dd if=/dev/random of=file1 bs=1k count=100
100+0 records in
100+0 records out
$ cp file1 file2
$ tar czf files.tar.gz file1 file2
$ ls -l file1 file2 files.tar.gz
-rw-r--r-- 1 jameinel jameinel 102400 Mar 24 18:09 file1
-rw-r--r-- 1 jameinel jameinel 102400 Mar 24 18:09 file2
-rw-r--r-- 1 jameinel jameinel 205198 Mar 24 18:09 files.tar.gz

I don't know what the gzip window is, but the bzip2 window is 900k. (If we used bzip2 instead of gzip, the above holds true until the files get bigger than 500k.)

$ dd if=/dev/random of=file1 bs=1k count=100
100+0 records in
100+0 records out
$ cp file1 file2
$ tar cjf files.tar.bz2 file1 file2
$ ls -l file1 file2 files.tar.bz2
-rw-r--r-- 1 jameinel jameinel 102400 Mar 24 18:09 file1
-rw-r--r-- 1 jameinel jameinel 102400 Mar 24 18:09 file2
-rw-r--r-- 1 jameinel jameinel 128478 Mar 24 18:11 files.tar.bz2

$ dd if=/dev/random of=file1 bs=1k count=500
500+0 records in
500+0 records out
$ cp file1 file2
$ tar cjf files.tar.bz2 file1 file2
$ ls -l file1 file2 files.tar.bz2
-rw-r--r-- 1 jameinel jameinel 512000 Mar 24 18:13 file1
-rw-r--r-- 1 jameinel jameinel 512000 Mar 24 18:13 file2
-rw-r--r-- 1 jameinel jameinel 750903 Mar 24 18:13 files.tar.bz2

$ dd if=/dev/random of=file1 bs=1k count=1k
1024+0 records in
1024+0 records out
$ cp file1 file2
$ tar cjf files.tar.bz2 file1 file2
$ ls -l file1 file2 files.tar.bz2
-rw-r--r-- 1 jameinel jameinel 1048576 Mar 24 18:12 file1
-rw-r--r-- 1 jameinel jameinel 1048576 Mar 24 18:12 file2
-rw-r--r-- 1 jameinel jameinel 2106766 Mar 24 18:12 files.tar.bz2

So I would say you're mostly right, unless the files are below the size of the compression window, in which case you get pretty good delta compression.

John
=:->
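
For anyone who wants to repeat the comparison without tar, here is a minimal Python sketch of the same experiment. It is an approximation, not what tla or tar actually does: it compresses two back-to-back copies of seeded pseudo-random bytes in memory, so there are no tar headers and the sizes will differ slightly from the listings in this thread. The helper names (duplicate_ratio, gzip_like, bzip2_like) are made up for this example; zlib stands in for gzip (both use DEFLATE) and bz2 for bzip2.

```python
import bz2
import random
import zlib


def duplicate_ratio(size, compress):
    """Compress two identical copies of `size` pseudo-random bytes.

    Returns compressed_size / size. A value near 1.0 means the second
    copy cost almost nothing; a value near 2.0 means it was stored
    again in full.
    """
    rng = random.Random(size)  # fixed seed so runs are repeatable
    payload = rng.getrandbits(8 * size).to_bytes(size, "little")
    return len(compress(payload + payload)) / size


def gzip_like(data):
    # zlib implements DEFLATE, the algorithm gzip uses; 6 is gzip's default level.
    return zlib.compress(data, 6)


def bzip2_like(data):
    # compresslevel=9 selects 900k blocks, which is also the bzip2 default.
    return bz2.compress(data, 9)


if __name__ == "__main__":
    # Same file sizes as the dd runs in this thread.
    for kib in (10, 100, 500, 1024):
        size = kib * 1024
        print(f"{kib:5d}k  deflate x{duplicate_ratio(size, gzip_like):.2f}"
              f"  bzip2 x{duplicate_ratio(size, bzip2_like):.2f}")
```

If the window explanation is right, the deflate column should sit near 1.0 for the 10k case and near 2.0 from 100k up, while the bzip2 column should stay well under 2.0 until both copies no longer fit in one block.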