
From: Gregor Zattler
Subject: [rdiff-backup-users] gzip --rsyncable (was: Re: New User Seeking Some Clarification)
Date: Fri, 30 Jan 2004 00:34:32 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i

Hi rdiff-backup-users,
* Gregor Zattler <address@hidden> [28. Jan. 2004]:
> Hi Ben,
> * Ben Escoto <address@hidden> [27. Jan. 2004]:
> > >>>>> Alan <address@hidden>
> > >>>>> wrote the following on Fri, 9 Jan 2004 13:35:37 -0800
> [...]
> > > until I realized that because of the
> > > bzip the .sql file was completely different each time, so the entire
> > > file was transferred as an increment.  When I removed the bzip part of
> > > the process the base file was larger, but the increments were much
> > > smaller because they were simply text diffs of new/changed data, not a
> > > binary diff of an entirely changed file. 
> 
> > I think there is a patch to gzip floating around that adds an option
> > to reset the buffer at certain clever intervals.  The end result is
> > that similar data gzipped stays similar---one extra byte at the
> > beginning doesn't result in two totally separate gzip archives.
> 
> This has been in Debian unstable for almost a year:

I tested it.  The results are as expected:


I backed up a directory /tmp/testdir with rdiff-backup.  It
contained three files: a 46 MB mbox, the same mbox compressed with
bzip2, and the same mbox compressed with gzip:

   0 pit:/tmp/testdir$ ls -Al 
   total 57084
   -rw-r--r--    1 grfz     grfz     46154878 2004-01-29 23:35 mbox
   -rw-r--r--    1 grfz     grfz      5438580 2004-01-29 23:38 mbox.bz2
   -rw-r--r--    1 grfz     grfz      6779264 2004-01-29 23:36 mbox.gz
   
   0 pit:/tmp$ rdiff-backup -b testdir testdir-backup

I modified the mbox slightly by deleting two unimportant header
lines in the first mail, rebuilt the two compressed files and ran
rdiff-backup again:

   0 pit:/tmp/testdir$ cat mbox |gzip -9 >mbox.gz;cat mbox |bzip2 -9 >mbox.bz2; ls -Al
   total 57084
   -rw-r--r--    1 grfz     grfz     46154774 2004-01-29 23:40 mbox
   -rw-r--r--    1 grfz     grfz      5437410 2004-01-29 23:42 mbox.bz2
   -rw-r--r--    1 grfz     grfz      6779265 2004-01-29 23:40 mbox.gz
   
   0 pit:/tmp$ rdiff-backup -b testdir testdir-backup
   
   0 pit:/tmp$ ls -Al testdir-backup/rdiff-backup-data/increments/
   0 pit:/tmp/testdir-backup/rdiff-backup-data/increments$ ls -al
   total 11968
   drwx------    2 grfz     grfz          100 2004-01-29 23:43 ./
   drwx------    3 grfz     grfz          600 2004-01-29 23:43 ../
   -rw-r--r--    1 grfz     grfz      6779788 2004-01-29 23:36 mbox.gz.2004-01-29T23:38:12+01:00.diff
   -rw-r--r--    1 grfz     grfz      5438999 2004-01-29 23:38 mbox.bz2.2004-01-29T23:38:12+01:00.diff
   -rw-r--r--    1 grfz     grfz         4803 2004-01-29 23:35 mbox.2004-01-29T23:38:12+01:00.diff.gz

In fact, the increments for both compressed files are bigger than
the previous compressed files themselves, so rdiff-backup effectively
stored them in full.
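As a quick check, the sizes from the listings above can be put side by side in a few lines of Python. rdiff-backup keeps the newest version as a mirror and stores reverse increments, so each .diff here rebuilds the previous version of its file. The mbox increment is itself gzip-compressed (.diff.gz). The dictionary names are only illustrative.

```python
# Sizes in bytes, copied from the first run (plain gzip -9 and bzip2 -9).
# Each increment is a reverse diff that rebuilds the previous version of the
# file from the current mirror; the mbox increment is stored as .diff.gz.
previous_version = {"mbox.gz": 6779264, "mbox.bz2": 5438580, "mbox": 46154878}
increment = {"mbox.gz": 6779788, "mbox.bz2": 5438999, "mbox": 4803}

for name, size in previous_version.items():
    extra = increment[name] - size          # bytes beyond the old file's size
    pct = 100 * increment[name] / size      # increment as % of the old file
    print(f"{name:9} increment = {pct:9.4f} % of previous version ({extra:+d} bytes)")
```

Both compressed files come out at slightly over 100 %, while the plain mbox needs only a tiny fraction of a percent.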

I then deleted the first backup and did it again, this time with
the --rsyncable option:

   0 pit:/tmp/testdir$ rm mbox.gz 
   rm: remove regular file `mbox.gz'? y
   0 pit:/tmp/testdir$ cat mbox |gzip -9 --rsyncable >mbox.gz
   0 pit:/tmp/testdir$ ls -Al
   total 57372
   -rw-r--r--    1 grfz     grfz     46154774 2004-01-29 23:40 mbox
   -rw-r--r--    1 grfz     grfz      5437410 2004-01-29 23:42 mbox.bz2
   -rw-r--r--    1 grfz     grfz      7076210 2004-01-29 23:46 mbox.gz
   
   0 pit:/tmp$ rdiff-backup -b testdir testdir-backup

Then I deleted two more header lines in the first mail and did it again:

   0 pit:/tmp/testdir$ cat mbox |gzip -9 --rsyncable >mbox.gz;cat mbox |bzip2 -9 >mbox.bz2; ls -Al
   total 57372
   -rw-r--r--    1 grfz     grfz     46154537 2004-01-29 23:46 mbox
   -rw-r--r--    1 grfz     grfz      5437393 2004-01-29 23:48 mbox.bz2
   -rw-r--r--    1 grfz     grfz      7076151 2004-01-29 23:47 mbox.gz
   
   0 pit:/tmp$ rdiff-backup -b testdir testdir-backup
   0 pit:/tmp$ ls -Al testdir-backup/rdiff-backup-data/increments/
   total 5340
   -rw-r--r--    1 grfz     grfz         4810 2004-01-29 23:40 mbox.2004-01-29T23:46:22+01:00.diff.gz
   -rw-r--r--    1 grfz     grfz      5437829 2004-01-29 23:42 mbox.bz2.2004-01-29T23:46:22+01:00.diff
   -rw-r--r--    1 grfz     grfz         7558 2004-01-29 23:46 mbox.gz.2004-01-29T23:46:22+01:00.diff


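So while gzip -9 --rsyncable produced a slightly bigger archive
(7076210 instead of 6779265 bytes for the same mbox, about 4.38 %
more), the mbox.gz increment in the second test case was only 7558
bytes, compared with 6779788 bytes in the first test case without
the --rsyncable option.  That is of the same order as the increment
for the uncompressed mbox (4810 bytes), while the bzip2 increment
was again about as big as the file itself (5437829 bytes).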
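The idea Ben describes can be illustrated with a small Python sketch using only the standard zlib module. The idea is to reset the compressor at points chosen by the content itself, so that similar input stays similar after compression. This is not gzip's --rsyncable code. As assumptions for illustration, the sketch:

- cuts the input wherever the CRC-32 of the last 32 bytes is divisible by 512;
- compresses each piece with a fresh raw-deflate compressor, so identical pieces give identical bytes;
- estimates what a delta tool could reuse with a crude rdiff-like matcher (exact 64-byte blocks, no rolling checksum).

make_mbox() generates hypothetical test data.

```python
import random
import zlib

# Illustrative parameters (assumptions, not the values used by gzip --rsyncable).
WINDOW = 32    # bytes in the sliding window that decides where to reset
MODULUS = 512  # reset when CRC-32 of the window is divisible by this (~every 512 bytes)
BLOCK = 64     # block size for the crude rdiff-style matcher below


def split_content_defined(data: bytes) -> list[bytes]:
    """Cut after every position whose last WINDOW bytes hash to 0 mod MODULUS.

    The decision depends only on nearby content, so an edit early in the
    file stops influencing the cut points once WINDOW bytes have passed.
    """
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % MODULUS == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def deflate_with_resets(data: bytes) -> list[bytes]:
    """Raw-deflate each chunk with a fresh compressor (no shared history).

    Each part ends byte-aligned (Z_SYNC_FLUSH) with a non-final block, so the
    concatenation of all parts is one valid raw deflate stream.
    """
    parts = []
    for chunk in split_content_defined(data):
        comp = zlib.compressobj(9, zlib.DEFLATED, -15)
        parts.append(comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH))
    # An empty final block terminates the stream.
    parts.append(zlib.compressobj(9, zlib.DEFLATED, -15).flush())
    return parts


def deflate_plain(data: bytes) -> bytes:
    """Ordinary raw deflate of the whole input, comparable to plain gzip -9."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def matched_bytes(basis: bytes, new: bytes) -> int:
    """Crude rdiff-style estimate: bytes of `new` covered by aligned BLOCKs of `basis`.

    Real rdiff uses rolling checksums; exact-set lookup is a simplification.
    """
    signature = {basis[i:i + BLOCK] for i in range(0, len(basis) - BLOCK + 1, BLOCK)}
    pos = matched = 0
    while pos + BLOCK <= len(new):
        if new[pos:pos + BLOCK] in signature:
            matched += BLOCK
            pos += BLOCK
        else:
            pos += 1
    return matched


def make_mbox(lines: int = 2000, seed: int = 1) -> bytes:
    """Hypothetical mbox-like test data standing in for the real 46 MB mbox."""
    rng = random.Random(seed)
    fields = ["From:", "To:", "Subject:", "Date:", "Message-ID:", "X-Mailer:"]
    return "".join(
        f"{rng.choice(fields)} {rng.randrange(10**6)} backup {rng.randrange(10**4)}\n"
        for _ in range(lines)
    ).encode()


old = make_mbox()
lines = old.splitlines(keepends=True)
new = b"".join(lines[:1] + lines[3:])  # delete two header lines near the top

plain_old, plain_new = deflate_plain(old), deflate_plain(new)
reset_old = b"".join(deflate_with_resets(old))
reset_new = b"".join(deflate_with_resets(new))

for label, before, after in [("plain", plain_old, plain_new),
                             ("resets", reset_old, reset_new)]:
    share = matched_bytes(before, after) / len(after)
    print(f"{label:6}  compressed={len(after):6} bytes  reusable={share:6.1%}")
```

On data like this, the plain stream usually shows almost no reusable blocks. The reset version shows nearly all of its blocks as reusable, which mirrors the 6779788-byte versus 7558-byte increments above. The exact percentages depend on the synthetic data and the assumed parameters. Starting a fresh compressor for every chunk also costs compression ratio. That is the same kind of trade-off as the 4.38 % size increase measured with --rsyncable, though with chunks this small it is a much larger one.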

Sadly, the last stable gzip version (1.2.4) is several years old,
so it is unclear when this feature will become widely available.


Ciao; Gregor



