|From:||Marcel (Felix) Giannelia|
|Subject:||[rdiff-backup-users] more experiments, + apology|
|Date:||Sat, 14 Mar 2009 14:53:51 -0700|
|User-agent:||Thunderbird 18.104.22.168 (X11/20080726)|
Hello again,
First of all, I just re-read the changelog and checked which version of rdiff-backup is installed on the server I've been playing with -- and I owe the developers an apology. The current version does diff mirror_metadata files (and has done so for quite a while), so the fact that I've indirectly done the same thing in my "rdiff-backup-rollup" experiments is no great achievement. It turns out that our server has (yikes) version 1.0.4 -- a consequence of Gentoo Linux's package repositories being dreadfully far behind, and of my unfamiliarity with Gentoo.
Regardless, parts of what I said are still true (very small patches between increments), but it now looks more like something is confusing the rsync algorithm itself, rather than rdiff-backup. There are files in the backup set that seem to change daily (zipped backups that Moodle produces), but only by a few bytes. Most of each file stays exactly the same, yet running rdiff by itself on any pair of them produces a patch that's just as big as the file. Somehow, putting several of them together in a tar file (even gzipped) clues rdiff/rsync in to the similarities, and then it can make a decent patch.
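One possible explanation for the full-size patches, offered as an illustration rather than a diagnosis: in a compressed stream, a small change in the input can alter most of the bytes that follow it, so block matching has nothing to latch onto. The sketch below is a crude stand-in for rsync-style matching, not rdiff's actual algorithm. The names `matched_fraction` and `sample_data`, the block size of 256, and the synthetic test data are all assumptions made for the example.

```python
import random
import zlib

BLOCK = 256  # hypothetical block size; rdiff chooses its own


def matched_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of `new` covered by blocks that also occur in `old`.

    Blocks of `old` are taken at aligned offsets; `new` is scanned at
    every byte offset, jumping a whole block ahead after each match.
    """
    if not new:
        return 1.0
    known = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    matched = 0
    pos = 0
    while pos + block <= len(new):
        if new[pos:pos + block] in known:
            matched += block
            pos += block
        else:
            pos += 1
    return matched / len(new)


def sample_data(seed: int = 0, words: int = 30000) -> bytes:
    """Deterministic, compressible pseudo-text built from lowercase words."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(500)
    ]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode("ascii")


if __name__ == "__main__":
    old = sample_data()
    # Overwrite four bytes near the start; the length stays the same.
    new = old[:100] + b"ZZZZ" + old[104:]

    raw = matched_fraction(old, new)
    packed = matched_fraction(zlib.compress(old), zlib.compress(new))
    print(f"uncompressed: {raw:.3f} of the new file matches old blocks")
    print(f"compressed:   {packed:.3f} of the new file matches old blocks")
```

On the uncompressed pair, almost every block is found again. On the compressed pair the figure is typically much lower, though the exact value depends on the data and on the zlib version, so treat the printed numbers as indicative only.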
Trying my procedure on a different machine's backup sets still makes patches that are smaller than the increment files, but by a much smaller margin, one that can be fully explained by the mirror_metadata and file_statistics files. I'm a little disappointed that it's all explicable, but nonetheless, space savings are fun :) For instance, this second machine's backup set includes rotated log files, so I was still able to compress its archive of old increments by 76% using rdiff-backup-rollup followed by rdiff'ing all but a few files (I kept a whole file every 15 increments or so as a basis). (I cheated a bit to get that much compression -- I wrote another script that removes file_statistics files from the increments entirely [they look optional], then gunzips all the individually gzipped files in the increment and re-tars it. That makes the tar file much bigger, but rdiff has an easier time with the uncompressed data. After I've generated the patches and re-compressed everything, the result is smaller than it was originally.)
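The re-tar step described above could look roughly like the following. This is a minimal sketch and not the actual script: it assumes the increment has already been unpacked into a directory, that the optional statistics files can be recognised by a name starting with `file_statistics`, and that every `.gz` file is an ordinary gzip file. The function name `repack_uncompressed` is hypothetical.

```python
import gzip
import io
import tarfile
from pathlib import Path


def repack_uncompressed(increment_dir: Path, out_tar: Path) -> None:
    """Write an uncompressed tar of increment_dir.

    Files whose names start with "file_statistics" are skipped, and each
    *.gz file is stored gunzipped under its name without the .gz suffix.
    out_tar should live outside increment_dir.
    """
    # Collect the file list before creating the output archive.
    files = sorted(p for p in increment_dir.rglob("*") if p.is_file())
    with tarfile.open(out_tar, "w") as tar:
        for path in files:
            if path.name.startswith("file_statistics"):
                continue  # assumed optional, per the description above
            arcname = path.relative_to(increment_dir).as_posix()
            if path.suffix == ".gz":
                # Whole file is read into memory; fine for a sketch.
                data = gzip.decompress(path.read_bytes())
                info = tarfile.TarInfo(arcname[:-3])  # drop ".gz"
                info.size = len(data)
                info.mtime = int(path.stat().st_mtime)
                tar.addfile(info, io.BytesIO(data))
            else:
                tar.add(path, arcname=arcname)
```

Note that this version drops empty directories and does not carry over ownership or permissions for the gunzipped members, so it is only suitable for experimenting with patch sizes, not as a faithful archive format.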
Since rdiff-backup now diffs mirror_metadata on its own, though, these experiments are of limited interest except for dealing with old backup sets retroactively (and for the side effect of file move detection).
Is this of interest to anybody? If not, I'll relegate myself to a quiet corner of the wiki and stop cluttering up the mailing list with it :)