
[rdiff-backup-users] Feature requests questions/discussion


From: Wiebe Cazemier
Subject: [rdiff-backup-users] Feature requests questions/discussion
Date: Fri, 21 Oct 2005 17:48:47 +0200
User-agent: Mozilla Thunderbird 1.0.6 (X11/20050729)

Ben Escoto,

I would like to discuss two feature requests and would appreciate your input on the matter. It's quite a lot of text, so I hope you'll bear with me :)

First, I would really like an option like --store-checksums, so that rdiff-backup calculates MD5 hashes when doing a backup and uses those checksums for integrity checks upon restoration. At restoration time, however, the check should not be optional: it should always be done if the file being restored has hash info. This is to prevent user mistakes. I once severely corrupted a partition on an external HD because of USB2 transfer errors (which I haven't been able to solve, BTW). A lot was damaged, but for every file that resided inside a compressed archive I was told about the corruption, because of the checksumming that zip, gzip, etc. usually do. The rdiff-backup repository became useless, because it had no idea the files were damaged. Such an option would of course be annoying to most people, because it's quite slow, but most of my backups are done through cron, so that doesn't matter to me. As an option, I think this could be very valuable. It doesn't even have to be that slow, if it's cleverly integrated with the copy routine.
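To make the "integrated with the copy routine" idea concrete, here is a minimal Python sketch, assuming a per-file MD5 is kept somewhere in the repository metadata. It is not rdiff-backup's code: the names `copy_with_md5`, `backup_file`, `restore_file` and `ChecksumMismatch` are hypothetical. The hash is fed from the same buffers that are written, so no second read pass is needed, and a restore always verifies when a stored checksum exists.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # read/write buffer size; an arbitrary choice for this sketch


class ChecksumMismatch(Exception):
    """Raised when restored data does not match the checksum stored at backup time."""


def copy_with_md5(src_path, dst_path):
    """Copy src_path to dst_path and return the MD5 hex digest of the copied bytes.

    The digest is updated from the same buffers that are written, so the data
    is read only once instead of in a separate hashing pass.
    """
    digest = hashlib.md5()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def backup_file(source_path, repo_path):
    """Back up one file (hypothetical helper) and return the checksum to store."""
    return copy_with_md5(source_path, repo_path)


def restore_file(repo_path, target_path, stored_md5=None):
    """Restore one file; if a checksum was stored, verification is not optional."""
    actual_md5 = copy_with_md5(repo_path, target_path)
    if stored_md5 is not None and actual_md5 != stored_md5:
        raise ChecksumMismatch(
            f"{target_path}: expected {stored_md5}, got {actual_md5}")
    return actual_md5
```

Files backed up without checksum info (`stored_md5=None`) are restored without a check, matching the "if the file being restored has hash info" condition above.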

The other thing I'd like to discuss is how rdiff-backup detects changes. I noted earlier on this list that mtime+size checking, which rdiff-backup does IIRC, is not very reliable, because mtimes can be changed. For example, when I install a new GCC on Gentoo, the package manager looks for hardcoded file paths in a whole bunch of files on the system and changes them to point to the newly installed GCC. Portage (the Gentoo package manager) uses the mtimes of files to determine whether a file still belongs to a package. So when you uninstall a package and some file has a different mtime than the one stored in the package's metadata file, that file is assumed to be new, meaning it doesn't belong to the package, and it is not uninstalled. Now, about those hardcoded file paths: when Portage changes them, the mtimes of those files change too. I don't know whether Portage then restores the mtimes to what they were, to avoid orphaned files, but it should, and if it doesn't now, it may in the future. If it does, then the next time you run rdiff-backup on your system, those files are not detected as changed (as long as their size also stays the same) and they are left alone. This is of course not desirable behaviour.
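The failure mode can be reproduced with a few lines of Python. This is a sketch of a generic size+mtime heuristic, not of rdiff-backup's actual comparison code; `looks_changed` and `rewrite_keeping_mtime` are hypothetical names, the latter imitating a tool that edits a file in place and then puts the old timestamps back.

```python
import os


def size_mtime_signature(path):
    """Return the (size, mtime) pair that a size+mtime change check compares."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def looks_changed(path, recorded_signature):
    """Size+mtime heuristic: report a change only if size or mtime differs."""
    return size_mtime_signature(path) != recorded_signature


def rewrite_keeping_mtime(path, new_data):
    """Overwrite a file, then put its previous atime/mtime back.

    This imitates a tool that edits a file in place and then restores the
    timestamps, as the email speculates Portage might do.
    """
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(new_data)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Replacing `gcc-3.4` with `gcc-4.0` in a file keeps its length, so after `rewrite_keeping_mtime` the contents differ but `looks_changed` still returns `False`.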

Another way of checking for changes would be to check ctimes. But this has the problem that not all filesystems have ctimes. Also, when you restore your backup to a new disk and run rdiff-backup again, the entire system is considered changed, because every restored file gets a new ctime. This is not ideal.

A different approach would be to use the checksum feature described above. Rdiff-backup could calculate a hash of every file (or perhaps only of those files whose mtime is unchanged, because when the mtime has changed the file needs to be backed up anyway) and use that for change detection. This of course has the disadvantage of even more slowdown, because now, even if little has changed in what you're backing up, its contents are read completely. But perhaps this behaviour could also live under an option alongside --store-checksums, such as --checksum-diffs (with the latter requiring the former, for example).
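A minimal Python sketch of that hybrid check, assuming the previous session recorded size, mtime and an MD5 per file. The dictionary layout and the names `record_metadata` and `needs_backup` are hypothetical, not an existing rdiff-backup format or API. A differing size or mtime is enough to call the file changed without reading it; only files that look unchanged by those two fields are hashed.

```python
import hashlib
import os


def file_md5(path, chunk_size=64 * 1024):
    """MD5 hex digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_metadata(path):
    """What a --store-checksums run might save per file (hypothetical format)."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "md5": file_md5(path)}


def needs_backup(path, recorded):
    """Hybrid change detection in the spirit of the proposed --checksum-diffs.

    Size or mtime differing is enough to call the file changed without reading
    it; only files whose size and mtime look unchanged are hashed and compared.
    """
    st = os.stat(path)
    if st.st_size != recorded["size"] or st.st_mtime_ns != recorded["mtime_ns"]:
        return True
    return file_md5(path) != recorded["md5"]
```

The cost stays where the email predicts: every file that looks unchanged is still read in full, which is why it makes sense as an opt-in mode rather than the default.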

In summary: --store-checksums would calculate checksum info for integrity checks, and --checksum-diffs would use checksums for change detection instead of mtime+size.

I'm very curious to find out if you find my requests valid.

Regards,

Wiebe Cazemier



