rdiff-backup-users

[rdiff-backup-users] Regression optimizations?


From: Nathan Lewis
Subject: [rdiff-backup-users] Regression optimizations?
Date: Wed, 26 Oct 2011 01:20:27 -0500

I've been playing with and studying rdiff-backup for about a week, and for the most part it works well for our scenario: keeping a backup mirror that is easy for anyone to access, with incrementals if necessary.

However, it has the rather unfortunate property that when an incremental fails, the next run proceeds to do a regression on the entire mirror. I understand that this is necessary to get the mirror back into a consistent state, but it seems like it could be optimized. Logically, if an incremental fails, 99.999% of the files will still be perfectly fine because the failed incremental didn't touch them in the first place. So why does a regression need to touch every file? Can't a regression look at which files have incrementals that need to be deleted and only regress those files? It seems to spend most of its time in the following loop:

1.  Copy the file in question to a .tmp file
2.  Apply attributes/ACLs to the .tmp file
3.  Rename the .tmp file back to the original file.
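The three steps above can be sketched roughly as follows, next to the in-place alternative asked about in this message. This is only an illustration of the pattern, not rdiff-backup's actual code: the function names and the ".tmp" suffix handling are hypothetical, and a plain chmod stands in for the full attribute/ACL handling.

```python
import os
import shutil


def rewrite_via_tmp(path, mode):
    """Hypothetical sketch of the copy / set attributes / rename loop."""
    tmp = path + ".tmp"         # assumed temp-file naming
    shutil.copyfile(path, tmp)  # step 1: copies every byte of the file
    os.chmod(tmp, mode)         # step 2: stand-in for attributes/ACLs
    os.replace(tmp, path)       # step 3: rename over the original


def apply_in_place(path, mode):
    """The cheaper alternative: touch only the metadata."""
    os.chmod(path, mode)        # no data is copied
```

The first version costs a full data copy per file. What it buys is that the rename in step 3 replaces the original in a single step on POSIX filesystems, so an interrupted run leaves either the old file or the new one. Whether that is the reason rdiff-backup does it this way is exactly the open question here.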

When there are 400k files in a backup, this actually takes longer than a full backup would. Surely I'm missing some scenario where this is necessary? Couldn't this (extremely common) case be detected, and the attributes/ACLs from the mirror metadata simply applied to the original file? Why is the .tmp file necessary?
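The selective regression suggested earlier (only regress files that have incrementals from the failed session) might start from something like the sketch below. It assumes increment files are named "<original name>.<timestamp>.<type>", optionally followed by ".gz", with the types listed in SUFFIXES; that layout and the function name are assumptions for illustration and should be checked against the real rdiff-backup-data/increments directory.

```python
import os

# Assumed increment type suffixes; verify against a real repository.
SUFFIXES = (".snapshot", ".diff", ".missing", ".dir")


def files_touched(increments_root, timestamp):
    """Return relative paths that have an increment for `timestamp`."""
    marker = "." + timestamp
    touched = set()
    for dirpath, _dirs, names in os.walk(increments_root):
        for name in names:
            # Drop an optional compression suffix first.
            base = name[:-3] if name.endswith(".gz") else name
            stem, ext = os.path.splitext(base)
            if ext not in SUFFIXES or not stem.endswith(marker):
                continue
            original = stem[:-len(marker)]
            full = os.path.join(dirpath, original)
            touched.add(os.path.relpath(full, increments_root))
    return touched
```

Only the paths returned here would then need regressing. The sketch ignores any mirror file the failed run modified before its increment was written, which may be one of the pitfalls worth discussing.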

This brings up a related question: the attributes are stored in a separate file in the rdiff-backup-data directory, so do they really need to be applied to the mirror? I understand rdiff-backup is trying to make the mirror match the original as closely as possible, but because of filesystem differences the mirror attributes can't really be trusted anyway. I would actually like to override the mirror's attributes and make them read-only so the mirror can't be messed with, or simply tell rdiff-backup not to bother setting attributes on the mirror's files (particularly when regressing).

I'm not afraid to go poking around in the source and try to make some changes but I'd like to discuss any side effects or pitfalls first.

--Nathan


