

Re: [rdiff-backup-users] Restarting development ... or starting over

From: Daniel Miller
Subject: Re: [rdiff-backup-users] Restarting development ... or starting over
Date: Mon, 5 Apr 2010 14:25:27 -0400

I wasn't really prepared to make this announcement so soon, but now seems like
a good time to let the community know. I have been working on a new
implementation of rdiff-backup for about a month, ever since I dug into the
current codebase and was disappointed by its quality. What I have right now is
functional and works on simple cases, but it does not cover the broad range of
features currently offered by rdiff-backup. I could use some help in bringing
it up to par if others are interested in the path I have taken. Although I have
used the current codebase for direction and inspiration, I have started with a
clean slate for several reasons:

- An automated test suite makes adding new features and long-term maintenance 
much easier. Adding this to the current codebase is both hard and boring. One 
thing that makes it very hard to write tests for the current codebase is the 
widespread use of globals. My new implementation has been developed using TDD 
and minimal use of globals (e.g. for loggers and constants).

- The current repository layout has a critical design flaw that causes
performance degradation as a repository grows. Most difference information is
stored in a single file tree (rdiff-backup-data/increments), which has a very
similar structure to the mirror. The problem is that as files are added,
deleted, or changed, the directories in the increments tree keep growing, so it
takes longer and longer to list their contents. This performance problem is
negligible in small-to-medium sized backup sets, but becomes apparent in very
large backup sets as the number of increments grows. I have redesigned the
repository layout in my new implementation with the goal of eliminating this
performance issue. Note that I do not know for sure whether the new layout will
completely eliminate the problem, since I have not yet tested it with a very
large backup set over a long period of time.

- When the current version of rdiff-backup fails, it often aborts completely,
leaving the repository in a state that needs to be rolled back to the previous 
backup state in order to continue using it. While this is a good conservative 
approach, it potentially results in the loss of difference data that could 
otherwise be saved. I have designed my new version to recover better from 
errors--simply logging unexpected errors and skipping the current task rather 
than aborting completely. I also have plans to make it possible to retain 
incremental data from a failed backup rather than simply discarding it.

- There is currently no (efficient) way to do a complete verification of all 
data in a repository. My new version was designed with this as a requirement.

- Although it is not implemented yet, I have some ideas of how to make use of 
multiple cores to speed up rdiff-backup once the initial backup has been 
created. Backups after the first one (which is usually IO bound) are often CPU 
bound; using multiple cores could help to speed up backups.

- Another thing that is planned but not yet implemented is the ability to
remove all traces of selected files from a backup repository. This should be a
built-in feature of rdiff-backup, since it is a common occurrence to have to
remove files that were backed up by mistake. Currently it is only possible to
do this by hand, which is very error-prone and which I find unacceptable.

- Did I mention that the new version has been developed from the ground up with
full Unicode support?
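
On the point about globals and testability in the list above, here is a minimal
sketch of the general idea. It is not code from either implementation: the
names `Copier` and `make_test_logger` are hypothetical and exist only for this
example. The component receives its logger and settings as arguments, so a test
can pass in substitutes without touching module-level state.

```python
import io
import logging


class Copier:
    """Hypothetical component: collaborators are passed in, not read from globals."""

    def __init__(self, log, dry_run=False):
        self.log = log          # injected logger
        self.dry_run = dry_run  # injected setting instead of a global flag
        self.copied = []

    def copy(self, name):
        if self.dry_run:
            self.log.info("would copy %s", name)
            return False
        self.copied.append(name)
        self.log.info("copied %s", name)
        return True


def make_test_logger(stream):
    """Build a logger that writes to an in-memory stream, for use in tests."""
    log = logging.getLogger("copier-test")
    log.handlers = [logging.StreamHandler(stream)]
    log.setLevel(logging.INFO)
    log.propagate = False
    return log
```

A test can then build a `Copier` with `dry_run=True` and an `io.StringIO`
stream and check both the return value and the logged text, with no global
state to set up or reset.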
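
On the repository layout point in the list above, the growth can be
illustrated with a small simulation. It is a simplified model rather than
rdiff-backup code: it assumes that every changed file adds one entry to the
matching directory of the increments tree on each backup run, and the function
name `increments_per_directory` is made up for this example.

```python
from collections import Counter


def increments_per_directory(sessions):
    """Count increment entries per directory.

    `sessions` is a list with one item per backup run; each item is a list of
    the file paths that changed in that run. Assumption: each changed file
    adds exactly one entry to its directory in the increments tree.
    """
    counts = Counter()
    for changed in sessions:
        for path in changed:
            directory = path.rpartition("/")[0]
            counts[directory] += 1
    return counts


# Two files in one directory that change on every daily run for a year.
sessions = [["home/a.txt", "home/b.txt"]] * 365
print(increments_per_directory(sessions)["home"])  # 730
```

Under that assumption, a directory holding only two files in the mirror ends
up with hundreds of entries in the increments tree, and the count keeps rising
with every run.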
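
On the error recovery point in the list above, "log and skip the current task"
can be sketched as follows. This is only an illustration of the pattern, not
the new implementation: `run_tasks` and the task names are hypothetical.

```python
import logging

log = logging.getLogger("backup-sketch")


def run_tasks(tasks):
    """Run each (name, callable) pair; log unexpected errors and keep going.

    Returns two lists: the names of tasks that completed and the names of
    tasks that were skipped because they raised an exception.
    """
    done, failed = [], []
    for name, task in tasks:
        try:
            task()
        except Exception:
            # Record the traceback, then continue with the next task
            # instead of aborting the whole run.
            log.exception("task %s failed; skipping", name)
            failed.append(name)
        else:
            done.append(name)
    return done, failed
```

The list of failed names gives the caller something to report or retry at the
end of the run, which is where a real tool would decide whether the backup as a
whole can still be considered usable.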

Please note that this new version will obviously not be backward compatible 
with older rdiff-backup repositories. While a tool could conceivably be written 
to convert an old repository to the new format, I have no desire to do so, and 
I doubt that anyone else will either...

I have developed this new version using git for version control, which I plan
to continue using. I am hoping to put it up on GitHub soon.

~ Daniel

