Re: [rdiff-backup-users] Restarting development ... or starting over

From: Randy Syring
Subject: Re: [rdiff-backup-users] Restarting development ... or starting over
Date: Tue, 06 Apr 2010 09:08:21 -0400
User-agent: Thunderbird (X11/20100317)

Daniel Miller wrote:
I wasn't really prepared to make this announcement so soon, but now seems like 
a good time to let the community know. I've been working on a new 
implementation of rdiff-backup for about a month, since I dug into the 
current codebase and discovered its disappointing quality. While what I have 
right now is functional and works on simple cases, it does not cover the broad 
range of features currently offered by rdiff-backup. I could use some help in 
bringing it up to par if others are interested in the path I have taken. While 
I have used the current codebase for direction and inspiration, I have started 
with a clean slate for several reasons:
I'm interested and am looking forward to seeing the code.
- An automated test suite makes adding new features and long-term maintenance 
much easier. Adding this to the current codebase is both hard and boring. One 
thing that makes it very hard to write tests for the current codebase is the 
widespread use of globals. My new implementation has been developed using TDD 
and minimal use of globals (e.g. for loggers and constants).
YAY TDD!  :)
- The current repository layout has a critical design flaw that causes 
performance degradation as a repository grows. Most difference information is 
stored in a single file tree (rdiff-backup-data/increments) that has a very 
similar structure to the mirror. The problem is that as files get 
added/deleted/changed, the directories in the increments tree keep growing 
in size, meaning it takes longer and longer to list the contents of directories 
in the tree. This performance problem is negligible in small-to-medium-sized 
backup sets, but becomes apparent in very large backup sets as the number of 
increments grows. I have redesigned the repository layout in my new 
implementation to eliminate this performance issue. Note that I do not know for 
sure if my new layout will completely eliminate this problem since I have not 
tested it yet with a very large backup set over a long period of time.
Can this be tested further? It would suck to get further down the road with this repository structure and find out it didn't really help the problem.
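
One rough way to start testing this, as a sketch only: grow a single directory step by step and time how long a listing takes at each size. This measures nothing but raw directory-listing cost on the local filesystem. It does not exercise rdiff-backup or the new layout, the file names are hypothetical stand-ins for increment files, and the function names (populate, time_listing, run) are made up for illustration. Results will vary a lot with filesystem type and OS caching.

```python
import os
import tempfile
import time


def populate(directory, count, start=0):
    """Create `count` empty files, standing in for increment files."""
    for i in range(start, start + count):
        # Hypothetical naming; real increment file names differ.
        with open(os.path.join(directory, f"file{i}.diff"), "w"):
            pass


def time_listing(directory, repeats=5):
    """Return (entry_count, best_seconds) for listing `directory`."""
    best = float("inf")
    entries = 0
    for _ in range(repeats):
        begin = time.perf_counter()
        entries = len(os.listdir(directory))
        best = min(best, time.perf_counter() - begin)
    return entries, best


def run(sizes=(100, 1000, 10000)):
    """Grow one directory to each size in turn (sizes must increase)."""
    results = []
    with tempfile.TemporaryDirectory() as root:
        created = 0
        for size in sizes:
            populate(root, size - created, start=created)
            created = size
            results.append(time_listing(root))
    return results


if __name__ == "__main__":
    for entries, seconds in run():
        print(f"{entries:>7} entries: {seconds * 1000:.3f} ms per listing")
```

Running the same loop against a directory tree produced by the new layout, with a comparable number of simulated increments, would give at least a first indication of whether listing time stays flat as the repository ages.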

Randy Syring

"Whether, then, you eat or drink or whatever you do, do all to the glory
of God." 1 Cor 10:31
