Re: [rdiff-backup-users] About backups and increments
Tue, 23 Aug 2011 22:44:38 +0000 (UTC)
Maarten Bezemer <mcbrdiff <at> robuust.nl> writes:
> If it is detected that a file has changed (based on file attributes), a
> new file in the destination directory is created using a "temp name", and
> it is synced to its new contents, using the old version to speed up the
> rsync process. After that, an increment is created, and only then will the
> old version be removed.
> This process is followed sequentially for all files, so the total space
> needed would be the space for the increments that are created during this
> session, plus the size of the largest file in the repository.
> Of course, you usually don't know in advance how large the increments will
Of course. I just wanted to understand how space was managed.
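To put the estimate quoted above in concrete terms, here is a minimal Python sketch. The function name and the sample sizes are made up for illustration; it only encodes the rule as described (increments created in this session plus the largest file in the repository), not anything rdiff-backup itself computes or reports.

```python
def estimate_peak_extra_space(increment_sizes, file_sizes):
    """Rough extra space needed during one session, per the rule quoted above.

    increment_sizes: sizes in bytes of the increments written this session
    file_sizes: sizes in bytes of the files in the repository
    """
    # Only one file at a time exists twice (old version plus temp copy),
    # so the worst case for that part is the largest single file.
    largest_file = max(file_sizes, default=0)
    return sum(increment_sizes) + largest_file


# Hypothetical numbers: two increments and two files.
print(estimate_peak_extra_space([5, 3], [100, 40]))
```

As noted in the quote, the increment sizes are usually not known in advance, so in practice the first argument would itself be a guess.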
> Maybe I now know what I didn't understand in your line of questioning.
> With rdiff-backup, increments are for individual files, and only when
> these individual files have been changed. So, there are no reverse diffs
> if a file has not been changed. For a data set of 1000 files with only 10
> files changing since the previous run, the increments dir would only
> contain 10 reverse diff files for this run.
> Likewise, if a file hasn't been changed for 3 months and it is changed
> today, but I only want to keep 1 month of history, I can NOT simply ditch
> the 3-months old version. Maybe it wasn't changed for all these months,
> but it is still yesterday's version and has to be kept in history for the
> coming month minus 1 day...
> So.. let's assume you make weekly backups. (Hoping it will be more often,
> but just as an example.)
> You want to keep history of 2 months. That's about 8 or 9 weeks.
> But sometimes you make an extra backup halfway through a week, and
> sometimes you go on a vacation and don't run any backup.
> So, in these cases, you might want to keep history for 2 months, but also
> at least 5 increments, even if that means it will be more than 2 months?
> Would it really be useful to.. eh.. keep increments from 4 months ago if
> you forgot to run backups for the last 2 months? This sounds just like
> "oh, I didn't make backups over the last two months, but I do happen to
> have some historic versions from 3 months ago containing your PhD thesis
> you've been working on... for the last 3 months....."
I'll pick one of my scenarios here, so please don't consider this as the
"only" way I'm currently using rdiff-backup.
I have a small laptop connected to a NAS drive. I wrote a little
Perl tool that wraps rdiff-backup to give me some automation over
it: for instance, I can control which directories get backed up, per-directory
pre- and post-backup scripts, backup frequency, etc. I integrated the script
with both cron and udev, so as soon as the laptop is plugged into the NAS
the backups start to roll. If enough time passes, backups get re-run (some
directories have hourly granularity).
The backup frequency though is not guaranteed. If I move the laptop away from
the NAS, I will have holes (in that case I have more than one NAS, but I
Of course, space (although not a problem in this case) is limited. What I've
done is to let each tree have its own retention policy. Some
trees have unlimited retention, some have lesser guarantees such as
"one month minimum, but more if there's space", and some even less, such as
"just one month, prune everything else".
To free up space before a backup, I simply estimate each tree's needs with a
linear predictor, and decide what to prune using a simple
proportion between the trees. Again, I want to retain as much as possible,
but within the limits.
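To make the idea concrete, here is a rough Python sketch (the real tool is Perl and is not shown here). Everything in it is an assumption for illustration: the function names, the choice of a least-squares line over the space each tree consumed in past runs, and the choice to split the shortfall in proportion to each tree's predicted need.

```python
def predict_next(values):
    """Extrapolate the next value from a least-squares line through
    the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return float(values[0])
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    # Evaluate the fitted line at x = n, i.e. the upcoming run.
    return mean_y + slope * (n - mean_x)


def space_to_free(history, free_bytes):
    """Return how much each tree should give up before the next run.

    history: dict mapping tree name to the space used by each past run
    free_bytes: space currently available on the backup target
    """
    predicted = {t: max(0.0, predict_next(v)) for t, v in history.items()}
    total = sum(predicted.values())
    shortfall = total - free_bytes
    if shortfall <= 0 or total == 0:
        return {t: 0.0 for t in predicted}
    # Assumed policy: split the shortfall in proportion to predicted need.
    return {t: shortfall * p / total for t, p in predicted.items()}


# Hypothetical trees and sizes.
print(space_to_free({"home": [10, 20, 30], "etc": [10, 10, 10]}, 30))
```

Trees with unlimited retention would simply be left out of `history`, and a tree's share would still have to be capped by what its own retention policy allows to be pruned.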
This way, by default I can restore the machine fully from scratch at any
time. I may also be able to do a full restore to the state of months ago (if
there was enough space), but that's not guaranteed. What I can always do is
recover any important file at any time in the past, and recover a working state
of the machine.
I want at least "X" copies of increments because you never know at
which instant some files were backed up. Having X copies allows me to try
other snapshots (in emergency situations). Since backups are not regular,
you can't know how to prune them in terms of "time" alone.
And yeah, that's pure Perl hackery. I also do quite well with "ugly".
I'm quite satisfied with this particular setup, but I feel like I can
reduce the hackery ;)
> Let's just say that I don't think having such an option would be a really
> nice thing to have
> And creating a small script would indeed be far easier
Counting output lines is not my favorite, but yeah, I can do that.
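For what it's worth, the rule itself ("keep N months of history, but never fewer than X backups") is small once the session dates are available. Below is a minimal Python sketch. It assumes the session timestamps have already been collected somehow, for example by parsing the output of `rdiff-backup --list-increments`, which is not shown; the function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def sessions_to_prune(sessions, now, max_age, min_keep):
    """Return the backup sessions that may be pruned, oldest first.

    A session is prunable only if it is older than max_age AND it is
    not among the min_keep most recent sessions.
    """
    newest_first = sorted(sessions, reverse=True)
    protected = set(newest_first[:min_keep])
    cutoff = now - max_age
    return sorted(s for s in newest_first
                  if s < cutoff and s not in protected)


# Weekly backups, then a long gap before the latest one.
sessions = [
    datetime(2011, 4, 1), datetime(2011, 4, 8), datetime(2011, 4, 15),
    datetime(2011, 4, 22), datetime(2011, 4, 29), datetime(2011, 5, 6),
    datetime(2011, 8, 22),
]
for s in sessions_to_prune(sessions, datetime(2011, 8, 23),
                           timedelta(days=60), min_keep=5):
    print(s.date())
```

With these made-up dates only the two oldest sessions are reported, even though six of the seven are older than 60 days, because the five most recent ones are protected.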
> Side note: I never automate the removal of old increments. Always do that
> by hand, first without --force to check the increment dates it announces
> that will be removed, then with --force if it looks OK. The only thing
> that's automated wrt increment removal is a cron job reminding me of the
> task. I could even modify it to remind me daily if increment removal is
> due and wasn't done yet, but for now, I keep these reminders in my inbox
> until the removal is done.
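The two-step routine described in the quote could be wrapped like this. It is only a sketch: the helper name and the repository path are made up, and it merely builds the two command lines (using the `--remove-older-than` and `--force` options) rather than running anything, so the second step stays a manual decision.

```python
def removal_commands(repo, older_than):
    """Build the 'check' and 'really remove' command lines for one repository.

    repo: path to the rdiff-backup destination directory
    older_than: a time spec such as "1M" (one month)
    """
    base = ["rdiff-backup", "--remove-older-than", older_than]
    check = base + [repo]                # step 1: see which increments are affected
    remove = base + ["--force", repo]    # step 2: only after reviewing step 1
    return check, remove


# Hypothetical repository path.
check, remove = removal_commands("/mnt/nas/backup", "1M")
print(" ".join(check))
print(" ".join(remove))
```

One caveat, as far as I know: the run without `--force` only refuses when more than one increment would be removed, so if a single increment qualifies it may be deleted already in the first step.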
Sometimes automatic removal is a "Good thing(tm)", but again it depends.
I'm trying to maximize snapshots while still guaranteeing some directory