
[rdiff-backup-users] Wiki additions...


From: listserv . traffic
Subject: [rdiff-backup-users] Wiki additions...
Date: Thu, 26 Mar 2009 10:04:27 -0700

Since these issues seem to come up often, and they were issues I was
most concerned about, I thought I'd place them on the wiki.

Here's a draft: Comments welcome before I add them.
Did I state anything unclearly or get anything wrong? Please save me from later 
shame! :)

-Greg

---
Question:
What happens when I have to restore a file that has many reverse diffs to apply 
to it?

Answer:
RDiff-Backup takes the current version of the file and uses the stored 
meta-data to work out how to apply every reverse diff between now and the date 
you requested, provided a version for that date exists. 

It has to have all three parts: 
1) The current version of the file as it existed when RDiff-Backup was last 
run. 
2) The meta-data that tells the system if/when/how to apply the reverse diffs.
3) All the reverse diffs themselves. 

Question: 
Does the system have to restore all the reverse diffs for a file? What if there 
are dozens or even hundreds? 
What if only one is broken? Is the whole process of "restoring" the file broken?

Answer:
Yes, the system has to apply all the reverse diffs that apply to the "version" 
of the file you requested. If there were 200 reverse diffs, because the file 
had changed over 200 rdiff-backup sessions, then yes, it will have to apply all 
200 reverse diffs to get to the version of the file you want, if that is the 
oldest one. If any of the three parts of the system (current file, meta-data, 
or reverse diffs) is missing, the process will break, and you won't get your 
file. 
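
To make the chain dependency concrete, here is a toy sketch in Python. It is 
not how RDiff-Backup stores anything: the "reverse diff" is just a made-up dict 
of {line_index: older_text}, and the names restore and apply_reverse_diff are 
invented for this illustration. It only shows why one missing diff blocks every 
older version.

```python
# Toy model only: a "reverse diff" here is a dict of {line_index: older_text}.
# This is NOT rdiff-backup's real on-disk format.

def apply_reverse_diff(lines, diff):
    """Return the previous version of `lines` by applying one reverse diff."""
    older = list(lines)
    for index, text in diff.items():
        older[index] = text
    return older


def restore(current, reverse_diffs, steps_back):
    """Roll `current` back `steps_back` sessions, newest diff first.

    reverse_diffs[0] undoes the most recent session. A missing diff is
    represented by None and aborts the restore, because every older
    version depends on it.
    """
    version = list(current)
    for n in range(steps_back):
        diff = reverse_diffs[n]
        if diff is None:
            raise ValueError(f"reverse diff {n} is missing; cannot go further back")
        version = apply_reverse_diff(version, diff)
    return version


current = ["alpha v3", "beta v2"]
diffs = [{0: "alpha v2"}, {0: "alpha v1", 1: "beta v1"}]

two_back = restore(current, diffs, 2)  # applies both diffs, newest first
```

Replacing the first entry of diffs with None makes restore(current, diffs, 2) 
raise, even though the older diff is still intact.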

(There are ways to attempt to manually salvage the file, but these are far 
outside the scope of this document. Suffice it to say, that if any of the parts 
(file/meta-data/rdiffs) needed are missing, RDiff-Backup isn't going to be able 
to restore it automatically, and all bets are off. You'll be in deep weeds and 
if you're lucky you might be able to get parts of your data back. Perhaps if 
you're really super lucky and the missing reverse diffs overlap others *and* 
you can finagle the restore process, you might get everything back. Or, if it's 
just not your day, you won't get jack, you'll get fired, your dog will bite you 
and you'll get rabies...)

Question:
Isn't it dangerous to have to rely on all those reverse diffs, especially when 
they're being applied serially, and every single one of the reverse diffs has 
to apply properly, in order to get back to the version I want?

Answer:
Yes, it is "dangerous" - though any definition of dangerous depends on your 
perspective. (Just ask a BASE jumper about what's considered dangerous.) The 
design decision was to keep only the differences, and because of limitations in 
the rsync libraries it's impossible to merge rdiffs. While we're certainly not 
trying to convince you to use RDiff-Backup or to agree with our reasoning on 
what's best, we think reasonable trade-offs were made between the resources 
used and the advantages of redundancy.

Question:
OK, I like most of what I hear, but how can I be sure the whole system retains 
its integrity? Is there a way to test all the parts of the system and make 
sure they all work, and work properly? For example, can I have the system "self 
test" the archive and let me know if any parts of it fail?

Answer: 
Certainly. The "--verify-at-time xyz" switch is your friend. This switch, in 
essence, does a full restore of the file to the time specified in "xyz." In 
brief, it takes the current version of the file, and then uses the meta-data 
and applicable reverse diffs to roll the file back to the date specified (i.e. 
xyz). It then re-calculates the SHA-1 hash for the re-created file and compares 
that newly calculated hash with the SHA-1 hash it stored for this file when it 
was backed up on the date that corresponds to "xyz."
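
The hash comparison step can be sketched in a few lines of Python. This is only 
an illustration of the idea using the standard hashlib module, not 
RDiff-Backup's own code; the function names and the way the recorded hash is 
passed in are assumptions made for the example.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex):
    """True if the re-created file hashes to the value recorded at backup time.

    `recorded_hex` stands in for whatever hash was stored for the file;
    how it is looked up is outside this sketch.
    """
    return sha1_of_file(path) == recorded_hex.lower()
```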

If any part of the process fails, rdiff-backup will exit with a non-zero 
result. (And it should generate errors to the console...) 

If meta-data is damaged, and it can't figure out how to apply the rdiffs, you 
should get an error message.
If after rolling the file backward to date xyz, the check-sums don't match, 
you'll get an error.

Thus, to test the integrity of every piece of the system, pick a date for "xyz" 
that is at least as old as the oldest rdiff session. This should, by 
requirement, apply every reverse diff in the repository and all the meta-data.
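
If you want to run this check from a script (a nightly cron job, say), a 
minimal Python wrapper might look like the following. The repository path and 
date shown are placeholders, and the helper names are made up for this sketch; 
the only things taken from above are the "--verify-at-time" switch and the 
non-zero exit status on failure.

```python
import subprocess


def build_verify_command(repository, when):
    """Build the argument list for a verify run against `repository`."""
    return ["rdiff-backup", "--verify-at-time", when, repository]


def repository_verifies(repository, when):
    """Run the verification and report success based on the exit status."""
    result = subprocess.run(build_verify_command(repository, when))
    return result.returncode == 0


# Example with placeholder values (nothing is run at import time):
#   ok = repository_verifies("/backups/home", "2008-03-26")
```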

While a successful result from "--verify-at-time xyz" isn't sufficient to 
ensure that someone hasn't tampered with the rdiff-repository in an attempt, 
for example, to modify executable files, it is very strong evidence that 
chance or bad luck hasn't damaged the system. The odds of a random SHA-1 
collision for the same file are vanishingly small. (i.e. Two very similar files 
having the same SHA-1 checksum but not being equal, by simple chance (not 
malicious design), is exceedingly unlikely.)






Here's how R-DB works
RDiff-Backup "mirrors" the backed-up files, and for files that have changed 
since the last "backup" it creates reverse diffs.

So, for a repository that covers week-day backups, once daily for a year, 200 
diffs a year...
---
To roll-back a file, you'll need a good current version (i.e. one that matches 
the file at the time of the last RDiff-Backup run) and all the RDiffs, back to 
the time RDiff-Backup made the target-date archive.
(i.e. You have a year of RDiff-backups, with 200 versions/diffs. You want a 
file from a year ago that changed on every single RDiff-Backup run [and thus 
has changes in every single RDiff].)

Restoring that file will require the current version of the file as it was on 
the last RDiff-Backup run, and every single RDiff archive will need to be valid 
and uncorrupted to guarantee a successful restore.

Possible methods of verifying the integrity of the RDiff-backup archive...

---
Rough check of the archive...
You can probably ascertain the integrity of the existing RDiffs by checking the 
integrity of the .gz files. [If the GZ is uncorrupted, it's likely the RDiff it 
contains is OK too.]
However, this doesn't guarantee that all the correct increment data is 
available. It just verifies that what data IS there is *probably* not 
corrupted.
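
As a rough sketch of such a check, the Python below walks a directory tree and 
tries to decompress every .gz file it finds, using only the standard gzip 
module. Which directory to point it at is left to you, and the function name is 
made up for this example. Like the check described above, it says nothing about 
whether any increments are missing.

```python
import gzip
import os
import zlib


def find_corrupt_gz(root):
    """Return a sorted list of .gz files under `root` that fail to decompress."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".gz"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with gzip.open(path, "rb") as handle:
                    # Read to the end so the trailing CRC gets checked.
                    while handle.read(1 << 20):
                        pass
            except (OSError, EOFError, zlib.error):
                bad.append(path)
    return sorted(bad)
```

An empty list means every .gz file decompressed cleanly; it does not mean the 
repository is complete.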

The situation is a bit more complex than the above explanation, since just 
because you have, for example, the GZ file for the diff doesn't mean you have 
all the required pieces to apply it properly. More files are required to tell 
RDiff-Backup how and what to restore than just the RDiff file...

A simple answer that you can be fairly certain of: If you haven't deleted or 
modified *any* of the files in the RDiff repository, and all the GZ files pass 
integrity checks you're probably OK. 

However, deleting or modifying ANY files in the repository will have serious 
negative consequences for restoring files in that repository. You might still 
be able to do so, but it will require you to hand-edit or create the pieces to 
fake RDiff into doing the restore.

---
If you want to do a more complete test, you can do a "dry" restore back to the 
earliest version in the repository. 

This should apply all the available diffs to all appropriate files.

If there's a problem doing so, RDiff-Backup should throw an error, and by 
examining the error you should be able to determine the problem.

[Or alternatively, do a restore to the earliest critical version. i.e. You 
have a year's worth of rdiffs, but only 90 days are critical. Doing a restore 
to 90 days ago would test the most critical pieces of the archive, and would be 
less time and compute intensive than doing a full year.]

However, doing a "full" restore could consume a lot of disk-space and will be 
time and compute intensive.
[Does anyone want to give an estimate of how time intensive this might be - 
local disk to local disk?]
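
One way to script a trial restore like this is to restore into a scratch 
directory that gets deleted afterwards, so the disk space is only needed 
temporarily. The sketch below assumes the "--restore-as-of" switch and a target 
path that does not exist yet; the helper names, the repository path and the 
"90D" interval are placeholders for illustration, so check them against the 
man page for your version.

```python
import os
import subprocess
import tempfile


def build_restore_command(repository, when, target):
    """Build the argument list for restoring `repository` as of `when`."""
    return ["rdiff-backup", "--restore-as-of", when, repository, target]


def trial_restore(repository, when):
    """Restore into a throwaway directory and report whether it succeeded."""
    with tempfile.TemporaryDirectory() as scratch:
        # Use a fresh path inside the scratch directory, on the assumption
        # that rdiff-backup prefers to create the restore target itself.
        target = os.path.join(scratch, "restored")
        result = subprocess.run(build_restore_command(repository, when, target))
        return result.returncode == 0


# Example with placeholder values (nothing is run at import time):
#   ok = trial_restore("/backups/home", "90D")
```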

---






