rdiff-backup-users

Re[8]: [rdiff-backup-users] Verify times increasing


From: listserv . traffic
Subject: Re[8]: [rdiff-backup-users] Verify times increasing
Date: Wed, 25 Nov 2009 09:25:20 -0800

I'll lay out my vision of how I see this to make sure we're on the
same page, and perhaps you'll agree with my conclusions.

I'm quite sure you understand how RDiff works, but humor me...

RDiff does a backup of the current files, moving deltas (though with
a local file system drive it's doing all the checksums etc. on the
same machine - thus "less efficient" than an SSH transfer). When
you're done you get:

1) Regular files that are identical copies of the files that were
backed up. (Nothing needs to be done for a restore here; they're
exact copies.)

2) *Reverse* deltas (and metadata) to roll the files in (1) back to
the previous revision.

So, a --verify isn't needed to "verify" the current files. By the
very nature of RDB they're exact (provided you trust the RDB
protocol... which we assume).

A --verify IS needed when you want to check an "older" version, to be
sure that something hasn't borked your repository for an older delta
set. [But the "current" files are already verified, IMO.]

So your most important data - the current data - is verified.
[IMO] Progressively older delta sets are each less certain, as they
all get layered on top of each other in reverse order to get to the
"older" sets. [But I consider each "older" set to be progressively
less important, at least in general.]

So, I see your problem as the following.

1) Verify that the current backup completed properly.
(I do this via logs and exit codes. I don't "double-check" the
current backup by doing a --verify on the current backup set. I
implicitly trust that RDB does its job properly, that at the end the
hashes will match, and that the current "remote" files equal the
current "local" files - i.e. the files that were the source of the
backup equal the backup files.)
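A minimal sketch of that step, with hypothetical paths and log
location (the RDIFF variable just makes the command swappable for
testing): run the backup and rely on the exit code alone, no
--verify afterwards.

```shell
#!/bin/sh
RDIFF=${RDIFF:-rdiff-backup}        # backup command; override for testing
LOG=${LOG:-/tmp/rdiff-backup.log}   # hypothetical log location

run_backup() {
    # Trust rdiff-backup's own exit status instead of re-verifying
    # the freshly written mirror.
    if "$RDIFF" "$1" "$2" >>"$LOG" 2>&1; then
        echo "backup OK: $1 -> $2"
    else
        status=$?
        echo "backup FAILED (exit $status): $1 -> $2" >&2
        return "$status"
    fi
}

# Only run for real if the tool and the (hypothetical) source exist.
if command -v "$RDIFF" >/dev/null 2>&1 && [ -d /Volumes/RAID/data ]; then
    run_backup /Volumes/RAID/data /Volumes/FW800/backup
fi
```

Cron can then mail you on any non-zero exit, which is all the
"verification" this step needs.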

2) Verify that your older deltas are as intact as possible - that all
the metadata, deltas, and current files can be merged and rolled back
to whatever end point you want.

(This is where I use --verify. It's not perfect, because there's no
easy way to check every delta set for every single file in the
repository. [A recursive loop checking every version would do that,
but as you say, it's going to be very resource-expensive.])

3) Verify that the data is exact from your FW800 drive to the USB
drive on the Mac Mini.

(I wouldn't use --verify for this. As long as the files are equal
from the FW drive to the USB drive, if you can --verify on the FW
drive [the source] you should be able to --verify on the USB drive
too. So I'd either trust rsync to be sure they're equal, or do
something like you're doing: checking that the FW files are exactly
equal to the USB files.)

I'd do the verify on the fastest drive on the most powerful system.
Plus you don't need to do this all the time - say once a week; over a
weekend probably works. [And perhaps a full recursive loop through
all the diffs would be possible. If you write a bash script to do
that, I'd love to have it!]
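A rough sketch of that loop: each backup session leaves a
session_statistics.&lt;timestamp&gt;.data file under rdiff-backup-data,
and that timestamp is a valid --verify-at-time argument (worth
confirming on your rdiff-backup version). The repo path is
hypothetical, and RDIFF is swappable for testing.

```shell
#!/bin/sh
REPO=${REPO:-/Volumes/FW800/backup}   # hypothetical repository path
RDIFF=${RDIFF:-rdiff-backup}          # override for testing

list_increment_times() {
    # Extract a timestamp from each session_statistics file name.
    for f in "$1"/rdiff-backup-data/session_statistics.*.data; do
        [ -e "$f" ] || continue            # glob matched nothing
        t=${f##*/session_statistics.}      # strip leading path
        echo "${t%.data}"                  # strip trailing .data
    done
}

verify_all() {
    repo=$1 rc=0
    # Run a --verify-at-time pass for every increment in the repo.
    for t in $(list_increment_times "$repo"); do
        echo "verifying $repo at $t"
        "$RDIFF" --verify-at-time "$t" "$repo" || rc=1
    done
    return "$rc"
}

if command -v "$RDIFF" >/dev/null 2>&1 && [ -d "$REPO" ]; then
    verify_all "$REPO"
fi
```

As discussed, this is the expensive option - once a week over a
weekend is probably the right cadence.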

To recap:
** Trust RDB does the backup properly and that source = destination
without additional checks.

** --verify the backup repository on the FW drive, checking as much
as possible that all the older deltas and metadata are intact and
functioning properly.

** check that the FW drive copies exactly to the off-site USB
drive - but don't use --verify to accomplish this task. Just make
sure that the "off-site" repository is exactly equal to the
"on-site" FW drive.
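That last check could be sketched like this - paths are hypothetical,
and `diff -rq` stands in for whatever byte-level comparison you
prefer (an rsync checksum dry-run would work too):

```shell
#!/bin/sh
FW_REPO=${FW_REPO:-/Volumes/FW800/backup}   # on-site staging repo
USB_REPO=${USB_REPO:-/Volumes/USB/backup}   # off-site copy

compare_repos() {
    # Empty output (exit 0) means the trees are identical; any line
    # names a file that differs or exists on one side only.
    diff -rq "$1" "$2"
}

if [ -d "$FW_REPO" ] && [ -d "$USB_REPO" ]; then
    if diffs=$(compare_repos "$FW_REPO" "$USB_REPO"); then
        echo "repositories match"
    else
        printf 'MISMATCH:\n%s\n' "$diffs" >&2
        exit 1
    fi
fi
```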

HTH

-Greg



>>>> I'm not sure what you're doing with your --verify...
>>>>
>>>> It *sounds* like you want a full CRC style check of the *current*
>>>> files after the backup is complete. (i.e. File X gets updated with a
>>>> delta, and you want to verify that file X is the same both on the
>>>> source and destination locations/drives.)
>>
>>> Yes, although it's more of an internal consistency check within the
>>> rdiff-backup repository itself. I'm looking for a way to quickly
>>> verify the integrity my entire rdiff-backup repository.
>>
>>> In my scenario the repository is synced to an external USB drive that
>>> gets rotated each day (i.e. each day I put yesterday's drive in
>>> storage and bring a different drive out of storage to use for the
>>> next backup). I use rsync to transfer my rdiff-backup repository
>>> (which gets updated daily) to the USB drive. Then I run rdiff-backup
>>> --verify-at-time to verify that the files on the USB drive are not
>>> corrupt. But lately this has been taking too long.
>>
>>> Does that make sense?
>>
>> Yes, and the USB connection may explain the longish verify times,
>> since it's somewhat slow, compared to a SATA drive connected directly
>> to the controller...

> USB probably does have something to do with how long it takes. But on
> the other hand yafic can do a full verify in 1/4 of the time on the  
> same drive with the same data, etc. So maybe rdiff-backup could be  
> made to be faster?

>> But I see that you want to verify the "local" RDiff repository to the
>> "off-line" one.

> I'm not sure what you mean by this statement... I want to do an  
> internal consistency check on my rdiff-backup repository after it's  
> been rsync'd to the USB disk. I need to be sure that the data on the  
> USB disk is valid. I am doing the verify on the USB drive because that
> is the last place that the data will be copied before it goes into  
> secure storage (for up to a month, but normally just a few days).  
> Maybe an outline of my data flow will help you to understand what I'm
> trying to accomplish.

> First the hardware:
> - Xserve with raid array - this is being backed up with rdiff-backup
> - Firewire 800 drive attached to Xserve - staging location for
> rdiff-backup repository, gets a new revision each night
> - Mac Mini - remote backup "server"
> - USB 2.0 drive attached to Mac Mini - gets a copy of the rdiff-backup
> repo from the Firewire 800 drive on the Xserve

> Now the data flow:
> - Xserve runs rdiff-backup from raid array to local firewire drive
> - Xserve runs rdiff-backup --verify-at-time 0B on local firewire drive
> to verify integrity of most recent revision (this step may not be  
> necessary)
> - Mac Mini runs rsync to copy rdiff-backup repo from Xserve firewire  
> 800 drive to local USB drive
> - Mac Mini would now like to verify the integrity of the rdiff-backup
> repository that it just rsync'd to the USB drive
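That four-step flow could be scripted roughly as follows. The mount
points and the "xserve" host alias are hypothetical, and the guards
mean nothing runs unless the tools and paths actually exist:

```shell
#!/bin/sh
# Sketch of the quoted data flow; all paths and hosts are hypothetical.
FW=/Volumes/FW800/backup    # staging repo on the Xserve
USB=/Volumes/USB/backup     # rotating off-site drive on the Mac Mini

# Steps 1-2, run on the Xserve:
if command -v rdiff-backup >/dev/null 2>&1 && [ -d /Volumes/RAID ]; then
    rdiff-backup /Volumes/RAID "$FW"         # 1: RAID -> FW800 staging
    rdiff-backup --verify-at-time 0B "$FW"   # 2: check the newest revision
fi

# Steps 3-4, run on the Mac Mini:
if command -v rsync >/dev/null 2>&1 && [ -d "$USB" ]; then
    rsync -a --delete "xserve:$FW/" "$USB/"  # 3: mirror repo to USB
    rdiff-backup --verify-at-time 0B "$USB"  # 4: verify locally, off the Xserve
fi
```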

> During this last step I would rather not tie up any resources on the  
> Xserve. Instead, I want to do a fully local (to Mac Mini) verification
> of the rdiff-backup repository. This verification should let me know  
> if any link in the (hardware) chain is failing: is the firewire 800  
> staging drive failing? is the USB drive failing?

>> Not sure how to do that - I'd guess you could do it with some other
>> tools - not storing the hashes - just a full compare each time. (How
>> big is the repository? [I think you said, but I don't recall.])

> 100 GB mirror + 80 GB of rdiff data. So almost 200 GB

>> ---
>> But I'd guess your "local" repository isn't on the same disks as the
>> data, right?

> Right.

>> If so, then it's probably not a huge deal if it takes 20 hours to
>> check the local repository against the remote. [Though I guess all
>> that disk channel activity might impact other disk throughput too...]

> The drive will be moved to a secure location, so it needs to happen as
> quickly as possible. If we have a disaster (fire, etc.) a backup  
> doesn't do us much good if the most recent snapshot is still online  
> being verified (and hence consumed by the fire).

>> (Add a controller? Dunno...)
>>
>> I use a similar system and I don't verify the local repository to the
>> remote, though perhaps I should. (I trust rsync to make sure they're
>> the same...since it's not just copying the files - it's doing hash
>> matches like RDiff...)

> Even if rsync verifies that they're the same this is only a false  
> sense of security since the staging repo (the source that rsync copied
> from) could be corrupt and you'll never know it. This corruption could
> be sneaking into old revisions which you don't bother to verify  
> because it takes too long. There needs to be some way to verify that  
> ALL of the data is fully intact after it's been copied...
> --verify-at-time almost gets there, but not quite. It could get you
> there if you have lots of time to do a verify-at-time for each
> revision in the repo, but I'm guessing that would be prohibitively
> expensive in most cases.

>> BTW, is this on a windows platform? (Curious...) Ah, probably not
>> since yafic isn't... :)

> Nope. All machines are running Mac OS. I have aspirations to add some
> Windows machines at some point, but that's not likely until I get a  
> faster verify.

> ~ Daniel



-- 
Best regards,
 listserv                            mailto:address@hidden




