rdiff-backup-users

Re: [rdiff-backup-users] Feature requests questions/discussion


From: Ben Escoto
Subject: Re: [rdiff-backup-users] Feature requests questions/discussion
Date: Thu, 27 Oct 2005 01:29:02 -0500

>>>>> Wiebe Cazemier <address@hidden>
>>>>> wrote the following on Tue, 25 Oct 2005 14:00:30 +0200
> 
> Do you think it's possible to combine it with the copy
> syscall/API-call rdiff-backup probably uses so that the data which
> is read by copying is checksummed at the same time? If it is, it
> wouldn't even take more time to backup a new file than it would
> without checksums.

Yes, I just checked some patches into the unstable tree which do just
that.  So right now each mirror_metadata entry for a regular file has
an "SHA1Digest" field with the 40-character hex digest in it.  Only
hash writing has been added; rdiff-backup doesn't actually do anything
with the information yet.
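A minimal sketch of the single-pass idea, assuming a plain chunked copy between file objects.  This is not rdiff-backup's actual code, and `copy_with_sha1` is a hypothetical name.  Each block is fed to the hash as it is written, so the digest costs no extra read of the file.

```python
import hashlib
import io


def copy_with_sha1(src, dst, blocksize=64 * 1024):
    """Copy src to dst in blocks, hashing each block as it passes through.

    Hypothetical helper for illustration; returns the 40-character
    hex SHA-1 digest of the copied data.
    """
    sha1 = hashlib.sha1()
    while True:
        block = src.read(blocksize)
        if not block:
            break
        sha1.update(block)   # hash the same bytes we are about to write
        dst.write(block)
    return sha1.hexdigest()


if __name__ == "__main__":
    src = io.BytesIO(b"example file contents\n" * 1000)
    dst = io.BytesIO()
    digest = copy_with_sha1(src, dst)
    print(len(digest), digest)
```

Because the hash is updated from the same buffer that gets written, the source is read only once; the remaining cost is the CPU time of SHA-1 itself.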

I'm not sure about a speed penalty, but I just recently realized the
biggest drawback of writing hashes.  Hash data is incompressible, and
so a 160-bit hash like SHA1 will add at least 20 bytes (and probably
more) per regular file to the size of the compressed mirror_metadata
file.
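The incompressibility is easy to check with a quick experiment.  The sketch below uses Python's `zlib` (the same deflate algorithm gzip uses), assuming the metadata file is compressed with a deflate-based format; it is an illustration, not the code rdiff-backup runs.  It compresses a newline-separated list of hex SHA-1 digests and reports the cost per entry, which should come out at roughly 20 bytes or more, since each 40-character hex digest carries 160 bits of effectively random data.

```python
import hashlib
import zlib


def compressed_bytes_per_digest(count=1000):
    """Return the average compressed size, in bytes, of one hex SHA-1 line."""
    # Hypothetical stand-ins for per-file digests: hash distinct file names.
    lines = [hashlib.sha1(f"file{i}".encode()).hexdigest() for i in range(count)]
    text = "\n".join(lines).encode("ascii")
    compressed = zlib.compress(text, 9)  # maximum compression level
    return len(compressed) / count


if __name__ == "__main__":
    print(f"{compressed_bytes_per_digest():.1f} bytes per digest after compression")
```

For comparison, the raw binary form of a SHA-1 digest is exactly 20 bytes, so compression cannot do much better than storing the binary value.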

At least for my usage, this approximately triples the size of the
mirror_metadata file.  I like to keep about a year's worth of backups
of my files, and I have about a million files.  So adding the hashes
would turn each of my mirror_metadata files from 12MB as they are now
to 32MB+.  Over a year that would cost about 8GB.
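These numbers can be reproduced with a back-of-the-envelope calculation.  The sketch assumes one backup per day (365 metadata files kept for a year), which is not stated above but is consistent with the 8GB figure, and uses the 20-byte lower bound per hash.  `metadata_cost` is a hypothetical helper.

```python
def metadata_cost(num_files, base_mb, hash_bytes=20, backups_per_year=365):
    """Estimate the extra metadata storage that per-file hashes would add.

    Assumes every backup writes one mirror_metadata file covering all files,
    and uses decimal units (1 MB = 10**6 bytes, 1 GB = 1000 MB).
    """
    added_mb = num_files * hash_bytes / 10**6       # extra MB per metadata file
    new_mb = base_mb + added_mb                      # size of one file with hashes
    yearly_gb = added_mb * backups_per_year / 1000   # extra GB kept over a year
    return added_mb, new_mb, yearly_gb


if __name__ == "__main__":
    added, new, yearly = metadata_cost(num_files=1_000_000, base_mb=12)
    print(f"+{added:.0f} MB per file, {new:.0f} MB total, ~{yearly:.1f} GB/year")
```

This prints `+20 MB per file, 32 MB total, ~7.3 GB/year`; with the "probably more" allowance for per-hash overhead, the yearly figure rises toward the 8GB estimate.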

Before, I was assuming I would just turn hashing on and not expose any
other option.  But with this tradeoff I think we need to give people
the option.  So what do you think the default should be?  Keep the
hashes and triple the size of the mirror_metadata file?


-- 
Ben Escoto
