rdiff-backup-users

Re: [rdiff-backup-users] Feature requests questions/discussion


From: Wiebe Cazemier
Subject: Re: [rdiff-backup-users] Feature requests questions/discussion
Date: Thu, 27 Oct 2005 13:37:49 +0200
User-agent: Mozilla Thunderbird 1.0.7 (X11/20051026)

Ben Escoto wrote:
> Yes, I just checked some patches into the unstable tree which do just
> that.  So right now each mirror_metadata entry for a regular file has
> an "SHA1Digest" field with the 40-character hex digest in it.  Only
> hash writing has been added; it doesn't actually do anything with the
> information yet.

> I'm not sure about a speed penalty, but I just recently realized the
> biggest drawback of writing hashes.  Hash data is incompressible, and
> so a 160-bit hash like SHA1 will add at least 20 bytes (and probably
> more) per regular file to the size of the compressed mirror_metadata
> file.
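To make the size argument concrete, here is a minimal Python sketch. It is not rdiff-backup's code. It computes the 40-character hex SHA-1 digest described for the SHA1Digest field, streaming a file in chunks, and then measures how well zlib (the DEFLATE implementation gzip also uses) packs a batch of such digests. The helper names, the 64 KiB chunk size and the use of str(i) as stand-in file contents are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
import zlib


def sha1_hex(data: bytes) -> str:
    """Return the 40-character hex SHA-1 digest of data (160 bits = 20 bytes)."""
    return hashlib.sha1(data).hexdigest()


def sha1_file_hex(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a regular file in chunks so a large file never sits in memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compressed_bytes_per_digest(n: int) -> float:
    """Compress n distinct hex digests with zlib; return the average bytes per digest."""
    # str(i) stands in for file contents; distinct inputs give random-looking digests.
    blob = "".join(sha1_hex(str(i).encode()) for i in range(n)).encode("ascii")
    return len(zlib.compress(blob, 9)) / n


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.txt")
        with open(path, "wb") as f:
            f.write(b"abc")
        print(sha1_file_hex(path))  # a9993e364706816aba3e25717850c26c9cd0d89d
    print(compressed_bytes_per_digest(10_000))
```

Each hex character carries only 4 bits, so the best any compressor can do is squeeze the 40 characters back to about 20 bytes; the digests themselves look random, so nothing further can be removed. That is why the per-file cost stays at 20 bytes or more however well the rest of the metadata compresses.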

> At least for my usage, this approximately triples the size of the
> mirror_metadata file.  I like to keep about a year's worth of backups
> of my files, and I have about a million files.  So adding the hashes
> would turn each of my mirror_metadata files from 12MB, as they are
> now, to 32MB+.  Over a year that would cost about 8GB.
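The arithmetic behind these figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not rdiff-backup code: it assumes every kept backup has a full-size mirror_metadata file and that each file adds exactly 20 compressed bytes (the lower bound given above). The function name and parameters are hypothetical, and the number of backups per year is not stated in the message, so two possibilities are tried.

```python
SHA1_BYTES = 160 // 8  # a 160-bit digest cannot compress below 20 bytes


def yearly_metadata_gb(num_files: int, base_mb: float, backups: int,
                       hash_bytes: int = SHA1_BYTES) -> tuple[float, float]:
    """Return (GB without hashes, GB with hashes) summed over all kept backups.

    Assumes every backup keeps a full mirror_metadata file of base_mb megabytes
    and that each regular file adds exactly hash_bytes once digests are stored.
    """
    extra_mb = num_files * hash_bytes / 1_000_000  # 1,000,000 files x 20 B = 20 MB
    without_gb = base_mb * backups / 1000
    with_gb = (base_mb + extra_mb) * backups / 1000
    return without_gb, with_gb


# Figures from the message: about a million files, 12 MB per metadata file.
# The backup counts (daily, or roughly 250 a year) are assumptions.
for backups in (365, 250):
    without_gb, with_gb = yearly_metadata_gb(1_000_000, 12, backups)
    print(f"{backups} backups: {without_gb:.2f} GB without hashes, "
          f"{with_gb:.2f} GB with, {with_gb - without_gb:.2f} GB extra")
# 365 backups: 4.38 GB without hashes, 11.68 GB with, 7.30 GB extra
# 250 backups: 3.00 GB without hashes, 8.00 GB with, 5.00 GB extra
```

With daily backups the extra cost comes to at least 7.3 GB, close to the quoted 8 GB. With about 250 backups, 8 GB is instead the total with hashes and the hash-free metadata is 3 GB. Which reading applies depends on the backup schedule.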

> I had assumed I would just turn hashing on and not expose any
> other option.  But with this tradeoff I think we need to give people
> the option.  So what do you think the default should be?  Keep the
> hashes and triple the size of the mirror_metadata file?

If the 8 GB is the total with hashes, the same metadata without them would be a little under 3 GB. In my opinion that doesn't matter much, although I tend to overlook that some people simply don't have the resources to spare. But a difference of 5 GB over a year? The question to ask is: "Is your data worth the investment of a bit of disk space?" In my opinion, this should be standard behaviour, not something hidden behind an option.

The disadvantage of options is that it's easy to forget one or two now and then. I run my backups from scripts, but what if someone doesn't? What happens if you forget --store-checksums? Will you end up with an archive in which some files are checksummed and some are not? Or will rdiff-backup simply enable checksums when you back up to an archive that already has them?

In the end, my opinion is this: either make it standard behaviour, or, if it is enabled with an option, make sure that once it is enabled for the first backup, subsequent backups have it enabled implicitly. Likewise, if it is disabled at first and enabled at some later point, keep it enabled for all subsequent backups.
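The "sticky" rule proposed above could look roughly like this. It is only a sketch of the proposal, not rdiff-backup behaviour: --store-checksums is the hypothetical flag name used in this message, and ArchiveState and should_store_checksums are invented names.

```python
from dataclasses import dataclass


@dataclass
class ArchiveState:
    """Hypothetical summary of an existing backup archive."""
    exists: bool          # False for the very first backup
    has_checksums: bool   # True once any earlier backup stored SHA1Digest fields


def should_store_checksums(flag_given: bool, archive: ArchiveState) -> bool:
    """Sticky opt-in: once an archive has checksums, every later backup keeps them."""
    if flag_given:
        # Explicit --store-checksums (hypothetical flag), on the first backup or any later one.
        return True
    # Forgetting the flag must not leave an archive where only some
    # backups are checksummed, so inherit the setting from the archive.
    return archive.exists and archive.has_checksums


print(should_store_checksums(False, ArchiveState(exists=True, has_checksums=True)))   # True
print(should_store_checksums(False, ArchiveState(exists=True, has_checksums=False)))  # False
```

Making checksums always on would reduce this function to returning True and remove the forgotten-flag problem entirely.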

Perhaps these issues justify just having it enabled all the time.

BTW, what is your plan for what happens when someone upgrades to an rdiff-backup version that enables checksums? Will the archive remain backwards compatible with older versions?



