From: | Wiebe Cazemier |
Subject: | Re: [rdiff-backup-users] Feature requests questions/discussion |
Date: | Thu, 27 Oct 2005 13:37:49 +0200 |
User-agent: | Mozilla Thunderbird 1.0.7 (X11/20051026) |
Ben Escoto wrote:
Yes, I just checked some patches into the unstable tree which do just that. So right now each mirror_metadata entry for a regular file has an "SHA1Digest" field with the 40-character hex digest in it. Only hash writing has been added; it doesn't actually do anything with the information yet. I'm not sure about a speed penalty, but I just recently realized the biggest drawback of writing hashes. Hash data is incompressible, and so a 160-bit hash like SHA1 will add at least 20 bytes (and probably more) per regular file to the size of the compressed mirror_metadata file. At least for my usage, this approximately triples the size of the mirror_metadata file. I like to keep about a year's worth of backups of my files, and I have about a million files. So adding the hashes would turn each of my mirror_metadata files from 12MB as they are now to 32MB+. Over a year that would cost about 8GB. I was assuming before I would just turn hashing on and not expose any other option. But with this tradeoff I think we need to give people the option. So what do you think the default should be? Keep the hashes and triple the size of the mirror_metadata file?
Without the hashes, the 8 GB of metadata would be a little under 3 GB. In my opinion, that doesn't matter much, but I tend to forget that some people just don't have the resources to spare. But a difference of 5 GB over a year? One question you can ask is: "Is your data worth the investment of a bit of hard disk space?" In my opinion, this can be standard behaviour rather than something available only under an option.
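As a rough check of the figures quoted above, here is a back-of-the-envelope sketch in Python. It uses only the numbers from the quoted message (12 MB per mirror_metadata file, about a million files, at least 20 incompressible bytes per SHA1 digest). The function names and the snapshots-per-year parameter are my own assumptions for illustration, and decimal megabytes are assumed throughout.

```python
# Back-of-the-envelope estimate of the metadata growth caused by hashes.
# Assumptions (not from rdiff-backup itself): decimal units (1 MB = 10**6
# bytes) and that each digest costs exactly its raw 20 bytes after
# compression, which the quoted message describes as a lower bound.

SHA1_RAW_BYTES = 20  # 160 bits / 8


def metadata_size_mb(base_mb, num_files, hash_bytes=SHA1_RAW_BYTES):
    """Size of one mirror_metadata file (in MB) once hashes are added."""
    return base_mb + num_files * hash_bytes / 1_000_000


def yearly_total_gb(per_snapshot_mb, snapshots_per_year):
    """Total metadata kept for a year (in GB) for a given snapshot count."""
    return per_snapshot_mb * snapshots_per_year / 1000


with_hashes_mb = metadata_size_mb(12, 1_000_000)
print(with_hashes_mb)        # 32.0, matching the "12MB to 32MB+" figure
print(with_hashes_mb / 12)   # growth factor, a bit under 3
```

The yearly total depends entirely on how many snapshots are kept, which the message does not state, so `yearly_total_gb` takes it as a parameter instead of guessing.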
A disadvantage of options is that it is possible to forget to supply one or two of them. I do my backups from scripts, but what if someone does not? What will happen if you forget --store-checksums? Will you have an archive in which some files are checksummed and some are not? Or will rdiff-backup simply enable checksums if you're backing up to an archive which already has them?
In the end, my opinion is this: either make it standard behaviour, or, if it is enabled with an option, make sure that once it is enabled for the first backup, subsequent backups have it enabled implicitly. Likewise, if it is first disabled and then enabled some time later, keep it enabled for subsequent backups as well.
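To make the "sticky" behaviour concrete, here is a minimal sketch in Python of the rule I have in mind. This is not rdiff-backup code: the function name and both parameters are hypothetical, and how the archive would record that it already contains checksums is left open.

```python
def should_store_checksums(option_given, archive_has_checksums):
    """Decide whether this backup session should write checksums.

    Hypothetical rule: once an archive contains checksums, every later
    session keeps writing them, whether or not the option is supplied.

    option_given          -- True if the user passed the checksum option
    archive_has_checksums -- True if an earlier session stored checksums
    """
    return option_given or archive_has_checksums


# First backup with the option, later backups without it: stays enabled.
print(should_store_checksums(True, False))   # True
print(should_store_checksums(False, True))   # True
# Never enabled: stays disabled.
print(should_store_checksums(False, False))  # False
```

With this rule, forgetting the option on a later run cannot produce an archive where only some sessions are checksummed.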
Perhaps these issues justify just having it enabled all the time.

BTW, what is your plan for what happens when one upgrades to an rdiff-backup version which enables checksums? Will an archive be backwards compatible with older versions?