Re: [rdiff-backup-users] File change detection using hashes

From: Wiebe Cazemier
Subject: Re: [rdiff-backup-users] File change detection using hashes
Date: Mon, 13 Feb 2006 23:42:39 +0100
User-agent: Mozilla Thunderbird 1.0.7 (X11/20051026)

(feature-discussion summary at the end of this mail)

On 02/13/06 21:30, dave kempe wrote:

> I think I misread your request. I believe the current method to work
> with changing files during the backup is to take an LVM snapshot. Is
> that going to be possible for you?

I think there still is some misunderstanding.

The problem is in the "diff" part of rdiff-backup. It reads the source
dir and selects for backup any file whose mtime or size differs from
that of its most recent backup.

But, as I showed with my "mv" example (originally a "cp" example, but it
should be "mv"), the contents of a file can change without the mtime or
size changing. That example is not the only way this can happen. In my
original feature request, I wrote down an elaborate explanation of how
it happens when a system's package management is involved.

But it comes down to this: using only mtime and size to determine if a
file should be selected for backup is unreliable. The data of the file
must be checked, hence the checksum comparison.
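
A minimal sketch of the "mv" case, to make the point concrete. This is
not rdiff-backup's code; the file names, the helper names and the choice
of SHA-1 are assumptions made for illustration only. Two files of equal
size are given the same mtime, one is renamed over the other, and the
(size, mtime) pair is compared with a content hash before and after.

```python
import hashlib
import os
import tempfile


def stat_signature(path):
    """Return (size, mtime); mtime is cut to whole seconds for simplicity."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def sha1_of(path, chunk_size=65536):
    """Hash the file contents in chunks so large files are not loaded whole."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Rename a file over another one that has the same size and mtime."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "config")      # hypothetical names
        other = os.path.join(workdir, "config.new")
        with open(target, "wb") as handle:
            handle.write(b"setting=A\n")
        with open(other, "wb") as handle:
            handle.write(b"setting=B\n")
        # Force identical timestamps on both files.
        os.utime(target, (1000000000, 1000000000))
        os.utime(other, (1000000000, 1000000000))

        before_stat, before_hash = stat_signature(target), sha1_of(target)
        os.replace(other, target)  # same effect as "mv config.new config"
        after_stat, after_hash = stat_signature(target), sha1_of(target)

        # (metadata looks unchanged?, contents unchanged?)
        return before_stat == after_stat, before_hash == after_hash
```

Calling demo() shows the (size, mtime) pair staying equal while the hash
differs, which is the situation a selection based only on mtime and size
cannot see.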

But, now that you mention it, detection of changes in files which are
processed _during_ a backup (which produces the well-known
update-errors) also relies on mtime, which is not 100% reliable. But I
don't think that's too bad. If it is, the ctime could be used. For this
feature, the ctime may very well be an excellent choice, because we're
only talking about the time-span it takes to do a backup. Using ctimes
for the file-selection-for-backup situation is not practical, because
very little is needed to change a ctime; doing a backup of your disk
with "dar" with default settings is one example. That would cause
rdiff-backup to think every file on disk has changed.
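
The during-backup check could look roughly like the sketch below. It is
only an illustration of the idea, not how rdiff-backup does it: the
function name and the injectable "stat" parameter are hypothetical, and
the whole file is read in one go to keep the example short.

```python
import os


def read_unless_changed(path, stat=os.stat):
    """Read a file and report whether its ctime moved while it was read.

    Returns (data, changed). The stat parameter is an assumption of this
    sketch; it only exists so the check can be exercised in isolation.
    """
    before = stat(path).st_ctime_ns
    with open(path, "rb") as handle:
        data = handle.read()
    after = stat(path).st_ctime_ns
    # Reading does not touch ctime, so a difference means something else
    # changed the file's data or metadata in the meantime.
    return data, before != after
```

Because the window is only as long as the read itself, unrelated ctime
changes (such as the dar case) are much less likely to get in the way
than they would be between two backup sessions.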

You know, come to think of it, perhaps it's not that bad a choice to use
ctimes to detect if a file has changed for determining if it should be
backed up or not. It could be available as an option. When the ctime has
changed, _some_thing must have happened to the file. If the contents
turn out to be the same when rdiff-backup starts to diff the file, the
resulting diff increment for the new session is almost 0 bytes (there is
some header info or something, but not much). This could be a lot faster
than my --checksum-diffs option, because it only reads the contents of
the files whose ctime has changed. A possible problem with this is that
some filesystems don't have ctimes. But vfat's mtime, for example, acts
like a ctime, and when you request the ctime, you get the mtime, it
would seem. So that's not really a problem. And as for ctimes changing
because of dar: dar can be run with the "--alter=atime" option, to avoid
dar resetting the atime (restoring the atime is what results in a change
in ctime). Ben tried to implement ctime checking, but there was a
problem, and he forgot what it was :(
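
A rough sketch of that selection rule, under stated assumptions: the
"Recorded" tuple, the function name and the hash_file callback are all
made up for this example, and a content hash stands in for the diff that
rdiff-backup would actually compute. The point is only that the file
data is read when, and only when, the ctime has moved.

```python
from collections import namedtuple

# Hypothetical record of what was stored for a file at the last backup.
Recorded = namedtuple("Recorded", "size mtime ctime sha1")


def needs_backup(recorded, size, mtime, ctime, hash_file):
    """Decide whether to select a file, reading data only if ctime moved.

    hash_file is a zero-argument callable returning the current content
    hash; it is called only when mtime and size match but ctime differs.
    """
    if size != recorded.size or mtime != recorded.mtime:
        return True   # the existing mtime/size rule already catches this
    if ctime == recorded.ctime:
        return False  # no recorded change at all, skip without reading
    # ctime changed while mtime and size did not: confirm by content.
    return hash_file() != recorded.sha1
```

For example, with rec = Recorded(10, 100, 200, "aa"), a file that still
reports ctime 200 is skipped without being read, while one reporting
ctime 201 is hashed and selected only if the hash is no longer "aa".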

OK, to sum this up, mostly for Ben (and I do hope you're reading this,
because it's kind of critical IMO):

Something _has_ to be devised to detect changes in files properly, to
avoid files not being backed up. Perhaps you could try to implement an
option for ctime checking, and possibly discover again why that's not
possible. If it _is_ impossible, my --checksum-diffs should be
implemented, IMO.

And, would it be possible to check for changes that occurred in files
_during_ a backup with ctimes? That would be more reliable than mtimes.

