Re: [rdiff-backup-users] Safe to compare by ctime?

From: Andrew K. Bressen
Subject: Re: [rdiff-backup-users] Safe to compare by ctime?
Date: Sat, 28 Jun 2003 14:52:24 -0400
User-agent: Gnus/5.090024 (Oort Gnus v0.24) XEmacs/21.4 (Common Lisp, linux)

Ben Escoto <address@hidden> writes:
> Hi all, I'm wondering how safe it would be to determine whether
> anything about a file has changed simply by looking at its ctime (plus
> probably some sanity checks, like the file type).

As a sysadmin, my feeling about backup software is that it
should be iron-clad; no matter what weird stuff happens, 
it has to work. So, my desired algorithm for seeing if a file
has changed would be to check all the metadata plus a checksum
(or two different checksums, or an actual full compare); 
anything less and a clever trojan, an experimental filesystem with
a bug in it, or a sysadmin script gone badly awry could munge
things up. 
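
A rough sketch of the kind of check I mean, in Python. This is not how
rdiff-backup does it; the names FileRecord, snapshot and has_changed are
made up for illustration, and SHA-256 is just an assumed choice of checksum.

```python
import hashlib
import os
import stat
from typing import NamedTuple


class FileRecord(NamedTuple):
    """Metadata plus a content checksum for one file (hypothetical layout)."""
    file_type: int
    mode: int
    uid: int
    gid: int
    size: int
    mtime_ns: int
    ctime_ns: int
    sha256: str  # empty string for anything that is not a regular file


def snapshot(path: str) -> FileRecord:
    """Record everything we want to compare later for a single path."""
    st = os.lstat(path)  # lstat: describe a symlink itself, not its target
    digest = ""
    if stat.S_ISREG(st.st_mode):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
    return FileRecord(
        stat.S_IFMT(st.st_mode), stat.S_IMODE(st.st_mode),
        st.st_uid, st.st_gid, st.st_size,
        st.st_mtime_ns, st.st_ctime_ns, digest,
    )


def has_changed(old: FileRecord, path: str) -> bool:
    """True if any recorded metadata field or the checksum differs."""
    return snapshot(path) != old
```

A ctime-only check would compare just the ctime_ns field (plus file_type
as a sanity check) and skip reading the file, which is where the speed
comes from.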

That said, another nice attribute of backup software is that its
elapsed run-time be as short as possible (I'm assuming speed is the
reason to propose only using ctime in the first place) to avoid two
problems:

(1) Unsynced files. Two files are related to each other; one
gets backed up, they both change, then the other gets backed up.
rdiff-backup does warn if a file has changed during the
backup process, which is important, but it's nice if this simply
doesn't happen in the first place.

(2) System crash/reboot/etc. during backup, leaving you with a partial
backup. You could drop back to the last increment (maybe rdiff-backup
does; I haven't checked), but it's nice to be able to use whatever files
made it across before the crash, except then you have to make sure that
problem #1 doesn't occur. All in all, it's messy, and the whole
situation is best avoided by faster backups.

I think one could get the best of both worlds at the cost of feature
creep and complexity. 

Make using ctime only an option, AND 
add a run-mode that (1) verifies everything for a specific increment and 
                    (2) reports what has changed (and thus can't be verified).

This would let folks run a backup quickly, and then the verify can
take its leisurely checksumming time. A few syslog files and the like
might change in the meantime and so can't be verified; this
is OK for most people. The people (or files) for whom it is not
OK need to run their backups using more than just ctime.
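
Here is a minimal sketch of such a verify pass in Python, under some
assumptions of mine: the backup recorded each file's ctime (in
nanoseconds) in a dict keyed by relative path, and the mirror is a plain
directory tree. The function names and the report layout are
hypothetical, not anything rdiff-backup provides.

```python
import hashlib
import os


def sha256_of(path):
    """Checksum a file in chunks so large files stay out of memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(recorded, source_root, mirror_root):
    """Sort every recorded file into verified / changed / mismatch.

    recorded: dict of relative path -> ctime_ns seen at backup time
    (an assumed format, purely for illustration).
    """
    report = {"verified": [], "changed": [], "mismatch": []}
    for rel, ctime_ns in sorted(recorded.items()):
        src = os.path.join(source_root, rel)
        dst = os.path.join(mirror_root, rel)
        try:
            unchanged = os.lstat(src).st_ctime_ns == ctime_ns
            src_sum = sha256_of(src) if unchanged else None
        except OSError:
            unchanged = False  # source vanished or became unreadable
        if not unchanged:
            # The source moved on since the backup, so it cannot be verified.
            report["changed"].append(rel)
            continue
        try:
            ok = sha256_of(dst) == src_sum
        except OSError:
            ok = False  # a missing or unreadable mirror copy is a failure
        report["verified" if ok else "mismatch"].append(rel)
    return report
```

The "changed" list is the report of what couldn't be verified; the
"mismatch" list is the one to worry about. A more careful version would
re-check the ctime after checksumming, since a file can change while it
is being read.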

One then gets the nice side benefit of being able to generate 
reports of, say, exactly which files have changed since a given date. 

As long as I'm soap-boxing and feature mongering, by the way, I'll do
my usual spiel about backup software features. [Ben, your program is
lovely as is, feel free to ignore me, I just proselytize this upon
the slightest excuse; if you choose to implement, I'll send you
cookies and flattery, but I don't want to seem like I'm expecting you
to make my priorities into yours; consider the following as food for
thought.]

Two common applications (and a few less common ones) are
closely related to backups and to each other.
These are: integrity checking and
           file finding (à la locate(1)).

A good backup system, by building a checksummed and timestamped list
of files (and by the way, I don't know for sure that rdiff-backup
stores checksums in the metadata files instead of computing them from
the source and target files on the fly), has done most of the back-end
work of tripwire/integrit/etc and of the updatedb program for
locate/slocate/etc. If backup software were to include some stuff to
manage this data, and some front end tools to access it, life would be

In rdiff-backup's case, I think the management tool needed would be
something to grab parts of one or several metadata lists and move them
someplace. This is needed for an integrity tool, because one wants to
be able to cut a piece of read-only media (perhaps as small as a floppy)
to run verifies against. This is needed for a locate tool because one
may wish to back up a large filesystem, but only enable searches against
a smaller part of it, or conversely back up two filesystems separately
but search them together. 

The front end tools would be something to run an integrity verify
against specified files in one or more metadata files, 
something to list the contents of a metadata file, 
and something to perform find(1) against a metadata file. 
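
To make the find(1)-against-a-metadata-file idea concrete, here is a
small Python sketch. The Entry record is a format I invented for the
example (it is not rdiff-backup's metadata format), and the three
filters are only a tiny subset of what find(1) offers.

```python
import fnmatch
import posixpath
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    """One line of a hypothetical metadata list."""
    path: str      # path relative to the backup root, '/'-separated
    size: int
    mtime: float   # seconds since the epoch
    sha256: str


def find(entries: Iterable[Entry],
         name: Optional[str] = None,
         under: Optional[str] = None,
         newer_than: Optional[float] = None) -> Iterator[Entry]:
    """Yield entries matching every filter that was given.

    name:       shell-style pattern matched against the basename
    under:      only entries at or below this directory
    newer_than: only entries modified after this timestamp
    """
    for e in entries:
        if name is not None and not fnmatch.fnmatchcase(
                posixpath.basename(e.path), name):
            continue
        if under is not None:
            prefix = under.rstrip("/") + "/"
            if e.path != under and not e.path.startswith(prefix):
                continue
        if newer_than is not None and e.mtime <= newer_than:
            continue
        yield e
```

Searching two separately backed-up filesystems together is then just
find(itertools.chain(list_a, list_b), ...), and cutting a smaller list
out of a big one is list(find(entries, under="some/dir")). The
newer_than filter also gives the which-files-changed-since-a-date report.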
