Re: [rdiff-backup-users] feature requests and notes


From: Ben Escoto
Subject: Re: [rdiff-backup-users] feature requests and notes
Date: Sat, 1 Nov 2003 23:58:56 -0800

On Sat, 01 Nov 2003 23:30:12 -0500
address@hidden (Andrew K. Bressen) wrote:
> I was thinking of adding a bit more stuff to the metadata file, adding
> an option to rdiff-backup to place a second copy of the metadata file
> in some specified location to be used for the new functionality, and a
> set of CLI utilities to perform operations on the spare metadata file.
> 
> Looking at an rdiff-backup metadata file, I see filetype, size,
> modtime, uid, gid, mode (permissions), and device number (if any)
> being stored. I assume if I were running 0.13.x and had a filesystem
> that supported ACLs and other EAs, that they'd be in there too,
> along with inode number and ctime.

ctime is still on the todo list.  ACLs and EAs are stored in separate
files, in the formats understood by {get|set}f{acl|attr}.  Currently
inode numbers are only stored if linkcount >= 2.
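
(For the curious, the hard link handling boils down to something like
the sketch below.  This is illustrative Python, not the actual
rdiff-backup code, and the field names are made up:)

    import os

    def maybe_record_inode(path, entry):
        # Only files that could be one of several hard links
        # (link count >= 2) get inode/device info recorded.
        st = os.lstat(path)
        if st.st_nlink >= 2:
            entry['inode'] = st.st_ino
            entry['device'] = st.st_dev
            entry['numlinks'] = st.st_nlink
        return entry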

> I'm assuming rdiff-backup doesn't store atime or any hashes of the
> files.

That's right.  Neither would be very expensive though.
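
Hashing in particular could ride along with the read the backup has to
do anyway.  A rough sketch (modern Python, not rdiff-backup code) of
hashing a file while copying it in a single pass:

    import hashlib

    def copy_and_hash(src, dst, blocksize=64 * 1024):
        # Compute a SHA-1 over the same blocks being copied, so the
        # hash costs no extra read of the file.
        sha = hashlib.sha1()
        with open(src, 'rb') as fin, open(dst, 'wb') as fout:
            while True:
                block = fin.read(blocksize)
                if not block:
                    break
                sha.update(block)
                fout.write(block)
        return sha.hexdigest()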

> A tripwire or integrit database is basically one of those metadata
> files with a bit more stuff in it: atime and several different hashes
> of the file. So, if those things get added, to make a file integrity
> checker, all that's needed is a program that compares the actual
> filesystem to the metadata file and reports differences. It does need
> a bit of logic to deal with a list of allowed differences it shouldn't
> complain about.
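
The comparison part of such a checker could indeed be small.  A rough
sketch, assuming the metadata has already been parsed into a dict
mapping path to a field dict (field names here are just illustrative),
with an "allowed" set of (path, field) pairs that shouldn't be
reported:

    import os, stat

    FIELDS = ('size', 'mtime', 'uid', 'gid', 'mode')

    def live_entry(path):
        # Read the comparable fields back off the live filesystem.
        st = os.lstat(path)
        return {'size': st.st_size, 'mtime': int(st.st_mtime),
                'uid': st.st_uid, 'gid': st.st_gid,
                'mode': stat.S_IMODE(st.st_mode)}

    def check(stored, allowed=frozenset()):
        # Yield (path, field, stored_value, live_value) for every
        # unexpected difference between metadata and filesystem.
        for path, entry in stored.items():
            try:
                live = live_entry(path)
            except OSError:
                yield (path, 'missing', None, None)
                continue
            for field in FIELDS:
                if (field in entry and entry[field] != live[field]
                        and (path, field) not in allowed):
                    yield (path, field, entry[field], live[field])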
> 
> To make a file locator, you need a program that searches the
> metadata file, and which perhaps is setuid and can tell which files
> the invoking user is allowed to see info about. Making it able to act
> substantially like find(1) but with less heinous syntax is nice.
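
A locator over the parsed metadata could start out as little more than
this (a sketch; the uid check stands in for the "which files may the
invoking user see" logic a setuid wrapper would need):

    import fnmatch

    def locate(entries, pattern, uid=None):
        # entries: dict mapping path -> field dict from the metadata.
        # Report paths whose basename matches the glob pattern; if uid
        # is given, only that user's files are shown.
        for path, entry in entries.items():
            if uid is not None and entry.get('uid') != uid:
                continue
            if fnmatch.fnmatch(path.rsplit('/', 1)[-1], pattern):
                yield path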
> 
> To make a duplicate file finder, sort the metadata file on a hash
> and report doubles. Make an option to exclude zero-length files.
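
Grouping by hash works as well as sorting here.  A sketch, assuming
(path, size, hash) rows pulled out of one or more metadata files;
concatenating rows from several systems' files gives the cross-machine
version for free:

    from collections import defaultdict

    def find_duplicates(rows, skip_empty=True):
        # rows: iterable of (path, size, hash) tuples.  Returns the
        # groups of paths sharing a hash; zero-length files are
        # skipped by default.
        by_hash = defaultdict(list)
        for path, size, hashval in rows:
            if skip_empty and size == 0:
                continue
            by_hash[hashval].append(path)
        return [paths for paths in by_hash.values() if len(paths) > 1]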
> 
> If these programs can deal with multiple metadata files, that's nice. 
> Then one could do a duplicate file find or locate across multiple systems
> easily. 
> 
> I'm assuming that people writing this stuff could reuse some 
> of rdiff-backup's modules/objects/code but that they would be separate
> programs so as to minimize the complexity/error-proneness impact on
> rdiff-backup itself. I suspect rdiff-backup would be the most efficient
> place to put the hash generation since it has to read everything at 
> some point anyway, but if this seems like a performance or complexity 
> issue then perhaps the hash-making could be done separately from the backup
> pass and/or live in a different utility. 

One question that would have to be decided is what interface to give
for access to this metadata file.  For instance, we could provide
functions to list the files in a directory, to return the statblock of
a file, to read a symlink, etc.  But then it seems we are just
building a file system using a non-standard interface.  We could
conclude two things from this:

1.  There is no point, because the file system itself already provides
    a better/quicker/non-redundant interface to its metadata.  Why
    would a file locator want to read my rdiff-backup metadata file
    instead of just reading the directories themselves?

2.  Or if there is a point (say to access older information), perhaps
    we could take a page from the duplicity discussion earlier and
    write a filesystem interface to the rdiff-backup metadata file.
    Then existing utilities like find would already be compatible.
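
For concreteness, the kind of non-standard interface I mean above
could be as small as this (a sketch only; the class and method names
are made up):

    class MetadataView:
        # Read-only, filesystem-like view over a parsed metadata file;
        # entries is a dict mapping path -> field dict.
        def __init__(self, entries):
            self.entries = entries

        def listdir(self, dirpath):
            # Immediate children of dirpath, from the stored paths.
            prefix = dirpath.rstrip('/') + '/'
            return sorted({p[len(prefix):].split('/', 1)[0]
                           for p in self.entries if p.startswith(prefix)})

        def stat(self, path):
            # The stored "statblock" for path.
            return self.entries[path]

        def readlink(self, path):
            # Assumes symlink targets are kept in the metadata.
            return self.entries[path]['link_target']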

I am (slowly) making progress on the duplicity archive idea talked
about earlier, so for #2 there could be the possibility of overlap
between the rdiff-backup metadata files and the duplicity archive
index, and perhaps they could even be in the exact same format.

It seems compressed flat text files like the current metadata files
wouldn't be the best choice for this.  A traditional file system
layout would be better, but traditional file systems still need to be
indexed separately, as with locate/updatedb, which violates the concept of
one-pass backing-up and indexing.  Perhaps some even more complicated
format is called for?


-- 
Ben Escoto
