From: Will Dyson
Subject: Re: [Duplicity-talk] Re: [rdiff-backup-users] Pretty pictures and new version of proposal
Date: Tue, 30 Sep 2003 07:00:45 -0400

On Mon, 2003-09-29 at 15:17, John Goerzen wrote:

> That is exactly one thing I was thinking :-)  I really don't see what it
> buys anybody.  If the index contains an offset to the start of the metadata
> in the regular stream, is that not enough?  Any extraction program could
> seek to that offset, read the metadata and continue reading straight on into
> the file's data.

> Also, I don't know what storing the contents of a directory does for you,
> since simply scanning the index could give that information anyway.

Both are space-versus-time tradeoffs for listing the contents. When
deciding whether a tradeoff is worth it, remember that each seek into a
new block requires that block to be decoded.

Now that I think about it, it might also be nice for duplicity to be
able to just snip the index out of the file when determining what needs
to be backed up. And the archive header. The scp/sftp transport can do 
that, right?

So redundant metadata might be worth it, but stored directory contents
probably not. If the index were huge, they might be nice for finding
that one file quickly. But defining a sorted order for the index entries
could also make it quick to look up an arbitrary file.
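
As a rough sketch of the sorted-index idea (Python, with made-up paths
and offsets; none of these names come from the proposal), a binary
search over entries sorted by path finds one file without scanning the
whole index:

```python
import bisect

# Hypothetical index entries: (path, offset of the file's metadata).
# Paths and offsets are invented for illustration only.
index = sorted([
    ("etc/passwd", 4096),
    ("home/will/notes.txt", 81920),
    ("etc/hosts", 512),
    ("var/log/messages", 20480),
])
paths = [path for path, _ in index]


def find_offset(path):
    """Return the stored offset for path, or None if it is not indexed."""
    i = bisect.bisect_left(paths, path)  # O(log n) on the sorted paths
    if i < len(paths) and paths[i] == path:
        return index[i][1]
    return None
```

This only helps if the writer commits to one sort order (plain byte-wise
path comparison here, which is an assumption) and the reader can rely on it.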

> Some other comments:
>  * You talk about requiring a root directory header.  Sometimes people just
>    want to store a file or three, and there is no real directory to list
>    as a root.

It wouldn't have to be a real directory that would be extracted. If
directory contents are stored, then a reader should be able to rely on

>  * Regarding error correction -- every file should absolutely have some
>    sort of modern checksum (MD5, SHA, etc) associated with it.  Also,
>    file header blocks should start with a recognizable byte sequence,
>    so an extraction program can make a reasonable attempt to recover an
>    archive starting at any arbitrary position within it (for instance,
>    if the dog ate the first 10 meters of tape)

Good point. Similar caution should be taken with the outer block layer.
So block encoding should be specified at the start of each block.
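
To make that concrete, here is a minimal sketch in Python of a block that
carries a recognizable marker, its own encoding id, and a checksum, plus
a reader that resynchronizes after damage. The marker bytes, header
layout, and encoding ids are all assumptions for illustration, not
anything the proposal specifies:

```python
import hashlib
import struct
import zlib

MAGIC = b"\x89BLK\r\n"              # hypothetical marker, not a real format
HEADER = struct.Struct(">6sBI16s")  # marker, encoding id, payload length, MD5
ENC_NONE, ENC_ZLIB = 0, 1           # hypothetical encoding ids


def pack_block(data, encoding=ENC_ZLIB):
    """Encode data and prefix it with a self-describing header."""
    payload = zlib.compress(data) if encoding == ENC_ZLIB else data
    digest = hashlib.md5(payload).digest()
    return HEADER.pack(MAGIC, encoding, len(payload), digest) + payload


def recover_blocks(stream):
    """Yield the decoded contents of every intact block in a bytes object."""
    pos = stream.find(MAGIC)
    while pos != -1:
        end = pos + HEADER.size
        if end <= len(stream):
            _, encoding, length, digest = HEADER.unpack(stream[pos:end])
            payload = stream[end:end + length]
            # Accept the block only if it is complete and the checksum matches.
            if len(payload) == length and hashlib.md5(payload).digest() == digest:
                yield zlib.decompress(payload) if encoding == ENC_ZLIB else payload
                pos = stream.find(MAGIC, end + length)
                continue
        # Truncated block, bad checksum, or false marker: keep scanning.
        pos = stream.find(MAGIC, pos + 1)
```

Because each header names its own encoding, a reader that starts in the
middle of a damaged archive does not need anything from the archive
header to decode the blocks it can still find.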

>  * The information in the archive header should be instead (or better,
>    also) stored at the beginning of the index.  Otherwise, random
>    access will be worse.

Information in the archive header shouldn't need to be accessed often,
otherwise it should be at some other layer.

>  * Some information in the archive header should be instead stored in
>    the file header.  This would allow, for instance, some files to be
>    compressed with gzip, others with bzip2, and still others with cat :-)

It's true that the best compression is likely to vary by file (nothing
like watching your CPU crunch through gzipping some mpegs you are backing
up). It would be nice to be flexible in supporting various encodings at
both the block and file layer.
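
A small Python sketch of what per-file encodings could look like. The
encoding names, the suffix list, and the function names are hypothetical;
the only point is that the file header records which codec was used, so
the reader never has to guess:

```python
import bz2
import gzip

# Hypothetical encoding names mapped to (encode, decode) functions.
CODECS = {
    "none": (lambda data: data, lambda data: data),  # the "cat" case
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}

# Assumed list of suffixes that are not worth compressing again.
ALREADY_COMPRESSED = (".mpg", ".mpeg", ".gz", ".bz2", ".jpg")


def choose_encoding(filename):
    """Pick an encoding name for a file; a deliberately crude heuristic."""
    return "none" if filename.lower().endswith(ALREADY_COMPRESSED) else "gzip"


def encode_file(filename, data):
    """Return (encoding name to store in the file header, encoded bytes)."""
    name = choose_encoding(filename)
    return name, CODECS[name][0](data)


def decode_file(name, payload):
    """Decode a payload using the encoding name read from the file header."""
    return CODECS[name][1](payload)
```

The same table could be reused at the block layer, as long as each layer
records its own choice.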

Will Dyson
"Back off man, I'm a scientist!" -Dr. Peter Venkman
