[Duplicity-talk] Re: [rdiff-backup-users] Pretty pictures and new versio

From: John Goerzen
Subject: [Duplicity-talk] Re: [rdiff-backup-users] Pretty pictures and new version of proposal
Date: Mon, 29 Sep 2003 14:17:31 -0500
User-agent: Mutt/1.4i

On Mon, Sep 29, 2003 at 12:17:47AM -0700, Ben Escoto wrote:
> Hi all, thanks again for your input.  I have updated the page at:
> and put in some more detail.  This version tries to be both more tape
> friendly and more file system friendly.  As always, if anyone has any
> comments (for instance, you think keeping two copies of file metadata
> is excessive---see the page), I would be happy to hear them.

That is exactly one thing I was thinking :-)  I really don't see what it
buys anybody.  If the index contains an offset to the start of the metadata
in the regular stream, is that not enough?  Any extraction program could
seek to that offset, read the metadata and continue reading straight on into
the file's data.
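
A minimal sketch of that idea, assuming a hypothetical record layout (a
4-byte length, JSON metadata, then the file data) that is not taken from
the proposal.  The index stores only the offset of each record, and
read_record() and write_record() are illustrative names:

```python
import io
import json
import struct

def write_record(stream, path, data):
    """Append [meta_len][metadata][file data] and return the record's offset."""
    offset = stream.tell()
    meta = json.dumps({"path": path, "size": len(data)}).encode("utf-8")
    stream.write(struct.pack(">I", len(meta)))  # hypothetical 4-byte length prefix
    stream.write(meta)
    stream.write(data)
    return offset

def read_record(stream, offset):
    """Seek to the offset, read the metadata, then read straight on into the data."""
    stream.seek(offset)
    (meta_len,) = struct.unpack(">I", stream.read(4))
    meta = json.loads(stream.read(meta_len).decode("utf-8"))
    data = stream.read(meta["size"])
    return meta, data

archive = io.BytesIO()
index = {}  # path -> offset of the metadata in the regular stream
for path, data in [("a.txt", b"hello"), ("dir/b.txt", b"world!")]:
    index[path] = write_record(archive, path, data)

meta, data = read_record(archive, index["dir/b.txt"])
```

With this layout one seek reaches both the metadata and the data, so the
index needs no second copy of the metadata.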

Also, I don't know what storing the contents of a directory does for you,
since simply scanning the index could give that information anyway.
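
As a rough illustration, assuming the index amounts to a list of
slash-separated paths (an assumption, not something the proposal states),
a directory's contents can be derived with one pass over it.
list_directory() is a hypothetical helper:

```python
def list_directory(index_paths, directory):
    """Return the immediate children of `directory` by scanning the index."""
    prefix = directory.rstrip("/") + "/" if directory else ""
    children = set()
    for path in index_paths:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            if rest:
                # Keep only the first path component below the directory.
                children.add(rest.split("/", 1)[0])
    return sorted(children)

paths = ["a.txt", "dir/b.txt", "dir/sub/c.txt"]
top_level = list_directory(paths, "")
in_dir = list_directory(paths, "dir")
```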

And finally, I think that the argument about the compressibility of the
metadata is a non-starter since the format doesn't propose compressing the
metadata (only the actual file data) and that's not something that's going
to be good for random seeks and performance anyway.

Otherwise, it looks good :-)

Some other comments:

 * You talk about requiring a root directory header.  Sometimes people just
   want to store a file or three, and there is no real directory to list
   as a root.

 * Regarding error correction -- every file should absolutely have some
   sort of modern checksum (MD5, SHA, etc.) associated with it.  Also,
   file header blocks should start with a recognizable byte sequence,
   so an extraction program can make a reasonable attempt to recover an
   archive starting at any arbitrary position within it (for instance,
   if the dog ate the first 10 meters of tape).

 * The information in the archive header should instead (or better,
   also) be stored at the beginning of the index.  Otherwise, random
   access will be worse.

 * Some information in the archive header should instead be stored in
   the file header.  This would allow, for instance, some files to be
   compressed with gzip, others with bzip2, and still others with cat :-)
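
The checksum, marker, and per-file compression points can be sketched
together.  Everything about the layout here is an assumption made for
illustration: the MAGIC bytes, the header fields, the codec numbering,
and the choice of SHA-256 as the checksum.  recover() scans for the
marker from an arbitrary position and keeps only the files whose
checksum verifies:

```python
import bz2
import gzip
import hashlib
import struct

MAGIC = b"\x00ARCHDR\x00"          # hypothetical recognizable byte sequence
HEADER = struct.Struct(">BI32s")   # codec id, payload length, SHA-256 of the data
CODECS = {                         # per-file codec, recorded in the file header
    0: (lambda b: b, lambda b: b),             # stored as-is ("cat")
    1: (gzip.compress, gzip.decompress),
    2: (bz2.compress, bz2.decompress),
}

def pack_file(data, codec_id):
    """Return MAGIC + header + payload for one file."""
    payload = CODECS[codec_id][0](data)
    digest = hashlib.sha256(data).digest()
    return MAGIC + HEADER.pack(codec_id, len(payload), digest) + payload

def recover(blob, start=0):
    """Yield every file at or after `start` whose checksum verifies."""
    pos = blob.find(MAGIC, start)
    while pos != -1:
        head_start = pos + len(MAGIC)
        head_end = head_start + HEADER.size
        try:
            codec_id, length, digest = HEADER.unpack(blob[head_start:head_end])
            data = CODECS[codec_id][1](blob[head_end:head_end + length])
            ok = hashlib.sha256(data).digest() == digest
        except Exception:
            ok = False  # truncated header, unknown codec, or corrupt payload
        if ok:
            yield data
            pos = blob.find(MAGIC, head_end + length)
        else:
            pos = blob.find(MAGIC, pos + 1)  # false or damaged marker: keep scanning

files = [b"first file", b"second file " * 20, b"third file " * 20]
archive = b"".join(pack_file(data, codec) for codec, data in enumerate(files))

damaged = archive[15:]  # the start of the "tape" is gone
recovered = list(recover(damaged))
```

Here the first file is lost because its marker and part of its header are
missing, while the two files after it are still found and verified.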

-- John
