[Duplicity-talk] Re: [rdiff-backup-users] Pretty pictures and new version of proposal
Mon, 29 Sep 2003 14:17:31 -0500
On Mon, Sep 29, 2003 at 12:17:47AM -0700, Ben Escoto wrote:
> Hi all, thanks again for your input. I have updated the page at:
> and put in some more detail. This version tries to be both more tape
> friendly and more file system friendly. As always, if anyone has any
> comments (for instance, you think keeping two copies of file metadata
> is excessive---see the page), I would be happy to hear them.
That is exactly one thing I was thinking :-) I really don't see what it
buys anybody. If the index contains an offset to the start of the metadata
in the regular stream, is that not enough? Any extraction program could
seek to that offset, read the metadata, and continue reading straight on into
the file's data.
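To make the single-copy argument concrete, here is a minimal Python sketch. It assumes a hypothetical record layout that is not part of Ben's proposal (a 4-byte length, JSON metadata, then the file data). The index stores only the offset of each metadata record, and extraction seeks there and reads straight on into the data. The names write_archive and extract are made up for illustration.

```python
import io
import json
import struct

# Hypothetical record layout, assumed for illustration only:
#   4-byte big-endian metadata length | JSON metadata | file data
# The metadata records the data size, so the reader knows where the data ends.

def write_archive(files):
    """Write (path, bytes) pairs to one stream; return the stream and an
    index mapping each path to the offset of its metadata record."""
    stream = io.BytesIO()
    index = {}
    for path, data in files:
        index[path] = stream.tell()
        meta = json.dumps({"path": path, "size": len(data)}).encode("utf-8")
        stream.write(struct.pack(">I", len(meta)))
        stream.write(meta)
        stream.write(data)
    return stream, index

def extract(stream, index, path):
    """Seek to the metadata via the index, then read straight on into the data."""
    stream.seek(index[path])
    (meta_len,) = struct.unpack(">I", stream.read(4))
    meta = json.loads(stream.read(meta_len).decode("utf-8"))
    data = stream.read(meta["size"])  # file data follows the metadata directly
    return meta, data

stream, index = write_archive([("a.txt", b"hello"), ("b/c.txt", b"world!")])
meta, data = extract(stream, index, "b/c.txt")
print(meta["size"], data)  # 6 b'world!'
```

In this layout the index holds one offset per file, and the metadata exists in exactly one place, so there is no second copy to keep consistent.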
Also, I don't see what storing the contents of a directory does for you,
since simply scanning the index would give that information anyway.
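A short Python sketch illustrates this point. It assumes the index is just a flat list of '/'-separated relative paths, which may be simpler than what the proposal stores. Under that assumption, the contents of any directory fall out of a prefix scan. The helper name list_directory is hypothetical.

```python
def list_directory(index_paths, directory):
    """Return the immediate children of `directory`, derived only from the
    flat list of '/'-separated relative paths in the index."""
    prefix = directory.rstrip("/") + "/" if directory else ""
    children = set()
    for path in index_paths:
        if path.startswith(prefix):
            # Keep only the first path component below `directory`.
            child = path[len(prefix):].split("/", 1)[0]
            if child:
                children.add(child)
    return sorted(children)

paths = ["etc/passwd", "etc/ssh/sshd_config", "home/ben/notes.txt"]
print(list_directory(paths, "etc"))  # ['passwd', 'ssh']
print(list_directory(paths, ""))     # ['etc', 'home']
```

A linear scan is enough for a sketch. A reader that keeps the index sorted by path could binary-search for the prefix instead.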
And finally, I think the argument about the compressibility of the
metadata is a non-starter, since the format doesn't propose compressing the
metadata (only the actual file data), and compressing it wouldn't be good
for random seeks and performance anyway.
Otherwise, it looks good :-)
Some other comments:
* You talk about requiring a root directory header. Sometimes people just
want to store a file or three, and there is no real directory to list
as a root.
* Regarding error correction -- every file should absolutely have some
sort of modern checksum (MD5, SHA, etc) associated with it. Also,
file header blocks should start with a recognizable byte sequence,
so an extraction program can make a reasonable attempt to recover an
archive starting at any arbitrary position within it (for instance,
if the dog ate the first 10 meters of tape).
* The information in the archive header should instead (or, better,
also) be stored at the beginning of the index. Otherwise, random
access will be worse.
* Some information in the archive header should instead be stored in
the file header. This would allow, for instance, some files to be
compressed with gzip, others with bzip2, and still others with cat :-)
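The checksum, resynchronization, and per-file compression suggestions above fit together naturally. Below is a minimal Python sketch of all three. The header layout, the MAGIC value, and the method numbers are made up for illustration and are not part of the proposal. Each file header starts with a fixed byte sequence, names its own compression method, and carries a SHA-256 of the uncompressed data. A reader can therefore start anywhere in a damaged archive, scan for the marker, and use the checksum to reject false matches.

```python
import bz2
import gzip
import hashlib
import struct

# Hypothetical per-file header, assumed for illustration only:
#   MAGIC (8 bytes) | compression method (1 byte)
#   | compressed length (8 bytes, big-endian) | SHA-256 of the original data
MAGIC = b"\x89ARCHDR\n"
HEADER = struct.Struct(">8sBQ32s")
COMPRESSORS = {  # method id -> (compress, decompress)
    0: (bytes, bytes),                      # stored as-is, i.e. "cat"
    1: (gzip.compress, gzip.decompress),
    2: (bz2.compress, bz2.decompress),
}

def pack_file(data, method):
    """Return one header + body record; the method is chosen per file."""
    compress, _ = COMPRESSORS[method]
    body = compress(data)
    digest = hashlib.sha256(data).digest()
    return HEADER.pack(MAGIC, method, len(body), digest) + body

def recover(blob):
    """Scan from an arbitrary starting point for MAGIC and keep only the
    records whose checksum verifies; false matches are skipped."""
    recovered = []
    pos = blob.find(MAGIC)
    while pos != -1:
        start = pos + HEADER.size
        if start <= len(blob):
            _, method, length, digest = HEADER.unpack_from(blob, pos)
            body = blob[start:start + length]
            if method in COMPRESSORS and len(body) == length:
                try:
                    data = COMPRESSORS[method][1](body)
                except Exception:  # corrupt body: treat as a false match
                    data = None
                if data is not None and hashlib.sha256(data).digest() == digest:
                    recovered.append(data)
                    pos = blob.find(MAGIC, start + length)
                    continue
        pos = blob.find(MAGIC, pos + 1)
    return recovered

archive = (pack_file(b"first file " * 10, 1)
           + pack_file(b"second file", 2)
           + pack_file(b"third file", 0))
damaged = archive[20:]  # the dog ate the start of the first record
print(recover(damaged))  # [b'second file', b'third file']
```

The checksum doubles as the resync filter. A marker that happens to appear inside file data is almost never followed by a body that decompresses and hashes correctly. The obvious exception is an archive stored uncompressed inside another archive.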