Re: [rdiff-backup-users] Proposal to fix long filenames
From: dean gaudet
Date: Sun, 13 Nov 2005 17:48:11 -0800 (PST)
On Sun, 13 Nov 2005, Ben Escoto wrote:
> >>>>> Sheldon Hearn <address@hidden>
> >>>>> wrote the following on Sat, 12 Nov 2005 12:02:45 +0200
> > At the extreme, every object gets a serial number which is used as
> > its name in the backup store's filesystem.
...
> But this is basically the way most backup programs work, and from the
> beginning the premise of rdiff-backup was that it made a mirror.
i've been thinking along the lines of sheldon's proposal for a while
actually...
most of the time i want the mirror functionality it's only for a small
subtree at a time -- if i need the entire backup it's really better for me
to restore it (due to metadata). for folks with posix filesystems with
hardlinks there's a relatively fast option to mirror subtrees out of a
"virtualized" mirror like sheldon is proposing -- rdiff-backup could
hardlink the existing mirror files into an appropriate directory on the
same partition.
then i can look about in that hardlinked tree and remove it when i'm done
perusing or copying whatever it is i was interested in.
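(a minimal sketch of that hardlink trick in python -- the helper name is made up, this isn't rdiff-backup code. it just recreates the directory structure and hardlinks every regular file, which only works when src and dst are on the same filesystem:)

```python
import os

def hardlink_subtree(src, dst):
    """Recreate the directory tree of src under dst, hardlinking every
    regular file instead of copying it.  src and dst must be on the
    same filesystem for os.link() to succeed."""
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; os.link would link the target
            os.link(path, os.path.join(target_dir, name))
```

since hardlinks share the inode, the "copy" costs almost nothing in space or time, and removing the tree afterwards just drops the link count back down.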
you'd have to be careful about accidentally modifying the hardlinked
mirror, but no more careful than you have to be today when perusing the
mirror itself. (actually we could remove write perms from all files and
directories by default no matter what we do about long filenames...
something to consider.)
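(that perms idea is a one-liner per inode -- something like this sketch, with an invented helper name, which clears just the write bits and leaves everything else alone:)

```python
import os
import stat

def strip_write_bits(top):
    """Clear the owner/group/other write bits on every file and
    directory under top, leaving all other permission bits alone."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    # bottom-up so a directory is still writable while we visit its contents
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod follows symlinks; don't touch targets
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode & ~write_bits)
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, mode & ~write_bits)
```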
this doesn't help people using less featureful filesystems though.
while the mirror property was something which initially attracted me to
rdiff-backup, it wasn't the primary factor ... i was more attracted by
rsync-style network usage, (relatively) efficient storage of increments,
and storing backups on hard disk instead of tape. (i was already doing
backups to hd with tar incrementals because i can't stand tape... but it
wasn't as space or network efficient as rdiff-backup.)
> The
> best backup system that used your scheme would have a different
> architecture, and might not have all that much in common with a
> mirror+increment system. (For instance, if I were to implement your
> scheme from scratch I would optimize for random access of older data,
> so it could
> be mounted with FUSE or similar with decent performance.)
you know i'm not so sure the goals conflict...
even if you wanted to do something like FUSE as you suggest you're going
to need to generate the already-patched blocks of older files ... so you'd
probably end up keeping a cache around at the FUSE level.
the only real optimisation i can think of is to store all the deltas for a
particular object together -- so that you only need to go to one place to
rebuild whatever ancestor you're interested in. but realistically even if
you concatenate them together the filesystem isn't generally going to be
able to avoid fragmentation...
if i weren't lazy and were doing something like this from scratch i'd
optimize assuming FUSE exists and use a compact single-file-per-backup
representation and let a FUSE layer provide a mirror interface if it's
desired...
i'd go for a single-file-per-backup compressed in chunks with zlib (to
enable random access), and append an index.
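(roughly like this sketch -- purely an illustration of the chunk+index idea, every name invented, not any real rdiff-backup or duplicity format. each chunk is compressed independently so you can seek to any one of them without inflating the whole archive:)

```python
import io
import json
import struct
import zlib

CHUNK = 64 * 1024  # fixed chunk size: any offset is reachable by
                   # decompressing a single chunk

def write_chunked(out, data):
    """Write data as independently compressed zlib chunks, then append
    a JSON index of (offset, compressed_length) pairs and an 8-byte
    trailer giving the index position."""
    index = []
    for i in range(0, len(data), CHUNK):
        comp = zlib.compress(data[i:i + CHUNK])
        index.append((out.tell(), len(comp)))
        out.write(comp)
    index_pos = out.tell()
    out.write(json.dumps(index).encode())
    out.write(struct.pack("<Q", index_pos))

def read_chunk(inp, n):
    """Random access: read and decompress only chunk n."""
    size = inp.seek(0, io.SEEK_END)
    inp.seek(size - 8)
    index_pos, = struct.unpack("<Q", inp.read(8))
    inp.seek(index_pos)
    index = json.loads(inp.read(size - 8 - index_pos))
    off, length = index[n]
    inp.seek(off)
    return zlib.decompress(inp.read(length))
```

putting the index at the tail means you can write the whole archive in one sequential pass, which matters for the reduced-seek motivation below.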
a few motivations for doing this:
- reduce I/O overhead on the backup server by eliminating a lot of
disk seeks. (and probably reduce CPU overhead as well due to inode
reduction.)
- simplify archival to DVD/wherever (even archival to another hard drive
would be fast due to all sequential read/write).
- fold in the functionality of duplicity -- you could read/write through a
gpg filter (chunks would be necessary still)
- my largest backup is throwing away at least 0.6GB disk space just for
the tail fragments on all the rdiff-backup-data inodes ... this is for
28 days of increments on a 1.5M inode fs -- there are an additional 0.5M
inodes in rdiff-backup-data, of which 0.3M have a non-zero size, and
so on avg waste 2048 bytes (4KiB blocks).
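(back-of-envelope check of that last figure, assuming 4 KiB blocks: a file's last partial block wastes half a block on average, and 0.3M non-empty increment files times 2048 bytes lands right at ~0.6GB:)

```python
block = 4096                 # assumed filesystem block size, bytes
nonzero_inodes = 300_000     # the 0.3M increment files with data
avg_tail_waste = block // 2  # ~2048 bytes lost in the last block
total = nonzero_inodes * avg_tail_waste
print(total / 1e9)           # ~0.61 GB
```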
not that i'm trying to convince you either way... i'm just babbling
really.
-dean