rdiff-backup-users

Re: [rdiff-backup-users] Proposal to fix long filenames


From: dean gaudet
Subject: Re: [rdiff-backup-users] Proposal to fix long filenames
Date: Sun, 13 Nov 2005 17:48:11 -0800 (PST)

On Sun, 13 Nov 2005, Ben Escoto wrote:

> >>>>> Sheldon Hearn <address@hidden>
> >>>>> wrote the following on Sat, 12 Nov 2005 12:02:45 +0200
> > At the extreme, every object gets a serial number which is used as
> > its name in the backup store's filesystem.
...
> But this is basically the way most backup programs work, and from the
> beginning the premise of rdiff-backup was that it made a mirror.

i've been thinking along the lines of sheldon's proposal for a while 
actually...

most of the time i want the mirror functionality it's only for a small 
subtree at a time -- if i need the entire backup it's really better for me 
to restore it (due to metadata).  for folks with posix filesystems with 
hardlinks there's a relatively fast option to mirror subtrees out of a 
"virtualized" mirror like sheldon is proposing -- rdiff-backup could 
hardlink the existing mirror files into an appropriate directory on the 
same partition.

then i can look about in that hardlinked tree and remove it when i'm done 
perusing or copying whatever it is i was interested in.
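
a minimal sketch of that hardlinking step, in python.  everything here is 
an assumption for illustration: the store layout (one file per object, 
named by serial number), the `index` dict mapping original paths to object 
names, and the function name `materialize_subtree`.  none of it is existing 
rdiff-backup code.

```python
import os


def materialize_subtree(store_dir, index, subtree, dest_dir):
    """Hardlink stored objects whose original path lies under `subtree`.

    index is a hypothetical dict: original relative path -> object name
    inside store_dir.  Returns the list of paths created under dest_dir.
    """
    prefix = subtree.rstrip("/") + "/"
    created = []
    for orig_path, obj_name in sorted(index.items()):
        if not orig_path.startswith(prefix):
            continue
        target = os.path.join(dest_dir, orig_path[len(prefix):])
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # os.link only works when source and target are on the same
        # filesystem, which is why the view must live on the same partition.
        os.link(os.path.join(store_dir, obj_name), target)
        created.append(target)
    return created
```

no file data is copied, so building the view costs one directory entry per 
file, and removing it afterwards leaves the store untouched.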

you'd have to be careful about accidentally modifying the hardlinked 
mirror, but no more careful than you have to be today when perusing the 
mirror itself.  (actually we could remove write perms from all files and 
directories by default no matter what we do about long filenames... 
something to consider.)
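
stripping write permission could look roughly like the sketch below.  the 
function name `make_read_only` is made up for illustration.  one caveat: 
permissions live on the inode, so running this on a hardlinked view also 
changes the mode of the files in the mirror they are linked from.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(root):
    """Clear the write bits on every regular file and directory under root."""
    # Walk bottom-up; symlinks are skipped because chmod would follow them.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode & ~WRITE_BITS)
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, mode & ~WRITE_BITS)
```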

this doesn't help people using less featureful filesystems though.

while the mirror property was something which initially attracted me to 
rdiff-backup, it wasn't the primary factor ... i was more attracted by 
rsync-style network usage, (relatively) efficient storage of increments, 
and storing backups on hard disk instead of tape.  (i was already doing 
backups to hd with tar incrementals because i can't stand tape... but it 
wasn't as space or network efficient as rdiff-backup.)


> The
> best backup system that used your scheme would have a different
> architecture, and might not have all that much in common with a
> mirror+increment system.  (For instance, if I were to [write] your scheme from
> scratch I would optimize for random access of older data, so it could
> be mounted with FUSE or similar with decent performance.)

you know i'm not so sure the goals conflict...

even if you wanted to do something like FUSE as you suggest you're going 
to need to generate the already-patched blocks of older files ... so you'd 
probably end up keeping a cache around at the FUSE level.

the only real optimisation i can think of is to store all the deltas for a 
particular object together -- so that you only need to go to one place to 
rebuild whatever ancestor you're interested in.  but realistically even if 
you concatenate them together the filesystem isn't generally going to be 
able to avoid fragmentation...

if i weren't lazy and were doing something like this from scratch i'd 
optimize assuming FUSE exists and use a compact single-file-per-backup 
representation and let a FUSE layer provide a mirror interface if it's 
desired...

i'd go for a single-file-per-backup compressed in chunks with zlib (to 
enable random access), and append an index.
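
to make the idea concrete, here is a toy sketch of such a container.  the 
layout (compressed chunks, then a JSON index, then an 8-byte offset to the 
index), the 64KiB chunk size, and the names `write_archive` and `read_range` 
are all assumptions picked for illustration, not a proposed format.

```python
import json
import struct
import zlib

CHUNK_SIZE = 64 * 1024  # assumed; smaller chunks mean cheaper random reads


def write_archive(path, members):
    """Write members (dict: name -> bytes) as chunks + index + index offset."""
    index = {}
    with open(path, "wb") as out:
        for name, data in members.items():
            chunks = []
            for start in range(0, len(data), CHUNK_SIZE):
                blob = zlib.compress(data[start:start + CHUNK_SIZE])
                chunks.append((out.tell(), len(blob)))  # (file offset, length)
                out.write(blob)
            index[name] = {"size": len(data), "chunks": chunks}
        index_offset = out.tell()
        out.write(json.dumps(index).encode("utf-8"))
        out.write(struct.pack("<Q", index_offset))  # trailer locates the index


def read_range(path, name, offset, length):
    """Return bytes [offset, offset+length) of a member; offset must be >= 0."""
    with open(path, "rb") as f:
        f.seek(-8, 2)
        trailer_pos = f.tell()
        (index_offset,) = struct.unpack("<Q", f.read(8))
        f.seek(index_offset)
        index = json.loads(f.read(trailer_pos - index_offset).decode("utf-8"))
        entry = index[name]
        end = min(offset + length, entry["size"])
        if offset >= end:
            return b""
        first = offset // CHUNK_SIZE
        last = (end - 1) // CHUNK_SIZE
        parts = []
        # Only the chunks that overlap the requested range are decompressed.
        for pos, clen in entry["chunks"][first:last + 1]:
            f.seek(pos)
            parts.append(zlib.decompress(f.read(clen)))
        buf = b"".join(parts)
        skip = offset - first * CHUNK_SIZE
        return buf[skip:skip + (end - offset)]
```

because each chunk is compressed independently, a reader (a FUSE layer, 
say) can serve a small read by inflating one or two chunks instead of the 
whole member, and the file itself is written strictly sequentially.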

a few motivations for doing this:

- reduce I/O overhead on the backup server by eliminating a lot of 
  disk seeks.  (and probably reduce CPU overhead as well due to inode 
  reduction.)

- simplify archival to DVD/wherever (even archival to another hard drive 
  would be fast due to all sequential read/write).

- fold in the functionality of duplicity -- you could read/write through a 
  gpg filter (chunks would be necessary still)

- my largest backup is throwing away at least 0.6GB disk space just for 
  the tail fragments on all the rdiff-backup-data inodes ... this is for
  28 days of increments on a 1.5M inode fs -- there are an additional 0.5M 
  inodes in rdiff-backup-data, of which 0.3M have a non-zero size, and
  each of those wastes on average 2048 bytes (half of a 4KiB block).
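
the tail-fragment figure in the last item can be checked with a few lines 
of arithmetic.  the inode count and block size come from the numbers 
above; the half-a-block average assumes file sizes are spread evenly 
modulo the block size, and the function names are made up for the sketch.

```python
BLOCK_SIZE = 4096          # 4KiB filesystem blocks
NONEMPTY_INODES = 300_000  # the 0.3M non-zero-size files


def tail_waste(size, block_size=BLOCK_SIZE):
    """Unused bytes in the last block of a file of the given size."""
    return -size % block_size


def expected_tail_waste(n_files, block_size=BLOCK_SIZE):
    """Total waste if each file leaves half a block unused on average."""
    return n_files * block_size // 2


waste = expected_tail_waste(NONEMPTY_INODES)
print(f"{waste} bytes, about {waste / 1e9:.2f} GB")
# prints "614400000 bytes, about 0.61 GB"
```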

not that i'm trying to convince you either way... i'm just babbling 
really.

-dean



