Re: [Duplicity-talk] Sliding window backup strategy?

From: Peter Schuller
Subject: Re: [Duplicity-talk] Sliding window backup strategy?
Date: Mon, 6 Oct 2008 23:33:28 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

> An idea I was thinking of for a separate backup tool was to simply
> have a file store that was independent of each snapshot backup. This
> way the bulk data (file contents) would be referenced on demand from
> whichever snapshots contain them, and the actual snapshots themselves
> would be completely independent of each other. You could use this to
> achieve a rolling window, an exponential backoff density, or most
> other schemes very trivially, by simply keeping the snapshots you
> want to keep. 

A modification on this (originally conceived to be a tree structure
with filenames that were the sha256 of file contents):

Implement a fairly simple "extent storage" mechanism that is backed by
volumes in a defined size range, being stored on a remote "dumb"
backend, in combination with an index into that storage.

The assumption is that the remote dumb storage may not be efficient
to access when storing very small files, so you want the minimum
volume size to be something vaguely reasonable (say a few megs or
more). At the same time, you want to be able to download, re-write and
re-upload a volume with reasonable ease, without making a huge impact
(relatively speaking) on storage use. So you also want a maximum
volume size.
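A minimal sketch of that sizing policy (all names and limits here are
illustrative assumptions, not anything duplicity actually implements):
buffer file contents into the current volume, and cut a new volume
just before the maximum size would be exceeded.

```python
MIN_VOLUME = 4 * 1024 * 1024    # a few megs: amortize per-object cost on the dumb backend
MAX_VOLUME = 32 * 1024 * 1024   # keep a single volume cheap to re-download and re-upload

def pack_into_volumes(items):
    """Group (file_id, data) pairs into volumes within the size range.

    Returns a list of volumes, each a list of (file_id, offset, data)
    records; offsets are relative to the start of that volume.
    """
    volumes, current, offset = [], [], 0
    for file_id, data in items:
        # Cut a new volume rather than exceed the maximum size.
        if current and offset + len(data) > MAX_VOLUME:
            volumes.append(current)
            current, offset = [], 0
        current.append((file_id, offset, data))
        offset += len(data)
    if current:
        volumes.append(current)
    return volumes
```

A real implementation would also split files larger than the maximum
volume size across several extents; this sketch ignores that case.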

So let each snapshot backup be a dump containing metadata about all
files in the backup, with references to globally unique (within the
backup repository) file ids. Again, each such dump would be completely
independent of the others.

Then you keep a file with id -> [list of extents] (each extent being a
volume name + offset + length) mappings for the entire file
repository. This mapping would take the form of a single file (or a
number of files, if too large to fit within the maximum volume size),
and it would have to be replaced whenever you add new files (or new
versions of files).
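A sketch of what that index could look like, and how a reader would
reassemble a file from it (the structure and the fetch_volume callable
are assumptions, not an existing API):

```python
# Repository-wide index: file id -> list of extents,
# each extent being (volume name, offset, length).
index = {
    "file-0001": [("vol-000017", 0, 3)],
    # a large file may span several volumes:
    "file-0002": [("vol-000017", 3, 2), ("vol-000018", 0, 4)],
}

def read_extents(index, file_id, fetch_volume):
    """Reassemble a file's bytes from its extents.

    fetch_volume is any callable mapping a volume name to that
    volume's bytes, e.g. a download from the dumb remote backend.
    """
    return b"".join(
        fetch_volume(vol)[off:off + length]
        for vol, off, length in index[file_id]
    )
```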

Assuming the number of files and their meta data is sufficiently small
relative to the total size of the data being backed up, and assuming
we can do away with the rsync algorithm (sorry), this would translate
into a relatively simple "file system" which is possible to have
backed by a duplicity style fairly dumb store.

To save space over time as files drop out, you can incrementally
compact individual volumes. The total use of bandwidth would be higher
than duplicity (though the space vs. bandwidth trade-off would be
controllable by changing the policy of when to compact volumes), but
it would achieve:

* Regular incremental backup semantics.
* Regular full backup semantics.
* Does away with the need to ever re-transfer a full backup.
* Still doesn't require intelligent software on the other end.
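The incremental compaction step could be sketched as follows (a
hypothetical helper, assuming the index structure above: rewrite one
volume keeping only the extents some file id still references, and
update the index in place):

```python
def compact_volume(vol_name, vol_data, index):
    """Rewrite one volume, dropping extents no longer referenced.

    vol_data is the volume's current bytes; index maps file id ->
    list of (volume, offset, length) extents and is updated in place.
    Returns the compacted volume's bytes, ready for re-upload.
    """
    # Collect the still-live extents that point into this volume.
    live = []
    for fid, extents in index.items():
        for i, (vol, off, length) in enumerate(extents):
            if vol == vol_name:
                live.append((fid, i, off, length))
    live.sort(key=lambda e: e[2])  # preserve on-volume order

    out, new_off = bytearray(), 0
    for fid, i, off, length in live:
        out += vol_data[off:off + length]
        index[fid][i] = (vol_name, new_off, length)
        new_off += length
    return bytes(out)
```

Anything in the volume not reachable from the index (data belonging
only to deleted snapshots) simply disappears, at the cost of one
download and one upload of that volume.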

The biggest disadvantage to me would be the lack of an rsync-style
algorithm, but on the other hand the model is generally a bit simpler
instead. Another disadvantage would be that, in order to keep the
implementation simple, you would probably want to work under the
assumption that the metadata for all files can be kept in RAM, which
is not necessarily suitable under all circumstances...

/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <address@hidden>'
Key retrieval: Send an E-Mail to address@hidden
E-Mail: address@hidden Web: http://www.scode.org
