From: Robert Nichols
Subject: [rdiff-backup-users] Re: AW: Re: What happens if you add a --exclude to an existing rdiff-backup?
Date: Tue, 08 Feb 2011 12:54:58 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10

On 01/08/2011 05:17 AM, D. Kriesel wrote:
It would be good if there was a way of removing a subset of data from
the entire repository. So let's say I put a 500GB folder in /home by
accident and it has gone into the repository and is bloating it. I can
exclude it from my future rdiff-backup runs but the folder will still be
held as snapshot[s]. If I run --remove-older-than it will remove all
data older than whenever, but I want to keep all the other stuff and
just remove this folder (and its contents).

RIGHT! This is the ONE feature I miss about rdiff-backup and which is my 
largest concern about it. I'll try to put it in a formalized way:
"I want to be able to remove an entire subtree of an rdiff-backup repository _and 
every single trace of it in the metadata_".

This is not possible right now, as far as I know. If this was possible, it 
would just be great.

In my opinion, one would just have to remove
    * any diffs, snapshots, increments, dir and missing markers, and similar
files of the subtree (easy, because you would just have to delete a subtree
within the metadata plus a few additional files)
    * any trace of any file within the subtree to be deleted from the zipped
backup table-of-contents files.
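
The second step above, stripping a subtree from the table-of-contents files, could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes an uncompressed metadata listing in which each record starts with a line of the form "File <path>" followed by indented attribute lines, and the function name `drop_subtree` is hypothetical. It does not touch increments or any other repository files, and it is not the script discussed in this thread.

```python
def drop_subtree(lines, subtree):
    """Yield metadata lines, skipping records for `subtree` and everything below it.

    Assumption: every record begins with a line "File <path>" and is
    followed by indented attribute lines that belong to that record.
    """
    root = subtree.rstrip("/")
    prefix = root + "/"
    skipping = False
    for line in lines:
        if line.startswith("File "):
            path = line[len("File "):].rstrip("\n")
            # Skip the subtree root itself and anything underneath it,
            # but not siblings that merely share a name prefix.
            skipping = path == root or path.startswith(prefix)
        if not skipping:
            yield line
```

Matching on `root + "/"` rather than on the bare name keeps a sibling such as `home/bigger` when `home/big` is removed.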

Please correct me if I'm wrong. If anyone wants to implement this feature, it
will gladly be my shout for a sixpack :-)

How brave and daring are you?  I have a stand-alone script that has been
working successfully for me.  Still, it's hard for me to be too proud of
it.  It is beyond a doubt the most outrageously complex shell script I
have ever written.  I think I've cleaned up the last serious glitch,
that being the problem of a file with multiple hard links and deleting
the checksum-bearing link while leaving some of the others.  That has
some strange corner cases that are almost impossible to foresee, so it's
hard to be 100% confident.  And really, how useful is that anyway?  It's
not going to reclaim any space.
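
The point about space can be checked on an ordinary filesystem: removing one name of a hard-linked file only lowers the link count, and the data stays allocated until the last link is gone. The sketch below is a generic illustration using a temporary directory; the helper name `demo_hard_link_removal` is made up for this example, and it assumes a filesystem that supports hard links. It says nothing about how rdiff-backup records the checksum.

```python
import os
import tempfile


def demo_hard_link_removal():
    """Create a file with two hard links, remove one, and report what is left.

    Returns (link count before removal, link count after, size after).
    """
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")
        with open(first, "wb") as handle:
            handle.write(b"x" * 4096)
        os.link(first, second)  # second name for the same inode
        links_before = os.stat(second).st_nlink
        os.remove(first)  # drops one name only; the data remains
        after = os.stat(second)
        return links_before, after.st_nlink, after.st_size
```

After the removal the surviving link still reports the full size, which is why deleting a single link out of several frees nothing.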

A few other caveats:

 * This is a Linux bash shell script and is unlikely to work in other
   environments.  It makes heavy use of 'awk', 'sed', and 'find'.  Yes,
   it really should have been written in Python, but every time I've
   tried to get a handle on Python, Python has instead coiled itself
   around me and started crushing the life out of me.

 * The metadata for Mac resource forks is not handled.  This shouldn't
   be hard to do, but I don't have any way to generate the data or test
   the result.

 * An ASCII character set is assumed, in particular that the printable
   graphic characters are '!' through '~' and that characters outside
   this range are represented as octal escapes, but I believe that is
   what rdiff-backup does internally anyway.
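
To make the character-set assumption in the last caveat concrete, here is a small sketch of that kind of escaping: bytes from '!' through '~' are kept as they are, and everything else becomes a backslash followed by three octal digits. Escaping the backslash itself is an extra choice made here so the encoding can be reversed; the function names are hypothetical, and this is not claimed to be rdiff-backup's actual quoting code.

```python
def escape_name(raw: bytes) -> str:
    """Keep bytes '!'..'~' literally; write all others as \\ooo octal escapes."""
    out = []
    for byte in raw:
        # The backslash is escaped too (an assumption of this sketch) so
        # that decoding is unambiguous.
        if 0x21 <= byte <= 0x7E and byte != 0x5C:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return "".join(out)


def unescape_name(text: str) -> bytes:
    """Reverse escape_name: turn each \\ooo sequence back into one byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

Under this scheme a space, which lies just below '!', is written as `\040`, and every byte of a multi-byte UTF-8 character gets its own escape.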

I'll send the script, all 450+ lines of it, to anyone who wants to try it
out.  I'm a bit reluctant to post it in a public place until at least a
few courageous souls can confirm that it doesn't eat their first-born
child, etc.  All I can say right now is that if it does do that, it will
be the first first-born child it has ever eaten.

Bob Nichols     "NOSPAM" is really part of my email address.
                Do NOT delete it.
