rdiff-backup-users

Re: [rdiff-backup-users] Q. on max-file-size behavior


From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] Q. on max-file-size behavior
Date: Sun, 14 Mar 2010 22:27:33 +0100 (CET)


On Sun, 14 Mar 2010, Whit Blauvelt wrote:

> On Sun, Mar 14, 2010 at 03:31:13PM +0100, Maarten Bezemer wrote:
>> I don't think this is even a corner case. If you want to exclude
>> large files, then a file that is larger than the limit you specify
>> (something you explicitly and deliberately do!) should not be in the
>> backup. Also, it should not _remain_ in the 'current' backup tree,
>> because it would no longer match the original in the source tree.
>> Since rdiff-backup keeps history of the backups, there is no other
>> way than to treat it as 'deleted from the source'. That's the only
>> way to keep the history intact AND have a proper 'current' backup
>> tree.

> Here's how the corner case occurs:
> [snip]

I do understand when your 'problem case' happens. It would occur not only when you lower the maximum file size in later runs, but also when you have files steadily growing past the size limit.

If you tell rdiff-backup "I do not want files larger than X in my backup", then clearly all rdiff-backup can do is... not include them in the backup. As far as the backup application is concerned, there is no difference between "I don't want them" and "They don't exist". You don't want them? Fine, you don't get them. But you also don't get an older version, since that would make no sense either.
Quoting from the manpage:
" When backing up, if a file is excluded, rdiff-backup acts as if that
  file does not exist in the source directory."


> As far as intact history goes, that's a side issue here, isn't it?

No, it's not. That's the whole point.
If rdiff-backup didn't keep history, it could just remove the large file and be done with it. However, rdiff-backup was designed to be able to restore to previous points in time, for example to just before your manager accidentally removed the almost-finished $200,000 tender document that was due tomorrow.

So, files that are no longer in the source tree (or files that you have excluded, either by name or by a size limit; no difference there) are not just deleted: rdiff-backup creates a so-called snapshot and moves it to a proper place in the rdiff-backup-data directory. If you do need that file again, rdiff-backup only needs to restore the snapshot. That snapshot can later be deleted when you decide to remove parts of the history kept by rdiff-backup (--remove-older-than).
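Concretely, the lifecycle looks something like this. Paths and the 100 MB limit are hypothetical; the long options are rdiff-backup's standard ones, but check your version's manpage for exact semantics:

```shell
# Back up, excluding files larger than 100 MB (size is in bytes):
rdiff-backup --max-file-size 104857600 /home/whit /backup/whit

# If a file later grows past the limit (or the limit is lowered),
# the next run treats it as deleted from the source; its last
# backed-up version is kept as a snapshot under
#   /backup/whit/rdiff-backup-data/increments/
# (gzipped, if you haven't disabled compression).

# Restore the whole tree as it was three days ago:
rdiff-backup --restore-as-of 3D /backup/whit /tmp/whit-3-days-ago

# Later, prune history (including such snapshots) older than 4 weeks:
rdiff-backup --remove-older-than 4W /backup/whit
```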

The normal 'current' backup tree always contains the exact same files as the source tree. rdiff-backup never gzips files in the current tree. Only the snapshots and diffs in the rdiff-backup-data directory can, at the user's choice, be gzipped.


But...

I think your problem is not with the gzipping. I think you want to use rdiff-backup in a way it was never designed to be used. So, instead of commenting on several other "misunderstandings" in your email, I'll focus on what I think triggered this discussion:

> That might not just avoid treating a file as if deleted on the original when
> it hasn't been, but support actions like running rdiff-backup at regular
> intervals during working hours just against smaller files, while running a
> daily backup of even the large stuff every night, without having to
> establish two redundant backup spaces to accommodate this.

That's just a Bad Idea (tm). The whole idea of "restore to a specific point in time" implies that you then get back the tree as it was at the time you specified. Not a tree with small files from that date/time, and with large files from an earlier date.

You do have a few options to get what you want.
For example, you could do a two-stage backup, using rsync to regularly sync the source tree to a shadow tree, and exclude-but-not-delete large files. And then use rdiff-backup to backup the shadow tree right after each rsync run. Overnight, run a full rsync and again a normal rdiff-backup, and it will update the larger files as well. This indeed uses a lot of extra disk space and thus sort of defeats the purpose.
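A sketch of that two-stage setup. Paths and the size limit are hypothetical; rsync's --max-size and --delete are standard options:

```shell
# Hourly, during working hours: sync only the small files to a shadow
# tree. --max-size makes rsync skip files over the limit, and --delete
# only removes files that are absent on the source, so large files
# already present in the shadow tree are kept at their last-synced
# version ("exclude-but-not-delete").
rsync -a --delete --max-size=100m /home/whit/ /shadow/whit/
rdiff-backup /shadow/whit /backup/whit

# Nightly: full sync, large files included, then the same backup run.
rsync -a --delete /home/whit/ /shadow/whit/
rdiff-backup /shadow/whit /backup/whit
```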

So, why not just use both --max-file-size and --min-file-size on two separate backup trees? That would exclude the large files from the smallfiles-tree, and the small files from the largefiles-tree, so no redundancy. And you can use different backup schedules for both trees.
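Something like the following, with hypothetical paths and a hypothetical 100 MB boundary. The +1 assumes --max-file-size excludes files strictly larger than the limit while --min-file-size excludes files strictly smaller; verify against your version's manpage so that a file exactly at the boundary lands in exactly one tree:

```shell
LIMIT=104857600   # 100 MB, in bytes

# Frequent runs, small files only:
rdiff-backup --max-file-size "$LIMIT" /home/whit /backup/whit-small

# Nightly runs, large files only:
rdiff-backup --min-file-size "$((LIMIT + 1))" /home/whit /backup/whit-large
```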

To make things easier, I think I'd just create two backup trees, based on file paths. Huge files with sizes like you mentioned usually show up in well-defined places in a file system, and not just between a normal user's mozilla preferences file and a list of recently opened documents. So you could even use a --max-file-size for the normal backup tree, and warn the users that they CAN use larger files there, but they will NOT be backed up, so no complaining if they get deleted, corrupted, or lost.
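Before relying on such a warning, you can audit which files would fall over a given limit with plain find(1) (GNU find assumed for -printf; path and limit hypothetical). Note that -size with a k suffix rounds file sizes up to whole KiB:

```shell
LIMIT=104857600   # 100 MB, in bytes

# List files a --max-file-size of $LIMIT would exclude, biggest first:
find /home -type f -size +"$((LIMIT / 1024))"k -printf '%s\t%p\n' | sort -nr
```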


> Good points. But let me rephrase the claims more clearly. (Language can be
> too broad a brush for technical discussions.) If the user's goal is to
> compromise [snip]

If you want to compromise, you don't get what you want, and you also get things you don't want. That's not only a matter of language; it's just something you don't want when designing a backup system. If you want speed (assuming, for the sake of argument, that gzipping is your only problem), just get larger disks. An extra 1 TB of disk space costs way less than changing rdiff-backup into something it was never designed to be.

Plus, gzipping might indeed take eons to complete on a 16GB file, but your suggestion wouldn't do anything to improve the speed of:
- the part where librsync creates a local copy of the current version of
  the file in the source tree
- the part where a diff is created to be able to go from the current
  version to the previous version
- the part where that possibly large diff is stored into the
  rdiff-backup-data directory.
(Where the first two might very well take even more time than gzipping the file.)
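Those librsync steps can be watched in isolation with the rdiff(1) tool that ships with librsync (filenames hypothetical; note that rdiff-backup stores its diffs in the reverse direction, from the current version back to the previous one):

```shell
# Signature of one version of the file:
rdiff signature old.iso old.sig

# Delta from that version to another -- this is essentially the
# expensive diff-generation step listed above:
rdiff delta old.sig new.iso old-to-new.delta

# Applying the delta to the old file reproduces the new one:
rdiff patch old.iso old-to-new.delta rebuilt.iso
```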

Actually, your suggestion would only help for large files being deleted (or excluded) from the source tree. For your suggestion to be really useful, you would need to have a source tree that has this happening on a regular basis. And in that case, the time spent in gzipping will be so much less of a problem than the amount of disk space that will be used by all the increments. (Or you would need to keep such a short history that you shouldn't be using rdiff-backup at all, making this discussion moot anyway.)


Maarten



