rdiff-backup-users
Re: [rdiff-backup-users] Proposal to fix long filenames


From: Ben Escoto
Subject: Re: [rdiff-backup-users] Proposal to fix long filenames
Date: Sat, 12 Nov 2005 20:38:11 -0600

>>>>> Chris Wilson <address@hidden>
>>>>> wrote the following on Sat, 12 Nov 2005 12:25:09 +0000 (GMT)
> 
> There is a fourth case: where the destination path is deeper into the 
> destination filesystem than the source path is. For example, I back up many 
> machines' root directories (/) into /mnt/backup/<machine-name>/rdiff on my 
> backup servers. In this case, both the original filename and the 
> increments may be too long to back up.

Very good point.  rdiff-backup can also have this problem with too
long paths, and it might be best to solve both problems at once.

> I'd like to propose a compromise:
> 
> rdiff-backup figures out the longest possible filename and deepest 
> possible path for itself when examining the filesystem capabilities.
> 
> If, during backup, any path or file to be written to the destination 
> exceeds those lengths, it's terminated near the maximum length, and a 
> number appended. The relevant IncrementFilename or MirrorFilename 
> directive is written to the metadata at the same time. So for example:
> 
> 
>       /a/really/long/path/on/a/short/path/file/system
> 
> might become
> 
>       /a/really/long/path/on/a/short/path/file/s~1
> 
> and if the filesystem's limits are so short that directories must be 
> renamed as well, then keep at least the first character of each one:
> 
>       /a/really/long/p~1/on/a/s~1/p~1/f~1/s~1

I would like to do something like this, because it preserves more of
the mirror flavor, but there are three issues:

1)  What happens when the mirror uses one of the "spare" filenames
    that rdiff-backup has used for long filenames?  For instance, in
    your example above, what if someone writes /a/really/long/p~1 to
    the mirror?

2)  This change wouldn't be backwards-compatible, because currently a
    file p~1 in the mirror directory means there was a source file
    also called p~1.

3)  rdiff-backup only makes one pass, and it would be a pain to change
    this (besides making things much slower).  It doesn't know in
    advance what the deepest path would be, and so doesn't know to
    rename a directory if paths in the directory are too long.
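
To make issues 1 and 2 concrete, here is a minimal sketch of the
truncate-and-number scheme for a single directory.  The function name
shorten, the max_len parameter and the "taken" set are hypothetical;
none of this is rdiff-backup code, and the proposed metadata
directives are left out.

```python
def shorten(name, max_len, taken):
    """Return a destination name for 'name' that fits in max_len characters.

    'taken' is the set of names already used in this destination
    directory; it is updated in place.  Assumes max_len leaves room
    for at least one character plus the "~N" suffix.
    """
    if len(name) <= max_len and name not in taken:
        taken.add(name)
        return name
    n = 1
    while True:
        suffix = "~%d" % n
        # Keep at least the first character of the original name.
        candidate = name[:max(1, max_len - len(suffix))] + suffix
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        n += 1


taken = set()
print(shorten("path", 3, taken))   # a long name gets truncated and numbered
print(shorten("p~1", 3, taken))    # a genuine source file named p~1 arrives later
```

In this sketch the genuine p~1 can no longer be stored under its own
name, so a name in the mirror stops implying a source file of the same
name.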

Also it's not clear that the long path problem is the same as the long
filename problem.  I think that OSes don't care about the absolute
length of a file's path---the limitation only applies to commands used
to manipulate the paths.  So I think chdir'ing when appropriate would
fix the problem without requiring any change to the format of the
repository.
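
A rough sketch of the chdir idea, assuming a Unix system (os.fchdir
is not available everywhere).  The helper name open_long_path is made
up for illustration and is not part of rdiff-backup; whether this
actually avoids the limit depends on the OS, as hedged above.

```python
import os


def open_long_path(path, mode="rb"):
    """Open 'path' by chdir'ing one component at a time.

    Each system call then only sees a short relative name, so the
    full path string is never passed to the OS in one piece.
    """
    parts = path.split(os.sep)
    saved = os.open(".", os.O_RDONLY)   # remember where we started
    try:
        if path.startswith(os.sep):
            os.chdir(os.sep)
        for part in parts[:-1]:
            if part:                    # skip empty pieces from "//" or a leading "/"
                os.chdir(part)
        return open(parts[-1], mode)
    finally:
        os.fchdir(saved)                # always restore the working directory
        os.close(saved)
```

Note that the working directory is process-wide, so a helper like
this would not be safe to call from several threads at once.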

> That's a pity, since I think it would now be really easy: just make
> a hash table of the SHA-1 checksums in the mirror, and compare the
> checksum of each newly added file to this list, to see if it's a
> duplicate or a moved file. This shortcuts the need to transfer the
> file again.

This could be possible, but I was keeping separate the means of
transfer (which might include tricks like that one), and the format of
the repository.  The problem of the long filename is a problem with
the repository format, and can't be fixed just by transferring files
in an innovative way.
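
For reference, the checksum lookup Chris describes might look roughly
like the following.  This is only a sketch of the transfer-side trick:
the function names are hypothetical, and it assumes the mirror files
can be read and hashed directly rather than looked up in stored
metadata.

```python
import hashlib


def sha1_of_file(path, bufsize=65536):
    """Return the hex SHA-1 digest of the file at 'path', read in blocks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()


def index_mirror(mirror_paths):
    """Build the hash table: digest -> first mirror path with that content."""
    table = {}
    for path in mirror_paths:
        table.setdefault(sha1_of_file(path), path)
    return table


def already_in_mirror(table, new_path):
    """Return a mirror path whose content matches new_path, or None."""
    return table.get(sha1_of_file(new_path))
```

A hit means the data could be copied locally on the destination side
instead of being sent again; it says nothing about how the file is
then named or stored in the repository.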

There could be some rename-detecting plan which affected the
repository format:  for instance, if a file was renamed we put a
special kind of diff which indicates a change not from the later
version of that file, but from another file.  Or if a file's SHA1 hash
matched another file, we don't bother to write both files separately
to the mirror.  There might be some trickery here which would also fix
the long filename problem, but I doubt there's anything that would be
worth the complexity.


-- 
Ben Escoto
