From: Marcel Cary
Subject: [rdiff-backup-users] known issues? trailing spaces on vfat, hyphen as range in regex, long filenames
Date: Fri, 1 Jun 2007 09:09:39 -0700 (PDT)

I'm trying to get rdiff-backup to replace my tar | gzip | split backup system that does full backups every time. So far I've found these issues:

1. rdiff-backup seems to choke on files with trailing spaces when backing
   up to a vfat filesystem

2. There appears to be a bug in the regex for quoting characters that
   causes '*' to not be quoted when it should

3. rdiff-backup turns some ~140 character filenames into ~260 character
   filenames on vfat

I've seen that (3) is a known issue (bug 12823, fixed in CVS late 2005), but I would have expected that fix to make it into my version 1.0.4 (released early 2006). But I'm still having long filename issues.

Have other folks seen (1) and (2) behaviors? Are they known issues? I don't see them in the Savannah bug database. I'd be happy to file bugs if that would be helpful. Or perhaps I should be installing the latest stable rdiff-backup from source to avoid these issues.


The details:


My backup disk is currently formatted with a FAT filesystem, which appears to disallow filenames with trailing spaces.

  $ sudo touch 'foo '
  touch: setting times of `foo ': No such file or directory
  $ sudo mkdir 'foo '
  mkdir: cannot create directory `foo ': Invalid argument
  $ mount
  ...
  /dev/sda1 on /media/usbdisk type vfat (rw,nosuid,nodev,noatime,flush,uid=1001,utf8,shortname=lower)

rdiff-backup chokes on a file with trailing spaces like this:

  Traceback (most recent call last):
    File "/usr/bin/rdiff-backup", line 23, in ?
      rdiff_backup.Main.Main(sys.argv[1:])
    File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 285, in Main
      take_action(rps)
    File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 255, in take_action
      elif action == "backup": Backup(rps[0], rps[1])
    File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 308, in Backup
      backup.Mirror(rpin, rpout)
    File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 38, in Mirror
      DestS.patch(dest_rpath, source_diffiter)
    File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 218, in patch
      ITR(diff.index, diff)
    File "/usr/lib/python2.4/site-packages/rdiff_backup/rorpiter.py", line 288, in __call__
      branch.start_process(*args)
    File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 548, in start_process
      if diff_rorp.isdir(): self.prepare_dir(diff_rorp, base_rp)
    File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 574, in prepare_dir
      base_rp.mkdir()
    File "/usr/lib/python2.4/site-packages/rdiff_backup/rpath.py", line 796, in mkdir
      self.conn.os.mkdir(self.path)
  OSError: [Errno 22] Invalid argument: '/media/usbdisk/filesystem_backup/rdiff-backup/home/somebody/;068ocuments/;069xtended ;070amily & ;067ommunity/;071ifts ;082eceived ;076ists '
  Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method GzipFile.__del__ of <gzip open file '/media/usbdisk/filesystem_backup/rdiff-backup/rdiff-backup-data/file_statistics.2007-05-30;08407;05815;05817-07;05800.data.gz', mode 'wb' at 0xb7b35020 -0x486ef474>> ignored
  Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method GzipFile.__del__ of <gzip open file '/media/usbdisk/filesystem_backup/rdiff-backup/rdiff-backup-data/error_log.2007-05-30;08407;05815;05817-07;05800.data.gz', mode 'wb' at 0xb7baff98 -0x486ef454>> ignored
  Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method GzipFile.__del__ of <gzip open file '/media/usbdisk/filesystem_backup/rdiff-backup/rdiff-backup-data/mirror_metadata.2007-05-30;08407;05815;05817-07;05800.snapshot.gz', mode 'wb' at 0xb7b35068 -0x486ef2b4>> ignored

It turns out that in this particular case, I can work around the problem by asking the owner of that directory to rename it; it was certainly an error to name the file that way.

In spite of the easy work-around, it might be worth fixing. I'd guess that escaping a trailing space would work, but might be difficult without escaping *all* spaces, which would make the transformed filenames uglier than necessary. (Note that a leading or interior space is no problem for vfat.) I'm sure there are other solutions as well.
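A rough sketch of the trailing-space idea, assuming the ';' plus three-digit decimal code form visible in the quoted path above (for example ';068' for 'D'). The function name quote_trailing_spaces is hypothetical and is not part of rdiff-backup; it rewrites only spaces at the end of a path component and leaves leading and interior spaces alone. Whether the existing unquoting code would round-trip the result would still need checking.

```python
def quote_trailing_spaces(component):
    """Quote only the trailing spaces of one path component.

    Hypothetical helper, not rdiff-backup code. It assumes the ';NNN'
    decimal form seen in the quoted paths above, so a space (code 32)
    becomes ';032'.
    """
    stripped = component.rstrip(" ")
    n_trailing = len(component) - len(stripped)
    return stripped + ";032" * n_trailing


# The directory name from the traceback, before any other quoting.
print(quote_trailing_spaces("Gifts Received Lists "))   # Gifts Received Lists;032
# Leading and interior spaces are left untouched.
print(quote_trailing_spaces(" leading and interior"))
```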


Now for the regex error... after working around the previous issue, I get another stack trace on a file named /usr/share/guile/1.6/ice-9/and-let*.scm. Note the asterisk in the filename. Also, in the output from rdiff-backup:

  Characters needing quoting                   '^a-z0-9_ -.'

Note that the last dash is between a space and a period, which, in a regex character class, means from ord(" ") to ord("."). Asterisk falls within this range. A fix is to move the dash last in the character class so that it will not be interpreted as a range.

  >>> import re
  >>> r = re.compile("[^a-z0-9_ -.]|;")
  >>> r.sub("+", "and-let*.scm.2007-05-30T16:15:53-07:00.missing")
  'and-let*.scm.2007-05-30+16+15+53-07+00.missing'  <-- asterisk not replaced
  >>> r = re.compile("[^a-z0-9_ .-]|;")
  >>> r.sub("+", "and-let*.scm.2007-05-30T16:15:53-07:00.missing")
  'and-let+.scm.2007-05-30+16+15+53-07+00.missing'
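
To see how much the accidental range covers, the following sketch lists every character between space and period and checks which of them the two character classes treat differently. It uses only the two patterns shown above; the variable names are made up for illustration.

```python
import re

# Every character from ord(" ") to ord(".") inclusive: the span that the
# unintended " -." range adds to the "does not need quoting" set.
in_range = "".join(chr(c) for c in range(ord(" "), ord(".") + 1))
print(repr(in_range))

buggy = re.compile("[^a-z0-9_ -.]|;")   # dash between space and period
fixed = re.compile("[^a-z0-9_ .-]|;")   # dash moved to the end

# Characters the buggy class leaves alone but the fixed class would quote.
missed = [c for c in in_range if not buggy.match(c) and fixed.match(c)]
print(missed)
```

Besides the asterisk, the list includes '&', which is consistent with the unquoted '&' in the path in the traceback above.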


And finally, the long filename issue. KDE's RSS reader Akregator stores feed data in files named by their URLs. For example:

~/.kde/share/apps/akregator/Archive/http___akregator.sf.net_rss2.php.mk4

But some URLs are longer. And feeds for search results have lots of '&' and '='. Between the escaping of those characters, the timestamp suffix added by rdiff-backup, and the escaping of the characters in the timestamp, several feeds with ~140 character URLs get transformed into ~260 character increment files. And with that, rdiff-backup fails.
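
The growth can be reproduced with a small sketch. It assumes that every character outside the (corrected) class from the regex section becomes ';' plus a three-digit decimal code, as in the quoted paths above, and that an increment is named name.timestamp.diff.gz. The quote function and the feed name are made up for illustration; they are not rdiff-backup code or one of the real feeds.

```python
import re

# Corrected form of the class reported by rdiff-backup; anything outside it
# (and ';' itself) is assumed to be written as ';NNN'.
NEEDS_QUOTING = re.compile("[^a-z0-9_ .-]|;")


def quote(name):
    """Replace each character needing quoting with ';' + 3-digit decimal code."""
    return NEEDS_QUOTING.sub(lambda m: ";%03d" % ord(m.group()), name)


timestamp = "2007-05-30T19:51:05-07:00"
print(quote(timestamp))  # 2007-05-30;08419;05851;05805-07;05800

# Made-up feed file name: 140 characters, 24 of them '&' or '='.
feed = "http___example.org_search_" + "q=a&" * 12 + "x" * 62 + ".mk4"
increment = quote(feed + "." + timestamp + ".diff.gz")

print(len(feed), len(quote(feed)), len(increment))  # 140 212 258
```

Each quoted character costs three extra characters, so a couple of dozen '&' and '=' plus the quoted timestamp are enough to push a 140 character name past vfat's 255 character limit for a single name.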

I ran across a link to duplicity, which, aside from encoding backups in a tar-style archive, appends metadata with a prepended directory. Maybe this would alleviate some of the long filename pain? For example, the increment for the above files could be named:

2007-05-30;08419;05851;05805-07;05800/http___akregator.sf.net_rss2.php.mk4.diff.gz

Note that many of the increments in the same directory share the same timestamp, so I would generally expect several diffs in each timestamp directory. The .diff.gz could also potentially be handled this way. Or maybe by generating random names as suggested in bug 12823 if that's done and working.

I'm currently working around the issue by not backing up those files:

--exclude-regexp '/home/[^/]+/\.kde/share/apps/akregator/Archive/[^/]{53,}'
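
As a quick check of what that pattern catches, the sketch below runs it through Python's re module. It assumes rdiff-backup applies the pattern with search semantics (a match anywhere in the path); the long file name is made up.

```python
import re

# The pattern from the --exclude-regexp option above.
pattern = re.compile(r"/home/[^/]+/\.kde/share/apps/akregator/Archive/[^/]{53,}")

archive = "/home/somebody/.kde/share/apps/akregator/Archive/"
short_name = "http___akregator.sf.net_rss2.php.mk4"            # 36 characters
long_name = "http___example.org_search_" + "x" * 60 + ".mk4"   # made-up, 90 characters

for name in (short_name, long_name):
    excluded = pattern.search(archive + name) is not None
    print(len(name), "excluded" if excluded else "kept")
```

Only names of 53 or more characters directly under Archive/ are skipped, so the short feed file from the earlier example is still backed up.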

I'm not ready to drop my old tarball strategy as long as rdiff-backup is skipping those files. Scarier still is that after adding the exclude regexp, I had to remove the entire backup repository to get a successful backup. I think that's because the first pass at some of those files did not require increments which append the 47 character (when escaped) timestamp suffix, and perhaps when excluding the files it was necessary to record their removal from the repository by creating an increment.



I'm thrilled to see that in spite of vfat's limitations, rdiff-backup can keep track of unix-type details like owner, group, permissions, and special files. I chose vfat so I can have (limited) access to backup data before I'm able to reinstall my Linux server after a failure.

I'm ready to get rid of all but the most recent tarball from my old strategy, and hopefully soon I can ditch the tarball strategy altogether.


$ uname -a
Linux foo 2.6.16.13-4-default #1 Wed May 3 04:53:23 UTC 2006 i686 i686 i386 GNU/Linux
$ python -V
Python 2.4.2
$ rdiff-backup --version
rdiff-backup 1.0.4


Marcel



