
Re: [rdiff-backup-users] What filename characters does Mac OS X support?


From: Alastair Rankine
Subject: Re: [rdiff-backup-users] What filename characters does Mac OS X support?
Date: Sun, 23 Oct 2005 15:56:53 +1000

On 22/10/2005, at 12:35 PM, Ben Escoto wrote:

In particular, there's a table with a list of illegal characters  


Thanks for the references.  Unfortunately they're in unicode and I
don't know enough to translate them to ascii offhand.  Kevin Horton's
message suggests that all the standard unix characters should be fine
though.

Ben, I'm not sure what you mean by "translate [unicode characters] to ascii". In general that isn't possible, since most Unicode characters have no ASCII equivalent. Perhaps you mean encoding these characters as UTF-8 (i.e. a char * in C)? In that case you should look at the "encode" method of Python unicode strings, and/or the libiconv C library.
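To make the distinction concrete, here is a minimal sketch of encoding a Unicode filename to UTF-8 bytes. It is written in Python 3 syntax for convenience (an assumption; the script later in this message is Python 2), and the variable names are only illustrative.

```python
# A Unicode string cannot generally be turned into ASCII,
# but it can always be encoded as UTF-8 bytes.
name = "\u00e9 composed char"   # "é composed char"

# é (U+00E9) becomes the two bytes 0xC3 0xA9 in UTF-8;
# the plain ASCII characters are left unchanged.
utf8_bytes = name.encode("utf-8")

# Encoding the same string as ASCII fails, because é has no ASCII form.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding the UTF-8 bytes gives back the original string.
round_trip = utf8_bytes.decode("utf-8")
```

The UTF-8 bytes are what a C program would see in a char * buffer.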

However, after some further investigation I'm not sure you need to worry about the table of illegal unicode characters I quoted earlier. I just ran the following experiment:

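#!/usr/bin/python
# -*- coding: utf-8 -*-
# Create three files whose names spell "é" in three different ways.
open(u"é composed char", "w").close()                   # literal é typed in the source
open(u"\u00e9 escaped composed", "w").close()           # escaped U+00E9 (precomposed)
open(u"\u0065\u0301 escaped decomposed", "w").close()   # e + combining acute accent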

This resulted in the é character being successfully inserted into each of the three output filenames. (I'd include the output of "ls" here, but it doesn't seem to be unicode aware.) So even though U+00E9 is explicitly designated as an illegal character by the filesystem specification, it looks like the OS silently takes care of the required decomposition into the U+0065, U+0301 sequence on disk.
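The composed and decomposed forms can be compared without touching the filesystem. The sketch below uses the standard unicodedata module (Python 3 syntax, an assumption) and standard NFD normalization; as far as I know the decomposition used by HFS+ is a close variant of NFD rather than exactly the same thing, so treat this as an approximation. The helper name same_name is made up for illustration.

```python
import unicodedata

composed = "\u00e9"            # é as a single code point
decomposed = "\u0065\u0301"    # e followed by COMBINING ACUTE ACCENT

# NFD splits the precomposed character into base letter + combining mark.
nfd = unicodedata.normalize("NFD", composed)

# NFC goes the other way and recombines the pair into one code point.
nfc = unicodedata.normalize("NFC", decomposed)

def same_name(a, b):
    """Compare two filenames after normalizing both to NFD (hypothetical helper)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

Comparing names this way treats u"\u00e9 composed char" and u"e\u0301 composed char" as the same filename, which matches what the experiment suggests the OS is doing.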

So although some unicode characters have to be stored decomposed *on disk*, in practice it doesn't seem to make any difference - the OS takes care of the correct on-disk representation. Interestingly, the OS does not seem to re-compose the decomposed characters when reading them back from disk; all three names come back in decomposed form:

>>> import os
>>> os.listdir(u".")
[u'e\u0301 composed char', u'e\u0301 escaped composed', u'e\u0301 escaped decomposed']

This is not important for rdiff-backup, just an interesting aside.

Anyway, judging by this experiment, it seems that any character in the unicode character set is usable in Mac OS X filenames.

Maybe it's because I'm new to rdiff-backup, but I can't understand
why you need to determine the capabilities of the source file
system?

Under the old system we didn't check the source, just the destination
(as in your scheme).  This worked ok, but led to unnecessary quoting.
For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
quote all uppercase characters.

I'm sorry, I still don't get it. If the destination filesystem is case *preserving* (which in this case it is), surely this removes the need for quoting?
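For reference, case preservation and case sensitivity are separate properties: a filesystem can store "Foo" exactly as written and still treat "foo" as the same file. A rough sketch of how a tool might probe a directory for case sensitivity follows. This is not rdiff-backup's actual code; the function name is_case_sensitive and the probe filenames are made up for illustration, and it uses Python 3.

```python
import os
import tempfile

def is_case_sensitive(directory):
    """Return True if names differing only in case are distinct files in 'directory'."""
    # Work inside a throwaway subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as probe_dir:
        lower = os.path.join(probe_dir, "casetest")
        upper = os.path.join(probe_dir, "CASETEST")
        open(lower, "w").close()
        # On a case-insensitive filesystem the upper-case name
        # resolves to the file that was just created.
        return not os.path.exists(upper)

result = is_case_sensitive(tempfile.gettempdir())
```

On a default Mac OS X volume I would expect this to return False, and True on a typical Linux filesystem, but that depends on how the volume was formatted.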
