gnu-arch-users

Re: [Gnu-arch-users] Trying out the new escaping version...


From: Tom Lord
Subject: Re: [Gnu-arch-users] Trying out the new escaping version...
Date: Thu, 18 Mar 2004 11:27:25 -0800 (PST)


    > From: Jan Hudec <address@hidden>

    > On Thu, Mar 18, 2004 at 11:32:38 +0100, address@hidden wrote:
    > > Next: the escaped version still only accepts 7bit ascii [...]

    > I understand how hard it is to lift this. It would be extremely nice if
    > it could accept high characters. I mean \(U+0100) and higher, since
    > I have ISO-8859-2 encoding here. It, of course requires two things:

    >     1) Proper charset conversion to/from charset detected from locale.
    >     2) Some sane fallback what to do when a character can't be converted.

It is, very much, a goal to add support for "Unicode filenames" to
arch, in a fairly broad sense.

If your filesystem is storing filenames in any character set which is
reasonably regarded as a subset of Unicode, I want arch to get to a
state where you can use the full range of available filenames.

Roughly speaking: that means that arch changesets and other data files
will be able to store filenames in Unicode, converting for local
purposes as you suggest.
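
To make the two requirements quoted above concrete, here is a minimal
Python sketch, not arch code.  The helper names `to_portable' and
`to_local' are hypothetical, the local charset is passed in by hand
rather than detected from the locale, and the fallback shown is
Python's own backslash escape, not arch's escaping syntax.

```python
# Hypothetical helpers for illustration only; nothing here is part of arch.

def to_portable(raw: bytes, local_charset: str) -> str:
    """Decode a filename as stored locally into a Unicode string.

    Raises UnicodeDecodeError if the bytes are not valid in that charset.
    """
    return raw.decode(local_charset)


def to_local(name: str, local_charset: str) -> bytes:
    """Encode a Unicode filename for local use.

    Characters the local charset cannot represent are replaced by a
    backslash escape (Python's convention, used here as a stand-in for
    whatever fallback arch would actually choose).
    """
    return name.encode(local_charset, errors="backslashreplace")


# An ISO-8859-2 byte that decodes to U+0142 (LATIN SMALL LETTER L WITH STROKE).
latin2_name = b"\xb3.txt"
unicode_name = to_portable(latin2_name, "iso8859-2")

# A name that ISO-8859-2 cannot represent falls back to an escape.
escaped = to_local("\u65e5.txt", "iso8859-2")
```

The round trip is lossless only for names the local charset can
represent; what to do with the rest is exactly the open fallback
question.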

It's a little bit of a hard problem, though.  The design of arch is
going to have to express an opinion about the nature of "portable
filenames" where "portable" encompasses this broader scope of extended
character sets.  "String equivalence" (and, hence, "filename
equivalence") gets pretty complicated in Unicode.  There are multiple
ways to "spell" a given string: an accented letter, for example, can
be a single precomposed character or a base letter followed by a
combining mark.  The design of arch is going to have to adopt some
position about _which_ notion of equivalence applies to filenames.
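
A small Python sketch of the equivalence problem, using the standard
`unicodedata' module.  The function `same_name' is a hypothetical
helper, and the choice of normalization form is the assumption being
illustrated; it is not a statement of what arch does or will do.

```python
import unicodedata

# Two spellings of the same visible name.
composed = "caf\u00e9"     # 'e with acute' as one precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the given Unicode form.

    Hypothetical helper: the answer depends on which form is chosen.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Code-point comparison treats the two spellings as different names.
naive_equal = composed == decomposed

# Canonical equivalence (NFC or NFD) treats them as the same name.
canonical_equal = same_name(composed, decomposed, "NFC")

# Compatibility equivalence (NFKC) goes further: the 'fi' ligature
# U+FB01 compares equal to the two letters "fi".
ligature_nfc = same_name("\ufb01le", "file", "NFC")
ligature_nfkc = same_name("\ufb01le", "file", "NFKC")
```

Each choice gives a different answer to "are these two files the
same path?", which is why the notion of equivalence has to be fixed
before the naming rules are loosened.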

On top of that, there are practical issues.  For example, what becomes
of the tar file stored by `import'?  Tar, too, must have an answer
about the nature of filename portability.  Arch and tar have to
agree about this, as things stand at least.
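
As a rough illustration only: the sketch below uses Python's `tarfile'
module with the pax format, which is an assumption made for the
example and not the tar that arch invokes.  The helper
`roundtrip_names' is hypothetical.  It suggests that an archive simply
records whichever spelling it is handed, so two canonically equivalent
names come back as two distinct members.

```python
import io
import tarfile


def roundtrip_names(names):
    """Write empty members with the given names to an in-memory pax
    archive, then read the member names back.  Hypothetical helper."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


# Precomposed and decomposed spellings of the same visible filename.
stored = roundtrip_names(["caf\u00e9.txt", "cafe\u0301.txt"])
```

Here the archive performs no normalization of its own, so any policy
about equivalent names would have to be applied before the names
reach it.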

It's a tricky issue that will take a while to resolve.  It's this
broader issue that has so far stopped me from liberalizing arch's
naming conventions (as encoded in `inventory') to permit non-ASCII
characters in filenames.  It's easy to make that liberalization but I
only want to make it once I'm confident that I won't be shooting
myself in the foot, down the road, as more complete Unicode support
comes on-line.

-t



