gnu-arch-users

Re: [Gnu-arch-users] Trying out the new escaping version...


From: Jan Hudec
Subject: Re: [Gnu-arch-users] Trying out the new escaping version...
Date: Thu, 18 Mar 2004 20:24:47 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i

On Thu, Mar 18, 2004 at 11:27:25 -0800, Tom Lord wrote:
> 
> 
>     > From: Jan Hudec <address@hidden>
> 
>     > On Thu, Mar 18, 2004 at 11:32:38 +0100, address@hidden wrote:
>     > > Next: the escaped version still only accepts 7bit ascii [...]
> 
>     > I understand how hard it is to lift this. It would be extremely nice if
>     > it could accept high characters. I mean \(U+0100) and higher, since
>     > I use the ISO-8859-2 encoding here. It, of course, requires two things:
> 
>     >     1) Proper charset conversion to/from the charset detected from the locale.
>     >     2) Some sane fallback for when a character can't be converted.
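Below is a minimal Python sketch of those two requirements. It assumes the locale charset name is already known; in practice it might come from `locale.getpreferredencoding(False)`. Purely for illustration, characters that cannot be converted fall back to the `\(U+XXXX)` notation used above. The helper names `encode_local` and `decode_local`, and the choice to double a literal backslash so that escapes stay unambiguous, are hypothetical. This is not arch's actual behaviour.

```python
import re

# Matches either a doubled backslash or a \(U+XXXX) escape (hypothetical scheme).
_ESCAPE = re.compile(r"\\\\|\\\(U\+([0-9A-Fa-f]{4,6})\)")


def encode_local(name: str, charset: str) -> bytes:
    """Convert a Unicode filename to bytes in the local charset.

    Hypothetical fallback: any character the charset cannot represent is
    written as \\(U+XXXX), and a literal backslash is doubled so that the
    escapes stay unambiguous.
    """
    out = []
    for ch in name:
        if ch == "\\":
            out.append("\\\\")
            continue
        try:
            ch.encode(charset)  # representable in the local charset?
            out.append(ch)
        except UnicodeEncodeError:
            out.append("\\(U+%04X)" % ord(ch))
    return "".join(out).encode(charset)


def decode_local(raw: bytes, charset: str) -> str:
    """Inverse of encode_local: decode the bytes, then expand the escapes."""
    def expand(m):
        if m.group(1) is None:  # it was a doubled backslash
            return "\\"
        return chr(int(m.group(1), 16))
    return _ESCAPE.sub(expand, raw.decode(charset))
```

For example, with `charset="iso8859_2"` the Czech letters in `Žluťoučký kůň €.txt` pass through unchanged. The `€`, which ISO-8859-2 lacks, becomes `\(U+20AC)`, and `decode_local` restores the original name.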
> 
> It is, very much, a goal to add support for "Unicode filenames" to
> arch, in a fairly broad sense.
> 
> If your filesystem is storing filenames in any character set which is
> reasonably regarded as a subset of Unicode,  I want arch to get to a
> state where you can use the full range of available filenames.
> 
> Roughly speaking: that means that arch changesets and other data files
> will be able to store filenames in Unicode, converting for local
> purposes as you suggest.
> 
> It's a little bit of a hard problem, though.  The design of arch is
> going to have to express an opinion about the nature of "portable
> filenames" where "portable" encompasses this broader scope of extended
> character sets.  "String equivalence" (and, hence, "filename
> equivalence") gets pretty complicated in Unicode.  There are multiple
> ways to "spell" a given string.  The design of arch is going to have
> to adopt some position about _which_ notion of equivalence applies to
> filenames.
> 
> On top of that, there are practical issues.   For example, what becomes
> of the tar file stored by `import'?   Tar, also, must have an answer
> about the nature of filename portability.    Arch and tar have to
> agree about this, as things stand at least.
> 
> It's a tricky issue that will take a while to resolve.   It's this
> broader issue that has so far stopped me from liberalizing arch's
> naming conventions (as encoded in `inventory') to permit non-ascii
> characters in filenames.   It's easy to make that liberalization but I
> only want to make it once I'm confident that I won't be shooting
> myself in the foot, down the road, as more complete Unicode support
> comes on-line.
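Here is a concrete case of the "multiple ways to spell a string" point. In Unicode, `é` can be a single precomposed code point (U+00E9), or it can be `e` followed by U+0301 COMBINING ACUTE ACCENT. The sketch below assumes, only for illustration, that two filenames count as equivalent when their NFC normal forms match. NFD, NFKC or plain code-point comparison are equally possible positions, and choosing among them is exactly the open design question. `filename_key` is a hypothetical helper.

```python
import unicodedata


def filename_key(name: str) -> str:
    """Hypothetical equivalence key: two names denote 'the same file'
    iff their NFC (canonical composition) forms are identical."""
    return unicodedata.normalize("NFC", name)


precomposed = "caf\u00e9.txt"   # é as one code point, U+00E9
decomposed = "cafe\u0301.txt"   # e + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                              # False
print(filename_key(precomposed) == filename_key(decomposed))  # True
```

Under such a rule, a tree containing both spellings would have to be reported as a name collision, even though the raw byte strings differ.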

I am aware of these problems. That's why I am bringing it up: it needs a
lot of discussion before it can be implemented the right way.

For tar, arch could fix up names after tar unpacks them, and could rename
files before handing them to tar for packing. Perhaps filenames in
tarballs (and perhaps even in the revision library, pristine trees and
temporary directories) could be stored quoted, and only renamed to their
real names in the working directory.
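Below is a rough in-memory sketch of the quoting idea, using Python's standard `tarfile` module. Names are quoted to plain ASCII before they go into the tarball and unquoted when read back. The quoting scheme is hypothetical: printable ASCII is kept, a backslash is doubled, and anything else is written as `\(U+XXXX)`. The helpers `quote`, `unquote`, `pack` and `unpack` are also hypothetical. A real implementation would rename the extracted files in the working directory rather than build a dict.

```python
import io
import re
import tarfile
from typing import Dict

# Matches a doubled backslash or a \(U+XXXX) escape (hypothetical scheme).
_ESC = re.compile(r"\\\\|\\\(U\+([0-9A-Fa-f]{4,6})\)")


def quote(name: str) -> str:
    """Keep printable ASCII, double backslashes, escape everything else."""
    out = []
    for ch in name:
        if ch == "\\":
            out.append("\\\\")
        elif 0x20 <= ord(ch) < 0x7F:
            out.append(ch)
        else:
            out.append("\\(U+%04X)" % ord(ch))
    return "".join(out)


def unquote(name: str) -> str:
    """Inverse of quote."""
    return _ESC.sub(
        lambda m: "\\" if m.group(1) is None else chr(int(m.group(1), 16)),
        name,
    )


def pack(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed tarball whose member names are all quoted ASCII."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=quote(name))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack(blob: bytes) -> Dict[str, bytes]:
    """Read the tarball back, restoring the real (unquoted) names."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            result[unquote(member.name)] = tar.extractfile(member).read()
    return result
```

Because every name inside the tarball is plain ASCII, tar never has to take a position on non-ASCII filenames. Only arch's own quote/unquote step does.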

-------------------------------------------------------------------------------
                                                 Jan 'Bulb' Hudec 
<address@hidden>
