guile-user

Re: Filename encoding


From: Chris Vine
Subject: Re: Filename encoding
Date: Wed, 15 Jan 2014 21:42:57 +0000

On Wed, 15 Jan 2014 23:00:18 +0200
Eli Zaretskii <address@hidden> wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine <address@hidden>
> > Cc: address@hidden
> > 
> > POSIX system calls are encoding agnostic.  The filename is just a
> > series of bytes terminating with a NUL character.  All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
> 
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways.  For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
> 
> So I think using UTF-8 internally is a much better way.

I am not sure what you mean, as I am not talking about internal use.
Guile uses ISO-8859-1 and UTF-32 internally for all its strings, which
is fine.  glib uses UTF-32 and UTF-8 internally for most purposes.  It
is the external representation which is at issue.  This is just an
encoding transformation for the library to perform when looking up a
file (be it guile, glib or anything else).
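
To make that concrete, here is a minimal sketch in Python (not guile or
glib code; the function names are hypothetical and chosen only for
illustration).  It assumes the caller already knows which encoding the
filesystem's names use, and simply maps between the internal string and
the byte sequence handed to the operating system:

```python
def external_filename(name: str, fs_encoding: str) -> bytes:
    """Map an internal (Unicode) string to the bytes passed to the OS.

    fs_encoding is an assumption supplied by the caller: the encoding
    that whoever named the files on this filesystem adopted.
    """
    encoded = name.encode(fs_encoding)
    # A NUL byte would terminate the name early at the system-call level.
    if b"\x00" in encoded:
        raise ValueError("filename may not contain a NUL byte")
    return encoded


def internal_filename(raw: bytes, fs_encoding: str) -> str:
    """Map the bytes returned by the OS back to an internal string."""
    return raw.decode(fs_encoding)


# The same internal string maps to different external byte sequences.
utf8_name = external_filename("café", "utf-8")         # b'caf\xc3\xa9'
latin1_name = external_filename("café", "iso-8859-1")  # b'caf\xe9'
```

The internal representation never changes here; only the bytes used for
the lookup do.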

As it happens (although this is beside the point) using the byte value
or sequence which the operating system reserves for the '/' character
in a filename for any purpose other than separating pathname
components, or a NUL character for anything other than marking the end
of the filename, is not POSIX compliant and will not work on any
operating system I know of, including Windows.  (As for POSIX, see SUS,
Base Definitions, sections 3.170 (Filename) and 3.267 (Pathname).)  But
as I say, that is irrelevant.  Whatever the filesystem encoding happens
to be, it happens to be.  It might not be a narrow encoding at all.
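
For anyone who wants to see where a separator byte can turn up inside
another character, the following Python sketch checks it directly.  The
helper name is hypothetical, and the sample characters and codecs
(Python's "cp932" and "utf-16-le") are my own choice of illustration,
not anything taken from guile:

```python
def chars_hiding_byte(text: str, encoding: str, byte: bytes) -> list:
    """Return the characters of text whose encoded form contains byte,
    excluding the separator character itself."""
    sep_char = byte.decode("ascii")
    return [
        ch for ch in text
        if ch != sep_char and byte in ch.encode(encoding)
    ]


# UTF-8: every byte of a multibyte sequence is >= 0x80, so '/' (0x2F)
# can only ever be a real '/'.
in_utf8 = chars_hiding_byte("表/ソ", "utf-8", b"/")

# cp932 (a Shift-JIS variant): the second byte of U+8868 is 0x5C, the
# same value as '\'.
in_cp932 = chars_hiding_byte("表", "cp932", b"\\")

# UTF-16LE, a wide encoding: U+012F is stored as 0x2F 0x01.
in_utf16 = chars_hiding_byte("į", "utf-16-le", b"/")
```

Under these assumptions in_utf8 is empty, while in_cp932 and in_utf16
each report the one character tested.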

Chris


