bug-libunistring

From: Bruno Haible
Subject: Re: [bug-libunistring] Accessing the environment's locale encoding settings
Date: Wed, 16 Nov 2011 03:00:38 +0100
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )

[Dropping bug-libunistring from the CC.]

Hi Ludo',

> Should we be checking for charset aliases?

Yes. Without the system-dependent aliases table, the locale_charset()
function is buggy on nearly all platforms. Cf. gnulib/lib/config.charset.
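
As a rough sketch of what such an aliases table is for: the name that
nl_langinfo(CODESET) returns is whatever the platform chooses, so it gets
mapped to a canonical name before use. The three entries and the helper
names canonical_charset() and current_charset() below are illustrative
assumptions only; the real per-platform data is what config.charset
provides.

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Illustrative entries only; the real table is platform-specific and
   comes from gnulib/lib/config.charset. */
struct charset_alias { const char *native; const char *canonical; };

static const struct charset_alias example_aliases[] = {
    { "646",       "ASCII"      },
    { "ISO8859-1", "ISO-8859-1" },
    { "utf8",      "UTF-8"      },
};

/* Hypothetical helper: map a platform-specific charset name to a
   canonical one.  Unknown names are passed through unchanged. */
static const char *canonical_charset(const char *native)
{
    size_t n = sizeof example_aliases / sizeof example_aliases[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(native, example_aliases[i].native) == 0)
            return example_aliases[i].canonical;
    return native;
}

/* Hypothetical helper: canonical charset of the current LC_CTYPE locale. */
static const char *current_charset(void)
{
    return canonical_charset(nl_langinfo(CODESET));
}
```

Without the lookup step, callers that compare the result against "UTF-8"
or pass it to iconv_open() may see a name they do not recognize.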

> In Guile, strings coming from the C world are assumed to be encoded in
> the current locale encoding.  Like in C, the current locale is set using
> ‘setlocale’, and it’s up to the user to write (setlocale LC_ALL "") to
> set the locale according to the relevant environment variables.
> 
> The problem comes with command-line arguments: the user hasn’t yet had a
> chance to call ‘setlocale’, yet they most likely have to be converted
> from locale encoding. ...

I would recommend that setlocale(...) happen *before* the command-line
arguments are parsed, not *after*, for two reasons:
  1) The parsing of command-line arguments can provoke errors, and errors
     should be displayed in the user's language, that is, depend on $LANG,
     $LC_MESSAGES, $LC_ALL.
  2) As you noticed, if setlocale(...) happens too late, you want to
     simulate the effects "as if" setlocale(LC_ALL, "") had been called.
     But you have thought only about the locale encoding (part of the
     LC_CTYPE category of the locale), not about LC_MESSAGES which is needed
     when you print an error message.
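
In C terms, the ordering I mean looks roughly like the sketch below. The
function name startup() and its trivial option check are made up for
illustration; this is not Guile's actual startup code.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical startup routine (not Guile's actual code): set the locale
   first, then parse the arguments.  Returns the number of bad options. */
static int startup(int argc, char **argv)
{
    int bad_options = 0;

    /* Step 1: adopt the locale named by LC_ALL, LC_xxx or LANG.
       If this fails, the program simply stays in the "C" locale. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "warning: cannot set locale from environment\n");

    /* Step 2: only now look at argv.  A real program would pass the
       message through gettext() here, which depends on LC_MESSAGES. */
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] == '-' && strcmp(argv[i], "--help") != 0) {
            fprintf(stderr, "%s: unrecognized option '%s'\n",
                    argv[0], argv[i]);
            bad_options++;
        }
    }
    return bad_options;
}
```

Because step 1 covers all categories at once, both the encoding used to
decode the arguments (LC_CTYPE) and the language of any error message
(LC_MESSAGES) are already in place when step 2 runs.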

You wrote:
> > Unfortunately, I don't see a way for the user to call setlocale before a
> > Guile script converts the command-line arguments to Scheme strings, at
> > least not without providing their own `main' function in C.
>
> Hmm, very good point.

That is precisely the point. Only in C, C++, Objective C, PHP, and Guile
is it the user's responsibility to set the locale. Look at the many
internationalization samples ("hello world" samples) in GNU gettext:
in all other languages (and even many GUI toolkits based on C, C++, or
Objective C) the setlocale call is implicit.

The user should *not* have to worry about conversion of strings from/to
locale encoding, because
  1) This is what people expect from a scripting language nowadays.
  2) In Guile strings are sequences of Unicode characters [1][2].

The fact that in C and C++ the default locale inside a program (that is,
the locale in effect when the program is started) is *not* the locale
specified by the user is only due to backward compatibility:
  - In C, because C started as a system programming language and the
    locale facilities were not there in the beginning,
  - In C++, because C++ has strong backward compatibility links with C.
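
This default is easy to observe: a C program starts as if
setlocale(LC_ALL, "C") had been executed, whatever the environment says.
A minimal check, with still_in_c_locale() being a helper name invented
for this example:

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the program is still in the "C"
   locale, i.e. nobody has switched to another locale yet. */
static int still_in_c_locale(void)
{
    /* A NULL second argument queries the locale without changing it. */
    const char *name = setlocale(LC_ALL, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}
```

Called first thing in main(), this returns 1 even when $LANG or $LC_ALL
names some other locale; the user's setting only takes effect after an
explicit setlocale(LC_ALL, "").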

So my suggestion is to do (setlocale LC_ALL "") as part of the Guile
initialization, very early. Yes, this might lead to some complexity
in the Guile implementation if you have the concept of locale also at
the Guile level and need to make sure that the locale at the C level and
the locale at the Guile level are consistent as soon as the latter is
defined. But this is manageable.

Bruno

[1] http://www.gnu.org/software/guile/manual/html_node/Strings.html
[2] http://www.gnu.org/software/guile/manual/html_node/Characters.html
-- 
In memoriam Kurt Gerron <http://en.wikipedia.org/wiki/Kurt_Gerron>


