Re: UTF-8 regression in guile 1.9.5
From: Andy Wingo
Subject: Re: UTF-8 regression in guile 1.9.5
Date: Sat, 09 Jan 2010 19:07:38 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.92 (gnu/linux)

Hi,

Reviving an old thread...

On Fri 11 Dec 2009 16:05, Mike Gran <address@hidden> writes:
>> On Sun 06 Dec 2009 21:43, Linas Vepstas writes:
>>
>> > 2009/12/6 Mike Gran :
>> >>
>> >>> > need to call (setlocale LC_ALL "")
>> >>
>> >> But for Guile to store characters as codepoints, declaring a locale
>> >> is pretty much a requirement now.
>> >
>> > Would it make sense to add (setlocale LC_ALL "") to some default,
>> > e.g. boot-9.scm ?
>
> If we always call setlocale, legacy code that used UTF-8 and other
> non-Latin locales will just work. Legacy code that used strings to
> contain binary data would break.
>
> (Of course, UTF-8 strings only worked on Guile 1.8.x so long
> as you either never looked at substrings or chars, or did
> UTF-8 parsing yourself.)
>
> As it is now, the opposite is true: legacy code with strings
> containing binary data will just work; strings containing text in a
> non-8-bit locale encoding will break.
>
> | 1.8.x strings   | setlocale called |
> | contain         | in 1.8 | in 2.0  | Guile 2.0 will
> -----------------------------------------------------------------
> | ASCII           | Y/N    | Y/N     | just work
> -----------------------------------------------------------------
> | locale-encoded  | Y/N    | Y       | just work
> | strings         |        |         |
> -----------------------------------------------------------------
> | locale-encoded  | Y/N    | N       | interpret string bytes as
> | strings         |        |         | Latin-1
> -----------------------------------------------------------------
> | binary data     | Y/N    | Y       | if locale is Latin-1: just work
> |                 |        |         |
> |                 |        |         | if locale is not Latin-1:
> |                 |        |         | interpret string bytes using
> |                 |        |         | locale encoding
> -----------------------------------------------------------------
> | binary data     | Y/N    | N       | just work
>
> I think I prefer that the coder take the responsibility of calling
> setlocale, but, I only think that because it is how C works. I'm used
> to that convention.

I would still prefer ponies and magic, but I realized: if we do a
setlocale(LC_ALL, "") at the beginning, might that not change e.g. the
floating point format, or some other locale-related variable, which
would make Guile modules unreadable, or otherwise semantically different
or invalid?

I'm asking because I ran into this bug now:

    scheme@(guile-user)> ,pr (resolve-module '(gnome gtk))
    Throw to key `wrong-type-arg' with args `("procedure-name" "Wrong type
    argument in position ~A: ~S" (1 #<dynamic-object "libgw-guile-gnome-pango">)
    (#<dynamic-object "libgw-guile-gnome-pango">))'.
    Entering the debugger. Type `bt' for a backtrace or `c' to continue.
    0 debug> bt
    In current input:
    <unknown-location>: 13 ERROR: cannot convert to output locale "NONE":
    ""dynamic-wind""

So I guess we need a special case for NONE there, or something. I really
don't understand i18n/l10n.

FWIW, it seems that both ruby and python require the user to call
setlocale.
Regards,
Andy
--
http://wingolog.org/