Re: LC_CTYPE implementation help
From: Aragon Gouveia
Subject: Re: LC_CTYPE implementation help
Date: Thu, 28 Aug 2008 02:22:48 +0200
User-agent: Mutt/1.4i
| By Bruno Haible <address@hidden>
| [ 2008-08-28 01:32 +0200 ]
> Yes. gettext does not replace the system's locales. If you are on a system
> with broken locales, then either you have a localedef command (like on
> glibc or Solaris systems), or you are hosed (that's the case on most
> other systems, including *BSD, Cygwin, mingw).
You got me a bit alarmed there, so I did some poking around in FreeBSD's
locale design. It looks like it has mklocale(1) and colldef(1) for compiling
LC_CTYPE and LC_COLLATE system locale source files. The remaining locale
categories are generated by the source build system somehow. The OS's
source tree has source files for all the LC_* categories, so at worst it
should be possible to add to or change the stock locales by rebuilding the OS.
I see at least one locale that is missing (mine, en_ZA), so I'll make a point
of asking freebsd-i18n what's involved in adding locales for all
categories. :)
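A quick way to see which categories a given locale name actually provides is to ask setlocale() for each one in turn, since it returns NULL for a category it cannot load. The following C sketch is an editorial illustration, not part of FreeBSD's tooling; report_locale() is a hypothetical helper, and it resets each category to "C" afterwards rather than restoring the previous setting.

```c
#include <locale.h>
#include <stdio.h>

/* Locale categories to probe. LC_MESSAGES is POSIX rather than ISO C,
 * but both FreeBSD and glibc define it. */
static const struct {
    int category;
    const char *name;
} probe_categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
    { LC_MESSAGES, "LC_MESSAGES" },
};

/* Hypothetical helper: print, for each category above, whether the C
 * library can load `locale_name`, and return how many categories are
 * missing. Each category is set back to "C" after it is probed. */
int report_locale(const char *locale_name)
{
    int missing = 0;
    size_t count = sizeof probe_categories / sizeof probe_categories[0];

    for (size_t i = 0; i < count; i++) {
        /* setlocale() returns NULL if this category cannot be set. */
        int ok = setlocale(probe_categories[i].category, locale_name) != NULL;
        printf("%-12s %s\n", probe_categories[i].name, ok ? "ok" : "MISSING");
        if (!ok)
            missing++;
        setlocale(probe_categories[i].category, "C");
    }
    return missing;
}
```

Calling report_locale("en_ZA.UTF-8") from main() (the exact spelling of the locale name is an assumption) would print one line per category and mark each one the installed locale data cannot supply.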
> FreeBSD <ctype.h> are certainly multibyte aware. But isalnum() is not
> sufficient for testing whether 'é' is a lower-case or upper-case letter
> because often strlen("é") == 2.
>
> > edit: just noticed FreeBSD has ctype functions like iswalnum() for handling
> > "wide characters", which are declared in wctype.h. Cool! :)
>
> Yes, mbtowc() + iswalnum() together are a working replacement for isalnum().
> But I would not recommend using functions that work on wide character
> *strings* (wchar_t*) - doing so causes more problems than it solves. The
> preferred representations for strings continue to be char* strings,
> either in locale encoding (the default) or in UTF-8 encoding (see also
> the unistr/u8* functions in gnulib).
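The mbtowc() + iswalnum() combination can be sketched in C as follows. The helper name count_alnum() is hypothetical; the string stays a plain char* in the locale encoding and is decoded one character at a time, so a two-byte letter counts once rather than being tested byte by byte. mbtowc() keeps hidden internal state, so a thread-safe variant would use mbrtowc() with an explicit mbstate_t instead.

```c
#include <locale.h>   /* setlocale(); callers must set LC_CTYPE first */
#include <stdlib.h>   /* mbtowc() */
#include <string.h>   /* strlen() */
#include <wchar.h>
#include <wctype.h>   /* iswalnum() */

/* Count the alphanumeric characters (not bytes) in a char* string in the
 * current locale's encoding. Each multibyte character is decoded with
 * mbtowc() and classified with iswalnum(), as a multibyte-aware
 * replacement for a byte-wise isalnum() loop.
 * Returns -1 on an invalid or incomplete multibyte sequence. */
long count_alnum(const char *s)
{
    size_t len = strlen(s);
    long count = 0;

    mbtowc(NULL, NULL, 0);              /* reset mbtowc's shift state */
    while (len > 0) {
        wchar_t wc;
        int n = mbtowc(&wc, s, len);
        if (n < 0)
            return -1;                  /* not valid in this locale */
        if (n == 0)
            break;                      /* NUL; cannot occur since len stops before it */
        if (iswalnum((wint_t) wc))
            count++;
        s += n;
        len -= (size_t) n;
    }
    return count;
}
```

After setlocale(LC_CTYPE, "") in a UTF-8 locale that classifies é as a letter, count_alnum("caf\xc3\xa9") would be expected to return 4, while strlen() reports 5 bytes.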
I'll need to do some testing. Another worry for me is that I suspect there
are very few FreeBSD users who actually use locale-sensitive functions
with multibyte characters, so who knows what state they're in. :)
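For that kind of testing, one quick check of the multibyte path is a case-mapping round trip. The sketch below is illustrative only: mb_toupper() is a hypothetical name, and it assumes a stateless encoding such as UTF-8. It decodes with mbrtowc(), maps with towupper() and re-encodes with wcrtomb(), keeping char* strings at both ends.

```c
#include <locale.h>   /* setlocale(); callers must set LC_CTYPE first */
#include <stdlib.h>   /* malloc(), free(), MB_CUR_MAX */
#include <string.h>   /* strlen(), memset() */
#include <wchar.h>    /* mbrtowc(), wcrtomb(), mbstate_t */
#include <wctype.h>   /* towupper() */

/* Hypothetical test helper: return a newly allocated upper-cased copy of
 * the multibyte string s (locale encoding), or NULL on an invalid
 * sequence or allocation failure. The result may differ in byte length
 * from the input. Assumes a stateless encoding (no final shift sequence). */
char *mb_toupper(const char *s)
{
    size_t len = strlen(s);
    /* At most len characters, each re-encoded in at most MB_CUR_MAX bytes. */
    char *out = malloc(len * MB_CUR_MAX + 1);
    char *p = out;
    mbstate_t in_state, out_state;

    if (out == NULL)
        return NULL;
    memset(&in_state, 0, sizeof in_state);
    memset(&out_state, 0, sizeof out_state);

    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &in_state);
        /* -1: invalid, -2: incomplete, 0: NUL (cannot occur before len ends) */
        if (n == (size_t) -1 || n == (size_t) -2 || n == 0) {
            free(out);
            return NULL;
        }
        size_t m = wcrtomb(p, (wchar_t) towupper((wint_t) wc), &out_state);
        if (m == (size_t) -1) {
            free(out);
            return NULL;
        }
        p += m;
        s += n;
        len -= n;
    }
    *p = '\0';
    return out;
}
```

With setlocale(LC_CTYPE, "") and a UTF-8 locale that maps é to É, mb_toupper("caf\xc3\xa9") would be expected to yield "CAF\xc3\x89"; comparing a handful of such words against their known upper-case forms gives a rough picture of how well the locale's ctype tables work.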
Thanks a lot for all the advice!
Thanks,
Aragon