bug-gettext

Re: Gettext Mac GUI application returning wrong characters


From: Bruno Haible
Subject: Re: Gettext Mac GUI application returning wrong characters
Date: Sat, 11 Jul 2020 12:18:31 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-179-generic; KDE/5.18.0; x86_64; ; )

Hi Gonzalo,

> I compiled a sample case that does.

Based on this, I could adapt my test case and reproduce the issue.

Find it attached. Use:
  $ ./configure --prefix=PREFIX; make; make install
  $ mkdir -p hello.app/Contents/MacOS; ln -s PREFIX/bin/hello hello.app/Contents/MacOS/hello
then double-click on hello.app in the Finder.

> I printed out the variable values and when run command line only 
> LC_CTYPE=UTF-8 is set.
> 
> When run from the GUI no values are set.

Yes, I reproduce this: in the Terminal, LC_CTYPE=UTF-8; when run from the
Finder, LC_CTYPE is unset.

In my test case, I added code to print MB_CUR_MAX and locale_charset() at
two points:
1. after setlocale(LC_ALL,""),
2. after std::locale::global(std::locale("")).

The result is:
1. MB_CUR_MAX=4, locale_charset()=UTF-8
2. MB_CUR_MAX=1, locale_charset()=ASCII
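A minimal sketch of this kind of instrumentation follows. It is not the code from the attached test case. The names describe_locale, LocaleTrace and trace_locale_setup are hypothetical, and nl_langinfo(CODESET) is used as a stand-in for locale_charset(), which comes from libintl/gnulib and may report a different name for the same locale. The sketch also does not include <libintl.h>, so the setlocale call here is the plain libc one, not the override.

```cpp
#include <clocale>
#include <cstdlib>
#include <langinfo.h>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: summarize the current C locale state.
// nl_langinfo(CODESET) is an assumption standing in for locale_charset().
std::string describe_locale() {
    return "MB_CUR_MAX=" +
           std::to_string(static_cast<unsigned long>(MB_CUR_MAX)) +
           ", codeset=" + nl_langinfo(CODESET);
}

// Hypothetical record of the two measurement points.
struct LocaleTrace {
    std::string after_setlocale;
    std::string after_global;
};

LocaleTrace trace_locale_setup() {
    LocaleTrace trace;

    // Point 1: after setlocale(LC_ALL, "").
    std::setlocale(LC_ALL, "");
    trace.after_setlocale = describe_locale();

    // Point 2: after std::locale::global(std::locale("")).
    // std::locale("") throws std::runtime_error when the environment
    // names a locale that is not available; in that case the global
    // locale is left unchanged.
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
    }
    trace.after_global = describe_locale();

    return trace;
}
```

Calling trace_locale_setup() from main and printing both fields, once from the Terminal and once from the Finder, should show whether the two measurement points disagree.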

Here are the explanations:

  * Although the text encoding on macOS generally is UTF-8, the locale
    facility in libc by default - i.e. before the first setlocale() call, or
    when setlocale(LC_ALL,"") is called and no LANG, LC_* environment
    variable is set - sets MB_CUR_MAX = 1.

  * When MB_CUR_MAX = 1, the functions like mbrtowc etc. cannot support UTF-8
    encoding. For this reason, libintl and gnulib's locale_charset() function
    returns "ASCII" in this case. See the code at the end of [1].

  * In this case, the gettext facility uses iconv() to convert the strings to
    ASCII. So, for example, "ñ" becomes "n~" or "~n". This is coded in [2],
    function get_output_charset and its caller.

  * Two workarounds exist, to make UTF-8 encoded translations appear
    nevertheless:
      - The Terminal app sets the environment variable LC_CTYPE=UTF-8.
      - <libintl.h> contains a setlocale override that, on macOS, assumes
        LC_CTYPE=UTF-8 even if it is not set. [3] line 1482.

  * In the test case, we invoke the overridden setlocale from <libintl.h>.
    This explains the output
      1. MB_CUR_MAX=4, locale_charset()=UTF-8

  * In the test case, then, the statement
      std::locale::global(std::locale(""))
    invokes setlocale(LC_ALL,"") - the original setlocale from libc, not the
    overridden one. So, it annihilates the effect of the previous step.
    Since neither of the two workarounds is active at that point, you get
    the transliterated output.
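The transliteration step from the list above can be observed in isolation with a small sketch like the following. It is not the code from dcigettext.c. The function name utf8_to_ascii_translit is hypothetical, and the "ASCII//TRANSLIT" target is an assumption about how the conversion is requested. The replacement text for a given character depends on the iconv implementation and the current locale, so no particular output is promised here. On macOS, linking with -liconv may be needed.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 text to ASCII, asking iconv to
// transliterate characters that ASCII cannot represent.
// Returns an empty string if the converter is unavailable or fails.
std::string utf8_to_ascii_translit(const std::string& input) {
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return std::string();

    std::string in = input;                  // iconv wants a mutable buffer
    std::string out(in.size() * 4 + 16, '\0');

    char* in_ptr = &in[0];
    std::size_t in_left = in.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (std::size_t)-1)
        return std::string();

    out.resize(out.size() - out_left);       // keep only the bytes written
    return out;
}
```

Passing the UTF-8 bytes of "ñ" ("\xC3\xB1") to this function shows what the local iconv substitutes for it.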

The fix, now, is to add this code before std::locale::global(std::locale("")):

   #if defined __APPLE__ && defined __MACH__
   setenv ("LC_CTYPE", "UTF-8", 1);
   #endif

and include <stdlib.h>.
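Put in context, the locale initialization could look roughly like the sketch below. The function name init_locale and its bool return value are hypothetical, not part of the attached test case. In a program that uses gettext, <libintl.h> would also be included, so that the setlocale call resolves to the override; it is left out here to keep the sketch free of dependencies.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <stdlib.h>   // setenv

// Hypothetical helper: initialize the C and C++ locales.
// Returns true if the C++ global locale was installed from the environment.
bool init_locale() {
    std::setlocale(LC_ALL, "");

#if defined __APPLE__ && defined __MACH__
    // The fix: make sure LC_CTYPE is set before the C++ runtime calls
    // libc's own setlocale(LC_ALL, "") again.
    setenv("LC_CTYPE", "UTF-8", 1);
#endif

    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        // The environment names a locale that is not available;
        // the previous global locale stays in effect.
        return false;
    }
    return true;
}
```

On platforms other than macOS the #if block compiles to nothing and the environment is left untouched.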

Bruno

[1] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/localcharset.c
[2] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/dcigettext.c
[3] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/setlocale.c

Attachment: hello-c++-0.tar.gz
Description: application/compressed-tar

