Re: check-AUTHORS fails because of non ansi characters
From: Bruno Haible
Subject: Re: check-AUTHORS fails because of non ansi characters
Date: Sat, 21 Jun 2008 18:10:28 +0200
User-agent: KMail/1.5.4
> > |> 58c58
> > |> < ptx: Fran?ois Pinard
This is not user-friendly: proper_name_utf8 should not return a result with
question marks. It is better for it to return its first argument instead. I'm
fixing this with the appended patch, but that will not fix the coreutils
test failure.
> > |> ---
> > |>> ptx: François Pinard
> >
> > In my email, this is rendering as one vs. two characters. I suspect it
> > might be a locale issue - perhaps Jim is using a UTF-8 locale, and Michael
> > is using a Latin-1 encoding?
Michael must be using a locale in ASCII encoding; if it were a Latin1 encoding,
the output would have contained a cedilla, not a question mark.
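To make the byte-level reasoning concrete, here is a minimal sketch. It is a hand-rolled conversion rather than iconv, and narrow_utf8 is a hypothetical helper that exists neither in gnulib nor in coreutils. It decodes only 1- and 2-byte UTF-8 sequences, does no validation, and writes '?' for anything the target encoding cannot hold, which mimics the question-mark fallback seen in the report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Convert the UTF-8 string IN to a single-byte encoding and store the
   result in OUT, which must have room for strlen (in) + 1 bytes.
   MAX_CODE is the highest representable code point: 0x7F for ASCII,
   0xFF for Latin-1.  Anything above it becomes '?'.  */
static void
narrow_utf8 (const char *in, unsigned max_code, char *out)
{
  const unsigned char *p = (const unsigned char *) in;

  while (*p != '\0')
    {
      unsigned code;

      if (*p < 0x80)
        /* Plain ASCII byte.  */
        code = *p++;
      else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80)
        {
          /* Two-byte sequence: 110xxxxx 10xxxxxx.  */
          code = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
          p += 2;
        }
      else
        {
          /* Longer or malformed sequence: skip it as one unit and
             force the '?' replacement below.  */
          code = max_code + 1;
          p++;
          while ((*p & 0xC0) == 0x80)
            p++;
        }

      *out++ = code <= max_code ? (char) code : '?';
    }
  *out = '\0';
}
```

With the UTF-8 input "Fran\303\247ois Pinard", a limit of 0x7F gives "Fran?ois Pinard", while a limit of 0xFF gives the single Latin-1 byte 0xE7 (c with cedilla) in place of the two UTF-8 bytes.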
Jim Meyering wrote:
> The problem is probably that his system lacks the en_US.UTF-8 locale,
> which is used by that check-AUTHORS rule.
>
> Here's a change I'm considering. It's easy in the sense that it's merely
> using an existing m4 macro, gt_LOCALE_FR_UTF8,
Yes, this change will fix the test failure.
> but has the drawback of depending on a locale that is less likely to be
> installed than the English one.
I'm not sure whether en_US.UTF-8 is installed more often than fr_FR.UTF-8.
Certainly Solaris systems have had it for ages, but in general more effort is
spent on i18n of French than on i18n of English.
> One twist was that on my system, the french translation of "F. Pinard"
> was identical to the original
Yes, the test depends on the message catalog as well. If you set not just
LC_ALL=$(LOCALE_FR_UTF8)
but
LC_ALL=$(LOCALE_FR_UTF8) LANGUAGE=zxx
this source of trouble goes away. ('zxx' is the language code for
'not applicable'; it's highly unlikely ever to carry a message catalog.)
> + echo 'your system lacks a french UTF8 locale' 1>&2; \
I would write UTF-8 here. That's the only standardized name of the encoding
that you mean.
2008-06-21 Bruno Haible <address@hidden>
* lib/propername.c (proper_name_utf8): Don't use the transliterated
result if it contains question marks.
Reported by Michael Geng <address@hidden>.
*** lib/propername.c.orig 2008-06-21 17:47:37.000000000 +0200
--- lib/propername.c 2008-06-21 17:37:16.000000000 +0200
***************
*** 205,219 ****
# if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
|| _LIBICONV_VERSION >= 0x0105
{
size_t len = strlen (locale_code);
char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
memcpy (locale_code_translit, locale_code, len);
memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
! name_converted_translit = alloc_name_converted_translit =
xstr_iconv (name_utf8, "UTF-8", locale_code_translit);
free (locale_code_translit);
}
# endif
#endif
--- 205,236 ----
# if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
|| _LIBICONV_VERSION >= 0x0105
{
+ char *converted_translit;
+
size_t len = strlen (locale_code);
char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
memcpy (locale_code_translit, locale_code, len);
memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
! converted_translit =
xstr_iconv (name_utf8, "UTF-8", locale_code_translit);
free (locale_code_translit);
+
+ if (converted_translit != NULL)
+ {
+ # if !_LIBICONV_VERSION
+ /* Don't use the transliteration if it added question marks.
+ glibc's transliteration falls back to question marks; libiconv's
+ transliteration does not.
+ mbschr is equivalent to strchr in this case. */
+ if (strchr (converted_translit, '?') != NULL)
+ free (converted_translit);
+ else
+ # endif
+ name_converted_translit = alloc_name_converted_translit =
+ converted_translit;
+ }
}
# endif
#endif
***************
*** 270,276 ****
}
}
! #ifdef TEST
# include <locale.h>
int
main (int argc, char *argv[])
--- 287,293 ----
}
}
! #ifdef TEST1
# include <locale.h>
int
main (int argc, char *argv[])
***************
*** 281,283 ****
--- 298,312 ----
return 0;
}
#endif
+
+ #ifdef TEST2
+ # include <locale.h>
+ # include <stdio.h>
+ int
+ main (int argc, char *argv[])
+ {
+ setlocale (LC_ALL, "");
+ printf ("%s\n", proper_name_utf8 ("Franc,ois Pinard", "Fran\303\247ois Pinard"));
+ return 0;
+ }
+ #endif
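
The rule added by the first hunk can also be read in isolation. The following is only a sketch under simplifying assumptions: usable_name is a hypothetical helper, not a function in lib/propername.c, and it leaves out the memory management and the _LIBICONV_VERSION condition of the real code. It shows the decision itself: a transliteration that is missing or contains a question mark is discarded in favour of the ASCII first argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Return TRANSLIT if it is usable, otherwise NAME_ASCII.
   TRANSLIT may be NULL when the conversion failed.  A '?' anywhere in
   TRANSLIT is taken as a sign that the transliteration fell back to a
   replacement character.  Since '?' is an ASCII character, a plain
   strchr is used here, as in the patch.  */
static const char *
usable_name (const char *name_ascii, const char *translit)
{
  if (translit == NULL || strchr (translit, '?') != NULL)
    return name_ascii;
  return translit;
}
```

For example, usable_name ("Franc,ois Pinard", "Fran?ois Pinard") returns the first argument, whereas a transliteration without a question mark, such as "Francois Pinard", is returned unchanged.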