Re: [bug-gettext] [RFC Patch] Implement \u support in xgettext for C family (C11/C++11)
Fri, 15 Feb 2013 19:06:06 +0900
Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)
Miguel Ángel <address@hidden> writes:
>> Isn't it possible to skip Unicode escapes in 'phase7_getc', instead of
>> 'phase5_get'? Like the Python parser?
> No problem, but a change in 'phase5_getc' has to be done to store the
> actual character, something like mixed_string_buffer to translate the
> unicode codepoint to the local encoding.
Right. My previous suggestions seem to contradict each other. If we
handle local encodings, extraction needs to be done in 'phase5_get'.
>> > I am not very sure if I have to change always
>> > 'xgettext_current_source_encoding'. I have looked into x-java.c code.
>> The patch sets 'xgettext_current_source_encoding' to UTF-8 when it
>> detects Unicode escapes. I guess it only works if the source code
>> encoding (see "gcc -finput-charset") is UTF-8.
>> I'm also not very sure how to handle this case though, maybe we should
>> adjust to 'xgettext_global_source_encoding', if it is not ASCII?
> I have seen that iconv is used in CONVERT_STRING (in xgettext.c) to
> translate each non-ASCII string to UTF-8. Is it the default encoding for
> PO(T) files?
Yes, UTF-8 is the default output encoding. However, the input encoding
can be specified with the --from-code option of xgettext, like this:
$ xgettext -a --language=C --from-code=ISO-8859-1 -o latin1.po latin1.c
Suppose that latin1.c contains an ISO-8859-1 string with Unicode
escapes. If 'xgettext_current_source_encoding' is set to UTF-8,
the ISO-8859-1 part of the string will be treated as UTF-8 and thus cause
a conversion error.
So I'd suggest to first convert the Unicode characters given by Unicode
escapes into the source encoding (in x-c.c), and then let
'remember_a_message' convert them into UTF-8.
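To make the two-step idea concrete, here is a rough, standalone sketch. It is
not code from x-c.c: the helper names (is_valid_utf8, ucs_to_latin1,
latin1_to_utf8) are made up for illustration, and ISO-8859-1 is hard-coded as
the source encoding so that no iconv call is needed. Real code would go
through iconv for arbitrary encodings. The validity check is structural only
(it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Structural UTF-8 check: lead byte ranges and continuation bytes only.
   Returns 1 if the buffer looks like UTF-8, 0 otherwise.  */
int is_valid_utf8(const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t need;
    if (c < 0x80) need = 0;
    else if (c >= 0xC2 && c <= 0xDF) need = 1;
    else if (c >= 0xE0 && c <= 0xEF) need = 2;
    else if (c >= 0xF0 && c <= 0xF4) need = 3;
    else return 0;                      /* stray continuation or bad lead */
    if (len - i - 1 < need) return 0;   /* truncated sequence */
    for (size_t k = 1; k <= need; k++)
      if ((s[i + k] & 0xC0) != 0x80) return 0;
    i += need + 1;
  }
  return 1;
}

/* Step 1 (hypothetical helper): encode a code point from a \u escape in
   the source encoding, here assumed to be ISO-8859-1.
   Returns 1 on success, 0 if the code point is not representable.  */
int ucs_to_latin1(unsigned int cp, unsigned char *out)
{
  assert(out != NULL);
  if (cp > 0xFF) return 0;
  *out = (unsigned char) cp;
  return 1;
}

/* Step 2 (stand-in for the later conversion to UTF-8): convert a whole
   ISO-8859-1 buffer to UTF-8.  Returns the number of bytes written, or
   (size_t) -1 if the output buffer is too small.  */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out, size_t cap)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++) {
    unsigned char c = in[i];
    if (c < 0x80) {
      if (n + 1 > cap) return (size_t) -1;
      out[n++] = c;
    } else {
      if (n + 2 > cap) return (size_t) -1;
      out[n++] = (unsigned char) (0xC0 | (c >> 6));
      out[n++] = (unsigned char) (0x80 | (c & 0x3F));
    }
  }
  return n;
}
```

For a string like "caf<0xE9> \u00FC" in latin1.c, splicing the escape in as
UTF-8 gives the bytes 63 61 66 E9 20 C3 BC, which is_valid_utf8 rejects.
Storing the escape as the single byte FC first and converting the whole
buffer afterwards gives 63 61 66 C3 A9 20 C3 BC, which is valid. A code point
outside ISO-8859-1 (for example U+20AC) makes ucs_to_latin1 return 0, and the
sketch leaves open what to do in that case.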