Re: [bug-gettext] [RFC Patch] Implement \u support in xgettext for C family (C11/C++11)

From:	Daiki Ueno
Subject:	Re: [bug-gettext] [RFC Patch] Implement \u support in xgettext for C family (C11/C++11)
Date:	Fri, 15 Feb 2013 19:06:06 +0900
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)

Miguel Ángel <address@hidden> writes:

>> Isn't it possible to skip Unicode escapes in 'phase7_getc', instead of
>> 'phase5_get'?  Like the Python parser?
>> 
>
> No problem, but a change in 'phase5_getc' has to be done to store the
> actual character, something like mixed_string_buffer to translate the
> unicode codepoint to the local encoding.

Right.  My previous suggestions seem to contradict each other.  If we
handle local encodings, extraction needs to be done in 'phase5_get'.

>> > I am not sure whether I always have to change
>> > 'xgettext_current_source_encoding'.  I have looked into the x-java.c code.
>> 
>> The patch sets 'xgettext_current_source_encoding' to UTF-8 when it
>> detects Unicode escapes.  I guess it only works if the source code
>> encoding (see "gcc -finput-charset") is UTF-8.
>> 
>> I'm also not very sure how to handle this case though, maybe we should
>> adjust to 'xgettext_global_source_encoding', if it is not ASCII?
>
> I have seen that iconv is used in CONVERT_STRING (in xgettext.c) to
> translate each non-ASCII string to UTF-8. Is it the default encoding for
> PO(T) files?

Yes, UTF-8 is the default output encoding.  However, the input encoding
can be specified with the --from-code option of xgettext, like this:

$ xgettext -a --language=C --from-code=ISO-8859-1 -o latin1.po latin1.c

Suppose that latin1.c contains an ISO-8859-1 string with Unicode
escapes.  If 'xgettext_current_source_encoding' is set to UTF-8, the
ISO-8859-1 part of the string will be treated as UTF-8 and thus be
converted incorrectly.
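
To illustrate the failure mode described above, here is a minimal
sketch in plain C.  It is not code from xgettext; the function name
'looks_like_utf8' is hypothetical, and the check is deliberately
simplified.  It shows that a buffer holding an ISO-8859-1 byte next to
the UTF-8 form of an escape such as \u20AC no longer parses as UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of gettext.  Returns true if the N
   bytes at S have a well-formed UTF-8 lead/continuation structure.
   Simplified: overlong forms and surrogates are not rejected.  */
bool
looks_like_utf8 (const unsigned char *s, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = s[i];
      size_t follow;            /* number of continuation bytes expected */

      if (c < 0x80)
        follow = 0;
      else if ((c & 0xE0) == 0xC0)
        follow = 1;
      else if ((c & 0xF0) == 0xE0)
        follow = 2;
      else if ((c & 0xF8) == 0xF0)
        follow = 3;
      else
        return false;           /* stray continuation or invalid lead byte */

      if (follow > n - i - 1)
        return false;           /* sequence is truncated */

      for (size_t k = 1; k <= follow; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;         /* expected a continuation byte */

      i += follow + 1;
    }
  return true;
}
```

For example, the bytes 'c' 'a' 'f' 0xE9 ' ' 0xE2 0x82 0xAC (an
ISO-8859-1 "é" followed by U+20AC written out as UTF-8) are rejected,
because 0xE9 looks like the lead byte of a three-byte sequence but is
followed by a space.  The same text with "é" as 0xC3 0xA9 is accepted.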

So I'd suggest first converting the Unicode characters given by Unicode
escapes into the source encoding (in x-c.c), and then letting
'remember_a_message' convert them into UTF-8.
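
The first step of that suggestion could look roughly like the sketch
below.  It assumes POSIX iconv is available and is not taken from the
patch or from x-c.c; the names 'encode_utf8' and
'escape_to_source_encoding' are hypothetical.  The code point from the
escape is encoded as UTF-8 and then converted to the encoding given by
--from-code, so that the whole string is in one encoding again.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: encode code point UC as UTF-8 into BUF (at
   least 4 bytes).  Returns the byte count, or 0 if UC is a surrogate
   or lies beyond U+10FFFF.  */
size_t
encode_utf8 (unsigned int uc, char *buf)
{
  if (uc < 0x80)
    {
      buf[0] = (char) uc;
      return 1;
    }
  if (uc < 0x800)
    {
      buf[0] = (char) (0xC0 | (uc >> 6));
      buf[1] = (char) (0x80 | (uc & 0x3F));
      return 2;
    }
  if (uc >= 0xD800 && uc <= 0xDFFF)
    return 0;
  if (uc < 0x10000)
    {
      buf[0] = (char) (0xE0 | (uc >> 12));
      buf[1] = (char) (0x80 | ((uc >> 6) & 0x3F));
      buf[2] = (char) (0x80 | (uc & 0x3F));
      return 3;
    }
  if (uc < 0x110000)
    {
      buf[0] = (char) (0xF0 | (uc >> 18));
      buf[1] = (char) (0x80 | ((uc >> 12) & 0x3F));
      buf[2] = (char) (0x80 | ((uc >> 6) & 0x3F));
      buf[3] = (char) (0x80 | (uc & 0x3F));
      return 4;
    }
  return 0;
}

/* Hypothetical helper: convert code point UC into SOURCE_ENCODING and
   store the bytes in OUT (capacity OUTSIZE).  Returns the number of
   bytes written, or (size_t) -1 if the character cannot be represented
   exactly or the conversion fails.  */
size_t
escape_to_source_encoding (unsigned int uc, const char *source_encoding,
                           char *out, size_t outsize)
{
  char utf8[4];
  size_t inleft = encode_utf8 (uc, utf8);
  if (inleft == 0)
    return (size_t) -1;

  iconv_t cd = iconv_open (source_encoding, "UTF-8");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *inptr = utf8;
  char *outptr = out;
  size_t outleft = outsize;

  /* A nonzero result means an error or a lossy substitution.  */
  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  if (res == 0)
    res = iconv (cd, NULL, NULL, &outptr, &outleft);  /* flush shift state */
  iconv_close (cd);

  return res == 0 ? outsize - outleft : (size_t) -1;
}
```

With "ISO-8859-1" as the source encoding, U+00E9 becomes the single
byte 0xE9, while U+20AC has no representation and the helper reports
failure.  How xgettext should react in that second case is a separate
question that this sketch does not try to answer.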

Regards,
-- 
Daiki Ueno