Re: Bug#493218: gettext: crash with some unicode chars (fwd)
From: Bruno Haible
Subject: Re: Bug#493218: gettext: crash with some unicode chars (fwd)
Date: Sun, 3 Aug 2008 22:04:24 +0200
User-agent: KMail/1.5.4
Hi,
Yann <address@hidden> wrote:
> to reproduce, just open a file test.py with only u'\udfff' in it, and
> run xgettext t.py
> we get a Aborted message
Find attached the fix that I just committed. Thanks for the report.
> This string isn't translatable, so why xgettext parse it? And why does
> it fail?
xgettext's logic would be more complex if it parsed strings only when deemed
"necessary". It's simpler to parse all identifiers and strings into a stream
of tokens first.
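
To illustrate the "tokens first" design described above, here is a minimal sketch in C. It is not xgettext's code: `next_token`, `count_marked_strings`, the token types, and the simplified lexical rules (double-quoted strings only, no escape sequences) are all assumptions made for this example. The point it shows is that the tokenizer reads every string literal, and only a later pass decides which ones sit inside a keyword call such as `_("...")`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not xgettext's own. */
enum token_type { TOK_EOF, TOK_IDENT, TOK_STRING, TOK_OTHER };

struct token
{
  enum token_type type;
  const char *start;   /* first character of the identifier or string body */
  size_t length;
};

/* Reads the next token at *pp and advances *pp past it.
   Every string literal is scanned here, translatable or not.  */
struct token
next_token (const char **pp)
{
  const char *p;
  struct token tok = { TOK_EOF, NULL, 0 };

  assert (pp != NULL && *pp != NULL);
  p = *pp;
  while (isspace ((unsigned char) *p))
    p++;
  tok.start = p;
  if (*p == '\0')
    tok.type = TOK_EOF;
  else if (isalpha ((unsigned char) *p) || *p == '_')
    {
      while (isalnum ((unsigned char) *p) || *p == '_')
        p++;
      tok.type = TOK_IDENT;
      tok.length = (size_t) (p - tok.start);
    }
  else if (*p == '"')
    {
      tok.start = ++p;
      while (*p != '\0' && *p != '"')
        p++;
      tok.type = TOK_STRING;
      tok.length = (size_t) (p - tok.start);
      if (*p == '"')
        p++;
    }
  else
    {
      p++;
      tok.type = TOK_OTHER;
      tok.length = 1;
    }
  *pp = p;
  return tok;
}

/* Second pass: counts string tokens that directly follow  keyword (  */
size_t
count_marked_strings (const char *source, const char *keyword)
{
  size_t count = 0;
  int state = 0;   /* 0: idle, 1: saw keyword, 2: saw keyword and '(' */

  for (;;)
    {
      struct token tok = next_token (&source);
      if (tok.type == TOK_EOF)
        break;
      if (state == 2 && tok.type == TOK_STRING)
        count++;
      if (tok.type == TOK_IDENT && tok.length == strlen (keyword)
          && memcmp (tok.start, keyword, tok.length) == 0)
        state = 1;
      else if (state == 1 && tok.type == TOK_OTHER && *tok.start == '(')
        state = 2;
      else
        state = 0;
    }
  return count;
}
```

With this split, a call such as `count_marked_strings (src, "_")` only counts the marked strings, yet `next_token` has still scanned every literal in `src`. That is why a problem in the string-reading code can show up even for a string that is never extracted.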
Bruno
2008-08-03  Bruno Haible  <address@hidden>

	* x-python.c (mixed_string_buffer_append): Replace a lone high
	surrogate with U+FFFD.
	Reported by Yann <address@hidden>
	via Santiago Vila <address@hidden>.

*** x-python.c 20 Apr 2008 05:23:52 -0000 1.32
--- x-python.c 3 Aug 2008 19:56:58 -0000
***************
*** 930,935 ****
--- 930,940 ----
        if (c >= UNICODE (0xd800) && c < UNICODE (0xdc00))
          bp->utf16_surr = UNICODE_VALUE (c);
+       else if (c >= UNICODE (0xdc00) && c < UNICODE (0xe000))
+         {
+           /* A half surrogate is invalid, therefore use U+FFFD instead. */
+           mixed_string_buffer_append_unicode (bp, 0xfffd);
+         }
        else
          mixed_string_buffer_append_unicode (bp, UNICODE_VALUE (c));
      }
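
For readers without the x-python.c sources at hand, the idea behind the patch can be sketched as a small standalone decoder. This is an illustration under stated assumptions, not the code in xgettext: `struct utf16_decoder`, `utf16_decoder_feed` and `REPLACEMENT_CHAR` are hypothetical names. The treatment of an unfinished first half followed by an ordinary character is also an assumption, since the hunk above does not show that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical decoder state; not the buffer structure of x-python.c. */
struct utf16_decoder
{
  uint32_t pending;   /* 0, or a stored unit in 0xD800..0xDBFF */
};

/* Feeds one 16-bit unit to the decoder.  Stores zero, one or two
   code points in out[] and returns how many were stored.  */
size_t
utf16_decoder_feed (struct utf16_decoder *d, uint16_t unit, uint32_t out[2])
{
  size_t n = 0;

  assert (d != NULL && out != NULL);

  if (unit >= 0xDC00 && unit < 0xE000)
    {
      if (d->pending != 0)
        {
          /* Second half completes the stored first half.  */
          out[n++] = 0x10000 + ((d->pending - 0xD800) << 10)
                     + (uint32_t) (unit - 0xDC00);
          d->pending = 0;
        }
      else
        /* A second half with nothing before it: the case from the
           bug report (u'\udfff').  Emit U+FFFD instead of aborting.  */
        out[n++] = REPLACEMENT_CHAR;
      return n;
    }

  /* Assumption of this sketch: any other unit ends an unfinished
     first half, which is then replaced by U+FFFD as well.  */
  if (d->pending != 0)
    {
      out[n++] = REPLACEMENT_CHAR;
      d->pending = 0;
    }

  if (unit >= 0xD800 && unit < 0xDC00)
    d->pending = unit;        /* first half: wait for the second */
  else
    out[n++] = unit;          /* ordinary BMP character */
  return n;
}
```

Starting from a zero-initialized `struct utf16_decoder`, feeding the single unit 0xDFFF yields one code point, U+FFFD, whereas feeding 0xD83D followed by 0xDE00 yields the combined code point U+1F600.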