Re: Bug#493218: gettext: crash with some unicode chars (fwd)

bug-gnu-utils

From:	Bruno Haible
Subject:	Re: Bug#493218: gettext: crash with some unicode chars (fwd)
Date:	Sun, 3 Aug 2008 22:04:24 +0200
User-agent:	KMail/1.5.4

Hi,

Yann <address@hidden> wrote:
> To reproduce, create a file test.py containing only u'\udfff' and
> run xgettext test.py.
> We get an "Aborted" message.

Find attached the fix that I just committed. Thanks for the report.

> This string isn't translatable, so why does xgettext parse it? And why
> does it fail?

xgettext's logic would be more complex if it parsed strings only when deemed
"necessary". It's simpler to parse all identifiers and strings into a stream
of tokens first, and to decide afterwards which of them are translatable.

Bruno


2008-08-03  Bruno Haible  <address@hidden>

        * x-python.c (mixed_string_buffer_append): Replace a lone low
        surrogate (U+DC00..U+DFFF) with U+FFFD.
        Reported by Yann <address@hidden>
        via Santiago Vila <address@hidden>.

*** x-python.c  20 Apr 2008 05:23:52 -0000      1.32
--- x-python.c  3 Aug 2008 19:56:58 -0000
***************
*** 930,935 ****
--- 930,940 ----
  
          if (c >= UNICODE (0xd800) && c < UNICODE (0xdc00))
            bp->utf16_surr = UNICODE_VALUE (c);
+         else if (c >= UNICODE (0xdc00) && c < UNICODE (0xe000))
+           {
+             /* A half surrogate is invalid, therefore use U+FFFD instead.  */
+             mixed_string_buffer_append_unicode (bp, 0xfffd);
+           }
          else
            mixed_string_buffer_append_unicode (bp, UNICODE_VALUE (c));
        }
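
For readers without the x-python.c sources at hand, here is a minimal
standalone sketch of the same idea: a high surrogate is remembered until its
partner arrives, and a low surrogate with no pending high surrogate becomes
U+FFFD instead of reaching the UTF-16 decoder. The names surr_state, feed_unit
and decode_units are hypothetical and do not exist in gettext. Emitting U+FFFD
for a high surrogate that never gets a partner is an assumption of this sketch;
the patch above does not show that path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state: a pending high surrogate, or 0 if none.  */
struct surr_state
{
  uint16_t pending_high;
};

/* Feed one UTF-16 code unit.  Stores 0, 1 or 2 code points in OUT and
   returns how many were stored.  */
static size_t
feed_unit (struct surr_state *st, uint16_t unit, uint32_t *out)
{
  size_t n = 0;
  int is_high = (unit >= 0xD800 && unit < 0xDC00);
  int is_low = (unit >= 0xDC00 && unit < 0xE000);

  if (st->pending_high != 0 && is_low)
    {
      /* A well-formed pair: combine both halves into one code point.  */
      out[n++] = 0x10000
                 + ((uint32_t) (st->pending_high - 0xD800) << 10)
                 + (uint32_t) (unit - 0xDC00);
      st->pending_high = 0;
      return n;
    }

  if (st->pending_high != 0)
    {
      /* Assumption: an unpaired pending high surrogate becomes U+FFFD.  */
      out[n++] = 0xFFFD;
      st->pending_high = 0;
    }

  if (is_high)
    st->pending_high = unit;    /* wait for the low half */
  else if (is_low)
    out[n++] = 0xFFFD;          /* lone low surrogate, as in the patch */
  else
    out[n++] = unit;            /* ordinary BMP character */

  return n;
}

/* Decode N_UNITS code units into OUT, which must have room for N_UNITS
   code points.  Returns the number of code points stored.  */
static size_t
decode_units (const uint16_t *units, size_t n_units, uint32_t *out)
{
  struct surr_state st = { 0 };
  size_t n_out = 0;
  size_t i;

  for (i = 0; i < n_units; i++)
    n_out += feed_unit (&st, units[i], out + n_out);

  /* Assumption: a trailing unpaired high surrogate also becomes U+FFFD.  */
  if (st.pending_high != 0)
    out[n_out++] = 0xFFFD;

  return n_out;
}
```

With this sketch, the single code unit 0xDFFF from the bug report decodes to
U+FFFD, while the pair 0xD83D 0xDE00 still combines into U+1F600.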
