bug-gnu-emacs

bug#16457: 24.3.50; crash rendering Arabic Uthmani script


From: Eli Zaretskii
Subject: bug#16457: 24.3.50; crash rendering Arabic Uthmani script
Date: Thu, 16 Jan 2014 19:33:22 +0200

> Date: Thu, 16 Jan 2014 12:01:04 +0400
> From: Dmitry Antipov <address@hidden>
> CC: address@hidden
> 
> I'm not familiar with composition sequences in detail

The composition code is under-documented.  I provide what I know
about it below.

> For the uthmani-test.txt, the following code in set_iterator_to_next:
> 
>    /* Composition created while scanning forward.  */
>    /* Update IT's char/byte positions to point to the first
>       character of the next grapheme cluster, or to the
>       character visually after the current composition.  */
>    for (i = 0; i < it->cmp_it.nchars; i++)
>      bidi_move_to_visually_next (&it->bidi_it);
>    IT_BYTEPOS (*it) = it->bidi_it.bytepos;
>    IT_CHARPOS (*it) = it->bidi_it.charpos;
> 
> advances IT from charpos:bytepos 11:21 to 13:25.  But the following fragment
> from scan_for_column:
> 
>    /* Check composition sequence.  */
>    if (cmp_it.id >= 0
>        || (scan == cmp_it.stop_pos
>            && composition_reseat_it (&cmp_it, scan, scan_byte, end,
>                                      w, NULL, Qnil)))
>      composition_update_it (&cmp_it, scan, scan_byte, Qnil);
>    if (cmp_it.id >= 0)
>      {
>        scan += cmp_it.nchars;
>        scan_byte += cmp_it.nbytes;
> 
> advances SCAN:SCAN_BYTE from 11:21 to 13:24.  So the byte position becomes
> invalid and FETCH_CHAR_ADVANCE decodes invalid byte sequence to invalid
> character C.  Finally, CHAR_TABLE_REF (Vcomposition_function_table, C) goes
> out of bounds.

In effect, you are saying that cmp_it.nbytes above is incorrect.
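
To make the reported failure mode concrete, here is a minimal
standalone sketch (not Emacs code) of why a byte position that is one
byte short is invalid.  It assumes a UTF-8-style encoding, which the
Emacs internal representation follows for Unicode characters.
at_char_boundary is a hypothetical helper, char_head_p only mirrors
the idea behind the CHAR_HEAD_P macro, and the Arabic code points
mentioned afterwards are arbitrary examples, not necessarily the
characters in uthmani-test.txt.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if BYTE can start a character in a UTF-8-style encoding,
   i.e. it is not a 10xxxxxx continuation byte.  Illustration only;
   modeled on the idea of CHAR_HEAD_P.  */
static bool
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical helper: true if byte offset POS in TEXT (LEN bytes
   long) is a valid place to start decoding a character.  The end of
   the text counts as a boundary.  */
static bool
at_char_boundary (const unsigned char *text, size_t len, size_t pos)
{
  return pos == len || (pos < len && char_head_p (text[pos]));
}
```

For the four bytes D8 A8 D9 90 (U+0628 followed by U+0650), offsets 0,
2 and 4 are character boundaries, while 1 and 3 are not.  Under that
assumption, advancing by 3 bytes instead of 4 (21 to 24 rather than 21
to 25) lands on a continuation byte.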

This is really strange.  First, I cannot reproduce the crash on
MS-Windows, so the problem might be related to the shaping engine
being used (I presume yours is libotf and libm17n).  (I tried on both
Windows XP and on Windows 7, which have very different versions of
Uniscribe, and they both work fine.)

Moreover, set_iterator_to_next uses the same code from composite.c
that scan_for_column does, so it is unclear to me how the former
works, while the latter doesn't.

Specifically, cmp_it.nbytes is computed in composition_update_it as
the sum of byte-widths of all the characters being composed:

      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
        {
          c = XINT (LGSTRING_CHAR (gstring, cmp_it->from + i));
          cmp_it->nbytes += CHAR_BYTES (c);
          cmp_it->width += CHAR_WIDTH (c);
        }

And the characters in the LGSTRING object are simply copied from the
buffer in fill_gstring_header, when LGSTRING is created:

  for (i = 0; i < len; i++)
    {
      int c;

      if (NILP (string))
        FETCH_CHAR_ADVANCE_NO_CHECK (c, from, from_byte);
      else
        FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, string, from, from_byte);
      ASET (header, i + 1, make_number (c));
    }

Could you please trace through these fragments and see what goes wrong
there?  Specifically, what characters (which Unicode codepoints) are
being composed, and what are the contents of the cmp_it structure in
scan_for_column when it advances from 11:21 to 13:24.  (Granted, here
I see it advance from 11:21 to 13:25, as expected.)

Also, what does "C-u C-x =" report when you put the cursor in column
10?
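
As a reference point for such a trace, the following standalone sketch
computes the byte length of a group of code points the same way the
loop in composition_update_it sums CHAR_BYTES.  It is not Emacs code:
utf8_char_bytes is a simplified stand-in for CHAR_BYTES that handles
only Unicode code points, and cluster_nbytes is a hypothetical name.

```c
#include <stddef.h>

/* Simplified stand-in for CHAR_BYTES: number of bytes the Unicode
   code point C occupies in UTF-8.  */
static int
utf8_char_bytes (unsigned int c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum the byte widths of the NCHARS code points of CHARS that start
   at index FROM.  The sum starts from zero on every call.  */
static int
cluster_nbytes (const unsigned int *chars, size_t from, size_t nchars)
{
  int nbytes = 0;

  for (size_t i = 0; i < nchars; i++)
    nbytes += utf8_char_bytes (chars[from + i]);
  return nbytes;
}
```

Any two code points in the range U+0080..U+07FF, which includes the
Arabic block, sum to 4 bytes, matching an advance from byte position
21 to 25.  Comparing the traced code points against such a sum should
show which character contributes the wrong width.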

Some more details:

The LGSTRING object is created when Emacs encounters for the first
time a group of characters that should be composed together.  The
structure of LGSTRING is described in the comments to
composition-get-gstring.  Emacs recognizes the character compositions
in composition_reseat_it, which calls autocmp_chars, which calls
composition-get-gstring, which collects the characters to be composed
by calling fill_gstring_header, as shown in the fragment above.

The LGSTRING object is then cached, such that later references to it
use the cached data, instead of computing it from scratch.  The cmp_it
structure holds an ID of the LGSTRING which can be used to look it up
in the cache.  When composition_update_it is called, it simply uses
the information already stored in LGSTRING to advance past the
composed characters.
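
The compute-once, look-up-by-ID pattern described above can be
sketched in standalone C as follows.  This is only an illustration of
the idea, not how composite.c implements its cache: struct cluster,
cluster_intern, cluster_from_id and the fixed-size array are all
hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { CACHE_SIZE = 16, MAX_CHARS = 8 };

/* Hypothetical stand-in for a cached LGSTRING: just the code points
   of one composed group.  */
struct cluster
{
  unsigned int chars[MAX_CHARS];
  size_t nchars;
};

static struct cluster cache[CACHE_SIZE];
static int cache_used;

/* Return the ID of the cached entry for CHARS, creating the entry on
   first sight.  Return -1 if the group is too long or the cache is
   full.  */
static int
cluster_intern (const unsigned int *chars, size_t nchars)
{
  if (nchars > MAX_CHARS)
    return -1;
  for (int id = 0; id < cache_used; id++)
    if (cache[id].nchars == nchars
        && memcmp (cache[id].chars, chars, nchars * sizeof *chars) == 0)
      return id;
  if (cache_used == CACHE_SIZE)
    return -1;
  memcpy (cache[cache_used].chars, chars, nchars * sizeof *chars);
  cache[cache_used].nchars = nchars;
  return cache_used++;
}

/* Look up a cached entry by ID; NULL if ID is not in use.  */
static const struct cluster *
cluster_from_id (int id)
{
  return (id >= 0 && id < cache_used) ? &cache[id] : NULL;
}
```

The point of the sketch is that later lookups by ID return whatever
was stored the first time, so stale or wrong data in the entry would
be reused on every subsequent scan.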

So to understand why it crashes for you, we need to find out why the
nbytes value computed from the characters stored by
fill_gstring_header somehow became incorrect.

Btw, does the problem go away if you disable cache-long-scans?




