bug-gnu-emacs

bug#15984: 24.3; Problem with combining characters in attachment filenam


From: Eli Zaretskii
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 15:20:13 +0200

> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 15984@debbugs.gnu.org
> 
> > From: nisse@lysator.liu.se (Niels Möller)
> > Cc: 15984@debbugs.gnu.org
> > Date: Fri, 29 Nov 2013 13:41:01 +0100
> > 
> >   $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
> > 
> > where bug.el contains
> > 
> >   (setq gnus-init-file nil)
> >   (setq gnus-nntp-server nil)
> >   (gnus-no-server)
> > 
> > Then create the group with G d, pointing out the spool-like directory,
> > enter the group (RET), view the message (RET), try to write out the
> > attachment ("o" on the attachment button). Still crashes for me.
> 
> It crashes in the current development trunk as well, but only if the
> locale is set to Latin-1, like yours.
> 
> I'm looking at this.

There's something strange going on here; I'm CC'ing Handa-san, because
the problem is related to processing character compositions on a TTY.

The reason for the crash is simple: the following code from
indent.c:scan_for_column

      /* Check composition sequence.  */
      if (cmp_it.id >= 0
          || (scan == cmp_it.stop_pos
              && composition_reseat_it (&cmp_it, scan, scan_byte, end,
                                        w, NULL, Qnil)))
        composition_update_it (&cmp_it, scan, scan_byte, Qnil);
      if (cmp_it.id >= 0)
        {
          scan += cmp_it.nchars;
          scan_byte += cmp_it.nbytes;
          if (scan <= end)
            col += cmp_it.width;
          if (cmp_it.to == cmp_it.nglyphs)
            {
              cmp_it.id = -1;
              composition_compute_stop_pos (&cmp_it, scan, scan_byte, end,
                                            Qnil);
            }
          else
            cmp_it.from = cmp_it.to;
          continue;
        }

incorrectly steps into the middle of the multibyte sequence #xCC #x88
that encodes the character U+0308, COMBINING DIAERESIS, because
cmp_it.nbytes is computed as 1 instead of 2.  The question is why it
does so.
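
To make the byte arithmetic concrete, here is a minimal standalone
sketch, assuming plain UTF-8 (which matches Emacs's internal encoding
for this character).  It is not Emacs code: utf8_encode and
utf8_is_continuation are hypothetical helpers written only for this
illustration.

```c
#include <stdint.h>

/* Encode code point C as UTF-8 into OUT (at least 4 bytes).
   Returns the number of bytes written.  Hypothetical helper,
   not part of Emacs.  */
static int
utf8_encode (uint32_t c, unsigned char *out)
{
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Nonzero if B is a continuation byte (10xxxxxx), i.e. a position
   from which a character cannot be decoded.  */
static int
utf8_is_continuation (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}
```

Encoding U+0308 with this helper yields the two bytes #xCC #x88.  A
scan that advances 1 byte from the #xCC lead byte lands on #x88, which
is a continuation byte and not a character boundary.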

From stepping through composition_reseat_it and composition_update_it,
it looks like the code contradicts itself: it thinks that 'a' and the
combining diaeresis should be composed, but then acts as if no
composition should happen.  As a result, this code in
composition_update_it:

      glyph = LGSTRING_GLYPH (gstring, cmp_it->from);
      cmp_it->nchars = LGLYPH_TO (glyph) + 1 - from;
      cmp_it->nbytes = 0;
      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
        {
          c = XINT (LGSTRING_CHAR (gstring, i));
          cmp_it->nbytes += CHAR_BYTES (c);
          cmp_it->width += CHAR_WIDTH (c);
        }

always considers only 'a', never the diaeresis, and so cmp_it->nbytes
is always computed as 1.  So scan_for_column advances only 1 byte,
instead of 2, and finds itself in the middle of a multibyte sequence.
From there, it's a sure way to a crash.
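
The effect of summing byte lengths over the wrong character can be
modelled with a small standalone sketch.  This is an illustration
only, not a diagnosis of the Emacs code: char_bytes stands in for
CHAR_BYTES under the assumption of plain UTF-8, and cluster_bytes and
the cluster array are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte length of code point C in UTF-8.  Stand-in for CHAR_BYTES,
   assuming plain UTF-8.  */
static int
char_bytes (uint32_t c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum the byte lengths of NCHARS code points of CHARS, starting at
   index START.  Hypothetical helper for illustration.  */
static int
cluster_bytes (const uint32_t *chars, size_t start, size_t nchars)
{
  int nbytes = 0;
  for (size_t i = 0; i < nchars; i++)
    nbytes += char_bytes (chars[start + i]);
  return nbytes;
}

/* 'a' followed by U+0308 COMBINING DIAERESIS.  */
static const uint32_t cluster[] = { 'a', 0x0308 };
```

In this model, cluster_bytes (cluster, 0, 1) looks only at 'a' and
gives 1, while cluster_bytes (cluster, 1, 1) looks at the diaeresis
and gives 2.  Using the first value to step over the second character
is the kind of mismatch that leaves a scan inside a multibyte
sequence.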

I hope Handa-san will be able to find the problem.  The crash is 100%
reproducible with the steps described above and a mail message that
Niels can send you off-list.

TIA




