emacs-devel
Re: Unibyte strings in Lisp data structures


From: Eli Zaretskii
Subject: Re: Unibyte strings in Lisp data structures
Date: Tue, 13 Jul 2010 19:13:59 +0300

> From: Andreas Schwab <address@hidden>
> Cc: Kenichi Handa <address@hidden>,  address@hidden
> Date: Tue, 13 Jul 2010 17:05:30 +0200
> 
> Eli Zaretskii <address@hidden> writes:
> 
> > What code decides that they should be unibyte, when Emacs reads
> > jka-cmpr-hook.el?
> 
> Strings are read as unibyte by default unless they contain non-ascii,
> non-8-bit characters. (See (elisp) Converting Representations::).

Thanks, but I'm not sure this is relevant.  The section you pointed to
deals with conversions and insertions, not with how strings are read
by the Lisp reader.

Note that in jka-cmpr-hook.el, these magic signatures are specified as
octal escapes:

    ["\\.g?z\\(~\\|\\.~[0-9]+~\\)?\\'"
     "compressing"        "gzip"         ("-c" "-q")
     "uncompressing"      "gzip"         ("-c" "-q" "-d")
     t t "\037\213"]

I think the relevant code is this fragment from lread.c:read_escape:

    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
      /* An octal escape, as in ANSI C.  */
      {
        register int i = c - '0';
        register int count = 0;
        while (++count < 3)
          {
            if ((c = READCHAR) >= '0' && c <= '7')
              {
                i *= 8;
                i += c - '0';
              }
            else
              {
                UNREAD (c);
                break;
              }
          }

        if (i >= 0x80 && i < 0x100)
          i = BYTE8_TO_CHAR (i);
        return i;
      }

The BYTE8_TO_CHAR macro returns the multibyte representation of an
eight-bit byte.  Then, in read1, we do:

                if (CHAR_BYTE8_P (c))
                  force_singlebyte = 1;
                ...
        else if (force_singlebyte)
          {
            nchars = str_as_unibyte (read_buffer, p - read_buffer);
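
To make the interaction of the two fragments concrete, here is a minimal
standalone sketch of the same decision, not the actual lread.c code.
The numeric values behind BYTE8_TO_CHAR and CHAR_BYTE8_P are my
assumption of what character.h defines and should be checked against
the source; read_octal_escape and literal_forces_unibyte are
hypothetical helper names that work on a plain C string holding the
text between the double quotes.

```c
/* Assumed values, modeled on Emacs's character.h; verify them against
   the real header before relying on them.  */
#define MAX_5_BYTE_CHAR 0x3FFF7F
#define BYTE8_TO_CHAR(byte) ((byte) + 0x3FFF00)
#define CHAR_BYTE8_P(c) ((c) > MAX_5_BYTE_CHAR)

/* Hypothetical helper: parse up to three octal digits starting at *pp
   (the text just after the backslash) and advance *pp past them.
   Values 0x80..0xFF are mapped to eight-bit characters, as in the
   read_escape fragment quoted above.  */
static int
read_octal_escape (const char **pp)
{
  const char *p = *pp;
  int i = 0;
  int count = 0;

  while (count < 3 && *p >= '0' && *p <= '7')
    {
      i = i * 8 + (*p - '0');
      p++;
      count++;
    }
  *pp = p;

  if (i >= 0x80 && i < 0x100)
    i = BYTE8_TO_CHAR (i);
  return i;
}

/* Hypothetical helper: return 1 if the literal BODY contains an octal
   escape that yields an eight-bit character, which is the condition
   under which read1 sets force_singlebyte.  Only octal escapes are
   modeled; every other character is skipped.  */
static int
literal_forces_unibyte (const char *body)
{
  int force_singlebyte = 0;
  const char *p = body;

  while (*p)
    {
      if (*p == '\\' && p[1] >= '0' && p[1] <= '7')
        {
          int c;
          p++;                  /* skip the backslash */
          c = read_octal_escape (&p);
          if (CHAR_BYTE8_P (c))
            force_singlebyte = 1;
        }
      else
        p++;
    }
  return force_singlebyte;
}
```

Under these assumptions, literal_forces_unibyte ("\\037\\213") returns
1, because \213 is 0x8B and falls in the 0x80..0xFF range, while
literal_forces_unibyte ("\\037abc") returns 0.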

The question now is: will this rule remain stable long enough to rely
on it?  Or is it safer to convert both strings to the same
representation before comparing them?
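
For the second option, here is a rough sketch of what "the same
representation" could mean, using a toy model rather than real
Lisp_Object strings.  struct toy_string, toy_char_at and
toy_strings_equal are hypothetical names, and the BYTE8_TO_CHAR value
is again my assumption about character.h.  The idea is to widen
unibyte bytes >= 0x80 to eight-bit characters before comparing, so
that the comparison does not depend on which representation the
reader happened to choose.

```c
#include <stddef.h>

/* Assumed value, modeled on Emacs's character.h.  */
#define BYTE8_TO_CHAR(byte) ((byte) + 0x3FFF00)

/* Hypothetical toy model of a string: an array of codes.  In a
   unibyte string each element is a raw byte (0..255); in a multibyte
   string each element is a character code, with raw bytes >= 0x80
   stored as eight-bit characters.  */
struct toy_string
{
  int multibyte;
  size_t len;
  const int *chars;
};

/* Return element I of S as a character code in the multibyte sense.  */
static int
toy_char_at (const struct toy_string *s, size_t i)
{
  int c = s->chars[i];

  if (!s->multibyte && c >= 0x80)
    c = BYTE8_TO_CHAR (c);
  return c;
}

/* Compare A and B after normalizing both to character codes.  */
static int
toy_strings_equal (const struct toy_string *a, const struct toy_string *b)
{
  size_t i;

  if (a->len != b->len)
    return 0;
  for (i = 0; i < a->len; i++)
    if (toy_char_at (a, i) != toy_char_at (b, i))
      return 0;
  return 1;
}
```

With this model, a unibyte string holding the bytes 0x1F 0x8B compares
equal to a multibyte string holding 0x1F followed by the eight-bit
character for 0x8B, while a unibyte byte 0xE9 does not compare equal
to the multibyte character U+00E9.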


