From: Eli Zaretskii
Subject: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 23:13:40 +0200
> From: "Drew Adams" <drew.adams@oracle.com>
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054@debbugs.gnu.org
>
> I think I understand this (but I might be misunderstanding). The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.
Yes.
> That is, the literal string in my code is read as a string that contains only a
> single raw byte of octal 240 in place of the 4 chars \240 (and instead of as a
> string with the multibyte char no-break space). Is that right?
Yes.
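A minimal illustration of the difference, assuming Emacs 23 or later (the forms can be evaluated one by one with M-: or in *scratch*): the octal escape yields a unibyte string holding the byte 160, while the \u escape yields a multibyte string holding the character U+00A0 NO-BREAK SPACE.

```elisp
;; Sketch; assumes Emacs 23 or later.
;; An octal escape such as \240 makes the string constant unibyte:
;; it holds the raw byte 160, not a character.
(multibyte-string-p "\240")   ; => nil
(aref "\240" 0)               ; => 160, a byte value
;; A \u escape forces a multibyte string holding U+00A0 NO-BREAK SPACE.
(multibyte-string-p "\u00A0") ; => t
(aref "\u00A0" 0)             ; => 160, the character NBSP
;; Same number, different representations: the strings are not equal.
(equal "\240" "\u00A0")       ; => nil
```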
> And putting that together with Eli's statement about insertion ("'insert' treats
> strings such as "\nnn" as unibyte strings"), I understand that the buffer text
> after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
> no-break space.
No. It contains the NBSP. Try it. C-q inserts a multibyte
character, unlike '(insert "\240")', for example.
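A sketch of the same contrast in buffer text, assuming Emacs 23 or later and a multibyte buffer (the default for `with-temp-buffer`): inserting the unibyte string "\240" leaves a raw-byte character in the buffer, while inserting the character code 160 itself gives NBSP. The second form only inserts the character 160 directly; it is not a reimplementation of C-q.

```elisp
;; Sketch; assumes Emacs 23 or later and a multibyte buffer.
(with-temp-buffer
  (insert "\240")             ; unibyte string: byte 160 becomes a raw-byte char
  (char-after (point-min)))   ; => 4194208 (#x3FFFA0), an eight-bit raw byte
(with-temp-buffer
  (insert #xA0)               ; insert the character with code 160
  (char-after (point-min)))   ; => 160, U+00A0 NO-BREAK SPACE
```

Raw bytes in multibyte text are represented by the character codes #x3FFF80 through #x3FFFFF, which is why the first result is not 160.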
> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.
Because it is.
> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.
Chong didn't use "\240".
> So I'm confused about what is actually in the buffer. From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position. But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.
Try '(insert "\240")' and then "C-x =" will show a raw byte.
> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
>
> I can see how that can be useful. But I can also see how it would be useful to
> have some way of using octal syntax to match multibyte chars. Isn't there some
> reasonable way to allow for both?
Maybe, but we didn't find one, at least not one that would be
backward-compatible.
> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?
Such functions do exist, see the "Converting Representations" node in
the ELisp manual.
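A short sketch, assuming Emacs 23 or later, of why the choice of function matters here: `string-to-multibyte` (one of the functions in that node) only changes the representation, so byte 160 becomes a raw-byte character, whereas decoding with a coding system (`decode-coding-string`, documented under explicit encoding and decoding) interprets the byte(s) and yields NBSP.

```elisp
;; Sketch; assumes Emacs 23 or later.
;; string-to-multibyte keeps byte 160 as the raw-byte char #x3FFFA0.
(aref (string-to-multibyte "\240") 0)              ; => 4194208
;; Decoding interprets the bytes: in Latin-1, #xA0 is NO-BREAK SPACE...
(aref (decode-coding-string "\240" 'latin-1) 0)    ; => 160
;; ...and in UTF-8, NBSP is the two-byte sequence \302\240.
(aref (decode-coding-string "\302\240" 'utf-8) 0)  ; => 160
```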
> (decode-coding-string "\302\240" 'utf-8)
>
> That allows use of only octal syntax - good. But it still doesn't solve the
> problem for older Emacs versions - they raise the error
> (coding-system-error utf-8).
You don't want this, because even if you succeed in producing a NBSP
in Emacs 22 and older, the result will not match NBSP in other
charsets. It's simply impossible with those versions of Emacs.