Re: 26.1.92, 26.1-mac-7.4; unrecognised escaped chars in *Help*

From: Eli Zaretskii
Subject: Re: 26.1.92, 26.1-mac-7.4; unrecognised escaped chars in *Help*
Date: Tue, 05 Mar 2019 18:07:09 +0200

> From: Van L <address@hidden>
> Date: Mon, 04 Mar 2019 12:46:02 +1100
> From the *scratch* buffer, I lookup the keybinding possibilities by
>   C-h b
> Under the Global Bindings section, the two lines under SPC look to be
> encoded in Latin-1. I guess Emacs assumes UTF-8.

No, this has nothing to do with encoding.  This text is produced by
Emacs itself (unlike the previous problem with EWW, where the text
came from an external source), so decoding text is not necessary,
because text generated by Emacs itself and inserted into its buffers
is always in the correct "encoding" (we prefer to call that
"representation", to distinguish between the internal representation
of characters in Emacs buffers and strings, and encoded text outside
Emacs).

> The problem is I see \200 \377 and a two row box having inside of it
> 3FF F7F as follows
> -- quote - unknown encoding characters replaced with lookalike sequence
> SPC .. ~      self-insert-command
> \200 .. 3FF_F7F       self-insert-command
> \200 .. \377  self-insert-command

Yes.  This is admittedly confusing, although 100% correct.  To start
digging into what happens here, go to each of the 2 \200's and type
"C-u C-x =".  You will see that these two look identical on display,
but are actually two very different beasts: the former is a Unicode
character whose codepoint happens to be 200 octal (0x80 in hex), the
latter is a raw byte of the same value.  Emacs distinguishes between
them.  The confusing bit here is that they are by default both
displayed identically, for dull historical reasons (once upon a time,
Emacs didn't distinguish between them).  (Perhaps there's no longer a
reason to use this confusing display nowadays.)
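
The difference between the two \200's can also be checked from Lisp,
e.g. with M-: or in *scratch*.  This is only a sketch; it assumes
Emacs 23 or later, where raw bytes belong to the `eight-bit' charset:

```elisp
;; The Unicode character U+0080 is just the integer #x80 (200 octal).
(format "%o" #x80)                               ; "200"

;; The raw byte 0x80 is represented internally by a different,
;; much larger character code.
(unibyte-char-to-multibyte #x80)                 ; 4194176, i.e. #x3fff80

;; Raw-byte characters belong to the `eight-bit' charset.
(char-charset (unibyte-char-to-multibyte #x80))  ; eight-bit

;; Converting back recovers the byte value.
(multibyte-char-to-unibyte #x3fff80)             ; 128, i.e. #x80
```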

So the first of the above 2 lines stands for all the non-ASCII Unicode
characters, all of which are bound to self-insert-command by default.
The funny display of both ends of that character code range is because
none of the shown codes corresponds to a printable character.  In
particular, the \200 codepoint is a C1 control character, i.e. there's
no printable character whose Unicode codepoint is 0x80.

By contrast, the second row shows all the raw bytes, which are also
bound to self-insert-command by default.
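
The odd-looking 3FFF7F at the end of the first row follows from where
the raw bytes sit in the character-code space.  A small sketch (an
illustration only, to be evaluated with M-: or in *scratch*):

```elisp
;; Raw bytes 0x80..0xff occupy the very top of the character codes.
(format "%X" (unibyte-char-to-multibyte #x80))       ; "3FFF80"
(format "%X" (max-char))                             ; "3FFFFF"
(= (unibyte-char-to-multibyte #xff) (max-char))      ; t

;; So the last code that is not a raw byte is one below the first
;; raw byte; that is the upper end of the first row.
(format "%X" (1- (unibyte-char-to-multibyte #x80)))  ; "3FFF7F"
```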

IOW, unlike the case with EWW showing incorrectly decoded text, here
the issue is with how characters are _displayed_, not how they are
decoded.  To change how they look you need to fiddle with display
features, not with decoding features.

And now to your question:

> I know what to do for this kind of situation in EWW, type "E latin-1 RET".
> What goes here?


  M-x customize-variable RET glyphless-char-display-control RET

In the buffer this displays, check the box to the left of the
"c1-control" group.  This enables the button to the right of the
checkbox; click on it and select the method you want, e.g. "Display
acronym" or "Display hex code in a box".  Then click "Apply".  This
will change how all the characters in the range [0x80..0x9f] are
displayed.
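
For an init file, something along these lines should have the same
effect as the Customize steps.  This is a sketch, not tested against
every Emacs version; `hex-code' is just one of the available methods
(others include `acronym', `empty-box', `thin-space' and
`zero-width').  It uses `customize-set-variable' rather than `setq' so
that the variable's :set function runs and the change takes effect:

```elisp
(customize-set-variable
 'glyphless-char-display-control
 (cons '(c1-control . hex-code)
       ;; Keep the settings of the other groups, dropping any old
       ;; entry for c1-control.
       (assq-delete-all 'c1-control
                        (copy-alist glyphless-char-display-control))))
```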
