emacs-devel

Re: Fwd: Re: Inadequate documentation of silly characters on screen.


From: Aidan Kehoe
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 16:47:09 +0000

 On the nineteenth of November, Alan Mackenzie wrote: 

 > Hi, Stefan,
 > 
 > On Thu, Nov 19, 2009 at 10:30:18AM -0500, Stefan Monnier wrote:
 > > > The actual character in the string is ñ (#x3f).
 > 
 > > No: the string does not contain any characters, only bytes, because it's
 > > a unibyte string.
 > 
 > I'm thinking from the lisp viewpoint.  The string is a data structure
 > which contains characters.  I really don't want to have to think about
 > the difference between "chars" and "bytes" when I'm hacking lisp.  If I
 > do, then the abstraction "string" is broken.

For some context on this, that’s how it works in XEmacs. We’ve never had
problems with it, and we seem to avoid an entire class of programming errors
that GNU Emacs developers deal with on a regular basis.

Tangentially, for those who like the unibyte/multibyte distinction: to my
knowledge the editor has no way of representing “an octet with numeric value
< #x80 to be treated with byte semantics, not character semantics”, which
seems arbitrary to me. For example: 

;; Both of the byte sequences decoded below are invalid UTF-16 (a high
;; surrogate, #xD800, not followed by a low surrogate):
(split-char
 (car (append (decode-coding-string "\xd8\x00\x00\x7f" 'utf-16-be) nil)))
=> (ascii 127)

(split-char
 (car (append (decode-coding-string "\xd8\x00\x00\x80" 'utf-16-be) nil)))
=> (eight-bit-control 128)

-- 
“Apart from the nine-banded armadillo, man is the only natural host of
Mycobacterium leprae, although it can be grown in the footpads of mice.”
  -- Kumar & Clark, Clinical Medicine, summarising improbable leprosy research



