Re: eight-bit char handling in emacs-unicode

emacs-devel

From:	Stefan Monnier
Subject:	Re: eight-bit char handling in emacs-unicode
Date:	18 Nov 2003 12:12:10 -0500
User-agent:	Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

>>> The basic problem is that we don't distinguish a character
>>> (code) and a number.  So, we introduce a character object

>> That's one way to look at the problem.
>> Another is to say that the problem is instead that we do not distinguish
>> between arrays of chars and arrays of bytes.

> I agree that it's possible to grasp the problem in that way,
> but I'm not sure which is the better way.  Could you explain
> WHY yours is better?

I'm not sure whether it's better or worse.  My problem with introducing
a new type for chars is that it is a change with far-reaching
consequences, and I'm not sure it would solve all our problems, since
many of them come from bad elisp code.

>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>> `make-string-unibyte' and `make-string-multibyte'.

> I think you mean string-make-unibyte/multibyte, but, for the
> current problem, we can't use it because string-make-unibyte
> may behave differently in different language environments.
> A lang. env. that gives iso-8859-1 or Unicode the highest
> priority for the character `À' is fine.

> (string-make-unibyte (concat '(?a 192))) = "a\300"

> But if some lang. env. prefers a charset that encodes `À'
> to something other than 192 (e.g. Vietnamese VSCII), we fail.

No.  My `make-string-unibyte' should only work to convert "bytes in
multibyte string" to "bytes in unibyte string": there's no char, thus no
coding-system.  If the multibyte string argument contains a char that's
not an eight-bit-char, then it's an error.
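
A minimal sketch of the strict conversion described above, for illustration only.  It assumes a later Emacs that provides `string-to-unibyte' and `string-to-multibyte' (neither is named in this message), and `strict-make-string-unibyte' and the `strict-demo-*' variables are hypothetical names, not the patch mentioned in this thread.

```elisp
;; Sketch only: `strict-make-string-unibyte' is a hypothetical name.
;; Assumes an Emacs providing `string-to-unibyte' and `string-to-multibyte'.
(defun strict-make-string-unibyte (string)
  "Return STRING as a unibyte string, treating its contents as bytes.
Signal an error if STRING holds a non-ASCII char that is not an
eight-bit char.  No coding system is involved."
  (string-to-unibyte string))

;; Bytes carried in a multibyte string convert back unchanged.
(defvar strict-demo-bytes
  (strict-make-string-unibyte (string-to-multibyte "a\300")))

;; A real character such as À (code 192) is rejected, not encoded.
(defvar strict-demo-rejected
  (condition-case nil
      (progn (strict-make-string-unibyte (string ?a 192)) nil)
    (error t)))
```

Because the conversion never consults a charset priority, the result cannot vary with the language environment.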

To do what your string-make-unibyte does you should use
`encode-coding-string' where the coding system is passed explicitly.
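
As a small illustration of passing the coding system explicitly, the sketch below encodes the same two-character string twice.  The choice of `iso-8859-1' and `utf-8' is an assumption made for the example, and the `explicit-demo-*' variable names are hypothetical.

```elisp
;; Sketch only: the `explicit-demo-*' names are hypothetical.
;; The caller names the coding system, so the language environment
;; plays no part in the result.
(defvar explicit-demo-latin1
  (encode-coding-string (string ?a 192) 'iso-8859-1))

(defvar explicit-demo-utf8
  (encode-coding-string (string ?a 192) 'utf-8))
```

Here À comes out as the single byte \300 under `iso-8859-1' and as the two bytes \303\200 under `utf-8'.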

I've changed my Emacs so that string-make-unibyte does the above
(i.e. signals an error if it encounters a non-byte char) and it works fairly
well, except for the few places where the elisp code is sloppy and needs to
be fixed.

>> Note that 1-3 are not mutually exclusive so we can use
>> them all.

> Yes, but, at least, I really want to avoid "(3) Make a
> series of new functions".

(defun concat-unibyte (&rest x)
  (make-string-unibyte (apply 'concat x)))
...

so we don't need this series of new functions, but if some of them are used
often enough, we can add them of course.


        Stefan
