Re: eight-bit char handling in emacs-unicode

emacs-devel

From:	Kenichi Handa
Subject:	Re: eight-bit char handling in emacs-unicode
Date:	Wed, 19 Nov 2003 09:06:55 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvn0atd38w.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:
> I'm not sure whether it's better or worse.  The problem I have with the
> introduction of a new type for chars is that it is a change that has far
> reaching consequences and I'm not sure it would solve all our problems
> since many of the problems have to do with bad elisp code.

I see.  Apart from the design itself, I agree that it's
difficult to introduce a new type.  But when I discussed
the Character type object with Richard a few years ago,
he was not that negative, provided that it gives a clear
improvement.

>>>  Which of 1 to 3 is the best is not clear, and maybe we can just live with
>>>  `make-string-unibyte' and `make-string-multibyte'.

>>  I think you mean string-make-unibyte/multibyte, but, for the

> No.  My `make-string-unibyte' should only work to convert "bytes in
> multibyte string" to "bytes in unibyte string": there's no char, thus no
> coding-system.

I see.  In emacs-unicode, I already introduced
string-to-multibyte which, I think, is the same as your
make-string-multibyte.   But,

> If the multibyte string argument contains a char that's
> not an eight-bit-char, then it's an error.

Then, we can't use make-string-unibyte for the current case
because, in emacs-unicode, (concat '(?a 192)) returns a
multibyte string whose second element is A-grave, not an
eight-bit-char.  Am I missing something?

> To do what your string-make-unibyte does you should use
> `encode-coding-string' where the coding system is passed explicitly.

Those are conceptually different things (I remember we had
a similar discussion a while ago).

encode-coding-string does:
char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
  --CES--> encoded-byte-sequence

string-make-unibyte does:
char-sequence --CCS--> code-point-sequence
  --concat--> code-point-sequence

These two yield the same result only when the CCS supports
all chars in "char-sequence" and the CES is stateless
(e.g. iso-latin-1).
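
To make the distinction concrete, here is a minimal sketch that can be
evaluated in a recent Emacs.  It is an illustration only, not part of the
original message, and it assumes the behaviour of current releases rather
than the 2003 emacs-unicode branch.  The variable names `latin' and `kana'
are made up for the example.

```elisp
;; Illustrative sketch for a current Emacs; not from the original message.
(let* ((latin (concat '(?a 192)))   ; ?a followed by A-grave (U+00C0)
       (kana  (string #x3042)))     ; HIRAGANA LETTER A, outside Latin-1
  (list
   ;; Stateless CES whose CCS covers every char in the input.
   (encode-coding-string latin 'iso-latin-1)
   ;; Same chars, different CES: the bytes depend on the coding system.
   (encode-coding-string latin 'utf-8)
   ;; Stateful CES: the result also carries designation escape sequences.
   (encode-coding-string kana 'iso-2022-jp)
   ;; Code-point based conversion; no coding system is passed at all.
   ;; (`string-make-unibyte' is marked obsolete in recent releases.)
   (string-make-unibyte latin)))
```

Comparing the first and last elements of the result shows the case where
the two paths agree; the second and third elements show how the encoded
bytes change once the CES differs or becomes stateful.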

> I've changed my Emacs so that string-make-unibyte does the above
> (i.e. signals an error if it encounters a non-byte char) and it works fairly
> well, except for the few places where the elisp code is sloppy and needs to
> be fixed.

How did you change it?  string-make-unibyte internally uses
the function copy_text.  Did you change it?  But, then, each
time you copy a multibyte string into a unibyte buffer, you
should get an error.

>>>  Note that 1-3 are not mutually exclusive so we can use
>>>  them all.

>>  Yes, but, at least, I really want to avoid "(3) Make a
>>  series of new functions".

> (defun concat-unibyte (&rest x)
>   (make-string-unibyte (apply 'concat x)))
> ...

As I wrote above, this should signal an error on:
  (concat-unibyte '(?a 192))
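
The error case can be sketched in a current Emacs, under the assumption
that today's `string-to-unibyte' behaves like the `make-string-unibyte'
proposed in this thread (it refuses any non-ASCII char that is not an
eight-bit char).  The name `my-concat-unibyte' is hypothetical and is
used only for this illustration.

```elisp
;; Hypothetical stand-in: `string-to-unibyte' plays the role of the
;; proposed `make-string-unibyte'.  Not from the original message.
(defun my-concat-unibyte (&rest sequences)
  "Concatenate SEQUENCES and convert the result to a unibyte string."
  (string-to-unibyte (apply #'concat sequences)))

;; 192 becomes A-grave in the multibyte string, not an eight-bit char,
;; so the conversion is expected to signal an error, which is caught here.
(condition-case err
    (my-concat-unibyte '(?a 192))
  (error (format "signalled: %S" err)))
```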

> so we don't need this series of new functions, but if some of them are used
> often enough, we can add them of course.

---
Ken'ichi HANDA
address@hidden
