Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Sun, 23 Nov 2003 16:30:49 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <jwvoev4ufqd.fsf-monnier+emacs/address@hidden>, Stefan Monnier
<address@hidden> writes:
>>>> It is perfectly possible to live in an environment
>>>> where the only charset used is iso-8859-1 but the only
>>>> coding system used is utf-8. In this environment, the
>>>> results of encode-coding-string and string-make-unibyte are
>>>> of course not the same, but still both operations are
>>>> meaningful.
>>> I see that encode-coding-string does the utf-8 encoding, but what
>>> does string-make-unibyte do in such a case and what is it used for ?
>> It takes the iso-8859-1 code points of all characters in a
>> multibyte string and concatenates them (the same as what it
>> does in a latin-1 language environment).
> You mean it does the same as (encode-coding-string str 'latin-1) ?
Not exactly the same when STR contains, for instance,
Cyrillic characters. The two operations deal with
unsupported characters differently. Encode-coding-string may
behave leniently so that the result can be decoded back
correctly (perhaps by adding some escape sequence), but
string-make-unibyte should never change the number of
characters. And,
> Then why use string-make-unibyte ?
There's no way to know that we should use the coding-system
latin-1 in this situation. All we know is that the default
coding-system is utf-8, and the default character set is
iso-8859-1.
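To make the contrast concrete, here is a rough Python model of the three behaviours discussed above. It is not Emacs code: the placeholder byte used for characters outside iso-8859-1 is an assumption (Emacs's actual fallback may differ), and Python's "backslashreplace" error handler merely stands in for "some escape sequence" in a lenient encoder. All function names are hypothetical.

```python
# Hypothetical model of the conversions contrasted in this message; NOT Emacs code.
# Assumptions: '?' as the placeholder byte for characters outside iso-8859-1,
# and Python's "backslashreplace" as a stand-in for a lenient encoder's escapes.

def make_unibyte_latin1(s: str, placeholder: int = ord("?")) -> bytes:
    """Analogue of string-make-unibyte with iso-8859-1 as the default charset:
    exactly one byte per character, so the length never changes."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        # iso-8859-1 code points coincide with the first 256 Unicode code points.
        out.append(cp if cp < 256 else placeholder)
    return bytes(out)


def encode_utf8(s: str) -> bytes:
    """Analogue of (encode-coding-string s 'utf-8): a real coding system
    that may use several bytes per character."""
    return s.encode("utf-8")


def encode_latin1_lenient(s: str) -> bytes:
    """Analogue of a lenient latin-1 encoder: unsupported characters become an
    escape sequence, so the byte count can exceed the character count."""
    return s.encode("latin-1", errors="backslashreplace")


if __name__ == "__main__":
    s = "caf\u00e9 \u0416"  # Latin-1 e-acute plus Cyrillic Zhe
    for name, fn in [("make-unibyte", make_unibyte_latin1),
                     ("utf-8", encode_utf8),
                     ("latin-1 lenient", encode_latin1_lenient)]:
        b = fn(s)
        print(f"{name:16} {b!r} ({len(b)} bytes for {len(s)} chars)")
```

Only the first function keeps one byte per character for the Cyrillic input; the other two change the length, which is exactly the property string-make-unibyte is meant to avoid.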
>> Please try C-x C-m L utf-8 RET and see how
>> string-make-unibyte and string-make-multibyte work.
> I'll try that, but I'd like to understand the motivation for making it work
> the way it works. I've always understood those two as "trying to DTRT" in
> a very ad-hoc way such that people that used to work in an 8bit non-ASCII
> environment don't need to worry about coding-systems and still have
> things working mostly correctly.
Doing unibyte<->multibyte conversion automatically
may be ad-hoc. The way these functions handle unsupported
characters may also be ad-hoc.
But the concept of unibyte<->multibyte conversion itself is
not ad-hoc. Don't you think their meaning is very clear
when you understand them the way I do? Do you see any
inconsistency in my explanation of them?
---
Ken'ichi HANDA
address@hidden