Re: MML charset tag regression
From: Kenichi Handa
Subject: Re: MML charset tag regression
Date: Mon, 28 Apr 2003 20:58:34 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <address@hidden>, "James H. Cloos Jr." <address@hidden> writes:
>>>>>> "Simon" == Simon Josefsson <address@hidden> writes:
Simon> For me, when I yanked the string into emacs from galeon it
Simon> becomes double-width. It is single-width in galeon though.
> I also see that; any pasting of cyrillic text via pasting X's
> primary or from the clipboard. The wide cyrillic is from the
> japanese-jisx0208 charset.
[...]
In article <address@hidden>, Simon Josefsson <address@hidden> writes:
> That may be interesting by itself. Go to
> http://www.nns.ru/persons/gorbach.html using galeon (or mozilla, I
> think). Cut'n'paste the first word and yank it in Emacs. It looks as
> single-width in galeon, but when yanked into emacs it becomes double
> width. Yanking it into xterm or gnome-terminal doesn't change the
> string, it looks like single-width. Save the HTML file and open it in
> emacs as a koi8 file (note that emacs doesn't auto detect it as koi8
> so you have to do that manually), then it is single-width too.
> I guess it is the emacs X cut'n'paste code that somehow makes the
> string into double width japanese characters.
I don't think so. There's no code in Emacs that does
such a conversion.
I think galeon sends those Cyrillic characters to Emacs
encoded in COMPOUND_TEXT, using the JISX0208 charset.
Please try this:
First, select some Cyrillic text in galeon. Then type this
in Emacs: C-x RET X raw-text RET C-y. You'll see something
like this: "ESC $ ( B ...".
Next, try this:
First, select some Cyrillic text in galeon. Then evaluate
this in Emacs:
(decode-coding-string (x-get-selection 'PRIMARY 'UTF8_STRING) 'utf-8)
I think you'll see single-width Cyrillic characters (you need
an iso10646-1 font containing Cyrillic glyphs).
The selection problem is very deep. :-(
Ideally, the requester should be able to request the type
'TEXT instead of the specific 'COMPOUND_TEXT or
'UTF8_STRING, and the requestee should return the text in
the first of these types that can encode it:
STRING, COMPOUND_TEXT, or UTF8_STRING (in this priority
order).
But, unfortunately, many X clients (requestees) don't behave
like that. If 'TEXT is requested, many return just "?????"
even if the text can be correctly encoded in COMPOUND_TEXT
or UTF8_STRING.
So, it is necessary for Emacs to request the specific type
'COMPOUND_TEXT ('UTF8_STRING was only recently introduced in
XFree86, and there are many clients that still don't
support it).
Recently, many gtk clients have started supporting UTF8_STRING
without improving their COMPOUND_TEXT support. That may cause
no problem between gtk clients because they will request
only the type UTF8_STRING. But it's too shortsighted an
approach. :-(
The new encoding method that uses the "Non-Standard Character
Set Encodings" of COMPOUND_TEXT makes the Cyrillic case much
more complicated. In some cases (perhaps only in a KOI8
locale), X clients have recently started to encode Cyrillic
characters as "ESC % / 0 ...". They don't consider the
case where the requester is running in a different
locale. :-(
Perhaps we should make Emacs request UTF8_STRING first
if the locale is UTF-8, and request COMPOUND_TEXT if that
request fails.
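A rough sketch of that fallback order, only to make the idea
concrete. The names my-selection-text, locale-is-utf-8 and fetch
are hypothetical and not part of Emacs. It assumes, as in the
snippet above, that the UTF8_STRING data comes back undecoded;
other Emacs versions may already decode it.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-selection-text (locale-is-utf-8 &optional fetch)
  "Return the PRIMARY selection as text, or nil if every request fails.
If LOCALE-IS-UTF-8 is non-nil, try UTF8_STRING before COMPOUND_TEXT.
FETCH is called with a target type symbol; it defaults to a call to
`x-get-selection', and can be replaced by a stub for testing."
  (let ((fetch (or fetch
                   (lambda (type) (x-get-selection 'PRIMARY type))))
        (types (if locale-is-utf-8
                   '(UTF8_STRING COMPOUND_TEXT)
                 '(COMPOUND_TEXT)))
        result)
    (while (and types (not result))
      (let* ((type (car types))
             ;; Treat a signaled error the same as "no data".
             (data (condition-case nil
                       (funcall fetch type)
                     (error nil))))
        (when (and (stringp data) (> (length data) 0))
          (setq result
                (if (eq type 'UTF8_STRING)
                    ;; Assumption: UTF8_STRING data is not yet decoded.
                    (decode-coding-string data 'utf-8)
                  data))))
      (setq types (cdr types)))
    result))
```

With this, (my-selection-text t) would try UTF8_STRING and then
COMPOUND_TEXT, while (my-selection-text nil) would keep the
current behavior of requesting COMPOUND_TEXT only.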
---
Ken'ichi HANDA
address@hidden