Re: can't paste non-Latin-1 text to Emacs 21.2
From: Kenichi Handa
Subject: Re: can't paste non-Latin-1 text to Emacs 21.2
Date: Tue, 6 Apr 2004 21:56:19 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <address@hidden>, Dave Love <address@hidden> writes:
>> This behaviour is controlled by the
>> function ctext-non-standard-encodings-table.
> This doesn't seem to be in NEWS.
I don't mean that the behaviour is customizable. That
function is just a helper function for
ctext-pre-write-conversion; I mentioned it so that you can
see what's going on by reading the code.
> Shouldn't it be controlled by a variable (user option)?
That would be better, but for the moment I don't have time
to design it. If someone gives me a precise design, I'll
implement it.
> The correct thing to do with text you can't
> encode with standard ISO2022 charsets appears to be to use an extended
> segment labelled as, say, utf-8. That's correct and unambiguous as
> long as you use IANA names. I did once at least start to implement
> that, but I don't remember if I finished.
That's correct, but it seems that no other client can decode
it. Do you know of any program that implements it?
>> Another idea is to encode such characters to some of legacy
>> charsets that are listed as "Approved Standard Encoding".
> I don't think you should restrict it to the explicit list, if that's
> what you mean. It seems fairly clear that ISO standard charsets
> should get normal ISO2022 encoding in CTEXT. (I couldn't find a
> current address for Scheifler to check the unsupported assertion
> that's wrong.) I'm sure Emacs should try to translate characters from
> private charsets to standard ones for ctext unless it can tell that
> the selection is for another Emacs client.
I've just found this code in xc/lib/X11/lcCT.c in the
distributions from X.org and XFree86.
(1) X.org version
static CTDataRec default_ct_data[] =
{
{ "ISO8859-1:GL", "\033(B" },
{ "ISO8859-1:GR", "\033-A" },
{ "ISO8859-2:GR", "\033-B" },
{ "ISO8859-3:GR", "\033-C" },
{ "ISO8859-4:GR", "\033-D" },
{ "ISO8859-7:GR", "\033-F" },
{ "ISO8859-6:GR", "\033-G" },
{ "ISO8859-8:GR", "\033-H" },
{ "ISO8859-5:GR", "\033-L" },
{ "ISO8859-9:GR", "\033-M" },
{ "ISO8859-10:GR", "\033-V" },
{ "JISX0201.1976-0:GL", "\033(J" },
{ "JISX0201.1976-0:GR", "\033)I" },
{ "GB2312.1980-0:GL", "\033$(A" },
{ "GB2312.1980-0:GR", "\033$)A" },
{ "JISX0208.1983-0:GL", "\033$(B" },
{ "JISX0208.1983-0:GR", "\033$)B" },
{ "KSC5601.1987-0:GL", "\033$(C" },
{ "KSC5601.1987-0:GR", "\033$)C" },
#ifdef notdef
{ "JISX0212.1990-0:GL", "\033$(D" },
{ "JISX0212.1990-0:GR", "\033$)D" },
{ "CNS11643.1986-1:GL", "\033$(G" },
{ "CNS11643.1986-1:GR", "\033$)G" },
{ "CNS11643.1986-2:GL", "\033$(H" },
{ "CNS11643.1986-2:GR", "\033$)H" },
#endif
{ "TIS620.2533-1:GR", "\033-T"},
{ "ISO10646-1", "\033%B"},
/* Non-Standard Character Set Encodings */
{ "KOI8-R:GR", "\033%/1\200\210koi8-r\002"},
{ "FCD8859-15:GR", "\033%/1\200\213fcd8859-15\002"},
} ;
(2) XFree86 version
static CTDataRec default_ct_data[] =
{
/* */
/* X11 registry name MIME name ISO-IR ESC sequence */
/* */
/* Registered character sets with one byte per character */
{ "ISO8859-1:GL", /* US-ASCII 6 */ "\033(B" },
{ "ISO8859-1:GR", /* ISO-8859-1 100 */ "\033-A" },
{ "ISO8859-2:GR", /* ISO-8859-2 101 */ "\033-B" },
{ "ISO8859-3:GR", /* ISO-8859-3 109 */ "\033-C" },
{ "ISO8859-4:GR", /* ISO-8859-4 110 */ "\033-D" },
{ "ISO8859-5:GR", /* ISO-8859-5 144 */ "\033-L" },
{ "ISO8859-6:GR", /* ISO-8859-6 127 */ "\033-G" },
{ "ISO8859-7:GR", /* ISO-8859-7 126 */ "\033-F" },
{ "ISO8859-8:GR", /* ISO-8859-8 138 */ "\033-H" },
{ "ISO8859-9:GR", /* ISO-8859-9 148 */ "\033-M" },
{ "ISO8859-10:GR", /* ISO-8859-10 157 */ "\033-V" },
{ "ISO8859-13:GR", /* ISO-8859-13 179 */ "\033-Y" },
{ "ISO8859-14:GR", /* ISO-8859-14 199 */ "\033-_" },
{ "ISO8859-15:GR", /* ISO-8859-15 203 */ "\033-b" },
{ "ISO8859-16:GR", /* ISO-8859-16 226 */ "\033-f" },
{ "JISX0201.1976-0:GL", /* ISO-646-JP 14 */ "\033(J" },
{ "JISX0201.1976-0:GR", "\033)I" },
{ "TIS620-0:GR", /* TIS-620 166 */ "\033-T" },
/* Registered character sets with two byte per character */
{ "GB2312.1980-0:GL", /* GB_2312-80 58 */ "\033$(A" },
{ "GB2312.1980-0:GR", /* GB_2312-80 58 */ "\033$)A" },
{ "JISX0208.1983-0:GL", /* JIS_X0208-1983 87 */ "\033$(B" },
{ "JISX0208.1983-0:GR", /* JIS_X0208-1983 87 */ "\033$)B" },
{ "JISX0208.1990-0:GL", /* JIS_X0208-1990 168 */ "\033$(B" },
{ "JISX0208.1990-0:GR", /* JIS_X0208-1990 168 */ "\033$)B" },
{ "JISX0212.1990-0:GL", /* JIS_X0212-1990 159 */ "\033$(D" },
{ "JISX0212.1990-0:GR", /* JIS_X0212-1990 159 */ "\033$)D" },
{ "KSC5601.1987-0:GL", /* KS_C_5601-1987 149 */ "\033$(C" },
{ "KSC5601.1987-0:GR", /* KS_C_5601-1987 149 */ "\033$)C" },
{ "CNS11643.1986-1:GL", /* CNS 11643-1992 pl.1 171 */ "\033$(G" },
{ "CNS11643.1986-1:GR", /* CNS 11643-1992 pl.1 171 */ "\033$)G" },
{ "CNS11643.1986-2:GL", /* CNS 11643-1992 pl.2 172 */ "\033$(H" },
{ "CNS11643.1986-2:GR", /* CNS 11643-1992 pl.2 172 */ "\033$)H" },
{ "CNS11643.1992-3:GL", /* CNS 11643-1992 pl.3 183 */ "\033$(I" },
{ "CNS11643.1992-3:GR", /* CNS 11643-1992 pl.3 183 */ "\033$)I" },
{ "CNS11643.1992-4:GL", /* CNS 11643-1992 pl.4 184 */ "\033$(J" },
{ "CNS11643.1992-4:GR", /* CNS 11643-1992 pl.4 184 */ "\033$)J" },
{ "CNS11643.1992-5:GL", /* CNS 11643-1992 pl.5 185 */ "\033$(K" },
{ "CNS11643.1992-5:GR", /* CNS 11643-1992 pl.5 185 */ "\033$)K" },
{ "CNS11643.1992-6:GL", /* CNS 11643-1992 pl.6 186 */ "\033$(L" },
{ "CNS11643.1992-6:GR", /* CNS 11643-1992 pl.6 186 */ "\033$)L" },
{ "CNS11643.1992-7:GL", /* CNS 11643-1992 pl.7 187 */ "\033$(M" },
{ "CNS11643.1992-7:GR", /* CNS 11643-1992 pl.7 187 */ "\033$)M" },
/* Registered encodings with a varying number of bytes per character */
{ "ISO10646-1", /* UTF-8 196 */ "\033%G" },
/* Encodings without ISO-IR assigned escape sequence must be
defined in XLC_LOCALE files, using "\033%/1" or "\033%/2". */
/* Backward compatibility with XFree86 3.x */
{ "ISO8859-14:GR", "\033%/1" },
{ "ISO8859-15:GR", "\033%/1" },
/* used by Emacs, but not backed by ISO-IR */
{ "BIG5-0:GL", "\033$(0" },
{ "BIG5-0:GR", "\033$)0" },
{ "BIG5-1:GL", "\033$(1" },
{ "BIG5-1:GR", "\033$)1" },
};
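To make the role of these tables concrete, here is a minimal
sketch of looking up the escape sequence for a charset name.
It is not the lookup code from lcCT.c: the struct ct_entry
and the function ct_lookup are hypothetical names, because
the definition of CTDataRec is not shown above, and only a
few rows are copied from the tables.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CTDataRec; the real definition is not
   shown in the quoted code. */
typedef struct {
    const char *name;        /* X11 registry name plus side */
    const char *ct_sequence; /* escape sequence designating the charset */
} ct_entry;

/* A few rows copied from the tables above. */
static const ct_entry sample_ct_data[] = {
    { "ISO8859-1:GL",       "\033(B"  },
    { "ISO8859-2:GR",       "\033-B"  },
    { "JISX0208.1983-0:GL", "\033$(B" },
    { "KSC5601.1987-0:GR",  "\033$)C" },
};

/* Return the escape sequence for NAME, or NULL if the table has no
   entry for it. */
static const char *ct_lookup(const char *name)
{
    size_t n = sizeof sample_ct_data / sizeof sample_ct_data[0];
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(sample_ct_data[i].name, name) == 0)
            return sample_ct_data[i].ct_sequence;
    return NULL;
}
```

A charset that is missing from the table, such as
GEORGIAN-ACADEMY, yields NULL here, which is the case where
an extended segment is needed.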
But it seems that the extended segment actually used can be
freely defined in the locale data (i.e. the file XLC_LOCALE)
of each locale. For instance,
/usr/X11R6/lib/X11/locale/georgian-academy/XLC_LOCALE
contains this code:
XLC_CHARSET_DEFINE
csd0 {
charset_name GEORGIAN-ACADEMY
side GR
length 1
string_encoding False
sequence \x1b%/1
}
END XLC_CHARSET_DEFINE
So, in this locale, the charset GEORGIAN-ACADEMY is encoded
using the extended segment "ESC % / 1 M L GEORGIAN-ACADEMY ...".
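As an illustration of that byte layout, here is a minimal
sketch of building such a segment for a charset with one
byte per character. The function name ct_extended_segment is
hypothetical. The sketch assumes the Compound Text rule that
M and L together give the number of bytes following L (the
encoding name, the STX terminator, and the data), each
stored as a 7-bit value with the high bit set.

```c
#include <stddef.h>
#include <string.h>

/* Write "ESC % / 1 M L name STX data" into OUT.  Returns the number
   of bytes written, or 0 if the segment does not fit in OUT or its
   length cannot be expressed in the two length bytes. */
static size_t ct_extended_segment(const char *name,
                                  const unsigned char *data, size_t data_len,
                                  unsigned char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    /* M and L count everything after L: name, STX, and data. */
    size_t body_len = name_len + 1 + data_len;
    size_t total = 6 + body_len;   /* ESC % / 1 M L, then the body */
    size_t pos = 0;

    if (body_len > 0x3FFF || total > out_size)
        return 0;

    out[pos++] = 0x1B;  /* ESC */
    out[pos++] = '%';
    out[pos++] = '/';
    out[pos++] = '1';   /* one byte per character */
    out[pos++] = (unsigned char)(0x80 | (body_len >> 7));    /* M */
    out[pos++] = (unsigned char)(0x80 | (body_len & 0x7F));  /* L */
    memcpy(out + pos, name, name_len);
    pos += name_len;
    out[pos++] = 0x02;  /* STX ends the encoding name */
    if (data_len > 0)
        memcpy(out + pos, data, data_len);
    pos += data_len;
    return pos;
}
```

For the name GEORGIAN-ACADEMY and two data bytes, the body
is 16 + 1 + 2 = 19 bytes long, so M is 0x80 and L is 0x93.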
Perhaps each language environment should have a
ctext-encoding-list (instead of the current
ctext-non-standard-encodings) that reflects all charsets
defined in the XLC_LOCALE of the corresponding locale, and
ctext encoding should be forced to use it. And for a
character that cannot be encoded by any of those encodings,
it's almost useless to struggle to find a correct encoding.
---
Ken'ichi HANDA
address@hidden