Re: what-cursor-position vs. Unicode
From: Werner LEMBERG
Subject: Re: what-cursor-position vs. Unicode
Date: Fri, 09 Jun 2006 17:09:07 +0200 (CEST)
> >> #x1a265 is a character of chinese-cns11643-1, and the
> >> current Emacs doesn't support Unicode mapping for that
> >> character set.
>
> > Just wondering: Why not?
>
> Because no one has implemented it.
I've sent a `subst-cns.el' file to you, Ken'ichi-san, and the
experimental diff for utf-8.el is attached. Many of the character
codes are larger than U+20000; this works just fine.
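As a rough illustration of the code ranges involved (a Python sketch,
not part of the patch): the two ranges that the attached diff adds to
`utf-translate-cjk-unicode-range' lie beyond the BMP, so each such
character takes four bytes in UTF-8. The helper names `in_added_range`
and `utf8_length` are made up for this example.

```python
# Ranges copied from the attached diff to `utf-translate-cjk-unicode-range'.
ADDED_RANGES = [(0x20000, 0x2A6DF), (0x2F800, 0x2FA1F)]


def in_added_range(code_point):
    """Return True if code_point falls in one of the newly added ranges."""
    return any(lo <= code_point <= hi for lo, hi in ADDED_RANGES)


def utf8_length(code_point):
    """Return the number of bytes the code point occupies in UTF-8."""
    return len(chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    for cp in (0x4E00, 0x20000, 0x2FA1F):
        print(f"U+{cp:04X}", in_added_range(cp), utf8_length(cp))
```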
> I myself want to avoid spending a time on what becomes useless in
> the future.
Well, it was rather simple; I just wrote a small Perl script to
extract the data from the Unihan.txt database. On the other hand, I
think it is *very* important to provide good conversion to and from
Unicode for all the charsets Emacs supports, so it wasn't wasted
time IMHO.
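The kind of extraction described above could look roughly like the
following. This is a Python sketch, not the Perl script itself, and it
rests on some assumptions: that Unihan.txt holds one tab-separated
record per line ("U+XXXX", field name, value), that the relevant
fields are kCNS1986 and kCNS1992, and that their values have the form
"PLANE-HEXCODE". The function name and the sample records are invented
for illustration only.

```python
import re

# Assumed record layout: "U+XXXX<TAB>field<TAB>PLANE-HEXCODE".
LINE_RE = re.compile(
    r"^U\+([0-9A-F]{4,6})\t(?:kCNS1986|kCNS1992)\t([0-9A-F]+)-([0-9A-F]{4})$"
)


def parse_cns_mappings(lines):
    """Yield (plane, cns_code, unicode_code_point) for each CNS record.

    The plane is kept as a string because it may not be a decimal digit.
    """
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # comment, blank line, or some other Unihan field
        ucs, plane, code = m.groups()
        yield plane, int(code, 16), int(ucs, 16)


# Invented sample records in the assumed layout; real values would come
# from Unihan.txt itself.
SAMPLE = [
    "# comment line\n",
    "U+4E00\tkCNS1986\t1-4421\n",
    "U+4E00\tkTotalStrokes\t1\n",
    "U+20000\tkCNS1992\t4-2126\n",
]

if __name__ == "__main__":
    for plane, cns, ucs in parse_cns_mappings(SAMPLE):
        print(f"plane {plane}: #x{cns:04X} -> U+{ucs:04X}")
```

The resulting triples would still have to be written out in whatever
form `subst-cns.el' expects, which is not shown here.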
> In addition, in the current Emacs code, adding something like
> lisp/international/subst-cns.el leads to slower startup in CJK
> locales, which I want to avoid.
Agreed -- my changes to utf-8.el don't take this into account. What
about an additional `unicode' language environment which really loads
all mapping tables?
BTW, I suggest setting up a `Chinese-EUC-TW' language environment for
which `subst-cns.el' is loaded by default.
Werner
--- utf-8.el.old 2005-10-15 07:43:43.000000000 +0200
+++ utf-8.el 2006-06-09 17:01:46.000000000 +0200
@@ -1,7 +1,7 @@
;;; utf-8.el --- UTF-8 decoding/encoding support -*- coding: iso-2022-7bit -*-
;; Copyright (C) 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
-;; Copyright (C) 2001, 2002, 2003, 2004
+;; Copyright (C) 2001, 2002, 2003, 2004, 2006
;; National Institute of Advanced Industrial Science and Technology (AIST)
;; Registration Number H14PRO021
@@ -194,6 +194,10 @@
(defconst utf-translate-cjk-charsets '(chinese-gb2312
chinese-big5-1 chinese-big5-2
+ chinese-cns11643-1 chinese-cns11643-2
+ chinese-cns11643-3 chinese-cns11643-4
+ chinese-cns11643-5 chinese-cns11643-6
+ chinese-cns11643-7
japanese-jisx0208 japanese-jisx0212
katakana-jisx0201
korean-ksc5601)
@@ -267,7 +271,9 @@
ucs-unicode-to-mule-cjk (make-hash-table :test 'eq)))
(defcustom utf-translate-cjk-unicode-range '((#x2e80 . #xd7a3)
- (#xff00 . #xffef))
+ (#xff00 . #xffef)
+ (#x20000 . #x2a6df)
+ (#x2f800 . #x2fa1f))
"List of Unicode code ranges supported by `utf-translate-cjk-mode'.
Setting this variable directly does not take effect;
use either \\[customize] or the function
@@ -314,22 +320,26 @@
(load "subst-jis")
(load "subst-big5")
(load "subst-gb2312")
- (load "subst-ksc"))
+ (load "subst-ksc")
+ (load "subst-cns"))
((string= "Chinese-BIG5" current-language-environment)
(load "subst-jis")
(load "subst-ksc")
(load "subst-gb2312")
- (load "subst-big5"))
+ (load "subst-big5")
+ (load "subst-cns"))
((string= "Chinese-GB" current-language-environment)
(load "subst-jis")
(load "subst-ksc")
(load "subst-big5")
- (load "subst-gb2312"))
+ (load "subst-gb2312")
+ (load "subst-cns"))
(t
(load "subst-ksc")
(load "subst-gb2312")
(load "subst-big5")
- (load "subst-jis")))) ; jis covers as much as big5, gb2312
+ (load "subst-jis")
+ (load "subst-cns")))) ; jis covers as much as big5, gb2312
(when redefined
(define-translation-hash-table 'utf-subst-table-for-decode
@@ -365,14 +375,22 @@
zero or negative. This is a minor mode.
Enabling this allows the coding systems mule-utf-8,
mule-utf-16le and mule-utf-16be to encode characters in the charsets
-`korean-ksc5601', `chinese-gb2312', `chinese-big5-1',
-`chinese-big5-2', `japanese-jisx0208' and `japanese-jisx0212', and to
-decode the corresponding unicodes into such characters.
+
+ korean-ksc5601
+ chinese-gb2312
+ chinese-big5-1 chinese-big5-2
+ chinese-cns11643-1 chinese-cns11643-2 chinese-cns11643-3
+ chinese-cns11643-4 chinese-cns11643-5 chinese-cns11643-6
+ chinese-cns11643-7
+ japanese-jisx0208 japanese-jisx0212
+
+and to decode the corresponding unicodes into such characters.
Where the charsets overlap, the one preferred for decoding is chosen
according to the language environment in effect when this option is
turned on: ksc5601 for Korean, gb2312 for Chinese-GB, big5 for
-Chinese-Big5 and jisx for other environments.
+Chinese-Big5 and jisx for other environments. The CNS charsets
+are always loaded last.
This mode is on by default. If you are not interested in CJK
characters and want to avoid some overhead on encoding/decoding