Re: what-cursor-position vs. Unicode

emacs-devel

From:	Werner LEMBERG
Subject:	Re: what-cursor-position vs. Unicode
Date:	Fri, 09 Jun 2006 17:09:07 +0200 (CEST)

> >> #x1a265 is a character of chinese-cns11643-1, and the
> >> current Emacs doesn't support Unicode mapping for that
> >> character set.
>
> > Just wondering: Why not?
>
> Because no one has implemented it.

I've sent a `subst-cns.el' file to you, Ken'ichi-san, and the
experimental diff for utf-8.el is attached.  A great many of the
character codes are above U+20000; this works just fine.

> I myself want to avoid spending a time on what becomes useless in
> the future.

Well, it was rather simple; I just wrote a small Perl script to
extract the data from the Unihan.txt database.  Besides, I think it
is *very* important to provide good conversion from and to Unicode
for all the charsets Emacs supports, so it wasn't wasted time IMHO.
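
For illustration, here is a minimal Emacs Lisp sketch of such an
extraction step (it is not the Perl script mentioned above).  It
assumes that Unihan.txt lines are tab-separated and that the CNS
source field is `kIRG_TSource' with a value such as `1-4421' or
`T1-4421'; both the field layout and the helper name
`my-cns-parse-unihan-line' are assumptions, not taken from the
attached patch.

```elisp
;; Sketch only: the line format and the plane-to-charset naming are
;; assumptions, and `my-cns-parse-unihan-line' is a hypothetical helper.
(defun my-cns-parse-unihan-line (line)
  "Parse a Unihan kIRG_TSource LINE into (CHARSET CODE UNICODE), or nil.
Only CNS planes 1 to 7 are accepted, matching the charsets
chinese-cns11643-1 to chinese-cns11643-7."
  (let ((case-fold-search nil))         ; hex digits must be upper case
    (when (string-match
           "\\`U\\+\\([0-9A-F]+\\)\tkIRG_TSource\tT?\\([1-7]\\)-\\([0-9A-F]\\{4\\}\\)\\'"
           line)
      (list (intern (concat "chinese-cns11643-" (match-string 2 line)))
            ;; Two-byte CNS code point, e.g. #x4421.
            (string-to-number (match-string 3 line) 16)
            ;; Unicode scalar value, possibly above #x20000.
            (string-to-number (match-string 1 line) 16)))))
```

Called on the sample line "U+4E00\tkIRG_TSource\t1-4421", this returns
a list holding the symbol `chinese-cns11643-1', the code #x4421 and
the Unicode value #x4e00; lines for other fields return nil.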

> In addition, in the current Emacs code, adding something like
> lisp/international/subst-cns.el leads to slower startup in CJK
> locales, which I want to avoid.

Agreed -- my changes to utf-8.el don't take this into account.  What
about an additional `unicode' language environment which really loads
all mapping tables?

BTW, I suggest setting up a `Chinese-EUC-TW' language environment for
which `subst-cns.el' is loaded by default.
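
As a rough sketch of that idea, the table could be loaded from
`set-language-environment-hook' only when the proposed environment is
selected, so that other CJK locales don't pay the startup cost.  The
function name `my-cns-maybe-load-subst' is hypothetical, and the
`Chinese-EUC-TW' environment is assumed to exist; this is not part of
the attached diff.

```elisp
;; Sketch only.  `my-cns-maybe-load-subst' is a hypothetical name, and
;; "Chinese-EUC-TW" is the environment proposed above, assumed to exist.
(defun my-cns-maybe-load-subst ()
  "Load the CNS substitution table only in the proposed environment."
  (when (equal current-language-environment "Chinese-EUC-TW")
    ;; NOERROR is t, so a missing subst-cns.el is silently ignored.
    (load "subst-cns" t)))

;; Run the check each time the language environment is changed.
(add-hook 'set-language-environment-hook #'my-cns-maybe-load-subst)
```

In any other language environment the function does nothing and
returns nil.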


    Werner

--- utf-8.el.old        2005-10-15 07:43:43.000000000 +0200
+++ utf-8.el    2006-06-09 17:01:46.000000000 +0200
@@ -1,7 +1,7 @@
 ;;; utf-8.el --- UTF-8 decoding/encoding support -*- coding: iso-2022-7bit -*-
 
 ;; Copyright (C) 2001, 2002, 2003, 2004  Free Software Foundation, Inc.
-;; Copyright (C) 2001, 2002, 2003, 2004
+;; Copyright (C) 2001, 2002, 2003, 2004, 2006
 ;;   National Institute of Advanced Industrial Science and Technology (AIST)
 ;;   Registration Number H14PRO021
 
@@ -194,6 +194,10 @@
 
 (defconst utf-translate-cjk-charsets '(chinese-gb2312
                                       chinese-big5-1 chinese-big5-2
+                                      chinese-cns11643-1 chinese-cns11643-2
+                                      chinese-cns11643-3 chinese-cns11643-4
+                                      chinese-cns11643-5 chinese-cns11643-6
+                                      chinese-cns11643-7
                                       japanese-jisx0208 japanese-jisx0212
                                       katakana-jisx0201
                                       korean-ksc5601)
@@ -267,7 +271,9 @@
        ucs-unicode-to-mule-cjk (make-hash-table :test 'eq)))
 
 (defcustom utf-translate-cjk-unicode-range '((#x2e80 . #xd7a3)
-                                            (#xff00 . #xffef))
+                                            (#xff00 . #xffef)
+                                            (#x20000 . #x2a6df)
+                                            (#x2f800 . #x2fa1f))
   "List of Unicode code ranges supported by `utf-translate-cjk-mode'.
 Setting this variable directly does not take effect;
 use either \\[customize] or the function
@@ -314,22 +320,26 @@
             (load "subst-jis")
             (load "subst-big5")
             (load "subst-gb2312")
-            (load "subst-ksc"))
+            (load "subst-ksc")
+            (load "subst-cns"))
            ((string= "Chinese-BIG5" current-language-environment)
             (load "subst-jis")
             (load "subst-ksc")
             (load "subst-gb2312")
-            (load "subst-big5"))
+            (load "subst-big5")
+            (load "subst-cns"))
            ((string= "Chinese-GB" current-language-environment)
             (load "subst-jis")
             (load "subst-ksc")
             (load "subst-big5")
-            (load "subst-gb2312"))
+            (load "subst-gb2312")
+            (load "subst-cns"))
            (t
             (load "subst-ksc")
             (load "subst-gb2312")
             (load "subst-big5")
-            (load "subst-jis")))) ; jis covers as much as big5, gb2312
+            (load "subst-jis") ; jis covers as much as big5, gb2312
+            (load "subst-cns"))))
 
     (when redefined
       (define-translation-hash-table 'utf-subst-table-for-decode
@@ -365,14 +375,22 @@
 zero or negative.  This is a minor mode.
 Enabling this allows the coding systems mule-utf-8,
 mule-utf-16le and mule-utf-16be to encode characters in the charsets
-`korean-ksc5601', `chinese-gb2312', `chinese-big5-1',
-`chinese-big5-2', `japanese-jisx0208' and `japanese-jisx0212', and to
-decode the corresponding unicodes into such characters.
+
+  korean-ksc5601
+  chinese-gb2312
+  chinese-big5-1 chinese-big5-2
+  chinese-cns11643-1 chinese-cns11643-2 chinese-cns11643-3
+  chinese-cns11643-4 chinese-cns11643-5 chinese-cns11643-6
+  chinese-cns11643-7
+  japanese-jisx0208 japanese-jisx0212
+
+and to decode the corresponding unicodes into such characters.
 
 Where the charsets overlap, the one preferred for decoding is chosen
 according to the language environment in effect when this option is
 turned on: ksc5601 for Korean, gb2312 for Chinese-GB, big5 for
-Chinese-Big5 and jisx for other environments.
+Chinese-Big5 and jisx for other environments.  The CNS charsets
+are always loaded last.
 
 This mode is on by default.  If you are not interested in CJK
 characters and want to avoid some overhead on encoding/decoding
