emacs-devel

Re: Several serious problems


From: Kenichi Handa
Subject: Re: Several serious problems
Date: Mon, 2 Sep 2002 10:28:25 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Richard Stallman <address@hidden> writes:
>     That depends on whether you include code in utf-8.el that encodes
>     those charsets.  If not, you need that change.

> In that case, I will install that change presently, and then we can
> study the question of whether to include the code in utf-8.el instead.

> What does that code in utf-8.el do, and how safe a change is it?

It defines two CCL programs, one to decode and one to encode
UTF-8 byte sequences, and defines the coding system
mule-utf-8 in terms of those CCL programs.
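
To make the encoding half concrete, here is a rough sketch in plain Emacs Lisp of the byte layout such an encoder has to produce. It is only an illustration: the real code in utf-8.el is a CCL program, `my-utf-8-bytes' is a hypothetical name that does not exist in Emacs, and the sketch assumes the caller already has a Unicode code point no larger than U+FFFF.

```elisp
;; Illustration only; NOT the CCL code from utf-8.el.
;; `my-utf-8-bytes' is a hypothetical helper name.
(defun my-utf-8-bytes (code)
  "Return the list of UTF-8 bytes for Unicode code point CODE.
Only code points up to U+FFFF are handled."
  (cond
   ((< code #x80)                       ; 1 byte:  0xxxxxxx
    (list code))
   ((< code #x800)                      ; 2 bytes: 110xxxxx 10xxxxxx
    (list (logior #xC0 (ash code -6))
          (logior #x80 (logand code #x3F))))
   ((< code #x10000)                    ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    (list (logior #xE0 (ash code -12))
          (logior #x80 (logand (ash code -6) #x3F))
          (logior #x80 (logand code #x3F))))
   (t (error "Code point U+%X is outside the range handled here" code))))

;; U+FFFD, the replacement character:
(my-utf-8-bytes #xFFFD)   ; => (239 191 189), i.e. "\357\277\275"
```

The three bytes for U+FFFD are the same octal sequence that the test further down searches for in the encoded output.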

I'm attaching the change needed to let RC's utf-8 encode the
latin-X charsets and a few others (e.g. thai).  The docstring
of mule-utf-8 may need improvement.

As the change is very small and that code has been in HEAD
for more than a month, I think the change is quite safe.
I recommend installing it in RC.

I also checked the code to some extent with the following test.

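;; Check every charset that mule-utf-8 lists as safe, except ASCII
;; and the raw eight-bit charsets.  Copy the list first, because
;; delq modifies its argument destructively.
(dolist (charset (delq 'ascii
                       (delq 'eight-bit-control
                             (delq 'eight-bit-graphic
                                   (copy-sequence
                                    (coding-system-get 'mule-utf-8
                                                       'safe-charsets))))))
  (let ((dimension (charset-dimension charset))
        str)
    ;; Build a two-character test string in CHARSET.
    (if (= dimension 1)
        (setq str (string (make-char charset 33) (make-char charset 34)))
      (setq str (string (make-char charset 33 33) (make-char charset 33 34))))
    ;; Signal an error only if both checks fail.
    (or (memq 'mule-utf-8 (find-coding-systems-string str))
        (not (string-match "\357\277\275" ; UTF-8 form of U+FFFD
                           (encode-coding-string str 'mule-utf-8)))
        (error "%s is not supported" charset))))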
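
The docstring in the patch below says that the charsets marked "(*)" are supported only on encoding. One rough way to see that by hand is to compare the charsets of a string before and after a round trip. This is only a sketch: `my-utf-8-round-trip-charsets' is a hypothetical helper, not something in utf-8.el, and what it returns depends on the Emacs version and on whether the patch is installed.

```elisp
;; Hypothetical helper for illustration; not part of utf-8.el.
(defun my-utf-8-round-trip-charsets (str &optional coding-system)
  "Return (BEFORE . AFTER) for STR.
BEFORE lists the charset of each character of STR.  AFTER lists the
charsets after encoding STR with CODING-SYSTEM and decoding it again.
CODING-SYSTEM defaults to `mule-utf-8'."
  (let* ((cs (or coding-system 'mule-utf-8))
         (decoded (decode-coding-string
                   (encode-coding-string str cs)
                   cs)))
    (cons (mapcar #'char-charset str)
          (mapcar #'char-charset decoded))))

;; Try one of the charsets marked "(*)":
(my-utf-8-round-trip-charsets
 (string (make-char 'latin-iso8859-2 33)))
```

If the docstring is accurate, then with the patch installed the BEFORE list should name latin-iso8859-2, while the AFTER list should name latin-iso8859-1 or one of the mule-unicode charsets instead.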

---
Ken'ichi HANDA
address@hidden

*** utf-8.el.~1.9.4.2.~ Tue Jul 23 13:54:13 2002
--- utf-8.el    Mon Sep  2 10:28:26 2002
***************
*** 269,275 ****
       (loop
        (if (r5 < 0)
          ((r1 = -1)
!          (read-multibyte-character r0 r1))
        (;; We have already done read-multibyte-character.
         (r0 = r5)
         (r1 = r6)
--- 269,277 ----
       (loop
        (if (r5 < 0)
          ((r1 = -1)
!          (read-multibyte-character r0 r1)
!          (translate-character ucs-mule-to-mule-unicode r0 r1))
! 
        (;; We have already done read-multibyte-character.
         (r0 = r5)
         (r1 = r6)
***************
*** 392,397 ****
--- 394,423 ----
     mule-unicode-0100-24ff
     mule-unicode-2500-33ff
     mule-unicode-e000-ffff
+    latin-iso8859-2 (*)
+    latin-iso8859-3 (*)
+    latin-iso8859-4 (*)
+    cyrillic-iso8859-5 (*)
+    arabic-iso8859-6 (*)
+    greek-iso8859-7 (*)
+    hebrew-iso8859-8 (*)
+    latin-iso8859-9 (*)
+    latin-iso8859-14 (*)
+    latin-iso8859-15 (*)
+    chinese-sisheng (*)
+    ethiopic (*)
+    ipa (*)
+    lao (*)
+    katakana-jisx0201 (*)
+    thai-tis620 (*)
+    tibetan (*)
+    vietnamese-viscii-lower (*)
+    vietnamese-viscii-upper (*)
+ 
+ Among them, the charsets labeled \"(*)\" are supported only on
+ encoding.  That means, they are correctly encoded to UTF-8, but are
+ decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
+ mule-unicode-2500-33ff, not to the original charsets.
  
  Unicode characters out of the ranges U+0000-U+33FF and U+E200-U+FFFF
  are decoded into sequences of eight-bit-control and eight-bit-graphic
***************
*** 409,415 ****
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
!     mule-unicode-e000-ffff)
     (mime-charset . utf-8)
     (coding-category . coding-category-utf-8)
     (valid-codes (0 . 255))))
--- 435,460 ----
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
!     mule-unicode-e000-ffff
!     latin-iso8859-2 
!     latin-iso8859-3 
!     latin-iso8859-4 
!     cyrillic-iso8859-5 
!     arabic-iso8859-6 
!     greek-iso8859-7 
!     hebrew-iso8859-8 
!     latin-iso8859-9 
!     latin-iso8859-14 
!     latin-iso8859-15 
!     chinese-sisheng 
!     ethiopic 
!     ipa 
!     lao 
!     katakana-jisx0201 
!     thai-tis620 
!     tibetan 
!     vietnamese-viscii-lower 
!     vietnamese-viscii-upper)
     (mime-charset . utf-8)
     (coding-category . coding-category-utf-8)
     (valid-codes (0 . 255))))



