Re: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)

emacs-devel

From:	Kenichi Handa
Subject:	Re: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
Date:	Tue, 4 Jan 2005 21:50:33 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Agustin Martin <address@hidden> writes:

> I was aware of this, but thanks anyway for the reminder. The code is probably
> too ad hoc, but the latin{0,1} thing is also a somewhat ad hoc scenario, where
> latin0 should really have been named something like iso-8859-1v2, that is,
> a revision. I cannot imagine somebody using an iso-8859-2 dict and trying to
> write in an iso-8859-1 buffer, but with iso-8859-1 and iso-8859-15 that
> happens all too frequently.

> So we have a lot of people who blindly select the @euro variant of the
> locale without realizing its implications, or that iso-8859-1 and
> iso-8859-15 are different but very close encodings (from a practical point
> of view, they are fully equivalent for most languages except, IIRC, French
> (oe,"Y) and Finnish ({sSzZ}^, where ^ stands for caron); the euro symbol
> does not seem significant for spellchecking).
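
To make the "different but very close" point concrete, here is a minimal sketch that lists the bytes which decode differently under the two coding systems. It is not part of the patch. The function name `my-latin1-latin9-differences' is hypothetical, and the sketch assumes a Unicode-based Emacs (one that provides `unibyte-string'); in the Mule of Emacs 21 the two coding systems decode into separate charsets, so the comparison would not be meaningful there.

```elisp
;; Illustrative sketch only; `my-latin1-latin9-differences' is a
;; hypothetical name.  Assumes a Unicode-based Emacs that provides
;; `unibyte-string'.
(defun my-latin1-latin9-differences ()
  "Return a list of (BYTE LATIN1-STRING LATIN9-STRING) for differing bytes.
Only the range #xA0..#xFF is examined."
  (let (result)
    (dotimes (i 96)
      (let* ((byte (+ #xA0 i))
             (raw (unibyte-string byte))
             (l1 (decode-coding-string raw 'iso-8859-1))
             (l9 (decode-coding-string raw 'iso-8859-15)))
        ;; Keep only the bytes whose decoded characters differ.
        (unless (string= l1 l9)
          (push (list byte l1 l9) result))))
    (nreverse result)))
```

Under that assumption the list should have eight entries: the euro sign, the S and Z carons in both cases, and the French oe, OE and Y-diaeresis. Everything else in the upper half is shared.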

> Furthermore (this is probably fixed by the CVS code you mentioned above),
> in current sid emacs, utf-8 files can be checked with a latin1 dict (as
> long as they do not use chars outside latin1) using the ispell.el internal
> reencodings, but this fails for a dict declared as iso-8859-15.

No, this is not yet fixed.

> The current state of ispell dicts in Debian is that ifrench is iso-8859-15
> by default (although it has a real latin1 entry), while finnish does not
> set the {s,z}-caron chars at all, so it is a fully latin1 entry. aspell-fr
> and aspell-fi are set to plain latin1.

> So the only language that might currently require extra work is French, and
> for it I find it reasonable to use the iso-8859-15 entry as the emacs default
> (tagged as iso-8859-1 for the above system to work). On this I would like
> to hear Lionel's point of view, since he has put a lot of effort into making
> iso-8859-15 available for spellchecking (Hi, Lionel).

> I personally do not like having separate iso-8859-15 entries unless they are
> really required. For the above dicts, that would be for french, and I am not
> at all sure that it is really required.

Hmmm, then how about the attached patch to the latest CVS
emacs?  With that, all equivalent characters (e.g. a-grave in
all latin-X) should be handled well.  This patch should also
be applicable to Emacs 21.3, but I have not yet tested it in
that version.
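
The splicing idea can be sketched in isolation as follows. This is a toy, not the patch itself: `my-equivalents' and `my-splice-equivalents' are hypothetical names, and the two pairs in the alist are placeholders standing in for the table that the patch derives from `ucs-mule-8859-to-mule-unicode'. They were chosen only so that the example runs on a Unicode-based Emacs, where the per-charset duplicates of a-grave no longer exist.

```elisp
;; Toy sketch of the splicing step.  `my-equivalents' is a hypothetical
;; stand-in for `ispell-unified-chars-table'; the pairs are placeholders.
(defvar my-equivalents
  '((?\u00C5 . "\u00C5\u212B")    ; A WITH RING ABOVE and ANGSTROM SIGN
    (?\u00B5 . "\u00B5\u03BC"))   ; MICRO SIGN and GREEK SMALL LETTER MU
  "Alist mapping a character to a string listing its equivalents.")

(defun my-splice-equivalents (str)
  "Return STR with equivalents spliced in when STR has the form \"[...]\".
Any other STR is returned unchanged, since splicing extra characters
into an arbitrary regexp would change what it matches."
  (if (and (> (length str) 1)
           (= (aref str 0) ?\[)
           ;; The first \"]\" must be the last character of STR.
           (eq (string-match "\\]" str) (1- (length str))))
      (mapconcat (lambda (c)
                   (or (cdr (assq c my-equivalents))
                       (string c)))
                 str "")
    str))
```

For example, (my-splice-equivalents "[a\u00C5]") returns the class with the second placeholder character added after the first, while a string such as "a+" comes back untouched.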

---
Ken'ichi HANDA
address@hidden


*** ispell.el   25 Dec 2004 11:43:11 +0900      1.151
--- ispell.el   03 Jan 2005 16:05:48 +0900      
***************
*** 1074,1088 ****
        (decode-coding-string str (ispell-get-coding-system))
      str))
  
  (defun ispell-get-casechars ()
!   (ispell-decode-string
!    (nth 1 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-not-casechars ()
!   (ispell-decode-string
!    (nth 2 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-otherchars ()
!   (ispell-decode-string
!    (nth 3 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-many-otherchars-p ()
    (nth 4 (assoc ispell-dictionary ispell-dictionary-alist)))
  (defun ispell-get-ispell-args ()
--- 1074,1127 ----
        (decode-coding-string str (ispell-get-coding-system))
      str))
  
+ (put 'ispell-unified-chars-table 'char-table-extra-slots 0)
+ 
+ ;; Char-table that maps a Unicode character (charset:
+ ;; latin-iso8859-1, mule-unicode-0100-24ff) to
+ ;; a string in which all equivalent characters are listed.
+ 
+ (defconst ispell-unified-chars-table
+   (let ((table (make-char-table 'ispell-unified-chars-table)))
+     (map-char-table
+      #'(lambda (c v)
+          (if (and v (/= c v))
+              (let ((unified (or (aref table v) (string v))))
+                (aset table v (concat unified (string c))))))
+      ucs-mule-8859-to-mule-unicode)
+     table))
+ 
+ ;; Return a string decoded from Nth element of the current dictionary
+ ;; while splicing equivalent characters into the string.  This splicing
+ ;; is done only if the string is a regular expression of the form
+ ;; "[...]" because, otherwise, splicing will result in incorrect
+ ;; regular expression matching.
+ 
+ (defun ispell-get-decoded-string (n)
+   (let* ((slot (assoc ispell-dictionary ispell-dictionary-alist))
+          (str (nth n slot)))
+     (when (and (> (length str) 0)
+                (not (multibyte-string-p str)))
+       (setq str (ispell-decode-string str))
+       (if (and (= (aref str 0) ?\[)
+                (eq (string-match "\\]" str) (1- (length str))))
+           (setq str
+                 (string-as-multibyte
+                  (mapconcat
+                   #'(lambda (c)
+                       (let ((unichar (aref ucs-mule-8859-to-mule-unicode c)))
+                         (if unichar
+                             (aref ispell-unified-chars-table unichar)
+                           (string c))))
+                   str ""))))
+       (setcar (nthcdr n slot) str))
+     str))
+ 
  (defun ispell-get-casechars ()
!   (ispell-get-decoded-string 1))
  (defun ispell-get-not-casechars ()
!   (ispell-get-decoded-string 2))
  (defun ispell-get-otherchars ()
!   (ispell-get-decoded-string 3))
  (defun ispell-get-many-otherchars-p ()
    (nth 4 (assoc ispell-dictionary ispell-dictionary-alist)))
  (defun ispell-get-ispell-args ()
