emacs-devel

Re: Bug 130397


From: Ken Stevens
Subject: Re: Bug 130397
Date: Thu, 06 Jan 2005 08:30:10 -0800

Kenichi Handa writes:


> In article <address@hidden>, Stefan Monnier <address@hidden> writes:
>
>>>>  But ispell.el should be able to automatically check whether the
>>>>  chars can be safely encoded with the coding-system and if not (as
>>>>  in your example), ispell.el will know that the word can't be
>>>>  checked by ispell and should just be skipped (and maybe marked as
>>>>  "uncheckable").
>
>>>  That seems to be a good approach.  But, just checking
>>>  whether the chars are encodable with the coding-system is not
>>>  enough.  For instance, entry for "francais" dict doesn't
>>>  contain "ñ" in CASECHARS, but "español" is safely encodable
>>>  by iso-8859-1.  So, the same error happens.  For ispell.el
>>>  to know that "español" is uncheckable, we anyway need the
>>>  current database ispell-dictionary-alist.
>
>> Aaaahhhh.... I'm beginning to understand, thank you.  But I still
>> think ispell.el should not try to check "espa" and "ol".  So I now
>> agree that the CASECHARS table is needed, but it should be used after
>> encoding the word (rather than when determining what is a word), and
>> if some char is not in CASECHARS the word should be flagged as
>> uncheckable.
>
> Although I have not yet understood the detail, "if some char
> is not in CASECHARS" is not enough.  First of all, CASECHARS
> is a regular expression.  And NOT-CASECHARS, OTHERCHARS,
> MANY-OTHERCHARS-P should also be checked somehow.  If that
> is the way we are going to take, I'd like to ask maintainers
> of ispell.el to do such a change.
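
To make the encode-then-check idea quoted above concrete, here is a
rough sketch (not actual ispell.el code; the function name is made up)
of testing whether a word even survives a round trip through the
coding-system used for the ispell process:

    (defun my-ispell-word-encodable-p (word coding-system)
      "Return non-nil if WORD survives a round trip through CODING-SYSTEM."
      (string= word
               (decode-coding-string
                (encode-coding-string word coding-system)
                coding-system)))

    ;; (my-ispell-word-encodable-p "español" 'iso-8859-1) ; => t
    ;; (my-ispell-word-encodable-p "日本語" 'iso-8859-1)   ; likely nil

As Handa points out, this alone is not enough; the encoded word would
still have to be matched against the dictionary's CASECHARS and
NOT-CASECHARS before being sent to the ispell process.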

Remember that the internationalization of ispell was done long before
the MULE code was added to emacs.  The encoding of the character sets,
and the interaction between ispell and emacs, was embodied in the
ispell code itself.  In ispell.el this has been controlled by the
CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P,
EXTENDED-CHARACTER-MODE, and CHARACTER-SET fields of each dictionary
entry in ispell-dictionary-alist.
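
For reference, an entry in that alist has roughly this shape (the
concrete regexps and arguments below are illustrative, not copied from
the real table):

    (require 'ispell)
    ;; (DICT-NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
    ;;  ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
    (add-to-list 'ispell-dictionary-alist
                 '("castellano"
                   "[a-záéíóúüñA-ZÁÉÍÓÚÜÑ]"    ; CASECHARS
                   "[^a-záéíóúüñA-ZÁÉÍÓÚÜÑ]"   ; NOT-CASECHARS
                   "[-]"                        ; OTHERCHARS
                   nil                          ; MANY-OTHERCHARS-P
                   ("-B" "-d" "castellano")     ; ISPELL-ARGS
                   "~tex"                       ; EXTENDED-CHARACTER-MODE
                   iso-8859-1))                 ; CHARACTER-SET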

The problem is more complicated than simply parsing what the word
characters are.  There are differences in encoding when the source is
LaTeX, with its escape-sequence encoding of Latin characters, versus a
raw ISO character set.  The dictionary also stores information about
compound words, possessives, etc. for the spell checking routines.
Knowing that the "'" character is used as a possessive, for instance,
ispell knows that "Ken's" is a correct spelling based on the root
"Ken".
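
That is roughly how OTHERCHARS enters the word parsing on the Emacs
side as well; here is a simplified sketch (not the regexp ispell.el
actually builds) for the non-MANY-OTHERCHARS-P case:

    ;; A word is a run of CASECHARS, optionally containing one
    ;; OTHERCHARS character with CASECHARS on both sides, so that
    ;; "Ken's" is parsed as a single word rather than "Ken" and "s".
    (let* ((casechars "[A-Za-z]")
           (otherchars "[']")
           (word-regexp (concat casechars "+"
                                "\\(?:" otherchars casechars "+\\)?")))
      (when (string-match word-regexp "Ken's")
        (match-string 0 "Ken's")))
    ;; => "Ken's"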

Most of this complication can be hidden inside ispell.el.  The
problems mainly arise in two circumstances:

1. when spell checking a single word;
2. when an error occurs and the error is highlighted.

For instance, one of the major issues when MULE was implemented was the
fact that multiple bytes passed to ispell may correspond to a single
character (and a single column) on the display, so the offsets that
ispell reports back have to be translated before highlighting.
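
A tiny illustration of the mismatch (assuming, just for the example,
UTF-8 as the process coding system):

    (length "español")                               ; => 7 characters
    (length (encode-coding-string "español" 'utf-8)) ; => 8 bytes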

Here is where most of the hassles with libraries occur.  There may well
be a much better way of encoding the character sets and interactions
right now.  Perhaps we should investigate simplifying and possibly
removing the character set issues.  We would still minimally need to
communicate mode information to ispell.

Geoff has a much better understanding of the underlying spell search
engine.  Perhaps he can shed additional light on this topic.

regards          -Ken




