bug-gnu-emacs

bug#17742: Acknowledgement (Support for enchant?)


From: Agustin Martin
Subject: bug#17742: Acknowledgement (Support for enchant?)
Date: Mon, 19 Dec 2016 18:37:19 +0100
User-agent: NeoMutt/20161126 (1.7.1)

On Mon, Dec 19, 2016 at 06:01:27PM +0200, Eli Zaretskii wrote:
> > From: Reuben Thomas
> 
> > Basic tests using [[:alpha:]] for casechars and [^[:alpha:]] for 
> > not-casechars seem to work OK.
> 
> For which language and dictionary?  This will definitely do the wrong
> thing for Hunspell he_IL dictionary I have here, which says:
> 
>   WORDCHARS אבגדהוזחטיכלמנסעפצקרשתםןךףץ'"
> 
> That is, it wants ' and " to be treated as word-constituent
> characters.  As another example, I can envision a dictionary of
> acronyms and abbreviations, which might want to treat the period as a
> word-constituent character, to support the likes of "a.k.a.".
> Etc. etc. -- this is up to the dictionary to decide, and Emacs must
> follow suit.
> 
> Also, please note that [:alpha:] in Emacs 25 means a much larger set
> of characters than in previous versions, see NEWS.  It will in general
> catch strings of characters that cannot possibly be TRT for a
> single-language dictionary.  E.g.,
> 
>   (string-match "[[:alpha:]]+" "aβגд") => 0
> 
> > I meant [[:graph:]] and [^[:graph:]].
> 
> This will match an even larger set in Emacs 25, I don't think we will
> ever want that for spell-checking.

Hi,

I have not been following this very closely, but ispell.el still uses
[:alpha:] for aspell and hunspell. If I remember correctly, the old meaning
was something like "as for the current locale", while it now has a much
wider meaning.

For the vast majority of systems this should not be a problem, but I wonder
if this can have some side effects for ispell.el in corner cases.
 
> > Also, as I realised while preparing the patch for bug#25230, it is only
> > hunspell that has special information about character classes. All the
> > others just use [:alpha:]. So if it's good enough for ispell and aspell,
> > can't it be good enough for enchant? (It just means that for now "direct
> > Hunspell" is arguably better than "Hunspell via Enchant".)
> 
> Hunspell is the most modern and sophisticated speller, we certainly
> don't want to degrade it.  Also, Aspell uses the dictionaries at least
> for some of this info, see the function I pointed to above.
> 
> Once again, if Enchant uses a back-end for which we know how to find
> this information, we should do so.

About Enchant, last time I looked at it, it was mostly intended for use
through libenchant, not through the standalone enchant binary, which was
more of a testing tool. As a matter of fact, its list of options is quite
short and it seems to lack support for personal dictionaries. Since Emacs
uses a pipe for spellchecking I do not think we should worry too much about
the enchant binary.

Things may have changed recently in enchant, but I would not expect much
change: its man page still mentions myspell and not hunspell at all (so it
may be a bit outdated), although it seems to be able to use libhunspell.

Also, there is no easy way to know which particular spellchecking engine is
being used. Enchant uses $(datadir)/enchant and ~/.enchant config files to
define preferences, but I see no way to make enchant tell which one is being
used. So, it is not easy to parse dictionary info.
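
For illustration only: if the back-end were known to be Hunspell and its
.aff file could be located, extracting the word-constituent characters might
look roughly like the Python sketch below. The function names are
hypothetical, the .aff text is assumed to be already decoded, and this is
not what ispell.el or enchant actually does.

```python
import re


def wordchars_from_aff(aff_text):
    """Return the value of the WORDCHARS line in Hunspell .aff text.

    Hypothetical helper: returns "" when no WORDCHARS line is present.
    """
    for line in aff_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "WORDCHARS":
            return parts[1].strip()
    return ""


def word_pattern(extra_chars=""):
    """Compile a regex matching runs of letters plus dictionary extras.

    [^\\W\\d_] matches any Unicode letter; extra_chars are the additional
    word-constituent characters requested by the dictionary.
    """
    letter = r"[^\W\d_]"
    if not extra_chars:
        return re.compile(letter + "+")
    extras = "[" + re.escape(extra_chars) + "]"
    return re.compile("(?:" + letter + "|" + extras + ")+")


# Made-up .aff content for a dictionary of abbreviations.
sample_aff = "SET UTF-8\nWORDCHARS '.\n"
text = "a.k.a. don't"

plain_words = word_pattern().findall(text)
dict_words = word_pattern(wordchars_from_aff(sample_aff)).findall(text)
```

With the made-up sample, the plain letter class splits "a.k.a." and "don't"
into fragments, while the dictionary-aware pattern keeps each as one word.
The hard part described above remains: enchant does not say which .aff file,
if any, is in use.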

Sorry if I have missed some things. Gmail tags some of Reuben's mails as
spam and puts them out of my usual workflow.

-- 
Agustin




