Re: [Aspell-user] small bug: two following non alpha characters


From: Kevin Atkinson
Subject: Re: [Aspell-user] small bug: two following non alpha characters
Date: Tue, 1 Nov 2005 19:03:38 -0700 (MST)

On Wed, 26 Oct 2005, Gary Setter wrote:

Back in August I was trying to make my program work with
Unicode and the koi8-r character set. One of the problems was
tokenizing the text into words. It seemed aspell was treating all
character sets as ASCII.

Could you be more specific?

The speller object does have a language
member and the language member does have a sense of the
characteristics of each character in the character set. What are
the characteristics of the ampersand and dash in your
character set? Might aspell make use of those character-set-specific
characteristics to tokenize "hait-l'-ovraedje" as one word?

Yes, it might make sense, but I do not have support for it. The ' and - are treated as special characters: they can only be part of a word if they have normal letters on both sides; otherwise things "like--this" would also be treated as a single word. For this reason, special care needs to be taken when handling these special characters.
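
To make the rule above concrete, here is a minimal sketch of a tokenizer that accepts ' and - inside a word only when a letter sits on both sides. It is not Aspell's actual code: the names `tokenize` and `SPECIAL` are hypothetical, and Python's `str.isalpha` stands in for Aspell's per-language character tables, which is an assumption made purely for illustration.

```python
# Hypothetical illustration of the "letters on both sides" rule; not Aspell code.
SPECIAL = {"'", "-"}


def tokenize(text, special=SPECIAL):
    """Split text into words.

    A special character is kept inside a word only if the character
    before it and the character after it are both letters.
    """
    words = []
    current = []
    n = len(text)
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch in special and current
              and i + 1 < n and text[i + 1].isalpha()):
            # A non-empty `current` always ends in a letter here, so the
            # special character has a letter on both sides.
            current.append(ch)
        else:
            # Any other character (or a misplaced special) ends the word.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


print(tokenize("like--this"))        # ['like', 'this']
print(tokenize("hait-l'-ovraedje"))  # ['hait-l', 'ovraedje']
print(tokenize("don't stop"))        # ["don't", 'stop']
```

Under this rule "hait-l'-ovraedje" is split at the adjacent ' and -, because neither of those two characters has a letter on both sides.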




