[Aspell-user] digit behavior


From: Michael Howard
Subject: [Aspell-user] digit behavior
Date: Sat, 1 May 2010 13:57:29 -0400

I am investigating aspell for use on a large set of scanned pages with
text that was generated through OCR.

I searched through the mailing list archive and found
  http://lists.gnu.org/archive/html/aspell-user/2002-07/msg00003.html
wherein Kevin Atkinson explains that aspell was not designed for
OCR-type errors.

Nevertheless, I chose to proceed a bit ... primarily because I was
unable to find anything open source that was better. Unfortunately I
did not get very far.

aspell seems to ignore any words with digits in them, and my OCR text
has plenty of digit/character confusion. I was unable to find any
options to control behavior with digits.

Searching the mailing list again I found
  http://lists.gnu.org/archive/html/aspell-user/2006-08/msg00013.html
wherein Thomas Güttler suggested modifying the cset table so that
additional characters could be treated as word characters. I tried
copying the .cset file, modifying it to turn the Digits into Letters,
and specifying my cset using --encoding on the command line. However,
the behavior did not change ... words with digits in them were still
ignored and did not show up with --list.
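
For reference, the kind of word I am talking about can at least be
pulled out of the OCR text independently of aspell. The following is
only a minimal sketch in Python; the name `mixed_tokens` and the sample
sentence are made up for illustration, and it does not touch aspell or
its cset files at all. It assumes that a "digit-confused" word is any
token containing at least one ASCII letter and at least one digit.

```python
import re

# Hypothetical helper, not part of aspell. A token counts as "mixed" when
# it contains at least one ASCII letter and at least one digit. Pure
# numbers such as "42" and ordinary words are skipped.
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def mixed_tokens(text):
    """Return tokens that mix letters and digits, in order of appearance."""
    return MIXED_TOKEN.findall(text)


if __name__ == "__main__":
    # Made-up sample line imitating OCR digit/character confusion.
    sample = "The qu1ck brown f0x jumped over 42 lazy dogs in 2010."
    print(mixed_tokens(sample))
```

This only lists the suspicious words; it makes no attempt to suggest
corrections for them.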

Any comments/suggestions/advice appreciated.


Michael



