bug-gnu-utils

spell enhancements


From: Joe Corneli
Subject: spell enhancements
Date: Fri, 18 Jul 2003 09:06:31 -0500

Hear ye hear ye,

I would like to request the following two enhancements for spell
versions > 1.0:

1)  the ability to specify a list of "bad words" that should not be
    recognized even if they are in the dictionary that spell uses.  I
    would like to be able to have " ", "+", "=", "alpha", etc. be bad
    words.

2)  I would like an option that makes spell print, instead of the
    line number of each unrecognized word, its offset from the start
    of the file (point).

Please let me know what you think about these requests...  If you have
time, please see the short text below that explains why I am making
the requests!  

Thanks,

Joe Corneli



On a method of converting "email math" to TeX.

Here is my thought for marking up a 1-d stream of ascii text: run the
text through a variant of GNU's 'spell' program to produce a list of
"unrecognized" words -- unrecognizable words should include such things
as ' ', 'a', 'b', ..., 'x', 'y', 'z', 'alpha', ..., 'zeta', 'sum', '=',
'+', and so on, as well as all TeX expressions.  Record each word
together with its offset from the start of the text, counting from 0.

Note that GNU spell does recognize the first batch of words and also
recognizes any of these words with a "\" placed in front of it, which is
why I say "a variant".  (GNU spell does not recognize "infty".)
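
To make the first step concrete, here is a rough Python sketch of what
such a variant might report.  It does not call GNU spell at all: the
names DICTIONARY, BAD_WORDS, TOKEN and unrecognized() are made up for
illustration, the dictionary is a tiny stand-in, and the tokenizing
rule (runs of spaces, words with an optional leading "\", runs of
other characters) is an assumption.

```python
import re
import string

# Tiny stand-in for spell's dictionary (assumption; a real run would
# use the full word list).
DICTIONARY = {"the", "of", "is", "we", "take", "over", "all", "sum", "a", "i"}

# Hypothetical "bad words": rejected even when the dictionary has them.
BAD_WORDS = set(string.ascii_lowercase) | {"alpha", "zeta", "sum"}

# A token is a run of spaces, a word with an optional leading
# backslash, a run of other characters, or a lone backslash.
TOKEN = re.compile(r" +|\\?[A-Za-z]+|[^ A-Za-z\\]+|\\")


def unrecognized(text):
    """Return (token, offset) pairs for every token that is rejected."""
    result = []
    for match in TOKEN.finditer(text):
        token = match.group()
        word = token.lstrip("\\").lower()
        known = word in DICTIONARY and word not in BAD_WORDS
        if not known:
            # match.start() is the offset from 0, not a line number.
            result.append((token, match.start()))
    return result
```

For example, unrecognized("the sum x^i is") reports the two spaces
around "sum", the word "sum" itself, and "x", "^", "i" and the space
after it, each with its offset; "the" and "is" are accepted.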

Now go over this list to build up longer strings by putting together
adjacent "unrecognized" words.  A string of arbitrary length made up
only of spaces can be forgotten at this point. You must also record the
offset of the first character in each string from 0.  At the end of this
process you might have a list of strings that looks something like this:

" I "                 0
" a "                 34
" x "                 55
" sum x^i "           79
" (x - alpha)^i "     1002
" I "                 1233
" sum (x - alpha)^i " 2333
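
The merging step could be sketched in Python roughly as follows.  The
function name merge_adjacent() is hypothetical, and the input is
assumed to be a list of (word, offset) pairs sorted by offset, where
two words count as adjacent when one ends exactly where the next
begins.

```python
def merge_adjacent(tokens):
    """Join (token, offset) pairs that touch; drop runs made only of spaces."""
    runs = []
    for token, offset in tokens:
        if runs and runs[-1][1] + len(runs[-1][0]) == offset:
            # This token starts where the previous run ends: extend the
            # run and keep the offset of its first character.
            previous, start = runs[-1]
            runs[-1] = (previous + token, start)
        else:
            runs.append((token, offset))
    # A string made up only of spaces can be forgotten.
    return [(s, start) for s, start in runs if s.strip(" ")]
```

Given the pairs (" ", 3), ("sum", 4), (" ", 7), ("x", 8), ("^", 9),
("i", 10), (" ", 11) and a stray (" ", 20), this returns the single
entry (" sum x^i ", 3).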

Now scan through this list for strings that shouldn't be treated as math
(" a " and " I " should almost surely be treated as text, especially
since they do not appear in any of the other formulas).  Delete these
things from this list.  At the same time you should look for substrings
that should be treated as TeX code and rewrite them appropriately,
e.g. "alpha" would become "\alpha".  Finally, replace the first and last
space in each of the strings left on the list with a "$" and return the
strings to the text in the appropriate places (using the offset numbers
and correcting for the added \'s).
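
A rough Python sketch of this last step follows.  The names TEX_NAMES,
TEXT_ONLY and mark_up() are hypothetical and the list of TeX names is
partial.  It departs from the description in two small ways, both
assumptions of the sketch: the "$" signs go just inside the
surrounding spaces instead of replacing them, so the formula stays
separated from the neighbouring words, and the output is built left to
right from the original offsets, so no correction for the added \'s is
needed.

```python
import re

# Names that get a backslash in front; a partial, illustrative list.
TEX_NAMES = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "sum"]
NAME_RE = re.compile(
    r"(?<![\\A-Za-z])(" + "|".join(TEX_NAMES) + r")(?![A-Za-z])"
)

# Strings that are almost surely ordinary text, not math.
TEXT_ONLY = {"a", "I"}


def mark_up(text, runs):
    """Return text with each remaining (string, offset) run wrapped in $...$."""
    pieces = []
    position = 0
    for s, offset in sorted(runs, key=lambda run: run[1]):
        core = s.strip(" ")
        if core in TEXT_ONLY:
            continue  # deleted from the list; the text is left alone
        lead = len(s) - len(s.lstrip(" "))
        start = offset + lead
        end = start + len(core)
        pieces.append(text[position:start])
        # Rewrite bare names such as "alpha" to "\alpha".
        pieces.append("$" + NAME_RE.sub(r"\\\1", core) + "$")
        position = end
    pieces.append(text[position:])
    return "".join(pieces)
```

With the text "the sum (x - alpha)^i is" and the single run
(" sum (x - alpha)^i ", 3), this gives
"the $\sum (x - \alpha)^i$ is".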

It seems to me that this method should be successful on everything
except for a's and I's used as variable names and confusing words like
"ad". In order to deal with such things adequately you would have to
do some real parsing.  Let's forget about this minor detail right now
and make something that will work 99% of the time!



