bug-gnu-emacs

Possible spurious "range striding over charsets" errors


From: James J. Ramsey
Subject: Possible spurious "range striding over charsets" errors
Date: Fri, 31 Dec 2004 12:09:58 -0800 (PST)

I've been working on a patch for ispell.el so it works
better on UTF-8 files when used with Aspell. The patch
is here:

http://sourceforge.net/tracker/index.php?func=detail&aid=945391&group_id=245&atid=300245

There is a variable called ispell-utf-8-casechars that
contains a string that, when passed through the
function ispell-decode-string (basically a wrapper for
decode-coding-string), becomes a regular expression
for anything that is supposed to be a "letter" of a
word. Here is the original value of the variable,
which worked in Emacs 21.3:

(I apologize in advance if it causes horizontal scroll
problems. Sorry.)

(setq ispell-utf-8-casechars
"[A-Za-z\303\200-\303\226\303\230-\303\266\303\270-\303\277\304\200-\304
\261\304\264-\304\276\305\201-\305\210\305\212-\305\276\306\200-\307\203\307\215
-\307\260\307\264\307\265\307\272-\310\227\311\220-\312\250\312\273-\313\201\316
\206\316\210-\316\212\316\214\316\216-\316\241\316\243-\317\216\317\220-\317\226
\317\232\317\234\317\236\317\240\317\242-\317\263\320\201-\320\214\320\216-\321\
217\321\221-\321\234\321\236-\322\201\322\220-\323\204\323\207\323\210\323\213\3
23\214\323\220-\323\253\323\256-\323\265\323\270\323\271\324\261-\325\226\325\23
1\325\241-\326\206\327\220-\327\252\327\260-\327\262\330\241-\330\272\331\201-\3
31\212\331\261-\332\267\332\272-\332\276\333\200-\333\216\333\220-\333\223\333\2
25\333\245\333\246\340\244\205-\340\244\271\340\244\275\340\245\230-\340\245\241
\340\246\205-\340\246\214\340\246\217\340\246\220\340\246\223-\340\246\250\340\2
46\252-\340\246\260\340\246\262\340\246\266-\340\246\271\340\247\234\340\247\235
\340\247\237-\340\247\241\340\247\260\340\247\261\340\250\205-\340\250\212\340\2
50\217\340\250\220\340\250\223-\340\250\250\340\250\252-\340\250\260\340\250\262
\340\250\263\340\250\265\340\250\266\340\250\270\340\250\271\340\251\231-\340\25
1\234\340\251\236\340\251\262-\340\251\264\340\252\205-\340\252\213\340\252\215\
340\252\217-\340\252\221\340\252\223-\340\252\250\340\252\252-\340\252\260\340\2
52\262\340\252\263\340\252\265-\340\252\271\340\252\275\340\253\240\340\254\205-
\340\254\214\340\254\217\340\254\220\340\254\223-\340\254\250\340\254\252-\340\2
54\260\340\254\262\340\254\263\340\254\266-\340\254\271\340\254\275\340\255\234\
340\255\235\340\255\237-\340\255\241\340\256\205-\340\256\212\340\256\216-\340\2
56\220\340\256\222-\340\256\225\340\256\231\340\256\232\340\256\234\340\256\236\
340\256\237\340\256\243\340\256\244\340\256\250-\340\256\252\340\256\256-\340\25
6\265\340\256\267-\340\256\271\340\260\205-\340\260\214\340\260\216-\340\260\220
\340\260\222-\340\260\250\340\260\252-\340\260\263\340\260\265-\340\260\271\340\
261\240\340\261\241\340\262\205-\340\262\214\340\262\216-\340\262\220\340\262\22
2-\340\262\250\340\262\252-\340\262\263\340\262\265-\340\262\271\340\263\236\340
\263\240\340\263\241\340\264\205-\340\264\214\340\264\216-\340\264\220\340\264\2
22-\340\264\250\340\264\252-\340\264\271\340\265\240\340\265\241\340\270\201-\34
0\270\256\340\270\260\340\270\262\340\270\263\340\271\200-\340\271\205\340\272\2
01\340\272\202\340\272\204\340\272\207\340\272\210\340\272\212\340\272\215\340\2
72\224-\340\272\227\340\272\231-\340\272\237\340\272\241-\340\272\243\340\272\24
5\340\272\247\340\272\252\340\272\253\340\272\255\340\272\256\340\272\260\340\27
2\262\340\272\263\340\272\275\340\273\200-\340\273\204\340\275\200-\340\275\207\
340\275\211-\340\275\251\341\202\240-\341\203\205\341\203\220-\341\203\266\341\2
04\200\341\204\202\341\204\203\341\204\205-\341\204\207\341\204\211\341\204\213\
341\204\214\341\204\216-\341\204\222\341\204\274\341\204\276\341\205\200\341\205
\214\341\205\216\341\205\220\341\205\224\341\205\225\341\205\231\341\205\237-\34
1\205\241\341\205\243\341\205\245\341\205\247\341\205\251\341\205\255\341\205\25
6\341\205\262\341\205\263\341\205\265\341\206\236\341\206\250\341\206\253\341\20
6\256\341\206\257\341\206\267\341\206\270\341\206\272\341\206\274-\341\207\202\3
41\207\253\341\207\260\341\207\271\341\270\200-\341\272\233\341\272\240-\341\273
\271\341\274\200-\341\274\225\341\274\230-\341\274\235\341\274\240-\341\275\205\
341\275\210-\341\275\215\341\275\220-\341\275\227\341\275\231\341\275\233\341\27
5\235\341\275\237-\341\275\275\341\276\200-\341\276\264\341\276\266-\341\276\274
\341\276\276\341\277\202-\341\277\204\341\277\206-\341\277\214\341\277\220-\341\
277\223\341\277\226-\341\277\233\341\277\240-\341\277\254\341\277\262-\341\277\2
64\341\277\266-\341\277\274\342\204\246\342\204\252\342\204\253\342\204\256\342\
206\200-\342\206\202\343\200\207\343\200\241-\343\200\251\343\201\201-\343\202\2
24\343\202\241-\343\203\272\343\204\205-\343\204\254]")

In the patched version of ispell.el,
ispell-decode-string decodes these octet sequences as
UTF-8 and yields the corresponding characters. It's the
last bunch of octets that is of interest:

\343\200\207\343\200\241-\343\200\251\343\201\201-
\343\202\224\343\202\241-\343\203\272\343\204\205-
\343\204\254

These decode to characters in the
mule-unicode-2500-33ff charset, corresponding to the
Unicode code points

U+3007 (IDEOGRAPHIC NUMBER ZERO), U+3021 to U+3029
(HANGZHOU NUMERALS), U+3041 to U+3094 (HIRAGANA
LETTERS), U+30A1 to U+30FA (KATAKANA LETTERS), and
U+3105 to U+312C (BOPOMOFO LETTERS)
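
The mapping from octets to code points can be checked
outside Emacs. The following is only a sketch in Python,
not part of the patch; the helper name
octal_escapes_to_text is made up for this illustration.
It reads the \NNN escapes as raw bytes, decodes them as
UTF-8, and prints each character's code point:

```python
import re

def octal_escapes_to_text(escaped):
    """Read \\NNN octal escapes (plus literal ASCII) as bytes, then decode as UTF-8."""
    raw = bytearray()
    pos = 0
    for m in re.finditer(r"\\([0-7]{3})", escaped):
        # Copy any literal ASCII (here, the "-" range markers) before the escape.
        raw += escaped[pos:m.start()].encode("ascii")
        raw.append(int(m.group(1), 8))
        pos = m.end()
    raw += escaped[pos:].encode("ascii")
    return raw.decode("utf-8")

# The last bunch of octets from ispell-utf-8-casechars, as quoted above.
tail = (r"\343\200\207\343\200\241-\343\200\251\343\201\201-\343\202\224"
        r"\343\202\241-\343\203\272\343\204\205-\343\204\254")

decoded = octal_escapes_to_text(tail)
code_points = ["-" if ch == "-" else "U+%04X" % ord(ch) for ch in decoded]
print(" ".join(code_points))
# prints: U+3007 U+3021 - U+3029 U+3041 - U+3094 U+30A1 - U+30FA U+3105 - U+312C
```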

Emacs 21.3.50, however, complains that the above
ranges stride over charsets, and to mollify Emacs, I
have to change it to

\343\200\207\343\200\241-\343\200\251\343\201\201-
\343\202\223\343\202\224\343\202\241-\343\203\266
\343\203\267-\343\203\272\343\204\205-\343\204\251
\343\204\252-\343\204\254

which corresponds to the same code points as mentioned
above, but distributed as follows:

U+3007 (IDEOGRAPHIC NUMBER ZERO), U+3021 to U+3029
(HANGZHOU NUMERALS), U+3041 to U+3093, U+3094
(HIRAGANA LETTERS), U+30A1 to U+30F6, U+30F7 to U+30FA
(KATAKANA LETTERS), and U+3105 to U+3129, U+312A to
U+312C (BOPOMOFO LETTERS)
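
That the split ranges cover exactly the same code points
as the original ones can be confirmed mechanically. Again
this is only a Python sketch, independent of Emacs; the
function name expand_class is hypothetical, and it
handles only plain characters and simple "a-b" ranges,
which is all that appears in this part of the string:

```python
def expand_class(body):
    """Expand the body of a character class (no brackets) into a set of code points."""
    points = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # "a-b" range: include both endpoints.
            points.update(range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            points.add(ord(body[i]))
            i += 1
    return points

# Ranges that worked in Emacs 21.3.
original = "\u3007\u3021-\u3029\u3041-\u3094\u30a1-\u30fa\u3105-\u312c"
# Ranges split up to satisfy Emacs 21.3.50.
split = ("\u3007\u3021-\u3029\u3041-\u3093\u3094\u30a1-\u30f6"
         "\u30f7-\u30fa\u3105-\u3129\u312a-\u312c")

same = expand_class(original) == expand_class(split)
print(same)  # prints: True
```

So the workaround changes only where the range
boundaries fall, not the set of characters matched.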

Interestingly enough, when I evaluate

(decode-coding-string
"\343\200\207\343\200\241-\343\200\251\343\201\201-\343\202\223\343\202\224\343\202\241-\343\203\266\343\203\267-\343\203\272\343\204\205-\343\204\251\343\204\252-\343\204\254"
'utf-8)

the result suggests that U+3021 to U+3029 (HANGZHOU
NUMERALS), U+3041 to U+3093 (all but one of the
HIRAGANA LETTERS), U+30A1 to U+30F6 (most of the
KATAKANA LETTERS), and U+3105 to U+3129 (most of the
BOPOMOFO LETTERS) are represented by double-wide
characters, while the rest are represented by
single-wide characters.

I am running the X11 version of Emacs on OS X.



        
                