From: Jason Rumney
Subject: Re: highlighting non-ASCII characters
Date: Wed, 24 Mar 2010 13:14:13 +0800
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20100111 Lightning/1.0b1 Thunderbird/3.0.1
On 24/03/2010 12:20, Eli Zaretskii wrote:
> If we go for such a metric, it would need to be augmented by a database of words where a small number of such characters is "normal", not to be highlighted. This is for words like naïve. Otherwise the feature will be an annoyance.
It's also dependent on which characters they are: Cyrillic, Han, Greek, Hebrew, etc. should be expected to appear in long runs, perhaps with runs of ASCII and/or other characters interleaved. Latin-1, on the other hand, would normally appear individually or in very short runs mixed in with ASCII.
There is no single heuristic that can be used to identify "suspicious" characters.
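
To make the run-length point concrete, here is a rough sketch in Python (not Emacs Lisp, and not anything proposed in this thread). It groups text into runs of one character class and flags runs whose length is unusual for that class. The names `char_class` and `suspicious_runs`, the thresholds `min_long_run` and `max_latin1_run`, and the trick of reading the script from the first word of the Unicode character name are all assumptions made for illustration.

```python
import unicodedata
from itertools import groupby

# Scripts expected to appear in long runs (assumption based on the message above).
LONG_RUN_SCRIPTS = {"CYRILLIC", "GREEK", "HEBREW", "CJK"}

def char_class(ch):
    """Rough class of a character: ASCII, LATIN1, a long-run script, or OTHER."""
    if ord(ch) < 0x80:
        return "ASCII"
    if ord(ch) <= 0xFF:
        return "LATIN1"
    # Approximation: take the first word of the Unicode name as the script.
    first_word = unicodedata.name(ch, "UNKNOWN").split()[0]
    return first_word if first_word in LONG_RUN_SCRIPTS else "OTHER"

def suspicious_runs(text, min_long_run=3, max_latin1_run=2):
    """Yield (start, run, class) for runs whose length is unusual for their class."""
    pos = 0
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        if cls in LONG_RUN_SCRIPTS and len(run) < min_long_run:
            # A lone Cyrillic/Greek/Hebrew/Han character amid other text.
            yield pos, run, cls
        elif cls == "LATIN1" and len(run) > max_latin1_run:
            # An unusually long stretch of Latin-1 characters.
            yield pos, run, cls
        pos += len(run)
```

With these thresholds, "naïve" is left alone, while a single Cyrillic letter embedded in an ASCII word is flagged. The sketch also misfires in obvious ways: a legitimate one-letter Cyrillic word surrounded by spaces is flagged too, which illustrates why no single rule of this kind is sufficient.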