From: | Elias Mårtenson |
Subject: | Re: On language-dependent defaults for character-folding |
Date: | Fri, 19 Feb 2016 18:51:47 +0800 |
> The Unicode character decomposition was never meant to be used to provide a feature such as character
> folding in Emacs.
That's not true. Canonical equivalence, which is encoded in canonical
decompositions, is a must for searching. Otherwise, what looks the
same on display will not be found, and will look like a bug. See the
example I gave with ñ and ñ (the latter one is 2 characters).
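To make the ñ example concrete, here is a minimal sketch in Python (not Emacs code; the helper name `canonical_find` is made up for illustration). It assumes that comparing both strings in NFD form is an acceptable stand-in for canonical-equivalence matching:

```python
import unicodedata

# The same visible character, encoded two ways.
precomposed = "\u00f1"   # LATIN SMALL LETTER N WITH TILDE, 1 code point
decomposed = "n\u0303"   # "n" + COMBINING TILDE, 2 code points


def canonical_find(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring test after normalizing both sides to NFD.

    Canonically equivalent sequences have identical NFD forms, so the
    precomposed and decomposed spellings compare equal here.
    """
    nfd_haystack = unicodedata.normalize("NFD", haystack)
    nfd_needle = unicodedata.normalize("NFD", needle)
    return nfd_needle in nfd_haystack


# The buffer text uses the decomposed spelling; the user types the precomposed one.
text = "El ni" + decomposed + "o"

naive_hit = precomposed in text                      # plain code-point comparison
canonical_hit = canonical_find(text, precomposed)    # comparison after NFD

print(len(precomposed), len(decomposed))
print(naive_hit, canonical_hit)
```

The plain comparison misses the match even though both spellings render identically, which is exactly the "looks like a bug" case. Note that this sketch works on whole NFD strings, so a bare "n" needle would also match inside the decomposed ñ; whether that is desirable is a folding policy question, not something normalization decides.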
2 and 3 are the same as we do already, AFAICT. (Collation charts
describe ordering, which is irrelevant for searching; other than that,
you will see that Emacs already implements the data shown in
http://unicode.org/charts/collation/.)
As for the locale-specific parts: using that will only DTRT if we
assume that the majority of searches are done in buffers holding text
in the locale's language. Is that a good assumption? We are talking
about a multilingual Emacs, in an age of global communications, where
you can have conversations with someone on the other side of the
world, or read text that combines several languages in the same
buffer. Do we really want to go back to the l10n days, when there was
only ever one locale that was interesting -- the current one? I
wonder.
It is, Unicode provides it. We just didn't import it yet.
It's more complex than that, but patches are welcome, of course.
Note that the prerequisite for anything more complicated and elaborate
than what we have now is to re-implement character-folding on the C
level, inside search.c functions. The current implementation is at
its limits already. I tried to convince the interested people to do
this in C to begin with, but couldn't, and the feature was important
enough to have even in its current implementation.