Re: On language-dependent defaults for character-folding

On 20 February 2016 at 17:21, Eli Zaretskii <address@hidden> wrote:

Your interpretation is wrong, because every implementation of
character-folding in search uses normalization forms. So if you want
to maintain that whoever does that is abusing normalization forms, you
are not just up against Emacs, you are up against the ICU library and
others. You are also up against http://www.unicode.org/notes/tn5/.

They may do so, but only because we're not exactly swimming in great alternatives.

It is possible that you only see the "equivalence" parts of all these
sources. But in that case, you are actually claiming that folding
characters should never be done at all! "Folding" means mapping
_distinct_ character sequences to the same basic sequence. You start
from a normalization form, then compare the results disregarding
certain secondary, tertiary, etc. differences.

Of course. But the fact that you start from a normalisation form is of secondary relevance here. I think that repeatedly pointing out that the normalised form is used has somewhat clouded the discussion.
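
To make the "start from a normalisation form, then disregard certain differences" step concrete, here is a minimal Python sketch. It is only an illustration and not how Emacs or ICU implement folding; the function name fold_diacritics is made up for this example, and the rule it applies (drop every nonspacing mark) is an assumption, which is exactly the part under debate.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Hypothetical folding rule: decompose, then drop nonspacing marks."""
    # Step 1: start from a normalisation form (canonical decomposition).
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: disregard one class of differences, here all combining
    # marks in general category Mn. Which differences to disregard is
    # a policy decision; the normalisation form does not make it.
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

# Both spellings of the word fold to the same base sequence.
print(fold_diacritics("ma\u00f1ana") == fold_diacritics("man\u0303ana"))
```

Note that the normalisation call only prepares the input. The decision to treat the tilde as ignorable lives entirely in the filter on the last line of the function.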

When you say "ignoring [...] differences", how do you determine those differences?

> Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my intention),
> the purpose of normalisation forms is to ensure that the two variants of ñ compare the same. They are not
> designed to provide a mechanism to allow n to compare equal to ñ.

Under character-folding that ignores diacritics, ñ should indeed
compare equal to n.

Yes again. But how do you determine what rules to apply?
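
For reference, the distinction I keep coming back to can be sketched in a few lines of Python. This is just an illustration using the standard unicodedata module; the helper name canonically_equal is invented for the example.

```python
import unicodedata

precomposed = "\u00f1"   # ñ as a single code point
combining = "n\u0303"    # n followed by COMBINING TILDE

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The two variants of ñ differ as raw code point sequences...
print(precomposed == combining)
# ...but normalisation makes them compare the same.
print(canonically_equal(precomposed, combining))
# Normalisation alone does not make plain n equal to ñ.
print(canonically_equal("n", precomposed))
```

Making the last comparison succeed needs an additional rule on top of normalisation, and that rule is what I am asking about.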

> Sure, but doesn't it make sense to fall back to the user's default if the buffer does not have an overriding
> locale?

I don't know what you mean by "buffer has an overriding locale".
Emacs buffers don't have a locale, and they cannot do that in
principle because we support multiple languages. E.g., what could the
locale of the HELLO buffer created by "C-h H" be?

I was not talking about what Emacs does today. I was speaking about the hypothetical case where buffers can have unique locales. I can see a few cases where that would be a neat thing to have, but I have to scrape the barrel to do so.

> As opposed to having no concept of locale at all?

Yes. A multilingual environment cannot have a locale in principle.
It will cease being multilingual if it does.

I guess we'll have to agree to disagree about this one. In any case, it's for a different thread.

> Strange, I always thought the data was there. Perhaps you should ask
> a question on the Unicode mailing list, then.
>
> That's a good idea actually.

That's a relief. I was beginning to suspect I don't have any good
ideas at all.

Apparently I have given the impression that I think your ideas are garbage. I profoundly apologise for this and will try to be better going forward.

Regards,

Elias

From:	Elias Mårtenson
Subject:	Re: On language-dependent defaults for character-folding
Date:	Sat, 20 Feb 2016 18:08:20 +0800