Re: On language-dependent defaults for character-folding
From: Eli Zaretskii
Subject: Re: On language-dependent defaults for character-folding
Date: Tue, 23 Feb 2016 18:56:36 +0200
> From: Achim Gratz <address@hidden>
> Date: Sun, 21 Feb 2016 09:14:18 +0100
>
> Elias Mårtenson writes:
> > Because under the Unicode decomposition rules, ø is not decomposable. I
> > can't explain why that is the case (probably because there is no reason to
> > have a combining /. After all, the only languages that use ø are languages
> > that use it as a character of its own).
>
> AFAIK, for combining characters to be composable/decomposable the glyphs
> must not overlap. This is the same issue as with the Polish »ł« to the
> best of my knowledge.
The definitive answer is here, for those interested:
http://www.unicode.org/mail-arch/unicode-ml/y2016-m02/0106.html
> In other words, unicode composition/decomposition rules tell you more
> about the glyph construction than they do about useful strategies to
> search for multiple characters.
That conclusion is too radical, IMO. You will see in the above
message that the criterion you describe was just a means for the UTC
to draw a line somewhere, i.e. it was an ad-hoc rule more than
anything else.
> The idea of using the base character of the canonical decomposition
> in the search might still yield a useful shortcut in most cases, but
> I'm not sure it is correct in all languages even when that
> decomposition exists and, as the examples show, there are cases
> where the non-decomposed character has to be treated specially.
Language-specific tailoring is indeed needed for best results, but the
language-independent decompositions have their place. E.g., you will
see in the Unicode collation database (UCA) a file named decomps.txt
that is basically a list of decompositions from UnicodeData.txt with
additions specifically for collation, searching, and matching
(including ł, btw). This tells me that the decomposition data in
UnicodeData.txt is a good basis for these features; it is not just
about glyph construction.
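
To make the point concrete, here is a rough sketch in Python (not Emacs
code, and not how char-fold is implemented) that uses the standard
unicodedata module, which is built from UnicodeData.txt. It shows that
ø and ł have no canonical decomposition, and how a folding table could
start from the decomposition data and then add extra entries on top.
The names canonical_base, EXTRA_FOLDS and fold are made up for this
illustration, and the extra entries are my own assumption, not a copy
of decomps.txt.

```python
import unicodedata

def canonical_base(ch: str) -> str:
    """Return the first character of ch's full canonical decomposition."""
    return unicodedata.normalize("NFD", ch)[0]

# Hypothetical additions for characters that UnicodeData.txt does not
# decompose; chosen for illustration only.
EXTRA_FOLDS = {
    "\u00f8": "o",  # ø LATIN SMALL LETTER O WITH STROKE
    "\u00d8": "O",  # Ø LATIN CAPITAL LETTER O WITH STROKE
    "\u0142": "l",  # ł LATIN SMALL LETTER L WITH STROKE
    "\u0141": "L",  # Ł LATIN CAPITAL LETTER L WITH STROKE
}

def fold(text: str) -> str:
    """Fold each character: extra table first, then canonical base."""
    return "".join(EXTRA_FOLDS.get(ch) or canonical_base(ch) for ch in text)

# An empty string means "no decomposition mapping in UnicodeData.txt".
print(repr(unicodedata.decomposition("\u00f8")))  # ø
print(repr(unicodedata.decomposition("\u0142")))  # ł
print(repr(unicodedata.decomposition("\u00e9")))  # é
print(fold("\u0141\u00f3d\u017a"))                # Łódź
```

This folding is language-independent and is applied to every
decomposable character, so a real implementation would still want
per-language tailoring on top of it.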