Re: ignoring combining diacritics in isearch
From: Eli Zaretskii
Subject: Re: ignoring combining diacritics in isearch
Date: Wed, 23 Nov 2022 20:02:52 +0200
> From: Robert Pluim <rpluim@gmail.com>
> Date: Wed, 23 Nov 2022 18:27:25 +0100
>
> Over on Stack Overflow, someone has been trying to get char-folded
> isearch working for Arabic, and has been having some issues because
> char-folding only works for equivalent characters, not base characters
> followed by combining characters. So eg searching for 'ee' when the
> buffer contains
>
> éé
>
> (thatʼs 'e' followed by COMBINING ACUTE ACCENT) fails.
>
> The following patch fixes that, but itʼs a bit of a sledgehammer (the
> "\\c^*" bit probably needs to be configurable, because there are
> diacritic-like codepoints in Arabic that are not combining, such as
> U+0640 ARABIC TATWEEL)

Yes, this is definitely not the way. There are many more "foldings" that
Latin scripts don't know about. For example, it should be possible to fold
the initial, medial, and final forms of letters that exist in some scripts
(including Arabic).

I think we've all but reached the limit to which this quasi-folding via
regexps can be stretched. Writing regexps by hand or semi-mechanically based
on Unicode properties can only go so far. _Real_ character folding cannot
work this way. We should work on infrastructure for folding text for search
purposes, and then we can build features on top of that.
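
To make the distinction concrete, here is a rough sketch of what "folding
text for search purposes" could look like, as opposed to expanding the search
string into a regexp. It is written in Python rather than Emacs Lisp, it is
not code from this thread or from Emacs, and the names fold_key,
folded_contains and TATWEEL are hypothetical. It assumes that a search key
can be approximated by Unicode compatibility decomposition (NFKD) followed by
dropping nonspacing marks and U+0640 ARABIC TATWEEL, which is only one
possible folding policy.

```python
import unicodedata

# ARABIC TATWEEL: diacritic-like, but not a combining character, so it
# has to be handled separately from the nonspacing marks.
TATWEEL = "\u0640"


def fold_key(text):
    """Hypothetical search key for TEXT (an assumed policy, not Emacs's)."""
    # NFKD splits precomposed letters into base + combining mark and maps
    # Arabic presentation forms (initial/medial/final) to the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch
        for ch in decomposed
        # "Mn" is the Unicode general category for nonspacing marks.
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


def folded_contains(haystack, needle):
    """Return True if NEEDLE occurs in HAYSTACK after folding both sides."""
    return fold_key(needle) in fold_key(haystack)


if __name__ == "__main__":
    buffer_text = "e\u0301e\u0301"  # 'e' + COMBINING ACUTE ACCENT, twice
    print("ee" in buffer_text)                 # plain substring search
    print(folded_contains(buffer_text, "ee"))  # search on folded keys
```

Both the buffer text and the search string go through the same folding, so
no per-script regexp has to be written. What the sketch leaves out is mapping
a match in the folded text back to positions in the original buffer, which is
one of the things real infrastructure would have to provide.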