Re: ignoring combining diacritics in isearch

emacs-devel

From:	Eli Zaretskii
Subject:	Re: ignoring combining diacritics in isearch
Date:	Wed, 23 Nov 2022 20:02:52 +0200

> From: Robert Pluim <rpluim@gmail.com>
> Date: Wed, 23 Nov 2022 18:27:25 +0100
> 
> Over on Stack Overflow, someone has been trying to get char-folded
> isearch working for Arabic, and has been having some issues because
> char-folding only works for equivalent characters, not base characters
> followed by combining characters. So, e.g., searching for 'ee' when the
> buffer contains
> 
>     éé
> 
> (thatʼs 'e' followed by COMBINING ACUTE ACCENT) fails.
> 
> The following patch fixes that, but itʼs a bit of a sledgehammer (the
> "\\c^*" bit probably needs to be configurable, because there are
> diacritic-like codepoints in Arabic that are not combining, such as
> U+0640 ARABIC TATWEEL)

Yes, this is definitely not the way.  There are many more "foldings" that
Latin scripts don't know about.  For example, it should be possible to fold
the initial, medial, and final forms of letters that exist in some scripts
(including Arabic).
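
To make the positional-forms case concrete, here is a minimal sketch in
Python (not Emacs code). It assumes the forms in question are the
compatibility code points in the Arabic Presentation Forms-B block, and
that Unicode compatibility normalization (NFKC) is an acceptable way to
fold them. The names BEH, FORMS, and fold_presentation_forms are
hypothetical, chosen only for illustration.

```python
import unicodedata

# ARABIC LETTER BEH and its four presentation forms (Presentation Forms-B).
BEH = "\u0628"
FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def fold_presentation_forms(text):
    """Map compatibility presentation forms back to their base letters.

    NFKC applies the compatibility decompositions from the Unicode
    Character Database, which is where the <initial>, <medial>, <final>
    and <isolated> mappings are recorded.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for name, form in FORMS.items():
        folded = fold_presentation_forms(form)
        print(name, "U+%04X" % ord(form), "->", "U+%04X" % ord(folded))
```

A regexp-based approach would have to enumerate these equivalences per
letter, whereas a folding step applied to both the search string and the
buffer text handles them uniformly.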

I think we've all but reached the limit to which this quasi-folding via
regexps can be stretched.  Writing regexps by hand or semi-mechanically, based
on Unicode properties, can only go so far.  _Real_ character folding cannot
work this way.  We should work on infrastructure for folding text for search
purposes, and then we can build features on top of that.
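
As a rough illustration of "folding text for search purposes" rather than
building a regexp, the sketch below (Python, not Emacs code) folds both
the needle and the haystack to a common key and then compares the keys.
It assumes that NFKD decomposition, dropping characters with a non-zero
canonical combining class, and case folding are a reasonable
approximation of the desired folding. The functions fold_for_search and
folded_find, and the extra_ignorable parameter, are hypothetical names;
the parameter is there because of the U+0640 ARABIC TATWEEL case raised
in the quoted message, which is not a combining character.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL: diacritic-like, but not combining


def fold_for_search(text, extra_ignorable=(TATWEEL,)):
    """Return a folded key for TEXT, suitable for comparison only.

    Steps: decompose (NFKD), drop combining marks (non-zero canonical
    combining class), drop any extra ignorable characters, then casefold.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    kept = [
        ch
        for ch in decomposed
        if not unicodedata.combining(ch) and ch not in extra_ignorable
    ]
    return "".join(kept).casefold()


def folded_find(needle, haystack):
    """Return True if NEEDLE occurs in HAYSTACK after folding both."""
    return fold_for_search(needle) in fold_for_search(haystack)


if __name__ == "__main__":
    # 'e' followed by COMBINING ACUTE ACCENT, twice.
    decomposed = "e\u0301e\u0301"
    # Precomposed LATIN SMALL LETTER E WITH ACUTE, twice.
    precomposed = "\u00e9\u00e9"
    print(folded_find("ee", decomposed))
    print(folded_find("ee", precomposed))
```

This only answers whether a match exists. A real implementation would
also need to map positions in the folded key back to positions in the
original text, which is part of why this calls for infrastructure and
not another regexp tweak.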

Current Thread

ignoring combining diacritics in isearch, Robert Pluim, 2022/11/23
- Re: ignoring combining diacritics in isearch, Juri Linkov, 2022/11/23
- Re: ignoring combining diacritics in isearch, Eli Zaretskii <=