From: Juri Linkov
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Fri, 06 Feb 2015 02:54:45 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (x86_64-pc-linux-gnu)
> Something essentially identical to this was being discussed here a
> couple of weeks ago. Look for the thread "Single quotes in Info". I
> wrote a small elisp solution for building this into isearch (which you
> can find on the "scratch/isearch-character-group-folding" branch). It
> took a different approach to yours, relating characters to regexp, but
> it works.
I see that your branch contains nothing more than what was already
implemented a long time ago in bug#13041, where the major stumbling block
was the inefficiency of the regexp-based solution. Could you help improve it?
> The bright side is that I think this two-char way of writing latin
> accents is much less common (not 100% sure though, it's hard to tell
> the difference). The downside is that I know nothing about other
> languages, so maybe using two chars to represent one char is the
> default behavior in some other languages?
As https://emacs.stackexchange.com/q/7992/478 indicates,
other languages require inserting or deleting special characters
such as diacritics and accents in the search string or buffer for
normalization. When looking for a solution, I recommend checking
ucs-normalize. For example, by evaluating:
(require 'ucs-normalize)
ucs-normalize-combining-chars
you can see exactly the same characters
1616 1615 1619 1648 1618 1612 1613 1611 1617 1614
mentioned in https://emacs.stackexchange.com/a/8001/478
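A quick way to confirm this from Lisp is sketched below. It assumes that `ucs-normalize-combining-chars' is bound to a list of character codes once ucs-normalize is loaded, and the variable name `my-quoted-marks' is hypothetical, introduced only for this illustration:

```elisp
(require 'ucs-normalize)
(require 'cl-lib)

;; Hypothetical helper variable: the ten code points quoted above.
(defvar my-quoted-marks
  '(1616 1615 1619 1648 1618 1612 1613 1611 1617 1614)
  "Code points copied from the message, for illustration only.")

;; Non-nil if every quoted code point is one of the combining
;; characters known to ucs-normalize.
(cl-every (lambda (c) (memq c ucs-normalize-combining-chars))
          my-quoted-marks)

;; Independent cross-check: each one has the Unicode general
;; category Mn (nonspacing mark).
(cl-every (lambda (c) (eq (get-char-code-property c 'general-category) 'Mn))
          my-quoted-marks)
```

The second form does not depend on ucs-normalize at all; it only asks the Unicode character database that ships with Emacs.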
Using its corresponding regexp `ucs-normalize-combining-chars-regexp'
is easy in isearch, e.g.:
;; Decomposition search for accented letters.
(define-key isearch-mode-map "\M-sd" 'isearch-toggle-decomposition)

(defun isearch-toggle-decomposition ()
  "Toggle Unicode decomposition searching on or off."
  (interactive)
  (setq isearch-word (unless (eq isearch-word 'isearch-decomposition-regexp)
                       'isearch-decomposition-regexp))
  (if isearch-word (setq isearch-regexp nil))
  (setq isearch-success t isearch-adjusted t)
  (isearch-update))

(defun isearch-decomposition-regexp (string &optional _lax)
  "Return a regexp that matches decomposed Unicode characters in STRING."
  ;; Drop the trailing "+" to get a regexp matching one combining character.
  (let ((accents (substring ucs-normalize-combining-chars-regexp 0 -1)))
    (mapconcat
     (lambda (c0)
       ;; Quote each base character so regexp specials match literally.
       (concat (regexp-quote (string c0)) accents "?"))
     ;; Strip combining characters from the search string first.
     (replace-regexp-in-string accents "" string) "")))

(put 'isearch-decomposition-regexp 'isearch-message-prefix "deco ")
But this is less efficient than properly implementing it using case tables.
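
To see what such a regexp matches outside of isearch, here is a minimal standalone sketch. It restates the regexp builder under the hypothetical name `my-deco-regexp' so the snippet runs on its own, and it assumes that `ucs-normalize-combining-chars-regexp' ends in "+" as the code above relies on:

```elisp
(require 'ucs-normalize)

;; Hypothetical standalone copy of the regexp builder, for illustration.
(defun my-deco-regexp (string)
  "Return a regexp matching STRING with optional combining marks."
  (let ((accents (substring ucs-normalize-combining-chars-regexp 0 -1)))
    (mapconcat (lambda (c)
                 (concat (regexp-quote (string c)) accents "?"))
               (replace-regexp-in-string accents "" string)
               "")))

;; "caf\u00e9" is "cafe" with a precomposed e-acute.  NFD splits the
;; last letter into "e" followed by U+0301 COMBINING ACUTE ACCENT.
(let ((re (concat "\\`" (my-deco-regexp "cafe") "\\'"))
      (decomposed (ucs-normalize-NFD-string "caf\u00e9")))
  (list (string-match-p re decomposed)     ; matches the decomposed text
        (string-match-p re "caf\u00e9")))  ; no match on precomposed text
```

The second result illustrates a limit of this approach: the regexp only skips over combining marks that are present as separate characters, so precomposed letters in the buffer are not matched by their unaccented base letter.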