From: | Paul Eggert |
Subject: | bug#55331: Improved support for combining diacritics |
Date: | Mon, 9 May 2022 11:30:28 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 |
On 5/8/22 23:38, Benson Muite wrote:
When using grep -E "\s[a-z\`\'āáàēéèīíìịị̄ị́ị̀ōóòọọ̄ọọ́ọ̀ūúùụ̄ụ́ụ̀n̄ńǹm̄ḿm̀]{4}$" to extract 4 letter Igbo words
The {4} means "4 characters", not "4 letters", and a combining character counts as a character.
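To illustrate the point about counting, here is a minimal Python sketch (Python is used only for illustration; it is not part of grep). It assumes that "character" here means a Unicode code point, and the helper name `describe` is made up for this example.

```python
import unicodedata

def describe(text):
    """Return (number of code points, number of combining marks) in text."""
    # unicodedata.combining() returns a nonzero class for combining marks.
    marks = sum(1 for ch in text if unicodedata.combining(ch))
    return len(text), marks

precomposed = "\u00e1"   # 'á' stored as one code point
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

print(describe(precomposed))
print(describe(decomposed))
```

Both strings render as the same letter, but the decomposed form occupies two code points, so it uses up two of the four repetitions allowed by {4}.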
It might be nice for 'grep' to have ways to perform Unicode normalization before matching. In the meantime perhaps you can get what you want by normalizing the text before running it through 'grep'.
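As a rough sketch of the "normalize first" workaround, the following Python snippet composes the text to NFC before matching. It is only an illustration under stated assumptions: Python's `re` and `unicodedata` modules stand in for grep, the helper `nfc` and the sample string are invented for this example, and the character class is cut down to two accented letters.

```python
import re
import unicodedata

def nfc(text):
    """Compose base letters and combining marks where Unicode allows it."""
    return unicodedata.normalize("NFC", text)

# Simplified stand-in for the bracket expression: a-z plus precomposed à and á.
four_letters = re.compile("[a-z\u00e0\u00e1]{4}")

# Sample string typed in decomposed form: a, U+0301, k, w, a, U+0300.
decomposed = "a\u0301kwa\u0300"

print(len(decomposed), bool(four_letters.fullmatch(decomposed)))
composed = nfc(decomposed)
print(len(composed), bool(four_letters.fullmatch(composed)))

# Caveat: some combinations have no precomposed form, so NFC leaves
# a combining mark behind (i + dot below + macron stays two code points).
print(len(nfc("i\u0323\u0304")))
```

Normalization only helps for letters that have a precomposed code point. Sequences that remain decomposed after NFC still count as more than one character, so a letter-by-letter count for those would need a different approach.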