I am trying to use the irregex unit to match regular expressions against
UTF-8 text. Does anyone know a way to get code point indices, rather
than byte indices, for a match?
For example:
(irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))
returns 6 when I want it to return 3, since there are 3 characters (6 bytes) before my match.
I tried (use utf8), but it is documented not to affect irregex, and
sure enough it doesn't. I also tried passing the 'utf8 option when
compiling the regex, but that doesn't change the index returned by
irregex-match-start-index either.
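One possible workaround is to convert the byte offset to a code point offset after the match. This is a minimal sketch, not part of irregex: byte-index->char-index and utf8-continuation-byte? are hypothetical helper names. It assumes byte-oriented strings, as in CHICKEN 4/5 with the utf8 egg not loaded in the same module, so that string-ref returns one byte per char. It also assumes the input is valid UTF-8. The helper counts every byte before the offset that is not a UTF-8 continuation byte (#x80 to #xBF), because each code point begins with exactly one such byte.

```scheme
;; Hypothetical helpers (not part of irregex).
;; ASSUMPTION: strings are byte-oriented (CHICKEN 4/5 without the utf8
;; egg loaded here), so string-ref returns a single byte as a char,
;; and the string holds valid UTF-8.

(define (utf8-continuation-byte? c)
  ;; Continuation bytes have the bit pattern 10xxxxxx, i.e. #x80-#xBF.
  (let ((b (char->integer c)))
    (and (>= b #x80) (< b #xC0))))

(define (byte-index->char-index str byte-index)
  ;; Count the bytes in [0, byte-index) that start a code point,
  ;; which is the number of characters before byte-index.
  (let loop ((i 0) (count 0))
    (if (>= i byte-index)
        count
        (loop (+ i 1)
              (if (utf8-continuation-byte? (string-ref str i))
                  count
                  (+ count 1))))))
```

Under those assumptions, calling (byte-index->char-index s (irregex-match-start-index m)), with s being "čččČččč" and m the match from the example above, should give 3 instead of 6. The scan is linear in the byte offset, so for many lookups on one long string it may be worth caching the running count.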