[Chicken-users] Codepoint indices for matched regexps (UTF-8)?

chicken-users



From:	Henry Hu
Subject:	[Chicken-users] Codepoint indices for matched regexps (UTF-8)?
Date:	Fri, 15 Jun 2018 09:44:14 -0400

Hello world!

I am trying to use unit irregex to match regular expressions in UTF-8 text. Does anyone know a way to get codepoint indices rather than byte indices for a match?

For example:

(irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))

returns 6 when I want it to return 3, since there are 3 characters (6 bytes) before my match.

I tried (use utf8), but the documentation says it does not affect irregex, and sure enough it doesn't. I also tried the 'utf8 option when compiling my regex, but it doesn't change the index returned by irregex-match-start-index.
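One possible workaround, offered as a sketch rather than an irregex feature: convert the byte index to a codepoint index after the match by counting the bytes in the prefix that begin a UTF-8 sequence. This assumes strings are plain byte sequences, as irregex sees them in core CHICKEN 4 when the utf8 egg's string-ref is not in effect. The name `byte-index->codepoint-index` is a hypothetical helper, not an existing procedure.

```scheme
;; Hypothetical helper (not part of irregex): turn a byte offset into a
;; UTF-8 encoded string into a codepoint offset.
;; Assumption: string-ref returns one byte per call (core CHICKEN 4/5
;; byte strings, without the utf8 egg's redefinitions).
(define (byte-index->codepoint-index str byte-index)
  (let loop ((i 0) (count 0))
    (if (>= i byte-index)
        count
        (let ((b (char->integer (string-ref str i))))
          ;; UTF-8 continuation bytes look like 10xxxxxx (#x80-#xBF);
          ;; every other byte starts a new codepoint, so count it.
          (loop (+ i 1)
                (if (and (>= b #x80) (< b #xC0))
                    count
                    (+ count 1)))))))
```

Under that assumption, passing the string "čččČččč" and the byte index 6 returned by irregex-match-start-index should yield 3, since each "č" occupies two bytes. The cost is linear in the match offset, so for many matches on long strings it may be worth converting offsets incrementally instead.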

Thank you for any ideas you might have!

Current thread:

[Chicken-users] Codepoint indices for matched regexps (UTF-8)?, Henry Hu (this message)
- Re: [Chicken-users] Codepoint indices for matched regexps (UTF-8)?, John Cowan, 2018/06/15
  - [Chicken-users] Strange behaviour of "(use (prefix <module> <prefix>))", Martin Schneeweis, 2018/06/19
    - Re: [Chicken-users] Strange behaviour of "(use (prefix <module> <prefix>))", Martin Schneeweis, 2018/06/19
    - [Chicken-users] Recursive Types?, Martin Schneeweis, 2018/06/22
    - [Chicken-users] dbc (design-by-contract-egg) related problem, Martin Schneeweis, 2018/06/22
    - Re: [Chicken-users] Recursive Types?, Martin Schneeweis, 2018/06/22
    - [Chicken-users] egg-index-4, Martin Schneeweis, 2018/06/25

Prev by Date: Re: [Chicken-users] context sensitive auto-completion for symbols in SciTE
Next by Date: Re: [Chicken-users] context sensitive auto-completion for symbols in SciTE
Previous by thread: [Chicken-users] Call for draft papers for presentation at IFL 2018 (Implementation and Application of Functional Languages)
Next by thread: Re: [Chicken-users] Codepoint indices for matched regexps (UTF-8)?
Index(es):
- Date
- Thread