From: Peter Bex
Subject: Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want
Date: Fri, 20 Jul 2012 13:33:22 +0200
User-agent: Mutt/

On Fri, Jul 20, 2012 at 03:05:39PM +0400, Дмитрий wrote:
> Hello.
> Does IrRegex support Unicode character classes?

Generally, it does, and there are at least a few tests for these.
However, I've never worked with these kinds of characters myself,
so I don't know how well they're supported. The docs also explicitly
have a warning that case insensitive matches do not work for non-ASCII
characters, so YMMV.

> E.g. Will IrRegex consider accented letters (á) or Cyrillic letters (я) as 
> "alpha"? Will IrRegex consider the Chinese wide space ( ) as "space"? Will IrRegex 
> consider Chinese brackets (「」【】) as "punct"?

No, almost all of the named character classes are ASCII-only.

> If it doesn't, the regexp is going to be EXTREMELY messy [in fact, I believe 
> it may be better to build such a regexp automatically then].

There are a few (undocumented?!) "helper" character classes like
utf8-tail-char, utf8-2-char, utf8-3-char and utf8-4-char.  See the source
for details.

I don't know what Alex's plan is for UTF-8 support, but if you're willing
to put in the effort to define character classes for the ranges you
mentioned, you could perhaps contribute them to the (upstream) irregex
project.  If the definitions of these sets turn out to be big, maybe we
could turn them into an optional add-in library.
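Both messages suggest generating the classes from code point ranges rather than writing them out by hand. A minimal sketch of that idea follows. It is written in Python only so that it is easy to run; it does not use or imitate irregex's API, and the range tables (`ALPHA_RANGES`, `SPACE_RANGES`, `PUNCT_RANGES`) and the helper `char_class` are hypothetical. The tables cover just the characters mentioned in the question (á, я, the ideographic space, 「」【】), not the full Unicode "alpha", "space" and "punct" categories.

```python
import re

# Hypothetical, hand-picked range tables covering only the examples from the
# question. Each entry is an inclusive (first, last) pair of code points.
ALPHA_RANGES = [
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x00C0, 0x00D6),  # Latin-1 letters, e.g. á (U+00E1) ...
    (0x00D8, 0x00F6),  # ... skipping U+00D7 (×) and U+00F7 (÷)
    (0x00F8, 0x00FF),
    (0x0400, 0x04FF),  # Cyrillic block, e.g. я (U+044F); includes a few non-letters
]
SPACE_RANGES = [
    (0x0009, 0x000D),  # ASCII tab, newline, vertical tab, form feed, CR
    (0x0020, 0x0020),  # ASCII space
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]
PUNCT_RANGES = [
    (0x300C, 0x300D),  # corner brackets 「 」
    (0x3010, 0x3011),  # black lenticular brackets 【 】
]


def char_class(ranges):
    """Turn (first, last) code point pairs into a regex character class.

    Code points are written as \\UXXXXXXXX escapes, which Python's re module
    accepts inside patterns, so no character needs extra quoting.
    """
    parts = []
    for first, last in ranges:
        if first == last:
            parts.append("\\U%08X" % first)
        else:
            parts.append("\\U%08X-\\U%08X" % (first, last))
    return "[" + "".join(parts) + "]"


ALPHA = re.compile(char_class(ALPHA_RANGES))
SPACE = re.compile(char_class(SPACE_RANGES))
PUNCT = re.compile(char_class(PUNCT_RANGES))

if __name__ == "__main__":
    print(bool(ALPHA.fullmatch("á")), bool(ALPHA.fullmatch("я")))  # True True
    print(bool(SPACE.fullmatch("\u3000")))                          # True
    print(all(PUNCT.fullmatch(c) for c in "「」【】"))              # True
    # Split on the custom "space" class, the way string-split splits on ASCII space.
    print(re.split(SPACE.pattern + "+", "Привет\u3000мир"))         # ['Привет', 'мир']
```

The same tables could in principle be emitted as irregex SRE char-set forms instead of a string pattern; that translation is left out here because it has not been checked against irregex's UTF-8 handling.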

> I’m on Windows, so I can’t check it (when I use a UTF-8 console via chcp 65001, 
> for some reason Chicken seems to fail on every operation involving a 
> non-ASCII string, even on a simple (display "Привет")).

This could be due to terminal and locale settings.

"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth