Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want

From: Charles Hixson
Subject: Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want
Date: Fri, 20 Jul 2012 11:19:22 -0700
On 07/20/2012 04:05 AM, Дмитрий wrote:

Does IrRegex support Unicode character classes? E.g., will IrRegex consider accented letters (á) or Cyrillic
letters (я) as "alpha"? Will IrRegex consider the Chinese wide space ( ) as "space"? Will
IrRegex consider Chinese brackets (「」【】) as "punct"? If it doesn't, the regexp is going to be
EXTREMELY messy [in fact, I believe it may be better to build such a regexp automatically then].

I’m on Windows, so I can’t check it (when I use a UTF-8 console via chcp 65001, for some
reason Chicken seems to fail on every operation involving a non-ASCII string, even on a
simple (display "Привет")).

Yours sincerely,
Dmitry Kushnariov

As I said, I'm a neophyte. My "character classes" were based around [a-zA-Z] etc., so you can readily see why the pattern would quickly have become unreasonably complex. I didn't find any definition of other character classes (well, not one that meant anything), and given the discussion, I think that they wouldn't have worked even if I'd gotten to the point of testing them.

I was planning on using Chicken to learn Scheme, since R7RS is supposed to be based more on R5RS than on R6RS, but maybe it's better to learn using Racket. I *trust* the conversion won't be too difficult. (I *do* need to use UTF-8 in lots of places, and an incomplete implementation while I was learning would be ... unpleasant. Particularly if the user documentation presumed that it *was* complete.)

Charles Hixson

