Re: [Chicken-users] Neophyte in scheme: string-split not quite what I wa
From: Alex Shinn
Subject: Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want
Date: Sat, 21 Jul 2012 10:41:51 +0900
On Sat, Jul 21, 2012 at 3:19 AM, Charles Hixson
<address@hidden> wrote:
> On 07/20/2012 04:05 AM, Дмитрий wrote:
>>
>> Hello.
>>
>> Does IrRegex support Unicode character classes? E.g. Will IrRegex consider
>> accented letters (á) or Cyrillic letters (я) as "alpha"? Will IrRegex
>> consider Chinese wide space ( ) as "space"? Will IrRegex consider Chinese
>> brackets (「」【】) as "punct"? If it doesn't, the regexp is going to be
>> EXTREMELY messy [in fact, I believe it may be better to generate such a
>> regexp automatically in that case].
>>
>> I’m on Windows, so I can’t check it (when I use a UTF-8 console via chcp
>> 65001, for some reason Chicken seems to fail on every operation involving a
>> non-ASCII string, even a simple (display "Привет")).
>>
>>
>> --
>> Yours sincerely,
>> Dmitry Kushnariov
>>
>>
>
> As I said, I'm a neophyte. My "character classes" were based around
> [a-zA-Z] etc. So you can readily see why the pattern would have quickly
> become unreasonably complex. I didn't find any definition of other
> character classes (well, not one that meant anything) and given the
> discussion, I think that they wouldn't have worked if I'd gotten to the
> point of testing them.
>
> I was planning on using Chicken to learn Scheme, since R7RS is supposed to
> be based more on R5RS than on R6RS, but maybe it's better to learn using
> Racket. I *trust* the conversion won't be too difficult. (I *do* need to
> use utf-8 in lots of places, and an incomplete implementation while I was
> learning would be ... unpleasant. Particularly if the user documentation
> presumed that it *was* complete.)
The utf8 implementation is not incomplete. It's just
not the default.
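
For illustration only, here is a minimal sketch of what opting in might
look like. It assumes CHICKEN 4 with the utf8 egg installed (for example
via chicken-install utf8); the variable name is made up for the example,
and whether a given IrRegex character class covers non-ASCII letters is
not something this sketch demonstrates.

```scheme
;; Sketch under stated assumptions: CHICKEN 4 plus the utf8 egg.
;; On CHICKEN 5 the equivalent form would be (import utf8).
(use utf8)

;; Hypothetical sample string, taken from the quoted message.
(define greeting "Привет")

;; Once utf8 is loaded, the string procedures it re-exports are meant to
;; work on characters rather than on raw bytes.
(display (string-length greeting))  ; character count, not byte count
(newline)
(display (string-ref greeting 0))   ; first character, not first byte
(newline)
```

Without the (use utf8) line the same calls operate on bytes, so the
reported length would be the size of the UTF-8 encoding instead.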
--
Alex