


From: Дмитрий
Subject: Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want
Date: Sat, 21 Jul 2012 14:34:17 +0400

Hello!

> As I said, I'm a neophyte. My "character classes" were based around
> [a-zA-Z] etc. So you can readily see why the pattern would have
> quickly become unreasonably complex.
If you don't need any exotic characters, just ASCII (and perhaps a small
subset of Unicode), the character classes are quite simple:
(use irregex utf8)

; Cyrillic letters range (the Unicode Cyrillic block is U+0400..U+04FF):
(define cyrl '(/ #\u0400 #\u04FF))

(define (split-into-classes s)
  (irregex-extract `(or (+ (or alpha ,cyrl)) (+ num)
                       (+ punct) (+ white)
                       (+ (~ alpha num punct white ,cyrl))) s))

Note that I'm something of a neophyte myself, so there may be a better way
to do this. :)

You can then use the procedure like this:
; On Linux/Cygwin you can type "Hello world! Да." directly, but not in the
; Windows console:
(split-into-classes "Hello world! \u0414\u0430.")
=> ("Hello" " " "world" "!" " " "Да" ".")

Extending this procedure to cover all of Unicode, however, would be tricky.
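As a rough sketch of one way around hard-coding character ranges (assuming an R7RS-style Scheme; this is not the irregex approach above), you can classify each character with the standard predicates char-alphabetic?, char-numeric? and char-whitespace? and then group consecutive characters of the same class. The helper names char-class and split-by-class are made up for this illustration. Whether non-ASCII letters such as Cyrillic count as alphabetic depends on the implementation's Unicode support, and because R7RS has no standard punctuation predicate, punctuation and all other symbols fall into a single 'other class here.

```scheme
;; Hypothetical helper: map a character to a coarse class symbol.
;; Only R7RS predicates are used; how far they reach beyond ASCII
;; depends on the Scheme implementation.
(define (char-class c)
  (cond ((char-alphabetic? c) 'alpha)
        ((char-numeric? c) 'num)
        ((char-whitespace? c) 'white)
        (else 'other)))   ; punctuation, symbols, everything else

;; Hypothetical helper: split string S into maximal runs of characters
;; that share the same class.
(define (split-by-class s)
  (define (flush current result)
    ;; Turn the (reversed) run being built into a string and save it.
    (if (null? current)
        result
        (cons (list->string (reverse current)) result)))
  (let loop ((chars (string->list s))
             (current '())   ; characters of the current run, reversed
             (cls #f)        ; class of the current run
             (result '()))   ; finished runs, reversed
    (cond ((null? chars)
           (reverse (flush current result)))
          ((eq? (char-class (car chars)) cls)
           (loop (cdr chars) (cons (car chars) current) cls result))
          (else
           ;; Class changed: close the current run and start a new one.
           (loop (cdr chars)
                 (list (car chars))
                 (char-class (car chars))
                 (flush current result))))))
```

For example, (split-by-class "Hello world! 42.") should give ("Hello" " " "world" "!" " " "42" "."). Unlike the irregex version, this never lists ranges explicitly, so coverage of other scripts comes for free wherever the implementation's predicates are Unicode-aware.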

> I was planning on using Chicken to learn scheme, since R7RS is supposed
> to be based more on R5RS than on R6RS, but maybe it's better to learn
> using Racket.
It doesn't matter what tools you use as long as you have a desire to learn. I 
was personally put off by Racket's extremely slow loading time.

Also, as far as I know, Racket doesn't have a built-in way to split a string
into character classes either.

> (I *do* need to use utf-8 in lots of places, and an incomplete implementation
> while I was learning would be ... unpleasant. Particularly if the user
> documentation presumed that it *was* complete.)
What made you think it's incomplete? :o

The Windows console's UTF-8 support is incomplete, but on Chicken's side
everything is fine.

 -- 
Best regards,
Дмитрий Кушнарёв


