Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want

chicken-users


From:	Дмитрий
Subject:	Re: [Chicken-users] Neophyte in scheme: string-split not quite what I want
Date:	Sat, 21 Jul 2012 14:34:17 +0400

Hello!

> As I said, I'm a neophyte. My "character classes" were based around
> [a-zA-Z] etc. So you can readily see why the pattern would have
> quickly become unreasonably complex.
If you don't need any exotic characters, just ASCII (and perhaps a small
subset of Unicode), the character classes are quite simple:
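(use irregex utf8)

; Cyrillic range: the basic Cyrillic block, U+0400 to U+04FF
(define cyrl '(/ #\u0400 #\u04ff))

; Split S into maximal runs of letters (Latin or Cyrillic), digits,
; punctuation, whitespace, and everything else.
(define (split-into-classes s)
  (irregex-extract `(or (+ (or alpha ,cyrl)) (+ num)
                        (+ punct) (+ white)
                        (+ (~ alpha num punct white ,cyrl))) s))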

Note that I'm something of a neophyte myself, so there may be a better way to
do this. :)

You can then use the procedure like this:
; On Linux/Cygwin you can type "Hello world! Да." directly, but not in the Windows console
(split-into-classes "Hello world! \u0414\u0430.")
=> ("Hello" " " "world" "!" " " "Да" ".")

Extending this procedure to cover all of Unicode, however, would be tricky.
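If you would rather not hard-code ranges like cyrl, one alternative is to walk
the string by hand and group consecutive characters using the standard
character predicates. This is only a sketch of my own, not an irregex feature;
the names char-class and split-by-class are hypothetical. It distinguishes just
letters, digits, whitespace and everything else (so punctuation and other
symbols share one class), and it uses only R5RS procedures (under R7RS you
would import (scheme base) and (scheme char)). How much of Unicode it covers
depends entirely on the implementation's char-alphabetic? and friends and on
whether string->list works on characters or bytes; core CHICKEN 4 strings are
byte strings, so there non-ASCII letters may well land in the "other" class.

```scheme
;; Hypothetical helper: map a character to a coarse class symbol.
;; Only the standard char predicates are used, so what counts as a
;; letter depends on the Scheme implementation.
(define (char-class c)
  (cond ((char-alphabetic? c) 'letter)
        ((char-numeric? c) 'digit)
        ((char-whitespace? c) 'space)
        (else 'other)))

;; Hypothetical procedure: split S into maximal runs of characters
;; that belong to the same class.
(define (split-by-class s)
  (let loop ((chars (string->list s))
             (run '())          ; characters of the current run, reversed
             (run-class #f)     ; class of the current run
             (acc '()))         ; finished runs, reversed
    (cond ((null? chars)
           ;; Flush the last run, if any, and restore the original order.
           (reverse (if (null? run)
                        acc
                        (cons (list->string (reverse run)) acc))))
          ((or (null? run) (eq? (char-class (car chars)) run-class))
           ;; First character or same class: extend the current run.
           (loop (cdr chars)
                 (cons (car chars) run)
                 (char-class (car chars))
                 acc))
          (else
           ;; Class changed: close the current run and start a new one.
           (loop (cdr chars)
                 (list (car chars))
                 (char-class (car chars))
                 (cons (list->string (reverse run)) acc))))))
```

With plain ASCII input:
(split-by-class "Hello world! 42.")
=> ("Hello" " " "world" "!" " " "42" ".")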

> I was planning on using Chicken to learn scheme, since R7RS is supposed
> to be based more on R5RS than on R6RS, but maybe it's better to learn
> using Racket.
It doesn't matter which tools you use as long as you want to learn. Personally,
I was put off by Racket's extremely slow start-up time.

Also note that, as far as I know, Racket doesn't have a built-in way to split a
string into character classes either.

> (I *do* need to use utf-8 in lots of places, and an incomplete implementation
> while I was learning would be ... unpleasant. Particularly if the user
> documentation presumed that it *was* complete.)
What made you think it's incomplete? :o

The Windows console's UTF-8 support is incomplete, but on Chicken's side
everything is fine.
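One quick way to check that Chicken handles UTF-8 correctly is to confirm that
string procedures count characters rather than bytes once the utf8 egg is
loaded. This is a small sketch assuming CHICKEN 4 with the utf8 egg installed;
da is just an illustrative variable name. In core CHICKEN 4, strings are byte
strings, which is why the utf8 egg is loaded first.

```scheme
;; Assumes CHICKEN 4 with the utf8 egg installed (chicken-install utf8).
;; Loading utf8 replaces string-length, string-ref etc. with versions
;; that operate on characters instead of bytes.
(use utf8)

;; "Да" written with escapes: 2 characters, encoded as 4 bytes in UTF-8.
(define da "\u0414\u0430")

(display (string-length da))                  ; 2 with utf8 loaded
(newline)
(display (char->integer (string-ref da 1)))   ; 1072, i.e. U+0430
(newline)
```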

 -- 
Best regards,
Дмитрий Кушнарёв
