[Chicken-users] Issue w/ string-trim functions in utf8-srfi-13
From: Matt Gushee
Subject: [Chicken-users] Issue w/ string-trim functions in utf8-srfi-13
Date: Thu, 19 Sep 2013 14:55:12 -0600
Hello--
I've noticed the following unexpected behavior with the string
trimming functions in utf8-srfi-13:
[BTW: this affects the civet egg, so if anyone is using civet, please
see the note at the bottom]
$ csi
CHICKEN
(c) 2008-2013, The Chicken Team
(c) 2000-2007, Felix L. Winkelmann
Version 4.8.0.4 (stability/4.8.0) (rev 578619b)
linux-unix-gnu-x86 [ manyargs dload ptables ]
compiled 2013-07-15 on aeryn.xorinia.dim (Darwin)
; loading /home/matt/.csirc ...
; << etc. >>
csi> (use srfi-13)
csi> (define strings '("abc" "\t abc" "\r abc" "\n abc"))
csi> (map string-trim-both strings)
("abc" "abc" "abc" "abc")
csi> (use utf8-srfi-13)
; loading /usr/lib/chicken/6/utf8-srfi-13.import.so ...
; << etc. >>
csi> (map string-trim-both strings)
("abc" "\t abc" "\r abc" "\n abc")
And since SRFI-13 states:
> Char/char-set/pred defaults to the character set char-set:whitespace defined
> in SRFI 14.
... it seems pretty clear that this is an error in the utf8 egg (or at
least a point of non-conformance that should be documented). Unless,
of course, there is something important that I don't understand
(always a possibility ;-)
In any case, the explanation for the unexpected behavior is not hard
to find: inspecting utf8-srfi-13.scm, I find:
(define (string-trim-both s . opt)
  (let-optionals* opt ((trimmer #\space))
    (string-trim (apply string-trim-right s opt) trimmer)))
... and similarly for string-trim and string-trim-right ... evidently
all three functions default to removing only #\space characters.
Shouldn't it be 'char-set:whitespace' ?
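For anyone who needs whitespace trimming regardless of which default
the egg uses, here is a minimal sketch that avoids the optional
argument entirely. The name trim-whitespace-both is hypothetical (it
is not part of the utf8 egg or of civet), and it relies only on the
standard char-whitespace? predicate, which may not match SRFI 14's
char-set:whitespace exactly outside the ASCII range:

```scheme
;; Hypothetical helper, not part of the utf8 egg or of civet.
;; Removes leading and trailing characters for which
;; char-whitespace? returns true.
(define (trim-whitespace-both s)
  (let ((len (string-length s)))
    ;; Scan forward to the first non-whitespace character.
    (let find-start ((start 0))
      (if (and (< start len)
               (char-whitespace? (string-ref s start)))
          (find-start (+ start 1))
          ;; Scan backward to the last non-whitespace character.
          (let find-end ((end len))
            (if (and (> end start)
                     (char-whitespace? (string-ref s (- end 1))))
                (find-end (- end 1))
                (substring s start end)))))))

(define trimmed
  (map trim-whitespace-both '("abc" "\t abc" "\r abc" "\n abc")))
;; trimmed => ("abc" "abc" "abc" "abc")
```

Since tab, carriage return, newline and space are all single-byte
ASCII characters, scanning with string-ref like this should not split
a multi-byte UTF-8 sequence, but that is an assumption worth checking
against your own data.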
NOTE TO civet USERS:
Since civet uses utf8-srfi-13 for string processing, this issue can
produce incorrect output for dynamic attribute insertion (i.e., using
the <cvt:attr> element).
I am assuming this will be fixed in the utf8 egg; in the meantime, I
have implemented a workaround in a Git branch, but I'm not going to
merge it into master unless I find out that the utf8 egg's behavior is
intentional. So if you would like the modified version of civet, do
the following:
> git clone --branch string-trim-workaround https://github.com/mgushee/civet.git
Best regards,
Matt Gushee