[Chicken-users] problems string-trimming on UTF8


From: Kristian Lein-Mathisen
Subject: [Chicken-users] problems string-trimming on UTF8
Date: Fri, 27 Jan 2017 14:36:55 +0100


Dear CHICKEN mailing list,

I encountered a strange issue with string-trim-right on a UTF-8 string:

$ csi -R srfi-13 -p '(string-trim "Zazà")'
Zazà

So far so good!

$ csi -R srfi-13 -p '(string-trim-right "Zazà")'
Zaz�

Oh no, what happened?

$ csi -R utf8 -R srfi-13 -p '(string-trim-right "Zazà")'
Zaz�

Loading utf8 doesn't seem to help! But with utf8 loaded, at least, string-length comes out right:

$ csi -R srfi-13 -p '(string-length "Zazà")'
5
$ csi -R utf8 -R srfi-13 -p '(string-length "Zazà")'
4

It took me a while to figure out what was going on. These are the bytes of Zazà:

$ printf 'Zazà' | xxd
00000000: 5a61 7ac3 a0                             Zaz..

So it seems that string-trim-right just looks at the last byte, \xa0, which on its own is a non-breaking space (in Latin-1), and drops it. That leaves a lone \xc3 behind, which is not valid UTF-8 and prints as �. It should be looking at the last UTF-8 code point instead.
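
To make the suspected mechanism concrete, here is a small Python sketch of a byte-at-a-time right trim next to a code-point-aware one. It is only an illustration under assumptions: the helper names are made up, and treating each byte as a Latin-1 character is my guess at what happens, not CHICKEN's actual srfi-13 code.

```python
# Illustration only: mimics a byte-oriented trim, NOT CHICKEN's real srfi-13 code.
data = "Zazà".encode("utf-8")  # b'Zaz\xc3\xa0': 5 bytes for 4 code points


def is_latin1_space(byte: int) -> bool:
    """Assumption: each byte is classified as if it were a Latin-1 character."""
    return chr(byte).isspace()


def trim_right_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: drop trailing bytes that look like whitespace."""
    end = len(raw)
    while end > 0 and is_latin1_space(raw[end - 1]):
        end -= 1
    return raw[:end]


def trim_right_codepoints(raw: bytes) -> bytes:
    """Hypothetical helper: decode first, so only whole code points are examined."""
    return raw.decode("utf-8").rstrip().encode("utf-8")


broken = trim_right_bytes(data)       # loses the \xa0 continuation byte
intact = trim_right_codepoints(data)  # leaves the string untouched

print(len(data), len(data.decode("utf-8")))  # byte count vs. code point count
print(broken)
print(intact.decode("utf-8"))
```

The byte count and the code point count printed first correspond to the two string-length results above; the byte-wise trim ends on a dangling \xc3, which matches the broken output.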

I don't know if this is a known bug or if I've come across something undiscovered. I suppose the fix belongs in the utf8 egg.

Thanks!
K.
