
Re: [Chicken-users] ditching syntax-case modules for the utf8 egg


From: Alaric Snell-Pym
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Tue, 18 Mar 2008 11:21:17 +0000


On 18 Mar 2008, at 2:29 am, Alex Shinn wrote:

> The problems we're having aren't even about string
> representation though, they're about the semantics of the
> string operations themselves.  Are the string indices byte
> positions or character positions?  Different libraries
> disagree.


IMHO Java does it more or less right (though it falls down on the
details: it tends to assume that one UTF-16 code unit = one
character, sigh).
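
To make the code-unit complaint concrete, here is a minimal sketch. The class and method names (CodeUnitDemo, codeUnits, codePoints) are made up for illustration; String.length() and String.codePointCount() are standard Java calls.

```java
final class CodeUnitDemo {
    // What String.length() reports: the number of UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // The number of Unicode code points, counting a surrogate pair once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so UTF-16
        // needs a surrogate pair (two code units) to represent it.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));  // 2
        System.out.println(codePoints(clef)); // 1
    }
}
```

Even the code point count is not quite "characters" as a user sees them (combining marks still count separately), but it shows why indexing by char is not indexing by character.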

As in, you have a byte type and a char type, and never the twain
shall meet, except that String (a wrapper around a char array with
stringy operations defined) has a getBytes method that takes an
encoding name and returns a byte array, and a constructor that takes
a byte array and an encoding name. There are also versions that don't
take an encoding name and instead use the "platform default
encoding" (e.g., on UNIX, it looks up the locale and works from that).
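
A rough sketch of that byte/char boundary, assuming a reasonably recent JDK. EncodingDemo, encode and decode are hypothetical names; the sketch goes through Charset.forName so that an unknown encoding name raises an unchecked exception, whereas the String overloads that take the name directly throw the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;

final class EncodingDemo {
    // chars -> bytes: the caller has to name the encoding.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // bytes -> chars: again, the encoding is explicit.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with an acute accent on the e

        // The same four characters occupy a different number of bytes
        // depending on the encoding chosen.
        System.out.println(encode(text, "UTF-8").length);      // 5
        System.out.println(encode(text, "ISO-8859-1").length); // 4

        // The no-argument variants fall back to the platform default.
        System.out.println(Charset.defaultCharset());
        System.out.println(text.getBytes().length);
    }
}
```

The point of the design is that there is no implicit conversion: every crossing between bytes and chars either names an encoding or visibly defers to the platform default.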

So when you read from a file, you get bytes, but if you ask (by
naming an encoding), they'll be decoded into characters, and likewise
encoded back into bytes when you write.
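
As a sketch of the reading side: an InputStream yields bytes, and wrapping it in an InputStreamReader with a named charset yields chars. ReadDemo and readAll are hypothetical names, and an in-memory stream stands in for a file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadDemo {
    // Decode everything from a byte stream using the given charset.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two raw bytes: the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};
        String text = readAll(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
        System.out.println(raw.length);    // 2 bytes in
        System.out.println(text.length()); // 1 char out
    }
}
```

For a real file, a FileInputStream would take the place of the ByteArrayInputStream; the decoding step stays the same.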

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/?author=4
