Re: Wide strings status
From: Mike Gran
Subject: Re: Wide strings status
Date: Tue, 21 Apr 2009 20:26:20 -0700

On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote:
> > This is all going to be slower than before because of the string
> > conversion operations, but, I didn't want to do any premature
> > optimization. First, I wanted to get it working, but, there is plenty
> > of room for optimization later.
>
> Good. Maybe it'd be nice to add simple micro-benchmarks for
> `string-ref', `string-set!' et al. under `benchmarks'.
>
I'll put it on my todo list.
> > Character encoding needs to be a property of ports, so that not all
> > string operations are done in the current locale. This is necessary so
> > that UTF-8-encoded source files are not interpreted differently based on
> > the current locale.
>
> You seem to imply that `scm_getc ()' will now return a Unicode
> codepoint, is that right? What about `scm_c_{read,write} ()', and
> `scm_{get,put}s ()'?
>
I vacillate on this, but I think the most logical approach is to have
scm_getc return codepoints and to have the rest of those functions
work with strings that could contain wide characters. This applies only
if the port has been assigned a character encoding. If a port has no
associated encoding, it will be treated as de facto ISO-8859-1:
character values between 0 and 255 are stored without any
interpretation, and characters greater than 255 are invalid. (Unicode
codepoints 0 to 255 are by design the same as ISO-8859-1.)
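
To make the distinction concrete, here is a rough sketch in C of the two
behaviours described above. It is not Guile code: `decode_char' and
`enum port_encoding' are hypothetical names made up for this example,
and it assumes UTF-8 is the only encoding a port can be assigned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port's encoding property (not Guile API). */
enum port_encoding { ENC_NONE, ENC_UTF8 };

/* Decode one character from buf[0..len).  On success, store the
   codepoint in *cp and return the number of bytes consumed.
   Return 0 on empty, truncated, or malformed input.  */
static size_t
decode_char (const unsigned char *buf, size_t len,
             enum port_encoding enc, uint32_t *cp)
{
  /* Smallest codepoint that legitimately needs 2, 3, or 4 bytes.  */
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;

  if (enc == ENC_NONE || buf[0] < 0x80)
    {
      /* No encoding: the byte value 0..255 is the codepoint itself,
         which matches ISO-8859-1.  ASCII is the same under UTF-8.  */
      *cp = buf[0];
      return 1;
    }

  /* UTF-8 lead byte: work out the sequence length and payload bits.  */
  if ((buf[0] & 0xE0) == 0xC0)      { n = 2; c = buf[0] & 0x1F; }
  else if ((buf[0] & 0xF0) == 0xE0) { n = 3; c = buf[0] & 0x0F; }
  else if ((buf[0] & 0xF8) == 0xF0) { n = 4; c = buf[0] & 0x07; }
  else
    return 0;                       /* stray continuation or bad lead byte */

  if (len < n)
    return 0;                       /* truncated sequence */

  for (i = 1; i < n; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return 0;                   /* not a continuation byte */
      c = (c << 6) | (buf[i] & 0x3F);
    }

  /* Reject overlong forms, UTF-16 surrogates, and out-of-range values.  */
  if (c < min_for_len[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return n;
}
```

With this sketch, the bytes 0xC3 0xA9 decode to the single codepoint
U+00E9 when the encoding is ENC_UTF8, but the first byte alone decodes
to U+00C3 when the port has no encoding. That difference is why the
interpretation should come from the port rather than from the locale.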
> > The VM and interpreter need to be updated to deal with wide chars and
> > probably in other ways that are unclear to me now. Wide strings are
> > currently getting truncated to 8-bit somewhere in there.
>
> The compiler could use bytevectors when dealing with bytecode. Maybe
> that would clarify things.
On those issues, I'll have to defer to the wisdom of others. I'll do
what I can with the C code, and then I'll need help.
>
> Thanks,
> Ludo'.
>
Thanks for taking the time.
-Mike