chicken-users

Re: [Chicken-users] utf8 and string-ref performance


From: Alex Shinn
Subject: Re: [Chicken-users] utf8 and string-ref performance
Date: Wed, 24 Nov 2010 09:47:35 -0800

On Wed, Nov 24, 2010 at 7:37 AM, Alan Post <address@hidden> wrote:
>
> If possible, I would like to parse utf8 input.  I currently have
> utf8 enabled in my egg.
[...]
> Can anyone point me in the right direction?

Parsing is generally one of the things you get for
free with utf8, because ASCII bytes never occur
inside a multi-byte sequence.  Probably the only
thing you need to do is *remove* the reference to
the utf8 egg and everything will work.  The effect
of this is that parsing will work on bytes instead
of characters, but the results will be the same.
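
To illustrate why byte-level parsing gives the same
results, here is a minimal sketch in Python (not Chicken
code; the sample text and variable names are made up for
the example).  It splits on an ASCII delimiter once at
the character level and once at the byte level:

```python
# Illustration only: split the same text on an ASCII comma at the
# character level and at the byte level, then compare the fields.
text = "naïve,日本語,café"
data = text.encode("utf-8")

# Character-level split on the decoded string.
by_chars = text.split(",")

# Byte-level split on the raw UTF-8 bytes, decoding each field afterwards.
by_bytes = [field.decode("utf-8") for field in data.split(b",")]

# The comma byte (0x2C) cannot appear inside a multi-byte sequence,
# so both approaches cut the input at the same places.
assert by_bytes == by_chars
```

This relies on the input being valid utf8 and on the
delimiters being ASCII.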

There may still be corner cases.  If the API allows
searching for individual characters, you need to
check if they are non-ASCII and if so convert them
into the relevant utf8 string and search for that
string instead of a single byte.
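
A rough sketch of that corner case, again in Python
rather than Chicken; `find_char` is a hypothetical helper
name, not part of any egg.  The character is encoded to
its utf8 bytes and located with an ordinary substring
search:

```python
def find_char(data: bytes, ch: str) -> int:
    """Return the byte offset of the first occurrence of the single
    character `ch` in UTF-8 encoded `data`, or -1 if it is absent."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # An ASCII character encodes to one byte; a non-ASCII character
    # encodes to a two- to four-byte string. Either way a plain
    # substring search over the bytes finds it.
    needle = ch.encode("utf-8")
    return data.find(needle)


data = "x = λ + é".encode("utf-8")
ascii_offset = find_char(data, "+")      # single-byte needle
lambda_offset = find_char(data, "λ")     # two-byte needle
missing_offset = find_char(data, "ß")    # not present in the input
```

The returned positions are byte offsets, not character
offsets.  In valid utf8 a match cannot start in the
middle of another character, since lead bytes and
continuation bytes use disjoint ranges.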

"Indexes" on input and output would be in terms
of byte position.  If you want to make this char
position you have to convert once each on input
and output.  That's O(n), so no effect on asymptotic
performance.
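
For the index conversion, one possible sketch in Python
(the function names are hypothetical).  It counts the
bytes that are not utf8 continuation bytes, i.e. not of
the form 10xxxxxx, which is a single linear pass in each
direction:

```python
def byte_to_char_index(data: bytes, byte_pos: int) -> int:
    """Convert a byte offset into UTF-8 `data` to a character index."""
    # Every character contributes exactly one non-continuation byte,
    # so counting those before byte_pos gives the character index.
    return sum(1 for b in data[:byte_pos] if (b & 0xC0) != 0x80)


def char_to_byte_index(data: bytes, char_pos: int) -> int:
    """Convert a character index to a byte offset into UTF-8 `data`."""
    seen = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:      # first byte of a character
            if seen == char_pos:
                return i
            seen += 1
    if seen == char_pos:            # position just past the last character
        return len(data)
    raise IndexError("character index out of range")


data = "aé日b".encode("utf-8")     # 1 + 2 + 3 + 1 = 7 bytes
```

Here `byte_to_char_index(data, 6)` and
`char_to_byte_index(data, 3)` both refer to the final
"b".  Both functions assume the input is valid utf8 and
that byte offsets fall on character boundaries.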

-- 
Alex


