Re: [Chicken-users] UTF-8 support in eggs

From: Alex Shinn
Subject: Re: [Chicken-users] UTF-8 support in eggs
Date: Thu, 10 Jul 2014 23:16:42 +0900

On Thu, Jul 10, 2014 at 3:51 PM, John Cowan <address@hidden> wrote:

> Alex Shinn scripsit:
>
> > The clean way to handle this is to duplicate the useful string
> > APIs for bytevectors. This could be done without code duplication
> > with the use of functors, though compiler assistance may be
> > needed for efficiency (e.g. for inlined procedures). Even without
> > code duplication there would be an increase in the core library
> > size, though we could probably move most utilities to external
> > libraries (how often do you need regexps that operate on binary
> > data?).
>
> +1. This is what Python 3.x does to help manage the same transition: the
> only string APIs that don't have bytevector counterparts are formatting,
> string-to-bytevector conversion, and a few others. This API is also
> useful for dealing with binary protocols that have ASCII parts.
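
To make the Python 3 comparison concrete, here is a minimal sketch of parsing a binary protocol with ASCII parts using only `bytes` methods that mirror the `str` API (`partition`, `split`, `strip`, `lower`). The message contents and variable names are made up for illustration; nothing here is taken from CHICKEN or Chibi.

```python
# A made-up HTTP-like message: ASCII headers followed by a raw payload.
raw = b"Content-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello, trailing junk"

# bytes offers the same splitting helpers as str, so no decoding is needed.
head, _, body = raw.partition(b"\r\n\r\n")

headers = {}
for line in head.split(b"\r\n"):
    name, _, value = line.partition(b":")
    headers[name.strip().lower()] = value.strip()

# int() accepts ASCII digits given as bytes.
length = int(headers[b"content-length"])
payload = body[:length]

# Some string operations have no bytes counterpart: bytes has no
# .format() method, and .encode() (string-to-bytes) exists only on str.
bytes_has_format = hasattr(bytes, "format")
bytes_has_encode = hasattr(bytes, "encode")

print(headers, payload, bytes_has_format, bytes_has_encode)
```

The payload is never decoded, so arbitrary binary data after the header block passes through untouched.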

Hmmm... that's upsetting. Python 3 is a notorious dead-end
language.

Note Chibi implements utf8 in the core how I think it should be
done, having no backwards compatibility beyond R7RS to deal
with. It accounts for less than 6k of the library size, most of
which is for the split index/cursor API rather than the actual
utf8 processing routines. I do run into inconveniences from
time to time, but am gradually expanding the bytevector utilities
as needed (mostly in (chibi bytevector) and (chibi io)). When
the API is more stable it may be good to follow it.
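
As a rough illustration of what a split index/cursor API means for UTF-8 data, the sketch below treats a cursor as a byte offset into the encoded buffer, while an index counts code points. The function names (`cursor_next`, `cursor_ref`, `index_to_cursor`) are hypothetical and are not Chibi's actual procedures; the sketch also assumes well-formed UTF-8 input.

```python
def cursor_next(buf: bytes, cur: int) -> int:
    """Advance a byte-offset cursor past one UTF-8 encoded code point."""
    cur += 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while cur < len(buf) and (buf[cur] & 0xC0) == 0x80:
        cur += 1
    return cur


def cursor_ref(buf: bytes, cur: int) -> str:
    """Return the code point starting at the cursor (constant-time access)."""
    return buf[cur:cursor_next(buf, cur)].decode("utf-8")


def index_to_cursor(buf: bytes, index: int) -> int:
    """Convert a code point index to a cursor; this is the O(n) step."""
    cur = 0
    for _ in range(index):
        cur = cursor_next(buf, cur)
    return cur


# "naive" with a two-byte i-diaeresis, a space, and a two-byte lambda.
text = "na\u00efve \u03bb".encode("utf-8")

# Walking by cursor visits each code point once, with no index arithmetic.
chars = []
cur = 0
while cur < len(text):
    chars.append(cursor_ref(text, cur))
    cur = cursor_next(text, cur)

lambda_cursor = index_to_cursor(text, 6)
print(chars, lambda_cursor, cursor_ref(text, lambda_cursor))
```

The point of the split is that code which iterates with cursors never pays the linear index-to-offset cost; only callers that insist on integer indices do.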

For comparison, Chibi's ultra small and naive full numeric
tower implementation costs 53k in library size.

Alex