
Re: [Chicken-users] UTF-8 support in eggs


From: Alex Shinn
Subject: Re: [Chicken-users] UTF-8 support in eggs
Date: Thu, 10 Jul 2014 23:16:42 +0900

On Thu, Jul 10, 2014 at 3:51 PM, John Cowan <address@hidden> wrote:
> Alex Shinn scripsit:
>
> > The clean way to handle this is to duplicate the useful string
> > APIs for bytevectors.  This could be done without code duplication
> > with the use of functors, though compiler assistance may be
> > needed for efficiency (e.g. for inlined procedures).  Even without
> > code duplication there would be an increase in the core library
> > size, though we could probably move most utilities to external
> > libraries (how often do you need regexps that operate on binary
> > data?).
>
> +1.  This is what Python 3.x does to help manage the same transition: the
> only string APIs that don't have bytevector counterparts are formatting,
> string-to-bytevector conversion, and a few others.  This API is also
> useful for dealing with binary protocols that have ASCII parts.
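
To illustrate the quoted point about binary protocols with ASCII parts, here is a minimal Python 3 sketch that parses a frame using only bytes methods that mirror the str API (partition, startswith, lower). The frame layout and the name parse_frame are assumptions made up for this example, not part of any real protocol or of the egg discussion.

```python
# Hypothetical frame layout (an assumption for illustration only):
#   an ASCII header line "LEN <n>\r\n" followed by n raw payload bytes.

def parse_frame(frame):
    """Split a frame into (length, payload) without decoding to str."""
    # bytes.partition behaves like str.partition, but on raw bytes.
    header, sep, rest = frame.partition(b"\r\n")
    if not sep or not header.startswith(b"LEN "):
        raise ValueError("malformed header")
    # int() accepts ASCII digits given as bytes.
    length = int(header[4:])
    payload = rest[:length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return length, payload


frame = b"LEN 4\r\n\xde\xad\xbe\xef"
length, payload = parse_frame(frame)

# The same method name works on both types.
header_name_str = "Content-Type".lower()
header_name_bytes = b"Content-Type".lower()
```

The payload is never decoded, so arbitrary binary data passes through untouched while the ASCII header is still handled with familiar string-style calls.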

Hmmm... that's upsetting.  Python 3 is a notorious dead-end
language.

Note that Chibi implements UTF-8 in the core the way I think it
should be done, since it has no backwards compatibility beyond R7RS
to deal with.  It accounts for less than 6k of the library size, most
of which is for the split index/cursor API rather than the actual
UTF-8 processing routines.  I do run into inconveniences from
time to time, but I am gradually expanding the bytevector utilities
as needed (mostly in (chibi bytevector) and (chibi io)).  Once
the API is more stable, it may be good to follow it.
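
As a rough illustration of why a UTF-8 string representation tends to grow a split index/cursor API, the Python sketch below treats a cursor as a byte offset into UTF-8 data and an index as a code point count. This is not Chibi's actual API; the names cursor_next, index_to_cursor, and cursor_ref are hypothetical, and the code assumes well-formed UTF-8 input.

```python
def cursor_next(buf, cursor):
    """Advance a byte-offset cursor past one UTF-8 encoded code point."""
    lead = buf[cursor]
    if lead < 0x80:              # 0xxxxxxx: 1-byte sequence (ASCII)
        width = 1
    elif lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        width = 2
    elif lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        width = 3
    elif lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        width = 4
    else:
        raise ValueError("cursor is not on a code point boundary")
    return cursor + width


def index_to_cursor(buf, index):
    """Convert a code point index to a cursor by walking from the start."""
    cursor = 0
    for _ in range(index):
        cursor = cursor_next(buf, cursor)
    return cursor


def cursor_ref(buf, cursor):
    """Return the single character that starts at the given cursor."""
    return buf[cursor:cursor_next(buf, cursor)].decode("utf-8")


text = "naïve 日本語".encode("utf-8")
cursor = index_to_cursor(text, 6)   # linear walk over 6 code points
char = cursor_ref(text, cursor)     # constant-time access once we hold a cursor
```

Stepping a cursor costs a constant amount of work, while converting an index costs time proportional to the index, which is the usual motivation for exposing both.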

For comparison, Chibi's ultra-small and naive full numeric
tower implementation costs 53k in library size.

-- 
Alex

