Re: [Chicken-users] ditching syntax-case modules for the utf8 egg


From: John Cowan
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Tue, 18 Mar 2008 16:53:22 -0400
User-agent: Mutt/1.5.13 (2006-08-11)

Tobia Conforto scripsit:

> Let's see... ASCII is valid UTF-8, so all ASCII external  
> representations wouldn't need any encoding or decoding work.

True.  However, pure ASCII is less common than people believe, as
indicated by the 59K Google hits for "8-bit ASCII".
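
To make the distinction concrete, here is a minimal sketch in Python (not
Chicken, and not anything from the utf8 egg).  The helper names is_ascii
and decodes_as_utf8 are made up for illustration, and Latin-1 stands in
for the kind of "8-bit ASCII" people usually mean.

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure 7-bit ASCII: already valid UTF-8, no conversion needed.
plain = b"hello"

# Assumed example of "8-bit ASCII": Latin-1, where e-acute is the single byte 0xE9.
latin1 = "caf\u00e9".encode("latin-1")

print(is_ascii(plain), decodes_as_utf8(plain))
print(is_ascii(latin1), decodes_as_utf8(latin1))
```

The second string is neither ASCII nor UTF-8: a lone 0xE9 byte starts a
multi-byte UTF-8 sequence that never completes, so it has to be transcoded
before it can be used as UTF-8.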

> Most recent formats and protocols require or strongly recommend UTF-8
> (see XML etc.) so those wouldn't need any encoding/decoding either.

Well, there's an awful lot of content on the Internet and on local hard
disks that is neither true ASCII nor UTF-8.  In particular, UTF-16 is
the usual representation of Unicode on Windows, while various non-Unicode
character sets are the usual representation of other text on Windows, and
consequently on the Web too.  UTF-8 is something of an oddity there.

> As far as internal representations covering all Unicode go, UTF-8
> looks like the one incurring the least overhead, in the general case.
> Not to mention the least work on the developer side, as we already have
> the utf8 egg!

I'm fine with using UTF-8 as our internal representation.

> Unicode/UTF8-aware string operations will perform a correct  
> replacement and insert the two extra bytes, if the source string  
> really is plain ASCII.  If the source string (or just the part near  
> the change) is not correct UTF-8 or ASCII to begin with, they will  
> raise an error.

You're right.
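
For readers without the earlier messages, a rough Python sketch of the
behaviour described in the quoted paragraph follows.  The replace_char
helper and the euro-sign example are assumptions for illustration, not
the utf8 egg's API.  Note that this sketch validates the whole string by
decoding it, whereas the quoted text only requires the part near the
change to be valid.

```python
def replace_char(data: bytes, index: int, new_char: str) -> bytes:
    """Replace the character (not the byte) at `index` in UTF-8 data.

    Raises UnicodeDecodeError if `data` is not well-formed UTF-8.
    """
    chars = list(data.decode("utf-8"))  # decoding validates the input
    chars[index] = new_char
    return "".join(chars).encode("utf-8")


# Plain ASCII source; the euro sign (U+20AC) takes three bytes in UTF-8,
# so replacing a one-byte character with it adds two bytes.
src = b"EUR 5"
out = replace_char(src, 0, "\u20ac")
print(len(src), len(out))

# A source that is neither ASCII nor UTF-8 (Latin-1 here) is rejected.
try:
    replace_char(b"caf\xe9", 0, "x")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```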

-- 
Overhead, without any fuss, the stars were going out.
        --Arthur C. Clarke, "The Nine Billion Names of God"
                John Cowan <address@hidden>