chicken-hackers
Re: Patch for CHICKEN 6 uri-generic


From: felix . winkelmann
Subject: Re: Patch for CHICKEN 6 uri-generic
Date: Fri, 24 May 2024 17:19:31 +0200

> On Fri, May 24, 2024 at 3:31 AM Peter Bex <peter@more-magic.net> wrote:
>
> > - It encodes how many bytes to use in the first byte's leading bit,
> >   leading three bits, leading four bits or leading five bits depending
> >   on the length.
> >
> > This latter property is extra annoying because you can't just extract
> > the length from the first byte - you have to scan the first bit to
> > decide what to do next.  Then, you scan the second and third bit etc.
> >
>
> That's not actually true.  You can use a table of 256 entries with one
> single-byte entry for each possible value of the first byte, specifying
> the length of the UTF-8 value.  So table entries 0 to 127 have value 1,
> etc.  Entries that aren't valid UTF-8 leading bytes, such as 255, have 0 in
> the table.
>

See also "C_utf_expect" in utf.c (or "C_utf_bytes_needed" in chicken.h,
which can be called via "##core#inline").
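
For illustration, the table lookup described in the quoted text could look roughly like the sketch below. This is not the code behind "C_utf_expect" or "C_utf_bytes_needed"; the names utf8_len_table, utf8_len_table_init and utf8_sequence_length are hypothetical, and the sketch assumes the stricter RFC 3629 rules, under which the lead bytes 0xC0, 0xC1 and 0xF5 to 0xFF are invalid.

```c
#include <stdint.h>

/* Hypothetical lookup table, indexed by the first byte of a sequence.
 * Each entry holds the total sequence length in bytes, or 0 if the
 * byte cannot start a valid UTF-8 sequence. */
static uint8_t utf8_len_table[256];

/* Fill the table once at startup. */
static void utf8_len_table_init(void)
{
    for (int b = 0; b < 256; b++) {
        if (b <= 0x7F)
            utf8_len_table[b] = 1;   /* 0xxxxxxx: ASCII */
        else if (b >= 0xC2 && b <= 0xDF)
            utf8_len_table[b] = 2;   /* 110xxxxx, minus overlong 0xC0/0xC1 */
        else if (b >= 0xE0 && b <= 0xEF)
            utf8_len_table[b] = 3;   /* 1110xxxx */
        else if (b >= 0xF0 && b <= 0xF4)
            utf8_len_table[b] = 4;   /* 11110xxx, up to U+10FFFF */
        else
            utf8_len_table[b] = 0;   /* continuation bytes and invalid leads */
    }
}

/* One array access, with no bit-by-bit scanning of the lead byte. */
static int utf8_sequence_length(unsigned char first)
{
    return utf8_len_table[first];
}
```

After calling utf8_len_table_init() once, a decoder would call utf8_sequence_length() on the first byte and then read that many bytes in total, treating a result of 0 as a malformed sequence. The continuation bytes still need to be validated separately.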


felix



