Re: [Lynx-dev] Update List of Character Entity Names


From: Thomas Dickey
Subject: Re: [Lynx-dev] Update List of Character Entity Names
Date: Mon, 13 Jan 2025 04:07:48 -0500

On Fri, Jan 10, 2025 at 09:56:12PM -0700, Brian Inglis wrote:
> On 2025-01-09 14:34, Thomas Dickey wrote:
> > On Thu, Jan 09, 2025 at 11:15:23AM -0700, Brian Inglis wrote:
> > > Hi folks,
> > > 
> > > Many sites are now using Character Entity Names defined under
> > > 
> > >   https://www.w3.org/TR/xml-entity-names/
> > 
> > https://www.w3.org/TR/xml-entity-names/#source
> 
>       https://www.w3.org/TR/xml-entity-names/bycodes.html
> 
>       https://www.w3.org/TR/xml-entity-names/byalpha.html
> 
> The former is about 184KB and the latter about 386KB, with a lot of HTML
> overhead.
> Because they have to index character name strings, not just codepoint
> combos, they probably need about an order of magnitude more space than the
> compose data, which is ~50KB of source with lots of overhead and actually
> ~8KB.

I see.  I'm expecting other issues with zero-width-whatever, but will
(after current work on cdk & dialog) see about making a script to extract
the data from bycodes.html.
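
A minimal sketch of what such an extraction script might look like, in
Python rather than anything Lynx actually ships.  The row layout is an
assumption, not checked against the real bycodes.html: each table row is
taken to hold the codepoint(s) as "U+XXXXX" in the first cell, a glyph in
the second, and a comma-separated list of entity names in the third.  The
names RowCollector, extract_entities and SAMPLE are hypothetical, and the
sample rows are illustrative only.

```python
import re
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every table row as a list of stripped cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")
NAME = re.compile(r"^[A-Za-z][A-Za-z0-9.]*$")


def extract_entities(html_text):
    """Map each entity name to a tuple of codepoints (assumed row layout)."""
    parser = RowCollector()
    parser.feed(html_text)
    parser.close()
    table = {}
    for row in parser.rows:
        if len(row) < 3:
            continue                      # not a data row in the assumed layout
        codes = tuple(int(h, 16) for h in CODEPOINT.findall(row[0]))
        if not codes:
            continue                      # header row or unrelated table
        for name in (n.strip() for n in row[2].split(",")):
            if NAME.match(name):
                table[name] = codes       # some names map to two codepoints
    return table


# Illustrative rows in the ASSUMED layout, not copied from the W3C page.
SAMPLE = """
<table>
<tr><th>U+00026</th><td>&amp;</td><td>amp, AMP</td><td>AMPERSAND</td></tr>
<tr><th>U+0003C U+020D2</th><td></td><td>nvlt</td><td>(description)</td></tr>
<tr><th>U+0200B</th><td></td><td>ZeroWidthSpace</td><td>ZERO WIDTH SPACE</td></tr>
</table>
"""

if __name__ == "__main__":
    for name, codes in sorted(extract_entities(SAMPLE).items()):
        print(name, " ".join("U+%04X" % c for c in codes))
```

Keeping the codepoints as a tuple leaves room for names that expand to
more than one codepoint, and for deciding later how zero-width characters
should be rendered.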

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
