Re: [Lynx-dev] Update List of Character Entity Names
From: Brian Inglis
Subject: Re: [Lynx-dev] Update List of Character Entity Names
Date: Fri, 10 Jan 2025 21:56:12 -0700
User-agent: Mozilla Thunderbird
On 2025-01-09 14:34, Thomas Dickey wrote:
On Thu, Jan 09, 2025 at 11:15:23AM -0700, Brian Inglis wrote:
Hi folks,
Many sites are now using Character Entity Names defined under
https://www.w3.org/TR/xml-entity-names/
https://www.w3.org/TR/xml-entity-names/#source
https://www.w3.org/TR/xml-entity-names/bycodes.html
https://www.w3.org/TR/xml-entity-names/byalpha.html
the former is about 184KB, and the latter about 386KB, with a lot of HTML
overhead.
As they have to index character name strings, not just codepoint combinations,
they probably need about an order of magnitude more space than compose data:
~50KB of source with lots of overhead, actually ~8KB.
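A rough way to check that kind of estimate is to measure a name table that already exists. The sketch below assumes Python's bundled html.entities.html5 table (the WHATWG HTML5 names, not the full W3C xml-entity-names set) is a fair stand-in, and the packed layout it measures is hypothetical, not how lynx stores its entities:

```python
from html.entities import html5  # WHATWG named character references

# Keep only the semicolon-terminated names; the legacy forms
# without ';' duplicate entries that also exist with ';'.
names = [name[:-1] for name in html5 if name.endswith(";")]

# Name bytes plus one NUL terminator each, as a C string table might store them.
name_bytes = sum(len(n.encode("ascii")) + 1 for n in names)

# UTF-8 bytes of each replacement text (one or two code points), plus a NUL.
value_bytes = sum(len(html5[n + ";"].encode("utf-8")) + 1 for n in names)

print(len(names), "names")
print(name_bytes + value_bytes, "bytes in a naive packed table")
```

This only bounds the raw data; any index or hash structure on top of it would add to the figure.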
Note: unicode.xml is over 5MB in size and may not really be suitable
for direct viewing in a browser. You may prefer to save the file
rather than follow the above link to unicode.xml in a browser.
(sounds like a lot of data - bigger than the current lynx executable)
On my system, that is ~1.5MB + ~10MB .so!
https://github.com/w3c/xml-entities/blob/gh-pages/unicode.xml
https://github.com/w3c/xml-entities/raw/refs/heads/gh-pages/unicode.xml
That's all the defined Unicode characters, some attributes and properties of
each, and their blocks, categories, scripts, etc. totalling ~10MB in source,
with a lot of XML overhead.
https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
which currently render verbatim rather than being ignored or blanked, for
example:
$ lynx -dump -nonumbers -nolist libera.chat | grep '&[^;]\+;'
Libera.&ZeroWidthSpace;Chat
Libera.&ZeroWidthSpace;Chat
Please consider updating your entities from what I can see in your
snapshots, or provide an innocuous default?
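To illustrate what an "innocuous default" could look like, here is a minimal sketch in Python rather than lynx's own C. The function name expand_entities, the regular expression, and the &ZeroWidthSpace; sample input are assumptions made for the example; only html.entities.html5 is a real standard-library table:

```python
import re
from html.entities import html5  # WHATWG named character references

# Matches a named reference such as "&amp;": '&', a name, then ';'.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def expand_entities(text, default=""):
    """Expand known named references; use `default` for unknown names."""
    def repl(match):
        # Keys in html5 include the trailing semicolon, e.g. "amp;".
        return html5.get(match.group(1) + ";", default)
    return ENTITY_RE.sub(repl, text)

# A known name is replaced by its character (here U+200B).
print(repr(expand_entities("Libera.&ZeroWidthSpace;Chat")))
# An unknown name is blanked, or replaced by a chosen placeholder.
print(repr(expand_entities("a &NoSuchName; b", default="?")))
```

Whether unknown names are better blanked, replaced with a placeholder, or left verbatim is a design choice; the default argument just makes that choice explicit.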
[Not subscribed, please CC:]
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher but when there is no more to cut
-- Antoine de Saint-Exupéry