From: Tim Landscheidt
Subject: bug#40794: 26.3; HTML entities &star; and &starf; (inter alia) are not parsed by libxml-parse-html-region
Date: Thu, 23 Apr 2020 13:24:12 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
(Prologue: This bug showed up in the "ALT" attribute of an
"IMG" element of an HTML mail in Gnus. I am reasonably certain
that this stems from libxml-parse-html-region and should be
fixed there, but there may be more prudent solutions.)
With GNU Emacs 26.3 on Fedora:
| ELISP> (with-temp-buffer
| (insert "<!DOCTYPE html>
| <html lang=\"en\">
| <head><title>Title</title></head>
| <body>
| <p>Hello world</p>
| <p>&auml;</p>
| <p>&star;</p>
| <p>&starf;</p>
| </body>
| </html>")
| (libxml-parse-html-region (point-min) (point-max)))
| (html
| ((lang . "en"))
| (head nil
| (title nil "Title"))
| (body nil "\n "
| (p nil "Hello world")
| "\n "
| (p nil "ä")
| "\n "
| (p nil "&star;")
| "\n "
| (p nil "&starf;")
| "\n"))
| "\n"))
| ELISP>
These should instead yield "ä" (228), "☆" (9734) and
"★" (9733).
lisp/leim/quail/sgml-input.el seems to contain the necessary
data for &star; and &starf; that could probably be fed to
libxml.
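Until this is fixed at the source, a caller could post-process the
parse tree. The following is only a rough sketch of that idea, not a
proposed patch: the names my-html5-entity-alist,
my-expand-entities-in-string and my-expand-entities-in-dom are
hypothetical, the table holds just the two entities from this report,
and it assumes that unparsed entities survive verbatim in both text
nodes and attribute values.

```elisp
;; Hypothetical workaround sketch; none of these names exist in Emacs.
(defvar my-html5-entity-alist
  '(("star" . #x2606)    ; WHITE STAR, 9734
    ("starf" . #x2605))  ; BLACK STAR, 9733
  "Assumed subset of named entities that libxml leaves unparsed.")

(defun my-expand-entities-in-string (string)
  "Return STRING with entities from `my-html5-entity-alist' expanded.
Unknown entities are left untouched."
  (replace-regexp-in-string
   "&\\([A-Za-z][A-Za-z0-9]*\\);"
   (lambda (match)
     ;; The match data refer to MATCH, so group 1 is the entity name.
     (let ((code (cdr (assoc (match-string 1 match)
                             my-html5-entity-alist))))
       (if code (string code) match)))
   string t t))

(defun my-expand-entities-in-dom (node)
  "Return a copy of parse tree NODE with leftover entities expanded.
NODE has the shape (TAG ATTRIBUTES CHILD...) as returned by
`libxml-parse-html-region'; text children are plain strings."
  (cond ((stringp node) (my-expand-entities-in-string node))
        ((consp node)
         (append
          (list (car node)
                ;; Attribute values such as ALT are strings, too.
                (mapcar (lambda (attr)
                          (cons (car attr)
                                (my-expand-entities-in-string (cdr attr))))
                        (cadr node)))
          (mapcar #'my-expand-entities-in-dom (cddr node))))
        (t node)))
```

It would be used as (my-expand-entities-in-dom
(libxml-parse-html-region (point-min) (point-max))). Note that
expanding after parsing is lossy: a source text of "&amp;star;" is
parsed to the literal string "&star;" and would then be expanded by
mistake, which is one more reason to prefer a fix in or before libxml.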