bug#30789: 26.0.91; xml-parse-region works but libxml-parse-html-region doesn't

From: Katsumi Yamaoka
Subject: bug#30789: 26.0.91; xml-parse-region works but libxml-parse-html-region doesn't
Date: Tue, 13 Mar 2018 11:28:45 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-unknown-cygwin)

On Tue, 13 Mar 2018 01:44:22 +0100, Lars Ingebrigtsen wrote:
> libxml is more strict about correctness of the input than most other
> HTML parsers.  I don't think there's anything we can do about this
> problematic input other than ponder whether Emacs should use a different
> HTML parser, which I think sounds unlikely.  :-)

I see.  I agree that we shouldn't modify libxml.  Jidanni, if you
often receive such broken mails, how about applying the following
patch locally?  I'm not sure that it won't cause other problems,
but it does fix at least the mail in question.

--- mm-decode.el~       2018-02-28 02:01:37.897607000 +0000
+++ mm-decode.el        2018-03-13 02:23:04.321753900 +0000
@@ -1810,6 +1810,11 @@
       (when (and (or coding
                     (setq coding (mm-charset-to-coding-system charset nil t)))
                 (not (eq coding 'ascii)))
+       ;; Remove extra bytes in utf-8 encoded data.
+       (when (eq coding 'utf-8)
+         (goto-char (point-min))
+         (while (re-search-forward "[\x00-\x7f]+\\([\x80-\xbf]\\)" nil t)
+           (replace-match "\\1")))
        (insert (prog1
                    (decode-coding-string (buffer-string) coding)

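To illustrate what the regexp in the patch does, here is a minimal
sketch that applies the same search-and-replace to a unibyte string
outside of mm-decode.el.  The helper name
`my-strip-ascii-before-continuation' and the sample bytes are
hypothetical; the actual breakage in the reported mail may look
different.

```elisp
;; Hypothetical helper for experimenting; not part of mm-decode.el.
;; It runs the patch's regexp over BYTES in a unibyte temp buffer.
(defun my-strip-ascii-before-continuation (bytes)
  "Return unibyte string BYTES with the patch's replacement applied.
Every run of ASCII bytes that is directly followed by a UTF-8
continuation byte (#x80..#xbf) is deleted; the continuation byte
itself is kept."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; work on raw bytes
    (insert bytes)
    (goto-char (point-min))
    (while (re-search-forward "[\x00-\x7f]+\\([\x80-\xbf]\\)" nil t)
      (replace-match "\\1"))
    (buffer-string)))

;; Assumed sample: the three bytes of U+3042 (E3 81 82) with a stray
;; space inserted before the last continuation byte.
(decode-coding-string
 (my-strip-ascii-before-continuation
  (unibyte-string #xe3 #x81 #x20 #x82))
 'utf-8)
```

Note that in well-formed UTF-8 a continuation byte never directly
follows an ASCII byte, so the regexp should only match in broken
data.  When it does match, though, it removes the whole preceding
ASCII run (for example, "abc" followed by #x80 becomes just #x80),
which is one way the patch could cause another problem.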