From: Kenan Toker
Subject: Re: Using shtml with htmlprag - output of shtml->html is different to some given HTML
Date: Fri, 6 Sep 2019 14:09:53 +1000

Hi Neil,

Thanks heaps, I'll give this fix a go and let you know how it works ASAP.

That all makes sense re: avoiding breaking changes in guile-lib. If this
fix works and is all that's needed, I'll use it instead of the version
currently available in guile-lib.

With that in mind, if I were to choose one of the 'distributions' of
htmlprag, is there one you yourself would pick? Or are the versions
available in e.g. guile-lib and standalone the same for all intents and
purposes?

Cheers,
Kenan

On 6/9/19 8:33 am, Neil Van Dyke wrote:
> Kenan, could you please try the below "one-line" change, and let me
> know what you think?
>
> (It's an attempt at a minimal fix for the problem you were seeing, and
> for some related problems with modern HTML. However, it breaks
> backward-compatibility relative to the htmlprag currently in
> guile-lib. For example, consider someone doing Web scraping of modern
> HTML, and their scraping code only works with the previous, invalid
> parse. I'm not yet familiar with guile-lib and how the htmlprag in it
> is being used, so I don't want to be too quick to suggest breaking
> changes to it.)
>
> (Historical note: htmlprag was mostly written 18 years ago, when HTML
> was different in both standards and practice. Today, I'd write the
> parser very differently, though I think there's a good chance that
> htmlprag will still work for one's purpose, with this change.)
>
> Neil
>
> --- htmlprag.scm.ORIG 2019-09-05 18:21:40.850220789 -0400
> +++ htmlprag.scm 2019-09-05 18:21:40.850220789 -0400
> @@ -1099,7 +1099,7 @@
> (meta . (head))
> (noframes . (frameset))
> (option . (select))
> - (p . (body td th))
> + (p . (div blockquote body footer header li td th))
> (param . (applet))
> (tbody . (table))
> (td . (tr))
> @@ -1989,6 +1989,13 @@
> (t1 "<script>xxx" '((script "xxx")))
> (t1 "<script/>xxx" '((script) "xxx"))
>
> + (t1 "<div><p>x</p></div>" '((div (p "x"))))
> + (t1 "<header><p>x</p></>" '((header (p "x"))))
> + (t1 "<footer><p>x</p></>" '((footer (p "x"))))
> + (t1 "<blockquote><p>x</p></blockquote>" '((blockquote (p "x"))))
> + (t1 "<ul><li><p>x</p></li></ul>" '((ul (li (p "x")))))
> + (t1 "<ol><li><p>x</p></li></ol>" '((ol (li (p "x")))))
> +
> ;; TODO: Add verbatim-pair cases with attributes in the end tag.
>
> (t2 '(p) "<p></p>")
>
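
For anyone reading along, here is a rough sketch of what the one-line change in the first hunk amounts to. It is not htmlprag's actual code: the names old-constraints, new-constraints and allowed-parent? are made up for illustration, and it assumes the table in the diff maps an element to the parents it is allowed to sit directly inside, with elements that have no entry left unconstrained.

```scheme
;; Hypothetical sketch, NOT htmlprag's real implementation.
;; Each entry maps an element name to the parents it may appear
;; directly inside (assumed meaning of the table in the diff).
(define old-constraints
  '((option . (select))
    (p      . (body td th))
    (td     . (tr))))

(define new-constraints
  '((option . (select))
    (p      . (div blockquote body footer header li td th))
    (td     . (tr))))

;; Returns #t if CHILD may stay nested inside PARENT.
;; Elements without an entry are treated as unconstrained.
(define (allowed-parent? constraints child parent)
  (let ((entry (assq child constraints)))
    (if entry
        (and (memq parent (cdr entry)) #t)
        #t)))

(allowed-parent? old-constraints 'p 'div)   ; => #f
(allowed-parent? new-constraints 'p 'div)   ; => #t
(allowed-parent? new-constraints 'p 'span)  ; => #f
```

Under that assumption, a p inside a div is rejected by the old entry and accepted by the new one, which matches the first test case added in the second hunk.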