wget2 | Parsing comments in <style> content (patch attached) (#540)
From: Sergei Litvin
Subject: wget2 | Parsing comments in <style> content (patch attached) (#540)
Date: Sun, 22 Nov 2020 20:13:01 +0000
Sergei Litvin created an issue: https://gitlab.com/gnuwget/wget2/-/issues/540
Hello, currently parsing of HTML file content fails if "<" characters occur in
<style> content.
Command line to reproduce:
```
wget2 -m --max-threads=1 --content-disposition --regex-type=pcre \
  --accept-regex="www\.3gpp\.org/DynaReport/23.*?\.htm|portal\.3gpp\.org/desktopmodules/Specifications/SpecificationDetails\.aspx\?specificationId=|portal\.etsi\.org/webapp/workprogram/Report_WorkItem\.asp\?WKI_ID=|www\.etsi\.org/deliver/etsi_ts/.*?\.pdf" \
  --domains="portal.etsi.org" --span-hosts --filter-urls \
  https://www.3gpp.org/ftp/Specs/html-info/23-series.htm
```
Expected behavior: the <a ... href=23XXX.htm> links are parsed and followed.
A patch with the proposed fix is attached:
[0001-Fix-parsing-comments-in-style-content.patch](/uploads/4da83b12c5b9ced80420d3ee6cec7a13/0001-Fix-parsing-comments-in-style-content.patch)
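
For readers without access to the attachment, here is a minimal sketch of the general idea, not the patch itself and not wget2's actual parser (which is written in C). It assumes the fix amounts to treating everything between <style> and </style> as raw text, so that "<" characters and comment markers inside the stylesheet cannot derail the scan for <a> tags. The function name `extract_links`, the regular expressions, and the sample file names `23001.htm` and `23002.htm` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical helpers for this sketch; none of these names come from wget2.
HREF_RE = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)
STYLE_END_RE = re.compile(r'</style', re.IGNORECASE)


def extract_links(html: str) -> list[str]:
    """Collect href values from <a> tags, skipping <style> bodies as raw text."""
    links = []
    pos = 0
    while True:
        lt = html.find("<", pos)
        if lt == -1:
            break
        # Ordinary HTML comment outside <style>: jump past the closing "-->".
        if html.startswith("<!--", lt):
            end = html.find("-->", lt + 4)
            if end == -1:
                break
            pos = end + 3
            continue
        gt = html.find(">", lt)
        if gt == -1:
            break
        tag = html[lt + 1:gt]
        parts = tag.split(None, 1)
        name = parts[0].lower() if parts else ""
        if name == "style":
            # Raw-text element: ignore any "<" or "<!--" until "</style".
            closing = STYLE_END_RE.search(html, gt + 1)
            if closing is None:
                break
            pos = closing.start()
            continue
        if name == "a":
            match = HREF_RE.search(tag)
            if match:
                links.append(match.group(1))
        pos = gt + 1
    return links


sample = (
    "<html><head><style>\n"
    "<!-- a < b { color: red } -->\n"
    "p > a { margin: 0 }\n"
    "</style></head>\n"
    '<body><a href=23001.htm>TS 23.001</a> <a href="23002.htm">TS 23.002</a>'
    "</body></html>"
)
print(extract_links(sample))  # ['23001.htm', '23002.htm']
```

The point of the sketch is only that the links after the stylesheet are still found even though the <style> body contains "<", ">" and comment markers.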