[Wget-dev] wget2 | Restricting domains with host-spanning does not work
From: 一郎
Subject: [Wget-dev] wget2 | Restricting domains with host-spanning does not work (#483)
Date: Fri, 18 Oct 2019 19:19:45 +0000
一郎 created an issue: https://gitlab.com/gnuwget/wget2/issues/483
I'm running this command in the hope of crawling subdomains under kedo.gov.cn:
`wget2 -r -w 8 --filter-mime-type="text/html" -a wget_log -H -D kedo.gov.cn
http://www.kedo.gov.cn`
As I understand it, `-H` enables host-spanning and `-D` restricts the crawl to
the listed domains, so combining them should follow links only to hosts under
`kedo.gov.cn`. However, after a minute of operation, I end up with the
following folder structure:
```
.
├── story.kedo.gov.cn
│   ├── index.html
│   ├── stories
│   │   └── kxr
│   │       └── index.html
│   └── story
│       └── legend
│           └── classics
│               └── index.html
├── wget_log
├── www.kedo.gov.cn
│   └── index.html
└── www.kepuchina.cn
    ├── index.html
    └── public
        └── 201710
            └── t20171031_253123.shtml
```
While the `www.kedo.gov.cn` and `story.kedo.gov.cn` folders and their contents
are what I want, the `www.kepuchina.cn` folder is *not*. It should clearly be
excluded by `-D`. I'm familiar with these two flags from the original `wget`
documentation and have used them in the past.
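To make the expectation concrete, here is a minimal sketch of the check I assume `-D` should perform: a host is followed only if it equals the given domain or ends in `.` plus that domain. The `host_in_domain` helper is hypothetical and written only for illustration; it is not taken from the wget or wget2 sources, and the real matching rules may differ (for example in case handling).

```shell
#!/bin/sh
# Hypothetical helper, not part of wget or wget2: succeeds when the host ($1)
# equals the domain ($2) or is a subdomain of it (suffix match on a label
# boundary). Assumes both arguments are already lowercase.
host_in_domain() {
    case "$1" in
        "$2" | *."$2") return 0 ;;
        *) return 1 ;;
    esac
}

# Hosts taken from the folder listing above.
for host in www.kedo.gov.cn story.kedo.gov.cn www.kepuchina.cn; do
    if host_in_domain "$host" kedo.gov.cn; then
        echo "follow: $host"
    else
        echo "skip:   $host"
    fi
done
```

Under that assumption, the first two hosts are reported as `follow` and `www.kepuchina.cn` as `skip`, which is the behaviour I expected from `-H -D kedo.gov.cn`.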
How do I get wget2 to honor `-D`?