Re: [Bug-wget] Why does -A not work?


From: Tim Rühsen
Subject: Re: [Bug-wget] Why does -A not work?
Date: Wed, 20 Jun 2018 16:13:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

Hi Nils,

On 06/20/2018 06:16 AM, Nils Gerlach wrote:
> Hi there,
> 
> in #wget on freenode I was suggested to write this to you:
> I tried using wget to get some images:
> wget -nd -rH -Dcomicstriplibrary.org -A
> "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*" -p -e
> robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
> I wanted to download the images only but wget was not following any of the
> links so I got that much more into -A. But it still does not follow the
> links.
> Page numbers of the search result contain "page" in the link, links to the
> big pictures i want wget to download contain "display". Both are given in
> -A and are seen in the html-document wget gets. Neither is followed by wget.
> 
> Why does this not work at all? Website is public, anybody is free to test.
> But this is not my website!

-A / -R work only on the filename, not on the path. The docs (man page)
are not very explicit about this.
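
A rough shell sketch of that filename-only behaviour (this only
approximates what wget does; the function name and the example.com URLs
are made up for illustration):

```shell
# Hypothetical helper: apply a shell wildcard to the last path
# component of a URL only, roughly the way -A / -R are described above.
accepts_by_filename() {
    pattern=$1
    url=$2
    file=${url##*/}          # strip everything up to the last '/'
    case "$file" in
        $pattern) echo "accepted" ;;   # unquoted so it is treated as a wildcard
        *)        echo "rejected" ;;
    esac
}

# "display" appears only in the path, so the filename "1234" does not match.
accepts_by_filename '*display*' 'http://example.com/display/1234'
# Here the filename itself matches the pattern.
accepts_by_filename '*.jpeg' 'http://example.com/display/nemo.jpeg'
```

The first call prints "rejected" and the second "accepted", which is
why a pattern like "*display*" never matches a link that has "display"
only in its path.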

Instead, try --accept-regex / --reject-regex, which act on the complete
URL - but shell wildcards won't work there.

For your example, this means replacing '.' with '\.' and '*' with '.*'.
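
As a minimal sketch, that substitution can be done mechanically with
sed. The helper name glob_to_regex is made up, and it handles only '.'
and '*', nothing else ('?', '[...]' etc. would need more work):

```shell
# Hypothetical helper: convert a simple shell wildcard to a POSIX regex.
# Dots are escaped first, so the '.' introduced by '.*' stays unescaped.
glob_to_regex() {
    printf '%s\n' "$1" | sed -e 's/\./\\./g' -e 's/\*/.*/g'
}

glob_to_regex 'little-nemo*s.jpeg'    # prints: little-nemo.*s\.jpeg
```

Keep in mind that the regex is matched against the whole URL, so a
leading '.*' may be needed where the wildcard only had to match the
filename.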

To download those nemo jpegs:
wget -d -rH -Dcomicstriplibrary.org \
  --accept-regex ".*little-nemo.*n\.jpeg" -p -e robots=off \
  'http://comicstriplibrary.org/search?search=little+nemo' \
  --regex-type=posix
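
To get a feel for what such a regex accepts before starting a recursive
download, it can be tried against a few candidate URLs with grep -E.
This is only an approximation of wget's matching, and the example.com
URLs below are invented, not taken from the site:

```shell
regex='.*little-nemo.*n\.jpeg'

# Hypothetical sample URLs, one per line.
sample_urls() {
    printf '%s\n' \
        'http://example.com/images/little-nemo-19051015-n.jpeg' \
        'http://example.com/images/little-nemo-19051015-s.jpeg' \
        'http://example.com/display/42'
}

# Print only the URLs the regex would let through.
sample_urls | grep -E "$regex"
```

Only the first URL is printed: the second ends in "s.jpeg" and the
third does not contain "little-nemo" at all.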

Regards, Tim
