wget-dev

Re: wget2 | crawl for urls without sending any http requests? (#554)


From: @rockdaboot
Subject: Re: wget2 | crawl for urls without sending any http requests? (#554)
Date: Sun, 04 Jul 2021 18:36:27 +0000



Tim Rühsen commented:


Not sure if I get it right... you want to download a single HTML file and print 
out all the URLs it contains?
Then take a look at `examples/print_html_urls.c`.
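The idea behind that example — parse an already-downloaded HTML document and print every URL found in it, with no network requests involved — can be sketched without libwget. This is purely illustrative and uses Python's standard `html.parser` rather than wget2's own parser; the `URLCollector` class and the attribute list are my own assumptions, not part of the wget2 code base:

```python
from html.parser import HTMLParser

class URLCollector(HTMLParser):
    """Collect URL-bearing attributes (href/src) from start tags."""
    URL_ATTRS = {"href", "src"}  # assumed subset; real parsers track many more

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                self.urls.append(value)

# A local HTML snippet standing in for a previously downloaded file.
html = """<html><body>
<a href="https://example.com/page.html">link</a>
<img src="/images/logo.png">
</body></html>"""

collector = URLCollector()
collector.feed(html)
for url in collector.urls:
    print(url)
```

No request is ever sent here — the input is a local string — which is exactly the single-file case the example program covers.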

The point with recursive spidering is that you can't tell from the URL alone what 
kind of file it is. So wget2 has to check the Content-Type, which requires a 
request. If it is text/html or one of the other supported file types that can 
contain more URLs, that file has to be downloaded and parsed for further URLs, 
and so on (recursively)...
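The loop described above can be sketched as follows. This is a minimal model, not wget2's implementation: the in-memory `SITE` table stands in for real HTTP responses, precisely because in reality the Content-Type is only known after a request has been made:

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site": URL -> (content_type, body). In the real
# world each lookup would be an HTTP request, which is why a crawl cannot
# proceed without sending any requests at all.
SITE = {
    "https://example.com/": ("text/html",
        '<a href="https://example.com/a.html">a</a>'
        '<a href="https://example.com/pic.png">pic</a>'),
    "https://example.com/a.html": ("text/html",
        '<a href="https://example.com/">home</a>'),
    "https://example.com/pic.png": ("image/png", b"..."),
}

class LinkParser(HTMLParser):
    """Collect href attributes from anchor-style tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)

def crawl(url, seen=None):
    """Recursive spider: fetch, check type, parse HTML for more URLs, recurse."""
    seen = seen if seen is not None else set()
    if url in seen or url not in SITE:
        return seen
    seen.add(url)
    content_type, body = SITE[url]
    if content_type == "text/html":  # only types that can carry more URLs get parsed
        parser = LinkParser()
        parser.feed(body)
        for link in parser.links:
            crawl(link, seen)
    return seen

visited = crawl("https://example.com/")
print(sorted(visited))
```

Note how `pic.png` is fetched but not parsed: its type had to be checked with a request before the crawler could know there were no further URLs inside it.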

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/-/issues/554#note_618223964
You're receiving this email because of your account on gitlab.com.



