limiting recursion for fetching but not for --page-requisites using --spa
From: Chris Lawson
Subject: limiting recursion for fetching but not for --page-requisites using --span-hosts
Date: Mon, 14 Nov 2022 00:42:09 -0600
Hello everyone,
I've been experimenting with combinations of --recursive, --span-hosts,
--page-requisites, --domains='X,Y,Z' for downloading pages from blogs and
forums, and can't figure out how to do exactly what I want.
I want to follow pages recursively, but only within certain domains, so I
set --recursive, --span-hosts, and --domains='X,Y,Z'. For each page fetched
I also want to grab all the page requisites, especially images and CSS
files, so I set --page-requisites, but it looks like --page-requisites is
subject to --span-hosts and the --domains= flag, so it won't grab images
outside of the domains I specify.
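
For reference, the setup described above corresponds roughly to the invocation sketched below. The helper name wget_cmd and the start URL https://X/ are placeholders made up for illustration; X, Y, Z stand for the real domains as in the text. The function only prints the command, so nothing is downloaded until the echo is removed.

```shell
#!/bin/sh
# Print the wget command line for the setup described above.
# $1 = comma-separated domain list, $2 = start URL (both placeholders here).
wget_cmd() {
    # Remove "echo" to actually start the download.
    echo wget --recursive --span-hosts --page-requisites --domains="$1" "$2"
}

wget_cmd 'X,Y,Z' 'https://X/'
```

With these flags, the domain list is applied to page requisites as well as to followed links, which is the behaviour reported above: images hosted outside X, Y, Z are skipped.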
What I'd like is for --page-requisites to visit any domains needed without
restriction, but of course if I just set --span-hosts and don't set
--domains=, then I get a runaway recursive download.
(Currently I'm solving this by getting the pages once, grepping for img
tags, then adding those domains to my --domains flag. But this backfires on
me if someone links to the image hosting site in the page I'm fetching, and
I get runaway recursion.)
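
The workaround in the parenthesis above could look something like the following sketch. The function names img_hosts and join_commas are invented for illustration, and the regex is an assumption about the markup: it only catches absolute http(s) URLs in double-quoted src attributes, so it is not a real HTML parser.

```shell
#!/bin/sh
# List the distinct hosts referenced by <img ... src="http(s)://host/..."> tags
# in HTML read from stdin, one host per line.
img_hosts() {
    grep -oiE '<img[^>]+src="https?://[^/"]+' |
        sed -E 's|.*://||' |   # strip everything up to and including "://"
        sort -u
}

# Join lines from stdin with commas, suitable as a --domains value.
join_commas() {
    paste -sd, -
}

# Example use after a first pass has saved some pages (file names hypothetical):
#   extra=$(cat ./*.html | img_hosts | join_commas)
#   wget --recursive --span-hosts --page-requisites --domains="X,Y,Z,$extra" https://X/
```

As noted above, every host added this way also becomes eligible for recursion, so an ordinary link to an image host can still trigger a runaway download.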
Is there any way to do what I want?
Thanks in advance.
Chris