From: Gabriel Somlo
Subject: [Bug-wget] "Transparent proxy URL" variation on "-E -k" options ?
Date: Fri, 17 Oct 2014 15:39:53 -0400
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,

I'm working on a "Web-in-a-sandbox" project, trying to host shallow
(-l 2) copies of several web sites on a server running in a private
Internet "replica".


So far, httrack's "-K5" option (which they call "transparent proxy
URL") appears to do what I need (see 
http://www.httrack.com/html/httrack.man.html):

1. rename script output of site.com/article.cgi?25 to
   ./site.com/articleDEADBEEF.css
   (note the difference from --adjust-extension)

2. rewrite the link in the referencing document as
   "http://site.com/articleDEADBEEF.css?25"

which works perfectly when hosting both referencing and referenced
sites in the sandbox.
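
To make the two -K5 steps concrete, here is a minimal Python sketch of the
mapping. It is only a model under stated assumptions: k5_style_names is a
hypothetical name, and the 8-hex-digit CRC-32 of the query string merely
stands in for the DEADBEEF placeholder. I have not checked how httrack
actually computes that part of the file name.

```python
import zlib
from urllib.parse import urlsplit, urlunsplit


def k5_style_names(url: str, ext: str) -> tuple[str, str]:
    """Return (local_path, rewritten_link) for a script URL.

    Hypothetical model of the -K5 scheme: the query string is folded into
    the saved file name, and the rewritten link keeps the original query.
    """
    parts = urlsplit(url)
    directory, _, script = parts.path.rpartition("/")  # "", "article.cgi"
    stem = script.rsplit(".", 1)[0]                     # "article"
    # Assumption: a CRC-32 of the query (8 hex digits) replaces DEADBEEF.
    digest = f"{zlib.crc32(parts.query.encode()):08X}"
    new_name = f"{stem}{digest}.{ext}"
    local_path = f"./{parts.netloc}{directory}/{new_name}"
    # Keep scheme, host and query; only the path component changes.
    new_link = urlunsplit(
        (parts.scheme, parts.netloc, f"{directory}/{new_name}", parts.query, "")
    )
    return local_path, new_link


local, link = k5_style_names("http://site.com/article.cgi?25", "css")
print(local)
print(link)
```

Because the rewritten link is still an absolute http:// URL, it works when
site.com is served as a virtual host in the sandbox; a static-file web server
typically ignores the trailing "?25" and simply returns the renamed file.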

Unfortunately, I found httrack to be otherwise very fragile and (the
major dealbreaker for me) still unable to follow meta refresh links,
so I'd like to see wget gain the ability to rename target documents
and the links that reference them the way httrack's "-K5" flag does,
as described above.


With wget, I'm using "-k -E" (--convert-links and --adjust-extension)
when mirroring these web sites, but I would be interested in an
alternative to the way --convert-links rewrites links.

As far as I was able to tell, --adjust-extension will append a
.html or .css when saving script output, e.g. from something like

http://site.com/article.cgi?25  to  ./site.com/article.cgi?25.css

but it does not also rewrite the referencing URL in the document that
caused wget to recurse into and fetch the output of this script.
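
For contrast, a simplified Python model of that --adjust-extension naming.
The function adjust_extension is hypothetical; real wget bases the decision
on the server's Content-Type and handles more cases than shown here. Only
the saved file name changes; the link inside the referencing document is
left pointing at article.cgi?25.

```python
def adjust_extension(local_name: str, content_type: str) -> str:
    """Append .html or .css to a saved file name based on Content-Type.

    Simplified model of --adjust-extension, not wget's actual code.
    """
    # Ignore parameters such as "; charset=utf-8" in the header value.
    mime = content_type.split(";", 1)[0].strip().lower()
    suffix = {"text/html": ".html", "text/css": ".css"}.get(mime)
    if suffix and not local_name.lower().endswith(suffix):
        return local_name + suffix
    return local_name


print(adjust_extension("./site.com/article.cgi?25", "text/css"))
# ./site.com/article.cgi?25.css
```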


If I try to add --convert-links into the mix, the referencing link
does get rewritten, but ends up looking like

"../site.com/article.cgi?25.html"

which is designed for offline viewing via "file://", and is unsuitable
for actually hosting both the referencing and referenced sites as
virtual servers in a web server within the sandbox.
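
The relative form comes from expressing the target's saved path relative to
the directory of the referencing file. A minimal Python sketch, assuming the
referencing page was saved as other.com/index.html (other.com is a
hypothetical second site); converted_link is likewise a hypothetical name,
and urljoin shows what a browser would make of that link once both sites are
served over HTTP:

```python
import posixpath
from urllib.parse import urljoin


def converted_link(referrer_local: str, target_local: str) -> str:
    """Relative link in the style --convert-links produces (simplified model)."""
    # Path of the saved target, relative to the directory holding the referrer.
    return posixpath.relpath(target_local, start=posixpath.dirname(referrer_local))


link = converted_link("other.com/index.html", "site.com/article.cgi?25.html")
print(link)  # ../site.com/article.cgi?25.html

# Served from http://other.com/, the ".." cannot climb above the host's root,
# so the resolved URL stays on other.com instead of reaching site.com.
print(urljoin("http://other.com/index.html", link))
```

On disk the "../" walks from other.com/ into the sibling site.com/ directory,
which is exactly what file:// viewing needs; over HTTP the same link never
leaves the other.com virtual host.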


Am I missing something about wget's capabilities that would allow me
to get it to work in a way similar to httrack's -K5 option ?


If not, assuming I can come up with a patch, would there be any
interest in upstreaming this type of additional functionality ?

Thanks much,
--Gabriel


