bug-wget

Re: [Bug-wget] wget -r not working with www.archive.org


From: Micah Cowan
Subject: Re: [Bug-wget] wget -r not working with www.archive.org
Date: Mon, 26 Oct 2009 09:33:15 -0700
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Gray wrote:
> wget does not seem to want to get from the WayBackMachine -
> http://www.archive.org stored web sites.

archive.org pages contain JavaScript that lets them work properly in
browsers but not in Wget. IIRC, they set the HTML "base" tag so that
links resolve against the original site (whether it still exists or
not) rather than archive.org; the JavaScript then rewrites the base tag
so that clicks in a browser retrieve them from archive.org. Wget does
not execute JavaScript, so it follows the links to the original site.
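
To illustrate the mechanism described above, here is a minimal Python
sketch of how a client that does not run JavaScript resolves relative
links when a page carries a "base" tag. It is not Wget's code, and the
host names, page URL, and markup are hypothetical stand-ins rather than
anything archive.org actually serves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkResolver(HTMLParser):
    """Collect <a href> targets, resolved as a non-JavaScript client would."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url  # without a <base> tag, the page URL is the base
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # A <base href> overrides the base for every later relative link.
            self.base = urljoin(self.base, attrs["href"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base, attrs["href"]))


def resolve_links(page_url, html):
    """Return the absolute URLs a static client would request for the links."""
    parser = LinkResolver(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical archived page: served by the archive host, but its <base>
# tag points at the original site.
PAGE_URL = "http://archive.example/web/2009/site/"
WITH_BASE = (
    '<html><head><base href="http://www.example.com/"></head>'
    '<body><a href="docs/index.html">Docs</a></body></html>'
)
WITHOUT_BASE = '<html><body><a href="docs/index.html">Docs</a></body></html>'

print(resolve_links(PAGE_URL, WITH_BASE))     # link leaves the archive host
print(resolve_links(PAGE_URL, WITHOUT_BASE))  # link stays on the archive host
```

With the "base" tag present, the relative link resolves to the original
site's host, which is where a recursive retrieval would then go; a
browser only ends up back at the archive because a script changes the
base afterwards.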

Please note that archive.org's FAQ explicitly forbids downloading local
copies of archived sites with tools such as Wget.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkrlz0oACgkQ7M8hyUobTrEo7gCeMs7jOb60bNXdh3ptRG/XbPbY
mQwAn0wp+jJcG8RmGO9Fcr3db6x7AMwM
=SAVy
-----END PGP SIGNATURE-----



