[Lynx-dev] circumventing blocking sites
From: Nelson H. F. Beebe
Subject: [Lynx-dev] circumventing blocking sites
Date: Sat, 4 Feb 2017 09:28:51 -0700
For several years, I have used lynx (and also wget, and rarely, curl)
to access publisher Web pages for new journal issues. Recently, I
noticed that a lynx fetch of a page from Elsevier ScienceDirect would
never complete:
% lynx -source -accept_all_cookies -cookies --trace
http://www.sciencedirect.com/science/journal/00978493/62 > foo.62
parse_arg(arg_name=http://www.sciencedirect.com/science/journal/00978493/62,
mask=1, count=5)
parse_arg
startfile:http://www.sciencedirect.com/science/journal/00978493/62
... no further output, and no job completion ...
Similarly, wget and curl fail to complete.
This new behavior suggests that the publisher site has put up
User-Agent-specific, rather than IP-address-specific, blocks, because
accessing the same URL in a GUI browser on the SAME machine
immediately returns the expected journal-issue contents.
If I add the --debug option to wget, I find that it reports
---request begin---
GET /science/journal/00978493/62 HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: www.sciencedirect.com
Connection: Keep-Alive
---request end---
Thus, wget identifies itself by name in the User-Agent header, and I
assume that lynx identifies itself likewise.
Does anyone on this list have an idea how to circumvent these apparent
blocks?
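If the block really does key on the User-Agent header, a possible
workaround is to tell each tool to send a browser-like identification
string instead of its default one. The commands below are a sketch:
the flags are the standard User-Agent overrides for lynx, wget, and
curl, and the Mozilla string is just a placeholder browser identity,
not a value known to unblock this particular site.

```shell
# Placeholder browser-style User-Agent string (an assumption, not a
# value verified against ScienceDirect).
UA="Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"

# lynx: -useragent= replaces the default "Lynx/..." identification.
lynx -source -accept_all_cookies -useragent="$UA" \
    http://www.sciencedirect.com/science/journal/00978493/62 > foo.62

# wget: --user-agent (or -U) replaces "Wget/1.14 (linux-gnu)".
wget --user-agent="$UA" -O foo.62 \
    http://www.sciencedirect.com/science/journal/00978493/62

# curl: -A / --user-agent sets the same header.
curl -A "$UA" -o foo.62 \
    http://www.sciencedirect.com/science/journal/00978493/62
```

If the transfer still hangs with a browser User-Agent, the block is
presumably based on something else (cookies, JavaScript checks, or
connection fingerprinting) rather than the agent string alone.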
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: address@hidden -
- 155 S 1400 E RM 233 address@hidden address@hidden -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------