[Lynx-dev] circumventing blocking sites

From: Nelson H. F. Beebe
Subject: [Lynx-dev] circumventing blocking sites
Date: Sat, 4 Feb 2017 09:28:51 -0700

For several years, I have used lynx (and also wget, and rarely, curl)
to access publisher Web pages for new journal issues.  Recently, I
noticed that a lynx pull of a page from Elsevier ScienceDirect would
never complete:

        % lynx -source -accept_all_cookies -cookies  --trace > foo.62
        mask=1, count=5)
        ... no further output, and no job completion ...

Similarly, I also find that wget and curl fail to complete.

This new behavior suggests that the publisher site has put up
User-Agent-specific, rather than IP-address-specific, blocks, because
accessing the same URL in a GUI browser on the SAME machine gets an
immediate return of the expected journal issue contents.
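One quick way to test that hypothesis is to repeat the request twice with
curl, once with its default User-Agent and once borrowing the UA string of
the GUI browser that succeeds, and compare the HTTP status codes.  This is
only a sketch: the host name and the UA string here are my assumptions (the
path comes from the wget dump quoted later in this message), and the live
requests are left commented out since the default-UA case is observed to
hang.

        # Host and UA string are placeholders/assumptions; the path is the
        # one reported by wget --debug.
        URL='https://www.sciencedirect.com/science/journal/00978493/62'
        UA='Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'

        # -m caps the wait, since the default-UA request never completes:
        #   curl -sS -m 15 -o /dev/null -w '%{http_code}\n' "$URL"
        #   curl -sS -m 15 -o /dev/null -w '%{http_code}\n' -A "$UA" "$URL"
        # If only the second request returns 200, the block is keyed on the
        # User-Agent header rather than the client IP address.
        echo "curl -m 15 -A '$UA' $URL"

(Rendered as a fenced block for the verifier:)

```shell
# Host and UA string are placeholders/assumptions; the path is the
# one reported by wget --debug.
URL='https://www.sciencedirect.com/science/journal/00978493/62'
UA='Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'

# -m caps the wait, since the default-UA request never completes:
#   curl -sS -m 15 -o /dev/null -w '%{http_code}\n' "$URL"
#   curl -sS -m 15 -o /dev/null -w '%{http_code}\n' -A "$UA" "$URL"
# If only the second request returns 200, the block is keyed on the
# User-Agent header rather than the client IP address.
echo "curl -m 15 -A '$UA' $URL"
```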

If I add the --debug option to wget, I find that it reports

        ---request begin---
        GET /science/journal/00978493/62 HTTP/1.1
        User-Agent: Wget/1.14 (linux-gnu)
        Accept: */*
        Connection: Keep-Alive

        ---request end---

Thus, wget identifies itself by name in the User-Agent header, and I
assume that lynx probably self-identifies as well.
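If the block really is keyed on the User-Agent header, all three tools can
be told to send a different one.  A sketch, assuming the UA string below is
copied from whichever GUI browser succeeds (the -useragent, -U/--user-agent,
and -A/--user-agent flags are standard in lynx, wget, and curl
respectively; the URL and output file name follow the example above):

        UA='Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'

        # lynx: -useragent=STRING replaces the default Lynx identification
        #   lynx -source -accept_all_cookies -useragent="$UA" "$URL" > foo.62
        # wget: -U STRING (long form --user-agent=STRING)
        #   wget -U "$UA" -O foo.62 "$URL"
        # curl: -A STRING (long form --user-agent)
        #   curl -A "$UA" -o foo.62 "$URL"
        echo "$UA"

(Rendered as a fenced block for the verifier:)

```shell
UA='Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'

# lynx: -useragent=STRING replaces the default Lynx identification
#   lynx -source -accept_all_cookies -useragent="$UA" "$URL" > foo.62
# wget: -U STRING (long form --user-agent=STRING)
#   wget -U "$UA" -O foo.62 "$URL"
# curl: -A STRING (long form --user-agent)
#   curl -A "$UA" -o foo.62 "$URL"
echo "$UA"
```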

Does anyone on this list have an idea how to circumvent these apparent
User-Agent-based blocks?

- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: -
