
Re: LYNX-DEV Request for Assistance


From: Subir Grewal
Subject: Re: LYNX-DEV Request for Assistance
Date: Sun, 15 Dec 1996 10:00:23 -0800 (PST)

Dear address@hidden,

A Lynx user recently sent mail to the lynx-dev list asking for assistance
with the following problem:

On Sat, 14 Dec 1996, David Mischel wrote:

:My name is David Mischel.  My e mail address is address@hidden
:I am sightless.  I have been using lynx through an internet provider
:called hermes.  For 6 months I have been able to access the Washington
:Post without any problems.  Apparently they have updated their site.  Now,
:although I can still get in, I can no longer access the articles.  I can go
:into the various sections, but when I hit on an article and try to call it
:up, I get an error message telling me that it's unable to connect.  People
:using another browser can get the articles, so the problem is with the
:compatibility between lynx and the new set up.  The internet provider is
:using lynx version 2.6 which I understand is the latest.  Someone
:suggested that perhaps the scripts to get to the individual articles were
:too long.  I am not a technical expert, so I'm not sure what that means.
:If anyone has any suggestions,  I would appreciate them.  I can be reached
:by phone at 202-554-8079 or at the e mail address shown above.  thanks in
:advance for any assistance.  

My rather lengthy analysis of the problem(s) follows.

I tried to trace the problem and seem to have found two separate problems
with the implementation of CGIs on the Washington Post server.  The first
is rather straightforward and involves the manner in which your CGIs
perform searches for keywords.

For example, in the following document:

   Linkname: WashingtonPost.com: Federal Community News
        URL:
          http://www.washingtonpost.com/wp-srv/national/longterm/fedcom/c
          ausey/causey.htm

activating the following link (i.e. initiating a search for all recent
articles written by Michael Causey):

   Linkname: Causey's latest
   Filename:
          http://www.washingtonpost.com/cgi-bin/search?RELEVANCE_RANK=0&T
          OTAL_HITLIST=20&DB_NAME=WPlate&ALL=causey%3Abyline

results in the following document:

   Linkname: Washington Post: Search Results
        URL:
          http://www.washingtonpost.com/cgi-bin/search?RELEVANCE_RANK=0&T
          OTAL_HITLIST=20&DB_NAME=WPlate&ALL=causey%3Abyline
          
which is fine and dandy, except that all the links in that document are of
the following form:

   Linkname: Healthy Incentive to Retire
   Filename:
          http://www.washingtonpost.com/../wp-srv/WPlate/1996-12/15/110L-
          121596-idx.html

The .. throws the server off course when Lynx tries to GET that URL.  The
problem seems to lie with the CGI that searches the site: since it resides
in the /cgi-bin directory, it appears to be taking Unix paths (which use ..
to refer to the parent directory) and emitting them as URLs.  At least,
that is my interpretation of the situation.  Kindly see if anything can be
done about this.  I wasn't able to find bad URLs of this sort elsewhere on
your site, so I'd say it's only a problem with the manner in which the
searches are implemented, and it should be reasonably straightforward to
fix.
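
As a rough illustration of the kind of fix I have in mind, here is a
minimal Python sketch of how a search script might collapse a Unix-style
relative path into a clean root-relative URL path before writing it into a
link.  The function name to_url_path and the assumption that the script's
URL directory is /cgi-bin are hypothetical; I have not seen the Post's
actual code.

```python
import posixpath


def to_url_path(script_dir, relative_path):
    """Collapse '.' and '..' segments into a clean root-relative URL path.

    Both arguments are hypothetical inputs: the URL directory the CGI lives
    in, and the Unix-style relative path it found for a matching document.
    """
    joined = posixpath.join(script_dir, relative_path)
    # normpath resolves '..' against the preceding segment and drops any
    # '..' that would climb above the root.
    return posixpath.normpath(joined)


print(to_url_path("/cgi-bin", "../wp-srv/WPlate/1996-12/15/110L-121596-idx.html"))
```

Under these assumptions the link would be written as
/wp-srv/WPlate/1996-12/15/110L-121596-idx.html, with no .. left for the
server to stumble over.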

The second problem is more complex (relatively) and occurs on the main
page itself (though there are similar problems all over the site).  It
involves the "pop-up boxes" that permit users to select one particular
section of the paper.  For example in the following document:

   Linkname: Welcome to WashingtonPost.com
        URL: http://www.washingtonpost.com/

the following link exhibits this problem:

   Linkname: Go
     Method: POST
     Action: http://www.washingtonpost.com/cgi-bin/navigate.py

When this link is activated, the currently selected item in the pop-up box
is sent to the server.  The CGI churns away, processes the request, and
returns a code 302 "Moved Temporarily".  This is fine, but Lynx, in
compliance with the HTTP 1.0 protocol (which your server advertises itself
as using), then proceeds to redirect the POST content, after having asked
the user whether they wish to do this.  Unfortunately, the CGI appears to
be relying on the expressly incorrect behaviour of some user-agents
(browsers), which change POST redirects into GET requests.  Lynx 2.6 will
not do this.  Ideally, for the functionality you desire, the CGI should
return a code 303, a new status code in the draft HTTP 1.1 designed
especially for this purpose (POST scripts that redirect the user-agent to
a real document).  I've excerpted the relevant portions of HTTP 1.0 and
HTTP 1.1 for your information:

-------- HTTP 1.0 <URL:http://ds.internic.net/rfc/rfc1945.txt> ----------


9.3  Redirection 3xx

   This class of status code indicates that further action needs to be
   taken by the user agent in order to fulfill the request. The action
   required may be carried out by the user agent without interaction
   with the user if and only if the method used in the subsequent
   request is GET or HEAD. A user agent should never automatically
   redirect a request more than 5 times, since such redirections usually
   indicate an infinite loop.

   300 Multiple Choices

   This response code is not directly used by HTTP/1.0 applications,
   but serves as the default for interpreting the 3xx class of
   responses.

   The requested resource is available at one or more locations.
   Unless it was a HEAD request, the response should include an entity
   containing a list of resource characteristics and locations from
   which the user or user agent can choose the one most appropriate.
   If the server has a preferred choice, it should include the URL in
   a Location field; user agents may use this field value for
   automatic redirection.

   301 Moved Permanently

   The requested resource has been assigned a new permanent URL and
   any future references to this resource should be done using that
   URL. Clients with link editing capabilities should automatically
   relink references to the Request-URI to the new reference returned
   by the server, where possible.

   The new URL must be given by the Location field in the response.
   Unless it was a HEAD request, the Entity-Body of the response
   should contain a short note with a hyperlink to the new URL.

   If the 301 status code is received in response to a request using
   the POST method, the user agent must not automatically redirect the
   request unless it can be confirmed by the user, since this might
   change the conditions under which the request was issued.





Berners-Lee, et al           Informational                     [Page 34]

RFC 1945                        HTTP/1.0                        May 1996


       Note: When automatically redirecting a POST request after
       receiving a 301 status code, some existing user agents will
       erroneously change it into a GET request.

   302 Moved Temporarily

   The requested resource resides temporarily under a different URL.
   Since the redirection may be altered on occasion, the client should
   continue to use the Request-URI for future requests.

   The URL must be given by the Location field in the response. Unless
   it was a HEAD request, the Entity-Body of the response should
   contain a short note with a hyperlink to the new URI(s).

   If the 302 status code is received in response to a request using
   the POST method, the user agent must not automatically redirect the
   request unless it can be confirmed by the user, since this might
   change the conditions under which the request was issued.

       Note: When automatically redirecting a POST request after
       receiving a 302 status code, some existing user agents will
       erroneously change it into a GET request.

   304 Not Modified

   If the client has performed a conditional GET request and access is
   allowed, but the document has not been modified since the date and
   time specified in the If-Modified-Since field, the server must
   respond with this status code and not send an Entity-Body to the
   client. Header fields contained in the response should only include
   information which is relevant to cache managers or which may have
   changed independently of the entity's Last-Modified date. Examples
   of relevant header fields include: Date, Server, and Expires. A
   cache should update its cached entity to reflect any new field
   values given in the 304 response.


-------------------------------------------------------------------------
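
To make the HTTP 1.0 rules quoted above concrete, here is a small Python
sketch of how a user-agent might decide what to do with a 301 or 302
response.  It is only my reading of section 9.3, not Lynx's actual code;
the function name next_request and the user_confirms callback are
hypothetical.

```python
MAX_REDIRECTS = 5  # section 9.3 advises against following more than 5


def next_request(method, status, location, redirects_so_far, user_confirms):
    """Return (method, url) for the follow-up request, or None to stop.

    user_confirms is a hypothetical callback that asks the user whether the
    request may be re-sent to the new location; it returns True or False.
    """
    if status not in (301, 302) or redirects_so_far >= MAX_REDIRECTS:
        return None
    if method in ("GET", "HEAD"):
        # Safe to follow without asking the user.
        return (method, location)
    # Any other method (e.g. POST) needs confirmation, and the method is
    # kept as-is rather than silently turned into a GET.
    if user_confirms(location):
        return (method, location)
    return None
```

Note that a confirmed POST stays a POST in this sketch, which is why a
script that expects a GET at the new location never receives one.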

-------- HTTP 1.1 <URL:http://www.w3.org/pub/WWW/Protocols/> ------------


10.3 Redirection 3xx

This class of status code indicates that further action needs to be
taken by the user agent in order to fulfill the request. The action
required MAY be carried out by the user agent without interaction with
the user if and only if the method used in the second request is GET or
HEAD. A user agent SHOULD NOT automatically redirect a request more than
5 times, since such redirections usually indicate an infinite loop.


10.3.1 300 Multiple Choices

The requested resource corresponds to any one of a set of
representations, each with its own specific location, and agent-driven
negotiation information (section 12) is being provided so that the user
(or user agent) can select a preferred representation and redirect its
request to that location.

Unless it was a HEAD request, the response SHOULD include an entity
containing a list of resource characteristics and location(s) from which
the user or user agent can choose the one most appropriate. The entity
format is specified by the media type given in the Content-Type header
field. Depending upon the format and the capabilities of the user agent,
selection of the most appropriate choice may be performed automatically.
However, this specification does not define any standard for such
automatic selection.

If the server has a preferred choice of representation, it SHOULD
include the specific URL for that representation in the Location field;
user agents MAY use the Location field value for automatic redirection.
This response is cachable unless indicated otherwise.


10.3.2 301 Moved Permanently

The requested resource has been assigned a new permanent URI and any
future references to this resource SHOULD be done using one of the
returned URIs. Clients with link editing capabilities SHOULD
automatically re-link references to the Request-URI to one or more of
the new references returned by the server, where possible. This response
is cachable unless indicated otherwise.

If the new URI is a location, its URL SHOULD be given by the Location
field in the response. Unless the request method was HEAD, the entity of
the response SHOULD contain a short hypertext note with a hyperlink to
the new URI(s).


Fielding, et al                                              [Page 54]



INTERNET-DRAFT            HTTP/1.1             Monday, August 12, 1996


If the 301 status code is received in response to a request other than
GET or HEAD, the user agent MUST NOT automatically redirect the request
unless it can be confirmed by the user, since this might change the
conditions under which the request was issued.

  Note: When automatically redirecting a POST request after receiving
  a 301 status code, some existing HTTP/1.0 user agents will
  erroneously change it into a GET request.


10.3.3 302 Moved Temporarily

The requested resource resides temporarily under a different URI. Since
the redirection may be altered on occasion, the client SHOULD continue
to use the Request-URI for future requests. This response is only
cachable if indicated by a Cache-Control or Expires header field.

If the new URI is a location, its URL SHOULD be given by the Location
field in the response. Unless the request method was HEAD, the entity of
the response SHOULD contain a short hypertext note with a hyperlink to
the new URI(s).

If the 302 status code is received in response to a request other than
GET or HEAD, the user agent MUST NOT automatically redirect the request
unless it can be confirmed by the user, since this might change the
conditions under which the request was issued.

  Note: When automatically redirecting a POST request after receiving
  a 302 status code, some existing HTTP/1.0 user agents will
  erroneously change it into a GET request.


10.3.4 303 See Other

The response to the request can be found under a different URI and
SHOULD be retrieved using a GET method on that resource. This method
exists primarily to allow the output of a POST-activated script to
redirect the user agent to a selected resource. The new URI is not a
substitute reference for the originally requested resource. The 303
response is not cachable, but the response to the second (redirected)
request MAY be cachable.

If the new URI is a location, its URL SHOULD be given by the Location
field in the response. Unless the request method was HEAD, the entity of
the response SHOULD contain a short hypertext note with a hyperlink to
the new URI(s).








10.3.5 304 Not Modified

If the client has performed a conditional GET request and access is
allowed, but the document has not been modified, the server SHOULD
respond with this status code. The response MUST NOT contain a message-
body.

The response MUST include the following header fields:

  o  Date

  o  ETag and/or Content-Location, if the header would have been sent in
     a 200 response to the same request

  o  Expires, Cache-Control, and/or Vary, if the field-value might
     differ from that sent in any previous response for the same variant

If the conditional GET used a strong cache validator (see section
13.3.3), the response SHOULD NOT include other entity-headers. Otherwise
(i.e., the conditional GET used a weak validator), the response MUST NOT
include other entity-headers; this prevents inconsistencies between
cached entity-bodies and updated headers.

If a 304 response indicates an entity not currently cached, then the
cache MUST disregard the response and repeat the request without the
conditional.

If a cache uses a received 304 response to update a cache entry, the
cache MUST update the entry to reflect any new field values given in the
response.

The 304 response MUST NOT include a message-body, and thus is always
terminated by the first empty line after the header fields.


10.3.6 305 Use Proxy

The requested resource MUST be accessed through the proxy given by the
Location field. The Location field gives the URL of the proxy. The
recipient is expected to repeat the request via the proxy.

-------------------------------------------------------------------------
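
For what it is worth, here is a minimal sketch of how a Python CGI such as
navigate.py might answer a POST with a 303, following section 10.3.4
above.  I have not seen the real script, so the function name
redirect_response and the target URL are hypothetical; the "Status:"
header is the usual CGI convention for setting the response code.

```python
import html
import sys


def redirect_response(target_url, status="303 See Other"):
    """Build the CGI header block and a short hypertext note for a redirect."""
    # The draft asks for a short note with a hyperlink to the new URI.
    note = '<a href="%s">The document is here.</a>\n' % html.escape(target_url, quote=True)
    return (
        "Status: %s\n" % status
        + "Location: %s\n" % target_url
        + "Content-Type: text/html\n"
        + "\n"  # blank line ends the headers
        + note
    )


if __name__ == "__main__":
    # Hypothetical target; a real script would map the selected form item
    # to the URL of the corresponding section.
    sys.stdout.write(redirect_response("http://www.example.com/section.html"))
```

A user-agent that understands 303 would then fetch the target with a GET,
without re-sending the POST content.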

I apologize for the length of this message, but hopefully it will clarify a
few issues and highlight our concerns better than a short missive would.
I personally think everyone at the Post is doing a wonderful job with the
On-line edition.  However, it is important to keep in mind that
visually-impaired users often have no alternative to reading (hearing) a
digital version of newspapers.  This is of course one of the
great promises the medium makes to us.  I hope you will be able to correct
the problems I've pointed out.  If you'd like any assistance with this, I
would urge you to send a message to address@hidden where the Lynx
Developers will try their best to assist.  I am not an expert on HTTP, but
others here are and will be better able to answer detailed technical
questions.

I trust I've correctly identified David's problem.  If I haven't, I will
be getting back to you with another note later this week.

Thanks for your time and attention,

Subir Grewal

address@hidden  +  Lynx 2.6  +  PGP  +  http://www.crl.com/~subir/
Liar, n.:
        A lawyer with a roving commission.
                -- Ambrose Bierce, "The Devil's Dictionary"

