Re: lynx-dev Why doesn't lynx cache HTML source?

From: Leonid Pauzner
Subject: Re: lynx-dev Why doesn't lynx cache HTML source?
Date: Sat, 14 Nov 1998 19:40:59 +0300 (MSK)

>> Use cache - always validate cached data (see Note3):

> A large proportion of web pages these days are uncacheable, often for
> misguided commercial reasons.  I strongly suspect that a disproportionate
> number of the ones that people will need to view source on or parse in
> different ways will fall in this category.

That's mostly because Lynx says "HTTP/1.0" in its request header, so the server replies accordingly. HTTP/1.1 has a unique ETag that allows proper validation of any cached data. So the main benefit of a lynx cache is receiving a short, HEAD-like response instead of fetching the complete document (sometimes even a HEAD-like request is not needed, but that is an obvious check and not relevant to your case).

> Pages are becoming uncacheable either because they are dynamically
> generated or because uncacheability is forced in order to give the
> page owner, or the web hosting service, better statistics on accesses
> and accessors.  Even the banner adverts on AltaVista are now dynamically
> generated GIFs as far as any cache is concerned.

>> use Last-Modified and ETag value from the previous request -
>> add If-Modified-Since: and If-None-Match: to GET request,
>> send protocol version as "HTTP/1.1"

> I think this is a rather strong statement by Lynx; I believe that you
> must not send this unless you can handle anything permitted by HTTP/1.1
> in reply.

This should be carefully inspected, but I feel that most HTTP/1.1 requirements concern "transparent proxies" that serve many different clients, and hence compatibility: replies that tell the client to make a new request and submit it again, etc. We are in a different position: we need an HTTP/1.1 client with an optional cache that uses the HTTP/1.1 validation mechanism.

You will help me considerably if you read the spec and compare it against
the comments in HTTP.c - the actions on each return status (200, 304, etc.).

>> If we got 304 (not Modified) status - use cached data,
>> but update header fields from this new responce.
>> If we got 200 (OK) status - do as usual but do not forget to pick it up
>> to the cache, with the flag for user interruption if any happen.
>> If we got other status - do as usual and NOTHING for cache in any case.

> RFC 2068 makes user agent caching policies more or less a local issue.
> Does this new draft spec require user agents to be much more strict?  I
> can think of cases where that would cause a lot of unnecessary refetches of
> dynamically created pages.  Certainly current generation GUI browsers have
> a strong element of local policy control and are typically set to revalidate
> only once per session.

Read my words at the beginning of this letter.
You are probably not right: "REFRESH 60 seconds" usually works properly with GUI browsers...
If any browser revalidates something only once per session, it obviously
breaks the spec: there are special HTTP/1.1 rules for this;
for example, expired or "no-cache" documents should be validated every time
we try to access them.

The rules insist on validating (either by local calculation or with a remote
request) every use of cached data, no more and no less.
I think we may be a little more strict and ask the remote side (server or proxy)
for validation whenever we could do the calculation ourselves but are too lazy to.
Anyway, this is a small overhead and could easily be added
once the main code is implemented (which is not so easy!).
