Re: lynx-dev Why doesn't lynx cache HTML source?

From: David Woolley
Subject: Re: lynx-dev Why doesn't lynx cache HTML source?
Date: Sun, 15 Nov 1998 11:57:12 +0000 (GMT)

> > A large proportion of web pages these days are uncacheable, often for
> > misguided commercial reasons.  I strongly suspect that a disproportionate
> > number of the ones that people will need to view source on or parse in
> > different ways will fall in this category.
> That's mostly because Lynx says "HTTP/1.0" in its header and servers reply so.
> HTTP 1.1 has a unique ETag that allows advanced validation for any cached data.
> So the main benefit from a lynx cache is to receive a short response like HEAD
> instead of fetching a complete document (sometimes even a head-like request
> not needed but this is an obvious check and not for your case).

Very little of the web server and cache software around supports ETag.
But, in any case, most cacheability failures are due either to apparently
deliberate attempts to frustrate caching (accurate hit counts are much more
saleable than fast access, it seems) or to dynamic content that goes way
beyond content negotiation (I don't think many people even know that
content negotiation is possible).

> You will help me considerably if you read spec and compare against
> the comments in HTTP.c - actions on return status (200, 304, etc, etc.)

Looks like I will have to do this.

> Read my words in the beginning of this letter.
> You probably not right: "REFRESH 60 seconds" usually works properly with 
> GUI...

That's because Refresh 60 (which is NOT HTTP or HTML!) is a GUI browser
designers' invention, and the browser will obviously revalidate in that
case, as the page is obviously dynamic.

> If any browser revalidate something once per session it obviously
> break the spec: there are a special http/1.1 rules for this,
> for example, Expired or "no-cache" documents should be validated every time
> we are trying to access them.

I think that IE4 will honour Expires - in fact we had to remove an
immediate expires from a dynamic page because it was expired even
before it could be handed on to an OLE helper (Excel).  I'd have to check
the status of no-cache.  But most pages do not have these headers (and
when they start to, they will probably have the authoring tool's defaults!).
IE4 does not use heuristics on Last Modified Date and will still cache
pages without this header and without other cache controlling headers.
There is a user configuration option with three values: never revalidate,
validate once per session and always revalidate.  Out of the box it
is once per session and most users will never change it from that.
Lynx currently also behaves as once per session!
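
A minimal sketch of that three-valued option, in Python.  The names are
hypothetical and are not taken from IE4 or Lynx; the sketch only shows the
decision, not how the check with the server is made.

```python
from enum import Enum


class RevalidatePolicy(Enum):
    # Hypothetical names for the three settings described above.
    NEVER = "never"
    ONCE_PER_SESSION = "once per session"
    ALWAYS = "always"


def needs_revalidation(policy, validated_this_session):
    """Return True if a cached page should be checked with the server."""
    if policy is RevalidatePolicy.NEVER:
        return False
    if policy is RevalidatePolicy.ALWAYS:
        return True
    # ONCE_PER_SESSION: only the first access in a session triggers a check.
    return not validated_this_session
```

With the out-of-the-box setting, a page already checked in this session is
served from the cache without contacting the server again.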

Calculation-based revalidation normally requires an extensive set of rules,
e.g. most people configure external caches to assume that the likely remaining
lifetime of a .gif is a much larger proportion of its age at the point of
fetching than it would be for a .html or .htm.  The rules may also favour
certain sites.
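
A rough sketch of such a rule, in Python.  The fractions and function names
are assumptions made up for illustration; real caches use their own
configured values and usually many more rules (including per-site ones).

```python
import os

# Assumed fractions, for illustration only: the share of a document's age
# at fetch time that it is presumed to stay fresh for.
LIFETIME_FRACTION = {".gif": 0.5, ".html": 0.1, ".htm": 0.1}
DEFAULT_FRACTION = 0.1


def heuristic_lifetime(path, last_modified, fetched_at):
    """Estimate how many seconds an entry may be served without a check.

    Times are seconds since the epoch.  The estimate is a fraction of the
    document's age (fetch time minus Last-Modified) chosen by extension.
    """
    age_at_fetch = max(0.0, fetched_at - last_modified)
    ext = os.path.splitext(path)[1].lower()
    return LIFETIME_FRACTION.get(ext, DEFAULT_FRACTION) * age_at_fetch


def is_fresh(path, last_modified, fetched_at, now):
    """True while the time since fetching is within the estimated lifetime."""
    return (now - fetched_at) < heuristic_lifetime(path, last_modified, fetched_at)
```

Under these assumed fractions, a .gif and a .html with the same age at fetch
time get very different lifetimes, so the .gif is revalidated far less often.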

The formal backing for this behaviour is section 13.13 of RFC 2068,
although there might be semantic arguments about the border between
caching and history mechanisms (as a user, I would expect the same result
from using the back button to return to a home page and using a link
on the subordinate page); in my view, all those wanting unrendered
caching in Lynx to support the \ command would want the history
interpretation, to avoid refetching of dynamic content.

In fact this section has a warning that using revalidation for history
pages may actually cause site designers not to use cache control
information properly on their pages, because to do so might force
unexpected reloads and unstable content.

> The rules insist on validating (either by local calculation or with remote)
> for entry using of cached data, no more nor less.
> I think we may be a little more strict and ask the remote (server or proxy)
> for validation when we could do this but are too lazy to do our calculations.
> Anyway, this is a small overhead and could be easily done
> when the main code will be implemented (not so easy!).

Revalidation itself can sometimes be slow (and, as was the main point of the
article, will very often result in a complete reload of the page).  Slowness
is particularly a problem for GUI browsers, where a large number of GIFs
may have to be revalidated!  It also makes it impossible to operate the
browser in offline mode.  IE4 has explicit support for this, but even
Lynx will satisfy pages from its rendered cache, even though you've
stopped paying the phone company or ISP for connectivity.

Actually, if there is a case for source caching in Lynx, as against an
external caching proxy, it is that it can relax the revalidation rules.

Extract from RFC 2068, section 13.13:

   User agents often have history mechanisms, such as "Back" buttons and
   history lists, which can be used to redisplay an entity retrieved
   earlier in a session.

   History mechanisms and caches are different. In particular history
   mechanisms SHOULD NOT try to show a semantically transparent view of
   the current state of a resource. Rather, a history mechanism is meant
   to show exactly what the user saw at the time when the resource was
   retrieved.

   By default, an expiration time does not apply to history mechanisms.
   If the entity is still in storage, a history mechanism should display
   it even if the entity has expired, unless the user has specifically
   configured the agent to refresh expired history documents.
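
The distinction drawn in that extract can be sketched as follows, in Python.
The class and function names are hypothetical, and a return value of None is
just this sketch's way of saying "go back to the network".

```python
from dataclasses import dataclass


@dataclass
class StoredEntity:
    body: str
    expires_at: float  # seconds since the epoch


def for_history(entity, now, refresh_expired=False):
    """Back button or history list: show what the user saw earlier.

    Expiry is ignored unless the user has asked for expired history
    documents to be refreshed (refresh_expired=True).
    """
    if refresh_expired and now >= entity.expires_at:
        return None
    return entity.body


def for_cache(entity, now):
    """Ordinary link traversal: an expired entry is not served as-is."""
    if now >= entity.expires_at:
        return None
    return entity.body
```

The same stored, expired page is therefore redisplayed when reached through
history, but has to be revalidated or refetched when reached through a link.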
