Re: lynx-dev HTDoRead() HTTCP.c possible bug - retry limit set too high?

From: Klaus Weide
Subject: Re: lynx-dev HTDoRead() HTTCP.c possible bug - retry limit set too high?
Date: Sun, 9 Jul 2000 13:50:44 -0500 (CDT)

On Sat, 8 Jul 2000, Vlad Harchev wrote:

> On Fri, 7 Jul 2000, Klaus Weide wrote:
> > On Fri, 7 Jul 2000, Vlad Harchev wrote:
> > > On Fri, 7 Jul 2000, Klaus Weide wrote:
> > > > On Fri, 7 Jul 2000, Vlad Harchev wrote:
> > > > >   Seems we should add new lynx.cfg setting READ_TIMEOUT to control 
> > > > > this (there
> > > > > already exists CONNECT_TIMEOUT). Does anybody object against it? 
> > > > 
> > > > As long as you keep the current behavior by default...
> > > 
> > >   I assume that by "current behaviour" you mean current value of timeout.
> > 
> > I meant current behavior in the same situation.  What you call "current
> > value of timeout" isn't all there is to it.
>  I don't get you - I assume you understand that 'READ_TIMEOUT' addition will
> just substitute '180000' with some expression 

That wasn't clearly stated, although I could assume that.

But anyway, that's not all that your change will do.  You will at least
also add a lynx.cfg and/or command line option, with some documentation
that implies a promise that READ_TIMEOUT will act as a read timeout.
Can you keep that promise, in all situations?  Or do you have to qualify

> - so all behaviour will remain the same. 
> > First of all, the 180000 wasn't originally meant as a timeout.  Rather
> > as a protection against an infinite loop, which is subtly different:
>  Yes, this value defines the value of timeout (18000 seconds by default).

No, 180000 * (approx. 100 ms + extra processing time), which is not the same
as 180000 * (exactly 100 ms).  Small errors accumulate.  The clock will
run faster if the process gets for some reason many interrupts that result
in EINTR. The clock will not run while the process is stopped (^Z).

All this is to say that the 180000 wasn't originally meant as a timeout.
If it had been, it probably would have been implemented differently, to
work more reliably as a timeout.
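
To illustrate the difference: a retry counter of 180000 with a nominal 100 ms
wait per try comes to 18000 seconds (5 hours) only if every try takes exactly
100 ms.  A loop meant as a timeout would more likely compare against a clock.
The following is a minimal sketch of that idea, not code from Lynx.  The names
monotonic_seconds() and wait_readable_until() are hypothetical, it assumes a
POSIX system with CLOCK_MONOTONIC, and it leaves out the interactive interrupt
check that the real loop would still need between waits.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper (not part of Lynx): current monotonic time in seconds. */
double monotonic_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper (not part of Lynx): wait until fd is readable or
 * until timeout_seconds of clock time have passed.
 * Returns 1 if readable, 0 on timeout, -1 on a select() error.
 */
int wait_readable_until(int fd, double timeout_seconds)
{
    double deadline = monotonic_seconds() + timeout_seconds;

    for (;;) {
        double remaining = deadline - monotonic_seconds();
        fd_set readfds;
        struct timeval tv;
        int ret;

        if (remaining <= 0.0)
            return 0;           /* deadline passed, however many tries it took */

        /* Wait in slices of at most 100 ms, as the counting loop does. */
        if (remaining > 0.1)
            remaining = 0.1;
        tv.tv_sec = 0;
        tv.tv_usec = (long) (remaining * 1e6);

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        ret = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ret > 0)
            return 1;
        if (ret < 0 && errno != EINTR)
            return -1;
        /* On EINTR or an expired slice, loop and recompute the remaining time. */
    }
}
```

Because the remaining time is recomputed from the clock on every pass, a burst
of EINTR returns or extra processing per try does not shift the point at which
the wait gives up, unlike a fixed count of tries.
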
Some historic CHANGES entries:

07-04-95 (Enjoy the fireworks!!!  8-)
* Increased the connect() and select() while()-looping limit in HTTCP.c
  to 30,000 tries. - FM
* Increased limits in select() loops to 5000 tries. - FM
* Increased the while() loop limit for select() tries in HTTCP.c to 500. - FM
* Limited the while() loop for select()'s in HTTCP.c to 50 tries, to help
  reduce likelyhood of a runaway CPU on undetected terminal disconnects. - FM

Note especially the last one; it sheds some light on the original motivation.
Note also the CHANGES entries from the same timeframe that mention fixes to
BSDselect; they should give you an idea why a protection against an infinite
loop was needed.
Reading old lynx-dev messages would probably also be illuminating.

> >     while (!ready) {
> >         /*
> >         **  Protect against an infinite loop.

Note that it doesn't say "Time out after too many tries" or something similar.

> >         */
> >         if (tries++ >= 180000) {
> >             HTAlert(gettext("Socket read failed for 180,000 tries."));

Note that it doesn't say "timed out" or something similar.

> > Secondly, note that not all systems will make use of your new READ_TIMEOUT
> > anyway.  Only those for which the
> > 
> >     #define NETREAD  HTDoRead
> > 
> > is not overridden in www_tcp.h will.
>   Yes, and it looks like cygwin and OS/2 will use HTDoRead.

And at least some instances on VMS won't.  Or so it seems - I don't know
if those combinations of Lynx with specific networking libraries are still

Anyway, you may end up promising a READ_TIMEOUT that doesn't actually have
any effect for some users.

Also, you seem to be thinking about *decreasing* the timeout with the new
hypothetical option.  But can one use it to increase the timeout?  What is
the absolute maximum?  Can one specify infinity?

The promise of using a specified (long) timeout also will not work if there
already is a shorter timeout, outside of lynx's control, imposed by OS or
the TCP {protocol,implementation} or a proxy server in the middle.

> > >  As for first part ("better use the script below") - we've discussed this
> > > before. This won't work for crawling 
> > 
> > That is not the situation here.  The original poster mentioned -dump, no
> > traversal.
> > 
> > Lynx's "traversal" code is quasi-interactive anyway.  You have a tty.
> > A normal 'z' should work just fine to interrupt a hanging connect or
> > read.  It did when I last checked.
>   I didn't know about 'z'.

And to expand on that, a specific READ_TIMEOUT option isn't needed for any
interactive lynx session, since one can always 'z'ap.  Specifying READ_TIMEOUT
for an interactive session just means you deny yourself the possibility to
decide to "give the connection a chance for a while longer".

So, just to make clear what we are talking about: adding READ_TIMEOUT
is meant to be useful only with -dump or -source.

> > > Also, if we use the script, we can only limit
> > > the total time of the crawling session, not the timeout for each 
> > > individual
> > > document.
> > 
> > True.
> > 
> > It depends on what the problem to be solved is (which nobody has clearly
> > stated).  As I wrote explicitly, I assumed that
>   Let's think about entire spectrum of problems with respect to timeout on
> reading, not just described in original post.

Fine if you have the time to think of every conceivable situation. :)
But it seems to me that you are mostly concerned with parts of the spectrum
that you have no experience with or personal use for (crawling) - and that
could normally be better done with not-lynx, anyway.

> > > > the problem
> > > > is really: 'Non-interactive lynx processes hang around for too long
> > > > under some conditions'.
> > 
> > If you are talking about "crawling session", you are talking about something
> > else, apparently.  At least you're not talking about lynx with -dump.
>   Yes, I was talking about recursively storing rendered versions of documents
> recursively.

The canonical recommendation for this kind of thing has been "use wget or
similar", for quite some time.

> > > > Better learn how to kill a process so that it *never* can run longer
> >     ====================================================================
> > > > than a max time.  Take the shell script below as a starting point.
> >     ===============
> > 
> > I stand by that.  Better learn how to do that, if that's what you need.
> > 
> > I didn't mean that a -read_timeout option would be useless.  Just that
> > in the situation at hand, as well as others (but not all), it is not
> > the most straightforward or reliable way to fulfil the requirement /
> > solve the problem.
>   Yes, that's what I mean - it won't be useless. But why do you think that it
> will be not the most reliable way to fulfil the requirement / solve the problem?

As I've already said, because your READ_TIMEOUT won't always work as one
might expect.

