Re: LYNX-DEV monster files & GridText tuning
Tue, 29 Jul 1997 18:19:17 -0500 (EST)
Klaus Weide <address@hidden> wrote:
>They are simply the first (approximately) 1M, 2M, and 4M bytes of
>an uncompressed copy of <URL:
>http://SUNSITE.UNC.EDU/pub/Linux/ls-lR.html.gz>. (NOTE: don't click
>that link just yet...) That file in full has an uncompressed size of
>nearly 10M. It is also invalid HTML, but that's beside the point here.
>When the files were fully loaded, lynx needed significantly more memory
>to hold the rendered version and its structures, about a factor of 3.
>The 4M file made the lynx process grow to ~ 14 MB, from 2 MB at startup
>(these were compiled with debugging and not stripped). This, as well as
>the time needed for loading, of course depends heavily on the contents
>of the files. In this case, they are dense with links, probably more than
>half of the bytes are long HREF URLs. The 4M version has more than
>I am referring to recent Lynx 2.7.1ac-0.* here, but there shouldn't be
>any relevant difference between this and the fotemods code.
None. I had checked that out back when he posted the message.
The uncompression worked fine in all cases. For the text/plain version,
it was fetched, uncompressed, rendered, and displayed in about 10 secs.
The text/html version has pitifully Bad HTML, but nothing problematic
for Lynx. The problem is that it has over FIFTY-EIGHT THOUSAND Anchors,
which takes a while for Lynx to deal with. :)
>The simple change given in Appendix A does not alter functionality and
>gives a noticeable improvement:
>There is, for both versions, an approximately quadratic growth of
>loading time with the file size. As the size doubles, it takes 4 times
>as much time to load the file.
>Another change, see B below, improves things more, especially for the
>The behavior is not dominated by a quadratic law. (It seems rather linear,
>but there's not really enough data to say; maybe someone else wants to run
>a test on the full 10M :) )
>So what was Lynx doing? It was spending a lot of time going over the list
>of links already handled, for each new link (or line) it was adding to
>the text. The two instances of this which I found were both added since
>2.7.1. Change A is a simple optimization which avoids this looping for
>the most common case (documents which do not have elements with NAME or ID
>attributes within A elements). Change B removes code which is now
>necessary, so I won't remove it except for testing. Maybe Fote can come
>up with a more efficient implementation.
>Or maybe we should just leave things as they are, 'cause optimizing for
>multi-megabyte HTML files may be just wrong priorities. Actually, the
>fact that lynx needs more and more time to render a long document (with
>many anchors) can be seen as a kind of protection: it slows down the rate
>at which lynx can drive a machine into thrashing... (but it comes at the
>price of wasted cpu cycles).
This will be a rare case, but if the handling of such things
can be improved, why not?
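
To make the quadratic behavior and the idea behind change A concrete,
here is a minimal sketch in C. It is not the Lynx code: the Anchor
struct, scan_then_append() and load_cost() are hypothetical names made
up for illustration, and need_scan stands in for "this document has
NAME or ID attributes within A elements". It only counts how many
already-handled anchors get revisited while a document is loaded.

```c
#include <stdlib.h>

/* Hypothetical anchor record; not Lynx's actual data structure. */
typedef struct {
    int line;   /* line on which the anchor was placed */
} Anchor;

/*
 * Walk every anchor handled so far, then append the new one.
 * Returns how many existing anchors were visited.
 */
static unsigned long scan_then_append(Anchor *list, size_t count, int line)
{
    unsigned long visited = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        visited++;
        if (list[i].line > line)    /* placeholder check, never true here */
            break;
    }
    list[count].line = line;
    return visited;
}

/*
 * Simulate loading a document with n_anchors anchors and return the
 * total number of anchors revisited. With need_scan == 0 the walk is
 * skipped entirely (the assumed common case).
 */
unsigned long load_cost(size_t n_anchors, int need_scan)
{
    Anchor *list = malloc(n_anchors * sizeof *list);
    unsigned long total = 0;
    size_t count;

    if (list == NULL)
        return 0;
    for (count = 0; count < n_anchors; count++) {
        if (need_scan)
            total += scan_then_append(list, count, (int)count);
        else
            list[count].line = (int)count;   /* append only, no walk */
    }
    free(list);
    return total;
}
```

With the scan enabled, n anchors cost n*(n-1)/2 visits, so doubling
the anchor count roughly quadruples the work, which matches the
"size doubles, time goes up 4 times" observation quoted above. With
the scan skipped, appending stays linear in the number of anchors.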
>I noticed an annoying thing with these big files. Lynx doesn't check
>for a 'z' key interrupt when loading a local file, and it also doesn't
>give any progress indication. So a user who has been misguided into
>loading a 10M file will see nothing happening, and probably think that
>lynx is "broken". So I am adding a check for 'z' and a progress
>indication for local files. I let the display kick in only after a few
>hundred k have been read, since in the more normal case they would
>probably just be distracting, and fly by too fast to read. Also there
>probably is no point in making loading of short files interruptible.
>(The fread() itself cannot be interrupted by 'z', I simply assume that
>the read itself doesn't hang for local files. If that happens, ^C and
>^Z may be still possible.)
That's a longstanding problem, particularly for the UMN's
"All the Gopher Servers in the World" link back in the gopher days.
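
The approach described in the quoted paragraph can be sketched roughly
as below. This is an illustration under assumptions, not the actual
patch: read_with_progress(), the interrupt_check callback (standing in
for the 'z' key check), CHUNK_SIZE and the 256K PROGRESS_THRESHOLD are
all hypothetical. The progress message and the interrupt check only
kick in once the threshold has been read, and the fread() call itself
is not interruptible, as noted above.

```c
#include <stdio.h>

#define CHUNK_SIZE          4096
#define PROGRESS_THRESHOLD  (256L * 1024L)  /* assumed "few hundred k" */

/* Hypothetical callback: returns nonzero if the user asked to stop. */
typedef int (*interrupt_check)(void);

/*
 * Read fp to the end in chunks. Short files are read silently and
 * are not interruptible. Once PROGRESS_THRESHOLD bytes have been
 * read, print a running byte count and poll for an interrupt
 * between reads. Returns the number of bytes read.
 */
long read_with_progress(FILE *fp, interrupt_check interrupted,
                        int *was_interrupted)
{
    char buf[CHUNK_SIZE];
    long total = 0;
    size_t got;

    *was_interrupted = 0;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        total += (long)got;
        if (total < PROGRESS_THRESHOLD)
            continue;               /* stay quiet for short files */
        fprintf(stderr, "\rRead %ld KB", total / 1024);
        if (interrupted != NULL && interrupted()) {
            *was_interrupted = 1;   /* caller decides what to discard */
            break;
        }
    }
    return total;
}
```

Passing NULL for the callback gives progress output without any
interrupt handling. A real implementation would hand each chunk on to
the parser instead of dropping it.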
Your A looks OK, but I haven't checked it. Your B will
break too much that's needed now that ID applies to all BODY
elements in the so-called HTML 4.0 and we need to deal adequately
with technically embedded links (which is why those mods were made).
Foteos Macrides Worcester Foundation for Biomedical Research
address@hidden 222 Maple Avenue, Shrewsbury, MA 01545