LYNX-DEV monster files & GridText tuning


From: Klaus Weide
Subject: LYNX-DEV monster files & GridText tuning
Date: Mon, 28 Jul 1997 20:22:19 -0500 (CDT)

Here are three big files:

sol:kweide$ ls -l ls-lR-*.html
-rw-r--r--   1 kweide   sysadmin  1000424 Jul 27 22:55 ls-lR-1M.html
-rw-r--r--   1 kweide   sysadmin  2000016 Jul 28 05:02 ls-lR-2M.html
-rw-r--r--   1 kweide   sysadmin  4000149 Jul 27 22:34 ls-lR-4M.html

They are simply the first (approximately) 1M, 2M, and 4M bytes of
an uncompressed copy of <URL:
http://SUNSITE.UNC.EDU/pub/Linux/ls-lR.html.gz>. (NOTE: don't click
that link just yet...)  That file in full has an uncompressed size of
nearly 10M. It is also invalid HTML, but that's beside the point here.

When the files were fully loaded, lynx needed significantly more memory
than the file size to hold the rendered version and its structures, about
3 times as much.
The 4M file made the lynx process grow to ~ 14 MB, from 2 MB at startup
(these were compiled with debugging and not stripped).  This, as well as
the time needed for loading, of course depends heavily on the contents
of the files.  In this case, they are dense with links, probably more than
half of the bytes are long HREF URLs.  The 4M version has more than
20000 links! 

I am referring to recent Lynx 2.7.1ac-0.* here, but there shouldn't be
any relevant difference between this and the fotemods code.

I give some timing data below.  They are only meant as rough indications.
I ran commands like the following to get them:

sol:kweide$ time lynx2-7-1ac-42/lynx.orig -nolist -dump ls-lR-1M.html >/dev/null

real       16.7
user       16.2
sys         0.4

(that is, about 16 seconds) and then show only the "user" value.
Note that these times are not very relevant for typical and "normal" use
of Lynx.  Normally you wouldn't want to load such big files, and in
interactive use factors other than those tested with "-nolist -dump"
come into play.  Also, on a system with limited memory things may turn
out a bit more unpleasant...

Anyway, here are the results for unmodified Lynx (devel code, compiled
with ncurses - which shouldn't play a role, but it gets slower if compiled
with the alpha color-style code enabled.)

   1M:   16.2         2M:   1:06.2         4M:   4:25.0

The simple change given in Appendix A does not alter functionality and
gives a noticeable improvement:

   1M:    9.2         2M:     37.1         4M:   2:22.4

There is, for both versions, an approximately quadratic growth of
loading time with the file size.  As the size doubles,  it takes 4 times
as much time to load the file.

Another change, see B below, improves things more, especially for the
biggest files:

   1M:    8.1         2M:     17.7         4M:     43.3

The behavior is no longer dominated by a quadratic law.  (It seems rather
linear, but there's not really enough data to say; maybe someone else wants
to run a test on the full 10M :) )
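
As a quick check of the scaling claims, the ratios between successive
timings can be computed directly from the figures above.  The sketch below
does only that arithmetic; the names growth_ratio and print_ratios and the
conversion of the min:sec values to seconds are not part of Lynx, just an
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* "user" times in seconds for the 1M, 2M and 4M files, copied from the
 * figures above (1:06.2 = 66.2 s, 4:25.0 = 265.0 s, 2:22.4 = 142.4 s). */
static const double t_orig[3]     = { 16.2, 66.2, 265.0 };
static const double t_change_a[3] = {  9.2, 37.1, 142.4 };
static const double t_change_b[3] = {  8.1, 17.7,  43.3 };

/* Ratio of the time for one file to the time for the file half its size.
 * A ratio near 4 suggests quadratic growth, a ratio near 2 linear growth. */
static double growth_ratio(const double t[3], int step)
{
    assert(step == 0 || step == 1);
    return t[step + 1] / t[step];
}

/* Print both doubling ratios for one series of timings. */
static void print_ratios(const char *label, const double t[3])
{
    printf("%-10s 1M->2M: %.2f   2M->4M: %.2f\n",
           label, growth_ratio(t, 0), growth_ratio(t, 1));
}
```

Calling print_ratios("original", t_orig) and likewise for the other two
arrays gives ratios close to 4 for the unmodified code and for change A,
and between 2 and 2.5 for change B.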

So what was Lynx doing?  It was spending a lot of time going over the list
of links already handled, for each new link (or line) it was adding to
the text.  The two instances of this which I found were both added since
2.7.1.  Change A is a simple optimization which avoids this looping for
the most common case (documents which do not have elements with NAME or ID
attributes within A elements).  Change B removes code which is now
necessary, so I won't remove it except for testing.  Maybe Fote can come
up with a more efficient implementation.
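
To make the cost of that looping concrete, here is a minimal sketch that
counts how many list nodes get visited when each newly added anchor is
looked up by number, with and without first checking the most recently
added anchor (the idea behind change A).  The Anchor and Text types and
all function names are simplified stand-ins made up for this example, not
the real GridText.c definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the anchor list; not the real GridText.c types. */
typedef struct Anchor {
    int number;
    struct Anchor *next;
} Anchor;

typedef struct {
    Anchor *first_anchor;
    Anchor *last_anchor;
} Text;

/* Append a new anchor with the given number; returns 0 on success. */
static int append_anchor(Text *text, int number)
{
    Anchor *a = malloc(sizeof *a);

    if (a == NULL)
        return -1;
    a->number = number;
    a->next = NULL;
    if (text->last_anchor != NULL)
        text->last_anchor->next = a;
    else
        text->first_anchor = a;
    text->last_anchor = a;
    return 0;
}

/* Look up an anchor by number, adding the nodes visited to *visited.
 * With try_last_first set, the most recently added anchor is checked
 * before falling back to a scan from the head of the list. */
static Anchor *find_anchor(const Text *text, int number,
                           int try_last_first, long *visited)
{
    Anchor *a;

    assert(visited != NULL);
    if (try_last_first && text->last_anchor != NULL) {
        ++*visited;
        if (text->last_anchor->number == number)
            return text->last_anchor;
    }
    for (a = text->first_anchor; a != NULL; a = a->next) {
        ++*visited;
        if (a->number == number)
            return a;
    }
    return NULL;
}

/* Add n anchors, looking each one up right after it is added (as when an
 * anchor is closed right after being opened); returns total node visits. */
static long simulate(int n, int try_last_first)
{
    Text text = { NULL, NULL };
    long visited = 0;
    int i;

    for (i = 1; i <= n; i++) {
        if (append_anchor(&text, i) != 0)
            break;
        find_anchor(&text, i, try_last_first, &visited);
    }
    while (text.first_anchor != NULL) {   /* free the list again */
        Anchor *next = text.first_anchor->next;
        free(text.first_anchor);
        text.first_anchor = next;
    }
    return visited;
}
```

Without the shortcut the total number of visits is n(n+1)/2, so doubling
the number of anchors roughly quadruples the work; with it, each lookup
in this scenario touches a single node.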

Or maybe we should just leave things as they are, 'cause optimizing for
multi-megabyte HTML files may be just wrong priorities.  Actually, the
fact that lynx needs more and more time to render a long document (with
many anchors) can be seen as a kind of protection, it slows down the rate
at which lynx can drive a machine into thrashing...  (but it comes at the
price of wasted cpu cycles).

I noticed an annoying thing with these big files.  Lynx doesn't check
for a 'z' key interrupt when loading a local file, and it also doesn't
give any progress indication.  So a user who has been misguided into
loading a 10M file will see nothing happening, and probably think that
lynx is "broken".  So I am adding a check for 'z' and a progress
indication for local files.  I let the display kick in only after a few
hundred k have been read, since in the more normal case they would
probably just be distracting, and fly by too fast to read.  Also there
probably is no point in making loading of short files interruptible.
(The fread() itself cannot be interrupted by 'z'; I simply assume that
the read itself doesn't hang for local files.  If it does, ^C and
^Z may still be possible.)
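
The general shape of such a loop might look like the sketch below.  This
is not the actual Lynx change: the function name load_local_file, the two
callback types, the chunk size and the 256k threshold are all assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE         4096           /* bytes per fread() call */
#define PROGRESS_THRESHOLD (256L * 1024)  /* stay quiet below this size */

/* Hypothetical callbacks: 'interrupted' returns nonzero once the user has
 * asked to stop; 'progress' is told how many bytes have been read so far. */
typedef int  (*interrupt_fn)(void);
typedef void (*progress_fn)(long bytes_read);

/* Read fp to the end in chunks.  Returns the number of bytes read, or -1
 * if the load was interrupted.  Short files never trigger either callback. */
static long load_local_file(FILE *fp, interrupt_fn interrupted,
                            progress_fn progress)
{
    char buf[CHUNK_SIZE];
    long total = 0;
    size_t n;

    assert(fp != NULL);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        total += (long) n;
        /* ... hand buf[0..n-1] to the parser here ... */
        if (total >= PROGRESS_THRESHOLD) {
            if (progress != NULL)
                progress(total);
            if (interrupted != NULL && interrupted())
                return -1;
        }
    }
    return total;
}
```

The interrupt check sits between reads, so a hanging fread() would still
not be interruptible this way, as noted above.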


   Klaus

Appendix A

One line changed at the start of HText_endAnchor:

Index: src/GridText.c
*** src/GridText.c.orig Sun, 20 Jul 1997 18:09:21 -0600 dickey
--- src/GridText.c      Mon, 28 Jul 1997 17:32:06 -0600 kweide
***************
*** 2457,2463 ****
       *  without needing to close any anchor with an HREF
       *  within which that link might be embedded. - FM
       */
!     if (number <= 0) {
        a = text->last_anchor;
      } else {
          for (a = text->first_anchor; a; a = a->next) {
--- 2459,2465 ----
       *  without needing to close any anchor with an HREF
       *  within which that link might be embedded. - FM
       */
!     if (number <= 0 || text->last_anchor->number == number) {
        a = text->last_anchor;
      } else {
          for (a = text->first_anchor; a; a = a->next) {


Appendix B

Remove (comment out) the following block at the bottom of split_line:
(I am NOT recommending to do this, except for testing!)

    /*
     *  If we split the line, adjust the anchor
     *  structure values for the new line. - FM
     */
    if (split > 0) {
        for (a = text->first_anchor; a; a = a->next) {
            if (a->line_num == CurLine && a->line_pos >= split) {
                a->start += (1 + SpecialAttrChars - HeadTrim - TailTrim);
                a->line_pos -= (split - SpecialAttrChars + HeadTrim);
                a->line_num = text->Lines;
            }
        }
    }
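
One possible direction for a cheaper version of this block, offered only
as a sketch: if anchors are linked in document order, all anchors on the
line being split sit at the tail of the list, so the scan could start at
the first anchor recorded for that line instead of at text->first_anchor.
That ordering is an ASSUMPTION I have not verified against GridText.c.
The Anchor type, the function name, and the start_delta/pos_delta
parameters (standing in for the SpecialAttrChars, HeadTrim and TailTrim
terms) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified anchor record; field names follow the block
 * above but this is not the real GridText.c structure. */
typedef struct Anchor {
    int line_num;   /* line the anchor starts on */
    int line_pos;   /* column within that line */
    int start;      /* character offset in the document */
    struct Anchor *next;
} Anchor;

/* Adjust anchors on line cur_line at or after column split, scanning from
 * 'from' rather than from the head of the list.  ASSUMPTION: anchors are
 * linked in document order, so every anchor on cur_line is reachable from
 * 'from', the first anchor recorded for that line.  Returns the first
 * anchor moved to new_line (NULL if none), which the caller could keep as
 * the starting point for the next split.  *visited counts nodes touched. */
static Anchor *adjust_anchors_after_split(Anchor *from, int cur_line,
                                          int new_line, int split,
                                          int start_delta, int pos_delta,
                                          long *visited)
{
    Anchor *a, *first_moved = NULL;

    assert(visited != NULL);
    for (a = from; a != NULL; a = a->next) {
        ++*visited;
        if (a->line_num == cur_line && a->line_pos >= split) {
            a->start += start_delta;
            a->line_pos -= pos_delta;
            a->line_num = new_line;
            if (first_moved == NULL)
                first_moved = a;
        }
    }
    return first_moved;
}
```

The caller would have to keep the "first anchor on the current line"
pointer up to date whenever an anchor is added or a line is split; that
bookkeeping is left out here.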

  

