lynx-dev

Re: lynx-dev TRST : see LHFB


From: Klaus Weide
Subject: Re: lynx-dev TRST : see LHFB
Date: Wed, 17 Nov 1999 06:10:33 -0600 (CST)

On Sat, 13 Nov 1999, Philip Webb wrote:

Reordered, to comment on the general first:
> you started this: can we keep trying to improve it a step at a time?

Someone should implement full table support, or some external renderer or
script has to be used, for non-simple tables.  The presence of TRST
doesn't change this.  It isn't step-by-step improvable into full table
support - at least *I* don't know how.  By its nature, it needs "simple"
tables.  Sure, we can probably push the boundary between simple enough
and not simple enough a little bit.  But that goes only so far.  If you
need support for table cells with arbitrary multiline contents (and some
of the tables you are dealing with really seem to require that), the
"simple" idea just isn't a very good starting point.

Of course, you can try to preprocess the HTML to make it more suitable
(which is what you are doing).  If you find some strategies that work
for a large number of cases, the equivalent transformations *might*[1]
even be easy to do in the Lynx code, basically before the TRST code sees
the stuff - maybe as some sort of "TableSoup" mode.  I think it's more
likely that such pre-transformations will have to vary a lot with the
individual cases, so there may not be a general enough strategy.  And
what can be done will still be fundamentally limited by the "simple"
approach - unless you pack so much logic into the preprocessor that it
becomes a full table renderer itself...

[1] I say "might" because sed/awk/perl etc. can easily do things that
Lynx's one-pass HTML.c cannot do easily: esp. scanning ahead to see what's
in an element's content at the point of a start tag.
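
The kind of look-ahead meant in [1] can be sketched in a few lines of
Python.  This is only an illustration, not the awk program mentioned in
this thread and not Lynx code: the function name flatten_cell_breaks and
both regular expressions are hypothetical, and the sketch assumes every
cell has an explicit closing tag and that tables are not nested.

```python
import re

# Hypothetical patterns: a TD/TH cell with an explicit end tag, and the
# explicit "wrapping" tags (<br>, <p>, </p>) that may occur inside it.
CELL_RE = re.compile(r"(<t[dh]\b[^>]*>)(.*?)(</t[dh]\s*>)",
                     re.IGNORECASE | re.DOTALL)
BREAK_RE = re.compile(r"<\s*(?:br|p)\b[^>]*>|</\s*p\s*>", re.IGNORECASE)


def flatten_cell_breaks(html):
    """Replace explicit line breaks inside table cells with one space.

    The whole cell content is available when the start tag is handled,
    which is exactly what a one-pass parser does not have.
    """
    def fix(match):
        start, content, end = match.groups()
        content = BREAK_RE.sub(" ", content)
        # Collapse the whitespace left behind by the removed tags.
        content = re.sub(r"\s+", " ", content).strip()
        return start + content + end

    return CELL_RE.sub(fix, html)


row = "<TR><TD>Blah<br>more blah</TD><TD>Other blah<br>more other blah</TD></TR>"
print(flatten_cell_breaks(row))
```

Each cell ends up on one line, which is what the "simple" approach needs,
at the price of rows that may become too long.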


> i hope you are realising from my example (ocaa)
> that tables can be a central part of useful real-life documents,
> not just for people who watch football (grin at KD).

Well, somehow football sites seem to manage to produce better Web
pages than the OCAA...  In spite of all the financial support by AES
Kingston Inc., Canadian Niagara Power, Commission for Environmental
Cooperation, etc. etc....

> 991113 Klaus Weide wrote: 
> > On Sat, 13 Nov 1999, Philip Webb wrote:
> >> goto  www.chass.utoronto.ca/~purslow/ocaa2.html ,
> >> which has been tidied up with my awk program;
> >> look at Tables 3-1 to 3-5 , starting at page 27 (ie enter  27p ).
> >>> What about the headings?
> > there is nothing in the markup that says
> > "this is a column heading row" or "this is a 'real' table row".
> 
> yes: i've changed appropriate <TD> ... </TD>'s to <TH> ... </TH>'s
> in  www.chass.utoronto.ca/~purslow/ocaa3.html ,
> which improves the processing in some tables, but not others.

Well, now that I want to look at it, it's not there any more...

However, TH vs. TD should not make a big difference - it should
only affect cell alignment *if* the table is TRST-rendered.
There is another small difference in the SortaSGML data (affecting
which surrounding elements can be closed as error recovery).
I am surprised that replacing TD with TH would improve any of your
examples.

Note that TH can also be used for columns of cells (e.g. at the left
side), or even in the middle of a table.  In general it doesn't seem
wise to accord the TH/TD difference any great significance.

Anyway, if you want me to look at something specifically, please cut
out *one* table into a separate file.  Easier to understand, easier to
wade through traces...

> i have been L/R-scrolling using  most , with Lynx  -width=200 .
>  
> next point: the <TD> & <TH> tags contain attributes WIDTH="n%" :
> can't TRST use these to wrap headings/numbers which exceed those limits,
> allowing for the available column space (whatever Lynx -width says)?
> ie if i have an 80-col display, of which Lynx uses  74 cols ,
> TRST would say: "rounding down, we have  18 col  for each <TH>/<TD>,
> so i'll wrap with stated alignment (or default TH left, TD right)"?

The problem isn't recognizing the WIDTH, the problem is whether we can
do anything with that information.  Essentially the problem is in your
two (or 3?:)) little words "i'll wrap".  The TRST code itself cannot wrap;
the problem to TRST handling is not just with rows that are too long, but
also with lines that are wrapped - remember, there was explicit "wrapping"
(<br>, <p>) which you removed in the first place.  Introducing the wrapping
again by some sort of automatic procedure will just get us back to square
one.

Somewhat more generally, I see very little reason why WIDTH should be
observed at all - as long as we don't have a mechanism that can reflow
and linebreak cell contents and still render a table of such cells in
tabular form (iow: full table support).  The given WIDTHs will in general
be either too wide to make optimal use of the horizontal space, or too
narrow to make everything fit.
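
The arithmetic itself is trivial, which is the point: recognizing WIDTH
is not the hard part.  A minimal sketch in Python, assuming four cells
that each carry WIDTH="25%" (the thread only gives the 74 usable columns
and the resulting 18; the four-way split is an assumption), with
hypothetical function names:

```python
def widths_from_percent(usable_cols, percents):
    """Turn WIDTH="n%" values into character widths, rounding down."""
    return [usable_cols * p // 100 for p in percents]


def overflowing_cells(cells, widths):
    """Return the cell texts that do not fit into their computed width."""
    return [text for text, width in zip(cells, widths) if len(text) > width]


widths = widths_from_percent(74, [25, 25, 25, 25])
print(widths)  # [18, 18, 18, 18]

cells = ["Blah", "A heading that is far too long", "12.5", "n/a"]
print(overflowing_cells(cells, widths))
```

Everything returned by overflowing_cells would have to be wrapped, and
the short cells waste most of their 18 columns.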

> in fact, if the <TH>/<TD> WIDTH's are observed,
> a table which calls <TH>'s <TD>'s (as the previous ocaa2.html does)
> won't cause problems, tho' L/R alignments may be imprecise.

Again, the major problem is in what you mean by "observe".

> TRST can keep track of how many lines/row it must allow (in pass 1),

In general, that number is 1 (one).

Sometimes it manages to do something interesting with TRs that end up
on more than one line.  At most one of those resulting lines will be
subject to TRST reformatting.  Not that I fully understand any more when
this is the case...  For an example of this, see
<http://www.securityfocus.com/templates/archive.pike?list=1&threaded=1>.
Note that the DIV in the middle cells causes Lynx to break the line before
and after the contents.  Note how the 3rd column's cells end up being
aligned under "Last Msg by" from the first row (the only one that isn't
broken).  Now I'm not even sure whether this actually should happen...

> then use that information when reformatting the table later.

I don't know how you want it to format something like

 <TR><TD>Blah<br>more blah</TD><TD>Other blah<br>more other blah</TD></TR>

or which of the resulting partial (before/after the break) cells should
enter into the max. column width calculations.  That's basically the
same problem, whether the <br> was already in the input or whether it
represents some break introduced by some Lynx logic to avoid overlong
lines (what you seem to suggest wrt. WIDTH).  But (as I think you
understand) it will never look like

         Blah      Other blah
         more blah more other blah

with the TRST approach.
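
For comparison, producing that layout means treating each cell as a
block of lines and taking the column width over all of a cell's lines,
which is already a small piece of full table support.  A minimal sketch
in Python, assuming the cells have already been split at their breaks;
render_row is a hypothetical name and nothing here reflects Lynx code:

```python
from itertools import zip_longest


def render_row(cells, gap=1):
    """Lay out one table row whose cells may span several lines.

    `cells` is a list of cells, each cell a list of text lines.
    """
    # Column width = longest line of the cell (0 for an empty cell).
    widths = [max((len(line) for line in cell), default=0) for cell in cells]
    out = []
    # Walk the cells in parallel; shorter cells are padded with "".
    for parts in zip_longest(*cells, fillvalue=""):
        padded = [part.ljust(width) for part, width in zip(parts, widths)]
        out.append((" " * gap).join(padded).rstrip())
    return out


row = [["Blah", "more blah"], ["Other blah", "more other blah"]]
for line in render_row(row):
    print(line)
```

A real renderer would also have to take the widths over every row of the
table and reflow cells that are too wide, which this sketch does not
attempt.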

  ----

Since you are testing external scripts anyway... Have you tried any
of the stuff at http://www.crl.com/%7Esubir/lynx/patches.html recently?

  Klaus

