Re: LYNX-DEV problem when page has only one input area
From: Benjamin C. W. Sittler
Subject: Re: LYNX-DEV problem when page has only one input area
Date: Thu, 14 Nov 1996 10:49:44 -0700 (MST)
On Wed, 13 Nov 1996, Foteos Macrides wrote:
> Filip M Gieszczykiewicz <address@hidden> wrote:
> >Greetings. I'm not sure if this issue has been beaten to death here,
> >so let me know if it has.
> >
> >It seems that people get the notion that Netscape is more than a browser
> >and also a validator - "if my pages work in Netscrape, they must be ok".
> >Too many of these same people don't know and don't care that this is
> >absolute BS.
> >
> >As a result, a lot of broken HTML gets out there and really messes with
> >stricter browsers like, say, lynx. I just heard that Fote fixed the
> >unclosed <form> - YES! I run across these all the time... another
> >infamous is <a name="blah"> (no closing </a> or tag text), and various
> >combinations of markup in links. Example:
> >
> ><ul>
> ><li><a href="Howdie1"><b>Howdie1</a>
> ><li><a href="Howdie2"><i>Howdie2<i></a>
> ><li><a href="Howdie3">Howdie3</b></a>
> ><li><a href="Howdie4"><b>Howdie4</b></a>
> ><li><a href="Howdie5">Howdie5</a>
> ></ul>
> >
> >The first doesn't show up as the link, neither does the second one
> >(and the bullet is gone from now on, as well), the third one is OK,
> >as is the fourth. The fifth one hoses up but DOES select, and shows
> >some silly "* Howd" on the sixth line...
>
> Lynx has no realistic prospect of handling HTML that bad
> as intended. When it's that bad, the objective is simply not to
> crash. Lynx can't ignore any interdigitated container (SGML_MIXED)
> tags that it recognizes, which is functionally what you have there.
> It must substitute the end tag it's expecting. It should unwind to
> what it's expecting, but I changed it not to do that a year or so
> ago, and that helped. I also changed the worst offenders to
> SGML_EMPTY, and look for their end tags explicitly, so those can
> be interdigitated, but you can't do that for everything and still
> have reasonable performance.
That may be so, but what if we used a "tag stack" model and only
unwound as far as absolutely necessary when encountering illegal markup?
(Sorta equivalent to making all end-tags omissible.) So for the above
doc you might have the following virtual tag sequence (not strictly HTML):
<UL>
<LI><A HREF="Howdie1"><B><#PCDATA "Howdie1"></#PCDATA><!--
Parse Error: non-omissible /B omitted.
--></B></A>
</LI><LI><A HREF="Howdie2"><I><#PCDATA "Howdie2"></#PCDATA><I><!--
Parse Error: non-omissible /I omitted.
--></I><!--
Parse Error: non-omissible /I omitted.
--></I></A>
</LI><LI><A HREF="Howdie3"><#PCDATA "Howdie3"></#PCDATA><!--
Parse Error: /B does not close any open element.
--></A>
</LI><LI><A HREF="Howdie4"><B><#PCDATA "Howdie4"></#PCDATA></B></A>
</LI><LI><A HREF="Howdie5"><#PCDATA "Howdie5"></#PCDATA></A>
</LI></UL>
(All inferred tags are shown at the point of inference, possibly
preceded by an error message in a comment)
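Here is a minimal Python sketch of the "unwind only as far as necessary"
idea, assuming the open elements are kept in a plain list used as a
stack. The name handle_end_tag and the message strings are made up for
illustration and are not anything in Lynx; for simplicity every end tag
is treated as non-omissible.

```python
def handle_end_tag(stack, name, errors):
    """Pop only as far as needed to close `name`.

    Returns the list of end tags that were emitted, innermost first.
    Hypothetical helper: it assumes every end tag is non-omissible.
    """
    if name not in stack:
        # Stray end tag: report it and leave the stack untouched.
        errors.append(f"/{name} does not close any open element.")
        return []
    emitted = []
    while stack:
        top = stack.pop()
        emitted.append(top)
        if top == name:
            break
        # Anything popped on the way down was closed by inference.
        errors.append(f"non-omissible /{top} omitted.")
    return emitted
```

For the first list item the stack is UL, LI, A, B when </a> arrives, so
the sketch infers </B> (with an error) and then closes A. For the third
item, </b> matches nothing on the stack and is simply reported.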
The reason for #PCDATA is that text implies the close of some HTML
tags, such as HR and IMG, and even DIV if HTML.Recommended is used,
although this would produce a parse error. One way to accomplish this
would be to have a two-dimensional data structure (perhaps an array)
indexed in both dimensions by tag name (including #PCDATA). Let's call
it OpenCloses[i][j]. OpenCloses[i][j] could take on one of two values,
depending on i and j:
0. false
   <i> does not imply </j>
   (the case when i==j==I, and when i==B and j==A)
1. true
   <i> implies </j>
   (the case when i==j==LI, and when i==#PCDATA and j==DIV)
Another data structure, call it OmitClose[i], would contain true (1)
when </i> is omissible (i.e. HTML, BODY), and false (0)
elsewhere. This would be used to generate the error messages above.
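A rough Python sketch of the two tables, assuming a sparse dict of sets
in place of a full two-dimensional array. The entries shown are only the
ones mentioned in this message; treating LI, HR and IMG as having
omissible end tags is my assumption, as is checking only the innermost
open element. The names open_closes and push_tag are made up.

```python
PCDATA = "#PCDATA"

# Sparse OpenCloses: OPEN_CLOSES[i] is the set of j for which <i>
# implies </j>. Any pair not listed is false.
OPEN_CLOSES = {
    "LI": {"LI"},
    PCDATA: {"DIV", "HR", "IMG"},
}

# OmitClose: tags whose end tag may be omitted without an error.
# HTML and BODY come from the text; LI, HR and IMG are assumptions.
OMIT_CLOSE = {"HTML", "BODY", "LI", "HR", "IMG"}


def open_closes(i, j):
    """True when opening <i> implies </j>."""
    return j in OPEN_CLOSES.get(i, set())


def push_tag(stack, name, errors):
    """Open `name`, first closing the innermost element if implied.

    Returns the list of elements closed by inference.
    """
    closed = []
    while stack and open_closes(name, stack[-1]):
        top = stack.pop()
        closed.append(top)
        if top not in OMIT_CLOSE:
            errors.append(f"non-omissible /{top} omitted.")
    if name != PCDATA:
        # Text is treated as a leaf here, so it is never left open.
        stack.append(name)
    return closed
```

With this, a second <LI> quietly closes the first, while text directly
inside DIV closes the DIV and records a parse error.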
This, however, doesn't work for elements which may contain only a
certain sequence of contents (an HTML3 UL, for example, may have an LH
only at the beginning of the list). To handle this we need to build a
one-dimensional array of regular expressions (or an equivalent)
indexed by tag. For example, the HTML3 entry for UL and OL might look
like this:
LH?, LI+ (excerpt from HTML3 DTD... optional LH followed by one or more
LIs.)
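One way to sketch the per-tag content model in Python, assuming the
sequence of child tag names is encoded as a space-terminated string and
matched against an ordinary regular expression. The table CONTENT_MODEL
and the function content_ok are hypothetical names, and only the UL/OL
model quoted above is filled in.

```python
import re

# One compiled pattern per parent tag. Each child name is followed by a
# single space, so "LH?, LI+" becomes "(LH )?(LI )+".
CONTENT_MODEL = {
    "UL": re.compile(r"(?:LH )?(?:LI )+"),
    "OL": re.compile(r"(?:LH )?(?:LI )+"),
}


def content_ok(parent, children):
    """True when the child tag sequence satisfies the parent's model.

    Parents with no entry in the table are accepted unchecked.
    """
    model = CONTENT_MODEL.get(parent)
    if model is None:
        return True
    encoded = "".join(child + " " for child in children)
    return model.fullmatch(encoded) is not None
```

So content_ok("UL", ["LH", "LI", "LI"]) holds, while an LH after the
first LI, or an empty list, does not match.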
If this were implemented, Lynx could actually understand stuff like
</> and <> (SHORTTAG) that only full-featured SGML systems
understand. In fact, Lynx could be a real SGML system. According to
the HTML DTD, SHORTTAG *is* allowed in HTML docs.
> It may behave somewhat differently with that bad markup
> now that it's unwinding on EOF, but it won't make it "right".
EOF should cause a complete unwinding of the tag stack, producing
error messages for all non-omissible tags.
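As a sketch of the EOF case in Python, again assuming a list used as the
tag stack and a set standing in for OmitClose. The function name
unwind_at_eof is made up, and the contents of OMIT_CLOSE beyond HTML and
BODY are an assumption.

```python
# Tags whose end tag may be omitted; LI is assumed, HTML and BODY are
# the examples given in the text.
OMIT_CLOSE = {"HTML", "BODY", "LI"}


def unwind_at_eof(stack, omit_close=OMIT_CLOSE):
    """Close everything still open, innermost first.

    Returns (closed, errors), where errors has one message for each
    non-omissible end tag that had to be inferred.
    """
    closed = []
    errors = []
    while stack:
        top = stack.pop()
        closed.append(top)
        if top not in omit_close:
            errors.append(f"non-omissible /{top} omitted.")
    return closed, errors
```

If the document ends with HTML, BODY, UL, LI and A still open, all five
are closed, but only A and UL produce error messages.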
Perhaps Lynx 3 should be an SGML system? If not, I'll probably use Lynx
as the fetching engine for a web browser built around a valid parser,
but it will take a while to develop.
Just a thought.
--
Ben