From: Klaus Weide
Subject: Re: LYNX-DEV more on parsing error recovery
Date: Sat, 10 May 1997 12:52:37 -0500 (CDT)

On Thu, 8 May 1997, Laura Eaves wrote:

> The new parser seems to handle <form>s in pre blocks, but still
> doesn't close the pre block at </pre> if it's expecting a </form>:
>       ...
>       <pre><form ...>
>       ...
>       </pre>...</form>
> While it doesn't close the form prematurely like in 2.7.1, I think it should
> be able to close the PRE block even when it's expecting a </form>.
> I noticed the new parser puts FORM tags back on the stack.  

Yes.  It is an attempt to find out how much a more structured approach
(as opposed to a "tag salad" approach) can do to deal with the invalid
stuff.  As such, it is somewhat extreme.  As I have said elsewhere, in
the end the best approach will likely be a combination or compromise (if
it turns out that there is anything good left which the more structured
approach has to offer).  That probably includes treating FORM as in Fote's
code (and the not-so-well-named "old" parsing).
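For the specific case Laura raises, one possible recovery is the "put FORM back on the stack" behavior she noticed, combined with letting </PRE> close whatever is still open inside the PRE. Below is a rough Python sketch of that idea. It is only an illustration under stated assumptions: Lynx itself is written in C, `handle_end_tag` and `REOPENABLE` are hypothetical names, and this is not how the new parser is actually implemented.

```python
# Hypothetical sketch of "re-push FORM" end-tag recovery; NOT Lynx's code
# (Lynx is written in C). Names here are illustrative assumptions.

# Assumption: FORM is the only element treated as style-neutral, i.e. one
# that should survive being implicitly closed by an unrelated end tag.
REOPENABLE = {"FORM"}


def handle_end_tag(stack, tag):
    """Close the innermost open `tag` and everything opened inside it.

    Any REOPENABLE element among the implicitly closed ones is pushed back,
    so a </PRE> that arrives while a FORM is still open ends the PRE block
    but leaves the form open. A stray end tag (not on the stack) is ignored.
    Returns the list of elements that were implicitly closed.
    """
    if tag not in stack:
        return []
    idx = len(stack) - 1 - stack[::-1].index(tag)  # position of innermost match
    closed = stack[idx + 1:]
    del stack[idx:]                                 # pop `tag` and its contents
    stack.extend(e for e in closed if e in REOPENABLE)
    return closed


# Laura's case: <pre><form ...> ... </pre> ... </form>
stack = ["BODY", "PRE", "FORM"]
print(handle_end_tag(stack, "PRE"), stack)   # ['FORM'] ['BODY', 'FORM']
print(handle_end_tag(stack, "FORM"), stack)  # [] ['BODY']
```

The design choice is deliberately narrow: only elements assumed not to affect layout are reopened. Anything else left open inside the PRE (say, a B) is simply closed with it.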

But since you claimed in another message that improving Lynx's handling
of invalid stuff doesn't affect its handling of valid HTML, I also have to
comment on the following:

> Since forms don't affect style [and don't nest, ...]

That is what people seem to think and expect, but is it true?
If by "don't affect style" you mean "don't affect where lines are broken
or where paragraphs begin or end", then your statement is incorrect both
(a) with regard to Lynx's behavior and (b) with respect to "correct" HTML
parsing.

As for (a), Lynx (usually) forces a line break when it encounters a
</FORM> (there's a LYEnsureSingleSpace() call somewhere).  Somewhat
inconsistently, I think, this doesn't happen at a <FORM> start tag.

As for (b), FORM is declared as a block element in the DTDs, so it
cannot be contained in elements that don't allow that kind of content.
That includes P (which *is* a container in valid HTML parsing, not a
"command" to insert some line breaks or empty lines).  A "real" SGML
parser would supply </P> end tags automatically before both the <FORM> and
the </FORM> in the following example.  Not doing this does affect the
handling of valid HTML, since </P> can be validly omitted.

<P>Some paragraph text.
<FORM ACTION=... >
More text, is it part of the paragraph?
... form stuff ...
<P>
Another paragraph starting within the form.
</FORM>
More text, is it part of the paragraph which
started within the form?
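To make the point concrete, here is a minimal Python sketch of what a parser that knows P's end tag is omissible would do with the example above. It is not Lynx code and not a real DTD. The BLOCK set, the token format, and the rule that anything still open inside an element being closed may be closed implicitly are all simplifying assumptions, chosen to model just the behavior discussed here.

```python
# Hypothetical sketch of SGML-style omitted-end-tag inference for P; NOT a
# real DTD and NOT Lynx code. The BLOCK set and token format are assumptions.

# Assumption: elements that P may not contain, so their start tag implies </P>.
BLOCK = {"P", "FORM", "PRE", "UL", "OL", "TABLE"}


def parse(tokens):
    """Walk ("start" | "end" | "text", value) tokens with an element stack.

    Returns events: ("implied-end", name) whenever an end tag is inferred,
    and ("text", value, open_elements) for each text token.
    """
    stack = ["BODY"]
    events = []
    for kind, value in tokens:
        if kind == "start":
            if value in BLOCK and stack and stack[-1] == "P":
                stack.pop()                          # implied </P> before a block
                events.append(("implied-end", "P"))
            stack.append(value)
        elif kind == "end":
            # Assumption: anything still open inside `value` has an omissible
            # end tag (true for P); close it implicitly, e.g. </P> before </FORM>.
            while value in stack and stack[-1] != value:
                events.append(("implied-end", stack.pop()))
            if stack and stack[-1] == value:
                stack.pop()
        else:
            events.append(("text", value, tuple(stack)))
    return events


example = [
    ("start", "P"), ("text", "Some paragraph text."),
    ("start", "FORM"), ("text", "More text"),
    ("start", "P"), ("text", "Another paragraph"),
    ("end", "FORM"), ("text", "More text after the form"),
]
for event in parse(example):
    print(event)
# ('text', 'Some paragraph text.', ('BODY', 'P'))
# ('implied-end', 'P')
# ('text', 'More text', ('BODY', 'FORM'))
# ('text', 'Another paragraph', ('BODY', 'FORM', 'P'))
# ('implied-end', 'P')
# ('text', 'More text after the form', ('BODY',))
```

Under these assumptions, neither "More text" line ends up inside a P. The first P is closed before <FORM> and the second before </FORM>. A parser that treats FORM as invisible to the block structure would instead keep both runs of text inside paragraphs.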


Once again, I am not saying here that Lynx shouldn't handle forms the way
Fote's code does.  I am just showing an example where treating FORM as
entirely separate from other block elements (if applied without further
checks) leads to different behavior for valid documents.

   Klaus


