lynx-dev

Re: lynx-dev strict SortaSGML rules for <PRE>...</PRE> content


From: Vlad Harchev
Subject: Re: lynx-dev strict SortaSGML rules for <PRE>...</PRE> content
Date: Thu, 5 Aug 1999 16:17:56 +0500 (SAMST)

On Fri, 6 Aug 1999, Klaus Weide wrote:

> On Thu, 5 Aug 1999, Vlad Harchev wrote:
>[...] 
> I think adding options for this kind of micro-changes of parsing details
> is bad.  Where does it lead, and where will it end?
> Today you note that there are "several russian sites" that are broken,
> which happen to look better with one little change.  So you add a flag
> to make that change optional.  In two weeks somebody else notices that
> some (or just one) of his/her favorite sites "benefits" from another
> little change, and adds a flag for that.  And so on.  Soon we'll have

 Those sites are completely unusable in SortaSGML mode without this change;
it is not that they merely look slightly worse.
 I agree that adding an option for each small hack is slightly ugly
(but this is called "true configurability"). I agree that DTD parsing would be
better for this, but if DTD support were implemented, the whole of lynx would
have to be rewritten (at least if the DTD specifies attributes too, not only
nesting details).

> twenty or fifty or a hundred new options
>   -allow_headers_in_pre
>   -allow_pre_in_table
>   -allow_foo_in_blah
>   -allow_anchors_to_span_table_cells
>   -force_empty_font_tag
> and so on.  Or maybe we have only five or ten.  How many, and which,
> will depend on what kinds of pages some folks close to the lynx
> development process and able to create patches happen to read.
> 
> The proper way to achieve this kind of detailed configurability of
> parsing would be to make lynx use (and parse) a real DTD, which could
> then be taken from a user-specified file.  Nobody is thinking of doing
> that, afaik.  But if detailed control as above is really desired, this
> should be the way to go.

 As I said above, implementing complete DTD support is useless, since lynx uses
fixed integers to denote tags and their attributes. IMO there are several
approaches to providing semi-DTD support for lynx:
 1) Keep the current state.
 2) Since lynx is open source, we can document what to change in HTMLDTD.c
      to achieve the desired behaviour (possibly moving the static
      initialization of tags_new and tags_old out into a separate source
      file), so that users can modify them at compile time.
 3) Extend 2) and provide dynamic linking or dynamic libraries
      (so that the symbols tags_new and tags_old are exported from some
      binary file).

 None of 1) :), 2) and 3) requires any additional changes in SGML.c or HTML.c.

 But it seems that providing even semi-support for a sorta-DTD isn't worth it
(the fact that lynx is open source is enough in this area).
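
 To make approach 2) concrete, here is a minimal sketch of what "tables in
their own file" could look like. It is a toy, not lynx source: the ToyEntry
type, the toy_* names, the bit values and the idea of a separate tagtable.c
are all assumptions made for illustration; only the names tags_new and
tags_old come from HTMLDTD.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;      /* element name */
    unsigned    contains;  /* toy bitmask of allowed content classes */
} ToyEntry;

/* ---- would live in its own user-editable file (hypothetical tagtable.c) ---- */
static const ToyEntry toy_tags_old[] = {
    { "H1",  0x0u },
    { "PRE", 0x3u },
};
static const ToyEntry toy_tags_new[] = {
    { "H1",  0x0u },
    { "PRE", 0x2u },       /* edit this value and recompile to change behaviour */
};

/* ---- parser side: only ever looks entries up in the selected table ---- */
static const ToyEntry *toy_find(int use_new, const char *name)
{
    const ToyEntry *t = use_new ? toy_tags_new : toy_tags_old;
    size_t n = use_new ? sizeof toy_tags_new / sizeof toy_tags_new[0]
                       : sizeof toy_tags_old / sizeof toy_tags_old[0];
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return &t[i];
    return NULL;           /* unknown element */
}

/* Small self-check: the two tables can differ without touching toy_find(). */
static void toy_table_check(void)
{
    assert(toy_find(1, "PRE")->contains != toy_find(0, "PRE")->contains);
    assert(toy_find(1, "TABLE") == NULL);
}
```

 The point of the split is that the lookup code never changes; a user who
wants different nesting rules edits only the initializers and rebuilds.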

>[...] 
> As a more specific criticism, I feel that changing the 'contains' and
> 'icontains' as you suggest - whether dynamically at runtime or fixed -
> leads straight back to "TagSoup".  It is changing the info about tags
> to not express their "real" content model, but something made up, in
> order to achieve some specific error recovery behavior.  (More or less
> the same as saying an element is empty when it isn't.  I invented
> "SortaSGML" parsing to get rid of some of this [and therefore get rid
> of the need to handle mis-ordered tags at the HTML.c level].)
> 
> Instead I would strongly prefer - as long as we are just thinking about
> the "DTD" structures such as they are and not a fundamental redesign -
> that 'contains', 'icontains', 'contained', and 'icontained' continue[*]
> to reflect the "real", official DTD, and that changes in behavior are
> done by changing only the 'behavioral' fields ('canclose', 'flags').
> 
> Coming back to the Hn vs. PRE case, the effect desired by Vlad for
> <Hn> tags can be achieved by changing a bit in 'canclose' in each of the
> T_Hn macros.  It will also have an effect on <Hn>'s interaction with
> some other tags though which may not be desirable (while Vlad's
> approach has an effect on <PRE>'s interaction with some other tags
> than <Hn> which may also not be desirable).

 Yes, changing 'canclose' could be a better idea for this case, but the
tag class to which PRE belongs contains the following elements:
ADDRESS, BANNER, BLOCKQUOTE, BQ, BUTTON, CENTER, DIV, FIELDSET, FIG, FN,
NOTE, PRE

While Tgc_Plike contains
 H[1-6], P, BDO, CAPTION, CREDIT.

So, in this particular case, changing 'contains' and 'icontains' seems to be
less harmful.  And I think that this change should go into tags_new
permanently and unconditionally - so no options should be introduced -
(and this will also be in line with the Big Two) - do you agree?
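
 The trade-off between the two fixes can be shown with a toy model of
class bitmasks. This is only a sketch under stated assumptions: ToyTag, the
CLS_* bits and the two rules below are invented for illustration and are
much simpler than what SGML.c really does; only the field names 'contains'
and 'canclose' are taken from the discussion.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int TagClass;

/* Hypothetical class bits; lynx's real names and values may differ. */
#define CLS_PLIKE    0x1u   /* H1-H6, P, BDO, CAPTION, CREDIT */
#define CLS_PREGROUP 0x2u   /* ADDRESS, BLOCKQUOTE, DIV, ..., PRE */

typedef struct {
    const char *name;
    TagClass tagclass;   /* class this tag belongs to */
    TagClass contains;   /* classes it may contain */
    TagClass canclose;   /* classes its start tag may implicitly close */
} ToyTag;

/* Toy rule 1: a start tag nests if the open element contains its class. */
static int nests(const ToyTag *open, const ToyTag *start)
{
    return (open->contains & start->tagclass) != 0;
}

/* Toy rule 2: otherwise it closes the open element if allowed to. */
static int force_closes(const ToyTag *open, const ToyTag *start)
{
    return !nests(open, start) && (start->canclose & open->tagclass) != 0;
}

/* Walk through the baseline and both proposed fixes for <H1> inside <PRE>. */
static void toy_demo(void)
{
    ToyTag pre = { "PRE", CLS_PREGROUP, 0u, 0u };
    ToyTag h1  = { "H1",  CLS_PLIKE,    0u, CLS_PREGROUP };

    assert(force_closes(&pre, &h1));   /* baseline: <H1> ends the <PRE> */

    pre.contains |= CLS_PLIKE;         /* fix A: widen PRE's 'contains' */
    assert(nests(&pre, &h1));
    pre.contains &= ~CLS_PLIKE;        /* undo fix A */

    h1.canclose &= ~CLS_PREGROUP;      /* fix B: narrow H1's 'canclose' */
    assert(!force_closes(&pre, &h1));

    printf("both toy fixes keep <PRE> open across <H1>\n");
}
```

 In this model fix A also lets every other CLS_PLIKE tag nest inside PRE,
while fix B also stops H1 from closing every other CLS_PREGROUP tag; since
each fix works on a whole class bit, neither can target just the H1/PRE
pair.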

>[...] 

 Best regards,
  -Vlad

