lynx-dev Re: msg00798.html (was: 0x2276 handling)

From: Foteos Macrides
Subject: lynx-dev Re: msg00798.html (was: 0x2276 handling)
Date: Sat, 9 May 1998 02:01:41 -0400

"Leonid Pauzner" <address@hidden> wrote:
>>         For any raw x80-x9F characters when the document charset is
>Actually, Lynx ignores that characters only for iso-8859-1.

        That's obviously an oversight in upgrading the filters for
the expanded capabilities, but it should be trivial to generalize
the filters so that they apply to all iso-8859-x charsets.
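
        A minimal sketch of what such a generalized filter could look
like, in Python rather than Lynx's own C.  The function name and the
charset-name pattern are hypothetical and are not taken from the Lynx
source; the only behaviour assumed is the one described above, namely
that raw x80-x9F bytes are dropped for any iso-8859-x document charset
and left alone otherwise.

```python
import re

# Hypothetical pattern for "any iso-8859-x charset"; not from the Lynx source.
_ISO_8859_X = re.compile(r"^iso-8859-\d{1,2}$", re.IGNORECASE)


def drop_raw_c1(data: bytes, charset: str) -> bytes:
    """Drop raw 0x80-0x9F bytes when the document charset is iso-8859-x."""
    if not _ISO_8859_X.match(charset):
        # Other charsets may assign printable characters to this range.
        return data
    return bytes(b for b in data if not 0x80 <= b <= 0x9F)
```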

>> iso-8859-x, Lynx just ignores them.  It traditionally ignored numeric
>> character references in that range (e.g, "&#145;"), as well, but in
>> v2.7.2 I changed that to the error recovery of assuming they're
>> MS Windows characters due to FrontPage's misuse of numeric character
>> references.  But that's *only* for numeric character references, not
>> raw 8-bit characters.
>That is near the v2.8 behaviour.
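
        The error recovery described in the quoted text can be sketched
roughly as follows.  This is an illustration in Python, not the Lynx
code: the function name is hypothetical, it handles only decimal
references, and silently dropping the few codes that Windows-1252
leaves undefined is an assumption made for the example.

```python
import re

_NCR = re.compile(r"&#(\d+);")


def recover_ncr(text: str) -> str:
    """Treat numeric character references 128-159 as Windows-1252 bytes."""

    def repl(match: re.Match) -> str:
        code = int(match.group(1))
        if 128 <= code <= 159:
            try:
                return bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                return ""  # undefined in Windows-1252; dropped (assumption)
        return match.group(0)  # other references are left untouched here

    return _NCR.sub(repl, text)
```

For instance, "&#145;" and "&#146;" come out as the left and right
single quotation marks that FrontPage intended.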

>I fix FrontPage's NCR in alt=,  right-to-left specials now ignored.
>Also I made few test/ files for this area, Tom will include it.
>As far as i18n URLs, I found HTML 4.0 validator on-line,
>but I need a remote server which is i18n capable, and appropriate page.

        An HTML validator wouldn't check something like that.  It's
not something that involves SGML declarations or is a part of the HTML
4.0 DTD, per se.  It's what's known as an "application convention".

        Many of the example links in the set accessed via the "HTML i18n"
link in the online 'h'elp are no longer active.  They were being maintained
by the Alis/Tango folks, who were most active in developing the i18n specs.
But you can see what Lynx is sending to the server via the TRACE log, and
of course can see what's being used in the advanced statusline and ShowInfo
page.  You can use your own test page with i18n URL attribute values, and
put the 8-bit characters in the query for an echo script, like the one Alex
set up.  Or, your test page can have i18n paths that you send to any
server, so you can look at what's sent via the TRACE log, and just ignore
the 404 returned by the server.
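
        When reading the TRACE log it helps to know what to expect.
Under the i18n convention, non-ASCII characters in a URL attribute
value are converted to utf-8 and each resulting byte is hex escaped.
A small Python sketch of that expectation follows; the function name
and the choice of characters left unescaped are assumptions for the
example, and it says nothing about how Lynx implements the step.

```python
from urllib.parse import quote


def i18n_escape(value: str) -> str:
    """Encode non-ASCII characters as utf-8, then hex escape each byte."""
    # URL delimiters and existing "%" escapes are left alone (assumption).
    return quote(value, safe="/?&=:#%+,;@~", encoding="utf-8")
```

For a path such as "/docs/café.html?q=naïve" this gives
"/docs/caf%C3%A9.html?q=na%C3%AFve", which is the kind of string to
look for in the log or in the echo script's output.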

        The server or CGI script would just do a one-time hex unescaping,
and if the result has 8-bit characters, do the usual checks for and
handling of utf-8 multibytes.  There will be a transitional period when
old browsers might not be using utf-8, but those would have to be byte
characters with values less than 256, and easy for a server or script
to distinguish from utf-8.
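
        The server-side steps above can be sketched as follows.  This
is a Python illustration under stated assumptions, not code from any
particular server: the function name is hypothetical, and the fallback
charset for old browsers is assumed to be iso-8859-1.

```python
from urllib.parse import unquote_to_bytes


def decode_escaped(value: str, legacy_charset: str = "iso-8859-1") -> str:
    """Hex unescape once, then decode as utf-8 with a single-byte fallback."""
    raw = unquote_to_bytes(value)  # the one-time hex unescaping
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")  # no 8-bit characters, nothing to check
    try:
        return raw.decode("utf-8")  # well-formed utf-8 multibytes
    except UnicodeDecodeError:
        # Not valid utf-8: assume an old browser sent single-byte characters.
        return raw.decode(legacy_charset)
```

Note that a legacy byte sequence can happen to be well-formed utf-8,
in which case this check will take it as utf-8.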

        We don't, of course, have any way in Lynx to handle CJK di-bytes
in URLs according to the i18n specs, nor any strategy for handling them
in the foreseeable future.  Hopefully, pages with such markup will have
the attribute values already cast to the i18n format.

Foteos Macrides
