lynx-dev Re: msg00798.html (was: 0x2276 handling)


From: Leonid Pauzner
Subject: lynx-dev Re: msg00798.html (was: 0x2276 handling)
Date: Sat, 9 May 1998 16:05:17 +0400 (MSD)

> >>         For any raw x80-x9F characters when the document charset is
> >
> >Actually, Lynx ignores those characters only for iso-8859-1.
>
>         That's obviously an oversight in upgrading the filters for
> the expanded capabilities, but it should be trivial to generalize
> the filters so that they apply to all iso-8859-x charsets.
What is not trivial is finding where this filter lives in the code :-(
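
To make the proposed generalization concrete, here is a minimal Python sketch of a filter that drops raw x80-x9F bytes for any iso-8859-x document charset. It is not Lynx code (Lynx is written in C); the function name and the charset-matching rule are assumptions made for illustration only.

```python
import re

# Assumption: any charset label of the form "iso-8859-<n>" gets the filter.
ISO_8859_X = re.compile(r"^iso-8859-\d{1,2}$", re.IGNORECASE)


def drop_c1_controls(raw: bytes, charset: str) -> bytes:
    """Remove raw 0x80-0x9F bytes when the document charset is iso-8859-x.

    Hypothetical helper, not a function from the Lynx sources.
    """
    if not ISO_8859_X.match(charset.strip()):
        # Other charsets may use these byte values for real characters.
        return raw
    return bytes(b for b in raw if not 0x80 <= b <= 0x9F)
```

With this sketch the same rule covers iso-8859-1 and iso-8859-5 alike, while a charset such as koi8-r is passed through untouched.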
>
>
> >> iso-8859-x, Lynx just ignores them.  It traditionally ignored numeric
> >
> >As for i18n URLs, I found an HTML 4.0 validator online,
> >but I need a remote server that is i18n-capable, and an appropriate page.
>
>         An HTML validator wouldn't check something like that.  It's
> not something that involves SGML declarations or is a part of the HTML
> 4.0 DTD, per se.  It's what's known as an "application convention".
>
>         Many of the example links that were being maintained by Alis/Tango
> folks, who were most active in developing the i18n specs, and are in the
> set accessed via the "HTML i18n" link in the online 'h'elp, are no longer
> active.  But you can see what Lynx is sending to the server via the TRACE
> log, and of course can see what's being used in the advanced statusline
Do you think we can trust the statusline here? Thanks for the idea.
BTW, retrieving local files may break if we move to i18n totally;
it may need special care.

Also, lynx_w32 translates local 8-bit filenames wrongly under Win95,
while lynx_386 does it right under Win95...
(The W95 filesystem holds two forms of each filename: one in Unicode (LFN)
and one 8-bit in the "OEM code page" (DOS 8+3).
It turns out that the lynx_w32 binary seems to take the "OEM code page"
from assumed_local_charset, whereas it actually corresponds to display_charset.)
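
A small Python sketch of why picking the wrong charset for the 8-bit name matters. The code pages here are assumptions chosen for illustration (cp866 as the OEM code page, koi8-r as the other configured charset); this is not how Lynx does the translation internally.

```python
# Assumptions for illustration only: the OEM code page is cp866 and the
# charset mistakenly used instead of it is koi8-r.
OEM_CODE_PAGE = "cp866"
OTHER_CHARSET = "koi8_r"


def to_oem_bytes(name: str) -> bytes:
    """Encode a filename the way the 8-bit (DOS 8+3) form would store it."""
    return name.encode(OEM_CODE_PAGE)


name = "файл.txt"
on_disk = to_oem_bytes(name)

# Decoding with the charset that matches the OEM code page recovers the name.
right = on_disk.decode(OEM_CODE_PAGE)

# Decoding the same bytes with a different charset yields a different string,
# so a lookup built from it would not match the file.
wrong = on_disk.decode(OTHER_CHARSET, errors="replace")
```

The ASCII part of the name (".txt") survives either way; only the 8-bit characters are mistranslated, which matches the symptom of "8bit filenames" breaking.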

So it is preferable to have a remote http server for such queries,
or at least a CGI script which returns the URL verbatim.  OK.
But that does not look so wonderful since, as you note,
"few people could do the utf-8 and then
hex conversions in their heads when writing HTML (certainly not me :)"
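
The "utf-8 and then hex" conversion need not be done by hand. A minimal sketch using Python's standard urllib.parse.quote; the helper name i18n_escape and the sample path are made up for this example:

```python
from urllib.parse import quote


def i18n_escape(path: str) -> str:
    """Encode the path as utf-8, then %HH-escape each non-ASCII byte.

    "/" is kept as-is so the path structure survives.
    """
    return quote(path, safe="/", encoding="utf-8")


print(i18n_escape("/docs/été.html"))  # prints /docs/%C3%A9t%C3%A9.html
```

Each "é" becomes the two utf-8 bytes C3 A9, written as %C3%A9, which is the form an echo script on the server would then receive.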

> and ShowInfo page.  You can use your own test page with i18n URL attribute
> values, and put the > 7 bit stuff in the query for an echo script, like
> the one Alex set up.  Or, your test page can have i18n paths that you
> send to any server, so you can look at what's sent via the TRACE log,
> and just ignore the 404 returned by the server.
>
>         The server or CGI script would just do a one-time hex unescaping,
> and if the result has 8-bit characters, do the usual checks for and
> handling of utf-8 multibytes.  There will be a transitional period when
> old browsers might not be using utf-8, but those would have to be byte
> characters with values less than 256, and easy for a server or script
> to distinguish from utf-8.
>
>         We don't, of course, have any way in Lynx to handle CJK di-bytes
> in URLs according to the i18n specs, nor any strategy for handling them
> in the foreseeable future.  Hopefully, pages with such markup will have
> the attribute values already cast to the i18n format.
>
>                                 Fote
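
The server-side handling described in the quoted text (a one-time hex unescaping, then a check for utf-8 multibytes, with a fallback for old browsers sending single 8-bit characters) could look roughly like this in Python. It is only a sketch: the function name is hypothetical, and iso-8859-1 as the fallback charset is an assumption.

```python
from urllib.parse import unquote_to_bytes


def decode_request_path(escaped: str, legacy_charset: str = "iso-8859-1") -> str:
    """Unescape %HH once, then decide between utf-8 and a legacy 8-bit charset.

    Hypothetical helper; legacy_charset is an assumed default.
    """
    raw = unquote_to_bytes(escaped)  # one-time hex unescaping
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")  # pure 7-bit, nothing to decide
    try:
        # Strict decoding rejects byte sequences that are not valid utf-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid utf-8: treat as single-byte characters from an old browser.
        return raw.decode(legacy_charset)
```

The check is a heuristic: legacy 8-bit text that happens to form a valid utf-8 sequence would be read as utf-8, though that is uncommon for ordinary text.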


