lynx-dev Re: msg00798.html (was: 0x2276 handling)

From: Foteos Macrides
Subject: lynx-dev Re: msg00798.html (was: 0x2276 handling)
Date: Sat, 9 May 1998 13:27:34 -0400

"Leonid Pauzner" <address@hidden> wrote:
>> >>         For any raw x80-x9F characters when the document charset is
>> >
>> >Actually, Lynx ignores those characters only for iso-8859-1.
>>         That's obviously an oversight in upgrading the filters for
>> the expanded capabilities, but it should be trivial to generalize
>> the filters so that they apply to all iso-8859-x charsets.
>What is not trivial is finding the location of this filter in the code :-(

        The filter is applied in SGML_character() of SGML.c, and
HTPlain_write() of HTPlain.c.  It has a substantial "- FM" comment, so
you shouldn't have any trouble finding it within those functions.

        It's based on the LYlowest_eightbit value for the charset, so you
need to figure out why it wasn't set to 160 for iso-8859-x charsets other
than iso-8859-1 (used to be, so something got changed or broken in the
actual release).
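
        As a rough illustration of that generalization, here is a minimal
C sketch, not the actual Lynx code: lowest_eightbit_for() and
has_prefix_ci() are hypothetical names, and it assumes charsets are
identified by their MIME names.  Every ISO 8859 part reserves 0x80-0x9F
for C1 control codes, so the sketch gives all of them a lowest eight-bit
value of 160 instead of special-casing iso-8859-1.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if s starts with prefix, ignoring case. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    size_t i;

    assert(s != NULL && prefix != NULL);
    for (i = 0; prefix[i] != '\0'; i++) {
        if (s[i] == '\0' ||
            tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/*
 * Hypothetical: lowest byte value treated as a printable eight-bit
 * character for a charset given by its MIME name.  All ISO 8859 parts
 * reserve 0x80-0x9F for C1 controls, so every iso-8859-x gets 160;
 * anything else (e.g. koi8-r, which has graphics in 0x80-0x9F) gets 128.
 */
static int lowest_eightbit_for(const char *charset)
{
    if (charset != NULL && has_prefix_ci(charset, "iso-8859-"))
        return 160;
    return 128;
}
```

For example, lowest_eightbit_for("ISO-8859-5") yields 160, while
lowest_eightbit_for("koi8-r") yields 128.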

        Note that the filter is blocked when the Display Character Set is
"transparent", but not if you specify the Display Character Set you
actually have and toggle on RAW mode.  I explained in an earlier message
why "transparent" is dangerous for general users, and that I was
apprehensive about including it.  I agree with David Wooley that some
warning about it, one that general users will actually notice, should be
added.  (I doubt it would ever be used for a denial-of-service attack, or
that a CERT bulletin is needed; that was just David going overboard again.)
But do think about it. :)
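
        The filter, together with its "transparent" exemption, might look
roughly like the following C sketch.  The name filter_raw_c1() and its
parameters are hypothetical; the real logic lives in SGML_character() and
HTPlain_write().  There is deliberately no RAW-mode parameter, since
toggling RAW mode does not bypass the filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the raw-byte filter: bytes from 0x80 up to
 * (lowest_eightbit - 1) are silently dropped, which for iso-8859-x
 * (lowest_eightbit == 160) means the C1 range 0x80-0x9F.  The filter is
 * skipped only when the Display Character Set is "transparent"; there is
 * no RAW-mode parameter because RAW mode does not bypass it.
 * Filters buf in place and returns the new length.
 */
static size_t filter_raw_c1(unsigned char *buf, size_t len,
                            int lowest_eightbit, bool transparent)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    if (transparent)
        return len;                     /* pass every byte through */

    for (in = 0; in < len; in++) {
        unsigned char c = buf[in];

        if (c >= 0x80 && c < lowest_eightbit)
            continue;                   /* ignore this raw byte */
        buf[out++] = c;
    }
    return out;
}
```

With lowest_eightbit set to 128 nothing is dropped, which is why the value
chosen per charset is what actually decides the behavior.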

>> >> iso-8859-x, Lynx just ignores them.  It traditionally ignored numeric
>> >
>> >As for i18n URLs, I found an HTML 4.0 validator online,
>> >but I need a remote server that is i18n-capable, and an appropriate page.
>>         An HTML validator wouldn't check something like that.  It's
>> not something that involves SGML declarations or is a part of the HTML
>> 4.0 DTD, per se.  It's what's known as an "application convention".
>>         Many of the example links that were being maintained by Alis/Tango
>> folks, who were most active in developing the i18n specs, and are in the
>> set accessed via the "HTML i18n" link in the online 'h'elp, are no longer
>> active.  But you can see what Lynx is sending to the server via the TRACE
>> log, and of course can see what's being used in the advanced statusline
>Do you think we can trust the statusline here? Thanks for the idea.

        As a programmer you should trust only the TRACE output for what
is sent to the server, or walk through with a debugger and look at it.
As I tried to explain in an earlier message, in the code set that I
released as v2.7.2 the statusline and ShowInfo page also showed the
all-ASCII converted URL.  However, it was on my todo list to do what
the idealists in the IETF discussions wanted if the Display Character
Set is utf-8.  I don't remember now what I did if the URL has CJK
double-byte characters (but as Lynx presently stands, there's no way to do
it "right" according to the i18n specs).
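
        For reference, the all-ASCII conversion follows the convention in
the HTML 4.0 specification's notes on non-ASCII characters in URI
attribute values: encode each non-ASCII character as UTF-8, then escape
each resulting byte as %HH.  Below is a minimal C sketch, assuming the URL
is held as ISO-8859-1 (so each byte equals its Unicode code point);
latin1_url_to_ascii() is a hypothetical name, not a Lynx function, and
CJK double-byte input is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch: convert an ISO-8859-1 URL to the all-ASCII form by encoding
 * each non-ASCII character as UTF-8 and %HH-escaping every UTF-8 byte.
 * ASCII bytes are copied unchanged (escaping of spaces or reserved
 * characters is assumed to be handled elsewhere).
 * Returns 0 on success, -1 if out_size is too small (out is then
 * left unterminated and should not be used).
 */
static int latin1_url_to_ascii(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c < 0x80) {
            if (o + 1 >= out_size)      /* room for c plus the final NUL */
                return -1;
            out[o++] = (char)c;
        } else {
            /* U+0080..U+00FF encode as two UTF-8 bytes: 110000xx 10xxxxxx */
            unsigned char utf8[2];
            int i;

            utf8[0] = (unsigned char)(0xC0 | (c >> 6));
            utf8[1] = (unsigned char)(0x80 | (c & 0x3F));
            if (o + 6 >= out_size)      /* "%XX%XX" plus the final NUL */
                return -1;
            for (i = 0; i < 2; i++) {
                out[o++] = '%';
                out[o++] = hex[utf8[i] >> 4];
                out[o++] = hex[utf8[i] & 0x0F];
            }
        }
    }
    out[o] = '\0';
    return 0;
}
```

For example, the ISO-8859-1 path "/café.html" (é is byte 0xE9) comes out
as "/caf%C3%A9.html".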

>BTW, retrieving local files may break if we move to i18n totally;
>it may need special care.
>Also, lynx_w32 wrongly translates local 8-bit filenames under Win95,
>while lynx_386 does it right under Win95...
>(The W95 filesystem holds two forms of each filename, one in Unicode (LFN)
>and one 8-bit in the "OEM code page" (DOS 8+3).
>It turned out that the lynx_w32 binary seems to take the "OEM code page"
>from assumed_local_charset, whereas it actually corresponds to display_charset.)

        I'm still inadequately familiar with those platforms, so please
do post an explanation of how you decide to deal with those problems.

>So it is preferable to have a remote http server for such a query,
>or at least a CGI script that returns the URL verbatim.  OK.
>But that doesn't look so wonderful, as you note:
>"few people could do the utf-8 and then
>hex conversions in their heads when writing HTML (certainly not me :)"

        Oh, I guess what you're really asking is how you know whether it's
correct when you look at the all-ASCII converted URL.  If you don't
have access to CGI resources on a server, perhaps you could add temporary
debugging code that back-translates it in the way I explained the
CGI scripts should, and look at that.  Lynx has the functions needed
to do the back translation.
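
        Such temporary debugging code could do the back translation roughly
as in this C sketch.  ascii_url_to_latin1() and hex_value() are
hypothetical names (Lynx's own functions for this are not shown here).
The sketch decodes %HH escapes in place, then maps two-byte UTF-8
sequences for U+0080-U+00FF back to ISO-8859-1, replacing anything else
with '?'; a server-side CGI script would do the equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical debugging helper: turn an all-ASCII URL back into
 * ISO-8859-1 for display.  Works in place and returns the new length.
 * Step 1 decodes %HH escapes into raw bytes (a '%' not followed by two
 * hex digits is kept literally).  Step 2 decodes two-byte UTF-8
 * sequences for U+0080..U+00FF; any other non-ASCII sequence becomes '?'.
 */
static size_t ascii_url_to_latin1(char *s)
{
    unsigned char *p = (unsigned char *)s;
    size_t in, out, len;
    int hi, lo;

    assert(s != NULL);

    /* Step 1: %HH -> byte.  The write index never passes the read index. */
    for (in = out = 0; p[in] != '\0'; out++) {
        if (p[in] == '%' && (hi = hex_value(p[in + 1])) >= 0 &&
            (lo = hex_value(p[in + 2])) >= 0) {
            p[out] = (unsigned char)(hi * 16 + lo);
            in += 3;
        } else {
            p[out] = p[in++];
        }
    }
    len = out;

    /* Step 2: UTF-8 -> ISO-8859-1, again in place. */
    for (in = out = 0; in < len; out++) {
        unsigned char c = p[in];

        if (c < 0x80) {
            p[out] = c;
            in++;
        } else if ((c == 0xC2 || c == 0xC3) && in + 1 < len &&
                   (p[in + 1] & 0xC0) == 0x80) {
            /* 110000xx 10xxxxxx -> one Latin-1 byte */
            p[out] = (unsigned char)(((c & 0x03) << 6) | (p[in + 1] & 0x3F));
            in += 2;
        } else {
            /* Outside Latin-1 or malformed: emit '?' and skip the rest. */
            p[out] = '?';
            in++;
            while (in < len && (p[in] & 0xC0) == 0x80)
                in++;
        }
    }
    p[out] = '\0';
    return out;
}
```

For example, "/caf%C3%A9.html" comes back as "/café.html" in ISO-8859-1,
which you can compare by eye with the link text in the document.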

Foteos Macrides
