lynx-dev Re: msg00798.html (was: 0x2276 handling)

From: Foteos Macrides
Subject: lynx-dev Re: msg00798.html (was: 0x2276 handling)
Date: Thu, 30 Apr 1998 22:25:36 -0400

"Leonid Pauzner" <address@hidden> wrote:
>> [...] To see
>> the problem we've been discussing, you should have used Alex Matulich's
>> test page (the URL was posted by Doug), [...]
>BTW, just for this letter I have prepared a variant of sgml.html with entities
>moved inside alt= attributes: I got _exactly the same_ result except
>-0x200D    &zwj;       HTMLspecial       # ZERO WIDTH JOINER
>-0x200E    &lrm;       HTMLspecial       # LEFT-TO-RIGHT MARK
>-0x200F    &rlm;       HTMLspecial       # RIGHT-TO-LEFT MARK
>+0x200D                HTMLspecial       # ZERO WIDTH JOINER
>+0x200E                HTMLspecial       # LEFT-TO-RIGHT MARK
>+0x200F                HTMLspecial       # RIGHT-TO-LEFT MARK
>There is no problem here.
>Anyway, URLs escaping should be tested/rewritten someday.

        The URL is:

        It has:

<a href="";></a>

        The "&lg=" in the query will be treated as "&lg;=" because the '='
is an implied terminator for the "&lg".  Alex has another paragraph in
that page where he changed the "&lg=" to "&lgin=".  That works fine,
because "lgin" is not defined anywhere in Lynx as a named character
reference, and so the standard error recovery occurs.
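
        The difference between the two cases can be sketched in a few
lines of Python.  This is only an illustration, not the code in SGML.c or
LYCharUtils.c: the function name expand_entities and the four-entry
KNOWN_ENTITIES table are made up for the example, and "lg" is assumed to
map to U+2276 as in the subject line.

```python
import re

# Hypothetical, tiny table of named character references.
# Assumption: "lg" maps to U+2276, as the thread's subject line suggests.
KNOWN_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "lg": "\u2276"}

# A reference is "&" plus a name; the ";" is optional because any
# non-name character (such as "=") implicitly terminates the name.
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def expand_entities(text, known=KNOWN_ENTITIES):
    """Expand known references; leave unknown ones as literal text."""

    def replace(match):
        name = match.group(1)
        if name in known:
            return known[name]
        # Error recovery: an undefined name is passed through unchanged.
        return match.group(0)

    return _REF.sub(replace, text)
```

With this sketch, expand_entities("a=1&lg=en") swallows the "&lg", while
expand_entities("a=1&lgin=en") returns its input unchanged.  Writing the
separator as "&amp;" in the markup avoids the ambiguity:
expand_entities("a=1&amp;lg=en") gives "a=1&lg=en".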

        But for the above paragraph in Alex's page, the "lg" *is* a named
character reference for Lynx, even though it doesn't know what to do
with the Unicode value to which it has been defined.  The "lg" in the
linkname for the Anchor is handled by functions in SGML.c, which do
end up using the standard error recovery, and so you see "&lg=" in the
linkname when you look at his testlynx.html.

        The "lg" in the HREF value is handled by functions in LYCharUtils.c,
which are screwing up royally.  If you set advanced user mode, so that
the URL is displayed in the statusline, in the W32 port, which is using
the cp437 Display Character Set, the "lg" is replaced by three letters:
an uppercase gamma, an eacute, and a graphic character.  If you ACTIVATE
the link, Alex's script returns what Lynx submitted.  In that, the
intended "lg" is replaced by three other characters:  an acirc, an
uppercase P, and a colon.
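
        One possible reading of the cp437 symptom, offered as an inference
and not as something established above: if "lg" is defined as U+2276 (the
value in the subject line), its UTF-8 encoding is the three bytes E2 89 B6,
and those bytes shown one at a time in cp437 are an uppercase gamma, an e
with diaeresis (not quite the eacute described), and a box-drawing
character.  A minimal Python check, with the helper name utf8_seen_as
invented for the example:

```python
# Assumption: "lg" is defined as U+2276, per the subject line of the thread.
LG = "\u2276"


def utf8_seen_as(text, charset):
    """Encode text as UTF-8, then misread each byte in a one-byte charset."""
    return text.encode("utf-8").decode(charset)


utf8_bytes = LG.encode("utf-8")          # the three raw bytes
as_cp437 = utf8_seen_as(LG, "cp437")     # what a cp437 display would show
```

Here utf8_bytes is b"\xe2\x89\xb6" and as_cp437 is a three-character
string.  This does not account for the uppercase P and the colon that the
script reported.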

        (To see the problem, try the markup which shows the problem. :)

        As far as what to do about that code is concerned, I'm afraid I
can't be very helpful.  I had already thrown my hands up in dismay and
written other code, from scratch, in the code set that was released as
v2.7.2.  Presumably, if you add chartrans defaults for all of the defined
named character references, the problems in LYCharUtils.c will again be
masked, although i18n URLs still will be handled improperly.

Foteos Macrides (address@hidden)
