Re: LYNX-DEV fixed bug on bsdi


From: Foteos Macrides
Subject: Re: LYNX-DEV fixed bug on bsdi
Date: Mon, 29 Dec 1997 12:49:52 -0500 (EST)

address@hidden (Michael Sokolov) wrote:
>Fote,
>
>You write that LYUCFullyTranslateString() and LYUnEscapeToLatinOne() first
>translate the character entity to raw byte A0 (decimal 160). Why so? I have
>always thought (actually learned from other lynx-dev postings) that the
>document is first completely translated to Unicode and then the character
>entities are unescaped to Unicode too. Would you please clarify this for me and
>other people who are not as familiar with Lynx internals as you or Klaus?

        The character entities are first translated to Unicode, which
corresponds with ASCII in the 7-bit range and iso-8859-1 in the 8-bit
mono-byte range.  The Unicode value for NON-BREAKING SPACE is decimal
160 (U+00A0).  What is done beyond the initial conversion to Unicode
depends on the context.  For characters intended to be displayed as
text, the Unicode is converted to characters of the current Display
Character Set (or to 7-bit approximations if the current Display
Character Set doesn't include them).  For HREF and SRC values, in which
only non-control ASCII characters are permitted, lynx271f converts
non-ASCII characters to UTF-8 and then hex escapes successive bytes of
those multi-byte characters as proposed in IETF drafts for i18n URLs.
The devel code doesn't yet address that issue, as far as I can tell,
and the IETF specs for i18n URLs are still tentative. 
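
        The two contexts described above can be illustrated with a
rough sketch.  This is not Lynx code (Lynx is written in C); it is a
minimal Python illustration under stated assumptions.  The function
names to_display() and to_url() and the APPROXIMATIONS table are
hypothetical, and the table holds only two sample entries rather than
the approximations Lynx actually uses.

```python
from html import unescape
from urllib.parse import quote

# Hypothetical 7-bit fallbacks for illustration only; not Lynx's real tables.
APPROXIMATIONS = {"\u00a0": " ", "\u00e9": "e"}


def to_display(text, charset):
    """Decode character entities to Unicode, then keep each character
    only if the display charset can represent it; otherwise substitute
    a 7-bit approximation (or '?' when none is listed)."""
    out = []
    for ch in unescape(text):          # entity -> Unicode code point
        try:
            ch.encode(charset)         # representable in display charset?
            out.append(ch)
        except UnicodeEncodeError:
            out.append(APPROXIMATIONS.get(ch, "?"))
    return "".join(out)


def to_url(text):
    """Decode character entities to Unicode, then encode non-ASCII
    characters as UTF-8 and hex-escape each resulting byte."""
    decoded = unescape(text)
    # Leave URL punctuation and existing %XX escapes untouched.
    return quote(decoded, safe="/:?#[]@!$&'()*+,;=%")


if __name__ == "__main__":
    print(ord(unescape("&nbsp;")))                 # code point of NBSP
    print(repr(to_display("a&nbsp;b", "iso-8859-1")))
    print(repr(to_display("a&nbsp;b", "ascii")))
    print(to_url("http://example.com/a&nbsp;b"))
```

        In this sketch the same entity takes different final forms:
it stays as the single character U+00A0 for an iso-8859-1 display,
becomes an ordinary space for a 7-bit display, and becomes the two
escaped UTF-8 bytes %C2%A0 inside a URL.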

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
