lynx-dev lRe: msg00798.html (was: 0x2276 handling)

From: Foteos Macrides
Subject: lynx-dev lRe: msg00798.html (was: 0x2276 handling)
Date: Wed, 6 May 1998 13:46:14 -0400

Nelson Henry Eric <address@hidden> wrote:
>> However, I tracked down a URL for the FAQ (would have been nice if he had
>> included it in the message):
>Sure would have.
>Thanks, Fote, I wanted to but didn't have the time to look for it.
>> do, and is wrong.  The server is returning "Content-Type: text/plain"
>> without a charset parameter, so the assumed charset should apply, and
>This mixed-bag page seems a "natural" for Asada's wizardry, and indeed
>is pretty much rendered correctly by using Lynx in CJK mode on either
>SunOS or Windows95.
>> characters, as the v2.7.2 code did?).  Also, when I set the assumed
>> charset to euc-jp or shift_jis (it's not clear which the FAQ is using),
>> I get different, but still 8-bit characters.
>Cut-n-paste of my options menu from Lynx 2.8rel.3 (SunOS4.1.3):
>     preferred document lan(G)uage: ja,en
>     preferred document c(H)arset : ISO-2022-JP
>     ^A)ssume charset if unknown  : euc-jp
>     display (C)haracter set      : Japanese (EUC)
>     Raw 8-bit or CJK m(O)de      : ON
>On MS Windows, you'd probably want the display (C)haracter set set
>at Japanese (SJIS).

        I re-subscribed, temporarily, to make it easier to reply to any
more messages in threads I contributed to when I was subscribed.

        I'm concerned, Henry, that you missed the point, and are again
making unfounded assumptions about the code, which might inadvertently
derail the needed input from CJK users.  The CJK support in Lynx is
based on Asada's original implementation, but differs from that in a
number of respects, and was further modified in conjunction with adding
Unicode-based chartrans support.  The original implementation, and to
an unfortunately large extent the Unicode-based chartrans integration,
is content to have the CJK support *really* work only selectively, i.e.,
in the case where you *do* have a CJK Display Character Set, and the
document *does* contain corresponding CJK character representations.
For Japanese, there is trickery to determine if any of the Japanese
document character representations should be converted appropriately
for an euc-jp versus shift_jis Display Character Set, but there is also
support in Lynx for Korean and two Chinese Display Character Sets, and
no interconversions among those CJK languages.  When a CJK Display
Character Set *is* selected (as indicated in your 'o'ptions menu) and
Lynx knows, or has no way of knowing and is set to assume, that the
document charset is not complementary to the CJK Display Character Set
(you've set it to assume that it is), Lynx does use the Unicode-based
chartrans support to convert from the known or assumed Unicode-supported
charset to 7-bit approximations, so the CJK users can get a reasonable
representation of iso-8859-1, or Windows-1252, or Cyrillic, or Greek
documents without any special, overt efforts on their part.  However,
if they happen to be at a terminal which does not have a CJK Display
Character Set, and are trying to get something out of a document which
has CJK character representations, they're up the creek without a paddle.
In contrast, if a Russian or Greek Lynx user happens to be at a terminal
which does not have a Russian/Cyrillic or Greek Display Character Set,
and they access a document with Russian/Cyrillic or Greek character
representations, they'll get the conversions to ASCII letters and symbols
which approximate their language.
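
        The cases above can be summarized in a rough sketch.  This is
Python, not the actual Lynx C code, and every name in it (the charset
sets, the function, the returned labels) is hypothetical and chosen only
for illustration.  The particular Korean and Chinese charset names are
assumptions as well.

```python
# Hypothetical sketch of the rendering paths described above.
# None of these names are Lynx identifiers.

JAPANESE = {"euc-jp", "shift_jis"}
# Assumed names for the Korean and the two Chinese Display Character Sets.
CJK = JAPANESE | {"euc-kr", "euc-cn", "big5"}


def rendering_path(doc_charset: str, display_charset: str) -> str:
    """Return a label for how a document would be handled."""
    doc = doc_charset.lower()
    disp = display_charset.lower()

    if disp in CJK:
        if doc == disp:
            # Document and display are complementary: show as-is.
            return "pass-through"
        if doc in JAPANESE and disp in JAPANESE:
            # euc-jp <-> shift_jis is the one conversion that exists.
            return "japanese-conversion"
        if doc in CJK:
            # No interconversions among the CJK languages.
            return "unsupported"
        # Non-CJK document on a CJK display: Unicode-based chartrans
        # down to 7-bit approximations.
        return "7-bit-approximation"

    if doc in CJK:
        # CJK document, no CJK display: nothing readable comes out.
        return "unsupported"
    # Non-CJK document, non-CJK display: ordinary chartrans handling.
    return "chartrans"
```

Under these assumptions, rendering_path("iso-8859-1", "euc-jp") gives
"7-bit-approximation", while rendering_path("euc-jp", "iso-8859-1") gives
"unsupported", which is the asymmetry at issue.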

        To illustrate this further, in that FAQ, for each line which has CJK
dibyte characters the author has placed a line with homologous ASCII strings
below it.  For non-CJK document charsets, which Lynx can handle via
Unicode-based character conversions, that wouldn't (and ideally, should not
ever) be necessary (we get the Cyrillic or Greek equivalents of "Kung Fu"
strings automatically :).  Leonid in effect was asking what Lynx should
do in such cases, but no matter what it does within the constraints of
the current CJK implementation, the screen display is not going to be
interpretable in such cases by people who might otherwise "sound out"
CJK strings converted to ASCII strings.
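
        A toy illustration of the difference, in Python.  The table below
is a tiny hand-made assumption covering only the letters of one example
string; it is not taken from the Lynx chartrans tables, and the function
name is hypothetical.

```python
# Toy 7-bit approximation table (an assumption for illustration only;
# the real chartrans tables are far larger and are not reproduced here).
APPROX = {
    "К": "K", "у": "u", "н": "n", "г": "g", "Ф": "F",  # Cyrillic
}


def approximate(text: str) -> str:
    """Replace each non-ASCII character with an ASCII approximation.

    Characters without a table entry become '?', which is what happens
    to every CJK character in this sketch, since no such table exists
    for them here.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)          # already 7-bit, keep unchanged
        else:
            out.append(APPROX.get(ch, "?"))
    return "".join(out)
```

With this table, approximate("Кунг Фу") yields "Kung Fu", whereas a CJK
string such as "功夫" has no entries and comes out as "??", which nobody
can sound out.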

>The truth is, however, I am not having a lot of luck using Win32 Lynx
>2.7.1ac-0.81, if there is a meta tag describing the character set, e.g.,
><META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=x-euc-jp">.
>Without a meta tag, a document is rendered okay.  Guess it means I need
>to upgrade to 2.8.

        That has nothing to do with CJK support, per se, beyond that you've
set up euc-jp as your assumed charset, and therefore get what you want when
Lynx has no idea what the actual charset is.  Klaus had bad logic and an
incomplete list for charset synonyms (which blew it on "x-euc-jp" and
"x-shift_jis").  He changed the logic to more like what I had in v2.7.2
and supplemented the synonyms lists equivalently, shortly before he took
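
        The two behaviors involved here (falling back to the assumed
charset when none is declared, and mapping synonyms when one is) can be
sketched as follows.  This is Python, not the Lynx source; the function
name and the two-entry synonym table are hypothetical, and the canonical
names on the right-hand side are an assumption.

```python
# Hypothetical synonym table; only the two names mentioned in this
# thread are listed, and the mapping targets are assumed.
SYNONYMS = {
    "x-euc-jp": "euc-jp",
    "x-shift_jis": "shift_jis",
}


def effective_charset(content_type: str, assumed: str) -> str:
    """Pick the charset to use for a document.

    content_type is the value of a Content-Type header or META
    HTTP-EQUIV, e.g. "text/html;CHARSET=x-euc-jp".  If it carries no
    charset parameter, the user's assumed charset applies.
    """
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"').lower()
            # Map known synonyms onto the canonical name.
            return SYNONYMS.get(charset, charset)
    return assumed.lower()
```

So effective_charset("text/plain", "euc-jp") falls back to "euc-jp", and
effective_charset("text/html;CHARSET=x-euc-jp", "iso-8859-1") resolves to
"euc-jp" only because the synonym is in the table.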

Foteos Macrides
