Re: lynx-dev Display of SGML Greek Math entities -- solution

From: Jungshik Shin
Subject: Re: lynx-dev Display of SGML Greek Math entities -- solution
Date: Thu, 22 Aug 2002 17:37:30 -0400 (EDT)

On Tue, 20 Aug 2002, Henry Nelson wrote:

> > legacy encodings. That's why I suggested that Lynx make use of iconv(3)
> > for conversions between various encodings.
> And as I pointed out, it's a wonderful suggestion, but requires that
> someone implement it.  Can you write the code?

  Before I asked Thomas to forward my message about using iconv(3),
I had taken a look at the chartrans module of Lynx, but couldn't figure
out how to hook up iconv(3) to the existing code. I've just found that
SGML_characters() in SGML.c is one of the places to look into, and I have
come up with a couple of vague ideas which may or may not lead to a solution.

> [Did you see my reply that tells you that the CJK code in Lynx does
> automatic decoding between SJIS, euc-jp and iso-2022-jp?]

No, I didn't. Thank you for letting me know about this.  Anyway, I've
just found the conversion routines for SJIS, EUC-JP, and ISO-2022-JP at
the end of SGML.c.
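
To illustrate why conversion among these three encodings needs no mapping
table, here is a minimal sketch in Python. It is not the Lynx code from
SGML.c; the function names are made up for this example. All three encodings
serialize the same JIS X 0208 two-byte code, so each conversion is plain
arithmetic on the two bytes.

```python
ESC_JISX0208 = b"\x1b$B"   # ISO-2022-JP escape: switch to JIS X 0208
ESC_ASCII = b"\x1b(B"      # ISO-2022-JP escape: switch back to ASCII


def jis_to_euc(j1, j2):
    """JIS X 0208 code bytes (0x21..0x7E each) -> EUC-JP: set the high bits."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1, j2):
    """JIS X 0208 code bytes -> Shift_JIS: fold two rows into one lead byte."""
    row, cell = j1 - 0x20, j2 - 0x20               # both in 1..94
    s1 = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2:                                    # odd row: trail 0x40..0x9E
        s2 = cell + 0x3F + (1 if cell >= 64 else 0)  # skips 0x7F
    else:                                          # even row: trail 0x9F..0xFC
        s2 = cell + 0x9E
    return bytes([s1, s2])


def jis_to_iso2022jp(j1, j2):
    """JIS X 0208 code bytes -> ISO-2022-JP: same bytes, wrapped in escapes."""
    return ESC_JISX0208 + bytes([j1, j2]) + ESC_ASCII


# Greek small alpha is 0x2641 in JIS X 0208.
alpha_euc = jis_to_euc(0x26, 0x41)
alpha_sjis = jis_to_sjis(0x26, 0x41)
alpha_iso = jis_to_iso2022jp(0x26, 0x41)
```

Note that every function starts from a JIS X 0208 code. Nothing here can
start from a Unicode code point such as U+03B1.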

> >   It's really strange that Henry's screenshot showed Greek characters
> > represented both in EUC-JP encoding and in NCRs as Greek characters

However, just because Lynx can algorithmically convert between three
Japanese encodings (SJIS, EUC-JP, and ISO-2022-JP) does not mean that it
can convert between Unicode on the one hand and one of the Japanese
encodings on the other. So, it remains a mystery to me how you made
Lynx render "&#x3b1;" (or "&alpha;") as "0xa6c1" in EUC-JP in your
Japanese terminal (U+03B1 corresponds to 0x2641 in JIS X 0208, which is
represented/serialized as "0xA6 0xC1" in EUC-JP). All of my attempts
under a few Japanese terminals failed, which is consistent with my reading
of Lynx code and what you wrote in response to my suggestion about using
iconv(3). That is, Lynx does not have any mapping tables between CJK
encodings and Unicode. Nor does it invoke any external library function
to do that job.  Perhaps this might be regarded as off-topic in a sense,
but to me it's not. Actually, as Steve implied in his message on this
issue, knowing in what respect your Lynx differs from the newest
snapshot (so that it can render 'alpha' represented both in raw EUC-JP
encoding and in an NCR _identically_) could result in at least a partial
solution for the problem I tried to solve. Therefore, could you tell me
about any custom changes in your Lynx, if there are any?
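
The byte values quoted above can be checked with a short Python sketch.
This is only an illustration under stated assumptions: Python's built-in
"euc_jp" codec stands in for the Unicode-to-JIS mapping table that an
iconv(3)-style conversion would supply, and the helper name is made up
for this example. It is not something Lynx contains.

```python
import html


def ncr_to_euc_jp(reference):
    """Resolve an NCR or entity name, then map the character to EUC-JP bytes.

    The second step is a table lookup performed by Python's codec; it cannot
    be done by arithmetic on the Unicode code point.
    """
    ch = html.unescape(reference)      # "&#x3b1;" or "&alpha;" -> U+03B1
    return ch.encode("euc_jp")         # raises UnicodeEncodeError if unmapped


euc = ncr_to_euc_jp("&#x3b1;")
# Clearing the high bit of each EUC-JP byte recovers the JIS X 0208 code.
jis = bytes(b & 0x7F for b in euc)
```

With this table step in place, the NCR and the raw EUC-JP bytes for alpha
come out identical, which is the behaviour seen in the screenshot.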

> FINALLY someone who understands.  As with ANY character set that does
> not have a Unicode character within its repertoire, Lynx has no choice

  As has now become clear to others, Lynx CANNOT tell whether or not
the character repertoire of a CJK legacy encoding includes a given Unicode
character. Therefore, even if they belong to the character repertoire of
CJK legacy encodings, any characters represented in NCRs or entity names
(other than trivial characters in US-ASCII) are rendered as '&#dddd;'
OR with fall-back characters defined in def7_uni.tbl (if they happen to
be in def7_uni.tbl) when the display character set is one of the CJK
encodings.
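
The rendering rule described above can be modelled in a few lines of Python.
This is a sketch under loud assumptions, not Lynx code: the FALLBACKS
entries are hypothetical stand-ins and are not copied from def7_uni.tbl,
the order in which the fall-back table and the '&#dddd;' form are tried is
assumed, and both function names are invented for this example. The second
function shows what a repertoire lookup would change, again using Python's
codec as a stand-in for a real mapping table.

```python
# Hypothetical fall-back entries; the real def7_uni.tbl is not reproduced.
FALLBACKS = {0x03B1: "a", 0x00A9: "(c)"}


def render_without_table(code_point):
    """Model of the behaviour described above for CJK display charsets."""
    if code_point < 0x80:                      # trivial US-ASCII characters
        return bytes([code_point])
    if code_point in FALLBACKS:                # 7-bit fall-back, if defined
        return FALLBACKS[code_point].encode("ascii")
    return ("&#%d;" % code_point).encode("ascii")


def render_with_table(code_point, charset="euc_jp"):
    """Same rule, but consult a Unicode mapping table for the charset first."""
    try:
        return chr(code_point).encode(charset)
    except UnicodeEncodeError:                 # genuinely outside the repertoire
        return render_without_table(code_point)


beta_now = render_without_table(0x03B2)        # Greek small beta, no fall-back
beta_mapped = render_with_table(0x03B2)
```

Only characters that are truly missing from the display character set
would then reach the fall-back path.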

