--- Begin Message ---
lynx: display charset <-> doc. charset conversion using iconv
Wed, 7 Aug 2002 14:57:08 -0400 (EDT)
I'm very sorry to bother you with this.
I sent the following message to address@hidden about a week ago, but I'm
afraid it hasn't made it to list members because I haven't subscribed
to the list (it hasn't shown up in the mailing list archive yet).
I'd like to see Lynx become as good an I18Nized browser as possible,
but my interest is not large enough for me to subscribe to yet another
mailing list (although I'll keep an eye on the list archive from time
to time). I also tried submitting my suggestion via the web form at
http://lynx.isc.org/todo/todo-add.html only to find that it didn't work.
Therefore, it'd be very nice if you could forward my message enclosed
below to address@hidden
Once again, very sorry to bother you and thank you for your help in
---------- Forwarded message ----------
From: Jungshik Shin
Subject: display charset <-> doc. charset conversion using iconv
Date: Sun, 28 Jul 2002 00:27:14 -0400 (EDT)
I'd like to run Lynx under a UTF-8 terminal (e.g. xterm-16x under UTF-8
locale, putty for MS-Windows) to view web pages encoded in various
MIME charsets (ISO-8859-x, KOI8-R, Windows-xxxx/CPxxx, EUC-JP/KR/CN,
Big5, Shift_JIS). By setting the display charset (terminal charset) to UTF-8,
this is possible for pages in single byte legacy encodings (ISO-8859-x,
KOI8-R/U, Windows-12xx) because Lynx's chrtrans module handles mapping
between them and Unicode (and thus UTF-8, my display charset). However,
this does not work for CJK legacy encodings (EUC-xx, Shift_JIS, Big5, etc.)
because Lynx does not support mapping between them and Unicode. Neither
does it work when the display charset (terminal charset) is one of the CJK
legacy encodings and the document charset is another CJK legacy encoding.
I'm NOT suggesting that Lynx should include huge tables to map from
those legacy encodings to Unicode and vice versa. Instead, Lynx can
rely on 'iconv(3)' offered by the C library on most Single Unix Spec (POSIX)
compliant Unix systems, or on libiconv (by Bruno Haible) on other systems. Lynx
is already using iconv for localization ('po' file translation), so
the portability issues related to 'iconv' must already have been taken
care of. Moreover, libiconv has been ported to a wide array of OSes
(Unix-like and non-Unix-like), making the portability issue easier to
handle than otherwise.
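
As a rough illustration of the conversion proposed above, the following
Python sketch transcodes page bytes from the document charset to the
display charset by going through Unicode. It is only a sketch under stated
assumptions: Python's built-in codecs stand in for iconv(3), and the
function name to_display_charset is hypothetical, not part of Lynx.

```python
def to_display_charset(page: bytes, doc_charset: str, display_charset: str) -> bytes:
    """Transcode page bytes from the document charset to the display charset.

    Python's codecs are used here as a stand-in for iconv(3); Unicode is the
    pivot between the two charsets. Bytes or characters that cannot be
    mapped are replaced with a substitute character instead of raising.
    """
    text = page.decode(doc_charset, errors="replace")
    return text.encode(display_charset, errors="replace")


# A Korean page (EUC-KR) shown on a UTF-8 terminal: U+AC00.
korean = to_display_charset(b"\xb0\xa1", "euc_kr", "utf-8")

# A Chinese page (Big5) shown on a UTF-8 terminal: U+4E2D.
chinese = to_display_charset(b"\xa4\xa4", "big5", "utf-8")
```

With iconv(3) itself, the same step would be an iconv_open() call naming the
display charset and the document charset, followed by iconv() on the buffer.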
Another advantage of using iconv(3) is that Lynx would be able to
interpret any Unicode character represented as an NCR (numeric character
reference) and render it in the display character set. Currently, Lynx
cannot render characters UNKNOWN to the chrtrans module (e.g. CJK characters)
even though those characters are perfectly displayable in the current
display character set. For instance, U+AC00 is covered by EUC-KR (one
of the Korean encodings), but Lynx treats the NCR '&#44032;' ('가', 0xAC00 = 44032) as
undisplayable even if the terminal/display charset is EUC-KR or UTF-8.
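
The NCR handling described above could look roughly like the following
Python sketch. It expands decimal and hexadecimal NCRs and keeps an NCR
as literal text when its character is not covered by the display charset.
The names NCR and render_ncrs are hypothetical, and Python's codecs again
stand in for iconv(3); this is not how Lynx is implemented.

```python
import re

# Matches decimal (&#44032;) and hexadecimal (&#xAC00;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def render_ncrs(text: str, display_charset: str) -> bytes:
    """Expand NCRs that the display charset can represent, then encode."""

    def expand(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
            char.encode(display_charset)  # probe: is it displayable?
        except (ValueError, OverflowError, UnicodeEncodeError):
            return match.group(0)  # not displayable: leave the NCR as text
        return char

    return NCR.sub(expand, text).encode(display_charset, errors="replace")
```

Under these assumptions, render_ncrs("&#44032;", "euc_kr") yields the
two-byte EUC-KR sequence for U+AC00, while the same input with an
ISO-8859-1 display charset stays as the literal NCR text.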
It also has to be noted that using iconv(3) benefits not only UTF-8
terminal users but also users of terminals with legacy encodings.
The Shift_JIS and EUC-JP MIME charsets are used for Japanese web pages with
almost equal share, with ISO-2022-JP a distant third. With the
current Lynx, Japanese users have to manually switch the terminal charset
(most Japanese terminals support multiple encodings/charsets) between EUC-JP
and Shift_JIS. If Lynx could take care of the conversion between EUC-JP
and Shift_JIS, Japanese Lynx users could set 'display character set'
to either of them in their Lynx configuration file and wouldn't have to switch
the terminal encoding manually when moving from one web site to another.
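
For the Japanese case, a minimal Python sketch might compare the two
charsets first and convert only when they differ, so the terminal setting
never has to change. The function name recode_for_terminal is hypothetical,
and Python's codecs are an assumed stand-in for iconv(3).

```python
import codecs


def recode_for_terminal(page: bytes, doc_charset: str, display_charset: str) -> bytes:
    """Return page bytes in the display charset, converting only if needed."""
    # codecs.lookup() normalizes spelling and aliases (e.g. "EUC-JP" vs "euc_jp").
    if codecs.lookup(doc_charset).name == codecs.lookup(display_charset).name:
        return page  # same charset: pass the bytes through untouched
    text = page.decode(doc_charset, errors="replace")
    return text.encode(display_charset, errors="replace")


# Hiragana U+3042 from an EUC-JP page, shown on a Shift_JIS terminal.
on_sjis_terminal = recode_for_terminal(b"\xa4\xa2", "EUC-JP", "Shift_JIS")

# The same character from an ISO-2022-JP page, shown on an EUC-JP terminal.
on_eucjp_terminal = recode_for_terminal(b"\x1b$B$\"\x1b(B", "ISO-2022-JP", "EUC-JP")
```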
Thank you in advance for your kind attention to my proposal.
P.S. Just FYI, I believe Lynx's competitor w3m-m17n incorporated what
I suggested above. In addition, mutt(although it's not a web browser)
has a similar feature to render incoming emails in various MIME charsets
under a single terminal.
--- End Message ---