Re: lynx-dev Japanese & spaces in forms-options menu
From: Leonid Pauzner
Subject: Re: lynx-dev Japanese & spaces in forms-options menu
Date: Tue, 22 Sep 1998 17:00:27 +0400 (MSD)
> Conclusion is that for Chinese and Japanese, interword space is
> meaningless. Therefore introducing a space at a line wrap in the
> source means an added artifact that the author did not intend. I
> would agree that the correct way to handle it would be to go on the
> basis of the character set of the document, but there are just too
> many unlabeled documents out there. If someone has their display
> charset set for Japanese or Chinese, then I assume they plan to
> read Japanese or Chinese and don't want extra spaces thrown in. If
> someone wants to read a language that requires interword space, they
> should set their display charset to that language.
This is generally right, but 7-bit US-ASCII is a special case
that should be recognized automatically.
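One way to recognize the 7-bit case automatically is sketched below. This is only a heuristic under stated assumptions, not what Lynx does: the function name looks_like_plain_ascii is hypothetical, and ESC is rejected because ISO 2022 based encodings such as ISO-2022-JP are also 7-bit but switch character sets with escape sequences.

```python
ESC = 0x1B  # introduces ISO 2022 escape sequences


def looks_like_plain_ascii(data: bytes) -> bool:
    """Return True if every byte is 7-bit and no ESC byte is present.

    Hypothetical helper: a document that passes this test can be
    treated as US-ASCII regardless of the display charset setting.
    """
    return all(b < 0x80 and b != ESC for b in data)
```

A 7-bit encoding that does not use ESC, such as HZ, would still pass this test, so it is a first filter rather than a full detector.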
> I made the patch, and it was incorporated in the "1998-08-29 (2.8.1dev.23)"
> CHANGES: "* don't replace '\n' with ' ' if Chinese or Japanese - HN".
> If someone has the interest and the programming skills, then a "better"
> alternative I suppose would be to test if Lynx is really getting a multi-
> byte stream or not, and only _not_ add the space if that's true. Seems to
> add more complexity than necessary, but the "big two" do handle interword
> space correctly in mixed documents, and it's nice I admit.
> __Henry
The simplest way is to check whether the previous character is in the range 20-7E.
But EUC-JP seems to use "ISO 2022 rules" to decide whether a 20-7E byte is US-ASCII,
depending on the SS2 and SS3 flags...
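A minimal sketch of the previous-byte check, assuming the source arrives as a list of byte strings, one per source line. The names needs_space_at_wrap and join_source_lines are hypothetical and do not come from the Lynx source.

```python
def needs_space_at_wrap(prev_byte: int) -> bool:
    """True if the byte before the line break is in 20-7E."""
    return 0x20 <= prev_byte <= 0x7E


def join_source_lines(lines):
    """Join source lines, turning a line break into a space only
    when the byte before the break is in 20-7E."""
    out = bytearray()
    for line in lines:
        if out and needs_space_at_wrap(out[-1]):
            out.append(0x20)
        out += line
    return bytes(out)
```

The check only looks at one byte, so it is reliable only when every byte of a multi-byte character lies outside 20-7E. That does not hold for Shift_JIS, where the second byte of a character can fall inside 20-7E, nor for the 7-bit ISO 2022 encodings.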
Quoted from the IANA character sets list:

Name: JIS_Encoding
MIBenum: 16
Source: JIS X 0202-1991. Uses ISO 2022 escape sequences to
        shift code sets as documented in JIS X 0202-1991.
Alias: csJISEncoding

Name: Shift_JIS (preferred MIME name)
MIBenum: 17
Source: A Microsoft code that extends csHalfWidthKatakana to include
        kanji by adding a second byte when the value of the first
        byte is in the ranges 81-9F or E0-EF.
Alias: MS_Kanji
Alias: csShiftJIS

Name: Extended_UNIX_Code_Packed_Format_for_Japanese
MIBenum: 18
Source: Standardized by OSF, UNIX International, and UNIX Systems
        Laboratories Pacific. Uses ISO 2022 rules to select
           code set 0: US-ASCII (a single 7-bit byte set)
           code set 1: JIS X0208-1990 (a double 8-bit byte set)
                       restricted to A0-FF in both bytes
           code set 2: Half Width Katakana (a single 7-bit byte set)
                       requiring SS2 as the character prefix
           code set 3: JIS X0212-1990 (a double 7-bit byte set)
                       restricted to A0-FF in both bytes
                       requiring SS3 as the character prefix
Alias: csEUCPkdFmtJapanese
Alias: EUC-JP (preferred MIME name)

Name: Windows-31J
MIBenum: 2024
Source: Windows Japanese. A further extension of csShiftJIS
        to include several OEM-specific kanji extensions.
        Like csShiftJIS, it adds a second byte when the value
        of the first byte is in the ranges 81-9F or E0-EF.
        PCL Symbol Set id: 19K
Alias: csWindows31J

Name: GB2312 (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
        two byte set:
           20-7E = one byte ASCII
           A1-FE = two byte PRC Kanji
        See GB 2312-80
        PCL Symbol Set Id: 18C
Alias: csGB2312

Name: HZ-GB-2312
MIBenum: 2085
Source: RFC 1842, RFC 1843 [RFC1842, RFC1843]

Name: Big5 (preferred MIME name)
MIBenum: 2026
Source: Chinese for Taiwan Multi-byte set.
        PCL Symbol Set Id: 18T
Alias: csBig5

[RFC1468] Murai, J., Crispin, M., and E. van der Poel, "Japanese
          Character Encoding for Internet Messages", RFC 1468,
          Keio University, Panda Programming, June 1993.
[RFC1554] Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual
          Extension of ISO-2022-JP", RFC 1554, Tokyo Institute of
          Technology, ETL, December 1993.
[RFC1557] Choi, U., Chon, K., and H. Park, "Korean Character Encoding
          for Internet Messages", RFC 1557, KAIST, Solvit Chosun Media,
          December 1993.
[RFC1815] Ohta, M., "Character Sets ISO-10646 and ISO-10646-J-1",
          RFC 1815, Tokyo Institute of Technology, July 1995.
[RFC1842] Wei, Y., J. Li, and Y. Jiang, "ASCII Printable
          Characters-Based Chinese Character Encoding for Internet
          Messages", RFC 1842, Harvard University, Rice University,
          University of Maryland, August 1995.
[RFC1843] Lee, F., "HZ - A Data Format for Exchanging Files of
          Arbitrarily Mixed Chinese and ASCII Characters", RFC 1843,
          Stanford University, August 1995.
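
The byte ranges in the registry entries quoted above can be turned into a small classifier, sketched here under stated assumptions. The Shift_JIS lead-byte ranges come from the quoted entry. The values SS2 = 8E and SS3 = 8F are the 8-bit single-shift codes and are an addition not stated in the quote. All function names are hypothetical.

```python
from typing import List

SS2 = 0x8E  # single shift 2: prefix for half-width katakana in EUC-JP
SS3 = 0x8F  # single shift 3: prefix for JIS X 0212 in EUC-JP


def is_shift_jis_lead(b: int) -> bool:
    """True if b starts a two-byte Shift_JIS character (81-9F or E0-EF)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def euc_jp_char_length(lead: int) -> int:
    """Number of bytes in the EUC-JP character that starts with lead."""
    if lead < 0x80:
        return 1  # code set 0: US-ASCII
    if lead == SS2:
        return 2  # code set 2: SS2 plus one byte
    if lead == SS3:
        return 3  # code set 3: SS3 plus two bytes
    return 2      # code set 1: JIS X 0208, two bytes


def split_euc_jp(data: bytes) -> List[bytes]:
    """Split an EUC-JP byte string into per-character byte strings."""
    chars = []
    i = 0
    while i < len(data):
        n = euc_jp_char_length(data[i])
        chars.append(data[i:i + n])
        i += n
    return chars
```

With a splitter like this, the decision about a space at a line wrap can be made on the last whole character instead of the last byte. The sketch does no validation of trailing bytes, so malformed input is split on a best-effort basis.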