[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug-gnu-libiconv] GB2312 incompatible with GB18030; violation of GB 180
From: |
Mingye Wang (Arthur2e5) |
Subject: |
[bug-gnu-libiconv] GB2312 incompatible with GB18030; violation of GB 18030 "principles" |
Date: |
Thu, 29 Sep 2016 02:33:51 -0400 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 |
Hello,
I am not sure if someone has brought this up before, as what I am
reporting is, in fact, a well-documented issue. [1]
[1]: https://en.wikipedia.org/wiki/GB_2312#Two_implementations_of_GB2312
iconv encodes the GB code points A1A4 and A1AA differently for GB 2312
and GB 18030:
bytes gb2312 gb18030
----- ------ -------
A1A4 U+00B7 U+30FB
A1AA U+2014 U+2015
This slight difference breaks compatibility between these two encodings,
a principle of the mandatory GB 18030[^1] standard:
[^1]: -2000 and -2005. In 2000 it says "de facto internal encoding".
> 3. Principles
> =============
>
> This standard is backwards compatible with the internal encoding
> defined in GB 2312.
> ...
This violation of standard principles is not rare in the FOSS world,
according to [1]. Someone submitted a similar bug to Python[2], but it
got marked "wontfix" to ensure compatibility with "the rest of the FOSS
world" as well as round-trip safety (in case of a Ruby-like
normalization[^2]). I am submitting this bug in hope that changes in
libiconv, an important reference implementation for "the rest of the
FOSS world", can lead to revisions in other libraries.
[2]: https://bugs.python.org/issue24036
[^2]: Ruby uses a gb18030-compatible implementation internally, but
still accepts Unicode code points from the incompatible code points.
--
Regards,
Arthur2e5
signature.asc
Description: OpenPGP digital signature
- [bug-gnu-libiconv] GB2312 incompatible with GB18030; violation of GB 18030 "principles",
Mingye Wang (Arthur2e5) <=