bug-gnu-emacs

bug#34862: 27.0.50; Trying to update pinyin.map


From: Eli Zaretskii
Subject: bug#34862: 27.0.50; Trying to update pinyin.map
Date: Fri, 15 Mar 2019 07:03:30 +0200

> From: Eric Abrahamsen <eric@ericabrahamsen.net>
> Date: Thu, 14 Mar 2019 14:49:51 -0700
> 
> 
> As discussed in bug#34215, I'm trying to update the
> romanization-to-Chinese-character mapping in the
> file ./leim/MISC-DIC/pinyin.map to use the more complete mapping
> provided by the Google pinyin input method, licensed under Apache 2.0.
> This expands the number of characters recognized by Emacs from around
> 7,000 to around 17,000. (And increases the size of the mapping file from
> 18K to 53K.)
> 
> I'm running into encoding problems when adding the new characters --
> Emacs says some of the characters can't be written using the existing
> coding system. The original file has an encoding cookie reading coding:
> cn-gb-2312, and describing the coding system gives me:
> 
> chinese-iso-8bit-dos (alias: cn-gb-2312-dos euc-china-dos euc-cn-dos
>   cn-gb-dos gb2312-dos)
> 
> The characters *can* be encoded using gb18030, and of course utf8. The
> wikipedia page for gb18030 describes gb2312 as "legacy"[1], and says
> gb18030 is a superset of 2312.
> 
> Is there any reason not to go straight to utf8 for this file? If that's
> not okay, would gb18030 be acceptable?

I'm not sure I understand: which file's encoding would you like to
change?  Could you please clarify?
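
For reference, the encoding problem described in the quoted message can
be illustrated outside Emacs.  The following is only a minimal sketch in
Python, not anything taken from the Emacs sources: the helper name
`unencodable` and the sample string are made up for illustration, and it
assumes that Python's "gb2312" codec is a fair stand-in for the
cn-gb-2312 (chinese-iso-8bit) coding system named in the file's cookie.
It lists the characters of a string that a given codec cannot encode.

```python
def unencodable(text, codec):
    """Return the characters of `text` that `codec` cannot encode.

    Characters are reported in order of first appearance, without
    duplicates.  `unencodable` is a hypothetical helper, not part of
    Emacs or of any pinyin tooling.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Sample data (an assumption for illustration): U+4E2D is in GB2312,
# while U+3400 (CJK Extension A) is not.
sample = "\u4e2d\u3400"

for codec in ("gb2312", "gb18030", "utf-8"):
    print(codec, [hex(ord(c)) for c in unencodable(sample, codec)])
```

Running a check like this over the new mapping data would show which
characters rule out the current cookie, while gb18030 and utf-8 should
report none.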




