Re: [bug-gnu-libiconv] Updating iconv tables
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Updating iconv tables
Date: Thu, 12 Jun 2008 02:42:00 +0200
User-agent: KMail/1.5.4
Hi,
I'm not sure I have understood everything correctly.
> When people have
> gone to convert the EDICT file to UTF8 for other
> systems, the iconv utility simply dies on that character
In summary, you are saying that you have a particular character in EUC-JP,
that the iconv conversion from EUC-JP to UTF-8 does not grok?
Then that character is not part of EUC-JP.
I'm not sure which character you are talking about. Your mail had an
encoding specification of ISO-2022-JP, which usually means ISO-2022-JP-2,
but that particular character is invalid in ISO-2022-JP-2 (it was encoded
as "ESC $ B - j"). The other character in that line was U+682A, whereas
you were talking about U+3231.
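To make the byte pair concrete: after "ESC $ B", each character is two 7-bit bytes that index a 94x94 grid. The following is only a small Python sketch of that arithmetic; the helper names `kuten` and `euc_bytes` are made up for illustration and are not part of libiconv or any library. It shows that "- j" lands in row 13, which JIS X 0208 leaves unassigned.

```python
def kuten(b1, b2):
    """Convert a 7-bit JIS byte pair (as used after ESC $ B) to (row, cell)."""
    if not (0x21 <= b1 <= 0x7E and 0x21 <= b2 <= 0x7E):
        raise ValueError("not a valid 94x94 byte pair")
    return b1 - 0x20, b2 - 0x20


def euc_bytes(b1, b2):
    """The same grid position in EUC form: both bytes with the high bit set."""
    return bytes([b1 | 0x80, b2 | 0x80])


pair = b"-j"                      # the two bytes that followed ESC $ B
row, cell = kuten(pair[0], pair[1])
print(row, cell)                  # 13 74
print(euc_bytes(*pair).hex())     # adea
```

Rows 9 to 15 of JIS X 0208 carry no characters, so a converter that follows the standard has nothing to map this pair to.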
> The problem, I conclude, is with the compiled-in tables
> in iconv in the Linux distros. It seems Sun has gone to
> the trouble of keeping theirs up-to-date, but the standard
> distros haven't.
You have a misconception of what EUC-JP is. EUC-JP is a character encoding
scheme based on three standards: ASCII, JIS X 0208, and JIS X 0212. These
are standards issued by Japanese authorities, and carved in stone. Anyone
who thinks that EUC-JP tables have to be "kept up-to-date" is asking for
deviation from standards, and is asking for interoperability problems!
The interoperability problem that you encountered is *precisely* due to
your vendor having added "extensions" to their EUC-JP fonts, while you
expect that everyone else has the same extensions in their fonts and tables!
Take a look at
http://www.haible.de/bruno/charsets/conversion-tables/EUC-JP.html
to see how many variants of EUC-JP already exist!
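The effect of such variants can be observed without libiconv at all. The sketch below uses the codecs bundled with Python as stand-ins; they are not libiconv's tables, and the choice of `euc_jp`, `cp932` and `euc_jis_2004` as the codecs to compare is an assumption made for illustration. It asks each codec whether it has a mapping for U+682A and U+3231.

```python
CANDIDATES = ["euc_jp", "cp932", "euc_jis_2004"]


def encodable(ch, codec):
    """Return the encoded bytes, or None if the codec has no mapping for ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


def report(chars=("\u682a", "\u3231"), codecs=CANDIDATES):
    """Map each character to {codec name: encoded bytes or None}."""
    return {ch: {c: encodable(ch, c) for c in codecs} for ch in chars}


if __name__ == "__main__":
    for ch, result in report().items():
        for codec, data in result.items():
            status = data.hex() if data is not None else "no mapping"
            print(f"U+{ord(ch):04X} {codec:>13}: {status}")
```

A character that one codec accepts and another rejects is exactly the kind of vendor extension that breaks interchange between systems.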
Bruno