Re: [bug-gnu-libiconv] iconv incorrectly converts escape characters 0x1b
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] iconv incorrectly converts escape characters 0x1b from UTF-8 to ISO-2022-JP
Date: Tue, 24 Mar 2015 04:22:58 +0100
User-agent: KMail/4.8.5 (Linux/3.2.0-64-generic; KDE/4.8.5; x86_64; ; )
Hello,
> ISO-2022-JP is one of the popular character encoding schemes for email
> texts in Japan.
I don't think it is still popular, and it has not been for 20 or 30 years, as it
cannot encode half-width Katakana characters (it can only encode Katakana as
full-width characters, which is extremely unusual).
Try ISO-2022-JP-2 or ISO-2022-JP-3 instead. That's why these encodings
were created.
See https://en.wikipedia.org/wiki/ISO/IEC_2022#ISO.2FIEC_2022_character_sets
> I report incorrect conversion by iconv w.r.t. ISO-2022-JP.
> The byte value 0x1b in UTF-8 text is converted to the same byte value
> in ISO-2022-JP by iconv.
Since the byte value 0x1b is used as escape character in the ISO-2022-*
family of encodings, and these encodings provide no way to encode an ESC
character as such, "byte value 0x1b in UTF-8 text" is invalid input for
such a conversion. In other words, use ASCII without ESC characters,
or UTF-8 without ESC characters, as input.
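
One way to follow this advice is to reject literal ESC characters before converting. The sketch below is only an illustration: it uses Python's built-in iso2022_jp codec rather than libiconv, and the name encode_iso2022jp_checked is hypothetical.

```python
ESC = "\x1b"  # U+001B, the byte that starts ISO-2022 escape sequences


def encode_iso2022jp_checked(text: str) -> bytes:
    """Encode text as ISO-2022-JP, refusing input that contains a literal ESC.

    Hypothetical helper: the check runs on the decoded text, before any
    conversion, so escape sequences emitted by the encoder are unaffected.
    """
    pos = text.find(ESC)
    if pos != -1:
        raise ValueError(
            f"literal ESC (U+001B) at index {pos}: "
            "not representable in ISO-2022-JP"
        )
    return text.encode("iso2022_jp")


if __name__ == "__main__":
    # Plain ASCII passes through unchanged.
    print(encode_iso2022jp_checked("abc"))
    # Input with a literal ESC is rejected instead of being passed through.
    try:
        encode_iso2022jp_checked("abc\x1bdef")
    except ValueError as err:
        print("rejected:", err)
```

Whether to reject such input or to strip the ESC characters first is a choice for the caller; rejecting keeps the problem visible.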
Bruno