Re: [bug-gnu-libiconv] iconv incorrectly converts escape characters 0x1b
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] iconv incorrectly converts escape characters 0x1b from UTF-8 to ISO-2022-JP
Date: Wed, 25 Mar 2015 12:09:23 +0100
User-agent: KMail/4.8.5 (Linux/3.2.0-64-generic; KDE/4.8.5; x86_64; ; )
Seikoh NISHITA wrote:
> Next, I sent a mail with same text of half-width katakana characters.
>
> o Gmail: UTF-8 (base64 encoded)
> o Yahoo Mail: ISO-2022-JP (converted to full-width katakana characters)
> o Outlook in Office365: ISO-2022-JP (converted to full-width
> katakana characters)
>
> These three mail sites uses UTF-8 or ISO-2022-JP (not ISO-2022-JP-2, or ..
> -3).
Yes, that's what I'm saying: Through the use of ISO-2022-JP (not
ISO-2022-JP-2/3), the distinction between half-width and full-width
Katakana characters gets lost. I thought this was unacceptable to Japanese
people?
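To illustrate what "gets lost" means, here is a minimal sketch. It assumes
Unicode NFKC normalization as a stand-in for the half-width to full-width
folding; I do not know what the mail providers actually do internally, and
the helper name to_fullwidth is made up for this example.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Fold half-width Katakana to full-width via NFKC (illustrative stand-in)."""
    return unicodedata.normalize("NFKC", text)


half = "ｶﾀｶﾅ"      # half-width Katakana, U+FF76 U+FF80 U+FF76 U+FF85
full = "カタカナ"  # full-width Katakana, U+30AB U+30BF U+30AB U+30CA

# Before folding the two strings differ; after folding they are identical,
# so a receiver can no longer tell which form the sender typed.
folded_half = to_fullwidth(half)
folded_full = to_fullwidth(full)
```

Once both forms map to the same string, the original distinction cannot be
recovered on the receiving side.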
> Because ISO-2022-* should not have invalid ESC characters as you wrote,
> and libiconv is one of the basic libraries for developers,
> I think libiconv should terminate conversion to ISO-2022-* when it
> finds invalid ESC characters.
> How do you think about it?
Good question.
On one hand, yes, when you look at the formal definitions of
ISO-2022-JP https://tools.ietf.org/html/rfc1468
and ISO-2022-JP-2 https://tools.ietf.org/html/rfc1554
(definition of single-byte-char: "... not including ESC, SI, SO"),
they forbid stray ESC characters and unrecognized ESC sequences in
the input.
On the other hand, by the structure of ISO-2022 and by long tradition
in the area of Japanese conversion software, byte sequences that are
not recognized are commonly passed through without modification, not
rejected.
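The two policies can be sketched as follows. This is not libiconv code; it
is a small Python illustration that assumes only the four escape sequences
listed in RFC 1468 count as "recognized". The function names and the strict
flag are hypothetical.

```python
from typing import List

ESC = 0x1B

# The four escape sequences defined by RFC 1468 for ISO-2022-JP:
# ASCII, JIS X 0201-1976 Roman, JIS X 0208-1978, JIS X 0208-1983.
RFC1468_ESCAPES = (b"\x1b(B", b"\x1b(J", b"\x1b$@", b"\x1b$B")


def unrecognized_escapes(data: bytes) -> List[int]:
    """Return offsets of ESC bytes that do not start an RFC 1468 sequence."""
    offsets = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            if data[i:i + 3] in RFC1468_ESCAPES:
                i += 3  # skip the whole recognized sequence
                continue
            offsets.append(i)
        i += 1
    return offsets


def apply_policy(data: bytes, strict: bool) -> bytes:
    """Hypothetical policy switch: reject (strict) or pass through (lenient)."""
    bad = unrecognized_escapes(data)
    if strict and bad:
        raise ValueError("unrecognized ESC at offsets %r" % bad)
    return data  # lenient: bytes are passed through without modification
```

With strict=False a stray ESC is kept as-is, which corresponds to the
traditional pass-through behaviour; with strict=True the same input raises
an error, which corresponds to the change being requested.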
How to decide what is best?
Now, about 15 years after the introduction of UTF-8 support in glibc and
the other operating systems, I think software that uses ISO-2022-JP-*
is "legacy" in the sense that people have a running system that they
don't want to touch much any more. I.e. they want minimal maintenance
cost. Therefore maximum backward compatibility is what libiconv and
glibc should provide in this area. If we now change libiconv and glibc
to emit conversion errors where previously the conversion succeeded
silently, we cause trouble and maintenance costs. Therefore I am against
such a change.
But if your question had been asked 15 years ago, I might well have judged
differently.
Bruno