Re: [bug-gnu-libiconv] Invalid characters when converting from utf8 to i
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Invalid characters when converting from utf8 to iso-8859-15
Date: Thu, 18 Mar 2021 00:10:23 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-203-generic; KDE/5.18.0; x86_64; ; )
Hi,
Tom Sorensen wrote:
> Note -- this isn't just -15, but -1 as well, and possibly others.
>
> I have a utf8 text file
The official name of the encoding that you mean is UTF-8, not UTF8.
> that contains <c2 98> and <c2 80>. When converted
> to iso-8859-15 via:
> iconv -c -f utf8 -t iso_8859-15//IGNORE input > output
If that worked for you, you must be using iconv from GNU libc, not from
GNU libiconv. The proper bug report address for GNU libc is at
http://www.gnu.org/software/libc/bugs.html
But since GNU libiconv and GNU libc are based on very similar conversion
tables, the answer to your question is the same for both implementations.
> The resulting file contains characters x98 and x80.
This is as expected. All charset conversion software treats the bytes
0x98 and 0x80 in ISO-8859-1 and ISO-8859-15 as equivalent to the
characters U+0098 and U+0080, respectively. [1][2]
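
The mapping can be checked outside of iconv as well. The following is only a
minimal sketch, assuming Python 3 and its built-in codecs (the codec names
"iso8859-1" and "iso8859-15" are Python's, and the helper convert() is
hypothetical, not part of libiconv or glibc). It decodes the two UTF-8
sequences from the report and re-encodes them in both target charsets:

```python
# Sketch using Python's built-in codecs, not iconv itself.
# convert() is a hypothetical helper introduced for illustration only.
def convert(utf8_bytes: bytes, target: str) -> bytes:
    """Decode UTF-8 input and re-encode it in the target charset."""
    return utf8_bytes.decode("utf-8").encode(target)

# The two sequences from the report: <c2 98> is U+0098, <c2 80> is U+0080.
sample = b"\xc2\x98\xc2\x80"

for charset in ("iso8859-1", "iso8859-15"):
    # Print the resulting bytes in hex for each target charset.
    print(charset, convert(sample, charset).hex())
```

If Python's tables agree with the conversion tables in [1] and [2], both
lines should show the bytes 98 and 80, i.e. the C1 control characters are
carried over unchanged rather than rejected.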
> These are considered
> invalid by some programs that expect iso8859-15 encoding -- including iconv
> itself.
Can you substantiate this claim? What did you do, and what was the outcome?
> Running the file through iconv a second time
Which command line did you use for the second time?
Bruno
[1] https://haible.de/bruno/charsets/conversion-tables/ISO-8859-1.html
[2] https://haible.de/bruno/charsets/conversion-tables/ISO-8859-15.html