Re: [bug-gnu-libiconv] Invalid characters when converting from utf8 to iso-8859-15

On Wed, Mar 17, 2021 at 7:13 PM Bruno Haible <bruno@clisp.org> wrote:
> If that worked for you, you must be using iconv from GNU libc, not from
> GNU libiconv. The proper bug report address for GNU libc is at
> http://www.gnu.org/software/libc/bugs.html

Thank you for the clarification.

> > The resulting file contains characters x98 and x80.
>
> This is as expected. All charset converter softwares know that the
> characters 0x98 and 0x80 in ISO-8859-1 and ISO-8859-15 are equivalent to
> U+0098 and U+0080, respectively. [1][2]

Completely agreed. I also read a previous email from you on this list [1] about a similar conversion, where it was stated that all values 0-255 are valid characters; it is only the behavior below that prompted me to email.

> > These are considered
> > invalid by some programs that expect iso8859-15 encoding -- including iconv
> > itself.
>
> Can you substantiate this claim? What did you do, and what was the outcome?

iconv -c -f utf8 -t iso_8859-15//IGNORE input | iconv -c -f utf8 -t iso_8859-15//IGNORE > output

After this, the output file no longer contains the characters.

If you omit -c from both, the second one fails (you can do this with an intermediary file to confirm it's not the first that fails):
iconv -f utf-8 -t iso_8859-15//IGNORE input | iconv -t iso_8859-15//IGNORE > output
foobarbaz
iconv: illegal input sequence at position 10

If I add -f iso_8859-15 to the second iconv, it does pass the file through unchanged -- which is something I hadn't tried until just now, and it certainly undermines my argument!

If -f utf-8 is used on the second iconv, or -f is omitted, then you get the error shown above.
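
To make the difference concrete, here is a rough Python sketch of the same two-step experiment. It uses Python's own codecs rather than iconv, so it only illustrates the encodings and says nothing about what iconv does internally. The sample text and the helper name first_invalid_offset are made up for illustration; they are not taken from my actual file.

```python
# Hypothetical sample: "foobarbaz" followed by the C1 control characters
# U+0098 and U+0080, stored as UTF-8 (as the original input would be).
utf8_input = "foobarbaz\u0098\u0080".encode("utf-8")

# Step 1: convert UTF-8 to ISO-8859-15. The two control characters become
# the single bytes 0x98 and 0x80.
data = utf8_input.decode("utf-8").encode("iso8859-15")


def first_invalid_offset(raw: bytes, encoding: str):
    """Return None if raw decodes cleanly, else the offset of the first bad byte."""
    try:
        raw.decode(encoding)
        return None
    except UnicodeDecodeError as exc:
        return exc.start


# Step 2: read the converted bytes back under each assumed source encoding.
print(data)
print("as iso8859-15:", first_invalid_offset(data, "iso8859-15"))
print("as utf-8:", first_invalid_offset(data, "utf-8"))
```

Read as ISO-8859-15 the bytes decode cleanly; read as UTF-8 they do not, because a lone 0x98 or 0x80 is a continuation byte with no lead byte in front of it. That would be consistent with the second iconv failing whenever it treats its input as UTF-8.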

The original issue was found when loading the file into a database with encoding iso_8859-15 -- so it seems that whatever the database is doing to the file runs into the same issue.

If you have any insight into that behavior, I'd appreciate it. After this I'll take it to the libc list (if needed), as it certainly appears to be a question about the executable more than the library. I just suspect this list has the experts on encodings.

[1] https://lists.gnu.org/archive/html/bug-gnu-libiconv/2015-08/msg00001.html

--
Tom Sorensen

From:	Tom Sorensen
Subject:	Re: [bug-gnu-libiconv] Invalid characters when converting from utf8 to iso-8859-15
Date:	Thu, 18 Mar 2021 09:41:36 -0400