[bug-gnu-libiconv] iconv not catching bad bytes for ISO-8859-1
From: Kenneth Reid Beesley
Subject: [bug-gnu-libiconv] iconv not catching bad bytes for ISO-8859-1
Date: Thu, 13 Aug 2015 19:10:22 -0600
Problem: iconv not catching/detecting bad bytes when converting from a file
alleged to be ISO-8859-1 (but it’s not)
Dear All,
I’m using iconv (GNU libiconv 1.14), written by Bruno Haible, on a SUSE Linux
system, and also iconv (GNU libiconv 1.11) on a separate machine (OS X 10.10.4).
1. I create a file, input1252.txt, that contains the hex byte values x91 and x92.
This file is encoded in CP1252, where x91 and x92 are legal/defined bytes.
These two bytes are not defined in ISO-8859-1.
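A minimal sketch of step 1, written in Python rather than with the tools used in this report: it writes the two bytes to a file and shows how Python's own cp1252 codec interprets them. The helper name write_test_file is hypothetical; only the file name comes from the report.

```python
from pathlib import Path


def write_test_file(path="input1252.txt"):
    """Write the two bytes 0x91 and 0x92 to `path` (hypothetical helper)."""
    data = bytes([0x91, 0x92])
    Path(path).write_bytes(data)
    return data


if __name__ == "__main__":
    data = write_test_file()
    # Raw bytes as hex, to confirm what is on disk.
    print(data.hex(" "))
    # Code points assigned by Python's cp1252 codec (not by iconv).
    print([f"U+{ord(ch):04X}" for ch in data.decode("cp1252")])
```

In CP1252 these two bytes are the left and right single quotation marks (U+2018 and U+2019), which is why the file is valid in that encoding.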
2. I run the following script:
iconv -f ISO-8859-1 -t UTF-8 --byte-subst="<PROBLEM: 0x%x>" --unicode-subst="<PROBLEM: U+%04X>" input1252.txt > out.txt
i.e. telling iconv (incorrectly) that the input file is Latin 1, and telling it
to convert it to UTF-8. I expect the x91 and x92 bytes to be recognized as
not-legal-in-Latin1, and I expect to see <PROBLEM: 0x91> and <PROBLEM: 0x92>
in the out.txt file.
But I don’t see them. The x91 and x92 bytes get copied straight across to the
output file on both the systems that I’m using.
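One way to check what actually ended up in the output is to look at the raw bytes instead of the rendered text. The sketch below is an assumption-laden illustration, not a description of iconv's internals: classify_output is a hypothetical helper, and the sample input is produced by Python's own ISO-8859-1 codec, which maps every byte to the code point of the same value. Whether iconv does the same is not established here.

```python
def classify_output(raw: bytes) -> dict:
    """Report substitution markers and C1 code points found in UTF-8 bytes.

    Hypothetical helper: it only inspects bytes and does not call iconv.
    """
    text = raw.decode("utf-8")
    return {
        "has_marker": "<PROBLEM:" in text,
        "c1_code_points": [
            f"U+{ord(ch):04X}" for ch in text if 0x80 <= ord(ch) <= 0x9F
        ],
    }


if __name__ == "__main__":
    # Stand-in for out.txt: Python's ISO-8859-1 -> UTF-8 conversion, not iconv's.
    simulated = bytes([0x91, 0x92]).decode("iso-8859-1").encode("utf-8")
    print(simulated.hex(" "))
    print(classify_output(simulated))
```

To inspect a real result, pass the contents of out.txt (read in binary mode) to classify_output. If the report lists U+0091 and U+0092 and no marker, the converter accepted the bytes as code points in the U+0080 to U+009F range rather than rejecting them.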
What am I missing?
Thanks,
Ken
input1252.txt
Description: Text document
script
Description: Binary data
********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA