[bug-gnu-libiconv] iconv not catching bad bytes for ISO-8859-1

From:	Kenneth Reid Beesley
Subject:	[bug-gnu-libiconv] iconv not catching bad bytes for ISO-8859-1
Date:	Thu, 13 Aug 2015 19:10:22 -0600

Problem:  iconv not catching/detecting bad bytes when converting from a file
alleged to be ISO-8859-1 (but it’s not)

Dear All,

I’m using iconv (GNU libiconv 1.14), written by Bruno Haible, on a SUSE Linux
system, and also iconv (GNU libiconv 1.11) on a separate machine
(OS X 10.10.4).

1.  I create a file, input1252.txt, that contains hex byte values x91 and x92.
This file is encoded in CP1252, where x91 and x92 are legal/defined bytes.

These two bytes are not defined in ISO-8859-1.

2.  I run the following script:

iconv -f ISO-8859-1 -t UTF-8 --byte-subst="<PROBLEM: 0x%x>" \
  --unicode-subst="<PROBLEM: U+%04X>" input1252.txt > out.txt

i.e. telling iconv (incorrectly) that the input file is Latin 1, and telling
it to convert it to UTF-8.  I expect the x91 and x92 bytes to be recognized
as not-legal-in-Latin1, and I expect to see <PROBLEM: 0x91> and
<PROBLEM: 0x92> in the out.txt file.  But I don’t see them.  The x91 and x92
bytes get copied straight across to the output file on both the systems that
I’m using.
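
As a cross-check outside iconv, here is a minimal Python sketch of how the
same two bytes decode under each label.  It uses Python's built-in codecs,
not libiconv, so it only illustrates one possible reading: Python's latin-1
codec accepts every byte value and maps byte 0xNN to code point U+00NN, which
turns x91 and x92 into the C1 control code points U+0091 and U+0092 instead
of rejecting them.  Whether libiconv's ISO-8859-1 converter behaves the same
way is an assumption here, not something established in this message.

```python
# Sketch only: Python codecs stand in for iconv; this is not libiconv itself.
data = bytes([0x91, 0x92])  # the two bytes described for input1252.txt

# Python's latin-1 codec maps every byte 0xNN to code point U+00NN.
as_latin1 = data.decode("latin-1")
# CP1252 assigns these two bytes to the curly single quotes.
as_cp1252 = data.decode("cp1252")


def describe(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


latin1_points = describe(as_latin1)
cp1252_points = describe(as_cp1252)
# UTF-8 encoding of the latin-1 interpretation, as raw bytes.
latin1_utf8 = as_latin1.encode("utf-8")

print(latin1_points)   # ['U+0091', 'U+0092']
print(cp1252_points)   # ['U+2018', 'U+2019']
print(" ".join(f"{b:02x}" for b in latin1_utf8))  # c2 91 c2 92
```

If iconv follows the same mapping, out.txt would contain the byte pairs
c2 91 and c2 92 rather than the raw bytes, which a hex dump of out.txt
(for example with od or xxd) could confirm or rule out.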

What am I missing?

Thanks,

Ken

input1252.txt
Description: Text document

script
Description: Binary data





********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA
