Re: [bug-gnu-libiconv] iconv fails to convert utf8 with bom to cp1251
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] iconv fails to convert utf8 with bom to cp1251
Date: Thu, 07 Dec 2017 01:57:22 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-101-generic; KDE/5.18.0; x86_64; ; )
Yan wrote:
> Arch linux, iconv (GNU libc) 2.26
Your report ought to have been directed to the glibc tracker, not to the
libiconv tracker.
But anyway, since glibc and GNU libiconv behave the same way in this regard,
I can answer it:
> Iconv doesn't understand utf8 with bom ("EF BB BF" prefix which is legal
> according to standard). It prints "iconv: illegal input sequence at
> position 0".
Quoting the standard [1]:
   U+FEFF in the first position of a stream MAY be interpreted as a
   zero-width non-breaking space, and is not always a signature.

   A protocol SHOULD also forbid use of U+FEFF as a signature for
   those textual protocol elements for which the protocol provides
   character encoding identification mechanisms, when it is expected
   that implementations of the protocol will be in a position to
   always use the mechanisms properly.
You provided the encoding identification "UTF-8" to iconv, therefore
iconv SHOULD NOT allow a BOM in this conversion.
In other words, use of the BOM is only for those cases where no
encoding identification is present and some software has to guess.
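If the input does carry the "EF BB BF" prefix, one way around it is to
drop those three bytes before converting. The following is only a minimal
sketch in Python, not something iconv does itself; the function names
strip_utf8_bom and utf8_to_cp1251 are made up for illustration:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def utf8_to_cp1251(data: bytes) -> bytes:
    """Convert UTF-8 bytes to CP1251, ignoring a leading signature.

    Hypothetical helper: only a BOM at position 0 is removed; any later
    U+FEFF is left in place and will fail to encode, since CP1251 has no
    mapping for that character.
    """
    return strip_utf8_bom(data).decode("utf-8").encode("cp1251")
```

Only the signature at the very start of the stream is touched, so the
rest of the text reaches the converter unchanged.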
Bruno
[1] https://tools.ietf.org/html/rfc3629#section-6