Re: [bug-gnu-libiconv] Please restore "UTF8" as alias for UTF-8 charset
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Please restore "UTF8" as alias for UTF-8 charset
Date: Sat, 12 Jan 2019 10:54:49 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-141-generic; KDE/5.18.0; x86_64; ; )
Hi Stuart,
> I don't think I could successfully
> argue your vision (that "UTF8" MUST NOT be accepted) to the maintainers
> of glibc, newlib, uclibc, musl, Bionic, FreeBSD, NetBSD and Cygwin
I did not say that other software that already supports "UTF8" must
stop supporting it. That would cause backward-compatibility problems.
I did say that the best answer to requests to support non-standard aliases
is to say NO. Such requests cause interoperability problems regarding the
use of that alias. And then, when after 5 or 10 years all software has
finally been upgraded to support the alias, and the interoperability
problems are thus closed, the same game starts again with another alias
(for the same or for another encoding).
> The standards authority for iconv_open() is the Open Group, not IANA.
> Per the standard, encoding names are implementation-defined
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html)
> therefore, an implementation can be as helpful, compatible or otherwise
> as it would like to be.
The Open Group is stating that it does not standardize the encoding
names supported by iconv_open. The one and only standard in this area
is thus IANA.
> If you're not willing to create an alias, would you be willing to
> support Unicode Technical Standard #22, section 1.4?
> https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching
The "ignore case" rule is certainly good. The other rules in this section,
however, make software not future-proof: if a program decides that
it should treat "Latin-1" like "Latin1", and later a different encoding
or alias named "Latin-1" actually gets introduced, you have a problem.
I decided to make libiconv future-proof.
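To make the collision risk concrete, here is a minimal sketch in Python (not libiconv code) that compares case-only matching with the loose matching of UTS #22 section 1.4. It assumes the section's three steps are: delete everything except ASCII letters and digits, lowercase the rest, and delete each 0 that is not preceded by a digit. The function names case_only_key and loose_key are hypothetical, chosen only for this illustration.

```python
def case_only_key(name: str) -> str:
    """Key for case-insensitive matching: punctuation stays significant."""
    return name.upper()


def loose_key(name: str) -> str:
    """Key for loose matching, following the assumed UTS #22 section 1.4 steps."""
    kept = []
    for ch in name:
        # Step 1: delete all characters except a-z, A-Z and 0-9.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # Step 2: map uppercase letters to lowercase.
        ch = ch.lower()
        # Step 3: delete each 0 that is not preceded by a (kept) digit.
        if ch == "0" and not (kept and kept[-1].isdigit()):
            continue
        kept.append(ch)
    return "".join(kept)


# Under loose matching, the hyphenated and unhyphenated spellings collapse
# into one key, so they can never again name two different encodings.
same_loose = loose_key("Latin-1") == loose_key("Latin1")
same_case_only = case_only_key("Latin-1") == case_only_key("Latin1")
```

With case-only keys, "Latin-1" and "Latin1" remain distinct names, so a future registration of either one cannot silently change what the other means. With loose keys they are identical, which is exactly the situation described above.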
> Alternatively, would you consider following the WHATWG encoding
> standard, https://encoding.spec.whatwg.org/#names-and-labels -- not only
> do they mandate that web page authors MUST use "utf-8" as the encoding
> name, because that is the correct name (lowercased), they also mandate
> that web browsers MUST accept "utf8" as an alias for "utf-8".
The WHATWG spec is meant for web pages and web browsers. It has no
direct bearing on iconv_open, since iconv's primary use is not for
web pages.
> Looks like
> the pressure got so bad that all the world's major web browsers agree to
> accept "utf8".
Yes, I agree, for web pages it surely makes sense.
> I would gladly accept it if libiconv's documentation made very clear
> that "UTF-8" is the standard name for the encoding
The 'iconv -l' output makes it clear:
$ iconv -l | grep -i utf
UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
And the documentation as well:
https://www.gnu.org/software/libiconv/
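A program that wants to check a name against that listing could do so along the following lines. This is only a sketch in Python: it assumes that each line of the 'iconv -l' output groups the names accepted for one encoding, separated by whitespace, and that lookup ignores case. The sample text is the excerpt quoted above; parse_alias_groups and find_group are hypothetical helper names, not part of libiconv.

```python
# The 'iconv -l | grep -i utf' excerpt shown above, one encoding per line.
SAMPLE = """\
UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
"""


def parse_alias_groups(listing: str) -> list[list[str]]:
    """Split the listing into one list of names per non-empty line."""
    return [line.split() for line in listing.splitlines() if line.strip()]


def find_group(name: str, groups: list[list[str]]):
    """Return the group containing `name` (ignoring case), or None."""
    wanted = name.upper()
    for group in groups:
        if any(alias.upper() == wanted for alias in group):
            return group
    return None


GROUPS = parse_alias_groups(SAMPLE)
```

With this sample, find_group("utf-8", GROUPS) finds the "UTF-8" entry, while find_group("UTF8", GROUPS) returns None, because no such name appears in the listing.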
Bruno