[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-gnu-libiconv] [PATCH] Support nl_langinfo (CODESET) correctly o
From: |
Bruno Haible |
Subject: |
Re: [bug-gnu-libiconv] [PATCH] Support nl_langinfo (CODESET) correctly on OS/2 |
Date: |
Sun, 04 Aug 2019 02:24:38 +0200 |
User-agent: |
KMail/5.1.3 (Linux/4.4.0-157-generic; KDE/5.18.0; x86_64; ; ) |
Hi KO,
> diff --git a/libcharset/lib/localcharset.c b/libcharset/lib/localcharset.c
> index da3ac45..40923fc 100644
> --- a/libcharset/lib/localcharset.c
> +++ b/libcharset/lib/localcharset.c
> @@ -378,26 +378,41 @@ static const struct table_entry alias_table[] =
> by Alex Taylor:
> <http://altsan.org/os2/toolkits/uls/index.html#codepages>.
> See also "IBM Globalization - Code page identifiers":
> - <https://www-01.ibm.com/software/globalization/cp/cp_cpgid.html>. */
> - { "CP1089", "ISO-8859-6" },
> - { "CP1208", "UTF-8" },
> - { "CP1381", "GB2312" },
> - { "CP1386", "GBK" },
> - { "CP3372", "EUC-JP" },
> - { "CP813", "ISO-8859-7" },
> - { "CP819", "ISO-8859-1" },
> - { "CP878", "KOI8-R" },
> - { "CP912", "ISO-8859-2" },
> - { "CP913", "ISO-8859-3" },
> - { "CP914", "ISO-8859-4" },
> - { "CP915", "ISO-8859-5" },
> - { "CP916", "ISO-8859-8" },
> - { "CP920", "ISO-8859-9" },
> - { "CP921", "ISO-8859-13" },
> - { "CP923", "ISO-8859-15" },
> - { "CP954", "EUC-JP" },
> - { "CP964", "EUC-TW" },
> - { "CP970", "EUC-KR" }
> + <https://www-01.ibm.com/software/globalization/cp/cp_cpgid.html>.
This URL is dead. You can remove it, or replace it with another suitable one.
> + See also "__convcp() of kLIBC":
> +
> <http://trac.netlabs.org/libc/browser/branches/libc-0.6/src/emx/src/lib/locale/__convcp.c>,
> + or:
> +
> <https://github.com/bitwiseworks/libc/blob/master/src/emx/src/lib/locale/__convcp.c>.
> */
The first of these two URLs is broken. You can therefore remove it.
> + { "CP1089", "ISO-8859-6" },
OK
> + { "CP1200", "UCS-2" },
UCS-2 cannot be used as a locale encoding, since it is not ASCII compatible.
This
cannot work.
> + { "CP1208", "UTF-8" },
> + { "CP1381", "GB2312" },
> + { "CP1383", "EUC-CN" },
> + { "CP1386", "GBK" },
> + { "CP3372", "EUC-JP" },
> + { "CP813", "ISO-8859-7" },
> + { "CP819", "ISO-8859-1" },
> + { "CP878", "KOI8-R" },
> + { "CP912", "ISO-8859-2" },
> + { "CP913", "ISO-8859-3" },
> + { "CP914", "ISO-8859-4" },
> + { "CP915", "ISO-8859-5" },
> + { "CP916", "ISO-8859-8" },
> + { "CP920", "ISO-8859-9" },
> + { "CP921", "ISO-8859-13" },
> + { "CP923", "ISO-8859-15" },
> + { "CP954", "EUC-JP" },
> + { "CP964", "EUC-TW" },
> + { "CP970", "EUC-KR" },
> + { "ISO8859-1", "ISO-8859-1" },
> + { "ISO8859-2", "ISO-8859-2" },
> + { "ISO8859-3", "ISO-8859-3" },
> + { "ISO8859-4", "ISO-8859-4" },
> + { "ISO8859-5", "ISO-8859-5" },
> + { "ISO8859-6", "ISO-8859-6" },
> + { "ISO8859-7", "ISO-8859-7" },
> + { "ISO8859-8", "ISO-8859-8" },
> + { "ISO8859-9", "ISO-8859-9" }
OK
> @@ -751,6 +766,24 @@ locale_charset (void)
> }
> # endif
>
> +# ifdef OS2
> + /* On OS/2, nl_langinfo (CODESET) returns IBM-XXX style normally. Convert
> it
> + to CPXXX style for mapping later except UCS-2LE and UCS-2BE. */
> + if (strcmp (codeset, "IBM-1200@endian=little") == 0)
> + return "UCS-2LE";
> + else if (strcmp (codeset, "IBM-1200@endian=big") == 0)
> + return "UCS-2BE";
> +
> + if (strncmp (codeset, "IBM-", 4) == 0 && isdigit (codeset[4]))
> + {
> + static char buf[2 + 10 + 1];
> +
> + snprintf (buf, sizeof (buf), "CP%s", codeset + 4);
> +
> + codeset = buf;
> + }
> +# endif
> +
I would prefer if you could get rid of this extra code, and instead just add
entries to the table above.
UCS-2LE and UCS-2BE are also unsuitable for locale encodings.
Bruno
- Re: [bug-gnu-libiconv] [PATCH] Support nl_langinfo (CODESET) correctly on OS/2,
Bruno Haible <=