Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments


From: Steffen Nurpmeso
Subject: Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments
Date: Thu, 12 May 2022 19:19:23 +0200
User-agent: s-nail v14.9.24-238-g9856c4e34b

Bruno Haible wrote in
 <4298913.vrqWZg68TM@omega>:
 |Steffen Nurpmeso wrote:
 |>  ...
 |>| [.] "UTF-7"."
 |> 
 |> That is overshoot.
 |
 |No. UTF-7 is invalid here because it produces output that is not NUL
 |terminated. See:
 |
 |$ printf 'ab\0' | iconv -t UTF-7 | od -t c
 |0000000   a   b   +   A   A   A   -
 |0000007
 |
 |strlen() on such a return value makes invalid memory accesses.
 |You can convince yourself by running
 |$ OUTPUT_CHARSET=UTF-7 valgrind ls --help

This is then surely bogus?  UTF-7 is a normal single-byte
character set and is to be NUL terminated like anything else.
Nothing in RFC 2152, nor in RFC 3501 if you want, makes me think
otherwise.  (RFC 5092 "IMAP URL Scheme", which invents the
sane-enough-to-think-of-yourself "UTF-7 -> UTF-16 -> UCS-4 ->
UTF-8 -> HEX" conversion scheme, and the reverse, even implies the
opposite: both example functions NUL terminate the string.)
Mark Davis did say something like "UTF-7 was a failure" once on
the Unicode ML, if I recall correctly, and I surely added "sadly",
given the Punycode mess with domain names.  One more ship that has
sailed, and a pity it is.
Why should NUL be treated differently??  No, I think it is a bug
in GNU iconv that no one stumbled upon because no one is using
UTF-7.  Heck, how about this, for example:

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t c
  0000000  \0  \0   a  \0   b  \0  \0  \0

Two leading NULs?

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t ucs-2 | od -t c
  0000000   a  \0   b  \0  \0  \0

That yes.

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-8 | od -t c
  0000000   a   b  \0

Yes.

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-7 | od -t c
  0000000   a   b   +   A   A   A   -

No.  Somehow they are all bogus; take SunOS 5.10:

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t c
  0000000 376 377  \0   a  \0   b  \0  \0

Ooh, now it gets scary!!  Interestingly, OpenBSD 7.1 behaves the
same; likely it is thus an old instance of GNU iconv.  There it
says "GNU libiconv 1.16", here it says "iconv (GNU libc) 2.35".

So unless someone convinces me otherwise, you are arguing based on
buggy software.  UTF-7 is just another 7-bit single-byte character
set, and thus is to be NUL terminated like any other.
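
For anyone who wants to cross-check outside of iconv, here is a
minimal sketch that feeds the same "ab\0" input to the codecs built
into Python 3.  This is an independent implementation, so it only
shows what that one codec emits and says nothing about which
behaviour is right; the codec names are Python's, not iconv's:

```python
# Same input as in the printf commands: 'a', 'b', and a NUL character.
text = "ab\0"

# Encode with three of Python's built-in codecs (no BOM for utf-16-be).
encoded = {name: text.encode(name) for name in ("utf-7", "utf-8", "utf-16-be")}

for name, data in encoded.items():
    # Report the raw bytes and whether any of them is a NUL byte.
    print(name, data, "contains NUL byte:", b"\0" in data)

# Check that the UTF-7 form still decodes back to the original three characters.
round_trip = encoded["utf-7"].decode("utf-7")
print(repr(round_trip))
```

With Python's codec the UTF-7 form also comes out as ab+AAA-, that
is, with the NUL character base64-encoded rather than emitted as a
literal zero byte, and it decodes back to the original string.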

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


