bug-gettext

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments


From: Harald van Dijk
Subject: Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments
Date: Fri, 13 May 2022 09:05:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Thunderbird/100.0

On 12/05/2022 23:10, Steffen Nurpmeso wrote:
> Harald van Dijk wrote in
>   <bd336669-960b-1f5f-fffc-30905d4c8e82@gigawatt.nl>:
>   |On 12/05/2022 18:19, Steffen Nurpmeso via austin-group-l at The Open
>   |Group wrote:
>   |> Bruno Haible wrote in
>   |>   <4298913.vrqWZg68TM@omega>:
>   |>|Steffen Nurpmeso wrote:
>   |>|>  ...
>   |>|>| [.] "UTF-7"."
>   |>|>
>   |>|> That is overshoot.
>   |>|
>   |>|No. UTF-7 is invalid here because it produces output that is not NUL
>   |>|terminated. See:
>   |>|
>   |>|$ printf 'ab\0' | iconv -t UTF-7 | od -t c
>   |>|0000000   a   b   +   A   A   A   -
>   |>|0000007
>   |>|
>   |>|strlen() on such a return value makes invalid memory accesses.
>   |>|You can convince yourself by running
>   |>|$ OUTPUT_CHARSET=UTF-7 valgrind ls --help
>   |>
>   |> This is then surely bogus?  UTF-7 is a normal single byte
>   |> character set and is to be terminated like anything else.  Nothing
>   |> in RFC 2152 nor RFC 3501 if you want makes me think something
>   |> else.
>   |
>   |RFC 2152's rules 1 and 3 only allow specifying the listed characters as
>   |their ASCII form. All other characters, including U+0000, must be
>   |encoded using rule 2. GNU iconv is doing what the RFC specifies here.
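
[Editor's illustration, not part of the original message.] The rule 2 reading above can be cross-checked with a minimal sketch that uses Python's built-in "utf-7" codec. This is an independent implementation, not GNU iconv, and the assumption here is that it applies RFC 2152 the same way. The sketch encodes the same three characters as the printf example and looks for a zero byte, which is what a C string scan such as strlen() would stop at.

```python
# Encode "ab" followed by U+0000 with Python's built-in UTF-7 codec.
# Assumption: this codec follows RFC 2152 like GNU iconv does.
text = "ab\0"
encoded = text.encode("utf-7")

# A C-style scan stops at the first zero byte; find() returns -1 if none exists.
first_zero = encoded.find(b"\x00")

print("encoded bytes:", encoded)
print("first zero byte at:", first_zero)
print("round trip ok:", encoded.decode("utf-7") == text)
```

If the codec behaves as assumed, U+0000 appears only in its base64 form, so the encoded buffer contains no zero byte that could serve as a terminator.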

> No really, please.  And please do not strip important content,

I didn't think I did. You didn't read the RFC properly. I replied to show where and how the RFC specifies exactly what GNU iconv does. The rest of your message looks like it is based on the false assumption that the RFC specifies something other than what it does, and it becomes irrelevant once that assumption is corrected. Looking in more detail, there is one thing I should have responded to. It is included here.

> UTF-7.  Heck, how about that, for example:
>
>   LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t c
>   0000000  \0  \0   a  \0   b  \0  \0  \0
>
> Two leading NULs?

This is not what GNU iconv prints at all, at least not on my system, which just uses the GNU version unmodified. Rather, it prints

0000000 377 376   a  \0   b  \0  \0  \0
0000010

That is, it includes a BOM, just like it showed in your SunOS output. Both the GNU iconv that is shipped as part of GNU libc 2.35, and the GNU iconv that is shipped as part of GNU libiconv 1.16, print this. Those are the current releases. If you are testing an older release, or a modified version, that is important information missing from your message. If you are seeing the leading null bytes in a current version, you may want to report this, including steps on how to get a GNU iconv that behaves this way.
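
[Editor's illustration, not part of the original message.] The BOM-first layout can also be observed outside iconv. The sketch below uses Python's built-in "utf-16" codec, which is a separate implementation, so it says nothing about any particular iconv build. It only shows the layout one would expect from an encoder that writes a byte order mark: two BOM bytes in the platform's native order, then two bytes per character.

```python
import codecs
import sys

# Encode "ab" followed by U+0000 as UTF-16 with a BOM (native byte order).
# Assumption: Python's codec stands in for iconv purely as an illustration.
text = "ab\0"
encoded = text.encode("utf-16")

# codecs.BOM_UTF16 is the BOM in the platform's native byte order.
has_bom = encoded.startswith(codecs.BOM_UTF16)
payload = encoded[len(codecs.BOM_UTF16):]

# The payload should match the BOM-less encoding in native byte order.
native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
payload_matches = payload == text.encode(native)

print("starts with BOM:", has_bom)
print("payload matches", native + ":", payload_matches)
print("total length:", len(encoded))
```

On a little-endian machine the first two bytes are 0xFF 0xFE, which od -t c displays as 377 376.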

> i am neither Chinese nor Russian, and especially not one of the
> other 7 billion that do not count.
> (I said surely bogus because i alone see the shiny light of having
> found give-me-five GNU iconv errors.  Or even beyond that.)

This makes absolutely zero sense. I am including it only to pre-empt you again claiming I am stripping important content.

Cheers,
Harald van Dijk


