bug-coreutils

Re: /usr/bin/printf: invalid universal character name


From: Hermann Peifer
Subject: Re: /usr/bin/printf: invalid universal character name
Date: Sun, 11 May 2008 17:08:16 +0200
User-agent: Thunderbird 2.0.0.12 (X11/20080227)

Jim wrote:
Hermann Peifer <address@hidden> wrote:
printf \uHHHH is expected to print Unicode chars. This works fine in
most cases, but some legal code points are reported as errors: values
in the ASCII range, C1 control chars, and values in the range
U+D800..U+DFFF.

I would say that this behaviour is a bug rather than a feature.

Thanks for the report, but this is not some arbitrary restriction,
but rather conformance to the standard (C99, ISO/IEC 10646) for
"universal character name" syntax:

  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n717.htm

Here's part of printf.c, with a comment that probably came from
a version of N717:

      /* A universal character name shall not specify a character short
         identifier in the range 00000000 through 00000020, 0000007F through
         0000009F, or 0000D800 through 0000DFFF inclusive. A universal
         character name shall not designate a character in the required
         character set.  */
      if ((uni_value <= 0x9f
           && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
          || (uni_value >= 0xd800 && uni_value <= 0xdfff))
        error (EXIT_FAILURE, 0, _("invalid universal character name \\%c%0*x"),
               esc_char, (esc_char == 'u' ? 4 : 8), uni_value);

/usr/bin/printf: invalid universal character name \u0000
/usr/bin/printf: invalid universal character name \u0001
...

I can understand that you'd find the restriction surprising,
but I wouldn't call it a bug.

Thanks for your swift reply. (BTW: are mails to address@hidden not copied to gnu.utils.bug?)

I do acknowledge that C0 and C1 control chars are something of a borderline case. It is true that the Unicode standard does not assign *normative names* to them but instead uses the placeholder "<control>" as a dummy name (btw, this was different in earlier versions of Unicode). However, all C0 and C1 *code points* are at least listed in:

http://www.unicode.org/charts/PDF/U0000.pdf
http://www.unicode.org/charts/PDF/U0080.pdf
http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt

And I didn't expect /usr/bin/printf to worry about normative or non-normative names of Unicode chars, but rather to print the chars themselves.

If we leave the control chars question aside, it is still hard to believe that it is not a bug that almost all ASCII chars 0020..007e lead to EXIT_FAILURE. This rule is peculiar, to say the least, and it is also inconsistent with its own comment:

      if ((uni_value <= 0x9f
           && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)


Only DOLLAR SIGN, COMMERCIAL AT and GRAVE ACCENT are legal in the range 0x00..0x9f?

I still think that these 92 cases are bugs, rather than anything else:

/usr/bin/printf: invalid universal character name \u0020
/usr/bin/printf: invalid universal character name \u0021
/usr/bin/printf: invalid universal character name \u0022
/usr/bin/printf: invalid universal character name \u0023
/usr/bin/printf: invalid universal character name \u0025
/usr/bin/printf: invalid universal character name \u0026
/usr/bin/printf: invalid universal character name \u0027
/usr/bin/printf: invalid universal character name \u0028
/usr/bin/printf: invalid universal character name \u0029
/usr/bin/printf: invalid universal character name \u002a
/usr/bin/printf: invalid universal character name \u002b
/usr/bin/printf: invalid universal character name \u002c
/usr/bin/printf: invalid universal character name \u002d
/usr/bin/printf: invalid universal character name \u002e
/usr/bin/printf: invalid universal character name \u002f
/usr/bin/printf: invalid universal character name \u0030
/usr/bin/printf: invalid universal character name \u0031
/usr/bin/printf: invalid universal character name \u0032
/usr/bin/printf: invalid universal character name \u0033
/usr/bin/printf: invalid universal character name \u0034
/usr/bin/printf: invalid universal character name \u0035
/usr/bin/printf: invalid universal character name \u0036
/usr/bin/printf: invalid universal character name \u0037
/usr/bin/printf: invalid universal character name \u0038
/usr/bin/printf: invalid universal character name \u0039
/usr/bin/printf: invalid universal character name \u003a
/usr/bin/printf: invalid universal character name \u003b
/usr/bin/printf: invalid universal character name \u003c
/usr/bin/printf: invalid universal character name \u003d
/usr/bin/printf: invalid universal character name \u003e
/usr/bin/printf: invalid universal character name \u003f
/usr/bin/printf: invalid universal character name \u0041
/usr/bin/printf: invalid universal character name \u0042
/usr/bin/printf: invalid universal character name \u0043
/usr/bin/printf: invalid universal character name \u0044
/usr/bin/printf: invalid universal character name \u0045
/usr/bin/printf: invalid universal character name \u0046
/usr/bin/printf: invalid universal character name \u0047
/usr/bin/printf: invalid universal character name \u0048
/usr/bin/printf: invalid universal character name \u0049
/usr/bin/printf: invalid universal character name \u004a
/usr/bin/printf: invalid universal character name \u004b
/usr/bin/printf: invalid universal character name \u004c
/usr/bin/printf: invalid universal character name \u004d
/usr/bin/printf: invalid universal character name \u004e
/usr/bin/printf: invalid universal character name \u004f
/usr/bin/printf: invalid universal character name \u0050
/usr/bin/printf: invalid universal character name \u0051
/usr/bin/printf: invalid universal character name \u0052
/usr/bin/printf: invalid universal character name \u0053
/usr/bin/printf: invalid universal character name \u0054
/usr/bin/printf: invalid universal character name \u0055
/usr/bin/printf: invalid universal character name \u0056
/usr/bin/printf: invalid universal character name \u0057
/usr/bin/printf: invalid universal character name \u0058
/usr/bin/printf: invalid universal character name \u0059
/usr/bin/printf: invalid universal character name \u005a
/usr/bin/printf: invalid universal character name \u005b
/usr/bin/printf: invalid universal character name \u005c
/usr/bin/printf: invalid universal character name \u005d
/usr/bin/printf: invalid universal character name \u005e
/usr/bin/printf: invalid universal character name \u005f
/usr/bin/printf: invalid universal character name \u0061
/usr/bin/printf: invalid universal character name \u0062
/usr/bin/printf: invalid universal character name \u0063
/usr/bin/printf: invalid universal character name \u0064
/usr/bin/printf: invalid universal character name \u0065
/usr/bin/printf: invalid universal character name \u0066
/usr/bin/printf: invalid universal character name \u0067
/usr/bin/printf: invalid universal character name \u0068
/usr/bin/printf: invalid universal character name \u0069
/usr/bin/printf: invalid universal character name \u006a
/usr/bin/printf: invalid universal character name \u006b
/usr/bin/printf: invalid universal character name \u006c
/usr/bin/printf: invalid universal character name \u006d
/usr/bin/printf: invalid universal character name \u006e
/usr/bin/printf: invalid universal character name \u006f
/usr/bin/printf: invalid universal character name \u0070
/usr/bin/printf: invalid universal character name \u0071
/usr/bin/printf: invalid universal character name \u0072
/usr/bin/printf: invalid universal character name \u0073
/usr/bin/printf: invalid universal character name \u0074
/usr/bin/printf: invalid universal character name \u0075
/usr/bin/printf: invalid universal character name \u0076
/usr/bin/printf: invalid universal character name \u0077
/usr/bin/printf: invalid universal character name \u0078
/usr/bin/printf: invalid universal character name \u0079
/usr/bin/printf: invalid universal character name \u007a
/usr/bin/printf: invalid universal character name \u007b
/usr/bin/printf: invalid universal character name \u007c
/usr/bin/printf: invalid universal character name \u007d
/usr/bin/printf: invalid universal character name \u007e

Regards, Hermann



