Re: /usr/bin/printf: invalid universal character name
From: Bruno Haible
Subject: Re: /usr/bin/printf: invalid universal character name
Date: Thu, 15 May 2008 00:21:56 +0200
User-agent: KMail/1.5.4
Jim Meyering wrote:
> Paul Eggert added this feature 8 years ago
Well, all honours to Paul, but I submitted this feature to you on 2000-02-02.
> I don't know the motivation for those exceptions.
The motivation is that the ISO C 99 standard has these exceptions:
ISO C 99, 6.4.3(2):
"Constraints
A universal character name shall not specify a character whose short
identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`),
nor one in the range D800 through DFFF inclusive."
and I find it undesirable to have different variants of the same concept in
different tools. For example, the hexadecimal escape syntax is different:
- In C, Awk, Emacs Lisp, it accepts any number of hexadecimal digits.
- In sh, PHP, Python, Perl, it accepts up to 2 hexadecimal digits.
- In C#, it accepts up to 4 hexadecimal digits.
Similarly, the octal escape syntax is different:
- In C, Awk, Emacs Lisp, it accepts up to 3 octal digits,
- In Perl, likewise, but values between \400 and \777 are valid.
This causes headaches for programmers, for no real benefit.
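To illustrate how the digit limit alone changes the result, here is a minimal sketch in C. The function name parse_hex_escape and its max_digits parameter are hypothetical, made up for this illustration; they are not taken from any of the tools listed above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Parses the hexadecimal digits that follow a "\x" prefix.
   max_digits == 0 means "no limit" (the C, Awk, Emacs Lisp convention);
   max_digits == 2 or 4 models the limited conventions listed above.
   Stores the parsed value in *value and returns the number of digits
   consumed. Very long digit runs wrap around modulo ULONG_MAX + 1. */
size_t parse_hex_escape(const char *s, size_t max_digits, unsigned long *value)
{
    size_t n = 0;
    unsigned long v = 0;

    while ((max_digits == 0 || n < max_digits)
           && isxdigit((unsigned char) s[n])) {
        int c = (unsigned char) s[n];
        int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        v = v * 16 + (unsigned long) digit;
        n++;
    }
    *value = v;
    return n;
}
```

Given the digits "41bc", a limit of 2 stops after "41" and leaves "bc" as ordinary text, whereas the unlimited convention consumes all four digits and yields 0x41bc.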
The motivation for those exceptions in C is probably to avoid discussing
weird cases like
char foo[] = "abc\u000Adef"; // newline in string - allowed or not?
char bar[] = "abc\\u00789A"; // hexadecimal escape or not?
char mph[] = "abc\u0022"; // valid or not?
char mph[] = "abc\\u0022"; // abc\u0022 or abc" ?
and - to a lesser extent - to allow faster parsing. A parser that needs to
interpret
\u0023include \u0022stdio.h"
is certainly slower than a parser that can reject this input.
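The constraint quoted from ISO C 99 6.4.3(2) above can be written as a small predicate. This is only a sketch: the name ucn_is_allowed is hypothetical, and this is not the code that printf actually uses.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only.
   Returns true if the code point cp may be written as \uXXXX or
   \UXXXXXXXX under the ISO C 99 6.4.3(2) constraint quoted above.
   It checks only that constraint; it does not check that cp is
   within the Unicode range. */
bool ucn_is_allowed(unsigned long cp)
{
    /* The three explicit exceptions: $, @ and `. */
    if (cp == 0x0024 || cp == 0x0040 || cp == 0x0060)
        return true;
    /* Everything else below 00A0 is rejected. */
    if (cp < 0x00A0)
        return false;
    /* D800 through DFFF inclusive is rejected. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;
    return true;
}
```

With this predicate, the cases above (000A, 0022, 0023) are all rejected up front, so the question of how to interpret them never arises.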
Hermann Peifer wrote:
> Only DOLLAR SIGN, COMMERCIAL AT and GRAVE ACCENT are legal in the
> range 0x00..0x9f ?
>
> I still think that these 92 cases are bugs, rather than anything else:
You are entitled to your opinion. So that you can no longer call them "bugs",
I propose to make the restriction explicit in the coreutils manual:
2008-05-14 Bruno Haible <address@hidden>
* doc/coreutils.texi (printf invocation): Clarify invalid ranges for
Unicode character escape syntax.
--- coreutils.texi.bak 2008-03-14 01:48:04.000000000 +0100
+++ coreutils.texi 2008-05-15 00:18:50.000000000 +0200
@@ -10305,7 +10305,9 @@
four hexadecimal digits @var{hhhh}, and @samp{\U} for 32-bit Unicode
characters, specified as eight hexadecimal digits @var{hhhhhhhh}.
@command{printf} outputs the Unicode characters
-according to the @env{LC_CTYPE} locale.
+according to the @env{LC_CTYPE} locale. Unicode characters in the ranges
+U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax, except
+for U+0024 ($), U+0040 (@@), and U+0060 (@`).
The processing of @samp{\u} and @samp{\U} requires a full-featured
@code{iconv} facility. It is activated on systems with glibc 2.2 (or newer),