bug#36887: coreutils-8.31: printf chokes on \u0041
From: Pádraig Brady
Subject: bug#36887: coreutils-8.31: printf chokes on \u0041
Date: Thu, 1 Aug 2019 14:09:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
On 01/08/19 12:02, Ulrich Mueller wrote:
> [Forwarding bug https://bugs.gentoo.org/680244 as requested by the
> Gentoo package maintainer.]
>
> According to printf(1):
>
> Interpreted sequences are:
> [...]
>
> \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
>
> \UHHHHHHHH
> Unicode character with hex value HHHHHHHH (8 digits)
>
> It does not work, though:
>
> $ /usr/bin/printf '\u0041\n'
> /usr/bin/printf: invalid universal character name \u0041
> $ /usr/bin/printf '\U00000041\n'
> /usr/bin/printf: invalid universal character name \U00000041
>
> Other tools interpret the sequence correctly:
>
> $ printf '\u0041\n' # bash
> A
> $ echo -e '\u0041' # bash
> A
> $ zsh -c "echo -e '\u0041'"
> A
> $ emacs -Q --batch --eval '(princ "\u0041\n")'
> A
> $ python -c "print ('\u0041')"
> A
> $ ruby -e 'print("\u0041\n")'
> A
I agree this is a bit surprising, but it is documented:
U+0041 lies in a range that this syntax excludes.
The full manual states:
"Unicode characters in the ranges
U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
except for U+0024 ($), U+0040 (@), and U+0060 (`)."
This was previously discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067
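
To make the rule quoted from the manual concrete, here is a minimal Python
sketch of the range check it describes. The function name
allowed_in_u_escape is hypothetical and is not taken from the coreutils
source; it models only the ranges quoted above, not any other validation
that printf may perform.

```python
# Hypothetical helper: models only the restriction quoted from the manual.
# It is not the coreutils implementation.
EXCEPTIONS = {0x24, 0x40, 0x60}  # '$', '@', '`'


def allowed_in_u_escape(code_point: int) -> bool:
    """Return True if the quoted rule permits this code point in \\u or \\U."""
    if code_point in EXCEPTIONS:
        return True
    if 0x0000 <= code_point <= 0x009F:
        return False  # U+0000...U+009F is excluded
    if 0xD800 <= code_point <= 0xDFFF:
        return False  # U+D800...U+DFFF (surrogates) is excluded
    return True  # no further limits are modelled here


if __name__ == "__main__":
    for cp in (0x41, 0x24, 0xA0, 0xD800, 0x20AC):
        print(f"U+{cp:04X}: {allowed_in_u_escape(cp)}")
```

Under this reading, U+0041 is rejected because it lies in U+0000...U+009F
and is not one of the three listed exceptions, which matches the error
shown in the report.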