bug-coreutils
bug#36887: coreutils-8.31: printf chokes on \u0041


From: Pádraig Brady
Subject: bug#36887: coreutils-8.31: printf chokes on \u0041
Date: Thu, 1 Aug 2019 14:09:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 01/08/19 12:02, Ulrich Mueller wrote:
> [Forwarding bug https://bugs.gentoo.org/680244 as requested by the
> Gentoo package maintainer.]
> 
> According to printf(1):
> 
>    Interpreted sequences are:
>    [...]
>
>    \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
> 
>    \UHHHHHHHH
>           Unicode character with hex value HHHHHHHH (8 digits)
> 
> It does not work, though:
> 
> $ /usr/bin/printf '\u0041\n'
> /usr/bin/printf: invalid universal character name \u0041
> $ /usr/bin/printf '\U00000041\n'
> /usr/bin/printf: invalid universal character name \U00000041
> 
> Other tools interpret the sequence correctly:
> 
> $ printf '\u0041\n'   # bash
> A
> $ echo -e '\u0041'    # bash
> A
> $ zsh -c "echo -e '\u0041'"
> A
> $ emacs -Q --batch --eval '(princ "\u0041\n")'
> A
> $ python -c "print ('\u0041')"
> A
> $ ruby -e 'print("\u0041\n")'
> A

I agree this is a bit surprising, but it is the documented behavior:
U+0041 falls in a range that this syntax deliberately excludes.
The full manual states:

  "Unicode characters in the ranges
  U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
  except for U+0024 ($), U+0040 (@), and U+0060 (`)."
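
To make the quoted rule concrete, here is a rough sketch of the range
check in plain sh. The function name is_allowed_ucn is made up for
illustration and this is not the coreutils code; it only mirrors the
ranges in the manual text above and does not check anything else
(such as the upper limit of Unicode).

```shell
#!/bin/sh
# is_allowed_ucn HEX
# Hypothetical helper, not part of coreutils. Returns 0 if the code point
# given as hex digits may be written with \u or \U under the rule quoted
# from the manual, and 1 otherwise.
is_allowed_ucn() {
    cp=$(( 0x$1 ))    # convert the hex digits to a decimal integer

    # Explicit exceptions: U+0024 ($), U+0040 (@), U+0060 (grave accent)
    case $cp in
        36|64|96) return 0 ;;
    esac

    # Excluded range U+0000...U+009F
    if [ "$cp" -le 159 ]; then
        return 1
    fi

    # Excluded range U+D800...U+DFFF (surrogates)
    if [ "$cp" -ge 55296 ] && [ "$cp" -le 57343 ]; then
        return 1
    fi

    return 0
}

for hex in 0041 0040 00E9 D800; do
    if is_allowed_ucn "$hex"; then
        printf '%s\n' "U+$hex accepted"
    else
        printf '%s\n' "U+$hex rejected"
    fi
done
```

With this check, U+0041 and U+D800 are rejected while U+0040 and U+00E9
are accepted, which matches the error reported for '\u0041'.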

This was previously discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067