bug-coreutils

bug#17196: UTF-8 printf string formatting problem


From: Pádraig Brady
Subject: bug#17196: UTF-8 printf string formatting problem
Date: Sun, 06 Apr 2014 11:15:46 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 04/06/2014 12:17 AM, Jan Novak wrote:
> Hello,
> 
> printf string format counts bytes instead of chars, which leads to broken 
> output ...
> (the same problem occurs with bash built in printf)
> 
> 
> just try this:
> 
> $ echo $LANG
> us_US.UTF-8
> 
> 
> $ printf "|%3s|\n" "a"
> |  a|
> 
> $ printf "|%3s|\n" "á"     (char is a-acute)
> | á|
> 
> expected output:
> |  á|
> 
> Is there some easy solution ?
> 
> TIA for the answer

Yes, printf follows the C standard, which counts field widths in bytes,
not characters. awk does respect characters in width specifiers, though:

  $ awk 'BEGIN{printf "|%3s|\n", "á"}'
  |  á|

I don't think we can change the current behavior of printf, for
backwards-compatibility reasons. We might, though, be able to leverage
the existing multibyte-character-aware alignment/truncation code in:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
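
In the meantime, the padding can be computed by hand in the shell. The
following is only a sketch, not anything coreutils provides: the helper
name pad_left is hypothetical, and it assumes bash running in a UTF-8
locale, where ${#str} counts characters rather than bytes. It also
assumes each character occupies one column, so wide or combining
characters would still misalign.

```shell
# pad_left WIDTH STRING  (hypothetical helper, not part of coreutils)
# Right-align STRING in a field of WIDTH characters, like "%WIDTHs",
# but sized from the character count instead of the byte count.
pad_left() {
  local width=$1 str=$2
  # Assumption: under bash in a UTF-8 locale ${#str} is the number of
  # characters; a shell without multibyte support counts bytes instead.
  local len=${#str}
  local pad=$(( width - len ))
  # Never truncate: a string longer than WIDTH is printed unpadded.
  if [ "$pad" -lt 0 ]; then
    pad=0
  fi
  # Emit PAD spaces (an empty string in a field of width PAD), then STRING.
  printf '%*s%s' "$pad" '' "$str"
}
```

Called as printf '|%s|\n' "$(pad_left 3 "á")", this should give the
|  á| from the report under the assumptions above.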

thanks,
Pádraig.




