builtin printf behaves incorrectly with "c and 'c character-value arguments
From: Rich Felker
Subject: builtin printf behaves incorrectly with "c and 'c character-value arguments
Date: Thu, 1 Nov 2007 05:25:53 -0400
User-agent: Mutt/1.4.2.2i
$ printf %d\\n \'À
-61
(expected 192)
This should be 192 regardless of locale on any system where wchar_t
values are ISO-10646/Unicode. Bash is incorrectly reading the first
byte of the UTF-8 encoding (0xC3), which is -61 when interpreted as
signed char; in a Latin-1 based locale it will probably give -64
(byte 0xC0) instead.
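To make the arithmetic concrete, here is a minimal C sketch of what reading only the first byte as a signed char produces. The helper name first_byte_as_signed is hypothetical and is not taken from the bash sources; it only mimics the behavior described in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from bash): return the first byte of s
   interpreted as a signed char, as the buggy behavior appears to do. */
int first_byte_as_signed(const char *s)
{
    assert(s != NULL);
    /* Read through a signed char pointer so the result does not depend
       on whether plain char is signed on this platform. */
    return (int)*(const signed char *)s;
}
```

With the UTF-8 encoding of À (bytes 0xC3 0x80) this gives 195 - 256 = -61, and with the Latin-1 encoding (byte 0xC0) it gives 192 - 256 = -64. Neither is the code point 192.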
Both POSIX and common sense are clear that the numeric values
resulting from 'c should be the wchar_t value of c and not the value
of the first byte of the multibyte character; from the SUSv3 printf(1)
documentation:
Note that in a locale with multi-byte characters, the value of a
character is intended to be the value of the equivalent of the
wchar_t representation of the character as described in the
System Interfaces volume of IEEE Std 1003.1-2001.
Language lawyers could argue that on 'single-byte' locales perhaps the
byte value should be used; however, strictly speaking a single-byte
locale is simply a special case of a multi-byte one, and sanity should
win in any case.
Fixing the issue should be easy; asciicode() in builtins/printf.def
simply needs to be changed to decode the character with mbrtowc rather
than reading the byte (and perhaps also should be renamed...).
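A rough sketch of the kind of change suggested above, assuming the function only needs to return the value of the first character of its argument. The name charvalue and the fallback to the unsigned byte value on a decoding error are my assumptions, not what builtins/printf.def actually contains.

```c
#include <assert.h>
#include <locale.h>   /* callers are assumed to have called setlocale() */
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for asciicode(): decode the first multibyte
   character of s in the current locale and return its wchar_t value.
   On an invalid or incomplete sequence, fall back to the value of the
   first byte taken as unsigned char (an assumption, not bash behavior). */
long charvalue(const char *s)
{
    mbstate_t st;
    wchar_t wc;
    size_t n;

    assert(s != NULL);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    n = mbrtowc(&wc, s, strlen(s), &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return (unsigned char)s[0];
    return (long)wc;
}
```

In a process that has selected a UTF-8 locale via setlocale(LC_CTYPE, ""), charvalue("À") should then yield 192 on systems where wchar_t holds ISO-10646 values.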
Rich