bug#13947: bug report for core-utils command : OD
From: Pádraig Brady
Subject: bug#13947: bug report for core-utils command : OD
Date: Wed, 13 Mar 2013 21:53:39 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 03/13/2013 09:34 PM, Eric Blake wrote:
> On 03/13/2013 02:16 PM, Marc Grondin wrote:
>> Good Afternoon,
>
> Hello, and thanks for the report.
>
>>
>> My client was attempting to run the command : od -c on this xml file (sample
>> only)
>> ------------------------------------------------------------------------------
>> <?xml version = '1.0' encoding = 'UTF-8'?>
>> <top>
>> <x>丸</x>
>
> Here, you are representing a character in UTF-8
>
>> He was getting this output :
>> ------------------------------------------------------------------------------
>> 0000000 < ? x m l v e r s i o n =
>> 0000020 ' 1 . 0 ' e n c o d i n g =
>> 0000040 ' U T F - 8 ' ? > \n < t o p >
>> 0000060 \n < x > � � � < / x > \n
>
> and here, you were running od in a different character set:
>
>> This is all based on the LANG env. He was using :
>> LANG=en_US.iso88591, instead of
>> LANG=en_US.UTF-8
>
> In ISO-8859-1, every byte is a character, and those particular bytes
> happen to be printable, so od was faithfully replaying the character as
> printable, only to then be shown by your UTF-8 terminal as an invalid
> UTF-8 sequence. Mismatching character sets between your program and
> your terminal is always a recipe for confusion.
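To make the mismatch concrete, here is a small Python sketch (not part of
coreutils; the variable names are made up for illustration). It shows the
three bytes that UTF-8 uses for 丸, how ISO-8859-1 reads them, and what a
UTF-8 decoder makes of each byte when it arrives on its own, as it does in
od's grid:

```python
# The character from the sample file, encoded as UTF-8.
raw = "丸".encode("utf-8")
print(" ".join(f"{b:02x}" for b in raw))  # e4 b8 b8

# Read as ISO-8859-1, every one of those bytes is its own character.
as_latin1 = raw.decode("iso-8859-1")
print([ch.isprintable() for ch in as_latin1])  # [True, True, True]

# od separates the bytes, so a UTF-8 terminal sees each one in isolation.
# None of them is a complete UTF-8 sequence, hence U+FFFD each time.
seen_by_utf8_terminal = [
    bytes([b]).decode("utf-8", errors="replace") for b in raw
]
print(seen_by_utf8_terminal == ["\ufffd"] * 3)  # True
```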
>
> However, you HAVE identified a bug, in our documentation.
>
>>
>> ------------------------------------------------------------------------------
>>
>> Question :
>> Since the output is based on the ASCII character set, should it not, in both
>> cases give a numerical output (as it did in scenario #2)
>> for a symbol outside the ascii/extended-ascii character set ?
>
> Our documentation is lying. Here's what POSIX says about od -c:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/od.html
> "Interpret bytes as characters specified by the current setting of the
> LC_CTYPE category. Certain non-graphic characters appear as C escapes:
> "NUL=\0" , "BS=\b" , "FF=\f" , "NL=\n" , "CR=\r" , "HT=\t" ; others
> appear as 3-digit octal numbers."
>
> Nothing in there restricts the output to ASCII only. The bytes that are
> showing up as � are graphic characters in your current choice of
> LC_CTYPE, so there is no escaping performed (since escaping is permitted
> only on non-graphic characters). If your terminal were using the same
> character set as you ran od under, you would see proper graphical
> characters in the ISO-8859-1 set (but then again, you wouldn't see the
> nice 丸 character that the UTF-8 was representing).
>
> Coreutils is properly obeying the locale; what is wrong is the info
> documentation, which stated:
>
> `-c'
> Output as ASCII characters or backslash escapes.
I agree. Thanks for the detailed description.
> In reality, that should state something like:
> Output as characters in the current locale, using octal sequences
> or backslash escapes for all non-graphic bytes.
Note that we output spaces, so I'd s/non-graphic/non-printable/.
Also, multi-byte characters are always going to be problematic to
display in a grid like this, so we'll probably continue to do as
we do now for the UTF-8 example above and output octal and dots.
Therefore s/characters/single byte characters/.
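As a rough illustration of that wording, the per-byte rule could be sketched
in Python as follows. This is not how od is implemented: od_c_cell and its
charset parameter are hypothetical, the escape table is just the POSIX list
quoted above, and Python's Unicode-based str.isprintable() stands in for the
locale's notion of a printable character.

```python
# Escapes taken from the POSIX description of od -c quoted above.
ESCAPES = {
    0x00: r"\0", 0x08: r"\b", 0x0C: r"\f",
    0x0A: r"\n", 0x0D: r"\r", 0x09: r"\t",
}

def od_c_cell(byte, charset="ascii"):
    """Return the text shown for one byte under a single-byte charset."""
    if byte in ESCAPES:
        return ESCAPES[byte]
    try:
        ch = bytes([byte]).decode(charset)
    except UnicodeDecodeError:
        # Not a character in this charset at all: fall back to octal.
        return f"{byte:03o}"
    # Printable characters (including space) pass through unchanged.
    return ch if ch.isprintable() else f"{byte:03o}"

print([od_c_cell(b, "ascii") for b in b"\xe4\xb8\xb8"])       # ['344', '270', '270']
print([od_c_cell(b, "iso-8859-1") for b in b"\xe4\xb8\xb8"])  # ['ä', '¸', '¸']
```

With "ascii" standing in for the C locale the bytes come out as octal, and
with "iso-8859-1" they pass through as characters, which is the difference
the original report ran into.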
>
> Meanwhile, if you want to guarantee ASCII-only output from od, you have
> to use a different format, such as -b or -tx1, or use LC_ALL=C on a
> system where the C locale does not treat non-ascii bytes as graphical
> characters (most glibc systems, including the one you are using, fit
> this bill).
>
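For comparison, a numeric dump in the spirit of -tx1 or -b does not depend on
the locale, because every byte is printed as a number. A minimal Python
sketch (the helper names are made up, and the offset column that od prints
is omitted):

```python
def hex_bytes(data):
    """Two-digit hex per byte, similar in spirit to od -tx1."""
    return " ".join(f"{b:02x}" for b in data)

def octal_bytes(data):
    """Three-digit octal per byte, similar in spirit to od -b."""
    return " ".join(f"{b:03o}" for b in data)

sample = "<x>丸</x>\n".encode("utf-8")
print(hex_bytes(sample))    # 3c 78 3e e4 b8 b8 3c 2f 78 3e 0a
print(octal_bytes(sample))  # 074 170 076 344 270 270 074 057 170 076 012
```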
cheers,
Pádraig.