--- Begin Message ---
Subject: bug report for core-utils command : OD
Date: Wed, 13 Mar 2013 13:16:16 -0700 (PDT)
Good afternoon,
My client was running od -c on this XML file (sample only):
------------------------------------------------------------------------------
<?xml version = '1.0' encoding = 'UTF-8'?>
<top>
<x>丸</x>
<y>丸</y>
<z>𠄌</z>
<x>?</x>
<x>?</x>
<x>?丸</x>
<x>??丸</x>
</top>
------------------------------------------------------------------------------
Note: the system is running kernel 2.6.18-164.0.0.0.1.el5xen #1 SMP Thu Sep 3
00:34:43 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux.
He was getting this output:
------------------------------------------------------------------------------
0000000 < ? x m l v e r s i o n =
0000020 ' 1 . 0 ' e n c o d i n g =
0000040 ' U T F - 8 ' ? > \n < t o p >
0000060 \n < x > � � � < / x > \n
0000100 < y > � � � 201 < / y > \n
0000120 < z > � � 204 214 < / z > \n
0000140 < x > ? < / x > \n < x > ?
0000160 < / x > \n < x > ? � � � 201
0000200 < / x > \n < x > ? ? � � �
0000220 201 < / x > \n < / t o p > \n
------------------------------------------------------------------------------
Instead of this expected output:
------------------------------------------------------------------------------
0000000 < ? x m l v e r s i o n =
0000020 ' 1 . 0 ' e n c o d i n g =
0000040 ' U T F - 8 ' ? > \n < t o p >
0000060 \n < x > 344 270 270 < / x > \n
0000100 < y > 360 257 240 201 < / y > \n
0000120 < z > 360 240 204 214 < / z > \n
0000140 < x > ? < / x > \n < x > ?
0000160 < / x > \n < x > ? 360 257 240 201
0000200 < / x > \n < x > ? ? 360 257 240
0000220 201 < / x > \n < / t o p > \n
0000235
------------------------------------------------------------------------------
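The octal sequences in the expected output are the UTF-8 encodings of the sample's CJK characters. A short Python check decodes them; the SEQUENCES table and the decode_octal helper are hypothetical names introduced for this sketch. The <y> bytes decode to U+2F801, a CJK compatibility ideograph whose NFC form is U+4E38, which would explain why the <x> and <y> lines of the sample both display as 丸 even though their bytes differ.

```python
import unicodedata

# Octal byte values copied from the expected (UTF-8 locale) od -c output.
SEQUENCES = {
    "<x>, line 0000060": [0o344, 0o270, 0o270],
    "<y>, line 0000100": [0o360, 0o257, 0o240, 0o201],
    "<z>, line 0000120": [0o360, 0o240, 0o204, 0o214],
}


def decode_octal(octets):
    """Interpret a list of byte values as UTF-8 and return the decoded text."""
    return bytes(octets).decode("utf-8")


for label, octets in SEQUENCES.items():
    text = decode_octal(octets)
    # NFC maps compatibility ideographs to their unified equivalents.
    nfc = unicodedata.normalize("NFC", text)
    codepoints = " ".join(f"U+{ord(c):04X}" for c in text)
    nfc_codepoints = " ".join(f"U+{ord(c):04X}" for c in nfc)
    name = unicodedata.name(text[0], "?")
    print(f"{label}: {codepoints} ({name}), NFC {nfc_codepoints}")
```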
The difference depends on the LANG environment variable. He was using
LANG=en_US.iso88591 (first output) instead of
LANG=en_US.UTF-8 (second output).
------------------------------------------------------------------------------
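A possible explanation, offered as an assumption rather than a reading of the od source: od -c decides per byte whether to print it raw, based on whether the current locale considers that byte printable (as C's isprint() would). In an ISO-8859-1 locale the bytes 0xA0 to 0xFF are printable Latin-1 characters, so od writes them raw and a UTF-8 terminal shows them as �, while 0x80 to 0x9F are control characters and come out as octal. In a UTF-8 locale no single byte at or above 0x80 counts as printable, so every such byte is shown in octal. The Python model below reproduces the <y> and <z> lines of both outputs; is_printable, od_c_tokens and the "iso88591"/"utf8" labels are hypothetical, and od's column alignment is ignored.

```python
# Hypothetical model of the per-byte decision od -c appears to make.
# Assumption: printable ASCII is 0x20-0x7E everywhere; an ISO-8859-1
# locale additionally treats 0xA0-0xFF as printable; a UTF-8 locale
# treats no single byte >= 0x80 as printable.
ESCAPES = {
    0x00: "\\0", 0x07: "\\a", 0x08: "\\b", 0x09: "\\t",
    0x0A: "\\n", 0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
}


def is_printable(byte, locale):
    """Stand-in for isprint() under the two locales in this report."""
    if 0x20 <= byte <= 0x7E:
        return True
    if locale == "iso88591":
        return byte >= 0xA0
    return False


def od_c_tokens(data, locale):
    """Return one od -c style token per byte (column alignment ignored)."""
    tokens = []
    for b in data:
        if b in ESCAPES:
            tokens.append(ESCAPES[b])
        elif is_printable(b, locale):
            # od writes the raw byte; represent it as its Latin-1 character.
            tokens.append(bytes([b]).decode("latin-1"))
        else:
            tokens.append(f"{b:03o}")
    return tokens


sample = "<y>\U0002F801</y>\n".encode("utf-8")
# repr() makes the otherwise invisible 0xA0 (no-break space) visible.
print(od_c_tokens(sample, "iso88591"))
print(" ".join(od_c_tokens(sample, "utf8")))
# prints: < y > 360 257 240 201 < / y > \n
```

Returning a list of tokens rather than aligned columns keeps the model easy to compare against the two outputs in the report.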
Question:
Since the output is based on the ASCII character set, shouldn't od give
numerical (octal) output in both cases, as it did in scenario #2, for any
symbol outside the ASCII/extended-ASCII character set?
------------------------------------------------------------------------------
Regards,
Marc Grondin,
__________________________________
Oracle - Quebec city, Qc.
Senior System Administrator, PDIT
---------------------------------
400-330 St-Vallier, G1K 9C5
418.524.5665 # 1256
=================================
--- End Message ---
--- Begin Message ---
Subject: Re: bug#13947: bug report for core-utils command : OD
Date: Fri, 22 Mar 2013 15:45:49 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 03/13/2013 09:53 PM, Pádraig Brady wrote:
> On 03/13/2013 09:34 PM, Eric Blake wrote:
>> In reality, that should state something like:
>
>> Output as characters in the current locale, using octal sequences
>> or backslash escapes for all non-graphic bytes.
>
> Note we output spaces, so I'd s/non-graphic/non-printable/.
>
> Also multi byte is always going to be problematic displaying
> in a grid like this, so we'll probably continue to do as
> we do now for the utf8 example above and output octal and dots.
> So therefore s/characters/single byte characters/.
Hopefully the attached patch clarifies things.
thanks,
Pádraig.
od-printable.patch
Description: Text Data
--- End Message ---