bug#9562: unexpected sort behaviour
From: Eric Blake
Subject: bug#9562: unexpected sort behaviour
Date: Tue, 20 Sep 2011 10:26:58 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.22) Gecko/20110906 Fedora/3.1.14-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.14
force-merge 9562 9561
tag 9562 notabug
thanks
On 09/20/2011 05:51 AM, vijay krishna wrote:
> Hello Team,
> May I please know the reason for the following behaviour of the sort
> command...
Thanks for the report; however, this is not a bug. As mentioned in the
FAQ, you are encountering this behavior because of your choice of locale:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
> 0 $ sort -k 1 bug2_file1
> b101 512
> b1 512
> ------------------
> sort (GNU coreutils) 5.97
Newer sort also comes with a --debug option that would help explain your
predicament (5.97 is YEARS old; the latest is 8.13, with numerous bug
fixes, although none of the behavior you show is affected by any of
those bug fixes).
$ printf 'b101 512\nb1 512\n' | LC_ALL=C sort -k1 --debug
sort: using simple byte comparison
b1 512
______
______
b101 512
________
________
$ printf 'b101 512\nb1 512\n' | sort -k1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b101 512
________
________
b1 512
______
______
$ printf 'b101 512\nb1 512\n' | sort -k1,1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b1 512
__
______
b101 512
____
________
In the en_US.UTF-8 locale, collation is done by dictionary ordering,
where whitespace is insignificant to the collation; and specifying
-k1 instead of the more precise -k1,1 means that you are sorting on the
entire line instead of just the first field of the line. Since "b1512"
collates greater than "b101512" in en_US collation rules, the same
applies to "b1 512" and "b101 512". Notice how use of -k1,1 changed the
output by comparing only "b1" and "b101", or how use of LC_ALL=C changed
the output by switching to bytewise collation with no dictionary sorting,
where space becomes significant.
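Both effects can be reproduced without depending on which locales are
installed. The following is only a rough sketch: it pins LC_ALL=C
throughout and approximates "whitespace is insignificant" by deleting
the space by hand, which is an assumption about how to mimic the
dictionary comparison, not what the en_US rules literally do. The
variable names and the "a 1"/"a 2" sample lines are made up for
illustration.

```shell
#!/bin/sh
# LC_ALL=C forces bytewise comparison, so none of these results depend
# on the locales available on the machine.

# With the space present, ' ' (0x20) sorts before '0' (0x30),
# so "b1 512" comes first.
with_space=$(printf 'b101 512\nb1 512\n' | LC_ALL=C sort)

# With the space removed by hand (a stand-in for a collation that
# ignores whitespace), '0' sorts before '5', so "b101512" comes first.
without_space=$(printf 'b1512\nb101512\n' | LC_ALL=C sort)

# -k1 runs from field 1 to the end of the line, so the second column
# takes part in the comparison and the lines are reordered.
whole_line=$(printf 'a 2\na 1\n' | LC_ALL=C sort -s -k1)

# -k1,1 stops at the end of field 1. Both keys are "a", and -s (stable)
# disables the last-resort whole-line comparison, so input order is kept.
first_field=$(printf 'a 2\na 1\n' | LC_ALL=C sort -s -k1,1)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
    "$with_space" "$without_space" "$whole_line" "$first_field"
```

The first pair shows why ignoring the space flips the order of the two
lines; the second pair shows that -k1 and -k1,1 really do compare
different amounts of each line.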
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org