bug-coreutils

bug#9562: unexpected sort behaviour


From: Eric Blake
Subject: bug#9562: unexpected sort behaviour
Date: Tue, 20 Sep 2011 10:26:58 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.22) Gecko/20110906 Fedora/3.1.14-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.14

force-merge 9562 9561
tag 9562 notabug
thanks

On 09/20/2011 05:51 AM, vijay krishna wrote:
> Hello Team,
>
>    May I please know the reason for the following behaviour of the sort
> command...


Thanks for the report; however, this is not a bug. As mentioned in the FAQ, you are encountering this behavior because of your choice of locale:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

> 0 $ sort -k 1 bug2_file1
> b101 512
> b1 512
> ------------------
>
> sort (GNU coreutils) 5.97

Newer sort also comes with a --debug option that would help explain your predicament (5.97 is YEARS old; the latest is 8.13, with numerous bug fixes, although none of them affects the behavior you show).

$ printf 'b101 512\nb1 512\n' | LC_ALL=C sort -k1 --debug
sort: using simple byte comparison
b1 512
______
______
b101 512
________
________

$ printf 'b101 512\nb1 512\n' | sort -k1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b101 512
________
________
b1 512
______
______

$ printf 'b101 512\nb1 512\n' | sort -k1,1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b1 512
__
______
b101 512
____
________


In the en_US.UTF-8 locale, collation uses dictionary ordering, in which whitespace is insignificant. In addition, specifying -k1 instead of the more precise -k1,1 makes the key run from the start of field 1 to the end of the line, so you are sorting on the entire line rather than on just the first field. Since "b1512" collates greater than "b101512" under en_US collation rules, the same applies to "b1 512" and "b101 512". Notice how using -k1,1 changed the output by comparing only "b1" and "b101", and how using LC_ALL=C changed the output by switching to bytewise collation with no dictionary ordering. There the space becomes significant: it is byte 0x20, which sorts before every digit, so "b1 512" precedes "b101 512".
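
The reasoning above can be checked with a short shell sketch that does not need the en_US locale installed. It assumes a POSIX shell; strip_blanks_then_sort and sort_first_field_c are hypothetical helper names, and deleting the blanks is only a crude stand-in for en_US collation that happens to hold for these two lowercase ASCII lines, not a reimplementation of glibc's rules.

```shell
#!/bin/sh
# Illustrative sketch: the function names are hypothetical helpers
# built only from standard tools (printf, tr, sort).

# Crude stand-in for en_US ordering on these lowercase ASCII lines:
# delete the blanks, then compare what is left bytewise.
# This is NOT a reimplementation of glibc's collation rules.
strip_blanks_then_sort() {
    tr -d ' ' | LC_ALL=C sort
}

# One of the two fixes from the explanation: compare only field 1,
# using plain byte comparison.
sort_first_field_c() {
    LC_ALL=C sort -k1,1
}

# Prints "b101512" then "b1512": with the blank gone, '0' < '5' at the
# third character, so "b1512" sorts last, as "b1 512" did under en_US.
printf 'b101 512\nb1 512\n' | strip_blanks_then_sort

# Prints "b1 512" then "b101 512": the key "b1" is a prefix of "b101".
printf 'b101 512\nb1 512\n' | sort_first_field_c
```

The first pipeline reproduces the surprising en_US order once whitespace stops counting; the second shows the order most people expect once the key is limited to the first field.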

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org