bug#22109: Sort gives incorrect order when changing delimiters
From: Assaf Gordon
Subject: bug#22109: Sort gives incorrect order when changing delimiters
Date: Mon, 7 Dec 2015 11:49:39 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
tag 22109 notabug
close 22109
stop
Hello Ed,
On 12/07/2015 10:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1]. The
> lexical ordering of sort appears to depend on the delimiter used, and I believe
> it shouldn't. As a minimal example:
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1
This is not a bug in 'sort', but simply an incorrect usage of the key options.
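The parameter "-k2" means: use the second field *and all characters until the end
of the line* as the sort key.
In this case, the delimiter after the second field (',' or '~') does come into play:
the keys being compared are "a,1" vs. "aa,2" and "a~1" vs. "aa~2". In the C locale
',' (byte 44) sorts before 'a' (byte 97), while '~' (byte 126) sorts after it, so
the order of the two lines flips.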
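A quick way to see this, assuming a POSIX shell and GNU coreutils `sort`: print the byte values of the three characters involved, then sort the strings that "-k2" actually compares as plain whole lines.

```shell
# Byte values in the C locale: ',' = 44, 'a' = 97, '~' = 126.
# (POSIX printf: an argument starting with a quote prints the next character's code.)
printf '%d %d %d\n' "'," "'a" "'~"

# The keys that "-k2" compares, sorted on their own.
# ',' sorts before 'a', so "a,1" comes first...
printf 'a,1\naa,2\n' | LC_ALL=C sort
# ...but '~' sorts after 'a', so "aa~2" comes first.
printf 'a~1\naa~2\n' | LC_ALL=C sort
```

This reproduces both orderings from the original report without any delimiter handling at all, which shows the difference comes purely from the byte value of the character following "a".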
The correct usage is to specify the key as "-k2,2", meaning: sort by the second
field alone (then resolve equal keys by comparing the entire lines, unless --stable
is used).
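To illustrate the tie-breaking rule, here is a small sketch (assuming GNU coreutils `sort`; the input lines are made up for the example) where both lines share the same second field, so "-k2,2" alone cannot order them:

```shell
# Both lines have the same second field ("a"), so -k2,2 sees a tie.
# Default: the tie is broken by comparing the whole lines bytewise,
# so "1~a~2" is printed before "2~a~1".
printf '2~a~1\n1~a~2\n' | LC_ALL=C sort -t~ -k2,2

# With -s (--stable) the last-resort comparison is disabled and
# tied lines keep their input order: "2~a~1" stays first.
printf '2~a~1\n1~a~2\n' | LC_ALL=C sort -s -t~ -k2,2
```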
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2,2 -t~
1~a~1
2~aa~2
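To check that "-k2,2" makes the result independent of the delimiter, a small loop can run the same two lines through both ',' and '~'. This is a sketch assuming a POSIX shell; the loop variable `d` is purely illustrative.

```shell
# Same data, two delimiters. With -k2,2 only the second field is the key,
# so both runs should print the "a" line before the "aa" line.
# '~' is quoted so the shell does not expand it to $HOME.
for d in , '~'; do
  printf "1${d}a${d}1\n2${d}aa${d}2\n" | LC_ALL=C sort -t"$d" -k2,2
done
```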
Using sort's "--debug" option illustrates the difference (the underscores mark
which characters are used as the key; the last row of underscores under each line
shows the final, whole-line comparison):
Incorrect usage (-k2):
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2 -t~
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____
Better usage (-k2,2):
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2,2 -t~
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______
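Assuming GNU coreutils `sort`, adding -s (--stable) to the same command should remove the whole-line row of underscores, since a stable sort performs no last-resort comparison; only the -k2,2 key is then underlined under each line.

```shell
# With -s there is no last-resort whole-line comparison,
# so --debug should underline only the second field of each line.
# (The "using simple byte comparison" note goes to stderr.)
printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -s -k2,2 -t~
```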
regards,
- assaf