bug-coreutils

bug#22109: Sort gives incorrect order when changing delimiters


From: Assaf Gordon
Subject: bug#22109: Sort gives incorrect order when changing delimiters
Date: Mon, 7 Dec 2015 11:49:39 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

tag 22109 notabug
close 22109
stop

Hello Ed,

On 12/07/2015 10:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1]. The
> lexical ordering of sort appears to depend on the delimiter used, and I believe
> it shouldn't. As a minimal example:
>
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2
>
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1


This is not a bug in 'sort', but simply an incorrect usage of the key options.

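The parameter "-k2" means: use the second field *and all characters until the end
of the line* as the sort key.
In this case, the delimiter that follows the second field (',' or '~') is part of
the key. In the C locale ',' (0x2C) sorts before 'a' (0x61), so "a,1" comes before
"aa,2", while '~' (0x7E) sorts after 'a', so "a~1" comes after "aa~2".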
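
To make that comparison visible, here is a small sketch that extracts the text "-k2" actually compares and prints the byte values involved. The helper name key_from_field2 is made up for this illustration; it only uses cut and od.

```shell
# Hypothetical helper: print what "-k2" uses as the key,
# i.e. field 2 through the end of the line.
key_from_field2() {
    # $1 = delimiter; lines are read from stdin
    cut -d"$1" -f2-
}

# Keys for the comma-delimited input: "a,1" and "aa,2"
printf '1,a,1\n2,aa,2\n' | key_from_field2 ,

# Keys for the tilde-delimited input: "a~1" and "aa~2"
printf '1~a~1\n2~aa~2\n' | key_from_field2 '~'

# Byte values of the characters that decide the order: 2c 61 7e
printf ',a~' | od -An -tx1
```

Sorting those extracted keys on their own with LC_ALL=C sort gives the same two orders as in the report, which is why the delimiter appears to change the result.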

The correct usage is to specify the key as "-k2,2", meaning: the key starts and
ends at the second field, so sort compares that field alone (then resolves equal
keys by comparing the entire line, unless --stable is used).

    $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2,2 -t~
    1~a~1
    2~aa~2
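
The tie-breaking rule mentioned above can be seen with two lines whose second field is identical. This is a minimal sketch with made-up input data; -s is the short form of --stable.

```shell
# Two lines with the same second field ("a") but different first fields.
input=$(printf '2~a~x\n1~a~y\n')

# Default: equal keys are resolved by comparing the whole line,
# so "1~a~y" ends up before "2~a~x".
by_field2=$(printf '%s\n' "$input" | LC_ALL=C sort -t'~' -k2,2)

# With -s (--stable) the last-resort comparison is skipped,
# so lines with equal keys keep their input order.
by_field2_stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -t'~' -k2,2)

printf 'default:\n%s\n' "$by_field2"
printf 'stable:\n%s\n' "$by_field2_stable"
```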


Using sort's "--debug" option will illustrate the difference. Under each line,
the first row of underscores marks the part of the line used as the key, and the
second row marks the whole line, which is used for the last-resort comparison:

Incorrect usage (-k2):

    $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2 -t~
    sort: using simple byte comparison
    2~aa~2
      ____
    ______
    1~a~1
      ___
    _____


Better usage (-k2,2):

    $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2,2 -t~
    sort: using simple byte comparison
    1~a~1
      _
    _____
    2~aa~2
      __
    ______
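
As a quick check that "-k2,2" makes the result independent of the delimiter, the sketch below runs the same two records through several delimiters and prints the resulting order of the first field. The function name order_by_field2 and the extra ';' delimiter are assumptions added for illustration.

```shell
# Hypothetical helper: build the two sample records with delimiter $1,
# sort them on field 2 only, and print the first fields in output order.
order_by_field2() {
    d=$1
    printf '1%sa%s1\n2%saa%s2\n' "$d" "$d" "$d" "$d" |
        LC_ALL=C sort -t"$d" -k2,2 |
        cut -d"$d" -f1 |
        paste -sd' ' -
}

# Each delimiter should give the same order of records.
for d in , '~' ';'; do
    printf 'delimiter %s: %s\n' "$d" "$(order_by_field2 "$d")"
done
```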




regards,
 - assaf





