bug-coreutils

Re: Sort order bug in GNU sort


From: Luke Hutchison
Subject: Re: Sort order bug in GNU sort
Date: Thu, 29 Oct 2009 20:43:39 -0400

Hi Pádraig,
As stated, "The following is the output of GNU sort (without any
switches)" -- i.e. I used the defaults and did not specify any
command-line switches.  If, as you say, the whole line is the sort key
by default, and if default sorting is in lexicographic order, how can
the following snippets from the sorted output possibly be correct?

sampleId-1010,0.0625
sampleId-101,0.0625
sampleId-1010,1.0

sampleId-980,1.0
sampleId-98,1.0
sampleId-981,0.0625

sampleId-990,1.0
sampleId-99,1.0
sampleId-991,0.25

Based on ASCII encoding (',' < '0' < '1'), I believe these should be:

sampleId-101,0.0625
sampleId-1010,0.0625
sampleId-1010,1.0

sampleId-98,1.0
sampleId-980,1.0
sampleId-981,0.0625

sampleId-99,1.0
sampleId-990,1.0
sampleId-991,0.25
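
The expectation above can be checked with a minimal sketch in Python,
whose built-in sorted() compares strings code point by code point (the
same as byte order for this pure-ASCII data).  It only shows what plain
ASCII ordering gives; it says nothing about how GNU sort compares lines
under any particular locale.

```python
# Lines taken from the snippets above, in the order GNU sort printed them.
observed = [
    "sampleId-1010,0.0625",
    "sampleId-101,0.0625",
    "sampleId-1010,1.0",
    "sampleId-980,1.0",
    "sampleId-98,1.0",
    "sampleId-981,0.0625",
    "sampleId-990,1.0",
    "sampleId-99,1.0",
    "sampleId-991,0.25",
]

# sorted() compares str values code point by code point, so ',' (0x2C)
# sorts before '0' (0x30) and '1' (0x31).
ascii_sorted = sorted(observed)

for line in ascii_sorted:
    print(line)
```

The printed order matches the nine lines listed above as the expected
result.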

Even if in some weird locale, ',' > '0', or some other weird thing
were true, the two lines "sampleId-1010,0.0625" and
"sampleId-1010,1.0" should be grouped together either before or after
"sampleId-101,0.0625", because they share a common prefix
"sampleId-1010" -- but they are separated.  Similarly,
"sampleId-990,1.0" and "sampleId-991,0.25" absolutely should not be
separated by "sampleId-99,1.0", because there is no way in any locale
that '0' < ',' < '1'.
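
One way to probe the "weird locale" possibility is the sketch below.
It assumes, purely as a hypothesis that nothing in this thread
confirms, a comparison that skips punctuation altogether instead of
giving ',' some position relative to the digits.  The helper
strip_punct is made up for illustration and is not taken from GNU sort
or from any locale definition.

```python
import string

# Lines in the order GNU sort printed them (from the snippets above).
observed = [
    "sampleId-1010,0.0625",
    "sampleId-101,0.0625",
    "sampleId-1010,1.0",
    "sampleId-980,1.0",
    "sampleId-98,1.0",
    "sampleId-981,0.0625",
    "sampleId-990,1.0",
    "sampleId-99,1.0",
    "sampleId-991,0.25",
]


def strip_punct(line: str) -> str:
    """Hypothetical sort key: drop every ASCII punctuation character."""
    return "".join(ch for ch in line if ch not in string.punctuation)


# If the observed order is already sorted under this key, re-sorting
# with the key leaves the list unchanged.
resorted = sorted(observed, key=strip_punct)
print(resorted == observed)  # prints True
```

So the separated lines do not require '0' < ',' < '1'; an ordering
that ignores ',' , '-' and '.' would also produce them.  Whether that
is what actually happened here is an open question as far as this
message goes.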

The man page led me to think that sorting happens field-wise (not
line-wise) by default, because it says, "-t , --field-separator=SEP :
use SEP instead of non-blank to blank transition".  It would be helpful
to explicitly add to the description of "-k" that "If no key is given,
the whole line is used as the key".
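
To make the distinction concrete, here is a rough Python stand-in for
the two behaviours.  The function sort_lines and its parameters are
hypothetical names for illustration only; this is not GNU sort's
implementation, and GNU sort's own tie-breaking may differ.

```python
def sort_lines(lines, sep=None, field=None):
    """Sort by the whole line unless a separator and 1-based field are given."""
    if sep is None or field is None:
        # No key given: the whole line is the key.
        return sorted(lines)
    # Key given: compare only that field. Python's sort is stable, so
    # lines with equal keys keep their input order.
    return sorted(lines, key=lambda line: line.split(sep)[field - 1])


lines = [
    "sampleId-1010,1.0",
    "sampleId-1010,0.0625",
    "sampleId-101,0.0625",
]

whole_line = sort_lines(lines)
first_field = sort_lines(lines, sep=",", field=1)

print(whole_line)
print(first_field)
```

With the whole line as the key, the two "sampleId-1010" lines are
ordered by what follows the comma; with only the first field as the
key, they tie and stay in input order.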

Thanks,
Luke


2009/10/29 Pádraig Brady <address@hidden>
>
> Luke Hutchison wrote:
> > Hi,
> >
> > The following is the output of GNU sort (without any switches) on an
> > unsorted file.  Numerous errors (of the same variety) seem present in the
> > ordering.  I am using coreutils-7.2-4.fc11.x86_64.  Problems are shown in
> > red.
>
> You need to specify the sort command you used.
> Does this sort your data correctly?
>
> sort -t, -k1,1V
>
> > Additionally, there probably needs to be a switch added to sort that uses
> > the entire line as the sort key,
>
> It does that by default
>
> > not blank-to-non-blank transition
>
> Note also the 'b' option.
>
> cheers,
> Pádraig.



