bug#18273: sort seems to misbehave if both -u and -n or -k are used


From: Eric Blake
Subject: bug#18273: sort seems to misbehave if both -u and -n or -k are used
Date: Fri, 15 Aug 2014 13:48:57 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0

tag 18273 notabug
thanks

On 08/15/2014 01:30 PM, Lennart Sorensen wrote:
> Here is the case that has me thinking there is a bug (it sure doesn't
> make sense as valid behaviour).

Thanks for the report.  However, the behavior you have demonstrated is
required by POSIX, and is therefore not a bug.  The --debug option can
be used to see what is really happening.


> 
> OK output using 'sort -n':
> 
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u7
> 
> (I may have hoped that one would sort by the last number given everything
> else is equal, but I did not expect it to actually do so).

Actually, using -n without any other hints says to treat _the entire
line_ as a number, and to quit parsing as soon as a non-numeric portion
is found.  Observe:

$ LC_ALL=C sort foo --debug -n
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u7
^ no match for key
________________________

Furthermore, if you disable the last-resort comparison of the entire
line, then you get the input order, since all of your keys were
identically the empty numeric string at the front of the line:

$ LC_ALL=C sort foo --debug -n -s
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u7
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key
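
The prefix rule is easier to see on a smaller input.  What follows is a
minimal sketch using three made-up lines (they are not from the report),
assuming a POSIX shell and a sort that treats a line with no leading
number as zero, as described above:

```shell
# Hypothetical sample lines, chosen only to illustrate numeric prefix parsing.
sample='10 apples
9 pears
x 1'

# -n reads only the leading number of each line: 10, 9, and nothing (compares as 0).
numeric=$(printf '%s\n' "$sample" | LC_ALL=C sort -n)
printf '%s\n' "$numeric"
```

The "x 1" line should come out first: parsing stops at the "x", so the 1
that follows it is never examined.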

> 
> OK output using 'sort -k 3':
> 
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u7

Umm, here, you don't HAVE a field 3: each line has only two
blank-separated fields, so the -k3 key is empty.  Again, as soon as you
disable last-resort comparison, you get the original input order:

$ LC_ALL=C sort foo --debug -k3 -s
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u12
                         ^ no match for key
Version: 1.0.1e-2+deb7u11
                         ^ no match for key
Version: 1.0.1e-2+deb7u12
                         ^ no match for key
Version: 1.0.1e-2+deb7u7
                        ^ no match for key
Version: 1.0.1e-2+deb7u11
                         ^ no match for key
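
The same check can be scripted without --debug.  This is only a sketch:
the variable names are mine, and the input order is inferred from the -s
output above rather than taken from the original file:

```shell
# Input lines in the order inferred from the -s output above (an assumption).
input='Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11'

# Every line has only two fields, so the -k3 key is empty for all of them;
# with -s there is no last-resort comparison to break the ties.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -k3 -s)

if [ "$stable" = "$input" ]; then
    echo "input order preserved"
fi
```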

> 
> Weird output using 'sort -n -u':
> 
> Version: 1.0.1e-2+deb7u12

No, perfectly defined output.  -u implicitly enables -s, and I already
demonstrated that -n on your input picks the initial empty string.
Since all 5 lines have the same sort key, there is only one unique key
seen, and the output is exactly the first line with that unique sort
key.  If you want to FORCE entire-line fallback, then request that as a
fallback key (since -n by itself is global to all keys, I instead
request two keys: the first as the numeric sort of the first field, the
second as the fallback sort of the entire line):

$ LC_ALL=C sort foo --debug -k1,1n -k1 -u
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u7
^ no match for key
________________________
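
For reference, here is the two-key invocation as a small script, without
--debug.  It is a sketch under the same assumption about the input order
(inferred from the -s output earlier), and the variable names are
illustrative only:

```shell
# Input lines in the order inferred from the earlier -s output (an assumption).
input='Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11'

# First key: field 1 compared numerically (empty for every line here).
# Second key: field 1 through end of line, compared as bytes.
# -u then drops lines whose keys are all equal, i.e. exact duplicates.
result=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n -k1 -u)
printf '%s\n' "$result"
```

Note that u7 still sorts after u12, because the second key is a plain
byte comparison and '7' is greater than '1'.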


> 
> Weird output using 'sort -k 3 -u':
> 
> Version: 1.0.1e-2+deb7u12

Again, as shown above, all 5 lines have the same empty key (there is no
third field, so the key starts at the end of the line), so the single
line of output is correct.

> 
> So is this actually the expected behaviour?  I would have thought from
> the documentation that -u would return unique lines of output, not just
> one line based on whatever sort key it happened to look at.

Yes, sort -u is required to treat lines as unique solely based on the
key(s) they were sorted by (ignoring the default last-resort key,
since -u implicitly enables -s).
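
A quick way to see the difference is to count the surviving lines with
and without a key.  This is a sketch only: the input order is inferred
from the -s output earlier, and the variable names are mine:

```shell
# Input lines in the order inferred from the earlier -s output (an assumption).
input='Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11'

# With -n, every line has the same (empty) numeric key, so -u keeps one line.
by_key=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u | awk 'END { print NR }')

# With no key options, the whole line is the key, so -u keeps each distinct line.
by_line=$(printf '%s\n' "$input" | LC_ALL=C sort -u | awk 'END { print NR }')

echo "unique by numeric key: $by_key, unique by whole line: $by_line"
```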

As this behavior is required by POSIX and consistent with other
implementations, I'm closing it as not a bug.  But if you have further
comments or questions, you can continue to reply to this email.

By the way, have you looked at sort -V, as a way to get what you appear
to want?

$ LC_ALL=C sort foo --debug -V -u
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u7
________________________
Version: 1.0.1e-2+deb7u11
_________________________
Version: 1.0.1e-2+deb7u12
_________________________
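
The same thing as a script, without --debug.  Keep in mind that -V is a
GNU extension rather than a POSIX option, so this sketch assumes GNU
sort; the input order is again inferred from the -s output earlier:

```shell
# Input lines in the order inferred from the earlier -s output (an assumption).
input='Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11'

# -V compares embedded digit runs as numbers, so u7 sorts before u11 and u12;
# -u then removes the duplicate lines.
versions=$(printf '%s\n' "$input" | LC_ALL=C sort -V -u)
printf '%s\n' "$versions"
```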

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
