emacs-bug-tracker

From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#18273: closed (sort seems to misbehave if both -u and -n or -k are used)
Date: Fri, 15 Aug 2014 19:50:03 +0000

Your message dated Fri, 15 Aug 2014 13:48:57 -0600
with message-id <address@hidden>
and subject line Re: bug#18273: sort seems to misbehave if both -u and -n or -k 
are used
has caused the debbugs.gnu.org bug report #18273,
regarding sort seems to misbehave if both -u and -n or -k are used
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
18273: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18273
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: sort seems to misbehave if both -u and -n or -k are used
Date: Fri, 15 Aug 2014 15:30:11 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Here is the case that has me thinking there is a bug (it sure doesn't
make sense as valid behaviour).

input:

Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11

OK output using 'sort':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

OK output using 'sort -u':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

OK output using 'sort -n':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

(I may have hoped that one would sort by the last number given everything
else is equal, but I did not expect it to actually do so).

OK output using 'sort -k 3':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

Weird output using 'sort -n -u':

Version: 1.0.1e-2+deb7u12

Weird output using 'sort -k 3 -u':

Version: 1.0.1e-2+deb7u12

So is this actually the expected behaviour?  I would have thought from
the documentation that -u would return unique lines of output, not just
one line based on whatever sort key it happened to look at.

-- 
Len Sorensen



--- End Message ---
--- Begin Message ---
Subject: Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used
Date: Fri, 15 Aug 2014 13:48:57 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0

tag 18273 notabug
thanks

On 08/15/2014 01:30 PM, Lennart Sorensen wrote:
> Here is the case that has me thinking there is a bug (it sure doesn't
> make sense as valid behaviour).

Thanks for the report.  However, the behavior you have demonstrated is
required by POSIX, and is therefore not a bug.  The --debug option can
be used to see what is really happening.
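
To follow along, the sample input can be saved to a scratch file first.
This is only a sketch: the file name foo is borrowed from the commands
shown in this message, and --debug is a GNU extension, so it assumes GNU
coreutils sort.

```shell
# Recreate the reporter's five input lines in a scratch file.
# The name "foo" is just the name used by the commands in this thread.
cat > foo <<'EOF'
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11
EOF

# --debug annotates each output line with the part that was used as the key.
LC_ALL=C sort --debug -n foo

# Count how many lines survive -u with and without a numeric key.
LC_ALL=C sort -u foo | wc -l
LC_ALL=C sort -n -u foo | wc -l
```

The two counts differ because plain -u compares whole lines, while -n -u
compares only the (empty) numeric key.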


> 
> OK output using 'sort -n':
> 
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u7
> 
> (I may have hoped that one would sort by the last number given everything
> else is equal, but I did not expect it to actually do so).

Actually, using -n without any other hints says to treat _the entire
line_ as a number, and to quit parsing as soon as a non-numeric portion
is found.  Observe:

$ LC_ALL=C sort foo --debug -n
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u7
^ no match for key
________________________

Furthermore, if you disable the last-resort comparison of the entire
line, then you get the input order, since all of your keys were
identically the empty numeric string at the front of the line:

$ LC_ALL=C sort foo --debug -n -s
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u7
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key

> 
> OK output using 'sort -k 3':
> 
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u7

Umm, here, you don't HAVE a field 3: each line has only two
blank-separated fields, so -k 3 selects an empty key.  Again, as soon as
you disable last-resort comparison, you get the original input order:

$ LC_ALL=C sort foo --debug -k3 -s
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u12
                         ^ no match for key
Version: 1.0.1e-2+deb7u11
                         ^ no match for key
Version: 1.0.1e-2+deb7u12
                         ^ no match for key
Version: 1.0.1e-2+deb7u7
                        ^ no match for key
Version: 1.0.1e-2+deb7u11
                         ^ no match for key

> 
> Weird output using 'sort -n -u':
> 
> Version: 1.0.1e-2+deb7u12

No, perfectly defined output.  -u implicitly enables -s, and I already
demonstrated that -n on your input picks the initial empty string.
Since all 5 lines have the same sort key, there is only one unique key
seen, and the output is exactly the first line with that unique sort
key.  If you want to FORCE entire-line fallback, then request that as a
fallback key (since -n by itself is global to all keys, I instead
request two keys: the first as the numeric sort of the first field, the
second as the fallback sort of the entire line):

$ LC_ALL=C sort foo --debug -k1,1n -k1 -u
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u7
^ no match for key
________________________


> 
> Weird output using 'sort -k 3 -u':
> 
> Version: 1.0.1e-2+deb7u12

Again, as proven above, all 5 lines have the same empty string (no such
key at the end of the line), so the unique output is correct.

> 
> So is this actually the expected behaviour?  I would have thought from
> the documentation that -u would return unique lines of output, not just
> one line based on whatever sort key it happened to look at.

Yes, sort -u is required to treat lines as unique solely based on the
key(s) they were sorted by (and ignoring the default last-resort key,
since -u implicitly enables -s).
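
A minimal sketch of that rule on made-up input (the file name pairs.txt
and its three lines are illustrative assumptions, not part of the
report).  Field 2 is the numeric key, and two of the lines share it:

```shell
# Illustrative input: 'b 1' and 'a 1' have the same numeric key in field 2.
printf '%s\n' 'b 1' 'a 1' 'c 2' > pairs.txt

# Key-only uniqueness: one line is kept per distinct value of field 2.
by_key=$(LC_ALL=C sort -k2,2n -u pairs.txt)

# Adding the whole line as a second key lets -u tell 'a 1' and 'b 1' apart.
by_line=$(LC_ALL=C sort -k2,2n -k1 -u pairs.txt)

printf '%s\n' "$by_key" | wc -l    # two lines remain
printf '%s\n' "$by_line" | wc -l   # all three lines remain
```

The only difference between the two invocations is the extra -k1, which
is what turns "unique by key" into "unique by line".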

As this behavior is required by POSIX and consistent with other
implementations, I'm closing it as not a bug.  But if you have further
comments or questions, you can continue to reply to this email.

By the way, have you looked at sort -V, as a way to get what you appear
to want?

$ LC_ALL=C sort foo --debug -V -u
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u7
________________________
Version: 1.0.1e-2+deb7u11
_________________________
Version: 1.0.1e-2+deb7u12
_________________________
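
Since -V is a GNU extension, here is a hedged alternative that uses only
POSIX options.  It assumes the letter 'u' occurs exactly once per line,
right before the trailing number, which holds for this input but is an
assumption about the data, not a general rule:

```shell
# Split each line at 'u' so that field 2 is the trailing revision number,
# sort numerically on it, and use the whole line as a second key so that
# -u only drops lines that are identical in full.
sorted=$(printf '%s\n' \
  'Version: 1.0.1e-2+deb7u12' \
  'Version: 1.0.1e-2+deb7u11' \
  'Version: 1.0.1e-2+deb7u12' \
  'Version: 1.0.1e-2+deb7u7' \
  'Version: 1.0.1e-2+deb7u11' |
  LC_ALL=C sort -t u -k2,2n -k1 -u)

printf '%s\n' "$sorted"
# Version: 1.0.1e-2+deb7u7
# Version: 1.0.1e-2+deb7u11
# Version: 1.0.1e-2+deb7u12
```

Unlike -V, this compares only the number after 'u' numerically; the rest
of the version string is compared as plain bytes.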

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--- End Message ---
