emacs-bug-tracker
[debbugs-tracker] bug#35636: closed (bug report sort command)


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#35636: closed (bug report sort command)
Date: Wed, 08 May 2019 14:43:02 +0000

Your message dated Wed, 8 May 2019 09:41:58 -0500
with message-id <address@hidden>
and subject line Re: bug#35636: bug report sort command
has caused the debbugs.gnu.org bug report #35636,
regarding bug report sort command
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
35636: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=35636
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: bug report sort command
Date: Wed, 8 May 2019 10:35:01 +0200

I verified the following bug is there in:
  • sort (GNU coreutils) 8.21
  • sort (GNU coreutils) 8.22
  • sort (GNU coreutils) 8.23
Input file:
# cat sort.in
1|a|x
2|b|x
3|aa|x
4|bb|x
5|c|x

shell command and output:
# sort -t'|' -k2 <sort.in
3|aa|x
1|a|x
4|bb|x
2|b|x
5|c|x

I expected key "a" to come before key "aa" and key "b" to come before key "bb".



--- End Message ---
--- Begin Message ---
Subject: Re: bug#35636: bug report sort command
Date: Wed, 8 May 2019 09:41:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

tag 35636 notabug
thanks

On 5/8/19 3:35 AM, Michele Liberi wrote:
> I verified the following bug is there in:
> 
>    - sort (GNU coreutils) 8.21
>    - sort (GNU coreutils) 8.22
>    - sort (GNU coreutils) 8.23
> 
> *Input file:*
> # cat sort.in
> 1|a|x
> 2|b|x
> 3|aa|x
> 4|bb|x
> 5|c|x
> 
> 
> *shell command and output:*
> # sort -t'|' -k2 <sort.in
> 3|aa|x
> 1|a|x
> 4|bb|x
> 2|b|x
> 5|c|x

Let's use --debug to see what sort really did:

$ sort --debug -t'|' -k2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____


Since you did not specify an ending field, the key runs from field 2 to
the end of the line, so you are comparing the string "aa|x" with "a|x",
and the string "a|x" with "bb|x". In the en_US.UTF-8 locale, punctuation
is ignored on the first-order pass through strcoll(), which means you
are effectively comparing "aax" with "ax", and "ax" with "bbx", so the
sort is correct; but even in a locale that does not ignore punctuation:

$ LC_ALL=C sort --debug -t'|' -k2 <sort.in
sort: using simple byte comparison
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____

the sort is still correct, since ASCII '|' (0x7C) sorts after ASCII 'a'
(0x61), which puts "aa|x" before "a|x". Your real problem is that you
are sorting on too much data; you need to try again with the key limited
to exactly the second field:

$ sort --debug -t'|' -k2,2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
1|a|x
  _
_____
3|aa|x
  __
______
2|b|x
  _
_____
4|bb|x
  __
______
5|c|x
  _
_____

where now sort can see that "a" is a prefix of "aa", because the key no
longer bleeds into the rest of the line.
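
To make the difference between the two key specifications concrete, the
text that each one hands to the comparison can be approximated with cut.
This is only a rough sketch: cut is standing in for sort's own field
splitting, and LC_ALL=C is assumed so that the result does not depend on
which locales are installed.

```shell
# Recreate the input file from the report.
printf '%s\n' '1|a|x' '2|b|x' '3|aa|x' '4|bb|x' '5|c|x' > sort.in

# -k2 means "field 2 through the end of the line": a|x, b|x, aa|x, ...
cut -d'|' -f2- sort.in

# -k2,2 means "field 2 only": a, b, aa, bb, c
cut -d'|' -f2 sort.in

# Byte values that LC_ALL=C compares: 'a' is 97 and '|' is 124.
printf '%d %d\n' "'a" "'|"

# Open-ended key, byte comparison: "aa|x" sorts before "a|x".
LC_ALL=C sort -t'|' -k2 sort.in

# Key limited to field 2: "a" sorts before "aa".
LC_ALL=C sort -t'|' -k2,2 sort.in
```

The first sort prints the lines in the order 3, 1, 4, 2, 5, and the
second in the order 1, 3, 2, 4, 5.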


> 
> *I expected key "a" to come before key "aa" and key "b" to come before
> key "bb".*

Your expectations are at odds with your incomplete command line.  sort
is behaving as required; therefore, I'm closing this as not a bug. But
feel free to reply if you have further questions.
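
If the sort runs from a script, one way to guard against this kind of
surprise is to pin both the key bounds and the locale, and then let
sort -c confirm the result. This is only a sketch: it relies on the
standard -c (check) option, and the file name sorted.out is purely
illustrative.

```shell
# Recreate the input file from the report.
printf '%s\n' '1|a|x' '2|b|x' '3|aa|x' '4|bb|x' '5|c|x' > sort.in

# Sort on field 2 only, byte-wise, so the order is the same in every locale.
LC_ALL=C sort -t'|' -k2,2 sort.in > sorted.out

# -c only checks; it exits 0 when the input is already in order for this key.
if LC_ALL=C sort -c -t'|' -k2,2 sorted.out; then
    echo "sorted.out is sorted on field 2"
fi

# The original file is not in that order ("b" precedes "aa"), so -c fails.
if ! LC_ALL=C sort -c -t'|' -k2,2 sort.in 2>/dev/null; then
    echo "sort.in is not sorted on field 2"
fi
```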

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



--- End Message ---
