bug-coreutils
bug#29044: sort --debug results improvement


From: Assaf Gordon
Subject: bug#29044: sort --debug results improvement
Date: Sat, 28 Oct 2017 21:06:01 -0600

tag 29044 notabug
close 29044
thanks

Hello,

There are a few issues at hand. Answering out of order:

> $ sort -k 2n -k 3n --debug file.txt
[...]
> Also the user is confused if
> ________________
> is a "key 3", or just a separator.
>
> Therefore please say
> ": key 1" or "1" etc. at the end of each of them.
> This is also important if there many keys.
>
> And add a separator bar, made of -, =, etc. but not _.

This is indeed a third key: by default, sort falls back to a
'last resort' comparison of the entire line.
It is not a separator.

It is used to sort lines for which the specified keys are equal.
It can be disabled with the "-s/--stable" option.

Consider the following:

Case 1: The first key is equal ("A" in both lines).
Sort then uses the last resort sorting and compares the entire
lines, making "A B" appear first:

  $ printf "%s\n" "A C" "A B" | sort --debug -k1,1
  A B
  _
  ___
  A C
  _
  ___


Case 2: Using "-s" disables the last-resort comparison, and lines with
equal keys are printed in the same order as in the input (hence "stable"):

  $ printf "%s\n" "A C" "A B" | sort --debug -k1,1 -s
  A C
  _
  A B
  _
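
To check whether the last-resort comparison changes the result for a
given input, the two outputs can be compared without "--debug". This is
a minimal sketch reusing the two sample lines above; the helper names
sort_default and sort_stable are only illustrative, and LC_ALL=C is set
so the result does not depend on the local collation rules:

```shell
# Sort stdin on field 1 only, with and without the last-resort comparison.
# sort_default / sort_stable are illustrative helper names.
sort_default() { LC_ALL=C sort -k1,1; }
sort_stable()  { LC_ALL=C sort -s -k1,1; }

printf '%s\n' 'A C' 'A B' | sort_default   # ties broken by whole-line comparison
printf '%s\n' 'A C' 'A B' | sort_stable    # ties keep input order
```

If both commands print the same thing for your data, the extra
underline shown by "--debug" is not affecting the order.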




On 2017-10-28 11:26 AM, Dan Jacobson wrote:
> $ sort -k 2n -k 3n --debug file.txt
> sort: using simple byte comparison
> sort: key 1 is numeric and spans multiple fields
> sort: key 2 is numeric and spans multiple fields
> 41 011 92.3 亞太
>     ___
>         ____
> ________________
> 41 011 97.1 大漢
>     ___
>         ____
>
> OK but they look like they only span one field.

'sort --debug' will indicate the *actual* characters
that were used for the comparison.
In case of "-n" (numeric sort), the conversion to a numeric value
stopped at the space character, and it is indicated so.

This has nothing to do with the fact that the key specification
spans multiple fields for a single numeric key.
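
The notice refers to the key specification itself: "-k 2n" starts at
field 2 and runs to the end of the line. If the intent is to compare
field 2 alone and then field 3 alone, each key can be given an explicit
end field. This is a sketch under assumptions: the sample lines are made
up (placeholder text in the last column), and sort_fields_2_3 is a
hypothetical helper name. With GNU sort, keys bounded this way should no
longer trigger the "spans multiple fields" notice under "--debug".

```shell
# Bound each numeric key to a single field: -k2,2n and -k3,3n.
# -s drops the last-resort whole-line comparison; LC_ALL=C pins the locale.
sort_fields_2_3() { LC_ALL=C sort -s -k2,2n -k3,3n; }

# Field 2 is equal in both lines, so field 3 (92.3 vs 97.1) decides the order.
printf '%s\n' '41 011 97.1 x' '41 011 92.3 y' | sort_fields_2_3
```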


Consider the following cases (I'm using "-s" for all cases to
reduce clutter, it doesn't change the meaning):

Case 1: Because we used alphanumeric sorting order (the default),
all the characters up to the first space are marked by "--debug":

  $ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -k1,1 --debug -s
  11A A
  ___
  33 C
  __
  4e4D D
  ____


Case 2: with numeric sorting, only the digits are marked:

  $ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -k1n,1 --debug -s
  4e4D D
  _
  11A A
  __
  33 C
  __


Case 3: with "-g" (general numeric sort, which can parse scientific
notation), the "4e4" is parsed and parsing stops at the "D" character:

  $ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -s -k1g,1 --debug
  11A A
  __
  33 C
  __
  4e4D D
  ___
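
The difference between "-n" and "-g" affects the resulting order, not
only the underlining. A small sketch with two made-up values, "1e3" and
"200"; the helper names sort_n and sort_g are illustrative only:

```shell
# -n reads only the leading digits ("1" from "1e3"); -g parses the exponent too.
sort_n() { LC_ALL=C sort -s -k1n,1; }
sort_g() { LC_ALL=C sort -s -k1g,1; }

printf '%s\n' '1e3' '200' | sort_n   # 1e3 is treated as 1, so it sorts first
printf '%s\n' '1e3' '200' | sort_g   # 1e3 is treated as 1000, so it sorts last
```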



> Also the Info documentation doesn't mention how to influence
> "sort: using simple byte comparison"
> which seems to always be printed when using --debug no matter what.

This message indicates you are sorting in the C/POSIX locale.
Perhaps it is the default locale on your system?

"sort --debug" will always print the sorting rules, e.g.:

  $ LC_ALL=en_CA.UTF-8 sort --debug < /dev/null
  sort: using ‘en_CA.UTF-8’ sorting rules

  $ LC_ALL=C sort --debug < /dev/null
  sort: using simple byte comparison
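
The locale changes the ordering as well as the message. As a rough
illustration, in the C locale the comparison is byte-wise, so uppercase
ASCII letters sort before lowercase ones. The helper name sort_bytes and
the sample letters are made up for this sketch:

```shell
# In the C locale, comparison is byte-wise, so 'A' (0x41) and 'B' (0x42)
# sort before 'a' (0x61) and 'b' (0x62).
sort_bytes() { LC_ALL=C sort; }

printf '%s\n' 'b' 'A' 'a' 'B' | sort_bytes
```

Under another locale (for example en_CA.UTF-8, if it is installed on
your system) the same input may come out in a different order; running
the command with "--debug" shows which rules are in effect.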





As such,
I'm marking this item as not-a-bug and closing it, but discussion can continue by replying to this thread.

regards,
 - assaf








