bug#29044: sort --debug results improvement
From: Assaf Gordon
Subject: bug#29044: sort --debug results improvement
Date: Sat, 28 Oct 2017 21:06:01 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
tag 29044 notabug
close 29044
thanks
Hello,
There are a few issues at hand. Answering out of order:
> $ sort -k 2n -k 3n --debug file.txt
[...]
> Also the user is confused if
> ________________
> is a "key 3", or just a separator.
>
> Therefore please say
> ": key 1" or "1" etc. at the end of each of them.
> This is also important if there many keys.
>
> And add a separator bar, made of -, =, etc. but not _.
This is indeed a 3rd key: it is the default behavior
of the 'last resort' sorting by the entire line.
It is not a separator.
It is used to sort lines for which the specified keys are equal.
It can be disabled with the "-s/--stable" option.
Consider the following:
Case 1: The first key is equal ("A" in both lines).
Sort then uses the last resort sorting and compares the entire
lines, making "A B" appear first:
$ printf "%s\n" "A C" "A B" | sort --debug -k1,1
A B
_
___
A C
_
___
Case 2: Using "-s" disables the last-resort comparison, and lines with
equal keys are printed in the same order as in the input (hence "stable"):
$ printf "%s\n" "A C" "A B" | sort --debug -k1,1 -s
A C
_
A B
_
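The same difference can be checked without "--debug". This is a minimal
sketch, assuming GNU sort and a POSIX shell; the variable names are only
for illustration:

```shell
# Two lines whose first field is identical.
input=$(printf '%s\n' 'A C' 'A B')

# Default: ties on key 1 fall back to comparing the entire line.
with_last_resort=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1)

# -s/--stable: ties keep their input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1 -s)

printf 'default:\n%s\n\nstable:\n%s\n' "$with_last_resort" "$stable"
```

The default run prints "A B" before "A C"; the stable run keeps "A C" first.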
On 2017-10-28 11:26 AM, Dan Jacobson wrote:
> $ sort -k 2n -k 3n --debug file.txt
> sort: using simple byte comparison
> sort: key 1 is numeric and spans multiple fields
> sort: key 2 is numeric and spans multiple fields
> 41 011 92.3 亞太
>    ___
>        ____
> ________________
> 41 011 97.1 大漢
>    ___
>        ____
> OK but they look like they only span one field.
'sort --debug' underlines the *actual* characters
that were used for the comparison.
With "-n" (numeric sort), the conversion to a numeric value
stops at the first character that cannot be part of the number (here,
the space), and the underline ends there.
The "spans multiple fields" message is about the key specification
instead: "-k 2n" gives no end position, so the key extends from field 2
to the end of the line, regardless of how many characters are used.
Consider the following cases (I'm using "-s" in all of them to
reduce clutter; it doesn't change the meaning):
Case 1: Because we used the default (non-numeric) sorting order,
all the characters up to the first space are marked by "--debug":
$ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -k1,1 --debug -s
11A A
___
33 C
__
4e4D D
____
Case 2: With numeric sorting, only the leading digits that were
converted to a number are marked:
$ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -k1n,1 --debug -s
4e4D D
_
11A A
__
33 C
__
Case 3: With "-g" (general numeric sort, which can parse scientific
notation), "4e4" is parsed as a number and parsing stops at the "D" character:
$ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -s -k1g,1 --debug
11A A
__
33 C
__
4e4D D
___
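To make each numeric key cover exactly one field, give it an end
position ("-k 2n,2" instead of "-k 2n"). The sketch below assumes GNU
sort and uses made-up sample data modelled on the report; because numeric
conversion stops at the first non-numeric character either way, both
forms give the same order here:

```shell
# Hypothetical sample data: field 2 is equal, field 3 differs.
data=$(printf '%s\n' '41 011 97.1 x' '41 011 92.3 y')

# Open-ended keys: each key runs from its start field to the end of the line.
open=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 2n -k 3n)

# Closed keys: each key covers a single field.
closed=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 2n,2 -k 3n,3)

printf '%s\n' "$closed"
[ "$open" = "$closed" ] && echo "same order"
```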
> Also the Info documentation doesn't mention how to influence
> "sort: using simple byte comparison"
> which seems to always be printed when using --debug no matter what.
This message indicates that you are sorting in the C/POSIX locale.
Perhaps it is the default locale on your system?
"sort --debug" will always print the sorting rules, e.g.:
$ LC_ALL=en_CA.UTF-8 sort --debug < /dev/null
sort: using ‘en_CA.UTF-8’ sorting rules
$ LC_ALL=C sort --debug < /dev/null
sort: using simple byte comparison
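As a small illustration of what "simple byte comparison" means in
practice (a sketch, assuming an ASCII-compatible system): in the C locale
lines are ordered by byte value, so all uppercase ASCII letters sort
before any lowercase letter.

```shell
# LC_ALL=C forces byte-wise comparison regardless of the system default.
c_sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

This prints A, B, a, b, one per line. Under another locale the order may
differ, depending on that locale's collation rules.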
As such,
I'm marking this item as not-a-bug and closing it, but discussion can
continue by replying to this thread.
regards,
- assaf