bug#23556: sort(1): misleading description of option -n
From: Assaf Gordon
Subject: bug#23556: sort(1): misleading description of option -n
Date: Mon, 16 May 2016 15:07:59 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
Hello Carsten,
On 05/14/2016 10:17 AM, Carsten Hey wrote:
> the man page sort(1) contains a misleading description of the option -n:
> [...]
> $ man sort | grep -A1 -- --numeric-sort | sed -n -e 's/^ *//' -e '1!p'
> compare according to string numerical value
> [...]
> This description reads as if this command:
> $ printf '%s\n' 'x 9' 'x 10' | sort -n
> x 10
> x 9
> [...]
> but instead, -n stops doing its magic after finding the first
> non-numeric, non-whitespace character. There is a short and simple
> way to summarize this behaviour.
IIUC, you are disputing the accuracy (or clarity) of the term "string numerical value" on the manual page, and not the actual behavior of "sort -n", which is mandated by POSIX and has been this way for many, many years (as opposed to "sort -V", which was only introduced as a GNU extension in coreutils version 7.0 in 2008).
The description says "string numerical value", which (to me) does not mean anything other than "numeric value" (implying that letters will not be sorted properly), but opinions clearly differ.
Using the "--debug" option would immediately reveal that neither line has a numeric key:
$ printf '%s\n' 'x 9' 'x 10' | sort --debug -n
sort: using ‘en_US.UTF-8’ sorting rules
x 10
^ no match for key
____
x 9
^ no match for key
___
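For the record, the order above comes from the last-resort comparison: both keys are empty, so both compare as 0, and GNU sort then compares the whole lines (which is what the underscores under each full line in the --debug output indicate), and "x 10" sorts before "x 9". A sketch of two ways around it, assuming the number of interest is always the second blank-separated field: restrict the key with -k, or disable the last-resort comparison with -s so that equal keys keep their input order.

```shell
# Use only the second field as a numeric key. The field includes its
# leading blank, which -n skips.
printf '%s\n' 'x 9' 'x 10' | sort -k2,2n
# prints "x 9" then "x 10"

# Alternatively, keep -n on the whole line but make the sort stable (-s),
# which disables the last-resort comparison: equal keys keep input order.
printf '%s\n' 'x 9' 'x 10' | sort -s -n
# prints "x 9" then "x 10" (here only because that was the input order)
```

Adding --debug to either command highlights the part of each line actually used for sorting.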
If you have a suggestion for improved wording, I'm sure it can be considered for inclusion.
A patch against the function usage() in sort.c would go even further.
Note that, unlike on FreeBSD/OpenBSD, the description in the GNU man page is derived from "sort --help", and is thus kept brief.
For completeness, here are similar descriptions of "sort -n" from other sources:
POSIX says (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html):
-n Restrict the sort key to an initial numeric string, consisting of
optional <blank> characters, optional minus-sign, and zero or more
digits with an optional radix character and thousands separators (as
defined in the current locale), which shall be sorted by arithmetic
value. An empty digit string shall be treated as zero. Leading zeros
and signs on zeros shall not affect ordering.
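A quick way to see these rules in action (a sketch; the sample lines are made up, and LC_ALL=C is set only to make the result independent of the local locale):

```shell
# The key is the initial numeric string; the rest of the line plays no
# part in the numeric comparison.
printf '%s\n' '10 apples' '9 pears' '-3 figs' | LC_ALL=C sort -n
# prints "-3 figs", "9 pears", "10 apples" (one per line)

# An empty digit string counts as zero, so 'x' sorts between -1 and 1.
printf '%s\n' '1' 'x' '-1' | LC_ALL=C sort -n
# prints "-1", "x", "1"

# Leading zeros do not affect ordering: '007' compares as 7.
printf '%s\n' '007' '5' '12' | LC_ALL=C sort -n
# prints "5", "007", "12"
```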
The GNU Coreutils manual (which is the official documentation, not the man page) says (http://www.gnu.org/software/coreutils/manual/coreutils.html#sort-invocation):
-n
--numeric-sort
--sort=numeric
Sort numerically. The number begins each line and consists of optional
blanks, an optional ‘-’ sign, and zero or more digits possibly separated
by thousands separators, optionally followed by a decimal-point
character and zero or more digits. An empty number is treated as ‘0’.
The LC_NUMERIC locale specifies the decimal-point character and
thousands separator. By default a blank is a space or a tab, but the
LC_CTYPE locale can change this.
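The LC_NUMERIC part can be checked the same way. A minimal sketch, assuming GNU sort and the C locale, which defines '.' as the decimal point and no thousands separator:

```shell
# C locale: no thousands separator, so parsing of '1,000' stops at the
# comma, the key is 1, and it sorts before 999.
printf '%s\n' '1,000' '999' | LC_ALL=C sort -n
# prints "1,000" then "999"

# 2.10 is the number 2.1, so under -n it sorts before 2.5 ...
printf '%s\n' '2.5' '2.10' | LC_ALL=C sort -n
# prints "2.10" then "2.5"

# ... whereas version sort (-V) compares 10 and 5 as separate integers.
printf '%s\n' '2.5' '2.10' | LC_ALL=C sort -V
# prints "2.5" then "2.10"
```

In a locale whose thousands separator is ',' (en_US.UTF-8, for instance, where installed), the first command should put 999 before 1,000 instead.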
OpenBSD's man page has:
-n, --numeric-sort, --sort=numeric
An initial numeric string, consisting of optional blank space,
optional minus sign, and zero or more digits (including decimal
point) is sorted by arithmetic value. Leading blank characters
are ignored.
FreeBSD's man page has:
-n, --numeric-sort, --sort=numeric
Sort fields numerically by arithmetic value. Fields are supposed
to have optional blanks in the beginning, an optional minus sign,
zero or more digits (including decimal point and possible thousand
separators).
I'm leaving the bug open; other comments and feedback are welcome.
regards,
- assaf