bug-coreutils
bug#51011: [PATCH] sort: --debug: add warnings about radix and grouping


From: Pádraig Brady
Subject: bug#51011: [PATCH] sort: --debug: add warnings about radix and grouping chars
Date: Sun, 10 Oct 2021 18:57:57 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0

On 09/10/2021 23:29, Paul Eggert wrote:
> On 10/9/21 5:00 AM, Pádraig Brady wrote:
>> On 09/10/2021 04:48, Paul Eggert wrote:
>>>
>>> 'sort' could determine the group sizes from the locale, and
>>> reject digit strings that are formatted improperly according to the
>>> group-size rules. (Not that I plan to write the code to do that....)
>>
>> Yes, I agree that would be better, but not worth it I think,
>> as there would still be ambiguity in what was a grouping char
>> and what was a field separator. Also that ambiguity would
>> now vary across locales.
>
> I don't see the ambiguity problem. The field separator is used to
> identify fields; once the fields are identified, the thousands
> separator, decimal point, etc. contribute to numeric comparison in the
> usual way. So it's OK (albeit confusing) for the field separator to be
> '.' or ',' or '-' or '0' or any other character that could be part of
> a number.
>
> For example, with 'sort -t 0 -k 2,2n', the digit 0 is not part of the
> numeric field that is compared, and there's no ambiguity about that even
> though 0 is allowed in numbers. The same idea applies to 'sort -t , -k
> 2,2n'.

Indeed. I dropped -t, from my later examples and confused myself.
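
To make the -t point concrete, here is a minimal sketch of the two commands mentioned above. The field separator is consumed when the line is split into fields, so it never reaches the numeric comparison. The function names are hypothetical helpers for illustration only, and LC_ALL=C is an assumption added here so the result does not depend on the caller's radix or grouping characters.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of coreutils).

# Sort on field 2 only, using the digit 0 as the field separator.
# LC_ALL=C keeps numeric parsing independent of the user's locale.
sort_field2_sep0() {
  LC_ALL=C sort -t 0 -k 2,2n
}

# Same idea with ',' as the field separator.
sort_field2_comma() {
  LC_ALL=C sort -t , -k 2,2n
}

# "105" splits into fields "1" and "5"; "903" into "9" and "3".
# Field 2 decides the order (3 < 5), so 903 sorts before 105.
printf '105\n903\n' | sort_field2_sep0

# "a,10" and "b,9" compare as 10 and 9, so b,9 sorts first.
printf 'a,10\nb,9\n' | sort_field2_comma
```

In both cases the key is limited to a single field with -k 2,2, so the separator cannot end up inside the compared number. The warnings below are about keys that are not limited this way.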

Attached is the proposed change to add appropriate warnings in this area.
Examples now diagnosed are:

  $ printf '0,9\n1,a\n' | sort -nk1 --debug -t, -s
  sort: key 1 is numeric and spans multiple fields
  sort: field separator ‘,’ is treated as a group separator in numbers
  1,a
  _
  0,9
  ___

  $ printf '1,a\n0,9\n' | LC_ALL=fr_FR.utf8 sort -gk1 --debug -t, -s
  sort: key 1 is numeric and spans multiple fields
  sort: field separator ‘,’ is treated as a decimal point in numbers
  0,9
  ___
  1,a
  __

  $ printf '1.0\n0.9\n' | sort -s -k1,1g --debug
  sort: numbers use ‘.’ as a decimal point in this locale
  0.9
  ___
  1.0
  ___

  $ printf '1.0\n0.9\n' | LC_ALL=fr_FR.utf8 sort -s -k1,1g --debug
  sort: numbers use ‘,’ as a decimal point in this locale
  0.9
  _
  1.0
  _
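
For input that is known to use '.' as the decimal point, one way to avoid the locale dependence shown in the last two examples is to pin the locale for the sort invocation. This is only a sketch under that assumption, and the function name is a hypothetical helper, not something in the patch. In the C locale the decimal point is '.' and there is no grouping character.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): general-numeric sort on
# field 1 with the locale pinned to C, where the decimal point is '.'
# and there is no grouping character.
sort_g_c_locale() {
  LC_ALL=C sort -s -k1,1g
}

# Both lines are parsed in full (1.0 and 0.9), whatever LANG or LC_*
# values the caller has set, so 0.9 sorts before 1.0.
printf '1.0\n0.9\n' | sort_g_c_locale
```

Adding --debug to the same command is a quick way to check which characters of each key were actually used in the comparison.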

cheers,
Pádraig

Attachment: sort--debug-radix.patch
Description: Text Data

