bug-coreutils

bug#23677: sort --debug not ignoring punctuation when sort does


From: Karl Berry
Subject: bug#23677: sort --debug not ignoring punctuation when sort does
Date: Wed, 1 Jun 2016 22:14:48 GMT

Consider this two-line input file:
M !z
M /a
(! = ASCII 33; / = ASCII 47.)

Locale-dependent sort with debug:
LC_ALL=en_US.UTF-8 sort --debug -k2 /tmp/foo 

Output:
sort: using ‘en_US.UTF-8’ sorting rules
..
M /a
 ___
____
M !z
 ___
____

Presumably the locale rules cause the punctuation characters to be
ignored; otherwise ! would sort before / (as it does with the LC_ALL=C
sort).  Therefore it seems the debug output would be closer to reality
if it were:

M /a
 _ _
____
M !z
 _ _
____

(I think; I'm not sure if all blanks are ignored in the locale
sort, or just multiple blanks collapsed to one.)
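
For anyone wanting to reproduce the comparison, here is a minimal
sketch. It assumes a POSIX shell with mktemp available, and it uses a
scratch file in place of /tmp/foo; the variable names are only
illustrative. The second sort call depends on en_US.UTF-8 being
installed, so no particular result is claimed for it here.

```shell
# Scratch file standing in for /tmp/foo from the report (an assumption).
tmp=$(mktemp)
printf 'M !z\nM /a\n' > "$tmp"

# C locale: plain byte comparison of the key, which runs from field 2
# to end of line and includes the leading blank (" !z" vs " /a").
# '!' (33) is less than '/' (47), so "M !z" comes out first.
c_out=$(LC_ALL=C sort -k2 "$tmp")
printf 'C locale:\n%s\n' "$c_out"

# Locale-dependent sort: the order depends on the collation rules of
# the installed locale, and falls back if the locale is missing.
loc_out=$(LC_ALL=en_US.UTF-8 sort -k2 "$tmp" 2>/dev/null)
printf 'en_US.UTF-8:\n%s\n' "$loc_out"

rm -f "$tmp"
```

If the two listings differ in order, that difference is the effect
described above: the key text is the same in both runs, but the
characters that actually decide the comparison are not.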

I realize that, in terms of mere string parsing, the punctuation is
included in the sort key.  But when a character is not actually used for
sorting, and the --debug output says it is, that seems suboptimal.
(Especially when the rules are, for all practical purposes,
undocumented.)

I also realize it is not necessarily feasible to change, even if there's
agreement on changing it.

@curmudgeon
How anyone can do anything useful with en_US.UTF-8 sort is beyond me ...
@end curmudgeon

Ok, no more from me in this area, you can be glad to know. --karl





