bug#23677: sort --debug not ignoring punctuation when sort does
From: Karl Berry
Subject: bug#23677: sort --debug not ignoring punctuation when sort does
Date: Wed, 1 Jun 2016 22:14:48 GMT
Consider this two-line input file:
M !z
M /a
(! = ASCII 33; / = ASCII 47.)
Locale-dependent sort with debug:
LC_ALL=en_US.UTF-8 sort --debug -k2 /tmp/foo
Output:
sort: using ‘en_US.UTF-8’ sorting rules
..
M /a
___
____
M !z
___
____
Due to the locale rules, the punctuation characters are (presumably)
being ignored; otherwise ! would sort before / (as it does with the
LC_ALL=C sort). It therefore seems the debug output would be closer to
reality if it were:
M /a
_ _
____
M !z
_ _
____
(I think; I'm not sure if all blanks are ignored in the locale
sort, or just multiple blanks collapsed to one.)
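For anyone wanting to reproduce the comparison without creating /tmp/foo, here is a minimal sketch. The helper name sort_sample is made up for illustration, and the en_US.UTF-8 run is guarded because that locale may not be installed on a given system. Only the LC_ALL=C ordering is predictable from the byte values; the locale-aware ordering depends on the system's collation tables.

```shell
# Hypothetical helper: sort the two-line sample on field 2 under the
# locale named in $1.
sort_sample() {
    printf 'M !z\nM /a\n' | LC_ALL="$1" sort -k2
}

# Byte-order comparison: '!' (33) sorts before '/' (47),
# so "M !z" comes out first.
sort_sample C

# Locale-aware comparison, run only if the locale is available.
# (Assumption: locale -a lists it as en_US.utf8 or en_US.UTF-8.)
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    sort_sample en_US.UTF-8
fi
```

With GNU sort, adding --debug to the sort invocation in the helper shows the underlined key spans discussed above.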
I realize that, in terms of mere string parsing, the punctuation is
included in the sort key. But when a character is not actually used for
sorting, and the --debug output says it is, that seems suboptimal.
(Especially when the rules are, for all practical purposes,
undocumented.)
I also realize it is not necessarily feasible to change, even if there's
agreement on changing it.
@curmudgeon
How anyone can do anything useful with en_US.UTF-8 sort is beyond me ...
@end curmudgeon
Ok, no more from me in this area, you can be glad to know. --karl