bug-coreutils

bug#17188: Sort bugs


From: Bob Proulx
Subject: bug#17188: Sort bugs
Date: Sat, 5 Apr 2014 14:23:29 -0600
User-agent: Mutt/1.5.23 (2014-03-12)

Nikos Balkanas wrote:
> Eric Blake wrote:
> > See the FAQ:
> >
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
> >
> From that link:
> "So far there is still no fully satisfactory solution to this problem. If
> you find one then please contact me so that this information can be listed."
> 
> If you are "me", then I would like to suggest that you make default
> the legacy sort behaviour, and add with -c the locale support that
> standards and non-English users ask for.

When I wrote that I did mean within the confines of continuing to
conform to the standards.  :-)

> UI is still a bug, though not a code bug. And legacy UI compatibility is
> broken.

Actually no.  If you were really using the legacy UI then you would be
using the legacy locale setting LC_ALL=C too.  If you aren't then you
aren't using the legacy UI.

> However, I am perfectly satisfied with your fast and long
> explanation of what the status is.
> You will, however, go crazy if you respond like that to every user with a
> locale sorting issue.

I usually rant:

You don't like it and I don't like it but the-powers-that-be have
confused working with data on a computer with talking about working
with data on a computer.  They have decided that the collation
ordering (sort ordering) for data should be dictionary ordering.  In
dictionary ordering case is folded together and punctuation is
ignored.  By having LANG set to any of the "en" locales the system is
instructed to use dictionary sort ordering.  This affects almost
everything on the system that sorts.  This includes commands such as
'ls' and also your shell (e.g. 'echo *').
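
A minimal sketch of the difference, assuming a POSIX shell and a sort
that honors the locale.  The sample data is made up for illustration,
and the second half only shows locale ordering if en_US.UTF-8 is
actually installed on the machine; otherwise sort typically falls back
to C ordering.

```shell
# Made-up sample data mixing upper case, lower case, and punctuation.
sample=$(printf '%s\n' b A a B _c)

# C locale: lines are compared by byte value, so upper case letters come
# first, then '_' (0x5F), then lower case letters.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf 'C locale:\n%s\n' "$c_sorted"

# Locale ordering.  The result depends on the installed locale data, so
# no particular output is claimed here.
en_sorted=$(printf '%s\n' "$sample" | LC_ALL=en_US.UTF-8 sort 2>/dev/null)
printf 'en_US.UTF-8 locale:\n%s\n' "$en_sorted"
```

Setting the variable on the command line like this changes the locale
for that one command only and leaves the rest of the environment alone.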

> Can't you make default LOCALE=C for sorting and allow users to
> change that to the system settings using -c when they need it?

Actually no we can't.  That would break the opposite side of things
where people rely upon dictionary sorting based upon their chosen
locale setting.  After all of these years that would be equally bad in
the opposite way.

I am going to say "you" here but please don't take this as hostility.
It is a bad word in text email.  But I am really just trying to put
down the facts of the case.

Originally the locale was C.  If you go back to the C locale things
will be working for you as you wish it to work.  It will work as it
worked before.  Agreed?

Then you changed something.  You changed the locale.  You in your
environment set LANG=en_US.UTF-8 (or similar equivalent).  That is
when you notice that sort doesn't work as you want it to work.

Now you might say that you personally didn't make that choice but your
system vendor did.  It happened when you switched to a new machine
running a newer system or something.  Okay.  But you chose that system
vendor.  You could choose a different system vendor.  Or choose to go
back to the previous system with the previous LANG=C locale.  Or
choose to configure the new system as you wish it.  You are in control
of it.

As pilots we have a saying, "Fly the airplane.  Don't let the
airplane fly you." :-)

You could file bugs with your system vendor that they defaulted you to
LANG=en_US.UTF-8 and ask them to allow users to choose LANG=C at
install time instead.  I have done this and unfortunately the response
from one vendor was "That was intentional." with the bug closed and
locked against further comment.  The door slammed in my face.  I am
now using a different software distribution.

> Nowadays users use other graphical tools to do sorting, sort is used
> mostly by scripts.

For you perhaps.  Not for me.  Not for many people.  I have no idea
what the survey count would be either way but it doesn't matter.
We can't make the mistake of assuming that any one environment is
important to the exclusion of all others.

But you see the problem isn't a change in sort.  The problem is a
change in locale.  Sort is behaving as it has for years and years.
What changed was the locale that most people get by default.  It used
to be that users would get LANG=C.  These days most users get
LANG=en_US.UTF-8.  With a dictionary collating sort order locale, sort
behaves undesirably to many of us.  But to others that is exactly
what they want, and so it was written into the locale.  These are two
opposing viewpoints that cannot be reconciled.

Note that this is bigger than just sort.  This affects everything on
your system.  It affects the shell.  Try "echo *" and look at the sort
ordering.  Same thing there.  The shell will sort by locale sort order.
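
The shell case can be checked with a small sketch like the following,
assuming a POSIX sh plus mktemp.  The file names and the scratch
directory are made up for the demonstration.

```shell
# Work in a scratch directory so the example touches nothing else.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a B c D

# A child shell started in the C locale expands the glob in byte order,
# so the upper case names come before the lower case ones.
c_glob=$(LC_ALL=C sh -c 'echo *')
echo "C locale glob: $c_glob"

# The same glob under whatever locale is currently in effect.  The order
# may differ, depending on the shell and the locale.
echo "Current locale glob: $(echo *)"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```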

The only way to fix it is to fix it at the source of the problem.  The
source is the locale collation sequence.  Which is why I always set
this in my environment.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C

But while that works for most western locales I have no idea how that
would interact with Chinese Big5 for example.  Probably badly.  So it
can't really be offered as a general solution to the problem.  But if
you are using one of the set of western locales that it works for then
it does solve the problem for you.
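
One thing to watch for with those two settings: LC_ALL, if set, takes
precedence over LC_COLLATE, which in turn takes precedence over LANG.
A rough way to check what is in effect, assuming the 'locale' utility
is available (the sample data is made up):

```shell
# LC_ALL would override LC_COLLATE, so make sure it is not set.
unset LC_ALL
export LANG=en_US.UTF-8
export LC_COLLATE=C

# Show the collation setting that programs will see.
locale 2>/dev/null | grep '^LC_COLLATE='

# With LC_COLLATE=C the sort order is by byte value again, while the
# other locale categories still come from LANG.
collate_sorted=$(printf '%s\n' b A a B | sort)
printf '%s\n' "$collate_sorted"
```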

I keep thinking that one of these days I should dig into it and create
my own locale.  Something like LANG=en_US.C.UTF-8 that would define a
sane sort ordering that wouldn't require LC_COLLATE=C to fix.  But
there isn't much itch to scratch there since LC_COLLATE=C does
effectively the same thing to fix the problem.  For western locales
anyway and we don't usually hear from anyone else with this problem.

Bob




