bug-coreutils

Re: sort


From: James Youngman
Subject: Re: sort
Date: Mon, 29 Aug 2005 16:38:17 +0100
User-agent: Mutt/1.5.9i

On Mon, Aug 29, 2005 at 05:10:21AM -0400, Nathan Moore wrote:

> I do not believe that the default behavior for GNU sort is what the
> man page and the info documents state.

Perhaps you have an oldish version of coreutils, because in the Info
page displayed with "info coreutils sort", the footnote says:

||    (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
|| `en_US'), then `sort' may produce output that is sorted differently
|| than you're accustomed to.  In that case, set the `LC_ALL'
|| environment variable to `C'.  Note that setting only `LC_COLLATE'
|| has two problems.  First, it is ineffective if `LC_ALL' is also
|| set.  Second, it has undefined behavior if `LC_CTYPE' (or `LANG',
|| if `LC_CTYPE' is unset) is set to an incompatible value.  For
|| example, you get undefined behavior if `LC_CTYPE' is `ja_JP.PCK'
|| but `LC_COLLATE' is `en_US.UTF-8'.

In other words, the default behaviour is affected by your locale
settings.  I've assumed that this is your problem.  It's explained in
more detail below.
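
As a rough illustration of the precedence the footnote describes, here
is a minimal sketch that reports which variable ends up deciding the
collation order.  The function name effective_collate is made up for
this example (it is not part of coreutils), and it assumes the usual
POSIX rule: LC_ALL wins if set and non-empty, then LC_COLLATE, then
LANG, and otherwise the "C" default applies.

```shell
# Hypothetical helper, not part of coreutils: print the locale variable
# that decides the collation order, assuming the usual precedence
# LC_ALL, then LC_COLLATE, then LANG, then the "C" default.
# A variable set to the empty string is treated as unset.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf 'LC_ALL=%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf 'LANG=%s\n' "$LANG"
    else
        printf 'default=C\n'
    fi
}

# Report the setting in effect for the current shell.
effective_collate
```

If this prints something other than C or POSIX, sort is probably using
that locale's collation rules rather than plain byte values.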

> Also, I have found no flags to force this behavior.
> 
> If I have a file "file.txt" containing:
> ______________________________________
> %
> #
> (
> -
> !
> +
> *
> $
> #
> )
> (
> &
> ^
> @
> ______________________________________
> 
> then shouldn't sort file.txt sort the lines by the ascii values of
> the first characters on each line, yielding output where all
> duplicated symbols are on adjacent lines?
> 
> This is not what happens.  Am I wrong in gathering that this is the
> expected behavior from the documentation or is this a bug?

It's a bit hard to figure out the best answer to your question as you
don't state what output you DO get.  You might find
http://www.catb.org/~esr/faqs/smart-questions.html a useful guide on
how to ask for help via email.  Anyway, we'll do our best to help.

When I do this, I get the following output:

address@hidden:~$ sort file2.txt
^
-
!
(
(
)
@
$
*
&
#
#
%
+

These characters are not sorted according to their ASCII values, as
the following test shows:

address@hidden:~$ sort file2.txt | tr -d '\012' | od -b
0000000 136 055 041 050 050 051 100 044 052 046 043 043 045 053
0000016

The reason is simply that this is the order I have asked for: it is
the dictionary sort order for the locale which I have selected:

address@hidden:~$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=


It's simple to override this: I can just set the LC_ALL
environment variable to something else:

address@hidden:~$ LC_ALL=C sort file2.txt
!
#
#
$
%
&
(
(
)
*
+
-
@
^
address@hidden:~$ LC_ALL=C sort file2.txt | tr -d '\012' | od -b
0000000 041 043 043 044 045 046 050 050 051 052 053 055 100 136
0000016
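
If you want to repeat the test yourself, something like the following
sketch should do.  It assumes a POSIX shell with mktemp available, and
the temporary file stands in for file.txt from your message; the
variable names tmp and sorted are just for this example.

```shell
# Recreate the 14 one-character lines from the original question in a
# temporary file (mktemp chooses the name).
tmp=$(mktemp)
printf '%s\n' '%' '#' '(' '-' '!' '+' '*' '$' '#' ')' '(' '&' '^' '@' > "$tmp"

# Sort by byte value, whatever the caller's locale settings are, and
# join the result into a single line for easy comparison.
sorted=$(LC_ALL=C sort "$tmp" | tr -d '\012')
printf '%s\n' "$sorted"

# sort -c exits non-zero if its input is not already in order, so this
# confirms that the output above is in byte order.
LC_ALL=C sort "$tmp" | LC_ALL=C sort -c && echo "byte order confirmed"

rm -f "$tmp"
```

Setting LC_ALL on the command line like this affects only that one
invocation of sort; the rest of the session keeps its locale.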

The above is a longer-winded version of this coreutils FAQ entry:
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

I'm mildly surprised that you managed to track down and subscribe to
the coreutils mailing list without coming into contact with the FAQ.

Because you didn't state exactly what result you're seeing (I assumed,
for example, that the problem wasn't that sort was crashing), I can't
be sure that this is your problem.  However, it's a reasonable guess.
If this proves not to be the answer to your question, please provide
some more detail and we'll try to help.

Regards,
James.




