bug-coreutils

Re: Bugreport 'sort'


From: Bob Proulx
Subject: Re: Bugreport 'sort'
Date: Tue, 4 Mar 2008 11:36:29 -0700
User-agent: Mutt/1.5.13 (2006-08-11)

Pádraig Brady wrote:
> As a related issue, why do we indicate in the man page and FAQ that one
> should use LC_ALL rather than LC_COLLATE to explicitly set the sort order?

Because the locale variables have an order of precedence, the
documentation would need to say: set LANG, unless LC_COLLATE is set,
in which case set LC_COLLATE, unless LC_ALL is set, in which case set
LC_ALL.  That is much longer in the usage text and much more confusing
to users than simply saying "set LC_ALL".  If they actually set LC_ALL,
we can be confident that the setting takes effect.  It is a compromise.

Also, I have heard that some combinations of character type
(LC_CTYPE) and collating sequence (LC_COLLATE) settings do not work
together.  Setting LC_ALL sets both, but setting LC_COLLATE sets only
one.  If a user has an incompatible LC_CTYPE, then setting LC_ALL=C is
okay but setting just LC_COLLATE=C is not safe.  Unfortunately I am
not knowledgeable enough about this problem to know whether it is real
or simply FUD.  If anyone knows how this really works and would care
to educate me, I would appreciate it.  But again, setting LC_ALL avoids
giving advice that might actually cause problems.
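As a sketch of following that advice from a script, the snippet below
runs the external sort with LC_ALL=C in the child's environment, so
LC_CTYPE and LC_COLLATE are both "C" and no mixed setting can arise.
The helper names c_locale_env and sort_bytewise are hypothetical; the
example assumes a `sort` binary on PATH and only runs it if one is
found.

```python
import os
import shutil
import subprocess
from typing import List, Mapping, Optional


def c_locale_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and force every locale category to "C".

    LC_ALL overrides LC_CTYPE, LC_COLLATE and the other categories,
    so the child cannot end up with C collation but a non-C ctype.
    """
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    return env


def sort_bytewise(lines: List[str]) -> List[str]:
    """Run the external `sort` under LC_ALL=C (plain byte-order comparison)."""
    result = subprocess.run(
        ["sort"],
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        env=c_locale_env(),
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__" and shutil.which("sort"):
    print(sort_bytewise(["b", "a", "B"]))  # ['B', 'a', 'b']
```

In the C locale uppercase ASCII letters sort before lowercase ones
because the comparison is by byte value ('B' is 0x42, 'a' is 0x61).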

> So one can't print warning messages, hmm.

I think printing a warning message there would create a lot of
problems for people in situations where they don't really care.  Think
of all of the cron jobs across all of the machines that would suddenly
start generating output and sending mail.  It would make using sort
outside of the C locale nearly impossible.

Remember that US-ASCII is a proper subset of UTF-8.  Sorting a data
file in a script seems reasonable.  But if sort emitted warnings for
some input files and not for others, purely because the input data
differed, that would be a problem.  In order to use UTF-8 sorting, one
would need to screen the input to see whether it actually contained
any non-ASCII characters and change the locale for sort on the fly,
just to avoid the warning for those files that didn't include them.
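To make the burden concrete, here is a minimal sketch of the screening
workaround described above, assuming a hypothetical wrapper that
inspects each input file before invoking sort.  Pure US-ASCII data
(every byte below 0x80) is byte-for-byte identical in UTF-8, so the
wrapper switches such files to LC_ALL=C and leaves the caller's locale
alone otherwise.  The name env_for_input is illustrative only.

```python
from typing import Mapping


def env_for_input(data: bytes, base: Mapping[str, str]) -> dict:
    """Hypothetical wrapper: pick the locale for sort per input file.

    ASCII-only input gets LC_ALL=C; anything containing non-ASCII
    bytes keeps the caller's (e.g. UTF-8) locale settings.
    """
    env = dict(base)  # never mutate the caller's mapping
    if data.isascii():  # True when every byte is < 0x80
        env["LC_ALL"] = "C"
    return env


base = {"LANG": "en_US.UTF-8"}
print(env_for_input(b"apple\nBanana\n", base).get("LC_ALL"))  # C
print(env_for_input("café\n".encode("utf-8"), base).get("LC_ALL"))  # None
```

Note that switching to the C locale also changes the resulting order
for the ASCII-only files (uppercase before lowercase), which is part of
why this kind of per-file locale juggling is unattractive in scripts.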

Bob



