bug-coreutils

bug#8032: Suggestion in re. error reports from 'comm'


From: Eric Blake
Subject: bug#8032: Suggestion in re. error reports from 'comm'
Date: Mon, 14 Feb 2011 10:33:09 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7

On 02/13/2011 01:27 PM, Paul E Condon wrote:
> 'comm' uses LC_COLLATE or LC_ALL to establish the collation that it
> uses in its check for proper sorting of input. (I think this is true.)

Thanks for the report.  Yes, per POSIX, the following list of rules
should apply to ALL utilities that perform sorting (first one that
applies wins):

If LC_ALL is non-empty, honor that
If LC_COLLATE is non-empty, honor that
If LANG is non-empty, honor that
Use an implementation-defined default.

Some implementations set the implementation-defined default to the C
locale, but that is not universally true.
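The precedence list above can be expressed as a short lookup. This is a minimal sketch, not coreutils code. The function `collate_source` and the enum names are hypothetical and exist only for illustration. The sketch applies just the "first non-empty variable wins" rule. It does not check whether the named locale is actually installed, because the real resolution is left to `setlocale()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Hypothetical labels for where the LC_COLLATE setting comes from. */
enum collate_origin { FROM_LC_ALL, FROM_LC_COLLATE, FROM_LANG, FROM_DEFAULT };

/* Hypothetical helper: apply the POSIX precedence for the LC_COLLATE
   category.  A variable that is set but empty counts the same as an
   unset one.  If none of the three is usable, the implementation-defined
   default applies. */
static enum collate_origin collate_source(void)
{
    static const char *const names[] = { "LC_ALL", "LC_COLLATE", "LANG" };
    static const enum collate_origin origins[] = {
        FROM_LC_ALL, FROM_LC_COLLATE, FROM_LANG
    };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return origins[i];          /* first non-empty variable wins */
    }
    return FROM_DEFAULT;                /* implementation-defined default */
}
```

One consequence is that an empty `LC_ALL=` does not override anything. Lookup simply falls through to `LC_COLLATE` and then `LANG`.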

> 
> The man page and info make no mention of LC_ALL (at least not as
> delivered in Debian squeeze) but LC_ALL seems to affect 'comm'
> behavior.

You are correct that none of the coreutils man pages mention the effect
of LC_* and LANG environment variables; our excuse is that they are so
universally applicable that it is assumed that you are aware of their
effect on all utilities.

But patches are welcome to correct the man pages, if you think it would
help.

Also, a patch to the info pages adding a detailed section within chapter 2,
"Common Options", that discusses the effect of all LC_*/LANG environment
variables on all coreutils, would be appreciated.


> When neither LC_COLLATE nor LC_ALL is defined, 'comm' reports that the
> file is out of order. I think this is misleading. I think it should
> instead report that no LC_* is defined. 

How is coreutils supposed to tell the difference between an unset
environment variable that indicates a mistake, and an unset environment
variable that means you explicitly wanted the implementation-defined
default?

> 
> Alternatively, it might be OK to silently assume the definition,
> LC_COLLATE=C

No, it is NOT silently okay to do that.  Coreutils uses
setlocale(LC_ALL,""), which is the POSIX-blessed means for determining
the four-step collation choice documented above.  Either you provide one
of the three variables that affect sorting, or you get the
implementation-defined default.
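For readers who want to see what that call does, here is a minimal C sketch of the start-up step and of how a program can query the collation it ended up with. `setlocale(LC_ALL, "")` and the query form `setlocale(LC_COLLATE, NULL)` are standard C. The helper names `effective_collation` and `collation_is_c` are hypothetical. A C program starts in the "C" locale until it calls `setlocale()`. If the environment names a locale that cannot be honored, the call returns NULL and leaves the locale unchanged.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: perform the POSIX-blessed start-up call, which
   resolves every category (LC_COLLATE included) from LC_ALL, then the
   per-category variable, then LANG, else the implementation default.
   Then report the collation now in effect. */
static const char *effective_collation(void)
{
    /* On failure (e.g. an uninstalled locale) this returns NULL and the
       current locale is left as it was. */
    (void) setlocale(LC_ALL, "");

    /* A NULL locale argument only queries the current setting. */
    return setlocale(LC_COLLATE, NULL);
}

/* Hypothetical helper: true when collation is plain byte order. */
static bool collation_is_c(void)
{
    const char *name = effective_collation();
    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}
```

The string returned by `setlocale()` may be overwritten by the next call, so a caller that needs to keep the name should copy it.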

> 
> I discovered this situation while writing (and debugging) a shell
> script that I wanted to work when invoked from /etc/cron.daily.
> The script leaves a sorted file on disk for use in the next day's run.
> Naturally, during testing I seeded that file from manual runs of
> the script. Every time I set up what I thought was a working version,
> the script failed when run from cron, with a message that the file(s)
> was/were out of order. Yes, I should have figured it out, and I
> finally have. But it would have been so much faster if ...

Any well-written script that must not be affected by
localization settings will explicitly set LC_ALL (or specific
categories) as needed, rather than relying on defaults.  That's a fact
of life for modern script-writing, and a lesson that unfortunately is
learned more often by experience than by documentation.  Changing
coreutils to fail when the variable is not set, rather than going with
the implementation-defined default, would unfortunately violate POSIX.
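The failure in the report can be reproduced in miniature. In the C locale, collation is byte order, so "B" (0x42) sorts before "a" (0x61). Many other locales, for example en_US.UTF-8 on typical glibc systems, order "a" before "B". A file sorted under such a locale can therefore look out of order to a check that runs under a C default. The sketch below is a simplified stand-in for comm's order check, not its actual code. The names `lines_in_order` and `pin_c_locale` are hypothetical. `pin_c_locale` is the C-program counterpart of a script doing `export LC_ALL=C` before it sorts or compares anything.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, loosely modelled on an input-order check: true
   when every line collates at or after the one before it under the
   LC_COLLATE setting currently in effect. */
static bool lines_in_order(const char *const lines[], size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (strcoll(lines[i - 1], lines[i]) > 0)
            return false;
    return true;
}

/* Hypothetical helper: pin every category, collation included, to the
   C locale so results no longer depend on the caller's environment. */
static bool pin_c_locale(void)
{
    return setlocale(LC_ALL, "C") != NULL;
}
```

With the locale pinned, { "B", "a" } passes the check and { "a", "B" } fails it, no matter which locale variables cron or an interactive shell happens to provide. The same holds for a script that sets LC_ALL=C before both sorting and comparing.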

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
