bug-gnu-utils

Re: gawk ignores case with LANG=en_US


From: Jim Keniston
Subject: Re: gawk ignores case with LANG=en_US
Date: Wed, 13 May 2009 16:38:16 -0700

On Wed, 2009-05-13 at 14:29 -0600, Bob Proulx wrote:
> Jim Keniston wrote:
> > /^[a-z]/ { print }
> > Assuming that environment variables LC_ALL and LC_CTYPE are
> > undefined, if I run the above with the LANG environment variable
> > set to "en_US.utf8" or "en_US", "A" matches "^[a-z]" and the
> > output is as in output_buggy.  Setting IGNORECASE=0 in the
> > command line or the script doesn't help.
> 
> Unfortunately what you are seeing is expected behavior.  It isn't a
> bug in gawk.  Gawk is doing the correct thing there.
> 
> You don't like it and I don't like it but the-powers-that-be (not the
> gawk maintainer but above him in libc and the standards committees)
> have confused working with data on a computer with talking about
> working with data on a computer.  The P.T.B. have decided that the
> collation ordering (sort ordering) for data should be dictionary
> ordering.  In dictionary ordering case is folded together and
> punctuation is ignored.  By having LANG set to any of the "en" locales
> the system is instructed to use dictionary sort ordering.  This
> affects almost everything on the system that sorts or collates.
...
> 
> Hope this helps,
> Bob

Yes, thanks very much for your help.
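
For anyone finding this thread in the archives, here is a minimal sketch
of one way to sidestep the locale-dependent range for a single command.
It assumes a POSIX shell and an awk on the PATH (gawk in this thread);
the four-line sample input is made up for illustration and is not the
original test file. The only technique shown is forcing the C locale,
in which [a-z] is a plain ASCII range.

```shell
# Hypothetical sample input: one letter per line, mixed case.
input=$(printf 'A\na\nB\nb\n')

# LC_ALL overrides LANG and LC_CTYPE for this one command only, so the
# bracket range [a-z] is evaluated in the C locale and upper-case
# letters no longer fall inside it.
c_range=$(printf '%s\n' "$input" | LC_ALL=C awk '/^[a-z]/ { print }')

# Show the lines that matched.
printf '%s\n' "$c_range"
```

With the C locale forced, only the lines starting with "a" and "b"
are printed. If switching locale is not an option, the POSIX character
class [[:lower:]] is the usual locale-aware way to ask for lower-case
letters instead of a range, provided the awk in use supports POSIX
character classes.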

FWIW, I looked through a gawk manual --
http://web.mit.edu/gnu/doc/html/gawk_toc.html -- which I found by
chasing links from gnu.org, and I didn't see anything about this
behavior.  In particular, I expected to find something about it in the
"Case-sensitivity in Matching" section.  I note that the manual is dated
April 1993.  Is there a later version that I should be reading?

Anyway, thanks for supporting gawk.  I've been using awk & gawk for
maybe 25 years, and this is the first time I've noticed this behavior.
I must have done something recently to tweak my LANG setting.

Jim




