Re: gawk ignores case with LANG=en_US
From: Jim Keniston
Date: Wed, 13 May 2009 16:38:16 -0700
On Wed, 2009-05-13 at 14:29 -0600, Bob Proulx wrote:
> Jim Keniston wrote:
> > /^[a-z]/ { print }
> > Assuming that environment variables LC_ALL and LC_CTYPE are
> > undefined, if I run the above with the LANG environment variable
> > set to "en_US.utf8" or "en_US", "A" matches "^[a-z]" and the
> > output is as in output_buggy. Setting IGNORECASE=0 in the
> > command line or the script doesn't help.
>
> Unfortunately what you are seeing is expected behavior. It isn't a
> bug in gawk. Gawk is doing the correct thing there.
>
> You don't like it and I don't like it but the-powers-that-be (not the
> gawk maintainer but above him in libc and the standards committees)
> have confused working with data on a computer with talking about
> working with data on a computer. The P.T.B. have decided that the
> collation ordering (sort ordering) for data should be dictionary
> ordering. In dictionary ordering case is folded together and
> punctuation is ignored. By having LANG set to any of the "en" locales
> the system is instructed to use dictionary sort ordering. This
> affects almost everything on the system that sorts or collates.
...
>
> Hope this helps,
> Bob
Yes, thanks very much for your help.
FWIW, I looked through a gawk manual
(http://web.mit.edu/gnu/doc/html/gawk_toc.html), which I found by
chasing links from gnu.org, and I didn't see anything about this
behavior. In particular, I expected to find something about it in the
"Case-sensitivity in Matching" section. I note that the manual is dated
April 1993. Is there a later version that I should be reading?
Anyway, thanks for supporting gawk. I've been using awk & gawk for
maybe 25 years, and this is the first time I've noticed this behavior.
I must have done something recently to tweak my LANG setting.
Jim
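For readers who run into the same range problem, here is a minimal shell sketch of two common ways to get the case-sensitive match the original script expected; it is not taken from the thread. The first forces the C locale for the single awk command (LC_ALL overrides LANG and every other LC_* variable), so the range [a-z] follows byte order. The second replaces the range with the POSIX character class [[:lower:]], which is defined by character classification rather than by collation order. The function names are hypothetical, and the second form assumes an awk that supports POSIX character classes, as gawk does.

```shell
#!/bin/sh
# Hypothetical helper 1: run awk in the C (POSIX) locale for this command only.
# In the C locale, collation follows byte values, so [a-z] covers exactly
# the 26 ASCII lowercase letters, regardless of LANG.
lower_c_locale() {
    LC_ALL=C awk '/^[a-z]/ { print }'
}

# Hypothetical helper 2: keep the user's locale, but match with the POSIX
# class [[:lower:]], which uses character classification (LC_CTYPE)
# instead of a collation range. Assumes an awk with class support (e.g. gawk).
lower_class() {
    awk '/^[[:lower:]]/ { print }'
}

# Usage: with gawk, each command prints only "banana".
printf 'Apple\nbanana\n' | lower_c_locale
printf 'Apple\nbanana\n' | lower_class
```

The locale override is the smallest change to an existing script, while the character class keeps the script correct under any LANG setting without touching the environment.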