Re: Case insensitivity seems to ignore lower bound of interval
From: Paul Jarc
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Thu, 28 Apr 2011 00:17:07 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Eric Bischoff <address@hidden> wrote:
> 1) Contradiction with the documentation :
>
> http://www.gnu.org/software/gawk/manual/gawk.html#Locales says that
>
> $ echo something1234abc | gawk '{ sub("[A-Z]*$", ""); print }'
>
> returns
>
> something1234

That example behaves as described in the documentation for some
locales, but not in others (such as yours, apparently). That's the
whole point of that section of the documentation--different locales
have different behavior for character ranges.

Note that case-insensitivity is not an intended feature at all. It's
just an accidental result of the character collation of some locales.
Some locales arrange characters in the order aAbBcC...zZ, so a range
like [A-Z] includes all upper- and lowercase letters except lowercase
a. Other locales may arrange them as AaBbCc...Zz, so [A-Z] excludes
lowercase z instead. But the usual expectation, and the actual
behavior in the C locale, is that [A-Z] includes only uppercase
letters, and [a-z] includes only lowercase letters.
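
One way to get the C-locale behavior described above, whatever locale the
shell is running in, is to set LC_ALL=C for the one command. This is only a
sketch: it assumes a plain `awk` on the PATH (substitute gawk to match the
original report), and the variable names are made up for illustration.

```shell
# Force the C locale for this one command, so [A-Z] covers only
# the uppercase ASCII letters 'A' through 'Z'.
lower_suffix=$(echo something1234abc | LC_ALL=C awk '{ sub("[A-Z]*$", ""); print }')
echo "$lower_suffix"   # lowercase "abc" is not in [A-Z], so it is kept

# Same command with an uppercase suffix: here the range does match.
upper_suffix=$(echo something1234ABC | LC_ALL=C awk '{ sub("[A-Z]*$", ""); print }')
echo "$upper_suffix"   # trailing "ABC" is removed
```

In the C locale the first command prints something1234abc and the second
prints something1234. In a locale with a mixed-case collation order, the
first command without LC_ALL=C may strip the lowercase suffix as well.
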
paul