Re: character ranges in regular expressions

From: Eric Blake
Subject: Re: character ranges in regular expressions
Date: Mon, 04 Oct 2010 14:51:00 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100921 Fedora/3.1.4-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.4

On 10/04/2010 02:43 PM, Aharon Robbins wrote:
>> Which is why my proposal is that glibc consider:
>>
>> [A-Z] =>  match C locale; 26 letters, regardless of locale
>> [[.A.]-[.Z.]] =>  use collation rules, since we explicitly spelled things
>> with collation symbols (26 letters in POSIX locale, 51 or even more in
>> other locales, since accented characters might be included in the
>> collation range), so that we aren't completely losing CEO behavior (if
>> someone seriously has a reason to use it)
>> [[:upper:]] =>  per POSIX rules in all locales
>
> This would be great.  In what must be close to (or more than) the
> 10 years since gawk started supporting locales, I have yet to meet
> anyone who thinks that [a-z] matching [A-Y] is a feature!
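
For anyone who wants to see the two range semantics side by side, here is a
rough Python sketch.  It is not how glibc's regex engine is implemented; it
only models range membership by comparing a candidate against the two
endpoints, once by raw code point and once with strcoll() in the current
locale.  The helper names codepoint_range and collation_range are made up
for this illustration, and en_US.UTF-8 is just an example locale that may
not be installed on a given system.

```python
import locale
import string


def codepoint_range(lo, hi):
    """ASCII letters matched by [lo-hi] when endpoints are compared
    by raw code point, as in the C/POSIX locale."""
    return [c for c in string.ascii_letters if lo <= c <= hi]


def collation_range(lo, hi):
    """ASCII letters matched by [lo-hi] when endpoints are compared
    with the current locale's collation order (an approximation of
    collation-element-ordering behavior, not glibc's actual code)."""
    return [
        c
        for c in string.ascii_letters
        if locale.strcoll(lo, c) <= 0 and locale.strcoll(c, hi) <= 0
    ]


if __name__ == "__main__":
    # In the C locale both views agree: 26 letters each.
    locale.setlocale(locale.LC_COLLATE, "C")
    print(len(codepoint_range("A", "Z")), len(collation_range("A", "Z")))

    # Assumption: en_US.UTF-8 is only an example; skip if unavailable.
    try:
        locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
    except locale.Error:
        print("en_US.UTF-8 not installed; skipping")
    else:
        print("".join(collation_range("a", "z")))
```

In the C locale both functions return the same 26 letters.  In a locale
whose collation order interleaves upper and lower case, the second function
may also pick up letters of the other case, which is the [a-z] matching
[A-Y] surprise.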

Great idea or not, Uli rejected it :(

------- Additional Comments From drepper dot fsp at gmail dot com  2010-10-04 02:42 -------
This stays as it is.  If individual locale maintainers think the current
behavior is unintentionally as-is then they can change it.  But in general
this is the long-implemented behavior and won't be changed.  Collating
elements are just not really useful outside the POSIX locale or when the
locale is guaranteed to stay the same.

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

http://sourceware.org/bugzilla/show_bug.cgi?id=12051

Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
