bug-bash
Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?


From: Linda Walsh
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Mon, 21 May 2012 16:42:19 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666



Eric Blake wrote:

On 05/21/2012 03:02 PM, Linda Walsh wrote:

the cat was out of the bag.  POSIX 2001 had to continue to allow
existing implementations, by stating that range expressions in anything
but the C locale are explicitly undefined.
---------------------


    Explicitly undefined?   Or locale dependent?

POSIX explicitly undefined ranges for all but the C locale.  _Other
standards_, such as Unicode, are free to add range requirements on top
of what POSIX requires, but alas, Unicode collation order does NOT
currently specify anything about regular expression or glob range
matching, so it is out of scope for Unicode to say what [A-Z] expands to.


----

I think this is the problem.

A-Z in regular expressions is defined to expand to those characters
that are _in collating order_, >= A and <= Z.

Without a collating order, that expression in REs would never have made any
sense.  It requires a collating order and depends on it.

If there is no collating order, then you cannot expand A-Z, but if there
is, you expand it to the characters from A through Z in that collating order.

The regex(7) man page says that [xx-xx] uses **collating order**:

       If two characters in the list are separated by '-', this is
       shorthand for the full range of characters between those two
       (inclusive) in the collating sequence, for example, "[0-9]" in
       ASCII matches any decimal digit.
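
To make the rule quoted above concrete, here is a minimal sketch of range expansion driven by a collating sequence. The two sequences and the helper name expand_range are assumptions made up for illustration: they are not taken from any real locale definition, and they cover letters only. The interleaved one simply mirrors the a<A<b<B<...<z<Z ordering from the subject line.

```python
import string

# Illustrative collating sequences (assumptions, letters only).
# C_ORDER follows the relative order of letters in ASCII: "ABC...Zabc...z".
C_ORDER = string.ascii_uppercase + string.ascii_lowercase
# INTERLEAVED mimics the a<A<b<B<...<z<Z ordering: "aAbB...zZ".
INTERLEAVED = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def expand_range(start, end, collating_sequence):
    """Return every character from start to end (inclusive),
    in the order given by collating_sequence."""
    first = collating_sequence.index(start)
    last = collating_sequence.index(end)
    if first > last:
        raise ValueError(f"{start!r} collates after {end!r}")
    return collating_sequence[first:last + 1]


# The same range expression yields different sets under different orders.
print(expand_range("A", "Z", C_ORDER))
print(expand_range("A", "Z", INTERLEAVED))
```

Under the ASCII-like sequence, A-Z covers only the upper-case letters. Under the interleaved sequence it also picks up b through z (but not a), which is the behaviour this thread is about.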

----
Seems pretty clear -- regexes aren't exempt from collating order; they depend on it.






