From: Linda Walsh
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Mon, 21 May 2012 16:42:19 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666

Eric Blake wrote:
> On 05/21/2012 03:02 PM, Linda Walsh wrote:
>>> the cat was out of the bag. POSIX 2001 had to continue to allow
>>> existing implementations, by stating that range expressions in
>>> anything but the C locale are explicitly undefined.
>>
>> Explicitly undefined? Or locale dependent?
>
> POSIX explicitly undefined ranges for all but the C locale. _Other
> standards_, such as Unicode, are free to add range requirements on top
> of what POSIX requires, but alas, Unicode collation order does NOT
> currently specify anything about regular expression or glob range
> matching, so it is out of scope for Unicode to say what [A-Z] expands to.

I think this is the problem. A-Z in regular expressions is defined to expand to those characters that fall, _in collating order_, between A and Z. Without a collating order, that expression in REs would never have made any sense. It requires a collating order and is dependent on it. If there is no collating order, then you cannot expand A-Z, but if there is, you expand it to the values between A and Z in that collating order.

The regex(7) man page says that [xx-xx] uses **collating order**:

    If two characters in the list are separated by '-', this is
    shorthand for the full range of characters between those two
    (inclusive) in the collating sequence, for example, "[0-9]" in
    ASCII matches any decimal digit.

Seems pretty clear: regexes aren't exempt from collating order, they depend on it.
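
To make the collating-order reading concrete, here is a minimal Python sketch of what "expand the range in the collating sequence" would mean. It is not how any regex library is implemented; `expand_range`, `C_ORDER`, and `INTERLEAVED` are hypothetical names, and the interleaved sequence is only an assumed stand-in for the a<A<b<B ordering named in the subject line.

```python
import string


def expand_range(lo, hi, collating_sequence):
    """Return every character from lo to hi (inclusive) in the given order.

    collating_sequence is a list of characters in collation order.
    Hypothetical helper for illustration only.
    """
    start = collating_sequence.index(lo)
    end = collating_sequence.index(hi)
    if start > end:
        raise ValueError(f"{lo!r} collates after {hi!r}")
    return collating_sequence[start:end + 1]


# C-locale style order: plain code point order, A..Z before a..z.
C_ORDER = sorted(string.ascii_letters)

# Assumed interleaved order matching the subject line: a < A < b < B < ... < z < Z.
INTERLEAVED = [
    ch
    for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
    for ch in pair
]

c_range = "".join(expand_range("A", "Z", C_ORDER))
interleaved_range = "".join(expand_range("A", "Z", INTERLEAVED))

print(c_range)            # uppercase letters only
print(interleaved_range)  # starts at 'A', then 'b', 'B', ... up to 'Z'
```

Under the assumed interleaved order, the same A-Z range picks up lowercase b through z but not a, which is the behaviour the subject line asks about.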