bug-grep

Re: Dealing with character ranges in grep


From: Bruno Haible
Subject: Re: Dealing with character ranges in grep
Date: Thu, 9 Jun 2011 13:12:47 +0200
User-agent: KMail/1.9.9

Paolo,

> With my proposal, distros/people that use --with-included-regex would 
> get understandable semantics + no equivalence classes
> ...
> locale behavior of regex are irremediably 
> broken.  For example, when you have a collation element, you can match 
> it using ranges (e.g. [d-i] matches "ch" in Czech; "ch" collates after 
> "h"), and even apply negation (e.g. [^c-h] matches "ch" too).  However 
> there is no way to anchor your match to the beginning of the collation 
> element.  So "chci" matches both /[c-h]+ci/ and /[^c-h]+ci/.  It is 
> beyond repair, and [=e=] is the only part that can be salvaged.
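To make the quoted argument concrete, here is a small Python sketch. It is a toy model only. It assumes a Czech-like collation, not glibc's real locale tables, in which the digraph "ch" is a single collating element sorted between "h" and "i". The helpers full_match() and in_bracket() are hypothetical. full_match() checks a whole string against /[lo-hi]+suffix/ or its negated form, trying every collating element that can start at each position:

```python
# Toy model (an assumption for illustration, not glibc's actual tables):
# a Czech-like collation where the digraph "ch" is ONE collating element
# that sorts after "h" and before "i".
ORDER = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
         "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {elem: i for i, elem in enumerate(ORDER)}
MAX_LEN = max(len(e) for e in ORDER)


def elements_at(text, pos):
    """Yield every collating element that can start at text[pos]."""
    for n in range(1, MAX_LEN + 1):
        piece = text[pos:pos + n]
        if len(piece) == n and piece in RANK:
            yield piece


def in_bracket(elem, lo, hi, negated):
    """Range membership decided by collation rank, optionally negated."""
    inside = RANK[lo] <= RANK[elem] <= RANK[hi]
    return not inside if negated else inside


def full_match(text, lo, hi, negated, suffix):
    """Does the whole of text match /[lo-hi]+suffix/ (or /[^lo-hi]+suffix/)?"""
    def rest(pos, count):
        # After at least one bracket element, try finishing with the suffix.
        if count >= 1 and text[pos:] == suffix:
            return True
        # Otherwise consume one more element (single char or digraph)
        # that satisfies the bracket expression, with backtracking.
        return any(rest(pos + len(e), count + 1)
                   for e in elements_at(text, pos)
                   if in_bracket(e, lo, hi, negated))
    return rest(0, 0)


print(full_match("chci", "c", "h", False, "ci"))  # True: "c", "h", then "ci"
print(full_match("chci", "c", "h", True, "ci"))   # True: "ch" as one element, then "ci"
```

Both calls succeed on "chci": the plain range consumes "c" and "h" as separate characters, while the negated range consumes "ch" as a single collating element. Nothing in the pattern can force the match to start at a collating-element boundary, which is the ambiguity described in the quoted text.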

So, Jim and you appear to agree that equivalence classes [=e=] are a
reasonable feature outside LC_ALL=C.

What would it take to let distros/people use --with-included-regex and
get understandable semantics for ranges + working equivalence classes?

I would prefer that to your proposal, because it cannot be seen as a
regression by people who care about equivalence classes.

Can that be done through gnulib code? If not, what do we need from glibc
to get it done in gnulib?

Bruno
-- 
In memoriam Johanna Kirchner <http://en.wikipedia.org/wiki/Johanna_Kirchner>


