bug-grep

Re: Bug Report grep -E Debian Squeeze


From: Eric Blake
Subject: Re: Bug Report grep -E Debian Squeeze
Date: Mon, 25 Mar 2013 13:35:26 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4

On 03/25/2013 05:47 AM, Jean-Marc Messina wrote:
> Hi
> 
> I hope I am reporting this bug the right way; if not, please accept my
> apologies and ignore this mail, as it is my first bug report.
> 

> We have been seeing odd behaviour from "grep -E" on Debian squeeze
> that does not seem to occur on lenny or wheezy.

The behavior you are seeing is locale-dependent.

> 
> Example:
> 
> echo "tanZANIE" | grep -E '^[a-z]{2,20}$'
> No output (normal behaviour)
> 
> echo "tanzANIE" | grep -E '^[a-z]{2,20}$'
> output : "tanzANIE"

You are probably running grep inside a locale that has case-insensitive
sorting, and thus where the range [a-z] actually expands to [aAbB...yYz]
(but not Z).  For example, glibc's en_US.UTF-8 locale has that behavior.
POSIX says that the behavior of range expressions in bracket expressions
is unspecified outside of the C locale, precisely because of this
rather confusing historical behavior.
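
A quick way to see whether a given locale behaves this way is to test one
uppercase letter against the range.  This is only a sketch: check_range is
a hypothetical helper name, en_US.UTF-8 may not be installed on your
machine, and the result for non-C locales depends on your grep and libc
versions.

```shell
#!/bin/sh
# check_range is a hypothetical helper, not part of grep.
# It reports whether the range [a-z] matches uppercase B in locale $1.
check_range() {
    if printf 'B\n' | LC_ALL="$1" grep -q '^[a-z]$'; then
        echo "$1: [a-z] matches uppercase B"
    else
        echo "$1: [a-z] does not match uppercase B"
    fi
}

check_range C            # byte-order collation: B is outside a-z
check_range en_US.UTF-8  # result varies with grep/libc version
```

If the second line reports a match, the range is being expanded by the
locale's collation order rather than by byte value.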

There is an effort underway to convert GNU tools to use Rational Range
Interpretation, where [a-z] will be forcefully translated to [abc...yz]
regardless of locale, even when libc would behave otherwise by default.
I'm not sure whether that conversion has reached the version of grep that
you are using, but it may explain part of the difference you
are seeing.  It is also worth comparing the output of 'locale'
on the machines that differ.
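
When comparing machines, it can help to reduce the 'locale' output to the
one value that decides how ranges sort.  A minimal sketch, assuming the
usual precedence in which LC_ALL overrides LC_COLLATE, which in turn
overrides LANG; effective_collate is a hypothetical helper name:

```shell
#!/bin/sh
# effective_collate is a hypothetical helper, not a standard command.
# It prints the locale name that governs collation for child processes,
# using the precedence LC_ALL > LC_COLLATE > LANG, defaulting to C.
# Empty values are treated the same as unset ones.
effective_collate() {
    echo "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

Running this on each machine, alongside 'grep --version', should show
whether the difference comes from the environment or from grep itself.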

Meanwhile, the only PORTABLE way to get the behavior you want is to
avoid range expressions outside of the C locale, by either spelling out
the range:

echo "tanzANIE" | grep -E '^[abcdefghijklmnopqrstuvwxyz]{2,20}$'

or by forcing the locale:

echo "tanzANIE" | LC_ALL=C grep -E '^[a-z]{2,20}$'
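
In a script, the forced-locale form can be wrapped so that only the grep
call runs in the C locale and the rest of the environment is left alone.
This is a sketch only; is_lower_ascii is a hypothetical helper name.

```shell
#!/bin/sh
# is_lower_ascii is a hypothetical helper, not part of grep.
# It succeeds only if $1 consists of 2 to 20 ASCII lowercase letters.
# LC_ALL=C applies to this one grep invocation, not to the caller.
is_lower_ascii() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]{2,20}$'
}

is_lower_ascii tanzanie && echo "tanzanie: accepted"
is_lower_ascii tanzANIE || echo "tanzANIE: rejected"
```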

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
