bug-grep

Re: Bug#577095: grep: bracket expressions fails depending on the locale


From: Jim Meyering
Subject: Re: Bug#577095: grep: bracket expressions fails depending on the locale
Date: Sat, 10 Apr 2010 09:42:28 +0200

Aníbal Monsalve Salazar wrote:
> I reproduced this bug, see below.
>
> grep --version
> GNU grep 2.6.3
>
> cat /tmp/a
> root:x:0:0:root:/root:/bin/bash
> anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash
> Debian-exim:x:102:104::/var/spool/exim4:/bin/false
> ntp:x:106:108::/home/ntp:/bin/false
>
> grep -E '^[A-Z]' /tmp/a
> root:x:0:0:root:/root:/bin/bash
> Debian-exim:x:102:104::/var/spool/exim4:/bin/false
> ntp:x:106:108::/home/ntp:/bin/false
>
> grep -Ev '^[A-Z]' /tmp/a
> anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash

Thanks for Cc'ing bug-grep; however, this is not a bug in grep-2.6.3.
Rather, it demonstrates that grep-2.5.4-4 failed to honor your locale
settings.

As you noticed, what the [A-Z] range matches depends on your locale settings.
Run "locale" to print those settings.
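The collation category, LC_COLLATE, is the setting that decides what a range such as [A-Z] covers, since POSIX defines range expressions in terms of the collation sequence. As a minimal sketch, the hypothetical helper `effective_collate` below (not part of grep or glibc) reports which value wins under the usual precedence: a non-empty LC_ALL overrides LC_COLLATE, which overrides LANG. Falling back to C when none of them is set is an assumption; POSIX leaves that default implementation-defined.

```sh
#!/bin/sh
# Hypothetical helper: print the locale value that governs collation,
# and therefore bracket-expression ranges, in the current environment.
# Precedence: LC_ALL, then LC_COLLATE, then LANG; an empty value counts
# as unset. The final C fallback is an assumption (the usual default).
effective_collate() {
  printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

In a given shell session, the LC_COLLATE line printed by "locale" should normally agree with this helper.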

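In the C (aka POSIX) locale, [A-Z] matches only the ASCII upper case letters ABC...Z,
but in many other locales, where collation interleaves lower and upper case
(aAbBcC...zZ), it matches AbBcC...zZ: every ASCII letter except lower case a.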
Demonstrate with this:

  $ for i in a A b B c C; do \
    printf '%s: ' "$i"; echo "$i" | LC_ALL=en_US.UTF-8 grep -E '[A-Z]' || echo; done
  a:
  A: A
  b: b
  B: B
  c: c
  C: C
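The a/A/b/B/c/C results follow from collation order. The sketch below models it with a hard-coded, interleaved ordering aAbB...zZ; that string is a simplifying assumption standing in for the real locale tables, which also position accented letters, digits and punctuation. The hypothetical helper `collated_range` prints every character of that ordering from its first argument through its second, which is what a bracket range covers when ranges are interpreted by collation.

```sh
#!/bin/sh
# Simplified model (assumption): letters collate in the interleaved
# order aAbB...zZ. Real locale tables are larger.
ORDER='aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'

# Hypothetical helper: print every character of ORDER that collates
# from $1 through $2 inclusive, i.e. what a range [$1-$2] would cover.
collated_range() {
  before=${ORDER%%"$1"*}        # characters that sort before $1
  from_lo=${ORDER#"$before"}    # ORDER starting at $1
  after=${from_lo##*"$2"}       # characters that sort after $2
  printf '%s\n' "${from_lo%"$after"}"
}

collated_range A Z
collated_range A F
```

Under this model, `collated_range A Z` prints AbBcC...zZ: lower case a sorts before A and stays outside the range, while b through z fall inside it, just as in the transcript.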

If you really want to match only the 26 ASCII upper case letters,
you can run grep in the C locale, even using that risky range notation:

  $ echo b | LC_ALL=C grep '[A-Z]'
  [Exit 1]
  $

However, it's better to avoid the '[A-Z]' range notation and to
prefer the '[[:upper:]]' character class.
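Applied to the /tmp/a lines from the report, a short sketch puts the recommended class next to the C-locale range; for this input both select only the Debian-exim line (the variable names are illustrative):

```sh
#!/bin/sh
# The four lines of /tmp/a quoted at the top of this message
sample='root:x:0:0:root:/root:/bin/bash
anibal:x:1000:1000:Anibal Monsalve Salazar,,,:/home/anibal:/bin/bash
Debian-exim:x:102:104::/var/spool/exim4:/bin/false
ntp:x:106:108::/home/ntp:/bin/false'

# [[:upper:]] depends on the character-type category (LC_CTYPE),
# not on collation order, so no LC_ALL override is needed here.
upper=$(printf '%s\n' "$sample" | grep '^[[:upper:]]')

# The range form is safe here only because LC_ALL=C forces byte order.
range=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[A-Z]')

printf 'class: %s\nrange: %s\n' "$upper" "$range"
```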

Using the [[:CLASS_NAME:]] notation is essential if you also
want to match other (non-ASCII) upper case characters in your locale:

  $ echo É | LC_ALL=fr_FR.UTF-8 grep '[[:upper:]]'
  É
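The locale half of that advice matters as much as the class. In the C locale, [[:upper:]] covers only the ASCII letters A through Z, and grep sees the UTF-8 encoding of É as two bytes (0xC3 0x89), neither of which is a letter there, so nothing matches. A minimal check, writing É as octal escapes so the input does not depend on the terminal's encoding:

```sh
#!/bin/sh
# É (U+00C9) is encoded in UTF-8 as the bytes 0xC3 0x89 (octal 303 211).
# Under LC_ALL=C, grep treats each byte as a character, and neither byte
# is an upper case letter in that locale, so the count is 0.
count=$(printf '\303\211\n' | LC_ALL=C grep -c '[[:upper:]]')
printf 'upper case matches for E-acute under LC_ALL=C: %s\n' "$count"
```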

Range notation often matches more than you intend:

  $ echo á | LC_ALL=fr_FR.UTF-8 grep '[A-F]'
  á
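Rerunning the same input under LC_ALL=C is a quick way to confirm that the match comes from the locale's collation rather than from the pattern itself. In the C locale the range is plain byte order: á is the two bytes 0xC3 0xA1, and neither falls within A-F (0x41 to 0x46).

```sh
#!/bin/sh
# á (U+00E1) is encoded in UTF-8 as the bytes 0xC3 0xA1 (octal 303 241).
# With LC_ALL=C, [A-F] is the byte range 0x41-0x46, which excludes both.
count=$(printf '\303\241\n' | LC_ALL=C grep -c '[A-F]')
printf 'matches for a-acute against [A-F] under LC_ALL=C: %s\n' "$count"
```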
