
grep backreference seems to invalidate --ignore-case

From: Mabry Tyson
Subject: grep backreference seems to invalidate --ignore-case
Date: Mon, 19 Dec 2005 02:07:25 -0800
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.12) Gecko/20050915

I found what I believe to be a failure of grep to match something it should. It appears that using a backreference causes the --ignore-case switch to be ignored. I get the same (unexpected) results with GNU grep 2.4.2, 2.5, and 2.5.1 on Solaris 8 and with grep 2.5 on Mac OS X 10.3.9. The grep that Sun ships with Solaris 8 does work as expected.

To make sure this hasn't been recently fixed, I downloaded ftp://ftp.gnu.org/gnu/grep/grep-2.5.1a.tar.gz
and built grep from that.

manresa 181: uname -a
SunOS manresa 5.8 Generic_108528-24 sun4u sparc SUNW,Sun-Blade-100
manresa 182: src/grep --version
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO

manresa 183: cat /tmp/test
A abcd abcd
manresa 184: src/grep  --ignore-case 'a \(abcd\) \1' /tmp/test
manresa 185: src/grep  --ignore-case 'A \(abcd\) \1' /tmp/test
A abcd abcd
manresa 186: src/grep  --ignore-case 'A \(ABCD\) \1' /tmp/test
manresa 187: src/grep  --ignore-case 'A \(ABCD\) ABCD' /tmp/test
A abcd abcd
manresa 188: src/grep  --ignore-case 'a \(ABCD\) ABCD' /tmp/test
A abcd abcd

It is my belief that all of these calls to grep should have returned the line from the file.
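
For anyone who wants to rerun the five patterns above against another grep build, a minimal sketch follows. The GREP variable, the temporary file, and the counters are assumptions of this sketch and were not part of the session shown; -i is used as the portable spelling of --ignore-case.

```shell
#!/bin/sh
# Hypothetical reproduction script (not from the original session).
# GREP selects the binary under test; it defaults to the grep on PATH.
GREP=${GREP:-grep}

# Recreate the one-line test file from the transcript.
testfile=$(mktemp)
printf 'A abcd abcd\n' > "$testfile"

matched=0
total=0
for pattern in \
    'a \(abcd\) \1' \
    'A \(abcd\) \1' \
    'A \(ABCD\) \1' \
    'A \(ABCD\) ABCD' \
    'a \(ABCD\) ABCD'
do
    total=$((total + 1))
    # Exit status 0 means grep selected at least one line.
    if "$GREP" -i "$pattern" "$testfile" >/dev/null; then
        matched=$((matched + 1))
        echo "match:    $pattern"
    else
        echo "NO MATCH: $pattern"
    fi
done

echo "$matched of $total patterns matched"
rm -f "$testfile"
```

Running it as `GREP=src/grep sh script.sh` (script name hypothetical) would test the freshly built binary instead of the installed one.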

The grep distributed with Solaris 8 acts as I expect:

manresa 192: /usr/bin/grep -i 'a \(abcd\) \1' /tmp/test
A abcd abcd

Another test case:

manresa 51: cat /tmp/test2
a abcd aBcD
manresa 52: src/grep --ignore-case 'a \(abcd\) \1' /tmp/test2
manresa 53: /usr/bin/grep -i 'a \(abcd\) \1' /tmp/test2
a abcd aBcD

In this case, however, the documentation is somewhat ambiguous. --ignore-case is documented as "Ignore case distinctions in both the PATTERN and the input files." A backreference is documented as "matches the substring previously matched by the Nth parenthesized subexpression of the regular expression." It isn't clear whether a backreference must match that substring exactly or may match it ignoring case. The Solaris grep, at least, matches the substring ignoring case when case is being ignored. I would argue that this is the correct behavior, since --ignore-case says to ignore case in the input files. However this is resolved, the documentation should state which behavior applies.
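
The ambiguity can be probed directly. The sketch below is only an illustration: the function name, the GREP variable, and the verdict strings are assumptions of the sketch. It feeds a grep the line from /tmp/test2, where the group captures "abcd" but the text at the backreference position is "aBcD", and reports whether the line was selected. A failure to match does not by itself distinguish an exact-match backreference from the problem reported earlier in this message.

```shell
# Hypothetical probe; the function name and labels are assumptions.
backref_folds_case() {
    # $1: grep binary to test.
    # The group captures "abcd"; the following word is "aBcD", so the line
    # is selected only if the backreference comparison ignores case.
    printf 'a abcd aBcD\n' | "$1" -i 'a \(abcd\) \1' >/dev/null 2>&1
}

if backref_folds_case "${GREP:-grep}"; then
    verdict="backreference ignores case"
else
    verdict="backreference did not match"
fi
echo "$verdict"
```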

It appears that GNU emacs 21.12.1 (on Mac OS X) does regular expression matching as I expect. When case-fold-search = t, the expression

(search-forward-regexp "a \\(abcd\\) \\1")

will match each of the lines

a abcd abcd
A abcd abcd
a abcd aBcD
