Re: find + sh + grep


From: Eric Blake
Subject: Re: find + sh + grep
Date: Mon, 24 Oct 2011 12:14:05 -0600

On 10/21/2011 07:33 PM, Kirk Korver wrote:


> find . -type f -name bp_fs.h ! -exec sh -c "cat {} | head -n 5 | grep -q
> '\$Header:.*\$'" \; -print

Your script has a useless use of cat; it is shorter to use this -exec:

-exec sh -c "head -n 5 {} | grep -q '\$Header:.*\$'" \;

Also, POSIX states that with -exec,

"If a utility_name or argument string contains the two characters "{}" , but not just the two characters "{}" , it is implementation-defined whether find replaces those two characters or uses the string without change."

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html

GNU find happens to replace the {}, but not all other find implementations do, so to be portable, you have to rewrite your script:

-exec sh -c 'head -n 5 "$1" | grep -q '"'\$Header:.*\$'" sh {} \;
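A minimal sketch of that positional-parameter form, in case it helps to try it in isolation. The scratch directory and the file name "a b.h" are made up for illustration, and mktemp -d is assumed to be available (it is not required by POSIX, but is common):

```shell
# Hypothetical scratch directory holding a file with a space in its name.
tmpdir=$(mktemp -d)
printf '%s\n' 'first line' 'second line' > "$tmpdir/a b.h"

# The word after the script ("sh") becomes $0 of the inner shell, and the
# stand-alone {} becomes "$1". Nothing inside the script text needs to be
# substituted by find, so this does not rely on implementation-defined
# behavior.
first_line=$(find "$tmpdir" -type f -name '*.h' \
  -exec sh -c 'head -n 1 "$1"' sh {} \;)

printf '%s\n' "$first_line"
rm -rf "$tmpdir"
```

Because the name is quoted as "$1" inside the script, the embedded space needs no special treatment.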

But those are just improvements, and don't answer your original question.

Your real problem is that you are using the wrong regex. Stick in a printf to see what you are currently executing:

-exec sh -c \
  'head -n 5 "$1" | printf :%s: grep -q '"'\$Header:.*\$'; echo" \
  sh {} \;

Hmm, see how that outputs:

:grep::-q::$Header:.*$:

which means you were invoking:

grep -q '$Header:.*$'

But grep cannot match (there is no file where the end of line is followed by additional text). What you _wanted_ was extra quoting in front of the first $ that survives _both_ shell and sh interpolation, so that you invoke

grep -q '\$Header:.*$'

so to get that, you have to invoke:

-exec sh -c 'head -n 5 "$1" | grep -q '"'\\\$Header:.*\$'" sh {} \;

Fix your regex, and you should be good to go.
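If you want to watch both quoting levels without running find at all, something like the following sketch should do. The variable names are mine and purely illustrative; the quoted fragments are copied from the two -exec lines above:

```shell
# Level 1: what the outer shell hands to sh -c after its own quote removal.
wrong='grep -q '"'\$Header:.*\$'"
fixed='grep -q '"'\\\$Header:.*\$'"

printf '%s\n' "$wrong"   # script text containing '$Header:.*$'
printf '%s\n' "$fixed"   # script text containing '\$Header:.*$'

# Level 2: let an inner sh parse that text, with printf standing in for
# grep so that each resulting argument is shown between colons.
seen_wrong=$(sh -c "printf ':%s:' $wrong")
seen_fixed=$(sh -c "printf ':%s:' $fixed")

printf '%s\n' "$seen_wrong" "$seen_fixed"
```

Only the second variant still has a backslash in front of the first $ by the time it reaches the final argument list.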


> The last two times, I was just trying the grep by itself. I think this
> experiment shows that grep does not find a match of $Header.....$, mostly
> because there is not one, but somehow I am getting a match on the
> complicated find + sh + grep combination. I am sure I am missing
> something, but I do not know what.

You were missing the fact that you were dealing with not one, but two levels of shell quoting (the one in the shell where you invoked find, and the one in the sh -c invocation).

Some parting thoughts:

grep (and sed) look for a match anywhere in the line, so there is no need to spell out the rest of the line up to its end. That is, the regexes '\$Header:.*$' and '\$Header:' are equivalent when it comes to determining whether a line matches.
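A quick way to convince yourself, using a made-up two-line sample (the header text is invented for the test, not taken from your file):

```shell
# Hypothetical sample: one line with the keyword, one without.
sample='/* $Header: bp_fs.h 1.2 $ */
no keyword on this line'

# Count matching lines with and without the trailing ".*$".
with_tail=$(printf '%s\n' "$sample" | grep -c '\$Header:.*$')
without_tail=$(printf '%s\n' "$sample" | grep -c '\$Header:')

printf 'with tail: %s, without tail: %s\n' "$with_tail" "$without_tail"
```

Both counts come out the same, since the trailing part only consumes the remainder of a line that has already matched.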

Anything that uses head|grep can generally be rewritten to use sed, and by using just sed, you can avoid an intermediate sh. Fewer processes means faster operation (_especially_ in your cygwin environment). In this case, I like the GNU sed extension of q1 to force a non-zero exit status on a range-restricted match, while quitting with zero status if I exceed that range. Therefore, since cygwin already provides GNU find and sed, you can get away with this faster (but non-portable) variant, which also has the benefit of no double quoting:

find -type f -name bp_fs.h \
  -exec sed -n '1,5{/\$Header:/q1};6q' {} \; -print
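Here is a small sketch of how the exit statuses play out, assuming GNU sed (the exit-code argument to q is a GNU extension) and mktemp -d. The file names tagged.h and untagged.h are invented for the demonstration:

```shell
# Hypothetical scratch files: one carries the keyword, one does not.
tmpdir=$(mktemp -d)
printf '%s\n' '/* $Header: demo $ */' 'int x;' > "$tmpdir/tagged.h"
printf '%s\n' 'int y;' 'int z;' > "$tmpdir/untagged.h"

# q1 quits with status 1 on a match within lines 1-5; otherwise sed
# quits at line 6 (or at end of input) with status 0.
if sed -n '1,5{/\$Header:/q1};6q' "$tmpdir/tagged.h"; then
  tagged_status=0
else
  tagged_status=$?
fi
if sed -n '1,5{/\$Header:/q1};6q' "$tmpdir/untagged.h"; then
  untagged_status=0
else
  untagged_status=$?
fi

# -exec is true only for exit status 0, so -print fires only for the
# file that lacks the keyword.
listed=$(find "$tmpdir" -type f -name '*.h' \
  -exec sed -n '1,5{/\$Header:/q1};6q' {} \; -print)

printf '%s %s %s\n' "$tagged_status" "$untagged_status" "${listed##*/}"
rm -rf "$tmpdir"
```

Note that no ! is needed in front of -exec here, because the sed script already reports "keyword found" as failure.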

Alas, I don't know of any way to use -exec ... {} + for fewer processes, even using sed.git with the -s option and the new F command: I don't have enough sed expertise to write a script that outputs the name of each of its multiple input files exactly in the case where that file's first five lines do not contain a regex match.

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


