bug-grep

RE: find + sh + grep


From: Kirk Korver
Subject: RE: find + sh + grep
Date: Mon, 24 Oct 2011 17:44:09 -0700

Dear Eric,
Thank you for taking the time to explain to me what I was doing wrong, and
to suggest a faster script. You are wonderful!

Kind regards,
Kirk

-----Original Message-----
From: Eric Blake [mailto:address@hidden]
Sent: Monday, October 24, 2011 11:14 AM
To: Kirk Korver
Cc: address@hidden
Subject: Re: find + sh + grep

On 10/21/2011 07:33 PM, Kirk Korver wrote:
>
>
> find . -type f -name bp_fs.h ! -exec sh -c "cat {} | head -n 5 | grep -q
> '\$Header:.*\$'" \; -print

Your script has a useless use of cat; it is shorter to use this -exec:

-exec sh -c "head -n 5 {} | grep -q '\$Header:.*\$'" \;

Also, POSIX states that with -exec,

"If a utility_name or argument string contains the two characters "{}" , 
but not just the two characters "{}" , it is implementation-defined 
whether find replaces those two characters or uses the string without 
change."

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html

GNU find happens to replace the {}, but not all other find 
implementations do, so to be portable, you have to rewrite your script:

-exec sh -c 'head -n 5 "$1" | grep -q '"'\$Header:.*\$'" sh {} \;
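A quick way to convince yourself that the file name really arrives as "$1" is to let the inner sh just print its argument. This is only a sketch: the temporary directory and the "my dir" subdirectory are made up for the demonstration, and the embedded space is there to show that the name survives intact.

```shell
# Throwaway tree with a space in the path (hypothetical names).
dir=$(mktemp -d)
mkdir "$dir/my dir"
: > "$dir/my dir/bp_fs.h"

# find passes each path as a separate argument; the script text never
# contains {} so no find implementation has to substitute inside a string.
seen=$(find "$dir" -type f -name bp_fs.h \
    -exec sh -c 'printf "%s\n" "$1"' sh {} \;)
echo "$seen"

rm -rf "$dir"
```

The extra "sh" before {} fills $0 of the inner shell, so that the file name lands in $1.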

But those are just improvements, and don't answer your original question.

Your real problem is that you are using the wrong regex.  Stick in a 
printf to see what you are currently executing:

-exec sh -c \
   'head -n 5 "$1" | printf :%s: grep -q '"'\$Header:.*\$'; echo" \
   sh {} \;

Hmm, see how that outputs:

:grep::-q::$Header:.*$:

which means you were invoking:

grep -q '$Header:.*$'

But grep cannot match (there is no file where the end of line is 
followed by additional text).  What you _wanted_ was extra quoting in 
front of the first $ that survives _both_ shell and sh interpolation, so 
that you invoke

grep -q '\$Header:.*$'

so to get that, you have to invoke:

-exec sh -c 'head -n 5 "$1" | grep -q '"'\\\$Header:.*\$'" sh {} \;

Fix your regex, and you should be good to go.

>
> The last two times, I was just trying the grep by itself. I think this
> experiment shows that grep does not find a match of $Header.....$, mostly
> because there is not one, but somehow I am getting a match on the
> complicated    find + sh + grep combination. I am sure I am missing
> something, but I do not know what.

You were missing the fact that you were dealing with not one, but two 
levels of shell quoting (the one in the shell where you invoked find, 
and the one in the sh -c invocation).
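The two levels can be watched one at a time. The following is a rough sketch, not part of the original script: the variable names are invented for illustration, and printf stands in for grep so that the final argument is visible instead of being consumed.

```shell
# Level 1: the outer shell processes the double quotes, turning \$ into $
# (and \\ into \). These strings are what sh -c would receive.
broken="grep -q '\$Header:.*\$'"
fixed="grep -q '\\\$Header:.*\$'"
printf 'sh -c receives: %s\n' "$broken" "$fixed"

# Level 2: the inner sh strips the single quotes. printf replaces grep
# here, so its output is the pattern grep would have been given.
broken_arg=$(sh -c "printf '%s' '\$Header:.*\$'")
fixed_arg=$(sh -c "printf '%s' '\\\$Header:.*\$'")
printf 'grep receives:  %s\n' "$broken_arg" "$fixed_arg"
```

Only the second form still carries a backslash in front of the first $ by the time it reaches grep.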

Some parting thoughts:

grep (and sed) match a regex anywhere within a line, so a trailing .*$ 
adds nothing.  That is, the regex '\$Header:.*$' and '\$Header:' are 
identical at determining a match.
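To check that the two forms agree, both can be run against the same input. The sample lines below are made up for the test; only the patterns come from this thread.

```shell
# Hypothetical sample lines: one with the keyword, one without.
with='/* $Header: bp_fs.h 1.2 $ */'
without='/* plain comment */'

long_hit=no; short_hit=no; long_miss=no; short_miss=no
printf '%s\n' "$with"    | grep -q '\$Header:.*$' && long_hit=yes
printf '%s\n' "$with"    | grep -q '\$Header:'    && short_hit=yes
printf '%s\n' "$without" | grep -q '\$Header:.*$' || long_miss=yes
printf '%s\n' "$without" | grep -q '\$Header:'    || short_miss=yes

echo "keyword line:  long=$long_hit short=$short_hit"
echo "plain line rejected: long=$long_miss short=$short_miss"
```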

Anything that uses head|grep can generally be rewritten to use sed, and 
by using just sed, you can avoid an intermediate sh.  Fewer processes 
means faster operation (_especially_ on your cygwin environment).  In 
this case, I like the GNU sed extension of q1 to force a non-zero exit 
status on a range-restricted match, while quitting with zero status if I 
exceed that range.  Therefore, since cygwin already provides GNU find 
and sed, you can get away with this faster (but non-portable) variant, 
which also has the benefit of no double quoting:

find -type f -name bp_fs.h \
   -exec sed -n '1,5{/\$Header:/q1};6q' {} \; -print
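The exit-status logic can be tried on two throwaway files. This sketch assumes GNU find and GNU sed, as above; the directory layout and file contents are invented for the demonstration.

```shell
# Two hypothetical headers: a/ has the keyword in its first lines, b/ does not.
dir=$(mktemp -d)
mkdir "$dir/a" "$dir/b"
printf '%s\n' '/* $Header: demo $ */' 'int x;' > "$dir/a/bp_fs.h"
printf '%s\n' '/* no keyword here */' 'int y;' > "$dir/b/bp_fs.h"

# sed exits 1 on a match within lines 1-5 (so -exec is false and -print is
# skipped) and exits 0 otherwise (so the file name is printed).
missing=$(find "$dir" -type f -name bp_fs.h \
    -exec sed -n '1,5{/\$Header:/q1};6q' {} \; -print)
echo "$missing"

rm -rf "$dir"
```

Only the file lacking the keyword should be listed, which matches what the original ! -exec ... -print was meant to report.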

Alas, I don't know of any way to use -exec ... {} + for fewer processes, 
even with sed.git with the -s option and the new F command, because I 
don't have enough sed expertise to write a script that outputs the 
filename of each of its multiple input files exactly in the case where 
the first five lines do not contain a regex match.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



