bug-grep

Re: find + sh + grep


From: Eric Blake
Subject: Re: find + sh + grep
Date: Tue, 25 Oct 2011 10:25:36 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110928 Fedora/3.1.15-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.4 Thunderbird/3.1.15

[re-adding the list, so that others may learn from this conversation]

On 10/25/2011 10:08 AM, Kirk Korver wrote:
Eric,

I wanted to thank you again for all of your help. I have one additional
question. I will understand if you are too busy to respond. It is not
blocking me, it is just for my personal education. You are more
knowledgeable than anyone else I know.

/me blushes
Not quite true - part of becoming an "expert" is realizing that there is almost always someone out there better than you :)

Here is what I type, and the result.



grep '$Header:.*$' NoEnd.h

$Header:

So my initial analysis wasn't quite right.  According to POSIX,

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

A <dollar-sign> ( '$' ) shall be an anchor when used as the last character of an entire BRE. The implementation may treat a <dollar-sign> as an anchor when used as the last character of a subexpression.

Therefore, I was wrong in stating that you have to use \$; a POSIX-compliant regex engine should correctly recognize that '$Header' uses $ where it cannot be an anchor, and thus treat it as a literal character without needing the \$. [Now, how many regex implementations actually implement this part of POSIX, and how many get it wrong?] In fact, a strict reading of POSIX makes it sound like \$ is undefined if $ would not otherwise be an anchor!
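A quick way to check that reading is the sketch below. The file contents are invented for illustration (the real NoEnd.h is not shown in this thread), and the result assumes a grep that follows the POSIX BRE rule quoted above, as GNU grep does in its default mode.

```shell
# Hypothetical stand-in for NoEnd.h: one keyword line with no closing '$',
# and one line that contains "Header:" but no dollar sign at all.
tmp=$(mktemp)
printf '%s\n' '$Header:' 'Header: no dollar sign' > "$tmp"

# Single quotes pass the pattern to grep unchanged.
# The leading $ is not the last character of the BRE, so it is a literal '$';
# the trailing $ is the last character, so it anchors to end of line.
matches=$(grep -c '$Header:.*$' "$tmp")
echo "$matches"    # prints 1: only the line starting with a literal '$' matches

rm -f "$tmp"
```

If the leading $ were treated as an anchor, neither line could match, so a count of 1 is consistent with the literal interpretation.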


So now on to my (mis)understandings



1) If I had typed    grep '\$Header:.*\$' NoEnd.h    I would not have a
match. The single quote tells the shell to not change the contents, and the
\$ is the dollar sign.

If $ can be an anchor, then \$ is the proper way to match a literal '$'. At least GNU sed and grep treat \$ as a literal '$' everywhere, whether or not an unescaped $ in that position could act as an anchor.
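To make the difference between a trailing \$ and a trailing $ concrete, here is a minimal sketch. The two sample lines are made up, and the counts assume GNU grep in BRE mode.

```shell
# Hypothetical sample: one keyword line closed by '$', one left open.
tmp=$(mktemp)
printf '%s\n' '$Header: closed $' '$Header: open' > "$tmp"

# \$ at the end demands a literal '$' as the last matched character,
# so only the closed line matches.
closed=$(grep -c '\$Header:.*\$' "$tmp")

# An unescaped trailing $ is an end-of-line anchor, so both lines match.
anchored=$(grep -c '$Header:.*$' "$tmp")

echo "$closed $anchored"    # prints: 1 2

rm -f "$tmp"
```

This is why the escaped pattern finds nothing in a file whose keyword line has no closing '$'.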


2) In what I did type, the dollar sign is the 'end of line' character

3) I typed, find a line where there is an end of line, followed by Header:
This is not my intent

Back to that pesky POSIX wording - if $ is not at the end of the regex or a subexpression of the regex, then it cannot be an anchor, therefore it does not match the end of line and instead matches a literal '$'.


4) I also tried     grep   "$Header:.*$" NoEnd.h    and got 3 matches. I
then realized that the    $Header   was being interpreted as the environment
variable  Header  which is currently not set, so this becomes      grep
':.*'   which matches all lines with a colon in them. I am not sure what the
$  all by itself means. Short lesson there, understand better the difference
between the single quote, and the double quote, and then type what I mean. :)

Yes, in shell expansion, "$Header" is much different from '$Header'/"\$Header", which is in turn different from '\$Header'.

Also, in shell, a "$" that is not followed by a variable name (such as the trailing $ in your pattern) produces a literal $ rather than a variable expansion, but relying on that is risky enough that you should generally escape that particular $.
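The quoting differences are easy to see by printing what the shell would hand to grep. This is only a sketch: the variable names single, double, escaped and backslash are mine, and it assumes Header is unset and that the shell is not running with set -u.

```shell
unset Header    # matches the situation described: Header is not set

single='$Header:.*$'       # no expansion at all inside single quotes
double="$Header:.*$"       # $Header expands to nothing; the lone trailing $ survives
escaped="\$Header:.*\$"    # backslash inside double quotes protects each $
backslash='\$Header'       # single quotes keep the backslash itself

printf '%s\n' "$single" "$double" "$escaped" "$backslash"
# prints:
# $Header:.*$
# :.*$
# $Header:.*$
# \$Header
```

The second line is the regex grep actually received in case 4.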




Now my question, I do not understand why there is a match, in what I
initially typed. I am missing something about the regular expression. I
believe that

$Header:.*$

means

[end of line]Header:[any character][any number of times][end of line], which
should not yield a match.

It actually means literal $, literal Header:, any number of characters, and end of line.




Can you shed some light?

So my initial analysis wasn't quite right - but I still stand by the conclusion that you had a bad regex. It was the combination of double shell expansion, inside "", that ate enough levels of \ that you ended up expanding $Header instead of searching for a literal $, so you were using a different regex than you had planned (:.*$ instead of $Header:.*$).
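For anyone who wants to reproduce the "3 matches" effect, the sketch below runs the intended and the accidental regex side by side. The file contents are invented (chosen so that three lines contain a colon), Header is assumed to be unset, and the counts assume GNU grep in BRE mode.

```shell
# Hypothetical file: three lines contain a colon, one of them is the keyword line.
tmp=$(mktemp)
printf '%s\n' '$Header:' 'case 1: break;' 'label: goto done;' 'no colon here' > "$tmp"

unset Header

# Single quotes: grep sees $Header:.*$ and matches only the keyword line.
intended=$(grep -c '$Header:.*$' "$tmp")

# Double quotes: the shell expands $Header to nothing, so grep sees :.*$
# and matches every line that contains a colon.
actual=$(grep -c "$Header:.*$" "$tmp")

echo "$intended $actual"    # prints: 1 3

rm -f "$tmp"
```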

At any rate, thanks for forcing me to re-read POSIX and add better information to this thread.

P.S. with an 801 number, do you live in SLC?

Yep.

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


