help-gnu-emacs

Re: RE for any text, including white space


From: ken
Subject: Re: RE for any text, including white space
Date: Wed, 16 Mar 2011 17:53:04 -0400
User-agent: Thunderbird 2.0.0.24 (X11/20101213)

On 03/16/2011 03:40 PM PJ Weisberg wrote:
> On 3/16/11, ken <address@hidden> wrote:
>> What's the RE for any text, white space included?  I also want to grab
>> (for match-string...) this text.  The text is bounded by known
>> characters.  E.g.,
>>
>> <h3>Any Text-- <a name="thisname">
>> Hot Stuff</h3>
>> In the above, how to grab the text of the title, i.e., everything
>> between <h3> and </h3>?  Conceivably this title text might contain
>> *anything* except "</[Hh][1-9]".
>>
> 
> If A and B are your start and end points, then you want:
> 
> "A\\(.\\|\n\\)*?B"

That's almost it, but not quite.  It grabs only the last character
before the "B"; in my example above it grabs just "f".  I need to grab:

"Any Text-- <a name="thisname">
Hot Stuff"

-- without the quotes, of course.
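A minimal sketch of what is going on, assuming Emacs 21 or later (needed for the non-greedy `*?` and the shy group `\\(?: ... \\)`): when a group is repeated, `match-string` reports only its last repetition, which is why group 1 comes back as "f". Putting the whole repetition inside one outer group, and making the repeated inner group shy, captures the full title. The names `my-h3-sample` and `my-h3-title` are hypothetical.

```elisp
(defconst my-h3-sample
  "<h3>Any Text-- <a name=\"thisname\">\nHot Stuff</h3>"
  "Hypothetical sample: the title from the original question.")

;; PJ's pattern: group 1 sits inside the `*?', so the match data keeps
;; only the text of its last repetition.
(when (string-match "<h3>\\(.\\|\n\\)*?</h3" my-h3-sample)
  (match-string 1 my-h3-sample))
;; => "f"

(defun my-h3-title (str)
  "Return the text between <h3> and </h3 in STR, or nil.
Hypothetical helper for illustration."
  ;; Outer \\( ... \\) is group 1 and spans every repetition.
  ;; Inner \\(?: ... \\) is shy, so it takes no group number.
  (when (string-match "<h3>\\(\\(?:.\\|\n\\)*?\\)</h3" str)
    (match-string 1 str)))

(my-h3-title my-h3-sample)
;; => "Any Text-- <a name=\"thisname\">\nHot Stuff"
```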


> 
> You probably got thrown off by the fact that '.' matches anything
> EXCEPT a newline.  

Well, no, I discovered that a long time ago.  I'm thrown off by a lot of
things though... like why....  Well, I don't want to throw the thread
off in four other directions, so I won't say.
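A quick sketch of the point about '.', assuming any reasonably recent Emacs (Emacs 21 or later for the shy group): `.` stops at the end of the line, whereas `.\\|\n`, or a complemented bracket expression such as `[^z]`, also matches the newline. `my-first-match` is a hypothetical helper.

```elisp
(defun my-first-match (re str)
  "Return the text matched by RE in STR, or nil.  Hypothetical helper."
  (when (string-match re str)
    (match-string 0 str)))

;; `.' excludes newline, so the match ends at the line break.
(my-first-match "H.*" "Hot\nStuff")
;; => "Hot"

;; Alternating with an explicit newline crosses the line break.
(my-first-match "H\\(?:.\\|\n\\)*" "Hot\nStuff")
;; => "Hot\nStuff"

;; A complemented bracket expression matches newline too, unless the
;; newline itself is listed inside the brackets.
(my-first-match "H[^z]*" "Hot\nStuff")
;; => "Hot\nStuff"
(my-first-match "H[^z\n]*" "Hot\nStuff")
;; => "Hot"
```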

If what you gave me works to find just the "f" before "</h3", then
something like "<h3>\\(\\[.\n\t ]*\\)</h3" should work, right?  Nope.
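A sketch of why that pattern finds nothing, run against the same sample title (the constant name `my-h3-sample` is hypothetical). In the Lisp string, `\\[` reaches the regexp engine as `\[`, which matches a literal `[`, so the pattern demands a `[` right after `<h3>`. Dropping the backslash does not rescue it either: inside a bracket expression `.` is just a literal period, so the set only allows periods, newlines, tabs and spaces.

```elisp
(defconst my-h3-sample
  "<h3>Any Text-- <a name=\"thisname\">\nHot Stuff</h3>"
  "Hypothetical sample: the title from the original question.")

;; \\[ is a literal "[", and the sample has none after <h3>.
(string-match "<h3>\\(\\[.\n\t ]*\\)</h3" my-h3-sample)
;; => nil

;; As a real bracket expression, "." inside [...] is a literal period,
;; so the title text ("Any Text-- ...") cannot match.
(string-match "<h3>\\([.\n\t ]*\\)</h3" my-h3-sample)
;; => nil

;; It only matches titles made of periods and whitespace.
(string-match "<h3>\\([.\n\t ]*\\)</h3" "<h3>. .\n</h3>")
;; => 0
```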


> Regexps are usually assumed to be line-based.

Yeah.  That must be a throw-back to the mainframe days.  And that's
unfortunate.
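If the line-based `.` keeps getting in the way, one option is Emacs's `rx` macro, whose `anything` form matches any character, newline included, and whose `*?` form is the non-greedy repetition. A minimal sketch, assuming an Emacs recent enough to provide these `rx` forms; the exact regexp string `rx` produces differs between versions, so only the match result is shown. `my-h3-rx` is a hypothetical name.

```elisp
(defconst my-h3-rx
  ;; group 1 = everything between <h3> and the first following </h3
  (rx "<h3>" (group (*? anything)) "</h3")
  "Hypothetical regexp: group 1 is the text between <h3> and </h3.")

(let ((html "<h3>Any Text-- <a name=\"thisname\">\nHot Stuff</h3>"))
  (when (string-match my-h3-rx html)
    (match-string 1 html)))
;; => "Any Text-- <a name=\"thisname\">\nHot Stuff"
```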


> 
> The '?' is there to make the '*' non-greedy, to prevent it from
> matching everything between the first A and the last B in the whole
> buffer.

I've formulated a lot of other similar REs without using the '?' and
they work fine, so I didn't even try that.  Once I find something that
works, it would be interesting then to see the differential effect with
and without it.
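A small comparison, assuming text that holds two titles (the constant name `my-two-titles` is hypothetical): the greedy `*` runs from the first `<h3>` to the last `</h3` it can find, while `*?` stops at the first one. When the searched text contains only a single `</h3`, the two give the same result, which may be why greedy patterns have seemed fine so far.

```elisp
(defconst my-two-titles "<h3>One</h3>\n<p>body</p>\n<h3>Two</h3>"
  "Hypothetical sample containing two <h3> titles.")

;; Greedy: group 1 swallows everything up to the LAST </h3.
(when (string-match "<h3>\\(\\(?:.\\|\n\\)*\\)</h3" my-two-titles)
  (match-string 1 my-two-titles))
;; => "One</h3>\n<p>body</p>\n<h3>Two"

;; Non-greedy: group 1 stops at the FIRST </h3 after <h3>.
(when (string-match "<h3>\\(\\(?:.\\|\n\\)*?\\)</h3" my-two-titles)
  (match-string 1 my-two-titles))
;; => "One"
```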


> 
> The double '\'s are necessary in lisp code because it's interpreted as
> a string before it's passed to the regexp engine.

Yeah, I've seen and used a lot of that.  Most of the time my first guess
gets it right.
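A sketch of what the Lisp reader actually hands to the regexp engine: each `\\` in a string literal becomes one backslash, and `\n` becomes a real newline character. `regexp-quote` goes the other way, adding the backslashes needed to match a piece of text literally.

```elisp
;; The regexp engine receives \(.\|<newline>\) : 8 characters in all.
(length "\\(.\\|\n\\)")
;; => 8

;; "\\(" is a backslash followed by "(", i.e. two characters.
(string= "\\(" (string ?\\ ?\())
;; => t

;; regexp-quote escapes special characters such as ".".
(regexp-quote "a.b")
;; => "a\\.b"
```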


> 
> -PJ

Thanks much for the good attempt.

Ken




