help-gnu-emacs

Re: bug in elisp... or in elisper???


From: ken
Subject: Re: bug in elisp... or in elisper???
Date: Wed, 23 Mar 2011 10:18:34 -0400
User-agent: Thunderbird 2.0.0.24 (X11/20101213)

On 03/22/2011 08:15 PM PJ Weisberg wrote:
> On 3/22/11, ken <gebser@mousecar.com> wrote:
>> Fellow elispers,
>>
>> Something seems to be amiss in the search syntax here:
>>
>>  (setq aname-re-str
>> "<a\\([\s-\\|\n]+?\\)name=\"\\(.*?\\)\"\\([\s-\\|\n]*?\\)>\\(\\(.\\|\n\\)*?\\)</a\\(\\( \\|\t\\|\n\\)*?\\)>" )
>>
> ...
>> The problem is that the 5th match-string should be either empty or
>> whitespace.  But it consistently contains the last character of the
>> 4th match-string.  And these two matches are separated by the literal
>> character string, "</a"!!  What's up with this?
> 
> You miscounted your '('s.  The fifth group IS inside the fourth group,
> matching . or \n.
> 
> -PJ

It wasn't that I miscounted.  I read a doc which said that I couldn't
embed one potential match expression inside another.  (I mentioned this,
I believe, in a previous email.)  So I figured that, if this wasn't
allowed, I certainly couldn't count each expression inside a pair of
parens as another match.  But it seems that doc was wrong.
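To see what PJ describes, here is a small Emacs Lisp sketch. The helper `my-demo-groups` and the sample strings are hypothetical, made up for illustration. Groups are numbered by the position of their opening `\\(`, so a nested group gets its own number, and a group inside a repetition keeps only the text from its last iteration. A shy group, `\\(?:...\\)`, groups without capturing, which keeps the numbering of later groups predictable.

```elisp
;; Hypothetical helper: return match strings 1 and 2 of RE against S.
(defun my-demo-groups (re s)
  "Return (GROUP1 GROUP2) of RE matched against S, or nil."
  (when (string-match re s)
    (list (match-string 1 s) (match-string 2 s))))

;; Group 2 sits inside the repeated group 1, so it keeps only the
;; text from its last iteration: the last character of the body.
(my-demo-groups "<a>\\(\\(.\\|\n\\)*?\\)</a>" "<a>xyz</a>")
;; => ("xyz" "z")

;; With a shy group for the inner alternation, the next capturing
;; group (the trailing whitespace) becomes number 2.
(my-demo-groups "<a>\\(\\(?:.\\|\n\\)*?\\)</a\\([ \t\n]*\\)>" "<a>xyz</a >")
;; => ("xyz" " ")
```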

So this is actually good news: my RE works just as I want it to *and*
there's no bug in elisp to contend with.  I am, however, starting to
have trust issues with documentation I find on the web.  But I have you
guys here on this list as a reality check.

If one match expression *can* be embedded within another, this is good
news: it means I can write more comprehensive REs.  I.e., instead of
writing RE #1 to locate a section of text and then RE #2 to parse just
that section, REs #1 and #2 can be combined into one RE.  Radically cool.
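A minimal sketch of that idea, assuming markup as simple as the sample (the function name `my-anchor-parts` and the input string are hypothetical): one regexp locates the element, and its groups pull out the name and the body in the same pass. Match string 0 is the whole located section.

```elisp
;; Hypothetical function: locate the first <a name="..."> element in S
;; and return (WHOLE-ELEMENT NAME BODY); the body may span lines.
(defun my-anchor-parts (s)
  "Return (ELEMENT NAME BODY) for the first <a name=...> in S, or nil."
  (when (string-match
         "<a[ \t\n]+name=\"\\([^\"]*\\)\"[ \t\n]*>\\(\\(?:.\\|\n\\)*?\\)</a[ \t\n]*>"
         s)
    (list (match-string 0 s)     ; the located section
          (match-string 1 s)     ; the name attribute
          (match-string 2 s))))  ; the element body

(my-anchor-parts "text <a name=\"intro\">Hello\nworld</a> more")
;; => ("<a name=\"intro\">Hello\nworld</a>" "intro" "Hello\nworld")
```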

So some further questions:

You might have noticed I use "\\([\s-\\|\n]+?\\)" to non-greedily match
one or more whitespace characters.  Can one bracket expression, "[...]",
be nested inside another, e.g., "[[\s-\\|\n]+?]" or some syntax like that?
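As far as I know, bracket expressions in Emacs regexps do not nest; the only bracketed thing allowed inside one is a named character class such as `[:space:]`, which matches characters with whitespace syntax in the current syntax table. The sketch below shows a few alternatives (the constant names are hypothetical). It also shows a pitfall worth checking: backslash is not special inside brackets, and in a Lisp string literal `\s` reads as a plain space, so `"[\s-\\|\n]"` is the range from space to backslash plus `|` and newline, which also matches letters such as `A`.

```elisp
;; Brackets do not nest, but a bracket expression may mix a named
;; character class with ordinary characters.
(defconst my-ws-re "[[:space:]\n]+?"
  "Hypothetical name: one or more whitespace chars, non-greedy.")

;; An explicit set that does not depend on the syntax table.
(defconst my-ws-explicit-re "[ \t\n]+?")

;; Outside brackets, the whitespace syntax class needs a doubled
;; backslash in a Lisp string; a shy group alternates it with newline.
(defconst my-ws-class-re "\\(?:\\s-\\|\n\\)+?")

;; Pitfall: "\s" in a string literal is a space and backslash is not
;; special inside brackets, so this is the range SPACE..BACKSLASH
;; (codes 32..92) plus "|" and newline.  "A" (65) is in that range.
(string-match "[\s-\\|\n]" "A")  ; => 0, not nil
```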

The "specialness" of "." seems to be lost when inside brackets; that is,
in "[.\n]*?" it seems to represent a regular period (.) rather than "any
character except newline".  Is there some way to bring back that
specialness?  Or is there some other RE to represent "multiple instances
of any character, including a newline"?
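Inside brackets `.` is indeed an ordinary character. A common idiom, sketched below (the constant name is hypothetical), is a shy group that alternates `.` (any character except newline) with `\n`; because the group is shy, it matches any single character without adding a numbered group.

```elisp
;; Inside brackets "." is literal: "[.\n]" matches only a period or a
;; newline, so a string of letters does not match.
(string-match "\\`[.\n]+\\'" "a")   ; => nil

;; Any single character, newline included, with no capture.
(defconst my-any-char-re "\\(?:.\\|\n\\)"
  "Hypothetical name: matches exactly one character of any kind.")

;; Non-greedily span a line break to reach the final "line".
(string-match (concat "\\`" my-any-char-re "*?line\\'")
              "first line\nsecond line")
;; => 0
```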

Is it actually true, as the docs say, that there's a limit of nine
sub-expression match-strings per RE?  Or can I do, e.g., "(match-string
12)" and "(match-string 15)"?  What is the actual limit?  Whatever it
is, is it hard-coded into elisp, or can it be changed or configured?
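A quick check, assuming a reasonably recent Emacs: `match-string` can read groups numbered past 9. The nine-group limit applies to backreferences `\1` through `\9` inside a regexp, and to the single-digit `\N` in `replace-match` replacement text. Explicitly numbered groups, `\\(?NUM:...\\)`, let you choose the number yourself. The function name below is hypothetical, and this sketch does not establish any upper bound.

```elisp
;; Hypothetical helper: a regexp with twelve capturing groups, one per
;; character.
(defun my-twelve-groups-re ()
  "Return a regexp with 12 groups, each matching one character."
  (apply #'concat (make-list 12 "\\(.\\)")))

(let ((s "abcdefghijkl"))
  (when (string-match (my-twelve-groups-re) s)
    (list (match-string 9 s) (match-string 12 s))))
;; => ("i" "l")

;; An explicitly numbered group: this single group is number 15.
(when (string-match "\\(?15:x\\)" "x")
  (match-string 15 "x"))
;; => "x"
```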


Thanks for the illumination.



