help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: will we ever have zero width assertions in regexps?


From: Stefan Monnier
Subject: Re: will we ever have zero width assertions in regexps?
Date: Fri, 28 Jan 2011 21:51:55 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

>> To get rid of the occasional pathological case where matching takes
>> forever and Emacs appears to be frozen.  Programmers who are used to
>> backtracking matchers will usually intuitively stay away from regexps
>> that can show such behaviors, but not all programmers do, and even if
>> you're careful there are cases that are hard to avoid.
> Did you try it with Perl recently (last 10 years or so)?

No, but neither have I bumped into pathological cases in Perl before
that (when I did use it).

> As I said, I put some optimizations which in most (AFAIK) practical
> senses remove such pathologies.  (The underlying problems remain; the
> optimizations are only "heuristic"; but one needs to be extra
> inventive to circumvent the optimizations.)

A typical case could look something like "foo *(.*?) *bar". when
matching "foo ..<many space>.. baZ".
Emacs's regexp engine is not very clever and doesn't do much in terms of
avoiding backtracking (it mostly takes care of <foo>*<bar> when <foo>
can only match a single char and <bar> can only start with a char that's
not matched by <foo>), but I can't think of too many ways to handle the
above one efficiently within a "backtracking regexp matcher" framework.

>> Another minor reason is that it can be handy to have an incremental
>> matching primitive, so you can match over a long string one chunk at
>> a time.  I'm not sure how often this would be useful, but I've come
>> across a few cases where it seemed like it could be put to good use
>> (tho, for lack of experience with it, I can't sweat that it would turn
>> out to be a good idea).
> Do not know what you mean by this...

Basically, provide a primitive like (match-string RE STRING LIMIT) that
can not only say "matched between START and END", but also "reached
LIMIT within yet finding a match, here's the suspended SEARCH-STATE at
LIMIT", so you can later resume the search starting at LIMIT by passing
that state.


        Stefan


reply via email to

[Prev in Thread] Current Thread [Next in Thread]