[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Stack overflow in regexp matcher

From: Stefan Monnier
Subject: Re: Stack overflow in regexp matcher
Date: Mon, 07 Feb 2011 15:24:14 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

> The following fails with "Stack overflow in regexp matcher" in emacs 23
> and 24:

> (string-match
>  "^\\[.+\\]$"
>  (concat
>   "["
>   (mapconcat (lambda (i) "x") (number-sequence 1 33500) "")
>   "]"))

> This surprised me; I assumed that the ^ and $ anchors, and the simple
> ".+" requirement in the middle would result in a simple, efficient
> regexp.

The problem is that Emacs's regexp engine is not very clever.

Basically, each time . succeeds we first recurse, trying to match the
rest against ".*\\]$", and if that fails then we try to match
"\\]$" instead.  The fact that at each step we have this "fork" of two
different possible ways to match, means that at each step we have to
push something on the stack, so the longer the string against which we
match, the deeper the stack we need.

You can fix it by using a regexp that does not need this backtracking.
E.g. "^\\[[^]\n]+\\]$".  If you do that, then Emacs's regexp engine will
notice that when [^]\n] matches, then "\\]$" can't match, so there's no
point pushing something on the stack to try the second path when the
first fails.  I.e. it will do the whole match with a constant amount of
stack space.

This relates also to the recent discussion with Ilya in the thread
titled "will we ever have zero width assertions in regexps?".


reply via email to

[Prev in Thread] Current Thread [Next in Thread]