emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fixing ill-conditioned regular expressions. Proof of concept.


From: Alan Mackenzie
Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept.
Date: Thu, 26 Feb 2015 13:09:18 +0000
User-agent: Mutt/1.5.22 (2013-10-16)

Hello, Tassilo.

On Thu, Feb 26, 2015 at 12:05:37PM +0100, Tassilo Horn wrote:
> Alan Mackenzie <address@hidden> writes:

> Hi Alan,

> >> Sure, but you could remember how the \(...\) constructs were
> >> renumbered, and fix the match data after the underlying regexp call
> >> returned.  It shouldn't be a big deal.

> > Unfortunately, it's not that simple.  Consider the RE

> >     \(R\)+E*\(R\)+
> >      1  1    2  2

> > .  This gets transformed to

> >     \(R\)+\(?:E+\(R\)+\|\(R\)\)
> >      1  1        2  2    2  2

> > .  What was subexpression 2 in the original has become two
> > subexpressions straddling an \| sign in the transformation.  I don't
> > think there's a way of transforming R+E*R+ that preserves the
> > numbering of the subexpressions.

> Couldn't you use explicitly numbered groups, i.e., the regex would
> translate to

>     \(?1:R\)+\(?:E+\(?2:R\)+\|\(?2:R\)\)

> ?

Thanks.  That's a brilliant idea!  I think it would work.

> As long as the groups with the same number are exclusive there shouldn't
> be a problem.

I think that's true.  There might be a slight problem with groups which
match only the empty string.  Something like:

    R*\(\)R*

, but anybody who writes such regexps deserves what she gets.

> Bye,
> Tassilo

-- 
Alan Mackenzie (Nuremberg, Germany).



reply via email to

[Prev in Thread] Current Thread [Next in Thread]