From: Mario Domenech Goulart
Subject: Re: [Chicken-hackers] [PATCH] [SECURITY] Update irregex to upstream 0.9.6
Date: Wed, 14 Dec 2016 21:34:53 +0100

Hi,
On Wed, 14 Dec 2016 20:39:18 +0100 Peter Bex <address@hidden> wrote:
> Attached is a patch to update irregex to the upstream version of 0.9.6.
>
> When compiling an absurdly nested regex like
> ($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($(${-2,16}+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)
> the engine would consume gigabytes of memory.
>
> The reason is that (+ foo) would be rewritten to (seq foo (* foo)),
> causing the regex to become twice as large. If the nested regex itself
> also contains +, this happens recursively. Each subexpression
> will be compiled to a backtracking matcher, building up closures, which
> eats up even more memory than just the SRE list representation.
>
> The fix is to handle + "natively" instead of rewriting it. The patch
> also refactors * in terms of + (it is simply + with a failure
> continuation that carries on matching the next expression).
>
> The patch also includes two small changes by Sudarshan S. Chawathe,
> who improved the clarity of the documentation (sre->string generates
> a PCRE regex pattern, not a POSIX pattern) and fixed a small bug in
> sre->string: before the fix, (sre->string '(seq)) raised the error
> "(cddr) bad argument type: ()" instead of returning "", as an
> empty sequence should.
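
For readers following the thread, the doubling described in the quoted message can be illustrated with a small sketch. It is not irregex code: it models SREs as nested Python tuples, and the helper names (expand_plus, size, nested_plus) are hypothetical. It only shows how rewriting (+ foo) to (seq foo (* foo)) at every nesting level makes the expression tree grow exponentially with depth.

```python
# Toy model only: SREs are nested tuples such as ("+", "x").
# All function names here are made up for illustration.

def expand_plus(sre):
    """Rewrite every (+ x) as (seq x (* x)), bottom-up."""
    if not isinstance(sre, tuple):
        return sre                      # atoms are left alone
    head, *args = sre
    args = [expand_plus(a) for a in args]
    if head == "+":
        body = args[0] if len(args) == 1 else ("seq", *args)
        return ("seq", body, ("*", body))   # body now appears twice
    return (head, *args)

def size(sre):
    """Count nodes the way a tree walk (e.g. a compiler pass) visits them."""
    if not isinstance(sre, tuple):
        return 1
    return 1 + sum(size(a) for a in sre[1:])

def nested_plus(depth):
    """Build (+ (+ ... (+ x))) with the given nesting depth."""
    sre = "x"
    for _ in range(depth):
        sre = ("+", sre)
    return sre

if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        before = size(nested_plus(depth))
        after = size(expand_plus(nested_plus(depth)))
        print(depth, before, after)
```

In this model the original expression has depth + 1 nodes, while the rewritten one has 3 * 2**depth - 2, so a few hundred nesting levels are far beyond what any amount of memory can hold. Handling + directly, as the patch does, avoids duplicating the body in the first place.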
Thanks a lot, Peter. Pushed.
All the best.
Mario
--
http://parenteses.org/mario