[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Chicken-users] an oddly slow regex ...
From: |
J Altfas |
Subject: |
Re: [Chicken-users] an oddly slow regex ... |
Date: |
Tue, 29 Oct 2013 00:47:00 -0700 |
FWIW, I tried the same regex in Tcl. Using tclsh, I entered at the
prompt:
% regexp -inline {[a-z][a-z0-9\\-_.]{0,20}}
"a012345678901234567890123456789"
a01234567890123456789
It seemed to work fast, but to quantify execution time, the timed
output of 100000 iterations was "1.3823 microseconds per iteration".
The Tcl "hybrid" regex engine is well-described at
http://wiki.tcl.tk/396 and http://wiki.tcl.tk/461. The "nuts and
bolts" of its design are beyond my level of comprehension, but those
more familiar with the subject might find it interesting.
Jules Altfas.
> On Sat, 26 Oct 2013 20:38:12 +0200, Peter Bex
<address@hidden> wrote:
> On Sat, Oct 26, 2013 at 10:37:36AM -0700, Matt Welland wrote:
> > This regex is so slow that you don't need a timer to see the
impact (at
> > least not on my machine with chicken 4.8.0):
> >
> > (string-match "[a-z][a-z0-9\\-_.]{0,20}"
"a012345678901234567890123456789")
> >
> > Changing the {0,20} to + makes it run normally fast so I just
replaced the
> > regex with a string-length and modified the "{0,20}" to "+" . I
don't
> > necessarily need a fix for this but it seems like a possible
symptom of a
> > deeper problem so I thought I'd report it.
>
> Hi Matt,
>
> Thanks for your report. I'm afraid this is a known problem with
> Irregex - to avoid producing a state machine with too many states,
> it will always use a backtracking implementation for all
repetition
> counts.
>
> I think it's best to take a look at how to fix this upstream
first.
> Maybe Alex has an idea of how to do that.
>
> > Pre-compiling the regex didn't seem to make any difference.
>
> That's because the backtracker matches really slowly.
>
> > Just for completeness I compared with Ruby and the ruby
equivalent
> > is (in human terms) instant.
> >
> > "a012345678901234567890123456789".match(/[a-z][a-z0-9]{0,20}/)
>
> Thanks for that! We need more of such real-world test cases.
> Ideally I'd love to have a complete library of regex examples,
which
> we can use to optimize irregex further.
>
> By the way, I note that your Ruby regex isn't exactly the same as
the
> one you used in CHICKEN. Does it make a difference if you use the
> same regex?
>
> Cheers,
> Peter
> --
> http://www.more-magic.net
>
> _______________________________________________
> Chicken-users mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/chicken-users
>