chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] an oddly slow regex ...


From: J Altfas
Subject: Re: [Chicken-users] an oddly slow regex ...
Date: Tue, 29 Oct 2013 00:47:00 -0700

FWIW, I tried the same regex in Tcl.  Using tclsh, I entered at the 
prompt:

% regexp -inline {[a-z][a-z0-9\\-_.]{0,20}} 
"a012345678901234567890123456789"
a01234567890123456789

It seemed to work fast, but to quantify execution time, the timed 
output of 100000 iterations was "1.3823 microseconds per iteration".

The Tcl "hybrid" regex engine is well-described at 
http://wiki.tcl.tk/396 and http://wiki.tcl.tk/461.  The "nuts and 
bolts" of its design are beyond my level of comprehension, but those 
more familiar with the subject might find it interesting.

Jules Altfas.

> On Sat, 26 Oct 2013 20:38:12 +0200, Peter Bex 
<address@hidden> wrote:

> On Sat, Oct 26, 2013 at 10:37:36AM -0700, Matt Welland wrote:
> > This regex is so slow that you don't need a timer to see the 
impact (at
> > least not on my machine with chicken 4.8.0):
> > 
> >  (string-match "[a-z][a-z0-9\\-_.]{0,20}" 
"a012345678901234567890123456789")
> > 
> > Changing the {0,20} to + makes it run normally fast so I just 
replaced the
> > regex with a string-length and modified the "{0,20}" to "+" . I 
don't
> > necessarily need a fix for this but it seems like a possible 
symptom of a
> > deeper problem so I thought I'd report it.
> 
> Hi Matt,
> 
> Thanks for your report.  I'm afraid this is a known problem with
> Irregex - to avoid producing a state machine with too many states,
> it will always use a backtracking implementation for all 
repetition
> counts.
> 
> I think it's best to take a look at how to fix this upstream 
first.
> Maybe Alex has an idea of how to do that.
> 
> > Pre-compiling the regex didn't seem to make any difference.
> 
> That's because the backtracker matches really slowly.
> 
> > Just for completeness I compared with Ruby and the ruby 
equivalent
> > is (in human terms) instant.
> > 
> > "a012345678901234567890123456789".match(/[a-z][a-z0-9]{0,20}/)
> 
> Thanks for that!  We need more of such real-world test cases.
> Ideally I'd love to have a complete library of regex examples, 
which
> we can use to optimize irregex further.
> 
> By the way, I note that your Ruby regex isn't exactly the same as 
the
> one you used in CHICKEN.  Does it make a difference if you use the
> same regex?
> 
> Cheers,
> Peter
> -- 
> http://www.more-magic.net
> 
> _______________________________________________
> Chicken-users mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/chicken-users
>

reply via email to

[Prev in Thread] Current Thread [Next in Thread]