chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] regex speedup


From: Elf
Subject: Re: [Chicken-hackers] regex speedup
Date: Mon, 10 Mar 2008 23:39:40 -0700 (PDT)


patch applied!  thanks!

-elf

On Tue, 11 Mar 2008, Jim Ursetto wrote:

Core hackers,

The attached patch has been shown to speed up regex matching considerably
in a typical case.  It should also do so across the board.

Background: I observed a 40KB POST request to Spiffy
(x-www-form-urlencoded) to require 1.8 seconds on my machine for a
response.  Investigation showed the entire time was spent in
http:canonicalize-string.  There was a suboptimal algorithm there
which generated a lot of garbage; I fixed that in the http egg.

However, a core patch will complete the fix.  The problem is in
string-search-positions and ultimately in re-match.  Any time
a scheme string is passed to pcre_exec(), it is copied.
For large strings, or a typical use case for string-search-positions
-- e.g. repeatedly calling it to get the next match -- the overhead
is very high.

The patch just changes the type from c-string to scheme-pointer,
as pcre_exec() takes a length argument and doesn't need a null
terminator.

With both changes made Spiffy response time drops from 1.8 s to 0.1 s.

For overkill purposes, here is some supporting data.

Prelude:
(use url http-utils)
(define (encode str)
 (string-translate (url-encode str (list #\_ #\- #\* #\. #\@ #\space))
                   " " "+"))
(define raw ...)    ; raw text, length 30029
(define cooked (encode raw))  ; length 36889

Timing:
,t (begin (http:canonicalize-string cooked) (void))

original                w/regex patch

1.957 s elapsed        0.334 s elapsed
1.831 s in GC            0.23 s in GC
 2465 major GCs          311 major GCs

http fixes              w/regex patch

 0.26 s elapsed        0.022 s elapsed
0.195 s in GC              0 s in GC
  175 major GCs            0 major GCs





reply via email to

[Prev in Thread] Current Thread [Next in Thread]