[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: faster fnmatch
Re: faster fnmatch
Fri, 17 Apr 2009 10:47:20 +0100
[ CC += bug-findutils, bug-gnulib; bug-coreutils moved to BCC ]
On Thu, Apr 16, 2009 at 8:09 PM, Ondrej Bilka <address@hidden> wrote:
> Hello. I am writing partial fnmatch to speed up locate et al.
Please note that locate is part of GNU findutils, not GNU coreutils.
I have CC'ed this email to the correct list.
Regular expression matching with "locate -r" is also faster than the
default, globbing, match.
> Idea is save state state to match common prefix only once.
> fnmatch source is too complex so I wrote simplified version.
> My code is 2-4 times faster than fnmatch.
> Here is list what provided speedup and can be applied to original source
> FOLD causes worst performance slowdown.
> From what I tried is best convert in buffer to uppercase when needed.
This seems like an attractive option but I'm a little concerned about
whether this will produce the correct result with characters like ß or
Ï and ï.
> Other optimalizations are based on preprocessing pattern
> 1. For each * we compute minimal size of rest and we have smaller for cycles.
> 2. We can replace *? by ?*
> 3. If * is followed by letter we check it at for cycle of *
> 4. If pattern contains four consecutive letters we compare them as int
I'm, not certain I have fully understood your points, but if you can
speed up gnulib's fnmatch implementation, that would be good news. I
looked at preprocessing the glob pattern to convert it into a regular
expression, but I think there were some problems around collating
symbols and character equivalence classes (i.e. that it''s not
possible for the application to extract information about these from
the C library).
Since I last looked at this issue though, Bruno has implemented a
large amount of Unicode support in gnulib, and this may in fact help
solve the problem too. I've CC'ed the message to bug-gnulib since
that's where the folks who maintain the implementations of that
library and the fnmatch replacement hang out.
> I also tried optimalizations which didnt worked.
> use mask to match four letters and ?
> strlen trick to find first letter after * faster.
> Boyer-moore skiping didnt work either.
> 50% of the manual is in .pdf readme files
> Bug-coreutils mailing list
- Re: faster fnmatch,
James Youngman <=