bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: grep -f scales extremely poorly with number of lines in pattern file


From: Narfi Stefansson
Subject: Re: grep -f scales extremely poorly with number of lines in pattern file
Date: Mon, 20 Mar 2006 14:32:53 -0500

On 3/20/06, Stepan Kasal <address@hidden> wrote:
> On Sun, Mar 19, 2006 at 11:28:48PM -0500, Narfi Stefansson wrote:
> > I was expecting it to take order n time to match against n patterns, where n
> > is the number of lines in the pattern file.
> >
> > What I am experiencing:  The execution time of grep -f behaves like n^a, 
> > with
> > a somewhere in the range [2.5, 3.0].
>
> yes, that is the expected behaviour: the goal of grep is to scan files which
> are much bigger than the total size of all the given patterns.
>
> So it is OK that the time required to build the ``compiled structure'' from
> the pattern list is huge.  fgrep compiles all the strings to one huge 
> structure
> and grep/egrep should treat all the patterns as if it were one huge regular
> expression.
>
This explaines the implementation, but it doesn't answer the question
of whether this is desirable or not.  Look, the net effect is that
grep -f is nearly unusable with 1000 lines in the pattern file,
regardless of how small or large of a file it is scanning.

The speed of grep became a real issue for me when I needed to grep
through a file with 40.000 lines with 1000 lines in the pattern file. 
This took 40 min instead of the approx. 1 min 10 sec that I was
expecting.

Best,

Narfi




reply via email to

[Prev in Thread] Current Thread [Next in Thread]