bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: grep -f scales extremely poorly with number of lines in pattern file


From: Julian Foad
Subject: Re: grep -f scales extremely poorly with number of lines in pattern file
Date: Thu, 30 Mar 2006 03:04:17 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511

Narfi Stefansson wrote:
On 3/20/06, Stepan Kasal <address@hidden> wrote:
On Sun, Mar 19, 2006 at 11:28:48PM -0500, Narfi Stefansson wrote:

I was expecting it to take order n time to match against n patterns, where n
is the number of lines in the pattern file.

What I am experiencing:  The execution time of grep -f behaves like n^a, with
a somewhere in the range [2.5, 3.0].

yes, that is the expected behaviour: the goal of grep is to scan files which
are much bigger than the total size of all the given patterns.

So it is OK that the time required to build the ``compiled structure'' from
the pattern list is huge.  [...]

This explaines the implementation, but it doesn't answer the question
of whether this is desirable or not.  Look, the net effect is that
grep -f is nearly unusable with 1000 lines in the pattern file,

Obviously this is not desirable. I think that O(N^2) behaviour is to be frowned upon, and anything worse is embarassing, even in a part of the program like this that is not really intended to take large numbers of patterns. There is no reason (I'm fairly confident) why it should need to be this bad, and so we ought to make it better.

However, it is not likely to be fixed soon unless you can fix it or get somebody else to do so.

Please file a bug report about this.

- Julian




reply via email to

[Prev in Thread] Current Thread [Next in Thread]