bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6


From: Jim Meyering
Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6
Date: Thu, 26 Nov 2020 09:03:42 -1000

On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <jim@meyering.net> wrote:
> Thank you for the fine bug report.
> The grep-3.6 bug you've exposed is due to the fact that your input
> triggers excessive hash collisions when using the code modeled after
> gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase
> take O(N^2) time for N patterns. In the attached, I've switched grep
> to use the djb2 hash function, and that resolves the problem. I'll
> also add a NEWS entry and a test before pushing this.

Timings suggest that grep-3.6's preprocessing came closer to O(N^3).
Here's an example that would take 2-3 days with grep-3.6 and only
seconds with this fix:

  : | grep -Ff <(seq 6400000 | tr 0-9 A-J)

Here's a complete patch.
I'll push it later today.

Attachment: 0001-grep-avoid-performance-regression-with-many-patterns.patch
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]