bug-grep

bug#40634: Massive pattern list handling with -E format seems very slow


From: Norihiro Tanaka
Subject: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.
Date: Sun, 19 Apr 2020 11:10:26 +0900

On Sun, 19 Apr 2020 07:41:49 +0900
Norihiro Tanaka <address@hidden> wrote:

> 
> On Sat, 18 Apr 2020 00:22:26 +0900
> Norihiro Tanaka <address@hidden> wrote:
> 
> > 
> > On Fri, 17 Apr 2020 10:24:42 +0900
> > Norihiro Tanaka <address@hidden> wrote:
> > 
> > > 
> > > On Fri, 17 Apr 2020 09:35:36 +0900
> > > Norihiro Tanaka <address@hidden> wrote:
> > > 
> > > > 
> > > > On Thu, 16 Apr 2020 16:00:29 -0700
> > > > Paul Eggert <address@hidden> wrote:
> > > > 
> > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote:
> > > > > 
> > > > > > > I have no idea how to solve the problem yet.  If we revert
> > > > > > > it, bug#33357 will come back.
> > > > > 
> > > > > Yes, I'd rather not revert if we can help it.
> > > > > 
> > > > > My own thought was to not analyze the regular expression if we 
> > > > > discover that the input is empty. :-)
> > > > 
> > > > Now I have an idea: build an index of the epsilon nodes that
> > > > appear in the follow sets before removing the epsilon nodes.
> > > 
> > > 
> > > I wrote a fix for the bug, but it is still slower than grep 2.27.
> > 
> > This improves on the previous patch.
> 
> Sorry, the correct patch is here.

I made the previous patch even simpler.

before:

$ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null
real 7.24
user 7.14
sys 0.09

after:

$ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null
real 0.62
user 0.52
sys 0.10
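
To make the idea quoted above more concrete (indexing which nodes refer to each epsilon node, so that removal does not have to scan every follow set), here is a minimal sketch in Python. It is not the attached patch and not the code in dfa.c; the names `remove_epsilon`, `follows` and `backward` are hypothetical, and the graph is modelled as a plain dict of sets purely for illustration.

```python
def remove_epsilon(follows, epsilon):
    """Splice epsilon nodes out of a follow-set graph.

    follows: dict mapping node -> set of nodes that may follow it
    epsilon: set of nodes to remove
    Hypothetical model only; it does not mirror dfa.c data structures.
    """
    # Work on a copy and make sure every node mentioned has an entry.
    follows = {n: set(s) for n, s in follows.items()}
    for dsts in list(follows.values()):
        for d in dsts:
            follows.setdefault(d, set())

    # backward[n] = nodes whose follow set contains n (the reverse index).
    backward = {n: set() for n in follows}
    for src, dsts in follows.items():
        for d in dsts:
            backward[d].add(src)

    for e in epsilon:
        succs = follows[e] - {e}   # ignore a self-loop on e
        preds = backward[e] - {e}
        # Only the predecessors of e are visited, not every node.
        for p in preds:
            follows[p].discard(e)
            follows[p] |= succs
        # Keep the reverse index consistent for later epsilon nodes.
        for s in succs:
            backward[s].discard(e)
            backward[s] |= preds
        follows[e] = set()
        backward[e] = set()
    return follows


# Chain a -> e1 -> e2 -> b with two epsilon nodes in the middle.
result = remove_epsilon(
    {"a": {"e1"}, "e1": {"e2"}, "e2": {"b"}, "b": set()},
    {"e1", "e2"},
)
print(sorted(result["a"]))  # ['b']
```

Without the reverse index, each removal would have to look through the follow set of every node to find references to the epsilon node, which is the kind of cost that grows quickly with a very large pattern list.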

Attachment: 0001-dfa-use-backword-set-in-removal-of-epsilon-closure.patch
Description: Text document

