bug#22357: grep -f not only huge memory usage, but also huge time cost
From: Norihiro Tanaka
Subject: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Mon, 12 Dec 2016 00:26:33 +0900
On Sun, 11 Dec 2016 05:28:56 -0600
Trevor Cordes <address@hidden> wrote:
> On my box the above runs for >2m (never completes before I ^C) on the
> version **AFTER** the commits (v2.22). On the test build just *BEFORE*
> the commits (2.21.73-8058), it runs in <2s. So for me, I had a working
> command (-F -w -f) that used to run quickly, that suddenly became
> slow/useless. And none of the suggestions I've read can alleviate the
> problem for me, not locale settings, nor anything else. If you can
> confirm this is the case on your box, then maybe we can figure out why?
Could you re-run with current master? The dfa code has been improved
frequently.
> The main question I think for my case is why is "-F -w -f" switching
> modes to the slow DFA at all? AFAIK there's no encoding flaw in the
> input, so why is grep even switching modes in this use case? What was
> wrong with just using the old mode? From what I've read, people don't
> want DFA-mode when using -F and -f.
>
> Or, there should be an option to grep that we can specify to turn off
> this new behavior to get back the old (<2s) performance.
>
> Thanks!
The dfa matcher is not always slower than the kws matcher.
- $ env LC_ALL=C grep -F -w 0 k
- $ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null
The first is faster after the changes, and the second is slower after
the changes. It's a trade-off. Do you have any idea how to select the
better matcher in both cases?
If we adopt a simple rule, such as selecting the kws matcher for
grep -F -w -f and the dfa matcher for grep -F -w -e, we will get
extremely different performance for the following two cases, even though
they specify the same patterns.
- $ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null
- $ env LC_ALL=C grep -F -w -e "`cat /usr/share/dict/words`" /dev/null
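To illustrate why these two commands should not behave differently, here is a minimal sketch that checks they select the same lines. It assumes a small temporary word list and input file (hypothetical stand-ins for /usr/share/dict/words and /dev/null), so it shows only the equivalence of the two forms, not the performance gap.

```shell
#!/bin/sh
# Hypothetical small inputs; the thread itself uses /usr/share/dict/words
# as the pattern list and /dev/null as the input.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/words"
printf '%s\n' 'an apple a day' 'pineapple' 'banana split' 'cherries' \
    > "$tmpdir/input"

# Patterns read from a file, one per line.
out_f=$(env LC_ALL=C grep -F -w -f "$tmpdir/words" "$tmpdir/input")

# The same patterns passed as a single newline-separated -e argument.
out_e=$(env LC_ALL=C grep -F -w -e "$(cat "$tmpdir/words")" "$tmpdir/input")

rm -rf "$tmpdir"

# Both forms describe the same pattern set, so the output should agree.
if [ "$out_f" = "$out_e" ]; then
    echo "same matches"
else
    echo "different matches"
fi
```

Since both forms give grep the same set of patterns, a rule that picks the matcher from the option used (-f or -e) rather than from the patterns themselves would make equivalent commands run at very different speeds.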
Thanks,
Norihiro