[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also
From: |
Paul Eggert |
Subject: |
Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost |
Date: |
Mon, 19 Dec 2016 15:38:12 -0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 |
On 12/19/2016 02:38 PM, Norihiro Tanaka wrote:
BTW, What case do you the worst? One more I think previous 'replace' is
not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) .
Well, perhaps I misunderstood it, but the old 'replace' called 'delete'
up to N times, and 'delete' took O(log N) to find the deleted item plus
O(N) to move later, non-deleted items.
Anyway, I verified that the change improved performance on the test case
with -f /usr/share/dict/words (not as much as your earlier patch
improved things! just 5% to 10%, if I recall correctly), so even if I'm
wrong about the big-O improvement the change should still be a win.
1. dfa is not optimize parsed patterns.
For example, dfa does not replace pattern 'a \|a \|a \| ... \|a \|b'
to 'a \|b'.
I tried it several times, have not succeeded it yet.
That sounds a bit like REduce:
Valgenti VC, Kim MS, Oh S-I, Lee I. REduce: removing redundancy from
regular expression matching in network security. ICCCN 2015.
http://dx.doi.org/10.1109/ICCCN.2015.7288457
2. grep matcher compiles regex always to check syntax of a pattern,
even when it is not used in searching.
I also tried it, have not succeeded it yet.
Yes, that sounds worthy too. Also, a lot of work has been done in
regular-expression matches over the past several years, and if someone
has the time it'd be nice to see whether GNU grep can use this stuff on
modern architectures. For example:
Cameron RD, Medforth N, Lin D, Denis D, Sumner WN. Bitwise data
parallelism with LLVM: the icGrep case study. ICA3PP 2015. LNCS 9529.
http://dx.doi.org/10.1007/978-3-319-27122-4_26
Valgenti VC, Chhugani J, Sun Y, Satish N, Kim MS, Kim C, Dubey P.
GPP-grep: high-speed regular expression processing engine on general
purpose processors. RAID 2012. LNCS 7462.
http://dx.doi.org/10.1007/978-3-642-33338-5_17
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/11
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Bruno Haible, 2016/12/12
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/14
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Norihiro Tanaka, 2016/12/17
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Paul Eggert, 2016/12/19
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Norihiro Tanaka, 2016/12/19
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost,
Paul Eggert <=
- Re: [Grep-devel] bug#22357: grep -f not only huge memory usage, but also huge time cost, Norihiro Tanaka, 2016/12/20