bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#22357: grep -f not only huge memory usage, but also huge time cost


From: Paul Eggert
Subject: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Mon, 19 Dec 2016 15:38:12 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

On 12/19/2016 02:38 PM, Norihiro Tanaka wrote:
BTW, What case do you the worst? One more I think previous 'replace' is
not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) .

Well, perhaps I misunderstood it, but the old 'replace' called 'delete' up to N times, and 'delete' took O(log N) to find the deleted item plus O(N) to move later, non-deleted items.

Anyway, I verified that the change improved performance on the test case with -f /usr/share/dict/words (not as much as your earlier patch improved things! just 5% to 10%, if I recall correctly), so even if I'm wrong about the big-O improvement the change should still be a win.

1. dfa is not optimize parsed patterns.
     For example, dfa does not replace pattern 'a \|a \|a \| ... \|a \|b'
     to 'a \|b'.

     I tried it several times, have not succeeded it yet.

That sounds a bit like REduce:

Valgenti VC, Kim MS, Oh S-I, Lee I. REduce: removing redundancy from regular expression matching in network security. ICCCN 2015. http://dx.doi.org/10.1109/ICCCN.2015.7288457


  2. grep matcher compiles regex always to check syntax of a pattern,
     even when it is not used in searching.

     I also tried it, have not succeeded it yet.

Yes, that sounds worthy too. Also, a lot of work has been done in regular-expression matches over the past several years, and if someone has the time it'd be nice to see whether GNU grep can use this stuff on modern architectures. For example:

Cameron RD, Medforth N, Lin D, Denis D, Sumner WN. Bitwise data parallelism with LLVM: the icGrep case study. ICA3PP 2015. LNCS 9529. http://dx.doi.org/10.1007/978-3-319-27122-4_26

Valgenti VC, Chhugani J, Sun Y, Satish N, Kim MS, Kim C, Dubey P. GPP-grep: high-speed regular expression processing engine on general purpose processors. RAID 2012. LNCS 7462. http://dx.doi.org/10.1007/978-3-642-33338-5_17






reply via email to

[Prev in Thread] Current Thread [Next in Thread]