bug#22357: grep -f not only huge memory usage, but also huge time cost


From: Jim Meyering
Subject: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Fri, 11 Mar 2016 12:17:48 -0800

[resending to keep the list on Cc]
On Thu, Mar 10, 2016 at 10:05 PM, JQK <address@hidden> wrote:
> On 03/11/2016 01:26 AM, Jim Meyering wrote:
>> On Thu, Mar 10, 2016 at 3:00 AM, JQK <address@hidden> wrote:
>>> If in the following situation,
>>>
>>> ===========
>>> file1 has numbers from 1 to 200000, 200000 lines
>>> file2 has several lines (about 200-300 lines) of random numbers in the
>>> range of 1-200000
>>> ===========
>>>
>>> The time cost for finishing the following command could be over 15
>>> minutes on linux -- a little huge.
>>>
>>> $ grep -v -f file1 file2
>>>
>>> (FYI, on AIX it could only be less than 1 second)
>>>
>>> Maybe there is also a room for optimization not only on the memory usage
>>> but also on the time cost.
>>
>> What version of grep are you using?
>> With the latest (grep-2.23), this takes
>> less than 1.5s on a core-i7-4770S-based system:
>>
>>   $ env time grep -v -f <(seq 200000) <(shuf -i 1-200000 -n 250)
>>   1.27user 0.16system 0:01.43elapsed 100%CPU (0avgtext+0avgdata 839448maxresident)k
>>   0inputs+0outputs (0major+233108minor)pagefaults 0swaps
>
> Sorry.
> In my situation, the grep command could be a little different, the
> command is:
>
> # grep -w -f file1 file2

The command I provided is stand-alone, and equivalent to
what you described, except that it generates the two
input files as part of the command. However, the cost of
generating those two inputs is minimal. The <(...) notation
is a feature called process substitution. It should work
both with bash and with zsh.
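
To illustrate the equivalence, here is a minimal sketch that runs the same
search twice: once with explicit files and once with process substitution.
The temporary directory and the variable names are assumptions made for this
illustration; they do not come from the original report. Because every number
in the second input also appears as a pattern, "grep -v" should select no
lines in either form.

```shell
#!/usr/bin/env bash
# Sketch only: tmpdir, with_files and with_procsub are illustrative names.
set -u

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Form 1: explicit files. file1 holds the patterns 1..200000,
# file2 holds 250 distinct random numbers from the same range.
seq 200000 > "$tmpdir/file1"
shuf -i 1-200000 -n 250 > "$tmpdir/file2"
with_files=$(grep -v -f "$tmpdir/file1" "$tmpdir/file2" | wc -l)

# Form 2: process substitution. The shell runs seq and shuf and hands
# grep a file name (such as /dev/fd/63) that is connected to their output.
with_procsub=$(grep -v -f <(seq 200000) <(shuf -i 1-200000 -n 250) | wc -l)

echo "non-matching lines with files:   $with_files"
echo "non-matching lines with <(...):  $with_procsub"
```

Both counts should agree, which is why the one-line form is convenient for
bug reports: it needs no files prepared in advance.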

Please show the precise commands (and output) that
you used to produce the inputs and to time the grep
invocation.
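
A reply to this request could take roughly the following shape. This is a
hypothetical sketch, not the reporter's actual procedure: the input sizes
follow the description quoted above, while the working directory and the
output file name are assumptions. It uses the "-w" variant mentioned in the
follow-up and the bash "time" keyword.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction script; workdir and "out" are illustrative names.
set -u

workdir=$(mktemp -d)

# Generate the inputs as described in the report.
seq 200000 > "$workdir/file1"                 # 200000 pattern lines
shuf -i 1-200000 -n 250 > "$workdir/file2"    # 250 random numbers

# Record which grep is being tested and how large the inputs are.
grep --version | head -n 1
wc -l "$workdir/file1" "$workdir/file2"

# Time the invocation; matched lines go to a file so that terminal
# output does not affect the measurement.
time grep -w -f "$workdir/file1" "$workdir/file2" > "$workdir/out"

# Each number in file2 equals one of the patterns as a whole word,
# so all 250 lines are expected to match.
wc -l < "$workdir/out"
```

Pasting the script together with its complete output gives readers the exact
commands, the grep version, and the timing in one place.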

> Also after testing with the latest grep-2.23, it could slow.

I don't understand the above. Please rephrase.
If you used a system-provided version of grep,
tell us what "rpm -q grep" prints.




