bug#22357: grep -f not only huge memory usage, but also huge time cost


From: JQK
Subject: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Mon, 14 Mar 2016 14:31:50 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 03/12/2016 04:17 AM, Jim Meyering wrote:
> [resending to keep the list on Cc]
> On Thu, Mar 10, 2016 at 10:05 PM, JQK <address@hidden> wrote:
>> On 03/11/2016 01:26 AM, Jim Meyering wrote:
>>> On Thu, Mar 10, 2016 at 3:00 AM, JQK <address@hidden> wrote:
>>>> If in the following situation,
>>>>
>>>> ===========
>>>> file1 has numbers from 1 to 200000, 200000 lines
>>>> file2 has several lines (about 200-300 lines) of random numbers in the
>>>> range of 1-200000
>>>> ===========
>>>>
>>>> The time cost for finishing the following command could be over 15
>>>> minutes on linux -- a little huge.
>>>>
>>>> $ grep -v -f file1 file2
>>>>
>>>> (FYI, on AIX it could only be less than 1 second)
>>>>
>>>> Maybe there is also a room for optimization not only on the memory usage
>>>> but also on the time cost.
>>>
>>> What version of grep are you using?
>>> With the latest (grep-2.23), this takes
>>> less than 1.5s on a core-i7-4770S-based system:
>>>
>>>   $ env time grep -v -f <(seq 200000) <(shuf -i 1-200000 -n 250)
>>>   1.27user 0.16system 0:01.43elapsed 100%CPU (0avgtext+0avgdata
>>> 839448maxresident)k
>>>   0inputs+0outputs (0major+233108minor)pagefaults 0swaps
>>
>> Sorry.
>> In my situation, the grep command is slightly different. The
>> command is:
>>
>> # grep -w -f file1 file2
> 
> The command I provided is stand-alone, and equivalent to
> what you described, except that it generates the two
> input files as part of the command. However, the cost of
> generating those two inputs is minimal. The <(...) notation
> is a feature called process substitution. It should work
> both with bash and with zsh.
> 
> Please show the precise commands (and output) that
> you used to produce the inputs and to time the grep
> invocation.
> 
>> Also after testing with the latest grep-2.23, it could slow.
> 
> I don't understand the above. Please rephrase.
> If you used a system-provided version of grep,
> tell us what "rpm -q grep" prints.
> 

The test results are as follows:

【grep version】
# grep -V
grep (GNU grep) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see
<http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

【without option "-F"】
# env time grep -w -f <(seq 200000) <(shuf -i 1-200000 -n 250)
:
288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata
3492784maxresident)k
8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps

【with option "-F"】
# env time grep -F -w -f <(seq 200000) <(shuf -i 1-200000 -n 250)
:
0.10user 0.01system 0:00.22elapsed 53%CPU (0avgtext+0avgdata
87856maxresident)k
0inputs+0outputs (0major+5534minor)pagefaults 0swaps
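
To check that adding -F does not change which lines are selected in this case, here is a scaled-down sketch that uses regular files instead of process substitution. The pattern count N and the every-7th-number input file are assumptions chosen for a quick, reproducible run; they are not the sizes used in the timings above. Because the patterns are plain digits with no regex metacharacters, fixed-string and regex matching should select the same lines.

```shell
#!/bin/sh
# Scaled-down comparison of "grep -w -f" and "grep -F -w -f".
# N is a hypothetical size for a fast run, not the 200000 from the report.
N=${N:-2000}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Pattern file: the numbers 1..N, one per line.
seq "$N" > "$tmpdir/patterns"

# Input file: every 7th number, standing in for the random sample
# from shuf so that the result is the same on every run.
seq 1 7 "$N" > "$tmpdir/input"

# Same patterns and input, once as regexes and once as fixed strings.
grep -w -f "$tmpdir/patterns" "$tmpdir/input" > "$tmpdir/out_regex"
grep -F -w -f "$tmpdir/patterns" "$tmpdir/input" > "$tmpdir/out_fixed"

# Report whether both invocations selected exactly the same lines.
if cmp -s "$tmpdir/out_regex" "$tmpdir/out_fixed"; then
    result=identical
else
    result=different
fi
echo "$result"
```

Running it with a larger size, for example N=200000, and prefixing each grep with "env time" should show how the two invocations diverge in cost as the pattern list grows.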

-- 
Junkui Quan (JQK)
www.redhat.com


