grep-devel



From: Zev Weiss
Subject: Re: How to profile the grep engine despite the "/dev/null optimization" ?
Date: Mon, 29 May 2023 14:18:42 -0700

On Sat, May 27, 2023 at 02:20:02PM PDT, alexandre.ferrieux@orange.com wrote:
> Hello,
>
> In some circumstances, it is useful to profile, or do perf analysis on, the grep engine.
> This happens, for example, when you need to tune your regexps for performance.
> In the old days, we just did:
>
>     time grep PATTERN < FILE > /dev/null
>
> But nowadays, grep *detects* /dev/null and behaves differently in that case: it sets the done_on_match flag, and accordingly exits very quickly on the first matching line of your hundreds-of-gigabytes test file...
>
> It's not immediately obvious why grep tries to be "smarter than the human", as "-q" is an explicit way for the human to request this behavior.
> Anyway, I realize I'm pretty late to complain, as this happened 7 years ago:
>
>     af6af28 Paul Eggert     Sun May 1 22:56:39 2016 -0700           grep: /dev/null output speedup
>
> So, what is today's recommended idiom to do the same?
>
> Thanks in advance,
>
> -Alex
>
>
> PS: Note that "grep ... | cat > /dev/null" is a VERY poor approximation, as the scheduler's backpressure hits pretty badly. Even with enlarged pipe buffers, grep runs slower with a pipe than with a redirection to a RAM-disk file (/dev/shm/foo), which unfortunately does not scale to hundreds of gigabytes on most machines.
>
> PS2: I am aware I can fool grep's detection method, which is to compare inodes, by creating a "/dev/null2" with the same device number (but a different inode). However, I dearly hope one doesn't need to resort to such horrendous hacks for simple perf tuning...

Note that while it's far less commonly used in this way, /dev/zero provides the same property of discarding all written data -- and from a quick inspection of src/grep.c, grep does not appear to detect it for optimization purposes, so I suspect it would suit your purposes well.


Zev



