bug#20768: RFC: Multithreaded grep

From: Zev Weiss
Subject: bug#20768: RFC: Multithreaded grep
Date: Tue, 9 Jun 2015 14:41:32 -0500
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Jun 09, 2015 at 12:04:11PM +0100, Aaron Crane wrote:
> Zev Weiss <address@hidden> wrote:
>> Hmm -- I picked --parallel largely for consistency with the corresponding
>> flag for coreutils' sort, which strikes me as a closer relative to grep than
>> either make or parallel.
>
> That's a good point; I wasn't aware of sort's --parallel option.
> Though I also note that "sort --parallel=4" limits the number of
> threads to 4, rather than increasing the number of threads from 1 to
> 4, so the comparison isn't exact.
>
>> sort doesn't have a matching short option though, so I went with -M to
>> suggest "multithreaded" (since, as you point out, -P is already in use).
>> Though I notice now that lower-case -p is still available; perhaps that
>> might be better than -M.
>
> I'm a little unhappy about the idea of proliferating the world's set
> of short options in this space, to be honest. If grep didn't already
> have -P, I'd be happy enough with -P and either --parallel or
> --max-procs, but I'm not terribly fond of the idea of introducing
> either -M or -p.
>
> Aaron Crane ** http://aaroncrane.co.uk/

True, I suppose that's a reasonable concern (especially given how many short options there are now). My thought was that, at least for me (and it sounds like perhaps for Paul as well), this would be a commonly used option, so I'd like a concise way of enabling it. With sort there's no real downside to enabling multithreading by default, so a longopt-only flag is fine. With grep, however (at least with my current implementation), there are tradeoffs with output ordering that may be undesirable. I don't see a good way around them without introducing potentially complicated and performance-reducing per-file output buffering, so I kept it off by default.

There's also the question of the argument parsing mentioned in my original email: as it stands now, '-M' would be the only short option with an optional argument, which has the potential to be confusing. Thinking about it a bit more, I realize that what I really want out of the short flag is just a shorter way to say --parallel=NUMCPUS without having to remember how many CPUs the machine I'm on has. So another possibility would be to leave the long option as-is but have the short flag (assuming there is one) not take an argument, though I suppose that could be seen as confusing in its own way too.
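To illustrate why an optional argument on a short option can be confusing, here is a minimal sketch in C using glibc's getopt_long. It is not the code from the proposed patch: the helper names default_threads and parse_parallel are hypothetical, the "1 thread when the option is absent" default is an assumption, and _SC_NPROCESSORS_ONLN is a common extension (glibc, BSD, macOS) rather than part of POSIX.

```c
#include <getopt.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: number of online CPUs, never less than 1.
   _SC_NPROCESSORS_ONLN is a widely available extension, not POSIX. */
static long default_threads(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical parser for -M[N] / --parallel[=N].
   Returns the thread count: 1 if the option is absent (assumed default),
   the CPU count if the option has no argument, or -1 on a usage error. */
static long parse_parallel(int argc, char **argv)
{
    static const struct option longopts[] = {
        {"parallel", optional_argument, NULL, 'M'},
        {NULL, 0, NULL, 0}
    };
    long threads = 1;
    int bad = 0;
    int c;

    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* keep getopt_long from printing its own diagnostics */

    /* "M::" marks the argument of -M as optional (a GNU extension). */
    while ((c = getopt_long(argc, argv, "M::", longopts, NULL)) != -1) {
        if (c != 'M') {
            bad = 1;              /* unknown option; keep scanning to the end */
        } else if (optarg == NULL) {
            threads = default_threads();
        } else {
            char *end;
            long n = strtol(optarg, &end, 10);
            if (end == optarg || *end != '\0' || n < 1)
                bad = 1;          /* not a positive integer */
            else
                threads = n;
        }
    }
    return bad ? -1 : threads;
}
```

With this parsing, an optional argument is only recognized when it is attached: "-M4" and "--parallel=4" both request 4 threads, whereas in "-M 4" the option gets no argument and the 4 is left as an ordinary operand, which grep would then treat as the pattern. A short flag that takes no argument at all would sidestep that particular surprise.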

