Re: getting 'grepped' files from file list

From: Bob Proulx
Subject: Re: getting 'grepped' files from file list
Date: Mon, 15 Jun 2009 15:59:02 -0600
User-agent: Mutt/1.5.18 (2008-05-17)

Dave B wrote:
> Edward Peschko wrote:
> > 1: suppose I want to build a 'project' file, which contains a list of
> > files that I care about - one that avoids temporary files, binary
> > files, etc. This would then be a very nice feature.
> You can use xargs to obtain pretty much the same effect.
> xargs grep pattern < filelist.txt
> (with the usual issues to consider when using xargs)

You can avoid using xargs here if you don't want to use it.

  grep PATTERN $(<filelist.txt)
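
One caveat: $(<file) is a bash/ksh/zsh extension, and because the result is unquoted it is split on whitespace and glob-expanded, so a file name containing a space turns into two arguments.  A rough sketch of a workaround, assuming one file name per line; the loop itself uses only POSIX sh constructs, and the demo directory and file names are made up for illustration:

```shell
#!/bin/sh
# Demo data (hypothetical names): one file name contains a space.
demo=$(mktemp -d)
printf 'needle here\n' > "$demo/my notes.txt"
printf 'nothing\n'     > "$demo/plain.txt"
printf '%s\n' "$demo/my notes.txt" "$demo/plain.txt" > "$demo/filelist.txt"

# Build the argument list one line at a time; quoting keeps each name whole.
set --
while IFS= read -r f; do
    set -- "$@" "$f"
done < "$demo/filelist.txt"

# -l prints only the names of the files that match.
matches=$(grep -l needle "$@")
echo "$matches"
```

For lists of plain names without spaces or glob characters the one-liner above is fine as it stands.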

> > 2. suppose - in this file, I want to pass commands to grep (in the
> > form of config flags, or something more fancy).  Then this is a very
> > nice feature.

Something like this?  This seems to meet your needs as stated so far.

  grep PATTERN $(<grep-args)

> You can set the environment variable GREP_OPTIONS with the flags you like.

I prefer not to do that, since unless it is kept in check the setting
also leaks into every other grep run in that environment.  It must be
used carefully.
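
One way to keep a set of preferred flags without touching the environment is a small wrapper function that lives only in the script that needs it.  This is just a sketch; the name mygrep and the flags chosen are made up for the example:

```shell
#!/bin/sh
# mygrep is a made-up name; -n and -i are only example flags.
mygrep() {
    grep -n -i "$@"
}

demo=$(mktemp -d)
printf 'Hello\nworld\n' > "$demo/sample.txt"

# The wrapper applies the flags: case-insensitive match, with line number.
with_flags=$(mygrep hello "$demo/sample.txt")

# A plain grep in the same script is unaffected (case-sensitive, no match).
plain=$(grep -c hello "$demo/sample.txt" || true)

echo "$with_flags"
echo "$plain"
```

Nothing is exported, so child processes and other scripts see the ordinary grep behavior.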

> > 3. even with the find solution you are opening up multiple grep
> > processes per file which costs time. This feature avoids that overhead.
> It wouldn't be multiple grep processes per file, it would be "just" one grep
> process per file. But read on...
> If you look carefully, you'll see that he used find ... -exec grep .. +
> The plus at the end is very important. It's what allows to run a single
> instance of grep with as many arguments as possible, as opposed to run one
> grep per file.

Yes!  Thanks for observing that.  Using the "{} +" syntax is a very
efficient way of running commands.  It will NOT run one grep per file.
Instead find packs as many file names into each invocation as the
argument length limit allows, so the fewest grep invocations possible
are used.
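
The difference is easy to see by counting how many times the command gets started.  A small sketch, with "sh -c 'echo run'" standing in for grep and a throwaway directory of five made-up files:

```shell
#!/bin/sh
demo=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf 'line %s\n' "$i" > "$demo/f$i.txt"
done

# "\;" starts the command once for every file found.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "+" collects the file names and starts the command as few times as possible.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "with \\; : $per_file invocations"
echo "with +  : $batched invocation(s)"
```

With only five short names everything fits into a single invocation; a very large tree would be split into several batches, but still far fewer than one per file.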

> > In any case, the 'arg list too long' problem being solved in
> > new versions of linux doesn't help me because it isn't cross
> > platform.
> Neither are the grep flags you use below, unless you're sure you have GNU
> grep available on every system where your script runs.


> > Ultimately, I have something like this in mind in the filelist I
> > wish to pass to grep:
> > 
> > -n
> > --expand-gzip
> > --expand-tar
> > /file1
> > /file2
> > /file3.tar.gz

Then this works now:

  grep PATTERN $(<grep-args)
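
Here is a small self-contained sketch of that, using an options-plus-files list with one entry per line.  I have only put -n in it, since as far as I know --expand-gzip and --expand-tar are proposed flags and not something an existing grep accepts.  The file names are made up, and $(cat file) is used as the portable spelling of $(<file):

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'alpha\nbeta PATTERN\n'  > "$demo/file1"
printf 'PATTERN first\ngamma\n' > "$demo/file2"

# Hypothetical grep-args file: one option, then the files to search.
printf '%s\n' -n "$demo/file1" "$demo/file2" > "$demo/grep-args"

# Left unquoted on purpose so that each line becomes a separate argument.
out=$(grep PATTERN $(cat "$demo/grep-args"))
echo "$out"
```

Each line of output carries the file name and line number, exactly as if -n and the two files had been typed on the command line.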

> > or more fancy:
> > 
> > -n
> > <helper_command>:  /file1
> > <helper_command>: /file2
> > 
> > where helper command is a command that is
> > run on the file contents of /file1 before
> > being passed to grep..

More things that are already possible:

  grep PATTERN $(helper_command file1 file2)

And the advantage is that techniques such as these work with all
commands and not just with grep.
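
Note that command substitution hands the helper's output to grep as arguments, that is, as file names and options.  If the helper instead writes out file contents that are to be searched, a pipe is the tool for the job.  A sketch, assuming "gzip -dc" as a stand-in for the helper and a made-up compressed file:

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'first line\nhas PATTERN\n' > "$demo/file1"
gzip "$demo/file1"                 # leaves $demo/file1.gz

# The helper decompresses to stdout and grep searches that stream.
out=$(gzip -dc "$demo/file1.gz" | grep -n PATTERN)
echo "$out"
```

Since grep reads standard input here, the output has line numbers but no file name prefix.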

> > Anyways, that's ok, I've implemented a bit hacky
> > way of doing this, so if it's unacceptable to
> > have this in the core, I have a workaround.
> Note that I'm not against the idea you propose, in principle; whether that's
> a good or bad idea is probably open to debate, and the opinions of the
> developers are to keep into consideration. More simply, I was just trying to
> give you some ideas on how to implement what you want.

After 40 years of use I consider most of the core Unix commands to be
fairly mature, and I want their interfaces to remain stable.  Stability
allows their long-term portable use in scripts.  Therefore I feel that
modifications need to have a strong rationale behind them.  That means
I am NOT opposed to changes as long as they are well thought out,
discussed openly and implemented cleanly.  Interface mistakes hang
around punishing us forever.  Transparent development practices reduce
the risk of creating long-term problems.  My goal here is to take part
constructively in the discussion so as to produce the best long-term
result for the system.

