bug-grep

Re: getting 'grepped' files from file list


From: Dave B
Subject: Re: getting 'grepped' files from file list
Date: Tue, 16 Jun 2009 22:30:49 +0200
User-agent: Thunderbird 2.0.0.21 (X11/20090527)

Bob Proulx wrote:

>> Yes that's true. I just felt that xargs (well, GNU xargs to be honest) gives
>> you a bit more control in case the filelist contains filenames with spaces,
>> wildcards or other oddities, which will cause problems with the
>> $(<filelist.txt) approach, since the shell does word splitting and pathname
>> expansion on the result of that.
> 
> I was simply reacting to the "usual issues" statement.  (shrug) Yes,
> whitespace in filenames may be a problem.  But then xargs doesn't help
> there unless you use zero terminated strings.  Using newline
> terminated strings with xargs without options, I think, is the same as
> using the $(<file) way of doing things.  At least it has very close to
> the same set of problems.

Standard xargs has traditionally been fragile and quite painful to use
because of its peculiar way of interpreting its input, with rules much
like the shell's escaping and quoting rules, so you had to be careful not
only with spaces, but also with filenames containing quotes, backslashes,
etc. Unfortunately, the POSIX rules for xargs are still pretty much the same,
even in the latest standard.
GNU xargs, on the other hand, has the -d option that lets you specify the
separator, and, at the same time, "quotes and backslash are not special;
every character in the input is taken literally."

So basically xargs -d '\n' is able to handle every kind of filename, except
those with embedded newlines. But if the OP wants to keep his filenames in a
text file, that means that he doesn't have such filenames :)
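As a rough illustration of the point above, here is a minimal sketch that
assumes GNU xargs (for -d) and a scratch directory; the file names and the
"needle" pattern are made up for the example. Names containing a space, a
quote and a wildcard are listed one per line and passed to grep unchanged:

```shell
# Scratch directory with awkward file names (all names are hypothetical).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'needle\n' > 'plain.txt'
printf 'needle\n' > 'with space.txt'
printf 'needle\n' > "it's quoted.txt"
printf 'needle\n' > 'star*.txt'

# One file name per line, exactly as it appears on disk.
printf '%s\n' 'plain.txt' 'with space.txt' "it's quoted.txt" 'star*.txt' \
    > filelist.txt

# GNU xargs -d '\n': split on newlines only; blanks, quotes and
# backslashes in the input are taken literally.
matched=$(xargs -d '\n' grep -l needle < filelist.txt | wc -l)
echo "files matched: $matched"

# Clean up the scratch directory.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

Every name in the list should reach grep as a single argument, so all four
files are expected to match.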

> Of course the traditionalist simply avoids any cases with whitespace
> in filenames.  Then it won't matter.  :-)

Agreed.

>> xargs -d '\n' grep pattern < filelist.txt
>> should take care of most of the issues.
> 
> Uhm, well, hmm...  Once you start worrying about whitespace in
> filenames handling spaces but not handling newlines just doesn't feel
> right.  

IMHO it does, in the specific case suggested by the OP (i.e., when file names
are in a TEXT file). In the general case, I agree with you.

> Yes that will handle spaces but then not files with newlines
> in them.  

As I said, you can't keep /recognizable/ filenames with embedded newlines in
a TEXT file. To do that, you would need null separators, but then (according
to the POSIX definition) what you have is not a text file anymore.
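For comparison, a sketch of the null-separated variant, assuming GNU xargs
(-0, the short form of --null) and GNU grep (-Z, which terminates each
printed file name with NUL); the file names are hypothetical. The list is
written with printf here precisely because it is no longer something you
would comfortably edit by hand:

```shell
# Scratch directory; one file name contains an embedded newline.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'needle\n' > 'ordinary.txt'
printf 'needle\n' > 'two
lines.txt'

# NUL-terminated list of names: not a text file in the POSIX sense.
printf '%s\0' 'ordinary.txt' 'two
lines.txt' > filelist.bin

# xargs -0 splits on NUL only; grep -lZ prints matching names
# NUL-terminated, so counting NUL bytes counts matching files.
matched=$(xargs -0 grep -lZ needle < filelist.bin | tr -cd '\000' | wc -c)
echo "files matched: $matched"

# Clean up the scratch directory.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

Counting NUL bytes instead of lines avoids miscounting the name that itself
contains a newline.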

> If effort is going to be put in there to handle one then the
> other is not too far away using zero terminated strings.
> 
> Even though it is a little painful it does seem best to use zero
> terminated strings for filename data.  Oh well.

Maybe, but then you would sacrifice the ability to use all the usual tools
(editors, filters, etc.) that exist for working on text files, just because
you want to handle an admittedly very rare case.
The downside in using null separators, as I said above, is that then
maintaining and editing the filename list file becomes a pain, whether it is
done by hand (how many editors would handle that in a way that is
comfortable for the user?), or using other unix tools (not all tools can
happily work with NUL characters; I'd say only a minority will).
In the end, I'd say that keeping newline-separated names in the file and
using xargs -d '\n' would be a good compromise **for this specific problem**
(and similar ones; I'm not claiming universal validity for it).

>> It does require GNU xargs however, as I said.
> 
> Agreed.  Because then you can use xargs --null.  The only thing I
> preferred otherwise was the use of GREP_OPTIONS which is just too
> global for my taste.  Otherwise I think we are in general agreement.

Well, that was just what my convoluted mind thought at that time :), but
actually, if xargs is used, options can simply be added at the beginning of
the filename list, just before the first filename.
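One possible reading of this, sketched under the assumption of GNU xargs
and GNU grep (which accepts options after the pattern unless
POSIXLY_CORRECT is set): the option is simply the first line of the list,
so xargs hands it to grep ahead of the file names. The file names and the
choice of -i are made up for the example:

```shell
# Scratch directory with two hypothetical files differing in letter case.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'Needle\n' > 'upper case.txt'
printf 'needle\n' > 'lower case.txt'

# Plain list: only the lowercase file matches "needle".
printf '%s\n' 'upper case.txt' 'lower case.txt' > filelist.txt
without_opt=$(xargs -d '\n' grep -l needle < filelist.txt | wc -l)

# Same list with a grep option as its first line.
printf '%s\n' '-i' 'upper case.txt' 'lower case.txt' > filelist-opts.txt
with_opt=$(xargs -d '\n' grep -l needle < filelist-opts.txt | wc -l)

echo "without -i: $without_opt, with -i: $with_opt"

# Clean up the scratch directory.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

Note that if the list is long enough for xargs to split it across several
grep invocations, only the first invocation sees options placed this way;
putting them on the xargs command line itself (xargs -d '\n' grep -i
pattern) avoids that.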

-- 
D.
