bug-findutils

Re: [bug #64253] Suggestion - Add support for libmagic and xattr


From: Bernhard Voelker
Subject: Re: [bug #64253] Suggestion - Add support for libmagic and xattr
Date: Wed, 31 May 2023 23:18:18 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.2

I am not commenting on -magic/-mime here; this is just to discuss the given
statements about what is possible today.

On 5/25/23 21:18, anonymous wrote:
> Currently - with find : We need xargs and sed and so have to worry about
> whitespace paths and filenames, we are also spawning several sub-commands.
>
> find -type f |
>   xargs file |
>    sed -n 's/:.*PE32 executable.*//p' |
>     xargs my_command

With find(1), one does not have to "worry about whitespace". There are several
safe ways to pass file names on:
- executing per file (which may be inefficient):
    $ find ... -exec $TOOL '{}' ';'
- bulk execution:
    $ find ... -exec $TOOL '{}' +
- if $TOOL understands NUL-separated input (e.g. like grep):
    $ find ... -print0 | $TOOL -z
- otherwise, via xargs:
    $ find ... -print0 | xargs -r0 $TOOL
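
To make the difference concrete, here is a minimal sketch that counts how many
arguments a command receives in each variant.  The scratch directory and the
file names are made up for illustration, and GNU find and xargs are assumed:

```shell
# Scratch directory with awkward names (all names here are hypothetical).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/with
newline.txt"

# Naive pipe: xargs splits on blanks and newlines, so the three files
# arrive as five arguments.
naive_count=$(cd "$demo_dir" && find . -type f | xargs printf '%s\n' | wc -l)

# NUL-separated pipe: every file name arrives as exactly one argument.
safe_count=$(find "$demo_dir" -type f -print0 | xargs -r0 sh -c 'echo "$#"' sh)

# -exec ... + hands the names over directly, without any pipe.
exec_count=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh '{}' +)

echo "naive=$naive_count safe=$safe_count exec=$exec_count"
rm -rf "$demo_dir"
```

Only the last two variants report one argument per file.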

Re. file(1): unfortunately, this tool, although it has a --files-from option,
does not accept NUL-separated input.  For the search case, it would also come
in handy if file(1) had a --filter=PATTERN option and, furthermore, could print
only the names of the files matching the pattern, for safe post-processing in
other tools.

Today, one could efficiently and safely use something like this to find files
for which file(1) returns a magic string matching PATTERN:

  $ find ... -exec file -00 '{}' + \
      | sed -nz 'h;n; /PATTERN/{g;p}' \
      | xargs -0 my_command
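
To show how the sed script pairs up the records, here is a small sketch that
feeds it hand-written input instead of running file(1).  With -00, file(1)
prints the name, a NUL, the description, and another NUL for each entry; the
names and descriptions below are made up, and GNU sed is assumed for -z:

```shell
# Simulated "file -00" output: name NUL description NUL, once per file.
fake_file_output() {
  printf '%s\0%s\0' './a b.c' 'C source, ASCII text'
  printf '%s\0%s\0' './notes.txt' 'ASCII text'
  printf './odd\nname.c\0%s\0' 'C source, ASCII text'
}

# h: save the name record in the hold space; n: load the description record;
# on a match, g copies the saved name back and p prints it NUL-terminated.
# The final tr only makes the result readable: NUL becomes a newline, and
# a newline inside a name becomes '?'.
matches=$(fake_file_output \
  | sed -nz 'h;n;/^C source/{g;p}' \
  | tr '\0\n' '\n?')

printf '%s\n' "$matches"
```

The two names whose description starts with "C source" come out, including
the one with the embedded newline, while ./notes.txt is dropped.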

Here's an example which selects regular files smaller than 40000 bytes that
were modified within the last 24 hours, lets the "file ...|sed ..." pipe filter
for the wanted magic string "C source", and finally continues the search in a
subsequent find(1) command.

  $ find -type f -size -40000c -mtime -1 -exec file -00 '{}' + \
      | sed -nz 'h;n;/^C source/{g;p}' \
      | find/find -files0-from - -ls

Obviously, the file(1) run is always by far the most expensive part, because it
has to read all the files, but at least it is spawned as few times as possible,
which reduces the number of times the magic file has to be loaded.
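
The effect of batching on the number of spawned processes can be checked with
a rough sketch like the following, where a trivial sh command stands in for
file(1) and the scratch directory with its 50 files is made up:

```shell
# Scratch directory with 50 small files (hypothetical names).
batch_dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
  touch "$batch_dir/file $i"
  i=$((i + 1))
done

# Each invocation of the stand-in command prints one line, so counting
# the lines counts the spawned processes.
per_file_runs=$(find "$batch_dir" -type f -exec sh -c 'echo run' sh '{}' ';' | wc -l)
batched_runs=$(find "$batch_dir" -type f -exec sh -c 'echo run' sh '{}' + | wc -l)

echo "per file: $per_file_runs, batched: $batched_runs"
rm -rf "$batch_dir"
```

With ';' the command runs once per file, while '+' fits all 50 names into a
single invocation here.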

Have a nice day,
Berny


