bug-findutils

Re: faster version of find (for -exec)?


From: James Youngman
Subject: Re: faster version of find (for -exec)?
Date: Sat, 19 Jul 2014 18:20:42 +0100

On Sat, Jul 19, 2014 at 5:02 PM, Peng Yu <address@hidden> wrote:
> The point is not to call an external program. For example, if there is
> a `find` tool made in python, so that I can specify a python
> expression to the tool (to replace the function from the external
> program), then everything will be run within one process, which avoids
> the spawning of new processes.
>
> My question is if there is such a `find` tool available.

I'm not sure I understand, then, what you mean by "tool".

If your question is really "Is it possible to do what find does,
without using find?" then of course the answer is yes; one only needs
to re-implement find in one's chosen language, adapting it to suit
one's purpose.  This clearly must be true, or it would not be
possible to implement find!

If on the other hand your question is, "what interfaces are available
in various languages for examining the file system?" then it depends
on what level of abstraction you require.  You could use
opendir/readdir/closedir/lstat, just as oldfind does.  You could use
fts, as ftsfind does, or nftw.  (The "find" binary built in GNU
findutils is either "oldfind" or "ftsfind", depending on the
configure options.)  One could use os.walk in Python.  In Haskell,
something like System.FilePath.Find.  Similarly in Scheme:

~$ cat ftw.scm
(use-modules (ice-9 ftw))

(define ls
  (lambda (file statinfo directory)
    (display file)
    (newline)
    #t))

(ftw "/tmp" ls)
~$ guile -s  ftw.scm
/tmp
/tmp/.X11-unix
/tmp/.X11-unix/X0
/tmp/pulse-bxPFzY1KQFl4
/tmp/pulse-PKdhtXMmr18n
/tmp/.X0-lock
/tmp/pulse-8xAOfnbEnkqU
/tmp/.ICE-unix
/tmp/.ICE-unix/29098
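
For the Python case, a rough sketch of the os.walk approach might look
like the following.  The name find_matching and the predicate
signature are my own invention for illustration, not part of any
existing tool; the predicate plays the role that the external program
would otherwise play, so everything stays in one process.  Note that
the output order is whatever os.walk gives, which is not the same as
find's.

```python
import os
import stat


def find_matching(top, predicate):
    """Yield each path at or below top for which predicate(path, st) is true.

    find_matching is a hypothetical helper, not an existing API.
    st is the os.lstat() result for the path, so symbolic links are
    examined rather than followed.
    """
    # The starting point itself is a candidate, as it is for find.
    if predicate(top, os.lstat(top)):
        yield top
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
            if predicate(path, st):
                yield path


if __name__ == "__main__":
    # Example predicate: regular files larger than 1 KiB.
    def big_regular_file(path, st):
        return stat.S_ISREG(st.st_mode) and st.st_size > 1024

    for p in find_matching("/tmp", big_regular_file):
        print(p)
```

Any Python expression can go in the predicate, which is roughly what
was asked for; no child process is created at any point.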

The spawning of new processes is not an all-or-nothing thing.  One is
almost never constrained to launch one process per matched file (the
sole exception is when a nonzero return code needs to trigger a branch
in find's behaviour, such as using "-exec foo {} ; -o -quit").  The
right approach is going to depend on a lot of factors, in particular
what work you need to do on the discovered files.  If you are just
printing their names, then you don't need to exec anything in find,
either.  If on the other hand you want to (for example) compress the
selected files, then while you could implement this in (say) Python,
this is unlikely to be as fast as launching an external command to do
it.  Consider that to compress the more than 66000 files in /usr/lib
on my system here, "find" only needs to launch gzip 29 times, so the
overhead of launching processes is inconsequential next to the
overall runtime.
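
To illustrate the batching idea (the same principle as "-exec gzip {}
+" or xargs), here is a minimal Python sketch that packs many file
names into each command line.  The names batches and run_batched are
hypothetical, and the 128 KiB default is an arbitrary assumption of
mine; it is not the limit that find or the kernel actually applies.

```python
import os
import subprocess


def batches(paths, max_bytes=128 * 1024):
    """Group paths into lists whose combined argument size fits max_bytes.

    max_bytes is an assumed, illustrative limit, not a real system value.
    """
    batch, size = [], 0
    for path in paths:
        cost = len(os.fsencode(path)) + 1  # +1 for the terminating NUL
        # Start a new batch when this path would push us over the limit.
        if batch and size + cost > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(path)
        size += cost
    if batch:
        yield batch


def run_batched(command, paths, max_bytes=128 * 1024):
    """Run command once per batch of paths; return the number of launches."""
    launches = 0
    for batch in batches(paths, max_bytes):
        subprocess.run(command + batch, check=True)
        launches += 1
    return launches
```

Something like run_batched(["gzip"], names) would then launch gzip
once per batch rather than once per file, so the number of processes
grows with the total length of the names, not with their count.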

James.


