bug-findutils

Re: Faster way to prune directory?


From: Peng Yu
Subject: Re: Faster way to prune directory?
Date: Thu, 16 Apr 2015 09:51:31 -0500

On Thu, Apr 16, 2015 at 1:26 AM, Bernhard Voelker
<address@hidden> wrote:
> On 04/16/2015 06:04 AM, Peng Yu wrote:
>> Hi, The following code shows that -prune when used with -exec can be
>> very slow. Is there somehow a way to speed this up?
>>
>> ~$ cat main.sh
>> #!/usr/bin/env bash
>>
>> tmpdir=$(mktemp -d)
>>
>> function mkalotdir {
>> local n=$1
>> local i
>> local j
>> local k
>> for i in $(seq -w "$n")
>> do
>>   for j in $(seq -w "$n")
>>   do
>>     for k in $(seq -w "$n")
>>     do
>>       echo "$tmpdir/$i/$j/$k"
>>     done
>>   done
>> done | xargs mkdir -p
>> }
>>
>> function myfind {
>> find "$tmpdir" > /dev/null
>> }
>>
>> function myfindprune {
>> find "$tmpdir" -exec $(type -P test) -e {}/.findignore ';' -prune -o -print > /dev/null
>> }
>>
>> mkalotdir 10
>> echo myfind
>> time myfind
>> echo myfindprune
>> time myfindprune
>>
>> ~$ ./main.sh
>> myfind
>>
>> real 0m0.018s
>> user 0m0.005s
>> sys 0m0.011s
>> myfindprune
>>
>> real 0m5.354s
>> user 0m1.145s
>> sys 0m1.539s
>
> Well, half a second for 1111 times creating and running /usr/bin/test
> doesn't seem too much.  At least, I can second your timing results.
>
> The time is not lost in find but with executing the test(1) program
> for so many times.  To get an idea, start the above command line with
> "strace -f -v find ...".
>
> You'll see that you are "comparing apples with pears" - my home country
> doesn't grow oranges, so that's what this saying looks like over here.
> ;-)
>
> To get a little better result, you could avoid the overhead in test(1)
> regarding NLS etc. by rolling your own, puristic(!) test program:
>
>   $ cat /tmp/mystat.c
>   #include <sys/stat.h>
>   int main (int argc, char**argv) {
>     struct stat sb;
>     return -1 == stat (argv[1], &sb);
>   }
>
>   $ gcc -Wall -O3 -o /tmp/mystat /tmp/mystat.c
>   $ strip /tmp/mystat
>
>   $ time find . -type d -exec /tmp/mystat '{}'/.findignore \; -prune -o -print >/dev/null
>
>   real    0m0.340s
>   user    0m0.014s
>   sys     0m0.064s

On my machine, the speedup from using C stat() is only about 2.2x
(5.4s vs. 2.5s real time).

myfind

real 0m0.020s
user 0m0.005s
sys 0m0.012s
myfindprune

real 0m5.408s
user 0m1.137s
sys 0m1.565s
myfindcppprune # using C stat()

real 0m2.455s
user 0m0.699s
sys 0m1.088s


> Given what the system has to process compared to a bare "find .",
> this is IMO quite good, isn't it?

The real question is whether `-prune`, as it currently works in `find`,
is a good idea.

It sounds like in cases like the one I showed, it is better to run
two `find`s: one searches for all directories without any restriction,
and the other searches for .findignore to collect the directories to
be ignored. A program (yet to be written, and which could be included
in findutils) could then prune the ignored directories in one shot,
based on the results of both runs of find.
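
A minimal sketch of this two-pass idea, to make the proposal concrete.
It is not an existing findutils tool: the function name `find_two_pass`
is hypothetical, and it assumes GNU find (for `-printf`), a starting
directory given without a trailing slash, and path names that contain
no newlines. It relies on find listing a directory before its contents,
which is the default without `-depth`.

```shell
# Hypothetical helper: list everything under "$1" except directories
# that contain a .findignore entry, and everything below them.
find_two_pass() {
  {
    # Pass 1: directories holding a .findignore, tagged with "I".
    find "$1" -name .findignore -printf 'I %h\n'
    # Pass 2: the unrestricted listing, tagged with "P".
    find "$1" -printf 'P %p\n'
  } | awk '
    { tag = substr($0, 1, 1); path = substr($0, 3) }
    # Remember every directory that must be ignored.
    tag == "I" { ignored[path] = 1; next }
    # Still inside the most recently ignored directory: drop the entry.
    skip != "" && index(path, skip "/") == 1 { next }
    { skip = "" }
    # An ignored directory itself: drop it and start skipping below it.
    path in ignored { skip = path; next }
    { print path }
  '
}
```

It would be called as `find_two_pass "$tmpdir" > /dev/null` in place of
`myfindprune`. The tree is walked twice, but no external program is
started per directory; I have not measured how it compares.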

But the fact that `-prune` is there will encourage people to use it,
which results in slow searches.

-- 
Regards,
Peng


