bug-coreutils

bug#29921: O(n^2) performance of rm -r


From: Pádraig Brady
Subject: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 16:40:26 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

tag 29921 notabug
close 29921
stop

On 31/12/17 21:07, Niklas Hambüchen wrote:
> Hello,
> 
> in commit
> 
>   rm -r: avoid O(n^2) performance for a directory with very many entries
>   http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a
> 
> it says that `rm -r`
> 
>   "now displays linear performance, even when operating on million-entry
> directories on ext3 and ext4"
> 
> I found this not to be the case on my systems running ext4 on Linux 4.9
> on a WD 4TB spinning disk, coreutils rm 8.28.
> As reported on https://bugs.python.org/issue32453#msg309303, I got:
> 
>  nfiles     real   user     sys
> 
>  100000    0.51s  0.07s   0.43s
>  200000    2.46s  0.15s   0.89s
>  400000   10.78s  0.26s   2.21s
>  800000   44.72s  0.58s   6.03s
> 1600000  180.37s  1.06s  10.70s
> 
> Each 2x increase of number of files results in 4x increased deletion
> time, making for a clean O(n^2) quadratic curve.
> 
> I'm testing this with:
> 
>   set -e
>   rm -rf dirtest/
>   echo  100000 && (mkdir dirtest && cd dirtest && seq 1  100000 | xargs touch) && time rm -r dirtest/
>   echo  200000 && (mkdir dirtest && cd dirtest && seq 1  200000 | xargs touch) && time rm -r dirtest/
>   echo  400000 && (mkdir dirtest && cd dirtest && seq 1  400000 | xargs touch) && time rm -r dirtest/
>   echo  800000 && (mkdir dirtest && cd dirtest && seq 1  800000 | xargs touch) && time rm -r dirtest/
>   echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs touch) && time rm -r dirtest/
> 
> 
> On another system, Ubuntu 16.04, coreutils rm 8.25, with Linux 4.10 on
> ext4, I get:
> 
>  nfiles     real   user      sys
> 
>  100000    0.94s  0.06s   0.516s
>  200000    2.94s  0.10s   1.060s
>  400000   10.88s  0.30s   2.508s
>  800000   34.60s  0.48s   4.912s
> 1600000  203.87s  0.99s  11.708s
> 
> Also quadratic.
> 
> Same machine on XFS:
> 
>  nfiles     real   user     sys
> 
>  100000    3.37s  0.04s   0.98s
>  200000    7.20s  0.06s   2.03s
>  400000   22.52s  0.16s   5.11s
>  800000   53.31s  0.37s  11.46s
> 1600000  200.83s  0.76s  22.41s
> 
> Quadratic.
> 
> What is the complexity of `rm -r` supposed to be?
> Was it really linear in the past as the patch describes, and this is a
> regression?
> Can we make it work linearly again?

Note the patch mentioned above sorts entries by inode number,
which is usually a win, since it avoids random access across the device.
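The idea can be illustrated with a shell sketch (coreutils does this internally in C via fts; this pipeline is only the concept, and assumes filenames without whitespace, as in the benchmark above):

```shell
# Illustration only: remove files in ascending inode order, the access
# pattern the patch arranges, so unlinks hit the directory's metadata
# blocks roughly sequentially instead of at random.
mkdir -p dirtest
( cd dirtest && seq 1 1000 | xargs touch )
# ls -i1 prints "inode name"; sort numerically by inode, strip the
# inode column, and feed the names to rm in that order.
( cd dirtest && ls -i1 | sort -n | awk '{print $2}' | xargs rm -- )
rmdir dirtest
```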

The user and sys components of the numbers above are largely linear,
so the growing delay is time spent waiting on the device.
I presume that comes from increased latency effects as the cached
operations are synced to the device in ever larger batches.
There are many Linux kernel knobs for adjusting
caching behaviour and the I/O scheduler.
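For example, the writeback tunables under /proc/sys/vm are one such family of knobs (the names below are standard Linux; the right values are workload-specific, so this only inspects them):

```shell
# Inspect the current page-cache writeback tuning (read-only, no root).
cat /proc/sys/vm/dirty_ratio              # % of RAM dirty before writers block
cat /proc/sys/vm/dirty_background_ratio   # % of RAM dirty before async flush
cat /proc/sys/vm/dirty_expire_centisecs   # age at which dirty data must flush
# Writing requires root, e.g.: sysctl -w vm.dirty_ratio=40
# Larger values let more metadata updates batch in cache before syncing.
```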

To confirm, I did a quick check on a ramdisk (/dev/shm)
to minimize the latency effects, and that showed largely
linear behaviour in the real column as well.
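That check can be reproduced along these lines (paths and the file count are illustrative; /dev/shm is the usual tmpfs mount on Linux, with /tmp as a fallback):

```shell
# Benchmark rm -r on tmpfs, where device latency is out of the picture.
d=/dev/shm/dirtest
[ -d /dev/shm ] || d=/tmp/dirtest   # fall back if /dev/shm is absent
mkdir -p "$d"
# Small count kept for illustration; scale up to match the report.
( cd "$d" && seq 1 10000 | xargs touch )
time rm -r "$d"
```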

Note that if you need to blow away a whole tree quite often,
it may be more efficient to recreate the file system
on a separate (loopback) device.
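One way to set that up, sketched under the assumption of an ext4 image file (paths and sizes are illustrative, and the mount step needs root):

```shell
# Create a scratch file system on a loopback file; "deleting" the whole
# tree then becomes unmount + re-mkfs instead of walking every entry.
truncate -s 1G scratch.img
mkfs.ext4 -q -F scratch.img            # re-running this is the "delete"
sudo mount -o loop scratch.img /mnt/scratch
# ... populate and use /mnt/scratch ...
sudo umount /mnt/scratch
```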

cheers,
Pádraig