bug#29921: O(n^2) performance of rm -r
From: Pádraig Brady
Subject: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 16:40:26 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
tag 29921 notabug
close 29921
stop
On 31/12/17 21:07, Niklas Hambüchen wrote:
> Hello,
>
> in commit
>
> rm -r: avoid O(n^2) performance for a directory with very many entries
> http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a
>
> it says that `rm -r`
>
> "now displays linear performance, even when operating on million-entry
> directories on ext3 and ext4"
>
> I found this not to be the case on my systems running ext4 on Linux 4.9
> on a WD 4TB spinning disk, coreutils rm 8.28.
> As reported on https://bugs.python.org/issue32453#msg309303, I got:
>
> nfiles    real      user    sys
>
> 100000    0.51s     0.07s   0.43s
> 200000    2.46s     0.15s   0.89s
> 400000    10.78s    0.26s   2.21s
> 800000    44.72s    0.58s   6.03s
> 1600000   180.37s   1.06s   10.70s
>
> Each 2x increase of number of files results in 4x increased deletion
> time, making for a clean O(n^2) quadratic curve.
>
> I'm testing this with:
>
> set -e
> rm -rf dirtest/
> echo 100000 && (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch) && time rm -r dirtest/
> echo 200000 && (mkdir dirtest && cd dirtest && seq 1 200000 | xargs touch) && time rm -r dirtest/
> echo 400000 && (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch) && time rm -r dirtest/
> echo 800000 && (mkdir dirtest && cd dirtest && seq 1 800000 | xargs touch) && time rm -r dirtest/
> echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs touch) && time rm -r dirtest/
>
>
> On another system, Ubuntu 16.04, coreutils rm 8.25, with Linux 4.10 on
> ext4, I get:
>
> nfiles    real      user    sys
>
> 100000    0.94s     0.06s   0.516s
> 200000    2.94s     0.10s   1.060s
> 400000    10.88s    0.30s   2.508s
> 800000    34.60s    0.48s   4.912s
> 1600000   203.87s   0.99s   11.708s
>
> Also quadratic.
>
> Same machine on XFS:
>
> nfiles    real      user    sys
>
> 100000    3.37s     0.04s   0.98s
> 200000    7.20s     0.06s   2.03s
> 400000    22.52s    0.16s   5.11s
> 800000    53.31s    0.37s   11.46s
> 1600000   200.83s   0.76s   22.41s
>
> Quadratic.
>
> What is the complexity of `rm -r` supposed to be?
> Was it really linear in the past as the patch describes, and this is a
> regression?
> Can we make it work linearly again?
Note that the patch mentioned above sorts entries by inode order,
which is usually a win, as it avoids random access across the device.
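As a rough illustration only (this is not how rm implements it internally),
the inode-order idea can be approximated from the shell with GNU find's
`%i` printf directive; the directory name `demo` is just for the example:

```shell
# Hypothetical sketch (GNU find/coreutils assumed): unlink files in
# inode order rather than readdir order, mimicking the idea behind
# the coreutils patch referenced above.
demo=$(mktemp -d)
seq 1 100 | (cd "$demo" && xargs touch)
# Print "<inode> <path>" NUL-terminated, sort numerically by inode,
# strip the inode field, then remove the files in that order.
find "$demo" -maxdepth 1 -type f -printf '%i %p\0' \
  | sort -z -n \
  | cut -z -d' ' -f2- \
  | xargs -0 rm --
rmdir "$demo"
```

On a rotating disk, visiting inodes in ascending order tends to turn random
seeks into a mostly sequential scan of the inode table.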
The user and sys components of the numbers above are largely linear,
so the increasing delay is time spent waiting on the device.
I presume that comes from increased latency effects
as the cached operations are synced out to the device
with increasing numbers of operations.
There are many Linux kernel knobs for adjusting
caching behaviour and the I/O scheduler.
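For example (Linux-specific paths; exact names vary by kernel version),
some of the relevant writeback thresholds and the active I/O scheduler
can be inspected without root:

```shell
# Read-only peek at the writeback knobs that control how aggressively
# dirty pages are flushed to disk; tuning them requires root
# (via sysctl -w or echo into these files).
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_expire_centisecs
# Active I/O scheduler per block device (device names vary):
for q in /sys/block/*/queue/scheduler; do
  [ -r "$q" ] && echo "$q: $(cat "$q")"
done
```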
To confirm, I did a quick check on a ramdisk (/dev/shm)
to minimize the latency effects, and that showed largely
linear behaviour in the real column as well.
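That check amounts to rerunning the reporter's benchmark on tmpfs, which
takes the block device out of the picture; /dev/shm is assumed to be a
mounted tmpfs, as it is on most Linux distributions:

```shell
# One step of the benchmark, run on tmpfs instead of the disk.
d=/dev/shm/dirtest
rm -rf "$d"
mkdir "$d"
seq 1 100000 | (cd "$d" && xargs touch)
time rm -r "$d"
```

If the real time scales linearly here but quadratically on the disk,
the extra cost is device latency rather than rm's algorithm.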
Note that if you need to blow away a whole tree quite often,
it may be more efficient to recreate the file system
on a separate (loopback) device.
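A hedged sketch of that approach (requires root; the image path, size,
and mount point are arbitrary examples):

```shell
# Keep the throwaway tree on its own loopback file system, so that
# "deleting everything" is a cheap mkfs rather than millions of
# unlink() calls.
dd if=/dev/zero of=/var/tmp/scratch.img bs=1M count=1024 status=none
mkfs.ext4 -q -F /var/tmp/scratch.img
mkdir -p /mnt/scratch
mount -o loop /var/tmp/scratch.img /mnt/scratch
# ... create and use many files under /mnt/scratch ...
umount /mnt/scratch
mkfs.ext4 -q -F /var/tmp/scratch.img   # the "rm -r" equivalent
mount -o loop /var/tmp/scratch.img /mnt/scratch
```

The recreate step costs roughly the same regardless of how many files
the tree contained.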
cheers,
Pádraig