[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#29921: O(n^2) performance of rm -r

From: Niklas Hambüchen
Subject: bug#29921: O(n^2) performance of rm -r
Date: Tue, 2 Jan 2018 00:54:26 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0

Hello Jim and Paul,

thank you very much for verifying my observations.

Also, thanks Jim for taking the time to make the nature of the issue
more precise:
Your patch still helps a lot with big big constant factors but doesn't
relieve us of the overall O(n²) problem.

I can also confirm the super-linear performance on SSDs you mentioned.
On my systems, I get roughly:

O(n^2  ) on spinning disks
O(n^1.7) on SSDs

I see this on ext4 on SSDs and spinning disks; also on xfs and zfs on
spinning disks but I haven't tried those two on SATA SSDs or NVMe yet.

@ Jim, if you your NVMe device available for that, could you check all
three of those filesystems on it? If it happens to be linear on NVMe on
all filesystem types, that might help narrow down the issue.

> I doubt we can do anything in the coreutils to fix it.

I guess that depends on what we will find the root cause of the issue to be:
If there is a favourable deletion order that makes the underlying
systems behave linearly, then `rm` could try to produce that order in
the same spirit as your previous patch does.

To go for a simple exmaple: If the filesystem implemented the directory
lists as a balanced binary tree, then deleting leaves evenly spread
across the width of the tree should avoid any rebalancing and result in
O(log n) for a single unlink().

> perhaps we should file an ext4 bug report?

OK, I will do that right now.

Naively, I would expect that a single unlink() can always be implemented
in amortised O(1) or at least O(log n), so that we should end up with
O(n) or O(n * log n) overall.

Another possibility (though I find it unlikely) is that deletion is in
fact O(n), but that the linear behaviour only appears once we get to
much larger n. It seems worthwhile running a deletion loop with
ever-increasing n = n*2 over night.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]