bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

fts: eliminating the worst-case memory-use behavior


From: Jim Meyering
Subject: fts: eliminating the worst-case memory-use behavior
Date: Tue, 16 Aug 2011 14:24:18 +0200

Applying fts (via find, rm, du, chmod -R, etc.)  to a directory with 4
million entries would make the parent program consume about 1GiB of RSS.
Similarly, with 32 million entries, the parent would require about 8GiB of
memory.  Thus, with a large enough directory, these tools become unusable.

I've changed fts so that its internal memory use now stays reasonable
for those programs.

For example, consider the following before-and-after comparison.
I created the scenario like this in a tmpfs file system:

    mkdir rm-test && (cd rm-test && seq 4000000|xargs touch)

Then I ran "rm -rf rm-test", first with the old (coreutils-8.12)
version of rm, and again with a version of rm using the new fts.c:

      rm from coreutils-8.12: removing a 4M-entry directory

    GB
1.043^                                          #
     |                                         :#::
     |                                       @::#:::
     |                                    :::@::#:::::
     |                                  ::@::@::#::::::
     |                                @:::@::@::#:::::::@
     |                              ::@:::@::@::#:::::::@:
     |                            ::::@:::@::@::#:::::::@::
     |                          ::::::@:::@::@::#:::::::@::::
     |                        ::::::::@:::@::@::#:::::::@:::::
     |                      ::::::::::@:::@::@::#:::::::@::::::@
     |                   :::::::::::::@:::@::@::#:::::::@::::::@:
     |                 :::::::::::::::@:::@::@::#:::::::@::::::@::
     |               :::::::::::::::::@:::@::@::#:::::::@::::::@::::
     |             :::::::::::::::::::@:::@::@::#:::::::@::::::@:::::
     |           :::::::::::::::::::::@:::@::@::#:::::::@::::::@::::::@
     |         :::::::::::::::::::::::@:::@::@::#:::::::@::::::@::::::@:
     |       ::: :::::::::::::::::::::@:::@::@::#:::::::@::::::@::::::@:::
     |    :::::: :::::::::::::::::::::@:::@::@::#:::::::@::::::@::::::@:::::
     |  :::::::: :::::::::::::::::::::@:::@::@::#:::::::@::::::@::::::@::::::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   2.050



      rm with improved fts: removing a 4M-entry directory

    MB
28.27^ #
     | #           :            :                      : :                   :
     | #           :       :    :   :                  : : :               @ :
     | #  :        :   :   :    :   :                 :: : : :           : @ :
     | #  :    :   : : :   :  : :   :                 :: : : ::         :: @ :
     | #  :    :   : : : : :  : : : :               : :: : : :: :       :: @ :
     | #  :    :   : : : : :: : : : :         :     : :: : : :: : :   : :: @ :
     | #: :    :   : : : : :: : : : :         :   : : :: : : :: : : : : :: @ :
     | #: :    :   : : : : :: : : : : :  :    :   ::: :: : : :: : : :@: :: @ :
     | #: : :::: : : : : : :: : : : : :  :    :   : :::: : : :: : : :@:::: @ :
     | #: : :: : : : : : : :: : : : : :  :    :   : :::: : : :: ::: :@:::: @ :
     | #: :::: : ::: : : : :: : ::: : :  : :  :   : :::: : : :::::: :@:::: @::
     | #: :::: : ::: : : : :: : ::: : : :: :  :   : :::: : : :::::: :@:::: @::
     | #: :::: : ::: : : : :: : ::: : : :: :  :  :: :::: ::: :::::: :@:::: @::
     | #: :::: : ::: ::: : :: : ::: : : :: :  :  :: :::: :::::::::: :@:::: @::
     | #: :::: : ::: ::: : :: : ::: : : :: ::::  :: :::: :::::::::: :@:::: @::
     | #: :::: : ::::::: : :::: ::: : : :: :::::::: :::: :@:::::@:: :@:::: @::
     | #: :::: : ::::::: :::::: ::::: :::::::::: :: :::: :@:::::@:: :@:::: @::
     | #:::::: : ::::::: :::::::::::: : :::::::: :: ::::::@:::::@::::@:::: @::
     | #:::::: : ::::::: :::::::::::::: :::::::: :: ::::::@:::::@::::@:::::@::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   2.072

Obviously we prefer the new version, with peak utilization of just 28MB.
The former required 1GB.

The new fts accomplishes this by batching its readdir calls into groups
of 100,000 (hence the ~25MiB, at ~250B/entry) when possible.  Before, it
would always read all directory entries into memory before processing
any of them.  Even now, when you specify an entry-comparison function,
fts must still resort to reading all entries before invoking that function.

Note that with very many directory entries (more than 10,000), fts must
sort by inode number before processing them (only on some FS types).
One concern is that batching might end up decreasing performance, since
it is sorting only a fraction of the entries at a time.
I haven't measured that yet (not relevant above, since there is no
seek penalty -- or sorting -- on memory-backed file systems).

I'll post patches in a day or two.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]