
My experience with using cp to copy a lot of files (432 million, 39 TB)


From: chrjae
Subject: My experience with using cp to copy a lot of files (432 million, 39 TB)
Date: Thu, 11 Sep 2014 20:42:32 -0700 (PDT)

This post has made it to Hacker News[1].

We have discussed optimization possibilities a bit, and I suggested
replacing cp's use of a hash table with sorting a list.

For example: walk the source tree and write a list of ino/dev/path
entries to a temporary file, sort that file by ino/dev (e.g. using
GNU sort, which I seem to remember already uses a memory-efficient
external algorithm, i.e. works well on files much larger than RAM),
then parse the file back, copy the first path of every group with the
same ino/dev, and hard-link the rest.

The assumption is that sorting a list requires much less RAM to be
efficient than a hash table does. (I can't find my copy of TAOCP
right now, but I think it describes solutions.)
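
Something like this rough Python sketch of the idea (not cp's actual
code; the in-memory sort here just stands in for piping the list
through GNU sort, and the function names and record format are made
up for illustration):

import os
import shutil
import tempfile

def dump_inode_list(src_root, list_file):
    # Walk the source tree and write one "dev ino path" record per file.
    # (Assumes paths contain no newlines; a real implementation would
    # need a safer record separator.)
    for dirpath, _dirs, filenames in os.walk(src_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            list_file.write("%d %d %s\n" % (st.st_dev, st.st_ino, path))

def copy_tree_with_hardlinks(src_root, dst_root):
    with tempfile.TemporaryFile(mode="w+") as list_file:
        dump_inode_list(src_root, list_file)
        list_file.seek(0)
        # In-memory sort for brevity; the actual proposal is to run the
        # temporary file through GNU sort so the list never has to fit
        # in RAM.  Grouping only needs equal dev/ino keys adjacent, so a
        # lexicographic sort is enough.
        records = sorted(
            line.rstrip("\n").split(" ", 2) for line in list_file
        )

    last_key = None
    first_dst = None
    for dev, ino, src_path in records:
        dst_path = os.path.join(dst_root, os.path.relpath(src_path, src_root))
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        if (dev, ino) == last_key:
            # Same dev/ino as the previous record: another hard link to
            # a file we already copied, so just link it.
            os.link(first_dst, dst_path)
        else:
            # First path in this dev/ino group: do the actual copy.
            shutil.copy2(src_path, dst_path)
            last_key = (dev, ino)
            first_dst = dst_path

With GNU sort doing the sorting externally, the memory needed for
hard-link detection would stay roughly constant no matter how many
files are copied, instead of growing with the size of the hash table.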

Christian.

[1] https://news.ycombinator.com/item?id=8305283



