bug-coreutils


From: Jim Meyering
Subject: Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Date: Sat, 08 Nov 2008 19:05:25 +0100

Andrew McGill <address@hidden> wrote:
> Greetings coreutils folks,
>
> There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
> which could benefit from userspace utilities doing certain operations in
> parallel.  (I have a very slow glusterfs installation that makes me think
> that some things can be done better.)
>
> For example, copying a number of files is currently done in series ...
>       cp a b c d e f g h dest/
> but, on certain filesystems, it would be roughly twice as efficient if
> implemented in two parallel threads, something like:
>       cp a c e g dest/ &
>       cp b d f h dest/
> since the source and destination files can be stored on multiple physical
> volumes.

How about parallelizing it via xargs, e.g.,

    $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
      --max-procs=2 -- cp --target-directory=dest
    cp --target-directory=dest a b c d
    cp --target-directory=dest e f g h

Obviously the above is tailored (-n4) to your 8-input example.
In practice, you'd use a larger number, unless latency is
so high as to dwarf the cost of extra "fork/exec" syscalls,
in which case even -n1 might make sense.

mv and ln also accept the --target-directory=dest option.
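
If the file names may contain whitespace, a NUL-delimited variant of
the pipeline above is safer.  Here is a minimal sketch, assuming GNU
xargs and GNU cp; the scratch directory and the file names are made up
for illustration, and the batch size and process count are arbitrary:

```shell
# Hypothetical setup: a scratch directory with eight small source files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir dest
for f in a b c d e f g h; do echo "data $f" > "$f"; done

# NUL-delimited names survive spaces and newlines in file names.
# -0: read NUL-terminated names; -r: run nothing on empty input;
# -n4: at most four names per cp; -P2: at most two cp processes at once.
printf '%s\0' a b c d e f g h |
  xargs -0 -r -n4 -P2 cp --target-directory=dest

# Count what arrived in the destination.
ls dest | wc -l
```

With real data you would generate the list with something like
"find ... -print0" rather than printf.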

> Similarly, ls -l . will readdir(), and then stat() each file in the directory.
> On a filesystem with high latency, it would be faster to issue the stat()
> calls asynchronously, and in parallel, and then collect the results for

If you can demonstrate a large performance gain on
systems that many people use, then maybe...

There is more than a little value in keeping programs
like those in the coreutils package relatively simple,
but if the ratio of cost (maintenance and portability burden)
to benefit is low enough, then anything is possible.

For example, a well-encapsulated, optionally-threaded
"stat_all_dir_entries" API might be useful in some situations.
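
To get a rough idea of the potential gain before writing any C, you
could approximate the effect from the shell by running several stat
processes at once.  This is only a sketch, assuming GNU find, xargs and
stat: it uses separate processes rather than threads, it is not the
API mentioned above, and the scratch directory stands in for a
directory on your high-latency mount:

```shell
# Hypothetical scratch directory standing in for a high-latency mount.
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do echo "$i" > "$dir/file$i"; done

# Up to 4 stat processes at once, 2 names each (both numbers are arbitrary).
# Small batches keep each process's output well under PIPE_BUF, so
# lines from different processes are unlikely to interleave mid-line.
results=$(find "$dir" -mindepth 1 -maxdepth 1 -print0 |
  xargs -0 -r -n2 -P4 stat --format='%n %s')

# Completion order is nondeterministic, so sort before display.
printf '%s\n' "$results" | sort
```

Timing this against a plain "ls -l" on the same directory, with cold
caches, would show whether the extra parallelism buys anything on your
filesystem.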

If getting any eventual patch into upstream coreutils is
important to you, be sure there is some consensus on this
list before doing a lot of work on it.

> display.  (This could improve performance for NFS, in proportion to the
> latency and the number of threads.)
>
>
> Question:  Is there already a set of "improved" utilities that implement this
> kind of technique?

Not that I know of.

> If not, would this kind of performance enhancement be
> considered useful?

It's impossible to say without knowing more.



