
Re: Cannot specify the number of threads for parsort


From: Mario Roy
Subject: Re: Cannot specify the number of threads for parsort
Date: Sat, 18 Feb 2023 03:42:36 -0600

Are you in the high memory consumption scenario which Nigel describes?

The issue is running parsort on large-scale machines. Running on all cores is often undesirable for memory-intensive applications; the memory channels eventually become the bottleneck.

The mcesort variant has reached the incubator stage (the code is 100% complete). It supports -j (short option) and --parallel. Specifying 1% will still resolve to at least 1 CPU core; see the sketch after the option list.

-jN   integer value
-jN%  percentage value; e.g. -j1% .. -j100%
-jmax or -jauto  same as 100%, i.e. all N available logical cores
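
For illustration only (this is not the actual mcesort code), here is a plausible shell sketch for mapping a -j style value to a worker count. It assumes nproc from coreutils and clamps the result to at least 1 core:

jval=$1                    # e.g. 64, 38, 1%, 100%, max, auto
ncpu=$(nproc)              # available logical cores
case "$jval" in
  max|auto) n=$ncpu ;;                      # all logical cores
  *%)       pct=${jval%\%}                  # strip the trailing %
            n=$(( ncpu * pct / 100 )) ;;    # percentage of cores
  *)        n=$jval ;;                      # plain integer value
esac
[ "$n" -lt 1 ] && n=1      # -j1% on a 64-core box would otherwise give 0
echo "$n"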

The test file is a mockup of randomly generated key-value pairs, with 323+ million rows.

$ ls -lh /dev/shm/huge
-rw-r--r-- 1 mario mario 2.8G Feb 18 00:48 /dev/shm/huge

$ wc -l /dev/shm/huge
323398400 /dev/shm/huge
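
As a rough stand-in (not how my file was actually built), a file of random key-value rows can be generated with coreutils alone; here only 1 million rows for brevity, raise the counts for a larger test:

$ paste -d ' ' <(shuf -r -n 1000000 -i 1-999999999) <(seq 1000000) > /dev/shm/sample

The random numbers in the first column serve as sort keys; the second column is just a row id.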


With parsort, one cannot specify the number of cores used to process a file, so it spawns 64 workers on this machine. The Perl MCE variant performs similarly. I get better throughput running 38 workers than 64.

$ time parsort /dev/shm/huge | cksum
3409526408 2910585600

real 0m18.147s
user 0m13.920s
sys 0m3.660s

$ time mcesort -j64 /dev/shm/huge | cksum
3409526408 2910585600

real 0m18.081s
user 2m52.082s
sys 0m10.860s

$ time mcesort -j38 /dev/shm/huge | cksum
3409526408 2910585600

real 0m16.788s
user 2m21.384s
sys 0m8.263s


Regarding standard input, I can run parsort through a wrapper script (given at the top of this email thread). Notice that parsort also has better throughput with 38 workers.

$ time parsort -j64 </dev/shm/huge | cksum
3409526408 2910585600

real 0m19.553s
user 0m14.030s
sys 0m3.520s


$ time mcesort -j64 </dev/shm/huge | cksum
3409526408 2910585600

real 0m18.312s
user 2m42.042s
sys 0m11.546s

$ time parsort -j38 </dev/shm/huge | cksum
3409526408 2910585600

real 0m17.609s
user 0m11.856s
sys 0m3.451s


$ time mcesort -j38 </dev/shm/huge | cksum
3409526408 2910585600

real 0m16.819s
user 2m21.108s
sys 0m9.523s



I find it interesting that the user time reported for parsort does not reflect the total across all workers (the tally of each worker's CPU time).

This has been a challenge, but I can see the finish line; I am hoping to finish by next week.



On Fri, Feb 17, 2023 at 2:49 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 2/17/23 13:41, Mario Roy wrote:
It looks like we may not get what we kindly asked for. So, I started making "mcesort" using Perl MCE's chunking engine.

On Thu, Feb 16, 2023 at 5:08 AM Nigel Stewart <nigels@nigels.com> wrote:
Can you elaborate on what I am missing from the picture?

Ole,

Perhaps your workloads are more CPU and I/O intensive, and latency is less of a priority.
If the workload is memory-intensive, that can be a more important constraint than
the number of available cores. If the workload is interactive (latency-sensitive), it is
undesirable to have too many jobs in flight competing for CPU and I/O, delaying each other.

- Nigel
 
Are you in the high memory consumption scenario which Nigel describes?

If you're going to develop it anyway, you could try submitting a patch to GNU Parallel. 
