bug-parallel

Re: Cannot specify the number of threads for parsort


From: Mario Roy
Subject: Re: Cannot specify the number of threads for parsort
Date: Fri, 10 Feb 2023 22:02:25 -0600


I modified parsort locally to support -jN and -j N, and to check the PARALLEL environment variable for -jN or -j N. The -j option works both when processing STDIN (sort_stdin) and when the input is given as files (sort_files). Note: the -j option must be the first option given to parsort.

--- parsort 2023-01-23 18:18:11.995898238 -0600
+++ parsort.new 2023-02-10 21:24:25.894632414 -0600
@@ -86,6 +86,17 @@
 Getopt::Long::Configure("bundling","require_order");
 
 my @ARGV_before = @ARGV;
+my $NJOBS;
+
+if($ARGV[0] eq "-j") {
+    shift, $NJOBS = shift; splice @ARGV_before, 0, 2;
+    $ENV{PARALLEL} = "-j $NJOBS";
+} elsif($ARGV[0] =~ /^-j(\d+)$/) {
+    shift, $NJOBS = $1, shift @ARGV_before;
+    $ENV{PARALLEL} = "-j $NJOBS";
+} elsif($ENV{PARALLEL} =~ /-j\s*(\d+)/) {
+    $NJOBS = $1;
+}
 
 GetOptions(
     "debug|D" => \$opt::D,
@@ -175,7 +186,7 @@
     # Input is stdin
     # Spread the input between n processes that each sort
     # n = number of CPU threads
-    my $numthreads = `parallel --number-of-threads`;
+    my $numthreads = defined($NJOBS) ? $NJOBS : `parallel --number-of-threads`;
     my @fifos = map { tmpfifo() } 1..$numthreads;
     map { mkfifo($_,0600) } @fifos;
     # This trick removes the fifo as soon as it is connected in the other end




I very much like that parsort's -j is now consistent with parallel (including the PARALLEL environment variable). There may be more reasons to have -j. For example, imagine an enterprise environment connected to SAN storage. Oftentimes, the SAN storage server becomes the bottleneck, so running more sort processes does not help. ++ for the -j option to parsort.
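
To illustrate the intended precedence (a leading -j on the command line first, then PARALLEL, then the default), here is a minimal standalone sketch. It is not the parsort code itself: leading_jobs is a hypothetical helper invented for this example, and its PARALLEL regex is slightly stricter than the one in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of parsort): extract a leading -j option.
# Takes a reference to an argument list and the value of the PARALLEL
# environment variable. Returns the job count, or undef if none was given.
# Any consumed -j words are removed from the argument list.
sub leading_jobs {
    my ($args, $env) = @_;
    if (@$args >= 2 and $args->[0] eq "-j" and $args->[1] =~ /^\d+$/) {
        # "-j N": drop both words, keep N
        my (undef, $n) = splice(@$args, 0, 2);
        return $n;
    }
    if (@$args and $args->[0] =~ /^-j(\d+)$/) {
        # "-jN": drop the single word
        shift @$args;
        return $1;
    }
    if (defined $env and $env =~ /(?:^|\s)-j\s*(\d+)/) {
        # Fall back to -jN or -j N found in PARALLEL
        return $1;
    }
    return;
}

my @args = ("-j", "4", "-k2", "file1", "file2");
my $njobs = leading_jobs(\@args, $ENV{PARALLEL});
print "jobs=", (defined $njobs ? $njobs : "default"), " rest=@args\n";
```

With the argument list above this prints "jobs=4 rest=-k2 file1 file2"; when no -j is found anywhere, the caller would fall back to parallel --number-of-threads, as in the patch.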

Please feel free to reject my change above. I tried to follow the existing code style, with no space between "if" and the opening paren.




On Fri, Feb 10, 2023 at 8:10 PM Sam James <sam@cmpct.info> wrote:


> On 11 Feb 2023, at 01:44, Mario Roy <marioeroy@gmail.com> wrote:
>
> > Can you explain why you need this? To me it seems odd not to use all the cores you paid for.
>
> I was trying to simulate how long the task takes on a smaller machine, to share the result with a friend. Anyway, ++ for parsort supporting the -j argument, to be consistent with parallel -j. Ditto for parsort checking whether the PARALLEL environment variable contains -jN or -j N; this too for consistency with parallel.
>
> > There's a current phobia of having a machine actually using all its cores and a misconception that the users can do a better job of scheduling than the scheduler!
>
> No, not at all. Having the -j option to parsort is desirable on larger NUMA-aware machines, for instance when one node is busy and one wishes to run on the other node. Surely, one can use nodeadm or taskset. But why default to spawning all those processes, only for the OS to task switch more on that particular node?
>
Yeah, it's pretty normal to want to limit the number of jobs either because of other work one is doing on the machine, or because others are using the machine.

