parallel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Improper handling of non-default termination sequences (--termseq)


From: Vladimir Oltean
Subject: Improper handling of non-default termination sequences (--termseq)
Date: Sun, 8 Oct 2017 22:32:45 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0


Hi Ole,

I am running several worker executables with gnu parallel. As the input data is a bit hard to predict upfront, I am not passing the input to the workers via gnu parallel. Instead, the workers have a rather long lifespan, and continuously pick jobs from a job pool, in a while loop. Their behavior on SIGTERM is to finish the current job they are assigned to, and then stop taking new jobs from the pool and exit. If a second SIGTERM is passed to them, they will even drop the job they're working on, clean up and exit. Now this is all good and dandy, and I have changed gnu parallel's default --termseq from "TERM,200,TERM,100,TERM,50,KILL,25" to just "INT". I am expecting to be able to pass multiple Ctrl-c (SIGINT) signals received by gnu parallel into multiple termination sequences sent to my worker tasks. But alas, the current behavior of gnu parallel is to not properly wait for its children to die. It will thus leave them orphaned and I don't ever get the chance to send the second Ctrl-c to gnu parallel. Also, the wait_and_exit(255) is (incorrectly for my scenario) called from within the signal handler. This means gnu parallel is not able to handle any subsequent SIGINT signals while waiting for its children die.

I am able to do my job with this patch:

diff -u -r a/src/parallel b/src/parallel
--- a/src/parallel
+++ b/src/parallel
@@ -33,6 +33,8 @@
 # Used to ensure code quality
 use strict;
 use File::Basename;
+our $got_sigterm = 0;
+our $got_sigint  = 0;

 save_stdin_stdout_stderr();
 save_original_signal_handler();
@@ -2674,6 +2676,14 @@
     my $sleep = 0.2;
     do {
         while($Global::total_running > 0) {
+
+           if ($got_sigint || $got_sigterm) {
+               $got_sigint  = 0;
+               $got_sigterm = 0;
+               if($opt::tmux) { ::qqx("tmux kill-session -t p$$"); }
+               wait_and_exit(255);
+           }
+
             debug($Global::total_running, "==", scalar
                  keys %Global::running," slots: ", $Global::max_jobs_running);
            if($opt::pipe) {
@@ -3771,14 +3781,8 @@
     # Uses:
     #   %Global::original_sig
     # Returns: N/A
-    $SIG{INT} = sub {
-       if($opt::tmux) { ::qqx("tmux kill-session -t p$$"); }
-       wait_and_exit(255);
-    };
-    $SIG{TERM} = sub {
-       if($opt::tmux) { ::qqx("tmux kill-session -t p$$"); }
-       wait_and_exit(255);
-    };
+    $SIG{INT}  = sub { $got_sigint  = 1; };
+    $SIG{TERM} = sub { $got_sigterm = 1; };
     %Global::original_sig = %SIG;
     $SIG{TERM} = sub {}; # Dummy until jobs really start
     $SIG{ALRM} = 'IGNORE';
@@ -3927,12 +3931,39 @@
     # Convert pids to process groups ($processgroup = -$pid)
     my @pgrps = map { -$_ } @_;
     my @term_seq = split/,/,$opt::termseq;
+    my $pid;
     if(not @term_seq) {
        @term_seq = ("TERM",200,"TERM",100,"TERM",50,"KILL",25);
     }
     while(@term_seq) {
        @pgrps = kill_sleep(shift @term_seq, shift @term_seq, @pgrps);
     }
+    # Under a normal termseq, all child process groups should get killed
+    # by SIGKILL and this code would not execute. However, if we just send
+    # a non-fatal termseq, we have to wait for the children to die.
+    # We also need to make sure that further signals get forwarded to them.
+    while (@pgrps) {
+       print "Process groups left alive: @pgrps\n";
+       while (($pid = waitpid(-1, &WNOHANG)) <= 0) {
+           if ($got_sigint || $got_sigterm) {
+               $got_sigint  = 0;
+               $got_sigterm = 0;
+               # kill_sleep_seq expects pids, not process groups
+               kill_sleep_seq( map { -$_ } @pgrps);
+               last;
+           }
+           sleep(1);
+       }
+       # $pid might be zero because a signal was received while waiting
+       if ($pid) {
+           print "pid $pid exited\n";
+           # Remove process group leaders one by one.
+           @pgrps = grep { $_ != -$pid } @pgrps;
+       }
+       # Re-count the children that are still alive after
+       # a recursive invocation of kill_sleep_seq
+       @pgrps = grep { kill( 0, $_) } @pgrps;
+    }
 }

 sub kill_sleep {
---

Please help review and merge it once it meets your quality standards.

Thanks,
-Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]