[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Improper handling of non-default termination sequences (--termseq)
From: |
Vladimir Oltean |
Subject: |
Improper handling of non-default termination sequences (--termseq) |
Date: |
Sun, 8 Oct 2017 22:32:45 +0300 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 |
Hi Ole,
I am running several worker executables with gnu parallel. As the input
data is a bit hard to predict upfront, I am not passing the input to the
workers via gnu parallel. Instead, the workers have a rather long
lifespan, and continuously pick jobs from a job pool, in a while loop.
Their behavior on SIGTERM is to finish the current job they are assigned
to, and then stop taking new jobs from the pool and exit. If a second
SIGTERM is passed to them, they will even drop the job they're working
on, clean up and exit.
Now this is all good and dandy, and I have changed gnu parallel's
default --termseq from "TERM,200,TERM,100,TERM,50,KILL,25" to just
"INT". I am expecting to be able to pass multiple Ctrl-c (SIGINT)
signals received by gnu parallel into multiple termination sequences
sent to my worker tasks.
But alas, the current behavior of gnu parallel is to not properly wait
for its children to die. It will thus leave them orphaned and I don't
ever get the chance to send the second Ctrl-c to gnu parallel.
Also, the wait_and_exit(255) is (incorrectly for my scenario) called
from within the signal handler. This means gnu parallel is not able to
handle any subsequent SIGINT signals while waiting for its children die.
I am able to do my job with this patch:
diff -u -r a/src/parallel b/src/parallel
--- a/src/parallel
+++ b/src/parallel
@@ -33,6 +33,8 @@
# Used to ensure code quality
use strict;
use File::Basename;
+our $got_sigterm = 0;
+our $got_sigint = 0;
save_stdin_stdout_stderr();
save_original_signal_handler();
@@ -2674,6 +2676,14 @@
my $sleep = 0.2;
do {
while($Global::total_running > 0) {
+
+ if ($got_sigint || $got_sigterm) {
+ $got_sigint = 0;
+ $got_sigterm = 0;
+ if($opt::tmux) { ::qqx("tmux kill-session -t p$$"); }
+ wait_and_exit(255);
+ }
+
debug($Global::total_running, "==", scalar
keys %Global::running," slots: ", $Global::max_jobs_running);
if($opt::pipe) {
@@ -3771,14 +3781,8 @@
# Uses:
# %Global::original_sig
# Returns: N/A
- $SIG{INT} = sub {
- if($opt::tmux) { ::qqx("tmux kill-session -t p$$"); }
- wait_and_exit(255);
- };
- $SIG{TERM} = sub {
- if($opt::tmux) { ::qqx("tmux kill-session -t p$$"); }
- wait_and_exit(255);
- };
+ $SIG{INT} = sub { $got_sigint = 1; };
+ $SIG{TERM} = sub { $got_sigterm = 1; };
%Global::original_sig = %SIG;
$SIG{TERM} = sub {}; # Dummy until jobs really start
$SIG{ALRM} = 'IGNORE';
@@ -3927,12 +3931,39 @@
# Convert pids to process groups ($processgroup = -$pid)
my @pgrps = map { -$_ } @_;
my @term_seq = split/,/,$opt::termseq;
+ my $pid;
if(not @term_seq) {
@term_seq = ("TERM",200,"TERM",100,"TERM",50,"KILL",25);
}
while(@term_seq) {
@pgrps = kill_sleep(shift @term_seq, shift @term_seq, @pgrps);
}
+ # Under a normal termseq, all child process groups should get killed
+ # by SIGKILL and this code would not execute. However, if we just send
+ # a non-fatal termseq, we have to wait for the children to die.
+ # We also need to make sure that further signals get forwarded to them.
+ while (@pgrps) {
+ print "Process groups left alive: @pgrps\n";
+ while (($pid = waitpid(-1, &WNOHANG)) <= 0) {
+ if ($got_sigint || $got_sigterm) {
+ $got_sigint = 0;
+ $got_sigterm = 0;
+ # kill_sleep_seq expects pids, not process groups
+ kill_sleep_seq( map { -$_ } @pgrps);
+ last;
+ }
+ sleep(1);
+ }
+ # $pid might be zero because a signal was received while waiting
+ if ($pid) {
+ print "pid $pid exited\n";
+ # Remove process group leaders one by one.
+ @pgrps = grep { $_ != -$pid } @pgrps;
+ }
+ # Re-count the children that are still alive after
+ # a recursive invocation of kill_sleep_seq
+ @pgrps = grep { kill( 0, $_) } @pgrps;
+ }
}
sub kill_sleep {
---
Please help review and merge it once it meets your quality standards.
Thanks,
-Vladimir
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Improper handling of non-default termination sequences (--termseq),
Vladimir Oltean <=