Re: Alternate termination sequence option --term-seq
From: Ole Tange
Subject: Re: Alternate termination sequence option --term-seq
Date: Tue, 26 May 2015 00:29:28 +0200
On Thu, Apr 30, 2015 at 12:47 PM, Rasmus Villemoes
<rv@rasmusvillemoes.dk> wrote:
> On Wed, Apr 29 2015, Ole Tange <ole@tange.dk> wrote:
>
>> On Wed, Apr 29, 2015 at 2:07 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk>
>> wrote:
>>> On Wed, Apr 29 2015, Ole Tange <ole@tange.dk> wrote:
>>>
>>>> This still has the risk of killing an innocent PID and its children.
>>>
>>> Killing (in the sense of sending any signal whatsoever) an
>>> innocent/unrelated PID is completely unacceptable, IMO. On a reasonably
>>> busy system, PID reuse within 10 seconds is far from unlikely.
I have tried making each process its own process group. It fails for
two reasons:
* open3 does not support it
* a process in its own process group is no longer in the terminal's
  foreground process group, so Ctrl-C does not kill the children and
  it breaks --tty. That means Ctrl-C will have to be propagated by
  Parallel.
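For reference, this is roughly what the process group approach looks like with plain fork/exec instead of open3. It is only a minimal sketch: spawn_in_group and kill_group are hypothetical helper names, not code from GNU Parallel, and it ignores the terminal/--tty problem described above.

```perl
use strict;
use warnings;

# Hypothetical helper: start @cmd as the leader of a new process group.
sub spawn_in_group {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        setpgrp(0, 0);              # child: new group, pgid == own pid
        exec(@cmd) or die "exec failed: $!";
    }
    setpgrp($pid, $pid);            # parent sets it too, in case it runs first
    return $pid;                    # the pid doubles as the pgid
}

# Hypothetical helper: a negative signal name makes Perl's kill()
# address the whole process group instead of a single pid.
sub kill_group {
    my ($signal, $pgid) = @_;
    return kill "-$signal", $pgid;
}

my $pgid = spawn_in_group('sh', '-c', 'sleep 30 & sleep 30 & wait');
select(undef, undef, undef, 0.2);   # give sh time to start its children
kill_group('TERM', $pgid);
waitpid($pgid, 0);
printf "group leader ended by signal %d\n", $? & 127;
```

The point of the design is that the signal goes to whatever is in the group at that moment, so no list of (grand)children PIDs has to be computed beforehand.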
The git code now does:
    my @family_pids = family_pids(keys %Global::running);
    my @pids = @family_pids;
    # (signal, max wait) pairs, tried in order
    my @term_seq = ("TERM",200,"TERM",200,"KILL",200);
    while(@term_seq) {
        @pids = kill_sleep(shift @term_seq, shift @term_seq, @pids);
    }

    sub kill_sleep {
        # Send $signal to @pids and wait up to $sleep_max for them to die.
        # Returns the pids that are still alive.
        my ($signal, $sleep_max, @pids) = @_;
        kill $signal, @pids;
        my $sleepsum = 0;
        my $sleep = 0.001;
        for (; kill(0, @pids) and $sleepsum < $sleep_max;
             $sleepsum += $sleep) {
            if(waitpid(-1, &WNOHANG) > 0) {
                # Remove (grand)*children that are dead
                # Problematic re-use can happen here
                @pids = grep { kill( 0, $_) } @pids;
                $sleep = $sleep/2+0.001;
            } else {
                $sleep *= 1.1;
                ::usleep($sleep);
            }
        }
        @pids = grep { kill( 0, $_) } @pids;
        return @pids;
    }
So new pids will never be added: only the original family will be
killed. In practice that works really well.
But it is not bulletproof.
I have set up a system with only 1000 pids:
echo 1000 | sudo tee /proc/sys/kernel/pid_max
Then I have started a few processes that respawn until their PID is >
900, then wait a second before respawning, and that complain if
they are 'kill -TERM'-ed:
    sleep 0.$RANDOM; perl -e '$SIG{TERM} = sub {
        qx{echo `date` $$ >>termlog};
        open A,">",$ARGV[0]; print A "TERM $$\n"; };
      while(not fork) { if(900 < $$ and $$ < 1000) { sleep 1 } }' `tty`
They make it likely that new processes will have PID > 900, thus
drastically increasing the chance of reusing a pid.
Finally I ran jobs that would trigger the killing:
    seq 100 | parallel --halt -2 -j1 -N0 "seq 300 | ./parallel -Dkill \
      --delay 0.01 -j0 --halt 2 'sleep 1{} & sleep 1 & sleep 1.\$RANDOM \
      & sleep 1;false';cat termlog"
After some minutes a process is wrongly killed.
As far as I can tell it is because the family_pids are dynamic, and we
cannot tell for sure if a pid has been reused: it can happen
milliseconds after family_pids have been computed. On a normal system
the chances are extremely slim, but as my test system shows they are
non-zero.
All in all this sucks.
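To make the race concrete, here is a minimal sketch of one way to narrow it: remember each pid's kernel start time when the family is computed and refuse to signal a pid whose start time has changed. This is an assumption of mine and not what the git code does; it is Linux-only (it reads /proc/PID/stat), the helper names are hypothetical, and it does not close the window, because the pid can still be recycled between the check and the kill.

```perl
use strict;
use warnings;

# Return the start time of $pid (field 22 of /proc/PID/stat, in clock
# ticks since boot), or undef if the process is gone or /proc is missing.
sub proc_start_time {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/stat") or return undef;
    my $line = <$fh>;
    close $fh;
    return undef unless defined $line;
    # The command name (field 2) may contain spaces and parentheses,
    # so only split what follows the last ')'.
    my @f = split ' ', substr($line, rindex($line, ')') + 1);
    return $f[19];                  # field 3 is index 0, so field 22 is index 19
}

# Record pid => start time for every pid that is currently alive.
sub snapshot {
    my %seen;
    for my $pid (@_) {
        my $t = proc_start_time($pid);
        $seen{$pid} = $t if defined $t;
    }
    return \%seen;
}

# Signal only pids whose start time still matches the snapshot.
# Returns the pids that were signalled.
sub kill_if_unchanged {
    my ($signal, $snap) = @_;
    my @signalled;
    for my $pid (keys %$snap) {
        my $now = proc_start_time($pid);
        next unless defined $now && $now == $snap->{$pid};
        # Still racy: the pid can be reused between the check and the kill.
        kill $signal, $pid;
        push @signalled, $pid;
    }
    return @signalled;
}
```

A reused pid will normally have a later start time and be skipped, but the check and the kill are still two separate steps.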
I believe Rasmus' idea of process groups is the only safe way forward,
but it is hard for me to see how that can be implemented: wrapping
each job in a perl oneliner will require GNU Parallel to forward signals
(which signals?) to the process groups. And if --tty is requested,
this will have to be disabled.
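A rough sketch of what such a wrapper could look like, written out as a sub instead of a oneliner. The name run_in_group and the list of forwarded signals are assumptions for illustration; which signals to forward is exactly the open question, and nothing here deals with --tty.

```perl
use strict;
use warnings;

# Assumption: an example list only; the right set of signals is undecided.
my @FORWARD = qw(TERM INT HUP QUIT);

# Run @cmd in its own process group, forward the signals in @FORWARD
# to that group, and return the wait status of the command.
sub run_in_group {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        setpgrp(0, 0);              # child becomes leader of a new group
        exec(@cmd) or die "exec failed: $!";
    }
    setpgrp($pid, $pid);            # also set from the parent, in case it runs first
    for my $sig (@FORWARD) {
        # Negative signal name: deliver to the whole group
        $SIG{$sig} = sub { kill "-$sig", $pid };
    }
    my $reaped;
    do {
        $reaped = waitpid($pid, 0); # retry if a forwarded signal interrupts the wait
    } while ($reaped == -1 && $!{EINTR});
    $SIG{$_} = 'DEFAULT' for @FORWARD;
    return $?;
}
```

Called as run_in_group('sleep', '30'), a TERM sent to the wrapper is passed on to the whole group. Note that a signal arriving between the fork and the installation of the handlers is not forwarded in this sketch.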
Ideas and especially patches are welcome.
/Ole