Re: Alternate termination sequence option --term-seq

From:	Ole Tange
Subject:	Re: Alternate termination sequence option --term-seq
Date:	Tue, 26 May 2015 00:29:28 +0200

On Thu, Apr 30, 2015 at 12:47 PM, Rasmus Villemoes
<rv@rasmusvillemoes.dk> wrote:
> On Wed, Apr 29 2015, Ole Tange <ole@tange.dk> wrote:
>
>> On Wed, Apr 29, 2015 at 2:07 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> 
>> wrote:
>>> On Wed, Apr 29 2015, Ole Tange <ole@tange.dk> wrote:
>>>
>>>> This still has the risk of killing an innocent PID and its children.
>>>
>>> Killing (in the sense of sending any signal whatsoever) an
>>> innocent/unrelated PID is completely unacceptable, IMO. On a reasonably
>>> busy system, PID reuse within 10 seconds is far from unlikely.

I have tried giving each process its own process group. It fails for
two reasons:

  * open3 does not support it
  * giving a process its own process group also takes it out of the
    terminal's foreground process group, and thus Ctrl-C does not kill
    the children and it breaks --tty. That means Ctrl-C will have to
    be propagated by Parallel.

The git code now does:

    my @family_pids = family_pids(keys %Global::running);
    my @pids = @family_pids;
    my @term_seq = ("TERM",200,"TERM",200,"KILL",200);
    while(@term_seq) {
        @pids = kill_sleep(shift @term_seq, shift @term_seq, @pids);
    }

sub kill_sleep {
    # Send $signal to @pids, then wait up to $sleep_max (same unit as
    # the argument of ::usleep) for them to die.
    # Returns the pids that are still alive.
    my ($signal, $sleep_max, @pids) = @_;
    kill $signal, @pids;
    my $sleepsum = 0;
    my $sleep = 0.001;
    for (; kill(0, @pids) and $sleepsum < $sleep_max;
         $sleepsum += $sleep) {
        if(waitpid(-1, &WNOHANG) > 0) {
            # Remove (grand)*children that are dead
            # Problematic re-use can happen here
            @pids = grep { kill( 0, $_) } @pids;
            $sleep = $sleep/2+0.001;
        } else {
            # Nothing to reap yet: back off a little and sleep
            $sleep *= 1.1;
            ::usleep($sleep);
        }
    }
    @pids = grep { kill( 0, $_) } @pids;
    return @pids;
}

So new pids will never be added: Only the original family will be
killed. In practice that works really well.
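
For experimenting with the termination sequence outside GNU Parallel, here is a minimal standalone sketch. The names terminate and kill_and_wait are hypothetical, the waits are assumed to be in milliseconds, and the fixed 10 ms polling interval is a simplification of the adaptive back-off above. It only ever shrinks the PID list it was given; it never adds to it.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(usleep);

# Send $signal to @pids, then wait up to $max_ms milliseconds for them
# to exit. Returns the PIDs that still exist afterwards.
sub kill_and_wait {
    my ($signal, $max_ms, @pids) = @_;
    kill $signal, @pids;
    my $waited_ms = 0;
    while (@pids and $waited_ms < $max_ms) {
        # Reap our own dead children so they stop existing as zombies.
        1 while waitpid(-1, WNOHANG) > 0;
        # Keep only PIDs that still exist. PID reuse is still possible here.
        @pids = grep { kill 0, $_ } @pids;
        last unless @pids;
        usleep(10_000);    # poll every 10 ms (assumed interval)
        $waited_ms += 10;
    }
    return @pids;
}

# Walk a sequence such as ["TERM", 200, "TERM", 200, "KILL", 200].
# Returns the PIDs that survived the whole sequence.
sub terminate {
    my ($term_seq, @pids) = @_;
    my @seq = @$term_seq;
    while (@seq and @pids) {
        my ($signal, $wait_ms) = splice(@seq, 0, 2);
        @pids = kill_and_wait($signal, $wait_ms, @pids);
    }
    return @pids;
}
```

Calling terminate(["TERM",200,"TERM",200,"KILL",200], @family_pids) returns an empty list when everything in the original family is gone.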

But it is not bulletproof.

I have set up a system with only 1000 PIDs:

echo 1000 | sudo tee /proc/sys/kernel/pid_max

Then I started a few processes that respawn until their PID is > 900,
then wait a second before respawning, and that complain if they are
'kill -TERM'-ed:

sleep 0.$RANDOM; perl -e '$SIG{TERM}= sub { qx{echo `date` $$
>>termlog }; open A,">",$ARGV[0]; print A "TERM $$\n"; }; while(not
fork) { if(900 < $$ and $$ < 1000) { sleep 1 } }' `tty`

They make it likely that new processes will have a PID > 900, thus
drastically increasing the chance of reusing a PID.

Finally I ran jobs that would trigger the killing:

seq 100 | parallel --halt -2 -j1 -N0 "seq 300 | ./parallel -Dkill
--delay 0.01 -j0 --halt 2 'sleep 1{} & sleep 1 & sleep 1.\$RANDOM
&sleep 1;false';cat termlog"

After some minutes a process is wrongly killed.

As far as I can tell it is because the family_pids are dynamic, and we
cannot tell for sure whether a PID has been reused: reuse can happen
milliseconds after family_pids have been computed. On a normal system
the chances are extremely slim, but as my test system shows they are
non-zero.
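
One way to narrow (not close) that window on Linux is to remember each PID's start time and re-check it right before signalling. The sketch below is an assumption of mine, not what the git code does: it reads field 22 (starttime) of /proc/<pid>/stat, and the names proc_starttime, snapshot and kill_if_same are hypothetical. A PID can still be recycled between the check and the kill.

```perl
use strict;
use warnings;

# Return the start time (clock ticks since boot) of $pid, taken from
# field 22 of /proc/<pid>/stat, or undef if it cannot be read.
sub proc_starttime {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/stat") or return undef;
    my $line = <$fh>;
    close $fh;
    return undef unless defined $line;
    # The command name (field 2) may contain spaces and parentheses,
    # so only parse what follows the last ')'.
    $line =~ /^.*\)\s+(.*)$/s or return undef;
    my @fields = split ' ', $1;    # $fields[0] is field 3 (state)
    return $fields[19];            # field 22: starttime
}

# Record pid => starttime for the PIDs we intend to signal later.
sub snapshot {
    my %seen;
    for my $pid (@_) {
        my $start = proc_starttime($pid);
        $seen{$pid} = $start if defined $start;
    }
    return %seen;
}

# Signal only the PIDs whose start time still matches the snapshot.
# Returns the number of processes signalled.
sub kill_if_same {
    my ($signal, %seen) = @_;
    my @same = grep {
        my $now = proc_starttime($_);
        defined $now and $now == $seen{$_}
    } keys %seen;
    return kill $signal, @same;
}
```

A reused PID will normally show a different start time, so it is skipped rather than signalled.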

All in all this sucks.

I believe Rasmus' idea of process groups is the only safe way forward,
but it is harder for me to see how that can be implemented: wrapping
it in a Perl one-liner will require GNU Parallel to forward signals
(which signals?) to the process groups. And if --tty is requested,
this will have to be disabled.
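
To make the process-group idea concrete, here is a rough sketch using plain fork/exec instead of open3. The names spawn_in_group and signal_group are hypothetical, and it does not answer which signals to forward or how to handle --tty.

```perl
use strict;
use warnings;
use POSIX qw(setpgid);

# Start @cmd in its own process group and return the group ID, which
# equals the PID of the child that leads the group.
sub spawn_in_group {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        setpgid(0, 0) or die "setpgid failed: $!";
        exec @cmd or die "exec failed: $!";
    }
    # Also set it from the parent, so the group exists by the time we
    # return, whichever side runs first. This fails harmlessly if the
    # child has already called exec.
    setpgid($pid, $pid);
    return $pid;
}

# Send $signal to every process in the group. A negative signal name
# makes Perl's kill address a process group instead of a single PID.
sub signal_group {
    my ($signal, $pgid) = @_;
    return kill "-$signal", $pgid;
}
```

The group ID stays reserved for as long as any member of the group is alive, so signalling the group cannot hit an unrelated process while one of the job's (grand)children is still running. Forwarding Ctrl-C would then be a handler along the lines of $SIG{INT} = sub { signal_group("INT", $pgid) }.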

Ideas and especially patches are welcome.

/Ole
