monit-general

Re: monit process restart problem - simultaneous stop/start race


From: Christopher Johnston
Subject: Re: monit process restart problem - simultaneous stop/start race
Date: Thu, 27 Sep 2012 10:55:55 -0400

I see this happen often as well; sometimes I am forced to restart the monit daemon entirely.

On Thu, Sep 27, 2012 at 10:53 AM, Brano Zarnovican <address@hidden> wrote:
Hi,

When I restart the service manually via its init script (service foo restart),
it works every time.
When I try the same with monit (monit restart foo), it ends up
in "Execution failed" most of the time.

Root cause:
On a restart action, monit forks and executes the start program as soon
as the monitored process disappears, regardless of whether the stop
program has finished or is still running. This leads to a partial
overlap between the end of the stop execution and the beginning of start.

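A typical init script:

start() {
    # "start service" stands for whatever command launches the daemon
    start service &
    # record the PID of the background process
    echo $! > /var/run/foo.pid
}
stop() {
    # signal the daemon, then remove its pid file
    kill `cat /var/run/foo.pid`
    rm -f /var/run/foo.pid
}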


State #1: process 'foo' is running with pid 100, pid file exists
monit restart foo

stop: kill `cat /var/run/foo.pid`
start: start service &
start: echo $! > /var/run/foo.pid
stop: rm -f /var/run/foo.pid

State #2: process 'foo' is running with pid 200, pid file is missing

(Later, monit attempts to start the process, which it considers to be down because the pid file is missing.)
start: start service &
start: echo $! > /var/run/foo.pid

Depending on how robust your scripts are, you end up with either
State #3a: process 'foo' is running with pid 200, pid file contains
300 (the PID of the failed second process)
or
State #3b: process 'foo' is running with pid 200, pid file is still missing
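State #2 can be reproduced outside monit with a small script. This is only a sketch under loudly stated assumptions: "sleep 30" stands in for the daemon, a temporary file stands in for /var/run/foo.pid, a "sleep 1" inside stop() widens the window between kill and rm, and the shell's "wait" stands in for monit noticing that the process has disappeared. None of this is monit's own code.

```shell
#!/bin/sh
# Hypothetical reproduction of the stop/start overlap (not monit code).
PIDFILE=$(mktemp)             # stand-in for /var/run/foo.pid

start() {
    sleep 30 &                # stand-in for "start service &"
    echo $! > "$PIDFILE"
}

stop() {
    kill "$(cat "$PIDFILE")"
    sleep 1                   # assumption: widens the kill -> rm window
    rm -f "$PIDFILE"
}

start                         # State #1: process running, pid file present
old=$(cat "$PIDFILE")

stop &                        # monit forks the stop program...
stop_job=$!
wait "$old" 2>/dev/null || true   # ...reacts once the process is gone
                                  # (status 143 from SIGTERM is expected)
start                         # ...and runs start while stop is still running
new=$(cat "$PIDFILE")
wait "$stop_job"              # stop finishes and removes the NEW pid file

if [ -e "$PIDFILE" ]; then pidfile_state=present; else pidfile_state=missing; fi
if kill -0 "$new" 2>/dev/null; then new_state=running; else new_state=dead; fi
echo "new process $new: $new_state, pid file: $pidfile_state"   # State #2

kill "$new" 2>/dev/null       # clean up the demo process
```

The script takes about one second and should report the new process as running with the pid file missing, matching State #2.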

Workarounds: insert a few sleeps here and there (the best place is
before startup); or save the timestamp of the pid file before killing
and check whether it changed just before the 'rm'; or don't delete the
pid file at all.
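A variant of the "check before rm" workaround can be sketched as below. Instead of the pid file's timestamp, it compares the file's contents with the PID that was killed, and removes the file only if it still names that process. This is a swapped-in technique, not the author's exact suggestion, and PIDFILE defaulting to the path from the example init script is an assumption. It narrows the race but does not close it: a start() that writes the file between the comparison and the rm can still lose its pid file.

```shell
#!/bin/sh
# Sketch: stop() that leaves a pid file alone if start() already rewrote it.
PIDFILE=${PIDFILE:-/var/run/foo.pid}   # assumed path, as in the init script

stop() {
    old_pid=$(cat "$PIDFILE" 2>/dev/null)
    [ -n "$old_pid" ] || return 0      # no pid file (or empty): nothing to stop
    kill "$old_pid" 2>/dev/null
    # A concurrent start() may already have written a new PID here;
    # only remove the file if it still names the process we killed.
    if [ "$(cat "$PIDFILE" 2>/dev/null)" = "$old_pid" ]; then
        rm -f "$PIDFILE"
    fi
}
```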

The root of the problem is that there may be code that runs after the
process has stopped and that simply cannot overlap with start. The pid
file is just one example; imagine stop deleting temporary or persistent
state files that are also created during startup.

Suggested solution:
Introduce an option that makes monit wait for the stop program to
finish, not just for the process to terminate, i.e. for the later of
the two events. Only then would it call the start program.
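The ordering proposed here can be sketched as a shell wrapper. This illustrates the requested semantics only, not anything monit currently does. It assumes start() and stop() from the init script are defined in the same file; restart_after_stop, STOP_TIMEOUT and the default PIDFILE path are hypothetical names chosen for illustration. The wait for the old process is bounded, so a process that never exits makes the function fail instead of hanging.

```shell
#!/bin/sh
# Sketch: start only after BOTH events, (1) the stop program has returned
# and (2) the old process has exited, i.e. the later of the two.
PIDFILE=${PIDFILE:-/var/run/foo.pid}   # assumed path, as in the init script
STOP_TIMEOUT=${STOP_TIMEOUT:-30}       # hypothetical: max seconds to wait

restart_after_stop() {
    old_pid=$(cat "$PIDFILE" 2>/dev/null)
    stop                               # event 1: runs to completion, incl. rm
    waited=0
    # Event 2: poll until the old process no longer exists (bounded).
    while [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; do
        [ "$waited" -lt "$STOP_TIMEOUT" ] || return 1   # give up, don't start
        sleep 1
        waited=$((waited + 1))
    done
    start
}
```

Because stop runs synchronously, nothing it does after the kill (the rm, or cleanup of state files) can overlap with start.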

Regards,

BranoZ

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general

