monit process restart problem - simultaneous stop/start race
From: Brano Zarnovican
Subject: monit process restart problem - simultaneous stop/start race
Date: Thu, 27 Sep 2012 16:53:54 +0200
Hi,
when I restart the service manually via its init script (service foo restart),
it works every time.
When I try the same with monit (monit restart foo), it ends up in
"Execution failed" most of the time.
Root cause:
On a restart action, monit forks and executes the start program as soon
as the monitored process disappears, regardless of whether the stop
program has finished or is still running. As a result, the end of the
stop program's execution partially overlaps the beginning of the start program's.
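A typical init script:
start() {
    start service &              # launch the daemon in the background (placeholder command)
    echo $! > /var/run/foo.pid   # record the daemon's pid
}
stop() {
    kill `cat /var/run/foo.pid`  # signal the daemon to terminate
    rm -f /var/run/foo.pid       # remove the pid file afterwards
}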
State #1: process 'foo' is running with pid 100, pid file exists

monit restart foo
  stop:  kill `cat /var/run/foo.pid`
  start: start service &
  start: echo $! > /var/run/foo.pid
  stop:  rm -f /var/run/foo.pid

State #2: process 'foo' is running with pid 200, pid file is missing

(later, monit attempts to start the process, which it considers to be down)
  start: start service &
  start: echo $! > /var/run/foo.pid
Depending on how robust your scripts are, you end up with either
State #3a: process 'foo' is running with pid 200, and the pid file contains
300 (the pid of a second process that failed to start)
or
State #3b: process 'foo' is running with pid 200, and the pid file is still missing.
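The interleaving above can be reproduced on a single machine with a small script. This is a minimal sketch under stated assumptions, not monit itself: "sleep 1000" stands in for the daemon, the pid file lives in a temporary directory instead of /var/run, the "sleep 1" inside stop() is a hypothetical slow cleanup between kill and rm, and "wait" on the old pid plays the role of monit noticing that the process is gone (it only works because the stand-in daemon is a child of the script).

```shell
#!/bin/sh
# Reproduces State #2: new process running, pid file missing.
# Assumptions (hypothetical): "sleep 1000" is the daemon, temp-dir pid file,
# and "sleep 1" in stop() is slow work between kill and rm.
PIDFILE="$(mktemp -d)/foo.pid"

start() {
    sleep 1000 >/dev/null 2>&1 &      # stand-in daemon, detached from stdout
    echo $! > "$PIDFILE"
}

stop() {
    kill "$(cat "$PIDFILE")"
    sleep 1                           # hypothetical slow cleanup
    rm -f "$PIDFILE"
}

start                                 # State #1: pid file exists
old_pid=$(cat "$PIDFILE")

stop &                                # the stop program runs in the background...
stop_job=$!
wait "$old_pid" 2>/dev/null           # ...but we only wait for the old process to die
start                                 # so start overlaps the tail of stop
new_pid=$(cat "$PIDFILE")

wait "$stop_job"                      # stop's rm now deletes the NEW pid file

if [ ! -f "$PIDFILE" ] && kill -0 "$new_pid" 2>/dev/null; then
    state="2"                         # new process running, pid file missing
else
    state="ok"
fi
echo "state: $state (new pid $new_pid)"
kill "$new_pid" 2>/dev/null           # clean up the stand-in daemon
```

Forcing a one-second gap inside stop() makes the overlap deterministic; with a real script the window is much shorter, which matches the "most of the time" failure rate rather than "every time".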
The workaround is to insert a few sleeps here and there (the best place is
before startup). Alternatively, save the timestamp of the pid file before
killing the process and check whether it has changed just before the 'rm'.
Or simply don't delete the pid file at all.
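The "check before rm" workaround can be sketched in the init script itself. Note that this swaps in a slightly different check than the one described: it compares the pid file's contents with the pid that stop() actually killed, rather than comparing timestamps. The temp-dir pid file, the "sleep 1000" stand-in daemon, and the slow "sleep 1" cleanup are hypothetical, as in the reproduction of the race.

```shell
#!/bin/sh
# Variant of the "check before rm" workaround (assumption: compare contents,
# not timestamps). Stand-in daemon and temp-dir pid file are hypothetical.
PIDFILE="$(mktemp -d)/foo.pid"

start() {
    sleep 1000 >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

stop() {
    stopped_pid=$(cat "$PIDFILE")
    kill "$stopped_pid"
    sleep 1                                   # slow cleanup, as in the race
    # Only remove the pid file if no concurrent start has rewritten it.
    if [ "$(cat "$PIDFILE" 2>/dev/null)" = "$stopped_pid" ]; then
        rm -f "$PIDFILE"
    fi
}

# Replay the racy monit sequence: start as soon as the old pid is gone.
start
old_pid=$(cat "$PIDFILE")
stop &
stop_job=$!
wait "$old_pid" 2>/dev/null
start
new_pid=$(cat "$PIDFILE")
wait "$stop_job"

if [ "$(cat "$PIDFILE" 2>/dev/null)" = "$new_pid" ]; then
    result="pid file kept"
else
    result="pid file lost"
fi
echo "$result"
kill "$new_pid" 2>/dev/null
```

Like the timestamp check, this only narrows the race: a start that rewrites the pid file between the comparison and the rm still loses it.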
The root of the problem is that some code may run after the process has
stopped that simply must not overlap with start. The pid file is just one
example: imagine that stop deletes a temporary or persistent state file
that is also created during startup.
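Until monit changes, one script-side way to keep any post-stop cleanup from overlapping with start is to serialize the two actions with a lock. This is a swapped-in technique, not something proposed in this post: both start() and stop() take the same exclusive lock via flock(1) from util-linux (assumed to be installed), so a start issued while stop is still running blocks until stop has finished. The lock and pid file paths and the "sleep 1000" stand-in daemon are hypothetical.

```shell
#!/bin/sh
# Serialize start/stop with an exclusive flock(1) lock (assumption: util-linux
# flock is available). Paths and the stand-in daemon are hypothetical.
DIR=$(mktemp -d)
PIDFILE="$DIR/foo.pid"
LOCKFILE="$DIR/foo.lock"

start() {
    {
        flock -x 9                        # block while a stop holds the lock
        # 9>&- keeps the lock fd out of the daemon, or it would hold the lock forever
        sleep 1000 9>&- >/dev/null 2>&1 &
        echo $! > "$PIDFILE"
    } 9>"$LOCKFILE"
}

stop() {
    {
        flock -x 9
        kill "$(cat "$PIDFILE")"
        sleep 1                           # slow cleanup that must not overlap start
        rm -f "$PIDFILE"
    } 9>"$LOCKFILE"
}

# Same monit-like sequence as in the race: start as soon as the old pid is gone.
start
old_pid=$(cat "$PIDFILE")
stop &
stop_job=$!
wait "$old_pid" 2>/dev/null
start                                     # waits until stop has released the lock
new_pid=$(cat "$PIDFILE")
wait "$stop_job"

if [ "$(cat "$PIDFILE" 2>/dev/null)" = "$new_pid" ]; then
    result="pid file kept"
else
    result="pid file lost"
fi
echo "$result"
kill "$new_pid" 2>/dev/null
```

Closing fd 9 for the daemon matters: flock locks belong to the open file description, so a daemon that inherited it would keep every later start or stop blocked.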
Suggested solution:
Introduce an option that makes monit wait for the stop program to finish
instead of only waiting for the process to terminate; more precisely, wait
for whichever of the two events happens later. Only then should monit call
the start program.
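The suggested ordering can be illustrated with the unmodified init functions. This is a sketch of the idea, not monit code: restart() forks the stop program as monit does, then waits for both events before running start. Here "wait" on the old pid stands in for monit's process check and only works because the hypothetical "sleep 1000" daemon is a child of the script; monit would poll the pid instead. The temp-dir pid file and the slow "sleep 1" cleanup are also assumptions.

```shell
#!/bin/sh
# Proposed restart order: start only after BOTH the stop program has exited
# AND the old process is gone. Illustration only; names are hypothetical.
PIDFILE="$(mktemp -d)/foo.pid"

start() {
    sleep 1000 >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

stop() {
    kill "$(cat "$PIDFILE")"
    sleep 1                               # slow cleanup
    rm -f "$PIDFILE"
}

restart() {
    old_pid=$(cat "$PIDFILE")
    stop &                                # fork the stop program, as monit does
    stop_job=$!
    wait "$old_pid" 2>/dev/null           # event 1: old process terminated
    wait "$stop_job"                      # event 2: stop program finished
    start                                 # only now run the start program
}

start
restart
new_pid=$(cat "$PIDFILE" 2>/dev/null)
if [ -n "$new_pid" ] && kill -0 "$new_pid" 2>/dev/null; then
    result="pid file kept"
else
    result="pid file lost"
fi
echo "$result"
kill "$new_pid" 2>/dev/null
```

Because both waits must complete, the order in which the two events occur does not matter; start always runs after the later one.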
Regards,
BranoZ