Yeah, I was looking through the code and saw the call that checks whether the process is running before issuing the stop (ProcessTree_findProcess), so that was the only thought I had as well.
check process foo matching /usr/local/bin/foo.py
    start program = "/bin/bash -l -c 'nohup /usr/local/bin/foo.py &'" as uid "nobody"
    stop program = "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py" as uid "nobody"
    if uptime > 11 hours then alert
    if uptime > 12 hours then exec "/usr/bin/pkill -u nobody -f -9 /usr/local/bin/foo.py" as uid "nobody"
    if 2 restarts within 3 cycles then timeout
    group apps
    depends foo.py

check process bar matching ^/usr/local/bin/bar
    start program = "/bin/bash -lc 'HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'"
    stop program = "/bin/bash -c '/usr/bin/pkill -f ^/usr/local/bin/bar; sleep 1; /usr/bin/pkill -f ^/usr/local/bin/bar'"
    onreboot nostart
    if uptime > 12 hours then exec "/usr/bin/pkill -9 -f ^/usr/local/bin/bar"
    group apps
    mode passive
Here are the logs from yesterday and today for "bar":
[CST Mar 1 15:15:01] info : 'bar' stop action done
[CST Mar 4 07:02:01] info : 'bar' start on user request
[CST Mar 4 07:02:01] info : 'bar' start action done
[CST Mar 4 07:02:01] error : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 259177 seconds
<we get the above because it failed to shut down on 3/1>
[CST Mar 4 07:02:01] info : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
[CST Mar 4 07:02:21] error : 'bar' process is not running
<the above line repeats every 20 seconds until we manually start it via monit>
[CST Mar 4 07:51:11] info : 'bar' start: '/bin/bash -lc HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'
[CST Mar 4 07:51:11] info : 'bar' start action done
[CST Mar 4 07:51:11] info : 'bar' process is running with pid 4897
[CST Mar 4 07:51:11] info : 'bar' uptime test succeeded [current uptime = 1 seconds]
[CST Mar 4 15:15:01] info : 'bar' stop on user request
[CST Mar 4 15:15:01] info : 'bar' stop action done
<below, the same thing repeats the following morning>
[CST Mar 5 07:02:01] info : 'bar' start on user request
[CST Mar 5 07:02:01] info : 'bar' start action done
[CST Mar 5 07:02:01] error : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 83451 seconds
[CST Mar 5 07:02:01] info : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
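As a sanity check on the two "uptime test failed" lines, the reported uptime can be subtracted from the log timestamp to see when the running process was actually started. This is only a rough sketch: the helper name started_at is made up for illustration, and the year is an assumption because the log lines do not include one (it does not affect the result for these dates).

```python
from datetime import datetime, timedelta

# Assumption: the log lines omit the year. Any year gives the same
# answer here because none of the intervals cross the end of February.
YEAR = 2013


def started_at(log_stamp: str, uptime_seconds: int) -> datetime:
    """Return the process start time implied by a log timestamp and the uptime monit reported."""
    seen = datetime.strptime(f"{YEAR} {log_stamp}", "%Y %b %d %H:%M:%S")
    return seen - timedelta(seconds=uptime_seconds)


# (timestamp of the "uptime test failed" line, uptime in seconds from that line)
failures = [
    ("Mar 4 07:02:01", 259177),
    ("Mar 5 07:02:01", 83451),
]

for stamp, uptime in failures:
    start = started_at(stamp, uptime)
    print(f"{stamp}: process has been up since {start.strftime('%b %d %H:%M:%S')}")
```

The first failure works out to a start time of Mar 1 07:02:24, and the second to Mar 4 07:51:10, which lines up with the manual start logged at 07:51:11 (pid 4897). So in both cases the process that monit sees the next morning is the one started the previous working day, meaning the 15:15:01 stop never actually killed it even though monit logged "stop action done".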
Thanks again for looking. Worst case, I'll just build a debug version of monit with some extra logging to see what is going on.