Hi Jan-Henrik,
I went ahead and created a sample script to make sure this actually works, and I can confirm that it does with that simple script. As the logs show, the issue appears to be the result of a double notification. The script took so long that monit killed it, but the timeout was exactly equal to the time of the next occurrence:
[EDT Jul 13 03:15:51] error : 'myscript' program timed out after 7230 seconds. Killing program with pid 4407
[EDT Jul 13 03:15:51] error : 'myscript' Sun Microsystems Inc. SunOS 5.10 Generic January 2005
You have new mail.
The first is a real error, but from the myscript logs I can see that at 03:15 it did start and was running correctly until it suddenly stopped, presumably because monit killed it. So my best guess at this point is:
1. Monit receives the timeout notification for the previous myscript run at the same time as the events for the current run.
2. Monit kills both instances.
3. Monit alerts on both the timeout and the killed process; however, for the latter there is nothing on stderr, so monit falls back to stdout.
Clearly I have a workaround, which is setting a timeout shorter than the script's run cycle (2 hours in this script's case).
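For illustration, a minimal monit control-file sketch of that workaround. The script path, the 60-second daemon poll cycle and the 6000-second timeout are assumptions for the example, not my actual settings; the only point is that the timeout stays below the 2-hour (7200 s) run cycle, so a long run is killed before the next one is due.

```
# Assumption: 60-second poll cycle, so 120 cycles = 2 hours
set daemon 60

# Hypothetical path; timeout (6000 s) kept below the 7200 s run cycle
check program myscript with path "/usr/local/bin/myscript" with timeout 6000 seconds
    every 120 cycles
    if status != 0 then alert
```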
On a side note/question: I noticed that monit switches to "waiting" for the next occurrence of the script instead of staying in a failed status. I would like to run 'monit summary' and know whether the script failed last time or not (rather than relying solely on an alert). Is this a feature that could be considered? You can see this easily by scheduling a simple bash script and forcing it to exit with status 1, for example.
Thanks!
- Nestor