
Re: Monit not detecting service failure, reports service is up when it i

From: Martin Pala
Subject: Re: Monit not detecting service failure, reports service is up when it is not.
Date: Sun, 17 Apr 2011 12:06:26 +0200

There are a few possible problems:

1.) Monit runs its tests in cycles, and the interval between cycles is set by "set daemon <seconds>" in monitrc. If your interval is long (say, 5 minutes), Monit won't detect the problem until the next cycle and may meanwhile show the process as running in the GUI (this information is cached until the next test cycle). A short interval such as 5 seconds gives a quicker reaction to problems.

2.) The pidfile-based test checks whether a process with the PID given in the pidfile is running. You can use the match-based check to make sure that the specified process is running, independent of the pidfile (if you want to use this check, use Monit 5.2.5 or newer):
check process apache matching "/usr/sbin/httpd"

3.) For Monit < 5.2, the displayed process uptime was based on the timestamp of the pidfile. Monit 5.2 and newer show the real uptime (based on process table information).

=> I suggest upgrading Monit to 5.2.5 and using the match-based process check if the pidfile-based check is not reliable in your environment.
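
As a rough illustration, the two settings discussed above might be combined in monitrc as follows. This is only a sketch: the binary path is taken from the example in point 2, while the start/stop script paths are placeholders assumed for illustration and will differ between systems.

```
# Run a test cycle every 5 seconds so failures are noticed quickly
set daemon 5

# Look the process up in the process table by pattern instead of
# trusting a pidfile (needs Monit 5.2.5 or newer)
check process apache matching "/usr/sbin/httpd"
    start program = "/etc/init.d/httpd start"   # assumed path
    stop program  = "/etc/init.d/httpd stop"    # assumed path
```

After changing monitrc, "monit -t" can be used to check the syntax before reloading.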


On Apr 16, 2011, at 7:43 PM, Eduardo Gutierrez wrote:

I am experiencing the following strange circumstances:

- I visit the monit service manager and it reports that all my services are running
- The service is clearly not running.  When I try to use it, it fails in a manner that indicates that it is not running (i.e. it is unavailable), and when I run "ps -aef | grep <service name>" nothing shows up.  Also, if I scan through the ps -aef list myself, I see nothing resembling the service.
- I also searched for the process ID stored in the pid file using "ps -aef | grep <pid>" and I got nothing.  In the past I have seen monit think that the service is running if that pid file, having been left there by the previous instance of the service, contains the pid of another unrelated service that is currently running.  In other words, monit *seems* to dumbly check to see if the pid indicated in the pid file matches any currently running process in deciding that its monitored service is running.  But that is not happening in this case.  What *is* happening here?  How is it that monit decides whether or not the monitored service is actually running?
- The strangest thing is that the monit service manager is reporting that the service has been up for 1 day 10 hours and 18 minutes, whereas the other monitored service has been up for 10 hours and 18 minutes.  The server is itself reporting an uptime of 10 hours and 18 minutes, so the amount of time that the service manager is reporting is completely wrong.

Any thoughts?