I have about 50 systems running monit and reporting to an M/Monit server. The config files are identical on all of them, although the Linux versions are not. I am seeing a number of inconsistencies between the systems. Many of them have problems with ntpd:
check process date-time with pidfile /var/run/ntpd.pid
    start program = "/sbin/service ntpd start"
    stop program = "/sbin/service ntpd stop"
    # if failed host pool.ntp.org port 123 type udp for 2 times within 3 cycles then alert
    if 2 restarts within 3 cycles then timeout
These systems are rebooted every night.
Most of the systems are OK. However, a number of them, across all versions of Linux, keep deciding that ntpd is not running and restarting it, sometimes to the point of unmonitoring it (even though it is still running when I log on to the system in question to check). Looking at the events, I see that monit has restarted ntpd every once in a while, 3 or 4 times at seemingly arbitrary points. Before I installed monit, ntpd never stopped on its own to my knowledge, so it appears that monit itself is doing the stop/restart.
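To pin down what the pidfile looks like at the moment monit flags ntpd, something like the following could be run from cron or by hand. It is only a rough sketch and not anything monit provides: the function name pidfile_status is made up for illustration, and the default path is simply the pidfile from the check above. It reports whether the pidfile is missing, unparsable, stale, or points at a live process.

```python
import os


def pidfile_status(pidfile="/var/run/ntpd.pid"):
    """Return (state, pid); state is 'missing', 'invalid', 'stale' or 'running'.

    Hypothetical helper for illustration, not part of monit.
    """
    try:
        with open(pidfile) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return ("missing", None)

    try:
        pid = int(text)
    except ValueError:
        return ("invalid", None)
    if pid <= 0:
        return ("invalid", None)

    try:
        # Signal 0 performs the existence/permission check without
        # actually delivering a signal to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return ("stale", pid)
    except PermissionError:
        # The process exists but belongs to another user.
        return ("running", pid)
    return ("running", pid)


if __name__ == "__main__":
    print(pidfile_status())
```

If this ever reports "missing" or "stale" shortly after the nightly reboot, that would at least suggest a timing problem with the pidfile rather than with ntpd itself.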
Any ideas on what could be causing this? Why would monit think it's stopped when it's not? The PID file contains the correct PID, --