Re: checking ntpd

On Sun, Apr 17, 2011 at 5:49 AM, Martin Pala <address@hidden> wrote:

Hi,

please can you send the monit log? It will show the reason why ntpd was restarted - whether the process died or the protocol test failed.

The reason for the repeated restarts could be how ntpd behaves when the time difference is large (which may happen if the system was booted and the time was not set). If ntpd is started and the time difference is bigger than 1000 s, ntpd usually exits. If monit is set to restart it, ntpd will be started again, but it will also exit again. In such a case it is necessary to step the time, for example using ntpdate.
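
As a rough illustration of the condition described above, here is a minimal Python sketch. It assumes the 1000 s limit mentioned here is the only criterion; the constant and function names are hypothetical and are not part of ntpd or monit.

```python
# Assumption: the 1000 s limit from the message above is the only exit criterion.
PANIC_THRESHOLD_S = 1000.0


def ntpd_would_exit(offset_s: float, threshold_s: float = PANIC_THRESHOLD_S) -> bool:
    """Return True if the clock offset (in seconds, either sign) exceeds the limit."""
    return abs(offset_s) > threshold_s


if __name__ == "__main__":
    # A clock that is one hour off would need to be stepped (e.g. with ntpdate)
    # before ntpd is started; a clock that is half a second off would not.
    for offset in (3600.0, -0.5):
        print(offset, "step first" if ntpd_would_exit(offset) else "start ntpd")
```

When the function returns True, restarting ntpd alone cannot help, which is why the time has to be stepped first.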

It would be better to modify the configuration this way:

--8<--
check process date-time with pidfile /var/run/ntpd.pid
        start program = "/bin/bash -c '/usr/sbin/ntpdate -s pool.ntp.org && /sbin/service ntpd start'"
        stop program = "/sbin/service ntpd stop"
        if failed host 127.0.0.1 port 123 type udp protocol ntp3 for 2 times within 3 cycles then restart
        if 2 restarts within 3 cycles then timeout

check host ntp_peer with address pool.ntp.org
        if failed port 123 type udp protocol ntp3 for 2 times within 3 cycles then alert

--8<--

=> the start program is modified to set the time using ntpdate before ntpd is started.

The "protocol ntp3" option is added. This is highly recommended, especially for UDP tests, because UDP is connectionless. It speeds up the test because monit knows what the server should return. A generic UDP test (without a protocol specification) is tricky, as the only way to check that the packet arrived at its destination is that no network error was indicated by ICMP.
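
To show why a protocol-aware test is more reliable than a bare UDP test, here is a minimal Python sketch of the idea. It is not monit's implementation; the function names, the 5 second timeout and the acceptance rule (a reply of at least 48 bytes with mode 4, "server") are assumptions made for illustration.

```python
import socket

NTP_PACKET_LEN = 48  # size of a basic NTP header without extensions


def build_ntp3_request() -> bytes:
    """Build a minimal NTP version 3 client request."""
    # First byte: leap indicator (2 bits) = 0, version (3 bits) = 3, mode (3 bits) = 3 (client).
    first_byte = (0 << 6) | (3 << 3) | 3
    return bytes([first_byte]) + bytes(NTP_PACKET_LEN - 1)


def looks_like_server_reply(data: bytes) -> bool:
    """Accept a reply that is long enough and carries mode 4 (server)."""
    if len(data) < NTP_PACKET_LEN:
        return False
    return (data[0] & 0x07) == 4


def check_ntp(host: str, port: int = 123, timeout: float = 5.0) -> bool:
    """Send one request and report whether a plausible NTP reply came back."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_ntp3_request(), (host, port))
            data, _ = sock.recvfrom(512)
    except OSError:
        # Covers timeouts, name resolution failures and ICMP-reported errors.
        return False
    return looks_like_server_reply(data)
```

Calling check_ntp("127.0.0.1") would correspond roughly to the local test in the configuration above. A generic UDP test has no reply to wait for, so silence cannot be told apart from success.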

Regards,
Martin

On Apr 16, 2011, at 8:21 PM, Mike Schmidt wrote:

Hi,

I have about 50 systems running monit and reporting to an M/Monit server. The config files are the same on all of them, although the versions of Linux are not necessarily the same. I am seeing a number of inconsistencies between the different systems. Many of these have problems with ntpd:

check process date-time with pidfile /var/run/ntpd.pid
        start program = "/sbin/service ntpd start"
        stop program = "/sbin/service ntpd stop"
#       if failed host pool.ntp.org port 123 type udp for 2 times within 3 cycles then alert
        if 2 restarts within 3 cycles then timeout

These systems are rebooted every night.

Most of the systems are OK. However, a number of them, across all versions of Linux, keep thinking ntpd is not running and restarting it, sometimes to the point of unmonitoring it (even though it is still running when I log on to the system in question to check). Looking at the events, I see that monit has restarted ntpd every once in a while, 3 or 4 times, seemingly arbitrarily. Before I installed monit, ntpd never stopped on its own to my knowledge. So monit is doing the stop/restart.

Any ideas on what could be causing this? Why would monit think it's stopped when it's not? The pid file contains the correct pid.
--
Mike SCHMIDT
CTO
Intello Technologies Inc.
address@hidden

Canada: 1-888-404-6261 x320
USA: 1-888-404-6268 x320
www.intello.com

--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general

From:	Mike Schmidt
Subject:	Re: checking ntpd
Date:	Sun, 17 Apr 2011 11:03:09 -0400