[monit] can't stop FPs

From: Len Conrad
Subject: [monit] can't stop FPs
Date: Wed, 9 Sep 2009 16:17:21 +0200

I have a group of machines on the same subnet running monit, where several of 
the machines "check host" each other.  These machines don't generate FPs 
(false positives) among themselves.

I have another machine at a remote site running monit that runs "check host" 
against each machine in the group, with:

check host hostname-ALIVE with address hostname
 if failed icmp type echo count 5 
   with timeout 1 seconds 3 times within 3 cycles 
   then alert

The average ping time from remote monit to the group is 65 ms.

The remote monit sporadically generates false alerts for only one machine at a 
time in the group (not alerts for the entire group), with the FP moving among 
the group.  When the remote monit sends an alert for machine x, the other 
machines in the group also monitoring machine x send no alert.

I'm talking 2 or 3 FPs/day.  Not a disaster, but does anybody have any 
suggestions for how to kill these FPs?  I could go up to "count 50 with 10 times 
within 10 cycles", but I'd rather understand why the above doesn't suffice.
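
For reference, that heavier variant would look something like this (untested; 
I'm assuming monit accepts a count that high, and only the count and cycle 
numbers differ from the check above):

check host hostname-ALIVE with address hostname
 if failed icmp type echo count 50 
   with timeout 1 seconds 10 times within 10 cycles 
   then alert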


