I have a bunch of CentOS 5.2 servers running Apache, and I've installed monit 4.9 (RPMs from the DAG repository). These servers are heavily loaded most of the day (1-minute load average over 20 for many hours a day). I keep getting the following messages in my mailbox:
** Subject httpd Timeout - httpd unmonitor on XXXXX: 'httpd' service timed out and will not be checked anymore.
** Subject httpd Connection failed - httpd restart on XXXXXX: 'httpd' failed protocol test [HTTP] at INET[WW.WW.WWW.ZZZ:80] via TCP.
** httpd Does not exist - httpd restart on XXXXXX: 'httpd' process is not running.
The last one really puzzles me, because Apache is actually running!
My monit configuration file:
set daemon 180
set logfile syslog facility log_daemon
set mailserver mail.company.net
set mail-format {
    from: address@hidden
    subject: $SERVICE $EVENT
    message: $SERVICE $ACTION on $HOST: $DESCRIPTION.
}
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and

check system XXXXXX
    if loadavg (1min) > 50 then alert
    if loadavg (5min) > 75 then alert
    if memory usage > 90% then alert
    if cpu usage (user) > 99% then alert
    if cpu usage (system) > 99% then alert
    if cpu usage (wait) > 99% then alert
    alert address@hidden

check file apache_bin with path /usr/sbin/httpd
    group apache
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor

check process httpd with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start" as uid 0 as gid 0
    stop program = "/etc/init.d/httpd stop" as uid 0 as gid 0
    if cpu > 99% for 5 cycles then restart
    if loadavg(5min) greater than 45 for 3 cycles then restart
    if failed host WW.WW.WWW.ZZZ port 80 protocol HTTP request "/site/page.php" timeout 15 seconds 10 cycles then restart
    if 6 restarts within 10 cycles then timeout
    alert address@hidden
    depends on apache_bin
    group apache
I'm pulling my hair out. It doesn't work flawlessly: I receive many alerts, even when the servers are working. Thanks.