Restart timer for checking services
From: David Paper
Subject: Restart timer for checking services
Date: Wed, 7 Aug 2013 14:04:13 -0400
Greetings,
I've dug through the Monit docs, examples, and changelog from 5.2.3 to 5.5.1, and I can't find a reference to this problem. Here is what I am seeing.
I'm using Monit 5.2.3 on Red Hat Linux 5.4, x86_64.
I have a Java process that locks up when it runs out of memory, and Monit tries to stop and start it. The process takes more than 2 minutes to come online and start listening for TCP requests. When I stop and start the process manually, Monit waits the configured 180 seconds before it begins testing, and the test succeeds; the job works as defined. What doesn't work is Monit's own restart: it appears to test the port 1 second after the restart and again 1 minute after it, then decides the process isn't working, restarts it again, and the whole sequence starts over. If I didn't know better, I would say Monit ignores the configured timeout and cycle settings on a restart.
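The timing argument can be checked with a small simulation. This is a simplified, hypothetical model, not Monit's actual scheduler: it assumes (from the description above and the syslog excerpt) that after each start the port is tested about 1 second later and again one 60-second cycle later, that the second failure triggers a restart, and that stop plus start takes about 1 second. The function name `restart_loop`, its parameters, and the 130-second start-up time are illustrative assumptions.

```python
def restart_loop(startup_s: int, cycle_s: int = 60, first_check_s: int = 1,
                 checks_before_restart: int = 2,
                 horizon_s: int = 900) -> tuple[list[int], bool]:
    """Simulate start -> port checks -> restart (hypothetical model).

    Returns the list of start times (seconds) and whether any port check
    ever succeeded, i.e. whether the service was given long enough to come up.
    """
    starts: list[int] = []
    start = 0
    while start <= horizon_s:
        starts.append(start)
        for n in range(checks_before_restart):
            check_at = start + first_check_s + n * cycle_s
            if check_at - start >= startup_s:
                return starts, True  # port is already listening at this check
        # Every check failed: restart at the last check; the new start
        # follows about one second later, as in the syslog excerpt.
        start += first_check_s + (checks_before_restart - 1) * cycle_s + 1
    return starts, False


if __name__ == "__main__":
    # JBoss needs "more than 2 minutes"; 130 s is an assumed value.
    starts, came_up = restart_loop(startup_s=130)
    print(starts[:4], came_up)  # prints [0, 62, 124, 186] False
```

The 62-second spacing between starts matches the start times in the log (04:02:27, 04:03:29, 04:04:31). Under this model any start-up time longer than 61 seconds never sees a successful check, so the loop never ends.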
My monit job for this process looks like this:
check process jboss-ssp with pidfile /var/run/jboss/jboss-sspnode.pid
    start program = "/opt/jboss/bin/monit_run.sh -c sspnode -b 10.91.51.32 -g ssp-io-lp1 -u 239.255.150.1 -Djboss.messaging.ServerPeerID=1"
        as uid 349 and as gid 349 with timeout 180 seconds
    stop program = "/bin/bash -c 'kill -9 `cat /var/run/jboss/jboss-sspnode.pid`'"
        as uid 349 and as gid 349
    if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then alert
    if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then restart
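For comparison, here is how the `for 5 times within 5 cycles` condition reads when taken literally: a sliding window over the last 5 check results that fires only once 5 of them have failed. This is a hypothetical sketch of the expected semantics, not Monit's source; `rule_fires_at` is an illustrative name, and the 130-second start-up time is an assumption.

```python
from collections import deque
from typing import Iterable, Optional


def rule_fires_at(failures: Iterable[bool], times: int = 5,
                  cycles: int = 5) -> Optional[int]:
    """Return the index of the first cycle at which an
    "if failed ... <times> times within <cycles> cycles" rule would fire,
    or None if it never fires.

    failures: one boolean per monitoring cycle, True if the port test failed.
    """
    window: deque = deque(maxlen=cycles)  # only the last `cycles` results count
    for i, failed in enumerate(failures):
        window.append(failed)
        if sum(window) >= times:
            return i
    return None


if __name__ == "__main__":
    # Assumed: checks every 60 s starting 1 s after start, and JBoss
    # listening after 130 s, so only the first three checks fail.
    checks = [t < 130 for t in range(1, 600, 60)]
    print(rule_fires_at(checks))  # prints None: the restart rule never fires
```

Under this reading, a restart would not be issued before the fifth consecutive failed cycle (about 4 minutes after start), which is longer than JBoss needs to come up. The log, by contrast, shows a restart after the second failed check.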
Here is my monitrc:
set daemon 60            # check services at 1-minute intervals
    with start delay 60  # optional: delay the first check by 1-minute
set logfile syslog facility log_daemon
set idfile /var/run/monit.id
set statefile /var/run/monit.state
set mailserver smartmail.mydomain.com,  # primary mailserver
set eventqueue
    basedir /opt/monit/eventqueue  # set the base directory where events will be stored
    slots 100                      # optionally limit the queue size
set alert address@hidden  # receive all alerts
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
include /opt/monit/monit.d/*
The syslog messages that show Monit's behavior:
Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
Aug 7 04:02:27 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
Aug 7 04:02:27 stdeciovag1 logger: Running /opt/jboss/bin/run.sh
Aug 7 04:02:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
Aug 7 04:03:29 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
Aug 7 04:03:29 stdeciovag1 logger: Running /opt/DECE_jboss/bin/run.sh
Aug 7 04:03:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
Aug 7 04:04:31 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
….
This goes on forever until someone intervenes and manually stops and starts the monit job.
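To measure that pattern directly rather than by eye, the syslog can be scanned for each `start:` line and the `failed` lines that follow it. A minimal sketch, assuming the message format shown in the excerpt above with one entry per line; syslog timestamps carry no year, so 2013 is assumed, and `seconds_after_start` is a hypothetical helper name.

```python
import re
import sys
from datetime import datetime

# Matches "Aug 7 04:02:28 host monit[4111]: <message>"; other programs'
# lines (e.g. "logger: Running ...") are ignored.
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ monit\[\d+\]: (.*)$")


def seconds_after_start(lines, year=2013):
    """Return, for each failed check, the seconds since the latest start."""
    offsets = []
    last_start = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # Prepend the assumed year, since syslog timestamps omit it.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if "' start:" in msg:
            last_start = stamp
        elif "' failed" in msg and last_start is not None:
            offsets.append(int((stamp - last_start).total_seconds()))
    return offsets


if __name__ == "__main__":
    # Example: pipe the relevant syslog lines on standard input.
    print(seconds_after_start(sys.stdin))
```

Fed the excerpt above, it returns [1, 61, 1, 61]: each restarted process gets one check about 1 second after start and one about 61 seconds after start before it is restarted again. The first `failed` line is skipped because no `start:` has been seen yet.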
Any help/guidance would be appreciated.
Thanks,
-dave