monit-general

Restart timer for checking services


From: David Paper
Subject: Restart timer for checking services
Date: Wed, 7 Aug 2013 14:04:13 -0400

Greetings,

I've dug through the Monit docs, examples and changelog from 5.2.3 to 5.5.1,
and I am unable to find a reference to this problem. Here is what I am seeing,
using Monit 5.2.3 on Red Hat Linux 5.4 (x86_64).

I have a Java process that locks up when it runs out of memory, and Monit tries
to stop and start it. When I stop and start the process manually, Monit waits
the 180 seconds before it begins testing, and the test succeeds, so the job
works as defined. The process takes more than 2 minutes to come online and
start listening for TCP requests. What doesn't work is the restart that Monit
performs itself: Monit appears to test the port 1 second after the restart and
again 1 minute after it, then decides the process isn't working and restarts
it, and the sequence begins all over. If I didn't know better, I would say that
Monit is ignoring the defined timeout and cycle settings on a restart.
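
To make the expected behavior concrete, here is a minimal Python sketch of a
TCP port test that keeps retrying until a startup deadline has passed. It is
not Monit's implementation; the function names `port_is_open` and
`wait_for_port` and the 5-second retry interval are assumptions made for
illustration only. The 180-second default mirrors the timeout in the job
definition.

```python
import socket
import time


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return False


def wait_for_port(host, port, deadline_s=180.0, interval_s=5.0):
    """Poll host:port until it accepts a connection or deadline_s elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if port_is_open(host, port, timeout=min(interval_s, 5.0)):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


# Example call (hypothetical usage, not run here):
# wait_for_port("10.91.51.141", 8080, deadline_s=180.0)
```

A single failed probe 1 second after the start would be expected for a service
that needs more than 2 minutes to listen; only a failure after the full
deadline would indicate a real problem.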

My monit job for this process looks like this:

check process jboss-ssp with pidfile /var/run/jboss/jboss-sspnode.pid
        start program = "/opt/jboss/bin/monit_run.sh -c sspnode -b 10.91.51.32 -g ssp-io-lp1 -u 239.255.150.1 -Djboss.messaging.ServerPeerID=1"
                as uid 349 and as gid 349 with timeout 180 seconds
        stop program = "/bin/bash -c 'kill -9 `cat /var/run/jboss/jboss-sspnode.pid`'"
                as uid 349 and as gid 349
        if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then alert
        if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then restart

Here is my monitrc:

set daemon  60            # check services at 1-minute intervals
     with start delay 60  # optional: delay the first check by 1-minute
set logfile syslog facility log_daemon                       
set idfile /var/run/monit.id
set statefile /var/run/monit.state
set mailserver smartmail.mydomain.com,               # primary mailserver
set eventqueue
     basedir /opt/monit/eventqueue  # set the base directory where events will be stored
     slots 100           # optionally limit the queue size
set alert address@hidden                # receive all alerts
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
include /opt/monit/monit.d/*

The syslog messages that show Monit's behavior:

Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
Aug  7 04:02:27 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
Aug  7 04:02:27 stdeciovag1 logger: Running /opt/jboss/bin/run.sh
Aug  7 04:02:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
Aug  7 04:03:29 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
Aug  7 04:03:29 stdeciovag1 logger: Running /opt/DECE_jboss/bin/run.sh
Aug  7 04:03:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug  7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
Aug  7 04:04:31 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
….
 
This goes on forever until someone intervenes and stops and starts the Monit
job manually.
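
The timing in the syslog excerpt can be checked with a short Python sketch
that measures how many seconds pass between each "start:" line and the events
that follow it. The helper names `parse_time` and `seconds_since_start` are
illustrative, the year 2013 is taken from the date of this message (syslog
timestamps carry no year), and the sample is a subset of the first cycle of
the log with wrapped lines rejoined.

```python
from datetime import datetime

# Subset of the first restart cycle from the syslog excerpt above.
LOG = """\
Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug  7 04:02:27 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
Aug  7 04:02:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
"""


def parse_time(line, year=2013):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp of a line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def seconds_since_start(log):
    """List (seconds after the latest 'start:' line, event kind) per event."""
    started = None
    events = []
    for line in log.splitlines():
        when = parse_time(line)
        if "' start:" in line:
            started = when
        elif started is not None:
            kind = "failed" if "failed" in line else "restart"
            events.append((int((when - started).total_seconds()), kind))
    return events


events = seconds_since_start(LOG)
print(events)  # [(1, 'failed'), (61, 'failed'), (61, 'restart')]
```

Both failed tests and the next restart fall within 61 seconds of the start,
well inside the 180-second timeout configured on the start program.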

Any help/guidance would be appreciated.

Thanks,

-dave






