Monit not restarting a service reliably
From: Jan Rychter
Subject: Monit not restarting a service reliably
Date: Fri, 31 May 2019 10:14:49 -0700
Hi,
I'm looking for help, because I can't figure out what I'm doing wrong. I have a
simple monit setup, which is supposed to monitor a web server and restart it if
anything seems wrong.
This seems to work, but not always. Monit does restart the service once, but on
subsequent failures it only logs that the service isn't working and takes no
further action.
Example from the log, where the service was restarted, but went down again, and
monit didn't do anything:
[CEST May 31 06:44:11] info : 'triac.mysite.com' Monit 5.16 started
[CEST May 31 09:36:29] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:37:39] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:37:39] info : 'mysite.com' exec: /usr/bin/supervisorctl
[CEST May 31 09:38:49] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:39:59] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:41:09] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:42:19] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:43:29] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:44:39] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:45:50] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:47:00] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
[CEST May 31 09:48:10] error : 'mysite.com' failed protocol test [HTTP] at
[mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource
temporarily unavailable
The net result is that the service doesn't work and monit just sits there,
knowing that the service failed the protocol test, but doing nothing about it.
I suspect this is because monit never observes the service as OK after the
restart (it is only up for a moment), so it never sees another transition from
OK to failed.
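To make that suspicion concrete, here is a toy model in Python of a monitor that only acts on a transition into the failed state. It is a sketch of my guess, not Monit's actual implementation; the function name `actions_fired` and the `cycles_required` parameter are made up for illustration.

```python
def actions_fired(probe_results, cycles_required=2):
    """Return the cycle indices at which a toy edge-triggered monitor acts.

    probe_results: list of booleans, True meaning the probe succeeded.
    The action fires once, when consecutive failures first reach
    cycles_required, and not again until a successful probe is seen.
    This is an assumed model, not Monit's real logic.
    """
    fired = []
    consecutive_failures = 0
    in_failed_state = False
    for cycle, ok in enumerate(probe_results):
        if ok:
            # A success resets everything, so the next failure run can fire.
            consecutive_failures = 0
            in_failed_state = False
            continue
        consecutive_failures += 1
        if consecutive_failures >= cycles_required and not in_failed_state:
            in_failed_state = True
            fired.append(cycle)
    return fired


# The restart never produces a successful probe: the action fires only once.
print(actions_fired([True, False, False, False, False, False]))

# A success is observed between the two outages: the action fires twice.
print(actions_fired([True, False, False, True, False, False]))
```

If this model is right, it would match the log above: two failed tests, one exec, and then nothing but further failed tests.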
Here is the relevant part of the configuration (nearly all of it):
set daemon 60
check host mysite.com with address mysite.com
    if failed
        port 443
        protocol https
        with ssl options {verify: enable}
        for 2 cycles
    then exec "/usr/bin/supervisorctl restart mysite"
    if 20 restarts within 60 cycles then unmonitor
Is there a way to make the action unconditional? E.g. "even though I haven't
seen the service transition from failed to working, restart it anyway after
60 seconds if it is still in the failed state."
Any help would be much appreciated.
--J.