monit-general

Monit not restarting a service reliably


From: Jan Rychter
Subject: Monit not restarting a service reliably
Date: Fri, 31 May 2019 10:14:49 -0700

Hi,

I'm looking for help, because I can't figure out what I'm doing wrong. I have a 
simple monit setup, which is supposed to monitor a web server and restart it if 
anything seems wrong.

This works, but not always. Monit restarts the service the first time, but on 
subsequent failures it only logs that the service isn't working and takes no 
further action.

Here is an example from the log, where the service was restarted but went down 
again, and monit did nothing:

[CEST May 31 06:44:11] info     : 'triac.mysite.com' Monit 5.16 started
[CEST May 31 09:36:29] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:37:39] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:37:39] info     : 'mysite.com' exec: /usr/bin/supervisorctl
[CEST May 31 09:38:49] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:39:59] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:41:09] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:42:19] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:43:29] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:44:39] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:45:50] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:47:00] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST May 31 09:48:10] error    : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable

The net result is that the service doesn't work and monit just sits there, 
knowing that the service failed the protocol test, but doing nothing about it.

I suspect this is because monit never sees the service in the OK state during 
the brief moment after the restart, so it does not register a new transition 
from OK to failed and the exec action is not triggered again.

Here is the relevant part of the configuration (nearly all of it):

set daemon 60

check host mysite.com with address mysite.com
  if failed
    port 443
    protocol https
    with ssl options {verify: enable}
    for 2 cycles
  then exec "/usr/bin/supervisorctl restart mysite"
  if 20 restarts within 60 cycles then unmonitor

Is there a way to achieve unconditional actions? E.g. "even though I haven't 
seen the service transition from failed to working, restart it anyway after 
60 seconds if it is still in the failed state".
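
For reference, the Monit manual documents a "repeat every <n> cycles" option 
on exec actions, which sounds close to what is described above. The following 
is an untested sketch only: it assumes the option is available in Monit 5.16 
and behaves as documented, and the repeat interval of 2 cycles is an arbitrary 
choice, not something taken from the setup above.

```
check host mysite.com with address mysite.com
  if failed
    port 443
    protocol https
    with ssl options {verify: enable}
    for 2 cycles
  # Assumption: "repeat" re-runs the exec while the test keeps failing,
  # instead of only once on the OK -> failed state change.
  then exec "/usr/bin/supervisorctl restart mysite"
    repeat every 2 cycles
```

The syntax can be checked with "monit -t" before reloading, and the interval 
should be long enough for the web server to come back up between restarts.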

Any help would be much appreciated.

--J.



