Re: Script times out in 5.2.1 monit version


From: rhickson
Subject: Re: Script times out in 5.2.1 monit version
Date: Thu, 02 Dec 2010 11:20:19 -0600
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

Hi all

I will try again; I could use a pointer on where to go with this. We are
convinced the new versions of Monit work differently. If you have a service
that is constantly failing (trying to restart), other services are not allowed
to actually start (until Monit internally times out after 30 seconds).

Obviously, the core issue is to resolve the failing service, but is there a
workaround for this behavior? We would hope other services would be
unaffected by this failure and could start concurrently.

Any insights, or perhaps someone we could talk to, would be greatly
appreciated.

Thanks

Rich Hickson


On 11/15/2010 01:34 PM, rhickson wrote:
Hi all

This is Rich from ALU. I have a question about the 5.2.1 version and the
behavior we see. We upgraded from the 4.10.1 version, and we believe the
behavior has changed. I'll try my best to explain.

When we perform an /etc/init.d/pdns start, it often times out. The reason is
that a Monit-controlled sshd script is constantly failing. The pdns script
does not attempt to start until Monit finishes trying to start sshd (see the
logs that follow; search for "NOTE" for the key steps).

We believe this is new behavior; the previous version would attempt to
start pdns right away (though we may remember this incorrectly). After the
sshd script times out and fails after 15 seconds, the pdns script
executes and starts properly.

Our question: is this expected behavior? It seems to us Monit could do this
in parallel, and did so in the past.
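
To make the question concrete, here is a toy model of the two behaviors. It
is only an illustrative sketch, not Monit's actual code: the function name
start_times and the REQUESTS list are assumptions made up for this example,
and the offsets in seconds are read off the timestamps in the logs that
follow (sshd start at 14:13:41, user request at 14:13:53, pdns start action
done two seconds after it begins).

```python
def start_times(requests, serial=True):
    """Return {name: time in seconds at which the start script begins}.

    requests: list of (name, requested_at, duration) tuples, in seconds,
    ordered by requested_at. This is a hypothetical model, not Monit code.
    """
    started = {}
    busy_until = 0
    for name, requested_at, duration in requests:
        # Serial: a start action waits until the previous one has finished
        # (or timed out). Parallel: it begins as soon as it is requested.
        begin = max(requested_at, busy_until) if serial else requested_at
        started[name] = begin
        busy_until = begin + duration
    return started


REQUESTS = [
    ("sshd", 0, 15),  # start attempt that runs 15 s before being declared failed
    ("pdns", 12, 2),  # user request arrives 12 s after the sshd attempt began
]

print(start_times(REQUESTS, serial=True))   # {'sshd': 0, 'pdns': 15}
print(start_times(REQUESTS, serial=False))  # {'sshd': 0, 'pdns': 12}
```

In the serial case pdns only begins once the sshd attempt has given up, which
matches what we observe; in the parallel case it would begin at the moment of
the user request, which is what we believe we saw in the past.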

Any insights into all this are appreciated.

Thanks

Rich Hickson

###############################################################################

NOTE: sshd script started by Monit
----------------------------------

[CST Nov 10 14:13:41] error : 'non_root_internal_fixed.pid' file doesn't exist
[CST Nov 10 14:13:41] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit [0x8053d1e]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(LogError+0x22) [0x80540e2]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(Event_post+0x382) [0x8051842]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(check_file+0x676) [0x8065476]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(validate+0x1bf) [0x806467f]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit [0x8056055]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(main+0x524) [0x8056914]
[CST Nov 10 14:13:41] debug : /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c4ce9c]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit [0x804e931]
[CST Nov 10 14:13:41] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:41] info : 'non_root_internal_fixed.pid' trying to restart
[CST Nov 10 14:13:41] debug : Monitoring disabled -- service non_root_internal_fixed.pid
[CST Nov 10 14:13:41] debug : Monitoring enabled -- service non_root_internal_fixed.pid
[CST Nov 10 14:13:41] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:41] error : 'non_root_internal_fixed' process is not running
[CST Nov 10 14:13:41] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit [0x8053d1e]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(LogError+0x22) [0x80540e2]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(Event_post+0x382) [0x8051842]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(check_process+0xaa) [0x80641ca]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(validate+0x1bf) [0x806467f]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit [0x8056055]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit(main+0x524) [0x8056914]
[CST Nov 10 14:13:41] debug : /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c4ce9c]
[CST Nov 10 14:13:41] debug : /opt/LU3P/bin/monit [0x804e931]
[CST Nov 10 14:13:41] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:41] info : 'non_root_internal_fixed' trying to restart
[CST Nov 10 14:13:41] debug : Monitoring disabled -- service non_root_internal_fixed
[CST Nov 10 14:13:41] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:41] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:41] info : 'non_root_internal_fixed' start: /etc/init.d/LU3Psshd_non_root_internal_fixed
[CST Nov 10 14:13:41] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:53] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:53] info : start service 'dnsproxy_proxy_ingress' on user request

NOTE: pdns_server started by user
---------------------------------

[CST Nov 10 14:13:53] info : monit daemon at 11459 awakened
[CST Nov 10 14:13:53] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:54] debug : monit: pidfile '/var/opt/run/openssh/non_root_internal_fixed' does not exist
[CST Nov 10 14:13:56] error : 'non_root_internal_fixed' failed to start
[CST Nov 10 14:13:56] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x8053d1e]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(LogError+0x22) [0x80540e2]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(Event_post+0x382) [0x8051842]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x804f768]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(control_service+0xbb) [0x804f95b]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x8051147]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(Event_post+0x3be) [0x805187e]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(check_process+0xaa) [0x80641ca]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(validate+0x1bf) [0x806467f]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x8056055]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(main+0x524) [0x8056914]
[CST Nov 10 14:13:56] debug : /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c4ce9c]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x804e931]
[CST Nov 10 14:13:56] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:56] debug : Monitoring enabled -- service non_root_internal_fixed
[CST Nov 10 14:13:56] debug : 'non_root_internal_fixed.status' file exists check succeeded
[CST Nov 10 14:13:56] debug : 'non_root_internal_fixed.status' is a regular file
[CST Nov 10 14:13:56] debug : 'non_root_internal_fixed.status' file size check succeeded [current size=0 B]
[CST Nov 10 14:13:56] error : 'non_root_internal_fixed.status' timestamp was changed for /var/opt/lib/monit/non_root_internal_fixed.status
[CST Nov 10 14:13:56] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x8053d1e]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(LogError+0x22) [0x80540e2]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(Event_post+0x382) [0x8051842]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x8063e74]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(check_file+0x37d) [0x806517d]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(validate+0x1bf) [0x806467f]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x8056055]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit(main+0x524) [0x8056914]
[CST Nov 10 14:13:56] debug : /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c4ce9c]
[CST Nov 10 14:13:56] debug : /opt/LU3P/bin/monit [0x804e931]
[CST Nov 10 14:13:56] debug : -------------------------------------------------------------------------------
[CST Nov 10 14:13:56] info : 'non_root_internal_fixed.status' exec: /opt/LSS/share/monit/monit_alarm

NOTE: pdns_server actually started by Monit
-------------------------------------------

[CST Nov 10 14:13:56] debug : 'pdns.pid' file exists check succeeded
[CST Nov 10 14:13:56] debug : 'pdns.pid' is a regular file
[CST Nov 10 14:13:56] debug : 'pdns.pid' timestamp was not changed for /var/run/pdns.pid
[CST Nov 10 14:13:56] debug : 'pdns_authoritative_local' zombie check succeeded [status_flag=0000]
[CST Nov 10 14:13:56] debug : 'pdns_authoritative_local.status' file exists check succeeded
[CST Nov 10 14:13:56] debug : 'pdns_authoritative_local.status' is a regular file
[CST Nov 10 14:13:56] debug : 'pdns_authoritative_local.status' file size check succeeded [current size=0 B]
[CST Nov 10 14:13:56] debug : 'pdns_authoritative_local.status' timestamp was not changed for /var/opt/lib/monit/pdns_authoritative_local.status
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.pid' file exists check succeeded
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.pid' is a regular file
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.pid' timestamp was not changed for /var/opt/run/openssh/root_only_internal_fixed
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed' zombie check succeeded [status_flag=0000]
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.status' file exists check succeeded
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.status' is a regular file
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.status' file size check succeeded [current size=0 B]
[CST Nov 10 14:13:56] debug : 'root_only_internal_fixed.status' timestamp was not changed for /var/opt/lib/monit/root_only_internal_fixed.status
[CST Nov 10 14:13:56] info : Awakened by User defined signal 1
[CST Nov 10 14:13:56] debug : 'dnsproxy_proxy_ingress' Error testing process id [12691] -- No such process
[CST Nov 10 14:13:56] debug : 'dnsproxy_proxy_ingress' Error testing process id [12691] -- No such process
[CST Nov 10 14:13:56] info : 'dnsproxy_proxy_ingress' start: /etc/init.d/LU3Pdnsproxy_proxy_ingress
[CST Nov 10 14:13:56] debug : 'dnsproxy_proxy_ingress' Error testing process id [12691] -- No such process
[CST Nov 10 14:13:56] debug : 'dnsproxy_proxy_ingress' Error testing process id [12691] -- No such process
[CST Nov 10 14:13:57] debug : 'dnsproxy_proxy_ingress' Error testing process id [12691] -- No such process
[CST Nov 10 14:13:58] debug : Monitoring enabled -- service dnsproxy_proxy_ingress
[CST Nov 10 14:13:58] info : 'dnsproxy_proxy_ingress' start action done
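
For anyone who wants to check the delays in logs like the ones above, here is
a minimal Python sketch that pulls the timestamps out of the entries and
measures the gap between two events. The helper names parse_entries and
seconds_between are made up for this example, the year 2010 is an assumption
taken from the date of this mail (the log entries carry no year), and SAMPLE
holds four entries copied from the logs above. It also copes with several
entries run together on one line.

```python
import re
from datetime import datetime

# Start of an entry, e.g. "[CST Nov 10 14:13:41] info : ".
STAMP = re.compile(r"\[[A-Z]{3,4} (\w{3} +\d{1,2} \d{2}:\d{2}:\d{2})\] (\w+)\s*: ")


def parse_entries(text, year=2010):
    """Split log text into (datetime, level, message) tuples."""
    matches = list(STAMP.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # An entry runs up to the next timestamp, even across line breaks.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        message = " ".join(text[m.end():end].split())
        entries.append((when, m.group(2), message))
    return entries


def seconds_between(entries, first_marker, second_marker):
    """Seconds from the first entry containing first_marker to the next
    entry (at or after it) containing second_marker."""
    first = next(when for when, _, msg in entries if first_marker in msg)
    second = next(
        when for when, _, msg in entries
        if second_marker in msg and when >= first
    )
    return (second - first).total_seconds()


SAMPLE = """\
[CST Nov 10 14:13:41] info : 'non_root_internal_fixed' start: /etc/init.d/LU3Psshd_non_root_internal_fixed
[CST Nov 10 14:13:53] info : start service 'dnsproxy_proxy_ingress' on user request
[CST Nov 10 14:13:56] error : 'non_root_internal_fixed' failed to start
[CST Nov 10 14:13:56] info : 'dnsproxy_proxy_ingress' start: /etc/init.d/LU3Pdnsproxy_proxy_ingress
"""

entries = parse_entries(SAMPLE)
sshd_wait = seconds_between(
    entries,
    "'non_root_internal_fixed' start:",
    "'non_root_internal_fixed' failed to start",
)
pdns_delay = seconds_between(
    entries, "on user request", "'dnsproxy_proxy_ingress' start:"
)
print(sshd_wait, pdns_delay)  # 15.0 3.0
```

On this sample the sshd start attempt occupies Monit for 15 seconds, and the
pdns start requested by the user is held back for 3 seconds, until the same
second in which the sshd attempt is declared failed.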






