monit-general

Re: Check program problem


From: Dmitry Zamaruev
Subject: Re: Check program problem
Date: Mon, 19 Nov 2012 18:12:04 +0200

Nope, this will just increase the poll interval for that particular service, so the service will still be restarted twice, only with more time between the restarts :)

Assume we have a service running with PID=10 (stored in /tmp/file.pid), and a script that checks whether the process named in /tmp/file.pid has fewer than 100 threads; if it does not, the script returns 1.
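To make the check concrete, here is a minimal sketch of what such a script might do. It is written in Python rather than shell and is not the actual /tmp/script.sh. It assumes a Linux /proc filesystem, where /proc/<pid>/status has a "Threads:" line; the names thread_count, check and proc_root are made up for this illustration.

```python
from pathlib import Path

THREAD_LIMIT = 100  # threshold taken from the example above


def thread_count(pid, proc_root="/proc"):
    """Return the thread count reported in <proc_root>/<pid>/status (Linux only)."""
    status = Path(proc_root, str(pid), "status").read_text()
    for line in status.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError(f"no 'Threads:' line found for pid {pid}")


def check(pid_file="/tmp/file.pid", limit=THREAD_LIMIT, proc_root="/proc"):
    """Return 0 if the process has fewer than `limit` threads, otherwise 1."""
    pid = int(Path(pid_file).read_text().strip())
    return 0 if thread_count(pid, proc_root) < limit else 1
```

A real check script would finish with something like sys.exit(check()), so that the return value becomes the exit status that monit collects.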

Poll (cycle) #1:
- /tmp/script.sh is run against /tmp/file.pid (contains 10) and returns 1, but this value is not collected by monit until the next cycle

Poll #2:
- monit collects status#1 and fires the event 'status != 0'
- BEFORE the event is processed, /tmp/script.sh is run again (/tmp/file.pid still contains 10); it returns 1 again, and again the result is postponed until the next poll period
- monit processes the exec action (because status#1 == 1) and restarts the service (now /tmp/file.pid will contain 20, for example)

Poll #3:
- monit collects status#2 and fires the event 'status != 0', but the service was already restarted in poll #2, so this value is obsolete!
- before the event is processed, /tmp/script.sh is run against /tmp/file.pid (contains 20) and returns 0 (because it is a fresh process)
- monit processes the exec action (because status#2 == 1) and restarts the service (now /tmp/file.pid will contain 30, for example)

Poll #4:
- monit collects status#3 and sees that it is ok


So the problem is that the 'check program' result is one step behind the other actions, and at some point (poll #3) monit acts on obsolete information.
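The sequence above can be replayed with a small simulation. This is only a sketch of the behaviour described in this mail, not monit's actual code: the service is a plain dict, the thread counts (150 for the leaking process, 5 for a fresh one) are invented, and the delayed=False branch is a hypothetical comparison in which the result is used in the same poll that produced it.

```python
THREAD_LIMIT = 100  # threshold from the example above


def run_script(service):
    """Stand-in for /tmp/script.sh: 1 if the thread limit is reached, else 0."""
    return 1 if service["threads"] >= THREAD_LIMIT else 0


def restart(service):
    """Stand-in for '/tmp/some_service.sh restart': new PID, fresh thread count."""
    service["pid"] += 10
    service["threads"] = 5  # invented value for a freshly started process


def simulate(polls, delayed=True):
    """Return the list of poll numbers in which the service was restarted."""
    service = {"pid": 10, "threads": 150}  # invented: PID 10 is leaking threads
    pending = None  # exit status of the script started in the previous poll
    restarts = []
    for poll in range(1, polls + 1):
        if delayed:
            collected = pending            # result of the previous poll's run
            pending = run_script(service)  # next run starts BEFORE the event is processed
        else:
            collected = run_script(service)  # hypothetical: result used immediately
        if collected == 1:                 # 'if status != 0 then exec ... restart'
            restart(service)
            restarts.append(poll)
    return restarts
```

With the delayed model, simulate(4) returns [2, 3], which is the double restart from polls #2 and #3; simulate(4, delayed=False) returns [1], a single restart.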



On Mon, Nov 19, 2012 at 5:43 PM, Jan-Henrik Haukeland <address@hidden> wrote:
I'm not sure I understand the problem, but that does not prevent me from having a suggestion :) I'm wondering if the every statement could help in this situation? As in:

check program with path '/tmp/script.sh'
  every 2 cycles
  if status != 0 then exec '/tmp/some_service.sh restart'

Any luck with that?


On Nov 19, 2012, at 12:12 PM, Dmitry Zamaruev <address@hidden> wrote:

> Hi,
>
> I'm using 'check program' to monitor a thread leak in one of our applications. Everything works nicely, except that the application is always restarted twice. I dug through the source code and found that it seems to be related to how 'check program' is handled.
> Here is my configuration example:
>
> check program with path '/tmp/script.sh'
>   if status != 0 then exec '/tmp/some_service.sh restart'
>
> Here is the workflow I'm seeing:
>
> - Poll period #1:
>   - start /tmp/script.sh
>
> - Poll period #2:
>   - collect exit code from /tmp/script.sh
>   - raise event with status = 1
>   - start /tmp/script.sh  <<== problem here, script is run against service before restart! so it will return status=1
>   - process event - exec '/tmp/some_service.sh restart'
>
> - Poll period #3
>   - collect exit code from /tmp/script.sh
>   - raise event with status = 1
>   - start /tmp/script.sh  <<== here script is run against fresh service after restart at step #2
>   - process event - exec '/tmp/some_service.sh restart'
>
> - Poll period #4
>   - collect exit code from /tmp/script.sh
>   - exit status == 0, so all ok now
>
> If I try a different condition, for example 'status == 1 for 2 cycles', the event chain just gets longer: after two failures monit restarts the application, but because the next poll cycle also reports a failure (three failed cycles in a row), monit still matches 'status == 1 for 2 cycles'.
>
> Is there any way to work around the double restart (a restart takes up to 15-20 seconds) using monit configuration, either by ignoring the exit status at some step or by writing some special condition?
>
> wbr,
> Dmitry.




