Re: EXEC does not work properly when using CYCLES

From: Andreas Oesterer
Subject: Re: EXEC does not work properly when using CYCLES
Date: Thu, 29 Sep 2005 15:14:11 -0700

I see...
In my case I need the fuzzy-logic behavior, where one or two failed tests are not yet a reason to try a restart. I guess I'm using monit in the opposite way from how you intended it to be used for this case: if the external service goes down, I do not try to restart it. Instead I suspend my own process and resume it when the external service comes back up.
I did a little customization of monit so that the handle_alert function returns a flag indicating whether an email was sent. Depending on that flag, the spawn function is invoked or not. This brings the behavior of ALERT and EXEC in sync. For now this gives me what I need, but it will put me in a more difficult situation when I take the next monit upgrade ;)
Thanks, Andreas
PS: You guys are doing a great job with monit. Really useful out of the box. Even the code is clean enough to do modifications in 2 minutes.
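
The effect of the customization described above can be modeled in a few lines of Python. This is only a sketch of the resulting behavior, not monit's actual C code: the function name is made up for illustration, the "3 times within 3 cycles" threshold is ignored, and the initial state is assumed to be "passed".

```python
def actions_on_state_change(cycle_results):
    """Yield (cycle_index, action) only when the state flips.

    cycle_results: iterable of booleans, True if the test passed in
                   that cycle. The initial state is assumed 'passed'.
    """
    failed = False
    for i, passed in enumerate(cycle_results):
        if passed == failed:  # state flipped since the previous cycle
            failed = not passed
            yield i, ("resume" if passed else "suspend")
```

With this model the script runs once when the service goes down and once when it comes back, matching the cycles in which the alert email is sent.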

On 9/29/05, Martin Pala <address@hidden> wrote:
This is a (documented) feature: monit retries the exec and restart
actions on every failed cycle.

For the restart action it is possible to use the following statement to
limit the number of attempts:

'if X restarts within Y cycles then timeout'

(there's no similar alternative for exec action currently)
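
Until such an option exists, one workaround is to make the exec'd script
itself idempotent, so that repeated invocations in consecutive failed
cycles are harmless. A minimal sketch in Python follows; the state file
location and the function name are assumptions made for illustration, and
the suspend and resume scripts are assumed to share the same file.

```python
import os
import tempfile

# Hypothetical location; any path both scripts can write to works.
STATE_FILE = os.path.join(tempfile.gettempdir(), "monit_test.state")


def transition(new_state, state_file=STATE_FILE):
    """Record new_state and report whether it differs from the stored one.

    Returns True only on a real state change, so the caller can skip
    the actual suspend/resume work on repeated invocations.
    """
    try:
        with open(state_file) as f:
            old_state = f.read().strip()
    except FileNotFoundError:
        old_state = None  # first run: no state recorded yet
    if old_state == new_state:
        return False
    with open(state_file, "w") as f:
        f.write(new_state)
    return True
```

The suspend script would call transition("suspended") and the resume
script transition("running"), each doing its real work only when the call
returns True.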

Snippet from monit manual:

Constant object tests are related to failed/passed state.  In the
case of error, monit will watch whether the failed parameter will
recover - in such case it will handle recovery related
action. General format:


For constant object tests if the <TEST> should validate to true,
then the selected action is executed each cycle the condition
remains true. The value for comparison is constant. Recovery
action is evaluated only once (on failed->recovered state change
only). The 'ELSE IF PASSED' part is optional - if omitted,
monit will do alert action on recovery by default. The alert is
delivered only once on each state change unless overridden by
'reminder' alert option.

It would probably be good to allow setting the action frequency in the
failed state as well.

Possible solutions (?):

1.) restrict the action by some option, for example 'retry'. Examples:

... then exec '/foo'
     (=> retry each failed cycle - current monit's default)

... then exec '/foo' retry 3 times
     (=> retry for 3 consecutive cycles and then give up)

... then exec '/foo' retry each 3 cycles
     (=> retry each 3rd cycle)

2.) add an environment variable such as MONIT_EVENT_COUNT which tells
how many cycles the service has been in the given failed state. The
executed script can then use this variable to modify its behavior.

3.) add the 'if X execs within Y cycles then timeout' statement
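
To illustrate option 2, a script could read the proposed variable and
decide for itself whether to act. Note that MONIT_EVENT_COUNT is only a
proposal here; its semantics in this sketch (1 on the first failed cycle,
incremented on each further failed cycle) are an assumption, as are the
function names. For 'every Nth cycle' the sketch acts on cycles 1, N+1,
2N+1, ..., which is one possible reading of 'retry each 3 cycles'.

```python
import os


def should_act(event_count, max_times=None, every=1):
    """Decide whether the action should run for this failed cycle.

    event_count: 1-based number of consecutive failed cycles (assumed
                 meaning of the proposed MONIT_EVENT_COUNT variable).
    max_times:   stop acting after this many cycles ('retry N times').
    every:       act only on every Nth cycle ('retry each N cycles').
    """
    if max_times is not None and event_count > max_times:
        return False
    return (event_count - 1) % every == 0


def event_count_from_env(environ=os.environ):
    # Fall back to 1 so the script still acts when the variable is unset.
    return int(environ.get("MONIT_EVENT_COUNT", "1"))
```

A suspend script that should run only once per outage would then do its
work only when should_act(event_count_from_env(), max_times=1) is true.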


Andreas Oesterer wrote:
> I recently got monit-4.6-beta1 and a cycle-specific bug fix that
> updated event.c and util.c.
> Everything works as expected when using ALERT; however, EXEC causes the
> scripts to be executed too often.
> My setup:
> --------------
> set daemon  30
> check host monit_test with address < >
>     if failed port 8081 protocol http and request "/index.jsp"
>         with timeout 5 seconds for 3 times within 3 cycles
>         then EXEC "/root/monit_test/suspend"
>     else if passed for 3 cycles then EXEC "/root/monit_test/resume"
> Test execution
> ---------------------
> 1) Begin where the service is running
> 2) Stopped service at 14:09:31
> 3) Restarted service at 14:14:54
> My test output:
> ----------------------
> Wed Sep 28 14:09:52 PDT 2005 Resuming  *
> Wed Sep 28 14:10:22 PDT 2005 Resuming  *
> Wed Sep 28 14:10:52 PDT 2005 Suspeding    -> "failed" Email is sent
> Wed Sep 28 14:11:22 PDT 2005 Suspeding *
> Wed Sep 28 14:11:52 PDT 2005 Suspeding *
> Wed Sep 28 14:12:22 PDT 2005 Suspeding *
> Wed Sep 28 14:12:52 PDT 2005 Suspeding *
> Wed Sep 28 14:13:22 PDT 2005 Suspeding *
> Wed Sep 28 14:13:52 PDT 2005 Suspeding *
> Wed Sep 28 14:14:22 PDT 2005 Suspeding *
> Wed Sep 28 14:14:57 PDT 2005 Suspeding *
> Wed Sep 28 14:15:27 PDT 2005 Suspeding *
> Wed Sep 28 14:15:57 PDT 2005 Resuming  -> "passed" Email is sent
> I marked the log entries where the script should not have been called
> with a "*". While monit is still detecting that the service is down, it
> executes the resume script, and as long as the service is down it calls
> the suspend script at every cycle. There is no issue when the service is
> up at every cycle.
> Essentially, if the EXEC code were called only when the email is sent,
> everything would work fine.
> Thanks, Andreas