monit-dev

Re: monit 4.2 release?


From: Jan-Henrik Haukeland
Subject: Re: monit 4.2 release?
Date: Mon, 16 Feb 2004 18:34:41 +0100
User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Reasonable Discussion, linux)

Martin Pala <address@hidden> writes:

> 0.) new syntax:
> ---------------
>
> An ELSE statement is added to the IF...THEN statement; it specifies
> an action in the common syntax. Dummy example:
>
>
> #    if failed port 443 type tcpssl proto http with timeout 15 seconds
> #       then restart
> #       else alert
>
> => if the connection failed, monit will restart the service and
> send an alert. As soon as the service is up again, it will send an alert.
>
> Note: the "else alert" part is implicit, so there is no need to write
> it. The ELSE statement makes sense when you need to perform another
> action (for example EXEC) once the service is up again,
> like:
>
> #    if failed port 22 proto ssh with timeout 15 seconds
> #       then restart
> #       else exec "/bin/sms_send 'ssh is up again'"

Assume:

 if failed 
    then do X
 else 
    do Y

The logical way to read this is that if a test failed then do X and if
not do Y. That is, whenever the test is okay, Y is executed. 

It may be that adding an ELSE part to an IF statement is a good thing,
but I do feel that the ELSE part should *not* be associated with an
up-event. (An up-event in this context is an alert event sent when a
service that was down comes back up again.)

To use my favorite word of the week, it is not orthogonal.  That is,
assuming I understand you correctly, the ELSE statement has a "hidden"
meaning; it is _only_ executed when a previously failed IF-test passes
again. This hidden meaning is not evident when reading an IF-ELSE
statement.
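
To make the difference between the two readings concrete, here is a
minimal sketch in C. It is not monit code: struct test_state and both
function names are made up for illustration, and only the event_handled
flag mirrors the field used in validate.c. The first function treats
ELSE as a plain else-branch that runs on every passing cycle; the second
signals an up-event only on the transition from failed to passed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-test state; only the flag name is borrowed from monit. */
struct test_state {
    bool event_handled;   /* true while a failure has been seen and not cleared */
};

/* Reading 1: plain IF/ELSE. Returns how many times the ELSE branch runs. */
static int count_plain_else(const bool *passed, size_t n)
{
    int runs = 0;
    for (size_t i = 0; i < n; i++) {
        if (!passed[i]) {
            /* then-branch: do X */
        } else {
            runs++;       /* else-branch: do Y on every passing cycle */
        }
    }
    return runs;
}

/* Reading 2: up-event. Returns how many comebacks are signalled. */
static int count_up_events(const bool *passed, size_t n)
{
    struct test_state st = { false };
    int ups = 0;
    for (size_t i = 0; i < n; i++) {
        if (!passed[i]) {
            st.event_handled = true;    /* remember that the test failed */
        } else if (st.event_handled) {
            ups++;                      /* failed -> passed transition */
            st.event_handled = false;   /* arm again for the next failure */
        }
    }
    return ups;
}
```

For the cycle sequence pass, pass, fail, fail, pass, pass, fail, pass,
the first function returns 5 and the second returns 2, which is the gap
between what an IF-ELSE reads like and what the proposal would do.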

That said, I'm not sure what the best way to register interest in an
up-event is. Maybe the simplest and most logical is to associate it
with an alert statement like so:

check ...
  if failed port 22 proto ssh with timeout 15 seconds
     then alert and alert on comeback
  if failed port 21 proto ftp with timeout 15 seconds
     then exec "..." and alert on comeback

In other words, "ALERT ON COMEBACK" is the statement to use if you
want to get an alert message when a service comes back online again
*after* it went down.

Thoughts?

> * ... lot of work, insufficient time. I didn't want to spend much
> time now (to not freeze my project), so if you prefer a simpler
> solution, I'll be happy :)

First, thanks for a very thorough description with many good ideas. And
as far as I can see, your suggestions handle the current shortcomings
of the event model, such as encapsulating the event handling action,
which we now perform in the source code (which is pretty ugly). 

But as you just said, it will take some time to implement this and
maybe we can go for a simpler solution first :) For the up-event I was
simply thinking about adding something like this throughout
validate.c. It does not handle process up but maybe you can think of a
clever solution?



       for(pr= s->resourcelist; pr; pr= pr->next) {
         if(!check_process_resources(s, pr, report)) {
           pr->cycle=0;
           if(! pr->event_handled) {
             pr->event_flag= TRUE;
             pr->event_handled= TRUE;
             if(! eval_actions(pr->action, s, report, "resource",
                               EVENT_RESOURCE)) {
               reset_resource_counter(s);
               return FALSE;
             }
           }
         } else {
           if(pr->event_handled) {
             /* New: the service is coming up again */
             Event_post(s, EVENT_UP, "Event: '%s' Passed resource test\n",
                        s->name);
           }
           /* Clear the flag on this resource so the up-event is posted once */
           pr->event_handled= FALSE;
         }
       }

-- 
Jan-Henrik Haukeland
