monit-dev

From: Jan-Henrik Haukeland
Subject: Re: Monitoring Apache mod_status (was Apache, rotatelogs & chroot environment)
Date: Mon, 13 Dec 2004 16:49:39 +0100

I've added your patch with some of the modifications mentioned below. The patch was added untested so please let me know if I messed up! I changed the syntax a bit so you can write stuff like this:

  if failed host www.foo.bar port 80 with protocol apache-status
                 loglimit > 60% or
                 keepalivelimit > 50% or
                 waitlimit < 20%
  then alert


On Dec 10, 2004, at 23:09, David Fletcher wrote:

Hi,

Reading your responses is useful, there are some good ideas.

BTW, I see a shortcoming in the protocol test interface. I think there
should be a way to kickback error reports to validate.c so it can be
included in the alert. Now lots of interesting errors can only be
logged. We can change the signature of a protocol test to: int
check_foobar(Socket_T s, char **errors); Where the protocol-test can
allocate an error string upon errors and assign it to the errors
parameter. validate.c will use this error string in the alert if it is
non-null and validate.c is also responsible for freeing the
errors-buffer. What do you think?
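
To make the ownership rule in this proposal concrete, here is a minimal sketch of how such a protocol test and its caller might look. Only the signature `int check_foobar(Socket_T s, char **errors)` comes from the message above. The `FakeSocket` stand-in for monit's `Socket_T`, the return convention (1 for success, 0 for failure), the message text, and the `run_protocol_test` helper are all assumptions made for illustration, not monit's actual code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for monit's Socket_T, which is opaque in the real code. */
typedef struct FakeSocket { int http_status; } *Socket_T;

/* Protocol test with the proposed signature. Assumed convention: returns 1 on
 * success, 0 on failure. On failure it allocates an error string and hands it
 * to the caller through *errors. */
int check_foobar(Socket_T s, char **errors) {
    *errors = NULL;
    if (s->http_status != 200) {
        const char *fmt = "FOOBAR: unexpected status %d";
        int n = snprintf(NULL, 0, fmt, s->http_status);
        char *msg = malloc((size_t)n + 1);
        if (msg != NULL)
            snprintf(msg, (size_t)n + 1, fmt, s->http_status);
        *errors = msg;  /* may be NULL if allocation failed */
        return 0;
    }
    return 1;
}

/* Caller side, playing the role described for validate.c: use the error
 * string in the alert if it is non-NULL, then free it. */
int run_protocol_test(Socket_T s, char *alert, size_t alert_len) {
    char *errors = NULL;
    int ok = check_foobar(s, &errors);
    if (alert_len > 0) {
        if (ok)
            alert[0] = '\0';
        else
            snprintf(alert, alert_len, "protocol test failed: %s",
                     errors != NULL ? errors : "(no details)");
    }
    free(errors);  /* free(NULL) is a no-op, so this is safe on success too */
    return ok;
}
```

With this split the protocol test never has to know how alerts are formatted, and the caller frees the buffer in exactly one place regardless of the outcome.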

I agree that giving more information in the alert is a good idea, since currently the detailed information only goes to the logs, while the alerts are generic for the protocol. I think this change needs input from everyone, not just me!

1.) the limits in the patch are defined as percentages, but this is not
obvious at first sight. Currently the '%' character is used in the monit control
file for other tests where percentage limit is supported (cpu, memory,

This sounds like a good idea. I used percentages since they cope with
changes in the total number of Apache children, but making it clearer would be
good.
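
As an illustration of why percentages cope with a changing number of children, here is a small sketch that derives a percentage from a mod_status scoreboard string. In the scoreboard, `_` means waiting for connection, `K` keepalive, `L` logging, and `.` an open slot with no process. The function name `scoreboard_percent` and the choice to leave open slots out of the total are assumptions for this example and may differ from what the patch does.

```c
#include <stddef.h>

/* Percentage of active scoreboard slots that are in the given state.
 * Open slots ('.') are skipped, so the result is relative to the number
 * of children that currently exist (an assumption of this sketch). */
double scoreboard_percent(const char *scoreboard, char state) {
    size_t total = 0, matching = 0;
    for (const char *p = scoreboard; *p != '\0'; p++) {
        if (*p == '.')
            continue;  /* open slot, no process behind it */
        total++;
        if (*p == state)
            matching++;
    }
    return total > 0 ? 100.0 * (double)matching / (double)total : 0.0;
}
```

For a scoreboard such as `__WWKKLL__..` there are ten active children, four of them waiting, so the waiting share is 40% whether Apache is running ten children or a hundred in the same proportions.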

2.) it would be good to support comparison operators as well, so that
various combinations can be used. It would also be more consistent
with the syntax of other tests (such as in the case of the 'space' example
above). We can then check, for example, that at least 10% of the child
processes are always waiting for a connection (i.e. ready to serve requests immediately):

   if failed port 80
      protocol apache-status waitlimit < 10% then alert

This will also allow actions to be stacked based on various error levels:

   if failed port 80
      protocol apache-status loglimit > 50% then alert
   if failed port 80
      protocol apache-status loglimit > 90% then restart


These comparisons are already there, but they are built in rather than given in the control file.
For all the monitored limits except waitlimit, an action is taken when the measured quantity exceeds the limit. For waitlimit, an action is triggered
when the measured quantity falls below the limit.
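
The rule described here, generalised to an explicit operator per limit, might look like the sketch below. The names `Operator_T`, `StatusLimit`, `limit_triggered` and `any_limit_triggered` are hypothetical and are not taken from the patch or from monit's sources. The example limits in the comments come from the control file syntax shown earlier in this thread.

```c
#include <stddef.h>

/* Hypothetical comparison operators for a limit. */
typedef enum { OP_GREATER, OP_LESS } Operator_T;

/* One configured limit, e.g. "loglimit > 60%" or "waitlimit < 20%". */
typedef struct {
    const char *name;
    Operator_T  op;
    double      limit_percent;
} StatusLimit;

/* Returns 1 if the measured percentage violates the limit, 0 otherwise. */
int limit_triggered(const StatusLimit *l, double measured) {
    switch (l->op) {
    case OP_GREATER: return measured > l->limit_percent;
    case OP_LESS:    return measured < l->limit_percent;
    }
    return 0;
}

/* The test fails if any limit is violated, matching the "or" between
 * conditions in the control file syntax. measured[i] belongs to limits[i]. */
int any_limit_triggered(const StatusLimit *limits, const double *measured, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (limit_triggered(&limits[i], measured[i]))
            return 1;
    return 0;
}
```

Storing the operator with each limit keeps the current behaviour as a default (greater-than for everything except waitlimit) while letting the control file override it.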

The 'escalation' approach of alert and restart is a good idea, but I haven't tried it yet. I agree that some other name for the *limits would suit
this case better, perhaps *trigger or *level.

I have other work to do until after Christmas at the earliest, so I won't make any changes now. It has taken quite a lot of work to get the patch this far!! What do people want to do? I am happy if other people want to make
changes, and integrate the patch better with the rest of Monit.

Regards,

David.

--
-------------------------------------------------
Email: address@hidden
-------------------------------------------------


_______________________________________________
monit-dev mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/monit-dev


--
Jan-Henrik Haukeland
Mobil +47 97141255




