monit-general
Re: monit validate shows no errors, but monit status does


From: Martin Pala
Subject: Re: monit validate shows no errors, but monit status does
Date: Tue, 19 Jun 2007 23:20:48 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20070217 Iceape/1.1.1 (Debian-1.1.1-2)

You identified the cause ... but the fix is not correct ;)

The reason for the condition in question is that when monit starts, it does not report the passed state for every service: passed is the default and expected state, and we prefer to avoid extra mails and send an alert only when a service has failed and then recovered.

Event_check_state() should return TRUE only when the state has changed (failed->passed or vice versa) according to the rules, taking the required event ratio into account.

The problem is that when the service was in a failed state before the reload, monit keeps the error flag, but because the event state machine was in STATE_INIT when the passed message was posted, it ignored the message and waited for an error.

The fix is to check the error flag as well: if it is set, Event_check_state() does its job even if the event has just been initialized and the posted state is passed.

Thanks :)
Martin

Patch (also checked in to CVS):

--8<--
diff -u -r1.64 event.c
--- event.c     3 Jan 2007 09:31:01 -0000       1.64
+++ event.c     19 Jun 2007 20:49:44 -0000
@@ -347,12 +347,16 @@
   int       i;
   int       count = 0;
   Action_T  action;
+  Service_T service;
   long long flag;

   ASSERT(E);

+  if(!(service = Event_get_source(E)))
+    return TRUE;
+
   /* Only the true failed state condition can change the initial state */
-  if(S == STATE_PASSED && E->state == STATE_INIT)
+  if(S == STATE_PASSED && E->state == STATE_INIT && !(service->error & E->id))
   {
     return FALSE;
   }
--8<--
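
To make the effect of the patch easier to see in isolation, here is a minimal standalone sketch of just the initial-state guard, before and after the change. It is not monit code: the numeric state values, the EVENT_RESOURCE bit, and the function names guard_before_patch() and guard_after_patch() are assumptions made up for the illustration. Only the boolean conditions are taken from the diff above.

```c
#define FALSE 0
#define TRUE  1

/* Assumed numeric values, for illustration only. */
enum { STATE_PASSED = 0, STATE_FAILED = 1, STATE_INIT = 2 };

/* Hypothetical event id bit standing in for E->id. */
#define EVENT_RESOURCE 0x1u

/* Old guard: TRUE means the posted state is ignored
 * (Event_check_state() would return FALSE early). */
int guard_before_patch(int posted, int event_state) {
  return posted == STATE_PASSED && event_state == STATE_INIT;
}

/* Patched guard: a passed state posted in STATE_INIT is ignored
 * only if the service's error flag for this event is NOT set. */
int guard_after_patch(int posted, int event_state,
                      unsigned int service_error, unsigned int event_id) {
  return posted == STATE_PASSED && event_state == STATE_INIT
         && !(service_error & event_id);
}
```

With the error flag still set after a reload, guard_before_patch(STATE_PASSED, STATE_INIT) is TRUE, so the passed message is dropped, while guard_after_patch(STATE_PASSED, STATE_INIT, EVENT_RESOURCE, EVENT_RESOURCE) is FALSE, so the normal ratio check runs and the recovery can be registered. With no error flag set, both guards behave the same way.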

Claus Klein wrote:
Hi,

Today I tried to understand what happens in my test case. It seems that after a 'monit reload' the transition from STATE_INIT to STATE_PASSED is not handled correctly. To me it looks like a problem in event.c:

What is the reason for this? I found no side effect.

Claus

/**
* Return the actual event state based on event state bitmap
* and event ratio needed to trigger the state change
* @param E An event object
* @param S Actual posted state
* @return The Event raw state
*/
short Event_check_state(Event_T E, short S) {

 int       i;
 int       count = 0;
 Action_T  action;
 long long flag;

 ASSERT(E);

 /* tbd: always state change after initial state! ck */
 if(E->state == STATE_INIT)
 {
   return TRUE; // with this, it works; but why is there the next statement?
 }

 /* Only the true failed state condition can change the initial state */
 if(S == STATE_PASSED && E->state == STATE_INIT)
 {
   return FALSE;
 }

 action = (S == STATE_PASSED)?E->action->passed:E->action->failed;

 /* Compare as many bits as cycles able to trigger the action */
 for(i = 0; i < action->cycles; i++)
 {
   /* Check the state of the particular cycle given by the bit position */
   flag = (E->state_map >> i) & 0x1;

   /* Count occurrences of the posted state */
   if(flag == S)
   {
     count++;
   }
 }

 if(count >= action->count && S != E->state)
 {
   return TRUE;
 }
 return FALSE;
}
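
As a small worked example of the ratio check in the loop above, the following sketch extracts only the bit counting into a standalone helper. The name count_matching_cycles() is hypothetical, and the sketch assumes that a passed cycle is stored as bit value 0 and a failed cycle as bit value 1, which is what the comparison flag == S suggests.

```c
/* Count how many of the lowest `cycles` bits of state_map
 * equal the posted state (0 or 1 under the stated assumption). */
int count_matching_cycles(long long state_map, int cycles, int posted) {
  int i;
  int count = 0;

  for (i = 0; i < cycles; i++) {
    /* State recorded for the cycle at bit position i */
    long long flag = (state_map >> i) & 0x1;

    if (flag == posted) {
      count++;
    }
  }
  return count;
}
```

For a state_map of 0x0B (binary 1011) and cycles = 4, the helper counts 3 bits equal to 1 and 1 bit equal to 0. A rule requiring 3 matches out of 4 cycles would therefore be satisfied for a posted state of 1 but not for a posted state of 0.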


Claus Klein wrote:
My current disk usage is:
address@hidden:/usr/src/linux/Documentation/serial# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda5              5245016   4825732    419284  93% /
tmpfs                   253284        12    253272   1% /dev/shm

With the following monit configuration:

check device rootfilesystem with path /dev/hda5
  mode passive
  if space usage > 90% then alert

address@hidden:/usr/src/linux/Documentation/serial# monit summary
The monit daemon 4.8.1 uptime: 0m

System 'localhost'                  Monit instance changed
Process 'snmptrapd'                 running
Process 'nagios2'                   running
Directory 'nagios2-command_file'    Permission failed
Device 'rootfilesystem'             Resource limit matched
Process 'nsca'                      running
Process 'apache2'                   running
Process 'ntpd'                      running
Process 'sshd'                      running
Process 'dbus'                      running
Process 'avahi-daemon'              running
Process 'avahi-dnsconfd'            not monitored
Process 'privoxy'                   running
address@hidden:/usr/src/linux/Documentation/serial# vi /etc/monit/monitrc

# changed to:
check device rootfilesystem with path /dev/hda5
  mode passive
  if space usage > 95% then alert

address@hidden:/usr/src/linux/Documentation/serial# monit reload
Reinitializing monit daemon
address@hidden:/usr/src/linux/Documentation/serial# monit summary
The monit daemon 4.8.1 uptime: 0m

System 'localhost'                  Monit instance changed
Process 'snmptrapd'                 running
Process 'nagios2'                   running
Directory 'nagios2-command_file'    Permission failed
Device 'rootfilesystem'             Resource limit matched
Process 'nsca'                      running
Process 'apache2'                   running
Process 'ntpd'                      running
Process 'sshd'                      running
Process 'dbus'                      running
Process 'avahi-daemon'              running
Process 'avahi-dnsconfd'            not monitored
Process 'privoxy'                   running
address@hidden:/usr/src/linux/Documentation/serial# monit validate

# Note: no error,
# but there is still a 'Resource limit matched':

address@hidden:/usr/src/linux/Documentation/serial# monit summary
The monit daemon 4.8.1 uptime: 0m

System 'localhost'                  Monit instance changed
Process 'snmptrapd'                 running
Process 'nagios2'                   running
Directory 'nagios2-command_file'    Permission failed
Device 'rootfilesystem'             Resource limit matched
Process 'nsca'                      running
Process 'apache2'                   running
Process 'ntpd'                      running
Process 'sshd'                      running
Process 'dbus'                      running
Process 'avahi-daemon'              running
Process 'avahi-dnsconfd'            not monitored
Process 'privoxy'                   running
address@hidden:/usr/src/linux/Documentation/serial# monit status | less

# Note: the timestamp is not updated either!

Device 'rootfilesystem'
 status                            Resource limit matched
 monitoring status                 monitored
 permission                        660
 uid                               0
 gid                               6
 block size                        4096 B
 blocks total                      1311254 [5122.1 MB]
 blocks free for non superuser     104933 [409.9 MB] [8.0%]
 blocks free total                 104933 [409.9 MB] [8.0%]
 data collected                    Thu Jun 14 12:38:27 2007

address@hidden:/usr/src/linux/Documentation/serial# monit status | less

...

Device 'rootfilesystem'
 status                            Resource limit matched
 monitoring status                 monitored
 permission                        660
 uid                               0
 gid                               6
 block size                        4096 B
 blocks total                      1311254 [5122.1 MB]
 blocks free for non superuser     104911 [409.8 MB] [8.0%]
 blocks free total                 104911 [409.8 MB] [8.0%]
 data collected                    Thu Jun 14 12:41:31 2007

address@hidden:/usr/src/linux/Documentation/serial# date
Thu Jun 14 12:42:35 CEST 2007

----------------------------------

The same happens for the directory status after 'Permission failed' if I fix the permission and run monit validate again!
I tested this again with the current monit release, version 4.9,
but the same result occurred.

claus









--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general


