Re: automatic resume of monitoring, is it possible?

From: John (yt) Hogenmiller
Subject: Re: automatic resume of monitoring, is it possible?
Date: Sun, 27 Feb 2011 20:39:27 -0500

Thanks for the clarification.

So in this case, instead of telling the dependent (file2) about the
parent (file1), I would have the parent (file1) start and stop
monitoring on its dependents (file2).

I could see that working. In my case, though, I would probably have to
define groups and start and stop monitoring on a whole group. I have one
device that connects almost everything together, and I actually have a
sort of cascade going on: one access point has three subscriber units
connected to it, and each subscriber unit has its own access point
attached. One of the subscriber units also has a router and a server
behind it.

Here is my network layout, if the formatting holds up:

                                  /-> fv3su -> fv3ap
monit server -> fvinside -> fv1ap --> fv4su -> fv4ap
                                  \-> fv2su -> fv2ap
                                            \-> fvoffice ->

Again, these are all physically discrete devices with no way to
automatically restart them. The biggest impact is when FV1 or FVINSIDE
goes down: we then get 8-9 other devices also showing down. If FV2SU
goes down, only 3 other devices show down.

I could perhaps create some monitoring like so (going back to the file example):

check file file1 with path /tmp/file1
  if failed permission 555 then exec "/usr/bin/monit start file1recover"
  if failed permission 555 then stop

check file file1recover with path /tmp/file1
  if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit start file1"
  if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit -g subfiles start"
  if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit stop file1recover"

check file file2 with path /tmp/file2
  if failed permission 555 then alert
  group "subfiles"
  depends "file1"

check file file3 with path /tmp/file3
  if failed permission 555 then alert
  group "subfiles"
  depends "file1"
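
Mapped onto the real devices, the same group idea might look roughly
like the sketch below. It is untested and rests on several assumptions:
the 192.0.2.x addresses and the group name "behind_fv1ap" are
placeholders, reachability is checked with monit's ICMP echo test, and
the parent check follows the exec pattern from Martin's reply. Only one
level of the cascade is shown.

--8<--
# Untested sketch; addresses and group name are placeholders.
# Parent: when fv1ap stops answering pings, disable monitoring of
# everything behind it; when it answers again, re-enable monitoring.
check host fv1ap with address 192.0.2.10
   if failed icmp type echo count 3 with timeout 5 seconds
      then exec "/usr/bin/monit -g behind_fv1ap unmonitor all"
      else if succeeded then exec "/usr/bin/monit -g behind_fv1ap monitor all"

# Children: alert on their own failures, but only while monitored.
check host fv2su with address 192.0.2.21
   if failed icmp type echo count 3 with timeout 5 seconds then alert
   group behind_fv1ap

check host fv2ap with address 192.0.2.22
   if failed icmp type echo count 3 with timeout 5 seconds then alert
   group behind_fv1ap
--8<--

The remaining devices behind fv1ap would be added to the same group in
the same way. The deeper levels (for example, what sits behind fv2su)
would presumably need their own parent check and group, and whether a
service can belong to more than one group in this monit version is
something to verify before relying on it.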

I haven't had a chance to test this yet. Does monit have any issues
with multiple checks using the same path? Any other suggestions would
be appreciated. I've been working with nagios and mrtg on this network
already, and nagios even has a really nice network map built in.
However, I like the straightforward configuration that monit offers,
and I even like the list of up/down statuses monit provides on the web
interface. With nagios, the network map might show all services as
up/green, and it's not until you click on a specific service that you
see that one service (like ssh) is timing out. Also, I'm running the
monitoring on a system with 128MB of memory, so lean and fast is good.


On Sun, Feb 27, 2011 at 12:38 PM, Martin Pala <address@hidden> wrote:
> Hello,
> The action "monitor" really doesn't exist - I have fixed the documentation.
> The "monitor" action wouldn't make sense, as the service is monitored already.
> The "stop" action stops the service and disables monitoring => monit doesn't
> check the service anymore until the monitoring is enabled again (using
> "monit monitor ..." or "monit start ...").
> The setup which should work in your case:
> --8<--
> check file file1 with path "/tmp/file1"
>    if failed permission 555 then exec "/usr/bin/monit stop file2"
>    else if succeeded then exec "/usr/bin/monit start file2"
> check file file2 path "/tmp/file2"
>    if failed permission 555 then alert
> --8<--
> => if the permission test on "file1" fails, the "file2" service is stopped,
> but monitoring of the "file1" service continues. If "file1" recovers,
> "file2" is started again.
> Regards,
> Martin
> On Feb 27, 2011, at 1:58 AM, John (yt) Hogenmiller wrote:
>> Hello list,
>> I've been playing with monit in hopes of using it to monitor a
>> wireless installation. At first it looked like it was doing OK, but
>> then I noticed that "depends on" wasn't working as I had hoped. If
>> deviceA is unreachable, deviceB and deviceC will also be unreachable,
>> so I set up my "depends on" statements accordingly, but I still got
>> alerts for all three services.
>> After looking further into the documentation, it seems "depends on"
>> requires monitoring to be stopped on a service before monitoring of
>> the services that depend on it stops as well. That's fine, but I'm
>> looking for a way to restart monitoring automatically. In our
>> scenario, if a device goes unpingable, someone would have to
>> physically power cycle it to bring it back online (or potentially
>> replace the device).
>> The documentation wasn't too clear (at least to me) on a way to
>> configure monit this way, so I set up an instance that polled every
>> 10 seconds and monitored two files. All the steps I took are below.
>> If anyone can look at my testing and offer advice, I'd appreciate it.
>> Perhaps I'm reading the documentation wrong, or perhaps there's just
>> no way to do what I'm trying (perhaps M/Monit has such capabilities).
>> I originally tested under 5.0.3 (latest with Ubuntu/apt-get), but then
>> upgraded to 5.2.4 hoping for different results.
>> First, my checks:
>>       check file file1 with path "/tmp/file1"
>>               if failed permission 555 then unmonitor
>>               # manual implies that I can do "else if succeeded then monitor",
>>               # but this fails syntax
>>               else if succeeded then alert
>>       check file file2 path "/tmp/file2"
>>               if failed permission 555 then alert
>>               depends on file1
>> Changing /tmp/file1 to 500 does indeed stop monitoring on file1 and file2.
>> [EST Feb 26 13:30:47] debug    : monitor service 'file1' on user request
>> [EST Feb 26 13:30:47] info     : Awakened by User defined signal 1
>> [EST Feb 26 13:30:47] info     : monit daemon at 31932 awakened
>> [EST Feb 26 13:30:47] info     : 'file1' monitor action done
>> On a lark, I updated my config like so:
>>       check file file1 with path "/tmp/file1"
>>               if failed permission 555 then stop
>>               else if succeeded then start
>>       check file file2 path "/tmp/file2"
>>               if failed permission 555 then alert
>>               depends on file1
>> Upon changing file1 to 500, both services went into "not monitored".
>> Upon changing file1 back to 555, the services did not resume. If I
>> manually tell it to start monitoring file1, file2 does not
>> automatically begin monitoring again.
>> Other notes:
>> I had a whole bug report showing that you can't restart monitoring a
>> service from the command line, but I realised that was a bug
>> in 5.0.3, which is the latest Ubuntu provides, but this was fixed once
>> I downloaded 5.2.4.   I only mention this for anyone else using monit
>> from the Ubuntu repositories.

John Hogenmiller - address@hidden
Used for mailing lists - sporadic response
