monit-dev

RE: <service> start Generates email noise


From: Aaron Scamehorn
Subject: RE: <service> start Generates email noise
Date: Tue, 12 Dec 2006 09:43:59 -0600

Hi Martin,

Thanks for the detailed description...

I've attached my monitrc file.  Obviously the executable (logClient) is
an in-house exe, but that shouldn't matter, should it?

I wonder if it is due to the amount of time it takes for the exe to
update its pid file...

Seems as though it has something to do with wait_start starting its
own thread to wait???

The scenario is rather simple.  I can reproduce by stopping the service,
then issuing a monit <service> start via the CLI.

If this is not enough detail, or I can help out more, please let me
know.

Thanks,
Aaron
 

-----Original Message-----
From: address@hidden
[mailto:address@hidden On
Behalf Of Martin Pala
Sent: Wednesday, December 06, 2006 8:50 AM
To: The monit developer list
Subject: Re: <service> start Generates email noise

I have looked into it ...

I will first explain how it works in monit 4.8.2:

Two threads come into play:
- http thread
- monitoring thread

The http thread processes the actions requested by the user (posted
using either the CLI or the HTML interface). The action is scheduled in
http/cervlet.c:handle_action() by setting the s->doaction flag for
the appropriate service. When no action is scheduled, the
s->doaction flag is set to ACTION_IGNORE (in p.y during service
initialization, or in validate.c after the action has been handled). In
addition, Run.doaction is set to TRUE to signal that there is a
scheduled action somewhere in the service tree. The http thread then
wakes up the main monitoring thread to speed up the action handling.
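
The scheduling step described above can be sketched roughly as follows. This is a simplified illustration, not monit's source: the struct layouts, the enum values, schedule_action() and wakeup_requested are hypothetical names chosen for the example; only doaction, ACTION_IGNORE and Run.doaction come from the description. Locking and the real wakeup mechanism are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Action codes; ACTION_IGNORE means "nothing scheduled". Values are illustrative. */
enum { ACTION_IGNORE = 0, ACTION_START, ACTION_STOP, ACTION_RESTART };

/* Hypothetical, trimmed-down service record. */
typedef struct Service {
    const char *name;
    int doaction;            /* action scheduled by the http thread */
    struct Service *next;
} Service;

/* Hypothetical stand-in for monit's global Run structure. */
static struct {
    int doaction;            /* non-zero when some service has a scheduled action */
    int wakeup_requested;    /* assumption: stands in for waking the main thread */
} Run;

/* What the http thread does in this model: record the request and return.
 * It never starts or stops the service itself.
 * Returns 1 if the named service was found, 0 otherwise. */
static int schedule_action(Service *list, const char *name, int action) {
    assert(name != NULL);
    for (Service *s = list; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) {
            s->doaction = action;      /* per-service flag */
            Run.doaction = 1;          /* global "something is pending" flag */
            Run.wakeup_requested = 1;  /* ask the main thread to run a cycle now */
            return 1;
        }
    }
    return 0;
}
```

Calling schedule_action(list, "LogClient", ACTION_START) only changes flags; whatever the request triggers happens later, in the monitoring thread.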

In validate.c:validate(), the main thread then checks whether the
Run.doaction flag is set, since user actions take priority. If it is
set, the thread walks the service tree and, for each service, performs
the scheduled s->doaction using control_service() and then resets the
s->doaction flag to ACTION_IGNORE. This is all done under mutex and
signal protection, so it cannot be interrupted and no race condition
can occur. The main thread is the only thread that can call
control_service() and physically start/restart/etc. the service.
control_service() also sets the s->visited flag.

The second service loop is then evaluated: monit walks the service
tree and, for each service, locks the mutex and blocks signals. If the
service was not already handled in the same cycle (the s->visited flag
is compared in check_skip), it checks the s->doaction flag again (to
improve the response time for services that had an action scheduled
between the first and second loop of the same cycle). If the flag is
set, it performs the action; otherwise it checks the service.
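
The two loops can be modelled with a small single-threaded sketch. It is an illustration of the description only, not the code in validate.c: the Service layout, the counters, run_doaction and validate_cycle() are hypothetical, the mutex and signal blocking are omitted, and the points at which the visited flag and the global flag are reset are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { ACTION_IGNORE = 0, ACTION_START, ACTION_STOP, ACTION_RESTART };

/* Hypothetical, trimmed-down service record with counters for the example. */
typedef struct Service {
    const char *name;
    int doaction;   /* scheduled user action, ACTION_IGNORE if none */
    int visited;    /* set once the service was handled in this cycle */
    int actions;    /* how many times an action was performed */
    int checks;     /* how many times the service was checked */
    struct Service *next;
} Service;

static int run_doaction;  /* stands in for Run.doaction */

/* Stand-in for control_service(): perform the action, mark the service visited. */
static void control_service(Service *s, int action) {
    assert(action != ACTION_IGNORE);
    s->actions++;
    s->visited = 1;
}

/* One monitoring cycle: scheduled user actions first, then the checks. */
static void validate_cycle(Service *list) {
    Service *s;

    /* Assumption: visited flags are cleared at the start of each cycle. */
    for (s = list; s != NULL; s = s->next)
        s->visited = 0;

    /* First loop: runs only when the global flag says something is pending. */
    if (run_doaction) {
        for (s = list; s != NULL; s = s->next) {
            if (s->doaction != ACTION_IGNORE) {
                control_service(s, s->doaction);
                s->doaction = ACTION_IGNORE;
            }
        }
        run_doaction = 0;  /* assumption: cleared once the tree was walked */
    }

    /* Second loop: skip services already handled in this cycle. */
    for (s = list; s != NULL; s = s->next) {
        if (s->visited)
            continue;
        if (s->doaction != ACTION_IGNORE) {
            /* Action scheduled between the two loops of this cycle. */
            control_service(s, s->doaction);
            s->doaction = ACTION_IGNORE;
        } else {
            s->checks++;  /* stands in for the regular service check */
        }
    }
}
```

In this model, a service whose action was performed in the first loop is not checked again until the next cycle, because its visited flag makes the second loop skip it.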


The design is similar to signal handling: the http thread just sets the
flag, whereas the monitoring thread handles the action. In theory, I
think no race condition can occur.

I tried to reproduce the problem (official monit-4.8.2 release) without 
success.

Can you prepare a simple monit configuration and a procedure to
reproduce the problem?

Thanks,
Martin



Aaron Scamehorn wrote:
> Hi Martin,
> 
> Actually I think you've now got one thread doing an ACTION_START, and
> another doing an ACTION_RESTART on the exact same service.
> 
> It is the ACTION_RESTART that is generating what I perceived to be
> extraneous emails. 
> 
> It looks like the do_wakeupcall that you added to
> http/cervlet.c:handle_action() is the culprit.  Without it, I don't get
> the ACTION_RESTART problem.
> 
> Of course you need this now, or else it takes Poll Time to actually
> respond to the HTTP events, which is what you were trying to speed up in
> the first place.
> 
> Here is the log output, with a bunch of extra messages, including
> pthread_t.
> 
> 3086927552 [CST Dec  1 14:53:25] debug    : 'data_dir' filesystem flags has not changed since last cycle
> 3086927552 [CST Dec  1 14:53:25] debug    : 'data_dir' space usage check passed [current space usage=10.6%]
> 3086924720 [CST Dec  1 14:53:26] info     : monit daemon at 24175 awakened
> 3086927552 [CST Dec  1 14:53:26] info     : Awakened by User defined signal 1
> 3086927552 [CST Dec  1 14:53:26] debug    : control_service: ACTION_START for 'LogClient'
> 3086927552 [CST Dec  1 14:53:26] debug    : control_service: ACTION_START Util_isProcessRunning for 'LogClient'
> 3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
> 3086927552 [CST Dec  1 14:53:26] debug    : do_start: Util_isProcessRunning for 'LogClient'
> 3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
> 3086927552 [CST Dec  1 14:53:26] info     : 'LogClient' start: /cogcap/ccts/bin/logclnt
> 3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
> 3086927552 [CST Dec  1 14:53:26] debug    : Monitoring enabled -- service LogClient
> 3086927552 [CST Dec  1 14:53:26] debug    : check_process: calling Util_isProcessRunning for 'LogClient'
> 3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
> 3086927552 [CST Dec  1 14:53:26] error    : 'LogClient' process is not running
> 3086927552 [CST Dec  1 14:53:26] debug    : Does not exist notification is NOT sent to address@hidden
> 3086927552 [CST Dec  1 14:53:26] debug    : Does not exist notification is sent to address@hidden
> 3076434864 [CST Dec  1 14:53:26] debug    : static void* wait_start for 'LogClient'
> 3076434864 [CST Dec  1 14:53:26] debug    : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 29
> 3076434864 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
> 3086927552 [CST Dec  1 14:53:26] debug    : control_service: ACTION_RESTART for 'LogClient'
> 3086927552 [CST Dec  1 14:53:26] info     : 'LogClient' trying to restart
> 3086927552 [CST Dec  1 14:53:26] debug    : Monitoring disabled -- service LogClient (stop)
> 3086927552 [CST Dec  1 14:53:26] debug    : do_stop: Util_isProcessRunning for 'LogClient'
> 3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
> 3086927552 [CST Dec  1 14:53:26] debug    : 'data_dir' filesystem flags has not changed since last cycle
> 3086927552 [CST Dec  1 14:53:26] debug    : 'data_dir' space usage check passed [current space usage=10.6%]
> 3076434864 [CST Dec  1 14:53:27] debug    : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 28
> 3076434864 [CST Dec  1 14:53:27] debug    : 2) wait_start: calling Util_isProcessRunning for 'LogClient'
> 3086927552 [CST Dec  1 14:53:56] debug    : check_process: calling Util_isProcessRunning for 'LogClient'
> 3086927552 [CST Dec  1 14:53:56] info     : 'LogClient' process is running with pid 24375
> 3086927552 [CST Dec  1 14:53:56] debug    : Exists notification is NOT sent to address@hidden
> 3086927552 [CST Dec  1 14:53:56] debug    : Exists notification is sent to address@hidden
> 3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' zombie check passed [status_flag=0000]
> 3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
> 3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' cpu usage check passed [current cpu usage=0.0%]
> 3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' mem amount check passed [current mem amount=2764kB]
> 3086927552 [CST Dec  1 14:53:56] debug    : 'data_dir' filesystem flags has not changed since last cycle
> 3086927552 [CST Dec  1 14:53:56] debug    : 'data_dir' space usage check passed [current space usage=10.6%]
> 
> 
> -----Original Message-----
> From: address@hidden
> [mailto:address@hidden On
> Behalf Of Martin Pala
> Sent: Thursday, November 30, 2006 4:20 PM
> To: The monit developer list
> Subject: Re: <service> start Generates email noise
> 
> Hello,
> 
> this behavior isn't a bug - the 'nonexist' event type has positive and 
> negative variants:
> 
>    Does not exist (positive 'nonexist')
> 
>      vs.
> 
>    Exists (negative 'nonexist')
> 
> The alert statement only allows filtering on the general event type, not 
> on the particular polarity (there is no 'exist' option).
> 
> => when you have registered the 'nonexist' event, you should get two 
> alerts, informing you about the beginning and the end of the problem.
> 
> Martin
> 
> 
> Aaron Scamehorn wrote:
>> Hello,
>>
>>  From version 4.8 to 4.8.2, the following bug has been introduced:
>>
>> When we issue a monit <service> start command, we get a "Does not exist"
>> email and a corresponding "Exists" email.
>>
>> Here is the debug output showing this behavior in 4.8.2:
>> 'LogClient' Error testing process id [11034] -- No such process
>> 'LogClient' Error testing process id [11034] -- No such process
>> 'LogClient' start: /cogcap/ccts/bin/logclnt
>> 'LogClient' Error testing process id [11034] -- No such process
>> Monitoring enabled -- service LogClient
>> 'LogClient' Error testing process id [11034] -- No such process
>> 'LogClient' process is not running
>> Does not exist notification is sent to address@hidden
>> 'LogClient' Error testing process id [11034] -- No such process
>> 'LogClient' trying to restart
>> Monitoring disabled -- service LogClient (stop)
>> 'LogClient' Error testing process id [11034] -- No such process
>> 'LogClient' process is running with pid 11189
>> Exists notification is sent to address@hidden
>> 'LogClient' zombie check passed [status_flag=0000]
>> 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
>> 'LogClient' cpu usage check passed [current cpu usage=0.0%]
>> 'LogClient' mem amount check passed [current mem amount=2776kB]
>>
>>
>> Under version 4.8, we don't get the annoying "Does not exist" and a 
>> corresponding "Exists" emails:
>>
>> 'LogClient' Error testing process id [10970] -- No such process
>> 'LogClient' Error testing process id [10970] -- No such process
>> 'LogClient' start: /cogcap/ccts/bin/logclnt
>> 'LogClient' Error testing process id [10970] -- No such process
>> Monitoring enabled -- service LogClient
>> 'LogClient' Error testing process id [10970] -- No such process
>> 'LogClient' Error testing process id [10970] -- No such process
>> 'LogClient' zombie check passed [status_flag=0000]
>> 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.1]
>> 'LogClient' cpu usage check passed [current cpu usage=0.0%]
>> 'LogClient' mem amount check passed [current mem amount=2776kB]
>>
>>
>>
>> Additionally, in our config file, we have the following set:
>> set alert address@hidden only on { nonexist, exec, connection }
>>
>> We shouldn't be getting an "Exists" email under any circumstance, should
>> we?
>>
>> Thanks,
>> Aaron
>>
>>
>>
>> ------------------------------------------------------------------------
>> _______________________________________________
>> monit-dev mailing list
>> address@hidden
>> http://lists.nongnu.org/mailman/listinfo/monit-dev

Attachment: monitrc.punisher
Description: monitrc.punisher

