Have you thought about putting the programs in groups? I use this method to stop and start groups of apps with monit without any issues, and I am starting between 10 and 16 processes.
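In case it helps, here is a rough sketch of how a launch script might drive a whole group with one call instead of one call per app. It assumes each check in the monitrc carries a `group` statement and relies on monit's `-g <group>` command-line option; the group name `webapps` and the helper names are made up for illustration and are not from anyone's actual setup.

```python
import subprocess

# Actions that the monit command line accepts for a service or group.
ALLOWED_ACTIONS = {"start", "stop", "restart", "monitor", "unmonitor"}


def group_command(group, action):
    """Build the monit invocation that applies one action to a whole group."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["monit", "-g", group, action]


def run_group_action(group, action, runner=subprocess.run):
    """Run the group action; `runner` is injectable so it can be stubbed out."""
    return runner(group_command(group, action), capture_output=True, text=True)


if __name__ == "__main__":
    # "webapps" is a hypothetical group name.
    print(" ".join(group_command("webapps", "restart")))
```

With this approach the script sends monit a single request per group, so it does not have to queue up ten separate commands back to back.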
Okie, we switched our central 'launch' script, which essentially takes the list of apps from 'monit summary', stops some of them, then does a start on the list that matches the regex. If 10 sequential commands get sent to monit, it will fail to start 1 or 2 of them, and I see this error in my logs. Does monit have issues receiving multiple commands all at once? It seems like a problem to me that monit can't scale to handle requests like this. This is a multi-user environment where app owners stop and start their apps at their leisure.
<27> Feb 3 12:41:00.441595 -08:00 dev001 monit: monit: action failed -- Other action already in progress -- please try again later
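Since the message itself says "please try again later", one workaround a launch script could try is to retry the command after a short pause. The following is only a sketch under assumptions: it assumes the "Other action already in progress" text also shows up in the output of the `monit` command (the thread only shows it in syslog), and the function names, attempt count, and delay are invented for illustration.

```python
import subprocess
import time

# Text taken from the log line above; seeing it in the CLI output is an assumption.
BUSY_MARKER = "Other action already in progress"


def run_monit(args):
    """Run monit with the given arguments and return (exit code, combined output)."""
    proc = subprocess.run(["monit", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_with_retry(args, runner=run_monit, attempts=5, delay=2.0, sleep=time.sleep):
    """Re-issue a monit command while monit reports another action in progress."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        code, output = runner(args)
        if BUSY_MARKER not in output:
            return code, output  # accepted, or failed for an unrelated reason
        if attempt < attempts:
            sleep(delay)  # give the in-flight action time to finish
    return code, output  # still busy after the last attempt
```

A call such as `run_with_retry(["restart", "app1"])` (with `app1` standing in for a real service name) would then wait out the in-flight action rather than dropping the request.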
On Thu, Feb 2, 2012 at 12:53 PM, Christopher Johnston <address@hidden>
Ok - I grokked the script that handles the restart. I think this could be the cause: it is essentially doing a 'stop && start', so the initiating start is producing that message since there is already another action going (to stop the app). We will modify this to use 'restart' instead.
On Thu, Feb 2, 2012 at 10:55 AM, Christopher Johnston <address@hidden>
I am a little confused about why I am seeing this. I have 4 applications on my host (in some cases up to 10) where we need to do a daily/weekly rolling restart of all the apps on the host. If I send 4 monit restart commands to the apps in sequence, I end up in a situation where only 2 or 3 of the apps come up and monit complains that an action is already in progress (I assume it's from the other commands). Can monit not handle being signaled 4 times to take down apps and restart them? This creates some issues for us when we are doing a mass code rollout to 100s of applications. We end up having to go and clean things up manually, and the driver behind using monit is to provide an automated framework for managing apps and guaranteeing uptime.
Is there any way to remedy this? We are using a very low timeout in monit since we can't risk having apps down for long periods. Could this have something to do with it?
<27> Feb 2 07:48:43.202228 -08:00 dev001 monit: monit: action failed -- Other action already in progress -- please try again later