Re: problem with dead processes

From: Martin Pala
Subject: Re: problem with dead processes
Date: Mon, 20 Aug 2012 16:26:27 +0200

The /proc filesystem is a kernel interface. If /proc/12222/ exists and getpgid() returns a positive value, the PID appears to be running. Maybe the bug is in "ps" on your system, or "ps" has been modified to hide certain processes, which may be a sign of a compromised system. There is too little information about your system (is it a physical machine or a virtualised one?), but the problem really does not seem to be related to monit.
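
To make the comparison above concrete, here is a minimal sketch that asks the kernel about a PID in three independent ways, without going through "ps". The struct pid_probe type and the probe_pid() function are hypothetical names made up for this illustration; this is not how monit itself is implemented, and the /proc check assumes a Linux system with /proc mounted.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: one flag per independent probe. */
struct pid_probe {
    int pgid_ok;      /* getpgid(pid) succeeded */
    int signal_ok;    /* kill(pid, 0) says the PID exists */
    int proc_dir_ok;  /* /proc/<pid> exists and is a directory */
};

/* Hypothetical helper: probe a PID via getpgid(), kill() and /proc. */
struct pid_probe probe_pid(pid_t pid)
{
    struct pid_probe result = {0, 0, 0};
    struct stat st;
    char path[64];

    /* 0 and negative values address the caller or a group, not one PID. */
    if (pid <= 0)
        return result;

    /* getpgid() returns -1 (errno ESRCH) when the kernel knows no such PID. */
    result.pgid_ok = (getpgid(pid) != -1);

    /* Signal 0 delivers nothing; EPERM still means the PID exists. */
    errno = 0;
    if (kill(pid, 0) == 0 || errno == EPERM)
        result.signal_ok = 1;

    /* Linux-specific: look for the per-process directory under /proc. */
    snprintf(path, sizeof path, "/proc/%ld", (long)pid);
    result.proc_dir_ok = (stat(path, &st) == 0 && S_ISDIR(st.st_mode));

    return result;
}
```

Calling probe_pid(12222) on the affected machine and printing the three flags would show whether the kernel interfaces agree with each other. If all three are set while "ps -ef" shows nothing, the discrepancy is on the "ps" side.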


On Aug 20, 2012, at 4:08 PM, Avi Vigder <address@hidden> wrote:

The issue is not with monit but with the system.
The pid in the pid file was 12222.
The small program:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
        int xx = getpgid(12222);
        printf("the gpid is %d\n", xx);
        return 0;
}
It prints a positive number.
"ps -ef | grep 12222" does NOT show a process with ID 12222, but there is a directory
/proc/12222, whose cwd points to another process that has a different pid (one that shows up in /proc and in "ps -ef").
Any ideas?
From: address@hidden [mailto:monit-address@hidden] On Behalf Of Martin Pala
Sent: Monday, August 20, 2012 1:06 PM
To: This is the general mailing list for monit
Subject: Re: problem with dead processes
Can you please provide the following data:
1.) output of "cat <path_to_the_pidfile_of_the_process_which_is_not_running>"
2.) output of "ps -ef | grep `cat <path_to_the_pidfile_of_the_process_which_is_not_running>`"
3.) output of: strace -f -o /tmp/monit.strace -p `cat /var/run/`   (note: the output will be stored in the /tmp/monit.strace file)
On Aug 20, 2012, at 9:01 AM, Avi Vigder <address@hidden> wrote:

I do check with a pid file, but this is not the issue.
We did encounter what you describe on server startup, and we have an init process that cleans the pid files for that.
The situation here is that the pid file contains a pid number that does not exist when checking with 'ps -ef', but getpgid() returns a positive number for that pid.
From: address@hidden [mailto:monit-address@hidden] On Behalf Of Martin Pala
Sent: Sunday, August 19, 2012 11:56 PM
To: This is the general mailing list for monit
Subject: Re: problem with dead processes
What process check type do you use: pidfile based or pattern based? If pidfile, monit depends on the pidfile content. It is possible that the pidfile contains a valid PID which was assigned to a different process after the original one died. In that case monit thinks that the process is running (which in fact is true, as monit is set to watch the given PID and has no other information about what the process should look like). The solution could be to use the pattern based process check.
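
The PID reuse problem described above can be illustrated with a minimal sketch that checks not only that the PID from a pidfile exists, but also that it still carries the expected command name. The functions read_pidfile(), read_comm() and pid_matches_name() are hypothetical helpers invented for this example; they are not part of monit, and the name lookup assumes a Linux kernel that exposes /proc/<pid>/comm.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read a decimal PID from a pidfile; -1 on any error. */
pid_t read_pidfile(const char *path)
{
    long value = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1 || value <= 0)
        value = -1;
    fclose(f);
    return (pid_t)value;
}

/* Hypothetical helper: copy the kernel's short command name for pid into buf
 * (Linux-only, read from /proc/<pid>/comm). Returns 0 on success, -1 on error. */
int read_comm(pid_t pid, char *buf, size_t size)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/comm", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}

/* Returns 1 if pid exists and its command name equals expected, else 0.
 * Note: the kernel truncates this name, so expected must be the short form. */
int pid_matches_name(pid_t pid, const char *expected)
{
    char comm[64];

    if (pid <= 0 || read_comm(pid, comm, sizeof comm) != 0)
        return 0;
    return strcmp(comm, expected) == 0;
}
```

A caller would combine the two steps, for example pid_matches_name(read_pidfile(path), "mydaemon"), where both the path and the name "mydaemon" are placeholders. A PID that was recycled for an unrelated program then fails the name comparison even though the PID itself is alive.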
On Aug 19, 2012, at 5:08 PM, Avi Vigder <address@hidden> wrote:

We run monit 5.4 and monitor a large number of processes (1K-2K).
When I kill all the processes, most are restarted by monit,
but some are marked as running although the process is not alive.
I've noticed that monit uses getpgid() to determine if the process is alive, and it seems that the system call returns a positive value although the pid does not exist (checking with ps -ef).
Any ideas?