Re: Strange cfexecd/cfagent interaction


From: Mark Burgess
Subject: Re: Strange cfexecd/cfagent interaction
Date: Tue, 20 Aug 2002 18:42:22 +0200 (MET DST)

I like this explanation. I hope it's true, because it would not be
correct to leave stdin open....

M

On 20 Aug, Tim Auckland wrote:
> I think you'll see the EBADF on the systems that are working too. 
> That's quite normal on an exec.
> 
> I would suspect this is another instance of cfexecd's pthread running
> out of stack space.  This happened a lot in the betas of 2.0.0, but
> should be fixed by now.  As with any memory problem, commenting out
> an unrelated line of code can sometimes "fix" the problem.
> 
> Take a look at the thread initialisation code in cfexecd, and try more
> stack space, or try compiling without threads support, and see if that
> makes any difference.
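
A minimal sketch of the "try more stack space" suggestion, assuming cfexecd creates its worker with POSIX threads. The names run_with_stack() and worker() and the 64 KiB buffer are hypothetical stand-ins, not taken from the cfengine source; only the pthread_attr_* calls are standard API.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: touches a large automatic buffer, the kind of
   allocation that can overrun a thread stack that is too small. */
static void *worker(void *arg)
{
    volatile char scratch[64 * 1024];

    scratch[0] = 1;
    scratch[sizeof scratch - 1] = 1;
    *(int *)arg = scratch[0] + scratch[sizeof scratch - 1];
    return NULL;
}

/* Run worker() on a thread with an explicitly requested stack size.
   Returns 0 on success, otherwise the pthread error number. */
int run_with_stack(size_t stack_bytes, int *result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, result);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

Calling run_with_stack(1024 * 1024, &result) requests a 1 MiB stack; sizes below PTHREAD_STACK_MIN are rejected with EINVAL. Build with the compiler's thread option (for example -pthread with gcc).
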
> 
> Tim
> 
> On Tue, 2002-08-20 at 09:07, David J. Bianco wrote:
>> I've noticed on about 7 of my machines (out of about 150), cfexecd
>> seems like it can't run the cfagent process when it starts up.  Here's
>> what I see in my syslog:
>> 
>> Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]:  cfengine defines no system administrator address
>> Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]:  Need: sysadm = ( address@hidden ) in control
>> 
>> Now, I use the same config files on each of my hosts, and the same
>> binaries too, architecture permitting.  None of my other hosts complain
>> and a manual check of the cfagent.conf file shows that I do define
>> my email address there properly.  I even get tons of reports emailed to
>> me from the other machines, but not these malfunctioning 7.
>> 
>> On the machines that malfunction, an strace of cfexecd when it starts
>> up shows the following excerpt:
>> 
>> [pid   997] close(0)                    = 0
>> [pid   997] getpid()                    = 997
>> [pid   997] rt_sigaction(SIGRT_0, {SIG_DFL}, NULL, 8) = 0
>> [pid   997] rt_sigaction(SIGRT_1, {SIG_DFL}, NULL, 8) = 0
>> [pid   997] rt_sigaction(SIGRT_2, {SIG_DFL}, NULL, 8) = 0
>> [pid   997] execve("/var/cfengine/sbin/cfagent", ["/var/cfengine/sbin/cfagent", "-z"], [/* 57 vars */]) = 0
>> [pid   997] fcntl(0, F_GETFD)           = -1 EBADF (Bad file descriptor)
>> [pid   997] --- SIGSEGV (Segmentation fault) ---
>> <... read resumed> "", 4096)            = 0
>> --- SIGCHLD (Child exited) ---
>> 
>> Translation: cfexecd tried to exec cfagent -z.  Cfagent started, but
>> before main() was invoked the process initialization routine tried 
>> to see if stdin should be preserved across the exec.  Stdin was already
>> closed, though, so fcntl() segfaulted before cfagent really had a chance
>> to run.  
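
The strace excerpt above shows fcntl(0, F_GETFD) returning -1 with EBADF, with the SIGSEGV reported afterwards, which suggests the call itself returned normally. A minimal sketch for confirming that behaviour on a given host follows; probe_closed_fd() is a hypothetical name, and it uses a scratch pipe descriptor rather than fd 0 so the caller's own stdin is left alone.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Query a descriptor that has just been closed.
   Returns the errno set by fcntl(), 0 if fcntl() unexpectedly
   succeeded, or -1 if the scratch pipe could not be created. */
int probe_closed_fd(void)
{
    int pd[2];

    if (pipe(pd) != 0)
        return -1;

    close(pd[0]);
    close(pd[1]);

    errno = 0;
    if (fcntl(pd[0], F_GETFD) == -1)
        return errno;   /* expected: EBADF, reported as an ordinary error */

    return 0;
}
```

If probe_closed_fd() returns EBADF and the process keeps running, fcntl() is handling the closed descriptor as documented, and the crash has to come from somewhere else.
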
>> 
>> I traced this down to one line in cfpopen.c which seemed to be the 
>> trigger for this behavior, line 89:
>> 
>> if (pid == 0)
>>     {
>>     switch (*type)
>>        {
>>        case 'r':
>> 
>>            /* THIS CLOSE IS THE TRIGGER LINE FOR THE BUG */
>>            close(pd[0]);        /* Don't need output from parent */
>> 
>>            if (pd[1] != 1)
>>               {
>>               dup2(pd[1],1);    /* Attach pp=pd[1] to our stdout */
>>               dup2(pd[1],2);    /* Merge stdout/stderr */
>>               close(pd[1]);
>>               }
>> 
>>            break;
>> 
>> This is the line that actually closes stdin for the newly created
>> child process.  If I comment it out, cfagent runs beautifully.
>> If I leave it in, it bombs when cfexecd starts up.  
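
One possible reading of why close(pd[0]) ends up closing descriptor 0: pipe() hands out the lowest free descriptor numbers, so if stdin was already closed when cfpopen() called pipe(), the read end would typically be 0. That cfexecd had closed its stdin beforehand is an assumption here, not something stated in the thread, and pipe_read_end_without_stdin() is a hypothetical name. The sketch runs the experiment in a forked child so the caller's stdin is not disturbed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Report which descriptor number pipe() assigns to its read end in a
   process whose fd 0 is closed. Returns that number, or -1 on failure. */
int pipe_read_end_without_stdin(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        int pd[2];

        close(0);                 /* mimic a daemon that closed stdin */
        if (pipe(pd) != 0)
            _exit(255);           /* sentinel: pipe() failed */
        _exit(pd[0]);             /* pass the read-end number back */
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

A return value of 0 means the pipe's read end took over descriptor 0, in which case the close(pd[0]) in the child is closing the pipe end that happens to sit at 0 rather than a real stdin.
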
>> 
>> Now, I would argue that this is probably a bug in fcntl, since it
>> should do some sort of error checking and return -1 with errno
>> set, rather than just segfaulting.  Still, this code has been failing
>> on more than one OS.  The 7 machines it has trouble on are a mixture
>> of HP-UX, Linux and Solaris. 
>> 
>> Has anyone else seen this?  What would the implications be of *not*
>> closing the child's stdin before execing cfagent?  My brief analysis
>> leads me to believe that it would be pretty safe, but I haven't looked
>> into every call to cfpopen() in all parts of the code.  
>> 
>> Anyway, I'm not sure what the final fix for this is, but it seems 
>> that keeping stdin open might be a good one.
>> 
>>      David
>> 
>> 
>> -- 
>> David J. Bianco, GSEC                <address@hidden>
>> Thomas Jefferson National Accelerator Facility
>> 
>>      The views expressed herein are solely those of the author and
>>          not those of SURA/Jefferson Lab or the US DOE.
>> 
>> 
>> 
>> _______________________________________________
>> Bug-cfengine mailing list
>> address@hidden
>> http://mail.gnu.org/mailman/listinfo/bug-cfengine



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272            Email:  address@hidden
Fax : +47 22453205            WWW  :  http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~





