spamass-milt-list

Re: Jamming up with mutex_lock


From: Andrew Daviel
Subject: Re: Jamming up with mutex_lock
Date: Wed, 20 Jun 2007 17:54:01 -0700 (PDT)


On Tue, 19 Jun 2007, Dan Nelson wrote:

> Try running "thread apply all bt" to get stack traces of all threads at
> once.  If it's really a deadlock, you should see at least one other hung
> thread with a different stack trace.

Thanks for the tip.

Recently I have not hit the mutex_lock hang (the case where I had several
distinct processes running). But I don't entirely understand the thread
behaviour.

On a test machine with no load, ps -efL shows 2 threads. When a message
arrives, a new thread is started to handle it; it exits when the message
is delivered, times out, or the SMTP session is cancelled. If I send 2
messages at once I get 4 threads in total.

But on the production machine I see about 20 threads. Several of them
have been around for an hour, far in excess of the sendmail timeouts,
which are set to 5 minutes total. spamd generally takes less than 5
seconds, with a 2-sigma value of maybe 60 seconds. If I look at the
threads with gdb, none of them are in mutex_wait, but one is in
do_sigwait.

I tried running the original code for a while and I see the same thing:
several threads exceeding the timeout values. It may be that if I just
let it run for a couple of days it will end up in the mutex_lock state.

I was thinking of logging the thread ID with gettid() to correlate with
the mail logs, but I can't find which library it's in.
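
For what it's worth, older glibc releases do not export a gettid()
wrapper at all, so the usual workaround on Linux is to invoke the system
call directly. The following is only a sketch under that assumption; the
helper names current_tid() and log_with_tid() are made up for
illustration and are not part of the milter. The value returned should
correspond to the LWP column shown by ps -efL.

```cpp
#include <sys/syscall.h>   // SYS_gettid
#include <sys/types.h>     // pid_t
#include <syslog.h>        // syslog(), LOG_DEBUG
#include <unistd.h>        // syscall(), getpid()

// Hypothetical helper: return the kernel thread ID of the calling thread.
// Linux-specific; uses the raw system call because no library wrapper
// is assumed to exist.
inline pid_t current_tid()
{
    return static_cast<pid_t>(syscall(SYS_gettid));
}

// Hypothetical helper: prefix a log message with the thread ID so that
// it can be matched against ps -efL output and the mail logs.
inline void log_with_tid(const char* msg)
{
    syslog(LOG_DEBUG, "[tid %ld] %s",
           static_cast<long>(current_tid()), msg);
}
```

Calling log_with_tid("message accepted") at the start and end of each
per-message thread would be one way to see which threads outlive the
timeouts.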

> just get the wrong time or the wrong error message.  localtime is only
> used here to convert the current time, and strerror won't get called
> unless there's already something else wrong.

I've taken out strerror_r; it's too painful. At home with glibc-2.3.6
the POSIX version is the default, but on the production machine with
glibc-2.3.4 the GNU version is the default. Compiling a test C program
with -D_XOPEN_SOURCE=600, as per string.h, worked, but the milter,
compiled as C++, still got the GNU version.
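
A possible explanation for the C++ result is that g++ predefines
_GNU_SOURCE on Linux, which selects the GNU strerror_r regardless of
_XOPEN_SOURCE. If strerror_r were to be kept, one way around the
mismatch would be to let C++ overloading pick the right handling at
compile time. This is a sketch only; strerror_result() and
safe_strerror() are hypothetical names, not functions from the milter.

```cpp
#include <cstring>   // strerror_r
#include <string>

// Chosen when strerror_r is the POSIX/XSI variant: it returns int,
// 0 on success, and writes the message into buf.
inline const char* strerror_result(int rc, const char* buf)
{
    return rc == 0 ? buf : "unknown error";
}

// Chosen when strerror_r is the GNU variant: it returns a char* that
// may or may not point into buf.
inline const char* strerror_result(const char* msg, const char* /*buf*/)
{
    return msg;
}

// Hypothetical thread-safe replacement for strerror(): the compiler
// selects the overload matching whichever strerror_r is declared.
inline std::string safe_strerror(int errnum)
{
    char buf[256] = "";
    return strerror_result(strerror_r(errnum, buf, sizeof buf), buf);
}
```

The point of the overload pair is that no feature-test macros need to be
checked, so the same source should compile against either default.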

As per Joe Maimon's note, getpwent may not be thread-safe. The manpage
on a RHEL 4.4 system mentions this, but the manpages on our RHEL 4.3
production system and on my FC4 system at home do not (and at home I had
carefully searched ALL section 3 manpages for the word "thread" ...).
Also, with getpwent_r/getpwent the file pointer may be reset by another
thread, so I have put a mutex_lock around the match_gecos sections.
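
The shape of that locking is roughly as follows. This is an illustrative
sketch, not the code from the patch: gecos_contains() and pw_mutex are
hypothetical names, and the substring test merely stands in for whatever
match_gecos really does. The whole setpwent/getpwent/endpwent scan is
kept inside one critical section so that another thread cannot rewind or
advance the shared position in the middle of the loop.

```cpp
#include <pthread.h>
#include <pwd.h>      // setpwent(), getpwent(), endpwent()
#include <cstring>    // std::strstr

// One process-wide mutex guarding every use of the passwd iterator.
static pthread_mutex_t pw_mutex = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical stand-in for the match_gecos logic: returns true if any
// passwd entry has a GECOS field containing needle.
bool gecos_contains(const char* needle)
{
    bool found = false;

    pthread_mutex_lock(&pw_mutex);
    setpwent();                                  // rewind to the first entry
    while (struct passwd* pw = getpwent()) {
        if (pw->pw_gecos && std::strstr(pw->pw_gecos, needle)) {
            found = true;
            break;
        }
    }
    endpwent();                                  // release the database
    pthread_mutex_unlock(&pw_mutex);

    return found;
}
```

The lock is held for the full scan, which serialises lookups; that seems
an acceptable trade-off if the passwd database is small.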

Current diffs
http://andrew.triumf.ca/email/spamass-milter.patch.jun20

--
Andrew Daviel, TRIUMF, Canada
Tel. +1 (604) 222-7376  (Pacific Time)
Network Security Manager



