spamass-milt-list

Jamming up with mutex_lock


From: Andrew Daviel
Subject: Jamming up with mutex_lock
Date: Tue, 19 Jun 2007 02:10:42 -0700 (PDT)


I have been running a modified version of spamass-milter-0.3.1
(match_gecos, per-user rejection threshold). It worked fine in testing, but in production it jams up after a day or so. The milter process keeps running, but sendmail can no longer connect to it and logs
"error connecting to filter". Sometimes there are a few messages such as
"Milter (spamassassin): to error state"
"milter_read(spamassassin): cmd read returned 0"
earlier, though the milter continues to operate for a while after those, maybe a couple of hours.

When I look at the processes, I see two or more copies of spamass-milter
in sleep (S) state as well as the parent in sleep (Ssl) state.

If I connect to one of the processes with gdb and do a backtrace, I typically see something like
 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
 in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
 in _L_mutex_lock_29 () from /lib/tls/libc.so.6
 in strdup () from /lib/tls/libc.so.6
 in SpamAssassin::Connect (this=0x8bb01f8) at spamass-milter.cpp:1506
 in mlfi_header ... at spamass-milter.cpp:1148
from which I assume that two threads have deadlocked.
Sometimes I see "debug" in place of "strdup" in the backtrace.

I have tried replacing localtime() and strerror(), which are not thread-safe on Linux, with localtime_r() and strerror_r(), but
that does not help.
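For reference, a minimal sketch of that kind of replacement. The helper names format_local_time, error_text and strerror_result are hypothetical and are not taken from spamass-milter; the two strerror_result overloads are an assumption added so that the code compiles against either the GNU strerror_r() (returns char *) or the POSIX one (returns int).

```cpp
#include <cerrno>
#include <cstring>
#include <ctime>
#include <string>

// Hypothetical helper: format a timestamp with localtime_r(), which fills a
// caller-supplied struct tm instead of the shared static buffer that
// localtime() returns a pointer to.
std::string format_local_time(std::time_t t)
{
    struct tm tm_buf;
    if (localtime_r(&t, &tm_buf) == nullptr)
        return std::string();
    char out[32];
    if (std::strftime(out, sizeof out, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
        return std::string();
    return std::string(out);
}

// strerror_r() has two signatures. The POSIX version returns int and fills
// buf; the GNU version returns a char * that may or may not point at buf.
// Overload resolution picks the matching adapter at compile time.
inline const char *strerror_result(int rc, const char *buf)
{
    return rc == 0 ? buf : "unknown error";
}
inline const char *strerror_result(const char *msg, const char *)
{
    return msg;
}

// Hypothetical helper: thread-safe replacement for strerror(errnum).
std::string error_text(int errnum)
{
    char buf[256];
    buf[0] = '\0';
    return std::string(strerror_result(strerror_r(errnum, buf, sizeof buf), buf));
}
```

Both helpers keep their working storage on the calling thread's stack, so two threads cannot overwrite each other's result. The std::string return values still allocate, so this removes the shared static buffers but not every use of malloc.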

Elsewhere on the Web I see a comment that a hang in mutex_lock can be caused by calling malloc or printf inside a signal handler. I don't think the spamass-milter code runs inside a signal handler, though strdup and vsyslog would call malloc and printf, so it's a not-impossible explanation. I had earlier seen mutex_lock called from strlwr, but have now replaced the complex tolower() call with a much simpler 7-bit ASCII routine.
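The two precautions in the paragraph above could look roughly like this. It is a sketch only: note_signal, g_signal_seen and ascii_lower are hypothetical names, not code from spamass-milter, and nothing in the post establishes that the milter installs a handler of its own.

```cpp
#include <csignal>
#include <string>

// Flag written by the handler and polled by ordinary code.
// volatile sig_atomic_t is the object type the C and C++ standards
// guarantee a signal handler may safely write.
static volatile std::sig_atomic_t g_signal_seen = 0;

// Async-signal-safe handler: it only sets a flag. It deliberately avoids
// malloc(), printf(), syslog() and strdup(), because any of them may try to
// take a libc lock that the interrupted thread already holds.
extern "C" void note_signal(int)
{
    g_signal_seen = 1;
}

// Lowercase the 7-bit ASCII letters A-Z in place and leave every other byte
// untouched. No locale lookup and no allocation, so no libc lock is involved.
void ascii_lower(std::string &s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] = static_cast<char>(s[i] - 'A' + 'a');
    }
}
```

Any logging or cleanup that the signal calls for would then be done from normal thread context once the flag is seen, not from inside the handler.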

The somewhat similar smf-clamd milter runs without problems (similar in that it uses the same libraries and also passes mail to a daemon
for processing).

RHEL 4.3
sendmail-8.13.1-3.2.el4.i386
glibc-2.3.4-2.25.i686
kernel 2.6.9-34.0.1.ELsmp

(I doubt that my changes are directly responsible, because I've been altering them without any effect on the lock-up. Trying the stock milter on the production machine is difficult because the users expect their
whitelists to work based on match_gecos - address@hidden
-> user "juser")
--
Andrew Daviel, TRIUMF, Canada
Tel. +1 (604) 222-7376  (Pacific Time)
Network Security Manager



