help-gplusplus
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

newbie Q.: GNU STL set<>, LinuxThreads, signals, setitimer => misbehavio


From: Frantisek . Rysanek
Subject: newbie Q.: GNU STL set<>, LinuxThreads, signals, setitimer => misbehavior
Date: 5 Feb 2006 23:57:13 -0800
User-agent: G2/0.2

Dear everyone,

I'm facing a problem somewhere between G++ and GNU STL,
LinuxThreads, signals and setitimer - or maybe just my
"delete while traversing a set" algorithm.

Can't say exactly where the problem is - I have tested
all those elements individually on past occasions, but
they don't seem to work together in this case :-)

It might indeed be a multi-way clash, or it might
be a trivial coding error.

I have a for(;;) loop, iterating over a set<>, deleting
elements during the iteration.
That iteration runs in a "consumer" thread. Elements
are added to the set<> in a "producer" thread.
Both these threads run along timers, implemented
using setitimer(), and synchronize using posix mutex
calls wrapped around the critical sections touching
the set<>.
In addition, on program startup and shutdown,
the program main thread uses SIGALRM (and a bool
latch) to invoke shutdown in the two worker threads.

The problem is, that when I invoke shutdown via a SIGTERM
or SIGINT to the program main thread, the loop iteration
goes awry in an interesting way - one loop cycle runs to
the closing bracket, and then the thread starts to eat 100%
of CPU time without starting another loop cycle, but also
without breaking the loop. As if, while evaluating
the loop condition, the comparison operator of the
set<>::iterator got trapped in an endless loop...

I've discovered the problem while working on a somewhat
complex piece of software, involving several
producer/consumer/timer threads - but I've managed to
narrow down the problem to a simpler example.

You can check out the "small" demonstration example here:
http://sweb.cz/Frantisek.Rysanek/cc_test.tgz

The key part of the code can be summarized in an even
shorter form - take a look at the code snippet below.
Please note especially the for(;;) loop that attempts
to delete items of a set<> while iterating.



// <SNIPPET>

class timer_job
{
public:
   struct timeval time_of_enq;
   int check();  // returns a "1" if timed out, returns a "0" if OK
}



set<timer_job*> jobs;



void check_jobs()
{
 set<timer_job*>::iterator i, deletion_candidate;
 bool was_first;

   for (i = jobs.begin(); i != jobs.end(); i++)
   {
      // Check each job in our queue/set,
      // to see if by any chance it has timed out

      deletion_candidate = i;
      if (i == jobs.begin()) was_first = true;
      else was_first = false;

      if ((*i)->check() > 0)   // if job has timed out
      {
         // do not decrement past the array base
         if (! was_first) i--;

         delete (*deletion_candidate);  // kick off, you
         jobs.erase(deletion_candidate);

         // reset ptr to the new array base
         if (was_first) i = jobs.begin();
         // else OK, go ahead from the previous element,
         //    incrementing at loop end
      }
      // else OK, this job is still pending
   }
}

// </SNIPPET>


If I've already answered my question (can't delete
while iterating), please suggest a workaround.
In my practical scenario, I do need to keep
my data in a set<>, rather than e.g. vector<>
or queue<>, because i need to check for unique
occurrence when inserting the set elements, and
the tree rebalancing doesn't hurt that bad.
In the past, when on one occasion I used GLIB gtrees,
I used to stuff my elected "deletion candidates"
into a scratchpad vector<> during the traversal,
and deleted from there afterwards...

In the real-world software piece that I'm working on,
the loop goes nuts, in the same way, even without any
signals flying or threads terminating.

I've tried using sleep() and usleep() for the timing,
which didn't make a difference. Blocking SIGALRM
for the critical sections doesn't make a difference
either.

If I add debugging instrumentation (syslog calls)
to any of the threads, it often does change behavior a bit
- the flaw occurs less frequently, it happens in another
similar thread, two threads go nuts at the same time etc.
(At timing periods in the hundreds of milliseconds, syslog
calls tend to flood syslog and load the system in general.)

If I comment out the whole for(;;) loop, the example code
runs and terminates just fine.

Both the example and my real-world code misbehave
consistently if compiled by GCC 2.96 on Linux 2.2
(RH 6.2) or GCC 3.x on Linux 2.4 (RH 8.0, MDK 9.1),
and on various hardware platforms:
Intel Pentium MMX @ 166 MHz,
Cyrix/NS/AMD Geode GX1 @ 233 MHz,
Intel P4 @ 1.7 GHz.

I sure have further ideas of what else to try, but maybe
someone will point out my mistake straight away, before
I take the time to try all the dead ends :-)

Specifically I'd be interested to know if some of the
components that I'm (ab)using are potentially thread-
and signal-unsafe: GNU STL, SGI STL, GNU libstdc++,
GNU G++, LinuxThreads, setitimer()...

Please note that I'm calling setitimer() from several
threads in parallel, and there does seem to be a per-thread
timer, at least in LinuxThreads, not sure about NPTL.

I'd crosspost to comp.lang.c++, but I'd hate to become
yet another off-topic poster in that very focused
and useful newsgroup.

Thanks for any ideas

Frank Rysanek



reply via email to

[Prev in Thread] Current Thread [Next in Thread]