bug-commoncpp

Re: Thread::isRunning() returning false even though thread hasn't fully


From: Matt Scifo
Subject: Re: Thread::isRunning() returning false even though thread hasn't fully terminated
Date: Wed, 19 Nov 2003 17:56:10 -0800

I've noticed some other interesting things regarding this issue.

Though I didn't mention it in my original post, the code that launches
my threads is part of a shared library that I link in with my main
application.  

Here is the code that loads the modules...

    DSO *handle;
    try {
        // Load the shared object for this module.
        handle = new DSO(filename, false);
    } catch (DSO *error) {
        cout << "Error loading dso: " << error->getError() << endl;
        return false;
    }

    // Look up the extern "C" entry points exported by the module.
    load_module_t* load_module =
        (load_module_t*) (*handle)["load_module"];
    create_instance_t* create_instance =
        (create_instance_t*) (*handle)["create_instance"];
    destroy_instance_t* destroy_instance =
        (destroy_instance_t*) (*handle)["destroy_instance"];


load_module(), create_instance(), and destroy_instance() are extern "C"
functions that are included in each C++ module.
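
For readers unfamiliar with this pattern, here is a minimal sketch of what
such a module might look like.  The Module base class, the MyThreadModule
class, and the exact signatures behind load_module_t, create_instance_t,
and destroy_instance_t are assumptions made for illustration; the real
modules are not shown in this post.

```cpp
#include <string>

// Hypothetical interface shared by the application and its modules.
class Module {
public:
    virtual ~Module() {}
    virtual std::string name() const = 0;
};

// Hypothetical concrete module.
class MyThreadModule : public Module {
public:
    std::string name() const { return "MyThread"; }
};

// Function types matching the casts used by the loader code.
// The signatures are assumed, not taken from the real application.
typedef bool load_module_t();
typedef Module* create_instance_t();
typedef void destroy_instance_t(Module*);

// extern "C" disables C++ name mangling, so the loader can find these
// symbols by their plain names ("load_module", and so on).
extern "C" {
    bool load_module() { return true; }
    Module* create_instance() { return new MyThreadModule; }
    void destroy_instance(Module* m) { delete m; }
}
```

Because the module both allocates and frees its own objects, every
instance has to be handed back to destroy_instance() before the DSO handle
is deleted; after that point the module's code is no longer mapped.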

I noticed something funny when I added code to delete the dso handle. 
Immediately upon issuing a "delete handle" for the module MyThread, all
the threads terminate and show the proper GDB message 
"[Thread xxxxxxx (zombie) exited]".  So for some reason, threads are not
fully terminating until seconds after they have exited their run()
method and have been deleted, yet terminate immediately upon deleting
the dso handle that was used to load the module.  How strange is that?

So, while not fully satisfied that this was a valid workaround, this at
least gave me a way to make sure all threads were really terminating
before continuing.


The continuation is now causing me another thread headache.  After all
the threads have been deleted (as the result of a SIGHUP handler), my
application reloads its configuration and attempts to recreate the
threads.  Note that it is fully recreating them, not trying to restart
the old threads (which don't even exist since they were deleted).  I
noticed that while the code to recreate the threads finished 
successfully without error, only a few threads were actually created. 
It didn't make any sense.  Until I did an "info threads" in GDB.

(gdb) info threads
// 35 MyThread module threads
  289 Thread -1080124624 (LWP 3460)  0xffffe002 in ?? ()
  288 Thread -1087489232 (LWP 3459)  0xffffe002 in ?? ()
  287 Thread -1094853840 (LWP 3458)  0xffffe002 in ?? ()
  286 Thread -1102218448 (LWP 3457)  0xffffe002 in ?? ()
  285 Thread -1109583056 (LWP 3456)  0xffffe002 in ?? ()
  284 Thread -1116947664 (LWP 3455)  0xffffe002 in ?? ()
  283 Thread -1124312272 (LWP 3454)  0xffffe002 in ?? ()
  282 Thread -1131676880 (LWP 3453)  0xffffe002 in ?? ()
  281 Thread -1139041488 (LWP 3452)  0xffffe002 in ?? ()
  280 Thread -1146406096 (LWP 3451)  0xffffe002 in ?? ()
  279 Thread -1153770704 (LWP 3450)  0xffffe002 in ?? ()
  278 Thread -1161135312 (LWP 3449)  0xffffe002 in ?? ()
  277 Thread -1168499920 (LWP 3448)  0xffffe002 in ?? ()
  276 Thread -1175864528 (LWP 3447)  0xffffe002 in ?? ()
  275 Thread -1183229136 (LWP 3446)  0xffffe002 in ?? ()
  274 Thread -1190593744 (LWP 3445)  0xffffe002 in ?? ()
  273 Thread -1197958352 (LWP 3444)  0xffffe002 in ?? ()
  272 Thread -1205322960 (LWP 3443)  0xffffe002 in ?? ()
  271 Thread -1212687568 (LWP 3442)  0xffffe002 in ?? ()
  270 Thread -1220052176 (LWP 3441)  0xffffe002 in ?? ()
  269 Thread -1227416784 (LWP 3440)  0xffffe002 in ?? ()
  268 Thread -1234781392 (LWP 3439)  0xffffe002 in ?? ()
  267 Thread -1242146000 (LWP 3438)  0xffffe002 in ?? ()
  266 Thread -1249510608 (LWP 3437)  0xffffe002 in ?? ()
  265 Thread -1256875216 (LWP 3436)  0xffffe002 in ?? ()
  264 Thread -1264239824 (LWP 3435)  0xffffe002 in ?? ()
  263 Thread -1271604432 (LWP 3434)  0xffffe002 in ?? ()
  262 Thread -1278969040 (LWP 3433)  0xffffe002 in ?? ()
  261 Thread -1286333648 (LWP 3432)  0xffffe002 in ?? ()
  260 Thread -1293698256 (LWP 3431)  0xffffe002 in ?? ()
  259 Thread -1301062864 (LWP 3430)  0xffffe002 in ?? ()
  258 Thread -1308427472 (LWP 3429)  0xffffe002 in ?? ()
  257 Thread -1315792080 (LWP 3428)  0xffffe002 in ?? ()
  256 Thread -1323156688 (LWP 3427)  0xffffe002 in ?? ()
  255 Thread -1330521296 (LWP 3426)  0xffffe002 in ?? ()
  254 Thread -1337885904 (LWP 3425)  0xffffe002 in ?? ()
// recreated module_handler thread
  253 Thread -1345250512 (LWP 3424)  0xffffe002 in ?? ()
// main process
* 1 Thread 1086991616 (LWP 3166)  0xffffe002 in ?? ()    

The number of threads my application creates is variable and can be
increased/decreased by calling a function and then of course reloading,
which is explained above.  On Linux, the number of user-creatable
processes is quite high, but it is limited further by memory.  If the
required stack size for a thread is higher than the amount of free
memory, then boom, you can't create any more threads, even if the OS says
your limit is much higher.
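
The two limits discussed here can also be read from inside the process.
The following is a small sketch using getrlimit(); the helper names
soft_limit() and print_thread_limits() are made up for this example, and
RLIMIT_NPROC is a Linux/BSD resource rather than a strictly POSIX one.

```cpp
#include <sys/resource.h>
#include <iostream>

// Reads the soft (currently enforced) limit for one resource.
// Returns false if getrlimit() fails.
bool soft_limit(int resource, rlim_t& out) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return false;
    out = rl.rlim_cur;
    return true;
}

// Prints the stack size and process limits, similar to "ulimit -s -u".
void print_thread_limits() {
    rlim_t value;
    if (soft_limit(RLIMIT_STACK, value)) {
        std::cout << "stack size (kbytes): ";
        if (value == RLIM_INFINITY)
            std::cout << "unlimited" << std::endl;
        else
            std::cout << (value / 1024) << std::endl;  // bytes to KB
    }
    if (soft_limit(RLIMIT_NPROC, value)) {
        std::cout << "max user processes: ";
        if (value == RLIM_INFINITY)
            std::cout << "unlimited" << std::endl;
        else
            std::cout << value << std::endl;
    }
}
```

Calling print_thread_limits() at startup makes it easy to log which limits
were actually in effect when thread creation started to fail.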

I am currently launching 250 threads of the MyThread module, along with
the main process and another module_handler thread.  So 252 threads
total.  

My current environment settings...

$ ulimit -s -u
stack size            (kbytes, -s) 7192
max user processes            (-u) 7168

That means that each thread I launch reserves 7192 KB of memory for its stack.  When
my application HUPs it (supposedly) terminates the MyThread module
threads and the module_handler thread.  However, when it recreates the
module_handler and (in this case) the 250 MyThread module threads, GDB
shows that the thread count starts at 253.  From there it only launches
36 more threads (1 module_handler and 35 MyThread module threads) out of
251 threads before it reaches the operating system limit (due to memory
and required thread stack size) of 289 threads.
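
As a rough check of the numbers above: 289 threads at 7192 KB each come to
2,078,488 KB, or just under 2 GB of stack reservations.  The sketch below
only does that arithmetic; the function names are made up for this
example, and treating the stack reservation as the single limiting factor
is an assumption.

```cpp
#include <cassert>

// Total stack reservation, in KB, for a given number of threads.
unsigned long long total_stack_kb(unsigned long long threads,
                                  unsigned long long stack_kb) {
    return threads * stack_kb;
}

// Number of whole thread stacks that fit into a memory budget (in KB).
unsigned long long max_threads(unsigned long long budget_kb,
                               unsigned long long stack_kb) {
    assert(stack_kb > 0);
    return budget_kb / stack_kb;
}
```

With these helpers, total_stack_kb(289, 7192) gives 2078488, and
max_threads(2078488, 2048) gives 1014, so the same budget would hold about
three and a half times as many threads with a 2 MB stack.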

For some reason, threads are not properly terminating and registering
that termination with the kernel.

David, could this be a bug in the way commoncpp handles the termination
of threads?  This is really frustrating.  Can anyone help?

Thanks

Matt Scifo




