Sorry for the late reply; my email client managed to hide your email where I wouldn't see it. I need to fix this.
You'll need to look at the stack frames on the Scheme stack. It can be
done from GDB if necessary, but it might be sufficient to use Guile's
My first thought was to suggest ",break lock-mutex", but that doesn't
work, presumably because it's a C primitive (although we should fix
that), but I was able to patch in a pure Scheme wrapper for it, which
then allows us to set a Scheme breakpoint:
I'll poke around and report back. Meanwhile, some orienting remarks. Since the last email, I've accumulated several CPU-months running guile-2.9.2 with zero crashes and zero hangs. So I like it! For threading, I see three or four different behaviors or modes; sometimes it works great, sometimes it doesn't work at all, and I'm still trying to figure out why.
A works-great example:
(par-for-each (lambda (stuff) (... little bit of scheme + CPU-intensive C++)) some-precomputed-list)
The "CPU-intensive C++" are calls that take at least a millisecond to run, sometimes seconds. The above will very happily use all 24 cores and deliver a 24x speedup over single-threaded. Yay!
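For anyone who wants to try the same shape without my C++ library, here is a minimal sketch of the works-great pattern. It assumes only par-for-each from (ice-9 threads); expensive-call, items and results are hypothetical names, and the busy loop is just a stand-in for the real CPU-intensive C++ call, so it will not necessarily show the same speedup.

```scheme
(use-modules (ice-9 threads))

;; Hypothetical stand-in for the CPU-intensive C++ call:
;; counts the odd integers in [0, n) using only fixnum arithmetic.
(define (expensive-call n)
  (let loop ((i 0) (acc 0))
    (if (= i n)
        acc
        (loop (+ i 1) (+ acc (logand i 1))))))

;; The precomputed list; here just the indices 0..47.
(define items (iota 48))

;; One result slot per item, so threads never write to the same slot.
(define results (make-vector (length items) #f))

;; Each item costs one coarse-grained call, as in the works-great case.
(par-for-each
 (lambda (k)
   (vector-set! results k (expensive-call 2000000)))
 items)
```

Each work item here is large compared to the cost of handing it to a thread, which is the only property of the original that this sketch tries to preserve.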
A works-poorly example:
(par-for-each (lambda (stuff) (... (fold (lambda () numeric addition after tiny C++) some-list))) list)
Here, "tiny C++" is something that just enters C++ and leaves almost immediately; it's used to grab and return a numeric value. This runs as if it were single-threaded, and delivers performance equivalent to being single-threaded. I don't know why; this is what I reported in the earlier emails.
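A minimal sketch of the works-poorly shape, for comparison. It assumes fold from (srfi srfi-1) and par-for-each from (ice-9 threads); tiny-call, inner-list, outer-list and sums are hypothetical names, and tiny-call is a pure-Scheme stand-in for the real "tiny C++" call. I am not claiming this reproduces the slowdown, only the structure: many very cheap calls folded together inside each parallel task.

```scheme
(use-modules (ice-9 threads)
             (srfi srfi-1))

;; Hypothetical stand-in for the "tiny C++" call: returns a number
;; almost immediately.
(define (tiny-call x)
  (* x 2))

;; The inner list that each task folds over, and the outer list
;; that par-for-each distributes across threads.
(define inner-list (iota 1000))
(define outer-list (iota 24))

;; One slot per outer item, so threads never share a slot.
(define sums (make-vector (length outer-list) 0))

(par-for-each
 (lambda (k)
   (vector-set! sums k
                ;; Numeric addition after each tiny call.
                (fold (lambda (x acc) (+ acc (tiny-call x)))
                      0
                      inner-list)))
 outer-list)
```

Timing this against a plain for-each over the same outer-list would be one way to check whether the pure-Scheme version shows the same single-threaded behavior, or whether the C++ boundary is needed to trigger it.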
A doesn't-work example: mostly same as "works-poorly", but performance is worse-than-single-threaded. Sometimes 2x worse. Why, I don't know. (It does seem to do a LOT of gc; that might account for all of the slowdown; not sure. These loops iterate over millions/tens-of-millions of items and can take an hour to complete...)
The worse-than-single-threaded behavior was actually the norm for guile-2.2; it's no longer the norm (yay!). In guile-2.2, there seemed to be some kind of livelock, where two threads were 1.5x faster than one, three threads were 1.2x faster than one, and four threads were slower, sometimes one-thousand-fold slower (but still making forward progress, i.e. not a deadlock). That era seems to be over, yay!
I'll report on the rest later, when I get a chance (the compute jobs are hard to manage, and take an hour to set up).