Hi,
Here's another version that crashes quickly with "very high
probability".
(cond-expand
  (chicken-5 (import (chicken base))
             (import (chicken time))
             (import srfi-18))
  (else (import chicken)
        (use srfi-18)))
(define m (make-mutex))
(print "@@ " (current-thread) " " "lock")
(mutex-lock! m)
(define t (current-milliseconds))
(define (get-tosleep)
  (/ (floor (* 1000 (- (+ t .030) (current-milliseconds)))) 1000))
(thread-start!
 (make-thread (lambda ()
                ;; (thread-sleep! .01)
                (print "@@ " (current-thread) " " "lock")
                (let lp ()
                  (when (not (mutex-lock! m (get-tosleep)))
                    (thread-yield!)
                    (lp)))
                (print "@@ " (current-thread) " " "unlock")
                (mutex-unlock! m))))
(print "@@ " (current-thread) " " "sleep")
(thread-sleep! (get-tosleep))
(print "@@ " (current-thread) " " "unlock")
(mutex-unlock! m)
(thread-yield!)
(thread-sleep! .01)
(print "All ok!!")
--- typical output of a failing execution:
$ stdbuf -oL -eL ./t |& cat -n
1 @@ #<thread: primordial> lock
2 #<thread: primordial>: locking #<mutex>
3 @@ #<thread: primordial> sleep
4 #<thread: primordial> blocks for timeout 933.0
5 ==================== scheduling, current: #<thread: primordial>, ready: (#<thread: thread1>)
6 timeout: #<thread: primordial> -> 933.0 (now: 904)
7 switching to #<thread: thread1>
8 @@ #<thread: thread1> lock
9 #<thread: thread1>: locking #<mutex>
10 #<thread: thread1> blocks for timeout 933.0
11 #<thread: thread1> sleeping on mutex mutex0
12 ==================== scheduling, current: #<thread: thread1>, ready: ()
13 timeout: #<thread: primordial> -> 933.0 (now: 904)
14 timeout: #<thread: primordial> -> 933.0 (now: 934)
15 timeout expired for #<thread: primordial>
16 unblocking: #<thread: primordial>
17 timeout: #<thread: thread1> -> 933.0 (now: 934)
18 timeout expired for #<thread: thread1>
19 unblocking: #<thread: thread1>
20 switching to #<thread: primordial>
21 @@ #<thread: primordial> unlock
22 #<thread: primordial>: unlocking mutex0
23
24 Error: (mutex-unlock) Internal scheduler error: unknown thread state
25 #<thread: thread1>
26 ready
27
28 Call history:
29
30 t.scm:27: chicken.base#print
31 t.scm:28: get-tosleep
32 t.scm:15: chicken.time#current-milliseconds
33 t.scm:15: scheme#floor
34 t.scm:15: scheme#/
35 t.scm:28: srfi-18#thread-sleep!
36 t.scm:29: srfi-18#current-thread
37 t.scm:29: chicken.base#print
38 t.scm:30: srfi-18#mutex-unlock! <--
(Line 15 of the output comes from an extra debug message.
To get it, add (dbg "timeout expired for " tto) to the true branch of
(if (>= now tmo1) ; timeout reached?
in ##sys#schedule.)
--- The issue
mutex-unlock! assumes that a thread taken off the mutex's waiting
list can never be in the 'ready state.
The output above shows how a thread waiting on a mutex can end up
in the 'ready state anyway:
line 2: The mutex is locked by primordial thread (pt)
line 4: The pt goes to sleep until 933.0
line 7: As the pt goes to sleep thread1 is scheduled to run
line 10: thread1 tries to lock the mutex, but sets a timeout that
happens to be at time 933.0
lines 12-14: Both threads asleep, time advances to 934
lines 15-16: pt gets put on the ready list
lines 17-19: thread1 gets put on the ready list
line 20: pt starts running
lines 21-22: pt executes mutex-unlock! while thread1 is ready to run
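The sequence above can be replayed with a small toy model. This is only a sketch: the vectors, the list names and the set of accepted states are made up for illustration and are not the real ##sys#schedule or mutex-unlock! code. It just shows that expiring both timeouts in one scheduler pass leaves thread1 on the mutex's waiting list in the 'ready state.

```scheme
;; Toy model only: a thread is a vector #(name state).
(define (make-toy-thread name) (vector name 'running))
(define (toy-name t) (vector-ref t 0))
(define (toy-state t) (vector-ref t 1))
(define (toy-state-set! t s) (vector-set! t 1 s))

(define pt (make-toy-thread 'primordial))
(define thread1 (make-toy-thread 'thread1))

;; Situation after output line 11: pt sleeps until 933, thread1 waits
;; on the mutex with a timeout that is also 933.
(toy-state-set! pt 'blocked)
(toy-state-set! thread1 'sleeping)
(define timeouts (list (cons 933 pt) (cons 933 thread1)))
(define mutex-waiting (list thread1))

;; One scheduler pass: every thread whose deadline has passed becomes
;; 'ready.  The mutex waiting list is not touched here.
(define (expire-timeouts! now)
  (for-each (lambda (entry)
              (when (>= now (car entry))
                (toy-state-set! (cdr entry) 'ready)))
            timeouts))

;; State check on the first waiter.  Accepting only 'blocked and
;; 'sleeping is an assumption of this sketch.
(define (unlock-check)
  (let ((waiter (car mutex-waiting)))
    (if (memq (toy-state waiter) '(blocked sleeping))
        'ok
        (list 'unknown-thread-state (toy-name waiter) (toy-state waiter)))))

(expire-timeouts! 934)      ; output lines 14-19
(display (unlock-check))    ; output lines 21-26
(newline)
;; prints (unknown-thread-state thread1 ready)
```

If the two deadlines differ, say 933 for pt and 940 for thread1, the same pass leaves thread1 in 'sleeping and the check returns ok, which would explain why the crash depends on both timeouts landing on the same time.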
--- A fix
Just allow the 'ready state for threads in mutex-unlock!
In the patch I arbitrarily call ##sys#schedule after removing a thread
from the list, but I think doing nothing would work equally well.
Is this a correct fix?
Sorry, I can't help with that one.
Maybe it's possible that there are other threads on the waiting list,
but the thread that gets removed is not going to lock the mutex:
There are 3 threads in this scenario, A, B and C.
* A locks mutex
* A sleeps until t
* B tries to lock mutex until t
* C tries to lock mutex
* A and B are woken up at t
* A unlocks mutex, frees B
* B is scheduled to run as per the patch
* B finds out about the timeout, gives up and starts doing something else
* Now thread C is waiting on the mutex but no-one is going to free it!
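The three-thread scenario can be played through with another toy model. Again this is only a sketch under assumptions: the data layout is invented, and toy-unlock! models a mutex-unlock! that tolerates a 'ready waiter by removing it from the list without handing it the mutex, which may not match what the real patch does.

```scheme
;; Toy model of the A/B/C scenario; names and layout are made up.
(define states (list (cons 'A 'running)
                     (cons 'B 'sleeping)
                     (cons 'C 'sleeping)))
(define (state-of name) (cdr (assq name states)))
(define (set-state! name s) (set-cdr! (assq name states) s))

(define owner 'A)          ; A holds the mutex
(define waiting '(B C))    ; B waits with a timeout, C without one

;; Time t arrives: B's lock timeout expires, so B becomes 'ready
;; while it is still on the waiting list.
(set-state! 'B 'ready)

;; Unlock as assumed above: only the first waiter is removed.  A
;; sleeping waiter gets the mutex; a 'ready one is just dropped.
(define (toy-unlock!)
  (set! owner #f)
  (unless (null? waiting)
    (let ((w (car waiting)))
      (set! waiting (cdr waiting))
      (when (eq? (state-of w) 'sleeping)
        (set! owner w)
        (set-state! w 'ready)))))

(toy-unlock!)              ; A unlocks mutex, frees B
(set-state! 'B 'running)   ; B notices the timeout and moves on

(display (list 'owner owner 'waiting waiting 'C (state-of 'C)))
(newline)
;; prints (owner #f waiting (C) C sleeping)
```

In this model the mutex ends up with no owner while C is still asleep on its waiting list, which is the situation described in the last step. A variant that keeps walking the list until it finds a waiter that is still sleeping would wake C here, but whether that is right for the real scheduler is exactly the open question.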