chicken-hackers

Re: [Chicken-hackers] PATCH: Re: Regarding #1564: srfi-18: (mutex-unlock


From: Jörg F. Wittenberger
Subject: Re: [Chicken-hackers] PATCH: Re: Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Date: 03 Dec 2018 22:26:34 +0100

Attached is a version against master.

This only appears to be correct :-/ (it is untested): I messed up the prefix, so srfi-18 did not load, and I will not find the time to come back to the issue any time soon.

On Dec 3 2018, Jörg F. Wittenberger wrote:

Attached is a patch against 4.13.

master is still compiling.

On Nov 30 2018, megane wrote:

Hi,

Here's another version that crashes quickly with "very high
probability".

(cond-expand
  (chicken-5 (import (chicken base))
             (import (chicken time))
             (import srfi-18))
  (else (import chicken)
        (use srfi-18)))

(define m (make-mutex))

(print "@@ " (current-thread) " " "lock")
(mutex-lock! m)

(define t (current-milliseconds))
(define (get-tosleep)
 (/ (floor (* 1000 (- (+ t .030) (current-milliseconds)))) 1000))

(thread-start!
(make-thread (lambda ()
               ;; (thread-sleep! .01)
               (print "@@ " (current-thread) " " "lock")
               (let lp ()
                 (when (not (mutex-lock! m (get-tosleep)))
                   (thread-yield!)
                   (lp)))
               (print "@@ " (current-thread) " " "unlock")
               (mutex-unlock! m))))
(print "@@ " (current-thread) " " "sleep")
(thread-sleep! (get-tosleep))
(print "@@ " (current-thread) " " "unlock")
(mutex-unlock! m)
(thread-yield!)
(thread-sleep! .01)
(print "All ok!!")

--- typical output of a failing execution:

$ stdbuf -oL -eL ./t |& cat -n
    1   @@ #<thread: primordial> lock
    2   #<thread: primordial>: locking #<mutex>
    3   @@ #<thread: primordial> sleep
    4   #<thread: primordial> blocks for timeout 933.0
    5   ==================== scheduling, current: #<thread: primordial>, ready: (#<thread: thread1>)
    6   timeout: #<thread: primordial> -> 933.0 (now: 904)
    7   switching to #<thread: thread1>
    8   @@ #<thread: thread1> lock
    9   #<thread: thread1>: locking #<mutex>
   10   #<thread: thread1> blocks for timeout 933.0
   11   #<thread: thread1> sleeping on mutex mutex0
   12   ==================== scheduling, current: #<thread: thread1>, ready: ()
   13   timeout: #<thread: primordial> -> 933.0 (now: 904)
   14   timeout: #<thread: primordial> -> 933.0 (now: 934)
   15   timeout expired for #<thread: primordial>
   16   unblocking: #<thread: primordial>
   17   timeout: #<thread: thread1> -> 933.0 (now: 934)
   18   timeout expired for #<thread: thread1>
   19   unblocking: #<thread: thread1>
   20   switching to #<thread: primordial>
   21   @@ #<thread: primordial> unlock
   22   #<thread: primordial>: unlocking mutex0
   23
   24   Error: (mutex-unlock) Internal scheduler error: unknown thread state
   25   #<thread: thread1>
   26   ready
   27
   28           Call history:
   29
   30           t.scm:27: chicken.base#print
   31           t.scm:28: get-tosleep
   32           t.scm:15: chicken.time#current-milliseconds
   33           t.scm:15: scheme#floor
   34           t.scm:15: scheme#/
   35           t.scm:28: srfi-18#thread-sleep!
   36           t.scm:29: srfi-18#current-thread
   37           t.scm:29: chicken.base#print
   38           t.scm:30: srfi-18#mutex-unlock!         <--

Note: line 15 is an extra debug message. To get it, add
(dbg "timeout expired for " tto) to the true branch of this test
in ##sys#schedule:

(if (>= now tmo1) ; timeout reached?

--- The issue
mutex-unlock! assumes that a thread it removes from the mutex's
waiting list cannot be in the 'ready state.

The output above shows how a thread waiting on a mutex can end up
in the 'ready state:

line  2: The mutex is locked by the primordial thread (pt)
line  4: The pt goes to sleep until 933.0
line  7: As the pt goes to sleep, thread1 is scheduled to run
line 10: thread1 tries to lock the mutex with a timeout that
         happens to be the same, 933.0

lines 12-14: Both threads are asleep; time advances to 934
lines 15-16: The pt gets put on the ready list
lines 17-19: thread1 gets put on the ready list
line 20: The pt starts running
lines 21-22: The pt executes mutex-unlock! while thread1 is ready to run
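To make the timing concrete, here is a minimal Scheme model of the (>= now tmo1) test quoted above, assuming a simplified scheduler that checks every waiting thread in one pass. The record layout, the helper names (make-thread-rec, expire-timeouts) and the state symbols are hypothetical; the real ##sys#schedule works on CHICKEN's internal thread objects and its own timeout queue.

```scheme
;; Hypothetical model, not CHICKEN's ##sys#schedule.
;; A thread record is (name state timeout); state symbols are illustrative.
(define (make-thread-rec name state timeout) (list name state timeout))
(define (thread-rec-name t) (list-ref t 0))
(define (thread-rec-state t) (list-ref t 1))
(define (thread-rec-timeout t) (list-ref t 2))

;; One scheduler pass: every waiting thread whose timeout has been
;; reached (>= now timeout) is made 'ready, in the same pass.
(define (expire-timeouts threads now)
  (map (lambda (t)
         (if (and (memq (thread-rec-state t) '(sleeping blocked))
                  (>= now (thread-rec-timeout t)))
             (make-thread-rec (thread-rec-name t) 'ready (thread-rec-timeout t))
             t))
       threads))

;; Situation before output line 13: the pt sleeps until 933.0 and
;; thread1 waits on the mutex with the same deadline.
(define before
  (list (make-thread-rec 'primordial 'sleeping 933.0)
        (make-thread-rec 'thread1 'blocked 933.0)))

(display (map thread-rec-state (expire-timeouts before 904))) ; prints (sleeping blocked)
(newline)
(display (map thread-rec-state (expire-timeouts before 934))) ; prints (ready ready)
(newline)
```

Because both deadlines are 933.0, the single pass at now = 934 makes both threads ready, so when the pt runs first and calls mutex-unlock!, thread1 is still on the mutex's waiting list but already in the 'ready state.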

--- A fix

Just allow the 'ready state for threads in mutex-unlock!

In the patch I arbitrarily call ##sys#schedule after removing a thread
from the list, but I think doing nothing would work equally well.
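A minimal sketch of the decision being changed, reduced to the waiter's state alone. This is hypothetical: the procedure names are invented, the real code in srfi-18's mutex-unlock! operates on thread objects, and the assumption that the old code accepted exactly 'blocked and 'sleeping is not taken from the message. The 'leave-alone branch models the "doing nothing" option; the patch itself calls ##sys#schedule there instead.

```scheme
;; Hypothetical model of the state check mutex-unlock! performs on the
;; thread it removes from the waiting list.  ASSUMPTION: the old code
;; accepted only 'blocked and 'sleeping.
(define (unlock-action/old state)
  (case state
    ((blocked sleeping) 'wake-up)   ; put the waiter on the ready list
    (else (error "Internal scheduler error: unknown thread state" state))))

(define (unlock-action/fixed state)
  (case state
    ((blocked sleeping) 'wake-up)
    ;; The waiter's timeout already fired and the scheduler made it
    ;; ready (output lines 17-19); it will run anyway, so do nothing.
    ((ready) 'leave-alone)
    (else (error "Internal scheduler error: unknown thread state" state))))

(display (unlock-action/fixed 'ready)) ; prints leave-alone
(newline)
```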

Is this a correct fix?
Sorry, I can't help with that one.

Maybe it's possible that there are other threads on the waiting list,
but the thread that gets removed is not going to lock the mutex:

There are three threads in this scenario: A, B and C.

* A locks mutex
* A sleeps until t
* B tries to lock mutex until t
* C tries to lock mutex
* A and B are woken up at t
* A unlocks mutex, frees B
* B is scheduled to run as per the patch
* B finds out about the timeout, gives up and starts doing something else
* Now C is waiting on the mutex, but no one is going to free it!
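The A/B/C scenario can be replayed in a small hypothetical model: a mutex is a list of its owner and its waiting list, unlocking frees the mutex and wakes only the first waiter, and a woken thread whose timeout already expired walks away without taking the mutex. None of these procedures (make-mutex-state, unlock, woken-runs) exist in srfi-18; they only illustrate the lost wakeup.

```scheme
;; Hypothetical model of the A/B/C scenario, not srfi-18 code.
;; A mutex is (owner waiting-list); owner #f means the mutex is free.
(define (make-mutex-state owner waiting) (list owner waiting))
(define (owner m) (car m))
(define (waiting m) (cadr m))

;; Unlock frees the mutex and wakes only the head waiter; the others
;; stay queued until some later unlock.  Returns (new-mutex . woken).
(define (unlock m)
  (if (null? (waiting m))
      (cons (make-mutex-state #f '()) #f)
      (cons (make-mutex-state #f (cdr (waiting m))) (car (waiting m)))))

;; The woken thread takes the mutex, unless its own timeout already
;; expired, in which case it gives up and leaves the mutex untouched.
(define (woken-runs m thread timed-out?)
  (if timed-out?
      m
      (make-mutex-state thread (waiting m))))

(define m0 (make-mutex-state 'A '(B C))) ; A owns it; B and C wait
(define r (unlock m0))                   ; A unlocks at t and frees B
(define m1 (woken-runs (car r) (cdr r) #t)) ; B timed out and gives up

(display m1) ; prints (#f (C))
(newline)
```

The final state (#f (C)) is free, with C still queued and no thread left that will call unlock again. Had B not timed out, (woken-runs (car r) 'B #f) would give (B (C)), and C would be woken by B's own unlock.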


Attachment: 0002-Fix-1564-internal-scheduler-error.patch
Description: 0002-Fix-1564-internal-scheduler-error.patch

