chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] Scheduler fdset assert patch


From: Jonathan Chan
Subject: Re: [Chicken-hackers] Scheduler fdset assert patch
Date: Thu, 09 Jul 2015 16:29:53 -0400

... and here is hopefully the final version of the patch for review. In
the end I think hijacking a thread to run the signal handler is too much
prone to error. Although the previous patches try to cope with this
behavior, as soon as the signal handler tries to do something like lock
a mutex via srfi-18 or yield, very strange things will happen depending
on what thread was hijacked, which threads were blocking, etc. Blocking
##sys#schedule from running is a partial fix, but doesn't help when it
would have been expected for ##sys#schedule not to return immediately,
e.g. if a signal handler tries to block on a socket. For example, seems
to make the the tcp unit go into a constant loop while trying to recv.

In this newest patch, the ##sys#interrupt-handler in scheduler.scm will
instead save the old handler in ##sys#signal-vector, change the handler
to one that makes a thread with ##sys#make-thread to call the old
handler and then schedules that thread to run, wrap state so that the
original ##sys#interrupt-handler will reset the handler and then finally
call the old hook so everything starts going.

In addition to the new patch, I have attached a revised version of the
old test case that should be more likely to trigger the assert failure
in unpatched versions of the scheduler as well as a new test case that
1) shows problems with the previous patches 2) ensures that the signal
handler can properly call blocking code and 3) possibly demonstrates a
bug with multiple threads blocking on one fd that I am not certain
about.

Instructions for scheduler-fdset.scm:

1. Run with csi or csc.
2. Send SIGINT (e.g. with Ctrl-C)
3. The test should write "locking... unlocking... unlocked" and then
exit without error.

Instructions for scheduler-fdset-2.scm:
You will need to compile sink.c with gcc sink.c -o sink

1. Run the test
2. Wait a short bit for the child processes to start
3. Start netcat in a new terminal with `nc -l 8080`
4. Send SIGINT (e.g. with ^C)
5. Verify that the process is not using 100% CPU
6. Type `hi` (or anything) in the terminal with netcat and press Return
7. The test should exit without error

Sorry for the many versions and emails! This at least lets me get back
to the work that needed the scheduler in the first place.
I hope that this finally kills the bug...

All the best,
-- 
  Jonathan Chan
  address@hidden

On Thu, Jul 9, 2015, at 03:18 AM, Jonathan Chan wrote:
> I realized that if I am able to set a flag about whether or not we are
> in the interrupt hook, I might as well save the previous block object of
> the thread and then restore it instead of possibly causing spurious
> wakeups. As an aside, it seems as if force-primordial doesn't currently
> do this, and relies on the code that asked the scheduler to block on i/o
> to retry on its own. As I stated in the previous email, this seems to be
> what the code that uses the scheduler's blocking facilities does anyway.
> 
> In any case, this patch also passes my tests (as the previous one did)
> but now should more cleanly avoid "spurious wakeups" of threads blocking
> on I/O.
> 
> Also, I realized that the test case lacked instructions; you should just
> be able to start it, wait a second, then send SIGINT (e.g. with Ctrl-C)
> and the unpatched scheduler should cause the program to exit with an
> assert error while with the patched scheduler you should just see a
> nice:
> 
>     address@hidden tests]$ csi -s scheduler-fdset.scm
>     ^Clocking...
>     unlocking...
>     unlocked...
>     done!
> 
> Best regards,
> -- 
>   Jonathan Chan
>   address@hidden
> 
> On Wed, Jul 8, 2015, at 11:34 PM, Jonathan Chan wrote:
> > Hi again,
> > 
> > It turns out that changing the current thread's state to 'running
> > doesn't actually work in all cases because a) unblocking from condition
> > variables will skip a thread if its state is running b) the call to
> > change the state back was never called because the call to the old hook
> > would never return. I found this by doing some more testing with the
> > program that caused me to run into the original assert bug.
> > 
> > Here is a revised patch which instead sets a variable to indicate
> > whether or not we are in the interrupt hook and skips the scheduler if
> > so. I've also included an updated test case that catches the problem
> > with the previous patch.
> > 
> > Best regards,
> > -- 
> >   Jonathan Chan
> >   address@hidden
> > 
> > On Wed, Jul 8, 2015, at 08:12 PM, Jonathan Chan wrote:
> > > Hi all,
> > > 
> > > I ended up having some free time before the weekend and went back to the
> > > scheduler assert problem. I wrote a test case for the bug and revised
> > > the scheduler assert patch after some further testing.
> > > 
> > > In the previous patch, the thread in which the interrupt handler ran
> > > would just be removed from ##sys#fd-list if the block object was changed
> > > from a file. This didn't seem to handle some cases where the interrupt
> > > handler ended up unblocking. In one of these cases, when mutex-lock! and
> > > then mutex-unlock! were called in succession, it also turned out that
> > > the "current thread"'s state needed to be saved, set to 'running during
> > > the interrupt handler, then restored in order to avoid mutex-unlock!
> > > from calling the scheduler and causing the program to hang/do strange
> > > things.
> > > 
> > > I've attached the the test case for the bug and the revised patch.
> > > Again, if someone experienced with the scheduler could review it and
> > > make sure it doesn't cause strange things to happen, it would be very
> > > much appreciated.
> > > 
> > > Best regards,
> > > -- 
> > >   Jonathan Chan
> > >   address@hidden
> > > 
> > > On Sat, Jul 4, 2015, at 07:49 PM, Jonathan Chan wrote:
> > > > Hello all,
> > > > 
> > > > I was in the IRC channel earlier trying to find out why my program was 
> > > > failing due to a scheduler assertion about the fdset for poll. This 
> > > > would happen consistently when I sent SIGINT.
> > > > 
> > > > The error looks like:
> > > > 
> > > > scheduler.c:51: C_fd_ready: Assertion `fd == C_fdset_set[pos].fd' failed
> > > > 
> > > > After looking at the scheduler code, it looks like this happened 
> > > > because 
> > > > the SIGINT was causing the interrupt handler to be called, which ended 
> > > > up unblocking some thread to run a signal handler, which caused a 
> > > > condition variable broadcast. In the end, this made a thread that was 
> > > > previously blocking on a file descriptor to now block on a mutex.
> > > > 
> > > > When create-fdset procedure iterated over the ##sys#fd-list, it skipped 
> > > > threads in the list who were not blocking on fds while it added those 
> > > > who were to the fdset using fdset-set. However, it did not remove these 
> > > > threads from ##sys#fd-list, which resulted in the scheduler throwing an 
> > > > error because it expected a fd to be in the fdset when it was not there.
> > > > 
> > > > I have attached a patch which modifies create-fdset to remove threads 
> > > > from ##sys#fd-list who are not blocking on an fd, rather than only skip 
> > > > them. This seems to fix my problem and has not introduced any bizarre 
> > > > behvaior in the few tests I have run, but hopefully someone who is more 
> > > > familiar with the scheduler could look it over and see if it fixes the 
> > > > issue.
> > > > 
> > > > Best regards,
> > > > -- 
> > > >    Jonathan Chan
> > > >    address@hidden
> > > > _______________________________________________
> > > > Chicken-hackers mailing list
> > > > address@hidden
> > > > https://lists.nongnu.org/mailman/listinfo/chicken-hackers
> > > > Email had 1 attachment:
> > > > + scheduler-assert.diff
> > > >   2k (text/plain)
> > > _______________________________________________
> > > Chicken-hackers mailing list
> > > address@hidden
> > > https://lists.nongnu.org/mailman/listinfo/chicken-hackers
> > > Email had 2 attachments:
> > > + scheduler-assert-fix.diff
> > >   5k (application/octet-stream)
> > > + scheduler-fdset.scm
> > >   2k (application/octet-stream)
> > _______________________________________________
> > Chicken-hackers mailing list
> > address@hidden
> > https://lists.nongnu.org/mailman/listinfo/chicken-hackers
> > Email had 3 attachments:
> > + scheduler-fdset.scm
> >   2k (application/octet-stream)
> > + sink.c
> >   1k (text/plain)
> > + scheduler-assert-fix.diff
> >   14k (application/octet-stream)
> _______________________________________________
> Chicken-hackers mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/chicken-hackers
> Email had 1 attachment:
> + scheduler-assert-fix.diff
>   14k (application/octet-stream)

Attachment: scheduler-assert-new.diff
Description: Binary data

Attachment: scheduler-fdset-2.scm
Description: Binary data

Attachment: scheduler-fdset.scm
Description: Binary data

Attachment: sink.c
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]