dotgnu-general
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[DotGNU]RE: PNet monitors


From: Thong (Tum) Nguyen
Subject: [DotGNU]RE: PNet monitors
Date: Tue, 27 Apr 2004 21:39:20 +1200

Hi,

I've cc-ed this to the developers forum so it can googlized for the
future...

> -----Original Message-----
> From: Russell Stuart [mailto:address@hidden
> Sent: Tuesday, 27 April 2004 8:52 p.m.
> To: Thong (Tum) Nguyen
> Subject: RE: PNet monitors
> 
> On Fri, 2004-04-23 at 23:21, Thong (Tum) Nguyen wrote:
> > > 2.  On pnet/engine/lib_thread.c, line 93, in the function
> > >     _IL_Monitor_CheckAndReturnMonitorToFreeList, we have the code:
> > >   if(monitor->waiters == 0 &&                   // @1
> > >            ILWaitMonitorCanClose(monitor->supportMonitor))
> > >   {
> > >     ILLockWord current;
> > >     /* Remove the monitor from the object if the object hasn't
> > >              been assigned a new monitor */
> > >     current = CompareAndExchangeObjectLockWord( // @2
> > >             thread, obj, IL_LW_MARK(0), IL_LW_MARK(monitor));
> > >     Note that if monitor->waiters>0, then the monitor is in use.
> > >     Freeing it in that case would be bad, which is why this condition
> > >     is checked in the first line.  Ergo, the monitor must be locked
> > >     between the lines marked @1 and @2, otherwise another thread
> > >     aquire the monitor and change waiters with this code noticing.
> > >     There are no locks in this function, but the comment at the top,
> > >     at line 97, says:
> > >        * This method is only safe to call if you have
> > >        * exclusive access to the object's lockword.
> > >     It turns out that this precondition is not strong enough.  Having
> > >     exclusive access to the objects lockword does not mean you have
> > >     exclusive access to the monitor.
> >
> > If you have exclusive access to the lockword you can be sure that the
> > object's monitor will not change.
> 
> Yes - no disagreement here.  You alter the lockword only when a global
> lock is in place (via CompareAndExchangeObjectLockWord).
> 
> >   If the monitor provided (the monitor we
> > "think" the object has) doesn't match the current object's monitor then
> the
> > monitor has already been returned to the freelist.  Only one thread will
> > ever be able to return a monitor to the freelist because that is done
> using
> > a CompareAndExchange.  A monitor is returned to the freelist if an
> object
> > has been "left" and has no waiters.
> >
> > The reason it is done in this seemingly complex way is for speed
> reasons.  A
> > thread will always try to acquire a monitor without using any global
> locks.
> 
> Hmm.  _IL_Interlocked_Decrement_Ri calls ILThreadAtomicStart, which
> acquires a global lock.  There doesn't seem to be any reason why it
> should be global, but it appears it always has been.  This is called
> twice in _IL_Monitor_InternalTryEnter and once in _IL_Monitor_Exit.
> _IL_Monitor_Exit also calls _IL_ObjectLockword_WaitAndMark, which also
> uses a global lock.

Most processors support concurrent CAS and INCR instructions (on x86 you
preprend them with the lock instruction).  Increment and Drecrement has to
use a global lock because it those functions *must* be exclusive to other
Interlocked functions.

> 
> So currently the typical "lock (...) ..." statement acquires 3 global
> locks.  Admittedly, 2 of those (in _IL_Interlocked_Decrement_Ri) don't
> need to be be global, so the total could be 1.  I think it can be
> reduced to 0.

It can be reduced to 0 most of the time.  Eventually interlocked support
will be added "pnet/support" library :-).

> 
> > This means that you might end up getting a monitor that has already been
> > released and is no longer the real monitor for that object.  This "race"
> is
> > what the double check code (line @1) is for.  That double check code is
> used
> > in any place where a stable reference to the object's monitor is
> required.
> > It looks like the Monitor.Wait code doesn't do this properly.
> >
> > >
> > >     Consider these lines in pnet/engine/lib_thread.c, line 264,
> > >     within the function _IL_Monitor_InternalTryEnter:
> > >   monitor = IL_LW_UNMARK(lockword);               // @3
> > >   _IL_Interlocked_Increment_Ri(thread,
> > >           (ILInt32 *)&monitor->waiters);
> > >   lockword = GetObjectLockWord(thread, obj);
> > >   if (IL_LW_MARKED(lockword)
> > >           /* This doubles as a check for lockword == 0 */
> > >           || IL_LW_UNMARK(lockword) != monitor)
> > >   {
> > >     /* Mark the object's lockword */
> > >     _IL_ObjectLockword_WaitAndMark(thread, obj);  // @4
> > >     /* The current thread has exclusive access to
> > >              the object's lockword */
> > >     _IL_Interlocked_Decrement_Ri(thread,
> > >             (ILInt32 *)&monitor->waiters);
> > >     _IL_Monitor_CheckAndReturnMonitorToFreeList(  // @5
> > >              thread, monitor, obj);
> > >     The line marked "@4" locks the objects lockword.  But there
> > >     are no locks in place between where the monitor is read, at
> > >     the line marked "@3" and the line marked "@4".  Ergo the
> > >     lockword may no longer point to the monitor that is about to
> > >     be released at the line marked "@5".  Thus even though the
> > >     lockword is locked, the monitor may not be.  And thus the call
> > >     to _IL_Monitor_CheckAndReturnMonitorToFreeList may go awry.
> > >
> >
> > This is the reasoning for this "double checking" code.
> 
> I have looked at the code again, and agree my original analysis was
> wrong.  That section of the code looks OK.
> 
> > After the monitor is retrieved, there is a possibility that the thread
> > owning the monitor will exit and return the monitor to the free-list.
> This
> > cannot happen if monitor->waiters has been incremented.  To check for
> this,
> > we increment monitor->waiters and then get a fresh copy of the lockword
> (and
> > hence monitor).  If the object now has a new monitor then we know that
> the
> > monitor "monitor" that we have is stale and is no longer associated with
> > that object.  It is possible that the monitor was returned and may have
> even
> > been given to a totally different object which is why we decrement the
> > monitor->waiters count (having a count too high for a short amount of
> time
> > isn't a problem I don't think).  The call to
> CheckAndReturnMonitorToFreeList
> > will actually only return the monitor to the free list if the monitor
> > matches the object's current monitor (this is done with a
> > compare-and-exchange so is thread safe).  This call usually will amount
> to
> > nothing because the monitor *is* different from what we have stored in
> > "monitor" but another thread could have had enough time to free the
> monitor
> > and then reacquire it in the short time between when we read the monitor
> and
> > do the double check (the if statement).
> >
> > Once any thread has locked the lockword of any object, it can be sure
> that
> > the monitor for that object (the least significant bits of the lockword)
> > will not be changed.
> 
> Thanks for the explanation.
> 
> > Which locks?  The compare and exchange the inline increments?  Those
> will
> > eventually be optimized to cpu single instructions once someone gets
> around
> > to it...
> 
> Ahhh - I understand your reasoning now.  That makes sense.
> 
> But it is sort of at odds with the 'P' in PNet isn't it?  Memory
> models and OS locking mechanisms vary widely.  The current
> implementation will always have global locks on machines
> that just have POSIX threads.

I think it's the opposite.  PNET remains as portable as ever -- it will
still fall back to global locks on unsupported platforms and use more
advanced (fast) features on support platforms.  This isn't much different to
any other part of pnet.  A good example is how Rhys has written all the wait
handle, mutex, thread-join and monitor support using pure C and one type of
OS-supplied synchronization object (mutexes).

> 
> As I said in an earlier post, if don't free the monitors you
> can get away with one global lock - when the monitor is first
> attached to the object.  As most monitor objects are long
> lived this generally has little performance impact - even if
> you use pthreads.

That is true but this will mean that you need a way to free the monitor when
the object gets collected.  You can do this by allocating the monitor on the
GC-ed heap.  The only problem then is that you won't be able to allocate
primitive arrays using GCAllocAtomic.  This means that large (large!) arrays
of primitives will have to be unnecessarily scanned by the GC.  Another way
is to attach finalizers to each and every object.  This has side-effects as
well.
 
I don't think it's really possible to know which would be "better" without
some real world benchmarks -- I think both techniques work well in different
situations.  I originally did "just" allocate monitors on the GC heap (it
was HEAPS easier) and that might be something that needs to be explored
again.  At the time I figured using the double-check algorithm would pay off
in the long run.  There's obviously a lot of optimisation that can be done
but getting it to work is of course the number one priority.

> 
> BTW, the code in pthreads seems to be very efficient.  I was
> looking at it the other day to find out what memory model
> Linux i386 assumes - in order to trace down this bug.  I doubt
> you will get much improvement by avoiding it.  I am fairly
> sure the PNet overhead of calling the Monitor.Enter would be
> higher than the phthread_mutex_lock() call.
> 
> > Have you got code that causes the deadlock?  I can trace the race down.
> > I've had to trace many others before :-).
> 
> Firstly I have compiled and implemented the stuff you committed
> over the weekend.  Recall that I said I did the obvious thing -
> added a check for monitor==0 in _IL_Monitor_InternalWait?  I
> see you have now done the same thing.  It has had the same result -
> the program now deadlocks very early on.  If you look in the
> bug history, you can see that I attached a patch, then withdrew
> it.  That was because the my patch made program deadlock - something
> it didn't appear to do before.  It turned out it did deadlock -
> it just that the deadlocks were rare.
> 

There is definitely a deadlock which I can reproduce it easily.  I don't
think it's a "big" problem with the algorithm and I'll chase it down as soon
as I have some more time.  I actually added wait/pulse support at the end
without much testing and it shows ;-).  Me hates deadlocks me does.

All the best,

^Tum



reply via email to

[Prev in Thread] Current Thread [Next in Thread]