dotgnu-general

RE: [DotGNU]RE: PNet monitors


From: Thong (Tum) Nguyen
Subject: RE: [DotGNU]RE: PNet monitors
Date: Wed, 28 Apr 2004 01:22:29 +1200

> -----Original Message-----
> From: Russell Stuart [mailto:address@hidden]
> Sent: Tuesday, 27 April 2004 10:58 p.m.
> To: Thong (Tum) Nguyen
> Cc: address@hidden
> Subject: [DotGNU]RE: PNet monitors
> 
> On Tue, 2004-04-27 at 19:39, Thong (Tum) Nguyen wrote:
> > I think it's the opposite.  PNET remains as portable as ever -- it will
> > still fall back to global locks on unsupported platforms and use more
> > advanced (fast) features on supported platforms.  This isn't much
> > different to any other part of pnet.  A good example is how Rhys has
> > written all the wait handle, mutex, thread-join and monitor support
> > using pure C and one type of OS-supplied synchronization object
> > (mutexes).
> 
> Yes, perhaps it will happen.  In the mean time, why not speed up the
> implementation as it stands?  It will have the nice side effect of
> speeding up all platforms - not just those somebody finds the time to
> write specialised code for.

No-one is stopping anyone from speeding up the implementation.

> 
> > >
> > > As I said in an earlier post, if you don't free the monitors you
> > > can get away with one global lock - when the monitor is first
> > > attached to the object.  As most monitor objects are long
> > > lived this generally has little performance impact - even if
> > > you use pthreads.
> >
> > That is true but this will mean that you need a way to free the monitor
> > when the object gets collected.  You can do this by allocating the
> > monitor on the GC-ed heap.  The only problem then is that you won't be
> > able to allocate primitive arrays using GCAllocAtomic.  This means that
> > large (large!) arrays of primitives will have to be unnecessarily
> > scanned by the GC.  Another way is to attach finalizers to each and
> > every object.  This has side-effects as well.
> >
> > I don't think it's really possible to know which would be "better"
> > without some real world benchmarks -- I think both techniques work well
> > in different situations.  I originally did "just" allocate monitors on
> > the GC heap (it was HEAPS easier) and that might be something that needs
> > to be explored again.  At the time I figured using the double-check
> > algorithm would pay off in the long run.  There's obviously a lot of
> > optimisation that can be done but getting it to work is of course the
> > number one priority.
> 
> Well, right now the situation we have is:
>   - the code is complex, and as a consequence hard to understand,

Lots of things in pnet are hard to understand.  I don't see that as a reason
not to attempt to do it that way.  Concurrency is always tricky!

>   - it reportedly runs slowly, or at least so people say on IRC - I
>     admit that might be just idle gossip,

How can you or anyone else *possibly* know that without comparing it to
something else?  And even then, how can they be sure it's the monitor code
that's the bottleneck?

Premature optimisation is the root of all evil.  IMO, the algorithm will pay
off later, but if there is time on offer we could try a simpler version in
the meantime.

Remember, with the current algorithm, any uncontested request for a lock
(studies have shown that this is the most likely case) will result in no
global locks and just a compare and exchange.  The lockword combined with a
CAS provides a fast method of testing and acquiring an uncontested monitor
without using any locks.
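
To make the uncontested fast path concrete, here is a minimal sketch in C11.
It is not the pnet code: the SketchObject type, the sketch_* function names
and the lockword encoding (0 for "unowned", otherwise the owner's thread id)
are all assumptions made up for illustration, and re-entry by the same
thread and the contended slow path are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object header: lockword is 0 when the monitor is unowned,
   otherwise it holds the id of the owning thread (assumed encoding). */
typedef struct {
    _Atomic uintptr_t lockword;
} SketchObject;

/* Fast path: try to take an unowned monitor with one compare-and-exchange.
   Returns true if the caller now owns it.  False means the monitor is
   already owned, and a real implementation would go to a slow path. */
static bool sketch_try_enter(SketchObject *obj, uintptr_t self)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(&obj->lockword, &expected, self);
}

/* Release: only the owner may call this; the lockword goes back to 0. */
static void sketch_exit(SketchObject *obj, uintptr_t self)
{
    assert(atomic_load(&obj->lockword) == self);
    atomic_store(&obj->lockword, 0);
}
```

A thread with id 1 calling sketch_try_enter() on a fresh object succeeds
without touching any global lock; a second thread calling it before
sketch_exit() gets false and would have to take the slower route.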

>   - right now it doesn't work.

It works but there is a race condition in the Pulse code which hasn't been
fixed yet.

> 
> I do agree that the current code is probably the fastest implementation
> possible - no global blocks at all with the right machine specific
> support.  Can't we come up with something that combines the two - that
> uses the double check algorithm on machines that can support it, but on
> machines that don't revert to the GC based algorithm?
> 
> The GCAllocAtomic problem looks solvable - even if we just make it a
> special case.

It looks like we can solve the GCAllocAtomic problem elegantly by using the
typed allocation feature of the Boehm GC (see gc_typed.h).

Having two algorithms depending on the machine support sounds like we're
making it even more complicated to me, but it's probably a good idea if
someone has the time to do it -- it shouldn't be hard, but remember to use GC
typed allocation.  The current implementation black-boxes how monitors are
associated with objects so that supporting thin/hashed monitors doesn't
require changing the monitor acquisition algorithm.  Try not to change this
because thin locks are important for embedded devices with limited memory ;).

> 
> In the mean time, how shall we proceed wrt the source?  Do you need it?
> Do you want me to just send it to you?  Do you want to organise to get
> the CE Machine talking to you at some later stage?
> 

I have since written a test case which causes the deadlock so I don't need
it (yet).  I can't debug until I get my Linux box back up and running.
Sorry, I should have been clearer!  We'll see if the problem on CE remains
after I fix the bug -- if it does, I may need access to a CE machine.
Thanks heaps for the offer.

I can add support for GC-heap monitors in the weekend -- let me know if you
would prefer to work on this yourself.

Regards,

^Tum


