[Swarm-Support] Re: Performance issues
From: Marcus G. Daniels
Subject: [Swarm-Support] Re: Performance issues
Date: Sun, 02 Feb 2003 11:24:49 -0700
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.3b) Gecko/20030117
Bill Northcott wrote:
> First it would be very surprising if Swarm ran as well on a PowerPC
> machine as it does on a P4. As far as I can see all the recent
> development has been done on Intel hardware.
Mostly it was done on Sun hardware. That's what SFI and the SDG had
available.
I would not say that it shaped the programming much, really. As far as
profiling goes, most of that happened on Sun, and then in some
isolated, logistically nasty cases on Red Hat on Intel. Bleeding-edge
compiler and toolchain things tend to work better on Linux-based systems.
At first I could only profile fully native (GCJ-based) Java/Swarm models
on Red Hat.
> So it is sort of inevitable
> that the code contains many optimisations for Intel architectures even if
> most of them are unconscious.
Unfortunately, the profiling of Swarm (again, mostly on Suns) has
consisted of gprof run after gprof run and memory profiles: reduce
empirically observed bottlenecks and reduce memory usage. There hasn't
really been any attention given to profiling Swarm with cache
simulators. It would be a good thing to do, preferably on multiple
architectures (or simulated architectures).
> Excuse my ignorance, Marcus, but would your benchmark use both CPUs?
Nope, and neither would Swarm, and neither would most applications.
Compilers don't magically parallelize code. A different benchmark, say
the SQL example, could be multithreaded, though (mysql would be).
Whether a level 3 cache helps reduces to the working set of a
problem and its memory access pattern. When that working set gets
big and the pattern sparse and disordered, then memory latency (or even
swap) will come to dominate runtime. With lots of agents running around
in a simulation, and a shared landscape, I think a few megabytes of
cache isn't going to help a whole lot. But it really depends. A
determined person, given the tools and some knowledge, can always
measure and tune for a given architecture.
Whether the level 3 cache helps a multithreaded program will reduce to
the memory demands of the two threads. My concern, especially for any
moderately complex agent-based model (assuming Swarm could spawn
threads, which it can't), would be that having two agents running
instead of one agent just increases the chance that the cache will be
busted.
> 2. Vector processors. Apple/Motorola included these because they could
> produce huge speed-ups in signal processing apps. Apple wanted it for its
> multimedia users, and Motorola for its big market in embedded telecomms
> chips.
>
> It is extremely effective for certain types of problem. Witness the
> benchmark for SETI which uses Altivec, and the impressive performance on
> Photoshop filters and video codecs.
>
> I am sure Marcus' benchmark does not use the vector processors, so a
> substantial part of the CPU silicon is sitting there doing nothing.
If we are talking about how SIMD features can increase performance in an
application then we need to be able to compare apples with apples. It
seems to me the comparison has to reduce to
"if I compile program X with standard or easily-accessible tools on
platform A and B which runs faster?". It's not fair to say, "I really
mean program Y", which is what you are saying if you mean "you should
really write that program with the Altivec in mind". If that's the
case, then we can just say in response "you should really write that
program with the SSE2 Pentium 4 extensions in mind".
> 3. Disk throughput. As far as I can see, the target market for Xserve was
> the film industry and their extensive digital processing. These people
> are dealing in tens of terabytes of data. The published benchmarks for
> Xserve show that it compares extremely well with 1U PC (Dell PowerEdge
> 1650) servers when serving large files to multiple clients. Indeed they
> stand up well against other architectures with much higher price tags.
Apple's materials show that the Dell PowerEdge 1650 and Xserve have
similar I/O performance. Well, that's nice; the 1650 is the low-end rack
server, based on the Pentium III!
> So is any of this relevant to Swarm/ABM? It seems to me that it is.
> There has been plenty of discussion about multithreading/multiCPUs.
> Currently the stuff is not there in Swarm because Intel architecture did
> not provide the hardware, but the benefits should be fairly obvious. IBM
> and Apple clearly think multiple CPUs are the way to go rather than very
> high clock speeds. I think the standard for Power4 is 4 CPUs per header.
> It looks like Apple will start using this architecture with 64-bit chips
> (PPC970) from IBM later this year. It could provide very good
> price/performance after the novelty premium has worn off.
When vendors have compiler technology or killer-apps that
auto-parallelize reliably, I'll buy this argument. We already had a
discussion here about some of the practical problems of multithreaded
Swarm models. Until there are economical 4 or 8 processor systems, I
just don't see the benefit of complex, delicate code for multithreading
of Swarm models. And I am highly skeptical that software-based
distributed shared memory systems can provide fast enough memory access
to enable clusters to function well on agent-based models.
In my experience there is a massive amount of parameter tweaking and
iteration involved in understanding agent-based models, and this
iteration is easy to parallelize. Just run multiple simulations at once
with different parameters on different CPUs.
> Vector processors. It seems to me this would be the easier optimisation
> to put in Swarm. My misguided thoughts would be to look at random number
> generators and agents using regression/neural net/signal processing
> decision methods as prime candidates for Altivec speed-up. I think the
> necessary code is already incorporated in the current GNU compiler
> sources. I am sure Marcus can comment on this much better than me.
There might be some opportunities for Altivec or SSE2 usage, but I think
they'd mostly be for add-in libraries. Like you say, neural nets,
perhaps some GA fitness evaluations, etc. Whether it would justify the
work, I don't know.
> *I rather discount the comparison with multi-CPU Xeon-based
> server/workstation (Dell Precision 650s etc.) architectures. These cannot
> be described as cheap PCs.
Intel has been cutting prices on Xeon chips lately. The 2.8 GHz Xeons, at
$485, are around $100 cheaper than the 3 GHz Hyper-Threaded Pentium 4s and
$100 more than the 2.8 GHz Pentium 4s. Dual-chip Xeon motherboards start
at about $300. One issue for rack systems like the Xserve or PowerEdge
1650 is heat and current usage. In terms of price/performance for a fast
CPU and I/O server/workstation, the Xeon is not bad.