freepooma-devel

compilers and speed


From: Richard Guenther
Subject: compilers and speed
Date: Sat, 26 Apr 2003 11:31:19 +0200 (CEST)

Hi!

I decided to just sum up what I do to get the most performance out of
POOMA. First, I use POOMA from CVS (which has bugfixes and support for
ISO-conforming compilers) and Cheetah with SCore MPI. To get anywhere
near hand-coded performance for simple Brick arrays, I need to use a
recent gcc-3.3 with at least -O2 -funroll-loops --param
min-inline-insns=250. Of course, selecting the right -march, and perhaps
-ffast-math, helps in some cases. Intel icc is disappointing in
performance, but I didn't try using profile-directed optimization with it.

Performance compared to hand-coded loops is on par as soon as you go
out of L2 cache; within cache, don't expect anything good from POOMA.

The real advantage of POOMA for single Brick arrays is the ability to
adjust loop processing for cache optimality (i.e. do handcrafted
"multipatching" inside the evaluators). That is still on my todo list.
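
To illustrate what I mean by processing the domain in cache-sized blocks,
here is a minimal sketch in plain C++. It is not POOMA evaluator code; the
function name stencil_tiled, the row-major std::vector storage and the
tile parameter are assumptions made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of POOMA: applies a 5-point average to the
// interior of an n x n row-major array, visiting the domain tile by tile so
// that each block of rows is reused while it is still in cache.
void stencil_tiled(const std::vector<double>& in, std::vector<double>& out,
                   std::size_t n, std::size_t tile)
{
    assert(in.size() == n * n && out.size() == n * n && tile > 0);
    if (n < 3) return;  // no interior points to update

    for (std::size_t ii = 1; ii < n - 1; ii += tile) {
        const std::size_t iend = std::min(ii + tile, n - 1);
        for (std::size_t jj = 1; jj < n - 1; jj += tile) {
            const std::size_t jend = std::min(jj + tile, n - 1);
            // Plain loops over one tile; boundary points are left untouched.
            for (std::size_t i = ii; i < iend; ++i)
                for (std::size_t j = jj; j < jend; ++j)
                    out[i * n + j] = 0.25 * (in[(i - 1) * n + j] +
                                             in[(i + 1) * n + j] +
                                             in[i * n + j - 1] +
                                             in[i * n + j + 1]);
        }
    }
}
```

The result does not depend on the tile size; only the traversal order
changes. A good tile value depends on the cache sizes of the machine and
has to be found by measurement.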

I never had KAI CC available to compare its performance, but I cannot
confirm that IRIX CC does a good job of optimizing POOMA. I hope Intel
icc will solve its problems, as then _very_ simple OpenMPization (I've
done it) can be applied to POOMA as well.
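
For readers unfamiliar with OpenMP, the kind of change I have in mind is
roughly the following. This is a generic sketch on a hand-written loop,
not the actual modification to the POOMA evaluators; the function name
axpy_openmp is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example, not POOMA code: computes a[i] = b[i] + s * c[i].
// Iterations are independent, so one pragma is enough to share the loop
// among threads. A compiler without OpenMP enabled ignores the pragma and
// runs the loop serially.
void axpy_openmp(std::vector<double>& a, const std::vector<double>& b,
                 const std::vector<double>& c, double s)
{
    assert(a.size() == b.size() && a.size() == c.size());
    const long n = static_cast<long>(a.size());  // signed loop index for OpenMP

    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}
```

With gcc versions that support OpenMP this is enabled with -fopenmp; other
compilers use their own switch.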

Hope this answers most of the questions,

Richard.
