
Re: [Freepooma-devel] Re: ReRe: [pooma-dev] SIMD


From: Richard Guenther
Subject: Re: [Freepooma-devel] Re: ReRe: [pooma-dev] SIMD
Date: Thu, 17 Mar 2005 09:19:59 +0100 (CET)

On Wed, 16 Mar 2005, Roman Krylov wrote:

> Hi all.
> Richard,
> GCC 4.0 (I have gcc version 4.0.0 20050130 (experimental)) provides
> autovectorization for C++,
> but, as you noticed, it doesn't vectorize POOMA loops.
> Is the main reason that the vectorizer can only handle local,
> aligned arrays?

Yes, the vectorizer is not powerful enough in 4.0 - I've been told
this may change with 4.1.  But you may have noticed that the Intel
compiler is also not able to vectorize any of the POOMA loops.

Another difficulty for the analyzers is aliasing - we don't mark
the LHS as restrict, so something like the Intel #pragma ivdep is
necessary, but even with that, vectorizing seems difficult.
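
To illustrate the aliasing point, here is a minimal sketch of a plain
loop whose pointers are marked as non-overlapping. It is not POOMA
code: add_arrays and add_arrays_demo are hypothetical names, and
__restrict__ is the GCC/Clang extension spelling (standard C++ has no
restrict keyword), so other compilers may need a different spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Without the qualifiers the compiler has to assume that a store to
// lhs[i] might modify b[j] or c[j], which blocks vectorization.
// __restrict__ (a GCC/Clang extension, assumed available here) is a
// promise by the caller that the three ranges do not overlap.
void add_arrays(double* __restrict__ lhs,
                const double* __restrict__ b,
                const double* __restrict__ c,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        lhs[i] = b[i] + c[i];
}

// Hypothetical helper: runs the loop on three distinct buffers and
// returns the sum of the results.
double add_arrays_demo()
{
    std::vector<double> a(8, 0.0), b(8, 1.5), c(8, 2.0);
    add_arrays(a.data(), b.data(), c.data(), a.size());

    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i] == 3.5);   // 1.5 + 2.0 is exact in binary floating point
        sum += a[i];
    }
    return sum;
}
```

The promise is only valid if the caller really passes disjoint
ranges; calling add_arrays with overlapping buffers would be
undefined behaviour.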

> Maybe it would be good to have some static cache and distribute it among
> args and ret : split evaluation loop by the number of num_args+1 and
> have internal loop with that size.
> The cache is filled by another loop before that inner loop is reached;
> The size of the cache is determined by the user on program startup;
> Sorry if I'm talking nonsense, I am merely fascinated by the
> vectorizer - on primitive loops it gave a factor of 2.3 in
> performance on my P4 (with SSE2, I think).

It's impossible to do this in general, because we have only an
LHS object and an RHS object, where the RHS object encodes a full
expression that can have an arbitrary number of data sources.
For instance, the expression

 a = b + c;

is given to us as

 LHS = RHS

and we have no easy way to construct local copies of the source
arrays of b and c (because we don't know that there are exactly
two).  I also think that the extra copying will remove any
benefit we get from the vectorization.
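
A minimal expression-template sketch may make the LHS = RHS point
concrete. It is only an illustration under simplifying assumptions,
not POOMA's actual classes: Vec, Sum and expression_demo are
hypothetical names. The assignment operator plays the role of the
evaluator, and all it ever sees is one LHS and one opaque RHS type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf type that owns data (not POOMA's Array).
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }

    // The "evaluator": it is handed a single RHS object and can only
    // ask it for element i. How many arrays stand behind rhs[i] is
    // hidden inside the type Rhs.
    template <class Rhs>
    Vec& operator=(const Rhs& rhs) {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = rhs[i];
        return *this;
    }
};

// Expression node: stores references to its operands, no data.
template <class L, class R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r)
{
    return {l, r};
}

template <class L, class R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r)
{
    return {l, r};
}

// Hypothetical helper: the same operator= handles two and three
// data sources without knowing the difference.
double expression_demo()
{
    Vec a(4), b(4, 1.0), c(4, 2.0), d(4, 3.0);
    a = b + c;        // Rhs is Sum<Vec, Vec>
    assert(a[0] == 3.0);
    a = b + c + d;    // Rhs is Sum<Sum<Vec, Vec>, Vec>
    return a[0];
}
```

Making local copies of every source array would mean walking the
Rhs type to find all its leaves, which is the part that has no easy
generic solution.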

So we really need to wait for a better vectorizer.  There is an
autovect branch in the GCC CVS repository with support for
#pragma ivdep and also peeling for data alignment.  Together
with the right inlining that _may_ be enough - but I didn't try.
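
For readers unfamiliar with peeling for alignment, the following is a
hand-written sketch of the idea. The vectorizer would do this
transformation itself; scale and peel_demo are hypothetical names,
and the 16-byte boundary is an assumption that matches SSE2
registers holding two doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiply x[0..n) by s in place.
void scale(double* x, std::size_t n, double s)
{
    std::size_t i = 0;

    // Prologue (the "peeled" iterations): run scalar code until
    // &x[i] sits on a 16-byte boundary.
    while (i < n && reinterpret_cast<std::uintptr_t>(x + i) % 16 != 0) {
        x[i] *= s;
        ++i;
    }

    // Main loop: starts at an aligned address and handles two
    // doubles per step, the shape a vectorizer can map onto
    // aligned SSE2 loads and stores.
    for (; i + 2 <= n; i += 2) {
        x[i]     *= s;
        x[i + 1] *= s;
    }

    // Epilogue: at most one leftover element.
    for (; i < n; ++i)
        x[i] *= s;
}

// Hypothetical helper: scales all but the first element, so the
// start pointer is offset by one double from the buffer start.
double peel_demo()
{
    std::vector<double> v(9, 1.0);
    scale(v.data() + 1, 8, 2.0);

    assert(v[0] == 1.0);          // untouched
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum;                   // 1.0 + 8 * 2.0
}
```

The result is the same whichever way the buffer happens to be
aligned; only the split between prologue, main loop and epilogue
changes.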

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/




