freepooma-devel

Re: [pooma-dev] Yes, Vector temporaries do appear in every operation...!


From: Richard Guenther
Subject: Re: [pooma-dev] Yes, Vector temporaries do appear in every operation...!!
Date: Fri, 28 May 2004 13:35:15 +0200 (CEST)

On Fri, 28 May 2004, Radek Pecher wrote:

> | Note that without your debugging stuff in the constructors, these
> | get inlined and optimized away by the optimizer.  Of course one
> | could argue creating the copies should be avoided in the first
> | place, but I cannot see how this can be done, as, f.i. for
> | BinaryOp<Vector1, Vector2, OpMultiply>::operator() we clearly need
> | to return a _new_ Vector as result.  To avoid this one would have
> | to expression-template the vector itself, so only primitive
> | variable types are ever copied.  But I don't think this will work
> | or pay off.
>
> I actually compiled the code with the original (unmodified) version of
> Vector.h first and used GDB to run it and disassemble it. Without
> much analysing, I noticed several looping jumps at the place of the
> algebraic expression which only confirms that the optimising compiler
> did not produce the required code:
> v2(0) = v1(0)*v1(0) + v1(0)*v1(0);
> v2(1) = v1(1)*v1(1) + v1(1)*v1(1);
> as was supposed to. (And I also tried several other optimisation
> configurations, of course.)

I don't have these temporaries.  Compiling with gcc 3.4, using options
-O2 -funroll-loops -DNOPAssert -S I get:

.L171:
        fldl    -24(%ebp)
        leal    -24(%ebp), %eax
        leal    -72(%ebp), %ecx
        fldl    -16(%ebp)
        fxch    %st(1)
        movl    %eax, -88(%ebp)
        fmul    %st(0), %st
        fxch    %st(1)
        movl    %eax, -84(%ebp)
        leal    -104(%ebp), %edx
        fmul    %st(0), %st
        fxch    %st(1)
        movl    %eax, -120(%ebp)
        movl    %eax, -116(%ebp)
        leal    -56(%ebp), %eax
        cmpl    %eax, %ebx
        fstl    -72(%ebp)
        fxch    %st(1)
        fstl    -64(%ebp)
        fxch    %st(1)
        fstl    -104(%ebp)
        fadd    %st(0), %st
        fxch    %st(1)
        fstl    -96(%ebp)
        fadd    %st(0), %st
        fxch    %st(1)
        movl    %ecx, -136(%ebp)
        movl    %edx, -132(%ebp)
        fstl    -56(%ebp)
        fxch    %st(1)
        fstl    -48(%ebp)
        je      .L282
        fxch    %st(1)
        fstpl   -40(%ebp)
        fstpl   -32(%ebp)
        jmp     .L179
        .p2align 4,,7
.L282:
        fstp    %st(0)
        fstp    %st(0)
        .p2align 4,,15
.L179:

which I haven't analyzed in detail for optimality, but there is certainly
no loop left and there are no calls to constructors/destructors.  There
are still unnecessary stores to temporaries that were not removed, though.

> As to the need for the return of a Vector, I suppose that
> Vector<2, double, BinaryVectorOp<...> > is all that is needed (with the
> references to its two operands). There is no need at all to take this
> object and make its Full-engine copy for any subsequent operations.

Well, yes, this would be a step towards expression-templating the vector
classes.  You then need assignment operators / constructors that know
how to evaluate such an expression into a regular Vector; these would be
the expression template expanders.
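
To make the idea concrete, here is a minimal sketch of what such an expression-templated vector could look like. It is not POOMA's actual Vector.h: the names Full, BinaryVectorOp, OpAdd, OpMultiply and demo() are hypothetical stand-ins chosen to mirror the discussion, and the layout is an assumption. An operator returns a Vector whose third template parameter describes the operation and which only holds references to its operands; the templated assignment operator of the storage-backed Vector plays the role of the "expander".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element-wise operation tags (not POOMA's definitions).
struct OpAdd      { template <class T> static T apply(T a, T b) { return a + b; } };
struct OpMultiply { template <class T> static T apply(T a, T b) { return a * b; } };

// Unevaluated binary node: stores references to its operands, no data.
template <class L, class R, class Op>
struct BinaryVectorOp {
    const L& lhs;
    const R& rhs;
    BinaryVectorOp(const L& l, const R& r) : lhs(l), rhs(r) {}
};

struct Full {};  // engine tag for a vector that owns its components

template <std::size_t N, class T = double, class Engine = Full>
class Vector;

// Storage-backed vector.
template <std::size_t N, class T>
class Vector<N, T, Full> {
    T data_[N];
public:
    Vector() : data_() {}
    T&       operator()(std::size_t i)       { return data_[i]; }
    const T& operator()(std::size_t i) const { return data_[i]; }

    // The "expander": evaluates any expression component by component.
    template <class E>
    Vector& operator=(const Vector<N, T, E>& expr) {
        for (std::size_t i = 0; i < N; ++i) data_[i] = expr(i);
        return *this;
    }
};

// Expression vector: computes a component on demand, copies nothing.
template <std::size_t N, class T, class L, class R, class Op>
class Vector<N, T, BinaryVectorOp<L, R, Op> > {
    BinaryVectorOp<L, R, Op> op_;
public:
    Vector(const L& l, const R& r) : op_(l, r) {}
    T operator()(std::size_t i) const {
        return Op::apply(op_.lhs(i), op_.rhs(i));
    }
};

template <std::size_t N, class T, class E1, class E2>
Vector<N, T, BinaryVectorOp<Vector<N, T, E1>, Vector<N, T, E2>, OpMultiply> >
operator*(const Vector<N, T, E1>& a, const Vector<N, T, E2>& b) {
    return Vector<N, T,
        BinaryVectorOp<Vector<N, T, E1>, Vector<N, T, E2>, OpMultiply> >(a, b);
}

template <std::size_t N, class T, class E1, class E2>
Vector<N, T, BinaryVectorOp<Vector<N, T, E1>, Vector<N, T, E2>, OpAdd> >
operator+(const Vector<N, T, E1>& a, const Vector<N, T, E2>& b) {
    return Vector<N, T,
        BinaryVectorOp<Vector<N, T, E1>, Vector<N, T, E2>, OpAdd> >(a, b);
}

// Usage: the right-hand side only builds lightweight nodes; components are
// computed and stored in the single loop inside operator=.
inline void demo() {
    Vector<2> v1, v2;
    v1(0) = 3.0;
    v1(1) = 4.0;
    v2 = v1 * v1 + v1 * v1;
    assert(v2(0) == 18.0 && v2(1) == 32.0);
}
```

Because the nodes hold references, an expression object must not outlive the statement that created it; storing the result of v1 * v1 in a named variable and using it later would leave dangling references. Whether this pays off compared to what the optimizer already does with the plain Vector is exactly the open question in this thread.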

Maybe it's really simple - you might want to try ;)

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
