help-gplusplus

boost::array loop unrolling performance


From: per . nordlow
Subject: boost::array loop unrolling performance
Date: 4 Jul 2006 03:04:10 -0700
User-agent: G2/0.2

Hi, C++ Lovers!

I am using the boost::array template class to generalize my
handcrafted vector specializations for dimensions 2, 3 and 4.

As performance is of great importance to me, I have written an initial
benchmark that tests how well g++ can unroll loops whose number of
iterations can be determined at compile time or upon entry to the loop.
The gcc switch "-funroll-loops" should do just that. The test program
calculates the dot product of two four-dimensional arrays of int 10
million times. The calculation is performed with a general and a
specialized version of the dot product, general_dot() and special_dot()
respectively, and the program looks as follows:

#include <cstddef>
#include <iostream>
#include <boost/array.hpp>
#include "../Timer.hpp"   // local timing helper (not shown)

template <typename T, std::size_t N>
inline T general_dot(const boost::array<T, N> & a,
                     const boost::array<T, N> & b)
{
    T c = 0;
    for (std::size_t i = 0; i < N; i++)
    {
        c += a[i] * b[i];
    }
    return c;
}

template <typename T>
inline T special_dot(const boost::array<T, 4> & a,
                     const boost::array<T, 4> & b)
{
    return (a[0] * b[0] +
            a[1] * b[1] +
            a[2] * b[2] +
            a[3] * b[3]);
}

template <typename T, std::size_t N>
std::ostream & operator << (std::ostream & os,
                            const boost::array<T, N> & a)
{
    os << '[';
    for (std::size_t i = 0; i < N; i++)
    {
        os << ' ' << a[i];
    }
    os << ']';
    return os;
}

typedef int S;                  ///< Scalar type.

int main(int argc, char * argv[])
{
    typedef boost::array<S, 4> T;

    T a;
    a.assign(3);

    T b = a;

    Timer t;

    const unsigned int nloops = 10000000;

    S sum = 0;
    t.reset();
    for (unsigned int i = 0; i < nloops; i++)
    {
        sum += general_dot(a, b);
    }
    t.read();
    std::cout << "general: " << t << std::endl;

    S tum = 0;
    t.reset();
    for (unsigned int i = 0; i < nloops; i++)
    {
        tum += special_dot(a, b);
    }
    t.read();
    std::cout << "special: " << t << std::endl;

    if (sum == tum)
    {
        std::cout << "Checksums are equal. OK" << std::endl;
    }
    else
    {
        std::cout << "Checksums are not equal. NOT OK" << std::endl;
    }

    return 0;
}
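
Timer.hpp is a local header that is not shown here. For anyone who
wants to reproduce the measurement without it, the following is a
minimal sketch of a replacement. It assumes a C++11 compiler, and
time_loop() is a hypothetical helper built on
std::chrono::steady_clock, not the Timer class used above.

```cpp
#include <chrono>

// Hypothetical stand-in for the Timer class (assumption, not the original
// helper): calls f() nloops times, adds each result to sum, and returns the
// elapsed wall-clock time in microseconds.
template <typename F, typename S>
inline long long time_loop(F f, unsigned int nloops, S& sum)
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    for (unsigned int i = 0; i < nloops; ++i)
    {
        sum += f();
    }
    const Clock::time_point stop = Clock::now();

    return std::chrono::duration_cast<std::chrono::microseconds>(
        stop - start).count();
}
```

Passing the dot-product call as a callable and accumulating into a
caller-owned sum keeps the result observable, in the same way that the
sum/tum checksums do in main().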


Compiling with g++-3.3.6 using the switches "-O3 -funroll-all-loops"
and running this on my Pentium 4 yields the following benchmark:

  general: 60.965ms
  special: 902us
  Checksums are equal. OK

As we can see, general_dot() performs terribly: it is roughly 68 times
slower than special_dot().

Is g++-3.3.6 really that bad at optimizing or have I forgotten
something?

Do I have to switch to gcc version 4.0, 4.1 or 4.2 to make g++ compile
the instantiation of general_dot() into code with performance similar
or equal to that of the code produced for special_dot()?
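
For comparison, one way to get the unrolled form without depending on
the optimizer's loop unrolling is to let template recursion expand the
sum at compile time. The following is only a sketch under assumptions:
std::array (C++11) stands in for boost::array, and DotUnroll and
unrolled_dot() are hypothetical names. I have not measured whether it
matches special_dot() on g++-3.3.6.

```cpp
#include <array>
#include <cstddef>

// Recursive case: sum of the first I-1 products plus the product at I-1.
template <typename T, std::size_t N, std::size_t I>
struct DotUnroll
{
    static T eval(const std::array<T, N>& a, const std::array<T, N>& b)
    {
        return DotUnroll<T, N, I - 1>::eval(a, b) + a[I - 1] * b[I - 1];
    }
};

// Base case: zero elements contribute nothing to the sum.
template <typename T, std::size_t N>
struct DotUnroll<T, N, 0>
{
    static T eval(const std::array<T, N>&, const std::array<T, N>&)
    {
        return T(0);
    }
};

// Entry point with the same shape as general_dot(); the recursion depth is
// fixed by N, so the source contains no run-time loop.
template <typename T, std::size_t N>
inline T unrolled_dot(const std::array<T, N>& a, const std::array<T, N>& b)
{
    return DotUnroll<T, N, N>::eval(a, b);
}
```

For N = 4 the recursion expands to the same four products that
special_dot() spells out by hand, provided the compiler inlines the
nested eval() calls.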


Many thanks in advance,

Per Nordlöw
Swedish Defence Research Agency
Linköping
Sweden


