
Re: [lmi] Using auto-vectorization (was: Replacing boost with std C++11)


From: Vadim Zeitlin
Subject: Re: [lmi] Using auto-vectorization (was: Replacing boost with std C++11)
Date: Sat, 21 Jan 2017 02:20:17 +0100

On Fri, 20 Jan 2017 23:47:14 +0000 Greg Chicares <address@hidden> wrote:

GC> Would you like to propose a patch to 'expression_template_0_test.cpp'
GC> so that we can measure what you'd prefer against the other methods
GC> already tested there? It tests array lengths of 10*{0, 1, 2, 3, 4, 5}.
GC> It would be extremely interesting to see whether auto-vectorization
GC> has obviated the need for expression templates.

 At first glance, it doesn't seem so. I've enabled auto-vectorization
for gcc 6 by using -O3 (note that it's disabled by default, as we use -O2,
which doesn't include -ftree-vectorize), and as soon as it kicks in, which
happens for N=100, it results in significant gains (although smaller than I
expected, but maybe I was just unreasonably optimistic) for the C,
valarray, and PETE versions, and smaller gains for STL and μBLAS, so the
former still remain faster.

 To give some numbers: for N=1000, the C and PETE versions speed up by 48%
(438ns with -O3 against 846ns with -O2) and valarray by almost 50%;
however, the differences between them are so small that they fall within
the measurement error, so all three are roughly equivalent. The plain STL
version also gains 25%, which means it falls even further behind the
fastest code: with -O2 it is ~2.35 times slower, while with -O3 it is ~3.4
times slower. The fancy STL time is reduced by 35% with vectorization, but
the conclusion is the same: instead of being 1.3 times slower, it's 1.6
times slower with vectorization.

 So, if anything, using STL looks even worse with auto-vectorization. But
the excellent news is that the compiler manages to auto-vectorize the PETE
code just as well as the manual loops. And while it could be measurement
error again, the PETE version somehow consistently manages to be faster
than the C version, although the effect shrinks as N increases, e.g.:

        N       PETE time in terms of C
        -------------------------------
            1    80%
           10    83%
          100    89%
         1000    97%
        10000   101%

I could probably spend more time looking at this, notably trying to
understand exactly what -fopt-info-vec-missed is telling me...


GC> I mention it, though, to ask your opinion of changing code like that,
GC> thus:
GC> 
GC> +     for(auto const& i : AllVectors) crc += *i.second;
GC> -     for(auto const& i : AllVectors)
GC> -         {
GC> -         crc += *i.second;
GC> -         }
GC> 
GC> when the resulting line is not too wide to punch on a single IBM 5081
GC> card, so I can still use my Port-A-Punch if the power goes off, with
GC> the incidental benefit that it looks nice in vim with a large font.

 I think this is rare enough to not merit a special exception.


GC> BTW, although I like vim, I've found that installing it has broken the
GC> 'u' and '/' keys in other programs. Now, in notepad-style editors,
GC> pressing '/' inserts a literal slash, and 'u' doesn't undo it, so I
GC> have to use the menus. Oddly enough, it didn't break 'less', and git
GC> still seems to work fine.:wq

 The worst bug is that Ctrl-W can close the window you're typing the text
in, notably in the browser. I remember being really annoyed by this, but
then I discovered Vimperator, and since then I've at least never had this
problem in Firefox.

 Regards,
VZ

