[Lzip-bug] Speedup by including intrinsics for vectorization

From: Erick Couts II
Subject: [Lzip-bug] Speedup by including intrinsics for vectorization
Date: Wed, 12 Oct 2016 23:10:06 -0500

I wanted to point out that the source code includes no SSE intrinsics to vectorize the encoding process.  I've found that a small but decent speed gain can be achieved by including the immintrin.h header and then compiling with auto-vectorization enabled in GCC and with LTO at link time.  I also profiled the program while running it with the --best option in order to optimize it further.  The result was 57 seconds with these optimizations versus 69 seconds without, on a 134 MB file (the contents of the MS Reserved Partition, copied out with dd).  I would recommend looking into adding the intrinsics header so that GCC can automatically optimize the build for whatever CPU is in use.  Including a header that covers later CPUs will not add instructions to the program that the target CPU cannot handle.
The speed increase does come at a cost: the final binary grew by about 4 KB.
I know that you like to keep the code simple, but just adding #include <immintrin.h> to the headers will allow for auto-vectorization without requiring further changes to any of the existing code.
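To give a rough idea of the kind of build I mean, here is a minimal sketch. It is a hypothetical example, not code taken from lzip: the function name sum_bytes and the file name example.cc are made up for illustration, and the flags in the comment are just one plausible GCC setup on x86 (-O3 turns on the vectorizer, -march=native targets the build machine's CPU, -flto enables link-time optimization). Whether any given loop actually gets vectorized depends on the GCC version and the CPU.

```cpp
// Hypothetical example, not from the lzip sources.
// One possible build, assuming GCC on x86:
//   g++ -O3 -march=native -flto -c example.cc
// Adding -fopt-info-vec asks GCC to report which loops it vectorized.
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // declares the x86 SIMD intrinsics; unused by the plain loop below
#endif

// Sums all bytes of a buffer with an ordinary scalar loop.
// The source contains no explicit SIMD code; the compiler decides whether to
// emit vector instructions based on the optimization and -march flags.
std::uint32_t sum_bytes(const std::uint8_t* data, std::size_t size) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < size; ++i) {
    sum += data[i];
  }
  return sum;
}
```

The source stays portable: built without the flags above, the same function compiles to ordinary scalar code, so nothing else would need to change.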
Anyway, hope this helps!
