freetype-devel

Re: intel compiler support interest?


From: David Turner
Subject: Re: intel compiler support interest?
Date: Thu, 18 Jun 2020 15:24:38 +0200



On Sun, 14 Jun 2020 at 07:06, Stephen McDowell <svenevs.dev@gmail.com> wrote:
Hi Alexei,

It's only __builtin_shuffle that's a problem.  I'm a simd novice at best hehe.  I played around for a good long while trying to find an equivalent shuffle intrinsic, for now I was just working off of the GCC examples for __builtin_shuffle: https://godbolt.org/z/gPiZQL

It's technically successful, but with a big caveat: to translate this to the FreeType code, I need help understanding how mask={0,1,1,3} gets transformed into the 212 in the emitted `pshufd xmm0, xmm0, 212` from the GCC __builtin_shuffle call.  Look for `#define MAGIC` in the example; does anything stick out as to how that value is created?  Once we know how that is done, I can begin looking into shorts (the v82 type used in the FreeType code) rather than the ints in the example code.

212 decimal is 0xD4 in hex, or 11010100 in binary. Split into 2-bit fields that is 11_01_01_00, i.e. 3, 1, 1, 0 from the most to the least significant field. Reading from the least significant field upwards gives the {0, 1, 1, 3} mask: element 0 of the mask sits in the lowest two bits.
For more details, see https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Swizzle&text=shuffle_epi32&expand=5144 which explains how the second argument to _mm_shuffle_epi32 is interpreted by the CPU.
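
To make the encoding concrete, here is a minimal sketch in C. The helper names shuffle_imm8 and shuffle_0113 are made up for illustration and are not part of FreeType or of the godbolt example. The first function packs a four-element mask into the 8-bit immediate; the second applies the {0, 1, 1, 3} shuffle with _mm_shuffle_epi32 when SSE2 is available and falls back to plain scalar code otherwise.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_shuffle_epi32 */
#endif

/* Hypothetical helper: pack four 2-bit lane indices into the 8-bit
   immediate expected by pshufd.  mask[0] goes into bits 1:0 and
   mask[3] into bits 7:6. */
static unsigned shuffle_imm8(const unsigned mask[4])
{
    unsigned imm = 0;
    int i;

    for (i = 0; i < 4; i++) {
        assert(mask[i] < 4);          /* each index must fit in 2 bits */
        imm |= mask[i] << (2 * i);
    }
    return imm;
}

/* Hypothetical helper: out[i] = in[mask[i]] for mask = {0, 1, 1, 3}. */
static void shuffle_0113(const int in[4], int out[4])
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* The second argument must be a compile-time constant:
       212 == 0xD4 == (3 << 6) | (1 << 4) | (1 << 2) | 0. */
    v = _mm_shuffle_epi32(v, 212);
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Scalar fallback for targets without SSE2. */
    static const unsigned mask[4] = { 0, 1, 1, 3 };
    int i;

    for (i = 0; i < 4; i++)
        out[i] = in[mask[i]];
#endif
}
```

Calling shuffle_imm8 on {0, 1, 1, 3} yields 212, and shuffle_0113 turns {10, 20, 30, 40} into {10, 20, 20, 40}. The _MM_SHUFFLE(3, 1, 1, 0) macro from the intrinsics headers builds the same constant, with its arguments listed from the highest lane to the lowest.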
 
I'm game to push a little further on it, but to be honest, adding conditional trickery for Intel will make this code more confusing.  It would have to convert between v82 and one of the __mXXXi vector types and involve shuffle splitting (you can't call _mm_shuffle* with the v82 type).  In other words, while Intel users may not get the fastest possible code, none of this code was vectorized previously anyway, so it's kind of a wash.  That said, I totally understand the desire to vectorize it if we can :)

I think it makes sense to first disable the vectorized code path to get the source to build properly with the Intel compiler.
A second patch could then try to optimize the code using Intel intrinsics on x86 and x86_64, which would probably be portable to more compilers. I'm not sure this is worth it, though.
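
As a rough illustration of the first step, a guard along these lines would keep the vector path for GCC-compatible compilers and send the classic Intel compiler down the scalar path. This is only a sketch: DEMO_USE_VECTOR_EXT and double_lanes are hypothetical names, not FreeType's actual macro or code, and the lane arithmetic stands in for the real pixel processing. It relies on the classic Intel compiler defining __INTEL_COMPILER, and on the fact that it may also define __GNUC__, which is why the explicit exclusion is needed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical switch (not a FreeType macro): use GCC-style vector
   extensions unless the classic Intel compiler is detected. */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define DEMO_USE_VECTOR_EXT 1
#else
#define DEMO_USE_VECTOR_EXT 0
#endif

#if DEMO_USE_VECTOR_EXT
/* Eight 16-bit lanes in a 16-byte vector, like the v82 type
   mentioned in this thread. */
typedef unsigned short v82 __attribute__((vector_size(16)));
#endif

/* Double each of the eight 16-bit lanes (wrapping modulo 65536). */
static void double_lanes(const unsigned short in[8], unsigned short out[8])
{
    assert(in != NULL && out != NULL);

#if DEMO_USE_VECTOR_EXT
    v82 v;

    memcpy(&v, in, sizeof v);     /* load without alignment assumptions */
    v = v + v;                    /* element-wise add on all eight lanes */
    memcpy(out, &v, sizeof v);
#else
    int i;

    /* Plain scalar path, used when vector extensions are disabled. */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned short)(in[i] + in[i]);
#endif
}
```

Both branches produce the same results, so the choice only affects speed, not output.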


Let me know your thoughts!

-Stephen


On Sat, Jun 13, 2020 at 2:36 PM Alexei Podtelezhnikov <apodtele@gmail.com> wrote:
On Fri, Jun 12, 2020 at 8:07 AM Stephen McDowell <svenevs.dev@gmail.com> wrote:
> I help maintain the Spack package manager when I can. Currently, users with Intel compilers cannot build or install any version after 2.7.1 due to the use of __builtin_shuffle (for some reason Intel still doesn't support this).

Is there by any chance an equivalent intrinsic?
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#cats=Bit%20Manipulation
What about __builtin_clz that FreeType also uses?
