openexr-devel
Re: [Openexr-devel] Vector-of-float to vector-of-half conversion?


From: James Bowman
Subject: Re: [Openexr-devel] Vector-of-float to vector-of-half conversion?
Date: Wed, 20 Jul 2011 16:59:18 -0700

Here's my conversion code.  It handles infinity, overflow and underflow, but it does not handle denormals or NaNs: anything too small for a normal half is flushed to zero.  It also truncates the mantissa instead of rounding it.  By my timing, it's about 2x the speed of the conversion in "Half".

#include <stddef.h>     // size_t
#include <emmintrin.h>  // SSE2 integer intrinsics (also pulls in xmmintrin.h)

// Broadcast a 32-bit integer constant to all four lanes
#define Ki(x) _mm_set1_epi32((int)(x))

// Convert n floats at src to half-floats at dst.  n must be a multiple of 4
static void qhalf(unsigned short *dst, const float *src, size_t n)
{
  while (n) {
    __m128i full = _mm_loadu_si128((const __m128i*)src);      // load 4 floats as raw bits
    __m128i sign = _mm_and_si128(full, Ki(1UL << 31));        // extract sign
    __m128i mantissa = _mm_and_si128(full, Ki((1 << 23) - 1)); // extract mantissa
    __m128i exp8 = _mm_and_si128(_mm_srli_epi32(full, 23), Ki(0xff)); // extract exponent

    // Normal halves have biased float exponents 0x71..0x8e (half exponents 1..30)
    __m128i is_huge = _mm_cmpgt_epi32(exp8, Ki(0x8e));        // means flush to INF
    __m128i is_tiny = _mm_cmplt_epi32(exp8, Ki(0x71));        // means flush to zero

    __m128i exp5 = _mm_add_epi32(exp8, Ki(-0x7f + 0xf));      // rebias exponent

    // INF is represented by exponent=0x1f and mantissa=0
    exp5 = _mm_and_si128(_mm_or_si128(exp5, is_huge), Ki(0x1f));
    mantissa = _mm_andnot_si128(is_huge, mantissa);

    __m128i half = _mm_or_si128(_mm_slli_epi32(exp5, 10), _mm_srli_epi32(mantissa, 13));

    // Underflow: force to zero
    half = _mm_andnot_si128(is_tiny, half);

    // apply sign
    half = _mm_or_si128(half, _mm_srai_epi32(sign, 16));

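    // Pack to 16 bits and store 4 halves.  The arithmetic shift above sign-extends
    // negative values, so the signed saturation in packs leaves them intact.
    _mm_storel_epi64((__m128i*)dst, _mm_packs_epi32(half, half));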

    dst += 4;
    src += 4;
    n -= 4;
  }
}
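
For checking the SIMD output, or for handling a tail when n is not a multiple of 4, a scalar version of the same steps can be handy.  The following is only a sketch: the names float_to_half_bits and qhalf_scalar are made up for illustration, and it assumes the normal half range corresponds to biased float exponents 0x71 through 0x8e.  Like the code above, it flushes small values to zero, turns NaN into INF, and truncates the mantissa.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar helper: convert one float to half-float bits.
   Denormal results flush to zero, NaN is not preserved (it becomes INF),
   and the mantissa is truncated rather than rounded. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t full;
    memcpy(&full, &f, sizeof full);             /* reinterpret the float's bits */

    uint32_t sign     = (full >> 16) & 0x8000u; /* sign moved to bit 15 */
    uint32_t exp8     = (full >> 23) & 0xffu;   /* biased 8-bit exponent */
    uint32_t mantissa = full & 0x7fffffu;       /* 23-bit mantissa */

    if (exp8 < 0x71u)                           /* too small for a normal half */
        return (uint16_t)sign;                  /* signed zero */
    if (exp8 > 0x8eu)                           /* too large, or INF/NaN input */
        return (uint16_t)(sign | 0x7c00u);      /* signed INF */

    uint32_t exp5 = exp8 - 0x7fu + 0xfu;        /* rebias exponent: 1..30 */
    return (uint16_t)(sign | (exp5 << 10) | (mantissa >> 13));
}

/* Hypothetical scalar counterpart of qhalf(); n may be any value. */
static void qhalf_scalar(uint16_t *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = float_to_half_bits(src[i]);
}
```

Comparing qhalf against qhalf_scalar over a sweep of inputs is a cheap way to catch off-by-one mistakes in the exponent thresholds.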


On Tue, Jul 19, 2011 at 11:19 AM, Florian Kainz <address@hidden> wrote:

Have any of you tried to convert vectors of 32-bit floating-point
numbers to vectors of 16-bit floating-point numbers in a way that
is faster than calling OpenEXR's half-from-float constructor for
each vector element?  Maybe using MMX, SSE or AVX instructions?

_______________________________________________
Openexr-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/openexr-devel



--
James Bowman
http://www.excamera.com/sphinx/articles-openexr.html


