Re: [Discuss-gnuradio] Writing SIMD code with sse
From: Dominik Auras
Subject: Re: [Discuss-gnuradio] Writing SIMD code with sse
Date: Wed, 12 Dec 2007 19:45:10 +0100
User-agent: Thunderbird 2.0.0.9 (X11/20071115)
Hi!
The intrinsics are more or less C wrapper functions for assembly
instructions. You can find a detailed description in Intel's manuals:
http://www.intel.com/products/processor/manuals/index.htm
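To illustrate how an intrinsic stands in for an instruction, here is a minimal sketch using Intel's _mm_* intrinsics from <xmmintrin.h>, which GCC, Clang and MSVC all ship (the snippet further down uses GCC's lower-level __builtin_ia32_* builtins instead). The name add4 is hypothetical, and the instruction names in the comments are what the compiler typically emits; the optimizer is free to choose differently.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics: __m128, _mm_*_ps

// Adds four pairs of floats with one packed SSE addition.
// Typical instruction mapping:
//   _mm_load_ps  -> movaps (aligned 16-byte load)
//   _mm_add_ps   -> addps  (4 float additions at once)
//   _mm_store_ps -> movaps (aligned 16-byte store)
// All three pointers must be 16-byte aligned.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}
```

Declaring the arrays with alignas(16) satisfies the alignment requirement; _mm_loadu_ps and _mm_storeu_ps are the unaligned alternatives.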
SSE, SSE2 and SSE3 are supported by modern AMD and Intel processors.
Many further improvements are possible, but then you need to select
code per processor, since not every CPU supports every SSE level.
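One common way to do that selection is to query the CPU once at run time and keep a function pointer to the best implementation. The sketch below assumes GCC or Clang on x86, which provide __builtin_cpu_supports and the target attribute; scale_scalar, scale_sse, scale_fn and select_scale are hypothetical names chosen for illustration.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics
#include <cstddef>

// Scalar fallback: runs on any CPU.
static void scale_scalar(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

// SSE path: 4 floats per iteration, then a scalar tail for the remainder.
// target("sse") lets this function use SSE even if the rest of the
// program is compiled without -msse (relevant for 32-bit builds).
__attribute__((target("sse")))
static void scale_sse(const float* in, float* out, std::size_t n, float k) {
    const __m128 vk = _mm_set1_ps(k);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(in + i), vk));
    for (; i < n; ++i)
        out[i] = in[i] * k;
}

using scale_fn = void (*)(const float*, float*, std::size_t, float);

// Chooses an implementation based on the features the CPU reports.
scale_fn select_scale() {
    __builtin_cpu_init();  // only strictly needed before static constructors run
    if (__builtin_cpu_supports("sse"))
        return scale_sse;
    return scale_scalar;
}
```

Call select_scale() once at startup and reuse the pointer in the hot loop. On x86-64, SSE and SSE2 are always present, so the check mainly matters for 32-bit builds or for newer extensions such as "sse4.1" or "avx".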
An example using GCC's vector extensions and x86 built-in functions:
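// GCC vector types: 16 bytes = one SSE register
typedef float     v4sf __attribute__ ((vector_size(16)));  // 4 x float
typedef short int v8hi __attribute__ ((vector_size(16)));  // 8 x int16
typedef int       v4si __attribute__ ((vector_size(16)));  // 4 x int32

v4sf * o = static_cast<v4sf*>(buffer->write_pointer());
const v8hi * in = reinterpret_cast<const v8hi*>(usrp_buffer);

// Each iteration consumes 16 bytes (8 shorts) and writes 8 floats (2 x v4sf).
for (size_t i = 0; i < nbytes; i += 16, o += 2, ++in) {
    const v8hi x = *in;
    // Low 4 shorts: interleave x with itself, view the pairs as 32-bit
    // ints, arithmetic-shift right by 16 to sign-extend, convert to float.
    o[0] = __builtin_ia32_cvtdq2ps(
             __builtin_ia32_psradi128(
               reinterpret_cast<v4si>(
                 __builtin_ia32_punpcklwd128(x, x)), 16));
    // High 4 shorts: same steps using the high-half unpack.
    o[1] = __builtin_ia32_cvtdq2ps(
             __builtin_ia32_psradi128(
               reinterpret_cast<v4si>(
                 __builtin_ia32_punpckhwd128(x, x)), 16));
}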
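This snippet quickly converts the 16-bit shorts delivered by the USRP to
floats using SSE2. It ignores byte order and simply assumes little-endian
samples. nbytes must be a multiple of 16, and both buffers must be 16-byte
aligned, because the compiler assumes that alignment for these vector
types and may emit aligned loads and stores, which fault on misaligned
addresses.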
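For comparison, here is a sketch of the same short-to-float conversion written with Intel's SSE2 intrinsics from <emmintrin.h> instead of GCC builtins. Unlike the snippet above, it uses unaligned loads and stores and converts leftover samples with a scalar loop, so it needs neither 16-byte alignment nor a multiple-of-16 size. It still assumes little-endian samples, and the name shorts_to_floats is hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Converts n signed 16-bit samples to floats, 8 samples per SSE2 iteration.
// Same trick as the __builtin_ia32_* version: interleave each short with
// itself, view each pair as a 32-bit int, then arithmetic-shift right by 16
// to obtain the sign-extended sample.
void shorts_to_floats(const std::int16_t* in, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i x  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(x, x), 16);  // samples 0..3
        __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(x, x), 16);  // samples 4..7
        _mm_storeu_ps(out + i,     _mm_cvtepi32_ps(lo));
        _mm_storeu_ps(out + i + 4, _mm_cvtepi32_ps(hi));
    }
    for (; i < n; ++i)  // scalar tail when n is not a multiple of 8
        out[i] = static_cast<float>(in[i]);
}
```

On CPUs with SSE4.1, _mm_cvtepi16_epi32 (pmovsxwd) sign-extends four shorts in a single instruction and could replace the unpack-and-shift steps, after moving the upper half down with _mm_srli_si128.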
Dominik