[Discuss-gnuradio] complex dotprod speedup
From: Eric Blossom
Subject: [Discuss-gnuradio] complex dotprod speedup
Date: Wed, 10 Nov 2004 16:42:51 -0800
User-agent: Mutt/1.5.6i
Thanks to some serious SSE and 3DNow! hacking by Stephane Fillod, we
now have a much faster version of the complex/complex/complex dot
product function (complex input, complex taps, complex output). This
function sits at the bottom of gr.freq_xlating_fir_filter_ccc, the
block that performs channel selection and digital downconversion on
complex data. This is really handy when using the USRP, since we're
dealing with complex data on the host.
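As a rough illustration of why the dot product dominates a frequency-translating FIR filter, here is a minimal C++ sketch. It is not GNU Radio's implementation: `freq_xlate_filter` and `dotprod_ccc` are hypothetical names, the local `gr_complex` alias is just `std::complex<float>`, and the sketch mixes the input down before filtering, which is the textbook formulation rather than necessarily how gr.freq_xlating_fir_filter_ccc arranges the arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using gr_complex = std::complex<float>;  // local alias for complex<float>

// Plain complex/complex/complex dot product: sum over k of input[k] * taps[k].
gr_complex dotprod_ccc(const gr_complex* input, const gr_complex* taps, std::size_t ntaps) {
  gr_complex acc(0.0f, 0.0f);
  for (std::size_t k = 0; k < ntaps; ++k)
    acc += input[k] * taps[k];
  return acc;
}

// Hypothetical frequency-translating FIR filter (conceptual, not GNU Radio code).
// Taps are applied in stored order, i.e. they are assumed already time-reversed;
// for symmetric low-pass taps this makes no difference.
std::vector<gr_complex> freq_xlate_filter(const std::vector<gr_complex>& in,
                                          const std::vector<gr_complex>& taps,
                                          double center_freq, double sample_rate,
                                          std::size_t decim) {
  assert(decim > 0 && !taps.empty());
  const double pi = 3.14159265358979323846;

  // 1. Digital downconversion: multiply by exp(-j*2*pi*center_freq*n/sample_rate)
  //    so the wanted channel moves to 0 Hz.
  std::vector<gr_complex> mixed(in.size());
  for (std::size_t n = 0; n < in.size(); ++n) {
    double phase = -2.0 * pi * center_freq * static_cast<double>(n) / sample_rate;
    mixed[n] = in[n] * gr_complex(static_cast<float>(std::cos(phase)),
                                  static_cast<float>(std::sin(phase)));
  }

  // 2. Channel selection: low-pass FIR evaluated only at every decim-th output,
  //    costing one dot product per output sample.
  std::vector<gr_complex> out;
  for (std::size_t n = 0; n + taps.size() <= mixed.size(); n += decim)
    out.push_back(dotprod_ccc(&mixed[n], taps.data(), taps.size()));
  return out;
}
```

With 256 taps, every output sample costs 256 complex multiply-adds inside `dotprod_ccc`, so that inner loop is where SIMD pays off.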
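Below is a selection of benchmark results on different machines. One
thing that I find interesting is how widely the gap between the
generic and SIMD times varies with the machine's microarchitecture:
the Pentium 4 has the slowest generic time of the three machines but
the fastest SSE time, so SSE buys about 6.7x there versus about 3.1x
on the Pentium M. The generic implementations are C++ with partial
loop unrolling.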
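To make the generic/SIMD distinction concrete, here is a sketch of two ways to write the dot product. The first is plain C++ with 4-way partial loop unrolling, in the spirit of the generic implementations; the second uses SSE intrinsics from `<xmmintrin.h>` to process two complex samples per instruction. Neither is Stephane's code: the function names are hypothetical, 3DNow! is not shown, and the SSE variant is only compiled when the compiler defines `__SSE__` (x86-64 compilers do by default).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

using gr_complex = std::complex<float>;

// Generic C++ with 4-way partial unrolling. Four independent accumulators let
// the CPU overlap the multiply-adds instead of serialising on a single sum.
gr_complex dotprod_ccc_unrolled(const gr_complex* input, const gr_complex* taps,
                                std::size_t ntaps) {
  gr_complex acc0(0.0f), acc1(0.0f), acc2(0.0f), acc3(0.0f);
  std::size_t k = 0;
  for (; k + 4 <= ntaps; k += 4) {
    acc0 += input[k] * taps[k];
    acc1 += input[k + 1] * taps[k + 1];
    acc2 += input[k + 2] * taps[k + 2];
    acc3 += input[k + 3] * taps[k + 3];
  }
  for (; k < ntaps; ++k) acc0 += input[k] * taps[k];  // leftover taps
  return (acc0 + acc1) + (acc2 + acc3);
}

#if defined(__SSE__)
// SSE sketch: two complex samples (four floats) per iteration.
gr_complex dotprod_ccc_sse(const gr_complex* input, const gr_complex* taps,
                           std::size_t ntaps) {
  // std::complex<float> arrays may be viewed as interleaved {re, im} floats.
  const float* x = reinterpret_cast<const float*>(input);
  const float* h = reinterpret_cast<const float*>(taps);
  __m128 same = _mm_setzero_ps();   // accumulates {xr*hr, xi*hi, xr*hr, xi*hi}
  __m128 cross = _mm_setzero_ps();  // accumulates {xr*hi, xi*hr, xr*hi, xi*hr}
  std::size_t k = 0;
  for (; k + 2 <= ntaps; k += 2) {
    __m128 xv = _mm_loadu_ps(x + 2 * k);  // {xr0, xi0, xr1, xi1}
    __m128 hv = _mm_loadu_ps(h + 2 * k);  // {hr0, hi0, hr1, hi1}
    __m128 hswap = _mm_shuffle_ps(hv, hv, _MM_SHUFFLE(2, 3, 0, 1));  // {hi0, hr0, hi1, hr1}
    same = _mm_add_ps(same, _mm_mul_ps(xv, hv));
    cross = _mm_add_ps(cross, _mm_mul_ps(xv, hswap));
  }
  float s[4], c[4];
  _mm_storeu_ps(s, same);
  _mm_storeu_ps(c, cross);
  // real = sum(xr*hr - xi*hi), imag = sum(xr*hi + xi*hr)
  gr_complex acc(s[0] - s[1] + s[2] - s[3], c[0] + c[1] + c[2] + c[3]);
  for (; k < ntaps; ++k) acc += input[k] * taps[k];  // odd leftover tap
  return acc;
}
#endif
```

The SSE variant never needs a per-sample complex multiply: it keeps the "same-index" products and the "cross" products in two separate registers and only combines them into real and imaginary parts once, after the loop.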
Eric
taps = number of filter taps
input = number of input samples
cpu = combined user+sys cpu time, in seconds
taps/sec = taps * input / cpu, a derived performance measure. Higher is better.
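The legend's definitions can be turned into a small measurement harness. This is a minimal sketch assuming a POSIX system: it reads combined user+sys CPU time with `getrusage(RUSAGE_SELF, ...)` and reports taps/sec = taps * input / cpu. The function names are hypothetical and this is not the source of benchmark_dotprod_ccc; it uses a naive dot product over a small reused buffer, so its absolute numbers will not match the runs below.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>
#include <sys/resource.h>  // POSIX getrusage

using gr_complex = std::complex<float>;

// Combined user+sys CPU time consumed by this process so far, in seconds.
double cpu_seconds() {
  struct rusage ru;
  getrusage(RUSAGE_SELF, &ru);
  return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec * 1e-6 +
         ru.ru_stime.tv_sec + ru.ru_stime.tv_usec * 1e-6;
}

// taps/sec as defined in the legend: each input sample costs ntaps multiply-adds.
double taps_per_sec(double ntaps, double ninput, double cpu) {
  return ntaps * ninput / cpu;
}

// Hypothetical harness: one ntaps-long dot product per input sample.
double benchmark(std::size_t ntaps, std::size_t ninput) {
  std::vector<gr_complex> taps(ntaps, gr_complex(0.5f, -0.25f));
  std::vector<gr_complex> buf(ntaps + 1024, gr_complex(1.0f, 1.0f));
  volatile float sink = 0.0f;  // keeps the compiler from discarding the work
  double t0 = cpu_seconds();
  for (std::size_t i = 0; i < ninput; ++i) {
    const gr_complex* in = &buf[i % 1024];  // slide a window over the buffer
    gr_complex acc(0.0f);
    for (std::size_t k = 0; k < ntaps; ++k) acc += in[k] * taps[k];
    sink = sink + acc.real();
  }
  double cpu = cpu_seconds() - t0;
  return cpu > 0.0 ? taps_per_sec(ntaps, ninput, cpu) : 0.0;
}
```

As a check of the formula against the first run below: 256 * 4e+07 / 121.476 s is about 8.43e+07 taps/sec, matching the reported figure.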
=== This machine is a Pentium M (1.4 GHz) ===
address@hidden tests]$ ./benchmark_dotprod_ccc
generic: taps: 256 input: 4e+07 cpu: 121.476 taps/sec: 8.43e+07
SSE: taps: 256 input: 4e+07 cpu: 39.010 taps/sec: 2.625e+08
=== This machine is an Athlon MP 1800+ (1.5 GHz) ===
address@hidden tests]$ ./benchmark_dotprod_ccc
generic: taps: 256 input: 4e+07 cpu: 118.090 taps/sec: 8.671e+07
3DNow!Ext: taps: 256 input: 4e+07 cpu: 29.705 taps/sec: 3.447e+08
3DNow!: taps: 256 input: 4e+07 cpu: 33.213 taps/sec: 3.083e+08
SSE: taps: 256 input: 4e+07 cpu: 37.242 taps/sec: 2.75e+08
=== This machine is a Pentium 4 (1.7 GHz) ===
address@hidden tests]$ ./benchmark_dotprod_ccc
generic: taps: 256 input: 4e+07 cpu: 156.850 taps/sec: 6.529e+07
SSE: taps: 256 input: 4e+07 cpu: 23.241 taps/sec: 4.406e+08
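Because every run does the same work (256 taps over 4e+07 samples), the SIMD speedup on a given machine is simply the generic cpu time divided by the SIMD cpu time. A small C++ sketch that tabulates those ratios from the reported numbers (the `Run` struct and `speedup` function are hypothetical names):

```cpp
#include <cassert>
#include <cstdio>

// One benchmark line from the runs above (taps = 256, input = 4e+07 throughout).
struct Run {
  const char* machine;
  const char* impl;
  double generic_cpu;  // cpu seconds for the generic C++ code on that machine
  double simd_cpu;     // cpu seconds for the SIMD variant
};

// Same work in both runs, so the speedup is the ratio of CPU times
// (equivalently, the ratio of the taps/sec figures).
double speedup(const Run& r) { return r.generic_cpu / r.simd_cpu; }

const Run kRuns[] = {
    {"Pentium M 1.4 GHz", "SSE", 121.476, 39.010},
    {"Athlon MP 1800+", "3DNow!Ext", 118.090, 29.705},
    {"Athlon MP 1800+", "3DNow!", 118.090, 33.213},
    {"Athlon MP 1800+", "SSE", 118.090, 37.242},
    {"Pentium 4 1.7 GHz", "SSE", 156.850, 23.241},
};

void print_speedups() {
  for (const Run& r : kRuns)
    std::printf("%-18s %-10s %.2fx\n", r.machine, r.impl, speedup(r));
}
```

On the Athlon the ratios also rank the three SIMD paths: 3DNow!Ext ahead of 3DNow!, which is ahead of SSE.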