
Re: [Discuss-gnuradio] Performance on ARM Cortex-A8


From: Philip Balister
Subject: Re: [Discuss-gnuradio] Performance on ARM Cortex-A8
Date: Fri, 15 Jul 2011 16:42:51 -0400
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110621 Fedora/3.1.11-1.fc14 Thunderbird/3.1.11

On 07/15/2011 04:24 PM, Marcus D. Leech wrote:
> On 07/13/2011 04:40 AM, Riadh Elloumi wrote:
>> Hi all,
>>
>> I compiled DAB demodulation for ARM Cortex-A8 (TI OMAP 3). It
>> successfully demodulates DAB+ but spends 13 seconds decoding 1 second of
>> radio baseband (USRP file).
>>
>> I used all the optimized code for Cortex-A8 like dotprod_ccf_armv7_a.c.
>> My compilation flags are: -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon
>> -O2. I used fftw-3.2.2.
>
> What does -mfloat-abi=softfp do? Does that cause software floating-point
> to be used?
> If it does, then your floating-point performance is going to be
> completely awful.

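No, that only selects the soft-float ABI (the calling convention); the code still uses the hardware floating-point and NEON instructions. Basically, floating-point arguments and return values cannot be passed in VFP/NEON registers. This is not too bad, since we are normally passing pointers to arrays.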

We can compile the entire system with the hard-float ABI, but it is not a big win and it adds some complexity for people using certain binary-only libraries (which are usually built with soft float).


> A good test for comparing oranges/oranges would be to construct a simple C
> program
> that does, let's say, 10e6 single-precision floating-point
> multiply/accumulate operations,
> and compare among platforms with similar clock speeds, etc.
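
A minimal sketch of such a test, for illustration only. The names mac_f32 and mac_rate are hypothetical, the buffer contents are arbitrary, and clock() measures CPU time at a fairly coarse resolution, so use enough passes for the run to last at least a second or so.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Single-precision multiply/accumulate (dot product) over two arrays. */
float mac_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/*
 * Run `passes` passes over two n-element buffers and return the measured
 * rate in multiply/accumulates per second. Returns -1.0 if allocation
 * fails and 0.0 if the run was too short for clock() to resolve.
 * The accumulated value is written to *sink so that the compiler
 * cannot discard the arithmetic.
 */
double mac_rate(size_t n, int passes, float *sink)
{
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.0f;   /* arbitrary test data */
        b[i] = 0.5f;
    }

    clock_t t0 = clock();
    float acc = 0.0f;
    for (int p = 0; p < passes; p++)
        acc += mac_f32(a, b, n);
    clock_t t1 = clock();

    *sink = acc;
    free(a);
    free(b);

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)n * (double)passes / secs : 0.0;
}
```

Calling mac_rate(10000, 1000, &sink) performs 10 million multiply/accumulates. Building the same file with the same optimization level on each platform (for the Cortex-A8, with the flags quoted earlier in the thread) gives numbers that can be compared directly.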

From a quick look at Tom's oprofile results, first find out who is calling into libm and see if you can change the block to stop calling libm. For example, calculate sin/cos via a table approximation (I think GNU Radio already does that).

Then look at the signal processing blocks that are next in usage and do some NEON optimizations using ORC.

Philip



>> Why is GNU Radio too slow demodulating DAB+? Do you have some figures of
>> CPU consumption on ARM Cortex cores? Is there some optimization I missed
>> for the platform?






