discuss-gnuradio


Re: [Discuss-gnuradio] Complex Short/INT16 type


From: Josh Blum
Subject: Re: [Discuss-gnuradio] Complex Short/INT16 type
Date: Tue, 08 Nov 2011 20:00:10 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1


On 11/08/2011 07:40 PM, Nowlan, Sean wrote:
> 3 quick questions - first, does the cmake setup automatically turn on
> gcc optimizations, i.e, with "-O3"?  Second, is there anything to be
> gained (or lost) by turning on "-ftree-vectorize" and
> "-funsafe-math-optimizations"? Finally, is the gcc on E100 really
> CodeSourcery's arm-none-eabi-gcc (or an upstream GNU version
> thereof)?
> 

CMake will automatically build in release mode, which gives you -O3.
Other important flags need to be specified explicitly; you can do this in
one fell swoop with a toolchain file. One is checked into the
cmake/Toolchains directory; see its comments for usage.

-josh

> Thanks, Sean
> 
> -----Original Message----- From: Nick Foster [mailto:address@hidden 
> Sent: Tuesday, November 08, 2011 4:10 PM To: Nowlan, Sean;
> address@hidden Subject: Re: [Discuss-gnuradio] Complex Short/INT16
> type
> 
> On Tue, Nov 8, 2011 at 12:50 PM, Nowlan, Sean
> <address@hidden> wrote:
>> So, what needs to be done? I noticed that there are already hooks
>> for NEON in the volk library but no implementation (or very
>> little... don't remember exactly).
> 
> Josh is putting together a little example that uses Volk in
> Gnuradio's core blocks (add, subtract, etc.). This will eventually
> (hopefully) become the replacement for much of the functionality in
> gnuradio-core. We've been talking about this for a long time, and it
> should provide a pretty major speedup on all platforms, but
> especially those for which the compiler sucks (ARM being the worst
> offender). Josh's example should provide a framework for you to work
> with while we get Volk fully integrated into Gnuradio "for real".
> 
> You can also always use Volk functions in your own custom dsp blocks
> to speed them up. You can also just use Volk outside of Gnuradio if
> you like.
> 
>> 
>> My understanding of Orc is that it generates architecture-dependent
>> vector processor instructions from an Orc abstraction language. Is
>> integrating Orc into Volk for NEON as simple as linking into liborc
>> with a compile switch indicating that we want NEON output? Are the
>> smarts already built into the cmake build process?
> 
> Orc is actually a little cooler than that -- it's a runtime-compiled
> architecture-independent vector assembly language. It's integrated as
> one alternative architecture for implementing Volk functions. Volk
> has been set up to automatically select the fastest implementation
> available for a given function at runtime, so for the user it's as
> simple as #include <volk/volk.h> and then
> volk_32f_x2_add_32f_a16(...) to implement an adder. Volk will
> automatically choose the fastest implementation at runtime the first
> time the function is invoked, after figuring out what architecture
> it's running on and what implementations are available for that given
> function. If an Orc version of a function is available, it will be
> automatically selected and the Orc code will runtime-compile to
> vectorized NEON. You don't have to link against liborc at all, just
> against libvolk. We don't have any native NEON in Volk -- we use Orc
> to provide coverage on NEON platforms. We've found that Orc tends to
> be around 90% as fast as good, hand-tuned assembly most of the time,
> and sometimes faster. The reason we don't just use Orc for everything
> is that it's usually possible to do a little better with careful
> optimization and compiler intrinsics, and we were "gifted" a large
> library of well-optimized SSE DSP routines to use.
> 
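The "choose the fastest implementation on first call" behaviour described above can be pictured as a function pointer that starts out aimed at a resolver. The following is only a minimal sketch of that pattern, not Volk's actual source: every name in it (add_32f, add_generic, add_unrolled4, cpu_has_fast_path) is hypothetical, and the "fast" kernel is a plain unrolled loop standing in for a real SIMD or Orc implementation.

```cpp
#include <cassert>
#include <cstddef>

// All names here are hypothetical; this is not Volk's real code.
using add_fn = void (*)(float* out, const float* a, const float* b, std::size_t n);

// Portable fallback that works on any architecture.
void add_generic(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Stand-in for a vectorized kernel: same results, four samples per iteration.
void add_unrolled4(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover samples
}

// Placeholder for CPU feature detection (assumption: a real library
// would query the hardware here).
bool cpu_has_fast_path() { return true; }

void add_first_call(float* out, const float* a, const float* b, std::size_t n);

// Public entry point: initially points at the resolver below.
add_fn add_32f = add_first_call;

// Runs once: picks an implementation, rebinds the pointer, forwards the call.
void add_first_call(float* out, const float* a, const float* b, std::size_t n) {
    add_32f = cpu_has_fast_path() ? add_unrolled4 : add_generic;
    assert(add_32f != add_first_call);  // later calls skip the resolver
    add_32f(out, a, b, n);
}
```

Callers only ever write add_32f(out, a, b, n); the selection cost is paid on the first invocation and every later call goes straight to the chosen kernel.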
>> 
>> Can I drop Philip's _fff and _ccf filters into volk and hit "go?"
>> (I know there's more nuance to it, but if the combination of
>> integrating Orc code and NEON FIR filter code that's already
>> written gets me 90% of the way there, I'd be VERY happy!)
> 
> You can, but the _fff and _ccf filters are already implemented and
> working in NEON. They were done by Phil before Volk was integrated,
> so they're written in assembly in the filter core. They are also
> automatically selected at runtime, so they should be "just working" 
> for you already. Eventually we'll pull the assembly implementations
> out and put them into Volk.
> 
> If you send me your flowgraph, I'll take a look at it on an E100 and
> see if I can get some things optimized.
> 
> --n
> 
>> 
>> Thanks, Sean ________________________________________ From: Nick
>> Foster address@hidden Sent: Tuesday, November 08, 2011 1:27 PM 
>> To: address@hidden Cc: address@hidden; Nowlan, Sean 
>> Subject: Re: [Discuss-gnuradio] Complex Short/INT16 type
>> 
>> Sean, with all the talk about optimization for ARM, the first thing
>> I would do is start to integrate Volk with existing floating-point
>>  blocks. Stock GCC is very, very bad at vectorizing for the NEON
>> SIMD unit -- even when hardware floating point is used in GCC, most
>> float instructions end up allocated to the VFP rather than the NEON
>> unit. You might find an easy 2x-3x improvement just by doing the
>> heavy lifting in Volk rather than in C++. All of the Orc functions
>> in Volk will work for NEON. There's no FIR filter in Orc right now
>> (need to get accumulators working properly in Orc), but Philip
>> Balister already wrote NEON FIR filter cores for the _fff and _ccf
>> FIR filters.
>> 
>> This isn't to say that short complex wouldn't be a useful addition
>> to GR. Just that it's likely going to be more work than making use
>> of the existing floating-point hardware the E100 already has.
>> 
>> This is work that needs to be done anyway to make ARM platforms as
>>  useful as possible, and we (Josh, Phil, and I) are happy to help
>> you optimize your application for E100 if you give us details on
>> how your application works. We're putting together a "motivating
>> example" using Volk to show users how to Volkify their own blocks.
>> 
>> --n
>> 
>> On Tue, Nov 8, 2011 at 9:13 AM, Josh Blum <address@hidden> wrote:
>>> 
>>> 
>>> On 11/07/2011 02:15 PM, Nowlan, Sean wrote:
>>>> Hi all -
>>>> 
>>>> I'm getting limited by the slow ARM processor in the E100 and I
>>>> want to modify parts of gr-digital and gnuradio-core to support
>>>> complex short/INT16 types in the modulation schemes. I suspect
>>>> that it won't be as trivial as defining "typedef
>>>> std::complex<short> gr_complexs;" in
>>>> gnuradio-core/src/lib/runtime/gr_complex.h and doing a
>>>> find-and-replace in the relevant source files. There are probably
>>> 
>>> It may be that simple for some blocks. Like the symbol table in
>>> BPSK.
>>> 
>>>> issues with dynamic range that I'll have to deal with in
>>>> addition to having to implement filters using fixed-point
>>>> math.
>>>> 
>>> 
>>> Often blocks will need to have scale factors. Fortunately, with a
>>> FIR filter, you get a free scale factor in the "filter taps".
>>> 
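The "free scale factor in the filter taps" point can be sketched as follows: when float taps are quantized to 16-bit fixed point, any extra gain can be multiplied in at the same time, so the per-sample loop needs no separate scaling step. This is a rough illustration under assumed conventions (Q15 taps, a 64-bit accumulator, a plain dot product over one window); the function names are hypothetical and are not GNU Radio or Volk API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize float taps to Q15 (1.0 maps to 32768), folding `gain` into
// the taps so the filter loop itself needs no extra multiply.
std::vector<std::int16_t> quantize_taps_q15(const std::vector<float>& taps, float gain) {
    std::vector<std::int16_t> q(taps.size());
    for (std::size_t i = 0; i < taps.size(); ++i) {
        long v = std::lround(taps[i] * gain * 32768.0f);
        if (v > 32767) v = 32767;      // +1.0 is not representable in Q15
        if (v < -32768) v = -32768;
        q[i] = static_cast<std::int16_t>(v);
    }
    return q;
}

// One filter output: dot product of taps.size() input samples with the
// Q15 taps, scaled back to 16 bits and saturated.
std::int16_t fir_q15(const std::int16_t* in, const std::vector<std::int16_t>& taps) {
    assert(!taps.empty());
    std::int64_t acc = 0;  // wide accumulator so the sum cannot overflow
    for (std::size_t i = 0; i < taps.size(); ++i)
        acc += static_cast<std::int32_t>(in[i]) * static_cast<std::int32_t>(taps[i]);
    acc /= 32768;          // remove the Q15 scaling (truncates toward zero)
    if (acc > 32767) acc = 32767;
    if (acc < -32768) acc = -32768;
    return static_cast<std::int16_t>(acc);
}
```

For example, taps {0.5, 0.5} with gain 1.0 average two samples; quantizing the same taps with gain 0.5 halves the output without touching the inner loop.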
>>>> Questions:
>>>> 
>>>> 1)      Do you think I'd save anything by doing all the
>>>> modulation & filtering in complex float32 and then converting
>>>> at the very end?
>>> 
>>> It's good to make the conversion part of an operation that does 
>>> something useful rather than doing it for the sake of
>>> converting. Like a filter that takes in floats and spits out
>>> shorts.
>>> 
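A filter "that takes in floats and spits out shorts" might look roughly like the sketch below: the float-to-short conversion happens once, on the accumulated result, rather than in a separate pass over the data. The function name and the `scale` parameter are assumptions made for illustration; this is not an existing GNU Radio block.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dot product of a float window with float taps, emitted directly as a
// saturated 16-bit sample. `scale` maps the float range onto the short
// range (e.g. 32767 for inputs nominally within [-1, 1]).
// NaN inputs are not handled in this sketch.
std::int16_t fir_float_in_short_out(const float* in,
                                    const std::vector<float>& taps,
                                    float scale) {
    assert(!taps.empty());
    float acc = 0.0f;
    for (std::size_t i = 0; i < taps.size(); ++i)
        acc += in[i] * taps[i];
    float v = std::round(acc * scale);   // round to nearest integer
    if (v > 32767.0f) v = 32767.0f;      // saturate instead of wrapping
    if (v < -32768.0f) v = -32768.0f;
    return static_cast<std::int16_t>(v);
}
```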
>>>> This will reduce the bandwidth requirement to the FPGA by a factor of two,
>>>> but I'm afraid the float math is the true limitation.
>>>> 
>>> 
>>> The format going into the FPGA is always integer. If you pass
>>> floats into the UHD, they are copy-converted from host buffer to
>>> memory mapped buffers.
>>> 
>>>> 2)      Why is there a gr_complex_to_interleaved_short block
>>>> but not a gr_complex_to_complex_short block? Would it be better
>>>> if I rolled my own or just hooked up a
>>>> gr_complex_to_interleaved_short block and then a deinterleave
>>>> block? Or alternatively, split the complex float vector into
>>>> two streams and feed them to a USRP sink block using 
>>>> COMPLEX.INT16?
>>>> 
>>> The interleaved short block is a strange hold-over from ancient 
>>> times. I would ignore it. I think a block such as
>>> "gr_complex_to_complex_short" is a good idea.
>>> 
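The core of a block like the proposed "gr_complex_to_complex_short" could be sketched as below. This shows only the per-sample conversion, not a GNU Radio block, and the names complex_short, saturate_to_short and the `scale` parameter are hypothetical. One caveat: the C++ standard leaves std::complex of integer types unspecified, although the construction and real()/imag() access used here work with the common standard libraries.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical alias, in the spirit of the gr_complexs typedef above.
using complex_short = std::complex<std::int16_t>;

// Round to nearest and clamp to the int16 range. NaN is not handled.
std::int16_t saturate_to_short(float v) {
    v = std::round(v);
    if (v > 32767.0f) v = 32767.0f;
    if (v < -32768.0f) v = -32768.0f;
    return static_cast<std::int16_t>(v);
}

// Convert n complex float samples to complex shorts, applying `scale`
// to both the real and imaginary parts before saturation.
void complex_float_to_complex_short(complex_short* out,
                                    const std::complex<float>* in,
                                    std::size_t n, float scale) {
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = complex_short(saturate_to_short(in[i].real() * scale),
                               saturate_to_short(in[i].imag() * scale));
    }
}
```

Because each output sample stays a single complex value, no interleave or deinterleave step is needed downstream.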
>>>> 3)      What specific parts of the modulation examples or 
>>>> gnuradio-core do you think I need to change to support complex
>>>> short ints?
>>>> 
>>> 
>>> Probably some new sc16 filter blocks for the matched filters. I
>>> have mentioned the importance of volk before.
>>> 
>>> The constellation stuff relies on this new constellation library
>>> in gr-digital. Perhaps Ben can lean in here and offer some advice
>>> on how to modify this for alternative data types.
>>> 
>>> The recovery stuff in the BPSK is using Tom's new
>>> gri-control-loop to simplify writing things like FLLs, PLLs.
>>> That's a place to look; see how the timing recovery blocks make
>>> use of it.
>>> 
>>> -Josh
>>> 
>>> _______________________________________________ Discuss-gnuradio
>>> mailing list address@hidden 
>>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>>> 
>> 
> 


