From: Abhishek Bhowmick
Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
Date: Wed, 26 Feb 2014 12:19:15 +0530
Thanks, everyone. That is quite a few pointers; I will spend some time
digesting them all.
So there are really two approaches: larger, more complex kernels on
one hand and AVX2/AVX/FMA on the other, or a combination of the two.
I guess I should propose identifying and implementing larger, more complex
kernels and then accelerating them further using AVX2/FMA etc. Doing both
will of course limit the number of applications/algorithms I can feasibly
target. What's your take on this?
Abhishek
On Wed, Feb 26, 2014 at 5:03 AM, West, Nathan
<address@hidden> wrote:
> On Tue, Feb 25, 2014 at 4:37 PM, West, Nathan
> <address@hidden> wrote:
>>> > On Sun, 2/23/14, Abhishek Bhowmick <address@hidden>
>>> wrote:
>>> >
>>> > Subject: [Discuss-gnuradio] Google Summer of Code
>>> 2014 applicant : Optimization with VOLK
>>> > To: address@hidden
>>> > Date: Sunday, February 23, 2014, 8:52 AM
>>> >
>>> > Hello,
>>> > I have completed a Bachelor's degree in
>>> > Electrical Engineering from IIT Bombay, India and
>>> will be
>>> > joining a masters program in Computer Science in
>>> August. For
>>> > the summer, I am interested in participating in GSoC
>>> 2014 and
>>> > GNU Radio is an organization where my background
>>> fits
>>> > nicely.
>>> >
>>> > I went through the ideas page and was
>>> > particularly interested in doing performance
>>> optimization
>>> > with VOLK. After going through some online
>>> documentation
>>> > about the library and the SDR'12 paper, I
>>> realised that
>>> > following areas need work :
>>> >
>>> > 1. Profiling GNU radio code to identify new
>>> > kernels and implement them for existing Intel
>>> SIMD
>>> > extensions, also porting kernels to other ISA
>>> extensions.
>>> > 2. Better testing of the effects of more complex
>>> > scheduler logic on larger environments (beyond
>>> simple
>>> > kernels)
>>> >
>>> > 3. Exploring extension of Volk to GPU ISAs, to
>>> > leverage chips such as AMD Fusion (However, this
>>> seems to be
>>> > more research than software development)
>>> >
>>> > According to the GSoC proposal, point (1) seems
>>> > to be the expectation. Given this, I would like
>>> some advice
>>> > on how to go ahead looking for potential ideas
>>> (and some
>>> > feedback on feasibility of the other ideas as
>>> well)
>>> >
>>> >
>>> > My background : C++, Python, Signal Processing,
>>> > Computer Architecture
>>> >
>>> > Thanks,
>>> > Abhishek Bhowmick
>>> >
>>
>>
>> This is a great conversation, and I'll take the opportunity to plug
>> the upcoming VOLK working group call
>> (https://plus.google.com/u/1/events/ch3jrjcvp7mdiqelpismfieg3n0).
>> Bogdan, your results aren't particularly surprising, but the feedback
>> is really good to hear.
>>
>> Back to GSoC:
>>
>> Abhishek,
>>
>>>Thanks for the pointers to gr-atsc and gr-80211. I have started
>>>looking there as a
>>>starting point. Are there similar modules which are undergoing volk
>>>speedup fixes?
>>>I am also trying to meet up with other people who have been using GNU radio
>>>to identify potential modules for acceleration. As you are now a
>>>mentor organization, I feel it's a good time for us to get into
>>>detailed discussions.
>>
>> From the previous discussion it should be apparent that how algorithms
>> are implemented will make the biggest difference, and that the new
>> acceleration is primarily going to come from larger more complex
>> kernels. At the end of the day it's going to be your proposal. So far
>> on the list of places to look we have
>>
>> * in-tree OFDM (contact Martin)
>> * gr-atsc (use Andrew Davis' fork)
>> * gr-dvbt
>> * gr-fecapi
>>
>> For your proposal I would recommend looking at their code, then
>> getting in contact with the author(s) of those modules to ask about
>> their thoughts on accelerating blocks they have written. The reality
>> of this project is that we are accelerating some signal processing
>> algorithm and knowledge of that algorithm is useful for acceleration.
>> Whatever application you have interest and/or knowledge in (fresh
>> out of a BS it's more likely to be interest) should guide your
>> proposal. If you know anything about error correcting codes then the
>> latter 2 would be good fits. OFDM frame detection probably has a
>> gentler learning curve since at the basic level you're looking at
>> convolution, and there's papers you can look for on more involved
>> algorithms. Other algorithms to look at might include agc or
>> equalizers.
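
To illustrate the kind of kernel the paragraph above has in mind, here is a minimal scalar sketch of a sliding correlation against a known preamble, the basic operation behind simple frame detection. The function name, signature, and peak-picking rule are hypothetical and are not taken from VOLK or GNU Radio; a real detector would also normalize and threshold the result.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical generic (non-SIMD) kernel, for illustration only.
 * Correlates `in` against `taps` at every lag in [0, n_lags) and returns
 * the lag with the largest squared magnitude.
 * `in` must hold at least n_lags + taps_len - 1 samples. */
size_t sketch_peak_lag(const float complex *in, const float complex *taps,
                       size_t taps_len, size_t n_lags)
{
    assert(taps_len > 0);
    size_t best_lag = 0;
    float best_power = -1.0f;
    for (size_t lag = 0; lag < n_lags; ++lag) {
        float complex acc = 0.0f;
        /* Inner loop is a conjugate dot product: the part a SIMD
         * protokernel would vectorize. */
        for (size_t k = 0; k < taps_len; ++k) {
            acc += in[lag + k] * conjf(taps[k]);
        }
        float power = crealf(acc) * crealf(acc) + cimagf(acc) * cimagf(acc);
        if (power > best_power) {
            best_power = power;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The inner multiply-accumulate loop is where almost all of the time goes, so it is the natural candidate for an SSE/AVX/NEON implementation alongside a generic one like this.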
>>
>> If you're interested in GPU programming don't forget to check out gr-gpu.
>>
>>>
>>>>
>>>> At the moment the only mainstream ISA not being targeted is probably
>>>> AVX2, which has
>>>> some nice features for the type of kernels we're doing. If you went
>>>> that route you would likely need to add
>>>> protokernels to a pretty large number of kernels.
>>>>
>>>> Nathan
>>>
>>>This also seems to be promising, though I guess it would require me to
>>>come up to speed with AVX2 (which I would love to do). Could you
>>>please elaborate
>>>a little on the kind of beneficial features you have in mind ? I am
>>>concerned that the
>>>job of adding proto-kernels might turn out to be mundane/tedious ? Is
>>>that a valid concern ?
>>
>> Right, so as Martin mentioned the answer is sort of relative. I
>> wouldn't go so far as to say it's mundane, especially if you have
>> little experience with using intrinsics and SIMD instructions. One
>> reason AVX isn't so prominently featured (I suspect) is that the
>> instructions are almost the same as SSE instructions, but the vectors
>> are twice as long so that is actually mundane. AVX2/FMA extensions
>> introduce some new features to the amd64 instruction set. The most
>> obvious being that it looks like Intel and AMD finally settled in on
>> the same fused multiply-add (there's also a multiply-subtract that's
>> good for complex numbers) implementation. That will likely be able to
>> speed things up a bit, but I'm also looking forward to seeing gains
>> from the various load_gathers that have been introduced. They allow
>> you to do a single load operation that gathers vector elements that
>> span pretty large ranges. VOLK won't be so interested in the large
>> ranges (except maybe decimators), but it could be useful for loading
>> complex vectors. There's some other math functions we may be able to
>> leverage, but those are two features that I think would be widely
>> applicable.
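
The two features described above can be modelled in plain scalar C. This is only a sketch of the arithmetic and memory-access patterns, not AVX2/FMA intrinsics code, and the function names are hypothetical. The complex multiply is written so that each output is one product plus one multiply-add or multiply-subtract, which is the shape a fused instruction can cover; the gather is modelled as a strided indexed load.

```c
#include <assert.h>
#include <stddef.h>

/* Complex multiply (a + bi)(c + di), arranged as one plain product
 * followed by one multiply-subtract (real part) or multiply-add
 * (imaginary part). On an FMA-capable target the second step of each
 * line is a candidate for a single fused instruction. */
void sketch_cmul(float a, float b, float c, float d, float *re, float *im)
{
    float bd = b * d;
    float bc = b * c;
    *re = a * c - bd; /* multiply-subtract shape: a*c - (b*d) */
    *im = a * d + bc; /* multiply-add shape:      a*d + (b*c) */
}

/* Scalar model of a gather load: out[i] = base[start + i * stride].
 * With stride 2 this pulls the I or Q samples out of an interleaved
 * complex buffer. */
void sketch_gather(const float *base, size_t start, size_t stride,
                   float *out, size_t n)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; ++i) {
        out[i] = base[start + i * stride];
    }
}
```

For example, calling `sketch_gather(iq, 0, 2, i_part, n)` and `sketch_gather(iq, 1, 2, q_part, n)` on an interleaved buffer `iq` separates the real and imaginary parts; larger strides correspond to the decimator case mentioned above.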
>>
>> In your proposal you should definitely include what ISAs you intend to
>> use, and if there are features specific to that instruction set then
>> point out why it's a good choice. This is mostly important for
>> choosing between SSE and friends, AVX, AVX2/FMA. It would be good to
>> see plans that include NEON support for anything you'd add to amd64
>> platforms, but that's not a requirement.
>>
>>
>> Nathan
>
> I also see that GNSS-SDR made it to GSoC and they have a VOLK related project.
> http://gnss-sdr.org/documentation/google-summer-code-2014-ideas-list
Yeah, I also noticed that. I might submit a proposal to them also.
Abhishek