Re: [Discuss-gnuradio] OpenCL FPGA Recommendation?


From: GhostOp14
Subject: Re: [Discuss-gnuradio] OpenCL FPGA Recommendation?
Date: Wed, 26 Apr 2017 08:19:55 -0400

Thanks Marcus!  I do know the root cause of the poor performance in the OpenCL implementation, so maybe some background will help.  (I've actually been working for about 4 months on the gr-clenabled GNURadio blocks [now in pybombs] and the OpenCL study I published a month or so ago.)  OpenCL works well when massively parallel processing can be spread across a number of lower-throughput cores, on data sets where all of the data can be processed in parallel; for instance, calculations such as a[i] = b[i] + c[i].  All of the calculations can be handled in parallel, and the lower performance of each core is offset by having tens or hundreds of them running at the same time, which gives a good throughput boost.
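To make that concrete, here is the shape of kernel I mean: a hand-written sketch (not the actual gr-clenabled kernel), written as the OpenCL C source a C++ host would pass to clCreateProgramWithSource and enqueue with a global work size of N.

// Sketch only: a hypothetical element-wise kernel, carried as a C++ raw
// string literal the way an OpenCL host program would hold it.  Each
// work-item computes exactly one output element, so the runtime is free
// to spread the N elements across however many GPU cores are available.
static const char *vector_add_src = R"CLC(
__kernel void vector_add(__global const float *b,
                         __global const float *c,
                         __global float *a,
                         const unsigned int n)
{
    size_t i = get_global_id(0);   /* this work-item's element index */
    if (i < n)                     /* guard when n isn't a multiple of the work-group size */
        a[i] = b[i] + c[i];
}
)CLC";

No element depends on any other, which is exactly the case where trading per-core speed for core count pays off.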

For calculations such as a Costas loop, where an error is calculated for each point and then used in the next calculation, you can't run the calculations in parallel; they have to be done in order to get the right results.  You can switch OpenCL to a task-parallel mode with a work size of 1, but because each GNURadio block only gets one thread, what that really amounts to is running the same function on a single, lower-performance GPU core.  In that case the single-core GPU performance is an order of magnitude worse than a general CPU core for the same task.
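Here's roughly why, as a simplified BPSK-style sketch in plain C++ (not the GNURadio implementation, and the loop gains alpha/beta are just illustrative): the phase used to de-rotate sample i comes from the error measured on the previous sample, so there is no way to hand different samples to different cores.

#include <complex>
#include <vector>

// Simplified Costas-loop sketch, only to show the loop-carried dependency.
std::vector<std::complex<float>>
costas_loop(const std::vector<std::complex<float>> &in, float alpha, float beta)
{
    float phase = 0.0f;
    float freq  = 0.0f;
    std::vector<std::complex<float>> out(in.size());

    for (size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * std::polar(1.0f, -phase);    // de-rotate by the current phase
        float error = out[i].real() * out[i].imag();  // BPSK phase error detector
        freq  += beta * error;                        // update the loop state from this sample...
        phase += freq + alpha * error;                // ...which the next sample depends on
    }
    return out;
}

Every iteration reads state written by the previous one, which is exactly what a GPU's wide data-parallel model can't exploit.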

I know there are a number of DSP-focused IP cores for FPGAs, so my thought / hope was that for those algorithms that can't be done in parallel, moving from CPU speed to hardware speed on an FPGA would run faster.  Kind of like RFNoC, just for more general-purpose FPGAs.

I think I'd still be okay if I had to pull the DSP blocks together in an FPGA dev environment like Xilinx Vivado, as long as it could help generate the C++ interface code (I did see one article someone wrote on doing something like this); then I'd just have to write the GNURadio block to interface with it, something like the sketch below.  I just don't know FPGAs well enough (and I know it's not a simple learning curve) to say.
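For reference, the GNURadio side I picture is just a thin sync block that streams each chunk of samples to the board and reads the result back, roughly like this sketch (the fpga_write / fpga_read calls are made-up placeholders for whatever the board's SDK actually provides, e.g. Opal Kelly's FrontPanel API):

#include <gnuradio/sync_block.h>
#include <gnuradio/io_signature.h>
#include <gnuradio/gr_complex.h>
#include <cstring>

// Sketch only: a pass-through sync block that would hand each chunk of
// samples to the FPGA and read the processed samples back.  The actual
// transfer calls depend entirely on the board vendor's C/C++ API.
class fpga_offload : public gr::sync_block
{
public:
    fpga_offload()
        : gr::sync_block("fpga_offload",
                         gr::io_signature::make(1, 1, sizeof(gr_complex)),
                         gr::io_signature::make(1, 1, sizeof(gr_complex)))
    {
        // hypothetical: open the USB 3.0 / PCIe device handle here
    }

    int work(int noutput_items,
             gr_vector_const_void_star &input_items,
             gr_vector_void_star &output_items) override
    {
        const gr_complex *in = (const gr_complex *) input_items[0];
        gr_complex *out = (gr_complex *) output_items[0];

        // hypothetical vendor calls, names made up:
        //   fpga_write(in,  noutput_items * sizeof(gr_complex));
        //   fpga_read (out, noutput_items * sizeof(gr_complex));
        std::memcpy(out, in, noutput_items * sizeof(gr_complex));  // placeholder

        return noutput_items;
    }
};

If the vendor tools can generate that transfer layer for me, the GNURadio block itself stays about this small.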


---------- Forwarded message ----------
From: Marcus Müller <address@hidden>
Date: Wed, Apr 26, 2017 at 7:31 AM
Subject: Re: [Discuss-gnuradio] OpenCL FPGA Recommendation?
To: address@hidden


Dear Ghost,


On 04/26/2017 01:01 PM, GhostOp14 wrote:
> I tested it as a single task in OpenCL on a GPU and the performance
> was horrible, so I want to get the same algorithm running on an FPGA
> and see if the performance significantly improves.
Gut feeling: I wouldn't spend any money on an FPGA implementation before
I'd understood why it worked so terribly on the GPU and had a good
reason why it should work better on an FPGA. Frankly, I don't think you
realize how hard it is to properly optimize things for specific
architectures, and OpenCL on an FPGA will not be easier to "get right"
than OpenCL on a GPU.
>
> Given some high-bandwidth goals, I'm actually thinking either USB 3.0
> or PCIe would be the requirement.  I was looking at the Opal Kelly
> line, like the one they have based on the Xilinx Artix-7.  I actually
> think the USB 3.0 interface, if I can transfer runtime data to/from it
> at USB 3.0 speeds, would be more portable (say laptop/desktop).  I'm
> still new to FPGAs, so any other thoughts are much appreciated.  It
> looks like I may still have to work in Vivado and build the FPGA code,
> but then I could interface with it from C++ and a GNURadio block?
Probably! Don't know the FPGA manufacturer's OpenCL tools and whether
they offer an easy-to-use interface to PC software.
>
> Am I on the right track?
Don't know – again, I'd recommend going into a much deeper analysis of
why things work badly on your CPU and GPU, and why an FPGA should make
that better.

Best regards,
Marcus


_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

