Re: [Discuss-gnuradio] GNURadio and CUDA reprised


From: Tom Rondeau
Subject: Re: [Discuss-gnuradio] GNURadio and CUDA reprised
Date: Wed, 12 Jan 2011 11:03:46 -0500

On Wed, Jan 12, 2011 at 9:56 AM, Steven Clark <address@hidden> wrote:
> On Wed, Jan 12, 2011 at 2:44 AM, Moeller <address@hidden> wrote:
>>
>> On 11.01.2011 23:13, Andrew Hofmaier wrote:
>> > I've begun to look into accelerating GNURadio applications with Nvidia
>> > CUDA GPUs
>> > and have scanned through the archives of the discussion list.  I had two
>> > questions on the topic:
>> >
>> > 1.  Is the CUDA-GNURadio port done by Martin DvH circa 2008 still
>> > available and runnable?  All links I've seen are broken.
>>
>> Is CUDA really suitable? There is a certain overhead in the data
>> transfers. CUDA only pays off if it can do a lot of computation
>> without communicating, but a data streaming application needs lots
>> of I/O. The CPU with SSE is also very fast at things like FFTs.
>> I did some experiments with CUDA, but they were not very successful,
>> far below the peak FLOPS you see in benchmarks.
>> But I'm not an experienced programmer ...
>>
>> > 2.  Many of the results I've seen, both here and elsewhere, suggest that
>> > CUDA is not typically applicable to general GNURadio applications.  It
>> > has worked in specific cases, but only where the data throughput
>> > requirements are very high and the algorithms are extremely
>>
>> Yes, I had the same experience. I tried to have CUDA do a
>> one-dimensional FFT. It was slower than on the CPU and had a large
>> communication overhead. Maybe it would do better with larger FFT
>> sizes, with a 2D FFT, or with better programming ...
>> In contrast, the sample programs were very fast, but also very
>> specialized: fractal computation, image processing, particle physics.
>>
>> > these cards for GNURadio applications?  Some of the major relevant
>> > improvements are the ability to concurrently schedule multiple kernels
>> > and asynchronously perform memory transfers.
>>
>> I think the important point is that the kernels have to do a lot of
>> computation relative to the data transfers. A 1D FFT is not very
>> compute-intensive compared with the amount of data it moves. What
>> kind of algorithm do you want to port to CUDA?
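
To put numbers on that: an N-point complex FFT costs roughly 5N log2 N
floating-point operations but moves about 16N bytes over the bus (8N up,
8N back), so a single 4096-point transform is roughly 250 thousand
floating-point operations of work against 64 KB of PCIe traffic. A
minimal, hypothetical cuFFT sketch of exactly that round trip (the size
and the timing scaffolding are illustrative, not from the thread):

    // Hypothetical illustration: time the full round trip for one small
    // 1D FFT with cuFFT, PCIe copies included.
    // Build (assumed filename): nvcc fft_roundtrip.cu -lcufft
    #include <cuda_runtime.h>
    #include <cufft.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 4096;                        // one modest FFT, no batching
        const size_t bytes = n * sizeof(cufftComplex);

        std::vector<cufftComplex> host(n);         // zeroed input is fine for timing
        cufftComplex *dev = 0;
        cudaMalloc((void **)&dev, bytes);

        cufftHandle plan;
        cufftPlan1d(&plan, n, CUFFT_C2C, 1);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        cudaMemcpy(dev, &host[0], bytes, cudaMemcpyHostToDevice);   // H2D
        cufftExecC2C(plan, dev, dev, CUFFT_FORWARD);                // the actual work
        cudaMemcpy(&host[0], dev, bytes, cudaMemcpyDeviceToHost);   // D2H
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("one %d-point FFT, copies included: %.3f ms\n", n, ms);

        cufftDestroy(plan);
        cudaFree(dev);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return 0;
    }

On most machines the two cudaMemcpy calls and the launch latency, not
cufftExecC2C, typically dominate that number, which matches Moeller's
experience.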
>>
>>
>
>
> I've done some work with both CUDA and GNURadio, and I think there's
> definitely potential for using them jointly, but only for certain
> applications, and only if the software is architected intelligently.
>
> GPUs are incredibly powerful, with 1+ TFLOPS of compute and 100+ GB/s of
> memory bandwidth within the GPU. I've used GPUs to perform real-time signal
> processing on 300+ MHz of continuously streaming data without dropping a
> sample. But the PCI bus bandwidth of ~5 GB/s can sometimes be a real
> bottleneck, so you have to design accordingly.
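
One common way to "design accordingly" is to overlap the PCIe copies
with the compute, using pinned host memory and a couple of CUDA streams.
A hypothetical sketch (not Steven's code; the process() kernel and the
chunk sizes are placeholders):

    // Pinned host buffers plus two streams: while one chunk is being
    // copied, the kernel works on another.
    #include <cuda_runtime.h>

    __global__ void process(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;                // stand-in for real DSP work
    }

    int main() {
        const int chunk = 1 << 20;                 // samples per chunk
        const int nchunks = 64;                    // a few hundred MB streamed through
        const size_t bytes = chunk * sizeof(float);

        float *h_buf, *d_buf[2];
        cudaMallocHost((void **)&h_buf, nchunks * bytes);  // pinned: required for async copies
        cudaMalloc((void **)&d_buf[0], bytes);
        cudaMalloc((void **)&d_buf[1], bytes);

        cudaStream_t stream[2];
        cudaStreamCreate(&stream[0]);
        cudaStreamCreate(&stream[1]);

        for (int c = 0; c < nchunks; ++c) {
            int s = c & 1;                         // ping-pong between the two streams
            float *h = h_buf + (size_t)c * chunk;
            cudaMemcpyAsync(d_buf[s], h, bytes, cudaMemcpyHostToDevice, stream[s]);
            process<<<(chunk + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], chunk);
            cudaMemcpyAsync(h, d_buf[s], bytes, cudaMemcpyDeviceToHost, stream[s]);
        }
        cudaDeviceSynchronize();                   // wait for both streams to drain

        cudaStreamDestroy(stream[0]);
        cudaStreamDestroy(stream[1]);
        cudaFree(d_buf[0]);
        cudaFree(d_buf[1]);
        cudaFreeHost(h_buf);
        return 0;
    }

Overlap only hides part of the transfer cost; it does not raise the
~5 GB/s ceiling, which is why the rate-reduction approach below matters.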
>
> You DON'T want to try to make individual drop-in CUDA replacements for
> multiple GNURadio processing blocks in a chain. It doesn't make any sense to
> send data to the GPU, perform an operation (e.g. filtering), bring the result
> back to the host, send some more data to the GPU, perform a 2nd operation,
> bring the data back, etc. The PCI transfers will eat you alive. The key is
> to send large chunks (10s or 100s of MBs) of data to the GPU, and do as much
> computation as possible while there. Large batched FFTs, wideband frequency
> searches, channelizing; it's all gravy. It's great if you can stream
> wideband data to the GPU, have it do some computationally intensive stuff,
> perform a rate reduction, then stream the lower bandwidth data back to the
> host to do further (annoyingly serial) operations. You could even (if you
> wanted to) implement an entire transmitter or receiver within the GPU, with
> the CPU solely shuttling data to or from the ADC/DAC.
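
A hypothetical sketch of that "keep it on the card" pattern: one
wideband block goes up, the kernels run back to back on the device with
no intermediate host copies, and only the decimated, lower-rate result
crosses the bus again. The kernel bodies are placeholders, not real DSP:

    #include <cuda_runtime.h>

    __global__ void mix_and_filter(const float2 *in, float2 *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];                 // stand-in for tune + channel filter
    }

    __global__ void decimate(const float2 *in, float2 *out, int n_out, int factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n_out) out[i] = in[i * factor];    // keep every factor-th sample
    }

    int main() {
        const int n_in = 1 << 24;                  // ~16M complex samples per block (~128 MB)
        const int factor = 64;
        const int n_out = n_in / factor;

        float2 *d_in, *d_mid, *d_out;
        cudaMalloc((void **)&d_in,  n_in  * sizeof(float2));
        cudaMalloc((void **)&d_mid, n_in  * sizeof(float2));
        cudaMalloc((void **)&d_out, n_out * sizeof(float2));

        float2 *h_in, *h_out;
        cudaMallocHost((void **)&h_in,  n_in  * sizeof(float2));
        cudaMallocHost((void **)&h_out, n_out * sizeof(float2));

        // One big transfer up ...
        cudaMemcpy(d_in, h_in, n_in * sizeof(float2), cudaMemcpyHostToDevice);

        // ... as much work as possible on the device ...
        mix_and_filter<<<(n_in + 255) / 256, 256>>>(d_in, d_mid, n_in);
        decimate<<<(n_out + 255) / 256, 256>>>(d_mid, d_out, n_out, factor);

        // ... and only the rate-reduced stream comes back over PCIe.
        cudaMemcpy(h_out, d_out, n_out * sizeof(float2), cudaMemcpyDeviceToHost);

        cudaFree(d_in); cudaFree(d_mid); cudaFree(d_out);
        cudaFreeHost(h_in); cudaFreeHost(h_out);
        return 0;
    }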
>
> In summary, yes, please do get excited about CUDA/OpenCL -- it's great
> technology. When the USRP 9.0 comes out with a gigasample ADC/DAC, GPUs will
> be there, ready to do the heavy lifting :)
>
> -Steven


Steven,

That's great information, and it's along the lines of what I was going
to say (minus the example of doing 300 MHz of processing, since I
haven't done anything that wide myself).

I wanted to throw out another idea that no one seems to be bringing
up, and it relates to an earlier comment about CUDA being limited by
the bus transfers. That limitation isn't CUDA itself; it's the
architecture of the machine, with the host (CPU) and device (GPU)
separated by a bus. It has nothing to do with CUDA as a language.

But I keep thinking about the new Tegra from nVidia and, to a lesser
extent, Sandy Bridge from Intel. These show a trend of moving the GPU
and CPU together onto the same die. Sandy Bridge isn't really exciting
from this perspective (yet), since its GPU core isn't very powerful
and (I believe) isn't CUDA-enabled. My point, though, is that the
trend is exciting: we are starting to see architectures that move away
from the bus issues that are the biggest problem with GPU programming
right now. Any effort spent on GPU programming now will, I think, have
legs far into the future as the architectures become more amenable to
our kind of problems.
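
For a taste of where that goes, CUDA already has mapped ("zero-copy")
host memory, which pays off most when the CPU and GPU share physical
memory, since the kernel can then read host buffers without an explicit
copy. A hypothetical sketch (the kernel and sizes are illustrative only):

    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n, float g) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= g;                   // trivial stand-in for real work
    }

    int main() {
        const int n = 1 << 20;

        cudaSetDeviceFlags(cudaDeviceMapHost);     // must come before context creation

        float *h_buf, *d_view;
        cudaHostAlloc((void **)&h_buf, n * sizeof(float), cudaHostAllocMapped);
        cudaHostGetDevicePointer((void **)&d_view, h_buf, 0);

        // The kernel touches the host allocation through its device alias;
        // there is no cudaMemcpy in either direction.
        scale<<<(n + 255) / 256, 256>>>(d_view, n, 0.5f);
        cudaDeviceSynchronize();

        cudaFreeHost(h_buf);
        return 0;
    }

On a discrete card those accesses still cross the bus, just implicitly;
on an integrated part of the kind described above they would not have to.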

Currently, though, GPUs still have a place for certain applications,
even in signal processing and radio. They are not a panacea for
improving the performance of all signal processing applications, but
if you understand the limitations and where they benefit you, you can
get some really good gains out of them. I'm excited about anyone
researching and experimenting in this area and very hopeful for the
future use of any knowledge and expertise we can generate now.

Tom


