
Re: [Discuss-gnuradio] GNURadio and CUDA reprised


From: Michael Dickens
Subject: Re: [Discuss-gnuradio] GNURadio and CUDA reprised
Date: Wed, 12 Jan 2011 15:22:58 -0500

On Jan 12, 2011, at 2:56 PM, Moeller wrote:
> On 12.01.2011 14:25, Michael Dickens wrote:
>> the CPU).  I think that if a GPU can be used, it will be most effective in 
>> things like filterbanks, or when searching for packets (via their unique 
>> sync sequence, so matched filtering), or very large FIR filters -- places 
>> where a LOT of computations and data must be processed and can be 
>> parallelized easily.
> 
> Is there an efficient parallel FIR implementation for CUDA? You need only a few
> operations on a large set of data, so isn't this too much for the
> stream processor's local memory?
> If GPU global memory has to be used, this would lead to slower concurrent
> access. And then there is still the transfer time from/to the computer's RAM.
> It would be great to have a fast filter, but is it really faster than an
> optimized SSE CPU FIR?
> I had the feeling that the ratio of computing operations to number of
> samples has to be high for a significant GPU vs. CPU speedup.
> I'm curious how much speedup you can achieve for FIR filters
> (let's say large/sharp filters of 1024 taps).

The "very large FIR filters" comment was just a thought: an example of an
operation that might benefit from a GPU, at least when using OpenCL (or CUDA).
I haven't done any testing yet to know whether a GPU can beat a CPU using vector
instructions ... but I'm getting there.  If/when I do, I'll post my results &
thoughts.
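For a rough sense of the compute-to-transfer ratio Moeller mentions, here is a
back-of-envelope sketch (arithmetic only, not measured results).  It assumes a
direct-form FIR on float32 samples with no decimation, and that every input
sample is copied host to GPU and every output copied back; the function names
are made up for illustration.

```python
# Back-of-envelope arithmetic intensity of a direct-form FIR filter.
# Assumptions (hypothetical, for illustration): float32 samples, no decimation,
# every input copied host->GPU and every output copied GPU->host.

def fir_flops_per_sample(num_taps: int, complex_samples: bool = True,
                         complex_taps: bool = False) -> int:
    """Real floating-point operations per output sample."""
    if complex_samples and complex_taps:
        per_tap = 8  # complex multiply (4 mul + 2 add) + 2 adds to accumulate
    elif complex_samples or complex_taps:
        per_tap = 4  # complex-by-real multiply (2 mul) + 2 adds to accumulate
    else:
        per_tap = 2  # one multiply + one add
    return per_tap * num_taps


def bytes_per_sample(complex_samples: bool = True) -> int:
    """Bytes crossing the host/GPU bus per sample (one copy in, one copy out)."""
    sample_size = 8 if complex_samples else 4  # float32 I+Q, or float32 real
    return 2 * sample_size


def arithmetic_intensity(num_taps: int, complex_samples: bool = True,
                         complex_taps: bool = False) -> float:
    """FLOPs performed per byte transferred over the bus."""
    return (fir_flops_per_sample(num_taps, complex_samples, complex_taps)
            / bytes_per_sample(complex_samples))


# Complex samples, real taps:
for taps in (16, 64, 256, 1024):
    print(taps, arithmetic_intensity(taps))
# 16 4.0
# 64 16.0
# 256 64.0
# 1024 256.0
```

So under these assumptions a 1024-tap filter does about 256 FLOPs per byte
moved, versus about 4 for a 16-tap filter; whether that is enough to hide the
bus transfer depends on the particular GPU and bus, which is what still needs
measuring.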

Your comment about global versus local memory certainly seems true from
reading the OpenCL specs.  Most modern GPUs have 3 levels of memory: global
(for the whole GPU, across all cores), core (shared by the kernel execution
units on one core; "local" memory in OpenCL terms), and kernel (per execution
unit; "private" memory in OpenCL terms).  These are in order of decreasing size,
increasing access speed, and increasing time to move data to/from them.  I've
been playing around with global memory only so far, but I'll look into the
other levels as well to see what they can provide & the trade-offs required.
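To make the global-versus-local trade-off concrete for a FIR, here is a
pure-Python emulation (not OpenCL code) that counts input samples read from
"global" memory.  In the naive version each output (one work-item) reads all
N samples it needs straight from global memory; in the tiled version each
hypothetical work-group of `block` outputs copies its window of
`block + N - 1` samples into a stand-in for local memory once, then computes
from that copy.  The function names and block size are assumptions, and it
ignores barriers, bank conflicts, where the taps live, and host transfers.

```python
# Emulates two GPU FIR strategies in plain Python and counts global-memory reads.
# y[n] = sum_k taps[k] * x[n + N - 1 - k], for the len(x) - N + 1 "valid" outputs.

def fir_naive(x, taps):
    """Every output reads its N input samples directly from global memory."""
    n_taps = len(taps)
    n_out = len(x) - n_taps + 1
    y, global_reads = [], 0
    for n in range(n_out):
        acc = 0.0
        for k in range(n_taps):
            acc += taps[k] * x[n + n_taps - 1 - k]
            global_reads += 1
        y.append(acc)
    return y, global_reads


def fir_tiled(x, taps, block):
    """Each work-group copies its input window to 'local' memory once."""
    n_taps = len(taps)
    n_out = len(x) - n_taps + 1
    y, global_reads = [0.0] * n_out, 0
    for start in range(0, n_out, block):
        count = min(block, n_out - start)
        # Cooperative load: block + N - 1 samples from global into local.
        local = x[start:start + count + n_taps - 1]
        global_reads += len(local)
        # Each work-item i computes one output using only the local copy.
        for i in range(count):
            acc = 0.0
            for k in range(n_taps):
                acc += taps[k] * local[i + n_taps - 1 - k]
            y[start + i] = acc
    return y, global_reads


x = [float(i % 7) for i in range(4096)]
taps = [1.0 / 64] * 64
y_naive, reads_naive = fir_naive(x, taps)
y_tiled, reads_tiled = fir_tiled(x, taps, block=256)
print(y_naive == y_tiled)          # True
print(reads_naive, reads_tiled)    # 258112 5041
```

Each sample loaded into local memory gets reused by up to N outputs, which is
where the roughly 50x drop in global reads comes from here; the price is the
extra copy step and needing the window to fit in local memory.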

Good & interesting discussion! - MLD



