Re: [Discuss-gnuradio] GNURadio and CUDA reprised

From:	Michael Dickens
Subject:	Re: [Discuss-gnuradio] GNURadio and CUDA reprised
Date:	Wed, 12 Jan 2011 15:22:58 -0500

On Jan 12, 2011, at 2:56 PM, Moeller wrote:
> On 12.01.2011 14:25, Michael Dickens wrote:
>> the CPU).  I think that if a GPU can be used, it will be most effective in 
>> things like filterbanks, or when searching for packets (via their unique 
>> sync sequence, so matched filtering), or very large FIR filters -- places 
>> where a LOT of computations and data must be processed and can be 
>> parallelized easily.
> 
> Is there an efficient parallel FIR implementation for CUDA? You need only few
> operations on a large set of data. So, isn't this too much for the
> stream-processor local-memory? If GPU global memory has to be used, this
> would lead to a slower concurrent access. And then there is still the
> transfer time from/to the computer RAM. It would be great to have a fast
> filter, but is it really faster than an optimized SSE CPU FIR? I had the
> feeling, that the ratio of computing operations vs. number of samples has to
> be high for a significant GPU vs. CPU speedup. I'm curious about how much
> speedup you can achieve for FIR filters (let's say large/sharp filters of
> 1024 taps).

The "very large FIR filters" item was only a thought: an example of an
operation that might benefit from a GPU, at least when using OpenCL (or CUDA).
I haven't yet tested whether a GPU can do better than a CPU using vector
instructions ... but I'm getting there.  If/when I do get there, I'll post my
results & thoughts.
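
To make the FIR case concrete, here is a minimal pure-Python sketch of a direct-form FIR filter. It is not code from GNU Radio or from any GPU library; the names `fir_direct` and `ops_per_output` are made up for illustration, and it assumes the input is zero before the first sample. It only shows why the operation parallelizes easily (each output is an independent dot product) and how the work per output sample grows with the number of taps.

```python
def fir_direct(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    Assumption for this sketch: x[n] is treated as 0 for n < 0.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        # Each output depends only on input samples, never on other
        # outputs, so every iteration of this outer loop is independent
        # and could in principle be computed in parallel.
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def ops_per_output(ntaps):
    """One multiply and one add per tap for each output sample."""
    return 2 * ntaps


if __name__ == "__main__":
    print(fir_direct([1.0, 0.0, 0.0, 0.0], [0.5, 0.25]))  # impulse response
    print(ops_per_output(1024))  # work per output for a 1024-tap filter
```

With 1024 taps that is 2048 arithmetic operations per output sample, but every input sample is also read 1024 times, so whether a GPU wins would likely depend on where those repeated reads are served from.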

Your comment about global versus local memory certainly does seem true from
reading the OpenCL specs.  Most modern GPUs have 3 levels of memory: global
(for the whole GPU, across all cores; OpenCL "global" memory), core (shared
across all kernel execution units in a core; OpenCL "local" memory), and
kernel (per execution unit; OpenCL "private" memory) -- in order of decreasing
size, increasing access speed, and increasing time to move data to and from
them.  I've been playing around with global memory only so far, but I'll look
into the other levels as well to see what they can provide & the trade-offs
required.
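
One way to picture the trade-off between the memory levels is a tiled FIR, sketched below in plain Python. This is a CPU-side emulation only, not GPU code and not something I have benchmarked: the list `local` merely stands in for the small, fast per-core memory, and `fir_tiled` and its `tile` parameter are hypothetical names. Each tile copies its own samples plus the `ntaps - 1` earlier samples it needs once from the big array, then does all of its multiply-adds out of the small buffer.

```python
def fir_tiled(samples, taps, tile=256):
    """FIR computed tile by tile from a small staging buffer.

    Assumption for this sketch: x[n] is treated as 0 for n < 0.
    """
    ntaps = len(taps)
    out = [0.0] * len(samples)
    for start in range(0, len(samples), tile):
        stop = min(start + tile, len(samples))
        # Stage the tile plus the ntaps - 1 earlier samples it depends on.
        # On a GPU this copy would be the one pass over slow global memory.
        lo = start - (ntaps - 1)
        local = [samples[i] if i >= 0 else 0.0 for i in range(lo, stop)]
        for n in range(start, stop):
            base = n - lo  # position of x[n] inside the staging buffer
            acc = 0.0
            for k, h in enumerate(taps):
                acc += h * local[base - k]  # all reads hit the small buffer
            out[n] = acc
    return out


if __name__ == "__main__":
    print(fir_tiled([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], tile=2))
```

The staging copy costs `tile + ntaps - 1` reads from the large array per tile, while the inner loops do `tile * ntaps` reads from the small one, so the larger the filter, the more each staged sample is reused.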

Good & interesting discussion! - MLD
