discuss-gnuradio



From: Yu-Hua Yang
Subject: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements
Date: Mon, 29 Jun 2009 05:10:52 -0400

Hi

Has anyone been able to improve the performance of CUDA-Enabled GNURadio? I am very new to this, so I am just reading Martin's code without a solid understanding of it. I know that gr_benchmark10_test.py runs slowly on the GPU because of the overhead of memory transfers between the CPU and the GPU, and that the GPU can outperform the CPU if more computation is done per call. However, looking at the gr_benchmark10 code, it seems that only very trivial computations are used to compare the CPU and the GPU. Specifically:

testblock3= cuda.fir_filter_fff(1,taps)
testblock4= cuda.multiply_const_ff(1.0)
testblock5= cuda.multiply_const_ff(1.0)
testblock6= cuda.multiply_const_ff(1.0)

I tried to improve the GPU's relative performance by passing very large floating-point numbers to cuda.multiply_const_ff and by changing taps, which is declared as:

taps=range(1,64,1)

My assumption was that this gives the blocks more work to do, so the GPU should come out ahead, but it does not. With large floating-point constants the CPU still finishes in a fraction of a second, while the GPU takes a little over 1 second.
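As a rough sanity check on the "more work" assumption, the sketch below counts multiplies per output sample for the four blocks listed above. It assumes a direct-form FIR filter costs one multiply per tap and that a multiply-by-constant block costs one multiply per sample whatever the constant's value. The helper name multiplies_per_sample is hypothetical and not part of GNU Radio, and the model ignores memory transfers entirely.

```python
# Hypothetical cost model, not GNU Radio API: it only counts multiplies
# per output sample under the assumptions stated above.

def multiplies_per_sample(taps, constants):
    """Estimate multiplies per sample for one FIR filter followed by
    a chain of multiply-by-constant blocks."""
    fir_cost = len(taps)          # direct-form FIR: one multiply per tap
    const_cost = len(constants)   # one multiply per block, value irrelevant
    return fir_cost + const_cost

taps = list(range(1, 64, 1))      # 63 taps, as in the benchmark

baseline = multiplies_per_sample(taps, [1.0, 1.0, 1.0])
big_constants = multiplies_per_sample(taps, [1e30, 1e30, 1e30])
more_taps = multiplies_per_sample(list(range(1, 631)), [1.0, 1.0, 1.0])

print(baseline)       # 66
print(big_constants)  # 66: larger constants add no arithmetic
print(more_taps)      # 633: only the tap count changes the estimate
```

Under this simplified model, the magnitude of the constants leaves the amount of arithmetic unchanged, and only the number of taps (or the number of blocks) changes it.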

- Following this thread: http://lists.gnu.org/archive/html/discuss-gnuradio/2009-01/msg00378.html
 I would like to approach the problem by increasing the computational intensity, which is why I am changing the benchmark parameters, but it does not seem to work. Am I approaching this correctly?

- From this thread: http://lists.gnu.org/archive/html/discuss-gnuradio/2008-11/msg00292.html
"If I benchmark a single block with a big output_multiple then I do see
performance increases."
How do I do that? How have the experts (Martin, Achilleas) tuned CUDA-Enabled GNURadio to show that GPU computing can indeed be faster?

- Is there any way to measure the time spent on memory transfers between the CPU and the GPU? That way we would know exactly what the overhead is.
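To make the last question concrete, here is a minimal sketch of the kind of measurement I have in mind: time the transfer step and the compute step separately, then report the share of time spent on transfers. The functions time_call and overhead_fraction are hypothetical helpers, and fake_transfer and fake_compute are stand-ins for the real CPU-to-GPU copy and kernel call, which I do not know how to isolate in the CUDA blocks. Wall-clock timing like this is only meaningful if each call blocks until its work has finished.

```python
import time

def time_call(fn, *args, repeats=10):
    """Return the mean wall-clock seconds per call of fn(*args)."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

def overhead_fraction(transfer_seconds, compute_seconds):
    """Fraction of the total time that is spent on transfers."""
    total = transfer_seconds + compute_seconds
    return transfer_seconds / total if total > 0 else 0.0

# Stand-ins only: replace with the real copy and compute calls.
def fake_transfer(buf):
    return bytearray(buf)              # copies the buffer on the host

def fake_compute(buf):
    return sum(buf)                    # trivial work on the buffer

data = bytes(1024 * 1024)
t_transfer = time_call(fake_transfer, data)
t_compute = time_call(fake_compute, data)
print("transfer share: %.2f" % overhead_fraction(t_transfer, t_compute))
```

If the real calls return before the GPU has finished, the measured times would understate the true cost, so some form of synchronization before stopping the timer would presumably be needed.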

Please help!!

