Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible im

From:

Yu-Hua Yang

Subject:

Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements

Date:

Tue, 30 Jun 2009 02:52:19 -0400

cuda_multiply_const_ff_kernel.cu

__global__ void
cuda_multiply_const_ff_kernel(const float* g_idata, float* g_odata,const int noutput_items,const float konst)

int
get_cuda_multiply_const_ff_kernel_params ( cuda_multiply_const_ff_kernel_params *params )
{
  int result=0;
  //const unsigned int max_num_threads_per_block = MAX_NUM_THREADS_ALL;   //can use the maximum number of threads if wanted
  //unsigned int max_num_blocks         = MAX_NUM_BLOCKS_ALL;

  unsigned int num_blocks=4096 ;// = gridDim.x;                                 //NUM_CUDABLOCKS
  unsigned int num_threads_per_block=512;// = blockDim.x;                     //NUM_THREADS;
  unsigned int num_outputs_per_block=num_threads_per_block;

  const unsigned int num_outputs_per_grid= num_outputs_per_block*num_blocks; //(blockDim.x)*gridDim.x

  size_t dynamic_shared_mem_size = 0;//256*sizeof(float);//0;//num_threads_per_block*sizeof(gr_complex);
  dim3 griddim( num_blocks, 1, 1);
  dim3 threaddim( num_threads_per_block, 1, 1);

  params->griddim=griddim;
  params->threaddim=threaddim;
  params->dynamic_shared_mem_size=dynamic_shared_mem_size;
  params->num_outputs_padded=num_outputs_per_grid;
  params->num_inputs_padded=num_outputs_per_grid;
  params->num_inputs=0;//num_outputs_per_grid;//num_outputs;
  params->num_outputs=0;//num_outputs_per_grid;//num_outputs;

  //Now you can do the kernel invocation like this:
  //cuda_multiply_const_ff_filter_kernel<<< params->griddim, params->threaddim, params->dynamic_shared_mem_size >>>(g_idata, g_odata, params->num_outputs_padded*X,konst);
  return result;
}

The kernel invocation is completely commented out! The result is initialized to 0 at the top and returned unchanged, so all the work in between to set up the thread and block sizes does not seem to matter. I'm not sure why the code is written this way: did someone edit it, or did Martin deliberately comment out the kernel invocation? Is it supposed to be like this? I don't see how this can be a proper benchmark if all we test is setting up threads and blocks on the device and memory access times, without actually doing any computation.
I am probably way off here, but this doesn't make any sense to me. Could someone please clarify?
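
To make the numbers concrete, here is a rough CPU-only sketch of the work I would expect a launch with the parameters above to cover. The kernel body is not shown in this post, so the one-output-per-thread index mapping and the bounds check are assumptions on my part, and the names emulate_multiply_const_ff and outputs_per_grid are made up for illustration (they are not from the GNU Radio tree).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of outputs one launch covers when every
// thread produces exactly one output (mirrors num_outputs_per_grid above).
static unsigned int outputs_per_grid(unsigned int num_blocks,
                                     unsigned int num_threads_per_block)
{
    return num_blocks * num_threads_per_block;
}

// Hypothetical CPU stand-in for the kernel. The two loops play the role of
// blockIdx.x and threadIdx.x; each (block, thread) pair handles one index.
// ASSUMPTION: the real kernel uses i = blockIdx.x * blockDim.x + threadIdx.x
// and skips indices >= noutput_items.
static void emulate_multiply_const_ff(const float* g_idata, float* g_odata,
                                      int noutput_items, float konst,
                                      unsigned int num_blocks,
                                      unsigned int num_threads_per_block)
{
    assert(g_idata != nullptr && g_odata != nullptr);
    for (unsigned int block = 0; block < num_blocks; ++block) {
        for (unsigned int thread = 0; thread < num_threads_per_block; ++thread) {
            const std::size_t i =
                static_cast<std::size_t>(block) * num_threads_per_block + thread;
            if (i < static_cast<std::size_t>(noutput_items)) {
                g_odata[i] = g_idata[i] * konst;  // the actual computation
            }
        }
    }
}
```

With num_blocks=4096 and num_threads_per_block=512, outputs_per_grid gives 2097152, which matches the value stored in num_outputs_padded. If nothing of this kind ever runs on the device, the benchmark would not be timing any multiplication at all, which is exactly what I am asking about.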

Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements, Yu-Hua Yang <=

Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements, Martin DvH, 2009/06/30