discuss-gnuradio

Re: [Discuss-gnuradio] How to utilize multi-thread processor


From: Tom Rondeau
Subject: Re: [Discuss-gnuradio] How to utilize multi-thread processor
Date: Sun, 2 Sep 2012 17:12:00 -0400

On Sun, Sep 2, 2012 at 5:22 AM, Qing Yang <address@hidden> wrote:
> Hi Tom,
>
> We are profiling our code on a Xeon W3530 (4 cores, 8 threads) with 12GB of
> memory and an N210, and have found some interesting issues.
>
> 1. The receiver works well at a 1MHz sample rate; the system monitor shows
> each core 10%~20% occupied. Once we set the sample rate above 1M (say 2M),
> the program blocks (no decoding output), and only one core is 100% occupied
> while the others are idle. KCachegrind shows that 86% of the CPU time is
> spent in the function "raw_peak_detector_fb::work(...)". This function is
> used by the first module (synchronization) of RawOFDM, so I think this is the
> module that chokes the system. My first step is to dig into this module and
> try to make it faster.

Qing,
It sounds like you're on the right track: identify the low-performing
blocks, then optimize them.
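Finding the slowest block becomes concrete with a little arithmetic: at a sample rate fs, every block must, on average, finish its work in under 1/fs seconds per sample, otherwise buffers back up toward the source and the USRP driver typically reports overruns, as in the original question. The following is a minimal Python sketch of that budget plus a timing harness that compares a block's measured throughput against the required rate. The names `ns_per_sample_budget`, `stream_bytes_per_second`, `keeps_up` and `toy_work` are hypothetical, the 8-byte sample size assumes complex float32 streams, and `toy_work` is only a stand-in loop, not RawOFDM's peak detector.

```python
import time

# Assumption: inside the flowgraph each sample is a complex float32 value
# (4-byte I plus 4-byte Q), i.e. 8 bytes per sample.
BYTES_PER_COMPLEX_FLOAT = 8


def ns_per_sample_budget(sample_rate):
    """Average time (ns) a block may spend per sample and still keep up."""
    return 1e9 / sample_rate


def stream_bytes_per_second(sample_rate, bytes_per_sample=BYTES_PER_COMPLEX_FLOAT):
    """Bytes per second flowing through one complex-float stream."""
    return sample_rate * bytes_per_sample


def measured_samples_per_second(work, n_samples, repeats=5):
    """Best-case throughput of a work(buffer) callable over repeated runs."""
    buf = [0.0] * n_samples
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        work(buf)
        best = min(best, time.perf_counter() - start)
    return n_samples / max(best, 1e-12)  # guard against a zero timer delta


def keeps_up(work, sample_rate, n_samples=100_000):
    """True if the measured throughput exceeds the required sample rate."""
    return measured_samples_per_second(work, n_samples) > sample_rate


def toy_work(buf):
    """Hypothetical per-sample Python loop, NOT the real RawOFDM peak detector."""
    hits = 0
    for x in buf:
        if x > 0.5:
            hits += 1
    return hits


if __name__ == "__main__":
    for fs in (1e6, 2e6, 20e6):
        print(f"{fs / 1e6:g} MS/s: budget {ns_per_sample_budget(fs):g} ns/sample, "
              f"{stream_bytes_per_second(fs) / 1e6:g} MB/s per stream, "
              f"toy block keeps up: {keeps_up(toy_work, fs)}")
```

The budget works out to 1000 ns per sample at 1 MS/s, 500 ns at 2 MS/s and 50 ns at 20 MS/s, and a single complex-float stream at 20 MS/s carries 160 MB/s between blocks. A block whose work() only just fits the 1 MS/s budget has no headroom once the rate doubles.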

> 2. In the ordinary case (1MHz), both the transmitter and receiver call
> "gr_multiply_cc::work()" frequently, and its cost is quite high (nearly 18%
> of the total CPU time). I think there are ways to speed up this function,
> right? Perhaps the VOLK library will help; I will try it out.

In the current release (since 3.6.0, if I recall), gr_multiply_cc has
used VOLK. So make sure that you've run volk_profile on your machine so
that the best version of the kernel is selected at runtime. As it is,
you're probably not going to do much better than this for the performance
of a complex multiply. It's likely that the blocks giving you specific
problems are those running at the highest sampling rate. You might think
about how to re-engineer the system to avoid running the multiply at that
rate, or to somehow wrap the multiply into another block's work function,
as opposed to trying to optimize this particular block.
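The idea of wrapping the multiply into another block can be illustrated with a minimal Python sketch, assuming the multiply follows a CFO-compensation rotation (one of the uses Qing lists in his earlier message). `multiply_cc` mirrors what a standalone complex-multiply block computes, an element-wise product of two streams; `cfo_rotate`, `two_blocks` and `fused_cfo_multiply` are hypothetical helpers, not GNU Radio or RawOFDM APIs. Python shows only the structure here, not the speed: the fused version does the same arithmetic in one pass, with no intermediate buffer handed from one block to the next.

```python
import cmath
import math


def multiply_cc(a, b):
    """Element-wise product of two complex streams (standalone multiply block)."""
    return [x * y for x, y in zip(a, b)]


def cfo_rotate(samples, cfo_hz, sample_rate):
    """Hypothetical CFO correction: multiply sample n by exp(-j*2*pi*cfo*n/fs)."""
    w = -2.0 * math.pi * cfo_hz / sample_rate
    return [s * cmath.exp(1j * w * n) for n, s in enumerate(samples)]


def two_blocks(samples, weights, cfo_hz, sample_rate):
    """Rotation and multiply as two separate blocks: two passes, one extra buffer."""
    rotated = cfo_rotate(samples, cfo_hz, sample_rate)
    return multiply_cc(rotated, weights)


def fused_cfo_multiply(samples, weights, cfo_hz, sample_rate):
    """Same arithmetic folded into one block: a single pass, no extra buffer."""
    w = -2.0 * math.pi * cfo_hz / sample_rate
    return [s * cmath.exp(1j * w * n) * g
            for n, (s, g) in enumerate(zip(samples, weights))]


if __name__ == "__main__":
    x = [complex(n % 3, n % 5) for n in range(8)]
    g = [0.5 + 0.5j] * 8
    a = two_blocks(x, g, cfo_hz=1e3, sample_rate=1e6)
    b = fused_cfo_multiply(x, g, cfo_hz=1e3, sample_rate=1e6)
    print("max difference:", max(abs(p - q) for p, q in zip(a, b)))
```

In a real C++ flowgraph the saving would come from dropping one block's buffer traffic and scheduling overhead, while the per-sample arithmetic stays the same.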

> Sincerely,
> --
> Yang, Qing
> Information Engineering, CUHK

Tom


> 2012/8/28 Tom Rondeau <address@hidden>
>>
>> On Mon, Aug 27, 2012 at 7:07 AM, Qing Yang <address@hidden> wrote:
>> > Hi there,
>> >
>> > I am currently working on an OFDM transceiver project based on RawOFDM.
>> > We want to implement 20MHz-bandwidth transmit/receive, but the RawOFDM
>> > code seems to support only narrow bandwidths (<1MHz). Once I set the
>> > sample rate above 1MHz, my program blocks with overrun messages (more
>> > details here:
>> > http://lists.gnu.org/archive/html/discuss-gnuradio/2012-08/msg00069.html).
>> > I think the reason is that at a 20MHz sample rate, the USRP produces too
>> > much data for the PC to process, exhausting the PC's computing power.
>> >
>> > To boost the speed, I have two questions:
>> >
>> > 1) My CPU has 8 threads (4 cores). Can I manually dedicate one thread to
>> > each gr block and make it a pipelined system? Tom mentioned that GNU Radio
>> > uses a "thread-per-block" scheduler
>> > (http://lists.gnu.org/archive/html/discuss-gnuradio/2010-09/msg00274.html),
>> > but in my case only two threads are 100% occupied when I run the program.
>> >
>> > 2) Inside some blocks, we make extensive use of vector multiplications
>> > (e.g., precoding, CFO compensation). I've heard that SSE can be used to
>> > speed up vector multiplication. How can I use this technology in my
>> > program?
>> >
>> >
>> > Best regards,
>> > --
>> > Yang, Qing
>> > Information Engineering, CUHK
>>
>>
>> Qing,
>>
>> Yes, the default scheduler is thread-per-block, so each block
>> operates in its own thread, and the OS will distribute those across
>> the CPUs. What you are seeing is probably that two blocks in
>> particular are taking a long time to process and starving the others.
>> So setting CPU affinity won't help you. From your other posts, it looks
>> like you are trying to profile the code. That's the better way to go;
>> figure out which blocks are taking the most time and try to optimize
>> them.
>>
>> Tom
>
>
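Why pinning threads to cores would not help follows from how a thread-per-block pipeline behaves: every block runs concurrently in its own thread, joined to its neighbours by bounded buffers, so steady-state throughput is set by the slowest block, and upstream blocks stall once their output buffers fill. The following is a minimal Python model of that, assuming hypothetical per-item stage costs. It uses `threading` and bounded `queue.Queue` objects as stand-ins for the scheduler and block buffers (not GNU Radio's actual implementation), and `time.sleep` for the work so the threads overlap despite Python's GIL; `bottleneck_rate`, `stage` and `run_pipeline` are illustrative names.

```python
import queue
import threading
import time

STOP = object()  # sentinel marking the end of the stream


def bottleneck_rate(stage_seconds_per_item):
    """Steady-state items/s of a pipeline whose stages all run concurrently."""
    return 1.0 / max(stage_seconds_per_item)


def stage(fn, cost_s, inbox, outbox):
    """One 'block' in its own thread: take an item, do the work, pass it on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        time.sleep(cost_s)  # stand-in for the time the block's work() takes
        outbox.put(fn(item))  # blocks while the downstream buffer is full


def run_pipeline(items, stages, buffer_size=4):
    """Run (fn, cost_s) stages thread-per-stage; return outputs and elapsed time."""
    # Bounded queues model finite block buffers; the final output queue is
    # unbounded so the caller can collect results after feeding every item.
    queues = [queue.Queue(maxsize=buffer_size) for _ in stages] + [queue.Queue()]
    threads = [
        threading.Thread(target=stage, args=(fn, cost, queues[i], queues[i + 1]))
        for i, (fn, cost) in enumerate(stages)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)  # back-pressure: waits while the first buffer is full
    queues[0].put(STOP)
    out = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical costs: the middle block is ten times slower than the others.
    stages = [(lambda x: x + 1, 0.001), (lambda x: x * 2, 0.010), (lambda x: x - 3, 0.001)]
    out, elapsed = run_pipeline(range(50), stages)
    print(f"model: {bottleneck_rate([c for _, c in stages]):.0f} items/s, "
          f"measured: {len(out) / elapsed:.0f} items/s")
```

In this model, shrinking the middle stage's cost raises throughput, while moving the other threads between cores could not, which matches the advice to profile and optimize the slowest block.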


