Re: [Bug-apl] 80 core performance results


From: Juergen Sauermann
Subject: Re: [Bug-apl] 80 core performance results
Date: Tue, 13 May 2014 14:36:37 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5

Hi,

I think I know what went wrong. The workload per thread was so small
(it read the CPU cycle counter and nothing else) that the first threads had
already finished while the tasks were still being distributed.

Due to the lack of core binding, some cores were therefore used several
times and could find the code already in their caches. So the < 10000 cycle
results shown for OMP most likely come from these cases.
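
For illustration, core binding on Linux could look roughly like the sketch below. It assumes the Linux-specific sched_setaffinity() call; the helper name bind_calling_thread_to_core is hypothetical and is not taken from the benchmark sources.

```cpp
#include <sched.h>

// Hypothetical helper (not from the GNU APL sources): restrict the calling
// thread to a single core. Linux-specific; returns true on success.
bool bind_calling_thread_to_core(int core)
{
    if (core < 0 || core >= CPU_SETSIZE) return false;

    cpu_set_t cpus;
    CPU_ZERO(&cpus);       // start with an empty CPU set
    CPU_SET(core, &cpus);  // allow exactly one core

    // A pid of 0 means "the calling thread" for sched_setaffinity().
    return sched_setaffinity(0, sizeof(cpus), &cpus) == 0;
}
```

If each worker called such a helper with its own core number before taking a measurement, a thread that finished early could not be scheduled again on a core whose cache is already warm from a previous task.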

I will update the benchmark to do some real work; there is no point in
repeating the measurements in a loop before that.
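
As a rough idea of what "real work" per thread might mean, here is a minimal sketch. The function names and the xorshift workload are assumptions made for illustration, and it times with std::chrono::steady_clock rather than the CPU cycle counter that the benchmark reads.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical per-thread workload: a dependent chain of xorshift steps.
// Every step needs the previous value, and the final value is returned,
// so the optimizer cannot simply drop the loop.
uint64_t busy_work(uint64_t iterations)
{
    uint64_t x = 88172645463325252ULL;  // arbitrary non-zero seed
    for (uint64_t i = 0; i < iterations; ++i) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
    }
    return x;
}

// Runs busy_work() once, stores its value in result, and returns the
// elapsed wall-clock time in nanoseconds.
int64_t time_busy_work_ns(uint64_t iterations, uint64_t& result)
{
    auto start = std::chrono::steady_clock::now();
    result = busy_work(iterations);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

The iteration count would need to be large enough that one task runs much longer than it takes to distribute the tasks to all threads.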

/// Jürgen





On 05/11/2014 05:02 PM, Juergen Sauermann wrote:
Hi Elias,

thanks, already interesting. If you could loop around the core count:

for ((i=1; i<=80; ++i)); do
 ./Parallel $i
 ./Parallel_OMP $i
done


then I could understand the data better. Also not sure if something
is wrong with the benchmark program. On my new 4-core with OMP I get
fluctuations from:

address@hidden ~/apl-1.3/tools $ ./Parallel_OMP 4
Pass 0: 4 cores/threads, 8229949 cycles total
Pass 1: 4 cores/threads, 8262 cycles total
Pass 2: 4 cores/threads, 4035 cycles total
Pass 3: 4 cores/threads, 4126 cycles total
Pass 4: 4 cores/threads, 4179 cycles total

to:

address@hidden ~/apl-1.3/tools $ ./Parallel_OMP 4
Pass 0: 4 cores/threads, 11368032 cycles total
Pass 1: 4 cores/threads, 4042228 cycles total
Pass 2: 4 cores/threads, 7251419 cycles total
Pass 3: 4 cores/threads, 3846 cycles total
Pass 4: 4 cores/threads, 2725 cycles total

The fluctuations with the manual parallel for are smaller:

Pass 0: 4 cores/threads, 87225 cycles total
Pass 1: 4 cores/threads, 245046 cycles total
Pass 2: 4 cores/threads, 84632 cycles total
Pass 3: 4 cores/threads, 63619 cycles total
Pass 4: 4 cores/threads, 93437 cycles total

but still considerable. The picture so far suggests that OMP fluctuates much
more (in the start-up + sync time) than the manual version, with the highest
OMP start-up above manual and the lowest far below. One change on my TODO list
is to use futexes (as OMP does) instead of mutexes; this is probably not an
issue under Solaris, since futexes are Linux-specific.
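
To make the futex idea concrete, a minimal sketch of a futex-based start gate follows. StartGate, futex_op, gate_wait and gate_open are hypothetical names; only the raw futex system call and the FUTEX_WAIT_PRIVATE / FUTEX_WAKE_PRIVATE operations come from Linux itself. This is not how the benchmark or OMP is actually implemented.

```cpp
#include <atomic>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical start gate: state 0 = closed, 1 = open.
struct StartGate {
    std::atomic<uint32_t> state{0};
};

// Thin wrapper around the raw futex system call.
// Assumption: std::atomic<uint32_t> has the same layout as uint32_t, which
// holds on mainstream Linux toolchains but is not guaranteed by the standard.
static long futex_op(std::atomic<uint32_t>* word, int op, uint32_t val)
{
    return syscall(SYS_futex, reinterpret_cast<uint32_t*>(word), op, val,
                   nullptr, nullptr, 0);
}

// Worker side: sleep in the kernel until the gate is open.
void gate_wait(StartGate& g)
{
    while (g.state.load(std::memory_order_acquire) == 0) {
        // Sleeps only while state is still 0; otherwise returns at once.
        futex_op(&g.state, FUTEX_WAIT_PRIVATE, 0);
    }
}

// Master side: open the gate and wake all sleeping workers.
// Returns the number of threads that were woken, or -1 on error.
long gate_open(StartGate& g)
{
    g.state.store(1, std::memory_order_release);
    return futex_op(&g.state, FUTEX_WAKE_PRIVATE, INT_MAX);
}
```

The point of the design is that the uncontended path (the gate is already open) is a single atomic load in user space; the kernel is entered only when a thread really has to sleep or be woken.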

/// Jürgen


On 05/11/2014 04:23 AM, Elias Mårtenson wrote:
Here are the files that I promised earlier.

Regards,
Elias


