[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Shift in C++

From: Przemek Klosowski
Subject: Re: Shift in C++
Date: Tue, 30 Sep 2003 11:16:01 -0400 (EDT)

    left: for (int i = 0; i < length; i++) { retval(i) = retval(i+1); }
   right: for (int i = length; i > 0; i--) { retval(i) = retval(i-1); }

   To shift left, there is no problem, but when I shift to the right, it
   goes very slow...anyone an idea?

(note: valid retval indices go from 0 to length-1, so your 'right' code
   is wrong by one; also, shift usually rotates elements; otherwise you
   have to worry about what happens at the end)

Welcome to the wonderful world of modern CPUs. They execute an
instruction every 0.3 ns, but they still have to wait 50 ns for an
operand, if it is not in the cache. In other words, if the data is
never in the cache, your calculation speed is reduced by a factor of
150; that is why it is very important to write cache-friendly code.
In fact, the ATLAS library excels in speed because it painstakingly
figures out the right cache-friendly access pattern to the data;
its algorithm is fixed, and only blocking sizes are adjusted.

To illustrate this, consider the following scenario: we have a loop
going over a floating point array (4 byte numbers). Say the CPU cache
line is 64 bytes; on a certain loop iteration, we access an array
element that is not in the cache, and so we trigger a cache miss and
associated 50ns delay, during which 64/4 = 16 floating numbers are
brought into the cache (this is simplified wrt. what really happens in
modern CPUs but should have the right flavor).

In the optimal case, the subsequent iterations use consecutive array
elements that we just brought in the cache; i.e. there will be 16 loop
iterations which will have waiting for us in the cache, so the 150
cycle cache miss will only happen 1/16 of the time.

In the worst case, subsequent iterations require elements from the
array that are not adjacent to the current one, and so the laboriously
brought in cache line will be promptly trashed and replaced by another
line. This is can happen for instance when traversing row-first arrays
in column-first order, or using indirect indexing (A(indexarray(i)), etc.

In practice, we tend to get something in between a perfect, always in
cache, behavior, and the pessimal, always-missing morass---but this
means a tremendous range of speeds depending on where we are within
this range. Additionally, code generated for the simple increment case
is likely faster, so that's what you want to do, especially in C++.

Bottom line: see how much better you'd run with 

    for (int i = 1; i < length; i++) { retval(i) = retval(i-1); }

followed by retval(0)=retval(length-1) if you would want circular behaviour

Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:
How to fund new projects:
Subscription information:

reply via email to

[Prev in Thread] Current Thread [Next in Thread]