Re: Octave and threaded ATLAS and FFTW


From: Dmitri A. Sergatskov
Subject: Re: Octave and threaded ATLAS and FFTW
Date: Wed, 21 Jan 2004 15:37:02 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115

Here is some experience with Octave 2.1.50 compiled with thread-enabled ATLAS
on a 2x Athlon MP 2000 MHz machine running Linux (Fedora Core 1).

To make a long story short: it does seem to help with some matrix operations
(most notably multiplication). The gory details follow.

I used ATLAS 2.6.0 and compiled it (without using default parameters) at
runlevel 1 (it took some 5 hours). The binary is available at
http://coffee.phys.unm.edu/dima/octave/Linux_ATHLONSSE1_2_das.tgz

I could not get configure to pick up libptcblas instead of libcblas, so after
running

  ./configure --enable-shared --enable-dl --disable-static

I manually modified Makeconf:

orig:        BLAS_LIBS = -lcblas -lf77blas -latlas
changed to:  BLAS_LIBS = -lptcblas -lptf77blas -latlas

orig:        LIBS = -lreadline  -lncurses -ldl -lm
changed to:  LIBS = -lreadline  -lncurses -ldl -lpthread -lm
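
The manual Makeconf change shown above can also be scripted. What follows is
only a minimal sketch: the function name patch_makeconf is hypothetical, and it
assumes that the BLAS_LIBS and LIBS lines in Makeconf look exactly like the
"orig" lines quoted above (configure may produce different values on other
systems).

```shell
#!/bin/sh
# Hypothetical helper: switch Makeconf from the serial ATLAS BLAS
# libraries to the threaded (pt*) ones and add -lpthread to LIBS.
# Assumes the BLAS_LIBS and LIBS lines match the "orig" lines quoted above.
patch_makeconf() {
    makeconf="$1"
    # Keep a backup so the file written by configure can be restored.
    cp "$makeconf" "$makeconf.orig" || return 1
    sed -e '/^BLAS_LIBS *=/s/-lcblas/-lptcblas/' \
        -e '/^BLAS_LIBS *=/s/-lf77blas/-lptf77blas/' \
        -e '/^LIBS *=/s/-lm *$/-lpthread -lm/' \
        "$makeconf.orig" > "$makeconf"
}
```

It would be run once, after configure and before make, for example as
`patch_makeconf Makeconf`. Running it a second time would add -lpthread twice,
so restore Makeconf.orig first if the edit needs to be repeated.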

It compiled fine. The only problem I have found so far is that 'cputime' started
returning ridiculously small numbers (I reported this earlier in this thread).
I verified with a stopwatch that tic/toc returns correct numbers, so I used it
for benchmarking. I started with the Octave2.m benchmark from www.sciviews.org
and modified it slightly: I removed rand() from the first benchmark and
increased the matrix sizes so that the data is larger than my cache (256k) and
the execution time rises to a few seconds
(http://coffee.phys.unm.edu/dima/octave/Octave2l.m).

The relevant numbers are below (full benchmark results are in the file
http://coffee.phys.unm.edu/dima/octave/bench2cpu.txt).
The first column (Pthread on 2CPU) is the result of Octave
linked with threaded ATLAS running on an SMP kernel.
The second column (Pthread on 1CPU) is the same Octave running on the same
computer booted into a uniprocessor kernel.
The third column ("Normal" on 2CPU) is Octave linked to the normal ATLAS
running on the same computer with the SMP kernel.

Benchmark (times in sec)                                Pthread   Pthread   "Normal"
                                                        on 2CPU   on 1CPU   on 2CPU
-------------------------------------------------------------------------------------
transp., deformation of a 3000x3000 matrix              2.152     2.188     2.306
3000x3000 normal distributed random matrix ^1000        1.401     1.395     1.409
Sorting of 2,000,000 random values                      7.904     8.049     7.831
3000x3000 cross-product matrix (b = a' * a)             10.31     19.26     18.19
Linear regression over a 3000x3000 matrix (c = a \ b')  9.142     13.15     11.94
FFT over 800,000 random values                          0.3907    0.4066    0.3924
Eigenvalues of a 500x500 random matrix                  7.386     7.386     7.215
Determinant of a 2000x2000 random matrix                3.225     4.279     3.93
Cholesky decomposition of a 3000x3000 matrix            3.353     5.025     4.716
Inverse of a 2000x2000 random matrix                    7.336     10.37     9.579
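
To make the comparison easier to read, the ratio of the "Normal" on 2CPU time
to the Pthread on 2CPU time can be computed for the rows where threading made
a difference. The sketch below does this with awk; the function name speedups,
the short row labels, and the tab-separated input layout are illustrative
choices and not part of the original benchmark. The times are copied from the
table above.

```shell
#!/bin/sh
# Print the speedup of threaded ATLAS on 2 CPUs relative to "normal"
# ATLAS on 2 CPUs. Input is tab-separated:
#   label <TAB> pthread-2CPU time <TAB> normal-2CPU time
speedups() {
    LC_ALL=C awk -F'\t' '{ printf "%-10s %.2f\n", $1, $3 / $2 }'
}

# Times in seconds, copied from the table above.
printf '%s\t%s\t%s\n' \
    crossprod 10.31 18.19 \
    regress   9.142 11.94 \
    det       3.225 3.93 \
    chol      3.353 4.716 \
    inv       7.336 9.579 | speedups
```

With these inputs the cross-product comes out at about 1.76, and the other
four rows fall between roughly 1.2 and 1.4.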


Hope this is of some interest.
Sincerely,

Dmitri.

P.S.: I am looking into using threaded FFTW and will post
some thoughts later.





