Re: Octave and threaded ATLAS and FFTW
From: Dmitri A. Sergatskov
Subject: Re: Octave and threaded ATLAS and FFTW
Date: Wed, 21 Jan 2004 15:37:02 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115
Here is some experience with Octave 2.1.50 compiled against thread-enabled ATLAS
on a dual Athlon MP 2000 MHz machine running Linux (Fedora Core 1).
To make a long story short, it does seem to help with some matrix operations
(most notably multiplication). The gory details follow.
I used ATLAS 2.6.0 and compiled it myself (without using the default parameters)
in runlevel 1; the build took about 5 hours. The binary is available at
http://coffee.phys.unm.edu/dima/octave/Linux_ATHLONSSE1_2_das.tgz
I could not get configure to pick up libptcblas instead of libcblas, so after
running

  ./configure --enable-shared --enable-dl --disable-static

I manually modified Makeconf:

  orig:       BLAS_LIBS = -lcblas -lf77blas -latlas
  changed to: BLAS_LIBS = -lptcblas -lptf77blas -latlas

  orig:       LIBS = -lreadline -lncurses -ldl -lm
  changed to: LIBS = -lreadline -lncurses -ldl -lpthread -lm
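The manual Makeconf edit above could be scripted so it is easy to repeat after re-running configure. This is a minimal sketch in Python, assuming Makeconf contains the BLAS_LIBS and LIBS lines in the form shown above; the function name patch_makeconf is hypothetical and not part of Octave's build system.

```python
import re
from pathlib import Path

# Swaps described above: serial ATLAS CBLAS/F77BLAS wrappers -> pthread versions.
BLAS_SWAPS = {"-lcblas": "-lptcblas", "-lf77blas": "-lptf77blas"}


def patch_makeconf(path):
    """Hypothetical helper: rewrite BLAS_LIBS and LIBS in an Octave Makeconf."""
    lines = Path(path).read_text().splitlines(keepends=True)
    out = []
    for line in lines:
        if re.match(r"\s*BLAS_LIBS\s*=", line):
            # Replace whole tokens only, so -latlas is left untouched.
            for old, new in BLAS_SWAPS.items():
                line = re.sub(rf"(?<!\S){re.escape(old)}(?!\S)", new, line)
        elif re.match(r"\s*LIBS\s*=", line) and "-lpthread" not in line:
            # Insert libpthread just before libm, as in the hand-edited version.
            line = re.sub(r"(?<!\S)-lm(?!\S)", "-lpthread -lm", line)
        out.append(line)
    Path(path).write_text("".join(out))
```

The token-boundary matching makes the script idempotent: a second run finds no "-lcblas" token and an existing "-lpthread", so it leaves the file unchanged.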
It compiled fine. The only problem I have found so far is that 'cputime' started
returning ridiculously small numbers (I reported this earlier in this thread).
I verified with a stopwatch that tic/toc returns correct numbers, so I used it
for benchmarking. I started with the Octave2.m benchmark from www.sciviews.org
and modified it slightly: I removed rand() from the first benchmark and
increased the matrix sizes so that the matrices are larger than my cache (256k)
and each test runs for a few seconds. The modified script is at
http://coffee.phys.unm.edu/dima/octave/Octave2l.m
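Since cputime was unreliable with the threaded build, the timings rely on elapsed wall-clock time, as tic/toc does. Below is a minimal Python analogue of that approach, assuming a best-of-several-runs policy; the name wall_time and the default of 3 repeats are illustrative choices, not taken from the Octave2.m benchmark.

```python
import time


def wall_time(fn, *args, repeats=3):
    """Return the best elapsed wall-clock time (seconds) over several runs.

    Like Octave's tic/toc, this measures elapsed real time rather than CPU
    time, so the result does not depend on how CPU time is accounted
    across threads.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, wall_time(sorted, data) times a sort. Taking the minimum is one common way to reduce noise from other processes; averaging the runs is an equally reasonable choice.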
The relevant numbers are below (the full benchmark results are in
http://coffee.phys.unm.edu/dima/octave/bench2cpu.txt).
The first column (Pthread on 2CPU) is Octave linked with threaded ATLAS,
running on an SMP kernel.
The second column (Pthread on 1CPU) is the same Octave binary on the same
computer, booted into a uniprocessor kernel.
The third column ("Normal" on 2CPU) is Octave linked with the normal
(non-threaded) ATLAS, running on the same computer with the SMP kernel.
                                                        Pthread   Pthread   "Normal"
Test (time in sec)                                      on 2CPU   on 1CPU   on 2CPU
Transp., deformation of a 3000x3000 matrix              2.152     2.188     2.306
3000x3000 normal distributed random matrix ^1000        1.401     1.395     1.409
Sorting of 2,000,000 random values                      7.904     8.049     7.831
3000x3000 cross-product matrix (b = a' * a)             10.31     19.26     18.19
Linear regression over a 3000x3000 matrix (c = a \ b')  9.142     13.15     11.94
FFT over 800,000 random values                          0.3907    0.4066    0.3924
Eigenvalues of a 500x500 random matrix                  7.386     7.386     7.215
Determinant of a 2000x2000 random matrix                3.225     4.279     3.93
Cholesky decomposition of a 3000x3000 matrix            3.353     5.025     4.716
Inverse of a 2000x2000 random matrix                    7.336     10.37     9.579
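The effect of threading is easiest to see as ratios. This small sketch computes, from the numbers in the table above, the speedup of threaded ATLAS on 2 CPUs relative to normal ATLAS and relative to the same binary on 1 CPU; the dictionary keys and the speedups helper are illustrative names only.

```python
# Timings in seconds copied from the table: (pthread_2cpu, pthread_1cpu, normal_2cpu)
TIMINGS = {
    "cross-product (b = a' * a)": (10.31, 19.26, 18.19),
    "linear regression (c = a \\ b')": (9.142, 13.15, 11.94),
    "determinant 2000x2000": (3.225, 4.279, 3.93),
    "Cholesky 3000x3000": (3.353, 5.025, 4.716),
    "inverse 2000x2000": (7.336, 10.37, 9.579),
    "FFT 800,000 values": (0.3907, 0.4066, 0.3924),
}


def speedups(t2, t1, normal):
    """Return (vs. normal ATLAS, vs. same binary on 1 CPU); > 1 means faster."""
    return normal / t2, t1 / t2


for name, times in TIMINGS.items():
    vs_normal, vs_1cpu = speedups(*times)
    print(f"{name:32s} {vs_normal:5.2f}x {vs_1cpu:5.2f}x")
```

For the cross-product this gives about 1.76x over normal ATLAS and 1.87x over the 1-CPU run, while the FFT ratios stay close to 1.0, matching the observation that multiplication benefits most.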
Hope it is of some interest.
Sincerely,
Dmitri.
P.S.: I have also looked into using threaded FFTW and will post some thoughts
later.