[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Bug-gsl] Improving gsl_blas_dsyrk routine
From: |
王一 |
Subject: |
[Bug-gsl] Improving gsl_blas_dsyrk routine |
Date: |
Sat, 12 Apr 2008 16:27:26 +0800 |
Dear GSL developers:
I benifits a lot recently by using your powerful routines. Thus I became
interested in the detail of your routines.
When I look through the BLAS source code, I found gsl_blas_dsyrk routine can
be significantly improved by a change of looping order.
As I'm just a Ph.D. student of Statistical Genetics, I'm not sure my finding is
generally true.
I just wish to help the lovely GSL! Please let me know the result if possible.
The version number of GSL: 1.8
The hardware and operating system: T7100 CPU, Windows XP
The compiler used, including version number and compilation options:
Dev-C++/MinGW , full optimization
A description of the bug behavior: I use a real dataset with 6238 rows by 617
columns of double precision. The gsl_blas_dsyrk routine has 1/5 speed than my
routine in some cases.
Rows Columns Transpose Mine/sec GSL_BLAS/sec
6238 617 No 26.8
25.2
6238 617 Yes 3.4
15.9
617 6238 No 2.4
2.4
617 6238 Yes 54
256.5
A short program which exercises the bug:
My gsl_blas_dsyrk routine analog is listed below. It is a function of my C++
matrix template class. Hope this will not trouble you.
template <class type>
void matrix<type>::square(matrix& M, bool T){
register uint r, c, k;
register type s;
if(!T){
resize(M.rn, M.rn);
for(r=0; r<M.rn; r++) for(c=r; c<M.rn; c++){
s=type(0);
for(k=0; k<M.cn; k++) s+=M(r, k)*M(c, k);
(*this)(c, r)=(*this)(r, c)=s;
}
}
else{
resize(M.cn, M.cn);
// The follow code changes looping order, just like DGEMM does
for(k=0; k<M.rn; k++) for(r=0; r<M.cn; r++){
s=M(k, r);
for(c=r; c<M.cn; c++) (*this)(r, c)+=s*M(k, c);
}
for(r=0; r<M.cn; r++) for(c=r+1; c<M.cn; c++) (*this)(c, r)=(*this)(r,
c);
}
}
And your corresponding codes (similar with the Fortan code) are:
if (uplo == CblasUpper && trans == CblasNoTrans) {
for (i = 0; i < N; i++) {
for (j = i; j < N; j++) {
BASE temp = 0.0;
for (k = 0; k < K; k++) {
temp += A[i * lda + k] * A[j * lda + k];
}
C[i * ldc + j] += alpha * temp;
}
}
} else if (uplo == CblasUpper && trans == CblasTrans) {
// The following code may be improved
for (i = 0; i < N; i++) {
for (j = i; j < N; j++) {
BASE temp = 0.0;
for (k = 0; k < K; k++) {
temp += A[k * lda + i] * A[k * lda + j];
}
C[i * ldc + j] += alpha * temp;
}
}
王一
2008-04-12
- [Bug-gsl] Improving gsl_blas_dsyrk routine,
王一 <=