bug-gsl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

## [Bug-gsl] Improving gsl_blas_dsyrk routine

 From: 王一 Subject: [Bug-gsl] Improving gsl_blas_dsyrk routine Date: Sat, 12 Apr 2008 16:27:26 +0800

```Dear GSL developers:
I benifits a lot recently by using your powerful routines. Thus I became
interested in the detail of your routines.
When I look through the BLAS source code, I found  gsl_blas_dsyrk routine can
be significantly improved by a change of looping order.
As I'm just a Ph.D. student of Statistical Genetics, I'm not sure my finding is
generally true.
I just wish to help the lovely GSL! Please let me know the result if possible.

The version number of GSL:                              1.8
The hardware and operating system:      T7100 CPU, Windows XP
The compiler used, including version number and compilation options:
Dev-C++/MinGW ,  full optimization
A description of the bug behavior: I use a real dataset with 6238 rows by 617
columns of double precision. The gsl_blas_dsyrk routine has 1/5 speed than my
routine in some cases.

Rows    Columns Transpose       Mine/sec        GSL_BLAS/sec
6238            617                     No                      26.8
25.2
6238            617                     Yes                     3.4
15.9
617                     6238            No                      2.4
2.4
617                     6238            Yes                     54
256.5

A short program which exercises the bug:
My gsl_blas_dsyrk routine analog is listed below. It is a function of my C++
matrix template class. Hope this will not trouble you.

template   <class type>
void    matrix<type>::square(matrix&  M,  bool    T){
register uint    r,  c, k;
register    type    s;
if(!T){
resize(M.rn,    M.rn);
for(r=0;    r<M.rn; r++)    for(c=r;    c<M.rn; c++){
s=type(0);
for(k=0;    k<M.cn; k++)    s+=M(r, k)*M(c, k);
(*this)(c,  r)=(*this)(r,  c)=s;
}
}
else{
resize(M.cn,    M.cn);
// The follow code changes looping order, just like DGEMM does
for(k=0;    k<M.rn; k++) for(r=0;    r<M.cn; r++){
s=M(k, r);
for(c=r;    c<M.cn; c++) (*this)(r,  c)+=s*M(k, c);
}
for(r=0;    r<M.cn; r++) for(c=r+1;    c<M.cn; c++) (*this)(c,  r)=(*this)(r,
c);
}
}

And your corresponding codes (similar with the Fortan code) are:
if (uplo == CblasUpper && trans == CblasNoTrans) {
for (i = 0; i < N; i++) {
for (j = i; j < N; j++) {
BASE temp = 0.0;
for (k = 0; k < K; k++) {
temp += A[i * lda + k] * A[j * lda + k];
}
C[i * ldc + j] += alpha * temp;
}
}

} else if (uplo == CblasUpper && trans == CblasTrans) {
// The following code may be improved
for (i = 0; i < N; i++) {
for (j = i; j < N; j++) {
BASE temp = 0.0;
for (k = 0; k < K; k++) {
temp += A[k * lda + i] * A[k * lda + j];
}
C[i * ldc + j] += alpha * temp;
}
}

2008-04-12
```

reply via email to

 [Prev in Thread] Current Thread [Next in Thread]