Performance question concerning chicken flonum vs "foreign flonum"

From: christian.himpe
Subject: Performance question concerning chicken flonum vs "foreign flonum"
Date: Thu, 04 Nov 2021 16:46:50 +0100 (CET)

Dear All,

I am currently experimenting with Chicken Scheme and I would like to ask about 
the following situation: I am comparing a "pure" Scheme fused-multiply-add 
(fma) using chicken.flonum against C99's fma via chicken.foreign. Here is my 
test code:

;;;; fma-test.scm

(import (chicken flonum) (chicken foreign) srfi-4)

(foreign-declare "#include <math.h>")

;; Multiply-add via nested fp+ and fp* from (chicken flonum)
;; (not fused: the product is rounded before z is added)
(define (scm-fma x y z)
  (fp+ z (fp* x y)))

;; FMA via the C99 fma function through (chicken foreign)
(define c99-fma (foreign-lambda double "fma" double double double))

;; Test function for FMAs
(define (dot fma a b)
  (do [(idx 0 (add1 idx))
       (dim (f64vector-length a))
       (ret 0.0 (fma (f64vector-ref a idx) (f64vector-ref b idx) ret))]
    ((= idx dim) ret)))

;; Test vector dimension
(define dim 2000000)

;; Test vector 1
(define a (make-f64vector dim 1.2345))

;; Test vector 2
(define b (make-f64vector dim 0.9876))

;; Test repetitions
(define N 200)

;; Test scm-dot
(time (do [(n 0 (add1 n))]
        ((= n N))
        (dot scm-fma a b)))

;; Test fma-dot
(time (do [(n 0 (add1 n))]
        ((= n N))
        (dot c99-fma a b)))


Running this code as follows:

csc -O5 fma-test.scm && ./fma-test

yields the following timings (first for scm-fma, then for c99-fma):

7.558s CPU time, 0/225861 GCs (major/minor), maximum live heap: 30.78 MiB
8.839s CPU time, 0/256410 GCs (major/minor), maximum live heap: 30.78 MiB
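The two procedures timed above are not numerically equivalent: scm-fma rounds the product x*y to a double before adding z, while C99's fma rounds only once, at the end. The minimal sketch below shows an input where the results differ. The values x and y are chosen purely for illustration, and the sketch assumes it is compiled with csc (foreign-lambda is only available in compiled code) and that the C compiler does not contract the separate multiply and add into a hardware fma.

```scheme
;;;; rounding-demo.scm (hypothetical file name); compile with: csc rounding-demo.scm
(import (chicken flonum) (chicken foreign))

(foreign-declare "#include <math.h>")

;; Two roundings: x*y is rounded to a double, then z is added
(define (scm-fma x y z)
  (fp+ z (fp* x y)))

;; One rounding: C99 fma computes x*y+z exactly, then rounds once
(define c99-fma (foreign-lambda double "fma" double double double))

;; x = 1 + 2^-27 and y = 1 - 2^-27, so x*y = 1 - 2^-54 exactly,
;; which is not representable as a double and rounds to 1.0
(define x (fp+ 1.0 (expt 2.0 -27)))
(define y (fp- 1.0 (expt 2.0 -27)))

(print (scm-fma x y -1.0)) ; product already rounded to 1.0, so the value is 0.0
(print (c99-fma x y -1.0)) ; the exact residual, -2^-54, survives
```

So any timing comparison between the two is also a comparison between a two-rounding and a single-rounding operation.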

Now I wonder why the single C function (instruction) is slower than two Scheme
function calls. I have four potential explanations:

1. chicken.foreign needs to do some type conversion for each argument and
return value, which accounts for the extra time. If so, could this be avoided
with type declarations somehow?

2. chicken.flonum does something to make fpX calls very fast. If so, can the
same be done for the foreign fma?

3. I am using chicken.foreign inefficiently, although I think srfi-144 uses it the same way.

4. This is an effect that only occurs on my machine.
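Regarding explanations 1 and 3: whatever it costs to cross from Scheme into C, dot pays that cost once per vector element. A minimal sketch, assuming the goal is simply to amortize that cost, moves the whole loop into C with foreign-lambda*, so the boundary is crossed once per dot product. The name c-dot and its explicit length argument n are hypothetical; the f64vector foreign type passes a pointer to the vector's double contents. This is a sketch for experimentation, not a measured result.

```scheme
;;;; c-dot.scm (hypothetical file name); compile with: csc -O5 c-dot.scm
(import (chicken foreign) srfi-4)

(foreign-declare "#include <math.h>")

;; Hypothetical helper: the entire dot product runs in C, so Scheme
;; calls into C once per pair of vectors instead of once per element.
;; Each f64vector argument arrives in C as a double* to its contents.
(define c-dot
  (foreign-lambda* double ((f64vector a) (f64vector b) (int n))
    "double acc = 0.0;
     int i;
     for (i = 0; i < n; ++i) acc = fma(a[i], b[i], acc);
     C_return(acc);"))

;; Usage with small vectors: 1*3 + 2*4 = 11
(define a (f64vector 1.0 2.0))
(define b (f64vector 3.0 4.0))
(print (c-dot a b (f64vector-length a)))
```

With the vectors from the test above, the timed loop would call (c-dot a b dim) in place of (dot c99-fma a b).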

It would be great to get some help or an explanation for this issue.

Here is my setup:

CHICKEN Scheme 5.2.0
gcc 10.3.0
Ubuntu 20.04
AMD Ryzen 5 4500U with 16 GB RAM

Thank you very much

