chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] [PATCH][5] numbers integration


From: Felix Winkelmann
Subject: Re: [Chicken-hackers] [PATCH][5] numbers integration
Date: Fri, 13 Feb 2015 11:30:42 +0100 (CET)

> 
> It turns out that the overhead of the generic vararg numeric operators is
> so much bigger that the total runtime of all the benchmarks with this
> small change in rewrites is only 30% slower than the non-numbers version
> of CHICKEN 5 master, instead of 100% slower!  See the attached performance
> difference between chicken-5 master and the current numbers integration
> version.  In fact, a few benchmarks are even slightly faster with the new
> code (but I can't really explain that)!
> 
> I've pushed the version with the performance improvement to my
> "numbers-integration" branch (and fixed a nasty stupid bug in "eqv?" I
> found while working on this).

Well done, Peter!

> 
> I hope this performance difference is more acceptable.  

No. :-)

(at least not for me)

We must avoid the CPS-call, at all costs. The example posted by Alex
shows directly what needs to be done:

[old]

(lambda (k10)
  (let ((t12 (set! test
               (lambda (k14 n1)
                 (let ((loop2 (##core#undefined)))
                   (let ((t18 (set! loop2
                                (lambda (k20 i3 result4)
                                  (if (##core#inline "C_i_greaterp" i3 n1)
                                    (k20 42)
                                    (let ((a30 (##core#inline_allocate 
("C_a_i_plus" 4) i3 1)))
                                      (let ((a33 (##core#inline_allocate
                                                   ("C_a_i_plus" 4)
                                                   result4
                                                   1)))
                                        (loop2 k20 a30 a33))))))))
                     (loop2 k14 1 result)))))))
    (let ((k36 (##core#lambda (t8) (k10 (##core#undefined))))) (test k36 10))))

vs.

[new]

(lambda (k176)
  (let ((t177 (set! test
                (lambda (k179 n1)
                  (let ((loop2 (##core#undefined)))
                    (let ((t183 (set! loop2
                                  (lambda (k185 i3 result4)
                                    (if (##core#inline "C_i_greaterp" i3 n1)
                                      (k185 42)
                                      (let ((k196 (##core#lambda
                                                    (r197)
                                                    (let ((a195 r197))
                                                      (let ((k200 
(##core#lambda (r201) (loop2 k185 a195 r201))))
                                                        (+ k200 result4 1))))))
                                        (+ k196 i3 1)))))))
                      (loop2 k179 1 result)))))))
    (let ((k203 (##core#lambda (r204) (k176 (##core#undefined)))))
      (test k203 10))))

In the former case, the compiler is able to produce a single C
function. Regardless how contrived this is, small loops with integer-
or flonum math will suffer heavily.

Isn't there some crazy scheme to postpone allocation in some way?
Could we optimistically assume an operation will not overflow into a
bignum? Perhaps use a static buffer for the result (kept on a list
inside the runtime system) and migrated into the heap on the next GC
opportunity? This will not help in tight loops like above, for those,
using specialized fixnum-arithmetic is the only choice. But for normal
code, the number of CPS calls could be reduced significantly.

Or not. I'm just thinking loudly, here.

Or range-analysis. But this will not help across procedure-boundaries,
and it isn't clear how much can be inferred.


felix



reply via email to

[Prev in Thread] Current Thread [Next in Thread]