[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Chicken-hackers] [PATCH][5] numbers integration
From: |
Felix Winkelmann |
Subject: |
Re: [Chicken-hackers] [PATCH][5] numbers integration |
Date: |
Fri, 13 Feb 2015 11:30:42 +0100 (CET) |
>
> It turns out that the overhead of the generic vararg numeric operators is
> so much bigger that the total runtime of all the benchmarks with this
> small change in rewrites is only 30% slower than the non-numbers version
> of CHICKEN 5 master, instead of 100% slower! See the attached performance
> difference between chicken-5 master and the current numbers integration
> version. In fact, a few benchmarks are even slightly faster with the new
> code (but I can't really explain that)!
>
> I've pushed the version with the performance improvement to my
> "numbers-integration" branch (and fixed a nasty stupid bug in "eqv?" I
> found while working on this).
Well done, Peter!
>
> I hope this performance difference is more acceptable.
No. :-)
(at least not for me)
We must avoid the CPS-call, at all costs. The example posted by Alex
shows directly what needs to be done:
[old]
(lambda (k10)
(let ((t12 (set! test
(lambda (k14 n1)
(let ((loop2 (##core#undefined)))
(let ((t18 (set! loop2
(lambda (k20 i3 result4)
(if (##core#inline "C_i_greaterp" i3 n1)
(k20 42)
(let ((a30 (##core#inline_allocate
("C_a_i_plus" 4) i3 1)))
(let ((a33 (##core#inline_allocate
("C_a_i_plus" 4)
result4
1)))
(loop2 k20 a30 a33))))))))
(loop2 k14 1 result)))))))
(let ((k36 (##core#lambda (t8) (k10 (##core#undefined))))) (test k36 10))))
vs.
[new]
(lambda (k176)
(let ((t177 (set! test
(lambda (k179 n1)
(let ((loop2 (##core#undefined)))
(let ((t183 (set! loop2
(lambda (k185 i3 result4)
(if (##core#inline "C_i_greaterp" i3 n1)
(k185 42)
(let ((k196 (##core#lambda
(r197)
(let ((a195 r197))
(let ((k200
(##core#lambda (r201) (loop2 k185 a195 r201))))
(+ k200 result4 1))))))
(+ k196 i3 1)))))))
(loop2 k179 1 result)))))))
(let ((k203 (##core#lambda (r204) (k176 (##core#undefined)))))
(test k203 10))))
In the former case, the compiler is able to produce a single C
function. Regardless how contrived this is, small loops with integer-
or flonum math will suffer heavily.
Isn't there some crazy scheme to postpone allocation in some way?
Could we optimistically assume an operation will not overflow into a
bignum? Perhaps use a static buffer for the result (kept on a list
inside the runtime system) and migrated into the heap on the next GC
opportunity? This will not help in tight loops like above, for those,
using specialized fixnum-arithmetic is the only choice. But for normal
code, the number of CPS calls could be reduced significantly.
Or not. I'm just thinking loudly, here.
Or range-analysis. But this will not help across procedure-boundaries,
and it isn't clear how much can be inferred.
felix
- Re: [Chicken-hackers] [PATCH][5] numbers integration, (continued)
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Felix Winkelmann, 2015/02/08
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Alex Shinn, 2015/02/09
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Peter Bex, 2015/02/09
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Alaric Snell-Pym, 2015/02/11
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Felix Winkelmann, 2015/02/11
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Jörg F. Wittenberger, 2015/02/11
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Peter Bex, 2015/02/11
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Peter Bex, 2015/02/12
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Alex Shinn, 2015/02/12
- Re: [Chicken-hackers] [PATCH][5] numbers integration,
Felix Winkelmann <=
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Peter Bex, 2015/02/15
- Re: [Chicken-hackers] [PATCH][5] numbers integration, Felix Winkelmann, 2015/02/15