[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Chicken-hackers] [PATCH] Speedup in walk-generic and generate less LET
From: |
Peter Bex |
Subject: |
[Chicken-hackers] [PATCH] Speedup in walk-generic and generate less LET statements |
Date: |
Sun, 19 Feb 2012 22:55:52 +0100 |
User-agent: |
Mutt/1.4.2.3i |
Hi all,
Another week, another patch :)
The first patch gets rid of the calls to SRFI-1's EVERY and MAP
in favor of a hand-rolled loop. This is slightly more verbose,
but by doing it this way we can avoid some recursive consing which
MAP performs (if nodes are equal, no need to reverse the list)
and because EVERY is so fully generic it does a lot of stuff
before actually getting started on the list, while the lists
in WALK-GENERIC are very often only one or two items.
The second patch is a bit more delicate and deserves some more
explanation. When a program initially gets translated into a
node tree, it is normalised to have all bodies transformed into
let-statements, as well as the toplevel.
For example, the test program "(print 1) (print 2) (print 3)"
gets normalised to this (use csc -debug T to see this tree):
[initial node tree]
(lambda ()
(let ((t1 (##core#callunit "library")))
(let ((t2 (##core#callunit "eval")))
(let ((t3 (print 1)))
(let ((t4 (print 2)))
(let ((t5 (print 3)))
(let ((t6 ((##sys#implicit-exit-handler))))
(##core#undefined))))))))
Then, during CPS conversion, each statement gets converted to
a lambda which explicitly accepts the value from the previous
continuation. While this is done, all LET statements are
converted so that the same variable still refers properly to
the value, but now it needs to refer to the lambda's argument
(use csc -debug 3 to see this tree):
[cps]
(lambda (k8)
(let ((k9 (##core#lambda
(r10)
(let ((t1 r10))
(let ((k12 (##core#lambda
(r13)
(let ((t2 r13))
(let ((k15 (##core#lambda
(r16)
(let ((t3 r16))
(let ((k18 (##core#lambda
(r19)
(let ((t4 r19))
(let ((k21
(##core#lambda
(r22)
(let
((t5 r22))
(let ((k24 (##core#lambda
(r25)
(let ((t6 r25)) (k8 (##core#undefined))))))
(let ((k27 (##core#lambda (r28) (r28 k24))))
(##sys#implicit-exit-handler k27)))))))
(print k21
3))))))
(print k18 2))))))
(print k15 1))))))
(##core#callunit "eval" k12))))))
(##core#callunit "library" k9)))
As you can see, there are lots of unneccessary LETs in here:
(t1 r10), (t2 r13), (t3 r16), (t4 r19), (t5 r22) and (t6 r25).
Instead, we could change the process so that the lambda's argument
isn't a normal gensym but takes its name from the LET if we know
the translation is from a LET. Then we can drop the LET:
[cps]
(lambda (k8)
(let ((k10 (##core#lambda
(t1)
(let ((k12 (##core#lambda
(t2)
(let ((k14 (##core#lambda
(t3)
(let ((k16 (##core#lambda
(t4)
(let ((k18 (##core#lambda
(t5)
(let ((k20
(##core#lambda (t6) (k8 (##core#undefined)))))
(let ((k22
(##core#lambda (r23) (r23 k20))))
(##sys#implicit-exit-handler k22))))))
(print k18 3)))))
(print k16 2)))))
(print k14 1)))))
(##core#callunit "eval" k12)))))
(##core#callunit "library" k10)))
As you can see, there are no unneccessary LET statements anymore.
The code even almost fits the screen ;)
Less variables is a good thing because each variable adds extra
overhead, since it gets looked at by the analyzer (which means this
change can almost halve the number of variables looked at by the
analyzer), which then stores them with its attributes in a hash table
(which requires use to hash the symbol, several times for lookup too).
The optimizer then needs to eliminate these variables again when it
decides they are really just aliases for the lambda arguments (which
it does, on the first iteration).
Both patches combined reduce the compilation time of the numbers
test from about 70 seconds to 53 seconds, so they're quite worth it IMO.
The walk-generic change is really needed by the LET change, because
for some reason this causes it to hit that procedure more than
without the change.
Oddly, this seems to cause a shift in the performance profile.
There's now a major bottleneck in the second optimization step
but all the other steps are a lot faster now. I hope this will
make it easier to further analyze this and eventually get rid of
that (final?) bottleneck, too.
Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth
0001-Use-a-hand-rolled-loop-in-WALK-GENERIC-this-saves-us.patch
Description: Text document
0002-Don-t-generate-extra-LET-statements-during-cps-trans.patch
Description: Text document
- [Chicken-hackers] [PATCH] Speedup in walk-generic and generate less LET statements,
Peter Bex <=