|
From: | Peter Bex |
Subject: | [Chicken-hackers] [PATCH] Fix performance bottleneck in compiling large files, add profiling option |
Date: | Sat, 21 Jan 2012 20:38:39 +0100 |
User-agent: | Mutt/1.4.2.3i |
Hi all, I'm currently still in the process of getting the "numbers" tests to compile in reasonable time. To do this, I managed to find one part in Chicken's compilation process where it's slow thanks to Felix's hint to use -debug p. To debug this, I created a small script that builds a file with an increasing number of toplevel forms (see measure-compilation.scm). It appears that simply putting (print #t) in a file a lot of times simulates a large toplevel pretty well and triggers the exponential behaviour reliably. This is visualized by the green line in the file "comparison.png". Afterwards, I added the option to selectively turn on profiling for parts in Chicken itself to the Makefiles (see patch 0001 or changeset 57fff73344771739e7723e28df0c0019058a0268 in the sjamaan-pending branch). This helped me analyze the performance by simply profiling Chicken while it compiled a test file containing 1000 toplevel expressions. Turns out that the "k1000 in k999 in k998 in ..." comments that Chicken generates in the C file are nested rather deeply when there's a long procedure or a lot of toplevel forms, which causes the bottleneck to appear in ##compiler#real-name. The notes.txt file shows this quite obviously, it has a short heading of chicken-profile output. Instead of using sprintf and allocating a new string and copying over every time, I've changed real-name to use build up a list and use string-intersperse to join them together at the end (see patch 0002 or changeset 27dbaf0440617fa92a33c592aba9d52b72a95117 in my branch). This causes compilation time to grow a lot less hard (see the red line in comparison.png) and compilation time to be more than halved when compiling the 1000-form file, but real-name is still dominating the procedure-calls which makes sense since there are places where the comment is several standard 80x25 screenfulls. Perhaps these comments should be cut off at a certain length? Unfortunately, the compilation time is still exponential on the input, but with a much smaller growth factor (you can see this more clearly in string-intersperse.png). Also, the compilation time for the numbers egg's tests is still close to 7 minutes (this change shaved off about half a minute). Turns out that the numbers test is sufficiently more complex than the simple tests you see here, so the main bottleneck is probably elsewhere (I can tell because the time spent in the other phases is now higher than the time spent in the code generation phase). I'll dig in deeper and try to work out where it's being slow, but I didn't want to withhold these patches and insights from y'all :) Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
0001-Add-option-to-enable-profiling-more-easily-for-speci.patch
Description: Text document
0002-Improve-performance-by-not-using-sprintf-to-build-co.patch
Description: Text document
comparison.png
Description: PNG image
measure-compilation.scm
Description: Text document
notes.txt
Description: Text document
string-intersperse.png
Description: PNG image
[Prev in Thread] | Current Thread | [Next in Thread] |