[Bug-gnubg] Profiler output
From: Jim Segrave
Subject: [Bug-gnubg] Profiler output
Date: Mon, 11 Nov 2002 18:05:52 +0100
User-agent: Mutt/1.2.5.1i
Interestingly, both the profiler and runs of unprofiled code show that
inlining EqualKeys() is actually a slight performance loss (about 2%
slower over the same 376-move analysis). My guess is that it prevents
some other optimisations.
On Mon 11 Nov 2002 (14:48 +1300), Joseph Heled wrote:
> What does your profiler say about:
>
> ApplySubMove ApplyMove, SaveMoves, LegalMove, GenerateMovesSub,
> GenerateMoves
I don't entirely trust the profiler output - very short routines are
under-represented because the setup for a call is charged to the
caller, not the callee. For example, the savings from getMEMultiple
replacing many calls to getME within a single function appear small
or, in many cases, non-existent. getMEMultiple looks up between 10 and
30 MET entries, not all of which will be needed; but avoiding the call
setup 10 or more times, when there are a lot of arguments to be
passed, is a major win.
The call counts are correct (where they are given), though. So, in
answer to your question:
Total program time 1315 seconds or so (not counting profiler time
itself)
  %      self                 self    total
 time  seconds      calls   ms/call  ms/call  name
  0.1     1.01   33811782      0.00     0.00  ApplySubMove
  0.0     0.00       1065      0.00     0.00  ApplyMove
  1.7    23.85   24288468      0.00     0.00  SaveMoves
  0.5     6.45   54605445      0.00     0.00  LegalMove
  1.1    14.52    1172242      0.01     0.09  GenerateMovesSub
  0.0     0.08     683734      0.00     0.16  GenerateMoves
The real time takers, based on sample times, are (listing all those
over 2% of run time):
 37.6   515.83                                EvaluateFromBase
 14.4   197.50   18987656      0.01     0.01  CalculateHalfInputs
  9.1   124.77  111637935      0.00     0.00  getMEMultiple
  4.2    58.04                                NeuralNetEvaluate
  4.1    56.71  277644558      0.00     0.00  Escapes
  3.9    52.86   35368076      0.00     0.00  GetPoints
  3.1    42.87  777994269      0.00     0.00  EqualKeys
  2.9    39.32   27684624      0.00     0.00  SanityCheck
  2.8    38.12   40820225      0.00     0.00  PositionKey
  2.8    37.79                                Evaluate
  2.1    28.34   17104674      0.00     0.00  PositionFromKey
I can't explain why functions in neuralnet.c don't show call counts;
I find that most odd. And I can't get GNU's gprof (which allows line
counts within a function to be profiled) to build under FreeBSD: it
needs functions for describing core files and ELF images which don't
seem to exist there (they must exist somewhere, since gdb builds, but
I can't figure out how to build gprof). So I'm using the FreeBSD
version, which isn't as flexible.
For the benefit of those who want to see the whole thing, I've
attached the full gprof output from an analysis of a 9-game, 376-move
match. Anyone running Linux should have profiling libraries available
and probably a real GNU gprof. Simply adding -pg to CFLAGS in the top
Makefile and doing a 'make clean; make' gets you a profiling version;
see 'info gprof' for more details. Don't forget that profiling costs
a lot of real execution time, so things will seem to run a lot slower.
I have a suspicion - though it will take a lot more work to confirm -
that several parts of CalculateHalfInputs and Escapes might benefit
from building more bitmaps of the board and using logical operators.
This is just a top-of-the-head, skim-over-the-code feeling, but I
suspect it would help to have bit maps of:

- Player's blots: bits 1..24 set to 1 if the player has a blot on that
  point.
- Player's made points: like the blots map, but showing points where
  the player has more than one chequer.
- Opponent's blots
- Opponent's made points
These maps, along with their reverses, make some operations very
easy. Using bit maps of dice rolls allows a quick check for escapes:
'&' the bit map of the roll with the opponent's map of made points;
if the result is non-zero, you can't go there. The same '&' with the
opponent's blot map tells you whether you can hit.
I have a strong suspicion that this method may give better
performance than some of the iterative step-through-the-array and
compare-counts code currently used to set up some of the neural net
inputs, where there seem to be a lot of for loops. I'd like to look
at doing that, but I need to go through each loop carefully to see
exactly what it's doing and why before trying to substitute a
different representation. And it's possible, although I think not
likely, that the cost of building the maps would use up the gains.
--
Jim Segrave address@hidden
gprof.out.gz
Description: application/gunzip