Subject: Re: [Bug-gnubg] pubeval benchmark
Date: Fri, 3 Feb 2012 11:00:28 -0500
If you're just looking at probability of win, the gammon node doesn't matter, though of course if you want to look at equity then you'll get value from it.
Using 80 hidden units and simple inputs I got a player to 63% wins against pubeval, and gnubg 0-ply (with more hidden units and extended inputs) wins around 71%.
So 67.5% sounds a bit high but not unbelievable. (Over 20k matches the standard error on the estimated win percentage is around +/- 0.35%, so 67.5% is significantly different from both 63% and 71%.)
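For anyone who wants to check the standard error figure, here is a rough sketch of the arithmetic. It assumes each match is an independent win/loss trial (a plain binomial model) and treats 63% and 71% as fixed reference values rather than noisy estimates of their own; the function name is just for illustration.

```python
import math

def win_rate_standard_error(p: float, n: int) -> float:
    """Standard error of an estimated win probability p over n independent matches."""
    return math.sqrt(p * (1.0 - p) / n)

n_matches = 20_000

# Worst case is p = 0.5, which gives the "around 0.35%" figure.
se_worst = win_rate_standard_error(0.5, n_matches)
# At the observed 67.5% the standard error is slightly smaller.
se_observed = win_rate_standard_error(0.675, n_matches)

print(f"worst-case standard error: {100 * se_worst:.2f}%")
print(f"standard error at 67.5%:   {100 * se_observed:.2f}%")

# Distance from each reference win rate, measured in standard errors.
for reference in (0.63, 0.71):
    z = abs(0.675 - reference) / se_observed
    print(f"67.5% vs {100 * reference:.0f}%: {z:.1f} standard errors apart")
```

Both gaps come out at more than ten standard errors, which is why the difference counts as significant under these assumptions.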
It's surprisingly tricky to implement pubeval correctly - I had a bunch of mistakes in my first attempt, and gnubg's implementation also had a bug until recently.
One subtle implementation point: when comparing potential moves you have to make sure that you use the race or contact weights based on the starting position, not on whether each potential move is contact or race. That's because pubeval's evaluation function is a separate linear regression for contact and race, and so the results of the two regressions aren't sensibly comparable (i.e. they don't represent probability of win).
That makes a smallish but noticeable difference to average pubeval performance.
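To make the weight-selection point concrete, here is a toy sketch. It is not pubeval's actual code: the two-element feature vectors and the weight values are made up for illustration, and the names choose_move and choose_move_mixed are hypothetical. The only thing it is meant to show is that the weight set is picked once from the starting position and then applied to every candidate.

```python
from typing import Sequence

def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Dot product of a weight vector with a position's feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def choose_move(start_is_race, candidates, race_weights, contact_weights):
    """Pick the weight set from the STARTING position and score every candidate with it."""
    weights = race_weights if start_is_race else contact_weights
    scores = [linear_score(weights, features) for features, _ in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

def choose_move_mixed(candidates, race_weights, contact_weights):
    """The mistake: pick weights per candidate, so scores from two regressions get compared."""
    scores = [
        linear_score(race_weights if is_race else contact_weights, features)
        for features, is_race in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)

# Made-up weights and features; each candidate is (features, resulting position is a race).
contact_weights = [1.0, 0.0]
race_weights = [0.0, 10.0]
candidates = [
    ([0.9, 0.1], False),  # move 0 keeps contact
    ([0.2, 0.5], True),   # move 1 breaks contact
]

correct = choose_move(False, candidates, race_weights, contact_weights)
mixed = choose_move_mixed(candidates, race_weights, contact_weights)
print("weights from starting position pick move", correct)
print("per-candidate weights pick move", mixed)
```

In this toy case the two approaches pick different moves, only because the race regression happens to produce scores on a different scale from the contact one.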
On Feb 3, 2012, at 10:32 AM, boomslang wrote: