Re: [gnubg] Help with a new MET

From: Timothy Y. Chow
Subject: Re: [gnubg] Help with a new MET
Date: Mon, 11 Nov 2019 21:17:33 -0500 (EST)
User-agent: Alpine 2.21 (LRH 202 2017-01-01)


Thanks for putting all this effort into a new MET!

I don't know too much about the innards of GNU Backgammon, but I do know something about math and statistics.

How many matches you would have to play between GNU-old-MET and GNU-new-MET depends on how much stronger GNU-new-MET is. Suppose that GNU-new-MET has a 51%/49% edge over GNU-old-MET. Then if you played 1000 matches, you would expect a score of 510 to 490. The problem is that if GNU-old-MET were playing against itself, the standard deviation of either side's win total would be about 15.8. So a 510 to 490 result is less than one standard deviation away from an even 500 to 500 split, which is far from statistically significant. You'd need about 10000 matches to barely reach statistical significance: the expected score would be 5100 to 4900 and the standard deviation would be 50, so 5100 would be two standard deviations above the even score of 5000. In general, the standard deviation of the win total between evenly matched players is sqrt(n)/2, where n is the number of matches.
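
To make the arithmetic above easy to re-run for other match counts, here is a minimal Python sketch. It assumes the matches are independent and that each one is a fair coin flip under the "no difference" hypothesis. The function names wins_sd and z_score are my own, not anything from GNU Backgammon.

```python
import math

def wins_sd(n, p=0.5):
    """Standard deviation of one side's win total over n independent matches.

    With p = 0.5 this reduces to sqrt(n) / 2.
    """
    return math.sqrt(n * p * (1 - p))

def z_score(wins, n):
    """Number of standard deviations by which `wins` exceeds an even n/2 split."""
    return (wins - n / 2) / wins_sd(n)

# The two scenarios discussed above: 510 out of 1000 and 5100 out of 10000.
for n, wins in [(1000, 510), (10000, 5100)]:
    print(n, round(wins_sd(n), 1), round(z_score(wins, n), 2))
```

For 1000 matches this gives a standard deviation of about 15.8 and a z-score of about 0.63. For 10000 matches it gives 50 and 2.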

There's another point to be aware of: there is a distinction between statistically significant evidence for the bare-bones claim that "the new MET is better," and a good estimate of *how* much stronger GNU-new-MET is than GNU-old-MET. Let's say you played 10000 matches and the score was 5100 to 4900. You could then claim that the new MET is better, and say that this claim is significant at the two standard deviation level. But you *couldn't* claim that you are 95% confident that the new MET gives you a 51%/49% edge over the old MET, because a two standard deviation range around the observed 51% runs from roughly 50% to 52%. To get a good estimate of the edge requires more trials. How many trials you need depends on how sharp an estimate you want.
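
The difference between "significant" and "well estimated" can be sketched the same way. The snippet below uses the usual normal approximation to the binomial, with two standard deviations standing in for roughly 95% confidence. The function names win_rate_interval and matches_needed are hypothetical helpers for illustration, and the worst-case assumption p = 0.5 in matches_needed is mine.

```python
import math

def win_rate_interval(wins, n, z=2.0):
    """Approximate z-standard-deviation interval for the true win rate."""
    p_hat = wins / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def matches_needed(margin, z=2.0):
    """Matches needed so the interval half-width is at most `margin`.

    Assumes the true win rate is close to 0.5, where the variance is largest.
    """
    return math.ceil((z / (2 * margin)) ** 2)

low, high = win_rate_interval(5100, 10000)
print(f"win rate between {low:.3f} and {high:.3f}")

# Pinning the win rate down to within a quarter of a percentage point:
print(matches_needed(0.0025))
```

With 5100 wins in 10000 matches, the interval only just excludes 50%, so the data are consistent with an edge anywhere from almost nothing to almost 52%/48%. Narrowing that to plus or minus 0.25 percentage points takes roughly 160,000 matches under these assumptions.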

I don't have as much insight into what might be going wrong with the cubeful calculations. It does sound to me as though there might be a problem with floating-point precision, but someone with knowledge of the code will have to comment on that.

