
From:  Timothy Y. Chow 
Subject:  Re: [gnubg] Help with a new MET 
Date:  Mon, 11 Nov 2019 21:17:33 -0500 (EST) 
User-agent:  Alpine 2.21 (LRH 202 2017-01-01) 
Ian,

Thanks for putting all this effort into a new MET! I don't know too much about the innards of GNU Backgammon, but I do know something about math and statistics.
In terms of how many matches you would have to play between GNUoldMET and GNUnewMET, that depends on how much stronger GNUnewMET is. Suppose that GNUnewMET has a 51%/49% edge over GNUoldMET. That means that if you played 1000 matches, you would expect a score of 510 to 490. The problem is that if GNUoldMET were playing against itself, the standard deviation of the number of wins would be about 15.8. So a 510 to 490 result would be far from statistically significant. You'd need about 10000 trials to barely reach statistical significance: the expected score would be 5100 to 4900, and the standard deviation would be 50, so 5100 would be two standard deviations away from the 5000 you'd expect if neither MET were better. In general, the formula for the standard deviation is sqrt(n)/2, where n is the number of matches.
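To make the arithmetic above concrete, here is a minimal sketch. It assumes each match is an independent win/loss trial (no draws), so one side's win count is binomial with standard deviation sqrt(n)/2 when the players are evenly matched. The function names are hypothetical helpers for illustration, not part of GNU Backgammon.

```python
import math

def score_sd(n: int) -> float:
    """Standard deviation of one side's win count over n matches
    when both sides are evenly matched (binomial, p = 0.5): sqrt(n)/2."""
    return math.sqrt(n) / 2

def z_score(wins: int, n: int) -> float:
    """Number of standard deviations that `wins` sits above the n/2
    expected if neither MET is actually better."""
    return (wins - n / 2) / score_sd(n)

def matches_for_z(edge: float, z: float = 2.0) -> float:
    """Matches needed before an edge of `edge` above 50% (0.01 for 51%/49%)
    is expected to sit z standard deviations above n/2.
    Solves edge * n = z * sqrt(n) / 2 for n."""
    return (z / (2 * edge)) ** 2

for n in (1000, 10000):
    wins = round(0.51 * n)  # expected wins with a 51%/49% edge
    print(n, wins, round(score_sd(n), 1), round(z_score(wins, n), 2))

print(round(matches_for_z(0.01)))
```

For 1000 matches the expected 510 wins sit only about 0.63 standard deviations above 500; for 10000 matches the expected 5100 sit exactly 2 above 5000, and `matches_for_z(0.01)` recovers the figure of 10000 directly.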
There's another point to be cognizant of, which is that there is a distinction between statistically significant evidence for the bare-bones claim that "the new MET is better," and a good estimate of *how* much stronger GNUnewMET is than GNUoldMET. Let's say you played 10000 matches and the score was 5100 to 4900. You could then claim that the new MET is better, and say that this claim is significant at the two-standard-deviation level. But you *couldn't* claim that you are 95% confident that the new MET gives you a 51%/49% edge over the old MET. Getting a good estimate of the edge requires more trials, and how many you need depends on how sharp an estimate you want.
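One way to see the difference between "significant" and "well estimated" is to put a confidence interval around the observed win rate. The sketch below uses the normal (Wald) approximation to the binomial, which is an assumption on my part (a Wilson interval would be a reasonable alternative); the function names are hypothetical.

```python
import math

def win_rate_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for the true win rate,
    using the normal (Wald) approximation to the binomial."""
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def matches_for_margin(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Matches needed so the interval's half-width is at most `margin`,
    assuming a true win rate near p (0.5 is the worst case)."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))

lo, hi = win_rate_ci(5100, 10000)
print(f"5100/10000: win rate between {lo:.4f} and {hi:.4f}")

for margin in (0.01, 0.0025):
    print(margin, matches_for_margin(margin))
```

With 5100 wins out of 10000, the interval runs from about 50.0% to 52.0%. That is consistent with an edge anywhere from almost nothing to about 2%, which is why the result supports "the new MET is better" without pinning the edge down to 51%. Narrowing the win rate to within ±1% takes roughly 9,600 matches; narrowing it to within ±0.25% takes roughly 154,000.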
I don't have as much insight into what might be going wrong with the cubeful calculations. It does sound to me as though there might be a problem with floating-point precision, but someone with knowledge of the code will have to comment on that.
Tim