
From:  Joseph Heled 
Subject:  Re: [gnubg] Help with a new MET 
Date:  Tue, 12 Nov 2019 16:04:34 +1300 
Ian,
Thanks for putting all this effort into a new MET!
I don't know too much about the innards of GNU Backgammon, but I do know
something about math and statistics.
In terms of how many matches you would have to play between GNUoldMET
and GNUnewMET, that depends on how much stronger GNUnewMET is.
Suppose that GNUnewMET has a 51%/49% edge over GNUoldMET. That means
that if you played 1000 matches, you would expect a score of 510 to
490. The problem is that if GNUoldMET were playing against itself, the
standard deviation of its win count would be about 15.8. So a 510 to 490
result is less than one standard deviation from an even score, which is
far from statistically significant. You'd need about 10000 matches to
barely reach statistical significance: the expected score would be 5100 to
4900 and the standard deviation would be 50, so 5100 would be two standard
deviations above the even score of 5000. In general, the standard
deviation of the win count between evenly matched players is sqrt(n)/2,
where n is the number of matches.
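To make the arithmetic above easy to rerun with other numbers, here is a rough Python sketch. It assumes each match is an independent win/loss trial, and the function names are my own invention, not anything from GNU Backgammon.

```python
import math

def win_count_sd(n, p=0.5):
    """Standard deviation of the number of wins in n independent matches,
    each won with probability p (p = 0.5 gives sqrt(n)/2)."""
    return math.sqrt(n * p * (1 - p))

def expected_z(n, p_true):
    """Number of even-match standard deviations by which the expected
    win count (n * p_true) exceeds an even score (n / 2)."""
    return (n * p_true - n / 2) / win_count_sd(n)

def matches_needed(p_true, z=2.0):
    """Smallest n at which the expected score sits z standard deviations
    above even. From n*(p - 0.5) >= z*sqrt(n)/2 we get
    n >= (z / (2*(p - 0.5)))**2."""
    return math.ceil((z / (2 * (p_true - 0.5))) ** 2)

if __name__ == "__main__":
    print(win_count_sd(1000))        # about 15.8
    print(expected_z(1000, 0.51))    # well under 1
    print(expected_z(10000, 0.51))   # about 2
    print(matches_needed(0.51))      # about 10000
```

Note that the required number of matches grows with the inverse square of the edge, so halving the edge to 50.5% would roughly quadruple the matches needed.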
There's another point to be aware of: there is a distinction between
statistically significant evidence of the bare-bones claim that "the new
MET is better," and a good estimate of *how* much
stronger GNUnewMET is than GNUoldMET. Let's say you played 10000
matches and the score was 5100 to 4900. You could then claim that the new
MET is better, and say that this claim is significant at the two standard
deviation level. But you *couldn't* claim that you are 95% confident that
the new MET gives you a 51%/49% edge over the old MET: with a standard
deviation of 50 matches, the true edge could plausibly lie anywhere from
about 50% to about 52%. To get a good estimate of the edge requires more
trials. How many trials you need would depend on how sharp an estimate you
want.
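As an illustration of the difference between detecting an edge and estimating it, here is a small Python sketch. It uses the usual normal approximation to the binomial (a simplification I am assuming, with z = 1.96 for roughly 95% confidence), and the helper names are hypothetical.

```python
import math

def win_rate_interval(wins, n, z=1.96):
    """Approximate confidence interval for the true win rate, using the
    normal approximation to the binomial distribution."""
    p_hat = wins / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def matches_for_margin(margin, z=1.96):
    """Matches needed so the interval's half-width is at most `margin`,
    using the worst case p = 0.5: n >= (z / (2*margin))**2."""
    return math.ceil((z / (2 * margin)) ** 2)

if __name__ == "__main__":
    low, high = win_rate_interval(5100, 10000)
    print(f"win rate between {low:.4f} and {high:.4f}")
    # Matches needed to pin the win rate down to within a quarter of a
    # percentage point either way:
    print(matches_for_margin(0.0025))
```

For a 5100 to 4900 result the interval only just excludes 50% and stretches to nearly 52%, which is why a sharper estimate of the edge needs many more matches than a bare "better or not" verdict.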
I don't have as much insight into what might be going wrong with the
cubeful calculations. It does sound to me as though there might be a
problem with floating-point precision, but someone with knowledge of the
code will have to comment on that.
Tim