I am just finishing a project that has taken me many months (years): the creation of a new backgammon MET (match equity table). My nearly finished MET is called 'PR2', and it is a combination of rollouts and a theoretical MET. It uses rollout trials (500k+) for all match scores up to 9a9a, and then a specially developed theoretical MET I call the 'Variable MET' to extend the rollout results to 31a31a. The Variable MET could generate all of the 9a9a MET probabilities in its own right; however, in my MET I use it as an extrapolation tool. A lot of time and effort has gone into the accuracy of both the 9a9a rollouts and the development of the Variable MET, more so the latter.
I could go into a lot more detail if you wish; however, what I would like to do now is test my new PR2 MET, and your help with 1) below is what I care about most. Tony Lezard (of Dueller renown) suggested I contact the Gnubg team after I asked him for help with testing.
1) What I would like to do is test my PR2 MET by playing a series of 5pt matches in which one Gnubg player uses the PR2 MET and the other uses the now-standard Kazaross-XG2 MET (in particular). My 'PR2' player faces its Kazaross-XG2 counterpart in sets of 500 5pt matches at a time, and the results are recorded. I don't care about the individual moves; who won each 500-match set is all that matters. I know that Gnubg can play itself now, but not with different METs loaded, and not without a lot of human input (every game end requires a manual prompt from the user before the next game begins). That way of doing things is unworkable for me. What I need is a set-and-forget solution: something I can start overnight so that in the morning the match wins are reported as something like (say) 257-243.
I can only guess how long 500 5pt matches would take, even fully automated. Additionally, I do not know how many sets of 500 5pt matches I would need to see a significant difference between METs. Maybe 5,000, 50,000 or 500,000. After seeing the difference in equity the PR2 MET can sometimes produce, I am hoping for the former.
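For what it is worth, there is a rough back-of-envelope answer to the sample-size question: if the PR2 player's true match-win probability is 0.5 + delta, and each match is treated as an independent coin flip, then roughly z^2 * p(1-p) / delta^2 matches are needed to detect the edge at a given z-level. A small sketch (my own illustration, nothing gnubg-specific; the function names are made up):

```python
import math

def matches_needed(delta, z=1.96):
    """Approximate number of matches needed to detect a true match-win
    probability of 0.5 + delta at two-sided z-level z (1.96 ~ 95%),
    treating each match as a Bernoulli trial."""
    # The variance of a single match result is p*(1-p) <= 0.25.
    return math.ceil(z * z * 0.25 / (delta * delta))

def detectable_edge(n, z=1.96):
    """Smallest edge delta detectable with n matches at z-level z."""
    return z * 0.5 / math.sqrt(n)

print(matches_needed(0.005))           # ~0.5% edge -> 38416 matches
print(matches_needed(0.02))            # ~2% edge   -> 2401 matches
print(round(detectable_edge(500), 3))  # a single 500-match set resolves only a ~4.4% edge
```

On these numbers, one 500-match set is far too small on its own; in line with the guesses above, an edge of around half a percent needs tens of thousands of matches.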
I have a friend who is a lot more computer-savvy than I am, and he has started experimenting with different sockets/ports and instances of Gnubg. He tells me: "You actually need 3 instances of gnubg running - I run all three without the graphical interface, only pure terminal versions". However, before he goes to too much trouble, I thought it best to contact the Gnubg team and see if you can help.
Maybe you only have to change "a couple of lines of programming" as someone on my forum suggested (lol). It won't be that easy, I know!
2) Jim Segrave thought this issue might be of interest to the team.
There should not be any difference between the cubeless and cubeful results; however, there is. I think the cubeless results are right, and that the cubeful discrepancy is due to some drift in the cubeful calculation. This particular rollout shows the discrepancy near the 5th decimal place; in other rollouts I did, I believe the discrepancy crept into the 4th decimal place.
My PR2 MET aims for accuracy to the second decimal place (in %) in all of the 9a9a entries I rolled out. E.g. I have 1a2aC (Crawford) as 68.36% after compiling over a million trials, and that should be accurate to 2dp(%).
Here is a further example. When I first rolled out 8a1a over 1 million times in a single rollout, I got a final cubeful result of ~0.10705. However, I happened to be at my computer at ~93% completion and watched the equity climb steadily from 0.10688, over more than an hour, to reach 0.10705. So what, you may ask? Well, I have watched enough rollouts to expect the 3rd decimal place, if not the 4th, to be set in stone by nearly a million trials. Additionally, variance normally has the equity jumping up and down a bit; this rollout was not doing that - the equity just went up and up.
I was very suspicious, so I then checked my 8a1a result by choosing 5 new seeds and doing 5 x 12,960-trial rollouts with the same Gnubg settings. I got:
The mean of these five means is ~0.10672.
As MET entries that would be 10.67% vs 10.71% for the million+ rollout. 5 x 12,960 = 64,800 trials is not really a lot; however, I have done enough rollouts to know something is probably wrong here. I repeated this exercise with another million+ trial rollout vs 5 x 12,960 trials. In this second case, the 5 x 12,960 results were all close to their mean of 89.70%, while the million+ rollout gave 89.45%. Again very different, and in my opinion it is the million+ trial result that is inaccurate.
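To put a number on "something is probably wrong here", one can compare the big rollout against the spread of the small ones: the standard error of the five batch means says how far the million-trial figure sits from them. A sketch with stand-in batch values (the five figures below are hypothetical, chosen only to have the ~0.10672 mean quoted above; they are not the actual rollout results):

```python
import math
import statistics

def outlier_z(batch_means, big_result):
    """How many standard errors the big rollout's result sits
    from the mean of the small-batch rollouts."""
    m = statistics.fmean(batch_means)
    sem = statistics.stdev(batch_means) / math.sqrt(len(batch_means))
    return (big_result - m) / sem

# Hypothetical 5 x 12,960-trial results (placeholders, not real data):
batches = [0.10665, 0.10678, 0.10670, 0.10669, 0.10678]
z = outlier_z(batches, 0.10705)
print(round(z, 1))  # anything beyond ~3 standard errors is hard to blame on variance
```

If the real batch means are as tightly clustered as described, the million-trial result lands many standard errors away, which supports the suspicion that it, and not the small batches, is the outlier.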
I am guessing that there is some problem in the cubeful algorithm that first creeps in at the 7th significant figure (sf), then migrates to the 6th, 5th, 4th sf, etc., all governed by the number of trials. An average user will never see a problem at 5,184 trials, or even 51,840 trials. However, I saw a problem at 518,400 trials and above. On first seeing this issue, I abandoned the 25 x 1M+ trial rollouts I had done for my MET project and started again. My way around the problem was to do sets of 46,656 trials and tabulate them carefully.
An esoteric problem for sure, and one that might be nearly irrelevant to everyone except me. However, there might be an easy remedy that has to do with increasing the number of significant figures (i.e. the floating-point precision) used in Gnubg's cubeful algorithm(s).
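I do not know Gnubg's internals, but single-precision accumulation is one classic way this kind of one-directional drift appears: once a running sum grows large, each added trial result is rounded by an amount that no longer averages out. A self-contained sketch (pure illustration, not Gnubg code), simulating a float32 running sum of a constant per-trial equity of 0.107 over a million trials:

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

TRIALS = 1_000_000
v = f32(0.107)          # per-trial equity, stored in single precision

s32 = 0.0               # simulated float32 accumulator
s64 = 0.0               # double-precision accumulator
for _ in range(TRIALS):
    s32 = f32(s32 + v)  # each addition rounded back to single precision
    s64 += v

mean32 = s32 / TRIALS
mean64 = s64 / TRIALS
print(mean64)  # stays at ~0.107
print(mean32)  # drifts steadily away, visible by about the 4th decimal place
```

The double-precision mean stays at 0.107 to many places, while the single-precision one climbs monotonically as the sum grows, much like the behaviour described above: invisible at a few thousand trials, and growing with the trial count. Whether anything like this actually happens inside Gnubg's cubeful rollout bookkeeping is for the team to check.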
3) Lastly, this is a small display problem to consider.
In building a 31a31a MET I would check its extremities quite regularly to see whether I had the right PR2 MET version loaded, and I noticed a problem: at 23a31a the equity for 25a31a is displayed instead. Incidentally, the 31a23a equity is correct in the Gnubg table. You will not see this problem in the display of most of the METs you have loaded (probably all the default ones), since an internal calculation extrapolates results from ~15a15a (mec.c, perhaps). My PR2 MET is different: the extrapolation calculations Gnubg does for other METs do not start until after 31a31a. I think you have a small addressing problem to fix.
(Australian Backgammon Federation Director)