bug-gnubg

Re: [Bug-gnubg] GNUBG gammon bug


From: Jim Segrave
Subject: Re: [Bug-gnubg] GNUBG gammon bug
Date: Mon, 21 Jul 2003 15:57:33 +0200
User-agent: Mutt/1.2.5.1i

On Mon 21 Jul 2003 (10:51 +0200), Henrik Bukkjaer wrote:
> Actually this position comes out even more tricky when you analyze it
> further in gnubg:
> 
> - gnubg cannot even find the best move on 4-ply in this very simple
> position.

This is not entirely surprising. I think this position (and the likely
following positions) are ones in which gnubg's neural net is probably
weak. If it gets it badly wrong at 0 ply, then I don't think this is
a position where looking ahead is going to clear its vision that
much.

> - std. 0-ply 36 games rollout gives negative MWC for one of the moves (as I
> reported yesterday)

This is from a version of gnubg before any of the recent changes to
the rollout code were made (there's a detailed history of them below).

I don't know of any way to get a negative MWC. Trying a version of
gnubg from 12 July, before all my recent rollout work, doesn't produce
this for me.

> - when rolling out on 0-ply 216 games, gnubg yields some very strange
> results when using variance reduced and cubeful rollout:
>    - the move with the most single game wins (2,6%) is the hopeless 6/3(3)
> 5/2.
>    - the best moves (MWC) all have one thing in common: they undo the
> 6-point!
> - using a 0-ply 216 games rollout without extras, gnubg's results seem more
> "correct"

When I rolled this out, at 216 games I got:

Rolled 33 (+0.074%):
     1. Rollout          11/8 6/3(3)                  MWC:   3.47%
         1.4%   0.0%   0.0% -  98.6%  88.2%  72.8% CL   3.45% CF   3.47%
      [  1.0%   0.0%   0.0% -   1.0%   2.7%   3.0% CL   0.81% CF   0.82%]
     2. Rollout          14/8 6/3(2)                  MWC:   3.33% ( -0.14%)
         1.6%   0.0%   0.0% -  98.4%  89.0%  76.2% CL   3.32% CF   3.33%
      [  1.0%   0.0%   0.0% -   1.0%   2.1%   2.2% CL   0.65% CF   0.67%]
     3. Rollout          14/8 11/8 6/3                MWC:   3.17% ( -0.30%)
         0.6%   0.0%   0.0% -  99.4%  88.3%  72.8% CL   3.17% CF   3.17%
      [  0.6%   0.0%   0.0% -   0.6%   1.4%   2.0% CL   0.41% CF   0.41%]

The standard deviations for these values are so large as to make them
worthless for comparison. Consider what it's telling you - maybe 1 or
2% of games will be wins for Spock. Which means it saw somewhere
between 2 and maybe 5 wins while rolling out 216 trials. You simply
haven't enough trials to make this figure very reliable. Moving up to
1296 trials is still discouraging - the weird moves clearing the 6
point are now down the list where they belong ... 

*    1. Rollout          14/5 11/8                    MWC:   3.37%
         0.2%   0.0%   0.0% -  99.8%  86.9%  73.4% CL   3.40% CF  3.37%
      [  0.1%   0.0%   0.0% -   0.1%   0.5%   0.8% CL   0.14% CF  0.14%]
        Full cubeful rollout with var.redn.
        1296 games, Mersenne Twister dice gen. with seed 1410125879
        and quasi-random dice
        Play: 0-ply cubeful [expert]
        Cube: 0-ply cubeful [expert]

     2. Rollout          14/8 11/8 6/3                MWC:   3.01% (-0.36%)
         0.5%   0.0%   0.0% -  99.5%  88.7%  73.8% CL   3.02% CF  3.01%
      [  0.2%   0.0%   0.0% -   0.2%   0.6%   0.8% CL   0.17% CF  0.17%]

     3. Rollout          14/2                         MWC:   2.92% ( -0.45%)
         0.0%   0.0%   0.0% - 100.0%  88.5%  74.3% CL   2.94% CF  2.92%
      [  0.1%   0.0%   0.0% -   0.1%   0.4%   0.7% CL   0.12% CF  0.12%]

...
     7. Rollout          14/11 6/3(3)                 MWC:   2.56% ( -0.81%)
         0.1%   0.0%   0.0% -  99.9%  90.1%  74.5% CL   2.56% CF  2.56%
      [  0.4%   0.0%   0.0% -   0.4%   1.0%   1.1% CL   0.31% CF  0.32%]
...
    10. Rollout          11/8 6/3(3)                  MWC:   2.27% ( -1.10%)
         0.2%   0.0%   0.0% -  99.8%  91.4%  76.7% CL   2.26% CF  2.27%
      [  0.4%   0.0%   0.0% -   0.4%   1.0%   1.2% CL   0.31% CF  0.32%]
...
    14. Rollout          6/3(3) 5/2                   MWC:   2.04% ( -1.33%)
         0.9%   0.0%   0.0% -  99.1%  93.2%  78.2% CL   2.03% CF  2.04%
      [  0.4%   0.0%   0.0% -   0.4%   0.8%   1.0% CL   0.27% CF  0.28%]


But look at the win column figures - they are all under 1% and they still
have relatively large standard deviations. Move 14 appears to have the
highest win percentage of the 20 moves rolled out, but we can only be
confident that it's somewhere above 0.1% and unlikely to be over
1.6%. If you roll out with different seeds, you get, not surprisingly,
very different small numbers.
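
As a rough illustration of why win figures this small are so noisy,
here is a minimal sketch. It assumes each rollout game is an
independent trial with a fixed win probability (a plain binomial
model, which ignores variance reduction and quasi-random dice); the
function name is made up for the example.

```python
import math

def win_estimate_noise(p_win, n_games):
    """Expected win count and binomial standard error of the win rate."""
    expected_wins = p_win * n_games
    std_err = math.sqrt(p_win * (1.0 - p_win) / n_games)
    return expected_wins, std_err

# Trial counts and win rates of the size discussed above.
for n_games in (216, 1296):
    for p_win in (0.01, 0.02):
        wins, se = win_estimate_noise(p_win, n_games)
        print(f"{n_games:5d} games, true win rate {p_win:.1%}: "
              f"about {wins:.1f} wins, standard error {se:.2%}")
```

Under this model the standard error at 216 games is of the same order
as the win rate itself, and going to 1296 games only shrinks it by a
factor of about 2.4 (the square root of 6).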

The first 2 above differ by .36% MWC, but the J.S.D. is .22, from
which you can conclude that move 1 is probably better (confidence -
maybe 80%), but there's still a significant chance that in fact move 2
is better than move 1.
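
The J.S.D. figure can be reproduced from the two cubeful standard
deviations in the 1296-game output. This is only a sketch: it assumes
the two rollout estimates are independent, so their variances simply
add, and the helper name is invented for the example.

```python
import math

def joint_std_dev(sd_a, sd_b):
    """Standard deviation of the difference of two independent estimates."""
    return math.sqrt(sd_a ** 2 + sd_b ** 2)

# Cubeful MWC and standard deviation from the 1296-game rollout, in percent.
mwc_1, sd_1 = 3.37, 0.14   # 14/5 11/8
mwc_2, sd_2 = 3.01, 0.17   # 14/8 11/8 6/3

jsd = joint_std_dev(sd_1, sd_2)
ratio = (mwc_1 - mwc_2) / jsd
print(f"J.S.D. = {jsd:.2f}% MWC, difference = {ratio:.2f} J.S.D.")
```

How much confidence a difference of that many J.S.D. deserves depends
on how well the independence assumption, and any assumption about the
shape of the distribution, holds for results built on so few wins.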

> I suspect that it is the variance reduction functionality in a position
> that the neural net cannot figure out, which is causing these
> problems.

Since it is clear from the evaluations that gnubg isn't good at
evaluating this position, its luck estimates may well be somewhat
off. This would not mean that you get incorrect results, but it does
mean it takes longer to get the variance down to reasonable levels. I
doubt that the estimates are so bad that you are worse off with
variance reduction, but it may not speed things up very much. Figures
like the ones above are simply not very useful - they have far too
wide a variance to give any confidence in one over the other or even
in the estimated wins/losses/gammons, etc. When you are concerned
about a relatively rare event (like managing a win in this case), you
need a *lot* of samples before you have a meaningful figure. I suspect
that to get a good estimate of the winning chances for Spock in the
above position you may well need 


> Jim: could you be more specific as to which problems you found and
> corrected in the rollout code?

Here's a list, but as you can see, they are all later than the build
of gnubg you are using. Also note that none of the work described
below was intended to change the output of rollouts in any way, so if
all the bugs have now been cleared, all this would mean is that the
new code is more convenient to use but gives *exactly* the same
results as the version you have.

Code changed on 
Sun 13 July 02:00 GMT - bugs introduced (details below)
                          1) incorrect cube decisions in rollouts
                          2) when rolling out moves, if one side had
                             chequers borne off, the opponent was
                             never seen to lose a gammon or backgammon
                          3) rollout of initial position no longer got
                             a fully balanced series of dice rolls (a
                             patch was lost while merging in the
                             changes in this update)

Sun 13 July 18:17 GMT - resuming rollouts start faster (no effect on
                          rollout results)
Mon 14 July 22:11 GMT - fix bug 1) from 13 July
Wed 16 July 10:27 GMT - changes to the rollout display window and some
                          other progress windows (no effect on rollout
                          results)
Thu 17 July 10:31 GMT - remove an unused variable to suppress a
                          compiler warning (no effect on rollout results)
Fri 18 July 14:05 GMT - new code to allow stopping on joint standard
                          deviations - this could cause the record of
                          the number of games rolled out to be
                          incorrect if that move were stopped but
                          other moves continued to be rolled out
Sun 20 July 10:49 GMT - fix for incorrect number of games bug from Fri
                          18 July
Sun 20 July 22:02 GMT - fix bug 2) from 13 July
Mon 21 July 11:50 GMT - fix bug 3) from 13 July

> And where can I read more about how variance reduction works (I do not have
> the time to read it out of the source code, if some more readable texts is
> available I would appreciate a link).

The most readable description I have seen of how VR is done in
backgammon rollouts is an article by David Montgomery in the
Gammonline Feb 2000 issue. Unfortunately, this is a subscription
service (and very much worth the price in my opinion), so if you
aren't a subscriber you won't be able to see it. However, I assume
the David Montgomery who often posts to the bug-gnubg mailing list is
the same person, so perhaps he might be able to help you get a copy,
assuming he isn't contractually prevented from doing so.

Otherwise there's a less easily readable thread of 6 postings (of
which 2 are really useful). In Google's advanced group search, look for:

Subject: Proposed Algorithm for Roll Outs 
Author:  Jim Williams 
Group:   rec.games.backgammon

Jim Williams' initial posting and Brian Sheppard's follow-up are a
reasonably good introduction to the concepts. But David Montgomery's
article, with its work-through of a simple example, was a much better
introduction in my view.

I assume that, since it's a widely used technique for statistical
analysis, there is much coverage, probably with deep mathematical
background, in university-level statistics books, but I haven't
looked.
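
For a feel of the basic idea before chasing those references, here is
a toy sketch of the earlier point that poor luck estimates slow things
down without making the result wrong. It is not gnubg's code: the
one-roll game, the equities in TRUE_AFTER and the rollout() helper are
all invented for the illustration. Each trial's result has the
estimated luck of the roll subtracted from it, where luck is the
evaluator's equity after the roll minus its average over all rolls.

```python
import random
import statistics

# Hypothetical true equities after each of six equally likely rolls (mean 0.0).
TRUE_AFTER = [-0.5, -0.3, -0.1, 0.1, 0.3, 0.5]

def rollout(n_trials, evaluator_error, seed):
    """Return (plain, reduced) per-trial results for a one-roll toy game."""
    rng = random.Random(seed)
    # The evaluator sees each true equity plus a fixed error of its own.
    est_after = [v + rng.gauss(0.0, evaluator_error) for v in TRUE_AFTER]
    est_mean = sum(est_after) / len(est_after)
    plain, reduced = [], []
    for _ in range(n_trials):
        roll = rng.randrange(6)
        # Game ends in a win (+1) or loss (-1); equity e means P(win) = (1 + e) / 2.
        p_win = (1.0 + TRUE_AFTER[roll]) / 2.0
        result = 1.0 if rng.random() < p_win else -1.0
        # Luck of the roll: estimated equity after it minus the average estimate.
        luck = est_after[roll] - est_mean
        plain.append(result)
        reduced.append(result - luck)
    return plain, reduced

for err in (0.0, 0.3):
    plain, reduced = rollout(20000, err, seed=1)
    print(f"evaluator error {err}: "
          f"plain mean {statistics.mean(plain):+.3f} "
          f"var {statistics.variance(plain):.3f}, "
          f"reduced mean {statistics.mean(reduced):+.3f} "
          f"var {statistics.variance(reduced):.3f}")
```

In this toy the luck term averages to zero over the six rolls by
construction, so the reduced estimate stays centred on the true value
even with a bad evaluator; only the size of the variance reduction
depends on how good the estimates are.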

-- 
Jim Segrave           address@hidden




