bug-gnubg

Re: [Bug-gnubg] Extendable rollouts


From: Joseph Heled
Subject: Re: [Bug-gnubg] Extendable rollouts
Date: Sun, 13 Jul 2003 15:07:13 +1200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Nice work!

Most of this (all?) should be in the manual/HowTo or whatever. Just posting means you have to start searching emails/groups every time there is a question ...

-Joseph


Jim Segrave wrote:
I've just committed a mega-patch to include this feature. Here are some
notes to go with it. I look forward to people testing it.

This code does the following:
For any rollout, the saved results include a rollout context which is
   extended with two additional fields - the number of trials actually
   rolled out and the value of 'nSkip' which affects the quasi-random
   dice generation.
When rolling out a move/cube decision, gnubg now checks whether the
   last analysis was a rollout. If so, it initiates the rollout using the
   saved context, except that:
   The number of trials is taken from your current rollout settings.
   The settings for options to stop rollouts early (currently only
      "stop when STD is small enough") are taken from your current
      rollout settings, not from the saved rollout context.
.sgf files also save the complete rollout context.
The values of win/win gammon/.../cubeless equity/cubeful equity, their
   standard deviations, and the values of rScore and rScore2 are now
   saved using %.10g format, which was a guess on my part as to what
   would be sufficient accuracy to resume rollouts.
When a rollout is running and you press stop, the number of completed
   trials is recorded.
So, to extend a rollout, you don't need to remember what the previous
   settings were; you simply select the number of trials you want the
   rollout to reach, and it just works.
When rolling out multiple alternative moves, all the moves are done in
   step rather than the old way of completing the rollout of the first
   alternative before starting the second. (It's a bit more complicated
   when extending one rollout and starting or extending a different
   move's rollout, but the effect is the same: moves/decisions with a
   lower number of completed trials are done until they catch up with
   those which are being extended, then the process proceeds in
   parallel.)
.sgf files from earlier versions will be readable by this code, but
   their rollouts won't be extendable. .sgf files which contain rollouts
   written by this code will not be readable by (and in some cases may
   cause a crash in) older versions. .sgf files written by this code which
   don't contain rollouts will be fully interchangeable with older
   versions of gnubg.

Caveats:

   Testing rollouts is very slow, so I can't claim this code is
   bug-free. I have done a large number (in the hundreds) of rollouts,
   primarily of a small number of positions, and compared the results
   to those of the version of gnubg just before these changes. This
   does not mean that there can't be bugs: RolloutGeneral has been
   heavily modified, as have CommandRollout, GeneralCubeDecisionR and
   a few others. Anyone finding bugs, please report them.
Accuracy:

   Resuming a rollout costs a small amount of accuracy. Since only the
   value and standard deviation are saved, some of the internal variables
   used in the rollout loop (in particular, the accumulated variance and
   the accumulated sum) are reconstructed from them, and slight
   differences will appear. My experiments, necessarily limited in
   length and number, suggest that the effects are visible at most in
   the 4th decimal place.
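The reconstruction can be written out explicitly. This assumes (the notes do not say so) that the saved standard deviation σ_n is the population standard deviation of the n per-trial results x_i, i.e. σ_n² = (1/n) Σ (x_i - x̄_n)²; if gnubg stores a different quantity, such as the standard error of the mean, the factors of n change accordingly. From the saved mean and standard deviation, the accumulated sum S_n and sum of squares Q_n are rebuilt, and the next trial is folded in as usual:

```latex
S_n = \sum_{i=1}^{n} x_i = n\,\bar{x}_n, \qquad
Q_n = \sum_{i=1}^{n} x_i^2 = n\left(\sigma_n^2 + \bar{x}_n^2\right)

\bar{x}_{n+1} = \frac{S_n + x_{n+1}}{n+1}, \qquad
\sigma_{n+1}^2 = \frac{Q_n + x_{n+1}^2}{n+1} - \bar{x}_{n+1}^2
```

Any rounding in the saved x̄_n and σ_n is carried into S_n and Q_n, so the resumed figures can drift slightly from those of an uninterrupted run.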
.sgf files containing rollouts will be slightly larger, because they
   contain a complete rollout context and the values are stored with
   greater precision.
Repeatability:

   Every extended rollout will use the same seed as the initial part of
   that rollout. If you are using quasi-random dice, then the games
   rolled out (as long as there are fewer than 128 rolls in any one game)
   will get exactly the same sequence of dice as they would have got had
   the rollout not been interrupted. This does assume you are using a
   repeatable RNG; if not, the results are unpredictable, but it's hard
   to argue that they are intrinsically different from what you would have
   got if you hadn't interrupted the rollout.
Misc things:

If you have a match where different moves have been rolled out with
   different rollout settings (whether it's truncation, move filters, RNG
   seed, or whatever), then each one will be extended using the
   corresponding rollout settings.
If you have two different moves which have been rolled out to
   different numbers of trials (say one move was stopped at 360 trials,
   another at 720) and you set your rollout for 1296 trials, the
   rollout will begin by extending the 360-trial move until it reaches
   720; then both moves will progress together.
Output of rollout results will now show the number of trials
   actually done, not the number originally requested.
The only way to roll a move out with different settings is to get rid
   of the previous rollout results. The simplest way is to press the
   0-ply button, then select the move again and press rollout.
The code in RolloutGeneral() can now take an arbitrary list of
   positions, cubeinfos and rollout contexts and roll them out in one
   pass. While there's currently no way in gnubg to select positions from
   more than one point in a game or match, if there were a way, then all
   the selected positions could be rolled out with a single call to
   RolloutGeneral.

Limitations:

   Cube rollouts from the Annotation Window, results from Command
   Rollout, and the Rollout commands in the Analysis dropdown are not
   put into the move list and hence are lost. It would be good if this
   data were put into the moverecords.

Implementation notes:
The calls to RolloutGeneral (and in turn the calls to its callers)
   now allow passing in multiple boards, eval setups, cubeinfos, result
   arrays, etc. There is no longer an assumption that these multiple
   items actually form a single array, so all the callers now pass
   arrays of pointers to the various items. This makes call setup more
   tedious, but gives the flexibility to pass pointers to the relevant
   pieces of a large number of separate moves to a single call. See the
   lengthy comments just before RolloutGeneral in rollout.c.
In a couple of places, I avoided the use of gnubg's dynamic arrays in
   favour of calls to alloca(). This is simply because debugging with
   dynamic arrays is sometimes very difficult - the debugger symbol table
   address for an item is not necessarily the location where the item
   actually resides, whereas this problem doesn't occur with alloca.
Whoever did the quasi-random dice code deserves special honours - it
   fits in so well with this system and already optimises some of the
   most common things - like checking if the rng seed used to set it up
   is the same. This is really nice code.
Because we may now be using the rollout context saved in .sgf
   files, I have done two things. First, instead of the old indicator
   'R ' to mark a rollout, the newer ones use 'X ' for eXtendable
   rollouts. Second, I have added an integer version number (set to 1),
   defined in eval.h; sgf.c ensures that if the version numbers don't
   match, the rollout context will be marked as not extendable.

Future plans:
I'd like to discard the current 'stop when std is less than...' and
   replace it with a condition to be used only when rolling out more
   than one move or a cube decision. The idea would be to set a
   minimum number of trials, then, once those are completed, at the
   end of each iteration, calculate the joint standard deviation of
   the cubeful or cubeless equity for the best move paired with each
   other one being rolled out. When the equity difference exceeds some
   user-chosen multiple of the joint standard deviation, stop rolling
   out the inferior move. Continue until only one move is left. The
   idea would be to automate rolling out close decisions until either
   your patience is exhausted and the selected number of trials has
   been reached, or there is reasonable confidence that one move is
   actually better than any of the others being considered.
It would be nice to fix CommandRollout and the cube decision code
   to save their results.












