Re: [Bug-gnubg] The race training and benchmark datasets

From: Øystein Schønning-Johansen
Subject: Re: [Bug-gnubg] The race training and benchmark datasets
Date: Mon, 10 Jun 2019 14:49:53 +0200


I will try re-rolling out these positions. Do you have any experience with doing good rollouts of race positions, and in particular with rollout settings that work well for them?


On Sun, Jun 9, 2019 at 11:38 PM Philippe Michel <address@hidden> wrote:
On Fri, Jun 07, 2019 at 08:30:10PM +0200, Øystein Schønning-Johansen wrote:

> (Of course I remove any position duplicated in the two datasets, such that
> the training and validation set are disjoint.)

Is it really important (in general)? I know one shouldn't use the same
dataset, but is some limited random overlap really an issue? I didn't
verify how limited it is in the case of gnubg's databases, though...
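
For reference, removing the overlap amounts to a set lookup on some
position key. A minimal sketch, assuming each record is a hypothetical
(position_key, target) pair; the actual layout of the gnubg dataset files
is not shown in this thread:

```python
def drop_overlap(train, validation):
    """Return the validation records whose position key is absent from train."""
    # Hypothetical record layout: (position_key, target).
    train_keys = {key for key, _ in train}
    return [(key, target) for key, target in validation if key not in train_keys]


train = [("posA", 0.61), ("posB", 0.12)]
validation = [("posB", 0.13), ("posC", 0.88)]
print(drop_overlap(train, validation))  # [('posC', 0.88)]
```

Counting how many records are dropped would also answer how limited the
overlap actually is.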

> I train a neural network. If I validate the training with a 10% fraction of
> the training dataset itself, I get an MSE of about 1.0e-04. But if I
> validate against the dataset generated from train.bm-1.00.bz2 I get an MSE
> of 7e-04. About 7 times higher!
> This makes me believe that the rolled out positions in the race-train-data
> file were rolled out in a different way (different tool, different settings,
> different neural net?) than the positions in train.bm-1.00.bz2.

Different tool and different neural net.

For the benchmark databases, this is recorded as a comment at the beginning
of the file:

s version 1.93 weights 1.00 moves2plyLimit 20 rolloutLimit 5 nRollOutGames 1296 cubeAway 7 include0Ply 1 evalPlies 2 shortCuts 1 osrGames 1296 osrInRoll 1

This is version 1.93 of the sagnubg tool, using the 1.00 weights file
(the current one). I rerolled the benchmark databases with it after the
new weights file was generated.
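
That comment line can be read as a marker followed by alternating names
and values. A minimal sketch of splitting it, assuming the leading "s" is
only a line marker and that every setting is a name/value pair (neither
assumption is checked against the sagnubg source here):

```python
def parse_settings(line):
    """Split a benchmark header comment into a {name: value} dict of strings."""
    tokens = line.split()
    # Assumption: a leading "s" token only marks the line as a settings comment.
    if tokens and tokens[0] == "s":
        tokens = tokens[1:]
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating name/value tokens")
    return dict(zip(tokens[0::2], tokens[1::2]))


header = ("s version 1.93 weights 1.00 moves2plyLimit 20 rolloutLimit 5 "
          "nRollOutGames 1296 cubeAway 7 include0Ply 1 evalPlies 2 "
          "shortCuts 1 osrGames 1296 osrInRoll 1")
settings = parse_settings(header)
print(settings["version"], settings["weights"], settings["nRollOutGames"])
```

Values are kept as strings so that "1.00" is not silently turned into 1.0.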

The training database was rolled out with a slightly modified gnubg
(merely to have gnubg -t print the rollout results in the right format).

This was done with earlier weights. I didn't keep notes but I think I
used one intermediate weights set for the race and possibly more than
one for the crashed net (roll out the training database with the 0.90
net, train a new net, reroll the training database with it, etc...). For
the contact net I'm not sure.

In any case, this was with different weights than the current benchmark.

> Joseph? Philippe? Ian? Others? Do you know how these data were generated?
> Is it maybe worth rolling these positions out again? I do remember that
> Joseph made a separate rollout tool, but I'm not sure what Philippe did.

It is likely the different errors you got have another cause: as far as
I can see, the sagnubg tool used for creating the benchmark databases
doesn't use variance reduction.

That should be enough of a reason to seriously consider rerolling them,
but we would have to implement variance reduction in sagnubg first or
use gnubg with some substantial pre- and post-processing.

> (I also remember that the original benchmark was move based, and it
> calculates the loss based on incorrect moves picked, and that it might not
> be that interesting if the rollout values are a bit wrong....)

I'm afraid they may not be just a bit wrong. It seems the standard
deviation of a 1296-trial rollout without variance reduction is larger
than the vast majority of the "errors" found when running the benchmark.
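
As a rough order-of-magnitude check, assume a cubeless race in which every
game is worth exactly one point (no gammons) and the games are
independent. The per-game result is then +1 or -1, its standard deviation
is 2*sqrt(p*(1-p)) for win probability p, and the standard error of the
mean over n games is that value divided by sqrt(n). With n = 1296,
sqrt(n) = 36. A small sketch under those assumptions:

```python
import math


def rollout_standard_error(p_win, n_games):
    """Standard error of the mean equity over n_games independent +1/-1 games."""
    # Assumption: cubeless, single-point wins only, no variance reduction.
    per_game_sd = 2.0 * math.sqrt(p_win * (1.0 - p_win))
    return per_game_sd / math.sqrt(n_games)


# Worst case is an even position: 1/36, about 0.028 points per game.
print(rollout_standard_error(0.5, 1296))
```

This is only the plain Monte Carlo figure; it says nothing about how much
variance reduction would lower it.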
