Re: [Bug-gnubg] Gnubg pubeval score

From: Øystein Schønning-Johansen
Subject: Re: [Bug-gnubg] Gnubg pubeval score
Date: Sat, 26 Jan 2019 10:42:37 +0100

Thanks for your effort, Philippe. Your numbers look correct.

However, I think it is important to state some more details.

First: Are the games played to completion? Or are they terminated at the race or bearoff stage or ...
Second: Does pubeval evaluate all the position classes? I once made the mistake, in a similar experiment, of letting the pubeval player use a full bearoff lookup table.
And then: I assume these are cubeless money games, not one-point matches.

(Another potential source of bugs is the handling of the opening roll. I guess that it is taken care of.)

Thanks again for your effort, Philippe.

On Fri, Jan 25, 2019 at 9:51 PM Philippe Michel <address@hidden> wrote:
On Tue, Jan 22, 2019 at 08:31:08AM -0800, Robert Edgar wrote:

> Can anyone confirm the score of a recent version of gnubg vs. pubeval? I
> hacked the source and found that gnubg v1.06 averaged +1.1ppg (82% wins)
> over 10k games, but a recent paper (Papahristou & Refanidis, 2017) quotes
> +0.60 ppg which is only marginally better than TD-Gammon (+0.59). My
> number seems high, but +0.6 seems too low considering how much effort
> went into optimizing the gnubg code.

Three 10k-game trials with the current net give (for 0-ply evaluations):
+0.635ppg (71.1% wins)
+0.630ppg (70.9% wins)
+0.645ppg (71.7% wins)

Without counting backgammons, the numbers become 0.612, 0.603 and 0.620.

+1.1ppg and 82% wins is simply impossible. There must be some bug in
your pubeval implementation or usage.
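To put a rough number on why 82% cannot be sampling noise, here is a minimal sketch that treats each game as an independent win/loss trial and computes the binomial standard error of the win rate. The independence assumption and the helper name `win_rate_std_error` are mine, not something taken from gnubg; the win rates and game counts are the ones quoted above.

```python
import math


def win_rate_std_error(p: float, n_games: int) -> float:
    """Binomial standard error of an observed win rate p over n_games.

    Assumes the games are independent trials (an assumption of this sketch).
    """
    return math.sqrt(p * (1.0 - p) / n_games)


# Win rates of the three 10k-game trials at 0 ply, as quoted above.
trial_win_rates = [0.711, 0.709, 0.717]
n_games = 10_000

mean_win_rate = sum(trial_win_rates) / len(trial_win_rates)
std_error = win_rate_std_error(mean_win_rate, n_games)

# How many standard errors the reported 82% sits above the measured rate.
reported = 0.82
z_score = (reported - mean_win_rate) / std_error

print(f"mean win rate      : {mean_win_rate:.3f}")
print(f"std error (10k)    : {std_error:.4f}")
print(f"82% is {z_score:.1f} standard errors away")
```

Under these assumptions the standard error of a single 10k-game trial is a bit under half a percentage point, which is consistent with the spread of the three trials, while 82% lies more than 20 standard errors away.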

Amusingly, the message quoted in Ian Shaw's answer is from a thread
started by someone who got a similarly high number (from his own program
rather than gnubg) and it was due to such a bug :

FWIW, I ran shorter trials at 1 ply and 2 ply.
1000 games @ 1 ply : +0.66ppg
100 games @ 2 ply : +0.70ppg

If someone is interested, I could do these with 10 times more games (it
would take a few hours instead of a few minutes) but there would still
be a lot of uncertainty in the 2 ply result.
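A rough way to see how much uncertainty would remain: if these are cubeless money games, every game scores 1, 2 or 3 points in absolute value, so the per-game variance is at least 1 - mean^2. That gives a lower bound on the standard error of the ppg average without knowing the gammon rates. The sketch below assumes independent cubeless games; the helper name `min_ppg_std_error` is hypothetical.

```python
import math


def min_ppg_std_error(mean_ppg: float, n_games: int) -> float:
    """Lower bound on the standard error of an average ppg.

    Assumes independent cubeless games, where each result X satisfies
    |X| >= 1, hence E[X^2] >= 1 and Var(X) >= 1 - mean^2.
    The true standard error is larger once gammons are accounted for.
    """
    min_variance = max(0.0, 1.0 - mean_ppg ** 2)
    return math.sqrt(min_variance / n_games)


# (label, observed ppg, number of games); the first row of each pair is
# a trial quoted above, the second is the same trial with 10x more games.
cases = [
    ("1 ply, 1000 games", 0.66, 1_000),
    ("1 ply, 10000 games", 0.66, 10_000),
    ("2 ply, 100 games", 0.70, 100),
    ("2 ply, 1000 games", 0.70, 1_000),
]

for label, ppg, n in cases:
    se = min_ppg_std_error(ppg, n)
    print(f"{label:20s}: {ppg:+.2f} ppg, std error >= {se:.3f}")
```

With these bounds, 100 games at 2 ply leaves a standard error of at least 0.07 ppg, and 1000 games still leaves at least 0.02 ppg, so the gap between the 1 ply and 2 ply figures would remain hard to pin down.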
