
Re: [Bug-gnubg] Gammon output setup

From: Mark Higgins
Subject: Re: [Bug-gnubg] Gammon output setup
Date: Sun, 11 Dec 2011 10:47:51 -0500

Softmax activation looks pretty interesting! I guess in that case you'd need to change the meaning of the outputs to ( prob of single win, prob of single loss, prob of gammon win, prob of gammon loss, prob of bg win, prob of bg loss ), i.e. six mutually exclusive outcomes rather than having a "prob of any win" output. Then they all have to sum to 1, but there's no restriction that one be larger or smaller than another.
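To make the six-output idea concrete, here is a minimal sketch in Python. The outcome ordering, the names `OUTCOMES`, `softmax` and `to_cumulative`, and the example logits are all made up for illustration; none of this is gnubg code. It shows that a softmax over exclusive outcomes sums to 1, and that the usual "any win" and "gammon or better" numbers can be recovered by summing, which keeps them ordered automatically.

```python
import math

# Hypothetical ordering of the six mutually exclusive outcomes.
OUTCOMES = ("single_win", "gammon_win", "bg_win",
            "single_loss", "gammon_loss", "bg_loss")


def softmax(logits):
    """Numerically stable softmax: shift by the max, exponentiate, normalise."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_cumulative(probs):
    """Sum exclusive outcome probabilities into cumulative ones."""
    p = dict(zip(OUTCOMES, probs))
    return {
        "win": p["single_win"] + p["gammon_win"] + p["bg_win"],
        "win_gammon": p["gammon_win"] + p["bg_win"],
        "win_bg": p["bg_win"],
        "lose_gammon": p["gammon_loss"] + p["bg_loss"],
        "lose_bg": p["bg_loss"],
    }


# Arbitrary example logits, one per outcome in OUTCOMES order.
probs = softmax([1.2, 0.3, -2.0, 0.8, -0.5, -3.0])
cumulative = to_cumulative(probs)
```

Because every term in the sums is non-negative, `win >= win_gammon >= win_bg` holds by construction, so no after-the-fact correction is needed in this parameterisation.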

On training with conditional probabilities: it actually makes no difference in the middle of the game. The "new" value you're training against is just the network estimate of the conditional probability again, so no division is necessary. You have to be careful at the end of the game, though: do you train the conditional gammon win node if the player loses? I'm finding a fair bit of sensitivity to assumptions about this and I'm probably doing something wrong there. :)
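A rough sketch of how the target for the conditional gammon-win node might be chosen, in Python. The function name and arguments are hypothetical, and skipping the update after a loss is only one possible convention, not a settled answer to the question above. The sketch also ignores the change of perspective between plies.

```python
def conditional_gammon_target(next_estimate=None, game_over=False,
                              won=False, gammon=False):
    """Return a training target for the conditional gammon-win node.

    Returns None when the update should be skipped.
    """
    if not game_over:
        # Mid-game: bootstrap from the network's own estimate of the
        # conditional probability at the next position. No division
        # by the win probability is involved.
        return next_estimate
    if not won:
        # Assumption: P(gammon win | win) is undefined after a loss,
        # so this convention leaves the node untrained.
        return None
    # Game won: the conditional outcome is known exactly.
    return 1.0 if gammon else 0.0
```

An alternative convention would be to train towards 0 after a loss; which of the two behaves better is exactly the sensitivity being described.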

Along these lines, even with the usual gammon win output: do you keep training it in mid-game once the opponent has borne off a checker, or do you just stop training?

On Dec 11, 2011, at 7:56 AM, Øystein Schønning-Johansen wrote:

Hi Mark!

How's your rally driving going? ;-)

On Sun, Dec 11, 2011 at 4:45 AM, Mark Higgins <address@hidden> wrote:
I notice in gnubg and other neural networks the probability of gammon gets its own output node, alongside the probability of (any kind of) win.

Doesn't this sometimes mean that the estimated probability of gammon could be larger than the probability of win, since both sigmoid outputs run from 0 to 1?

There is a sanity check function called after the neural net evaluation that checks that gammons don't exceed wins and that backgammons don't exceed gammons.

I'm playing around with making the gammon node represent the probability of a gammon win conditioned on a win; then the unconditional probability of a gammon win = prob of win * conditional prob of gammon win. In that setup, both outputs are free to roam (0,1) without causing inconsistencies.
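The two approaches discussed here can be compared in a short Python sketch. Both function names are hypothetical, and `clamp_outputs` only illustrates the kind of post-evaluation check described above; it is not a copy of gnubg's actual routine.

```python
def clamp_outputs(win, win_gammon, win_bg):
    """Post-hoc fix: force win >= win_gammon >= win_bg by clamping."""
    win_gammon = min(win_gammon, win)
    win_bg = min(win_bg, win_gammon)
    return win, win_gammon, win_bg


def from_conditionals(win, gammon_given_win):
    """Conditional setup: both inputs may lie anywhere in (0, 1).

    The unconditional gammon-win probability is their product, so it
    can never exceed the win probability.
    """
    return win, win * gammon_given_win


# Inconsistent raw sigmoid outputs get clamped into order.
clamped = clamp_outputs(0.4, 0.5, 0.6)

# The conditional parameterisation is consistent by construction.
win, win_gammon = from_conditionals(0.4, 0.5)
```

Clamping discards whatever information the out-of-range output carried, whereas the product form never needs correcting; whether that improves play is the open question in this thread.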

That's a possibility, but I do not believe it gains anything. (This is of course just a guess, since I've not tried. And you are of course free to try.) I guess you also need a similar scheme for backgammons?

Is there something I'm missing here about why this is suboptimal? Is there some other way people tend to ensure that prob of gammon win <= prob of any kind of win?

I guess you have to divide by the win prob in the training, which is still just an estimate. Hmmm.. I'm still thinking, maybe it can gain something, since they are kind of depending on each other.

However... what I would rather try is to have six outputs with a softmax activation function. Several neural net experts recommend softmax in their books and papers, and other parameter update rules (other than backpropagation) have been developed based on softmax outputs.

