bug-gnubg

Re: [Bug-gnubg] TD-Gammon input, output encoding shemes


From: Joern Thyssen
Subject: Re: [Bug-gnubg] TD-Gammon input, output encoding shemes
Date: Sat, 3 May 2003 08:57:30 +0000
User-agent: Mutt/1.4i

On Wed, Apr 16, 2003 at 06:14:11PM +0700, Truong Khanh wrote:
> hello all,
> 
> I am researching how to apply reinforcement learning, specifically
> TD(lambda), to some game projects, and I have been looking into
> Gerald Tesauro's TD-Gammon and GNUBG. A big obstacle for me is
> understanding the input and output encoding schemes.

> As far as I know, Tesauro used 3 layers with 198 input units, 40 hidden
> units, and 4 output units, then updated the connection weights with
> the TD(lambda) rule.

> Typically, a neural net uses the pair (0,1) for input values, along
> with a log-sigmoid function. My question is: what values do the input
> and output units take?

I don't know what Tesauro used as input, but I can tell you what gnubg
uses for the contact network:

200 inputs are used to represent the board, 4 inputs per point per
player:

  - player has exactly 1 chequer on point
  - player has exactly 2 chequers on point
  - player has exactly 3 chequers on point
  - (number of chequers - 3) / 2
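
The four inputs per point listed above can be sketched in a few lines of
Python. This is only an illustration of the scheme as described in this
mail, not gnubg's actual code (see eval.c for that). The names
encode_point and encode_board are hypothetical. Two assumptions are made
that the description does not state: the fourth input is taken to be 0
when there are 3 or fewer chequers on the point, and the 200 inputs are
taken to come from 2 players x 25 positions (24 points plus the bar) x 4.

```python
from typing import List, Sequence

NUM_POSITIONS = 25      # assumption: 24 board points plus the bar
INPUTS_PER_POINT = 4


def encode_point(n: int) -> List[float]:
    """Encode the chequer count on one point as four inputs."""
    return [
        1.0 if n == 1 else 0.0,            # exactly 1 chequer
        1.0 if n == 2 else 0.0,            # exactly 2 chequers
        1.0 if n == 3 else 0.0,            # exactly 3 chequers
        (n - 3) / 2.0 if n > 3 else 0.0,   # assumption: 0 for n <= 3
    ]


def encode_board(board: Sequence[Sequence[int]]) -> List[float]:
    """Encode both players' chequer counts as one flat input vector."""
    inputs: List[float] = []
    for player_points in board:
        if len(player_points) != NUM_POSITIONS:
            raise ValueError("expected 25 chequer counts per player")
        for n in player_points:
            inputs.extend(encode_point(n))
    return inputs
```

With two players this gives 2 * 25 * 4 = 200 values; the pseudo-inputs
would be appended after them.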

Besides that, gnubg uses 50 so-called "pseudo-inputs" to represent
features that could help the net play backgammon. Examples are: location
of back anchor, location of forward anchor, the degree of contact, pip
loss from hitting opponent, number of rolls that hit opponent, etc. See
eval.c in the gnubg distribution for a complete list.

The output nodes are simply: wins, gammon wins, backgammon wins, gammon
losses, and backgammon losses. Plain "losses" are not represented, as
they are linearly dependent on "wins": P(loss) = 1 - P(win).
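
As a small illustration of why a separate output for plain losses is
unnecessary, the sketch below labels the five outputs and recovers the
loss probability from the win probability. The names OUTPUT_NAMES and
label_outputs are hypothetical, and the output order is assumed to be
the order in which the outputs are listed in this mail.

```python
from typing import Dict, Sequence

# Assumption: outputs arrive in the order listed in the mail.
OUTPUT_NAMES = (
    "win",
    "win_gammon",
    "win_backgammon",
    "lose_gammon",
    "lose_backgammon",
)


def label_outputs(outputs: Sequence[float]) -> Dict[str, float]:
    """Attach names to the five net outputs and derive P(lose)."""
    if len(outputs) != len(OUTPUT_NAMES):
        raise ValueError("expected exactly 5 outputs")
    probs = dict(zip(OUTPUT_NAMES, outputs))
    # Every game is either won or lost, so P(lose) = 1 - P(win).
    probs["lose"] = 1.0 - probs["win"]
    return probs
```

For example, label_outputs([0.6, 0.2, 0.01, 0.1, 0.005]) yields a "lose"
entry of about 0.4 without the net ever producing it directly.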

Jørn



