## Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question

 From: Øystein Johansen Subject: Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question Date: Thu, 21 May 2009 10:18:57 +0200 User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

```
boomslang wrote:
> Hi all,
>
> I have a question regarding TD(lambda) training by Tesauro (see
> http://www.research.ibm.com/massive/tdl.html#h2:learning_methodology).
>
> The formula for adapting the weights of the neural net is
>
> w(t+1)-w(t) = a * [Y(t+1)-Y(t)] * sum(lambda^(t-k) * nabla(w)Y(k);
> k=1..t).
>
> I would like to know if nabla(w)Y(k) in the formula above is the
> gradient of Y(k) to the weights of the net at time t (i.e. the
> current net) or to the weights of the net at time k.  I assume the
> former.

I don't think that really matters much. Like you, I guess it is the
former. You can check this in Sutton/Barto.

However, this equation was never implemented in gnubg! All TD training
done in gnubg (a long time ago, and abandoned at an early stage) used
lambda = 0. Notice how lambda = 0 simplifies the equation: only one term
of the sum survives, the one with k = t. The implementation then only
has to take the previous position into account when updating the
weights, and the update can be done with plain backprop.

In our experience, TD is nice for kickstarting the training process, but
supervised training is the real thing: build a big database of positions
with their rollout results and train supervised on it.

If you still would like to do TD training with your system, I really
recommend looking at Sutton/Barto.

Good luck!
-Øystein

```
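To make the lambda = 0 simplification in the reply above concrete, here is a minimal sketch in Python. It is not gnubg code: a single sigmoid unit stands in for the neural net, and the names `predict`, `gradient`, `td_lambda_step`, `td0_step` and `train_episode` are hypothetical, chosen only for this illustration. The running `trace` list holds the sum of lambda^(t-k) * nabla(w)Y(k) from the quoted formula. In this incremental form each gradient is computed with the weights that were current when it was added to the trace, which is one common way to implement the sum and is assumed here rather than taken from the thread.

```python
import math


def predict(w, x):
    """Y = sigmoid(w . x); a stand-in for the network output (assumption)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def gradient(w, x):
    """nabla(w)Y for the sigmoid unit above: Y * (1 - Y) * x."""
    y = predict(w, x)
    return [y * (1.0 - y) * xi for xi in x]


def td_lambda_step(w, trace, x_t, target, alpha, lam):
    """One TD(lambda) update for the position x_t.

    target is Y(t+1), or the final outcome at the end of the game.
    trace is the running sum of lam^(t-k) * nabla(w)Y(k) for k <= t.
    """
    y_t = predict(w, x_t)
    g = gradient(w, x_t)
    # Decay the old gradients and add the newest one.
    trace = [lam * e + gi for e, gi in zip(trace, g)]
    delta = target - y_t
    w = [wi + alpha * delta * ei for wi, ei in zip(w, trace)]
    return w, trace


def td0_step(w, x_t, target, alpha):
    """The lambda = 0 case: only the k = t term is left, so this is a
    single gradient step on Y(t) towards the target."""
    y_t = predict(w, x_t)
    g = gradient(w, x_t)
    return [wi + alpha * (target - y_t) * gi for wi, gi in zip(w, g)]


def train_episode(w, positions, outcome, alpha=0.1, lam=0.0):
    """Run one game's positions through the update, in order."""
    trace = [0.0] * len(w)
    for t, x_t in enumerate(positions):
        if t == len(positions) - 1:
            target = outcome  # last position: learn from the result
        else:
            target = predict(w, positions[t + 1])  # Y(t+1), current net
        w, trace = td_lambda_step(w, trace, x_t, target, alpha, lam)
    return w
```

With `lam=0.0` the trace is overwritten by the newest gradient on every step, so `td_lambda_step` and `td0_step` produce the same weights and no history beyond the previous position has to be stored.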
