Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question


From: Øystein Johansen
Subject: Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
Date: Thu, 21 May 2009 10:18:57 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

boomslang wrote:
> Hi all,
> 
> I have a question regarding TD(lambda) training by Tesauro (see
> http://www.research.ibm.com/massive/tdl.html#h2:learning_methodology).
> 
> The formula for adapting the weights of the neural net is
> 
> w(t+1) - w(t) = a * [Y(t+1) - Y(t)] * sum(lambda^(t-k) * nabla(w)Y(k); k=1..t).
> 
> I would like to know whether nabla(w)Y(k) in the formula above is the
> gradient of Y(k) with respect to the weights of the net at time t
> (i.e. the current net) or with respect to the weights of the net at
> time k.  I assume the former.

I believe that really doesn't matter much. Like you, I guess it is the
former; you can check this in Sutton/Barto.

However: this equation was never implemented in gnubg! All the TD
training that was done in gnubg (a long time ago, and abandoned at an
early stage) used lambda = 0. Notice how lambda = 0 simplifies the
equation: only the k = t term survives, so each update takes only the
previous position into account. That can be implemented with plain
backprop.
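
For illustration, here is a minimal TD(0) sketch in Python/numpy. It is
not the gnubg code (gnubg is written in C, and its net and board
encoding differ); the class, layer sizes, and learning rate below are
made-up assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Net:
    """Toy one-hidden-layer evaluator; Y = V(x) in [0, 1]."""
    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W2 = rng.normal(0.0, 0.1, (1, n_hidden))

    def forward(self, x):
        # Cache activations so backprop can reuse them.
        self.x = x
        self.h = sigmoid(self.W1 @ x)
        self.y = sigmoid(self.W2 @ self.h)
        return float(self.y[0])

    def backprop(self, target, alpha):
        # One gradient step on 0.5*(target - y)^2 for the last input
        # passed to forward().  With lambda = 0 the TD update is
        # exactly this, with Y(t+1) as the target.
        d_out = (target - self.y) * self.y * (1.0 - self.y)
        d_hid = (self.W2.T @ d_out) * self.h * (1.0 - self.h)
        self.W2 += alpha * np.outer(d_out, self.h)
        self.W1 += alpha * np.outer(d_hid, self.x)

def td0_episode(net, positions, outcome, alpha=0.1):
    # positions: encoded boards x(0)..x(T) from one self-play game;
    # outcome: the final result (e.g. 1.0 = win, 0.0 = loss).
    T = len(positions) - 1
    for t in range(T):
        # Target is Y(t+1), or the real outcome at the end of the game.
        target = outcome if t + 1 == T else net.forward(positions[t + 1])
        net.forward(positions[t])    # recompute and cache Y(t)
        net.backprop(target, alpha)  # w += a*[Y(t+1)-Y(t)]*nabla(w)Y(t)

Note how each update looks only at the transition from the previous
position, just as described above.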

Our experience is: TD is nice for kickstarting the training process, but
supervised training is the real thing. Build a big database of positions
with rollout results for each of them, and train on those pairs in a
supervised fashion.
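
In the same made-up Python/numpy style, reusing the toy Net class from
the sketch above (the data layout here is an assumption, not a gnubg
format), supervised training is just regression against the rollout
values:

import numpy as np

def train_supervised(net, X, y, alpha=0.05, epochs=10, seed=0):
    # X: (n_positions, n_in) array of encoded boards;
    # y: (n_positions,) rollout results, scaled into the net's output
    #    range (0..1 for a sigmoid output unit).
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            net.forward(X[i])
            net.backprop(y[i], alpha)  # target is the rolled-out value

The appeal of the rollout database is that rollout results are much less
noisy targets than single-game outcomes.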

If you would still like to do TD training with your system, I really
recommend looking at Sutton/Barto.

Good luck!
-Øystein
