Re: Temporal difference learning. Lambda parameter.

From: Philippe Michel
Subject: Re: Temporal difference learning. Lambda parameter.
Date: Sun, 22 Dec 2019 00:09:07 +0100

On Sat, Dec 14, 2019 at 01:12:34PM +0100, Øystein Schønning-Johansen wrote:

> The reinforcement learning that has been used up til now is plain temporal
> difference learning like described in Sutton and Barto (and done by several
> science projects) with TD(lambda=0).

I don't think this is the case (or the definition of TD is much wider 
than what I thought).

The 1.0 version uses straightforward supervised training on a rolled-out 
position database.

I wasn't involved at the time, but as far as I know :

Earlier versions, by Joseph Heled, used supervised training on a 
database evaluated at 2-ply. 

The very first versions by Gary Wong did indeed use TD training, but this 
was abandoned when it seemed stuck at an intermediate level of play 
(the problem was probably not the training method itself, since 
TD-Gammon before that and BGBlitz since then did very well with TD).

> Do you think that the engine can be better at planning ahead, if lambda is
> increased? Has anyone done a lot of experiments with lambda other than 0?
> (I don't think it's code in the repo to do anything else than lambda=0, so
> maybe someone with some other research code base on this can answer?) Or
> someone with general knowledge of RL can answer?

The engine doesn't "plan ahead", does it? It approximates the 
probabilities of the game outcomes from the current position (or, for 
simplification, its equity).
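For reference, here is how I understand the mapping from outcome 
probabilities to cubeless equity. This is a sketch under my assumptions, 
not gnubg source: I assume the usual five-output encoding (win, win 
gammon, win backgammon, lose gammon, lose backgammon) with the gammon 
figures including backgammons.

```python
def cubeless_equity(p_win, p_win_g, p_win_bg, p_lose_g, p_lose_bg):
    """Cubeless money equity from five outcome probabilities.

    Assumed convention (not verified against gnubg source): the gammon
    probabilities include backgammons, so a plain win is +1 point, a
    gammon adds +1 on top, and a backgammon adds +1 more.
    """
    return (2.0 * p_win - 1.0) + p_win_g + p_win_bg - p_lose_g - p_lose_bg
```

So a coin-flip position with no gammon chances has equity 0, and a 
certain backgammon win has equity +3.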

My understanding is that its potential accuracy depends on the neural 
network (architecture + input features), while the training method 
(including the training database, in the case of supervised learning) 
determines how close to this potential one can get, and how fast.
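To make the lambda discussion concrete, here is a minimal tabular sketch 
of the TD(lambda) update with accumulating eligibility traces, in the 
Sutton and Barto style. This is my own illustration, not gnubg code 
(gnubg would use a neural network, not a table): with lam=0 only the 
current state's value moves; with lam>0 the TD error also flows back to 
earlier states in the episode.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.0):
    """One episode of tabular TD(lambda) with accumulating traces.

    values:  dict state -> estimated value, updated in place.
    episode: list of (state, reward, next_state) transitions,
             with next_state = None at the terminal step.
    """
    traces = {}
    for state, reward, next_state in episode:
        # Terminal states have value 0 by definition.
        v_next = values.get(next_state, 0.0) if next_state is not None else 0.0
        # TD error for this transition.
        delta = reward + gamma * v_next - values.get(state, 0.0)
        # Accumulating eligibility trace for the visited state.
        traces[state] = traces.get(state, 0.0) + 1.0
        # Update every traced state, then decay the traces.
        for s in traces:
            values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
            traces[s] *= gamma * lam
    return values
```

With lam=0 the traces are wiped after each step, so a terminal reward 
only updates the final state; with lam=1 (and gamma=1) the same reward 
is credited to every state visited in the episode.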
