Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question

From: Øystein Johansen
Subject: Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
Date: Thu, 21 May 2009 11:41:17 +0200
User-agent: Thunderbird (Windows/20090302)

Ian Shaw wrote:
>> Our experience is: TD is nice for kickstarting the training 
>> process. But supervised training is the real thing. Make a big
>> database of positions and the rollout results according to these
>> positions and train supervised.
>> If you still would like to do TD training with your system, I 
>> really recommend looking at Sutton/Barto.
> It's probably worth noting that Frank Berger has had a different
> experience. If I recall correctly, Frank used only TD training for
> BgBlitz, with no supervised training. (This was some years ago, so I
> may be out of date or just wrong.)

You're right.

> With the increase in processing power since the current gnubg net was
> developed, I wonder if there is some merit in having another crack at
> it. Are you doing any work on the NN side of things, Øystein? I think
> Joseph has stopped.

I put in some effort about two years ago, but it did not bear fruit.
I'm hoping to pick that work up again. Among the things I did
was to rewrite/refactor some of the evaluation code. I also tried to
create different position classes with a k-means scheme. I can't say it
did not work, but it has to be fine-tuned and trained further to give
better results, I believe.
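
For readers unfamiliar with the idea, here is a minimal sketch of how
k-means could group positions into classes. It is not the code I wrote
for gnubg. The feature vectors, the choice of k, and the function names
are all assumptions made for illustration; in practice each position
would be described by its network inputs or some summary of them.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, point):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], point))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over hypothetical position features."""
    rng = random.Random(seed)
    # Start from k distinct positions picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a class ends up empty
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids

# Hypothetical two-feature positions forming two obvious groups.
positions = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids = kmeans(positions, k=2)
position_class = nearest(centroids, [9, 9])
```

At evaluation time, nearest() would pick the class for a position, and
one could then hand the position to a network trained only on that
class.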

I remember I first tried TD training (lambda=0), and I had the same
experience Joseph reported: TD is slow. However, I was able to run
5000 games/minute. TDG 1.0 was trained with 300,000 games, and I'm able
to reach that in an hour. Maybe TD deserves to be reconsidered.

BTW: I also think Frank's training algorithm uses other values for
lambda. I'm not sure of all the details in his project.
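
To make the role of lambda concrete, here is a minimal sketch of one
TD(lambda) pass over a single game, in the style described in
Sutton/Barto. It is neither gnubg's nor Frank's code. It assumes a
single sigmoid unit instead of a full network, no discounting, and a
game outcome in [0, 1]; alpha, lam and the function names are my own
choices for illustration.

```python
import math

def value(weights, features):
    """Estimated outcome for a position: sigmoid of a weighted sum."""
    s = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-s))

def td_lambda_episode(weights, states, outcome, alpha=0.1, lam=0.0):
    """Update weights in place from one game.

    states  : feature vectors of the successive positions in the game
    outcome : final result in [0, 1], used as the target for the last position
    lam     : trace decay; lam=0 credits only the current position
    """
    traces = [0.0] * len(weights)
    for t, feats in enumerate(states):
        v = value(weights, feats)
        # Gradient of sigmoid(w.x) with respect to w is v * (1 - v) * x.
        grad = [v * (1.0 - v) * f for f in feats]
        # Eligibility traces: decayed history of gradients.
        traces = [lam * e + g for e, g in zip(traces, grad)]
        # Target is the next position's estimate, or the real outcome at the end.
        if t + 1 < len(states):
            target = value(weights, states[t + 1])
        else:
            target = outcome
        delta = target - v
        for i in range(len(weights)):
            weights[i] += alpha * delta * traces[i]
    return weights

# Same two-position game, trained with lambda=0 and lambda=1.
w0 = td_lambda_episode([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1.0, lam=0.0)
w1 = td_lambda_episode([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1.0, lam=1.0)
```

With lam=0 only the weight tied to the last position moves after this
game, while with lam=1 the win is also credited to the earlier
position. That is the practical difference between the lambda values
mentioned above.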

