bug-gnubg

RE: [Bug-gnubg] User training of the Neural Nets


From: Ian Shaw
Subject: RE: [Bug-gnubg] User training of the Neural Nets
Date: Fri, 25 Aug 2006 16:08:22 +0100

Øystein Johansen wrote on 23 August 2006 19:43
 
> 1. Get the gnubg-nn code
>    cvs -d:something:blah co gnubg-nn

So there's a separate program for developing NNs? I'd be interested in having a 
look. What do I need to do?

> 3. Before you start training anything:
>    Steal the neural net evaluation code from gnubg, the code
>    that uses SSE, and apply it to the code in gnubg-nn.
>    This step will save you a lot of time in the training.
>    (commit the changes back to the cvs)

Am I right in thinking that the 5-node pruning nets do not use SSE 
vectorisation? At one point, you were considering implementing this. Did 
anything come of it? IIRC, you also mentioned increasing the hidden nodes to 8 
- because the loops have to be in multiples of 4.
 
> Here's where I stranded... It worked it worked! I could breed 
> new nets, but none of the nets I trained was significantly 
> better than the original one, no matter how long I trained.
> 
> 6. A programmer can now try out different things, like further
>    splitting of neural nets, or altering the inputs, or guessing
>    other algorithms that might work.
> 
>    Look at the different hand crafted inputs, can anyone be
>    removed? Can anything be added? I believe there is code to
>    dynamically add and remove nn inputs. If you add an input
>    make sure you add a new 'concept' and not just something
>    that's linearly dependent on some other inputs.
> 


LINEARITY

Can someone clarify what is meant by "linearly"? Tesauro has also mentioned 
this and I would like to ensure I know exactly what is meant in the nn context.

For example, I would call the pipcount linear because it is simply the sum of 
all the chequer distances from home. Indeed, pipcount is not in the gnubg input 
set.

I would call home-board strength non-linear, because the number of dancing 
rolls is proportional to the square of the number of points closed.

Is this correct?
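
To check the home-board claim, here is a minimal sketch in Python (my own illustration, not gnubg code; the function name is made up). It assumes a single chequer on the bar and counts a roll as "dancing" only when both dice land on closed points. Under that assumption the count for k closed points comes out as k squared out of 36 rolls, so it is not a linear function of k.

```python
from itertools import product

def dancing_rolls(closed_points):
    """Count the rolls (out of 36) on which one chequer on the bar cannot enter.

    closed_points: set of home-board points (1-6) that are closed.
    Assumption: a roll dances only if both dice land on closed points.
    """
    return sum(
        1
        for d1, d2 in product(range(1, 7), repeat=2)
        if d1 in closed_points and d2 in closed_points
    )

# Print the count for 0..6 closed points to see how it grows.
for k in range(7):
    closed = set(range(1, k + 1))
    print(k, dancing_rolls(closed))
```

Which points are closed does not matter here, only how many, because each die is blocked independently.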

NEW NEURAL NET INPUTS

I spent some of my holiday reading eval.c, trying to understand the current set 
of nn inputs. (Much to my wife's discomfort: "Ian, have you brought work on 
holiday?" "No." "Well it looks like work." "Yes, but it's much more fun!")

I was inspired by the idea of trying to capture some of the concepts Robertie 
espoused in Modern Backgammon. This has the appeal of continuing the 
man-machine feedback loop, since the theme of Robertie's book is to explain 
concepts that have been learnt from the bots. If you've not read the book, the four 
concepts are Efficiency, Connectivity, Non-commitment and Robustness. These 
don't seem to be explicitly encoded as nn inputs (though they may be included 
in the I_MOBILITY and I_ESCAPES features, which I don't fully understand yet.) 

The first two are probably easier to implement. Here are some ideas - 
suggestions welcome.

Connectivity could be measured by the number of friendly chequers up to six 
points ahead of each chequer, or the number of rolls that join one chequer to 
another. However, this seems quite linear to me, so might not help.
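
A rough sketch of the first connectivity idea, in Python. This is purely illustrative: the board layout (a list indexed by point number, with chequers moving from high points to low points) and the function name are my assumptions, not anything taken from eval.c.

```python
def friends_ahead(board):
    """For each occupied point, count friendly chequers up to six points ahead.

    board: list of length 25; board[p] is the number of my chequers on
    point p (index 0 is unused). Chequers move towards point 1, so
    "ahead" means the six lower-numbered points. The bar is ignored.
    """
    result = {}
    for p in range(1, 25):
        if board[p] == 0:
            continue
        # Sum the chequers on points p-6 .. p-1, clipped at point 1.
        result[p] = sum(board[q] for q in range(max(1, p - 6), p))
    return result

# Starting position: 2 on the 24-point, 5 on the midpoint,
# 3 on the 8-point, 5 on the 6-point.
start = [0] * 25
start[24], start[13], start[8], start[6] = 2, 5, 3, 5
print(friends_ahead(start))
```

In the starting position only the midpoint and the 8-point have friends within six pips ahead, which fits the intuition that the back men are disconnected.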

I was considering point-making rolls as one measure of efficiency. This tends 
to be the square of the number of occupied points, so I see it as non-linear. 
Similarly, one could count Point-on-Head rolls.
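
Here is a deliberately simplified sketch of a point-making count in Python. The assumptions are mine: the opponent's chequers are ignored entirely, only the case where two chequers meet on an empty point is counted (covering a blot and multi-step plays are left out), and the board layout and function names are hypothetical.

```python
from itertools import product

def makes_point(board, d1, d2):
    """True if the roll (d1, d2) can bring two chequers to one empty point.

    board: list of length 25; board[p] is the number of my chequers on
    point p (index 0 unused). Chequers move towards point 1.
    """
    for target in range(1, 25):
        if board[target] != 0:
            continue
        s1, s2 = target + d1, target + d2
        if s1 > 24 or s2 > 24:
            continue
        if d1 == d2:
            # A double needs two chequers on the same source point.
            if board[s1] >= 2:
                return True
        elif board[s1] >= 1 and board[s2] >= 1:
            return True
    return False

def point_making_rolls(board):
    """Count how many of the 36 rolls make a new point."""
    return sum(
        1
        for d1, d2 in product(range(1, 7), repeat=2)
        if makes_point(board, d1, d2)
    )
```

A real input would have to respect blocked points and would probably want to avoid counting plays that break a more valuable point, so treat this only as a starting point for experiments.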

The distribution of spares is another obvious measure of efficiency. However, 
this is already encoded in the basic board structure.

The common factor in these suggestions is that they attempt to look ahead to 
the player's next roll. Perhaps they can identify some of the tactical 
advantages currently only found by 2-ply analysis.

It might make sense to split the board into two halves for these measures, 
since localized tactics on your side of the board are often different to 
tactics on the other side of the board. For example, we try to build primes on 
our half of the board, but run to safety with the back men.

For Non-commitment, perhaps one could define an input that measures the 
"purity" of a position.

Robustness might be measured as the degree of freedom each chequer has to play 
each die 1-6 in terms of being unblocked or not deep in the home board.

Joseph mentioned that he had tried and discarded various hand-crafted inputs. 
Is there a record of what has been tried already? (I assume that the inputs in 
eval.c CalculateHalfInputs are all actually used by the nn, not deemed 
unhelpful and weighted to 0 elsewhere.)

BASIC BOARD ENCODING

Has anyone tried modifying the basic board encoding recently?

For each point, there are three boolean inputs and one integer input. 
The boolean inputs are set true for 1, 2 and 3 chequers on the point, i.e. a 
blot, a point, and a spare. The integer input counts chequers above three. 
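
To pin down what I mean, here is one reading of that encoding as a Python sketch. It is my interpretation of the description above, not a copy of eval.c: the exact thresholds (for instance whether the first flag means "exactly one" or "at least one") and any scaling of the fourth input are assumptions and may differ in the real code.

```python
def encode_point(n):
    """Encode the chequer count on one point as four inputs.

    Assumed reading: blot flag (exactly one chequer), point flag (two or
    more), spare flag (three or more), then the count above three.
    """
    return [
        int(n == 1),      # blot
        int(n >= 2),      # point made
        int(n >= 3),      # at least one spare
        max(0, n - 3),    # excess chequers beyond three, unscaled
    ]

for n in range(7):
    print(n, encode_point(n))
```

The last input grows by the same amount for every extra chequer, which is what prompts the question below.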

AFAIK, this encoding was first used by Tesauro in the early 90s. It worked well 
so it seems that everyone has used it since. Perhaps it was the best 
configuration at the time, given the computing power available, but today's PCs 
are about 10 times as fast and have oodles more RAM. 

It doesn't strike me that this encoding is naturally best. Does it not 
imply that there is a linear relationship between the number of excess spares 
on a point and its value? I don't think this is true. For example, a fourth chequer on a 
point is often good, allowing one to make points with doubles, or slot and 
cover on consecutive rolls. It is usually only once we get a 5th chequer on a 
point that we consider it to be "stacked". Even then, an opening 65: 24/13 is 
good.

The boolean encoding of features was the stroke of genius; perhaps it should be 
extended.  One could try boolean inputs for fourth and fifth chequers, saving 
the integer for serious stacking. If the extra 50 or 100 inputs are too much, 
perhaps just the points that commonly get heavily loaded could have additional 
boolean inputs: the six-, eight- and midpoints.
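
A sketch of the extended encoding, again in Python and again only illustrative. The flag thresholds, the function name and the slot count are my assumptions: I take 25 slots per side (24 points plus the bar), which is what makes one or two extra inputs per slot come to 50 or 100 in total.

```python
SLOTS_PER_SIDE = 25  # assumption: 24 points plus the bar
SIDES = 2

def encode_point_extended(n):
    """Encode a chequer count with booleans up to the fifth chequer.

    The integer input now only counts chequers beyond five, i.e.
    serious stacking.
    """
    return [
        int(n == 1),      # blot
        int(n >= 2),      # point made
        int(n >= 3),      # first spare
        int(n >= 4),      # fourth chequer (new boolean)
        int(n >= 5),      # fifth chequer (new boolean)
        max(0, n - 5),    # excess chequers beyond five
    ]

# Two extra inputs per slot compared with a four-input encoding.
extra_inputs = SLOTS_PER_SIDE * SIDES * (len(encode_point_extended(0)) - 4)
print(extra_inputs)
```

Restricting the extra booleans to a few points would just mean calling the extended encoder for those points and the four-input one elsewhere.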

Again, I'm aware that the issues I'm considering have already been tackled, so 
any pointers would be most welcome.

-- Ian



