bug-gnubg



From: Nis
Subject: Re: [Bug-gnubg] Even, odd and half plies (WAS: New Contact Net Error Rates)
Date: Tue, 25 Feb 2003 20:47:05 +0100

ALERT: Long and boring article ahead. Includes math.

--On Thursday, February 20, 2003 12:02 -0500 "Moore, Dave" <address@hidden> wrote:

Jim Segrave wrote:

This leads me to a point I've been wondering about. Everyone discusses
gnubg having errors on odd ply evaluations.

I remembered it this way also - but looking back at the gnubg archive, it seems the only thing discussed was that gnubg has huge DIFFERENCES between odd and even ply on some types of positions. See this thread:

http://mail.gnu.org/archive/html/bug-gnubg/2002-07/msg00023.html

I think this has been mixed up, in my mind and in others', with the fact that 1-ply does not play much better than 0-ply - which I also remember reading somewhere, probably on this list.

Can someone explain why
this would be? I have always taken it on faith that this is true, but
I'd like to understand the mechanism.

Here is a quick explanation:

This is a little imprecise. What happens is:

The 1-ply evaluation is the result of:

1. Finding the best move, judged at 0-ply, for each of the 21 possible rolls
2. Averaging the 0-ply evaluations of the resulting positions (with the other player on roll), with each non-double roll counting twice as much as a double

Thus the 1-ply evaluation is the average of 21 OTHER positions with the opponent on roll. However, a lot of these positions will be similar to the current one (see the example in the link above).
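The two steps above can be sketched in a few lines of Python. This is only an illustration, not gnubg's eval.c: `eval0` and `legal_moves` are hypothetical callbacks standing in for the neural net and the move generator, and I assume `legal_moves` always returns at least one position (the unchanged board with the turn passed when no move is possible).

```python
from itertools import combinations_with_replacement


def all_rolls():
    """Yield (roll, probability) for the 21 distinct rolls of two dice."""
    for a, b in combinations_with_replacement(range(1, 7), 2):
        # A double can come up in one way out of 36, any other roll in two.
        yield (a, b), (1 if a == b else 2) / 36


def one_ply(position, eval0, legal_moves):
    """1-ply equity of `position` for the player on roll.

    eval0(pos)             -> 0-ply equity of pos for whoever is on roll in pos
                              (hypothetical stand-in for the net).
    legal_moves(pos, roll) -> non-empty list of resulting positions, each with
                              the opponent on roll (hypothetical move generator).
    """
    total = 0.0
    for roll, prob in all_rolls():
        # Each result is scored from the opponent's side, so the mover's
        # best move is the one that leaves the opponent the LOWEST equity.
        worst_for_opponent = min(eval0(p) for p in legal_moves(position, roll))
        # Flip the sign to get back to the mover's point of view.
        total += prob * -worst_for_opponent
    return total
```

Note that every call to `eval0` here is made with the opponent on roll, which is why a net that is better from one side than the other can give 0-ply and 1-ply numbers that disagree.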

If gnubg is better at evaluating a certain position from one side than from the other, then 1-ply and 0-ply might differ a lot. The same is true for higher even and odd plies.

So far, I have seen no good arguments for why gnubg should be better at one than the other. Since I haven't been able to find the explanation, here is a try at it:

Gnubg has been trained with the specific purpose of being able to make better decisions on 0-ply. Thus, either through evolution (the training process) or breeding (the selection of which changes to make to the net inputs), the neural nets have been improving more on positions where there are active decisions to make. Since crashed positions are mostly characterized by one side having very few checker play decisions, one side of these positions has been favored by this.

It is worth noting that this means it might not be a good idea to fix the evaluations - since this would mean weaker play (at even ply) for the side having to make the hard decisions. The exception would be doubling decisions - which are likely to be equally important from both sides of the board.

My naive interpretation:

For example, a position viewed from player 0's point of view may evaluate
to +0.600 equity, but the same position evaluated from the other side will
evaluate to -0.589 equity.

Almost. It is the result of the 21 resulting positions (with opponent on roll) having equity -0.589 on average.

I also have two naive questions:

1.  Wouldn't it be possible to run the odd-ply evaluations while always
evaluating the board from player 0's point of view?  You would still go
through the possible dice and moves for player 1, but the move would be
selected by evaluating the resulting position from player 0's point of
view, always.  This would eliminate the jumps in absolute equity numbers.

In short: No, and even if we could, it wouldn't give more precise results, only more consistent ones.

(very naive)
2.  Could positions that evaluate to different equity from different sides
of the board be used as training data so that the Net would converge to an
agreed upon answer when looking at things from either side of the board?

Not stupid at all. It seems the general agreement last time was that this was too dangerous, since there would be a risk that the net became better at this kind of position at the expense of other, more common types of positions.

I have, however, thought of an idea for overcoming differences between odd and even plies:

The basic idea is to introduce the half-ply: the average between 0-ply and 1-ply, or in general between n-ply and (n-1)-ply. This would decrease the average squared error for this kind of position - since the "true" equity for the position is most likely to be somewhere between these two evaluations.
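A small numeric check of the squared-error claim, as a sketch. The two evaluations are the example figures quoted earlier in this mail, both taken from the same player's point of view; the "true" equity is a number I made up purely for illustration.

```python
def half_ply(e_low, e_high):
    """Plain average of the (n-1)-ply and n-ply equities."""
    return 0.5 * (e_low + e_high)


def squared_error(estimate, truth):
    return (estimate - truth) ** 2


e0, e1 = 0.600, 0.589  # 0-ply and 1-ply figures from the example in this thread
truth = 0.592          # HYPOTHETICAL true equity, invented for illustration

mid = half_ply(e0, e1)
avg_of_two = 0.5 * (squared_error(e0, truth) + squared_error(e1, truth))

# Identity that holds for any value of `truth`:
#   avg_of_two == squared_error(mid, truth) + ((e1 - e0) / 2) ** 2
# so the half-ply never has a larger squared error than the average of the
# squared errors of the two evaluations it is built from.
gap_term = ((e1 - e0) / 2) ** 2
```

The identity only compares the half-ply with the average of the two errors. It does not say that the half-ply beats the better of the two evaluations on any given position.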

At the same time, however, we lose something - since hopefully the evaluation at n-ply should be better on average than the one at (n-1)-ply. After all, that is why we evaluate at higher plies.

An obvious way of correcting for this would be to use a weighted average of the n-ply and (n-1)-ply evaluations - with a weighting factor determined by empirical research. Does anyone have a large database of rolled-out positions lying around - if possible including 0-ply and 1-ply evaluations from the current net as well?

My idea would be to find the average of (rollout - 0-ply)/(1-ply - 0-ply) and use this as the weight given to the n-ply evaluation.
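Roughly, in Python, assuming the database gives one (rollout, 0-ply, 1-ply) triple per position with all three equities seen from the same player. The `min_gap` cutoff is my own addition: the ratio is undefined when the two evaluations coincide and very noisy when they nearly do.

```python
def estimate_weight(samples, min_gap=1e-3):
    """Mean of (rollout - ply0) / (ply1 - ply0) over the usable samples.

    samples: iterable of (rollout, ply0, ply1) equities for one position each.
    min_gap: positions with |ply1 - ply0| below this are skipped
             (an assumption of this sketch, not part of the original idea).
    """
    ratios = [
        (rollout - ply0) / (ply1 - ply0)
        for rollout, ply0, ply1 in samples
        if abs(ply1 - ply0) >= min_gap
    ]
    if not ratios:
        raise ValueError("no samples with a usable 0-ply/1-ply gap")
    return sum(ratios) / len(ratios)


def blended(e_low, e_high, weight):
    """Weighted average; `weight` is the share given to the higher-ply value."""
    return (1.0 - weight) * e_low + weight * e_high
```

A weight of 0.5 reproduces the plain half-ply, 1.0 is the ordinary n-ply evaluation, and a weight above 1 would mean the rollouts tend to lie beyond the n-ply value rather than between the two.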

Another way to make a half-ply evaluation would be to evaluate some of the rolls at 0-ply and some at 1-ply. This can be extended to (n+1/2)-ply by recursion, just as is done with the integer plies today.

When I got this idea, I thought to myself: "So THAT must be how reduced evaluation works". Looking in the list archives and then in the source code, it seems this is not the case: gnubg actually evaluates only some of the rolls at each leaf of the ply tree. This does mean, however, that we have an existing framework in eval.c for doing the half-ply evaluation.

This approach would have the same positive and negative effects as the (weighted) average model described above - with the exception that we do not "waste our time" by doing a full 1-ply eval, but get a very good approximation of what it would have been.
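One possible reading of the mixed-roll idea, sketched in Python. Nothing here is taken from eval.c: `eval0` and `legal_moves` are hypothetical callbacks, the choice of which rolls go one ply deeper (`deep_rolls`) is left entirely open, the mixing is done only at the top of the tree, and for simplicity every candidate move is scored at full depth.

```python
from itertools import combinations_with_replacement

# The 21 distinct rolls with their probabilities (doubles 1/36, others 2/36).
ROLLS = [((a, b), (1 if a == b else 2) / 36)
         for a, b in combinations_with_replacement(range(1, 7), 2)]


def n_ply(pos, n, eval0, legal_moves):
    """Ordinary n-ply equity for the player on roll in `pos`."""
    if n == 0:
        return eval0(pos)
    total = 0.0
    for roll, prob in ROLLS:
        # Successor equities are from the opponent's view, hence the minus.
        total += prob * max(-n_ply(s, n - 1, eval0, legal_moves)
                            for s in legal_moves(pos, roll))
    return total


def mixed_ply(pos, n, deep_rolls, eval0, legal_moves):
    """Look ahead one roll; score successors at n-ply for rolls in
    `deep_rolls` and at (n-1)-ply for all other rolls (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = 0.0
    for roll, prob in ROLLS:
        depth = n if roll in deep_rolls else n - 1
        total += prob * max(-n_ply(s, depth, eval0, legal_moves)
                            for s in legal_moves(pos, roll))
    return total
```

With an empty `deep_rolls` this is the same as `n_ply(pos, n, ...)`, and with all 21 rolls in `deep_rolls` it is the same as `n_ply(pos, n + 1, ...)`, so the share of probability placed in `deep_rolls` plays the role of the weighting factor in the averaging model.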

I have more ideas than the ones given here - but let me hear the reactions of the rest of you before I "go wild".

--
Nis Jorgensen
Greenpeace
Amsterdam



