
Re: [Bug-gnubg] 0ply doubles early


From: Joseph Heled
Subject: Re: [Bug-gnubg] 0ply doubles early
Date: Sat, 16 Dec 2006 11:40:05 +1300

On 12/16/06, Robert-Jan Veldhuizen <address@hidden> wrote:


On 12/15/06, Joseph Heled <address@hidden> wrote:
> I understand your frustration with gnubg not handling the cube as well
> as you think it should at those "simple" or "straightforward"
> situations.

Well, 0-ply cube is still pretty good and I normally use 2-ply cube anyway,
which I'd recommend to anyone. It just seems that it wouldn't be that hard
to improve 0-ply's cube handling by fine-tuning the volatility estimate?

I think 0-ply cube is awesome, and that is due to 0-ply's awesome
ability to get the cubeless equities right. You keep saying it would
not be hard to improve by "fine-tuning the volatility estimate", but I
have not yet seen a sign that you checked whether the code contains
such a knob or how hard it would be to "fine-tune" it. I don't find
those comments constructive when they are not based on facts.



> Yet I do not think the situation is simple. Race is
> simpler than contact, still someone may make great improvements if she
> is willing to do the proper research.

Yes and I think Christian made a good start, suggesting strongly that 0-ply
doubles too early.

And people have been saying for years that gnubg plays badly in "close
to race" positions, yet when pressed they were not even able to define
a criterion for categorizing them.


> Personally I do not believe any of the numbers below except the
> cubeless winning percentage.

Backgammon is not solved except for a very small subset of positions. So,
numbers aren't going to be exact.

And you are telling *me* that. How illuminating. I stand corrected.


I don't see the problem with that. The idea is to improve, not to be
perfect, isn't it? And decent settings rollouts with enough trials figure to
be (much) better than evaluations nearly always, so it seems like a good
idea to use rollout results to improve gnubg's evaluations.

Isn't that a straightforward method being used all the time to improve bots?

I think you fail to understand the basics of the problem. The method
used to train gnubg works great for cubeless evaluations since you have
a very firm starting point, and that is the bearoff, which can be
solved quite accurately by brute force. This is not the case for
cubeful evaluations. If you want to do the same, you have to start by
doing the same thing, i.e. build a base you can be sure of. Only
then can you start incremental improvements, which are based first on
rollouts, then possibly on higher plies. Without such a base you have
nothing to stand on. A net based on random 0ply moves will generate
random 2ply moves, no matter how much you wish it to be otherwise.


For the simple race position I posted, I have no doubt whatsoever that the
rollouts are more accurate than the evaluations, especially the numbers
after double/take should be very close to the truth, since only few trials
will have to make non-obvious cube decisions after a double/take in the
actual position: first a big turn-around, then a cube action that is close
enough so that gnubg could get it wrong.

More evidence for this is the fact that whereas 0-ply evaluation says
double/take and 2-ply says no double/take, a rollout with 0-ply evaluation
says no double/take and a rollout with 2-ply cube evaluation is nearly
identical, also saying no double/take.

It's no proof, but it's pretty strong evidence already. Higher settings
rollouts could be done but I doubt anything different would come out.

> 2ply, 4ply and rollout cubeful numbers
> are all based on a large number of 0ply cubeful decisions, and if
> this is suspect, why would they be any good? 2ply play will not be any
> better than 0ply if your 0ply is awful.

I am very surprised you write this. Maybe I'm misunderstanding you here.

I think the above is not true at all and simulations have proven that. 0-ply
is often somewhat inaccurate, but 2-ply will average over a lot of 0-ply
evaluations. The end result will nearly always be better than a single 0-ply
evaluation. A similar argument goes for rollouts.

Just because the 2-ply numbers aren't perfect, doesn't mean they are not an
improvement over 0-ply.
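
The two positions above can be put side by side in a toy model. The sketch below is not gnubg code: it assumes a hypothetical evaluator whose error is a fixed bias plus independent Gaussian noise, and it treats a 2-ply evaluation as a plain average over 36 child evaluations (a real 2-ply search also picks moves, and its errors are correlated). Under those assumptions, averaging shrinks the random part of the error but leaves the systematic part untouched.

```python
import random
import statistics

def noisy_eval(true_equity, bias, noise_sd, rng):
    """Hypothetical 0-ply evaluator: truth + systematic bias + random noise."""
    return true_equity + bias + rng.gauss(0.0, noise_sd)

def lookahead_eval(child_equities, bias, noise_sd, rng):
    """Toy stand-in for 2-ply: average the noisy evaluations of all children."""
    return statistics.fmean(
        noisy_eval(e, bias, noise_sd, rng) for e in child_equities
    )

def error_stats(bias, noise_sd, trials=2000, seed=1):
    """Compare errors of a single evaluation and of the averaged lookahead."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(trials):
        # Assumed: 36 child positions with arbitrary true equities.
        children = [rng.uniform(-1.0, 1.0) for _ in range(36)]
        truth = statistics.fmean(children)
        single.append(noisy_eval(truth, bias, noise_sd, rng) - truth)
        averaged.append(lookahead_eval(children, bias, noise_sd, rng) - truth)
    return {
        "single_mean": statistics.fmean(single),
        "single_sd": statistics.pstdev(single),
        "averaged_mean": statistics.fmean(averaged),
        "averaged_sd": statistics.pstdev(averaged),
    }

if __name__ == "__main__":
    for name, value in error_stats(bias=0.05, noise_sd=0.1).items():
        print(f"{name}: {value:.4f}")
```

In this model the spread of the averaged error is much smaller than that of a single evaluation, while the mean error stays at the assumed bias. Which of the two effects dominates in gnubg's cubeful 0-ply numbers is exactly what is disputed in this thread; the sketch does not settle it.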

> If I was to start somewhere, I
> would start with doubling on the very last stage of bearoff - where
> you first get the true actions by brute force. This requires a large
> database since you need the result for each score.

Short bearoffs have very much different volatility estimates than races or
contact positions, so I don't see how this would help 0-ply's cubes in
general. Also, using this approach seems practically impossible with current
processing power, no?
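
To make "true actions by brute force" concrete, here is a minimal sketch for the simplest case only: a last-roll position in a money game, with the player on roll holding two checkers in the home board. It assumes the roller wins if both checkers come off this roll and loses otherwise, that there are no gammons, and that the roller has access to the cube. The function names are made up for this example and are not part of gnubg; the per-score database for match play mentioned in the quote above is not attempted here.

```python
from fractions import Fraction

def bears_off_both(a, b, d1, d2):
    """True if checkers on points a and b (1..6) both come off with roll d1-d2."""
    hi, lo = max(a, b), min(a, b)
    if d1 == d2:
        # Doubles give four moves; count the moves each checker needs
        # (integer ceiling division), playing the higher checker first.
        return -(-hi // d1) + -(-lo // d1) <= 4
    # Non-doubles: the larger die must clear the higher checker,
    # the smaller die the lower one.
    return max(d1, d2) >= hi and min(d1, d2) >= lo

def win_probability(a, b):
    """Exact chance of bearing off both checkers, over all 36 rolls."""
    wins = sum(
        bears_off_both(a, b, d1, d2)
        for d1 in range(1, 7)
        for d2 in range(1, 7)
    )
    return Fraction(wins, 36)

def cube_action(p):
    """Correct last-roll cube action for win probability p (money game)."""
    no_double = 2 * p - 1          # equity in units of the current cube
    double_take = 2 * (2 * p - 1)  # same game, cube value doubled
    double_pass = Fraction(1)      # opponent gives up one cube unit
    # The opponent answers a double with whatever is better for them.
    after_double = min(double_take, double_pass)
    if after_double <= no_double:  # ties count as "no double"
        return "no double"
    return "double, take" if double_take <= double_pass else "double, pass"

if __name__ == "__main__":
    for a, b in [(6, 1), (5, 2), (3, 1)]:
        p = win_probability(a, b)
        print(f"checkers on {a} and {b}: p = {p}, action = {cube_action(p)}")
```

In this last-roll setting the comparison reduces to doubling when p is above 1/2 and taking when p is at most 3/4. Extending the same enumeration to longer bearoffs and to every match score is where the database size becomes the issue.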

I think a bit of experimentation could help make GNUBG's cube action
stronger. Christian's sample suggests quite a strong bias in 0-ply's cube
handling towards early doubling.

That brings up two basic questions: does 0-ply overestimate equities on
average? If so, then this might not be so easy to solve. However, I think
it's more likely that 0-ply cube action is just using a too high volatility
estimate and that is not so hard to improve.

Since backgammon is not solved and will not be any time soon, I don't see
any better way than using rollouts to help improve GNUBG.

> In addition I am
> not sure I agree with the doubling code in gnubg. I always used my own
> code which is part of the fibs2html or gnubg-nn, which I think is
> better (but I may be wrong). If someone wants to take this code and
> integrate it into gnubg, so that one can choose which method to use,
> that would be a great start as well.

That sounds interesting. What is the difference between your
algorithm/formula and GNUBG's present algorithm/formula?

My code is based on Tom Keith's ideas in "How to Compute a Match Equity
Table" (http://www.bkgm.com/articles/met.html) and "Match Play
Doubling Strategy" (http://www.bkgm.com/articles/mpd.html).
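
For readers unfamiliar with the term, a match equity table gives the chance of winning the match from each score. The sketch below is only a toy illustration of the recursion behind such a table and does not reproduce the method of the articles cited above or the code in fibs2html or gnubg-nn. It assumes every game is worth exactly one point and is won with probability 1/2, and it ignores the doubling cube, gammons and the Crawford rule.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def match_equity(away_me, away_opp):
    """Match-winning chance when I need away_me points and the opponent away_opp.

    Toy assumptions: one point per game, 50% game-winning chance, no cube.
    """
    if away_me <= 0:
        return Fraction(1)  # I have already won the match
    if away_opp <= 0:
        return Fraction(0)  # the opponent has already won
    # Win the next game with probability 1/2, otherwise lose it.
    return (match_equity(away_me - 1, away_opp)
            + match_equity(away_me, away_opp - 1)) / 2

if __name__ == "__main__":
    size = 5
    for me in range(1, size + 1):
        row = [f"{float(match_equity(me, opp)):.3f}" for opp in range(1, size + 1)]
        print(f"{me}-away:", " ".join(row))
```

The values from this toy model differ from published tables, which account for the effects left out here.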


--
Robert-Jan Veldhuizen



