[gnugo-devel] twin endgame match
From: alain Baeckeroot
Subject: [gnugo-devel] twin endgame match
Date: Fri, 3 Mar 2006 14:38:29 +0100
User-agent: KMail/1.9.1
Hi
Following Arend's advice, gg378 and twin-378 played an 85-game endgame match:
- twin: 26 wins (1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 5 7 10 14 15 21 25 28)
- GNU Go: 14 wins (-9 -3 -3 -3 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1)
- 45 unchanged
The sum is +135, an average of +1.6 per game over the 85 games.
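The summary line can be cross-checked by recomputing it from the per-game margins listed above. The sketch below is Python; reading each value as the point difference in the twin's favour (negative when gg378 did better) is an assumption, and `summarize` is a hypothetical helper. Tallied this way, the listed margins come to +117 (about +1.38 per game) rather than the +135 quoted, so either one margin or the total probably contains a typo.

```python
# Per-game score margins from the 85-game endgame match, as listed in the post.
# Assumption: positive = twin finished ahead, negative = gg378 finished ahead.
twin_wins = [1] * 14 + [2, 2, 2, 3, 5, 7, 10, 14, 15, 21, 25, 28]
gnugo_wins = [-9, -3, -3, -3, -2, -2, -2] + [-1] * 7
unchanged = 45  # games where both engines reached the same result


def summarize(wins, losses, ties):
    """Return (number of games, net margin, mean margin per game)."""
    games = len(wins) + len(losses) + ties
    net = sum(wins) + sum(losses)
    return games, net, net / games


games, net, mean = summarize(twin_wins, gnugo_wins, unchanged)
print(f"{games} games, net {net:+d}, mean {mean:+.2f} per game")
```

Running it prints `85 games, net +117, mean +1.38 per game`.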
_But_ the attached plot of cumulative +PASS -FAIL versus game_status shows that
the twin fails a lot of endgame tests (game_status > 0.85). Checking the big
failures is already a huge task, and I feel too lazy to investigate these 40
tests and more than 50 endgame regressions (and I am a very bad yose
player ;-)
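The plotted curve can be sketched roughly as follows. This assumes each regression test carries a game_status value in [0, 1] (taken here to mean how far into the game the test position occurs) and a PASS/FAIL outcome; the sample data and the helpers `cumulative_pass_fail` and `endgame_failures` are hypothetical, not part of GNU Go's regression scripts. Sorting by game_status and adding +1 per PASS and -1 per FAIL gives the curve, so a stretch where it falls steadily marks where the failures concentrate.

```python
from itertools import accumulate

# Hypothetical test results: (game_status, passed).
# Assumption: game_status is the fraction of the game already played.
results = [
    (0.10, True), (0.35, True), (0.50, False), (0.70, True),
    (0.86, False), (0.90, False), (0.95, True), (0.98, False),
]


def cumulative_pass_fail(results):
    """Sort by game_status and return (game_status, running +PASS -FAIL) points."""
    ordered = sorted(results, key=lambda r: r[0])
    steps = [1 if passed else -1 for _, passed in ordered]
    return list(zip((gs for gs, _ in ordered), accumulate(steps)))


def endgame_failures(results, threshold=0.85):
    """Count failing tests whose game_status is above the endgame threshold."""
    return sum(1 for gs, passed in results if gs > threshold and not passed)


for gs, total in cumulative_pass_fail(results):
    print(f"{gs:.2f} {total:+d}")
print("endgame failures:", endgame_failures(results))
```

With this sample data the running total ends at 0, and three of the four tests above 0.85 are failures, which shows up as the curve dropping at the right-hand end.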
By construction, the twin "knows" exactly how gg378 evaluates the game, so it
may steal a big point before gg378 plays it, but it is still GNU Go logic. So I
wonder whether this endgame match is significant or just a systematic error.
In other words, a reliable endgame comparison should involve another engine
that is good at the endgame, and compare the results of both gg378 and the twin
against that reference engine.
Am I right, or just paranoid?
Is there such an engine available?
- Alain
PS: the plot includes all board sizes. It is not so flat when they are
separated, but I did too much clean-up and erased the results, so... I am
rerunning the regression tests :(
[Attachment: twin4-d1.5_cumul+P-F_vs_gstatus.png (PNG image)]