
[Bug-gnubg] Problem with the crashed benchmark database


From: Philippe Michel
Subject: [Bug-gnubg] Problem with the crashed benchmark database
Date: Tue, 5 Jun 2012 00:33:37 +0200 (CEST)
User-agent: Alpine 2.00 (BSF 1167 2008-08-23)

The benchmark database for the crashed positions seems seriously corrupted.

Offhand, it seems that the results for many positions with significant backgammon chances in play are very inaccurate.

For instance :

#R HLDHABAADEAEAAAAAAAA 0.999958 0.696718 0.000878771 0 0
#R HHDHACAADEAEAAAAAAAA 0.999994 0.695403 0.00295781 0 0
#R LLJLCAACDAAEAAAAAAAA 0.999805 0.970848 0 0 0
#R LLJLABEADAAEAAAAAAAA 0.999858 0.969998 0 0 0
#R LLJLACCADAAEAAAAAAAA 0.999771 0.978238 0.00154321 0 0
#R LLJLCAAECIAEAAAAAAAA 1 0.444444 0 0 0
#R LLJLCACACBAEAAAAAAAA 1 0.666667 0 0 0
m AEAAAAOMGOICAANAAAAA 5 4 LLJLCAAECIAEAAAAAAAA -1.444444 LLJLCACACBAEAAAAAAAA 0.222223 HLDHABAADEAEAAAAAAAA 0.253066 HHDHACAADEAEAAAAAAAA 0.253902372 LLJLABEADAAEAAAAAAAA 0.525268 LLJLCAACDAAEAAAAAAAA 0.52601 LLJLACCADAAEAAAAAAAA 0.534875

The position is :

    +24-23-22-21-20-19------18-17-16-15-14-13-+  O: GNUbg
OOO | X  X  O          |   |                  |  0 points
OOO | X                |   |                  |
OOO |                  |   |                  |
OOO |                  |   |                  |
 OO |                  |   |                  |
    |                  |BAR|                  |v
    |                  |   |                  |
    |                  |   |                  |
    |    X  X          |   |                  |
    | X  X  X  X       |   |                  |  Rolled 54
    | X  X  X  X     X |   |             X    |  0 points
    +-1--2--3--4--5--6-------7--8--9-10-11-12-+  X: You

Which ID corresponds to which move doesn't really matter: it is obvious that X loses a lot of backgammons, and more than 44, 67 or 70% gammons.

The position is worth about -2.96 (cubeless), not -1.44, and the possible errors between moves are small, not random 0.2 or 0.5 blunders.
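
For anyone who wants to check the figures, here is a minimal Python sketch that recomputes the cubeless equities from the "#R" lines quoted above. It assumes that the five numbers on each line are, in the usual gnubg order, win, win gammon, win backgammon, lose gammon and lose backgammon, and that they are given from the side of the player who did not roll (O here); this is inferred from the -1.444444 on the "m" line, not from the tool's documentation. The function names are mine, not part of any gnubg tool.

```python
def cubeless_equity(win, win_g, win_bg, lose_g, lose_bg):
    """Standard cubeless money equity from the five outcome probabilities."""
    return 2.0 * win - 1.0 + (win_g - lose_g) + (win_bg - lose_bg)


def parse_rollout_line(line):
    """Split a '#R <id> p1 p2 p3 p4 p5' line into (id, probabilities).

    The field order is an assumption, see the note above.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != "#R":
        raise ValueError("not a rollout line: %r" % line)
    return fields[1], tuple(float(x) for x in fields[2:])


ROLLOUT_LINES = [
    "#R HLDHABAADEAEAAAAAAAA 0.999958 0.696718 0.000878771 0 0",
    "#R HHDHACAADEAEAAAAAAAA 0.999994 0.695403 0.00295781 0 0",
    "#R LLJLCAACDAAEAAAAAAAA 0.999805 0.970848 0 0 0",
    "#R LLJLABEADAAEAAAAAAAA 0.999858 0.969998 0 0 0",
    "#R LLJLACCADAAEAAAAAAAA 0.999771 0.978238 0.00154321 0 0",
    "#R LLJLCAAECIAEAAAAAAAA 1 0.444444 0 0 0",
    "#R LLJLCACACBAEAAAAAAAA 1 0.666667 0 0 0",
]

# Equity for O after each candidate move; X's equity is the negative.
equities = {}
for line in ROLLOUT_LINES:
    pos_id, probs = parse_rollout_line(line)
    equities[pos_id] = cubeless_equity(*probs)

# The best move for X is the one that leaves O with the lowest equity.
best_for_x = min(equities.values())

# Error of each move relative to the best one, as listed on the "m" line.
errors = {pos_id: eq - best_for_x for pos_id, eq in equities.items()}

for pos_id, err in sorted(errors.items(), key=lambda item: item[1]):
    print("%s  X equity %+.6f  error %.6f" % (pos_id, -equities[pos_id], err))
```

With these assumptions the script reproduces the -1.444444 and the 0.222223 from the "m" line, so the equities in the database are at least consistent with its own rollout probabilities; it is the probabilities themselves that look wrong. For comparison, an equity of -2.96 under the same formula needs X to lose a gammon essentially every time and a backgammon in about 96% of the games.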

This is a rather extreme case, but I think the aggregate effect is important enough to significantly impair the usefulness of this benchmark.

I didn't look at the other databases. It seems they were done later, with a more recent version of the rollout tool.


