[gnugo-devel] Regression test suites

I would be interested in the regression test suite improvement/expansion. I have done software testing for 11 years and have the CSQE certification from ASQ.

As for the super KO task on the tactical list, would using (hashes/message digests) to encode the board position of a given move be adequate.

What I envision is this:

Convert the board position at the end of each move into a 361-character message consisting of characters from a 3 character alphabet.

In English the letters could be O, B, and W for Open, Black and White respectively.
Use a fixed serpentine path through the board; E.g. A1-A19, B1-B19, C1-19,..H1-H19, J1-19, …, R1-R19, S1-19, T1-T19.
For example the string at the end of move zero (beginning of the game) the string would be:

i. 361 O’s for no handicap stones,

ii. 60 O’s, a B, 239 O’s, a B, and 60 O’s if 2 handicap stones are placed by Japanese rules

With the 361 character string generated, create a hash of the string.

Any sound, cryptographic hash will do. The possibilities are 128, 160, 192, 256, 512 or 1024-bits.

i. I will use 512-bits (SHA512) in this example.

ii. This is represented in memory as an array of

1. 64, 8-bit bytes,

2. 32, 16-bit words,

3. 16, 32-bit double words

4. 8, 64-bit quad words

iii. Since the number is used internally by the program, there should be no interoperability issues or even little/big-endian issues.

Store the hash/move information; either

With the hash stored as part of the move information,
As a separately managed list of values which can be searched efficiently for a 32-bit target. The first 32-bits of the message digest would be the key value used for list.

The super Ko rule then is this:

At the end of each move (or proposed move in the case of is_legal()) construct the 361-byte message which is the board position and the hash(es).
Search for the 32-bit target number in the list of past moves; either

By walking the list of moves from first to last (because the move information has the hash information)
By searching the hash list in some binary/efficient way.

If the search for the target 32-bit value does not matched the 32-bit key value of any past move, Super Ko has certainly NOT been violated. Return NO_SUPER_KO.
If the search for the 32-bit value does match some past move, Super Ko probably HAS been violated. But, there is a chance the Super Ko has not been violated.

Treat the expensive hash as an array of 32-bit (or 64-bit) numbers.
Compare each array element to the corresponding array element of the expensive hash of the target pattern.

i. If different, Ko has certainly NOT been violated. Return NO_SUPER_KO.

ii. If same , continue to next pair array elements.

If all 16 (8) pairs of 32-bit (64-bit) numbers match, SUPER_KO has been violated with a probability of 1-(2^(-512)). I would return SUPER_KO_VIOLATED: same as move M at this point and take my lumps on the small chance of error.

But, if a 1 in 2^512 chance of erroneously returning SUPER_KO_VIOLATED: same as move M.

Compare the actual board patters of the target pattern and the board pattern of past move M identified in steps 3 and 4. SUPER_KO_VIOLATED: same as move M. Otherwise return NO_SUPER_KO.

Repeat steps 2 through 5 for all past moves.

There are only 3^361 possible board patterns. This is less than 2^573. Thus, SHA512 with 512-bits should prove an efficient substitute for encoding the whole-board pattern of any move. But, if the fact that there are 2^61 of the possible 3^361 board patterns which “fold” into the same 512-bit message digest, then create 2, SHA512 hashes. One hash is for the original board string and the second hash is for the lexical reversal of the board string. Use strrev() for example. This reversal for a bouble hash will only be effective if the hash algorithm is cryptographically sound. CRC’s would not work since the CRC of message and the CRC of the reversal is (by design) the same.

The reason to use a sound, cryptographic hash algorithm is that the:

1. The hash is designed to be computationally efficient.

2. The hash is designed to minimize collision (2 different messages yields the same hash value)

3. The hash is designed to maximize diffusion. (A single bit change anywhere in the board pattern is nearly equi-likely to change a given bit in the hash value). This increases the likelihood that a move anywhere on the board affected a bit in the first 32-bits of the hash. Thus, the difference of board position can be detected quickly.

In summary:

Two, 512-bit hashes (128, 8-bit bytes) can encode any board position without loss of information.

A single 512-bit hash (64, 8-bit bytes) can encode any board position with some loss of information; on the order of 1 part in 2^512.

A single 256-bit hash (32, 8-bit bytes) can encode any board position with some loss of information; on the order of 1 part in 2^256.

I recommend the double SHA512 approach. This is a total memory usage of 35,200 bytes for the double hashes of a game consisting of 275 moves. I do not know if this storage requirement is too large for GNU Go.

Hope this helps.

In Liberty

John Washburn

From:	John Washburn
Subject:	[gnugo-devel] Regression test suites
Date:	Thu, 28 Jul 2005 07:55:39 -0600