bug-gnubg
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-gnubg] Idea for encoding 2-sided database


From: Jim Segrave
Subject: Re: [Bug-gnubg] Idea for encoding 2-sided database
Date: Tue, 29 Mar 2005 18:46:35 +0200
User-agent: Mutt/1.4.2.1i

On Tue 29 Mar 2005 (10:11 +0200), Jim Segrave wrote:
> On Tue 29 Mar 2005 (08:27 +0100), Ian Shaw wrote:
> > 
> > 
> > 
> > From: Jim Segrave [mailto:address@hidden 
> > 
> > > On Mon 28 Mar 2005 (16:47 +0100), Ian Shaw wrote:
> > > > Could we reduce the size of the two-sided database by omitting gin 
> > > > positions?
> > > > 
> > > > Gnubg could assume that if the position was inside the 
> > > database space 
> > > > but the result was not stored, then the position is gin. A simple 
> > > > 0-ply evaluation could be used to verify who has won, and 
> > > as a sanity check.
> > > > 
> > > > I don't know how much effect this would have, but it would have 
> > > > greater benefits for the larger databases because more 
> > > positions would be gin.
> > > 
> > > I've not looked at the code, but I think it calculates where 
> > > to look as a fixed function of the current position. If 
> > > that's the case, then you need different indexing of the 
> > > database, which might not be a gain.
> > > 
> > I was thinking the index number would remain the same. If there is no
> > entry for that index, then the position is gin. Or is the index number
> > not actually part of the database? Does gnubg just count CRs or commas
> > or something?
> 
> I'm not at all sure what the structure of the bearoff files is. You
> might be right that there's a separate index with pointers, in which
> case it would be possible to have a value indicating no data (or
> pointing to a common entry). I think it might be time for a bit of
> scripting to see how many duplicate entries there are (gin being just
> one possibility).

I've had a look - the one sided bearoff db can have two forms:

Uncompressed (every position has a full entry - either 64 bytes or 128
bytes, containing 32/64 float values (64 if gammon probabilities are
included). 

Compressed = every position has an 8 byte entry giving the location of
the data, the number of non-zero entries in the data and the offset of
the first non-zero entry. Entries can run from 0 bytes to 128 bytes in
length, depending on the number of zero/non-zero entries.

The two sided database has a single form - every position has an 8
byte entry (winning probability and money equity for the three
possible cube ownerships) There's no indexing, so there's no way in
this format to save any space. Indexing would require at least 4
bytes/entry, so unique entries would require 12 bytes instead of 8,
duplicates would save 4 bytes. The two sided database shrinks if more
than 1/2 of the entries occur in pairs.

I ran sorts on the actual data in the two sided databases, then
counted the number of duplicated entries. From that I could calculate
the size of a file using a pointer (4 bytes) to the data (8 bytes)

6x6: 14% duplicated entries

positions: 853776, dups = 119568, oldsize = 6830208, newsize: 9517816,
growth 39.35%

6x7: 11.9% duplicated entries
positions: 2944656, dups = 351488, oldsize = 23557248, newsize: 33033656,
growth 40.23%

6x8: 10.7% duplicated entries
positions: 9018009, dups = 964888, oldsize = 72144072, newsize: 101857012,
growth 41.19%

6x9: 11.6% duplicated entries
positions: 25050025, dups = 2901128, oldsize = 200400200, newsize: 280303732, 
growth 39.87%

For those only interested in match play, a two sided database without
the cubeful equities would be a major gain - the entries drop from 8
bytes/position to 2 bytes/position. This option is availaible with the
-C or --cubeless option to makebearoff

-- 
Jim Segrave           address@hidden





reply via email to

[Prev in Thread] Current Thread [Next in Thread]