bug-gnubg
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-gnubg] Re: Race database and bearoff DB


From: Joern Thyssen
Subject: Re: [Bug-gnubg] Re: Race database and bearoff DB
Date: Tue, 1 Oct 2002 16:19:43 +0000
User-agent: Mutt/1.4i

On Tue, Oct 01, 2002 at 05:20:15PM +0200, Jim Segrave wrote
> On Tue 01 Oct 2002 (13:09 +0000), Joern Thyssen wrote:
> > On Tue, Oct 01, 2002 at 02:30:41PM +0200, ?ystein O Johansen wrote
> > > 
> > > > I have ported Joseph's OSR code to a local copy of gnubg,
> > > > but I never got around testing it.
> > > 
> > > Which bearoff database do you use for this? The problem with
> > > the porting is that bearoff databases. I ported this as well
> > > but I had to use the bearoff database from Joseph, and then
> > > where was suddenly two bearoff databases for one program.
> > > 
> > > (BTW: Should we maybe find a better encoding for the gnubg
> > > bearoff database? We don't want to store all these zeros.)
> > 
> > If we try to compress it, it'll be much harder to read it. We can easily
> > compress the current gnubg.bd since it's read into memory and we can
> > just read it with functions from libz. However, for the new gnubg_os.bd
> > it'll be more complicated since we can't read it into memory. We have to
> > use something like fseek to find bearoff equities. Perhaps there is a
> > libz_fseek... I'll check.
> 
> Ugh. Most compression schemes work on a sliding window over the input
> stream, so you'd have to set re-sync points throughout the compressed
> file and seek to there, then walk forward to the data you want.
> 
> What's the actual structure of the database? 

Basically it's 32 unsigned short ints for each position. The current
one-sided bear-off database is for 15 chequers on the 1-6 points, a
total of N(21,6)=54264. 

Each of the 32 integers represent the probability that we bear off in i
moves (multiplied by 255).

                                #Pos     #Short ints      #bytes
15 chrs on p1-6 : N(21,6)       54264     1736448        3.4 MB
15 chrs on p1-7 : N(22,7)      170544     5457408        5.4 MB
15 chrs on p1-8 : N(23,8)      490314    15690048       31.3 MB 
15 chrs on p1-9 : N(24,9)     1307504    41840128       83.7 MB
15 chrs on p1-10: N(25,10)    3268760   104600320      209.2 MB
15 chrs on p1-11: N(26,11)    7726160   247237120      494.5 MB
15 chrs on p1-12: N(27,12)   17383860   556283520     1112.6 MB

We currently use 15/1-6 which easily fits into memory and can be
referenced as:

aaBearoff1 ( position, i )

However, for 15/1-10 we cannot easily fit it into memory, so we need to
read it from file every time (obviously we're going to make some sort of
cache, but anyway). To get the 32 short ints for a given position we
need something like:

fseek ( pfBearoffHuge, 32 * 2 * position, SEEK_SET );
fread ( aBearoff, 2, 32, pfBearoffHuge );

If we decide to, say, gzip pfBearoffHuge it'll be more complicated to
fseek to the right place.

> There are a lot of
> algorithms out there for storing data, indexed files, etc. and we have
> a pool of people with the ability to adapt/implement them. With a
> better view of what's in the database (and maybe a large enough sample
> of raw data), there will be some simple run length encoding scheme to
> allow reducing the size without adversely impacting performance

Yes, I'll start by making it work with a simple algorithm, then we can
try to optimise size and speed.

I have a real space saving idea, but I'm not sure if it would work! 

We have aaBearoff ( position, i ) that gives the probability that I
bear-off in i moves. If we assume aaBearoff ( position, i ) is normal
distributed over i we may substitute with normal distribution:

aaBearoff (position,i ) ~= phi ( position, i )

where phi is a normal distribution characterised by mu(position) and
sigma(position).

Usually we calculate the winning chance as:

p = \sum_{i=0}^(31} ( aaBearoff ( my position, i ) *
                      [ \sum_{j=i}^{31] aaBearoff ( opp position, j ) ] )
                                                                (Eq. 1)

However, using the normal distribution we get:

p = \int_{i=0}^{\infty} ( phi ( my position, i ) *
                          [ \int_{j=i}^{oo} phi ( opp position, j ) dj ] di )
                                                                (Eq. 2)

The idea is that instead of storing 64 bytes per position only 8 bytes
is requirement to store mu and sigma (assuming we store mu and sigma in
single precision), hence we have a reduction of factor 8. Basically this
would allow us to store the 15/1-14 DB on one CD.

What remains is: 

- how the hell do I integrate Eq. (2)? I can probably use erf for the
  inner integral but the outer integral has to be integrated with some
  numerical scheme.
- is it ok to approximate aaBearoff ( position, i ) with a normal
  distribution?

I think I'll program this for the small database and see how it works.

Jørn

-- 
Joern Thyssen, PhD
Vendsysselgade 3, 3., DK-9000 Aalborg, Denmark
+45 9813 2791 (private) / +45 2818 0183 (mobile) / +45 9633 7036 (work)
Note: new mobile number!




reply via email to

[Prev in Thread] Current Thread [Next in Thread]