octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: binocdf inaccuracy in Octave


From: Daniel J Sebald
Subject: Re: binocdf inaccuracy in Octave
Date: Mon, 08 Jul 2013 15:10:06 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 07/08/2013 12:33 PM, Daniel J Sebald wrote:
On 07/08/2013 12:29 PM, Daniel J Sebald wrote:
On 07/08/2013 11:38 AM, Rik wrote:
7/8/13

Dr. Klein,

I think this is actually a much easier problem to solve that it first
appeared. In the the file binocdf.m the formula used to calculate the
CDF is

cdf(k) = 1 - betainc (p, tmp + 1, n - tmp);

According to Wikipedia
(http://en.wikipedia.org/wiki/Binomial_distribution) the CDF for the
binomial distribution is

\textstyle I_{1-p}(n - k, 1 + k)

or

I(1-p, n-k, 1+k)

So it appears that we simply have the arguments wrong to the betainc
function.

But it is also true that

\textstyle I_{x}(a, b) = I_{1-x}(b, a)

Oops, make that

I_{x}(a,b) = 1 - I_{1-x}(b,a)

Dan

http://en.wikipedia.org/wiki/Regularized_incomplete_beta_function#Incomplete_beta_function



In other words, the two expressions you are comparing are mathematically
equivalent. Where is the discrepancy then? Numerical issues?

We should be a little careful here. There was a discrepancy at one of the extreme ends of the function. Your modification may have just moved that discrepancy to the other end of the range, i.e., when the argument is 1e-3 as opposed to 1-1e-3. And we aren't 100% sure which is correct (but it is looking like Octave because of the strange string of zeros at the front), Erlang or Octave. Octave is using a FORTRAN routine:

double
betainc (double x, double a, double b)
{
  double retval;
  F77_XFCN (xdbetai, XDBETAI, (x, a, b, retval));
  return retval;
}

So, we should introduce some tests, if possible, that check results inherent to Octave's results. There are some tests, which I haven't explored, as part of the betainc involving "eps".

There doesn't seem to be any tests as part of binocdf that check for accuracy, mainly just sanity of the inputs. One or more tests along the lines of what Alex referred to should be useful. That is, compute the CDF two ways, one from binocdf and one from cumulating the probability mass function

p = 1-1e-3;
pval = binopdf ([0:50]', 50, p);
cumsum (pval) - binocdf ([0:50]', 50, p)

and check those results.  Here is what I'm seeing:

p = 1-1e-3;
pval = binopdf ([0:50]', 50, p);
cumsum (pval) - binocdf ([0:50]', 50, p)
ans =

   1.00000000000007e-150
   4.99510000000035e-146
   1.22260117600006e-141
   1.95424813815765e-137
   2.29399723360430e-133
   2.10841676614637e-129
   1.57977022596915e-125
   9.92031009822021e-122
   5.32697826475548e-118
   2.48350747999872e-114
   1.01724999177793e-110
   3.69550684776946e-107
   1.19987796084329e-103
   3.50394897537906e-100
    9.25151215525190e-97
    2.21822759875234e-93
    4.84771698248160e-90
    9.68615422923810e-87
    1.77409990314262e-83
    2.98511415284120e-80
    4.62253801912459e-77
    6.59738446954312e-74
    8.68836583529609e-71
    1.05672286748147e-67
    1.18770467184289e-64
    1.23406745773931e-61
    1.18550990718313e-58
    1.05282245558334e-55
    8.64033611769894e-53
    6.54884440079011e-50
    4.58011354682620e-47
    2.95231633143266e-44
    1.75142047732177e-41
    9.54507616566026e-39
    4.76856227562884e-36
    2.17814373131045e-33
    9.06845557842062e-31
    3.42871128091703e-28
    1.17213664214074e-25
    3.60414974466429e-23
    9.90534035957436e-21
    2.41464425822922e-18
   -3.79109725646771e-17
    5.02325782207112e-17
   -7.98300326864362e-18
    5.36716049465165e-17
   -1.89714998454670e-17
    2.55465136891897e-18
    1.04083408558608e-17
    1.94289029309402e-16
    4.44089209850063e-16

Rik, that string of zeros at the front of binocdf () has me wondering if the issue here is that the FORTRAN routine computing betainc () is single precision float and not double precision float. Note that the binopdf.m script uses the following computations:

    pdf(k) = exp (gammaln (n+1) - gammaln (x(k)+1) - gammaln (n-x(k)+1)
                  + x(k)*log (p) + (n-x(k))*log (1-p));

Are these all double precision operations?

Dan


reply via email to

[Prev in Thread] Current Thread [Next in Thread]