[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: why there are multiple functions for CDF of beta distribution?
From: |
Vasu Jaganath |
Subject: |
Re: why there are multiple functions for CDF of beta distribution? |
Date: |
Fri, 10 Jan 2020 11:48:15 -0700 |
Thanks Martin,
I settled on similar approach, I use the bisection method in the
betainv.c, This approach seems much cleaner, I however have some strange
corner cases where a>100 and b >100, I handle them separately.
I will try to benchmark both approaches with my data to see which is faster.
On Fri, Jan 10, 2020 at 1:04 AM Martin Jansche <address@hidden> wrote:
> Yup, this is a problem. When I compile and run your program gsl_invBeta.c
> (which should #include <gsl/gsl_cdf.h>), I get:
>
> Observed:
> gsl: betainv.c:181:
> <http://git.savannah.gnu.org/cgit/gsl.git/tree/cdf/betainv.c#n181> ERROR:
> inverse failed to converge
> Default GSL error handler invoked.
> Abort trap: 6
>
> Expected result:
> 1.0
>
> Result with more precision:
> 0.99999999999999999999999999980178072518730245884823...
>
> For the supplied values of alpha and beta, the CDF rises very steeply near
> 1 and the inverse CDF is equal to 1.0 within double precision when P>0.5
> (that may not be a tight bound, but is a sufficient condition).
>
> Generic workaround:
>
> #include <stdio.h>
> #include <gsl/gsl_sf_gamma.h>
>
> // Computes the inverse function of gsl_sf_beta_inc by binary search.
> double qbeta(double p, double a, double b) {
> double x = 0.5;
> for (double delta = 0.25; delta > 0; delta /= 2) {
> const double beta_P = gsl_sf_beta_inc(a, b, x);
> const double new_x = p > beta_P ? x + delta : x - delta;
> if (beta_P == p || x == new_x) break;
> x = new_x;
> }
> return x;
> }
>
> int main() {
> double a = 5.3729222;
> double b = 0.018122864;
> double p = 0.67276126;
>
> printf("%.18g\n", qbeta(p, a, b));
>
> printf("%.7g (expected: 0.25)\n", qbeta(0.25, 1, 1));
> printf("%.7g (expected: 0.9922861)\n", qbeta(0.25, 5, 0.1));
> printf("%.7g (expected: 2.954261e-200)\n", qbeta(1e-50, 0.25, 0.5));
>
> return 0;
> }
>
> Hope this helps,
> -- mj
>
> On Fri, Jan 10, 2020 at 1:30 AM Vasu Jaganath <address@hidden>
> wrote:
>
>> Martin and others,
>>
>> Following up, I have a particular example for which both Q and P variants
>> of inverse functions fail.
>>
>> where as scipy beta.ppf (equivalent to gsl_cdf_beta_Pinv) does the right
>> thing and converges to 1 or nearest double.
>>
>> I am attaching a very simple example with exact same data, the gsl
>> function fails where as scipy function does the right thing.
>>
>> Any thoughts? any workarounds? Maybe there is a way I can specify
>> convergence criteria?
>>
>> Thanks,
>> Vasu
>>
>>
>> On Wed, Jan 8, 2020 at 12:42 PM Vasu Jaganath <address@hidden>
>> wrote:
>>
>>> Thanks Martin,
>>>
>>> I will test it out.
>>>
>>> On Tue, Jan 7, 2020 at 11:16 PM Martin Jansche <address@hidden>
>>> wrote:
>>>
>>>> There are many more floating point values between 0.0 and 0.001 than
>>>> there are between 0.999 and 1.0. The difference between 1.0 and the next
>>>> smaller double value is only around 1e-16, but the next larger double value
>>>> after 0.0 is about 1e-303. So beta_P(0.9, 1, 17) will be necessarily
>>>> equivalent to 1.0 due to lack of precision, whereas beta_Q(0.9, 1, 17) will
>>>> be 1e-17. (Haven't tried this in GSL. You may want to try and report back.)
>>>>
>>>> On Wed, Jan 8, 2020 at 1:35 AM Vasu Jaganath <address@hidden>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> This is probably a very silly question,
>>>>>
>>>>> I don't understand why there are two separate P and Q variants for
>>>>> CDFs?
>>>>> particularly for beta distribution?
>>>>>
>>>>>
>>>>> https://www.gnu.org/software/gsl/doc/html/randist.html#the-beta-distribution
>>>>>
>>>>> Thanks,
>>>>> Vasu
>>>>>
>>>>