[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: problem with rand ?
From: |
Paul Kienzle |
Subject: |
Re: problem with rand ? |
Date: |
Wed, 28 Jul 2004 01:57:23 -0400 |
Dimitri,
The first thing to note is that I'm only using 32 bits of randomness,
not 53.
I was wondering if that might be a problem. Apparently it is.
With a slight change to the code (using rand53() rather than randu())
the following expression usually equals 0:
length(find(diff(sort(rand(1e6,1)))==0))
Calculating the probability of at least one duplicate is the birthday
paradox:
1-p = m!/(m^n (m-n)!)
Using Stirling's approximation,
log(1-p) ~ (m-n+1/2)(log m - log m-n) - n
For m=2^32, n=1e6, 1-p ~ 2.7e-51 => very high probability of
duplicates.
For m=2^53, n=1e6, 1-p ~ 0.99983 => very low probability of
duplicates.
For m=2^53, n=6e7, 1-p ~ 0.55 => even chance of a duplicate.
For m=2^32, n=7.7e4, 1-p ~ 0.50 => even chances of a duplicate.
Unfortunately, my computer is not hefty enough to run this experiment
for n=6e7 (it takes 15 min/trial), but running 100 trials on n=7.7e4
yields a not unexpected 54 trials with no duplicates. FWIW, the n=6e7
trial I did run yielded 0 duplicates.
The 53-bit code is some 35% slower than the 32-bit code. Unless
there are objections, I will change to 53-bits. Accuracy over speed.
- Paul
On Jul 27, 2004, at 11:46 PM, Dmitri A. Sergatskov wrote:
I found that random series returned by rand (from octave-forge) returns
a higher number of _identical_ values than I would expect:
octave:1> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
any(dss1==0)
ans = 1
octave:2> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans = [](1x0)
octave:3> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans = 60641
octave:4> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans = [](1x0)
octave:5> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans =
26998 27604
octave:6> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans = 16034
octave:7> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans = 23032
octave:8> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1);
find(dss1==0)
ans = 121716
etc...
I would think that for random 2^17 numbers probability of two identical
numbers would be something like 2^17 / 2^54 (assuming ieee 64 bit
double
representation)
Matlab does not show this behavior even for larger series:
>> s1=rand(1,256*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)
ans =
0
>> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)
ans =
0
>> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)
ans =
0
Am I missing something here?
Sincerely,
Dmitri.
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------