help-octave
[Top][All Lists]

## Re: problem with rand ?

 From: Paul Kienzle Subject: Re: problem with rand ? Date: Wed, 28 Jul 2004 01:57:23 -0400

```Dimitri,

```
The first thing to note is that I'm only using 32 bits of randomness, not 53.
```I was wondering if that might be a problem.  Apparently it is.

With a slight change to the code (using rand53() rather than randu())
the following expression usually equals 0:

length(find(diff(sort(rand(1e6,1)))==0))

```
Calculating the probability of at least one duplicate is the birthday paradox:
```
1-p = m!/(m^n (m-n)!)

Using Stirling's approximation,

log(1-p) ~ (m-n+1/2)(log m - log m-n) - n

```
For m=2^32, n=1e6, 1-p ~ 2.7e-51 => very high probability of duplicates. For m=2^53, n=1e6, 1-p ~ 0.99983 => very low probability of duplicates.
```For m=2^53, n=6e7,  1-p ~ 0.55  => even chance of a duplicate.
For m=2^32, n=7.7e4, 1-p ~ 0.50 => even chances of a duplicate.

Unfortunately, my computer is not hefty enough to run this experiment
for n=6e7 (it takes 15 min/trial), but running 100 trials on n=7.7e4
yields a not unexpected 54 trials with no duplicates.  FWIW, the n=6e7
trial I did run yielded 0 duplicates.

The 53-bit code is some 35% slower than the 32-bit code.  Unless
there are objections, I will change to 53-bits.  Accuracy over speed.

- Paul

On Jul 27, 2004, at 11:46 PM, Dmitri A. Sergatskov wrote:

```
```I found that random series returned by rand (from octave-forge) returns
a higher number of _identical_ values than I would expect:

```
octave:1> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)
```ans = 1
```
octave:2> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans = [](1x0)
```
octave:3> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans = 60641
```
octave:4> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans = [](1x0)
```
octave:5> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans =

26998  27604

```
octave:6> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans = 16034
```
octave:7> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans = 23032
```
octave:8> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
```ans = 121716

etc...

I would think that for random 2^17 numbers probability of two identical
```
numbers would be something like 2^17 / 2^54 (assuming ieee 64 bit double
```representation)

Matlab does not show this behavior even for larger series:

>> s1=rand(1,256*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)

ans =

0

>> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)

ans =

0

>> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)

ans =

0

Am I missing something here?

Sincerely,

Dmitri.

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------

```
```

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------

```