coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Fwd: Re: Generating pseudo-random integers


From: Melikamp The Medley
Subject: Fwd: Re: Generating pseudo-random integers
Date: Sat, 05 Feb 2011 15:47:33 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101213 Lightning/1.0b2 Lanikai/3.1.7

On 02/05/2011 02:04 PM, Bob Proulx wrote:
> Melikamp The Medley wrote:
>> Bob Proulx wrote:
>>>> and link it with coreutils.
>>> Why?
>> So that I don't have to write a new PRNG, or anything, really, besides
>> a CL option parser. I would rather use ISAAC and randint.
> If you use awk or perl then you still don't have to do that.  So that
> isn't yet a reason for pushing this into coreutils.

You seem to be implying that I am suggesting that my yet to be written
code is to be added to coreutils. I don't believe I gave out that
impression. Of course, I would be delighted if any of
my code was ever useful to anyone, and escpecially GNU, but so far
I simply announced an intention to link with coreutils, because it
already contains a very fast implementation of a desired algorithm.
The only place I am pushing this into is my own git tree.

>> Bob Proulx wrote:
>>>   awk 'BEGIN{srand();for(i=0;i<1000;++i){print int(100*rand());}}'
>> Jim Meyering wrote:
>>> $ perl -le 'foreach (1..20) { print int rand 1000 }'
>> This should introduce the rounding bias, small as it is. I want
>> the integer-generating algorithm to have no bias on the assumption
>> that I have a truly random source.
> I am an engineer, not a theoretical mathematician, but I see no
> rounding bias there.  Why do you think that awk's and perl's random
> number generators including rounding bias?  Please provide information
> indicating the problem.  They have both been extensively analyzed and
> peer reviewed.

I disregard the bias inherent in the generator itself. ISAAC is biased too.
What I do want to see, though, is the ability to use any file as a random
source. Does awk or perl allow to change the random source? Even if they
do, they ruin it when giving you floats. In a way, POSIX is to blame :)

The bias is already introduced at the stage where the internal
state is converted into a float, even if random() itself gives unbiased
integers from 0 to (2^n - 1):

/* awk's builtin.c */
return tmp_number((AWKNUM) (random() % GAWK_RANDOM_MAX) / GAWK_RANDOM_MAX);

A user introduces even more bias when multiplying and rounding to an
integer. I won't discuss the nature of the bias, I'll just show it exists.
This entire process has an upper bound on run-time: generating a
random integer from a set {0,1,2}, for example, will be done in M instructions
or less, M some constant. But the binary nature of our computers makes it
impossible for the distribution to be uniform. If all you can do is flip a
fair coin k times (for each of k random bits), then the resulting event
space has 2^k equally likely basic events, and it is impossible to
partition it into 3 (or any non-power-of-2) equally likely events. This can
be fixed in two ways. One is having a different random source for each prime\
(an infinite set of biased coins), which is insane. The other one is to flip
fair coins until a "good" result is observed. This makes the run-time
unbounded, but is entirely practical, as the expected run-time is finite, I
believe, and is actually very low. This is how coreutils randint works.

> If you really believe you have found a problem with
> them then you should report the problem to them so that they get fixed
> and everyone benefits.

I am sure they are fully aware, but disregard the bias as negligible,
as most practical people would. I am just not that practical, and have too
much spare time on my hands. Seriously, though, if they were to satisfy
people like me, they would have to re-implement randint, and that would
be silly ;)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]