bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new coreutil? shuffle - randomize file contents


From: Paul Eggert
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Thu, 02 Jun 2005 23:31:26 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.4 (gnu/linux)

Philip Rowlands <address@hidden> writes:

> I'm still interested to read what Paul considers to be the
> difficulties of such an implementation?

Suppose you're randomizing an input file of 10 million lines.  And
suppose you want to approximate a "truly random" key by using a
128-bit random key for each input line.

Then you'll need about 1.3 billion random bits.  On my desktop,
/dev/random generates only about 200 random bits per second, and it'll
take me about 2.5 months to randomize the file.

If you're clever you can tune things so you need only about 0.22
billion random bits: this is because the log base 2 of (10 million
factorial) is about 220 million (I used Stirling's approximation).
But even with that optimization it'll still take me a couple of weeks
to randomize the file.

One way to be clever is to use the Knuth shuffle (a.k.a. the
Fisher-Yates shuffle).  However, this doesn't easily generalize to an
algorithm that will work efficiently if the input is too large to fit
into main memory.  There are other ways to be clever even for large
files, but I haven't had time to think them through.

And don't forget: even if we're clever, it still takes me a couple of
weeks to randomize a 10-million line file.

For more on this subject, please see:

http://en.wikipedia.org/wiki/Knuth_shuffle




reply via email to

[Prev in Thread] Current Thread [Next in Thread]