Shuffling elements in a dataset (fwd)
From: John W. Eaton
Subject: Shuffling elements in a dataset (fwd)
Date: Fri, 21 Feb 1997 18:52:10 -0600
On 21-Feb-1997, Ted Harding <address@hidden> wrote:
 I find that something like

 [dummy,ix] = sort(rand(1,rows(x))); new_x = x(ix,:);

 seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
 or for 1000000, on a 386DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
 for 100000 rows, or for 1000000, on Pentium120, i.e. almost independent
 of number of rows. However for 10000000 rows it starts swapping and takes
 a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
 rata for extra columns (RAM limit).
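The idiom quoted above can be sketched in Python (a hypothetical translation, using plain lists in place of an Octave matrix): sorting a vector of random keys yields a uniformly random permutation of the indices, and indexing with that permutation shuffles the rows.

```python
import random

# Python sketch of the Octave idiom
#   [dummy, ix] = sort(rand(1, rows(x))); new_x = x(ix, :);
# Plain lists stand in for Octave matrices; names are illustrative.
random.seed(42)  # reproducibility only; not part of the original idiom

x = list(range(10))                               # the "rows" to shuffle
keys = [random.random() for _ in x]               # one random key per row
ix = sorted(range(len(x)), key=keys.__getitem__)  # permutation from sorting the keys
new_x = [x[i] for i in ix]                        # reorder rows by the permutation

print(new_x)  # a random permutation of 0..9
```

Note that the shuffle is driven by a sort, so its cost grows like n*log(n) rather than staying constant in n.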
Hmm. Here is what I get (counting down to minimize memory problems):
for n = [[8, 6, 4, 2]*1e5, 10.^[5, 4, 3, 2, 1]]
x = rand (n, 1);
t = cputime ();
[dummy,ix] = sort(rand(1,rows(x)));
new_x = x(ix,:);
t = cputime () - t;
printf ("N = %8d, t = %6.2f seconds\n", n, t);
fflush (stdout);
endfor
N = 500000, t = 13.96 seconds
N = 100000, t = 2.30 seconds
N = 10000, t = 0.17 seconds
N = 1000, t = 0.02 seconds
N = 100, t = 0.01 seconds
N = 10, t = 0.01 seconds
At N == 10^6, I get into paging trouble on my 64MB Pentium system.
How did you end up with almost constant time? I took Octave's sort
algorithm from Knuth. It's good, but I don't think it's *that* good.
jwe