Shuffling elements in a dataset
From: John W. Eaton
Subject: Shuffling elements in a dataset
Date: Fri, 21 Feb 1997 19:02:35 -0600
[Sorry about posting this more than once; it got away while I was
playing around with the example code, before I was ready to send
it. jwe]
On 21 Feb 1997, Ted Harding <address@hidden> wrote:
 I find that something like

 [dummy,ix] = sort(rand(1,rows(x))); new_x = x(ix,:);

 seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
 or for 1000000, on a 386DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
 for 100000 rows, or for 1000000, on Pentium120, i.e. almost independent
 of number of rows. However for 10000000 rows it starts swapping and takes
 a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
 rata for extra columns (RAM limit).
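The trick in the one-liner above is to generate one random key per row, sort the keys, and use the returned sort permutation to reorder the rows. A minimal sketch of the same idea in Python/NumPy (not from the original post; `shuffle_rows` is an illustrative name):

```python
import numpy as np

def shuffle_rows(x, rng=None):
    """Shuffle the rows of a 2-D array by sorting random keys.

    Mirrors the Octave idiom
        [dummy, ix] = sort (rand (1, rows (x))); new_x = x(ix,:);
    where argsort plays the role of sort's second return value: it
    gives the permutation that puts the random keys in order.
    """
    rng = rng or np.random.default_rng()
    keys = rng.random(x.shape[0])   # one random key per row
    ix = np.argsort(keys)           # permutation that sorts the keys
    return x[ix, :]

x = np.arange(12).reshape(6, 2)
print(shuffle_rows(x))
```

Because the permutation comes from a comparison sort, the expected cost grows as O(n log n), which is why near-constant timing over a wide range of n would be surprising.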
Hmm. Here is what I get (counting down to minimize memory problems):
for n = [5e5, 10.^[5, 4, 3, 2, 1]]
  x = rand (n, 1);
  t = cputime ();
  [dummy, ix] = sort (rand (1, rows (x)));
  new_x = x(ix,:);
  t = cputime () - t;
  printf ("N = %8d, t = %6.2f seconds\n", n, t);
  fflush (stdout);
endfor
N = 500000, t = 13.50 seconds
N = 100000, t = 2.31 seconds
N = 10000, t = 0.18 seconds
N = 1000, t = 0.02 seconds
N = 100, t = 0.01 seconds
N = 10, t = 0.00 seconds
At N ~ 8e5, I get into paging trouble on my 64MB Pentium system.
How did you end up with almost constant time? I took Octave's sort
algorithm from Knuth. It's good, but I don't think it's *that* good.
jwe