Shuffling elements in a dataset (fwd)
From: John W. Eaton
Subject: Shuffling elements in a dataset (fwd)
Date: Fri, 21 Feb 1997 18:52:10 -0600
On 21-Feb-1997, Ted Harding <address@hidden> wrote:
 I find that something like

 [dummy,ix] = sort(rand(1,rows(x))); new_x = x(ix,:);

 seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
 or for 1000000, on a 386DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
 for 100000 rows, or for 1000000, on Pentium120, i.e. almost independent
 of number of rows. However for 10000000 rows it starts swapping and takes
 a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
 rata for extra columns (RAM limit).
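The idiom quoted above can be sketched in Python (a hypothetical translation, using plain lists in place of an Octave matrix): sorting a vector of random keys yields a uniformly random permutation of the indices, and indexing with that permutation shuffles the rows.

```python
import random

# Python sketch of the Octave idiom
#   [dummy, ix] = sort(rand(1, rows(x))); new_x = x(ix, :);
# Plain lists stand in for Octave matrices; names are illustrative.
random.seed(42)  # reproducibility only; not part of the original idiom

x = list(range(10))                               # the "rows" to shuffle
keys = [random.random() for _ in x]               # one random key per row
ix = sorted(range(len(x)), key=keys.__getitem__)  # permutation from sorting the keys
new_x = [x[i] for i in ix]                        # reorder rows by the permutation

print(new_x)  # a random permutation of 0..9
```

Note that the shuffle is driven by a sort, so its cost grows like n*log(n) rather than staying constant in n.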
Hmm. Here is what I get (counting down to minimize memory problems):
for n = [[8, 6, 4, 2]*1e5, 10.^[5, 4, 3, 2, 1]]
x = rand (n, 1);
t = cputime ();
[dummy,ix] = sort(rand(1,rows(x)));
new_x = x(ix,:);
t = cputime () - t;
printf ("N = %8d, t = %6.2f seconds\n", n, t);
fflush (stdout);
endfor
N = 500000, t = 13.96 seconds
N = 100000, t = 2.30 seconds
N = 10000, t = 0.17 seconds
N = 1000, t = 0.02 seconds
N = 100, t = 0.01 seconds
N = 10, t = 0.01 seconds
At N == 10^6, I get into paging trouble on my 64MB Pentium system.
How did you end up with almost constant time? I took Octave's sort
algorithm from Knuth. It's good, but I don't think it's *that* good.
jwe