Re: Are these bugs in cluster?
From: John Darrington
Subject: Re: Are these bugs in cluster?
Date: Sat, 30 May 2015 09:13:16 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, May 29, 2015 at 09:33:30AM -0500, Alan Mead wrote:
> John suggested that I post to pspp-dev. I'm adding code to the k-means
> (i.e., quick-cluster.c) procedure to show cluster membership.
> CLUSTER works perfectly on a trivial two-dimensional problem, but it
> fails miserably on some real data. For example, in one analysis
> requesting 3 clusters on 98 cases, it found that everyone was in cluster
> 3 and zero people were in clusters 1 and 2. I think part of it is that
> the starting values seem to be a pattern of ones and zeros, even though
> the comments describe selecting random individuals as starting values.
> My question is about accessing the data. I copied other code to use a
> "casereader" to iterate over the rows of data. Below are the relevant
> parts of the code I've added that seem to display cluster membership.
> If I want to randomly select cases as starting values, is there a way to
> retrieve random records directly?
Ben is the casereader expert! Maybe he can comment? But I think you might
be able to use the function casereader_select (defined in casereader-select.c):

    casereader_select (subreader, random_number - 1, random_number + 1, 1);

You would have to ensure that random_number was within the range of subreader.
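For k starting values one would presumably want k *different* case numbers, each inside [0, n). Below is a minimal, self-contained sketch of just that step, assuming the number of cases n is known in advance. It uses Floyd's algorithm (swapped in here for illustration, not something taken from PSPP) plus rejection sampling over rand () to avoid modulo bias. The names uniform_below and choose_distinct_cases are hypothetical, and the sketch does not call casereader_select; passing each number on to a casereader is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns an integer drawn uniformly from [0, bound).  Assumes
   0 < bound <= RAND_MAX + 1.  Rejection sampling avoids the bias that
   plain rand () % bound has when bound does not divide RAND_MAX + 1. */
static long
uniform_below (long bound)
{
  unsigned long range = (unsigned long) RAND_MAX + 1;
  /* Largest multiple of BOUND not exceeding RANGE. */
  unsigned long limit = range - range % (unsigned long) bound;
  unsigned long r;

  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (long) (r % (unsigned long) bound);
}

/* Hypothetical helper: fills OUT[0..k-1] with K distinct case numbers
   drawn uniformly from [0, n) using Floyd's algorithm, e.g. one starting
   case per requested cluster.  Requires 0 < k <= n. */
static void
choose_distinct_cases (long n, long k, long *out)
{
  long count = 0;

  assert (0 < k && k <= n);
  for (long j = n - k; j < n; j++)
    {
      long t = uniform_below (j + 1);   /* t is in [0, j]. */
      bool taken = false;

      for (long i = 0; i < count; i++)
        if (out[i] == t)
          {
            taken = true;
            break;
          }
      /* J itself cannot have been chosen yet: earlier picks are all < j. */
      out[count++] = taken ? j : t;
    }
}
```

For the example in the quoted message, choose_distinct_cases (98, 3, picks) would yield three different case numbers between 0 and 97 to use as initial cluster centres. The linear membership check is fine for a handful of clusters.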
Alternatively, we might be able to come up with a function similar to
casereader_select, which advances the subreader by a (pseudo) random number
on each read.
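That alternative could look roughly like the following hedged sketch. It works on case numbers only and does not use the real casereader API; struct random_skip_reader and its functions are hypothetical. Each read skips a random number of cases and returns the next selected one, using Knuth's selection sampling (Algorithm S), so exactly k cases come out, in increasing order, after a single forward pass. It assumes the total number of cases n is known up front; if it is not, reservoir sampling is the usual substitute.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a casereader filter that yields K case numbers
   chosen uniformly at random from [0, N), in increasing order. */
struct random_skip_reader
  {
    long n;      /* Total number of cases in the underlying reader. */
    long k;      /* Number of cases still to be selected. */
    long next;   /* Case number the underlying reader is positioned at. */
  };

/* Uniform integer in [0, bound); assumes 0 < bound <= RAND_MAX + 1. */
static long
uniform_below (long bound)
{
  unsigned long range = (unsigned long) RAND_MAX + 1;
  unsigned long limit = range - range % (unsigned long) bound;
  unsigned long r;

  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (long) (r % (unsigned long) bound);
}

/* Requires 0 <= k <= n, and n <= RAND_MAX + 1 because of uniform_below. */
static void
random_skip_reader_init (struct random_skip_reader *r, long n, long k)
{
  assert (0 <= k && k <= n);
  r->n = n;
  r->k = k;
  r->next = 0;
}

/* Advances past a random number of cases and returns the number of the
   next selected case, or -1 once all K cases have been returned. */
static long
random_skip_reader_read (struct random_skip_reader *r)
{
  while (r->k > 0)
    {
      long remaining = r->n - r->next;   /* Cases not yet examined; >= k. */
      long c = r->next++;

      /* Select case C with probability k / remaining.  When k equals
         remaining this always succeeds, so exactly K cases are returned. */
      if (uniform_below (remaining) < r->k)
        {
          r->k--;
          return c;
        }
    }
  return -1;
}
```

Because the cases come out in increasing order, a real version could read the underlying cases strictly sequentially and discard the skipped ones, which matches how a casereader is normally consumed.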
Disclaimer: These are first ideas, which I haven't thought through to any
degree.
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.