pspp-dev

Re: Are these bugs in cluster?


From: John Darrington
Subject: Re: Are these bugs in cluster?
Date: Sat, 30 May 2015 09:13:16 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, May 29, 2015 at 09:33:30AM -0500, Alan Mead wrote:
     John suggested that I post to pspp-dev.  I'm adding code to the k-means
     (i.e., quick-cluster.c) procedure to show cluster membership.
     
     CLUSTER works perfectly on a trivial two-dimensional problem but it
     fails miserably on some real data. For example, in one analysis
     requesting 3 clusters on 98 cases, it found that everyone was in cluster
     3 and zero people were in clusters 1 & 2.  I think part of it is that
     the starting values seem to be a pattern of 1s and 0s, even though
     the comments describe selecting random individuals as starting values.
     
     My question is about accessing the data.  I copied other code to use a
     "casereader" to iterate over the rows of data. Below are the relevant
     parts of the code I've added that seem to display cluster membership.
     If I want to randomly select cases as starting values, is there a way to
     retrieve random records directly?
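The symptom described above (every case landing in cluster 3) is easiest to see in the assignment step of k-means, where each case goes to its nearest centre. The sketch below is not the code in quick-cluster.c. It is a minimal, independent illustration that assumes cases and centres are stored as plain row-major arrays of doubles and uses squared Euclidean distance. The names nearest_centre and assign_cases are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns the index of the centre nearest to case X,
   using squared Euclidean distance.  CENTRES holds K rows of NVARS values. */
static int
nearest_centre (const double *x, const double *centres, int k, int nvars)
{
  int best = 0;
  double best_d = 0.0;

  assert (k > 0 && nvars > 0);
  for (int c = 0; c < k; c++)
    {
      double d = 0.0;
      for (int v = 0; v < nvars; v++)
        {
          double diff = x[v] - centres[c * nvars + v];
          d += diff * diff;
        }
      /* The first centre seeds the minimum; later ones must beat it.  */
      if (c == 0 || d < best_d)
        {
          best_d = d;
          best = c;
        }
    }
  return best;
}

/* Hypothetical helper: writes the 0-based cluster of each of the N cases
   (rows of NVARS values in CASES) into MEMBERSHIP.  */
static void
assign_cases (const double *cases, long n, const double *centres,
              int k, int nvars, int *membership)
{
  for (long i = 0; i < n; i++)
    membership[i] = nearest_centre (cases + i * nvars, centres, k, nvars);
}
```

For instance, with one variable whose values are 45, 47 and 55 and starting centres 0, 1 and 50, every case is assigned to the third centre. This is one way poorly chosen starting values (such as a pattern of 0s and 1s) can leave the other clusters empty.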
     

Ben is the casereader expert!  Maybe he can comment?  But I think you might 
be able to use the function casereader_select (defined in casereader-select.c):

casereader_select (subreader, random_number - 1, random_number + 1, 1);

You would have to ensure that random_number was within the range of subreader.
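Keeping a random number within range is one part of the problem. For K starting values one also wants K distinct cases. The sketch below is not PSPP code. It assumes the total number of cases N is known in advance, uses the C library's rand () only as a stand-in generator, and the names random_below and choose_start_indices are hypothetical. It draws K distinct case numbers uniformly from [0, N) with Floyd's algorithm and sorts them in ascending order, so that a strictly sequential reader only ever needs to skip forward to reach each chosen case.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform integer in [0, N).  rand () is only a stand-in generator;
   rejection sampling avoids the bias of a plain rand () % N.
   Limited to N <= RAND_MAX + 1.  */
static long
random_below (long n)
{
  unsigned long range = (unsigned long) RAND_MAX + 1;
  unsigned long limit;
  unsigned long r;

  assert (n >= 1 && (unsigned long) n <= range);
  limit = range - range % (unsigned long) n;
  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (long) (r % (unsigned long) n);
}

static int
compare_longs (const void *a_, const void *b_)
{
  long a = *(const long *) a_;
  long b = *(const long *) b_;
  return (a > b) - (a < b);
}

/* Hypothetical helper: stores K distinct case numbers drawn uniformly
   from [0, N) into OUT, in ascending order (Floyd's algorithm).
   Returns 0 on success, -1 if K is out of range.  */
static int
choose_start_indices (long n, int k, long *out)
{
  int count = 0;

  if (k < 0 || k > n)
    return -1;
  for (long j = n - k; j < n; j++)
    {
      long t = random_below (j + 1);
      for (int i = 0; i < count; i++)
        if (out[i] == t)
          {
            /* T was already chosen; J cannot have been, so take it.  */
            t = j;
            break;
          }
      out[count++] = t;
    }
  qsort (out, (size_t) count, sizeof *out, compare_longs);
  return 0;
}
```

Sorting the indices is the design choice that matters for a casereader: the chosen cases can then be collected in a single forward pass instead of needing random access.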

Alternatively, we might be able to come up with a function similar to
casereader_select which advances the subreader by a (pseudo) random number
on each read.
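A related single-pass technique, which does not even need the number of cases in advance, is reservoir sampling (Vitter's Algorithm R). Instead of skipping a random number of cases, it reads every case and keeps each one with the right probability, so that after the last read the K retained cases are a uniform random sample. The sketch below is a hypothetical illustration, not PSPP API: struct reservoir, reservoir_init, reservoir_feed and reservoir_destroy are invented names, each "case" is assumed to be a plain row of doubles, and rand () stands in for a proper random number generator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Uniform integer in [0, N) via rejection sampling on rand () (stand-in
   generator, limited to N <= RAND_MAX + 1).  */
static long
random_below (long n)
{
  unsigned long range = (unsigned long) RAND_MAX + 1;
  unsigned long limit;
  unsigned long r;

  assert (n >= 1 && (unsigned long) n <= range);
  limit = range - range % (unsigned long) n;
  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (long) (r % (unsigned long) n);
}

/* Hypothetical reservoir holding K candidate starting cases.  */
struct reservoir
  {
    int k;          /* Number of cases to keep. */
    int nvars;      /* Values per case. */
    long seen;      /* Cases fed so far. */
    double *rows;   /* K * NVARS values, one row per kept case. */
  };

static int
reservoir_init (struct reservoir *r, int k, int nvars)
{
  assert (k > 0 && nvars > 0);
  r->k = k;
  r->nvars = nvars;
  r->seen = 0;
  r->rows = malloc (sizeof *r->rows * (size_t) k * (size_t) nvars);
  return r->rows != NULL ? 0 : -1;
}

/* Offers one case (a row of NVARS values) to the reservoir.  The first K
   cases fill the slots; after that, case number SEEN replaces a random
   slot with probability K / (SEEN + 1).  */
static void
reservoir_feed (struct reservoir *r, const double *row)
{
  long slot = r->seen;

  if (r->seen >= r->k)
    slot = random_below (r->seen + 1);
  if (slot < r->k)
    memcpy (r->rows + slot * r->nvars, row, sizeof *row * (size_t) r->nvars);
  r->seen++;
}

static void
reservoir_destroy (struct reservoir *r)
{
  free (r->rows);
  r->rows = NULL;
}
```

In a k-means procedure one would, under these assumptions, feed each case to the reservoir while iterating over the data once, then use the K kept rows as the initial centres.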


Disclaimer:  These are first ideas, which I haven't thought through to any
degree.

J'




-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.


