SPSS K-means & R

What would be the best function/package to use in R to try and replicate the K-means clustering method used in SPSS? Here is an example of the syntax I would use in SPSS:

QUICK CLUSTER VAR1 TO VAR10       
   /MISSING=LISTWISE                  
   /CRITERIA=CLUSTER(5) MXITER(50) CONVERGE(.02)
   /METHOD=KMEANS(NOUPDATE)

Thanks!

Answers


In SPSS, use the /PRINT INITIAL option. This will give you the initial cluster centers, which seem to be fixed in SPSS but are random in R (see ?kmeans for the centers parameter).

If you pass the initial cluster centers printed in the SPSS output to kmeans and set algorithm="Lloyd", you should get the same results (at least it worked for me, testing with several repetitions).

Example of an SPSS-output of the initial cluster centers:

           Cluster
           Cl1  Cl2  Cl3  Cl4
Var A      1    1    4    3
Var B      4    1    4    1
Var C      1    1    1    4
Var D      1    4    4    1
Var E      1    4    1    2
Var F      1    4    4    3

This table, replicated as a matrix in R (one row per cluster, one column per variable), with the kmeans computation:

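# matrix() fills column by column, so each column holds one variable's centers
mat <- matrix(c(1,1,4,3,4,1,4,1,1,1,1,4,1,4,4,1,1,4,1,2,1,4,4,3), nrow=4, ncol=6)
# mydata is a placeholder for your data frame; na.omit() drops incomplete cases
kmeans(na.omit(mydata), centers=mat, iter.max=20, algorithm="Lloyd")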

Be sure to use the same maximum number of iterations in SPSS (MXITER) and in R's kmeans (iter.max), and use the Lloyd method in kmeans.
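To make this concrete, here is a small self-contained sketch. The data are simulated (I don't have the original file), so `dat`, `init`, `fit1` and `fit2` are just illustrative names. It only shows that, once the centers are fixed and Lloyd is used, repeated runs give the same assignment. iter.max=50 is chosen to match MXITER(50) from the question, and na.omit() plays the role of /MISSING=LISTWISE. As far as I know, kmeans has no argument corresponding to CONVERGE.

```r
# Simulated stand-in for the real data: 200 cases, 6 variables scored 1-4
set.seed(1)
dat <- as.data.frame(matrix(as.numeric(sample(1:4, 200 * 6, replace = TRUE)),
                            ncol = 6))
names(dat) <- paste0("Var", LETTERS[1:6])
dat[3, 2] <- NA  # one incomplete case, removed below by na.omit()

# Initial centers copied from the SPSS table above:
# rows = clusters, columns = variables (matrix() fills column by column)
init <- matrix(c(1,1,4,3, 4,1,4,1, 1,1,1,4, 1,4,4,1, 1,4,1,2, 1,4,4,3),
               nrow = 4, ncol = 6)

# Two runs with the same fixed centers and the Lloyd algorithm
fit1 <- kmeans(na.omit(dat), centers = init, iter.max = 50, algorithm = "Lloyd")
fit2 <- kmeans(na.omit(dat), centers = init, iter.max = 50, algorithm = "Lloyd")

# With fixed centers no random start is involved, so both runs should agree
identical(fit1$cluster, fit2$cluster)
```

Whether the result also matches SPSS exactly depends on feeding in the centers that SPSS actually printed for your data.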

However, I don't know whether a fixed or a random choice of initial centers is better. I personally prefer the random choice: I compute a linear discriminant analysis on the resulting cluster groups to assess the classification accuracy, and rerun the kmeans clustering until I get a satisfying group classification.
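A rough sketch of that workflow, using the built-in iris measurements as stand-in data and lda() from the MASS package. Note one substitution: instead of rerunning by hand, this uses the nstart argument of kmeans, which tries several random starts and keeps the one with the lowest total within-cluster sum of squares. That is not the same criterion as the LDA accuracy, so treat it as an illustration rather than the exact procedure described above.

```r
library(MASS)  # provides lda()

set.seed(42)
dat <- iris[, 1:4]  # stand-in data: four numeric variables

# Try 25 random sets of initial centers and keep the best solution
fit <- kmeans(dat, centers = 3, nstart = 25, iter.max = 50)

# Use the cluster membership as the grouping variable in an LDA
grp <- factor(fit$cluster)
lda_fit <- lda(dat, grouping = grp)
pred <- predict(lda_fit, dat)$class

# Share of cases the LDA assigns back to their k-means cluster
mean(pred == grp)

# Cross-tabulation of k-means clusters against LDA predictions
table(kmeans = grp, lda = pred)
```

A high agreement only says the clusters are linearly separable in these variables. It is computed on the same cases used for the clustering, so it is optimistic.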

Edit: I found this posting where the SPSS procedure of selecting initial clusters is described. Perhaps somebody knows of an R implementation?

