# SPSS K-means & R

What would be the best function/package to use in R to try and replicate the K-means clustering method used in SPSS? Here is an example of the syntax I would use in SPSS:

```QUICK CLUSTER VAR1 TO VAR10
/MISSING=LISTWISE
/CRITERIA=CLUSTER(5) MXITER(50) CONVERGE(.02)
/METHOD=KMEANS(NOUPDATE)
```

Thanks!

In SPSS, use the /PRINT INITIAL option. This will give you the initial cluster centers, which seem to be fixed in SPSS, but random in R (see ?kmeansfor parameter centers).

If you use the printed initial cluster centers from SPSS output and the argument="Lloyd" parameter in kmeans, you should get the same results (at least it worked for me, testing with several repetitions).

Example of an SPSS-output of the initial cluster centers:

```           Cluster
Cl1  Cl2  Cl3  Cl4
Var A      1    1    4    3
Var B      4    1    4    1
Var C      1    1    1    4
Var D      1    4    4    1
Var E      1    4    1    2
Var F      1    4    4    3
```

This table, replicated as matrix in R, with kmeans computation:

```mat <- matrix(c(1,1,4,3,4,1,4,1,1,1,1,4,1,4,4,1,1,4,1,2,1,4,4,3), nrow=4, ncol=6)
kmeans(na.omit(data.frame), centers=mat, iter.max=20, algorithm="Lloyd")
```

Be sure to use the same amount of maximum iterations in SPSS and R-kemans, and use Lloyd-method in R-kmeans.

However, I don't know whether it's better to have a fixed or a random choice of initial centers. I personally like the random choice, and compute a linear discriminant analysis with the found cluster groups to assess the classification accuracy, and rerun the kmeans clustering until I have a statisfying group classification.

Edit: I found this posting where the SPSS procedure of selecting initial clusters is described. Perhaps somebody knows of an R implementation?