Title: | Cluster Analysis with Trimming |
---|---|
Description: | Trimmed k-means clustering. The method is described in Cuesta-Albertos et al. (1997) <doi:10.1214/aos/1031833664>. |
Authors: | Christian Hennig |
Maintainer: | Valentin Todorov |
License: | GPL |
Version: | 0.1-5 |
Built: | 2024-11-01 11:29:51 UTC |
Source: | https://github.com/cran/trimcluster |
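Trimmed k-means clustering method of Cuesta-Albertos, Gordaliza and Matran (1997). It optimizes the k-means criterion (the sum of squared Euclidean distances of the points to their closest cluster mean) over the non-trimmed points only, where a fixed proportion of the points is trimmed.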
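To make the criterion concrete, here is a minimal base-R sketch that evaluates it for fixed means. For fixed means, the best choice of points to trim is simply the points farthest from their closest mean, so the criterion reduces to averaging the smallest squared distances. `trimmed_kmeans_crit` is a hypothetical helper, not part of trimcluster, and it assumes that ceiling((1 - trim) * n) points are kept; dividing by that fixed count does not change which solution is optimal.

```r
# Hypothetical helper (not part of trimcluster): trimmed k-means criterion
# for fixed means, i.e. the mean squared distance of the kept points to
# their closest mean.
trimmed_kmeans_crit <- function(data, means, trim = 0.1) {
  data <- as.matrix(data)    # n x p
  means <- as.matrix(means)  # k x p
  n <- nrow(data)
  # Number of non-trimmed points; this rounding rule is an assumption.
  nin <- ceiling((1 - trim) * n)
  # Squared Euclidean distances of every point to every mean (n x k).
  d2 <- sapply(seq_len(nrow(means)), function(l)
    rowSums((data - matrix(means[l, ], n, ncol(data), byrow = TRUE))^2))
  d2 <- matrix(d2, nrow = n)
  disttom <- apply(d2, 1, min)          # squared distance to closest mean
  kept <- order(disttom)[seq_len(nin)]  # the nin closest points are kept
  sum(disttom[kept]) / nin
}

# Three points near the origin and one gross outlier, one mean at the origin.
X <- rbind(c(0, 0), c(1, 0), c(0, 1), c(10, 10))
M <- rbind(c(0, 0))
trimmed_kmeans_crit(X, M, trim = 0.25)  # 0.6666667: the outlier is trimmed
trimmed_kmeans_crit(X, M, trim = 0)     # 50.5: the outlier dominates
```

With trim = 0 the single outlier contributes 200 of the total 202, which is the sensitivity that trimming is meant to remove.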

    trimkmeans(data, k, trim = 0.1, scaling = FALSE, runs = 100, points = NULL,
               countmode = runs + 1, printcrit = FALSE,
               maxit = 2 * nrow(as.matrix(data)))

    ## S3 method for class 'tkm'
    print(x, ...)

    ## S3 method for class 'tkm'
    plot(x, data, ...)

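| Argument | Description |
|---|---|
| `data` | matrix or data.frame with raw data. |
| `k` | integer. Number of clusters. |
| `trim` | numeric between 0 and 1. Proportion of points to be trimmed. |
| `scaling` | logical. If `TRUE`, the variables are centered at their means and scaled to unit variance before clustering. |
| `runs` | integer. Number of algorithm runs from initial means (randomly chosen from the data points). |
| `points` | `NULL` or a matrix whose k rows are used as initial means. |
| `countmode` | optional positive integer. After every `countmode` algorithm runs, a progress message is printed. |
| `printcrit` | logical. If `TRUE`, the criterion values of all algorithm runs are printed. |
| `maxit` | integer. Maximum number of iterations within an algorithm run. Each iteration determines all points which are closer to a different cluster center than the one to which they are currently assigned. The algorithm terminates if no more points have to be reassigned, or if `maxit` iterations have been performed. |
| `x` | object of class `tkm`. |
| `...` | further arguments to be transferred to `plot` or `plotcluster`. |
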
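The iteration described for `maxit` (assign each point to its closest mean, trim the points farthest from their closest mean, recompute the means, stop when no point is reassigned), combined with `runs` random restarts, can be sketched in base R as follows. `trim_kmeans_sketch` is hypothetical and only illustrates the idea: keeping ceiling((1 - trim) * n) points and letting an empty cluster keep its previous mean are choices of this sketch, not necessarily those of `trimkmeans`.

```r
# Hypothetical sketch of trimmed k-means (not the trimcluster source code).
# Assumptions: ceiling((1 - trim) * n) points are kept, and an empty
# cluster keeps its previous mean.
trim_kmeans_sketch <- function(data, k, trim = 0.1, runs = 10,
                               maxit = 2 * nrow(as.matrix(data))) {
  data <- as.matrix(data)
  n <- nrow(data)
  nin <- ceiling((1 - trim) * n)  # number of non-trimmed points
  best <- NULL
  for (r in seq_len(runs)) {
    # Initial means: k data points chosen at random.
    means <- data[sample.int(n, k), , drop = FALSE]
    cl_old <- rep(0, n)
    for (it in seq_len(maxit)) {
      # Squared Euclidean distances of all points to all means (n x k).
      d2 <- matrix(0, n, k)
      for (l in seq_len(k)) d2[, l] <- colSums((t(data) - means[l, ])^2)
      cl <- max.col(-d2, ties.method = "first")  # index of the closest mean
      disttom <- d2[cbind(seq_len(n), cl)]
      # Trim the n - nin points farthest from their closest mean (code k + 1).
      if (nin < n) cl[order(disttom)[(nin + 1):n]] <- k + 1
      if (all(cl == cl_old)) break  # no point was reassigned
      cl_old <- cl
      # Recompute each mean from its non-trimmed members.
      for (l in seq_len(k)) {
        if (any(cl == l)) means[l, ] <- colMeans(data[cl == l, , drop = FALSE])
      }
    }
    # Criterion: mean squared distance over the non-trimmed points.
    crit <- sum(disttom[cl <= k]) / nin
    if (is.null(best) || crit < best$crit) {
      best <- list(classification = cl, means = means, crit = crit)
    }
  }
  best
}

# Usage: two tight clusters plus two gross outliers.
set.seed(1)
X <- rbind(matrix(rnorm(40, 0, 0.3), ncol = 2),
           matrix(rnorm(40, 5, 0.3), ncol = 2),
           c(20, 20), c(-20, 20))
fit <- trim_kmeans_sketch(X, k = 2, trim = 0.05, runs = 20)
table(fit$classification)
```

Because each run can end in a local optimum, the sketch keeps the run with the smallest criterion, which is the reason for having several `runs`.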
`plot.tkm` depends on the dimensionality p of the data: if p = 1 it calls `plotcluster` (from package fpc); if p = 2 it shows a scatterplot with the non-trimmed regions; and if p > 2 it shows discriminant coordinates computed from the clusters (ignoring the trimmed points).
An object of class `tkm`, which is a list with the following components:
| Component | Description |
|---|---|
| `classification` | integer vector coding cluster membership, with trimmed observations coded as `k+1`. |
| `means` | numerical matrix giving the mean vectors of the k classes. |
| `disttom` | vector of squared Euclidean distances of all points to the closest mean. |
| `ropt` | maximum value of `disttom` among the non-trimmed points. |
| `k`, `trim`, `runs`, `scaling` | see above. |

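The components `disttom` and `ropt` together determine which points are trimmed. A small sketch, assuming that `ropt` is the largest squared distance among the kept points and that ceiling((1 - trim) * n) points are kept; `trim_summary` is a hypothetical helper, not a trimcluster function:

```r
# Hypothetical helper (not part of trimcluster): recover the trimming
# radius and the trimmed points from the squared distances `disttom`.
trim_summary <- function(disttom, trim) {
  n <- length(disttom)
  nin <- ceiling((1 - trim) * n)  # assumed number of non-trimmed points
  ropt <- sort(disttom)[nin]      # largest squared distance that is kept
  # With ties at ropt this can flag fewer than n - nin points as trimmed.
  list(ropt = ropt, trimmed = disttom > ropt)
}

d <- c(0.5, 9.0, 0.1, 1.2)
s <- trim_summary(d, trim = 0.25)
s$ropt     # 1.2
s$trimmed  # FALSE TRUE FALSE FALSE
```

Geometrically, the kept points are those inside the union of the k balls of radius sqrt(ropt) around the means.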
Christian Hennig http://www.homepages.ucl.ac.uk/~ucakche/
Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.

    library(trimcluster)
    set.seed(10001)
    # Three Gaussian clusters plus n0 widely scattered outliers.
    n1 <- 60
    n2 <- 60
    n3 <- 70
    n0 <- 10
    nn <- n1 + n2 + n3 + n0
    pp <- 2
    X <- matrix(rep(0, nn * pp), nrow = nn)
    ii <- 0
    for (i in 1:n1) {
      ii <- ii + 1
      X[ii, ] <- c(5, -5) + rnorm(2)
    }
    for (i in 1:n2) {
      ii <- ii + 1
      X[ii, ] <- c(5, 5) + rnorm(2) * 0.75
    }
    for (i in 1:n3) {
      ii <- ii + 1
      X[ii, ] <- c(-5, -5) + rnorm(2) * 0.75
    }
    for (i in 1:n0) {
      ii <- ii + 1
      X[ii, ] <- rnorm(2) * 8
    }
    # runs = 3 is used to save computing time.
    tkm1 <- trimkmeans(X, k = 3, trim = 0.1, runs = 3)
    print(tkm1)
    plot(tkm1, X)