(Sparse) Robust Principal Components using the Grid search algorithm
Description
Computes a desired number of (sparse) (robust) principal components using
the grid search algorithm in the plane.
The global optimum of the objective function is searched in planes, not in
the p-dimensional space, using regular grids in these planes.
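The search within a single plane can be illustrated with a minimal sketch in base R. It assumes two-column data that is already centered and uses the MAD as scale estimator; `grid_direction` is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of pcaPP): grid search for the direction
# of largest robust scale in the plane.
grid_direction <- function(x, splitcircle = 25, scale_fun = mad) {
  stopifnot(is.matrix(x), ncol(x) == 2)
  # equally spaced angles on the half circle [-pi/2, pi/2)
  theta <- seq(-pi / 2, pi / 2, length.out = splitcircle + 1)[-(splitcircle + 1)]
  # each column of dirs is a unit vector (cos(theta), sin(theta))
  dirs <- rbind(cos(theta), sin(theta))
  # scale of the data projected onto every candidate direction
  obj <- apply(x %*% dirs, 2, scale_fun)
  best <- which.max(obj)
  list(direction = dirs[, best], angle = theta[best], objective = obj[best])
}

set.seed(1)
# toy data: much larger spread along the first coordinate
x <- cbind(rnorm(500, sd = 5), rnorm(500, sd = 1))
x <- sweep(x, 2, apply(x, 2, median))  # center by the columnwise median
res <- grid_direction(x)
res$direction
```

Only a half circle is searched because a direction and its negative give the same projected scale.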
Usage
PCAgrid (x, k = 2, method = c ("mad", "sd", "qn"),
maxiter = 10, splitcircle = 25, scores = TRUE, zero.tol = 1e-16,
center = l1median, scale, trace = 0, store.call = TRUE, control, ...)
sPCAgrid (x, k = 2, method = c ("mad", "sd", "qn"), lambda = 1,
maxiter = 10, splitcircle = 25, scores = TRUE, zero.tol = 1e-16,
center = l1median, scale, trace = 0, store.call = TRUE, control, ...)
Arguments
x |
a numeric matrix or data frame of dimension (n x p) which
provides the data for the principal components analysis.
|
k |
the desired number of components to compute
|
method |
the scale estimator used to detect the direction with the
largest variance. Possible values are "sd", "mad" and
"qn"; the latter may also be written "Qn". "mad" is the
default value.
|
lambda |
the strength of the sparseness constraint (sPCAgrid only).
A single value for all components, or a vector of length k with
different values for each component can be specified.
See opt.TPO for the choice of this argument.
|
maxiter |
the maximum number of iterations.
|
splitcircle |
the number of directions in which the algorithm should
search for the largest variance. The candidate directions are defined by
equally spaced points on the unit circle; this argument determines how many
such points are used to split the unit circle.
|
scores |
a logical value indicating whether the scores of the
principal components should be calculated.
|
zero.tol |
the zero tolerance used internally for checking
convergence, etc.
|
center |
this argument indicates how the data is to be centered. It
can be a function like mean or median or a vector
of length ncol(x) containing the center value of each column.
|
scale |
this argument indicates how the data is to be rescaled. It
can be a function like sd or mad or a vector
of length ncol(x) containing the scale value of each column.
|
trace |
an integer value >= 0, specifying the tracing level.
|
store.call |
a logical variable, specifying whether the function call
shall be stored in the result structure.
|
control |
a list whose elements must be the same as (or a subset of)
the parameters above. If the control object is supplied, its parameters
are used and override any values given for the same parameters directly.
|
... |
further arguments passed to or from other functions.
|
Details
In contrast to PCAgrid, the function sPCAgrid computes sparse
principal components. The strength of the applied sparseness constraint is
specified by the argument lambda.
Similar to the function princomp, there is a print method for these
objects that prints the results in a readable format, and the plot method
produces a scree plot (screeplot). There is also a biplot method.
Angle halving is an extension of the original algorithm. In the original
algorithm, the search directions are determined by a number of points on the
unit circle in the interval [-pi/2 ; pi/2). With angle halving, this interval
is halved in each iteration: the first approximation uses the interval above,
the second approximation uses [-pi/4 ; pi/4), and so on. This usually gives
better results with fewer iterations.
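The angle halving described above can be sketched in base R for two-column data. This is an illustration under stated assumptions, not the package's implementation: it assumes the halved interval is centered on the current best direction, and it keeps the current direction whenever no grid point improves the objective. `halving_search` is a hypothetical name.

```r
# Hypothetical helper (not part of pcaPP): refine a direction in the plane
# by halving the searched angle interval in every iteration.
halving_search <- function(x, maxiter = 10, splitcircle = 25, scale_fun = mad) {
  stopifnot(is.matrix(x), ncol(x) == 2)
  best_angle <- 0
  best_obj <- scale_fun(drop(x %*% c(cos(best_angle), sin(best_angle))))
  half_width <- pi / 2  # the first pass covers [-pi/2, pi/2)
  for (i in seq_len(maxiter)) {
    # equally spaced offsets in [-half_width, half_width)
    offsets <- seq(-half_width, half_width,
                   length.out = splitcircle + 1)[-(splitcircle + 1)]
    theta <- best_angle + offsets
    obj <- apply(x %*% rbind(cos(theta), sin(theta)), 2, scale_fun)
    # move only if a grid point improves on the current direction
    if (max(obj) > best_obj) {
      best_obj <- max(obj)
      best_angle <- theta[which.max(obj)]
    }
    half_width <- half_width / 2  # halve the interval for the next pass
  }
  c(cos(best_angle), sin(best_angle))
}

set.seed(2)
# toy data with its main axis rotated by 30 degrees
u <- c(cos(pi / 6), sin(pi / 6))
v <- c(-sin(pi / 6), cos(pi / 6))
x <- cbind(rnorm(2000, sd = 10), rnorm(2000, sd = 1)) %*% rbind(u, v)
a_mad <- halving_search(x)                  # robust scale (MAD)
a_sd  <- halving_search(x, scale_fun = sd)  # classical scale
rbind(a_mad, a_sd, u)
```

Each pass keeps the same number of grid points but spreads them over half the previous interval, so the angular resolution doubles per iteration.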
NOTE: in previous implementations angle halving could be suppressed by the
former argument "anglehalving". This can still be done by setting the
argument maxiter = 0.
Value
The function returns an object of class "princomp", i.e. a list
similar to the output of the function princomp.
sdev |
the (robust) standard deviations of the principal components.
|
loadings |
the matrix of variable loadings (i.e., a matrix whose columns
contain the eigenvectors). This is of class "loadings":
see loadings for its print method.
|
center |
the center values that were subtracted.
|
scale |
the scalings applied to each variable.
|
n.obs |
the number of observations.
|
scores |
if scores = TRUE, the scores of the supplied data on the
principal components.
|
call |
the matched call.
|
obj |
a vector containing the values of the objective function. For function
PCAgrid this is the same as sdev.
|
lambda |
the value of lambda with which each component has been calculated
(sPCAgrid only).
|
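Because the result is a list of class "princomp", its components can be inspected like those of a princomp fit. The sketch below uses stats::princomp on the built-in USArrests data purely as a stand-in, so that it runs without this package installed; a PCAgrid or sPCAgrid result is expected to expose the same named components, plus obj (and lambda for sPCAgrid).

```r
# Stand-in object: classical PCA from the stats package
pc <- princomp(USArrests, cor = TRUE)

names(pc)                     # sdev, loadings, center, scale, n.obs, scores, call
pc$sdev                       # standard deviations of the components
unclass(pc$loadings)[, 1:2]   # loadings of the first two components
head(pc$scores[, 1:2])        # scores of the first observations

# share of the total squared scale per component
# (relative to the components that were actually computed)
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
round(prop_var, 3)
```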
Note
See the vignette "Compiling pcaPP for Matlab" which comes with this package to compile and use these functions in Matlab.
Author(s)
Heinrich Fritz, Peter Filzmoser
References
C. Croux, P. Filzmoser, M. Oliveira, (2007).
Algorithms for Projection-Pursuit Robust Principal Component Analysis,
Chemometrics and Intelligent Laboratory Systems, Vol. 87, pp. 218-225.
C. Croux, P. Filzmoser, H. Fritz (2011).
Robust Sparse Principal Component Analysis Based on Projection-Pursuit,
To appear.
See Also
PCAproj, princomp
Examples
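library(pcaPP)
library(mvtnorm)
# 200 regular observations plus 15 outliers shifted in variables 2 to 6
x <- rbind(rmvnorm(200, rep(0, 6), diag(c(5, rep(1,5)))),
           rmvnorm( 15, c(0, rep(20, 5)), diag(rep(1, 6))))
# robust PCA using the grid search algorithm
pc <- PCAgrid(x)
biplot(pc)
# classical PCA for comparison
pc <- princomp(x)
biplot(pc)
# sparse PCA on the data generated by data.Zou
set.seed (0)
x <- data.Zou ()
# classical (non-sparse) loadings and standard deviations
pc <- princomp (x)
unclass (pc$loadings[,1:3])
pc$sdev[1:3]
# sparse PCs, using a separate lambda for each component
lambda <- c (0.23, 0.34, 0.005)
spc <- sPCAgrid (x, k = 3, lambda = lambda, method = "sd")
unclass (spc$loadings)
spc$sdev[1:3]
# compare both solutions side by side
par (mfrow = 1:2)
biplot (pc, main = "non-sparse PCs")
biplot (spc, main = "sparse PCs")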