Title: | Robust Methods for Multiway Data Analysis, Applicable also for Compositional Data |
---|---|
Description: | Provides methods for multiway data analysis by means of Parafac and Tucker 3 models. Robust versions (Engelen and Hubert (2011) <doi:10.1016/j.aca.2011.04.043>) and versions for compositional data are also provided (Gallo (2015) <doi:10.1080/03610926.2013.798664>, Di Palma et al. (2018) <doi:10.1080/02664763.2017.1381669>). Several optimization methods alternative to ALS are available (Simonacci and Gallo (2019) <doi:10.1016/j.chemolab.2019.103822>, Simonacci and Gallo (2020) <doi:10.1007/s00500-019-04320-9>). |
Authors: | Valentin Todorov [aut, cre] , Violetta Simonacci [aut], Maria Anna Di Palma [aut], Michele Gallo [aut] |
Maintainer: | Valentin Todorov <[email protected]> |
License: | GPL (>=3) |
Version: | 0.5-0 |
Built: | 2024-11-10 03:44:34 UTC |
Source: | https://github.com/valentint/rrcov3way |
A data set containing five simple laboratory-made samples where each sample contains different amounts of tyrosine, tryptophan and phenylalanine dissolved in phosphate buffered water. The samples were measured by fluorescence (excitation 240-300 nm, emission 250-450 nm, 1 nm intervals) on a PE LS50B spectrofluorometer.
data(amino)
data(amino)
A three-way array with dimension 5x201x61
.
The first dimension refers to the 5 samples. The second dimension
refers to the emission measurements (250-450nm, 1nm intervals).
The third dimension refers to the excitation (240-300 nm, 1nm intervals).
https://ucphchemometrics.com/datasets/.
Bro, R, PARAFAC: Tutorial and applications, Chemometrics and Intelligent Laboratory Systems, 1997, 38, 149-171 Bro, R, Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. 1998. Ph.D. Thesis, University of Amsterdam (NL) & Royal Veterinary and Agricultural University (DK). Kiers, H.A.L. (1998) A three-step algorithm for Candecomp/Parafac analysis of large data sets with multicollinearity, Journal of Chemometrics, 12, 155-171.
## Not run: data(amino) ## Plotting Emission spectra oldpar <- par(mfrow=c(2,1)) matplot(t(amino[,,1]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") matplot(t(amino[,,5]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") par(oldpar) ## Plotting excitation spectra oldpar <- par(mfrow=c(2,1)) matplot(t(amino[,1,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") matplot(t(amino[,30,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") par(oldpar) ## End(Not run)
## Not run: data(amino) ## Plotting Emission spectra oldpar <- par(mfrow=c(2,1)) matplot(t(amino[,,1]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") matplot(t(amino[,,5]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") par(oldpar) ## Plotting excitation spectra oldpar <- par(mfrow=c(2,1)) matplot(t(amino[,1,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") matplot(t(amino[,30,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") par(oldpar) ## End(Not run)
Chemical composition of water in the main stream of Arno river.
data("Arno")
data("Arno")
A three-way array with dimension 23x11x4. The first dimension refers to 23 distances from the spring. The second dimension refers to the 11 chemical compositions. The third dimension refers to the time of collection - four occasions.
The Arno data example was used in Gallo and Buccinati (2013) to illustrate a particular version of the Tucker model, known as the weighted principal component analysis. The Tucker3 results are usually given in the form of tables or plots and in this work for the representation of the Tucker3 results of logratio data, is proposed to use one-mode plots, clr-joint biplots (Gallo, 2015), and trajectory plots.
Nisi B., Vaselli O., Buccianti A., Minissale A., Delgado-Huertas A., Tassi F., Montegrossi G. (2008). Geochemical and isotopic investigation of the dissolved load in the running waters from the Arno valley: evaluation of the natural and anthropogenic input. In Memorie Descrittive della Carta Geologica d'Italia, Nisi (eds.), 79: 1-91.
Nisi B., Buccianti A., Vaselli O., Perini G., Tassi F., Minissale A., Montegrossi G. (2008) Hydrogeochemistry and strontium isotopes in the Arno river basin (Tuscany, Italy): Constraints on natural controls by statistical modeling. Journal of Hydrology 360: 166-183.
Gallo M. and Buccianti A. (2013). Weighted principal component analysis for compositional data: application example for the water chemistry of the Arno river (Tuscany, central Italy), Environmetrics, 24(4):269-277.
Gallo M. (2015). Tucker3 model for compositional data. Communications in Statistics-Theory and Methods, 44(21):4441-4453.
data(Arno) dim(Arno) # [1] 23 11 4 dim(Arno[,,1]) # [1] 23 11 rownames(Arno[,,1]) # the 23 distances from the spring colnames(Arno[,,1]) # the 11 chemical compositions dim(Arno[,1,]) # [1] 23 4 colnames(Arno[,1,]) # the four occasions res <- Tucker3(Arno, robust=FALSE, coda.transform="ilr") res ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", main="Trajectory biplot")
data(Arno) dim(Arno) # [1] 23 11 4 dim(Arno[,,1]) # [1] 23 11 rownames(Arno[,,1]) # the 23 distances from the spring colnames(Arno[,,1]) # the 11 chemical compositions dim(Arno[,1,]) # [1] 23 4 colnames(Arno[,1,]) # the four occasions res <- Tucker3(Arno, robust=FALSE, coda.transform="ilr") res ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", main="Trajectory biplot")
The function congruence(x, y)
computes the Tucker's
congruence (phi) coefficients among two sets of factors.
congruence(x, y = NULL)
congruence(x, y = NULL)
x |
A vector or matrix of factor loadings. |
y |
A vector or matrix of factor loadings (may be NULL). |
Find the Tucker's coefficient of congruence between two sets of factor loadings. Factor congruences are the cosines of pairs of vectors defined by the loadings matrix and based at the origin. Thus, for loadings that differ only by a scaler (e.g. the size of the eigen value), the factor congruences will be 1.
For factor loading vectors of X and Y the measure of factor congruence, phi, is
If y=NULL
and x
is a numeric matrix, the congruence
coefficients between the columns of the matrix x
are returned.
The result is a symmetric matrix with ones on the diagonal. If two matrices
are provided, they must have the same size and the result is a square matrix containing the
congruence coefficients between all pairs of columns of the two matrices.
A matrix of factor congruences.
Valentin Todorov, [email protected]
L.R Tucker (1951). A method for synthesis of factor analysis studies. Personnel Research Section Report No. 984. Department of the Army, Washington, DC.
require(rrcov) data(delivery, package="robustbase") X <- getLoadings(PcaClassic(delivery)) Y <- getLoadings(PcaHubert(delivery, k=3)) round(congruence(X,Y),3)
require(rrcov) data(delivery, package="robustbase") X <- getLoadings(PcaClassic(delivery)) Y <- getLoadings(PcaHubert(delivery, k=3)) round(congruence(X,Y),3)
Alternating Least Squares (ALS) algorithm with optional constraints for the minimization of the Candecomp/Parafac (CP) loss function.
cp_als( X, n, m, p, ncomp, const = "none", start = "random", conv = 1e-06, maxit = 10000, trace = FALSE )
cp_als( X, n, m, p, ncomp, const = "none", start = "random", conv = 1e-06, maxit = 10000, trace = FALSE )
X |
A three-way array or a matrix. If |
n |
Number of A-mode entities |
m |
Number of B-mode entities |
p |
Number of C-mode entities |
ncomp |
Number of components to extract |
const |
Optional constraints for each mode. Can be a three element
character vector or a single character, one of |
start |
Initial values for the A, B and C components. Can be
|
conv |
Convergence criterion, default is |
maxit |
Maximum number of iterations, default is |
trace |
Logical, provide trace output. |
The result of the decomposition as a list with the following elements:
fit
Value of the loss function
fp
Fit value expressed as a percentage
ss
Sum of squares
A
Component matrix for the A-mode
B
Component matrix for the B-mode
C
Component matrix for the C-mode
iter
Number of iterations
tripcos
Minimal triple cosine between two components
across the three component matrices, used to inspect degeneracy
mintripcos
Minimal triple cosine during the iterative
algorithm observed at every 10 iterations, used to inspect degeneracy
ftiter
Matrix containing in each row the function value
and the minimal triple cosine at every 10 iterations
const
Optional constraints (same as the input parameter
const
)
The argument const
should be a three element character vector.
Set const[j]="none"
for unconstrained update in j-th mode weight
matrix (the default),
const[j]="orth"
for orthogonal update in j-th mode weight matrix,
const[j]="nonneg"
for non-negative constraint on j-th mode or
const[j]="zerocor"
for zero correlation between the extracted
factors.
The default is unconstrained update for all modes.
The loss function to be minimized is ,
where
is a diagonal matrix holding the
k
-th row of
C
.
Valentin Todorov, [email protected]
Harshman, R.A. (1970). Foundations of Parafac procedure: models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16: 1–84.
Harshman, R. A., & Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39–72.
Lawson CL, Hanson RJ (1974). Solving Least Squares Problems. Prentice Hall, Englewood Cliffs, NJ.
## Not run: ## Example with the OECD data data(elind) dim(elind) res <- cp_als(elind, ncomp=3) res$fp res$fp res$iter res <- cp_als(elind, ncomp=3, const="nonneg") res$A ## End(Not run)
## Not run: ## Example with the OECD data data(elind) dim(elind) res <- cp_als(elind, ncomp=3) res$fp res$fp res$iter res <- cp_als(elind, ncomp=3, const="nonneg") res$A ## End(Not run)
Alternating Trilinear Decomposition algorithm for estimating the Candecomp/Parafac (CP) model. It is based on an alternating least squares principle and replaces the iterative procedure used in the traditional PARAFAC algorithm by an improved procedure. This improved procedure contains Moore-Penrose pseudoinverse computations based on singular value decomposition (SVD) which should be theoretically more robust to similarities in spectra and time profiles
cp_atld( X, n, m, p, ncomp, conv = 1e-06, start = "random", maxit = 5000, trace = FALSE )
cp_atld( X, n, m, p, ncomp, conv = 1e-06, start = "random", maxit = 5000, trace = FALSE )
X |
A three-way array or a matrix. If |
n |
Number of A-mode entities |
m |
Number of B-mode entities |
p |
Number of C-mode entities |
ncomp |
Number of components to extract |
conv |
Convergence criterion, default is |
start |
Initial values for the A, B and C components. Can be
|
maxit |
Maximum number of iterations, default is |
trace |
Logical, provide trace output. |
The result of the decomposition as a list with the following elements:
fit
Value of the loss function
fp
Fit value expressed as a percentage
ss
Sum of squares
A
Component matrix for the A-mode
B
Component matrix for the B-mode
C
Component matrix for the C-mode
iter
Number of iterations
tripcos
Minimal triple cosine between two components
across the three component matrices, used to inspect degeneracy
mintripcos
Minimal triple cosine during the iterative
algorithm observed at every 10 iterations, used to inspect degeneracy
ftiter
Matrix containing in each row the function value
and the minimal triple cosine at every 10 iterations
Valentin Todorov, [email protected]; Violetta Simonacci, [email protected]
H.-L. Wu, M. Shibukawa, K. Oguma, An alternating trilinear decomposition algorithm with application to calibration of HPLC-DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons, Journal of Chemometrics 12 (1998) 1–26.
## Not run: ## Example with the OECD data data(elind) dim(elind) res <- cp_atld(elind, ncomp=3) res$fp res$fp res$iter ## End(Not run)
## Not run: ## Example with the OECD data data(elind) dim(elind) res <- cp_atld(elind, ncomp=3) res$fp res$fp res$iter ## End(Not run)
Generates nsim
data sets according to the given parameters.
If eps > 0
, the specified fraction of random outliers of the identified by the
parameter type
type are added to the data sets.
cp_gen( I = 20, J = 20, K = 20, nsim = 200, nf = 3, noise = 0.05, noise1 = 0, Acol = TRUE, Bcol = TRUE, Ccol = TRUE, congA = 0.5, congB = 0.5, congC = 0.5, eps = 0, type = c("none", "bl", "gl", "og"), c1 = 10, c2 = 0.1, silent = FALSE )
cp_gen( I = 20, J = 20, K = 20, nsim = 200, nf = 3, noise = 0.05, noise1 = 0, Acol = TRUE, Bcol = TRUE, Ccol = TRUE, congA = 0.5, congB = 0.5, congC = 0.5, eps = 0, type = c("none", "bl", "gl", "og"), c1 = 10, c2 = 0.1, silent = FALSE )
I |
number of observations |
J |
number of variables |
K |
number of occasions |
nsim |
number of data sets to generate |
nf |
number of PARAFAC components |
noise |
level of homoscedastic (HO) noise |
noise1 |
level of heteroscedastic (HE) noise |
Acol |
whether to apply collinearity with factor congA to mode A |
Bcol |
whether to apply collinearity with factor congB to mode B |
Ccol |
whether to apply collinearity with factor congC to mode C |
congA |
collinearity factor for mode A |
congB |
collinearity factor for mode B |
congC |
collinearity factor for mode C |
eps |
fraction of outliers (percent contamination) |
type |
type of outliers: one of |
c1 |
parameter for outlier generation ( |
c2 |
parameter for outlier generation ( |
silent |
whether to issue warnings |
A list consisting of the following lists:
As list of nsim
matrices for the mode A
Bs list of nsim
matrices for the mode B
Cs list of nsim
matrices for the mode C
Xs list of nsim
PARAFAC data sets, each with dimension IxJxK
Os list of nsim
vectors containing the added outliers (if any)
param list of parameters used for generation of the data sets
Todorov, V. and Simonacci, V. and Gallo, M. and Trendafilov, N. (2023). A novel estimation procedure for robust CANDECOMP/PARAFAC model fitting. Econometrics and Statistics. In press.
Tomasi, G. and Bro, R., (2006). A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis 50 (7), 1700–1734.
Faber, N.M. and Bro, R. and Hopke, P.K. (2003). Recent developments in CANDECOMP/PARAFAC algorithms: A critical review. Chemometrics and Intelligent Laboratory Systems 65, 119–137.
## Generate one PARAFAC data set (nsim=1) with R=2 components (nf=2) and dimensions ## 50x10x10. Apply 0.15 homoscedastic noise and 0.10 heteroscedastic noise, apply ## collinearity with congruence factor 0.5 to all modes. Add 20% bad leverage points. library(rrcov3way) xdat <- cp_gen(I=50, J=100, K=10, nsim=1, nf=2, noise=0.15, noise1=0.10, Acol=TRUE, Bcol=TRUE, Ccol=TRUE, congA=0.5, congB=0.5, congC=0.5, eps=0.2, type="bl") names(xdat)
## Generate one PARAFAC data set (nsim=1) with R=2 components (nf=2) and dimensions ## 50x10x10. Apply 0.15 homoscedastic noise and 0.10 heteroscedastic noise, apply ## collinearity with congruence factor 0.5 to all modes. Add 20% bad leverage points. library(rrcov3way) xdat <- cp_gen(I=50, J=100, K=10, nsim=1, nf=2, noise=0.15, noise1=0.10, Acol=TRUE, Bcol=TRUE, Ccol=TRUE, congA=0.5, congB=0.5, congC=0.5, eps=0.2, type="bl") names(xdat)
Integrated algorithm combining ATLD and ALS for the minimization of the Candecomp/Parafac (CP) loss function.
cp_int2( X, n, m, p, ncomp, initconv = 0.01, conv = 1e-06, const = "none", start = "random", maxit = 5000, trace = FALSE )
cp_int2( X, n, m, p, ncomp, initconv = 0.01, conv = 1e-06, const = "none", start = "random", maxit = 5000, trace = FALSE )
X |
A three-way array or a matrix. If |
n |
Number of A-mode entities |
m |
Number of B-mode entities |
p |
Number of C-mode entities |
ncomp |
Number of components to extract |
initconv |
Convergence criterion for the initialization phase (ATLD),
default is |
conv |
Convergence criterion, default is |
const |
Optional constraints for each mode. Can be a three element
character vector or a single character, one of |
start |
Initial values for the A, B and C components. Can be
|
maxit |
Maximum number of iterations, default is |
trace |
Logical, provide trace output. |
The result of the decomposition as a list with the following elements:
fit
Value of the loss function
fp
Fit value expressed as a percentage
ss
Sum of squares
A
Component matrix for the A-mode
B
Component matrix for the B-mode
C
Component matrix for the C-mode
iter
Number of iterations
tripcos
Minimal triple cosine between two components
across the three component matrices, used to inspect degeneracy
mintripcos
Minimal triple cosine during the iterative
algorithm observed at every 10 iterations, used to inspect degeneracy
ftiter
Matrix containing in each row the function value
and the minimal triple cosine at every 10 iterations
const
Optional constraints (same as the input parameter
const
)
The argument const
should be a three element character vector.
Set const[j]="none"
for unconstrained update in j-th mode weight
matrix (the default),
const[j]="orth"
for orthogonal update in j-th mode weight matrix or
const[j]="zerocor"
for zero correlation between the extracted
factors.
The default is unconstrained update for all modes.
The loss function to be minimized is ,
where
is a diagonal matrix holding the
k
-th row of
C
.
Valentin Todorov, [email protected]; Violetta Simonacci, [email protected]
H.-L. Wu, M. Shibukawa, K. Oguma, An alternating trilinear decomposition algorithm with application to calibration of HPLC-DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons, Journal of Chemometrics 12 (1998) 1–26.
Simonacci, V. and Gallo, M. (2020). An ATLD–ALS method for the trilinear decomposition of large third-order tensors, Soft Computing 24 13535–13546.
Todorov, V. and Simonacci, V. and Gallo, M. and Trendafilov, N. (2023). A novel estimation procedure for robust CANDECOMP/PARAFAC model fitting. Econometrics and Statistics. In press.
## Not run: ## Example with the OECD data data(elind) dim(elind) res <- cp_int2(elind, ncomp=3) res$fp res$fp res$iter res <- cp_int2(elind, ncomp=3, const="orth") res$A ## End(Not run)
## Not run: ## Example with the OECD data data(elind) dim(elind) res <- cp_int2(elind, ncomp=3) res$fp res$fp res$iter res <- cp_int2(elind, ncomp=3, const="orth") res$A ## End(Not run)
The estimated model will be renormalized, reflected (change of sign) or the components will be reordered. Functions that provide access to some components of the model: coordinates, weights.
## S3 method for class 'tucker3' do3Postprocess(x, reflectA, reflectB, reflectC, reorderA, reorderB, reorderC, ...) ## S3 method for class 'parafac' do3Postprocess(x, reflectA, reflectB, reflectC, reorder, ...) ## S3 method for class 'parafac' coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...) ## S3 method for class 'tucker3' coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...) ## S3 method for class 'parafac' weights(object, ...) ## S3 method for class 'tucker3' weights(object, mode = c("A", "B", "C"), ...) ## S3 method for class 'parafac' reflect(x, mode = c("A", "B", "C"), rsign = 1, ...) ## S3 method for class 'tucker3' reflect(x, mode = c("A", "B", "C"), rsign = 1, ...)
## S3 method for class 'tucker3' do3Postprocess(x, reflectA, reflectB, reflectC, reorderA, reorderB, reorderC, ...) ## S3 method for class 'parafac' do3Postprocess(x, reflectA, reflectB, reflectC, reorder, ...) ## S3 method for class 'parafac' coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...) ## S3 method for class 'tucker3' coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...) ## S3 method for class 'parafac' weights(object, ...) ## S3 method for class 'tucker3' weights(object, mode = c("A", "B", "C"), ...) ## S3 method for class 'parafac' reflect(x, mode = c("A", "B", "C"), rsign = 1, ...) ## S3 method for class 'tucker3' reflect(x, mode = c("A", "B", "C"), rsign = 1, ...)
x |
Tucker3 or Parafac solution |
object |
Tucker3 or Parafac solution (alternative of |
reflectA |
How to handle the signs of the components of mode A - can be a single number or a vector with length of the number of components of A |
reflectB |
How to handle the signs of the components of mode B - can be a single number or a vector with length of the number of components of B |
reflectC |
How to handle the signs of the components of mode C - can be a single number or a vector with length of the number of components of C |
reorder |
How to reorder the components of a Parafac solution - a vector with length of the number of components |
reorderA |
How to reorder the components of mode A - a vector with length of the number of components of A giving the new order |
reorderB |
How to reorder the components of mode B - a vector with length of the number of components of B giving the new order |
reorderC |
How to reorder the components of mode C - a vector with length of the number of components of C giving the new order |
mode |
For which mode to provide the coordinates or weights. Default is mode A |
type |
Which type of coordinates to provide. Default is "normalized" |
rsign |
How to change the sign of the components of the given mode. Can be a single number or a vector with length of the number of components of the corresponding mode. |
... |
Potential further arguments passed to lower level functions. |
The output value of do3Postproces() is the postprocessed solution, Parafac or Tucker3.
The output of weights()
and coordinates()
are the respective values.
Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]
Kroonenberg (2008). Applied multiway data analysis. Wiley series in probability and statistics. Hoboken NJ, Wiley.
data(elind) x1 <- do3Scale(elind, center=TRUE, scale=TRUE) (cp <- Parafac(x1, ncomp=3, const=c("orth", "none", "none"))) cp$B cp1 <- do3Postprocess(cp, reflectB=-1) # change the sign of all components of B cp1$B weights(cp1) coordinates(cp1) coordinates(cp1, type="principal") ## Same as above - the centering and scaling is done inside the Parafac procedure (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), center=TRUE, scale=TRUE)) ## Robust estimation with robust scaling with median and mad (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), center=median, scale=mad, robust=TRUE)) ## Robust estimation with robust scaling with median and Qn (Rousseeuw and Croux, 1993) require(robustbase) (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), center=median, scale=Qn, robust=TRUE))
data(elind) x1 <- do3Scale(elind, center=TRUE, scale=TRUE) (cp <- Parafac(x1, ncomp=3, const=c("orth", "none", "none"))) cp$B cp1 <- do3Postprocess(cp, reflectB=-1) # change the sign of all components of B cp1$B weights(cp1) coordinates(cp1) coordinates(cp1, type="principal") ## Same as above - the centering and scaling is done inside the Parafac procedure (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), center=TRUE, scale=TRUE)) ## Robust estimation with robust scaling with median and mad (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), center=median, scale=mad, robust=TRUE)) ## Robust estimation with robust scaling with median and Qn (Rousseeuw and Croux, 1993) require(robustbase) (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), center=median, scale=Qn, robust=TRUE))
Computes varimax rotation of the core and component matrix of a Tucker3 model to simple structure.
do3Rotate(x, ...) ## S3 method for class 'tucker3' do3Rotate(x, weights = c(0, 0, 0), rotate = c("A", "B", "C"), ...)
do3Rotate(x, ...) ## S3 method for class 'tucker3' do3Rotate(x, weights = c(0, 0, 0), rotate = c("A", "B", "C"), ...)
x |
A Tucker 3 object |
... |
Potential further arguments passed to called functions. |
weights |
A numeric vector with length 3: relative weights (greater or
equal 0) for the simplicity of the component matrices |
rotate |
Within which mode to rotate the Tucker3 solution:
|
A list including the following components:
Valentin Todorov, [email protected]
## Rotation of a Tucker3 solution data(elind) (t3 <- Tucker3(elind, 3, 2, 2)) xout <- do3Rotate(t3, c(3, 3, 3), rotate=c("A", "B", "C")) xout$vvalue
## Rotation of a Tucker3 solution data(elind) (t3 <- Tucker3(elind, 3, 2, 2)) xout <- do3Rotate(t3, c(3, 3, 3), rotate=c("A", "B", "C")) xout$vvalue
Centering and/or normalization of a three way array or a matricized array across one mode (modes indicated by "A", "B" or "C").
## S3 method for class 'tucker3' do3Scale(x, renorm.mode = c("A", "B", "C"), ...) ## S3 method for class 'parafac' do3Scale(x, renorm.mode = c("A", "B", "C"), ...) ## Default S3 method: do3Scale(x, center = FALSE, scale = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), scale.mode = c("B", "A", "C"), only.data=TRUE, ...)
## S3 method for class 'tucker3' do3Scale(x, renorm.mode = c("A", "B", "C"), ...) ## S3 method for class 'parafac' do3Scale(x, renorm.mode = c("A", "B", "C"), ...) ## Default S3 method: do3Scale(x, center = FALSE, scale = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), scale.mode = c("B", "A", "C"), only.data=TRUE, ...)
x |
Three dimensional array of order (I x J x K) or a matrix (or data.frame coerced to a matrix) of order (I x JK) containing the matricized array (frontal slices) |
center |
Whether and how to center the data. Can be |
scale |
Whether and how to scale the data. Can be |
center.mode |
Across which mode to center. Default is |
scale.mode |
Within which mode to scale. Default is |
renorm.mode |
Within which mode to renormalize a Parafac or Tucker3 solution.
See in Details how this is performed for the different models.
Default is |
only.data |
Whether to return only the centered/scaled data or also
the center and the scale themselves. Default is |
... |
potential further arguments passed to lower level functions. |
A named list, consisting of the centered and/or scaled data, a center vector, a scale vector and the mode in which the data were centered/scaled.
Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]
Kiers, H.A.L. (2000).Towards a standardizrd notation and terminology in multiway analysis. Journal of Chemometrics, 14:105-122.
Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.
data(elind) (x1 <- do3Scale(elind, center=TRUE, scale=TRUE)) (x2 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="B")) (x3 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="C", scale.mode="C"))
data(elind) (x1 <- do3Scale(elind, center=TRUE, scale=TRUE)) (x2 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="B")) (x3 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="C", scale.mode="C"))
A data set with 27 synthetic samples containing different concentrations of four analytes (hydroquinone, tryptophan, phenylalanine and dopa) measured in a Perkin-Elmer LS50 B fluorescence spectrometer.
data(dorrit)
data(dorrit)
A three-way array with dimension 27 x 116 x 18
.
The first dimension refers to the 27 samples. The second dimension
refers to the emission measurements (251-481nm, 2nm intervals).
The third dimension refers to the excitation (230-315nm, 5nm intervals).
Each fluorescence landscape corresponding to each sample in the original data set consists of 233 emission wavelengths (250-482 nm) and 24 excitation wavelengths (200-315 nm taken each 5 nm). The fluorescence data is three-way. Ideally, the data is trilinear, the components of the modes corresponding to concentrations (27), emission spectra (233) and excitation spectra (24). Hence a three-way PARAFAC model should be capable of uniquely and meaningfully describing the variation in the data set.
The data set is modified as described in Engelen and Hubert (2011):
the emission wavelengths are taken at 2 nm,
noisy parts situated at the excitation wavelengths from 200 to 230 nm and
at emission wavelengths below 250 nm are excluded. The severe Rayleigh
scattering areas present in all samples are replaced by interpolated values.
Thus we end up with a (27 x 116 x 18)
data array.
https://ucphchemometrics.com/datasets/.
Bro, R, Sidiropoulos, ND and Smilde, AK (2002). Maximum likelihood fitting using ordinary least squares algorithms. Journal of Chemometrics, 16(8–10), 387–400.
Riu, J and Bro, R (2003) Jack-knife estimation of standard errors and outlier detection in PARAFAC models. Chemometrics and Intelligent Laboratory Systems, 65(1), 35–49
Engelen, S and Hubert, M (2011) Detecting outlying samples in a parallel factor analysis model, Analytica Chemica Acta 705 155–165.
Baunsgaard, D (1999) Factors Affecting 3-way Modelling (PARAFAC) of Fluorescence Landscapes, Royal Veterinary and Agricultural University, Department of Dairy and Food Science, Frederiksberg, Denmark.
## Not run: data(dorrit) ## Plotting Emission spectra oldpar <- par(mfrow=c(2,1)) matplot(t(dorrit[,,1]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") matplot(t(dorrit[,,5]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") par(oldpar) ## Plotting excitation spectra oldpar <- par(mfrow=c(2,1)) matplot(t(dorrit[,1,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") matplot(t(dorrit[,30,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") par(oldpar) persp(as.numeric(dimnames(dorrit)[[2]]), as.numeric(dimnames(dorrit)[[3]]), dorrit[4,,], xlab="Emission", ylab="Excitation", zlab="Intensity", theta = 30, phi = 30, expand = 0.5, col = "lightblue", ticktype="detailed") pp <- Parafac(dorrit, ncomp=4, robust=TRUE) plot(pp) ## End(Not run)
## Not run: data(dorrit) ## Plotting Emission spectra oldpar <- par(mfrow=c(2,1)) matplot(t(dorrit[,,1]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") matplot(t(dorrit[,,5]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence emission spectra") par(oldpar) ## Plotting excitation spectra oldpar <- par(mfrow=c(2,1)) matplot(t(dorrit[,1,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") matplot(t(dorrit[,30,]), type="l", xlab="Wavelength/nm", ylab="Intensity", main="Fluorescence excitation spectra") par(oldpar) persp(as.numeric(dimnames(dorrit)[[2]]), as.numeric(dimnames(dorrit)[[3]]), dorrit[4,,], xlab="Emission", ylab="Excitation", zlab="Intensity", theta = 30, phi = 30, expand = 0.5, col = "lightblue", ticktype="detailed") pp <- Parafac(dorrit, ncomp=4, robust=TRUE) plot(pp) ## End(Not run)
OECD publishes comparative statistics of the export size of various sectors of the electronics industry:
information science,
telecommunication products,
radio and television equipment,
components and parts,
electromedical equipment, and
scientific equipment.
The data consist of specialisation indices of electronics industries of 23 European countries for the years 1973–1979. The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion for the world as a whole (see D'Ambra, 1985, p. 249 and Kroonenberg, 2008, p.282).
data(elind)
data(elind)
A three-way array with dimension 23x6x7. The first dimension refers to 23 countries. The second dimension refers to the six indices of electronics industries. The third dimension refers to the years in the period 1978–1985.
The data set is available from Pieter Kroonenberg's web site at: "three-mode.leidenuniv.nl/data/electronicindustriesinfo.htm"
D'Ambra, L. (1985). Alcune estensione dell'analisi in componenti principali per lo studio dei sistemi evolutivi. Uno studio sul commercio internazionale dell'elettronica. In: Ricerche Economiche. 2. del Dipartimento di Scienze Economiche Ca'Foscari, Venezia.
Kroonenberg PM (2008). Applied multiway data analysis. Wiley series in probability and statistics. John Wiley and Sons, Hoboken, NJ, p.282.
data(elind) res <- Parafac(elind, robust=FALSE, coda.transform="none") ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Per-component plot plot(res, which="percomp", comp=1, main="Per component plot") ## all components plot plot(res, which="allcomp", main="All components plot", legend.position="topright")
data(elind) res <- Parafac(elind, robust=FALSE, coda.transform="none") ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Per-component plot plot(res, which="percomp", comp=1, main="Per component plot") ## all components plot plot(res, which="allcomp", main="All components plot", legend.position="topright")
Thirty girls selected from a French auxiological study (1953-1975) to get insight into the physical growth patterns of children from ages four to fifteen, Sempe (1987). They were measured yearly between the ages 4 and 15 on the following eight variables:
weight = Weight
length = Length
crump = Crown-rump length
head = Head circumference
chest = Chest circumference
arm = Arm
calf = Calf
pelvis = Pelvis
The data set is three way data array of size 30 (girls) x 8 (variables) x 12 (years).
data("girls")
data("girls")
The format is a three way array with the following dimensions: The first dimension refers to 30 girls. The second dimension refers to the eight variables measured on the girls. The third dimension refers to the years – 4 to 15.
The data are generally preprocessed as standard multiway profile data. For details see Kroonenberg (2008), Chapters 6 and 15.
The data sets are available from Pieter Kroonenberg's web site at: "three-mode.leidenuniv.nl/data/girlsgrowthcurvesinfo.htm"
Sempe, M. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J.Janssen et al. Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.
Kroonenberg (2008). Applied multiway data analysis. Wiley series in probability and statistics. Hoboken NJ, Wiley.
data(girls) str(girls) ## Center the data in mode A and find the "average girl" center.girls <- do3Scale(girls, center=TRUE, only.data=FALSE) X <- center.girls$x center <- center.girls$center average.girl <- as.data.frame(matrix(center, ncol=8, byrow=TRUE)) dimnames(average.girl) <- list(dimnames(X)[[3]], dimnames(X)[[2]]) ## Divide these variables by 10 to reduce their range average.girl$weight <- average.girl$weight/10 average.girl$length <- average.girl$length/10 average.girl$crrump <- average.girl$crrump/10 average.girl p <- ncol(average.girl) plot(rownames(average.girl), average.girl[,1], ylim=c(min(average.girl), max(average.girl)), type="n", xlab="Age", ylab="") for(i in 1: p) { lines(rownames(average.girl), average.girl[,i], lty=i, col=i) points(rownames(average.girl), average.girl[,i], pch=i, col=i) } legend <- colnames(average.girl) legend[1] <- paste0(legend[1], "*") legend[2] <- paste0(legend[3], "*") legend[3] <- paste0(legend[4], "*") legend("topleft", legend=legend, col=1:p, lty=1:p, pch=1:p)
data(girls) str(girls) ## Center the data in mode A and find the "average girl" center.girls <- do3Scale(girls, center=TRUE, only.data=FALSE) X <- center.girls$x center <- center.girls$center average.girl <- as.data.frame(matrix(center, ncol=8, byrow=TRUE)) dimnames(average.girl) <- list(dimnames(X)[[3]], dimnames(X)[[2]]) ## Divide these variables by 10 to reduce their range average.girl$weight <- average.girl$weight/10 average.girl$length <- average.girl$length/10 average.girl$crrump <- average.girl$crrump/10 average.girl p <- ncol(average.girl) plot(rownames(average.girl), average.girl[,1], ylim=c(min(average.girl), max(average.girl)), type="n", xlab="Age", ylab="") for(i in 1: p) { lines(rownames(average.girl), average.girl[,i], lty=i, col=i) points(rownames(average.girl), average.girl[,i], pch=i, col=i) } legend <- colnames(average.girl) legend[1] <- paste0(legend[1], "*") legend[2] <- paste0(legend[3], "*") legend[3] <- paste0(legend[4], "*") legend("topleft", legend=legend, col=1:p, lty=1:p, pch=1:p)
The data are drawn from a study (Kojima, 1975) of the perception of parental behaviour by parents and their children. Two data sets, boys and girls are available as Kojima.boys and Kojima.girls.
Boys data were analysed in Kroonenberg (2008)
Girls data were analysed in Kroonenberg, Harshman, & Murakami (2009).
data(Kojima)
data(Kojima)
Both data sets are three dimensional arrays:
boys: 150 x 18 x 4
girls: 153 x 18 x 4
The rows (1st mode) are 150 Japanese sons/153 Japanese daughters. The columns (2nd mode) are 18 scales (Acceptance, Child centerness, Possesiveness, etc.). The slices (3rd mode) are the 4 judgements (See Details for explanation).
The boys
data are ratings expressing the judgments of parents with respect to their
own behaviour toward their sons, and the judgments of their sons with respect to their
parents. Thus, there are four conditions:
Father-Own behaviour (F-F),
Mother-Own behaviour (M-M),
Son-Father (B-F),
Son-Mother (B-M).
The judgments involved 150 middle-class Japanese eighth-grade boys on the 18 subscales of the inventory. Thus, the data set consists of a 150 (Sons) x 18 (scales) x 4 (judgment combinations) data array.
Similarly, the girls data are ratings expressing the judgments of parents with respect to their own behaviour toward their daughters, and the judgments of their daughters with respect to their parents. Thus, there are four conditions:
Father-Own behaviour (F-F),
Mother-Own behaviour (M-M),
Daughter-Father (G-F),
Daughter-Mother (G-M).
The judgments involved 153 middle-class Japanese eighth-grade girls on the 18 subscales of the inventory. Thus, the data set consists of a 153 (Daughters) x 18 (scales) x 4 (judgment combinations) data array.
Preprocessing Given that the data are three-way profile data they are treated in the standard manner: centering per occasion-variable combination and by normalising the data after centring per lateral slice i.e. per scale over all sons/daughters x judges combinations. For details see Kroonenberg (2008), Chapter 13.
The data sets are available from the Pieter Kroonenberg's web site at https://three-mode.leidenuniv.nl/.
Kojima, H. (1975). Inter-battery factor analysis of parents' and children's reports of parental behavior. Japanese Psychological Bulletin, 17, 33-48 (in Japanese).
Kroonenberg, P. M. (2008). Applied multiway data analysis. Wiley series in probability and statistics. Wiley, Hoboken NJ.
Kroonenberg, P. M., Harshman, R. A, & Murakami, T. (2009). Analysing three-way profile data using the Parafac and Tucker3 models illustrated with views on parenting. Applied Multivariate Research, 13:5-41. PDF available at: http://www.phaenex.uwindsor.ca/ojs/leddy/index.php/AMR/article/viewFile/2833/2271
data(Kojima) dim(Kojima.boys) dim(Kojima.girls)
data(Kojima) dim(Kojima.boys) dim(Kojima.girls)
The function krp(A,B)
returns the Khatri-Rao product of two matrices A
and B
, of
dimensions I x K and J x K respectively. The result is an IJ x K matrix formed by the matching
column-wise Kronecker products, i.e. the k-th column of the Khatri-Rao product is
defined as kronecker(A[, k], B[, k])
.
krp(A, B)
krp(A, B)
A |
Matrix of order I x K. |
B |
Matrix of order J x K. |
The IJ x K matrix of columnwise Kronecker products.
Valentin Todorov, [email protected]
Khatri, C. G., and Rao, C. Radhakrishna (1968). Solutions to Some Functional Equations and Their Applications to Characterization of Probability Distributions. Sankhya: Indian J. Statistics, Series A 30, 167-180.
Smilde, A., Bro R. and Gelardi, P. (2004). Multi-way Analysis: Applications in Chemical Sciences, Chichester:Wiley
a <- matrix(1:12, 3, 4) b <- diag(1:4) krp(a, b) krp(b, a)
a <- matrix(1:12, 3, 4) b <- diag(1:4) krp(a, b) krp(b, a)
Computes the trace of a square numeric matrix. If A
is not numeric and square matrix,
the function terminates with an error message.
mtrace(A)
mtrace(A)
A |
A square numeric matrix. |
the sum of the values on the diagonal of the matrix A
, i.e. sum(diag(A))
.
Valentin Todorov, [email protected]
(a <- matrix(c(5,2,3, 4,-3,7, 4,1,2), ncol=3)) (b <- matrix(c(1,0,1, 0,1,2, 1,0,3), ncol=3)) mtrace(a) mtrace(b) ## tr(A+B)=tr(A)+tr(B) all.equal(mtrace(a) + mtrace(b), mtrace(a+b)) ## tr(A)=tr(A') all.equal(mtrace(a), mtrace(t(a))) ## tr(alphA)=alphatr(A) alpha <- 0.5 all.equal(mtrace(alpha*a), alpha*mtrace(a)) ## tr(AB)=tr(BA) all.equal(mtrace(a %*% b), mtrace(b %*% a)) ## tr(A)=tr(BAB-1) all.equal(mtrace(a), mtrace(b %*% a %*% solve(b)))
(a <- matrix(c(5,2,3, 4,-3,7, 4,1,2), ncol=3)) (b <- matrix(c(1,0,1, 0,1,2, 1,0,3), ncol=3)) mtrace(a) mtrace(b) ## tr(A+B)=tr(A)+tr(B) all.equal(mtrace(a) + mtrace(b), mtrace(a+b)) ## tr(A)=tr(A') all.equal(mtrace(a), mtrace(t(a))) ## tr(alphA)=alphatr(A) alpha <- 0.5 all.equal(mtrace(alpha*a), alpha*mtrace(a)) ## tr(AB)=tr(BA) all.equal(mtrace(a %*% b), mtrace(b %*% a)) ## tr(A)=tr(BAB-1) all.equal(mtrace(a), mtrace(b %*% a %*% solve(b)))
Compute a robust Parafac model for compositional data
Parafac(X, ncomp = 2, center = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), scale=FALSE, scale.mode=c("B", "A", "C"), const="none", conv = 1e-06, start="svd", maxit=10000, optim=c("als", "atld", "int2"), robust = FALSE, coda.transform=c("none", "ilr", "clr"), ncomp.rpca = 0, alpha = 0.75, robiter = 100, crit=0.975, trace = FALSE)
Parafac(X, ncomp = 2, center = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), scale=FALSE, scale.mode=c("B", "A", "C"), const="none", conv = 1e-06, start="svd", maxit=10000, optim=c("als", "atld", "int2"), robust = FALSE, coda.transform=c("none", "ilr", "clr"), ncomp.rpca = 0, alpha = 0.75, robiter = 100, crit=0.975, trace = FALSE)
X |
3-way array of data |
ncomp |
Number of components |
center |
Whether to center the data |
center.mode |
If centering the data, on which mode to do this |
scale |
Whether to scale the data |
scale.mode |
If scaling the data, on which mode to do this |
const |
Optional constraints for each mode. Can be a three element character
vector or a single character, one of |
conv |
Convergence criterion, defaults to |
start |
Initial values for the A, B and C components. Can be |
maxit |
Maximum number of iterations, default is |
optim |
How to optimize the CP loss function, default is to use ALS, i.e.
|
robust |
Whether to apply a robust estimation |
coda.transform |
If the data are a composition, use an ilr or clr transformation.
Default is non-compositional data, i.e. |
ncomp.rpca |
Number of components for robust PCA |
alpha |
Measures the fraction of outliers the algorithm should resist. Allowed values are between 0.5 and 1 and the default is 0.75 |
robiter |
Maximal number of iterations for robust estimation |
crit |
Cut-off for identifying outliers, default |
trace |
Logical, provide trace output |
The function can compute four versions of the Parafac model:
Classical Parafac,
Parafac for compositional data,
Robust Parafac and
Robust Parafac for compositional data.
This is controlled though the paramters robust=TRUE
and coda.transform=c("none", "ilr").
An object of class "parafac" which is basically a list with components:
fit |
Fit value |
fp |
Fit percentage |
ss |
Sum of squares |
A |
Orthogonal loading matrix for the A-mode |
B |
Orthogonal loading matrix for the A-mode |
Bclr |
Orthogonal loading matrix for the B-mode, clr transformed. Available only if coda.transform="ilr", otherwise NULL |
C |
Orthogonal loading matrix for the C-mode |
Xhat |
(Robustly) reconstructed array |
const |
Optional constraints (same as the input parameter) |
iter |
Number of iterations |
rd |
Residual distances |
sd |
Score distances |
flag |
The observations whose residual distance |
robust |
The paramater |
coda.transform |
Which coda transformation is used, can be |
Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]
Harshman, R.A. (1970). Foundations of Parafac procedure: models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16: 1–84.
Engelen, S., Frosch, S. and Jorgensen, B.M. (2009). A fully robust PARAFAC method analyzing fluorescence data. Journal of Chemometrics, 23(3): 124–131.
Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.
Rousseeuw, P.J. and Driessen, K.V. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3): 212–223.
Egozcue J.J., Pawlowsky-Glahn V., Mateu-Figueras G. and Barcel'o-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279-300
############# ## ## Example with the UNIDO Manufacturing value added data data(va3way) dim(va3way) ## Treat quickly and dirty the zeros in the data set (if any) va3way[va3way==0] <- 0.001 ## res <- Parafac(va3way) res print(res$fit) print(res$A) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") data(ulabor) res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") res ## Plot Orthonormalized A-mode component plot plot(res, which="comp", mode="A", main="Component plot, A-mode") ## Plot Orthonormalized B-mode component plot plot(res, which="comp", mode="B", main="Component plot, B-mode") ## Plot Orthonormalized C-mode component plot plot(res, which="comp", mode="C", main="Component plot, C-mode")
############# ## ## Example with the UNIDO Manufacturing value added data data(va3way) dim(va3way) ## Treat quickly and dirty the zeros in the data set (if any) va3way[va3way==0] <- 0.001 ## res <- Parafac(va3way) res print(res$fit) print(res$A) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") data(ulabor) res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") res ## Plot Orthonormalized A-mode component plot plot(res, which="comp", mode="A", main="Component plot, A-mode") ## Plot Orthonormalized B-mode component plot plot(res, which="comp", mode="B", main="Component plot, B-mode") ## Plot Orthonormalized C-mode component plot plot(res, which="comp", mode="C", main="Component plot, C-mode")
Permutes the matricized (n
x
m
x
p
) array X
to the
matricized array Y
of order (m
x
p
x
n
).
permute(X,n,m,p)
permute(X,n,m,p)
X |
Matrix (or data.frame coerced to a matrix) containing the matricized array |
n |
Number of |
m |
Number of |
p |
Number of |
Y |
Matrix containing the permuted matricized array |
H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14:105–122.
X <- array(1:24, c(4,3,2)) dim(X) ## matricize the array Xa <- unfold(X) # matricized X with the A-mode entities in its rows dim(Xa) Xa ## matricized X with the B-mode entities in its rows Xb <- permute(Xa, 4, 3, 2) dim(Xb) Xb ## matricized X with the C-mode entities in its rows Xc <- permute(Xb, 3, 2, 4) dim(Xc) Xc
X <- array(1:24, c(4,3,2)) dim(X) ## matricize the array Xa <- unfold(X) # matricized X with the A-mode entities in its rows dim(Xa) Xa ## matricized X with the B-mode entities in its rows Xb <- permute(Xa, 4, 3, 2) dim(Xb) Xb ## matricized X with the C-mode entities in its rows Xc <- permute(Xb, 3, 2, 4) dim(Xc) Xc
Different plots for the results of Parafac or Tucker3 analysis, stored in a
Parafac
or a tucker3
object, see Details.
## S3 method for class 'tucker3' plot(x, which = c("dd", "comp", "allcomp", "jbplot", "tjplot", "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...) ## S3 method for class 'parafac' plot(x, which = c("dd", "comp", "percomp", "allcomp", "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...)
## S3 method for class 'tucker3' plot(x, which = c("dd", "comp", "allcomp", "jbplot", "tjplot", "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...) ## S3 method for class 'parafac' plot(x, which = c("dd", "comp", "percomp", "allcomp", "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...)
x |
A |
which |
Which plot to select (see Details). Default is |
ask |
Generates all plots in interactive mode |
id.n |
Number of items to highlight |
... |
Other parameters to be passed to the lower level functions. |
Different plots for a tucker3
or parafac
object will be produced. Use the parameter
which
to select which plot to produce:
dd
Distance-distance plot
comp
Paired components plot
percomp
Per-component plot - only for Parafac
allcomp
All components plot
jbplot
Joint biplot - only for Tucker3
tjplot
Trajectory plot - only for Tucker3
Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]
Kiers, H.A. (2000).Some procedures for displaying results from three-way methods. Journal of Chemometrics. 14(3): 151-170.
Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.
############# ## ## Example with the UNIDO Manufacturing value added data data(va3way) dim(va3way) ## Treat quickly and dirty the zeros in the data set (if any) va3way[va3way==0] <- 0.001 ## res <- Tucker3(va3way) res print(res$fit) print(res$A) ## Print the core matrix print(res$GA) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")
############# ## ## Example with the UNIDO Manufacturing value added data data(va3way) dim(va3way) ## Treat quickly and dirty the zeros in the data set (if any) va3way[va3way==0] <- 0.001 ## res <- Tucker3(va3way) res print(res$fit) print(res$A) ## Print the core matrix print(res$GA) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")
Restore an array from its matricization with all the frontal slices of the array next to each other (mode="A")
toArray(x, n, m, r, mode = c("A", "B", "C"))
toArray(x, n, m, r, mode = c("A", "B", "C"))
x |
Matrix (or data.frame coerced to a matrix) containing the elements of the frontal slices of an array |
n |
number of A-mode elements |
m |
number of B-mode elements |
r |
number of C-mode elements |
mode |
in which mode is the matricized array |
Three way array
Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]
H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14: 105–122.
data(elind) di <- dim(elind) toArray(unfold(elind), di[1], di[2], di[3])
data(elind) di <- dim(elind) toArray(unfold(elind), di[1], di[2], di[3])
Compute a robust Tucker3 model for compositional data
Tucker3(X, P = 2, Q = 2, R = 2, center = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), scale = FALSE, scale.mode = c("B", "A", "C"), conv = 1e-06, start="svd", robust = FALSE, coda.transform=c("none", "ilr", "clr"), ncomp.rpca = 0, alpha = 0.75, robiter=100, crit=0.975, trace = FALSE)
Tucker3(X, P = 2, Q = 2, R = 2, center = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), scale = FALSE, scale.mode = c("B", "A", "C"), conv = 1e-06, start="svd", robust = FALSE, coda.transform=c("none", "ilr", "clr"), ncomp.rpca = 0, alpha = 0.75, robiter=100, crit=0.975, trace = FALSE)
X |
3-way array of data |
P |
Number of A-mode components |
Q |
Number of B-mode components |
R |
Number of C-mode components |
center |
Whether to center the data |
center.mode |
If scaling the data, on which mode to do this |
scale |
Whether to scale the data |
scale.mode |
If centering the data, on which mode to do this |
conv |
Convergence criterion, defaults to |
start |
Initial values for the A, B and C components. Can be |
robust |
Whether to apply a robust estimation |
coda.transform |
If the data are a composition, use an ilr or clr transformation.
Default is non-compositional data, i.e. |
ncomp.rpca |
Number of components for robust PCA |
alpha |
Measures the fraction of outliers the algorithm should resist. Allowed values are between 0.5 and 1 and the default is 0.75 |
robiter |
Maximal number of iterations for robust estimation |
crit |
Cut-off for identifying outliers, default |
trace |
Logical, provide trace output |
The function can compute four versions of the Tucker3 model:
Classical Tucker3,
Tucker3 for compositional data,
Robust Tucker3 and
Robust Tucker3 for compositional data.
This is controlled through the parameters robust=TRUE
and coda.transform="ilr"
.
An object of class "tucker3" which is basically a list with components:
fit |
Fit value |
fp |
Fit percentage |
A |
Orthogonal loading matrix for the A-mode |
B |
Orthogonal loading matrix for the B-mode |
Bclr |
Orthogonal loading matrix for the B-mode, clr transformed.
Available only if |
C |
Orthogonal loading matrix for the C-mode |
GA |
Core matrix, which describes the relation between |
iter |
Number of iterations |
rd |
Residual distances |
sd |
Score distances |
flag |
The observations whose residual distance |
robust |
The paramater |
coda.transform |
The input paramater |
La |
Diagonal matrix containing the intrinsic eigenvalues for A-mode |
Lb |
Diagonal matrix containing the intrinsic eigenvalues for B-mode |
Lc |
Diagonal matrix containing the intrinsic eigenvalues for C-mode |
Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]
Tucker, L.R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31: 279–311.
Egozcue J.J., Pawlowsky-Glahn, V., Mateu-Figueras G. and Barcel'o-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279–300.
############# ## ## Example with the UNIDO Manufacturing value added data data(va3way) dim(va3way) ## Treat quickly and dirty the zeros in the data set (if any) va3way[va3way==0] <- 0.001 ## res <- Tucker3(va3way) res print(res$fit) print(res$A) ## Print the core matrix print(res$GA) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")
############# ## ## Example with the UNIDO Manufacturing value added data data(va3way) dim(va3way) ## Treat quickly and dirty the zeros in the data set (if any) va3way[va3way==0] <- 0.001 ## res <- Tucker3(va3way) res print(res$fit) print(res$A) ## Print the core matrix print(res$GA) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")
The dataset contains the undeclared labor in thousands work units. The data originate from Italy and are recorded at a regional level over a certain time horizon for five macroeconomic activities defined according to NACE Rev. 1.1 classification.
data("ulabor")
data("ulabor")
A three-way array with dimension 22x5x5. The first dimension refers to 22 regions in Italy. The second dimension refers to the 5 economic activities. The third dimension refers to the years in the period 2001-2009.
ISTAT (2011). Note metodologiche, la misura dell'occupazione non regolare nelle stime di contabilita nazionale [online]. Roma.
ISTAT (2011). Note metodologiche, la misura dell'occupazione non regolare nelle stime di contabilita nazionale [online]. Roma.
Di Palma M.A., Filzmoser P., Gallo M. and Hron, K. (2016). A robust CP model for compositional data, submitted.
data(ulabor) dim(ulabor) str(ulabor) ## Plot robust and non-robust DD-plots of the ilr-transformed data usr <- par(mfrow=c(1,2)) res1 <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") res2 <- Parafac(ulabor, coda.transform="ilr") plot(res1) plot(res2) par(usr) ## Not run: ## Plot Orthonormalized A-mode component plot res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") plot(res, which="comp", mode="A", main="Component plot, A-mode") ## Plot Orthonormalized B-mode component plot plot(res, which="comp", mode="B", main="Component plot, B-mode") ## Plot Orthonormalized B-mode component plot plot(res, which="comp", mode="C", main="Component plot, C-mode") ## Per component plot ## adapted for the example and only for robust, ilr transformed model ## ## res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") plot(res, which="percomp") # component 1 plot(res, which="percomp", comp=2) # component 2 ## End(Not run)
data(ulabor) dim(ulabor) str(ulabor) ## Plot robust and non-robust DD-plots of the ilr-transformed data usr <- par(mfrow=c(1,2)) res1 <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") res2 <- Parafac(ulabor, coda.transform="ilr") plot(res1) plot(res2) par(usr) ## Not run: ## Plot Orthonormalized A-mode component plot res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") plot(res, which="comp", mode="A", main="Component plot, A-mode") ## Plot Orthonormalized B-mode component plot plot(res, which="comp", mode="B", main="Component plot, B-mode") ## Plot Orthonormalized B-mode component plot plot(res, which="comp", mode="C", main="Component plot, C-mode") ## Per component plot ## adapted for the example and only for robust, ilr transformed model ## ## res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr") plot(res, which="percomp") # component 1 plot(res, which="percomp", comp=2) # component 2 ## End(Not run)
Conducts matricizations of a three-way array into matrices according to the selected mode.
unfold(x, mode=c("A", "B", "C"))
unfold(x, mode=c("A", "B", "C"))
x |
Array to be unfolded |
mode |
the selected mode for unfolding |
A matrix represnting the input array, according to the selected mode:
Mode=A: B
-mode entities are nested within C
-mode entities (all the frontal slices of the array next to each other)
item Mode=B: C
-mode entities nested within A
-mode entities (all the horizontal slices of the array next to each other)
item Mode C: A
-mode entities nested within B
-mode entities (all the lateral slices of the array next to each other)
Valentin Todorov [email protected]
H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14:105–122.
(X <- array(1:24, c(4,3,2))) dim(X) ## matricize the array ## matricized X with the A-mode entities in its rows ## all the frontal slices of the array next to each other ## (Xa <- unfold(X)) dim(Xa) ## matricized X with the B-mode entities in its rows ## all the horizontal slices of the array next to each other ## (Xb <- unfold(X, mode="B")) dim(Xb) ## matricized X with the C-mode entities in its rows ## all the lateral slices of the array next to each other ## (Xc <- unfold(X, mode="C")) dim(Xc)
(X <- array(1:24, c(4,3,2))) dim(X) ## matricize the array ## matricized X with the A-mode entities in its rows ## all the frontal slices of the array next to each other ## (Xa <- unfold(X)) dim(Xa) ## matricized X with the B-mode entities in its rows ## all the horizontal slices of the array next to each other ## (Xb <- unfold(X, mode="B")) dim(Xb) ## matricized X with the C-mode entities in its rows ## all the lateral slices of the array next to each other ## (Xc <- unfold(X, mode="C")) dim(Xc)
A three-way array containing manufacturing value added by technology intensity for 55 countries in the period 2000–2010. UNIDO maintains a unique database containing key industrial statistics indicators for more than 160 countries in the world in the period 1963-2011: INDSTAT 2, available at https://stat.unido.org. The data are organized according to the International Standard Industrial Classification of all economic activities (ISIC) Revision 3 at 2-digit level. The present data set was created by aggregating the 23 2-digit divisions into five groups according to technology intensity, using the UNIDO derived classification (Upadhyaya, 2011). Then 55 countries were selected which have relatively complete data in the period 2000–2010.
data(va3way)
data(va3way)
A three-way array with dimension 55x5x11. The first dimension refers to 55 countries. The second dimension refers to the five categories of technology intensity described above. The third dimension refers to the years in the period 2000–2010.
Note that the values in the second mode (sectors) sum up to a constant - the total manufacturing value added of a country in a given year and thus the data set has a compositional character.
Upahdyaya S (2011). Derived classifications for industrial performance indicators. In Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session STS022).
Upadhyaya S, Todorov V (2008). UNIDO Data Quality. UNIDO Staff Working Paper, Vienna.
data(va3way) ct <- 2 x <- va3way[ct,,]/1000000 plot(colnames(x), x[1,], ylim=c(min(x), max(x)), type="n", ylab="Manufacturing Value Added in million USD", xlab="Years") for(i in 1:nrow(x)) lines(colnames(x), x[i,], col=i) legend("topleft", legend=rownames(x), col=1:nrow(x), lwd=1) title(paste("Coutnry: ", rownames(va3way[,,1])[ct])) ## Treat quickly and dirty the zeros in the data set (if any) ## in order to be able to perform ilr transformation: va3way[va3way==0] <- 0.001 res <- Tucker3(va3way) ## ## Not yet a print function ## print(res$fit) print(res$A) ## Print the core matrix print(res$GA) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", main="Trajectory biplot")
data(va3way) ct <- 2 x <- va3way[ct,,]/1000000 plot(colnames(x), x[1,], ylim=c(min(x), max(x)), type="n", ylab="Manufacturing Value Added in million USD", xlab="Years") for(i in 1:nrow(x)) lines(colnames(x), x[i,], col=i) legend("topleft", legend=rownames(x), col=1:nrow(x), lwd=1) title(paste("Coutnry: ", rownames(va3way[,,1])[ct])) ## Treat quickly and dirty the zeros in the data set (if any) ## in order to be able to perform ilr transformation: va3way[va3way==0] <- 0.001 res <- Tucker3(va3way) ## ## Not yet a print function ## print(res$fit) print(res$A) ## Print the core matrix print(res$GA) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", main="Trajectory biplot")
Water quality data for three years of seasonal compositional groundwater chemistry data for 14 wells at a study site in Wyoming, USA. Routine water quality monitoring typically involves measurement of J parameters and constituents measured at I number of static locations at K sets of seasonal occurrences.
data("waterquality")
data("waterquality")
A three-way array with dimension 14x12x10. The first dimension refers to 14 wells at a study site in Wyoming, USA. The second dimension refers to the ten most reactive and indicative dissolved constituents at the site: B, Ba, Ca, Cl, K, Mg, Na, Si, Sr, and SO4. In addition, the concentration of water in each sample was calculated. The third dimension refers to the time of collection - ten occasions.
Engle, M.A., Gallo, M., Schroeder, K.T., Geboy, N.J., Zupancic, J.W., (2014). Three-way compositional analysis of water quality monitoring data. Environmental and Ecological Statistics, 21(3):565-581.
data(waterquality) dim(waterquality) # [1] 14 12 10 dim(waterquality[,,1]) # [1] 14 12 rownames(waterquality[,,1]) # the 14 wells colnames(waterquality[,,1]) # the 12 chemical compositions dim(waterquality[,1,]) # [1] 14 10 colnames(waterquality[,1,]) # the ten occasions (res <- Tucker3(waterquality, robust=FALSE, coda.transform="ilr")) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", main="Trajectory biplot")
data(waterquality) dim(waterquality) # [1] 14 12 10 dim(waterquality[,,1]) # [1] 14 12 rownames(waterquality[,,1]) # the 14 wells colnames(waterquality[,,1]) # the 12 chemical compositions dim(waterquality[,1,]) # [1] 14 10 colnames(waterquality[,1,]) # the ten occasions (res <- Tucker3(waterquality, robust=FALSE, coda.transform="ilr")) ## Distance-distance plot plot(res, which="dd", main="Distance-distance plot") ## Paired component plot, mode A plot(res, which="comp", main="Paired component plot (mode A)") ## Paired component plot, mode B plot(res, which="comp", mode="B", main="Paired component plot (mode B)") ## Joint biplot plot(res, which="jbplot", main="Joint biplot") ## Trajectory plot(res, which="tjplot", main="Trajectory biplot")