Package 'rrcov3way'

Title: Robust Methods for Multiway Data Analysis, Applicable also for Compositional Data
Description: Provides methods for multiway data analysis by means of Parafac and Tucker 3 models. Robust versions (Engelen and Hubert (2011) <doi:10.1016/j.aca.2011.04.043>) and versions for compositional data are also provided (Gallo (2015) <doi:10.1080/03610926.2013.798664>, Di Palma et al. (2018) <doi:10.1080/02664763.2017.1381669>). Several optimization methods alternative to ALS are available (Simonacci and Gallo (2019) <doi:10.1016/j.chemolab.2019.103822>, Simonacci and Gallo (2020) <doi:10.1007/s00500-019-04320-9>).
Authors: Valentin Todorov [aut, cre] , Violetta Simonacci [aut], Maria Anna Di Palma [aut], Michele Gallo [aut]
Maintainer: Valentin Todorov <[email protected]>
License: GPL (>=3)
Version: 0.5-0
Built: 2024-11-10 03:44:34 UTC
Source: https://github.com/valentint/rrcov3way

Help Index


Amino acids fluorescence data.

Description

A data set containing five simple laboratory-made samples where each sample contains different amounts of tyrosine, tryptophan and phenylalanine dissolved in phosphate buffered water. The samples were measured by fluorescence (excitation 240-300 nm, emission 250-450 nm, 1 nm intervals) on a PE LS50B spectrofluorometer.

Usage

data(amino)

Format

A three-way array with dimension 5x201x61. The first dimension refers to the 5 samples. The second dimension refers to the emission measurements (250-450nm, 1nm intervals). The third dimension refers to the excitation (240-300 nm, 1nm intervals).

Source

https://ucphchemometrics.com/datasets/.

References

Bro, R, PARAFAC: Tutorial and applications, Chemometrics and Intelligent Laboratory Systems, 1997, 38, 149-171 Bro, R, Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. 1998. Ph.D. Thesis, University of Amsterdam (NL) & Royal Veterinary and Agricultural University (DK). Kiers, H.A.L. (1998) A three-step algorithm for Candecomp/Parafac analysis of large data sets with multicollinearity, Journal of Chemometrics, 12, 155-171.

Examples

## Not run: 

 data(amino)
 ##  Plotting Emission spectra
 oldpar <- par(mfrow=c(2,1))
 matplot(t(amino[,,1]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence emission spectra")
 matplot(t(amino[,,5]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence emission spectra")
 par(oldpar)

 ##  Plotting excitation spectra
 oldpar <- par(mfrow=c(2,1))
 matplot(t(amino[,1,]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence excitation spectra")
 matplot(t(amino[,30,]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence excitation spectra")
 par(oldpar)


## End(Not run)

Chemical composition of water in the main stream of Arno river

Description

Chemical composition of water in the main stream of Arno river.

Usage

data("Arno")

Format

A three-way array with dimension 23x11x4. The first dimension refers to 23 distances from the spring. The second dimension refers to the 11 chemical compositions. The third dimension refers to the time of collection - four occasions.

Details

The Arno data example was used in Gallo and Buccinati (2013) to illustrate a particular version of the Tucker model, known as the weighted principal component analysis. The Tucker3 results are usually given in the form of tables or plots and in this work for the representation of the Tucker3 results of logratio data, is proposed to use one-mode plots, clr-joint biplots (Gallo, 2015), and trajectory plots.

Source

Nisi B., Vaselli O., Buccianti A., Minissale A., Delgado-Huertas A., Tassi F., Montegrossi G. (2008). Geochemical and isotopic investigation of the dissolved load in the running waters from the Arno valley: evaluation of the natural and anthropogenic input. In Memorie Descrittive della Carta Geologica d'Italia, Nisi (eds.), 79: 1-91.

Nisi B., Buccianti A., Vaselli O., Perini G., Tassi F., Minissale A., Montegrossi G. (2008) Hydrogeochemistry and strontium isotopes in the Arno river basin (Tuscany, Italy): Constraints on natural controls by statistical modeling. Journal of Hydrology 360: 166-183.

References

Gallo M. and Buccianti A. (2013). Weighted principal component analysis for compositional data: application example for the water chemistry of the Arno river (Tuscany, central Italy), Environmetrics, 24(4):269-277.

Gallo M. (2015). Tucker3 model for compositional data. Communications in Statistics-Theory and Methods, 44(21):4441-4453.

Examples

data(Arno)
dim(Arno)               # [1] 23 11  4
dim(Arno[,,1])          # [1] 23 11
rownames(Arno[,,1])     # the 23 distances from the spring
colnames(Arno[,,1])     # the 11 chemical compositions
dim(Arno[,1,])          # [1] 23  4
colnames(Arno[,1,])     # the four occasions

res <- Tucker3(Arno, robust=FALSE, coda.transform="ilr")
res

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")

## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")

## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)") 

## Joint biplot
plot(res, which="jbplot", main="Joint biplot") 

## Trajectory
plot(res, which="tjplot", main="Trajectory biplot")

Coefficient of factor congruence (phi)

Description

The function congruence(x, y) computes the Tucker's congruence (phi) coefficients among two sets of factors.

Usage

congruence(x, y = NULL)

Arguments

x

A vector or matrix of factor loadings.

y

A vector or matrix of factor loadings (may be NULL).

Details

Find the Tucker's coefficient of congruence between two sets of factor loadings. Factor congruences are the cosines of pairs of vectors defined by the loadings matrix and based at the origin. Thus, for loadings that differ only by a scaler (e.g. the size of the eigen value), the factor congruences will be 1.

For factor loading vectors of X and Y the measure of factor congruence, phi, is

ϕ=XY(X2)(Y2).\phi = \frac{\sum X Y}{\sqrt{\sum(X^2)\sum(Y^2)}} .

If y=NULL and x is a numeric matrix, the congruence coefficients between the columns of the matrix x are returned. The result is a symmetric matrix with ones on the diagonal. If two matrices are provided, they must have the same size and the result is a square matrix containing the congruence coefficients between all pairs of columns of the two matrices.

Value

A matrix of factor congruences.

Author(s)

Valentin Todorov, [email protected]

References

L.R Tucker (1951). A method for synthesis of factor analysis studies. Personnel Research Section Report No. 984. Department of the Army, Washington, DC.

Examples

require(rrcov)

 data(delivery, package="robustbase")
 X <- getLoadings(PcaClassic(delivery))
 Y <- getLoadings(PcaHubert(delivery, k=3))
 round(congruence(X,Y),3)

Alternating Least Squares (ALS) for Candecomp/Parafac (CP)

Description

Alternating Least Squares (ALS) algorithm with optional constraints for the minimization of the Candecomp/Parafac (CP) loss function.

Usage

cp_als(
  X,
  n,
  m,
  p,
  ncomp,
  const = "none",
  start = "random",
  conv = 1e-06,
  maxit = 10000,
  trace = FALSE
)

Arguments

X

A three-way array or a matrix. If X is a matrix (matricised threeway array), n, m and p must be given and are the number of A-, B- and C-mode entities respectively

n

Number of A-mode entities

m

Number of B-mode entities

p

Number of C-mode entities

ncomp

Number of components to extract

const

Optional constraints for each mode. Can be a three element character vector or a single character, one of "none" for no constraints (default), "orth" for orthogonality constraints, "nonneg" for nonnegativity constraints or "zerocor" for zero correlation between the extracted factors. For example, const="orth" means orthogonality constraints for all modes, while const=c("orth", "none", "none") sets the orthogonality constraint only for mode A.

start

Initial values for the A, B and C components. Can be "svd" for starting point of the algorithm from SVD's, "random" for random starting point (orthonormalized component matrices or nonnegative matrices in case of nonnegativity constraint), or a list containing user specified components.

conv

Convergence criterion, default is conv=1e-6.

maxit

Maximum number of iterations, default is maxit=10000.

trace

Logical, provide trace output.

Value

The result of the decomposition as a list with the following elements:

  • fit Value of the loss function

  • fp Fit value expressed as a percentage

  • ss Sum of squares

  • A Component matrix for the A-mode

  • B Component matrix for the B-mode

  • C Component matrix for the C-mode

  • iter Number of iterations

  • tripcos Minimal triple cosine between two components across the three component matrices, used to inspect degeneracy

  • mintripcos Minimal triple cosine during the iterative algorithm observed at every 10 iterations, used to inspect degeneracy

  • ftiter Matrix containing in each row the function value and the minimal triple cosine at every 10 iterations

  • const Optional constraints (same as the input parameter const)

Note

The argument const should be a three element character vector. Set const[j]="none" for unconstrained update in j-th mode weight matrix (the default), const[j]="orth" for orthogonal update in j-th mode weight matrix, const[j]="nonneg" for non-negative constraint on j-th mode or const[j]="zerocor" for zero correlation between the extracted factors. The default is unconstrained update for all modes.

The loss function to be minimized is sum(k)X(k)AD(k)B2sum(k)|| X(k) - A D(k) B' ||^2, where D(k)D(k) is a diagonal matrix holding the k-th row of C.

Author(s)

Valentin Todorov, [email protected]

References

Harshman, R.A. (1970). Foundations of Parafac procedure: models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16: 1–84.

Harshman, R. A., & Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39–72.

Lawson CL, Hanson RJ (1974). Solving Least Squares Problems. Prentice Hall, Englewood Cliffs, NJ.

Examples

## Not run: 
## Example with the OECD data
 data(elind)
 dim(elind)

 res <- cp_als(elind, ncomp=3)
 res$fp
 res$fp
 res$iter

 res <- cp_als(elind, ncomp=3, const="nonneg")
 res$A

## End(Not run)

Alternating Trilinear Decomposition (ATLD) for Candecomp/Parafac (CP)

Description

Alternating Trilinear Decomposition algorithm for estimating the Candecomp/Parafac (CP) model. It is based on an alternating least squares principle and replaces the iterative procedure used in the traditional PARAFAC algorithm by an improved procedure. This improved procedure contains Moore-Penrose pseudoinverse computations based on singular value decomposition (SVD) which should be theoretically more robust to similarities in spectra and time profiles

Usage

cp_atld(
  X,
  n,
  m,
  p,
  ncomp,
  conv = 1e-06,
  start = "random",
  maxit = 5000,
  trace = FALSE
)

Arguments

X

A three-way array or a matrix. If X is a matrix (matricised threeway array), n, m and p must be given and are the number of A-, B- and C-mode entities respectively

n

Number of A-mode entities

m

Number of B-mode entities

p

Number of C-mode entities

ncomp

Number of components to extract

conv

Convergence criterion, default is conv=1e-6.

start

Initial values for the A, B and C components. Can be "svd" for starting point of the algorithm from SVD's, "random" for random starting point (orthonormalized component matrices or nonnegative matrices in case of nonnegativity constraint), or a list containing user specified components.

maxit

Maximum number of iterations, default is maxit=10000.

trace

Logical, provide trace output.

Value

The result of the decomposition as a list with the following elements:

  • fit Value of the loss function

  • fp Fit value expressed as a percentage

  • ss Sum of squares

  • A Component matrix for the A-mode

  • B Component matrix for the B-mode

  • C Component matrix for the C-mode

  • iter Number of iterations

  • tripcos Minimal triple cosine between two components across the three component matrices, used to inspect degeneracy

  • mintripcos Minimal triple cosine during the iterative algorithm observed at every 10 iterations, used to inspect degeneracy

  • ftiter Matrix containing in each row the function value and the minimal triple cosine at every 10 iterations

Author(s)

Valentin Todorov, [email protected]; Violetta Simonacci, [email protected]

References

H.-L. Wu, M. Shibukawa, K. Oguma, An alternating trilinear decomposition algorithm with application to calibration of HPLC-DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons, Journal of Chemometrics 12 (1998) 1–26.

Examples

## Not run: 
## Example with the OECD data
 data(elind)
 dim(elind)

 res <- cp_atld(elind, ncomp=3)
 res$fp
 res$fp
 res$iter

## End(Not run)

Generate PARAFAC data sets, optionally with outliers

Description

Generates nsim data sets according to the given parameters. If eps > 0, the specified fraction of random outliers of the identified by the parameter type type are added to the data sets.

Usage

cp_gen(
  I = 20,
  J = 20,
  K = 20,
  nsim = 200,
  nf = 3,
  noise = 0.05,
  noise1 = 0,
  Acol = TRUE,
  Bcol = TRUE,
  Ccol = TRUE,
  congA = 0.5,
  congB = 0.5,
  congC = 0.5,
  eps = 0,
  type = c("none", "bl", "gl", "og"),
  c1 = 10,
  c2 = 0.1,
  silent = FALSE
)

Arguments

I

number of observations

J

number of variables

K

number of occasions

nsim

number of data sets to generate

nf

number of PARAFAC components

noise

level of homoscedastic (HO) noise

noise1

level of heteroscedastic (HE) noise

Acol

whether to apply collinearity with factor congA to mode A

Bcol

whether to apply collinearity with factor congB to mode B

Ccol

whether to apply collinearity with factor congC to mode C

congA

collinearity factor for mode A

congB

collinearity factor for mode B

congC

collinearity factor for mode C

eps

fraction of outliers (percent contamination)

type

type of outliers: one of "none" for no outliers (possible only of eps==0), "bl" for bad leverage points, "gl" for good leverage points and "og" for orthogonal outliers

c1

parameter for outlier generation (c1=10 for type="bl" or type="gl" and c1=1 for type="og")

c2

parameter for outlier generation (c2=0.1 for type="bl" or type="og" and c2=0 for type="gl")

silent

whether to issue warnings

Value

A list consisting of the following lists:

  • As list of nsim matrices for the mode A

  • Bs list of nsim matrices for the mode B

  • Cs list of nsim matrices for the mode C

  • Xs list of nsim PARAFAC data sets, each with dimension IxJxK

  • Os list of nsim vectors containing the added outliers (if any)

  • param list of parameters used for generation of the data sets

References

Todorov, V. and Simonacci, V. and Gallo, M. and Trendafilov, N. (2023). A novel estimation procedure for robust CANDECOMP/PARAFAC model fitting. Econometrics and Statistics. In press.

Tomasi, G. and Bro, R., (2006). A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis 50 (7), 1700–1734.

Faber, N.M. and Bro, R. and Hopke, P.K. (2003). Recent developments in CANDECOMP/PARAFAC algorithms: A critical review. Chemometrics and Intelligent Laboratory Systems 65, 119–137.

Examples

##  Generate one PARAFAC data set (nsim=1) with R=2 components (nf=2) and dimensions
 ##  50x10x10. Apply 0.15 homoscedastic noise and 0.10 heteroscedastic noise, apply 
 ##  collinearity with congruence factor 0.5 to all modes. Add 20% bad leverage points.

 library(rrcov3way)
 xdat <- cp_gen(I=50, J=100, K=10, nsim=1, nf=2,
     noise=0.15, noise1=0.10, Acol=TRUE, Bcol=TRUE, Ccol=TRUE,
     congA=0.5, congB=0.5, congC=0.5,
     eps=0.2, type="bl")
 names(xdat)

ATLD-ALS algorithm for Candecomp/Parafac (CP)

Description

Integrated algorithm combining ATLD and ALS for the minimization of the Candecomp/Parafac (CP) loss function.

Usage

cp_int2(
  X,
  n,
  m,
  p,
  ncomp,
  initconv = 0.01,
  conv = 1e-06,
  const = "none",
  start = "random",
  maxit = 5000,
  trace = FALSE
)

Arguments

X

A three-way array or a matrix. If X is a matrix (matricised threeway array), n, m and p must be given and are the number of A-, B- and C-mode entities respectively

n

Number of A-mode entities

m

Number of B-mode entities

p

Number of C-mode entities

ncomp

Number of components to extract

initconv

Convergence criterion for the initialization phase (ATLD), default is conv=1e-2.

conv

Convergence criterion, default is conv=1e-6.

const

Optional constraints for each mode. Can be a three element character vector or a single character, one of "none" for no constraints (default), "orth" for orthogonality constraints or "zerocor" for zero correlation between the extracted factors. For example, const="orth" means orthogonality constraints for all modes, while const=c("orth", "none", "none") sets the orthogonality constraint only for mode A.

start

Initial values for the A, B and C components. Can be "svd" for starting point of the algorithm from SVD's, "random" for random starting point (orthonormalized component matrices or nonnegative matrices in case of nonnegativity constraint), or a list containing user specified components.

maxit

Maximum number of iterations, default is maxit=10000.

trace

Logical, provide trace output.

Value

The result of the decomposition as a list with the following elements:

  • fit Value of the loss function

  • fp Fit value expressed as a percentage

  • ss Sum of squares

  • A Component matrix for the A-mode

  • B Component matrix for the B-mode

  • C Component matrix for the C-mode

  • iter Number of iterations

  • tripcos Minimal triple cosine between two components across the three component matrices, used to inspect degeneracy

  • mintripcos Minimal triple cosine during the iterative algorithm observed at every 10 iterations, used to inspect degeneracy

  • ftiter Matrix containing in each row the function value and the minimal triple cosine at every 10 iterations

  • const Optional constraints (same as the input parameter const)

Note

The argument const should be a three element character vector. Set const[j]="none" for unconstrained update in j-th mode weight matrix (the default), const[j]="orth" for orthogonal update in j-th mode weight matrix or const[j]="zerocor" for zero correlation between the extracted factors. The default is unconstrained update for all modes.

The loss function to be minimized is sum(k)X(k)AD(k)B2sum(k)|| X(k) - A D(k) B' ||^2, where D(k)D(k) is a diagonal matrix holding the k-th row of C.

Author(s)

Valentin Todorov, [email protected]; Violetta Simonacci, [email protected]

References

H.-L. Wu, M. Shibukawa, K. Oguma, An alternating trilinear decomposition algorithm with application to calibration of HPLC-DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons, Journal of Chemometrics 12 (1998) 1–26.

Simonacci, V. and Gallo, M. (2020). An ATLD–ALS method for the trilinear decomposition of large third-order tensors, Soft Computing 24 13535–13546.

Todorov, V. and Simonacci, V. and Gallo, M. and Trendafilov, N. (2023). A novel estimation procedure for robust CANDECOMP/PARAFAC model fitting. Econometrics and Statistics. In press.

Examples

## Not run: 
## Example with the OECD data
 data(elind)
 dim(elind)

 res <- cp_int2(elind, ncomp=3)
 res$fp
 res$fp
 res$iter

 res <- cp_int2(elind, ncomp=3, const="orth")
 res$A

## End(Not run)

Postprocessing: renormalization, reflection and reordering; access to some of the components of the model.

Description

The estimated model will be renormalized, reflected (change of sign) or the components will be reordered. Functions that provide access to some components of the model: coordinates, weights.

Usage

## S3 method for class 'tucker3'
do3Postprocess(x, reflectA, reflectB, reflectC, reorderA, reorderB, reorderC, ...)
    ## S3 method for class 'parafac'
do3Postprocess(x, reflectA, reflectB, reflectC, reorder, ...)
    ## S3 method for class 'parafac'
coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...)
    ## S3 method for class 'tucker3'
coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...)
    ## S3 method for class 'parafac'
weights(object, ...)
    ## S3 method for class 'tucker3'
weights(object, mode = c("A", "B", "C"), ...)
    ## S3 method for class 'parafac'
reflect(x, mode = c("A", "B", "C"), rsign = 1, ...)
    ## S3 method for class 'tucker3'
reflect(x, mode = c("A", "B", "C"), rsign = 1, ...)

Arguments

x

Tucker3 or Parafac solution

object

Tucker3 or Parafac solution (alternative of x for the generic function weights())

reflectA

How to handle the signs of the components of mode A - can be a single number or a vector with length of the number of components of A

reflectB

How to handle the signs of the components of mode B - can be a single number or a vector with length of the number of components of B

reflectC

How to handle the signs of the components of mode C - can be a single number or a vector with length of the number of components of C

reorder

How to reorder the components of a Parafac solution - a vector with length of the number of components

reorderA

How to reorder the components of mode A - a vector with length of the number of components of A giving the new order

reorderB

How to reorder the components of mode B - a vector with length of the number of components of B giving the new order

reorderC

How to reorder the components of mode C - a vector with length of the number of components of C giving the new order

mode

For which mode to provide the coordinates or weights. Default is mode A

type

Which type of coordinates to provide. Default is "normalized"

rsign

How to change the sign of the components of the given mode. Can be a single number or a vector with length of the number of components of the corresponding mode.

...

Potential further arguments passed to lower level functions.

Value

The output value of do3Postproces() is the postprocessed solution, Parafac or Tucker3. The output of weights() and coordinates() are the respective values.

Author(s)

Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]

References

Kroonenberg (2008). Applied multiway data analysis. Wiley series in probability and statistics. Hoboken NJ, Wiley.

Examples

data(elind)
    x1 <- do3Scale(elind, center=TRUE, scale=TRUE)
    (cp <- Parafac(x1, ncomp=3, const=c("orth", "none", "none")))
    
    cp$B
    cp1 <- do3Postprocess(cp, reflectB=-1)      # change the sign of all components of B
    cp1$B
    weights(cp1)
    coordinates(cp1)
    coordinates(cp1, type="principal")

    ## Same as above - the centering and scaling is done inside the Parafac procedure
    (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), 
        center=TRUE, scale=TRUE))

    ## Robust estimation with robust scaling with median and mad
    (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), 
        center=median, scale=mad, robust=TRUE))

    ## Robust estimation with robust scaling with median and Qn (Rousseeuw and Croux, 1993)
    require(robustbase)
    (cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"), 
        center=median, scale=Qn, robust=TRUE))

Varimax Rotation for Tucker3 models

Description

Computes varimax rotation of the core and component matrix of a Tucker3 model to simple structure.

Usage

do3Rotate(x, ...)

## S3 method for class 'tucker3'
do3Rotate(x, weights = c(0, 0, 0), rotate = c("A", "B", "C"), ...)

Arguments

x

A Tucker 3 object

...

Potential further arguments passed to called functions.

weights

A numeric vector with length 3: relative weights (greater or equal 0) for the simplicity of the component matrices A, B and C respectively.

rotate

Within which mode to rotate the Tucker3 solution: rotate="A" means to rotate the component matrix A of mode A; rotate=c("A", "B") means to rotate the component matrices A and B of modes A and B respectively. Default is to rotate all modes, i.e. rotate=c("A", "B", "C").

Value

A list including the following components:

Author(s)

Valentin Todorov, [email protected]

Examples

## Rotation of a Tucker3 solution
 data(elind)
 (t3 <- Tucker3(elind, 3, 2, 2))
 xout <- do3Rotate(t3, c(3, 3, 3), rotate=c("A", "B", "C"))
 xout$vvalue

Centering and scaling

Description

Centering and/or normalization of a three way array or a matricized array across one mode (modes indicated by "A", "B" or "C").

Usage

## S3 method for class 'tucker3'
do3Scale(x, renorm.mode = c("A", "B", "C"), ...)
    ## S3 method for class 'parafac'
do3Scale(x, renorm.mode = c("A", "B", "C"), ...)
    ## Default S3 method:
do3Scale(x, center = FALSE, scale = FALSE, 
        center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), 
        scale.mode = c("B", "A", "C"), 
        only.data=TRUE, ...)

Arguments

x

Three dimensional array of order (I x J x K) or a matrix (or data.frame coerced to a matrix) of order (I x JK) containing the matricized array (frontal slices)

center

Whether and how to center the data. Can be NULL, logical TRUE or FALSE, function or a numeric vector with length corresponding to the number of columns in the corresponding mode. If center=TRUE, mean() is used; default is center=FALSE.

scale

Whether and how to scale the data. Can be NULL, logical TRUE or FALSE, function or a numeric vector with length corresponding to the number of columns in the corresponding mode. If scale=TRUE, sd() is used; default is scale=FALSE.

center.mode

Across which mode to center. Default is center.mode="A"

scale.mode

Within which mode to scale. Default is scale.mode="B"

renorm.mode

Within which mode to renormalize a Parafac or Tucker3 solution. See in Details how this is performed for the different models. Default is renorm.mode="A"

only.data

Whether to return only the centered/scaled data or also the center and the scale themselves. Default is only.data=TRUE

...

potential further arguments passed to lower level functions.

Value

A named list, consisting of the centered and/or scaled data, a center vector, a scale vector and the mode in which the data were centered/scaled.

Author(s)

Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]

References

Kiers, H.A.L. (2000).Towards a standardizrd notation and terminology in multiway analysis. Journal of Chemometrics, 14:105-122.

Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.

Examples

data(elind)
    (x1 <- do3Scale(elind, center=TRUE, scale=TRUE))
    (x2 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="B"))
    (x3 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="C", scale.mode="C"))

Dorrit fluorescence data.

Description

A data set with 27 synthetic samples containing different concentrations of four analytes (hydroquinone, tryptophan, phenylalanine and dopa) measured in a Perkin-Elmer LS50 B fluorescence spectrometer.

Usage

data(dorrit)

Format

A three-way array with dimension 27 x 116 x 18. The first dimension refers to the 27 samples. The second dimension refers to the emission measurements (251-481nm, 2nm intervals). The third dimension refers to the excitation (230-315nm, 5nm intervals).

Details

Each fluorescence landscape corresponding to each sample in the original data set consists of 233 emission wavelengths (250-482 nm) and 24 excitation wavelengths (200-315 nm taken each 5 nm). The fluorescence data is three-way. Ideally, the data is trilinear, the components of the modes corresponding to concentrations (27), emission spectra (233) and excitation spectra (24). Hence a three-way PARAFAC model should be capable of uniquely and meaningfully describing the variation in the data set.

The data set is modified as described in Engelen and Hubert (2011): the emission wavelengths are taken at 2 nm, noisy parts situated at the excitation wavelengths from 200 to 230 nm and at emission wavelengths below 250 nm are excluded. The severe Rayleigh scattering areas present in all samples are replaced by interpolated values. Thus we end up with a (27 x 116 x 18) data array.

Source

https://ucphchemometrics.com/datasets/.

References

Bro, R, Sidiropoulos, ND and Smilde, AK (2002). Maximum likelihood fitting using ordinary least squares algorithms. Journal of Chemometrics, 16(8–10), 387–400.

Riu, J and Bro, R (2003) Jack-knife estimation of standard errors and outlier detection in PARAFAC models. Chemometrics and Intelligent Laboratory Systems, 65(1), 35–49

Engelen, S and Hubert, M (2011) Detecting outlying samples in a parallel factor analysis model, Analytica Chemica Acta 705 155–165.

Baunsgaard, D (1999) Factors Affecting 3-way Modelling (PARAFAC) of Fluorescence Landscapes, Royal Veterinary and Agricultural University, Department of Dairy and Food Science, Frederiksberg, Denmark.

Examples

## Not run: 

 data(dorrit)
 ##  Plotting Emission spectra
 oldpar <- par(mfrow=c(2,1))
 matplot(t(dorrit[,,1]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence emission spectra")
 matplot(t(dorrit[,,5]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence emission spectra")
 par(oldpar)

 ##  Plotting excitation spectra
 oldpar <- par(mfrow=c(2,1))
 matplot(t(dorrit[,1,]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence excitation spectra")
 matplot(t(dorrit[,30,]), type="l",
     xlab="Wavelength/nm", ylab="Intensity",
     main="Fluorescence excitation spectra")
 par(oldpar)

 persp(as.numeric(dimnames(dorrit)[[2]]),
     as.numeric(dimnames(dorrit)[[3]]), dorrit[4,,],
     xlab="Emission", ylab="Excitation", zlab="Intensity",
     theta = 30, phi = 30, expand = 0.5, col = "lightblue",
     ticktype="detailed")

 pp <- Parafac(dorrit, ncomp=4, robust=TRUE)
 plot(pp)


## End(Not run)

OECD Electronics Industries Data

Description

OECD publishes comparative statistics of the export size of various sectors of the electronics industry:

  1. information science,

  2. telecommunication products,

  3. radio and television equipment,

  4. components and parts,

  5. electromedical equipment, and

  6. scientific equipment.

The data consist of specialisation indices of electronics industries of 23 European countries for the years 1973–1979. The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion for the world as a whole (see D'Ambra, 1985, p. 249 and Kroonenberg, 2008, p.282).

Usage

data(elind)

Format

A three-way array with dimension 23x6x7. The first dimension refers to 23 countries. The second dimension refers to the six indices of electronics industries. The third dimension refers to the years in the period 1978–1985.

Source

The data set is available from Pieter Kroonenberg's web site at: "three-mode.leidenuniv.nl/data/electronicindustriesinfo.htm"

References

D'Ambra, L. (1985). Alcune estensione dell'analisi in componenti principali per lo studio dei sistemi evolutivi. Uno studio sul commercio internazionale dell'elettronica. In: Ricerche Economiche. 2. del Dipartimento di Scienze Economiche Ca'Foscari, Venezia.

Kroonenberg PM (2008). Applied multiway data analysis. Wiley series in probability and statistics. John Wiley and Sons, Hoboken, NJ, p.282.

Examples

data(elind)

res <- Parafac(elind, robust=FALSE, coda.transform="none")

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")

## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")

## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)") 

## Per-component plot
plot(res, which="percomp", comp=1, main="Per component plot") 

## all components plot
plot(res, which="allcomp", main="All components plot", legend.position="topright")

Sempe girls' growth curves data

Description

Thirty girls selected from a French auxiological study (1953-1975) to get insight into the physical growth patterns of children from ages four to fifteen, Sempe (1987). They were measured yearly between the ages 4 and 15 on the following eight variables:

  1. weight = Weight

  2. length = Length

  3. crump = Crown-rump length

  4. head = Head circumference

  5. chest = Chest circumference

  6. arm = Arm

  7. calf = Calf

  8. pelvis = Pelvis

The data set is three way data array of size 30 (girls) x 8 (variables) x 12 (years).

Usage

data("girls")

Format

The format is a three way array with the following dimensions: The first dimension refers to 30 girls. The second dimension refers to the eight variables measured on the girls. The third dimension refers to the years – 4 to 15.

Details

The data are generally preprocessed as standard multiway profile data. For details see Kroonenberg (2008), Chapters 6 and 15.

Source

The data sets are available from Pieter Kroonenberg's web site at: "three-mode.leidenuniv.nl/data/girlsgrowthcurvesinfo.htm"

References

Sempe, M. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J.Janssen et al. Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.

Kroonenberg (2008). Applied multiway data analysis. Wiley series in probability and statistics. Hoboken NJ, Wiley.

Examples

data(girls)
str(girls)
## Center the data in mode A and find the "average girl"
center.girls <- do3Scale(girls, center=TRUE, only.data=FALSE)
X <- center.girls$x
center <- center.girls$center
average.girl <- as.data.frame(matrix(center, ncol=8, byrow=TRUE))
dimnames(average.girl) <- list(dimnames(X)[[3]], dimnames(X)[[2]])

## Divide these variables by 10 to reduce their range
average.girl$weight <- average.girl$weight/10
average.girl$length <- average.girl$length/10
average.girl$crrump <- average.girl$crrump/10

average.girl 
p <- ncol(average.girl)
plot(rownames(average.girl), average.girl[,1], ylim=c(min(average.girl), 
    max(average.girl)), type="n", xlab="Age", ylab="")
for(i in 1: p)
{
    lines(rownames(average.girl), average.girl[,i], lty=i, col=i) 
    points(rownames(average.girl), average.girl[,i], pch=i, col=i) 
}    
legend <- colnames(average.girl)
legend[1] <- paste0(legend[1], "*")
legend[2] <- paste0(legend[3], "*")
legend[3] <- paste0(legend[4], "*")
legend("topleft", legend=legend, col=1:p, lty=1:p, pch=1:p)

Parental behaviour in Japan

Description

The data are drawn from a study (Kojima, 1975) of the perception of parental behaviour by parents and their children. Two data sets, boys and girls are available as Kojima.boys and Kojima.girls.

  • Boys data were analysed in Kroonenberg (2008)

  • Girls data were analysed in Kroonenberg, Harshman, & Murakami (2009).

Usage

data(Kojima)

Format

Both data sets are three dimensional arrays:

  • boys: 150 x 18 x 4

  • girls: 153 x 18 x 4

The rows (1st mode) are 150 Japanese sons/153 Japanese daughters. The columns (2nd mode) are 18 scales (Acceptance, Child centerness, Possesiveness, etc.). The slices (3rd mode) are the 4 judgements (See Details for explanation).

Details

The boys data are ratings expressing the judgments of parents with respect to their own behaviour toward their sons, and the judgments of their sons with respect to their parents. Thus, there are four conditions:

  • Father-Own behaviour (F-F),

  • Mother-Own behaviour (M-M),

  • Son-Father (B-F),

  • Son-Mother (B-M).

The judgments involved 150 middle-class Japanese eighth-grade boys on the 18 subscales of the inventory. Thus, the data set consists of a 150 (Sons) x 18 (scales) x 4 (judgment combinations) data array.

Similarly, the girls data are ratings expressing the judgments of parents with respect to their own behaviour toward their daughters, and the judgments of their daughters with respect to their parents. Thus, there are four conditions:

  • Father-Own behaviour (F-F),

  • Mother-Own behaviour (M-M),

  • Daughter-Father (G-F),

  • Daughter-Mother (G-M).

The judgments involved 153 middle-class Japanese eighth-grade girls on the 18 subscales of the inventory. Thus, the data set consists of a 153 (Daughters) x 18 (scales) x 4 (judgment combinations) data array.

Preprocessing Given that the data are three-way profile data they are treated in the standard manner: centering per occasion-variable combination and by normalising the data after centring per lateral slice i.e. per scale over all sons/daughters x judges combinations. For details see Kroonenberg (2008), Chapter 13.

Source

The data sets are available from the Pieter Kroonenberg's web site at https://three-mode.leidenuniv.nl/.

References

Kojima, H. (1975). Inter-battery factor analysis of parents' and children's reports of parental behavior. Japanese Psychological Bulletin, 17, 33-48 (in Japanese).

Kroonenberg, P. M. (2008). Applied multiway data analysis. Wiley series in probability and statistics. Wiley, Hoboken NJ.

Kroonenberg, P. M., Harshman, R. A, & Murakami, T. (2009). Analysing three-way profile data using the Parafac and Tucker3 models illustrated with views on parenting. Applied Multivariate Research, 13:5-41. PDF available at: http://www.phaenex.uwindsor.ca/ojs/leddy/index.php/AMR/article/viewFile/2833/2271

Examples

data(Kojima)
    dim(Kojima.boys)
    dim(Kojima.girls)

The Khatri-Rao product of two matrices

Description

The function krp(A,B) returns the Khatri-Rao product of two matrices A and B, of dimensions I x K and J x K respectively. The result is an IJ x K matrix formed by the matching column-wise Kronecker products, i.e. the k-th column of the Khatri-Rao product is defined as kronecker(A[, k], B[, k]).

Usage

krp(A, B)

Arguments

A

Matrix of order I x K.

B

Matrix of order J x K.

Value

The IJ x K matrix of columnwise Kronecker products.

Author(s)

Valentin Todorov, [email protected]

References

Khatri, C. G., and Rao, C. Radhakrishna (1968). Solutions to Some Functional Equations and Their Applications to Characterization of Probability Distributions. Sankhya: Indian J. Statistics, Series A 30, 167-180.

Smilde, A., Bro R. and Gelardi, P. (2004). Multi-way Analysis: Applications in Chemical Sciences, Chichester:Wiley

Examples

a <- matrix(1:12, 3, 4)
b <- diag(1:4)
krp(a, b)
krp(b, a)

The trace of a square numeric matrix

Description

Computes the trace of a square numeric matrix. If A is not numeric and square matrix, the function terminates with an error message.

Usage

mtrace(A)

Arguments

A

A square numeric matrix.

Value

the sum of the values on the diagonal of the matrix A, i.e. sum(diag(A)).

Author(s)

Valentin Todorov, [email protected]

Examples

(a <- matrix(c(5,2,3, 4,-3,7, 4,1,2), ncol=3))
(b <- matrix(c(1,0,1, 0,1,2, 1,0,3), ncol=3))

mtrace(a)
mtrace(b)

## tr(A+B)=tr(A)+tr(B)
all.equal(mtrace(a) + mtrace(b), mtrace(a+b))

## tr(A)=tr(A')
all.equal(mtrace(a), mtrace(t(a)))

## tr(alphA)=alphatr(A)
alpha <- 0.5
all.equal(mtrace(alpha*a), alpha*mtrace(a))

##  tr(AB)=tr(BA)
all.equal(mtrace(a %*% b), mtrace(b %*% a))


##  tr(A)=tr(BAB-1)
all.equal(mtrace(a), mtrace(b %*% a %*% solve(b)))

Robust Parafac estimator for compositional data

Description

Compute a robust Parafac model for compositional data

Usage

Parafac(X, ncomp = 2, center = FALSE, 
    center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"),
    scale=FALSE, scale.mode=c("B", "A", "C"), 
    const="none", conv = 1e-06, start="svd", maxit=10000, 
    optim=c("als", "atld", "int2"),
    robust = FALSE, coda.transform=c("none", "ilr", "clr"), 
    ncomp.rpca = 0, alpha = 0.75, robiter = 100, crit=0.975, trace = FALSE)

Arguments

X

3-way array of data

ncomp

Number of components

center

Whether to center the data

center.mode

If centering the data, on which mode to do this

scale

Whether to scale the data

scale.mode

If scaling the data, on which mode to do this

const

Optional constraints for each mode. Can be a three element character vector or a single character, one of "none" for no constraints (default), "orth" for orthogonality constraints, "nonneg" for nonnegativity constraints or "zerocor" for zero correlation between the extracted factors. For example, const="orth" means orthogonality constraints for all modes, while const=c("orth", "none", "none") sets the orthogonality constraint only for mode A.

conv

Convergence criterion, defaults to 1e-6

start

Initial values for the A, B and C components. Can be "svd" for starting point of the algorithm from SVD's, "random" for random starting point (orthonormalized component matrices or nonnegative matrices in case of nonnegativity constraint), or a list containing user specified components.

maxit

Maximum number of iterations, default is maxit=10000.

optim

How to optimize the CP loss function, default is to use ALS, i.e. optim="als". Other optins are ATLD (optim="atld") and INT2 (optim="INT2"). Please note that ATLD cannot be used with the robust option.

robust

Whether to apply a robust estimation

coda.transform

If the data are a composition, use an ilr or clr transformation. Default is non-compositional data, i.e. coda.transform="none"

ncomp.rpca

Number of components for robust PCA

alpha

Measures the fraction of outliers the algorithm should resist. Allowed values are between 0.5 and 1 and the default is 0.75

robiter

Maximal number of iterations for robust estimation

crit

Cut-off for identifying outliers, default crit=0.975

trace

Logical, provide trace output

Details

The function can compute four versions of the Parafac model:

  1. Classical Parafac,

  2. Parafac for compositional data,

  3. Robust Parafac and

  4. Robust Parafac for compositional data.

This is controlled though the paramters robust=TRUE and coda.transform=c("none", "ilr").

Value

An object of class "parafac" which is basically a list with components:

fit

Fit value

fp

Fit percentage

ss

Sum of squares

A

Orthogonal loading matrix for the A-mode

B

Orthogonal loading matrix for the A-mode

Bclr

Orthogonal loading matrix for the B-mode, clr transformed. Available only if coda.transform="ilr", otherwise NULL

C

Orthogonal loading matrix for the C-mode

Xhat

(Robustly) reconstructed array

const

Optional constraints (same as the input parameter)

iter

Number of iterations

rd

Residual distances

sd

Score distances

flag

The observations whose residual distance rd is larger than cutoff.rd or score distance sd is larger than cutoff.sd, can be considered outliers and receive a flag equal to zero. The regular observations receive a flag 1

robust

The paramater robust, whether robust method is used or not

coda.transform

Which coda transformation is used, can be coda.transform=c("none", "ilr", "clr").

Author(s)

Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]

References

Harshman, R.A. (1970). Foundations of Parafac procedure: models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16: 1–84.

Engelen, S., Frosch, S. and Jorgensen, B.M. (2009). A fully robust PARAFAC method analyzing fluorescence data. Journal of Chemometrics, 23(3): 124–131.

Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.

Rousseeuw, P.J. and Driessen, K.V. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3): 212–223.

Egozcue J.J., Pawlowsky-Glahn V., Mateu-Figueras G. and Barcel'o-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279-300

Examples

#############
##
## Example with the UNIDO Manufacturing value added data

data(va3way)
dim(va3way)

## Treat quickly and dirty the zeros in the data set (if any) 
va3way[va3way==0] <- 0.001

## 
res <- Parafac(va3way)
res
print(res$fit)
print(res$A)

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")

data(ulabor)
res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
res

## Plot Orthonormalized A-mode component plot
plot(res, which="comp", mode="A", main="Component plot, A-mode")

## Plot Orthonormalized B-mode component plot
plot(res, which="comp", mode="B", main="Component plot, B-mode")

## Plot Orthonormalized C-mode component plot
plot(res, which="comp", mode="C", main="Component plot, C-mode")

Permutation of a matricized array

Description

Permutes the matricized (n x m x p) array X to the matricized array Y of order (m x p x n).

Usage

permute(X,n,m,p)

Arguments

X

Matrix (or data.frame coerced to a matrix) containing the matricized array

n

Number of A-mode entities of the array X

m

Number of B-mode entities of the array X

p

Number of C-mode entities of the array X

Value

Y

Matrix containing the permuted matricized array

References

H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14:105–122.

Examples

X <- array(1:24, c(4,3,2))
dim(X)

##  matricize the array
Xa <- unfold(X)     # matricized X with the A-mode entities in its rows
dim(Xa)
Xa

## matricized X with the B-mode entities in its rows
Xb <- permute(Xa, 4, 3, 2)
dim(Xb)
Xb

## matricized X with the C-mode entities in its rows
Xc <- permute(Xb, 3, 2, 4)
dim(Xc)
Xc

Plot a parafac or a tucker3 object

Description

Different plots for the results of Parafac or Tucker3 analysis, stored in a Parafac or a tucker3 object, see Details.

Usage

## S3 method for class 'tucker3'
plot(x, which = c("dd", "comp", "allcomp", "jbplot", 
    "tjplot", "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...)
    ## S3 method for class 'parafac'
plot(x, which = c("dd", "comp", "percomp", "allcomp", 
    "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...)

Arguments

x

A tucker3 object

which

Which plot to select (see Details). Default is dd, distance-distance plot.

ask

Generates all plots in interactive mode

id.n

Number of items to highlight

...

Other parameters to be passed to the lower level functions.

Details

Different plots for a tucker3 or parafac object will be produced. Use the parameter which to select which plot to produce:

dd

Distance-distance plot

comp

Paired components plot

percomp

Per-component plot - only for Parafac

allcomp

All components plot

jbplot

Joint biplot - only for Tucker3

tjplot

Trajectory plot - only for Tucker3

Author(s)

Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]

References

Kiers, H.A. (2000).Some procedures for displaying results from three-way methods. Journal of Chemometrics. 14(3): 151-170.

Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.

Examples

#############
##
## Example with the UNIDO Manufacturing value added data

data(va3way)
dim(va3way)

## Treat quickly and dirty the zeros in the data set (if any)
va3way[va3way==0] <- 0.001

##
res <- Tucker3(va3way)
res
print(res$fit)
print(res$A)

## Print the core matrix
print(res$GA)

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")

## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")

## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")

## Joint biplot
plot(res, which="jbplot", main="Joint biplot")

## Trajectory
plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")

Matrix to array conversion

Description

Restore an array from its matricization with all the frontal slices of the array next to each other (mode="A")

Usage

toArray(x, n, m, r, mode = c("A", "B", "C"))

Arguments

x

Matrix (or data.frame coerced to a matrix) containing the elements of the frontal slices of an array

n

number of A-mode elements

m

number of B-mode elements

r

number of C-mode elements

mode

in which mode is the matricized array

Value

Three way array

Author(s)

Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]

References

H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14: 105–122.

Examples

data(elind)
    di <- dim(elind)
    toArray(unfold(elind), di[1], di[2], di[3])

Robust Tucker3 estimator for compositional data

Description

Compute a robust Tucker3 model for compositional data

Usage

Tucker3(X, P = 2, Q = 2, R = 2, 
    center = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"), 
    scale = FALSE, scale.mode = c("B", "A", "C"), 
    conv = 1e-06, start="svd",
    robust = FALSE, coda.transform=c("none", "ilr", "clr"), 
    ncomp.rpca = 0, alpha = 0.75, robiter=100, crit=0.975, trace = FALSE)

Arguments

X

3-way array of data

P

Number of A-mode components

Q

Number of B-mode components

R

Number of C-mode components

center

Whether to center the data

center.mode

If scaling the data, on which mode to do this

scale

Whether to scale the data

scale.mode

If centering the data, on which mode to do this

conv

Convergence criterion, defaults to 1e-6

start

Initial values for the A, B and C components. Can be "svd" for starting point of the algorithm from SVD's, "random" for random starting point (orthonormalized component matrices), or a list containing user specified components.

robust

Whether to apply a robust estimation

coda.transform

If the data are a composition, use an ilr or clr transformation. Default is non-compositional data, i.e. coda.transform="none"

ncomp.rpca

Number of components for robust PCA

alpha

Measures the fraction of outliers the algorithm should resist. Allowed values are between 0.5 and 1 and the default is 0.75

robiter

Maximal number of iterations for robust estimation

crit

Cut-off for identifying outliers, default crit=0.975

trace

Logical, provide trace output

Details

The function can compute four versions of the Tucker3 model:

  1. Classical Tucker3,

  2. Tucker3 for compositional data,

  3. Robust Tucker3 and

  4. Robust Tucker3 for compositional data.

This is controlled through the parameters robust=TRUE and coda.transform="ilr".

Value

An object of class "tucker3" which is basically a list with components:

fit

Fit value

fp

Fit percentage

A

Orthogonal loading matrix for the A-mode

B

Orthogonal loading matrix for the B-mode

Bclr

Orthogonal loading matrix for the B-mode, clr transformed. Available only if coda.transform="ilr", otherwise NULL

C

Orthogonal loading matrix for the C-mode

GA

Core matrix, which describes the relation between A, B and C, unfolded in A-form. The largest squared elements of the core matrix indicate the most important factors in the model of X.

iter

Number of iterations

rd

Residual distances

sd

Score distances

flag

The observations whose residual distance RD is larger than cutoff.RD can be considered as outliers and receive a flag equal to zero. The regular observations receive a flag 1

robust

The paramater robust, whether robust method is used or not

coda.transform

The input paramater coda.transform, what trnasofrmation for compositional data was used

La

Diagonal matrix containing the intrinsic eigenvalues for A-mode

Lb

Diagonal matrix containing the intrinsic eigenvalues for B-mode

Lc

Diagonal matrix containing the intrinsic eigenvalues for C-mode

Author(s)

Valentin Todorov [email protected] and Maria Anna Di Palma [email protected] and Michele Gallo [email protected]

References

Tucker, L.R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31: 279–311.

Egozcue J.J., Pawlowsky-Glahn, V., Mateu-Figueras G. and Barcel'o-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279–300.

Examples

#############
##
## Example with the UNIDO Manufacturing value added data

data(va3way)
dim(va3way)

## Treat quickly and dirty the zeros in the data set (if any)
va3way[va3way==0] <- 0.001

##
res <- Tucker3(va3way)
res
print(res$fit)
print(res$A)

## Print the core matrix
print(res$GA)

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")

## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")

## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")

## Joint biplot
plot(res, which="jbplot", main="Joint biplot")

## Trajectory
plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")

Undeclared labor by region in Italy

Description

The dataset contains the undeclared labor in thousands work units. The data originate from Italy and are recorded at a regional level over a certain time horizon for five macroeconomic activities defined according to NACE Rev. 1.1 classification.

Usage

data("ulabor")

Format

A three-way array with dimension 22x5x5. The first dimension refers to 22 regions in Italy. The second dimension refers to the 5 economic activities. The third dimension refers to the years in the period 2001-2009.

Source

ISTAT (2011). Note metodologiche, la misura dell'occupazione non regolare nelle stime di contabilita nazionale [online]. Roma.

References

ISTAT (2011). Note metodologiche, la misura dell'occupazione non regolare nelle stime di contabilita nazionale [online]. Roma.

Di Palma M.A., Filzmoser P., Gallo M. and Hron, K. (2016). A robust CP model for compositional data, submitted.

Examples

data(ulabor)
dim(ulabor)
str(ulabor)

## Plot robust and non-robust DD-plots of the ilr-transformed data
usr <- par(mfrow=c(1,2))
res1 <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
res2 <- Parafac(ulabor, coda.transform="ilr")
plot(res1)
plot(res2)
par(usr)

## Not run: 

## Plot Orthonormalized A-mode component plot
res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
plot(res, which="comp", mode="A", main="Component plot, A-mode")

## Plot Orthonormalized B-mode component plot
plot(res, which="comp", mode="B", main="Component plot, B-mode")

## Plot Orthonormalized B-mode component plot
plot(res, which="comp", mode="C", main="Component plot, C-mode")


## Per component plot
## adapted for the example and only for robust, ilr transformed model
##
##
res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")

plot(res, which="percomp")              # component 1
plot(res, which="percomp", comp=2)      # component 2

## End(Not run)

Matrix unfolding

Description

Conducts matricizations of a three-way array into matrices according to the selected mode.

Usage

unfold(x, mode=c("A", "B", "C"))

Arguments

x

Array to be unfolded

mode

the selected mode for unfolding

Value

A matrix represnting the input array, according to the selected mode:

  • Mode=A: B-mode entities are nested within C-mode entities (all the frontal slices of the array next to each other) item Mode=B: C-mode entities nested within A-mode entities (all the horizontal slices of the array next to each other) item Mode C: A-mode entities nested within B-mode entities (all the lateral slices of the array next to each other)

Author(s)

Valentin Todorov [email protected]

References

H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14:105–122.

Examples

(X <- array(1:24, c(4,3,2)))
dim(X)

## matricize the array

##  matricized X with the A-mode entities in its rows
##  all the frontal slices of the array next to each other
##
(Xa <- unfold(X))
dim(Xa)

## matricized X with the B-mode entities in its rows
##  all the horizontal slices of the array next to each other
##
(Xb <- unfold(X, mode="B"))
dim(Xb)

## matricized X with the C-mode entities in its rows
##  all the lateral slices of the array next to each other
##
(Xc <- unfold(X, mode="C"))
dim(Xc)

Manufacturing value added by technology intensity for several years

Description

A three-way array containing manufacturing value added by technology intensity for 55 countries in the period 2000–2010. UNIDO maintains a unique database containing key industrial statistics indicators for more than 160 countries in the world in the period 1963-2011: INDSTAT 2, available at https://stat.unido.org. The data are organized according to the International Standard Industrial Classification of all economic activities (ISIC) Revision 3 at 2-digit level. The present data set was created by aggregating the 23 2-digit divisions into five groups according to technology intensity, using the UNIDO derived classification (Upadhyaya, 2011). Then 55 countries were selected which have relatively complete data in the period 2000–2010.

Usage

data(va3way)

Format

A three-way array with dimension 55x5x11. The first dimension refers to 55 countries. The second dimension refers to the five categories of technology intensity described above. The third dimension refers to the years in the period 2000–2010.

Details

Note that the values in the second mode (sectors) sum up to a constant - the total manufacturing value added of a country in a given year and thus the data set has a compositional character.

Source

https://stat.unido.org

References

Upahdyaya S (2011). Derived classifications for industrial performance indicators. In Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session STS022).

Upadhyaya S, Todorov V (2008). UNIDO Data Quality. UNIDO Staff Working Paper, Vienna.

Examples

data(va3way)
ct <- 2
x <- va3way[ct,,]/1000000
plot(colnames(x), x[1,], ylim=c(min(x), max(x)), type="n", ylab="Manufacturing Value 
    Added in million USD", xlab="Years")
for(i in 1:nrow(x))
    lines(colnames(x), x[i,], col=i)
legend("topleft", legend=rownames(x), col=1:nrow(x), lwd=1)
title(paste("Coutnry: ", rownames(va3way[,,1])[ct]))


## Treat quickly and dirty the zeros in the data set (if any) 
## in order to be able to perform ilr transformation:

va3way[va3way==0] <- 0.001

res <- Tucker3(va3way)

##
## Not yet a print function
##
print(res$fit)
print(res$A)

## Print the core matrix
print(res$GA)

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")

## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")

## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)") 

## Joint biplot
plot(res, which="jbplot", main="Joint biplot") 

## Trajectory
plot(res, which="tjplot", main="Trajectory biplot")

Water quality data in Wyoming, USA

Description

Water quality data for three years of seasonal compositional groundwater chemistry data for 14 wells at a study site in Wyoming, USA. Routine water quality monitoring typically involves measurement of J parameters and constituents measured at I number of static locations at K sets of seasonal occurrences.

Usage

data("waterquality")

Format

A three-way array with dimension 14x12x10. The first dimension refers to 14 wells at a study site in Wyoming, USA. The second dimension refers to the ten most reactive and indicative dissolved constituents at the site: B, Ba, Ca, Cl, K, Mg, Na, Si, Sr, and SO4. In addition, the concentration of water in each sample was calculated. The third dimension refers to the time of collection - ten occasions.

References

Engle, M.A., Gallo, M., Schroeder, K.T., Geboy, N.J., Zupancic, J.W., (2014). Three-way compositional analysis of water quality monitoring data. Environmental and Ecological Statistics, 21(3):565-581.

Examples

data(waterquality)
dim(waterquality)               # [1] 14 12 10
dim(waterquality[,,1])          # [1] 14 12
rownames(waterquality[,,1])     # the 14 wells
colnames(waterquality[,,1])     # the 12 chemical compositions
dim(waterquality[,1,])          # [1] 14 10
colnames(waterquality[,1,])     # the ten occasions

(res <- Tucker3(waterquality, robust=FALSE, coda.transform="ilr"))

## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")

## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)") 

## Joint biplot
plot(res, which="jbplot", main="Joint biplot") 

## Trajectory
plot(res, which="tjplot", main="Trajectory biplot")