Elements of validity in Multiple Factor Analysis
Marine Cadoret, Sébastien Lê, Jérôme Pagès
Applied Mathematics Department, Agrocampus Rennes, France
Caserta, June 11th, 2008
SFC-CLADAG (Caserta) Elements of validity in MFA 1 / 20
Context
Problem
Selection of the number of dimensions in Principal Component Analysis (PCA) :
Bar plot of the eigenvalues
Visual test : Cattell criterion
Stability in spite of perturbations in the dataset
Methods PCA
Dray, 2007 : first dimension
$X \longrightarrow \hat{X}_1 = \sqrt{\lambda_1}\, v_1 u_1'$
$v_1$ : eigenvector of $XX'$ ; $u_1$ : eigenvector of $X'X$
Is the data table reconstituted from the first dimension (X̂1) closer to the original data (X) than would be expected for a random table?
Measure of similarity : RV coefficient (Escoufier, 1973)
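As an illustration of the statistic, here is a minimal NumPy sketch that rebuilds the rank-one table from the first dimension and computes its RV coefficient with the original table. The function names and the random example data are assumptions made for this sketch, not part of the talk.

```python
import numpy as np

def rv_coefficient(a, b):
    """RV coefficient between two tables describing the same rows."""
    a = a - a.mean(axis=0)          # center the columns
    b = b - b.mean(axis=0)
    wa, wb = a @ a.T, b @ b.T       # row cross-product matrices
    return np.sum(wa * wb) / np.sqrt(np.sum(wa * wa) * np.sum(wb * wb))

def reconstitute_first_dim(x):
    """Rank-one table sqrt(lambda_1) v_1 u_1' from the SVD of a centered table."""
    left, sing, right_t = np.linalg.svd(x, full_matrices=False)
    # left[:, 0] : eigenvector of XX' (v_1 on the slide)
    # right_t[0] : eigenvector of X'X (u_1 on the slide)
    # sing[0]    : square root of the largest eigenvalue of X'X
    return sing[0] * np.outer(left[:, 0], right_t[0])

rng = np.random.default_rng(0)              # illustrative random data
x = rng.normal(size=(20, 5))
x = (x - x.mean(axis=0)) / x.std(axis=0)    # standardized columns (assumption)
x1_hat = reconstitute_first_dim(x)
rv_obs = rv_coefficient(x, x1_hat)
print(round(rv_obs, 3))
```

With this definition the observed RV depends only on the eigenvalues: it equals λ1 / sqrt(Σ λk²), so it is close to 1 when the first dimension dominates.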
Is the observed RV coefficient large?
H0 : Absence of structure among variables
Procedure based on permutation tests
First dimension : permutation tests
Calculate the p-value associated with the observed RV :
1. Repeat a large number of times :
   1. Independent row permutations within each column of $X$ → $X^p$
   2. PCA on $X^p$
   3. Reconstitution of $X^p$ from the first dimension of the PCA on $X^p$ → $\hat{X}^p_1$
   4. Calculate $RV(X^p, \hat{X}^p_1)$
2. Distribution of the RV coefficient under H0
3. Locate the observed value in this distribution to get the p-value
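The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the PCA is taken on standardized columns, the RV between a table and its first-dimension reconstitution is computed through the identity RV = λ1 / sqrt(Σ λk²) (which follows from the SVD), and the p-value uses the common (1 + count) / (n_perm + 1) convention. All function names and the example data are hypothetical.

```python
import numpy as np

def rv_first_dim(x):
    """RV between the standardized table and its first-dimension reconstitution."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)          # normalized PCA (assumption)
    lam = np.linalg.svd(z, compute_uv=False) ** 2     # eigenvalues of Z'Z
    return lam[0] / np.sqrt(np.sum(lam ** 2))

def permute_within_columns(x, rng):
    """Independent row permutation within each column: X -> X^p."""
    return np.column_stack([rng.permutation(col) for col in x.T])

def permutation_p_value(x, n_perm=199, seed=0):
    """Observed RV and its permutation p-value under H0 (no structure among variables)."""
    rng = np.random.default_rng(seed)
    observed = rv_first_dim(x)
    null = np.array([rv_first_dim(permute_within_columns(x, rng))
                     for _ in range(n_perm)])
    # share of permuted tables with an RV at least as large as the observed one
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Illustrative data: five variables driven by one common factor plus noise
rng = np.random.default_rng(1)
factor = rng.normal(size=(30, 1))
x = factor + 0.3 * rng.normal(size=(30, 5))
rv_obs, p_value = permutation_p_value(x)
print(rv_obs, p_value)
```

Permuting each column separately keeps every variable's distribution intact and only destroys the links between variables, which is what H0 states.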
Evaluation of Dray's procedure
Behavior of the procedure under the alternative hypothesis (Dray)
Behavior of the procedure under the null hypothesis
Behavior of the procedure under H0 : first dimension
Simulation algorithm (flowchart ; loop counts shown on the diagram : ×1000 and ×10000)
1. Simulation of a dataset $X$ under H0
2. Compute the RV between $X$ and $\hat{X}_1$
3. Row permutations of $X$ → $X^p$
4. PCA on $X^p$
5. Reconstitution of the first dimension of $X^p$ → $\hat{X}^p_1$
6. Compute the RV between $X^p$ and $\hat{X}^p_1$
7. Distribution of RV
8. Compute the p-value associated with the observed RV
9. Distribution of the p-value under H0
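A scaled-down sketch of this simulation study is given below. It assumes that a dataset "under H0" is a table of independent standard normal variables, which the slides do not specify, and it uses far fewer datasets and permutations than the counts on the diagram so that it runs in a few seconds. Function names are hypothetical.

```python
import numpy as np

def rv_first_dim(x):
    """RV between the standardized table and its first-dimension reconstitution."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    lam = np.linalg.svd(z, compute_uv=False) ** 2
    return lam[0] / np.sqrt(np.sum(lam ** 2))

def permutation_p_value(x, n_perm, rng):
    """Permutation p-value of the observed RV (rows permuted within each column)."""
    observed = rv_first_dim(x)
    null = np.array([
        rv_first_dim(np.column_stack([rng.permutation(col) for col in x.T]))
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

def p_values_under_h0(n_datasets=100, n_perm=99, n_rows=20, n_cols=5, seed=0):
    """One p-value per dataset simulated without any structure among variables."""
    rng = np.random.default_rng(seed)
    return np.array([
        permutation_p_value(rng.normal(size=(n_rows, n_cols)), n_perm, rng)
        for _ in range(n_datasets)
    ])

p_vals = p_values_under_h0()
for alpha in (0.05, 0.20, 0.50):
    # share of simulated datasets declared significant at level alpha
    print(alpha, np.mean(p_vals <= alpha))
```

If the procedure is well calibrated, the printed shares should stay close to the corresponding levels, up to the sampling noise of a small simulation.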
Figure (First dimension) : % of datasets with significant 1st dimension (y-axis) against level (x-axis) ; both axes from 0.0 to 1.0
⇒ For a significance level of α%, we observe α% of data tables having a significant first dimension
Dray, 2007 : second dimension
We are in the space orthogonal to the first dimension
We use the same methodology as for the first dimension : we calculate the RV coefficient between X − X̂1 and X̂2.
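A minimal sketch of this statistic: the contribution of the earlier dimensions is removed from the table, and the RV coefficient is computed between that residual and the reconstitution from the dimension under test. The function names, the 0-based dimension index and the random data are assumptions made for the illustration.

```python
import numpy as np

def rv_coefficient(a, b):
    """RV coefficient between two tables describing the same rows."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    wa, wb = a @ a.T, b @ b.T
    return np.sum(wa * wb) / np.sqrt(np.sum(wa * wa) * np.sum(wb * wb))

def rv_dimension(x, s):
    """RV between X minus the reconstitutions of dimensions before s, and X_hat_s.

    s is 0-based: s=0 is the first dimension, s=1 the second.
    """
    x = x - x.mean(axis=0)
    left, sing, right_t = np.linalg.svd(x, full_matrices=False)
    parts = [sing[k] * np.outer(left[:, k], right_t[k]) for k in range(s + 1)]
    residual = x - sum(parts[:s])      # X - X_hat_1 when s = 1
    return rv_coefficient(residual, parts[s])

rng = np.random.default_rng(0)         # illustrative random data
x = rng.normal(size=(20, 5))
rv_second = rv_dimension(x, 1)
print(round(rv_second, 3))
```

Working on the residual means the second eigenvalue is compared only with the eigenvalues that remain once the first dimension is set aside: the value equals λ2 / sqrt(λ2² + λ3² + ...).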
Behavior of the procedure under H0 : second dimension
Same simulation procedure
Figure (Second dimension ; significance level of 20% for the first dimension) : % of datasets with significant 2nd dimension (y-axis) against level (x-axis) ; both axes from 0.0 to 1.0
Particular case
⇒ Stability ≠ Significant structure
Methods MFA
Multiple Factor Analysis
Multiple Factor Analysis deals with data tables in which a set of individuals (I) is described by several groups of variables (J)
MFA highlights a structure common to all the groups, to some groups, or specific to a group.
2 main questions
Does the dimension s correspond to a structure common to several groups?
In this case, which groups contribute to this common structure?
Existence of a common structure in MFA
H0 : Absence of common structure (no links between groups)
Row permutations within each group
First dimension : Calculate the RV coefficient between X and X̂1
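The test for a common structure can be sketched as below. This is an illustration under stated assumptions, not the authors' code: MFA is written as a PCA of the concatenated table in which each standardized group is divided by its first singular value (the usual MFA weighting, up to a constant that does not affect RV), the rows of each group are permuted as a block and independently across groups, and the p-value uses the (1 + count) / (n_perm + 1) convention. Function names and data are hypothetical.

```python
import numpy as np

def mfa_table(groups):
    """Concatenate standardized groups, each divided by its first singular value."""
    blocks = []
    for g in groups:
        z = (g - g.mean(axis=0)) / g.std(axis=0)
        blocks.append(z / np.linalg.svd(z, compute_uv=False)[0])
    return np.hstack(blocks)

def rv_first_dim(x):
    """RV between a centered table and its first-dimension reconstitution."""
    lam = np.linalg.svd(x, compute_uv=False) ** 2
    return lam[0] / np.sqrt(np.sum(lam ** 2))

def common_structure_p_value(groups, n_perm=199, seed=0):
    """Permutation test of H0: no links between groups (first dimension)."""
    rng = np.random.default_rng(seed)
    n = groups[0].shape[0]
    observed = rv_first_dim(mfa_table(groups))
    null = np.array([
        # one row permutation per group: within-group structure is preserved
        rv_first_dim(mfa_table([g[rng.permutation(n)] for g in groups]))
        for _ in range(n_perm)
    ])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)

# Illustrative data: three groups of variables sharing one common factor
rng = np.random.default_rng(1)
factor = rng.normal(size=(25, 1))
groups = [factor + 0.5 * rng.normal(size=(25, k)) for k in (3, 4, 5)]
rv_obs, p_value = common_structure_p_value(groups)
print(rv_obs, p_value)
```

Because every weighted group has a first eigenvalue of 1, the first eigenvalue of the global analysis lies between 1 and the number of groups, and it approaches the number of groups when all of them share the same main direction.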
Contribution of groups to the common structure
H0 : No contribution of the group j to the common structure
First dimension : Calculate the RV coefficient between Xj and [X̂j]1
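A possible sketch of the per-group statistic follows. [X̂j]1 is taken to be the columns of group j in the first-dimension reconstitution of the weighted global table. The slides do not state which permutation scheme goes with this H0 ; the sketch assumes that only the rows of group j are permuted while the other groups stay fixed. Function names, the weighting shortcut and the data are likewise assumptions.

```python
import numpy as np

def rv_coefficient(a, b):
    """RV coefficient between two tables describing the same rows."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    wa, wb = a @ a.T, b @ b.T
    return np.sum(wa * wb) / np.sqrt(np.sum(wa * wa) * np.sum(wb * wb))

def mfa_blocks(groups):
    """Standardized groups, each divided by its first singular value (MFA weighting)."""
    blocks = []
    for g in groups:
        z = (g - g.mean(axis=0)) / g.std(axis=0)
        blocks.append(z / np.linalg.svd(z, compute_uv=False)[0])
    return blocks

def group_rv(groups, j):
    """RV between group j and its part of the first-dimension reconstitution."""
    blocks = mfa_blocks(groups)
    left, sing, right_t = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    x1_hat = sing[0] * np.outer(left[:, 0], right_t[0])
    start = sum(b.shape[1] for b in blocks[:j])
    cols = slice(start, start + blocks[j].shape[1])
    return rv_coefficient(blocks[j], x1_hat[:, cols])

def group_p_value(groups, j, n_perm=199, seed=0):
    """Permutation p-value for group j (assumed scheme: permute rows of group j only)."""
    rng = np.random.default_rng(seed)
    n = groups[0].shape[0]
    observed = group_rv(groups, j)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = list(groups)
        shuffled[j] = groups[j][rng.permutation(n)]
        null[i] = group_rv(shuffled, j)
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)

# Illustrative data: three groups sharing one common factor
rng = np.random.default_rng(1)
factor = rng.normal(size=(30, 1))
groups = [factor + 0.5 * rng.normal(size=(30, k)) for k in (3, 4, 5)]
rv_group, p_group = group_p_value(groups, 0)
print(rv_group, p_group)
```

Under the assumed scheme, a small p-value for group j means that the first global dimension describes that group better than it would if the group were unrelated to the others.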
Application
Classical example of MFA (INRA Angers, Agrocampus Rennes, Spad, FactoMineR)
21 wines described by 27 variables gathered into 4 groups :
Olfaction before shaking : 5 variables
Vision : 3 variables
Olfaction after shaking : 10 variables
Gustation : 9 variables
Expected results (Dim.1 to Dim.4 ; each × marks one dimension on which the group is expected, column positions not recoverable) :
Olfaction before shaking : × × ×
Vision : ×
Olfaction after shaking : × × ×
Gustation : × ×
Application : Number of dimensions
        λ      P-value
Dim.1   3.46   < 0.001
Dim.2   1.37   < 0.001
Dim.3   0.62   0.004
Dim.4   0.37   0.15
Application : Contribution of the groups
Contribution
                           Dim.1     Dim.2   Dim.3     Dim.4
Olfaction before shaking   0.78      0.62    0.37      0.17
Vision                     0.85      0.04    0.01      0.05
Olfaction after shaking    0.92      0.47    0.18      0.10
Gustation                  0.90      0.24    0.05      0.05
Sum                        3.46      1.37    0.62      0.37
P-value
                           Dim.1     Dim.2   Dim.3     Dim.4
Olfaction before shaking   0.02      0.174   0.038     0.127
Vision                     0.007     0.104   0.387     0.149
Olfaction after shaking    < 0.001   0.004   < 0.001   0.638
Gustation                  < 0.001   0.002   0.278     0.39
Conclusion, perspective
Dray's procedure extended to MFA
Ambiguity between stability and significant structure
Implementation of systematic simulations in MFA
http://factominer.free.fr
R package dedicated to exploratory analysis, written by the Applied Mathematics Department of Agrocampus