Descriptive Analysis and PCA
Dominique Valentin, ENSBANA/CESG
Hervé Abdi, The University of Texas at Dallas
Back to the yogurt example
Texture
Thickness: consistency of the mass in the mouth
Rate of melt: amount of product melted after a certain pressure of the tongue
Graininess: amount of particles in the mass
Mouth coating: amount of film left on the mouth surfaces
Basic tastes
Sweet: sucrose
Sour: lactic acid
Bitter: caffeine
Salty: sodium chloride
Aroma
Water: tastes watered down
Flour: 1 spoon of flour mixed in water
Wood: shavings from pencil sharpening
Chalk: Smecta
Milk: whole milk
Raw pie crust: commercial raw pie crust
Cream: crème fraîche
Hazelnut: hazelnut powder
Earthy: earth
Mushroom: dry mushrooms soaked in water
9 panelists
5 yogurts: 2 cow's milk yogurts and 3 soy yogurts
Rating scales anchored from "Not at all" (Pas du tout) to "Very" (Très), shown for Bitter (Amer), Salty (Salé) and Astringent
Back to the yogurt example
Texture
[Figure: four bar charts of mean intensity (Intensité moyenne, axis from 0 to 10) for the products sojacarrefour, sojasun, sojade, velouté danone and leaderprice, one panel per descriptor: Farineux (flour), Épais (thickness), Gras (mouth coating), Fondant (melt). The letter labels on the bars are not recoverable from the transcript.]
Back to the yogurt example
Taste
[Figure: four bar charts of mean intensity (Intensité moyenne, axis from 0 to 10) for the products sojacarrefour, sojasun, sojade, velouté danone and leaderprice, one panel per descriptor: Astringent, Sucré (sweet), Acide (sour), Amer (bitter). The letter labels on the bars are not recoverable from the transcript.]
Back to the yogurt example
Aroma
[Figure: four bar charts of mean intensity (Intensité moyenne, axis from 0 to 10) for the products sojacarrefour, sojasun, sojade, velouté danone and leaderprice, one panel per descriptor: Farine (flour), Craie (chalk), Crème (cream), Noisette (hazelnut). The letter labels on the bars are not recoverable from the transcript.]
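Back to the yogurt example
[Figure: PCA of the yogurt profiles. Factor 1 accounts for 61.04 % of the variance and Factor 2 for 17.84 %.]
[Variable plot (axes from -0.8 to 0.8): farineux (floury), epais (thick), gras (mouth coating), fondant (melt), sucre (sweet), acide (sour), astringent, eau (water), farine (flour), bois (wood), craie (chalk), lait (milk), creme (cream), noisette (hazelnut), terreux (earthy), champignon (mushroom).]
[Product plot (Factor 1 from -4.5 to 3.0, Factor 2 from -2 to 2): soja bio, soja champion, Soja leaderprice, Soja carrefour, Soja bifidus, Soja sun, sojade, Soja délice, carrefour, velouté danone, danone bifidus, Leader price.]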
A solution: Principal Component Analysis
What is PCA?
PCA is a statistical technique that transforms a number of correlated variables into a smaller number of uncorrelated variables called principal components.
The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
The mathematical technique used in PCA is called eigen analysis.
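In symbols, the definition can be sketched as follows, assuming a data table with I observations whose columns have been centered. The notation (Y_c for the centered table, S for its covariance matrix, u_k and λ_k for the eigenvectors and eigenvalues, f_k for the components) is introduced here for illustration and does not come from the slides.

```latex
S = \frac{1}{I-1}\, Y_c^{\top} Y_c, \qquad
S\,u_k = \lambda_k\,u_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge 0

f_k = Y_c\,u_k, \qquad
\operatorname{var}(f_k) = \lambda_k, \qquad
\operatorname{cov}(f_k, f_l) = 0 \quad (k \ne l)
```

The ordering of the eigenvalues is what makes the first component carry the most variability, and the zero covariances are what "uncorrelated" means.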
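When to use PCA?
To analyze two-dimensional data tables describing I observations with J quantitative variables.
[Diagram: an I × J table with the observations i = 1, ..., I as rows and the variables j = 1, ..., J as columns; cell y_ij holds the value of variable j for observation i.]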
Why use PCA?
1. To evaluate the similarity between the observations, here the products
2. To detect structure in the relationships between variables, here the descriptors
3. To reduce the number of variables to allow for a graphical representation of the data
In short: to give a summary description of the products
General principle of PCA
[Diagram: the I × J data table (observations as rows, variables as columns, entries y_ij) is turned by diagonalization, or eigen analysis, into an I × K table of principal components PC1, ..., PCk, ..., PCK (entries Cp_ik). Plotting PC1 against PC2 gives two maps: the circle of correlations for the variables and the projection of the observations.]
A baby example: wine profile
Wine  Amber  Blackcurrant  Coconut  Leather  Musk  Gooseberry  Woody  Vanilla  Raspberry
v1    7      3             1        6        9     3           1      0        2
v2    0      5             1        1        0     7           0      1        6
v3    1      9             0        0        0     6           1      1        5
v4    1      6             7        0        1     6           4      6        4
v5    6      1             8        5        4     2           5      5        1
v6    1      6             5        1        0     5           5      7        6
v7    7      3             1        6        8     2           1      0        2
v8    6      3             0        5        5     3           1      1        3
v9    0      4             4        1        0     7           6      5        5
v10   4      2             6        5        6     2           5      7        1
v11   5      1             4        6        7     1           6      7        2
v12   1      6             0        1        0     5           0      1        8
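As a rough sketch, the wine table can be run through a PCA with NumPy. The code assumes the nine columns follow the order in which the descriptor names appear on the slide (Amber, Blackcurrant, Coconut, Leather, Musk, Gooseberry, Woody, Vanilla, Raspberry), and it analyses the correlation matrix; the slides do not say which variant the authors used, so both points are assumptions.

```python
import numpy as np

# Descriptor names in the assumed column order (read off the flattened
# slide header; the slide itself does not confirm this order).
descriptors = ["Amber", "Blackcurrant", "Coconut", "Leather", "Musk",
               "Gooseberry", "Woody", "Vanilla", "Raspberry"]

# Ratings for wines v1..v12, copied from the table.
wines = np.array([
    [7, 3, 1, 6, 9, 3, 1, 0, 2],
    [0, 5, 1, 1, 0, 7, 0, 1, 6],
    [1, 9, 0, 0, 0, 6, 1, 1, 5],
    [1, 6, 7, 0, 1, 6, 4, 6, 4],
    [6, 1, 8, 5, 4, 2, 5, 5, 1],
    [1, 6, 5, 1, 0, 5, 5, 7, 6],
    [7, 3, 1, 6, 8, 2, 1, 0, 2],
    [6, 3, 0, 5, 5, 3, 1, 1, 3],
    [0, 4, 4, 1, 0, 7, 6, 5, 5],
    [4, 2, 6, 5, 6, 2, 5, 7, 1],
    [5, 1, 4, 6, 7, 1, 6, 7, 2],
    [1, 6, 0, 1, 0, 5, 0, 1, 8],
], dtype=float)

# Correlation matrix between descriptors (columns are the variables).
corr = np.corrcoef(wines, rowvar=False)

# Eigen analysis; eigh returns ascending eigenvalues, so reorder them.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Standardize the wines and project them onto the first two components.
z = (wines - wines.mean(axis=0)) / wines.std(axis=0, ddof=1)
scores = z @ eigenvectors[:, :2]

percent = 100 * eigenvalues / eigenvalues.sum()
print("percent of variance per component:", np.round(percent, 1))
for i, (f1, f2) in enumerate(scores, start=1):
    print(f"v{i}: F1 = {f1:6.2f}, F2 = {f2:6.2f}")
```

Wines that end up close together on the (F1, F2) map have similar profiles. The sign of each axis is arbitrary, so the map may appear mirrored from one run or library to another.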
How to find the principal components?
Step 1: get some data
Step 2: subtract the means of the variables
Step 3: find the eigenvectors and eigenvalues of the covariance matrix
Step 4: find the principal components by projecting the observations onto the eigenvectors
Step 5: compute the loadings as the correlations between the original variables and the principal components
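The five steps can be sketched in a few lines of NumPy. This is one possible reading of the steps, not the authors' code: the function name `pca_steps` and the toy table are hypothetical, and the covariance matrix uses the usual n - 1 denominator, which the slides do not specify.

```python
import numpy as np

def pca_steps(data):
    """Follow the five steps on an observations-by-variables array."""
    data = np.asarray(data, dtype=float)             # step 1: the data
    centered = data - data.mean(axis=0)              # step 2: subtract the means
    cov = np.cov(centered, rowvar=False)             # step 3: covariance matrix...
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ...and its eigen analysis
    order = np.argsort(eigenvalues)[::-1]            # largest eigenvalue first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = centered @ eigenvectors                 # step 4: project observations
    # step 5: loadings = correlation of each variable with each component
    n_vars = data.shape[1]
    loadings = np.corrcoef(centered, scores, rowvar=False)[:n_vars, n_vars:]
    return eigenvalues, eigenvectors, scores, loadings

# Hypothetical toy table: 6 observations, 3 variables (not from the slides).
toy = [[2, 9, 4], [4, 7, 5], [5, 6, 3], [7, 4, 6], [8, 2, 4], [10, 2, 8]]
eigenvalues, eigenvectors, scores, loadings = pca_steps(toy)
print("eigenvalues:", np.round(eigenvalues, 3))
print("loadings (rows = variables, columns = components):")
print(np.round(loadings, 2))
```

The variance of each column of `scores` equals the matching eigenvalue, and the columns are uncorrelated with one another, which is the property the definition of PCA asks for.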
A 2D example: step 1 get the data
20 words:
Variable 1 = number of letters
Variable 2 = number of lines used to define the word in the dictionary
A 2D example: step 2 subtract the mean
Y = "length of words": $M_Y = 6$, $y = Y - M_Y$
W = "number of lines of the definition": $M_W = 8$, $w = W - M_W$
A 2D example: step 3 find the eigenvectors
A 2D example: step 4 project the observations
A 2D example: step 5 compute the loadings
Each loading is the Pearson correlation coefficient between an original variable (W or Y) and a component (F1 or F2):
r(W, F1) = 0.97
r(W, F2) = 0.23
r(Y, F1) = -0.87
r(Y, F2) = 0.50
A 2D example: draw the circle of correlations
[Figure: circle of correlations with F1 and F2 as axes; W is plotted at (0.97, 0.23) and Y at (-0.87, 0.50).]
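A quick check on these loadings: with only two variables and two components, the components reproduce all of the variance, so each variable should sit on the unit circle, up to the rounding of the reported correlations. A small sketch using the four values from the slides (the dictionary labels are just for display):

```python
import math

# Loadings reported on the slides: (correlation with F1, correlation with F2).
loadings = {
    "W (lines in the definition)": (0.97, 0.23),
    "Y (number of letters)": (-0.87, 0.50),
}

for name, (r_f1, r_f2) in loadings.items():
    # The squared loadings, summed over components, give the share of the
    # variable's variance that the two components reproduce.
    share = r_f1 ** 2 + r_f2 ** 2
    radius = math.sqrt(share)
    print(f"{name}: sum of squared loadings = {share:.2f}, "
          f"distance from origin = {radius:.2f}")
```

In a study with more variables than retained components, points well inside the circle flag variables that the plotted components represent poorly.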
How to compute the explained variance?
Component   Eigenvalue   % variance   Cumulative % variance
1           392          88           88
2           52           12           100
Total       444
$\frac{392}{444} \times 100 \approx 88\,\%$
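The same arithmetic, sketched in Python with the two eigenvalues from the slide (the variable names are illustrative):

```python
eigenvalues = [392, 52]  # values from the slide

total = sum(eigenvalues)
# Share of the total variance carried by each component, in percent.
percents = [100 * value / total for value in eigenvalues]
# Running total of those shares.
cumulated = [sum(percents[:k + 1]) for k in range(len(percents))]

for k, (value, pct, cum) in enumerate(zip(eigenvalues, percents, cumulated), start=1):
    print(f"component {k}: eigenvalue {value}, "
          f"{pct:.0f}% of variance, {cum:.0f}% cumulated")
```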
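How many components to keep?
The Kaiser criterion: retain only components with eigenvalues greater than 1.
The scree test: plot the eigenvalues in decreasing order and look for the point where the curve levels off.
Common sense: keep dimensions that are interpretable. Examine several solutions and choose the one that makes the best "sense."
[Figure: plot with a vertical axis from 0 to 4 and components 1 to 8 on the horizontal axis, presumably the scree plot of the eigenvalues.]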
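A minimal sketch of the first two rules. The eigenvalues below are hypothetical, chosen to sum to 8 as they would for a PCA on the correlation matrix of 8 variables; they are not read off the slide's plot. The Kaiser threshold of 1 presupposes that kind of normalized analysis, where 1 is the average eigenvalue. The helper name `kaiser_keep` is also an assumption.

```python
import numpy as np

def kaiser_keep(eigenvalues):
    """Number of components whose eigenvalue exceeds 1 (Kaiser criterion)."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > 1.0))

# Hypothetical eigenvalues of an 8-variable correlation matrix (sum = 8).
eigenvalues = np.array([3.6, 1.9, 0.9, 0.6, 0.4, 0.3, 0.2, 0.1])

print("Kaiser criterion keeps", kaiser_keep(eigenvalues), "components")

# The scree test is normally done by eye on a plot; printing the drop
# between successive eigenvalues shows where the curve levels off.
drops = -np.diff(eigenvalues)
for k, drop in enumerate(drops, start=1):
    print(f"drop from component {k} to {k + 1}: {drop:.1f}")
```

The rules can disagree, which is why the slide also lists interpretability as a criterion.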
Should I normalize the data?
Yes, if the variables are not measured on the same scale.
Otherwise it depends:
Normalized: same weight for all variables
Not normalized: weight proportional to standard deviation
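The effect of normalizing can be seen on a small, purely hypothetical table in which the second variable is measured on a much larger scale than the first. The sketch compares the first axis from the covariance matrix (not normalized) with the first axis from the correlation matrix (normalized). The data and the helper `first_axis` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: 5 products described by two variables measured on very
# different scales (illustrative numbers, not from the slides).
data = np.array([[1, 100], [2, 300], [3, 200], [4, 500], [5, 400]], dtype=float)

def first_axis(matrix):
    """Eigenvector with the largest eigenvalue of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    return eigenvectors[:, np.argmax(eigenvalues)]

raw_axis = first_axis(np.cov(data, rowvar=False))        # not normalized
norm_axis = first_axis(np.corrcoef(data, rowvar=False))  # normalized

# Absolute values are shown because the sign of an eigenvector is arbitrary.
print("not normalized:", np.round(np.abs(raw_axis), 3))
print("normalized:    ", np.round(np.abs(norm_axis), 3))
```

Without normalization the first axis is almost entirely the large-scale variable. After normalization both variables contribute equally, which is always the case when there are only two variables.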