
Descriptive Analysis and PCA

Hervé Abdi, The University of Texas at Dallas
Dominique Valentin, ENSBANA/CESG

Back to the yogurt example

Texture
Thickness: consistency of the mass in the mouth
Rate of melt: amount of product melted after a given pressure of the tongue
Graininess: amount of particles in the mass
Mouth coating: amount of film left on the mouth surfaces

Basic tastes (reference standards)
Sweet: sucrose
Sour: lactic acid
Bitter: caffeine
Salty: sodium chloride

Aroma (reference standards)
Water: watered-down taste
Flour: 1 spoon of flour mixed in water
Wood: pencil-sharpening shavings
Chalk: Smecta
Milk: whole milk
Raw pie crust: commercial raw pie crust
Cream: crème fraîche
Hazelnut: hazelnut powder
Earthy: earth
Mushroom: dry mushrooms soaked in water

Back to the yogurt example

9 panelists
5 yogurts: 2 cow-milk yogurts and 3 soy yogurts
Each attribute is rated on an intensity scale anchored from "not at all" to "very" (examples shown: bitter, salty, astringent).

Back to the yogurt example

[Figure: Texture. Bar charts of the mean intensity (0 to 10) of each yogurt (soja carrefour, sojasun, sojade, velouté danone, leader price) for Farineux (flour), Épais (thickness), Gras (mouth coating) and Fondant (melt). Letters above the bars indicate significance groups.]

Back to the yogurt example

[Figure: Astringency and taste. Bar charts of the mean intensity (0 to 10) of the five yogurts (soja carrefour, sojasun, sojade, velouté danone, leader price) for astringent, Sucré (sweet), Acide (sour) and Amer (bitter). Letters above the bars indicate significance groups.]

Back to the yogurt example

[Figure: Aroma. Bar charts of the mean intensity (0 to 10) of the five yogurts (soja carrefour, sojasun, sojade, velouté danone, leader price) for Farine (flour), Craie (chalk), Crème (cream) and Noisette (hazelnut). Letters above the bars indicate significance groups.]

A solution: Principal Component Analysis

[Figure: PCA of the yogurt profiles, Factor 1 (61.04% of variance) × Factor 2 (17.84% of variance). Circle of correlations of the descriptors: farineux (flour texture), epais (thickness), gras (mouth coating), fondant (melt), sucre (sweet), acide (sour), astringent, eau (water), farine (flour aroma), bois (wood), craie (chalk), lait (milk), creme (cream), noisette (hazelnut), terreux (earthy), champignon (mushroom). Map of the products: soja bio, soja champion, Soja leaderprice, Soja carrefour, Soja bifidus, Soja sun, sojade, Soja délice, carrefour, velouté danone, danone bifidus, Leader price.]

What is PCA?

A statistical technique used to transform a number of correlated variables into a smaller number of uncorrelated variables called principal components. Each principal component is a linear combination of the original variables.

The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.

The mathematical technique used in PCA is eigenanalysis: the principal components are obtained from the eigenvectors and eigenvalues of the covariance (or correlation) matrix of the variables.

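When to use PCA?

To analyze two-dimensional data tables describing I observations with J quantitative variables: in the data table $\mathbf{Y} = [y_{ij}]$, the rows are the observations $i = 1, \dots, I$, the columns are the variables $j = 1, \dots, J$, and $y_{ij}$ is the value of observation $i$ on variable $j$.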

Why use PCA?

1. To evaluate the similarity between the observations, here the products
2. To detect structure in the relationships between the variables, here the descriptors
3. To reduce the number of variables so that the data can be represented graphically

In short: to give a concise (synthetic) description of the products.

General principle of PCA

[Figure: diagonalization (eigenanalysis) transforms the I × J data table (observations 1…i…I in rows, variables 1…j…J in columns, entries $y_{ij}$) into an I × K table of principal components (PC1…PCk…PCK, entries $cp_{ik}$). Two plots are drawn on PC1 × PC2: the circle of correlations for the variables and the projection of the observations.]
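The diagonalization step can be carried out in two equivalent ways. A minimal NumPy sketch, assuming a small made-up 6 × 3 table: eigenanalysis of the covariance matrix, and the singular value decomposition (SVD) of the centered table, a route many PCA implementations take instead. The eigenvalues equal the squared singular values divided by I − 1, and the scores form the I × K table of principal components.

```python
import numpy as np

# Hypothetical 6 x 3 data table (rows = observations, columns = variables)
Y = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 2.0],
    [5.0, 4.0, 6.0],
    [7.0, 8.0, 5.0],
    [4.0, 3.0, 3.0],
    [6.0, 9.0, 7.0],
])
Yc = Y - Y.mean(axis=0)          # center each variable
I = Y.shape[0]

# Route 1: eigenanalysis (diagonalization) of the covariance matrix
eigvals = np.linalg.eigvalsh(np.cov(Yc, rowvar=False))[::-1]  # descending order

# Route 2: singular value decomposition of the centered table
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = U * s                   # I x K table of component scores (equals Yc @ Vt.T)

print(eigvals)
print(s**2 / (I - 1))            # same values as eigvals
```

The SVD route avoids forming the covariance matrix explicitly, which is numerically preferable when J is large.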

A baby example: wine profile

Wine  Amber  Blackcurrant  Coconut  Leather  Musk  Gooseberry  Woody  Vanilla  Raspberry
v1    7      3             1        6        9     3           1      0        2
v2    0      5             1        1        0     7           0      1        6
v3    1      9             0        0        0     6           1      1        5
v4    1      6             7        0        1     6           4      6        4
v5    6      1             8        5        4     2           5      5        1
v6    1      6             5        1        0     5           5      7        6
v7    7      3             1        6        8     2           1      0        2
v8    6      3             0        5        5     3           1      1        3
v9    0      4             4        1        0     7           6      5        5
v10   4      2             6        5        6     2           5      7        1
v11   5      1             4        6        7     1           6      7        2
v12   1      6             0        1        0     5           0      1        8
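A minimal NumPy sketch of a PCA of this table, assuming centered but not normalized data because all descriptors are rated on the same intensity scale; the slides do not state whether the original analysis normalized the data, and the signs of the components are arbitrary.

```python
import numpy as np

descriptors = ["Amber", "Blackcurrant", "Coconut", "Leather", "Musk",
               "Gooseberry", "Woody", "Vanilla", "Raspberry"]
# Wine profile table from the slide: 12 wines (rows) x 9 descriptors (columns)
wines = np.array([
    [7, 3, 1, 6, 9, 3, 1, 0, 2],   # v1
    [0, 5, 1, 1, 0, 7, 0, 1, 6],   # v2
    [1, 9, 0, 0, 0, 6, 1, 1, 5],   # v3
    [1, 6, 7, 0, 1, 6, 4, 6, 4],   # v4
    [6, 1, 8, 5, 4, 2, 5, 5, 1],   # v5
    [1, 6, 5, 1, 0, 5, 5, 7, 6],   # v6
    [7, 3, 1, 6, 8, 2, 1, 0, 2],   # v7
    [6, 3, 0, 5, 5, 3, 1, 1, 3],   # v8
    [0, 4, 4, 1, 0, 7, 6, 5, 5],   # v9
    [4, 2, 6, 5, 6, 2, 5, 7, 1],   # v10
    [5, 1, 4, 6, 7, 1, 6, 7, 2],   # v11
    [1, 6, 0, 1, 0, 5, 0, 1, 8],   # v12
], dtype=float)

Xc = wines - wines.mean(axis=0)                        # subtract descriptor means
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]                      # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                                  # coordinates of the wines
pct = 100 * eigvals / eigvals.sum()                    # % of variance per component

print(np.round(pct[:3], 1))                            # variance explained by PC1-PC3
for name, (f1, f2) in zip([f"v{i}" for i in range(1, 13)], scores[:, :2]):
    print(f"{name}: F1 = {f1:6.2f}  F2 = {f2:6.2f}")
```

Plotting F1 against F2 gives the map of the wines; the columns of `eigvecs` show which descriptors drive each component.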


How to find the principal components?

Step 1: get some data
Step 2: subtract the mean of each variable
Step 3: find the eigenvectors and eigenvalues of the covariance matrix
Step 4: find the principal components by projecting the observations onto the eigenvectors
Step 5: compute the loadings as the correlations between the original variables and the principal components
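The five steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the helper name `pca_steps` and the data are hypothetical, the eigenvectors come from `numpy.linalg.eigh` (suitable because a covariance matrix is symmetric), and their signs are arbitrary, so a component may come out flipped.

```python
import numpy as np

def pca_steps(X):
    """Steps 2-5 for a data table X (rows = observations, columns = variables)."""
    # Step 2: subtract the mean of each variable
    Xc = X - X.mean(axis=0)
    # Step 3: eigenvectors and eigenvalues of the covariance matrix
    C = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: principal components = projections of the observations
    F = Xc @ eigvecs
    # Step 5: loadings = correlations between variables and components
    J = X.shape[1]
    loadings = np.array([[np.corrcoef(X[:, j], F[:, k])[0, 1]
                          for k in range(J)] for j in range(J)])
    return eigvals, eigvecs, F, loadings

# Step 1: get some data (hypothetical values, 5 observations x 2 variables)
X = np.array([[3.0, 14.0], [6.0, 7.0], [8.0, 4.0], [4.0, 11.0], [9.0, 4.0]])
eigvals, eigvecs, F, loadings = pca_steps(X)

print(eigvals)                                    # variances of the components
print(np.round(np.corrcoef(F, rowvar=False), 6))  # identity: components uncorrelated
print(np.round(loadings, 2))
```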

A 2D example: step 1, get the data

20 words:
Variable 1 (Y) = number of letters
Variable 2 (W) = number of lines used to define the word in the dictionary


A 2D example: step 2, subtract the mean

Y = length of the word (number of letters), with mean $M_Y = 6$; centered variable $y = Y - M_Y$
W = number of lines of the definition, with mean $M_W = 8$; centered variable $w = W - M_W$
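A minimal sketch of step 2, assuming four made-up words (the 20-word table itself is not reproduced in this transcript) chosen so that the means match the slide, M_Y = 6 and M_W = 8:

```python
import numpy as np

# Hypothetical values for 4 words (not the 20 words of the example)
Y = np.array([3.0, 5.0, 9.0, 7.0])     # number of letters
W = np.array([14.0, 8.0, 2.0, 8.0])    # number of lines of the definition

M_Y, M_W = Y.mean(), W.mean()          # 6.0 and 8.0
y = Y - M_Y                            # centered length
w = W - M_W                            # centered number of lines
print(M_Y, M_W, y, w)
```

After centering, each variable sums to zero, so the origin of the plot sits at the average word.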


A 2D example: step 3, find the eigenvectors
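For two variables, step 3 can be done by hand. A worked derivation, writing the covariance matrix of the centered variables y and w with generic entries a = var(y), c = var(w) and b = cov(y, w) (symbols introduced here for illustration, not taken from the slides):

```latex
\mathbf{C} = \begin{bmatrix} a & b \\ b & c \end{bmatrix},
\qquad
\det(\mathbf{C} - \lambda \mathbf{I}) = \lambda^2 - (a + c)\lambda + (ac - b^2) = 0

\lambda_{1,2} = \frac{a + c}{2} \pm \sqrt{\left(\frac{a - c}{2}\right)^2 + b^2},
\qquad \lambda_1 + \lambda_2 = a + c

\mathbf{p}_k \propto \begin{bmatrix} b \\ \lambda_k - a \end{bmatrix} \quad (b \neq 0),
\qquad \text{normalized so that } \lVert \mathbf{p}_k \rVert = 1
```

Because λ1 + λ2 = a + c, the eigenvalues share out the total variance of the two variables, which is what the percentages of explained variance are computed from.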

A 2D example: project the observations
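Step 4 in symbols, with notation introduced here for illustration: observation n has centered values y_n and w_n, and p_k = (p_{1k}, p_{2k}) is the k-th unit eigenvector of the covariance matrix C, with eigenvalue λ_k. The coordinate (score) of observation n on component k is:

```latex
f_{nk} = p_{1k}\, y_n + p_{2k}\, w_n
\quad\Longleftrightarrow\quad
\mathbf{F} = \mathbf{X}\mathbf{P},
\qquad \mathbf{X} = [\,\mathbf{y} \;\; \mathbf{w}\,],\;
\mathbf{P} = [\,\mathbf{p}_1 \;\; \mathbf{p}_2\,]

\operatorname{var}(\mathbf{f}_k) = \mathbf{p}_k^{\top}\mathbf{C}\,\mathbf{p}_k = \lambda_k
```

The variance of the scores on component k equals its eigenvalue, which is why the eigenvalues measure the variance explained by each component.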

A 2D example: compute the loadings

Each loading is the Pearson correlation coefficient between an original variable and a principal component:
r(W, F1) = 0.97
r(W, F2) = 0.23
r(Y, F1) = -0.87
r(Y, F2) = 0.50
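A minimal sketch of step 5, assuming a small hypothetical two-variable table: each loading is computed as a Pearson correlation and checked against the equivalent closed form r(x_j, F_k) = p_jk √λ_k / σ_j, where p_jk is the j-th entry of the k-th unit eigenvector, λ_k its eigenvalue and σ_j the standard deviation of variable j.

```python
import numpy as np

# Hypothetical data: 8 observations of 2 variables (not the slide's 20 words)
X = np.array([[3, 14], [6, 7], [8, 4], [4, 11], [9, 4],
              [5, 9], [7, 8], [6, 7]], dtype=float)
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, P = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
eigvals, P = eigvals[order], P[:, order]
F = Xc @ P                                     # component scores

# Loadings as Pearson correlations r(variable j, component k)
r = np.corrcoef(np.hstack([X, F]), rowvar=False)[:2, 2:]

# Same values from the eigendecomposition: p_jk * sqrt(lambda_k) / sd_j
sd = X.std(axis=0, ddof=1)
r_formula = P * np.sqrt(eigvals) / sd[:, None]
print(np.round(r, 2))
```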

A 2D example: draw the circle of correlations

Each variable is drawn as a point whose coordinates are its loadings on F1 and F2:
W: r(W, F1) = 0.97, r(W, F2) = 0.23
Y: r(Y, F1) = -0.87, r(Y, F2) = 0.50
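Why a circle: with two variables and two components, each variable is an exact linear combination of the two uncorrelated components, so its squared loadings sum to 1 and its point lies on the unit circle. Checking the values above (the small departures from 1 come from rounding to two decimals):

```latex
r^2(W,F_1) + r^2(W,F_2) = 0.97^2 + 0.23^2 = 0.9409 + 0.0529 = 0.9938 \approx 1
r^2(Y,F_1) + r^2(Y,F_2) = (-0.87)^2 + 0.50^2 = 0.7569 + 0.2500 = 1.0069 \approx 1
```

When only some of the components are plotted, the sum is below 1 and the point falls inside the circle; points close to the circle are well represented by the plotted components.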

How to compute the explained variance?

Component   Eigenvalue   % variance   Cumulated % variance
1           392          88           88
2           52           12           100
Total       444          100

Explained variance of component 1: $\frac{392}{444} \times 100 \approx 88\%$
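The same computation as a short NumPy sketch, using the two eigenvalues from the slide:

```python
import numpy as np

eigenvalues = np.array([392.0, 52.0])   # values from the slide
total = eigenvalues.sum()                # 444
pct = 100 * eigenvalues / total          # % of variance per component
cum = np.cumsum(pct)                     # cumulated % of variance

for k, (lam, p, c) in enumerate(zip(eigenvalues, pct, cum), start=1):
    print(f"Component {k}: eigenvalue = {lam:.0f}, {p:.0f}%, cumulated {c:.0f}%")
# Component 1: eigenvalue = 392, 88%, cumulated 88%
# Component 2: eigenvalue = 52, 12%, cumulated 100%
```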

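How many components to keep?

The Kaiser criterion: retain only the components with eigenvalues greater than 1 (for normalized data, where 1 is the average eigenvalue).
The scree test: plot the eigenvalues in decreasing order and keep the components that come before the point where the curve levels off.
Common sense: keep the dimensions that are interpretable; examine several solutions and choose the one that makes the best "sense."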

[Figure: scree plot, eigenvalue (0 to 4) against component number (1 to 8).]
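A minimal NumPy sketch of the Kaiser criterion, assuming made-up data (the helper name `kaiser_count` is hypothetical). The threshold of 1 makes sense for normalized data: the eigenvalues of a correlation matrix sum to the number of variables J, so their average is 1, and a component with an eigenvalue above 1 explains more variance than a single original variable. The sorted eigenvalues are also the values drawn in a scree plot.

```python
import numpy as np

def kaiser_count(X):
    """Count components with eigenvalue > 1 in a PCA of normalized data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # normalize each variable
    R = np.cov(Z, rowvar=False)                        # equals the correlation matrix
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]     # decreasing: the scree-plot values
    return int(np.sum(eigvals > 1.0)), eigvals

# Hypothetical data: 50 observations of 4 variables driven by 2 latent factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))
noise = 0.2 * rng.normal(size=(50, 4))
X = latent[:, [0, 0, 1, 1]] + noise     # variables 1-2 share factor 1, 3-4 share factor 2

n_keep, eigvals = kaiser_count(X)
print(n_keep, np.round(eigvals, 2))
```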

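Should I normalize the data?

Yes, if the variables are not measured on the same scale.

Otherwise, it depends:
Normalized: all variables have the same weight (each is scaled to unit variance, so the PCA is computed on the correlation matrix).
Not normalized: each variable weighs in proportion to its variance, so variables with larger standard deviations dominate.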
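A minimal NumPy sketch of why the scale matters, assuming made-up data (the helper `first_eigen` is hypothetical): expressing one variable in units 100 times smaller multiplies its values by 100, which, without normalization, lets it dominate the first component; after normalization the result does not change.

```python
import numpy as np

def first_eigen(X, normalize):
    """Eigenvalues (descending) and first eigenvector of a PCA of X."""
    Xc = X - X.mean(axis=0)
    if normalize:
        Xc = Xc / X.std(axis=0, ddof=1)      # every variable gets variance 1
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order[0]]

# Hypothetical data: two correlated variables on a 0-10 scale
rng = np.random.default_rng(2)
a = rng.uniform(0, 10, size=30)
b = a + rng.normal(scale=2.0, size=30)
X = np.column_stack([a, b])
X_rescaled = X * np.array([100.0, 1.0])      # same data, variable 1 in other units

for normalize in (False, True):
    _, v1 = first_eigen(X, normalize)
    _, v2 = first_eigen(X_rescaled, normalize)
    print(f"normalize={normalize}: PC1 before {np.round(np.abs(v1), 2)}, "
          f"after rescaling {np.round(np.abs(v2), 2)}")
```

Absolute values are printed because the sign of an eigenvector is arbitrary.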