discriminant analysis-lecture 8


Upload: laila-fatehy

Post on 21-May-2015


Page 1: Discriminant Analysis-lecture 8

4/30/2012

1

Linear Discriminant Analysis

Proposed by Fisher (1936) for classifying an observation into one of two possible groups based on many measurements $x_1, x_2, \ldots, x_p$.

Seek a linear transformation of the variables:

$Y = w_1 x_1 + w_2 x_2 + \cdots + w_p x_p + \text{constant}$

Page 2: Discriminant Analysis-lecture 8


Linear Discriminant Analysis

Discriminant analysis creates an equation that minimizes the probability of misclassifying cases into their respective groups or categories.

The purposes of discriminant analysis (DA)

Discriminant Function Analysis (DA) undertakes the same task as multiple linear regression: predicting an outcome.

However, multiple linear regression is limited to cases where the dependent variable is numerical.

But many interesting outcome variables are categorical, and that is the case DA addresses.

Page 3: Discriminant Analysis-lecture 8


The objective of DA is to perform dimensionality reduction while preserving as much of the class-discriminatory information as possible.

Assume we have a set of D-dimensional samples $\{x_1, x_2, \ldots, x_N\}$, $N_1$ of which belong to class $\omega_1$ and $N_2$ to class $\omega_2$.

We seek to obtain a scalar $y$ by projecting the samples $x$ onto a line:

$y = w^T x$

• The top two distributions overlap too much and do not discriminate as well as the bottom pair.

• Misclassification will be minimal in the lower pair, whereas many cases will be misclassified in the top pair.
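The projection $y = w^T x$ can be illustrated with a small sketch. The slides do not say how w is chosen; the code below assumes Fisher's classical choice, w proportional to $S_W^{-1}(m_1 - m_2)$, where $S_W$ is the within-class scatter matrix and $m_1$, $m_2$ are the class means. The data points and the name `fisher_direction` are made up for illustration.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return a unit vector w along S_W^{-1} (m1 - m2) (Fisher's direction)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two classes' centered cross-products.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)

# Made-up samples for classes omega_1 and omega_2 (D = 2).
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

w = fisher_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w  # scalar projections y = w^T x, one per sample
print("class 1 projections:", np.round(y1, 2))
print("class 2 projections:", np.round(y2, 2))
```

For these points the two sets of projected values do not overlap, which corresponds to the well-separated lower pair of distributions described above.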

Page 4: Discriminant Analysis-lecture 8


Linear Discriminant Analysis

Assume the covariance matrices of the groups are equal.

Classify the item x at hand into one of J groups based on measurements on p predictors.

Rule: assign x to the group j (j = 1, 2, …, J) that has the closest mean.

Distance measure: Mahalanobis distance.

Linear Discriminant Analysis

Distance measure:

For j = 1, 2, …, J, compute

$d_j(x) = (x - \bar{x}_j)^T S_{pl}^{-1} (x - \bar{x}_j)$

Assign x to the group for which $d_j(x)$ is minimum.

$S_{pl}$ is the pooled estimate of the covariance matrix and $\bar{x}_j$ is the mean of group j.
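A minimal sketch of the minimum-distance rule, assuming NumPy and made-up two-dimensional data. The helper names `pooled_covariance` and `classify_by_distance` are hypothetical, and the pooled estimate is taken to be the usual weighted average of the group sample covariance matrices with divisor N - J.

```python
import numpy as np

def pooled_covariance(groups):
    """Pooled estimate S_pl: sum of (n_j - 1) * S_j divided by (N - J)."""
    n_total = sum(len(X) for X in groups)
    scatter = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in groups)
    return scatter / (n_total - len(groups))

def classify_by_distance(x, groups):
    """Return (index of the group with the smallest d_j, list of all d_j)."""
    S_inv = np.linalg.inv(pooled_covariance(groups))
    d = []
    for X in groups:
        diff = x - X.mean(axis=0)          # x minus the group mean
        d.append(float(diff @ S_inv @ diff))
    return int(np.argmin(d)), d

# Made-up training data for J = 2 groups and p = 2 predictors.
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

group, distances = classify_by_distance(np.array([3.0, 4.0]), [X1, X2])
print(group, np.round(distances, 2))
```

The returned index is zero-based, so `0` stands for the first group in the list.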

Page 5: Discriminant Analysis-lecture 8


…or equivalently, assign x to the group for which

$L_j(x) = \bar{x}_j^T S_{pl}^{-1} x - \frac{1}{2} \bar{x}_j^T S_{pl}^{-1} \bar{x}_j$

is a maximum.

(Notice the linear form of the equation!)
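The equivalence between the minimum-distance rule and the maximum-score rule is not derived on the slides. A short derivation, added here as a sketch, expands the quadratic form and uses the symmetry of $S_{pl}^{-1}$, with $L_j(x) = \bar{x}_j^T S_{pl}^{-1} x - \frac{1}{2} \bar{x}_j^T S_{pl}^{-1} \bar{x}_j$:

```latex
\begin{aligned}
d_j(x) &= (x - \bar{x}_j)^T S_{pl}^{-1} (x - \bar{x}_j) \\
       &= x^T S_{pl}^{-1} x - 2\,\bar{x}_j^T S_{pl}^{-1} x + \bar{x}_j^T S_{pl}^{-1} \bar{x}_j \\
       &= x^T S_{pl}^{-1} x - 2\,L_j(x)
\end{aligned}
```

The term $x^T S_{pl}^{-1} x$ is the same for every group, so the group with the smallest $d_j(x)$ is the group with the largest $L_j(x)$. The quadratic term cancels only because all groups share the same $S_{pl}$, which is why the rule is linear in x.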

Linear Discriminant Analysis

…optimal if:

• The observations in each group follow a multivariate normal distribution

• All groups have the same covariance matrix

• All groups have the same prior probability

• Misclassification costs are equal

Page 6: Discriminant Analysis-lecture 8


Relaxing the assumption of equal prior probabilities…

$L_j(x) = \ln p_j + \bar{x}_j^T S_{pl}^{-1} x - \frac{1}{2} \bar{x}_j^T S_{pl}^{-1} \bar{x}_j$

$p_j$ being the prior probability for the jth group.
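To see how the $\ln p_j$ term shifts the decision, the following sketch evaluates the linear score for one observation under two sets of priors. It assumes NumPy; the means, the identity pooled covariance, the priors, and the function name `linear_scores` are all made up for illustration.

```python
import numpy as np

def linear_scores(x, means, S_pl, priors):
    """L_j(x) = ln p_j + xbar_j^T S_pl^{-1} x - 0.5 * xbar_j^T S_pl^{-1} xbar_j."""
    S_inv = np.linalg.inv(S_pl)
    return [float(np.log(p) + m @ S_inv @ x - 0.5 * m @ S_inv @ m)
            for m, p in zip(means, priors)]

# Hypothetical group means and pooled covariance (identity for simplicity).
means = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
S_pl = np.eye(2)
x = np.array([1.2, 0.0])  # slightly closer to the second mean

equal = linear_scores(x, means, S_pl, [0.5, 0.5])
skewed = linear_scores(x, means, S_pl, [0.9, 0.1])
print("equal priors  -> group", int(np.argmax(equal)))
print("skewed priors -> group", int(np.argmax(skewed)))
```

With equal priors the observation goes to the nearer mean; a strong prior for the first group is enough to reverse that assignment.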

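Relaxing the assumption of equal covariance matrices…

$Q_j(x) = \ln p_j - \frac{1}{2} \ln |S_j| - \frac{1}{2} (x - \bar{x}_j)^T S_j^{-1} (x - \bar{x}_j)$

$S_j$ being the sample covariance matrix of the jth group.

Result: Quadratic Discriminant Analysis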
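A sketch of the quadratic score $Q_j(x) = \ln p_j - \frac{1}{2} \ln |S_j| - \frac{1}{2} (x - \bar{x}_j)^T S_j^{-1} (x - \bar{x}_j)$, assuming NumPy. The two groups below are hypothetical: they share the same mean and differ only in spread, a situation where the linear rule cannot separate them but the quadratic rule can.

```python
import numpy as np

def quadratic_score(x, mean, S_j, prior):
    """Q_j(x) = ln p_j - 0.5 * ln|S_j| - 0.5 * (x - mean)^T S_j^{-1} (x - mean)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(S_j)          # numerically stable ln|S_j|
    maha = diff @ np.linalg.solve(S_j, diff)    # quadratic form without an explicit inverse
    return float(np.log(prior) - 0.5 * logdet - 0.5 * maha)

# Hypothetical groups: same mean, very different covariance matrices.
mean = np.array([0.0, 0.0])
S_tight, S_wide = np.eye(2), 9.0 * np.eye(2)

def assign(x):
    scores = [quadratic_score(x, mean, S_tight, 0.5),
              quadratic_score(x, mean, S_wide, 0.5)]
    return int(np.argmax(scores))

print(assign(np.array([0.5, 0.0])))  # near the centre
print(assign(np.array([4.0, 0.0])))  # far from the centre
```

Points near the common mean go to the tight group (index 0) and points far from it go to the wide group (index 1), so the decision boundary is a curve rather than a line.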

Page 7: Discriminant Analysis-lecture 8


Quadratic Discriminant Analysis

Rule: assign x to group j if $Q_j(x)$ is the largest.

Optimal if the J groups of measurements are multivariate normal.

Other Extensions & Related Methods

Relaxing the assumption of normality…

Kernel density based LDA and QDA

Other extensions…

Regularized discriminant analysis

Penalized discriminant analysis

Flexible discriminant analysis

Page 8: Discriminant Analysis-lecture 8


Evaluations of the Methods

Classification Table (confusion matrix)

Actual group | Number of observations | Predicted A | Predicted B
A            | nA                     | n11         | n12
B            | nB                     | n21         | n22

Evaluations of the Methods

Apparent Error Rate (APER):

APER = (# misclassified) / (total # of cases)

…underestimates the actual error rate.

Improved estimate of the error rate:

Holdout method or cross-validation
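The gap between the APER and a holdout estimate can be shown with a toy sketch in plain Python. It assumes a one-dimensional nearest-mean classifier (for one predictor and a common variance, the Mahalanobis rule reduces to this), leave-one-out as the holdout scheme, and made-up data; none of these choices come from the slides.

```python
def nearest_mean_predict(train, x):
    """train maps group label -> list of 1-D values; return the label with the closest mean."""
    means = {g: sum(v) / len(v) for g, v in train.items()}
    return min(means, key=lambda g: abs(x - means[g]))

def aper(data):
    """Apparent error rate: classify the same cases that were used to fit the rule."""
    wrong = sum(nearest_mean_predict(data, x) != g
                for g, vals in data.items() for x in vals)
    return wrong / sum(len(v) for v in data.values())

def holdout_error(data):
    """Leave-one-out: refit without each case, then classify that case."""
    wrong, total = 0, 0
    for g, vals in data.items():
        for i, x in enumerate(vals):
            train = {h: list(v) for h, v in data.items()}
            del train[g][i]                      # hold this case out
            wrong += nearest_mean_predict(train, x) != g
            total += 1
    return wrong / total

# Made-up measurements for two groups.
data = {"A": [1.0, 2.0, 3.0, 4.9], "B": [5.2, 7.0, 8.0, 9.0]}
print("APER:", aper(data))
print("holdout:", holdout_error(data))
```

On this data the rule classifies every training case correctly, yet two of the eight cases are misclassified once they are held out of the fit, which is the optimism the slide warns about.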

Page 9: Discriminant Analysis-lecture 8


Fisher's iris dataset

• The data were collected by Anderson and used by Fisher to formulate linear discriminant analysis (LDA or DA).

• The dataset gives the measurements in centimeters of the following variables: 1- sepal length, 2- sepal width, 3- petal length, and 4- petal width, for 50 flowers from each of the 3 species of iris considered.

• The species considered are Iris setosa, versicolor, and virginica.

[Images: setosa, versicolor, virginica]

Page 10: Discriminant Analysis-lecture 8


An Example: Fisher’s Iris Data

Actual group | Number of observations | Predicted Setosa | Predicted Versicolor | Predicted Virginica
Setosa       | 50                     | 50               | 0                    | 0
Versicolor   | 50                     | 0                | 48                   | 2
Virginica    | 50                     | 0                | 1                    | 49

Table 1: Linear Discriminant Analysis (APER = 0.0200)

An Example: Fisher’s Iris Data

Actual group | Number of observations | Predicted Setosa | Predicted Versicolor | Predicted Virginica
Setosa       | 50                     | 50               | 0                    | 0
Versicolor   | 50                     | 0                | 47                   | 3
Virginica    | 50                     | 0                | 1                    | 49

Table 2: Quadratic Discriminant Analysis (APER = 0.0267)
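The two APER values can be checked directly from the classification tables: the APER is the share of cases that fall off the diagonal. The sketch below is plain Python; the function name `apparent_error_rate` is illustrative, and the counts are copied from the LDA and QDA tables for the iris data (rows are actual groups, columns are predicted groups).

```python
def apparent_error_rate(confusion):
    """APER = misclassified cases / total cases, from a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# Rows and columns are ordered setosa, versicolor, virginica.
lda_table = [[50, 0, 0], [0, 48, 2], [0, 1, 49]]
qda_table = [[50, 0, 0], [0, 47, 3], [0, 1, 49]]

print(f"LDA APER = {apparent_error_rate(lda_table):.4f}")  # 3 of 150 misclassified
print(f"QDA APER = {apparent_error_rate(qda_table):.4f}")  # 4 of 150 misclassified
```

Both figures are apparent error rates computed on the training cases, so they are likely to be optimistic.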

Page 11: Discriminant Analysis-lecture 8


An Example: Fisher’s Iris Data

[Figure: scatter plot of petal width (axis from 0.5 to 2.5) against sepal width (axis from 2.0 to 4.0); points are drawn with the plotting symbols s, v, and c]

An Example: Fisher’s Iris Data

[Figure: two scatter plots of petal width (axis from 0.5 to 2.5) against sepal width (axis from 2.0 to 4.0); one uses the plotting symbols s, v, and c, the other uses +, o, and x]

Page 12: Discriminant Analysis-lecture 8


Summary

LDA is a powerful tool for classification.

It is widely implemented in statistical software.

Its theoretical properties are well researched.