discriminant analysis-lecture 8
4/30/2012
1
Linear Discriminant Analysis
Proposed by Fisher (1936) for classifying an observation into one of two possible groups based on many measurements x1, x2, …, xp.
Seek a linear transformation of the variables: Y = w1x1 + w2x2 + … + wpxp + a constant
Linear Discriminant Analysis
Discriminant analysis creates an equation that minimizes the probability of misclassifying cases into their respective groups or categories.
The purposes of discriminant analysis (DA)
Discriminant Function Analysis (DA) undertakes the same task as multiple linear regression: predicting an outcome.
However, multiple linear regression is limited to cases where the dependent variable is numerical, whereas many interesting outcome variables are categorical.
The objective of DA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible.
Assume we have a set of D-dimensional samples {x1, x2, …, xN}, N1 of which belong to class ω1 and N2 to class ω2.
We seek to obtain a scalar y by projecting the samples x onto a line:
$y = w^T x$
•The top two distributions overlap too much and do not discriminate as well as the bottom pair.
•Misclassification will be minimal in the lower pair, whereas many cases will be misclassified in the top pair.
Linear Discriminant Analysis
Assume the covariance matrices of the groups are equal.
Classify the item x at hand into one of J groups based on measurements on p predictors.
Rule: Assign x to the group j (j = 1, 2, …, J) that has the closest mean.
Distance Measure: Mahalanobis Distance.
Linear Discriminant Analysis
Distance Measure:
For j = 1, 2, …, J, compute
$$d_j(x) = (x - \bar{x}_j)^T S_{pl}^{-1} (x - \bar{x}_j)$$
Assign x to the group for which $d_j(x)$ is minimum.
$S_{pl}$ is the pooled estimate of the covariance matrix.
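The minimum-distance rule above can be sketched in NumPy as follows. This is only a minimal illustration: the function names `pooled_covariance` and `mahalanobis_classify` and the three small groups are hypothetical, and the pooled covariance is assumed to be the usual degrees-of-freedom weighted average of the group covariances.

```python
import numpy as np

def pooled_covariance(groups):
    """Pooled covariance S_pl from a list of (n_j x p) arrays."""
    n_total = sum(len(g) for g in groups)
    scatter = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    return scatter / (n_total - len(groups))

def mahalanobis_classify(x, groups):
    """Return (index of the nearest group mean, list of distances d_j(x))."""
    S_inv = np.linalg.inv(pooled_covariance(groups))
    d = []
    for g in groups:
        diff = x - g.mean(axis=0)
        d.append(float(diff @ S_inv @ diff))
    return int(np.argmin(d)), d

# Three hypothetical groups with means (2, 2), (7, 6) and (2, 8).
groups = [
    np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]),
    np.array([[6.0, 6.0], [7.0, 5.0], [7.0, 7.0], [8.0, 6.0]]),
    np.array([[1.0, 8.0], [2.0, 7.0], [2.0, 9.0], [3.0, 8.0]]),
]

label, d = mahalanobis_classify(np.array([6.5, 5.0]), groups)
print("assigned group index:", label)
print("distances:", d)
```

Here the new point lies closest to the second group mean, so the rule returns index 1 (zero-based).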
…or equivalently, assign x to the group for which
$$L_j(x) = \bar{x}_j^T S_{pl}^{-1} x - \frac{1}{2} \bar{x}_j^T S_{pl}^{-1} \bar{x}_j$$
is a maximum.
(Notice the linear form of the equation!)
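The equivalence between the minimum-distance rule and the maximum-score rule follows from expanding the quadratic form. The short derivation below is a worked step added for clarity; it relies only on the symmetry of $S_{pl}^{-1}$.

```latex
\begin{aligned}
d_j(x) &= (x - \bar{x}_j)^T S_{pl}^{-1} (x - \bar{x}_j) \\
       &= x^T S_{pl}^{-1} x - 2\,\bar{x}_j^T S_{pl}^{-1} x + \bar{x}_j^T S_{pl}^{-1} \bar{x}_j \\
       &= x^T S_{pl}^{-1} x - 2\left(\bar{x}_j^T S_{pl}^{-1} x - \tfrac{1}{2}\,\bar{x}_j^T S_{pl}^{-1} \bar{x}_j\right) \\
       &= x^T S_{pl}^{-1} x - 2\,L_j(x)
\end{aligned}
```

The term $x^T S_{pl}^{-1} x$ is the same for every group, so the group with the smallest distance is exactly the group with the largest $L_j(x)$, and the only part that depends on j is linear in x.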
Linear Discriminant Analysis
…optimal if….
• Multivariate normal distribution for the
observation in each of the groups
• Equal covariance matrix for all groups
• Equal prior probability for each group
• Equal costs for misclassification
Relaxing the assumption of equal prior probabilities…
$$L_j(x) = \ln p_j + \bar{x}_j^T S_{pl}^{-1} x - \frac{1}{2} \bar{x}_j^T S_{pl}^{-1} \bar{x}_j$$
$p_j$ being the prior probability for the jth group.
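To see how the prior term ln p_j can change a decision, the sketch below evaluates the linear score L_j(x) = ln p_j + mean_j^T S_pl^{-1} x - (1/2) mean_j^T S_pl^{-1} mean_j for one borderline point under two sets of priors. The means, the identity covariance, the priors, and the function name `linear_scores` are all made up for illustration.

```python
import numpy as np

def linear_scores(x, means, S_pl, priors):
    """L_j(x) = ln p_j + mean_j^T S^{-1} x - 0.5 * mean_j^T S^{-1} mean_j."""
    S_inv = np.linalg.inv(S_pl)
    scores = []
    for m, p in zip(means, priors):
        scores.append(np.log(p) + m @ S_inv @ x - 0.5 * m @ S_inv @ m)
    return np.array(scores)

# Hypothetical two-group setup with a shared identity covariance.
means = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
S_pl = np.eye(2)
x = np.array([1.1, 0.0])  # slightly closer to the second mean

equal = linear_scores(x, means, S_pl, [0.5, 0.5])
skewed = linear_scores(x, means, S_pl, [0.9, 0.1])

print("equal priors  -> group index", int(np.argmax(equal)))
print("skewed priors -> group index", int(np.argmax(skewed)))
```

With equal priors the point goes to the nearer mean; with a strong prior for the first group, the same point is assigned to the first group instead.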
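Relaxing the assumption of equal covariance matrices…
$$Q_j(x) = \ln p_j - \frac{1}{2} \ln |S_j| - \frac{1}{2} (x - \bar{x}_j)^T S_j^{-1} (x - \bar{x}_j)$$
where $S_j$ is the sample covariance matrix of group j.
Result: Quadratic Discriminant Analysis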
Quadratic Discriminant Analysis
Rule: assign x to group j if $Q_j(x)$ is the largest.
Optimal if
the J groups of measurements are
multivariate normal
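A minimal NumPy sketch of the quadratic rule follows, taking the score to be Q_j(x) = ln p_j - (1/2) ln|S_j| - (1/2) (x - mean_j)^T S_j^{-1} (x - mean_j). The two groups below are hypothetical: they share the same mean and differ only in covariance, a case where a rule based on a pooled covariance could not tell them apart. The function name `quadratic_scores` is likewise an assumption.

```python
import numpy as np

def quadratic_scores(x, means, covs, priors):
    """Q_j(x) = ln p_j - 0.5 ln|S_j| - 0.5 (x - m_j)^T S_j^{-1} (x - m_j)."""
    scores = []
    for m, S, p in zip(means, covs, priors):
        diff = x - m
        # slogdet returns (sign, log|det|); the log-determinant is what is needed.
        _, logdet = np.linalg.slogdet(S)
        maha = diff @ np.linalg.solve(S, diff)
        scores.append(np.log(p) - 0.5 * logdet - 0.5 * maha)
    return np.array(scores)

# Hypothetical groups: same mean, tight versus wide spread.
means = [np.zeros(2), np.zeros(2)]
covs = [np.eye(2), 9.0 * np.eye(2)]
priors = [0.5, 0.5]

near = quadratic_scores(np.array([0.5, 0.0]), means, covs, priors)
far = quadratic_scores(np.array([3.0, 0.0]), means, covs, priors)

print("point near the mean -> group index", int(np.argmax(near)))
print("point far from mean -> group index", int(np.argmax(far)))
```

Points near the common mean are assigned to the tight group and points far away to the wide group, so the decision boundary is a curve rather than a line.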
Other Extensions & Related Methods
Relaxing the assumption of normality…
Kernel density based LDA and QDA
Other extensions…..
Regularized discriminant analysis
Penalized discriminant analysis
Flexible discriminant analysis
Evaluations of the Methods
Classification Table (confusion matrix)
Actual group   Number of observations   Predicted group
                                        A      B
A              nA                       n11    n12
B              nB                       n21    n22
Evaluations of the Methods
Apparent Error Rate (APER):
APER = (# misclassified) / (total # of cases)
For the two-group table, APER = (n12 + n21) / (nA + nB).
….underestimates the actual error rate.
Improved estimate of the error rate:
Holdout Method or cross validation
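The gap between the apparent error rate and a holdout estimate can be sketched as below. This is an illustration under stated assumptions: the classifier is the nearest-mean rule with a pooled covariance, the holdout variant shown is leave-one-out (the slide does not say which variant is meant), the data are randomly generated, and all function names are hypothetical.

```python
import numpy as np

def fit(X, y):
    """Estimate group means and the inverse pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    scatter = sum((np.sum(y == c) - 1) * np.cov(X[y == c], rowvar=False)
                  for c in classes)
    S_pl = scatter / (len(y) - len(classes))
    return classes, means, np.linalg.inv(S_pl)

def predict(model, x):
    """Assign x to the group whose mean has the smallest Mahalanobis distance."""
    classes, means, S_inv = model
    diffs = means - x
    d = np.einsum("ij,jk,ik->i", diffs, S_inv, diffs)
    return classes[np.argmin(d)]

def apparent_error_rate(X, y):
    """Resubstitution: classify the same cases that were used for fitting."""
    model = fit(X, y)
    wrong = sum(predict(model, x) != t for x, t in zip(X, y))
    return wrong / len(y)

def leave_one_out_error_rate(X, y):
    """Hold out each case in turn, refit on the rest, classify the held-out case."""
    wrong = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit(X[keep], y[keep])
        wrong += predict(model, X[i]) != y[i]
    return wrong / len(y)

# Synthetic two-group data (seeded so the run is repeatable).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
               rng.normal(1.5, 1.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

aper = apparent_error_rate(X, y)
loo = leave_one_out_error_rate(X, y)
print(f"APER = {aper:.3f}, leave-one-out = {loo:.3f}")
```

Because the apparent rate reuses the fitting data, it tends to be optimistic; the leave-one-out figure never classifies a case with a rule that has seen that case.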
Fisher's iris dataset
•The data were collected by Anderson and used by Fisher to formulate linear discriminant analysis (LDA or DA).
•The dataset gives the measurements in centimeters of the following variables: 1- sepal length, 2- sepal width, 3- petal length, and 4- petal width, for 50 flowers from each of the 3 species of iris considered.
•The species considered are Iris setosa, versicolor, and virginica.
setosa versicolor virginica
An Example: Fisher’s Iris Data
Actual Group   Number of Observations   Predicted Group
                                        Setosa   Versicolor   Virginica
Setosa         50                       50       0            0
Versicolor     50                       0        48           2
Virginica      50                       0        1            49
Table 1: Linear Discriminant Analysis (APER = 0.0200)
An Example: Fisher’s Iris Data
Actual Group   Number of Observations   Predicted Group
                                        Setosa   Versicolor   Virginica
Setosa         50                       50       0            0
Versicolor     50                       0        47           3
Virginica      50                       0        1            49
Table 2: Quadratic Discriminant Analysis (APER = 0.0267)
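The two reported error rates can be checked directly from the classification counts. The short sketch below uses plain Python; the row layout (one row per actual species, one column per predicted species) is read off the tables above, and the function name `aper` is just a convenient label.

```python
def aper(confusion):
    """Apparent error rate: off-diagonal count divided by total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# Rows: actual setosa, versicolor, virginica; columns: predicted, same order.
lda_table = [[50, 0, 0], [0, 48, 2], [0, 1, 49]]
qda_table = [[50, 0, 0], [0, 47, 3], [0, 1, 49]]

print(f"LDA APER = {aper(lda_table):.4f}")  # 3 / 150
print(f"QDA APER = {aper(qda_table):.4f}")  # 4 / 150
```

In both tables every row sums to 50, and all misclassifications occur between versicolor and virginica; setosa is always classified correctly.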
An Example: Fisher’s Iris Data
[Figure: scatter plot of Petal Width (0.5 to 2.5) against Sepal Width (2.0 to 4.0) for the iris data; points are plotted as the letters s, v, and c, one letter per group.]
An Example: Fisher’s Iris Data
[Figure: two scatter plots of Petal Width (0.5 to 2.5) against Sepal Width (2.0 to 4.0) for the iris data; in the first, points are plotted as the letters s, v, and c; in the second, as the symbols +, o, and x.]
Summary
LDA is a powerful tool for classification.
It is widely implemented in statistical software.
Its theoretical properties are well researched.