CS434a/541a: Pattern Recognition
Prof. Olga Veksler
Lecture 4
Outline

- Normal random variable
- Properties
- Discriminant functions
Why Normal Random Variables?

- Analytically tractable
- Works well when the observation comes from a single prototype (µ) corrupted by noise
- Optimal distribution of the data for many classifiers used in practice
The Univariate Normal Density

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where:
- µ = mean (or expected value) of x
- σ² = variance
- x is a scalar (has dimension 1)
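A minimal numpy sketch of this formula (the example values are illustrative only):

```python
import numpy as np

def univariate_normal_pdf(x, mu, sigma):
    """Evaluate the univariate normal density N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# A standard normal at x = 0 gives 1/sqrt(2*pi), approximately 0.3989
print(univariate_normal_pdf(0.0, mu=0.0, sigma=1.0))
```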
Several Features

- What if we have several features $x_1, x_2, \dots, x_d$?
  - each normally distributed
  - may have different means
  - may have different variances
  - may be dependent or independent of each other
- How do we model their joint distribution?
The Multivariate Normal Density

- The multivariate normal density in d dimensions is

$$p(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(x-\mu)^t\Sigma^{-1}(x-\mu)\right]$$

  where $\Sigma^{-1}$ is the inverse and $|\Sigma|$ the determinant of the covariance matrix Σ

$$x = [x_1, x_2, \dots, x_d]^t \qquad \mu = [\mu_1, \mu_2, \dots, \mu_d]^t \qquad
\Sigma = \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_d^2 \end{bmatrix}$$

  ($\sigma_{1d}$ is the covariance of $x_1$ and $x_d$)

- Each $x_i$ is $N(\mu_i, \sigma_i^2)$
  - to prove this, integrate all the other features out of the joint density
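The density formula translates directly into numpy; a minimal sketch with illustrative parameter values:

```python
import numpy as np

def multivariate_normal_pdf(x, mu, Sigma):
    """Evaluate the d-dimensional normal density N(mu, Sigma) at x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff        # (x-mu)^t Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(multivariate_normal_pdf(np.array([0.0, 0.0]), mu, Sigma))
```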
More on Σ

- Σ plays a role similar to the role that σ² plays in one dimension
- From Σ we can find out:
  1. the individual variances of the features $x_1, x_2, \dots, x_d$ (the diagonal entries $\sigma_i^2$)
  2. whether features $x_i$ and $x_j$ are
     - independent: $\sigma_{ij} = 0$
     - positively correlated: $\sigma_{ij} > 0$
     - negatively correlated: $\sigma_{ij} < 0$
The Multivariate Normal Density

- If Σ is diagonal, then the features $x_1, \dots, x_d$ are independent, and the density factors:

$$p(x) = \prod_{i=1}^{d}\frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right]$$

  for example (d = 3):

$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}$$
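A small numerical check of this factorization (a sketch; the numbers below are arbitrary placeholders):

```python
import numpy as np

# For a diagonal Sigma, the joint normal density equals the product of the
# univariate normal densities of the individual features.
mu = np.array([1.0, -2.0, 0.5])
sigmas = np.array([1.0, 2.0, 0.5])                 # standard deviations
Sigma = np.diag(sigmas ** 2)
x = np.array([0.3, -1.0, 1.2])

d = len(mu)
diff = x - mu
joint = (np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)
         / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))))
product = np.prod(np.exp(-0.5 * (diff / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas))
print(np.isclose(joint, product))                  # True
```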
The Multivariate Normal Density

$$p(x) = \underbrace{\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}}_{\text{normalizing constant}}\exp\Big[-\frac{1}{2}\underbrace{(x-\mu)^t\Sigma^{-1}(x-\mu)}_{\text{scalar }s}\Big]$$

- The exponent contains a scalar s (a single number); the closer s is to 0, the larger p(x) is
- Written out in full for d = 3:

$$p(x) = c \cdot \exp\left[-\frac{1}{2}\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 & x_3-\mu_3\end{bmatrix}\begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2\end{bmatrix}^{-1}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \\ x_3-\mu_3\end{bmatrix}\right]$$

- Thus p(x) is larger for smaller $(x-\mu)^t\Sigma^{-1}(x-\mu)$
Properties of $(x-\mu)^t\Sigma^{-1}(x-\mu)$

- Σ is positive semi-definite ($x^t\Sigma x \ge 0$ for all x)
- If $x^t\Sigma x = 0$ for some nonzero x, then det(Σ) = 0. This case is not interesting: p(x) is not defined, and it happens only when
  1. one feature has zero variance (it is a constant), or
  2. two components are multiples of each other
- So we will assume Σ is positive definite ($x^t\Sigma x > 0$ for all nonzero x); then $\Sigma^{-1}$ is positive definite as well
- A positive definite d-by-d matrix has d positive real eigenvalues, and its d eigenvectors can be chosen orthogonal
- Thus if Φ is the matrix whose columns are the normalized eigenvectors of Σ, then $\Phi^{-1} = \Phi^t$
- $\Sigma\Phi = \Phi\Lambda$, where Λ is the diagonal matrix with the corresponding eigenvalues on the diagonal
- Thus $\Sigma = \Phi\Lambda\Phi^t$ and $\Sigma^{-1} = \Phi\Lambda^{-1}\Phi^t$
- If $\Lambda^{-1/2}$ denotes the diagonal matrix such that $\Lambda^{-1/2}\Lambda^{-1/2} = \Lambda^{-1}$, then

$$\Sigma^{-1} = \left(\Phi\Lambda^{-1/2}\right)\left(\Phi\Lambda^{-1/2}\right)^t = MM^t, \qquad M = \Phi\Lambda^{-1/2}$$
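These identities are easy to check numerically; a minimal sketch for an arbitrary positive definite Σ, using numpy's eigendecomposition of symmetric matrices:

```python
import numpy as np

# An arbitrary positive definite covariance for illustration
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

eigvals, Phi = np.linalg.eigh(Sigma)      # columns of Phi are orthonormal eigenvectors
Sigma_inv = Phi @ np.diag(1.0 / eigvals) @ Phi.T        # Sigma^{-1} = Phi Lambda^{-1} Phi^t
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))     # True

M = Phi @ np.diag(eigvals ** -0.5)        # M = Phi Lambda^{-1/2}
print(np.allclose(M @ M.T, Sigma_inv))    # True: Sigma^{-1} = M M^t
```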
Geometry of $(x-\mu)^t\Sigma^{-1}(x-\mu)$

- Thus, with $M = \Phi\Lambda^{-1/2}$,

$$(x-\mu)^t\Sigma^{-1}(x-\mu) = (x-\mu)^tMM^t(x-\mu) = \left(M^t(x-\mu)\right)^t\left(M^t(x-\mu)\right) = \left\|M^t(x-\mu)\right\|^2$$

  where $M^t = \Lambda^{-1/2}\Phi^t$ first rotates $x-\mu$ (rotation matrix $\Phi^t$) and then scales it (scaling matrix $\Lambda^{-1/2}$)
- Points x which satisfy $\|M^t(x-\mu)\|^2 = \text{const}$ lie on an ellipse
- $(x-\mu)^t(x-\mu)$ is the usual squared Euclidean distance between x and µ: points x at equal Euclidean distance from µ lie on a circle, and p(x) decreases at the same rate in all directions
- $(x-\mu)^t\Sigma^{-1}(x-\mu)$ is the squared Mahalanobis distance between x and µ: points x at equal Mahalanobis distance from µ lie on an ellipse whose axes are the eigenvectors of Σ; Σ stretches circles to ellipses, and p(x) decreases slowly along the long axis of the ellipse and fast along the short one
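A small numerical check of the rotate-then-scale view (the Σ and x below are arbitrary examples):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.inv(Sigma) @ diff

Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
mu = np.zeros(2)
x = np.array([1.0, -1.0])

# The same quantity via the rotate-then-scale factorization M^t = Lambda^{-1/2} Phi^t
eigvals, Phi = np.linalg.eigh(Sigma)
Mt = np.diag(eigvals ** -0.5) @ Phi.T
print(np.isclose(mahalanobis_sq(x, mu, Sigma), np.sum((Mt @ (x - mu)) ** 2)))  # True
```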
2-d Multivariate Normal Density

[figure: scatter of samples drawn from a 2-d normal density]

- Can you see much in this graph?
- At most you can see that the mean is around [0,0], but you can't really tell if $x_1$ and $x_2$ are correlated
2-d Multivariate Normal Density

[figure: the same density plotted as a 3-d surface]

- How about this graph?
2-d Multivariate Normal Density

[figure: level curves of the density, centered at µ]

- Level curves graph:
  - p(x) is constant along each contour
  - a topographic map of the 3-d surface
- Now we can see much more:
  - $x_1$ and $x_2$ are independent
  - $\sigma_1^2$ and $\sigma_2^2$ are equal
2-d Multivariate Normal Density

[figure: level curves for four parameter settings]

$$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ \mu = [0,0]^t \qquad
\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ \mu = [1,1]^t$$

$$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix},\ \mu = [0,0]^t \qquad
\Sigma = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix},\ \mu = [0,0]^t$$
2-d Multivariate Normal Density

[figure: level curves for six parameter settings]

$$\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix},\ \mu=[0,0]^t \qquad
\Sigma = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix} \qquad
\Sigma = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 4 \end{bmatrix}$$

$$\Sigma = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix} \qquad
\Sigma = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 1 \end{bmatrix} \qquad
\Sigma = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 4 \end{bmatrix}$$
The Multivariate Normal Density

- If X has density N(µ, Σ), then $A^tX$ has density $N(A^t\mu,\ A^t\Sigma A)$
- Thus X can be transformed into a spherical normal variable (the covariance of a spherical density is the identity matrix I) with the whitening transform

$$A_w = \Phi\Lambda^{-1/2}$$

[figure: level curves of X with $\Sigma = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1\end{bmatrix}$ and of the whitened $A_w^tX$ with $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix}$]
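A sketch of whitening applied to sampled data (the seed and sample size are arbitrary); the sample covariance of the transformed data should come out close to I:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=100_000)

eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)       # whitening transform A_w = Phi Lambda^{-1/2}
Y = X @ A_w                                # each row of Y is A_w^t x

print(np.cov(Y.T))                         # approximately the identity matrix
```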
Discriminant Functions

- A classifier can be viewed as a network which computes m discriminant functions $g_1(x), g_2(x), \dots, g_m(x)$ from the features $x_1, x_2, \dots, x_d$ and selects the category corresponding to the largest discriminant

[diagram: features $x_1, x_2, x_3, \dots, x_d$ feed the discriminant functions $g_1(x), g_2(x), \dots, g_m(x)$; select the class giving the maximum]

- Each $g_i(x)$ can be replaced with $f(g_i(x))$ for any monotonically increasing function f; the results will be unchanged
Discriminant Functions

- The minimum error-rate classification is achieved by the discriminant function

$$g_i(x) = P(c_i \mid x) = P(x \mid c_i)P(c_i)/P(x)$$

- Since P(x) is the same for every class, an equivalent discriminant function is

$$g_i(x) = P(x \mid c_i)P(c_i)$$

- For the normal density it is convenient to take logarithms. Since the logarithm is a monotonically increasing function, an equivalent discriminant function is

$$g_i(x) = \ln P(x \mid c_i) + \ln P(c_i)$$
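These discriminants translate directly into code; a minimal sketch with placeholder parameters (the µ's, Σ's, and priors below are made up for illustration):

```python
import numpy as np

def log_discriminant(x, mu_i, Sigma_i, prior_i):
    """g_i(x) = ln p(x | c_i) + ln P(c_i) for a Gaussian class-conditional density."""
    d = len(mu_i)
    diff = x - mu_i
    log_pdf = (-0.5 * diff @ np.linalg.inv(Sigma_i) @ diff
               - 0.5 * d * np.log(2 * np.pi)
               - 0.5 * np.log(np.linalg.det(Sigma_i)))
    return log_pdf + np.log(prior_i)

def classify(x, mus, Sigmas, priors):
    """Pick the class with the largest discriminant."""
    scores = [log_discriminant(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
    return int(np.argmax(scores))

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # placeholder parameters
Sigmas = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
print(classify(np.array([2.0, 2.0]), mus, Sigmas, priors))   # -> 1
```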
Discriminant Functions for the Normal Density

- Suppose that for class $c_i$ the class-conditional density $p(x \mid c_i)$ is $N(\mu_i, \Sigma_i)$:

$$p(x \mid c_i) = \frac{1}{(2\pi)^{d/2}|\Sigma_i|^{1/2}}\exp\left[-\frac{1}{2}(x-\mu_i)^t\Sigma_i^{-1}(x-\mu_i)\right]$$

- The discriminant function is $g_i(x) = \ln P(x \mid c_i) + \ln P(c_i)$; plugging in $p(x \mid c_i)$ and $P(c_i)$ we get

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- The term $-\frac{d}{2}\ln 2\pi$ is constant for all i, so an equivalent discriminant function is

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$
Case Σᵢ = σ²I

- That is

$$\Sigma_i = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix} = \sigma^2\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{(shown here for d = 3)}$$

- In this case the features $x_1, x_2, \dots, x_d$ are independent, with different means and equal variances σ²

[figure: circular level curves centered at µ]
Case Σᵢ = σ²I

- Discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- $\det(\Sigma_i) = \sigma^{2d}$ and $\Sigma_i^{-1} = (1/\sigma^2)I$:

$$\Sigma_i^{-1} = \begin{bmatrix} 1/\sigma^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma^2 \end{bmatrix}$$

- Can simplify the discriminant function; the term $-\frac{1}{2}\ln\sigma^{2d}$ is constant for all i, so

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\left(\sigma^2I\right)^{-1}(x-\mu_i) - \frac{1}{2}\ln\sigma^{2d} + \ln P(c_i)$$

$$g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^t(x-\mu_i) + \ln P(c_i) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(c_i)$$
Case Σᵢ = σ²I: Geometric Interpretation

- If $\ln P(c_i) = \ln P(c_j)$ for all i, j, then an equivalent discriminant is

$$g_i(x) = -\|x-\mu_i\|^2$$

  so x is assigned to the class with the nearest mean; the decision regions form a Voronoi diagram: points in each cell are closer to the mean in that cell than to any other mean
- If $\ln P(c_i) \ne \ln P(c_j)$, then

$$g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(c_i)$$

  and the boundaries shift toward the classes with smaller priors

[figures: decision regions for $c_1, c_2, c_3$ around $\mu_1, \mu_2, \mu_3$ in both cases; in the second figure, a sample x lands in the region of $c_3$]
Case Σᵢ = σ²I

$$g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^t(x-\mu_i) + \ln P(c_i) = -\frac{1}{2\sigma^2}\left(x^tx - 2\mu_i^tx + \mu_i^t\mu_i\right) + \ln P(c_i)$$

- The term $x^tx$ is constant for all classes, so an equivalent discriminant is

$$g_i(x) = -\frac{1}{2\sigma^2}\left(-2\mu_i^tx + \mu_i^t\mu_i\right) + \ln P(c_i) = \underbrace{\frac{\mu_i^t}{\sigma^2}}_{w_i^t}x + \underbrace{\left(-\frac{\mu_i^t\mu_i}{2\sigma^2} + \ln P(c_i)\right)}_{w_{i0}}$$

- That is, $g_i(x) = w_i^tx + w_{i0}$: the discriminant function is linear
Case Σᵢ = σ²I

$$g_i(x) = w_i^tx + w_{i0}$$

- $w_i^tx = \sum_{j=1}^d w_{ij}x_j$ is linear in x, and $w_{i0}$ is constant in x
- Thus the discriminant function is linear, and therefore the decision boundaries $g_i(x) = g_j(x)$ are linear:
  - lines if x has dimension 2
  - planes if x has dimension 3
  - hyperplanes if x has dimension larger than 3
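A minimal sketch of these weights (the call below uses the parameters of class $c_1$ from the example on the next slide):

```python
import numpy as np

def linear_discriminant_params(mu_i, sigma_sq, prior_i):
    """Return (w_i, w_i0) so that g_i(x) = w_i @ x + w_i0 when Sigma_i = sigma^2 I."""
    w_i = mu_i / sigma_sq
    w_i0 = -mu_i @ mu_i / (2 * sigma_sq) + np.log(prior_i)
    return w_i, w_i0

# Gives w = [1/3, 2/3] and w0 = -5/6 - 1.38, i.e. the g_1(x) of the example below
print(linear_discriminant_params(np.array([1.0, 2.0]), 3.0, 0.25))
```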
Case Σᵢ = σ²I: Example

- 3 classes, each a 2-dimensional Gaussian, with

$$\mu_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \mu_2 = \begin{bmatrix} 4 \\ 6 \end{bmatrix} \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix} \quad \Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$$

- Priors $P(c_1) = P(c_2) = \frac{1}{4}$ and $P(c_3) = \frac{1}{2}$
- The discriminant function is

$$g_i(x) = \frac{\mu_i^t}{\sigma^2}x - \frac{\mu_i^t\mu_i}{2\sigma^2} + \ln P(c_i)$$

- Plug in the parameters for each class (σ² = 3, ln 4 ≈ 1.38, ln 2 ≈ 0.69):

$$g_1(x) = \frac{1}{3}\begin{bmatrix} 1 & 2 \end{bmatrix}x - \frac{5}{6} - 1.38 \qquad
g_2(x) = \frac{1}{3}\begin{bmatrix} 4 & 6 \end{bmatrix}x - \frac{52}{6} - 1.38 \qquad
g_3(x) = \frac{1}{3}\begin{bmatrix} -2 & 4 \end{bmatrix}x - \frac{20}{6} - 0.69$$
Case Σᵢ = σ²I: Example

- Need to find out where $g_i(x) > g_j(x)$ for i, j = 1, 2, 3
- Can be done by solving the boundary equations $g_i(x) = g_j(x)$ for i, j = 1, 2, 3
- Let's take $g_1(x) = g_2(x)$ first:

$$\frac{1}{3}\begin{bmatrix} 1 & 2 \end{bmatrix}x - \frac{5}{6} - 1.38 = \frac{1}{3}\begin{bmatrix} 4 & 6 \end{bmatrix}x - \frac{52}{6} - 1.38$$

- Simplifying,

$$\begin{bmatrix} -1 & -\frac{4}{3} \end{bmatrix}x = -\frac{47}{6} \quad\Longleftrightarrow\quad -x_1 - \frac{4}{3}x_2 = -\frac{47}{6}$$

  which is a line equation
Case Σᵢ = σ²I: Example

- Next solve $g_2(x) = g_3(x)$:

$$2x_1 + \frac{2}{3}x_2 = 6.02$$

- Almost finally, solve $g_1(x) = g_3(x)$:

$$x_1 - \frac{2}{3}x_2 = -1.81$$

- And finally solve $g_1(x) = g_2(x) = g_3(x)$ for the point where all three boundaries meet:

$$x_1 = 1.4 \quad\text{and}\quad x_2 = 4.82$$
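The three boundary lines can be cross-checked numerically; a sketch using the example's parameters:

```python
import numpy as np

mus = [np.array([1.0, 2.0]), np.array([4.0, 6.0]), np.array([-2.0, 4.0])]
priors = [0.25, 0.25, 0.5]
sigma_sq = 3.0

# Weights (w_i, w_i0) of each linear discriminant
ws = [(m / sigma_sq, -m @ m / (2 * sigma_sq) + np.log(p)) for m, p in zip(mus, priors)]

# Boundary g_i = g_j: (w_i - w_j) . x + (w_i0 - w_j0) = 0
for i, j in [(0, 1), (1, 2), (0, 2)]:
    dw = ws[i][0] - ws[j][0]
    db = ws[i][1] - ws[j][1]
    print(f"g{i+1}=g{j+1}: {dw[0]:+.2f}*x1 {dw[1]:+.2f}*x2 = {-db:.2f}")
```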
Case Σᵢ = σ²I: Example

[figure: decision regions for $c_1, c_2, c_3$ with priors $P(c_1) = P(c_2) = \frac{1}{4}$ and $P(c_3) = \frac{1}{2}$, bounded by the lines $g_1(x) = g_2(x)$, $g_2(x) = g_3(x)$, and $g_1(x) = g_3(x)$]

- The lines connecting the means are perpendicular to the decision boundaries
Case Σᵢ = Σ

- Covariance matrices are equal but arbitrary
- In this case the features $x_1, x_2, \dots, x_d$ are not necessarily independent; for example

$$\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$
Case Σᵢ = Σ

- Discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma| + \ln P(c_i)$$

- The term $-\frac{1}{2}\ln|\Sigma|$ is constant for all classes, so the discriminant function becomes

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma^{-1}(x-\mu_i) + \ln P(c_i)$$

  where $(x-\mu_i)^t\Sigma^{-1}(x-\mu_i)$ is the squared Mahalanobis distance from x to $\mu_i$
- Mahalanobis distance:

$$\|x-y\|^2_{\Sigma^{-1}} = (x-y)^t\Sigma^{-1}(x-y)$$

- If Σ = I, the Mahalanobis distance becomes the usual Euclidean distance:

$$\|x-y\|^2_{I^{-1}} = (x-y)^t(x-y)$$
Euclidean vs. Mahalanobis Distances

[figure: circles vs. ellipses around µ]

- $\|x-\mu\|^2 = (x-\mu)^t(x-\mu)$: points x at equal Euclidean distance from µ lie on a circle
- $\|x-\mu\|^2_{\Sigma^{-1}} = (x-\mu)^t\Sigma^{-1}(x-\mu)$: points x at equal Mahalanobis distance from µ lie on an ellipse whose axes are the eigenvectors of Σ; Σ stretches circles to ellipses
Case Σᵢ = Σ: Geometric Interpretation

- If $\ln P(c_i) = \ln P(c_j)$, then an equivalent discriminant is

$$g_i(x) = -\|x-\mu_i\|^2_{\Sigma^{-1}}$$

  so x is assigned to the class with the nearest mean under the Mahalanobis distance: points in each decision cell are closer to the mean in that cell than to any other mean, as measured by the Mahalanobis distance
- If $\ln P(c_i) \ne \ln P(c_j)$, then

$$g_i(x) = -\frac{1}{2}\|x-\mu_i\|^2_{\Sigma^{-1}} + \ln P(c_i)$$

[figures: decision regions for $c_1, c_2, c_3$ around $\mu_1, \mu_2, \mu_3$ in both cases]
Case Σᵢ = Σ

- Can simplify the discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma^{-1}(x-\mu_i) + \ln P(c_i)$$

$$= -\frac{1}{2}\left(x^t\Sigma^{-1}x - \mu_i^t\Sigma^{-1}x - x^t\Sigma^{-1}\mu_i + \mu_i^t\Sigma^{-1}\mu_i\right) + \ln P(c_i)$$

$$= -\frac{1}{2}\left(x^t\Sigma^{-1}x - 2\mu_i^t\Sigma^{-1}x + \mu_i^t\Sigma^{-1}\mu_i\right) + \ln P(c_i)$$

- The term $x^t\Sigma^{-1}x$ is constant for all classes, so an equivalent discriminant is

$$g_i(x) = \underbrace{\mu_i^t\Sigma^{-1}}_{w_i^t\ \text{(row vector)}}x + \underbrace{\left(\ln P(c_i) - \frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i\right)}_{w_{i0}\ \text{(scalar)}} = w_i^tx + w_{i0}$$

- Thus in this case the discriminant is also linear
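As a minimal sketch, the weights for this case follow directly from the formulas above:

```python
import numpy as np

def shared_cov_discriminant_params(mu_i, Sigma, prior_i):
    """Return (w_i, w_i0) so that g_i(x) = w_i @ x + w_i0 when all Sigma_i = Sigma."""
    Sigma_inv = np.linalg.inv(Sigma)
    w_i = Sigma_inv @ mu_i                  # w_i^t = mu_i^t Sigma^{-1}
    w_i0 = np.log(prior_i) - 0.5 * mu_i @ Sigma_inv @ mu_i
    return w_i, w_i0
```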
Case Σᵢ = Σ: Example

- 3 classes, each a 2-dimensional Gaussian, with

$$\mu_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \mu_2 = \begin{bmatrix} -1 \\ 5 \end{bmatrix} \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix} \quad \Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & -1.5 \\ -1.5 & 4 \end{bmatrix}$$

$$P(c_1) = P(c_2) = \frac{1}{4} \qquad P(c_3) = \frac{1}{2}$$

- Again the decision regions can be found by solving $g_i(x) = g_j(x)$ for i, j = 1, 2, 3
Case Σᵢ = Σ: Example

- Let's solve $g_i(x) = g_j(x)$ in general first:

$$\mu_j^t\Sigma^{-1}x + \ln P(c_j) - \frac{1}{2}\mu_j^t\Sigma^{-1}\mu_j = \mu_i^t\Sigma^{-1}x + \ln P(c_i) - \frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i$$

- Let's regroup the terms:

$$\left(\mu_j^t - \mu_i^t\right)\Sigma^{-1}x = \ln\frac{P(c_i)}{P(c_j)} + \frac{1}{2}\left(\mu_j^t\Sigma^{-1}\mu_j - \mu_i^t\Sigma^{-1}\mu_i\right)$$

- We get the line along which $g_j(x) = g_i(x)$
Case Σᵢ = Σ: Example

$$\left(\mu_j^t - \mu_i^t\right)\Sigma^{-1}x = \ln\frac{P(c_i)}{P(c_j)} + \frac{1}{2}\left(\mu_j^t\Sigma^{-1}\mu_j - \mu_i^t\Sigma^{-1}\mu_i\right)$$

- Now substitute for i, j = 1, 2:

$$\begin{bmatrix} -2 & 0 \end{bmatrix}x = 0 \quad\Longrightarrow\quad x_1 = 0$$

- Now substitute for i, j = 2, 3:

$$\begin{bmatrix} -3.14 & -1.4 \end{bmatrix}x = -2.41 \quad\Longrightarrow\quad 3.14\,x_1 + 1.4\,x_2 = 2.41$$

- Now substitute for i, j = 1, 3:

$$\begin{bmatrix} -5.14 & -1.43 \end{bmatrix}x = -2.41 \quad\Longrightarrow\quad 5.14\,x_1 + 1.43\,x_2 = 2.41$$
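A sketch that reproduces these three lines from the general boundary formula:

```python
import numpy as np

mus = [np.array([1.0, 2.0]), np.array([-1.0, 5.0]), np.array([-2.0, 4.0])]
priors = [0.25, 0.25, 0.5]
Sigma = np.array([[1.0, -1.5], [-1.5, 4.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Boundary g_i = g_j:
# (mu_j - mu_i)^t Sigma^{-1} x = ln(P_i/P_j) + (mu_j^t S^{-1} mu_j - mu_i^t S^{-1} mu_i)/2
for i, j in [(0, 1), (1, 2), (0, 2)]:
    lhs = (mus[j] - mus[i]) @ Sigma_inv
    rhs = (np.log(priors[i] / priors[j])
           + 0.5 * (mus[j] @ Sigma_inv @ mus[j] - mus[i] @ Sigma_inv @ mus[i]))
    print(f"g{i+1}=g{j+1}: {lhs[0]:+.2f}*x1 {lhs[1]:+.2f}*x2 = {rhs:.2f}")
```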
Case Σᵢ = Σ: Example

[figure: decision regions for $c_1, c_2, c_3$ with priors $P(c_1) = P(c_2) = \frac{1}{4}$ and $P(c_3) = \frac{1}{2}$]

- The lines connecting the means are, in general, not perpendicular to the decision boundaries
General Case Σᵢ are arbitrary

- Covariance matrices for each class are arbitrary
- In this case the features $x_1, x_2, \dots, x_d$ are not necessarily independent, and the covariance may differ from class to class; for example

$$\Sigma_i = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} \qquad \Sigma_j = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 4 \end{bmatrix}$$
General Case Σᵢ are arbitrary

- From the previous discussion,

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- This can't be simplified further, but we can rearrange it:

$$g_i(x) = -\frac{1}{2}\left(x^t\Sigma_i^{-1}x - 2\mu_i^t\Sigma_i^{-1}x + \mu_i^t\Sigma_i^{-1}\mu_i\right) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

$$g_i(x) = x^t\underbrace{\left(-\frac{1}{2}\Sigma_i^{-1}\right)}_{W_i}x + \underbrace{\mu_i^t\Sigma_i^{-1}}_{w_i^t}x + \underbrace{\left(-\frac{1}{2}\mu_i^t\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)\right)}_{w_{i0}}$$

- That is, $g_i(x) = x^tW_ix + w_i^tx + w_{i0}$
General Case Σᵢ are arbitrary

$$g_i(x) = x^tW_ix + w_i^tx + w_{i0}$$

- $x^tW_ix = \sum_{j,k=1}^d w_{jk}x_jx_k$ is quadratic in x, $w_i^tx$ is linear in x, and $w_{i0}$ is constant in x
- Thus the discriminant function is quadratic, and therefore the decision boundaries are quadratic surfaces (e.g. ellipses and paraboloids)
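A minimal sketch assembling $W_i$, $w_i$, and $w_{i0}$ and evaluating the quadratic discriminant:

```python
import numpy as np

def quadratic_discriminant_params(mu_i, Sigma_i, prior_i):
    """Return (W_i, w_i, w_i0) so that g_i(x) = x^t W_i x + w_i @ x + w_i0."""
    Sigma_inv = np.linalg.inv(Sigma_i)
    W_i = -0.5 * Sigma_inv
    w_i = Sigma_inv @ mu_i
    w_i0 = (-0.5 * mu_i @ Sigma_inv @ mu_i
            - 0.5 * np.log(np.linalg.det(Sigma_i))
            + np.log(prior_i))
    return W_i, w_i, w_i0

def g(x, W_i, w_i, w_i0):
    """Evaluate the quadratic discriminant at x."""
    return x @ W_i @ x + w_i @ x + w_i0
```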
General Case Σᵢ are arbitrary: Example

- 3 classes, each a 2-dimensional Gaussian, with

$$\mu_1 = \begin{bmatrix} -1 \\ 3 \end{bmatrix} \quad \mu_2 = \begin{bmatrix} 0 \\ 6 \end{bmatrix} \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}$$

$$\Sigma_1 = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 2 \end{bmatrix} \quad \Sigma_2 = \begin{bmatrix} 2 & -2 \\ -2 & 7 \end{bmatrix} \quad \Sigma_3 = \begin{bmatrix} 1 & 1.5 \\ 1.5 & 3 \end{bmatrix}$$

$$P(c_1) = P(c_2) = \frac{1}{4} \qquad P(c_3) = \frac{1}{2}$$

- Again the boundaries can be found by solving $g_i(x) = g_j(x)$ for i, j = 1, 2, 3, with

$$g_i(x) = x^t\left(-\frac{1}{2}\Sigma_i^{-1}\right)x + \mu_i^t\Sigma_i^{-1}x - \frac{1}{2}\mu_i^t\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- Need to solve a bunch of quadratic inequalities in 2 variables
General Case Σᵢ are arbitrary: Example

[figure: quadratic decision regions for $c_1, c_2, c_3$ with the parameters and priors above; the region labeled $c_1$ appears as two disconnected pieces]
Important Points

- The Bayes classifier when classes are normally distributed is in general quadratic
- If the covariance matrices are equal and proportional to the identity matrix, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Euclidean distance classifier
- If the covariance matrices are equal, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Mahalanobis distance classifier
- The popular minimum Euclidean distance and minimum Mahalanobis distance classifiers are therefore optimal only if the distribution of the data is appropriate (normal, with the corresponding covariance structure)