1 e. fatemizadeh statistical pattern recognition
TRANSCRIPT
![Page 1: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/1.jpg)
1
E. Fatemizadeh
Statistical Pattern
Recognition
![Page 2: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/2.jpg)
2
Typical application areas Machine vision Character recognition (OCR) Computer aided diagnosis Speech recognition Face recognition Biometrics Image Data Base retrieval Data mining Bionformatics
The task: Assign unknown objects – patterns – into the correct class. This is known as classification.
PATTERN RECOGNITION
![Page 3: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/3.jpg)
3
Features: These are measurable quantities obtained from the patterns, and the classification task is based on their respective values.
Feature vectors: A number of features
constitute the feature vector
Feature vectors are treated as random vectors.
,,...,1 lxx
lTl Rxxx ,...,1
![Page 4: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/4.jpg)
4
An example:
![Page 5: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/5.jpg)
5
The classifier consists of a set of functions, whose values, computed at , determine the class to which the corresponding pattern belongs
Classification system overview
x
sensor
featuregeneration
feature selection
classifierdesign
systemevaluation
Patterns
![Page 6: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/6.jpg)
6
Supervised – unsupervised pattern recognition:
The two major directions Supervised: Patterns whose class is known a-
priori are used for training. Unsupervised: The number of classes is (in
general) unknown and no training patterns are available.
![Page 7: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/7.jpg)
7
CLASSIFIERS BASED ON BAYES DECISION THEORY
Statistical nature of feature vectors
Assign the pattern represented by feature vector to the most probable of the available classes
That is maximum
Tl21 x,...,x,xx
x
M ,...,, 21
)(: xPx ii
![Page 8: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/8.jpg)
8
Computation of a-posteriori probabilities Assume known
• a-priori probabilities
• This is also known as the likelihood of
)()...,(),( 21 MPPP
Mixp i ...2,1,)(
... i to rw x
![Page 9: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/9.jpg)
9
2
1
( ) ( ) ( ) ( )
( ) ( ) Likelihood×Priori
( ) Evidence
( ) ( ) ( )
i i i
i ii
i ii
p x P x p x P
p x PP x
p x
p x p x P
The Bayes rule (Μ=2)
where
![Page 10: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/10.jpg)
10
The Bayes classification rule (for two classes M=2) Given classify it according to the rule
Equivalently: classify according to the rule
For equiprobable classes the test becomes
x
212
121
)()(
)()(
xxPxP
xxPxP
If
If
)()()()()( 2211 PxpPxp
)()()( 21 xPxp
x
![Page 11: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/11.jpg)
11
)()( 2211 RR and
![Page 12: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/12.jpg)
12
Equivalently in words: Divide space in two regions
Probability of error Total shaded area
Bayesian classifier is OPTIMAL with respect to minimising the classification error probability!!!!
22
11
in If
in If
xRx
xRx
0
0
x
1
x
2e dx)x(pdx)x(pP
![Page 13: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/13.jpg)
13
Indeed: Moving the threshold the total shaded area INCREASES by the extra “grey” area.
![Page 14: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/14.jpg)
Minimizing Classification Error ProbabilityMinimizing Classification Error Probability
Our Claim: Bayesian Classifier is Optimal w.r to minimizing the classification error Probability.
14
2 1
2 1
2 1 1 2
2 1 1 1 2 2
1 1 2 2
1 2
, ,
Use Bayes Rule:
e
e
e R R
e R R
P P x R P x R
P P x R P P x R P
P P P x dx P P x dx
P P x p x dx P x p x dx
![Page 15: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/15.jpg)
Minimizing Classification Error ProbabilityMinimizing Classification Error Probability
R1 and R2 are Union of space:
15
1 2
1
1 1 1
1 1 2
1 1 2
Thus:
:
R R
e R
P x p x dx P x p x dx P
P P P x P x p x dx
R P x P x
![Page 16: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/16.jpg)
16
The Bayes classification rule for many (M>2) classes: Given classify it to if:
Such a choice also minimizes the classification error probability
Minimizing the average risk For each wrong decision, a penalty term is assigned
since some decisions are more sensitive than others
ijxPxP ji )()(
x i
![Page 17: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/17.jpg)
17
For M=2• Define the loss matrix
• penalty term for deciding class ,although the pattern belongs to , etc.
Risk with respect to
)(2221
1211
L
12
1
1
1 2
12 11 11 1 ( )( )R R
r p x d x p x d x
2
![Page 18: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/18.jpg)
18
Risk with respect to
RED lighted:
Average risk
2
1 2
2 2221 2 2( ( ))R R
p x d xr p x d x
)()( 2211 PrPrr
Probabilities of wrong decisions, weighted by the penalty terms
![Page 19: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/19.jpg)
19
Choose and so that r is minimized
Then assign to if
Equivalently:
assign x in if
: likelihood ratio
1R 2R
x i
)( 21
1112
2221
1
2
2
112 )(
)()(
)(
)(
P
P
xp
xp
12
)()()()(
)()()()(
222211122
222111111
PxpPxp
PxpPxp
![Page 20: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/20.jpg)
20
1 2 11 22
211 1 2
12
122 1 2
21
21 12
1( ) ( ) and 0
2
if
if
if Minimum classification
error probability
P P
x P x P x
x P x P x
If
![Page 21: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/21.jpg)
21
An example:
00.1
5.00
2
1)()(
))1(exp(1
)(
)exp(1
)(
21
22
21
L
PP
xxp
xxp
![Page 22: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/22.jpg)
22
Then the threshold value is:
Threshold for minimum r
2
1
))1(exp()exp( :
: minimumfor
0
220
0
x
xxx
Px e
2
1
2
)21(ˆ
))1((exp2)(exp :ˆ
0
220
n
x
xxx
0x̂
![Page 23: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/23.jpg)
23
Thus moves to the left of (WHY?)
0x̂ 02
1x
![Page 24: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/24.jpg)
MinMax CriteriaMinMax Criteria
Goal:Minimizing Maximum Possible Overall Risk
(No Priori Knowledge)
24
1 2
1
2
2
11 1 1 21 2 2
12 1 1 22 2 2
1 1 1
( ) ( ) ( ) ( )
( ) ( )
Use This Fact: ( ) 1 ( ) 1
)
&
( ( ) ,
R
R
R R
R p x P p x P dx
p x P p x P
P P p x dx p x
d
x
x
d
![Page 25: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/25.jpg)
MinMax CriteriaMinMax Criteria
With Some Simplification:
25
1
2 1
2 1
1
1 22 21 22 2
1 11 22 12 11 1 21 22 2
1
11 22 12 11 1 21 22 2
22 21 22 2 11 12
( )
( )
Our Goal: Minimize Effect of ( )
0
R
R R
R R
mM
R
R P p x dx
P p x dx p x dx
P
p x dx p x dx
R p x dx
2
11 1
R
p x dx
![Page 26: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/26.jpg)
MinMax CriteriaMinMax Criteria
A Simple Case (λ11= λ22=0, λ12= λ21=1)
26
2 1
1 2
R R
p x dx p x dx
![Page 27: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/27.jpg)
27
If are contiguous:
is the surface separating the regions. On one side is positive (+), on the other is negative (-). It is known as Decision Surface
)()( :
)()( :
xPxPR
xPxPR
ijj
jii
ji RR , 0)()()( xPxPxg ji
+ - 0xg )(
DISCRIMINANT FUNCTIONS DECISION SURFACES
![Page 28: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/28.jpg)
28
For monotonic increasing f(.), the rule remains the same if we use:
is a discriminant function
In general, discriminant functions can be defined independent of the Bayesian rule. They lead to suboptimal solutions, yet if chosen appropriately, can be computationally more tractable.
jixPfxPfx jii ))(())(( :if
))(()( xPfxg ii
![Page 29: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/29.jpg)
29
BAYESIAN CLASSIFIER FOR NORMAL DISTRIBUTIONS
Multivariate Gaussian pdf
The last one Called Covariance Matrix
1
12 2
1 1( ) exp ( ) ( )
2(2 )
: 1 vector in
( )( ) : matrix in
i ii i
i
ii
i ii i
p x x x
E x
E x x
![Page 30: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/30.jpg)
30
is monotonically increasing. Define:
Example:
ln
)(ln)(ln
))()(ln()(
ii
iii
Pxp
Pxpxg
ii
iiiiT
ii
C
CPxxxg
ln)2
1(2ln)
2(
)(ln)()(2
1)( 1
2
2
i0
0
![Page 31: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/31.jpg)
31
That is, is quadratic and the surfaces
Quadrics, Ellipsoids, Parabolas, Hyperbolas, Pairs of lines.
For example:
iiii
iii
CP
xxxxxg
)ln()(2
1
)(1
)(2
1)(
22
212
2211222
212
)(xgi
0)()( xgxg ji
![Page 32: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/32.jpg)
32
Decision Hyperplanes
Quadratic terms come from:
If ALL (the same) the quadratic terms are not of interest. They are not involved in comparisons. Then, equivalently, we can write:
Discriminant functions are LINEAR
xx 1i
T
ΣΣi
ii
Τii
ii
ioTii
ΣPw
Σw
wxwxg
10
1
2
1)(ln
)(
![Page 33: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/33.jpg)
33
Let in addition:•
•
•
• 22
02
2
)(
)(ln)(
2
1
,
)(
0)()()(
1)(
Then .
ji
ji
j
i
jio
ji
oT
jiij
iT
ii
P
Px
w
xxw
xgxgxg
wxxg
IΣ
![Page 34: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/34.jpg)
34
Non Diagonal:
•
•
•
Decision hyperplane
2
0)()( 0 xxwxg Tij
)(1
jiw
1
1
0 2
11 2
( )1( ) ln( )
2 ( )
( )
i ji
i jj
i j
T
Px
P
x x x
)( tonormal
tonormalnot
1
ji
ji
![Page 35: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/35.jpg)
35
Minimum Distance Classifiers
equiprobable
Euclidean Distance: smaller
Mahalanobis Distance: smaller
MP i
1)(
)()(2
1)( 1
i
T
ii xxxg
:Assign :2ixI
iE xd
:Assign :2ixI
2
11 ))()((
i
T
im xxd
![Page 36: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/36.jpg)
MoreMore
Geometry Analysis
36
11 2
11 2 1 2
1 11 1 22 2
2 2
1 2
1
(( ) ( ))
, , ,
(( ) ( )) (( ) ( )) ,
Center of Mass:
Principal Axes: 2
Tm i i
T Tl l
T T T
i i i i
il l
T
il
l
i
k
x
d x x
v v v diag
x x x
C
xx C
x x
C
μ
![Page 37: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/37.jpg)
37
![Page 38: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/38.jpg)
38
:tionclassificaBayesian using2.2
0.1 vectorheclassify t
9.13.0
3.01.1 ,
3
3 ,
0
0), ,()(
), ,()( and)()( : ,Given
2122
112121
x
ΣNxp
ΣNxpPP
55.015.0
15.095.0 1-Σ
672.38.0
0.28.0,0.2,952.2
2.2
0.1
2.2,0.1: , from sMahalanobi Compute
12,
21
1,2
21
m
mm
dΣ
dd
1,,21 that Observe .Classify EE ddx
Example:
![Page 39: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/39.jpg)
Statistical Error Analysis Minimum Attainable Classification Error
(Bayes):
39
1
1 1
1
min ,
Lemma: min , , , 0, 0 1
Chenoff Bound
0.5, , , ,
exp
21 1ln , Bhattach
8 2 2
e i i j j
s s
s sss
e i j i j CB
i j
CB i j
i j
T i j
i j
P P p x P p x dx
a b a b a b s
P P P p x p x dx
s N N
P P B
B
i j
i j i j
μ μ
μ μ μ μ aryya Dist.
![Page 40: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/40.jpg)
40
Maximum Likelihood
:method The
to w.r. of Likelihood theasknown iswhich
);(
);,...,();(
,...,
);()( : parameter
ctorunknown vean in known with )(Let
tindependen andknown ,...., ,Let
1
21
21
21
X
xp
xxxpXp
xxxX
xpxp
xp
xxx
k
N
k
N
N
N
ESTIMATION OF UNKNOWN PROBABILITY DENSITY FUNCTIONS
![Page 41: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/41.jpg)
41
0)(
)(
)(
1
)(
)( :ˆ
);(ln);(ln)(
);(maxarg :ˆ
;
; 1
1
1ML
k
k
N
kML
k
N
k
k
Ν
k
xp
xp
L
xpXpL
xp
![Page 42: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/42.jpg)
42
![Page 43: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/43.jpg)
43
0
0
0 N
2
0N
22ˆ 2
0
Un
If, indeed, there is a such that
( ) ( ; ), then
lim [ ]
ˆ
biased
Consistence
E
lim 0
1,
Cramer-Rao Bo
ffi
u
c
nd
1
t
,
e
ˆ
i n
ML
ML
p x p x
E
E
LI E
NI
NNI
![Page 44: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/44.jpg)
44
Example:
AA
AA
xN
xΣ
L
L
L
xΣx
Σ
xp
xΣxCxpL
xpxpxxxΣNxp
TT
k
N
kMLk
N
k
l
kT
klk
kT
k
N
kk
N
k
kkN
2)(
if :Remember
10)(
.
.
.
)(
)(
))()(2
1exp(
)2(
1);(
)()(2
1);(ln)(
);()(,...,,unknown, :),( :)(
1
1
1
1
1
2
12
1
11
21
![Page 45: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/45.jpg)
45
Maximum Aposteriori Probability Estimation In ML method, θ was considered as a
parameterHere we shall look at θ as a random vector
described by a pdf p(θ), assumed to be known
Given
Compute the maximum of
From Bayes theorem
N21 x,...,x,xX
)( Xp
( ) ( ) ( ) ( ) or
( ) ( )
( )
p p X p X p X
p p Xp X
p X
![Page 46: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/46.jpg)
46
The method:
MLMAP
MAP
MAP
p
XpP
Xp
ˆenough broador uniform is )( If
))()((:ˆ
or )(maxargˆ
![Page 47: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/47.jpg)
47
![Page 48: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/48.jpg)
48
Example:
k
N
kML
k
N
k
MAP
k
N
kk
N
kMAP
ll
N
xN
For
N
x
xpxp
p
x,...,xXΣNxp
1MAP
2
2
2
2
12
2
0
02211
2
2
0
2
1
1ˆˆ
Nfor or ,1
1
ˆ
0)ˆ(1
)(1
or 0))()(ln( :
)2
exp(
)2(
1)(
unknown, ),,( :)(
![Page 49: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/49.jpg)
49
Bayesian Inference
Traini
1
ng Set
ML,MAP a single estimate for .
Here a different root is followed.
Given: { ,..., }, ( ) and ( )
The goal: estimate p( )
How??
NX x x p x p
x X
![Page 50: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/50.jpg)
50
2
20 0
2
2 2 2 220 0 0
2 2 2 210 0
A bit more insight via an example
( ) ( , ) , Unknown Mean
( ) ( , )
It turns out (Problem 2.22) that: ( ) ( , )
1 , ,
N N
N
N N kk
Let p x N
p N
p X N
N xx x
NN N
)()(
)()(
)()(
)(
)()()(
)()(
1
k
N
kxpXp
dpXp
pXp
Xp
pXpXp
dXpxpXxp
)(
![Page 51: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/51.jpg)
51
The above is a sequence of Gaussians as
Maximum EntropyEntropy
xdxpxpH )(ln)(
sconstraint available
thesubject to maximum :)(ˆ Hxp
N
![Page 52: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/52.jpg)
52
Example: x is nonzero in the intervaland zero otherwise. Compute the ME pdf
• The constraint:
• Lagrange Multipliers
•
21 xxx
2
1
1)(x
x
dxxp
2 2
1 1
( ( ) 1) ln ( ) 1x x
LL
x x
HH H p x dx p x dx
p x
otherwise
0
1)(ˆ
)1exp()(ˆ
21
12
xxx
xxxp
xp
![Page 53: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/53.jpg)
53
Mixture Models
Assume parametric modeling, i.e.,
The goal is to estimategiven a set
We Ignore the details!
M
jj
J
jj
xdjxpP
Pjxpxp
1 x
1
1)( ,1
)()(
);jx(p
jPPP ,...,, and 21 N21 x,...,x,xX
![Page 54: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/54.jpg)
54
Nonparametric Estimation
2
2
hx̂x̂
hx̂
0,,0
if , as )()(ˆ , continuous )( If
N
kkh
Nxpxpxp
NNN
2ˆ- ,
1)ˆ(ˆ)(ˆ
total
in
hxx
N
k
hxpxp
N
hk
N
kP
N
NN
![Page 55: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/55.jpg)
55
Parzen Windows Divide the multidimensional space in
hypercubes
![Page 56: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/56.jpg)
56
Define
• That is, it is 1 inside a unit side hypercube centered at 0
•
•
• The problem:
• Parzen windows-kernels-potential functions
))(1
(1
)(ˆ1
N
i
il h
xx
Nhxp
xN
at centered hypercube side-han
inside points ofnumber *1
*volume
1
ousdiscontinu (.)
continuous )(
xp
x
xdxx
x
1)( ,0)(
smooth is )(
otherwise2
1
0
1)( ij
ixx
![Page 57: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/57.jpg)
57
Mean value
•
•
•
•
Hence unbiased in the limit
'1
')'()'
(1
)])([1
(1
)](ˆ[x
l
N
i
il
xdxph
xx
hh
xxE
NhxpE
lh
1 ,0h
0)'
( of width the0
h
xxh
1)'
(1
xdh
xx
hl
)()(1
0 xh
x
hh
l
'
)(')'()'()](ˆ[x
xpxdxpxxxpE
![Page 58: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/58.jpg)
58
Variance• The smaller the h the higher the variance
h=0.1, N=1000 h=0.8, N=1000
![Page 59: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/59.jpg)
59
h=0.1, N=10000
The higher the N the better the accuracy
![Page 60: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/60.jpg)
60
If• • •
asymptotically unbiased
The method• Remember:
•
Νh
N
h 0
1112
2221
1
2
2
112 )(
)()(
)(
)(
P
P
xp
xpl
)(
)(1
)(1
2
1
12
11
N
i
il
N
i
il
hxx
hN
hxx
hN
![Page 61: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/61.jpg)
61
CURSE OF DIMENSIONALITY
In all the methods, so far, we saw that the highest the number of points, N, the better the resulting estimate.
If in the one-dimensional space an interval, filled with N points, is adequately (for good estimation), in the two-dimensional space the corresponding square will require N2 and in the ℓ-dimensional space the ℓ-dimensional cube will require Nℓ points.
The exponential increase in the number of necessary points in known as the curse of dimensionality. This is a major problem one is confronted with in high dimensional spaces.
![Page 62: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/62.jpg)
62
NAIVE – BAYES CLASSIFIER
Let and the goal is to estimate i = 1, 2, …, M. For a “good” estimate of the pdf one would need, say, Nℓ points.
Assume x1, x2 ,…, xℓ mutually independent. Then:
In this case, one would require, roughly, N points for each pdf. Thus, a number of points of the order N·ℓ would suffice.
It turns out that the Naïve – Bayes classifier works reasonably well even in cases that violate the independence assumption.
x ixp |
1
||j
iji xpxp
![Page 63: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/63.jpg)
63
K Nearest Neighbor Density Estimation
In Parzen:• The volume is constant• The number of points in the volume is
varying
Now:• Keep the number of points
constant
• Leave the volume to be varying
•
kkN
)()(ˆ
xNV
kxp
![Page 64: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/64.jpg)
64
•
)( 11
22
22
11 VN
VN
VNkVN
k
![Page 65: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/65.jpg)
65
The Nearest Neighbor Rule Choose k out of the N training vectors, identify
the k nearest ones to x
Out of these k identify ki that belong to class ωi
The simplest version
k=1 !!!
For large N this is not bad. It can be shown that:
if PB is the optimal Bayesian error probability, then:
jikk:x jii Assign
2)1
2( BBBNNB PPM
MPPP
![Page 66: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/66.jpg)
66
For small PB:
k
P2PPP NN
BkNNB
BkNN PP,k
23 )(3
2
BBNN
BNN
PPP
PP
![Page 67: 1 E. Fatemizadeh Statistical Pattern Recognition](https://reader036.vdocuments.mx/reader036/viewer/2022062304/56649f0c5503460f94c2043e/html5/thumbnails/67.jpg)
67
Voronoi tesselation
jixxdxxdxR jii ),(),(: