Bayesian Decision Theory (Classification)
Lecturer: 虞台文
Contents
Introduction
Generalized Bayesian Decision Rule
Discriminant Functions
The Normal Distribution
Discriminant Functions for the Normal Populations
Minimax Criterion
Neyman-Pearson Criterion
Introduction
What is Bayesian Decision Theory?
A mathematical foundation for decision making.
It uses a probabilistic approach to make decisions (e.g., classification) so as to minimize the risk (cost).
Preliminaries and Notations
$\omega \in \{\omega_1, \omega_2, \ldots, \omega_c\}$: a state of nature
$P(\omega_i)$: prior probability
$\mathbf{x}$: feature vector
$p(\mathbf{x}|\omega_i)$: class-conditional density
$P(\omega_i|\mathbf{x})$: posterior probability
Bayesian Rule
$$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x}|\omega_j)\,P(\omega_j)$$
Decision
$$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}$$
$$\alpha(\mathbf{x}) = \arg\max_i P(\omega_i|\mathbf{x})$$
The evidence $p(\mathbf{x})$ is unimportant in making the decision.
Decide $\omega_i$ if $P(\omega_i|\mathbf{x}) > P(\omega_j|\mathbf{x})$ for all $j \ne i$; equivalently, decide $\omega_i$ if $p(\mathbf{x}|\omega_i)P(\omega_i) > p(\mathbf{x}|\omega_j)P(\omega_j)$ for all $j \ne i$.
Special cases:
1. $P(\omega_1) = P(\omega_2) = \cdots = P(\omega_c)$: compare the likelihoods $p(\mathbf{x}|\omega_i)$.
2. $p(\mathbf{x}|\omega_1) = p(\mathbf{x}|\omega_2) = \cdots = p(\mathbf{x}|\omega_c)$: compare the priors $P(\omega_i)$.
Two Categories
Decide $\omega_1$ if $P(\omega_1|\mathbf{x}) > P(\omega_2|\mathbf{x})$; otherwise decide $\omega_2$.
Equivalently, decide $\omega_1$ if $p(\mathbf{x}|\omega_1)P(\omega_1) > p(\mathbf{x}|\omega_2)P(\omega_2)$; otherwise decide $\omega_2$.
Special cases:
1. $P(\omega_1) = P(\omega_2)$: decide $\omega_1$ if $p(\mathbf{x}|\omega_1) > p(\mathbf{x}|\omega_2)$; otherwise decide $\omega_2$.
2. $p(\mathbf{x}|\omega_1) = p(\mathbf{x}|\omega_2)$: decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; otherwise decide $\omega_2$.
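As an illustrative sketch (not part of the original slides), the two-category rule can be coded directly; the Gaussian class-conditional densities and all parameter values below are assumed purely for the example:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def decide(x, priors, likelihoods):
    """Bayes decision rule: pick the class maximizing p(x|w_i) P(w_i).

    priors      -- list of P(w_i)
    likelihoods -- list of functions computing p(x|w_i)
    """
    scores = [p(x) * prior for p, prior in zip(likelihoods, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical classes with equal priors: the rule reduces to
# comparing the likelihoods (special case 1 above).
likelihoods = [lambda x: gaussian_pdf(x, 0.0, 1.0),
               lambda x: gaussian_pdf(x, 2.0, 1.0)]
print(decide(0.1, [0.5, 0.5], likelihoods))  # 0: x is nearer mean 0
print(decide(1.9, [0.5, 0.5], likelihoods))  # 1: x is nearer mean 2
```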
Example
[Figure: two class-conditional densities with equal priors $P(\omega_1) = P(\omega_2)$; the decision regions $\mathcal{R}_1$ and $\mathcal{R}_2$ meet where the densities cross.]
[Figure: the same densities with priors $P(\omega_1) = 2/3$, $P(\omega_2) = 1/3$; decide $\omega_1$ if $p(\mathbf{x}|\omega_1)P(\omega_1) > p(\mathbf{x}|\omega_2)P(\omega_2)$, otherwise decide $\omega_2$, which shifts the boundary toward the less probable class.]
Classification Error
$$P(error) = \int p(error, \mathbf{x})\,d\mathbf{x} = \int P(error|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$
Consider two categories and decide $\omega_1$ if $P(\omega_1|\mathbf{x}) > P(\omega_2|\mathbf{x})$; otherwise decide $\omega_2$. Then
$$P(error|\mathbf{x}) = \begin{cases} P(\omega_1|\mathbf{x}) & \text{if we decide } \omega_2 \\ P(\omega_2|\mathbf{x}) & \text{if we decide } \omega_1 \end{cases} = \min[P(\omega_1|\mathbf{x}),\,P(\omega_2|\mathbf{x})]$$
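A numeric sketch of this integral, assuming two unit-variance Gaussian classes whose means and priors are chosen arbitrarily for illustration: a Riemann sum of $\min[p(x|\omega_1)P(\omega_1),\,p(x|\omega_2)P(\omega_2)]$ approximates the Bayes error.

```python
import math

def gauss(x, mu, s=1.0):
    """Unit-variance-by-default normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)

def bayes_error(mu1, mu2, p1=0.5, dx=1e-3, lo=-10.0, hi=12.0):
    """P(error) = integral of min[p(x|w1)P(w1), p(x|w2)P(w2)] dx (Riemann sum)."""
    n = int((hi - lo) / dx)
    return sum(min(gauss(lo + i * dx, mu1) * p1,
                   gauss(lo + i * dx, mu2) * (1 - p1)) * dx
               for i in range(n))

# Equal priors, means 0 and 2: the closed form is Phi(-1) ~ 0.1587.
print(round(bayes_error(0.0, 2.0), 3))
```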
Generalized Bayesian Decision Rule
The Generalization
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$: a set of $c$ states of nature
$A = \{\alpha_1, \alpha_2, \ldots, \alpha_a\}$: a set of $a$ possible actions
$\lambda_{ij} = \lambda(\alpha_i|\omega_j)$: the loss incurred for taking action $\alpha_i$ when the true state of nature is $\omega_j$
We want to minimize the expected loss in making decisions.
Risk
Note that the loss for a correct decision, $\lambda(\alpha_i|\omega_i)$, can be zero.
Conditional Risk
$$R(\alpha_i|\mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j)\,P(\omega_j|\mathbf{x}) = \sum_{j=1}^{c} \lambda_{ij}\,P(\omega_j|\mathbf{x})$$
Given $\mathbf{x}$, this is the expected loss (risk) associated with taking action $\alpha_i$.
0/1 Loss Function
$$\lambda(\alpha_i|\omega_j) = \begin{cases} 0 & i = j \text{ (correct decision)} \\ 1 & \text{otherwise} \end{cases}$$
With this loss, the conditional risk equals the conditional error probability:
$$R(\alpha_i|\mathbf{x}) = \sum_{j \ne i} P(\omega_j|\mathbf{x}) = 1 - P(\omega_i|\mathbf{x}) = P(error|\mathbf{x})$$
Decision
Bayesian decision rule:
$$\alpha(\mathbf{x}) = \arg\min_i R(\alpha_i|\mathbf{x}), \qquad R(\alpha_i|\mathbf{x}) = \sum_{j=1}^{c} \lambda_{ij}\,P(\omega_j|\mathbf{x})$$
Overall Risk
$$R = \int R(\alpha(\mathbf{x})|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$
where $\alpha(\cdot)$ is the decision function. The Bayesian decision rule $\alpha(\mathbf{x}) = \arg\min_i R(\alpha_i|\mathbf{x})$ is the optimal one to minimize the overall risk; its resulting overall risk is called the Bayes risk.
Two-Category Classification
$\Omega = \{\omega_1, \omega_2\}$, $A = \{\alpha_1, \alpha_2\}$, with loss function $\lambda_{ij} = \lambda(\alpha_i|\omega_j)$:

Action      State of nature: $\omega_1$    $\omega_2$
$\alpha_1$                   $\lambda_{11}$    $\lambda_{12}$
$\alpha_2$                   $\lambda_{21}$    $\lambda_{22}$

$$R(\alpha_1|\mathbf{x}) = \lambda_{11}P(\omega_1|\mathbf{x}) + \lambda_{12}P(\omega_2|\mathbf{x})$$
$$R(\alpha_2|\mathbf{x}) = \lambda_{21}P(\omega_1|\mathbf{x}) + \lambda_{22}P(\omega_2|\mathbf{x})$$
Perform $\alpha_1$ if $R(\alpha_2|\mathbf{x}) > R(\alpha_1|\mathbf{x})$; otherwise perform $\alpha_2$. This condition is
$$(\lambda_{21}-\lambda_{11})\,P(\omega_1|\mathbf{x}) > (\lambda_{12}-\lambda_{22})\,P(\omega_2|\mathbf{x})$$
where $(\lambda_{21}-\lambda_{11})$ and $(\lambda_{12}-\lambda_{22})$ are both positive, since a wrong decision costs more than a correct one. The posterior probabilities are scaled before comparison.
Substituting $P(\omega_i|\mathbf{x}) = p(\mathbf{x}|\omega_i)P(\omega_i)/p(\mathbf{x})$ and cancelling the irrelevant factor $p(\mathbf{x})$:
$$(\lambda_{21}-\lambda_{11})\,p(\mathbf{x}|\omega_1)P(\omega_1) > (\lambda_{12}-\lambda_{22})\,p(\mathbf{x}|\omega_2)P(\omega_2)$$
$$\frac{p(\mathbf{x}|\omega_1)}{p(\mathbf{x}|\omega_2)} > \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}}\cdot\frac{P(\omega_2)}{P(\omega_1)}$$
Perform $\alpha_1$ if
$$\underbrace{\frac{p(\mathbf{x}|\omega_1)}{p(\mathbf{x}|\omega_2)}}_{\text{likelihood ratio}} \;>\; \underbrace{\frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}}\cdot\frac{P(\omega_2)}{P(\omega_1)}}_{\text{threshold}}$$
This slide will be recalled later.
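The likelihood-ratio form can be sketched as follows; the loss values and likelihood numbers are made up for illustration:

```python
def threshold(l11, l12, l21, l22, p1, p2):
    """Decision threshold (l12 - l22)/(l21 - l11) * P(w2)/P(w1)."""
    return (l12 - l22) / (l21 - l11) * (p2 / p1)

def decide(px_w1, px_w2, t):
    """Perform a1 iff the likelihood ratio p(x|w1)/p(x|w2) exceeds t."""
    return 1 if px_w1 / px_w2 > t else 2

# 0/1 loss and equal priors: the threshold is 1, so we just compare likelihoods.
t = threshold(0, 1, 1, 0, 0.5, 0.5)
print(t)                      # 1.0
print(decide(0.30, 0.10, t))  # 1
print(decide(0.05, 0.20, t))  # 2
```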
Discriminant Functions
The Multicategory Classification
g1(x)g1(x)
g2(x)g2(x)
gc(x)gc(x)
x Action(e.g., classification)
(x)
Assign x to i ifgi(x) > gj(x) for all j i.
gi(x)’s are called the discriminant functions.
How to define discriminant functions?
Simple Discriminant Functions
Minimum-risk case: $g_i(\mathbf{x}) = -R(\alpha_i|\mathbf{x})$
Minimum error-rate case: $g_i(\mathbf{x}) = P(\omega_i|\mathbf{x})$, or equivalently
$$g_i(\mathbf{x}) = p(\mathbf{x}|\omega_i)P(\omega_i) \quad\text{or}\quad g_i(\mathbf{x}) = \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)$$
If $f(\cdot)$ is a monotonically increasing function, then the $f(g_i(\cdot))$'s are also discriminant functions.
Decision Regions
$$\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) > g_j(\mathbf{x}),\ \forall j \ne i\}$$
Two-category example: the decision regions are separated by decision boundaries.
The Normal Distribution
Basics of Probability
Discrete random variable $X$ (assume integer-valued):
Probability mass function (pmf): $p(x) = P(X = x)$
Cumulative distribution function (cdf): $F(x) = P(X \le x) = \sum_{t \le x} p(t)$
Continuous random variable $X$:
Probability density function (pdf): $p(x)$ or $f(x)$ (not a probability)
Cumulative distribution function (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\,dt$
Expectations
Let $g$ be a function of a random variable $X$. Then
$$E[g(X)] = \begin{cases} \sum_x g(x)\,p(x) & X \text{ discrete} \\ \int_{-\infty}^{\infty} g(x)\,p(x)\,dx & X \text{ continuous} \end{cases}$$
The $k$th moment: $E[X^k]$
The $k$th central moment: $E[(X-\bar{X})^k]$
The 1st moment: $\bar{X} = E[X]$
Important Expectations
Mean:
$$\mu_X = E[X] = \begin{cases} \sum_x x\,p(x) & X \text{ discrete} \\ \int_{-\infty}^{\infty} x\,p(x)\,dx & X \text{ continuous} \end{cases}$$
Variance:
$$\sigma_X^2 = \mathrm{Var}[X] = E[(X-\mu_X)^2] = \begin{cases} \sum_x (x-\mu_X)^2\,p(x) & X \text{ discrete} \\ \int_{-\infty}^{\infty} (x-\mu_X)^2\,p(x)\,dx & X \text{ continuous} \end{cases}$$
Fact: $\mathrm{Var}[X] = E[X^2] - (E[X])^2$
Entropy
$$H[X] = \begin{cases} -\sum_x p(x)\ln p(x) & X \text{ discrete} \\ -\int_{-\infty}^{\infty} p(x)\ln p(x)\,dx & X \text{ continuous} \end{cases}$$
The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.
Univariate Gaussian Distribution
$X \sim N(\mu, \sigma^2)$:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
$E[X] = \mu$, $\mathrm{Var}[X] = \sigma^2$. [Figure: the bell curve centered at $\mu$ with tick marks at $\mu \pm \sigma$, $\mu \pm 2\sigma$, $\mu \pm 3\sigma$.]
Properties:
1. Maximizes the entropy (among densities with the same mean and variance).
2. Central limit theorem.
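A small sketch verifying the stated moments numerically; the parameter values ($\mu = 1$, $\sigma = 2$) and the integration grid are arbitrary choices for the example:

```python
import math

def normal_pdf(x, mu, sigma):
    """p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Check E[X] = mu and Var[X] = sigma^2 by crude numerical integration
# over mu +/- 10 sigma (the tails beyond that are negligible).
mu, sigma, dx = 1.0, 2.0, 1e-3
xs = [mu - 10 * sigma + i * dx for i in range(int(20 * sigma / dx))]
mean = sum(x * normal_pdf(x, mu, sigma) * dx for x in xs)
var = sum((x - mu) ** 2 * normal_pdf(x, mu, sigma) * dx for x in xs)
print(round(mean, 3), round(var, 3))  # approximately mu and sigma^2
```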
Random Vectors
$\mathbf{X} = (X_1, X_2, \ldots, X_d)^T$: a $d$-dimensional random vector, $\mathbf{X} \in \mathbb{R}^d$.
Vector mean: $\boldsymbol{\mu} = E[\mathbf{X}] = (\mu_1, \mu_2, \ldots, \mu_d)^T$
Covariance matrix:
$$\boldsymbol{\Sigma} = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T] = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{pmatrix}$$
Multivariate Gaussian Distribution
$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, a $d$-dimensional random vector:
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$
$E[\mathbf{X}] = \boldsymbol{\mu}$, $E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T] = \boldsymbol{\Sigma}$.
Compare with the univariate case $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp[-(x-\mu)^2/2\sigma^2]$.
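A direct transcription of the density into NumPy (assumed available); the test point and parameters are arbitrary:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density N(mu, Sigma) evaluated at x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff        # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([1.0, -0.5])
print(mvn_pdf(x, mu, Sigma))
```

For $\boldsymbol{\Sigma} = \mathbf{I}$ in two dimensions, the peak value at $\boldsymbol{\mu}$ is $1/2\pi$, which is a handy sanity check.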
Properties of N(μ,Σ)
$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, a $d$-dimensional random vector. Let $\mathbf{Y} = \mathbf{A}^T\mathbf{X}$, where $\mathbf{A}$ is a $d \times k$ matrix. Then
$$\mathbf{Y} \sim N(\mathbf{A}^T\boldsymbol{\mu},\ \mathbf{A}^T\boldsymbol{\Sigma}\mathbf{A})$$
On Parameters of N(μ,Σ)
$\mathbf{X} = (X_1, \ldots, X_d)^T \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\mu} = E[\mathbf{X}] = (\mu_1, \ldots, \mu_d)^T$ and $\boldsymbol{\Sigma} = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T] = [\sigma_{ij}]_{d\times d}$, where
$\mu_i = E[X_i]$
$\sigma_{ij} = E[(X_i-\mu_i)(X_j-\mu_j)] = \mathrm{Cov}(X_i, X_j)$
$\sigma_{ii} = E[(X_i-\mu_i)^2] = \mathrm{Var}(X_i) = \sigma_i^2$
$\sigma_{ij} = 0$ if $X_i \perp X_j$ ($i \ne j$)
More on the covariance matrix: $\boldsymbol{\Sigma}$ is symmetric and positive semidefinite, so it admits the eigendecomposition
$$\boldsymbol{\Sigma} = \boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^T = (\boldsymbol{\Phi}\boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi}\boldsymbol{\Lambda}^{1/2})^T$$
where $\boldsymbol{\Phi}$ is an orthonormal matrix whose columns are eigenvectors of $\boldsymbol{\Sigma}$ and $\boldsymbol{\Lambda}$ is the diagonal matrix of eigenvalues.
Whitening Transform
$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and $\mathbf{Y} = \mathbf{A}^T\mathbf{X} \sim N(\mathbf{A}^T\boldsymbol{\mu}, \mathbf{A}^T\boldsymbol{\Sigma}\mathbf{A})$. Using $\boldsymbol{\Sigma} = (\boldsymbol{\Phi}\boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi}\boldsymbol{\Lambda}^{1/2})^T$, let
$$\mathbf{A}_w = \boldsymbol{\Phi}\boldsymbol{\Lambda}^{-1/2}$$
Then
$$\mathbf{A}_w^T\boldsymbol{\Sigma}\mathbf{A}_w = (\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Phi}^T)\,\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^T\,(\boldsymbol{\Phi}\boldsymbol{\Lambda}^{-1/2}) = \mathbf{I}$$
so
$$\mathbf{A}_w^T\mathbf{X} \sim N(\mathbf{A}_w^T\boldsymbol{\mu},\ \mathbf{I})$$
Whitening is thus a linear transform that combines a projection onto the eigenvector axes with a rescaling by $\boldsymbol{\Lambda}^{-1/2}$.
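A sketch of the whitening transform using NumPy's eigendecomposition (`numpy.linalg.eigh`); the mean, covariance, and sample size are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample from N(mu, Sigma) and whiten with A_w = Phi Lambda^{-1/2}.
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

lam, Phi = np.linalg.eigh(Sigma)   # eigenvalues, orthonormal eigenvectors
A_w = Phi @ np.diag(lam ** -0.5)   # whitening matrix
Y = X @ A_w                        # each row is y = A_w^T x

# The sample covariance of Y should be close to the 2x2 identity.
print(np.round(np.cov(Y.T), 2))
```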
Mahalanobis Distance
The density of $\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$,
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right],$$
is constant on surfaces where the squared Mahalanobis distance
$$r^2 = (\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$$
is constant; the size of the contour (a hyperellipsoid) depends on the value of $r^2$.
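A short sketch; the covariance below is chosen to show that points at equal Euclidean distance from the mean can have different Mahalanobis distances:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """r^2 = (x - mu)^T Sigma^{-1} (x - mu)"""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
# Both points are Euclidean distance 2 from mu, but along the first axis
# the distribution is more spread out, so the Mahalanobis distance is smaller.
print(mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma))  # 1.0
print(mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma))  # 4.0
```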
Discriminant Functions for the Normal Populations
Minimum-Error-Rate Classification
Use $g_i(\mathbf{x}) = \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)$ with Gaussian class-conditional densities $\mathbf{X}_i \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:
$$p(\mathbf{x}|\omega_i) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}_i|^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right]$$
$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$
Three cases:
Case 1: $\boldsymbol{\Sigma}_i = \sigma^2\mathbf{I}$. Classes are centered at different means; the feature components are pairwise independent and have the same variance.
Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$. Classes are centered at different means but have the same covariance.
Case 3: $\boldsymbol{\Sigma}_i \ne \boldsymbol{\Sigma}_j$. Arbitrary covariances.
Case 1. $\boldsymbol{\Sigma}_i = \sigma^2\mathbf{I}$
Here $|\boldsymbol{\Sigma}_i| = \sigma^{2d}$ and $\boldsymbol{\Sigma}_i^{-1} = \mathbf{I}/\sigma^2$; the terms $-\frac{d}{2}\ln 2\pi$ and $-\frac{1}{2}\ln|\boldsymbol{\Sigma}_i|$ are the same for every class and hence irrelevant:
$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i) = -\frac{1}{2\sigma^2}\left(\mathbf{x}^T\mathbf{x} - 2\boldsymbol{\mu}_i^T\mathbf{x} + \boldsymbol{\mu}_i^T\boldsymbol{\mu}_i\right) + \ln P(\omega_i)$$
The class-independent term $\mathbf{x}^T\mathbf{x}$ is also irrelevant, leaving
$$g_i(\mathbf{x}) = \frac{1}{\sigma^2}\boldsymbol{\mu}_i^T\mathbf{x} - \frac{1}{2\sigma^2}\boldsymbol{\mu}_i^T\boldsymbol{\mu}_i + \ln P(\omega_i)$$
This is a linear discriminant, $g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}$, with
$$\mathbf{w}_i = \frac{1}{\sigma^2}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2\sigma^2}\boldsymbol{\mu}_i^T\boldsymbol{\mu}_i + \ln P(\omega_i)$$
Boundary between $\omega_i$ and $\omega_j$: set $g_i(\mathbf{x}) = g_j(\mathbf{x})$, i.e. $(\mathbf{w}_i - \mathbf{w}_j)^T\mathbf{x} + (w_{i0} - w_{j0}) = 0$. Substituting the Case 1 parameters:
$$(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^T\mathbf{x} = \frac{1}{2}(\boldsymbol{\mu}_i^T\boldsymbol{\mu}_i - \boldsymbol{\mu}_j^T\boldsymbol{\mu}_j) - \sigma^2\ln\frac{P(\omega_i)}{P(\omega_j)}$$
$$(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^T\mathbf{x} = (\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^T\!\left[\frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i-\boldsymbol{\mu}_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)\right]$$
Thus the boundary is $\mathbf{w}^T(\mathbf{x}-\mathbf{x}_0) = 0$ with
$$\mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j, \qquad \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i-\boldsymbol{\mu}_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$
The decision boundary is a hyperplane perpendicular to the line between the means, passing through $\mathbf{x}_0$; when $P(\omega_i) = P(\omega_j)$, $\mathbf{x}_0$ is the midpoint of the means.
When $P(\omega_1) = P(\omega_2)$,
$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_1+\boldsymbol{\mu}_2)$$
and the rule reduces to a minimum-distance classifier (template matching): assign $\mathbf{x}$ to the class with the nearest mean. When $P(\omega_1) \ne P(\omega_2)$, $\mathbf{x}_0$ shifts away from the more probable class.
[Demo figures: decision boundaries for equal and unequal priors.]
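The Case 1 discriminant can be sketched as follows; the means, priors, and test points are hypothetical:

```python
import numpy as np

def case1_discriminant(x, mu_i, prior_i, sigma2):
    """g_i(x) = (mu_i^T x)/sigma^2 - (mu_i^T mu_i)/(2 sigma^2) + ln P(w_i)"""
    return (mu_i @ x) / sigma2 - (mu_i @ mu_i) / (2 * sigma2) + np.log(prior_i)

def classify(x, mus, priors, sigma2=1.0):
    g = [case1_discriminant(x, m, p, sigma2) for m, p in zip(mus, priors)]
    return int(np.argmax(g))

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
x = np.array([1.2, 1.2])

# Equal priors: minimum-distance (template-matching) behaviour.
print(classify(x, mus, [0.5, 0.5]))    # 0: x is nearer mu_0
# A strong prior on class 1 shifts the boundary toward mu_0.
print(classify(x, mus, [0.05, 0.95]))  # 1
```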
Case 2. $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$
The terms $-\frac{d}{2}\ln 2\pi$ and $-\frac{1}{2}\ln|\boldsymbol{\Sigma}|$ are class-independent and irrelevant:
$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(\omega_i)$$
The quadratic form is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}_i$; if moreover $P(\omega_i) = P(\omega_j)$ for all $i, j$, the prior term is irrelevant too. Expanding and dropping the class-independent $\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}$ term:
$$g_i(\mathbf{x}) = \boldsymbol{\mu}_i^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_i^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i + \ln P(\omega_i)$$
Again a linear discriminant, $g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}$, with
$$\mathbf{w}_i = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i + \ln P(\omega_i)$$
Boundary between $\omega_i$ and $\omega_j$: $g_i(\mathbf{x}) = g_j(\mathbf{x})$ gives $\mathbf{w}^T(\mathbf{x}-\mathbf{x}_0) = 0$ with
$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j), \qquad \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j) - \frac{\ln[P(\omega_i)/P(\omega_j)]}{(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)}\,(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$
The hyperplane passes through $\mathbf{x}_0$ but is generally not perpendicular to the line between the means.
[Demo figures.]
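A sketch of the Case 2 linear discriminant; the shared covariance, means, and test points are hypothetical:

```python
import numpy as np

def case2_params(mu_i, prior_i, Sigma):
    """w_i = Sigma^{-1} mu_i,  w_i0 = -mu_i^T Sigma^{-1} mu_i / 2 + ln P(w_i)"""
    w = np.linalg.solve(Sigma, mu_i)
    w0 = -0.5 * mu_i @ w + np.log(prior_i)
    return w, w0

def classify(x, mus, priors, Sigma):
    g = [w @ x + w0
         for w, w0 in (case2_params(m, p, Sigma) for m, p in zip(mus, priors))]
    return int(np.argmax(g))

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]

# Equal priors: assign to the nearest mean in Mahalanobis distance;
# by symmetry, the boundary passes through the midpoint [1, 1].
print(classify(np.array([0.4, 0.4]), mus, [0.5, 0.5], Sigma))  # 0
print(classify(np.array([1.6, 1.6]), mus, [0.5, 0.5], Sigma))  # 1
```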
Case 3. $\boldsymbol{\Sigma}_i \ne \boldsymbol{\Sigma}_j$
Only the $-\frac{d}{2}\ln 2\pi$ term is class-independent:
$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$
This is a quadratic discriminant, $g_i(\mathbf{x}) = \mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$, with
$$\mathbf{W}_i = -\frac{1}{2}\boldsymbol{\Sigma}_i^{-1}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^T\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$
The quadratic term $\mathbf{x}^T\mathbf{W}_i\mathbf{x}$ was absent in Cases 1 and 2. The decision surfaces are hyperquadrics, e.g., hyperplanes, hyperspheres, hyperellipsoids, and hyperhyperboloids. Non-simply connected decision regions can arise even in one dimension for Gaussians having unequal variance.
[Demo figures, including the multicategory classification case.]
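A sketch of the Case 3 quadratic discriminant; the two hypothetical classes below share a mean but differ in covariance, so the decision boundary is a hypersphere around the tight class:

```python
import numpy as np

def quad_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -0.5 (x-mu)^T Sigma^{-1} (x-mu) - 0.5 ln|Sigma| + ln P(w_i)"""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mus = [np.array([0.0, 0.0]), np.array([0.0, 0.0])]  # same mean!
Sigmas = [np.eye(2), 9.0 * np.eye(2)]               # very different spread
priors = [0.5, 0.5]

def classify(x):
    g = [quad_discriminant(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
    return int(np.argmax(g))

# Near the common mean the tight class wins; far away the broad one does.
print(classify(np.array([0.5, 0.0])))  # 0
print(classify(np.array([4.0, 0.0])))  # 1
```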
Minimax Criterion
Recall the two-category Bayesian decision rule: decide $\omega_1$ if
$$\frac{p(\mathbf{x}|\omega_1)}{p(\mathbf{x}|\omega_2)} > \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}}\cdot\frac{P(\omega_2)}{P(\omega_1)}$$
(likelihood ratio versus threshold). The minimax criterion deals with the case where the prior probabilities are unknown.
Basic Concept on Minimax
Choose the worst-case prior probabilities (those giving the maximum loss) and then pick the decision rule that minimizes the overall risk under them; that is, minimize the maximum possible overall risk.
Overall Risk
$$R = \int R(\alpha(\mathbf{x})|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x} = \int_{\mathcal{R}_1} R(\alpha_1|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x} + \int_{\mathcal{R}_2} R(\alpha_2|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$
With $R(\alpha_1|\mathbf{x}) = \lambda_{11}P(\omega_1|\mathbf{x}) + \lambda_{12}P(\omega_2|\mathbf{x})$ and $R(\alpha_2|\mathbf{x}) = \lambda_{21}P(\omega_1|\mathbf{x}) + \lambda_{22}P(\omega_2|\mathbf{x})$:
$$R = \int_{\mathcal{R}_1} [\lambda_{11}P(\omega_1|\mathbf{x}) + \lambda_{12}P(\omega_2|\mathbf{x})]\,p(\mathbf{x})\,d\mathbf{x} + \int_{\mathcal{R}_2} [\lambda_{21}P(\omega_1|\mathbf{x}) + \lambda_{22}P(\omega_2|\mathbf{x})]\,p(\mathbf{x})\,d\mathbf{x}$$
Substituting $P(\omega_i|\mathbf{x})\,p(\mathbf{x}) = p(\mathbf{x}|\omega_i)P(\omega_i)$:
$$R = \int_{\mathcal{R}_1} [\lambda_{11}P(\omega_1)p(\mathbf{x}|\omega_1) + \lambda_{12}P(\omega_2)p(\mathbf{x}|\omega_2)]\,d\mathbf{x} + \int_{\mathcal{R}_2} [\lambda_{21}P(\omega_1)p(\mathbf{x}|\omega_1) + \lambda_{22}P(\omega_2)p(\mathbf{x}|\omega_2)]\,d\mathbf{x}$$
Using $P(\omega_2) = 1 - P(\omega_1)$:
$$R = \lambda_{11}P(\omega_1)\!\int_{\mathcal{R}_1}\! p(\mathbf{x}|\omega_1)\,d\mathbf{x} + \lambda_{12}[1-P(\omega_1)]\!\int_{\mathcal{R}_1}\! p(\mathbf{x}|\omega_2)\,d\mathbf{x} + \lambda_{21}P(\omega_1)\!\int_{\mathcal{R}_2}\! p(\mathbf{x}|\omega_1)\,d\mathbf{x} + \lambda_{22}[1-P(\omega_1)]\!\int_{\mathcal{R}_2}\! p(\mathbf{x}|\omega_2)\,d\mathbf{x}$$
Since $\int_{\mathcal{R}_1} p(\mathbf{x}|\omega_i)\,d\mathbf{x} + \int_{\mathcal{R}_2} p(\mathbf{x}|\omega_i)\,d\mathbf{x} = 1$, the risk can be rearranged as
$$R = \lambda_{22} + (\lambda_{12}-\lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x} + P(\omega_1)\!\left[(\lambda_{11}-\lambda_{22}) + (\lambda_{21}-\lambda_{11})\int_{\mathcal{R}_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x} - (\lambda_{12}-\lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x}\right]$$
This is the overall risk for a particular $P(\omega_1)$. For a fixed decision boundary (fixed $\mathcal{R}_1$, $\mathcal{R}_2$) the integrals are constants, so $R$ is linear in $P(\omega_1)$: $R = a\,P(\omega_1) + b$, where both $a$ and $b$ depend on the setting of the decision boundary.
For the minimax solution, set the decision boundary so that the coefficient of $P(\omega_1)$ is zero:
$$(\lambda_{11}-\lambda_{22}) + (\lambda_{21}-\lambda_{11})\int_{\mathcal{R}_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x} - (\lambda_{12}-\lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x} = 0$$
The remaining term is then $R_{mm}$, the minimax risk, which is independent of the value of $P(\omega_1)$.
Minimax Risk
$$R_{mm} = \lambda_{22} + (\lambda_{12}-\lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x} = \lambda_{11} + (\lambda_{21}-\lambda_{11})\int_{\mathcal{R}_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x}$$
Error Probability
With the 0/1 loss function ($\lambda_{11} = \lambda_{22} = 0$, $\lambda_{12} = \lambda_{21} = 1$), the overall risk becomes the error probability:
$$P(error) = \int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x} + P(\omega_1)\!\left[\int_{\mathcal{R}_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x} - \int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x}\right]$$
Minimax Error-Probability
For the minimax solution under 0/1 loss, choose the boundary so that
$$P_{mm}(error) = \int_{\mathcal{R}_1} p(\mathbf{x}|\omega_2)\,d\mathbf{x} = \int_{\mathcal{R}_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x}$$
i.e., the two conditional error probabilities $P(\alpha_1|\omega_2)$ and $P(\alpha_2|\omega_1)$ are equal.
[Figure: two class-conditional densities with decision regions $\mathcal{R}_1$, $\mathcal{R}_2$ chosen so that the two shaded error areas are equal.]
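Under 0/1 loss, the minimax boundary for two univariate Gaussian classes can be found numerically by equating the two error integrals; the class parameters below are assumed for illustration:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def minimax_boundary(mu1, s1, mu2, s2, lo=-20.0, hi=20.0):
    """Find t with P(x > t | w1) = P(x < t | w2) (0/1 loss) by bisection.
    Decide w1 for x < t and w2 for x > t; assumes mu1 < mu2."""
    def gap(t):
        miss_w1 = 1 - Phi((t - mu1) / s1)  # mass of class 1 in R2
        miss_w2 = Phi((t - mu2) / s2)      # mass of class 2 in R1
        return miss_w1 - miss_w2           # decreasing in t
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Equal variances: the minimax boundary is the midpoint of the means.
print(round(minimax_boundary(0.0, 1.0, 2.0, 1.0), 6))  # 1.0
```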
Neyman-Pearson Criterion
Recall the two-category Bayesian decision rule: decide $\omega_1$ if
$$\frac{p(\mathbf{x}|\omega_1)}{p(\mathbf{x}|\omega_2)} > \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}}\cdot\frac{P(\omega_2)}{P(\omega_1)}$$
(likelihood ratio versus threshold). The Neyman-Pearson criterion deals with the case where both the loss functions and the prior probabilities are unknown.
Signal Detection Theory
Signal detection theory evolved from the development of communications and radar equipment in the first half of the last century.
It migrated to psychology, initially as part of sensation and perception, in the 1950s and 1960s, as an attempt to understand features of human behavior when detecting very faint stimuli that were not explained by traditional threshold theories.
The situation of interest: a person is faced with a stimulus (signal) that is very faint or confusing and must decide whether the signal is there or not.
What makes this situation confusing and difficult is the presence of other activity that is similar to the signal; we call this activity noise.
Example
Noise is present both in the environment and in the sensory system of the observer.
The observer reacts to the momentary total activation of the sensory system, which fluctuates from moment to moment, as well as responding to environmental stimuli, which may include a signal.
Example: a radiologist is examining a CT scan, looking for evidence of a tumor. It is a hard job, because there is always some uncertainty.
There are four possible outcomes:
hit (tumor present and doctor says "yes")
miss (tumor present and doctor says "no")
false alarm (tumor absent and doctor says "yes")
correct rejection (tumor absent and doctor says "no")
The Four Cases

Signal (tumor):       Absent ($\omega_1$)                           Present ($\omega_2$)
Decision No ($\alpha_1$):    correct rejection, $P(\alpha_1|\omega_1)$    miss, $P(\alpha_1|\omega_2)$
Decision Yes ($\alpha_2$):   false alarm, $P(\alpha_2|\omega_1)$          hit, $P(\alpha_2|\omega_2)$

Misses and false alarms are the two types of error. Signal detection theory was developed to help us understand how a continuous and ambiguous signal can lead to a binary yes/no decision.
Decision Making
The internal response is distributed as the noise distribution under $\omega_1$ and as the noise + signal distribution under $\omega_2$; a criterion on this axis converts the continuous response into a yes/no decision.
Discriminability: $d' = |\mu_2 - \mu_1| / \sigma$
The criterion placement is based on expectancy (decision bias); it trades the hit rate $P(\alpha_2|\omega_2)$ against the false-alarm rate $P(\alpha_2|\omega_1)$.
ROC Curve (Receiver Operating Characteristic)
The ROC curve plots the hit rate $P_H = P(\alpha_2|\omega_2)$ against the false-alarm rate $P_{FA} = P(\alpha_2|\omega_1)$ as the criterion varies.
Neyman-Pearson Criterion
Maximize $P_H$ subject to $P_{FA} \le a$.
Likelihood Ratio Test
Let $\phi(\mathbf{x})$ indicate the decision for $\omega_2$:
$$\phi(\mathbf{x}) = \begin{cases} 1 & p(\mathbf{x}|\omega_1)/p(\mathbf{x}|\omega_2) < T \\ 0 & p(\mathbf{x}|\omega_1)/p(\mathbf{x}|\omega_2) > T \end{cases}$$
where $T$ is a threshold that meets the $P_{FA}$ constraint ($\le a$). The decision regions are
$$\mathcal{R}_1 = \{\mathbf{x} \mid p(\mathbf{x}|\omega_1) > T\,p(\mathbf{x}|\omega_2)\}, \qquad \mathcal{R}_2 = \{\mathbf{x} \mid p(\mathbf{x}|\omega_1) < T\,p(\mathbf{x}|\omega_2)\}$$
How do we determine $T$?
With this test,
$$P_{FA} = \int_{\mathcal{R}_2} p(\mathbf{x}|\omega_1)\,d\mathbf{x} = \int \phi(\mathbf{x})\,p(\mathbf{x}|\omega_1)\,d\mathbf{x} = E[\phi(\mathbf{X})|\omega_1]$$
$$P_H = \int_{\mathcal{R}_2} p(\mathbf{x}|\omega_2)\,d\mathbf{x} = \int \phi(\mathbf{x})\,p(\mathbf{x}|\omega_2)\,d\mathbf{x} = E[\phi(\mathbf{X})|\omega_2]$$
Neyman-Pearson Lemma
Consider the aforementioned rule $\phi$ with $T$ chosen to give $P_{FA}(\phi) = a$. There is no decision rule $\phi'$ such that $P_{FA}(\phi') \le a$ and $P_H(\phi') > P_H(\phi)$.

Proof: Let $\phi'$ be any decision rule with $P_{FA}(\phi') = E[\phi'(\mathbf{X})|\omega_1] \le a$. For every $\mathbf{x}$,
$$[\phi(\mathbf{x}) - \phi'(\mathbf{x})]\,[T\,p(\mathbf{x}|\omega_2) - p(\mathbf{x}|\omega_1)] \ge 0,$$
since where $\phi(\mathbf{x}) = 1$ both factors are nonnegative, and where $\phi(\mathbf{x}) = 0$ both are nonpositive. Integrating over $\mathbf{x}$,
$$T\int [\phi(\mathbf{x})-\phi'(\mathbf{x})]\,p(\mathbf{x}|\omega_2)\,d\mathbf{x} \ge \int [\phi(\mathbf{x})-\phi'(\mathbf{x})]\,p(\mathbf{x}|\omega_1)\,d\mathbf{x}$$
that is,
$$T\,[P_H(\phi) - P_H(\phi')] \ge P_{FA}(\phi) - P_{FA}(\phi') = a - P_{FA}(\phi') \ge 0.$$
Hence $P_H(\phi) \ge P_H(\phi')$. ∎
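A numeric sketch of choosing the threshold for two hypothetical unit-variance Gaussians (noise $\omega_1 \sim N(0,1)$, signal $\omega_2 \sim N(2,1)$): since the likelihood ratio is monotone in $x$ here, thresholding $x$ itself at a criterion $c$ realizes the Neyman-Pearson test, and $c$ is chosen so that $P_{FA} = a$.

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Deciding w2 when x > c gives PFA = 1 - Phi(c) and PH = 1 - Phi(c - 2).
def neyman_pearson(alpha):
    """Pick the criterion c with PFA = alpha by bisection, then report PH."""
    lo, hi = -10.0, 10.0
    for _ in range(80):                 # 1 - Phi(c) is decreasing in c
        c = 0.5 * (lo + hi)
        if 1 - Phi(c) > alpha:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    return c, 1 - Phi(c - 2.0)

c, hit = neyman_pearson(0.05)
print(round(c, 3), round(hit, 3))  # c ~ 1.645, hit rate ~ 0.639
```

This is one point on the ROC curve; sweeping the constraint $a$ from 0 to 1 traces out the whole curve.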