Detection Theory - Binary and M-ary Source
DESCRIPTION
Detection Theory - Binary and M-ary Source
TRANSCRIPT
-
Bayesian hypothesis testing. Statistical decision theory I. Simple hypothesis testing. Binary hypothesis testing. Bayesian hypothesis testing. Minimax hypothesis testing. Neyman-Pearson criterion. M hypotheses. Receiver operating characteristics. Composite hypothesis testing. Composite hypothesis testing approaches. Performance of GLRT for large data records. Nuisance parameters.
-
Classical detection and estimation theory. What is detection? Signal detection and estimation is the area of study that deals with
the processing of information-bearing signals for the purpose of extracting information from them.
A simple digital communication system.
[Figure: source → transmitter → channel; the transmitted signal is s(t) and the received signal is r(t) = s(t) + n(t). The digital sequence 1, 0 is mapped over each interval T to the signals sin ω_0 t and sin ω_1 t.]
-
Components of a decision theory problem.
1. Source - generates an output (one of the hypotheses).
2. Probabilistic transition mechanism - a device that knows which hypothesis is true. It generates a point in the observation space according to some probability law.
3. Observation space - describes all possible outcomes of the transition mechanism.
4. Decision rule - assigns each point in the observation space to one of the hypotheses.
[Figure: source → probabilistic transition mechanism → observation space → decision rule → decision (H_1 or H_0).]
-
Example:
When H_1 is true the source generates +1.
When H_0 is true the source generates -1.
An independent discrete random variable n, whose probability density is shown below, is added to the source output.

p_n(N):         1/4 at N = -1,  1/2 at N = 0,   1/4 at N = +1
p_{r|H1}(R|H1): 1/4 at R = 0,   1/2 at R = +1,  1/4 at R = +2
p_{r|H0}(R|H0): 1/4 at R = -2,  1/2 at R = -1,  1/4 at R = 0

[Figure: source (+1 under H_1, -1 under H_0) → transition mechanism (adds n) → observation space r.]
-
The sum of the source output and n is the observed variable r. The observation space has finite dimension, i.e. the observation consists of a set of N numbers and can be represented as a point in an N-dimensional space.
Under the two hypotheses we have

H_1: r = 1 + n
H_0: r = -1 + n

After observing the outcome in the observation space we shall guess which hypothesis is true. We use a decision rule that assigns each point to one of the hypotheses.
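As a quick sanity check, the conditional pmfs and the resulting likelihood ratio for this discrete example can be tabulated directly; a minimal sketch (variable names are ours, not from the lecture):

```python
from fractions import Fraction as F

# Noise pmf from the example: 1/2 at 0, 1/4 at -1 and +1.
p_n = {-1: F(1, 4), 0: F(1, 2), +1: F(1, 4)}

# Observation r = source output + n.
p_r_H1 = {v + 1: p for v, p in p_n.items()}  # source sends +1 under H1
p_r_H0 = {v - 1: p for v, p in p_n.items()}  # source sends -1 under H0

# Likelihood ratio at each point of the observation space.
for R in sorted(set(p_r_H1) | set(p_r_H0)):
    num, den = p_r_H1.get(R, F(0)), p_r_H0.get(R, F(0))
    print(R, num / den if den else "inf")
```

The ratio is 0 on {-2, -1}, 1 at 0, and infinite on {+1, +2}, so only the point R = 0 is genuinely ambiguous.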
-
Detection and estimation applications involve making inferences from observations that are distorted or corrupted in some unknown manner.
-
Simple binary hypothesis testing. The decision problem in which each of two source outputs corresponds to a hypothesis. Each hypothesis maps into a point in the observation space. We assume that the observation consists of a set of N numbers r_1, r_2, ..., r_N, which can be represented as a vector r:

r = [r_1, r_2, ..., r_N]^T
-
The probabilistic transition mechanism generates points in accord with the two known conditional densities p_{r|H1}(R|H1) and p_{r|H0}(R|H0).
The objective is to use this information to develop a decision rule.
-
Decision criteria. In the binary hypothesis problem either H_0 or H_1 is true. We are seeking decision rules for making a choice. Each time the experiment is conducted one of four things can happen:
1. H_0 true; choose H_0 - correct.
2. H_0 true; choose H_1 - error.
3. H_1 true; choose H_1 - correct.
4. H_1 true; choose H_0 - error.
The purpose of a decision criterion is to attach some relative importance to the four possible courses of action.
The method for processing the received data depends on the decision criterion we select.
-
Bayesian criterion. The source generates two outputs with given (a priori) probabilities P_1, P_0. These represent the observer's information before the experiment is conducted.
A cost is assigned to each course of action: C_00, C_10, C_01, C_11. Each time the experiment is conducted a certain cost will be incurred. The decision rule is designed so that on average the cost will be as small as possible. Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.
[Figure: the source produces R in the observation space Z according to p_{r|H1}(R|H1) or p_{r|H0}(R|H0); Z is divided into Z_0 (say H_0) and Z_1 (say H_1).]
-
The expected value of the cost is

R = C_00 P_0 Pr(say H_0 | H_0 is true)
  + C_10 P_0 Pr(say H_1 | H_0 is true)
  + C_11 P_1 Pr(say H_1 | H_1 is true)
  + C_01 P_1 Pr(say H_0 | H_1 is true)

The binary decision rule divides the total observation space Z into two parts: Z_0, Z_1. Each point in the observation space is assigned to one of these sets. The risk expressed in terms of the transition probabilities and the decision regions:
-
R = C_00 P_0 ∫_{Z_0} p_{r|H0}(R|H0) dR + C_10 P_0 ∫_{Z_1} p_{r|H0}(R|H0) dR
  + C_11 P_1 ∫_{Z_1} p_{r|H1}(R|H1) dR + C_01 P_1 ∫_{Z_0} p_{r|H1}(R|H1) dR

Z_0 and Z_1 cover the observation space (each conditional density integrates to one over Z). We assume that the cost of a wrong decision is higher than the cost of a correct decision:

C_10 > C_00
C_01 > C_11

For the Bayes test the regions Z_0 and Z_1 are chosen so that the risk is minimized.
-
We assume that a decision is to be made for every point in the observation space, Z = Z_0 + Z_1, so that Z_1 = Z - Z_0. Rewriting the risk:

R = C_00 P_0 ∫_{Z_0} p_{r|H0}(R|H0) dR + C_10 P_0 ∫_{Z-Z_0} p_{r|H0}(R|H0) dR
  + C_11 P_1 ∫_{Z-Z_0} p_{r|H1}(R|H1) dR + C_01 P_1 ∫_{Z_0} p_{r|H1}(R|H1) dR

Observing that

∫_Z p_{r|H0}(R|H0) dR = ∫_Z p_{r|H1}(R|H1) dR = 1
-
we obtain

R = P_0 C_10 + P_1 C_11
  + ∫_{Z_0} [ P_1 (C_01 - C_11) p_{r|H1}(R|H1) - P_0 (C_10 - C_00) p_{r|H0}(R|H0) ] dR

The integral represents the cost controlled by those points R that we assign to Z_0.
Values of R where the second term is larger than the first contribute a negative amount to the integral and should be included in Z_0.
Values of R where the two terms are equal have no effect. The decision regions are defined by the statement: if

P_1 (C_01 - C_11) p_{r|H1}(R|H1) ≥ P_0 (C_10 - C_00) p_{r|H0}(R|H0),

assign R to Z_1 and say that H_1 is true. Otherwise assign R to Z_0 and say that H_0 is true.
-
This may be expressed as:

p_{r|H1}(R|H1) / p_{r|H0}(R|H0)  ≷  P_0 (C_10 - C_00) / [ P_1 (C_01 - C_11) ]   (say H_1 if ≥, H_0 if <)

The quantity

Λ(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0)

is called the likelihood ratio. Regardless of the dimension of R, Λ(R) is a one-dimensional variable. The data processing is involved in computing Λ(R) and is not affected by the a priori probabilities and cost assignments.
The quantity

η = P_0 (C_10 - C_00) / [ P_1 (C_01 - C_11) ]

is the threshold of the test.
-
η can be left as a variable threshold and may be changed if our a priori knowledge or costs change.
The Bayes criterion has led us to a likelihood ratio test (LRT):

Λ(R) ≷ η   (say H_1 if ≥, H_0 if <)

An equivalent test is

ln Λ(R) ≷ ln η.
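In code, the Bayes LRT is just a scalar comparison against the threshold; a minimal sketch (function names are ours, not from the lecture):

```python
def bayes_threshold(P0, P1, C00, C10, C01, C11):
    """Threshold eta = P0 (C10 - C00) / (P1 (C01 - C11)) of the Bayes LRT."""
    return P0 * (C10 - C00) / (P1 * (C01 - C11))

def lrt_decide(Lambda_R, eta):
    """Return 1 ('say H1') if Lambda(R) >= eta, else 0 ('say H0')."""
    return 1 if Lambda_R >= eta else 0

# equal priors, 0/1 costs -> eta = 1
eta = bayes_threshold(P0=0.5, P1=0.5, C00=0, C10=1, C01=1, C11=0)
print(eta, lrt_decide(2.0, eta), lrt_decide(0.3, eta))  # 1.0 1 0
```

Note that only the threshold depends on the priors and costs; the data processing (computing Λ(R)) does not.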
-
Summary of the Bayesian test: the Bayes test can be conducted simply by calculating the likelihood ratio Λ(R) and comparing it to the threshold. Test design: assign a priori probabilities to the source outputs; assign costs to each course of action; assume distributions for p_{r|H1}(R|H1) and p_{r|H0}(R|H0); calculate and simplify Λ(R).
[Figure: likelihood ratio processor - a data processor computes Λ(R) from R, and a threshold device compares it with η to produce the decision.]
-
Special case:

C_00 = C_11 = 0,  C_01 = C_10 = 1.

The risk is then the total probability of error:

R = P_0 ∫_{Z_1} p_{r|H0}(R|H0) dR + P_1 ∫_{Z_0} p_{r|H1}(R|H1) dR

and the test becomes

ln Λ(R) ≷ ln(P_0/P_1) = ln P_0 - ln(1 - P_0)   (say H_1 if ≥, H_0 if <)

When the two hypotheses are equally likely, the threshold is zero.
-
Sufficient statistics.
A sufficient statistic is a function T that transforms the initial data set R into a new data set T(R) that still contains all the information in R relevant to the problem under investigation.
The sufficient statistic with the minimal number of elements is called a minimal sufficient statistic.
When making a decision, knowing the value of the sufficient statistic is just as good as knowing R.
-
The integrals in the Bayes test. Probability of false alarm:

P_F = ∫_{Z_1} p_{r|H0}(R|H0) dR.

We say that the target is present when it is not. Probability of detection:

P_D = ∫_{Z_1} p_{r|H1}(R|H1) dR.

Probability of a miss:

P_M = ∫_{Z_0} p_{r|H1}(R|H1) dR.

We say the target is absent when it is present.
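For a discrete observation space these three quantities are finite sums. A sketch using the earlier ±1 example with η = 1 (following the ≥ convention, the tied point R = 0 goes to Z_1; since the two cost terms are equal there, this choice does not affect the Bayes risk):

```python
from fractions import Fraction as F

def error_probabilities(p_r_H0, p_r_H1, Z1):
    """P_F, P_D, P_M for a decision region Z1 (decide H1 there)."""
    P_F = sum(p_r_H0.get(R, F(0)) for R in Z1)
    P_D = sum(p_r_H1.get(R, F(0)) for R in Z1)
    return P_F, P_D, 1 - P_D

# conditional pmfs of the +-1 example from earlier in the lecture
p_r_H1 = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
p_r_H0 = {-2: F(1, 4), -1: F(1, 2), 0: F(1, 4)}
P_F, P_D, P_M = error_probabilities(p_r_H0, p_r_H1, Z1={0, 1, 2})
print(P_F, P_D, P_M)  # 1/4 1 0
```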
-
Special case: the prior probabilities unknown. Minimax test.

R = C_00 P_0 ∫_{Z_0} p_{r|H0}(R|H0) dR + C_10 P_0 ∫_{Z_1} p_{r|H0}(R|H0) dR
  + C_11 P_1 ∫_{Z_1} p_{r|H1}(R|H1) dR + C_01 P_1 ∫_{Z_0} p_{r|H1}(R|H1) dR

If the regions Z_0 and Z_1 are fixed, the integrals are determined. With P_0 = 1 - P_1,

R = P_0 C_10 + P_1 C_11 + P_1 (C_01 - C_11) P_M - P_0 (C_10 - C_00)(1 - P_F)

The Bayes risk will be a function of P_1.
-
Collecting the terms in P_1:

R(P_1) = C_00 (1 - P_F) + C_10 P_F
       + P_1 [ (C_11 - C_00) + (C_01 - C_11) P_M - (C_10 - C_00) P_F ]

The Bayes test can be found if all the costs and a priori probabilities are known. If we know all the probabilities, we can calculate the Bayes cost.
Assume that we do not know P_1 and just assume a certain value P_1*, and design the corresponding test.
If P_1 changes, the regions Z_0, Z_1 change, and with them also P_F and P_D.
-
The test is designed for P_1* but the actual a priori probability is P_1. By assuming P_1* we fix P_F and P_D. The cost for different P_1 is given by the function R(P_1*, P_1). Because the threshold is fixed, R(P_1*, P_1) is a linear function of P_1.
The Bayes test minimizes the risk for P_1 = P_1*. For other values of P_1,

R(P_1*, P_1) ≥ R(P_1)

R(P_1) is strictly concave. (If Λ(R) is a continuous random variable with a strictly monotonic probability distribution function, a change of the threshold always changes the risk.)
[Figure: the Bayes risk R_B and the fixed-test risk R_F as functions of P_1 when P_1* is fixed; R_B is concave with endpoints C_00 at P_1 = 0 and C_11 at P_1 = 1, and R_F is the straight line tangent to it at P_1*.]
-
Minimax test. The Bayes test designed to minimize the maximum possible risk is called a minimax test. P_1 is chosen to maximize our risk R(P_1*, P_1). To minimize the maximum risk we select the P_1* for which R(P_1) is maximum. If the maximum occurs inside the interval [0,1], the line R(P_1*, P_1) becomes horizontal; the coefficient of P_1 must be zero:

(C_11 - C_00) + (C_01 - C_11) P_M - (C_10 - C_00) P_F = 0

This equation is the minimax equation.
-
[Figure: risk curves, showing the maximum value of R attained at a) P_1 = 1, b) P_1 = 0, c) 0 < P_1 < 1; each panel shows R_B and the fixed-test line R_F with endpoints C_00 and C_11.]
-
Special case. The cost function is

C_00 = C_11 = 0,  C_01 = C_M,  C_10 = C_F.

The risk is

R = (1 - P_1) C_F P_F + P_1 C_M P_M = C_F P_F + P_1 (C_M P_M - C_F P_F).

The minimax equation is

C_M P_M = C_F P_F.
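For the Gaussian example developed later in the lecture (a unit-variance statistic l with means 0 and d), the minimax equation C_M P_M = C_F P_F can be solved for the threshold numerically. A sketch via bisection, assuming P_F = Q(γ) and P_M = 1 - Q(γ - d) with Q the Gaussian tail probability (function names are ours):

```python
import math

def Q(x):
    """Gaussian right-tail probability via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def minimax_gamma(d, C_M=1.0, C_F=1.0, lo=-10.0, hi=10.0):
    """Solve C_M * P_M(gamma) = C_F * P_F(gamma) by bisection.
    The difference C_M*P_M - C_F*P_F is increasing in gamma."""
    f = lambda g: C_M * (1 - Q(g - d)) - C_F * Q(g)
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

g = minimax_gamma(d=2.0)
print(g)  # symmetric costs -> gamma = d/2 = 1.0
```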
-
Neyman-Pearson test. Often it is difficult to assign realistic costs or a priori probabilities.
This can be bypassed by working with the conditional probabilities P_F and P_D.
We have two conflicting objectives: to make P_F as small as possible and P_D as large as possible.
Neyman-Pearson criterion: constrain P_F = α' and design a test to maximize P_D (or minimize P_M) under this constraint. The solution can be obtained by using a Lagrange multiplier:

F = P_M + λ (P_F - α')
-
F = ∫_{Z_0} p_{r|H1}(R|H1) dR + λ [ ∫_{Z-Z_0} p_{r|H0}(R|H0) dR - α' ]

If P_F = α', minimizing F minimizes P_M. Rearranging,

F = λ (1 - α') + ∫_{Z_0} [ p_{r|H1}(R|H1) - λ p_{r|H0}(R|H0) ] dR

For any positive value of λ an LRT will minimize F. F is minimized if we assign a point R to Z_0 only when the term in the bracket is negative:

if p_{r|H1}(R|H1) / p_{r|H0}(R|H0) < λ, assign the point to Z_0 (say H_0).
-
Thus F is minimized by the likelihood ratio test

Λ(R) ≷ λ   (say H_1 if >, H_0 if <)

To satisfy the constraint, λ is selected so that P_F = α':

P_F = ∫_λ^∞ p_{Λ|H0}(Λ|H0) dΛ = α'

The value of λ will be nonnegative because p_{Λ|H0}(Λ|H0) is zero for negative values of Λ.
[Figure: densities p_{l|H0}(L|H0) and p_{l|H1}(L|H1) of the test statistic along the L axis, with the threshold separating the decision regions.]
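When the test statistic is N(0,1) under H_0, as in the Gaussian example that follows, choosing the threshold on l reduces to inverting the Gaussian tail probability: γ = Q⁻¹(α'). A stdlib-only sketch that inverts Q by bisection (names are ours):

```python
import math

def Q(x):
    """Gaussian right-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def np_threshold(alpha, lo=-10.0, hi=10.0):
    """Find gamma with Q(gamma) = alpha by bisection (Q is decreasing)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gamma = np_threshold(0.05)
print(gamma)  # ~1.645
```

The constraint P_F = α' fixes the threshold without any reference to priors or costs, which is the point of the Neyman-Pearson formulation.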
-
Example. We assume that under H_1 the source output is a constant voltage m. Under H_0 the source output is zero. The voltage is corrupted by additive noise. The output is sampled, giving N samples each second. Each noise sample is an i.i.d. zero-mean Gaussian random variable with variance σ²:

H_1: r_i = m + n_i,  i = 1, 2, ..., N
H_0: r_i = n_i,      i = 1, 2, ..., N

p_{n_i}(X) = (1/(√(2π) σ)) exp(-X²/(2σ²))

The probability density of r_i under each hypothesis is:
-
p_{r_i|H1}(R_i|H1) = (1/(√(2π) σ)) exp(-(R_i - m)²/(2σ²))

p_{r_i|H0}(R_i|H0) = (1/(√(2π) σ)) exp(-R_i²/(2σ²))

The joint probability density of the N samples is:

p_{r|H1}(R|H1) = Π_{i=1}^N (1/(√(2π) σ)) exp(-(R_i - m)²/(2σ²))

p_{r|H0}(R|H0) = Π_{i=1}^N (1/(√(2π) σ)) exp(-R_i²/(2σ²))
-
The likelihood ratio is

Λ(R) = [ Π_{i=1}^N (1/(√(2π) σ)) exp(-(R_i - m)²/(2σ²)) ] / [ Π_{i=1}^N (1/(√(2π) σ)) exp(-R_i²/(2σ²)) ]

After cancelling common terms and taking the logarithm:

ln Λ(R) = (m/σ²) Σ_{i=1}^N R_i - N m²/(2σ²).

The likelihood ratio test is:
-
(m/σ²) Σ_{i=1}^N R_i - N m²/(2σ²)  ≷  ln η   (say H_1 if >, H_0 if <),

or

Σ_{i=1}^N R_i  ≷  (σ²/m) ln η + N m / 2.

We use d = √N m / σ for normalisation:

l = (1/(√N σ)) Σ_{i=1}^N R_i  ≷  (1/d) ln η + d/2.

Under H_0, l is obtained by adding N independent zero-mean Gaussian variables with variance σ² and then dividing by √N σ. Therefore l is N(0,1).
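A quick Monte Carlo check of these distributional claims; the parameter values below are arbitrary choices for illustration:

```python
import math
import random

random.seed(1)
N, m, sigma = 16, 0.5, 1.0
d = math.sqrt(N) * m / sigma  # distance between the means; here d = 2.0

def l_statistic(under_H1):
    """Normalized statistic l = (1/(sqrt(N)*sigma)) * sum of the samples."""
    mean = m if under_H1 else 0.0
    r = [random.gauss(mean, sigma) for _ in range(N)]
    return sum(r) / (math.sqrt(N) * sigma)

trials = 20000
l0 = [l_statistic(False) for _ in range(trials)]  # should look like N(0,1)
l1 = [l_statistic(True) for _ in range(trials)]   # should look like N(d,1)
print(sum(l0) / trials, sum(l1) / trials)  # approx 0 and approx d
```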
-
Under H_1, l is N(d, 1).

P_F = ∫_{(ln η)/d + d/2}^∞ (1/√(2π)) exp(-x²/2) dx = erfc_*( (ln η)/d + d/2 ),

where d = √N m / σ is the distance between the means of the two densities.
-
P_D = ∫_{(ln η)/d + d/2}^∞ (1/√(2π)) exp(-(x - d)²/2) dx
    = ∫_{(ln η)/d - d/2}^∞ (1/√(2π)) exp(-y²/2) dy = erfc_*( (ln η)/d - d/2 ).

In communication systems an important special case is

Pr(ε) = P_0 P_F + P_1 P_M.

If P_0 = P_1, the threshold η is one and Pr(ε) = (1/2)(P_F + P_M).
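Both expressions are Gaussian tail integrals; writing erfc_*(x) = Q(x) = 0.5·erfc(x/√2), they can be evaluated directly. A minimal numeric sketch (names are ours):

```python
import math

def Q(x):
    """Gaussian right-tail probability, i.e. erfc_*(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def performance(d, eta):
    """P_F and P_D of the Gaussian LRT with threshold eta on Lambda."""
    gamma = math.log(eta) / d + d / 2  # threshold on the statistic l
    return Q(gamma), Q(gamma - d)

P_F, P_D = performance(d=2.0, eta=1.0)  # equal priors, 0/1 costs
print(P_F, P_D)
print(0.5 * (P_F + (1 - P_D)))  # Pr(error) for equal priors
```

With η = 1 the threshold on l sits midway between the means, so P_F = P_M = Q(d/2) and the error probability reduces to Q(d/2).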
-
Receiver Operating Characteristic (ROC).
For a Neyman-Pearson test the values of P_F and P_D completely specify the test performance.
P_D depends on P_F. The function P_D(P_F) is defined as the Receiver Operating Characteristic (ROC).
The ROC completely describes the performance of the test as a function of the parameters of interest.
-
Example. [Figure: ROC curves for this example; not reproduced in the transcript.]
-
Properties of the ROC. All continuous likelihood ratio tests have ROCs that are concave downward. All continuous likelihood ratio tests have ROCs that lie above the line P_D = P_F. The slope of the ROC at a particular point is equal to the value of the threshold η required to achieve the P_F and P_D at that point.
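For the Gaussian example these properties are easy to check numerically: sweeping the threshold γ on l traces out the ROC point by point. A sketch verifying the above-the-diagonal property (names are ours):

```python
import math

def Q(x):
    """Gaussian right-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def roc_points(d, gammas):
    """Sweep the threshold gamma on l: P_F = Q(gamma), P_D = Q(gamma - d)."""
    return [(Q(g), Q(g - d)) for g in gammas]

pts = roc_points(d=2.0, gammas=[i * 0.25 for i in range(-8, 17)])
# every point lies on or above the chance line P_D = P_F
print(all(pd >= pf for pf, pd in pts))
```

With d = 0 the two hypotheses are indistinguishable and the ROC collapses onto the diagonal P_D = P_F.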
-
Whenever the maximum value of the Bayes risk is interior to the interval (0,1) of the P_1 axis, the minimax operating point is the intersection of the line

(C_11 - C_00) + (C_01 - C_11)(1 - P_D) - (C_10 - C_00) P_F = 0

and the appropriate curve on the ROC.
-
Determination of minimax operating point.
-
Conclusions. Using either the Bayes criterion or the Neyman-Pearson criterion, we find that the optimum test is a likelihood ratio test. Regardless of the dimension of the observation space, the optimum test consists of comparing a scalar variable Λ(R) with a threshold. For the binary hypothesis test the decision space is one-dimensional. The test can be simplified by calculating a sufficient statistic. A complete description of the LRT performance was obtained by plotting the conditional probabilities P_D and P_F as the threshold was varied.
-
M hypotheses. We choose one of M hypotheses. There are M source outputs, each of which corresponds to one of the M hypotheses. We are forced to make a decision. There are M² alternatives that may occur each time the experiment is conducted.
[Figure: source with outputs H_0, H_1, ..., H_{M-1} → probabilistic transition mechanism with conditional densities p_{r|H0}(R|H0), p_{r|H1}(R|H1), ..., p_{r|H_{M-1}}(R|H_{M-1}) → observation space Z partitioned into regions Z_0, Z_1, ..., Z_i (say H_i).]
-
Bayes criterion.
C_ij - the cost of choosing H_i when H_j is true.
Z_i - the region of the observation space where we choose H_i.
P_i - the a priori probabilities.

R = Σ_{i=0}^{M-1} Σ_{j=0}^{M-1} P_j C_ij ∫_{Z_i} p_{r|Hj}(R|Hj) dR

R is minimized through the selection of the regions Z_i.
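For a discrete observation space this double sum can be computed directly; a sketch using the binary ±1 example from earlier as a sanity check (function and variable names are ours):

```python
from fractions import Fraction as F

def bayes_risk(priors, costs, cond_pmfs, regions):
    """R = sum_i sum_j P_j * C_ij * P(R in Z_i | H_j) for discrete pmfs.
    costs[i][j] = C_ij; regions[i] = set of points where we say H_i."""
    M = len(priors)
    return sum(priors[j] * costs[i][j]
               * sum(cond_pmfs[j].get(R, F(0)) for R in regions[i])
               for i in range(M) for j in range(M))

# binary +-1 example, 0/1 costs, equal priors, tied point 0 sent to Z_1
p_r_H0 = {-2: F(1, 4), -1: F(1, 2), 0: F(1, 4)}
p_r_H1 = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
risk = bayes_risk(priors=[F(1, 2), F(1, 2)],
                  costs=[[0, 1], [1, 0]],
                  cond_pmfs=[p_r_H0, p_r_H1],
                  regions=[{-2, -1}, {0, 1, 2}])
print(risk)  # 1/8
```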
-
Example: M = 3.

Z = Z_0 + Z_1 + Z_2

R = P_0 C_00 ∫_{Z-Z_1-Z_2} p_{r|H0}(R|H0) dR + P_0 C_10 ∫_{Z_1} p_{r|H0}(R|H0) dR
  + P_0 C_20 ∫_{Z_2} p_{r|H0}(R|H0) dR + P_1 C_11 ∫_{Z-Z_0-Z_2} p_{r|H1}(R|H1) dR
  + P_1 C_01 ∫_{Z_0} p_{r|H1}(R|H1) dR + P_1 C_21 ∫_{Z_2} p_{r|H1}(R|H1) dR
  + P_2 C_22 ∫_{Z-Z_0-Z_1} p_{r|H2}(R|H2) dR + P_2 C_02 ∫_{Z_0} p_{r|H2}(R|H2) dR
  + P_2 C_12 ∫_{Z_1} p_{r|H2}(R|H2) dR
-
R = P_0 C_00 + P_1 C_11 + P_2 C_22
  + ∫_{Z_0} [ P_2 (C_02 - C_22) p_{r|H2}(R|H2) + P_1 (C_01 - C_11) p_{r|H1}(R|H1) ] dR
  + ∫_{Z_1} [ P_0 (C_10 - C_00) p_{r|H0}(R|H0) + P_2 (C_12 - C_22) p_{r|H2}(R|H2) ] dR
  + ∫_{Z_2} [ P_0 (C_20 - C_00) p_{r|H0}(R|H0) + P_1 (C_21 - C_11) p_{r|H1}(R|H1) ] dR

R is minimized if we assign each R to the region in which the value of the integrand is the smallest. Label the integrands I_0(R), I_1(R), I_2(R).
-
If I_0(R) ≤ I_1(R) and I_0(R) ≤ I_2(R), choose H_0; proceed similarly for H_1 and H_2.
-
Dividing through by p_{r|H0}(R|H0) and writing Λ_1(R) = p_{r|H1}(R|H1)/p_{r|H0}(R|H0) and Λ_2(R) = p_{r|H2}(R|H2)/p_{r|H0}(R|H0), the pairwise comparisons become:

P_1 (C_01 - C_11) Λ_1(R)  ≷  P_0 (C_10 - C_00) + P_2 (C_12 - C_02) Λ_2(R)   (>: H_1 or H_2; <: H_0 or H_2)

P_2 (C_02 - C_22) Λ_2(R)  ≷  P_0 (C_20 - C_00) + P_1 (C_21 - C_01) Λ_1(R)   (>: H_2 or H_1; <: H_0 or H_1)

P_2 (C_12 - C_22) Λ_2(R)  ≷  P_0 (C_20 - C_10) + P_1 (C_21 - C_11) Λ_1(R)   (>: H_2 or H_0; <: H_1 or H_0)

M hypotheses always lead to a decision space that has, at most, M - 1 dimensions.
[Figure: decision space in the (Λ_1(R), Λ_2(R)) plane, partitioned into regions labelled H_0, H_1, H_2.]
-
Special case (common in communication):

C_00 = C_11 = C_22 = 0,  C_ij = 1 for i ≠ j.

The test reduces to:

P_1 p_{r|H1}(R|H1)  ≷  P_0 p_{r|H0}(R|H0)   (>: H_1 or H_2; <: H_0 or H_2)

P_2 p_{r|H2}(R|H2)  ≷  P_0 p_{r|H0}(R|H0)   (>: H_2 or H_1; <: H_0 or H_1)

P_2 p_{r|H2}(R|H2)  ≷  P_1 p_{r|H1}(R|H1)   (>: H_2 or H_0; <: H_1 or H_0)

Compute the a posteriori probabilities and choose the largest: a maximum a posteriori (MAP) probability computer.
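The MAP computer simply maximizes P_i · p(R|H_i) over i. A sketch for three hypothetical Gaussian hypotheses with means -1, 0, +1 (an illustration of the rule, not the lecture's own example):

```python
import math

def map_decide(R, means, priors, sigma=1.0):
    """Choose the hypothesis index maximizing the posterior P_i * p(R|H_i).
    Works in log domain; the Gaussian normalizing constant cancels."""
    def score(i):
        return math.log(priors[i]) - (R - means[i]) ** 2 / (2 * sigma ** 2)
    return max(range(len(means)), key=score)

means, priors = [-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]
print(map_decide(-0.9, means, priors))  # 0
print(map_decide(0.1, means, priors))   # 1
print(map_decide(2.0, means, priors))   # 2
```

With equal priors this reduces to nearest-mean classification, and the decision boundaries are the midpoints between adjacent means.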
-
Special case: degeneration of hypotheses. What happens if we combine H_1 and H_2? Set

C_12 = C_21 = 0.

For simplicity let

C_01 = C_10 = C_20 = C_02,  C_00 = C_11 = C_22 = 0.

The first two equations of the test reduce to

P_1 Λ_1(R) + P_2 Λ_2(R)  ≷  P_0   (>: H_1 or H_2; <: H_0)

[Figure: decision space in the (Λ_1(R), Λ_2(R)) plane, with a single straight line separating H_0 from H_1 or H_2.]
-
Dummy hypothesis. The actual problem has two hypotheses, H_1 and H_2. We introduce a new one, H_0, with a priori probability P_0 = 0. Let P_1 + P_2 = 1 and C_12 = C_02, C_21 = C_01. We always choose H_1 or H_2. The test reduces to:

P_2 (C_12 - C_22) Λ_2(R)  ≷  P_1 (C_21 - C_11) Λ_1(R)   (>: H_2; <: H_1)

This is useful if p_{r|H2}(R|H2)/p_{r|H1}(R|H1) is difficult to work with but Λ_1(R) and Λ_2(R) are simple.
-
Conclusions. 1. The minimum dimension of the decision space is no more than M - 1. The boundaries of the decision regions are hyperplanes in the (Λ_1, ..., Λ_{M-1}) plane. 2. The test is simple to find, but the error probabilities are often difficult to compute. 3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each hypothesis and choose the largest.
1M . The boundaries of the decision regions are hyperplanes in the ( )1 1, , m plane. 2. The test is simple to find but error probabilities are often difficult to compute. 3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each test and choose the largest.