Decision-Making under Statistical Uncertainty
Jayakrishnan Unnikrishnan
PhD Defense, ECE Department
University of Illinois at Urbana-Champaign
CSL 141, 12 June 2010
Statistical Decision-Making
Relevant in several contexts: receiver design for communication systems, sensor networks for environment monitoring and failure detection, drug testing
Based on a probabilistic model for observations
Well-studied problem, but questions still remain: uncertain statistical knowledge
Statistics in Detection
Example: likelihood ratio test for binary hypotheses
$\hat{H} = \mathbb{I}\{L(X) > \tau\}$, where $L(X) = \dfrac{p_1(X)}{p_0(X)}$
Requires knowledge of the likelihood ratio function
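To make the rule concrete, a minimal Python sketch; the Gaussian densities $p_0 = N(0,1)$, $p_1 = N(1,1)$ and the threshold are illustrative assumptions, not the slides' setup:

```python
from scipy.stats import norm

def lr_test(x, tau=1.0):
    """Likelihood ratio test: declare H1 iff L(x) = p1(x)/p0(x) > tau.

    Illustrative densities (assumed): p0 = N(0,1), p1 = N(1,1).
    """
    L = norm.pdf(x, loc=1.0) / norm.pdf(x, loc=0.0)
    return int(L > tau)
```

For these densities $L(x) = e^{x - 0.5}$, so with $\tau = 1$ the test declares $H_1$ exactly when $x > 0.5$.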
Imperfect Statistics in Detection
Often perfect statistical knowledge is not available, e.g., fault-onset detection, intrusion detection, anomaly detection, spam filtering, primary detection and dynamic spectrum access for cognitive radio
[Diagram: applications mapped to approaches (robust change detection, universal hypothesis testing, online learning)]
How to cope with uncertain statistics?
Focus on i.i.d. observations
Outline
Robust quickest change detection: designing for worst-case guarantees minimax optimality
Universal hypothesis testing: partial knowledge helps
Universal hypothesis testing under model uncertainty
Outline
Robust quickest change detection: designing for worst-case guarantees minimax optimality
Quickest Change Detection
Single observation sequence
Stopping time $\tau$ at which change is declared
Tradeoff between detection delay and frequency of false alarms
Applications: process monitoring, quality control
Lorden Criterion
Change-point modeled as deterministic
Minimize worst-case delay subject to a bound on expected time to false alarm:
$\min_\tau \mathrm{WDD}(\tau)$ subject to $\mathbb{E}_{\nu_0}(\tau) \ge B$,
where $\mathrm{WDD}(\tau) = \sup_{\lambda \ge 1} \operatorname{ess\,sup}\, \mathbb{E}_\lambda\big[(\tau - \lambda + 1)^+ \,\big|\, X_1, \dots, X_{\lambda-1}\big]$
CUSUM stopping rule is optimal:
$\tau_{\mathrm{C}} = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \prod_{i=k}^{n} \frac{\nu_1(X_i)}{\nu_0(X_i)} \ge \eta \Big\}$
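As noted later in the talk, CUSUM admits a simple recursion: the windowed log-likelihood sums satisfy $W_n = \max(W_{n-1}, 0) + \log L(X_n)$. A minimal sketch, again with illustrative Gaussian pre- and post-change densities as an assumption:

```python
from scipy.stats import norm

def cusum(xs, log_eta):
    """CUSUM stopping time via W_n = max(W_{n-1}, 0) + log L(X_n),
    which equals max_{1<=k<=n} sum_{i=k}^n log L(X_i).

    Illustrative densities (assumed): nu0 = N(0,1), nu1 = N(1,1).
    """
    w = 0.0
    for n, x in enumerate(xs, start=1):
        w = max(w, 0.0) + norm.logpdf(x, loc=1.0) - norm.logpdf(x, loc=0.0)
        if w >= log_eta:
            return n      # declare change at time n
    return None           # no alarm in this sample
```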
Uncertain Statistics
Most known results assume the pre-change and post-change distributions $\nu_0$ and $\nu_1$ are known
Often $\nu_0$ and $\nu_1$ are not completely known in applications
Example 1: Infrastructure Monitoring
Post-fault distribution is uncertain
Example 2: Intrusion Detection
Post-intrusion system behavior is uncertain, e.g., in network security
Robust Change Detection
Suppose $\nu_0$ and $\nu_1$ are known to lie in uncertainty classes of densities $\mathcal{P}_0$ and $\mathcal{P}_1$
Minimax robust formulation: minimize worst-case delay among all distributions from $\mathcal{P}_0$ and $\mathcal{P}_1$, subject to a uniform bound on expected time to false alarm under all possible distributions from $\mathcal{P}_0$:
$\min_\tau \sup_{\nu_0 \in \mathcal{P}_0,\, \nu_1 \in \mathcal{P}_1} \mathrm{WDD}(\tau) \quad \text{s.t.} \quad \inf_{\nu_0 \in \mathcal{P}_0} \mathbb{E}_{\nu_0}(\tau) \ge B$
Solution via LFDs
Approach: identify least favorable distributions (LFDs) under a stochastic ordering condition [Veeravalli et al. 1994]
Like Huber's approach to robust hypothesis testing [Huber 1965]
For random variables $X_1, X_2$ we write $X_1 \succeq X_2$ if $\mathbb{P}(X_1 \ge t) \ge \mathbb{P}(X_2 \ge t)$ for all $t$
JSB condition: with $L^*(\cdot) = \dfrac{\nu_1^*(\cdot)}{\nu_0^*(\cdot)}$, we need, for all $(\nu_0, \nu_1) \in \mathcal{P}_0 \times \mathcal{P}_1$, that $L^*(X)$ under $\nu_1$ $\succeq$ $L^*(X)$ under $\nu_1^*$, and $L^*(X)$ under $\nu_0^*$ $\succeq$ $L^*(X)$ under $\nu_0$
E.g., $\varepsilon$-contamination classes, total variation and Prohorov distance neighborhoods
Solution via LFDs
Under JSB and some other regularity conditions, the optimal stopping rule designed with respect to the LFDs solves the robust problem
Example: $\mathcal{P}_0 = \{N(0,1)\}$, $\mathcal{P}_1 = \{N(\theta,1) : 0.1 \le \theta \le 3\}$
Can easily show that the LFDs are $\nu_0^* = N(0,1)$ and $\nu_1^* = N(0.1,1)$
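For this example the robust rule is just CUSUM run with the LFDs. A sketch; the simulated change point, post-change mean, and threshold are illustrative choices:

```python
import numpy as np
from scipy.stats import norm

def robust_cusum(xs, log_eta):
    """CUSUM designed for the LFDs nu0* = N(0,1), nu1* = N(0.1,1)."""
    w = 0.0
    for n, x in enumerate(xs, start=1):
        w = max(w, 0.0) + norm.logpdf(x, loc=0.1) - norm.logpdf(x, loc=0.0)
        if w >= log_eta:
            return n
    return None

# Illustrative run: change from N(0,1) to N(2,1) at time 100.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 200)])
print(robust_cusum(xs, log_eta=5.0))
```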
Cost of Robustness
Comparison with GLR test
A benchmark scheme: CUSUM based on the Generalized Likelihood Ratio (GLR test):
$\tau_{\mathrm{GLR}} = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \sup_{\nu_1 \in \mathcal{P}_1} \sum_{i=k}^{n} \log \frac{\nu_1(X_i)}{\nu_0(X_i)} \ge \eta \Big\}$
Asymptotically as good as CUSUM with known distributions in exponential families
Often too complex to implement
Robust CUSUM admits a simple recursion
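The complexity gap can be seen directly in a sketch of GLR-CUSUM for the Gaussian example above: for each window the sup over $\theta$ is the clipped window mean, and every new sample forces a fresh pass over all $n$ windows, whereas robust CUSUM updates in constant time. The family and parameter range here are the example's, but the implementation details are assumptions of this illustration:

```python
import numpy as np
from scipy.stats import norm

def glr_cusum(xs, eta, theta_lo=0.1, theta_hi=3.0):
    """GLR-CUSUM for the family {N(theta,1) : theta_lo <= theta <= theta_hi}.

    For window k..n the sup over theta of the log-likelihood ratio is
    attained at the window mean clipped to [theta_lo, theta_hi] (the LLR
    is concave in theta). Note the O(n) work per new sample.
    """
    xs = np.asarray(xs, dtype=float)
    for n in range(1, len(xs) + 1):
        stat = -np.inf
        for k in range(n):
            w = xs[k:n]
            theta = float(np.clip(w.mean(), theta_lo, theta_hi))
            llr = np.sum(norm.logpdf(w, theta) - norm.logpdf(w, 0.0))
            stat = max(stat, llr)
        if stat >= eta:
            return n
    return None
```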
Robust test vs GLR test
Other Criteria for Optimality
Pollak criterion: alternate definition of delay; SRP stopping rule is asymptotically optimal
Bayesian criterion: change-point modeled as a geometric random variable; minimize average delay subject to a probability of false alarm constraint; Shiryaev test is optimal
Robust tests designed for LFDs are optimal
Outline
Universal hypothesis testing: partial knowledge helps
Universal Hypothesis Testing
Given a sequence of i.i.d. observations $X_1, X_2, \dots, X_n$, test whether they were drawn according to a modeled distribution $p_0$:
Null $H_0 : X_i \sim p_0$
Alternate $H_1 : X_i \sim p_1$, $p_1 \ne p_0$, $p_1$ unknown
Applications: anomaly detection, spam filtering, etc.
Hoeffding's Universal Test
Hoeffding test is optimal in the error-exponent sense:
$\hat{H} = \mathbb{I}\{D(p_n \,\|\, p_0) > \tau\}$
Uses the Kullback-Leibler divergence of the empirical distribution $p_n$ from $p_0$ as the test statistic
[Figure: acceptance region $\{q : D(q \,\|\, p_0) \le \tau\}$, a divergence ball around $p_0$ in the probability simplex]
Select $\tau$ for a target false alarm probability via Sanov's theorem in large deviations (error exponents):
$\mathbb{P}_{p_0}(\hat{H} \ne 0) \approx \exp(-n\tau)$
Weak convergence under $p_0$:
$2n D(p_n \,\|\, p_0) \xrightarrow[n \to \infty]{d} \chi^2_{N-1}$
Error exponents are inaccurate
[Figure: false alarm probability approximations; alphabet size N = 20]
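A minimal sketch of the Hoeffding test on a finite alphabet, setting the threshold from the $\chi^2_{N-1}$ weak-convergence limit above rather than the Sanov approximation; the integer-labeled alphabet and the level `alpha` are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def hoeffding_test(xs, p0, alpha=0.05):
    """Reject H0 when 2*n*D(p_n || p0) exceeds the chi2_{N-1} quantile.

    xs: samples from the alphabet {0, ..., N-1}; p0: null pmf (all
    entries assumed positive); p_n: empirical distribution of the sample.
    """
    xs = np.asarray(xs)
    p0 = np.asarray(p0, dtype=float)
    n, N = len(xs), len(p0)
    p_n = np.bincount(xs, minlength=N) / n
    mask = p_n > 0                      # 0 * log 0 = 0 convention
    kl = np.sum(p_n[mask] * np.log(p_n[mask] / p0[mask]))
    return 2 * n * kl > chi2.ppf(1 - alpha, df=N - 1)
```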
Large Alphabet Regime
Hoeffding test performs poorly for large alphabet size $N$: suffers from high bias and variance
$\mathbb{E}_{p_0}[D(p_n \,\|\, p_0)] \approx \dfrac{N-1}{2n}, \qquad \mathrm{Var}_{p_0}[D(p_n \,\|\, p_0)] \approx \dfrac{N-1}{2n^2}$
Can do better if we have partial information about the alternate hypothesis
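These approximations are easy to check by simulation; a minimal sketch (the uniform null, trial count, and seed are arbitrary choices):

```python
import numpy as np

def kl_null_moments(p0, n, trials=5000, seed=0):
    """Monte Carlo mean and variance of D(p_n || p0) under the null,
    to compare against (N-1)/(2n) and (N-1)/(2n^2)."""
    rng = np.random.default_rng(seed)
    p0 = np.asarray(p0, dtype=float)
    vals = np.empty(trials)
    for t in range(trials):
        p_n = rng.multinomial(n, p0) / n
        mask = p_n > 0
        vals[t] = np.sum(p_n[mask] * np.log(p_n[mask] / p0[mask]))
    return vals.mean(), vals.var()

N, n = 20, 1000
print(kl_null_moments(np.full(N, 1 / N), n))   # simulated moments
print((N - 1) / (2 * n), (N - 1) / (2 * n**2)) # approximations
```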
Mismatched Test
Mismatched test uses the mismatched divergence instead of the KL divergence:
$\hat{H} = \mathbb{I}\{D^{\mathrm{MM}}(p_n \,\|\, p_0) > \tau\}$
Introduced as a lower bound to the KL divergence
MM test is equivalent to replacing $p_n$ with the ML estimate from a family $\{p_\theta\}$, i.e., it is a GLRT:
$D^{\mathrm{MM}}(p_n \,\|\, p_0) = D(p_{\hat\theta_n^{\mathrm{ML}}} \,\|\, p_0)$
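A sketch of this GLRT form for a one-parameter exponential family $p_\theta(x) \propto p_0(x)\, e^{\theta f(x)}$, an assumed family chosen for illustration (as is the bounded search range); the returned value is $D(p_{\hat\theta^{\mathrm{ML}}} \,\|\, p_0)$, which for such a family coincides with the mismatched divergence:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mm_divergence(p_n, p0, f):
    """D_MM(p_n || p0) computed as D(p_thetaML || p0) for the family
    p_theta(x) ∝ p0(x) * exp(theta * f(x)).

    p_n: empirical pmf; p0: null pmf; f: feature values on the alphabet.
    """
    p_n, p0, f = (np.asarray(a, dtype=float) for a in (p_n, p0, f))

    def avg_neg_loglik(theta):
        tilt = p0 * np.exp(theta * f)
        return -np.sum(p_n * np.log(tilt / tilt.sum()))

    # ML estimate of theta; the search interval is an assumption.
    theta_ml = minimize_scalar(avg_neg_loglik, bounds=(-20, 20),
                               method="bounded").x
    tilt = p0 * np.exp(theta_ml * f)
    p_ml = tilt / tilt.sum()
    return np.sum(p_ml * np.log(p_ml / p0))   # D(p_ML || p0)
```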
Mismatched Test Properties
+ Addresses high variance issues:
$\mathbb{E}_{p_0}[D^{\mathrm{MM}}(p_n \,\|\, p_0)] \approx \dfrac{d}{2n}, \qquad \mathrm{Var}_{p_0}[D^{\mathrm{MM}}(p_n \,\|\, p_0)] \approx \dfrac{d}{2n^2}$, where $\theta \in \mathbb{R}^d$
- However, sub-optimal in the error-exponent sense
+ Optimal when the alternate distribution lies in $\{p_\theta\}$
Partial knowledge of the unknown alternate distribution can give substantial performance improvement for large alphabets
Performance comparison
[Figure: performance comparison; N = 19, n = 40]
Outline
Universal hypothesis testing under model uncertainty
Uncertain Null Hypothesis
Consider the following hypothesis testing problem:
$H_0 : X_i \sim p$, for some $p \in \mathcal{P}$
$H_1 : X_i \sim q$, for some $q \notin \mathcal{P}$
A robust universal formulation
Relevant when the null hypothesis distribution $p_0$ is uncertain
Pandit and Meyn studied this when $\mathcal{P}$ is a moment class:
$\mathcal{P} = \Big\{ p : \sum_x p(x)\, \psi_i(x) = 0,\ 1 \le i \le d \Big\}$
Robust Hoeffding Test
Robust Hoeffding test:
$\hat{H} = \mathbb{I}\{D^{\mathrm{ROB}}(p_n \,\|\, \mathcal{P}) > \tau\}$, where $D^{\mathrm{ROB}}(q \,\|\, \mathcal{P}) := \inf_{p \in \mathcal{P}} D(q \,\|\, p)$
[Figure: acceptance region $\{q : D^{\mathrm{ROB}}(q \,\|\, \mathcal{P}) \le \tau\}$ around the class $\mathcal{P}$, compared with the ball $\{q : D(q \,\|\, p_0) \le \tau\}$ around a single $p_0$]
Guarantees exponential decay of the worst-case false alarm probability:
$\max_{p \in \mathcal{P}} \mathbb{P}_p(\hat{H} \ne 0) \approx \exp(-n\tau)$
- Error exponents not a good indicator of error probability
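Computing the robust statistic is a convex program in $p$. A numerical sketch for the moment-class example above, using a generic solver (the solver choice, starting point, and tolerances are assumptions of the sketch):

```python
import numpy as np
from scipy.optimize import minimize

def d_rob(p_n, psi):
    """inf_{p in P} D(p_n || p) over the moment class
    P = {p : sum_x p(x) * psi_i(x) = 0, 1 <= i <= d}.

    p_n: empirical pmf (length N); psi: array of shape (d, N).
    The objective is convex in p, so a local solver suffices.
    """
    p_n = np.asarray(p_n, dtype=float)
    psi = np.atleast_2d(psi)
    N = len(p_n)
    mask = p_n > 0

    def obj(p):
        return np.sum(p_n[mask] * np.log(p_n[mask] / p[mask]))

    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
    cons += [{"type": "eq", "fun": (lambda p, r=row: p @ r)} for row in psi]
    res = minimize(obj, x0=np.full(N, 1.0 / N), method="SLSQP",
                   bounds=[(1e-9, 1.0)] * N, constraints=cons)
    return res.fun
```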
Weak Convergence Result
Can interpret the robust divergence as a mismatched divergence
Yields a weak convergence result under $p \in \mathcal{P}$:
$2n D^{\mathrm{ROB}}(p_n \,\|\, \mathcal{P}) \xrightarrow[n \to \infty]{d} \chi^2_{d_p}$, where $d_p \le d$
Gives a better approximation for the false alarm probability
Similar robust Kolmogorov-Smirnov test for continuous distributions
Kolmogorov-Smirnov Test
Universal hypothesis test for continuous alphabets
$H_0 : X_i \sim F_0$
KS test statistic:
$D_n = \sup_x |F_n(x) - F_0(x)|$, where $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\{X_i \le x\}$
Thresholds set using weak convergence of $D_n$
Problem of overfitting for large $n$
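A minimal sketch of the one-sample KS test; the standard-normal null and the level are assumed examples, and the threshold comes from the classical weak-convergence limit of $\sqrt{n}\, D_n$ (the Kolmogorov distribution):

```python
import numpy as np
from scipy.stats import norm, kstwobign

def ks_test(xs, F0=norm.cdf, alpha=0.05):
    """Reject H0: X_i ~ F0 when sqrt(n) * D_n exceeds the (1 - alpha)
    quantile of the Kolmogorov limit distribution."""
    xs = np.sort(np.asarray(xs, dtype=float))
    n = len(xs)
    i = np.arange(1, n + 1)
    # sup_x |F_n(x) - F0(x)| is attained at the sample points.
    d_n = np.max(np.maximum(i / n - F0(xs), F0(xs) - (i - 1) / n))
    return np.sqrt(n) * d_n > kstwobign.ppf(1 - alpha)
```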
Robust KS Test
$F_0$ unknown, from an uncertainty class defined via stochastic ordering:
$\mathcal{F} = \{F : F^-(x) \le F(x) \le F^+(x),\ \forall x\}$
[Figure: band of CDFs between $F^-(x)$ and $F^+(x)$, containing $F_0(x)$]
Modified test statistic:
$E_n = \min_{F \in \mathcal{F}} \sup_x |F_n(x) - F(x)|$
We obtain weak convergence results for $E_n$ that are useful for setting thresholds
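A sketch of computing $E_n$. It assumes (an assumption of the sketch, not a claim from the slides) that the band envelopes $F^-, F^+$ are themselves continuous CDFs, in which case clipping $F_n$ pointwise into the band attains the minimum and $E_n$ has a closed form:

```python
import numpy as np

def robust_ks_stat(xs, F_minus, F_plus):
    """E_n = min_{F in band} sup_x |F_n(x) - F(x)|, assuming the band
    envelopes are continuous CDFs so the pointwise projection of F_n
    onto the band attains the minimum.

    F_minus, F_plus: vectorized CDF callables with F_minus <= F_plus.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    n = len(xs)
    i = np.arange(1, n + 1)
    excess = np.max(i / n - F_plus(xs))          # F_n pokes above the band
    deficit = np.max(F_minus(xs) - (i - 1) / n)  # F_n dips below the band
    return max(0.0, excess, deficit)
```

With $F^- = F^+ = F_0$ this reduces to the ordinary KS statistic $D_n$.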
Conclusion
Various approaches to coping with uncertainty:
Robust change detection: designing for LFDs guarantees minimax optimality
Universal hypothesis testing: partial knowledge improves performance
Dynamic spectrum access: online learning
Extensions:
Performance analysis of other robust stopping rules
Adapting dimensionality with observation length
Convergence rates of weak convergence results
Extending to non-i.i.d. settings
Thank You!
References
J. Unnikrishnan, D. Huang, S. Meyn, A. Surana, and V. V. Veeravalli, "Universal and Composite Hypothesis Testing via Mismatched Divergence," IEEE Trans. Inf. Theory, revised April 2010.
J. Unnikrishnan, V. V. Veeravalli, and S. Meyn, "Minimax Robust Quickest Change Detection," submitted to IEEE Trans. Inf. Theory, revised May 2010.
J. Unnikrishnan, S. Meyn, and V. V. Veeravalli, "On Thresholds for Robust Goodness-of-Fit Tests," to be presented at the IEEE Information Theory Workshop, Dublin, Aug. 2010.
Available at http://www.ifp.illinois.edu/~junnikr2