Selecting Observations against Adversarial Objectives
Andreas Krause, Brendan McMahan, Carlos Guestrin, Anupam Gupta
Carnegie Mellon
Observation selection problems
Place sensors for building automation. Monitor rivers and lakes using robots. Detect contaminations in water networks.
Set V of possible observations (sensor locations, ...). We want to pick a subset $A^* \subseteq V$ such that

$A^* = \arg\max_{|A| \le k} F(A)$

For most interesting utilities F, this is NP-hard!
Key observation: Diminishing returns
[Figure: placement A = {S1, S2} vs. placement B = {S1, ..., S5}.]
Formalization: Submodularity
For $A \subseteq B$: $F(A \cup \{S'\}) - F(A) \ge F(B \cup \{S'\}) - F(B)$
Adding a new sensor S' to the small placement A helps a lot; adding S' to the large placement B doesn't help much.
Submodularity [with Guestrin, Singh, Leskovec, VanBriesen, Faloutsos, Glance]
We prove submodularity for:
- Mutual information F(A) = H(unobs) - H(unobs | A) [UAI '05, JMLR '07 (spatial prediction)]
- Outbreak detection F(A) = impact reduction from sensing A [KDD '07 (water monitoring, ...)]
Also submodular (see the sketch below):
- Geometric coverage F(A) = area covered
- Variance reduction F(A) = Var(Y) - Var(Y | A), ...
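To make the diminishing-returns property concrete, here is a minimal Python sketch (not from the talk) that checks the submodularity inequality for a toy coverage objective; the sensor coverage regions are made-up illustration data.

```python
# Toy coverage objective F(A) = number of distinct cells covered by A.
# The coverage regions below are made-up illustration data.
coverage = {
    "S1": {1, 2, 3},
    "S2": {3, 4},
    "S3": {4, 5, 6},
    "S_new": {2, 3, 4},   # the "new sensor" S'
}

def F(A):
    """Coverage objective: number of distinct cells covered by sensors in A."""
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)

A = {"S1"}                          # small placement
B = {"S1", "S2", "S3"}              # larger placement, A is a subset of B
gain_A = F(A | {"S_new"}) - F(A)    # marginal gain of S' given A
gain_B = F(B | {"S_new"}) - F(B)    # marginal gain of S' given B

print(gain_A, gain_B)               # 1 >= 0: S' helps the small set more
assert gain_A >= gain_B             # the submodularity inequality
```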
Why is submodularity useful?
Theorem [Nemhauser et al. '78]: The greedy algorithm gives a constant-factor approximation:
$F(A_{greedy}) \ge (1 - 1/e) \, F(A_{opt})$   (~63%)
- Can get online (data-dependent) bounds for any algorithm
- Can significantly speed up the greedy algorithm
- Can use MIP / branch & bound for the optimal solution
Greedy algorithm (forward selection): $s_{j+1} = \arg\max_{s \in V \setminus A_j} F(A_j \cup \{s\})$
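A minimal sketch of the greedy forward-selection rule above, with the objective F passed in as a black box (function names are mine):

```python
def greedy_select(V, F, k):
    """Greedy forward selection: repeatedly add the element s maximizing
    F(A | {s}), i.e. the largest marginal gain. For monotone submodular F
    this guarantees F(A) >= (1 - 1/e) * OPT [Nemhauser et al. '78].
    Assumes k <= |V|."""
    A = set()
    for _ in range(k):
        A.add(max((s for s in V if s not in A), key=lambda s: F(A | {s})))
    return A

# e.g., with the toy coverage objective from the previous sketch:
# greedy_select(set(coverage), F, k=2)
```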
Robust observation selection
What if ...
- the parameters θ of the model $P(X_V \mid \theta)$ are unknown or change?
- sensors fail?
- an adversary selects the outbreak scenario?
[Figure: sensor placement optimized for old parameters θ_old; under new parameters θ_new there is more variability in an uncovered region, and an adversary would attack there.]
Robust prediction
Typical objective: minimize the average variance (MSE). But this can yield low average variance with a high maximum, in the most interesting part!
[Figure: confidence bands for pH value over horizontal positions V.]
Instead: minimize the "width" of the confidence bands.
For every location $s \in V$, define $F_s(A) = \mathrm{Var}(s) - \mathrm{Var}(s \mid A)$.
Minimizing the "width" means simultaneously maximizing all $F_s(A)$.
Each $F_s(A)$ is (often) submodular! [Das & Kempe '07]
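As an illustration (not from the talk), here is a numpy sketch of the per-location objectives $F_s(A)$ in a GP with a squared-exponential kernel; the kernel, unit prior variance, and locations are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential covariance between 1-D location arrays X and Y."""
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def variance_reduction(V, A, noise=0.01):
    """F_s(A) = Var(s) - Var(s | A) for every location s in V, under a GP
    prior with unit prior variance (an illustrative modeling choice)."""
    if len(A) == 0:
        return np.zeros(len(V))
    K_AA = rbf_kernel(A, A) + noise * np.eye(len(A))
    K_VA = rbf_kernel(V, A)
    # Posterior variance: Var(s|A) = K_ss - K_sA K_AA^{-1} K_As
    post_var = 1.0 - np.einsum('ij,ij->i', K_VA @ np.linalg.inv(K_AA), K_VA)
    return 1.0 - post_var  # prior variance (= 1) minus posterior variance

V = np.linspace(-3, 3, 61)        # candidate locations
A = np.array([-2.0, 0.0, 2.0])    # a candidate placement
Fs = variance_reduction(V, A)
print(Fs.min(), Fs.max())         # worst-case vs. best-case reduction over V
```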
Adversarial observation selection
Given: possible observations V and submodular functions $F_1, \dots, F_m$. We want to solve

$A^* = \arg\max_{|A| \le k} \min_i F_i(A)$

Can model many problems this way:
- Width of confidence bands: $F_i$ is the variance reduction at location i (one $F_i$ for each location i)
- Unknown parameters: $F_i$ is the information gain under parameters $\theta_i$
- Adversarial outbreak scenarios: $F_i$ is the utility for scenario i
- ...
Unfortunately, $\min_i F_i(A)$ is not submodular!
How does greedy do? Consider this example (ε is a small positive constant):

Set A    F1   F2   min_i F_i
{x}      1    0    0
{y}      0    2    0
{z}      ε    ε    ε
{x,y}    1    2    1
{x,z}    1    ε    ε
{y,z}    ε    2    ε

The optimal solution for k = 2 is {x,y}. But greedy picks z first (the only singleton with positive minimum), and can then choose only x or y, ending with value ε. Greedy does arbitrarily badly. Is there something better? (The sketch below reproduces this failure.)

Theorem: The problem $\max_{|A| \le k} \min_i F_i(A)$ does not admit any approximation unless P = NP.
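The failure mode can be reproduced directly; a small sketch encoding the table above as dictionaries and running greedy on the non-submodular objective min(F1, F2):

```python
EPS = 1e-6  # the small constant ε from the table

F1 = {frozenset(''): 0, frozenset('x'): 1, frozenset('y'): 0, frozenset('z'): EPS,
      frozenset('xy'): 1, frozenset('xz'): 1, frozenset('yz'): EPS, frozenset('xyz'): 1}
F2 = {frozenset(''): 0, frozenset('x'): 0, frozenset('y'): 2, frozenset('z'): EPS,
      frozenset('xy'): 2, frozenset('xz'): EPS, frozenset('yz'): 2, frozenset('xyz'): 2}

def min_F(A):
    """The adversarial objective min_i F_i(A)."""
    return min(F1[frozenset(A)], F2[frozenset(A)])

A = set()
for _ in range(2):  # budget k = 2
    A.add(max((s for s in 'xyz' if s not in A), key=lambda s: min_F(A | {s})))

print(sorted(A), min_F(A))  # ['x', 'z'] with value ε; the optimum {x,y} achieves 1
```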
Alternative formulation
If somebody told us the optimal value

$c^* = \max_{|A| \le k} \min_i F_i(A),$

could we recover the optimal solution A*? We would need to solve the dual problem

$A^* = \arg\min_A |A|$ such that $\min_i F_i(A) \ge c^*$

Is this any easier? Yes, if we relax the constraint $|A| \le k$.
Solving the alternative problem
Trick: For each $F_i$ and c, define the truncation

$F'_i(A) = \min\{F_i(A), c\}$

[Figure: $F_i(A)$ and its truncation $F'_i(A)$ as functions of |A|.]

and average the truncations:

$F'_{avg,c}(A) = \frac{1}{m} \sum_i F'_i(A)$

Key property: $\min_i F_i(A) \ge c \iff F'_{avg,c}(A) = c$

Lemma: $F'_{avg,c}(A)$ is submodular!

Example (c = 1):
Set A    F1   F2   F'1   F'2   F'avg,1   min_i F_i
{x}      1    0    1     0     1/2       0
{y}      0    2    0     1     1/2       0
{z}      ε    ε    ε     ε     ε         ε
{x,y}    1    2    1     1     1         1
{x,z}    1    ε    1     ε     (1+ε)/2   ε
{y,z}    ε    2    ε     1     (1+ε)/2   ε
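In code, the truncation trick is essentially one line; a sketch (the function name is mine):

```python
def truncated_avg(Fs, c):
    """F'_avg,c(A) = (1/m) * sum_i min(F_i(A), c) for a list Fs of set functions.
    By the key property above: min_i F_i(A) >= c  <=>  F'_avg,c(A) = c,
    and F'_avg,c stays submodular whenever every F_i is."""
    return lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)
```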
Why is this useful? We can use the greedy algorithm to find an (approximate) solution to the relaxed covering problem!

Proposition: The greedy algorithm finds $A_G$ with $F'_{avg,c}(A_G) = c$ and $|A_G| \le \alpha k$,
where $\alpha = 1 + \log \max_s \sum_i F_i(\{s\})$.
Back to our example
Guess c = 1. Running greedy on $F'_{avg,1}$ first picks x, then picks y: the optimal solution!

Set A    F1   F2   min_i F_i   F'avg,1
{x}      1    0    0           1/2
{y}      0    2    0           1/2
{z}      ε    ε    ε           ε
{x,y}    1    2    1           1
{x,z}    1    ε    ε           (1+ε)/2
{y,z}    ε    2    ε           (1+ε)/2

How do we find c?
Submodular Saturation Algorithm ("Saturate")
Given: set V, integer k and functions $F_1, \dots, F_m$.
Initialize $c_{min} = 0$, $c_{max} = \min_i F_i(V)$, and do binary search with $c = (c_{min} + c_{max})/2$:
- Use the greedy algorithm to find $A_G$ such that $F'_{avg,c}(A_G) = c$
- If $|A_G| > \alpha k$: c is too high, so decrease $c_{max}$
- If $|A_G| \le \alpha k$: c is too low, so increase $c_{min}$ (and keep $A_G$)
Repeat until convergence. A runnable sketch follows below.
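Putting the pieces together, a hedged sketch of Saturate: binary search over c with greedy partial cover on the truncated average. The tolerance, the tie-breaking, and the default alpha = 1 are my illustrative choices; with alpha set as in the proposition, the paper's guarantee applies.

```python
def saturate(V, Fs, k, alpha=1.0, tol=1e-4):
    """Sketch of Saturate: binary search over the value c; for each c, greedily
    cover with the truncated average F'_avg,c; keep the best feasible set."""
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    A_best = set()
    while c_max - c_min > tol:
        c = (c_min + c_max) / 2.0
        F_bar = lambda A, c=c: sum(min(F(A), c) for F in Fs) / len(Fs)  # F'_avg,c
        # Greedy partial cover: add elements until F'_avg,c(A) saturates at c.
        A = set()
        while F_bar(A) < c - 1e-9 and len(A) < len(V):
            A.add(max((s for s in V if s not in A), key=lambda s: F_bar(A | {s})))
        if len(A) > alpha * k:
            c_max = c                 # cover too expensive: c too high
        else:
            c_min, A_best = c, A      # c achievable within budget: raise it
    return A_best

# e.g., with the dictionaries F1, F2 from the counterexample sketch above:
# saturate(set('xyz'), [lambda A: F1[frozenset(A)], lambda A: F2[frozenset(A)]], k=2)
# returns {'x', 'y'}, the optimal solution.
```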
Theoretical guarantees
Theorem: The problem $\max_{|A| \le k} \min_i F_i(A)$ does not admit any approximation unless P = NP.

Theorem: Saturate finds a solution $A_S$ such that
$\min_i F_i(A_S) \ge OPT_k$ and $|A_S| \le \alpha k$,
where $OPT_k = \max_{|A| \le k} \min_i F_i(A)$ and $\alpha = 1 + \log \max_s \sum_i F_i(\{s\})$.

Theorem: If there were a polytime algorithm with a better constant $\beta < \alpha$, then $NP \subseteq DTIME(n^{\log \log n})$.
Experiments:
- Minimizing maximum variance in GP regression
- Robust biological experimental design
- Outbreak detection against adversarial contaminations
Goals:
- Compare against the state of the art
- Analyze the appropriateness of the "worst-case" assumption
Spatial prediction
Compared to the state of the art [Sacks et al. '88, Wiens '05, ...]: highly tuned simulated annealing heuristics (7 parameters).
Saturate is competitive and faster, and better on larger problems.
[Figure: maximum marginal variance (lower is better) vs. number of sensors for Greedy, Simulated Annealing, and Saturate, on environmental monitoring and precipitation data.]
Maximum vs. average variance
Minimizing the worst-case variance also leads to a good average-case score, but not vice versa.
[Figure: maximum and average marginal variance vs. number of sensors on environmental monitoring and precipitation data, when optimizing the average (Greedy) vs. the maximum (Saturate); lower is better.]
Outbreak detection
Results are even more prominent on water network monitoring (12,527 nodes).
[Figure: detection time in minutes vs. number of sensors on water networks; lower is better. One panel compares maximum and average detection time (DT) for Saturate and Greedy; the other compares the maximum detection time of Greedy, Simulated Annealing, and Saturate.]
Robust experimental design
Learn the parameters θ of a nonlinear function: $y_i = f(x_i, \theta) + w$.
Choose stimuli $x_i$ to facilitate maximum-likelihood estimation of θ. A difficult optimization problem!
Common approach: linearization!

$y_i \approx f(x_i, \theta_0) + \nabla f_{\theta_0}(x_i)^T (\theta - \theta_0) + w$

This allows a nice closed-form (fractional) solution. But how should we choose $\theta_0$?
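To make the linearization step concrete: for the standard Michaelis-Menten model $f(x, \theta) = \theta_1 x / (\theta_2 + x)$ used later in the experiments, the gradient needed above is available in closed form. A sketch; the parameter values and stimulus are illustrative:

```python
import numpy as np

def f(x, theta):
    """Michaelis-Menten model: f(x, θ) = θ1 * x / (θ2 + x)."""
    t1, t2 = theta
    return t1 * x / (t2 + x)

def grad_f(x, theta):
    """Gradient ∇_θ f(x, θ): one row of the Jacobian used in the linearization."""
    t1, t2 = theta
    return np.array([x / (t2 + x), -t1 * x / (t2 + x) ** 2])

theta0 = np.array([1.0, 0.5])   # initial parameter estimate θ0 (illustrative)
theta = np.array([1.2, 0.4])    # "true" parameters (illustrative)
x = 2.0                          # a stimulus

exact = f(x, theta)
linear = f(x, theta0) + grad_f(x, theta0) @ (theta - theta0)
print(exact, linear)             # the linearization is accurate only near θ0
```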
Robust experimental design
State of the art [Flaherty et al., NIPS '06]:
- Assume a perturbation of the Jacobian $\nabla f_{\theta_0}(x_i)$
- Solve a robust SDP against the worst-case perturbation
- Minimize the maximum eigenvalue of the estimation error (E-optimality)
This paper:
- Assume a perturbation of the initial parameter estimate $\theta_0$
- Use Saturate to perform well against all initial parameter estimates
- Minimize the MSE of the parameter estimate (Bayesian A-optimality, typically submodular!)
Experimental setup:
- Estimate parameters of the Michaelis-Menten model (to compare results)
- Evaluate the efficiency of designs:

$\text{efficiency} \equiv \frac{\lambda_{max}[\mathrm{Cov}(\hat\theta \mid \theta_{true}, w_{opt}(\theta_{true}))]}{\lambda_{max}[\mathrm{Cov}(\hat\theta \mid \theta_{true}, w_\rho(\theta_0))]}$

i.e., the loss of the optimal design knowing the true parameter $\theta_{true}$, divided by the loss of the robust design assuming a (possibly wrong) initial parameter $\theta_0$.
Robust design results
Saturate is more efficient than the SDP approach when optimizing against high parameter uncertainty.
[Figure: efficiency (w.r.t. E-optimality) vs. the initial parameter estimate θ_{0,2}, under low and high uncertainty in θ_0, comparing the classical E-optimal design, the SDP designs, and Saturate; higher is better.]
Future (current) work:
- Incorporating complex constraints (communication, etc.)
- Dealing with large numbers of objectives: constraint generation
- Improved guarantees for certain objectives (sensor failures)
- Trading off worst-case and average-case scores
[Figure: adversarial score vs. expected score trade-off curves for k = 5, 10, 15, 20.]
Conclusions
- Many observation selection problems require optimizing an adversarially chosen submodular function:
  $A^* = \arg\max_{|A| \le k} \min_i F_i(A)$
- The problem is not approximable to any factor!
- Presented an efficient algorithm: Saturate
  - Achieves the optimal score, with a bounded increase in cost
  - The guarantees are best possible under reasonable complexity assumptions
- Saturate performs well on real-world problems
  - Outperforms state-of-the-art simulated annealing algorithms for sensor placement, with no parameters to tune
  - Compares favorably with SDP-based solutions for robust experimental design