Selecting Observations against Adversarial Objectives
Andreas Krause, Brendan McMahan, Carlos Guestrin, Anupam Gupta
Carnegie Mellon
Observation selection problems
Place sensors for building automation. Monitor rivers and lakes using robots. Detect contaminations in water networks.
Set V of possible observations (sensor locations, ...). We want to pick a subset $A^* \subseteq V$ such that

$A^* = \arg\max_{|A| \le k} F(A)$

For most interesting utilities F, this is NP-hard!
Key observation: Diminishing returns
[Figure: placement A = {S1, S2} vs. placement B = {S1, ..., S5}.]
Formalization: Submodularity
For $A \subseteq B$: $F(A \cup \{S'\}) - F(A) \ge F(B \cup \{S'\}) - F(B)$
Adding a new sensor S' to the small placement A helps a lot; adding S' to the large placement B doesn't help much.
Submodularity [with Guestrin, Singh, Leskovec, VanBriesen, Faloutsos, Glance]
We prove submodularity for:
- Mutual information F(A) = H(unobs) - H(unobs | A) [UAI '05, JMLR '07 (spatial prediction)]
- Outbreak detection F(A) = impact reduction from sensing A [KDD '07 (water monitoring, ...)]
Also submodular (see the sketch below):
- Geometric coverage F(A) = area covered
- Variance reduction F(A) = Var(Y) - Var(Y | A), ...
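To make the diminishing-returns property concrete, here is a minimal Python sketch (not from the talk) that checks the submodularity inequality for a toy coverage objective; the sensor coverage regions are made-up illustration data.

```python
# Toy coverage objective F(A) = number of distinct cells covered by A.
# The coverage regions below are made-up illustration data.
coverage = {
    "S1": {1, 2, 3},
    "S2": {3, 4},
    "S3": {4, 5, 6},
    "S_new": {2, 3, 4},   # the "new sensor" S'
}

def F(A):
    """Coverage objective: number of distinct cells covered by sensors in A."""
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)

A = {"S1"}                          # small placement
B = {"S1", "S2", "S3"}              # larger placement, A is a subset of B
gain_A = F(A | {"S_new"}) - F(A)    # marginal gain of S' given A
gain_B = F(B | {"S_new"}) - F(B)    # marginal gain of S' given B

print(gain_A, gain_B)               # 1 >= 0: S' helps the small set more
assert gain_A >= gain_B             # the submodularity inequality
```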
Why is submodularity useful?
Theorem [Nemhauser et al. '78]: The greedy algorithm gives a constant-factor approximation:
$F(A_{greedy}) \ge (1 - 1/e) \, F(A_{opt})$   (~63%)
- Can get online (data-dependent) bounds for any algorithm
- Can significantly speed up the greedy algorithm
- Can use MIP / branch & bound for the optimal solution
Greedy algorithm (forward selection): $s_{j+1} = \arg\max_{s \in V \setminus A_j} F(A_j \cup \{s\})$
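A minimal sketch of the greedy forward-selection rule above, with the objective F passed in as a black box (function names are mine):

```python
def greedy_select(V, F, k):
    """Greedy forward selection: repeatedly add the element s maximizing
    F(A | {s}), i.e. the largest marginal gain. For monotone submodular F
    this guarantees F(A) >= (1 - 1/e) * OPT [Nemhauser et al. '78].
    Assumes k <= |V|."""
    A = set()
    for _ in range(k):
        A.add(max((s for s in V if s not in A), key=lambda s: F(A | {s})))
    return A

# e.g., with the toy coverage objective from the previous sketch:
# greedy_select(set(coverage), F, k=2)
```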
Robust observation selection
What if ...
- the parameters θ of the model $P(X_V \mid \theta)$ are unknown or change?
- sensors fail?
- an adversary selects the outbreak scenario?
[Figure: sensor placement optimized for old parameters θ_old; under new parameters θ_new there is more variability in an uncovered region, and an adversary would attack there.]
Robust prediction
Typical objective: minimize the average variance (MSE). But this can yield low average variance with a high maximum, in the most interesting part!
[Figure: confidence bands for pH value over horizontal positions V.]
Instead: minimize the "width" of the confidence bands.
For every location $s \in V$, define $F_s(A) = \mathrm{Var}(s) - \mathrm{Var}(s \mid A)$.
Minimizing the "width" means simultaneously maximizing all $F_s(A)$.
Each $F_s(A)$ is (often) submodular! [Das & Kempe '07]
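As an illustration (not from the talk), here is a numpy sketch of the per-location objectives $F_s(A)$ in a GP with a squared-exponential kernel; the kernel, unit prior variance, and locations are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential covariance between 1-D location arrays X and Y."""
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def variance_reduction(V, A, noise=0.01):
    """F_s(A) = Var(s) - Var(s | A) for every location s in V, under a GP
    prior with unit prior variance (an illustrative modeling choice)."""
    if len(A) == 0:
        return np.zeros(len(V))
    K_AA = rbf_kernel(A, A) + noise * np.eye(len(A))
    K_VA = rbf_kernel(V, A)
    # Posterior variance: Var(s|A) = K_ss - K_sA K_AA^{-1} K_As
    post_var = 1.0 - np.einsum('ij,ij->i', K_VA @ np.linalg.inv(K_AA), K_VA)
    return 1.0 - post_var  # prior variance (= 1) minus posterior variance

V = np.linspace(-3, 3, 61)        # candidate locations
A = np.array([-2.0, 0.0, 2.0])    # a candidate placement
Fs = variance_reduction(V, A)
print(Fs.min(), Fs.max())         # worst-case vs. best-case reduction over V
```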
Adversarial observation selection
Given: possible observations V and submodular functions $F_1, \dots, F_m$. We want to solve

$A^* = \arg\max_{|A| \le k} \min_i F_i(A)$

Can model many problems this way:
- Width of confidence bands: $F_i$ is the variance reduction at location i (one $F_i$ for each location i)
- Unknown parameters: $F_i$ is the information gain under parameters $\theta_i$
- Adversarial outbreak scenarios: $F_i$ is the utility for scenario i
- ...
Unfortunately, $\min_i F_i(A)$ is not submodular!
How does greedy do? Consider this example (ε is a small positive constant):

Set A    F1   F2   min_i F_i
{x}      1    0    0
{y}      0    2    0
{z}      ε    ε    ε
{x,y}    1    2    1
{x,z}    1    ε    ε
{y,z}    ε    2    ε

The optimal solution for k = 2 is {x,y}. But greedy picks z first (the only singleton with positive minimum), and can then choose only x or y, ending with value ε. Greedy does arbitrarily badly. Is there something better? (The sketch below reproduces this failure.)

Theorem: The problem $\max_{|A| \le k} \min_i F_i(A)$ does not admit any approximation unless P = NP.
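The failure mode can be reproduced directly; a small sketch encoding the table above as dictionaries and running greedy on the non-submodular objective min(F1, F2):

```python
EPS = 1e-6  # the small constant ε from the table

F1 = {frozenset(''): 0, frozenset('x'): 1, frozenset('y'): 0, frozenset('z'): EPS,
      frozenset('xy'): 1, frozenset('xz'): 1, frozenset('yz'): EPS, frozenset('xyz'): 1}
F2 = {frozenset(''): 0, frozenset('x'): 0, frozenset('y'): 2, frozenset('z'): EPS,
      frozenset('xy'): 2, frozenset('xz'): EPS, frozenset('yz'): 2, frozenset('xyz'): 2}

def min_F(A):
    """The adversarial objective min_i F_i(A)."""
    return min(F1[frozenset(A)], F2[frozenset(A)])

A = set()
for _ in range(2):  # budget k = 2
    A.add(max((s for s in 'xyz' if s not in A), key=lambda s: min_F(A | {s})))

print(sorted(A), min_F(A))  # ['x', 'z'] with value ε; the optimum {x,y} achieves 1
```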
Alternative formulation
If somebody told us the optimal value

$c^* = \max_{|A| \le k} \min_i F_i(A),$

could we recover the optimal solution A*? We would need to solve the dual problem

$A^* = \arg\min_A |A|$ such that $\min_i F_i(A) \ge c^*$

Is this any easier? Yes, if we relax the constraint $|A| \le k$.
Solving the alternative problem
Trick: For each $F_i$ and c, define the truncation

$F'_i(A) = \min\{F_i(A), c\}$

[Figure: $F_i(A)$ and its truncation $F'_i(A)$ as functions of |A|.]

and average the truncations:

$F'_{avg,c}(A) = \frac{1}{m} \sum_i F'_i(A)$

Key property: $\min_i F_i(A) \ge c \iff F'_{avg,c}(A) = c$

Lemma: $F'_{avg,c}(A)$ is submodular!

Example (c = 1):
Set A    F1   F2   F'1   F'2   F'avg,1   min_i F_i
{x}      1    0    1     0     1/2       0
{y}      0    2    0     1     1/2       0
{z}      ε    ε    ε     ε     ε         ε
{x,y}    1    2    1     1     1         1
{x,z}    1    ε    1     ε     (1+ε)/2   ε
{y,z}    ε    2    ε     1     (1+ε)/2   ε
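In code, the truncation trick is essentially one line; a sketch (the function name is mine):

```python
def truncated_avg(Fs, c):
    """F'_avg,c(A) = (1/m) * sum_i min(F_i(A), c) for a list Fs of set functions.
    By the key property above: min_i F_i(A) >= c  <=>  F'_avg,c(A) = c,
    and F'_avg,c stays submodular whenever every F_i is."""
    return lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)
```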
Why is this useful? We can use the greedy algorithm to find an (approximate) solution to the relaxed covering problem!

Proposition: The greedy algorithm finds $A_G$ with $F'_{avg,c}(A_G) = c$ and $|A_G| \le \alpha k$,
where $\alpha = 1 + \log \max_s \sum_i F_i(\{s\})$.
Back to our example
Guess c = 1. Running greedy on $F'_{avg,1}$ first picks x, then picks y: the optimal solution!

Set A    F1   F2   min_i F_i   F'avg,1
{x}      1    0    0           1/2
{y}      0    2    0           1/2
{z}      ε    ε    ε           ε
{x,y}    1    2    1           1
{x,z}    1    ε    ε           (1+ε)/2
{y,z}    ε    2    ε           (1+ε)/2

How do we find c?
Submodular Saturation Algorithm ("Saturate")
Given: set V, integer k and functions $F_1, \dots, F_m$.
Initialize $c_{min} = 0$, $c_{max} = \min_i F_i(V)$, and do binary search with $c = (c_{min} + c_{max})/2$:
- Use the greedy algorithm to find $A_G$ such that $F'_{avg,c}(A_G) = c$
- If $|A_G| > \alpha k$: c is too high, so decrease $c_{max}$
- If $|A_G| \le \alpha k$: c is too low, so increase $c_{min}$ (and keep $A_G$)
Repeat until convergence. A runnable sketch follows below.
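Putting the pieces together, a hedged sketch of Saturate: binary search over c with greedy partial cover on the truncated average. The tolerance, the tie-breaking, and the default alpha = 1 are my illustrative choices; with alpha set as in the proposition, the paper's guarantee applies.

```python
def saturate(V, Fs, k, alpha=1.0, tol=1e-4):
    """Sketch of Saturate: binary search over the value c; for each c, greedily
    cover with the truncated average F'_avg,c; keep the best feasible set."""
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    A_best = set()
    while c_max - c_min > tol:
        c = (c_min + c_max) / 2.0
        F_bar = lambda A, c=c: sum(min(F(A), c) for F in Fs) / len(Fs)  # F'_avg,c
        # Greedy partial cover: add elements until F'_avg,c(A) saturates at c.
        A = set()
        while F_bar(A) < c - 1e-9 and len(A) < len(V):
            A.add(max((s for s in V if s not in A), key=lambda s: F_bar(A | {s})))
        if len(A) > alpha * k:
            c_max = c                 # cover too expensive: c too high
        else:
            c_min, A_best = c, A      # c achievable within budget: raise it
    return A_best

# e.g., with the dictionaries F1, F2 from the counterexample sketch above:
# saturate(set('xyz'), [lambda A: F1[frozenset(A)], lambda A: F2[frozenset(A)]], k=2)
# returns {'x', 'y'}, the optimal solution.
```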
Theoretical guarantees
Theorem: The problem $\max_{|A| \le k} \min_i F_i(A)$ does not admit any approximation unless P = NP.

Theorem: Saturate finds a solution $A_S$ such that
$\min_i F_i(A_S) \ge OPT_k$ and $|A_S| \le \alpha k$,
where $OPT_k = \max_{|A| \le k} \min_i F_i(A)$ and $\alpha = 1 + \log \max_s \sum_i F_i(\{s\})$.

Theorem: If there were a polytime algorithm with a better constant $\beta < \alpha$, then $NP \subseteq DTIME(n^{\log \log n})$.
Experiments:
- Minimizing maximum variance in GP regression
- Robust biological experimental design
- Outbreak detection against adversarial contaminations
Goals:
- Compare against the state of the art
- Analyze the appropriateness of the "worst-case" assumption
Spatial prediction
Compared to the state of the art [Sacks et al. '88, Wiens '05, ...]: highly tuned simulated annealing heuristics (7 parameters).
Saturate is competitive and faster, and better on larger problems.
[Figure: maximum marginal variance (lower is better) vs. number of sensors for Greedy, Simulated Annealing, and Saturate, on environmental monitoring and precipitation data.]
Maximum vs. average variance
Minimizing the worst-case variance also leads to a good average-case score, but not vice versa.
[Figure: maximum and average marginal variance vs. number of sensors on environmental monitoring and precipitation data, when optimizing the average (Greedy) vs. the maximum (Saturate); lower is better.]
Outbreak detection
Results are even more prominent on water network monitoring (12,527 nodes).
[Figure: detection time in minutes vs. number of sensors on water networks; lower is better. One panel compares maximum and average detection time (DT) for Saturate and Greedy; the other compares the maximum detection time of Greedy, Simulated Annealing, and Saturate.]
Robust experimental design
Learn the parameters θ of a nonlinear function: $y_i = f(x_i, \theta) + w$.
Choose stimuli $x_i$ to facilitate maximum-likelihood estimation of θ. A difficult optimization problem!
Common approach: linearization!

$y_i \approx f(x_i, \theta_0) + \nabla f_{\theta_0}(x_i)^T (\theta - \theta_0) + w$

This allows a nice closed-form (fractional) solution. But how should we choose $\theta_0$?
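To make the linearization step concrete: for the standard Michaelis-Menten model $f(x, \theta) = \theta_1 x / (\theta_2 + x)$ used later in the experiments, the gradient needed above is available in closed form. A sketch; the parameter values and stimulus are illustrative:

```python
import numpy as np

def f(x, theta):
    """Michaelis-Menten model: f(x, θ) = θ1 * x / (θ2 + x)."""
    t1, t2 = theta
    return t1 * x / (t2 + x)

def grad_f(x, theta):
    """Gradient ∇_θ f(x, θ): one row of the Jacobian used in the linearization."""
    t1, t2 = theta
    return np.array([x / (t2 + x), -t1 * x / (t2 + x) ** 2])

theta0 = np.array([1.0, 0.5])   # initial parameter estimate θ0 (illustrative)
theta = np.array([1.2, 0.4])    # "true" parameters (illustrative)
x = 2.0                          # a stimulus

exact = f(x, theta)
linear = f(x, theta0) + grad_f(x, theta0) @ (theta - theta0)
print(exact, linear)             # the linearization is accurate only near θ0
```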
Robust experimental design
State of the art [Flaherty et al., NIPS '06]:
- Assume a perturbation of the Jacobian $\nabla f_{\theta_0}(x_i)$
- Solve a robust SDP against the worst-case perturbation
- Minimize the maximum eigenvalue of the estimation error (E-optimality)
This paper:
- Assume a perturbation of the initial parameter estimate $\theta_0$
- Use Saturate to perform well against all initial parameter estimates
- Minimize the MSE of the parameter estimate (Bayesian A-optimality, typically submodular!)
Experimental setup:
- Estimate parameters of the Michaelis-Menten model (to compare results)
- Evaluate the efficiency of designs:

$\text{efficiency} \equiv \frac{\lambda_{max}[\mathrm{Cov}(\hat\theta \mid \theta_{true}, w_{opt}(\theta_{true}))]}{\lambda_{max}[\mathrm{Cov}(\hat\theta \mid \theta_{true}, w_\rho(\theta_0))]}$

i.e., the loss of the optimal design knowing the true parameter $\theta_{true}$, divided by the loss of the robust design assuming a (possibly wrong) initial parameter $\theta_0$.
Robust design results
Saturate is more efficient than the SDP approach when optimizing against high parameter uncertainty.
[Figure: efficiency (w.r.t. E-optimality) vs. the initial parameter estimate θ_{0,2}, under low and high uncertainty in θ_0, comparing the classical E-optimal design, the SDP designs, and Saturate; higher is better.]
Future (current) work:
- Incorporating complex constraints (communication, etc.)
- Dealing with large numbers of objectives: constraint generation
- Improved guarantees for certain objectives (sensor failures)
- Trading off worst-case and average-case scores
[Figure: adversarial score vs. expected score trade-off curves for k = 5, 10, 15, 20.]
Conclusions
- Many observation selection problems require optimizing an adversarially chosen submodular function:
  $A^* = \arg\max_{|A| \le k} \min_i F_i(A)$
- The problem is not approximable to any factor!
- Presented an efficient algorithm: Saturate
  - Achieves the optimal score, with a bounded increase in cost
  - The guarantees are best possible under reasonable complexity assumptions
- Saturate performs well on real-world problems
  - Outperforms state-of-the-art simulated annealing algorithms for sensor placement, with no parameters to tune
  - Compares favorably with SDP-based solutions for robust experimental design