Multistage Stochastic Optimization. Shabbir Ahmed, Georgia Tech. IMA 2016.


Page 1

Multistage Stochastic Optimization

Shabbir Ahmed, Georgia Tech

IMA 2016

Page 2

Outline
•  Setup
  –  Deterministic problem
  –  Uncertainty model
  –  Dynamics
•  Formulations
  –  Extensive formulation
  –  Scenario formulation
  –  Dynamic Programming formulation
•  Algorithms
  –  Rolling horizon heuristic
  –  Scenario decomposition
  –  Stagewise decomposition

Page 3

Multistage Optimization

•  Canonical deterministic problem:

\min_{x,y} \Big\{ \sum_{t=1}^{T} f_t(x_t, y_t) : (x_{t-1}, x_t, y_t) \in X_t \ \forall t \Big\}

•  x_t = state variables; y_t = local/stage variables
•  Linear objective and constraints: X_t is described by A_t x_{t-1} + B_t x_t + C_t y_t \ge b_t
•  Bounded and feasible

Page 4

Example: Hydro Power Planning

How much hydro power to generate in each period to satisfy demand?

Page 5

Example: Hydro Power Planning

How much hydro power to generate in each period to satisfy demand?

[Figure: reservoir diagram with labels d, u, v, h, p, q, \alpha p]

\min \sum_t (b_t q_t + c_t u_t + g_t v_t)
s.t.  h_t = h_{t-1} + \xi_t - p_t + u_t - v_t \quad \forall t
      \alpha p_t + q_t = d_t \quad \forall t
      0 \le h_t \le h^{\max} \quad \forall t
      p_t, q_t, u_t, v_t \ge 0 \quad \forall t.
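As a concrete instance of the model above, the following sketch solves a small deterministic version with SciPy. The horizon length, costs, inflows, demands, and reservoir limits are made-up illustration values, not data from the slides:

```python
import numpy as np
from scipy.optimize import linprog

T = 3
alpha, h0, hmax = 1.0, 5.0, 10.0
b, c, g = [1.0] * T, [10.0] * T, [1.0] * T   # costs of q, u, v
d  = [1.0] * T                               # demand per period
xi = [0.0] * T                               # inflows per period

nv = 5 * T                                   # per period: p, q, u, v, h
def idx(t, k): return 5 * t + k              # k: 0=p, 1=q, 2=u, 3=v, 4=h

cost = np.zeros(nv)
for t in range(T):
    cost[idx(t, 1)], cost[idx(t, 2)], cost[idx(t, 3)] = b[t], c[t], g[t]

A_eq, b_eq = [], []
for t in range(T):
    # water balance: h_t - h_{t-1} + p_t - u_t + v_t = xi_t  (h_{-1} = h0)
    row = np.zeros(nv)
    row[idx(t, 4)] = 1.0; row[idx(t, 0)] = 1.0
    row[idx(t, 2)] = -1.0; row[idx(t, 3)] = 1.0
    rhs = xi[t]
    if t == 0:
        rhs += h0
    else:
        row[idx(t - 1, 4)] = -1.0
    A_eq.append(row); b_eq.append(rhs)
    # demand: alpha*p_t + q_t = d_t
    row = np.zeros(nv)
    row[idx(t, 0)] = alpha; row[idx(t, 1)] = 1.0
    A_eq.append(row); b_eq.append(d[t])

bounds = [(0, None)] * nv
for t in range(T):
    bounds[idx(t, 4)] = (0, hmax)            # reservoir level bounds

res = linprog(cost, A_eq=np.array(A_eq), b_eq=b_eq, bounds=bounds)
print(res.fun)
```

With these numbers the reservoir alone covers demand, so the optimum avoids all thermal, purchase, and spill costs.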

Page 6

Example: Hydro Power Planning

How much hydro power to generate in each period to satisfy demand?

(Same model and diagram as Page 5.) In the canonical notation:

x_t = (h_t), \qquad y_t = (p_t, q_t, u_t, v_t)

Page 7

Multistage Stochastic Optimization

•  Stochastic data: \{\xi_t = (f_t, X_t)\}_{t=1}^{T}
•  Dynamics:

\cdots (x_{t-1}, y_{t-1}) \xrightarrow{\ \xi_t\ } (x_t, y_t) \cdots

\min_{x,y} \Big\{ \sum_{t=1}^{T} f_t(x_t, y_t) : (x_{t-1}, x_t, y_t) \in X_t \ \forall t \Big\}

Page 8

Formulation

•  Optimize expected cost
•  The expectation at stage t is w.r.t. the conditional distribution \mathbb{E}_{\xi_{[t,T]}}[\,\cdot \mid \xi_{[1,t-1]}]
•  Natural extension of the two-stage formulation
•  Decisions in stage t depend on the history up to stage t (called a policy)
•  Assume "complete recourse": for any value of the state variables and any data realization, there exists a feasible local/stage solution.

\min_{(x_1,y_1)\in X_1} \Big\{ f_1(x_1,y_1) + \mathbb{E}\Big[ \min_{(x_2,y_2)\in X_2(x_1)} \Big\{ f_2(x_2,y_2) + \cdots + \mathbb{E}\Big[ \min_{(x_T,y_T)\in X_T(x_{T-1})} f_T(x_T,y_T) \Big] \Big\} \Big] \Big\}

Page 9

Dynamics

•  The number of "stages" depends on the decision dynamics, not on the number of periods
•  Consider the following dynamics, where all of (x_1, \dots, x_T) are chosen up front and the y_t are chosen sequentially as the \xi_t are revealed:

(x_1, \dots, x_T) \xrightarrow{\ \xi_1\ } (y_1) \cdots (y_{t-1}) \xrightarrow{\ \xi_t\ } (y_t) \cdots

•  Equivalent to:

(x_1, \dots, x_T) \xrightarrow{\ (\xi_1, \dots, \xi_T)\ } (y_1, \dots, y_T)

This is a two-stage (multiperiod) model.

Page 10

Applications

•  Energy
•  Natural resources
•  Healthcare
•  Logistics
•  Telecommunications
•  Finance
•  …

Page 11

Scenario Tree

•  Dynamic uncertainty modeling: scenario tree
•  Explicit construction/discretization
•  Monte Carlo sampling
•  Markov chain (implicit)

Page 12

Formulations

•  Extensive form
•  Scenario formulation
•  Dynamic programming formulation

Page 13

Extensive Form

\min_{x_n, y_n} \Big\{ \sum_{n \in \mathcal{T}} p_n f_n(x_n, y_n) : (x_{a(n)}, x_n, y_n) \in X_n \ \forall n \in \mathcal{T} \Big\}.

[Figure: scenario tree with nodes 1–7]

Page 14

Scenario Formulation

\min_{x,y} \sum_s p_s \Big( \sum_t f_{s,t}(x_{s,t}, y_{s,t}) \Big)
s.t.  (x_{s,t-1}, x_{s,t}, y_{s,t}) \in X_{s,t} \quad \forall s, t
      x_{s,t} = x_{s',t} \quad \forall (s,s') \in N_t, \ \forall t \qquad \text{(non-anticipativity constraints)}

[Figure: scenario tree with nodes 1–7 and scenarios s = 1, \dots, 4]
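For the tree shown, with scenarios taken as root-to-leaf paths, the non-anticipativity pairs N_t can be generated mechanically from the node paths. A small sketch; the node numbering of the paths is an assumption about the figure's layout:

```python
# scenario -> sequence of visited nodes; stage t = position in the path
paths = {1: [1, 2, 4], 2: [1, 2, 5], 3: [1, 3, 6], 4: [1, 3, 7]}

def nonanticipativity_pairs(paths):
    """Pairs (s, s') whose state variables must agree at stage t,
    i.e. scenarios that share the same node at stage t."""
    T = len(next(iter(paths.values())))
    pairs = {}
    for t in range(T):
        groups = {}
        for s, path in paths.items():
            groups.setdefault(path[t], []).append(s)
        # chain the equalities within each group of scenarios
        pairs[t + 1] = [(a, b) for grp in groups.values()
                        for a, b in zip(grp, grp[1:])]
    return pairs

pairs = nonanticipativity_pairs(paths)
print(pairs)
```

At stage 1 all scenarios share the root, at stage 2 they split into two groups, and at stage 3 each scenario is alone, so no constraints remain.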

Page 15

Dynamic Programming

•  Formulation:

\min_{x_1, y_1} \big\{ f_1(x_1, y_1) + \mathcal{Q}_1(x_1) : (x_{a(1)}, x_1, y_1) \in X_1 \big\}

•  Cost-to-go function (for m \in C(n), the children of node n):

Q_m(x_n) = \min_{x_m, y_m} \big\{ f_m(x_m, y_m) + \mathcal{Q}_m(x_m) : (x_n, x_m, y_m) \in X_m \big\}.

•  Expected cost-to-go (ECTG) function:

\mathcal{Q}_n(x_n) := \sum_{m \in C(n)} q_{nm} Q_m(x_n)
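The recursion above can be evaluated exactly on a toy tree with binary state decisions. Everything below (tree shape, per-node costs, switching penalty, root cost) is an illustrative assumption, not data from the slides:

```python
# Toy tree: children C(n) and conditional probabilities q_nm
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
q = {(1, 2): 0.4, (1, 3): 0.6, (2, 4): 0.5, (2, 5): 0.5,
     (3, 6): 1 / 3, (3, 7): 2 / 3}
c = {2: 1.0, 3: 3.0, 4: 2.0, 5: 0.0, 6: 1.0, 7: 4.0}  # cost of choosing x_m = 1
X = (0, 1)                                            # binary state space

def f(n, x_prev, x):
    # node cost: switching penalty plus per-node cost of x = 1
    return abs(x - x_prev) + c[n] * x

def ectg(n, x):
    """Expected cost-to-go at node n given its own state x."""
    return sum(q[n, m] * Q(m, x) for m in children.get(n, []))

def Q(m, x_prev):
    """Cost-to-go of child node m given the parent's state x_prev."""
    return min(f(m, x_prev, x) + ectg(m, x) for x in X)

# Root: pay 2 if x_1 = 0, plus the expected cost-to-go
root_val = min(2.0 * (1 - x) + ectg(1, x) for x in X)
print(root_val)   # -> 1.0 (choose x_1 = 1)
```

Working backward by hand confirms the value: the leaves give Q = 0 or 1 depending on the incoming state, the stage-2 ECTGs are 0 at x = 0 and 0.5 resp. 1 at x = 1, so the root prefers x_1 = 1 with total cost 1.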

Page 16

Dynamic Programming

(Formulation, cost-to-go, and ECTG functions as on Page 15.)

Conditional probability: q_{nm} = \dfrac{p_m}{\sum_{m' \in C(n)} p_{m'}}
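A sketch computing q_{nm} from node probabilities, using the values from the Exercise tree on the next slide and assuming the listed probabilities map to nodes 1–7 in order (an assumption about the figure):

```python
p = {1: 1.0, 2: 0.4, 3: 0.6, 4: 0.2, 5: 0.2, 6: 0.2, 7: 0.4}
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}

def cond_prob(n, m):
    # q_nm = p_m / sum of p over the children of n
    return p[m] / sum(p[mp] for mp in children[n])

q = {(n, m): cond_prob(n, m) for n in children for m in children[n]}
print(q)
```

By construction the conditional probabilities out of each node sum to one.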

Page 17

Exercise

\min \sum_t (b_t q_t + c_t u_t + g_t v_t)
s.t.  h_t = h_{t-1} + \xi_t - p_t + u_t - v_t \quad \forall t
      \alpha p_t + q_t = d_t \quad \forall t
      0 \le h_t \le h^{\max} \quad \forall t
      p_t, q_t, u_t, v_t \ge 0 \quad \forall t.

[Figures: reservoir diagram with labels d, u, v, h, p, q, \alpha p; scenario tree with nodes 1–7 and probabilities 1.0, 0.4, 0.6, 0.2, 0.2, 0.2, 0.4]

Suppose the inflows are stochastic and follow the scenario tree shown (with associated probabilities). Write down:
•  Extensive form
•  Scenario formulation
•  DP formulation
•  Two-stage formulation (assuming water levels are decided here and now)

Page 18

Solution Approaches

The extensive form is usually too large to be solved as one monolithic problem, hence the need for decomposition approaches.

•  Two-stage approximation + rolling horizon
•  Scenario decomposition:
  –  Progressive Hedging
•  Stagewise decomposition:
  –  Stochastic Dual Dynamic Programming
  –  Nested Benders

Page 19

Two-Stage Approximations

•  Relaxation: first stage = first period; second stage = scenario variables; decompose by scenario subproblems:

\min \Big\{ f_1(x_1, y_1) + \sum_s p_s \Big( \sum_{t=2}^{T} f_{s,t}(x_{s,t}, y_{s,t}) \Big) \Big\}

•  Restriction: first stage = x variables; second stage = y variables; decompose by node; yields a feasible solution:

\min_{(x_1, \dots, x_T)} \Big\{ \sum_{n \in \mathcal{T}} p_n \min_{y_n \in X_n(x)} f_n(x_{t_n}, y_n) \Big\}

[Figure: scenario tree with nodes 1–7]

Page 20

Rolling Horizon Heuristic

for all t = 1, …, T:
  for all n ∈ S_t:
    - solve the two-stage approximation of the multistage problem over
      subtree T_n with initial conditions x_{a(n)}
    - let {x*_m, y*_m}_{m ∈ T_n} be an optimal solution
    - set x̄_n = x*_n, ȳ_n = y*_n (root of T_n)
  end-for
end-for
return feasible solution (x̄_n, ȳ_n) for all n ∈ T

[Figure: scenario tree highlighting stage S_t and subtree T_n]

Page 21

Progressive Hedging

Page 22

Progressive Hedging

\min_{x,y} \sum_s p_s \Big( \sum_t f_{s,t}(x_{s,t}, y_{s,t}) \Big)
s.t.  (x_{s,t-1}, x_{s,t}, y_{s,t}) \in X_{s,t} \quad \forall s, t
      x_{s,t} = x_{s',t} \quad \forall (s,s') \in N_t, \ \forall t

[Figure: scenario tree with nodes 1–7 and scenarios s = 1, \dots, 4]

PH = ADMM (alternating direction method of multipliers) applied to the scenario formulation.

Page 23

Progressive Hedging

Introduce nodal variables z_n:

\min_{x,y,z} \sum_s p_s \Big( \sum_t f_{s,t}(x_{s,t}, y_{s,t}) \Big)
s.t.  (x_{s,t-1}, x_{s,t}, y_{s,t}) \in X_{s,t} \quad \forall s, t
      x_{s,t} = z_n \quad \forall n \in S_t, \ \forall s \in n, \ \forall t

(here s \in n means scenario s passes through node n)

[Figure: scenario tree with nodes 1–7 and scenarios s = 1, \dots, 4]

Page 24

Progressive Hedging

Augmented Lagrangian:

\min_{x,y,z} \sum_s p_s \Big( \sum_t f_{s,t}(x_{s,t}, y_{s,t}) + \lambda_{s,t}^{\top}(x_{s,t} - z_n) + \frac{\rho}{2}\,\|x_{s,t} - z_n\|^2 \Big)
s.t.  (x_{s,t-1}, x_{s,t}, y_{s,t}) \in X_{s,t} \quad \forall s, t

Dual feasibility: \sum_{s \in n} \Big( \frac{p_s}{\sum_{s' \in n} p_{s'}} \Big) \lambda_{s,t} = 0

[Figure: scenario tree with nodes 1–7 and scenarios s = 1, \dots, 4]

Page 25

Progressive Hedging

1. Fix z^i and \lambda^i and solve for (x^{i+1}, y^{i+1}): for each s solve

\min_{x,y} \sum_t \Big( f_{s,t}(x_{s,t}, y_{s,t}) + (\lambda^i_{s,t})^{\top}(x_{s,t} - z^i_n) + \frac{\rho}{2}\,\|x_{s,t} - z^i_n\|^2 \Big)
s.t.  (x_{s,t-1}, x_{s,t}, y_{s,t}) \in X_{s,t} \quad \forall t.

Let (x^{i+1}_{s,t}, y^{i+1}_{s,t}) be the corresponding optimal solutions.

2. Solve for z^{i+1}: set

z^{i+1}_n = \sum_{s \in n} \Big( \frac{p_s}{\sum_{s' \in n} p_{s'}} \Big) x^{i+1}_{s,t}

3. Update \lambda: set

\lambda^{i+1}_{s,t} = \lambda^i_{s,t} + \rho\,(x^{i+1}_{s,t} - z^{i+1}_n)
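The three steps above can be run end to end on a toy problem where step 1 has a closed form. Assume a single non-anticipative first-stage variable and scenario objectives f_s(x) = (x - a_s)^2 with made-up probabilities and targets; PH should then drive z to the probability-weighted mean of the a_s:

```python
# Toy PH: scenarios s with f_s(x) = (x - a_s)^2, one nodal variable z.
p = [0.4, 0.6]          # scenario probabilities (illustrative)
a = [1.0, 4.0]          # scenario targets (illustrative)
rho = 1.0

z = 0.0
lam = [0.0, 0.0]
for _ in range(60):
    # Step 1: scenario subproblems; the minimizer of
    # (x - a_s)^2 + lam_s*(x - z) + rho/2*(x - z)^2 is closed-form:
    x = [(2 * a[s] - lam[s] + rho * z) / (2 + rho) for s in range(2)]
    # Step 2: nodal variable = probability-weighted average
    z = sum(p[s] * x[s] for s in range(2))
    # Step 3: multiplier update
    lam = [lam[s] + rho * (x[s] - z) for s in range(2)]

print(z)   # -> close to 0.4*1 + 0.6*4 = 2.8
```

Since the weighted multipliers stay at zero, the z iterates contract geometrically toward the weighted mean, which is exactly the minimizer of the expected cost here.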

Page 26

PH Convergence

Theorem (Rockafellar and Wets, 1991): Under mild assumptions, the sequence of primal and dual solutions converges to an optimal solution of the scenario formulation.

At every iteration, the values of the nodal variables z constitute a feasible state vector; if the corresponding local variables are computed, we obtain a feasible solution.

Page 27

Stochastic Dual Dynamic Programming

Page 28

Dynamic Programming

(Recap of Page 15.)

•  Formulation: \min_{x_1,y_1} \{ f_1(x_1,y_1) + \mathcal{Q}_1(x_1) : (x_{a(1)}, x_1, y_1) \in X_1 \}
•  Cost-to-go: Q_m(x_n) = \min_{x_m,y_m} \{ f_m(x_m,y_m) + \mathcal{Q}_m(x_m) : (x_n, x_m, y_m) \in X_m \}
•  ECTG: \mathcal{Q}_n(x_n) := \sum_{m \in C(n)} q_{nm} Q_m(x_n)

Page 29

Convexity

Proposition: The ECTG function at every node is polyhedral, i.e. piecewise linear and convex.

Proof (backward induction): Suppose for all m \in S_t we have that \mathcal{Q}_m(x_m) is polyhedral. Consider n \in S_{t-1}. Then for all m \in C(n):

Q_m(x_n) = \min_{x_m, y_m} \{ f_m(x_m, y_m) + \mathcal{Q}_m(x_m) : A_m x_n + B_m x_m + C_m y_m \ge b_m \}
         = \min_{x_m, y_m} \{ f'_m(x_m, y_m) : B_m x_m + C_m y_m \ge b_m - A_m x_n \}
         = \max_{\pi \ge 0} \Big[ \min_{x_m, y_m} \{ f'_m(x_m, y_m) - \pi^{\top}(B_m x_m + C_m y_m) \} + \pi^{\top}(b_m - A_m x_n) \Big]
         = \max_{\pi \ge 0} \big[ v_m(\pi) - \pi^{\top} A_m x_n \big]

where the third equality follows from strong duality of linear programs and the induction hypothesis. Thus Q_m is convex; since by LP duality there are finitely many basic dual solutions, it is piecewise linear, so Q_m is polyhedral. Since

\mathcal{Q}_n(x_n) = \sum_{m \in C(n)} q_{nm} Q_m(x_n)

the claim follows. ∎





Page 34

Stage-wise Independence

•  Stage t has N_t independent realizations
•  "Recombining" scenario tree
•  One expected cost-to-go function per stage: \mathcal{Q}_n(\cdot) \equiv \mathcal{Q}_t(\cdot) \ \forall n \in S_t
•  \mathcal{Q}_t(\cdot) implicitly encodes the solution (policy): the action under the j-th realization in stage t, given previous action x_{t-1}, is

(x_t, y_t) = \arg\min_{x,y} \{ f_{j,t}(x, y) + \mathcal{Q}_t(x) : (x_{t-1}, x, y) \in X_{j,t} \}

Pages 35–55

Illustration of SDDP

[Animation frames: piecewise-linear approximations of the expected cost-to-go functions Q1(x1), Q2(x2), Q3(x3) are refined over iterations 1–4, alternating forward passes (sampling candidate states x1, x2, x3) and backward passes (adding a Benders' cut to each approximation).]

Page 56

Details

Forward problem at node (j, t), iteration i:

Q^i_{j,t}(x^i_{t-1}) := \min_{x,y,\theta} \ f_{j,t}(x, y) + \theta
s.t.  B_{j,t}\, x + C_{j,t}\, y \ge b_{j,t} - A_{j,t}\, x^i_{t-1} \qquad (\rho_{j,t})
      \theta \ge \frac{1}{N} \Big[ \sum_{j'} v^{\ell}_{j',t+1} + (\pi^{\ell}_{j',t+1})^{\top} x \Big] \quad \forall \ell = 1, \dots, i-1

Solve to get x^i_t.

[Figure: stages t-1, t, t+1 with realizations j = 1, …, N per stage]

Page 57

Details

Backward pass: for all j in stage t+1 solve

Q^{i+1}_{j,t+1}(x^i_t) := \min_{x,y,\theta} \ f_{j,t+1}(x, y) + \theta
s.t.  B_{j,t+1}\, x + C_{j,t+1}\, y \ge b_{j,t+1} - A_{j,t+1}\, x^i_t \qquad (\rho_{j,t+1})
      \theta \ge \frac{1}{N} \Big[ \sum_{j'} v^{\ell}_{j',t+2} + (\pi^{\ell}_{j',t+2})^{\top} x \Big] \quad \forall \ell = 1, \dots, i

Get the dual optimal solution \rho_{j,t+1}.

[Figure: stages t-1, t, t+1 with realizations j = 1, …, N per stage]

Page 58

Details

Backward pass (contd.): from the stage-(t+1) subproblems and dual solutions \rho_{j,t+1}, compute the cut coefficients:

v^i_{j,t+1} = Q^{i+1}_{j,t+1}(x^i_t) + \rho^{\top}_{j,t+1} A_{j,t+1}\, x^i_t
(\pi^i_{j,t+1})^{\top} = -\rho^{\top}_{j,t+1} A_{j,t+1}
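For a scalar toy subproblem the cut algebra above can be checked numerically. Below, Q(x̄) = min{ y : y ≥ b - A·x̄, y ≥ 0 } = max(b - A·x̄, 0); for this tiny LP the optimal dual ρ of the coupling constraint is 1 when the constraint is active and 0 otherwise, so we write it down by inspection rather than read it from a solver. All numbers are illustrative assumptions:

```python
from scipy.optimize import linprog

A, b = 1.0, 3.0

def Q(xbar):
    # Q(xbar) = min { y : y >= b - A*xbar, y >= 0 }
    res = linprog(c=[1.0], A_ub=[[-1.0]], b_ub=[-(b - A * xbar)],
                  bounds=[(0, None)])
    return res.fun

xbar = 1.0
rho = 1.0 if b - A * xbar > 0 else 0.0   # dual of the coupling constraint

# Cut coefficients as on the slide: Q(x) >= v + pi * x
v  = Q(xbar) + rho * A * xbar
pi = -rho * A

for x in [0.0, 1.0, 2.0, 4.0]:
    print(x, Q(x), v + pi * x)
```

The cut is valid everywhere (it never exceeds Q) and tight at x̄, which is exactly the "valid" and "tight" conditions listed on Page 64.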

Page 59

Details

Backward pass (contd.): add the cut to all nodes in stage t:

\theta \ge \frac{1}{N} \Big[ \sum_{j'} v^i_{j',t+1} + (\pi^i_{j',t+1})^{\top} x \Big]

Page 60

SDDP Bounds

•  Stochastic algorithm
•  The forward pass generates candidate solutions along sample paths; the average of their objective values provides a statistical upper bound
•  The optimal value of the stage-1 problem (with the lower approximation of the ECTG function) provides a deterministic lower bound
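A sketch of the statistical upper bound computation: average the sampled forward-pass path costs and add a normal-approximation confidence margin. The sampled costs below are made-up illustrative numbers, not from the slides:

```python
import math

path_costs = [10.2, 9.8, 11.5, 10.9, 10.1, 9.6, 10.7, 11.1]  # forward-pass objectives
M = len(path_costs)
mean = sum(path_costs) / M
var = sum((c - mean) ** 2 for c in path_costs) / (M - 1)      # sample variance
upper = mean + 1.96 * math.sqrt(var / M)                      # ~95% statistical upper bound

print(mean, upper)
```

The gap between this statistical upper bound and the deterministic stage-1 lower bound is the usual SDDP stopping criterion.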

Page 61

Stochastic Dual Dynamic Programming

Page 62

SDDP Success

Dr. Pereira developed the well-known stochastic dual DP methodology, used by the National System Operator for the dispatch of Brazil's 95 GW hydrothermal generation system, and by the Wholesale Energy Market for the calculation of spot prices.

[Photo: Itaipu Dam (14 GW)]

Page 63

SDDP Convergence

Theorem: If sampling is done with replacement and basic dual optimal solutions are used to construct Benders' cuts, then w.p. 1 SDDP converges in a finite number of iterations to an optimal policy for a multistage stochastic LP.

The structure of the LP value function is crucial.

•  Chen & Powell (1999)
•  Philpott & Guan (2008)
•  Shapiro (2011)
•  Girardeau et al. (2014)
•  …

Page 64

Cut Conditions and Convergence

•  Valid: Q_{j,t}(x) \ge v^i_{j,t} + (\pi^i_{j,t})^{\top} x \ \ \forall x \in \{0,1\}^d
•  Tight: Q^{i+1}_{j,t}(x^i_{t-1}) = v^i_{j,t} + (\pi^i_{j,t})^{\top} x^i_{t-1}
•  Finite: finitely many possible cut coefficients

Theorem: If sampling is done with replacement and the cuts are valid, tight, and finite, then w.p. 1 SDDP converges in a finite number of iterations to an optimal policy.

Page 65

Convergence Proof

•  Consider the extensive tree structure, since solutions are path dependent
•  Assumption: the solution/policy \{x^i_n\}_{n \in \mathcal{T}} generated in the forward pass at iteration i is completely specified by the approximate ECTG functions \{\psi^i_t(\cdot)\}_{t=1}^{T}:

x^i_n = \arg\min_{x,y} \ f_n(x, y) + \psi^i_{t_n}(x) \quad \text{s.t.} \ (x^i_{a(n)}, x, y) \in X_n

•  Lemma 1: If \psi^i_{t_n}(x^i_n) = \mathcal{Q}_{t_n}(x^i_n) \ \forall n \in \mathcal{T}, then \{x^i_n\}_{n \in \mathcal{T}} is optimal.

Proof: backward induction on the DP equations.

Page 66

Convergence Proof (contd.)

•  Let K = \sup\{ i : \{x^i_n\}_{n \in \mathcal{T}} \text{ is not optimal} \}
•  Two types of iterations:
  (a)  at least one approximate ECTG function changes
  (b)  the approximate ECTG functions do not change, so the policy does not change
•  Accordingly, K = K_a + K_b = K_a + \sum_x K^x_b
•  Finitely many cuts mean finitely many approximations of the ECTG functions, so K_a < +\infty
•  Finitely many possible solutions (because of the assumption) means the summation over x is finite

Page 67

Convergence Proof (contd.)

•  Lemma 2: \Pr[K^x_b < +\infty] = 1

Proof:
  –  If the solution is not optimal, then by Lemma 1 there is a (last stage) node where the approximation is loose
  –  By the Borel–Cantelli lemma, w.p. 1 the forward pass hits this node within finitely many iterations; then a tight cut is generated and we obtain a new approximation of the ECTG function

•  Thus \Pr[K = K_a + \sum_x K^x_b < +\infty] = 1.

Q.E.D.

Page 68

Nested Benders Decomposition

•  General scenario tree
•  Maintain an approximate problem at every node
•  Forward pass: solve all nodes in a stage before moving to the next stage
•  Backward pass: solve all nodes in a stage, and pass cuts back to the ancestor problems
•  Deterministic algorithm
•  Implementations: Gassmann (1990), Birge (1996), King (1994), etc.

Page 69

Selected References

1.  Birge (1985): Decomposition and Partitioning Methods for Multistage Stochastic Linear Programs, Operations Research, pp. 989–1007.
2.  Birge and Louveaux (2011): Introduction to Stochastic Programming (2nd Ed). Springer.
3.  Chen and Powell (1999): A Convergent Cutting-Plane and Partial-Sampling Algorithm for Multistage Stochastic Linear Programs with Recourse, Journal of Optimization Theory and Applications, pp. 497–524.
4.  GAMS: sddp.gms, https://www.gams.com/modlib/libhtml/sddp.htm
5.  Gassmann (1990): MSLiP: A Computer Code for the Multistage Stochastic Linear Programming Problem, Math. Prog., pp. 407–423.
6.  Pereira and Pinto (1991): Multi-stage Stochastic Optimization Applied to Energy Planning, Math. Prog., pp. 359–375.
7.  Philpott and Guan (2008): On the Convergence of Stochastic Dual Dynamic Programming and Related Methods, OR Letters, pp. 450–455.
8.  Rockafellar and Wets (1991): Scenarios and Policy Aggregation in Optimization under Uncertainty, Math. of OR, pp. 119–147.
9.  Shapiro, Dentcheva, and Ruszczynski (2014): Lectures on Stochastic Programming: Modeling and Theory (2nd Ed). SIAM.