TRANSCRIPT
Distributed solution of stochastic optimal control problem on GPUs
Ajay K. Sampathirao^a, P. Sopasakis^a, A. Bemporad^a and P. Patrinos^b
^a IMT Institute for Advanced Studies Lucca, Italy; ^b Dept. Electr. Eng. (ESAT), KU Leuven, Belgium
December 18, 2015
Applications
- Microgrids [Hans et al. '15]
- Drinking water networks [Sampathirao et al. '15]
- HVAC [Long et al. '13, Zhang et al. '13, Parisio et al. '13]
- Financial systems [Patrinos et al. '11, Bemporad et al. '14]
- Chemical processes [Lucia et al. '13]
- Distillation columns [Garrido and Steinbach, '11]
Motivation
Stochastic optimisation is not fit for control applications.
Spoiler alert!
Example:
- 920,000 decision variables
- Interior-point solver runtime: 35 s
- GPU APG solver: < 3 s
Outline
1. Stochastic optimal control problem formulation
2. Accelerated proximal gradient algorithm
3. Parallelisable implementation
4. Simulations
I. Stochastic Optimal Control
System description
Discrete-time uncertain linear system:

    x_{k+1} = A_{ξ_k} x_k + B_{ξ_k} u_k + w_{ξ_k},

where ξ_k is a random variable on a probability space (Ω_k, F_k, P_k). At time k we observe x_k but not ξ_k.
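To make the setup concrete, here is a minimal simulation sketch of such a switched system in Python/NumPy. The two modes, their matrices, the noise terms and the feedback law are all illustrative placeholders, not data from the talk; the only point is the information pattern: u_k may depend on x_k but not on ξ_k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative modes xi in {0, 1}: mode-dependent A, B and additive noise w.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[1.0, 0.2], [0.0, 0.9]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.0], [0.2]])]
w = [np.array([0.01, 0.0]), np.array([-0.01, 0.0])]
p_modes = [0.6, 0.4]                      # P(xi_k = 0), P(xi_k = 1)

x = np.array([1.0, 0.0])
for k in range(5):
    u = np.array([-0.5 * x[0]])           # causal: u_k depends on x_k only
    xi = rng.choice(2, p=p_modes)         # xi_k is revealed only after u_k is chosen
    x = A[xi] @ x + B[xi] @ u + w[xi]     # x_{k+1} = A_xi x_k + B_xi u_k + w_xi
```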
Stochastic optimal control problem
Optimisation problem:

    V*(p) = min_{π = {u_k}_{k=0}^{N-1}}  E[ V_f(x_N, ξ_N) + Σ_{k=0}^{N-1} ℓ_k(x_k, u_k, ξ_k) ],
    s.t.  x_0 = p,
          x_{k+1} = A_{ξ_k} x_k + B_{ξ_k} u_k + w_{ξ_k},

where:
- E[·]: conditional expectation w.r.t. the product probability measure
- Causal policy u_k = ψ_k(p, ξ_{[k-1]}), with ξ_{[k]} = (ξ_0, ξ_1, …, ξ_k)
- ℓ and V_f can encode constraints
Stage cost
The stage cost is a function ℓ_k : R^n × R^m × Ω_k → R̄,

    ℓ_k(x_k, u_k, ξ_k) = φ_k(x_k, u_k, ξ_k) + φ̄_k(F_k x_k + G_k u_k, ξ_k),

where φ_k is real-valued, convex and smooth, e.g.,

    φ_k(x_k, u_k, ξ_k) = x_k' Q_{ξ_k} x_k + u_k' R_{ξ_k} u_k,

and φ̄_k is proper, convex, lsc and possibly non-smooth, e.g., the indicator

    φ̄_k(F_k x_k + G_k u_k, ξ_k) = δ(F_k x_k + G_k u_k | Y_{ξ_k}).
Terminal cost
The terminal cost is a function V_f : R^n × Ω_N → R̄ which can be written as

    V_f(x_N, ξ_N) = φ_N(x_N, ξ_N) + φ̄_N(x_N, ξ_N),

where φ_N is real-valued, convex and smooth, and φ̄_N is proper, convex, lsc and possibly non-smooth.
Total cost
The total cost function can be written as E[f(x) + g(Hx)], where x = ((x_k)_k, (u_k)_k),

    f(x) = Σ_{k=0}^{N-1} φ_k(x_k, u_k, ξ_k) + φ_N(x_N, ξ_N) + δ(x | X(p)),
    g(Hx) = Σ_{k=0}^{N-1} φ̄_k(F_k x_k + G_k u_k, ξ_k) + φ̄_N(F_N x_N, ξ_N),

and φ_k and φ_N are such that f is σ-strongly convex on its domain, that is, the affine space defined by the system dynamics,

    X(p) = { x : x_{k+1}^j = A_k^j x_k^i + B_k^j u_k^i + w_k^j, j ∈ child(k, i) }.
II. Proximal gradient algorithm
Proximal operator
For a closed, proper, convex, extended-real-valued function f : R^n → R̄ and γ > 0, we define the mapping prox_{γf} : R^n → R^n as

    prox_{γf}(v) = argmin_{x ∈ R^n} { f(x) + (1/(2γ)) ‖x − v‖₂² }.
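Two standard proximal operators admit closed forms; a quick sketch with generic examples (not specific to the talk): the prox of γ‖·‖₁ is componentwise soft-thresholding, and the prox of the indicator of a box is the Euclidean projection onto it.

```python
import numpy as np

def prox_l1(v, gamma):
    """prox_{gamma*||.||_1}(v) = argmin_x ||x||_1 + ||x - v||^2 / (2*gamma):
    componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_box(v, lo, hi):
    """For f = delta(. | [lo, hi]^n), prox_{gamma*f} is the projection onto
    the box, for any gamma > 0."""
    return np.clip(v, lo, hi)

v = np.array([2.0, -0.3, 0.05])
x = prox_l1(v, 0.5)   # components with |v_i| <= 0.5 are set to zero
```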
Proximal of the conjugate function
For a function f : R^n → R̄ we define its conjugate function to be¹

    f*(y) = sup_{x ∈ R^n} { ⟨y, x⟩ − f(x) }.

If we can compute prox_{γf}, then we can also compute prox_{γf*} using the Moreau decomposition formula

    v = prox_{γf}(v) + γ prox_{γ⁻¹f*}(γ⁻¹v).

¹ R. T. Rockafellar, Convex Analysis. Princeton University Press, 1972.
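A quick numerical sanity check of the Moreau decomposition, as a generic sketch (not from the talk): take f = ‖·‖₁, whose conjugate is the indicator of the unit ∞-norm ball, so the prox of f* (for any step size) is a box projection.

```python
import numpy as np

def prox_l1(v, gamma):
    # prox of gamma * ||.||_1: componentwise soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_l1_conj(v, gamma):
    # f* = delta(. | [-1, 1]^n); its prox, for any step size, is the projection
    return np.clip(v, -1.0, 1.0)

gamma = 0.7
v = np.array([2.0, -0.4, 0.1, -3.0])

# Moreau: v = prox_{gamma f}(v) + gamma * prox_{gamma^{-1} f*}(gamma^{-1} v)
lhs = v
rhs = prox_l1(v, gamma) + gamma * prox_l1_conj(v / gamma, 1.0 / gamma)
```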
Optimisation problem
Consider the optimisation problem

    P* = min_{z = Hx} f(x) + g(z),

where f : R^n → R̄ is σ-strongly convex and g : R^m → R̄ is closed, proper and convex. The Fenchel dual of this problem is

    D* = min_y f*(−H'y) + g*(y),

where f* has a Lipschitz-continuous gradient with constant 1/σ.
The basic algorithm
The proximal gradient algorithm applied to the dual optimisation problem is defined by the recursion on dual variables²:

    y^0 = 0,
    y^{ν+1} = prox_{λg*}(y^ν + λH ∇f*(−H'y^ν)).

Using the conjugate subgradient theorem we can define

    x^ν := ∇f*(−H'y^ν) = argmin_z ⟨z, H'y^ν⟩ + f(z).

² P. Combettes and J. Pesquet, "Proximal splitting methods in signal processing," Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2011.
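This recursion can be sketched on a toy instance; all data here is illustrative and not the talk's solver. Assume f(x) = ½‖x − c‖² (so σ = 1 and ∇f*(s) = s + c), g the indicator of the box [−b, b]^n, and H = I, so the primal solution is the projection of c onto the box. The prox of g* is obtained via the Moreau decomposition.

```python
import numpy as np

# Illustrative instance: f(x) = 0.5*||x - c||^2 (sigma = 1),
# g = indicator of [-b, b]^n, H = I, step size lam <= sigma / ||H||^2 = 1.
c = np.array([2.0, -0.5])
b, lam = 1.0, 1.0

def grad_f_conj(s):
    # f*(s) = sup_x <s, x> - 0.5*||x - c||^2  =>  grad f*(s) = s + c
    return s + c

def prox_g_conj(v, lam):
    # Moreau: prox_{lam g*}(v) = v - lam * proj_box(v / lam)
    return v - lam * np.clip(v / lam, -b, b)

y = np.zeros(2)
for _ in range(50):
    x = grad_f_conj(-y)                # x^nu = argmin_z <z, H'y> + f(z)
    y = prox_g_conj(y + lam * x, lam)  # dual proximal-gradient update
# x converges to the projection of c onto the box: [1.0, -0.5]
```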
Dual APG algorithm
Nesterov's accelerated proximal gradient algorithm (APG) converges at a rate of O(1/ν²) and is defined by the recursion:

    v^ν = y^ν + θ_ν(θ_{ν−1}⁻¹ − 1)(y^ν − y^{ν−1}),
    x^ν = argmin_z ⟨z, H'v^ν⟩ + f(z),
    z^ν = prox_{λ⁻¹g}(λ⁻¹v^ν + Hx^ν),
    y^{ν+1} = v^ν + λ(Hx^ν − z^ν),
    θ_{ν+1} = (1/2)(√(θ_ν⁴ + 4θ_ν²) − θ_ν²).
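The accelerated recursion can be sketched on the same kind of toy instance as the plain dual gradient method: f(x) = ½‖x − c‖², g the indicator of [−b, b]^n, H = I (all illustrative data, not the talk's problem). For an indicator g, the prox in the z-step is the box projection regardless of the scaling by λ.

```python
import numpy as np

# Illustrative instance: f(x) = 0.5*||x - c||^2 (sigma = 1),
# g = indicator of [-b, b]^n, H = I, step size lam = 1.
c = np.array([2.0, -0.5])
b, lam = 1.0, 1.0

y_prev = y = np.zeros(2)
theta_prev = theta = 1.0
for _ in range(50):
    v = y + theta * (1.0 / theta_prev - 1.0) * (y - y_prev)  # extrapolation
    x = -v + c                              # argmin_z <z, H'v> + f(z)
    z = np.clip(v / lam + x, -b, b)         # prox_{lam^{-1} g}(lam^{-1} v + H x)
    y_prev, y = y, v + lam * (x - z)        # dual update y^{nu+1}
    theta_prev, theta = theta, 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)
# x converges to the projection of c onto the box: [1.0, -0.5]
```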
Characteristics of the algorithm
- Dual iterates converge at a rate of O(1/ν²)
- An ergodic (averaged) primal iterate converges at a rate of O(1/ν²)³
- Preconditioning is of crucial importance
- Terminate the algorithm when the iterate (x^ν, z^ν) satisfies

    f(x^ν) + g(z^ν) − P* ≤ ε_V  and  ‖Hx^ν − z^ν‖_∞ ≤ ε_g.

³ P. Patrinos and A. Bemporad, "An accelerated dual gradient-projection algorithm for embedded linear model predictive control," IEEE Trans. Aut. Contr., vol. 59, no. 1, pp. 18–33, 2014.
III. APG for Stochastic Optimal Control Problems
Scenario tree formulation
Splitting for proximal formulation
We have

    E f(x) = Σ_{k=0}^{N-1} Σ_{i=1}^{μ(k)} p_k^i φ(x_k^i, u_k^i, i) + Σ_{i=1}^{μ(N)} p_N^i φ_N(x_N^i, i) + δ(x | X(p)),
    E g(Hx) = Σ_{k=0}^{N-1} Σ_{i=1}^{μ(k)} p_k^i φ̄(F_k^i x_k^i + G_k^i u_k^i, i) + Σ_{i=1}^{μ(N)} p_N^i φ̄_N(F_N^i x_N^i, i),

where

    X(p) = { x : x_{k+1}^j = A_k^j x_k^i + B_k^j u_k^i + w_k^j, j ∈ child(k, i) }.
Computation of the dual gradient
Using dynamic programming, we solve the problem

    x^ν = argmin_z ⟨z, H'y^ν⟩ + E f(z),

where

    E f(x) = Σ_{k=0}^{N-1} Σ_{i=1}^{μ(k)} p_k^i φ(x_k^i, u_k^i, i) + Σ_{i=1}^{μ(N)} p_N^i φ_N(x_N^i, i) + δ(x | X(p)).
Computation of the dual gradient
Factor step:
- Performed once
- Parallelisable
- For time-invariant problems, can be performed once offline

Algorithm 1: Solve step
    q_N^i ← y_N^i, ∀i ∈ N[1, μ(N)]            % backward substitution
    for k = N−1, …, 0 do
        for i = 1, …, μ(k) do in parallel
            u_k^i ← Φ_k^i y_k^i + Σ_{j ∈ child(k,i)} Θ_k^j q_{k+1}^j + σ_k^i
            q_k^i ← D_k^{i'} y_k^i + Σ_{j ∈ child(k,i)} Λ_k^{j'} q_{k+1}^j + c_k^i
        end for
    end for
    x_0^1 ← p                                  % forward substitution
    for k = 0, …, N−1 do
        for i = 1, …, μ(k) do in parallel
            u_k^i ← K_k^i x_k^i + u_k^i
            for j ∈ child(k, i) do in parallel
                x_{k+1}^j ← A_k^j x_k^i + B_k^j u_k^i + w_k^j
            end for
        end for
    end for
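The forward-substitution half of the solve step can be sketched on a toy scenario tree. All matrices, the gain K and the tree itself are illustrative placeholders (the affine offsets produced by the backward pass are omitted); the point is the traversal pattern: within a stage, every node is independent, which is what the GPU exploits.

```python
import numpy as np

# Toy tree, N = 2: child[(k, i)] lists the stage-(k+1) successors of node (k, i).
child = {(0, 0): [0, 1], (1, 0): [0], (1, 1): [1]}
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # A_k^j (shared here for brevity)
B = np.array([[0.0], [0.1]])               # B_k^j
w = {0: np.array([0.01, 0.0]), 1: np.array([-0.01, 0.0])}
K = np.array([[-0.5, -0.1]])               # illustrative feedback gain K_k^i

x = {(0, 0): np.array([1.0, 0.0])}         # x_0^1 = p
for k in range(2):
    for (kk, i) in child:
        if kk != k:
            continue                       # stage-k nodes are mutually independent
        u = K @ x[(k, i)]                  # u_k^i <- K_k^i x_k^i (+ offset, omitted)
        for j in child[(k, i)]:            # children can also be updated in parallel
            x[(k + 1, j)] = A @ x[(k, i)] + B @ u + w[j]
```

On a GPU, the two inner loops become one kernel launch per stage over all nodes; only matrix-vector products are involved, matching the slide above.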
Computation of the dual gradient
- Dynamic programming approach
- Parallelisable across all nodes of a stage
- The solve step involves only matrix-vector products
IV. Simulations
Simulation Results
- Linear spring-mass system
- GPU CUDA-C implementation (NVIDIA Tesla 2075)
- Average and maximum runtime for a random sample of 100 initial points
- Compared against the interior-point solver of Gurobi
Number of scenarios
[Figure: maximum runtime (sec, log scale) vs. log2(number of scenarios), from 7 to 13, for APG with tolerances 0.005, 0.01 and 0.05, and the Gurobi interior-point solver.]
Number of scenarios
In numbers:
- 8192 scenarios
- 6.39 × 10^5 primal variables
- 2.0 × 10^6 dual variables
- Using ε_g = ε_V = 0.01 we are 40× faster (average)
Prediction horizon
[Figure: average runtime (sec, log scale) vs. prediction horizon, from 10 to 60, for APG with tolerances 0.005, 0.01 and 0.05, and the Gurobi interior-point solver.]
Prediction horizon
[Figure: maximum runtime (sec, log scale) vs. prediction horizon, from 10 to 60, for APG with tolerances 0.005, 0.01 and 0.05, and the Gurobi interior-point solver.]
Prediction horizon
In numbers:
- N = 60 and 500 scenarios
- 0.92 × 10^6 primal variables
- 2.0 × 10^6 dual variables
- Using ε_g = ε_V = 0.01 we are 23× faster (average)
Stochastic MPC of drinking water networks
Recent results (to be submitted):
- About 2 million primal variables
- 593 scenarios, N = 24
- Gurobi requires 1329 s on average
- GPU APG runtime is about 58 s
Thank you for your attention.
This work was financially supported by the EU FP7 research project EFFINET "Efficient Integrated Real-time Monitoring and Control of Drinking Water Networks," grant agreement no. 318556.