TRANSCRIPT
Distributed solution of stochastic optimal control problem on GPUs
Ajay K. Sampathirao^a, P. Sopasakis^a, A. Bemporad^a and P. Patrinos^b
^a IMT Institute for Advanced Studies Lucca, Italy; ^b Dept. Electr. Eng. (ESAT), KU Leuven, Belgium
December 18, 2015
Applications
- Microgrids [Hans et al. '15]
- Drinking water networks [Sampathirao et al. '15]
- HVAC [Long et al. '13, Zhang et al. '13, Parisio et al. '13]
- Financial systems [Patrinos et al. '11, Bemporad et al. '14]
- Chemical processes [Lucia et al. '13]
- Distillation columns [Garrido and Steinbach, '11]
Motivation
Stochastic optimisation is not fit for control applications.
Spoiler alert!
Example:
- 920,000 decision variables
- Interior-point solver runtime: 35 s
- GPU APG solver: < 3 s
Outline
1. Stochastic optimal control problem formulation
2. Accelerated proximal gradient algorithm
3. Parallelisable implementation
4. Simulations
I. Stochastic Optimal Control
System description
Discrete-time uncertain linear system:

    x_{k+1} = A_{ξ_k} x_k + B_{ξ_k} u_k + w_{ξ_k},

where ξ_k is a random variable on a probability space (Ω_k, F_k, P_k). At time k we observe x_k but not ξ_k.
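To make the setup concrete, here is a minimal simulation sketch of such a switched system in Python/NumPy. The two modes, their matrices, the noise terms and the feedback law are all illustrative placeholders, not data from the talk; the only point is the information pattern: u_k may depend on x_k but not on ξ_k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative modes xi in {0, 1}: mode-dependent A, B and additive noise w.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[1.0, 0.2], [0.0, 0.9]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.0], [0.2]])]
w = [np.array([0.01, 0.0]), np.array([-0.01, 0.0])]
p_modes = [0.6, 0.4]                      # P(xi_k = 0), P(xi_k = 1)

x = np.array([1.0, 0.0])
for k in range(5):
    u = np.array([-0.5 * x[0]])           # causal: u_k depends on x_k only
    xi = rng.choice(2, p=p_modes)         # xi_k is revealed only after u_k is chosen
    x = A[xi] @ x + B[xi] @ u + w[xi]     # x_{k+1} = A_xi x_k + B_xi u_k + w_xi
```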
Stochastic optimal control problem
Optimisation problem:

    V*(p) = min_{π = {u_k}_{k=0}^{N-1}}  E[ V_f(x_N, ξ_N) + Σ_{k=0}^{N-1} ℓ_k(x_k, u_k, ξ_k) ],
    s.t.  x_0 = p,
          x_{k+1} = A_{ξ_k} x_k + B_{ξ_k} u_k + w_{ξ_k},

where:
- E[·]: conditional expectation w.r.t. the product probability measure
- Causal policy u_k = ψ_k(p, ξ_{[k-1]}), with ξ_{[k]} = (ξ_0, ξ_1, …, ξ_k)
- ℓ and V_f can encode constraints
Stage cost
The stage cost is a function ℓ_k : R^n × R^m × Ω_k → R̄,

    ℓ_k(x_k, u_k, ξ_k) = φ_k(x_k, u_k, ξ_k) + φ̄_k(F_k x_k + G_k u_k, ξ_k),

where φ_k is real-valued, convex and smooth, e.g.,

    φ_k(x_k, u_k, ξ_k) = x_k' Q_{ξ_k} x_k + u_k' R_{ξ_k} u_k,

and φ̄_k is proper, convex, lsc and possibly non-smooth, e.g., the indicator

    φ̄_k(F_k x_k + G_k u_k, ξ_k) = δ(F_k x_k + G_k u_k | Y_{ξ_k}).
Terminal cost
The terminal cost is a function V_f : R^n × Ω_N → R̄ which can be written as

    V_f(x_N, ξ_N) = φ_N(x_N, ξ_N) + φ̄_N(x_N, ξ_N),

where φ_N is real-valued, convex and smooth, and φ̄_N is proper, convex, lsc and possibly non-smooth.
Total cost
The total cost function can be written as E[f(x) + g(Hx)], where x = ((x_k)_k, (u_k)_k),

    f(x) = Σ_{k=0}^{N-1} φ_k(x_k, u_k, ξ_k) + φ_N(x_N, ξ_N) + δ(x | X(p)),
    g(Hx) = Σ_{k=0}^{N-1} φ̄_k(F_k x_k + G_k u_k, ξ_k) + φ̄_N(F_N x_N, ξ_N),

and φ_k and φ_N are such that f is σ-strongly convex on its domain, that is, the affine space defined by the system dynamics,

    X(p) = { x : x_{k+1}^j = A_k^j x_k^i + B_k^j u_k^i + w_k^j, j ∈ child(k, i) }.
II. Proximal gradient algorithm
Proximal operator
For a closed, proper, convex, extended-real-valued function f : R^n → R̄ and γ > 0, we define the mapping prox_{γf} : R^n → R^n as

    prox_{γf}(v) = argmin_{x ∈ R^n} { f(x) + (1/(2γ)) ‖x − v‖₂² }.
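Two standard proximal operators admit closed forms; a quick sketch with generic examples (not specific to the talk): the prox of γ‖·‖₁ is componentwise soft-thresholding, and the prox of the indicator of a box is the Euclidean projection onto it.

```python
import numpy as np

def prox_l1(v, gamma):
    """prox_{gamma*||.||_1}(v) = argmin_x ||x||_1 + ||x - v||^2 / (2*gamma):
    componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_box(v, lo, hi):
    """For f = delta(. | [lo, hi]^n), prox_{gamma*f} is the projection onto
    the box, for any gamma > 0."""
    return np.clip(v, lo, hi)

v = np.array([2.0, -0.3, 0.05])
x = prox_l1(v, 0.5)   # components with |v_i| <= 0.5 are set to zero
```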
Proximal of the conjugate function
For a function f : R^n → R̄ we define its conjugate function to be¹

    f*(y) = sup_{x ∈ R^n} { ⟨y, x⟩ − f(x) }.

If we can compute prox_{γf}, then we can also compute prox_{γf*} using the Moreau decomposition formula

    v = prox_{γf}(v) + γ prox_{γ⁻¹f*}(γ⁻¹v).

¹ R. T. Rockafellar, Convex Analysis. Princeton University Press, 1972.
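A quick numerical sanity check of the Moreau decomposition, as a generic sketch (not from the talk): take f = ‖·‖₁, whose conjugate is the indicator of the unit ∞-norm ball, so the prox of f* (for any step size) is a box projection.

```python
import numpy as np

def prox_l1(v, gamma):
    # prox of gamma * ||.||_1: componentwise soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_l1_conj(v, gamma):
    # f* = delta(. | [-1, 1]^n); its prox, for any step size, is the projection
    return np.clip(v, -1.0, 1.0)

gamma = 0.7
v = np.array([2.0, -0.4, 0.1, -3.0])

# Moreau: v = prox_{gamma f}(v) + gamma * prox_{gamma^{-1} f*}(gamma^{-1} v)
lhs = v
rhs = prox_l1(v, gamma) + gamma * prox_l1_conj(v / gamma, 1.0 / gamma)
```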
Optimisation problem
Consider the optimisation problem

    P* = min_{z = Hx} f(x) + g(z),

where f : R^n → R̄ is σ-strongly convex and g : R^m → R̄ is closed, proper and convex. The Fenchel dual of this problem is

    D* = min_y f*(−H'y) + g*(y),

where f* has a Lipschitz-continuous gradient with constant 1/σ.
The basic algorithm
The proximal gradient algorithm applied to the dual optimisation problem is defined by the recursion on dual variables²:

    y^0 = 0,
    y^{ν+1} = prox_{λg*}(y^ν + λH ∇f*(−H'y^ν)).

Using the conjugate subgradient theorem we can define

    x^ν := ∇f*(−H'y^ν) = argmin_z ⟨z, H'y^ν⟩ + f(z).

² P. Combettes and J. Pesquet, "Proximal splitting methods in signal processing," Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2011.
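This recursion can be sketched on a toy instance; all data here is illustrative and not the talk's solver. Assume f(x) = ½‖x − c‖² (so σ = 1 and ∇f*(s) = s + c), g the indicator of the box [−b, b]^n, and H = I, so the primal solution is the projection of c onto the box. The prox of g* is obtained via the Moreau decomposition.

```python
import numpy as np

# Illustrative instance: f(x) = 0.5*||x - c||^2 (sigma = 1),
# g = indicator of [-b, b]^n, H = I, step size lam <= sigma / ||H||^2 = 1.
c = np.array([2.0, -0.5])
b, lam = 1.0, 1.0

def grad_f_conj(s):
    # f*(s) = sup_x <s, x> - 0.5*||x - c||^2  =>  grad f*(s) = s + c
    return s + c

def prox_g_conj(v, lam):
    # Moreau: prox_{lam g*}(v) = v - lam * proj_box(v / lam)
    return v - lam * np.clip(v / lam, -b, b)

y = np.zeros(2)
for _ in range(50):
    x = grad_f_conj(-y)                # x^nu = argmin_z <z, H'y> + f(z)
    y = prox_g_conj(y + lam * x, lam)  # dual proximal-gradient update
# x converges to the projection of c onto the box: [1.0, -0.5]
```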
Dual APG algorithm
Nesterov's accelerated proximal gradient algorithm (APG) converges at a rate of O(1/ν²) and is defined by the recursion:

    v^ν = y^ν + θ_ν(θ_{ν−1}⁻¹ − 1)(y^ν − y^{ν−1}),
    x^ν = argmin_z ⟨z, H'v^ν⟩ + f(z),
    z^ν = prox_{λ⁻¹g}(λ⁻¹v^ν + Hx^ν),
    y^{ν+1} = v^ν + λ(Hx^ν − z^ν),
    θ_{ν+1} = (1/2)(√(θ_ν⁴ + 4θ_ν²) − θ_ν²).
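The accelerated recursion can be sketched on the same kind of toy instance as the plain dual gradient method: f(x) = ½‖x − c‖², g the indicator of [−b, b]^n, H = I (all illustrative data, not the talk's problem). For an indicator g, the prox in the z-step is the box projection regardless of the scaling by λ.

```python
import numpy as np

# Illustrative instance: f(x) = 0.5*||x - c||^2 (sigma = 1),
# g = indicator of [-b, b]^n, H = I, step size lam = 1.
c = np.array([2.0, -0.5])
b, lam = 1.0, 1.0

y_prev = y = np.zeros(2)
theta_prev = theta = 1.0
for _ in range(50):
    v = y + theta * (1.0 / theta_prev - 1.0) * (y - y_prev)  # extrapolation
    x = -v + c                              # argmin_z <z, H'v> + f(z)
    z = np.clip(v / lam + x, -b, b)         # prox_{lam^{-1} g}(lam^{-1} v + H x)
    y_prev, y = y, v + lam * (x - z)        # dual update y^{nu+1}
    theta_prev, theta = theta, 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)
# x converges to the projection of c onto the box: [1.0, -0.5]
```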
Characteristics of the algorithm
- Dual iterates converge at a rate of O(1/ν²)
- An ergodic (averaged) primal iterate converges at a rate of O(1/ν²)³
- Preconditioning is of crucial importance
- Terminate the algorithm when the iterate (x^ν, z^ν) satisfies

    f(x^ν) + g(z^ν) − P* ≤ ε_V  and  ‖Hx^ν − z^ν‖_∞ ≤ ε_g.

³ P. Patrinos and A. Bemporad, "An accelerated dual gradient-projection algorithm for embedded linear model predictive control," IEEE Trans. Aut. Contr., vol. 59, no. 1, pp. 18–33, 2014.
III. APG for Stochastic Optimal Control Problems
Scenario tree formulation
Splitting for proximal formulation
We have

    E f(x) = Σ_{k=0}^{N-1} Σ_{i=1}^{μ(k)} p_k^i φ(x_k^i, u_k^i, i) + Σ_{i=1}^{μ(N)} p_N^i φ_N(x_N^i, i) + δ(x | X(p)),
    E g(Hx) = Σ_{k=0}^{N-1} Σ_{i=1}^{μ(k)} p_k^i φ̄(F_k^i x_k^i + G_k^i u_k^i, i) + Σ_{i=1}^{μ(N)} p_N^i φ̄_N(F_N^i x_N^i, i),

where

    X(p) = { x : x_{k+1}^j = A_k^j x_k^i + B_k^j u_k^i + w_k^j, j ∈ child(k, i) }.
Computation of the dual gradient
Using dynamic programming, we solve the problem

    x^ν = argmin_z ⟨z, H'y^ν⟩ + E f(z),

where

    E f(x) = Σ_{k=0}^{N-1} Σ_{i=1}^{μ(k)} p_k^i φ(x_k^i, u_k^i, i) + Σ_{i=1}^{μ(N)} p_N^i φ_N(x_N^i, i) + δ(x | X(p)).
Computation of the dual gradient
Factor step:
- Performed once
- Parallelisable
- For time-invariant problems, can be performed once offline

Algorithm 1: Solve step
    q_N^i ← y_N^i, ∀i ∈ N[1, μ(N)]            % backward substitution
    for k = N−1, …, 0 do
        for i = 1, …, μ(k) do in parallel
            u_k^i ← Φ_k^i y_k^i + Σ_{j ∈ child(k,i)} Θ_k^j q_{k+1}^j + σ_k^i
            q_k^i ← D_k^{i'} y_k^i + Σ_{j ∈ child(k,i)} Λ_k^{j'} q_{k+1}^j + c_k^i
        end for
    end for
    x_0^1 ← p                                  % forward substitution
    for k = 0, …, N−1 do
        for i = 1, …, μ(k) do in parallel
            u_k^i ← K_k^i x_k^i + u_k^i
            for j ∈ child(k, i) do in parallel
                x_{k+1}^j ← A_k^j x_k^i + B_k^j u_k^i + w_k^j
            end for
        end for
    end for
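The forward-substitution half of the solve step can be sketched on a toy scenario tree. All matrices, the gain K and the tree itself are illustrative placeholders (the affine offsets produced by the backward pass are omitted); the point is the traversal pattern: within a stage, every node is independent, which is what the GPU exploits.

```python
import numpy as np

# Toy tree, N = 2: child[(k, i)] lists the stage-(k+1) successors of node (k, i).
child = {(0, 0): [0, 1], (1, 0): [0], (1, 1): [1]}
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # A_k^j (shared here for brevity)
B = np.array([[0.0], [0.1]])               # B_k^j
w = {0: np.array([0.01, 0.0]), 1: np.array([-0.01, 0.0])}
K = np.array([[-0.5, -0.1]])               # illustrative feedback gain K_k^i

x = {(0, 0): np.array([1.0, 0.0])}         # x_0^1 = p
for k in range(2):
    for (kk, i) in child:
        if kk != k:
            continue                       # stage-k nodes are mutually independent
        u = K @ x[(k, i)]                  # u_k^i <- K_k^i x_k^i (+ offset, omitted)
        for j in child[(k, i)]:            # children can also be updated in parallel
            x[(k + 1, j)] = A @ x[(k, i)] + B @ u + w[j]
```

On a GPU, the two inner loops become one kernel launch per stage over all nodes; only matrix-vector products are involved, matching the slide above.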
Computation of the dual gradient
- Dynamic programming approach
- Parallelisable across all nodes of a stage
- The solve step involves only matrix-vector products
IV. Simulations
Simulation Results
- Linear spring-mass system
- GPU CUDA-C implementation (NVIDIA Tesla 2075)
- Average and maximum runtime for a random sample of 100 initial points
- Compared against the interior-point solver of Gurobi
Number of scenarios
[Figure: maximum runtime (sec, log scale) vs. log2(number of scenarios), from 7 to 13, for APG with tolerances 0.005, 0.01 and 0.05, and the Gurobi interior-point solver.]
Number of scenarios
In numbers:
- 8192 scenarios
- 6.39 × 10^5 primal variables
- 2.0 × 10^6 dual variables
- Using ε_g = ε_V = 0.01 we are 40× faster (average)
Prediction horizon
[Figure: average runtime (sec, log scale) vs. prediction horizon, from 10 to 60, for APG with tolerances 0.005, 0.01 and 0.05, and the Gurobi interior-point solver.]
Prediction horizon
[Figure: maximum runtime (sec, log scale) vs. prediction horizon, from 10 to 60, for APG with tolerances 0.005, 0.01 and 0.05, and the Gurobi interior-point solver.]
Prediction horizon
In numbers:
- N = 60 and 500 scenarios
- 0.92 × 10^6 primal variables
- 2.0 × 10^6 dual variables
- Using ε_g = ε_V = 0.01 we are 23× faster (average)
Stochastic MPC of drinking water networks
Recent results (to be submitted):
- About 2 million primal variables
- 593 scenarios, N = 24
- Gurobi requires 1329 s on average
- GPU APG runtime is about 58 s
Thank you for your attention.
This work was financially supported by the EU FP7 research project EFFINET "Efficient Integrated Real-time Monitoring and Control of Drinking Water Networks," grant agreement no. 318556.