op#miza#on*and*other* applied*mathema#cs*challenges*for*...
TRANSCRIPT
Op#miza#on and other Applied Mathema#cs Challenges for Complex Energy Systems
Mihai Anitescu
Mathema.cs and Computer Science Division Argonne Na.onal Laboratory
IMSE Illinois 2012
1. Mo&va&on: Management of Energy Systems under Ambient Condi&ons Uncertainty
2
Mihai Anitescu -‐ Op.miza.on under uncertainty
Ambient Condi&on Effects in Energy Systems Opera&on of Energy Systems is Strongly Affected by Ambient Condi&ons -‐ Power Grid Management: Predict Spa.o-‐Temporal Demands (Douglas, et.al. 1999) -‐ Power Plants: Genera.on levels affected by air humidity and temperature (General Electric) -‐ Petrochemical: Hea.ng and Cooling U.li.es (ExxonMobil)
-‐ Buildings: Hea.ng and Cooling Needs (Braun, et.al. 2004) -‐ (Focus) Next Genera&on Energy Systems assume a major renewable energy penetra.on: Wind + Solar + Fossil (Beyer, et.al. 1999)
-‐ Increased reliance on renewables must account for variability of ambient condi.ons, which cannot be done determinis.cally …
-‐ We must op.mize opera.onal and planning decisions accoun.ng for the uncertainty in ambient condi.ons (and others, e.g. demand)
-‐ Op&miza&on Under Uncertainty.
Wind Power Profiles 3
Mihai Anitescu -‐ Op.miza.on under uncertainty
2. Impact: Stochas&c Unit Commitment – Management of Energy Systems
4
Mihai Anitescu -‐ Op.miza.on under uncertainty
Stochas&c Predic&ve Control
Dynamic System Model
Stochas&c
Op&miza&on
Weather Model
Low-‐Level Control
Forecast
Energy System
Set-‐Points
Forecast & Uncertainty
Measurements
Stochas&c NLMPC
Min
x0
f0 (x0 ) + E Minx
f (x,ω )⎡⎣
⎤⎦{ }
subj. to. g0 x0( ) = b0gi x0 ,xi( ) = bi i =1,2…S
x0 ≥ 0, xi ≥ 0
Two-‐stage Stoch Prog
5
Mihai Anitescu -‐ Op.miza.on under uncertainty
Stochas&c Unit Commitment with Wind Power (SAA)
§ Wind Forecast – WRF(Weather Research and Forecas.ng) Model – Real-‐.me grid-‐nested 24h simula.on – 30 samples require 1h on 500 CPUs (Jazz@Argonne)
Mihai Anitescu -‐ Op.miza.on under uncertainty
6
1min COST
s.t. , ,
, ,
ramping constr., min. up/down constr.
wind
wind
p u dsjk jk jk
s j ks
sjk kj
windsjk
j
wik ksj
ndsk
jjk
j
c c cN
p D s k
p D R s k
p
p
∈ ∈ ∈
∈
∈
∈
∈
⎛ ⎞= + +⎜ ⎟
⎝ ⎠+ = ∈ ∈
+ ≥ + ∈ ∈
∑ ∑∑
∑ ∑
∑ ∑S N T
N
N
N
N
S T
S T
Zavala & al 2010.
Thermal Units Schedule? Minimize Cost Sa.sfy Demand Have a Reserve Dispatch through network
0 24 48 720
200
400
600
800
1000
1200
Tota
l P
ow
er
[MW
]
Time [hr]
§ Unit commitment & energy dispatch with uncertain wind power genera.on for the State of Illinois, assuming 20% wind power penetra.on, using the same windfarm sites as the one exis.ng today.
§ Full integra.on with 10 thermal units to meet demands. Consider dynamics of start-‐up, shutdown, set-‐point changes
§ The solu.on is only 1% more expensive then the one with exact informa.on. Solu.on on average infeasible at 10%.
wind power
Mihai Anitescu -‐ Op.miza.on under uncertainty
7
Demand Samples Wind
Thermal
Wind power forecast and stochas&c programming
Some Considera&ons in Using Supercompu&ng for Power Grid
§ Is it really worth using a supercomputer for this task? (We need the answer every 1hr with 24 hour .me horizons. )
§ Let’s look at the most pressing item of Supercompu.ng usage: power. – BG/P (and exascale) needs <~ 20MW of power. – The Midwest US has 140GW of power installed, and the peak demands runs up to
110GW. – We will never reduce power consump.on, but we will make it more reliable, less
dependent on fossil, and cheaper by bemer managing the peak
§ If we accept this will lead to 10% more renewable penetra.on (our SUC study), then this is worth on the order of 10-‐15GW, far above what BG/P costs in power consump.on.
§ In addi.on opera.onal constraints makes supercompu.ng (if uncertainty needed to account for) necessary and not just useful or convenient.
§ But, even if approxima.ons will work, this tool will be helpful as the “gold standard” for valida.ng other algorithms to be deployed on defined computa.onal resources.
Mihai Anitescu -‐ Op.miza.on under uncertainty
8
3. Low-‐Hanging Fruit Scalable SoWware: PIPS (Parallel Interior Point Stochas&c Programming) – Petra, Lubin, Anitescu
9
Mihai Anitescu -‐ Op.miza.on under uncertainty
PIPS – Our Scalable Stochas&c Programming Solver Using Direct Schur Complement Method
§ The arrow shape of H
§ Solving Hz=r
Mihai Anitescu -‐ Op.miza.on under uncertainty
10
H 1 G1T
H 2 G2T
H S GS
T
G1 G2 … GS H 0
⎡
⎣
⎢⎢⎢⎢⎢⎢⎢⎢
⎤
⎦
⎥⎥⎥⎥⎥⎥⎥⎥
=
L1L2
LSL10 L20 … LS 0 Lc
⎡
⎣
⎢⎢⎢⎢⎢⎢⎢
⎤
⎦
⎥⎥⎥⎥⎥⎥⎥
D1D2
DNDc
⎡
⎣
⎢⎢⎢⎢⎢⎢⎢
⎤
⎦
⎥⎥⎥⎥⎥⎥⎥
L1T L10
T
LT2 L20T
LTS LS 0
T
LTc
⎡
⎣
⎢⎢⎢⎢⎢⎢⎢⎢
⎤
⎦
⎥⎥⎥⎥⎥⎥⎥⎥
10
1
10
, , 1, ,
, .
T T
T Ti i i c
i i i i i i i iS
ci
c
L D L H L G L D i S
H H G D L CC G L−
− −
=
= =
= −
=
=∑
K
1
10 0
10
, 1, ,i i i
c i i
S
i
Lw r i S
w L r wL=
−
−
=
⎛ ⎞⎜ ⎟=⎝ ⎠
=
−∑
K1 , 0,...,i i iv iD Sw− == ( )
10 0
0 0 ,
1, .
c
T Ti i i i
L v
L v
z
z
i S
L z
−
−
=
= −
= KBack subs.tu.on
Implicit factoriza.on, C is dense, H’s are sparse.
Diagonal solve Forward subs.tu.on
Parallelizing the 1st stage linear algebra
§ We distribute the 1st stage Schur complement system.
§ C is treated as dense.
§ Alterna.ve to PSC for problems with large number of 1st stage variables.
§ Removes the memory bomleneck of PSC and DSC.
§ We inves.gated ScaLapack, Elemental (successor of PLAPACK) – None have a solver for symmetric indefinite matrices (Bunch-‐Kaufman); – LU or Cholesky only. – So we had to think of modifying either.
Mihai Anitescu -‐-‐ Stochas.c Programming
11
C =Q A0
T
A0 0
⎡
⎣
⎢⎢
⎤
⎦
⎥⎥, Q dense symm. pos. def., 0A sparse full rank.
Cholesky-‐based -‐like factoriza&on
§
§ Can be viewed as an “implicit” normal equa.ons approach.
§ In-‐place implementa.on inside Elemental: no extra memory needed.
§ Idea: modify the Cholesky factoriza.on, by changing the sign auer processing p columns.
§ It is much easier to do in Elemental, since this distributes elements, not blocks.
§ Twice as fast as LU
§ Works for more general saddle-‐point linear systems, i.e., pos. semi-‐def. (2,2) block.
Mihai Anitescu -‐-‐ Stochas.c Programming
12
Q AT
A 0
⎡
⎣
⎢⎢
⎤
⎦
⎥⎥= L 0
AL−T L
⎡
⎣⎢⎢
⎤
⎦⎥⎥
I− I
⎡
⎣⎢
⎤
⎦⎥
LT L−1AT
0 LT
⎡
⎣⎢⎢
⎤
⎦⎥⎥, where LLT = Q, LLT = A Q−1AT
TLDL
Distribu&ng the 1st stage Schur complement matrix
§ All processors contribute to all of the elements of the (1,1) dense block
§ A large amount of inter-‐process communica.on occurs.
§ Possibly more costly than the factoriza.on itself.
§ Solu.on: use buffer to reduce the number of messages when doing a Reduce_scaLer.
§ approach also reduces the communica.on by half – only need to send lower triangle.
Mihai Anitescu -‐-‐ Stochas.c Programming
13
Q = Q 0 +1S
AiT BiQ
i
−1Bi
T⎛⎝
⎞⎠
−1
Ai
⎡
⎣⎢
⎤
⎦⎥
i=1
S
∑
TLDL
Large-‐scale performance
Mihai Anitescu -‐-‐ Stochas.c Programming
14
§ Comparison of ScaLapack (LU), Elemental(LU), and (1024 cores)
§ Strong scaling § 90.1% from 64 to 1024 cores; § 75.4% from 64 to 2048 cores. § > 4,000 scenarios.
TLDL
SAA problem: 189 million variables
PIPS – Parallel solver for stochas&c op&miza&on
§ Interior-‐point method implementa.on (Mehrotra’s algorithm) § Scenario-‐based decomposi.on of the linear algebra. § PIPS reuses OOQP (Object-‐oriented quadra.c programming solver) class hierarchy. § New parallel linear algebra layer for block-‐angular IPM linear systems. § Hybrid MPI+SMP paralleliza.on
§ First-‐stage Schur complement: dense linear algebra. – Distributed factoriza.on and backsolves (by using Elemental) if needed. – Shared-‐memory paralleliza.on(SMP) is obtained via Elemental. – Distributed assembling of the SC matrix is done by a streamlined Reduced_sca`er that is also
in-‐node SMP-‐accelerated.
§ Second-‐stage linear systems are sparse. – Supports various sparse solvers: MA57 (HSL UK), WSMP(IBM). – SMP is obtained with WSMP
Mihai Anitescu -‐ Op.miza.on under uncertainty
15
PIPS Solver Capabili&es § Hybrid MPI/SMP running on Blue Gene/P
– Successfully (though incompletely due to alloca.on limit) run on up to 32,768 nodes (96% strong scaling) for Illinois problem with grid constraints. 3B variables, maybe largest ever solved?
§ Handles up to 100,000 first-‐stage variables. Previous results dealt with O(20-‐50). § Close to real-‐.me solu.ons (24 hr horizon in 1 hr wallclock)
– Further development needed, since users aim for • More uncertainty, more detail (x 10) • Faster Dynamics à Shorter Decision Window (x 10) • Longer Horizons (California == 72 hours) (x 3)
16
Mihai Anitescu -‐ Op.miza.on under uncertainty
Components of Execu&on Time and Strong Scaling
4k 8k 16k 32k
Backsolve
Factor SC
Distrib. SC
BG/P Nodes
Sec./Iteration
0
10
1400
5
200
400
600
800
1000
1200
1
17
§ 32K nodes=130K cores (80% BG/P) § “Backsolve” phase embarrassingly parallel, but not Schur Complement (SC) § Communica.on for “Distrib. SC” not yet a bomleneck, but we will get there.
Strong Scaling
BG/P Nodes
Speedup(baseline4k)
4k 8k 16k 32k
4k
8k
16k
32k
Linear
PIPS
1
Mihai Anitescu -‐ Op.miza.on under uncertainty
4. The harder problems need some mathema&cs
18
Mihai Anitescu -‐ Op.miza.on under uncertainty
4.1 Q1: How do I deal with the impending first-‐stage bo`leneck?
19
Mihai Anitescu -‐ Op.miza.on under uncertainty
The Stochas&c Precondi&oner § The limi.ng factor in the scalability of Schur method is the expensive solve with dense
Schur complement matrix
– A computa.onal bomleneck: workers sit idle wai.ng for the master to factorize C.
§ Remedy – the precondi.oned Schur complement (PSC) – 1. factorize incomplete matrix P in the same .me C is computed. – 2. use the factoriza.on of P to solve with C very fast.
§ In linear algebra terms – P is a precondi.oner for C. Our choice of P is
where is an IID subset of n scenarios. – Krylov itera.ves solves (PCG or BiCGStab) replaces the direct solves
20
1 2{ , , , }nk k k= KK Sn = Q 0 +
1n
Aki
T BkiQ ki
−1Bki
T⎛⎝
⎞⎠
−1
Aki
⎡
⎣⎢
⎤
⎦⎥
i=1
n
∑ ,
C = Q 0 +
1N
AiT BiQ
i
−1Bi
T⎛⎝
⎞⎠
−1
Ai
⎡
⎣⎢
⎤
⎦⎥
i=1
N
∑
Mihai Anitescu -‐ Op.miza.on under uncertainty
Precondi&oned Schur Complement (PSC)
Mihai Anitescu -‐ Op.miza.on under uncertainty
21
1
0 01
i i i
Nii
i
N
w r
r L
L
r w
−
=
=
⎛ ⎞⎜ ⎟⎝
=⎠
−∑%
1 , 0,...,i i iv iD Nw− ==
( )0 0T T
i i i iz L v L z−= −
10
1
Ti
N
ii
iH H GC G −
=
−= ∑1
0 1, , ,,T Ti i i i i i i iL L iD L H G L D N− −= == K
(separate process)
TM M ML D L P=
00 Krylov( , , )C Pz r= %
P is a C built from a subset of scenarios
Quality of the Stochas&c Precondi&oner § “Exponen&ally” be`er precondi&oning (Petra & Anitescu 2010)
22
21
24 24Pr(| ( exp) 1| ) 2
||2 ||n NN max
np L S
S pS ε ελ − ⎛ ⎞−⎜ ⎟
⎝≥ ≤
⎠−
Op.mal use of PSC – linear scaling
Factoriza.on of the precondi.oner can not be hidden anymore by the computa.on of C.
§ A typical scaling behavior of our approach. Bemer scaling than the direct Schur complement method (DSC) is exhibited by PSC.
§ DSC uses p processes, PSC uses p+1.
Mihai Anitescu -‐ Op.miza.on under uncertainty
Performance of the precondi&oner
§ Eigenvalues clustering & Krylov itera.ons § Affected by the well-‐known ill-‐condi.oning of IPMs.
Mihai Anitescu -‐ Op.miza.on under uncertainty
23
Sn ≈ SN and SN ≈ E S(ω )⎡⎣ ⎤⎦ , where
S(ω ) = Q0 + D0( )+ AT (ω ) B(ω ) Q(ω ) + D(ω )( )−1BT (ω )( )−1
A(ω )⎡
⎣⎢
⎤
⎦⎥
4.2 Q2: How do I use very expensive samples?
24
Mihai Anitescu -‐ Op.miza.on under uncertainty
Bootstrap for stochas&c op&miza&on of energy systems
§ Sampling the uncertainty present in complex energy systems may be a computa.onally intensive task – Example: weather forecas.ng – May need 400K CPUs for 30 samples at the resolu.on we need.
§ Obtaining uncertainty es.mates (confidence intervals) on the op.mal value is important in policy-‐making process.
§ Only a small number of samples(scenarios) can be afforded. Therefore an opera.onal constraint makes me start to care about the low-‐sample size regime and its asympto.cs.
§ But how good is the current state of the theory in that regime?
Mihai Anitescu -‐ Op.miza.on under uncertainty
25
Theory situa&on for Stochas&c Programming § Most es.mates for SAA are based on results of the following type:
§ This allows, in principle, for the convergence of the confidence intervals to be arbitrarily slow.
§ Current state of the area is built around applica.on of the Delta Theorem , which provides the results of the type:
§ But this is not sufficient for similar results for the confidence intervals !!!
§ Intui.on: Convergence in probability tells me how well I behave on a “good” set increasing to probability 1, but tells me nothing about the bad set.
Mihai Anitescu -‐ Op.miza.on under uncertainty
26
N 0.5 θ −θ̂σ̂
⎡
⎣⎢
⎤
⎦⎥
D⎯ →⎯ N(0,1)
X(ω ) =ω , XN (ω ) =−1, 0 ≤ω < 1
log(N +1)
ω , 1log(N +1)
≤ω ≤1.
⎧
⎨⎪⎪
⎩⎪⎪
⇒P( lim
N→∞Na XN (ω )− X(ω ) = 0) = 1
P(XN (ω ) ≤ 0)− P(X(ω ) ≤ 0) =1
log(N +1) N −b
⎧
⎨⎪⎪
⎩⎪⎪
XN (ω )− X(ω ) = oP (N−a )⇔ P Na XN (ω )− X(ω )( )→ 0
Large Devia&on + Bootstrap
§ Bootstrap is a resampling method that builds high-‐order confidence es.mates. § The idea of bootstrapping is to squeeze out informa.on from a small number of
samples by resampling (with replacement). § But it applies only to finite-‐dimensional func.ons of means, and the op.mal value
of stochas.c op.miza.on is not one (due to nonlinearity).
§ Idea: Use large devia.ons, to produce exponen.ally convergent probability sets
§ Here, depends on the expected value of the objec.ve func.on and its higher deriva.ves at the solu.on of the SAA approxima.on problem. We can thus use bootstrap theory to produce confidence intervals for and exponen.al convergence ensures order stays the same as for bootstrap !!
Mihai Anitescu -‐ Op.miza.on under uncertainty
27
P(| Nb (θ̂ −θ) |> ) = f1()N
f2 () exp(− f3()Nc )
θ̂
θ̂
The es&mator
§ We proposed a corrected sta.s.c, computable at the SAA approxima.on
§ L is the Lagrangian of the problem and J is the Jacobian of the constraints. § is the solu.on of the SAA problem obtained from a sample of size N.
§ Bootstrapping is performed based on a second sample of size M, and it works due to the fact that it is now applied at a set point so finite dimensional results do apply.
Mihai Anitescu -‐ Op.miza.on under uncertainty
28
Φ = E f (xN )⎡⎣ ⎤⎦ −
12
E∇L(xN ,λN )0
⎛
⎝⎜
⎞
⎠⎟
TE∇2 L(xN ,λN ) J (xN )T
J (xN ) 0
⎛
⎝⎜⎜
⎞
⎠⎟⎟
−1
E∇L(xN ,λN )0
⎛
⎝⎜
⎞
⎠⎟
Nx
Nx
Accuracy of the es&mator’s confidence levels
§ We proved that bootstrap confidence intervals build using are close to second order correct for the true op.mal value :
§ We observed the predicted or bemer correctness in the numerical simula.ons
§ bootstrapping outperforms classical normal approxima.on method. § We now have analy.cal techniques for asympto.cs of confidence intervals in SP!
Mihai Anitescu -‐ Op.miza.on under uncertainty
29
Φ*( )f xθ =
P(θ ∈Jαb ) =α +O(N −1+a ),a > 0.
Φ
Correctness order 0.32
Correctness order 0.82
Correctness order 2.11
Correctness order 1.14
4.3: Q3: How do I roll the horizon in real &me?
Mihai Anitescu -‐ Op.miza.on under uncertainty
30
Example DO:
Off-the-Shelf : Solve to Given Accuracy (Neglect Dynamics)
Real-Time (Z & A) : One SQP Iteration per step
Fundamental Limitations of Off-The-Shelf Optimization
Context: Parametric NLP KKT system for QP
Note: Canonical Form Iden&cal to Time-‐Steping for DVI
Exact Solu&on Sa&sfies:
From Lipschitz Con&nuity of strongly regular GE:
Op&mal Solu&on
Lineariza&on Point
QP Solu&on
-‐ Strong Regularity Requires SSOC and LICQ -‐ NLP Error is Bounded by LGE Perturba&on -‐ One QP solu&on from exact manifold is second-‐order
accurate
MPC as Dynamic Generalized Equation (Z & A)
wt* −wt ≤ LΔt 2
Time lineariza&on of Op&mality Condi&ons: Find
-‐ A: LGE is Strongly Regular at ALL e.g. NLP sa&sfies LICQ and SOSC everywhere
But for linearized DO I am never EXACTLY on the manifold: What then?
Theorem (elucida&ng an issue posed by Diehl et al.)
Solve off-‐manifold &me-‐dependent QP
Then: For sufficiently small , we can track the manifold stably, solving 1 QP per step
Moreover: Stability Holds Even if QP Solved t o accuracy. Can use itera&ve methods.
Much less effort per step and be`er chances for real-‐&me performance !
.
One-QP per step stabilizes
Need for more features of DO solvers
Control of Polymeriza&on Reactor
Closed-Loop Transitions
Time
Set-Point
System Cooling Agent
Reactant Concentration
Temperature
Product A
Product B
• One QP per step may s&ll be too much • Moreover I may need also good global and fast local convergence proper&es as well, it is not all
about asympto&cs! • Some&mes one switch regimes, the op&mal point moves far away, and you s&ll want to be able to
track well. – MPC algorithm must exhibit global convergence and fast local convergence (i.e. Newton)!
• Also, power grid problems can be huge (US ~ 1 – 100 Billion Variables). Need scalable solvers.
Solution forms Time-Moving and Non-Smooth Manifold
Technical Problem
Latency
Active-Set Change
Non-Smooth Manifold
- Challenge is to Track Manifold Accurately (Classical Optimization) AND Stably (Latency Conscious: A good Step, Computer Fast)
Technical Problem
• Challenge is to Track Manifold Accurately AND Stably (Get Good Step with Minimum Latency) • This requires NLP Solvers with the Following Features:
• A) Classical Optimization Oriented : 1) Superlinear Convergence (Newton-Based) 2) Scalable Step Computation (Iterative Linear Algebra)
• B) Latency Conscious: 3) Asymptotic Monotonicity of Minor Iterations (Makes Progress in O(N)) 4) Active-Set Detection and Warm-Start
- Existing Solvers Tend to Fail at Least One Feature - Interior Point: 4, and to some extent, 2,3 - Augmented Lagrangian: 1 - SQP: 2
Exact Differentiable Penalty Functions (EDPFs) Consider Transformation using Squared Slacks
Equivalent To:
Apply DiPillo and Grippo’s Penalty Function DiPillo,Grippo, 1979, Bertsekas, 1982
Solve NLP Indirectly Through EDPF Problem:
Conclusions
§ Complex energy systems pose an enormous number of modeling and simula.on challenges.
§ Computa.onal Power will help, but it will not alone address many of the challenges.
§ This research requires sustained, mul.-‐area, integra.ve thinking in mathema.cs. § Increases focus on area of mathema.cs that were not explored up to this point
and creates opportuni.es for FUNDAMENTAL math advances:
§ Examples: resampling in stoch prog for expensive scenarios; stochas.c precondi.oning, fast nonlinear programming.
§ We expect many more such challenges, as embodied in our MACS center.
Mihai Anitescu -‐ Op.miza.on under uncertainty
38