op#miza#onandother* appliedmathema#cschallengesfor...

Op#miza#on and other Applied Mathema#cs Challenges for Complex Energy Systems

Mihai Anitescu

Mathema.cs and Computer Science Division Argonne Na.onal Laboratory

IMSE Illinois 2012

1. Mo&va&on: Management of Energy Systems under Ambient Condi&ons Uncertainty

2

Mihai Anitescu -‐ Op.miza.on under uncertainty

Ambient Condi&on Effects in Energy Systems Opera&on of Energy Systems is Strongly Affected by Ambient Condi&ons -‐ Power Grid Management: Predict Spa.o-‐Temporal Demands (Douglas, et.al. 1999) -‐ Power Plants: Genera.on levels affected by air humidity and temperature (General Electric) -‐ Petrochemical: Hea.ng and Cooling U.li.es (ExxonMobil)

-‐ Buildings: Hea.ng and Cooling Needs (Braun, et.al. 2004) -‐ (Focus) Next Genera&on Energy Systems assume a major renewable energy penetra.on: Wind + Solar + Fossil (Beyer, et.al. 1999)

-‐  Increased reliance on renewables must account for variability of ambient condi.ons, which cannot be done determinis.cally …

-‐  We must op.mize opera.onal and planning decisions accoun.ng for the uncertainty in ambient condi.ons (and others, e.g. demand)

-‐  Op&miza&on Under Uncertainty.

Wind Power Profiles 3


2. Impact: Stochas&c Unit Commitment – Management of Energy Systems

4


Stochas&c Predic&ve Control

Dynamic System Model

Stochas&c

Op&miza&on

Weather Model

Low-‐Level Control

Forecast

Energy System

Set-‐Points

Forecast & Uncertainty

Measurements

Stochas&c NLMPC

Min

x0

f0 (x0 ) + E Minx

f (x,ω )⎡⎣

⎤⎦{ }

subj. to. g0 x0( ) = b0gi x0 ,xi( ) = bi i =1,2…S

x0 ≥ 0, xi ≥ 0

Two-‐stage Stoch Prog

5


Stochas&c Unit Commitment with Wind Power (SAA)

§  Wind Forecast – WRF(Weather Research and Forecas.ng) Model –  Real-‐.me grid-‐nested 24h simula.on –  30 samples require 1h on 500 CPUs (Jazz@Argonne)


6

1min COST

s.t. , ,

, ,

ramping constr., min. up/down constr.

wind

wind

p u dsjk jk jk

s j ks

sjk kj

windsjk

j

wik ksj

ndsk

jjk

j

c c cN

p D s k

p D R s k

p

p

∈ ∈ ∈

∈

∈

∈

∈

⎛ ⎞= + +⎜ ⎟

⎝ ⎠+ = ∈ ∈

+ ≥ + ∈ ∈

∑ ∑∑

∑ ∑

∑ ∑S N T

N

N

N

N

S T

S T

Zavala & al 2010.

Thermal Units Schedule? Minimize Cost Sa.sfy Demand Have a Reserve Dispatch through network

0 24 48 720

200

400

600

800

1000

1200

Tota

l P

ow

er

[MW

]

Time [hr]

§  Unit commitment & energy dispatch with uncertain wind power genera.on for the State of Illinois, assuming 20% wind power penetra.on, using the same windfarm sites as the one exis.ng today.

§ Full integra.on with 10 thermal units to meet demands. Consider dynamics of start-‐up, shutdown, set-‐point changes

§  The solu.on is only 1% more expensive then the one with exact informa.on. Solu.on on average infeasible at 10%.

wind power


7

Demand Samples Wind

Thermal

Wind power forecast and stochas&c programming

Some Considera&ons in Using Supercompu&ng for Power Grid

§  Is it really worth using a supercomputer for this task? (We need the answer every 1hr with 24 hour .me horizons. )

§  Let’s look at the most pressing item of Supercompu.ng usage: power. –  BG/P (and exascale) needs <~ 20MW of power. –  The Midwest US has 140GW of power installed, and the peak demands runs up to

110GW. –  We will never reduce power consump.on, but we will make it more reliable, less

dependent on fossil, and cheaper by bemer managing the peak

§  If we accept this will lead to 10% more renewable penetra.on (our SUC study), then this is worth on the order of 10-‐15GW, far above what BG/P costs in power consump.on.

§  In addi.on opera.onal constraints makes supercompu.ng (if uncertainty needed to account for) necessary and not just useful or convenient.

§  But, even if approxima.ons will work, this tool will be helpful as the “gold standard” for valida.ng other algorithms to be deployed on defined computa.onal resources.


8

3. Low-‐Hanging Fruit Scalable SoWware: PIPS (Parallel Interior Point Stochas&c Programming) – Petra, Lubin, Anitescu

9


PIPS – Our Scalable Stochas&c Programming Solver Using Direct Schur Complement Method

§  The arrow shape of H

§  Solving Hz=r


10

H 1 G1T

H 2 G2T

H S GS

T

G1 G2 … GS H 0

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥⎥

=

L1L2

LSL10 L20 … LS 0 Lc

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

D1D2

DNDc

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

L1T L10

T

LT2 L20T

LTS LS 0

T

LTc

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥⎥

10

1

10

, , 1, ,

, .

T T

T Ti i i c

i i i i i i i iS

ci

c

L D L H L G L D i S

H H G D L CC G L−

− −

=

= =

= −

=

=∑

K

1

10 0

10

, 1, ,i i i

c i i

S

i

Lw r i S

w L r wL=

−

−

=

⎛ ⎞⎜ ⎟=⎝ ⎠

=

−∑

K1 , 0,...,i i iv iD Sw− == ( )

10 0

0 0 ,

1, .

c

T Ti i i i

L v

L v

z

z

i S

L z

−

−

=

= −

= KBack subs.tu.on

Implicit factoriza.on, C is dense, H’s are sparse.

Diagonal solve Forward subs.tu.on

Parallelizing the 1st stage linear algebra

§  We distribute the 1st stage Schur complement system.

§  C is treated as dense.

§  Alterna.ve to PSC for problems with large number of 1st stage variables.

§  Removes the memory bomleneck of PSC and DSC.

§  We inves.gated ScaLapack, Elemental (successor of PLAPACK) –  None have a solver for symmetric indefinite matrices (Bunch-‐Kaufman); –  LU or Cholesky only. –  So we had to think of modifying either.

Mihai Anitescu -‐-‐ Stochas.c Programming

11

C =Q A0

T

A0 0

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥, Q dense symm. pos. def., 0A sparse full rank.

Cholesky-‐based -‐like factoriza&on

§ 

§  Can be viewed as an “implicit” normal equa.ons approach.

§  In-‐place implementa.on inside Elemental: no extra memory needed.

§  Idea: modify the Cholesky factoriza.on, by changing the sign auer processing p columns.

§  It is much easier to do in Elemental, since this distributes elements, not blocks.

§  Twice as fast as LU

§  Works for more general saddle-‐point linear systems, i.e., pos. semi-‐def. (2,2) block.


12

Q AT

A 0

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥= L 0

AL−T L

⎡

⎣⎢⎢

⎤

⎦⎥⎥

I− I

⎡

⎣⎢

⎤

⎦⎥

LT L−1AT

0 LT

⎡

⎣⎢⎢

⎤

⎦⎥⎥, where LLT = Q, LLT = A Q−1AT

TLDL

Distribu&ng the 1st stage Schur complement matrix

§  All processors contribute to all of the elements of the (1,1) dense block

§  A large amount of inter-‐process communica.on occurs.

§  Possibly more costly than the factoriza.on itself.

§  Solu.on: use buffer to reduce the number of messages when doing a Reduce_scaLer.

§  approach also reduces the communica.on by half – only need to send lower triangle.


13

Q = Q 0 +1S

AiT BiQ

i

−1Bi

T⎛⎝

⎞⎠

−1

Ai

⎡

⎣⎢

⎤

⎦⎥

i=1

S

∑

TLDL

Large-‐scale performance


14

§  Comparison of ScaLapack (LU), Elemental(LU), and (1024 cores)

§  Strong scaling §  90.1% from 64 to 1024 cores; §  75.4% from 64 to 2048 cores. §  > 4,000 scenarios.

TLDL

SAA problem: 189 million variables

PIPS – Parallel solver for stochas&c op&miza&on

§  Interior-‐point method implementa.on (Mehrotra’s algorithm) §  Scenario-‐based decomposi.on of the linear algebra. §  PIPS reuses OOQP (Object-‐oriented quadra.c programming solver) class hierarchy. §  New parallel linear algebra layer for block-‐angular IPM linear systems. §  Hybrid MPI+SMP paralleliza.on

§  First-‐stage Schur complement: dense linear algebra. –  Distributed factoriza.on and backsolves (by using Elemental) if needed. –  Shared-‐memory paralleliza.on(SMP) is obtained via Elemental. –  Distributed assembling of the SC matrix is done by a streamlined Reduced_sca`er that is also

in-‐node SMP-‐accelerated.

§  Second-‐stage linear systems are sparse. –  Supports various sparse solvers: MA57 (HSL UK), WSMP(IBM). –  SMP is obtained with WSMP


15

PIPS Solver Capabili&es §  Hybrid MPI/SMP running on Blue Gene/P

–  Successfully (though incompletely due to alloca.on limit) run on up to 32,768 nodes (96% strong scaling) for Illinois problem with grid constraints. 3B variables, maybe largest ever solved?

§  Handles up to 100,000 first-‐stage variables. Previous results dealt with O(20-‐50). §  Close to real-‐.me solu.ons (24 hr horizon in 1 hr wallclock)

–  Further development needed, since users aim for •  More uncertainty, more detail (x 10) •  Faster Dynamics à Shorter Decision Window (x 10) •  Longer Horizons (California == 72 hours) (x 3)

16


Components of Execu&on Time and Strong Scaling

4k 8k 16k 32k

Backsolve

Factor SC

Distrib. SC

BG/P Nodes

Sec./Iteration

0

10

1400

5

200

400

600

800

1000

1200

1

17

§  32K nodes=130K cores (80% BG/P) §  “Backsolve” phase embarrassingly parallel, but not Schur Complement (SC) §  Communica.on for “Distrib. SC” not yet a bomleneck, but we will get there.

Strong Scaling

BG/P Nodes

Speedup(baseline4k)

4k 8k 16k 32k

4k

8k

16k

32k

Linear

PIPS

1


4. The harder problems need some mathema&cs

18


4.1 Q1: How do I deal with the impending first-‐stage bo`leneck?

19


The Stochas&c Precondi&oner §  The limi.ng factor in the scalability of Schur method is the expensive solve with dense

Schur complement matrix

–  A computa.onal bomleneck: workers sit idle wai.ng for the master to factorize C.

§  Remedy – the precondi.oned Schur complement (PSC) –  1. factorize incomplete matrix P in the same .me C is computed. –  2. use the factoriza.on of P to solve with C very fast.

§  In linear algebra terms –  P is a precondi.oner for C. Our choice of P is

where is an IID subset of n scenarios. –  Krylov itera.ves solves (PCG or BiCGStab) replaces the direct solves

20

1 2{ , , , }nk k k= KK Sn = Q 0 +

1n

Aki

T BkiQ ki

−1Bki

T⎛⎝

⎞⎠

−1

Aki

⎡

⎣⎢

⎤

⎦⎥

i=1

n

∑ ,

C = Q 0 +

1N

AiT BiQ

i

−1Bi

T⎛⎝

⎞⎠

−1

Ai

⎡

⎣⎢

⎤

⎦⎥

i=1

N

∑


Precondi&oned Schur Complement (PSC)


21

1

0 01

i i i

Nii

i

N

w r

r L

L

r w

−

=

=

⎛ ⎞⎜ ⎟⎝

=⎠

−∑%

1 , 0,...,i i iv iD Nw− ==

( )0 0T T

i i i iz L v L z−= −

10

1

Ti

N

ii

iH H GC G −

=

−= ∑1

0 1, , ,,T Ti i i i i i i iL L iD L H G L D N− −= == K

(separate process)

TM M ML D L P=

00 Krylov( , , )C Pz r= %

P is a C built from a subset of scenarios

Quality of the Stochas&c Precondi&oner §  “Exponen&ally” be`er precondi&oning (Petra & Anitescu 2010)

22

21

24 24Pr(| ( exp) 1| ) 2

||2 ||n NN max

np L S

S pS ε ελ − ⎛ ⎞−⎜ ⎟

⎝≥ ≤

⎠−

Op.mal use of PSC – linear scaling

Factoriza.on of the precondi.oner can not be hidden anymore by the computa.on of C.

§  A typical scaling behavior of our approach. Bemer scaling than the direct Schur complement method (DSC) is exhibited by PSC.

§  DSC uses p processes, PSC uses p+1.


Performance of the precondi&oner

§  Eigenvalues clustering & Krylov itera.ons §  Affected by the well-‐known ill-‐condi.oning of IPMs.


23

Sn ≈ SN and SN ≈ E S(ω )⎡⎣ ⎤⎦ , where

S(ω ) = Q0 + D0( )+ AT (ω ) B(ω ) Q(ω ) + D(ω )( )−1BT (ω )( )−1

A(ω )⎡

⎣⎢

⎤

⎦⎥

4.2 Q2: How do I use very expensive samples?

24


Bootstrap for stochas&c op&miza&on of energy systems

§  Sampling the uncertainty present in complex energy systems may be a computa.onally intensive task –  Example: weather forecas.ng –  May need 400K CPUs for 30 samples at the resolu.on we need.

§  Obtaining uncertainty es.mates (confidence intervals) on the op.mal value is important in policy-‐making process.

§  Only a small number of samples(scenarios) can be afforded. Therefore an opera.onal constraint makes me start to care about the low-‐sample size regime and its asympto.cs.

§  But how good is the current state of the theory in that regime?


25

Theory situa&on for Stochas&c Programming §  Most es.mates for SAA are based on results of the following type:

§  This allows, in principle, for the convergence of the confidence intervals to be arbitrarily slow.

§  Current state of the area is built around applica.on of the Delta Theorem , which provides the results of the type:

§  But this is not sufficient for similar results for the confidence intervals !!!

§  Intui.on: Convergence in probability tells me how well I behave on a “good” set increasing to probability 1, but tells me nothing about the bad set.


26

N 0.5 θ −θ̂σ̂

⎡

⎣⎢

⎤

⎦⎥

D⎯ →⎯ N(0,1)

X(ω ) =ω , XN (ω ) =−1, 0 ≤ω < 1

log(N +1)

ω , 1log(N +1)

≤ω ≤1.

⎧

⎨⎪⎪

⎩⎪⎪

⇒P( lim

N→∞Na XN (ω )− X(ω ) = 0) = 1

P(XN (ω ) ≤ 0)− P(X(ω ) ≤ 0) =1

log(N +1) N −b

⎧

⎨⎪⎪

⎩⎪⎪

XN (ω )− X(ω ) = oP (N−a )⇔ P Na XN (ω )− X(ω )( )→ 0

Large Devia&on + Bootstrap

§  Bootstrap is a resampling method that builds high-‐order confidence es.mates. §  The idea of bootstrapping is to squeeze out informa.on from a small number of

samples by resampling (with replacement). §  But it applies only to finite-‐dimensional func.ons of means, and the op.mal value

of stochas.c op.miza.on is not one (due to nonlinearity).

§  Idea: Use large devia.ons, to produce exponen.ally convergent probability sets

§  Here, depends on the expected value of the objec.ve func.on and its higher deriva.ves at the solu.on of the SAA approxima.on problem. We can thus use bootstrap theory to produce confidence intervals for and exponen.al convergence ensures order stays the same as for bootstrap !!


27

P(| Nb (θ̂ −θ) |> ) = f1()N

f2 () exp(− f3()Nc )

θ̂

θ̂

The es&mator

§  We proposed a corrected sta.s.c, computable at the SAA approxima.on

§  L is the Lagrangian of the problem and J is the Jacobian of the constraints. §  is the solu.on of the SAA problem obtained from a sample of size N.

§  Bootstrapping is performed based on a second sample of size M, and it works due to the fact that it is now applied at a set point so finite dimensional results do apply.


28

Φ = E f (xN )⎡⎣ ⎤⎦ −

12

E∇L(xN ,λN )0

⎛

⎝⎜

⎞

⎠⎟

TE∇2 L(xN ,λN ) J (xN )T

J (xN ) 0

⎛

⎝⎜⎜

⎞

⎠⎟⎟

−1

E∇L(xN ,λN )0

⎛

⎝⎜

⎞

⎠⎟

Nx

Nx

Accuracy of the es&mator’s confidence levels

§  We proved that bootstrap confidence intervals build using are close to second order correct for the true op.mal value :

§  We observed the predicted or bemer correctness in the numerical simula.ons

§  bootstrapping outperforms classical normal approxima.on method. §  We now have analy.cal techniques for asympto.cs of confidence intervals in SP!


29

Φ*( )f xθ =

P(θ ∈Jαb ) =α +O(N −1+a ),a > 0.

Φ

Correctness order 0.32




4.3: Q3: How do I roll the horizon in real &me?


30

Example DO:

Off-the-Shelf : Solve to Given Accuracy (Neglect Dynamics)

Real-Time (Z & A) : One SQP Iteration per step

Fundamental Limitations of Off-The-Shelf Optimization

Context: Parametric NLP KKT system for QP

Note: Canonical Form Iden&cal to Time-‐Steping for DVI

Exact Solu&on Sa&sfies:

From Lipschitz Con&nuity of strongly regular GE:

Op&mal Solu&on

Lineariza&on Point

QP Solu&on

-‐ Strong Regularity Requires SSOC and LICQ -‐  NLP Error is Bounded by LGE Perturba&on -‐  One QP solu&on from exact manifold is second-‐order

accurate

MPC as Dynamic Generalized Equation (Z & A)

wt* −wt ≤ LΔt 2

Time lineariza&on of Op&mality Condi&ons: Find

-‐ A: LGE is Strongly Regular at ALL e.g. NLP sa&sfies LICQ and SOSC everywhere

But for linearized DO I am never EXACTLY on the manifold: What then?

Theorem (elucida&ng an issue posed by Diehl et al.)

Solve off-‐manifold &me-‐dependent QP

Then: For sufficiently small , we can track the manifold stably, solving 1 QP per step

Moreover: Stability Holds Even if QP Solved t o accuracy. Can use itera&ve methods.

Much less effort per step and be`er chances for real-‐&me performance !

.

One-QP per step stabilizes

Need for more features of DO solvers

Control of Polymeriza&on Reactor

Closed-Loop Transitions

Time

Set-Point

System Cooling Agent

Reactant Concentration

Temperature

Product A

Product B

•  One QP per step may s&ll be too much •  Moreover I may need also good global and fast local convergence proper&es as well, it is not all

about asympto&cs! •  Some&mes one switch regimes, the op&mal point moves far away, and you s&ll want to be able to

track well. – MPC algorithm must exhibit global convergence and fast local convergence (i.e. Newton)!

•  Also, power grid problems can be huge (US ~ 1 – 100 Billion Variables). Need scalable solvers.

Solution forms Time-Moving and Non-Smooth Manifold

Technical Problem

Latency

Active-Set Change

Non-Smooth Manifold

-  Challenge is to Track Manifold Accurately (Classical Optimization) AND Stably (Latency Conscious: A good Step, Computer Fast)

Technical Problem

•  Challenge is to Track Manifold Accurately AND Stably (Get Good Step with Minimum Latency) •  This requires NLP Solvers with the Following Features:

•  A) Classical Optimization Oriented : 1) Superlinear Convergence (Newton-Based) 2) Scalable Step Computation (Iterative Linear Algebra)

•  B) Latency Conscious: 3)   Asymptotic Monotonicity of Minor Iterations (Makes Progress in O(N)) 4)   Active-Set Detection and Warm-Start

-  Existing Solvers Tend to Fail at Least One Feature -  Interior Point: 4, and to some extent, 2,3 -  Augmented Lagrangian: 1 -  SQP: 2

Exact Differentiable Penalty Functions (EDPFs) Consider Transformation using Squared Slacks

Equivalent To:

Apply DiPillo and Grippo’s Penalty Function DiPillo,Grippo, 1979, Bertsekas, 1982

Solve NLP Indirectly Through EDPF Problem:

Conclusions

§  Complex energy systems pose an enormous number of modeling and simula.on challenges.

§  Computa.onal Power will help, but it will not alone address many of the challenges.

§  This research requires sustained, mul.-‐area, integra.ve thinking in mathema.cs. §  Increases focus on area of mathema.cs that were not explored up to this point

and creates opportuni.es for FUNDAMENTAL math advances:

§  Examples: resampling in stoch prog for expensive scenarios; stochas.c precondi.oning, fast nonlinear programming.

§  We expect many more such challenges, as embodied in our MACS center.


38

op#miza#on*and*other* applied*mathema#cs*challenges*for*...

Documents

op#miza#onandother* appliedmathema#cschallengesfor...