
Models and Algorithms for Stochastic Programming

Jeff Linderoth

Dept. of Industrial and Systems Engineering

Univ. of Wisconsin-Madison

[email protected]

Enterprise-Wide Optimization Meeting

Carnegie-Mellon University

March 10th, 2009

Jeff Linderoth (UW-Madison) Models & Algs. for SP CMU-EWO 1 / 82

Mission Impossible

Explaining Stochastic Programming in 90 mins

I will try to give an overview; please interrupt with questions!


What I’ll Ramble On

Models

How to deal with uncertainty

Why modeling uncertainty is important

Who has used stochastic programming?

Why more people don’t use stochastic programming

Algorithms

Extensive Form

Benders Decomposition (2-stage)

Sampling

Nested Benders Decomposition (multistage)


Dealing with Uncertainty: Definition of Stochastic Programming

Etymology

program:

(3) An ordered list of events to take place or procedures to be followed; a schedule.

Late Latin programma, public notice, from Greek programma, programmat-, from prographein, to write publicly.

stochastic:

(1b) Involving chance or probability.

Greek stokhastikos, from stokhastēs, diviner, from stokhazesthai, to guess at, from stokhos, aim, goal.

Source: The American Heritage Dictionary of the English Language, Fourth Edition.


Dealing with Uncertainty: Sources of Uncertainty

Houston, we have uncertainty!

What we anticipate seldom occurs; what we least expected generally happens.

Benjamin Disraeli (1804 - 1881)

Financial

Market price movements

Defaults by a business partner

Operational

Customer demands

Travel times

Technology related

Will a new technology be ready “in time”?

Market related

Shifts in tastes

Competition

What will your competitors’ strategy be next year?

Acts of God: Jeff’s travel experience yesterday!!!

Weather

Equipment failure

Birds flying into planes



Stochastic Programming

A tool used in planning under uncertainty

More specifically: Mathematical Programming, or Optimization, in which some of the parameters defining a problem instance are random, or uncertain

Optimization

\[ \min_{x \in X} f(x) \]

x: Variables you control

Stochastic Optimization

\[ \min_{x \in X(\omega)} F(x, \omega) \]

ω: Variables you don’t control

Stochastic Optimization is UNDEFINED

You can’t possibly choose an x that optimizes for all ω

More specification is required



Jeff’s Stochastic Programming Assumptions

In stochastic programming, we assume that a probability distribution for the uncertainty ω is known or can be approximated.

We also assume that probabilities are independent of the decisions that are taken.

Decision-dependent uncertainty

Decisions influence probability distributions

Decisions influence knowledge discovery

Want to know about stochastic programming with decision-dependent uncertainty?

Talk to Ignacio!



Probability Theory(?)

This notion of having to know a probability distribution for the randomness is troubling, since in reality very few people know exactly that

Their customer demands follow a log-normal distribution with mean 17.26 and variance 2.88726

Their plant will have forced shutdowns following a Weibull distribution with parameters (100.25, 73.7916)

Instead, you might be able to

Estimate distributions from historical data (be careful!)

Have “qualitative” probability measures (“low/medium/high”)

Create your own scenarios of interest



The Journey is the Reward?

Business process people can argue/discuss amongst themselves what the various scenarios might be and the outcomes of those scenarios.

This process by itself can be very useful

There is a good amount of frightening-looking mathematical theory and computational evidence that solutions obtained from stochastic programs are often quite “stable” with respect to changes in the input probability distribution

The Upshot

It doesn’t matter “too much” if your numbers aren’t quite right

The insights you gain from considering the uncertainty can still be valuable


Dealing with Uncertainty: Related Decision Making Technologies

A Concrete Example: An Uncertain LP

\[
\begin{aligned}
\min\ & cx \\
\text{s.t.}\ & Ax \ge b \\
& T(\omega)x \ge h(\omega) \\
& x \ge 0
\end{aligned}
\]

T(ω) and h(ω) are uncertain: $X(\omega) = \{x \mid Ax \ge b,\ T(\omega)x \ge h(\omega)\}$

We must choose x despite this uncertainty

Examples:

Decide production quantities before knowing demands

Constraint data includes imprecise measurements

Three Approaches

1 Robust optimization

2 Chance-constrained programming

3 Recourse-based stochastic programming



Robust Optimization

Uncertain data is assumed to lie in an uncertainty set

(T(ω), h(ω)) ∈ U

Guarantee that constraints be satisfied for all possible realizations

\[
\begin{aligned}
\min\ & cx \\
\text{s.t.}\ & Ax \ge b \\
& Tx \ge h \quad \forall (T, h) \in U \\
& x \ge 0
\end{aligned}
\]

Tractability depends on structure of U



Robust Optimization

To control conservatism, the uncertainty set can be parameterized by a budget of uncertainty

Example 1: $T_{ij}(\omega) \in [l_{ij}, u_{ij}]$ (Bertsimas and Sim)

At most K of the components in each row can differ from the nominal value

Nature can choose which K will differ

K large ⇒ highly conservative (Soyster)

K = 0 ⇒ No robustness

Can formulate this problem as a linear program

Example 2: U is ellipsoidal (Ben-Tal and Nemirovski)
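The budget idea can be sketched for a single row and a fixed x. This is only a check of one candidate solution, not the linear programming reformulation mentioned above. It assumes x ≥ 0, a constraint of the form Σ_j T_j x_j ≥ h, and that only downward moves from the nominal value to the lower bound l_j can hurt; the function names and the data are made up for illustration.

```python
def worst_case_lhs(nominal, lower, x, budget):
    """Smallest value of sum_j T_j * x_j when nature may push at most
    `budget` coefficients from nominal[j] down to lower[j] (x >= 0 assumed)."""
    base = sum(t * xj for t, xj in zip(nominal, x))
    # Loss in the left-hand side if coefficient j moves to its lower bound.
    losses = sorted(((t - l) * xj for t, l, xj in zip(nominal, lower, x)),
                    reverse=True)
    # Nature picks the `budget` most damaging coefficients.
    return base - sum(losses[:budget])


def is_robust_feasible(nominal, lower, x, h, budget):
    """True if the row constraint holds for every allowed realization."""
    return worst_case_lhs(nominal, lower, x, budget) >= h


# Hypothetical data: three coefficients, candidate solution x.
nominal = [4.0, 3.0, 2.0]
lower = [3.0, 1.0, 1.5]
x = [1.0, 2.0, 4.0]

for k in range(4):
    print(k, worst_case_lhs(nominal, lower, x, k))
```

With K = 0 the nominal row is recovered, and increasing K can only lower the worst-case left-hand side, which is the conservatism trade-off described on the slide.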



Robust Optimization

Advantages:

Computationally tractable

Can yield extremely reliable solutions

Does not require stochastic model

Disadvantages:

Does not use a stochastic model

Although conservatism can be controlled, the control parameter doesn’t have meaning to decision makers




Assume uncertain data are random variables with known distributions

Two approaches to uncertain constraints:

1 Require constraint to be satisfied with high probability

\[ \min \{ cx : x \in X,\ P\{T(\omega)x \ge h(\omega)\} \ge 1 - \epsilon \} \]

ε is a parameter, e.g. ε = 0.05 or ε = 0.01

Linear program with probabilistic (chance) constraints

2 Penalize violations of constraints

\[ \min \{ cx + E[\lambda (h(\omega) - T(\omega)x)^+] : x \in X \} \]

Special case of a two-stage stochastic program



Linear Programs with Probabilistic Constraints

Individual constraints:

\[ \min \{ cx : x \in X,\ P\{T(\omega)_i x \ge h(\omega)_i\} \ge 1 - \epsilon_i \ \ \forall i \} \]

Joint constraints:

\[ \min \{ cx : x \in X,\ P\{T(\omega)x \ge h(\omega)\} \ge 1 - \epsilon \} \]

Bad news: calculating probability is hard

Worse news: probabilistic constraints are generally non-convex!



Non-convexity of the feasible region

Consider: $P\{x_1 \ge \xi_1,\ x_2 \ge \xi_2\} \ge 0.6$

Each dot: a realization of ξ which occurs with probability 1/10

[Figure: scatter plot of the ten realizations of ξ in the (x1, x2) plane]
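A small numerical illustration of the non-convexity, as a sketch only: the ten realizations below are invented (the points in the original figure are not recoverable), each carrying probability 1/10. Two points satisfy the 0.6 chance constraint while their midpoint does not.

```python
# Hypothetical realizations of xi = (xi1, xi2), each with probability 1/10.
REALIZATIONS = [
    (1, 1), (2, 1), (1, 2), (2, 2),   # small in both coordinates
    (6, 1), (7, 2), (8, 1),           # large xi1, small xi2
    (1, 6), (2, 7), (1, 8),           # small xi1, large xi2
]


def coverage_probability(x, realizations=REALIZATIONS):
    """P{x1 >= xi1, x2 >= xi2} under the equally likely realizations."""
    hits = sum(1 for xi in realizations if x[0] >= xi[0] and x[1] >= xi[1])
    return hits / len(realizations)


def is_feasible(x, level=0.6):
    """Chance constraint: the coverage probability must reach `level`."""
    return coverage_probability(x) >= level


x_a = (8.0, 2.0)
x_b = (2.0, 8.0)
midpoint = tuple((a + b) / 2 for a, b in zip(x_a, x_b))

for point in (x_a, x_b, midpoint):
    print(point, coverage_probability(point), is_feasible(point))
```

Both endpoints cover seven of the ten realizations, but the midpoint covers only four, so the feasible set cannot be convex.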



Two Stage Stochastic Programming

\[ \text{(SP)} \quad \min \{ cx + E[\lambda (h(\omega) - T(\omega)x)^+] : x \in X \} \]

Choose x ⇒ Observe (T(ω), h(ω)) ⇒ Pay penalty

Good news: (SP) is convex

Bad news: Calculating expectation is hard

Successful Approach: Sample Average Approximation

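Generate $(T(\omega)^1, h(\omega)^1), \ldots, (T(\omega)^N, h(\omega)^N)$ and solve

\[ \text{(SP}_N\text{)} \quad \min \Big\{ cx + \sum_{i=1}^{N} \frac{1}{N} \lambda (h(\omega)^i - T(\omega)^i x)^+ : x \in X \Big\} \]

$x^*_N$ is often a good approximation to the true optimal solution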

We’ll see (a lot) more later!



Stochastic Programming vs. Simulation

Simulation

(Pro): Very flexible; the system need not be mathematically defined

(Pro): Fast

(Con): If I run 100 “what-ifs” and get 100 different solutions, how does simulation help me plan for the future?

Stochastic Programming

(Con): More challenging to build and solve models

(Pro): SP helps you “optimize” over your “what-ifs”.

The Upshot!

Use simulation to generate scenarios. Input the scenarios to a stochastic program to decide how best to hedge against this uncertainty



Multistage Decision Making

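[Figure: timeline alternating random events and decisions: ω1, x1, ω2, x2, ω3, ..., xT−1, ωT, xT]

Random vectors $\omega_1 \in \mathbb{R}^{n_1}, \omega_2 \in \mathbb{R}^{n_2}, \ldots, \omega_T \in \mathbb{R}^{n_T}$

Make sequence of decisions $x_1 \in X_1, x_2 \in X_2, \ldots, x_T \in X_T$.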

The evolution of information is of fundamental importance to the decision-making process.

We make a decision now (x1)

Nature makes a random decision ω2: (“stuff” happens)

We make a second period decision x2 that attempts to repair the havoc wrought by nature (recourse).

Repeat as necessary...

We make decisions in stages, in between which uncertainty is revealed to us


Why use Stochastic Programming: The Newsvendor

Hot Off the Presses

A paperboy (newsvendor) needs to decide how many papers to buy in order to maximize his profit.

He doesn’t know at the beginning of the day how many papers he can sell (his demand).

Each newspaper costs c.

He can sell each newspaper for a price of q.

He can return each unsold newspaper at the end of the day for r.

(Obviously r < c < q).

The Newsvendor Problem

Given only knowledge of the probability distribution F of demand, how many papers should the newsvendor buy?



Newsvendor Problem

Suppose that the newsvendor’s goal is to maximize profit in the long run (in expectation).

Intuitively, it seems that the newsvendor’s best strategy is to purchase the average demand every day

Take Away Message!

The “optimal” solution is NOT to use the mean demand.

In fact, the two solutions can be far apart (depending on the distribution and the parameters r, c, q).



Example—The Newsvendor

c = 50, q = 70, r = 5

Demand: (Truncated) normally distributed, µ = 100, σ = 50

Mean Value Solution

Buy 100. (Duh!)

Expect to profit: 2000

TRUE long run profit ≈ 650

Stochastic Solution

Buy 75.

Expect to profit: 1500

TRUE long run profit ≈ 880

The difference between the two solutions (880 − 650) is called the value of the stochastic solution.

How much is it worth to you to plan using full uncertainty information as opposed to mean values for the uncertain parameters?
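The example can be reproduced roughly with a few lines of Python. This is a sketch under stated assumptions: the order quantity comes from the standard critical-fractile rule F(x) = (q − c)/(q − r) applied to the untruncated normal, and the truncation is modeled by clipping negative demand to zero. The slides do not say how the truncation was done, so the simulated profits need not match the figures quoted above exactly. Function names are illustrative.

```python
import random
from statistics import NormalDist

C, Q, R = 50.0, 70.0, 5.0      # cost, selling price, return price from the slide
MU, SIGMA = 100.0, 50.0        # demand parameters from the slide


def profit(x, demand):
    """Profit for order quantity x when `demand` papers are requested."""
    sold = min(x, demand)
    return Q * sold + R * (x - sold) - C * x


def critical_fractile_order():
    """Order quantity solving F(x) = (q - c) / (q - r) for normal demand."""
    ratio = (Q - C) / (Q - R)
    return NormalDist(MU, SIGMA).inv_cdf(ratio)


def simulate(x, n=100_000, seed=0):
    """Monte Carlo estimate of the long-run profit of ordering x every day."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        demand = max(0.0, rng.gauss(MU, SIGMA))  # assumption: clip at zero
        total += profit(x, demand)
    return total / n


stochastic_order = critical_fractile_order()
print("stochastic order quantity:", round(stochastic_order, 1))
print("long-run profit, mean-value order 100:", round(simulate(100.0)))
print("long-run profit, stochastic order:", round(simulate(stochastic_order)))
```

The gap between the last two printed numbers is the simulated counterpart of the value of the stochastic solution.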



A Take Away Message

The “Flaw” of Averages

The flaw of averages occurs when uncertainties are replaced by single average numbers in planning.

Did you hear the one about the statistician who drowned fording a river with an average depth of three feet?

Point Estimates

If you are planning with point estimates for demands, then you are planning sub-optimally

It doesn’t matter how carefully you choose the point estimate: it is impossible to hedge against future uncertainty by considering one realization of the uncertainty in your planning process


Stochastic Programming Success Stories: Financial Optimization

Russell-Yasuda Kasai

Yasuda Kasai: Seventh largest (worldwide) property and casualty insurer.

Assets of > ¥3.47 trillion

Liability structure is complex, but they want a tool that will allow them to maximize the revenue from these assets in the face of asset management restrictions

Frank Russell Company hired to develop an Asset-Liability Management Model based on (multistage) stochastic programming

Carino, Myers, Ziemba: second place in the Edelman prize competition of INFORMS.



Asset Allocation Model

Decisions:

Investment amounts for various assets

Random Events:

Return on investment for each asset

Liability payouts

Constraints:

Asset Allocation Constraints (Complex)

Loan Model

Liability Model

Compared to a performance benchmark established at Yasuda Kasai at the beginning of the Fiscal Year to measure the value added by their use of the model, the new model increased annual income by ¥9.5 billion.

Mr. Kunihiko Sasamoto, Director and Deputy President, Yasuda Kasai.



But Wait There’s More!

Ease of Use

Risk is well defined, not using some “abstract” measure like standard deviation

Improved other systems

Other models and IT systems “upgraded” to support new system

Improved Human Judgement

How to think about and incorporate uncertainty into the planning process



Product Portfolio Planning

Decisions:

Invest in various projects (all-or-nothing investment)

Complicated project prerequisite structure

Random Events: (HUGE impact)

Design-win from customers

Technology failures

Market forces

Constraints:

Resources

Hire-fire costs



Product Portfolio Management at Agere

We implemented a decision support tool for Agere

1 Optimization Model

2 Simulator of future conditions – (random events were correlated!)

The muckety-mucks loved it!

They like the ability to talk about the different scenarios.

Focuses discussion in business planning meetings

Gives “unbiased” simulator view of potential outcomes of decisions


Stochastic Programming Success Stories: Logistics

SP in the Supply Chain

Decisions:

Regular supply chain decision: How much? where? and when?

Random Elements:

Demands, prices, resource capacity.

Supply chains going global imply that companies are now more exposed to risky factors such as exchange rates and reliability of transfer channels.

Constraints:

Regular supply chain constraints: Flow balance, material availability, etc.



A Case Study

T. Santoso, S. Ahmed, M. Goetschalckx, and A. Shapiro. “A Stochastic Programming Approach for Supply Chain Network Design under Uncertainty,” European Journal of Operational Research, vol. 167, pp. 96-115, 2005.

Two real supply chains

One domestic (cardboard packages to breweries and soft drink manufacturers...)

One global

Sizes: Around 100 facilities, around 100 customers

In general, the (sampled) stochastic model was roughly 5% better than using the “mean value” of demand, translating into millions of dollars in potential savings.



Supply Chain Projects

Bulk Gas Production and Distribution

Uncertainty in customer demands, “competitor drain”

Built (prototype) optimization model and simulator.

They are now(?) doing a real implementation

Lesson Learned

Having a (static) simulation of the production-distribution process is a key component of the project


Stochastic Programming Success Stories: Other Industries

Other Industrial Applications of SP

Energy Industry

Unit Commitment Problem: Schedule production from power generation units

Telecommunication

Capacity/bandwidth planning: Invest in capacity for the network before you know the true bandwidth demands

Military

Network Interdiction Problem: Where to place “agents” on a network to “interrupt” evil-doers

It ain’t that rosy

As far as I know, most implementations are built on a case-by-case basis and are fairly ad hoc.


Why More People Don’t Use SP

Stochastic Programming Objectives—Risk Profile

What is your goal?

1 I want to do well on average

Expected Value

2 I want to limit my exposure in the “worst” case or cases

Value at Risk/Conditional Value at Risk

3 I want the probability that I achieve a goal to be sufficiently high

Chance constraints

4 I want to achieve a “steady” return

Dispersion-based objectives

Each of these implies a different notion of risk, and leads to a different stochastic optimization problem

Stochastic Programming isn’t about getting a number, it’s about getting a distribution that looks good to you



Some SP Objectives

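$\min F(x, \bar{\omega})$, with $\bar{\omega} = E[\omega]$: Mean-Value Problem

$\min E_\omega F(x, \omega)$: Risk Neutral

$\min E_\omega F(x, \omega) - \lambda \rho(F(x, \omega))$: Risk Measures

$\rho(F(x, \omega)) = \operatorname{Var} F(x, \omega)$: Markowitz

$\rho(F(x, \omega)) = E[(E F(x, \omega) - F(x, \omega))^+]$: Semideviation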
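For a fixed decision x and a finite sample of outcomes F(x, ω), these quantities are a few lines of arithmetic. The sketch below treats the outcomes as equally likely and follows the sign convention written on the slide (expected value minus λ times the risk measure); the function names and sample values are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Var F: average squared deviation from the mean (Markowitz)."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def semideviation(values):
    """E[(E F - F)^+]: average shortfall of an outcome below the mean."""
    m = mean(values)
    return mean([max(m - v, 0.0) for v in values])


def risk_adjusted(values, lam, rho):
    """E F - lam * rho(F), matching the form shown on the slide."""
    return mean(values) - lam * rho(values)


# Hypothetical outcomes F(x, w_j) for one fixed decision x.
outcomes = [1.0, 2.0, 3.0, 4.0]
print(mean(outcomes), variance(outcomes), semideviation(outcomes))
print(risk_adjusted(outcomes, 0.5, variance))
```

Comparing two candidate decisions then amounts to comparing these numbers computed on each decision's outcome sample, which is one way to look at "a distribution that looks good to you".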



Things People Want

Arbitrary Distributions

(Conditional) Value at Risk

Network Problems

Scenario Trees

Stochastic Dynamic Programming

Robust Optimization

(Joint) Chance Constraints

Stochastic Dominance

Stochastic Control

Joint Distributions

Nonlinear problems

Integer problems

Free Beer



Supporting Stochastic Programs

I point out all these different flavors of SP to highlight what I think has been one of the hindrances to having a modeling language for SP.

I don’t know the key to success, but the key to failure is trying to please everybody.

Bill Cosby (1937 - )

I believe the fact that a “stochastic program” is not a well-defined concept is one of the fundamental reasons why more people don’t use stochastic programming

Other reasons people don’t use stochastic programming?



Why Don’t More People Use Stochastic Programming

They don’t start their training early enough!

Jacob Linderoth, age 4 months, reading Introduction to Stochastic Programming


Why More People Don’t Use SP: Barriers to Stochastic Programming

Why Don’t More People Use Stochastic Programming

Because they don’t know the probability distribution?

Even crude approximations can help

Because they can’t “solve” them?

Linderoth and Wright solve a 10-million scenario problem

Recent theory suggests that you don’t need to include many scenarios to get an accurate solution to the true problem

Because they can’t model them?

Modeling tools are on the way (more later)

Because it is hard to verify that the solution is better

The same could be said of deterministic optimization

Use simulation to verify that the solution is better



Probability Management

A “true believer” is Sam Savage (consulting professor at Stanford).

He believes companies should have a comprehensive probability management plan.

Probability Management

Simulations to generate distributions

Information systems to hold distributions of key uncertain inputs

A “Chief probability officer” responsible for signing off on the distributions

You can start small...

1 What are your scenarios and distributions?

2 Do you have models that can use this information?


Algorithms

ALGORITHMS

I focus almost exclusively on two-stage recourse problems


Algorithms: Two-Stage Stochastic Programs with Recourse

Stochastic Programming

A Stochastic Program

\[ \min_{x \in X} E_\omega F(x, \omega) \]

2 Stage Stochastic LP w/Recourse

\[ F(x, \omega) \overset{\text{def}}{=} c^T x + Q(x, \omega) \]

$c^T x$: Pay me now

$Q(x, \omega)$: Pay me later

The Recourse Problem

\[ Q(x, \omega) \overset{\text{def}}{=} \min \{ q^T y : Wy = h(\omega) - T(\omega)x,\ y \ge 0 \} \]

Expected Recourse Function:

\[ Q(x) \overset{\text{def}}{=} E_\omega [Q(x, \omega)] \]

Two-Stage Stochastic LP

\[ \min_{x \ge 0,\ Ax = b} c^T x + Q(x) \]

Algorithms: Extensive Form

Extensive Form

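Assume $\Omega = \{\omega_1, \omega_2, \ldots, \omega_S\} \subseteq \mathbb{R}^r$, $P(\omega = \omega_s) = p_s$, $\forall s = 1, 2, \ldots, S$

$T_s \equiv T(\omega_s)$, $h_s = h(\omega_s)$

Then can write extensive form:

\[
\begin{aligned}
\min\ & c^T x + p_1 q^T y_1 + p_2 q^T y_2 + \cdots + p_S q^T y_S \\
\text{s.t.}\ & Ax = b \\
& T_1 x + W y_1 = h_1 \\
& T_2 x + W y_2 = h_2 \\
& \qquad \vdots \\
& T_S x + W y_S = h_S \\
& x \in X,\ y_1 \in Y,\ y_2 \in Y,\ \ldots,\ y_S \in Y
\end{aligned}
\]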

The Upshot!

This is just a larger linear program

It is a larger linear program that also has special structure
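The special structure is easy to see when the constraint matrix is assembled explicitly. Below is a minimal sketch with dense nested lists; the helper names are invented for illustration, and a real implementation would use sparse matrices and hand the result to an LP solver. Each scenario contributes one block row [T_s, 0, ..., W, ..., 0], so the first-stage columns are the only coupling between scenarios.

```python
def extensive_form(A, b, W, T_list, h_list):
    """Stack Ax = b and T_s x + W y_s = h_s into one equality system.

    Columns are ordered as (x, y_1, ..., y_S). Returns (rows, rhs)."""
    n1 = len(A[0])          # number of first-stage variables
    n2 = len(W[0])          # number of recourse variables per scenario
    num_scenarios = len(T_list)
    rows, rhs = [], []
    for a_row, b_i in zip(A, b):
        rows.append(list(a_row) + [0.0] * (n2 * num_scenarios))
        rhs.append(b_i)
    for s, (T, h) in enumerate(zip(T_list, h_list)):
        start = n1 + s * n2  # first column of the y_s block
        for t_row, w_row, h_i in zip(T, W, h):
            row = list(t_row) + [0.0] * (n2 * num_scenarios)
            row[start:start + n2] = w_row
            rows.append(row)
            rhs.append(h_i)
    return rows, rhs


def extensive_cost(c, q, probs):
    """Objective vector (c, p_1 q, ..., p_S q)."""
    return list(c) + [p * q_j for p in probs for q_j in q]


# Hypothetical tiny instance: 2 first-stage variables, 2 scenarios.
A, b = [[1.0, 1.0]], [10.0]
W = [[1.0, -1.0]]
T_list = [[[1.0, 0.0]], [[2.0, 0.0]]]
h_list = [[3.0], [5.0]]
rows, rhs = extensive_form(A, b, W, T_list, h_list)
cost = extensive_cost([1.0, 2.0], [4.0, 6.0], [0.25, 0.75])
for row, value in zip(rows, rhs):
    print(row, "=", value)
```

The matrix grows by one block row and one block column per scenario, which is why the extensive form stops being practical once the number of scenarios gets large.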



Best-Known Solution Procedure

METHOD



Small SP’s are Easy!

[Figure: solution time versus number of scenarios (0 to 300), comparing Cplex on the extensive form with the L-shaped method]


Algorithms: The L-Shaped Method

Two-Stage Stochastic Linear Programming

We assume that P has finite support, so ω has a finite number of possible realizations (scenarios):

\[ Q(x) = \sum_{i=1}^{N} p_i Q(x, \omega_i) \]

For a partition of the N scenarios into sets $N_1, N_2, \ldots, N_t$, let $Q_{[j]}(x)$ be the contribution of the jth set to Q(x):

\[ Q_{[j]}(x) \overset{\text{def}}{=} \sum_{i \in N_j} p_i Q(x, \omega_i) \]

so then $Q(x) = \sum_{j=1}^{t} Q_{[j]}(x)$



Important (and well-known) Facts

$Q(x, \omega_i)$, $Q_{[\cdot]}(x)$, and $Q(x)$ are piecewise linear convex functions of x.

If $\pi_i$ is an optimal dual solution to the linear program corresponding to $Q(x, \omega_i)$, then $-T_i^T \pi_i \in \partial Q(x, \omega_i)$

\[ g_j(x) \overset{\text{def}}{=} \sum_{i \in N_j} -p_i T_i^T \pi_i \in \partial Q_{[j]}(x). \]

Key Idea

Represent $Q_{[j]}(x)$ by an artificial variable $\theta_j$ and find supporting planes for $\theta_j$

\[ \theta_j \ge Q_{[j]}(x^k) + g_j(x^k)^T (x - x^k) \qquad (*) \]

Point of Decomposition

Evaluation of Q(x) is separable

We can solve linear programs corresponding to each $Q(x, \omega_i)$ independently, in parallel!



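Worth 1000 Words?

[Figure: plot of Q(x) against x]

Worth 1000 Words

[Figure: plot of Q(x) against x, with the iterate xk marked]

Worth 1000 Words

[Figure: plot of Q(x) against x, with the iterates x1 and x2 marked]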



(Multicut) L-shaped method

[Figure: master problem M linked to subproblems s1, s2, s3, s4, s5]

1 Solve the master problem M with the current approximation to Q(x) for $x^k$.

2 Solve the subproblems $(s_j)$, evaluating $Q(x^k)$ and obtaining subgradient(s) to update the master approximation M

3 k = k+1. Go to 1.

Let’s Get Parallel!

Of course, solution of sj can be carried out independently.
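The master/subproblem loop can be sketched on a one-dimensional toy problem where everything is available in closed form. Assumptions, all mine rather than from the talk: x is a scalar in [lo, hi], the recourse is the simple penalty Q(x, h) = λ (h − x)^+ so the optimal dual is λ when h > x and 0 otherwise (with T = 1), a single aggregated cut is added per iteration, and the master is solved by enumerating cut intersections instead of calling an LP solver.

```python
def recourse(x, lam, scenarios):
    """Expected recourse Q(x) and one subgradient; scenarios = [(p_i, h_i)]."""
    value = sum(p * lam * max(h - x, 0.0) for p, h in scenarios)
    # Subgradient -sum_i p_i * T_i * pi_i with T_i = 1 and pi_i = lam if h_i > x.
    slope = -sum(p * lam for p, h in scenarios if h > x)
    return value, slope


def solve_master(c, cuts, lo, hi):
    """min c*x + theta  s.t. theta >= a + b*x for every cut, lo <= x <= hi."""
    def objective(x):
        return c * x + max(a + b * x for a, b in cuts)
    # A convex piecewise linear function on an interval attains its minimum
    # at an endpoint or where two cuts intersect.
    candidates = [lo, hi]
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)
                if lo <= x <= hi:
                    candidates.append(x)
    best = min(candidates, key=objective)
    return best, objective(best)


def l_shaped(c, lam, scenarios, lo, hi, tol=1e-9, max_iter=100):
    cuts, x = [], lo
    lower, upper, best_x = float("-inf"), float("inf"), lo
    for _ in range(max_iter):
        q_val, g = recourse(x, lam, scenarios)         # step 2: subproblems
        if c * x + q_val < upper:
            upper, best_x = c * x + q_val, x
        if upper - lower <= tol:
            break
        cuts.append((q_val - g * x, g))                # theta >= Q(xk) + g*(x - xk)
        x, lower = solve_master(c, cuts, lo, hi)       # step 1: master
    return best_x, upper


scenarios = [(0.25, 1.0), (0.25, 2.0), (0.25, 3.0), (0.25, 4.0)]
print(l_shaped(c=1.0, lam=3.0, scenarios=scenarios, lo=0.0, hi=10.0))
```

The master value is a lower bound and the best evaluated point gives an upper bound, so the loop stops when the two meet. In a real two-stage LP the call to `recourse` becomes one LP solve per scenario, which is the part that parallelizes.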



Warning!

If Q(x) is not convex, then this algorithm doesn’t work

If you have integer recourse variables $y \in \mathbb{Z}^p \times \mathbb{R}^{n-p}$, the problem becomes significantly more difficult.

Your Options

Give your favorite solver the full extensive form (and pray)

Weak relaxation

Decomposition method: Carøe and Schultz, Sen

Spatial branch and bound:

Want to know about stochastic integer programming/spatial branch and bound?

Talk to Nick!



Does it Work? The World’s Largest LP

Linderoth and Wright built a fancy decomposition-based solver capable of running on “the grid”

Storm: a stochastic cargo-flight scheduling problem (Mulvey and Ruszczynski)

We aim to solve an instance with 10,000,000 scenarios

$x \in \mathbb{R}^{121}$, $y_k \in \mathbb{R}^{1259}$

The deterministic equivalent LP is of size

$A \in \mathbb{R}^{985{,}032{,}889 \times 12{,}590{,}000{,}121}$



The Super Storm Computer

Number  Type           Location
184     Intel/Linux    Argonne
254     Intel/Linux    New Mexico
36      Intel/Linux    NCSA
265     Intel/Linux    Wisconsin
88      Intel/Solaris  Wisconsin
239     Sun/Solaris    Wisconsin
124     Intel/Linux    Georgia Tech
90      Intel/Solaris  Georgia Tech
13      Sun/Solaris    Georgia Tech
9       Intel/Linux    Columbia U.
10      Sun/Solaris    Columbia U.
33      Intel/Linux    Italy (INFN)
1345    Total



TA-DA!!!!!

Wall clock time                            31:53:37
CPU time                                   1.03 Years
Avg. # machines                            433
Max # machines                             556
Parallel Efficiency                        67%
Master iterations                          199
CPU Time solving the master problem        1:54:37
Maximum number of rows in master problem   39647



Number of Workers

[Figure: number of workers (0 to 600) versus elapsed time in seconds (0 to 140000)]

Sampling

Why Sampling is Necessary

ys ≡ y(ωs) is the recourse action to take if scenario ωs occurs.

Pro: It’s a linear program.

Con: It’s a BIG linear program.

Imagine the following (real) problem. A Telecom company wants to expand its network in a way that meets an unknown (random) demand.

There are 86 unknown demands. Each demand is independent and may take on one of five values.

\[ S = |\Omega| = \prod_{k=1}^{86} 5 = 5^{86} \approx 1.29 \times 10^{60} \]

The number of subatomic particles in the universe.

How do we solve a problem that has more variables and more constraints than the number of subatomic particles in the universe?
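The scenario count for independent demands is just a product, and Python's exact integers make it easy to check. The function name is illustrative.

```python
def scenario_count(num_demands, values_per_demand):
    """Number of joint scenarios when every demand independently takes
    one of `values_per_demand` values."""
    return values_per_demand ** num_demands


count = scenario_count(86, 5)
print(f"{count:.3e}")          # scientific notation
print(len(str(count)), "digits")
```

Since the extensive form needs one block of recourse variables and constraints per scenario, a count of this size rules out writing the problem down at all.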



But It’s Even Worse!

The answer is we can’t!

If Ω is not a countable set, say if it is made up of continuous-valued random variables, our “deterministic equivalent” would have ∞ variables and constraints. :-)

We solve an approximating problem obtained through sampling.

The Very Good News

Using Monte-Carlo methods (Sample Average Approximation), we can obtain high-quality solutions

Even Better: Can obtain (statistical) bounds on the quality of the solution



Sample Average Approximation (SAA)

The Story

Solving two-stage SP exactly is often impossible

Solving two-stage SP approximately is often easy: Sample Average Approximation (SAA)

I view SAA as the Jeff Linderoth of solution methods

It ain’t smart

It ain’t sexy

But it generally does work!



SAA for Dummies

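Let v∗ be the optimal value of the “true” problem:

\[ v^* \overset{\text{def}}{=} \min_{x \in X} \Big\{ f(x) \overset{\text{def}}{=} E_\omega F(x, \omega) \Big\} \]

Take a sample $(\omega^1, \ldots, \omega^N)$ of N realizations of the vector ω, and form the sample average function

\[ f_N(x) \overset{\text{def}}{=} N^{-1} \sum_{j=1}^{N} F(x, \omega^j) \]

For Stochastic LP w/recourse, evaluating $f_N(x)$ ⇒ solve one LP for each of N scenarios

Optimize sample average function:

\[ v_N \overset{\text{def}}{=} \min_{x \in X} \Big\{ f_N(x) \overset{\text{def}}{=} N^{-1} \sum_{j=1}^{N} F(x, \omega^j) \Big\} \]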



SAA for Dummies, Cont.

Note that $v_N$ is a random variable, as it depends on the (random) sample of size N

From this information, we can get bounds on the optimal solution value v∗

All “Good” Talks Contain...

Thm. $E(v_N) \le v^* \le f(x) \quad \forall x$



Making SAA Work

Take a solution x from a SAA instance

We are mostly interested in estimating the quality of a given solution x. This is f(x) − v∗.

1 Get an upper bound on v∗ from f(x). Estimate f(x) by solving N′ (completely independent) linear programs: recourse LPs with x fixed.

\[ f_{N'}(x) \overset{\text{def}}{=} (N')^{-1} \sum_{j=1}^{N'} F(x, \omega^j) \]

2 Get a lower bound on v∗ from $E(v_N)$. Estimate $E(v_N)$ by solving M independent stochastic LPs, giving optimal values $v_N^1, v_N^2, \ldots, v_N^M$

\[ E(v_N) \approx M^{-1} \sum_{j=1}^{M} v_N^j \]

Independent ⇒ no synchronization ⇒ good for the Grid

Independent ⇒ can construct confidence intervals around the estimates
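Here is a rough sketch of the two estimates on the newsvendor example, written as a cost minimization (cost = negative profit) so that it matches the min form used for v∗. Assumptions of mine: demand is normal with µ = 100, σ = 50 clipped at zero, each SAA instance is solved exactly by picking the sample quantile at the critical ratio (q − c)/(q − r) instead of solving an LP, and the values of M, N, N′ are arbitrary. Confidence intervals are left out to keep it short.

```python
import math
import random

C, Q, R = 50.0, 70.0, 5.0   # cost, price, return value (newsvendor example)


def cost(x, demand):
    """Negative profit of ordering x when `demand` is realized."""
    sold = min(x, demand)
    return C * x - Q * sold - R * (x - sold)


def sample_demand(rng, n, mu=100.0, sigma=50.0):
    return [max(0.0, rng.gauss(mu, sigma)) for _ in range(n)]


def sample_average(x, demands):
    """f_N(x): average cost of decision x over the sampled demands."""
    return sum(cost(x, d) for d in demands) / len(demands)


def solve_saa(demands):
    """Minimize f_N exactly: the k-th smallest demand with
    k = ceil(ratio * N) is a minimizer of the piecewise linear f_N."""
    ordered = sorted(demands)
    k = math.ceil((Q - C) / (Q - R) * len(ordered))
    x = ordered[max(k, 1) - 1]
    return x, sample_average(x, ordered)


def saa_bounds(M=20, N=50, N_prime=20000, seed=1):
    rng = random.Random(seed)
    solutions = [solve_saa(sample_demand(rng, N)) for _ in range(M)]
    lower = sum(v for _, v in solutions) / M        # estimate of E(v_N)
    x_hat = solutions[0][0]                         # one candidate solution
    upper = sample_average(x_hat, sample_demand(rng, N_prime))  # f_N'(x_hat)
    return lower, upper, x_hat


lower, upper, x_hat = saa_bounds()
print("candidate x:", round(x_hat, 1))
print("lower bound estimate:", round(lower, 1))
print("upper bound estimate:", round(upper, 1))
```

Both numbers are noisy estimates, so for small M the lower estimate can occasionally land above the upper one; the ordering in the theorem holds for the expectations, not for every sample.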



More Theory

A very interesting result of Shapiro and Homem-de-Mello says the following:

Suppose that $x^*$ is the unique optimal solution to the “true” problem

Let $x_N$ be the solution to the sampled approximating problem

Under certain conditions, the event $(x_N = x^*)$ happens with probability 1 for N large enough.

The probability of this event approaches 1 exponentially fast as N → ∞!!

There exists a constant β > 0 such that

\[ \limsup_{N \to \infty} N^{-1} \log[1 - P(x_N = x^*)] \le -\beta. \]

This is a qualitative result indicating that it might not be necessary to have a large sample size in order to solve the true problem exactly.

For a problem with $5^{1000}$ scenarios a sample of size N ≈ 400 is required in order to find the true optimal solution with probability 95%!!!



Does SAA Work on “Real” Problems?

M = 10 times: solve a stochastic sampled approximation of size N.

Compute confidence interval on lower bound estimate $E(v_N)$

Choose one x from the solutions to the M SAA instances and compute a confidence interval on the upper bound estimate $f_{N'}(x)$, with N′ = 10000

Test Instances

Name    Application              |Ω|
LandS   HydroPower Planning      10^6
gbd     Aircraft Allocation      6.46 × 10^5
storm   Cargo Flight Scheduling  6 × 10^81
20term  Vehicle Assignment       1.1 × 10^12
ssn     Telecom. Network Design  10^70


20term Convergence

[Figure: lower bound and upper bound estimates (Value, 251500 to 255500) versus sample size N (10 to 10000)]


ssn Convergence

[Figure: bound estimates (Value, 2 to 18) versus sample size N (10 to 10000)]



storm Convergence

[Figure: bound estimates (Value, 1.544e+06 to 1.555e+06) versus sample size N (10 to 10000)]



gbd Convergence

[Figure: bound estimates (Value, 1500 to 1800) versus sample size N (10 to 10000)]



Multistage SPs

Multistage Stochastic LP

[Figure: timeline alternating random events and decisions: ω1, x1, ω2, x2, ω3, ..., xT−1, ωT, xT]

Random vectors $\omega_1 \in \mathbb{R}^{n_1}, \omega_2 \in \mathbb{R}^{n_2}, \ldots, \omega_T \in \mathbb{R}^{n_T}$

Make sequence of decisions $x_1 \in X_1, x_2 \in X_2, \ldots, x_T \in X_T$.

Risk Neutral: We always aim to optimize the expected value of our current decision $x_t$

Linear: Assume $X_t$ are polyhedra

Discrete: Assume $\omega_t$ are drawn from a discrete distribution.

The Hard Part

Decisions made at period t ($x_t$) must only depend on events and decisions up to period t



The Stickler. My Favorite Eight Syllable Word.

We need to enforce nonanticipativity.

Other eight-syllable words...

autosuggestibility, incommensurability, electroencephalogram, unidirectionality

At any point in time, different scenarios “look the same”

We can’t allow different decisions for these scenarios.

We are not allowed to anticipate the outcome of future random events when making our decision now.

How to do it?

1 Use Tree Structure (Nested Decomposition)

2 Create (extra) variables for all possible scenarios, and enforce equality between decisions that should be nonanticipative (Progressive Hedging)



Scenario Tree

[Figure: scenario tree rooted at x0, branching on ξ1 and ξ2, with a node xn and its predecessor xρ(n) marked]

N: Set of nodes in the tree

ρ(n): Unique predecessor of node n in the tree

S(n): Set of successor nodes of n

$q_n$: Probability that the sequence of events leading to node n occurs

$x_n$: Decision taken at node n

Warning!

Scenario Trees can get big

There are some tools that try to “prune” the tree while keeping similar statistical properties in the stochastic process
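The tree notation maps directly onto a small data structure. The sketch below is one possible encoding, with names of my choosing: each node stores its predecessor ρ(n), its successors S(n), and the conditional probability of being reached from its predecessor, and the path probability q_n is the product of those conditional probabilities back to the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    cond_prob: float = 1.0                     # P(this node | predecessor)
    parent: Optional["Node"] = field(default=None, repr=False)   # rho(n)
    children: List["Node"] = field(default_factory=list)         # S(n)

    def add_child(self, name, cond_prob):
        child = Node(name, cond_prob, parent=self)
        self.children.append(child)
        return child

    def path_probability(self):
        """q_n: probability of the event sequence leading to this node."""
        prob, node = 1.0, self
        while node is not None:
            prob *= node.cond_prob
            node = node.parent
        return prob


def leaves(node):
    """Leaf nodes below `node`; each leaf corresponds to one scenario."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


# Hypothetical two-period tree with two outcomes per period.
root = Node("root")
up = root.add_child("up", 0.6)
down = root.add_child("down", 0.4)
up.add_child("up-up", 0.5)
up.add_child("up-down", 0.5)
down.add_child("down-up", 0.25)
down.add_child("down-down", 0.75)

for leaf in leaves(root):
    print(leaf.name, leaf.path_probability())
```

Scenarios that share a node share the decision stored at that node, which is how the tree form builds in nonanticipativity. With K outcomes per period the number of leaves multiplies by K at every stage, which is why the tree gets big quickly.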



Multistage Stochastic Programming

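Extensive Form

\[ z_{SP} = \min \Big\{ \sum_{n \in N} q_n c_n^T x_n \;\Big|\; T_n x_{\rho(n)} + W_n x_n = h_n \ \ \forall n \in N \Big\} \]

Value Function of node n

\[ Q_n(x_{\rho(n)}) \overset{\text{def}}{=} \min_{x_n} \Big\{ c_n^T x_n + \sum_{m \in S(n)} q_{mn} Q_m(x_n) \;\Big|\; W_n x_n = h_n - T_n x_{\rho(n)} \Big\} \]

$q_{mn}$: conditional probability of node m given node n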

Tree structure encodes nonanticipativity


Multistage SPs: Algorithm

Nested Decomposition

0: Root node of the scenario tree

x0: Initial state of the system

Recursive Formulation

zSP = Q0(x0)

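Cost to go: $G_n(x) \overset{\text{def}}{=} \sum_{m \in S(n)} q_{mn} Q_m(x)$

$M_n^k(x)$: Lower bound on $G_n(x)$ in iteration k

\[ Q_n(x_{\rho(n)}) \ge \min_{x_n} \Big\{ c_n^T x_n + M_n^k(x_n) \;\Big|\; W_n x_n = h_n - T_n x_{\rho(n)} \Big\} \qquad (\text{MLP}_n) \]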



Building $M_n^k(x)$

Create a partition (or clustering $C_n$) of S(n)

A lower bound $m_{n[j]}^k$ for each element of the partition (each cluster) is created independently

\[ M_n^k(x) \overset{\text{def}}{=} \sum_{j \in C_n} m_{n[j]}^k(x) \]

\[ m_{n[j]}^k(x) \overset{\text{def}}{=} \inf \big\{ \theta_j \;\big|\; \theta_j e \ge F_{n[j]}^k x + f_{n[j]}^k \big\} \]

$F_{n[j]}^k$, $f_{n[j]}^k$ obtained from dual solutions (to form subgradients) of linear programs of nodes within cluster [j]

$M_n^k(x^*) \to G_n(x^*)$



Action Pictures

[Figure: scenario-tree animation showing x0, the realizations ξ1, ξ2, ξ3, and cuts $(F_{n[j]}^k, f_{n[j]}^k)$ attached to nodes]

1 Solve $\text{MLP}_0$ to get x0. Send policy forward

2 Solve each $\text{MLP}_{S(0)}$ using x0 and realizations ξ1

3 Continue forward to end

4 Go backwards. Send cuts from children back to parent. Update $\text{MLP}_n$ and re-solve.

5 Lather, Rinse, Repeat.



A small Multistage Telecom Problem

[Figure: small network with nodes A, B, C, D, E, F]

Set of stages T

Set J of links

Sets $I_t$ of demands

Random demand $d_t(\xi) \in \mathbb{R}^{|I_t|}$

Budget each period

Install capacity on links each period to minimize the total expected unserved demand



Some (Limited) Computational Results

T = 5

K: Realizations/Period

N: Number of scenarios

DE: Size of deterministic equivalent

K    N       DE Size
30   0.81M   18M × 31M
50   6.25M   140M × 236M
60   12.9M   290M × 488M



Computational Results

It: Number of iterations (times $\text{MLP}_0$ was solved)

E: Parallel efficiency.

\[ E = \frac{\text{Time machines solving } \text{MLP}_n}{\text{Time machines available}} \]

K   It  Avg Workers  Wall Time    CPU Time       E
30  9   62           2:34:21      6:15:15:10     67
50  7   75           1:12:49:27   85:20:24:15    77
60  11  162          3:16:51:00   431:12:15:37   73


Multistage SPs: Modeling Tools

Existing Modeling Tools

Many stochastic programming implementations I’m aware of have been built from scratch

But there are some modeling tools on the way

Name              Author(s)                 Comment
AIMMS             AIMMS Team                Commercial
Gams              Gams Team                 Commercial
MPL               Kristjensen               Commercial
XPRESS-SP         Verma, Dash Opt.          Commercial, Beta
SPiNE             Valente, CARISMA
STRUMS            Fourer and Lopes          Prototype(?)
SUTIL             Czyzyk and Linderoth      C++ classes
SLPLib            Felt, Sarich, Ariyawansa  Open Source C Routines
COIN-Smi, SP/OSL  COIN, IBM                 C++ methods



Existing Solution Tools

Most stochastic programming implementations of which I’m aware merely form and solve the extensive form

Other software:

Name     Author(s)          Comment
AIMMS    AIMMS Team         Commercial, LShaped method
SLP-IOR  Kall, Mayer        LShaped, Stochastic Decomposition, others
MSLiP    Gassmann           Nested LShaped
SPInE    Valente, CARISMA   Commercial, LShaped method, may not exist anymore
BNBS     Altenstedt         Nested LShaped method, Open source
ATR      Linderoth, Wright  Designed to run in parallel. Not simple to build and run



Conclusions


A tool for decision making under uncertainty

Considers the impact of recourse decisions

It may not be the answer, but it does help you hedge against upcoming uncertainty

More importantly, it gets people talking about the impact of uncertainty in the decision making process

Planning with “mean-value” estimates will not lead to an optimal policy

Used with some success in industry

Financial Services (Many successes)

Logistics and Supply Chain (Fewer successes, but coming!)

Tools and algorithms are “on the way”



We Want YOU!

To consider using Stochastic Programming as a decision support tool to help manage in turbulent times!

Thanks!

I am happy to help. email: [email protected]

http://www.stoprog.org/




Some Take Away Quotes

“If a man will begin with certainty, he shall end in doubts, but if he will be content to begin with doubts, he shall end in certainties”

— Francis Bacon

“It is a good thing for the uneducated person to read books of quotations”

—Winston Churchill

