september6,2011 - zccfe.uzh.ch · dynamicprogramming kennethl.judd (joint work with yongyang cai)...

Dynamic Programming

Kenneth L. Judd

(Joint work with Yongyang Cai)

September 6, 2011

Dynamic Programming

- Powerful for solving dynamic stochastic optimization problems
  - Based on the principle of recursion due to Bellman and Isaacs
  - Replaces multiperiod optimization problems with a sequence of two-period problems

Applications

- Economics
  - Business investment
  - Life-cycle decisions on labor, consumption, education, portfolio choice
  - Economic policy
- Operations Research
  - Scheduling, queueing
  - Inventory
- Climate change
  - Economic response to climate policies
  - Optimal policy response to global warming problems

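Canonical Example in Economics

General Stochastic Accumulation

- Problem:

$$V(k, \theta) = \max_{c_t, \ell_t} E\left\{ \sum_{t=0}^{\infty} \beta^t u(c_t, \ell_t) \right\}$$

subject to

$$k_{t+1} = F(k_t, \ell_t, \theta_t) - c_t, \qquad \theta_{t+1} = g(\theta_t, \varepsilon_t), \qquad k_0 = k, \quad \theta_0 = \theta.$$

- State variables:
  - $k$: productive capital stock, endogenous
  - $\theta$: productivity state, exogenous
- The dynamic programming formulation is

$$V(k, \theta) = \max_{c, \ell} \; u(c, \ell) + \beta E\left\{ V(F(k, \ell, \theta) - c, \theta^+) \mid \theta \right\}, \tag{12.1.21}$$

where $\theta^+$ is next period's $\theta$ realization.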

Definitions

Discrete-Time Dynamic Programming

- Objective:

$$E\left\{ \sum_{t=1}^{T} \pi(x_t, u_t, t) + W(x_{T+1}) \right\} \tag{1}$$

- $X$: set of states
- $D$: the set of controls
- $\pi(x, u, t)$: payoffs in period $t$, for $x \in X$ at the beginning of period $t$, when control $u \in D$ is applied in period $t$
- $D(x, t) \subseteq D$: controls which are feasible in state $x$ at time $t$
- $F(A; x, u, t)$: probability that $x_{t+1} \in A \subset X$ conditional on time $t$ control and state
- Value function

$$V(x, t) \equiv \sup_{U(x,t)} E\left\{ \sum_{s=t}^{T} \pi(x_s, u_s, s) + W(x_{T+1}) \;\Big|\; x_t = x \right\} \tag{2}$$

- Bellman equation

$$V(x, t) = \sup_{u \in D(x,t)} \pi(x, u, t) + E\left\{ V(x_{t+1}, t+1) \mid x_t = x, u_t = u \right\} \tag{3}$$

- Existence: boundedness of $\pi$ is sufficient
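The Bellman equation (3) can be solved by backward induction when states and controls are finite. The following is a minimal sketch under that assumption, not the authors' implementation: every control is taken to be feasible in every state, periods are indexed from 0 rather than 1, and the two-state toy problem is hypothetical.

```python
import numpy as np

def backward_induction(payoff, trans, terminal):
    """Solve a finite-horizon DP on finite state and control sets.

    payoff[t, x, u] : pi(x, u, t), periods indexed 0..T-1 (the slides use 1..T)
    trans[u, x, y]  : probability of moving from state x to state y under control u
    terminal[x]     : terminal payoff W(x)
    """
    T, nx, _ = payoff.shape
    V = np.empty((T + 1, nx))
    policy = np.empty((T, nx), dtype=int)
    V[T] = terminal
    for t in range(T - 1, -1, -1):
        # Q[x, u] = pi(x, u, t) + E[V(x_{t+1}, t+1) | x_t = x, u_t = u]
        Q = payoff[t] + np.einsum("uxy,y->xu", trans, V[t + 1])
        policy[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return V, policy

# Hypothetical toy problem: two states; control 0 stays put, control 1 switches
# state; the period payoff equals the current state index.
T = 3
payoff = np.tile(np.array([[0.0, 0.0], [1.0, 1.0]]), (T, 1, 1))
trans = np.array([[[1.0, 0.0], [0.0, 1.0]],    # control 0: stay
                  [[0.0, 1.0], [1.0, 0.0]]])   # control 1: switch
V, policy = backward_induction(payoff, trans, terminal=np.zeros(2))
print(V[0], policy[0])
```

With continuous states the table `V[t]` is no longer available, which is what motivates the parametric approximation of the value function.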

Parametric Approach

General Parametric Approach: Approximating T

- For each $x_j$, $(TV)(x_j)$ is defined by

$$v_j = (TV)(x_j) = \max_{u \in D(x_j)} \pi(u, x_j) + \beta \int V(x^+; a)\, dF(x^+ \mid x_j, u) \tag{4}$$

- In practice, we compute the approximation $\hat{T}$:

$$v_j = (\hat{T}V)(x_j) \doteq (TV)(x_j)$$

- Integration step: with weights $\omega_\ell$ and nodes $\varepsilon_\ell$ from some numerical quadrature formula,

$$E\{V(x^+; a) \mid x_j, u\} = \int V(x^+; a)\, dF(x^+ \mid x_j, u) = \int V(g(x_j, u, \varepsilon); a)\, dF(\varepsilon) \doteq \sum_{\ell} \omega_\ell V(g(x_j, u, \varepsilon_\ell); a)$$

- Maximization step: for $x_i \in X$, evaluate

$$v_i = (TV)(x_i)$$

- Fitting step:
  - Data: $(v_i, x_i)$, $i = 1, \ldots, n$
  - Objective: find an $a \in \mathbb{R}^m$ such that $V(x; a)$ best fits the data
  - Methods: determined by $V(x; a)$
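The three steps can be sketched in a few lines. This is an illustrative sketch only: the quadratic payoff, the linear transition `g`, the standard normal shock, the grid search over controls (standing in for a proper optimizer) and the Chebyshev family for $V(x; a)$ are all assumptions chosen for the example.

```python
import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial.hermite_e import hermegauss

def normal_quadrature(n):
    """Gauss-Hermite nodes and weights for a standard normal shock."""
    nodes, weights = hermegauss(n)
    return nodes, weights / np.sqrt(2.0 * np.pi)   # weights now sum to 1

def expected_value(V, g, x, u, nodes, weights):
    """Integration step: E[V(g(x, u, eps))] ~ sum_l w_l V(g(x, u, eps_l))."""
    return sum(w * V(g(x, u, e)) for e, w in zip(nodes, weights))

def bellman_step(V, x_nodes, u_grid, payoff, g, beta, nodes, weights, deg, domain):
    """Maximization step at each x_i, then fitting step for the new approximation."""
    v = np.array([
        max(payoff(u, x) + beta * expected_value(V, g, x, u, nodes, weights)
            for u in u_grid)
        for x in x_nodes
    ])
    return Chebyshev.fit(x_nodes, v, deg, domain=domain), v

# Hypothetical test problem (not from the slides).
payoff = lambda u, x: -(x ** 2 + u ** 2)
g = lambda x, u, eps: x + u + 0.1 * eps
nodes, weights = normal_quadrature(5)
x_nodes = np.linspace(-1.0, 1.0, 5)
u_grid = np.linspace(-1.0, 1.0, 21)
V0 = lambda x: 0.0                      # initial guess for the value function
V1, v1 = bellman_step(V0, x_nodes, u_grid, payoff, g, 0.95,
                      nodes, weights, deg=2, domain=[-1.0, 1.0])
```

Repeating `bellman_step` with `V1` in place of `V0` gives value function iteration; note that `g` can leave the approximation domain, in which case the polynomial is being extrapolated.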

Shape-preserving Chebyshev Interpolation

Problem: Instability of Value Function Iteration

Solution: LP model for shape-preserving Chebyshev interpolation:

$$\min_{c_j} \; \sum_{j=0}^{m-1} (c_j^+ + c_j^-) + \sum_{j=m}^{n} (j + 1 - m)^2 (c_j^+ + c_j^-)$$

subject to

$$\sum_{j=0}^{n} c_j T_j'(y_i) > 0 > \sum_{j=0}^{n} c_j T_j''(y_i), \quad i = 1, \ldots, m',$$

$$\sum_{j=0}^{n} c_j T_j(z_i) = v_i, \quad i = 1, \ldots, m,$$

$$c_j - \hat{c}_j = c_j^+ - c_j^-, \quad j = 0, \ldots, m - 1,$$

$$c_j = c_j^+ - c_j^-, \quad j = m, \ldots, n,$$

$$c_j^+ \ge 0, \quad c_j^- \ge 0, \quad j = 1, \ldots, n.$$
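The shape constraints in this LP are linear in the coefficients because they only involve the derivatives $T_j'$ and $T_j''$ at fixed check points $y_i$. The sketch below does not solve the LP; it only shows, with NumPy's Chebyshev utilities, how the two constraint expressions can be evaluated for a given coefficient vector. The data (`log(z + 2)` on $[-1, 1]$), the node count and the check points are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_nodes(m):
    """Zeros of the degree-m Chebyshev polynomial T_m on [-1, 1]."""
    i = np.arange(1, m + 1)
    return -np.cos((2 * i - 1) * np.pi / (2 * m))

def shape_constraint_values(coef, y):
    """Return sum_j c_j T_j'(y) and sum_j c_j T_j''(y) at the check points y."""
    d1 = C.chebval(y, C.chebder(coef, 1))
    d2 = C.chebval(y, C.chebder(coef, 2))
    return d1, d2

# Ordinary (Lagrange) Chebyshev interpolation of hypothetical data v_i at nodes z_i.
z = chebyshev_nodes(6)
v = np.log(z + 2.0)
coef = C.chebfit(z, v, deg=5)            # degree m - 1 through m nodes

# Evaluate the shape constraints on a finer set of check points y_i.
y = np.linspace(-1.0, 1.0, 21)
d1, d2 = shape_constraint_values(coef, y)
violations = int(np.sum(d1 <= 0) + np.sum(d2 >= 0))
print("shape violations at check points:", violations)
```

A nonzero violation count for the plain interpolant is the situation the LP is meant to repair, by allowing higher-degree coefficients at a penalty.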

Optimal Growth Example

- Optimal Growth Problem:

$$V_0(k_0) = \max_{c, l} \; \sum_{t=0}^{T-1} \beta^t u(c_t, l_t) + \beta^T V_T(k_T),$$

$$\text{s.t. } k_{t+1} = F(k_t, l_t) - c_t, \quad 0 \le t < T$$

- DP model of optimal growth problem:

$$V_t(k) = \max_{c, l} \; u(c, l) + \beta V_{t+1}(F(k, l) - c)$$

Errors of NDP with Chebyshev interpolation (shape-preserving or not)

[Figure: four panels of errors against time t (0 to 20): L∞ errors for consumption, L1 errors for consumption, L∞ errors for labor, L1 errors for labor. Legend: interpolation without shape preservation; shape-preserving interpolation.]

Portfolio Optimization Example

- $W_t$: wealth at stage $t$; stocks' random return: $R = (R_1, \ldots, R_n)$; bond's risk-free return: $R_f$
- $S_t = (S_{t1}, \ldots, S_{tn})^\top$: money in the stocks; $B_t = W_t - e^\top S_t$: money in the bond
- $W_{t+1} = R_f (W_t - e^\top S_t) + R^\top S_t$
- Multi-Stage Portfolio Optimization Problem:

$$V_0(W_0) = \max_{X_t,\, 0 \le t < T} E\{u(W_T)\}$$

- Bellman Equation:

$$V_t(W) = \max_{S} \; E\{V_{t+1}(R_f (W - e^\top S) + R^\top S)\}$$

$W$: state variable; $S$: control variables.
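The last stage of this Bellman equation, where $V_{t+1}$ is the terminal utility $u$, can be sketched as follows. This is a hedged illustration rather than the method used in the talk: it assumes a single stock with a two-point return distribution, CRRA utility, no short sales or borrowing, and a grid search over the stock position; all numbers are hypothetical.

```python
import numpy as np

def crra(w, gamma):
    """CRRA utility; gamma = 1 is the log case."""
    return np.log(w) if gamma == 1 else w ** (1 - gamma) / (1 - gamma)

def last_stage_allocation(W, Rf, R_vals, probs, gamma, fractions):
    """Maximize E[u(Rf * (W - S) + R * S)] over S = fraction * W by grid search."""
    S = fractions * W                                   # candidate money in the stock
    W_next = Rf * (W - S)[:, None] + S[:, None] * R_vals[None, :]
    expected_u = crra(W_next, gamma) @ probs            # expectation over returns
    best = int(np.argmax(expected_u))
    return S[best], expected_u[best]

# Hypothetical parameters.
Rf = 1.04
R_vals = np.array([0.9, 1.3])       # stock gross return: down state, up state
probs = np.array([0.5, 0.5])
fractions = np.linspace(0.0, 1.0, 101)

S1, V1 = last_stage_allocation(1.0, Rf, R_vals, probs, gamma=3, fractions=fractions)
S2, V2 = last_stage_allocation(2.0, Rf, R_vals, probs, gamma=3, fractions=fractions)
print(S1 / 1.0, S2 / 2.0)   # stock share of wealth at W = 1 and W = 2
```

Under CRRA utility the optimal stock share of wealth does not depend on the wealth level, so the two printed shares coincide.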

Exact optimal bond allocation

[Figure: exact optimal bond allocation at times t = 0, 1, ..., 5.]

Errors of Optimal Stock Allocations (shape-preserving or not)

[Figure: five panels, one per stage t = 4, 3, 2, 1, 0. Horizontal axis: wealth; vertical axis: log10(errors of B_t/W_t), from −6 to −1. Legend: errors for Chebyshev interpolation without shape preservation; errors for shape-preserving Chebyshev interpolation.]

Derivative of Value Functions in General Models

- For an optimization problem,

$$V(x) = \max_{y} \; f(x, y) \quad \text{s.t. } g(x, y) = 0, \; h(x, y) \ge 0,$$

add a trivial control variable $z$ and a trivial constraint $x - z = 0$:

$$V(x) = \max_{y, z} \; f(z, y) \quad \text{s.t. } g(z, y) = 0, \; h(z, y) \ge 0, \; x - z = 0.$$

- Then by the envelope theorem, we get

$$V'(x) = \lambda,$$

where $\lambda$ is the shadow price for the trivial constraint $x - z = 0$.
- Idea: use the shadow price as new information in approximating value functions
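A small numerical check of $V'(x) = \lambda$ can make the idea concrete. The sketch assumes a hypothetical unconstrained objective $f(z, y) = zy - y^2/2$ (so $g$ and $h$ are absent) and solves the inner problem by grid search; with the Lagrangian $f(z, y) + \lambda (x - z)$, the first-order condition in $z$ gives $\lambda = \partial f / \partial z$, which equals $y$ for this $f$.

```python
import numpy as np

def f(z, y):
    """Hypothetical objective, chosen so the optimum is known: y* = z."""
    return z * y - 0.5 * y ** 2

y_grid = np.linspace(0.0, 2.0, 2001)

def solve(x):
    """Return V(x) and the maximizer y* by grid search over y."""
    values = f(x, y_grid)
    j = int(np.argmax(values))
    return values[j], y_grid[j]

x, h = 0.7, 1e-4
V_x, y_star = solve(x)

# Shadow price of x - z = 0: lam = df/dz at the optimum, and df/dz = y for this f.
lam = y_star

# Central finite difference of the value function, for comparison.
dV = (solve(x + h)[0] - solve(x - h)[0]) / (2 * h)
print(lam, dV)
```

The shadow price comes out of the optimization at no extra cost, whereas the finite difference needs two additional solves; that is why it is attractive as slope data for the fitting step.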

                 error of c*_0           error of l*_0
  γ    η    m    Lagrange   Hermite     Lagrange   Hermite
 0.5  0.1   5    1.1(−1)    1.2(−2)     1.9(−1)    1.8(−2)
           10    6.8(−3)    3.1(−5)     9.9(−3)    4.4(−5)
           20    2.3(−5)    1.5(−6)     3.2(−5)    2.3(−6)
 0.5  1     5    1.4(−1)    1.4(−2)     6.1(−2)    5.6(−3)
           10    7.7(−3)    3.7(−5)     3.1(−3)    1.6(−5)
           20    2.6(−5)    6.5(−6)     1.1(−5)    3.0(−6)
 2    0.1   5    5.5(−2)    6.1(−3)     2.7(−1)    3.6(−2)
           10    3.5(−3)    2.1(−5)     2.0(−2)    1.2(−4)
           20    1.6(−5)    1.4(−6)     9.1(−5)    7.6(−6)
 2    1     5    9.4(−2)    1.1(−2)     1.3(−1)    1.7(−2)
           10    5.7(−3)    3.9(−5)     9.2(−3)    6.1(−5)
           20    2.8(−5)    4.7(−6)     4.3(−5)    8.0(−6)
 8    0.1   5    2.0(−2)    2.2(−3)     3.6(−1)    4.9(−2)
           10    1.2(−3)    8.5(−6)     2.7(−2)    1.9(−4)
           20    6.1(−6)    1.0(−6)     1.4(−4)    4.4(−6)
 8    1     5    6.6(−2)    7.2(−3)     3.4(−1)    4.5(−2)
           10    3.0(−3)    2.6(−5)     2.0(−2)    1.7(−4)
           20    2.0(−5)    0.0(−7)     1.3(−4)    2.1(−7)

Note: a(k) means a × 10^k.

Shape-preserving Hermite Spline Interpolation

- Idea: impose shape and use gradient information
- Using Hermite data $\{(x_i, v_i, s_i) : i = 1, \ldots, m\}$,

$$V(x; c) = c_{i1} + c_{i2}(x - x_i) + \frac{c_{i3} c_{i4} (x - x_i)(x - x_{i+1})}{c_{i3}(x - x_i) + c_{i4}(x - x_{i+1})},$$

when $x \in [x_i, x_{i+1}]$, where

$$c_{i1} = v_i, \qquad c_{i2} = \frac{v_{i+1} - v_i}{x_{i+1} - x_i}, \qquad c_{i3} = s_i - c_{i2}, \qquad c_{i4} = s_{i+1} - c_{i2},$$

for $i = 1, \ldots, m - 1$.
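The formula can be transcribed directly. The sketch below is one possible transcription, assuming the Hermite data are strictly concave or strictly convex on each interval, so that $c_{i3}$ and $c_{i4}$ have opposite signs and the denominator does not vanish inside the interval; the test data ($\log x$ with slopes $1/x$) are hypothetical.

```python
import numpy as np

def rational_hermite_spline(x_nodes, v, s):
    """Build V(x; c) from Hermite data (x_i, v_i, s_i) using the rational formula."""
    x_nodes, v, s = (np.asarray(a, dtype=float) for a in (x_nodes, v, s))
    c1 = v[:-1]
    c2 = np.diff(v) / np.diff(x_nodes)     # secant slopes
    c3 = s[:-1] - c2
    c4 = s[1:] - c2

    def V(x):
        # Locate the interval [x_i, x_{i+1}] containing x (clamped at the ends).
        i = int(np.clip(np.searchsorted(x_nodes, x, side="right") - 1,
                        0, len(x_nodes) - 2))
        dl, dr = x - x_nodes[i], x - x_nodes[i + 1]
        num = c3[i] * c4[i] * dl * dr
        # The rational term vanishes at the nodes and on linear pieces.
        ratio = 0.0 if num == 0.0 else num / (c3[i] * dl + c4[i] * dr)
        return c1[i] + c2[i] * dl + ratio

    return V

# Hypothetical concave data: values and slopes of log(x).
xs = np.array([1.0, 2.0, 3.0, 4.0])
V = rational_hermite_spline(xs, np.log(xs), 1.0 / xs)
print(V(1.5), np.log(1.5))
```

At each node the spline reproduces both $v_i$ and $s_i$: the rational term is zero there and its derivative equals $c_{i3}$ at $x_i$ and $c_{i4}$ at $x_{i+1}$, so the slope is $c_{i2} + c_{i3} = s_i$ and $c_{i2} + c_{i4} = s_{i+1}$.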

Errors of Optimal Bond Allocations (Lagrange vs Hermite vs Shape-preserving+Hermite)

[Figure: five panels, one per stage t = 1, 2, 3, 4, 5. Horizontal axis: wealth; vertical axis: log10(errors of B_t/W_t), from −6 to 0. Legend: errors for Chebyshev interpolation using Lagrange data; errors for Chebyshev-Hermite interpolation using Hermite data; errors for shape-preserving Hermite rational function spline interpolation using Hermite data.]

Proportional Transaction Cost and CRRA Utility

- Separability of wealth $W$ and portfolio fractions $x$.
- If $u(W) = W^{1-\gamma}/(1 - \gamma)$, then $V_t(W_t, x_t) = W_t^{1-\gamma} \cdot g_t(x_t)$.
- If $u(W) = \log(W)$, then $V_t(W_t, x_t) = \log(W_t) + \psi_t(x_t)$.
- "No-trade" region: $\Omega_t = \{x_t : (\delta_t^+)^* = (\delta_t^-)^* = 0\}$, where $(\delta_t^+)^* \ge 0$ are fractions of wealth for buying stocks, and $(\delta_t^-)^* \ge 0$ are fractions of wealth for selling stocks.

2 stocks with i.i.d. returns at t = 0 (liquidate at t = 6)

[Figure: two panels of the no-trade region at stage t = 0. Horizontal axis: stock 1 fraction (0.5 to 0.8); vertical axis: stock 2 fraction (0.5 to 0.8).]


2 stocks with correlated returns at t = 0 (liquidate at t = 6)

[Figure: two panels. Horizontal axis: stock 1 fraction (0.3 to 0.9); vertical axis: stock 2 fraction (−0.5 to 0.1).]


2 stocks with stochastic µ at t = 0 (liquidate at t = 6)

[Figure: two panels. Horizontal axis: stock 1 fraction (0 to 0.5); vertical axis: stock 2 fraction (0 to 0.5).]


3 correlated stocks at t = 0 (liquidate at t = 6)

[Figure: two panels, each with three axes ticked from 0.1 to 0.4; no axis labels or panel titles survive.]

Application of Portfolio Analysis

Options

- The pricing theory of options assumes that options have no social value
- Finance people claim that they economize on transaction costs, but provide no analysis
- Cai has shown that there is some value to one option; future work will examine social value of free entry

1 stock and 1 at-the-money put option at t = 0 (liquidate at t = 6 months)

[Figure: "No-Trade Region, Option Price = 0.0462351K". Horizontal axis: stock fraction (0 to 1); vertical axis: option fraction (0 to 0.1).]

- Put option: strike $K$, expiration time $T$, payoff $\max(K - S_T, 0)$
- Stock price $S$, utility $u(W) = -W^{-2}/2$

Value functions with/without options at t = 0 (liquidate at t = 6)

[Figure: "Value Functions V0(W,S,x,y) at W=1, S=1 and y=0". Horizontal axis: stock fraction x (0 to 1); vertical axis: value (−0.483 to −0.474). Legend: option unavailable; option available, τ2 = 0.02; option available, τ2 = 0.01; option available, τ2 = 0.005.]

- $(x, y)$: fractions of money in stock and option
- $\tau_1 = 0.01$ and $\tau_2$: transaction cost ratios of stock and option

Introduction

- All IAMs (Integrated Assessment Models) are deterministic
- Most are myopic, not forward-looking
- This combination makes it impossible for IAMs to consider decisions in a dynamic, evolving and uncertain world
- We formulate dynamic stochastic general equilibrium extensions of DICE (Nordhaus)
- Conventional wisdom: "Integration of DSGE models with long run intertemporal models like IGEM is beyond the scientific frontier at the moment" (Peer Review of ADAGE and IGEM, June 2010)
- Fact: We use multidimensional dynamic programming methods, developed over the past 20 years in Economics, to study dynamically optimal policy responses

DSICE

Cai-Judd-Lontzek DSICE Model: Dynamic Stochastic Integrated Model of Climate and Economy

DSICE = DICE2007
  − constraint on savings rate, i.e., s = 0.22
  − ad hoc finite difference method
  + stochastic production function
  + stochastic damage function
  + 1-year period length

Here "stochastic" means intrinsic random events within the specific model, not uncertain parameters.

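- DSICE: solve the stochastic optimization problem

$$\max_{c_t, l_t, \mu_t} E\left\{ \sum_{t=0}^{\infty} \beta^t u(c_t, l_t) \right\}$$

subject to

$$k_{t+1} = (1 - \delta) k_t + \Omega_t (1 - \Lambda_t) Y_t - c_t,$$
$$M_{t+1} = \Phi^M M_t + (E_t, 0, 0)^\top,$$
$$T_{t+1} = \Phi^T T_t + (\xi_1 F_t, 0)^\top,$$
$$\zeta_{t+1} = g^\zeta(\zeta_t, \omega_t^\zeta),$$
$$J_{t+1} = g^J(J_t, \omega_t^J)$$

- Output: $Y_t \equiv f(k_t, l_t, \zeta_t, t) = \zeta_t A_t k_t^{\alpha} l_t^{1-\alpha}$
- Damages: $\Omega_t \equiv \dfrac{J_t}{1 + \pi_1 T_t^{AT} + \pi_2 (T_t^{AT})^2}$
- $\zeta_t$: productivity shock; $J_t$: damage function shock
- Emission control effort: $\Lambda_t \equiv \psi_t^{1-\theta_2} \theta_{1,t} \mu_t^{\theta_2}$
- Mass of carbon concentration: $M_t = (M_t^{AT}, M_t^{LO}, M_t^{UP})^\top$
- Temperature: $T_t = (T_t^{AT}, T_t^{LO})^\top$
- Total carbon emission: $E_t = E_{Ind,t} + E_{Land,t}$, where

$$E_{Ind,t} = \sigma_t (1 - \mu_t) \left( f_1(k_t, l_t, \theta_t, t) \right)$$

- Total radiative forcing (watts per square meter from 1900):

$$F_t = \eta \log_2\left( M_t^{AT} / M_0^{AT} \right) + F_t^{EX}$$

- DP model for DSICE:

$$V_t(k, \zeta, J, M, T) = \max_{c, l, \mu} \; u(c, l) + \beta E\left[ V_{t+1}(k^+, \zeta^+, J^+, M^+, T^+) \right]$$

subject to

$$k^+ = (1 - \delta) k + \Omega_t (1 - \Lambda_t) f(k, l, \zeta, t) - c,$$
$$M^+ = \Phi^M M + (E_t, 0, 0)^\top,$$
$$T^+ = \Phi^T T + (\xi_1 F_t, 0)^\top,$$
$$\zeta^+ = g^\zeta(\zeta, \omega^\zeta),$$
$$J^+ = g^J(J, \omega^J)$$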
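Some of the building blocks of the capital transition can be written as plain functions. This is only a sketch of the formulas as stated on the slides; every numeric value in the example is a placeholder chosen for readability, not a DICE or DSICE calibration.

```python
import math

def output(k, l, zeta, A, alpha):
    """Y = zeta * A * k^alpha * l^(1 - alpha)."""
    return zeta * A * k ** alpha * l ** (1 - alpha)

def damage_factor(J, T_at, pi1, pi2):
    """Omega = J / (1 + pi1 * T_AT + pi2 * T_AT^2)."""
    return J / (1.0 + pi1 * T_at + pi2 * T_at ** 2)

def radiative_forcing(M_at, M_at0, eta, F_ex):
    """F = eta * log2(M_AT / M_AT0) + F_EX."""
    return eta * math.log2(M_at / M_at0) + F_ex

def next_capital(k, c, Y, Omega, Lam, delta):
    """k+ = (1 - delta) k + Omega (1 - Lambda) Y - c."""
    return (1 - delta) * k + Omega * (1 - Lam) * Y - c

# Placeholder inputs (hypothetical, for illustration only).
Y = output(k=1.0, l=1.0, zeta=1.0, A=2.0, alpha=0.3)
Omega = damage_factor(J=1.0, T_at=0.0, pi1=0.0, pi2=0.003)
k_next = next_capital(k=1.0, c=0.5, Y=Y, Omega=Omega, Lam=0.0, delta=0.1)
F_doubling = radiative_forcing(M_at=2.0, M_at0=1.0, eta=3.8, F_ex=0.0)
print(k_next, F_doubling)
```

Because forcing is logarithmic in base 2, $\eta$ is the forcing added by each doubling of atmospheric carbon relative to $M_0^{AT}$.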

Parallel DP Algorithm

- Parallelization in the maximization step in NDP: compute

$$v_i = \max_{a_i \in D(x_i, t)} u_t(x_i, a_i) + \beta E\left\{ V(x_i^+; b_{t+1}) \mid x_i, a_i \right\},$$

for each $x_i \in X_t$, $1 \le i \le m_t$.

- Condor Master-Worker system: distributed parallelization with two entities, a Master processor and a cluster of Worker processors.
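The maximization problems at different nodes $x_i$ are independent, so they can be farmed out as separate tasks. The sketch below illustrates that pattern with Python's standard `concurrent.futures`; it is a stand-in for, not a reproduction of, the Condor Master-Worker setup, and the objective inside `solve_state` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

ACTIONS = [i / 10 for i in range(11)]    # hypothetical feasible set D(x, t)

def solve_state(x):
    """One task: the maximization problem at a single approximation node x_i."""
    # Hypothetical objective standing in for u_t(x, a) + beta * E[V(x+; b_{t+1})].
    return max(-(a - x) ** 2 + x for a in ACTIONS)

def parallel_maximization(states, max_workers=4):
    """Master: hand one task per state to the workers and collect v_i in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve_state, states))

states = [0.0, 0.2, 0.5, 0.9]
v_parallel = parallel_maximization(states)
v_serial = [solve_state(x) for x in states]
print(v_parallel == v_serial)
```

Threads keep the example portable; for CPU-bound tasks in CPython a process pool or a cluster scheduler would be needed to obtain real speedups.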

Parallelization in Optimal Growth Problems

- Problem size: 4D continuous state $k$, 4D discrete state $\theta$ with $6^4 = 1296$ values
- Performance:

Wall clock time for all 3 VFIs          65 hours
Total time workers were up (alive)      1487 hours
Total CPU time used by all workers      1358 hours
Minimum task CPU time                   557 seconds
Maximum task CPU time                   4,196 seconds
Number of (different) workers           25
Overall parallel performance            93.56%

Parallelization in Optimal Growth Problems

Parallel efficiency for various numbers of worker processors

# Worker      Parallel      Average task          Total wall clock
processors    efficiency    CPU time (minutes)    time (hours)
25            93.56%        21                    65
54            93.46%        25                    33
100           86.73%        25                    19

Parallelization in Dynamic Portfolio Problems

- Problem size: 6 stocks plus 1 bond, transaction cost, number of tasks = 3125.
- Performance:

Wall clock time for all 6 VFIs          1.56 hours
Total time workers were up (alive)      295 hours
Total CPU time used by all workers      248 hours
Minimum task CPU time                   2 seconds
Maximum task CPU time                   395 seconds
Number of (different) workers           200
Overall parallel performance            87.2%

DSICE - DP in an Integrated Model of Climate and Economy


DSICE

Cai-Judd-Lontzek DSICE Model: Dynamic Stochastic Integrated Model of Climate and Economy

DSICE = DICE2007
  − time travel for CO2
  + stochastic production function
  + stochastic damage function
  + 1-year period length


Application: Carbon Tax vs. Cap-and-Trade

Policy Alternatives

- Carbon tax
- Cap-and-Trade

DSICE was used to compute the optimal comovement of the carbon tax and permissible emissions

- Explicitly takes into account unpredictability in economic activity
- Optimal policy is a permit supply curve with elasticity between one and two; a strong version of a price cap

Future Directions for Dynamic Programming

New tools:

- There is no curse of dimensionality in either quadrature or approximation for smooth functions (Griebel and Wozniakowski)
- Massive parallelization is the current supercomputer architecture; DP fits it well

Economics and OR

- The traditional ties died in the late 1970s
- Economics is now hostile to the introduction of OR and applied math tools; actively suppresses research that does not make economists look good
- "Soon economists will be so far behind that they will not be able to catch up"
- Hopefully concerted efforts by economists and OR researchers will prevent this
