september6,2011 - zccfe.uzh.ch · dynamicprogramming kennethl.judd (joint work with yongyang cai)...
TRANSCRIPT
Dynamic Programming
Kenneth L. Judd
(Joint work with Yongyang Cai)
September 6, 2011
Dynamic Programming
Powerful for solving dynamic stochastic optimization problemsI Based on principle of recursion due to Bellman and IsaacsI Replaces multiperiod optimization problems with a sequence of
two-period problemsApplications
I EconomicsI Business investmentI Life-cycle decisions on labor, consumption, education, portfolio
choiceI Economic policy
I Operations ResearchI Scheduling, queueingI Inventory
I Climate changeI Economic response to climate policiesI Optimal policy response to global warming problems
Canonical Example in EconomicsGeneral Stochastic Accumulation
I Problem:
V (k, θ) = maxct ,`t
E
∞∑t=0
βt u(ct , `t)
kt+1 = F (kt , `t , θt)− ctθt+1 = g(θt , εt)
k0 = k, θ0 = θ.
I State variables:I k: productive capital stock, endogenousI θ: productivity state, exogenous
I The dynamic programming formulation is
V (k, θ) = maxc,`
u(c , `) + βEV (F (k, `, θ)− c , θ+)|θ, (12.1.21)
where θ+ is next period’s θ realization.
DefinitionsDiscrete-Time Dynamic Programming
I Objective:
E
T∑
t=1
π(xt , ut , t) + W (xT+1)
(1)
I X : set of statesI D: the set of controlsI π(x , u, t) payoffs in period t, for x ∈ X at the beginning of period t,
and control u ∈ D is applied in period t.I D(x , t) ⊆ D: controls which are feasible in state x at time t.I F (A; x , u, t) : probability that xt+1 ∈ A ⊂ X conditional on time t
control and stateI Value function
V (x , t) ≡ supU(x,t)
E
T∑
s=t
π(xs , us , s) + W (xT+1)|xt = x
(2)
I Bellman equation
V (x , t) = supu∈D(x,t)
π(x , u, t) + E V (xt+1, t + 1)|xt = x , ut = u
(3)I Existence: boundedness of π is sufficient
Parametric ApproachGeneral Parametric Approach: Approximating T
I For each xj , (TV )(xj) is defined by
vj = (TV )(xj) = maxu∈D(xj )
π(u, xj) + β
ˆV (x+; a)dF (x+|xj , u) (4)
I In practice, we compute the approximation T
vj = (TV )(xj).
= (TV )(xj)
I Integration step: for ωj and xj for some numerical quadrature formula
EV (x+; a)|xj , u) =ˆ
V (x+; a)dF (x+|xj , u)
=
ˆV (g(xj , u, ε); a)dF (ε)
.=
∑`
ω`V (g(xj , u, ε`); a)
I Maximization step: for xi ∈ X , evaluate
vi = (TV )(xi )
I Fitting step:I Data: (vi , xi ), i = 1, · · · , nI Objective: find an a ∈ Rm such that V (x ; a) best fits the dataI Methods: determined by V (x ; a)
Shape-preserving Chebyshev Interpolation
Problem: Instability of Value Function IterationSolution: LP model for shape-preserving Chebyshev Interpolation:
mincj
m−1∑j=0
(c+j + c−j ) +n∑
j=m
(j + 1−m)2(c+j + c−j )
s.t.n∑
j=0
cjT ′j (yi ) > 0 >n∑
j=0
cjT ′′j (yi ) , i = 1, . . . ,m′,
n∑j=0
cjTj (zi ) = vi , i = 1, . . . ,m,
cj − cj = c+j − c−j , j = 0, . . . ,m − 1,
cj = c+j − c−j , j = m, . . . , n,
c+j ≥ 0, c−j ≥ 0, j = 1, . . . , n.
Optimal Growth Example
I Optimal Growth Problem:
V0(k0) = maxc,l
T−1∑t=0
βtu(ct , lt) + βTVT (kT ),
s.t. kt+1 = F (kt , lt)− ct , 0 ≤ t < T
I DP model of optimal growth problem:
Vt(k) = maxc,l
u(c , l) + βVt+1(F (k, l)− c)
Errors of NDP with Chebyshev interpolation (shape-preserving or not)
0 5 10 15 200
0.02
0.04
0.06
0.08
0.1
0.12
time t
LInf
errors for consumption
interpolation w/oshape−preservation
shape−preservinginterpolation
0 5 10 15 202
4
6
8
10
12
14x 10
−3
time t
L1 errors for consumption
interpolation w/oshape−preservation
shape−preservinginterpolation
0 5 10 15 200
0.2
0.4
0.6
0.8
1
1.2
1.4
time t
LInf
errors for labor
interpolation w/oshape−preservation
shape−preservinginterpolation
0 5 10 15 200.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
time t
L1 errors for labor
interpolation w/oshape−preservation
shape−preservinginterpolation
Portfolio Optimization Example
I Wt : wealth at stage t; stocks’ random return: R = (R1, . . . ,Rn);bond’s riskfree return: Rf ;
I St = (St1, . . . ,Stn)>: money in the stocks; Bt = Wt − e>St : moneyin the bond,
I Wt+1 = Rf (Wt − e>St) + R>St
I Multi-Stage Portfolio Optimization Problem:
V0(W0) = maxXt ,0≤t<T
Eu(WT )
I Bellman Equation:
Vt(W ) = maxS
EVt+1(Rf (W − e>S) + R>S)
W : state variable; S : control variables.
Exact optimal bond allocation
1 1.5 2 2.5 3
0
0.05
0.1
bond allocation at times
t=5
t=4
t=3
t=2
t=1
t=0
Errors of Optimal Stock Allocations (shape-preserving or not)
1 2 3 4
−6
−5
−4
−3
−2
−1
Wealth, t=4
log10
(errors of B4/W
4)
0.5 1 1.5 2 2.5 3
−6
−5
−4
−3
−2
−1
Wealth, t=3
log10
(errors of B3/W
3)
1 1.5 2
−6
−5
−4
−3
−2
−1
Wealth, t=2
log10
(errors of B2/W
2)
0.8 1 1.2 1.4 1.6
−6
−5
−4
−3
−2
−1
Wealth, t=1
log10
(errors of B1/W
1)
0.9 0.95 1 1.05 1.1
−6
−5
−4
−3
−2
−1
Wealth, t=0
log10
(errors of B0/W
0)
Errors for Chebyshev interpolationwithout shape−preservation
Errors for shape−preservingChebyshev interpolation
Derivative of Value Functions in General Models
I For an optimization problem,
V (x) = maxy
f (x , y)
s.t. g(x , y) = 0, h(x , y) ≥ 0,
add a trivial control variable z and a trivial constraint x − z = 0:
V (x) = maxy ,z
f (z , y)
s.t. g(z , y) = 0, h(z , y) ≥ 0, x − z = 0.
I Then by the envelope theorem, we get
V ′(x) = λ,
where λ is the shadow price for the trivial constraint x − z = 0.I Idea: use shadow price as new information in approximating value
functions
error of c∗0 error of l∗0
γ η m
0.5 0.1 5
10
20
0.5 1 5
10
20
2 0.1 5
10
20
2 1 5
10
20
8 0.1 5
10
20
8 1 5
10
20
Lagrange Hermite
1.1(−1) 1.2(−2)
6.8(−3) 3.1(−5)
2.3(−5) 1.5(−6)
1.4(−1) 1.4(−2)
7.7(−3) 3.7(−5)
2.6(−5) 6.5(−6)
5.5(−2) 6.1(−3)
3.5(−3) 2.1(−5)
1.6(−5) 1.4(−6)
9.4(−2) 1.1(−2)
5.7(−3) 3.9(−5)
2.8(−5) 4.7(−6)
2.0(−2) 2.2(−3)
1.2(−3) 8.5(−6)
6.1(−6) 1.0(−6)
6.6(−2) 7.2(−3)
3.0(−3) 2.6(−5)
2.0(−5) 0.0(−7)
Lagrange Hermite
1.9(−1) 1.8(−2)
9.9(−3) 4.4(−5)
3.2(−5) 2.3(−6)
6.1(−2) 5.6(−3)
3.1(−3) 1.6(−5)
1.1(−5) 3.0(−6)
2.7(−1) 3.6(−2)
2.0(−2) 1.2(−4)
9.1(−5) 7.6(−6)
1.3(−1) 1.7(−2)
9.2(−3) 6.1(−5)
4.3(−5) 8.0(−6)
3.6(−1) 4.9(−2)
2.7(−2) 1.9(−4)
1.4(−4) 4.4(−6)
3.4(−1) 4.5(−2)
2.0(−2) 1.7(−4)
1.3(−4) 2.1(−7)
Note: a(k) means a× 10k .
Shape-preserving Hermite Spline Interpolation
I Idea: impose shape and use gradient informationI Using Hermite data (xi , vi , si ) : i = 1, . . . ,m,
V (x ; c) = ci1 + ci2(x − xi ) +ci3ci4(x − xi )(x − xi+1)
ci3(x − xi ) + ci4(x − xi+1),
when x ∈ [xi , xi+1], where
ci1 = vi ,
ci2 =vi+1 − vi
xi+1 − xi,
ci3 = si − ci2,ci4 = si+1 − ci2,
for i = 1, . . . ,m − 1.
Errors of Optimal Bond Allocations (Lagrange vs Hermite vsShape-preserving+Hermite)
0.9 0.95 1 1.05 1.1
−6
−5
−4
−3
−2
−1
0
Wealth, t=1
log10
(errors of B1/W
1)
0.8 1 1.2 1.4 1.6
−6
−5
−4
−3
−2
−1
0
Wealth, t=2
log10
(errors of B2/W
2)
1 1.5 2
−6
−5
−4
−3
−2
−1
0
Wealth, t=3
log10
(errors of B3/W
3)
0.5 1 1.5 2 2.5 3
−6
−5
−4
−3
−2
−1
0
Wealth, t=4
log10
(errors of B4/W
4)
1 2 3 4
−6
−5
−4
−3
−2
−1
0
Wealth, t=5
log10
(errors of B5/W
5)
Errors for Chebyshev interpolationusing Lagrange data
Errors for Chebyshev−Hermiteinterpolation using Hermite Data
Errors for Shape−preservingHermite Rational function splineinterpolation using Hermite Data
Proportional Transaction Cost and CRRA Utility
I Separability of wealth W and portfolio fractions x .I If u(W ) = W 1−γ/(1− γ), then Vt(Wt , xt) = W 1−γ
t · gt(xt).I If u(W ) = log(W ), then Vt(Wt , xt) = log(Wt) + ψt(xt).I “No-trade” region: Ωt = xt : (δ+t )∗ = (δ−t )∗ = 0, where
(δ+t )∗ ≥ 0 are fractions of wealth for buying stocks, and (δ−t )∗ ≥ 0are fractions of wealth for selling stocks.
2 stocks with i.i.d. returns at t = 0 (liquidate at t = 6)
0.5 0.55 0.6 0.65 0.7 0.75 0.80.5
0.55
0.6
0.65
0.7
0.75
0.8
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=0
2 stocks with i.i.d. returns at t = 3 (liquidate at t = 6)
0.5 0.55 0.6 0.65 0.7 0.75 0.80.5
0.55
0.6
0.65
0.7
0.75
0.8
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=3
2 stocks with i.i.d. returns at t = 5 (liquidate at t = 6)
0.5 0.55 0.6 0.65 0.7 0.75 0.80.5
0.55
0.6
0.65
0.7
0.75
0.8
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=5
2 stocks with correlated returns at t = 0 (liquidate at t = 6)
0.3 0.4 0.5 0.6 0.7 0.8 0.9−0.5
−0.4
−0.3
−0.2
−0.1
0
0.1
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=0
2 stocks with correlated returns at t = 3 (liquidate at t = 6)
0.3 0.4 0.5 0.6 0.7 0.8 0.9−0.5
−0.4
−0.3
−0.2
−0.1
0
0.1
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=3
2 stocks with correlated returns at t = 5 (liquidate at t = 6)
0.3 0.4 0.5 0.6 0.7 0.8 0.9−0.5
−0.4
−0.3
−0.2
−0.1
0
0.1
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=5
2 stocks with stochastic µ at t = 0 (liquidate at t = 6)
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=0
2 stocks with stochastic µ at t = 3 (liquidate at t = 6)
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=3
2 stocks with stochastic µ at t = 5 (liquidate at t = 6)
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Stock 1 fraction
Sto
ck 2
fra
ctio
nNo−trade region at stage t=5
3 correlated stocks at t = 0 (liquidate at t = 6)
0.1
0.2
0.3
0.4
0.1
0.2
0.3
0.4
0.1
0.2
0.3
0.4
3 correlated stocks at t = 3 (liquidate at t = 6)
0.1
0.2
0.3
0.4
0.1
0.2
0.3
0.4
0.1
0.2
0.3
0.4
3 correlated stocks at t = 5 (liquidate at t = 6)
0.1
0.2
0.3
0.4
0.1
0.2
0.3
0.4
0.1
0.2
0.3
0.4
Application of Portfolio Analysis
OptionsI The pricing theory of options assumes that options have no social
valueI Finance people claim that they economize on transaction costs, but
provide no analysisI Cai has shown that there is some value to one option; future work
will examine social value of free entry
1 stock and 1 at-the-money put option at t = 0 (liquidate at t = 6months)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1No−Trade Region, Option Price=0.0462351K
Stock fraction
Option fra
ction
I Put option: strike K , expiration time T , payoff max(K − ST , 0)
I stock price S , utility u(W ) = −W−2/2
Value functions with/without options at t = 0 (liquidate at t = 6)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−0.483
−0.482
−0.481
−0.48
−0.479
−0.478
−0.477
−0.476
−0.475
−0.474
Value Functions V0(W,S,x,y) at W=1, S=1 and y=0
Stock fraction x
Valu
e
Option unavailableOption available, τ
2=0.02
Option available, τ2=0.01
Option available, τ2=0.005
I (x , y): fractions of money in stock and optionI τ1 = 0.01 and τ2: transaction cost ratios of stock and option
Introduction
I All IAMs (Integrated Assessment Models) are deterministicI Most are myopic, not forward-lookingI This combination makes it impossible for IAMs to consider decisions
in a dynamic, evolving and uncertain worldI We formulate dynamic stochastic general equilibrium extensions of
DICE (Nordhaus)I Conventional wisdom: "Integration of DSGE models with long run
intertemporal models like IGEM is beyond the scientific frontier atthe moment" (Peer Review of ADAGE and IGEM, June 2010)
I Fact: We use multidimensional dynamic programming methods,developed over the past 20 years in Economics, to study dynamicallyoptimal policy responses
DSICE
Cai-Judd-Lontzek DSICE Model:Dynamic Stochastic Integrated Model of Climate and Economy
DSICE = DICE2007− constraint on savings rate , i .e. : s = .22− ad hoc finite difference method+ stochastic production function+ stochastic damage function+ 1-year period length
stochastic means: intrinsic random events within the specific model, notuncertain parameters
I DSICE: solve stochastic optimization problem
max ct ,lt ,µt E
∞∑t=0
βtu(ct , lt)
s.t. kt+1 = (1− δ)kt + Ωt(1− Λt)Yt − ct ,Mt+1 = ΦMMt + (Et , 0, 0)>,
Tt+1 = ΦTTt + (ξ1Ft , 0)>,
ζt+1 = gζ(ζt , ωζt ),
Jt+1 = gJ(Jt , ωJt )
I output: Yt ≡ f (kt , lt , ζt , t) = ζtAtkαt l1−αt
I damages: Ωt ≡ Jt1+π1TAT
t +π2(TATt )2
I ζt : productivity shock, Jt : damage function shock
I emission control effort: Λt ≡ ψ1−θ2t θ1,tµ
θ2t
I Mass of carbon concentration: Mt = (MATt ,MLO
t ,MUPt )>
I Temperature: Tt = (TATt ,T LO
t )>
I Total carbon emission: Et = EInd,t + ELand,t , where
EInd,t = σt(1− µt)(f1(kt , lt , θt , t))
I Total radiative forcing (watts per square meter from 1900):
Ft = η log2(MATt /MAT
0 ) + FEXt
I DP model for DSICE :
Vt(k, ζ, J,M,T ) = maxc,l,µ
u(c , l) + βE[Vt+1(k+, ζ+, J+,M+,T+)]
s.t. k+ = (1− δ)k + Ωt(1− Λt)f (k, l , ζ, t)− c ,M+ = ΦMM + (Et , 0, 0)>,
T+ = ΦTT + (ξ1Ft , 0)>,
ζ+ = gζ(ζ, ωζ),
J+ = gJ(J, ωJ)
Parallel DP Algorithm
I Parallelization in Maximization step in NDP: Compute
vi = maxai∈D(xi ,t)
ut(xi , ai ) + βEV (x+i ;bt+1)|xi , ai,
for each xi ∈ Xt , 1 ≤ i ≤ mt .
I Condor Master-Worker system: distributed parallelization, twoentities: Master processor, a cluster of Worker processors.
Parallelization in Optimal Growth Problems
I Problem size: 4D continuous state k , 4D discrete state θ with64 = 1296 values
I Performance:
Wall clock time for all 3 VFIs 65 hoursTotal time workers were up (alive) 1487 hoursTotal cpu time used by all workers 1358 hoursMinimum task cpu time 557 secondsMaximum task cpu time 4,196 secondsNumber of (different) workers 25Overall Parallel Performance 93.56%
Parallelization in Optimal Growth Problems
Parallel efficiency for various number of worker processors
# Worker Parallel Average task Total wall clockprocessors efficiency CPU time (minute) time (hour)
25 93.56% 21 6554 93.46% 25 33100 86.73% 25 19
Parallelization in Dynamic Portfolio Problems
IProblem size: 6 stocks plus 1 bond, transaction cost, number oftask = 3125.
I Performance:
Wall clock time for all 6 VFIs 1.56 hoursTotal time workers were up (alive) 295 hoursTotal cpu time used by all workers 248 hoursMinimum task cpu time 2 secondsMaximum task cpu time 395 secondsNumber of (different) workers 200Overall Parallel Performance 87.2%
DSICE - DP in an Integrated Model of Climate andEconomy
I All IAMs (Integrated Assessment Models) are deterministicI Most are myopic, not forward-lookingI This combination makes it impossible for IAMs to consider decisions
in a dynamic, evolving and uncertain worldI We formulate dynamic stochastic general equilibrium extensions of
DICE (Nordhaus)I Conventional wisdom: "Integration of DSGE models with long run
intertemporal models like IGEM is beyond the scientific frontier atthe moment" (Peer Review of ADAGE and IGEM, June 2010)
I Fact: We use multidimensional dynamic programming methods,developed over the past 20 years in Economics, to study dynamicallyoptimal policy responses
DSICE
Cai-Judd-Lontzek DSICE Model:Dynamic Stochastic Integrated Model of Climate and Economy
DSICE = DICE2007− time travel for CO2+ stochastic production function+ stochastic damage function+ 1-year period length
stochastic means: intrinsic random events within the specific model, notuncertain parameters
I DSICE: solve stochastic optimization problem
max ct ,lt ,µt E
∞∑t=0
βtu(ct , lt)
s.t. kt+1 = (1− δ)kt + Ωt(1− Λt)Yt − ct ,Mt+1 = ΦMMt + (Et , 0, 0)>,
Tt+1 = ΦTTt + (ξ1Ft , 0)>,
ζt+1 = gζ(ζt , ωζt ),
Jt+1 = gJ(Jt , ωJt )
I output: Yt ≡ f (kt , lt , ζt , t) = ζtAtkαt l1−αt
I damages: Ωt ≡ Jt1+π1TAT
t +π2(TATt )2
I ζt : productivity shock, Jt : damage function shock
I emission control effort: Λt ≡ ψ1−θ2t θ1,tµ
θ2t
I Mass of carbon concentration: Mt = (MATt ,MLO
t ,MUPt )>
I Temperature: Tt = (TATt ,T LO
t )>
I Total carbon emission: Et = EInd,t + ELand,t , where
EInd,t = σt(1− µt)(f1(kt , lt , θt , t))
I Total radiative forcing (watts per square meter from 1900):
Ft = η log2(MATt /MAT
0 ) + FEXt
I DP model for DSICE :
Vt(k, ζ, J,M,T ) = maxc,l,µ
u(c , l) + βE[Vt+1(k+, ζ+, J+,M+,T+)]
s.t. k+ = (1− δ)k + Ωt(1− Λt)f (k, l , ζ, t)− c ,M+ = ΦMM + (Et , 0, 0)>,
T+ = ΦTT + (ξ1Ft , 0)>,
ζ+ = gζ(ζ, ωζ),
J+ = gJ(J, ωJ)
Application: Carbon Tax vs. Cap-and-Trade
Policy AlternativesI Carbon taxI Cap-and-Trade
DSICE was used to compute optimal comovement of carbon tax andpermissible emissions
I Explicitly takes into account unpredictability in economic activityI Optimal policy is a permit supply curve with elasticity between one
and two; a strong version of a price cap
Future Directions for Dynamic Programming
New tools:I There is no curse of dimensionality in either quadrature or
approximation for smooth functions - Griebel and WozniakowskiI Massive parallelization is current supercomputer architecture; DP
fits it wellEconomics and OR
I The traditional ties died in late 1970’sI Economics is now hostile to introduction of OR and applied math
tools; actively suppresses research that does not make economistslook good
I “Soon economists will be so far behind that they will not be able tocatch up”
I Hopefully concerted efforts by economists and OR researchers willprevent this