Dynamic Programming
Lecture 13 (5/21/2014)
- A Forest Thinning Example -
[Figure: DP network for the thinning example. Nodes are stand volumes (m3/ha) at ages 10, 20, 30 and 40; arcs are thinning or final-harvest decisions labeled with the volume removed. Stage 1 spans age 10-20, Stage 2 age 20-30, Stage 3 age 30-40.]
Source: Dykstra’s Mathematical Programming for Natural Resource Management (1984)
Projected yield (m3/ha) at age 20 as a function of the action taken at age 10:

Beginning  Volume   Residual  Ten-year  Volume
volume     thinned  volume    growth    at age 20
260          0        260       390       650
260         50        210       325       535
260        100        160       250       410
Projected yield (m3/ha) at age 30 as a function of beginning volume at age 20 and the action taken at age 20:

Beginning  Volume   Residual  Ten-year  Volume
volume     thinned  volume    growth    at age 30
650          0        650       200       850
650        150        500       250       750
650        200        450       200       650
535          0        535       215       750
535        100        435       215       650
535        175        360       140       500
410          0        410       190       600
410         75        335       165       500
410        150        260       140       400
Dynamic Programming Cont.
• Stages, states and actions (decisions)
• Backward and forward recursion
Notation:
x_i = decision variable: the immediate destination node at stage i;
f_i(s, x_i) = the maximum total volume harvested during all remaining stages, given that the stand has reached the state corresponding to node s at stage i and the immediate destination node is x_i;
x_i* = the value of x_i that maximizes f_i(s, x_i); and
f_i*(s) = the maximum value of f_i(s, x_i), i.e., f_i*(s) = f_i(s, x_i*).
Solving Dynamic Programs
• Recursive relation at stage i (Bellman Equation):
f_i*(s) = max over x of [ H_{s,x,i} + f_{i+1}*(x) ],

where H_{s,x,i} is the volume harvested by thinning at the beginning of stage i in order to move the stand from state s at the beginning of stage i to state x at the end of stage i.
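As a concrete sketch of this backward recursion, the thinning example can be solved directly from the yield tables earlier in the lecture. The table data come from the slides; I assume the stand is clear-cut at age 30 in stage 3 (consistent with the arc labels in the network figure), and the function and variable names are my own:

```python
# Backward recursion for the thinning example.
# Each stage maps (beginning volume, volume thinned) -> volume at the next age.
# Stage 1: age 10 -> 20 (beginning volume 260 m3/ha)
stage1 = {260: {0: 650, 50: 535, 100: 410}}
# Stage 2: age 20 -> 30
stage2 = {650: {0: 850, 150: 750, 200: 650},
          535: {0: 750, 100: 650, 175: 500},
          410: {0: 600, 75: 500, 150: 400}}

# f3(s): assumed clear-cut at age 30, so the remaining value is s itself.
f3 = {s: s for acts in stage2.values() for s in acts.values()}

def best(table, f_next):
    """f(s) = max over thinning x of [x + f_next(resulting volume)]."""
    f, policy = {}, {}
    for s, acts in table.items():
        x, val = max(((x, x + f_next[v]) for x, v in acts.items()),
                     key=lambda t: t[1])
        f[s], policy[s] = val, x
    return f, policy

f2, pi2 = best(stage2, f3)
f1, pi1 = best(stage1, f2)
print(f1[260], pi1[260])   # maximum total harvest and optimal action at age 10
```

The recursion finds a maximum total harvest of 900 m3/ha: no thinning at age 10, thin 150 at age 20, then clear-cut 750 at age 30.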
Dynamic Programming
• Structural requirements of DP
– The problem can be divided into stages
– Each stage has a finite set of associated states (discrete state DP)
– The impact of a policy decision to transform a state in a given stage to another state in the subsequent stage is deterministic
– Principle of optimality: given the current state of the system, the optimal policy for the remaining stages is independent of any prior policy adopted
Examples of DP
• The Floyd-Warshall Algorithm (used in the Bucket formulation of ARM):
Let c(i, j) denote the length (or weight) of the path between nodes i and j; and
let s(i, j, k) denote the length (or total weight) of the shortest path between nodes i and j going through intermediate nodes 1, 2, ..., k.
Then, the following recursion will give the lengths of the shortest paths between all pairs of nodes in a graph:

s(i, j, k) = c(i, j) if k = 0;
s(i, j, k) = min{ s(i, j, k-1), s(i, k, k-1) + s(k, j, k-1) } otherwise.
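A minimal sketch of the recursion in code; since s(i, j, k) only depends on level k-1, a single distance matrix can be updated in place. The 4-node example graph and its weights are made up for illustration:

```python
# Floyd-Warshall all-pairs shortest paths: the recursion s(i, j, k) above,
# computed in place so only one distance matrix is stored.
INF = float("inf")

def floyd_warshall(c):
    """c is an n x n weight matrix; c[i][j] = INF if there is no edge i -> j."""
    n = len(c)
    s = [row[:] for row in c]            # s(i, j, 0) = c(i, j)
    for k in range(n):                   # allow node k as an intermediate
        for i in range(n):
            for j in range(n):
                if s[i][k] + s[k][j] < s[i][j]:
                    s[i][j] = s[i][k] + s[k][j]
    return s

# Small usage example (a 4-node digraph with assumed weights):
c = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
d = floyd_warshall(c)
print(d[0][2])   # shortest 0 -> 2 distance (via node 1)
```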
Examples of DP cont.
• Minimizing the risk of losing an endangered species (non-linear DP):
Source: Buongiorno and Gilles’ (2003) Decision Methods for Forest Resource Management
Probability of project failure by funding level (state):

Project (stage)   $0    $1M   $2M
1                 50%   30%   20%
2                 70%   50%   30%
3                 80%   50%   40%

[Figure: DP network for the funding example. Nodes are the budgets available ($0, $1M, $2M) at Stages 1-3; arcs are allocation decisions.]
Minimizing the risk of losing an endangered species (DP example)
• Stages (t = 1, 2, 3) represent the $ allocation decisions for Projects 1, 2 and 3;
• States (i = 0, 1, 2) represent the budgets ($M) available at each stage t;
• Decisions (j = 0, 1, 2) represent the budgets available at stage t+1;
• p_t(i, j) denotes the probability of failure of Project t if decision j is made (i.e., i - j is spent on Project t); and
• V_t*(i) is the smallest probability of total failure from stage t onward, starting in state i and making the best decision j*.

Then, the recursive relation can be stated as:

V_t*(i) = min over j of [ p_t(i, j) × V_{t+1}*(j) ]
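A sketch of this recursion on the funding example, with the failure probabilities from the table above. Total failure means all three projects fail, so the stage values multiply; the boundary condition V*(i) = 1 after stage 3 and the variable names are my own:

```python
# Backward recursion for the project-funding example: minimize the probability
# that all three projects fail, given a total budget of $2M.
# p[t][spend] = probability that Project t fails if `spend` $M is spent on it.
p = {1: {0: 0.50, 1: 0.30, 2: 0.20},
     2: {0: 0.70, 1: 0.50, 2: 0.30},
     3: {0: 0.80, 1: 0.50, 2: 0.40}}

V = {i: 1.0 for i in range(3)}         # boundary: after stage 3, V*(i) = 1
policy = {}
for t in (3, 2, 1):                    # backward over stages
    Vt = {}
    for i in range(3):                 # i = budget available at stage t
        # j = budget passed on to stage t+1, so i - j is spent on Project t
        j_best, v_best = min(((j, p[t][i - j] * V[j]) for j in range(i + 1)),
                             key=lambda x: x[1])
        Vt[i] = v_best
        policy[t, i] = i - j_best      # optimal spend at (stage, state)
    V = Vt
print(round(V[2], 3), policy[1, 2])    # minimal failure prob; spend at stage 1
```

Starting with the full $2M at stage 1, the best plan spends $1M on Project 1, nothing on Project 2 and $2M on Project 3, for a total-failure probability of 0.3 × 0.7 × 0.5 = 0.105.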
Markov Chains(based on Buongiorno and Gilles 2003)
Volume by stand state:
State i   Volume (m3/ha)
L         <400
M         400-700
H         >700

20-yr transition probabilities w/o management:
Start         End state j
state i     L     M     H
L          40%   60%    0%
M           0%   30%   70%
H           5%    5%   90%
Transition probability matrix P:
p_t = probability distribution of stand states in period t
p_t = p_{t-1} P

• Vector p_t converges to a vector of steady-state probabilities p*.
• p* is independent of p_0!
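A quick sketch of the convergence claim, repeatedly applying p_t = p_{t-1} P with the no-management matrix. The starting distribution is arbitrary by design; changing it gives the same p*:

```python
# 20-yr transition matrix without management (rows: start state L, M, H).
P = [[0.40, 0.60, 0.00],
     [0.00, 0.30, 0.70],
     [0.05, 0.05, 0.90]]

def step(p, P):
    """One application of p_t = p_{t-1} P (row vector times matrix)."""
    return [sum(p[i] * P[i][j] for i in range(3)) for j in range(3)]

p = [1.0, 0.0, 0.0]                    # any starting distribution works
for _ in range(1000):
    p_next = step(p, P)
    if max(abs(a - b) for a, b in zip(p_next, p)) < 1e-12:
        break                          # distribution stopped changing: p*
    p = p_next
print([round(x, 3) for x in p])        # steady-state probabilities p*
```

For this matrix the steady state is p* = (7/103, 12/103, 84/103), roughly (0.068, 0.117, 0.816): in the long run the landscape is dominated by high-volume stands.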
Markov Chains cont.
• Berger-Parker Landscape Index
• Mean Residence Times
• Mean Recurrence Times
Berger-Parker Landscape Index:

BP_t = max(p_Lt, p_Mt, p_Ht) / (p_Lt + p_Mt + p_Ht),

where p_it is the probability that the stand is in state i = L, M or H in period t.

Mean residence time:

m_i = D / (1 - p_ii),

where D is the length of each period, and p_ii is the probability that a stand in state i in the beginning of period t stays there till the end of period t.

Mean recurrence time:

m_i = D / p_i*,

where p_i* is the steady-state probability of state i.
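These three quantities follow directly from the transition matrix and its steady state. A sketch for the no-management matrix, where the steady-state values (7/103, 12/103, 84/103) are those obtained by iterating p_t = p_{t-1} P:

```python
# Berger-Parker index and mean residence / recurrence times for the
# no-management transition matrix (D = 20-year periods).
D = 20.0
p_star = {"L": 7 / 103, "M": 12 / 103, "H": 84 / 103}  # steady-state probs
p_ii = {"L": 0.40, "M": 0.30, "H": 0.90}               # diagonal of P

# Berger-Parker index at steady state: dominance of the most likely state.
BP = max(p_star.values()) / sum(p_star.values())

# Mean residence time m_i = D / (1 - p_ii): expected stay in state i.
residence = {i: D / (1 - p_ii[i]) for i in p_ii}

# Mean recurrence time m_i = D / p_i*: expected time between visits to i.
recurrence = {i: D / p_star[i] for i in p_star}
print(round(BP, 3), {i: round(m, 1) for i, m in residence.items()})
```

For example, an H stand stays H for 20/(1 - 0.9) = 200 years on average, while an L stand recurs only every 20 × 103/7, about 294, years.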
Markov Chains cont.
• Forest dynamics (expected revenues or biodiversity with vs. w/o management):

20-yr transition probabilities w/o management:
Start         End state j
state i     L     M     H
L          40%   60%    0%
M           0%   30%   70%
H           5%    5%   90%

20-yr transition probabilities w/ management:
Start         End state j
state i     L     M     H
L          40%   60%    0%
M           0%   30%   70%
H          40%   60%    0%

Expected long-term biodiversity: B = p_L* B_L + p_M* B_M + p_H* B_H
Expected long-term periodic income: R = p_L* R_L + p_M* R_M + p_H* R_H
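A minimal sketch of the expected-income formula as a weighted average over the steady state. The per-state incomes below are hypothetical placeholders, not values from the lecture; the steady-state probabilities are those of the no-management matrix:

```python
# Expected long-term periodic income R = sum_i p_i* R_i at the steady state.
p_star = [7 / 103, 12 / 103, 84 / 103]   # p_L*, p_M*, p_H* (w/o management)
R = [0.0, 300.0, 500.0]                  # $/ha per 20-yr period (assumed)
expected_income = sum(p * r for p, r in zip(p_star, R))
print(round(expected_income, 1))
```

Expected biodiversity is computed identically, substituting per-state biodiversity scores B_i for the incomes R_i.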
Markov Chains cont.
• Present value of expected returns
V_{i,t} = R_i + [1 / (1 + r)^20] (p_iL V_{L,t-1} + p_iM V_{M,t-1} + p_iH V_{H,t-1}),

where V_{i,t} is the present value of the expected return from a stand in state i = L, M or H managed with a specific harvest policy with t periods to go before the end of the planning horizon. R_i is the immediate return from managing a stand in state i with the given harvest policy. p_iL, p_iM, p_iH are the probabilities that a stand in state i moves to state L, M or H, respectively.
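A sketch of this backward recursion for a fixed policy, using the no-management matrix; the immediate returns and the 4%/yr discount rate are assumed for illustration:

```python
# Present value V_{i,t} = R_i + (1+r)^-20 * sum_j p_ij * V_{j,t-1}
# for a fixed harvest policy (here: no management).
P = [[0.40, 0.60, 0.00],
     [0.00, 0.30, 0.70],
     [0.05, 0.05, 0.90]]
R = [0.0, 100.0, 400.0]                # immediate returns by state (assumed)
r = 0.04                               # annual discount rate (assumed)
delta = 1 / (1 + r) ** 20              # 20-yr discount factor

V = [0.0, 0.0, 0.0]                    # V_{i,0} = 0: horizon reached
for t in range(1, 51):                 # 50 periods (1000 years) to go
    V = [R[i] + delta * sum(P[i][j] * V[j] for j in range(3))
         for i in range(3)]
print([round(v) for v in V])           # present values by starting state
```

With 50 periods to go the values have essentially converged to the infinite-horizon solution of V = R + delta P V.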
Markov Decision Processes
20-yr transition probabilities depending on the harvest decision:

No Cut:
Start         End state j
state i     L     M     H
L          40%   60%    0%
M           0%   30%   70%
H           5%    5%   90%

Cut:
Start         End state j
state i     L     M     H
L          40%   60%    0%
M          40%   60%    0%
H          40%   60%    0%
V*_{i,t} = max over j of { R_ij + [1 / (1 + r)^20] (p_iLj V*_{L,t-1} + p_iMj V*_{M,t-1} + p_iHj V*_{H,t-1}) },

where V*_{i,t} is the highest present value of the expected return from a stand in state i = L, M or H with t periods to go before the end of the planning horizon. R_ij is the immediate return from managing a stand in state i with harvest policy j. p_iLj, p_iMj, p_iHj are the probabilities that a stand in state i moves to state L, M or H, respectively, if the harvest policy is j.
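A sketch of value iteration over the two harvest decisions, with the transition matrices from the tables above. The returns R_ij (revenue only when cutting) and the 4%/yr discount rate are assumptions for illustration:

```python
# MDP value iteration:
# V*_{i,t} = max_j { R_ij + (1+r)^-20 * sum_k p_ikj * V*_{k,t-1} }
P = {"no_cut": [[0.40, 0.60, 0.00],
                [0.00, 0.30, 0.70],
                [0.05, 0.05, 0.90]],
     "cut":    [[0.40, 0.60, 0.00],
                [0.40, 0.60, 0.00],
                [0.40, 0.60, 0.00]]}
R = {"no_cut": [0.0, 0.0, 0.0],
     "cut":    [0.0, 300.0, 700.0]}    # assumed harvest revenues by state
delta = 1 / 1.04 ** 20                 # 20-yr discount factor at r = 4%/yr

V = [0.0, 0.0, 0.0]
for t in range(100):                   # iterate toward the infinite horizon
    V = [max(R[j][i] + delta * sum(P[j][i][k] * V[k] for k in range(3))
             for j in P)
         for i in range(3)]
# Recover the optimal decision in each state from the converged values.
policy = [max(P, key=lambda j: R[j][i] + delta * sum(P[j][i][k] * V[k]
                                                     for k in range(3)))
          for i in range(3)]
print([round(v) for v in V], policy)
```

Under these assumed returns the optimal policy cuts whenever the stand is in state M or H; in state L the two decisions coincide.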