Solving Sequential Decision Problems via Continuation Values¹
Qingyin Maᵃ and John Stachurskiᵇ
ᵃ,ᵇ Research School of Economics, Australian National University
September 12, 2016
ABSTRACT. We study a solution method for sequential decision problems based around the
continuation value function, rather than the value function. This approach turns out to have
significant advantages. One is that continuation value functions are smoother, allowing for sharper
analysis of optimal policies and more efficient computation. Another is that, for a range of
problems, the continuation value function exists in a lower dimensional space than the value
function, mitigating the curse of dimensionality. In one typical experiment, the lower state
dimension reduces computation time from over a week to less than three minutes.
¹ The authors acknowledge Australian Research Council Discovery Grant DP120100321.
Email addresses: qingyin.ma@anu.edu.au, john.stachurski@anu.edu.au
PREFACE
Thesis Title: Essays on Sequential Decision Problems in Economic Dynamics
Supervisor: Prof. John Stachurski
In many economic problems, agents located in a stochastically evolving environment must
choose between acting now or waiting for a better opportunity. These problems can be modeled
in an optimal stopping framework. The thesis attempts to provide a systematic analysis
of this class of problems. Three main contributions are made:
Firstly, the thesis extends the standard dynamic programming theory by providing a systematic
treatment of unbounded returns in optimal stopping problems. Under general settings
where unbounded return functions are permitted, the thesis provides easy-to-check sufficient
conditions for the existence and uniqueness of solutions to the Bellman equation, and
the unique fixed point of the Bellman operator is shown to be the value function (VF). The
theory is applicable to a broad class of applications in economics and finance.
Secondly, the thesis proposes an alternative approach to solving optimal stopping problems.
The idea involves calculating the continuation value function (CVF) directly, and has significant
advantages over standard approaches based on the VF: (1) In a wide range of economic
applications, the CVF exists in a lower dimensional space than the VF, while the converse never holds.
This allows us to mitigate one of the primary stumbling blocks for numerical analysis: the
curse of dimensionality. (2) The CVF is typically smoother than the VF, which makes it easier to
approximate numerically. (3) The CVF-based approach allows a sharp analysis of the optimal policy.
Finally, some preliminary extensions have been made: the theory is shown to work
well for repeated optimal stopping problems. The next stage is to build a unified theoretical
framework that treats optimal stopping problems with recursive preferences.
Although only economic applications are presented, the theory developed contributes to
many other areas, including mathematical finance, operations research and sequential analysis.
The thesis is structured as follows:
Chapter I: Introduction; Chapter II: Optimal Stopping with Unbounded Returns; Chap-
ter III: The Continuation Value Based Approach; Chapter IV: Extensions; Chapter V:
Conclusions.
This paper is mainly based on results presented in Chapter III.
1. INTRODUCTION
In many economic problems, agents face stochastically evolving environments and choose
between acting immediately or waiting for a better opportunity. One such scenario is that
faced by job seekers, who can either accept their current wage offer or continue job hunting
(see, e.g., McCall (1970) or Pissarides (2000)). Another one is that faced by firms choosing
whether to enter a market or to wait, or to exit when incumbent (e.g., Jovanovic (1982),
Hopenhayn (1992), Ericson and Pakes (1995), Fajgelbaum et al. (2015)). Other problems in
this category include American call and put options (Karatzas and Shreve (1998), Shiryaev
(1999), Duffie (2010)), consumer search problems (Burdett and Judd (1983), Kiyotaki and
Wright (1993), Trejos and Wright (1995), Shi (1995, 1997)), optimal default (Choi et al. (2003),
Albuquerque and Hopenhayn (2004), Arellano (2008)), optimal replacement of durable goods
(Rust (1986, 1987)), optimal timing of investment (Dixit and Pindyck (1994)), timing of re-
tirement (Huggett et al. (2011)), timing of harvesting agricultural products (Insley and Wir-
janto (2010)) and optimal monopoly pricing with unknown demand across multiple markets
(Rothschild (1974)).²
In solving these problems, the standard path is to first seek the value function, which gives
maximal expected rewards from the flow of possible payoffs. From the value function, one
can calculate the continuation value by taking the expectation of the value function in the
next period, appropriately discounted and combined with flow benefits from continuation.
Once the continuation value is obtained, it can be compared with the reward from stopping.
The optimal policy is to stop if and only if the reward from stopping is larger.
An alternative approach was introduced by Jovanovic (1982) in the context of firm exit de-
cisions. The idea involves calculating the continuation value directly, using an operator
that we refer to below as the continuation value operator. In this paper, we show that Jo-
vanovic’s approach extends naturally to almost all optimal stopping problems of interest to
economists. We systematically study the method and its relationship to traditional dynamic
programming. We show that, for many problems, this method has significant advantages
over traditional methods based around the value function.
² Optimal stopping problems have major roles in other related fields. For example, in finance, American
options provide the right to buy or sell an asset at a predetermined price or continue to the next period (Duffie
(2010)). Analysis of options in financial markets has led to the study of various economic and political decisions
using the framework of real options (Alvarez and Dixit (2014), Backus (2014)). Within operations research, prob-
lems such as adaptive routing and optimal dynamic mechanism design are solved using the theory of optimal
stopping.
One advantage is that, for a range of interesting problems, the continuation value function
exists in a lower dimensional space than the value function.³ For example, in the classic job
search model of McCall (1970), wage offers are independent draws from a fixed distribution.
The current offer affects lifetime rewards only if the agent decides to accept the offer. If not,
then the process updates with the preceding draw forgotten. Hence the current wage draw
appears in the value function—since the offer in hand can impact lifetime rewards—but has
no impact on the continuation value.⁴
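The dimensionality point can be illustrated with a minimal Python sketch, which is not part of the original analysis. It assumes linear utility, a hypothetical finite wage grid with a uniform offer distribution, and illustrative values for the discount factor and unemployment compensation. With IID offers the continuation value is a single number h, whereas the value function would have to be stored at every wage on the grid.

```python
import numpy as np

def iid_continuation_value(wages, probs, c, beta, tol=1e-10, max_iter=10_000):
    """Iterate h <- c + beta * sum_w' max{w'/(1-beta), h} p(w') for IID offers."""
    h = 0.0
    for _ in range(max_iter):
        h_new = c + beta * np.sum(np.maximum(wages / (1.0 - beta), h) * probs)
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    raise RuntimeError("iteration did not converge")

# Hypothetical primitives, chosen only for illustration
beta, c = 0.95, 25.0
wages = np.linspace(10.0, 60.0, 51)
probs = np.full(wages.size, 1.0 / wages.size)

h_star = iid_continuation_value(wages, probs, c, beta)
# The worker accepts an offer w exactly when w / (1 - beta) >= h_star
reservation_wage = (1.0 - beta) * h_star
```

The state of the iteration is one float, regardless of how fine the wage grid is.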
The practical impact of lower dimensionality can be very large, as has been pointed out by
many authors (see, e.g., Bellman (1969) or Rust (1997)). For example, while solving a well
known version of the job search model in Section 5.1, we find that the continuation value
based approach takes only 171 seconds to compute the optimal policy to a given level of
accuracy, as opposed to more than 7 days for the value function iteration approach.
A second potential benefit of the continuation value based approach is that the continuation
value function is often smoother than the value function. The intuition behind this result
is that the value function is typically kinked at points where it is optimal to switch between
continuing and stopping. However, when transitions are stochastic and shocks have a degree
of smoothness (for example, the distributions have densities), such kinks are smoothed out
in the continuation value function. As a result, the continuation value function becomes
easier to approximate and more useful for making inferences about the optimal policy. For
example, we use smoothness in the continuation value function to obtain new results on the
differentiability of transition thresholds (e.g., reservation wages) as functions of other state
variables.
In extending Jovanovic’s continuation value function method to the whole spectrum of se-
quential decision problems used by economists, several challenges must be addressed. One
is that, in many applications, rewards are unbounded, meaning that traditional methods
based around contractions with respect to supremum norms do not apply.⁵ To this end, we
study the continuation value function in general settings where unbounded payoff functions
are allowed. This is achieved by using weighted supremum norms. This approach turns out
to interact well with the continuation value function operator, leading to simple sufficient
conditions that are straightforward to check in applications.
³ While every state variable that appears in the continuation value function must appear in the value function,
the converse is not true. Hence, the number of arguments in the continuation value function is always weakly
less than the number of arguments in the value function, and sometimes strictly so.
⁴ Of course the current wage offer could affect the continuation value in a variety of ways, some of which are
considered below. For example, McCall (1970) considers a mechanism where the current offer matters for
the state of knowledge, as described by a belief distribution. In this case, however, the value function is still higher
dimensional than the continuation value function, since the value function must track both the current offer and
the parameters in the belief distribution, while the continuation value function tracks only the latter.
⁵ While in some cases this problem can be eliminated by compactifying the state space of the underlying
model, in other cases such changes are problematic. For example, wages might be driven by a state process with
a unit root (see Example 2.2 below), in which case the state space cannot be compactified. Alternatively, in studies
of firm decisions, interest might center on the tails of the firm size distribution, so compactifying the state space
is undesirable.
Since we tackle unbounded problems, our research is also connected to earlier studies on
unbounded dynamic programming. In economics, the weighted supremum norm approach
was pioneered by Boyd (1990) and has been used in numerous other studies of unbounded
dynamic programming.⁶ When adapting this method to continuation value functions, we
find it possible to develop a simple and direct version of the methodology that includes
bounded problems as a special case.
⁶ Examples include Becker and Boyd (1997), Alvarez and Stokey (1998), Duran (2000), Duran (2003) and Le Van
and Vailakis (2005).
Another line of research treats unboundedness via the local contraction approach, which
constructs a local contraction based on a suitable sequence of increasing compact subsets.
See, for example, Rincón-Zapatero and Rodríguez-Palmero (2003, 2009), Martins-da-Rocha
and Vailakis (2010) and Matkowski and Nowak (2011). One of the motivations of this line of
work is to deal with dynamic programming problems that are unbounded both above and
below. For our problem, we show that the weighted supremum norm based method can
tackle this case effectively, and hence we do not consider local contractions.
The paper is structured as follows. Section 2 outlines the method and provides the basic
optimality results. Section 3 discusses the properties of the continuation value function, such
as continuity and differentiability. Section 4 explores the connections between the continuation
value and the optimal policy. Section 5 compares the computational efficiency of our
approach with the value function approach. Section 6 concludes. Proofs are provided in the
appendix.⁷
⁷ Due to the page limit, the appendix has been shortened considerably. However, a complete technical appendix
is available upon request.
2. OPTIMALITY RESULTS
This section studies the optimality results. Prior to discussing technical details, we first give
an overview of the method and our terminology.
2.1. Overview. Consider a decision problem where an agent is faced at each point in time
with the choice between stopping (e.g., exercising an option, exiting a market, accepting a
job) or continuing to the next stage. Suppose that the value function v∗ satisfies a Bellman
equation of the form
\[ v^*(z) = \max\left\{ r(z),\; c(z) + \beta \int v^*(z') P(z, dz') \right\} \tag{1} \]
where z ∈ Z is the current state, z′ is next period’s state, r(z) is the payoff to stopping, c(z) is
the flow payoff to continuing and P gives one step transition probabilities for the state. For
example, r(z) might be the liquidation value of a firm considering whether to exit a market
and c(z) might be one period profit conditional on remaining active, given the state z. In this
case, v∗(z) is the value of the firm prior to deciding whether to continue or exit.
The continuation value function associated with this problem is the second term on the right
hand side of (1). We write it as
\[ \psi^*(z) := c(z) + \beta \int v^*(z') P(z, dz'). \tag{2} \]
It is straightforward to write down a functional equation such that ψ∗ is at least one of the
solutions: From (1) and (2), we have v∗(z) = max{r(z), ψ∗(z)} for all z. Inserting this identity
into the right hand side of (2) leads us to the equation
\[ \psi(z) = c(z) + \beta \int \max\{ r(z'), \psi(z') \} P(z, dz') \tag{3} \]
for all z ∈ Z. To analyze this equation, we study the operator Q defined by
\[ Q\psi(z) = c(z) + \beta \int \max\{ r(z'), \psi(z') \} P(z, dz'). \tag{4} \]
By construction, fixed points of Q solve (3). As shown below, they are also continuation
value functions and from them we can derive value functions, optimal stopping rules and so
on. Once the fundamental optimality results are in place, we turn to properties of the con-
tinuation value function, such as continuity, differentiability and monotonicity, and deduce
implications for the optimal stopping rule. Prior to these tasks, we recall some facts related
to optimal stopping and weighted supremum norms.
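As a minimal sketch of the operator in (4), consider the special case of a finite state space, where P is a transition matrix and the integral becomes a matrix-vector product. The three-state primitives below are hypothetical and chosen only for illustration; the code is not taken from the paper.

```python
import numpy as np

def Q(psi, r, c, P, beta):
    """(Q psi)(z) = c(z) + beta * sum_z' max{r(z'), psi(z')} P(z, z')."""
    return c + beta * P @ np.maximum(r, psi)

def solve_cvf(r, c, P, beta, tol=1e-10, max_iter=10_000):
    """Successive approximation of the fixed point of Q, starting from zero."""
    psi = np.zeros_like(c, dtype=float)
    for _ in range(max_iter):
        psi_new = Q(psi, r, c, P, beta)
        if np.max(np.abs(psi_new - psi)) < tol:
            return psi_new
        psi = psi_new
    raise RuntimeError("continuation value iteration did not converge")

# Hypothetical three-state problem
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
r = np.array([1.0, 5.0, 12.0])   # payoff to stopping
c = np.array([0.5, 0.5, 0.5])    # flow payoff to continuing
beta = 0.9

psi_star = solve_cvf(r, c, P, beta)
v_star = np.maximum(r, psi_star)             # value function recovered from psi*
sigma_star = (r >= psi_star).astype(int)     # 1 = stop, 0 = continue
```

The value function and the stopping rule are both by-products of the fixed point of Q, which is the sense in which the continuation value can serve as the primary object of computation.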
2.2. Preliminaries. For real numbers a and b we set a ∨ b := max{a, b}. If f and g are
functions, then (f ∨ g)(x) := f(x) ∨ g(x). If (Z, 𝒵) is a measurable space, then bZ is the
set of 𝒵-measurable bounded functions from Z to R, with norm ‖f‖ := sup_{z∈Z} |f(z)|. For
unbounded functions we use weighted supremum norms. Given a function κ : Z → [1, ∞),
the κ-weighted supremum norm of f : Z → R is defined as
\[ \| f \|_\kappa := \| f / \kappa \| = \sup_{z \in Z} \frac{|f(z)|}{\kappa(z)}. \]
If ‖f‖κ < ∞, then we say that f is κ-bounded. The symbol bκZ will denote the set of all
functions from Z to R that are both 𝒵-measurable and κ-bounded. We use ρκ to represent
the metric ρκ(f, g) = ‖f − g‖κ on bκZ. As is well-known, the pair (bκZ, ρκ) forms a Banach
space.
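The following short Python sketch shows how the κ-weighted supremum norm tames an unbounded function. It evaluates the norm on a finite grid, which only approximates the supremum over Z; the function f(z) = 3z and the weight κ(z) = 1 + |z| are illustrative assumptions.

```python
import numpy as np

def weighted_sup_norm(f_vals, kappa_vals):
    """Grid approximation of sup_z |f(z)| / kappa(z), requiring kappa >= 1."""
    f_vals = np.asarray(f_vals, dtype=float)
    kappa_vals = np.asarray(kappa_vals, dtype=float)
    if np.any(kappa_vals < 1.0):
        raise ValueError("the weight function must satisfy kappa >= 1")
    return float(np.max(np.abs(f_vals) / kappa_vals))

z = np.linspace(-50.0, 50.0, 1001)
f = 3.0 * z                  # sup-norm grows without bound as the grid widens
kappa = 1.0 + np.abs(z)      # hypothetical weight function
norm_value = weighted_sup_norm(f, kappa)
```

Here |f(z)|/κ(z) = 3|z|/(1 + |z|) stays below 3 on any grid, so f is κ-bounded even though it is not bounded.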
A stochastic kernel P on (Z, 𝒵) is a map P : Z × 𝒵 → [0, 1] such that z ↦ P(z, B) is
𝒵-measurable for each B ∈ 𝒵 and B ↦ P(z, B) is a probability measure for each z ∈ Z. Below,
we understand P(z, B) as representing the probability of a state transition from z ∈ Z to
B ∈ 𝒵 in one unit of time.
2.3. Set Up. Let (Zn)n≥0 be a time-homogeneous Markov process defined on probability
space (Ω, ℱ, ℙ) and taking values in measurable space (Z, 𝒵). Let P denote the corresponding
stochastic kernel. Let {ℱn}n≥0 be a filtration contained in ℱ and such that (Zn)n≥0 is
adapted to {ℱn}n≥0. Let ℙz indicate probability conditioned on Z0 = z, while 𝔼z is expectation
conditioned on the same event. In proofs we take (Ω, ℱ) to be the canonical sequence
space, so that Ω = ×_{n=0}^∞ Z and ℱ is the product σ-algebra generated by 𝒵. For the formal
construction of ℙz on (Ω, ℱ) given P and z ∈ Z see Theorem 3.4.1 of Meyn and Tweedie
(2012) or Section 8.2 of Stokey et al. (1989).
A random variable τ taking values in N0 := {0, 1, . . .} is called a (finite) stopping time with
respect to the filtration {ℱn}n≥0 if ℙ{τ < ∞} = 1 and {τ ≤ n} ∈ ℱn for all n ≥ 0. Below,
τ = n has the interpretation of choosing to act at time n. Let M denote the set of all stopping
times on Ω with respect to the filtration {ℱn}n≥0.
Let r : Z → R and c : Z → R be measurable functions, referred to below as the exit payoff
and flow continuation payoff respectively. Consider a problem where, at each time t ≥ 0,
an agent observes Zt and chooses between stopping and continuing. Stopping generates
final payoff r(Zt). Continuing involves continuation payoff c(Zt) and transition to the next
period, where the agent observes Zt+1 and the process repeats. Future payoffs are discounted
at rate β ∈ (0, 1).
The value function is defined at z ∈ Z by
\[ v^*(z) := \sup_{\tau \in \mathcal{M}} \mathbb{E}_z \left\{ \sum_{t=0}^{\tau - 1} \beta^t c(Z_t) + \beta^\tau r(Z_\tau) \right\}. \tag{5} \]
A stopping time τ ∈ M is called an optimal stopping time if it attains the supremum in (5). A
policy is a map σ from Z to {0, 1}, with 0 indicating the decision to continue and 1 indicating
the decision to stop. A policy is called an optimal policy if τ∗ defined by τ∗ := inf{t ≥ 0 | σ(Zt) = 1} is an optimal stopping time.
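To make the link between a policy σ and the payoff inside the expectation in (5) concrete, here is a small Python sketch for a single observed path on a finite state space. The path, the policy and the payoff vectors are hypothetical; the function name is ours.

```python
def stop_and_collect(path, sigma, r, c, beta):
    """Follow policy sigma along one path of states.

    Returns (tau, payoff), where tau = inf{t : sigma(Z_t) = 1} and payoff is
    sum_{t < tau} beta^t c(Z_t) + beta^tau r(Z_tau). If the path ends before
    the policy stops, tau is None and only the accumulated flow payoff is returned.
    """
    payoff = 0.0
    for t, z in enumerate(path):
        if sigma[z] == 1:
            return t, payoff + beta**t * r[z]
        payoff += beta**t * c[z]
    return None, payoff

sigma = [0, 0, 1]            # stop only in state 2
r = [1.0, 5.0, 12.0]
c = [0.5, 0.5, 0.5]
tau, payoff = stop_and_collect([0, 1, 2, 1], sigma, r, c, beta=0.9)
# tau == 2 and payoff == 0.5 + 0.9 * 0.5 + 0.81 * 12.0
```

Averaging such payoffs over many simulated paths from Z0 = z would give a Monte Carlo estimate of the expected reward from following σ.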
To guarantee existence of the value function and related properties without insisting that the
payoff functions are bounded, we adopt the next assumption:
Assumption 2.1. There exist a 𝒵-measurable function g : Z → R+ and constants m, d ∈ R+
such that βm < 1 and, for all z ∈ Z,
\[ \max\left\{ \int |r(z')| P(z, dz'),\; |c(z)| \right\} \le g(z) \tag{6} \]
and
\[ \int g(z') P(z, dz') \le m g(z) + d. \tag{7} \]
The interpretation of Assumption 2.1 is that both r and c are small in absolute value relative
to some function g such that 𝔼z g(Zt) does not grow too quickly. Slow growth in 𝔼z g(Zt) is
imposed by (7), which can be understood as a geometric drift condition (see, e.g., Meyn and
Tweedie (2012), Chapter 15).⁸
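The sense in which (7) restricts growth can be seen from a short calculation, sketched here in LaTeX under the assumptions above. It uses only the Markov property and induction on t, and is our own illustration rather than a step taken from the appendix.

```latex
\begin{align*}
\mathbb{E}_z\, g(Z_{t+1})
  &= \mathbb{E}_z \int g(z')\, P(Z_t, dz')
   \le m\, \mathbb{E}_z\, g(Z_t) + d, \\
\mathbb{E}_z\, g(Z_t)
  &\le m^t g(z) + d \sum_{i=0}^{t-1} m^i
  \qquad \text{for all } t \ge 0 .
\end{align*}
```

Since β < 1 and βm < 1, the bound implies that the discounted sum of 𝔼z g(Zt) over t ≥ 0 is finite for every z, which is what keeps discounted rewards well defined even when r and c are unbounded.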
Example 2.1. A standard example of an optimal stopping problem in economics is job search.
As a simple example, suppose that a worker can either accept a current wage offer wt and
work permanently at that wage, or reject the offer, receive unemployment compensation c,
and reconsider next period. Let the current wage offer be a function wt = w(Zt) of some
idiosyncratic or aggregate state process (Zt)t≥0. The exit reward is r(z) = u(w(z))/(1− β),
where u is a utility function and β < 1 is the discount factor. The flow continuation payoff is
the constant c.⁹ If u is bounded, then we can set g(z) equal to the constant ‖r‖ ∨ c and
Assumption 2.1 is satisfied with m = 1 and d = 0.
Example 2.2. Consider the same setting as Example 2.1, with state process
\[ z_{t+1} = \rho z_t + b + \varepsilon_{t+1}, \qquad (\varepsilon_t) \overset{\text{IID}}{\sim} N(0, \sigma^2). \tag{8} \]
Let wt = exp(zt), so that wages are lognormal. We consider several standard utility functions
that are unbounded.
(1) u(w) = ln w. If β|ρ| < 1, then Assumption 2.1 holds with g(w) = |ln w|, m = |ρ|, and
d = σ√(2/π) + |b|. Since the correlation coefficient ρ ≥ 1 is allowed, our theory
can treat nonstationary state processes.
(2) u(w) = w^{1−γ}/(1−γ), where γ ≥ 0 and γ ≠ 1. Notice that when γ ≠ 0, the utility function
is of constant relative risk aversion form, with coefficient of relative risk aversion γ.
When γ = 0, the utility function reduces to u(w) = w.
(a) If ρ ∈ [0, 1] and
\[ \beta \exp\left[ (1-\gamma)\rho b + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2} \right] < 1, \]
then Assumption 2.1 holds with
\[ m = d = \exp\left[ (1-\gamma)\rho b + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2} \right] \quad \text{and} \quad g(w) = w^{(1-\gamma)\rho}. \]
(b) If ρ ∈ [−1, 0] and
\[ \beta \exp\left[ |(1-\gamma)\rho b| + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2} \right] < 1, \]
then Assumption 2.1 holds with
\[ m = \exp\left[ |(1-\gamma)\rho b| + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2} \right], \quad d = 0, \quad \text{and} \quad g(w) = w^{(1-\gamma)\rho} + w^{-(1-\gamma)\rho}. \]
⁸ To verify Assumption 2.1, it suffices to obtain a 𝒵-measurable function g : Z → R+, constants m, d ∈ R+
with βm < 1 and constants a1, a2, a3 and a4 in R+ such that ∫|r(z′)|P(z, dz′) ≤ a1 g(z) + a2, |c(z)| ≤ a3 g(z) + a4
and (7) holds. We use this fact in the applications below.
⁹ The classical McCall model used an IID wage process (McCall (1970)). We follow many subsequent studies
in assuming Markov dynamics for wages (see, e.g., Jovanovic (1987) or Bull and Jovanovic (1988)).
Example 2.3. Consider the asset pricing problem of a perpetual call option (see, e.g., Shiryaev
(1999), Duffie (2010)), that is, an infinite-horizon American call option with no fixed maturity or
exercise limit. Let x be the current price of the asset. Recall the stochastic process defined in
(8), and let the sequence of asset prices (xt)t≥0 be xt = e^{zt} for all t ≥ 0. The value of the option
to buy the asset at a strike price K is given by
\[ v^*(x) = \max\left\{ (x - K)^+,\; e^{-\gamma} \int v^*(x') f(x'|x)\, dx' \right\}, \]
where f(x′|x) = LN(ρ ln x + b, σ²), and γ > 0 is the riskless rate of return. If ρ ∈ [0, 1] and
\[ \beta \exp\left( \rho b + \frac{\rho^2 \sigma^2}{2} \right) < 1, \]
then Assumption 2.1 holds with m = d = exp(ρb + ρ²σ²/2) and g(x) = x^ρ. If ρ ∈ [−1, 0] and
\[ \beta \exp\left( |\rho b| + \frac{\rho^2 \sigma^2}{2} \right) < 1, \]
then Assumption 2.1 holds with m = exp(|ρb| + ρ²σ²/2), d = 0, and g(x) = x^ρ + x^{−ρ}.
2.4. Optimality. Let g be as in Assumption 2.1 and let
\[ k(z) := \sum_{t \ge 0} \beta^t\, \mathbb{E}_z \left\{ |r(Z_t)| + g(Z_t) \right\} + 1. \tag{9} \]
In a supplementary appendix to this paper, Ma (2016) shows that, under Assumption 2.1, the
value function v∗ is a well-defined element of bkZ that satisfies the Bellman equation (1), and
that the Bellman operator
\[ Tv(z) = \max\left\{ r(z),\; c(z) + \beta \int v(z') P(z, dz') \right\} \tag{10} \]
is a contraction mapping on bkZ when paired with the weighted supremum norm ‖ · ‖k.
Hence v∗ is the unique fixed point. With the notation introduced in Section 2.2, the Bellman
equation (1) can be expressed in functional notation as v = r ∨ (c + βPv), and the continuation
value function can be defined by ψ∗ = c + βPv∗. Since v∗ satisfies the Bellman equation,
we also have v∗ = r ∨ ψ∗. Ma (2016) also shows that the optimal stopping time is
\[ \tau^* := \inf\{ t \ge 0 \mid r(Z_t) \ge \psi^*(Z_t) \}. \]
Thus, the optimal strategy is a Markov strategy, with action at time t depending only on the
current state Zt.
2.5. The Continuation Value Operator. Without loss of generality, consider the case m > 1
and d ≥ 1/β − 1. Let ℓ be the weighting function
\[ \ell(y) = g(y) + \frac{d}{m - 1}. \tag{11} \]
Let Q be the operator from bℓZ to itself defined by (4). As we now show, the fixed point of Q
is the continuation value function ψ∗ defined in (2).
Theorem 2.1. If Assumption 2.1 holds, then the following statements are true:
(1) Q is a contraction mapping on (bℓZ, ρℓ) of modulus βm.
(2) The unique fixed point of Q in bℓZ is ψ∗.
(3) The policy σ∗ defined pointwise by σ∗(z) = 1{r(z) ≥ ψ∗(z)} is an optimal policy.
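The modulus βm in part (1) can be traced to the choice of weighting function in (11). The following LaTeX sketch records the key inequalities; it is our reading of why the construction works, assuming m > 1 and d ≥ 1/β − 1 as above, and it is not a substitute for the proof in the appendix.

```latex
\begin{align*}
\int \ell(z')\, P(z, dz')
  &= \int g(z')\, P(z, dz') + \frac{d}{m-1}
   \le m\, g(z) + d + \frac{d}{m-1}
   = m\, \ell(z), \\
|Q\psi(z) - Q\phi(z)|
  &\le \beta \int \bigl| r(z') \vee \psi(z') - r(z') \vee \phi(z') \bigr|\, P(z, dz') \\
  &\le \beta \int |\psi(z') - \phi(z')|\, P(z, dz')
   \le \beta\, \|\psi - \phi\|_{\ell} \int \ell(z')\, P(z, dz')
   \le \beta m\, \|\psi - \phi\|_{\ell}\, \ell(z).
\end{align*}
```

Dividing by ℓ(z) and taking the supremum over z gives ‖Qψ − Qφ‖ℓ ≤ βm ‖ψ − φ‖ℓ. The restriction d ≥ 1/β − 1, together with m < 1/β, gives d/(m − 1) ≥ 1, so that ℓ takes values in [1, ∞) as a weighting function must.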
Example 2.2 (Continued). Recall the extended job search model of McCall (1970), in which
the wage process is generated by a general Markov process (zt)t≥0. For each type
of utility function u, the continuation value operator satisfies
\[ Q\psi(w) = c + \beta \int \max\left\{ \frac{u(w')}{1 - \beta},\; \psi(w') \right\} f(w'|w)\, dw'. \]
Since Assumption 2.1 has been verified, from Theorem 2.1 we know that there exists a unique
fixed point of Q in bℓZ that coincides with ψ∗, the continuation value function, which
in the current case represents the expected value of rejecting the current offer and waiting
for a new draw.
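A rough numerical sketch of this operator for u(w) = ln w is given below in Python. It works with the state z = ln w, approximates the integral by Gauss-Hermite quadrature and approximates ψ by linear interpolation on a truncated grid. The grid bounds, parameter values and function names are assumptions made for illustration, and truncating the state space is a numerical convenience that the theory above does not require.

```python
import numpy as np

def build_Q(rho=0.9, b=0.0, sigma=0.2, beta=0.95, c0=0.5,
            z_min=-4.0, z_max=4.0, grid_size=200, n_nodes=15):
    """Return a grid for z = ln w and a discretized continuation value operator."""
    z_grid = np.linspace(z_min, z_max, grid_size)
    # Gauss-Hermite rule: E h(xi) ~ sum_i probs[i] * h(sqrt(2) * nodes[i]), xi ~ N(0, 1)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    probs = weights / np.sqrt(np.pi)
    z_next = rho * z_grid[:, None] + b + np.sqrt(2.0) * sigma * nodes[None, :]
    r_next = z_next / (1.0 - beta)   # exit payoff u(w') / (1 - beta) with u = ln

    def Q(psi):
        # Linear interpolation; np.interp holds psi constant outside the grid.
        psi_next = np.interp(z_next.ravel(), z_grid, psi).reshape(z_next.shape)
        return c0 + beta * np.maximum(r_next, psi_next) @ probs

    return z_grid, Q

def iterate(Q, psi0, tol=1e-8, max_iter=5_000):
    """Successive approximation psi_{n+1} = Q psi_n until the sup distance is below tol."""
    psi = psi0
    for _ in range(max_iter):
        psi_new = Q(psi)
        if np.max(np.abs(psi_new - psi)) < tol:
            return psi_new
        psi = psi_new
    raise RuntimeError("continuation value iteration did not converge")

z_grid, Q = build_Q()
psi_star = iterate(Q, np.zeros(z_grid.size))
accept = z_grid / (1.0 - 0.95) >= psi_star   # stop where r(z) >= psi*(z)
```

Only the one-dimensional function ψ of the state is stored and updated; the policy is read off afterwards by comparing ψ∗ with the exit payoff.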
Example 2.3 (Continued). Recall the perpetual option problem of Shiryaev (1999). The continuation
value operator for the perpetual option satisfies
\[ Q\psi(x) = e^{-\gamma} \int \max\left\{ (x' - K)^+,\; \psi(x') \right\} f(x'|x)\, dx'. \]
By Theorem 2.1, Q admits a unique fixed point ψ∗ in bℓZ, which in this case can be interpreted
as the expected value of holding the option in the current period and considering exercising
at a later stage.
Example 2.4. (Firm Exit I). Consider a firm exit model in the style of Hopenhayn (1992). At
the beginning of each period, a productivity shock a is realized and observed by an incumbent
firm in the industry. The firm must decide whether to exit the market or not in the next
period (before a′ is realized). The output of the firm is q(a, l) = a l^α, where α ∈ (0, 1) and l
denotes labor demand. Suppose that the productivity shock process (at)t≥0 satisfies at = e^{zt}
for all t ≥ 0, where (zt)t≥0 is defined in (8).
A fixed cost c_f > 0 must be paid every period by the incumbent firm, which can be treated
as a fixed outside opportunity cost for some resources (e.g., managerial ability) used by the
firm. Given output and input prices p and w, profit maximization behavior implies that the
exit payoff and the flow continuation payoff of staying in the industry are
\[ r(a) = c(a) = G a^{\frac{1}{1-\alpha}} - c_f, \qquad \text{where } G = \left( \frac{\alpha p}{w} \right)^{\frac{1}{1-\alpha}} \left( \frac{1-\alpha}{\alpha} \right) w. \]
The continuation value operator is
\[ Q\psi(a) = \left( G a^{\frac{1}{1-\alpha}} - c_f \right) + \beta \int \max\left\{ G (a')^{\frac{1}{1-\alpha}} - c_f,\; \psi(a') \right\} f(a'|a)\, da', \]
where f(a′|a) = LN(ρ ln a + b, σ²). It can be verified that if ρ ∈ [0, 1] and
\[ \beta \exp\left[ \frac{b}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right] < 1, \]
then Assumption 2.1 holds with
\[ g(a) = a^{\frac{1}{1-\alpha}} \quad \text{and} \quad m = d = \exp\left[ \frac{b}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right]. \]
If ρ ∈ [−1, 0] and
\[ \beta \exp\left[ \frac{|b|}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right] < 1, \]
then Assumption 2.1 holds with
\[ g(a) = a^{\frac{1}{1-\alpha}} + a^{-\frac{1}{1-\alpha}}, \quad m = \exp\left[ \frac{|b|}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right] \quad \text{and} \quad d = 0. \]
By Theorem 2.1, Q admits a unique fixed point in bℓZ that corresponds to the continuation
value function ψ∗, which can be understood as the expected value of staying in the industry
for the next period and performing optimally afterwards.
Example 2.5. (Firm Exit II). Consider the firm exit model of Jovanovic (1982). Let q be
the output of a firm, and C(q) a cost function that satisfies C(0) = C′(0) = 0, C′(q) >
0, C′′(q) > 0, and lim_{q→∞} C′(q) = ∞. The total cost is C(q)x, where (xt)t≥0 is a stochastic
process that satisfies xt = l(ηt); l is a positive, strictly increasing, and continuous function
with lim_{η→−∞} l(η) = α1 > 0 and lim_{η→∞} l(η) = α2 ≤ ∞; and (ηt)t≥0 is a stochastic process
that satisfies
\[ \eta_t = \xi + \varepsilon_t, \qquad (\varepsilon_t) \overset{\text{IID}}{\sim} N(0, \sigma^2), \]
where ξ denotes firm type, which is connected to firm efficiency and is unobservable. At the
beginning of each period, the firm observes x, and must decide whether to exit the industry or
not. The firm has prior belief ξ ∼ N(µ, γ) and updates it in a Bayesian manner after observing
x′, so the posterior is ξ|x′ ∼ N(µ′, γ′), where
\[ \gamma' = \left( \frac{1}{\gamma} + \frac{1}{\sigma^2} \right)^{-1} \quad \text{and} \quad \mu' = \gamma' \left( \frac{\mu}{\gamma} + \frac{l^{-1}(x')}{\sigma^2} \right). \]
Let π(p, x) = max_q [pq − C(q)x] be the maximal profits, where (pt)t≥0 is a bounded price
sequence which is Markovian with transition probability h. Jovanovic (1982) shows that π is
a bounded and continuous function. Let W > 0 denote the expected present value of the
firm’s fixed factor in a different industry. Then the continuation value operator satisfies
\[ Q\psi(p, x, \mu, \gamma) = \pi(p, x) + \beta \int \max\left\{ W,\; \psi(p', x', \mu', \gamma') \right\} f(x'|\mu, \gamma)\, h(p'|p)\, d(x', p'). \]
Since both the exit and flow continuation payoffs are bounded, Assumption 2.1 is satisfied
trivially by letting g be the upper bound of W ∨ π, m = 1 and d = 0. So Q admits a unique
fixed point ψ∗ in bZ that can be interpreted as the value of staying in the industry for one
period and performing optimally afterwards.
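The Bayesian updating step in this example is a standard normal-normal update, and can be sketched in Python as follows. The function name and argument names are ours; the argument `signal` stands for l⁻¹(x′), that is, the observed value of η, and the numbers in the usage line are illustrative.

```python
def update_belief(mu, gamma, signal, noise_var):
    """Normal-normal update of a N(mu, gamma) prior after observing signal = xi + noise.

    gamma' = (1/gamma + 1/noise_var)^(-1)
    mu'    = gamma' * (mu/gamma + signal/noise_var)
    """
    gamma_new = 1.0 / (1.0 / gamma + 1.0 / noise_var)
    mu_new = gamma_new * (mu / gamma + signal / noise_var)
    return mu_new, gamma_new

mu1, gamma1 = update_belief(mu=0.0, gamma=1.0, signal=2.0, noise_var=1.0)
# mu1 == 1.0, gamma1 == 0.5
```

With equal prior and noise variances the posterior mean is the midpoint of the prior mean and the signal, and the posterior variance is half the prior variance. The variance update does not depend on the signal, so γ follows a deterministic path in the state vector (p, x, µ, γ).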
The two operators Q and T are closely related, in the sense that the n-th iterate of the value
function can be obtained from the n-th iterate of the continuation value function by taking the
pointwise maximum of this function and r. In particular, iterates of these operators converge
to their respective fixed points at the same rate. The next proposition clarifies:
Proposition 2.1. Fix ψ0 ∈ bℓZ and let v0 := r ∨ ψ0. If vn := T^n v0 and ψn := Q^n ψ0 for some
n ∈ N, then vn = r ∨ ψn.
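Proposition 2.1 is easy to check numerically on a finite state space. The Python sketch below iterates T and Q side by side on a hypothetical three-state problem (the primitives are assumptions for illustration) and records the largest gap between vn and r ∨ ψn.

```python
import numpy as np

# Hypothetical three-state primitives
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
r = np.array([1.0, 5.0, 12.0])
c = np.array([0.5, 0.5, 0.5])
beta = 0.9

def T(v):
    """Bellman operator (10) on a finite state space."""
    return np.maximum(r, c + beta * P @ v)

def Q(psi):
    """Continuation value operator (4) on a finite state space."""
    return c + beta * P @ np.maximum(r, psi)

psi = np.zeros(3)
v = np.maximum(r, psi)       # v_0 = r v psi_0
gaps = []
for n in range(1, 21):
    psi, v = Q(psi), T(v)
    gaps.append(np.max(np.abs(v - np.maximum(r, psi))))
max_gap = max(gaps)
```

The identity holds at every iterate because T(r ∨ ψ) = r ∨ Qψ, which is also why the two iterations converge at the same rate.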
3. PROPERTIES OF CONTINUATION VALUES
In this section we explore some further properties of the continuation value function. As
the most significant result among those we establish, the continuation value function ψ∗
is shown to be smooth (continuously differentiable) under mild assumptions. While the
value function v∗ usually has kinks, ψ∗ can be smoother since the incorporated integration
operation creates a smoothing effect. This makes the continuation value based approach
more favorable for numerical computation than the value function based approaches since
smooth functions are easier to approximate numerically.
3.1. Continuity. We establish two results on continuity. The first one serves general prob-
lems, while the second one works well when the stochastic kernel P admits a density repre-
sentation.
Assumption 3.1. The flow continuation payoff function c is continuous.
Assumption 3.2. The function z ↦ ∫ max{r(z′), ψ(z′)} P(z, dz′) is continuous for all
continuous functions ψ ∈ bℓZ.
Assumption 3.3. The exit payoff function r is continuous.
We have the following general result on the continuity of ψ∗. The continuity of v∗ can be
obtained as a byproduct under an additional continuity assumption on r.
Proposition 3.1. If Assumptions 2.1 and 3.1–3.2 hold, and g is continuous, then ψ∗ is continuous.
If in addition Assumption 3.3 holds, then v∗ is continuous.
In many applications, the stochastic kernel P has a density representation, which makes the
verification of Assumption 3.2 easier.
Definition 3.1. A stochastic density kernel (or density kernel) on Z is a measurable function
f : Z × Z → R+ such that
\[ \int f(z'|z)\, dz' := \int f(z'|z)\, \lambda(dz') = 1 \quad \text{for all } z \in Z, \]
where λ denotes the Lebesgue measure. We say that the stochastic kernel P has a density
representation if there exists a density kernel f such that
\[ P(z, B) = \int \mathbb{1}(z' \in B)\, f(z'|z)\, dz' \quad \text{for all } z \in Z \text{ and } B \in \mathcal{Z}. \]
The following result provides an alternative way to obtain the continuity of ψ∗ and v∗ when
P has a density representation, which is highly valuable in applications.
Proposition 3.2. Suppose Assumptions 2.1, 3.1 and the following conditions hold:
(1) P has a density representation f, and z ↦ f(z′|z) is continuous for all z′ ∈ Z;
(2) z ↦ ∫|r(z′)| f(z′|z)dz′, z ↦ ∫g(z′) f(z′|z)dz′, and g are continuous.
Then ψ∗ is continuous. If in addition Assumption 3.3 holds, then v∗ is continuous.
Remark 3.1. When the return functions r and c are bounded, as is the case in many standard
economic models, establishing the continuity of ψ∗ is even easier. For general problems, we
only require that Assumption 3.1 holds and that P satisfies the Feller property. When P has
a density representation f, Assumption 3.1 and the continuity of z ↦ f(z′|z) (for all z′ ∈ Z)
are sufficient for ψ∗ to be continuous.
Example 2.5 (Continued). Recall the firm exit model of Jovanovic (1982). The exit payoff
W and flow continuation payoff π are bounded and continuous, and the Feller property in
this case can be easily verified by applying Lemma 7.1 in the Appendix. Therefore, ψ∗ is
continuous. Since the exit payoff W is constant, v∗ is continuous.
Remark 3.2. The continuity of ψ∗ does not necessarily require the continuity of r, while the
continuity of v∗ usually does. Intuitively, the integration operation inside operator Q has a
smoothing effect.
Example 2.2 (Continued). Recall the job search problem where the wage sequence is driven
by a general Markov process (zt)t≥0. Notice that P has a density representation f, and
w ↦ f(w′|w) is continuous for all w′ ∈ Z. Moreover, it is easy to verify the following statements:
(1) \[ \int |\ln w'|\, f(w'|w)\, dw' = \sigma \sqrt{\frac{2}{\pi}}\, \exp\left[ -\frac{(\rho \ln w + b)^2}{2\sigma^2} \right] + (\rho \ln w + b) \left[ 1 - 2\Phi\left( -\frac{\rho \ln w + b}{\sigma} \right) \right] \]
(2) \[ \int (w')^{a} f(w'|w)\, dw' = w^{a\rho} \exp\left( ab + \frac{a^2 \sigma^2}{2} \right) \qquad (a \ne 0) \]
where Φ denotes the standard normal cumulative distribution function. So the second condition of
Proposition 3.2 holds. Therefore, we can show that ψ∗ and v∗ are continuous for all three
types of u functions.
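Both closed forms can be cross-checked numerically. The Python sketch below compares them with a trapezoid-rule approximation of the corresponding normal expectations in the variable ln w′, and also checks the drift bound used for log utility in Example 2.2. The parameter values and helper names are assumptions chosen for illustration.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def abs_log_moment(w, rho, b, sigma):
    """Closed form (1): E|ln w'| when ln w' ~ N(rho ln w + b, sigma^2)."""
    mu = rho * math.log(w) + b
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * sigma**2))
            + mu * (1.0 - 2.0 * normal_cdf(-mu / sigma)))

def power_moment(w, a, rho, b, sigma):
    """Closed form (2): E (w')^a for lognormal w'."""
    return w ** (a * rho) * math.exp(a * b + 0.5 * a**2 * sigma**2)

def normal_expectation(func, mu, sigma, width=12.0, n=400_001):
    """Trapezoid-rule approximation of E func(X) for X ~ N(mu, sigma^2)."""
    x = np.linspace(mu - width * sigma, mu + width * sigma, n)
    density = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    y = func(x) * density
    step = x[1] - x[0]
    return float(step * (y.sum() - 0.5 * (y[0] + y[-1])))

w, rho, b, sigma, a = 2.0, 0.5, 0.1, 0.3, 1.5   # illustrative values
mu = rho * math.log(w) + b
closed_1 = abs_log_moment(w, rho, b, sigma)
numeric_1 = normal_expectation(np.abs, mu, sigma)
closed_2 = power_moment(w, a, rho, b, sigma)
numeric_2 = normal_expectation(lambda x: np.exp(a * x), mu, sigma)
# Drift bound for log utility: |rho| |ln w| + sigma * sqrt(2/pi) + |b|
drift_bound = abs(rho) * abs(math.log(w)) + sigma * math.sqrt(2.0 / math.pi) + abs(b)
```

Continuity in w of both expressions is visible from the closed forms, which is what condition (2) of Proposition 3.2 asks for.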
Example 2.3 (Continued). For the perpetual option problem presented previously, P admits
a density representation f, and x ↦ f(x′|x) is continuous for all x′ ∈ Z. Since
\[ \int (x')^{\rho} f(x'|x)\, dx' = \exp\left( \rho^2 \ln x + \rho b + \frac{\rho^2 \sigma^2}{2} \right) \quad \text{for all } \rho \in \mathbb{R}, \]
we can easily verify the second condition of Proposition 3.2 by applying Lemma 7.1 in the
Appendix. Therefore, ψ∗ and v∗ are continuous.
Example 2.4 (Continued). Recall the firm exit problem of Hopenhayn (1992). Notice that
c(a) = G a^{1/(1−α)} − c_f is continuous. Through similar analysis as in Example 2.3, we can show
that ψ∗ and v∗ are continuous.
Example 3.1. (Firm Entry). Consider the firm entry problem in the style of Fajgelbaum
et al. (2015). In the beginning of each period, the firm observes an investment cost f , where
ftIID∼ h = LN(µ f , γ f ). Based on the belief of the fundamental, the firm has two choices:
enter the market, incur the observed investment cost and obtain a stochastic dividend xt
through production, or wait and reconsider next period. The firm aims to find a decision
rule that maximizes the expected net present value.
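The stochastic dividend follows $x_t = \xi_t + \varepsilon^x_t$, $\varepsilon^x_t \overset{\text{IID}}{\sim} N(0, \gamma_x)$, where $\xi_t$ and $\varepsilon^x_t$ are respectively the persistent and transient components. A public signal $y_t$ is released at the end of each period, where $y_t = \xi_t + \varepsilon^y_t$, $\varepsilon^y_t \overset{\text{IID}}{\sim} N(0, \gamma_y)$. Suppose that the firm has prior belief $\xi \sim N(\mu, \gamma)$ at the beginning of each period and updates it in a Bayesian way after observing y. Then the posterior belief is $\xi | y \sim N(\mu', \gamma')$, where $\gamma' = \left(\frac{1}{\gamma} + \frac{1}{\gamma_y}\right)^{-1}$ and $\mu' = \gamma'\left(\frac{\mu}{\gamma} + \frac{y}{\gamma_y}\right)$.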
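The belief update above is the standard normal-normal conjugate rule, and it can be written as a two-line function. This is a minimal sketch; the name `update_belief` and its argument order are illustrative assumptions.

```python
def update_belief(mu, gamma, y, gamma_y):
    """Posterior (mu', gamma') of xi after observing the signal y.

    Prior: xi ~ N(mu, gamma). Signal: y = xi + noise, noise ~ N(0, gamma_y).
    """
    gamma_next = 1.0 / (1.0 / gamma + 1.0 / gamma_y)
    mu_next = gamma_next * (mu / gamma + y / gamma_y)
    return mu_next, gamma_next
```

For instance, with prior N(0, 1), signal variance 1 and observation y = 2, the posterior mean moves halfway towards the signal and the variance halves.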
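The firm has constant absolute risk aversion $u(x) = \frac{1}{a}(1 - e^{-ax})$, $a > 0$. The continuation value operator follows
$Q\psi(\mu, \gamma) = \beta \int \max\left\{ E_{\mu',\gamma'}[u(x')] - f',\ \psi(\mu', \gamma') \right\} p(f', y|\mu, \gamma)\, d(f', y)$ (12)
where $p(f', y|\mu, \gamma) = h(f')\, l(y|\mu, \gamma)$ with $l(y|\mu, \gamma) = N(\mu, \gamma + \gamma_y)$. Moreover, the exit payoff $r(f, \mu, \gamma) = E_{\mu,\gamma}[u(x)] - f = \frac{1}{a}\left(1 - \exp\left[-a\mu + \frac{a^2(\gamma + \gamma_x)}{2}\right]\right) - f$.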
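The closed form of the exit payoff follows from the moment generating function of the normal distribution, since x is N(µ, γ + γx) under the current belief. The sketch below compares the closed form with a seeded Monte Carlo estimate. The function names, the number of draws and the seed are illustrative assumptions.

```python
import math
import random

def cara_utility(x, a):
    """CARA utility u(x) = (1 - exp(-a x)) / a."""
    return (1.0 - math.exp(-a * x)) / a

def exit_payoff(f, mu, gamma, gamma_x, a):
    """Closed form r(f, mu, gamma) = E[u(x)] - f with x ~ N(mu, gamma + gamma_x)."""
    return (1.0 - math.exp(-a * mu + a ** 2 * (gamma + gamma_x) / 2.0)) / a - f

def exit_payoff_mc(f, mu, gamma, gamma_x, a, draws=100_000, seed=0):
    """Monte Carlo estimate of the same quantity (illustrative check only)."""
    rng = random.Random(seed)
    sd = math.sqrt(gamma + gamma_x)
    mean_u = sum(cara_utility(rng.gauss(mu, sd), a) for _ in range(draws)) / draws
    return mean_u - f
```

The two functions should agree up to simulation error, which shrinks at rate one over the square root of the number of draws.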
This is another example with unbounded returns. To apply our method, consider the state space $Y = \mathbb{R} \times \mathbb{R}_{++}$ with typical element y ∈ Y taking the form y = (µ, γ). Consider $\ell : Y \to [1, \infty)$ defined by $\ell(\mu, \gamma) = \exp\left(-a\mu + \frac{a^2\gamma}{2}\right) + 1$. Then from Theorem 2.1, Proposition 3.1, and Lemma 7.1 in the Appendix, we can show that (see the Appendix for a detailed proof)
(1) Q is a well-defined mapping from $b_\ell Y$ into itself, and it is a contraction mapping of modulus β on the complete metric space $(b_\ell Y, \rho_\ell)$;
(2) ψ∗ and v∗ are continuous functions.
3.2. Shape Properties. We now study the shape properties of the continuation value func-
tion including monotonicity and concavity.
Assumption 3.4. The flow continuation payoff c is increasing (resp. decreasing).
Assumption 3.5. The function $z \mapsto \int \max\{r(z'), \psi(z')\}\, P(z, dz')$ is increasing (resp. decreasing) for all increasing (resp. decreasing) functions $\psi \in b_\ell Z$.
Assumption 3.6. The exit payoff r is increasing (resp. decreasing).
Remark 3.3. If Assumption 3.6 holds and P is stochastically increasing in the sense that $P(z, \cdot)$ first-order stochastically dominates $P(\tilde{z}, \cdot)$ for all $\tilde{z} \leq z$, then Assumption 3.5 holds.
We have the following result regarding monotonicity.
Proposition 3.3. Under Assumptions 2.1 and 3.4 - 3.5, ψ∗ is increasing (resp. decreasing). If in
addition Assumption 3.6 holds, then v∗ is increasing (resp. decreasing).
The next result studies concavity properties of ψ∗.
Proposition 3.4. Suppose that Assumption 2.1 holds, r ≥ 0, P has a density representation f, and that z ↦ f(z′|z) (for all z′ ∈ Z) and c are concave (resp. convex) functions. Then ψ∗ is a concave (resp. convex) function.
Example 2.2 (Continued). In the job search problem where the wage process (wt)t≥0 is driven by a Markov process (zt)t≥0, the flow continuation payoff is constant, and each type of exit payoff is increasing. From the properties of the log-normal distribution we know that if ρ ≥ 0, the stochastic kernel corresponding to the density kernel f is stochastically increasing. By Theorem 2.1 and Proposition 3.3, ψ∗ and v∗ are increasing under the following circumstances: (1) $u(w) = \ln w$ and $\rho \in [0, \frac{1}{\beta})$; (2) $u(w) = \frac{w^{1-\gamma}}{1-\gamma}$ $(\gamma \geq 0, \gamma \neq 1)$, $\rho \in [0, 1]$ and $\beta \exp\left[(1-\gamma)\rho b + \frac{(1-\gamma)^2\rho^2\sigma^2}{2}\right] < 1$.
Example 2.3 (Continued). Recall the pricing problem of the perpetual option. The exit payoff $r(x) = (x - K)^+$ is increasing. Following an analysis similar to that in Example 2.2, we can show that ψ∗ and v∗ are increasing.
Example 2.4 (Continued). For the firm exit problem of Hopenhayn (1992), both r and c are increasing functions. As in Examples 2.2 - 2.3, we can show that ψ∗ and v∗ are increasing functions.
Example 3.1 (Continued). For the firm entry problem of Fajgelbaum et al. (2015), Proposition
3.3 shows that ψ∗ is increasing in µ, and v∗ is increasing in µ and decreasing in f .
3.3. Differentiability. Suppose $Z \subset \mathbb{R}^m$, so that a typical element z ∈ Z takes the form $z = (z^1, ..., z^m)$. For a given function h defined on Z and for all z ∈ int(Z), define $D_i h(z) := \frac{\partial h(z)}{\partial z^i}$, i = 1, ..., m. For given $z_0 \in Z$ and δ > 0, define $B_\delta(z_0) := \{z \in Z : \|z - z_0\| < \delta\}$ and $B_\delta(z^i_0) := \{z^i \in Z^{(i)} : |z^i - z^i_0| < \delta\}$, and let $\bar{B}_\delta(z_0)$ and $\bar{B}_\delta(z^i_0)$ denote their closures, where ‖·‖ is the Euclidean norm, $Z^{(i)}$ is the i-th dimension of Z and $Z^{(-i)}$ denotes the remaining m − 1 dimensions of Z.
Assumption 3.7. P has a density representation f, and for all z′ ∈ Z, z ↦ f(z′|z) is differentiable at interior points in the sense that $D_i f(z'|z)$ exists for all z ∈ int(Z), i = 1, ..., m.
Assumption 3.8. For all $z_0 \in \mathrm{int}(Z)$, there exists δ > 0, such that for i = 1, ..., m, the following functions take finite values:
(1) $z_0^{-i} \mapsto \int \sup_{z^i \in \bar{B}_\delta(z_0^i)} |D_i f(z'|z)|\, dz'$;
(2) $z_0^{-i} \mapsto \int |r(z')| \sup_{z^i \in \bar{B}_\delta(z_0^i)} |D_i f(z'|z)|\, dz'$;
(3) $z_0^{-i} \mapsto \int g(z') \sup_{z^i \in \bar{B}_\delta(z_0^i)} |D_i f(z'|z)|\, dz'$.
Assumption 3.9. The flow continuation payoff function c is differentiable at interior points
in the sense that Dic(z) exists for all z ∈ int(Z), i = 1, ..., m.
The following result provides a group of sufficient conditions for ψ∗ to be differentiable.
Proposition 3.5. Under Assumptions 2.1 and 3.7 - 3.9, ψ∗ is differentiable at interior points in the
sense that Diψ∗(z) exists for all z ∈ int(Z), i = 1, ..., m.
We consider an alternative way to establish the property of differentiability.
Assumption 3.10. For all z′ ∈ Z, z ↦ f(z′|z) is twice differentiable at interior points in the sense that $D_i^2 f(z'|z)$ exists for all z ∈ int(Z), i = 1, ..., m. Moreover, each $(z, z') \mapsto D_i f(z'|z)$ is continuous.
Assumption 3.11. The following conditions hold for i = 1, ..., m:
(1) There are finitely many solutions to $D_i^2 f(z'|z) = 0$, and for all $z_0 \in \mathrm{int}(Z)$, there exists δ > 0, such that each solution $(z', z_0^{-i}) \mapsto z_i^*(z', z_0^{-i})$ satisfies $z_i^*(z', z_0^{-i}) \notin B_\delta(z_0^i)$ as $\|z'\| \to \infty$;
(2) The following functions take finite values on int(Z): (a) $z \mapsto \int |D_i f(z'|z)|\, dz'$; (b) $z \mapsto \int |r(z') D_i f(z'|z)|\, dz'$; (c) $z \mapsto \int g(z') |D_i f(z'|z)|\, dz'$. Moreover, r and g are continuous.
Remark 3.4. A sufficient condition for condition (1) of Assumption 3.11, frequently used when the state space is unbounded, is the following: there are finitely many solutions to $D_i^2 f(z'|z) = 0$, and each solution $(z', z^{-i}) \mapsto z_i^*(z', z^{-i})$ satisfies $|z_i^*(z', z^{-i})| \to \infty$ as $\|z'\| \to \infty$ for given $z^{-i} \in \mathrm{int}(Z^{(-i)})$.
The following proposition, which avoids verifying Assumption 3.8, is useful in applications with unbounded state spaces, as shown below.
Proposition 3.6. Under Assumptions 2.1 and 3.9 - 3.11, ψ∗ is differentiable at interior points in the
sense that Diψ∗(z) exists for all z ∈ int(Z), i = 1, ..., m.
Besides being highly valuable for numerical computation, smoothness is a desirable property in many applications in which we want to characterize the properties of the optimal policy, as shown in the next section.
Assumption 3.12. For i = 1, ..., m, the following conditions hold:
(1) The following functions are continuous on int(Z): (a) $z \mapsto \int |D_i f(z'|z)|\, dz'$; (b) $z \mapsto \int |r(z') D_i f(z'|z)|\, dz'$; and (c) $z \mapsto \int g(z') |D_i f(z'|z)|\, dz'$;
(2) The flow continuation payoff function c is continuously differentiable at interior points in the sense that $z \mapsto D_i c(z)$ is continuous on int(Z).
The next result provides sufficient conditions for ψ∗ to be smooth.
Proposition 3.7. Suppose that Assumption 3.12 holds, and either (1) or (2) holds:
(1) The assumptions of Proposition 3.5 hold, and each $z \mapsto D_i f(z'|z)$ is continuous on int(Z);
(2) The assumptions of Proposition 3.6 hold.
Then ψ∗ is continuously differentiable at interior points in the sense that $z \mapsto D_i \psi^*(z)$ is continuous on int(Z), i = 1, ..., m.
Remark 3.5. When the return functions r and c are bounded, conditions (1.b) and (1.c) of
Assumption 3.12 are not required to establish the smoothness of ψ∗ in Proposition 3.7.
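Example 2.2 (Continued). In the extended job search model where (wt)t≥0 is generated by the Markov process (zt)t≥0, it is easy to verify the following statements:
(1) The solutions to $\frac{\partial^2 f(w'|w)}{\partial w^2} = 0$ are $w^*(w') = \exp\left(\frac{\ln w' - b}{\rho} - \frac{\sigma^2}{\rho^2} \pm \frac{\sigma^2}{\rho}\sqrt{\frac{1}{\rho^2} + \frac{2}{\sigma^2}}\right)$;
(2) $\int \left|\frac{\partial f(w'|w)}{\partial w}\right| dw' = \frac{2\rho}{\sigma w \sqrt{2\pi}}$;
(3) $\left|(\ln w') \frac{\partial f(w'|w)}{\partial w}\right| \leq \frac{\rho}{w}\, \frac{1}{w'\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln w' - \rho \ln w - b)^2}{2\sigma^2}\right] \frac{(\ln w')^2 + |\rho \ln w + b|\,|\ln w'|}{2\sigma^2}$;
(4) $\left|w'^{a} \frac{\partial f(w'|w)}{\partial w}\right| \leq \frac{\rho}{w}\, \frac{1}{w'\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln w' - \rho \ln w - b)^2}{2\sigma^2}\right] \frac{(\ln w' - \rho \ln w - b)^2 + w'^{2a}}{2\sigma^2}$, $a \neq 0$;
(5) The four terms on both sides of statements (3) and (4) are continuous in w;
(6) The integrals of the right-hand-side terms of statements (3) and (4) with respect to w′ are continuous in w.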
From the first statement we know that condition (1) of Assumption 3.11 holds. Based on
statements (2) - (6) and Lemma 7.1 in the Appendix, we can show that condition (2) of As-
sumption 3.12 holds. The remaining conditions of Proposition 3.7 are easy to verify. There-
fore, ψ∗ is continuously differentiable.
To see that ψ∗ is smoother than v∗, we run the following simulation. For simplicity, we consider $r(w) = \frac{w}{1-\beta}$, and set β = 0.96, ρ = 0.6, σ = 1, b = 0 and c = 1. From Figure 1 we can see that although v∗ has a kink in the interior of the state space, ψ∗ is smooth in the sense that it is continuously differentiable and has no kinks.
Example 2.3 (Continued). Recall the pricing problem of the perpetual option. By similar analysis as in Example 2.2, we can show that ψ∗ is continuously differentiable. This is the case despite the fact that the exit payoff $r(x) = (x - K)^+$ has a kink at x = K. Therefore, in general, the exit payoff function is not required to be differentiable for the continuation value function to be smooth.
FIGURE 1. Comparison of ψ∗ and v∗
Example 2.4 (Continued). For the firm exit model of Hopenhayn (1992), through similar
analysis as in Examples 2.2 - 2.3, we can show that ψ∗ is continuously differentiable.
3.4. Parametric Continuity. In applications, we are often curious about how the value func-
tion, continuation value function, and optimal policy change in response to the variation of
some key parameters. In such circumstances, parametric continuity is highly valuable.
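Consider the parameter space $\Theta \subset \mathbb{R}^k$. Let $P_\theta$, $r_\theta$, $c_\theta$, $v^*_\theta$, and $\psi^*_\theta$ denote the stochastic kernel, exit payoff, flow continuation payoff, value function, and continuation value function with respect to parameter θ ∈ Θ, respectively. Under Assumption 2.1, for all θ ∈ Θ, there exist a measurable map $g_\theta : Z \to \mathbb{R}_+$ and constants $m_\theta, d_\theta \in \mathbb{R}_+$ with $\beta m_\theta < 1$ such that for all z ∈ Z: (1) $\max\left\{\int |r_\theta(z')|\, P_\theta(z, dz'),\ |c_\theta(z)|\right\} \leq g_\theta(z)$; and (2) $\int g_\theta(z')\, P_\theta(z, dz') \leq m_\theta g_\theta(z) + d_\theta$. Define $m := \sup_{\theta \in \Theta} m_\theta$ and $d := \sup_{\theta \in \Theta} d_\theta$.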
Assumption 3.13. βm < 1 and d < ∞.
Remark 3.6. To simplify analysis, we consider the parameter space Θ that does not include
the space of β. An alternative way to treat this problem is to consider β ∈ [0, a], where
a ∈ [0, 1), and include this space as part of Θ. In this case, Assumption 3.13 is replaced by
am < 1 and d < ∞. All the theoretical results on parametric continuity of this paper remain
true if we make this change.
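Assumption 3.14. For all θ ∈ Θ, $P_\theta$ has a density representation $f_\theta$. For all z, z′ ∈ Z, $\theta \mapsto f_\theta(z'|z)$ is continuous. For all z ∈ Z, $\theta \mapsto \int |r_\theta(z')| f_\theta(z'|z)\, dz'$ and $\theta \mapsto \int g_\theta(z') f_\theta(z'|z)\, dz'$ are continuous.
Assumption 3.15. For all z ∈ Z, $\theta \mapsto r_\theta(z)$, $\theta \mapsto c_\theta(z)$ and $\theta \mapsto g_\theta(z)$ are continuous.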
Under these assumptions we have the following result for parametric continuity.
Proposition 3.8. Under Assumptions 2.1 and 3.13 - 3.15, $\theta \mapsto \psi^*_\theta(z)$ and $\theta \mapsto v^*_\theta(z)$ are continuous for all z ∈ Z.
Example 2.2 (Continued). Recall the extension of the job search model of McCall (1970). For simplicity, consider u(w) = ln w. Let the parameter space be $\Theta = \left(-\frac{1}{\beta}, \frac{1}{\beta}\right) \times A \times B \times C$, where A, B are bounded subsets of $\mathbb{R}_{++}$, $\mathbb{R}$ respectively, and $C \subset \mathbb{R}$. A typical element θ ∈ Θ takes the form θ = (ρ, σ, b, c). Based on Proposition 3.8, $\theta \mapsto \psi^*_\theta(w)$ and $\theta \mapsto v^*_\theta(w)$ are continuous for all w ∈ Z. Similarly, we can establish the parametric continuity property for $u(w) = \frac{w^{1-\gamma}}{1-\gamma}$ $(\gamma \geq 0, \gamma \neq 1)$.
Remark 3.7. The parametric continuity result of Examples 2.2 - 2.5 and 3.1 can be established
similarly. To simplify analysis, unless explicitly specified, we do not discuss parametric con-
tinuity for other examples, though this property holds for each of them.
4. OPTIMAL POLICIES
In this section, we discuss several other significant advantages of the continuation value
based approach over traditional approaches based on the value function. To begin with, for
a broad range of problems, the continuation value function exists in a lower dimensional
space than the value function. The relationship is asymmetric. While each state variable that
appears in the continuation value function must appear in the value function, the converse
is not true. This facilitates numerical computation significantly since the curse of dimension-
ality is greatly mitigated.
Moreover, among these problems, the decision rule usually exhibits threshold behavior with
respect to some state variable, in the sense that the sequential decision process terminates
whenever a threshold level is achieved by that state process. In such cases, the continuation
value based method allows for a sharp analysis of the optimal policy. This type of problem
is pervasive in quantitative and theoretical economic modeling, as we now formulate.
Suppose that the state space $Z \subset \mathbb{R}^m$ can be written as Z = X × Y, where X is a convex subset of $\mathbb{R}$ and Y is a convex subset of $\mathbb{R}^{m-1}$.¹⁰ The state process (Zn)n≥0 is then (Xn, Yn)n≥0, where (Xn)n≥0 and (Yn)n≥0 are two stochastic processes taking values in X and Y respectively. In particular, the period-n state vector is Zn = (Xn, Yn), where Xn represents the first dimension and Yn the remaining m − 1 dimensions of the random variable Zn.
¹⁰ To simplify analysis, we assume that X is one dimensional. In general, the dimension of X can be higher.
Assume that stochastic processes (Xn)n≥0 and (Yn)n≥0 satisfy the following properties: (1)
(Monotonicity) The exit payoff function r is monotone on X; and (2) (Conditional Independence)
Conditional on each Yn, the next period states (Xn+1, Yn+1) and the current state Xn are in-
dependent. We call each random variable Xn the threshold state variable of period n, and each
Yn the environment state vector (or environment states, or environment) of period n. Moreover,
we call X the threshold state space and Y the environment space. Assume further, for this thresh-
old state optimal stopping problem, that the flow continuation payoff c is defined on the
environment space, i.e., c : Y → R.
Denote x as the threshold state variable and y as the environment so that the vector of state variables in the current period is z = (x, y). Let z′ = (x′, y′) be the vector of states of next period. We know from the definition of the threshold state variable that the stochastic kernel P(z, dz′) can be represented by the conditional distribution function of (x′, y′) given y, denoted as $F_y(x', y')$, i.e., $P(z, dz') = P((x, y), d(x', y')) = dF_y(x', y')$. Notice that under this setup, the continuation value ψ∗ is a function of y only, while the value function v∗ is a function of both x and y. So ψ∗ has strictly fewer arguments than v∗.¹¹
Assumption 4.1. r is strictly monotone on X. Moreover, for all y ∈ Y, there exists x ∈ X such that $r(x, y) = c(y) + \beta \int v^*(x', y')\, dF_y(x', y')$.
Under Assumption 4.1, the reservation rule property holds. When the exit payoff r is strictly increasing in x, for instance, this property states that if the agent terminates at state x ∈ X at a given point in time, then he would have terminated at any higher state at that moment. Specifically, there is a decision threshold $\bar{x} : Y \to X$ such that when the state variable x attains this threshold level, i.e., $x = \bar{x}(y)$, the agent is indifferent between terminating and continuing, i.e., $r(\bar{x}(y), y) = \psi^*(y)$ for all y ∈ Y.
As shown in Theorem 2.1, the optimal policy $\sigma^* : Z \to \{0, 1\}$ satisfies $\sigma^*(z) = \mathbf{1}\{r(z) \geq \psi^*(z)\}$. For threshold state optimal stopping problems, this policy is fully specified by the decision threshold $\bar{x}$. In particular, under Assumption 4.1, the optimal policy $\sigma^*(x, y) = \mathbf{1}\{x \geq \bar{x}(y)\}$ if r is strictly increasing in x, and $\sigma^*(x, y) = \mathbf{1}\{x \leq \bar{x}(y)\}$ if r is strictly decreasing in x. Based on the properties of the continuation value function, the properties of the decision threshold $\bar{x}$ can be easily established. We summarize them in the following. Firstly, we have the following result for continuity.
¹¹ In this case, since the threshold state is assumed one-dimensional, ψ∗ has one less argument than v∗. In general, the difference in the arguments of ψ∗ and v∗ can be strictly larger than one.
Proposition 4.1. Suppose that either the assumptions of Proposition 3.1 or Proposition 3.2 hold, and that Assumption 4.1 holds, then $\bar{x}$ is continuous.
The next result provides sufficient conditions for $\bar{x}$ to be monotone.
Proposition 4.2. Suppose that the assumptions of Proposition 3.3 and Assumption 4.1 hold, and that r is defined on X. If ψ∗ is increasing and r is strictly increasing (resp. decreasing), then $\bar{x}$ is increasing (resp. decreasing). If ψ∗ is decreasing and r is strictly increasing (resp. decreasing), then $\bar{x}$ is decreasing (resp. increasing).
A typical element y ∈ Y takes the form $y = (y^1, ..., y^{m-1})$. For i = 1, ..., m − 1 and given functions $h : Y \to \mathbb{R}$, $l : X \times Y \to \mathbb{R}$, define $D_i h(y) := \frac{\partial h(y)}{\partial y^i}$, $D_i l(x, y) := \frac{\partial l(x, y)}{\partial y^i}$, and $D_x l(x, y) := \frac{\partial l(x, y)}{\partial x}$. The following result on the smoothness of $\bar{x}$ follows from Proposition 3.7 and the implicit function theorem.
Proposition 4.3. Suppose that the assumptions of Proposition 3.7 and Assumption 4.1 hold. Moreover, r is continuously differentiable on int(Z). Then $\bar{x}$ is continuously differentiable on int(Y). In particular, $D_i \bar{x}(y) = -\frac{D_i r(\bar{x}(y), y) - D_i \psi^*(y)}{D_x r(\bar{x}(y), y)}$ for all y ∈ int(Y).
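The derivative formula in Proposition 4.3 can be illustrated on a toy problem. The sketch below uses a hypothetical exit payoff r(x, y) = x³ + x + y and a hypothetical continuation value ψ(y) = 2 + sin(y), neither of which comes from the paper. It solves the indifference condition r(x, y) = ψ(y) by bisection and evaluates the implicit-function formula for the threshold derivative, which can then be compared with a finite difference.

```python
import math

def r(x, y):
    # Hypothetical exit payoff, strictly increasing in x.
    return x ** 3 + x + y

def psi(y):
    # Hypothetical smooth continuation value (depends on y only).
    return 2.0 + math.sin(y)

def threshold(y, lo=-10.0, hi=10.0, iters=200):
    """Solve r(x, y) = psi(y) for x by bisection on [lo, hi]."""
    target = psi(y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r(mid, y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def threshold_derivative(y):
    """Implicit-function formula: -(D_y r - D psi) / D_x r at the threshold."""
    x_bar = threshold(y)
    d_y_r = 1.0                      # partial derivative of r in y
    d_psi = math.cos(y)              # derivative of psi
    d_x_r = 3.0 * x_bar ** 2 + 1.0   # partial derivative of r in x
    return -(d_y_r - d_psi) / d_x_r
```

Because the toy r is strictly increasing in x, the bisection bracket always contains exactly one root, mirroring the role of Assumption 4.1.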
Intuitively, $(x, y) \mapsto r(x, y) - \psi^*(y)$ denotes the premium of terminating the sequential decision process. So the functions $(x, y) \mapsto D_i r(x, y) - D_i \psi^*(y)$ and $(x, y) \mapsto D_x r(x, y)$ denote the instantaneous rates of change in the terminating premium in response to an instantaneous change in the environment state $y^i$ and the threshold state x, respectively. Holding the terminating premium at 0, the change of premium resulting from a change of x cancels the premium change resulting from the variation of y. Therefore, the instantaneous rate of change of $\bar{x}(y)$ with respect to $y^i$ is equal to the ratio of the instantaneous rates of change in the premium. The negative sign is due to the zero-sum property of the terminating premium at the decision threshold $\bar{x}$.
Let $\bar{x}_\theta$ be the decision threshold with respect to θ ∈ Θ. We have the following result for parametric continuity.
Proposition 4.4. Suppose that the assumptions of Proposition 3.8, and Assumptions 3.3 and 4.1 hold. Then $\theta \mapsto \bar{x}_\theta(y)$ is continuous for all y ∈ Y.
Example 3.1 (Continued). Recall the firm entry problem of Fajgelbaum et al. (2015). This is a typical threshold state optimal stopping problem. In particular, the threshold state space is $X = \mathbb{R}_+$, and the threshold state variable is x = f. The environment space is $Y = \mathbb{R} \times \mathbb{R}_+$ with environment states y = (µ, γ). The value function of the firm follows
$v^*(f, \mu, \gamma) = \max\left\{ E_{\mu,\gamma}[u(x)] - f,\ \beta \int v^*(f', \mu', \gamma')\, p(f', y|\mu, \gamma)\, d(f', y) \right\}$
Since there are 3 state variables, v∗ is defined on a 3-dimensional space. However, ψ∗ is defined on a 2-dimensional space since the environment space has one dimension less. Moreover, the optimal policy is determined by a reservation cost function $\bar{f} : Y \to \mathbb{R}$ such that when $f = \bar{f}(\mu, \gamma)$, the firm is indifferent between entering the market and waiting. In particular, $\bar{f}(\mu, \gamma) = E_{\mu,\gamma}[u(x)] - \psi^*(\mu, \gamma)$ and the optimal policy is $\sigma^*(f, \mu, \gamma) = \mathbf{1}\{f \leq \bar{f}(\mu, \gamma)\}$ for all (f, µ, γ) ∈ Z. By Proposition 4.1, we can show that $\bar{f}$ is continuous.
5. COMPUTATIONAL EFFICIENCY
The motivation of this section is to illustrate the computational efficiency of the continuation value based method relative to traditional value function based methods. Numerical experiments show that the partial impact of the lower dimensionality of the continuation value can be huge, even when the difference between the arguments of this function and the value function is only a single variable. For example, when solving a well known version of the job search model in Section 5.1, continuation value iteration takes only 171 seconds to compute the optimal policy with the level of accuracy $10^{-6}$ (see the Group-3 experiments), as opposed to more than 7 days for value function iteration.
Moreover, we do not provide a detailed comparison of the two approaches in Section 5.2, as the computation via the value function takes too long (more than 7 days) due to the curse of dimensionality. However, our approach takes only 15.45 minutes to compute the optimal policy with a level of accuracy $10^{-6}$. Finally, all the applications demonstrate the effectiveness of our approach in characterizing the optimal policy.
5.1. Job Search II. Consider another extension of McCall’s job search model presented by Ljungqvist and Sargent (2012). The model is the same as the benchmark case, apart from the fact that the distribution of the wage process h is unknown. The worker knows that there are two possible densities f and g. At the start of time, nature selects h to be either f or g. The choice is not observed by the worker, who puts prior probability $\pi_0$ on f being chosen. By Bayes’ rule, $\pi_t$ updates via $\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})}$. We can express the value function of the unemployed worker recursively as follows
$v^*(w, \pi) = \max\left\{ \frac{w}{1-\beta},\ c + \beta \int v^*(w', \pi')\, h_\pi(w')\, dw' \right\}$
where $\pi' = q(w', \pi) = \frac{\pi f(w')}{\pi f(w') + (1-\pi) g(w')}$ and $h_\pi(w') := \pi f(w') + (1-\pi) g(w')$. This is a typical threshold state optimal stopping problem, in which the threshold state variable is w and the environment is π. In particular, ψ∗ is defined on a space of lower dimension than the state space where v∗ is defined, in the sense that ψ∗ is a function of π only while v∗ is a function of both w and π.
Following Ljungqvist and Sargent (2012), we set f = Beta(1, 1) and g = Beta(3, 1.2). Then the state space Z = [0, 2] × [0, 1]. Based on our theory, the optimal policy is characterized by a reservation wage function $\bar{w} : [0, 1] \to \mathbb{R}$ such that when $w = \bar{w}(\pi)$, the worker is indifferent between accepting and rejecting the offer. Denote b[0, 1] as the set of bounded functions on [0, 1]. Consider the Banach space (b[0, 1], ‖·‖∞) as the space of candidate functions. The continuation value operator defined on this space satisfies
$Q\psi(\pi) = c + \beta \int \max\left\{ \frac{w'}{1-\beta},\ \psi \circ q(w', \pi) \right\} h_\pi(w')\, dw'$ (13)
This is the special case of our theory when the state space is compact, and both exit and flow
continuation payoffs are bounded.
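Iterating the operator in (13) only requires a grid over π. The following is a minimal pure-Python sketch of continuation value iteration for this model. It is not the authors' implementation: the rescaling of both Beta densities onto [0, 2], the midpoint quadrature over w′, the linear interpolation of ψ in π, and all function names are assumptions made for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def solve_cvi(beta=0.95, c=0.6, w_max=2.0, n_pi=50, n_w=100,
              tol=1e-6, max_iter=2000):
    """Iterate Q on a pi grid; return (pi_grid, psi, reservation wages)."""
    # Assumption: offers live on [0, w_max], so both densities are rescaled.
    f = lambda w: beta_pdf(w / w_max, 1.0, 1.0) / w_max
    g = lambda w: beta_pdf(w / w_max, 3.0, 1.2) / w_max
    pi_grid = [1e-4 + i * (1 - 2e-4) / (n_pi - 1) for i in range(n_pi)]
    step = pi_grid[1] - pi_grid[0]
    dw = w_max / n_w
    w_nodes = [(j + 0.5) * dw for j in range(n_w)]  # midpoint rule nodes

    # Precompute, for each (pi, w'), the stopping value, the quadrature
    # weight h_pi(w') dw, and where q(w', pi) falls on the pi grid.
    table = []
    for pi in pi_grid:
        row = []
        for w in w_nodes:
            h = pi * f(w) + (1 - pi) * g(w)
            q = min(max(pi * f(w) / h, pi_grid[0]), pi_grid[-1])
            k = min(int((q - pi_grid[0]) / step), n_pi - 2)
            t = (q - pi_grid[k]) / step
            row.append((w / (1 - beta), h * dw, k, t))
        table.append(row)

    psi = [0.0] * n_pi
    for _ in range(max_iter):
        new = []
        for row in table:
            total = 0.0
            for stop_value, weight, k, t in row:
                cont = (1 - t) * psi[k] + t * psi[k + 1]  # linear interpolation
                total += max(stop_value, cont) * weight
            new.append(c + beta * total)
        err = max(abs(x - y) for x, y in zip(new, psi))
        psi = new
        if err < tol:
            break
    # Indifference w / (1 - beta) = psi(pi) gives the reservation wage.
    w_bar = [(1 - beta) * v for v in psi]
    return pi_grid, psi, w_bar
```

Since the belief update q(w′, π) does not depend on ψ, the interpolation positions are computed once, and each iteration reduces to a weighted sum over the w′ nodes. Calling `solve_cvi()` with the defaults corresponds to the benchmark grid of 50 points for π.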
Proposition 5.1. When the unemployment compensation c ∈ [0, 2], the following statements hold:
(1) Q is a well-defined mapping from b[0, 1] into itself, and it is a contraction mapping of modulus β on the Banach space (b[0, 1], ‖·‖∞).
(2) The value function is $v^*(w, \pi) = \max\left\{\frac{w}{1-\beta},\ \psi^*(\pi)\right\}$, the reservation wage is $\bar{w}(\pi) = (1 - \beta)\psi^*(\pi)$, and the optimal policy is $\sigma^*(w, \pi) = \mathbf{1}\{w \geq \bar{w}(\pi)\}$ for all (w, π) ∈ Z.
(3) ψ∗, $\bar{w}$, and v∗ are continuous functions.
FIGURE 2. The reservation wage
Following Section 6.6 of Ljungqvist and Sargent (2012), we set β = 0.95 and c = 0.6. In the benchmark simulation, the grid points (w, π) lie in $[0, 2] \times [10^{-4}, 1 - 10^{-4}]$ with 100 points for the w grid and 50 points for the π grid. As shown in Figure 2, the reservation wage $\bar{w}$ is decreasing in π. Intuitively, f is a less attractive offer distribution than g, and larger π means more weight on f and less on g. Therefore, larger π depresses the worker’s assessment of his future prospects, and relatively low current offers become more attractive.
Since the computation is 2-dimensional via value function iteration (VFI), and is only 1-
dimensional via continuation value function iteration (CVI), we can expect that the compu-
tation via CVI would be much faster. To make a comparison, we conduct several groups
of experiments and provide the time taken by the two approaches. All the experiments are
processed in a standard Python environment on a laptop with a 2.5 GHz Intel Core i5.
5.1.1. Group-1 Experiments. In this group, we explore the time taken by the two approaches
to compute the fixed point at different levels of accuracy and across different parameteriza-
tions. Specifically, Table 1 provides the list of experiments we perform. In all simulations,
the setup of the grid points is the same as the baseline simulation. For each given test and
level of accuracy, we run the simulation 50 times for CVI, 20 times for VFI, and calculate the
average time. The results are provided in Table 2.
TABLE 1. Group-1 Experiments
Parameter   Test 1   Test 2   Test 3   Test 4   Test 5
β           0.9      0.95     0.98     0.95     0.95
c           0.6      0.6      0.6      0.001    1
TABLE 2. Time Taken of Group-1 Experiments (seconds, by method and precision)
Test     Method   10^-3    10^-4    10^-5    10^-6    10^-7    10^-8
Test 1   VFI      114.17   140.94   174.91   201.77   228.59   255.67
         CVI        0.67     0.92     1.16     1.43     1.71     1.94
Test 2   VFI      181.78   234.58   271.89   323.22   339.87   341.55
         CVI        0.95     1.49     1.80     2.27     2.69     3.11
Test 3   VFI      335.78   335.87   335.28   335.91   338.70   334.21
         CVI        1.77     2.68     3.08     3.03     3.03     3.06
Test 4   VFI      154.18   201.05   247.72   294.90   335.32   335.00
         CVI        0.79     1.22     1.65     2.06     2.50     2.91
Test 5   VFI      275.41   336.02   326.33   327.41   327.11   327.71
         CVI        1.33     2.12     2.79     2.99     2.97     2.97
As can be seen in Table 2, our method performs much better than VFI. On average, CVI is 141 times faster than VFI. In the best case, CVI is 207 times faster: in Test 5, VFI takes 275.41 seconds to achieve the level of accuracy $10^{-3}$, while CVI takes only 1.33 seconds. Even in the worst case, CVI is about 109 times faster. For example, in Test 5, VFI takes 327.41 seconds while CVI takes only 2.99 seconds to achieve the level of accuracy $10^{-6}$.
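The speed-up figures quoted for Table 2 are simple ratios of VFI time to CVI time. The short script below recomputes them from the table entries. The variable names are illustrative.

```python
# Times in seconds copied from Table 2 (precision 1e-3 through 1e-8).
vfi_times = {
    "Test 1": [114.17, 140.94, 174.91, 201.77, 228.59, 255.67],
    "Test 2": [181.78, 234.58, 271.89, 323.22, 339.87, 341.55],
    "Test 3": [335.78, 335.87, 335.28, 335.91, 338.70, 334.21],
    "Test 4": [154.18, 201.05, 247.72, 294.90, 335.32, 335.00],
    "Test 5": [275.41, 336.02, 326.33, 327.41, 327.11, 327.71],
}
cvi_times = {
    "Test 1": [0.67, 0.92, 1.16, 1.43, 1.71, 1.94],
    "Test 2": [0.95, 1.49, 1.80, 2.27, 2.69, 3.11],
    "Test 3": [1.77, 2.68, 3.08, 3.03, 3.03, 3.06],
    "Test 4": [0.79, 1.22, 1.65, 2.06, 2.50, 2.91],
    "Test 5": [1.33, 2.12, 2.79, 2.99, 2.97, 2.97],
}

# Speed-up of CVI over VFI for every (test, precision) cell.
ratios = [v / c
          for test in vfi_times
          for v, c in zip(vfi_times[test], cvi_times[test])]
mean_ratio = sum(ratios) / len(ratios)
best_ratio = max(ratios)
worst_ratio = min(ratios)
print(f"mean {mean_ratio:.1f}, best {best_ratio:.1f}, worst {worst_ratio:.1f}")
```

The mean is taken over all 30 cells of the table, which is how the figure of 141 appears to be obtained.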
5.1.2. Group-2 Experiments. In applications, more grid points are needed for the numerical
approximation to be more accurate. In this group of experiments, we compare how the two
approaches perform under different grid sizes. The parameterization is the same as in the
benchmark setup. Again, we run the simulation 50 times for CVI, 20 times for VFI, and
calculate the average time. Information and results of these experiments are provided in
Table 3 and Table 4, respectively.
TABLE 3. Group-2 Experiments (number of grid points)
Variable   Test 2   Test 6   Test 7   Test 8   Test 9   Test 10
π          50       50       50       100      100      100
w          100      150      200      100      150      200
TABLE 4. Time Taken of Group-2 Experiments (seconds, by method and precision)
Test      Method   10^-3    10^-4    10^-5     10^-6     10^-7     10^-8
Test 2    VFI      181.78   234.58   271.89    323.22    339.87    341.55
          CVI        0.95     1.49     1.80      2.27      2.69      3.11
Test 6    VFI      264.34   336.20   407.52    476.01    508.05    509.05
          CVI        0.96     1.39     1.82      2.30      2.73      3.14
Test 7    VFI      355.40   449.55   545.51    641.05    679.93    678.28
          CVI        0.92     1.37     1.79      2.22      2.84      3.07
Test 8    VFI      352.76   447.36   541.75    639.73    678.91    677.52
          CVI        1.94     2.74     3.58      4.42      5.30      6.14
Test 9    VFI      526.72   670.19   812.66    951.78   1017.29   1015.15
          CVI        1.81     2.68     3.68      4.33      5.23      6.08
Test 10   VFI      706.34   897.07  1086.15   1278.27   1354.37   1360.07
          CVI        1.83     2.72     3.51      4.40      5.21      6.10
As can be seen, the advantage of our approach over VFI becomes more pronounced as the grid size increases. In Table 4 we see that as we increase the number of grid points for w, the speed of CVI is not affected. However, the speed of VFI drops significantly. Amongst Tests 2, 6 and 7, CVI is 219 times faster than VFI on average. In the best case, CVI is 386 times faster: while it takes VFI 355.40 seconds to achieve a level of accuracy $10^{-3}$ in Test 7, CVI takes only 0.92 seconds. As we increase the number of grid points for w from 100 to 200, CVI is not affected, but the time taken by VFI almost doubles. This is because the grid points for w are not used for CVI, while they are part of the grid for VFI.
As we increase the grid size of both w and π, there is a slight decrease in the computation speed of CVI. By contrast, the time taken by VFI grows roughly in proportion to the total number of grid points. Amongst Tests 8 - 10, CVI is 223.41 times as fast as VFI on average. In Test 10, VFI takes 706.34 seconds to achieve a level of precision $10^{-3}$, whereas CVI takes only 1.83 seconds, which is 386 times faster.
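The Group-2 averages can be recomputed from Table 4 in the same way, and the grid sizes in Table 3 make it possible to look at VFI time per grid point. The script below is a sketch with illustrative variable names. In this recomputation, the mean over Tests 8 to 10 is the one that comes out close to 223.4.

```python
# Times in seconds copied from Table 4 (precision 1e-3 through 1e-8).
vfi_g2 = {
    2:  [181.78, 234.58, 271.89, 323.22, 339.87, 341.55],
    6:  [264.34, 336.20, 407.52, 476.01, 508.05, 509.05],
    7:  [355.40, 449.55, 545.51, 641.05, 679.93, 678.28],
    8:  [352.76, 447.36, 541.75, 639.73, 678.91, 677.52],
    9:  [526.72, 670.19, 812.66, 951.78, 1017.29, 1015.15],
    10: [706.34, 897.07, 1086.15, 1278.27, 1354.37, 1360.07],
}
cvi_g2 = {
    2:  [0.95, 1.49, 1.80, 2.27, 2.69, 3.11],
    6:  [0.96, 1.39, 1.82, 2.30, 2.73, 3.14],
    7:  [0.92, 1.37, 1.79, 2.22, 2.84, 3.07],
    8:  [1.94, 2.74, 3.58, 4.42, 5.30, 6.14],
    9:  [1.81, 2.68, 3.68, 4.33, 5.23, 6.08],
    10: [1.83, 2.72, 3.51, 4.40, 5.21, 6.10],
}
# Total (pi, w) grid points per test, from Table 3.
grid_points = {2: 50 * 100, 6: 50 * 150, 7: 50 * 200,
               8: 100 * 100, 9: 100 * 150, 10: 100 * 200}

def mean_speedup(tests):
    """Average VFI/CVI time ratio over all precisions of the given tests."""
    rs = [v / c for t in tests for v, c in zip(vfi_g2[t], cvi_g2[t])]
    return sum(rs) / len(rs)

w_grid_mean = mean_speedup([2, 6, 7])      # only the w grid grows
both_grid_mean = mean_speedup([8, 9, 10])  # pi grid doubled as well
# VFI seconds per grid point at precision 1e-3.
vfi_per_point = {t: vfi_g2[t][0] / grid_points[t] for t in vfi_g2}
print(round(w_grid_mean, 2), round(both_grid_mean, 2))
```

The per-point VFI times are nearly constant across tests, which is consistent with VFI cost scaling with the full two-dimensional grid while CVI cost depends on the π grid only.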
5.1.3. Group-3 Experiments. Since the total number of grid points increases exponentially with the number of states, the speed of computation drops dramatically as the number of states increases. For example, with 3 state variables, VFI suffers from the "curse of dimensionality", while CVI works quite well. To illustrate this point, we consider the parametric class of problems indexed by the unemployment compensation c, in which case c is treated as an additional state variable. In this case, VFI has 3 state variables and the computation takes more than 7 days. However, CVI has only 2 states and the computation finishes within 171 seconds to attain the accuracy level $10^{-6}$. Hence, we can conveniently calculate via CVI the reservation wage as a function of both π and c. Figure 3 provides the result.
FIGURE 3. The reservation wage
This figure, in which a whole class of c values are considered, serves as a generalization of
Figure 2. Not surprisingly, the reservation wage increases as c increases, since a higher level
of compensation hinders the agent’s incentive to enter into the labor market.
5.2. Job Search III. Consider the adaptive search model proposed (though not implemented)
in McCall (1970). The model explores how the reservation utility changes in response to the
agent’s expectation of the mean and variance of the unknown wage offer distribution. Sup-
pose the wage process follows
$w = \xi + \varepsilon^w, \quad \varepsilon^w \sim N(0, \gamma_w)$ (14)
where ξ is the persistent component with prior belief ξ ∼ N(µ, γ), and $\varepsilon^w$ is a transitory component. The worker’s current estimate of the next period wage distribution is $f(w'|\mu, \gamma) = N(\mu, \gamma + \gamma_w)$. After observing w′ next period, the posterior belief is $\xi | w' \sim N(\mu', \gamma')$, where $\gamma' = \left(\frac{1}{\gamma} + \frac{1}{\gamma_w}\right)^{-1}$ and $\mu' = \gamma'\left(\frac{\mu}{\gamma} + \frac{w'}{\gamma_w}\right)$. The worker has constant absolute risk aversion $u(w) = \frac{1}{a}(1 - e^{-aw})$, a > 0. Once he accepts the offer, the search process terminates and he obtains the same utility u(w) in each future period. If the agent rejects the offer, he obtains utility c from unemployment compensation and reconsiders next period. The value function follows
$v^*(w, \mu, \gamma) = \max\left\{ \frac{u(w)}{1-\beta},\ c + \beta \int v^*(w', \mu', \gamma')\, f(w'|\mu, \gamma)\, dw' \right\}$ (15)
This is another threshold state optimal stopping problem. In particular, the threshold state space is $X = \mathbb{R}$ and the threshold state variable is x = w. The environment space is $Y = \mathbb{R} \times \mathbb{R}_+$ and the environment states are y = (µ, γ). Since there are 3 state variables, standard approaches via VFI suffer from the "curse of dimensionality". The computation via VFI is as time-consuming as in the Group-3 experiments of Section 5.1. However, the computation via CVI is only 2-dimensional and our theory works well.
Notice that the exit payoff is unbounded below. We consider a weight function $\ell : Y \to [1, \infty)$ defined by $\ell(\mu, \gamma) = \exp\left(-a\mu + \frac{a^2\gamma}{2}\right) + 1$ and the space of candidate functions $(b_\ell Y, \rho_\ell)$. For all $\psi \in b_\ell Y$, the continuation value operator follows
$Q\psi(\mu, \gamma) = c + \beta \int \max\left\{ \frac{u(w')}{1-\beta},\ \psi(\mu', \gamma') \right\} f(w'|\mu, \gamma)\, dw'$ (16)
where µ′, γ′ and f(w′|µ, γ) are defined as above. Based on the theory of Section 4, the optimal policy is determined by a reservation wage function $\bar{w} : Y \to \mathbb{R}$ such that when $w = \bar{w}(\mu, \gamma)$, the worker is indifferent between accepting and rejecting the job offer.
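A single evaluation of the operator in (16) at one point (µ, γ) can be sketched with Monte Carlo integration. This is a hedged illustration rather than the authors' code: the function names, the seed, and the default γw = 1.0 are assumptions (the defaults for β, c and a follow the simulation settings of this section), and ψ is passed in as an arbitrary callable.

```python
import math
import random

def cara(w, a):
    """CARA utility u(w) = (1 - exp(-a w)) / a."""
    return (1.0 - math.exp(-a * w)) / a

def update(mu, gamma, w_next, gamma_w):
    """Posterior (mu', gamma') after observing the wage offer w_next."""
    gamma_next = 1.0 / (1.0 / gamma + 1.0 / gamma_w)
    mu_next = gamma_next * (mu / gamma + w_next / gamma_w)
    return mu_next, gamma_next

def apply_q_at(psi, mu, gamma, beta=0.95, c=0.0493, a=0.6,
               gamma_w=1.0, draws=1000, seed=0):
    """Monte Carlo estimate of (Q psi)(mu, gamma).

    psi is any callable (mu, gamma) -> float. gamma_w = 1.0 is a
    placeholder value chosen for illustration only.
    """
    rng = random.Random(seed)
    sd = math.sqrt(gamma + gamma_w)  # w' ~ N(mu, gamma + gamma_w)
    total = 0.0
    for _ in range(draws):
        w_next = rng.gauss(mu, sd)
        mu_next, gamma_next = update(mu, gamma, w_next, gamma_w)
        total += max(cara(w_next, a) / (1.0 - beta), psi(mu_next, gamma_next))
    return c + beta * total / draws
```

A full solver would apply this at every grid point, with ψ replaced by an interpolant of the previous iterate, and repeat until the iterates settle. Two limiting cases are convenient sanity checks: a very large constant ψ makes the worker always continue, and a very negative one makes the worker always stop.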
Proposition 5.2. Suppose that the unemployment compensation satisfies $c < \frac{1}{a}$. Then the following statements hold:
(1) Q is a well-defined mapping from $b_\ell Y$ into itself, and it is a contraction mapping of modulus β on the complete metric space $(b_\ell Y, \rho_\ell)$.
(2) For all (w, µ, γ) ∈ Z, the value function $v^*(w, \mu, \gamma) = \max\left\{\frac{u(w)}{1-\beta},\ \psi^*(\mu, \gamma)\right\}$, reservation wage $\bar{w}(\mu, \gamma) = -\frac{1}{a} \ln\left[1 - a(1-\beta)\psi^*(\mu, \gamma)\right]$, and optimal policy $\sigma^*(w, \mu, \gamma) = \mathbf{1}\{w \geq \bar{w}(\mu, \gamma)\}$.
(3) ψ∗, $\bar{w}$, and v∗ are continuous functions.
(4) ψ∗ and $\bar{w}$ are increasing functions of µ. v∗ is an increasing function of w and µ.
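The reservation wage in part (2) comes from inverting the indifference condition u(w)/(1 − β) = ψ∗(µ, γ) under CARA utility. A minimal sketch of that inversion follows. The function names are illustrative, and the guard reflects the fact that the logarithm is only defined when a(1 − β)ψ < 1.

```python
import math

def cara_utility(w, a):
    """CARA utility u(w) = (1 - exp(-a w)) / a."""
    return (1.0 - math.exp(-a * w)) / a

def reservation_wage(psi_value, a, beta):
    """Solve u(w) / (1 - beta) = psi_value for w."""
    arg = 1.0 - a * (1.0 - beta) * psi_value
    if arg <= 0.0:
        raise ValueError("requires a * (1 - beta) * psi < 1")
    return -math.log(arg) / a
```

For example, with a = 0.6, β = 0.95 and a continuation value of 10, the argument of the logarithm is 0.7 and the reservation wage is −ln(0.7)/0.6.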
Remark 5.1. When risk aversion is considered, the exit payoff is bounded above, though it is unbounded below. However, it is easy to verify that our theory can be applied to all settings where the exit payoff is of the form $r(w) = aw + b$ with $a, b \in \mathbb{R}_+$, or $r(w) = ae^w + b$ with $a, b \in \mathbb{R}_+$, and the flow continuation payoff c ≥ b.
Since in the current context (1− β)ψ∗ is a monotone transformation of the reservation wage
and possesses clear economic intuition, we define it as the reservation utility function and use
it for the remaining analysis.
In the simulation, we set β = 0.95 and a = 0.6. To parallel Ljungqvist and Sargent (2012),
we set c = 0.0493 after transforming their parameterization by the utility function u. The
literature provides little guidance on $\gamma_w$, so we perform a sensitivity analysis. The grid
points (µ, γ) lie in $[-50, 50] \times [10^{-4}, 25]$, with 150 points for the µ grid and 75 points for the
γ grid. The grid is scaled to be denser where the absolute values of µ and γ are small.
We set the threshold function outside the grid to its value at the closest grid point. The integration
is computed via Monte Carlo with 1000 draws.¹² Figure 4 provides the simulation results.
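The grid design described above can be sketched in Python as follows. The text says only that the grid is denser where |µ| and γ are small; the power-law spacing, the exponent, and all function names below are assumptions made for illustration, not the scaling actually used in the experiment.

```python
import numpy as np


def dense_near_lower_bound(lo, hi, n, power=2.0):
    """Grid on [lo, hi] whose points cluster near lo (assumed power-law spacing)."""
    s = np.linspace(0.0, 1.0, n)
    return lo + (hi - lo) * s**power


def dense_near_zero(half_width, n, power=2.0):
    """Symmetric grid on [-half_width, half_width], denser near zero."""
    s = np.linspace(-1.0, 1.0, n)
    return half_width * np.sign(s) * np.abs(s)**power


def nearest_index(grid, x):
    """Index of the grid point closest to x; used for points outside the grid."""
    return int(np.abs(grid - x).argmin())


mu_grid = dense_near_zero(50.0, 150)                   # 150 points in [-50, 50]
gamma_grid = dense_near_lower_bound(1e-4, 25.0, 75)    # 75 points in [1e-4, 25]

if __name__ == "__main__":
    # A state beyond the grid is assigned the value at the closest grid point.
    print(nearest_index(gamma_grid, 100.0), nearest_index(mu_grid, -1e3))
```

Any strictly increasing transformation with a small derivative near the origin would serve the same purpose; footnote 12 reports that the results are not sensitive to the grid range or density.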
There are several key characteristics in Figure 4. Firstly, in each case, the reservation utility
is an increasing function of µ, which parallels the result of Proposition 5.2. Naturally, a more
optimistic agent (higher µ) would expect that higher offers can be obtained. Thus he will not
accept the current offer until the utility obtained is high enough.
Secondly, for a given, relatively small µ, the reservation utility is
increasing in γ. However, as µ gets large, the reservation utility becomes decreasing in γ. Intuitively,
although a pessimistic worker (low µ) expects low wage offers on average,
part of the downside risk is cut off: in the worst case, he is guaranteed
unemployment compensation c > 0. Thus, a higher level of uncertainty (higher γ) in the
offer distribution gives the worker a better chance to “try his luck” for a
good offer. This pushes up the reservation utility. For an optimistic (high µ) but risk-averse
worker, since the choice is irreversible, a higher level of uncertainty gives the worker
an incentive to enter the labor market at an earlier stage so as to avoid downside risks.
This depresses the reservation utility. For similar reasons, increasing $\gamma_w$ has a positive
effect on the reservation utility when µ is small.
5.3. Job Search IV. We consider another extension of the standard job search model of McCall
(1970). Assume that the wage process follows
$$w_t = \eta_t + \theta_t \xi_t \tag{17}$$
$$\ln \theta_t = \rho \ln \theta_{t-1} + \ln u_t \tag{18}$$
¹² Changing the number of Monte Carlo samples, the grid range, and the grid density produces almost the same
results.
FIGURE 4. The reservation utility
where ρ ∈ [−1, 1] is a constant. The sequences satisfy $\{\xi_t\} \overset{\text{IID}}{\sim} h$ with
$\int |\xi| h(\xi)\,d\xi < \infty$, $\{\eta_t\} \overset{\text{IID}}{\sim} v$
with $\int |\eta| v(\eta)\,d\eta < \infty$, and $\{u_t\} \overset{\text{IID}}{\sim} LN(0, \sigma_u^2)$.
Moreover, $\{\xi_t\}$, $\{\eta_t\}$, and $\{u_t\}$ are independent, and the sequence $\{\theta_t\}$ is
independent of $\{\xi_t\}$ and $\{\eta_t\}$. The process in (17) and
(18) is general in the sense that it incorporates several standard setups. For example, when
$\{\xi_t\}$ and $\{\eta_t\}$ are lognormally distributed, it simplifies to the setup of Kaplan and Violante
(2010), where income fluctuation problems are studied. Furthermore, when $\{\xi_t\} \overset{\text{IID}}{\sim} N(0, 1)$,
with some slight modification, this process simplifies to a setup that incorporates the
standard stochastic volatility model (see, e.g., Taylor, 1982).
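To make the process (17)–(18) concrete, the sketch below simulates a wage path in Python. It uses the lognormal specification and the baseline standard deviations adopted later in this section; the value ρ = 0.9, the initial condition θ₀ = 1, the seed, and the function name are illustrative assumptions.

```python
import numpy as np


def simulate_wages(T, rho=0.9, theta0=1.0, sigma_u=0.01, sigma_xi=0.05,
                   mu_eta=0.0, sigma_eta=0.001, seed=0):
    """Simulate w_t = eta_t + theta_t * xi_t with ln(theta_t) following an AR(1)."""
    rng = np.random.default_rng(seed)

    # Persistent component: ln theta_t = rho ln theta_{t-1} + ln u_t,
    # where ln u_t ~ N(0, sigma_u^2).
    log_theta = np.empty(T)
    prev = np.log(theta0)
    for t in range(T):
        prev = rho * prev + sigma_u * rng.standard_normal()
        log_theta[t] = prev
    theta = np.exp(log_theta)

    # Transitory components, drawn independently of theta.
    xi = rng.lognormal(mean=0.0, sigma=sigma_xi, size=T)
    eta = rng.lognormal(mean=mu_eta, sigma=sigma_eta, size=T)

    return eta + theta * xi, theta


if __name__ == "__main__":
    wages, theta = simulate_wages(10)
    print(np.round(wages, 3))
```

With |ρ| < 1 the persistent component reverts toward one in levels; with ρ = 1 it is a geometric random walk, which is the case used in the first simulation of this section.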
We set $h = LN(0, \sigma_\xi^2)$ and $v = LN(\mu_\eta, \sigma_\eta^2)$. In this case, $\theta_t$ and $\xi_t$ are persistent and transitory
components of income, respectively, while $u_t$ is treated as a shock to the persistent component.
$\eta_t$ can be interpreted as social security, gifts, etc. The threshold state space is X = R+
with threshold state process $\{w_t\}$, and the environment space is Y = R+ with environment
process $\{\theta_t\}$. This is another example for which the computation via VFI lacks efficiency but
our method performs very well. The value function of the agent satisfies
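$$v^*(w, \theta) = \max\left\{ \frac{w}{1-\beta},\ c + \beta \int v^*(w', \theta')\, f(\theta'|\theta)\, h(\xi')\, v(\eta')\, d(\theta', \xi', \eta') \right\}$$
and the continuation value operator takes the form
$$Q\psi(\theta) = c + \beta \int \max\left\{ \frac{w'}{1-\beta},\ \psi(\theta') \right\} f(\theta'|\theta)\, h(\xi')\, v(\eta')\, d(\theta', \xi', \eta')$$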
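where $w' = \eta' + \theta'\xi'$, and $f(\theta'|\theta) = LN(\rho \ln\theta, \sigma_u^2)$ is the density kernel of the Markov process
$\{\theta_t\}$. If ρ ∈ [−1, 1] and $\beta \exp\left(\frac{\rho^2\sigma_u^2}{2}\right) < 1$, then Assumption 2.1 holds with
$g(\theta) = \theta^{\rho} + \theta^{-\rho}$, $m = \exp\left(\frac{\rho^2\sigma_u^2}{2}\right)$, and $d = \frac{1}{\beta} - 1$.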
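The role of m can be traced to the lognormal moment formula. The following derivation is a sketch added for the reader, not the authors' own argument: it uses only the fact that, conditional on θ, ln θ′ is normal with mean ρ ln θ and variance σ_u², and it addresses the bound on the conditional expectation of g only (it says nothing about the constant d).

```latex
% Moment of a lognormal: for any real s,
\mathbb{E}\left[ (\theta')^{s} \mid \theta \right]
  = \exp\left( s\rho \ln\theta + \tfrac{1}{2} s^{2} \sigma_u^{2} \right)
  = \theta^{\rho s} \exp\left( \tfrac{1}{2} s^{2} \sigma_u^{2} \right).

% Taking s = \rho and s = -\rho and adding the two terms:
\int g(\theta') f(\theta'|\theta)\, d\theta'
  = \left( \theta^{\rho^{2}} + \theta^{-\rho^{2}} \right)
    \exp\left( \frac{\rho^{2}\sigma_u^{2}}{2} \right)
  \le \left( \theta^{\rho} + \theta^{-\rho} \right)
    \exp\left( \frac{\rho^{2}\sigma_u^{2}}{2} \right)
  = m\, g(\theta).

% The inequality holds because x \mapsto \theta^{x} + \theta^{-x}
% = 2\cosh(x \ln\theta) is even and nondecreasing in |x|,
% and \rho^{2} \le |\rho| whenever \rho \in [-1, 1].
```

This shows why the restriction ρ ∈ [−1, 1] matters: it is exactly what makes ρ² ≤ |ρ|, so that the conditional expectation of g is dominated by a multiple of g itself.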
Proposition 5.3. Suppose ρ ∈ [−1, 1], $\lambda := \beta \exp\left(\frac{\rho^2\sigma_u^2}{2}\right) < 1$, and the unemployment
compensation c ∈ R+. Then the following statements hold:
(1) Q is a well-defined mapping from $b_\ell Y$ into itself, and it is a contraction mapping of modulus
λ on the complete metric space $(b_\ell Y, \rho_\ell)$.
(2) The value function is $v^*(w, \theta) = \max\left\{ \frac{w}{1-\beta},\ \psi^*(\theta) \right\}$, the reservation wage is
$\bar{w}(\theta) = (1-\beta)\psi^*(\theta)$, and the optimal policy is $\sigma^*(w, \theta) = \mathbb{1}\{w \geq \bar{w}(\theta)\}$ for all (w, θ) ∈ Z.
(3) $\psi^*$ and $\bar{w}$ are continuously differentiable, and $v^*$ is continuous.
We choose β = 0.95 and $\mu_\eta = 0$ for the baseline parameterization. We set $\sigma_\xi = 0.05$, $\sigma_\eta = 0.001$,
and $\sigma_u = 0.01$. In the first simulation, we consider the parametric class problem with
respect to c, where we let c ∈ [0, 10] with 50 grid points and ρ = 1. In the second simulation,
we consider the parametric class problem with respect to ρ, where ρ ∈ [0.5, 1] with 20 grid
points and we set c = 0.6 as in Ljungqvist and Sargent (2012). We set $\theta \in [10^{-3}, 25]$ with
100 grid points, and the grid is scaled to be denser when θ is smaller. As before,
the reservation wage outside the grid is set to its value at the closest grid point, and the
integration is computed via Monte Carlo with 1000 draws.
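The computation just described can be sketched as a fixed-point iteration on the continuation value operator Q. The code below is a minimal illustration under stated assumptions rather than the authors' implementation: the log-spaced grid, the linear interpolation, the single set of shock draws reused across iterations, the stopping rule, and the choice ρ = 0.75 are ours. Parameter defaults otherwise follow the baseline values in the text.

```python
import numpy as np


def solve_cvi(c=0.6, rho=0.75, beta=0.95, mu_eta=0.0, sigma_xi=0.05,
              sigma_eta=0.001, sigma_u=0.01, n_grid=100, n_draws=1000,
              tol=1e-6, max_iter=2000, seed=0):
    """Iterate psi <- Q psi on a theta grid; return (grid, psi, reservation wage)."""
    rng = np.random.default_rng(seed)

    # Grid on [1e-3, 25], denser for small theta (log spacing is an assumption).
    theta_grid = np.geomspace(1e-3, 25.0, n_grid)

    # Draw the shocks once so that the approximate operator is deterministic.
    u = rng.lognormal(0.0, sigma_u, n_draws)
    xi = rng.lognormal(0.0, sigma_xi, n_draws)
    eta = rng.lognormal(mu_eta, sigma_eta, n_draws)

    # theta' = theta^rho * u and w' = eta' + theta' xi', shape (n_grid, n_draws).
    theta_next = theta_grid[:, None] ** rho * u[None, :]
    exit_payoff = (eta[None, :] + theta_next * xi[None, :]) / (1.0 - beta)

    psi = np.zeros(n_grid)
    for _ in range(max_iter):
        # np.interp holds the boundary value outside the grid, which matches
        # "set to its value at the closest grid point".
        psi_next = np.interp(theta_next.ravel(), theta_grid, psi)
        psi_next = psi_next.reshape(theta_next.shape)
        psi_new = c + beta * np.maximum(exit_payoff, psi_next).mean(axis=1)
        err = np.max(np.abs(psi_new - psi))
        psi = psi_new
        if err < tol:
            break

    return theta_grid, psi, (1.0 - beta) * psi


if __name__ == "__main__":
    grid, psi_star, w_bar = solve_cvi()
    print(w_bar[[0, 49, 99]])
```

The state of the iteration is a function of θ alone, so each step costs one pass over a one-dimensional grid; the corresponding value function iteration would have to track (w, θ) jointly.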
We see in Figure 5 that the reservation wage is an increasing function of θ. When the
realization of θ is small, the reservation wage is an increasing function of the unemployment
compensation c. When θ gets large, the reservation wage becomes less sensitive to c.
Intuitively, when θ gets well above c, since the shock is highly persistent (ρ = 1), the reservation
wage is completely determined by the realization of the permanent shock.
In Figure 6, we see that for any ρ ∈ [0.5, 1], the reservation wage is an increasing function of
θ. For larger ρ, the reservation wage function is steeper. Intuitively, ρ measures
the degree of income persistence. As ρ gets larger, the effect of a positive shock lasts longer,
which pushes up the worker’s reservation wage.
FIGURE 5. The reservation wage
FIGURE 6. The reservation wage
6. CONCLUSION
In this paper, we study an alternative solution method for sequential decision problems. The
idea involves calculating the continuation value directly. We show that not only is the set of
possible applications of this method very broad, but it also turns out to have significant advantages
over traditional methods based on the value function.
7. APPENDIX
Let $(X, \mathscr{X})$ be a measurable space and $(Y, \mathscr{Y}, u)$ a measure space.
Lemma 7.1. Let p : Y × X → R be a measurable map that is continuous in x. If there exists a
measurable map q : Y × X → R+ that is continuous in x with q(y, x) ≥ |p(y, x)| for all (y, x) ∈ Y × X,
and such that $x \mapsto \int q(y, x)\,u(dy)$ is continuous, then the mapping $x \mapsto \int p(y, x)\,u(dy)$ is
continuous.
Proof. Since q(y, x) ≥ |p(y, x)| for all (y, x) ∈ Y × X, we know that $(y, x) \mapsto q(y, x) \pm p(y, x)$
are nonnegative measurable functions. Let $(x_n)$ be a sequence in X with $x_n \to x$. By Fatou’s
lemma, we have
$$\int \liminf_{n\to\infty} \left[ q(y, x_n) \pm p(y, x_n) \right] u(dy) \leq \liminf_{n\to\infty} \int \left[ q(y, x_n) \pm p(y, x_n) \right] u(dy).$$
From the given assumptions we know that $\lim_{n\to\infty} \int q(y, x_n)\,u(dy) = \int q(y, x)\,u(dy)$. Combining this
result with the above inequality, we have
$$\pm \int p(y, x)\,u(dy) \leq \liminf_{n\to\infty} \left( \pm \int p(y, x_n)\,u(dy) \right),$$
where we have used the fact that for any two sequences $(a_n)_{n\geq 0}$ and $(b_n)_{n\geq 0}$ in R such that
$\lim_{n\to\infty} a_n$ exists, we have $\liminf_{n\to\infty} (a_n + b_n) = \liminf_{n\to\infty} a_n + \liminf_{n\to\infty} b_n$. So
$$\limsup_{n\to\infty} \int p(y, x_n)\,u(dy) \leq \int p(y, x)\,u(dy) \leq \liminf_{n\to\infty} \int p(y, x_n)\,u(dy).$$
Therefore, the mapping $x \mapsto \int p(y, x)\,u(dy)$ is continuous.
Proof of Theorem 2.1. To prove the first statement, based on the weighted contraction mapping
theorem (see, e.g., Boyd, 1990), it is sufficient to verify: (a) Q is monotone in the sense that
Qψ ≤ Qφ for all $\psi, \phi \in b_\ell Z$ with ψ ≤ φ; (b) $Q0 \in b_\ell Z$; and (c) $Q(\psi + a\ell) \leq Q\psi + a(\beta m)\ell$
for all a ∈ R+ and $\psi \in b_\ell Z$. Obviously, Q is monotone and condition (a) holds. From
Assumption 2.1 we know that
$$\frac{|(Q0)(z)|}{\ell(z)} = \left| \frac{c(z)}{\ell(z)} + \beta \int \max\left\{ \frac{r(z')}{\ell(z)},\ 0 \right\} P(z, dz') \right| \leq \frac{|c(z)|}{\ell(z)} + \beta \int \frac{|r(z')|}{\ell(z)}\, P(z, dz') \leq 1 + \beta$$
for all z ∈ Z. So we have $\|Q0\|_\ell < \infty$. Condition (b) holds since the measurability of Q0
follows immediately from primitive assumptions. It remains to verify condition (c). Notice
that based on Assumption 2.1, we have
$$\int \ell(z')\,P(z, dz') = \int g(z')\,P(z, dz') + \frac{d}{m-1} \leq m g(z) + d + \frac{d}{m-1} = m\left[ g(z) + \frac{d}{m-1} \right] = m\ell(z).$$
Therefore, for all a ∈ R+ and $\psi \in b_\ell Z$, we have
$$\begin{aligned}
Q(\psi + a\ell)(z) &= c(z) + \beta \int \max\left\{ r(z'),\ \psi(z') + a\ell(z') \right\} P(z, dz') \\
&\leq c(z) + \beta \int \max\left\{ r(z') + a\ell(z'),\ \psi(z') + a\ell(z') \right\} P(z, dz') \\
&= c(z) + \beta \int \max\left\{ r(z'),\ \psi(z') \right\} P(z, dz') + a\beta \int \ell(z')\,P(z, dz') \\
&\leq Q\psi(z) + a(\beta m)\ell(z).
\end{aligned}$$
So condition (c) is verified.
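The weighted contraction property can be checked numerically on a finite state space, where the kernel P is a stochastic matrix and integrals become matrix products. The Python sketch below is a sanity check under assumed primitives (a random kernel, random r and c, m = 1.02, and d chosen as the smallest constant satisfying Pg ≤ mg + d); none of these values come from the paper.

```python
import numpy as np


def check_weighted_contraction(n=6, beta=0.95, m=1.02, trials=200, seed=0):
    """Compare the observed Lipschitz ratio of Q in the ell-norm with beta * m."""
    rng = np.random.default_rng(seed)

    # Random stochastic matrix standing in for the kernel P(z, dz').
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)

    g = rng.uniform(1.0, 5.0, n)                     # assumed function g >= 1
    d = max(0.0, float(np.max(P @ g - m * g)))       # smallest d with Pg <= m g + d
    ell = g + d / (m - 1.0)                          # weight function from the proof

    r = rng.normal(size=n)                           # exit payoff (hypothetical)
    c = rng.normal(size=n)                           # flow payoff (hypothetical)

    def Q(psi):
        return c + beta * P @ np.maximum(r, psi)

    def ell_norm(f):
        return np.max(np.abs(f) / ell)

    # Drift inequality used in the proof: integral of ell is at most m * ell.
    drift_ok = bool(np.all(P @ ell <= m * ell + 1e-12))

    worst = 0.0
    for _ in range(trials):
        psi = rng.normal(size=n) * ell
        phi = rng.normal(size=n) * ell
        ratio = ell_norm(Q(psi) - Q(phi)) / ell_norm(psi - phi)
        worst = max(worst, ratio)

    return drift_ok, worst, beta * m


if __name__ == "__main__":
    print(check_weighted_contraction())
```

The largest observed ratio should not exceed βm, which is the modulus delivered by conditions (a)–(c).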
Let ψ denote the unique fixed point of the operator Q in $b_\ell Z$. To prove the second statement,
it remains to verify that $\psi = \psi^*$. Notice that since
$$\max\{r(z), \psi(z)\} = \max\left\{ r(z),\ c(z) + \beta \int \max\{r(z'), \psi(z')\}\, P(z, dz') \right\},$$
the function $z \mapsto \max\{r(z), \psi(z)\}$ solves the Bellman equation (1). Moreover, there exist
$h_1, h_2 \in \mathbb{R}_+$ such that
$$\frac{|\max\{r(z), \psi(z)\}|}{k(z)} \leq \frac{|r(z)| + |\psi(z)|}{k(z)} \leq \frac{|r(z)| + h_1 g(z) + h_2}{k(z)}$$
for all z ∈ Z. Since the last term is bounded on Z, the measurable map $z \mapsto \max\{r(z), \psi(z)\}$
is an element of $b_k Z$. Based on Ma (2016), we know that it must be the value function, i.e.,
$v^*(z) = \max\{r(z), \psi(z)\}$ for all z ∈ Z. Therefore, we have
$$\psi^*(z) = c(z) + \beta \int v^*(z')\, P(z, dz') = c(z) + \beta \int \max\{r(z'), \psi(z')\}\, P(z, dz') = \psi(z)$$
for all z ∈ Z, and the second statement is verified. The proof of the third statement is trivial.
Proof of Proposition 2.1. The claim is true by construction when n = 0. Now suppose that the
claim is true for arbitrary n. We aim to show that it also holds at n + 1. To this end, consider
the two operators R and L defined by
Rψ := r ∨ ψ and Lv := c + βPv.
With this notation, we can write T and Q as Tv = RLv and Qψ = LRψ. By the induction
hypothesis, we have $v_n = r \vee \psi_n$, or $v_n = R\psi_n$, so
$$v_{n+1} = Tv_n = TR\psi_n = RLR\psi_n = RQ\psi_n = R\psi_{n+1}.$$
In other words, $v_{n+1} = r \vee \psi_{n+1}$, as was to be shown.
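The decomposition T = RL and Q = LR is easy to verify on a finite state space. The Python sketch below builds both operators from R and L and checks that the iterates satisfy v_n = r ∨ ψ_n at every step when v_0 = r ∨ ψ_0. The random primitives, the state count, and the function name are illustrative assumptions.

```python
import numpy as np


def max_discrepancy(n_states=5, beta=0.95, n_iter=50, seed=0):
    """Largest gap between v_n and r v psi_n along n_iter joint iterations."""
    rng = np.random.default_rng(seed)

    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)   # stochastic kernel
    r = rng.normal(size=n_states)       # exit payoff
    c = rng.normal(size=n_states)       # flow continuation payoff

    def R(psi):                         # R psi := r v psi (pointwise maximum)
        return np.maximum(r, psi)

    def L(v):                           # L v := c + beta P v
        return c + beta * P @ v

    def T(v):                           # Bellman operator, T = R L
        return R(L(v))

    def Q(psi):                         # continuation value operator, Q = L R
        return L(R(psi))

    psi = np.zeros(n_states)
    v = R(psi)                          # start from v_0 = r v psi_0
    gap = 0.0
    for _ in range(n_iter):
        v, psi = T(v), Q(psi)
        gap = max(gap, float(np.max(np.abs(v - R(psi)))))
    return gap


if __name__ == "__main__":
    print(max_discrepancy())
```

The two sequences carry the same information, so iterating on ψ loses nothing relative to iterating on v.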
Proof of Proposition 3.1. Let $b_\ell cZ$ be the set of continuous functions in $b_\ell Z$. Since g is continuous,
it is easy to show that $b_\ell cZ$ is a closed subset of $b_\ell Z$. To verify the continuity of $\psi^*$, it is
sufficient to show that $Q(b_\ell cZ) \subset b_\ell cZ$ (see, e.g., Stokey et al., 1989). The assumptions of the
proposition ensure that this is so. The continuity of $v^*$ follows from the continuity of $\psi^*$ and
r and the fact that $v^* = r \vee \psi^*$.
Proof of Proposition 3.2. By Assumptions 2.1, 3.1, and Proposition 3.1, to show that $\psi^*$ is
continuous, we only need to verify that Assumption 3.2 holds. For all $\psi \in b_\ell cZ$, there exists
G ∈ R+ such that $|\psi(z)| \leq G\ell(z)$ for all z ∈ Z, so we have
$|\max\{r(z'), \psi(z')\} f(z'|z)| \leq [|r(z')| + G\ell(z')] f(z'|z)$. Based on the given assumptions,
$z \mapsto [|r(z')| + G\ell(z')] f(z'|z)$ is nonnegative and continuous for all z′ ∈ Z, and
$z \mapsto \int [|r(z')| + G\ell(z')] f(z'|z)\,dz'$ is continuous. By Lemma 7.1,
$z \mapsto \int \max\{r(z'), \psi(z')\} f(z'|z)\,dz'$ is continuous. Combined with the fact that c is continuous,
Assumption 3.2 holds. The remaining proof is similar to that of Proposition 3.1.
Proof of Proposition 3.3. Let $b_\ell iZ$ (resp. $b_\ell dZ$) be the set of increasing (resp. decreasing) functions
in $b_\ell Z$. Then $b_\ell iZ$ (resp. $b_\ell dZ$) is a closed subset of $b_\ell Z$. To show that $\psi^*$ is increasing
(resp. decreasing), it is sufficient to show that $Q(b_\ell iZ) \subset b_\ell iZ$ (resp. $Q(b_\ell dZ) \subset b_\ell dZ$) (see,
e.g., Stokey et al., 1989). The assumptions of the proposition guarantee that this is the case.
The value function $v^*$ is increasing (resp. decreasing) since r is assumed to be increasing
(resp. decreasing) and $v^* = r \vee \psi^*$.
Proof of Proposition 3.4. Notice that the set of concave (resp. convex) functions in $b_\ell Z$ is a
closed subset of $b_\ell Z$. The remaining proof is similar to that of Proposition 3.3.
Proof of Proposition 3.5. For all z ∈ int(Z) and i = 1, ..., m, let
$\mu(z) := \int \max\{r(z'), \psi^*(z')\} f(z'|z)\,dz'$ and
$h_i(z) := \int \max\{r(z'), \psi^*(z')\} D_i f(z'|z)\,dz'$. Since c is differentiable by
Assumption 3.9, to prove the desired result, we only need to verify that $D_i\mu = h_i$ at all
interior points for each i. For any $z_0^i$, let $\{z_n^i\}$ be an arbitrary sequence such that $z_n^i \to z_0^i$ and
$z_n^i \neq z_0^i$ for all n ∈ N. Let $z_n$ and $z_0$ be elements of int(Z) with the i-th entry being $z_n^i$
and $z_0^i$ respectively, and $z_n^{-i} = z_0^{-i}$ for all n ∈ N.
For given δ > 0, there exists N ∈ N such that for all n ≥ N, $z_n^i \in B_\delta(z_0^i)$. By the mean value
theorem, given $z^{-i} = z_0^{-i}$, there exists $\xi(z', z_n, z_0) \in B_\delta(z_0^i)$ such that
$$|\Delta(z', z_n, z_0)| := \left| \frac{f(z'|z_n) - f(z'|z_0)}{z_n^i - z_0^i} \right| = \left| D_i f(z'|z)\big|_{z^i = \xi(z', z_n, z_0)} \right| \leq \sup_{z^i \in B_\delta(z_0^i)} \left| D_i f(z'|z) \right|.$$
Since $\psi^* \in b_\ell Z$, there exists G ∈ R+ such that $|\psi^*| \leq G\ell$. For all n ≥ N, we have
(1) $|\max\{r(z'), \psi^*(z')\}\, \Delta(z', z_n, z_0)| \leq (|r(z')| + G\ell(z')) \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|$;
(2) $z_0^{-i} \mapsto \int (|r(z')| + G\ell(z')) \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|\,dz'$ takes finite values;
(3) $\max\{r(z'), \psi^*(z')\}\, \Delta(z', z_n, z_0) \to \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z_0)$ as n → ∞.
By the dominated convergence theorem, we have $D_i\mu(z_0) = h_i(z_0)$ since
$$\frac{\mu(z_n) - \mu(z_0)}{z_n^i - z_0^i} = \int \max\{r(z'), \psi^*(z')\}\, \Delta(z', z_n, z_0)\,dz' \to \int \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z_0)\,dz' = h_i(z_0).$$
Proof of Proposition 3.6. From condition (1) of Assumption 3.11 we know that for all $z_0 \in \mathrm{int}(Z)$,
there exists a compact subset A ⊂ Z such that for all $z' \in A^c$, we have
$z_i^*(z', z_0^{-i}) \notin B_\delta(z_0^i)$, and
$$\sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)| = \max\left\{ |D_i f(z'|z)|_{z^i = z_0^i + \delta},\ |D_i f(z'|z)|_{z^i = z_0^i - \delta} \right\}$$
given $z^{-i} = z_0^{-i}$.
By Assumption 3.10, $(z, z') \mapsto D_i f(z'|z)$ is continuous. Therefore, given $z^{-i} = z_0^{-i}$, there
exists G ∈ R+ such that
$$\begin{aligned}
\sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)| &\leq \sup_{z' \in A,\ z^i \in B_\delta(z_0^i)} |D_i f(z'|z)| \cdot \mathbb{1}(z' \in A) \\
&\quad + \left( |D_i f(z'|z)|_{z^i = z_0^i + \delta} + |D_i f(z'|z)|_{z^i = z_0^i - \delta} \right) \cdot \mathbb{1}(z' \in A^c) \\
&\leq G \cdot \mathbb{1}(z' \in A) + \left( |D_i f(z'|z)|_{z^i = z_0^i + \delta} + |D_i f(z'|z)|_{z^i = z_0^i - \delta} \right) \cdot \mathbb{1}(z' \in A^c).
\end{aligned}$$
Combining this result with condition (2) of Assumption 3.11, we can show that Assumption
3.8 holds. The desired result then follows from Proposition 3.5.
Proof of Proposition 3.7. Since $\psi^* \in b_\ell Z$, there exists G ∈ R+ such that $|\psi^*(z)| \leq G\ell(z)$ for
all z ∈ Z. So $|\max\{r(z'), \psi^*(z')\}\, D_i f(z'|z)| \leq (|r(z')| + G\ell(z'))\, |D_i f(z'|z)|$ for all z′, z ∈ Z.
For all z′ ∈ Z, $z \mapsto (|r(z')| + G\ell(z'))\, |D_i f(z'|z)|$ is nonnegative and continuous by the
given assumptions, and $z \mapsto \int [|r(z')| + G\ell(z')]\, |D_i f(z'|z)|\,dz'$ is continuous by condition (1) of
Assumption 3.12. So $z \mapsto \int \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z)\,dz' = D_i\mu(z)$ is continuous by Lemma 7.1.
This result, combined with the assumption that c is
continuously differentiable (condition (2) of Assumption 3.12), shows that $\psi^*$ is continuously
differentiable at interior points.
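Proof of Proposition 3.8. Consider $\ell : Z \times \Theta \to [1, \infty)$ defined by $\ell(z, \theta) = g_\theta(z) + \frac{d}{m-1}$, the
Banach space $(b_\ell(Z \times \Theta), \|\cdot\|_\ell)$, and the continuation value operator
$Q : b_\ell(Z \times \Theta) \to b_\ell(Z \times \Theta)$ defined by
$$Q\psi_\theta(z) = c_\theta(z) + \beta \int \max\{r_\theta(z'), \psi_\theta(z')\}\, f_\theta(z'|z)\,dz'.$$
Based on Theorem 2.1, $(z, \theta) \mapsto \psi^*_\theta(z)$ is the unique fixed point of Q in $b_\ell(Z \times \Theta)$. Let
$b_\ell c_\theta(Z \times \Theta)$ be the set of functions in $b_\ell(Z \times \Theta)$ that are continuous in θ. Since $\theta \mapsto g_\theta(z)$
is continuous for all z ∈ Z by Assumption 3.15, $b_\ell c_\theta(Z \times \Theta)$ is a closed subset. To show the
continuity of $\theta \mapsto \psi^*_\theta(z)$, it remains to verify that $Q : b_\ell c_\theta(Z \times \Theta) \to b_\ell c_\theta(Z \times \Theta)$.
For every candidate $(z, \theta) \mapsto \psi_\theta(z)$ in $b_\ell c_\theta(Z \times \Theta)$, there exists G ∈ R+ such that
$|\psi_\theta(z')| \leq G\ell(z', \theta)$, so
$|\max\{r_\theta(z'), \psi_\theta(z')\}\, f_\theta(z'|z)| \leq [|r_\theta(z')| + G\ell(z', \theta)]\, f_\theta(z'|z)$ for all (z′, z, θ) ∈ Z × Z × Θ.
Moreover, by Assumptions 3.14 - 3.15, $\theta \mapsto [|r_\theta(z')| + G\ell(z', \theta)]\, f_\theta(z'|z)$ is nonnegative
and continuous for all z, z′ ∈ Z, and $\theta \mapsto \int [|r_\theta(z')| + G\ell(z', \theta)]\, f_\theta(z'|z)\,dz'$ is continuous
for all z ∈ Z. From Lemma 7.1 we know that
$\theta \mapsto \int \max\{r_\theta(z'), \psi_\theta(z')\}\, f_\theta(z'|z)\,dz'$ is continuous for all z ∈ Z. Moreover, $\theta \mapsto c_\theta(z)$ is continuous
for all z ∈ Z. Hence Q maps $b_\ell c_\theta(Z \times \Theta)$ into itself, and $\theta \mapsto \psi^*_\theta(z)$ is continuous.
The continuity of $\theta \mapsto v^*_\theta(z)$ follows from the continuity of $\theta \mapsto r_\theta(z)$ and the fact that
$v^*_\theta = r_\theta \vee \psi^*_\theta$.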
Proof of Proposition 4.4. Consider F : X × Y × Θ → R defined by $F(x, y, \theta) := r_\theta(x, y) - \psi^*_\theta(y)$.
Without loss of generality, suppose that $(x, y, \theta) \mapsto r_\theta(x, y)$ is strictly increasing in x. Then F
is a continuous function and is strictly increasing in x.
For all fixed y ∈ Y, $\theta_0 \in \Theta$ and ε > 0, since F is strictly increasing in x and $F(x_{\theta_0}(y), y, \theta_0) = 0$,
we have
$$F(x_{\theta_0}(y) + \varepsilon, y, \theta_0) > 0 \quad \text{and} \quad F(x_{\theta_0}(y) - \varepsilon, y, \theta_0) < 0.$$
Since F is continuous with respect to θ, there exists δ > 0 such that for all
$\theta \in B_\delta(\theta_0) := \{\theta \in \Theta : \|\theta - \theta_0\| < \delta\}$, where ‖ · ‖ is the Euclidean norm, we have
$$F(x_{\theta_0}(y) + \varepsilon, y, \theta) > 0 \quad \text{and} \quad F(x_{\theta_0}(y) - \varepsilon, y, \theta) < 0.$$
Since $F(x_\theta(y), y, \theta) = 0$, by the strict monotonicity of F with respect to x, we have
$$x_\theta(y) \in (x_{\theta_0}(y) - \varepsilon,\ x_{\theta_0}(y) + \varepsilon),$$
i.e., $|x_\theta(y) - x_{\theta_0}(y)| < \varepsilon$. Hence, the function $\theta \mapsto x_\theta(y)$ is continuous for all y ∈ Y.
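The proof relies only on F being continuous and strictly increasing in x, which is also what makes the threshold easy to compute: it is the unique root of F(·, y, θ) and can be bracketed by bisection. The Python sketch below illustrates this with a hypothetical exit payoff r(x) = x/(1 − β) and a stand-in continuation value equal to θ; neither is taken from the paper, and the function name is ours.

```python
def threshold(F, lo, hi, tol=1e-10, max_iter=200):
    """Root of a continuous, strictly increasing F on [lo, hi] by bisection."""
    if F(lo) > 0 or F(hi) < 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if F(mid) < 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


BETA = 0.95


def make_F(theta):
    """Hypothetical F(x) = r(x) - psi, with r(x) = x / (1 - beta) and psi = theta."""
    return lambda x: x / (1.0 - BETA) - theta


if __name__ == "__main__":
    # A small change in theta moves the threshold only slightly.
    for theta in (10.0, 10.001):
        print(theta, threshold(make_F(theta), 0.0, 100.0))
```

In this toy case the root is (1 − β)θ, so the computed threshold can be compared with the closed form.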
REFERENCES
ALBUQUERQUE, R. AND H. A. HOPENHAYN (2004): “Optimal lending contracts and firm
dynamics,” The Review of Economic Studies, 71, 285–315.
ALVAREZ, F. AND A. DIXIT (2014): “A real options perspective on the future of the Euro,”
Journal of Monetary Economics, 61, 78–109.
ALVAREZ, F. AND N. L. STOKEY (1998): “Dynamic programming with homogeneous functions,”
Journal of Economic Theory, 82, 167–189.
ARELLANO, C. (2008): “Default risk and income fluctuations in emerging economies,” The
American Economic Review, 98, 690–712.
BACKUS, D. (2014): “Discussion of Alvarez and Dixit: A real options perspective on the
Euro,” Journal of Monetary Economics, 61, 110–113.
BECKER, R. A. AND J. H. BOYD (1997): Capital Theory, Equilibrium Analysis, and Recursive
Utility, Wiley-Blackwell.
BELLMAN, R. (1969): “A new type of approximation leading to reduction of dimensionality
in control processes,” Journal of Mathematical Analysis and Applications, 27, 454–459.
BOYD, J. H. (1990): “Recursive utility and the Ramsey problem,” Journal of Economic Theory,
50, 326–345.
BULL, C. AND B. JOVANOVIC (1988): “Mismatch versus derived-demand shift as causes of
labour mobility,” The Review of Economic Studies, 55, 169–175.
BURDETT, K. AND K. L. JUDD (1983): “Equilibrium price dispersion,” Econometrica, 955–969.
CHOI, J. J., D. LAIBSON, B. C. MADRIAN, AND A. METRICK (2003): “Optimal defaults,” The
American Economic Review, 93, 180–185.
DIXIT, A. K. AND R. S. PINDYCK (1994): Investment Under Uncertainty, Princeton University
Press.
DUFFIE, D. (2010): Dynamic Asset Pricing Theory, Princeton University Press.
DURAN, J. (2000): “On dynamic programming with unbounded returns,” Economic Theory,
15, 339–352.
——— (2003): “Discounting long run average growth in stochastic dynamic programs,”
Economic Theory, 22, 395–413.
ERICSON, R. AND A. PAKES (1995): “Markov-perfect industry dynamics: A framework for
empirical work,” The Review of Economic Studies, 62, 53–82.
FAJGELBAUM, P., E. SCHAAL, AND M. TASCHEREAU-DUMOUCHEL (2015): “Uncertainty
traps,” Tech. rep., NBER Working Paper.
HOPENHAYN, H. A. (1992): “Entry, exit, and firm dynamics in long run equilibrium,”
Econometrica, 1127–1150.
HUGGETT, M., G. VENTURA, AND A. YARON (2011): “Sources of lifetime inequality,” The
American Economic Review, 101, 2923–2954.
INSLEY, M. C. AND T. S. WIRJANTO (2010): “Contrasting two approaches in real options
valuation: contingent claims versus dynamic programming,” Journal of Forest Economics,
16, 157–176.
JOVANOVIC, B. (1982): “Selection and the evolution of industry,” Econometrica, 649–670.
——— (1987): “Work, rest, and search: unemployment, turnover, and the cycle,” Journal of
Labor Economics, 131–148.
KAPLAN, G. AND G. L. VIOLANTE (2010): “How much consumption insurance beyond self-
insurance?” American Economic Journal: Macroeconomics, 2, 53–87.
KARATZAS, I. AND S. E. SHREVE (1998): Methods of Mathematical Finance, vol. 39, Springer
Science & Business Media.
KIYOTAKI, N. AND R. WRIGHT (1993): “A search-theoretic approach to monetary economics,”
The American Economic Review, 63–77.
LE VAN, C. AND Y. VAILAKIS (2005): “Recursive utility and optimal growth with bounded
or unbounded returns,” Journal of Economic Theory, 123, 187–209.
LJUNGQVIST, L. AND T. J. SARGENT (2012): Recursive Macroeconomic Theory, MIT Press.
MA, Q. (2016): “Supplementary appendix: solving sequential decision problems via
continuation values,” ANU Working Paper.
MARTINS-DA ROCHA, V. F. AND Y. VAILAKIS (2010): “Existence and uniqueness of a fixed
point for local contractions,” Econometrica, 78, 1127–1141.
MATKOWSKI, J. AND A. S. NOWAK (2011): “On discounted dynamic programming with
unbounded returns,” Economic Theory, 46, 455–474.
MCCALL, J. J. (1970): “Economics of information and job search,” The Quarterly Journal of
Economics, 84, 113–126.
MEYN, S. P. AND R. L. TWEEDIE (2012): Markov Chains and Stochastic Stability, Springer
Science & Business Media.
PISSARIDES, C. A. (2000): Equilibrium Unemployment Theory, MIT Press.
RINCON-ZAPATERO, J. P. AND C. RODRIGUEZ-PALMERO (2003): “Existence and uniqueness
of solutions to the Bellman equation in the unbounded case,” Econometrica, 71, 1519–1555.
——— (2009): “Corrigendum to Existence and uniqueness of solutions to the Bellman
equation in the unbounded case Econometrica, Vol. 71, No. 5 (September, 2003), 1519–1555,”
Econometrica, 77, 317–318.
ROTHSCHILD, M. (1974): “Searching for the lowest price when the distribution of prices is
unknown,” Journal of Political Economy, 82, 689–711.
RUST, J. (1986): “When is it optimal to kill off the market for used durable goods?”
Econometrica, 65–86.
——— (1987): “Optimal replacement of GMC bus engines: An empirical model of Harold
Zurcher,” Econometrica, 999–1033.
——— (1997): “Using randomization to break the curse of dimensionality,” Econometrica,
487–516.
SHI, S. (1995): “Money and prices: a model of search and bargaining,” Journal of Economic
Theory, 67, 467–496.
——— (1997): “A divisible search model of fiat money,” Econometrica, 75–102.
SHIRYAEV, A. N. (1999): Essentials of Stochastic Finance: Facts, Models, Theory, vol. 3, World
Scientific.
STOKEY, N., R. LUCAS, AND E. PRESCOTT (1989): Recursive Methods in Economic Dynamics,
Harvard University Press.
TAYLOR, S. J. (1982): “Financial returns modelled by the product of two stochastic processes–
a study of the daily sugar prices 1961-75,” Time Series Analysis: Theory and Practice, 1, 203–
226.
TREJOS, A. AND R. WRIGHT (1995): “Search, bargaining, money, and prices,” Journal of
Political Economy, 118–141.