
Solving Sequential Decision Problems via Continuation Values¹

Qingyin Maᵃ and John Stachurskiᵇ

ᵃ,ᵇ Research School of Economics, Australian National University

September 12, 2016

ABSTRACT. We study a solution method for sequential decision problems based around the

continuation value function, rather than the value function. This approach turns out to have significant

advantages. One is that continuation value functions are smoother, allowing for sharper

analysis of optimal policies and more efficient computation. Another is that, for a range of

problems, the continuation value function exists in a lower dimensional space than the value

function, mitigating the curse of dimensionality. In one typical experiment, the lower state

dimension reduces computation time from over a week to less than three minutes.

¹ The authors thank the Australian Research Council for support under Discovery Grant DP120100321.

Email addresses: qingyin.ma@anu.edu.au, john.stachurski@anu.edu.au

PREFACE

Thesis Title: Essays on Sequential Decision Problems in Economic Dynamics

Supervisor: Prof. John Stachurski

In many economic problems, agents located in a stochastically evolving environment must

choose between acting now or waiting for a better opportunity. These problems can be modeled

in an optimal stopping framework. The thesis attempts to provide a systematic analysis

of this class of problems. Three main contributions are made:

First, the thesis extends the standard dynamic programming theory by providing a systematic

treatment of unbounded returns in optimal stopping problems. Under general settings

where unbounded return functions are permitted, the thesis provides easy-to-check suffi-

cient conditions for the existence and uniqueness of solutions to the Bellman equation, and

the unique fixed point of the Bellman operator is shown to be the value function (VF). The

theory is applicable to a broad class of applications in economics and finance.

Secondly, the thesis proposes an alternative approach to solve optimal stopping problems.

The idea involves calculating the continuation value function (CVF) directly, and has signifi-

cant advantages over standard approaches based on VF: (1) In a wide range of economic ap-

plications, CVF exists in a lower dimensional space than VF, while the converse never holds.

This allows us to mitigate one of the primary stumbling blocks for numerical analysis—the

curse of dimensionality. (2) CVF is typically smoother than VF, which makes it easier to approximate

numerically. (3) The CVF-based approach allows a sharp analysis of the optimal policy.

Finally, some preliminary extensions have been done so far: the theory is shown to work

well for repeated optimal stopping problems. The next stage is to build a unified theoretical

framework that treats optimal stopping problems with recursive preferences.

Although only applications in economics are presented, the theory developed contributes to

many other areas, including mathematical finance, operations research, sequential analysis,

and so on. The thesis is structured as follows:

Chapter I: Introduction; Chapter II: Optimal Stopping with Unbounded Returns; Chap-

ter III: The Continuation Value Based Approach; Chapter IV: Extensions; Chapter V:

Conclusions.

This paper is mainly based on results presented in Chapter III.

1. INTRODUCTION

In many economic problems, agents face stochastically evolving environments and choose

between acting immediately or waiting for a better opportunity. One such scenario is that

faced by job seekers, who can either accept their current wage offer or continue job hunting

(see, e.g., McCall (1970) or Pissarides (2000)). Another one is that faced by firms choosing

whether to enter a market or to wait, or to exit when incumbent (e.g., Jovanovic (1982),

Hopenhayn (1992), Ericson and Pakes (1995), Fajgelbaum et al. (2015)). Other problems in

this category include American call and put options (Karatzas and Shreve (1998), Shiryaev

(1999), Duffie (2010)), consumer search problems (Burdett and Judd (1983), Kiyotaki and

Wright (1993), Trejos and Wright (1995), Shi (1995, 1997)), optimal default (Choi et al. (2003),

Albuquerque and Hopenhayn (2004), Arellano (2008)), optimal replacement of durable goods

(Rust (1986, 1987)), optimal timing of investment (Dixit and Pindyck (1994)), timing of re-

tirement (Huggett et al. (2011)), timing of harvesting agricultural products (Insley and Wir-

janto (2010)) and optimal monopoly pricing with unknown demand across multiple markets

(Rothschild (1974)).2

In solving these problems, the standard path is to first seek the value function, which gives

maximal expected rewards from the flow of possible payoffs. From the value function, one

can calculate the continuation value by taking the expectation of the value function in the

next period, appropriately discounted and combined with flow benefits from continuation.

Once the continuation value is obtained, it can be compared with the reward from stopping.

The optimal policy is to stop if and only if the reward from stopping is larger.

An alternative approach was introduced by Jovanovic (1982) in the context of firm exit de-

cisions. The idea involves calculating the continuation value directly, using an operator

that we refer to below as the continuation value operator. In this paper, we show that Jo-

vanovic’s approach extends naturally to almost all optimal stopping problems of interest to

economists. We systematically study the method and its relationship to traditional dynamic

programming. We show that, for many problems, this method has significant advantages

over traditional methods based around the value function.

² Optimal stopping problems play major roles in other related fields. For example, in finance, American options

provide the right to buy or sell an asset at a predetermined price or continue to the next period (Duffie

(2010)). Analysis of options in financial markets has led to the study of various economic and political decisions

using the framework of real options (Alvarez and Dixit (2014), Backus (2014)). Within operations research, prob-

lems such as adaptive routing and optimal dynamic mechanism design are solved using the theory of optimal

stopping.

One advantage is that, for a range of interesting problems, the continuation value function

exists in a lower dimensional space than the value function.3 For example, in the classic job

search model of McCall (1970), wage offers are independent draws from a fixed distribution.

The current offer affects lifetime rewards only if the agent decides to accept the offer. If not,

then the process updates with the preceding draw forgotten. Hence the current wage draw

appears in the value function—since the offer in hand can impact lifetime rewards—but has

no impact on the continuation value.4
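To make the dimensionality point concrete, the following minimal sketch solves an IID job search problem by iterating directly on the continuation value, which here is a single number. The wage grid, the uniform offer distribution, linear utility and the values of `beta` and `c` are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: IID job search, where the continuation value is a scalar.
# The wage grid, probabilities, beta, c and linear utility are all assumptions.

def solve_continuation_value(wages, probs, beta=0.95, c=1.0,
                             tol=1e-10, max_iter=10_000):
    """Iterate psi <- c + beta * sum_w max{w / (1 - beta), psi} p(w)."""
    psi = 0.0
    for _ in range(max_iter):
        psi_new = c + beta * sum(p * max(w / (1 - beta), psi)
                                 for w, p in zip(wages, probs))
        if abs(psi_new - psi) < tol:
            return psi_new
        psi = psi_new
    return psi

wages = [float(w) for w in range(1, 11)]    # hypothetical wage offers 1, ..., 10
probs = [1 / len(wages)] * len(wages)       # uniform offer distribution
beta = 0.95
psi_star = solve_continuation_value(wages, probs, beta=beta)

# The value function needs one number per wage; psi_star is a single number.
v_star = [max(w / (1 - beta), psi_star) for w in wages]
reservation_wage = (1 - beta) * psi_star    # accept iff w >= reservation_wage
```

The value function is recovered afterwards as the pointwise maximum of the stopping reward and `psi_star`, so nothing is lost by working in the lower dimensional space.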

The practical impact of lower dimensionality can be very large, as has been pointed out by

many authors (see, e.g., Bellman (1969) or Rust (1997)). For example, while solving a well

known version of the job search model in Section 5.1, we find that the continuation value

based approach takes only 171 seconds to compute the optimal policy to a given level of

accuracy, as opposed to more than 7 days for the value function iteration approach.

A second potential benefit of the continuation value based approach is that the continuation

value function is often smoother than the value function. The intuition behind this result

is that the value function is typically kinked at points where it is optimal to switch between

continuing and stopping. However, when transitions are stochastic and shocks have a degree

of smoothness (for example, the distributions have densities), such kinks are smoothed out

in the continuation value function. As a result, the continuation value function becomes

easier to approximate and more useful for making inferences about the optimal policy. For

example, we use smoothness in the continuation value function to obtain new results on the

differentiability of transition thresholds (e.g., reservation wages) as functions of other state

variables.

In extending Jovanovic’s continuation value function method to the whole spectrum of se-

quential decision problems used by economists, several challenges must be addressed. One

is that, in many applications, rewards are unbounded, meaning that traditional methods

based around contractions with respect to supremum norms do not apply.⁵ To this end, we study the continuation value function in general settings where unbounded payoff functions are allowed. This is achieved by using weighted supremum norms. This approach turns out to interact well with the continuation value function operator, leading to simple sufficient conditions that are straightforward to check in applications.

³ While every state variable that appears in the continuation value function must appear in the value function, the converse is not true. Hence, the number of arguments in the continuation value function is always weakly less than the number of arguments in the value function, and sometimes strictly so.

⁴ Of course the current wage offer could affect the continuation value in a variety of ways, some of which are considered below. For example, McCall (1970) considers a mechanism where the current offer matters for the state of knowledge, as described by a belief distribution. In this case, however, the value function is still higher dimensional than the continuation value function, since the value function must track both the current offer and the parameters in the belief distribution, while the continuation value function tracks only the latter.

⁵ While in some cases this problem can be eliminated by compactifying the state space of the underlying model, in other cases such changes are problematic. For example, wages might be driven by a state process with a unit root (see Example 2.2 below), in which case the state space cannot be compactified. Alternatively, in studies

Since we tackle unbounded problems, our research is also connected to earlier studies on

unbounded dynamic programming. In economics, the weighted supremum norm approach

was pioneered by Boyd (1990) and has been used in numerous other studies of unbounded

dynamic programming.6 When adapting this method to continuation value functions, we

find it possible to develop a simple and direct version of the methodology that includes

bounded problems as a special case.

Another line of research treats unboundedness via the local contraction approach, which

constructs a local contraction based on a suitable sequence of increasing compact subsets.

See, for example, Rincón-Zapatero and Rodríguez-Palmero (2003, 2009), Martins-da Rocha

and Vailakis (2010) and Matkowski and Nowak (2011). One of the motivations of this line of

work is to deal with dynamic programming problems that are unbounded both above and

below. For our problem, we show that the weighted supremum norm based method can

tackle this case effectively, and hence we do not consider local contractions.

The paper is structured as follows. Section 2 outlines the method and provides the basic op-

timality results. Section 3 discusses the properties of the continuation value function, such

as continuity and differentiability. Section 4 explores the connections between the continu-

ation value and the optimal policy. Section 5 compares the computational efficiency of our

approach with the value function approach. Section 6 concludes. Proofs are provided in the

appendix.7

⁵ (continued) of firm decisions, interest might center on the tails of the firm size distribution, so compactifying the state space is undesirable.

⁶ Examples include Becker and Boyd (1997), Alvarez and Stokey (1998), Duran (2000), Duran (2003) and Le Van and Vailakis (2005).

⁷ Due to the page limit, the Appendix has been shortened considerably. However, a complete technical appendix is available upon request.

2. OPTIMALITY RESULTS

This section presents the basic optimality results. Prior to discussing technical details, we first give an overview of the method and our terminology.

2.1. Overview. Consider a decision problem where an agent is faced at each point in time with the choice between stopping (e.g., exercising an option, exiting a market, accepting a job) or continuing to the next stage. Suppose that the value function v∗ satisfies a Bellman equation of the form

$$v^*(z) = \max\left\{ r(z),\; c(z) + \beta \int v^*(z') P(z, dz') \right\} \tag{1}$$

where z ∈ Z is the current state, z′ is next period’s state, r(z) is the payoff to stopping, c(z) is

the flow payoff to continuing and P gives one step transition probabilities for the state. For

example, r(z) might be the liquidation value of a firm considering whether to exit a market

and c(z) might be one period profit conditional on remaining active, given the state z. In this

case, v∗(z) is the value of the firm prior to deciding whether to continue or exit.

The continuation value function associated with this problem is the second term on the right

hand side of (1). We write it as

$$\psi^*(z) := c(z) + \beta \int v^*(z') P(z, dz'). \tag{2}$$

It is straightforward to write down a functional equation that has ψ∗ as a solution: from (1) and (2), we have v∗(z) = max{r(z), ψ∗(z)} for all z. Inserting this identity into the right hand side of (2) leads us to the equation

$$\psi(z) = c(z) + \beta \int \max\{ r(z'), \psi(z') \} P(z, dz') \tag{3}$$

for all z ∈ Z. To analyze this equation, we study the operator Q defined by

$$Q\psi(z) = c(z) + \beta \int \max\{ r(z'), \psi(z') \} P(z, dz'). \tag{4}$$

By construction, fixed points of Q solve (3). As shown below, they are also continuation

value functions and from them we can derive value functions, optimal stopping rules and so

on. Once the fundamental optimality results are in place, we turn to properties of the con-

tinuation value function, such as continuity, differentiability and monotonicity, and deduce

implications for the optimal stopping rule. Prior to these tasks, we recall some facts related

to optimal stopping and weighted supremum norms.
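The role of the operator Q can be illustrated on a finite state space, where the integral in (4) becomes a sum over next-period states. The sketch below iterates Q to its fixed point and then reads off the value function and the stopping rule. The three-state transition matrix and the payoff vectors `r` and `c` are hypothetical choices made only for illustration.

```python
# Illustrative sketch: iterate the continuation value operator Q on a finite
# state space. The three-state chain and the payoffs r, c are hypothetical.

def Q(psi, r, c, P, beta):
    """(Q psi)(z) = c(z) + beta * sum_{z'} max{r(z'), psi(z')} P(z, z')."""
    n = len(psi)
    return [c[i] + beta * sum(P[i][j] * max(r[j], psi[j]) for j in range(n))
            for i in range(n)]

def fixed_point(r, c, P, beta, tol=1e-12, max_iter=100_000):
    """Successive approximation psi_{n+1} = Q psi_n starting from zero."""
    psi = [0.0] * len(r)
    for _ in range(max_iter):
        new = Q(psi, r, c, P, beta)
        if max(abs(a - b) for a, b in zip(new, psi)) < tol:
            return new
        psi = new
    return psi

beta = 0.9
P = [[0.6, 0.3, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
r = [1.0, 5.0, 12.0]    # payoff to stopping in each state
c = [0.5, 0.5, 0.5]     # flow payoff to continuing

psi_star = fixed_point(r, c, P, beta)
v_star = [max(ri, pi) for ri, pi in zip(r, psi_star)]     # v* = max{r, psi*}
policy = [int(ri >= pi) for ri, pi in zip(r, psi_star)]   # 1 = stop, 0 = continue
```

In this toy example the agent stops only in the state with the highest stopping reward, and `v_star` satisfies the Bellman equation (1) up to numerical tolerance.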

2.2. Preliminaries. For real numbers a and b we set a ∨ b := max{a, b}. If f and g are functions, then (f ∨ g)(x) := f(x) ∨ g(x). If $(Z, \mathscr{Z})$ is a measurable space, then $bZ$ is the set of $\mathscr{Z}$-measurable bounded functions from Z to R, with norm $\|f\| := \sup_{z \in Z} |f(z)|$. For unbounded functions we use weighted supremum norms. Given a function κ : Z → [1, ∞), the κ-weighted supremum norm of f : Z → R is defined as

$$\|f\|_\kappa := \|f/\kappa\| = \sup_{z \in Z} \frac{|f(z)|}{\kappa(z)}.$$

If $\|f\|_\kappa < \infty$, then we say that f is κ-bounded. The symbol $b_\kappa Z$ will denote the set of all functions from Z to R that are both $\mathscr{Z}$-measurable and κ-bounded. We use $\rho_\kappa$ to represent the metric $\rho_\kappa(f, g) = \|f - g\|_\kappa$ on $b_\kappa Z$. As is well known, $b_\kappa Z$ is a Banach space under $\|\cdot\|_\kappa$, so the pair $(b_\kappa Z, \rho_\kappa)$ is a complete metric space.
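As a quick illustration under assumed choices (not taken from the paper), let Z = R, f(z) = z² and κ(z) = 1 + z². Then f is unbounded in the ordinary supremum norm but is κ-bounded:

```latex
\| f \|_\kappa = \sup_{z \in \mathbb{R}} \frac{z^2}{1 + z^2} = 1 < \infty,
\qquad \text{while} \qquad
\| f \| = \sup_{z \in \mathbb{R}} z^2 = \infty .
```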

A stochastic kernel P on $(Z, \mathscr{Z})$ is a map $P : Z \times \mathscr{Z} \to [0, 1]$ such that $z \mapsto P(z, B)$ is $\mathscr{Z}$-measurable for each $B \in \mathscr{Z}$ and $B \mapsto P(z, B)$ is a probability measure for each z ∈ Z. Below, we understand P(z, B) as representing the probability of a state transition from z ∈ Z to $B \in \mathscr{Z}$ in one unit of time.

2.3. Set Up. Let $(Z_n)_{n \ge 0}$ be a time-homogeneous Markov process defined on probability space $(\Omega, \mathscr{F}, \mathbb{P})$ and taking values in measurable space $(Z, \mathscr{Z})$. Let P denote the corresponding stochastic kernel. Let $\{\mathscr{F}_n\}_{n \ge 0}$ be a filtration contained in $\mathscr{F}$ and such that $(Z_n)_{n \ge 0}$ is adapted to $\{\mathscr{F}_n\}_{n \ge 0}$. Let $\mathbb{P}_z$ indicate probability conditioned on $Z_0 = z$, while $\mathbb{E}_z$ is expectation conditioned on the same event. In proofs we take $(\Omega, \mathscr{F})$ to be the canonical sequence space, so that $\Omega = \times_{n=0}^{\infty} Z$ and $\mathscr{F}$ is the product σ-algebra generated by $\mathscr{Z}$. For the formal construction of $\mathbb{P}_z$ on $(\Omega, \mathscr{F})$ given P and z ∈ Z see Theorem 3.4.1 of Meyn and Tweedie (2012) or Section 8.2 of Stokey et al. (1989).

A random variable τ taking values in $\mathbb{N}_0 := \{0, 1, \ldots\}$ is called a (finite) stopping time with respect to the filtration $\{\mathscr{F}_n\}_{n \ge 0}$ if $\mathbb{P}\{\tau < \infty\} = 1$ and $\{\tau \le n\} \in \mathscr{F}_n$ for all n ≥ 0. Below, τ = n has the interpretation of choosing to act at time n. Let $\mathcal{M}$ denote the set of all stopping times on Ω with respect to the filtration $\{\mathscr{F}_n\}_{n \ge 0}$.

Let $r : Z \to \mathbb{R}$ and $c : Z \to \mathbb{R}$ be measurable functions, referred to below as the exit payoff and flow continuation payoff respectively. Consider a problem where, at each time t ≥ 0, an agent observes $Z_t$ and chooses between stopping and continuing. Stopping generates final payoff $r(Z_t)$. Continuing involves continuation payoff $c(Z_t)$ and transition to the next period, where the agent observes $Z_{t+1}$ and the process repeats. Future payoffs are discounted by the factor β ∈ (0, 1).

The value function is defined at z ∈ Z by

$$v^*(z) := \sup_{\tau \in \mathcal{M}} \mathbb{E}_z \left\{ \sum_{t=0}^{\tau - 1} \beta^t c(Z_t) + \beta^\tau r(Z_\tau) \right\}. \tag{5}$$

A stopping time $\tau \in \mathcal{M}$ is called an optimal stopping time if it attains the supremum in (5). A policy is a map σ from Z to {0, 1}, with 0 indicating the decision to continue and 1 indicating the decision to stop. A policy is called an optimal policy if τ∗ defined by $\tau^* := \inf\{t \ge 0 \mid \sigma(Z_t) = 1\}$ is an optimal stopping time.

To guarantee existence of the value function and related properties without insisting that the

payoff functions are bounded, we adopt the next assumption:

Assumption 2.1. There exist a $\mathscr{Z}$-measurable function $g : Z \to \mathbb{R}_+$ and constants $m, d \in \mathbb{R}_+$ such that βm < 1 and, for all z ∈ Z,

$$\max\left\{ \int |r(z')| P(z, dz'),\; |c(z)| \right\} \le g(z) \tag{6}$$

and

$$\int g(z') P(z, dz') \le m g(z) + d. \tag{7}$$

The interpretation of Assumption 2.1 is that both r and c are small in absolute value relative

to some function g such that E g(Zt) does not grow too quickly. Slow growth in E g(Zt) is

imposed by (7), which can be understood as a geometric drift condition (see, e.g., Meyn and

Tweedie (2012), chapter 15).8

Example 2.1. A standard example of an optimal stopping problem in economics is job search.

As a simple example, suppose that a worker can either accept a current wage offer wt and

work permanently at that wage, or reject the offer, receive unemployment compensation c,

and reconsider next period. Let the current wage offer be a function wt = w(Zt) of some

idiosyncratic or aggregate state process (Zt)t≥0. The exit reward is r(z) = u(w(z))/(1− β),

where u is a utility function and β < 1 is the discount factor. The flow continuation payoff is

the constant c.⁹ If u is bounded, then we can set g(z) equal to the constant ‖r‖ ∨ c and

Assumption 2.1 is satisfied with m = 1 and d = 0.

Example 2.2. Consider the same setting as Example 2.1, with state process

$$z_{t+1} = \rho z_t + b + \varepsilon_{t+1}, \qquad (\varepsilon_t) \overset{\text{IID}}{\sim} N(0, \sigma^2). \tag{8}$$

Let $w_t = \exp(z_t)$, so that wages are lognormal. We consider several standard utility functions that are unbounded.

(1) Let $u(w) = \ln w$. If β|ρ| < 1, then Assumption 2.1 holds with $g(w) = |\ln w|$, $m = |\rho|$ and $d = \sigma\sqrt{2/\pi} + |b|$. Since the autoregression coefficient ρ ≥ 1 is allowed, our theory can treat nonstationary state processes.

(2) Let $u(w) = \frac{w^{1-\gamma}}{1-\gamma}$, where γ ≥ 0 and γ ≠ 1. Notice that when γ ≠ 0, the utility function is of constant relative risk aversion form, with a coefficient of relative risk aversion γ. When γ = 0, the utility function reduces to u(w) = w.

(a) If ρ ∈ [0, 1] and $\beta \exp\left[(1-\gamma)\rho b + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2}\right] < 1$, then Assumption 2.1 holds by letting $m = d = \exp\left[(1-\gamma)\rho b + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2}\right]$ and $g(w) = w^{(1-\gamma)\rho}$.

(b) If ρ ∈ [−1, 0] and $\beta \exp\left[|(1-\gamma)\rho b| + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2}\right] < 1$, then Assumption 2.1 holds by letting $m = \exp\left[|(1-\gamma)\rho b| + \frac{(1-\gamma)^2 \rho^2 \sigma^2}{2}\right]$, $d = 0$, and $g(w) = w^{(1-\gamma)\rho} + w^{-(1-\gamma)\rho}$.

⁸ To verify Assumption 2.1, it suffices to obtain a $\mathscr{Z}$-measurable function $g : Z \to \mathbb{R}_+$, constants $m, d \in \mathbb{R}_+$ with βm < 1 and constants $a_1, a_2, a_3$ and $a_4$ in $\mathbb{R}_+$ such that $\int |r(z')| P(z, dz') \le a_1 g(z) + a_2$, $|c(z)| \le a_3 g(z) + a_4$ and (7) holds. We use this fact in the applications below.

⁹ The classical McCall model used an IID wage process (McCall (1970)). We follow many subsequent studies in assuming Markov dynamics for wages (see, e.g., Jovanovic (1987) or Bull and Jovanovic (1988)).
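To see where the constants in case (1) of Example 2.2 come from, here is a short sketch of the drift bound (7), writing g as a function of z = ln w and using the standard fact that $\mathbb{E}|\varepsilon| = \sigma\sqrt{2/\pi}$ when $\varepsilon \sim N(0, \sigma^2)$. It is a worked check under the AR(1) dynamics in (8), not a substitute for the formal verification.

```latex
\int g(z') P(z, dz')
  = \mathbb{E}\, \bigl| \rho z + b + \varepsilon_{t+1} \bigr|
  \le |\rho|\, |z| + |b| + \mathbb{E}\, |\varepsilon_{t+1}|
  = |\rho|\, g(z) + \sigma \sqrt{2/\pi} + |b|
  = m\, g(z) + d .
```

Because r(z) = z/(1 − β) in this case, the same bound divided by 1 − β controls $\int |r(z')| P(z, dz')$, so (6) holds in the affine form described in footnote 8.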

Example 2.3. Consider the asset pricing problem of a perpetual call option (see, e.g., Shiryaev

(1999), Duffie (2010)), an infinite-horizon American call option with no fixed maturity nor

exercise limit. Let x be the current price of the asset. Recall the stochastic process defined in (8), and let the sequence of asset prices $(x_t)_{t \ge 0}$ be $x_t = e^{z_t}$ for all t ≥ 0. The value of the option to buy the asset at a strike price K is given by

$$v^*(x) = \max\left\{ (x - K)^+,\; e^{-\gamma} \int v^*(x') f(x'|x)\, dx' \right\},$$

where $f(x'|x) = LN(\rho \ln x + b, \sigma^2)$, and γ > 0 is the riskless rate of return, so that the discount factor is $\beta = e^{-\gamma}$. If ρ ∈ [0, 1] and $\beta \exp\left(\rho b + \frac{\rho^2 \sigma^2}{2}\right) < 1$, then Assumption 2.1 holds by letting $m = d = \exp\left(\rho b + \frac{\rho^2 \sigma^2}{2}\right)$ and $g(x) = x^\rho$. If ρ ∈ [−1, 0] and $\beta \exp\left(|\rho b| + \frac{\rho^2 \sigma^2}{2}\right) < 1$, then Assumption 2.1 holds by letting $m = \exp\left(|\rho b| + \frac{\rho^2 \sigma^2}{2}\right)$, $d = 0$, and $g(x) = x^\rho + x^{-\rho}$.

2.4. Optimality. Let g be as in Assumption 2.1 and let

$$k(z) := \sum_{t \ge 0} \beta^t\, \mathbb{E}_z \left\{ |r(Z_t)| + g(Z_t) \right\} + 1. \tag{9}$$

In a supplementary appendix to this paper, Ma (2016) shows that, under Assumption 2.1, the value function v∗ is a well-defined element of $b_k Z$ that satisfies the Bellman equation (1), and that the Bellman operator

$$Tv(z) = \max\left\{ r(z),\; c(z) + \beta \int v(z') P(z, dz') \right\}$$

is a contraction mapping on $b_k Z$ when paired with the weighted supremum norm $\|\cdot\|_k$. Hence v∗ is the unique fixed point of T in $b_k Z$. With the notation introduced in Section 2.2, the Bellman equation (1) can be expressed in functional notation as v = r ∨ (c + βPv), and the continuation value function can be defined by ψ∗ = c + βPv∗. Since v∗ satisfies the Bellman equation, we also have v∗ = r ∨ ψ∗. Ma (2016) also shows that the optimal stopping time is

$$\tau^* := \inf\{ t \ge 0 \mid r(Z_t) \ge \psi^*(Z_t) \}.$$

Thus, the optimal strategy is a Markov strategy, with action at time t depending only on the

current state Zt.

2.5. The Continuation Value Operator. Without loss of generality, consider the case $m > 1$ and $d \ge \frac{1}{\beta} - 1$. Let ℓ be the weighting function

$$\ell(y) = g(y) + \frac{d}{m - 1}. \tag{11}$$

Let Q be the operator from $b_\ell Z$ to itself defined by (4). As we now show, the fixed point of Q is the continuation value function ψ∗ defined in (2).

Theorem 2.1. If Assumption 2.1 holds, then the following statements are true:

(1) Q is a contraction mapping on $(b_\ell Z, \rho_\ell)$ of modulus βm.

(2) The unique fixed point of Q in $b_\ell Z$ is ψ∗.

(3) The policy σ∗ defined pointwise by $\sigma^*(z) = \mathbb{1}\{r(z) \ge \psi^*(z)\}$ is an optimal policy.
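A brief sketch of why the modulus in part (1) is βm may be helpful. It uses only the drift condition (7), the weight $\ell = g + d/(m-1)$ from (11) and the elementary inequality $|a \vee x - b \vee x| \le |a - b|$. The steps below are an informal reconstruction and not the proof given in the appendix.

```latex
\begin{aligned}
\int \ell(z') P(z, dz')
  &\le m g(z) + d + \frac{d}{m-1}
   = m \left( g(z) + \frac{d}{m-1} \right)
   = m\, \ell(z), \\[4pt]
|Q\psi(z) - Q\phi(z)|
  &\le \beta \int |\psi(z') - \phi(z')| \, P(z, dz')
   \le \beta\, \|\psi - \phi\|_\ell \int \ell(z') P(z, dz')
   \le \beta m\, \|\psi - \phi\|_\ell \, \ell(z).
\end{aligned}
```

Dividing by $\ell(z)$ and taking the supremum over z gives $\|Q\psi - Q\phi\|_\ell \le \beta m \|\psi - \phi\|_\ell$. The condition $d \ge 1/\beta - 1 > m - 1$ ensures that $\ell \ge 1$, as required of a weight function.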

Example 2.2 (Continued). Recall the extended job search model of McCall (1970), in which the wage process is generated by a general Markov process $(z_t)_{t \ge 0}$. For each type of utility function u, the continuation value operator satisfies

$$Q\psi(w) = c + \beta \int \max\left\{ \frac{u(w')}{1 - \beta},\; \psi(w') \right\} f(w'|w)\, dw'.$$

Since Assumption 2.1 has been verified, Theorem 2.1 implies that Q has a unique fixed point in $b_\ell Z$ and that this fixed point is the continuation value function ψ∗, which in the current case represents the expected value of rejecting the current offer and waiting for a new draw.

Example 2.3 (Continued). Recall the perpetual option problem of Shiryaev (1999). The con-

tinuation value operator for the perpetual option satisfies

$$Q\psi(x) = e^{-\gamma} \int \max\left\{ (x' - K)^+,\; \psi(x') \right\} f(x'|x)\, dx'.$$

By Theorem 2.1, Q admits a unique fixed point ψ∗ in $b_\ell Z$, which in this case can be interpreted

as the expected value of holding the option in the current period and considering exercising

at a later stage.

Example 2.4. (Firm Exit I). Consider a firm exit model in the style of Hopenhayn (1992). At the beginning of each period, a productivity shock a is realized and observed by an incumbent firm in the industry. The firm must decide whether to exit the market or not in the next period (before a′ is realized). The output of the firm is $q(a, l) = a l^\alpha$, where α ∈ (0, 1) and l denotes labor demand. Suppose that the productivity shock process $(a_t)_{t \ge 0}$ satisfies $a_t = e^{z_t}$ for all t ≥ 0, where $(z_t)_{t \ge 0}$ is defined in (8).

A fixed cost $c_f > 0$ must be paid every period by the incumbent firm, which can be treated as a fixed outside opportunity cost for some resources (e.g., managerial ability) used by the firm. Given output and input prices p and w, profit maximization implies that the exit payoff and the flow continuation payoff of staying in the industry are $r(a) = c(a) = G a^{\frac{1}{1-\alpha}} - c_f$, where $G = \left( \frac{\alpha p}{w} \right)^{\frac{1}{1-\alpha}} \left( \frac{1-\alpha}{\alpha} \right) w$. The continuation value operator is

$$Q\psi(a) = \left( G a^{\frac{1}{1-\alpha}} - c_f \right) + \beta \int \max\left\{ G a'^{\frac{1}{1-\alpha}} - c_f,\; \psi(a') \right\} f(a'|a)\, da',$$

where $f(a'|a) = LN(\rho \ln a + b, \sigma^2)$. It can be verified that if $\beta \exp\left[ \frac{b}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right] < 1$ and ρ ∈ [0, 1], then Assumption 2.1 holds by letting $g(a) = a^{\frac{1}{1-\alpha}}$ and $m = d = \exp\left[ \frac{b}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right]$. If ρ ∈ [−1, 0] and $\beta \exp\left[ \frac{|b|}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right] < 1$, then Assumption 2.1 holds by letting $g(a) = a^{\frac{1}{1-\alpha}} + a^{-\frac{1}{1-\alpha}}$, $m = \exp\left[ \frac{|b|}{1-\alpha} + \frac{\sigma^2}{2(1-\alpha)^2} \right]$ and $d = 0$. By Theorem 2.1, Q admits a unique fixed point in $b_\ell Z$ that corresponds to the continuation value function ψ∗, which can be understood as the expected value of staying in the industry for the next period and performing optimally afterwards.

Example 2.5. (Firm Exit II). Consider the firm exit model of Jovanovic (1982). Let q be the output of a firm, and C(q) a cost function that satisfies C(0) = C′(0) = 0, C′(q) > 0, C″(q) > 0, and $\lim_{q \to \infty} C'(q) = \infty$. The total cost is C(q)x, where $(x_t)_{t \ge 0}$ is a stochastic process that satisfies $x_t = l(\eta_t)$; l is a positive, strictly increasing, and continuous function with $\lim_{\eta \to -\infty} l(\eta) = \alpha_1 > 0$ and $\lim_{\eta \to \infty} l(\eta) = \alpha_2 \le \infty$; and $(\eta_t)_{t \ge 0}$ is a stochastic process that satisfies

$$\eta_t = \xi + \varepsilon_t, \qquad (\varepsilon_t) \overset{\text{IID}}{\sim} N(0, \sigma^2),$$

where ξ denotes firm type, which is connected to firm efficiency and is unobservable. At the beginning of each period, the firm observes x, and must decide whether to exit the industry or not. The firm has prior belief ξ ∼ N(µ, γ) and updates it in a Bayesian manner after observing x′, so the posterior is ξ|x′ ∼ N(µ′, γ′), where

$$\gamma' = \left( \frac{1}{\gamma} + \frac{1}{\sigma^2} \right)^{-1} \quad \text{and} \quad \mu' = \gamma' \left( \frac{\mu}{\gamma} + \frac{l^{-1}(x')}{\sigma^2} \right).$$

Let $\pi(p, x) = \max_q\, [pq - C(q)x]$ be the maximal profits, where $(p_t)_{t \ge 0}$ is a bounded price sequence which is Markovian with transition probability h. Jovanovic (1982) shows that π is a bounded and continuous function. Let W > 0 denote the expected present value of the firm's fixed factor in a different industry. Then the continuation value operator satisfies

$$Q\psi(p, x, \mu, \gamma) = \pi(p, x) + \beta \int \max\left\{ W,\; \psi(p', x', \mu', \gamma') \right\} f(x'|\mu, \gamma)\, h(p'|p)\, d(x', p').$$

Since both the exit and flow continuation payoffs are bounded, Assumption 2.1 is trivially satisfied by letting g be the upper bound of W ∨ π, m = 1 and d = 0. So Q admits a unique fixed point ψ∗ in $bZ$ that can be interpreted as the value of staying in the industry for one period and performing optimally afterwards.

The two operators Q and T are closely related, in the sense that the n-th iterate of the value

function can be obtained from the n-th iterate of the continuation value function by taking the

pointwise maximum of this function and r. In particular, iterates of these operators converge

to their respective fixed points at the same rate. The next proposition clarifies:

Proposition 2.1. Fix $\psi_0 \in b_\ell Z$ and let $v_0 := r \vee \psi_0$. If $v_n := T^n v_0$ and $\psi_n := Q^n \psi_0$ for some $n \in \mathbb{N}$, then $v_n = r \vee \psi_n$.
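The relationship between the iterates of T and Q can be checked numerically on a small example. The sketch below applies both operators side by side on a hypothetical three-state chain (the transition matrix and payoffs are illustrative assumptions) and records the largest gap between $v_n$ and $r \vee \psi_n$ over the first twenty iterations.

```python
# Illustrative check of v_n = max{r, psi_n} on a hypothetical three-state chain.

def Q(psi, r, c, P, beta):
    """Continuation value operator on a finite state space."""
    n = len(psi)
    return [c[i] + beta * sum(P[i][j] * max(r[j], psi[j]) for j in range(n))
            for i in range(n)]

def T(v, r, c, P, beta):
    """Bellman operator on a finite state space."""
    n = len(v)
    return [max(r[i], c[i] + beta * sum(P[i][j] * v[j] for j in range(n)))
            for i in range(n)]

beta = 0.9
P = [[0.6, 0.3, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
r = [1.0, 5.0, 12.0]    # stopping payoffs (assumed)
c = [0.5, 0.5, 0.5]     # flow continuation payoffs (assumed)

psi = [0.0, 0.0, 0.0]                            # psi_0
v = [max(ri, pi) for ri, pi in zip(r, psi)]      # v_0 = r v psi_0
max_gap = 0.0
for n in range(1, 21):
    psi = Q(psi, r, c, P, beta)                  # psi_n = Q^n psi_0
    v = T(v, r, c, P, beta)                      # v_n = T^n v_0
    gap = max(abs(vi - max(ri, pi)) for vi, ri, pi in zip(v, r, psi))
    max_gap = max(max_gap, gap)
```

Since the two sequences are linked at every step, the distance of `v` from its fixed point is controlled by the distance of `psi` from its own, which is the sense in which the iterates converge at the same rate.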

3. PROPERTIES OF CONTINUATION VALUES

In this section we explore some further properties of the continuation value function. As

the most significant result among those we establish, the continuation value function ψ∗

is shown to be smooth (continuously differentiable) under mild assumptions. While the

value function v∗ usually has kinks, ψ∗ can be smoother since the incorporated integration

operation creates a smoothing effect. This makes the continuation value based approach

more favorable for numerical computation than the value function based approaches since

smooth functions are easier to approximate numerically.

3.1. Continuity. We establish two results on continuity. The first one serves general prob-

lems, while the second one works well when the stochastic kernel P admits a density repre-

sentation.

Assumption 3.1. The flow continuation payoff function c is continuous.

Assumption 3.2. The function $z \mapsto \int \max\{r(z'), \psi(z')\} P(z, dz')$ is continuous for all continuous functions $\psi \in b_\ell Z$.

Assumption 3.3. The exit payoff function r is continuous.

We have the following general result on the continuity of ψ∗. The continuity of v∗ can be

obtained as a byproduct under an additional continuity assumption on r.

Proposition 3.1. If Assumptions 2.1 and 3.1 - 3.2 hold, and g is continuous, then ψ∗ is continuous.

If in addition Assumption 3.3 holds, then v∗ is continuous.

In many applications, the stochastic kernel P has a density representation, which makes the

verification of Assumption 3.2 easier.

Definition 3.1. A stochastic density kernel (or density kernel) on Z is a measurable function $f : Z \times Z \to \mathbb{R}_+$ such that

$$\int f(z'|z)\, dz' := \int f(z'|z)\, \lambda(dz') = 1 \quad \text{for all } z \in Z,$$

where λ denotes the Lebesgue measure. We say that the stochastic kernel P has a density representation if there exists a density kernel f such that

$$P(z, B) = \int \mathbb{1}(z' \in B)\, f(z'|z)\, dz' \quad \text{for all } z \in Z \text{ and } B \in \mathscr{Z}.$$

The following result provides an alternative way to obtain the continuity of ψ∗ and v∗ when

P has a density representation, which is highly valuable in applications.

Proposition 3.2. Suppose Assumptions 2.1 and 3.1 hold, together with the following conditions:

(1) P has a density representation f, and $z \mapsto f(z'|z)$ is continuous for all z′ ∈ Z;

(2) $z \mapsto \int |r(z')| f(z'|z)\, dz'$, $z \mapsto \int g(z') f(z'|z)\, dz'$, and g are continuous.

Then ψ∗ is continuous. If in addition Assumption 3.3 holds, then v∗ is continuous.

Remark 3.1. When the return functions r and c are bounded, as is the case of many standard

economic models, establishing the continuity of ψ∗ is even easier. For general problems, we

only require that Assumption 3.1 holds and that P satisfies the Feller property. When P has

a density representation f, Assumption 3.1 and the continuity of z ↦ f(z′|z) (for all z′ ∈ Z)

are sufficient for ψ∗ to be continuous.

Example 2.5 (Continued). Recall the firm exit model of Jovanovic (1982). The exit payoff

W and flow continuation payoff π are bounded and continuous, and the Feller property in

this case can be easily verified by applying Lemma 7.1 in the Appendix. Therefore, ψ∗ is

continuous. Since the exit payoff W is constant, v∗ is continuous.

Remark 3.2. The continuity of ψ∗ does not necessarily require the continuity of r, while the

continuity of v∗ usually does. Intuitively, the integration operation inside operator Q has a

smoothing effect.

Example 2.2 (Continued). Recall the job search problem where the wage sequence is driven by a general Markov process $(z_t)_{t \ge 0}$. Notice that P has a density representation f, and $w \mapsto f(w'|w)$ is continuous for all w′ ∈ Z. Moreover, it is easy to verify the following statements:

(1) $\int |\ln w'|\, f(w'|w)\, dw' = \sigma \sqrt{2/\pi}\, \exp\left[ -\frac{(\rho \ln w + b)^2}{2\sigma^2} \right] + (\rho \ln w + b) \left[ 1 - 2\Phi\left( -\frac{\rho \ln w + b}{\sigma} \right) \right]$;

(2) $\int w'^{a} f(w'|w)\, dw' = w^{a\rho} \exp\left( ab + \frac{a^2 \sigma^2}{2} \right)$ for a ≠ 0,

where Φ denotes the standard normal cumulative distribution function. So the second condition of Proposition 3.2 holds. Therefore, we can show that ψ∗ and v∗ are continuous for all three types of u functions.
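The two closed-form integrals in Example 2.2 can be sanity-checked numerically. The sketch below works in terms of z′ = ln w′, which is N(ρ ln w + b, σ²) given w, and compares a simple midpoint-rule approximation with the stated formulas. The parameter values and the helper names `normal_pdf`, `Phi` and `expect` are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expect(h, mu, sigma, n=20_000, width=12.0):
    """Midpoint-rule approximation of E[h(X)] for X ~ N(mu, sigma^2)."""
    lo = mu - width * sigma
    step = 2 * width * sigma / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total += h(x) * normal_pdf(x, mu, sigma)
    return total * step

rho, b, sigma, w, a = 0.8, 0.1, 0.5, 2.0, 1.5   # hypothetical parameter values
mu = rho * math.log(w) + b                       # mean of ln w' given w

# Statement (1): expectation of |ln w'| given w.
lhs1 = expect(abs, mu, sigma)
rhs1 = (sigma * math.sqrt(2 / math.pi) * math.exp(-mu ** 2 / (2 * sigma ** 2))
        + mu * (1 - 2 * Phi(-mu / sigma)))

# Statement (2): expectation of (w')^a = exp(a * ln w') given w.
lhs2 = expect(lambda z: math.exp(a * z), mu, sigma)
rhs2 = w ** (a * rho) * math.exp(a * b + a ** 2 * sigma ** 2 / 2)
```

Both right-hand sides are continuous in w, which is what the second condition of Proposition 3.2 requires.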

Example 2.3 (Continued). For the perpetual option problem presented previously, P admits a density representation f, and $x \mapsto f(x'|x)$ is continuous for all x′ ∈ Z. Since $\int x'^{\rho} f(x'|x)\, dx' = \exp\left( \rho^2 \ln x + \rho b + \frac{\rho^2 \sigma^2}{2} \right)$ for all ρ ∈ R, we can easily verify the second condition of Proposition 3.2 by applying Lemma 7.1 in the Appendix. Therefore, ψ∗ and v∗ are continuous.

Example 2.4 (Continued). Recall the firm exit problem of Hopenhayn (1992). Notice that $c(a) = G a^{\frac{1}{1-\alpha}} - c_f$ is continuous. Through an analysis similar to that of Example 2.3, we can show that ψ∗ and v∗ are continuous.

Example 3.1. (Firm Entry). Consider the firm entry problem in the style of Fajgelbaum et al. (2015). At the beginning of each period, the firm observes an investment cost $f_t$, where $f_t \overset{IID}{\sim} h = LN(\mu_f, \gamma_f)$. Based on its belief about the fundamental, the firm has two choices: enter the market, incur the observed investment cost and obtain a stochastic dividend $x_t$ through production, or wait and reconsider next period. The firm aims to find a decision rule that maximizes the expected net present value.

The stochastic dividend follows $x_t = \xi_t + \varepsilon^x_t$, $\varepsilon^x_t \overset{IID}{\sim} N(0, \gamma_x)$, where $\xi_t$ and $\varepsilon^x_t$ are respectively the persistent and transient components. A public signal $y_t$ is released at the end of each period, where $y_t = \xi_t + \varepsilon^y_t$, $\varepsilon^y_t \overset{IID}{\sim} N(0, \gamma_y)$. Suppose that the firm has prior belief $\xi \sim N(\mu, \gamma)$ at the beginning of each period and updates it in a Bayesian way after observing $y$. Then the posterior belief is $\xi | y \sim N(\mu', \gamma')$, where

$$\gamma' = \left(\frac{1}{\gamma} + \frac{1}{\gamma_y}\right)^{-1} \quad \text{and} \quad \mu' = \gamma'\left(\frac{\mu}{\gamma} + \frac{y}{\gamma_y}\right).$$

The firm has constant absolute risk aversion $u(x) = \frac{1}{a}(1 - e^{-ax})$, $a > 0$. The continuation value operator follows

$$Q\psi(\mu, \gamma) = \beta \int \max\left\{E_{\mu',\gamma'}[u(x')] - f', \ \psi(\mu', \gamma')\right\} p(f', y|\mu, \gamma)\,d(f', y) \qquad (12)$$

where $p(f', y|\mu, \gamma) = h(f')\,l(y|\mu, \gamma)$ with $l(y|\mu, \gamma) = N(\mu, \gamma + \gamma_y)$. Moreover, the exit payoff is

$$r(f, \mu, \gamma) = E_{\mu,\gamma}[u(x)] - f = \frac{1}{a}\left(1 - \exp\left[-a\mu + \frac{a^2(\gamma + \gamma_x)}{2}\right]\right) - f.$$
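The two building blocks of this example, the Gaussian belief update and the expected CARA utility of a normally distributed dividend, can be sketched as follows. The function names are ours and purely illustrative; the formulas are the standard normal-normal update with variances $\gamma$, $\gamma_y$ and the identity $E[e^{-ax}] = \exp(-a\mu + a^2 s/2)$ for $x \sim N(\mu, s)$.

```python
import math

def update_belief(mu, gamma, y, gamma_y):
    """Posterior mean and variance of xi after observing the signal y = xi + noise."""
    gamma_next = 1.0 / (1.0 / gamma + 1.0 / gamma_y)
    mu_next = gamma_next * (mu / gamma + y / gamma_y)
    return mu_next, gamma_next

def expected_cara_utility(mu, gamma, gamma_x, a):
    """E[u(x)] for u(x) = (1 - exp(-a x)) / a and x ~ N(mu, gamma + gamma_x)."""
    return (1.0 - math.exp(-a * mu + 0.5 * a**2 * (gamma + gamma_x))) / a

def entry_payoff(f, mu, gamma, gamma_x, a):
    """Payoff from entering now: expected utility of the dividend minus the cost f."""
    return expected_cara_utility(mu, gamma, gamma_x, a) - f

mu_next, gamma_next = update_belief(mu=0.0, gamma=1.0, y=2.0, gamma_y=1.0)
print(mu_next, gamma_next)
```

With equal prior and signal variances the posterior mean lands halfway between the prior mean and the signal, and the variance halves.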

This is another example with unbounded returns. To apply our method, consider the state space $Y = \mathbb{R} \times \mathbb{R}_{++}$ with typical element $y \in Y$ taking the form $y = (\mu, \gamma)$. Consider $\ell : Y \to [1, \infty)$ defined by $\ell(\mu, \gamma) = \exp\left(-a\mu + \frac{a^2\gamma}{2}\right) + 1$. Then from Theorem 2.1, Proposition 3.1, and Lemma 7.1 in the Appendix, we can show that (see the Appendix for a detailed proof)

(1) $Q$ is a well-defined mapping from $b_\ell Y$ into itself, and it is a contraction mapping of modulus $\beta$ on the complete metric space $(b_\ell Y, \rho_\ell)$;

(2) $\psi^*$ and $v^*$ are continuous functions.

3.2. Shape Properties. We now study the shape properties of the continuation value func-

tion including monotonicity and concavity.

Assumption 3.4. The flow continuation payoff c is increasing (resp. decreasing).

Assumption 3.5. The function $z \mapsto \int \max\{r(z'), \psi(z')\}\,P(z, dz')$ is increasing (resp. decreasing) for all increasing (resp. decreasing) functions $\psi \in b_\ell Z$.

Assumption 3.6. The exit payoff r is increasing (resp. decreasing).

Remark 3.3. If Assumption 3.6 holds and $P$ is stochastically increasing in the sense that $P(z, \cdot)$ first-order stochastically dominates $P(\tilde{z}, \cdot)$ for all $\tilde{z} \leq z$, then Assumption 3.5 holds.

We have the following result regarding monotonicity.

Proposition 3.3. Under Assumptions 2.1 and 3.4 - 3.5, ψ∗ is increasing (resp. decreasing). If in

addition Assumption 3.6 holds, then v∗ is increasing (resp. decreasing).

The next result studies concavity properties of ψ∗.

Proposition 3.4. Suppose that Assumption 2.1 holds, r ≥ 0, P has a density representation f , and

that z → f (z′|z) (for all z′ ∈ Z) and c are concave (resp. convex) functions. Then ψ∗ is a concave

(resp. convex) function.

Example 2.2 (Continued). In the job search problem where the wage process (wt)t≥0 is

driven by a Markov process (zt)t≥0, the flow continuation payoff is constant, and each type

of exit payoff is increasing. From the properties of the log-normal distribution we know

that if ρ ≥ 0, the stochastic kernel corresponding to the density kernel f is stochastically in-

creasing. By Theorem 2.1 and Proposition 3.3, ψ∗ and v∗ are increasing under the following

circumstances: (1) $u(w) = \ln w$ and $\rho \in [0, 1/\beta)$; (2) $u(w) = \frac{w^{1-\gamma}}{1-\gamma}$ ($\gamma \geq 0$, $\gamma \neq 1$), $\rho \in [0, 1]$ and $\beta \exp[(1-\gamma)\rho b + (1-\gamma)^2\rho^2\sigma^2$

Example 2.3 (Continued). Recall the pricing problem of the perpetual option. The exit payoff $r(x) = (x - K)^+$ is increasing. Following an analysis similar to that of Example 2.2, we can show that $\psi^*$ and $v^*$ are increasing.

Example 2.4 (Continued). For the firm exit problem of Hopenhayn (1992), both $r$ and $c$ are increasing functions. As in Examples 2.2 - 2.3, we can show that $\psi^*$ and $v^*$ are increasing functions.

Example 3.1 (Continued). For the firm entry problem of Fajgelbaum et al. (2015), Proposition

3.3 shows that ψ∗ is increasing in µ, and v∗ is increasing in µ and decreasing in f .

3.3. Differentiability. Suppose $Z \subset \mathbb{R}^m$, so that a typical element $z \in Z$ takes the form $z = (z^1, ..., z^m)$. For a given function $h$ defined on $Z$ and for all $z \in \mathrm{int}(Z)$, define $D_i h(z) := \partial h(z) / \partial z^i$, $i = 1, ..., m$. For given $z_0 \in Z$ and $\delta > 0$, define $B_\delta(z_0) := \{z \in Z : \|z - z_0\| < \delta\}$ and $B_\delta(z_0^i) := \{z^i \in Z^{(i)} : |z^i - z_0^i| < \delta\}$, and let $\bar{B}_\delta(z_0)$ and $\bar{B}_\delta(z_0^i)$ denote their closures, where $\| \cdot \|$ is the Euclidean norm, $Z^{(i)}$ is the $i$-th dimension of $Z$ and $Z^{(-i)}$ denotes the remaining $m - 1$ dimensions of $Z$.

Assumption 3.7. P has a density representation f , and for all z′ ∈ Z, z 7→ f (z′|z) is differen-

tiable at interior points in the sense that Di f (z′|z) exists for all z ∈ int(Z), i = 1, ..., m.

Assumption 3.8. For all $z_0 \in \mathrm{int}(Z)$, there exists $\delta > 0$ such that, for $i = 1, ..., m$, the following functions take finite values:

(1) $z_0^{-i} \mapsto \int \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|\,dz'$;

(2) $z_0^{-i} \mapsto \int |r(z')| \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|\,dz'$;

(3) $z_0^{-i} \mapsto \int g(z') \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|\,dz'$.

Assumption 3.9. The flow continuation payoff function c is differentiable at interior points

in the sense that Dic(z) exists for all z ∈ int(Z), i = 1, ..., m.

The following result provides a group of sufficient conditions for ψ∗ to be differentiable.

Proposition 3.5. Under Assumptions 2.1 and 3.7 - 3.9, ψ∗ is differentiable at interior points in the

sense that Diψ∗(z) exists for all z ∈ int(Z), i = 1, ..., m.

We consider an alternative way to establish the property of differentiability.

Assumption 3.10. For all $z' \in Z$, $z \mapsto f(z'|z)$ is twice differentiable at interior points in the sense that $D_i^2 f(z'|z)$ exists for all $z \in \mathrm{int}(Z)$, $i = 1, ..., m$. Moreover, each $(z, z') \mapsto D_i f(z'|z)$ is continuous.

Assumption 3.11. The following conditions hold for $i = 1, ..., m$:

(1) There are finite solutions to $D_i^2 f(z'|z) = 0$, and for all $z_0 \in \mathrm{int}(Z)$, there exists $\delta > 0$ such that each solution $(z', z_0^{-i}) \mapsto z_i^*(z', z_0^{-i})$ satisfies $z_i^*(z', z_0^{-i}) \notin B_\delta(z_0^i)$ as $\|z'\| \to \infty$;

(2) The following functions take finite values on $\mathrm{int}(Z)$: (a) $z \mapsto \int |D_i f(z'|z)|\,dz'$; (b) $z \mapsto \int |r(z') D_i f(z'|z)|\,dz'$; (c) $z \mapsto \int g(z') |D_i f(z'|z)|\,dz'$. Moreover, $r$ and $g$ are continuous.

Remark 3.4. A sufficient condition for condition (1) of Assumption 3.11 is frequently used when the state space is unbounded: there are finite solutions to $D_i^2 f(z'|z) = 0$, and each solution $(z', z^{-i}) \mapsto z_i^*(z', z^{-i})$ satisfies $|z_i^*(z', z^{-i})| \to \infty$ as $\|z'\| \to \infty$ for given $z^{-i} \in \mathrm{int}(Z^{(-i)})$.

The following proposition, which avoids verifying Assumption 3.8, is useful in applications with an unbounded state space, as shown below.

Proposition 3.6. Under Assumptions 2.1 and 3.9 - 3.11, ψ∗ is differentiable at interior points in the

sense that Diψ∗(z) exists for all z ∈ int(Z), i = 1, ..., m.

Besides being highly valuable for numerical computation, smoothness is a desirable property in many applications in which we want to characterize the properties of the optimal policy, as shown in the next section.

Assumption 3.12. For $i = 1, ..., m$, the following conditions hold:

(1) The following functions are continuous on $\mathrm{int}(Z)$: (a) $z \mapsto \int |D_i f(z'|z)|\,dz'$; (b) $z \mapsto \int |r(z') D_i f(z'|z)|\,dz'$; and (c) $z \mapsto \int g(z') |D_i f(z'|z)|\,dz'$;

(2) The flow continuation payoff function $c$ is continuously differentiable at interior points in the sense that $z \mapsto D_i c(z)$ is continuous on $\mathrm{int}(Z)$.

The next result provides sufficient conditions for ψ∗ to be smooth.

Proposition 3.7. Suppose that Assumption 3.12 holds, and either (1) or (2) holds:

(1) The assumptions of Proposition 3.5 hold, and each z 7→ Di f (z′|z) is continuous on int(Z);

(2) The assumptions of Proposition 3.6 hold.

Then ψ∗ is continuously differentiable at interior points in the sense that z 7→ Diψ∗(z) is continuous

on int(Z), i = 1, ..., m.

Remark 3.5. When the return functions r and c are bounded, conditions (1.b) and (1.c) of

Assumption 3.12 are not required to establish the smoothness of ψ∗ in Proposition 3.7.

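Example 2.2 (Continued). In the extended job search model where $(w_t)_{t \geq 0}$ is generated by the Markov process $(z_t)_{t \geq 0}$, it is easy to verify the following statements:

(1) The solutions to $\frac{\partial^2 f(w'|w)}{\partial w^2} = 0$ are $w^*(w') = \exp\left(\frac{\ln w' - b}{\rho} - \frac{\sigma^2}{2\rho^2} \pm \frac{\sigma}{2\rho^2}\sqrt{\sigma^2 + 4\rho^2}\right)$;

(2) $\int \left|\frac{\partial f(w'|w)}{\partial w}\right| dw' = \frac{\rho}{\sigma w}\sqrt{\frac{2}{\pi}}$;

(3) $\left|(\ln w') \frac{\partial f(w'|w)}{\partial w}\right| \leq \frac{\rho}{w} \cdot \frac{1}{w'\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln w' - \rho \ln w - b)^2}{2\sigma^2}\right] \cdot \frac{(\ln w')^2 + |\rho \ln w + b|\,|\ln w'|}{\sigma^2}$;

(4) $\left|w'^{a} \frac{\partial f(w'|w)}{\partial w}\right| \leq \frac{\rho}{w} \cdot \frac{1}{w'\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln w' - \rho \ln w - b)^2}{2\sigma^2}\right] \cdot \frac{(\ln w' - \rho \ln w - b)^2 + w'^{2a}}{2\sigma^2}$, $a \neq 0$;

(5) The four terms on both sides of statements (3) and (4) are continuous in $w$;

(6) The integrals of the right-hand-side terms of statements (3) and (4) with respect to $w'$ are continuous in $w$.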

From the first statement we know that condition (1) of Assumption 3.11 holds. Based on

statements (2) - (6) and Lemma 7.1 in the Appendix, we can show that condition (2) of As-

sumption 3.12 holds. The remaining conditions of Proposition 3.7 are easy to verify. There-

fore, ψ∗ is continuously differentiable.

To see that $\psi^*$ is smoother than $v^*$, we run the following simulation. For simplicity, we consider $r(w) = \frac{w}{1-\beta}$, and set $\beta = 0.96$, $\rho = 0.6$, $\sigma = 1$, $b = 0$ and $c = 1$. From Figure 1 we can see that although $v^*$ has a kink in the interior of the state space, $\psi^*$ is smooth in the sense that it is continuously differentiable and has no kinks.
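A simulation of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the wage law $\ln w' = \rho \ln w + b + \sigma\varepsilon$ with $\varepsilon \sim N(0,1)$, iterates $Q\psi(w) = c + \beta \int \max\{r(w'), \psi(w')\} f(w'|w)\,dw'$ on a hypothetical grid for $\ln w$, and uses Gauss-Hermite quadrature with linear interpolation, all of which are our choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

beta, rho, sigma, b, c = 0.96, 0.6, 1.0, 0.0, 1.0   # parameter values quoted in the text

log_w = np.linspace(-6.0, 6.0, 200)     # hypothetical grid for ln w
nodes, weights = hermegauss(21)         # quadrature rule for N(0, 1) shocks
weights = weights / weights.sum()       # normalise to probability weights

def exit_payoff(lw):
    """r(w) = w / (1 - beta), written as a function of ln w."""
    return np.exp(lw) / (1.0 - beta)

def Q(psi):
    """Continuation value operator evaluated on the grid."""
    lw_next = rho * log_w[:, None] + b + sigma * nodes[None, :]
    psi_next = np.interp(lw_next, log_w, psi)   # constant extrapolation off the grid
    return c + beta * (np.maximum(exit_payoff(lw_next), psi_next) @ weights)

def solve(tol=1e-6, max_iter=5000):
    """Iterate Q from psi = 0 until successive iterates differ by less than tol."""
    psi = np.zeros_like(log_w)
    for _ in range(max_iter):
        psi_new = Q(psi)
        if np.max(np.abs(psi_new - psi)) < tol:
            return psi_new
        psi = psi_new
    raise RuntimeError("continuation value iteration did not converge")

psi_star = solve()
v_star = np.maximum(exit_payoff(log_w), psi_star)   # v* = max{r, psi*}
```

Plotting `psi_star` and `v_star` against `np.exp(log_w)` gives a picture in the spirit of Figure 1: the pointwise maximum introduces a kink in `v_star` wherever the two curves cross, while `psi_star` is an integral of that maximum and stays smooth.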

FIGURE 1. Comparison of ψ∗ and v∗

Example 2.3 (Continued). Recall the pricing problem of the perpetual option. By an analysis similar to that of Example 2.2, we can show that $\psi^*$ is continuously differentiable. This is the case despite the fact that the exit payoff $r(x) = (x - K)^+$ has a kink at $x = K$. Therefore, in general, the exit payoff function is not required to be differentiable for the continuation value function to be smooth.

Example 2.4 (Continued). For the firm exit model of Hopenhayn (1992), through similar

analysis as in Examples 2.2 - 2.3, we can show that ψ∗ is continuously differentiable.

3.4. Parametric Continuity. In applications, we are often curious about how the value func-

tion, continuation value function, and optimal policy change in response to the variation of

some key parameters. In such circumstances, parametric continuity is highly valuable.

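Consider the parameter space $\Theta \subset \mathbb{R}^k$. Let $P_\theta$, $r_\theta$, $c_\theta$, $v^*_\theta$, and $\psi^*_\theta$ denote the stochastic kernel, exit payoff, flow continuation payoff, value function, and continuation value function with respect to parameter $\theta \in \Theta$, respectively. Under Assumption 2.1, for all $\theta \in \Theta$, there exist a measurable map $g_\theta : Z \to \mathbb{R}_+$ and constants $m_\theta, d_\theta \in \mathbb{R}_+$ with $\beta m_\theta < 1$ such that for all $z \in Z$: (1) $\max\left\{\int |r_\theta(z')|\,P_\theta(z, dz'), \ |c_\theta(z)|\right\} \leq g_\theta(z)$; and (2) $\int g_\theta(z')\,P_\theta(z, dz') \leq m_\theta g_\theta(z) + d_\theta$. Define $m := \sup_{\theta \in \Theta} m_\theta$ and $d := \sup_{\theta \in \Theta} d_\theta$.

Assumption 3.13. $\beta m < 1$ and $d < \infty$.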

Remark 3.6. To simplify analysis, we consider the parameter space Θ that does not include

the space of β. An alternative way to treat this problem is to consider β ∈ [0, a], where

a ∈ [0, 1), and include this space as part of Θ. In this case, Assumption 3.13 is replaced by

am < 1 and d < ∞. All the theoretical results on parametric continuity of this paper remain

true if we make this change.

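Assumption 3.14. For all $\theta \in \Theta$, $P_\theta$ has a density representation $f_\theta$. For all $z, z' \in Z$, $\theta \mapsto f_\theta(z'|z)$ is continuous. For all $z \in Z$, $\theta \mapsto \int |r_\theta(z')| f_\theta(z'|z)\,dz'$ and $\theta \mapsto \int g_\theta(z') f_\theta(z'|z)\,dz'$ are continuous.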

Assumption 3.15. For all z ∈ Z, θ 7→ rθ(z), θ 7→ cθ(z) and θ 7→ gθ(z) are continuous.

Under these assumptions we have the following result for parametric continuity.

Proposition 3.8. Under Assumptions 2.1 and 3.13 - 3.15, θ 7→ ψ∗θ (z) and θ 7→ v∗θ (z) are continu-

ous for all z ∈ Z.

Example 2.2 (Continued). Recall the extension of the job search model of McCall (1970). For simplicity, consider $u(w) = \ln w$. Let the parameter space be $\Theta = \left(-\frac{1}{\beta}, \frac{1}{\beta}\right) \times A \times B \times C$, where $A$, $B$ are bounded subsets of $\mathbb{R}_{++}$, $\mathbb{R}$ respectively, and $C \subset \mathbb{R}$. A typical element $\theta \in \Theta$ takes the form $\theta = (\rho, \sigma, b, c)$. Based on Proposition 3.8, $\theta \mapsto \psi^*_\theta(w)$ and $\theta \mapsto v^*_\theta(w)$ are continuous for all $w \in Z$. Similarly, we can establish the parametric continuity property for $u(w) = \frac{w^{1-\gamma}}{1-\gamma}$ ($\gamma \geq 0$, $\gamma \neq 1$).

Remark 3.7. The parametric continuity result of Examples 2.2 - 2.5 and 3.1 can be established

similarly. To simplify analysis, unless explicitly specified, we do not discuss parametric con-

tinuity for other examples, though this property holds for each of them.

4. OPTIMAL POLICIES

In this section, we discuss several other significant advantages of the continuation value

based approach over traditional approaches based on the value function. To begin with, for

a broad range of problems, the continuation value function exists in a lower dimensional

space than the value function. The relationship is asymmetric. While each state variable that

appears in the continuation value function must appear in the value function, the converse

is not true. This facilitates numerical computation significantly since the curse of dimension-

ality is greatly mitigated.

Moreover, among these problems, the decision rule usually exhibits threshold behavior with

respect to some state variable, in the sense that the sequential decision process terminates

whenever a threshold level is achieved by that state process. In such cases, the continuation

value based method allows for a sharp analysis of the optimal policy. This type of problem

is pervasive in quantitative and theoretical economic modeling, as we now formulate.

Suppose that the state space $Z \subset \mathbb{R}^m$ can be written as $Z = X \times Y$, where $X$ is a convex subset of $\mathbb{R}$ and $Y$ is a convex subset of $\mathbb{R}^{m-1}$.10 The state process $(Z_n)_{n \geq 0}$ is then $(X_n, Y_n)_{n \geq 0}$, where $(X_n)_{n \geq 0}$ and $(Y_n)_{n \geq 0}$ are two stochastic processes taking values in $X$ and $Y$ respectively. In particular, the period-$n$ state vector is $Z_n = (X_n, Y_n)$, where $X_n$ represents the first dimension and $Y_n$ the remaining $m - 1$ dimensions of the random variable $Z_n$.

10To simplify analysis, we assume that X is one dimensional. In general, the dimension of X can be higher.

Assume that stochastic processes (Xn)n≥0 and (Yn)n≥0 satisfy the following properties: (1)

(Monotonicity) The exit payoff function r is monotone on X; and (2) (Conditional Independence)

Conditional on each Yn, the next period states (Xn+1, Yn+1) and the current state Xn are in-

dependent. We call each random variable Xn the threshold state variable of period n, and each

Yn the environment state vector (or environment states, or environment) of period n. Moreover,

we call X the threshold state space and Y the environment space. Assume further, for this thresh-

old state optimal stopping problem, that the flow continuation payoff c is defined on the

environment space, i.e., c : Y → R.

Denote x as the threshold state variable and y as the environment so that the vector of state

variables in the current period is z = (x, y). Let z′ = (x′, y′) be the vector of states of next

period. We know from the definition of the threshold state variable that the stochastic kernel $P(z, dz')$ can be represented by the conditional distribution function of $(x', y')$ given $y$, denoted as $F_y(x', y')$, i.e., $P(z, dz') = P((x, y), d(x', y')) = dF_y(x', y')$. Notice that under this setup, the continuation value $\psi^*$ is a function of $y$ only, while the value function $v^*$ is a function of both $x$ and $y$. So $\psi^*$ has strictly fewer arguments than $v^*$.11
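The dimension reduction can be read off directly from the definitions. Assuming, as elsewhere in the paper, that the continuation value is the flow payoff plus the discounted expected value, that $c$ is defined on $Y$, and that the kernel depends on the current state only through $y$, we have

$$\psi^*(x, y) = c(y) + \beta \int v^*(x', y')\,P((x, y), d(x', y')) = c(y) + \beta \int v^*(x', y')\,dF_y(x', y').$$

Nothing on the right-hand side involves $x$, so $\psi^*$ can be written as a function of $y$ alone, whereas $v^*(x, y) = \max\{r(x, y), \psi^*(y)\}$ still depends on $x$ through the exit payoff.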

Assumption 4.1. $r$ is strictly monotone on $X$. Moreover, for all $y \in Y$, there exists $x \in X$ such that $r(x, y) = c(y) + \beta \int v^*(x', y')\,dF_y(x', y')$.

Under Assumption 4.1, the reservation rule property holds. When the exit payoff $r$ is strictly increasing in $x$, for instance, this property states that if the agent terminates at state $x \in X$ at a given point in time, then he would have terminated at any higher state at that moment. Specifically, there is a decision threshold $\bar{x} : Y \to X$ such that when the state variable $x$ attains this threshold level, i.e., $x = \bar{x}(y)$, the agent is indifferent between terminating and continuing, i.e., $r(\bar{x}(y), y) = \psi^*(y)$ for all $y \in Y$.

As shown in Theorem 2.1, the optimal policy $\sigma^* : Z \to \{0, 1\}$ satisfies $\sigma^*(z) = \mathbb{1}\{r(z) \geq \psi^*(z)\}$. For threshold state optimal stopping problems, this policy is fully specified by the decision threshold $\bar{x}$. In particular, under Assumption 4.1, the optimal policy is $\sigma^*(x, y) = \mathbb{1}\{x \geq \bar{x}(y)\}$ if $r$ is strictly increasing in $x$, and $\sigma^*(x, y) = \mathbb{1}\{x \leq \bar{x}(y)\}$ if $r$ is strictly decreasing in $x$. Based on the properties of the continuation value function, the properties of the decision threshold $\bar{x}$ can be easily established. We summarize them below. First, we have the following result for continuity.

11In this case, since the threshold state is assumed one-dimensional, ψ∗ has one less argument than v∗. In

general, the difference in the arguments of ψ∗ and v∗ can be strictly larger than one.

Proposition 4.1. Suppose that the assumptions of either Proposition 3.1 or Proposition 3.2 hold, and that Assumption 4.1 holds. Then $\bar{x}$ is continuous.

The next result provides sufficient conditions for $\bar{x}$ to be monotone.

Proposition 4.2. Suppose that the assumptions of Proposition 3.3 and Assumption 4.1 hold, and that $r$ is defined on $X$. If $\psi^*$ is increasing and $r$ is strictly increasing (resp. decreasing), then $\bar{x}$ is increasing (resp. decreasing). If $\psi^*$ is decreasing and $r$ is strictly increasing (resp. decreasing), then $\bar{x}$ is decreasing (resp. increasing).

A typical element $y \in Y$ takes the form $y = (y^1, ..., y^{m-1})$. For $i = 1, ..., m - 1$ and given functions $h : Y \to \mathbb{R}$, $l : X \times Y \to \mathbb{R}$, define $D_i h(y) := \partial h(y) / \partial y^i$, $D_i l(x, y) := \partial l(x, y) / \partial y^i$, and $D_x l(x, y) := \partial l(x, y) / \partial x$. The following result on the smoothness of $\bar{x}$ follows from Proposition 3.7 and the implicit function theorem.

Proposition 4.3. Suppose that the assumptions of Proposition 3.7 and Assumption 4.1 hold, and that $r$ is continuously differentiable on $\mathrm{int}(Z)$. Then $\bar{x}$ is continuously differentiable on $\mathrm{int}(Y)$. In particular,

$$D_i \bar{x}(y) = -\frac{D_i r(\bar{x}(y), y) - D_i \psi^*(y)}{D_x r(\bar{x}(y), y)}$$

for all $y \in \mathrm{int}(Y)$.
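The derivative formula can be recovered in one step from the indifference condition $r(\bar{x}(y), y) = \psi^*(y)$, which holds for every $y$. Differentiating both sides with respect to $y^i$ by the chain rule gives

$$D_x r(\bar{x}(y), y)\,D_i \bar{x}(y) + D_i r(\bar{x}(y), y) = D_i \psi^*(y).$$

Solving for $D_i \bar{x}(y)$ yields the expression in Proposition 4.3. This step presumes $D_x r(\bar{x}(y), y) \neq 0$, which is the nondegeneracy condition the implicit function theorem requires; we state it here as an assumption of the sketch.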

Intuitively, $(x, y) \mapsto r(x, y) - \psi^*(y)$ denotes the premium of terminating the sequential decision process. So the functions $(x, y) \mapsto D_i r(x, y) - D_i \psi^*(y)$ and $(x, y) \mapsto D_x r(x, y)$ denote the instantaneous rates of change in the terminating premium in response to an instantaneous change in the environment state $y^i$ and the threshold state $x$, respectively. Holding the terminating premium at 0, the change in the premium caused by a change in $x$ cancels the change caused by the variation of $y$. Therefore, the instantaneous rate of change of $\bar{x}(y)$ with respect to $y^i$ equals the ratio of the instantaneous rates of change in the premium. The negative sign arises because the terminating premium is held at zero at the decision threshold $\bar{x}$.

Let $\bar{x}_\theta$ be the decision threshold with respect to $\theta \in \Theta$. We have the following result for parametric continuity.

Proposition 4.4. Suppose that the assumptions of Proposition 3.8, and Assumptions 3.3 and 4.1 hold. Then $\theta \mapsto \bar{x}_\theta(y)$ is continuous for all $y \in Y$.

Example 3.1 (Continued). Recall the firm entry problem of Fajgelbaum et al. (2015). This is a typical threshold state optimal stopping problem. In particular, the threshold state space is $X = \mathbb{R}_+$, and the threshold state variable is $x = f$. The environment space is $Y = \mathbb{R} \times \mathbb{R}_+$ with environment states $y = (\mu, \gamma)$. The value function of the firm follows

$$v^*(f, \mu, \gamma) = \max\left\{E_{\mu,\gamma}[u(x)] - f, \ \beta \int v^*(f', \mu', \gamma')\,p(f', y|\mu, \gamma)\,d(f', y)\right\}.$$

Since there are 3 state variables, $v^*$ is defined on a 3-dimensional space. However, $\psi^*$ is defined on a 2-dimensional space since the environment space has one dimension less. Moreover, the optimal policy is determined by a reservation cost function $\bar{f} : Y \to \mathbb{R}$ such that when $f = \bar{f}(\mu, \gamma)$, the firm is indifferent between entering the market and waiting. In particular, $\bar{f}(\mu, \gamma) = E_{\mu,\gamma}[u(x)] - \psi^*(\mu, \gamma)$ and the optimal policy is $\sigma^*(f, \mu, \gamma) = \mathbb{1}\{f \leq \bar{f}(\mu, \gamma)\}$ for all $(f, \mu, \gamma) \in Z$. By Proposition 4.1, we can show that $\bar{f}$ is continuous.

5. COMPUTATIONAL EFFICIENCY

This section illustrates the computational efficiency of the continuation value based method relative to traditional value function based methods. Numerical experiments show that the impact of the lower dimensionality of the continuation value can be huge, even when the difference between the arguments of this function and the value function is only a single variable. For example, when solving a well known version of the job search model in Section 5.1, continuation value iteration takes only 171 seconds to compute the optimal policy to an accuracy of $10^{-6}$ (see the group-3 experiments), as opposed to more than 7 days for value function iteration.

Moreover, we do not provide a detailed comparison of the two approaches in Section 5.2, as the computation via the value function takes too long (more than 7 days) due to the curse of dimensionality. However, our approach takes only 15.45 minutes to compute the optimal policy to an accuracy of $10^{-6}$. Finally, all the applications demonstrate the effectiveness of our approach in characterizing the optimal policy.

5.1. Job Search II. Consider another extension of McCall’s job search model presented by Ljungqvist and Sargent (2012). The model is as in the benchmark case, except that the distribution of the wage process $h$ is unknown. The worker knows that there are two possible densities $f$ and $g$. At the start of time, nature selects $h$ to be either $f$ or $g$. The choice is not observed by the worker, who puts prior probability $\pi_0$ on $f$ being chosen. By Bayes’ rule, $\pi_t$ updates via

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})}.$$

We can express the value function of the unemployed worker recursively as follows

$$v^*(w, \pi) = \max\left\{\frac{w}{1-\beta}, \ c + \beta \int v^*(w', \pi')\,h_\pi(w')\,dw'\right\}$$

where $\pi' = q(w', \pi) = \frac{\pi f(w')}{\pi f(w') + (1-\pi) g(w')}$ and $h_\pi(w') := \pi f(w') + (1-\pi) g(w')$. This is a typical threshold state optimal stopping problem, in which the threshold state variable is $w$ and the environment is $\pi$. In particular, $\psi^*$ is defined on a space of lower dimension than the state space on which $v^*$ is defined, in the sense that $\psi^*$ is a function of $\pi$ only while $v^*$ is a function of both $w$ and $\pi$.
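The belief update $q(w', \pi)$ is easy to sketch in code. The version below uses the two Beta shapes that appear in the paper's parameterization; rescaling them to the support $[0, 2]$ is our assumption (chosen to match the state space $Z = [0, 2] \times [0, 1]$), and all function names are illustrative.

```python
import math

W_MAX = 2.0   # assumed upper bound of the wage support

def scaled_beta_pdf(w, a, b, w_max=W_MAX):
    """Density of w = w_max * x with x ~ Beta(a, b); zero outside [0, w_max]."""
    if not 0.0 <= w <= w_max:
        return 0.0
    x = w / w_max
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / w_max

def f(w):
    return scaled_beta_pdf(w, 1.0, 1.0)

def g(w):
    return scaled_beta_pdf(w, 3.0, 1.2)

def q(w_next, pi):
    """Posterior probability that offers are drawn from f after observing w_next."""
    numerator = pi * f(w_next)
    return numerator / (numerator + (1.0 - pi) * g(w_next))

print(q(0.2, 0.5), q(1.8, 0.5))
```

Under these shapes a low offer is more likely under f and a high offer more likely under g, so the first printed belief lies above the prior of 0.5 and the second below it.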

Following Ljungqvist and Sargent (2012), we set $f = \mathrm{Beta}(1, 1)$ and $g = \mathrm{Beta}(3, 1.2)$. Then the state space is $Z = [0, 2] \times [0, 1]$. Based on our theory, the optimal policy is characterized by a reservation wage function $\bar{w} : [0, 1] \to \mathbb{R}$ such that when $w = \bar{w}(\pi)$, the worker is indifferent between accepting and rejecting the offer. Denote $b[0, 1]$ as the set of bounded functions on $[0, 1]$. Consider the Banach space $(b[0, 1], \| \cdot \|_\infty)$ as the space of candidate functions. The continuation value operator defined on this space satisfies

$$Q\psi(\pi) = c + \beta \int \max\left\{\frac{w'}{1-\beta}, \ \psi \circ q(w', \pi)\right\} h_\pi(w')\,dw' \qquad (13)$$

This is the special case of our theory when the state space is compact, and both exit and flow

continuation payoffs are bounded.

Proposition 5.1. When the unemployment compensation $c \in [0, 2]$, the following statements hold:

(1) $Q$ is a well-defined mapping from $b[0, 1]$ into itself, and it is a contraction mapping of modulus $\beta$ on the Banach space $(b[0, 1], \| \cdot \|_\infty)$.

(2) The value function is $v^*(w, \pi) = \max\left\{\frac{w}{1-\beta}, \psi^*(\pi)\right\}$, the reservation wage is $\bar{w}(\pi) = (1 - \beta)\psi^*(\pi)$, and the optimal policy is $\sigma^*(w, \pi) = \mathbb{1}\{w \geq \bar{w}(\pi)\}$ for all $(w, \pi) \in Z$.

(3) $\psi^*$, $\bar{w}$, and $v^*$ are continuous functions.
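Continuation value iteration for this model can be sketched in a few lines. This is an illustrative implementation rather than the authors' code: it assumes the Beta densities are rescaled to $[0, 2]$, approximates the integral with a trapezoid rule on 100 wage points, and evaluates $\psi$ off-grid by linear interpolation on 50 belief points. The reservation wage then follows from the indifference condition $\bar{w}/(1-\beta) = \psi^*(\pi)$.

```python
import numpy as np
from math import gamma

beta, c, w_max = 0.95, 0.6, 2.0          # beta and c as quoted in the text

def scaled_beta_pdf(w, a, b):
    """Density of w = w_max * x with x ~ Beta(a, b) (the rescaling is our assumption)."""
    x = w / w_max
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / w_max

w_grid = np.linspace(0.0, w_max, 100)
pi_grid = np.linspace(1e-4, 1.0 - 1e-4, 50)
f_vals = scaled_beta_pdf(w_grid, 1.0, 1.0)
g_vals = scaled_beta_pdf(w_grid, 3.0, 1.2)

# Trapezoid weights for integrals over w'.
dw = w_grid[1] - w_grid[0]
trap = np.full(w_grid.size, dw)
trap[[0, -1]] = dw / 2.0

# h_pi(w') and q(w', pi) tabulated on the (pi, w') product grid.
h = pi_grid[:, None] * f_vals + (1.0 - pi_grid[:, None]) * g_vals
q = pi_grid[:, None] * f_vals / h

def Q(psi):
    """Continuation value operator: a function of pi only."""
    psi_next = np.interp(q, pi_grid, psi)
    integrand = np.maximum(w_grid / (1.0 - beta), psi_next) * h
    return c + beta * (integrand @ trap)

def solve(tol=1e-6, max_iter=2000):
    psi = np.zeros_like(pi_grid)
    for _ in range(max_iter):
        psi_new = Q(psi)
        if np.max(np.abs(psi_new - psi)) < tol:
            return psi_new
        psi = psi_new
    raise RuntimeError("continuation value iteration did not converge")

psi_star = solve()
w_bar = (1.0 - beta) * psi_star          # reservation wage as a function of pi
```

Note that the iterate is a vector of length 50, one entry per belief grid point; the wage grid enters only as quadrature nodes. A value function iteration on the same grids would instead carry a 100 × 50 array.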

FIGURE 2. The reservation wage

Following Section 6.6 of Ljungqvist and Sargent (2012), we set $\beta = 0.95$ and $c = 0.6$. In the benchmark simulation, the grid points $(w, \pi)$ lie in $[0, 2] \times [10^{-4}, 1 - 10^{-4}]$ with 100 points for the $w$ grid and 50 points for the $\pi$ grid. As shown in Figure 2, the reservation wage $\bar{w}$ is decreasing in $\pi$. Intuitively, $f$ is a less attractive offer distribution than $g$, and larger $\pi$ means more weight on $f$ and less on $g$. Therefore, larger $\pi$ depresses the worker’s assessment of his future prospects, and relatively low current offers become more attractive.

Since the computation is 2-dimensional via value function iteration (VFI), and is only 1-

dimensional via continuation value function iteration (CVI), we can expect that the compu-

tation via CVI would be much faster. To make a comparison, we conduct several groups

of experiments and provide the time taken by the two approaches. All the experiments are

processed in a standard Python environment on a laptop with a 2.5 GHz Intel Core i5.

5.1.1. Group-1 Experiments. In this group, we explore the time taken by the two approaches

to compute the fixed point at different levels of accuracy and across different parameteriza-

tions. Specifically, Table 1 provides the list of experiments we perform. In all simulations,

the setup of the grid points is the same as the baseline simulation. For each given test and

level of accuracy, we run the simulation 50 times for CVI, 20 times for VFI, and calculate the

average time. The results are provided in Table 2.

TABLE 1. Group-1 Experiments

Parameter   Test 1   Test 2   Test 3   Test 4   Test 5
β           0.9      0.95     0.98     0.95     0.95
c           0.6      0.6      0.6      0.001    1

TABLE 2. Time Taken of Group-1 Experiments (seconds, by precision)

Test     Method    10^-3    10^-4    10^-5    10^-6    10^-7    10^-8
Test 1   VFI      114.17   140.94   174.91   201.77   228.59   255.67
         CVI        0.67     0.92     1.16     1.43     1.71     1.94
Test 2   VFI      181.78   234.58   271.89   323.22   339.87   341.55
         CVI        0.95     1.49     1.80     2.27     2.69     3.11
Test 3   VFI      335.78   335.87   335.28   335.91   338.70   334.21
         CVI        1.77     2.68     3.08     3.03     3.03     3.06
Test 4   VFI      154.18   201.05   247.72   294.90   335.32   335.00
         CVI        0.79     1.22     1.65     2.06     2.50     2.91
Test 5   VFI      275.41   336.02   326.33   327.41   327.11   327.71
         CVI        1.33     2.12     2.79     2.99     2.97     2.97

As can be seen in Table 2, our method performs much better than VFI. On average, CVI is 141 times faster than VFI. In the best case, CVI is 207 times faster: in Test 5, VFI takes 275.41 seconds to achieve the level of accuracy $10^{-3}$, while CVI takes only 1.33 seconds. Even in the worst case, CVI is 109 times faster: in Test 5, VFI takes 327.41 seconds while CVI takes only 2.99 seconds to achieve the level of accuracy $10^{-6}$.
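The speed-up figures can be recomputed directly from the timings in Table 2. The short script below is only a bookkeeping aid; the numbers are copied from the table and the variable names are ours.

```python
# Times in seconds from Table 2; columns are precisions 1e-3, 1e-4, ..., 1e-8.
vfi = {
    "Test 1": [114.17, 140.94, 174.91, 201.77, 228.59, 255.67],
    "Test 2": [181.78, 234.58, 271.89, 323.22, 339.87, 341.55],
    "Test 3": [335.78, 335.87, 335.28, 335.91, 338.70, 334.21],
    "Test 4": [154.18, 201.05, 247.72, 294.90, 335.32, 335.00],
    "Test 5": [275.41, 336.02, 326.33, 327.41, 327.11, 327.71],
}
cvi = {
    "Test 1": [0.67, 0.92, 1.16, 1.43, 1.71, 1.94],
    "Test 2": [0.95, 1.49, 1.80, 2.27, 2.69, 3.11],
    "Test 3": [1.77, 2.68, 3.08, 3.03, 3.03, 3.06],
    "Test 4": [0.79, 1.22, 1.65, 2.06, 2.50, 2.91],
    "Test 5": [1.33, 2.12, 2.79, 2.99, 2.97, 2.97],
}

# Ratio of VFI time to CVI time for every test and precision level.
ratios = {test: [v / k for v, k in zip(vfi[test], cvi[test])] for test in vfi}
all_ratios = [ratio for row in ratios.values() for ratio in row]

mean_speedup = sum(all_ratios) / len(all_ratios)
best_speedup = max(all_ratios)
worst_speedup = min(all_ratios)
print(round(mean_speedup), round(best_speedup), round(worst_speedup))
```

Across all 30 test and precision combinations the ratios average about 141, with a maximum of about 207 and a minimum of about 109.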

5.1.2. Group-2 Experiments. In applications, more grid points are needed for the numerical

approximation to be more accurate. In this group of experiments, we compare how the two

approaches perform under different grid sizes. The parameterization is the same as in the

benchmark setup. Again, we run the simulation 50 times for CVI, 20 times for VFI, and

calculate the average time. Information and results of these experiments are provided in

Table 3 and Table 4, respectively.

TABLE 3. Group-2 Experiments (number of grid points)

Variable   Test 2   Test 6   Test 7   Test 8   Test 9   Test 10
π          50       50       50       100      100      100
w          100      150      200      100      150      200

TABLE 4. Time Taken of Group-2 Experiments (seconds, by precision)

Test      Method    10^-3    10^-4     10^-5     10^-6     10^-7     10^-8
Test 2    VFI      181.78   234.58    271.89    323.22    339.87    341.55
          CVI        0.95     1.49      1.80      2.27      2.69      3.11
Test 6    VFI      264.34   336.20    407.52    476.01    508.05    509.05
          CVI        0.96     1.39      1.82      2.30      2.73      3.14
Test 7    VFI      355.40   449.55    545.51    641.05    679.93    678.28
          CVI        0.92     1.37      1.79      2.22      2.84      3.07
Test 8    VFI      352.76   447.36    541.75    639.73    678.91    677.52
          CVI        1.94     2.74      3.58      4.42      5.30      6.14
Test 9    VFI      526.72   670.19    812.66    951.78   1017.29   1015.15
          CVI        1.81     2.68      3.68      4.33      5.23      6.08
Test 10   VFI      706.34   897.07   1086.15   1278.27   1354.37   1360.07
          CVI        1.83     2.72      3.51      4.40      5.21      6.10

As can be seen, our approach outperforms VFI by a wider margin as the grid size increases. In Table 4 we see that as we increase the number of grid points for $w$, the speed of CVI is not affected. However, the speed of VFI falls significantly. Amongst tests 2, 6 and 7, CVI is 219 times faster than VFI on average. In the best case, CVI is 386 times faster: while it takes VFI 355.40 seconds to achieve a level of accuracy $10^{-3}$ in Test 7, CVI takes only 0.92 seconds. As we increase the number of grid points for $w$ from 100 to 200, CVI is not affected, but the time taken for VFI almost doubles. This is because the grid points for $w$ are not used for CVI, while they are part of the grid for VFI.

As we increase the grid size of both $w$ and $\pi$, there is a slight decrease in the computation speed of CVI. Nevertheless, the decrease in the speed of VFI is almost exponential. Amongst tests 2 and 8 - 10, CVI is 223.41 times as fast as VFI on average. In Test 10, VFI takes 706.34 seconds to achieve a level of precision $10^{-3}$, whereas CVI takes only 1.83 seconds, which is 386 times faster.

5.1.3. Group-3 Experiments. Since the total number of grid points increases exponentially with the number of states, the speed of computation drops dramatically as the number of states increases. For example, with 3 state variables, VFI suffers from the "curse of dimensionality", while CVI works quite well. To illustrate this point, we consider the parametric class problem with respect to the unemployment compensation $c$, in which case $c$ is treated as an additional state variable. In this case, VFI has 3 state variables and the computation takes more than 7 days. However, CVI has only 2 states and the computation finishes within 171 seconds to attain the accuracy level $10^{-6}$. Hence, we can conveniently calculate via CVI the reservation wage as a function of both $\pi$ and $c$. Figure 3 provides the result.

FIGURE 3. The reservation wage

This figure, in which a whole class of c values are considered, serves as a generalization of

Figure 2. Not surprisingly, the reservation wage increases as c increases, since a higher level

of compensation hinders the agent’s incentive to enter into the labor market.

5.2. Job Search III. Consider the adaptive search model proposed (though not implemented) in McCall (1970). The model explores how the reservation utility changes in response to the agent’s expectation of the mean and variance of the unknown wage offer distribution. Suppose the wage process follows

$$w = \xi + \varepsilon_w, \quad \varepsilon_w \sim N(0, \gamma_w) \qquad (14)$$

where $\xi$ is the persistent component with prior belief $\xi \sim N(\mu, \gamma)$, and $\varepsilon_w$ is a transitory component. The worker’s current estimate of the next period wage distribution is $f(w'|\mu, \gamma) = N(\mu, \gamma + \gamma_w)$. After observing $w'$ next period, the posterior belief is $\xi | w' \sim N(\mu', \gamma')$, where

$$\gamma' = \left(\frac{1}{\gamma} + \frac{1}{\gamma_w}\right)^{-1} \quad \text{and} \quad \mu' = \gamma'\left(\frac{\mu}{\gamma} + \frac{w'}{\gamma_w}\right).$$

The worker has constant absolute risk aversion $u(w) = \frac{1}{a}(1 - e^{-aw})$, $a > 0$. Once he accepts the offer, the search process terminates and he obtains the same utility $u(w)$ in each future period. If the agent rejects the offer, he obtains utility $c$ from unemployment compensation and reconsiders next period. The value function follows

$$v^*(w, \mu, \gamma) = \max\left\{\frac{u(w)}{1-\beta}, \ c + \beta \int v^*(w', \mu', \gamma')\,f(w'|\mu, \gamma)\,dw'\right\}$$

This is another threshold state optimal stopping problem. In particular, the threshold state space is $X = \mathbb{R}$ and the threshold state variable is $x = w$. The environment space is $Y = \mathbb{R} \times \mathbb{R}_+$ and the environment states are $y = (\mu, \gamma)$. Since there are 3 state variables, standard approaches via VFI suffer from the "curse of dimensionality". The computation via VFI is as time-consuming as in the Group-3 experiments of Section 5.1. However, the computation via CVI is only 2-dimensional and our theory works well.

Notice that the exit payoff is unbounded below. We consider a weight function $\ell : Y \to [1, \infty)$ defined by $\ell(\mu, \gamma) = \exp\left(-a\mu + \frac{a^2\gamma}{2}\right) + 1$ and the space of candidate functions $(b_\ell Y, \rho_\ell)$. For all $\psi \in b_\ell Y$, the continuation value operator follows

$$Q\psi(\mu, \gamma) = c + \beta \int \max\left\{\frac{u(w')}{1-\beta}, \ \psi(\mu', \gamma')\right\} f(w'|\mu, \gamma)\,dw' \qquad (16)$$

where $\mu'$, $\gamma'$ and $f(w'|\mu, \gamma)$ are defined as above. Based on the theory of Section 4, the optimal policy is determined by a reservation wage function $\bar{w} : Y \to \mathbb{R}$ such that when $w = \bar{w}(\mu, \gamma)$, the worker is indifferent between accepting and rejecting the job offer.

Proposition 5.2. Suppose that the unemployment compensation satisfies $c < 1/a$. Then the following statements hold:

(1) Q is a well-defined mapping from $b_\ell Y$ into itself, and it is a contraction mapping of modulus β on the complete metric space $(b_\ell Y, \rho_\ell)$.

(2) For all $(w, \mu, \gamma) \in Z$, the value function is $v^*(w, \mu, \gamma) = \max\left\{ \frac{u(w)}{1-\beta},\; \psi^*(\mu, \gamma) \right\}$, the reservation wage is $\bar w(\mu, \gamma) = -\frac{1}{a} \ln\left[1 - a(1-\beta)\psi^*(\mu, \gamma)\right]$, and the optimal policy is $\sigma^*(w, \mu, \gamma) = \mathbb{1}\{w \geq \bar w(\mu, \gamma)\}$.

(3) $\psi^*$, $\bar w$, and $v^*$ are continuous functions.

(4) $\psi^*$ and $\bar w$ are increasing functions of µ, and $v^*$ is an increasing function of w and µ.
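The closed form for the reservation wage in statement (2) can be recovered from the indifference condition. The short derivation below is a sketch that uses only the CARA utility and the definition of the reservation wage given above.

```latex
\begin{align*}
\frac{u(\bar w)}{1-\beta} = \psi^*(\mu,\gamma)
&\iff \frac{1}{a}\left(1 - e^{-a\bar w}\right) = (1-\beta)\,\psi^*(\mu,\gamma) \\
&\iff e^{-a\bar w} = 1 - a(1-\beta)\,\psi^*(\mu,\gamma) \\
&\iff \bar w(\mu,\gamma) = -\frac{1}{a}\,\ln\left[\,1 - a(1-\beta)\,\psi^*(\mu,\gamma)\right].
\end{align*}
```

The last step requires the argument of the logarithm to be positive, that is, ψ∗ < 1/(a(1 − β)). Since u is bounded above by 1/a, the condition c < 1/a in the proposition appears to be what guarantees this.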

Remark 5.1. When risk aversion is considered, the exit payoff is bounded above, though it is unbounded below. However, it is easy to verify that our theory can be applied to all settings where the exit payoff is of the form r(w) = aw + b or r(w) = ae^w + b with a, b ∈ R+, and the flow continuation payoff satisfies c ≥ b.

Since in the current context (1− β)ψ∗ is a monotone transformation of the reservation wage

and possesses clear economic intuition, we define it as the reservation utility function and use

it for the remaining analysis.

In the simulation, we set β = 0.95 and a = 0.6. To parallel Ljungqvist and Sargent (2012), we set c = 0.0493 after transforming their parameterization by the utility function u. The literature provides little guidance on γw, so we perform a sensitivity analysis. The grid points (µ, γ) lie in [−50, 50] × [10^{−4}, 25], with 150 points for the µ grid and 75 points for the γ grid. The grid is scaled to be denser where the absolute values of µ and γ are small. We set the threshold function outside the grid to its value at the closest grid point. The integral is computed via Monte Carlo with 1000 draws.¹² Figure 4 provides the simulation results.
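The computation described above can be sketched as follows. This is a simplified illustration rather than the authors' code: it assumes a much coarser, uniformly spaced grid, only 100 Monte Carlo draws, a hypothetical value γw = 1, and bilinear interpolation with clipping at the grid boundary; the names `interp2` and `Q` are introduced here for illustration only.

```python
import numpy as np

beta, a, c = 0.95, 0.6, 0.0493           # values quoted in the text
gamma_w = 1.0                             # hypothetical; the text varies this parameter
mu_grid = np.linspace(-10.0, 10.0, 31)    # coarse uniform grids (assumption)
gam_grid = np.linspace(1e-4, 10.0, 15)
draws = np.random.default_rng(0).standard_normal(100)


def u(w):
    """CARA utility."""
    return (1.0 - np.exp(-a * w)) / a


# Next-period offers and beliefs at every grid point; these do not depend on psi.
MU, GAM = np.meshgrid(mu_grid, gam_grid, indexing="ij")
W_next = MU[..., None] + np.sqrt(GAM[..., None] + gamma_w) * draws
GAM_next = 1.0 / (1.0 / GAM + 1.0 / gamma_w)
MU_next = GAM_next[..., None] * (MU[..., None] / GAM[..., None] + W_next / gamma_w)
GAM_next = np.broadcast_to(GAM_next[..., None], MU_next.shape)
exit_value = u(W_next) / (1.0 - beta)


def interp2(psi, mu, gam):
    """Bilinear interpolation of psi; points outside the grid are clipped to its edge."""
    mu = np.clip(mu, mu_grid[0], mu_grid[-1])
    gam = np.clip(gam, gam_grid[0], gam_grid[-1])
    i = np.clip(np.searchsorted(mu_grid, mu) - 1, 0, len(mu_grid) - 2)
    j = np.clip(np.searchsorted(gam_grid, gam) - 1, 0, len(gam_grid) - 2)
    s = (mu - mu_grid[i]) / (mu_grid[i + 1] - mu_grid[i])
    t = (gam - gam_grid[j]) / (gam_grid[j + 1] - gam_grid[j])
    return ((1 - s) * (1 - t) * psi[i, j] + s * (1 - t) * psi[i + 1, j]
            + (1 - s) * t * psi[i, j + 1] + s * t * psi[i + 1, j + 1])


def Q(psi):
    """Monte Carlo version of the continuation value operator (16)."""
    cont = interp2(psi, MU_next, GAM_next)
    return c + beta * np.maximum(exit_value, cont).mean(axis=-1)


psi = np.zeros_like(MU)
for _ in range(1000):
    psi_new = Q(psi)
    err = np.max(np.abs(psi_new - psi))
    psi = psi_new
    if err < 1e-4:
        break

w_bar = -np.log(1.0 - a * (1.0 - beta) * psi) / a   # reservation wage
res_utility = (1.0 - beta) * psi                    # reservation utility
```

The iteration runs on the two-dimensional (µ, γ) grid only; the wage enters solely through the Monte Carlo draws. Using the same draws at every grid point keeps the computed continuation value monotone in µ, in line with Proposition 5.2.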

Figure 4 has several key features. First, in each case, the reservation utility is an increasing function of µ, which parallels the result of Proposition 5.2. Naturally, a more optimistic agent (higher µ) expects that higher offers can be obtained, so he will not accept the current offer until the utility it delivers is high enough.

Second, for given µ of a relatively small value, the reservation utility is increasing in γ, whereas for large µ it is decreasing in γ. Intuitively, although a pessimistic worker (low µ) expects low wage offers on average, part of the downside risk is cut off: in the worst case, he is guaranteed the unemployment compensation c > 0. Thus, a higher level of uncertainty (higher γ) in the offer distribution gives the worker a better opportunity to "try his fortune" for a good offer, which pushes up the reservation utility. For an optimistic (high µ) but risk-averse worker, since the choice is irreversible, a higher level of uncertainty gives the worker an incentive to enter the labor market at an earlier stage so as to avoid downside risks, which depresses the reservation utility. For similar reasons, increasing γw has a positive effect on the reservation utility when µ is small.

FIGURE 4. The reservation utility

¹²Changing the number of Monte Carlo samples, the grid range, and the grid density produces almost the same results.

5.3. Job Search IV. We consider another extension of the standard job search model of McCall (1970). Assume that the wage process follows

$$w_t = \eta_t + \theta_t \xi_t \tag{17}$$

$$\ln \theta_t = \rho \ln \theta_{t-1} + \ln u_t \tag{18}$$

where ρ ∈ [−1, 1] is a constant. The sequences satisfy $\{\xi_t\} \overset{IID}{\sim} h$ with $\int |\xi| h(\xi)\, d\xi < \infty$, $\{\eta_t\} \overset{IID}{\sim} v$ with $\int |\eta| v(\eta)\, d\eta < \infty$, and $\{u_t\} \overset{IID}{\sim} LN(0, \sigma_u^2)$. Moreover, $\{\xi_t\}$, $\{\eta_t\}$, and $\{u_t\}$ are independent, and the sequence $\{\theta_t\}$ is independent of $\{\xi_t\}$ and $\{\eta_t\}$. The process in (17) and (18) is general in the sense that it incorporates several standard setups. For example, when $\{\xi_t\}$ and $\{\eta_t\}$ are lognormally distributed, it simplifies to the setup of Kaplan and Violante (2010), where income fluctuation problems are studied. Furthermore, when $\{\xi_t\} \overset{IID}{\sim} N(0, 1)$, with a slight modification, this process simplifies to a setup that incorporates the standard stochastic volatility model (see, e.g., Taylor, 1982).

We set $h = LN(0, \sigma_\xi^2)$ and $v = LN(\mu_\eta, \sigma_\eta^2)$. In this case, θt and ξt are the persistent and transitory components of income, respectively, while ut is treated as a shock to the persistent component, and ηt can be interpreted as social security, gifts, etc. The threshold state space is X = R+ with threshold state process $\{w_t\}$, and the environment space is Y = R+ with environment process $\{\theta_t\}$. This is another example for which computation via VFI lacks efficiency but our method performs very well. The value function of the agent satisfies

$$v^*(w, \theta) = \max\left\{ \frac{w}{1-\beta},\; c + \beta \int v^*(w', \theta')\, f(\theta'|\theta)\, h(\xi')\, v(\eta')\, d(\theta', \xi', \eta') \right\}$$

and the continuation value operator takes the form

$$Q\psi(\theta) = c + \beta \int \max\left\{ \frac{w'}{1-\beta},\; \psi(\theta') \right\} f(\theta'|\theta)\, h(\xi')\, v(\eta')\, d(\theta', \xi', \eta')$$

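where $w' = \eta' + \theta'\xi'$, and $f(\theta'|\theta) = LN(\rho \ln \theta, \sigma_u^2)$ is the density kernel of the Markov process $\{\theta_t\}$. If ρ ∈ [−1, 1] and $\beta \exp\left(\rho^2\sigma_u^2/2\right) < 1$, then Assumption 2.1 holds with $g(\theta) = \theta^{\rho} + \theta^{-\rho}$, $m = \exp\left(\rho^2\sigma_u^2/2\right)$, and $d = \frac{1}{\beta} - 1$.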

Proposition 5.3. Suppose ρ ∈ [−1, 1], $\lambda := \beta \exp\left(\rho^2\sigma_u^2/2\right) < 1$, and the unemployment compensation c ∈ R+. Then the following statements hold:

(1) Q is a well-defined mapping from $b_\ell Y$ into itself, and it is a contraction mapping of modulus λ on the complete metric space $(b_\ell Y, \rho_\ell)$.

(2) For all $(w, \theta) \in Z$, the value function is $v^*(w, \theta) = \max\left\{ \frac{w}{1-\beta},\; \psi^*(\theta) \right\}$, the reservation wage is $\bar w(\theta) = (1-\beta)\psi^*(\theta)$, and the optimal policy is $\sigma^*(w, \theta) = \mathbb{1}\{w \geq \bar w(\theta)\}$.

(3) $\psi^*$ and $\bar w$ are continuously differentiable, and $v^*$ is continuous.

We choose β = 0.95 and µη = 0 for the baseline parameterization, and set σξ = 0.05, ση = 0.001, and σu = 0.01. In the first simulation, we consider the parametric class of problems indexed by c, where c ∈ [0, 10] with 50 grid points and ρ = 1. In the second simulation, we consider the parametric class of problems indexed by ρ, where ρ ∈ [0.5, 1] with 20 grid points and c = 0.6 as in Ljungqvist and Sargent (2012). We set θ ∈ [10^{−3}, 25] with 100 grid points, and the grid is scaled to be denser when θ is smaller. As before, the reservation wage outside the grid is set to its value at the closest grid point, and the integral is computed via Monte Carlo with 1000 draws.
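A minimal sketch of this one-dimensional computation is given below for a single parameter pair. It assumes geometric grid spacing (one possible way to make the grid denser for small θ), a single hypothetical value ρ = 0.75 from the range considered, and linear interpolation via `np.interp`, which holds the function constant outside the grid; none of these choices is confirmed by the paper.

```python
import numpy as np

beta, c, rho = 0.95, 0.6, 0.75            # rho: one hypothetical point in [0.5, 1]
sigma_xi, sigma_eta, sigma_u, mu_eta = 0.05, 0.001, 0.01, 0.0
theta_grid = np.geomspace(1e-3, 25.0, 100)   # denser for small theta (assumed spacing)

rng = np.random.default_rng(0)
n_draws = 1000
u_draws = np.exp(sigma_u * rng.standard_normal(n_draws))              # u ~ LN(0, sigma_u^2)
xi_draws = np.exp(sigma_xi * rng.standard_normal(n_draws))            # xi ~ LN(0, sigma_xi^2)
eta_draws = np.exp(mu_eta + sigma_eta * rng.standard_normal(n_draws)) # eta ~ LN(mu_eta, sigma_eta^2)

# Next-period states for every grid point: theta' = theta^rho * u, w' = eta' + theta' * xi'.
theta_next = theta_grid[:, None] ** rho * u_draws
w_next = eta_draws + theta_next * xi_draws
exit_value = w_next / (1.0 - beta)


def Q(psi):
    """Monte Carlo version of the continuation value operator on the theta grid."""
    cont = np.interp(theta_next.ravel(), theta_grid, psi).reshape(theta_next.shape)
    return c + beta * np.maximum(exit_value, cont).mean(axis=1)


psi = np.zeros_like(theta_grid)
for _ in range(2000):
    psi_new = Q(psi)
    err = np.max(np.abs(psi_new - psi))
    psi = psi_new
    if err < 1e-6:
        break

w_bar = (1.0 - beta) * psi    # reservation wage on the theta grid
```

Only θ is discretized; the wage w′ is integrated out by the draws, so the fixed point lives on a one-dimensional grid even though the value function depends on (w, θ). Repeating the loop over a grid of c or ρ values would give the parametric families described in the text.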

We see in Figure 5 that the reservation wage is an increasing function of θ. When the re-

alization of θ is small, the reservation wage is an increasing function of the unemployment

compensation c. When θ gets large, the reservation wage becomes less sensitive to c. Intu-

itively, when θ gets well above c, since the shock is highly persistent (ρ = 1), the reservation

wage is completely determined by the realization of the permanent shock.

In Figure 6, we see that for any ρ ∈ [0.5, 1], the reservation wage is an increasing function of θ. For larger ρ, the reservation wage function is steeper. Intuitively, ρ measures the degree of income persistence. As ρ gets larger, the effect of a positive shock lasts longer, which pushes up the worker's reservation wage.

FIGURE 5. The reservation wage

FIGURE 6. The reservation wage

6. CONCLUSION

In this paper, we study an alternative solution method for sequential decision problems. The idea is to calculate the continuation value function directly. We show that not only is the set of possible applications of this method very broad, but it also turns out to have significant advantages over traditional methods based on the value function.

7. APPENDIX

Let $(X, \mathscr{X})$ be a measurable space and $(Y, \mathscr{Y}, u)$ a measure space.

Lemma 7.1. Let $p : Y \times X \to \mathbb{R}$ be a measurable map that is continuous in x. If there exists a measurable map $q : Y \times X \to \mathbb{R}_+$ that is continuous in x with $q(y, x) \geq |p(y, x)|$ for all $(y, x) \in Y \times X$, and such that $x \mapsto \int q(y, x)\, u(dy)$ is continuous, then the mapping $x \mapsto \int p(y, x)\, u(dy)$ is continuous.

Proof. Since $q(y, x) \geq |p(y, x)|$ for all $(y, x) \in Y \times X$, we know that $(y, x) \mapsto q(y, x) \pm p(y, x)$ are nonnegative measurable functions. Let $(x_n)$ be a sequence in X with $x_n \to x$. By Fatou's lemma, we have

$$\int \liminf_{n \to \infty}\, [q(y, x_n) \pm p(y, x_n)]\, u(dy) \leq \liminf_{n \to \infty} \int [q(y, x_n) \pm p(y, x_n)]\, u(dy).$$

From the given assumptions we know that $\lim_{n \to \infty} \int q(y, x_n)\, u(dy) = \int q(y, x)\, u(dy)$. Combining this result with the above inequality and the continuity of p and q in x, we have

$$\pm \int p(y, x)\, u(dy) \leq \liminf_{n \to \infty} \left( \pm \int p(y, x_n)\, u(dy) \right),$$

where we have used the fact that for any two sequences $(a_n)_{n \geq 0}$ and $(b_n)_{n \geq 0}$ of R such that $\lim_{n \to \infty} a_n$ exists, we have $\liminf_{n \to \infty} (a_n + b_n) = \lim_{n \to \infty} a_n + \liminf_{n \to \infty} b_n$. So

$$\limsup_{n \to \infty} \int p(y, x_n)\, u(dy) \leq \int p(y, x)\, u(dy) \leq \liminf_{n \to \infty} \int p(y, x_n)\, u(dy).$$

Therefore, the mapping $x \mapsto \int p(y, x)\, u(dy)$ is continuous.

Proof of Theorem 2.1. To prove the first statement, based on the weighted contraction mapping theorem (see, e.g., Boyd, 1990), it is sufficient to verify: (a) Q is monotone in the sense that $Q\psi \leq Q\phi$ for all $\psi, \phi \in b_\ell Z$ with $\psi \leq \phi$; (b) $Q0 \in b_\ell Z$; and (c) $Q(\psi + a\ell) \leq Q\psi + a(\beta m)\ell$ for all $a \in \mathbb{R}_+$ and $\psi \in b_\ell Z$. Obviously, Q is monotone and condition (a) holds. From Assumption 2.1 we know that

$$\frac{|(Q0)(z)|}{\ell(z)} = \left| \frac{c(z)}{\ell(z)} + \beta \int \frac{\max\{r(z'), 0\}}{\ell(z)}\, P(z, dz') \right| \leq \frac{|c(z)|}{\ell(z)} + \beta \int \frac{|r(z')|}{\ell(z)}\, P(z, dz') \leq 1 + \beta$$

for all $z \in Z$. So we have $\|Q0\|_\ell < \infty$. Condition (b) holds since the measurability of Q0 follows immediately from primitive assumptions. It remains to verify condition (c). Notice that based on Assumption 2.1, we have

$$\int \ell(z')\, P(z, dz') = \int g(z')\, P(z, dz') + \frac{d}{m-1} \leq m g(z) + d + \frac{d}{m-1} = m \left[ g(z) + \frac{d}{m-1} \right] = m\ell(z).$$

Therefore, for all $a \in \mathbb{R}_+$ and $\psi \in b_\ell Z$, we have

$$\begin{aligned}
Q(\psi + a\ell)(z) &= c(z) + \beta \int \max\{ r(z'),\, \psi(z') + a\ell(z') \}\, P(z, dz') \\
&\leq c(z) + \beta \int \max\{ r(z') + a\ell(z'),\, \psi(z') + a\ell(z') \}\, P(z, dz') \\
&= c(z) + \beta \int \max\{ r(z'),\, \psi(z') \}\, P(z, dz') + a\beta \int \ell(z')\, P(z, dz') \\
&\leq Q\psi(z) + a(\beta m)\ell(z).
\end{aligned}$$

So condition (c) is verified.

Let $\bar\psi$ denote the unique fixed point of the operator Q in $b_\ell Z$. To prove the second statement, it remains to verify that $\bar\psi = \psi^*$. Notice that since

$$\max\{r(z), \bar\psi(z)\} = \max\left\{ r(z),\; c(z) + \beta \int \max\{r(z'), \bar\psi(z')\}\, P(z, dz') \right\},$$

the function $z \mapsto \max\{r(z), \bar\psi(z)\}$ solves the Bellman equation (1). Moreover, there exist $h_1, h_2 \in \mathbb{R}_+$ such that

$$\frac{|\max\{r(z), \bar\psi(z)\}|}{k(z)} \leq \frac{|r(z)| + |\bar\psi(z)|}{k(z)} \leq \frac{|r(z)| + h_1 g(z) + h_2}{k(z)}$$

for all $z \in Z$. Since the last term is bounded on Z, the measurable map $z \mapsto \max\{r(z), \bar\psi(z)\}$ is an element of $b_k Z$. Based on Ma (2016), we know that it must be the value function, i.e., $v^*(z) = \max\{r(z), \bar\psi(z)\}$ for all $z \in Z$. Therefore, we have

$$\psi^*(z) = c(z) + \beta \int v^*(z')\, P(z, dz') = c(z) + \beta \int \max\{r(z'), \bar\psi(z')\}\, P(z, dz') = \bar\psi(z)$$

for all $z \in Z$, and the second statement is verified. The proof of the third statement is trivial.

Proof of Proposition 2.1. The claim is true by construction when n = 0. Now suppose that the

claim is true for arbitrary n. We aim to show that it also holds at n + 1. To this end, consider

the two operators R and L defined by

Rψ := r ∨ ψ and Lv := c + βPv.

With this notation, we can write T and Q as Tv = RLv and Qψ = LRψ. By the induction

hypothesis, we have vn = r ∨ ψn, or vn = Rψn, so

vn+1 = Tvn = TRψn = RLRψn = RQψn = Rψn+1.

In other words, vn+1 = r ∨ ψn+1, as was to be shown.
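The identity can be checked numerically on a small finite-state example. The sketch below is an illustration only: it assumes a randomly generated five-state Markov chain with random payoffs and starts from ψ0 = 0 with v0 = r ∨ ψ0; the operators R, L, T and Q are coded directly from their definitions in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, beta = 5, 0.9

P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
r = rng.normal(size=n_states)          # exit payoff
c = rng.normal(size=n_states)          # flow continuation payoff


def R(psi):
    """R psi = r v psi (pointwise maximum)."""
    return np.maximum(r, psi)


def L(v):
    """L v = c + beta P v."""
    return c + beta * P @ v


def T(v):
    """Bellman operator T = R L."""
    return R(L(v))


def Q(psi):
    """Continuation value operator Q = L R."""
    return L(R(psi))


psi = np.zeros(n_states)    # psi_0 (arbitrary starting point)
v = R(psi)                  # v_0 = r v psi_0, as in the base case of the induction

max_gap = 0.0
for _ in range(50):
    v, psi = T(v), Q(psi)
    max_gap = max(max_gap, np.max(np.abs(v - R(psi))))
```

The gap between the value function iterate and r ∨ ψn stays at zero (up to floating point) at every step, so iterating on Q loses no information relative to iterating on T.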

Proof of Proposition 3.1. Let $b_\ell cZ$ be the set of continuous functions in $b_\ell Z$. Since g is continuous, it is easy to show that $b_\ell cZ$ is a closed subset of $b_\ell Z$. To verify the continuity of $\psi^*$, it is sufficient to show that $Q(b_\ell cZ) \subset b_\ell cZ$ (see, e.g., Stokey et al., 1989). The assumptions of the proposition ensure that this is so. The continuity of $v^*$ follows from the continuity of $\psi^*$ and r and the fact that $v^* = r \vee \psi^*$.

Proof of Proposition 3.2. By Assumptions 2.1, 3.1, and Proposition 3.1, to show that $\psi^*$ is continuous, we only need to verify that Assumption 3.2 holds. For all $\psi \in b_\ell cZ$, there exists $G \in \mathbb{R}_+$ such that $|\psi(z)| \leq G\ell(z)$ for all $z \in Z$, so we have $|\max\{r(z'), \psi(z')\}\, f(z'|z)| \leq [|r(z')| + G\ell(z')]\, f(z'|z)$. Based on the given assumptions, $z \mapsto [|r(z')| + G\ell(z')]\, f(z'|z)$ is nonnegative and continuous for all $z' \in Z$, and $z \mapsto \int [|r(z')| + G\ell(z')]\, f(z'|z)\, dz'$ is continuous. By Lemma 7.1, $z \mapsto \int \max\{r(z'), \psi(z')\}\, f(z'|z)\, dz'$ is continuous. Combined with the fact that c is continuous, Assumption 3.2 holds. The remainder of the proof is similar to that of Proposition 3.1.

Proof of Proposition 3.3. Let $b_\ell iZ$ (resp. $b_\ell dZ$) be the set of increasing (resp. decreasing) functions in $b_\ell Z$. Then $b_\ell iZ$ (resp. $b_\ell dZ$) is a closed subset of $b_\ell Z$. To show that $\psi^*$ is increasing (resp. decreasing), it is sufficient to show that $Q(b_\ell iZ) \subset b_\ell iZ$ (resp. $Q(b_\ell dZ) \subset b_\ell dZ$) (see, e.g., Stokey et al., 1989). The assumptions of the proposition guarantee that this is the case. The value function $v^*$ is increasing (resp. decreasing) since r is assumed to be increasing (resp. decreasing) and $v^* = r \vee \psi^*$.

Proof of Proposition 3.4. Notice that the set of concave (resp. convex) functions in $b_\ell Z$ is a closed subset of $b_\ell Z$. The remainder of the proof is similar to that of Proposition 3.3.

Proof of Proposition 3.5. For all $z \in \operatorname{int}(Z)$ and $i = 1, \dots, m$, let $\mu(z) := \int \max\{r(z'), \psi^*(z')\}\, f(z'|z)\, dz'$ and $h_i(z) := \int \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z)\, dz'$. Since c is differentiable by Assumption 3.9, to prove the desired result, we only need to verify that $D_i \mu = h_i$ at all interior points for each i. For any $z_0^i$, let $\{z_n^i\}$ be an arbitrary sequence such that $z_n^i \to z_0^i$ and $z_n^i \neq z_0^i$ for all $n \in \mathbb{N}$. Let $z_n$ and $z_0$ be elements of $\operatorname{int}(Z)$ with the i-th entry being $z_n^i$ and $z_0^i$ respectively, and $z_n^{-i} = z_0^{-i}$ for all $n \in \mathbb{N}$.

For given δ > 0, there exists $N \in \mathbb{N}$ such that for all $n \geq N$, $z_n^i \in B_\delta(z_0^i)$. By the mean value theorem, given $z^{-i} = z_0^{-i}$, there exists $\xi(z', z_n, z_0) \in B_\delta(z_0^i)$ such that

$$|\Delta(z', z_n, z_0)| := \left| \frac{f(z'|z_n) - f(z'|z_0)}{z_n^i - z_0^i} \right| = \left| D_i f(z'|z) \big|_{z^i = \xi(z', z_n, z_0)} \right| \leq \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|.$$

Since $\psi^* \in b_\ell Z$, there exists $G \in \mathbb{R}_+$ such that $|\psi^*| \leq G\ell$. For all $n \geq N$, we have

(1) $|\max\{r(z'), \psi^*(z')\}\, \Delta(z', z_n, z_0)| \leq (|r(z')| + G\ell(z')) \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|$;

(2) $z_0^{-i} \mapsto \int (|r(z')| + G\ell(z')) \sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)|\, dz'$ takes finite values;

(3) $\max\{r(z'), \psi^*(z')\}\, \Delta(z', z_n, z_0) \to \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z_0)$ as $n \to \infty$.

By the dominated convergence theorem, we have $D_i \mu(z_0) = h_i(z_0)$ since

$$\frac{\mu(z_n) - \mu(z_0)}{z_n^i - z_0^i} = \int \max\{r(z'), \psi^*(z')\}\, \Delta(z', z_n, z_0)\, dz' \to \int \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z_0)\, dz' = h_i(z_0).$$

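Proof of Proposition 3.6. From condition (1) of Assumption 3.11 we know that for all $z_0 \in \operatorname{int}(Z)$, there exists a compact subset $A \subset Z$ such that for all $z' \in A^c$, we have $z_i^*(z', z_0^{-i}) \notin B_\delta(z_0^i)$, and

$$\sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)| = \max\left\{ |D_i f(z'|z)|_{z^i = z_0^i + \delta},\; |D_i f(z'|z)|_{z^i = z_0^i - \delta} \right\}$$

given $z^{-i} = z_0^{-i}$. By Assumption 3.10, $(z, z') \mapsto D_i f(z'|z)$ is continuous. Therefore, given $z^{-i} = z_0^{-i}$, there exists $G \in \mathbb{R}_+$ such that

$$\begin{aligned}
\sup_{z^i \in B_\delta(z_0^i)} |D_i f(z'|z)| &\leq \sup_{z' \in A,\, z^i \in B_\delta(z_0^i)} |D_i f(z'|z)| \cdot \mathbb{1}(z' \in A) + \left( |D_i f(z'|z)|_{z^i = z_0^i + \delta} + |D_i f(z'|z)|_{z^i = z_0^i - \delta} \right) \cdot \mathbb{1}(z' \in A^c) \\
&\leq G \cdot \mathbb{1}(z' \in A) + \left( |D_i f(z'|z)|_{z^i = z_0^i + \delta} + |D_i f(z'|z)|_{z^i = z_0^i - \delta} \right) \cdot \mathbb{1}(z' \in A^c).
\end{aligned}$$

Combining this result with condition (2) of Assumption 3.11, we can show that Assumption 3.8 holds. The desired result then follows from Proposition 3.5.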

Proof of Proposition 3.7. Since $\psi^* \in b_\ell Z$, there exists $G \in \mathbb{R}_+$ such that $|\psi^*(z)| \leq G\ell(z)$ for all $z \in Z$. So $|\max\{r(z'), \psi^*(z')\}\, D_i f(z'|z)| \leq (|r(z')| + G\ell(z'))\, |D_i f(z'|z)|$ for all $z', z \in Z$. For all $z' \in Z$, $z \mapsto (|r(z')| + G\ell(z'))\, |D_i f(z'|z)|$ is nonnegative and continuous by the given assumptions, and $z \mapsto \int [|r(z')| + G\ell(z')]\, |D_i f(z'|z)|\, dz'$ is continuous by condition (1) of Assumption 3.12. So $z \mapsto \int \max\{r(z'), \psi^*(z')\}\, D_i f(z'|z)\, dz' = D_i \mu(z)$ is continuous by Lemma 7.1. This result, combined with the assumption that c is continuously differentiable (condition (2) of Assumption 3.12), shows that $\psi^*$ is continuously differentiable at interior points.

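Proof of Proposition 3.8. Consider $\ell : Z \times \Theta \to [1, \infty)$ defined by $\ell(z, \theta) = g_\theta(z) + \frac{d}{m-1}$, the Banach space $(b_\ell(Z \times \Theta), \|\cdot\|_\ell)$, and the continuation value operator $Q : b_\ell(Z \times \Theta) \to b_\ell(Z \times \Theta)$ defined by

$$Q\psi_\theta(z) = c_\theta(z) + \beta \int \max\{r_\theta(z'), \psi_\theta(z')\}\, f_\theta(z'|z)\, dz'.$$

Based on Theorem 2.1, $(z, \theta) \mapsto \psi^*_\theta(z)$ is the unique fixed point of Q in $b_\ell(Z \times \Theta)$. Let $b_\ell c\theta(Z \times \Theta)$ be the set of functions in $b_\ell(Z \times \Theta)$ that are continuous in θ. Since $\theta \mapsto g_\theta(z)$ is continuous for all $z \in Z$ by Assumption 3.15, $b_\ell c\theta(Z \times \Theta)$ is a closed subset. To show the continuity of $\theta \mapsto \psi^*_\theta(z)$, it remains to verify that $Q : b_\ell c\theta(Z \times \Theta) \to b_\ell c\theta(Z \times \Theta)$.

For every candidate $(z, \theta) \mapsto \psi_\theta(z)$ in $b_\ell c\theta(Z \times \Theta)$, there exists $G \in \mathbb{R}_+$ such that $|\psi_\theta(z')| \leq G\ell(z', \theta)$, so $|\max\{r_\theta(z'), \psi_\theta(z')\}\, f_\theta(z'|z)| \leq [|r_\theta(z')| + G\ell(z', \theta)]\, f_\theta(z'|z)$ for all $(z', z, \theta) \in Z \times Z \times \Theta$. Moreover, by Assumptions 3.14 - 3.15, $\theta \mapsto [|r_\theta(z')| + G\ell(z', \theta)]\, f_\theta(z'|z)$ is nonnegative and continuous for all $z, z' \in Z$, and $\theta \mapsto \int [|r_\theta(z')| + G\ell(z', \theta)]\, f_\theta(z'|z)\, dz'$ is continuous for all $z \in Z$. From Lemma 7.1 we know that $\theta \mapsto \int \max\{r_\theta(z'), \psi_\theta(z')\}\, f_\theta(z'|z)\, dz'$ is continuous for all $z \in Z$. Moreover, $\theta \mapsto c_\theta(z)$ is continuous for all $z \in Z$. So $\theta \mapsto \psi^*_\theta(z)$ is continuous.

The continuity of $\theta \mapsto v^*_\theta(z)$ follows from the continuity of $\theta \mapsto r_\theta(z)$ and the fact that $v^*_\theta = r_\theta \vee \psi^*_\theta$.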

Proof of Proposition 4.4. Consider F : X×Y×Θ→ R defined by F(x, y, θ) := rθ(x, y)−ψ∗θ (y).

Without loss of generality, suppose that (x, y, θ) 7→ rθ(x, y) is strictly increasing in x. Then F

is a continuous function and is strictly increasing in x.

For all fixed y ∈ Y, θ0 ∈ Θ and ε > 0, since F is strictly increasing in x and F(xθ0(y), y, θ0) = 0,

we have:

F(xθ0(y) + ε, y, θ0) > 0 and F(xθ0(y)− ε, y, θ0) < 0

Since F is continuous with respect to θ, there exists δ > 0 such that for all θ ∈ Bδ(θ0) := {θ ∈ Θ : ‖θ − θ0‖ < δ}, where ‖ · ‖ is the Euclidean norm, we have

F(xθ0(y) + ε, y, θ) > 0 and F(xθ0(y)− ε, y, θ) < 0

Since F(xθ(y), y, θ) = 0, by the strict monotonicity of F with respect to x, we have:

xθ(y) ∈ (xθ0(y)− ε, xθ0(y) + ε)

i.e., |xθ(y)− xθ0(y)| < ε. Hence, the function θ 7→ xθ(y) is continuous for all y ∈ Y.

REFERENCES

ALBUQUERQUE, R. AND H. A. HOPENHAYN (2004): “Optimal lending contracts and firm

dynamics,” The Review of Economic Studies, 71, 285–315.

ALVAREZ, F. AND A. DIXIT (2014): “A real options perspective on the future of the Euro,”

Journal of Monetary Economics, 61, 78–109.

ALVAREZ, F. AND N. L. STOKEY (1998): “Dynamic programming with homogeneous func-

tions,” Journal of Economic Theory, 82, 167–189.

ARELLANO, C. (2008): “Default risk and income fluctuations in emerging economies,” The

American Economic Review, 98, 690–712.

BACKUS, D. (2014): “Discussion of Alvarez and Dixit: A real options perspective on the

Euro,” Journal of Monetary Economics, 61, 110–113.

BECKER, R. A. AND J. H. BOYD (1997): Capital Theory, Equilibrium Analysis, and Recursive

Utility, Wiley-Blackwell.

BELLMAN, R. (1969): “A new type of approximation leading to reduction of dimensionality

in control processes,” Journal of Mathematical Analysis and Applications, 27, 454–459.

BOYD, J. H. (1990): “Recursive utility and the Ramsey problem,” Journal of Economic Theory,

50, 326–345.

BULL, C. AND B. JOVANOVIC (1988): “Mismatch versus derived-demand shift as causes of

labour mobility,” The Review of Economic Studies, 55, 169–175.

BURDETT, K. AND K. L. JUDD (1983): “Equilibrium price dispersion,” Econometrica, 955–969.

CHOI, J. J., D. LAIBSON, B. C. MADRIAN, AND A. METRICK (2003): “Optimal defaults,” The

DIXIT, A. K. AND R. S. PINDYCK (1994): Investment Under Uncertainty, Princeton University

Press.

DUFFIE, D. (2010): Dynamic Asset Pricing Theory, Princeton University Press.

DURAN, J. (2000): “On dynamic programming with unbounded returns,” Economic Theory,

15, 339–352.

——— (2003): “Discounting long run average growth in stochastic dynamic programs,” Eco-

nomic Theory, 22, 395–413.

ERICSON, R. AND A. PAKES (1995): “Markov-perfect industry dynamics: A framework for

empirical work,” The Review of Economic Studies, 62, 53–82.

FAJGELBAUM, P., E. SCHAAL, AND M. TASCHEREAU-DUMOUCHEL (2015): “Uncertainty

traps,” Tech. rep., NBER Working Paper.

HOPENHAYN, H. A. (1992): “Entry, exit, and firm dynamics in long run equilibrium,” Econo-

metrica, 1127–1150.

HUGGETT, M., G. VENTURA, AND A. YARON (2011): “Sources of lifetime inequality,” The

INSLEY, M. C. AND T. S. WIRJANTO (2010): “Contrasting two approaches in real options

valuation: contingent claims versus dynamic programming,” Journal of Forest Economics,

16, 157–176.

JOVANOVIC, B. (1982): “Selection and the evolution of industry,” Econometrica, 649–670.

——— (1987): “Work, rest, and search: unemployment, turnover, and the cycle,” Journal of

Labor Economics, 131–148.

KAPLAN, G. AND G. L. VIOLANTE (2010): “How much consumption insurance beyond self-

insurance?” American Economic Journal: Macroeconomics, 2, 53–87.

KARATZAS, I. AND S. E. SHREVE (1998): Methods of Mathematical Finance, vol. 39, Springer

Science & Business Media.

KIYOTAKI, N. AND R. WRIGHT (1993): “A search-theoretic approach to monetary econom-

ics,” The American Economic Review, 63–77.

LE VAN, C. AND Y. VAILAKIS (2005): “Recursive utility and optimal growth with bounded

or unbounded returns,” Journal of Economic Theory, 123, 187–209.

LJUNGQVIST, L. AND T. J. SARGENT (2012): Recursive Macroeconomic Theory, MIT Press.

MA, Q. (2016): “Supplementary appendix: solving sequential decision problems via contin-

uation values,” ANU Working Paper.

MARTINS-DA ROCHA, V. F. AND Y. VAILAKIS (2010): “Existence and uniqueness of a fixed

point for local contractions,” Econometrica, 78, 1127–1141.

MATKOWSKI, J. AND A. S. NOWAK (2011): “On discounted dynamic programming with

unbounded returns,” Economic Theory, 46, 455–474.

MCCALL, J. J. (1970): “Economics of information and job search,” The Quarterly Journal of

Economics, 84, 113–126.

MEYN, S. P. AND R. L. TWEEDIE (2012): Markov Chains and Stochastic Stability, Springer

Science & Business Media.

PISSARIDES, C. A. (2000): Equilibrium Unemployment Theory, MIT Press.

RINCON-ZAPATERO, J. P. AND C. RODRIGUEZ-PALMERO (2003): “Existence and uniqueness

of solutions to the Bellman equation in the unbounded case,” Econometrica, 71, 1519–1555.

——— (2009): “Corrigendum to Existence and uniqueness of solutions to the Bellman equa-

tion in the unbounded case Econometrica, Vol. 71, No. 5 (September, 2003), 1519–1555,”

Econometrica, 77, 317–318.

ROTHSCHILD, M. (1974): “Searching for the lowest price when the distribution of prices is

unknown,” Journal of Political Economy, 82, 689–711.

RUST, J. (1986): “When is it optimal to kill off the market for used durable goods?” Econo-

metrica, 65–86.

——— (1987): “Optimal replacement of GMC bus engines: An empirical model of Harold

Zurcher,” Econometrica, 999–1033.

——— (1997): “Using randomization to break the curse of dimensionality,” Econometrica,

487–516.

SHI, S. (1995): “Money and prices: a model of search and bargaining,” Journal of Economic

Theory, 67, 467–496.

——— (1997): “A divisible search model of fiat money,” Econometrica, 75–102.

SHIRYAEV, A. N. (1999): Essentials of Stochastic Finance: Facts, Models, Theory, vol. 3, World Scientific.

STOKEY, N., R. LUCAS, AND E. PRESCOTT (1989): Recursive Methods in Economic Dynamics,

Harvard University Press.

TAYLOR, S. J. (1982): “Financial returns modelled by the product of two stochastic processes–

a study of the daily sugar prices 1961-75,” Time Series Analysis: Theory and Practice, 1, 203–

TREJOS, A. AND R. WRIGHT (1995): “Search, bargaining, money, and prices,” Journal of Po-

litical Economy, 118–141.
