
Efficient Solution, Filtering and Estimation of Models with OBCs

Work in progress, comments are very welcome. Latest version:

http://gregorboehl.com/live/obc_boehl.pdf

Gregor Boehl

Institute for Monetary and Financial Stability, Goethe University Frankfurt

February 12, 2020

Abstract

Occasionally binding constraints (OBCs) play a central role in macroeconomic modelling since major developed economies have hit the zero lower bound (ZLB) on nominal interest rates. I present a solution method for rational expectations models with OBCs and a Bayesian filter/smoother that combined can be used for fast and accurate Bayesian estimation of large-scale models featuring one or several OBCs. The quasi-analytic solution method avoids matrix inversions and simulations at runtime for gains in computational speed. The transposed Ensemble Kalman Filter (TEnKF) is a hybrid of the particle filter and the Kalman filter. It requires only a small number of particles and can be used to approximate the likelihood of large-scale nonlinear models with high accuracy. A nonlinear path-adjustment smoother (NPAS) adds a smoothing procedure and can be applied to estimate the distributions of states and shock innovations while fully respecting the nonlinearity of – potentially high dimensional – transition functions. I further propose a tempered version of the Differential Evolution Markov chain Monte Carlo method, which can be massively parallelized, avoids local maximization, and facilitates identification even if likelihood functions are near-flat.

Keywords: Occasionally Binding Constraints, Effective Lower Bound, Iterative Bayesian Filter, Ensemble Kalman Smoother, Bayesian Estimation
JEL: E63, C63, E58, E32, C62

1 Introduction

Without doubt, the potential to generate credible policy advice from dynamic-stochastic general equilibrium (DSGE) models crucially depends on the power to fit these models

⋆ I am grateful to Claus Brand, Cees Diks, Alexander Meyer-Gohde, Kenneth Judd, Alexander Richter, Felix Strobel, Carlos Zarazaga and participants of several conferences and seminars for discussions and helpful comments on the contents of this paper. I am indebted to Edward Herbst and Pablo Winant for making their code available.

Email address: [email protected], https://gregorboehl.com


to the data. The fact that nominal interest rates in major economies have reached their so-called zero lower bound (ZLB) has not only called for methods to solve dynamic models with occasionally binding constraints (OBCs). Since such constraints induce strong nonlinearities, their presence is likely to alter the estimates of parameters. Hence, ignoring OBCs during estimation has the potential to produce misleading policy implications. This calls for – necessarily nonlinear – solution and filtering methods that are robust, precise and reliable while being fast enough to be used together with Bayesian estimation techniques. This work attempts to provide two answers to this call: a fast solution method for DSGE models with OBCs, and a Bayesian filter that is well-suited even for highly nonlinear and high-dimensional transition functions.

Most solution concepts for rational expectations models with OBCs consist of two parts. The first part is a reduced representation of the solution in terms of the numbers of expected periods at the constraint (NEPC), and the second part is an algorithm to find these numbers. This involves finding the number of periods until the constraint is reached, as well as the duration for which it binds. I derive a closed-form state-space representation for the complete expected trajectory depending on the set of NEPC and the initial states. Taking this as a starting point, I provide five equilibrium conditions given a set of NEPC and the initial states. The representation together with the conditions can be used to quickly and reliably find the rational expectations equilibrium – i.e. the NEPC – if it exists, or to be informative if it does not. The fact that this algorithm functions without the need to simulate the complete anticipated equilibrium path reduces the computational load tremendously.

The second central contribution of this paper is a Bayesian filter that – given that the NEPC is determined endogenously – takes into account the potentially highly nonlinear law of motion. Parameter estimation requires precise estimates of the likelihood, while the inference of the likelihood must also be very fast. In practice this means minimizing the number of particle evaluations as much as possible. Furthermore, meaningful economic analysis requires an accurate approximation of the complete distribution of hidden states. This includes that the estimated series of shocks can be used to precisely reproduce the original data. To meet these several challenging requirements I propose the IPA smoother, which can be understood as a hybrid between the Kalman filter, the particle filter, and an iterative smoother. IPA smoother is short for iterative path-adjusting transposed-ensemble RTS smoother. Like the particle filter, the algorithm applies the transition function to a set of particles to obtain an approximation of the states. This guarantees a reliable estimation of the likelihood and of the distribution of states. Drawing on the assumption that the latter is approximately Gaussian, the ensemble is updated via particle shifting (similar to the Kalman transformation) rather than by the reweighting methods the particle filter uses. This greatly reduces the number of required particles and much lightens the computational burden of the filter. Finally, an iterative procedure ensures that the mode of the smoothed states fully respects the nonlinearities. Since most of the literature on Bayesian filtering is foreign to economic science, a detailed discussion is deferred to Section 3.

As an example I estimate the simple New Keynesian (NK) model with the ZLB on the last two decades of US data and use the IPA smoother to decompose the dynamics. Although primarily intended as a proof-of-concept for the methods suggested here, this exercise provides meaningful insights. The estimation results indicate, well in line with most recent insights, that the Phillips curve is extremely flat and fluctuations in demand


have little effect on inflation dynamics. Though monetary policy had a measurable impact on output even when interest rates were zero, this effect was very small. The methods introduced here were also successfully applied in Boehl and Strobel (2019a,b) to medium and large scale models with financial frictions, where we study conventional and unconventional monetary policy (forward guidance and quantitative easing policies) during the US ZLB period.

Several solution concepts for linear DSGE models with OBCs are readily available. The first and probably most widely used is the solution method of Guerrieri and Iacoviello (2015). Similar to the method introduced here, the authors find a matrix representation of the solution in terms of the NEPC. The NEPC is then found by an integer-based iterative process analogous to the Gauss-Jacobi algorithm. Their method does not entail a closed-form representation of the solutions in future periods given a set of NEPC. This implies that each guess of the iterative process requires simulation, which is costly in terms of computation time. Further, convergence of the iterative process for discrete problems is not guaranteed, so it cannot be maintained that a solution to the problem is always found if it exists. As a result, while in some common cases both methods will return the same result, the method introduced here is more robust and considerably faster.1 Based on news shocks, Holden (2016, 2017) proposes a solution concept for which he derives a linear complementarity problem from the original model. This, correspondingly, is a representation of the solution in terms of the NEPC. His setup allows finding the NEPC by using familiar methods from mixed integer linear programming, and also enables the author to state conditions on existence and uniqueness. For this reason his method is as robust as the one proposed here. It is, however, too computationally expensive to be used in the context of parameter estimation. Lastly, Binning and Maih (2016) propose an implementation that is based on regime switching. While the solution in terms of the NEPC could generally be represented as many different regimes (one per NEPC), the authors focus on two representative regimes only. In particular, in the regime in which the constraint holds, the NEPC is fixed. Arguably, this can be seen as an approximation of the complexity of the transition function with endogenous NEPC, but it may result in false evaluations of the likelihood and of the distribution of filtered states.

In recent years, a few approaches to deal with OBCs when estimating economic models have been taken, in particular regarding the ZLB. All of them rely on a medium scale New Keynesian model in the spirit of Smets and Wouters (2007). Fratto and Uhlig (2014) simply chose to ignore the constraint both for the estimation of the model and for the decomposition of the dynamics. As the results in Boehl and Strobel (2019b) suggest, such an approach might yield reasonably good parameter estimates for some models, but this is likely not generally the case. Ignoring the ZLB for the shock decomposition is very likely to give misleading results, not least because the ZLB not only changes the size but also potentially the direction of economic effects. Jones (2017) uses an external source for the expected durations at the constraint and includes them in the Bayesian estimation. Hence, the NEPC are taken as given during each draw from the parameter distribution and the linear Kalman filter (KF) is used. Computationally, this routine is very slim. Unfortunately, such a procedure will likely return a point estimate

1 Fast here refers to magnitudes of milliseconds per particle in contexts of 30–40 states.


close to the mode but fail to provide reliable estimates of the distribution of the states. This is because the linear filter with a fixed NEPC will not take into account that differently sized shocks imply different NEPC, which in turn imply qualitatively and quantitatively different responses of endogenous variables. Likewise, the likelihood evaluations by the linear filter will be biased.

The method of Guerrieri and Iacoviello (2015) was most prominently applied in Guerrieri and Iacoviello (2017). For estimation and filtering of shock innovations the authors use a one-to-one mapping between shocks and observables while ignoring uncertainty about initial states and measurement errors. Especially ignoring uncertainty about the initial distribution of states can have severe effects on the economic interpretation of the results.2 Likewise, basing the estimation of the likelihood on the estimated innovations of exogenous variables alone does not suffice as an approximation of the data likelihood given the full distribution of states. Linde et al. (2017) compare the regime switching approach with a method that is a hybrid between Guerrieri and Iacoviello (2015) and Holden (2016). They then use the Unscented Kalman Filter for likelihood evaluation. Although this method is closest to the approach advocated in this paper, the UKF bears some drawbacks which are discussed in detail in Section 3. Finally, Gust et al. (2017) take the rocky road and use policy function iteration in combination with the particle filter to estimate and filter the shocks. The use of such global solution methods is economically the most rigorous approach, but it requires considerable computational expertise and special hardware. Unfortunately this method still imposes limitations on the size of the state space. Sceptics may argue that, due to the use of several approximation methods (polynomial grid, grid size etc.), the result might still be an approximation of relatively low order, and convergence of the iterative method and the particle filter is not guaranteed for each draw in the state space, again bearing the potential to bias the results.

My reference implementation of the method and the smoother makes it possible to estimate large-scale DSGE models in about two to three hours for the benchmark of Smets and Wouters (2007).3 Note that the computational advantage depends much on an efficient implementation. The reference implementation is written in the powerful and freely available language Python.4

The rest of this draft is structured as follows. Section 2 contains the description of the method, including results on existence as well as comments on the implementation. In the following Section 3 I introduce the IPA smoother, whereas I briefly discuss estimation in Section 4. In Section 5 I apply the estimation to the simple NK model and quickly discuss the estimation results and insights from the estimated model. Section 6 concludes.

2 Imagine for example initiating the states at the steady state while in reality being in a recession. This will suggest that in the following periods there is a boom phase. Training the filter, as advocated by the authors, does not make up for not using a real Bayesian filter that returns a meaningful distribution of states depending on assumptions on the initial distribution.

3 Benchmark taken on a machine with 40 cores of 3.10GHz each.
4 Python can provide speed benchmarks that are on par with compiled languages such as Fortran while comprising the advantages of a high-level programming language. I would like to promote free and open software and advocate the avoidance of proprietary languages. Open source alternatives already provide by far more efficient and more flexible environments while avoiding barriers to scientific advance such as licensing and closed-source code.


2 Solution Method

Let $v_t$ be the $n \times 1$ vector containing the full set of variables. Define a constraint $(\beta, \bar r)$ on the variable $r_t$ to be a $2n \times 1$ vector and a scalar such that
$$r_t = \max\left\{ \beta^\intercal \begin{bmatrix} v_t \\ v_{t-1} \end{bmatrix},\ \bar r \right\}.$$
Similarly to Holden (2016), I rule out equilibrium paths that are due to long-term expectations other than the steady state. For the case in which the constraint is the ZLB, this assumption is equivalent to assuming full credibility of the monetary authority. Depending on this constraint, a linear economic model can be written in the form
$$\Gamma E_t v_{t+1} + \Lambda_u v_t + \Phi v_{t-1} + \Theta \varepsilon_t = 0 \iff \beta^\intercal \begin{bmatrix} v_t \\ v_{t-1} \end{bmatrix} \geq \bar r, \quad (1)$$
$$\Gamma E_t v_{t+1} + \Lambda_c v_t + \Phi v_{t-1} + \Sigma + \Theta \varepsilon_t = 0 \iff \beta^\intercal \begin{bmatrix} v_t \\ v_{t-1} \end{bmatrix} < \bar r. \quad (2)$$
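To fix notation with a hypothetical example: if the constrained variable is a nominal interest rate $i_t$ following a Taylor rule with smoothing, $i_t = \max\{\rho i_{t-1} + (1-\rho)\phi_\pi \pi_t,\ \bar r\}$, then $\beta$ collects the coefficient $(1-\rho)\phi_\pi$ at the position of $\pi_t$ in $v_t$ and $\rho$ at the position of $i_{t-1}$ in $v_{t-1}$ (zeros elsewhere), and $\bar r$ is the lower bound itself.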

Our first concern is to cast the above system into the following form

$$N \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} + c \max\left\{ b^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix},\ \bar r \right\} = E_t \begin{bmatrix} x_{t+1} \\ v_t \end{bmatrix}, \quad (3)$$
where it is implicitly assumed that $\Theta \varepsilon_t$ corresponds to some exogenous processes that are part of $v_t$.5 I hence make use of end-of-period notation.

This transformation is necessary for two reasons. First, the algorithm that determines the set of numbers of expected periods at the constraint (NEPC) is based on a guess-and-verify approach. In order to avoid computationally costly simulations of the complete time series for every guess, it is necessary to formulate the solution such that, given $v_t$ and the set of NEPC, the complete time path can be evaluated using a closed-form expression. One can imagine such a solution as a telescope into the future. The second reason making this transformation necessary is that it avoids matrix inversions at runtime, which considerably reduces the computational burden and further avoids problems with singular or near-singular system matrices.6

2.1 System Transformation

Define $x_t$ to be the $m \times 1$ vector containing forward looking variables or variables that appear in the constraint at time $t$. $x_t$ hence corresponds to variables with nonzero columns in either $\Gamma$ or $\beta$. Obtain $\Gamma$ and $\beta$ by removing all columns from these objects that do not correspond to $x_t$. Further, for convenience define $y_t = \begin{bmatrix} x_t & v_{t-1} \end{bmatrix}^\intercal$. The above system can now be expressed as
$$M y_t + \kappa \max\left\{ \beta^\intercal y_t,\ \bar r \right\} = P E_t y_{t+1} \quad (4)$$

5 To be precise, $v_{t-1}$ at the beginning of the period is defined such that it contains the noise innovation. Alternatively one could add a third type of variable $w_t$ only for exogenous processes and use the admittedly more correct notation with $w_t$ on the LHS and $w_{t+1}$ on the RHS.

6 It is likely that the unstable behavior reported by users of the Guerrieri and Iacoviello (2015) method is due to the numerical inversion of rank deficient matrices.


with $M = \begin{bmatrix} 0 & \Phi \\ I & 0 \end{bmatrix}$ and $P = \begin{bmatrix} \Gamma & \Lambda_c \\ 0 & I_{y \to x} \end{bmatrix}$. The $(n+m) \times 1$ column vector $\kappa$ contains the coefficients of the constrained variable on $y_t$, which can be readily obtained from $\Lambda_c$ and $\Lambda_u$. I am purposefully omitting the representation of the shocks as I will represent them through the state system only.

Generally, $P$ cannot be assumed to be nonsingular. It is however possible to convert the system such that this can be safely assumed. For that purpose, let $U S V^\intercal = P$ be the singular value decomposition (SVD) of $P$ and define $\tilde M = U^\intercal M$, $\tilde\kappa = U^\intercal \kappa$. Let $s_0$ be the rank of $P$ and note that $\tilde P = S V^\intercal = \begin{bmatrix} P_1 & P_2 \\ 0 & 0 \end{bmatrix}$, following the logic of the SVD, has $n + m - s_0$ rows of zeros at the bottom. Write out System (4) as
$$\begin{bmatrix} \tilde M_{11} & \tilde M_{12} \\ \tilde M_{21} & \tilde M_{22} \end{bmatrix} \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} + \begin{bmatrix} \tilde\kappa_1 \\ \tilde\kappa_2 \end{bmatrix} \max\left\{ \beta^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix},\ \bar r \right\} = \begin{bmatrix} P_1 & P_2 \\ 0 & 0 \end{bmatrix} E_t \begin{bmatrix} x_{t+1} \\ v_t \end{bmatrix}. \quad (5)$$

Let me assume $\tilde\kappa_2 = 0$.7 The bottom part can be iterated forward to read $\tilde M_{21} E_t x_{t+1} + \tilde M_{22} v_t = 0 = \tilde M_{21} x_t + \tilde M_{22} v_{t-1}$. Accordingly, define $P_M = \begin{bmatrix} P_1 & P_2 \\ \tilde M_{21} & \tilde M_{22} \end{bmatrix}$ and acknowledge that this matrix must be invertible since the upper $(s_0 \times (n+m))$ part has rank $s_0$ and the lower part contains (time-forwarded) definitions of either lagged or static variables. If these are singular, the model is likely to be unsolvable regardless of the method. Obtain the desired system in (3) with $N = P_M^{-1} \tilde M$ and $c = P_M^{-1} \tilde\kappa$.

Implementation: parsing and system transformation

Model parsing is done by a modified older fork of dolo that is included in the pydsge package available on my GitHub account https://github.com/gboehl.8 The parser provides the matrices from Equations (1) represented as functions of the parameters. The above transformation is implemented using standard libraries only (namely numpy and sympy). Since this transformation has to be done only once for each point in the parameter space in the Monte Carlo loop, its computational costs are negligible in comparison to the actual filtering to get the likelihood.
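As an illustration, a minimal numpy sketch of the transformation above could read as follows (the function name and conventions are mine, not the pydsge code; it assumes a square system, $\tilde\kappa_2 = 0$, and a simple tolerance for the numerical rank):

import numpy as np

def transform_system(M, P, kappa):
    # rotate the system with the SVD of P such that the zero rows of
    # S V' end up at the bottom (Section 2.1)
    U, S, Vh = np.linalg.svd(P)
    M_til = U.T @ M
    kap_til = U.T @ kappa
    P_til = np.diag(S) @ Vh
    s0 = np.sum(S > 1e-12)  # numerical rank of P (heuristic tolerance)

    # stack the nonzero rows of P_til on the time-forwarded bottom rows
    # of M_til; this matrix is assumed to be invertible in the text
    PM = np.vstack((P_til[:s0], M_til[s0:]))
    N = np.linalg.solve(PM, M_til)
    c = np.linalg.solve(PM, kap_til)
    return N, c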

2.2 Main method: one constraint

To fix ideas let me first assume that the constraint already binds in period $t$, hence that there is no transition period from $t = 0$ in which the constraint is slack to a period $l \neq 0$ in which it binds. Transitioning to the constraint is then a straightforward extension explained in detail in the next subsection. The system $(N + cb^\intercal)$, containing the steady

7 This condition is likely to be satisfied, however not under all circumstances. In the latter case it is always possible to rearrange the system such that it is satisfied, notably by substituting out the constrained variable in the row $i$ where $\kappa_2^{(i)} \neq 0$.

8 dolo is a software package written in Python to describe and solve economic models. The current version can be found at https://github.com/EconForge/dolo.


state, will be called the unconstrained system. In order to have a uniquely determined solution, the generalized Blanchard-Kahn conditions (Blanchard and Kahn, 1980; Sims, 2002) must be satisfied for this system. $N$ likewise is the system matrix of the constrained system. System (3) can alternatively be rewritten as

$$E_t \begin{bmatrix} x_{t+1} \\ v_t \end{bmatrix} = \begin{cases} (N + cb^\intercal) \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} & \forall\ b^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} - \bar r \geq 0 \\ N \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} + c\bar r & \forall\ b^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} - \bar r < 0. \end{cases} \quad (6)$$

From here it is easy to see that the system is continuous at the constraint since it holds that if $b^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} = \bar r$, then $\max\{\bar r, \bar r\} = \bar r$ and in both systems the expectations are equal. Put differently, both systems are perturbed around the (same) unconstrained steady state.9

Let $k$ be the NEPC in period $t$. Denote a rational expectations solution to (3) given $k$ and state variables $v_{t-1}$ as the function $S$ such that
$$x_t = S(k, v_{t-1}). \quad (8)$$

Definition 1 (No-transition equilibrium). Assuming no transition, a rational expectations solution $S(k^*)$ for an expected number of periods $k^*$ at the constraint is a rational expectations equilibrium iff
$$b^\intercal E_t\left\{ y_{t+k^*} \mid S(k^*, v_{t-1}), v_{t-1} \right\} - \bar r \geq 0 > b^\intercal E_t\left\{ y_{t+k} \mid S(k^*, v_{t-1}), v_{t-1} \right\} - \bar r \quad (9)$$
for $k \in \{0, 1, \ldots, k^*-1\}$, hence if in expectations the system is constrained for exactly $k^*$ periods.

It is easy to see that the system implied by Equation (6) can only have a rational expectations equilibrium for a given $v_{t-1}$ if the constraint is not a repeller, i.e. if
$$b^\intercal S(0, v_{t-1}) < \bar r \implies b^\intercal S(1, v_{t-1}) < \bar r. \quad (10)$$
In the opposite case no solution for $v_{t-1}$ would be defined. Let me further assume that the forecasting error implied by the triangle inequality is marginal, i.e. if we rewrite (3) as $x_t = f(v_{t-1}, \varepsilon_t)$, then $E_{t-1} x_t \approx f(v_{t-1}, 0)$. Note that this assumption, although very common, could lead to misspecification.

The unconstrained system (N + cbᵀ) can be solved using familiar methods like the

9 Taken separately, the constrained system generally does have a different steady state $(x_c^*, v_c^*)$, i.e. the steady state of the constrained system is a nonzero deviation from the steady state of the unconstrained system with
$$(I - N) \begin{bmatrix} x_c^* \\ v_c^* \end{bmatrix} = c\bar r \neq 0 = \begin{bmatrix} x_u^* \\ v_u^* \end{bmatrix}. \quad (7)$$


QZ decomposition. Denote this (linear) solution by the matrix $\Omega$, where it holds that
$$x_t = \Omega v_{t-1} \quad \forall\ b^\intercal \begin{bmatrix} \Omega \\ I \end{bmatrix} v_{t-1} - \bar r \geq 0. \quad (11)$$
For $Q = \begin{bmatrix} I & -\Omega \end{bmatrix}$, Equation (11) implies that
$$Q E_t \begin{bmatrix} x_{t+k} \\ v_{t+k-1} \end{bmatrix} = 0 \quad \forall\ b^\intercal \begin{bmatrix} \Omega \\ I \end{bmatrix} E_t v_{t+k-1} - \bar r \geq 0, \quad (12)$$
i.e. for every future period $t+k$ in which the system is expected to be unconstrained (and to remain so).

Now assume that the constraint binds at time $t$ and will continue to do so until at least period $t+k-1$. Iterating System (6) forward yields
$$E_t \begin{bmatrix} x_{t+k} \\ v_{t+k-1} \end{bmatrix} = N^k \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} + (I-N)^{-1}(I-N^k) c \bar r \quad \forall\ b^\intercal E_t \begin{bmatrix} x_{t+k-1} \\ v_{t+k-2} \end{bmatrix} - \bar r < 0, \quad (13)$$

where $(I-N)^{-1}(I-N^k) = \sum_{i=0}^{k-1} N^i$ is the transformation for a geometric series of matrices. Assuming that the system is unconstrained at $t+k$ we can pre-multiply Equation (13) by $Q$ to obtain

$$Q N^k \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} = -Q(I-N)^{-1}(I-N^k) c \bar r, \quad (14)$$
$$k = 0 \ \text{iff}\ b^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} - \bar r \geq 0, \quad (15)$$
$$k = 1 \ \text{iff}\ b^\intercal E_t \begin{bmatrix} x_{t+1} \\ v_t \end{bmatrix} - \bar r \geq 0 > b^\intercal \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} - \bar r, \quad (16)$$
$$\ldots \quad (17)$$
$$k = j \ \text{iff}\ b^\intercal E_t \begin{bmatrix} x_{t+j} \\ v_{t+j-1} \end{bmatrix} - \bar r \geq 0 > b^\intercal E_t \begin{bmatrix} x_{t+j-1} \\ v_{t+j-2} \end{bmatrix} - \bar r, \quad (18)$$

which implies a solution of the endogenous variables $x_t$ in terms of the state variables $v_{t-1}$ given $k$. Since $c$ is a vector of (known) constants, the whole RHS of (14) is a (known) vector. As suggested in Equation (8), denote this solution by

$$S(k, v_{t-1}) = \left\{ \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} : Q N^k \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} = -Q(I-N)^{-1}(I-N^k) c \bar r \right\}. \quad (19)$$
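Operationally, (19) is just an $m$-dimensional linear system in $x_t$ once $k$ is fixed. The following numpy sketch (my own illustration; naming and argument conventions are mine) solves it, using the geometric sum form of $(I-N)^{-1}(I-N^k)$ so that $I-N$ is never inverted:

import numpy as np

def S_of_k(Q, N, c, r_bar, v, k):
    m, dim = Q.shape[0], N.shape[0]
    # powers of N and the geometric sum sum_{i<k} N^i in one pass
    Nk, geo = np.eye(dim), np.zeros_like(N)
    for _ in range(k):
        geo = geo + Nk   # ends as sum_{i<k} N^i
        Nk = Nk @ N      # ends as N^k

    # Equation (14): Q N^k [x; v] = -Q geo c r_bar, solved for x
    A = Q @ Nk
    rhs = -Q @ geo @ c * r_bar
    return np.linalg.solve(A[:, :m], rhs - A[:, m:] @ v)

For $k = 0$ this reduces to $x_t = \Omega v_{t-1}$, since then $A = Q = \begin{bmatrix} I & -\Omega \end{bmatrix}$ and the right hand side is zero.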

Using Equations (13) and (19) we can express the expectations on the state of the economy in period $t+k$ as the (preliminary) function $L$ with
$$L_k(k, v_{t-1}) = N^k \begin{bmatrix} S(k, v_{t-1}) \\ v_{t-1} \end{bmatrix} + (I-N)^{-1}(I-N^k) c \bar r \quad (20)$$
$$= E_t \begin{bmatrix} x_{t+k} \\ v_{t+k-1} \end{bmatrix}. \quad (21)$$


Note that $L(0, v_{t-1}) = S(0, v_{t-1}) = \Omega v_{t-1}$. The remaining task is to actually find the equilibrium $k^*$. This is trivial since the function $L$ can be used to rephrase the conditions in Definition 1 as
$$k^* = \min_k \left\{ k \ \text{s.t.}\ b^\intercal L_k(k, v_{t-1}) - \bar r \geq 0 \right\}. \quad (22)$$
A simple result on existence and uniqueness can be found in Appendix A.

Implementation (sketch): no transition to the constraint

The case in which the system jumps to the constraint on impact of the shock can be implemented very straightforwardly (this case is not actually implemented). Starting with k = 0 we can simply check for each k whether $b^\intercal L_k(k) > \bar r$, i.e. if the system is unconstrained at period k. Once this condition is satisfied we are done.

k = 0
while b @ L(k, v) - r_bar < 0:
    # still constrained in expectations at t + k: try a longer spell
    k += 1

2.3 Transition to the constraint

For empirically relevant models it is necessary, due to persistence in state variables, to allow for the case that shocks do not immediately push the system to the constraint but initiate a transition towards it. It is straightforward to take Equation (19) as a starting point. We need to add the number of periods $l$ in the unconstrained system $(N + cb^\intercal)$ until the system is at the constraint, as in

$$S(l, k, v_{t-1}) = \left\{ \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} : Q N^k (N + cb^\intercal)^l \begin{bmatrix} x_t \\ v_{t-1} \end{bmatrix} = -Q(I-N)^{-1}(I-N^k) c \bar r \right\}, \quad (23)$$

and likewise extend our definition of $L$ such that $L_s(l, k, v_{t-1})$ equals $E_t y_{t+s}$, i.e. the expectation at time $t$ of the system state at time $t+s$ given the assumption that the system will be unconstrained for $l$ periods and then constrained for $k$ periods. Convince yourself that it holds that

$$L_s(l, k, v_{t-1}) = N^{\max\{s-l,\,0\}} (N + cb^\intercal)^{\min\{l,\,s\}} \begin{bmatrix} S(l, k, v_{t-1}) \\ v_{t-1} \end{bmatrix} \quad (24)$$
$$\qquad + (I-N)^{-1}\left(I - N^{\max\{s-l,\,0\}}\right) c \bar r, \quad (25)$$
and note that $L_k(0, k, v_{t-1}) = L_k(k, v_{t-1})$ and $L_0(l, k, v_{t-1}) = S(l, k, v_{t-1})$ are special cases, respectively.
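A direct numpy transcription of (23)-(25) could look as follows (an illustrative sketch with hypothetical argument conventions; footnote 10 describes the precomputation that the reference implementation uses instead):

import numpy as np

def L_s(Q, N, c, b, r_bar, v, l, k, s):
    m, dim = Q.shape[0], N.shape[0]
    Ncb = N + np.outer(c, b)  # unconstrained system matrix (N + cb')

    def geo(j):
        # (I - N)^{-1} (I - N^j) = sum_{i<j} N^i, without inverting I - N
        g, p = np.zeros_like(N), np.eye(dim)
        for _ in range(j):
            g, p = g + p, p @ N
        return g

    # S(l, k, v) from Equation (23): solve for x_t given (l, k)
    A = Q @ np.linalg.matrix_power(N, k) @ np.linalg.matrix_power(Ncb, l)
    rhs = -Q @ geo(k) @ c * r_bar
    x = np.linalg.solve(A[:, :m], rhs - A[:, m:] @ v)
    y = np.concatenate((x, v))

    # Equations (24)-(25): propagate y forward s periods
    sl = max(s - l, 0)
    return (np.linalg.matrix_power(N, sl)
            @ np.linalg.matrix_power(Ncb, min(l, s)) @ y
            + geo(sl) @ c * r_bar)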

Using this specification, Definition 2 summarizes the conditions for existence of anequilibrium.


Definition 2 (Transition equilibrium). A rational expectations solution $S(l^*, k^*)$ is a rational expectations equilibrium iff
$$b^\intercal L_s(l^*, k^*) \geq \bar r \quad \forall\ s < l^* \ \lor\ s \geq k^* + l^* \quad (26)$$
and
$$b^\intercal L_s(l^*, k^*) < \bar r \quad \forall\ l^* \leq s < k^* + l^*. \quad (27)$$

This is very helpful as we are now able to check whether or not an $(l^*, k^*)$ equilibrium exists, and we can also find it in finite time.

Implementation: transition to the constraint

The implementation of this case is only slightly more complicated and nests the no-transition case. First, define some maximum values for l and k. Then check whether no constraint until l_max can be an equilibrium.

l, k = 0, 0
for l in range(l_max):
    if b @ L(l, 0, l, v) - r_bar < 0:
        # the constraint binds in expectations: no unconstrained equilibrium
        break
else:
    # the loop ran through: l = k = 0 is an equilibrium
    return 0, 0
...

If this is the case, exit. Otherwise assume k > 0 and iterate over l and k until the equilibrium conditions in (9), (26) and (27) are satisfied.

...
for l in range(l_max):
    for k in range(1, k_max):
        if l:
            # unconstrained on impact and just before the spell ...
            if b @ L(l, k, 0, v) - r_bar < 0:
                continue  # skips to the next candidate (l, k)
            if b @ L(l, k, l - 1, v) - r_bar < 0:
                continue
        # ... unconstrained again right after the spell ...
        if b @ L(l, k, k + l, v) - r_bar < 0:
            continue
        # ... and constrained at the beginning and end of the spell
        if b @ L(l, k, l, v) - r_bar > 0:
            continue
        if b @ L(l, k, k + l - 1, v) - r_bar > 0:
            continue
        # if we made it here, this must be an equilibrium
        return l, k
# if the loop went through without finding an equilibrium, throw a warning
warn('No equilibrium exists!')

This method is very simple computationally: one main loop iterates on the durations l and k. Checking each condition only implies the execution of one matrix (dot) multiplication.10 As a result, the implementation is able to process approximately 80,000 particles per second. It is also included in the pydsge package to be found at https://github.com/gboehl/pydsge.

2.4 Main method: several constraints

This section is under construction.

Outline: the problem with several constraints is that for each constraint $i$ an $(l_i, k_i)$ solution must be found. $(l_i, k_i)$ is likely to depend on $(l_{j \neq i}, k_{j \neq i})$ and vice versa. The problem with the brute force approach from the previous subsection is that the set of possible spell durations grows exponentially with the number of constraints (curse of dimensionality). Finding a RE solution will imply iterating over the complete set, which quickly turns out to be time consuming as the number of constraints increases. This problem can be avoided by converting the integer-based problem of finding $(l_i, k_i)\ \forall i \in I$ into a regular continuous root finding problem. This can be achieved by acknowledging that $A^n = Q \Lambda^n Q^{-1}$, where $Q \Lambda Q^{-1}$ is the eigendecomposition of $A$. As $\Lambda$ is a diagonal matrix, $\Lambda^n$ is also defined for non-integer values of $n$. Using this, $L_s(l, k, \cdot)$ can be expressed for real $s, l, k$ and the equilibrium conditions can be expressed as a root finding problem for $l$, $k$ and two dummy conditions. For example, for $l^*, k^* > 0$ to be an equilibrium, it must be satisfied that $L_{l^*}(l^*, k^*) = 0$ and $L_{l^*+k^*}(l^*, k^*) = 0$, and that (dummy conditions) $L_{l^*+k^*/2}(l^*, k^*) + |a_1| = 0$ and $L_{l^*+k^*+1}(l, k) - |a_2| = 0$ for at least one pair of dummies $(a_1, a_2)$. This means every constraint adds 4 equations to a standard nonlinear root finding problem. Given some regularity assumptions on these equations, all $(l_i, k_i)$ can be found simultaneously and fairly efficiently using standard numerical methods for root finding.
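The key ingredient, a matrix power at non-integer exponents, is a one-liner given the eigendecomposition. A minimal sketch (my own illustration; it assumes $A$ is diagonalizable and takes the real part after a complex power):

import numpy as np

def frac_mpow(A, n):
    # A^n = Q Lam^n Q^{-1} for real n via the eigendecomposition of A
    lam, Q = np.linalg.eig(A)
    return np.real(Q @ np.diag(lam.astype(complex) ** n) @ np.linalg.inv(Q))

With $L_s$ evaluated at real arguments this way, the conditions above can be handed to a standard root finder such as scipy.optimize.root.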

3 A Nonlinear Bayesian Filter

We are interested in solving the filtering problem, which means we (a) want to infer the likelihood of the data, (b) want to estimate the distribution of all model states given the data (observability), and (c) want to obtain the shock series that is able to reproduce the mode of the filtered states (recoverability).

The law of motion resulting from the method introduced in the previous section is a – potentially highly – nonlinear mapping from $v_{t-1}$ to $v_t$. Accordingly, we cannot use a linear filter like the standard Kalman filter but must use a nonlinear filter. There is a growing literature on applying particle filters (also called Sequential Monte Carlo

10 For a further speed boost, the function $L_s(l, k, v_t)$ is decomposed into a matrix $L^M_{l,s} = N^{\max\{s-l,0\}} (N + cb^\intercal)^{\min\{l,s\}}$ and the vector $L^v_{l,s} = (I-N)^{-1}(I-N^{\max\{s-l,0\}}) c \bar r$, which both only depend on $l$ and $s$. The function $S(l, k, v_t)$ can likewise be decomposed into the matrix that is pre-multiplied to $v_t$ and a vector, which both depend on $l$ and $k$. Given a reasonable set of $l, k$-values, these four objects can be pre-calculated for every point in the parameter space. This is done in less than 3 milliseconds for the Smets and Wouters (2007) benchmark. For the implementation the just-in-time (jit) compiler from the numba package is used.


methods) to economic models and data (see e.g. An and Schorfheide, 2007; Fernandez-Villaverde and Rubio-Ramírez, 2007; Herbst and Schorfheide, 2017). These methods are relatively simple to implement but require an extremely high number of particles (i.e. transition function evaluations). For the benchmark model of Smets and Wouters (2007), estimates of the number of necessary particles range from at least 40,000 particles in Herbst and Schorfheide (2017) to – more realistically due to the curse of dimensionality – about 1,500,000 particles as in Gust et al. (2017). There is not yet a consensus on the necessary number of particles depending on the dimensionality of the problem.

As advocates of the particle filter, Gust et al. (2017) successfully use it to estimate a nonlinear medium scale DSGE model. Atkinson et al. (2018) globally solve a smaller New Keynesian model. They as well document that the estimation using the particle filter is most accurate in recovering true parameter values from an artificial data set (using 40,000 particles), while the UKF is only twice as fast. However, estimation exercises with both filters require super-computers.11 In contrast, Andreasen (2013) documents that Sigma point filters like the one discussed further below are not only much faster but can also be more accurate than particle filters. As laid out in several sources (for instance Binning and Maih, 2015), particle filter methods can further be subject to numeric instability.

There exists a family of Kalman filters that are adapted to work with mildly nonlinear models under the fundamental assumption that the distribution of states at each period is approximately Gaussian. Whether or not one of these filters does a good job depends much on the transition function and the concrete application. Clearly, none of these filters in their pure form will suffice for the tasks (a) to (c) listed above. But as we will see, combining particle filter and Kalman technology with iterative methods will yield very precise results while being computationally far more effective than the pure particle filter.

The first early nonlinear candidate (Smith et al., 1962; McElhoe, 1966) from the Kalman family is the Extended Kalman Filter (EKF). The EKF replaces the state transition and observation matrices with the Jacobian matrices of the transition and observation functions around the previous period's mean. These Jacobians are normally approximated numerically. While considerably fast, this filter has a number of drawbacks. It can diverge fast if nonlinearities are more severe or if observations fluctuate a lot. Given that it does not use the actual transition function, covariances tend to be underestimated. Shock innovations extracted by the EKF are likely to be unable to reproduce the filtered states. Despite these drawbacks the EKF is still widely used, in particular in engineering and related disciplines.

The second prominent member is the Unscented Kalman Filter (UKF, Julier et al., 2000). The UKF relies on a deterministic sampling technique, the so-called Sigma points, that aims to minimize the number of points necessary at each iteration. These points are then propagated through the true nonlinear transition function, after which mean and covariance are calculated analogously to the linear Kalman filter (this is called the unscented transform).12 The quality of the estimates crucially depends on the choice of appropriate Sigma points in order to represent the nonlinearity of the dynamic system sufficiently well.

The approximation for each direction in a certain dimension relies on only one sample

11 640 datasets, 115 hours/dataset (UKF), 313 hours/dataset (particle filter); 20 cores per MCMC.
12 Technical details can be found in Julier et al. (2000), Wan and Van Der Merwe (2000) and Julier (2002).


point. If this point lies too far off while the transition function is strongly cubic in at least one dimension – as for instance is the case with the ZLB – the gradient will be overestimated and the respective step in the state will be understated. Hence, in general the UKF tends to overestimate the tails and, in the worst case, the filter will simply diverge. This means that the quality of the filtering results crucially depends on the parameterization of the Sigma points. Unfortunately, a calibration that yields reasonably precise estimates for one draw from the parameter distribution must not necessarily do so for a different draw. For the example of the ZLB, the filter will evaluate points in the state space that imply ZLB spells of more than 20 periods. Further problematic is the filter's dependence on the matrix square root to find the optimal Sigma points. This problem deteriorates when the covariance matrix is close to singular, which is very likely the case for models with occasionally binding constraints.13

The filter presented here draws on yet another member of the Kalman class, the so-called Ensemble Kalman Filter (EnKF) introduced in Evensen (1994). The EnKF is a shifting-based method as opposed to a reweighting-based method such as the particle filter. Although used in many applications ranging from weather forecasting to target tracking, as Katzfuss et al. (2016) point out, the filter is remarkably unknown in the statistics community. Instead of using a deterministic sampling technique, the initial distribution of points is sampled stochastically just as with the particle filter. After passing them through the transition function, each particle is updated using the Kalman transformation. The EnKF is hence a degenerate of the particle filter, but under the assumption that the state distribution in each period is approximately Gaussian. While this filter performs well when it comes to likelihood approximation and inference of the distribution of hidden states, it is still an approximation and hence fails in regard to exact recoverability.

The first difference between the filter introduced here and the EnKF is that the latter assumes that the size of the state space is larger than the number of particles. The "transposed" formulation of the filter here is intended for the opposite case, where the state space is smaller than the number of particles. The rest of this chapter is concerned with extending the transposed EnKF such that probabilistic state inference is improved while fully accounting for the nonlinearity of the transition function. Inspired by the iterative extended Kalman filter (Zhang, 1997), I make use of iterative techniques to find the true mode of the states by fitting the shock innovations. That way, recoverability of the filtered states by the shock innovations can be maintained. The result is an iterative path-adjusting transposed-ensemble Rauch-Tung-Striebel smoother (IPA smoother).

3.1 Filter

Let us for the moment diverge from the notation of the previous chapter and denote a potentially nonlinear hidden Markov model (HMM) by
$$x_t = g(x_{t-1}, \varepsilon_t), \quad (28)$$
$$z_t = h(x_t) + \nu_t \quad (29)$$

13 If the constraint binds, the derivative with respect to the constrained variable is zero.


with $\varepsilon_t \sim \mathcal{N}(0, Q)$ and $\nu_t \sim \mathcal{N}(0, R)$. Since almost all objects in this chapter are either vectors or matrices, I will drop the convention of using boldfaced characters except for the ensembles.

Let me denote by $\mathbf{X}_t = [x_t^1, \cdots, x_t^N] \in \mathbb{R}^{n \times N}$ the ensemble at time $t$, which consists of $N$ vectors of the state. Further denote by $(\bar x_t, P_t)$ the mean and the covariance matrix of the unconditional distribution of states for period $t$. Initialize the ensemble by sampling $N$ times from the prior distribution
$$\mathbf{X}_0 \overset{N}{\sim} \mathcal{N}(\bar x_0, P_0). \quad (30)$$

Step 1: Predict

Predict the prior ensemble $\mathbf{X}_{t|t-1}$ at time $t$ by applying the transition function to the posterior ensemble from last period. Use the observation function to obtain a prior ensemble of observables
$$\mathbf{X}_{t|t-1} = g(\mathbf{X}_{t-1|t-1}, \varepsilon_t), \quad (31)$$
$$\mathbf{Z}_{t|t-1} = h(\mathbf{X}_{t|t-1}) + \nu_t, \quad (32)$$
where $\varepsilon_t$ and $\nu_t$ are each $N$ realizations drawn from the respective distributions.

Step 2: Update

Denote by $\bar{\mathbf{X}}_t = \mathbf{X}_t (I_N - \mathbf{1}\mathbf{1}^\intercal / N)$ the anomalies of the ensemble, i.e. the deviations from the ensemble mean. Recall that the covariance matrix of the prior distribution at $t$ is $\frac{\bar{\mathbf{X}}_t \bar{\mathbf{X}}_t^\intercal}{N-1}$. The Kalman mechanism then yields an update step of
$$\mathbf{X}_{t|t} = \mathbf{X}_{t|t-1} + \bar{\mathbf{X}}_{t|t-1} \bar{\mathbf{Z}}_{t|t-1}^\intercal \left( \bar{\mathbf{Z}}_{t|t-1} \bar{\mathbf{Z}}_{t|t-1}^\intercal \right)^{-1} \left( z_t \mathbf{1}^\intercal - \mathbf{Z}_{t|t-1} \right). \quad (33)$$

The mechanism is similar to the UKF, but with particles instead of deterministic Sigma points and statistical linearization instead of the unscented transform. This helps us to avoid the dependence of the filtering result on the parameterization of the filter. Conceptually this procedure can hence be seen as a transposition of the EnKF.14
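The following numpy sketch condenses one filtering step, Equations (31)-(33), into a few lines (my own illustration, not the econsieve code; for brevity it assumes a linear observation function $h(x) = Hx$):

import numpy as np

def tenkf_step(X, g, H, z, Q, R, rng):
    # predict (31)-(32): propagate each particle and draw fresh noise
    n, N = X.shape
    eps = rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size=N).T
    nu = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=N).T
    Xp = np.column_stack([g(X[:, i], eps[:, i]) for i in range(N)])
    Zp = H @ Xp + nu

    # anomalies: deviations from the ensemble means
    Xa = Xp - Xp.mean(1, keepdims=True)
    Za = Zp - Zp.mean(1, keepdims=True)

    # update (33): shift every particle with the Kalman transformation
    gain = Xa @ Za.T @ np.linalg.inv(Za @ Za.T)
    return Xp + gain @ (z[:, None] - Zp)

The period-$t$ contribution to the likelihood can then be approximated from the Gaussian implied by the mean and covariance of $\mathbf{Z}_{t|t-1}$.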

3.2 Smoothing

The process of using all available information on all estimates is called smoothing. For this purpose I make use of the Rauch-Tung-Striebel smoother (Rauch et al., 1965) in its ensemble formulation, similar to Raanes (2016).

14 Notationally both are equivalent. The regular EnKF assumes the size of the state space to be larger than $N$, and accordingly the term $\left( \bar{\mathbf{Z}}_{t|t-1} \bar{\mathbf{Z}}_{t|t-1}^\intercal \right)$ to be rank deficient. The mechanism then builds on the properties of the pseudoinverse (which provides a least squares solution to a system of linear equations), which is used instead of the regular matrix inverse.


Denote by $T$ the period of the last observation available and update each ensemble according to the backwards recursion15
$$\mathbf{X}_{t|T} = \mathbf{X}_{t|t} + \bar{\mathbf{X}}_{t|t} \bar{\mathbf{X}}_{t+1|t}^{+} \left[ \mathbf{X}_{t+1|T} - \mathbf{X}_{t+1|t} \right]. \quad (35)$$
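In code, the backward pass is a short loop (again an illustrative numpy sketch with hypothetical conventions; following footnote 15, the gain is evaluated in its numerically safer product form):

import numpy as np

def ensemble_rts(Xf, Xp):
    # Xf[t] is the filtered ensemble X_{t|t}, Xp[t] the prediction X_{t|t-1}
    anom = lambda X: X - X.mean(1, keepdims=True)
    Xs = list(Xf)
    for t in range(len(Xf) - 2, -1, -1):
        Xa, Xpa = anom(Xf[t]), anom(Xp[t + 1])
        # gain of Equation (35) via the LHS form of footnote 15
        J = Xa @ Xpa.T @ np.linalg.pinv(Xpa @ Xpa.T)
        Xs[t] = Xf[t] + J @ (Xs[t + 1] - Xp[t + 1])
    return Xs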

3.3 Iterative Path-adjusting

We have created a series $\{\mathbf{X}_{t|T}\}_{t=0}^{T}$ of representatives of the distributions of states at each point in time, reflecting all the available information. We now want to ensure that the mode of the distribution fully reflects the nonlinearity of the transition function while retaining a reasonably good approximation of the full distribution. For economic analysis we are also interested in the series of shocks, $\{\varepsilon_t\}_{t=0}^{T-1}$, that fully recovers the mode of the smoothed states. It is very important that the smoothed distributions are targeted instead of, e.g., just the distributions of observables and shocks. Only when the full smoothed distributions are targeted can it be maintained that all available information is taken into account. This procedure implicitly assumes that the smoothed distributions approximate the actual transition function sufficiently well and that only minor adjustments remain necessary. Since in general there are (many) more states than exogenous shocks, the fitting problem is underdefined and the matching precision will depend on the size of the relative (co)variance of each variable. Small observation errors lead to small variances around observable states and tight fitting during path adjustment, while loosely identified states grant more leeway.

Initiate the algorithm with $\hat x_0 = E\,\mathbf{X}_{0|T}$ (the mean vector over the ensemble members), define $P_{t|T} = \operatorname{Cov} \mathbf{X}_{t|T}$ and for each period $t$ recursively find
$$\hat\varepsilon_t = \arg\max_{\varepsilon} \log f\left( g(\hat x_{t-1}, \varepsilon) \mid \bar x_{t|T}, P_{t|T} \right), \quad (36)$$
$$\hat x_t = g(\hat x_{t-1}, \hat\varepsilon_t), \quad (37)$$
which can be done using standard iterative methods. Numerical details can be found in Appendix B together with a brief discussion on numeric stability.
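A bare-bones version of this recursion could look as follows (an illustrative sketch; the helper and its conventions are mine, and a derivative-free optimizer is used here because the OBC puts kinks into $g$):

import numpy as np
from scipy.optimize import minimize

def path_adjust(g, x0, means, covs, n_eps):
    # recursion (36)-(37): fit the innovation of each period such that
    # g(x_{t-1}, eps) maximizes the Gaussian density of the smoothed states
    x, path, shocks = x0, [], []
    for mean, cov in zip(means, covs):
        icov = np.linalg.pinv(cov)
        # minimizing the Mahalanobis distance maximizes the log density
        obj = lambda e: (g(x, e) - mean) @ icov @ (g(x, e) - mean)
        res = minimize(obj, np.zeros(n_eps), method='Nelder-Mead')
        x = g(x, res.x)
        path.append(x)
        shocks.append(res.x)
    return path, shocks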

The resulting series of $\hat x_t$ corresponds to the effective (approximate) mode given the initial mean and approximated covariances and is completely recoverable by $\hat\varepsilon_t$. Naturally, it represents the nonlinearity of the transition function while taking all available information into account. Since the deviation between the mode $\hat x_t$ and the mean $\bar x_t$ should be marginal, I will refer to
$$\{\hat x_t, P_t\}_{t=0}^{T} \quad (38)$$
as the path-adjusted smoothed distributions.16

15 Although it is formally correct that
$$\bar{\mathbf{X}}_{t|t} \bar{\mathbf{X}}_{t+1|t}^\intercal \left( \bar{\mathbf{X}}_{t+1|t} \bar{\mathbf{X}}_{t+1|t}^\intercal \right)^{+} = \bar{\mathbf{X}}_{t|t} \bar{\mathbf{X}}_{t+1|t}^{+}, \quad (34)$$
the implementation using the LHS of this equation is numerically more stable when using standard implementations of the pseudoinverse based on the SVD.

16 Unfortunately the adjustment step cannot be done during the filtering stage already. Iterative adjustment before the prediction step would bias the transition of the covariance. Likewise, adjusting after the prediction step would require repeating the prediction and updating steps, leading to a potentially infinite loop.


Implementation

The IPA smoother is implemented in the econsieve package which can be foundat https://github.com/gboehl/econsieve. Further speedup is again achievedby compiling with numba.

As proposed in Ungarala (2012), one could also iterate the whole ensemble instead of the mode. However, as they note, the covariance of the ensemble is already determined by the first update step and does not change with further iterations. Then a procedure that iterates the whole ensemble has the sole advantage that Newton-Raphson iterative schemes can be used with the covariance matrix as an approximation of the Jacobian. By fitting the residuals to find the mode of the distributions, the latter are explicitly taken into account. When iterating over the ensemble mean, much emphasis would lie on the point estimate. Further, iterating over the whole ensemble is computationally very expensive as it requires $N$ function evaluations per iteration, and through the stepwise fashion of the transition function convergence may be problematic. A last advantage of the procedure presented here is that errors in general do not accumulate. Although initially both the EnKF and – more so – the UKF will overestimate shocks from the tails of the distribution, like e.g. a financial crisis shock (and hence underestimate the likelihood thereof), subsequent state estimates will automatically correct the mistake.

4 Estimation

This section is under construction.

For posterior sampling I suggest avoiding methods that rely on mode maximization, as these are prone to get stuck in local maxima. But even if the global mode could be found using respective optimization techniques, some odd-shaped likelihood functions might not allow the MCMC algorithm to fully explore the posterior distribution (e.g. because the posterior is bimodal, or because it is flat with many local spikes).

I propose a tempering extension to the differential evolution Markov chain Monte Carlo (DE-MCMC) method suggested by Ter Braak (2006) and ter Braak and Vrugt (2008). Tempering can be done in many ways, e.g. one could follow the lines of Herbst and Schorfheide (2014). The tempering algorithm is built on top of the implementation of Goodman and Weare (2010).

The DE-MCMC method is a class of ensemble MCMC methods. Instead of using a single or a small number of chains that are state dependent (as e.g. in the Metropolis algorithm), ensemble samplers use a large number of chains (the "ensemble"). For each iteration, proposals are generated based on the current state of the ensemble. These methods are hence self-tuning and – if at all – only require hyperparameters. Another advantage of ensemble samplers is that massive parallelization is straightforward. Ensemble methods have been extensively applied in particular in the field of astrophysics.
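For concreteness, the core of the differential evolution proposal of Ter Braak (2006) fits in a few lines (an illustrative sketch, not the implementation used here; gamma is commonly set near $2.38/\sqrt{2d}$ for $d$ parameters):

import numpy as np

def de_proposal(chains, i, gamma, rng, noise=1e-5):
    # move chain i along the difference of two other ensemble members
    others = [j for j in range(len(chains)) if j != i]
    r1, r2 = rng.choice(others, size=2, replace=False)
    jitter = noise * rng.standard_normal(chains.shape[1])
    return chains[i] + gamma * (chains[r1] - chains[r2]) + jitter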

Several methods to initialize such an ensemble are conceivable. Goodman and Weare (2010) suggest initializing it as a small ball around some initial value. This however bears the risk that the ensemble cannot fully unfold due to odd or irregularly shaped posteriors.

I suggest initializing the ensemble with the prior distribution. This will put equal initial weight on each region of the parameter space. In order for particles not to


leave low-density regions too early, a tempering scheme can be used. The temperature $\lambda$ here is the weight of the likelihood in the posterior. A $\lambda = 0$ posterior hence is identical to the prior. A gradual increase of $\lambda$ towards one also allows sampling odd-shaped or near-flat distributions, as some particles will remain in low-likelihood-high-prior regions for an extended period of time.
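In other words, the sampler targets $\log p_\lambda(\theta) = \lambda \log \mathcal{L}(\theta) + \log p(\theta)$, with $\lambda$ raised from 0 to 1 over the course of the iterations. As a trivial sketch of this reweighting (illustration only):

def tempered_logpost(log_prior, log_like, lam):
    # lam = 0 reproduces the prior, lam = 1 the full posterior
    return lambda theta: log_prior(theta) + lam * log_like(theta)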

5 The Simple New Keynesian Model

Without going into the microfoundations, let me propose a simple New Keynesian model à la Woodford (2003), Galí (2008) and co., with a constraint at the zero lower bound of interest rates. This model (in log-deviations) is given by

$$\pi_t = \beta E_t \pi_{t+1} + \kappa y_t - z_t, \quad (39)$$
$$y_t = E_t y_{t+1} - r_t - d_t, \quad (40)$$
$$r_t = i_t - E_t \pi_{t+1}, \quad (41)$$
$$i_t = \max\left\{ \rho i_{t-1} + (1-\rho)(\phi_\pi \pi_t + \phi_y y_t + v_t),\ \bar\imath \right\}, \quad (42)$$
$$z_t = \rho_z z_{t-1} + \varepsilon_t^z, \quad \varepsilon_t^z \sim \mathcal{N}(0, \sigma_z), \quad (43)$$
$$d_t = \rho_d d_{t-1} + \varepsilon_t^d, \quad \varepsilon_t^d \sim \mathcal{N}(0, \sigma_d), \quad (44)$$
$$v_t = \rho_r v_{t-1} + \varepsilon_t^r, \quad \varepsilon_t^r \sim \mathcal{N}(0, \sigma_r), \quad (45)$$

with $\kappa = (1-\theta)(1-\beta\theta)/\theta$ – the slope of the Phillips curve – as a function of the discount rate $\beta$ and the Calvo parameter $\theta$. Equations (40) to (42) are the IS curve, the Fisher equation, and the Taylor rule with the ZLB on nominal interest rates. The last three equations define AR(1) processes of the economic shocks. $z_t$ represents a cost-push shock, $d_t$ stands for an exogenous increase in the risk premium and $v_t$ is a simple monetary policy shock. The prior distribution of parameters follows standard principles and can be found in Table 1.

        distribution   mean/alpha   sd/beta     mean      sd   MC error    2.5%   97.5%
θ       beta                0.500      0.10    0.962   0.010      0.000   0.942   0.979
φπ      normal              1.700      0.25    1.218   0.274      0.002   0.687   1.767
φy      normal              0.125      0.05    0.135   0.022      0.000   0.097   0.181
ρd      beta                0.700      0.20    0.932   0.014      0.000   0.904   0.959
ρr      beta                0.700      0.20    0.528   0.081      0.000   0.376   0.684
ρz      beta                0.700      0.20    0.451   0.069      0.000   0.304   0.584
ρ       beta                0.700      0.20    0.760   0.059      0.000   0.645   0.869
σd      inv gamma           0.100      2.00    0.169   0.035      0.000   0.103   0.239
σz      inv gamma           0.100      2.00    0.251   0.074      0.000   0.115   0.399
σr      inv gamma           0.100      2.00    0.239   0.050      0.000   0.159   0.335

Table 1: Estimation results

I use US data from 1996Q1 to 2018Q1 on the GDP deflator, GDP growth and the Federal Funds rate for both filtering and estimation. For the initial distribution I assume


that each state is distributed with $\mathcal{N}(0, 10)$ (hence $Q = \operatorname{diag}\{100\}$). I assume the standard deviation of each observation to be 10% of the standard deviation of the respective time series in the data. This excludes the Federal Funds rate, for which I assume a standard deviation of 0.1%, reflecting that interest rates are almost perfectly observable and ensuring that the ZLB is fully respected. The output trend is assumed to be 0.356% per quarter, which corresponds to the pre-crisis average. Mean inflation is assumed to be 0.5, corresponding to an inflation target of approximately 2% annually.17 Since all variables are expressed as deviations from their steady state, the ZLB is assumed to hold if $i_t < \bar\imath = -0.96$, which implies an annual steady-state interest rate of about 4%.

For this small-scale model the estimation takes about 30 minutes on a machine with 40 cores. The filter is set up with 250 particles. I run 1500 iterations with 100 chains, of which I discard the first 1000. Details on the estimation – such as the traces of the Markov chains and histograms of the parameter distributions – can be found in Appendix C. The main message from these plots is that the chains have converged and the parameters seem well identified. The right side of Table 1 lists the statistics of the posterior distribution.

Figure 1 shows the smoothed distributions of the observables at the posterior mean. The dashed line depicts the data, which are matched very precisely. Note that the mode/mean of the distributions is fully reproduced by the filtered shocks.

The estimation result is roughly in line with standard results; let me only note a few things. First, the estimate of $\theta$ is very high, leading to a very flat Phillips curve ($\kappa \approx 0$). This generally corresponds to findings in the literature (Boehl and Strobel, 2019a,b; Linde et al., 2017; Fratto and Uhlig, 2014), although the estimate here is even more extreme. This can likely be attributed to the absence of additional mechanisms that are empirically relevant. Further, while demand shocks are very persistent, supply and monetary policy shocks are not. As we will see below, this stems from the fact that the risk premium governs the strong but persistent movement of output while the cost-push shock induces the short-run fluctuations in inflation. Meanwhile, persistence in the interest rate is quite high, allowing forward guidance – interpreted as announcements on the future path of the interest rate – to have sizable effects.

Let us turn to Figure 2, which decomposes the smoothed states into the contributions of the different shocks. For this figure I took 500 draws from the converged part of the Markov chain (the posterior). For each of these draws I use the IPA smoother to approximate the mode series of shocks, which are then used to simulate the states. I then subsequently switch off shocks, starting with the technology shock. This is helpful to visualize the impact and effect of each of the shocks. Of particular interest here are the hidden states inferred by the filter, namely $y_t$ (as opposed to the observable growth rate $\Delta y_t = y_t - y_{t-1}$) and the exogenous shocks. Output $y = c$ prospers in the period before the 2008 crisis and then falls abruptly, where the drop is mainly driven by the sharp increase in the risk premium. For the pre-crisis years monetary policy and – to a lesser extent – supply-side shocks also had an impact. The drop in the risk premium also lowers the inflation rate. Short-run fluctuations in the inflation rate are however mainly driven by the cost-push shock.

17 As discussed in Boehl and Strobel (2019b), a different strategy would be to also estimate the trend and the inflation target. The result would be an even lower mean growth rate, which would then lead the filter to understate the severity of the crisis.


[Figure 1: three panels showing the observables ygr, infl and int over 2000–2016.]

Figure 1: Observables with simulated observables (lines are very close). Note: mean and 68% interval over 500 simulations. Each simulation is based on the means of IPA-smoothed shocks given a draw from the posterior parameter distribution.

During the pre-crisis years the nominal interest rate is to a large extent driven by monetary policy shocks, and the estimates indicate that the rate was kept too low for an extended period of time. When the crisis hits, the simulations imply that the Fed lowered the interest rate faster than implied by the estimated interest rate smoothing parameter, which modestly mitigated the initial impact of the financial crisis on output. Further, at the end of the sample the rate was raised earlier than the Taylor rule would have implied. This “sudden” increase is accompanied by small losses in output. The effect can be explained by the fact that the output trend is fixed at its pre-crisis level; the Fed probably used a different measure of output at that point in time.

We can also use the filter to estimate the impact of monetary policy on the expected ZLB duration. Figure 3 compares the NEPC with and without monetary policy shocks, suggesting that FOMC announcements had the effect of raising expectations by about one to two quarters. Despite this, forward guidance had rather modest effects on output and barely any impact on inflation. The fact that rates were at the ZLB is explained by bad economic conditions rather than expansionary monetary policy. This analysis is obviously constrained by the simplicity of the model. The results are, however, promising enough to warrant applying the methodology to a more realistic and empirically relevant model.
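In code, this counterfactual simply re-runs the smoothed sample with the monetary policy innovations muted. A minimal sketch, where `expected_zlb_duration` is a hypothetical stand-in for the model's mapping from innovations to the per-period expected number of constrained quarters:

```python
import numpy as np

def forward_guidance_effect(expected_zlb_duration, eps, mp_col):
    """Expected ZLB durations with and without monetary policy shocks."""
    k_with = expected_zlb_duration(eps)
    eps_no_mp = eps.copy()
    eps_no_mp[:, mp_col] = 0.0      # mute forward guidance / policy shocks
    k_without = expected_zlb_duration(eps_no_mp)
    return k_with - k_without       # extra expected quarters at the ZLB
```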


[Figure 2 here: eight panels over 2000–2016. “States 1”: Pi, c, y, dy. “States 2”: r, d, z, vr, with legend entries total, e_u, e_r, e_z.]

Figure 2: Decomposition of time series into the contribution of the different shocks. Note: medians over 500 simulations. Each simulation is based on the means of IPA-smoothed shocks given a draw from the posterior parameter distribution.


[Figure 3 here: expected ZLB duration k over 2008–2018, ranging from 0 to 6 quarters.]

Figure 3: The number of expected periods at the ZLB. Blue: with monetary policy/forward guidance shocks. Orange: without forward guidance shocks.

6 Conclusion

This paper makes two major contributions. I first present a fast and accurate method to endogenously handle occasionally binding constraints in linear dynamic systems. I then introduce a Bayesian filter (the IPA smoother) for nonlinear systems that, with very limited computational effort, provides reliable estimates of the distribution of states while fully accounting for the nonlinearity of the transition function. The smoothed series of exogenous variables is able to fully recover the data, which is useful for conducting counterfactual analysis. I discuss the implementation of Bayesian estimation when using the solution method together with the IPA smoother and show that estimation and filtering can be done accurately while keeping computational costs low.

As a proof of concept, the estimation and analysis of the simple NK model suggest that the Phillips curve in the US is very flat. Low interest rates are mainly explained by bad economic conditions rather than expansionary monetary policy. Although forward guidance appears to have had some effect in extending the expected duration of the ZLB spell, its impact on output and inflation is estimated to be negligible. Given the simplistic nature of the model employed, these results must remain preliminary. They do, however, document the robustness of the methods and promise valuable insights from applying the outlined methodology to more empirically relevant models, as done in Boehl and Strobel (2019a,b).

References

An, S., Schorfheide, F., 2007. Bayesian analysis of DSGE models. Econometric Reviews 26, 113–172.

Andreasen, M.M., 2013. Non-linear DSGE models and the central difference Kalman filter. Journal of Applied Econometrics 28, 929–955.

Atkinson, T., Richter, A.W., Throckmorton, N.A., 2018. The Accuracy of Linear and Nonlinear Estimation in the Presence of the Zero Lower Bound. Technical Report.

Binning, A., Maih, J., 2015. Sigma point filters for dynamic nonlinear regime switching models. Technical Report.

Binning, A., Maih, J., 2016. Implementing the zero lower bound in an estimated regime-switching DSGE model. Technical Report.

Blanchard, O.J., Kahn, C.M., 1980. The solution of linear difference models under rational expectations. Econometrica: Journal of the Econometric Society, 1305–1311.

Boehl, G., Strobel, F., 2019a. A Structural Investigation of Quantitative Easing. Technical Report. URL: https://gregorboehl.com/live/qe_bs.pdf.

Boehl, G., Strobel, F., 2019b. The Great Recession and the Zero Lower Bound. Technical Report. URL: https://gregorboehl.com/live/recession_elb_bs.pdf.

ter Braak, C.J., Vrugt, J.A., 2008. Differential evolution Markov chain with snooker updater and fewer chains. Statistics and Computing 18, 435–446.

Evensen, G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans 99, 10143–10162.

Fernandez-Villaverde, J., Rubio-Ramírez, J.F., 2007. Estimating macroeconomic models: A likelihood approach. The Review of Economic Studies 74, 1059–1087.

Fratto, C., Uhlig, H., 2014. Accounting for post-crisis inflation and employment: A retro analysis. Technical Report. National Bureau of Economic Research.

Galí, J., 2008. Monetary Policy, Inflation, and the Business Cycle: An Introduction to the New Keynesian Framework. Princeton University Press.

Goodman, J., Weare, J., 2010. Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science 5, 65–80.

Guerrieri, L., Iacoviello, M., 2015. OccBin: A toolkit for solving dynamic models with occasionally binding constraints easily. Journal of Monetary Economics 70, 22–38.

Guerrieri, L., Iacoviello, M., 2017. Collateral constraints and macroeconomic asymmetries. Journal of Monetary Economics 90, 28–49.

Gust, C., Herbst, E., Lopez-Salido, D., Smith, M.E., 2017. The empirical implications of the interest-rate lower bound. American Economic Review 107, 1971–2006.

Herbst, E., Schorfheide, F., 2014. Sequential Monte Carlo sampling for DSGE models. Journal of Applied Econometrics 29, 1073–1098.

Herbst, E., Schorfheide, F., 2017. Tempered particle filtering. Technical Report. National Bureau of Economic Research.

Holden, T.D., 2016. Computation of solutions to dynamic models with occasionally binding constraints. Technical Report.

Holden, T.D., 2017. Existence and uniqueness of solutions to dynamic models with occasionally binding constraints. Technical Report.

Jones, C., 2017. Unanticipated Shocks and Forward Guidance at the ZLB. Technical Report. Manuscript.

Julier, S., Uhlmann, J., Durrant-Whyte, H.F., 2000. A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Transactions on Automatic Control 45, 477–482.

Julier, S.J., 2002. The scaled unscented transformation, in: Proceedings of the American Control Conference 2002, IEEE. pp. 4555–4559.

Katzfuss, M., Stroud, J.R., Wikle, C.K., 2016. Understanding the ensemble Kalman filter. The American Statistician 70, 350–357.

Linde, J., Maih, J., Wouters, R., 2017. Estimation of Operational Macromodels at the Zero Lower Bound. Technical Report. Manuscript.

McElhoe, B.A., 1966. An assessment of the navigation and course corrections for a manned flyby of Mars or Venus. IEEE Transactions on Aerospace and Electronic Systems, 613–623.

Nelder, J.A., Mead, R., 1965. A simplex method for function minimization. The Computer Journal 7, 308–313.

Raanes, P.N., 2016. On the ensemble Rauch-Tung-Striebel smoother and its equivalence to the ensemble Kalman smoother. Quarterly Journal of the Royal Meteorological Society 142, 1259–1264.

Rauch, H.E., Striebel, C., Tung, F., 1965. Maximum likelihood estimates of linear dynamic systems. AIAA Journal 3, 1445–1450.

Sims, C.A., 2002. Solving linear rational expectations models. Computational Economics 20, 1–20.

Smets, F., Wouters, R., 2007. Shocks and frictions in US business cycles: A Bayesian DSGE approach. American Economic Review 97, 586–606.

Smith, G.L., Schmidt, S.F., McGee, L.A., 1962. Application of statistical filter theory to the optimal estimation of position and velocity on board a circumlunar vehicle. National Aeronautics and Space Administration.

Ter Braak, C.J., 2006. A Markov chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics and Computing 16, 239–249.

Ungarala, S., 2012. On the iterated forms of Kalman filters using statistical linearization. Journal of Process Control 22, 935–943.

Wan, E.A., Van Der Merwe, R., 2000. The unscented Kalman filter for nonlinear estimation, in: Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000 (AS-SPCC), IEEE. pp. 153–158.

Woodford, M., 2003. Interest and Prices: Foundations of a Theory of Monetary Policy. Princeton University Press.

Zhang, Z., 1997. Parameter estimation techniques: A tutorial with application to conic fitting. Image and Vision Computing 15, 59–76.

Appendix A Uniqueness result for the case without transition to the constraint

Note that this result is interesting from a theoretical perspective but of limited economic relevance, since transitions towards the constraint are rather common for many points in the state space (even though such points are probably not empirically relevant). In order to obtain conditions on existence and uniqueness it is helpful to denote

\[
N = \begin{bmatrix} N_1 & N_2 \\ N_3 & N_4 \end{bmatrix}, \qquad
c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \qquad \text{and} \qquad
b = \begin{bmatrix} b_1 & b_2 \end{bmatrix}. \tag{A.1}
\]

Theorem 1. Assuming no transition phase to the constraint, for any given vector of state variables v_t the equilibrium x_t = S(k*, v_t) of the system in (3) exists and is unique if

\[
b_1 \left( N_1 - \Omega N_3 \right)^{-1} \left( c_1 - \Omega c_2 \right) \ge 0. \tag{A.2}
\]

Proof. Let us write out the conditions for uniqueness in more detail. The equilibrium k* is unique iff

\begin{align}
& bL(k^*) \ge r \;\wedge\; bL_{k<k^*}(k^*) > r \tag{A.3} \\
\wedge\; & bL(k^*+1) < r \;\vee\; bL_{k^*}(k^*+1) > r \tag{A.4} \\
\wedge\; & bL(k^*+2) < r \;\vee\; bL_{k^*}(k^*+2) > r \;\vee\; bL_{k^*+1}(k^*+2) > r \tag{A.5} \\
\wedge\; & bL(k^*+3) < r \;\vee\; bL_{k^*}(k^*+3) > r \;\vee\; bL_{k^*+1}(k^*+3) > r \;\vee\; \dots \tag{A.6}
\end{align}

From here it is clear that the equilibrium is unique if

i) whenever the system is unconstrained after k periods, it is also expected to be unconstrained in k + 1 periods,
\[
bL(k) \ge r \implies bL_k(k+1) > r, \tag{A.7}
\]

ii) the system is unconstrained in the period in which it is expected to be unconstrained when it is already unconstrained one period before,
\[
bL_{k-1}(k) \ge r \implies bL(k) > r. \tag{A.8}
\]

Compare the difference between the two systems for k = 1. Writing out (3) yields

\begin{align}
N_1 x_t^u + N_2 v_{t-1} + c_1 \left( b_1 x_t^u + b_2 v_{t-1} \right) &= E_t x_{t+1}^u, \tag{A.9} \\
N_3 x_t^u + N_4 v_{t-1} + c_2 \left( b_1 x_t^u + b_2 v_{t-1} \right) &= v_t^u, \tag{A.10}
\end{align}

for the unconstrained x_t^u. Setting x_{t+1}^u = \Omega v_t^u allows us to rewrite this as

\[
\left( N_1 - \Omega N_3 \right) x_t^u + \left( N_2 - \Omega N_4 \right) v_{t-1} + \left( c_1 - \Omega c_2 \right) \left( b_1 x_t^u + b_2 v_{t-1} \right) = 0. \tag{A.11}
\]

Repeating the same steps under the assumption that y_t is constrained, and again setting x_{t+1}^c = \Omega v_t^c, gives

\[
\left( N_1 - \Omega N_3 \right) x_t^c + \left( N_2 - \Omega N_4 \right) v_{t-1} + \left( c_1 - \Omega c_2 \right) r = 0. \tag{A.12}
\]

Combining Equations (A.11) and (A.12) and pre-multiplying by b_1 yields

\[
b_1 x_t^c = b_1 x_t^u + \psi \underbrace{\left( b_1 x_t^u + b_2 v_{t-1} - r \right)}_{>0 \text{ iff } y_t \text{ unconstrained}} \tag{A.13}
\]

with \psi = b_1 \left( N_1 - \Omega N_3 \right)^{-1} \left( c_1 - \Omega c_2 \right) being a scalar. If \psi > 0 as stated in the theorem, then condition i) holds. Since the result also implies that the constrained system moves further away from the constraint than the unconstrained system, condition ii) holds as well. Additionally, this ensures the existence of a solution.

Theorem 1 states the main result concerning existence and uniqueness if we abstract from systems that slowly transition to the constraint. Close to the constraint, and in terms of the measure b, the matrix N_1 has to be such that the expectation dynamics more than offset the transition dynamics of the state space, and the force induced by c_1 needs to be more than offset by the impact of the constraint on the state space. Unfortunately, neither the simple NK model nor the medium-scale SW model actually fulfills this criterion.
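In practice the condition of Theorem 1 is cheap to verify once the system matrices are at hand. A minimal NumPy sketch, assuming the partitioned matrices from (A.1) and the rational expectations solution Ω are given:

```python
import numpy as np

def theorem1_holds(N1, N3, Omega, c1, c2, b1):
    """Check psi = b1 (N1 - Omega N3)^{-1} (c1 - Omega c2) >= 0 (A.2)."""
    x = np.linalg.solve(N1 - Omega @ N3, c1 - Omega @ c2)  # no explicit inverse
    psi = float(b1 @ x)
    return psi >= 0, psi
```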

Appendix B Numerical maximization of the posterior mode

Since the Newton-Raphson algorithm would require the Jacobian and Hessian of the transition function g, which would have to be evaluated numerically, it is by far more efficient to use a standard derivative-free method such as the algorithm of Nelder and Mead (1965) on the likelihood function. A typical problem of such local optimization methods is that they normally only converge to local optima and depend on the initial conditions.

In practice, numerical stability improves considerably when the innovations from the corresponding linear system – which can be found analytically – are used as the initial guess. Alternatively, one can use the shock innovations implied by the mean of the RTS-smoothed states.

As a further intermediary step it is helpful to use this initial guess to find the maximum likelihood estimate

\[
\hat{\varepsilon}_t = \arg\max_{\varepsilon} \; \log f\left( z_t \mid h(g(x_{t-1}, \varepsilon)), Q \right) + \log f\left( \varepsilon \mid 0, Q \right), \tag{B.1}
\]

and then to use \hat{\varepsilon}_t as the initial guess to actually find \varepsilon_t.
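A minimal sketch of this step with SciPy's Nelder-Mead implementation, assuming the transition function g, observation function h, covariance Q and the analytically obtained linear initial guess eps0 are given as in (B.1); the names are placeholders, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal as mvn

def fit_innovations(z_t, x_prev, g, h, Q, eps0):
    """Maximize (B.1) over the shock innovations for one period."""
    def neg_log_post(eps):
        pred = h(g(x_prev, eps))                # nonlinear prediction of z_t
        ll = mvn.logpdf(z_t, mean=pred, cov=Q)  # observation density
        lp = mvn.logpdf(eps, mean=np.zeros(len(eps)), cov=Q)  # shock prior
        return -(ll + lp)
    # derivative-free local search started from the linear-system innovations
    res = minimize(neg_log_post, eps0, method="Nelder-Mead")
    return res.x
```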

Appendix C Traces of MCMCs and Posterior distributions

Figures C.4 and C.5 show the traces of the 100 chains; confidence intervals are the 95% and 68% bands across chains. Figures C.6 and C.7 present histograms of the posterior distributions.


[Figure C.4 here: prior/posterior densities and MCMC traces over 1500 iterations for theta, phi_pi, phi_y, rho_d, rho_r and rho_z.]

Figure C.4: Left: priors and posteriors. Right: traces of the MCMCs. I use 1200 iterations as burn-in and 1500 iterations in total.


[Figure C.5 here: prior/posterior densities and MCMC traces over 1500 iterations for rho, sig_d, sig_z and sig_r.]

Figure C.5: Left: priors and posteriors. Right: traces of the MCMCs. I use 1200 iterations as burn-in and 1500 iterations in total.


[Figure C.6 here: posterior histograms with means and 95% HPD intervals – theta: 0.962 [0.942, 0.979]; phi_pi: 1.218 [0.687, 1.767]; phi_y: 0.135 [0.097, 0.181]; rho_d: 0.932 [0.904, 0.959]; rho_r: 0.528 [0.376, 0.684]; rho_z: 0.451 [0.304, 0.584]; rho: 0.760 [0.645, 0.869]; sig_d: 0.169 [0.103, 0.239].]

Figure C.6: Histograms of the posterior distributions.


[Figure C.7 here: posterior histograms with means and 95% HPD intervals – sig_z: 0.251 [0.115, 0.399]; sig_r: 0.239 [0.159, 0.335].]

Figure C.7: Histograms of the posterior distributions.
