PPP AND UNIT ROOTS: LEARNING ACROSS REGIMES

GERALD P. DWYER, MARK FISHER, THOMAS J. FLAVIN, AND JAMES R. LOTHIAN

Preliminary and incomplete

Abstract. Taking a Bayesian approach, we focus on the information content in two data sets: (1) the so-called Lothian–Taylor data and (2) post-war data for countries that adopted the Euro. We find the two data sets have similar implications for the existence of unit roots: both data sets produce evidence that goes against a unit root.

1. Introduction

We examine two data sets. We refer to the first data set as the Lothian–Taylor data. The base-currency country for the Lothian–Taylor data is the U.K. (pound sterling). The data are annual. For the dollar/sterling exchange rate there are 181 observations (1792–1973) and for the franc/sterling exchange rate there are 169 observations (1804–1973). We refer to the second data set as the Euro-related data. There are ten exchange rates; the base-currency country for the Euro-related data is Germany; these data are monthly. There are 588 observations for each of the ten exchange rates (January 1957 through December 2005).

From a frequentist/classical perspective that focuses on test statistics, it appears that the Lothian–Taylor data tell a different story from the Euro-related data. By contrast, from a Bayesian perspective the two data sets appear to tell essentially the same story. Taking the Bayesian approach, it is natural to focus on the information content as embodied in the likelihoods (rather than test statistics per se).

We present a model that allows learning across exchange-rate regimes. The model involves a hierarchical prior with hyperparameters. The information flows from one regime to another via the hyperparameters. We present the posterior distributions of the parameters of interest based on both data sets combined. These posterior distributions show that it is unlikely that the root is near one. But we also think it is interesting to address the following question: Given a prior based on the Lothian–Taylor data, does the Euro-related data increase or decrease the odds in favor of a unit root? Perhaps surprisingly, the answer is that the Euro-related data decreases the odds in favor of a unit root.

2. The data and the likelihoods

We adopt a simple model for the dynamics of a real exchange rate regime. Let $y_i = (y_{i,1}, \dots, y_{i,T_i})$ denote the real exchange rate between two countries and let $y = (y_1, \dots, y_n)$

Date: November 17, 2008.The views expressed herein are the author’s and do not necessarily reflect those of the Federal Reserve

Bank of Atlanta or the Federal Reserve System.



denote a collection of $n$ exchange rate regimes. Let the data generating process for $y_i$ be an AR($p$) process:
$$y_{it} = \alpha_i + \sum_{j=1}^{p} \gamma_{ij}\, y_{i,t-j} + \varepsilon_{it}, \tag{2.1}$$
where $\varepsilon_{it} \overset{iid}{\sim} N(0, \sigma_i^2)$. Let $\tau_i$ denote the frequency of the data (in years) for regime $i$, so that $\tau_i = 1$ for annual data and $\tau_i = 1/12$ for monthly data. It is convenient to reparameterize (2.1) as follows:¹
$$y_{it} = \alpha_i + (\beta_i)^{\tau_i}\, y_{i,t-1} + \sum_{j=1}^{p-1} \psi_{ij}\, \Delta y_{i,t-j} + \varepsilon_{it}, \tag{2.2}$$
where $\beta_i := \bigl(\sum_{j=1}^{p} \gamma_{ij}\bigr)^{1/\tau_i}$ and $\psi_{ij} := -\sum_{k=j+1}^{p} \gamma_{ik}$. The parameter of interest is $\beta_i$.² The process is said to have a unit root if $\beta_i = 1$. Let $\theta_i = (\alpha_i, \beta_i, \sigma_i, \{\psi_{ij}\})$. Then
$$p(\{y_{it}\}_{t=p+1}^{T_i} \mid \{y_{it}\}_{t=1}^{p}, \theta_i) = \prod_{t=p+1}^{T_i} p(y_{it} \mid \{y_{is}\}_{s=t-p}^{t-1}, \theta_i), \tag{2.3}$$
where
$$p(y_{it} \mid \{y_{is}\}_{s=t-p}^{t-1}, \theta_i) = N\Bigl(y_{it} \,\Big|\, \alpha_i + (\beta_i)^{\tau_i}\, y_{i,t-1} + \sum_{j=1}^{p-1} \psi_{ij}\, \Delta y_{i,t-j},\; \sigma_i^2\Bigr). \tag{2.4}$$
The log-likelihood function for $\theta_i$ is given by
$$\ell(\theta_i) := \log\bigl(p(\{y_{it}\}_{t=p+1}^{T_i} \mid \{y_{it}\}_{t=1}^{p}, \theta_i)\bigr). \tag{2.5}$$
Define
$$\hat\theta_i := \arg\max_{\theta_i}\, \ell(\theta_i) \quad\text{and}\quad \Sigma_{\theta_i} := \Bigl(-\frac{\partial^2}{\partial\theta\,\partial\theta'}\,\ell(\theta_i)\Bigr)^{-1}\bigg|_{\theta_i = \hat\theta_i}, \tag{2.6}$$
and let $\hat y_i := (\hat\beta_i, \sigma_{\beta_i})$ where $\hat\beta_i := (\hat\theta_i)_2$ and $\sigma_{\beta_i}^2 := (\Sigma_{\theta_i})_{22}$. Define
$$L(\beta_i \mid y_i) := N(\beta_i \mid \hat\beta_i, \sigma_{\beta_i}^2). \tag{2.7}$$
In effect, we treat $(\hat\beta_i, \sigma_{\beta_i})$ as sufficient statistics for $y_i$ as far as $\beta_i$ is concerned. Note that $N(\beta_i \mid \hat\beta_i, \sigma_{\beta_i}^2)$ is simply the functional form for the likelihood and not a sampling distribution.³

See Table 1 for the sufficient statistics for both data sets. For a plot of the likelihoods, see Figure 1. See also Figure 2, which shows that each of the $\hat\beta_i$ is within $2\,\sigma_{\beta_i}$ of the weighted average. Table 2 shows that the lag length $p$ has little effect on the average values of $\hat\beta_i$ and $\sigma_{\beta_i}$ for the Euro-related data. Therefore, we conduct the analysis with $p = 1$.

¹$\Delta y_{i,t-j} := y_{i,t-j} - y_{i,t-j-1}$.
²The parametrization $(\beta_i)^{\tau_i}$ calibrates $\beta_i$ to an annual rate.
³In the appendix we derive (2.7) by integrating out the nuisance parameters.
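As a concrete illustration of where the pairs $(\hat\beta_i, \sigma_{\beta_i})$ in Table 1 come from, here is a minimal sketch (ours) for the $p = 1$ case, in which (2.2) reduces to a regression of $y_{it}$ on a constant and $y_{i,t-1}$. We use OLS (which coincides with the Gaussian MLE here) together with the delta method to convert the slope $\hat\gamma$ and its standard error into $\hat\beta_i = \hat\gamma^{1/\tau_i}$ and $\sigma_{\beta_i}$; the paper itself works from the likelihood directly, so treat this as an approximation under those assumptions.

```python
import numpy as np

def ar1_sufficient_stats(y, tau):
    """OLS fit of y_t = alpha + gamma*y_{t-1} + eps (the p = 1 case of (2.2)),
    mapped to the annualized root beta = gamma**(1/tau) by the delta method.
    Returns (beta_hat, sigma_beta)."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ coef
    s2 = resid @ resid / (len(y) - 1 - 2)        # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    gamma_hat, sigma_gamma = coef[1], np.sqrt(cov[1, 1])
    beta_hat = gamma_hat ** (1.0 / tau)          # annualized root
    # delta method: sigma_beta = |d beta / d gamma| * sigma_gamma
    sigma_beta = abs(gamma_hat ** (1.0 / tau - 1.0) / tau) * sigma_gamma
    return beta_hat, sigma_beta
```

For annual data ($\tau_i = 1$) this returns the OLS slope and its standard error directly; for monthly data ($\tau_i = 1/12$) the exponent annualizes the root.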


3. Bayesian inference and nested hypothesis testing

We provide a brief review of Bayesian inference and nested hypothesis testing. The posterior distribution for $\beta_i$ is given by Bayes' rule:
$$p(\beta_i \mid y_i) = \frac{L(\beta_i \mid y_i)\, p(\beta_i)}{p(y_i)} \propto L(\beta_i \mid y_i)\, p(\beta_i), \tag{3.1}$$
where $p(y_i) = \int L(\beta_i \mid y_i)\, p(\beta_i)\, d\beta_i$ is the likelihood of the data according to the model. For future reference, a rearrangement of Bayes' rule (3.1) produces
$$\frac{p(\beta_i \mid y_i)}{p(\beta_i)} = \frac{L(\beta_i \mid y_i)}{\int L(\beta_i \mid y_i)\, p(\beta_i)\, d\beta_i}. \tag{3.2}$$

Two models of $\beta_i$. Consider two models: one model where the probability that $\beta_i < 1$ is one, and another model where the probability that $\beta_i = 1$ is one. The first model is characterized by $\beta_i \in [0, 1]$. Although this model includes 1 in the space for $\beta_i$, the probability that $\beta_i \in (0, 1)$ is one.⁴ The second model is characterized by $\beta_i = 1$. The second model can be obtained from the first model by simply conditioning on $\beta_i = 1$; thus we refer to the first model as the base model and the second model as the nested model.

Let $A_0$ denote the base model and let $A_1$ denote the nested model. The posterior probability of model $A_m$ ($m \in \{0, 1\}$) is given by an application of Bayes' rule:
$$p(A_m \mid y_i) \propto L(A_m \mid y_i)\, p(A_m), \tag{3.3}$$
where $L(A_m \mid y_i)$ is the likelihood of model $A_m$ and $p(A_m)$ is the prior probability of model $A_m$. Using (3.3), the posterior odds ratio can be expressed as the product of the prior odds ratio and the Bayes factor:
$$\frac{p(A_1 \mid y_i)}{p(A_0 \mid y_i)} = \frac{L(A_1 \mid y_i)}{L(A_0 \mid y_i)} \times \frac{p(A_1)}{p(A_0)}, \tag{3.4}$$
where $L(A_1 \mid y_i)/L(A_0 \mid y_i)$ is the Bayes factor. The Bayes factor expresses how the evidence changes the prior odds ratio into the posterior odds ratio.

Let $p(\beta_i)$ denote the prior for $\beta_i$ in the base model. The likelihood of the observed data given the base model is $L(A_0 \mid y_i) = \int_0^1 L(\beta_i \mid y_i)\, p(\beta_i)\, d\beta_i$. The likelihood of the observed data given the nested model is $L(A_1 \mid y_i) = L(\beta_i = 1 \mid y_i) = N(1 \mid \hat\beta_i, \sigma_{\beta_i}^2)$. The Bayes factor in favor of the nested model relative to the base model is⁵
$$B(\beta_i = 1) = \frac{L(\beta_i = 1 \mid y_i)}{\int_0^1 L(\beta_i \mid y_i)\, p(\beta_i)\, d\beta_i}. \tag{3.5}$$
Given (3.2), we can express the Bayes factor as⁶
$$B(\beta_i = 1) = \frac{p(\beta_i = 1 \mid y_i)}{p(\beta_i = 1)}. \tag{3.6}$$

⁴We assume the prior for $\beta_i$ in this model has a density with respect to Lebesgue measure.
⁵In due course, we will be interested in Bayes factors for different model comparisons, and therefore we will use notation that makes explicit the comparison being made.
⁶The right-hand side of (3.6) is known as the Savage–Dickey density ratio.
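To make (3.5) operational with the Gaussian likelihood (2.7), the Bayes factor can be computed by simple quadrature for any prior density on $[0, 1]$. The following sketch is ours (the function name is illustrative); it evaluates the Savage–Dickey ratio numerically on a uniform grid.

```python
import numpy as np
from scipy.stats import norm

def bayes_factor_unit_root(beta_hat, sigma_beta, prior_pdf):
    """B(beta_i = 1) of (3.5): the likelihood at beta_i = 1 divided by the
    marginal likelihood of the base model, integrated over [0, 1]."""
    grid = np.linspace(0.0, 1.0, 10001)
    like = norm.pdf(grid, loc=beta_hat, scale=sigma_beta)   # L(beta | y_i)
    marginal = (like * prior_pdf(grid)).mean()              # ~ integral over [0, 1]
    return norm.pdf(1.0, loc=beta_hat, scale=sigma_beta) / marginal

# Example: flat prior on [0, 1] with the U.S. statistics from Table 1.
# Small values disfavor the unit root.
print(bayes_factor_unit_root(0.898, 0.031, lambda b: np.ones_like(b)))
```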


As more data are acquired, the posterior distribution for $\beta_i$ in the base model will become more concentrated on the true value of $\beta_i$ (assuming the true value is in the unit interval). Consequently, if $\beta_i = 1$ is true, the base model will become more like the nested model, and in the limit the two models will be indistinguishable. On the other hand, if $\beta_i < 1$ is true, the base model will eventually become quite distinct from the nested model. Referring to (3.6), one may see
$$\lim_{T_i \to \infty} B(\beta_i = 1) = \lim_{T_i \to \infty} \frac{p(\beta_i = 1 \mid y_i)}{p(\beta_i = 1)} = \begin{cases} \infty & \beta_i = 1 \\ 0 & \beta_i < 1. \end{cases} \tag{3.7}$$
This involves a sequence of Bayes factors, each of which uses the same prior $p(\beta_i)$.⁷

4. Sensitivity to prior

The posterior distribution $p(\beta_i \mid y_i)$ and the Bayes factor $B(\beta_i = 1)$ both depend on the prior $p(\beta_i)$. Therefore, an investigation of the sensitivity of the posterior and the Bayes factor to different priors is appropriate.

Truncated uniform prior. To begin, consider a uniform distribution truncated to the interval $[a, 1]$ where $a \in [0, 1)$:⁸
$$U(x \mid a) = \frac{\mathbf{1}_{[a,1]}(x)}{1 - a}. \tag{4.1}$$
Note the mean and standard deviation of $x \sim U(x \mid a)$ are $(1 + a)/2$ and $(1 - a)/\sqrt{12}$, respectively. Let $p(\beta_i \mid a) = U(\beta_i \mid a)$, where $a$ indexes the base model.⁹ The posterior for $\beta_i$ is given by $p(\beta_i \mid y_i, a) \propto \mathbf{1}_{[a,1]}(\beta_i)\, L(\beta_i \mid y_i)$. The marginal likelihood of the base model (indexed by $a$) is¹⁰
$$L(a \mid y_i) := \int_0^1 L(\beta_i \mid y_i)\, p(\beta_i \mid a)\, d\beta_i = \frac{\int_a^1 N(\beta_i \mid \hat\beta_i, \sigma_{\beta_i}^2)\, d\beta_i}{1 - a} = \frac{\Phi\bigl(\frac{1 - \hat\beta_i}{\sigma_{\beta_i}}\bigr) - \Phi\bigl(\frac{a - \hat\beta_i}{\sigma_{\beta_i}}\bigr)}{1 - a}. \tag{4.2}$$
Define $L(a = 1 \mid y_i) := \lim_{a \to 1} L(a \mid y_i) = N(1 \mid \hat\beta_i, \sigma_{\beta_i}^2)$ and note $L(a = 1 \mid y_i) = L(\beta_i = 1 \mid y_i)$. The Bayes factor in favor of $\beta_i = 1$ (conditioning on the base model indexed by $a$) is given by
$$B(\beta_i = 1 \mid a) := \frac{L(\beta_i = 1 \mid y_i)}{L(a \mid y_i)} = \frac{L(a = 1 \mid y_i)}{L(a \mid y_i)}. \tag{4.3}$$
Consequently, we see $B(\beta_i = 1 \mid a = 1) = 1$. Figure 3 shows the Bayes factors $B(\beta_i = 1 \mid a)$ for $a \in [0, 1]$ for each of the twelve exchange-rate regimes.

⁷By contrast, if $\beta_i = 1$ and we update our prior as each additional observation is obtained, then $\lim_{T_i \to \infty} B(\beta_i = 1) = 1$.
⁸The indicator function is defined as $\mathbf{1}_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A. \end{cases}$
⁹$p(\beta_i \mid a)$ is shorthand notation for $p(\beta_i \mid A_0(a))$, where the latter notation makes explicit the fact that this is the prior for the base model (which itself depends on the parameter $a$).
¹⁰$L(a \mid y_i)$ is shorthand notation for $L(A_0(a) \mid y_i)$.
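Because (4.2) is available in closed form, the Bayes factor (4.3) requires no quadrature in this case. A small sketch (ours) using the standard normal cdf:

```python
from scipy.stats import norm

def bf_truncated_uniform(beta_hat, sigma_beta, a):
    """B(beta_i = 1 | a) of (4.3), with L(a | y_i) from the closed form (4.2)."""
    num = norm.pdf(1.0, beta_hat, sigma_beta)         # L(beta_i = 1 | y_i)
    marg = (norm.cdf((1.0 - beta_hat) / sigma_beta)
            - norm.cdf((a - beta_hat) / sigma_beta)) / (1.0 - a)
    return num / marg
```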


Truncated Gaussian prior. Next, consider a truncated Gaussian distribution:
$$\overline{N}(x \mid \mu, \sigma^2) := \frac{\mathbf{1}_{[0,1]}(x)\, N(x \mid \mu, \sigma^2)}{\int_0^1 N(z \mid \mu, \sigma^2)\, dz}. \tag{4.4}$$
Let $p(\beta_i \mid \mu, \sigma) = \overline{N}(\beta_i \mid \mu, \sigma^2)$.¹¹ Then the posterior is also a truncated Gaussian:
$$p(\beta_i \mid y_i, \mu, \sigma) = \overline{N}(\beta_i \mid \mu_i, \sigma_i^2), \tag{4.5}$$
where
$$\mu_i = \alpha_i\, \hat\beta_i + (1 - \alpha_i)\, \mu \quad\text{and}\quad \sigma_i^2 = \alpha_i\, \sigma_{\beta_i}^2 \tag{4.6}$$
and
$$\alpha_i = \frac{\sigma^2}{\sigma^2 + \sigma_{\beta_i}^2}. \tag{4.7}$$
Note (1) $\lim_{\sigma \to \infty} \alpha_i = 1$ and (2) $\lim_{\sigma \to 0} \alpha_i = 0$. In the first case the limiting prior is flat and the posterior is proportional to the likelihood (over the unit interval); in the second case the limiting prior puts all its weight on $\mu$ and consequently so does the posterior.

Before we examine the Bayes factor, it is convenient to compute the mean $m$ and standard deviation $s$ of the truncated Gaussian distribution. Define the transformation $F: (\mu, \sigma) \mapsto (m, s)$ as follows:
$$m := \int_0^1 \overline{N}(x \mid \mu, \sigma^2)\, x\, dx \quad\text{and}\quad s := \Bigl(\int_0^1 \overline{N}(x \mid \mu, \sigma^2)\, (x - m)^2\, dx\Bigr)^{1/2}. \tag{4.8}$$
The range of $F$ is the shaded region $R$ shown in Figure 4, which can be characterized by $0 \le m \le 1$ and $0 \le s \le g(m)$, where $g(m)$ equals the standard deviation of a truncated exponential distribution with mean $m$.¹²

Now let¹³
$$p(\beta_i \mid m, s) = \overline{N}(\beta_i \mid \mu, \sigma^2)\big|_{(\mu, \sigma) = F^{-1}(m, s)}. \tag{4.9}$$
The posterior mean and standard deviation are given by $(m_i, s_i) = F(\mu_i, \sigma_i)$. The marginal likelihood of the base model (indexed by $m$ and $s$) is given by¹⁴
$$\begin{aligned} L(m, s \mid y_i) &:= \int_0^1 L(\beta_i \mid y_i)\, p(\beta_i \mid m, s)\, d\beta_i \\ &= \Bigl(\int_0^1 N(\beta_i \mid \hat\beta_i, \sigma_{\beta_i}^2)\, \overline{N}(\beta_i \mid \mu, \sigma^2)\, d\beta_i\Bigr)\bigg|_{(\mu, \sigma) = F^{-1}(m, s)}. \end{aligned} \tag{4.10}$$
Note $L(m, s = 0 \mid y_i) := \lim_{s \to 0} L(m, s \mid y_i) = L(\beta_i = m \mid y_i)$ for $m \in (0, 1)$ and $L(m = 1, s = 0 \mid y_i) := \lim_{m \to 1} L(m, s = 0 \mid y_i) = L(\beta_i = 1 \mid y_i)$.

The Bayes factor in favor of $\beta_i = 1$ relative to the base model (indexed by $m$ and $s$) is
$$B(\beta_i = 1 \mid m, s) := \frac{L(\beta_i = 1 \mid y_i)}{\int_0^1 L(\beta_i \mid y_i)\, p(\beta_i \mid m, s)\, d\beta_i} = \frac{L(m = 1, s = 0 \mid y_i)}{L(m, s \mid y_i)}. \tag{4.11}$$

¹¹Again, $p(\beta_i \mid \mu, \sigma)$ is shorthand notation for $p(\beta_i \mid A_0(\mu, \sigma))$.
¹²A truncated exponential distribution is proportional to $h(x) = \mathbf{1}_{[0,1]}(x)\, e^{\lambda x}$ for some $\lambda \in (-\infty, \infty)$.
¹³Although $(m, s) = F(\mu, \sigma)$ has a closed-form solution, $(\mu, \sigma) = F^{-1}(m, s)$ must be solved numerically.
¹⁴There is a closed-form expression for the integral in the second line of (4.10).
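Footnote 13 notes that $F^{-1}$ must be computed numerically. A minimal sketch (ours) using scipy's truncated-normal moments and a root finder:

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import fsolve

def F(mu, sigma):
    """(m, s) of (4.8): mean and std of N(mu, sigma^2) truncated to [0, 1].
    scipy expresses the truncation limits in standardized units."""
    a, b = (0.0 - mu) / sigma, (1.0 - mu) / sigma
    return (truncnorm.mean(a, b, loc=mu, scale=sigma),
            truncnorm.std(a, b, loc=mu, scale=sigma))

def F_inv(m, s, guess=(0.5, 0.2)):
    """Numerically invert F, per footnote 13."""
    return tuple(fsolve(lambda p: np.subtract(F(p[0], p[1]), (m, s)), guess))
```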


Therefore, $B(\beta_i = 1 \mid m = 1, s = 0) = 1$. In addition, it is interesting to note that the greater the likelihood of [the model indexed by] $(m, s)$, the smaller the evidence in favor of $\beta_i = 1$.

We can obtain the maximum evidence against $\beta_i = 1$ with a prior that is a special case of the truncated Gaussian prior. Let $p(\beta_i \mid m, s = 0) := \lim_{s \to 0} p(\beta_i \mid m, s) = \delta(\beta_i - m)$, where $\delta(\cdot)$ is the Dirac delta function. This prior puts all its weight on $m$ and consequently $(m_i, s_i) = (m, 0)$. As we have noted above, $L(m, s = 0 \mid y_i) = L(\beta_i = m \mid y_i)$ and thus $B(\beta_i = 1 \mid m, s = 0) = L(\beta_i = 1 \mid y_i)/L(\beta_i = m \mid y_i)$. Therefore (assuming $\beta_i \in [0, 1]$), the maximum evidence against $\beta_i = 1$ is obtained with the prior $p(\beta_i \mid m = \hat\beta_i, s = 0)$, which minimizes the Bayes factor in favor of $\beta_i = 1$:
$$B_i^* := B(\beta_i = 1 \mid m = \hat\beta_i, s = 0) = \frac{L(\beta_i = 1 \mid y_i)}{L(\beta_i = \hat\beta_i \mid y_i)} \le 1. \tag{4.12}$$
All other priors of any form (as long as the support is limited to $\beta_i \in [0, 1]$) produce Bayes factors more favorable to $\beta_i = 1$. Therefore we refer to $1/B_i^*$ as the maximum evidence against $\beta_i = 1$.¹⁵ If $1/B_i^*$ is small, then we can be confident that the data contain little evidence against $\beta_i = 1$ regardless of one's prior view. By contrast, if $1/B_i^*$ is large, then the evidence against $\beta_i = 1$ may well depend on one's prior view.

The maximum evidence against $\beta_i = 1$ is shown in Table 1. Graphically, one can see the maximum evidence against in Figure 1 as the ratio of the maximum likelihood to the value of the likelihood at $\beta_i = 1$.

See Figure 5 for $B(\beta_i = 1 \mid m, s)$ for $(m, s) \in R$ for each of the twelve exchange-rate regimes. We see that Bayes factors that favor $\beta_i = 1$ can be produced by choosing $m$ and $s$ sufficiently small (which has the effect of producing a posterior that has a low mean).

¹⁵Note $\log(1/B_i^*) = \frac{1}{2}\bigl(\frac{1 - \hat\beta_i}{\sigma_{\beta_i}}\bigr)^2$.
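With the Gaussian likelihood (2.7), footnote 15 gives $1/B_i^*$ in closed form, so the Table 1 entries can be reproduced directly from the sufficient statistics. A one-line check (ours) against the U.S. row:

```python
import math

def max_evidence_against_unit_root(beta_hat, sigma_beta):
    """1/B*_i of (4.12); by footnote 15, log(1/B*_i) = ((1 - beta_hat)/sigma)^2 / 2."""
    return math.exp(0.5 * ((1.0 - beta_hat) / sigma_beta) ** 2)

print(max_evidence_against_unit_root(0.898, 0.031))  # ~224.3, the U.S. entry in Table 1
```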

5. A model for learning across regimes

The preceding setup does not provide a formal way to draw inferences from one exchange-rate regime to another. In this section, we adopt a hierarchical prior that allows us to do just that. In a nutshell, we form a joint prior over all the coefficients $\{\beta_i\}$ and treat $(m, s)$ as unknown hyperparameters. This induces correlation across the $\beta_i$ coefficients in the prior, thereby allowing information to flow from $\beta_i$ to $\beta_j$ via the hyperparameters.

A hierarchical prior. We adopt the following hierarchical prior:
$$p(\beta, m, s \mid z) = p(m, s \mid z)\, p(\beta \mid m, s) = p(m, s \mid z) \prod_{i=1}^{n} p(\beta_i \mid m, s), \tag{5.1}$$
where $\beta := (\beta_1, \dots, \beta_n)$ and where $p(\beta_i \mid m, s)$ is given in (4.9).

The prior for $(m, s) \in R$ depends on a single parameter, $z$. Consider the following prior for $(m, s)$:
$$p(m, s \mid \lambda) \propto \mathbf{1}_R(m, s)\, h(m \mid \lambda), \tag{5.2}$$
where $h(m \mid \lambda) \propto \mathbf{1}_{[0,1]}(m)\, e^{\lambda m}$ is the truncated exponential distribution. By construction, $p(m \mid s = 0, \lambda) = h(m \mid \lambda)$ and $\partial p(m, s \mid \lambda)/\partial s = 0$. We choose values of $\lambda$ as follows. Define


$H(\lambda) := E[m \mid s = 0, \lambda] = \int_0^1 h(m \mid \lambda)\, m\, dm$. Given a value $z \in (0, 1)$, let $\lambda = H^{-1}(z)$. Then define
$$p(m, s \mid z) := p(m, s \mid \lambda)\big|_{\lambda = H^{-1}(z)}. \tag{5.3}$$
Thus we index the base model by the parameter $z$.¹⁶ To illustrate the effect of varying the prior for $(m, s)$, we let $z$ take on the values 0.50, 0.80, and 0.95. A plot of $p(m, s \mid z = 0.80)$ is shown in Figure 6.

The marginal prior for $\beta_i$ is a mixture, given by¹⁷
$$p(\beta_i \mid z) = \iint p(\beta_i \mid m, s)\, p(m, s \mid z)\, dm\, ds. \tag{5.4}$$
Plots of $p(\beta_i \mid z)$ for three values of $z$ are shown in Figure 7.

Likelihood and posterior. Let the joint likelihood for $\beta$ be given by
$$L(\beta \mid y) = \prod_{i=1}^{n} L(\beta_i \mid y_i), \tag{5.5}$$
where $y := (y_1, \dots, y_n)$. Then the joint posterior is
$$p(\beta, m, s \mid y, z) \propto L(\beta \mid y)\, p(\beta \mid m, s)\, p(m, s \mid z). \tag{5.6}$$

As we will see shortly, the marginal posterior distribution for $(m, s)$ plays a central role in learning across regimes. The (marginal) posterior distribution for $(m, s)$ is
$$p(m, s \mid y, z) \propto L(m, s \mid y)\, p(m, s \mid z), \tag{5.7}$$
where
$$L(m, s \mid y) = \int L(\beta \mid y)\, p(\beta \mid m, s)\, d\beta = \prod_{i=1}^{n} L(m, s \mid y_i), \tag{5.8}$$
and where $L(m, s \mid y_i)$ is given in (4.10). The marginal posterior for $\beta_i$ is a mixture of the conditional distribution:¹⁸
$$p(\beta_i \mid y, z) = \iint p(\beta_i \mid y_i, m, s)\, p(m, s \mid y, z)\, dm\, ds. \tag{5.9}$$
We can use (5.4) and (5.9) to express the Bayes factor in favor of $\beta_i = 1$ relative to the base model indexed by $z$:¹⁹
$$B(\beta_i = 1 \mid z) = \frac{p(\beta_i = 1 \mid y, z)}{p(\beta_i = 1 \mid z)}. \tag{5.10}$$
The marginal posterior for $\beta_{n+1}$ (for a regime for which we have no data but would have included if we had) is²⁰
$$p(\beta_{n+1} \mid y, z) = \iint p(\beta_{n+1} \mid m, s)\, p(m, s \mid y, z)\, dm\, ds. \tag{5.11}$$

¹⁶In the limit as $z \to 1$, $p(m, s \mid z)$ puts all its weight on $(m, s) = (1, 0)$.
¹⁷See the appendix for a derivation of (5.4).
¹⁸See the appendix for a derivation of (5.9).
¹⁹It can be shown that $\lim_{z \to 1} B(\beta_i = 1 \mid z) = 1$.
²⁰See the appendix for a derivation of (5.11).


In (5.11) we see that it is the hyperparameters that carry the information across regimes. Finally, we can use (5.4) and (5.11) to express the Bayes factor in favor of $\beta_{n+1} = 1$:
$$B(\beta_{n+1} = 1 \mid z) = \frac{p(\beta_{n+1} = 1 \mid y, z)}{p(\beta_{n+1} = 1 \mid z)}. \tag{5.12}$$
Two special cases of this hierarchical prior merit mention. First, if $p(m, s)$ places all its weight on a single point, say $(m_0, s_0)$, then we have a model in which there is no learning: we learn nothing about $\beta_j$ from $\beta_i$. Second, if $s = 0$, then we have a model in which $\beta_i = m$ for all $i$. In this latter case, what we learn about $\beta_i$ transfers directly to $\beta_j$ without attenuation.

For future reference it is convenient to define two posterior moments:
$$\bar\beta_i := \int \beta_i\, p(\beta_i \mid y)\, d\beta_i \quad\text{and}\quad \bar\sigma_{\beta_i}^2 := \int (\beta_i - \bar\beta_i)^2\, p(\beta_i \mid y)\, d\beta_i. \tag{5.13}$$

Collections of regimes. Let $y_{\mathrm{All}} := (y_1, \dots, y_{12})$, and partition $y_{\mathrm{All}}$ into $y_{\mathrm{LT}} := (y_1, y_2)$ and $y_{\mathrm{Euro}} := (y_3, \dots, y_{12})$. The posterior based on all the data ($y_{\mathrm{All}}$) can be obtained either directly or sequentially:
$$\begin{aligned} p(\beta, m, s \mid y_{\mathrm{All}}, z) &\propto p(\beta, m, s \mid z)\, L(\beta \mid y_{\mathrm{All}}) \\ &= p(\beta, m, s \mid z)\, L(\beta \mid y_{\mathrm{LT}})\, L(\beta \mid y_{\mathrm{Euro}}) \\ &\propto p(\beta, m, s \mid y_{\mathrm{LT}}, z)\, L(\beta \mid y_{\mathrm{Euro}}). \end{aligned} \tag{5.14}$$
In the last line of (5.14), $p(\beta, m, s \mid y_{\mathrm{LT}}, z)$ plays the role of the prior; it incorporates the Lothian–Taylor data.

Figure 8 shows $\bar\beta_i \pm 2\,\bar\sigma_{\beta_i}$ computed from $p(\beta_i \mid y_{\mathrm{All}}, z = 0.80)$. Compare with Figure 2. Figure 9 plots $\bar\beta_i$ versus $\hat\beta_i$ to show the shrinkage.

The Bayes factor in favor of $\beta_i = 1$ that corresponds to the direct route can be decomposed into the product of two Bayes factors that correspond to the sequential route:
$$\frac{p(\beta_i = 1 \mid y_{\mathrm{All}}, z)}{p(\beta_i = 1 \mid z)} = \frac{p(\beta_i = 1 \mid y_{\mathrm{LT}}, z)}{p(\beta_i = 1 \mid z)} \times \frac{p(\beta_i = 1 \mid y_{\mathrm{All}}, z)}{p(\beta_i = 1 \mid y_{\mathrm{LT}}, z)}. \tag{5.15}$$
Of the three Bayes factors in (5.15), the one we focus on is
$$B(\beta_i = 1 \mid y_{\mathrm{LT}}, z) := \frac{p(\beta_i = 1 \mid y_{\mathrm{All}}, z)}{p(\beta_i = 1 \mid y_{\mathrm{LT}}, z)}, \tag{5.16}$$
which expresses the revision in the odds ratio in favor of a unit root given the informative prior based on the Lothian–Taylor data. Note that $B(\beta_i = 1 \mid y_{\mathrm{LT}}, z)$ applies to $i = 1, \dots, 13$. For $i = 3, \dots, 12$, it expresses the revision in the odds ratio for the Euro-related exchange rates. For $i = 1, 2$, it expresses the revision in the odds ratio for the Lothian–Taylor exchange rates that comes about from seeing the Euro-related data (above and beyond having seen the Lothian–Taylor data itself). Finally, for $i = 13$, it expresses the revision in the odds ratio for an exchange-rate regime for which we do not have data, but would have included if we did.

See Figure 10 for the marginal posterior $p(\beta_3 \mid y_{\mathrm{LT}}, z)$ for three values of $z$. Figure 11 shows $B(\beta_i = 1 \mid y_{\mathrm{LT}}, z = 0.80)$ as defined in (5.16) for $i = 3, \dots, 13$. [Need to do this for $i = 1, 2$ as well.]


Table 3 displays $1/B(\beta_i = 1 \mid y_{\mathrm{LT}}, z)$ [the Bayes factors against $\beta_i = 1$] for three values of $z$. In addition, Table 3 shows the numerical standard deviations for the Bayes factors, which are roughly 1/10 the size of the corresponding Bayes factor.²¹

Are they all the same? According to the hierarchical model, if $s = 0$, then $\beta_i = m$ for all $i$. The Bayes factor in favor of $s = 0$ is given by
$$B(s = 0 \mid z) = \frac{p(s = 0 \mid y, z)}{p(s = 0 \mid z)} = \frac{\int_0^1 p(m, s = 0 \mid y, z)\, dm}{\int_0^1 p(m, s = 0 \mid z)\, dm}. \tag{5.17}$$
Again, we choose to use the informative prior based on the Lothian–Taylor data to compute this Bayes factor:
$$B(s = 0 \mid y_{\mathrm{LT}}, z) := \frac{p(s = 0 \mid y_{\mathrm{All}}, z)}{p(s = 0 \mid y_{\mathrm{LT}}, z)}. \tag{5.18}$$
In this case, we find $B(s = 0 \mid y_{\mathrm{LT}}, z = 0.80) \approx 2.6$. Thus, the evidence favors $s = 0$. Figure 12 shows the posterior $p(m \mid y_{\mathrm{All}}, s = 0, z = 0.80)$.

Appendix A

Derivation of $L(\beta_i \mid y_i)$. We can obtain (2.7) via an alternative route. The marginal likelihood for the regression coefficients can be obtained as follows. Adopt a non-informative prior $p(\sigma_i) \propto \sigma_i^{-1}$ and integrate out $\sigma_i$:
$$L(\alpha_i, \beta_i, \{\psi_{ij}\} \mid y_i) = \int_0^\infty \frac{L(\theta_i \mid y_i)}{\sigma_i}\, d\sigma_i, \tag{A.1}$$
where
$$L(\theta_i \mid y_i) = p(\{y_{it}\}_{t=p+1}^{T_i} \mid \{y_{it}\}_{t=1}^{p}, \theta_i). \tag{A.2}$$
It can be shown that $L(\alpha_i, \beta_i, \{\psi_{ij}\} \mid y_i)$ is proportional to a $t$ distribution with $T_i - (p + 1)$ degrees of freedom. (Details below.) The marginal likelihood for $\beta_i$, $L(\beta_i \mid y_i)$, is also proportional to a $t$ distribution with $T_i - (p + 1)$ degrees of freedom, which can be computed directly from $L(\alpha_i, \beta_i, \{\psi_{ij}\} \mid y_i)$.

Define $\ell(\beta_i) := \log\bigl(L(\beta_i \mid y_i)\bigr)$. Since $T_i - (p + 1) > 100$ for our data sets, we adopt the Gaussian approximation for the likelihood: $L(\beta_i \mid \hat\beta_i, \sigma_{\beta_i}) = N(\beta_i \mid \hat\beta_i, \sigma_{\beta_i}^2)$, where
$$\hat\beta_i = \arg\max_{\beta_i}\, \ell(\beta_i) \quad\text{and}\quad \sigma_{\beta_i} = \bigl(-\ell''(\hat\beta_i)\bigr)^{-1/2}. \tag{A.3}$$

[Show $t$ distribution.]

²¹These numerical standard deviations are computed as follows. First, we apply the formulas for computing the numerical standard deviation of an expectation using importance sampling given in Geweke (2005, pp. 114–115) to the computation of $p(\beta_i \mid y_{\mathrm{All}}, z)$ and $p(\beta_i \mid y_{\mathrm{LT}}, z)$ via Rao–Blackwellization. Next, we conduct a Monte Carlo simulation of the Bayes factor by drawing from the two distributions and taking the ratio. The standard deviations from the Monte Carlo are reported in the table.


Marginal distributions for $\beta_i$ in the model with learning. Let $\beta_{-i} = \beta \setminus \{\beta_i\}$. The marginal prior for $\beta_i$ (5.4) is a mixture of $p(\beta_i \mid m, s)$:
$$\begin{aligned} p(\beta_i) &= \iiint p(\beta, m, s)\, d\beta_{-i}\, dm\, ds \\ &= \iint \Bigl(\int p(\beta \mid m, s)\, d\beta_{-i}\Bigr)\, p(m, s)\, dm\, ds \\ &= \iint p(\beta_i \mid m, s)\, p(m, s)\, dm\, ds, \end{aligned} \tag{A.4}$$
where the third equality follows from $p(\beta \mid m, s) = \prod_{j=1}^{n} p(\beta_j \mid m, s)$ and
$$\int \prod_{j=1}^{n} p(\beta_j \mid m, s)\, d\beta_{-i} = p(\beta_i \mid m, s) \prod_{\substack{j=1 \\ j \ne i}}^{n} \int p(\beta_j \mid m, s)\, d\beta_j = p(\beta_i \mid m, s). \tag{A.5}$$

The marginal posterior distribution for $\beta_i$ (5.9) is a mixture of $p(\beta_i \mid y_i, m, s)$:
$$\begin{aligned} p(\beta_i \mid y) &= \iiint p(\beta, m, s \mid y)\, d\beta_{-i}\, dm\, ds \\ &= \iint \Bigl(\int p(\beta \mid y, m, s)\, d\beta_{-i}\Bigr)\, p(m, s \mid y)\, dm\, ds \\ &= \iint p(\beta_i \mid y_i, m, s)\, p(m, s \mid y)\, dm\, ds, \end{aligned} \tag{A.6}$$
where the third equality follows from $p(\beta \mid y, m, s) = \prod_{j=1}^{n} p(\beta_j \mid y_j, m, s)$ and the same steps as in (A.5). The marginal posterior for $\beta_{n+1}$ (5.11) can be obtained as a special case of (5.9) by adopting the formalism $L(\beta_{n+1} \mid y_{n+1}) = 1$, so that $p(\beta_{n+1} \mid y_{n+1}, m, s) = p(\beta_{n+1} \mid m, s)$.

Truncated exponential distribution. The truncated exponential distribution is given by
$$h(x \mid \lambda) = \mathbf{1}_{[0,1]}(x)\, \frac{\lambda\, e^{-\lambda x}}{1 - e^{-\lambda}}. \tag{A.7}$$
The mean, as a function of $\lambda$, is given by
$$H(\lambda) := \int_0^1 h(x \mid \lambda)\, x\, dx = \frac{1}{\lambda} + \frac{1}{1 - e^{\lambda}} \tag{A.8}$$
and the standard deviation is
$$G(\lambda) := \Bigl(\int_0^1 h(x \mid \lambda)\, (x - H(\lambda))^2\, dx\Bigr)^{1/2} = \sqrt{\frac{1}{\lambda^2} + \frac{1}{2\,(1 - \cosh(\lambda))}}. \tag{A.9}$$
Then $g(m) := G(\lambda)|_{\lambda = H^{-1}(m)}$.


Weighted bootstrap (importance sampling). Here is an outline of how to compute the posterior for the hierarchical model $p(\beta \mid y_{\mathrm{All}}, z)$ with the prior $p(m, s \mid z)$ via the weighted bootstrap.

(1) For $r = 1, \dots, R$:
    (a) Draw $(m^{(r)}, s^{(r)})$ from $p(m, s \mid z)$.
    (b) Compute $(\mu^{(r)}, \sigma^{(r)}) = F^{-1}(m^{(r)}, s^{(r)})$.
    (c) For $i = 1, \dots, 13$:
        (i) Draw $\beta_i^{(r)}$ from $p(\beta_i \mid \mu^{(r)}, \sigma^{(r)})$.
        (ii) Compute $v_i^{(r)} = L(\beta_i^{(r)} \mid \hat\beta_i, \sigma_{\beta_i})$ (note: $v_{13}^{(r)} = 1$ for all $r$).
    (d) Compute $v_{\mathrm{All}}^{(r)} = \prod_{i=1}^{13} v_i^{(r)}$.
(2) Compute $w_{\mathrm{All}}^{(r)} = v_{\mathrm{All}}^{(r)} \big/ \sum_{r=1}^{R} v_{\mathrm{All}}^{(r)}$ (for $r = 1, \dots, R$).
(3) Resample $\{(\beta_1^{(r)}, \dots, \beta_{13}^{(r)})\}_{r=1}^{R}$ using $\{w_{\mathrm{All}}^{(r)}\}_{r=1}^{R}$ as probabilities.

To compute the posterior based on the LT data only, compute $w_{\mathrm{LT}}^{(r)} = v_{\mathrm{LT}}^{(r)} \big/ \sum_{r=1}^{R} v_{\mathrm{LT}}^{(r)}$, where $v_{\mathrm{LT}}^{(r)} = \prod_{i=1}^{2} v_i^{(r)}$, and resample $\{(\beta_1^{(r)}, \dots, \beta_{13}^{(r)})\}_{r=1}^{R}$ using $\{w_{\mathrm{LT}}^{(r)}\}_{r=1}^{R}$ as probabilities. (The posteriors for $\{\beta_i\}_{i=3}^{13}$ will all be the same.)

The density $p(\beta_i = 1 \mid y_{\mathrm{All}}, z)$ can be computed via Rao–Blackwellization as follows. First, analogous to (5.9), we can express $p(\beta_i \mid y_{\mathrm{All}}, z)$ as
$$p(\beta_i \mid y_{\mathrm{All}}, z) = \iint p(\beta_i \mid y_i, \mu, \sigma)\, p(\mu, \sigma \mid y_{\mathrm{All}}, z)\, d\mu\, d\sigma. \tag{A.10}$$
Equation (4.5) delivers an analytical expression for $p(\beta_i \mid y_i, \mu, \sigma)$, and we can obtain draws of $(\mu, \sigma)$ from $p(\mu, \sigma \mid y_{\mathrm{All}}, z)$ by resampling $\{(\mu^{(r)}, \sigma^{(r)})\}_{r=1}^{R}$ using $\{w_{\mathrm{All}}^{(r)}\}_{r=1}^{R}$ as probabilities. Therefore, we can compute
$$p(\beta_i = 1 \mid y_{\mathrm{All}}, z) \approx \frac{1}{R} \sum_{r'=1}^{R} p(\beta_i = 1 \mid y_i, \mu^{(r')}, \sigma^{(r')}), \tag{A.11}$$
where $\{(\mu^{(r')}, \sigma^{(r')})\}_{r'=1}^{R}$ are the resampled values. If we want $p(\beta_i = 1 \mid y_{\mathrm{LT}}, z)$, then we resample $\{(\mu^{(r)}, \sigma^{(r)})\}_{r=1}^{R}$ using $\{w_{\mathrm{LT}}^{(r)}\}_{r=1}^{R}$ as probabilities instead.

To compute the posterior using the prior $p(m, s \mid z')$ instead of $p(m, s \mid z)$, first compute
$$q^{(r)} := \frac{p(m^{(r)}, s^{(r)} \mid z')}{p(m^{(r)}, s^{(r)} \mid z)} \tag{A.12}$$
and then resample $\{(\beta_1^{(r)}, \dots, \beta_{13}^{(r)})\}_{r=1}^{R}$ using either $\{w_{\mathrm{All}}^{(r)}\}_{r=1}^{R}$ or $\{w_{\mathrm{LT}}^{(r)}\}_{r=1}^{R}$ as probabilities, where now
$$w_{\mathrm{All}}^{(r)} = \frac{q^{(r)}\, v_{\mathrm{All}}^{(r)}}{\sum_{r=1}^{R} q^{(r)}\, v_{\mathrm{All}}^{(r)}} \quad\text{and}\quad w_{\mathrm{LT}}^{(r)} = \frac{q^{(r)}\, v_{\mathrm{LT}}^{(r)}}{\sum_{r=1}^{R} q^{(r)}\, v_{\mathrm{LT}}^{(r)}}. \tag{A.13}$$


Figure 1. Likelihoods $L(\beta_i \mid y_i)$: the Lothian–Taylor likelihoods (U.S., France) are shown in the first row, and the Euro-related likelihoods (Austria, Belgium, Finland, France, Greece, Ireland, Italy, Netherlands, Portugal, Spain) are shown in the remaining rows. [Figure omitted; each panel plots the likelihood over $\beta_i \in [0.5, 1.1]$ with vertical axis from 0 to 12.]


References

Geweke, J. (2005). Contemporary Bayesian econometrics and statistics. John Wiley & Sons.

(Fisher) Federal Reserve Bank of Atlanta, Research Department, 1000 Peachtree Street N.E., Atlanta, GA 30309–4470

E-mail address: [email protected]

URL: http://www.markfisher.net


Table 1. Sufficient statistics for the Lothian–Taylor data and for the Euro-related data. The lag length is one ($p = 1$). Also shown is $1/B_i^*$, the maximum evidence against $\beta_i = 1$.

Dataset    i    $\hat\beta_i$   $\sigma_{\beta_i}$   $1/B_i^*$   Country
LT         1    0.898   0.031   224.3   U.S.
           2    0.761   0.076   140.4   France
  Average       0.830   0.054
Euro       3    0.954   0.046     1.6   Austria
           4    0.858   0.059    18.2   Belgium
           5    0.780   0.068   193.3   Finland
           6    0.776   0.066   321.7   France
           7    0.845   0.064    19.0   Greece
           8    0.832   0.074    12.9   Ireland
           9    0.881   0.061     6.8   Italy
           10   0.852   0.059    23.8   Netherlands
           11   0.860   0.076     5.4   Portugal
           12   0.820   0.077    14.9   Spain
  Average       0.846   0.065

Table 2. Averages of $\{\hat\beta_i\}_{i=3}^{12}$ and $\{\sigma_{\beta_i}\}_{i=3}^{12}$ (Euro-related data) by lag length $p$.

 p   β      σβ     p   β      σβ     p   β      σβ     p   β      σβ
 1   0.846  0.065  13  0.846  0.065  25  0.843  0.067  37  0.851  0.072
 2   0.843  0.065  14  0.848  0.066  26  0.841  0.068  38  0.851  0.073
 3   0.847  0.066  15  0.855  0.067  27  0.845  0.068  39  0.850  0.073
 4   0.845  0.066  16  0.853  0.067  28  0.848  0.069  40  0.851  0.073
 5   0.849  0.066  17  0.849  0.067  29  0.851  0.070  41  0.852  0.074
 6   0.851  0.067  18  0.848  0.067  30  0.853  0.070  42  0.849  0.074
 7   0.848  0.067  19  0.844  0.067  31  0.853  0.070  43  0.850  0.075
 8   0.848  0.067  20  0.845  0.067  32  0.856  0.071  44  0.849  0.075
 9   0.856  0.066  21  0.843  0.067  33  0.856  0.071  45  0.847  0.075
10   0.851  0.066  22  0.847  0.068  34  0.853  0.071  46  0.845  0.076
11   0.860  0.066  23  0.848  0.068  35  0.856  0.072  47  0.842  0.076
12   0.858  0.066  24  0.853  0.068  36  0.857  0.072  48  0.844  0.077

PPP AND UNIT ROOTS: LEARNING ACROSS REGIMES 15

Table 3. Bayes factors against $\beta_i = 1$, using three LT-based priors indexed by $z$. (Numerical standard deviations in parentheses.)

 i    z = 0.50       z = 0.80       z = 0.95
 1      8.6 (1.0)      8.8 (0.9)     11.3 (1.1)
 2      8.1 (0.7)     10.0 (0.8)     17.5 (1.5)
 3      3.1 (0.3)      3.7 (0.3)      5.4 (0.4)
 4     59.0 (6.2)     68.3 (6.5)     87.4 (7.5)
 5    456.9 (40.5)   503.9 (42.5)   568.4 (48.0)
 6    707.6 (61.7)   776.9 (64.7)   865.1 (73.1)
 7     63.1 (6.4)     72.4 (6.8)     90.9 (7.8)
 8     45.8 (4.5)     52.3 (4.8)     64.8 (5.5)
 9     23.2 (2.5)     27.1 (2.6)     35.9 (3.0)
10     75.8 (7.9)     87.3 (8.3)    110.7 (9.5)
11     20.8 (2.1)     24.1 (2.2)     30.9 (2.6)
12     51.9 (5.0)     58.9 (5.3)     71.8 (6.1)
13      6.0 (0.5)      6.8 (0.6)      8.6 (0.7)


Figure 2. Error bars show $\hat\beta_i \pm 2\,\sigma_{\beta_i}$ for $i = 1, \dots, 12$. The weighted mean is indicated, $\sum_{i=1}^{12} w_i \hat\beta_i = 0.864$, where the weights are proportional to the precisions: $w_i = \sigma_{\beta_i}^{-2} \big/ \sum_{j=1}^{12} \sigma_{\beta_j}^{-2}$. [Figure omitted; vertical axis runs from 0.5 to 1.1.]


Figure 3. Bayes factors $B(\beta_i = 1 \mid a)$ for $a \in [0, 1]$; see equation (4.3). Values greater than one favor $\beta_i = 1$. The location of $\hat\beta_i$ is indicated. [Figure omitted; twelve panels, one per regime, with $B$ on a log scale from 0.1 to 5.0.]


Figure 4. Domain for $(m, s)$: $R = \{(m, s) : 0 \le m \le 1 \,\wedge\, 0 \le s \le g(m)\}$. [Figure omitted; $m$ on the horizontal axis, $s$ on the vertical axis from 0 to 0.25.]


Figure 5. Bayes factors $B(\beta_i = 1 \mid m, s) = L(\beta_i = 1 \mid y_i)/L(m, s \mid y_i)$ for $(m, s) \in R$. Contours run from $10^{-2}$ to $10^{2}$ in powers of 10. Shaded regions favor $\beta_i = 1$. The location $(\hat\beta_i, 0)$ that produces $1/B_i^*$ (the maximum evidence against $\beta_i = 1$) is indicated. [Figure omitted; twelve panels with axes as in Figure 4.]


Figure 6. Prior $p(m, s \mid z = 0.80)$ over $R$ is shown with evenly spaced contours. [Surface plot omitted; density values run from 0 to roughly 30.]

Figure 7. Marginal priors $p(\beta_i \mid z)$ derived from the priors $p(m, s \mid z)$ for $z = 0.50$, $0.80$, and $0.95$, where $z = E[m \mid s = 0]$. The means of the priors for $\beta_i$ are 0.50, 0.70, and 0.90. [Figure omitted.]


Figure 8. Posterior means and standard deviations: $\bar\beta_i \pm 2\,\bar\sigma_{\beta_i}$ computed from $p(\beta_i \mid y_{\mathrm{All}}, z = 0.80)$ for $i = 1, \dots, 13$. [Figure omitted; vertical axis runs from 0.5 to 1.1.]


Figure 9. Posterior means $\bar\beta_i$ versus $\hat\beta_i$, showing the shrinkage (slope = 0.27). [Figure omitted; both axes run from 0.80 to 0.95.]


Figure 10. Marginal posteriors for $\beta_3$ for three values of $z$ ($z = 0.50$, $0.80$, $0.95$), where $z = E[m \mid s = 0]$. The means of the posteriors are 0.78, 0.82, and 0.87. [Figure omitted.]


Figure 11. Bayes factors $B(\beta_i = 1 \mid y_{\mathrm{LT}}, z = 0.80)$ in favor of a unit root for the Euro-related data; see (5.16). [Figure omitted; log scale from 0.001 to 1.000, $i = 1, \dots, 13$ on the horizontal axis.]

Figure 12. The posterior distribution $p(m \mid y_{\mathrm{All}}, s = 0, z = 0.80)$. [Figure omitted; horizontal axis from 0.75 to 1.00, probability density up to 25.]