Factor models with downside risk
by
Daniele Massacci, Lucio Sarno and Lorenzo Trapani
Granger Centre Discussion Paper No. 20/02
Factor Models with Downside Risk*
Daniele Massacci† Lucio Sarno‡ Lorenzo Trapani§
This Version: November 2020
*The usual disclaimer applies.
†Daniele Massacci is with King’s College London. Email: [email protected].
‡Lucio Sarno is with the University of Cambridge and the Centre for Economic Policy Research (CEPR). Email: [email protected].
§Lorenzo Trapani is with the University of Nottingham. Email: [email protected].
We propose a conditional model of asset returns in the presence of common factors and downside risk. Specifically, we generalize existing latent factor models in three ways: we allow for downside risk via a threshold specification which permits estimation of the ‘disappointment’ event, which therefore does not need to be assumed arbitrarily; we permit different factor structures (and numbers of factors) in different regimes; and we show how to recover the observable factors’ risk premia from the estimated latent ones in different regimes. The usefulness of this generalized model is illustrated through two applications to cross-sections of asset returns in equity markets and other major asset classes.
1 Introduction
A growing body of research argues that the cross-section of asset returns should and does
reflect a premium for bearing downside risk. This risk premium is required by investors for
holding assets that covary strongly with the market when the market declines or is abnormally
low: such assets are undesirable precisely because they offer particularly low returns in bad times.
For example, Ang, Chen, and Xing (2006) show that there is a downside risk premium in the
cross-section of US stock returns, a result confirmed by Farago and Tedongap (2018) using a
five-factor conditional model, while Lettau, Maggiori, and Weber (2014) show that a conditional
capital asset pricing model that allows for downside risk can explain not only US stocks, but also
the cross section of returns in currency and other major asset classes. These results are consistent
with the original conceptual framework in Markowitz (1959), which advocated using semivariance
as a measure of risk, rather than variance, because semivariance measures downside losses rather
than upside gains.1
In a parallel development in the context of asset pricing models, the literature has also sought
to tackle the vexing issue of potentially biased estimation of unconditional (i.e., linear, and hence
free from downside risk) asset pricing models arising from omitting pricing factors. Since it is not
at all clear which pricing factors should be used, virtually any model which uses observable factors
is potentially at risk of omitting a relevant pricing factor, thereby suffering from omitted variable
bias. This misspecification problem is addressed rigorously in a recent, seminal contribution by
Giglio and Xiu (2020), who propose a three-pass procedure to conduct inference on the risk premia
of virtually any observable factor. The procedure solves both the omitted variable bias problem,
and the much debated “factor zoo” issue (see Harvey and Liu, 2020 and Cochrane, 2011), by
considering a latent factor model, which makes it possible to span the underlying true factor space.
1The motivation for a downside risk premium arises from various theoretical perspectives. In some cases, the rationale is behavioral and linked to theories of loss aversion, as in the setups of Ang, Chen, and Xing (2006) and Farago and Tedongap (2018), whereas in other cases it is based on rational asset pricing models, as in Lettau, Maggiori, and Weber (2014). The specific nature of the mechanism capable of generating the downside risk premium is not a focus of our paper. We simply take seriously the finding that such a premium can exist in equilibrium and ask how it should be identified in an empirical asset pricing model.
The two research areas summarised above indicate that a linear asset pricing model based on
observable factors, which constitutes the workhorse model of empirical asset pricing, can suffer
from at least two possible misspecification issues: the omission of relevant pricing factors, and the
presence of a downside regime. Both these issues can be resolved by using a nonlinear, latent factor
model which allows for the presence of downside risk. However, despite the intuitive appeal of such
a setup, inference for asset pricing models with downside risk and latent factors is a challenging
task. In addition to the nonlinearity of the model, allowing for different regimes means that the
number of common latent factors may be different between good and bad states of the world; the
loadings on the factors may also change across states; the “disappointment event” that triggers the
downside state is in general unknown; and, finally, a solution to the omitted variable problem which
characterises linear asset pricing models needs to be fully developed when also taking downside
risk into account. Our paper offers a methodological contribution, addressing all these issues and
presenting a nonlinear model of asset returns in the presence of common factors and downside
risk; further, we consider a latent factor model approach, which makes it possible to span the entire factor
space as advocated in Giglio and Xiu (2020). Our setup is general enough to encompass previous
models, while allowing for different common factors across states and estimation of the threshold
determining the downside state.
We make at least three contributions to the literature on factor models of asset returns.
Firstly, whilst we define the disappointment event as depending on the relative position of the
market return with respect to a threshold value in a similar fashion to the extant literature,
unlike previous studies we estimate the value of the threshold rather than setting it a priori
equal to a prespecified value. This is achieved by generalizing the inferential theory for panel
threshold models proposed by Massacci (2017) to the case where the number of common factors
and their loadings are allowed to differ across states; we also derive the relevant inferential theory
for the estimated threshold. Secondly, we develop a procedure to estimate the number of common
factors in a model with downside risk, by explicitly allowing for the case where such number may
differ across regimes. This is done in a similar fashion to Trapani (2018), although this is not a
trivial extension since it requires a preliminary round of estimation where the full factor model
is estimated (thereby including the threshold, and the spaces spanned by factors and loadings),
and the strong consistency of all estimates is required. We further provide test statistics that are
completely scale-invariant. Finally, we complete our inferential theory by developing a further
step, to allow for the mapping of the estimated latent factors in each stage onto economically
relevant observable ones. This step can be viewed as an extension of Giglio and Xiu (2020). The
distilled essence of these results is a general conditional model of asset returns with downside risk
which can be estimated and tested without imposing any arbitrary restrictions on the stochastic
process of returns. In our view, this setup constitutes at least a workhorse model for conducting
empirical asset pricing in a downside risk framework.
We employ our methodology to study the factor structure and estimate risk premia in the
presence of downside risk using two datasets previously employed in other studies - a dataset
composed of equity portfolios (the same as used by Farago and Tedongap (2018)) and a dataset
spanning multiple asset classes (based on the one used by Lettau, Maggiori, and Weber (2014)).
In both cases, our approach produces superior goodness of fit relative both to unconditional
(linear) factor models and restricted conditional models that impose the number of factors or the
threshold. The empirical estimation of the proposed latent factor model with downside risk proves
illuminating when compared to an unconditional model of asset returns. Indeed, the unconditional
model performs poorly in both states of the world, with systematic pricing errors that are positive
in the downside state and negative in the satisfactory state - i.e., the unconditional model predicts
much higher returns than the realized ones in the downside state, and lower returns than the
realized ones in the satisfactory state. Puzzlingly, when analyzing the full sample performance
of the unconditional model, its performance can appear satisfactory, but this is only because
averaging across negative and positive pricing errors in different states gives the false impression
that the model can capture the average excess returns precisely. In other words, assessing the
performance of the unconditional factor model over the full sample can give the illusion that it
performs well when in fact it performs poorly in both regimes. In contrast, the latent factor model
with downside risk appears to have no systematic tendency to generate positive or negative errors
in either regime, and therefore also unconditionally, fitting the cross-sections of asset returns used
here very well.
The estimation results highlight the importance of estimating the threshold (rather than im-
posing it a priori) and of allowing for different common factors across states. For example, with
respect to the estimation of the threshold, we find a remarkable similarity in its point estimate
across the two datasets employed, with the estimate being around −6 percent. This estimate is
twice as large in absolute value as the most common value imposed in estimation by researchers,
which is −3 percent, thereby implying that the downside risk state is characterised by a much
smaller fraction of the return distribution with more extreme negative returns. We also find that,
while at least four (and up to six) common factors are needed in the satisfactory (good) state
of the world to capture the factor structure of returns, only one or at most two common factors
are needed in the downside risk state. This is consistent with the widely held view, and much
anecdotal evidence, that returns display a far lower-dimensional factor structure in bad times than
in good times, i.e. that diversification benefits disappear in bad times when they are needed most
(see e.g. Cappiello et al., 2006). This result arises endogenously in the model, estimated using
our procedure that explicitly allows the factors to change across states. In short, the empirical
results provide a strong case for the full-blown, unconstrained estimation of latent factor models
with downside risk in the context of asset pricing.
The remainder of the paper is organised as follows. Section 2 introduces the asset pricing
model. Section 3 deals with identification of risk premia. Section 4 presents our methodology to
determine the number of pricing factors. Section 5 discusses estimation and inference of pricing
factors, risk exposures and risk premia. Section 6 presents simulation results. Section 7 performs
the empirical analysis. Section 8 concludes.
2 Model
In this section we present the setup of the unconditional asset pricing model that is most
commonly used in the literature, followed by the conditional asset pricing model with downside
risk that we study in this paper.
At the outset, it is useful to establish some notation. We denote the ordinary limit as “→”. We use the notation $c_0, c_1, \ldots$ to denote positive and finite constants, which do not depend on the sample sizes and whose value can change from line to line. $I(\cdot)$ denotes the indicator function. We use the expression “a.s.” as short-hand for “almost surely”. We denote the $j$-th largest eigenvalue of a matrix $A$ as $g^{(j)}(A)$; $\iota_N$ denotes an $N$-dimensional vector of ones; $I_k$ is an identity matrix of dimension $k$; we use the notation $a_N = \Omega(b_N)$ to indicate that, as $N \to \infty$, $b_N^{-1} a_N \to \infty$; we set $C_{N,T} = \min\{N^{1/2}, T^{1/2}\}$.
2.1 Unconditional asset pricing model
Let $R_{i,t}$ be the return (in excess of the risk-free rate) on the test asset $i$ at time $t$. Our starting point is the unconditional asset pricing model
$$R_{i,t} = \gamma_0 + \alpha_i + \beta_i' \gamma_1 + \beta_i' u_t + \varepsilon_{i,t}, \qquad (2.1)$$
with $1 \le i \le N$, $1 \le t \le T$, where $N$ and $T$ denote the cross-section and time series dimensions of the available sample, respectively, and
$$u_t = f_t - \mu_f. \qquad (2.2)$$
In (2.2), $f_t = (f_{1,t}, \ldots, f_{P,t})'$ is the $P \times 1$ vector of latent factors with $E(f_t) = \mu_f$, so that $u_t$ in (2.1) represents the $P \times 1$ zero mean vector of factor innovations; further, in (2.1), $\gamma_0$ is the zero-beta rate; $\gamma_1 = (\gamma_{1,1}, \ldots, \gamma_{1,P})'$ is the $P \times 1$ vector of risk premia; $\beta_i = (\beta_{i,1}, \ldots, \beta_{i,P})'$ is the $P \times 1$ vector of risk exposures; $\alpha_i$ is the pricing error; $\varepsilon_{i,t}$ is the idiosyncratic component such that $E(\varepsilon_{i,t}) = 0$ and $E(u_t \varepsilon_{i,t}) = 0$. In matrix notation, the model becomes
$$R_t = \gamma_0 \iota_N + \alpha + B\gamma_1 + Bu_t + \varepsilon_t, \qquad (2.3)$$
where $R_t = (R_{1,t}, \ldots, R_{N,t})'$, $\alpha = (\alpha_1, \ldots, \alpha_N)'$, and $B = (\beta_1, \ldots, \beta_N)'$. Following Giglio and Xiu (2020), we let some of the true factors be omitted or measured with error (or both). The $K$ observable factors $g_t$, with $K \le P$, can be either tradable or nontradable (or both) and have the structure
$$g_t = a + \Lambda u_t + e_t. \qquad (2.4)$$
As shown in Giglio and Xiu (2020), given $\gamma_1$, the $K \times 1$ vector of risk premia of $g_t$ is $\gamma_g = \Lambda \gamma_1$.
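The structure in (2.1)-(2.4) can be made concrete with a minimal numpy simulation; all dimensions, parameter values, and the seed below are illustrative choices of ours, not quantities from the paper.

```python
import numpy as np

# Minimal simulation of the unconditional model (2.1)-(2.4).
rng = np.random.default_rng(0)
N, T, P, K = 50, 200, 3, 2              # assets, periods, latent factors, observables

gamma0 = 0.01                            # zero-beta rate
gamma1 = np.array([0.05, 0.03, 0.02])    # P x 1 vector of risk premia
B = rng.normal(size=(N, P))              # risk exposures
alpha = 0.001 * rng.normal(size=N)       # pricing errors

u = rng.normal(size=(T, P))              # factor innovations, E(u_t) = 0
eps = 0.1 * rng.normal(size=(T, N))      # idiosyncratic component

# R_t = gamma0 * iota_N + alpha + B gamma1 + B u_t + eps_t          (2.3)
R = gamma0 + alpha + B @ gamma1 + u @ B.T + eps

# Observable factors g_t = a + Lambda u_t + e_t                      (2.4)
Lam = rng.normal(size=(K, P))
g = 0.01 + u @ Lam.T + 0.05 * rng.normal(size=(T, K))

# Risk premia of the observables: gamma_g = Lambda gamma1
gamma_g = Lam @ gamma1
print(R.shape, g.shape, gamma_g.shape)
```

The rows of `R` stack the $N \times 1$ return vectors $R_t$, so the panel has one column per test asset.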
2.2 Conditional asset pricing model with downside risk
Following Ang, Chen, and Xing (2006), Lettau, Maggiori, and Weber (2014), and Farago and
Tedongap (2018), we generalize the asset pricing model in (2.1) to account for downside risk. The
conditional factor model of interest is
$$R_{i,t} = I(D_t)\big(\gamma_{D,0} + \alpha_{D,i} + \beta_{D,i}'\gamma_{D,1} + \beta_{D,i}' u_{D,t}\big) + I(S_t)\big(\gamma_{S,0} + \alpha_{S,i} + \beta_{S,i}'\gamma_{S,1} + \beta_{S,i}' u_{S,t}\big) + \varepsilon_{i,t}, \qquad (2.5)$$
where
$$u_{j,t} = f_{j,t} - \mu_{j,f}, \qquad j = D, S. \qquad (2.6)$$
The model in (2.5) generalizes (2.1) to allow for downside risk as captured by the disappointment event $D_t$, which is defined formally below. The satisfactory event $S_t$ occurs whenever $D_t$ does not take place: formally, this means that $I(D_t) + I(S_t) = 1$ and $I(D_t) I(S_t) = 0$. Section 2.3 below discusses $D_t$ and $S_t$ in detail. In (2.6), $f_{j,t} = (f_{j,1,t}, \ldots, f_{j,P_j,t})'$ is the $P_j \times 1$ vector of latent factors satisfying $E[f_{j,t} \mid I(j_t) = 1] = \mu_{j,f}$; $u_{j,t}$ is the $P_j \times 1$ vector of factor innovations such that $E[u_{j,t} \mid I(j_t) = 1] = 0$. As for (2.5): $\gamma_{j,0}$ is the zero-beta rate and $\gamma_{j,1} = (\gamma_{j,1,1}, \ldots, \gamma_{j,1,P_j})'$ is the $P_j \times 1$ vector of risk premia; $\beta_{j,i} = (\beta_{j,i,1}, \ldots, \beta_{j,i,P_j})'$ is the $P_j \times 1$ vector of risk exposures; $\alpha_{j,i}$ is the pricing error; the idiosyncratic components $\varepsilon_{i,t}$ satisfy $E[u_{j,t}\varepsilon_{i,t} \mid I(j_t) = 1] = 0$. The model in matrix form is
$$R_t = I(D_t)\big(\gamma_{D,0}\iota_N + \alpha_D + B_D\gamma_{D,1} + B_D u_{D,t}\big) + I(S_t)\big(\gamma_{S,0}\iota_N + \alpha_S + B_S\gamma_{S,1} + B_S u_{S,t}\big) + \varepsilon_t, \qquad (2.7)$$
where $\alpha_j = (\alpha_{j,1}, \ldots, \alpha_{j,N})'$ and $B_j = (\beta_{j,1}, \ldots, \beta_{j,N})'$. We generalize the mapping between the true latent factors and the $K$ observable factors as
$$g_t = I(D_t)\big(a_D + \Lambda_D u_{D,t}\big) + I(S_t)\big(a_S + \Lambda_S u_{S,t}\big) + e_t. \qquad (2.8)$$
The $K \times 1$ vector of risk premia of $g_t$ becomes
$$\gamma_g = I(D_t)\gamma_{D,g} + I(S_t)\gamma_{S,g}, \qquad \gamma_{j,g} = \Lambda_j \gamma_{j,1}, \quad j = D, S. \qquad (2.9)$$
When the disappointment event occurs we thus have $\gamma_g = \gamma_{D,g} = \Lambda_D \gamma_{D,1}$.
2.3 Disappointment event
Following, inter alia, Farago and Tedongap (2018), we define the disappointment event as $D_t = \{r_{W,t} \le \theta\}$, where $r_{W,t}$ is the log-return on the market and $\theta$ is the corresponding threshold value that determines the occurrence of $D_t$. The satisfactory event then is $S_t = \{r_{W,t} > \theta\}$. Both $D_t$ and $S_t$ depend on $\theta$: to stress this dependence, we let $d_{j,t}(\theta) = I(j_t)$, for $j = D, S$.
2.4 Pricing errors and zero-beta rate
The conditional asset pricing model in (2.7) allows for unrestricted zero-beta rates γD,0 and
γS ,0, and for non-zero pricing errors αD and αS . None of these invalidates our methodology.
In particular, non-zero pricing errors, which for instance may arise from failing no arbitrage
conditions, do not affect estimation and inference on the risk premia.2 We focus on the risk premia
in a conditional setting with downside risk: allowing for some degree of model misspecification is
a desirable feature of our procedure.
3 Identification
3.1 Identification of risk premia
The conditional factor model in (2.7) is subject to two kinds of identification issues: the
rotational indeterminacy typical of statistical factor models; the unobserved heterogeneity problem
induced by the pricing errors. Let us elaborate on these identification issues. Risk exposures,
pricing factors and risk premia in the linear model in (2.3) are identified up to a rotation: for any
P ×P positive definite matrix L we have
Rt = γ0ιN +α+Bγ1 +But +εt
= γ0ιN +α+BLL−1γ1 +BLL−1ut +εt ,
and $B$, $\gamma_1$ and $u_t$ are only identified up to $L$ as they are all unobservable. However, as shown in Giglio and Xiu (2020), the risk premia for the observable factors $g_t$ in (2.4) are identified, since it holds that $\gamma_g = \Lambda\gamma_1 = \Lambda L L^{-1}\gamma_1$.

2We do not assess a particular asset pricing model: for example, Pesaran and Yamagata (2012) and Fan, Liao, and Yao (2013) empirically validate the APT in a linear setting by testing the null hypothesis that all pricing errors are equal to zero.
The conditional pricing model in (2.5) faces similar identification issues. For any $P_j \times P_j$ positive definite matrix $L_j$, with $j = D, S$, we have
$$R_t = d_{D,t}(\theta)\big(\gamma_{D,0}\iota_N + \alpha_D + B_D L_D L_D^{-1}\gamma_{D,1} + B_D L_D L_D^{-1} u_{D,t}\big) + d_{S,t}(\theta)\big(\gamma_{S,0}\iota_N + \alpha_S + B_S L_S L_S^{-1}\gamma_{S,1} + B_S L_S L_S^{-1} u_{S,t}\big) + \varepsilon_t.$$
The risk premia $\gamma_g$ defined in (2.9), however, are identified, since
$$\gamma_g = d_{D,t}(\theta)\Lambda_D\gamma_{D,1} + d_{S,t}(\theta)\Lambda_S\gamma_{S,1} = d_{D,t}(\theta)\Lambda_D L_D L_D^{-1}\gamma_{D,1} + d_{S,t}(\theta)\Lambda_S L_S L_S^{-1}\gamma_{S,1}.$$
We solve the unobserved heterogeneity problem induced by the pricing errors by expressing the model for $R_t$ in terms of deviations from the conditional means. Formally, let $T_j(\theta) = \sum_{t=1}^T d_{j,t}(\theta)$ be the number of times that the event $j$ occurs, for $j = D, S$. Define the conditional average returns $\bar R_j(\theta) = T_j(\theta)^{-1}\sum_{t=1}^T d_{j,t}(\theta) R_t$: $\bar R_j(\theta)$ is the mean of $R_t$ when $j$ occurs. In order to estimate pricing factors and risk exposures, we consider the model
$$\tilde R_t(\theta) = R_t - d_{D,t}(\theta)\bar R_D(\theta) - d_{S,t}(\theta)\bar R_S(\theta) = d_{D,t}(\theta) B_D \tilde u_{D,t}(\theta) + d_{S,t}(\theta) B_S \tilde u_{S,t}(\theta) + \tilde\varepsilon_t, \qquad (3.1)$$
where $\tilde u_{j,t}(\theta) = u_t - \bar u_j(\theta)$ is the deviation of $u_t$ from the conditional mean $\bar u_j(\theta) = T_j(\theta)^{-1}\sum_{t=1}^T d_{j,t}(\theta) u_t$; $\tilde\varepsilon_t = d_{D,t}(\theta)[\varepsilon_t - \bar\varepsilon_D(\theta)] + d_{S,t}(\theta)[\varepsilon_t - \bar\varepsilon_S(\theta)]$, where $\bar\varepsilon_j(\theta) = T_j(\theta)^{-1}\sum_{t=1}^T d_{j,t}(\theta)\varepsilon_t$ is the conditional mean of $\varepsilon_t$ when $j$ occurs.
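The regime-wise demeaning in (3.1) can be sketched directly; the data below are simulated placeholders, and the threshold value is illustrative.

```python
import numpy as np

# Regime-wise demeaning of returns as in (3.1): subtract from R_t the
# conditional mean of the regime prevailing at time t.
rng = np.random.default_rng(2)
N, T = 25, 400
R = rng.normal(size=(T, N))            # T x N panel of excess returns (placeholder)
r_W = rng.normal(scale=0.04, size=T)   # threshold variable (market log-return)
theta = -0.06

d_D = (r_W <= theta).astype(float)
d_S = 1.0 - d_D
T_D, T_S = d_D.sum(), d_S.sum()        # T_j(theta): occurrences of each event

Rbar_D = (d_D[:, None] * R).sum(axis=0) / T_D   # conditional mean, downside
Rbar_S = (d_S[:, None] * R).sum(axis=0) / T_S   # conditional mean, satisfactory

# R~_t(theta) = R_t - d_{D,t}(theta) Rbar_D - d_{S,t}(theta) Rbar_S
R_tilde = R - d_D[:, None] * Rbar_D - d_S[:, None] * Rbar_S

# Within each regime, the demeaned returns average to (numerically) zero
print(np.abs((d_D[:, None] * R_tilde).sum(axis=0) / T_D).max())
```

The demeaning removes both the zero-beta rate and the pricing errors from each regime, which is exactly why the unobserved heterogeneity no longer interferes with the estimation of factors and exposures.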
3.2 Determining the disappointment threshold
From Section 3.1, identification of factors, exposures and risk premia requires knowledge of
the disappointment event Dt and thus of the threshold θ. Given that no procedure is available to
estimate θ, the literature typically assumes a value for θ and estimation of the asset pricing model
is then conditioned on that value being true.3 In general, however, θ is not a structural parameter
on which theory can provide guidance, which makes it difficult to assume a particular value of θ.
In this paper, we show how to estimate $\theta$ by least squares, as detailed in Section 4.2.1, building on recent results in Massacci (2017). Henceforth, we let $\hat\theta$ be the corresponding estimator.
3For example, Farago and Tedongap (2018) a priori set $\theta = -0.03$ and run robustness checks for $\theta \in \{0, -0.015, -0.04\}$.
4 Inference
We begin by introducing some short-hand notation, which we use henceforth. The superscript “0” will denote the true value of a parameter; when this is not used, we refer to a generic value. We will also use the short-hand notation $T_j = T_j(\theta^0)$ and $\pi_j = \pi_j(\theta^0)$; further, the true number of factors in each regime will be denoted as $P_D^0$ and $P_S^0$ respectively, also setting, when needed, $P^0 = P_D^0 + P_S^0$.
4.1 Assumptions
Let
$$u_t^0 = u_{D,t}^0 \cup u_{S,t}^0, \qquad (4.1)$$
where $u_t^0$ has dimension $P_u^0$ such that $\max(P_D^0, P_S^0) \le P_u^0 \le P_D^0 + P_S^0$, and let $B_{u,j}^0 = (\beta_{u,j,1}^0, \ldots, \beta_{u,j,N}^0)'$ be, for $j = D, S$, the $N \times P_u^0$ matrix suitably filled with zeros so that $B_{u,j}^0 u_t^0 = B_j^0 u_{j,t}^0$ for $j = D, S$. Finally, let
$$\delta_i^0 = \beta_{u,S,i}^0 - \beta_{u,D,i}^0. \qquad (4.2)$$

Assumption 1. There exists an $\eta \in \left(\frac{1}{2}, 1\right]$ such that $\delta_i^0 \ne 0$ for $i = 1, \ldots, \lfloor N^\eta \rfloor$ and $\sum_{i=\lfloor N^\eta \rfloor + 1}^N \|\delta_i^0\| < \infty$.
Assumption 1 is related to the results in Bates, Plagborg-Møller, Stock, and Watson (2013), who show that, if no more than a fraction $O(N^{1/2})$ of the cross-sectional units in a large dimensional factor model have a structural break in the loadings, then the principal components estimator applied to a (misspecified) linear model has the same convergence rate as in the no break case, namely $O_p(C_{N,T}^{-1})$ (see Bai and Ng (2002)). Thus, Assumption 1 requires that at least a fraction $O(N^\eta)$ of the $N$ assets have regime specific factor loadings, for $\frac{1}{2} < \eta \le 1$.
Assumption 2. It holds that, for all $N$: (i)(a) the largest eigenvalue of $T_j^{-1}\sum_{t=1}^T E\big(\varepsilon_t \varepsilon_t' d_{j,t}(\theta^0)\big)$ is finite for $j = D, S$, and (b) the smallest eigenvalue of $T_j^{-1}\sum_{t=1}^T E\big(\varepsilon_t \varepsilon_t' d_{j,t}(\theta^0)\big)$ is positive for $j = D, S$; (ii)
$$\max_{1 \le i \le N} \sum_{k=1}^N \frac{1}{T_j} \sum_{t=1}^T \sum_{s=1}^T \big| E\big(\varepsilon_{i,t}\varepsilon_{k,s}\, d_{j,t}(\theta^0)\, d_{j,s}(\theta^0)\big) \big| < \infty,$$
for $j = D, S$.
Assumption 3. It holds that (i) $E\|f_{j,t}\|^4 < \infty$ for $j = D, S$ and $1 \le t \le T$; (ii) as $T \to \infty$,
$$\frac{1}{T}\sum_{t=1}^T E\big(f_{j,t}^0 f_{j,t}^{0\prime}\, d_{j,t}(\theta)\, d_{j,t}(\theta^0)\big) \to \Sigma_{f,j}(\theta, \theta^0),$$
where $\Sigma_{f,j}(\theta, \theta^0)$ is positive definite for $j = D, S$ and all $\theta$; (iii) $\|\beta_{j,i}^0\| < \infty$ for $j = D, S$ and $1 \le i \le N$; (iv) $N^{-1} B_j' B_j \to \Sigma_{B,j}$ as $N \to \infty$, where $\Sigma_{B,j}$ are positive definite matrices for $j = D, S$.
Assumption 2 stipulates that the idiosyncratic part of the model (i.e., the errors $\varepsilon_{i,t}$) can be cross-sectionally correlated; however, by part (i) of the assumption such cross-sectional correlation is weak. Conversely, by Assumption 3, the “signal” part of the model (i.e., the one related to the common factor structure) introduces strong cross-sectional correlation between the $R_{i,t}$’s. In particular, as we show in Lemma 1 below, Assumption 3 entails that the eigenvalues of the covariance matrix of the $R_{i,t}$’s have a spiked structure, whereby the largest eigenvalues diverge proportionally to $N$ whereas the other ones are bounded.
Assumption 4. It holds that (i) $E(\varepsilon_{i,t}) = 0$ and $E|\varepsilon_{i,t}|^8 < \infty$; (ii) $E\big(d_{j,t}(\theta) d_{j,v}(\theta)\varepsilon_{i,t}\varepsilon_{i,v}\big) = \tau_{j,i,t,v}(\theta)$ with $|\tau_{j,i,t,v}(\theta)| \le |\tau_{j,t,v}|$ for $1 \le i \le N$, and $T_j^{-1}\sum_{t=1}^T \sum_{v=1}^T |\tau_{j,t,v}| < \infty$ for $j = D, S$; (iii) $E\big(T_j^{-1}\sum_{t=1}^T d_{j,t}(\theta)\varepsilon_{i,t}\varepsilon_{l,t}\big) = \sigma_{j,i,l,t}(\theta)$ with $|\sigma_{j,i,l,t}(\theta)| \le |\sigma_{j,i,l}|$ for $1 \le t \le T$ and $N^{-1}\sum_{i=1}^N \sum_{l=1}^N |\sigma_{j,i,l}| < \infty$ for $j = D, S$; (iv)
$$E\left| T_j^{-1/2} \sum_{t=1}^T \big( d_{j,t}(\theta)\varepsilon_{i,t}\varepsilon_{l,t} - E(d_{j,t}(\theta)\varepsilon_{i,t}\varepsilon_{l,t}) \big) \right|^4 < \infty,$$
for $j = D, S$ and $1 \le i, l \le N$; (v) it holds that $E\big(f_{j,s}^0 \varepsilon_{i,t} \mid z_t\big) = 0$ for $j = D, S$ and $1 \le t, s \le T$, with
$$E\left[ N^{-1} \sum_{i=1}^N \left\| T_j^{-1/2} \sum_{t=1}^T d_{j,t}(\theta)\, f_{j,t}^0\, \varepsilon_{i,t} \right\|^2 \right] < \infty,$$
for all $\theta$ and $j = D, S$.
Assumption 4 mimics similar assumptions in the literature (see e.g. Assumptions C and D
in Bai (2003)), and essentially it states that the idiosyncratic errors are weakly dependent. Parts
(iv) and (v) of the assumption are, essentially, weak (serial) dependence conditions.
Assumption 5. It holds that (i) $\{f_{j,t}^0, z_t, \varepsilon_t\}$ is a strictly stationary, ergodic, $\rho$-mixing sequence with mixing numbers $\rho_{j,m}$ satisfying $\sum_{m=1}^\infty \rho_{j,m}^{1/2} < \infty$ for $j = D, S$; (ii) $E\big(\|f_{j,t}^0 \varepsilon_{i,t}\|^4 \mid z_t\big) < \infty$ and $E\big(\|f_{j,t}^0\|^4 \mid z_t\big) < \infty$ for $j = D, S$ and $1 \le i \le N$; (iii) $z_t$ has bounded density; (iv) the density of $z_t$ is continuous at $\theta^0$; (v) $E\big(f_{j,t}^0 f_{j,t}^{0\prime} \mid z_t = \theta\big)$ is positive definite and continuous at $\theta = \theta^0$ for $j = D, S$, with $\sum_{i=\lfloor N^\eta \rfloor + 1}^N \delta_i^{0\prime} E\big(f_{j,t}^0 f_{j,t}^{0\prime} \mid z_t = \theta^0\big) \delta_i^0 < \infty$.
Assumption 5 draws upon Assumption 1 in Hansen (2000) (see also Massacci (2017)), to which we refer for further discussion. In our context, it is used in order to obtain the convergence rate of the estimators for factor innovations and loadings. We point out that some of the other assumptions above, chiefly the ones that contain “time series results” (e.g., Assumptions 4(iv)-(v)), would follow from Assumption 5, although they could also be derived from other dependence assumptions.
A final note on the assumptions above. Assumption 3(iv) is typical in the literature on
inference on factor models (we refer, inter alia, to the contributions by Bai and Ng (2002) and
Bai (2003)) and, in essence, it requires that the common factors be “strong” or “pervasive”. A
well-known consequence of the assumption is that - as we show in Lemma 1 - the eigenvalues of
the second moment matrix of the data diverge at a rate exactly equal to N , which we exploit in
the construction of our tests for the determination of the number of common factors. Conversely,
Assumption 3(iv) rules out the potentially interesting case of having “weak” common factors.
As pointed out in Uematsu and Yamagata (2019), a “weak” common factor can be defined as
a common factor whose corresponding eigenvalue of the second moment matrix diverges,
but at a rate slower than $N$. Such a case could be of interest in general, and especially in the
context of a threshold factor model, where e.g. a pervasive factor affects only a fraction of the
units in one regime, and is absent in the other regime. Although such cases could, in principle, be
embedded in our analysis, we prefer to present the main results for the (restrictive) case implied
by Assumption 3(iv), and discuss extensions in Section 4.2.5.
4.2 Determining the number of common factors
The first step in the analysis is to determine the dimensions of the factor spaces, $P_D^0$ and $P_S^0$.
4.2.1 Estimating $\theta^0$
We begin by studying the (strongly consistent) estimation of $\theta^0$, extending the results in Massacci (2017). Let the $N \times 1$ vector of demeaned returns be $\tilde R_t = (\tilde R_{1,t}, \ldots, \tilde R_{N,t})'$, for $1 \le t \le T$, defined as in (3.1). Let $\Theta$ be the parameter space of $\theta^0$. For generic numbers of factors $P_D$, $P_S$ and $P = P_D + P_S$, define the $N \times P$ matrix of loadings $B^P = (B_D^{P_D}, B_S^{P_S})$, and the associated $P_D \times T$ and $P_S \times T$ matrices of demeaned factor innovations $U_D^{P_D} = (u_{D,1}^{P_D}, \ldots, u_{D,T}^{P_D})$ and $U_S^{P_S} = (u_{S,1}^{P_S}, \ldots, u_{S,T}^{P_S})$, with $U^P = (U_D^{P_D\prime}, U_S^{P_S\prime})'$. The objective function in terms of $B^P$, $U^P$ and $\theta$ is the sum of squared residuals (divided by $NT$)
$$S\big(B^P, U^P, \theta\big) = \frac{1}{NT} \sum_{t=1}^T \left\| \tilde R_t - d_{D,t}(\theta) B_D^{P_D} u_{D,t}^{P_D} - d_{S,t}(\theta) B_S^{P_S} u_{S,t}^{P_S} \right\|^2. \qquad (4.3)$$
For given $P$, the estimators for $B$, $U$ and $\theta^0$ are obtained by minimizing $S(B^P, U^P, \theta)$ with respect to $B^P$, $U^P$ and $\theta$. For identification purposes, we concentrate out $U^P$. The estimators $\hat B^P = (\hat B_D^{P_D}, \hat B_S^{P_S})$ with $\hat B_j^{P_j} = (\hat\beta_{j,1}, \ldots, \hat\beta_{j,N})'$ for $j = D, S$, $\hat U^P$, and $\hat\theta^P$, for $B^{P^0}$, $U^{P^0}$ and $\theta^0$, respectively, solve
$$\big(\hat B^P, \hat U^P, \hat\theta^P\big) = \arg\min_{B^P, U^P, \theta} S\big(B^P, U^P, \theta\big),$$
subject to the normalization restrictions $N^{-1}\big(\hat B_j^{P_j\prime} \hat B_j^{P_j}\big) = I_{P_j}$, for $j = D, S$. It can be easily seen that, letting
$$\hat\Sigma_{j,\tilde R}(\theta) = \frac{1}{NT} \sum_{t=1}^T d_{j,t}(\theta)\, \tilde R_t(\theta)\, \tilde R_t(\theta)', \qquad (4.4)$$
for $j = D, S$ and $\theta \in \Theta$, the estimator $\hat B_j^{P_j}(\theta)$ for $B_j^{P_j}$ is $\sqrt{N}$ times the $N \times P_j$ matrix of eigenvectors of $\hat\Sigma_{j,\tilde R}(\theta)$ corresponding to its largest $P_j$ eigenvalues. The restriction $N^{-1}\big(\hat B_j^{P_j}(\theta)' \hat B_j^{P_j}(\theta)\big) = I_{P_j}$, for $j = D, S$, implies that the corresponding estimators for $u_{D,t}$ and $u_{S,t}$ are
$$\hat u_{D,t}^{P_D}(\theta) = N^{-1}\big(d_{D,t}(\theta)\, \hat B_D^{P_D}(\theta)\big)' \tilde R_t,$$
$$\hat u_{S,t}^{P_S}(\theta) = N^{-1}\big(d_{S,t}(\theta)\, \hat B_S^{P_S}(\theta)\big)' \tilde R_t.$$
Let $\hat B^P(\theta) = \big[\hat B_D^{P_D}(\theta), \hat B_S^{P_S}(\theta)\big]$ and $\hat U^P(\theta) = \big(\hat U_D^{P_D\prime}(\theta), \hat U_S^{P_S\prime}(\theta)\big)'$; define
$$S\big[\hat B^P(\theta), \hat U^P(\theta), \theta\big] = \frac{1}{NT} \sum_{t=1}^T \left\| \tilde R_t - d_{D,t}(\theta)\, \hat B_D^{P_D}(\theta)\, \hat u_{D,t}^{P_D}(\theta) - d_{S,t}(\theta)\, \hat B_S^{P_S}(\theta)\, \hat u_{S,t}^{P_S}(\theta) \right\|^2.$$
For $P = P_D + P_S$ with $P_D \ge P_D^0$ and $P_S \ge P_S^0$, we now define the estimator
$$\hat\theta = \arg\min_{\theta} S\big[\hat B^P(\theta), \hat U^P(\theta), \theta\big]. \qquad (4.5)$$
In Lemma A.7 in Appendix A, we show that $\hat\theta$ is a strongly consistent estimator of $\theta^0$, also providing its rate; see also Theorem 3.4 in Massacci (2017), where weak consistency is shown.
4.2.2 The eigenvalues of the second moment matrices
Let
$$\hat d_{D,t} = I\big(\hat D_t\big), \qquad \hat d_{S,t} = 1 - \hat d_{D,t},$$
and define $\hat\pi_D = T^{-1}\sum_{t=1}^T \hat d_{D,t}$ and $\hat\pi_S = 1 - \hat\pi_D$. We extend the procedure suggested by Trapani (2018) to estimate the number of factors in each regime, $P_D^0$ and $P_S^0$. We use the sample covariance matrices
$$\hat\Sigma_D = \frac{1}{T \hat\pi_D} \sum_{t=1}^T \tilde R_t \tilde R_t'\, \hat d_{D,t}, \qquad (4.6)$$
$$\hat\Sigma_S = \frac{1}{T \hat\pi_S} \sum_{t=1}^T \tilde R_t \tilde R_t'\, \hat d_{S,t}. \qquad (4.7)$$
Equations (4.6) and (4.7) contain two possible estimators of the population covariance matrices, defined as
$$\Sigma_D = \frac{1}{T \pi_D} \sum_{t=1}^T E\big(\tilde R_t \tilde R_t'\, d_{D,t}(\theta^0)\big), \qquad (4.8)$$
$$\Sigma_S = \frac{1}{T \pi_S} \sum_{t=1}^T E\big(\tilde R_t \tilde R_t'\, d_{S,t}(\theta^0)\big). \qquad (4.9)$$
We will extensively employ the following notation for the eigenvalues: $\hat g_j^{(i)}$ refers to the $i$-th largest eigenvalue of $\hat\Sigma_D$ or $\hat\Sigma_S$ (according as $j = D$ or $S$); similarly, $g_j^{(i)}$ refers to the $i$-th largest eigenvalue of $\Sigma_D$ or $\Sigma_S$.
In order to make the presentation easier to follow, we begin by considering the case where, in Assumption 1, $\eta = 1$; that is, the number of units which undergo a regime change is (equal or proportional to) $N$. We discuss the case where $\eta < 1$, which entails that the regime-specific factors only affect a fraction of the units, or, in other words, that they are not a pervasive/strong set of factors, in Section 4.2.5.
Lemma 1. We assume that Assumptions 1-3 are satisfied. Then it holds that, for $j = D, S$,
$$\underline c_j^{(i)} N \le g_j^{(i)} \le \bar c_j^{(i)} N, \quad \text{for } 1 \le i \le P_j^0,$$
for some $0 < \underline c_j^{(i)} \le \bar c_j^{(i)} < \infty$, and
$$g_j^{(i)} \le c_j^{(i)}, \quad \text{for } i \ge P_j^0 + 1,$$
for some $c_j^{(i)} < \infty$.

Theorem 1. We assume that Assumptions 1-5 are satisfied. Then it holds that, for all $1 \le i \le N$ and $j = D, S$,
$$\hat g_j^{(i)} = g_j^{(i)} + o_{a.s.}\left( \frac{N T^{1/2}}{T \pi_j^0}\, (\ln N)^{1+\epsilon} (\ln T)^{1/2+\epsilon} \right),$$
for every $\epsilon > 0$.
Lemma 1 and Theorem 1 can be read together. According to Lemma 1, the spectrum of each second moment matrix has a spiked structure; more specifically, the largest $P_j^0$ eigenvalues diverge to positive infinity as $N \to \infty$, whereas the others are bounded. The fact that the divergence rate is exactly $N$ is a direct consequence of Assumption 3(iv) and also of having assumed $\eta = 1$. However, as long as the largest $P_j^0$ eigenvalues diverge, even at a slower rate, our arguments below are still valid although the construction of the tests may be more convoluted. We further discuss this in Section 4.2.5. Theorem 1 offers the sample counterpart to Lemma 1. In essence, it places a bound on the estimation error $\hat g_j^{(i)} - g_j^{(i)}$.4 5

4As is typical in (large) Random Matrix Theory, there is no guarantee that $\hat g_j^{(i)} - g_j^{(i)}$ will drift to zero as $\min(N, T) \to \infty$. See for example the related contributions by Bai and Yao (2008), Bai and Yao (2012) and Wang and Fan (2017), and also Lemma 2 in Trapani (2018), where a similar result is derived, under different assumptions and for a linear model. Indeed, $\hat g_j^{(i)} - g_j^{(i)}$ may even diverge to infinity. However, as long as $\hat g_j^{(i)} - g_j^{(i)}$ is of smaller magnitude than $g_j^{(i)}$ for $i \le P_j^0$, the spiked structure of the spectrum of the second moment matrices is preserved, in that $\hat g_j^{(i)}$ is of smaller magnitude when $i > P_j^0$ compared to the case $i \le P_j^0$.

5Note that we are not imposing that $0 < \pi_D, \pi_S < 1$; we are therefore entertaining the possibility that e.g. $\pi_D = \pi_D(T) \to 0$, which may be well suited to a situation in which one of the two regimes occurs very rarely.
factors in each regime using the following algorithm based on two separate steps: in the first step, we study a test for
$$\begin{cases} H_0: g_j^{(i)} = c_j^{(i)} N \\ H_A: g_j^{(i)} \le c_j^{(i)} \end{cases}, \qquad (4.10)$$
for some $0 < c_j^{(i)} < \infty$ and $j = D, S$. In essence, the null hypothesis is that the $i$-th largest eigenvalue of the covariance matrix of the data, in each regime, diverges, thus suggesting that there are at least $i$ common factors. Conversely, upon rejecting the null hypothesis, the conclusion can be drawn that there are fewer than $i$ common factors in regime $j$. Thus, the second step of the analysis is to determine $P_D^0$ and $P_S^0$ by carrying out a sequence of tests for $i = 1, \ldots, P_{\max}$, with $P_{\max}$ a user-defined upper bound.
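The two-step logic can be sketched as follows. For illustration we take the threshold as given and, purely as a stand-in for the randomised test described in the next subsection, use a crude eigenvalue-size cutoff; the DGP, constants, and seed are all our own illustrative choices.

```python
import numpy as np

# Sequential determination of the number of factors per regime: scan
# i = 1, ..., Pmax and stop when the i-th eigenvalue no longer "diverges".
rng = np.random.default_rng(5)
N, T = 60, 800
r_W = 0.05 * rng.standard_normal(T)
d_D = r_W <= -0.06                       # threshold taken as given here

# One factor in the downside regime, three in the satisfactory regime
f_D = rng.normal(size=(T, 1)); B_D = rng.normal(size=(N, 1))
f_S = rng.normal(size=(T, 3)); B_S = rng.normal(size=(N, 3))
R = np.where(d_D[:, None], f_D @ B_D.T, f_S @ B_S.T) + 0.3 * rng.normal(size=(T, N))

def n_factors(X, Pmax=8, c=0.1):
    """Crude stand-in for the eigenvalue tests: keep factor i while its
    eigenvalue is large relative to N times the bulk level."""
    X = X - X.mean(axis=0)
    Sigma = X.T @ X / X.shape[0]                     # regime second moment matrix
    eig = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    P = 0
    for i in range(Pmax):
        if eig[i] > c * N * eig[Pmax:].mean():       # "diverging" eigenvalue
            P += 1
        else:
            break                                    # first rejection: stop
    return P

P_D_hat = n_factors(R[d_D])
P_S_hat = n_factors(R[~d_D])
print(P_D_hat, P_S_hat)
```

The sketch recovers a much lower-dimensional structure in the downside regime than in the satisfactory one, mirroring the empirical finding reported in the introduction.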
4.2.3 The individual tests
We begin by discussing how to test for (4.10). Since Theorem 1 only contains rates, we propose a randomised version of the estimated eigenvalues $\hat g_j^{(i)}$, based on Trapani (2018). We need the following assumption on the size of the two regimes, $T\pi_D$ and $T\pi_S$.
Assumption 6. As min(N ,T ) →∞, it holds that
(ln N )1+ε (lnT )1/2+ε
T 1/2π j= 0,
for j =D,S .
By Assumption 6, g (i )j − g (i )
j is of a smaller order of magnitude than N . As mentioned above,
this ensures the spiked structure of the spectrum of the sample second moment matrices.6
Based on Lemma 1, each $g_j^{(i)}$ should diverge or not according to whether there are at least $i$ common factors or fewer. Thus, we would expect the same separation result to also hold for $\hat g_j^{(i)}$. Indeed, whenever $g_j^{(i)}$ diverges as fast as $N$ (which, in (4.10), represents the null hypothesis), $\hat g_j^{(i)}$ also does, at the same rate: this is an immediate consequence of Theorem 1. Conversely, there is no guarantee that $\hat g_j^{(i)}$ converges when $g_j^{(i)}$ does so: in this case (which corresponds to the alternative hypothesis in (4.10)), from Theorem 1, it is still possible that the estimation error $\hat g_j^{(i)} - g_j^{(i)}$ will diverge at a rate $N T^{-1/2}$, modulo some logarithmic terms.

⁶In essence, the assumption is a restriction on the rates at which $\pi_j$ can drift to zero: in particular, the assumption is satisfied as long as $\pi_j^{-1} = O(T^{1/2-\epsilon})$ for any $\epsilon > 0$. Note that the assumption also (very mildly) restricts the relative rate of divergence between $N$ and $T$ as they pass to infinity; in essence, however, it allows $N$ to grow at an arbitrarily high polynomial order with $T$, i.e. it allows for $N = O(T^{\kappa})$ for any value of $\kappa$.

Therefore, in order to ensure that $\hat g_j^{(i)}$ has the same separation result as its population counterpart $g_j^{(i)}$, we propose to use the statistic

$$\psi\left(\hat g_j^{(i)}\right) = \exp\left(N^{-\varrho_j}\, \frac{\hat g_j^{(i)}}{\bar g_j(p)}\right), \qquad (4.11)$$

where

$$\bar g_j(p) = \frac{1}{N-p} \sum_{h=p+1}^{N} \hat g_j^{(h)},$$

and $p$ is user-defined. The purpose of $\bar g_j(p)$ is to rescale $\hat g_j^{(i)}$; in Trapani (2018), it is suggested to set $p = 0$ (that is, to use the trace to rescale $\hat g_j^{(i)}$), but other choices are possible, such as $p = i$, which is recommended in Barigozzi and Trapani (2018). In (4.11), we have defined

$$\varrho_j = \begin{cases} \epsilon & \text{when } \ln T_j/\ln N \ge 2, \\ 1 - \dfrac{1}{2}\dfrac{\ln T_j}{\ln N} + \epsilon & \text{when } \ln T_j/\ln N < 2, \end{cases} \qquad (4.12)$$
where $\epsilon > 0$ is a (small) user-defined number. The rationale for this choice is that we need to choose $\varrho_j$ such that

$$\lim_{\min(N,T)\to\infty} \frac{N^{1-\varrho_j} T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon} = 0; \qquad (4.13)$$

(4.12) is therefore proposed under the condition $0 < \pi_D, \pi_S < 1$, although it is valid for any values of $\pi_D$ and $\pi_S$. If this condition is violated, in principle (4.13) can be employed to construct a refined proposal for $\varrho_j$. Based on (4.12) and on the strong rates in Theorem 1, it is easy to see that

$$\lim_{\min(N,T)\to\infty} \psi\left(\hat g_j^{(i)}\right) = \infty \;\text{ under } H_0, \qquad \lim_{\min(N,T)\to\infty} \psi\left(\hat g_j^{(i)}\right) = 1 \;\text{ under } H_A.$$
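To illustrate, the exponent $\varrho_j$ in (4.12) and the rescaled statistic $\psi$ in (4.11) can be computed as follows (a minimal sketch in Python; the function names are ours, and the estimated eigenvalues are assumed to be supplied by the user):

```python
import numpy as np

def rho_j(N, T_j, eps=0.01):
    """Exponent rho_j from (4.12): eps if ln(T_j)/ln(N) >= 2,
    and 1 - (1/2) ln(T_j)/ln(N) + eps otherwise."""
    ratio = np.log(T_j) / np.log(N)
    return eps if ratio >= 2.0 else 1.0 - 0.5 * ratio + eps

def psi(eigvals, i, T_j, p=0, eps=0.01):
    """Rescaled statistic (4.11): exp(N^{-rho_j} g^{(i)} / gbar_j(p)),
    where gbar_j(p) averages the eigenvalues ranked below position p."""
    g = np.sort(np.asarray(eigvals))[::-1]   # eigenvalues, decreasing order
    N = g.size
    gbar = g[p:].mean()                      # (N-p)^{-1} sum_{h=p+1}^N g^{(h)}
    return np.exp(N ** (-rho_j(N, T_j, eps)) * g[i - 1] / gbar)
```

Under the null of at least $i$ common factors the argument of the exponential diverges with $N$, so $\psi$ explodes; under the alternative it drifts to zero and $\psi \to 1$.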
Based on these heuristic considerations, we now describe the randomised test.

Step 1 Generate an i.i.d. $N(0,1)$ sequence $\{\xi_{j,m}^{(i)}, 1 \le m \le M\}$, where the $\xi_{j,m}^{(i)}$s are independent across $i$ and $j$.

Step 2 Define the Bernoulli random variable

$$\zeta_{j,m}^{(i)}(s) = I\left(\psi\left(\hat g_j^{(i)}\right) \times \xi_{j,m}^{(i)} \le s\right). \qquad (4.14)$$
Step 3 Compute

$$\upsilon_j^{(i)}(s) = \frac{2}{M^{1/2}} \sum_{m=1}^{M} \left(\zeta_{j,m}^{(i)}(s) - \frac{1}{2}\right). \qquad (4.15)$$

Step 4 Compute

$$\Upsilon_j^{(i)} = \int_{-\infty}^{+\infty} \left(\upsilon_j^{(i)}(s)\right)^2 \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{s^2}{2}\right) ds. \qquad (4.16)$$
Let $P^*$ be the probability conditional on $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$; we use "$\stackrel{D^*}{\to}$" and "$\stackrel{P^*}{\to}$" to indicate convergence in distribution and in probability according to $P^*$, respectively. Also, define Å such that

$$0 < \text{Å} < \frac{c_j^{(i)}}{\frac{1}{N-p}\sum_{h=p+1}^{N} c_j^{(h)}},$$

where $c_j^{(i)}$ and the $c_j^{(h)}$s are defined in Lemma 1.
Theorem 2. We assume that Assumptions 1-6 are satisfied, with $\eta = 1$ in Assumption 1. Let $M = M(N)$ be such that $\lim_{N\to\infty} M(N) = \infty$. Then, as $\min(N, T_j) \to \infty$ with

$$M \exp\left(-\text{Å}\, N^{1-\varrho_j}\right) \to 0, \qquad (4.17)$$

it holds that, under $H_0$,

$$\Upsilon_j^{(i)} \stackrel{D^*}{\to} \chi_1^2, \qquad (4.18)$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, $j = D, S$ and all $1 \le i \le N$. Under $H_A$, it holds that there exists a positive, finite constant $c_0$ such that

$$\frac{1}{4M}\,\Upsilon_j^{(i)} \stackrel{P^*}{\to} c_0, \qquad (4.19)$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, $j = D, S$ and all $1 \le i \le N$.

The theorem, and the algorithm above, suggest that some tuning is needed prior to implementing the test. To begin with, note that the power of tests based on $\Upsilon_j^{(i)}$ increases with $M$, by (4.19); conversely, in the proof of (4.18) it is shown that the test statistic has a non-centrality parameter which is controlled by (4.17), and which vanishes faster the smaller $M$ is. Thus, the choice of $M$ reflects the well-known size-power trade-off. We point out that, based on (4.17), the choice $M = N$ satisfies the restriction, and it is our recommended choice.⁷ Similarly, we have set $\varrho_j$ according to (4.12), with $\epsilon = 0.01$; based on Theorem 2, $\epsilon$ does not need to be large, since its purpose is to smooth away a slowly varying sequence. Even in this case, as the theory prescribes, the impact of $\varrho_j$ is minimal; as a robustness check, we have tried $\epsilon = 0.05$ and $\epsilon = 0.1$, and results do not change in general (with one exception, discussed below in Section 7.2.2).
In Step 2, we recommend drawing the values of $s$ from a standard normal. By integrating $s$ out in Step 4 of the algorithm, the statistic $\Upsilon_j^{(i)}$ becomes scale-invariant, in the sense that $\Upsilon_j^{(i)}$ does not depend on the support of $s$.⁸ This form of scale invariance is clearly desirable. From a practical point of view, we propose to use a Gauss-Hermite quadrature to approximate the integral that defines $\Upsilon_j^{(i)}$, viz.

$$\Upsilon_j^{(i)} = \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n_S} w_i \left(\upsilon_j^{(i)}\left(\sqrt{2}\, x_i\right)\right)^2, \qquad (4.20)$$

where the $x_i$, $1 \le i \le n_S$, are the zeros of the Hermite polynomial $H_{n_S}(x)$ and

$$w_i = \frac{\sqrt{\pi}\, 2^{n_S-1}\,(n_S-1)!}{n_S \left[H_{n_S-1}\left(\sqrt{2}\, x_i\right)\right]^2}. \qquad (4.21)$$

Thus, when constructing $\upsilon_j^{(i)}(s)$, we construct $n_S$ of these statistics, each with $s = \sqrt{2}\, x_i$; the values of the roots $x_i$, and of the corresponding weights $w_i$, are tabulated, e.g., in Salzer, Zucker, and Capuano (1952).
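Steps 1-4, combined with the quadrature approximation (4.20), can be sketched as follows (an illustrative implementation, with our own function names, assuming the rescaled statistic $\psi(\hat g_j^{(i)})$ has already been computed; `numpy`'s `hermgauss` returns the nodes and weights of the Hermite rule for the weight $e^{-x^2}$, so the change of variable $s = \sqrt{2}x$ absorbs the Gaussian kernel):

```python
import numpy as np

def upsilon_stat(psi_val, M, n_S=31, rng=None):
    """Randomised statistic of Section 4.2.3: draw xi_m ~ N(0,1) (Step 1),
    form the Bernoulli variables zeta_m(s) = I(psi * xi_m <= s) (Step 2),
    the normalised sums upsilon(s) of (4.15) (Step 3), and integrate
    upsilon(s)^2 against the N(0,1) density by Gauss-Hermite quadrature
    (Step 4, approximation (4.20))."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(M)                        # Step 1
    x, w = np.polynomial.hermite.hermgauss(n_S)        # nodes/weights for e^{-x^2}
    s = np.sqrt(2.0) * x                               # evaluation points s = sqrt(2) x
    zeta = psi_val * xi[:, None] <= s[None, :]         # Step 2: M x n_S indicators
    ups = 2.0 / np.sqrt(M) * (zeta - 0.5).sum(axis=0)  # Step 3: upsilon at each node
    return (w * ups ** 2).sum() / np.sqrt(np.pi)       # Step 4: quadrature of (4.16)
```

Under $H_0$ ($\psi \to \infty$) the statistic behaves like a $\chi^2_1$ draw; under $H_A$ ($\psi \to 1$) it grows linearly in $M$, as in (4.19).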
4.2.4 Determining $P_D^0$ and $P_S^0$

Based on the results in the previous section, it is now possible to propose a sequential procedure to determine $P_D^0$ and $P_S^0$. This is based on the following two-step algorithm, which we present only for the determination of $P_D^0$, the case of $P_S^0$ being exactly the same.

⁷We point out that altering the choice of $M$ within a reasonable range does not have any impact on the final results. Indeed, we tried $M = N/2$ and $M = 2N$, without noting any changes. This robustness is in line with the simulations in Trapani (2018).

⁸We point out that, in the original paper by Trapani (2018), this is not the case, since the methodology is implemented by extracting several values of $s$ from a uniform distribution with support, say, $S$: doing this makes the test statistics dependent on $S$, thus making them not scale invariant.

For each $j = D, S$:
Step 1 Run the test for $H_0: g_j^{(1)} = \infty$, based on $\Upsilon_j^{(1)}$. If the null is rejected, set $\hat P_j = 0$ and stop; otherwise, go to the next step.

Step 2 Starting from $i = 1$, run the test for $H_0: g_j^{(i+1)} = \infty$, based on $\Upsilon_j^{(i+1)}$. If the null is rejected, set $\hat P_j = i$ and stop; otherwise, repeat the step until the null is rejected, or until the pre-specified maximum number $P_{\max}$ is reached.
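The two-step algorithm amounts to the following loop (a sketch; `upsilon` is a user-supplied function returning the statistic $\Upsilon_j^{(i)}$ of the previous section, and the $\chi^2_1$ critical value is obtained from the standard normal via $\chi^2_1 = Z^2$):

```python
from statistics import NormalDist

def estimate_num_factors(upsilon, P_max, alpha=0.05):
    """Sequential procedure of Section 4.2.4: test H0 "the i-th eigenvalue
    diverges" for i = 1, 2, ...; the first rejection at step i yields
    P_hat = i - 1; otherwise stop at the upper bound P_max."""
    c_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2  # chi^2_1 critical value
    for i in range(1, P_max + 1):
        if upsilon(i) >= c_alpha:   # reject: fewer than i common factors
            return i - 1
    return P_max
```

Per Theorem 3 below, consistency of $\hat P_j$ requires the level of the individual tests to shrink as the sample grows, so in practice `alpha` should be set as a (slowly) vanishing function of $(N, T)$.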
Let $\alpha = \alpha(N,T)$ denote the level of the individual tests. It holds that:

Theorem 3. We assume that Assumptions 1-6 are satisfied. As $\min(N, T_j) \to \infty$ under (4.17), if $\alpha(N, T_j) \to 0$, then it holds that

$$\hat P_j = P_j^0 + o_{P^*}(1),$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, and $j = D, S$.
In the implementation of the algorithm, it is necessary to have an upper bound $P_{\max}$; this is chosen as

$$P_{\max} = O\left(C_{N,T}^{1-c}\right), \qquad (4.22)$$

for some user-defined $c > 0$. The selection rule in (4.22) is an extension of the customarily employed Schwert's rule (Schwert (1989); see also the comment in Bai and Ng (2002)); as a possible example, one may use $P_{\max} = \left\lfloor \min\left\{N^{1/3}, T^{1/3}\right\} \right\rfloor$. In our simulations, we have tried this specification and, as a robustness check, other ones, but results do not seem to be affected in any major way.
4.2.5 Discussion on weak factors

As pointed out in Section 4.1, Assumption 3(iv) implicitly stipulates that common factors are strong, or pervasive, thus ruling out the case of weak common factors. In this section, we provide a heuristic discussion, based on some examples, of (i) the circumstances under which weak common factors may arise in the context of our nonlinear factor model, and (ii) the circumstances under which our procedure is able to find evidence of weak common factors.

Example 1. A first example is intimately related to the discussion in Trapani (2018) on weak factors; we also refer to the paper by Uematsu and Yamagata (2019) for a very general, thorough
characterisation of weak factors. Suppose (for simplicity and with no loss of generality) that there exists only one common factor in both regimes, which affects all units and whose loadings are such that $N^{-1} B_j' B_j \to 0$, but $N^{-\nu} B_j' B_j \to c_0 > 0$ for some $0 < \nu < 1$. In this case, adapting the proof of Lemma 1, it is easy to see that

$$g_j^{(1)} = c_{1,j} N^{\nu}, \qquad (4.23)$$

for both regimes $j = D, S$; in turn, this entails that, adapting Theorem 1,

$$\hat g_j^{(1)} = c_{1,j} N^{\nu} + o_{a.s.}\left(\frac{N T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon}\right). \qquad (4.24)$$

In order to detect that $g_j^{(1)} \to \infty$, by construction it is required that $N^{-\varrho_j} \hat g_j^{(1)} \to \infty$, which in turn requires $\nu > \varrho_j$. Recalling (4.12), after some elementary algebra, this is tantamount to requiring

$$\nu > \begin{cases} 0 & \text{when } \ln T_j/\ln N \ge 2, \\ 1 - \dfrac{1}{2}\dfrac{\ln T_j}{\ln N} & \text{when } \ln T_j/\ln N < 2. \end{cases} \qquad (4.25)$$

This result essentially states that the ability to detect weak factors depends on the relative rate of divergence between $N$ and $T$: as $N$ increases, weak factors become more and more difficult to determine. Note that this is not a property of the testing procedure per se, but of the estimated eigenvalues, where the "signal" part $g_j^{(1)}$ becomes weaker and weaker as the "noise" component increases with $N$. Note that when $N$ is comparatively small with respect to $T$ - i.e. when $N = o\left(T^{1/2}\right)$ - weak factors can, in principle, always be detected.
Example 2. Another possible example is the appearance/disappearance, from one regime to another, of a possibly strong common factor which affects only some of the units. Consider the case where (again, for simplicity) there is no common factor in one regime, whereas there is one common factor in the other regime, but only $O(N^{\nu})$ units are affected by it. In this case, since only $O(N^{\nu})$ of the loadings are non-zero, it follows that $N^{-1} B_j' B_j \to 0$, but $N^{-\nu} B_j' B_j \to c_0 > 0$, which, similarly to (4.23) and (4.24), entails

$$g_j^{(1)} = c_{1,j} N^{\nu}, \qquad \hat g_j^{(1)} = c_{1,j} N^{\nu} + o_{a.s.}\left(\frac{N T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon}\right).$$

In this case, two (different) tasks are affected by the appearance of a cluster-specific factor: firstly, the estimation of $\theta^0$, which, by Assumption 1, is possible only if $\nu > \frac{1}{2}$; secondly, the test for the presence of a common factor, which, as before, requires $N^{-\varrho_j} \hat g_j^{(1)} \to \infty$, which in turn requires $\nu > \varrho_j$. Putting these two requirements together, it follows that

$$\nu > \max\left\{\frac{1}{2},\, 1 - \frac{1}{2}\frac{\ln T_j}{\ln N}\right\}, \qquad (4.26)$$

which, in essence, entails that a new common factor which affects only a fraction of the units can be detected only if it affects "enough" units. Of course, if $\theta^0$ were known a priori, we would be in the same situation as (4.25).
Example 3. As a closely related variation on the previous example, consider the case where - again - there is only one common factor which affects only a fraction of the units, say $O(N^{\nu})$, whose loadings differ across regimes. The same logic as in the previous examples implies that

$$g_j^{(1)} = c_{1,j} N^{\nu}, \qquad \hat g_j^{(1)} = c_{1,j} N^{\nu} + o_{a.s.}\left(\frac{N T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon}\right).$$

In this case, by definition, only $O(N^{\nu})$ units are subject to a change between regimes; thus, recalling Assumption 1, it must hold that

$$\nu > \max\left\{\frac{1}{2},\, 1 - \frac{1}{2}\frac{\ln T_j}{\ln N}\right\},$$

exactly as in (4.26).
Example 4. As (4.12) - and, consequently, (4.25)-(4.26) - show, the relative speed of divergence between $N$ and $T_j$ plays an important role. So far, our analysis has focused on two regimes of length $T_D$ and $T_S$ respectively, with the implicit assumption that $T_D = cT$ (and consequently $T_S = (1-c)T$), for some $c \in (0,1)$. This means that the number of time periods in each regime is of the same order of magnitude as $T$. However, the results in Theorems 1, 2 and 3 only require $\min(T_D, T_S) \to \infty$, so that, in principle, the case $T_D = o(T)$ can be considered. This has consequences for the ability to detect weak common factors: the requirement

$$\nu > \max\left\{\frac{1}{2},\, 1 - \frac{1}{2}\frac{\ln T_D}{\ln N}\right\},$$

coming from the examples considered above, still holds; but when $T_D = o(T)$, it entails that the detection of a weak common factor in regime $D$ is more difficult, and factors that are "too weak" will not be identified, thus leading to an understatement of the number of common factors in regime $D$.
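The detectability bounds in the examples above can be wrapped in a small helper (our own illustration; `theta_known=True` corresponds to the case in which $\theta^0$ is known a priori, so that only the eigenvalue-based bound (4.25) binds):

```python
from math import log

def nu_threshold(N, T_j, theta_known=False):
    """Smallest loading exponent nu (loadings of order N^nu) at which a
    weak factor is detectable: the bound of (4.25) if theta^0 is known,
    and its max with 1/2 (from Assumption 1), as in (4.26), otherwise."""
    bound = max(0.0, 1.0 - 0.5 * log(T_j) / log(N))
    return bound if theta_known else max(0.5, bound)
```

For instance, with $N = 100$ and $T_j = 10$ the bound is $0.75$, while for $N = o(T_j^{1/2})$ it collapses to $0$ (known $\theta^0$) or $1/2$ (estimated $\theta^0$).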
As a final comment, all the examples considered show that our procedure (similarly to all procedures to estimate the number of common factors) may fail in the presence of a weak common factor, thus resulting in an underestimation of the number of common factors. A possible solution in this case - which we exploit in the empirical part - is to estimate $P_j^0$ first and then, by way of robustness check, estimate (2.1) with $\hat P_j$ and $\hat P_j + 1$ factors.
5 Three-pass estimation of downside risk premia

We now extend the Fama-MacBeth two-pass procedure (Fama and MacBeth (1973)) and estimate downside risk premia for the observable factors $g_t$, as defined in (3.1), in the spirit of Giglio and Xiu (2020). In particular, we propose the following three-pass procedure:

First pass: Based on Section 4.2 and Massacci (2017), estimate $\theta^0$, $P_j^0$, $\{u_{j,t}\}_{t=1}^{T}$ and $\{\beta_{j,i}\}_{i=1}^{N}$ for $j = D, S$: this gives the estimators $\hat\theta$, $\hat P_j$, $\{\hat u_{j,t}^{\hat P_j}\}_{t=1}^{T}$ and $\{\hat\beta_{j,i}^{\hat P_j}\}_{i=1}^{N}$ for $j = D, S$, respectively.

Second pass: Estimate the risk premia of $f_{j,t}$ by running regime-specific cross-sectional regressions.

Third pass: Estimate $\Lambda$ within each regime by running a regression of $g_t$ on $\hat u_{j,t}^{\hat P_j}$ based on (2.8), and rotate the risk premia of $f_{j,t}$.

We have discussed the first pass in Section 4.2, where the consistency of $\hat\theta$ and $\hat P_j$ is derived. We now turn to the second and third passes.
5.1 Regime-specific cross-sectional regression

Given the $N \times \hat P_j$ matrices of beta estimates $\hat B_j^{\hat P_j} = \left(\hat\beta_{j,1}^{\hat P_j}, \ldots, \hat\beta_{j,N}^{\hat P_j}\right)'$, define the $N \times (1 + \hat P_j)$ matrices $\hat X_j = \left(\iota_N, \hat B_j^{\hat P_j}\right)$, for $j = D, S$. Let $\Gamma_j = \left(\gamma_{j,0}, \gamma_{j,1}'\right)'$, for $j = D, S$, and $R_t = \left(R_{1,t}, \ldots, R_{N,t}\right)'$, for $t = 1, \ldots, T$.

The regime-specific cross-sectional regression of $R_t$ on $\hat X_j$ yields

$$\hat\Gamma_{j,t} = \left(\hat X_j' \hat X_j\right)^{-1}\left(\hat X_j' R_t\right)\hat d_{j,t},$$

for $j = D, S$ and $1 \le t \le T$. Further, letting $\hat T_j = T\hat\pi_j$ for $j = D, S$, $\Gamma_j$ is estimated as

$$\hat\Gamma_j = \begin{pmatrix} \hat\gamma_{j,0} \\ \hat\gamma_{j,1} \end{pmatrix} = \frac{1}{\hat T_j} \sum_{t=1}^{T} \hat\Gamma_{j,t} = \left(\hat X_j' \hat X_j\right)^{-1}\left[\hat X_j' \bar R_j\left(\hat\theta\right)\right],$$

for $j = D, S$, where $\bar R_j\left(\hat\theta\right) = \hat T_j^{-1} \sum_{t=1}^{T} \hat d_{j,t} R_t$.
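The second pass is an ordinary cross-sectional regression of regime-average returns on an intercept and the estimated betas; a minimal sketch (our own notation: `R` is the $N \times T$ return panel, `B_hat` the $N \times \hat P_j$ beta matrix, and `d` the estimated regime indicator $\hat d_{j,t}$):

```python
import numpy as np

def cross_sectional_premia(R, B_hat, d):
    """Second pass (Section 5.1): regress the regime-average returns
    Rbar_j = T_j^{-1} sum_t d_t R_t on X = [1_N, B_hat], returning
    (gamma_0, gamma_1), i.e. the zero-beta rate and the factor premia."""
    N = B_hat.shape[0]
    X = np.column_stack([np.ones(N), B_hat])          # N x (1 + P_hat)
    Rbar = (R * d[None, :]).sum(axis=1) / d.sum()     # regime average of R_t
    gamma = np.linalg.lstsq(X, Rbar, rcond=None)[0]   # OLS coefficients
    return gamma[0], gamma[1:]
```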
5.2 Estimation of regime-specific risk premia

Let $\hat g_{j,t} = g_t d_{j,t}\left(\hat\theta\right) - \bar g_j$, with $\bar g_j = \hat T_j^{-1} \sum_{t=1}^{T} g_t d_{j,t}\left(\hat\theta\right)$, for $j = D, S$. The estimator of $\Lambda_j$ within each regime is

$$\hat\Lambda_j = \left[\sum_{t=1}^{T} d_{j,t}\left(\hat\theta\right) \hat g_{j,t} \hat u_{j,t}'\right]\left[\sum_{t=1}^{T} d_{j,t}\left(\hat\theta\right) \hat u_{j,t} \hat u_{j,t}'\right]^{-1},$$

for $j = D, S$. Similarly, the estimator of $\gamma_{g,j,1}$ is

$$\hat\gamma_{g,j,1} = \hat\Lambda_j \hat\gamma_{j,1},$$

for $j = D, S$.
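The third pass can be sketched analogously (again our own notation; `g` is the $K \times T$ matrix of observable factors, `u_hat` the $\hat P_j \times T$ matrix of estimated latent innovations, and `d` the regime indicator):

```python
import numpy as np

def third_pass_premia(g, u_hat, d, gamma1_hat):
    """Third pass (Section 5.2): regress the regime-demeaned observable
    factors on the latent innovations to get Lambda_hat, then rotate the
    second-pass premia: gamma_g = Lambda_hat @ gamma1_hat."""
    gbar = (g * d[None, :]).sum(axis=1) / d.sum()    # regime mean of g_t
    gt = (g - gbar[:, None]) * d[None, :]            # demeaned g within regime
    ud = u_hat * d[None, :]
    Lam = (gt @ u_hat.T) @ np.linalg.inv(ud @ u_hat.T)
    return Lam @ gamma1_hat, Lam
```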
5.3 Asymptotics: consistency and limiting distribution

We now report the asymptotics for $\hat\gamma_{g,j,1}$; results for $\hat\Gamma_j$ and $\hat\Lambda_j$ are in the appendix. Define the $P_j^0 \times T$ matrices of regime-specific factor innovations $U_j(\theta) = \left[u_{j,1}(\theta), \ldots, u_{j,T}(\theta)\right]$, for $j = D, S$, such that $U_D(\theta) + U_S(\theta) = (u_1, \ldots, u_T) = U$.

5.3.1 Consistency

As is well known, $\Gamma_j$ (and $\Lambda_j$) cannot be estimated consistently, but only up to a rotation. Let $H_{jj}^{\hat P_j}(\theta)$ be the $P_j^0 \times \hat P_j$ rotation matrix defined as

$$H_{jj}^{\hat P_j}(\theta) = \frac{U_j\left(\theta^0\right)\hat U_j(\theta)'}{T}\, \frac{B_j^{0\prime} \hat B_j(\theta)}{N}\, \hat V_j(\theta)^{-1}, \qquad (5.1)$$

for $j = D, S$, where $\hat V_j(\theta)$ is the $\hat P_j \times \hat P_j$ diagonal matrix of the first $\hat P_j$ largest eigenvalues of $\hat\Sigma_{j,R}(\theta)$ defined in (4.4). Define the matrix $H_{jj}^{\Gamma \hat P_j}(\theta)$ as

$$H_{jj}^{\Gamma \hat P_j}(\theta) = \begin{pmatrix} 1 & 0_{\hat P_j}' \\ 0_{\hat P_j} & H_{jj}^{\hat P_j}(\theta) \end{pmatrix}, \qquad (5.2)$$

for $j = D, S$.

For short, we define $H_{jj}(\theta)$ as $H_{jj}^{\hat P_j}(\theta)$ calculated with $P_j^0$ common factors, and $H_{jj}^{\Gamma}(\theta)$ similarly. Finally, we define $H_{jj}\left(\theta^0\right) = H_{jj}$ and $H_{jj}^{\Gamma}\left(\theta^0\right) = H_{jj}^{\Gamma}$.
We need the following assumptions, which complement Assumptions 1-6.

Assumption 7.⁹ It holds that, for $j = D, S$: (i) $E\left(\sum_{i=1}^{N} w_i \alpha_{j,i}\right)^2 \le c_0 N$ for all $\{w_i\}_{i=1}^{N}$ such that $\sum_{i=1}^{N} w_i^2 < \infty$; (ii) $\{\alpha_{j,i}\}_{i=1}^{N}$ is independent of all other quantities; (iii) as $N \to \infty$, $\left[\mathrm{Var}\left(\sum_{i=1}^{N} v_i \alpha_{j,i}\right)\right]^{-1/2} \sum_{i=1}^{N} v_i \alpha_{j,i} \stackrel{D}{\to} N\left(0, I_{\nu}\right)$ for every $\nu$-dimensional sequence $v_i$, $1 \le i \le N$, such that $\left\|N^{-1}\sum_{i=1}^{N} v_i v_i'\right\| < \infty$.

Assumption 8. It holds that: (i) in (2.4), (a) $\|\Lambda\| = O(1)$ and (b) $E(e_t) = 0$ with $E\|e_t\|^4 < \infty$; (ii) $T^{-1}\sum_{t=1}^{T} e_t u_{j,t}^{0\prime} = O_P(1)$ for $j = D, S$.

Theorem 4. We assume that Assumptions 1-8 are satisfied. Then, as $\min(N, T_j) \to \infty$, it holds that

$$\hat\gamma_{g,j,1} - \gamma_{g,j,1}^0 = o_P(1) + o_{P^*}(1),$$

for $j = D, S$, for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
5.3.2 Asymptotic distribution

In order to derive the limiting distribution of the estimators proposed above, we need the following assumption, which strengthens some of the assumptions above.

⁹Assumption 7 contains some regularity conditions on the pricing errors $\alpha_i$ in (2.1). By part (i), we require that the second moment of the $\alpha_i$s grows linearly with $N$; a possible case in which this would happen is when $\alpha_i$ is i.i.d. with zero mean and finite variance. However, the assumption is more general than this, and it allows, e.g., for the $\alpha_i$s to have nonzero mean. Indeed, if $E(\alpha_i) \ne 0$ for $1 \le i \le N_{\alpha}$ with $N_{\alpha} = O\left(N^{1/2}\right)$, and $E\left(\sum_{i=N_{\alpha}+1}^{N} w_i \alpha_i\right)^2 \le c_0 N$, the assumption would hold anyway. Note also that part (iii) of the assumption could be derived using more primitive assumptions on the $\alpha_i$s (see, e.g., the very general CLT in McLeish et al. (1974)).
Assumption 9. For $j = D, S$, it holds that: (i) $E\left(e_t \mid u_{j,t}^0\right) = 0$ for all $t$; and (ii) as $T \to \infty$,

$$T^{-1/2}\begin{pmatrix} \sum_{t=1}^{T} \mathrm{vec}\left(e_t u_{j,t}^{0\prime}\right) d_{j,t}\left(\theta^0\right) \\ \sum_{t=1}^{T} u_{j,t}^0 d_{j,t}\left(\theta^0\right) \end{pmatrix} \stackrel{D}{\to} N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V_{eu,j} & \Pi_{ue,j}' \\ \Pi_{ue,j} & V_{u,j} \end{pmatrix}\right],$$

where

$$V_{eu,j} = \lim_{T\to\infty} E\left(T^{-1}\sum_{t,s=1}^{T} \mathrm{vec}\left(e_t u_{j,t}^{0\prime}\right)\left(\mathrm{vec}\left(e_s u_{j,s}^{0\prime}\right)\right)' d_{j,t}\left(\theta^0\right) d_{j,s}\left(\theta^0\right)\right),$$

$$V_{u,j} = \lim_{T\to\infty} E\left(T^{-1}\sum_{t,s=1}^{T} u_{j,t}^0 u_{j,s}^{0\prime} d_{j,t}\left(\theta^0\right) d_{j,s}\left(\theta^0\right)\right),$$

$$\Pi_{ue,j} = \lim_{T\to\infty} E\left(T^{-1}\sum_{t,s=1}^{T} u_{j,t}^0 \left(\mathrm{vec}\left(e_s u_{j,s}^{0\prime}\right)\right)' d_{j,t}\left(\theta^0\right) d_{j,s}\left(\theta^0\right)\right).$$
We also make the following (simplifying) assumption, which complements Assumption 7 by requiring the sphericity of the $\alpha_{j,i}$s.

Assumption 10. It holds that $\alpha_{j,i}$ is i.i.d. across $i$, with $E\left(\alpha_{j,1}^2\right) = \sigma_{\alpha,j}^2 < \infty$ for $j = D, S$.
For $j = D, S$, consider the matrices $M_u^{(j)} = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} u_{j,t}^0 u_{j,t}^{0\prime} d_{j,t}\left(\theta^0\right)$. We then define

$$\Sigma_{\gamma,j} = c_j^{-2}\Lambda_j^0 V_{u,j}\Lambda_j^{0\prime} + \left(\left(\gamma_{j,1}^{0\prime}\left(M_u^{(j)}\right)^{-1}\right)\otimes I_K\right) V_{eu,j}\left(\left(\left(M_u^{(j)}\right)^{-1}\gamma_{j,1}^0\right)\otimes I_K\right)$$
$$\qquad + c_j^{-1}\Lambda_j^0 \Pi_{ue,j}\left(\left(\left(M_u^{(j)}\right)^{-1}\gamma_{j,1}^0\right)\otimes I_K\right) + c_j^{-1}\left(\left(\gamma_{j,1}^{0\prime}\left(M_u^{(j)}\right)^{-1}\right)\otimes I_K\right)\Pi_{ue,j}'\Lambda_j^{0\prime},$$

$$\Sigma_{\alpha\gamma,j} = \sigma_{\alpha,j}^2 \Lambda_j^0\left[\frac{1}{N} B_j^{0\prime} B_j^0 - \left(\frac{1}{N} B_j^{0\prime} i_N\right)\left(\frac{1}{N} B_j^{0\prime} i_N\right)'\right]^{-1}\Lambda_j^{0\prime},$$

where $i_N$ is an $N \times 1$ vector of ones.

Theorem 5. We assume that Assumptions 1-9 are satisfied, and that $\min(N, T_j) \to \infty$, with

$$\frac{T_j}{T} \to c_j \in (0,1), \qquad (5.3)$$

$$\frac{T^{1/2}}{N} \to 0. \qquad (5.4)$$

Then, it holds that

$$\left(\frac{1}{T}\Sigma_{\gamma,j} + \frac{1}{N}\Sigma_{\alpha\gamma,j}\right)^{-1/2}\left(\hat\gamma_{g,j,1} - \gamma_{g,j,1}^0\right) \stackrel{D^*}{\to} N\left(0, I_K\right), \qquad (5.5)$$

for $j = D, S$, for almost all realisations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
A comment on (5.4) may be in order. By Theorem 4, all estimators are consistent, irrespective of the relative rate of divergence between $N$ and $T$ as they pass to infinity. Theorem 5, conversely, requires a restriction, namely that $T = o\left(N^2\right)$. However, most datasets (unless $T$ is very "short", which is typically not the case in these studies) can be expected to satisfy this requirement.
We propose the following estimators:

$$\hat\Sigma_{\gamma,j} = \hat c_j^{-2}\hat\Lambda_j \hat V_{u,j}\hat\Lambda_j' + \left(\left(\hat\gamma_{j,1}'\left(\hat M_u^{(j)}\right)^{-1}\right)\otimes I_K\right)\hat V_{eu,j}\left(\left(\left(\hat M_u^{(j)}\right)^{-1}\hat\gamma_{j,1}\right)\otimes I_K\right) \qquad (5.6)$$
$$\qquad + \hat c_j^{-1}\hat\Lambda_j \hat\Pi_{ue,j}\left(\left(\left(\hat M_u^{(j)}\right)^{-1}\hat\gamma_{j,1}\right)\otimes I_K\right) + \hat c_j^{-1}\left(\left(\hat\gamma_{j,1}'\left(\hat M_u^{(j)}\right)^{-1}\right)\otimes I_K\right)\hat\Pi_{ue,j}'\hat\Lambda_j',$$

$$\hat\Sigma_{\alpha\gamma,j} = \hat\sigma_{\alpha,j}^2 \hat\Lambda_j\left[\frac{1}{N}\hat B_j^{\hat P_j\prime}\hat B_j^{\hat P_j} - \left(\frac{1}{N}\hat B_j^{\hat P_j\prime} i_N\right)\left(\frac{1}{N}\hat B_j^{\hat P_j\prime} i_N\right)'\right]^{-1}\hat\Lambda_j'. \qquad (5.7)$$

In (5.6), recall that $\hat X_j = \left(\iota_N, \hat B_j^{\hat P_j}\right)$. Also, we have used

$$\hat M_u^{(j)} = \frac{1}{T}\sum_{t=1}^{T} \hat u_{j,t}\hat u_{j,t}'\hat d_{j,t}, \qquad (5.8)$$

$$\hat V_{u,j} = \hat m_{u,0}^{(j)} + \sum_{k=1}^{h_m}\left(1 - \frac{k}{h_m+1}\right)\left(\hat m_{u,k}^{(j)} + \hat m_{u,k}^{(j)\prime}\right), \qquad (5.9)$$

$$\hat V_{eu,j} = \hat m_{eu,0}^{(j)} + \sum_{k=1}^{h_m}\left(1 - \frac{k}{h_m+1}\right)\left(\hat m_{eu,k}^{(j)} + \hat m_{eu,k}^{(j)\prime}\right), \qquad (5.10)$$

$$\hat\Pi_{ue,j}' = \frac{1}{T}\sum_{t=1}^{T}\left(\mathrm{vec}\left(\hat e_t \hat u_{j,t}'\right)\right)\hat u_{j,t}'\hat d_{j,t} + \sum_{k=1}^{h_m}\left(1 - \frac{k}{h_m+1}\right)\left(\hat m_{\Pi,k}^{(j)} + \tilde m_{\Pi,k}^{(j)}\right), \qquad (5.11)$$

with

$$\hat m_{u,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\hat u_{j,t}\hat d_{j,t}\right)\left(\hat u_{j,t-k}\hat d_{j,t-k}\right)',$$

$$\hat m_{eu,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\mathrm{vec}\left(\hat e_t \hat u_{j,t}'\right)\hat d_{j,t}\right)\left(\mathrm{vec}\left(\hat e_{t-k}\hat u_{j,t-k}'\right)\hat d_{j,t-k}\right)',$$

$$\hat m_{\Pi,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\mathrm{vec}\left(\hat e_{t-k}\hat u_{j,t-k}'\right)\hat d_{j,t-k}\right)\left(\hat u_{j,t}\hat d_{j,t}\right)',$$

$$\tilde m_{\Pi,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\mathrm{vec}\left(\hat e_t \hat u_{j,t}'\right)\hat d_{j,t}\right)\left(\hat u_{j,t-k}\hat d_{j,t-k}\right)',$$

for $k = 0, 1, \ldots$. In (5.9)-(5.10), we use $h_m = O\left(T^{1/3}\right)$, and $\hat e_t = g_t - \left(\hat a + \hat\Lambda\hat u_t\right)$. Finally, in order to compute $\hat\sigma_{\alpha,j}^2$, let

$$\hat e_{j,i,t} = R_{i,t}\hat d_{j,t} - \left[\hat\gamma_{j,0} + \hat\beta_{j,i}'\left(\hat\gamma_{j,1} + \hat u_{j,t}\right)\right]\hat d_{j,t},$$

and define $\bar e_{j,i} = \hat T_j^{-1}\sum_{t=1}^{T}\hat e_{j,i,t}$ and

$$\hat\sigma_{\alpha,j}^2 = \frac{1}{N}\sum_{i=1}^{N}\bar e_{j,i}^2.$$
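The Bartlett-kernel (Newey-West type) estimators in (5.9)-(5.11) share the same structure; a generic sketch for the long-run variance of a $T \times q$ array of regime-weighted series, with the truncation $h_m = O(T^{1/3})$ used in the text:

```python
import numpy as np

def bartlett_lrv(Z, h_m=None):
    """Long-run variance with Bartlett weights, as in (5.9)-(5.10):
    m_0 + sum_{k=1}^{h_m} (1 - k/(h_m+1)) (m_k + m_k'), where m_k is the
    lag-k cross moment (1/T) sum_{t=k+1}^T z_t z_{t-k}' of the rows of Z."""
    T = Z.shape[0]
    if h_m is None:
        h_m = int(T ** (1.0 / 3.0))     # truncation h_m = O(T^{1/3})
    V = Z.T @ Z / T                     # m_0
    for k in range(1, h_m + 1):
        m_k = Z[k:].T @ Z[:-k] / T      # lag-k cross moment
        V += (1.0 - k / (h_m + 1.0)) * (m_k + m_k.T)
    return V
```

Feeding rows $\hat u_{j,t}\hat d_{j,t}$ gives $\hat V_{u,j}$, and rows $\mathrm{vec}(\hat e_t \hat u_{j,t}')\hat d_{j,t}$ give $\hat V_{eu,j}$.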
All the estimators defined above are consistent; to save space, we omit the proofs, which are based on standard, if tedious, calculations using the results reported above. The only result that does not follow immediately is the consistency of $\hat\sigma_{\alpha,j}^2$, which we state as a lemma.

Lemma 2. We assume that Assumptions 1-10 hold. Then, as $\min(N, T_j) \to \infty$ for $j = D, S$, under (5.3) and (5.4), it holds that $\hat\sigma_{\alpha,j}^2 = \sigma_{\alpha,j}^2 + o_{P^*}(1)$, for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
When estimating $\Gamma_j$, we also obtain an estimator of the zero-beta rate $\gamma_{j,0}$ for $j = D, S$. Based on equation (2.5), we note that the quantity

$$E\left(I(D_t)\gamma_{D,0} + I(S_t)\gamma_{S,0}\right) = \pi_D\gamma_{D,0} + \pi_S\gamma_{S,0}$$

can be interpreted as an "unconditional" zero-beta rate. Thus, we can also propose a test for

$$H_0: \pi_D\gamma_{D,0} + \pi_S\gamma_{S,0} = 0. \qquad (5.12)$$

The following lemma characterizes the distribution of $\hat\gamma_{j,0}$ and the test for (5.12).

Lemma 3. We assume that Assumptions 1-10 hold, and that $E\left(\alpha_{i,D}\alpha_{k,S}\right) = 0$ for all $1 \le i, k \le N$. Then, as $\min(N, T_j) \to \infty$ with $N T_j^{-2} \to 0$, it holds that

$$\left(\hat\sigma_{\alpha,j}^2\left(i_N' M_{B,j}^0 i_N\right)^{-1}\right)^{-1/2}\left(\hat\gamma_{j,0} - \gamma_{j,0}\right) \stackrel{D^*}{\to} N(0,1), \qquad (5.13)$$

for $j = D, S$, and

$$\left(\hat\pi_D^2\hat\sigma_{\alpha,D}^2\left(i_N' M_{B,D}^0 i_N\right)^{-1} + \hat\pi_S^2\hat\sigma_{\alpha,S}^2\left(i_N' M_{B,S}^0 i_N\right)^{-1}\right)^{-1/2}\left(\left(\hat\pi_D\hat\gamma_{D,0} + \hat\pi_S\hat\gamma_{S,0}\right) - \left(\pi_D\gamma_{D,0} + \pi_S\gamma_{S,0}\right)\right) \stackrel{D^*}{\to} N(0,1), \qquad (5.14)$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, where $M_{B,j}^0 = B_j^0\left(B_j^{0\prime} B_j^0\right)^{-1} B_j^{0\prime}$ for $j = D, S$.

The lemma can be compared with Theorem 3 in Giglio and Xiu (2020).
5.3.3 Testing for $\theta^0 = c$

We propose a test for the (general) null hypothesis

$$H_0: \theta^0 = c, \qquad (5.15)$$

where $c$ is a given number. Given that - to the best of our knowledge - the limiting distribution of $\hat\theta$ is not available, we rely only on rates, and use a randomised test (closely related, e.g., to the test proposed in Horvath and Trapani (2019)).

Recall that, by Lemma A.7,

$$\hat\theta - \theta^0 = o_{a.s.}\left(N^{-\eta} T^{-1} v_{N,T}(\epsilon)\right),$$

where $v_{N,T}(\epsilon)$ is defined in (A.3). In order to propose a test for (5.15), define a sequence $s_{N,T}$ such that

$$\lim_{\min(N,T)\to\infty} s_{N,T} = \infty, \qquad \lim_{\min(N,T)\to\infty} s_{N,T}\, N^{-\eta} T^{-1} v_{N,T}(\epsilon) = 0,$$

for all $\epsilon > 0$, and note that, by construction,

$$s_{N,T}\left(\hat\theta - c\right) = s_{N,T}\left(\theta^0 - c\right) + o_{a.s.}(1).$$

This entails that, under $H_0$,

$$P\left(\omega: \lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = 0\right) = 1,$$

and therefore we can assume that, under the null,

$$\lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = 0.$$
Conversely, whenever $\theta^0 \ne c$, it follows that

$$P\left(\omega: \lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = \infty\right) = 1,$$

and therefore we can assume that, under $H_A$,

$$\lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = \infty.$$

This dichotomous behaviour can be reversed by choosing a transformation $g(\cdot)$ such that $\lim_{x\to\infty} g(x) = 0$ and $\lim_{x\to 0} g(x) = \infty$, thus defining

$$f_{N,T} = g\left(s_{N,T}\left(\hat\theta - c\right)\right).$$

As before, we can assume that

$$\lim_{\min(N,T)\to\infty} g\left(s_{N,T}\left(\hat\theta - c\right)\right) = \infty, \qquad \lim_{\min(N,T)\to\infty} g\left(s_{N,T}\left(\hat\theta - c\right)\right) = 0,$$
under $H_0$ and $H_A$, respectively. Hence, we can define a test similar to the one in Section 4.2.3:

Step 1 Generate an i.i.d. sequence $\{\xi_m^{\theta}\}$, $1 \le m \le M_{\theta}$, with common distribution $N(0,1)$.

Step 2 Generate the Bernoulli sequence $\zeta_m^{\theta}(s) = I\left(f_{N,T}^{1/2}\,\xi_m^{\theta} \le s\right)$.

Step 3 Compute

$$\sigma^{\theta}(s) = \frac{2}{M_{\theta}^{1/2}}\sum_{m=1}^{M_{\theta}}\left(\zeta_m^{\theta}(s) - \frac{1}{2}\right).$$

Step 4 Define

$$S^{\theta} = \int_{-\infty}^{+\infty}\left|\sigma^{\theta}(s)\right|^2 \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{s^2}{2}\right) ds. \qquad (5.16)$$
Then it can be shown that:

Theorem 6. We assume that the assumptions of Lemma A.7 are satisfied. Then, as $\min(N, M_{\theta}, T) \to \infty$ with

$$\frac{M_{\theta}}{g^{1/2}\left(s_{N,T}\, N^{-\eta} T^{-1} v_{N,T}(\epsilon)\right)} \to 0, \qquad (5.17)$$

it holds that $S^{\theta} \stackrel{D^*}{\to} \chi_1^2$ under $H_0$, and that $M_{\theta}^{-1} S^{\theta} \stackrel{P^*}{\to} c_0 > 0$ under $H_A$, for almost all realisations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.

As far as the practical application of the test is concerned, we have used

$$f_{N,T} = \frac{1}{\sqrt{NT}\left|\hat\theta - c\right|},$$

and $M_{\theta} = (NT)^{1/4}$.
Note that the test can be immediately generalised to comparing estimates of $\theta$ from different datasets. Indeed, let the subscripts 1 and 2 denote each dataset, define the sample sizes as $(N_1, T_1)$ and $(N_2, T_2)$, and the relevant estimates as $\hat\theta_1$ and $\hat\theta_2$. Then, upon defining a sequence $s_{N,T}^* = s_{N,T}^*(N_1, T_1, N_2, T_2)$ such that

$$\lim_{\min(N_1,T_1,N_2,T_2)\to\infty} s_{N,T}^* = \infty, \qquad \lim_{\min(N_1,T_1,N_2,T_2)\to\infty} \frac{s_{N,T}^*}{\min\left\{N_1^{\eta} T_1, N_2^{\eta} T_2\right\}}\,\max\left\{v_{N_1,T_1}(\epsilon), v_{N_2,T_2}(\epsilon)\right\} = 0,$$

the test can be applied readily, using

$$f_{N,T}^* = \frac{1}{\sqrt{\min\left\{N_1 T_1, N_2 T_2\right\}}\left|\hat\theta_1 - \hat\theta_2\right|},$$

with $M_{\theta} = \min\left\{(N_1 T_1)^{1/4}, (N_2 T_2)^{1/4}\right\}$.

Theorem 6 entails that

$$\lim_{\min(N,T,M_{\theta})\to\infty} P^*\left(S^{\theta} \ge c_{\alpha}\right) = 1, \qquad (5.18)$$

under $H_A$, and

$$\lim_{\min(N,T,M_{\theta})\to\infty} P^*\left(S^{\theta} \ge c_{\alpha}\right) = \alpha, \qquad (5.19)$$

under $H_0$, where $c_{\alpha}$ is defined by $P\left(\chi_1^2 \ge c_{\alpha}\right) = \alpha$ for a nominal level $\alpha$. Both equations hold for almost all realisations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
Whilst equation (5.18) has the "traditional" interpretation (when the null is false, the test - asymptotically - rejects with probability 1), equation (5.19) does not have a straightforward meaning. As pointed out, e.g., in Horvath and Trapani (2019), the test is constructed using a randomisation which does not vanish asymptotically, and therefore the asymptotics of $S^{\theta}$ is driven by the added randomness. Thus, different researchers using the same data will obtain different values of $S^{\theta}$ and, consequently, different p-values. If this feature is deemed undesirable, Horvath and Trapani (2019) propose to generate the test statistic $S_b^{\theta}$ for $1 \le b \le B$ times, using, at each iteration, an i.i.d. sequence $\{\xi_{m,b}^{\theta}\}$, $1 \le m \le M_{\theta}$, with common distribution $N(0,1)$, independent across $b$. Defining

$$Q(\alpha) = \frac{1}{B}\sum_{b=1}^{B} I\left(S_b^{\theta} \le c_{\alpha}\right), \qquad (5.20)$$

it is easy to see that, by the Law of the Iterated Logarithm, it follows that¹⁰

$$\liminf_{B\to\infty}\lim_{\min(N,T,M_{\theta})\to\infty}\sqrt{\frac{B}{2\ln\ln B}}\,\frac{Q(\alpha) - (1-\alpha)}{\sqrt{\alpha(1-\alpha)}} = -1,$$

and therefore a "strong rule" to decide in favour of $H_0$ is

$$Q(\alpha) \ge (1-\alpha) - \sqrt{\alpha(1-\alpha)}\sqrt{\frac{2\ln\ln B}{B}}. \qquad (5.21)$$
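The de-randomised decision rule (5.20)-(5.21) is straightforward to apply once the $B$ draws of $S^{\theta}$ are available (a sketch; the function name is ours):

```python
import numpy as np

def strong_rule_decision(S_stats, c_alpha, alpha):
    """Compute Q(alpha) as in (5.20) and apply the strong rule (5.21):
    decide in favour of H0 iff
    Q(alpha) >= (1 - alpha) - sqrt(alpha (1 - alpha)) sqrt(2 ln ln B / B)."""
    S = np.asarray(S_stats, dtype=float)
    B = S.size
    Q = (S <= c_alpha).mean()                   # fraction of non-rejections
    cutoff = (1.0 - alpha) - np.sqrt(alpha * (1.0 - alpha)) \
        * np.sqrt(2.0 * np.log(np.log(B)) / B)
    return bool(Q >= cutoff), Q
```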
6 Simulations

We evaluate the performance of (i) the sequential procedure to determine the number of common factors and (ii) the three-pass estimation, through a small-scale Monte Carlo exercise.¹¹ Data are generated according to (2.5), that is,

$$R_{i,t} = I(D_t)\left(\gamma_{D,0} + \alpha_{D,i} + \beta_{D,i}'\gamma_{D,1} + \beta_{D,i}' u_{D,t}\right) + I(S_t)\left(\gamma_{S,0} + \alpha_{S,i} + \beta_{S,i}'\gamma_{S,1} + \beta_{S,i}' u_{S,t}\right) + \sqrt{0.5}\,\epsilon_{i,t}. \qquad (6.1)$$

In this data generating process (DGP), we generate all variables with zero mean for simplicity. Where possible and relevant, we calibrate the parameters to values that arise from the empirical analysis. In particular, as far as downside risk is concerned, we set $\theta^0 = -0.03$, which is the value set in Farago and Tedongap (2018) and Lettau, Maggiori, and Weber (2014). Similarly, we generate $z_t$ as i.i.d. $N\left(\mu_z, \sigma_z^2\right)$. Based on our data, we set $\sigma_z = 0.044$; also, in Section 7 we found that $P\left(z_t \le \theta^0\right) \simeq 0.15$, which corresponds to a value $\mu_z = -0.015$. The rest of the DGP is specified as follows. For $j = D, S$, we generate $u_{j,t}$ as i.i.d. $N(0,1)$ for $1 \le t \le T$; similarly, we generate $\beta_{j,i}$ as i.i.d. $N(0,1)$ for $1 \le i \le N$; finally, $\alpha_{j,i}$ is also i.i.d. $N(0,1)$ for $1 \le i \le N$.¹²
¹⁰Details on this argument are in Horvath and Trapani (2019).

¹¹As far as the former is concerned, the results complement the ones in Trapani (2018), which considers the case of a linear model.

¹²We note that we have carried out some limited experiments where, e.g., a vanishing proportion of the $\alpha_{j,i}$s had nonzero mean, but results did not change. As far as the idiosyncratic innovation $\epsilon_{i,t}$ is concerned, we only consider the case of no cross-sectional dependence in our simulations; thus, we generate $\epsilon_{i,t}$ as i.i.d. $N(0,0.5)$.
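The DGP (6.1) can be simulated along the following lines (a sketch under the stated zero-mean simplification, so the premia constants $\gamma_{j,0}$ and $\gamma_{j,1}$ are set to zero here for brevity; all names are ours):

```python
import numpy as np

def simulate_dgp(N, T, P_D, P_S, theta0=-0.03, mu_z=-0.015, sigma_z=0.044, seed=0):
    """Simulate the two-regime DGP (6.1): z_t ~ N(mu_z, sigma_z^2) drives
    the downside indicator I(z_t <= theta0); loadings, latent factors and
    pricing errors are i.i.d. N(0,1), as in the Section 6 calibration."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu_z, sigma_z, T)
    D = (z <= theta0).astype(float)              # I(D_t); I(S_t) = 1 - D
    R = np.zeros((N, T))
    for P, reg in ((P_D, D), (P_S, 1.0 - D)):
        B = rng.standard_normal((N, P))          # loadings beta_{j,i}
        u = rng.standard_normal((P, T))          # latent factors u_{j,t}
        a = rng.standard_normal(N)               # pricing errors alpha_{j,i}
        R += reg[None, :] * (a[:, None] + B @ u)
    return R + np.sqrt(0.5) * rng.standard_normal((N, T)), D
```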
Results are reported for the following combinations of $(N, T)$, which are directly based on the sample sizes in our empirical applications: $(17, 273)$ and $(130, 666)$. In addition, we also report simulation results for two more cases: $(70, 273)$, because it is helpful in understanding the improvement in estimation performance that accrues when moving from a very small $N = 17$ to $N = 130$, hence providing guidance to applied researchers on the size of the cross-section desired for implementing our methodology; and $(330, 666)$, to consider the case of a sample with large $N$ and large $T$. We consider various numbers of common factors, which mimic the findings in our empirical applications: $P_D^0 \in \{1, 2\}$ and $P_S^0 \in \{1, 2, 3, 4, 5, 6\}$.¹³

We compute, over 1000 replications, two indicators: (i) the average estimated number of factors

$$\bar P_j = \frac{1}{1000}\sum_{r=1}^{1000}\hat P_{j,r},$$

for $j = D, S$, and (ii) the percentage of times that $\hat P_{j,r} \ne P_j^0$, for $j = D, S$.
The results, reported in Table 1, are very satisfactory. Specifically, the estimated number of factors is always close to the true number of factors. This is very clear when $N$ is at least equal to 70. The worst performance occurs for the smallest cross-section considered, i.e. $(17, 273)$, where the number of common factors is underestimated when the true number of factors in the satisfactory state is 5 or 6 (the estimates are, on average, 4 and 5 respectively). However, this is hardly surprising, given that the true DGP combines a small cross-section with a large set of common factors, which seems a rather unlikely DGP anyway. Overall, with the exception noted above, the simulation results provide comfort that the number of common factors is precisely identified by our procedure, even in moderately sized cross-sections of asset returns.

¹³Therefore we do not consider the case where there is no factor at all (i.e., either $P_D^0 = 0$ or $P_S^0 = 0$, or both), requiring instead that at least one factor should always be present in asset returns, which seems reasonable (see Lettau and Pelger (2020) for a related discussion).
7 Empirical analysis

7.1 Data

The empirical analysis in this section illustrates the use of the methodology proposed in this paper in relation to two panels of test assets, namely a cross-section of equity portfolios and a set of multiple asset classes. The data are described in detail in Sections 7.1.1 and 7.1.2, respectively.

7.1.1 Equity panel

We begin with returns in excess of the risk-free rate on the following set of $N = 130$ U.S. equity portfolios: 25 portfolios sorted by size and book-to-market; 25 portfolios sorted by size and momentum; 10 size portfolios; 10 book-to-market portfolios; 10 momentum portfolios; 25 portfolios sorted by size and operating profitability; and 25 portfolios sorted by size and investment. The risk-free rate is the one-month US Treasury bill rate. The market return used to determine the disappointment event (as detailed in Section 2.3) is the value-weighted average log-return computed with respect to all stocks available from CRSP.

The data are taken from Kenneth French's website.¹⁴ We consider the sample period from July 1963 to December 2018, a total of $T = 666$ monthly observations. Farago and Tedongap (2018) analyse each of these sets of portfolios individually.¹⁵ We consider the entire cross-section of returns, which allows us to exploit our theoretical results and to be aligned with Lewellen, Nagel, and Shanken (2010), who advocate the use of sizable cross-sections of asset returns.
7.1.2 Data for multiple asset classes
Cochrane (2011) suggests investigating the factor structure across multiple asset classes to
study the corresponding underlying discount factors. To analyze the effect of downside risk on
multiple asset classes we consider a cross section that includes currency, equity and commodity
14See https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.15See Section 3.2 and Table 1 in Farago and Tedongap (2018).
33
returns, with N = 17 portfolios, obtained from Lettau, Maggiori, and Weber (2014)16: 6 currency
portfolios constructed from 53 currencies sorted according to their interest rate differential with
the US rate (carry trade portfolios); 6 equity portfolios sorted by size and book-to-market ratio; 5
commodity futures portfolios sorted by the commodity basis from Yang (2013). As for the equity
panel, the market return to determine the disappointment event as detailed in Section 2.3 is the
value-weighted average log-return computed with respect to all stocks available from CRSP.
The data sample runs from April 1986 to December 2012, which gives us T = 273 time series
observations. The data and our empirical exercise are similar to those in Lettau, Maggiori, and
Weber (2014), who, however, capture the equity market by including the market excess return as a
test asset. This second dataset has a smaller cross section of asset returns than the equity returns
sample, but a much more diverse set of returns, hence providing incremental information on the
usefulness of our model.
7.2 Estimation and inference
In this section, we report estimation and inference on the threshold level θ (Section 7.2.1) and
on the number of factors in each regime (Section 7.2.2).
7.2.1 Estimation and inference on θ
Results are in Table 2. Across both datasets, the estimated threshold θ̂ is approximately
equal to −0.06. Therefore, θ̂ is more negative than the fixed threshold θ̄ = −0.03 used in Lettau
et al. (2014) and Farago and Tedongap (2018): the estimated downside state is less frequent
but more severe than implied by the fixed threshold. This feature can also be seen by looking
at the number of time periods within the downside regime, TD, and the corresponding relative
frequency πD: θ̂ identifies downside periods that occur approximately half as often as those associated with θ̄.
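The threshold estimator can be illustrated as a grid search: for each candidate θ on a grid of market-return quantiles, split the sample into the two regimes, extract principal-components factors regime by regime, and keep the θ that minimizes the pooled least-squares residual. The following is a minimal sketch of this idea under simplified assumptions (one principal-components factor in the downside regime and two in the other; all function and variable names are our own, not the paper's code):

```python
import numpy as np

def regime_pca_ssr(R, d, k):
    """Sum of squared residuals of a rank-k PCA fit on one regime's observations."""
    X = R[d]                                   # T_j x N block of returns
    if X.shape[0] <= k:
        return np.inf
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    return float(np.sum(s[k:] ** 2))           # energy left after k components

def estimate_threshold(R, rm, grid, k_d=1, k_s=2):
    """Grid search: pick the theta minimising the pooled regime-wise SSR."""
    losses = [regime_pca_ssr(R, rm <= th, k_d) + regime_pca_ssr(R, rm > th, k_s)
              for th in grid]
    return grid[int(np.argmin(losses))]

# Simulated panel: one strong factor below the threshold, two factors above it
rng = np.random.default_rng(0)
T, N, theta0 = 600, 20, -0.06
rm = rng.normal(0.005, 0.04, T)                # market log-return
down = rm <= theta0
f = rng.normal(size=(T, 2))
B_s = rng.normal(size=(N, 2))                  # loadings in the "normal" regime
B_d = np.column_stack([3 * np.ones(N), np.zeros(N)])   # one pervasive factor
R = np.where(down[:, None], f @ B_d.T, f @ B_s.T) + 0.1 * rng.normal(size=(T, N))

grid = np.quantile(rm, np.linspace(0.02, 0.30, 57))
theta_hat = estimate_threshold(R, rm, grid)
```

The grid of market-return quantiles plays the role of the trimmed candidate set; the loss is flat between adjacent order statistics, so the estimate is effectively a classification of observations into the two regimes.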
16The data are available at https://faculty.chicagobooth.edu/michael.weber/research/index.html.
To enhance our understanding of the downside regime, we evaluate the correlation between
the downside indicators, as resulting from using θ and θ, and various measures of economic and
financial conditions that may be related to the downside regime. We consider the following set of
variables: the NBER U.S. recession indicator (REC); the log-difference of industrial production
(∆I P); the CBOE volatility index based on S&P 500 index options (VIX); the CBOE volatility
index based on S&P 100 index options (VXO); the economic policy uncertainty index of Baker,
Bloom, and Davis (2016) (EPU); the Equity Market Volatility tracker of Baker, Bloom, and Davis
(2016) (EMV); the Policy-Related Equity Market Volatility tracker of Baker, Bloom, Davis, and
Kost (2019) (PREMV); the Geopolitical Risk index of Caldara and Iacoviello (2018) (GPR); the
TED spread (TED); the disaster probability for the U.S. of Barro and Liao (2020) (SPX); the
spread between the 10- and 2-year Treasury yields (T10Y2Y). In all cases (with the exception
of REC), we compute the standard Pearson correlation coefficient, which in this case is also
known as the “point-biserial correlation.” When computing the correlation between the downside
indicator and REC, in light of the fact that these are both binary variables, we also use a
different measure of association, known as the “tetrachoric correlation coefficient.” This indicator
may be regarded as more adequate for our case, since it is based on the assumption that there
exists a continuous latent variable which determines each binary variable; however, there exists a
one-to-one mapping between this indicator and Pearson’s correlation (see Ekstrom (2011) for an
insightful discussion of both indicators and the historical debate around them). Broadly speaking,
the correlations point towards a connection between the downside indicator and various volatility
measures (not just the VIX but also several others). This finding is consistent with a vast litera-
ture on asset pricing, which shows that volatility risk is a priced risk factor (see, inter alia, Ang,
Hodrick, Xing, and Zhang (2006), Ang, Hodrick, Xing, and Zhang (2009), and Menkhoff, Sarno,
Schmeling, and Schrimpf (2012)). Note, however, that the correlation is imperfect and generally
revolves around 0.5; this indicates that the determinants of the threshold go beyond volatility.
Indeed, the correlation with the NBER recession indicator is also significant and positive, but again around 0.5. One
exception is the correlation with the term spread at different maturities, which is statistically
insignificantly different from zero.17
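The two association measures can be computed as follows. This is a hedged sketch, not the paper's code: the `tetrachoric` helper is our own illustration, which recovers the latent correlation of a bivariate normal by matching the observed 2×2 cell probability (a numerical version of the mapping discussed in Ekstrom (2011)):

```python
import numpy as np
from statistics import NormalDist

def quadrant_prob(rho, h, k, n=400, upper=8.0):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho,
    via a midpoint-rule double integral over [h, upper] x [k, upper]."""
    dx, dy = (upper - h) / n, (upper - k) / n
    x = h + (np.arange(n) + 0.5) * dx
    y = k + (np.arange(n) + 0.5) * dy
    X, Y = np.meshgrid(x, y)
    det = 1.0 - rho ** 2
    pdf = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*det)) / (2*np.pi*np.sqrt(det))
    return pdf.sum() * dx * dy

def tetrachoric(x, y):
    """Latent correlation of a bivariate normal reproducing the 2x2 table of two
    binary series (bisection works because quadrant_prob is increasing in rho)."""
    h = NormalDist().inv_cdf(1 - x.mean())     # latent threshold for x
    k = NormalDist().inv_cdf(1 - y.mean())     # latent threshold for y
    p11 = np.mean(x & y)                       # joint "both = 1" frequency
    lo, hi = -0.99, 0.99
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if quadrant_prob(mid, h, k) < p11 else (lo, mid)
    return 0.5 * (lo + hi)

# Two dummies dichotomised from a latent bivariate normal with rho = 0.6
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=20_000)
rec = z[:, 0] > 1.0        # e.g. a recession dummy
down = z[:, 1] > 1.2       # e.g. the downside indicator
phi = np.corrcoef(rec, down)[0, 1]   # Pearson correlation of the two dummies
rho_hat = tetrachoric(rec, down)
```

As the sketch illustrates, the Pearson (phi) coefficient of two dichotomised normals is attenuated relative to the tetrachoric correlation, which targets the latent ρ directly.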
17Generally speaking, an interesting question is what drives the value of θ, and the model could allow for θ to
We further run inference on the threshold parameter. We first conduct a test, whose details
are spelt out in Section 5.3.3, for the null hypothesis H0 : θ = θ̄: this allows us to formally assess
whether the choice of the fixed threshold θ̄ is supported by the data. As can be seen from Table 2,
the estimated threshold θ̂ is significantly different from θ̄: this provides evidence in favour of our
approach, which allows us to estimate the timing of the downside regime. In addition to testing for the
significance of the difference between θ̂ and θ̄, we use the same testing approach to test the null
that the estimated thresholds are equal across the two datasets: the results in Table 2 show
that the estimated threshold is essentially the same in both datasets.
7.2.2 Estimating the number of common factors
Results are in Table 3.18 We estimate the number of common factors for three cases: the
whole dataset (henceforth denoted without any subscript), the “satisfactory” regime j = S, and
the downside regime corresponding to the “disappointing event” j = D. In light of the results in
Section 7.2.1, we only report results using θ̂. Alongside the estimated number of common factors,
we evaluate the proportion of the total variance explained by each factor (and the cumulative
one). Letting g_j^{(i)} denote, as above, the i-th largest eigenvalue in each of the two regimes
j = D, S (and, similarly, g^{(i)} for the whole sample), the proportion of the total variance
associated with the i-th common factor is measured as

\nu_j^{(i)} = \frac{g_j^{(i)}}{\sum_{i=1}^{N} g_j^{(i)}} \,, \qquad (7.1)
with \nu^{(i)} defined similarly. The cumulative percentage of the total variance explained by the
\hat{P}_j estimated factors is given by

\nu_j = \frac{\sum_{i=1}^{\hat{P}_j} g_j^{(i)}}{\sum_{i=1}^{N} g_j^{(i)}} \,, \qquad (7.2)
be endogenously determined by some state variables. We leave this generalization for future research.
18Given the results from our simulations, we choose M = N; we point out that, similarly to the simulations, we assessed the robustness of our results by also using M = ⌊N/2⌋ and M = 2N, but recorded no changes, at least qualitatively.
with ν defined analogously for the whole sample.
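Given the regime-specific eigenvalues, (7.1) and (7.2) are immediate to compute; a small numerical sketch on simulated data (variable names are ours, not the paper's):

```python
import numpy as np

def variance_shares(R, d):
    """Per-factor variance shares nu^{(i)} (eq. 7.1) and cumulative shares (eq. 7.2)
    from the eigenvalues of (1/NT) sum_t R_t R_t' d_t for one regime."""
    S = (R[d].T @ R[d]) / (R.shape[1] * R.shape[0])
    g = np.sort(np.linalg.eigvalsh(S))[::-1]     # g^{(1)} >= g^{(2)} >= ...
    nu = g / g.sum()
    return nu, np.cumsum(nu)

rng = np.random.default_rng(2)
T, N = 500, 30
f = rng.normal(size=(T, 2)) * np.array([3.0, 1.5])  # two factors, unequal strength
R = f @ rng.normal(size=(2, N)) + 0.5 * rng.normal(size=(T, N))
d = np.ones(T, dtype=bool)                          # treat the full sample as one regime
nu, cum = variance_shares(R, d)
```

In a panel with a strong factor structure, the first few shares dominate and the cumulative share saturates quickly, mirroring the pattern reported in Table 3.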
The most striking finding arising from the results in Table 3 is the discrepancy between the
number of estimated common factors in the two regimes. First, PD is always smaller than PS .
Second, PD = 1 or, at most, 2. The latter occurs when using the dataset of Lettau,
Maggiori, and Weber (2014); in such a case, we know from our simulations that the number of
common factors may be overstated when PD = 1, and therefore the determination of the number
of common factors should rely on considerations beyond the test alone. However, the percentage of
the variance explained by each eigenvalue in the downside state seems to suggest the presence of
two common factors, as opposed to only one. Even in this case, though, the first common factor
explains more than five times as much variance as the second one, whereas such discrepancy is
far more attenuated in regime S . Third, whilst the number of common factors decreases from
regime S to regime D, the proportion of the variance explained by the common factors in the
latter case is invariably higher than in the former case; this is even more remarkable when PD = 1
for the equity returns dataset, which suggests that the only factor driving the data in regime D
is more pervasive than the whole factor structure in regime S. Fourth, applying the test to the
unconditional, linear model would lead to overestimating the number of factors (P̂ = 6 or P̂ = 4
in the two datasets, respectively).
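As a rough cross-check on the regime-specific estimates, a simple eigenvalue-ratio rule can be applied to each regime separately. The sketch below uses this common alternative device, not the randomized test employed in the paper, on simulated data with three factors in normal times and one in the downside regime:

```python
import numpy as np

def eigen_ratio_k(R, d, kmax=8):
    """Estimate the number of factors in one regime as the argmax of the
    eigenvalue ratios g^{(i)} / g^{(i+1)}, i = 1, ..., kmax."""
    S = (R[d].T @ R[d]) / (d.sum() * R.shape[1])
    g = np.sort(np.linalg.eigvalsh(S))[::-1][: kmax + 1]
    return int(np.argmax(g[:-1] / g[1:])) + 1

rng = np.random.default_rng(4)
T, N = 600, 40
down = rng.random(T) < 0.15
f = rng.normal(size=(T, 3))
B_S = rng.normal(size=(N, 3))                          # three factors in regime S
B_D = np.column_stack([2 * B_S[:, 0], np.zeros((N, 2))])  # one factor in regime D
R = np.where(down[:, None], f @ B_D.T, f @ B_S.T) + 0.2 * rng.normal(size=(T, N))

kD = eigen_ratio_k(R, down)     # downside regime
kS = eigen_ratio_k(R, ~down)    # normal regime
```

The rule exploits the gap between the pervasive eigenvalues and the bounded bulk, which is exactly the feature that makes the downside regime look one-dimensional.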
These findings clearly suggest that it is important to use an estimation method which allows
for a different number of factors in each regime. From a statistical point of view, it is natural that
removing a restriction results in better inference - in this case, the number of common factors
is neither understated (which would correspond to an omitted variable problem) nor overstated
(which would be tantamount to including irrelevant variables). Also, from an empirical viewpoint,
results clearly suggest that restricting the number of common factors to be the same across regimes
is not an adequate choice. A related finding from applying our methodology is that the data
tend to be driven by few(er) common factors during bad times, which entails that dependence
among assets increases in a downturn. This result is intuitive and consistent with the anecdotal
evidence that factor diversification tends to disappear when needed
most (see e.g. Page and Panariello (2018) and the references therein).19
7.3 Model fit
In this section, we consider the performance of our latent factor model with downside risk,
and compare it with a linear specification, in the same vein as the approach followed, e.g., by
Lettau, Maggiori, and Weber (2014).
From these results, we can see that the conditional model performs well, with a very high
R2 both in the S regime and in the D regime, as well as for the full sample; similar comments
apply to the RMSPE. The results also show that the performance of the linear, unconditional model
over the full sample is quite satisfactory, with an R2 in the range between 0.925 and 0.978 (when
the number of common factors is set to P = P̂) and a relatively low RMSPE. However, we know from
earlier evidence that the number of latent factors that matter for pricing the cross section of
asset returns in the downside regime is much smaller (1 or at most 2) than the number of factors
required in the S regime. Given that the data experience this strong regime dependence, it seems
surprising that the unconditional, regime-independent pricing model performs as well as it does.
To better understand the gain from using a conditional asset pricing that allows for downside
risk, it is therefore useful to check the performance of the unconditional model for observations
in the S and D regimes separately. The results in the table show that the performance of the
unconditional model is very poor in both the D and S regimes, with very low explanatory power
and large RMSPEs. This suggests that the seemingly good performance of the unconditional
model over the full sample is illusory: it comes from combining large pricing errors of opposite
sign across the two regimes, which offset each other on aggregate over the full sample.
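The full-sample versus regime-wise comparison boils down to computing R2 and RMSPE on the appropriate subsamples. The following stylized sketch (simulated numbers and our own helper names, not the paper's estimates) reproduces the qualitative pattern of offsetting regime errors:

```python
import numpy as np

def fit_diagnostics(realized, predicted, d):
    """R2 and RMSPE of model-implied vs realized returns on the subsample d."""
    r, p = realized[d], predicted[d]
    r2 = 1.0 - np.sum((r - p) ** 2) / np.sum((r - r.mean()) ** 2)
    rmspe = np.sqrt(np.mean((r - p) ** 2))
    return r2, rmspe

rng = np.random.default_rng(3)
T = 400
down = rng.random(T) < 0.1                       # ~10% downside observations
realized = np.where(down, -0.08, 0.01) + 0.02 * rng.normal(size=T)
# a stylized "unconditional" prediction: right on average, biased within each regime
predicted = np.where(down, -0.06, 0.008)
full = np.ones(T, dtype=bool)
r2_full, rmspe_full = fit_diagnostics(realized, predicted, full)
r2_down, rmspe_down = fit_diagnostics(realized, predicted, down)
```

The pooled R2 looks healthy because the prediction tracks the large spread between regimes, while the within-regime R2 can be negative: exactly the "illusory" fit discussed above.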
The latter point can be seen clearly in Figure 1, where we report - for each of the three
sets of test assets examined - scatter plots of realized versus model-implied returns, for both the
19This result is also consistent with the literature that shows that, in crisis times, the macroeconomy is driven by a smaller number of common factors (we refer, inter alia, to Stock and Watson (2009), Cheng, Liao, and Schorfheide (2016) and Li, Todorov, Tauchen, and Lin (2017)).
unconditional model, and the conditional model under three different cases: the D regime, the S
regime, and the full sample. This figure makes it apparent how the unconditional model performs
very poorly in the downside regime, with virtually no explanatory power and large positive errors
- i.e., the unconditional model predicts much higher returns than the realized ones, which tend
to be very negative in this regime. Hence, the unconditional model is not able to capture such
extreme, bad events. The conditional model performs much better, and there appears to be no
systematic tendency to generate positive or negative errors in the downside regime D. Turning to
the S regime, the conditional model performs very well, with the majority of test asset returns
lying close to, or on, the 45 degree line. In contrast, the returns implied by the unconditional
model display a monotonic pattern (hence are highly positively correlated) with realized returns
in the S regime, but do not lie on the 45 degree line for any of the test asset returns. This
means that, even in the S regime, the unconditional model has a tendency to generate systematic
(non-random) pricing errors; however, these pricing errors have the opposite sign than in the D
regime, since the unconditional model predicts lower returns than realized in the S regime. Over
the full sample, the averaging across the errors of these two regimes - large positive errors for
a small number of observations in the D regimes and more moderate negative pricing errors for
a large number of observations in the S regime - gives the appearance that the unconditional
model performs well, but this is of course illusory: the unconditional model performs poorly in
both regimes.
The analysis above clarifies the gain from using a conditional latent factor model of asset
pricing that allows for different regimes, and hence different sets of factors in good and bad times.
The pricing model is very different in these two regimes, capturing the stark difference in the time-
varying factor structure of asset returns, which is very low-dimensional in bad times, and requires
more factors in good times. A model that does not allow for this feature, which is prominent
across all three data sets examined here, can spuriously generate a strong correlation between
model-implied and realized returns over the full sample, but in fact it performs poorly in both
good and bad times. At an applied level, the conditional model is consistent with the notion that
asset returns become very highly correlated in bad times and their factor structure reduces in
dimensionality to one or two factors, and this feature is key for the model to generate predicted
returns that capture downside risk.
7.4 Risk premia
Table 5 displays the results from estimation of the risk premia obtained from the three-pass
procedure detailed in Section 5. We consider the following set of observable pricing factors: the
log-return on the market (ln(Rm)), namely the measure of overall equity market performance used
in Farago and Tedongap (2018); the four additional factors that complete the Fama and French
(2015) five-factor model, that is size (SMB), value (HML), operating profitability (RMW), and
investment (CMA); the momentum factor (MOM), taken from Carhart (1997), whose importance
is further stressed in Asness et al. (2013); the liquidity factor (LIQ) from Pastor and Stambaugh
(2003); the spread between the 10 and the 2 year zero-coupon Treasury rate (T10Y2Y), which
captures the information stemming from the yield curve.20 For equity and multi-asset portfolios,
we perform our analysis with respect to conditional and unconditional specifications.
Starting from equity portfolios, in the case of the unconditional model HML, MOM and, to
a lesser extent, ln(Rm) and CMA, generate positive and statistically significant risk premia. The
unconditional model however hides highly asymmetric dynamics between the regimes: in the oc-
currence of the downside event, ln(Rm) and SMB generate excess returns that are negative and
strongly statistically significant; in tranquil times a higher number of factors, namely ln(Rm),
SMB, MOM and LIQ, are positively priced. Moving to multi-asset portfolios, the unconditional
model does not identify any statistically significant risk premium. However, the conditional mod-
els provide a more informative picture: the risk premia of ln(Rm), SMB and, marginally, RMW
are negative and significant in downside periods; ln(Rm), SMB, HML, RMW and CMA, more
weakly contribute to the excess returns of the test assets in tranquil times.
20The series for ln(Rm), SMB, HML, RMW, CMA and MOM are available from Kenneth French's website: see footnote 14. LIQ is available from Lubos Pastor's website. T10Y2Y is built from the series of 10- and 2-year Treasury rates from Sarno et al. (2007) from July 1963 to December 2003, and from the updated dataset of Gurkaynak et al. (2007), as available from Refet Gurkaynak's website, from January 2004 to December 2018.
To gain further insight, we study the log-market return in more detail. The estimator for
the risk premium of ln(Rm) obtained from the unconditional model is statistically insignificant for
both sets of test assets. Turning to the conditional model, we can proxy the regime probabilities
by the relative frequencies of observations in the two states and calculate the overall risk premium
as the probability-weighted average of the regime-specific risk premia of ln(Rm): this equals 0.55%
and 0.25% per month for equity and multi-asset portfolios, respectively, corresponding to annualized
values of 6.60% and 3.05%. These are sizeable and greater than zero, which shows that the market
factor is indeed priced within the whole time period we consider. The risk premium estimated
from equity portfolios is higher than that from the set of multiple assets: in the latter case the
estimated standard error when the downside event occurs is sizeable and the corresponding esti-
mate for the risk premium is likely to suffer from a loss in precision; this may be due to the fact
that the available panel is of limited size.
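The aggregation used here, a probability-weighted average of the regime premia followed by annualization, takes only a few lines; the numbers below are illustrative placeholders, not the paper's estimates:

```python
# Illustrative placeholders: monthly regime premia for ln(Rm) and regime frequencies
lam_D, lam_S = -0.021, 0.010   # hypothetical downside / tranquil premia
pi_D = 0.10                    # relative frequency of the downside regime
pi_S = 1.0 - pi_D

lam_monthly = pi_D * lam_D + pi_S * lam_S          # probability-weighted premium
lam_annual_simple = 12 * lam_monthly               # simple annualization
lam_annual_comp = (1 + lam_monthly) ** 12 - 1      # compounded annualization
```

With a positive monthly premium, the compounded figure slightly exceeds the simple one, which is why annualized numbers such as 3.05% (rather than 3.00%) arise from a 0.25% monthly premium.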
Finally, it is worth noting that the conditional model is estimated with an intercept, which is
also allowed to vary across the two states. The bottom panel of Table 5 reports the estimates of
the intercepts for both datasets. While the intercepts can of course be different from zero in each
regime, and indeed they are, one would expect the unconditional (probability-weighted) intercept
to be small, or indeed equal to zero, in a well-specified model of asset prices. This is the case
for both datasets, as the 95% confidence intervals of the unconditional intercepts clearly include
zero.
8 Conclusions
In this paper, we propose a methodology to price the cross section of asset returns in the
presence of common (possibly latent) factors and downside risk. Both features have been shown
to offer superior explanatory power in several empirical papers (Lettau, Maggiori, and Weber
(2014)). We extend the framework used in the extant literature in at least three ways: we allow
for downside risk via a threshold specification which allows for the estimation of the (usually set
a priori) threshold level; we consider different factor structures in the two regimes, allowing also
for different numbers of factors; we adapt the methodology developed by Giglio and Xiu (2020)
to recover the observable factors' risk premia from the estimated latent ones in both regimes. We
point out that, although our theory is based on a threshold model and strong, or pervasive, factors,
it can be readily generalised to the case of structural breaks and/or of weak common factors.
Our methodology is illustrated through three applications to a low-dimensional, a medium-sized,
and a large dataset. In all cases, we show that: estimating, rather than setting a priori, the
threshold level improves goodness of fit, and also determines substantial differences in the estimation
of risk premia with respect to an exogenously fixed level; estimating the number of common factors
separately across regimes yields very different estimates, with the downside regime having a small
number of common factors (one or, at most, two), whilst returns in the “normal” regime are driven
by substantially more common factors; and estimating the risk premia of classically employed
observable factors is more accurate and yields estimates with the expected sign.
References
Ang, A., J. Chen, and Y. Xing (2006). Downside risk. The Review of Financial Studies 19(4), 1191–1239.

Ang, A., R. J. Hodrick, Y. Xing, and X. Zhang (2006). The cross-section of volatility and expected returns. The Journal of Finance 61(1), 259–299.

Ang, A., R. J. Hodrick, Y. Xing, and X. Zhang (2009). High idiosyncratic volatility and low returns: International and further U.S. evidence. Journal of Financial Economics 91(1), 1–23.

Asness, C. S., T. J. Moskowitz, and L. H. Pedersen (2013). Value and momentum everywhere. The Journal of Finance 68(3), 929–985.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.

Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.

Bai, Z. and J. Yao (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis 106, 167–177.

Bai, Z. and J.-F. Yao (2008). Central limit theorems for eigenvalues in a spiked population model. In Annales de l'IHP Probabilités et Statistiques, Volume 44, pp. 447–474.

Baker, S. R., N. Bloom, and S. J. Davis (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics 131(4), 1593–1636.

Baker, S. R., N. Bloom, S. J. Davis, and K. J. Kost (2019). Policy news and stock market volatility. Technical report, National Bureau of Economic Research.

Barigozzi, M. and L. Trapani (2018). Determining the dimension of factor structures in non-stationary large datasets. arXiv preprint arXiv:1806.03647.

Barro, R. J. and G. Y. Liao (2020). Rare disaster probability and options pricing. Journal of Financial Economics, forthcoming.

Bates, B. J., M. Plagborg-Møller, J. H. Stock, and M. W. Watson (2013). Consistent factor estimation in dynamic factor models with structural instability. Journal of Econometrics 177, 289–304.

Caldara, D. and M. Iacoviello (2018). Measuring geopolitical risk. FRB International Finance Discussion Paper (1222).

Cappiello, L., R. F. Engle, and K. Sheppard (2006). Asymmetric dynamics in the correlations of global equity and bond returns. Journal of Financial Econometrics 4(4), 537–572.

Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of Finance 52(1), 57–82.

Cheng, X., Z. Liao, and F. Schorfheide (2016). Shrinkage estimation of high-dimensional factor models with structural instabilities. The Review of Economic Studies 83, 1511–1543.

Cochrane, J. H. (2011). Presidential address: Discount rates. The Journal of Finance 66(4), 1047–1108.

Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Oxford University Press.

Ekstrom, J. (2011). The phi-coefficient, the tetrachoric correlation coefficient, and the Pearson–Yule debate.

Fama, E. F. and K. D. French (2015). A five-factor asset pricing model. Journal of Financial Economics 116(1), 1–22.

Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy 81(3), 607–636.

Fan, J., Y. Liao, and J. Yao (2013). Large panel test of factor pricing models. arXiv preprint arXiv:1310.3899.

Farago, A. and R. Tedongap (2018). Downside risks and the cross-section of asset returns. Journal of Financial Economics 129(1), 69–86.

Giglio, S. and D. Xiu (2020). Asset pricing with omitted factors. Journal of Political Economy, forthcoming.

Gurkaynak, R. S., B. Sack, and J. H. Wright (2007). The U.S. Treasury yield curve: 1961 to the present. Journal of Monetary Economics 54(8), 2291–2304.

Hansen, B. E. (2000). Sample splitting and threshold estimation. Econometrica 68(3), 575–603.

Harvey, C. R. and Y. Liu (2020). A census of the factor zoo. Mimeo.

Horvath, L. and L. Trapani (2019). Testing for randomness in a random coefficient autoregression model. Journal of Econometrics 209(2), 338–352.

Lettau, M., M. Maggiori, and M. Weber (2014). Conditional risk premia in currency markets and other asset classes. Journal of Financial Economics 114(2), 197–225.

Lettau, M. and M. Pelger (2020). Estimating latent asset-pricing factors. Journal of Econometrics.

Lewellen, J., S. Nagel, and J. Shanken (2010). A skeptical appraisal of asset pricing tests. Journal of Financial Economics 96(2), 175–194.

Li, J., V. Todorov, G. Tauchen, and H. Lin (2017). Rank tests at jump events. Journal of Business & Economic Statistics, forthcoming.

Markowitz, H. M. (1959). Portfolio Selection: Efficient Diversification of Investments. John Wiley.

Massacci, D. (2017). Least squares estimation of large dimensional threshold factor models. Journal of Econometrics 197(1), 101–129.

McLeish, D. L. (1974). Dependent central limit theorems and invariance principles. The Annals of Probability 2(4), 620–628.

Menkhoff, L., L. Sarno, M. Schmeling, and A. Schrimpf (2012). Carry trades and global foreign exchange volatility. The Journal of Finance 67(2), 681–718.

Merikoski, J. K. and R. Kumar (2004). Inequalities for spreads of matrix sums and products. Applied Mathematics E-Notes 4, 150–159.

Moricz, F. (1983). A general moment inequality for the maximum of the rectangular partial sums of multiple series. Acta Mathematica Hungarica 41, 337–346.

Page, S. and R. A. Panariello (2018). When diversification fails. Financial Analysts Journal 74(3), 19–32.

Pastor, L. and R. F. Stambaugh (2003). Liquidity risk and expected stock returns. Journal of Political Economy 111(3), 642–685.

Peligrad, M. (1987). On the central limit theorem for ρ-mixing sequences of random variables. The Annals of Probability 15(4), 1387–1394.

Peligrad, M., S. Utev, and W. Wu (2007). A maximal Lp-inequality for stationary sequences and its applications. Proceedings of the American Mathematical Society 135(2), 541–550.

Pesaran, M. H. and T. Yamagata (2012). Testing CAPM with a large number of assets. In AFA 2013 San Diego Meetings Paper.

Salzer, H. E., R. Zucker, and R. Capuano (1952). Table of the Zeros and Weight Factors of the First Twenty Hermite Polynomials. US Government Printing Office.

Sarno, L., D. L. Thornton, and G. Valente (2007). The empirical failure of the expectations hypothesis of the term structure of bond yields. Journal of Financial and Quantitative Analysis 42(1), 81–100.

Schwert, G. W. (1989). Why does stock market volatility change over time? The Journal of Finance 44(5), 1115–1153.

Shao, Q.-M. (1995). Maximal inequalities for partial sums of ρ-mixing sequences. The Annals of Probability, 948–965.

Stock, J. H. and M. W. Watson (2009). Forecasting in dynamic factor models subject to structural instability. In J. Castle and N. Shephard (Eds.), The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Oxford University Press.

Trapani, L. (2018). A randomized sequential procedure to determine the number of factors. Journal of the American Statistical Association 113(523), 1341–1349.

Uematsu, Y. and T. Yamagata (2019). Estimation of weak factor models. ISER DP (1053).

Wang, W. and J. Fan (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Annals of Statistics 45(3), 1342.

Yang, F. (2013). Investment shocks and the commodity basis spread. Journal of Financial Economics 110(1), 164–184.
A Technical Lemmas
We begin with a Borel-Cantelli type result, whose proof can be found in Barigozzi and Trapani
(2018).
Lemma A.1. Consider a multi-index random variable $U_{i_1,\ldots,i_h}$, with $1 \le i_1 \le S_1$, $1 \le i_2 \le S_2$, and so on. Assume that

\sum_{S_1} \cdots \sum_{S_h} \frac{1}{S_1 \cdots S_h} \, P\left( \max_{1 \le i_1 \le S_1, \ldots, 1 \le i_h \le S_h} \left| U_{i_1,\ldots,i_h} \right| > \varepsilon L_{S_1,\ldots,S_h} \right) < \infty, \qquad (A.1)

for some $\varepsilon > 0$ and a sequence $L_{S_1,\ldots,S_h}$ defined as

L_{S_1,\ldots,S_h} = S_1^{d_1} \cdots S_h^{d_h} \, l_1(S_1) \cdots l_h(S_h),

where $d_1, \ldots, d_h$ are non-negative numbers and $l_1(\cdot), \ldots, l_h(\cdot)$ are slowly varying functions in the sense of Karamata. Then it holds that

\limsup_{(S_1,\ldots,S_h) \to \infty} \frac{\left| U_{S_1,\ldots,S_h} \right|}{L_{S_1,\ldots,S_h}} = 0 \quad \text{a.s.} \qquad (A.2)
Henceforth, we will extensively use the following notation:

v_{N,T}(\varepsilon) = (\ln N)^{2+\varepsilon} (\ln T)^{2+\varepsilon}, \qquad (A.3)

where $\varepsilon > 0$. Recall (4.3)

S\left( B^P, U^P, \theta \right) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ R_{i,t} - \left( \beta_{D,i}^{P_D \prime} u_{D,t}^{P_D} d_{D,t}(\theta) + \beta_{S,i}^{P_S \prime} u_{S,t}^{P_S} d_{S,t}(\theta) \right) \right]^2,

which is the least squares loss function calculated at the triplet $\left( B^P, U^P, \theta \right)$; similarly, we define

S_{UB}(\theta) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ R_{i,t} - \left( \hat{\beta}_{D,i}^{P_D^0 \prime} \hat{u}_{D,t}^{P_D^0} d_{D,t}(\theta) + \hat{\beta}_{S,i}^{P_S^0 \prime} \hat{u}_{S,t}^{P_S^0} d_{S,t}(\theta) \right) \right]^2, \qquad (A.4)

which denotes the concentrated version of $S\left( B^P, U^P, \theta \right)$ when using the correct numbers of factors $P_D^0$ and $P_S^0$. Finally, when using $P_D$ and $P_S$ factors, with $P_D^0 \le P_D \le P_{\max}$ and $P_S^0 \le P_S \le P_{\max}$, we write

S_{UB}^{(P)}(\theta) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ R_{i,t} - \left( \hat{\beta}_{D,i}^{P_D \prime} \hat{u}_{D,t}^{P_D} d_{D,t}(\theta) + \hat{\beta}_{S,i}^{P_S \prime} \hat{u}_{S,t}^{P_S} d_{S,t}(\theta) \right) \right]^2, \qquad (A.5)

to define the concentrated version of $S\left( B^P, U^P, \theta \right)$ in that case. We will also use the $P_m \times P_j$ projection matrices

H_{jm}^{P_j,m}(\theta) = \frac{U_{m,m}^{0}\left(\theta^0\right) U_{m,j}^{0}(\theta)^{\prime}}{T} \, \frac{B_m^{0\prime} \hat{B}_j^{P_j}(\theta)}{N} \left( \hat{V}_j^{P_j}(\theta) \right)^{-1}, \qquad (A.6)

where $j, m = D, S$; $U_{m,j}^{0}(\theta)$ is defined as the $P_m \times T$ matrix of regime-specific factors, $U_m(\theta)$, multiplied by a $T \times T$ diagonal matrix whose entries are $d_{j,t}(\theta)$, $1 \le t \le T$; finally, $\hat{V}_j^{P_j}(\theta)$ is a $P_j \times P_j$ matrix containing the largest $P_j$ eigenvalues of

\frac{1}{NT} \sum_{t=1}^{T} R_t R_t^{\prime} \, d_{j,t}(\theta).

Note that, for $j \neq m$, $H_{jm}^{P_j,m}\left(\theta^0\right)$ reduces to a matrix of zeros. In order to make the notation lighter, the superscript $P_j,m$ is set equal to $P_j$ when $j = m$.

We now present a few lemmas to show the strong consistency of $\hat{\theta}$.
Lemma A.2. We assume that Assumptions 1-4 are satisfied. Then it holds that

\frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) - H_{jj}^{P_j}\left(\theta^0\right)^{\prime} \beta_{j,i}^{0} \right\|^2 = o_{a.s.}\left( P_j \, C_{N,T}^{-2} \, v_{N,T}(\varepsilon) \right),

for every $\varepsilon > 0$ and $j = D, S$.
Proof. This follows by adapting the passages in the proof of Theorem 3.2 in Massacci (2017). We have

\frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) - H_{jj}^{P_j}\left(\theta^0\right)^{\prime} \beta_{j,i}^{0} \right\|^2 \qquad (A.7)

\le \left[ \frac{4}{N^2} \sum_{i=1}^{N} \sum_{l=1}^{N} \sigma_{jil}^2\left(\theta^0\right) + \frac{1}{N} \left( \frac{1}{N^2} \sum_{l=1}^{N} \sum_{q=1}^{N} \left| \sum_{i=1}^{N} \tilde{\sigma}_{jil}\left(\theta^0\right) \tilde{\sigma}_{jiq}\left(\theta^0\right) \right|^2 \right)^{1/2} + \frac{2}{N} \sum_{i=1}^{N} \left( \left\| \beta_{D,i}^{0} \right\|^2 + \left\| \beta_{S,i}^{0} \right\|^2 \right) \frac{1}{NT^2} \sum_{l=1}^{N} \left\| \sum_{t=1}^{T} d_{j,t}\left(\theta^0\right) u_{j,t}^{0} e_{l,t} \right\|^2 \right] \times \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) \right\|^2 \right) = A_{N,T} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) \right\|^2 \right),

having defined

\sigma_{jil}\left(\theta^0\right) = \frac{1}{T} \sum_{t=1}^{T} E\left( d_{j,t}\left(\theta^0\right) e_{i,t} e_{l,t} \right), \qquad \tilde{\sigma}_{jil}\left(\theta^0\right) = \frac{1}{T} \sum_{t=1}^{T} d_{j,t}\left(\theta^0\right) e_{i,t} e_{l,t} - \sigma_{jil}\left(\theta^0\right).

By the proof of Theorem 3.2 in Massacci (2017), it follows immediately that $E\left| A_{N,T} \right| \le c_0 C_{N,T}^{-2}$. Thus, by a similar logic as in the previous passages, it follows that $A_{N,T} = o_{a.s.}\left( C_{N,T}^{-2} (\ln N)^{2+\varepsilon} (\ln T)^{2+\varepsilon} \right)$ for every $\varepsilon > 0$. Also,

\frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) \right\|^2 = P_j,

by construction, since $\hat{B}_j^{P_j}\left(\theta^0\right)^{\prime} \hat{B}_j^{P_j}\left(\theta^0\right) = N I_{P_j}$. The desired result follows by putting everything together.
Lemma A.3. We assume that Assumptions 1-5 are satisfied. Then it holds that

\left| \hat{\theta} - \theta^0 \right| = o_{a.s.}(1),

for all $P_j^0 \le P_j \le P_{\max}$, $j = D, S$.
Proof. The lemma is a generalisation of Lemma A.9 in Massacci (2017). In particular, the desired result follows if we show that

S_{UB}^{(P)}(\theta) - S_{UB}^{(P^0)}\left(\theta^0\right) = c_0 + o_{a.s.}(1), \qquad (A.8)

with $c_0 > 0$, for every $\theta \neq \theta^0$; note that $S_{UB}^{(P)}(\theta)$ is defined in (A.5). We begin by considering the identity

S_{UB}^{(P)}(\theta) - S_{UB}^{(P)}\left(\theta^0\right) = \left[ S_{UB}^{(P)}(\theta) - S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta) + B_S^0 H_{SD}^{P_S,D}(\theta),\; B_D^0 H_{DS}^{P_D,S}(\theta) + B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) \right]
+ \left[ S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta) + B_S^0 H_{SD}^{P_S,D}(\theta),\; B_D^0 H_{DS}^{P_D,S}(\theta) + B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) - S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta),\; B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) \right]
+ \left[ S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta),\; B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) - S_{UB}^{(P)}\left(\theta^0\right) \right],

where $S_{B}^{(P)}$ denotes the loss function $S\left( B^P, U^P, \theta \right)$ concentrated at $B^P$. By construction, it holds that

S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta),\; B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) = S_{B}^{(P)}\left( B_D^0,\; B_S^0,\; \theta^0 \right),

with

S_{B}^{(P)}\left( B_D^0,\; B_S^0,\; \theta^0 \right) - S_{UB}^{(P)}\left(\theta^0\right) > 0.
Following the proof of Lemma A.2 and Theorem 3.3 in Massacci (2017), it can be shown by
standard arguments that
E∣∣∣S(P )
UB (θ)−S(P )B
(B0
D H PD
DD(θ)+B0
S HPS ,D
S D(θ) ,B0
D HPD,S
DS(θ)+B0
S H PS
S S(θ) ,θ0
)∣∣∣2 ≤ c0C−2N ,T ;
then, by the maximal inequality for multi-parameter processes in Moricz (1983), it follows that
E max1≤t≤T,1≤i≤N
∣∣∣S(P )UB,i t (θ)−S(P )
B,i t
(B0
D H PD
DD(θ)+B0
S HPS ,D
S D(θ) ,B0
1HPD,S
DS(θ)+B0
S H PS
S S(θ) ,θ0
)∣∣∣2
≤ c0C−2N ,T (ln N ) (lnT ) ,
where
S(P )UB,i t (θ) = 1
N T
i∑i ′=1
t∑t ′=1
[Ri ,t −
(β′
D,i uD,t dD,t (θ)+ β′S ,i uS ,t dS ,t (θ)
)]2.
Hence, Lemma A.1 yields
S(P )UB (θ)−S(P )
B
(B0
D H PD
DD(θ)+B0
S HPS ,D
S D(θ) ,B0
D HPD,S
DS(θ)+B0
S H PS
S S(θ) ,θ0
)= oa.s.
(C−1
N ,T (ln N )1+ε (lnT )1+ε) .
Similarly, by using Lemma A.3 in Massacci (2017) and the same arguments as above, it follows that there exists a $c_{0}>0$ such that
$$S^{(P)}_{B}\left(B^{0}_{D}H^{P_{D}}_{DD}(\theta)+B^{0}_{S}H^{P_{S},D}_{SD}(\theta),\,B^{0}_{D}H^{P_{D},S}_{DS}(\theta)+B^{0}_{S}H^{P_{S}}_{SS}(\theta),\,\theta^{0}\right)-S^{(P)}_{B}\left(B^{0}_{D},B^{0}_{S},\theta^{0}\right)=c_{0}+o_{a.s.}(1).$$
Putting everything together, we have
$$S^{(P)}_{UB}(\theta)-S^{(P)}_{UB}(\theta^{0})=c_{0}+o_{a.s.}(1).$$
Then (A.8) follows if we show that
$$S^{(P)}_{UB}(\theta^{0})-S^{(P^{0})}_{UB}(\theta^{0})=o_{a.s.}(1). \tag{A.9}$$
It holds that
$$\left|S^{(P)}_{UB}(\theta^{0})-S^{(P^{0})}_{UB}(\theta^{0})\right|\le\left|S^{1}_{UB}(\theta^{0})\right|+\left|S^{2}_{UB}(\theta^{0})\right|,$$
where
$$S^{1}_{UB}(\theta^{0})=\frac{2}{NT}\sum_{t=1}^{T}\left(d_{D,t}(\theta^{0})u^{0\prime}_{D,t}H^{P_{D}+}_{DD}(\theta^{0})'\left(\hat B^{P_{D}}_{D}(\theta^{0})-B^{0}_{D}H^{P_{D}}_{DD}(\theta^{0})\right)'+d_{S,t}(\theta^{0})u^{0\prime}_{S,t}H^{P_{S}+}_{SS}(\theta^{0})'\left(\hat B^{P_{S}}_{S}(\theta^{0})-B^{0}_{S}H^{P_{S}}_{SS}(\theta^{0})\right)'\right)e_{t},$$
$$
\begin{aligned}
S^{2}_{UB}(\theta^{0})=\frac{1}{NT}\sum_{t=1}^{T}\Big(&d_{D,t}(\theta^{0})u^{0\prime}_{D,t}H^{P_{D}+}_{DD}(\theta^{0})'\left(\hat B^{P_{D}}_{D}(\theta^{0})-B^{0}_{D}H^{P_{D}}_{DD}(\theta^{0})\right)'\left(\hat B^{P_{D}}_{D}(\theta^{0})-B^{0}_{D}H^{P_{D}}_{DD}(\theta^{0})\right)H^{P_{D}+}_{DD}(\theta^{0})u^{0}_{D,t}\\
+\,&d_{S,t}(\theta^{0})u^{0\prime}_{S,t}H^{P_{S}+}_{SS}(\theta^{0})'\left(\hat B^{P_{S}}_{S}(\theta^{0})-B^{0}_{S}H^{P_{S}}_{SS}(\theta^{0})\right)'\left(\hat B^{P_{S}}_{S}(\theta^{0})-B^{0}_{S}H^{P_{S}}_{SS}(\theta^{0})\right)H^{P_{S}+}_{SS}(\theta^{0})u^{0}_{S,t}\Big).
\end{aligned}
$$
We now estimate the order of magnitude of $S^{1}_{UB}(\theta^{0})$ and $S^{2}_{UB}(\theta^{0})$. We have
$$
\begin{aligned}
\left|S^{1}_{UB}(\theta^{0})\right|\le\;&2T^{-1/2}P^{0}_{D}\left\|H^{P_{D}+}_{DD}(\theta^{0})\right\|\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{D}}_{D,i}(\theta^{0})-H^{P_{D}}_{DD}(\theta^{0})'\beta^{0}_{D,i}\right\|^{2}\right)^{1/2}\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}d_{D,t}(\theta^{0})u^{0}_{D,t}e_{i,t}\right\|^{2}\right)^{1/2}\\
+\;&2T^{-1/2}P^{0}_{S}\left\|H^{P_{S}+}_{SS}(\theta^{0})\right\|\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{S}}_{S,i}(\theta^{0})-H^{P_{S}}_{SS}(\theta^{0})'\beta^{0}_{S,i}\right\|^{2}\right)^{1/2}\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}d_{S,t}(\theta^{0})u^{0}_{S,t}e_{i,t}\right\|^{2}\right)^{1/2}.
\end{aligned}
$$
By construction, note that
$$\left\|H^{P_{j}+}_{jj}(\theta^{0})\right\|\le c_{0}P^{1/2}_{j} \tag{A.10}$$
for $j=D,S$. Also, Assumption 4(v) entails, by the same logic as in the previous passages, that
$$\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}d_{j,t}(\theta^{0})u^{0}_{j,t}e_{i,t}\right\|^{2}=o_{a.s.}\left(v_{N,T}(\varepsilon)\right),$$
for every $\varepsilon>0$ and $j=D,S$. Thus, using Lemma A.2, we finally have
$$\left|S^{1}_{UB}(\theta^{0})\right|=o_{a.s.}\left(PC^{-1}_{N,T}T^{-1/2}v_{N,T}(\varepsilon)\right),$$
having set $P=\max\{P_{D},P_{S}\}$; this term is therefore $o_{a.s.}(1)$ by (4.22). Similarly, it holds that (see Massacci (2017))
$$
\begin{aligned}
\left|S^{2}_{UB}(\theta^{0})\right|\le\;&\frac{1}{T}\sum_{t=1}^{T}\left\|u^{0}_{D,t}\right\|^{2}\left\|H^{P_{D}+}_{DD}(\theta^{0})\right\|^{2}\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{D}}_{D,i}(\theta^{0})-H^{P_{D}}_{DD}(\theta^{0})'\beta^{0}_{D,i}\right\|^{2}\\
+\;&\frac{1}{T}\sum_{t=1}^{T}\left\|u^{0}_{S,t}\right\|^{2}\left\|H^{P_{S}+}_{SS}(\theta^{0})\right\|^{2}\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{S}}_{S,i}(\theta^{0})-H^{P_{S}}_{SS}(\theta^{0})'\beta^{0}_{S,i}\right\|^{2}.
\end{aligned}
$$
Using (A.10) and Lemma A.2, and noting that Assumptions 3(i) and 5(i) entail
$$\frac{1}{T}\sum_{t=1}^{T}\left\|u^{0}_{j,t}\right\|^{2}=O_{a.s.}(1),$$
for $j=D,S$, we have
$$\left|S^{2}_{UB}(\theta^{0})\right|=o_{a.s.}\left(P^{2}C^{-2}_{N,T}v_{N,T}(\varepsilon)\right),$$
which again is $o_{a.s.}(1)$ by (4.22). Thus, (A.9) follows. Putting all together, the desired result obtains.
In the next two lemmas, we define the set
$$B_{N,T}=\left(\frac{v_{N,T}(\varepsilon)}{N^{\eta}T},\,c_{0}\right), \tag{A.11}$$
where $0<c_{0}<\infty$ and $v_{N,T}$ is defined in (A.3), and the functions
$$\omega^{0}(\eta,\theta)=\frac{1}{N^{\eta}T}\sum_{i=1}^{N}\sum_{t=1}^{T}\left|d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right|\left(\delta^{0\prime}_{i}u^{0}_{t}\right)^{2}, \tag{A.12}$$
$$h^{0}(\eta,\theta)=\frac{1}{N^{\eta}T}\sum_{i=1}^{N}\sum_{t=1}^{T}d_{S,t}(\theta)\delta^{0\prime}_{i}u^{0}_{t}e_{i,t}, \tag{A.13}$$
where $u^{0}_{t}$ and $\delta^{0}_{i}$ are defined in (4.1) and (4.2) respectively.
Lemma A.4. We assume that Assumptions 3 and 5 are satisfied. Then it holds that
$$\frac{\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}\ge c_{0}+o_{a.s.}(1),$$
for $\left|\theta-\theta^{0}\right|\in B_{N,T}$ and $0<c_{0}<\infty$.
Proof. Define
$$\omega^{0}_{N,T}=\omega^{0}(\eta,\theta)-\mathbb{E}\,\omega^{0}(\eta,\theta).$$
By equation (35) in Massacci (2017) we have $\mathbb{E}\left|\omega^{0}_{N,T}\right|^{2}\le c_{0}N^{-\eta}T^{-1}\left|\theta-\theta^{0}\right|$, and thus, by the maximal inequality for rectangular sums (Moricz (1983)), the Markov inequality and Lemma A.1, it follows that there exist two random variables $N_{0}$, $T_{0}$ such that for $N\ge N_{0}$ and $T\ge T_{0}$ and every $0<\varepsilon'<\frac{\varepsilon}{2}$, with $\varepsilon$ defined in (A.3),
$$\omega^{0}_{N,T}\le c_{0}N^{-\eta/2}T^{-1/2}\left|\theta-\theta^{0}\right|^{1/2}(\ln N)^{1+\varepsilon'}(\ln T)^{1+\varepsilon'}. \tag{A.14}$$
Consider now
$$\frac{\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}=\frac{\mathbb{E}\,\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}+\frac{\omega^{0}_{N,T}}{\left|\theta-\theta^{0}\right|}.$$
Equation (33) in Massacci (2017) stipulates that
$$\inf_{\left|\theta-\theta^{0}\right|\le c_{0}}\mathbb{E}\,\omega^{0}(\eta,\theta)>c_{0}\left|\theta-\theta^{0}\right|. \tag{A.15}$$
Also, using (A.14) and recalling that, by the definition of $B_{N,T}$, $\left|\theta-\theta^{0}\right|\ge\frac{v_{N,T}(\varepsilon)}{N^{\eta}T}$, it follows that there exist two random variables $N_{0}$, $T_{0}$ such that for $N\ge N_{0}$ and $T\ge T_{0}$
$$\frac{\omega^{0}_{N,T}}{\left|\theta-\theta^{0}\right|}\le c_{0}\frac{(\ln N)^{1+\varepsilon'}(\ln T)^{1+\varepsilon'}}{v^{1/2}_{N,T}(\varepsilon)}=o_{a.s.}(1). \tag{A.16}$$
Combining (A.15) with (A.16), we obtain the desired result.
Lemma A.5. We assume that Assumptions 3-5 are satisfied. Then there exist two random variables $N_{0}$, $T_{0}$ and a constant $0<c_{0}<\infty$ such that for $N\ge N_{0}$ and $T\ge T_{0}$
$$\frac{\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|}{\left|\theta-\theta^{0}\right|}\le c_{0}v^{-1}_{N,T}(\varepsilon),$$
for $\left|\theta-\theta^{0}\right|\in B_{N,T}$.
Proof. Consider
$$N^{\eta}T\left(h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right)=\sum_{i=1}^{N}\sum_{t=1}^{T}\left(d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right)\delta^{0\prime}_{i}u^{0}_{t}e_{i,t}=\sum_{i=1}^{N^{\eta}}\sum_{t=1}^{T}\left(d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right)\delta^{0\prime}_{i}u^{0}_{t}e_{i,t}.$$
By Assumptions 3(i), 3(iii), 4(i) and 5(i), and by repeated use of the Cauchy-Schwarz inequality, it follows immediately that
$$\mathbb{E}\left|N^{\eta}T\left(h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right)\right|\le c_{0}N^{\eta}T\max_{1\le i\le N}\left\|\delta^{0}_{i}\right\|^{2}\left(\mathbb{E}\left|d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right|^{2}\right)^{1/2}\left(\mathbb{E}\left\|u^{0}_{t}\right\|^{4}\max_{1\le i\le N}\mathbb{E}\left\|e_{i,t}\right\|^{4}\right)^{1/4}\le c_{0}N^{\eta}T\left|\theta-\theta^{0}\right|,$$
so that by the same logic as above we have
$$\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|=o_{a.s.}\left(N^{-\eta/2}T^{-1/2}\left|\theta-\theta^{0}\right|^{1/2}(\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}\right).$$
Thus, recalling that $\left|\theta-\theta^{0}\right|\in B_{N,T}$,
$$\frac{\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|}{\left|\theta-\theta^{0}\right|}\le c_{0}\frac{N^{\eta/2}T^{1/2}}{v^{1/2}_{N,T}(\varepsilon)}N^{-\eta/2}T^{-1/2}(\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}=o_{a.s.}(1).$$
Lemma A.6. We assume that Assumptions 1-4 are satisfied. Then it holds that, for every $P_{j}\ge P^{0}_{j}$,
$$\left\|\hat\beta^{P_{j}\prime}_{j,i}\hat u_{j,t}-\beta^{0\prime}_{j,i}u^{0}_{j,t}\right\|=o_{a.s.}(1), \tag{A.17}$$
for $j=D,S$ and all $1\le i\le N$, $1\le t\le T$. Also, $S\left(B^{P},U^{P},\theta\right)$ is continuous in $(B,U)$.
Proof. We omit rotation matrices for notational simplicity. By using the results in the proof of Lemma A.8 in Massacci (2017), it can be shown that $\mathbb{E}\left\|\hat\beta^{P_{j}}_{j,i}-H^{P_{j}}_{jj}(\theta^{0})\beta^{0}_{j,i}\right\|^{2}\le c_{0}C^{-2}_{N,T}$. Thus, using the same logic as above, Lemma A.1 entails that, for every $(i,j,t)$, $\left\|\hat\beta^{P_{j}}_{j,i}-H^{P_{j}}_{jj}(\theta^{0})\beta^{0}_{j,i}\right\|=o_{a.s.}\left(C^{-1}_{N,T}(\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}\right)$. Similarly, by Corollary 3.1 in Massacci (2017), it can be shown that $\left\|\hat u_{j,t}-u_{j,t}\right\|=o_{a.s.}\left((\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}\right)$. Equation (A.17) now follows immediately. Further, as a consequence we can assume that
$$\lim_{N,T\to\infty}\hat\beta^{P_{j}\prime}_{j,i}\hat u_{j,t}=\beta^{0\prime}_{j,i}u^{0}_{j,t}. \tag{A.18}$$
Note now that, by Taylor's expansion, there exists an $S'$ such that
$$S\left(B^{P},U^{P},\theta\right)=S\left(B^{0},U^{0},\theta\right)+S'\left(B^{P}U^{P}-B^{0}U^{0}\right).$$
Given that, by (A.18), we can assume $\left|S'\right|<\infty$, we immediately get $\lim_{N,T\to\infty}S\left(B^{P},U^{P},\theta\right)=S\left(B^{0},U^{0},\theta\right)$, which proves the last statement of the lemma.
We are now ready to derive a strong rate for $\hat\theta$.
Lemma A.7. We assume that Assumptions 1-5 are satisfied. Then it holds that
$$\hat\theta-\theta^{0}=o_{a.s.}\left(N^{-\eta}T^{-1}v_{N,T}(\varepsilon)\right),$$
for every $\varepsilon>0$.
Proof. We know from Lemma A.3 that $\left|\hat\theta-\theta^{0}\right|>c_{0}$ is an event which happens with probability zero, and we can therefore focus only on the case $\left|\hat\theta-\theta^{0}\right|<c_{0}$. We now show, by contradiction, that $\left|\hat\theta-\theta^{0}\right|\notin B_{N,T}$. It is easy to see that, as $(N,T)\to\infty$,
$$\frac{S\left(B^{P},U^{P},\theta\right)-S\left(B^{P},U^{P},\theta^{0}\right)}{\left|\theta-\theta^{0}\right|}\ge\inf_{B_{N,T}}\frac{\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}-2\sup_{B_{N,T}}\frac{\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|}{\left|\theta-\theta^{0}\right|}>0\quad\text{a.s.},$$
by using Lemmas A.4 and A.5. However, by extending the proof of Theorem 3.3 in Massacci (2017), it can be shown that $S\left(B^{P},U^{P},\hat\theta\right)-S\left(B^{P},U^{P},\theta^{0}\right)\le0$ a.s., which is a contradiction. Thus, $\left\{\left|\hat\theta-\theta^{0}\right|\in B_{N,T}\right\}$ has measure zero, which entails that there exist two random variables $N_{0}$, $T_{0}$ and a constant $0<c_{0}<\infty$ such that for $N\ge N_{0}$ and $T\ge T_{0}$
$$\left|\hat\theta-\theta^{0}\right|\le c_{0}N^{-\eta}T^{-1}v_{N,T}(\varepsilon),$$
which proves the lemma.
Lemma A.8. We assume that Assumptions 1-5 are satisfied. Then it holds that
$$\frac{\hat T_{j}-T_{j}}{T_{j}}=o_{a.s.}\left(N^{-\eta}T^{-1}_{j}v_{N,T}(\varepsilon)\right),$$
for $j=D,S$ and every $\varepsilon>0$.
Proof. We show the lemma for $j=D$. Recall that $T_{D}=\sum_{t=1}^{T}d_{D,t}(\theta^{0})$ and $\hat T_{D}=\sum_{t=1}^{T}\hat d_{D,t}$. It holds that
$$\left|T_{D}-\hat T_{D}\right|\le\sum_{t=1}^{T}\left|d_{D,t}(\theta^{0})-\hat d_{D,t}\right|.$$
Denoting the density of $z_{t}$ by $f_{t}(z)=f(z)$, Assumption 5(iii) entails
$$\left|\hat d_{D,t}-d_{D,t}(\theta^{0})\right|=\left|\int_{\hat\theta}^{\theta^{0}}f(z)\,dz\right|\le c_{0}\left|\hat\theta-\theta^{0}\right|, \tag{A.19}$$
so that
$$\left|T_{D}-\hat T_{D}\right|\le c_{0}T\left|\hat\theta-\theta^{0}\right|.$$
The desired result follows immediately from Lemma A.7.
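The counting bound $|T_{D}-\hat T_{D}|\le c_{0}T|\hat\theta-\theta^{0}|$ used in the proof of Lemma A.8 is easy to check numerically. The sketch below is purely illustrative: the Gaussian threshold variable, the sample size, and the constant $c_{0}=\sup_{z}f(z)$ are simulation assumptions, not part of the paper's design.

```python
import numpy as np

# Numerical illustration of the regime-count bound in Lemma A.8.
rng = np.random.default_rng(0)
T = 100_000
z = rng.normal(size=T)           # threshold variable z_t; density bounded by 1/sqrt(2*pi)
theta0, theta_hat = -0.5, -0.45  # true and (hypothetical) estimated thresholds

T_D = int(np.sum(z <= theta0))        # T_D = sum_t d_{D,t}(theta0)
T_D_hat = int(np.sum(z <= theta_hat)) # hat T_D = sum_t hat d_{D,t}

# |T_D - hat T_D| counts draws falling between the two thresholds; its mean is
# T * |Phi(theta_hat) - Phi(theta0)| <= c0 * T * |theta_hat - theta0|,
# with c0 = sup_z f(z) = 1/sqrt(2*pi) for the standard normal density.
c0 = 1.0 / np.sqrt(2.0 * np.pi)
bound = c0 * T * abs(theta_hat - theta0)
print(abs(T_D - T_D_hat), bound)
```

The realised count sits well below the bound, since the bound replaces the density between the two thresholds by its supremum.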
Lemma A.9. We assume that Assumptions 3-5 are satisfied. Then it holds that
$$\mathbb{E}\left|R_{i,t}\right|^{4+\varepsilon}<\infty \tag{A.20}$$
for some $\varepsilon>0$, and
$$\mathbb{E}\max_{1\le t\le T}\left|\sum_{s=1}^{t}R_{h,s}R_{l,s}d_{D,s}(\theta^{0})-\mathbb{E}\left[R_{h,s}R_{l,s}d_{D,s}(\theta^{0})\right]\right|^{2}\le c_{1}T, \tag{A.21}$$
$$\mathbb{E}\max_{1\le t\le T}\left|\sum_{s=1}^{t}R_{h,s}R_{l,s}d_{S,s}(\theta^{0})-\mathbb{E}\left[R_{h,s}R_{l,s}d_{S,s}(\theta^{0})\right]\right|^{2}\le c_{2}T, \tag{A.22}$$
for all $1\le h,l\le N$.
Proof. Equation (A.20) is an immediate consequence of Assumptions 3(i) and 4(i). We show only (A.21); (A.22) follows from exactly the same arguments. To begin with, note that Assumption 5(i) entails that $R_{h,s}R_{l,s}d_{D,s}(\theta^{0})$ is also a strictly stationary, $\rho$-mixing sequence with mixing numbers $\rho_{m}$ such that $\sum_{m=1}^{\infty}\rho^{1/2}_{m}<\infty$; this is a consequence of Theorem 14.1 in Davidson (1994), which is stated for the cases of strong and uniform mixing but can be readily extended to $\rho$-mixing. Using (A.20), it follows that
$$\mathbb{E}\left|R_{h,s}R_{l,s}d_{D,s}(\theta^{0})\right|^{2}\le\left(\mathbb{E}\left|R_{h,s}\right|^{4}\right)^{1/2}\left(\mathbb{E}\left|R_{l,s}\right|^{4}\right)^{1/2}<\infty.$$
Note also that
$$\sum_{m=1}^{\infty}\rho^{1/2}_{m}<\infty\;\Rightarrow\;\sum_{m=1}^{\infty}\frac{\rho_{m}}{m}<\infty\;\Rightarrow\;\sum_{m=1}^{\infty}\rho\left(2^{m}\right)<\infty.$$
Hence, Corollary 1.1 in Shao (1995) and the strict stationarity of $R_{h,s}R_{l,s}d_{D,s}(\theta^{0})$ immediately yield (A.21).
Lemma A.10. We assume that Assumptions 1-6 are satisfied. Then it holds that
$$\limsup_{\min(N,T)\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}\hat g^{(h)}_{j}=\bar g_{j}<\infty,$$
$$\liminf_{\min(N,T)\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}\hat g^{(h)}_{j}=\underline{g}_{j}>0,$$
for $j=D,S$ and every $0\le p\le P_{\max}$.
Proof. Note that
$$\sum_{h=p+1}^{N}\hat g^{(h)}_{j}-\sum_{h=p+1}^{N}g^{(h)}_{j}=\sum_{h=1}^{N}\hat g^{(h)}_{j}-\sum_{h=1}^{N}g^{(h)}_{j}-\sum_{h=1}^{p}\hat g^{(h)}_{j}+\sum_{h=1}^{p}g^{(h)}_{j}=\operatorname{tr}\left(\hat\Sigma_{j}\right)-\operatorname{tr}\left(\Sigma_{j}\right)-\sum_{h=1}^{p}\left(\hat g^{(h)}_{j}-g^{(h)}_{j}\right).$$
Using Lemma 1, it is easy to see that
$$\sum_{h=1}^{p}\left(\hat g^{(h)}_{j}-g^{(h)}_{j}\right)=o_{a.s.}\left(\frac{pNT^{1/2}}{T\pi_{j}}(\ln N)^{1+\varepsilon}(\ln T)^{1/2+\varepsilon}\right). \tag{A.23}$$
Also, the same passages as in the proof of Lemma A.1 in Trapani (2018) yield
$$\operatorname{tr}\left(\hat\Sigma_{j}\right)-\operatorname{tr}\left(\Sigma_{j}\right)=o_{a.s.}\left(\frac{N(\ln N\ln T)^{\frac{1+\varepsilon}{2}}}{T^{1/2}\pi_{j}}\right). \tag{A.24}$$
Thus, recalling that $P_{\max}=O\left(\min\left\{T^{1/2-c},N^{1/2-c}\right\}\right)$, it follows that
$$\frac{1}{N-p}\sum_{h=p+1}^{N}\hat g^{(h)}_{j}=\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}+o_{a.s.}\left(\frac{(\ln N\ln T)^{\frac{1+\varepsilon}{2}}}{T^{1/2}\pi_{j}}\right)=\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}+o_{a.s.}(1),$$
under Assumption 6.
Let now $g^{u,(h)}_{j}$ and $g^{e,(h)}_{j}$ denote the $h$-th largest eigenvalues of $\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(B^{0}_{j}u^{0}_{j,t}u^{0\prime}_{j,t}B^{0\prime}_{j}d_{j,t}(\theta^{0})\right)$ and $\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(e_{t}e'_{t}d_{j,t}(\theta^{0})\right)$ respectively. By Weyl's inequality
$$g^{e,(N)}_{j}+g^{u,(h)}_{j}\le g^{(h)}_{j}\le g^{u,(h)}_{j}+g^{e,(1)}_{j},$$
so that
$$g^{e,(N)}_{j}+\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}\le\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}\le\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}+g^{e,(1)}_{j}.$$
By Assumption 2(i)(b), it holds that
$$g^{e,(N)}_{j}+\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}>0;$$
also, Assumption 2(i)(a) and (B.1) entail
$$\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}+g^{e,(1)}_{j}<\infty.$$
Hence, as $N\to\infty$,
$$0<\liminf_{N\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}\le\limsup_{N\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}<\infty.$$
Putting all together, the lemma follows.
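The additive Weyl bounds invoked in the proof of Lemma A.10 can be verified numerically. The snippet below is a minimal sanity check on arbitrary simulated symmetric matrices (the dimension and the draws are illustrative assumptions, not the paper's covariance structure).

```python
import numpy as np

# Check of the Weyl bounds: for symmetric A (signal) and E (noise),
# g^(h)(A) + g^(N)(E) <= g^(h)(A + E) <= g^(h)(A) + g^(1)(E),
# where g^(h) denotes the h-th largest eigenvalue.
rng = np.random.default_rng(1)
N = 30
A = rng.normal(size=(N, N)); A = A @ A.T        # PSD "common component" matrix
E = rng.normal(size=(N, N)); E = (E + E.T) / 2  # symmetric "error" matrix

def eigs_desc(M):
    """Eigenvalues sorted in decreasing order: g^(1) >= ... >= g^(N)."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

gA, gE, gS = eigs_desc(A), eigs_desc(E), eigs_desc(A + E)
lower = gA + gE[-1]  # g^(h)(A) + smallest eigenvalue of E
upper = gA + gE[0]   # g^(h)(A) + largest eigenvalue of E
ok = bool(np.all(lower <= gS + 1e-9) and np.all(gS <= upper + 1e-9))
print(ok)
```

Averaging these bounds over $h=p+1,\ldots,N$ gives exactly the sandwich used in the proof.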
Lemma A.11. We assume that Assumptions 1-5 are satisfied. Then it holds that, for $j=D,S$,
$$\mathbb{E}\left\|T^{-1}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le c_{0}T^{-1}. \tag{A.25}$$
Further, under Assumption 9(ii), as $T\to\infty$,
$$T^{-1/2}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\xrightarrow{D}N\left(0,V_{u,j}\right), \tag{A.26}$$
where $V_{u,j}=\lim_{T\to\infty}\mathbb{E}\left(T^{-1}\sum_{t,s=1}^{T}u^{0}_{j,t}u^{0\prime}_{j,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)$.
Proof. We begin by showing that $\mathbf{f}^{0}_{j,t}d_{j,t}(\theta^{0})$ is a $\rho$-mixing sequence with mixing numbers $\bar\rho_{j,m}$ such that $\sum_{m=1}^{\infty}\bar\rho^{1/2}_{j,m}<\infty$. Note that $\mathbf{f}^{0}_{j,t}d_{j,t}(\theta^{0})$ is a measurable transformation of $\left\{\mathbf{f}^{0}_{j,t},z_{t},\varepsilon_{t}\right\}$. Let now $\Im^{t}_{-\infty}$ and $\Im^{\infty}_{t+m}$ be the $\sigma$-fields generated by $\left\{\mathbf{f}^{0}_{j,t'},z_{t'},\varepsilon_{t'}\right\}^{t}_{t'=-\infty}$ and $\left\{\mathbf{f}^{0}_{j,t'},z_{t'},\varepsilon_{t'}\right\}^{\infty}_{t'=t+m}$ respectively, and similarly let $\bar\Im^{t}_{-\infty}$ and $\bar\Im^{\infty}_{t+m}$ be the $\sigma$-fields generated by $\left\{\mathbf{f}^{0}_{j,t'}d_{j,t'}(\theta^{0})\right\}^{t}_{t'=-\infty}$ and $\left\{\mathbf{f}^{0}_{j,t'}d_{j,t'}(\theta^{0})\right\}^{\infty}_{t'=t+m}$ respectively. By measurability, $\bar\Im^{t}_{-\infty}\subseteq\Im^{t}_{-\infty}$ and $\bar\Im^{\infty}_{t+m}\subseteq\Im^{\infty}_{t+m}$; hence, $\bar\rho_{j,m}=\bar\rho_{j,m}\left(\bar\Im^{t}_{-\infty},\bar\Im^{\infty}_{t+m}\right)\le\rho_{j,m}=\rho_{j,m}\left(\Im^{t}_{-\infty},\Im^{\infty}_{t+m}\right)$. This therefore entails that $\sum_{m=1}^{\infty}\bar\rho^{1/2}_{j,m}\le\sum_{m=1}^{\infty}\rho^{1/2}_{j,m}<\infty$ by Assumption 5(i), which in turn entails that $\sum_{k=1}^{\infty}\bar\rho_{j}\left(2^{k}\right)<\infty$.
Note now that $\mathbb{E}\left\|u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le\mathbb{E}\left\|u^{0}_{j,t}\right\|^{2}$, which is finite by Assumption 3(i). Thus
$$\mathbb{E}\left\|\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le\mathbb{E}\max_{1\le k\le T}\left\|\sum_{t=1}^{k}u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le c_{0}T,$$
where the last inequality follows from Lemma 1 in Peligrad, Utev, and Wu (2007). This proves (A.25). As far as (A.26) is concerned, it follows readily upon checking the assumptions of Theorem 0 in Peligrad et al. (1987).
Lemma A.12. We assume that Assumptions 1-8 are satisfied. Then, as $\min\left(N,T_{j}\right)\to\infty$, it holds that
$$\hat\Gamma_{j}-\left(H^{\Gamma}_{jj}(\theta^{0})\right)^{-1}\Gamma^{0}_{j}=o_{P}(1)+o_{P^{*}}(1), \tag{A.27}$$
$$\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}(\theta^{0})=o_{P}(1)+o_{P^{*}}(1), \tag{A.28}$$
for $j=D,S$, for almost all realizations of $\left\{R_{i,t},1\le i\le N,1\le t\le T\right\}$.
Proof. We present the full version of the proof of (A.27) only; the remaining results follow from similar arguments and, where possible, we omit the details to avoid repetition. We begin by showing that the estimation error $\hat P_{j}-P^{0}_{j}$ is negligible. To this end, we assume without loss of generality that $\hat P_{j}\ge P^{0}_{j}$, and we omit dependence on $\hat\theta$ or $\theta^{0}$ whenever possible. Let
$$H^{\Gamma\hat P_{j}}_{jj}\hat\Gamma_{j}=H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}+\tilde H^{\Gamma}_{jj}\tilde\Gamma_{j},$$
where $H^{*\Gamma}_{jj}$ is $\left(P^{0}_{j}+1\right)\times\left(P^{0}_{j}+1\right)$, $\hat\Gamma^{*}_{j}$ is $\left(P^{0}_{j}+1\right)\times1$, $\tilde H^{\Gamma}_{jj}$ is $\left(P^{0}_{j}+1\right)\times\left(\hat P_{j}-P^{0}_{j}\right)$ and $\tilde\Gamma_{j}$ is $\left(\hat P_{j}-P^{0}_{j}\right)\times1$, defined such that $H^{\Gamma\hat P_{j}}_{jj}=\left[H^{*\Gamma}_{jj}\,|\,\tilde H^{\Gamma}_{jj}\right]$ and $\hat\Gamma_{j}=\left(\hat\Gamma^{*\prime}_{j},\tilde\Gamma'_{j}\right)'$. We begin by showing that
$$\left\|H^{\Gamma\hat P_{j}}_{jj}\right\|_{F}=O_{P^{*}}(1). \tag{A.29}$$
Note that
$$\left\|H^{\Gamma\hat P_{j}}_{jj}\right\|_{F}\le\left\|\frac{U_{j}(\theta^{0})\hat U_{j}(\hat\theta)'}{T}\right\|_{F}\left\|\frac{B'_{j}(\theta^{0})\hat B_{j}(\hat\theta)}{N}\right\|_{F}\left\|\hat V_{j}(\hat\theta)^{-1}\right\|_{F};$$
the first term is bounded by Assumption 3(i); also note that, using the triangle inequality,
$$\left\|\frac{B'_{j}(\theta^{0})\hat B_{j}(\hat\theta)}{N}\right\|_{F}\le\left\|\frac{B'_{j}(\theta^{0})B_{j}(\theta^{0})}{N}\right\|_{F}+\left\|\frac{B'_{j}\left(\hat B_{j}(\hat\theta)-B_{j}(\theta^{0})H^{\Gamma\hat P_{j}}_{jj}\right)}{N}\right\|_{F}\le\left\|\frac{B'_{j}(\theta^{0})B_{j}(\theta^{0})}{N}\right\|_{F}+\left\|\frac{B_{j}}{N^{1/2}}\right\|_{F}\left\|\frac{\hat B_{j}(\hat\theta)-B_{j}(\theta^{0})H^{\Gamma\hat P_{j}}_{jj}}{N^{1/2}}\right\|_{F}.$$
The first term is bounded by virtue of Assumption 3(iv), whereas the second one can be shown to be dominated by using the same arguments as in footnote 5 in Bai (2003). Finally, note that
$$\left\|\hat V_{j}(\hat\theta)^{-1}\right\|_{F}\le\left[g^{(N)}\left(\Sigma_{j,R}(\hat\theta)\right)\right]^{-1}\hat P_{j}=\left[g^{(N)}\left(\Sigma_{j,R}(\hat\theta)\right)\right]^{-1}P^{0}_{j}+o_{P^{*}}(1),$$
by Theorem 3, with $g^{(N)}\left(\Sigma_{j,R}(\hat\theta)\right)>0$ implied by Assumption 2(i)(b). By similar passages, it is easy to see that
$$\left\|\hat\Gamma_{j}\right\|_{F}=O_{P^{*}}(1). \tag{A.30}$$
We now show that
$$\left\|\tilde H^{\Gamma}_{jj}\tilde\Gamma_{j}\right\|_{F}=o_{P^{*}}(1), \tag{A.31}$$
for almost all realizations of the sample. Indeed, $\left\|\tilde H^{\Gamma}_{jj}\tilde\Gamma_{j}\right\|_{F}\le\left\|\tilde H^{\Gamma}_{jj}\right\|_{F}\left\|\tilde\Gamma_{j}\right\|_{F}$, and $\left\|\tilde\Gamma_{j}\right\|_{F}$ is bounded by (A.30). Also
$$\left\|\tilde H^{\Gamma}_{jj}\right\|^{2}_{F}=\operatorname{tr}\left(\tilde H^{\Gamma\prime}_{jj}\tilde H^{\Gamma}_{jj}\right)\le\left(\hat P_{j}-P^{0}_{j}\right)g^{(1)}\left(\tilde H^{\Gamma\prime}_{jj}\tilde H^{\Gamma}_{jj}\right),$$
with $\hat P_{j}-P^{0}_{j}=o_{P^{*}}(1)$ by Theorem 3 and $g^{(1)}\left(\tilde H^{\Gamma\prime}_{jj}\tilde H^{\Gamma}_{jj}\right)$ bounded by (A.29). Therefore, $H^{\Gamma\hat P_{j}}_{jj}\hat\Gamma_{j}=H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}+o_{P^{*}}(1)$. Turning to $H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}$, let $\hat\Gamma^{*P^{0}_{j}}_{j}$ be the estimate obtained when using $P^{0}_{j}$, and consider
$$H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}=\left(H^{*\Gamma}_{jj}\pm H^{\Gamma}_{jj}\right)\left(\hat\Gamma^{*}_{j}\pm\hat\Gamma^{*P^{0}_{j}}_{j}\right).$$
By the same arguments as in footnote 5 in Bai (2003), it is easy to see that $\left\|\hat\Gamma^{*}_{j}-\hat\Gamma^{*P^{0}_{j}}_{j}\right\|_{F}=o_{P^{*}}(1)$; then it follows immediately that
$$H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}=H^{\Gamma}_{jj}\hat\Gamma^{*P^{0}_{j}}_{j}+o_{P^{*}}(1),$$
which, combined with (A.31), yields
$$H^{\Gamma\hat P_{j}}_{jj}\hat\Gamma_{j}=H^{\Gamma}_{jj}\hat\Gamma^{*P^{0}_{j}}_{j}+o_{P^{*}}(1). \tag{A.32}$$
We now conclude the proof of (A.27) by showing that
$$\hat\Gamma^{*P^{0}_{j}}_{j}=\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}+o_{P}(1).$$
Let $X^{P^{0}_{j}}_{j}(\theta^{0})=\left[\iota_{N},B^{P^{0}_{j}}_{j}(\theta^{0})\right]$. By the same logic as in the previous passages, and using Theorem 3.4 in Massacci (2017), we have
$$\frac{1}{T}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}=\left(\hat X^{\hat P_{j}\prime}_{j}(\hat\theta)\hat X^{\hat P_{j}}_{j}(\hat\theta)\right)^{-1}\hat X^{\hat P_{j}\prime}_{j}(\hat\theta)\frac{1}{T}\sum_{t=1}^{T}R_{t}\hat d_{j,t}=\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}\hat d_{j,t}+o_{P^{*}}(1)+o_{P}(1).$$
Also
$$
\begin{aligned}
&\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}\hat d_{j,t}\\
&\qquad=\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}d_{j,t}(\theta^{0})+\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)\\
&\qquad=I+II. \tag{A.33}
\end{aligned}
$$
Noting that
$$\left\|\frac{1}{T}\sum_{t=1}^{T}R_{t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)\right\|\le\left(\frac{1}{T}\sum_{t=1}^{T}\left\|R_{t}\right\|^{2}\right)^{1/2}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)^{2}\right)^{1/2},$$
and using Theorem 3.4 in Massacci (2017), it follows that
$$\frac{1}{T}\sum_{t=1}^{T}R_{t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)=O_{P}\left(N^{-\eta}T^{-1}\right),$$
so that, in equation (A.33), we have $II=O_{P}\left(N^{-\eta}T^{-1}\right)$. Also, by Corollary 3.1(a) in Massacci (2017), $\left\|\hat B^{P^{0}_{j}}_{j}-B^{0}_{j}H_{jj}\right\|^{2}=O_{P}\left(NC^{-2}_{N,T}\right)$, which in turn implies
$$\left\|\hat X^{P^{0}_{j}}_{j}(\theta^{0})-\left(\iota_{N},B^{0}_{j}H_{jj}\right)\right\|^{2}=\left\|\hat X^{P^{0}_{j}}_{j}(\theta^{0})-X^{0}_{j}H^{\Gamma}_{jj}\right\|^{2}=O_{P}\left(NC^{-2}_{N,T}\right);$$
thus
$$\frac{1}{T}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}=\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}R_{t}d_{j,t}(\theta^{0})\right)+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1).$$
Let $\alpha=\left(\alpha_{1},\ldots,\alpha_{N}\right)'$. Recalling (2.4), it follows that
$$
\begin{aligned}
\frac{1}{T}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}-\frac{T_{j}}{T}\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}
&=\frac{T_{j}}{T}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}\alpha_{j}}{N}\right)\\
&\quad+\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}B^{0}_{j}u^{0}_{j,t}d_{j,t}(\theta^{0})\right)\\
&\quad+\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}(\theta^{0})\right)\\
&\quad+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1)\\
&=I_{a}+I_{b}+I_{c}+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1).
\end{aligned}
$$
Assumptions 3(iv) and 7(i) immediately yield $I_{a}=O_{P}\left(\frac{T_{j}}{N^{1/2}T}\right)$. Consider $I_{c}$; it holds that
$$
\begin{aligned}
\mathbb{E}\left\|\frac{1}{NT}\sum_{t=1}^{T}X^{0\prime}_{j}\varepsilon_{t}d_{j,t}(\theta^{0})\right\|^{2}
&=\frac{1}{(NT)^{2}}\sum_{i,k=1}^{N}\sum_{t,s=1}^{T}X^{0}_{i,j}X^{0\prime}_{k,j}\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)\\
&\le\frac{1}{(NT)^{2}}\max_{1\le i\le N}\left\|X^{0}_{i,j}\right\|^{2}\sum_{i,k=1}^{N}\sum_{t,s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)\right|\\
&\le\frac{c_{0}N}{(NT)^{2}}\max_{1\le i\le N}\sum_{k=1}^{N}\sum_{t,s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)\right|\le\frac{c_{0}}{NT},
\end{aligned}
$$
by Assumptions 3(iii) and 2(iv); so, putting all together and using Assumption 3(iv), it follows that $I_{c}=O_{P}\left(\frac{1}{\sqrt{NT}}\right)$. Turning to $I_{b}$, (A.25) yields
$$\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}B^{0}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right)=O_{P}\left(\frac{1}{\sqrt{T}}\right). \tag{A.34}$$
Putting all together, it follows that
$$\frac{1}{T_{j}}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}-\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}=O_{P}\left(\frac{1}{N^{1/2}}\right)+O_{P}\left(\frac{T^{1/2}}{T_{j}}\right)+O_{P}\left(\frac{T}{T_{j}N}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1); \tag{A.35}$$
the desired result now follows immediately. We now turn to showing (A.28). We assume that $P^{0}_{j}$ is known; the estimation error $\hat P_{j}-P^{0}_{j}$ can be shown to be negligible on account of the same arguments as above. We have
$$\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}=\left(\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}\right)\left(\sum_{t=1}^{T}\hat u^{P^{0}_{j}}_{j,t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}\right)^{-1}-\Lambda^{0}_{j}H_{jj}.$$
Note that, by Corollary 3.2 in Massacci (2017), it is easy to see that
$$\frac{1}{T}\sum_{t=1}^{T}\hat u^{P^{0}_{j}}_{j,t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}=\frac{1}{T}\sum_{t=1}^{T}H^{-1}_{jj}u^{0}_{j,t}u^{0\prime}_{j,t}\left(H'_{jj}\right)^{-1}d_{j,t}(\theta^{0})+o_{P}(1). \tag{A.36}$$
Also
$$\frac{1}{T}\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}=\frac{1}{T}\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})+\frac{1}{T}\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)=I+II. \tag{A.37}$$
Consider $II$; we have
$$\left|II\right|\le\left(\frac{1}{T}\sum_{t=1}^{T}\left\|g_{t}\right\|^{4}\right)^{1/4}\left(\frac{1}{T}\sum_{t=1}^{T}\left\|\hat u^{P^{0}_{j}}_{j,t}\right\|^{4}\right)^{1/4}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)^{2}\right)^{1/2}; \tag{A.38}$$
using Assumptions 3(i) and 8(i), and using Theorem 3.4 in Massacci (2017), it follows that $II=O_{P}\left(N^{-\eta}T^{-1}\right)$. Turning to $I$, by the definition of $g_{t}(\theta)$ it holds that
$$I=\frac{1}{T}\sum_{t=1}^{T}\left(g_{t}-a\right)\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})+a\,\frac{1}{T}\sum_{t=1}^{T}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})=I_{a}+I_{b}.$$
Using (A.25) and Assumption 8(ii), it follows that $I_{b}=O_{P}\left(T^{-1}\right)$.
Further,
$$I_{a}=\frac{1}{T}\sum_{t=1}^{T}\Lambda^{0}_{j}u^{0}_{j,t}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})+\frac{1}{T}\sum_{t=1}^{T}e_{t}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})=I_{a,1}+I_{a,2}.$$
Now
$$I_{a,1}=\Lambda^{0}_{j}H_{jj}H^{-1}_{jj}\left(\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}u^{0\prime}_{j,t}\left(H'_{jj}\right)^{-1}d_{j,t}(\theta^{0})\right)+\Lambda^{0}_{j}\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}\left(\hat u^{P^{0}_{j}}_{j,t}-H^{-1}_{jj}u^{0}_{j,t}\right)'d_{j,t}(\theta^{0})=I_{a,1,1}+I_{a,1,2}.$$
Following the same passages as in the proof of Lemma B.2 in Bai (2003), it can be shown that
$$\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}\left(\hat u^{P^{0}_{j}}_{j,t}-H^{-1}_{jj}u^{0}_{j,t}\right)'d_{j,t}(\theta^{0})=O_{P}\left(C^{-2}_{N,T}\right), \tag{A.39}$$
so that $I_{a,1,2}=O_{P}\left(C^{-2}_{N,T}\right)$. Also
$$I_{a,2}=\left(\frac{1}{T}\sum_{t=1}^{T}e_{t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)\left(H'_{jj}\right)^{-1}+\frac{1}{T}\sum_{t=1}^{T}e_{t}\left(\hat u^{P^{0}_{j}}_{j,t}-H^{-1}_{jj}u^{0}_{j,t}\right)'d_{j,t}(\theta^{0})=I_{a,2,1}+I_{a,2,2}.$$
Assumption 8(ii) entails that $I_{a,2,1}=O_{P}\left(T^{-1/2}\right)$. Finally, by the same token as for (A.39), $I_{a,2,2}=O_{P}\left(C^{-2}_{N,T}\right)$, whence $I_{a,2}=O_{P}\left(T^{-1/2}\right)+O_{P}\left(C^{-2}_{N,T}\right)$. Putting all together, we obtain
$$\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}=O_{P}\left(\frac{1}{T^{1/2}}\right)+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right). \tag{A.40}$$
For $j=D,S$, consider the matrices $M^{(j)}_{H,X}=\lim_{N\to\infty}N^{-1}H^{\Gamma\prime}_{jj}(\theta^{0})X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}(\theta^{0})$, $M^{(j)}_{H,X,B}=\lim_{N\to\infty}N^{-1}H^{\Gamma\prime}_{jj}(\theta^{0})X^{0\prime}_{j}B^{0}_{j}$, and $M^{(j)}_{H,u}=\left(M^{(j)}_{u}\right)^{-1}H_{jj}(\theta^{0})$. We then define
$$\Sigma_{\Gamma,j}=c^{-2}_{j}\left(M^{(j)}_{H,X}\right)^{-1}M^{(j)}_{H,X,B}V_{u,j}M^{(j)\prime}_{H,X,B}\left(M^{(j)}_{H,X}\right)^{-1},$$
$$\Sigma_{\alpha\Gamma,j}=\sigma^{2}_{\alpha,j}\left(M^{(j)}_{H,X}\right)^{-1},$$
$$\Sigma_{\Lambda,j}=\left(M^{(j)\prime}_{H,u}\otimes I_{K}\right)V_{eu,j}\left(M^{(j)}_{H,u}\otimes I_{K}\right).$$
Lemma A.13. We assume that Assumptions 1-8 are satisfied, and that $\min\left(N,T_{j}\right)\to\infty$, with
$$\frac{T_{j}}{T}\to c_{j}\in(0,1), \tag{A.41}$$
$$\frac{T^{1/2}}{N}\to0. \tag{A.42}$$
Then, under Assumptions 7 and 9, it holds that
$$\left(\frac{1}{T}\Sigma_{\Gamma,j}+\frac{1}{N}\Sigma_{\alpha\Gamma,j}\right)^{-1/2}\left(\hat\Gamma_{j}-\left(H^{\Gamma}_{jj}(\theta^{0})\right)^{-1}\Gamma^{0}_{j}\right)\xrightarrow{D^{*}}N\left(0,I_{P_{j}+1}\right); \tag{A.43}$$
under Assumption 9(ii), it holds that
$$\left(\frac{1}{T}\Sigma_{\Lambda,j}\right)^{-1/2}\operatorname{vec}\left(\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}(\theta^{0})\right)\xrightarrow{D^{*}}N\left(0,I_{P_{j}K}\right), \tag{A.44}$$
for $j=D,S$, for almost all realisations of $\left\{R_{i,t},1\le i\le N,1\le t\le T\right\}$.
Proof. Consider (A.43). Upon inspecting (A.35), under (A.41) and (A.42), and using Lemma A.8, it is immediate to see that the limiting behaviour of $T^{1/2}\left(\hat\Gamma^{*P^{0}_{j}}_{j}-\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}\right)$ is driven by
$$\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left[\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}B^{0}_{j}}{N}\right)\left(\frac{T}{T_{j}}\frac{1}{T^{1/2}}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right)+\frac{T_{j}}{T}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}\alpha_{j}}{N}\right)\right].$$
The desired result now follows immediately from (A.26) and Assumption 7. Turning to (A.44), combining (A.40) and (A.42), it follows that the limiting law of $T^{1/2}\left(\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}\right)$ is driven by
$$\left(\frac{1}{T^{1/2}}\sum_{t=1}^{T}e_{t}u^{0\prime}_{j,t}\left(H_{jj}\right)^{-1}d_{j,t}(\theta^{0})\right)\left(\frac{1}{T}\sum_{t=1}^{T}\left(H_{jj}\right)^{-1}u^{0}_{j,t}u^{0\prime}_{j,t}\left(H'_{jj}\right)^{-1}d_{j,t}(\theta^{0})\right)^{-1}=\left(\frac{1}{T^{1/2}}\sum_{t=1}^{T}e_{t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)\left(\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)^{-1}H_{jj},$$
and using Assumption 9(ii) the desired result follows immediately.
In order to use the results in Lemma A.13, we propose the following estimators:
$$\hat\Sigma_{\Gamma,j}=\hat c^{-2}_{j}\left(\frac{\hat X'_{j}\hat X_{j}}{N}\right)^{-1}\left(\frac{\hat X'_{j}\hat B^{\hat P_{j}}_{j}}{N}\right)\hat V_{u,j}\left(\frac{\hat X'_{j}\hat B^{\hat P_{j}}_{j}}{N}\right)'\left(\frac{\hat X'_{j}\hat X_{j}}{N}\right)^{-1},$$
$$\hat\Sigma_{\alpha\Gamma,j}=\hat\sigma^{2}_{\alpha,j}\left(\frac{\hat X'_{j}\hat X_{j}}{N}\right)^{-1},$$
$$\hat\Sigma_{\Lambda,j}=\left(\left(\hat M^{(j)}_{u}\right)^{-1}\otimes I_{K}\right)\hat V_{eu,j}\left(\left(\hat M^{(j)}_{u}\right)^{-1}\otimes I_{K}\right).$$
B Proofs
Henceforth, we use $\mathbb{E}^{*}$ and $V^{*}$ to denote, respectively, the expected value and the variance with respect to $P^{*}$.
Proof of Lemma 1. The lemma is an adaptation of Lemma 1 in Trapani (2018). In order to prove it, note that Assumption 4(v) entails that, for $j=D,S$,
$$\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left[\left(B^{0}_{j}u^{0}_{j,t}+e_{t}\right)\left(B^{0}_{j}u^{0}_{j,t}+e_{t}\right)'d_{j,t}(\theta^{0})\right]=B^{0}_{j}\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{j,t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)B^{0\prime}_{j}+\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(e_{t}e'_{t}d_{j,t}(\theta^{0})\right).$$
We begin by showing that
$$g^{(i)}_{j}\left(B^{0}_{j}\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{j,t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)B^{0\prime}_{j}\right)\;
\begin{cases}
\in\left(\underline{c}^{(i)}_{j}N,\,\bar c^{(i)}_{j}N\right) & \text{for }1\le i\le P^{0}_{j},\\
=0 & \text{for }i\ge P^{0}_{j}+1,
\end{cases} \tag{B.1}$$
for $j=D,S$, where $0<\underline{c}^{(i)}_{j}\le\bar c^{(i)}_{j}<\infty$. We show this result for the case $j=D$ only; the case $j=S$ is identical. By the multiplicative Weyl's inequality (Merikoski and Kumar (2004)) we have
$$g^{(i)}_{D}\left(B^{0}_{D}\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)B^{0\prime}_{D}\right)\ge g^{(\min)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)\right)g^{(i)}_{D}\left(B^{0\prime}_{D}B^{0}_{D}\right),$$
$$g^{(i)}_{D}\left(B^{0}_{D}\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)B^{0\prime}_{D}\right)\le g^{(\max)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)\right)g^{(i)}_{D}\left(B^{0\prime}_{D}B^{0}_{D}\right);$$
by Assumption 3(ii) it holds that
$$g^{(\min)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)\right)>0.$$
Moreover, by Assumption 3(iv) we have
$$g^{(i)}_{D}\left(B^{0}_{D}B^{0\prime}_{D}\right)\;
\begin{cases}
\ge c^{(i)}_{1}N & \text{for }1\le i\le P^{0}_{D},\\
=0 & \text{for }i\ge P^{0}_{D}+1.
\end{cases}$$
Equation (B.1) now follows immediately. Consider now
$$\frac{1}{T\pi_{D}}\sum_{t=1}^{T}e_{t}e'_{t}d_{D,t}(\theta^{0})=\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\varepsilon_{t}\varepsilon'_{t}d_{D,t}(\theta^{0})-\bar\varepsilon_{D}\bar\varepsilon'_{D},$$
so that
$$g^{(1)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(e_{t}e'_{t}d_{D,t}(\theta^{0})\right)\right)\le g^{(1)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(\varepsilon_{t}\varepsilon'_{t}d_{D,t}(\theta^{0})\right)\right)+g^{(1)}_{D}\left(\bar\varepsilon_{D}\bar\varepsilon'_{D}\right).$$
The first term is bounded by Assumption 2(i). Also, after some algebra we have
$$g^{(1)}_{D}\left(\bar\varepsilon_{D}\bar\varepsilon'_{D}\right)\le\max_{1\le i\le N}\sum_{k=1}^{N}\frac{1}{\left(T\pi_{D}\right)^{2}}\sum_{t=1}^{T}\sum_{s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{D,t}(\theta^{0})d_{D,s}(\theta^{0})\right)\right|,$$
which is bounded by Assumption 2(ii). The proof of the lemma now follows along the same lines as the proof of Lemma 1 in Trapani (2018).
Proof of Theorem 1. We prove the theorem for $j=D$; the proof for the case $j=S$ is a repetition of the same arguments. Note that
$$\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|\le\left\|\hat\Sigma_{D}-\Sigma_{D}\right\|_{op},$$
so that, by symmetry,
$$\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|\le\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\left(R_{h,t}R_{l,t}\hat d_{D,t}-\mathbb{E}\left(R_{h,t}R_{l,t}d_{D,t}(\theta^{0})\right)\right)\right)^{2}\right|^{1/2}. \tag{B.2}$$
We define the short-hand notation $\delta_{h,l,t}=R_{h,t}R_{l,t}d_{D,t}(\theta^{0})-\mathbb{E}\left(R_{h,t}R_{l,t}d_{D,t}(\theta^{0})\right)$; based on this, (B.2) can be rewritten as
$$\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|\le\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\left(\delta_{h,l,t}+R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)\right)^{2}\right|^{1/2}. \tag{B.3}$$
By repeated use of the $C_{r}$-inequality we have
$$
\begin{aligned}
\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|&\le c_{0}\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\delta_{h,l,t}\right)^{2}+\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\right|^{1/2}\\
&\le c_{1}\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\delta_{h,l,t}\right)^{2}\right|^{1/2}+c_{2}\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\right|^{1/2}. \tag{B.4}
\end{aligned}
$$
Using Lemma A.9, we have
$$\mathbb{E}\max_{1\le h\le N,1\le l\le N,1\le t\le T}\sum_{h'=1}^{h}\sum_{l'=1}^{l}\left(\sum_{t'=1}^{t}\delta_{h',l',t'}\right)^{2}\le c_{0}N^{2}T;$$
thus, by Lemma A.1 and the Markov inequality, we have
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\delta_{h,l,t}\right)^{2}=o_{a.s.}\left(\frac{N^{2}T}{T^{2}\pi^{2}_{D}}(\ln N)^{2+\varepsilon}(\ln T)^{1+\varepsilon}\right),$$
for every $\varepsilon>0$. Considering the second term in (B.4), convexity implies that
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\le\frac{1}{T\pi^{2}_{D}}\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)^{2}. \tag{B.5}$$
Hence, applying (A.19) to (B.5) it follows that
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\le c_{0}\left|\hat\theta-\theta^{0}\right|^{2}\frac{1}{T\pi^{2}_{D}}\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}.$$
Equation (A.20) entails
$$\mathbb{E}\left(\frac{1}{N^{2}T}\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}\right)\le c_{0}.$$
The maximal inequality for rectangular sums (Moricz (1983)) now yields
$$\mathbb{E}\max_{1\le h\le N,1\le l\le N,1\le t\le T}\sum_{h'=1}^{h}\sum_{l'=1}^{l}\sum_{t'=1}^{t}R^{2}_{h',t'}R^{2}_{l',t'}\le c_{0}N^{2}T(\ln N)^{2}\ln T,$$
which, by Lemma A.1 and the Markov inequality, yields
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}=o_{a.s.}\left(N^{2}T(\ln N)^{3+\varepsilon}(\ln T)^{2+\varepsilon}\right),$$
for every $\varepsilon>0$. Then, by Lemma A.7 we obtain
$$\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\right|^{1/2}=o_{a.s.}\left(\frac{N^{1-\eta}}{T\pi_{D}}(\ln N)^{\frac{3+\varepsilon}{2}}(\ln T)^{\frac{2+\varepsilon}{2}}v_{N,T}(\varepsilon)\right);$$
recalling that we have assumed $\eta=1$, the desired result follows.
Proof of Theorem 2. The proofs are similar to those of Theorems 3 and 4 in Horvath and Trapani (2019), and therefore we only report the main arguments to save space. We begin by showing (4.18). It follows from Lemma 1 and Theorem 1 that, for all $i$ and $j=D,S$,
$$P\left(\omega:\lim_{\min(N,T)\to\infty}N^{-\varrho_{j}}\frac{\hat g^{(i)}_{j}}{\bar g_{j}(p)}-\mathring{A}N^{1-\varrho_{j}}=\infty\right)=1.$$
Thus, by continuity, we can assume henceforth that
$$\lim_{\min(N,T)\to\infty}\exp\left(-\mathring{A}N^{1-\varrho_{j}}\right)\psi\left(\hat g^{(i)}_{j}\right)=\infty. \tag{B.6}$$
Let $\Phi(\cdot)$ denote the standard normal distribution function; it holds that
$$
\begin{aligned}
M^{-1/2}\sum_{m=1}^{M}\left(\zeta^{(i)}_{j,m}(s)-\frac{1}{2}\right)&=M^{-1/2}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le0\right\}-\frac{1}{2}\right)+M^{-1/2}\sum_{m=1}^{M}\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\\
&\quad+M^{-1/2}\sum_{m=1}^{M}\left[I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,m}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right].
\end{aligned}
$$
By definition,
$$\mathbb{E}^{*}\zeta^{(i)}_{j,m}(s)=\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right),\qquad V^{*}\zeta^{(i)}_{j,m}(s)=\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\left(1-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right).$$
Thus
$$\mathbb{E}^{*}\int_{-\infty}^{\infty}\left|M^{-1/2}\sum_{m=1}^{M}\left[I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,m}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right]\right|^{2}d\Phi(s)=\int_{-\infty}^{\infty}\mathbb{E}^{*}\left|I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right|^{2}d\Phi(s),$$
on account of the independence of the $\xi^{(j)}_{i,m}$ across $m$. Also
$$\mathbb{E}^{*}\left[I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}\right]=\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2},$$
$$V^{*}\left[I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}\right]=\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\left(\frac{3}{2}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)\le\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}.$$
Hence, we have
$$\int_{-\infty}^{\infty}\mathbb{E}^{*}\left|I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right|^{2}d\Phi(s)\le\int_{-\infty}^{\infty}\left|\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right|d\Phi(s)\le\frac{1}{\sqrt{2\pi}\,\psi\left(\hat g^{(i)}_{j}\right)}\int_{-\infty}^{\infty}|s|\,d\Phi(s)=\frac{1}{\pi\psi\left(\hat g^{(i)}_{j}\right)},$$
which drifts to zero by (B.6). Also
$$\int_{-\infty}^{\infty}\left(M^{-1/2}\sum_{m=1}^{M}\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right)^{2}d\Phi(s)\le\frac{M}{2\pi\left|\psi\left(\hat g^{(i)}_{j}\right)\right|^{2}}\int_{-\infty}^{\infty}s^{2}\,d\Phi(s)=\frac{M}{2\pi\left|\psi\left(\hat g^{(i)}_{j}\right)\right|^{2}}.$$
Hence, using (4.17), we conclude via Markov's inequality that
$$\Upsilon^{(i)}_{j}=\int_{-\infty}^{\infty}\left(\frac{2}{\sqrt{M}}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le0\right\}-\frac{1}{2}\right)\right)^{2}d\Phi(s)+o_{P^{*}}(1)=\left(\frac{2}{\sqrt{M}}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le0\right\}-\frac{1}{2}\right)\right)^{2}+o_{P^{*}}(1),$$
and therefore the desired result follows from the Central Limit Theorem for Bernoulli random variables.
We now turn to showing (4.19). Again, Lemma 1 and Theorem 1 entail that
$$P\left(\omega:\lim_{\min(N,T)\to\infty}N^{-\varrho_{j}}\frac{\hat g^{(i)}_{j}}{\bar g_{j}(p)}=0\right)=1.$$
By continuity, this means that we can assume from now on that
$$\lim_{\min(N,T)\to\infty}\psi\left(\hat g^{(i)}_{j}\right)=1. \tag{B.7}$$
Consider
$$\zeta^{(i)}_{j,m}(s)-\frac{1}{2}=I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}\pm\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}.$$
Then we have
$$\mathbb{E}^{*}\int_{-\infty}^{\infty}\left|M^{-1/2}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\frac{1}{2}\right)\right|^{2}d\Phi(s)=\int_{-\infty}^{\infty}\mathbb{E}^{*}\left(I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)^{2}d\Phi(s)+M\int_{-\infty}^{\infty}\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)^{2}d\Phi(s).$$
Given that
$$\mathbb{E}^{*}\left(I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)^{2}<\infty,$$
by Markov's inequality it follows that
$$\int_{-\infty}^{\infty}\left(M^{-1/2}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)\right)^{2}d\Phi(s)=O_{P^{*}}(1),$$
for almost all realizations of $\left\{e_{j},b_{j},-\infty<j<\infty\right\}$. Thus, as $\min\left(M,N,T_{j}\right)\to\infty$ for $j=D,S$,
$$\frac{1}{4M}\Upsilon^{(i)}_{j}=\int_{-\infty}^{\infty}\left(\Phi(s)-\frac{1}{2}\right)^{2}d\Phi(s)+o_{P^{*}}(1). \tag{B.8}$$
Hence the proof of (4.19) is complete.
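The mechanics of the randomisation argument in the proof of Theorem 2 can be mimicked with a small simulation. In the sketch below `psi` is a stand-in for the rescaled statistic $\psi(\hat g^{(i)}_{j})$ (a hypothetical value, not computed from data): a very large `psi` mimics the null, under which $\Upsilon^{(i)}_{j}$ behaves like a single $\chi^{2}_{1}$-type draw, while `psi = 1` mimics the alternative, under which $\Upsilon^{(i)}_{j}/(4M)$ approaches $\int(\Phi(s)-1/2)^{2}\,d\Phi(s)=1/12$.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 20_000                     # number of randomisation draws xi_{i,m}^{(j)}
xi = rng.normal(size=M)        # i.i.d. N(0,1) draws
s_grid = rng.normal(size=200)  # draws of s used to approximate the integral dPhi(s)

def upsilon(psi):
    """Upsilon = int ( (2/sqrt(M)) sum_m (zeta_m(s) - 1/2) )^2 dPhi(s),
    with zeta_m(s) = I{xi_m <= s / psi}; the integral over s is
    approximated by averaging over the draws in s_grid."""
    zeta = xi[None, :] <= s_grid[:, None] / psi
    stat = 2.0 / np.sqrt(M) * (zeta.sum(axis=1) - M / 2.0)
    return float(np.mean(stat ** 2))

null_val = upsilon(psi=1e6)  # psi huge: zeta ~ Bernoulli(1/2), Upsilon = O_P*(1)
alt_val = upsilon(psi=1.0)   # psi -> 1: Upsilon / (4M) -> 1/12
print(null_val, alt_val / (4 * M))
```

The divergence of `alt_val` at rate $M$ relative to the bounded `null_val` is what allows the randomised statistic to discriminate between the two regimes.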
Proof of Theorem 3. See the proof of (the identical) Theorem 3 in Trapani (2018).
Proof of Theorem 4. The result follows from combining (A.27) and (A.28), with
$$\hat\gamma_{g,j,1}-\gamma^{0}_{g,j,1}=O_{P}\left(\frac{1}{N^{1/2}}\right)+O_{P}\left(\frac{T^{1/2}}{T_{j}}\right)+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right). \tag{B.9}$$
Proof of Theorem 5. The result readily follows by combining the results in Lemma A.13 (see also
Theorem 3 in Giglio and Xiu (2020)).
Proof of Lemma 2. We report the proof under the assumption that the true $d_{j,t}$ has been used (also in the computation of $T_{j}$); extending to the case where $\hat d_{j,t}$ is employed is straightforward in light of the results derived above. It holds that
$$\hat e_{j,i}=\alpha_{j,i}+\bar\varepsilon_{j,i}+\left(\frac{1}{T_{j}}\sum_{t=1}^{T}d_{j,t}\right)\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t},$$
where
$$\bar\varepsilon_{j,i}=\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}.$$
Also, applying Theorem 3.4 and Corollary 3.2 in Massacci (2017) (recalling also (5.3)) yields
$$\frac{1}{N}\sum_{i=1}^{N}\left\|\beta_{j,i}-\hat\beta_{j,i}\right\|^{2}=o_{P}(1), \tag{B.10}$$
$$\left\|\frac{1}{T_{j}}\sum_{t=1}^{T}\left(u_{j,t}-\hat u_{j,t}\right)\right\|^{2}\le\frac{T}{T_{j}}\frac{1}{T}\sum_{t=1}^{T}\left\|u_{j,t}-\hat u_{j,t}\right\|^{2}=o_{P}(1), \tag{B.11}$$
having omitted rotation matrices for notational simplicity. Note now that
$$
\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}\hat e^{2}_{j,i}&=\frac{1}{N}\sum_{i=1}^{N}\alpha^{2}_{j,i}+\frac{1}{N}\sum_{i=1}^{N}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}\\
&\quad+\frac{2}{N}\sum_{i=1}^{N}\alpha_{j,i}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right).
\end{aligned}
$$
The LLN entails
$$\frac{1}{N}\sum_{i=1}^{N}\alpha^{2}_{j,i}=\sigma^{2}_{\alpha,j}+o_{P}(1);$$
upon showing that
$$\frac{1}{N}\sum_{i=1}^{N}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}=o_{P}(1),$$
and using (repeatedly) the Cauchy-Schwarz inequality, the lemma follows. Now
$$
\frac{1}{N}\sum_{i=1}^{N}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}\le c_{0}\left(\frac{1}{N}\sum_{i=1}^{N}\bar\varepsilon^{2}_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)^{2}+\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}\right).
$$
By (A.27), $\gamma_{j,0}-\hat\gamma_{j,0}=o_{P}(1)$. Also, combining Theorem 4 and (B.10)-(B.11), it is easy to see that
$$\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}=o_{P}(1).$$
Finally, we have
$$\max_{1\le i\le N}\mathbb{E}\left(\bar\varepsilon^{2}_{j,i}\right)=\max_{1\le i\le N}\frac{1}{T^{2}_{j}}\sum_{t=1}^{T}\sum_{s=1}^{T}\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{i,s}d_{j,t}d_{j,s}\right)\le c_{0}T^{-1}_{j},$$
using Assumption 4(ii). Hence
$$\frac{1}{N}\sum_{i=1}^{N}\bar\varepsilon^{2}_{j,i}=o_{P}(1).$$
Putting all together, the desired result follows.
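The decomposition driving Lemma 2 can be mimicked with simulated pricing errors. In the sketch below the distributions, the value of $\sigma_{\alpha,j}$ and the $O_{P}(T^{-1/2}_{j})$ noise scale are illustrative assumptions: the cross-sectional average of $\hat e^{2}_{j,i}$ is dominated by $\sigma^{2}_{\alpha,j}$, with the $\bar\varepsilon_{j,i}$ term contributing only at order $T^{-1}_{j}$ (the remaining estimation terms, which vanish, are omitted).

```python
import numpy as np

rng = np.random.default_rng(3)
N, T_j = 200_000, 1_000
sigma_alpha = 0.3

alpha = rng.normal(0.0, sigma_alpha, size=N)           # pricing errors alpha_{j,i}
eps_bar = rng.normal(0.0, 1.0 / np.sqrt(T_j), size=N)  # bar eps_{j,i} = O_P(T_j^{-1/2})

e_hat = alpha + eps_bar               # vanishing estimation terms omitted
mean_sq = float(np.mean(e_hat ** 2))  # (1/N) sum_i hat e_{j,i}^2

# Limit: sigma_alpha^2 + Var(bar eps) = 0.09 + 0.001, so the alpha
# component dominates as T_j grows.
print(mean_sq)
```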
Proof of Lemma 3. The proof makes appeal to several arguments which have already been used in the paper; therefore, for the sake of a concise discussion, we omit passages when this does not give rise to ambiguity.
We begin by noting that, by the Frisch-Waugh-Lovell theorem, it holds that
$$\hat\gamma_{j,0}=\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}R_{j}(\theta)\right),$$
having defined the annihilator
$$M_{\hat B,j}=I_{N}-\hat B_{j}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\hat B'_{j},$$
for $j=D,S$; note that we use $\hat B_{j}$ in lieu of $\hat B^{\hat P_{j}}_{j}$ to make the notation less burdensome. Thus, on account of equation (2.5), it holds that
$$\hat\gamma_{j,0}=\gamma_{j,0}+\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}\alpha_{j}\right)+\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\right). \tag{B.12}$$
Note that, using Corollary 3.1(a) in Massacci (2017), it is easy to see that
$$\left\|M_{\hat B,j}-M^{0}_{B,j}\right\|^{2}=O_{P}\left(NC^{-2}_{N,T}\right),$$
so that
$$\frac{\iota'_{N}M_{\hat B,j}\iota_{N}}{N}=\frac{\iota'_{N}M^{0}_{B,j}\iota_{N}}{N}+O_{P}\left(C^{-2}_{N,T}\right)=O_{P}(1). \tag{B.13}$$
Also, it holds that
$$\mathbb{E}\left[\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\alpha_{j}\alpha'_{j}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\right]=\sigma^{2}_{\alpha,j}\mathbb{E}\left[\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\right]=O\left(NC^{-2}_{N,T}\right),$$
again by Corollary 3.1(a) in Massacci (2017); this yields the (non-sharp) bound
$$\iota'_{N}\left(M_{\hat B,j}-M^{0}_{B,j}\right)\alpha_{j}=O_{P}\left(N^{1/2}C^{-1}_{N,T}\right). \tag{B.14}$$
Finally, note that Assumption 7(i) entails that
$$N^{1/2}\left(\iota'_{N}M^{0}_{B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M^{0}_{B,j}\alpha_{j}\right)=O_{P}(1). \tag{B.15}$$
Putting (B.13)-(B.15) together, we obtain
$$N^{1/2}\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}\alpha_{j}\right)=N^{1/2}\left(\iota'_{N}M^{0}_{B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M^{0}_{B,j}\alpha_{j}\right)+o_{P}(1). \tag{B.16}$$
Consider now
$$\iota'_{N}M_{\hat B,j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=\iota'_{N}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}-\iota'_{N}\hat B_{j}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\hat B'_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=I-II. \tag{B.17}$$
Using Assumption 2(ii) and equation (A.19), it is easy to see that
$$\mathbb{E}\left(\iota'_{N}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\right)^{2}=T^{-2}_{j}\mathbb{E}\left(\sum_{i,k=1}^{N}\sum_{t,s=1}^{T}\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)\le NT^{-2}_{j}\max_{1\le k\le N}\sum_{i=1}^{N}\sum_{t,s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)\right|\le c_{0}\frac{N}{T_{j}},$$
so that
$$\iota'_{N}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=O_{P}\left(N^{1/2}T^{-1/2}_{j}\right). \tag{B.18}$$
In order to study $II$ in (B.17), note that
$$
\begin{aligned}
\iota'_{N}\hat B_{j}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\hat B'_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}
&=\iota'_{N}B^{0}_{j}H_{jj}\left(\hat B'_{j}\hat B_{j}\right)^{-1}H'_{jj}B^{0\prime}_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}+\iota'_{N}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\left(\hat B'_{j}\hat B_{j}\right)^{-1}H'_{jj}B^{0\prime}_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\\
&\quad+\iota'_{N}B^{0}_{j}H_{jj}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\\
&\quad+\iota'_{N}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\left(\hat B'_{j}\hat B_{j}\right)^{-1}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\\
&=I+II+III+IV. \tag{B.19}
\end{aligned}
$$
By construction, $\left(\hat B'_{j}\hat B_{j}\right)^{-1}=O_{P}\left(N^{-1}\right)$, and $\iota'_{N}B^{0}_{j}=O_{P}(N)$. Using Assumption 2(ii) and equation (A.19), similarly to (B.18), it also follows that
$$B^{0\prime}_{j}T^{-1}_{j}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=O_{P}\left(N^{1/2}T^{-1/2}_{j}\right), \tag{B.20}$$
so that $I=O_{P}\left(N^{-1/2}T^{-1/2}_{j}\right)$. Consider now
$$B^{0}_{j}-\hat B_{j}H^{-1}_{jj}=T^{-1}B^{0}_{j}H_{jj}\left(H^{-1}_{jj}U^{0}_{j,t}-\hat U_{j,t}\right)\hat U'_{j,t}H^{-1}_{jj}+T^{-1}\varepsilon_{j}\left(\hat U'_{j,t}-U^{0\prime}_{j,t}\left(H^{-1}_{jj}\right)'\right)H^{-1}_{jj}+T^{-1}\varepsilon_{j}U^{0\prime}_{j,t}\left(H^{-1}_{jj}\right)'H^{-1}_{jj}, \tag{B.21}$$
where $\varepsilon_{j}$ is an $N\times T$ matrix with columns $\varepsilon_{t}d_{j,t}$. Using exactly the same arguments as in the proofs of Lemmas B.1 and B.3 in Bai (2003), it holds that $\left\|T^{-1}\left(H^{-1}_{jj}U^{0}_{j,t}-\hat U_{j,t}\right)\hat U'_{j,t}H\right\|=O_{P}\left(C^{-2}_{N,T}\right)$ and $\left\|T^{-1}\varepsilon_{j}\left(\hat U'_{j,t}-U^{0\prime}_{j,t}\left(H^{-1}_{jj}\right)'\right)H\right\|=O_{P}\left(C^{-2}_{N,T}\right)$. Hence, it follows that
$$\iota'_{N}B^{0}_{j}\left(T^{-1}H_{jj}\left(H^{-1}_{jj}U^{0}_{j,t}-\hat U_{j,t}\right)\hat U'_{j,t}\right)H^{-1}_{jj}=O_{P}\left(NC^{-2}_{N,T}\right). \tag{B.22}$$
Also,
\[
i_N' T^{-1}\varepsilon_j\left(\widehat U_j' - U_j^{0\prime}\left(H_{jj}^{-1}\right)'\right) = \sum_{i=1}^{N} T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right) \le \sum_{i=1}^{N}\left\|T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right)\right\|;
\]
again by using the same arguments as in the proof of Lemma B.1 in Bai (2003), it can be shown that
\[
E\left\|T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right)\right\| \le c_0 C_{N,T}^{-2},
\]
which yields
\[
i_N' T^{-1}\varepsilon_j\left(\widehat U_j' - U_j^{0\prime}\left(H_{jj}^{-1}\right)'\right) = O_P\left(NC_{N,T}^{-2}\right). \tag{B.23}
\]
Finally, note that
\[
E\left\|i_N' T^{-1}\varepsilon_j U_j^{0\prime}\right\|^2 = \sum_{i=1}^{N}\sum_{k=1}^{N} T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T} E\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)E\left\|u_{j,t}^{0\prime}u_{j,s}^0\right\| \le \left(\max_{1\le t\le T}E\left\|u_{j,t}^0\right\|^2\right)\frac{N}{T}\max_{1\le i\le N}\sum_{k=1}^{N}T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}\left|E\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)\right| \le c_0\frac{N}{T},
\]
after Assumptions 2(ii) and 3(i), and using (A.19), which entails that
\[
i_N' T^{-1}\varepsilon_j U_j^{0\prime} = O_P\left(N^{1/2}T^{-1/2}\right). \tag{B.24}
\]
Putting (B.22)-(B.24) together, it follows that
\[
i_N'\left(\widehat B_j - B_j^0 H_{jj}\right) = O_P\left(NC_{N,T}^{-2}\right). \tag{B.25}
\]
Thus, (B.19), (B.20) and (B.25) immediately entail that $II = O_P\left(N^{1/2}T_j^{-1/2}C_{N,T}^{-2}\right)$.
Turning to $III$, note that, using (B.21),
\begin{align*}
&\left(\widehat B_j - B_j^0 H_{jj}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right)\\
&= \left(B_j^0 H_{jj} T^{-1}\left(H_{jj}^{-1}U_j^0 - \widehat U_j\right)\widehat U_j' H_{jj}^{-1}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right)\\
&\quad + \left(T^{-1}\varepsilon_j\left(\widehat U_j' - U_j^{0\prime}\left(H_{jj}^{-1}\right)'\right)H_{jj}^{-1}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right)\\
&\quad + \left(T^{-1}\varepsilon_j U_j^{0\prime}\left(H_{jj}^{-1}\right)'H_{jj}^{-1}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right) = I + II + III. \tag{B.26}
\end{align*}
Using (B.20), it follows that
\[
I = \left(H_{jj}T^{-1}\left(H_{jj}^{-1}U_j^0 - \widehat U_j\right)\widehat U_j' H_{jj}^{-1}\right)'\left(B_j^{0\prime}\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right) = O_P\left(N^{1/2}T_j^{-1/2}C_{N,T}^{-2}\right);
\]
further, we have
\[
\sum_{i=1}^{N}\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\right)T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right) = O_P\left(NT_j^{-1/2}C_{N,T}^{-2}\right),
\]
\[
\sum_{i=1}^{N}\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\right)\left(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{i,t}u_{j,t}^{0\prime}d_{j,t}\right) = O_P\left(NT^{-1}\right),
\]
whence $II = O_P\left(NT_j^{-1/2}C_{N,T}^{-2}\right)$ and $III = O_P\left(NT^{-1}\right)$. Finally, using the same arguments as above, it holds that $IV$ is dominated. Putting everything together, we conclude that, in (B.17), $II = O_P\left(NT^{-1}\right)$, whence
\[
i_N' M_{B,j}\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t} = O_P\left(N^{1/2}T^{-1/2}\right) + O_P\left(NT^{-1}\right). \tag{B.27}
\]
Putting (B.13), (B.15) and (B.27) together, it follows that, in (B.12),
\[
N^{1/2}\left(\widehat\gamma_{j,0} - \gamma_{j,0}\right) = \left(\frac{i_N' M_{B,j}^0 i_N}{N}\right)^{-1}\left(\frac{i_N' M_{B,j}^0\alpha_j}{N^{1/2}}\right) + O_P\left(N^{1/2}T^{-1}\right) + o_P(1). \tag{B.28}
\]
Recalling that, by Lemma A.8, $\widehat\pi_j - \pi_j = O_P\left(N^{-\eta}T^{-1}\right)$ with $\eta > \frac{1}{2}$, the desired result follows from Assumption 7(iii) and the Cramér-Wold device.
Proof of Theorem 6. The proof is essentially the same as the proof of Theorems 3 and 4 in Horvath and Trapani (2019), and most of it is therefore omitted. In order to understand the role of (5.17), note that
\begin{align*}
M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(s) - \frac{1}{2}\right) &= M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(0) - \frac{1}{2}\right)\\
&\quad + M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(s) - \zeta_m^\theta(0) - \int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx\right)\\
&\quad + M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx.
\end{align*}
By the same passages as in the proof of Theorem 3 in Horvath and Trapani (2019), it is easy to see that
\[
M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(0) - \frac{1}{2}\right) = O_{P^*}(1);
\]
also
\[
E^*\int_{-\infty}^{\infty}\left|M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(s) - \zeta_m^\theta(0) - \int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx\right)\right|^2 ds = o(1),
\]
and
\[
\int_{-\infty}^{\infty}\left|M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx\right|^2 ds = \frac{c_0 M_\theta}{f_{N,T}^{1/2}} = o(1),
\]
after (5.17). The rest of the proof follows immediately.
Table 1: Number of estimated common factors across regimes

Panel: (N,T) = (17,273)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (1.273, 1.006) [27.50, 1.400]    (1.992, 0.997) [0.600, 0.300]
P_S^0 = 2    (1.008, 1.980) [1.400, 1.200]    (2.002, 1.994) [2.100, 0.300]
P_S^0 = 3    (1.003, 2.986) [0.700, 0.700]    (1.700, 2.992) [30.00, 0.400]
P_S^0 = 4    (1.001, 3.939) [0.300, 4.800]    (1.988, 3.970) [0.900, 1.300]
P_S^0 = 5    (0.999, 4.864) [0.100, 11.80]    (1.997, 4.725) [0.300, 25.30]
P_S^0 = 6    (1.000, 4.883) [0.000, 97.60]    (1.997, 4.771) [0.200, 99.80]

Panel: (N,T) = (70,273)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (1.038, 0.992) [5.400, 0.800]    (2.000, 0.996) [1.800, 0.400]
P_S^0 = 2    (1.094, 2.007) [6.900, 0.900]    (1.982, 1.980) [1.000, 1.600]
P_S^0 = 3    (0.994, 2.964) [0.600, 2.000]    (1.980, 2.966) [1.400, 2.200]
P_S^0 = 4    (0.996, 3.952) [0.400, 1.800]    (1.996, 3.938) [0.400, 2.200]
P_S^0 = 5    (0.998, 4.882) [0.200, 4.600]    (1.982, 4.914) [1.400, 3.000]
P_S^0 = 6    (0.990, 5.848) [1.000, 10.40]    (1.980, 5.800) [1.400, 11.60]

Panel: (N,T) = (130,666)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (0.996, 0.998) [0.800, 0.200]    (1.984, 0.996) [1.000, 0.400]
P_S^0 = 2    (0.996, 1.980) [0.400, 1.400]    (1.984, 1.990) [1.000, 0.600]
P_S^0 = 3    (1.000, 2.976) [0.000, 1.000]    (1.980, 2.950) [1.400, 2.000]
P_S^0 = 4    (0.992, 3.966) [0.800, 1.200]    (1.980, 3.958) [1.600, 2.000]
P_S^0 = 5    (0.994, 4.946) [0.600, 2.400]    (1.984, 4.924) [1.000, 2.000]
P_S^0 = 6    (0.994, 5.924) [0.600, 2.000]    (1.980, 5.912) [1.200, 2.200]

Panel: (N,T) = (330,666)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (0.980, 1.000) [2.000, 0.000]    (2.000, 1.000) [0.000, 0.000]
P_S^0 = 2    (0.980, 2.000) [2.000, 0.000]    (2.000, 2.000) [0.000, 0.000]
P_S^0 = 3    (1.000, 3.000) [0.000, 0.000]    (2.000, 3.000) [0.000, 0.000]
P_S^0 = 4    (1.000, 4.000) [0.000, 0.000]    (2.000, 4.000) [0.000, 0.000]
P_S^0 = 5    (1.000, 5.000) [0.000, 0.000]    (2.000, 4.940) [0.000, 2.000]
P_S^0 = 6    (1.000, 6.000) [0.000, 0.000]    (2.000, 5.960) [0.000, 2.000]

The table contains the average numbers of estimated factors and the percentage of times the estimator is wrong. In particular, in each cell the numbers in round brackets are the average values of the estimates of $P_D^0$ and $P_S^0$, respectively. The numbers underneath, in square brackets, represent the percentage of times that each estimator missed the true values $P_D^0$, $P_S^0$.
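As a rough illustration of the kind of Monte Carlo exercise summarised in the table above, the sketch below simulates a two-regime factor model and counts the factors in each regime. It deliberately uses a generic eigenvalue-ratio estimator (Ahn and Horenstein, 2013) as a stand-in: this is not the randomised procedure developed in the paper, and all names, sample sizes and noise levels below are illustrative assumptions.

```python
import numpy as np

def eigenvalue_ratio_estimator(R, p_max=8):
    """Estimate the number of factors in a T x N panel R by maximising the
    ratio of adjacent eigenvalues of the (scaled) sample covariance matrix.
    Illustrative stand-in for the paper's randomised estimator."""
    eigvals = np.linalg.eigvalsh(R.T @ R / (R.shape[0] * R.shape[1]))[::-1]
    ratios = eigvals[:p_max] / eigvals[1:p_max + 1]
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(3)
N, T, PD0, PS0 = 70, 273, 2, 3                   # true factor numbers per regime
d = rng.random(T) < 0.3                          # downside-regime indicator d_{D,t}
BD = rng.standard_normal((PD0, N))               # regime-D loadings
BS = rng.standard_normal((PS0, N))               # regime-S loadings
X = np.empty((T, N))
X[d] = rng.standard_normal((d.sum(), PD0)) @ BD + 0.5 * rng.standard_normal((d.sum(), N))
X[~d] = rng.standard_normal(((~d).sum(), PS0)) @ BS + 0.5 * rng.standard_normal(((~d).sum(), N))

# Split the sample by regime and estimate the number of factors in each.
PD_hat = eigenvalue_ratio_estimator(X[d])
PS_hat = eigenvalue_ratio_estimator(X[~d])
print(PD_hat, PS_hat)
```

With factors this strong, the estimator recovers the true regime-specific factor numbers, mirroring the near-zero miss rates in the larger panels of the table.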
Table 2: Estimation and inference on θ

                  Equity                                          Multi-asset
                  fixed                  estimated                fixed                  estimated
θ                −0.030                 −0.061                   −0.030                 −0.060
T_D               105                    45                       43                     21
π_D               0.158                  0.068                    0.158                  0.077
Corr*(D,REC)      0.330 [0.148,0.490]    0.510 [0.255,0.698]      0.390 [0.102,0.617]    0.550 [0.168,0.788]
Corr(D,REC)       0.166 [−0.026,0.346]   0.244 [−0.053,0.501]     0.194 [−0.112,0.467]   0.287 [−0.165,0.639]
Corr(D,ΔIP)      −0.054 [−0.243,0.139]  −0.084 [−0.368,0.214]    −0.017 [−0.315,0.284]  −0.097 [−0.507,0.349]
Corr(D,VIX)       0.508 [0.351,0.637]    0.538 [0.291,0.718]      0.491 [0.224,0.689]    0.534 [0.133,0.784]
Corr(D,VXO)       0.503 [0.345,0.633]    0.545 [0.300,0.722]      0.489 [0.222,0.688]    0.547 [0.151,0.791]
Corr(D,EPU)       0.166 [−0.026,0.346]   0.293 [0.010,0.540]      0.206 [−0.100,0.476]   0.361 [−0.083,0.685]
Corr(D,EMV)       0.450 [0.283,0.590]    0.525 [0.274,0.709]      0.415 [0.131,0.636]    0.498 [0.085,0.765]
Corr(D,PREMV)     0.426 [0.256,0.571]    0.516 [0.263,0.703]      0.398 [0.111,0.623]    0.490 [0.074,0.760]
Corr(D,GPR)      −0.003 [−0.194,0.188]   0.045 [−0.251,0.334]     0.032 [−0.270,0.329]   0.087 [−0.358,0.499]
Corr(D,TED)       0.163 [−0.029,0.343]   0.210 [−0.089,0.474]     0.187 [−0.120,0.461]   0.239 [−0.214,0.607]
Corr(D,SPX)       0.431 [0.261,0.575]    0.493 [0.234,0.687]      0.398 [0.111,0.623]    0.459 [0.035,0.743]
Corr(D,T10Y2Y)   −0.059 [−0.247,0.134]  −0.010 [−0.302,0.284]     0.046 [−0.257,0.341]   0.114 [−0.334,0.520]

H0: θ0 = −0.03 (Equity):        (0.000) [0.000]
H0: θ0 = −0.03 (Multi-asset):   (0.004) [0.032]
H0: θ0_Eq = θ0_MA:              (0.122) [0.999]

The table contains all the relevant inference on θ in the datasets considered, and we refer to the data description in Section 7.1 for details. For each dataset, we report the estimated value of θ and, as a term of comparison, the exogenously fixed value proposed in Farago and Tedongap (2018). For both values of θ, we also report the proportion of time periods in the downside regime D, denoted as π_D, and the corresponding number of time periods, denoted as T_D. In the bottom half of the table, we report the correlation between the indicator functions implicitly defined by the two values of θ and various other variables, defined as follows. REC is the NBER U.S. recession indicator. ΔIP is the log difference of industrial production. VIX is the CBOE volatility index based on S&P 500 index options. VXO is the CBOE volatility index based on S&P 100 index options. EPU is the economic policy uncertainty index of Baker et al. (2016). EMV is the Equity Market Volatility tracker of Baker, Bloom, and Davis (2016). PREMV is the Policy-Related Equity Market Volatility tracker of Baker, Bloom, Davis, and Kost (2019). GPR is the Geopolitical Risk index of Caldara and Iacoviello (2018). TED is the TED spread. SPX is the disaster probability for the U.S. of Barro and Liao (2020). Finally, T10Y2Y is the spread for the T-bill between 10- and 2-year maturities.
All correlations have been computed using the classical Pearson correlation coefficient; the only exception is the coefficient denoted as Corr*(D,REC), which is the tetrachoric correlation coefficient.
In the last part of the table, we report tests based on Section 5.3.3. The first two tests correspond to the null hypothesis that θ0 equals the fixed value in each of the two datasets. The last test is for the null that θ0 is the same across the two datasets; we refer to the values of θ0 in each dataset as θ0_Eq and θ0_MA, respectively. The numbers in round brackets are the p-values of the test based on S_θ; the numbers in square brackets contain the values of Q_α computed using (5.20) with B = 2,000. In this case, the threshold, as defined in (5.21), is equal to 0.940.
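The distinction drawn in the note above, between the Pearson coefficient and the tetrachoric coefficient for two binary indicators, can be illustrated with the classical cosine (odds-ratio) approximation to the tetrachoric correlation. The sketch below uses made-up indicator series driven by a common latent variable; the data, thresholds and function name are illustrative assumptions, not the paper's series or method.

```python
import numpy as np

def tetrachoric_approx(x, y):
    """Approximate tetrachoric correlation of two 0/1 series from their
    2x2 contingency table via r ~ cos(pi / (1 + sqrt(odds ratio))).
    Assumes no empty cell in the table."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 1) & (y == 0))
    c = np.sum((x == 0) & (y == 1))
    d = np.sum((x == 0) & (y == 0))
    odds = (a * d) / (b * c)
    return np.cos(np.pi / (1.0 + np.sqrt(odds)))

rng = np.random.default_rng(0)
latent = rng.standard_normal(600)
# Two rare-event indicators driven by the same latent state (illustrative).
D = (latent + 0.5 * rng.standard_normal(600) < -1.0).astype(int)
REC = (latent + 0.5 * rng.standard_normal(600) < -0.8).astype(int)

pearson = np.corrcoef(D, REC)[0, 1]
tetra = tetrachoric_approx(D, REC)
print(pearson, tetra)
```

For rare binary events the Pearson (phi) coefficient is mechanically attenuated, while the tetrachoric coefficient targets the correlation of the underlying latent variables; this is consistent with Corr*(D,REC) exceeding Corr(D,REC) in the table.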
Table 3: Number of estimated common factors across regimes

Panel: Equities
Unconditional model, (N,T) = (130,666):
  Common: P = 6, ν = 0.945
  Individual: ν^(1) = 0.833, ν^(2) = 0.050, ν^(3) = 0.028, ν^(4) = 0.019, ν^(5) = 0.009, ν^(6) = 0.007
Conditional model, regime S, (N,T_S) = (130,621):
  Common: P_S = 6, ν_S = 0.928
  Individual: ν_S^(1) = 0.781, ν_S^(2) = 0.068, ν_S^(3) = 0.037, ν_S^(4) = 0.021, ν_S^(5) = 0.012, ν_S^(6) = 0.009
Conditional model, regime D, (N,T_D) = (130,45):
  Common: P_D = 1, ν_D = 0.924
  Individual: ν_D^(1) = 0.924

Panel: Currencies, equities and commodities
Unconditional model, (N,T) = (17,273):
  Common: P = 4, ν = 0.744
  Individual: ν^(1) = 0.518, ν^(2) = 0.165, ν^(3) = 0.061, ν^(4) = 0.054
Conditional model, regime S, (N,T_S) = (17,252):
  Common: P_S = 4, ν_S = 0.654
  Individual: ν_S^(1) = 0.375, ν_S^(2) = 0.190, ν_S^(3) = 0.089, ν_S^(4) = 0.073
Conditional model, regime D, (N,T_D) = (17,21):
  Common: P_D = 2, ν_D = 0.892
  Individual: ν_D^(1) = 0.748, ν_D^(2) = 0.144

The table contains the number of estimated common factors in each regime; each panel corresponds to a different dataset. We refer to the data description in Section 7.1 for details.
Each panel contains inference on the number of common factors in the two regimes: the "normal" regime, denoted as S, and the "downside" regime, denoted as D. Moreover, at the top of each panel, we have estimated the number of common factors for the whole sample, which is tantamount to estimating the number of common factors in a linear specification without thresholds (the "unconditional model"). In each panel, we have estimated the number of common factors using the methodology described in Section 4.2.1.
Each sub-table in the panel (for different regimes and different ways of setting the threshold) contains two types of information. The aggregate information (in the rows headed "Common") contains the estimated number of common factors, P, and the percentage of the variance explained by all the common factors (denoted as ν_j, for j = D, S, using the notation in equation (7.1); similarly, ν refers to the whole-sample estimation). The individual information (in the rows headed "Individual") contains the percentage of the variance explained by the i-th individual common factor (denoted as ν_j^(i), for j = D, S, using the notation in equation (7.2); similarly, ν^(i) refers to the whole-sample estimation).
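The variance shares reported in the table have a standard sample analogue: each share is a leading eigenvalue of the covariance matrix of returns divided by its trace, and ν is the sum of the first P such shares. The sketch below is a minimal illustration on simulated data; it is not the estimation procedure of Section 4.2.1, and all names and settings are illustrative assumptions.

```python
import numpy as np

def variance_shares(R, P):
    """Share of total variance explained by the first P principal
    components (nu) and by each of those components individually (nu_i)."""
    R = R - R.mean(axis=0, keepdims=True)                        # de-mean each series
    eigvals = np.linalg.eigvalsh(np.cov(R, rowvar=False))[::-1]  # descending order
    nu_i = eigvals[:P] / eigvals.sum()                           # per-factor shares
    return nu_i.sum(), nu_i

rng = np.random.default_rng(1)
T, N, P0 = 600, 50, 3
factors = rng.standard_normal((T, P0))
loadings = rng.standard_normal((P0, N))
R = factors @ loadings + 0.3 * rng.standard_normal((T, N))  # 3-factor panel plus noise

nu, nu_i = variance_shares(R, P=P0)
print(nu, nu_i)
```

With a strong three-factor structure, the first three shares absorb almost all of the variance, analogous to the large ν and rapidly decaying ν^(i) values in the table.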
Table 4: Goodness of fit
Panel A: Conditional model with downside risk
RMSPE RMSPE/RMSR R2
Equity Multi-asset Equity Multi-asset Equity Multi-asset
P = P P = P P = P P = P P = P P = P
Realized returns
Regime D 1.323 1.576 0.134 0.251 0.982 0.928
Regime S 0.102 0.257 0.070 0.307 0.994 0.874
Full sample 0.184 0.358 0.074 0.303 0.993 0.878
Panel B: Linear, unconditional model
Equity Multi-asset
RMSPE
Realized Returns P = P P = 1 P = 5 P = 10 P = 15 P = P P = 1 P = 5 P = 10 P = 15
Regime D 10.570 10.575 10.570 10.570 10.570 6.517 6.519 6.516 6.514 6.512
Regime S 0.773 0.797 0.773 0.771 0.770 0.579 0.588 0.565 0.559 0.544
Full Sample 0.103 0.229 0.103 0.081 0.075 0.204 0.231 0.161 0.133 0.037
RMSPE/RMSR
Realized Returns P = P P = 1 P = 5 P = 10 P = 15 P = P P = 1 P = 5 P = 10 P = 15
Regime D 1.067 1.067 1.067 1.067 1.067 1.036 1.037 1.036 1.036 1.036
Regime S 0.533 0.549 0.533 0.531 0.531 0.690 0.701 0.672 0.665 0.648
Full Sample 0.145 0.323 0.146 0.113 0.105 0.513 0.582 0.405 0.333 0.092
R2
Realized Returns P = P P = 1 P = 5 P = 10 P = 15 P = P P = 1 P = 5 P = 10 P = 15
Regime D -0.044 -0.003 -0.035 -0.079 -0.126 -0.329 -0.064 -0.450 -1.657 -14.936
Regime S 0.196 0.208 0.203 0.172 0.136 -0.113 0.092 -0.181 -1.134 -11.437
Full Sample 0.978 0.895 0.978 0.986 0.988 0.650 0.639 0.762 0.704 0.865
Realized returns in regime D, regime S and the full sample are computed as $\bar R_D = T_D^{-1}\sum_{t=1}^{T} d_{D,t}R_t$, $\bar R_S = T_S^{-1}\sum_{t=1}^{T} d_{S,t}R_t$ and $\bar R = T^{-1}\sum_{t=1}^{T} R_t$, respectively. Predicted returns are computed as: $X_D\widehat\Gamma_D$ in regime D; $X_S\widehat\Gamma_S$ in regime S; $X\widehat\Gamma$ in the unconditional model; $\left[(T_D/T)X_D\widehat\Gamma_D + (T_S/T)X_S\widehat\Gamma_S\right]$ in the average model.
RMSPE is computed as: $\sqrt{\alpha_D'\alpha_D/N}$, where $\alpha_D = \bar R_D - X_D\widehat\Gamma_D$, in regime D when the conditional model D is used; $\sqrt{\alpha_{DU}'\alpha_{DU}/N}$, where $\alpha_{DU} = \bar R_D - X\widehat\Gamma$, in regime D when the unconditional model is used; $\sqrt{\alpha_S'\alpha_S/N}$, where $\alpha_S = \bar R_S - X_S\widehat\Gamma_S$, in regime S when the conditional model S is used; $\sqrt{\alpha_{SU}'\alpha_{SU}/N}$, where $\alpha_{SU} = \bar R_S - X\widehat\Gamma$, in regime S when the unconditional model is used; $\sqrt{\alpha_{DS}'\alpha_{DS}/N}$, where $\alpha_{DS} = \bar R - \left[(T_D/T)X_D\widehat\Gamma_D + (T_S/T)X_S\widehat\Gamma_S\right]$, in the whole sample when the average model is used; $\sqrt{\alpha'\alpha/N}$, where $\alpha = \bar R - X\widehat\Gamma$, in the whole sample when the unconditional model is used.
RMSPE/RMSR is computed as: $\sqrt{\alpha_D'\alpha_D}/\sqrt{\bar R_D'\bar R_D}$ in regime D when the conditional model D is used; $\sqrt{\alpha_{DU}'\alpha_{DU}}/\sqrt{\bar R'\bar R}$ in regime D when the unconditional model is used; $\sqrt{\alpha_S'\alpha_S}/\sqrt{\bar R_S'\bar R_S}$ in regime S when the conditional model S is used; $\sqrt{\alpha_{SU}'\alpha_{SU}}/\sqrt{\bar R'\bar R}$ in regime S when the unconditional model is used; $\sqrt{\alpha_{DS}'\alpha_{DS}}/\sqrt{\bar R'\bar R}$ in the whole sample when the average model is used; $\sqrt{\alpha'\alpha}/\sqrt{\bar R'\bar R}$ in the whole sample when the unconditional model is used.
$R^2$ is computed as: $1 - \left(\alpha_D'\alpha_D/\bar R_D'\bar R_D\right)\left[(N-1)/(N-\widehat P_D)\right]$ in regime D when the conditional model D is used; $1 - \left(\alpha_{DU}'\alpha_{DU}/\bar R'\bar R\right)\left[(N-1)/(N-\widehat P)\right]$ in regime D when the unconditional model is used; $1 - \left(\alpha_S'\alpha_S/\bar R_S'\bar R_S\right)\left[(N-1)/(N-\widehat P_S)\right]$ in regime S when the conditional model S is used; $1 - \left(\alpha_{SU}'\alpha_{SU}/\bar R'\bar R\right)\left[(N-1)/(N-\widehat P)\right]$ in regime S when the unconditional model is used; $1 - \left(\alpha_{DS}'\alpha_{DS}/\bar R'\bar R\right)\left[(N-1)/(N-\min\{\widehat P_D,\widehat P_S\})\right]$ in the whole sample when the average model is used; $1 - \left(\alpha'\alpha/\bar R'\bar R\right)\left[(N-1)/(N-\widehat P)\right]$ in the whole sample when the unconditional model is used.
Table 5: Estimated risk premia

Estimated risk premia (third-stage estimation)

Realized returns:   Full Sample               Regime D                  Regime S
Model:              Unconditional             Conditional D             Conditional S
                    Equity      Multi-asset   Equity      Multi-asset   Equity      Multi-asset
ln(Rm)              0.080       0.461        −1.793**    −5.124***      0.720***    0.702*
                    (0.254)     (0.401)      (0.720)     (1.767)        (0.220)     (0.392)
SMB                 0.241*      0.027        −1.464***   −3.021***      0.399***    0.397**
                    (0.144)     (0.152)      (0.428)     (0.792)        (0.152)     (0.186)
HML                 0.278**    −0.068        −0.444      −1.724         0.186      −0.256*
                    (0.135)     (0.111)      (0.351)     (1.557)        (0.126)     (0.153)
RMW                 0.091      −0.130        −0.092      −1.000*       −0.004      −0.315*
                    (0.098)     (0.117)      (0.246)     (0.547)        (0.088)     (0.179)
CMA                 0.172*     −0.083        −0.214      −0.998         0.073      −0.163*
                    (0.091)     (0.076)      (0.219)     (0.723)        (0.081)     (0.093)
Mom                 0.644***   −0.071        −0.469      −2.380         0.592***   −0.079
                    (0.162)     (0.152)      (0.544)     (2.183)        (0.198)     (0.205)
Liq                 0.002       0.003        −0.025      −0.086         0.002**     0.002
                    (0.001)     (0.002)      (0.016)     (0.054)        (0.001)     (0.002)
T10Y2Y              0.000      −0.003        −0.008       0.030         0.016       0.009
                    (0.011)     (0.024)      (0.106)     (0.225)        (0.015)     (0.034)

Estimated γ_{j,0} (second-stage estimation)

Intercept           0.467**    −0.150        −7.381***   −2.474***      0.488***    0.360
                    (0.190)     (0.285)      (0.435)     (0.919)        (0.175)     (0.334)

Confidence interval for π_D γ_{D,0} + π_S γ_{S,0}:   Equity (−0.375, 0.287);   Multi-asset (−0.490, 0.774)

N                   130         17            130         17             130         17
T                   666         273           45          21             621         252

The table contains the estimates of the risk premia for different datasets. We use the shorthand "Equity" to refer to the dataset employed by Farago and Tedongap (2018), and "Multi-asset" to refer to the dataset employed by Lettau, Maggiori, and Weber (2014) (after removing 18 portfolios corresponding to options).
In Panel A, we estimate the impact of the unobservable factors. In Panel B, we consider the risk premia associated with commonly employed factors, using the full three-pass methodology.
In each panel, we estimate the model in each separate regime, and in the corresponding (misspecified) linear model.
Figure 1: Predicted and expected returns

[Figure: scatter plots of realized returns (%) against predicted returns (%) for the conditional and unconditional models. Panels: equities in the downside regime, upside regime, and full sample; multi-asset in the downside regime, upside regime, and full sample.]