Factor models with downside risk
by
Daniele Massacci, Lucio Sarno and Lorenzo Trapani
Granger Centre Discussion Paper No. 20/02
Factor Models with Downside Risk*
Daniele Massacci† Lucio Sarno‡ Lorenzo Trapani§
This Version: November 2020
*The usual disclaimer applies.
†Daniele Massacci is with King’s College London. Email: [email protected].
‡Lucio Sarno is with the University of Cambridge and the Centre for Economic Policy Research (CEPR). Email: [email protected].
§Lorenzo Trapani is with the University of Nottingham. Email: [email protected].
We propose a conditional model of asset returns in the presence of common factors and downside risk. Specifically, we generalize existing latent factor models in three ways: we allow for downside risk via a threshold specification which permits estimation of the ‘disappointment’ event, which therefore does not need to be assumed arbitrarily; we permit different factor structures (and numbers of factors) in different regimes; and we show how to recover the observable factors’ risk premia from the estimated latent ones in different regimes. The usefulness of this generalized model is illustrated through two applications to cross-sections of asset returns in equity markets and other major asset classes.
1 Introduction
A growing body of research argues that the cross-section of asset returns should and does
reflect a premium for bearing downside risk. This risk premium is required by investors for
holding assets that covary strongly with the market when the market declines or is abnormally
low: such assets are undesirable precisely because they offer particularly low returns in bad times.
For example, Ang, Chen, and Xing (2006) show that there is a downside risk premium in the
cross-section of US stock returns, a result confirmed by Farago and Tedongap (2018) using a
five-factor conditional model, while Lettau, Maggiori, and Weber (2014) show that a conditional
capital asset pricing model that allows for downside risk can explain not only US stocks, but also
the cross section of returns in currency and other major asset classes. These results are consistent
with the original conceptual framework in Markowitz (1959), which advocated using semivariance
as a measure of risk, rather than variance, because semivariance measures downside losses rather
than upside gains.1
In a parallel development in the context of asset pricing models, the literature has also sought
to tackle the vexing issue of potentially biased estimation of unconditional (i.e., linear, and hence
free from downside risk) asset pricing models arising from omitting pricing factors. Since it is not
at all clear which pricing factors should be used, virtually any model which uses observable factors
is potentially at risk of omitting a relevant pricing factor, thereby suffering from omitted variable
bias. This misspecification problem is addressed rigorously in a recent, seminal contribution by
Giglio and Xiu (2020), who propose a three-pass procedure to conduct inference on the risk premia
of virtually any observable factor. The procedure solves both the omitted variable bias problem,
and the much debated “factor zoo” issue (see Harvey and Liu, 2020 and Cochrane, 2011), by
considering a latent factor model, which makes it possible to span the underlying true factor space.
1The motivation for a downside risk premium arises from various theoretical perspectives. In some cases, the rationale is behavioral and linked to theories of loss aversion, as in the setups of Ang, Chen, and Xing (2006) and Farago and Tedongap (2018), whereas in other cases it is based on rational asset pricing models, as in Lettau, Maggiori, and Weber (2014). The specific nature of the mechanism capable of generating the downside risk premium is not a focus of our paper. We simply take seriously the finding that such a premium can exist in equilibrium and ask how it should be identified in an empirical asset pricing model.
The two research areas summarised above indicate that a linear asset pricing model based on
observable factors, which constitutes the workhorse model of empirical asset pricing, can suffer
from at least two possible misspecification issues: the omission of relevant pricing factors, and the
presence of a downside regime. Both these issues can be resolved by using a nonlinear, latent factor
model which allows for the presence of downside risk. However, despite the intuitive appeal of such
a setup, inference for asset pricing models with downside risk and latent factors is a challenging
task. In addition to the nonlinearity of the model, allowing for different regimes means that the
number of common latent factors may be different between good and bad states of the world; the
loadings on the factors may also change across states; the “disappointment event” that triggers the
downside state is in general unknown; and, finally, a solution to the omitted variable problem which
characterises linear asset pricing models needs to be fully developed when also taking downside
risk into account. Our paper offers a methodological contribution, addressing all these issues and
presenting a nonlinear model of asset returns in the presence of common factors and downside
risk; further, we consider a latent factor model approach, which makes it possible to span the entire factor
space as advocated in Giglio and Xiu (2020). Our setup is general enough to encompass previous
models, while allowing for different common factors across states and estimation of the threshold
determining the downside state.
We make at least three contributions to the literature on factor models of asset returns.
Firstly, whilst we define the disappointment event as depending on the relative position of the
market return with respect to a threshold value in a similar fashion to the extant literature,
unlike previous studies we estimate the value of the threshold rather than setting it a priori
equal to a prespecified value. This is achieved by generalizing the inferential theory for panel
threshold models proposed by Massacci (2017) to the case where the number of common factors
and their loadings are allowed to differ across states; we also derive the relevant inferential theory
for the estimated threshold. Secondly, we develop a procedure to estimate the number of common
factors in a model with downside risk, by explicitly allowing for the case where such number may
differ across regimes. This is done in a similar fashion to Trapani (2018), although this is not a
trivial extension since it requires a preliminary round of estimation where the full factor model
is estimated (thereby including the threshold, and the spaces spanned by factors and loadings),
and the strong consistency of all estimates is required. We further provide test statistics that are
completely scale-invariant. Finally, we complete our inferential theory by developing a further
step, to allow for the mapping of the estimated latent factors in each stage onto economically
relevant observable ones. This step can be viewed as an extension of Giglio and Xiu (2020). The
distilled essence of these results is a general conditional model of asset returns with downside risk
which can be estimated and tested without imposing any arbitrary restrictions on the stochastic
process of returns. In our view, this setup constitutes at least a workhorse model for conducting
empirical asset pricing in a downside risk framework.
We employ our methodology to study the factor structure and estimate risk premia in the
presence of downside risk using two datasets previously employed in other studies - a dataset
composed of equity portfolios (the same as used by Farago and Tedongap (2018)) and a dataset
spanning multiple asset classes (based on the one used by Lettau, Maggiori, and Weber (2014)).
In both cases, our approach produces superior goodness of fit relative both to unconditional
(linear) factor models and restricted conditional models that impose the number of factors or the
threshold. The empirical estimation of the proposed latent factor model with downside risk proves
illuminating when compared to an unconditional model of asset returns. Indeed, the unconditional
model performs poorly in both states of the world, with systematic pricing errors that are positive
in the downside state and negative in the satisfactory state - i.e., the unconditional model predicts
much higher returns than the realized ones in the downside state, and lower returns than the
realized ones in the satisfactory state. Puzzlingly, when analyzing the full sample performance
of the unconditional model, its performance can appear satisfactory, but this is only because
averaging across negative and positive pricing errors in different states gives the false impression
that the model can capture the average excess returns precisely. In other words, assessing the
performance of the unconditional factor model over the full sample can give the illusion that it
performs well when in fact it performs poorly in both regimes. In contrast, the latent factor model
with downside risk appears to have no systematic tendency to generate positive or negative errors
in either regime, and therefore also unconditionally, fitting the cross-sections of asset returns used
here very well.
The estimation results highlight the importance of estimating the threshold (rather than im-
posing it a priori) and of allowing for different common factors across states. For example, with
respect to the estimation of the threshold, we find a remarkable similarity in its point estimate
across the two datasets employed, with the estimate being around −6 percent. This estimate is
twice as large in absolute value as the most common value imposed in estimation by researchers,
which is −3 percent, thereby implying that the downside risk state is characterised by a much
smaller fraction of the return distribution with more extreme negative returns. We also find that,
while at least four (and up to six) common factors are needed in the satisfactory (good) state
of the world to capture the factor structure of returns, only one or at most two common factors
are needed in the downside risk state. This is consistent with the widely held view, and much
anecdotal evidence, that returns display a far lower-dimensional factor structure in bad times than
in good times, i.e. that diversification benefits disappear in bad times when they are needed most
(see e.g. Cappiello et al., 2006). This result arises endogenously in the model, estimated using
our procedure that explicitly allows the factors to change across states. In short, the empirical
results provide a strong case for the full-blown, unconstrained estimation of latent factor models
with downside risk in the context of asset pricing.
The remainder of the paper is organised as follows. Section 2 introduces the asset pricing
model. Section 3 deals with identification of risk premia. Section 4 presents our methodology to
determine the number of pricing factors. Section 5 discusses estimation and inference of pricing
factors, risk exposures and risk premia. Section 6 presents simulation results. Section 7 performs
the empirical analysis. Section 8 concludes.
2 Model
In this section we present the setup of the unconditional asset pricing model that is most
commonly used in the literature, followed by the conditional asset pricing model with downside
risk that we study in this paper.
At the outset, it is useful to establish some notation. We denote the ordinary limit as “→”. We use the notation $c_0, c_1, \ldots$ to denote positive and finite constants, which do not depend on the sample sizes and whose value can change from line to line. $I(\cdot)$ denotes the indicator function. We use the expression “a.s.” as short-hand for “almost surely”. We denote the $j$-th largest eigenvalue of a matrix $A$ as $g^{(j)}(A)$; $\iota_N$ denotes an $N$-dimensional vector of ones; $I_k$ is an identity matrix of dimension $k$; we use the notation $a_N = \Omega(b_N)$ to indicate that, as $N \to \infty$, $b_N^{-1} a_N \to \infty$; we set $C_{N,T} = \min\{N^{1/2}, T^{1/2}\}$.
2.1 Unconditional asset pricing model
Let $R_{i,t}$ be the return (in excess of the risk-free rate) on the test asset $i$ at time $t$. Our starting point is the unconditional asset pricing model
$$R_{i,t} = \gamma_0 + \alpha_i + \beta_i' \gamma_1 + \beta_i' u_t + \varepsilon_{i,t}, \qquad (2.1)$$
with $1 \le i \le N$, $1 \le t \le T$, where $N$ and $T$ denote the cross-section and time series dimensions of the available sample, respectively, and
$$u_t = f_t - \mu_f. \qquad (2.2)$$
In (2.2), $f_t = (f_{1,t}, \ldots, f_{P,t})'$ is the $P \times 1$ vector of latent factors with $E(f_t) = \mu_f$, so that $u_t$ in (2.1) represents the $P \times 1$ zero mean vector of factor innovations; further, in (2.1), $\gamma_0$ is the zero-beta rate; $\gamma_1 = (\gamma_{1,1}, \ldots, \gamma_{1,P})'$ is the $P \times 1$ vector of risk premia; $\beta_i = (\beta_{i,1}, \ldots, \beta_{i,P})'$ is the $P \times 1$ vector of risk exposures; $\alpha_i$ is the pricing error; $\varepsilon_{i,t}$ is the idiosyncratic component such that $E(\varepsilon_{i,t}) = 0$ and $E(u_t \varepsilon_{i,t}) = 0$. In matrix notation, the model becomes
$$R_t = \gamma_0 \iota_N + \alpha + B\gamma_1 + Bu_t + \varepsilon_t, \qquad (2.3)$$
where $R_t = (R_{1,t}, \ldots, R_{N,t})'$, $\alpha = (\alpha_1, \ldots, \alpha_N)'$, and $B = (\beta_1, \ldots, \beta_N)'$. Following Giglio and Xiu (2020), we let some of the true factors be omitted or measured with error (or both). The $K$ observable factors $g_t$, with $K \le P$, can be either tradable or nontradable (or both) and have the structure
$$g_t = a + \Lambda u_t + e_t. \qquad (2.4)$$
As shown in Giglio and Xiu (2020), given $\gamma_1$, the $K \times 1$ vector of risk premia of $g_t$ is $\gamma_g = \Lambda \gamma_1$.
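The structure in (2.1)-(2.4) can be made concrete with a minimal numpy simulation; all dimensions, parameter values, and the seed below are illustrative choices of ours, not quantities from the paper.

```python
import numpy as np

# Minimal simulation of the unconditional model (2.1)-(2.4).
rng = np.random.default_rng(0)
N, T, P, K = 50, 200, 3, 2              # assets, periods, latent factors, observables

gamma0 = 0.01                            # zero-beta rate
gamma1 = np.array([0.05, 0.03, 0.02])    # P x 1 vector of risk premia
B = rng.normal(size=(N, P))              # risk exposures
alpha = 0.001 * rng.normal(size=N)       # pricing errors

u = rng.normal(size=(T, P))              # factor innovations, E(u_t) = 0
eps = 0.1 * rng.normal(size=(T, N))      # idiosyncratic component

# R_t = gamma0 * iota_N + alpha + B gamma1 + B u_t + eps_t          (2.3)
R = gamma0 + alpha + B @ gamma1 + u @ B.T + eps

# Observable factors g_t = a + Lambda u_t + e_t                      (2.4)
Lam = rng.normal(size=(K, P))
g = 0.01 + u @ Lam.T + 0.05 * rng.normal(size=(T, K))

# Risk premia of the observables: gamma_g = Lambda gamma1
gamma_g = Lam @ gamma1
print(R.shape, g.shape, gamma_g.shape)
```

The rows of `R` stack the $N \times 1$ return vectors $R_t$, so the panel has one column per test asset.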
2.2 Conditional asset pricing model with downside risk
Following Ang, Chen, and Xing (2006), Lettau, Maggiori, and Weber (2014), and Farago and
Tedongap (2018), we generalize the asset pricing model in (2.1) to account for downside risk. The
conditional factor model of interest is
$$R_{i,t} = I(D_t)\big(\gamma_{D,0} + \alpha_{D,i} + \beta_{D,i}'\gamma_{D,1} + \beta_{D,i}' u_{D,t}\big) + I(S_t)\big(\gamma_{S,0} + \alpha_{S,i} + \beta_{S,i}'\gamma_{S,1} + \beta_{S,i}' u_{S,t}\big) + \varepsilon_{i,t}, \qquad (2.5)$$
where
$$u_{j,t} = f_{j,t} - \mu_{j,f}, \qquad j = D, S. \qquad (2.6)$$
The model in (2.5) generalizes (2.1) to allow for downside risk as captured by the disappointment event $D_t$, which is defined formally below. The satisfactory event $S_t$ occurs whenever $D_t$ does not take place: formally, this means that $I(D_t) + I(S_t) = 1$ and $I(D_t) I(S_t) = 0$. Section 2.3 below discusses $D_t$ and $S_t$ in detail. In (2.6), $f_{j,t} = (f_{j,1,t}, \ldots, f_{j,P_j,t})'$ is the $P_j \times 1$ vector of latent factors satisfying $E[f_{j,t} \mid I(j_t) = 1] = \mu_{j,f}$; $u_{j,t}$ is the $P_j \times 1$ vector of factor innovations such that $E[u_{j,t} \mid I(j_t) = 1] = 0$. As for (2.5): $\gamma_{j,0}$ is the zero-beta rate and $\gamma_{j,1} = (\gamma_{j,1,1}, \ldots, \gamma_{j,1,P_j})'$ is the $P_j \times 1$ vector of risk premia; $\beta_{j,i} = (\beta_{j,i,1}, \ldots, \beta_{j,i,P_j})'$ is the $P_j \times 1$ vector of risk exposures; $\alpha_{j,i}$ is the pricing error; the idiosyncratic components $\varepsilon_{i,t}$ satisfy $E[u_{j,t}\varepsilon_{i,t} \mid I(j_t) = 1] = 0$. The model in matrix form is
$$R_t = I(D_t)\big(\gamma_{D,0}\iota_N + \alpha_D + B_D\gamma_{D,1} + B_D u_{D,t}\big) + I(S_t)\big(\gamma_{S,0}\iota_N + \alpha_S + B_S\gamma_{S,1} + B_S u_{S,t}\big) + \varepsilon_t, \qquad (2.7)$$
where $\alpha_j = (\alpha_{j,1}, \ldots, \alpha_{j,N})'$ and $B_j = (\beta_{j,1}, \ldots, \beta_{j,N})'$. We generalize the mapping between the true latent factors and the $K$ observable factors as
$$g_t = I(D_t)\big(a_D + \Lambda_D u_{D,t}\big) + I(S_t)\big(a_S + \Lambda_S u_{S,t}\big) + e_t. \qquad (2.8)$$
The $K \times 1$ vector of risk premia of $g_t$ becomes
$$\gamma_g = I(D_t)\gamma_{D,g} + I(S_t)\gamma_{S,g}, \qquad \gamma_{j,g} = \Lambda_j \gamma_{j,1}, \quad j = D, S. \qquad (2.9)$$
When the disappointment event occurs we thus have $\gamma_g = \gamma_{D,g} = \Lambda_D \gamma_{D,1}$.
2.3 Disappointment event
Following, inter alia, Farago and Tedongap (2018), we define the disappointment event as $D_t = \{r_{W,t} \le \theta\}$, where $r_{W,t}$ is the log-return on the market and $\theta$ is the corresponding threshold value that determines the occurrence of $D_t$. The satisfactory event then is $S_t = \{r_{W,t} > \theta\}$. Both $D_t$ and $S_t$ depend on $\theta$: to stress this dependence, we let $d_{j,t}(\theta) = I(j_t)$, for $j = D, S$.
2.4 Pricing errors and zero-beta rate
The conditional asset pricing model in (2.7) allows for unrestricted zero-beta rates γD,0 and
γS ,0, and for non-zero pricing errors αD and αS . None of these invalidates our methodology.
In particular, non-zero pricing errors, which for instance may arise from failing no arbitrage
conditions, do not affect estimation and inference on the risk premia.2 We focus on the risk premia
in a conditional setting with downside risk: allowing for some degree of model misspecification is
a desirable feature of our procedure.
3 Identification
3.1 Identification of risk premia
The conditional factor model in (2.7) is subject to two kinds of identification issues: the
rotational indeterminacy typical of statistical factor models; the unobserved heterogeneity problem
induced by the pricing errors. Let us elaborate on these identification issues. Risk exposures,
pricing factors and risk premia in the linear model in (2.3) are identified up to a rotation: for any
P ×P positive definite matrix L we have
Rt = γ0ιN +α+Bγ1 +But +εt
= γ0ιN +α+BLL−1γ1 +BLL−1ut +εt ,
and $B$, $\gamma_1$ and $u_t$ are only identified up to $L$ as they are all unobservable. However, as shown in Giglio and Xiu (2020), the risk premia for the observable factors $g_t$ in (2.4) are identified, since it holds that $\gamma_g = \Lambda\gamma_1 = \Lambda L L^{-1}\gamma_1$.

2We do not assess a particular asset pricing model: for example, Pesaran and Yamagata (2012) and Fan, Liao, and Yao (2013) empirically validate the APT in a linear setting by testing the null hypothesis that all pricing errors are equal to zero.
The conditional pricing model in (2.5) faces similar identification issues. For any $P_j \times P_j$ positive definite matrix $L_j$, with $j = D, S$, we have
$$R_t = d_{D,t}(\theta)\big(\gamma_{D,0}\iota_N + \alpha_D + B_D L_D L_D^{-1}\gamma_{D,1} + B_D L_D L_D^{-1} u_{D,t}\big) + d_{S,t}(\theta)\big(\gamma_{S,0}\iota_N + \alpha_S + B_S L_S L_S^{-1}\gamma_{S,1} + B_S L_S L_S^{-1} u_{S,t}\big) + \varepsilon_t.$$
The risk premia $\gamma_g$ defined in (2.9), however, are identified, since
$$\gamma_g = d_{D,t}(\theta)\Lambda_D\gamma_{D,1} + d_{S,t}(\theta)\Lambda_S\gamma_{S,1} = d_{D,t}(\theta)\Lambda_D L_D L_D^{-1}\gamma_{D,1} + d_{S,t}(\theta)\Lambda_S L_S L_S^{-1}\gamma_{S,1}.$$
We solve the unobserved heterogeneity problem induced by the pricing errors by expressing the model for $R_t$ in terms of deviations from the conditional means. Formally, let $T_j(\theta) = \sum_{t=1}^T d_{j,t}(\theta)$ be the number of times that the event $j$ occurs, for $j = D, S$. Define the conditional average returns $\bar R_j(\theta) = T_j(\theta)^{-1}\sum_{t=1}^T d_{j,t}(\theta) R_t$: $\bar R_j(\theta)$ is the mean of $R_t$ when $j$ occurs. In order to estimate pricing factors and risk exposures, we consider the model
$$\tilde R_t(\theta) = R_t - d_{D,t}(\theta)\bar R_D(\theta) - d_{S,t}(\theta)\bar R_S(\theta) = d_{D,t}(\theta) B_D \tilde u_{D,t}(\theta) + d_{S,t}(\theta) B_S \tilde u_{S,t}(\theta) + \tilde\varepsilon_t, \qquad (3.1)$$
where $\tilde u_{j,t}(\theta) = u_t - \bar u_j(\theta)$ is the deviation of $u_t$ from the conditional mean $\bar u_j(\theta) = T_j(\theta)^{-1}\sum_{t=1}^T d_{j,t}(\theta) u_t$; $\tilde\varepsilon_t = d_{D,t}(\theta)[\varepsilon_t - \bar\varepsilon_D(\theta)] + d_{S,t}(\theta)[\varepsilon_t - \bar\varepsilon_S(\theta)]$, where $\bar\varepsilon_j(\theta) = T_j(\theta)^{-1}\sum_{t=1}^T d_{j,t}(\theta)\varepsilon_t$ is the conditional mean of $\varepsilon_t$ when $j$ occurs.
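The regime-wise demeaning in (3.1) can be sketched directly; the data below are simulated placeholders, and the threshold value is illustrative.

```python
import numpy as np

# Regime-wise demeaning of returns as in (3.1): subtract from R_t the
# conditional mean of the regime prevailing at time t.
rng = np.random.default_rng(2)
N, T = 25, 400
R = rng.normal(size=(T, N))            # T x N panel of excess returns (placeholder)
r_W = rng.normal(scale=0.04, size=T)   # threshold variable (market log-return)
theta = -0.06

d_D = (r_W <= theta).astype(float)
d_S = 1.0 - d_D
T_D, T_S = d_D.sum(), d_S.sum()        # T_j(theta): occurrences of each event

Rbar_D = (d_D[:, None] * R).sum(axis=0) / T_D   # conditional mean, downside
Rbar_S = (d_S[:, None] * R).sum(axis=0) / T_S   # conditional mean, satisfactory

# R~_t(theta) = R_t - d_{D,t}(theta) Rbar_D - d_{S,t}(theta) Rbar_S
R_tilde = R - d_D[:, None] * Rbar_D - d_S[:, None] * Rbar_S

# Within each regime, the demeaned returns average to (numerically) zero
print(np.abs((d_D[:, None] * R_tilde).sum(axis=0) / T_D).max())
```

The demeaning removes both the zero-beta rate and the pricing errors from each regime, which is exactly why the unobserved heterogeneity no longer interferes with the estimation of factors and exposures.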
3.2 Determining the disappointment threshold
From Section 3.1, identification of factors, exposures and risk premia requires knowledge of
the disappointment event Dt and thus of the threshold θ. Given that no procedure is available to
estimate θ, the literature typically assumes a value for θ and estimation of the asset pricing model
is then conditioned on that value being true.3 In general, however, θ is not a structural parameter
on which theory can provide guidance, which makes it difficult to assume a particular value of θ.
In this paper, we show how to estimate $\theta$ by least squares, as detailed in Section 4.2.1, building on recent results in Massacci (2017). Henceforth, we let $\hat\theta$ be the corresponding estimator.
3For example, Farago and Tedongap (2018) a priori set $\theta = -0.03$ and run robustness checks for $\theta \in \{0, -0.015, -0.04\}$.
4 Inference
We begin by introducing some short-hand notation, which we use henceforth. The superscript “0” will denote the true value of a parameter; when this is not used, we refer to a generic value. We will also use the short-hand notation $T_j = T_j(\theta^0)$ and $\pi_j = \pi_j(\theta^0)$; further, the true number of factors in each regime will be denoted as $P_D^0$ and $P_S^0$ respectively, also setting, when needed, $P^0 = P_D^0 + P_S^0$.
4.1 Assumptions
Let
$$u_t^0 = u_{D,t}^0 \cup u_{S,t}^0, \qquad (4.1)$$
where $u_t^0$ has dimension $P_u^0$ such that $\max(P_D^0, P_S^0) \le P_u^0 \le P_D^0 + P_S^0$, and let $B_{u,j}^0 = (\beta_{u,j,1}^0, \ldots, \beta_{u,j,N}^0)'$ be, for $j = D, S$, the $N \times P_u^0$ matrix suitably filled with zeros so that $B_{u,j}^0 u_t^0 = B_j^0 u_{j,t}^0$ for $j = D, S$. Finally, let
$$\delta_i^0 = \beta_{u,S,i}^0 - \beta_{u,D,i}^0. \qquad (4.2)$$

Assumption 1. There exists an $\eta \in \left(\frac{1}{2}, 1\right]$ such that $\delta_i^0 \ne 0$ for $i = 1, \ldots, \lfloor N^\eta \rfloor$ and $\sum_{i=\lfloor N^\eta \rfloor + 1}^N \|\delta_i^0\| < \infty$.
Assumption 1 is related to the results in Bates, Plagborg-Møller, Stock, and Watson (2013), who show that, if no more than a fraction $O(N^{1/2})$ of the cross-sectional units in a large dimensional factor model have a structural break in the loadings, then the principal components estimator applied to a (misspecified) linear model has the same convergence rate as in the no break case, namely $O_p(C_{N,T}^{-1})$ (see Bai and Ng (2002)). Thus, Assumption 1 requires that at least a fraction $O(N^\eta)$ of the $N$ assets have regime specific factor loadings, for $\frac{1}{2} < \eta \le 1$.
Assumption 2. It holds that, for all $N$: (i)(a) the largest eigenvalue of $T_j^{-1}\sum_{t=1}^T E\big(\varepsilon_t \varepsilon_t' d_{j,t}(\theta^0)\big)$ is finite for $j = D, S$, and (b) the smallest eigenvalue of $T_j^{-1}\sum_{t=1}^T E\big(\varepsilon_t \varepsilon_t' d_{j,t}(\theta^0)\big)$ is positive for $j = D, S$; (ii)
$$\max_{1 \le i \le N} \sum_{k=1}^N \frac{1}{T_j} \sum_{t=1}^T \sum_{s=1}^T \big| E\big(\varepsilon_{i,t}\varepsilon_{k,s}\, d_{j,t}(\theta^0)\, d_{j,s}(\theta^0)\big) \big| < \infty,$$
for $j = D, S$.
Assumption 3. It holds that (i) $E\|f_{j,t}\|^4 < \infty$ for $j = D, S$ and $1 \le t \le T$; (ii) as $T \to \infty$,
$$\frac{1}{T}\sum_{t=1}^T E\big(f_{j,t}^0 f_{j,t}^{0\prime}\, d_{j,t}(\theta)\, d_{j,t}(\theta^0)\big) \to \Sigma_{f,j}(\theta, \theta^0),$$
where $\Sigma_{f,j}(\theta, \theta^0)$ is positive definite for $j = D, S$ and all $\theta$; (iii) $\|\beta_{j,i}^0\| < \infty$ for $j = D, S$ and $1 \le i \le N$; (iv) $N^{-1} B_j' B_j \to \Sigma_{B,j}$ as $N \to \infty$, where $\Sigma_{B,j}$ are positive definite matrices for $j = D, S$.
Assumption 2 stipulates that the idiosyncratic part of the model (i.e., the errors $\varepsilon_{i,t}$) can be cross-sectionally correlated; however, by part (i) of the assumption such cross-sectional correlation is weak. Conversely, by Assumption 3, the “signal” part of the model (i.e., the one related to the common factor structure) introduces strong cross-sectional correlation between the $R_{i,t}$’s. In particular, as we show in Lemma 1 below, Assumption 3 entails that the eigenvalues of the covariance matrix of the $R_{i,t}$’s have a spiked structure, whereby the largest eigenvalues diverge proportionally to $N$ whereas the other ones are bounded.
Assumption 4. It holds that (i) $E(\varepsilon_{i,t}) = 0$ and $E|\varepsilon_{i,t}|^8 < \infty$; (ii) $E\big(d_{j,t}(\theta) d_{j,v}(\theta)\varepsilon_{i,t}\varepsilon_{i,v}\big) = \tau_{j,i,t,v}(\theta)$ with $|\tau_{j,i,t,v}(\theta)| \le |\tau_{j,t,v}|$ for $1 \le i \le N$, and $T_j^{-1}\sum_{t=1}^T \sum_{v=1}^T |\tau_{j,t,v}| < \infty$ for $j = D, S$; (iii) $E\big(T_j^{-1}\sum_{t=1}^T d_{j,t}(\theta)\varepsilon_{i,t}\varepsilon_{l,t}\big) = \sigma_{j,i,l,t}(\theta)$ with $|\sigma_{j,i,l,t}(\theta)| \le |\sigma_{j,i,l}|$ for $1 \le t \le T$ and $N^{-1}\sum_{i=1}^N \sum_{l=1}^N |\sigma_{j,i,l}| < \infty$ for $j = D, S$; (iv)
$$E\left| T_j^{-1/2} \sum_{t=1}^T \big( d_{j,t}(\theta)\varepsilon_{i,t}\varepsilon_{l,t} - E(d_{j,t}(\theta)\varepsilon_{i,t}\varepsilon_{l,t}) \big) \right|^4 < \infty,$$
for $j = D, S$ and $1 \le i, l \le N$; (v) it holds that $E\big(f_{j,s}^0 \varepsilon_{i,t} \mid z_t\big) = 0$ for $j = D, S$ and $1 \le t, s \le T$, with
$$E\left[ N^{-1} \sum_{i=1}^N \left\| T_j^{-1/2} \sum_{t=1}^T d_{j,t}(\theta)\, f_{j,t}^0\, \varepsilon_{i,t} \right\|^2 \right] < \infty,$$
for all $\theta$ and $j = D, S$.
Assumption 4 mimics similar assumptions in the literature (see e.g. Assumptions C and D
in Bai (2003)), and essentially it states that the idiosyncratic errors are weakly dependent. Parts
(iv) and (v) of the assumption are, essentially, weak (serial) dependence conditions.
Assumption 5. It holds that (i) $\{f_{j,t}^0, z_t, \varepsilon_t\}$ is a strictly stationary, ergodic, $\rho$-mixing sequence with mixing numbers $\rho_{j,m}$ satisfying $\sum_{m=1}^\infty \rho_{j,m}^{1/2} < \infty$ for $j = D, S$; (ii) $E\big(\|f_{j,t}^0 \varepsilon_{i,t}\|^4 \mid z_t\big) < \infty$ and $E\big(\|f_{j,t}^0\|^4 \mid z_t\big) < \infty$ for $j = D, S$ and $1 \le i \le N$; (iii) $z_t$ has bounded density; (iv) the density of $z_t$ is continuous at $\theta^0$; (v) $E\big(f_{j,t}^0 f_{j,t}^{0\prime} \mid z_t = \theta\big)$ is positive definite and continuous at $\theta = \theta^0$ for $j = D, S$, with $\sum_{i=\lfloor N^\eta \rfloor + 1}^N \delta_i^{0\prime} E\big(f_{j,t}^0 f_{j,t}^{0\prime} \mid z_t = \theta^0\big) \delta_i^0 < \infty$.
Assumption 5 draws upon Assumption 1 in Hansen (2000) (see also Massacci (2017)), to which we refer for further discussion. In our context, it is used in order to obtain the convergence rate of the estimators for factor innovations and loadings. We point out that some of the other assumptions above, chiefly the ones that contain “time series results” (e.g., Assumptions 4(iv)-(v)), would follow from Assumption 5, although they could also be derived from other dependence assumptions.
A final note on the assumptions above. Assumption 3(iv) is typical in the literature on
inference on factor models (we refer, inter alia, to the contributions by Bai and Ng (2002) and
Bai (2003)) and, in essence, it requires that the common factors be “strong” or “pervasive”. A
well-known consequence of the assumption is that - as we show in Lemma 1 - the eigenvalues of
the second moment matrix of the data diverge at a rate exactly equal to N , which we exploit in
the construction of our tests for the determination of the number of common factors. Conversely,
Assumption 3(iv) rules out the potentially interesting case of having “weak” common factors.
As pointed out in Uematsu and Yamagata (2019), a “weak” common factor can be defined as
a common factor whose corresponding eigenvalue of the second moment matrix diverges,
but at a rate slower than $N$. Such a case could be of interest in general, and especially in the
context of a threshold factor model, where e.g. a pervasive factor affects only a fraction of the
units in one regime, and is absent in the other regime. Although such cases could, in principle, be
embedded in our analysis, we prefer to present the main results for the (restrictive) case implied
by Assumption 3(iv), and discuss extensions in Section 4.2.5.
4.2 Determining the number of common factors
The first step in the analysis is to determine the dimensions of the factor spaces, $P_D^0$ and $P_S^0$.
4.2.1 Estimating $\theta^0$
We begin by studying the (strongly consistent) estimation of $\theta^0$, extending the results in Massacci (2017). Let the $N \times 1$ vector of demeaned returns be $\tilde R_t = (\tilde R_{1,t}, \ldots, \tilde R_{N,t})'$, for $1 \le t \le T$, defined as in (3.1). Let $\Theta$ be the parameter space of $\theta^0$. For generic numbers of factors $P_D$, $P_S$ and $P = P_D + P_S$, define the $N \times P$ matrix of loadings $B^P = (B_D^{P_D}, B_S^{P_S})$, and the associated $P_D \times T$ and $P_S \times T$ matrices of demeaned factor innovations $U_D^{P_D} = (u_{D,1}^{P_D}, \ldots, u_{D,T}^{P_D})$ and $U_S^{P_S} = (u_{S,1}^{P_S}, \ldots, u_{S,T}^{P_S})$, with $U^P = (U_D^{P_D\prime}, U_S^{P_S\prime})'$. The objective function in terms of $B^P$, $U^P$ and $\theta$ is the sum of squared residuals (divided by $NT$)
$$S\big(B^P, U^P, \theta\big) = \frac{1}{NT} \sum_{t=1}^T \left\| \tilde R_t - d_{D,t}(\theta) B_D^{P_D} u_{D,t}^{P_D} - d_{S,t}(\theta) B_S^{P_S} u_{S,t}^{P_S} \right\|^2. \qquad (4.3)$$
For given $P$, the estimators for $B$, $U$ and $\theta^0$ are obtained by minimizing $S(B^P, U^P, \theta)$ with respect to $B^P$, $U^P$ and $\theta$. For identification purposes, we concentrate out $U^P$. The estimators $\hat B^P = (\hat B_D^{P_D}, \hat B_S^{P_S})$ with $\hat B_j^{P_j} = (\hat\beta_{j,1}, \ldots, \hat\beta_{j,N})'$ for $j = D, S$, $\hat U^P$, and $\hat\theta^P$, for $B^{P^0}$, $U^{P^0}$ and $\theta^0$, respectively, solve
$$\big(\hat B^P, \hat U^P, \hat\theta^P\big) = \arg\min_{B^P, U^P, \theta} S\big(B^P, U^P, \theta\big),$$
subject to the normalization restrictions $N^{-1}\big(\hat B_j^{P_j\prime} \hat B_j^{P_j}\big) = I_{P_j}$, for $j = D, S$. It can be easily seen that, letting
$$\hat\Sigma_{j,\tilde R}(\theta) = \frac{1}{NT} \sum_{t=1}^T d_{j,t}(\theta)\, \tilde R_t(\theta)\, \tilde R_t(\theta)', \qquad (4.4)$$
for $j = D, S$ and $\theta \in \Theta$, the estimator $\hat B_j^{P_j}(\theta)$ for $B_j^{P_j}$ is $\sqrt{N}$ times the $N \times P_j$ matrix of eigenvectors of $\hat\Sigma_{j,\tilde R}(\theta)$ corresponding to its largest $P_j$ eigenvalues. The restriction $N^{-1}\big(\hat B_j^{P_j}(\theta)' \hat B_j^{P_j}(\theta)\big) = I_{P_j}$, for $j = D, S$, implies that the corresponding estimators for $u_{D,t}$ and $u_{S,t}$ are
$$\hat u_{D,t}^{P_D}(\theta) = N^{-1}\big(d_{D,t}(\theta)\, \hat B_D^{P_D}(\theta)\big)' \tilde R_t,$$
$$\hat u_{S,t}^{P_S}(\theta) = N^{-1}\big(d_{S,t}(\theta)\, \hat B_S^{P_S}(\theta)\big)' \tilde R_t.$$
Let $\hat B^P(\theta) = \big[\hat B_D^{P_D}(\theta), \hat B_S^{P_S}(\theta)\big]$ and $\hat U^P(\theta) = \big(\hat U_D^{P_D\prime}(\theta), \hat U_S^{P_S\prime}(\theta)\big)'$; define
$$S\big[\hat B^P(\theta), \hat U^P(\theta), \theta\big] = \frac{1}{NT} \sum_{t=1}^T \left\| \tilde R_t - d_{D,t}(\theta)\, \hat B_D^{P_D}(\theta)\, \hat u_{D,t}^{P_D}(\theta) - d_{S,t}(\theta)\, \hat B_S^{P_S}(\theta)\, \hat u_{S,t}^{P_S}(\theta) \right\|^2.$$
For $P = P_D + P_S$ with $P_D \ge P_D^0$ and $P_S \ge P_S^0$, we now define the estimator
$$\hat\theta = \arg\min_{\theta} S\big[\hat B^P(\theta), \hat U^P(\theta), \theta\big]. \qquad (4.5)$$
In Lemma A.7 in Appendix A, we show that $\hat\theta$ is a strongly consistent estimator of $\theta^0$, also providing its rate; see also Theorem 3.4 in Massacci (2017), where weak consistency is shown.
4.2.2 The eigenvalues of the second moment matrices
Let
$$\hat d_{D,t} = I\big(\hat D_t\big), \qquad \hat d_{S,t} = 1 - \hat d_{D,t},$$
and define $\hat\pi_D = T^{-1}\sum_{t=1}^T \hat d_{D,t}$ and $\hat\pi_S = 1 - \hat\pi_D$. We extend the procedure suggested by Trapani (2018) to estimate the number of factors in each regime, $P_D^0$ and $P_S^0$. We use the sample covariance matrices
$$\hat\Sigma_D = \frac{1}{T \hat\pi_D} \sum_{t=1}^T \tilde R_t \tilde R_t'\, \hat d_{D,t}, \qquad (4.6)$$
$$\hat\Sigma_S = \frac{1}{T \hat\pi_S} \sum_{t=1}^T \tilde R_t \tilde R_t'\, \hat d_{S,t}. \qquad (4.7)$$
Equations (4.6) and (4.7) contain two possible estimators of the population covariance matrices, defined as
$$\Sigma_D = \frac{1}{T \pi_D} \sum_{t=1}^T E\big(\tilde R_t \tilde R_t'\, d_{D,t}(\theta^0)\big), \qquad (4.8)$$
$$\Sigma_S = \frac{1}{T \pi_S} \sum_{t=1}^T E\big(\tilde R_t \tilde R_t'\, d_{S,t}(\theta^0)\big). \qquad (4.9)$$
We will extensively employ the following notation for the eigenvalues: $\hat g_j^{(i)}$ refers to the $i$-th largest eigenvalue of $\hat\Sigma_D$ or $\hat\Sigma_S$ (according as $j = D$ or $S$); similarly, $g_j^{(i)}$ refers to the $i$-th largest eigenvalue of $\Sigma_D$ or $\Sigma_S$.
In order to make the presentation easier to follow, we begin by considering the case where, in Assumption 1, $\eta = 1$; that is, the number of units which undergo a regime change is (equal or proportional to) $N$. We discuss the case where $\eta < 1$, which entails that the regime-specific factors only affect a fraction of the units, or, in other words, that they are not a pervasive/strong set of factors, in Section 4.2.5.
Lemma 1. We assume that Assumptions 1-3 are satisfied. Then it holds that, for $j = D, S$,
$$\underline c_j^{(i)} N \le g_j^{(i)} \le \bar c_j^{(i)} N, \quad \text{for } 1 \le i \le P_j^0,$$
for some $0 < \underline c_j^{(i)} \le \bar c_j^{(i)} < \infty$, and
$$g_j^{(i)} \le c_j^{(i)}, \quad \text{for } i \ge P_j^0 + 1,$$
for some $c_j^{(i)} < \infty$.

Theorem 1. We assume that Assumptions 1-5 are satisfied. Then it holds that, for all $1 \le i \le N$ and $j = D, S$,
$$\hat g_j^{(i)} = g_j^{(i)} + o_{a.s.}\left( \frac{N T^{1/2}}{T \pi_j^0}\, (\ln N)^{1+\epsilon} (\ln T)^{1/2+\epsilon} \right),$$
for every $\epsilon > 0$.
Lemma 1 and Theorem 1 can be read together. According to Lemma 1, the spectrum of each second moment matrix has a spiked structure; more specifically, the largest $P_j^0$ eigenvalues diverge to positive infinity as $N \to \infty$, whereas the others are bounded. The fact that the divergence rate is exactly $N$ is a direct consequence of Assumption 3(iv) and also of having assumed $\eta = 1$. However, as long as the largest $P_j^0$ eigenvalues diverge, even at a slower rate, our arguments below are still valid although the construction of the tests may be more convoluted. We further discuss this in Section 4.2.5. Theorem 1 offers the sample counterpart to Lemma 1. In essence, it places a bound on the estimation error $\hat g_j^{(i)} - g_j^{(i)}$.4 5

4As is typical in (large) Random Matrix Theory, there is no guarantee that $\hat g_j^{(i)} - g_j^{(i)}$ will drift to zero as $\min(N, T) \to \infty$. See for example the related contributions by Bai and Yao (2008), Bai and Yao (2012) and Wang and Fan (2017), and also Lemma 2 in Trapani (2018), where a similar result is derived, under different assumptions and for a linear model. Indeed, $\hat g_j^{(i)} - g_j^{(i)}$ may even diverge to infinity. However, as long as $\hat g_j^{(i)} - g_j^{(i)}$ is of smaller magnitude than $g_j^{(i)}$ for $i \le P_j^0$, the spiked structure of the spectrum of the second moment matrices is preserved, in that $\hat g_j^{(i)}$ is of smaller magnitude when $i > P_j^0$ compared to the case $i \le P_j^0$.

5Note that we are not imposing that $0 < \pi_D, \pi_S < 1$; we are therefore entertaining the possibility that e.g. $\pi_D = \pi_D(T) \to 0$, which may be well suited to a situation in which one of the two regimes occurs very rarely.
factors in each regime using the following algorithm based on two separate steps: in the first step, we study a test for
$$\begin{cases} H_0: g_j^{(i)} = c_j^{(i)} N \\ H_A: g_j^{(i)} \le c_j^{(i)} \end{cases}, \qquad (4.10)$$
for some $0 < c_j^{(i)} < \infty$ and $j = D, S$. In essence, the null hypothesis is that the $i$-th largest eigenvalue of the covariance matrix of the data, in each regime, diverges, thus suggesting that there are at least $i$ common factors. Conversely, upon rejecting the null hypothesis, the conclusion can be drawn that there are fewer than $i$ common factors in regime $j$. Thus, the second step of the analysis is to determine $P_D^0$ and $P_S^0$ by carrying out a sequence of tests for $i = 1, \ldots, P_{\max}$, with $P_{\max}$ a user-defined upper bound.
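The two-step logic can be sketched as follows. For illustration we take the threshold as given and, purely as a stand-in for the randomised test described in the next subsection, use a crude eigenvalue-size cutoff; the DGP, constants, and seed are all our own illustrative choices.

```python
import numpy as np

# Sequential determination of the number of factors per regime: scan
# i = 1, ..., Pmax and stop when the i-th eigenvalue no longer "diverges".
rng = np.random.default_rng(5)
N, T = 60, 800
r_W = 0.05 * rng.standard_normal(T)
d_D = r_W <= -0.06                       # threshold taken as given here

# One factor in the downside regime, three in the satisfactory regime
f_D = rng.normal(size=(T, 1)); B_D = rng.normal(size=(N, 1))
f_S = rng.normal(size=(T, 3)); B_S = rng.normal(size=(N, 3))
R = np.where(d_D[:, None], f_D @ B_D.T, f_S @ B_S.T) + 0.3 * rng.normal(size=(T, N))

def n_factors(X, Pmax=8, c=0.1):
    """Crude stand-in for the eigenvalue tests: keep factor i while its
    eigenvalue is large relative to N times the bulk level."""
    X = X - X.mean(axis=0)
    Sigma = X.T @ X / X.shape[0]                     # regime second moment matrix
    eig = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    P = 0
    for i in range(Pmax):
        if eig[i] > c * N * eig[Pmax:].mean():       # "diverging" eigenvalue
            P += 1
        else:
            break                                    # first rejection: stop
    return P

P_D_hat = n_factors(R[d_D])
P_S_hat = n_factors(R[~d_D])
print(P_D_hat, P_S_hat)
```

The sketch recovers a much lower-dimensional structure in the downside regime than in the satisfactory one, mirroring the empirical finding reported in the introduction.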
4.2.3 The individual tests
We begin by discussing how to test for (4.10). Since Theorem 1 only contains rates, we propose a randomised version of the estimated eigenvalues $\hat g_j^{(i)}$, based on Trapani (2018). We need the following assumption on the size of the two regimes, $T\pi_D$ and $T\pi_S$.
Assumption 6. As min(N ,T ) →∞, it holds that
(ln N )1+ε (lnT )1/2+ε
T 1/2π j= 0,
for j =D,S .
By Assumption 6, g (i )j − g (i )
j is of a smaller order of magnitude than N . As mentioned above,
this ensures the spiked structure of the spectrum of the sample second moment matrices.6
Based on Lemma 1, each $g_j^{(i)}$ should diverge or not according to whether there are at least $i$ common factors or fewer. Thus, we would expect the same separation result to also hold for $\hat g_j^{(i)}$. Indeed, whenever $g_j^{(i)}$ diverges as fast as $N$ (which, in (4.10), represents the null hypothesis), $\hat g_j^{(i)}$ also does, at the same rate: this is an immediate consequence of Theorem 1. Conversely, there is no guarantee that $\hat g_j^{(i)}$ converges when $g_j^{(i)}$ does so: in this case (which corresponds to the alternative hypothesis in (4.10)), from Theorem 1, it is still possible that the estimation error $\hat g_j^{(i)} - g_j^{(i)}$ will diverge at a rate $N T^{-1/2}$, modulo some logarithmic terms.

⁶In essence, the assumption is a restriction on the rates at which $\pi_j$ can drift to zero: in particular, the assumption is satisfied as long as $\pi_j^{-1} = O(T^{1/2-\epsilon})$ for any $\epsilon > 0$. Note that the assumption also (very mildly) restricts the relative rate of divergence between $N$ and $T$ as they pass to infinity; in essence, however, it allows $N$ to grow at an arbitrarily high polynomial order with $T$, i.e. it allows for $N = O(T^{\kappa})$ for any value of $\kappa$.

Therefore, in order to ensure that $\hat g_j^{(i)}$ has the same separation result as its population counterpart $g_j^{(i)}$, we propose to use the statistic

$$\psi\left(\hat g_j^{(i)}\right) = \exp\left(N^{-\varrho_j}\, \frac{\hat g_j^{(i)}}{\bar g_j(p)}\right), \qquad (4.11)$$

where

$$\bar g_j(p) = \frac{1}{N-p} \sum_{h=p+1}^{N} \hat g_j^{(h)},$$

and $p$ is user-defined. The purpose of $\bar g_j(p)$ is to rescale $\hat g_j^{(i)}$; in Trapani (2018), it is suggested to set $p = 0$ (that is, to use the trace to rescale $\hat g_j^{(i)}$), but other choices are possible, such as $p = i$, which is recommended in Barigozzi and Trapani (2018). In (4.11), we have defined

$$\varrho_j = \begin{cases} \epsilon & \text{when } \ln T_j/\ln N \ge 2, \\ 1 - \dfrac{1}{2}\dfrac{\ln T_j}{\ln N} + \epsilon & \text{when } \ln T_j/\ln N < 2, \end{cases} \qquad (4.12)$$
where $\epsilon > 0$ is a (small) user-defined number. The rationale for this choice is that we need to choose $\varrho_j$ such that

$$\lim_{\min(N,T)\to\infty} \frac{N^{1-\varrho_j} T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon} = 0; \qquad (4.13)$$

(4.12) is therefore proposed under the condition $0 < \pi_D, \pi_S < 1$, although it is valid for any values of $\pi_D$ and $\pi_S$. If this condition is violated, in principle (4.13) can be employed to construct a refined proposal for $\varrho_j$. Based on (4.12) and on the strong rates in Theorem 1, it is easy to see that

$$\lim_{\min(N,T)\to\infty} \psi\left(\hat g_j^{(i)}\right) = \infty \;\text{ under } H_0, \qquad \lim_{\min(N,T)\to\infty} \psi\left(\hat g_j^{(i)}\right) = 1 \;\text{ under } H_A.$$
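To illustrate, the exponent $\varrho_j$ in (4.12) and the rescaled statistic $\psi$ in (4.11) can be computed as follows (a minimal sketch in Python; the function names are ours, and the estimated eigenvalues are assumed to be supplied by the user):

```python
import numpy as np

def rho_j(N, T_j, eps=0.01):
    """Exponent rho_j from (4.12): eps if ln(T_j)/ln(N) >= 2,
    and 1 - (1/2) ln(T_j)/ln(N) + eps otherwise."""
    ratio = np.log(T_j) / np.log(N)
    return eps if ratio >= 2.0 else 1.0 - 0.5 * ratio + eps

def psi(eigvals, i, T_j, p=0, eps=0.01):
    """Rescaled statistic (4.11): exp(N^{-rho_j} g^{(i)} / gbar_j(p)),
    where gbar_j(p) averages the eigenvalues ranked below position p."""
    g = np.sort(np.asarray(eigvals))[::-1]   # eigenvalues, decreasing order
    N = g.size
    gbar = g[p:].mean()                      # (N-p)^{-1} sum_{h=p+1}^N g^{(h)}
    return np.exp(N ** (-rho_j(N, T_j, eps)) * g[i - 1] / gbar)
```

Under the null of at least $i$ common factors the argument of the exponential diverges with $N$, so $\psi$ explodes; under the alternative it drifts to zero and $\psi \to 1$.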
Based on these heuristic considerations, we now describe the randomised test.

Step 1 Generate an i.i.d. $N(0,1)$ sequence $\{\xi_{j,m}^{(i)}, 1 \le m \le M\}$, where the $\xi_{j,m}^{(i)}$s are independent across $i$ and $j$.

Step 2 Define the Bernoulli random variable

$$\zeta_{j,m}^{(i)}(s) = I\left(\psi\left(\hat g_j^{(i)}\right) \times \xi_{j,m}^{(i)} \le s\right). \qquad (4.14)$$
Step 3 Compute

$$\upsilon_j^{(i)}(s) = \frac{2}{M^{1/2}} \sum_{m=1}^{M} \left(\zeta_{j,m}^{(i)}(s) - \frac{1}{2}\right). \qquad (4.15)$$

Step 4 Compute

$$\Upsilon_j^{(i)} = \int_{-\infty}^{+\infty} \left(\upsilon_j^{(i)}(s)\right)^2 \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{s^2}{2}\right) ds. \qquad (4.16)$$
Let $P^*$ be the probability conditional on $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$; we use "$\stackrel{D^*}{\to}$" and "$\stackrel{P^*}{\to}$" to indicate convergence in distribution and in probability according to $P^*$, respectively. Also, define Å such that

$$0 < \text{Å} < \frac{c_j^{(i)}}{\frac{1}{N-p}\sum_{h=p+1}^{N} c_j^{(h)}},$$

where $c_j^{(i)}$ and the $c_j^{(h)}$s are defined in Lemma 1.
Theorem 2. We assume that Assumptions 1-6 are satisfied, with $\eta = 1$ in Assumption 1. Let $M = M(N)$ be such that $\lim_{N\to\infty} M(N) = \infty$. Then, as $\min(N, T_j) \to \infty$ with

$$M \exp\left(-\text{Å}\, N^{1-\varrho_j}\right) \to 0, \qquad (4.17)$$

it holds that, under $H_0$,

$$\Upsilon_j^{(i)} \stackrel{D^*}{\to} \chi_1^2, \qquad (4.18)$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, $j = D, S$ and all $1 \le i \le N$. Under $H_A$, it holds that there exists a positive, finite constant $c_0$ such that

$$\frac{1}{4M}\,\Upsilon_j^{(i)} \stackrel{P^*}{\to} c_0, \qquad (4.19)$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, $j = D, S$ and all $1 \le i \le N$.

The theorem, and the algorithm above, suggest that some tuning is needed prior to implementing the test. To begin with, note that the power of tests based on $\Upsilon_j^{(i)}$ increases with $M$, by (4.19); conversely, in the proof of (4.18) it is shown that the test statistic has a non-centrality parameter which is controlled by (4.17), and which vanishes faster the smaller $M$ is. Thus, the choice of $M$ reflects the well-known size-power trade-off. We point out that, based on (4.17), the choice $M = N$ satisfies the restriction, and it is our recommended choice.⁷ Similarly, we have set $\varrho_j$ according to (4.12), with $\epsilon = 0.01$; based on Theorem 2, $\epsilon$ does not need to be large, since its purpose is to smooth away a slowly varying sequence. Even in this case, as the theory prescribes, the impact of $\varrho_j$ is minimal; as a robustness check, we have tried $\epsilon = 0.05$ and $\epsilon = 0.1$, and results do not change in general (with one exception, discussed below in Section 7.2.2).
In Step 2, we recommend drawing the values of $s$ from a standard normal. By integrating $s$ out in Step 4 of the algorithm, the statistic $\Upsilon_j^{(i)}$ becomes scale-invariant, in the sense that $\Upsilon_j^{(i)}$ does not depend on the support of $s$.⁸ This form of scale invariance is clearly desirable. From a practical point of view, we propose to use a Gauss-Hermite quadrature to approximate the integral that defines $\Upsilon_j^{(i)}$, viz.

$$\Upsilon_j^{(i)} = \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n_S} w_i \left(\upsilon_j^{(i)}\left(\sqrt{2}\, x_i\right)\right)^2, \qquad (4.20)$$

where the $x_i$, $1 \le i \le n_S$, are the zeros of the Hermite polynomial $H_{n_S}(x)$ and

$$w_i = \frac{\sqrt{\pi}\, 2^{n_S-1}\,(n_S-1)!}{n_S \left[H_{n_S-1}\left(\sqrt{2}\, x_i\right)\right]^2}. \qquad (4.21)$$

Thus, when constructing $\upsilon_j^{(i)}(s)$, we construct $n_S$ of these statistics, each with $s = \sqrt{2}\, x_i$; the values of the roots $x_i$, and of the corresponding weights $w_i$, are tabulated, e.g., in Salzer, Zucker, and Capuano (1952).
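Steps 1-4, combined with the quadrature approximation (4.20), can be sketched as follows (an illustrative implementation, with our own function names, assuming the rescaled statistic $\psi(\hat g_j^{(i)})$ has already been computed; `numpy`'s `hermgauss` returns the nodes and weights of the Hermite rule for the weight $e^{-x^2}$, so the change of variable $s = \sqrt{2}x$ absorbs the Gaussian kernel):

```python
import numpy as np

def upsilon_stat(psi_val, M, n_S=31, rng=None):
    """Randomised statistic of Section 4.2.3: draw xi_m ~ N(0,1) (Step 1),
    form the Bernoulli variables zeta_m(s) = I(psi * xi_m <= s) (Step 2),
    the normalised sums upsilon(s) of (4.15) (Step 3), and integrate
    upsilon(s)^2 against the N(0,1) density by Gauss-Hermite quadrature
    (Step 4, approximation (4.20))."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(M)                        # Step 1
    x, w = np.polynomial.hermite.hermgauss(n_S)        # nodes/weights for e^{-x^2}
    s = np.sqrt(2.0) * x                               # evaluation points s = sqrt(2) x
    zeta = psi_val * xi[:, None] <= s[None, :]         # Step 2: M x n_S indicators
    ups = 2.0 / np.sqrt(M) * (zeta - 0.5).sum(axis=0)  # Step 3: upsilon at each node
    return (w * ups ** 2).sum() / np.sqrt(np.pi)       # Step 4: quadrature of (4.16)
```

Under $H_0$ ($\psi \to \infty$) the statistic behaves like a $\chi^2_1$ draw; under $H_A$ ($\psi \to 1$) it grows linearly in $M$, as in (4.19).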
4.2.4 Determining $P_D^0$ and $P_S^0$

Based on the results in the previous section, it is now possible to propose a sequential procedure to determine $P_D^0$ and $P_S^0$. This is based on the following two-step algorithm, which we present only for the determination of $P_D^0$, the case of $P_S^0$ being exactly the same.

⁷We point out that altering the choice of $M$ within a reasonable range does not have any impact on the final results. Indeed, we tried $M = N/2$ and $M = 2N$, without noting any changes. This robustness is in line with the simulations in Trapani (2018).

⁸We point out that, in the original paper by Trapani (2018), this is not the case, since the methodology is implemented by extracting several values of $s$ from a uniform distribution with support, say, $S$: doing this makes the test statistics dependent on $S$, thus making them not scale invariant.

For each $j = D, S$:
Step 1 Run the test for $H_0: g_j^{(1)} = \infty$, based on $\Upsilon_j^{(1)}$. If the null is rejected, set $\hat P_j = 0$ and stop; otherwise, go to the next step.

Step 2 Starting from $i = 1$, run the test for $H_0: g_j^{(i+1)} = \infty$, based on $\Upsilon_j^{(i+1)}$. If the null is rejected, set $\hat P_j = i$ and stop; otherwise, repeat the step until the null is rejected, or until the pre-specified maximum number $P_{\max}$ is reached.
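The two-step algorithm amounts to the following loop (a sketch; `upsilon` is a user-supplied function returning the statistic $\Upsilon_j^{(i)}$ of the previous section, and the $\chi^2_1$ critical value is obtained from the standard normal via $\chi^2_1 = Z^2$):

```python
from statistics import NormalDist

def estimate_num_factors(upsilon, P_max, alpha=0.05):
    """Sequential procedure of Section 4.2.4: test H0 "the i-th eigenvalue
    diverges" for i = 1, 2, ...; the first rejection at step i yields
    P_hat = i - 1; otherwise stop at the upper bound P_max."""
    c_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2  # chi^2_1 critical value
    for i in range(1, P_max + 1):
        if upsilon(i) >= c_alpha:   # reject: fewer than i common factors
            return i - 1
    return P_max
```

Per Theorem 3 below, consistency of $\hat P_j$ requires the level of the individual tests to shrink as the sample grows, so in practice `alpha` should be set as a (slowly) vanishing function of $(N, T)$.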
Let $\alpha = \alpha(N,T)$ denote the level of the individual tests. It holds that:

Theorem 3. We assume that Assumptions 1-6 are satisfied. As $\min(N, T_j) \to \infty$ under (4.17), if $\alpha(N, T_j) \to 0$, then it holds that

$$\hat P_j = P_j^0 + o_{P^*}(1),$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, and $j = D, S$.
In the implementation of the algorithm, it is necessary to have an upper bound $P_{\max}$; this is chosen as

$$P_{\max} = O\left(C_{N,T}^{1-c}\right), \qquad (4.22)$$

for some user-defined $c > 0$. The selection rule in (4.22) is an extension of the customarily employed Schwert's rule (Schwert (1989); see also the comment in Bai and Ng (2002)); as a possible example, one may use $P_{\max} = \left\lfloor \min\left\{N^{1/3}, T^{1/3}\right\} \right\rfloor$. In our simulations, we have tried this specification and, as a robustness check, other ones, but results do not seem to be affected in any major way.
4.2.5 Discussion on weak factors

As pointed out in Section 4.1, Assumption 3(iv) implicitly stipulates that common factors are strong, or pervasive, thus ruling out the case of weak common factors. In this section, we provide a heuristic discussion, based on some examples, of (i) the circumstances under which weak common factors may arise in the context of our nonlinear factor model, and (ii) the circumstances under which our procedure is able to find evidence of weak common factors.

Example 1. A first example is intimately related to the discussion in Trapani (2018) on weak factors; we also refer to the paper by Uematsu and Yamagata (2019) for a very general, thorough
characterisation of weak factors. Suppose (for simplicity and with no loss of generality) that there exists only one common factor in both regimes, which affects all units and whose loadings are such that $N^{-1} B_j' B_j \to 0$, but $N^{-\nu} B_j' B_j \to c_0 > 0$ for some $0 < \nu < 1$. In this case, adapting the proof of Lemma 1, it is easy to see that

$$g_j^{(1)} = c_{1,j} N^{\nu}, \qquad (4.23)$$

for both regimes $j = D, S$; in turn, this entails that, adapting Theorem 1,

$$\hat g_j^{(1)} = c_{1,j} N^{\nu} + o_{a.s.}\left(\frac{N T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon}\right). \qquad (4.24)$$

In order to detect that $g_j^{(1)} \to \infty$, by construction it is required that $N^{-\varrho_j} \hat g_j^{(1)} \to \infty$, which in turn requires $\nu > \varrho_j$. Recalling (4.12), after some elementary algebra, this is tantamount to requiring

$$\nu > \begin{cases} 0 & \text{when } \ln T_j/\ln N \ge 2, \\ 1 - \dfrac{1}{2}\dfrac{\ln T_j}{\ln N} & \text{when } \ln T_j/\ln N < 2. \end{cases} \qquad (4.25)$$

This result essentially states that the ability to detect weak factors depends on the relative rate of divergence between $N$ and $T$: as $N$ increases, weak factors become more and more difficult to determine. Note that this is not a property of the testing procedure per se, but of the estimated eigenvalues, where the "signal" part $g_j^{(1)}$ becomes weaker and weaker as the "noise" component increases with $N$. Note that when $N$ is comparatively small with respect to $T$ - i.e. when $N = o\left(T^{1/2}\right)$ - weak factors can, in principle, always be detected.
Example 2. Another possible example is the appearance/disappearance, from one regime to another, of a possibly strong common factor which affects only some of the units. Consider the case where (again, for simplicity) there is no common factor in one regime, whereas there is one common factor in the other regime, but only $O(N^{\nu})$ units are affected by it. In this case, since only $O(N^{\nu})$ of the loadings are non-zero, it follows that $N^{-1} B_j' B_j \to 0$, but $N^{-\nu} B_j' B_j \to c_0 > 0$, which, similarly to (4.23) and (4.24), entails

$$g_j^{(1)} = c_{1,j} N^{\nu}, \qquad \hat g_j^{(1)} = c_{1,j} N^{\nu} + o_{a.s.}\left(\frac{N T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon}\right).$$

In this case, two (different) tasks are affected by the appearance of a cluster-specific factor: firstly, the estimation of $\theta^0$, which, by Assumption 1, is possible only if $\nu > \frac{1}{2}$; secondly, the test for the presence of a common factor, which, as before, requires $N^{-\varrho_j} \hat g_j^{(1)} \to \infty$, which in turn requires $\nu > \varrho_j$. Putting these two requirements together, it follows that

$$\nu > \max\left\{\frac{1}{2},\, 1 - \frac{1}{2}\frac{\ln T_j}{\ln N}\right\}, \qquad (4.26)$$

which, in essence, entails that a new common factor which affects only a fraction of the units can be detected only if it affects "enough" units. Of course, if $\theta^0$ were known a priori, we would be in the same situation as (4.25).
Example 3. As a closely related variation on the previous example, consider the case where - again - there is only one common factor which affects only a fraction of the units, say $O(N^{\nu})$, whose loadings differ across regimes. The same logic as in the previous examples implies that

$$g_j^{(1)} = c_{1,j} N^{\nu}, \qquad \hat g_j^{(1)} = c_{1,j} N^{\nu} + o_{a.s.}\left(\frac{N T^{1/2}}{T\pi_j}\,(\ln N)^{1+\epsilon}(\ln T)^{1/2+\epsilon}\right).$$

In this case, by definition, only $O(N^{\nu})$ units are subject to a change between regimes; thus, recalling Assumption 1, it must hold that

$$\nu > \max\left\{\frac{1}{2},\, 1 - \frac{1}{2}\frac{\ln T_j}{\ln N}\right\},$$

exactly as in (4.26).
Example 4. As (4.12) - and, consequently, (4.25)-(4.26) - show, the relative speed of divergence between $N$ and $T_j$ plays an important role. So far, our analysis has focused on two regimes of length $T_D$ and $T_S$ respectively, with the implicit assumption that $T_D = cT$ (and consequently $T_S = (1-c)T$), for some $c \in (0,1)$. This means that the number of time periods in each regime is of the same order of magnitude as $T$. However, the results in Theorems 1, 2 and 3 only require $\min(T_D, T_S) \to \infty$, so that, in principle, the case $T_D = o(T)$ can be considered. This has consequences for the ability to detect weak common factors: the requirement

$$\nu > \max\left\{\frac{1}{2},\, 1 - \frac{1}{2}\frac{\ln T_D}{\ln N}\right\},$$

coming from the examples considered above, still holds; but when $T_D = o(T)$, it entails that the detection of a weak common factor in regime $D$ is more difficult, and factors that are "too weak" will not be identified, thus leading to an understatement of the number of common factors in regime $D$.
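The detectability bounds in the examples above can be wrapped in a small helper (our own illustration; `theta_known=True` corresponds to the case in which $\theta^0$ is known a priori, so that only the eigenvalue-based bound (4.25) binds):

```python
from math import log

def nu_threshold(N, T_j, theta_known=False):
    """Smallest loading exponent nu (loadings of order N^nu) at which a
    weak factor is detectable: the bound of (4.25) if theta^0 is known,
    and its max with 1/2 (from Assumption 1), as in (4.26), otherwise."""
    bound = max(0.0, 1.0 - 0.5 * log(T_j) / log(N))
    return bound if theta_known else max(0.5, bound)
```

For instance, with $N = 100$ and $T_j = 10$ the bound is $0.75$, while for $N = o(T_j^{1/2})$ it collapses to $0$ (known $\theta^0$) or $1/2$ (estimated $\theta^0$).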
As a final comment, all the examples considered show that our procedure (similarly to all procedures to estimate the number of common factors) may fail in the presence of a weak common factor, thus resulting in an underestimation of the number of common factors. A possible solution in this case - which we exploit in the empirical part - is to estimate $P_j^0$ first and then, by way of robustness check, estimate (2.1) with $\hat P_j$ and $\hat P_j + 1$ factors.
5 Three-pass estimation of downside risk premia

We now extend the Fama-MacBeth two-pass procedure (Fama and MacBeth (1973)) and estimate downside risk premia for the observable factors $g_t$, as defined in (3.1), in the spirit of Giglio and Xiu (2020). In particular, we propose the following three-pass procedure:

First pass: Based on Section 4.2 and Massacci (2017), estimate $\theta^0$, $P_j^0$, $\{u_{j,t}\}_{t=1}^{T}$ and $\{\beta_{j,i}\}_{i=1}^{N}$ for $j = D, S$: this gives the estimators $\hat\theta$, $\hat P_j$, $\{\hat u_{j,t}^{\hat P_j}\}_{t=1}^{T}$ and $\{\hat\beta_{j,i}^{\hat P_j}\}_{i=1}^{N}$ for $j = D, S$, respectively.

Second pass: Estimate the risk premia of $f_{j,t}$ by running regime-specific cross-sectional regressions.

Third pass: Estimate $\Lambda$ within each regime by running a regression of $g_t$ on $\hat u_{j,t}^{\hat P_j}$ based on (2.8), and rotate the risk premia of $f_{j,t}$.

We have discussed the first pass in Section 4.2, where the consistency of $\hat\theta$ and $\hat P_j$ is derived. We now turn to the second and third passes.
5.1 Regime-specific cross-sectional regression

Given the $N \times \hat P_j$ matrices of beta estimates $\hat B_j^{\hat P_j} = \left(\hat\beta_{j,1}^{\hat P_j}, \ldots, \hat\beta_{j,N}^{\hat P_j}\right)'$, define the $N \times (1 + \hat P_j)$ matrices $\hat X_j = \left(\iota_N, \hat B_j^{\hat P_j}\right)$, for $j = D, S$. Let $\Gamma_j = \left(\gamma_{j,0}, \gamma_{j,1}'\right)'$, for $j = D, S$, and $R_t = \left(R_{1,t}, \ldots, R_{N,t}\right)'$, for $t = 1, \ldots, T$.

The regime-specific cross-sectional regression of $R_t$ on $\hat X_j$ yields

$$\hat\Gamma_{j,t} = \left(\hat X_j' \hat X_j\right)^{-1}\left(\hat X_j' R_t\right)\hat d_{j,t},$$

for $j = D, S$ and $1 \le t \le T$. Further, letting $\hat T_j = T\hat\pi_j$ for $j = D, S$, $\Gamma_j$ is estimated as

$$\hat\Gamma_j = \begin{pmatrix} \hat\gamma_{j,0} \\ \hat\gamma_{j,1} \end{pmatrix} = \frac{1}{\hat T_j} \sum_{t=1}^{T} \hat\Gamma_{j,t} = \left(\hat X_j' \hat X_j\right)^{-1}\left[\hat X_j' \bar R_j\left(\hat\theta\right)\right],$$

for $j = D, S$, where $\bar R_j\left(\hat\theta\right) = \hat T_j^{-1} \sum_{t=1}^{T} \hat d_{j,t} R_t$.
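The second pass is an ordinary cross-sectional regression of regime-average returns on an intercept and the estimated betas; a minimal sketch (our own notation: `R` is the $N \times T$ return panel, `B_hat` the $N \times \hat P_j$ beta matrix, and `d` the estimated regime indicator $\hat d_{j,t}$):

```python
import numpy as np

def cross_sectional_premia(R, B_hat, d):
    """Second pass (Section 5.1): regress the regime-average returns
    Rbar_j = T_j^{-1} sum_t d_t R_t on X = [1_N, B_hat], returning
    (gamma_0, gamma_1), i.e. the zero-beta rate and the factor premia."""
    N = B_hat.shape[0]
    X = np.column_stack([np.ones(N), B_hat])          # N x (1 + P_hat)
    Rbar = (R * d[None, :]).sum(axis=1) / d.sum()     # regime average of R_t
    gamma = np.linalg.lstsq(X, Rbar, rcond=None)[0]   # OLS coefficients
    return gamma[0], gamma[1:]
```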
5.2 Estimation of regime-specific risk premia

Let $\hat g_{j,t} = g_t d_{j,t}\left(\hat\theta\right) - \bar g_j$, with $\bar g_j = \hat T_j^{-1} \sum_{t=1}^{T} g_t d_{j,t}\left(\hat\theta\right)$, for $j = D, S$. The estimator of $\Lambda_j$ within each regime is

$$\hat\Lambda_j = \left[\sum_{t=1}^{T} d_{j,t}\left(\hat\theta\right) \hat g_{j,t} \hat u_{j,t}'\right]\left[\sum_{t=1}^{T} d_{j,t}\left(\hat\theta\right) \hat u_{j,t} \hat u_{j,t}'\right]^{-1},$$

for $j = D, S$. Similarly, the estimator of $\gamma_{g,j,1}$ is

$$\hat\gamma_{g,j,1} = \hat\Lambda_j \hat\gamma_{j,1},$$

for $j = D, S$.
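The third pass can be sketched analogously (again our own notation; `g` is the $K \times T$ matrix of observable factors, `u_hat` the $\hat P_j \times T$ matrix of estimated latent innovations, and `d` the regime indicator):

```python
import numpy as np

def third_pass_premia(g, u_hat, d, gamma1_hat):
    """Third pass (Section 5.2): regress the regime-demeaned observable
    factors on the latent innovations to get Lambda_hat, then rotate the
    second-pass premia: gamma_g = Lambda_hat @ gamma1_hat."""
    gbar = (g * d[None, :]).sum(axis=1) / d.sum()    # regime mean of g_t
    gt = (g - gbar[:, None]) * d[None, :]            # demeaned g within regime
    ud = u_hat * d[None, :]
    Lam = (gt @ u_hat.T) @ np.linalg.inv(ud @ u_hat.T)
    return Lam @ gamma1_hat, Lam
```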
5.3 Asymptotics: consistency and limiting distribution

We now report the asymptotics for $\hat\gamma_{g,j,1}$; results for $\hat\Gamma_j$ and $\hat\Lambda_j$ are in the appendix. Define the $P_j^0 \times T$ matrices of regime-specific factor innovations $U_j(\theta) = \left[u_{j,1}(\theta), \ldots, u_{j,T}(\theta)\right]$, for $j = D, S$, such that $U_D(\theta) + U_S(\theta) = (u_1, \ldots, u_T) = U$.

5.3.1 Consistency

As is well known, $\Gamma_j$ (and $\Lambda_j$) cannot be estimated consistently, but only up to a rotation. Let $H_{jj}^{\hat P_j}(\theta)$ be the $P_j^0 \times \hat P_j$ rotation matrix defined as

$$H_{jj}^{\hat P_j}(\theta) = \frac{U_j\left(\theta^0\right)\hat U_j(\theta)'}{T}\, \frac{B_j^{0\prime} \hat B_j(\theta)}{N}\, \hat V_j(\theta)^{-1}, \qquad (5.1)$$

for $j = D, S$, where $\hat V_j(\theta)$ is the $\hat P_j \times \hat P_j$ diagonal matrix of the first $\hat P_j$ largest eigenvalues of $\hat\Sigma_{j,R}(\theta)$ defined in (4.4). Define the matrix $H_{jj}^{\Gamma \hat P_j}(\theta)$ as

$$H_{jj}^{\Gamma \hat P_j}(\theta) = \begin{pmatrix} 1 & 0_{\hat P_j}' \\ 0_{\hat P_j} & H_{jj}^{\hat P_j}(\theta) \end{pmatrix}, \qquad (5.2)$$

for $j = D, S$.

For short, we define $H_{jj}(\theta)$ as $H_{jj}^{\hat P_j}(\theta)$ calculated with $P_j^0$ common factors, and $H_{jj}^{\Gamma}(\theta)$ similarly. Finally, we define $H_{jj}\left(\theta^0\right) = H_{jj}$ and $H_{jj}^{\Gamma}\left(\theta^0\right) = H_{jj}^{\Gamma}$.
We need the following assumptions, which complement Assumptions 1-6.

Assumption 7.⁹ It holds that, for $j = D, S$: (i) $E\left(\sum_{i=1}^{N} w_i \alpha_{j,i}\right)^2 \le c_0 N$ for all $\{w_i\}_{i=1}^{N}$ such that $\sum_{i=1}^{N} w_i^2 < \infty$; (ii) $\{\alpha_{j,i}\}_{i=1}^{N}$ is independent of all other quantities; (iii) as $N \to \infty$, $\left[\mathrm{Var}\left(\sum_{i=1}^{N} v_i \alpha_{j,i}\right)\right]^{-1/2} \sum_{i=1}^{N} v_i \alpha_{j,i} \stackrel{D}{\to} N\left(0, I_{\nu}\right)$ for every $\nu$-dimensional sequence $v_i$, $1 \le i \le N$, such that $\left\|N^{-1}\sum_{i=1}^{N} v_i v_i'\right\| < \infty$.

Assumption 8. It holds that: (i) in (2.4), (a) $\|\Lambda\| = O(1)$ and (b) $E(e_t) = 0$ with $E\|e_t\|^4 < \infty$; (ii) $T^{-1}\sum_{t=1}^{T} e_t u_{j,t}^{0\prime} = O_P(1)$ for $j = D, S$.

Theorem 4. We assume that Assumptions 1-8 are satisfied. Then, as $\min(N, T_j) \to \infty$, it holds that

$$\hat\gamma_{g,j,1} - \gamma_{g,j,1}^0 = o_P(1) + o_{P^*}(1),$$

for $j = D, S$, for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
5.3.2 Asymptotic distribution

In order to derive the limiting distribution of the estimators proposed above, we need the following assumption, which strengthens some of the assumptions above.

⁹Assumption 7 contains some regularity conditions on the pricing errors $\alpha_i$ in (2.1). By part (i), we require that the second moment of the $\alpha_i$s grows linearly with $N$; a possible case in which this would happen is when $\alpha_i$ is i.i.d. with zero mean and finite variance. However, the assumption is more general than this, and it allows, e.g., for the $\alpha_i$s to have nonzero mean. Indeed, if $E(\alpha_i) \ne 0$ for $1 \le i \le N_{\alpha}$ with $N_{\alpha} = O\left(N^{1/2}\right)$, and $E\left(\sum_{i=N_{\alpha}+1}^{N} w_i \alpha_i\right)^2 \le c_0 N$, the assumption would hold anyway. Note also that part (iii) of the assumption could be derived using more primitive assumptions on the $\alpha_i$s (see, e.g., the very general CLT in McLeish et al. (1974)).
Assumption 9. For $j = D, S$, it holds that: (i) $E\left(e_t \mid u_{j,t}^0\right) = 0$ for all $t$; and (ii) as $T \to \infty$,

$$T^{-1/2}\begin{pmatrix} \sum_{t=1}^{T} \mathrm{vec}\left(e_t u_{j,t}^{0\prime}\right) d_{j,t}\left(\theta^0\right) \\ \sum_{t=1}^{T} u_{j,t}^0 d_{j,t}\left(\theta^0\right) \end{pmatrix} \stackrel{D}{\to} N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V_{eu,j} & \Pi_{ue,j}' \\ \Pi_{ue,j} & V_{u,j} \end{pmatrix}\right],$$

where

$$V_{eu,j} = \lim_{T\to\infty} E\left(T^{-1}\sum_{t,s=1}^{T} \mathrm{vec}\left(e_t u_{j,t}^{0\prime}\right)\left(\mathrm{vec}\left(e_s u_{j,s}^{0\prime}\right)\right)' d_{j,t}\left(\theta^0\right) d_{j,s}\left(\theta^0\right)\right),$$

$$V_{u,j} = \lim_{T\to\infty} E\left(T^{-1}\sum_{t,s=1}^{T} u_{j,t}^0 u_{j,s}^{0\prime} d_{j,t}\left(\theta^0\right) d_{j,s}\left(\theta^0\right)\right),$$

$$\Pi_{ue,j} = \lim_{T\to\infty} E\left(T^{-1}\sum_{t,s=1}^{T} u_{j,t}^0 \left(\mathrm{vec}\left(e_s u_{j,s}^{0\prime}\right)\right)' d_{j,t}\left(\theta^0\right) d_{j,s}\left(\theta^0\right)\right).$$
We also make the following (simplifying) assumption, which complements Assumption 7 by requiring the sphericity of the $\alpha_{j,i}$s.

Assumption 10. It holds that $\alpha_{j,i}$ is i.i.d. across $i$, with $E\left(\alpha_{j,1}^2\right) = \sigma_{\alpha,j}^2 < \infty$ for $j = D, S$.
For $j = D, S$, consider the matrices $M_u^{(j)} = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} u_{j,t}^0 u_{j,t}^{0\prime} d_{j,t}\left(\theta^0\right)$. We then define

$$\Sigma_{\gamma,j} = c_j^{-2}\Lambda_j^0 V_{u,j}\Lambda_j^{0\prime} + \left(\left(\gamma_{j,1}^{0\prime}\left(M_u^{(j)}\right)^{-1}\right)\otimes I_K\right) V_{eu,j}\left(\left(\left(M_u^{(j)}\right)^{-1}\gamma_{j,1}^0\right)\otimes I_K\right)$$
$$\qquad + c_j^{-1}\Lambda_j^0 \Pi_{ue,j}\left(\left(\left(M_u^{(j)}\right)^{-1}\gamma_{j,1}^0\right)\otimes I_K\right) + c_j^{-1}\left(\left(\gamma_{j,1}^{0\prime}\left(M_u^{(j)}\right)^{-1}\right)\otimes I_K\right)\Pi_{ue,j}'\Lambda_j^{0\prime},$$

$$\Sigma_{\alpha\gamma,j} = \sigma_{\alpha,j}^2 \Lambda_j^0\left[\frac{1}{N} B_j^{0\prime} B_j^0 - \left(\frac{1}{N} B_j^{0\prime} i_N\right)\left(\frac{1}{N} B_j^{0\prime} i_N\right)'\right]^{-1}\Lambda_j^{0\prime},$$

where $i_N$ is an $N \times 1$ vector of ones.

Theorem 5. We assume that Assumptions 1-9 are satisfied, and that $\min(N, T_j) \to \infty$, with

$$\frac{T_j}{T} \to c_j \in (0,1), \qquad (5.3)$$

$$\frac{T^{1/2}}{N} \to 0. \qquad (5.4)$$

Then, it holds that

$$\left(\frac{1}{T}\Sigma_{\gamma,j} + \frac{1}{N}\Sigma_{\alpha\gamma,j}\right)^{-1/2}\left(\hat\gamma_{g,j,1} - \gamma_{g,j,1}^0\right) \stackrel{D^*}{\to} N\left(0, I_K\right), \qquad (5.5)$$

for $j = D, S$, for almost all realisations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
A comment on (5.4) may be in order. By Theorem 4, all estimators are consistent, irrespective of the relative rate of divergence between $N$ and $T$ as they pass to infinity. Theorem 5, conversely, requires a restriction, namely that $T = o\left(N^2\right)$. However, most datasets (unless $T$ is very "short", which is typically not the case in these studies) can be expected to satisfy this requirement.
We propose the following estimators:

$$\hat\Sigma_{\gamma,j} = \hat c_j^{-2}\hat\Lambda_j \hat V_{u,j}\hat\Lambda_j' + \left(\left(\hat\gamma_{j,1}'\left(\hat M_u^{(j)}\right)^{-1}\right)\otimes I_K\right)\hat V_{eu,j}\left(\left(\left(\hat M_u^{(j)}\right)^{-1}\hat\gamma_{j,1}\right)\otimes I_K\right) \qquad (5.6)$$
$$\qquad + \hat c_j^{-1}\hat\Lambda_j \hat\Pi_{ue,j}\left(\left(\left(\hat M_u^{(j)}\right)^{-1}\hat\gamma_{j,1}\right)\otimes I_K\right) + \hat c_j^{-1}\left(\left(\hat\gamma_{j,1}'\left(\hat M_u^{(j)}\right)^{-1}\right)\otimes I_K\right)\hat\Pi_{ue,j}'\hat\Lambda_j',$$

$$\hat\Sigma_{\alpha\gamma,j} = \hat\sigma_{\alpha,j}^2 \hat\Lambda_j\left[\frac{1}{N}\hat B_j^{\hat P_j\prime}\hat B_j^{\hat P_j} - \left(\frac{1}{N}\hat B_j^{\hat P_j\prime} i_N\right)\left(\frac{1}{N}\hat B_j^{\hat P_j\prime} i_N\right)'\right]^{-1}\hat\Lambda_j'. \qquad (5.7)$$

In (5.6), recall that $\hat X_j = \left(\iota_N, \hat B_j^{\hat P_j}\right)$. Also, we have used

$$\hat M_u^{(j)} = \frac{1}{T}\sum_{t=1}^{T} \hat u_{j,t}\hat u_{j,t}'\hat d_{j,t}, \qquad (5.8)$$

$$\hat V_{u,j} = \hat m_{u,0}^{(j)} + \sum_{k=1}^{h_m}\left(1 - \frac{k}{h_m+1}\right)\left(\hat m_{u,k}^{(j)} + \hat m_{u,k}^{(j)\prime}\right), \qquad (5.9)$$

$$\hat V_{eu,j} = \hat m_{eu,0}^{(j)} + \sum_{k=1}^{h_m}\left(1 - \frac{k}{h_m+1}\right)\left(\hat m_{eu,k}^{(j)} + \hat m_{eu,k}^{(j)\prime}\right), \qquad (5.10)$$

$$\hat\Pi_{ue,j}' = \frac{1}{T}\sum_{t=1}^{T}\left(\mathrm{vec}\left(\hat e_t \hat u_{j,t}'\right)\right)\hat u_{j,t}'\hat d_{j,t} + \sum_{k=1}^{h_m}\left(1 - \frac{k}{h_m+1}\right)\left(\hat m_{\Pi,k}^{(j)} + \tilde m_{\Pi,k}^{(j)}\right), \qquad (5.11)$$

with

$$\hat m_{u,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\hat u_{j,t}\hat d_{j,t}\right)\left(\hat u_{j,t-k}\hat d_{j,t-k}\right)',$$

$$\hat m_{eu,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\mathrm{vec}\left(\hat e_t \hat u_{j,t}'\right)\hat d_{j,t}\right)\left(\mathrm{vec}\left(\hat e_{t-k}\hat u_{j,t-k}'\right)\hat d_{j,t-k}\right)',$$

$$\hat m_{\Pi,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\mathrm{vec}\left(\hat e_{t-k}\hat u_{j,t-k}'\right)\hat d_{j,t-k}\right)\left(\hat u_{j,t}\hat d_{j,t}\right)',$$

$$\tilde m_{\Pi,k}^{(j)} = \frac{1}{T}\sum_{t=k+1}^{T}\left(\mathrm{vec}\left(\hat e_t \hat u_{j,t}'\right)\hat d_{j,t}\right)\left(\hat u_{j,t-k}\hat d_{j,t-k}\right)',$$

for $k = 0, 1, \ldots$. In (5.9)-(5.10), we use $h_m = O\left(T^{1/3}\right)$, and $\hat e_t = g_t - \left(\hat a + \hat\Lambda\hat u_t\right)$. Finally, in order to compute $\hat\sigma_{\alpha,j}^2$, let

$$\hat e_{j,i,t} = R_{i,t}\hat d_{j,t} - \left[\hat\gamma_{j,0} + \hat\beta_{j,i}'\left(\hat\gamma_{j,1} + \hat u_{j,t}\right)\right]\hat d_{j,t},$$

and define $\bar e_{j,i} = \hat T_j^{-1}\sum_{t=1}^{T}\hat e_{j,i,t}$ and

$$\hat\sigma_{\alpha,j}^2 = \frac{1}{N}\sum_{i=1}^{N}\bar e_{j,i}^2.$$
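The Bartlett-kernel (Newey-West type) estimators in (5.9)-(5.11) share the same structure; a generic sketch for the long-run variance of a $T \times q$ array of regime-weighted series, with the truncation $h_m = O(T^{1/3})$ used in the text:

```python
import numpy as np

def bartlett_lrv(Z, h_m=None):
    """Long-run variance with Bartlett weights, as in (5.9)-(5.10):
    m_0 + sum_{k=1}^{h_m} (1 - k/(h_m+1)) (m_k + m_k'), where m_k is the
    lag-k cross moment (1/T) sum_{t=k+1}^T z_t z_{t-k}' of the rows of Z."""
    T = Z.shape[0]
    if h_m is None:
        h_m = int(T ** (1.0 / 3.0))     # truncation h_m = O(T^{1/3})
    V = Z.T @ Z / T                     # m_0
    for k in range(1, h_m + 1):
        m_k = Z[k:].T @ Z[:-k] / T      # lag-k cross moment
        V += (1.0 - k / (h_m + 1.0)) * (m_k + m_k.T)
    return V
```

Feeding rows $\hat u_{j,t}\hat d_{j,t}$ gives $\hat V_{u,j}$, and rows $\mathrm{vec}(\hat e_t \hat u_{j,t}')\hat d_{j,t}$ give $\hat V_{eu,j}$.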
All the estimators defined above are consistent; to save space, we omit the proofs, which are based on standard, if tedious, calculations using the results reported above. The only result that does not follow immediately is the consistency of $\hat\sigma_{\alpha,j}^2$, which we state as a lemma.

Lemma 2. We assume that Assumptions 1-10 hold. Then, as $\min(N, T_j) \to \infty$ for $j = D, S$, under (5.3) and (5.4), it holds that $\hat\sigma_{\alpha,j}^2 = \sigma_{\alpha,j}^2 + o_{P^*}(1)$, for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
When estimating $\Gamma_j$, we also obtain an estimator of the zero-beta rate $\gamma_{j,0}$ for $j = D, S$. Based on equation (2.5), we note that the quantity

$$E\left(I(D_t)\gamma_{D,0} + I(S_t)\gamma_{S,0}\right) = \pi_D\gamma_{D,0} + \pi_S\gamma_{S,0}$$

can be interpreted as an "unconditional" zero-beta rate. Thus, we can also propose a test for

$$H_0: \pi_D\gamma_{D,0} + \pi_S\gamma_{S,0} = 0. \qquad (5.12)$$

The following lemma characterizes the distribution of $\hat\gamma_{j,0}$ and the test for (5.12).

Lemma 3. We assume that Assumptions 1-10 hold, and that $E\left(\alpha_{i,D}\alpha_{k,S}\right) = 0$ for all $1 \le i, k \le N$. Then, as $\min(N, T_j) \to \infty$ with $N T_j^{-2} \to 0$, it holds that

$$\left(\hat\sigma_{\alpha,j}^2\left(i_N' M_{B,j}^0 i_N\right)^{-1}\right)^{-1/2}\left(\hat\gamma_{j,0} - \gamma_{j,0}\right) \stackrel{D^*}{\to} N(0,1), \qquad (5.13)$$

for $j = D, S$, and

$$\left(\hat\pi_D^2\hat\sigma_{\alpha,D}^2\left(i_N' M_{B,D}^0 i_N\right)^{-1} + \hat\pi_S^2\hat\sigma_{\alpha,S}^2\left(i_N' M_{B,S}^0 i_N\right)^{-1}\right)^{-1/2}\left(\left(\hat\pi_D\hat\gamma_{D,0} + \hat\pi_S\hat\gamma_{S,0}\right) - \left(\pi_D\gamma_{D,0} + \pi_S\gamma_{S,0}\right)\right) \stackrel{D^*}{\to} N(0,1), \qquad (5.14)$$

for almost all realizations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$, where $M_{B,j}^0 = B_j^0\left(B_j^{0\prime} B_j^0\right)^{-1} B_j^{0\prime}$ for $j = D, S$.

The lemma can be compared with Theorem 3 in Giglio and Xiu (2020).
5.3.3 Testing for $\theta^0 = c$

We propose a test for the (general) null hypothesis

$$H_0: \theta^0 = c, \qquad (5.15)$$

where $c$ is a given number. Given that - to the best of our knowledge - the limiting distribution of $\hat\theta$ is not available, we rely only on rates, and use a randomised test (closely related, e.g., to the test proposed in Horvath and Trapani (2019)).

Recall that, by Lemma A.7,

$$\hat\theta - \theta^0 = o_{a.s.}\left(N^{-\eta} T^{-1} v_{N,T}(\epsilon)\right),$$

where $v_{N,T}(\epsilon)$ is defined in (A.3). In order to propose a test for (5.15), define a sequence $s_{N,T}$ such that

$$\lim_{\min(N,T)\to\infty} s_{N,T} = \infty, \qquad \lim_{\min(N,T)\to\infty} s_{N,T}\, N^{-\eta} T^{-1} v_{N,T}(\epsilon) = 0,$$

for all $\epsilon > 0$, and note that, by construction,

$$s_{N,T}\left(\hat\theta - c\right) = s_{N,T}\left(\theta^0 - c\right) + o_{a.s.}(1).$$

This entails that, under $H_0$,

$$P\left(\omega: \lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = 0\right) = 1,$$

and therefore we can assume that, under the null,

$$\lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = 0.$$
Conversely, whenever $\theta^0 \ne c$, it follows that

$$P\left(\omega: \lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = \infty\right) = 1,$$

and therefore we can assume that, under $H_A$,

$$\lim_{\min(N,T)\to\infty} s_{N,T}\left(\hat\theta - c\right) = \infty.$$

This dichotomous behaviour can be reversed by choosing a transformation $g(\cdot)$ such that $\lim_{x\to\infty} g(x) = 0$ and $\lim_{x\to 0} g(x) = \infty$, thus defining

$$f_{N,T} = g\left(s_{N,T}\left(\hat\theta - c\right)\right).$$

As before, we can assume that

$$\lim_{\min(N,T)\to\infty} g\left(s_{N,T}\left(\hat\theta - c\right)\right) = \infty, \qquad \lim_{\min(N,T)\to\infty} g\left(s_{N,T}\left(\hat\theta - c\right)\right) = 0,$$
under $H_0$ and $H_A$, respectively. Hence, we can define a test similar to the one in Section 4.2.3:

Step 1 Generate an i.i.d. sequence $\{\xi_m^{\theta}\}$, $1 \le m \le M_{\theta}$, with common distribution $N(0,1)$.

Step 2 Generate the Bernoulli sequence $\zeta_m^{\theta}(s) = I\left(f_{N,T}^{1/2}\,\xi_m^{\theta} \le s\right)$.

Step 3 Compute

$$\sigma^{\theta}(s) = \frac{2}{M_{\theta}^{1/2}}\sum_{m=1}^{M_{\theta}}\left(\zeta_m^{\theta}(s) - \frac{1}{2}\right).$$

Step 4 Define

$$S^{\theta} = \int_{-\infty}^{+\infty}\left|\sigma^{\theta}(s)\right|^2 \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{s^2}{2}\right) ds. \qquad (5.16)$$
Then it can be shown that:

Theorem 6. We assume that the assumptions of Lemma A.7 are satisfied. Then, as $\min(N, M_{\theta}, T) \to \infty$ with

$$\frac{M_{\theta}}{g^{1/2}\left(s_{N,T}\, N^{-\eta} T^{-1} v_{N,T}(\epsilon)\right)} \to 0, \qquad (5.17)$$

it holds that $S^{\theta} \stackrel{D^*}{\to} \chi_1^2$ under $H_0$, and that $M_{\theta}^{-1} S^{\theta} \stackrel{P^*}{\to} c_0 > 0$ under $H_A$, for almost all realisations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.

As far as the practical application of the test is concerned, we have used

$$f_{N,T} = \frac{1}{\sqrt{NT}\left|\hat\theta - c\right|},$$

and $M_{\theta} = (NT)^{1/4}$.
Note that the test can be immediately generalised to comparing estimates of $\theta$ from different datasets. Indeed, let the subscripts 1 and 2 denote each dataset, define the sample sizes as $(N_1, T_1)$ and $(N_2, T_2)$, and the relevant estimates as $\hat\theta_1$ and $\hat\theta_2$. Then, upon defining a sequence $s_{N,T}^* = s_{N,T}^*(N_1, T_1, N_2, T_2)$ such that

$$\lim_{\min(N_1,T_1,N_2,T_2)\to\infty} s_{N,T}^* = \infty, \qquad \lim_{\min(N_1,T_1,N_2,T_2)\to\infty} \frac{s_{N,T}^*}{\min\left\{N_1^{\eta} T_1, N_2^{\eta} T_2\right\}}\,\max\left\{v_{N_1,T_1}(\epsilon), v_{N_2,T_2}(\epsilon)\right\} = 0,$$

the test can be applied readily, using

$$f_{N,T}^* = \frac{1}{\sqrt{\min\left\{N_1 T_1, N_2 T_2\right\}}\left|\hat\theta_1 - \hat\theta_2\right|},$$

with $M_{\theta} = \min\left\{(N_1 T_1)^{1/4}, (N_2 T_2)^{1/4}\right\}$.

Theorem 6 entails that

$$\lim_{\min(N,T,M_{\theta})\to\infty} P^*\left(S^{\theta} \ge c_{\alpha}\right) = 1, \qquad (5.18)$$

under $H_A$, and

$$\lim_{\min(N,T,M_{\theta})\to\infty} P^*\left(S^{\theta} \ge c_{\alpha}\right) = \alpha, \qquad (5.19)$$

under $H_0$, where $c_{\alpha}$ is defined by $P\left(\chi_1^2 \ge c_{\alpha}\right) = \alpha$ for a nominal level $\alpha$. Both equations hold for almost all realisations of $\{R_{i,t}, 1 \le i \le N, 1 \le t \le T\}$.
Whilst equation (5.18) has the "traditional" interpretation (when the null is false, the test - asymptotically - rejects with probability 1), equation (5.19) does not have a straightforward meaning. As pointed out, e.g., in Horvath and Trapani (2019), the test is constructed using a randomisation which does not vanish asymptotically, and therefore the asymptotics of $S^{\theta}$ is driven by the added randomness. Thus, different researchers using the same data will obtain different values of $S^{\theta}$ and, consequently, different p-values. If this feature is deemed undesirable, Horvath and Trapani (2019) propose to generate the test statistic $S_b^{\theta}$ for $1 \le b \le B$ times, using, at each iteration, an i.i.d. sequence $\{\xi_{m,b}^{\theta}\}$, $1 \le m \le M_{\theta}$, with common distribution $N(0,1)$, independent across $b$. Defining

$$Q(\alpha) = \frac{1}{B}\sum_{b=1}^{B} I\left(S_b^{\theta} \le c_{\alpha}\right), \qquad (5.20)$$

it is easy to see that, by the Law of the Iterated Logarithm, it follows that¹⁰

$$\liminf_{B\to\infty}\lim_{\min(N,T,M_{\theta})\to\infty}\sqrt{\frac{B}{2\ln\ln B}}\,\frac{Q(\alpha) - (1-\alpha)}{\sqrt{\alpha(1-\alpha)}} = -1,$$

and therefore a "strong rule" to decide in favour of $H_0$ is

$$Q(\alpha) \ge (1-\alpha) - \sqrt{\alpha(1-\alpha)}\sqrt{\frac{2\ln\ln B}{B}}. \qquad (5.21)$$
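The de-randomised decision rule (5.20)-(5.21) is straightforward to apply once the $B$ draws of $S^{\theta}$ are available (a sketch; the function name is ours):

```python
import numpy as np

def strong_rule_decision(S_stats, c_alpha, alpha):
    """Compute Q(alpha) as in (5.20) and apply the strong rule (5.21):
    decide in favour of H0 iff
    Q(alpha) >= (1 - alpha) - sqrt(alpha (1 - alpha)) sqrt(2 ln ln B / B)."""
    S = np.asarray(S_stats, dtype=float)
    B = S.size
    Q = (S <= c_alpha).mean()                   # fraction of non-rejections
    cutoff = (1.0 - alpha) - np.sqrt(alpha * (1.0 - alpha)) \
        * np.sqrt(2.0 * np.log(np.log(B)) / B)
    return bool(Q >= cutoff), Q
```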
6 Simulations

We evaluate the performance of (i) the sequential procedure to determine the number of common factors and (ii) the three-pass estimation, through a small-scale Monte Carlo exercise.¹¹ Data are generated according to (2.5), that is,

$$R_{i,t} = I(D_t)\left(\gamma_{D,0} + \alpha_{D,i} + \beta_{D,i}'\gamma_{D,1} + \beta_{D,i}' u_{D,t}\right) + I(S_t)\left(\gamma_{S,0} + \alpha_{S,i} + \beta_{S,i}'\gamma_{S,1} + \beta_{S,i}' u_{S,t}\right) + \sqrt{0.5}\,\epsilon_{i,t}. \qquad (6.1)$$

In this data generating process (DGP), we generate all variables with zero mean for simplicity. Where possible and relevant, we calibrate the parameters to values that arise from the empirical analysis. In particular, as far as downside risk is concerned, we set $\theta^0 = -0.03$, which is the value set in Farago and Tedongap (2018) and Lettau, Maggiori, and Weber (2014). Similarly, we generate $z_t$ as i.i.d. $N\left(\mu_z, \sigma_z^2\right)$. Based on our data, we set $\sigma_z = 0.044$; also, in Section 7 we found that $P\left(z_t \le \theta^0\right) \simeq 0.15$, which corresponds to a value $\mu_z = -0.015$. The rest of the DGP is specified as follows. For $j = D, S$, we generate $u_{j,t}$ as i.i.d. $N(0,1)$ for $1 \le t \le T$; similarly, we generate $\beta_{j,i}$ as i.i.d. $N(0,1)$ for $1 \le i \le N$; finally, $\alpha_{j,i}$ is also i.i.d. $N(0,1)$ for $1 \le i \le N$.¹²
¹⁰Details on this argument are in Horvath and Trapani (2019).

¹¹As far as the former is concerned, the results complement the ones in Trapani (2018), which considers the case of a linear model.

¹²We note that we have carried out some limited experiments where, e.g., a vanishing proportion of the $\alpha_{j,i}$s had nonzero mean, but results did not change. As far as the idiosyncratic innovation $\epsilon_{i,t}$ is concerned, we only consider the case of no cross-sectional dependence in our simulations; thus, we generate $\epsilon_{i,t}$ as i.i.d. $N(0,0.5)$.
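The DGP (6.1) can be simulated along the following lines (a sketch under the stated zero-mean simplification, so the premia constants $\gamma_{j,0}$ and $\gamma_{j,1}$ are set to zero here for brevity; all names are ours):

```python
import numpy as np

def simulate_dgp(N, T, P_D, P_S, theta0=-0.03, mu_z=-0.015, sigma_z=0.044, seed=0):
    """Simulate the two-regime DGP (6.1): z_t ~ N(mu_z, sigma_z^2) drives
    the downside indicator I(z_t <= theta0); loadings, latent factors and
    pricing errors are i.i.d. N(0,1), as in the Section 6 calibration."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu_z, sigma_z, T)
    D = (z <= theta0).astype(float)              # I(D_t); I(S_t) = 1 - D
    R = np.zeros((N, T))
    for P, reg in ((P_D, D), (P_S, 1.0 - D)):
        B = rng.standard_normal((N, P))          # loadings beta_{j,i}
        u = rng.standard_normal((P, T))          # latent factors u_{j,t}
        a = rng.standard_normal(N)               # pricing errors alpha_{j,i}
        R += reg[None, :] * (a[:, None] + B @ u)
    return R + np.sqrt(0.5) * rng.standard_normal((N, T)), D
```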
Results are reported for the following combinations of $(N, T)$, which are directly based on the sample sizes in our empirical applications: $(17, 273)$ and $(130, 666)$. In addition, we also report simulation results for two more cases: $(70, 273)$, because it is helpful in understanding the improvement in estimation performance that accrues when moving from a very small $N = 17$ to $N = 130$, hence providing guidance to applied researchers on the size of the cross-section desired for implementing our methodology; and $(330, 666)$, to consider the case of a sample with large $N$ and large $T$. We consider various numbers of common factors, which mimic the findings in our empirical applications: $P_D^0 \in \{1, 2\}$ and $P_S^0 \in \{1, 2, 3, 4, 5, 6\}$.¹³

We compute, over 1000 replications, two indicators: (i) the average estimated number of factors

$$\bar P_j = \frac{1}{1000}\sum_{r=1}^{1000}\hat P_{j,r},$$

for $j = D, S$, and (ii) the percentage of times that $\hat P_{j,r} \ne P_j^0$, for $j = D, S$.
The results, reported in Table 1, are very satisfactory. Specifically, the estimated number of factors is always close to the true number of factors. This is very clear when $N$ is at least equal to 70. The worst performance occurs for the smallest cross-section considered, i.e. $(17, 273)$, where the number of common factors is underestimated when the true number of factors in the satisfactory state is 5 or 6 (the estimates are, on average, 4 and 5 respectively). However, this is hardly surprising, given that the true DGP combines a small cross-section with a large set of common factors, which seems a rather unlikely DGP anyway. Overall, with the exception noted above, the simulation results provide comfort that the number of common factors is precisely identified by our procedure, even in moderately sized cross-sections of asset returns.

¹³Therefore we do not consider the case where there is no factor at all (i.e., either $P_D^0 = 0$ or $P_S^0 = 0$, or both), requiring instead that at least one factor should always be present in asset returns, which seems reasonable (see Lettau and Pelger (2020) for a related discussion).
7 Empirical analysis

7.1 Data

The empirical analysis in this section illustrates the use of the methodology proposed in this paper in relation to two panels of test assets, namely a cross-section of equity portfolios and a set of multiple asset classes. The data are described in detail in Sections 7.1.1 and 7.1.2, respectively.

7.1.1 Equity panel

We begin with returns in excess of the risk-free rate on the following set of $N = 130$ U.S. equity portfolios: 25 portfolios sorted by size and book-to-market; 25 portfolios sorted by size and momentum; 10 size portfolios; 10 book-to-market portfolios; 10 momentum portfolios; 25 portfolios sorted by size and operating profitability; and 25 portfolios sorted by size and investment. The risk-free rate is the one-month US Treasury bill rate. The market return used to determine the disappointment event (as detailed in Section 2.3) is the value-weighted average log-return computed with respect to all stocks available from CRSP.

The data are taken from Kenneth French's website.¹⁴ We consider the sample period from July 1963 to December 2018, a total of $T = 666$ monthly observations. Farago and Tedongap (2018) analyse each of these sets of portfolios individually.¹⁵ We consider the entire cross-section of returns, which allows us to exploit our theoretical results and to be aligned with Lewellen, Nagel, and Shanken (2010), who advocate the use of sizable cross-sections of asset returns.
7.1.2 Data for multiple asset classes
Cochrane (2011) suggests investigating the factor structure across multiple asset classes to
study the corresponding underlying discount factors. To analyze the effect of downside risk on
multiple asset classes we consider a cross section that includes currency, equity and commodity
14See https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.15See Section 3.2 and Table 1 in Farago and Tedongap (2018).
33
returns, with N = 17 portfolios, obtained from Lettau, Maggiori, and Weber (2014)16: 6 currency
portfolios constructed from 53 currencies sorted according to their interest rate differential with
the US rate (carry trade portfolios); 6 equity portfolios sorted by size and book-to-market ratio; 5
commodity futures portfolios sorted by the commodity basis from Yang (2013). As for the equity
panel, the market return to determine the disappointment event as detailed in Section 2.3 is the
value-weighted average log-return computed with respect to all stocks available from CRSP.
The data sample runs from April 1986 to December 2012, which gives us T = 273 time series
observations. The data and our empirical exercise are similar to those in Lettau, Maggiori, and
Weber (2014), who, however, capture the equity market by including the market excess return as a
test asset. This second dataset has a smaller cross section of asset returns than the equity returns
sample, but a much more diverse set of returns, hence providing incremental information on the
usefulness of our model.
7.2 Estimation and inference
In this section, we report estimation and inference on the threshold level θ (Section 7.2.1) and
on the number of factors in each regime (Section 7.2.2).
7.2.1 Estimation and inference on θ
Results are in Table 2. Across both datasets, the estimated threshold θ̂ is approximately
equal to −0.06. Therefore, θ̂ is more negative than the fixed threshold θ̄ = −0.03 used in Lettau
et al. (2014) and Farago and Tedongap (2018): the estimated downside state is less frequent
but more severe than implied by the fixed threshold. This feature can also be seen by looking
at the number of time periods within the downside regime, TD, and the corresponding relative
frequency πD: θ̂ identifies downside periods that occur approximately half as often as those associated with θ̄.
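The threshold estimator can be illustrated as a grid search: for each candidate θ on a grid of market-return quantiles, split the sample into the two regimes, extract principal-components factors regime by regime, and keep the θ that minimizes the pooled least-squares residual. The following is a minimal sketch of this idea under simplified assumptions (one principal-components factor in the downside regime and two in the other; all function and variable names are our own, not the paper's code):

```python
import numpy as np

def regime_pca_ssr(R, d, k):
    """Sum of squared residuals of a rank-k PCA fit on one regime's observations."""
    X = R[d]                                   # T_j x N block of returns
    if X.shape[0] <= k:
        return np.inf
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    return float(np.sum(s[k:] ** 2))           # energy left after k components

def estimate_threshold(R, rm, grid, k_d=1, k_s=2):
    """Grid search: pick the theta minimising the pooled regime-wise SSR."""
    losses = [regime_pca_ssr(R, rm <= th, k_d) + regime_pca_ssr(R, rm > th, k_s)
              for th in grid]
    return grid[int(np.argmin(losses))]

# Simulated panel: one strong factor below the threshold, two factors above it
rng = np.random.default_rng(0)
T, N, theta0 = 600, 20, -0.06
rm = rng.normal(0.005, 0.04, T)                # market log-return
down = rm <= theta0
f = rng.normal(size=(T, 2))
B_s = rng.normal(size=(N, 2))                  # loadings in the "normal" regime
B_d = np.column_stack([3 * np.ones(N), np.zeros(N)])   # one pervasive factor
R = np.where(down[:, None], f @ B_d.T, f @ B_s.T) + 0.1 * rng.normal(size=(T, N))

grid = np.quantile(rm, np.linspace(0.02, 0.30, 57))
theta_hat = estimate_threshold(R, rm, grid)
```

The grid of market-return quantiles plays the role of the trimmed candidate set; the loss is flat between adjacent order statistics, so the estimate is effectively a classification of observations into the two regimes.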
16The data are available at https://faculty.chicagobooth.edu/michael.weber/research/index.html.
To enhance our understanding of the downside regime, we evaluate the correlation between
the downside indicators, as resulting from using θ and θ, and various measures of economic and
financial conditions that may be related to the downside regime. We consider the following set of
variables: the NBER U.S. recession indicator (REC); the log-difference of industrial production
(∆I P); the CBOE volatility index based on S&P 500 index options (VIX); the CBOE volatility
index based on S&P 100 index options (VXO); the economic policy uncertainty index of Baker,
Bloom, and Davis (2016) (EPU); the Equity Market Volatility tracker of Baker, Bloom, and Davis
(2016) (EMV); the Policy-Related Equity Market Volatility tracker of Baker, Bloom, Davis, and
Kost (2019) (PREMV); the Geopolitical Risk index of Caldara and Iacoviello (2018) (GPR); the
TED spread (TED); the disaster probability for the U.S. of Barro and Liao (2020) (SPX); the
spread between the 10- and 2-year Treasury yields (T10Y2Y). In all cases (with the exception
of REC), we compute the standard Pearson correlation coefficient, which in this case is also
known as the “point-biserial correlation.” When computing the correlation between the downside
indicator and REC, in light of the fact that these are both binary variables, we also use a
different measure of association, known as the “tetrachoric correlation coefficient.” This indicator
may be regarded as more adequate for our case, since it is based on the assumption that there
exists a continuous latent variable which determines each binary variable; however, there exists a
one-to-one mapping between this indicator and Pearson’s correlation (see Ekstrom (2011) for an
insightful discussion of both indicators and the historical debate around them). Broadly speaking,
the correlations point towards a connection between the downside indicator and various volatility
measures (not just the VIX but also several others). This finding is consistent with a vast litera-
ture on asset pricing, which shows that volatility risk is a priced risk factor (see, inter alia, Ang,
Hodrick, Xing, and Zhang (2006), Ang, Hodrick, Xing, and Zhang (2009), and Menkhoff, Sarno,
Schmeling, and Schrimpf (2012)). Note, however, that the correlation is imperfect and generally
revolves around 0.5; this indicates that the determinants of the threshold go beyond volatility.
Indeed, the correlation with the NBER recession indicator is also significant and positive, but again around 0.5. One
exception is the correlation with the term spread at different maturities, which is statistically
insignificantly different from zero.17
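The two association measures can be computed as follows. This is a hedged sketch, not the paper's code: the `tetrachoric` helper is our own illustration, which recovers the latent correlation of a bivariate normal by matching the observed 2×2 cell probability (a numerical version of the mapping discussed in Ekstrom (2011)):

```python
import numpy as np
from statistics import NormalDist

def quadrant_prob(rho, h, k, n=400, upper=8.0):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho,
    via a midpoint-rule double integral over [h, upper] x [k, upper]."""
    dx, dy = (upper - h) / n, (upper - k) / n
    x = h + (np.arange(n) + 0.5) * dx
    y = k + (np.arange(n) + 0.5) * dy
    X, Y = np.meshgrid(x, y)
    det = 1.0 - rho ** 2
    pdf = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*det)) / (2*np.pi*np.sqrt(det))
    return pdf.sum() * dx * dy

def tetrachoric(x, y):
    """Latent correlation of a bivariate normal reproducing the 2x2 table of two
    binary series (bisection works because quadrant_prob is increasing in rho)."""
    h = NormalDist().inv_cdf(1 - x.mean())     # latent threshold for x
    k = NormalDist().inv_cdf(1 - y.mean())     # latent threshold for y
    p11 = np.mean(x & y)                       # joint "both = 1" frequency
    lo, hi = -0.99, 0.99
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if quadrant_prob(mid, h, k) < p11 else (lo, mid)
    return 0.5 * (lo + hi)

# Two dummies dichotomised from a latent bivariate normal with rho = 0.6
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=20_000)
rec = z[:, 0] > 1.0        # e.g. a recession dummy
down = z[:, 1] > 1.2       # e.g. the downside indicator
phi = np.corrcoef(rec, down)[0, 1]   # Pearson correlation of the two dummies
rho_hat = tetrachoric(rec, down)
```

As the sketch illustrates, the Pearson (phi) coefficient of two dichotomised normals is attenuated relative to the tetrachoric correlation, which targets the latent ρ directly.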
17Generally speaking, an interesting question is what drives the value of θ, and the model could allow for θ to
We further run inference on the threshold parameter. We first conduct a test, whose details
are spelt out in Section 5.3.3, for the null hypothesis H0 : θ = θ̄: this allows us to formally assess
whether the choice of the fixed threshold θ̄ is supported by the data. As can be seen from Table 2,
the estimated threshold θ̂ is significantly different from θ̄: this provides evidence in favour of our
approach, which allows us to estimate the timing of the downside regime. In addition to testing for the
significance of the difference between θ̂ and θ̄, we use the same testing approach to test the null
that the estimated thresholds are equal across the two datasets: the results in Table 2 show
that the estimated threshold is essentially the same in both datasets.
7.2.2 Estimating the number of common factors
Results are in Table 3.18 We estimate the number of common factors for three cases: the
whole dataset (henceforth denoted without any subscript), the “satisfactory” regime j = S, and
the downside regime corresponding to the “disappointing event” j = D. In light of the results in
Section 7.2.1, we only report results using θ̂. Alongside the estimated number of common factors,
we evaluate the proportion of the total variance explained by each factor (and the cumulative
one). Letting g_j^{(i)} denote, as above, the i-th largest eigenvalue in each of the two regimes
j = D, S (and, similarly, g^{(i)} for the whole sample), the proportion of the total variance
associated with the i-th common factor is measured as

\nu_j^{(i)} = \frac{g_j^{(i)}}{\sum_{i=1}^{N} g_j^{(i)}} \,, \qquad (7.1)
with \nu^{(i)} defined similarly. The cumulative percentage of the total variance explained by the
\hat{P}_j estimated factors is given by

\nu_j = \frac{\sum_{i=1}^{\hat{P}_j} g_j^{(i)}}{\sum_{i=1}^{N} g_j^{(i)}} \,, \qquad (7.2)
be endogenously determined by some state variables. We leave this generalization for future research.
18Given the results from our simulations, we choose M = N; we point out that, similarly to the simulations, we assessed the robustness of our results by also using M = ⌊N/2⌋ and M = 2N, but recorded no changes, at least qualitatively.
with ν defined analogously for the whole sample.
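Given the regime-specific eigenvalues, (7.1) and (7.2) are immediate to compute; a small numerical sketch on simulated data (variable names are ours, not the paper's):

```python
import numpy as np

def variance_shares(R, d):
    """Per-factor variance shares nu^{(i)} (eq. 7.1) and cumulative shares (eq. 7.2)
    from the eigenvalues of (1/NT) sum_t R_t R_t' d_t for one regime."""
    S = (R[d].T @ R[d]) / (R.shape[1] * R.shape[0])
    g = np.sort(np.linalg.eigvalsh(S))[::-1]     # g^{(1)} >= g^{(2)} >= ...
    nu = g / g.sum()
    return nu, np.cumsum(nu)

rng = np.random.default_rng(2)
T, N = 500, 30
f = rng.normal(size=(T, 2)) * np.array([3.0, 1.5])  # two factors, unequal strength
R = f @ rng.normal(size=(2, N)) + 0.5 * rng.normal(size=(T, N))
d = np.ones(T, dtype=bool)                          # treat the full sample as one regime
nu, cum = variance_shares(R, d)
```

In a panel with a strong factor structure, the first few shares dominate and the cumulative share saturates quickly, mirroring the pattern reported in Table 3.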
The most striking finding arising from the results in Table 3 is the discrepancy between the
number of estimated common factors in the two regimes. First, PD is always smaller than PS .
Second, PD = 1 or, at most, 2. The latter occurs when using the dataset of Lettau,
Maggiori, and Weber (2014); in such a case, we know from our simulations that the number of
common factors may be overstated when PD = 1, and therefore the determination of the number
of common factors should rely on considerations beyond the test alone. However, the percentage of
the variance explained by each eigenvalue in the downside state seems to suggest the presence of
two common factors, as opposed to only one. Even in this case, though, the first common factor
explains more than five times as much variance as the second one, whereas such discrepancy is
far more attenuated in regime S . Third, whilst the number of common factors decreases from
regime S to regime D, the proportion of the variance explained by the common factors in the
latter case is invariably higher than in the former case; this is even more remarkable when PD = 1
for the equity returns dataset, which suggests that the only factor driving the data in regime D
is more pervasive than the whole factor structure in regime S. Fourth, applying the test to the
unconditional, linear model would lead to overestimating the number of factors (P̂ = 6 or P̂ = 4
in the two datasets, respectively).
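As a rough cross-check on the regime-specific estimates, a simple eigenvalue-ratio rule can be applied to each regime separately. The sketch below uses this common alternative device, not the randomized test employed in the paper, on simulated data with three factors in normal times and one in the downside regime:

```python
import numpy as np

def eigen_ratio_k(R, d, kmax=8):
    """Estimate the number of factors in one regime as the argmax of the
    eigenvalue ratios g^{(i)} / g^{(i+1)}, i = 1, ..., kmax."""
    S = (R[d].T @ R[d]) / (d.sum() * R.shape[1])
    g = np.sort(np.linalg.eigvalsh(S))[::-1][: kmax + 1]
    return int(np.argmax(g[:-1] / g[1:])) + 1

rng = np.random.default_rng(4)
T, N = 600, 40
down = rng.random(T) < 0.15
f = rng.normal(size=(T, 3))
B_S = rng.normal(size=(N, 3))                          # three factors in regime S
B_D = np.column_stack([2 * B_S[:, 0], np.zeros((N, 2))])  # one factor in regime D
R = np.where(down[:, None], f @ B_D.T, f @ B_S.T) + 0.2 * rng.normal(size=(T, N))

kD = eigen_ratio_k(R, down)     # downside regime
kS = eigen_ratio_k(R, ~down)    # normal regime
```

The rule exploits the gap between the pervasive eigenvalues and the bounded bulk, which is exactly the feature that makes the downside regime look one-dimensional.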
These findings clearly suggest that it is important to use an estimation method which allows
for a different number of factors in each regime. From a statistical point of view, it is natural that
removing a restriction results in better inference - in this case, the number of common factors
is neither understated (which would correspond to an omitted variable problem) nor overstated
(which would be tantamount to including irrelevant variables). Also, from an empirical viewpoint,
results clearly suggest that restricting the number of common factors to be the same across regimes
is not an adequate choice. A related finding from applying our methodology is that the data
tend to be driven by few(er) common factors during bad times, which entails that dependence
among assets increases in a downturn. This result is intuitive and consistent with the anecdotal
evidence that factor diversification tends to disappear when needed
most (see e.g. Page and Panariello (2018) and the references therein).19
7.3 Model fit
In this section, we consider the performance of our latent factor model with downside risk,
and compare it with a linear specification, in the same vein as the approach followed, e.g., by
Lettau, Maggiori, and Weber (2014).
From these results, we can see that the conditional model performs well, with a very high
R2 both in the S regime and in the D regime, as well as for the full sample; similar comments
apply to the RMSPE. The results also show that the performance of the linear, unconditional model
over the full sample is quite satisfactory, with an R2 in the range between 0.925 and 0.978 (when
the number of common factors is set to P = P̂) and a relatively low RMSPE. However, we know from
earlier evidence that the number of latent factors that matter for pricing the cross section of
asset returns in the downside regime is much smaller (1 or at most 2) than the number of factors
required in the S regime. Given that the data experience this strong regime dependence, it seems
surprising that the unconditional, regime-independent pricing model performs as well as it does.
To better understand the gain from using a conditional asset pricing that allows for downside
risk, it is therefore useful to check the performance of the unconditional model for observations
in the S and D regimes separately. The results in the table show that the performance of the
unconditional model is very poor in both the D and S regimes, with very low explanatory power
and large RMSPEs. This suggests that the seemingly good performance of the unconditional
model over the full sample is illusory: it comes from combining large pricing errors of opposite
sign across the two regimes, which offset each other on aggregate over the full sample.
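The full-sample versus regime-wise comparison boils down to computing R2 and RMSPE on the appropriate subsamples. The following stylized sketch (simulated numbers and our own helper names, not the paper's estimates) reproduces the qualitative pattern of offsetting regime errors:

```python
import numpy as np

def fit_diagnostics(realized, predicted, d):
    """R2 and RMSPE of model-implied vs realized returns on the subsample d."""
    r, p = realized[d], predicted[d]
    r2 = 1.0 - np.sum((r - p) ** 2) / np.sum((r - r.mean()) ** 2)
    rmspe = np.sqrt(np.mean((r - p) ** 2))
    return r2, rmspe

rng = np.random.default_rng(3)
T = 400
down = rng.random(T) < 0.1                       # ~10% downside observations
realized = np.where(down, -0.08, 0.01) + 0.02 * rng.normal(size=T)
# a stylized "unconditional" prediction: right on average, biased within each regime
predicted = np.where(down, -0.06, 0.008)
full = np.ones(T, dtype=bool)
r2_full, rmspe_full = fit_diagnostics(realized, predicted, full)
r2_down, rmspe_down = fit_diagnostics(realized, predicted, down)
```

The pooled R2 looks healthy because the prediction tracks the large spread between regimes, while the within-regime R2 can be negative: exactly the "illusory" fit discussed above.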
The latter point can be seen clearly in Figure 1, where we report - for each of the three
sets of test assets examined - scatter plots of realized versus model-implied returns, for both the
19This result is also consistent with the literature that shows that, in crisis times, the macroeconomy is driven by a smaller number of common factors (we refer, inter alia, to Stock and Watson (2009), Cheng, Liao, and Schorfheide (2016) and Li, Todorov, Tauchen, and Lin (2017)).
unconditional model, and the conditional model under three different cases: the D regime, the S
regime, and the full sample. This figure makes it apparent how the unconditional model performs
very poorly in the downside regime, with virtually no explanatory power and large positive errors
- i.e., the unconditional model predicts much higher returns than the realized ones, which tend
to be very negative in this regime. Hence, the unconditional model is not able to capture such
extreme, bad events. The conditional model performs much better, and there appears to be no
systematic tendency to generate positive or negative errors in the downside regime D. Turning to
the S regime, the conditional model performs very well, with the majority of test asset returns
lying close to, or on, the 45 degree line. In contrast, the returns implied by the unconditional
model display a monotonic pattern (hence are highly positively correlated) with realized returns
in the S regime, but do not lie on the 45 degree line for any of the test asset returns. This
means that, even in the S regime, the unconditional model has a tendency to generate systematic
(non-random) pricing errors; however, these pricing errors have the opposite sign than in the D
regime, since the unconditional model predicts lower returns than realized in the S regime. Over
the full sample, the averaging across the errors of these two regimes - large positive errors for
a small number of observations in the D regimes and more moderate negative pricing errors for
a large number of observations in the S regime - gives the appearance that the unconditional
model performs well, but this is of course illusory: the unconditional model performs poorly in
both regimes.
The analysis above clarifies the gain from using a conditional latent factor model of asset
pricing that allows for different regimes, and hence different sets of factors in good and bad times.
The pricing model is very different in these two regimes, capturing the stark difference in the time-
varying factor structure of asset returns, which is very low-dimensional in bad times, and requires
more factors in good times. A model that does not allow for this feature, which is prominent
across all three data sets examined here, can spuriously generate a strong correlation between
model-implied and realized returns over the full sample, but in fact it performs poorly in both
good and bad times. At an applied level, the conditional model is consistent with the notion that
asset returns become very highly correlated in bad times and their factor structure reduces in
dimensionality to one or two factors, and this feature is key for the model to generate predicted
returns that capture downside risk.
7.4 Risk premia
Table 5 displays the results from estimation of the risk premia obtained from the three-pass
procedure detailed in Section 5. We consider the following set of observable pricing factors: the
log-return on the market (ln(Rm)), namely the measure of overall equity market performance used
in Farago and Tedongap (2018); the four additional factors that complete the Fama and French
(2015) five-factor model, that is size (SMB), value (HML), operating profitability (RMW), and
investment (CMA); the momentum factor (MOM), taken from Carhart (1997), whose importance
is further stressed in Asness et al. (2013); the liquidity factor (LIQ) from Pastor and Stambaugh
(2003); the spread between the 10 and the 2 year zero-coupon Treasury rate (T10Y2Y), which
captures the information stemming from the yield curve.20 For equity and multi-asset portfolios,
we perform our analysis with respect to conditional and unconditional specifications.
Starting from equity portfolios, in the case of the unconditional model HML, MOM and, to
a lesser extent, ln(Rm) and CMA, generate positive and statistically significant risk premia. The
unconditional model however hides highly asymmetric dynamics between the regimes: in the oc-
currence of the downside event, ln(Rm) and SMB generate excess returns that are negative and
strongly statistically significant; in tranquil times a higher number of factors, namely ln(Rm),
SMB, MOM and LIQ, are positively priced. Moving to multi-asset portfolios, the unconditional
model does not identify any statistically significant risk premium. However, the conditional mod-
els provide a more informative picture: the risk premia of ln(Rm), SMB and, marginally, RMW
are negative and significant in downside periods; ln(Rm), SMB, HML, RMW and CMA, more
weakly contribute to the excess returns of the test assets in tranquil times.
20The series for ln(Rm), SMB, HML, RMW, CMA and MOM are available from Kenneth French's website: see footnote 14. LIQ is available from Lubos Pastor's website. T10Y2Y is built from the series of 10- and 2-year Treasury rates from Sarno et al. (2007) from July 1963 to December 2003, and from the updated dataset of Gurkaynak et al. (2007), as available from Refet Gurkaynak's website, from January 2004 to December 2018.
To gain further insight, we study the log-market return in more detail. The estimator for
the risk premium of ln(Rm) obtained from the unconditional model is statistically insignificant for
both sets of test assets. Turning to the conditional model, we can proxy the regime probabilities
by the relative frequencies of observations in the two states and calculate the overall risk premium
as the probability-weighted average of the regime-specific risk premia of ln(Rm): this equals 0.55%
and 0.25% per month for equity and multi-asset portfolios, respectively, corresponding to annualized
values of 6.60% and 3.05%. These are sizeable and greater than zero, which shows that the market
factor is indeed priced within the whole time period we consider. The risk premium estimated
from equity portfolios is higher than that from the set of multiple assets: in the latter case the
estimated standard error when the downside event occurs is sizeable and the corresponding esti-
mate for the risk premium is likely to suffer from a loss in precision; this may be due to the fact
that the available panel is of limited size.
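The aggregation used here, a probability-weighted average of the regime premia followed by annualization, takes only a few lines; the numbers below are illustrative placeholders, not the paper's estimates:

```python
# Illustrative placeholders: monthly regime premia for ln(Rm) and regime frequencies
lam_D, lam_S = -0.021, 0.010   # hypothetical downside / tranquil premia
pi_D = 0.10                    # relative frequency of the downside regime
pi_S = 1.0 - pi_D

lam_monthly = pi_D * lam_D + pi_S * lam_S          # probability-weighted premium
lam_annual_simple = 12 * lam_monthly               # simple annualization
lam_annual_comp = (1 + lam_monthly) ** 12 - 1      # compounded annualization
```

With a positive monthly premium, the compounded figure slightly exceeds the simple one, which is why annualized numbers such as 3.05% (rather than 3.00%) arise from a 0.25% monthly premium.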
Finally, it is worth noting that the conditional model is estimated with an intercept, which is
also allowed to vary across the two states. The bottom panel of Table 5 reports the estimates of
the intercepts for both datasets. While the intercepts can of course be different from zero in each
regime, and indeed they are, one would expect the unconditional (probability-weighted) intercept
to be small, or indeed equal to zero, in a well-specified model of asset prices. This is the case
for both datasets, as the 95% confidence intervals of the unconditional intercepts clearly include
zero.
8 Conclusions
In this paper, we propose a methodology to price the cross section of asset returns in the
presence of common (possibly latent) factors and downside risk. Both features have been shown
to offer superior explanatory power in several empirical papers (Lettau, Maggiori, and Weber
(2014)). We extend the framework used in the extant literature in at least three ways: we allow
for downside risk via a threshold specification which allows for the estimation of the (usually set
a priori) threshold level; we consider different factor structures in the two regimes, allowing also
for different numbers of factors; we adapt the methodology developed by Giglio and Xiu (2020)
to recover the observable factors' risk premia from the estimated latent ones in both regimes. We
point out that, although our theory is based on a threshold model and strong, or pervasive, factors,
it can be readily generalised to the case of structural breaks and/or of weak common factors.
Our methodology is illustrated through three applications to a low-dimensional, a medium-sized,
and a large dataset. In all cases, we show that: estimating, rather than setting a priori, the
threshold level improves goodness of fit, and also determines substantial differences in the estimation
of risk premia with respect to an exogenously fixed level; estimating the number of common factors
separately across regimes yields very different estimates, with the downside regime having a small
number of common factors (one or, at most, two), whilst returns in the “normal” regime are driven
by substantially more common factors; and estimating the risk premia of classically employed
observable factors is more accurate and yields estimates with the expected sign.
References
Ang, A., J. Chen, and Y. Xing (2006). Downside risk. The Review of Financial Studies 19(4), 1191–1239.

Ang, A., R. J. Hodrick, Y. Xing, and X. Zhang (2006). The cross-section of volatility and expected returns. The Journal of Finance 61(1), 259–299.

Ang, A., R. J. Hodrick, Y. Xing, and X. Zhang (2009). High idiosyncratic volatility and low returns: International and further U.S. evidence. Journal of Financial Economics 91(1), 1–23.

Asness, C. S., T. J. Moskowitz, and L. H. Pedersen (2013). Value and momentum everywhere. The Journal of Finance 68(3), 929–985.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.

Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.

Bai, Z. and J. Yao (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis 106, 167–177.

Bai, Z. and J.-F. Yao (2008). Central limit theorems for eigenvalues in a spiked population model. In Annales de l'IHP Probabilités et Statistiques, Volume 44, pp. 447–474.

Baker, S. R., N. Bloom, and S. J. Davis (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics 131(4), 1593–1636.

Baker, S. R., N. Bloom, S. J. Davis, and K. J. Kost (2019). Policy news and stock market volatility. Technical report, National Bureau of Economic Research.

Barigozzi, M. and L. Trapani (2018). Determining the dimension of factor structures in non-stationary large datasets. arXiv preprint arXiv:1806.03647.

Barro, R. J. and G. Y. Liao (2020). Rare disaster probability and options pricing. Journal of Financial Economics, forthcoming.

Bates, B. J., M. Plagborg-Møller, J. H. Stock, and M. W. Watson (2013). Consistent factor estimation in dynamic factor models with structural instability. Journal of Econometrics 177, 289–304.

Caldara, D. and M. Iacoviello (2018). Measuring geopolitical risk. FRB International Finance Discussion Paper (1222).

Cappiello, L., R. F. Engle, and K. Sheppard (2006). Asymmetric dynamics in the correlations of global equity and bond returns. Journal of Financial Econometrics 4(4), 537–572.

Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of Finance 52(1), 57–82.

Cheng, X., Z. Liao, and F. Schorfheide (2016). Shrinkage estimation of high-dimensional factor models with structural instabilities. The Review of Economic Studies 83, 1511–1543.

Cochrane, J. H. (2011). Presidential address: Discount rates. The Journal of Finance 66(4), 1047–1108.

Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Oxford University Press.

Ekstrom, J. (2011). The phi-coefficient, the tetrachoric correlation coefficient, and the Pearson–Yule debate.

Fama, E. F. and K. D. French (2015). A five-factor asset pricing model. Journal of Financial Economics 116(1), 1–22.

Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy 81(3), 607–636.

Fan, J., Y. Liao, and J. Yao (2013). Large panel test of factor pricing models. arXiv preprint arXiv:1310.3899.

Farago, A. and R. Tedongap (2018). Downside risks and the cross-section of asset returns. Journal of Financial Economics 129(1), 69–86.

Giglio, S. and D. Xiu (2020). Asset pricing with omitted factors. Journal of Political Economy, forthcoming.

Gurkaynak, R. S., B. Sack, and J. H. Wright (2007). The U.S. Treasury yield curve: 1961 to the present. Journal of Monetary Economics 54(8), 2291–2304.

Hansen, B. E. (2000). Sample splitting and threshold estimation. Econometrica 68(3), 575–603.

Harvey, C. R. and Y. Liu (2020). A census of the factor zoo. Mimeo.

Horvath, L. and L. Trapani (2019). Testing for randomness in a random coefficient autoregression model. Journal of Econometrics 209(2), 338–352.

Lettau, M., M. Maggiori, and M. Weber (2014). Conditional risk premia in currency markets and other asset classes. Journal of Financial Economics 114(2), 197–225.

Lettau, M. and M. Pelger (2020). Estimating latent asset-pricing factors. Journal of Econometrics.

Lewellen, J., S. Nagel, and J. Shanken (2010). A skeptical appraisal of asset pricing tests. Journal of Financial Economics 96(2), 175–194.

Li, J., V. Todorov, G. Tauchen, and H. Lin (2017). Rank tests at jump events. Journal of Business & Economic Statistics, forthcoming.

Markowitz, H. M. (1959). Portfolio Selection: Efficient Diversification of Investments. John Wiley.

Massacci, D. (2017). Least squares estimation of large dimensional threshold factor models. Journal of Econometrics 197(1), 101–129.

McLeish, D. L. (1974). Dependent central limit theorems and invariance principles. The Annals of Probability 2(4), 620–628.

Menkhoff, L., L. Sarno, M. Schmeling, and A. Schrimpf (2012). Carry trades and global foreign exchange volatility. The Journal of Finance 67(2), 681–718.

Merikoski, J. K. and R. Kumar (2004). Inequalities for spreads of matrix sums and products. Applied Mathematics E-Notes 4, 150–159.

Moricz, F. (1983). A general moment inequality for the maximum of the rectangular partial sums of multiple series. Acta Mathematica Hungarica 41, 337–346.

Page, S. and R. A. Panariello (2018). When diversification fails. Financial Analysts Journal 74(3), 19–32.

Pastor, L. and R. F. Stambaugh (2003). Liquidity risk and expected stock returns. Journal of Political Economy 111(3), 642–685.

Peligrad, M. (1987). On the central limit theorem for ρ-mixing sequences of random variables. The Annals of Probability 15(4), 1387–1394.

Peligrad, M., S. Utev, and W. Wu (2007). A maximal Lp-inequality for stationary sequences and its applications. Proceedings of the American Mathematical Society 135(2), 541–550.

Pesaran, M. H. and T. Yamagata (2012). Testing CAPM with a large number of assets. In AFA 2013 San Diego Meetings Paper.

Salzer, H. E., R. Zucker, and R. Capuano (1952). Table of the Zeros and Weight Factors of the First Twenty Hermite Polynomials. US Government Printing Office.

Sarno, L., D. L. Thornton, and G. Valente (2007). The empirical failure of the expectations hypothesis of the term structure of bond yields. Journal of Financial and Quantitative Analysis 42(1), 81–100.

Schwert, G. W. (1989). Why does stock market volatility change over time? The Journal of Finance 44(5), 1115–1153.

Shao, Q.-M. (1995). Maximal inequalities for partial sums of ρ-mixing sequences. The Annals of Probability, 948–965.

Stock, J. H. and M. W. Watson (2009). Forecasting in dynamic factor models subject to structural instability. In J. Castle and N. Shephard (Eds.), The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Oxford University Press.

Trapani, L. (2018). A randomized sequential procedure to determine the number of factors. Journal of the American Statistical Association 113(523), 1341–1349.

Uematsu, Y. and T. Yamagata (2019). Estimation of weak factor models. ISER DP (1053).

Wang, W. and J. Fan (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Annals of Statistics 45(3), 1342.

Yang, F. (2013). Investment shocks and the commodity basis spread. Journal of Financial Economics 110(1), 164–184.
A Technical Lemmas
We begin with a Borel-Cantelli type result, whose proof can be found in Barigozzi and Trapani
(2018).
Lemma A.1. Consider a multi-index random variable $U_{i_1,\ldots,i_h}$, with $1 \le i_1 \le S_1$, $1 \le i_2 \le S_2$, and so on. Assume that

\sum_{S_1} \cdots \sum_{S_h} \frac{1}{S_1 \cdots S_h} \, P\left( \max_{1 \le i_1 \le S_1, \ldots, 1 \le i_h \le S_h} \left| U_{i_1,\ldots,i_h} \right| > \varepsilon L_{S_1,\ldots,S_h} \right) < \infty, \qquad (A.1)

for some $\varepsilon > 0$ and a sequence $L_{S_1,\ldots,S_h}$ defined as

L_{S_1,\ldots,S_h} = S_1^{d_1} \cdots S_h^{d_h} \, l_1(S_1) \cdots l_h(S_h),

where $d_1, \ldots, d_h$ are non-negative numbers and $l_1(\cdot), \ldots, l_h(\cdot)$ are slowly varying functions in the sense of Karamata. Then it holds that

\limsup_{(S_1,\ldots,S_h) \to \infty} \frac{\left| U_{S_1,\ldots,S_h} \right|}{L_{S_1,\ldots,S_h}} = 0 \quad \text{a.s.} \qquad (A.2)
Henceforth, we will extensively use the following notation:

v_{N,T}(\varepsilon) = (\ln N)^{2+\varepsilon} (\ln T)^{2+\varepsilon}, \qquad (A.3)

where $\varepsilon > 0$. Recall (4.3)

S\left( B^P, U^P, \theta \right) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ R_{i,t} - \left( \beta_{D,i}^{P_D \prime} u_{D,t}^{P_D} d_{D,t}(\theta) + \beta_{S,i}^{P_S \prime} u_{S,t}^{P_S} d_{S,t}(\theta) \right) \right]^2,

which is the least squares loss function calculated at the triplet $\left( B^P, U^P, \theta \right)$; similarly, we define

S_{UB}(\theta) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ R_{i,t} - \left( \hat{\beta}_{D,i}^{P_D^0 \prime} \hat{u}_{D,t}^{P_D^0} d_{D,t}(\theta) + \hat{\beta}_{S,i}^{P_S^0 \prime} \hat{u}_{S,t}^{P_S^0} d_{S,t}(\theta) \right) \right]^2, \qquad (A.4)

which denotes the concentrated version of $S\left( B^P, U^P, \theta \right)$ when using the correct numbers of factors $P_D^0$ and $P_S^0$. Finally, when using $P_D$ and $P_S$ factors, with $P_D^0 \le P_D \le P_{\max}$ and $P_S^0 \le P_S \le P_{\max}$, we write

S_{UB}^{(P)}(\theta) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ R_{i,t} - \left( \hat{\beta}_{D,i}^{P_D \prime} \hat{u}_{D,t}^{P_D} d_{D,t}(\theta) + \hat{\beta}_{S,i}^{P_S \prime} \hat{u}_{S,t}^{P_S} d_{S,t}(\theta) \right) \right]^2, \qquad (A.5)

to define the concentrated version of $S\left( B^P, U^P, \theta \right)$ in that case. We will also use the $P_m \times P_j$ projection matrices

H_{jm}^{P_j,m}(\theta) = \frac{U_{m,m}^{0}\left(\theta^0\right) U_{m,j}^{0}(\theta)^{\prime}}{T} \, \frac{B_m^{0\prime} \hat{B}_j^{P_j}(\theta)}{N} \left( \hat{V}_j^{P_j}(\theta) \right)^{-1}, \qquad (A.6)

where $j, m = D, S$; $U_{m,j}^{0}(\theta)$ is defined as the $P_m \times T$ matrix of regime-specific factors, $U_m(\theta)$, multiplied by a $T \times T$ diagonal matrix whose entries are $d_{j,t}(\theta)$, $1 \le t \le T$; finally, $\hat{V}_j^{P_j}(\theta)$ is a $P_j \times P_j$ matrix containing the largest $P_j$ eigenvalues of

\frac{1}{NT} \sum_{t=1}^{T} R_t R_t^{\prime} \, d_{j,t}(\theta).

Note that, for $j \neq m$, $H_{jm}^{P_j,m}\left(\theta^0\right)$ reduces to a matrix of zeros. In order to make the notation lighter, the superscript $P_j,m$ is set equal to $P_j$ when $j = m$.

We now present a few lemmas to show the strong consistency of $\hat{\theta}$.
Lemma A.2. We assume that Assumptions 1-4 are satisfied. Then it holds that

\frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) - H_{jj}^{P_j}\left(\theta^0\right)^{\prime} \beta_{j,i}^{0} \right\|^2 = o_{a.s.}\left( P_j \, C_{N,T}^{-2} \, v_{N,T}(\varepsilon) \right),

for every $\varepsilon > 0$ and $j = D, S$.
Proof. This follows by adapting the passages in the proof of Theorem 3.2 in Massacci (2017). We have

\frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) - H_{jj}^{P_j}\left(\theta^0\right)^{\prime} \beta_{j,i}^{0} \right\|^2 \qquad (A.7)

\le \left[ \frac{4}{N^2} \sum_{i=1}^{N} \sum_{l=1}^{N} \sigma_{jil}^2\left(\theta^0\right) + \frac{1}{N} \left( \frac{1}{N^2} \sum_{l=1}^{N} \sum_{q=1}^{N} \left| \sum_{i=1}^{N} \tilde{\sigma}_{jil}\left(\theta^0\right) \tilde{\sigma}_{jiq}\left(\theta^0\right) \right|^2 \right)^{1/2} + \frac{2}{N} \sum_{i=1}^{N} \left( \left\| \beta_{D,i}^{0} \right\|^2 + \left\| \beta_{S,i}^{0} \right\|^2 \right) \frac{1}{NT^2} \sum_{l=1}^{N} \left\| \sum_{t=1}^{T} d_{j,t}\left(\theta^0\right) u_{j,t}^{0} e_{l,t} \right\|^2 \right] \times \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) \right\|^2 \right) = A_{N,T} \left( \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) \right\|^2 \right),

having defined

\sigma_{jil}\left(\theta^0\right) = \frac{1}{T} \sum_{t=1}^{T} E\left( d_{j,t}\left(\theta^0\right) e_{i,t} e_{l,t} \right), \qquad \tilde{\sigma}_{jil}\left(\theta^0\right) = \frac{1}{T} \sum_{t=1}^{T} d_{j,t}\left(\theta^0\right) e_{i,t} e_{l,t} - \sigma_{jil}\left(\theta^0\right).

By the proof of Theorem 3.2 in Massacci (2017), it follows immediately that $E\left| A_{N,T} \right| \le c_0 C_{N,T}^{-2}$. Thus, by a similar logic as in the previous passages, it follows that $A_{N,T} = o_{a.s.}\left( C_{N,T}^{-2} (\ln N)^{2+\varepsilon} (\ln T)^{2+\varepsilon} \right)$ for every $\varepsilon > 0$. Also,

\frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\beta}_{j,i}^{P_j}\left(\theta^0\right) \right\|^2 = P_j,

by construction, since $\hat{B}_j^{P_j}\left(\theta^0\right)^{\prime} \hat{B}_j^{P_j}\left(\theta^0\right) = N I_{P_j}$. The desired result follows by putting everything together.
Lemma A.3. We assume that Assumptions 1-5 are satisfied. Then it holds that

\left| \hat{\theta} - \theta^0 \right| = o_{a.s.}(1),

for all $P_j^0 \le P_j \le P_{\max}$, $j = D, S$.
Proof. The lemma is a generalisation of Lemma A.9 in Massacci (2017). In particular, the desired result follows if we show that

S_{UB}^{(P)}(\theta) - S_{UB}^{(P^0)}\left(\theta^0\right) = c_0 + o_{a.s.}(1), \qquad (A.8)

with $c_0 > 0$, for every $\theta \neq \theta^0$; note that $S_{UB}^{(P)}(\theta)$ is defined in (A.5). We begin by considering the identity

S_{UB}^{(P)}(\theta) - S_{UB}^{(P)}\left(\theta^0\right) = \left[ S_{UB}^{(P)}(\theta) - S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta) + B_S^0 H_{SD}^{P_S,D}(\theta),\; B_D^0 H_{DS}^{P_D,S}(\theta) + B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) \right]
+ \left[ S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta) + B_S^0 H_{SD}^{P_S,D}(\theta),\; B_D^0 H_{DS}^{P_D,S}(\theta) + B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) - S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta),\; B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) \right]
+ \left[ S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta),\; B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) - S_{UB}^{(P)}\left(\theta^0\right) \right],

where $S_{B}^{(P)}$ denotes the loss function $S\left( B^P, U^P, \theta \right)$ concentrated at $B^P$. By construction, it holds that

S_{B}^{(P)}\left( B_D^0 H_{DD}^{P_D}(\theta),\; B_S^0 H_{SS}^{P_S}(\theta),\; \theta^0 \right) = S_{B}^{(P)}\left( B_D^0,\; B_S^0,\; \theta^0 \right),

with

S_{B}^{(P)}\left( B_D^0,\; B_S^0,\; \theta^0 \right) - S_{UB}^{(P)}\left(\theta^0\right) > 0.
Following the proof of Lemma A.2 and Theorem 3.3 in Massacci (2017), it can be shown by
standard arguments that
E∣∣∣S(P )
UB (θ)−S(P )B
(B0
D H PD
DD(θ)+B0
S HPS ,D
S D(θ) ,B0
D HPD,S
DS(θ)+B0
S H PS
S S(θ) ,θ0
)∣∣∣2 ≤ c0C−2N ,T ;
then, by the maximal inequality for multi-parameter processes in Moricz (1983), it follows that
E max1≤t≤T,1≤i≤N
∣∣∣S(P )UB,i t (θ)−S(P )
B,i t
(B0
D H PD
DD(θ)+B0
S HPS ,D
S D(θ) ,B0
1HPD,S
DS(θ)+B0
S H PS
S S(θ) ,θ0
)∣∣∣2
≤ c0C−2N ,T (ln N ) (lnT ) ,
where
S(P )UB,i t (θ) = 1
N T
i∑i ′=1
t∑t ′=1
[Ri ,t −
(β′
D,i uD,t dD,t (θ)+ β′S ,i uS ,t dS ,t (θ)
)]2.
Hence, Lemma A.1 yields
S(P )UB (θ)−S(P )
B
(B0
D H PD
DD(θ)+B0
S HPS ,D
S D(θ) ,B0
D HPD,S
DS(θ)+B0
S H PS
S S(θ) ,θ0
)= oa.s.
(C−1
N ,T (ln N )1+ε (lnT )1+ε) .
Similarly, by using Lemma A.3 in Massacci (2017) and the same arguments as above, it follows that there exists a $c_{0}>0$ such that
$$S^{(P)}_{B}\left(B^{0}_{D}H^{P_{D}}_{DD}(\theta)+B^{0}_{S}H^{P_{S},D}_{SD}(\theta),\,B^{0}_{D}H^{P_{D},S}_{DS}(\theta)+B^{0}_{S}H^{P_{S}}_{SS}(\theta),\,\theta^{0}\right)-S^{(P)}_{B}\left(B^{0}_{D},B^{0}_{S},\theta^{0}\right)=c_{0}+o_{a.s.}(1).$$
Putting everything together, we have
$$S^{(P)}_{UB}(\theta)-S^{(P)}_{UB}(\theta^{0})=c_{0}+o_{a.s.}(1).$$
Then (A.8) follows if we show that
$$S^{(P)}_{UB}(\theta^{0})-S^{(P^{0})}_{UB}(\theta^{0})=o_{a.s.}(1). \tag{A.9}$$
It holds that
$$\left|S^{(P)}_{UB}(\theta^{0})-S^{(P^{0})}_{UB}(\theta^{0})\right|\le\left|S^{1}_{UB}(\theta^{0})\right|+\left|S^{2}_{UB}(\theta^{0})\right|,$$
where
$$S^{1}_{UB}(\theta^{0})=\frac{2}{NT}\sum_{t=1}^{T}\left(d_{D,t}(\theta^{0})u^{0\prime}_{D,t}H^{P_{D}+}_{DD}(\theta^{0})'\left(\hat B^{P_{D}}_{D}(\theta^{0})-B^{0}_{D}H^{P_{D}}_{DD}(\theta^{0})\right)'+d_{S,t}(\theta^{0})u^{0\prime}_{S,t}H^{P_{S}+}_{SS}(\theta^{0})'\left(\hat B^{P_{S}}_{S}(\theta^{0})-B^{0}_{S}H^{P_{S}}_{SS}(\theta^{0})\right)'\right)e_{t},$$
$$
\begin{aligned}
S^{2}_{UB}(\theta^{0})=\frac{1}{NT}\sum_{t=1}^{T}\Big(&d_{D,t}(\theta^{0})u^{0\prime}_{D,t}H^{P_{D}+}_{DD}(\theta^{0})'\left(\hat B^{P_{D}}_{D}(\theta^{0})-B^{0}_{D}H^{P_{D}}_{DD}(\theta^{0})\right)'\left(\hat B^{P_{D}}_{D}(\theta^{0})-B^{0}_{D}H^{P_{D}}_{DD}(\theta^{0})\right)H^{P_{D}+}_{DD}(\theta^{0})u^{0}_{D,t}\\
+\,&d_{S,t}(\theta^{0})u^{0\prime}_{S,t}H^{P_{S}+}_{SS}(\theta^{0})'\left(\hat B^{P_{S}}_{S}(\theta^{0})-B^{0}_{S}H^{P_{S}}_{SS}(\theta^{0})\right)'\left(\hat B^{P_{S}}_{S}(\theta^{0})-B^{0}_{S}H^{P_{S}}_{SS}(\theta^{0})\right)H^{P_{S}+}_{SS}(\theta^{0})u^{0}_{S,t}\Big).
\end{aligned}
$$
We now estimate the order of magnitude of $S^{1}_{UB}(\theta^{0})$ and $S^{2}_{UB}(\theta^{0})$. We have
$$
\begin{aligned}
\left|S^{1}_{UB}(\theta^{0})\right|\le\;&2T^{-1/2}P^{0}_{D}\left\|H^{P_{D}+}_{DD}(\theta^{0})\right\|\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{D}}_{D,i}(\theta^{0})-H^{P_{D}}_{DD}(\theta^{0})'\beta^{0}_{D,i}\right\|^{2}\right)^{1/2}\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}d_{D,t}(\theta^{0})u^{0}_{D,t}e_{i,t}\right\|^{2}\right)^{1/2}\\
+\;&2T^{-1/2}P^{0}_{S}\left\|H^{P_{S}+}_{SS}(\theta^{0})\right\|\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{S}}_{S,i}(\theta^{0})-H^{P_{S}}_{SS}(\theta^{0})'\beta^{0}_{S,i}\right\|^{2}\right)^{1/2}\left(\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}d_{S,t}(\theta^{0})u^{0}_{S,t}e_{i,t}\right\|^{2}\right)^{1/2}.
\end{aligned}
$$
By construction, note that
$$\left\|H^{P_{j}+}_{jj}(\theta^{0})\right\|\le c_{0}P^{1/2}_{j} \tag{A.10}$$
for $j=D,S$. Also, Assumption 4(v) entails, by the same logic as in the previous passages, that
$$\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}d_{j,t}(\theta^{0})u^{0}_{j,t}e_{i,t}\right\|^{2}=o_{a.s.}\left(v_{N,T}(\varepsilon)\right),$$
for every $\varepsilon>0$ and $j=D,S$. Thus, using Lemma A.2, we finally have
$$\left|S^{1}_{UB}(\theta^{0})\right|=o_{a.s.}\left(PC^{-1}_{N,T}T^{-1/2}v_{N,T}(\varepsilon)\right),$$
having set $P=\max\{P_{D},P_{S}\}$; this term is therefore $o_{a.s.}(1)$ by (4.22). Similarly, it holds that (see Massacci (2017))
$$
\begin{aligned}
\left|S^{2}_{UB}(\theta^{0})\right|\le\;&\frac{1}{T}\sum_{t=1}^{T}\left\|u^{0}_{D,t}\right\|^{2}\left\|H^{P_{D}+}_{DD}(\theta^{0})\right\|^{2}\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{D}}_{D,i}(\theta^{0})-H^{P_{D}}_{DD}(\theta^{0})'\beta^{0}_{D,i}\right\|^{2}\\
+\;&\frac{1}{T}\sum_{t=1}^{T}\left\|u^{0}_{S,t}\right\|^{2}\left\|H^{P_{S}+}_{SS}(\theta^{0})\right\|^{2}\frac{1}{N}\sum_{i=1}^{N}\left\|\hat\beta^{P_{S}}_{S,i}(\theta^{0})-H^{P_{S}}_{SS}(\theta^{0})'\beta^{0}_{S,i}\right\|^{2}.
\end{aligned}
$$
Using (A.10) and Lemma A.2, and noting that Assumptions 3(i) and 5(i) entail
$$\frac{1}{T}\sum_{t=1}^{T}\left\|u^{0}_{j,t}\right\|^{2}=O_{a.s.}(1),$$
for $j=D,S$, we have
$$\left|S^{2}_{UB}(\theta^{0})\right|=o_{a.s.}\left(P^{2}C^{-2}_{N,T}v_{N,T}(\varepsilon)\right),$$
which again is $o_{a.s.}(1)$ by (4.22). Thus, (A.9) follows. Putting all together, the desired result obtains.
In the next two lemmas, we define the set
$$B_{N,T}=\left(\frac{v_{N,T}(\varepsilon)}{N^{\eta}T},\,c_{0}\right), \tag{A.11}$$
where $0<c_{0}<\infty$ and $v_{N,T}$ is defined in (A.3), and the functions
$$\omega^{0}(\eta,\theta)=\frac{1}{N^{\eta}T}\sum_{i=1}^{N}\sum_{t=1}^{T}\left|d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right|\left(\delta^{0\prime}_{i}u^{0}_{t}\right)^{2}, \tag{A.12}$$
$$h^{0}(\eta,\theta)=\frac{1}{N^{\eta}T}\sum_{i=1}^{N}\sum_{t=1}^{T}d_{S,t}(\theta)\delta^{0\prime}_{i}u^{0}_{t}e_{i,t}, \tag{A.13}$$
where $u^{0}_{t}$ and $\delta^{0}_{i}$ are defined in (4.1) and (4.2) respectively.
Lemma A.4. We assume that Assumptions 3 and 5 are satisfied. Then it holds that
$$\frac{\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}\ge c_{0}+o_{a.s.}(1),$$
for $\left|\theta-\theta^{0}\right|\in B_{N,T}$ and $0<c_{0}<\infty$.
Proof. Define
$$\omega^{0}_{N,T}=\omega^{0}(\eta,\theta)-\mathbb{E}\,\omega^{0}(\eta,\theta).$$
By equation (35) in Massacci (2017) we have $\mathbb{E}\left|\omega^{0}_{N,T}\right|^{2}\le c_{0}N^{-\eta}T^{-1}\left|\theta-\theta^{0}\right|$, and thus, by the maximal inequality for rectangular sums (Moricz (1983)), the Markov inequality and Lemma A.1, it follows that there exist two random variables $N_{0}$, $T_{0}$ such that for $N\ge N_{0}$ and $T\ge T_{0}$ and every $0<\varepsilon'<\frac{\varepsilon}{2}$, with $\varepsilon$ defined in (A.3),
$$\omega^{0}_{N,T}\le c_{0}N^{-\eta/2}T^{-1/2}\left|\theta-\theta^{0}\right|^{1/2}(\ln N)^{1+\varepsilon'}(\ln T)^{1+\varepsilon'}. \tag{A.14}$$
Consider now
$$\frac{\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}=\frac{\mathbb{E}\,\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}+\frac{\omega^{0}_{N,T}}{\left|\theta-\theta^{0}\right|}.$$
Equation (33) in Massacci (2017) stipulates that
$$\inf_{\left|\theta-\theta^{0}\right|\le c_{0}}\mathbb{E}\,\omega^{0}(\eta,\theta)>c_{0}\left|\theta-\theta^{0}\right|. \tag{A.15}$$
Also, using (A.14) and recalling that, by the definition of $B_{N,T}$, $\left|\theta-\theta^{0}\right|\ge\frac{v_{N,T}(\varepsilon)}{N^{\eta}T}$, it follows that there exist two random variables $N_{0}$, $T_{0}$ such that for $N\ge N_{0}$ and $T\ge T_{0}$
$$\frac{\omega^{0}_{N,T}}{\left|\theta-\theta^{0}\right|}\le c_{0}\frac{(\ln N)^{1+\varepsilon'}(\ln T)^{1+\varepsilon'}}{v^{1/2}_{N,T}(\varepsilon)}=o_{a.s.}(1). \tag{A.16}$$
Combining (A.15) with (A.16), we obtain the desired result.
Lemma A.5. We assume that Assumptions 3-5 are satisfied. Then there exist two random variables $N_{0}$, $T_{0}$ and a constant $0<c_{0}<\infty$ such that for $N\ge N_{0}$ and $T\ge T_{0}$
$$\frac{\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|}{\left|\theta-\theta^{0}\right|}\le c_{0}v^{-1}_{N,T}(\varepsilon),$$
for $\left|\theta-\theta^{0}\right|\in B_{N,T}$.
Proof. Consider
$$N^{\eta}T\left(h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right)=\sum_{i=1}^{N}\sum_{t=1}^{T}\left(d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right)\delta^{0\prime}_{i}u^{0}_{t}e_{i,t}=\sum_{i=1}^{N^{\eta}}\sum_{t=1}^{T}\left(d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right)\delta^{0\prime}_{i}u^{0}_{t}e_{i,t}.$$
By Assumptions 3(i), 3(iii), 4(i) and 5(i), and by repeated use of the Cauchy-Schwarz inequality, it follows immediately that
$$\mathbb{E}\left|N^{\eta}T\left(h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right)\right|\le c_{0}N^{\eta}T\max_{1\le i\le N}\left\|\delta^{0}_{i}\right\|^{2}\left(\mathbb{E}\left|d_{S,t}(\theta)-d_{S,t}(\theta^{0})\right|^{2}\right)^{1/2}\left(\mathbb{E}\left\|u^{0}_{t}\right\|^{4}\max_{1\le i\le N}\mathbb{E}\left\|e_{i,t}\right\|^{4}\right)^{1/4}\le c_{0}N^{\eta}T\left|\theta-\theta^{0}\right|,$$
so that by the same logic as above we have
$$\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|=o_{a.s.}\left(N^{-\eta/2}T^{-1/2}\left|\theta-\theta^{0}\right|^{1/2}(\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}\right).$$
Thus, recalling that $\left|\theta-\theta^{0}\right|\in B_{N,T}$,
$$\frac{\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|}{\left|\theta-\theta^{0}\right|}\le c_{0}\frac{N^{\eta/2}T^{1/2}}{v^{1/2}_{N,T}(\varepsilon)}N^{-\eta/2}T^{-1/2}(\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}=o_{a.s.}(1).$$
Lemma A.6. We assume that Assumptions 1-4 are satisfied. Then it holds that, for every $P_{j}\ge P^{0}_{j}$,
$$\left\|\hat\beta^{P_{j}\prime}_{j,i}\hat u_{j,t}-\beta^{0\prime}_{j,i}u^{0}_{j,t}\right\|=o_{a.s.}(1), \tag{A.17}$$
for $j=D,S$ and all $1\le i\le N$, $1\le t\le T$. Also, $S\left(B^{P},U^{P},\theta\right)$ is continuous in $(B,U)$.
Proof. We omit rotation matrices for notational simplicity. By using the results in the proof of Lemma A.8 in Massacci (2017), it can be shown that $\mathbb{E}\left\|\hat\beta^{P_{j}}_{j,i}-H^{P_{j}}_{jj}(\theta^{0})\beta^{0}_{j,i}\right\|^{2}\le c_{0}C^{-2}_{N,T}$. Thus, using the same logic as above, Lemma A.1 entails that, for every $(i,j,t)$, $\left\|\hat\beta^{P_{j}}_{j,i}-H^{P_{j}}_{jj}(\theta^{0})\beta^{0}_{j,i}\right\|=o_{a.s.}\left(C^{-1}_{N,T}(\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}\right)$. Similarly, by Corollary 3.1 in Massacci (2017), it can be shown that $\left\|\hat u_{j,t}-u_{j,t}\right\|=o_{a.s.}\left((\ln N)^{1+\varepsilon}(\ln T)^{1+\varepsilon}\right)$. Equation (A.17) now follows immediately. Further, as a consequence we can assume that
$$\lim_{N,T\to\infty}\hat\beta^{P_{j}\prime}_{j,i}\hat u_{j,t}=\beta^{0\prime}_{j,i}u^{0}_{j,t}. \tag{A.18}$$
Note now that, by Taylor's expansion, there exists an $S'$ such that
$$S\left(B^{P},U^{P},\theta\right)=S\left(B^{0},U^{0},\theta\right)+S'\left(B^{P}U^{P}-B^{0}U^{0}\right).$$
Given that, by (A.18), we can assume $\left|S'\right|<\infty$, we immediately get $\lim_{N,T\to\infty}S\left(B^{P},U^{P},\theta\right)=S\left(B^{0},U^{0},\theta\right)$, which proves the last statement of the lemma.
We are now ready to derive a strong rate for $\hat\theta$.
Lemma A.7. We assume that Assumptions 1-5 are satisfied. Then it holds that
$$\hat\theta-\theta^{0}=o_{a.s.}\left(N^{-\eta}T^{-1}v_{N,T}(\varepsilon)\right),$$
for every $\varepsilon>0$.
Proof. We know from Lemma A.3 that $\left|\hat\theta-\theta^{0}\right|>c_{0}$ is an event which happens with probability zero, and we can therefore focus only on the case $\left|\hat\theta-\theta^{0}\right|<c_{0}$. We now show, by contradiction, that $\left|\hat\theta-\theta^{0}\right|\notin B_{N,T}$. It is easy to see that, as $(N,T)\to\infty$,
$$\frac{S\left(B^{P},U^{P},\theta\right)-S\left(B^{P},U^{P},\theta^{0}\right)}{\left|\theta-\theta^{0}\right|}\ge\inf_{B_{N,T}}\frac{\omega^{0}(\eta,\theta)}{\left|\theta-\theta^{0}\right|}-2\sup_{B_{N,T}}\frac{\left|h^{0}(\eta,\theta)-h^{0}(\eta,\theta^{0})\right|}{\left|\theta-\theta^{0}\right|}>0\quad\text{a.s.},$$
by using Lemmas A.4 and A.5. However, by extending the proof of Theorem 3.3 in Massacci (2017), it can be shown that $S\left(B^{P},U^{P},\hat\theta\right)-S\left(B^{P},U^{P},\theta^{0}\right)\le0$ a.s., which is a contradiction. Thus, $\left\{\left|\hat\theta-\theta^{0}\right|\in B_{N,T}\right\}$ has measure zero, which entails that there exist two random variables $N_{0}$, $T_{0}$ and a constant $0<c_{0}<\infty$ such that for $N\ge N_{0}$ and $T\ge T_{0}$
$$\left|\hat\theta-\theta^{0}\right|\le c_{0}N^{-\eta}T^{-1}v_{N,T}(\varepsilon),$$
which proves the lemma.
Lemma A.8. We assume that Assumptions 1-5 are satisfied. Then it holds that
$$\frac{\hat T_{j}-T_{j}}{T_{j}}=o_{a.s.}\left(N^{-\eta}T^{-1}_{j}v_{N,T}(\varepsilon)\right),$$
for $j=D,S$ and every $\varepsilon>0$.
Proof. We show the lemma for $j=D$. Recall that $T_{D}=\sum_{t=1}^{T}d_{D,t}(\theta^{0})$ and $\hat T_{D}=\sum_{t=1}^{T}\hat d_{D,t}$. It holds that
$$\left|T_{D}-\hat T_{D}\right|\le\sum_{t=1}^{T}\left|d_{D,t}(\theta^{0})-\hat d_{D,t}\right|.$$
Denoting the density of $z_{t}$ by $f_{t}(z)=f(z)$, Assumption 5(iii) entails
$$\left|\hat d_{D,t}-d_{D,t}(\theta^{0})\right|=\left|\int_{\hat\theta}^{\theta^{0}}f(z)\,dz\right|\le c_{0}\left|\hat\theta-\theta^{0}\right|, \tag{A.19}$$
so that
$$\left|T_{D}-\hat T_{D}\right|\le c_{0}T\left|\hat\theta-\theta^{0}\right|.$$
The desired result follows immediately from Lemma A.7.
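The counting bound $|T_{D}-\hat T_{D}|\le c_{0}T|\hat\theta-\theta^{0}|$ used in the proof of Lemma A.8 is easy to check numerically. The sketch below is purely illustrative: the Gaussian threshold variable, the sample size, and the constant $c_{0}=\sup_{z}f(z)$ are simulation assumptions, not part of the paper's design.

```python
import numpy as np

# Numerical illustration of the regime-count bound in Lemma A.8.
rng = np.random.default_rng(0)
T = 100_000
z = rng.normal(size=T)           # threshold variable z_t; density bounded by 1/sqrt(2*pi)
theta0, theta_hat = -0.5, -0.45  # true and (hypothetical) estimated thresholds

T_D = int(np.sum(z <= theta0))        # T_D = sum_t d_{D,t}(theta0)
T_D_hat = int(np.sum(z <= theta_hat)) # hat T_D = sum_t hat d_{D,t}

# |T_D - hat T_D| counts draws falling between the two thresholds; its mean is
# T * |Phi(theta_hat) - Phi(theta0)| <= c0 * T * |theta_hat - theta0|,
# with c0 = sup_z f(z) = 1/sqrt(2*pi) for the standard normal density.
c0 = 1.0 / np.sqrt(2.0 * np.pi)
bound = c0 * T * abs(theta_hat - theta0)
print(abs(T_D - T_D_hat), bound)
```

The realised count sits well below the bound, since the bound replaces the density between the two thresholds by its supremum.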
Lemma A.9. We assume that Assumptions 3-5 are satisfied. Then it holds that
$$\mathbb{E}\left|R_{i,t}\right|^{4+\varepsilon}<\infty \tag{A.20}$$
for some $\varepsilon>0$, and
$$\mathbb{E}\max_{1\le t\le T}\left|\sum_{s=1}^{t}R_{h,s}R_{l,s}d_{D,s}(\theta^{0})-\mathbb{E}\left[R_{h,s}R_{l,s}d_{D,s}(\theta^{0})\right]\right|^{2}\le c_{1}T, \tag{A.21}$$
$$\mathbb{E}\max_{1\le t\le T}\left|\sum_{s=1}^{t}R_{h,s}R_{l,s}d_{S,s}(\theta^{0})-\mathbb{E}\left[R_{h,s}R_{l,s}d_{S,s}(\theta^{0})\right]\right|^{2}\le c_{2}T, \tag{A.22}$$
for all $1\le h,l\le N$.
Proof. Equation (A.20) is an immediate consequence of Assumptions 3(i) and 4(i). We show only (A.21); (A.22) follows from exactly the same arguments. To begin with, note that Assumption 5(i) entails that $R_{h,s}R_{l,s}d_{D,s}(\theta^{0})$ is also a strictly stationary, $\rho$-mixing sequence with mixing numbers $\rho_{m}$ such that $\sum_{m=1}^{\infty}\rho^{1/2}_{m}<\infty$; this is a consequence of Theorem 14.1 in Davidson (1994), which is stated for the cases of strong and uniform mixing but can be readily extended to $\rho$-mixing. Using (A.20), it follows that
$$\mathbb{E}\left|R_{h,s}R_{l,s}d_{D,s}(\theta^{0})\right|^{2}\le\left(\mathbb{E}\left|R_{h,s}\right|^{4}\right)^{1/2}\left(\mathbb{E}\left|R_{l,s}\right|^{4}\right)^{1/2}<\infty.$$
Note also that
$$\sum_{m=1}^{\infty}\rho^{1/2}_{m}<\infty\;\Rightarrow\;\sum_{m=1}^{\infty}\frac{\rho_{m}}{m}<\infty\;\Rightarrow\;\sum_{m=1}^{\infty}\rho\left(2^{m}\right)<\infty.$$
Hence, Corollary 1.1 in Shao (1995) and the strict stationarity of $R_{h,s}R_{l,s}d_{D,s}(\theta^{0})$ immediately yield (A.21).
Lemma A.10. We assume that Assumptions 1-6 are satisfied. Then it holds that
$$\limsup_{\min(N,T)\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}\hat g^{(h)}_{j}=\bar g_{j}<\infty,$$
$$\liminf_{\min(N,T)\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}\hat g^{(h)}_{j}=\underline{g}_{j}>0,$$
for $j=D,S$ and every $0\le p\le P_{\max}$.
Proof. Note that
$$\sum_{h=p+1}^{N}\hat g^{(h)}_{j}-\sum_{h=p+1}^{N}g^{(h)}_{j}=\sum_{h=1}^{N}\hat g^{(h)}_{j}-\sum_{h=1}^{N}g^{(h)}_{j}-\sum_{h=1}^{p}\hat g^{(h)}_{j}+\sum_{h=1}^{p}g^{(h)}_{j}=\operatorname{tr}\left(\hat\Sigma_{j}\right)-\operatorname{tr}\left(\Sigma_{j}\right)-\sum_{h=1}^{p}\left(\hat g^{(h)}_{j}-g^{(h)}_{j}\right).$$
Using Lemma 1, it is easy to see that
$$\sum_{h=1}^{p}\left(\hat g^{(h)}_{j}-g^{(h)}_{j}\right)=o_{a.s.}\left(\frac{pNT^{1/2}}{T\pi_{j}}(\ln N)^{1+\varepsilon}(\ln T)^{1/2+\varepsilon}\right). \tag{A.23}$$
Also, the same passages as in the proof of Lemma A.1 in Trapani (2018) yield
$$\operatorname{tr}\left(\hat\Sigma_{j}\right)-\operatorname{tr}\left(\Sigma_{j}\right)=o_{a.s.}\left(\frac{N(\ln N\ln T)^{\frac{1+\varepsilon}{2}}}{T^{1/2}\pi_{j}}\right). \tag{A.24}$$
Thus, recalling that $P_{\max}=O\left(\min\left\{T^{1/2-c},N^{1/2-c}\right\}\right)$, it follows that
$$\frac{1}{N-p}\sum_{h=p+1}^{N}\hat g^{(h)}_{j}=\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}+o_{a.s.}\left(\frac{(\ln N\ln T)^{\frac{1+\varepsilon}{2}}}{T^{1/2}\pi_{j}}\right)=\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}+o_{a.s.}(1),$$
under Assumption 6.
Let now $g^{u,(h)}_{j}$ and $g^{e,(h)}_{j}$ denote the $h$-th largest eigenvalues of $\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(B^{0}_{j}u^{0}_{j,t}u^{0\prime}_{j,t}B^{0\prime}_{j}d_{j,t}(\theta^{0})\right)$ and $\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(e_{t}e'_{t}d_{j,t}(\theta^{0})\right)$ respectively. By Weyl's inequality
$$g^{e,(N)}_{j}+g^{u,(h)}_{j}\le g^{(h)}_{j}\le g^{u,(h)}_{j}+g^{e,(1)}_{j},$$
so that
$$g^{e,(N)}_{j}+\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}\le\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}\le\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}+g^{e,(1)}_{j}.$$
By Assumption 2(i)(b), it holds that
$$g^{e,(N)}_{j}+\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}>0;$$
also, Assumption 2(i)(a) and (B.1) entail
$$\frac{1}{N-p}\sum_{h=p+1}^{N}g^{u,(h)}_{j}+g^{e,(1)}_{j}<\infty.$$
Hence, as $N\to\infty$,
$$0<\liminf_{N\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}\le\limsup_{N\to\infty}\frac{1}{N-p}\sum_{h=p+1}^{N}g^{(h)}_{j}<\infty.$$
Putting all together, the lemma follows.
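The additive Weyl bounds invoked in the proof of Lemma A.10 can be verified numerically. The snippet below is a minimal sanity check on arbitrary simulated symmetric matrices (the dimension and the draws are illustrative assumptions, not the paper's covariance structure).

```python
import numpy as np

# Check of the Weyl bounds: for symmetric A (signal) and E (noise),
# g^(h)(A) + g^(N)(E) <= g^(h)(A + E) <= g^(h)(A) + g^(1)(E),
# where g^(h) denotes the h-th largest eigenvalue.
rng = np.random.default_rng(1)
N = 30
A = rng.normal(size=(N, N)); A = A @ A.T        # PSD "common component" matrix
E = rng.normal(size=(N, N)); E = (E + E.T) / 2  # symmetric "error" matrix

def eigs_desc(M):
    """Eigenvalues sorted in decreasing order: g^(1) >= ... >= g^(N)."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

gA, gE, gS = eigs_desc(A), eigs_desc(E), eigs_desc(A + E)
lower = gA + gE[-1]  # g^(h)(A) + smallest eigenvalue of E
upper = gA + gE[0]   # g^(h)(A) + largest eigenvalue of E
ok = bool(np.all(lower <= gS + 1e-9) and np.all(gS <= upper + 1e-9))
print(ok)
```

Averaging these bounds over $h=p+1,\ldots,N$ gives exactly the sandwich used in the proof.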
Lemma A.11. We assume that Assumptions 1-5 are satisfied. Then it holds that, for $j=D,S$,
$$\mathbb{E}\left\|T^{-1}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le c_{0}T^{-1}. \tag{A.25}$$
Further, under Assumption 9(ii), as $T\to\infty$,
$$T^{-1/2}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\xrightarrow{D}N\left(0,V_{u,j}\right), \tag{A.26}$$
where $V_{u,j}=\lim_{T\to\infty}\mathbb{E}\left(T^{-1}\sum_{t,s=1}^{T}u^{0}_{j,t}u^{0\prime}_{j,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)$.
Proof. We begin by showing that $\mathbf{f}^{0}_{j,t}d_{j,t}(\theta^{0})$ is a $\rho$-mixing sequence with mixing numbers $\bar\rho_{j,m}$ such that $\sum_{m=1}^{\infty}\bar\rho^{1/2}_{j,m}<\infty$. Note that $\mathbf{f}^{0}_{j,t}d_{j,t}(\theta^{0})$ is a measurable transformation of $\left\{\mathbf{f}^{0}_{j,t},z_{t},\varepsilon_{t}\right\}$. Let now $\Im^{t}_{-\infty}$ and $\Im^{\infty}_{t+m}$ be the $\sigma$-fields generated by $\left\{\mathbf{f}^{0}_{j,t'},z_{t'},\varepsilon_{t'}\right\}^{t}_{t'=-\infty}$ and $\left\{\mathbf{f}^{0}_{j,t'},z_{t'},\varepsilon_{t'}\right\}^{\infty}_{t'=t+m}$ respectively, and similarly let $\bar\Im^{t}_{-\infty}$ and $\bar\Im^{\infty}_{t+m}$ be the $\sigma$-fields generated by $\left\{\mathbf{f}^{0}_{j,t'}d_{j,t'}(\theta^{0})\right\}^{t}_{t'=-\infty}$ and $\left\{\mathbf{f}^{0}_{j,t'}d_{j,t'}(\theta^{0})\right\}^{\infty}_{t'=t+m}$ respectively. By measurability, $\bar\Im^{t}_{-\infty}\subseteq\Im^{t}_{-\infty}$ and $\bar\Im^{\infty}_{t+m}\subseteq\Im^{\infty}_{t+m}$; hence, $\bar\rho_{j,m}=\bar\rho_{j,m}\left(\bar\Im^{t}_{-\infty},\bar\Im^{\infty}_{t+m}\right)\le\rho_{j,m}=\rho_{j,m}\left(\Im^{t}_{-\infty},\Im^{\infty}_{t+m}\right)$. This therefore entails that $\sum_{m=1}^{\infty}\bar\rho^{1/2}_{j,m}\le\sum_{m=1}^{\infty}\rho^{1/2}_{j,m}<\infty$ by Assumption 5(i), which in turn entails that $\sum_{k=1}^{\infty}\bar\rho_{j}\left(2^{k}\right)<\infty$.
Note now that $\mathbb{E}\left\|u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le\mathbb{E}\left\|u^{0}_{j,t}\right\|^{2}$, which is finite by Assumption 3(i). Thus
$$\mathbb{E}\left\|\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le\mathbb{E}\max_{1\le k\le T}\left\|\sum_{t=1}^{k}u^{0}_{j,t}d_{j,t}(\theta^{0})\right\|^{2}\le c_{0}T,$$
where the last inequality follows from Lemma 1 in Peligrad, Utev, and Wu (2007). This proves (A.25). As far as (A.26) is concerned, it follows readily upon checking the assumptions of Theorem 0 in Peligrad et al. (1987).
Lemma A.12. We assume that Assumptions 1-8 are satisfied. Then, as $\min\left(N,T_{j}\right)\to\infty$, it holds that
$$\hat\Gamma_{j}-\left(H^{\Gamma}_{jj}(\theta^{0})\right)^{-1}\Gamma^{0}_{j}=o_{P}(1)+o_{P^{*}}(1), \tag{A.27}$$
$$\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}(\theta^{0})=o_{P}(1)+o_{P^{*}}(1), \tag{A.28}$$
for $j=D,S$, for almost all realizations of $\left\{R_{i,t},1\le i\le N,1\le t\le T\right\}$.
Proof. We present the full version of the proof of (A.27) only; the remaining results follow from similar arguments and, where possible, we omit the details to avoid repetition. We begin by showing that the estimation error $\hat P_{j}-P^{0}_{j}$ is negligible. To this end, we assume without loss of generality that $\hat P_{j}\ge P^{0}_{j}$, and we omit dependence on $\hat\theta$ or $\theta^{0}$ whenever possible. Let
$$H^{\Gamma\hat P_{j}}_{jj}\hat\Gamma_{j}=H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}+\tilde H^{\Gamma}_{jj}\tilde\Gamma_{j},$$
where $H^{*\Gamma}_{jj}$ is $\left(P^{0}_{j}+1\right)\times\left(P^{0}_{j}+1\right)$, $\hat\Gamma^{*}_{j}$ is $\left(P^{0}_{j}+1\right)\times1$, $\tilde H^{\Gamma}_{jj}$ is $\left(P^{0}_{j}+1\right)\times\left(\hat P_{j}-P^{0}_{j}\right)$ and $\tilde\Gamma_{j}$ is $\left(\hat P_{j}-P^{0}_{j}\right)\times1$, defined such that $H^{\Gamma\hat P_{j}}_{jj}=\left[H^{*\Gamma}_{jj}\,|\,\tilde H^{\Gamma}_{jj}\right]$ and $\hat\Gamma_{j}=\left(\hat\Gamma^{*\prime}_{j},\tilde\Gamma'_{j}\right)'$. We begin by showing that
$$\left\|H^{\Gamma\hat P_{j}}_{jj}\right\|_{F}=O_{P^{*}}(1). \tag{A.29}$$
Note that
$$\left\|H^{\Gamma\hat P_{j}}_{jj}\right\|_{F}\le\left\|\frac{U_{j}(\theta^{0})\hat U_{j}(\hat\theta)'}{T}\right\|_{F}\left\|\frac{B'_{j}(\theta^{0})\hat B_{j}(\hat\theta)}{N}\right\|_{F}\left\|\hat V_{j}(\hat\theta)^{-1}\right\|_{F};$$
the first term is bounded by Assumption 3(i); also note that, using the triangle inequality,
$$\left\|\frac{B'_{j}(\theta^{0})\hat B_{j}(\hat\theta)}{N}\right\|_{F}\le\left\|\frac{B'_{j}(\theta^{0})B_{j}(\theta^{0})}{N}\right\|_{F}+\left\|\frac{B'_{j}\left(\hat B_{j}(\hat\theta)-B_{j}(\theta^{0})H^{\Gamma\hat P_{j}}_{jj}\right)}{N}\right\|_{F}\le\left\|\frac{B'_{j}(\theta^{0})B_{j}(\theta^{0})}{N}\right\|_{F}+\left\|\frac{B_{j}}{N^{1/2}}\right\|_{F}\left\|\frac{\hat B_{j}(\hat\theta)-B_{j}(\theta^{0})H^{\Gamma\hat P_{j}}_{jj}}{N^{1/2}}\right\|_{F}.$$
The first term is bounded by virtue of Assumption 3(iv), whereas the second one can be shown to be dominated by using the same arguments as in footnote 5 in Bai (2003). Finally, note that
$$\left\|\hat V_{j}(\hat\theta)^{-1}\right\|_{F}\le\left[g^{(N)}\left(\Sigma_{j,R}(\hat\theta)\right)\right]^{-1}\hat P_{j}=\left[g^{(N)}\left(\Sigma_{j,R}(\hat\theta)\right)\right]^{-1}P^{0}_{j}+o_{P^{*}}(1),$$
by Theorem 3, with $g^{(N)}\left(\Sigma_{j,R}(\hat\theta)\right)>0$ implied by Assumption 2(i)(b). By similar passages, it is easy to see that
$$\left\|\hat\Gamma_{j}\right\|_{F}=O_{P^{*}}(1). \tag{A.30}$$
We now show that
$$\left\|\tilde H^{\Gamma}_{jj}\tilde\Gamma_{j}\right\|_{F}=o_{P^{*}}(1), \tag{A.31}$$
for almost all realizations of the sample. Indeed, $\left\|\tilde H^{\Gamma}_{jj}\tilde\Gamma_{j}\right\|_{F}\le\left\|\tilde H^{\Gamma}_{jj}\right\|_{F}\left\|\tilde\Gamma_{j}\right\|_{F}$, and $\left\|\tilde\Gamma_{j}\right\|_{F}$ is bounded by (A.30). Also
$$\left\|\tilde H^{\Gamma}_{jj}\right\|^{2}_{F}=\operatorname{tr}\left(\tilde H^{\Gamma\prime}_{jj}\tilde H^{\Gamma}_{jj}\right)\le\left(\hat P_{j}-P^{0}_{j}\right)g^{(1)}\left(\tilde H^{\Gamma\prime}_{jj}\tilde H^{\Gamma}_{jj}\right),$$
with $\hat P_{j}-P^{0}_{j}=o_{P^{*}}(1)$ by Theorem 3 and $g^{(1)}\left(\tilde H^{\Gamma\prime}_{jj}\tilde H^{\Gamma}_{jj}\right)$ bounded by (A.29). Therefore, $H^{\Gamma\hat P_{j}}_{jj}\hat\Gamma_{j}=H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}+o_{P^{*}}(1)$. Turning to $H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}$, let $\hat\Gamma^{*P^{0}_{j}}_{j}$ be the estimate obtained when using $P^{0}_{j}$, and consider
$$H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}=\left(H^{*\Gamma}_{jj}\pm H^{\Gamma}_{jj}\right)\left(\hat\Gamma^{*}_{j}\pm\hat\Gamma^{*P^{0}_{j}}_{j}\right).$$
By the same arguments as in footnote 5 in Bai (2003), it is easy to see that $\left\|\hat\Gamma^{*}_{j}-\hat\Gamma^{*P^{0}_{j}}_{j}\right\|_{F}=o_{P^{*}}(1)$; then it follows immediately that
$$H^{*\Gamma}_{jj}\hat\Gamma^{*}_{j}=H^{\Gamma}_{jj}\hat\Gamma^{*P^{0}_{j}}_{j}+o_{P^{*}}(1),$$
which, combined with (A.31), yields
$$H^{\Gamma\hat P_{j}}_{jj}\hat\Gamma_{j}=H^{\Gamma}_{jj}\hat\Gamma^{*P^{0}_{j}}_{j}+o_{P^{*}}(1). \tag{A.32}$$
We now conclude the proof of (A.27) by showing that
$$\hat\Gamma^{*P^{0}_{j}}_{j}=\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}+o_{P}(1).$$
Let $X^{P^{0}_{j}}_{j}(\theta^{0})=\left[\iota_{N},B^{P^{0}_{j}}_{j}(\theta^{0})\right]$. By the same logic as in the previous passages, and using Theorem 3.4 in Massacci (2017), we have
$$\frac{1}{T}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}=\left(\hat X^{\hat P_{j}\prime}_{j}(\hat\theta)\hat X^{\hat P_{j}}_{j}(\hat\theta)\right)^{-1}\hat X^{\hat P_{j}\prime}_{j}(\hat\theta)\frac{1}{T}\sum_{t=1}^{T}R_{t}\hat d_{j,t}=\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}\hat d_{j,t}+o_{P^{*}}(1)+o_{P}(1).$$
Also
$$
\begin{aligned}
&\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}\hat d_{j,t}\\
&\qquad=\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}d_{j,t}(\theta^{0})+\left(\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\hat X^{P^{0}_{j}}_{j}(\theta^{0})\right)^{-1}\hat X^{P^{0}_{j}\prime}_{j}(\theta^{0})\frac{1}{T}\sum_{t=1}^{T}R_{t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)\\
&\qquad=I+II. \tag{A.33}
\end{aligned}
$$
Noting that
$$\left\|\frac{1}{T}\sum_{t=1}^{T}R_{t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)\right\|\le\left(\frac{1}{T}\sum_{t=1}^{T}\left\|R_{t}\right\|^{2}\right)^{1/2}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)^{2}\right)^{1/2},$$
and using Theorem 3.4 in Massacci (2017), it follows that
$$\frac{1}{T}\sum_{t=1}^{T}R_{t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)=O_{P}\left(N^{-\eta}T^{-1}\right),$$
so that, in equation (A.33), we have $II=O_{P}\left(N^{-\eta}T^{-1}\right)$. Also, by Corollary 3.1(a) in Massacci (2017), $\left\|\hat B^{P^{0}_{j}}_{j}-B^{0}_{j}H_{jj}\right\|^{2}=O_{P}\left(NC^{-2}_{N,T}\right)$, which in turn implies
$$\left\|\hat X^{P^{0}_{j}}_{j}(\theta^{0})-\left(\iota_{N},B^{0}_{j}H_{jj}\right)\right\|^{2}=\left\|\hat X^{P^{0}_{j}}_{j}(\theta^{0})-X^{0}_{j}H^{\Gamma}_{jj}\right\|^{2}=O_{P}\left(NC^{-2}_{N,T}\right);$$
thus
$$\frac{1}{T}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}=\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}R_{t}d_{j,t}(\theta^{0})\right)+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1).$$
Let $\alpha=\left(\alpha_{1},\ldots,\alpha_{N}\right)'$. Recalling (2.4), it follows that
$$
\begin{aligned}
\frac{1}{T}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}-\frac{T_{j}}{T}\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}
&=\frac{T_{j}}{T}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}\alpha_{j}}{N}\right)\\
&\quad+\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}B^{0}_{j}u^{0}_{j,t}d_{j,t}(\theta^{0})\right)\\
&\quad+\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}(\theta^{0})\right)\\
&\quad+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1)\\
&=I_{a}+I_{b}+I_{c}+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1).
\end{aligned}
$$
Assumptions 3(iv) and 7(i) immediately yield $I_{a}=O_{P}\left(\frac{T_{j}}{N^{1/2}T}\right)$. Consider $I_{c}$; it holds that
$$
\begin{aligned}
\mathbb{E}\left\|\frac{1}{NT}\sum_{t=1}^{T}X^{0\prime}_{j}\varepsilon_{t}d_{j,t}(\theta^{0})\right\|^{2}
&=\frac{1}{(NT)^{2}}\sum_{i,k=1}^{N}\sum_{t,s=1}^{T}X^{0}_{i,j}X^{0\prime}_{k,j}\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)\\
&\le\frac{1}{(NT)^{2}}\max_{1\le i\le N}\left\|X^{0}_{i,j}\right\|^{2}\sum_{i,k=1}^{N}\sum_{t,s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)\right|\\
&\le\frac{c_{0}N}{(NT)^{2}}\max_{1\le i\le N}\sum_{k=1}^{N}\sum_{t,s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}(\theta^{0})d_{j,s}(\theta^{0})\right)\right|\le\frac{c_{0}}{NT},
\end{aligned}
$$
by Assumptions 3(iii) and 2(iv); so, putting all together and using Assumption 3(iv), it follows that $I_{c}=O_{P}\left(\frac{1}{\sqrt{NT}}\right)$. Turning to $I_{b}$, (A.25) yields
$$\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}B^{0}_{j}}{N}\right)\left(\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right)=O_{P}\left(\frac{1}{\sqrt{T}}\right). \tag{A.34}$$
Putting all together, it follows that
$$\frac{1}{T_{j}}\sum_{t=1}^{T}\hat\Gamma^{*P^{0}_{j}}_{j,t}\hat d_{j,t}-\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}=O_{P}\left(\frac{1}{N^{1/2}}\right)+O_{P}\left(\frac{T^{1/2}}{T_{j}}\right)+O_{P}\left(\frac{T}{T_{j}N}\right)+O_{P}\left(N^{-\eta}T^{-1}\right)+o_{P^{*}}(1); \tag{A.35}$$
the desired result now follows immediately. We now turn to showing (A.28). We assume that $P^{0}_{j}$ is known; the estimation error $\hat P_{j}-P^{0}_{j}$ can be shown to be negligible on account of the same arguments as above. We have
$$\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}=\left(\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}\right)\left(\sum_{t=1}^{T}\hat u^{P^{0}_{j}}_{j,t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}\right)^{-1}-\Lambda^{0}_{j}H_{jj}.$$
Note that, by Corollary 3.2 in Massacci (2017), it is easy to see that
$$\frac{1}{T}\sum_{t=1}^{T}\hat u^{P^{0}_{j}}_{j,t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}=\frac{1}{T}\sum_{t=1}^{T}H^{-1}_{jj}u^{0}_{j,t}u^{0\prime}_{j,t}\left(H'_{jj}\right)^{-1}d_{j,t}(\theta^{0})+o_{P}(1). \tag{A.36}$$
Also
$$\frac{1}{T}\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}\hat d_{j,t}=\frac{1}{T}\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})+\frac{1}{T}\sum_{t=1}^{T}g_{t}\hat u^{P^{0}_{j}\prime}_{j,t}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)=I+II. \tag{A.37}$$
Consider $II$; we have
$$\left|II\right|\le\left(\frac{1}{T}\sum_{t=1}^{T}\left\|g_{t}\right\|^{4}\right)^{1/4}\left(\frac{1}{T}\sum_{t=1}^{T}\left\|\hat u^{P^{0}_{j}}_{j,t}\right\|^{4}\right)^{1/4}\left(\frac{1}{T}\sum_{t=1}^{T}\left(\hat d_{j,t}-d_{j,t}(\theta^{0})\right)^{2}\right)^{1/2}; \tag{A.38}$$
using Assumptions 3(i) and 8(i), and using Theorem 3.4 in Massacci (2017), it follows that $II=O_{P}\left(N^{-\eta}T^{-1}\right)$. Turning to $I$, by the definition of $g_{t}(\theta)$ it holds that
$$I=\frac{1}{T}\sum_{t=1}^{T}\left(g_{t}-a\right)\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})+a\,\frac{1}{T}\sum_{t=1}^{T}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})=I_{a}+I_{b}.$$
Using (A.25) and Assumption 8(ii), it follows that $I_{b}=O_{P}\left(T^{-1}\right)$.
Further,
$$I_{a}=\frac{1}{T}\sum_{t=1}^{T}\Lambda^{0}_{j}u^{0}_{j,t}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})+\frac{1}{T}\sum_{t=1}^{T}e_{t}\hat u^{P^{0}_{j}\prime}_{j,t}d_{j,t}(\theta^{0})=I_{a,1}+I_{a,2}.$$
Now
$$I_{a,1}=\Lambda^{0}_{j}H_{jj}H^{-1}_{jj}\left(\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}u^{0\prime}_{j,t}\left(H'_{jj}\right)^{-1}d_{j,t}(\theta^{0})\right)+\Lambda^{0}_{j}\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}\left(\hat u^{P^{0}_{j}}_{j,t}-H^{-1}_{jj}u^{0}_{j,t}\right)'d_{j,t}(\theta^{0})=I_{a,1,1}+I_{a,1,2}.$$
Following the same passages as in the proof of Lemma B.2 in Bai (2003), it can be shown that
$$\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}\left(\hat u^{P^{0}_{j}}_{j,t}-H^{-1}_{jj}u^{0}_{j,t}\right)'d_{j,t}(\theta^{0})=O_{P}\left(C^{-2}_{N,T}\right), \tag{A.39}$$
so that $I_{a,1,2}=O_{P}\left(C^{-2}_{N,T}\right)$. Also
$$I_{a,2}=\left(\frac{1}{T}\sum_{t=1}^{T}e_{t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)\left(H'_{jj}\right)^{-1}+\frac{1}{T}\sum_{t=1}^{T}e_{t}\left(\hat u^{P^{0}_{j}}_{j,t}-H^{-1}_{jj}u^{0}_{j,t}\right)'d_{j,t}(\theta^{0})=I_{a,2,1}+I_{a,2,2}.$$
Assumption 8(ii) entails that $I_{a,2,1}=O_{P}\left(T^{-1/2}\right)$. Finally, by the same token as for (A.39), $I_{a,2,2}=O_{P}\left(C^{-2}_{N,T}\right)$, whence $I_{a,2}=O_{P}\left(T^{-1/2}\right)+O_{P}\left(C^{-2}_{N,T}\right)$. Putting all together, we obtain
$$\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}=O_{P}\left(\frac{1}{T^{1/2}}\right)+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right). \tag{A.40}$$
For $j=D,S$, consider the matrices $M^{(j)}_{H,X}=\lim_{N\to\infty}N^{-1}H^{\Gamma\prime}_{jj}(\theta^{0})X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}(\theta^{0})$, $M^{(j)}_{H,X,B}=\lim_{N\to\infty}N^{-1}H^{\Gamma\prime}_{jj}(\theta^{0})X^{0\prime}_{j}B^{0}_{j}$, and $M^{(j)}_{H,u}=\left(M^{(j)}_{u}\right)^{-1}H_{jj}(\theta^{0})$. We then define
$$\Sigma_{\Gamma,j}=c^{-2}_{j}\left(M^{(j)}_{H,X}\right)^{-1}M^{(j)}_{H,X,B}V_{u,j}M^{(j)\prime}_{H,X,B}\left(M^{(j)}_{H,X}\right)^{-1},$$
$$\Sigma_{\alpha\Gamma,j}=\sigma^{2}_{\alpha,j}\left(M^{(j)}_{H,X}\right)^{-1},$$
$$\Sigma_{\Lambda,j}=\left(M^{(j)\prime}_{H,u}\otimes I_{K}\right)V_{eu,j}\left(M^{(j)}_{H,u}\otimes I_{K}\right).$$
Lemma A.13. We assume that Assumptions 1-8 are satisfied, and that $\min\left(N,T_{j}\right)\to\infty$, with
$$\frac{T_{j}}{T}\to c_{j}\in(0,1), \tag{A.41}$$
$$\frac{T^{1/2}}{N}\to0. \tag{A.42}$$
Then, under Assumptions 7 and 9, it holds that
$$\left(\frac{1}{T}\Sigma_{\Gamma,j}+\frac{1}{N}\Sigma_{\alpha\Gamma,j}\right)^{-1/2}\left(\hat\Gamma_{j}-\left(H^{\Gamma}_{jj}(\theta^{0})\right)^{-1}\Gamma^{0}_{j}\right)\xrightarrow{D^{*}}N\left(0,I_{P_{j}+1}\right); \tag{A.43}$$
under Assumption 9(ii), it holds that
$$\left(\frac{1}{T}\Sigma_{\Lambda,j}\right)^{-1/2}\operatorname{vec}\left(\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}(\theta^{0})\right)\xrightarrow{D^{*}}N\left(0,I_{P_{j}K}\right), \tag{A.44}$$
for $j=D,S$, for almost all realisations of $\left\{R_{i,t},1\le i\le N,1\le t\le T\right\}$.
Proof. Consider (A.43). Upon inspecting (A.35), under (A.41) and (A.42), and using Lemma A.8, it is immediate to see that the limiting behaviour of $T^{1/2}\left(\hat\Gamma^{*P^{0}_{j}}_{j}-\left(H^{\Gamma}_{jj}\right)^{-1}\Gamma^{0}_{j}\right)$ is driven by
$$\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}X^{0}_{j}H^{\Gamma}_{jj}}{N}\right)^{-1}\left[\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}B^{0}_{j}}{N}\right)\left(\frac{T}{T_{j}}\frac{1}{T^{1/2}}\sum_{t=1}^{T}u^{0}_{j,t}d_{j,t}(\theta^{0})\right)+\frac{T_{j}}{T}\left(\frac{H^{\Gamma\prime}_{jj}X^{0\prime}_{j}\alpha_{j}}{N}\right)\right].$$
The desired result now follows immediately from (A.26) and Assumption 7. Turning to (A.44), combining (A.40) and (A.42), it follows that the limiting law of $T^{1/2}\left(\hat\Lambda_{j}-\Lambda^{0}_{j}H_{jj}\right)$ is driven by
$$\left(\frac{1}{T^{1/2}}\sum_{t=1}^{T}e_{t}u^{0\prime}_{j,t}\left(H_{jj}\right)^{-1}d_{j,t}(\theta^{0})\right)\left(\frac{1}{T}\sum_{t=1}^{T}\left(H_{jj}\right)^{-1}u^{0}_{j,t}u^{0\prime}_{j,t}\left(H'_{jj}\right)^{-1}d_{j,t}(\theta^{0})\right)^{-1}=\left(\frac{1}{T^{1/2}}\sum_{t=1}^{T}e_{t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)\left(\frac{1}{T}\sum_{t=1}^{T}u^{0}_{j,t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)^{-1}H_{jj},$$
and using Assumption 9(ii) the desired result follows immediately.
In order to use the results in Lemma A.13, we propose the following estimators:
$$\hat\Sigma_{\Gamma,j}=\hat c^{-2}_{j}\left(\frac{\hat X'_{j}\hat X_{j}}{N}\right)^{-1}\left(\frac{\hat X'_{j}\hat B^{\hat P_{j}}_{j}}{N}\right)\hat V_{u,j}\left(\frac{\hat X'_{j}\hat B^{\hat P_{j}}_{j}}{N}\right)'\left(\frac{\hat X'_{j}\hat X_{j}}{N}\right)^{-1},$$
$$\hat\Sigma_{\alpha\Gamma,j}=\hat\sigma^{2}_{\alpha,j}\left(\frac{\hat X'_{j}\hat X_{j}}{N}\right)^{-1},$$
$$\hat\Sigma_{\Lambda,j}=\left(\left(\hat M^{(j)}_{u}\right)^{-1}\otimes I_{K}\right)\hat V_{eu,j}\left(\left(\hat M^{(j)}_{u}\right)^{-1}\otimes I_{K}\right).$$
B Proofs
Henceforth, we use $\mathbb{E}^{*}$ and $V^{*}$ to denote, respectively, the expected value and the variance with respect to $P^{*}$.
Proof of Lemma 1. The lemma is an adaptation of Lemma 1 in Trapani (2018). In order to prove it, note that Assumption 4(v) entails that, for $j=D,S$,
$$\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left[\left(B^{0}_{j}u^{0}_{j,t}+e_{t}\right)\left(B^{0}_{j}u^{0}_{j,t}+e_{t}\right)'d_{j,t}(\theta^{0})\right]=B^{0}_{j}\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{j,t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)B^{0\prime}_{j}+\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(e_{t}e'_{t}d_{j,t}(\theta^{0})\right).$$
We begin by showing that
$$g^{(i)}_{j}\left(B^{0}_{j}\frac{1}{T\pi_{j}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{j,t}u^{0\prime}_{j,t}d_{j,t}(\theta^{0})\right)B^{0\prime}_{j}\right)\;
\begin{cases}
\in\left(\underline{c}^{(i)}_{j}N,\,\bar c^{(i)}_{j}N\right) & \text{for }1\le i\le P^{0}_{j},\\
=0 & \text{for }i\ge P^{0}_{j}+1,
\end{cases} \tag{B.1}$$
for $j=D,S$, where $0<\underline{c}^{(i)}_{j}\le\bar c^{(i)}_{j}<\infty$. We show this result for the case $j=D$ only; the case $j=S$ is identical. By the multiplicative Weyl's inequality (Merikoski and Kumar (2004)) we have
$$g^{(i)}_{D}\left(B^{0}_{D}\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)B^{0\prime}_{D}\right)\ge g^{(\min)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)\right)g^{(i)}_{D}\left(B^{0\prime}_{D}B^{0}_{D}\right),$$
$$g^{(i)}_{D}\left(B^{0}_{D}\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)B^{0\prime}_{D}\right)\le g^{(\max)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)\right)g^{(i)}_{D}\left(B^{0\prime}_{D}B^{0}_{D}\right);$$
by Assumption 3(ii) it holds that
$$g^{(\min)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(u^{0}_{D,t}u^{0\prime}_{D,t}d_{D,t}(\theta^{0})\right)\right)>0.$$
Moreover, by Assumption 3(iv) we have
$$g^{(i)}_{D}\left(B^{0}_{D}B^{0\prime}_{D}\right)\;
\begin{cases}
\ge c^{(i)}_{1}N & \text{for }1\le i\le P^{0}_{D},\\
=0 & \text{for }i\ge P^{0}_{D}+1.
\end{cases}$$
Equation (B.1) now follows immediately. Consider now
$$\frac{1}{T\pi_{D}}\sum_{t=1}^{T}e_{t}e'_{t}d_{D,t}(\theta^{0})=\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\varepsilon_{t}\varepsilon'_{t}d_{D,t}(\theta^{0})-\bar\varepsilon_{D}\bar\varepsilon'_{D},$$
so that
$$g^{(1)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(e_{t}e'_{t}d_{D,t}(\theta^{0})\right)\right)\le g^{(1)}_{D}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\mathbb{E}\left(\varepsilon_{t}\varepsilon'_{t}d_{D,t}(\theta^{0})\right)\right)+g^{(1)}_{D}\left(\bar\varepsilon_{D}\bar\varepsilon'_{D}\right).$$
The first term is bounded by Assumption 2(i). Also, after some algebra we have
$$g^{(1)}_{D}\left(\bar\varepsilon_{D}\bar\varepsilon'_{D}\right)\le\max_{1\le i\le N}\sum_{k=1}^{N}\frac{1}{\left(T\pi_{D}\right)^{2}}\sum_{t=1}^{T}\sum_{s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{D,t}(\theta^{0})d_{D,s}(\theta^{0})\right)\right|,$$
which is bounded by Assumption 2(ii). The proof of the lemma now follows along the same lines as the proof of Lemma 1 in Trapani (2018).
Proof of Theorem 1. We prove the theorem for $j=D$; the proof for the case $j=S$ is a repetition of the same arguments. Note that
$$\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|\le\left\|\hat\Sigma_{D}-\Sigma_{D}\right\|_{op},$$
so that, by symmetry,
$$\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|\le\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\left(R_{h,t}R_{l,t}\hat d_{D,t}-\mathbb{E}\left(R_{h,t}R_{l,t}d_{D,t}(\theta^{0})\right)\right)\right)^{2}\right|^{1/2}. \tag{B.2}$$
We define the short-hand notation $\delta_{h,l,t}=R_{h,t}R_{l,t}d_{D,t}(\theta^{0})-\mathbb{E}\left(R_{h,t}R_{l,t}d_{D,t}(\theta^{0})\right)$; based on this, (B.2) can be rewritten as
$$\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|\le\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\left(\delta_{h,l,t}+R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)\right)^{2}\right|^{1/2}. \tag{B.3}$$
By repeated use of the $C_{r}$-inequality we have
$$
\begin{aligned}
\left|\hat g^{(i)}_{D}-g^{(i)}_{D}\right|&\le c_{0}\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\delta_{h,l,t}\right)^{2}+\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\right|^{1/2}\\
&\le c_{1}\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\delta_{h,l,t}\right)^{2}\right|^{1/2}+c_{2}\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\right|^{1/2}. \tag{B.4}
\end{aligned}
$$
Using Lemma A.9, we have
$$\mathbb{E}\max_{1\le h\le N,1\le l\le N,1\le t\le T}\sum_{h'=1}^{h}\sum_{l'=1}^{l}\left(\sum_{t'=1}^{t}\delta_{h',l',t'}\right)^{2}\le c_{0}N^{2}T;$$
thus, by Lemma A.1 and the Markov inequality, we have
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}\delta_{h,l,t}\right)^{2}=o_{a.s.}\left(\frac{N^{2}T}{T^{2}\pi^{2}_{D}}(\ln N)^{2+\varepsilon}(\ln T)^{1+\varepsilon}\right),$$
for every $\varepsilon>0$. Considering the second term in (B.4), convexity implies that
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\le\frac{1}{T\pi^{2}_{D}}\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)^{2}. \tag{B.5}$$
Hence, applying (A.19) to (B.5) it follows that
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\le c_{0}\left|\hat\theta-\theta^{0}\right|^{2}\frac{1}{T\pi^{2}_{D}}\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}.$$
Equation (A.20) entails
$$\mathbb{E}\left(\frac{1}{N^{2}T}\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}\right)\le c_{0}.$$
The maximal inequality for rectangular sums (Moricz (1983)) now yields
$$\mathbb{E}\max_{1\le h\le N,1\le l\le N,1\le t\le T}\sum_{h'=1}^{h}\sum_{l'=1}^{l}\sum_{t'=1}^{t}R^{2}_{h',t'}R^{2}_{l',t'}\le c_{0}N^{2}T(\ln N)^{2}\ln T,$$
which, by Lemma A.1 and the Markov inequality, yields
$$\sum_{h=1}^{N}\sum_{l=1}^{N}\sum_{t=1}^{T}R^{2}_{h,t}R^{2}_{l,t}=o_{a.s.}\left(N^{2}T(\ln N)^{3+\varepsilon}(\ln T)^{2+\varepsilon}\right),$$
for every $\varepsilon>0$. Then, by Lemma A.7 we obtain
$$\left|\sum_{h=1}^{N}\sum_{l=1}^{N}\left(\frac{1}{T\pi_{D}}\sum_{t=1}^{T}R_{h,t}R_{l,t}\left(\hat d_{D,t}-d_{D,t}(\theta^{0})\right)\right)^{2}\right|^{1/2}=o_{a.s.}\left(\frac{N^{1-\eta}}{T\pi_{D}}(\ln N)^{\frac{3+\varepsilon}{2}}(\ln T)^{\frac{2+\varepsilon}{2}}v_{N,T}(\varepsilon)\right);$$
recalling that we have assumed $\eta=1$, the desired result follows.
Proof of Theorem 2. The proofs are similar to those of Theorems 3 and 4 in Horvath and Trapani (2019), and therefore we only report the main arguments to save space. We begin by showing (4.18). It follows from Lemma 1 and Theorem 1 that, for all $i$ and $j=D,S$,
$$P\left(\omega:\lim_{\min(N,T)\to\infty}N^{-\varrho_{j}}\frac{\hat g^{(i)}_{j}}{\bar g_{j}(p)}-\mathring{A}N^{1-\varrho_{j}}=\infty\right)=1.$$
Thus, by continuity, we can assume henceforth that
$$\lim_{\min(N,T)\to\infty}\exp\left(-\mathring{A}N^{1-\varrho_{j}}\right)\psi\left(\hat g^{(i)}_{j}\right)=\infty. \tag{B.6}$$
Let $\Phi(\cdot)$ denote the standard normal distribution function; it holds that
$$
\begin{aligned}
M^{-1/2}\sum_{m=1}^{M}\left(\zeta^{(i)}_{j,m}(s)-\frac{1}{2}\right)&=M^{-1/2}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le0\right\}-\frac{1}{2}\right)+M^{-1/2}\sum_{m=1}^{M}\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\\
&\quad+M^{-1/2}\sum_{m=1}^{M}\left[I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,m}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right].
\end{aligned}
$$
By definition,
$$\mathbb{E}^{*}\zeta^{(i)}_{j,m}(s)=\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right),\qquad V^{*}\zeta^{(i)}_{j,m}(s)=\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\left(1-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right).$$
Thus
$$\mathbb{E}^{*}\int_{-\infty}^{\infty}\left|M^{-1/2}\sum_{m=1}^{M}\left[I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,m}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right]\right|^{2}d\Phi(s)=\int_{-\infty}^{\infty}\mathbb{E}^{*}\left|I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right|^{2}d\Phi(s),$$
on account of the independence of the $\xi^{(j)}_{i,m}$ across $m$. Also
$$\mathbb{E}^{*}\left[I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}\right]=\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2},$$
$$V^{*}\left[I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}\right]=\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\left(\frac{3}{2}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)\le\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}.$$
Hence, we have
$$\int_{-\infty}^{\infty}\mathbb{E}^{*}\left|I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-I\left\{\xi^{(j)}_{i,1}\le0\right\}-\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right|^{2}d\Phi(s)\le\int_{-\infty}^{\infty}\left|\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right|d\Phi(s)\le\frac{1}{\sqrt{2\pi}\,\psi\left(\hat g^{(i)}_{j}\right)}\int_{-\infty}^{\infty}|s|\,d\Phi(s)=\frac{1}{\pi\psi\left(\hat g^{(i)}_{j}\right)},$$
which drifts to zero by (B.6). Also
$$\int_{-\infty}^{\infty}\left(M^{-1/2}\sum_{m=1}^{M}\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)\right)^{2}d\Phi(s)\le\frac{M}{2\pi\left|\psi\left(\hat g^{(i)}_{j}\right)\right|^{2}}\int_{-\infty}^{\infty}s^{2}\,d\Phi(s)=\frac{M}{2\pi\left|\psi\left(\hat g^{(i)}_{j}\right)\right|^{2}}.$$
Hence, using (4.17), we conclude via Markov's inequality that
$$\Upsilon^{(i)}_{j}=\int_{-\infty}^{\infty}\left(\frac{2}{\sqrt{M}}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le0\right\}-\frac{1}{2}\right)\right)^{2}d\Phi(s)+o_{P^{*}}(1)=\left(\frac{2}{\sqrt{M}}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le0\right\}-\frac{1}{2}\right)\right)^{2}+o_{P^{*}}(1),$$
and therefore the desired result follows from the Central Limit Theorem for Bernoulli random variables.
We now turn to showing (4.19). Again, Lemma 1 and Theorem 1 entail that
$$P\left(\omega:\lim_{\min(N,T)\to\infty}N^{-\varrho_{j}}\frac{\hat g^{(i)}_{j}}{\bar g_{j}(p)}=0\right)=1.$$
By continuity, this means that we can assume from now on that
$$\lim_{\min(N,T)\to\infty}\psi\left(\hat g^{(i)}_{j}\right)=1. \tag{B.7}$$
Consider
$$\zeta^{(i)}_{j,m}(s)-\frac{1}{2}=I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}\pm\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}.$$
Then we have
$$\mathbb{E}^{*}\int_{-\infty}^{\infty}\left|M^{-1/2}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\frac{1}{2}\right)\right|^{2}d\Phi(s)=\int_{-\infty}^{\infty}\mathbb{E}^{*}\left(I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)^{2}d\Phi(s)+M\int_{-\infty}^{\infty}\left(\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)-\frac{1}{2}\right)^{2}d\Phi(s).$$
Given that
$$\mathbb{E}^{*}\left(I\left\{\xi^{(j)}_{i,1}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)^{2}<\infty,$$
by Markov's inequality it follows that
$$\int_{-\infty}^{\infty}\left(M^{-1/2}\sum_{m=1}^{M}\left(I\left\{\xi^{(j)}_{i,m}\le\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right\}-\Phi\left(\frac{s}{\psi\left(\hat g^{(i)}_{j}\right)}\right)\right)\right)^{2}d\Phi(s)=O_{P^{*}}(1),$$
for almost all realizations of $\left\{e_{j},b_{j},-\infty<j<\infty\right\}$. Thus, as $\min\left(M,N,T_{j}\right)\to\infty$ for $j=D,S$,
$$\frac{1}{4M}\Upsilon^{(i)}_{j}=\int_{-\infty}^{\infty}\left(\Phi(s)-\frac{1}{2}\right)^{2}d\Phi(s)+o_{P^{*}}(1). \tag{B.8}$$
Hence the proof of (4.19) is complete.
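The mechanics of the randomisation argument in the proof of Theorem 2 can be mimicked with a small simulation. In the sketch below `psi` is a stand-in for the rescaled statistic $\psi(\hat g^{(i)}_{j})$ (a hypothetical value, not computed from data): a very large `psi` mimics the null, under which $\Upsilon^{(i)}_{j}$ behaves like a single $\chi^{2}_{1}$-type draw, while `psi = 1` mimics the alternative, under which $\Upsilon^{(i)}_{j}/(4M)$ approaches $\int(\Phi(s)-1/2)^{2}\,d\Phi(s)=1/12$.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 20_000                     # number of randomisation draws xi_{i,m}^{(j)}
xi = rng.normal(size=M)        # i.i.d. N(0,1) draws
s_grid = rng.normal(size=200)  # draws of s used to approximate the integral dPhi(s)

def upsilon(psi):
    """Upsilon = int ( (2/sqrt(M)) sum_m (zeta_m(s) - 1/2) )^2 dPhi(s),
    with zeta_m(s) = I{xi_m <= s / psi}; the integral over s is
    approximated by averaging over the draws in s_grid."""
    zeta = xi[None, :] <= s_grid[:, None] / psi
    stat = 2.0 / np.sqrt(M) * (zeta.sum(axis=1) - M / 2.0)
    return float(np.mean(stat ** 2))

null_val = upsilon(psi=1e6)  # psi huge: zeta ~ Bernoulli(1/2), Upsilon = O_P*(1)
alt_val = upsilon(psi=1.0)   # psi -> 1: Upsilon / (4M) -> 1/12
print(null_val, alt_val / (4 * M))
```

The divergence of `alt_val` at rate $M$ relative to the bounded `null_val` is what allows the randomised statistic to discriminate between the two regimes.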
Proof of Theorem 3. See the proof of (the identical) Theorem 3 in Trapani (2018).
Proof of Theorem 4. The result follows from combining (A.27) and (A.28), with
$$\hat\gamma_{g,j,1}-\gamma^{0}_{g,j,1}=O_{P}\left(\frac{1}{N^{1/2}}\right)+O_{P}\left(\frac{T^{1/2}}{T_{j}}\right)+O_{P}\left(C^{-2}_{N,T}\right)+O_{P}\left(N^{-\eta}T^{-1}\right). \tag{B.9}$$
Proof of Theorem 5. The result readily follows by combining the results in Lemma A.13 (see also
Theorem 3 in Giglio and Xiu (2020)).
Proof of Lemma 2. We report the proof under the assumption that the true $d_{j,t}$ has been used (also in the computation of $T_{j}$); extending to the case where $\hat d_{j,t}$ is employed is straightforward in light of the results derived above. It holds that
$$\hat e_{j,i}=\alpha_{j,i}+\bar\varepsilon_{j,i}+\left(\frac{1}{T_{j}}\sum_{t=1}^{T}d_{j,t}\right)\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t},$$
where
$$\bar\varepsilon_{j,i}=\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}.$$
Also, applying Theorem 3.4 and Corollary 3.2 in Massacci (2017) (recalling also (5.3)) yields
$$\frac{1}{N}\sum_{i=1}^{N}\left\|\beta_{j,i}-\hat\beta_{j,i}\right\|^{2}=o_{P}(1), \tag{B.10}$$
$$\left\|\frac{1}{T_{j}}\sum_{t=1}^{T}\left(u_{j,t}-\hat u_{j,t}\right)\right\|^{2}\le\frac{T}{T_{j}}\frac{1}{T}\sum_{t=1}^{T}\left\|u_{j,t}-\hat u_{j,t}\right\|^{2}=o_{P}(1), \tag{B.11}$$
having omitted rotation matrices for notational simplicity. Note now that
$$
\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}\hat e^{2}_{j,i}&=\frac{1}{N}\sum_{i=1}^{N}\alpha^{2}_{j,i}+\frac{1}{N}\sum_{i=1}^{N}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}\\
&\quad+\frac{2}{N}\sum_{i=1}^{N}\alpha_{j,i}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right).
\end{aligned}
$$
The LLN entails
$$\frac{1}{N}\sum_{i=1}^{N}\alpha^{2}_{j,i}=\sigma^{2}_{\alpha,j}+o_{P}(1);$$
upon showing that
$$\frac{1}{N}\sum_{i=1}^{N}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}=o_{P}(1),$$
and using (repeatedly) the Cauchy-Schwarz inequality, the lemma follows. Now
$$
\frac{1}{N}\sum_{i=1}^{N}\left(\bar\varepsilon_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)+\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}\le c_{0}\left(\frac{1}{N}\sum_{i=1}^{N}\bar\varepsilon^{2}_{j,i}+\left(\gamma_{j,0}-\hat\gamma_{j,0}\right)^{2}+\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}\right).
$$
By (A.27), $\gamma_{j,0}-\hat\gamma_{j,0}=o_{P}(1)$. Also, combining Theorem 4 and (B.10)-(B.11), it is easy to see that
$$\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T_{j}}\sum_{t=1}^{T}\left(\beta'_{j,i}\left(\gamma_{j,1}+u_{j,t}\right)-\hat\beta'_{j,i}\left(\hat\gamma_{j,1}+\hat u_{j,t}\right)\right)d_{j,t}\right)^{2}=o_{P}(1).$$
Finally, we have
$$\max_{1\le i\le N}\mathbb{E}\left(\bar\varepsilon^{2}_{j,i}\right)=\max_{1\le i\le N}\frac{1}{T^{2}_{j}}\sum_{t=1}^{T}\sum_{s=1}^{T}\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{i,s}d_{j,t}d_{j,s}\right)\le c_{0}T^{-1}_{j},$$
using Assumption 4(ii). Hence
$$\frac{1}{N}\sum_{i=1}^{N}\bar\varepsilon^{2}_{j,i}=o_{P}(1).$$
Putting all together, the desired result follows.
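The decomposition driving Lemma 2 can be mimicked with simulated pricing errors. In the sketch below the distributions, the value of $\sigma_{\alpha,j}$ and the $O_{P}(T^{-1/2}_{j})$ noise scale are illustrative assumptions: the cross-sectional average of $\hat e^{2}_{j,i}$ is dominated by $\sigma^{2}_{\alpha,j}$, with the $\bar\varepsilon_{j,i}$ term contributing only at order $T^{-1}_{j}$ (the remaining estimation terms, which vanish, are omitted).

```python
import numpy as np

rng = np.random.default_rng(3)
N, T_j = 200_000, 1_000
sigma_alpha = 0.3

alpha = rng.normal(0.0, sigma_alpha, size=N)           # pricing errors alpha_{j,i}
eps_bar = rng.normal(0.0, 1.0 / np.sqrt(T_j), size=N)  # bar eps_{j,i} = O_P(T_j^{-1/2})

e_hat = alpha + eps_bar               # vanishing estimation terms omitted
mean_sq = float(np.mean(e_hat ** 2))  # (1/N) sum_i hat e_{j,i}^2

# Limit: sigma_alpha^2 + Var(bar eps) = 0.09 + 0.001, so the alpha
# component dominates as T_j grows.
print(mean_sq)
```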
Proof of Lemma 3. The proof makes appeal to several arguments which have already been used in the paper; therefore, for the sake of a concise discussion, we omit passages when this does not give rise to ambiguity.
We begin by noting that, by the Frisch-Waugh-Lovell theorem, it holds that
$$\hat\gamma_{j,0}=\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}R_{j}(\theta)\right),$$
having defined the annihilator
$$M_{\hat B,j}=I_{N}-\hat B_{j}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\hat B'_{j},$$
for $j=D,S$; note that we use $\hat B_{j}$ in lieu of $\hat B^{\hat P_{j}}_{j}$ to make the notation less burdensome. Thus, on account of equation (2.5), it holds that
$$\hat\gamma_{j,0}=\gamma_{j,0}+\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}\alpha_{j}\right)+\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\right). \tag{B.12}$$
Note that, using Corollary 3.1(a) in Massacci (2017), it is easy to see that
$$\left\|M_{\hat B,j}-M^{0}_{B,j}\right\|^{2}=O_{P}\left(NC^{-2}_{N,T}\right),$$
so that
$$\frac{\iota'_{N}M_{\hat B,j}\iota_{N}}{N}=\frac{\iota'_{N}M^{0}_{B,j}\iota_{N}}{N}+O_{P}\left(C^{-2}_{N,T}\right)=O_{P}(1). \tag{B.13}$$
Also, it holds that
$$\mathbb{E}\left[\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\alpha_{j}\alpha'_{j}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\right]=\sigma^{2}_{\alpha,j}\mathbb{E}\left[\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\right]=O\left(NC^{-2}_{N,T}\right),$$
again by Corollary 3.1(a) in Massacci (2017); this yields the (non-sharp) bound
$$\iota'_{N}\left(M_{\hat B,j}-M^{0}_{B,j}\right)\alpha_{j}=O_{P}\left(N^{1/2}C^{-1}_{N,T}\right). \tag{B.14}$$
Finally, note that Assumption 7(i) entails that
$$N^{1/2}\left(\iota'_{N}M^{0}_{B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M^{0}_{B,j}\alpha_{j}\right)=O_{P}(1). \tag{B.15}$$
Putting (B.13)-(B.15) together, we obtain
$$N^{1/2}\left(\iota'_{N}M_{\hat B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M_{\hat B,j}\alpha_{j}\right)=N^{1/2}\left(\iota'_{N}M^{0}_{B,j}\iota_{N}\right)^{-1}\left(\iota'_{N}M^{0}_{B,j}\alpha_{j}\right)+o_{P}(1). \tag{B.16}$$
Consider now
$$\iota'_{N}M_{\hat B,j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=\iota'_{N}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}-\iota'_{N}\hat B_{j}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\hat B'_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=I-II. \tag{B.17}$$
Using Assumption 2(ii) and equation (A.19), it is easy to see that
$$\mathbb{E}\left(\iota'_{N}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\right)^{2}=T^{-2}_{j}\mathbb{E}\left(\sum_{i,k=1}^{N}\sum_{t,s=1}^{T}\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)\le NT^{-2}_{j}\max_{1\le k\le N}\sum_{i=1}^{N}\sum_{t,s=1}^{T}\left|\mathbb{E}\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)\right|\le c_{0}\frac{N}{T_{j}},$$
so that
$$\iota'_{N}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=O_{P}\left(N^{1/2}T^{-1/2}_{j}\right). \tag{B.18}$$
In order to study $II$ in (B.17), note that
$$
\begin{aligned}
\iota'_{N}\hat B_{j}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\hat B'_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}
&=\iota'_{N}B^{0}_{j}H_{jj}\left(\hat B'_{j}\hat B_{j}\right)^{-1}H'_{jj}B^{0\prime}_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}+\iota'_{N}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\left(\hat B'_{j}\hat B_{j}\right)^{-1}H'_{jj}B^{0\prime}_{j}\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\\
&\quad+\iota'_{N}B^{0}_{j}H_{jj}\left(\hat B'_{j}\hat B_{j}\right)^{-1}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\\
&\quad+\iota'_{N}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)\left(\hat B'_{j}\hat B_{j}\right)^{-1}\left(\hat B_{j}-B^{0}_{j}H_{jj}\right)'\frac{1}{T_{j}}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}\\
&=I+II+III+IV. \tag{B.19}
\end{aligned}
$$
By construction, $\left(\hat B'_{j}\hat B_{j}\right)^{-1}=O_{P}\left(N^{-1}\right)$, and $\iota'_{N}B^{0}_{j}=O_{P}(N)$. Using Assumption 2(ii) and equation (A.19), similarly to (B.18), it also follows that
$$B^{0\prime}_{j}T^{-1}_{j}\sum_{t=1}^{T}\varepsilon_{t}d_{j,t}=O_{P}\left(N^{1/2}T^{-1/2}_{j}\right), \tag{B.20}$$
so that $I=O_{P}\left(N^{-1/2}T^{-1/2}_{j}\right)$. Consider now
$$B^{0}_{j}-\hat B_{j}H^{-1}_{jj}=T^{-1}B^{0}_{j}H_{jj}\left(H^{-1}_{jj}U^{0}_{j,t}-\hat U_{j,t}\right)\hat U'_{j,t}H^{-1}_{jj}+T^{-1}\varepsilon_{j}\left(\hat U'_{j,t}-U^{0\prime}_{j,t}\left(H^{-1}_{jj}\right)'\right)H^{-1}_{jj}+T^{-1}\varepsilon_{j}U^{0\prime}_{j,t}\left(H^{-1}_{jj}\right)'H^{-1}_{jj}, \tag{B.21}$$
where $\varepsilon_{j}$ is an $N\times T$ matrix with columns $\varepsilon_{t}d_{j,t}$. Using exactly the same arguments as in the proofs of Lemmas B.1 and B.3 in Bai (2003), it holds that $\left\|T^{-1}\left(H^{-1}_{jj}U^{0}_{j,t}-\hat U_{j,t}\right)\hat U'_{j,t}H\right\|=O_{P}\left(C^{-2}_{N,T}\right)$ and $\left\|T^{-1}\varepsilon_{j}\left(\hat U'_{j,t}-U^{0\prime}_{j,t}\left(H^{-1}_{jj}\right)'\right)H\right\|=O_{P}\left(C^{-2}_{N,T}\right)$. Hence, it follows that
$$\iota'_{N}B^{0}_{j}\left(T^{-1}H_{jj}\left(H^{-1}_{jj}U^{0}_{j,t}-\hat U_{j,t}\right)\hat U'_{j,t}\right)H^{-1}_{jj}=O_{P}\left(NC^{-2}_{N,T}\right). \tag{B.22}$$
Also,
\[
i_N' T^{-1}\varepsilon_j\left(\widehat U_j' - U_j^{0\prime}\left(H_{jj}^{-1}\right)'\right) = \sum_{i=1}^{N} T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right) \le \sum_{i=1}^{N}\left\|T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right)\right\|;
\]
again by using the same arguments as in the proof of Lemma B.1 in Bai (2003), it can be shown that
\[
E\left\|T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right)\right\| \le c_0 C_{N,T}^{-2},
\]
which yields
\[
i_N' T^{-1}\varepsilon_j\left(\widehat U_j' - U_j^{0\prime}\left(H_{jj}^{-1}\right)'\right) = O_P\left(NC_{N,T}^{-2}\right). \tag{B.23}
\]
Finally, note that
\[
E\left\|i_N' T^{-1}\varepsilon_j U_j^{0\prime}\right\|^2 = \sum_{i=1}^{N}\sum_{k=1}^{N} T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T} E\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)E\left\|u_{j,t}^{0\prime}u_{j,s}^0\right\| \le \left(\max_{1\le t\le T}E\left\|u_{j,t}^0\right\|^2\right)\frac{N}{T}\max_{1\le i\le N}\sum_{k=1}^{N}T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}\left|E\left(\varepsilon_{i,t}\varepsilon_{k,s}d_{j,t}d_{j,s}\right)\right| \le c_0\frac{N}{T},
\]
after Assumptions 2(ii) and 3(i), and using (A.19), which entails that
\[
i_N' T^{-1}\varepsilon_j U_j^{0\prime} = O_P\left(N^{1/2}T^{-1/2}\right). \tag{B.24}
\]
Putting (B.22)-(B.24) together, it follows that
\[
i_N'\left(\widehat B_j - B_j^0 H_{jj}\right) = O_P\left(NC_{N,T}^{-2}\right). \tag{B.25}
\]
Thus, (B.19), (B.20) and (B.25) immediately entail that $II = O_P\left(N^{1/2}T_j^{-1/2}C_{N,T}^{-2}\right)$.
Turning to $III$, note that, using (B.21),
\begin{align*}
&\left(\widehat B_j - B_j^0 H_{jj}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right)\\
&= \left(B_j^0 H_{jj} T^{-1}\left(H_{jj}^{-1}U_j^0 - \widehat U_j\right)\widehat U_j' H_{jj}^{-1}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right)\\
&\quad + \left(T^{-1}\varepsilon_j\left(\widehat U_j' - U_j^{0\prime}\left(H_{jj}^{-1}\right)'\right)H_{jj}^{-1}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right)\\
&\quad + \left(T^{-1}\varepsilon_j U_j^{0\prime}\left(H_{jj}^{-1}\right)'H_{jj}^{-1}\right)'\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right) = I + II + III. \tag{B.26}
\end{align*}
Using (B.20), it follows that
\[
I = \left(H_{jj}T^{-1}\left(H_{jj}^{-1}U_j^0 - \widehat U_j\right)\widehat U_j' H_{jj}^{-1}\right)'\left(B_j^{0\prime}\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t}\right) = O_P\left(N^{1/2}T_j^{-1/2}C_{N,T}^{-2}\right);
\]
further, we have
\[
\sum_{i=1}^{N}\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\right)T^{-1}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\left(\widehat u_{j,t}' - u_{j,t}^{0\prime}\left(H_{jj}^{-1}\right)'\right) = O_P\left(NT_j^{-1/2}C_{N,T}^{-2}\right),
\]
\[
\sum_{i=1}^{N}\left(\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_{i,t}d_{j,t}\right)\left(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{i,t}u_{j,t}^{0\prime}d_{j,t}\right) = O_P\left(NT^{-1}\right),
\]
whence $II = O_P\left(NT_j^{-1/2}C_{N,T}^{-2}\right)$ and $III = O_P\left(NT^{-1}\right)$. Finally, using the same arguments as above, it holds that $IV$ is dominated. Putting everything together, we conclude that, in (B.17), $II = O_P\left(NT^{-1}\right)$, whence
\[
i_N' M_{B,j}\frac{1}{T_j}\sum_{t=1}^{T}\varepsilon_t d_{j,t} = O_P\left(N^{1/2}T^{-1/2}\right) + O_P\left(NT^{-1}\right). \tag{B.27}
\]
Putting (B.13), (B.15) and (B.27) together, it follows that, in (B.12),
\[
N^{1/2}\left(\widehat\gamma_{j,0} - \gamma_{j,0}\right) = \left(\frac{i_N' M_{B,j}^0 i_N}{N}\right)^{-1}\left(\frac{i_N' M_{B,j}^0\alpha_j}{N^{1/2}}\right) + O_P\left(N^{1/2}T^{-1}\right) + o_P(1). \tag{B.28}
\]
Recalling that, by Lemma A.8, $\widehat\pi_j - \pi_j = O_P\left(N^{-\eta}T^{-1}\right)$ with $\eta > \frac{1}{2}$, the desired result follows from Assumption 7(iii) and the Cramér-Wold device.
Proof of Theorem 6. The proof is essentially the same as the proof of Theorems 3 and 4 in Horvath and Trapani (2019), and most of it is therefore omitted. In order to understand the role of (5.17), note that
\begin{align*}
M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(s) - \frac{1}{2}\right) &= M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(0) - \frac{1}{2}\right)\\
&\quad + M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(s) - \zeta_m^\theta(0) - \int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx\right)\\
&\quad + M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx.
\end{align*}
By the same passages as in the proof of Theorem 3 in Horvath and Trapani (2019), it is easy to see that
\[
M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(0) - \frac{1}{2}\right) = O_{P^*}(1);
\]
also
\[
E^*\int_{-\infty}^{\infty}\left|M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\left(\zeta_m^\theta(s) - \zeta_m^\theta(0) - \int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx\right)\right|^2 ds = o(1),
\]
and
\[
\int_{-\infty}^{\infty}\left|M_\theta^{-1/2}\sum_{m=1}^{M_\theta}\int_0^{s/f_{N,T}}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)dx\right|^2 ds = \frac{c_0 M_\theta}{f_{N,T}^{1/2}} = o(1),
\]
after (5.17). The rest of the proof follows immediately.
Table 1: Number of estimated common factors across regimes

Panel: (N,T) = (17,273)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (1.273, 1.006) [27.50, 1.400]    (1.992, 0.997) [0.600, 0.300]
P_S^0 = 2    (1.008, 1.980) [1.400, 1.200]    (2.002, 1.994) [2.100, 0.300]
P_S^0 = 3    (1.003, 2.986) [0.700, 0.700]    (1.700, 2.992) [30.00, 0.400]
P_S^0 = 4    (1.001, 3.939) [0.300, 4.800]    (1.988, 3.970) [0.900, 1.300]
P_S^0 = 5    (0.999, 4.864) [0.100, 11.80]    (1.997, 4.725) [0.300, 25.30]
P_S^0 = 6    (1.000, 4.883) [0.000, 97.60]    (1.997, 4.771) [0.200, 99.80]

Panel: (N,T) = (70,273)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (1.038, 0.992) [5.400, 0.800]    (2.000, 0.996) [1.800, 0.400]
P_S^0 = 2    (1.094, 2.007) [6.900, 0.900]    (1.982, 1.980) [1.000, 1.600]
P_S^0 = 3    (0.994, 2.964) [0.600, 2.000]    (1.980, 2.966) [1.400, 2.200]
P_S^0 = 4    (0.996, 3.952) [0.400, 1.800]    (1.996, 3.938) [0.400, 2.200]
P_S^0 = 5    (0.998, 4.882) [0.200, 4.600]    (1.982, 4.914) [1.400, 3.000]
P_S^0 = 6    (0.990, 5.848) [1.000, 10.40]    (1.980, 5.800) [1.400, 11.60]

Panel: (N,T) = (130,666)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (0.996, 0.998) [0.800, 0.200]    (1.984, 0.996) [1.000, 0.400]
P_S^0 = 2    (0.996, 1.980) [0.400, 1.400]    (1.984, 1.990) [1.000, 0.600]
P_S^0 = 3    (1.000, 2.976) [0.000, 1.000]    (1.980, 2.950) [1.400, 2.000]
P_S^0 = 4    (0.992, 3.966) [0.800, 1.200]    (1.980, 3.958) [1.600, 2.000]
P_S^0 = 5    (0.994, 4.946) [0.600, 2.400]    (1.984, 4.924) [1.000, 2.000]
P_S^0 = 6    (0.994, 5.924) [0.600, 2.000]    (1.980, 5.912) [1.200, 2.200]

Panel: (N,T) = (330,666)
             P_D^0 = 1                        P_D^0 = 2
P_S^0 = 1    (0.980, 1.000) [2.000, 0.000]    (2.000, 1.000) [0.000, 0.000]
P_S^0 = 2    (0.980, 2.000) [2.000, 0.000]    (2.000, 2.000) [0.000, 0.000]
P_S^0 = 3    (1.000, 3.000) [0.000, 0.000]    (2.000, 3.000) [0.000, 0.000]
P_S^0 = 4    (1.000, 4.000) [0.000, 0.000]    (2.000, 4.000) [0.000, 0.000]
P_S^0 = 5    (1.000, 5.000) [0.000, 0.000]    (2.000, 4.940) [0.000, 2.000]
P_S^0 = 6    (1.000, 6.000) [0.000, 0.000]    (2.000, 5.960) [0.000, 2.000]

The table contains the average numbers of estimated factors and the percentage of times the estimator is wrong. In particular, in each cell the numbers in round brackets are the average values of the estimates of $P_D^0$ and $P_S^0$, respectively. The numbers underneath, in square brackets, represent the percentage of times that each estimator missed the true values $P_D^0$, $P_S^0$.
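As a rough illustration of the kind of Monte Carlo exercise summarised in the table above, the sketch below simulates a two-regime factor model and counts the factors in each regime. It deliberately uses a generic eigenvalue-ratio estimator (Ahn and Horenstein, 2013) as a stand-in: this is not the randomised procedure developed in the paper, and all names, sample sizes and noise levels below are illustrative assumptions.

```python
import numpy as np

def eigenvalue_ratio_estimator(R, p_max=8):
    """Estimate the number of factors in a T x N panel R by maximising the
    ratio of adjacent eigenvalues of the (scaled) sample covariance matrix.
    Illustrative stand-in for the paper's randomised estimator."""
    eigvals = np.linalg.eigvalsh(R.T @ R / (R.shape[0] * R.shape[1]))[::-1]
    ratios = eigvals[:p_max] / eigvals[1:p_max + 1]
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(3)
N, T, PD0, PS0 = 70, 273, 2, 3                   # true factor numbers per regime
d = rng.random(T) < 0.3                          # downside-regime indicator d_{D,t}
BD = rng.standard_normal((PD0, N))               # regime-D loadings
BS = rng.standard_normal((PS0, N))               # regime-S loadings
X = np.empty((T, N))
X[d] = rng.standard_normal((d.sum(), PD0)) @ BD + 0.5 * rng.standard_normal((d.sum(), N))
X[~d] = rng.standard_normal(((~d).sum(), PS0)) @ BS + 0.5 * rng.standard_normal(((~d).sum(), N))

# Split the sample by regime and estimate the number of factors in each.
PD_hat = eigenvalue_ratio_estimator(X[d])
PS_hat = eigenvalue_ratio_estimator(X[~d])
print(PD_hat, PS_hat)
```

With factors this strong, the estimator recovers the true regime-specific factor numbers, mirroring the near-zero miss rates in the larger panels of the table.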
Table 2: Estimation and inference on θ

                  Equity                                          Multi-asset
                  fixed                  estimated                fixed                  estimated
θ                −0.030                 −0.061                   −0.030                 −0.060
T_D               105                    45                       43                     21
π_D               0.158                  0.068                    0.158                  0.077
Corr*(D,REC)      0.330 [0.148,0.490]    0.510 [0.255,0.698]      0.390 [0.102,0.617]    0.550 [0.168,0.788]
Corr(D,REC)       0.166 [−0.026,0.346]   0.244 [−0.053,0.501]     0.194 [−0.112,0.467]   0.287 [−0.165,0.639]
Corr(D,ΔIP)      −0.054 [−0.243,0.139]  −0.084 [−0.368,0.214]    −0.017 [−0.315,0.284]  −0.097 [−0.507,0.349]
Corr(D,VIX)       0.508 [0.351,0.637]    0.538 [0.291,0.718]      0.491 [0.224,0.689]    0.534 [0.133,0.784]
Corr(D,VXO)       0.503 [0.345,0.633]    0.545 [0.300,0.722]      0.489 [0.222,0.688]    0.547 [0.151,0.791]
Corr(D,EPU)       0.166 [−0.026,0.346]   0.293 [0.010,0.540]      0.206 [−0.100,0.476]   0.361 [−0.083,0.685]
Corr(D,EMV)       0.450 [0.283,0.590]    0.525 [0.274,0.709]      0.415 [0.131,0.636]    0.498 [0.085,0.765]
Corr(D,PREMV)     0.426 [0.256,0.571]    0.516 [0.263,0.703]      0.398 [0.111,0.623]    0.490 [0.074,0.760]
Corr(D,GPR)      −0.003 [−0.194,0.188]   0.045 [−0.251,0.334]     0.032 [−0.270,0.329]   0.087 [−0.358,0.499]
Corr(D,TED)       0.163 [−0.029,0.343]   0.210 [−0.089,0.474]     0.187 [−0.120,0.461]   0.239 [−0.214,0.607]
Corr(D,SPX)       0.431 [0.261,0.575]    0.493 [0.234,0.687]      0.398 [0.111,0.623]    0.459 [0.035,0.743]
Corr(D,T10Y2Y)   −0.059 [−0.247,0.134]  −0.010 [−0.302,0.284]     0.046 [−0.257,0.341]   0.114 [−0.334,0.520]

H0: θ0 = −0.03 (Equity):        (0.000) [0.000]
H0: θ0 = −0.03 (Multi-asset):   (0.004) [0.032]
H0: θ0_Eq = θ0_MA:              (0.122) [0.999]

The table contains all the relevant inference on θ in the datasets considered, and we refer to the data description in Section 7.1 for details. For each dataset, we report the estimated value of θ and, as a term of comparison, the exogenously fixed value proposed in Farago and Tedongap (2018). For both values of θ, we also report the proportion of time periods in the downside regime D, denoted as π_D, and the corresponding number of time periods, denoted as T_D. In the bottom half of the table, we report the correlation between the indicator functions implicitly defined by the two values of θ and various other variables, defined as follows. REC is the NBER U.S. recession indicator. ΔIP is the log difference of industrial production. VIX is the CBOE volatility index based on S&P 500 index options. VXO is the CBOE volatility index based on S&P 100 index options. EPU is the economic policy uncertainty index of Baker et al. (2016). EMV is the Equity Market Volatility tracker of Baker, Bloom, and Davis (2016). PREMV is the Policy-Related Equity Market Volatility tracker of Baker, Bloom, Davis, and Kost (2019). GPR is the Geopolitical Risk index of Caldara and Iacoviello (2018). TED is the TED spread. SPX is the disaster probability for the U.S. of Barro and Liao (2020). Finally, T10Y2Y is the spread for the T-bill between 10- and 2-year maturities.
All correlations have been computed using the classical Pearson correlation coefficient; the only exception is the coefficient denoted as Corr*(D,REC), which is the tetrachoric correlation coefficient.
In the last part of the table, we report tests based on Section 5.3.3. The first two tests correspond to the null hypothesis that θ0 equals the fixed value in each of the two datasets. The last test is for the null that θ0 is the same across the two datasets; we refer to the values of θ0 in each dataset as θ0_Eq and θ0_MA, respectively. The numbers in round brackets are the p-values of the test based on S_θ; the numbers in square brackets contain the values of Q_α computed using (5.20) with B = 2,000. In this case, the threshold, as defined in (5.21), is equal to 0.940.
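The distinction drawn in the note above, between the Pearson coefficient and the tetrachoric coefficient for two binary indicators, can be illustrated with the classical cosine (odds-ratio) approximation to the tetrachoric correlation. The sketch below uses made-up indicator series driven by a common latent variable; the data, thresholds and function name are illustrative assumptions, not the paper's series or method.

```python
import numpy as np

def tetrachoric_approx(x, y):
    """Approximate tetrachoric correlation of two 0/1 series from their
    2x2 contingency table via r ~ cos(pi / (1 + sqrt(odds ratio))).
    Assumes no empty cell in the table."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 1) & (y == 0))
    c = np.sum((x == 0) & (y == 1))
    d = np.sum((x == 0) & (y == 0))
    odds = (a * d) / (b * c)
    return np.cos(np.pi / (1.0 + np.sqrt(odds)))

rng = np.random.default_rng(0)
latent = rng.standard_normal(600)
# Two rare-event indicators driven by the same latent state (illustrative).
D = (latent + 0.5 * rng.standard_normal(600) < -1.0).astype(int)
REC = (latent + 0.5 * rng.standard_normal(600) < -0.8).astype(int)

pearson = np.corrcoef(D, REC)[0, 1]
tetra = tetrachoric_approx(D, REC)
print(pearson, tetra)
```

For rare binary events the Pearson (phi) coefficient is mechanically attenuated, while the tetrachoric coefficient targets the correlation of the underlying latent variables; this is consistent with Corr*(D,REC) exceeding Corr(D,REC) in the table.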
Table 3: Number of estimated common factors across regimes

Panel: Equities
Unconditional model, (N,T) = (130,666):
  Common: P = 6, ν = 0.945
  Individual: ν^(1) = 0.833, ν^(2) = 0.050, ν^(3) = 0.028, ν^(4) = 0.019, ν^(5) = 0.009, ν^(6) = 0.007
Conditional model, regime S, (N,T_S) = (130,621):
  Common: P_S = 6, ν_S = 0.928
  Individual: ν_S^(1) = 0.781, ν_S^(2) = 0.068, ν_S^(3) = 0.037, ν_S^(4) = 0.021, ν_S^(5) = 0.012, ν_S^(6) = 0.009
Conditional model, regime D, (N,T_D) = (130,45):
  Common: P_D = 1, ν_D = 0.924
  Individual: ν_D^(1) = 0.924

Panel: Currencies, equities and commodities
Unconditional model, (N,T) = (17,273):
  Common: P = 4, ν = 0.744
  Individual: ν^(1) = 0.518, ν^(2) = 0.165, ν^(3) = 0.061, ν^(4) = 0.054
Conditional model, regime S, (N,T_S) = (17,252):
  Common: P_S = 4, ν_S = 0.654
  Individual: ν_S^(1) = 0.375, ν_S^(2) = 0.190, ν_S^(3) = 0.089, ν_S^(4) = 0.073
Conditional model, regime D, (N,T_D) = (17,21):
  Common: P_D = 2, ν_D = 0.892
  Individual: ν_D^(1) = 0.748, ν_D^(2) = 0.144

The table contains the number of estimated common factors in each regime; each panel corresponds to a different dataset. We refer to the data description in Section 7.1 for details.
Each panel contains inference on the number of common factors in the two regimes: the "normal" regime, denoted as S, and the "downside" regime, denoted as D. Moreover, at the top of each panel, we have estimated the number of common factors for the whole sample, which is tantamount to estimating the number of common factors in a linear specification without thresholds (the "unconditional model"). In each panel, we have estimated the number of common factors using the methodology described in Section 4.2.1.
Each sub-table in the panel (for different regimes and different ways of setting the threshold) contains two types of information. The aggregate information (in the rows headed "Common") contains the estimated number of common factors, P, and the percentage of the variance explained by all the common factors (denoted as ν_j, for j = D, S, using the notation in equation (7.1); similarly, ν refers to the whole-sample estimation). The individual information (in the rows headed "Individual") contains the percentage of the variance explained by the i-th individual common factor (denoted as ν_j^(i), for j = D, S, using the notation in equation (7.2); similarly, ν^(i) refers to the whole-sample estimation).
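The variance shares reported in the table have a standard sample analogue: each share is a leading eigenvalue of the covariance matrix of returns divided by its trace, and ν is the sum of the first P such shares. The sketch below is a minimal illustration on simulated data; it is not the estimation procedure of Section 4.2.1, and all names and settings are illustrative assumptions.

```python
import numpy as np

def variance_shares(R, P):
    """Share of total variance explained by the first P principal
    components (nu) and by each of those components individually (nu_i)."""
    R = R - R.mean(axis=0, keepdims=True)                        # de-mean each series
    eigvals = np.linalg.eigvalsh(np.cov(R, rowvar=False))[::-1]  # descending order
    nu_i = eigvals[:P] / eigvals.sum()                           # per-factor shares
    return nu_i.sum(), nu_i

rng = np.random.default_rng(1)
T, N, P0 = 600, 50, 3
factors = rng.standard_normal((T, P0))
loadings = rng.standard_normal((P0, N))
R = factors @ loadings + 0.3 * rng.standard_normal((T, N))  # 3-factor panel plus noise

nu, nu_i = variance_shares(R, P=P0)
print(nu, nu_i)
```

With a strong three-factor structure, the first three shares absorb almost all of the variance, analogous to the large ν and rapidly decaying ν^(i) values in the table.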
Table 4: Goodness of fit
Panel A: Conditional model with downside risk
RMSPE RMSPE/RMSR R2
Equity Multi-asset Equity Multi-asset Equity Multi-asset
P = P P = P P = P P = P P = P P = P
Realized returns
Regime D 1.323 1.576 0.134 0.251 0.982 0.928
Regime S 0.102 0.257 0.070 0.307 0.994 0.874
Full sample 0.184 0.358 0.074 0.303 0.993 0.878
Panel B: Linear, unconditional model
Equity Multi-asset
RMSPE
Realized Returns P = P P = 1 P = 5 P = 10 P = 15 P = P P = 1 P = 5 P = 10 P = 15
Regime D 10.570 10.575 10.570 10.570 10.570 6.517 6.519 6.516 6.514 6.512
Regime S 0.773 0.797 0.773 0.771 0.770 0.579 0.588 0.565 0.559 0.544
Full Sample 0.103 0.229 0.103 0.081 0.075 0.204 0.231 0.161 0.133 0.037
RMSPE/RMSR
Realized Returns P = P P = 1 P = 5 P = 10 P = 15 P = P P = 1 P = 5 P = 10 P = 15
Regime D 1.067 1.067 1.067 1.067 1.067 1.036 1.037 1.036 1.036 1.036
Regime S 0.533 0.549 0.533 0.531 0.531 0.690 0.701 0.672 0.665 0.648
Full Sample 0.145 0.323 0.146 0.113 0.105 0.513 0.582 0.405 0.333 0.092
R2
Realized Returns P = P P = 1 P = 5 P = 10 P = 15 P = P P = 1 P = 5 P = 10 P = 15
Regime D -0.044 -0.003 -0.035 -0.079 -0.126 -0.329 -0.064 -0.450 -1.657 -14.936
Regime S 0.196 0.208 0.203 0.172 0.136 -0.113 0.092 -0.181 -1.134 -11.437
Full Sample 0.978 0.895 0.978 0.986 0.988 0.650 0.639 0.762 0.704 0.865
Realized returns in regime D, regime S and the full sample are computed as $\bar R_D = T_D^{-1}\sum_{t=1}^{T} d_{D,t}R_t$, $\bar R_S = T_S^{-1}\sum_{t=1}^{T} d_{S,t}R_t$ and $\bar R = T^{-1}\sum_{t=1}^{T} R_t$, respectively. Predicted returns are computed as: $X_D\widehat\Gamma_D$ in regime D; $X_S\widehat\Gamma_S$ in regime S; $X\widehat\Gamma$ in the unconditional model; $\left[(T_D/T)X_D\widehat\Gamma_D + (T_S/T)X_S\widehat\Gamma_S\right]$ in the average model.
RMSPE is computed as: $\sqrt{\alpha_D'\alpha_D/N}$, where $\alpha_D = \bar R_D - X_D\widehat\Gamma_D$, in regime D when the conditional model D is used; $\sqrt{\alpha_{DU}'\alpha_{DU}/N}$, where $\alpha_{DU} = \bar R_D - X\widehat\Gamma$, in regime D when the unconditional model is used; $\sqrt{\alpha_S'\alpha_S/N}$, where $\alpha_S = \bar R_S - X_S\widehat\Gamma_S$, in regime S when the conditional model S is used; $\sqrt{\alpha_{SU}'\alpha_{SU}/N}$, where $\alpha_{SU} = \bar R_S - X\widehat\Gamma$, in regime S when the unconditional model is used; $\sqrt{\alpha_{DS}'\alpha_{DS}/N}$, where $\alpha_{DS} = \bar R - \left[(T_D/T)X_D\widehat\Gamma_D + (T_S/T)X_S\widehat\Gamma_S\right]$, in the whole sample when the average model is used; $\sqrt{\alpha'\alpha/N}$, where $\alpha = \bar R - X\widehat\Gamma$, in the whole sample when the unconditional model is used.
RMSPE/RMSR is computed as: $\sqrt{\alpha_D'\alpha_D}/\sqrt{\bar R_D'\bar R_D}$ in regime D when the conditional model D is used; $\sqrt{\alpha_{DU}'\alpha_{DU}}/\sqrt{\bar R'\bar R}$ in regime D when the unconditional model is used; $\sqrt{\alpha_S'\alpha_S}/\sqrt{\bar R_S'\bar R_S}$ in regime S when the conditional model S is used; $\sqrt{\alpha_{SU}'\alpha_{SU}}/\sqrt{\bar R'\bar R}$ in regime S when the unconditional model is used; $\sqrt{\alpha_{DS}'\alpha_{DS}}/\sqrt{\bar R'\bar R}$ in the whole sample when the average model is used; $\sqrt{\alpha'\alpha}/\sqrt{\bar R'\bar R}$ in the whole sample when the unconditional model is used.
$R^2$ is computed as: $1 - \left(\alpha_D'\alpha_D/\bar R_D'\bar R_D\right)\left[(N-1)/(N-\widehat P_D)\right]$ in regime D when the conditional model D is used; $1 - \left(\alpha_{DU}'\alpha_{DU}/\bar R'\bar R\right)\left[(N-1)/(N-\widehat P)\right]$ in regime D when the unconditional model is used; $1 - \left(\alpha_S'\alpha_S/\bar R_S'\bar R_S\right)\left[(N-1)/(N-\widehat P_S)\right]$ in regime S when the conditional model S is used; $1 - \left(\alpha_{SU}'\alpha_{SU}/\bar R'\bar R\right)\left[(N-1)/(N-\widehat P)\right]$ in regime S when the unconditional model is used; $1 - \left(\alpha_{DS}'\alpha_{DS}/\bar R'\bar R\right)\left[(N-1)/(N-\min\{\widehat P_D,\widehat P_S\})\right]$ in the whole sample when the average model is used; $1 - \left(\alpha'\alpha/\bar R'\bar R\right)\left[(N-1)/(N-\widehat P)\right]$ in the whole sample when the unconditional model is used.
Table 5: Estimated risk premia

Estimated risk premia (third-stage estimation)

Realized returns:   Full Sample               Regime D                  Regime S
Model:              Unconditional             Conditional D             Conditional S
                    Equity      Multi-asset   Equity      Multi-asset   Equity      Multi-asset
ln(Rm)              0.080       0.461        −1.793**    −5.124***      0.720***    0.702*
                    (0.254)     (0.401)      (0.720)     (1.767)        (0.220)     (0.392)
SMB                 0.241*      0.027        −1.464***   −3.021***      0.399***    0.397**
                    (0.144)     (0.152)      (0.428)     (0.792)        (0.152)     (0.186)
HML                 0.278**    −0.068        −0.444      −1.724         0.186      −0.256*
                    (0.135)     (0.111)      (0.351)     (1.557)        (0.126)     (0.153)
RMW                 0.091      −0.130        −0.092      −1.000*       −0.004      −0.315*
                    (0.098)     (0.117)      (0.246)     (0.547)        (0.088)     (0.179)
CMA                 0.172*     −0.083        −0.214      −0.998         0.073      −0.163*
                    (0.091)     (0.076)      (0.219)     (0.723)        (0.081)     (0.093)
Mom                 0.644***   −0.071        −0.469      −2.380         0.592***   −0.079
                    (0.162)     (0.152)      (0.544)     (2.183)        (0.198)     (0.205)
Liq                 0.002       0.003        −0.025      −0.086         0.002**     0.002
                    (0.001)     (0.002)      (0.016)     (0.054)        (0.001)     (0.002)
T10Y2Y              0.000      −0.003        −0.008       0.030         0.016       0.009
                    (0.011)     (0.024)      (0.106)     (0.225)        (0.015)     (0.034)

Estimated γ_{j,0} (second-stage estimation)

Intercept           0.467**    −0.150        −7.381***   −2.474***      0.488***    0.360
                    (0.190)     (0.285)      (0.435)     (0.919)        (0.175)     (0.334)

Confidence interval for π_D γ_{D,0} + π_S γ_{S,0}:   Equity (−0.375, 0.287);   Multi-asset (−0.490, 0.774)

N                   130         17            130         17             130         17
T                   666         273           45          21             621         252

The table contains the estimates of the risk premia for different datasets. We use the shorthand "Equity" to refer to the dataset employed by Farago and Tedongap (2018), and "Multi-asset" to refer to the dataset employed by Lettau, Maggiori, and Weber (2014) (after removing 18 portfolios corresponding to options).
In Panel A, we estimate the impact of the unobservable factors. In Panel B, we consider the risk premia associated with commonly employed factors, using the full three-pass methodology.
In each panel, we estimate the model in each separate regime, and in the corresponding (misspecified) linear model.
Figure 1: Predicted and expected returns

[Figure: scatter plots of realized returns (%) against predicted returns (%) for the conditional and unconditional models. Panels: equities in the downside regime, upside regime, and full sample; multi-asset in the downside regime, upside regime, and full sample.]