

    Econometrica, Vol. 72, No. 5 (September, 2004), 1519–1563

    WAVELET-BASED TESTING FOR SERIAL CORRELATION OF UNKNOWN FORM IN PANEL MODELS

    BY YONGMIAO HONG AND CHIHWA KAO¹

    Wavelet analysis is a new mathematical method developed as a unified field of science over the last decade or so. As a spatially adaptive analytic tool, wavelets are useful for capturing serial correlation where the spectrum has peaks or kinks, as can arise from persistent dependence, seasonality, and other kinds of periodicity. This paper proposes a new class of generally applicable wavelet-based tests for serial correlation of unknown form in the estimated residuals of a panel regression model, where error components can be one-way or two-way, individual and time effects can be fixed or random, and regressors may contain lagged dependent variables or deterministic/stochastic trending variables. Our tests are applicable to unbalanced heterogeneous panel data. They have a convenient null limit N(0, 1) distribution. No formulation of an alternative model is required, and our tests are consistent against serial correlation of unknown form even in the presence of substantial inhomogeneity in serial correlation across individuals. This is in contrast to existing serial correlation tests for panel models, which ignore inhomogeneity in serial correlation across individuals by assuming a common alternative, and thus have no power against the alternatives where the average of serial correlations among individuals is close to zero. We propose and justify a data-driven method to choose the smoothing parameter (the finest scale in wavelet spectral estimation), making the tests completely operational in practice. The data-driven finest scale automatically converges to zero under the null hypothesis of no serial correlation and diverges to infinity as the sample size increases under the alternative, ensuring the consistency of our tests. Simulation shows that our tests perform well in small and finite samples relative to some existing tests.

    KEYWORDS: Error component, hypothesis testing, serial correlation of unknown form, spectral peak, static and dynamic panel models, unbalanced panel data, wavelet.

    1. INTRODUCTION

    PANEL DATA HAVE BEEN WIDELY USED in economics and finance. They often provide insights not available in pure time-series or cross-sectional data (e.g., Baltagi (2002), Granger (1996), Hsiao (2003)). This paper proposes a new class of generally applicable wavelet-based consistent tests for serial correlation of unknown form in the errors of panel models. It is important to test serial correlation for panel models because existence of serial correlation will invalidate conventional tests such as t- and F-tests which use standard covariance estimators of parameter estimators, and will indicate model misspecification when regressors include lagged dependent variables. Moreover, the choice of estimation methods may depend on whether there exists serial correlation in the errors of panel models. When the errors are serially correlated, for example, the computation of MLE (e.g., Anderson and Hsiao (1982), Hsiao (2003), Binder, Hsiao, and Pesaran (1999)) and GMM (e.g., Blundell and Bond (1998)) could be complicated, and the feasible GLS estimator will be invalid or have to be modified substantially (e.g., Baltagi and Li (1991)). Some procedures, such as Breusch and Pagan's (1980) tests for random effects, also assume serial uncorrelatedness in the errors of panel models.

    ¹ We thank the co-editor and three referees for insightful comments that have led to significant improvement on a previous version. We also thank Badi Baltagi, Pierre Duchesne, Jiti Gao, Jerry Hausman, Cheng Hsiao, Hashem Pesaran, Jim Stock, and seminar participants at MIT-Harvard Econometrics Workshop, 2001 North American Summer Meeting of Econometric Society in Washington, DC, 2001 Far Eastern Meeting of Econometric Society in Kobe, Japan, Fifth ICSA International Conference in Hong Kong, and 10th International Conference on Panel Data Models in Berlin for helpful comments. This research is supported by National Science Foundation via Grant SES-0111769.

    There have been some tests for serial correlation in panel models. Bhargava, Franzini, and Narendranathan (1982) extend Durbin and Watson's (1951) test to static panel models. Breusch and Pagan (1980) propose an LM test for first-order serial correlation, assuming no random effects. Baltagi and Li (1995) propose a class of LM tests for first-order serial correlation, allowing random or fixed individual effects. Bera, Sosa-Escudero, and Yoon (2001) also propose a convenient OLS-based test for first-order serial dependence. Li and Hsiao (1998) propose tests for first-order and higher-order serial correlation in a semiparametric partially linear panel model.

    All existing tests for serial correlation in panel models assume a known form of a common alternative, e.g., an AR(1) or MA(1) model. These tests have optimal power when the assumed model is true, but they are not consistent against serial correlation of unknown form. It is useful to test serial correlation of unknown form because prior information about the alternative is usually not available in practice. This is particularly relevant for panel models because there may exist significant inhomogeneity in serial correlation across individuals (e.g., Choi (2002)). By assuming a common alternative model, existing tests ignore inhomogeneity in serial correlation across individuals, and so have little power against the alternatives where the average of serial correlations among individuals is close to zero. Moreover, as Granger and Newbold (1977, p. 92) pointed out, the first few lags of OLS residuals of a misspecified linear dynamic model often appear like a white noise, due to the very nature of OLS. It is therefore important to check serial correlation at higher order lags. Little effort has been made on evaluation of dynamic panel models (Granger (1996)).

    Wavelets are a new mathematical tool alternative to the Fourier transform. They can effectively capture nonsmooth features such as singularities and spatial inhomogeneity (e.g., Donoho and Johnstone (1994, 1995a, 1995b), Donoho et al. (1996), Gao (1997), Hong and Lee (2001), Jensen (2000), Neumann (1996), Lee and Hong (2001), Ramsey (1999), Wang (1995)). Many economic and financial time series have a spectrum with peaks and kinks, as can arise from persistent dependence, business cycles, seasonality, and other kinds of periodicity (e.g., Bizer and Durlauf (1992), Granger (1969), Watson (1993)). In this paper we use wavelets to test serial correlation in estimated residuals of panel models. Unlike existing tests, whose constructions are usually model-dependent, our tests are generally applicable. The panel model can be static or dynamic, and one-way or two-way; both balanced and unbalanced panel data are covered; individual and time effects can be fixed or random; regressors can contain lagged dependent variables or deterministic/stochastic trending variables; and no specific estimation method is required. Our tests have a convenient limit N(0, 1) distribution under the null hypothesis, no matter whether regressors contain lagged dependent variables or deterministic/stochastic trending variables. In contrast to Durbin and Watson's (1951) test and Box and Pierce's (1970) portmanteau test, parameter estimation uncertainty has no impact on the limit distribution of our test statistics when applied to dynamic panel models. We do not require an alternative model, and our tests are consistent against serial correlation of unknown form even in the presence of substantial inhomogeneity in serial correlation across individuals. No consistent test for serial correlation of unknown form was available for panel models.

    Our asymptotic theory considers a panel model with both large n and T, where n is the number of individuals and T is the number of time-series observations. Increasing effort has been devoted to the study of panel models with both large n and T, due to the growing use of cross-country data over time to study growth convergence, international R&D spillover and purchasing power parity, and to the growing use of firm- or portfolio-level financial time series. As is well known (e.g., Phillips and Moon (1999), Hahn and Kuersteiner (2002)), asymptotic analysis in panel models is much more involved than in pure time-series analysis, due to the need to handle double indices. As a distinct feature, we treat both n → ∞ and T → ∞ simultaneously, which complements Phillips and Moon (1999) and Hahn and Kuersteiner's (2002) joint limit theory for panel models. Our general theory does not require that the ratio n/T go to 0 or a constant. We also show that the use of the estimated residuals from a possibly nonstationary panel model rather than the unobservable errors has no impact on the null limit distribution of our test statistics. In addition, we find several interesting features not available in pure time-series analysis. Most remarkably, the limit N(0, 1) distribution of our test statistics is obtained without requiring the smoothing parameter (the finest scale in wavelet estimation) to grow with T. This not only leads to reasonable asymptotic approximation in finite samples, but also makes it possible to use data-driven methods that deliver a fixed finest scale under the null hypothesis of no serial correlation. This is in sharp contrast to Lee and Hong (2001), who, in testing serial correlation for observed raw time series data, require the finest scale to grow as T → ∞ under the null hypothesis. We further develop a data-driven method to choose the finest scale, making our tests completely operational in practice. The data-driven finest scale converges to 0 under the null and grows to ∞ under the alternative, ensuring consistency against serial correlation of unknown form. We also find that a heteroskedasticity-corrected test may be less powerful than a heteroskedasticity-consistent test. This differs from the well-known estimation result that heteroskedasticity-corrected estimators (e.g., feasible GLS) are more efficient than heteroskedasticity-consistent estimators (e.g., OLS). Our tests work reasonably well in small and finite samples often encountered in economics.

    We describe the panel model and hypotheses in Section 2, introduce wavelets and test statistics in Section 3, derive the limit distributions of these tests in Section 4, and establish their consistency in Section 5. Section 6 proposes a data-driven finest scale. Section 7 is a simulation study. Section 8 concludes. All proofs are in the Appendix. Throughout, $\|A\|$ denotes the Euclidean norm $[\mathrm{tr}(A'A)]^{1/2}$; $A^*$ and $\mathrm{Re}(A)$ the complex conjugate and the real part of $A$; $\mathbb{Z} \equiv \{0, \pm 1, \ldots\}$ and $\mathbb{Z}^+ \equiv \{0, 1, \ldots\}$ the set of integers and the set of nonnegative integers; and $c$ and $C$ generic bounded constants, with $0 < c < C$


    $\{\varepsilon_{it}\}$ in (2.1) is serially uncorrelated, among other things. The existence of serial correlation of any form, however, will generally invalidate the covariance estimator and related inference. In particular, conventional t- and F-tests will be misleading. Moreover, when the $X_{it}$ contain lagged dependent variables, serial correlation will further render inconsistent the within estimator $\hat\beta$ for $\beta$.

    We are interested in testing whether the error process $\{\varepsilon_{it}\}$ is serially correlated. The hypotheses of interest are $H_0 : \mathrm{cov}(\varepsilon_{it}, \varepsilon_{it-|h|}) = 0$ for all $h \neq 0$ and all $i$ vs. $H_A : \mathrm{cov}(\varepsilon_{it}, \varepsilon_{it-|h|}) \neq 0$ at least for some $h \neq 0$ and some $i$. The alternative $H_A$ allows some (but not all) individual series to be white noises. Prior information about the alternative is usually not available in practice, although there may exist substantial inhomogeneity in serial correlation across $i$.

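    To test $H_0$, we will examine serial correlation in the demeaned estimated residual

    $$\hat v_{it} \equiv \hat u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t} + \bar u \qquad (t = 1, \ldots, T_i;\ i = 1, \ldots, n), \tag{2.3}$$

    where $\hat u_{it} \equiv Y_{it} - X_{it}'\hat\beta$, $\bar u_{i\cdot} \equiv T_i^{-1}\sum_{t=1}^{T_i}\hat u_{it}$, $\bar u_{\cdot t} \equiv n^{-1}\sum_{i=1}^{n}\hat u_{it}$, $\bar u \equiv n^{-1}\sum_{i=1}^{n} T_i^{-1}\sum_{t=1}^{T_i}\hat u_{it}$, and $\hat\beta$ is an estimator consistent for $\beta$ under $H_0$. When $\hat\beta$ is the within estimator in (2.2), $\hat v_{it}$ is the well known within residual. However, we allow using other estimators that are consistent for $\beta$ under $H_0$ but not necessarily under $H_A$.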
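    The demeaning in (2.3) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `demean_two_way` and its arguments `u_hat`, `ids`, and `times` are hypothetical, the panel is supplied in long format, and the time mean divides by $n$ exactly as written in the text, which for periods not observed for every individual is a convention assumed here rather than something the paper spells out.

```python
import numpy as np

def demean_two_way(u_hat, ids, times):
    """Two-way demeaned residual v_hat = u_hat - ubar_i. - ubar_.t + ubar, cf. (2.3).

    u_hat : residuals in long format (one entry per observed (i, t) pair)
    ids   : individual label of each observation
    times : time label of each observation
    (all names are illustrative, not from the paper)
    """
    u_hat = np.asarray(u_hat, dtype=float)
    _, inv_i = np.unique(np.asarray(ids), return_inverse=True)
    _, inv_t = np.unique(np.asarray(times), return_inverse=True)
    n = inv_i.max() + 1

    T_i = np.bincount(inv_i)                           # observations per individual
    ubar_i = np.bincount(inv_i, weights=u_hat) / T_i   # T_i^{-1} sum_t u_it
    ubar_t = np.bincount(inv_t, weights=u_hat) / n     # n^{-1} sum_i u_it (assumed convention)
    ubar = ubar_i.mean()                               # n^{-1} sum_i T_i^{-1} sum_t u_it

    return u_hat - ubar_i[inv_i] - ubar_t[inv_t] + ubar
```

    In a balanced panel the output sums to zero over $t$ for each $i$ and over $i$ for each $t$, which gives a quick sanity check on any implementation.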

    Suppose $\hat\beta \overset{p}{\to} \beta$, as does the within estimator in (2.2) under $H_0$; then $\hat v_{it}$ will converge in probability to the true error $\varepsilon_{it}$. Under $H_A$, however, $\hat\beta$ may not be consistent for $\beta$ (as in a dynamic panel model), and $\hat v_{it}$ will converge in probability to the model error

    $$v_{it} \equiv \varepsilon_{it} + (\beta - \beta^*)'(X_{it} - E\bar X_{i\cdot} - E\bar X_{\cdot t} + E\bar X), \tag{2.4}$$

    which contains both the true error $\varepsilon_{it}$ and the misspecified component, where $\beta^* \equiv \mathrm{p\,lim}\,\hat\beta$. This does not invalidate our tests, but it affects the power of the tests in finite samples, because serial correlation in $\{v_{it}\}$ may differ from serial correlation in $\{\varepsilon_{it}\}$. However, our tests are still consistent against $H_A$, because $H_0$ holds if and only if $\{v_{it}\}$ is serially uncorrelated: When $H_0$ holds, $\{v_{it}\}$ coincides with $\{\varepsilon_{it}\}$ and so is serially uncorrelated; on the other hand, if $\{v_{it}\}$ is serially uncorrelated in a linear dynamic panel model, we can view $\{v_{it}\}$ as the true error for (2.1), and estimation and inference can be implemented in a standard fashion. (We note that in a linear dynamic panel setup, it is possible that $\{\varepsilon_{it}\}$ is serially correlated but $\{v_{it}\}$ is serially uncorrelated, due to $\beta^* \neq \beta$. This occurs when and only when $\varepsilon_{it}$ contains the misspecified linear component, $(\beta^* - \beta)'(X_{it} - E\bar X_{i\cdot} - E\bar X_{\cdot t} + E\bar X)$, plus a white noise. In this case, serial correlation in $\{\varepsilon_{it}\}$ is solely caused by the misspecified linear component, and it is actually more appropriate to view that (2.1) is a correctly specified linear dynamic panel model, but with $v_{it}$ as the true error and $\beta^*$ as the true model parameter. With such an interpretation, $H_0$ holds when $\{v_{it}\}$ is serially uncorrelated.)

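    Suppose $\{v_{it}\}$ has autocovariance function $R_i(h) \equiv E(v_{it}v_{it-|h|})$ and power spectrum

    $$f_i(\omega) \equiv (2\pi)^{-1}\sum_{h=-\infty}^{\infty} R_i(h)e^{-\mathrm{i}h\omega}, \qquad \omega \in [-\pi, \pi],\quad \mathrm{i} \equiv \sqrt{-1}. \tag{2.5}$$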

    Both $R_i(h)$ and $f_i(\omega)$ contain the same information on serial correlation of $\{v_{it}\}$. One can use $R_i(h)$ or $f_i(\omega)$ to test $H_0$ vs. $H_A$. All existing tests for serial correlation in panel models are based on $R_i(h)$, assuming a common model with some prespecified lags $h$ for all $i$ (e.g., AR(1) and MA(1)). We use $f_i(\omega)$ here, which is a natural tool to test serial correlation of unknown form, because it contains information on serial correlation at all lags. Under $H_0$, $f_i(\omega)$ becomes $f_{i0}(\omega) \equiv (2\pi)^{-1}R_i(0)$ for all $\omega \in [-\pi, \pi]$. Under $H_A$, $f_i(\omega) \neq (2\pi)^{-1}R_i(0)$ at least for some $i$. Thus, a consistent test for $H_0$ vs. $H_A$ can be formed by comparing consistent estimators of $f_i(\omega)$ and $f_{i0}(\omega)$. We will use wavelets to estimate $f_i(\omega)$, which are suitable for time series with spectral peaks and kinks. Of course, other nonparametric methods (e.g., kernel smoothing; see Hong (1996) and Section 7 below) could be used.
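    To make the contrast between $f_{i0}(\omega)$ and $f_i(\omega)$ concrete, the following sketch evaluates (2.5) for a white noise and for an MA(1) process. The function name `spectral_density` and the choice of an MA(1) with coefficient 0.5 are illustrative assumptions, and the autocovariances are supplied directly rather than estimated.

```python
import numpy as np

def spectral_density(R, omega):
    """Evaluate f(w) = (2*pi)^{-1} * sum_h R(h) * exp(-i*h*w), cf. (2.5).

    R     : autocovariances R(0), R(1), ..., R(H); R(-h) = R(h) and R(h) = 0 for |h| > H
    omega : frequencies in [-pi, pi]
    """
    R = np.asarray(R, dtype=float)
    omega = np.asarray(omega, dtype=float)
    h = np.arange(1, len(R))
    # For symmetric R, exp(-i*h*w) + exp(i*h*w) = 2*cos(h*w), so f is real.
    return (R[0] + 2.0 * np.cos(np.outer(omega, h)) @ R[1:]) / (2.0 * np.pi)

omega = np.linspace(-np.pi, np.pi, 5)

# Under H0 (white noise with R(0) = 1) the spectrum is flat at (2*pi)^{-1} R(0).
f_white = spectral_density([1.0], omega)

# Illustrative MA(1) with coefficient theta: R(0) = 1 + theta^2, R(1) = theta.
theta = 0.5
f_ma1 = spectral_density([1.0 + theta**2, theta], omega)
```

    The flat array `f_white` corresponds to $f_{i0}$, while `f_ma1` varies with $\omega$, which is the departure from flatness that a spectrum-based test looks for.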

    3. WAVELET METHOD

    3.1. Wavelets

    The essence of wavelet analysis is to expand a function as a sum of elementary functions called wavelets centered at a sequence of locations. These wavelets are derived from a single function $\psi(\cdot)$, called the mother wavelet, by translations and dilations. As a spatially adaptive analytic tool, wavelets are powerful in capturing singularities of nonsmooth functions, such as spectral peaks and kinks (e.g., Gao (1997), Neumann (1996)). We first impose a standard condition on $\psi(\cdot)$.

    ASSUMPTION 1: $\psi : \mathbb{R} \to \mathbb{R}$ is an orthonormal wavelet such that $\int_{-\infty}^{\infty}\psi(x)\,dx = 0$, $\int_{-\infty}^{\infty}|\psi(x)|\,dx$

    translation parameters. Intuitively, $j$ localizes analysis in frequency and $k$ localizes analysis in time or space.

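    Assumption 1 ensures that the Fourier transform of $\psi(\cdot)$,

    $$\hat\psi(z) \equiv (2\pi)^{-1/2}\int_{-\infty}^{\infty}\psi(x)e^{-\mathrm{i}zx}\,dx, \qquad z \in \mathbb{R}, \tag{3.2}$$

    exists and is continuous in $z$ almost everywhere. We impose a condition on $\hat\psi(\cdot)$.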

    ASSUMPTION 2: (a) $|\hat\psi(z)| \le C(1 + |z|)^{-\tau}$ for some $\tau > \tfrac{3}{2}$; (b) $\hat\psi(z) = e^{\mathrm{i}z/2}b(z)$ or $\hat\psi(z) = -\mathrm{i}e^{\mathrm{i}z/2}b(z)$, where $b(\cdot)$ is real-valued with $b(0) = 0$.

    Many wavelets satisfy this. One example is spline wavelets of positive order $m \in \mathbb{Z}^+$. For odd $m$, this family has $\hat\psi(z) = e^{\mathrm{i}z/2}b(z)$, where $b(\cdot)$ is real-valued and symmetric. For even $m$, it has $\hat\psi(z) = -\mathrm{i}e^{\mathrm{i}z/2}b(z)$, where $b(\cdot)$ is real-valued and odd (e.g., Hernández and Weiss (1996, (2.16), p. 161)). One member in this family is the first-order spline wavelet, called the Franklin wavelet, whose

    $$\hat\psi(z) = e^{\mathrm{i}z/2}(2\pi)^{-1/2}\,\frac{\sin^4(z/4)}{(z/4)^2}\left[\frac{P_3(z/4 + \pi/4)}{P_3(z/2)P_3(z/4)}\right]^{1/2}, \tag{3.3}$$

    where $P_3(z) \equiv \tfrac{2}{3} + \tfrac{1}{3}\cos(2z)$.

    Another member is the second-order spline wavelet, with

    $$\hat\psi(z) = -\mathrm{i}e^{\mathrm{i}z/2}(2\pi)^{-1/2}\,\frac{\sin^6(z/4)}{(z/4)^3}\left[\frac{P_5(z/4 + \pi/4)}{P_5(z/2)P_5(z/4)}\right]^{1/2}, \tag{3.4}$$

    where $P_5(z) \equiv \tfrac{1}{30}\cos^2(2z) + \tfrac{13}{30}\cos(2z) + \tfrac{8}{15}$.

    3.2. Wavelet Representation of Spectrum

    We now consider wavelet representation of spectral density $f_i(\cdot)$. Given an orthonormal wavelet basis $\{\psi_{jk}(\cdot)\}$ for $L_2(\mathbb{R})$, we define

    $$\Psi_{jk}(\omega) \equiv (2\pi)^{-1/2}\sum_{m=-\infty}^{\infty}\psi_{jk}\!\left(\frac{\omega}{2\pi} + m\right), \qquad \omega \in [-\pi, \pi]. \tag{3.5}$$

    This constitutes an orthonormal wavelet basis for $L_2[-\pi, \pi]$, the space of $2\pi$-periodic functions on $[-\pi, \pi]$. See, e.g., Daubechies (1992, Ch. 9) or Hernández and Weiss (1996, Ch. 4).

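    One can also compute $\Psi_{jk}(\cdot)$ from its Fourier transform $\hat\Psi_{jk}(h) \equiv (2\pi)^{-1/2}\int_{-\pi}^{\pi}\Psi_{jk}(\omega)e^{-\mathrm{i}h\omega}\,d\omega$ via the formula

    $$\Psi_{jk}(\omega) = (2\pi)^{-1/2}\sum_{h=-\infty}^{\infty}\hat\Psi_{jk}(h)e^{\mathrm{i}h\omega}. \tag{3.6}$$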

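    Lee and Hong (2001) show that the spectral density $f_i(\cdot)$ in (2.5) can be decomposed as

    $$f_i(\omega) = (2\pi)^{-1}\sigma_i^2 + \sum_{j=0}^{\infty}\sum_{k=1}^{2^j}\alpha_{ijk}\Psi_{jk}(\omega), \qquad \omega \in [-\pi, \pi], \tag{3.7}$$

    where the wavelet coefficient $\alpha_{ijk}$ is the orthogonal projection of $f_i(\cdot)$ on the base $\Psi_{jk}(\cdot)$; i.e.,

    $$\alpha_{ijk} \equiv \int_{-\pi}^{\pi} f_i(\omega)\Psi_{jk}(\omega)\,d\omega. \tag{3.8}$$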

    Unlike the Fourier transforms, $\alpha_{ijk}$ depends on the local behavior of $f_i(\cdot)$, because $\Psi_{jk}(\cdot)$ is effectively 0 outside an interval of width $2^{-j}$ centered at $k/2^j$. Such a spatial adaptation feature makes it powerful for capturing nonsmooth features. We can also express $\alpha_{ijk}$ in time domain; i.e.,

    $$\alpha_{ijk} = (2\pi)^{-1/2}\sum_{h=-\infty}^{\infty} R_i(h)\hat\Psi_{jk}^{*}(h). \tag{3.9}$$

    3.3. Wavelet Spectral Density Estimator

    Define the sample autocovariance function of $\{\hat v_{it}\}$:

    $$\hat R_i(h) \equiv T_i^{-1}\sum_{t=|h|+1}^{T_i}\hat v_{it}\hat v_{it-|h|} \qquad (h = 0, \pm 1, \ldots, \pm(T_i - 1)). \tag{3.10}$$
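    A direct transcription of (3.10) for a single individual might look as follows. The function name `sample_autocov` is hypothetical; note that, as in the formula, the divisor is $T_i$ rather than $T_i - |h|$, and no further mean is subtracted because the residuals are already demeaned.

```python
import numpy as np

def sample_autocov(v_hat, h):
    """Sample autocovariance R_hat(h) = T^{-1} * sum_{t=|h|+1}^{T} v_t * v_{t-|h|}, cf. (3.10).

    v_hat : demeaned residual series of one individual (length T)
    h     : lag, with |h| <= T - 1
    """
    v = np.asarray(v_hat, dtype=float)
    T = len(v)
    h = abs(int(h))            # R_hat(-h) = R_hat(h) by construction
    if h >= T:
        raise ValueError("lag must satisfy |h| <= T - 1")
    # Pair v_t (t = h+1..T) with v_{t-h} (t-h = 1..T-h) and divide by T.
    return float(np.dot(v[h:], v[:T - h]) / T)
```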

    Then a wavelet estimator of the spectral density $f_i(\cdot)$ can be given by

    $$\hat f_i(\omega) \equiv (2\pi)^{-1}\hat R_i(0) + \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}\Psi_{jk}(\omega), \qquad \omega \in [-\pi, \pi], \tag{3.11}$$

    where the empirical wavelet coefficient

    $$\hat\alpha_{ijk} \equiv (2\pi)^{-1/2}\sum_{h=1-T_i}^{T_i-1}\hat R_i(h)\hat\Psi_{jk}^{*}(h), \tag{3.12}$$

    and $J_i \equiv J_i(T_i)$ is the finest scale corresponding to the highest resolution level. Appropriate conditions on $J_i$ will be given. We allow a different $J_i$ for a different $i$. This is useful because the pattern of serial correlation may vary substantially across $i$. We will also propose a data-driven method to choose $J_i$. Note that (3.11) is a linear wavelet estimator. Nonlinear wavelet estimators of Donoho et al. (1996) which are popular in curve estimation could be used. However, under our regularity conditions (see subsequent sections) which are not stronger than standard assumptions in time series panel econometrics, both linear and nonlinear wavelet estimators have the same convergence rate. Nonlinear wavelet estimators have no advantage at least in large samples. Masry (1994, 1997) also finds that the gain from using nonlinear wavelet estimators rather than linear wavelet estimators is marginal for different models. Moreover, the use of nonlinear wavelet estimators would lead to a much more complicated analysis in theory.

    3.4. Wavelet-Based Tests

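    Put $Q(f_1, f_2) \equiv \int_{-\pi}^{\pi}[f_1(\omega) - f_2(\omega)]^2\,d\omega$ for any $f_1(\cdot)$ and $f_2(\cdot)$. We use the quadratic form

    $$Q(\hat f_i, \hat f_{i0}) = \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2, \tag{3.13}$$

    where $\hat f_{i0}(\omega) \equiv (2\pi)^{-1}\hat R_i(0)$ and the equality follows by Parseval's identity. Our first test statistic is

    $$\hat W_1 \equiv \left(\sum_{i=1}^{n} 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2 - \hat M\right)\Big/\hat V^{1/2}, \tag{3.14}$$

    where

    $$\hat M \equiv \sum_{i=1}^{n}\hat R_i^2(0)M_{i0}, \qquad \hat V \equiv \sum_{i=1}^{n}\hat R_i^4(0)V_{i0},$$

    $$M_{i0} \equiv \sum_{h=1}^{T_i-1}(1 - h/T_i)\,b_{J_i}(h, h),$$

    $$V_{i0} \equiv 4\sum_{h=1}^{T_i}\sum_{m=1}^{T_i}(1 - h/T_i)(1 - m/T_i)\,b_{J_i}^2(h, m),$$

    and

    $$b_J(h, m) \equiv 2\,\mathrm{Re}[a_J(h, m) + a_J(h, -m)],$$

    $$a_J(h, m) \equiv \sum_{j=0}^{J}\sum_{k=1}^{2^j}\hat\Psi_{jk}(h)\hat\Psi_{jk}^{*}(m),$$

    and $\hat\Psi_{jk}(\cdot)$ is as in (3.6).

    Put $\hat a_{ijk} \equiv (2\pi)^{-1/2}\sum_{h=1-T_i}^{T_i-1}\hat\rho_i(h)\hat\Psi_{jk}(h)$, where $\hat\rho_i(h) \equiv \hat R_i(h)/\hat R_i(0)$. Our second test statistic is

    $$\hat W_2 \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat a_{ijk}^2 - M_{i0}\right)\Big/V_{i0}^{1/2}. \tag{3.15}$$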

    Intuitively, $\hat W_2$ can be viewed as a heteroskedasticity-corrected test while $\hat W_1$ is a heteroskedasticity-consistent test, where heteroskedasticity arises from different variances $\sigma_i^2$ and finest scales $J_i$. In $\hat W_2$, these two forms of heteroskedasticity are corrected first for each $i$. As is shown below, $\hat W_1$ and $\hat W_2$ are asymptotically $N(0, 1)$ under $H_0$, but their power properties generally differ. The heteroskedasticity-consistent test $\hat W_1$ may be more powerful than the heteroskedasticity-corrected test $\hat W_2$. This differs from the well-known estimation result that correcting heteroskedasticity leads to more efficient estimation (e.g., the feasible GLS is more efficient than OLS).

    Both $\hat W_1$ and $\hat W_2$ apply to one-way or two-way error component models. For one-way component models, however, one can use $\hat v_{it} \equiv \hat u_{it} - \bar u_{i\cdot}$ if one knows $\lambda_t = 0$, and use $\hat v_{it} \equiv \hat u_{it} - \bar u_{\cdot t}$ if one knows $\mu_i = 0$. The limit distribution of the test statistics is unchanged.
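    Once the per-individual ingredients are available, the aggregation in (3.14) and (3.15) is mechanical, and a sketch may help fix ideas. Everything named here is an assumption for illustration: `panel_wavelet_stats` is a hypothetical helper, `S[i]` stands for $2\pi T_i\sum_j\sum_k\hat\alpha_{ijk}^2$, and `M0`, `V0` hold $M_{i0}$, $V_{i0}$, all of which are taken as precomputed. The sketch also uses the fact that $\hat\rho_i(h) = \hat R_i(h)/\hat R_i(0)$ implies $\hat a_{ijk}^2 = \hat\alpha_{ijk}^2/\hat R_i^2(0)$.

```python
import numpy as np

def panel_wavelet_stats(S, R0, M0, V0):
    """Aggregate per-individual quantities into W1_hat (3.14) and W2_hat (3.15).

    S  : S[i]  = 2*pi*T_i * sum_{j,k} alpha_hat_{ijk}^2   (assumed precomputed)
    R0 : R0[i] = R_hat_i(0), the sample variance of individual i
    M0 : M0[i] = M_{i0}, centering constant
    V0 : V0[i] = V_{i0}, scaling constant
    """
    S, R0, M0, V0 = (np.asarray(a, dtype=float) for a in (S, R0, M0, V0))
    n = len(S)

    # W1: pool first, then center by M_hat and scale by V_hat^{1/2}.
    M_hat = np.sum(R0**2 * M0)
    V_hat = np.sum(R0**4 * V0)
    W1 = (np.sum(S) - M_hat) / np.sqrt(V_hat)

    # W2: standardize each individual first, then average with 1/sqrt(n).
    # a_hat_{ijk} = alpha_hat_{ijk} / R_hat_i(0), so the normalized form is S / R0^2.
    W2 = np.sum((S / R0**2 - M0) / np.sqrt(V0)) / np.sqrt(n)
    return W1, W2
```

    When all individuals share the same variance and the same constants, the two statistics coincide; they differ only through how heterogeneity across $i$ is weighted.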

    4. ASYMPTOTIC DISTRIBUTION

    We now impose a set of unified regularity conditions that hold under both $H_0$ and $H_A$.

    ASSUMPTION 3: $\sqrt{nT}(\hat\beta - \beta^*) = O_P(1)$, where $\beta^* = \beta$ under $H_0$.

    ASSUMPTION 4: (a) For each $i$, $\{v_{it}\}$ is covariance-stationary with $E(v_{it}) = 0$, $E(v_{it}^2) = \sigma_i^2 \in [c, C]$, and $E(v_{it}^8) \in [c, C]$; (b) the individual and time effects, $\mu_i$ and $\lambda_t$, can be stochastic (random effects) or deterministic (fixed effects).

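    ASSUMPTION 5: Put $\tilde\Gamma_{ixv}(h) \equiv T_i^{-1}\sum_{t=h+1}^{T_i}\tilde X_{it}\tilde v_{it-h}$ if $h \ge 0$ and $\tilde\Gamma_{ixv}(h) \equiv \tilde\Gamma_{ixv}(-h)'$ if $h < 0$, $\Gamma_{ixv}(h) \equiv \mathrm{p\,lim}\,\tilde\Gamma_{ixv}(h)$, where $\tilde X_{it} \equiv X_{it} - \bar X_{i\cdot} - \bar X_{\cdot t} + \bar X$ and $\tilde v_{it} \equiv v_{it} - \bar v_{i\cdot} - \bar v_{\cdot t} + \bar v$. Then (a) $\sup_{1 \le i \le n} T_i^{-1}\sum_{t=1}^{T_i} E\|\tilde X_{it}\|^4 \le C$; (b) sup1≤i≤n sup1≤h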

    Most remarkably, we permit but do not require $J_i \to \infty$ for any $i$; all $J_i$ can be fixed as $n, T \to \infty$ under $H_0$. This is in sharp contrast to Lee and Hong (2001), who require $J \to \infty$ as $T \to \infty$ to achieve asymptotic normality in testing serial correlation for observed raw time-series data. The reason that all $J_i$ can be fixed is that the additional smoothing provided by $n$ ensures asymptotic normality of $\hat W_1$ and $\hat W_2$. Intuitively, $\hat W_1$ and $\hat W_2$ are sums of approximately independent random variables $\{2\pi T_i Q(\hat f_i, \hat f_{i0})\}_{i=1}^{n}$. By the central limit theorem, they will converge to a normal distribution as $n \to \infty$. This occurs no matter whether $J_i \to \infty$. In the time-series or cross-sectional literature it is often found that the normal approximation is inadequate for the finite sample distributions of quadratic forms involving smoothed nonparametric estimation. This is because the asymptotic normality of these quadratic forms requires the smoothing parameter to grow or vanish at a suitable rate as the sample size grows and the convergence rate of test statistics delicately depends on the smoothing parameter. The fact that the asymptotic normality of $\hat W_1$ and $\hat W_2$ does not depend on whether $J_i \to \infty$ suggests that asymptotic approximation may work well in the panel context. Indeed, our simulation shows that $\hat W_1$ and $\hat W_2$ perform well in finite samples even when $J_i = 0$ for all $i$. Most importantly, the fact that $J_i$ may be fixed for all $i$ allows use of data-driven methods that may deliver a fixed finest scale under $H_0$. Sensible data-driven methods have this feature because the optimal finest scale $J_0 = 0$ under $H_0$. We will propose a plug-in method to select a finest scale, which automatically converges to 0 under $H_0$ and grows to $\infty$ under $H_A$, thus ensuring consistency against serial correlation of unknown form. Such a data-driven method could not be used for Lee and Hong's (2001) test.

    Although we require that both $n$ and $T$ grow to $\infty$, we do not impose a restrictive relative speed limit between them. Also, no specific estimation method is required. From the proof of Theorem 1, we find that parameter estimation uncertainty for $\beta$ has no impact on the null limit distribution of $\hat W_1$ and $\hat W_2$, no matter whether $X_{it}$ contains lagged dependent variables or deterministic/stochastic trending variables. This is in contrast to Durbin and Watson (1951) and Box and Pierce (1970), whose test statistics or limit distributions have to be modified when applied to estimated residuals of a stationary dynamic model. If regressors contain deterministic or stochastic trending variables, the limit distributions of these tests will become nonstandard (e.g., Kao and Chiang (2000), Kao and Emerson (2004)). Intuitively, parameter estimation uncertainty for $\beta$ induces an adjustment of a finite number of degrees of freedom for $\hat W_1$ and $\hat W_2$, but this becomes negligible as $n \to \infty$.

    The tests $\hat W_1$ and $\hat W_2$ are applicable for both small and large $J_i$. When (and only when) $J_i \to \infty$ for all $i = 1, \ldots, n$, we can use the following simplified versions of test statistics:

    $$\tilde W_1 = \frac{\sum_{i=1}^{n}\left[2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2 - \hat R_i^2(0)(2^{J_i+1} - 1)\right]}{2\left[\sum_{i=1}^{n}\hat R_i^4(0)(2^{J_i+1} - 1)\right]^{1/2}}, \tag{4.3}$$

    $$\tilde W_2 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat a_{ijk}^2 - (2^{J_i+1} - 1)}{2(2^{J_i+1} - 1)^{1/2}}\right]. \tag{4.4}$$

    These are the generalizations of Lee and Hong's (2001) test to estimated residuals of panel models. Theorem 2 shows that they are asymptotically $N(0, 1)$ under $H_0$ if $J_i \to \infty$ for all $i$.

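    THEOREM 2: Suppose Assumptions 1–5 hold, $2^{J_i+1} = a_i T_i^{\nu}$ for $a_i \in [c, C]$ and $\nu \in (0, \tfrac{1}{2})$, $n/T^{\nu}\log_2^2 T \to 0$, $n/T^{2(2\tau-1)-2(2\tau-1/2)\nu} \to 0$ as $n, T \to \infty$, where $\tau \ge \tfrac{3}{2}$ is as in Assumption 2. If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\tilde W_1 - \hat W_1 \overset{p}{\to} 0$, $\tilde W_2 - \hat W_2 \overset{p}{\to} 0$, $\tilde W_1 \overset{d}{\to} N(0, 1)$, and $\tilde W_2 \overset{d}{\to} N(0, 1)$.

    Thus, for large (and only large) $J_i$, $\tilde W_1$ and $\tilde W_2$ are asymptotically equivalent to $\hat W_1$ and $\hat W_2$ respectively. Note that here, $n$ cannot grow faster than $T^{\nu}$, where $\nu < \tfrac{1}{2}$.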
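    The simplified statistics (4.3) and (4.4) replace $M_{i0}$ and $V_{i0}$ by $2^{J_i+1} - 1$ and $4(2^{J_i+1} - 1)$, which a short sketch can make explicit. As before, the helper name `simplified_stats` and the input `S[i]` $= 2\pi T_i\sum_j\sum_k\hat\alpha_{ijk}^2$ are assumptions for illustration, and $\hat a_{ijk}^2 = \hat\alpha_{ijk}^2/\hat R_i^2(0)$ is used to obtain the normalized quadratic form.

```python
import numpy as np

def simplified_stats(S, R0, J):
    """Simplified statistics W1_tilde (4.3) and W2_tilde (4.4) for large finest scales.

    S  : S[i]  = 2*pi*T_i * sum_{j,k} alpha_hat_{ijk}^2   (assumed precomputed)
    R0 : R0[i] = R_hat_i(0)
    J  : J[i]  = finest scale J_i (nonnegative integers)
    """
    S = np.asarray(S, dtype=float)
    R0 = np.asarray(R0, dtype=float)
    J = np.asarray(J)
    n = len(S)

    D = 2.0 ** (J + 1) - 1.0   # 2^{J_i+1} - 1, the number of wavelet coefficients used

    W1 = np.sum(S - R0**2 * D) / (2.0 * np.sqrt(np.sum(R0**4 * D)))
    W2 = np.sum((S / R0**2 - D) / (2.0 * np.sqrt(D))) / np.sqrt(n)
    return W1, W2
```

    Since Theorem 2 justifies these forms only as $J_i \to \infty$, the sketch is not a substitute for $\hat W_1$ and $\hat W_2$ when the finest scales are small.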

    5. CONSISTENCY

    We now show that $\hat W_1$ and $\hat W_2$ are consistent against $H_A$. We assume the following condition.

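    ASSUMPTION 6: For each $i$, $\{v_{it}\}$ is a fourth-order zero-mean stationary process with $\sum_{h=-\infty}^{\infty} R_i^2(h) \le C$ and $\sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}|\kappa_i(j, k, l)| \le C$, where $\kappa_i(j, k, l)$ is the fourth-order cumulant of the joint distribution of $\{v_{it}, v_{it+j}, v_{it+k}, v_{it+l}\}$.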

    Assumption 6 characterizes temporal dependence of $\{v_{it}\}$. When $\{v_{it}\}$ is Gaussian, the cumulant condition holds trivially because $\kappa_i(j, k, l) = 0$ for all $j, k, l \in \mathbb{Z}$. If for each $i$, $\{v_{it}\}$ is a fourth-order stationary linear process with absolutely summable coefficients and i.i.d. innovations whose fourth order moment exists, the cumulant condition also holds (e.g., Hannan (1970, p. 211)). More primitive conditions (e.g., strong mixing) could be imposed, but such primitive conditions would rule out long memory processes. Assumption 6 allows long memory processes $I(d)$ with $d < \tfrac{1}{4}$ for $\{v_{it}\}$.

    THEOREM 3: Put $n_A \equiv \#(N_A)$ and $c_i \equiv T_i/T$, where $N_A \equiv \{i : 1 \le i \le n,\ Q(f_i, f_{i0}) > 0\}$. Suppose Assumptions 1–6 hold, $(n_A T)^{-1}\sum_{i=1}^{n} 2^{J_i} \to 0$, and $J_i \to \infty$ for all $i = 1, \ldots, n$ as $n, T \to \infty$. Then (a) $(n_A T)^{-1}\hat V^{1/2}\hat W_1 - n_A^{-1}\sum_{i \in N_A} 2\pi c_i Q(f_i, f_{i0}) \overset{p}{\to} 0$; (b) if in addition $2^{J_i+1} = a_i T_i^{\nu}$ for all $i$, where $a_i \in [c, C]$ and $\nu \in (0, 1)$, then $(n_A T^{1-\nu/2})^{-1}\hat W_2 - n_A^{-1}\sum_{i \in N_A}\pi(c_i/a_i)^{1/2}Q(f_i, f_{i0}) \overset{p}{\to} 0$.

    As discussed earlier, although serial correlation in $\{v_{it}\}$ may differ from serial correlation in $\{\varepsilon_{it}\}$, $H_0$ holds if and only if $\{v_{it}\}$ is serially uncorrelated. Consequently, the index set $N_A$ is nonempty under $H_A$, at least for large $n$. It follows that $n_A^{-1}\sum_{i \in N_A} c_i Q(f_i, f_{i0}) \ge c$ for large $n$. Then $P[\hat W_1 > C(n, T)] \to 1$ and $P[\hat W_2 > C(n, T)] \to 1$ under $H_A$ for any sequence of constants $\{C(n, T) = o[n_A T/(\sum_{i=1}^{n} 2^{J_i})^{1/2}]\}$. Thus, $\hat W_1$ and $\hat W_2$ are consistent against $H_A$ provided $(n_A T)^{-1}\sum_{i=1}^{n} 2^{J_i} \to 0$ and $J_i \to \infty$ for all $i$. For simplicity, we let $J_i \to \infty$ for all $i$ to ensure consistency against $H_A$. This differs from the case under $H_0$, where $J_i$ can be fixed for all $i$. Our data-driven method below will deliver a finest scale that automatically converges to 0 under $H_0$ but grows to $\infty$ with $T$ under $H_A$.

Under $H_A$, $\hat W_1$ and $\hat W_2$ diverge to $\infty$ at the rate of $n_A T/(\sum_{i=1}^{n} 2^{J_i})^{1/2}$. Thus, the larger the set $N_A$ is, the more powerful $\hat W_1$ and $\hat W_2$ are. In fact, the power depends on $n_A/n$, the proportion of individuals with serial correlation. For $2^{J_i+1} = a_i T_i^{\nu}$, $n_A T/(\sum_{i=1}^{n} 2^{J_i})^{1/2} \propto (n_A/n)\, n^{1/2} T^{1-\nu/2}$. This implies that $\hat W_1$ and $\hat W_2$ have asymptotic power 1 against $H_A$ even if the proportion $n_A/n \to 0$ at a rate slightly slower than $n^{1/2} T^{1-\nu/2}$. For $\hat W_1$ and $\hat W_2$, serial correlations from different individuals never cancel each other out when some individuals have positive autocorrelations and some have negative autocorrelations. In contrast, cancellation may occur at least in part for existing tests, leading to low or little power; see Section 7 for more discussion. We emphasize that the ability of our tests to detect serial correlation in the presence of substantial inhomogeneity in serial correlation across $i$ is not due to the use of wavelets, but to the use of the quadratic form in (3.13). On the other hand, we may extend the adaptive procedures of Fan (1996) and Spokoiny (1996) to further improve the power of our tests, as one referee pointed out. We leave this for future research.

Theorems 1 and 3 imply that for large $n$ and $T$, negative values of $\hat W_1$ and $\hat W_2$ can occur only under $H_0$. Thus, upper-tailed $N(0,1)$ critical values should be used.

As noted earlier, $\hat W_1$ and $\hat W_2$ are heteroskedasticity-consistent and heteroskedasticity-corrected tests respectively. An interesting question is, which test, $\hat W_1$ or $\hat W_2$, is more powerful? For convenience, we assume $2^{J_i+1} = a_i T_i^{\nu}$ for all $i$, where $a_i \in [c, C]$ and $\nu \in (0,1)$, and assume a larger $a_i$ for processes with stronger serial correlation in terms of a larger $Q(f_i, f_{i0})$. With this rule, we have the following theorem.

THEOREM 4: Suppose Assumptions 1–6 hold, $n = \gamma T^{\varsigma}$ for $\gamma \in (0,\infty)$ and $\varsigma \in (0,\infty)$, and $2^{J_i+1} = a_i T_i^{\nu}$ for $a_i \in [c, C]$ and $\nu \in (0,1)$. If $a_i$ is monotonically increasing in $Q(f_i, f_{i0})$ and $T_i = T$ for all $i$, then $\hat W_1$ is more efficient than $\hat W_2$ in terms of Bahadur's asymptotic efficiency criterion.

Bahadur's (1960) asymptotic slope criterion is pertinent for power comparison of large sample tests under fixed alternatives. The basic idea is to compare the logarithms of the asymptotic significance levels (i.e., p-values) of the tests under a fixed alternative. Bahadur's asymptotic efficiency is defined as the limit ratio of the sample sizes required by the two tests under comparison to achieve the same asymptotic significance level (p-value) under a fixed alternative.

Theorem 4 implies that for hypothesis testing, correcting heteroskedasticity may give poorer power. This is in contrast to the well-known result that correcting heteroskedasticity leads to more efficient estimation. Intuitively, for $\hat W_2$, a larger $Q(f_i, f_{i0})$ is more heavily discounted by $\sqrt{V_{i0}} \sim 2(2^{J_i+1} - 1)^{1/2}$ when $J_i$ is larger. Thus, it is less powerful than $\hat W_1$, which puts uniform weighting on each $Q(f_i, f_{i0})$. Of course it is possible that $\hat W_1$ is asymptotically less powerful than $\hat W_2$, as will occur when $a_i$ is monotonically decreasing in $Q(f_i, f_{i0})$. However, sensible data-driven methods usually provide a rule that $a_i$ is increasing in $Q(f_i, f_{i0})$. When $J_i = J$ for all $i$, $\hat W_1$ and $\hat W_2$ may still not be asymptotically equally efficient, because of heteroskedasticity ($\sigma_i^2 \ne \sigma^2$). We note that the asymptotic power of $\hat W_1$ and $\hat W_2$ does not depend on the mother wavelet $\psi(\cdot)$. All wavelets are asymptotically equally efficient by Bahadur's criterion. This differs from the kernel method, where the choice of a kernel affects the asymptotic power of tests (Hong (1996)).

    6. ADAPTIVE CHOICE OF FINEST SCALE

Theorem 1 implies that the choice of $J_i$ is not important for asymptotic normality of $\hat W_1$ and $\hat W_2$. Both small and large $J_i$ can be used. However, the choice of $J_i$ may have significant impact on the power. Therefore, it is desirable to choose $J_i$ via suitable data-driven methods. We now develop a data-driven method to select a finest scale. We first justify the use of a data-driven finest scale $\hat J$. For simplicity, we consider a common $\hat J$ for all $i$ here. We use $\hat W_c(\hat J)$ and $\hat W_c(J)$ to denote the $\hat W_c$ tests using $\hat J$ and $J$ respectively, where $c = 1, 2$.

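We impose a condition on the smoothness of $\hat\psi(\cdot)$ at 0.

ASSUMPTION 7: $|\hat\psi(z)| \le C|z|^{q}$ for some $q \in (0,\infty)$.

THEOREM 5: Suppose Assumptions 1–5 and 7 hold, and $\hat J$ is a data-driven finest scale with $2^{\hat J}/2^{J} = 1 + o_P(2^{-J/2})$, where $J$ is a nonstochastic finest scale such that $2^{2J}/(n^2 + T) \to 0$ as $n \to \infty$ and $T \to \infty$. If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\hat W_1(\hat J) - \hat W_1(J) \xrightarrow{p} 0$, $\hat W_2(\hat J) - \hat W_2(J) \xrightarrow{p} 0$, $\hat W_1(\hat J) \xrightarrow{d} N(0,1)$, and $\hat W_2(\hat J) \xrightarrow{d} N(0,1)$.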


Thus, the use of $\hat J$ rather than $J$ has no impact on the limit distribution of $\hat W_1(\hat J)$ and $\hat W_2(\hat J)$ provided that $\hat J$ converges to $J$ at a suitable rate. The rate condition $2^{\hat J}/2^{J} - 1 = o_P(2^{-J/2})$ is mild. If $2^{J} \propto T^{1/5}$, for example, we require $2^{\hat J}/2^{J} = 1 + o_P(T^{-1/10})$. If $J$ is fixed (e.g., $J = 0$), which occurs under $H_0$ for our data-driven method below, the condition becomes $2^{\hat J}/2^{J} \xrightarrow{p} 1$.

So far very few data-driven methods to choose $J$ are available in the literature. Walter (1994) and Hall and Patil (1996) consider some data-driven methods in related but different contexts. They cannot be applied directly to our tests. We now develop a data-driven $\hat J$ that can satisfy the condition of Theorem 5. We first derive the average asymptotic IMSE formula for $\{\hat f_i(\cdot)\}_{i=1}^{n}$, which was not available in the literature. We impose an additional condition on $\{v_{it}\}$.

ASSUMPTION 8: $\sum_{h=-\infty}^{\infty} |h|^{q} |R_i(h)| \le C$ for all $i$, where $q \in [1,\infty)$ is as in Assumption 7.

This characterizes the smoothness of $f_i(\cdot)$. It rules out long memory processes. Under Assumption 8, we have a well-defined $q$th order generalized spectral derivative of $f_i(\omega)$:

$$f_i^{(q)}(\omega) \equiv (2\pi)^{-1}\sum_{h=-\infty}^{\infty} |h|^{q} R_i(h) e^{-ih\omega}, \qquad \omega \in [-\pi, \pi]. \tag{6.1}$$

We also define a measure $\lambda_q \in (0,\infty)$ of the smoothness of $\hat\psi(\cdot)$ at 0:

$$\lambda_q \equiv \frac{(2\pi)^{2q+1}}{1 - 2^{-2q}} \lim_{z\to 0} \frac{|\hat\psi(z)|^{2}}{|z|^{2q}}. \tag{6.2}$$

For the Franklin wavelet (3.3), $q = 2$; for the second-order spline wavelet (3.4), $q = 3$.

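To state the next result, we define a pseudo spectral density estimator $\bar f_i(\cdot)$ for $f_i(\cdot)$ that is based on the unobservable series $\{v_{it}\}_{t=1}^{T_i}$; namely,

$$\bar f_i(\omega) \equiv (2\pi)^{-1}\bar R_i(0) + \sum_{j=1}^{J_i}\sum_{k=1}^{2^{j}} \bar\alpha_{ijk}\Psi_{jk}(\omega), \qquad \omega \in [-\pi, \pi], \tag{6.3}$$

where $\bar R_i(h) \equiv T_i^{-1}\sum_{t=h+1}^{T_i} v_{it} v_{it-|h|}$ and $\bar\alpha_{ijk} \equiv (2\pi)^{-1/2}\sum_{h=1-T_i}^{T_i-1} \bar R_i(h)\hat\Psi_{jk}^{*}(h)$.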

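THEOREM 6: Suppose Assumptions 1–8 hold, $\lambda_q \in (0,\infty)$, $J_i \to \infty$, $2^{J_i}/T_i \to 0$ as $T_i \to \infty$. Then

(a) for each $i$, $Q(\hat f_i, f_i) = Q(\bar f_i, f_i) + o_P(2^{J_i}/T_i + 2^{-2qJ_i})$, and

$$E\,Q(\bar f_i, f_i) = \frac{2^{J_i+1}}{T_i}\int_{-\pi}^{\pi} f_i^{2}(\omega)\,d\omega + 2^{-2q(J_i+1)}\lambda_q \int_{-\pi}^{\pi}\big[f_i^{(q)}(\omega)\big]^{2}d\omega + o(2^{J_i}/T_i + 2^{-2qJ_i}).$$

(b) If in addition $J_i = J$ for all $i$ and $T_i/T = c_i$, then $n^{-1}\sum_{i=1}^{n} Q(\hat f_i, f_i) = n^{-1}\sum_{i=1}^{n} Q(\bar f_i, f_i) + o_P(2^{J}/T + 2^{-2qJ})$, and

$$n^{-1}\sum_{i=1}^{n} E\,Q(\bar f_i, f_i) = \frac{2^{J+1}}{T}\, n^{-1}\sum_{i=1}^{n} c_i^{-1}\int_{-\pi}^{\pi} f_i^{2}(\omega)\,d\omega + 2^{-2q(J+1)}\lambda_q\, n^{-1}\sum_{i=1}^{n}\int_{-\pi}^{\pi}\big[f_i^{(q)}(\omega)\big]^{2}d\omega + o(2^{J}/T + 2^{-2qJ}).$$

Theorem 6(a) gives the asymptotic IMSE of $\bar f_i(\cdot)$, and Theorem 6(b) gives the average asymptotic IMSE of $\{\bar f_i(\cdot)\}_{i=1}^{n}$. They imply that the optimal convergence rates of $Q(\hat f_i, f_i)$ and $n^{-1}\sum_{i=1}^{n} Q(\hat f_i, f_i)$ are the same as those of $Q(\bar f_i, f_i)$ and $n^{-1}\sum_{i=1}^{n} Q(\bar f_i, f_i)$ respectively. Parameter estimation uncertainty in $\hat\beta$ has no impact on the optimal convergence rates of $Q(\hat f_i, f_i)$ and $n^{-1}\sum_{i=1}^{n} Q(\hat f_i, f_i)$.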
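The IMSE-minimizing finest scale can be traced to a first-order condition on the two leading terms of Theorem 6(b). The sketch below treats $x \equiv 2^{J+1}$ as a continuous variable and drops the $o(\cdot)$ remainder; the shorthand $A_n$, $B_n$, and $\mathrm{AIMSE}$ is introduced here only for illustration and is not the paper's notation.

```latex
\begin{aligned}
\mathrm{AIMSE}(x) &= \frac{x}{T}\,A_n + x^{-2q}\,\lambda_q B_n,
\qquad
A_n \equiv n^{-1}\sum_{i=1}^{n} c_i^{-1}\int_{-\pi}^{\pi} f_i^{2}(\omega)\,d\omega,
\quad
B_n \equiv n^{-1}\sum_{i=1}^{n}\int_{-\pi}^{\pi}\big[f_i^{(q)}(\omega)\big]^{2}d\omega, \\
\frac{d}{dx}\mathrm{AIMSE}(x) &= \frac{A_n}{T} - 2q\,\lambda_q B_n\, x^{-(2q+1)} = 0
\;\Longrightarrow\;
x^{2q+1} = 2q\,\lambda_q\,\frac{B_n}{A_n}\,T, \\
\frac{d^{2}}{dx^{2}}\mathrm{AIMSE}(x) &= 2q(2q+1)\,\lambda_q B_n\, x^{-(2q+2)} > 0
\quad\text{whenever } B_n > 0 .
\end{aligned}
```

The first term is the variance contribution, which grows with the scale, and the second is the squared-bias contribution, which shrinks with it; the ratio $B_n/A_n$ is what governs the balance point.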

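The optimal finest scale $J_0$ that minimizes the average asymptotic IMSE of $\{\bar f_i(\cdot)\}_{i=1}^{n}$ is

$$2^{J_0+1} = [2q\lambda_q\xi_0(q)T]^{1/(2q+1)}, \tag{6.4}$$

where $\xi_0(q) \equiv \sum_{i=1}^{n}\int_{-\pi}^{\pi}[f_i^{(q)}(\omega)]^{2}d\omega \big/ \sum_{i=1}^{n} c_i^{-1}\int_{-\pi}^{\pi} f_i^{2}(\omega)\,d\omega$. This is infeasible because $\xi_0(q)$ is unknown under $H_A$. However, we can use some estimator $\hat\xi_0(q)$ to obtain a plug-in finest scale $\hat J_0$:

$$2^{\hat J_0+1} \equiv [2q\lambda_q\hat\xi_0(q)T]^{1/(2q+1)}. \tag{6.5}$$

Because $\hat J_0$ is a nonnegative integer, we should use

$$\hat J_0 \equiv \max\left\{\left[\frac{1}{2q+1}\log_2\big(2q\lambda_q\hat\xi_0(q)T\big) - 1\right],\ 0\right\}, \tag{6.6}$$

where the square bracket denotes the integer part. We impose the following condition on $\hat\xi_0(q)$.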

ASSUMPTION 9: $\hat\xi_0(q) - \zeta_0(q) = o_P(T^{-\delta})$ for some constant $\zeta_0(q)$, where $\delta = 1/[2(2q+1)]$ if $\zeta_0(q) \in [c, C]$ and $\delta = 1/(2q+1)$ if $\zeta_0(q) = 0$.


Note that the condition on $\hat\xi_0(q)$ is more stringent when $\zeta_0(q) = 0$ than when $\zeta_0(q) \ne 0$, but for both cases the conditions are mild. We do not require $\operatorname{plim}\hat\xi_0(q) \equiv \zeta_0(q) = \xi_0(q)$, where $\xi_0(q)$ is as in (6.4). When (and only when) $\zeta_0(q) = \xi_0(q)$, $\hat J_0$ in (6.6) will converge to the optimal $J_0$ in (6.4).
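As a minimal sketch of the integer rule in (6.6), the function below maps an estimate of $\hat\xi_0(q)$ and a time dimension $T$ to a finest scale. The function name and the handling of a nonpositive argument are assumptions made for illustration; the default $\lambda_2 = \pi^4/45$ is the Franklin-wavelet value quoted in the text for $q = 2$.

```python
import math

def plug_in_finest_scale(xi_hat: float, T: int, q: int = 2,
                         lambda_q: float = math.pi ** 4 / 45) -> int:
    """Integer finest scale following (6.6).

    Computes max{ [ log2(2 q lambda_q xi_hat T) / (2q + 1) - 1 ], 0 },
    where [.] is the integer part.
    """
    arg = 2.0 * q * lambda_q * xi_hat * T
    if arg <= 0.0:
        # Assumption for this sketch: a zero estimate (no detected serial
        # correlation) maps to the coarsest scale instead of log2(0).
        return 0
    raw = math.log2(arg) / (2 * q + 1) - 1.0
    # Flooring and truncation agree here because negative values are
    # replaced by 0 in the max.
    return max(math.floor(raw), 0)
```

With $T = 64$, an estimate of 1 gives scale 0 and an estimate of 1000 gives scale 2, which illustrates how slowly the rule reacts to the size of $\hat\xi_0(q)$ because of the $1/(2q+1)$ exponent.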

COROLLARY 1: Suppose Assumptions 1–9 hold and $\hat J_0$ is given as in (6.6). If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\hat W_1(\hat J_0) \xrightarrow{d} N(0,1)$ and $\hat W_2(\hat J_0) \xrightarrow{d} N(0,1)$.

We can use parametric or nonparametric (e.g., local linear smoothing) methods for estimator $\hat\xi_0(q)$. The former generally deliver a suboptimal finest scale, but have less variation in finite samples. The latter will deliver an asymptotically optimal finest scale, but are subject to substantial variation. There are also some methods (e.g., cross-validation and AIC) in the literature for selecting the truncation parameter. There is usually a trade-off between Type I and Type II errors in choosing a specific method.

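To obtain reasonable power while having a proper rejection probability under $H_0$ for sample sizes often encountered in economics, we use a parametric AR($p_i$) model for each $i$:

$$\hat v_{it} = \gamma_{i0} + \sum_{h=1}^{p_i}\gamma_{ih}\hat v_{it-h} + \varepsilon_{it} \qquad (t = 1, \ldots, T_i;\ i = 1, \ldots, n), \tag{6.7}$$

where $\hat v_{it} \equiv 0$ if $t \le 0$. In practice, one can use AIC or BIC to select $p_i$ for each $i$. Suppose $\hat\gamma_i \equiv (\hat\gamma_{i0}, \hat\gamma_{i1}, \ldots, \hat\gamma_{ip})'$ is the OLS estimator of $\gamma_i \equiv (\gamma_{i0}, \gamma_{i1}, \ldots, \gamma_{ip})'$. We consider $q = 2$; an example is the Franklin wavelet in (3.3), whose $\lambda_2 = \pi^{4}/45$. We have

$$\hat\xi_0(2) \equiv \sum_{i=1}^{n}\int_{-\pi}^{\pi}\left[\frac{d^{2}}{d\omega^{2}}\hat f_i(\omega)\right]^{2}d\omega \Bigg/ \sum_{i=1}^{n}(T_i/T)\int_{-\pi}^{\pi}\hat f_i^{2}(\omega)\,d\omega, \tag{6.8}$$

where $\hat f_i(\omega) \equiv (2\pi)^{-1}\big|1 - \sum_{h=1}^{p_i}\hat\gamma_{ih}e^{-ih\omega}\big|^{-2}$. We note that $\hat\xi_0(2)$ satisfies Assumption 9 with $q = 2$ because for parametric AR($p_i$) approximations, $\hat\xi_0(2) - \zeta_0(2) = O_P((nT)^{-1/2})$.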
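The parametric plug-in in (6.7)–(6.8) can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: a common AR order `p` is used for every individual (the text suggests AIC or BIC instead), the second derivative and the integrals are approximated on a frequency grid, and the reference length `T` defaults to the longest series. All function names are hypothetical.

```python
import numpy as np

def ar_coefficients(v, p):
    """OLS fit of v_t = g0 + sum_h g_h v_{t-h} + e_t, with v_t = 0 for t <= 0."""
    v = np.asarray(v, dtype=float)
    padded = np.concatenate([np.zeros(p), v])  # pre-sample values set to zero
    lags = np.column_stack(
        [padded[p - h: p - h + len(v)] for h in range(1, p + 1)]
    )
    X = np.column_stack([np.ones(len(v)), lags])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return coef[1:]  # slope coefficients only; the intercept is dropped

def ar_spectral_density(gamma, omega):
    """f(w) = (2 pi)^{-1} |1 - sum_h gamma_h exp(-i h w)|^{-2}."""
    h = np.arange(1, len(gamma) + 1)
    transfer = 1.0 - np.exp(-1j * np.outer(omega, h)) @ gamma
    return 1.0 / (2.0 * np.pi * np.abs(transfer) ** 2)

def _trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def xi_hat_0(residuals, p=1, T=None, grid=2001):
    """Numerical version of (6.8) for a list of 1-D residual series."""
    omega = np.linspace(-np.pi, np.pi, grid)
    T = max(len(v) for v in residuals) if T is None else T
    num = den = 0.0
    for v in residuals:
        f = ar_spectral_density(ar_coefficients(v, p), omega)
        d2f = np.gradient(np.gradient(f, omega), omega)  # finite differences
        num += _trapezoid(d2f ** 2, omega)
        den += (len(v) / T) * _trapezoid(f ** 2, omega)
    return num / den
```

Because the input is a list of series, individuals may have different lengths, in line with the unbalanced panels the tests allow. A flat fitted spectrum gives a value near zero, while a sharply peaked one gives a large value and hence a finer scale in (6.6).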

The performance of the plug-in method in (6.6) relies on the specification in (6.7). To further improve the power, one could also consider a data-driven, individual-specific $\hat J_i$ using the IMSE criterion of $f_i(\cdot)$ in Theorem 6(a). Such individual-specific $\hat J_i$ may more effectively capture spatial nonhomogeneity in the degree of serial correlation across $i$. However, they may have wide variations across $i$, leading to large Type I errors for the tests. A compromise is to develop a data-driven $\hat J_c$, where $c$ is an index for some suitable groups such as regions and sectors where all individuals in the same group will have the same finest scale.


In fact, the IMSE criterion is more suitable for estimation than for testing. A better criterion for testing is to maximize power or trade off between level distortion and power improvement. This will, however, require higher-order asymptotic analysis for our tests, which is beyond the scope of this paper and should be pursued in future work. Simulation studies below show that the finest scale chosen via (6.6)–(6.8) gives reasonable rejection probabilities under $H_0$ and gives robust and good power, particularly when the spectrum has distinct peaks or kinks.

    7. MONTE CARLO EXPERIMENT

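We now compare the performance of $\hat W_1$ and $\hat W_2$ with three existing tests for serial correlation: Bhargava, Franzini, and Narendranathan's (1982; BFN) Durbin–Watson type test, Baltagi and Li's (1995; BL) LM test,

$$\mathrm{BL} = \frac{nT^{2}}{T-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T}\hat v_{it}\hat v_{it-1} \Bigg/ \sum_{i=1}^{n}\sum_{t=1}^{T}\hat v_{it}^{2}\right)^{2}, \tag{7.1}$$

and Bera, Sosa-Escudero, and Yoon's (2001; BSY) modified LM test:

$$\mathrm{BSY} = nT^{2}\left(\tilde B + \frac{\tilde A}{T}\right)^{2} \Bigg/ \left[(T-1)\left(1 - \frac{2}{T}\right)\right], \tag{7.2}$$

where $\hat v_{it}$ is the within residual,

$$\tilde A \equiv \left(1 - \sum_{i=1}^{n}\tilde u_i' I_T \tilde u_i\right) \Bigg/ \sum_{i=1}^{n}\sum_{t=1}^{T}\tilde u_{it}^{2}, \qquad \tilde B \equiv \sum_{i=1}^{n}\sum_{t=1}^{T}\tilde u_{it}\tilde u_{it-1} \Bigg/ \sum_{i=1}^{n}\sum_{t=1}^{T}\tilde u_{it}^{2},$$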

$I_T$ is a matrix of ones, and $\tilde u_i \equiv (\tilde u_{i1}, \ldots, \tilde u_{iT})'$ is the OLS residual without random effects. All these tests consider balanced panels. Both BL and BSY have an asymptotic $\chi_1^2$ distribution under $H_0$. In contrast, BFN converges to 2 under $H_0$, a degenerate distribution. Bhargava, Franzini, and Narendranathan (1982, p. 436) suggest using a critical value of 2 at the 5% level. We find that BFN rejects $H_0$ up to 67.7%, 66.9%, and 64.8% at the 5% level, when $(n,T) = (25,32)$, $(50,64)$, and $(100,128)$ respectively. It seems that this test cannot be used, so we drop it from comparison. For $\hat W_1$ and $\hat W_2$, because the choice of $\psi(\cdot)$ is not important, we only use the Franklin wavelet in (3.3). To examine the impact of the choice of $J$, we consider $J = 0, 1$ and the data-driven $\hat J_0$ in (6.6). We also compare $\hat W_1$ and $\hat W_2$ with the panel versions of Hong's (1996) kernel-based tests, $\hat K_1$ and $\hat K_2$, which are obtained by generalizing Hong's (1996) kernel method to model (2.1). They are based on an individual-specific kernel spectral density estimator. We use the Daniell kernel $k(z) = \sin(\pi z)/\pi z$, $z \in \mathbb{R}$, which has the optimal power over a class of kernels. We choose a data-driven bandwidth using a plug-in method similar to that for $\hat J_0$.
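The BL statistic in (7.1) is simple enough to compute directly, and doing so makes the cancellation problem concrete. The sketch below assumes a balanced $n \times T$ array of residuals and treats the unavailable pre-sample residual as dropping out of the cross-product sum; the function name is hypothetical, and the toy inputs are chosen for arithmetic clarity and are not within residuals from any regression.

```python
import numpy as np

def baltagi_li_lm(resid):
    """BL statistic (7.1) from a balanced n x T array of residuals."""
    v = np.asarray(resid, dtype=float)
    n, T = v.shape
    cross = np.sum(v[:, 1:] * v[:, :-1])  # pooled first-order cross-products
    total = np.sum(v ** 2)                # pooled sum of squares
    return n * T ** 2 / (T - 1) * (cross / total) ** 2

# One individual with perfectly persistent residuals.
positive_only = [[1.0, 1.0, 1.0, 1.0]]
# A second individual with perfectly alternating residuals is added.
mixed = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]]

print(baltagi_li_lm(positive_only))  # 3.0
print(baltagi_li_lm(mixed))          # 0.0
```

In the mixed case the positive and negative cross-products sum to zero before squaring, so the pooled statistic vanishes even though both individuals are strongly serially correlated. Squaring individual by individual, as a quadratic form does, would avoid this.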

We consider the following two data generating processes (DGP): (a) DGP1, a static panel model: $Y_{it} = 5 + .5X_{it} + \mu_i + \varepsilon_{it}$, $X_{it} = .5X_{it-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $U[-.5, .5]$, and (b) DGP2, a dynamic panel model: $Y_{it} = 5 + .5Y_{it-1} + \mu_i + \varepsilon_{it}$. For both DGPs, $\mu_i \sim$ i.i.d. $N(0, \sigma_\mu^2)$. Let $\tau$ measure the relative strength of random effects (no random effect when $\tau = 0$) and we choose a variety of $\tau$ as in Baltagi, Chang, and Li (1992). To examine the probability of rejecting a correct $H_0$, we set $\varepsilon_{it} = z_{it}$, where $z_{it} \sim$ i.i.d. $N(0,1)$. To examine the power, we consider the following error processes:

AR(1) Alternatives:

$$\begin{aligned}
\text{AR(1)a:}\quad & \varepsilon_{it} = .2\varepsilon_{it-1} + z_{it}, && i = 1, \ldots, n, \\
\text{AR(1)b:}\quad & \varepsilon_{it} = -.2\varepsilon_{it-1} + z_{it}, && i = 1, \ldots, n, \\
\text{AR(1)c:}\quad & \varepsilon_{it} = \begin{cases} .2\varepsilon_{it-1} + z_{it}, & i = 1, \ldots, \tfrac{n}{2}, \\ -.2\varepsilon_{it-1} + z_{it}, & i = \tfrac{n}{2}+1, \ldots, n. \end{cases}
\end{aligned} \tag{7.3}$$

ARMA(12,4) Alternatives:

$$\begin{aligned}
\text{ARMA(12,4)a:}\quad & \varepsilon_{it} = -.3\varepsilon_{it-12} + z_{it} + z_{it-4}, && i = 1, \ldots, n, \\
\text{ARMA(12,4)b:}\quad & \varepsilon_{it} = .3\varepsilon_{it-12} + z_{it} - z_{it-4}, && i = 1, \ldots, n, \\
\text{ARMA(12,4)c:}\quad & \varepsilon_{it} = \begin{cases} -.3\varepsilon_{it-12} + z_{it} + z_{it-4}, & i = 1, \ldots, \tfrac{n}{2}, \\ .3\varepsilon_{it-12} + z_{it} - z_{it-4}, & i = \tfrac{n}{2}+1, \ldots, n. \end{cases}
\end{aligned} \tag{7.4}$$

AR(1)a and AR(1)b are the full positive and negative AR(1) respectively. We expect BL and BSY to have optimal power against them. Wavelet tests have no advantages because these alternatives have a relatively flat spectrum. AR(1)c is a mixed AR(1), where the first half of the individuals have a positive AR coefficient, and the second half have a negative AR coefficient. On the other hand, ARMA(12,4) can arise from monthly data; it has four distinct spectral peaks.
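The static design and the error processes in (7.3)–(7.4) can be simulated along the following lines. This is a sketch, not the authors' simulation code: the burn-in length, the zero start-up values, the random-effect variance of .4 (taken from the table notes), and all function and key names are assumptions introduced for illustration.

```python
import numpy as np

# (AR lag, AR coefficient, coefficient on z_{t-4}) for each error process.
SPECS = {
    "null":   (1, 0.0, 0.0),    # eps_it = z_it
    "ar1a":   (1, 0.2, 0.0),
    "ar1b":   (1, -0.2, 0.0),
    "arma_a": (12, -0.3, 1.0),
    "arma_b": (12, 0.3, -1.0),
}
# Mixed alternatives: first half of individuals, then second half.
MIXED = {"ar1c": ("ar1a", "ar1b"), "arma_c": ("arma_a", "arma_b")}
BURN = 200  # hypothetical burn-in length, not taken from the paper

def simulate_series(T, spec, rng):
    """One error series of length T after discarding the burn-in."""
    lag, phi, theta = spec
    z = rng.standard_normal(T + BURN)
    eps = np.zeros(T + BURN)
    for t in range(T + BURN):
        ar = phi * eps[t - lag] if t >= lag else 0.0
        ma = theta * z[t - 4] if t >= 4 else 0.0
        eps[t] = ar + z[t] + ma
    return eps[BURN:]

def simulate_errors(n, T, kind, rng):
    """n x T array of errors for a named null or alternative process."""
    if kind in MIXED:
        first, second = MIXED[kind]
        kinds = [first] * (n // 2) + [second] * (n - n // 2)
    else:
        kinds = [kind] * n
    return np.vstack([simulate_series(T, SPECS[k], rng) for k in kinds])

def simulate_dgp1(n, T, kind, rng, sigma2_mu=0.4):
    """Static panel: Y = 5 + .5 X + mu + eps, with X an AR(1) in uniform shocks."""
    eps = simulate_errors(n, T, kind, rng)
    eta = rng.uniform(-0.5, 0.5, size=(n, T + BURN))
    x = np.zeros((n, T + BURN))
    for t in range(1, T + BURN):
        x[:, t] = 0.5 * x[:, t - 1] + eta[:, t]
    mu = rng.normal(0.0, np.sqrt(sigma2_mu), size=(n, 1))
    y = 5.0 + 0.5 * x[:, BURN:] + mu + eps
    return y, x[:, BURN:]

def pooled_lag1_autocorr(v):
    """Pooled first-order autocorrelation across all individuals."""
    return float(np.sum(v[:, 1:] * v[:, :-1]) / np.sum(v ** 2))
```

Computing `pooled_lag1_autocorr` on draws from `"ar1a"`, `"ar1b"`, and `"ar1c"` gives values near .2, −.2, and 0 respectively, which is the pooled-average behavior that leaves tests built on a common alternative without power in the mixed case.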

The top panel of Table I reports rejection probabilities under $H_0$ with $\tau = .4$ for DGP1. When $n < T$, the rejection probabilities of BSY are reasonable and the best among all the tests. BL overrejects $H_0$, while $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ underreject $H_0$ but not excessively. For other values of $\tau$ (not reported), the rejection probability of BSY is sensitive to the choice of $\tau$, displaying severe overrejections when $\tau$ is large. BL still overrejects $H_0$ for all $\tau$. The $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ tests are robust to the choice of $\tau$. The rejection probabilities of $\hat W_1$ and $\hat W_2$ are better when a smaller $J$ or data-driven $\hat J_0$ is used. When $n = T$, $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ all underreject $H_0$, but they improve when $n = T$ increases. On the other hand, BL overrejects severely while BSY performs well. When $n > T$, again, BSY has the best rejection probabilities if $T > 10$, but overrejects if $T$ is small. BL still overrejects $H_0$ for all sample sizes. The $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ tests all underreject $H_0$ but they are substantially improved when $T > 25$, especially when data-driven $\hat J_0$ or $J = 0$ is used. The rejection probabilities of $\hat W_1$ and $\hat W_2$ are similar to those of $\hat K_1$ and $\hat K_2$ in all cases.

TABLE I

PERCENTAGE OF REJECTIONS UNDER THE NULL HYPOTHESIS IN THE STATIC PANEL MODEL

Each cell reports the rejection percentages at the 10% / 5% levels for the given $(n, T)$.

Asymptotic Critical Values

| Test | (5, 8) | (10, 16) | (25, 32) | (50, 64) | (5, 5) | (10, 10) | (25, 25) | (50, 50) | (8, 5) | (16, 10) | (32, 25) | (64, 50) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat W_1(0)$ | 2.5 / 1.6 | 4.5 / 2.8 | 6.0 / 3.1 | 6.6 / 2.9 | .6 / .0 | 3.1 / 1.7 | 4.8 / 2.6 | 6.6 / 3.4 | .4 / .0 | 2.6 / 1.8 | 4.9 / 2.5 | 6.3 / 3.0 |
| $\hat W_1(1)$ | 1.8 / 1.3 | 3.3 / 2.0 | 5.1 / 2.8 | 5.8 / 2.7 | .3 / .0 | 2.0 / 1.3 | 3.7 / 2.1 | 5.8 / 3.5 | .2 / .1 | 1.7 / .7 | 3.5 / 1.2 | 4.8 / 2.1 |
| $\hat W_1(\hat J_0)$ | 1.8 / 1.3 | 3.3 / 1.9 | 6.3 / 4.0 | 7.9 / 4.0 | .6 / .0 | 2.0 / 1.3 | 4.5 / 2.5 | 7.8 / 4.3 | .4 / .1 | 1.7 / .7 | 4.1 / 1.7 | 6.9 / 3.8 |
| $\hat W_2(0)$ | 1.4 / .4 | 4.5 / 2.5 | 5.7 / 3.0 | 6.8 / 3.4 | .1 / .0 | 2.4 / 1.1 | 4.3 / 2.2 | 6.6 / 3.5 | .1 / .0 | 1.4 / .7 | 4.3 / 2.8 | 5.9 / 2.9 |
| $\hat W_2(1)$ | 1.0 / .4 | 2.8 / 1.4 | 4.6 / 1.9 | 6.0 / 3.0 | .0 / .0 | .8 / .3 | 2.9 / 1.4 | 5.4 / 2.6 | .0 / .0 | .7 / .3 | 2.4 / 1.7 | 4.2 / 2.2 |
| $\hat W_2(\hat J_0)$ | 1.0 / .4 | 3.0 / 1.1 | 4.7 / 2.3 | 8.2 / 3.7 | .1 / .0 | .8 / .3 | 3.0 / 1.6 | 7.2 / 3.6 | .1 / .0 | .7 / .3 | 2.6 / 1.3 | 6.3 / 3.7 |
| $\hat K_1$ | 2.5 / 1.5 | 3.6 / 1.7 | 6.6 / 3.9 | 7.2 / 3.9 | .9 / .0 | 2.3 / 1.3 | 5.0 / 2.9 | 7.5 / 4.0 | .4 / .2 | 1.6 / .8 | 4.2 / 2.1 | 6.7 / 3.5 |
| $\hat K_2$ | 1.4 / .4 | 2.8 / 1.2 | 5.8 / 2.6 | 8.0 / 3.8 | .1 / .0 | 1.2 / .2 | 3.3 / 2.0 | 7.5 / 3.4 | .1 / .0 | .7 / .2 | 3.1 / 1.7 | 7.1 / 3.2 |
| BL | 30.5 / 19.9 | 22.6 / 14.9 | 23.2 / 14.6 | 22.5 / 14.5 | 45.3 / 32.4 | 35.2 / 23.9 | 30.2 / 20.8 | 28.2 / 19.2 | 57.8 / 46.3 | 48.0 / 35.4 | 34.7 / 24.5 | 35.3 / 23.8 |
| BSY | 9.0 / 3.0 | 5.7 / 2.1 | 6.7 / 1.8 | 7.7 / 2.2 | 18.3 / 9.3 | 13.5 / 5.1 | 9.9 / 3.4 | 7.3 / 2.8 | 31.9 / 20.7 | 22.0 / 10.3 | 14.2 / 5.9 | 11.4 / 4.8 |

Bootstrapped Critical Values

| Test | (5, 8) | (10, 16) | (25, 32) | (50, 64) | (5, 5) | (10, 10) | (25, 25) | (50, 50) | (8, 5) | (16, 10) | (32, 25) | (64, 50) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat W_1(0)$ | 8.9 / 4.6 | 8.3 / 3.3 | 10.5 / 5.3 | 10.9 / 5.8 | 12.2 / 5.3 | 9.8 / 4.7 | 9.5 / 3.9 | 10.4 / 3.8 | 7.8 / 3.3 | 10.4 / 3.8 | 10.3 / 4.8 | 10.6 / 5.9 |
| $\hat W_1(1)$ | 8.6 / 4.0 | 8.9 / 4.4 | 9.4 / 4.8 | 10.1 / 4.9 | 12.4 / 5.5 | 9.9 / 5.1 | 9.1 / 4.9 | 9.5 / 3.6 | 8.8 / 3.6 | 9.2 / 5.4 | 10.2 / 5.2 | 7.4 / 4.4 |
| $\hat W_1(\hat J_0)$ | 8.9 / 4.6 | 8.3 / 3.3 | 9.7 / 4.0 | 10.5 / 4.8 | 12.2 / 5.3 | 9.8 / 4.7 | 9.5 / 3.9 | 9.3 / 4.2 | 9.8 / 5.0 | 9.2 / 5.4 | 10.0 / 5.8 | 8.4 / 4.8 |
| $\hat W_2(0)$ | 12.7 / 5.8 | 10.9 / 5.3 | 11.5 / 5.5 | 11.9 / 5.6 | 16.0 / 10.2 | 14.7 / 8.2 | 12.3 / 6.4 | 12.0 / 4.9 | 11.9 / 4.3 | 11.7 / 4.8 | 10.6 / 5.5 | 11.3 / 5.8 |
| $\hat W_2(1)$ | 12.2 / 6.7 | 11.2 / 5.9 | 10.6 / 6.3 | 11.1 / 5.8 | 16.1 / 9.3 | 15.5 / 9.1 | 12.8 / 7.6 | 11.7 / 5.0 | 11.8 / 6.6 | 10.0 / 5.6 | 10.4 / 3.8 | 9.2 / 4.4 |
| $\hat W_2(\hat J_0)$ | 12.7 / 5.8 | 10.9 / 5.3 | 11.1 / 5.9 | 10.8 / 5.4 | 16.0 / 10.2 | 14.7 / 8.2 | 12.3 / 6.4 | 12.1 / 5.2 | 13.4 / 8.0 | 10.0 / 5.6 | 9.4 / 3.8 | 9.4 / 5.4 |
| $\hat K_1$ | 12.9 / 6.1 | 6.4 / 3.3 | 9.2 / 4.7 | 11.7 / 5.9 | 22.5 / 11.9 | 9.3 / 4.6 | 9.2 / 3.9 | 9.9 / 3.7 | 7.3 / 3.4 | 9.4 / 5.0 | 10.1 / 5.7 | 10.4 / 5.9 |
| $\hat K_2$ | 20.9 / 11.3 | 10.6 / 6.8 | 11.3 / 5.0 | 11.9 / 6.2 | 36.9 / 23.9 | 21.1 / 11.4 | 13.5 / 7.7 | 10.9 / 6.1 | 10.9 / 7.0 | 10.8 / 6.2 | 10.6 / 5.3 | 10.5 / 5.0 |
| BL | 5.0 / 1.5 | 1.8 / .4 | 6.9 / 2.7 | 9.4 / 4.0 | 10.6 / 2.7 | .8 / .2 | 3.6 / 1.5 | 8.1 / 4.1 | 8.1 / 4.0 | 8.6 / 3.8 | 11.5 / 6.3 | 11.6 / 5.7 |
| BSY | 3.7 / 1.2 | 1.7 / .4 | 4.7 / 1.7 | 9.2 / 5.0 | 1.4 / .0 | .4 / .0 | 3.1 / 1.5 | 8.4 / 3.4 | 7.3 / 5.4 | 10.8 / 5.9 | 10.7 / 5.5 | 10.1 / 5.3 |

Note: (a) DGP: $Y_{it} = 5 + .5X_{it} + \mu_i + \varepsilon_{it}$, $X_{it} = .5X_{it-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $U[-.5, .5]$, $\mu_i \sim$ i.i.d. $N(0, .4)$, and $\varepsilon_{it} \sim$ i.i.d. $N(0,1)$. (b) $\hat W_1$, heteroskedasticity-consistent Franklin wavelet-based test; $\hat W_2$, heteroskedasticity-corrected Franklin wavelet-based test; $\hat J_0$, data-driven finest scale. (c) $\hat K_1$, heteroskedasticity-consistent Daniell kernel-based test; $\hat K_2$, heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 1000 simulation replications; 500 bootstrap replications.

Table I indicates that when using asymptotic critical values, most tests do not perform well when $n, T < 32$. For small samples, we suggest using wild bootstrap (e.g., Härdle and Mammen (1993), Davidson and Flachaire (2001), Horowitz (2001)) as an alternative to asymptotic approximation. The bottom panel of Table I reports bootstrap rejection probabilities, which are based on 1000 replications and 500 bootstrap resamples. The $\hat W_1$ and $\hat W_2$ tests have reasonable rejection probabilities in small samples for all cases ($n < T$, $n = T$, and $n > T$). It appears that wild bootstrap can remedy the underrejection of $\hat W_1$ and $\hat W_2$ using asymptotic critical values, especially when the sample size is as small as 5 or 8. However, $\hat K_1$ and $\hat K_2$ overreject when $(n,T) = (5,5)$. BL and BSY perform better using wild bootstrap, but they still tend to underreject.
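The text does not spell out the bootstrap scheme, so the following is only a generic illustration of how a wild-bootstrap critical value might be obtained for a residual-based statistic. It assumes Rademacher multipliers applied directly to a balanced array of residuals and skips the re-estimation of the regression in each resample, which a full implementation would normally include. The function name and the `statistic` callback are hypothetical.

```python
import numpy as np

def wild_bootstrap_critical_value(resid, statistic, level=0.05,
                                  n_boot=500, rng=None):
    """Upper-tail critical value of `statistic` under sign-flipped residuals.

    Independent random signs destroy serial correlation while keeping each
    residual's magnitude, so the resampled data mimic the null hypothesis.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(resid, dtype=float)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=v.shape)  # Rademacher weights
        draws[b] = statistic(v * signs)
    return float(np.quantile(draws, 1.0 - level))
```

The test would then reject when the statistic computed on the original residuals exceeds the returned value. Because the multipliers preserve each residual's scale, this kind of scheme accommodates heteroskedasticity across individuals.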

Table II reports the rejection probabilities under DGP2, a dynamic panel process. Again $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ all underreject using asymptotic critical values, but they all perform well using wild bootstrap, though they still underreject compared to Table I. Unlike under DGP1 (a static panel process), BSY and BL perform poorly using either asymptotic or bootstrap critical values, though BL performs slightly better than BSY when $n \le T$. BSY overrejects for almost all cases using bootstrap critical values.

TABLE II

PERCENTAGE OF REJECTIONS UNDER THE NULL HYPOTHESIS IN THE DYNAMIC PANEL MODEL

Each cell reports the rejection percentages at the 10% / 5% levels for the given $(n, T)$.

Asymptotic Critical Values

| Test | (5, 8) | (10, 16) | (25, 32) | (50, 64) | (5, 5) | (10, 10) | (25, 25) | (50, 50) | (8, 5) | (16, 10) | (32, 25) | (64, 50) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat W_1(0)$ | .5 / .2 | 1.3 / 1.1 | 4.6 / 2.3 | 6.2 / 2.9 | .0 / .0 | .5 / .0 | 2.7 / 1.9 | 3.8 / 1.2 | .4 / .0 | 2.6 / 1.8 | 4.9 / 2.5 | 6.3 / 3.0 |
| $\hat W_1(1)$ | .4 / .2 | 1.1 / .4 | 2.5 / 1.3 | 4.1 / 2.4 | .0 / .0 | .4 / .3 | 2.7 / 1.1 | 2.0 / .7 | .2 / .1 | 1.7 / .7 | 3.5 / 1.2 | 4.8 / 2.1 |
| $\hat W_1(\hat J_0)$ | .5 / .2 | 1.1 / .4 | 2.8 / 1.1 | 6.3 / 3.3 | .0 / .0 | .4 / .3 | 2.3 / 1.5 | 3.5 / 1.8 | .4 / .1 | 1.7 / .7 | 4.1 / 1.7 | 6.9 / 3.8 |
| $\hat W_2(0)$ | .1 / .0 | 1.0 / .3 | 3.4 / 2.0 | 5.5 / 2.4 | .0 / .0 | .1 / .0 | 2.3 / 1.4 | 3.0 / 1.5 | .1 / .0 | 1.4 / .7 | 4.3 / 2.8 | 5.9 / 2.9 |
| $\hat W_2(1)$ | .0 / .0 | .4 / .2 | 2.6 / 1.3 | 3.5 / 2.0 | .0 / .0 | .0 / .0 | 1.7 / .4 | 1.8 / .5 | .0 / .0 | .7 / .3 | 2.4 / 1.7 | 4.2 / 2.2 |
| $\hat W_2(\hat J_0)$ | .1 / .0 | .4 / .2 | 3.2 / 1.1 | 5.9 / 2.7 | .0 / .0 | .0 / .0 | 1.2 / .7 | 2.8 / .9 | .1 / .0 | .7 / .3 | 2.6 / 1.3 | 6.3 / 3.7 |
| $\hat K_1$ | .6 / .3 | 1.6 / .7 | 3.7 / 2.2 | 7.0 / 3.8 | .0 / .0 | .3 / .2 | 2.8 / 1.4 | 3.2 / 1.5 | .4 / .2 | 1.6 / .8 | 4.2 / 2.1 | 6.7 / 3.5 |
| $\hat K_2$ | .1 / .0 | .8 / .2 | 3.4 / 2.1 | 6.7 / 2.8 | .0 / .0 | .1 / .0 | 1.8 / .9 | 3.3 / 1.6 | .1 / .0 | .7 / .2 | 3.1 / 1.7 | 7.1 / 3.2 |
| BL | 16.6 / 9.8 | 4.8 / 1.8 | 2.1 / .3 | .3 / .0 | 40.2 / 31.4 | 12.8 / 5.8 | 2.8 / 1.0 | 1.3 / .2 | 56.7 / 44.2 | 17.7 / 10.0 | 1.9 / .5 | .6 / .2 |
| BSY | 12.9 / 6.2 | 29.8 / 19.8 | 99.9 / 99.7 | .0 / .0 | 24.0 / 16.3 | 14.2 / 7.3 | 96.9 / 93.1 | 94.9 / 94.9 | 23.3 / 16.0 | 14.5 / 8.1 | 99.4 / 98.6 | 59.0 / 59.0 |

Bootstrapped Critical Values

| Test | (5, 8) | (10, 16) | (25, 32) | (50, 64) | (5, 5) | (10, 10) | (25, 25) | (50, 50) | (8, 5) | (16, 10) | (32, 25) | (64, 50) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat W_1(0)$ | 3.5 / 1.5 | 4.5 / 3.0 | 12.0 / 5.5 | 7.5 / 4.5 | 4.5 / 3.0 | 7.0 / 2.0 | 5.5 / 2.5 | 11.0 / 6.5 | 3.0 / .5 | 4.0 / 2.0 | 5.5 / 3.5 | 8.0 / 4.0 |
| $\hat W_1(1)$ | 2.0 / 2.0 | 6.0 / 4.0 | 8.5 / 4.5 | 8.5 / 5.0 | 4.0 / 2.5 | 7.0 / 3.0 | 8.0 / 3.5 | 8.5 / 3.0 | 2.0 / .5 | 4.0 / 1.5 | 6.5 / 3.5 | 11.0 / 4.5 |
| $\hat W_1(\hat J_0)$ | 3.5 / 1.5 | 4.5 / 3.0 | 12.0 / 5.5 | 9.0 / 6.0 | 4.5 / 3.0 | 7.0 / 2.0 | 5.5 / 2.5 | 10.0 / 5.5 | 3.0 / .5 | 4.0 / 2.0 | 5.5 / 3.5 | 9.0 / 4.5 |
| $\hat W_2(0)$ | 4.0 / 1.5 | 6.5 / 2.5 | 9.5 / 6.0 | 9.0 / 4.5 | 8.0 / 4.5 | 6.5 / 1.0 | 7.5 / 2.5 | 9.5 / 5.0 | 5.0 / 3.0 | 5.0 / 1.5 | 6.5 / 3.0 | 10.5 / 6.0 |
| $\hat W_2(1)$ | 3.5 / 1.0 | 9.0 / 2.5 | 11.5 / 6.5 | 9.0 / 4.0 | 6.5 / 4.0 | 4.0 / 2.0 | 10.0 / 4.0 | 10.0 / 4.0 | 2.5 / 1.5 | 4.5 / 2.0 | 6.5 / 3.0 | 9.5 / 6.5 |
| $\hat W_2(\hat J_0)$ | 4.0 / 1.5 | 6.5 / 2.5 | 9.5 / 6.0 | 9.0 / 4.0 | 8.0 / 4.5 | 6.5 / 1.0 | 7.5 / 4.0 | 12.0 / 5.0 | 5.0 / 3.0 | 5.0 / 1.5 | 6.5 / 3.0 | 10.0 / 6.5 |
| $\hat K_1$ | 6.0 / 1.0 | 6.5 / 3.0 | 8.5 / 3.5 | 9.0 / 5.0 | 4.0 / 2.5 | 5.5 / 2.0 | 7.0 / 3.5 | 10.5 / 5.5 | 2.0 / .5 | 3.0 / 1.5 | 7.0 / 3.0 | 7.5 / 5.0 |
| $\hat K_2$ | 3.5 / .5 | 6.5 / 3.5 | 7.5 / 6.0 | 8.5 / 4.5 | 7.0 / 4.0 | 5.5 / 1.5 | 8.5 / 3.0 | 11.0 / 6.5 | 2.5 / 1.5 | 3.5 / 1.5 | 5.5 / 3.0 | 11.0 / 6.5 |
| BL | 12.0 / 4.5 | 2.0 / 1.5 | 2.5 / .0 | .5 / .0 | 24.0 / 16.0 | 11.0 / 6.0 | 3.0 / 1.5 | 1.5 / .0 | 33.5 / 20.5 | 14.0 / 6.0 | 3.5 / 1.5 | .5 / .0 |
| BSY | 15.5 / 8.0 | 10.0 / 2.5 | 15.0 / 9.0 | 20.5 / 9.0 | 22.0 / 13.0 | 11.0 / 7.5 | 21.0 / 11.0 | 24.0 / 12.0 | 19.5 / 11.0 | 11.5 / 5.0 | 23.0 / 13.5 | 30.5 / 16.0 |

Notes: (a) DGP: $Y_{it} = 5 + .5Y_{it-1} + \mu_i + \varepsilon_{it}$, $\mu_i \sim$ i.i.d. $N(0, .4)$, and $\varepsilon_{it} \sim$ i.i.d. $N(0,1)$. (b) $\hat W_1$, heteroskedasticity-consistent Franklin wavelet-based test; $\hat W_2$, heteroskedasticity-corrected Franklin wavelet-based test; $\hat J_0$, data-driven finest scale. (c) $\hat K_1$, heteroskedasticity-consistent Daniell kernel-based test; $\hat K_2$, heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 1000 simulation replications; 500 bootstrap replications.

The top panel of Table III first reports the Type I error corrected powers against AR(1) alternatives, with $\tau = .4$, using empirical critical values, which provides a fair comparison. Because empirical critical values are not observable in practice, we also report power using bootstrap critical values. Under AR(1)a, BSY is the most powerful, followed by BL. This is expected because both BSY and BL are optimal against AR(1). The $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ tests have nontrivial but substantially lower power when sample sizes are small. This is because AR(1)a has a relatively flat spectrum and the advantage of consistent testing is not displayed. Under AR(1)b, BL becomes the most powerful. Somewhat surprisingly, $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ have rather high power and dominate BSY, perhaps due to the fact that a negative AR(1) has a less smooth spectrum than a positive AR(1). For example, with $(n,T) = (10,16)$, the power of BSY is 6.3% while those of BL and $\hat W_1(0)$ are 71.4% and 47.1% respectively. Interestingly, both BSY and BL fail to detect AR(1)c, the mixed model. In contrast, $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ are very powerful against AR(1)c, indicating that wavelet and kernel tests are rather effective in capturing inhomogeneous serial correlations across individuals. The bottom panel of Table III reports bootstrap power. Due to the extensive computation involved, we use 200 bootstrap resamples and 200 replications for bootstrap power. Again, BSY is the best for AR(1)a, and BL, $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ have low power when $T < 32$. We also observe that $\hat W_1$ and $\hat W_2$ are much more powerful than $\hat K_1$ and $\hat K_2$ under AR(1)b for $(n,T) = (10,16)$, though the advantages diminish as the sample sizes increase. For all AR(1) alternatives, the choice of $J$ has significant impact on $\hat W_1$ and $\hat W_2$, and $J = 0$ gives the best power for $\hat W_1$ and $\hat W_2$ against various AR(1) alternatives. The data-driven $\hat J_0$ delivers reasonable and robust power in all cases.

TABLE III

PERCENTAGE OF REJECTIONS UNDER THE AR(1) ALTERNATIVE IN THE STATIC PANEL MODEL

Each cell reports the rejection percentages at the 10% / 5% levels. Columns are grouped by model (M1 = Model 1, M2 = Model 2, M3 = Model 3) and $(n, T)$.

Empirical Critical Values

| Test | M1 (5, 8) | M1 (10, 16) | M1 (25, 32) | M1 (50, 64) | M2 (5, 8) | M2 (10, 16) | M2 (25, 32) | M2 (50, 64) | M3 (5, 8) | M3 (10, 16) | M3 (25, 32) | M3 (50, 64) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat W_1(0)$ | 3.6 / 1.4 | 14.4 / 6.1 | 74.8 / 62.3 | 99.9 / 99.9 | 28.6 / 19.8 | 60.8 / 47.1 | 99.6 / 97.9 | 99.9 / 99.9 | 16.7 / 10.5 | 38.6 / 25.2 | 93.3 / 88.7 | 99.9 / 99.9 |
| $\hat W_1(1)$ | 4.3 / 1.4 | 9.9 / 5.8 | 55.4 / 42.4 | 99.9 / 99.9 | 24.7 / 16.3 | 49.8 / 35.6 | 55.4 / 42.4 | 99.9 / 99.9 | 16.0 / 10.6 | 29.9 / 18.5 | 83.7 / 73.6 | 99.9 / 99.9 |
| $\hat W_1(\hat J_0)$ | 4.2 / 1.4 | 11.5 / 6.9 | 56.4 / 41.7 | 99.9 / 99.8 | 24.6 / 16.0 | 36.9 / 28.3 | 90.2 / 82.8 | 99.9 / 99.9 | 16.1 / 10.5 | 25.7 / 15.9 | 80.0 / 67.8 | 99.9 / 99.9 |
| $\hat W_2(0)$ | 3.6 / .9 | 9.3 / 3.5 | 68.3 / 53.7 | 99.9 / 99.9 | 26.3 / 16.4 | 59.2 / 42.2 | 99.2 / 97.7 | 99.9 / 99.9 | 14.3 / 7.2 | 34.4 / 19.7 | 93.7 / 88.3 | 99.9 / 99.9 |
| $\hat W_2(1)$ | 5.6 / 1.4 | 7.2 / 4.0 | 46.2 / 33.3 | 99.9 / 99.9 | 24.5 / 14.0 | 47.0 / 35.2 | 46.2 / 33.3 | 99.9 / 99.9 | 15.4 / 7.2 | 23.8 / 15.1 | 80.0 / 68.1 | 99.9 / 99.9 |
| $\hat W_2(\hat J_0)$ | 5.6 / 1.4 | 10.2 / 4.9 | 47.7 / 37.3 | 99.9 / 99.8 | 24.5 / 14.0 | 36.6 / 25.7 | 88.3 / 84.0 | 99.9 / 99.9 | 15.5 / 7.2 | 22.1 / 12.7 | 75.8 / 66.6 | 99.9 / 99.9 |
| $\hat K_1$ | 4.7 / 1.6 | 15.7 / 9.6 | 72.0 / 60.0 | 99.9 / 99.9 | 27.0 / 17.9 | 50.8 / 39.2 | 94.9 / 90.8 | 99.9 / 99.9 | 16.4 / 10.3 | 34.6 / 24.2 | 87.9 / 80.7 | 99.9 / 99.9 |
| $\hat K_2$ | 3.8 / 1.5 | 11.6 / 5.8 | 65.3 / 51.1 | 99.9 / 99.9 | 26.5 / 16.3 | 47.6 / 35.4 | 94.6 / 90.5 | 99.9 / 99.9 | 14.9 / 8.1 | 28.7 / 6.9 | 86.6 / 77.5 | 99.9 / 99.9 |
| BL | 3.5 / 1.1 | 19.9 / 11.9 | 97.8 / 96.0 | 99.9 / 99.9 | 40.1 / 24.3 | 83.0 / 71.4 | 99.9 / 99.9 | 99.9 / 99.9 | 17.7 / 8.3 | 11.6 / 61.0 | 17.4 / 12.4 | 13.9 / 7.4 |
| BSY | 30.7 / 20.9 | 74.8 / 59.9 | 99.9 / 99.9 | 99.9 / 99.9 | 7.9 / 5.2 | 9.5 / 6.3 | 70.8 / 59.2 | 99.9 / 99.9 | 11.1 / 6.2 | 8.7 / 4.5 | 10.7 / 6.0 | 12.6 / 6.3 |

Bootstrapped Critical Values

| Test | M1 (5, 8) | M1 (10, 16) | M1 (25, 32) | M1 (50, 64) | M2 (5, 8) | M2 (10, 16) | M2 (25, 32) | M2 (50, 64) | M3 (5, 8) | M3 (10, 16) | M3 (25, 32) | M3 (50, 64) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat W_1(0)$ | 2.5 / 2.0 | 9.0 / 3.0 | 70.0 / 56.0 | 99.5 / 99.5 | 21.5 / 14.5 | 62.0 / 46.5 | 99.5 / 97.0 | 99.5 / 99.5 | 12.5 / 7.5 | 42.0 / 25.5 | 91.5 / 82.5 | 99.5 / 99.5 |
| $\hat W_1(1)$ | 2.5 / 1.5 | 7.5 / 4.0 | 49.5 / 37.0 | 99.5 / 99.5 | 18.0 / 10.0 | 49.0 / 33.5 | 93.0 / 88.0 | 99.5 / 99.5 | 12.0 / 7.5 | 27.0 / 17.0 | 79.5 / 70.0 | 99.5 / 99.5 |
| $\hat W_1(\hat J_0)$ | 2.5 / 2.0 | 7.5 / 4.0 | 53.5 / 41.0 | 99.5 / 99.5 | 21.5 / 14.5 | 96.0 / 92.0 | 99.5 / 99.5 | 99.5 / 99.5 | 12.5 / 7.5 | 42.0 / 25.5 | 82.5 / 71.0 | 99.5 / 99.5 |
| $\hat W_2(0)$ | 4.0 / .5 | 6.5 / 4.0 | 64.5 / 51.5 | 99.5 / 99.5 | 28.5 / 17.0 | 58.5 / 47.0 | 98.5 / 97.0 | 99.5 / 99.5 | 14.0 / 6.5 | 35.5 / 24.0 | 91.5 / 83.0 | 99.5 / 99.5 |
| $\hat W_2(1)$ | 3.5 / 1.5 | 6.0 / 3.0 | 44.5 / 35.0 | 99.5 / 99.5 | 24.5 / 16.5 | 48.5 / 33.5 | 93.0 / 88.5 | 99.5 / 99.5 | 14.5 / 5.0 | 25.5 / 16.0 | 77.0 / 67.5 | 99.5 / 99.5 |
| $\hat W_2(\hat J_0)$ | 4.0 / .5 | 6.0 / 3.0 | 48.0 / 34.0 | 99.5 / 99.5 | 28.5 / 17.0 | 95.5 / 93.0 | 99.5 / 99.5 | 99.5 / 99.5 | 14.0 / 6.5 | 35.5 / 24.0 | 80.0 / 71.5 | 99.5 / 99.5 |
| $\hat K_1$ | 3.0 / 2.0 | 9.0 / 4.0 | 70.0 / 55.5 | 99.5 / 99.5 | 22.0 / 15.0 | 45.0 / 34.0 | 99.5 / 99.5 | 99.5 / 99.5 | 13.0 / 6.5 | 35.0 / 19.5 | 84.5 / 78.5 | 99.5 / 99.5 |
| $\hat K_2$ | 5.0 / 1.0 | 9.0 / 4.5 | 70.0 / 53.0 | 99.5 / 99.5 | 29.5 / 17.0 | 51.0 / 34.5 | 99.5 / 99.5 | 99.5 / 99.5 | 4.5 / 7.0 | 30.0 / 20.5 | 86.5 / 78.0 | 99.5 / 99.5 |
| BL | 1.5 / .0 | 11.0 / 4.0 | 95.0 / 90.5 | 99.5 / 99.5 | 16.0 / 8.0 | 71.5 / 58.0 | 99.5 / 99.5 | 99.5 / 99.5 | 7.0 / 2.5 | 10.0 / 3.5 | 17.5 / 11.5 | 11.5 / 8.0 |
| BSY | 20.5 / 19.5 | 67.0 / 66.5 | 99.5 / 99.5 | 99.5 / 99.5 | 4.0 / 4.0 | 8.0 / 8.0 | 99.5 / 99.5 | 99.5 / 99.5 | 6.5 / 6.5 | 6.0 / 5.5 | 2.0 / 2.0 | 8.0 / 7.5 |

Notes: (a) DGP: $Y_{it} = 5 + .5X_{it} + \mu_i + \varepsilon_{it}$, $X_{it} = .5X_{it-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $U[-.5, .5]$, $\mu_i \sim$ i.i.d. $N(0, .4)$, and Model 1: $\varepsilon_{it} = .2\varepsilon_{it-1} + z_{it}$, $i = 1, \ldots, n$; Model 2: $\varepsilon_{it} = -.2\varepsilon_{it-1} + z_{it}$, $i = 1, \ldots, n$; and Model 3: $\varepsilon_{it} = .2\varepsilon_{it-1} + z_{it}$, $i = 1, \ldots, n/2$ and $\varepsilon_{it} = -.2\varepsilon_{it-1} + z_{it}$, $i = n/2 + 1, \ldots, n$, where $z_{it} \sim$ i.i.d. $N(0,1)$. (b) $\hat W_1$, heteroskedasticity-consistent Franklin wavelet-based test; $\hat W_2$, heteroskedasticity-corrected Franklin wavelet-based test; $\hat J_0$, data-driven finest scale. (c) $\hat K_1$, heteroskedasticity-consistent Daniell kernel-based test; $\hat K_2$, heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 200 simulation replications; 200 bootstrap replications.

The top panel of Table IV reports the Type I error corrected powers against ARMA(12,4) using empirical critical values. All tests have no power when the sample size is $(10,16)$ since the effective sample size in fact is $(10,4)$. This is because we remove the first 12 time series observations for each $i$. Hence the effective samples for the results reported are $(10,4)$, $(25,20)$, $(50,52)$. $\hat W_1(\hat J_0)$ and $\hat W_2(\hat J_0)$ have the best power and dominate $\hat K_1$, $\hat K_2$, BL, and BSY against all three ARMA(12,4). This is apparently due to the fact that ARMA(12,4) has four sharp spectral peaks, which can be more effectively captured by wavelets than kernels. This confirms our prediction that wavelet-based tests are powerful in capturing spectral modes/peaks in small and finite samples. The bottom panel of Table IV reports bootstrap powers. Both $\hat W_1(\hat J_0)$ and $\hat W_2(\hat J_0)$ perform the best and clearly dominate $\hat K_1$ and $\hat K_2$ for most cases. We note that the clear dominance of $\hat W_1$ and $\hat W_2$ over $\hat K_1$ and $\hat K_2$ is reduced if bootstrap rather than empirical critical values are used. The choice of $J$ has significant impact on the power either using empirical or bootstrap critical values of $\hat W_1$ and $\hat W_2$. Data-driven $\hat J_0$ gives better power than $J = 0, 1$. Apparently due to the seasonal patterns of ARMA(12,4), the choice of $J = 0, 1$ yields little or no power for $\hat W_1$ and $\hat W_2$ against ARMA(12,4)b and ARMA(12,4)c. In contrast, $\hat J_0$ is able to adapt to the unknown serial correlation pattern and gives robust and high power. This highlights the value of our data-driven finest scale $\hat J_0$.

We also conduct a simulation study on the power under a dynamic panel model (DGP2), which we do not report in the paper to save space. The relative ranking between our tests and other tests under a dynamic model remains similar to that under a static model, and in fact the dominance of our wavelet-based tests over the kernel-based tests against ARMA(12,4) error alternatives becomes more striking. For example, the Type I error corrected powers of $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ are 67.3%, 82.8%, 45.7%, and 58.3% respectively when $(n,T) = (10,16)$. On the other hand, the relative ranking between BL and BSY is reversed: unlike under a static model, BSY now has poor power while BL is most powerful against a positive or negative (but not mixed) AR(1) error alternative.


    TABLE IV

    PERCENTAGE OF REJECTIONS UNDER THE ARMA(12, 4) ALTERNATIVE IN THE STATIC PANEL MODEL

                             Model 1                             Model 2                             Model 3
    (n, T)        (10, 16)    (25, 32)    (50, 64)    (10, 16)    (25, 32)    (50, 64)    (10, 16)    (25, 32)    (50, 64)
    Level        10%    5%   10%    5%   10%    5%   10%    5%   10%    5%   10%    5%   10%    5%   10%    5%   10%    5%

    Empirical Critical Values
    Ŵ1(0)         .0    .0  18.1  11.9  41.8  33.4    .0    .0   3.8   1.0   7.3   3.9    .0    .0   9.8   5.7  22.0  14.6
    Ŵ1(1)         .0    .0  46.3  46.6  91.5  84.1    .0    .0    .1    .0    .0    .0    .0    .0  10.8   6.0  29.3  19.7
    Ŵ1(Ĵ0)       1.0    .0  43.2  29.9  96.8  95.2    .1    .0  34.4  22.8  95.5  95.5    .0    .0  37.1  25.8  94.3  93.6
    Ŵ2(0)         .0    .0  16.1   9.5  44.2  31.4    .0    .0   2.7    .7   5.7   2.8    .0    .0   7.9   4.0  22.9  13.2
    Ŵ2(1)         .0    .0  42.0  30.1  91.3  84.1    .0    .0    .0   8.1    .0    .0    .0    .0   8.1   3.5  26.0  20.1
    Ŵ2(Ĵ0)        .0    .0  34.8  26.2  96.8  95.6    .0    .0  16.5  28.9  95.5  94.4    .0    .0  28.9  19.6  94.4  93.9
    K̂1            .0    .0  35.0  24.1  93.0  90.3    .0    .0  11.6   5.9  95.8  95.0    .0    .0  23.0  24.0  94.8  93.7
    K̂2            .0    .0  27.7  19.1  94.1  91.5    .0    .0   7.3   3.2  96.1  94.4    .0    .0  16.3  10.2  94.4  93.9
    BL           2.7    .0  29.1  17.7  19.4  12.3   2.0    .0   8.6   5.0   6.2   3.5   2.0    .0  16.2   9.8  14.7   7.2
    BSY           .0    .0   9.5   4.7   6.6   2.9    .0    .0  33.9  23.7  22.9  14.8    .0    .0  21.0  12.4  17.1  10.3

    Bootstrapped Critical Values
    Ŵ1(0)        3.0    .5  27.0  16.5  39.5  30.0   3.0   1.0   6.0   1.0  11.5   4.5   4.0   1.0  13.5   5.5  29.0  14.5
    Ŵ1(1)        2.0    .0  57.0  42.5  92.5  85.5   2.0    .5    .0    .0    .0    .0   2.0    .5  11.5   7.0  33.5  23.5
    Ŵ1(Ĵ0)       3.0    .5  55.0  42.5  96.5  95.0   3.0   1.0  16.0   9.0  97.5  97.0   4.0   1.0  34.0  25.5  94.5  94.0
    Ŵ2(0)        5.5   3.0  27.5  18.0  42.0  30.0   9.0   4.5   6.5   3.5   8.0   4.5  10.0   5.5  13.0   6.0  22.5  15.0
    Ŵ2(1)        3.5   1.0  61.0  48.0  93.5  87.0   6.5   3.5    .0    .0    .0    .0   8.0   2.5  16.0   9.5  30.0  21.5
    Ŵ2(Ĵ0)       5.5   3.0  60.5  45.5  98.0  96.0   9.0   4.5  15.5  10.5  97.0  97.0  10.0   5.5  36.0  27.5  94.5  92.5
    K̂1           1.5    .0  47.0  33.5  92.0  90.0   1.5    .0  21.0  12.5  97.5  97.5   1.5    .5  28.0  19.5  95.5  94.0
    K̂2           3.0   1.0  50.5  37.5  92.5  89.5   5.5   3.0  23.0  15.5  97.5  97.5   6.0   2.0  37.0  24.0  95.5  93.5
    BL           1.0    .5  17.0   8.5  16.5  10.5   1.5    .5   4.5   1.5   7.0   3.0   1.0    .5   8.0   3.5  12.5   6.5
    BSY           .5    .5    .0    .0   1.0    .5   1.0   1.0   5.0   5.0   8.5   7.5    .5    .5   1.5   1.5   5.0   5.0

    Notes: (a) DGP: $Y_{it} = 5 + .5X_{it} + \mu_i + \varepsilon_{it}$, $X_{it} = .5X_{it-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $U[-.5, .5]$, $\mu_i \sim$ i.i.d. $N(0, .4)$, and Model 1: $\varepsilon_{it} = -.3\varepsilon_{it-2} + z_{it} + z_{it-4}$, $i = 1, \ldots, n$; Model 2: $\varepsilon_{it} = .3\varepsilon_{it-2} + z_{it} + z_{it-4}$, $i = 1, \ldots, n$; and Model 3: $\varepsilon_{it} = -.3\varepsilon_{it-2} + z_{it} + z_{it-4}$, $i = 1, \ldots, n/2$, and $\varepsilon_{it} = .3\varepsilon_{it-2} + z_{it} + z_{it-4}$, $i = n/2 + 1, \ldots, n$, where $z_{it} \sim$ i.i.d. $N(0, 1)$. (b) Ŵ1, heteroskedasticity-consistent Franklin wavelet-based test; Ŵ2, heteroskedasticity-corrected Franklin wavelet-based test; Ĵ0, data-driven finest scale. (c) K̂1, heteroskedasticity-consistent Daniell kernel-based test; K̂2, heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 200 simulation replications; 200 bootstrap replications.
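    The data generating process in note (a) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' simulation code: the second argument of N(0, .4) is read as a variance, the burn-in length is an arbitrary choice that the notes do not specify, and the autoregressive lag is left as a parameter that defaults to the lag printed in the notes. The function names `model_ar_coefs` and `simulate_static_panel` are hypothetical.

```python
import random


def model_ar_coefs(model, n):
    """AR coefficient for each individual under Models 1-3 of the table notes."""
    if model == 1:
        return [-0.3] * n
    if model == 2:
        return [0.3] * n
    if model == 3:
        half = n // 2
        return [-0.3] * half + [0.3] * (n - half)
    raise ValueError("model must be 1, 2, or 3")


def simulate_static_panel(n, T, ar_coefs, ar_lag=2, ma_lag=4, burn_in=50, seed=0):
    """Return a list of n series, each a list of T (X_it, Y_it) pairs.

    Assumptions (not stated in the notes): N(0, .4) has variance .4,
    and `burn_in` start-up draws are discarded for each individual.
    """
    rng = random.Random(seed)
    total = burn_in + T
    panel = []
    for i in range(n):
        mu = rng.gauss(0.0, 0.4 ** 0.5)            # individual effect mu_i
        z = [rng.gauss(0.0, 1.0) for _ in range(total)]
        eps, x, rows = [], [], []
        for t in range(total):
            ar_part = ar_coefs[i] * eps[t - ar_lag] if t >= ar_lag else 0.0
            ma_part = z[t - ma_lag] if t >= ma_lag else 0.0
            eps.append(ar_part + z[t] + ma_part)    # error with AR and MA components
            x_prev = x[t - 1] if t >= 1 else 0.0
            x.append(0.5 * x_prev + rng.uniform(-0.5, 0.5))
            if t >= burn_in:
                rows.append((x[t], 5.0 + 0.5 * x[t] + mu + eps[t]))
        panel.append(rows)
    return panel
```

    For example, `simulate_static_panel(10, 16, model_ar_coefs(3, 10))` draws one Model 3 panel with (n, T) = (10, 16), in which the first half of the individuals has a negative and the second half a positive autoregressive coefficient.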


    8. CONCLUSION

    We have proposed a class of generally applicable wavelet-based consistent tests for serial correlation in static and dynamic panel models. Wavelets are powerful for detecting serial correlation where the spectrum has peaks or kinks. The new tests have a convenient limit N(0, 1) distribution, which is not affected by parameter estimation uncertainty, even if regressors contain lagged dependent variables or deterministic/stochastic trending variables. Our tests do not require an alternative model, and are consistent against serial correlation of unknown form even in the presence of substantial inhomogeneity in serial correlation across individuals. They are applicable to unbalanced heterogeneous panel models with one-way or two-way error components. A data-driven method is developed to select the smoothing parameter (the finest scale in wavelet estimation), making our tests entirely operational in practice. Simulation shows that our tests perform well in finite samples.

    Dept. of Economics and Dept. of Statistical Science, Cornell University, 424 Uris Hall, Ithaca, NY 14850, U.S.A., and Department of Economics, Tsinghua University, Beijing 100084, China; [email protected]

    and

    Dept. of Economics and Center for Policy Research, Syracuse University, 426 Eggers Hall, Syracuse, NY, U.S.A.; [email protected].

    Manuscript received October, 2000; final revision received February, 2004.

    APPENDIX A: MATHEMATICAL APPENDIX

    To prove Theorems 1–6 and Corollary 1, we will use the following lemma.

    LEMMA A.1: Suppose Assumptions 1 and 2 hold, and let $b_{J_i}(h,m)$ be as in (3.15). Then for any $J_i, T_i \in \mathbb{Z}^+$ and a bounded constant $C$ that does not depend on $i$, $J_i$ and $T_i$,

    (i) $b_{J_i}(h,m)$ is real-valued, $b_{J_i}(0,m) = b_{J_i}(h,0) = 0$, and $b_{J_i}(h,m) = b_{J_i}(m,h)$;

    (ii) $\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} h^{\upsilon}|b_{J_i}(h,m)| \le C2^{(1+\upsilon)(J_i+1)}$ for $0 \le \upsilon \le \frac{1}{2}$;

    (iii) $\sum_{h=1}^{T_i-1}\bigl[\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\bigr]^2 \le C2^{(J_i+1)}$;

    (iv) $\sum_{h_1=1}^{T_i-1}\sum_{h_2=1}^{T_i-1}\bigl[\sum_{m=1}^{T_i-1}|b_{J_i}(h_1,m)b_{J_i}(h_2,m)|\bigr]^2 \le C(J_i+1)2^{J_i+1}$;

    (v) $\bigl|\sum_{h=1}^{T_i-1} b_{J_i}(h,h) - (2^{J_i+1}-1)\bigr| \le C\bigl[(J_i+1) + 2^{J_i+1}(2^{J_i+1}/T_i)^{(2\tau-1)}\bigr]$, where $\tau$ is in Assumption 2;

    (vi) $\bigl|\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} b_{J_i}^2(h,m) - 2(2^{J_i+1}-1)\bigr| \le C\bigl[(J_i+1)^2 + 2^{J_i+1}(2^{J_i+1}/T_i)^{(2\tau-1)}\bigr]$;

    (vii) $\sup_{1\le h,m\le T_i-1}|b_{J_i}(h,m)| \le C(J_i+1)$;

    (viii) $\sup_{1\le h\le T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)| \le C(J_i+1)$.

    PROOF OF LEMMA A.1: This lemma extends Lee and Hong (2001, Lemma A.1), who consider the case where $J_i \equiv J \to \infty$ as $T_i \equiv T \to \infty$. See Hong and Kao (2002) for a detailed proof. Q.E.D.

    PROOF OF THEOREM 1: We shall show Theorems A.1–A.3.


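    THEOREM A.1: Let $\hat\alpha_{ijk}$ and $\bar\alpha_{ijk}$ be defined as in (3.12) and (6.3), and $V_{nT} \equiv \sum_{i=1}^{n}\sigma_i^8 V_{i0}$, where $V_{i0}$ is as in (3.15). Then
    \[
    V_{nT}^{-1/2}\sum_{i=1}^{n} 2\pi T_i \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk}^2 - \bar\alpha_{ijk}^2) \stackrel{p}{\to} 0.
    \]

    THEOREM A.2: Put $M_{nT} \equiv \sum_{i=1}^{n}\sigma_i^4 M_{i0}$, where $M_{i0}$ is as in (3.14). Then
    \[
    V_{nT}^{-1/2}\Bigl(\sum_{i=1}^{n} 2\pi T_i \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 - M_{nT}\Bigr) \stackrel{d}{\to} N(0,1).
    \]

    THEOREM A.3: Let $\hat M$ and $\hat V$ be defined as in (3.15). Then $V_{nT}^{-1/2}(\hat M - M_{nT}) \stackrel{p}{\to} 0$ and $\hat V/V_{nT} \stackrel{p}{\to} 1$.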

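    PROOF OF THEOREM A.1: Because $\hat\alpha_{ijk}^2 - \bar\alpha_{ijk}^2 = (\hat\alpha_{ijk} - \bar\alpha_{ijk})^2 + 2(\hat\alpha_{ijk} - \bar\alpha_{ijk})\bar\alpha_{ijk}$, we shall show Propositions A.1 and A.2 below under the conditions of Theorem 1. Q.E.D.

    PROPOSITION A.1: $V_{nT}^{-1/2}\sum_{i=1}^{n} 2\pi T_i \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})^2 = O_P[V_{nT}^{-1/2} + (n^{-1} + T^{-1})V_{nT}^{1/2}]$.

    PROPOSITION A.2: $V_{nT}^{-1/2}\sum_{i=1}^{n} 2\pi T_i \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})\bar\alpha_{ijk} \stackrel{p}{\to} 0$.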

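    PROOF OF PROPOSITION A.1: By the definition of $\hat v_{it}$ in (2.3), we have $\hat v_{it} = \tilde v_{it} - \tilde X_{it}'(\hat\beta - \beta)$, where $\tilde X_{it}$ and $\tilde v_{it}$ are as in Assumption 5. We note that under $H_0$, $\{v_{it}\}$ in (2.4) coincides with the true errors $\{\varepsilon_{it}\}$ in (2.1), and so is i.i.d. for each $i$, and $\{v_{it}\}$ and $\{v_{ls}\}$ are independent for all $i \ne l$ and all $t, s$.

    Putting $\tilde R_i(h) \equiv T^{-1}\sum_{t=|h|+1}^{T}\tilde v_{it}\tilde v_{i,t-|h|}$, we write
    \[
    \hat R_i(h) - \tilde R_i(h) = (\hat\beta-\beta)'\tilde\Gamma_{ixx}(h)(\hat\beta-\beta) - (\hat\beta-\beta)'\tilde\Gamma_{ixv}(h) - (\hat\beta-\beta)'\tilde\Gamma_{ivx}(h) \equiv \sum_{c=1}^{3}\hat\xi_{ci}(h), \tag{A.1}
    \]
    where $\tilde\Gamma_{ixx}(h) \equiv T_i^{-1}\sum_{t=|h|+1}^{T_i}\tilde X_{it}\tilde X_{i,t-|h|}'$, and $\tilde\Gamma_{ixv}(h)$ and $\tilde\Gamma_{ivx}(h)$ are as in Assumption 5.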
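    The sample autocovariance that appears in this step has the form $T^{-1}\sum_{t=|h|+1}^{T} v_t v_{t-|h|}$. A minimal sketch of that formula for a single individual follows. It assumes the input series has already been demeaned or otherwise transformed as the paper requires, and the function name `sample_autocovariance` is hypothetical.

```python
def sample_autocovariance(v, h):
    """Lag-h sample autocovariance of one series, with divisor T.

    Implements T^{-1} * sum_{t=|h|+1}^{T} v_t * v_{t-|h|} (1-based t).
    Assumption: `v` is already centered; no demeaning is done here.
    """
    T = len(v)
    lag = abs(h)
    if lag >= T:
        return 0.0  # the sum is empty when |h| >= T
    # 0-based index t runs over lag..T-1, matching 1-based t = |h|+1..T
    return sum(v[t] * v[t - lag] for t in range(lag, T)) / T
```

    The divisor is $T$ rather than $T - |h|$, matching the definition in the text, and the function is symmetric in $h$.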

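    Next, recalling the definition of $\bar R_i(h)$ as in (6.3), we can write
    \[
    \tilde R_i(h) - \bar R_i(h) = T_i^{-1}\sum_{t=|h|+1}^{T_i}\bigl(-\bar v_{i\cdot}v_{it} - \bar v_{i\cdot}\tilde v_{i,t-|h|} - v_{it}\bar v_{\cdot t-|h|} - \bar v_{\cdot t}\tilde v_{i,t-|h|} + \bar v v_{it} + \bar v\tilde v_{i,t-|h|}\bigr) \equiv \sum_{c=4}^{9}\hat\xi_{ci}(h). \tag{A.2}
    \]

    Given (A.1) and (A.2), we have $\hat\alpha_{ijk} - \bar\alpha_{ijk} = (2\pi)^{-1/2}\sum_{c=1}^{9}\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)$. It follows that
    \[
    \sum_{i=1}^{n} 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})^2 \le 25\sum_{c=1}^{9}\Bigl\{\sum_{i=1}^{n}T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Bigl[\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)\Bigr]^2\Bigr\} \equiv 25\sum_{c=1}^{9}\hat A_c. \tag{A.3}
    \]

    We shall show that $V_{nT}^{-1/2}\hat A_c \stackrel{p}{\to} 0$ for $1 \le c \le 9$. We first consider $\hat A_1$. From (A.1), we have $|\hat\xi_{1i}(h)| \le \|\hat\beta-\beta\|^2\|\tilde\Gamma_{ixx}(h)\| \le \|\hat\beta-\beta\|^2\|\tilde\Gamma_{ixx}(0)\|$. Let $b_{J_i}(h,m)$ be defined as in Lemma A.1.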


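    Then we have
    \[
    \begin{aligned}
    V_{nT}^{-1/2}|\hat A_1| &= V_{nT}^{-1/2}\Bigl|\sum_{i=1}^{n}T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\hat\xi_{1i}(h)\hat\xi_{1i}(m)\Bigr| \\
    &\le V_{nT}^{-1/2}\|\hat\beta-\beta\|^4\Bigl[\sum_{i=1}^{n}T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}^2(h,m)\Bigr]^{1/2}\Bigl[\sum_{i=1}^{n}T_i^3\|\tilde\Gamma_{ixx}(0)\|^4\Bigr]^{1/2} \\
    &= O_P(n^{-3/2}),
    \end{aligned}
    \]
    given Lemma A.1(vi), $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$, Assumptions 3–5, $T_i \le CT$, and $\sigma_i^2 \in [c, C]$.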

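    Next, we consider the second term $\hat A_2$ in (A.3). Recalling $\Gamma_{ixv}(h) \equiv \operatorname{p\,lim}_{T_i\to\infty}\tilde\Gamma_{ixv}(h)$, we have $\hat\xi_{2i}(h) = (\hat\beta-\beta)'\Gamma_{ixv}(h) + (\hat\beta-\beta)'[\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)]$. It follows that
    \[
    \begin{aligned}
    \hat A_2 &\le 2\|\hat\beta-\beta\|^2\sum_{i=1}^{n}T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Bigl\{\Bigl\|\sum_{h=1-T}^{T-1}\Gamma_{ixv}(h)\hat\Psi_{jk}(h)\Bigr\|^2 + \Bigl\|\sum_{h=1-T}^{T-1}[\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)]\hat\Psi_{jk}(h)\Bigr\|^2\Bigr\} \\
    &\equiv 2\|\hat\beta-\beta\|^2\hat M_1 + 2\|\hat\beta-\beta\|^2\hat M_2, \text{ say.}
    \end{aligned} \tag{A.4}
    \]

    We now consider $\hat M_1$ in (A.4). Let $\Lambda_{xvijk} \equiv \int_{-\pi}^{\pi}f_{ixv}(\omega)\Psi_{jk}(\omega)\,d\omega$, where $f_{xv}(\omega) \equiv (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\Gamma_{xv}(h)e^{-\mathrm{i}h\omega}$. Then $\Lambda_{xvijk} = (2\pi)^{-1/2}\sum_{h=-\infty}^{\infty}\Gamma_{ixv}(h)\hat\Psi_{jk}(h)$ by Parseval's identity, and $\sum_{h=1-T_i}^{T_i-1}\Gamma_{ixv}(h)\hat\Psi_{jk}(h) = (2\pi)^{1/2}\Lambda_{xvijk} + \sum_{|h|\ge T_i}\Gamma_{ixv}(h)\hat\Psi_{jk}(h)$. It follows by the Cauchy–Schwarz inequality that
    \[
    \begin{aligned}
    \hat M_1 &\le 2\sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\|\Lambda_{xvijk}\|^2 + 2\sum_{i=1}^{n}T_i\sum_{|h|\ge T_i}\|\Gamma_{ixv}(h)\|^2\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\sum_{|h|\ge T_i}|\hat\Psi_{jk}(h)|^2 \\
    &= O(nT) + o[nT(2^{\bar J}/T)^{2\tau}] = O(nT)
    \end{aligned} \tag{A.5}
    \]
    given $2^{\bar J}/T \to 0$, where $\bar J \equiv \max_{1\le i\le n}(J_i)$. Here, we have used the facts that (a) $\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\|\Lambda_{xvijk}\|^2 = (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\|\Gamma_{ixv}(h)\|^2 \le C$ by Parseval's identity and Assumption 5; (b) $\sum_{|h|\ge T_i}\|\Gamma_{ixv}(h)\|^2 = o(T^{-1})$ given Assumption 5 and $T_i = c_iT \ge cT$; and (c) $\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\sum_{|h|\ge T_i}|\hat\Psi_{jk}(h)|^2 \le \sum_{j=0}^{J_i}\sum_{|h|\ge T_i}|\hat\psi(2\pi h/2^j)|^2 \le C2^{2\tau J_i}/T_i^{2\tau-1}$ given Assumption 2.

    For the second term $\hat M_2$ in (A.4), we have
    \[
    \begin{aligned}
    \|\hat\beta-\beta\|^2\hat M_2 &\le \|\hat\beta-\beta\|^2\sum_{i=1}^{n}T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\,\|\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)\|\,\|\tilde\Gamma_{ixv}(m) - \Gamma_{ixv}(m)\| \\
    &= O_P\Bigl[(nT)^{-1}\sum_{i=1}^{n}\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\Bigr] \\
    &= O_P[(nT)^{-1}V_{nT}]
    \end{aligned} \tag{A.6}
    \]
    given Lemma A.1(ii), $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$, and Assumption 5. Combining (A.4)–(A.6) yields $V_{nT}^{-1/2}\hat A_2 = O_P(V_{nT}^{-1/2} + (nT)^{-1}V_{nT}^{1/2})$. Similarly, we have $V_{nT}^{-1/2}\hat A_3 = O_P(V_{nT}^{-1/2} + (nT)^{-1}V_{nT}^{1/2})$.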


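    Now we consider the term $\hat A_4$ in (A.3). By the Cauchy–Schwarz inequality and the fact that under $H_0$, $\{v_{it}\}$ coincides with $\{\varepsilon_{it}\}$ and so is i.i.d. with $Ev_{it}^8 \le C$ for each $i$, we have $E\bigl(\bar v_{i\cdot}^2\,|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}|\,|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}|\bigr) \le CT_i^{-2}$ for $h, m > 0$. It follows from Markov's inequality, Lemma A.1(ii), and $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$ that
    \[
    V_{nT}^{-1/2}\hat A_4 \le V_{nT}^{-1/2}\sum_{i=1}^{n}T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\,\bar v_{i\cdot}^2\,\Bigl|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}\Bigr|\,\Bigl|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}\Bigr| = O_P(T^{-1}V_{nT}^{1/2}). \tag{A.7}
    \]
    Similarly, we have $V_{nT}^{-1/2}\hat A_5 = O_P(T^{-1}V_{nT}^{1/2})$.

    Next, for the term $\hat A_6$ in (A.3), noting that $v_{it}$ and $\bar v_{\cdot t-h}$ are independent for $h > 0$ under $H_0$, we have $E\bigl(|\bar v_{\cdot t-h}\bar v_{\cdot t-m}|\,|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}|\,|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}|\bigr) \le C(nT_i)^{-1}$ for $h, m > 0$ by the Cauchy–Schwarz inequality and $Ev_{it}^8 \le C$. It follows that $V_{nT}^{-1/2}\hat A_6 = O_P(n^{-1}V_{nT}^{1/2})$. Similarly, we have $V_{nT}^{-1/2}\hat A_7 = O_P(n^{-1}V_{nT}^{1/2})$.

    Finally, given $E\bigl(\bar v^2\,|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}|\,|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}|\bigr) \le Cn^{-1}T_i^{-2}$ for $h, m > 0$ under $H_0$, we have $V_{nT}^{-1/2}\hat A_c = O_P[(nT)^{-1}V_{nT}^{1/2}]$ for $c = 8, 9$. We have shown $V_{nT}^{-1/2}\hat A_c \stackrel{p}{\to} 0$ for all $1 \le c \le 9$ given $\max_{1\le i\le n}2^{2(J_i+1)}/(n^2 + T) \to 0$. Proposition A.1 then follows from (A.3). Q.E.D.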

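    PROOF OF PROPOSITION A.2: Recalling $\hat\alpha_{ijk} - \bar\alpha_{ijk} = (2\pi)^{-1/2}\sum_{c=1}^{9}\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)$, we can write
    \[
    \sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})\bar\alpha_{ijk} = \sum_{c=1}^{9}\Bigl\{\sum_{i=1}^{n}T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Bigl[\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)\Bigr]\bar\alpha_{ijk}\Bigr\} \equiv \sum_{c=1}^{9}\hat\delta_c. \tag{A.8}
    \]

    We shall show $V_{nT}^{-1/2}\hat\delta_c \stackrel{p}{\to} 0$ for $1 \le c \le 9$. First, we have
    \[
    \begin{aligned}
    V_{nT}^{-1/2}|\hat\delta_1 + \hat\delta_8 + \hat\delta_9| &\le V_{nT}^{-1/2}(\hat A_1 + \hat A_8 + \hat A_9)^{1/2}\Bigl(\sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2\Bigr)^{1/2} \\
    &= O_P[n^{-3/4}V_{nT}^{1/4} + (V_{nT}/nT)^{1/2}],
    \end{aligned} \tag{A.9}
    \]
    where $V_{nT}^{-1}\sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 = O_P(1)$ by Lemma A.1(v) and $E\bar\alpha_{ijk}^2 \le CT_i^{-1}\sum_{h=1-T_i}^{T_i-1}|\hat\psi_{jk}(2\pi h)|^2$.

    Next, we consider the second term $\hat\delta_2$ in (A.8). We write
    \[
    \begin{aligned}
    \hat\delta_2 &= (\hat\beta-\beta)'\sum_{i=1}^{n}T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Bigl[\sum_{h=1-T_i}^{T_i-1}\Gamma_{ixv}(h)\hat\Psi_{jk}(h) + \sum_{h=1-T_i}^{T_i-1}[\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)]\hat\Psi_{jk}(h)\Bigr]\bar\alpha_{ijk} \\
    &\equiv (\hat\beta-\beta)'\hat M_3 + (\hat\beta-\beta)'\hat M_4, \text{ say.}
    \end{aligned} \tag{A.10}
    \]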


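    For the first term $\hat M_3$, noting that $\{\bar\alpha_{ijk}\}$ is a zero-mean sequence independent across $i$, we obtain
    \[
    \begin{aligned}
    E\hat M_3^2 &= \sum_{i=1}^{n}T_i^2E\Bigl\|\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\Gamma_{ixv}(h)\bar R_i(m)\Bigr\|^2 \\
    &\le \sum_{i=1}^{n}\sigma_i^4T_i\Bigl[\sum_{h=1}^{T_i-1}\|\Gamma_{ixv}(h)\|\Bigr]^2\sup_{1\le h\le T_i-1}\Bigl[\sum_{m=1}^{T_i-1}b_{J_i}^2(h,m)\Bigr]^2 \\
    &= O\Bigl[T\sum_{i=1}^{n}(J_i+1)^2\Bigr] = O(TV_{nT})
    \end{aligned}
    \]
    given Assumption 5, Lemma A.1(viii), and $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$. It follows that $V_{nT}^{-1/2}(\hat\beta-\beta)'\hat M_3 = O_P(n^{-1/2})$ by Chebyshev's inequality. For the second term $\hat M_4$ in (A.10), we have $V_{nT}^{-1/2}|(\hat\beta-\beta)'\hat M_4| \le V_{nT}^{-1/2}\|\hat\beta-\beta\|\hat M_2^{1/2}\bigl(\sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2\bigr)^{1/2} = O_P[(nT)^{-1}V_{nT}^{1/2}]$, where $\hat M_2 = O_P[(nT)^{-1}V_{nT}]$ as shown in (A.6). It follows from (A.10) that $V_{nT}^{-1/2}\hat\delta_2 = O_P(n^{-1/2} + (nT)^{-1}V_{nT}^{1/2})$. Similarly, we have $V_{nT}^{-1/2}\hat\delta_3 = O_P(n^{-1/2} + (nT)^{-1}V_{nT}^{1/2})$.

    We now consider $\hat\delta_4$ in (A.8). Write $\hat\delta_4 = \sum_{i=1}^{n}T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\hat\xi_{4i}(h)\bar R_i(m)$. Using Lemma A.1(ii) and $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$, we can obtain
    \[
    \begin{aligned}
    E\hat\delta_4^2 &= \sum_{i=1}^{n}\sum_{l=1}^{n}T_iT_l\sum_{h_1=1}^{T_i-1}\sum_{h_2=1}^{T_i-1}\sum_{m_1=1}^{T_i-1}\sum_{m_2=1}^{T_i-1}b_{J_i}(h_1,m_1)b_{J_l}(h_2,m_2)\,E[\hat\xi_{4i}(h_1)\bar R_i(m_1)\hat\xi_{4l}(h_2)\bar R_l(m_2)] \\
    &\le CT^{-1}\sum_{i=1}^{n}\Bigl[\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\Bigr]^2 + CT^{-2}\Bigl[\sum_{i=1}^{n}\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\Bigr]^2 \\
    &\le C(2^{\bar J}/T)V_{nT} + CT^{-2}V_{nT}^2,
    \end{aligned}
    \]
    where the first inequality follows from the facts that (a) $|E[\hat\xi_{4i}(h_1)\bar R_i(m_1)\hat\xi_{4i}(h_2)\bar R_i(m_2)]| \le CT_i^{-3/2}T_l^{-3/2}$, and (b) for $i \ne l$, $|E[\hat\xi_{4i}(h_1)\bar R_i(m_1)\hat\xi_{4l}(h_2)\bar R_l(m_2)]| \le CT_i^{-2}T_l^{-2}$, which can be shown by exploiting the facts that under $H_0$, $\{v_{it}\}$ coincides with $\{\varepsilon_{it}\}$, and so is i.i.d. with $E(v_{it}^8) \le C$ for each $i$, and $\{v_{it}\}$ and $\{v_{lt}\}$ are mutually independent for $i \ne l$. Hence, we have $V_{nT}^{-1/2}\hat\delta_4 = O_P(2^{\bar J/2}/T^{1/2} + V_{nT}^{1/2}/T)$ by Chebyshev's inequality. Similarly, we can obtain $V_{nT}^{-1/2}\hat\delta_5 = O_P(2^{\bar J/2}/T^{1/2} + V_{nT}^{1/2}/T)$.

    Next, we consider $\hat\delta_6$. Write $\hat\delta_6 = \sum_{i=1}^{n}T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\hat\xi_{6i}(h)\bar R_i(m)$, where $\hat\xi_{6i}(h) = T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}\bar v_{\cdot t-h}$ as in (A.2). Then using Lemma A.1(ii) and $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$, we can obtain
    \[
    \begin{aligned}
    E\hat\delta_6^2 &= \sum_{i=1}^{n}\sum_{l=1}^{n}T_iT_l\sum_{h_1=1}^{T_i-1}\sum_{h_2=1}^{T_i-1}\sum_{m_1=1}^{T_i-1}\sum_{m_2=1}^{T_i-1}b_{J_i}(h_1,m_1)b_{J_l}(h_2,m_2)\,E[\hat\xi_{6i}(h_1)\bar R_i(m_1)\hat\xi_{6l}(h_2)\bar R_l(m_2)] \\
    &\le Cn^{-1}\sum_{i=1}^{n}\Bigl[\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\Bigr]^2 + Cn^{-2}\Bigl[\sum_{i=1}^{n}\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\Bigr]^2 \\
    &\le C(2^{\bar J}/T)V_{nT} + Cn^{-2}V_{nT}^2,
    \end{aligned}
    \]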


    where for the first inequality, we have used the facts that (a) $|E[\hat\xi_{6i}(h_1)\bar R_i(m_1)\hat\xi_{6i}(h_2)\bar R_i(m_2)]| \le CT_i^{-2}n^{-1}$; (b) for $i \ne l$, $|E[\hat\xi_{6i}(h_1)\bar R_i(m_1)\hat\xi_{6l}(h_2)\bar R_l(m_2)]| \le CT_i^{-1}T_l^{-1}n^{-2}$, which can be shown by exploiting the i.i.d. property of $\{v_{it}\}$ and the independence between $\{v_{it}\}$ and $\{v_{lt}\}$ for $i \ne l$ under $H_0$ via tedious algebra. It follows by Chebyshev's inequality and $2^{\bar J}/n \to 0$ that $V_{nT}^{-1/2}\hat\delta_6 = O_P(2^{\bar J/2}/T^{1/2} + V_{nT}^{1/2}/n) \stackrel{p}{\to} 0$. Similarly, we have $V_{nT}^{-1/2}\hat\delta_7 = O_P(2^{\bar J/2}/T^{1/2} + V_{nT}^{1/2}/n) \stackrel{p}{\to} 0$. We have shown $V_{nT}^{-1/2}\hat\delta_c \stackrel{p}{\to} 0$ for $1 \le c \le 9$ given $\max_{1\le i\le n}2^{2(J_i+1)}/(n^2 + T) \to 0$. Proposition A.2 then follows from (A.8). Q.E.D.

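    PROOF OF THEOREM A.2: Recalling the definition of $\bar\alpha_{ijk}$ in (6.3), we can write
    \[
    \sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 = \sum_{i=1}^{n}T_i\sum_{h=1}^{T-1}\sum_{m=1}^{T-1}b_{J_i}(h,m)\bar R_i(h)\bar R_i(m) \equiv \sum_{i=1}^{n}(\hat A_i + \hat B_{1i} - \hat B_{2i} - \hat B_{3i}),
    \]
    where
    \[
    \begin{aligned}
    \hat A_i &\equiv 2T_i^{-1}\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\sum_{t=2}^{T_i}\sum_{s=1}^{t-1}v_{it}v_{it-h}v_{is}v_{is-m} \quad \text{(by symmetry of } b_{J_i}(\cdot,\cdot)\text{)}, \\
    \hat B_{1i} &\equiv T_i^{-1}\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\sum_{t=1}^{T_i}v_{it}^2v_{it-h}v_{it-m}, \\
    \hat B_{2i} &\equiv T_i^{-1}\sum_{h=1}^{T-1}\sum_{m=1}^{T-1}b_{J_i}(h,m)\sum_{t=1}^{h}\sum_{s=m+1}^{T_i}v_{it}v_{it-h}v_{is}v_{is-m}, \\
    \hat B_{3i} &\equiv T_i^{-1}\sum_{h=1}^{T-1}\sum_{m=1}^{T-1}b_{J_i}(h,m)\sum_{t=1}^{T_i}\sum_{s=1}^{m}v_{it}v_{it-h}v_{is}v_{is-m}.
    \end{aligned}
    \]

    Note again that under $H_0$, $\{v_{it}\}$ coincides with $\{\varepsilon_{it}\}$, and so is i.i.d. for each $i$, and $\{v_{it}\}$ and $\{v_{ls}\}$ are independent for $i \ne l$ and all $t, s$. Q.E.D.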

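    PROPOSITION A.3: $V_{nT}^{-1/2}\bigl(\sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 - M_{nT}\bigr) = V_{nT}^{-1/2}\sum_{i=1}^{n}\hat A_i + o_P(1)$.

    Next, we decompose $\hat A_i$ into the terms with $t - s > q_i$ and $t - s \le q_i$, for some integer $q_i \in \mathbb{Z}^+$:
    \[
    \hat A_i = 2T_i^{-1}\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}b_{J_i}(h,m)\Bigl(\sum_{t=q_i+2}^{T_i}\sum_{s=1}^{t-q_i-1} + \sum_{t=2}^{T_i}\sum_{s=\max(t-q_i,1)}^{t-1}\Bigr)v_{it}v_{it-h}v_{is}v_{is-m} \equiv \hat B_i + \hat B_{4i}. \tag{A.11}
    \]
    Furthermore, we decompose
    \[
    \hat B_i = 2T_i^{-1}\Bigl(\sum_{h=1}^{q_i}\sum_{m=1}^{q_i} + \sum_{h=1}^{q_i}\sum_{m=q_i+1}^{T_i-1} + \sum_{h=q_i+1}^{T_i-1}\sum_{m=1}^{T_i-1}\Bigr)b_{J_i}(h,m)\sum_{t=q_i+2}^{T_i}\sum_{s=1}^{t-q_i-1}v_{it}v_{it-h}v_{is}v_{is-m} \equiv \hat U_i + \hat B_{5i} + \hat B_{6i}. \tag{A.12}
    \]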
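    The index bookkeeping behind (A.11) and (A.12) can be checked numerically. The sketch below is only an illustration: it enumerates the time-index pairs $(t, s)$ and the lag pairs $(h, m)$ for a small sample and confirms that the pieces are disjoint and together cover the original ranges. It assumes that the lag sums in (A.12) run up to $T_i - 1$, as in (A.11), and that $q_i < T_i - 1$; the function names are hypothetical.

```python
def split_time_pairs(T, q):
    """Split {(t, s): 2 <= t <= T, 1 <= s <= t-1} as in (A.11).

    `far` holds the pairs with t - s > q, `near` those with t - s <= q.
    """
    far = [(t, s) for t in range(q + 2, T + 1) for s in range(1, t - q)]
    near = [(t, s) for t in range(2, T + 1) for s in range(max(t - q, 1), t)]
    return far, near


def split_lag_pairs(T, q):
    """Split {(h, m): 1 <= h, m <= T-1} into the three blocks of (A.12).

    Assumption: the upper limit of the lag sums is T - 1.
    """
    both_small = [(h, m) for h in range(1, q + 1) for m in range(1, q + 1)]
    m_large = [(h, m) for h in range(1, q + 1) for m in range(q + 1, T)]
    h_large = [(h, m) for h in range(q + 1, T) for m in range(1, T)]
    return both_small, m_large, h_large
```

    With $T = 8$ and $q = 3$, for instance, the 28 time-index pairs split into 10 pairs with $t - s > q$ and 18 pairs with $t - s \le q$, and the 49 lag pairs split into blocks of 9, 12, and 28.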

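    PROPOSITION A.4: Suppose Assumptions 1 and 2 hold, $2^{2\bar J}/T \to 0$, $q_i \equiv q_i(T_i) \to \infty$, $q_i/2^{J_i} \to \infty$, and $q_i^2/T_i \to 0$, where $\bar J \equiv \max_{1\le i\le n}(J_i)$. If $\{v_{it}\}$ is i.i.d. for each $i$, then $V_{nT}^{-1/2}\sum_{i=1}^{n}\hat A_i = V_{nT}^{-1/2}\sum_{i=1}^{n}\hat U_i + o_P(1)$.

    PROPOSITION A.5: Under the conditions of Proposition A.4, $V_{nT}^{-1/2}\sum_{i=1}^{n}\hat U_i \stackrel{d}{\to} N(0,1)$.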


    Propositions A.3–A.5 and the Slutsky theorem imply Theorem A.2. We now prove Propositions A.3–A.5.

    PROOF OF PROPOSITION A.3: Recall the definition of $M_{nT}$ in Theorem A.2. We obtain
    \[
    \sum_{i=1}^{n}2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 - M_{nT} = \sum_{i=1}^{n}\hat A_i + \sum_{i=1}^{n}(\hat B_{1i} - \sigma_i^4M_{i0}) - \sum_{i=1}^{n}\hat B_{2i} - \sum_{i=1}^{n}\hat B_{3i}.
    \]
    We shall show (a) $V_{nT}^{-1/2}(\sum_{i=1}^{n}\hat B_{1i} - M_{nT}) \stackrel{p}{\to} 0$; (b) $V_{nT}^{-1/2}\sum_{i=1}^{n}\hat B_{2i} \stackrel{p}{\to} 0$; and (c) $V_{nT}^{-1/2}\sum_{i=1}^{n}\hat B_{3i} \stackrel{p}{\to} 0$.

    (a) Observe that $\hat B_{1i}$ has a structure similar to $\hat B_{1n}$ in Lee and Hong (2001). Following Lee and Hong's (2001) reasoning and using Lemma A.1(ii), we can obtain that for each $i$ and for $T_i$ sufficiently large,
    \[
    E(\hat B_{1i} - E\hat B_{1i})^2 \le CT_i^{-1}\Bigl[\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\Bigr]^2 \le C^2(2^{\bar J}/T)\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|.
    \]
    Because $\{\hat B_{1i}\}$ is a random sequence independent across $i$ and $\sum_{i=1}^{n}E\hat B_{1i} = M_{nT}$, we have $E(\sum_{i=1}^{n}\hat B_{1i} - M_{nT})^2 = \sum_{i=1}^{n}E(\hat B_{1i} - E\hat B_{1i})^2 = O(V_{nT}2^{\bar J}/T)$ given Lemma A.1(ii) and $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$. Hence, by Chebyshev's inequality and $2^{2\bar J}/T \to 0$, we have $V_{nT}^{-1/2}(\sum_{i=1}^{n}\hat B_{1i} - M_{nT}) = O_P[(2^{\bar J}/T)^{1/2}] = o_P(1)$.

    (b) Next, we consider $\hat B_{2i}$. Following Lee and Hong (2001), we have $E\hat B_{2i}^2 \le CT_i^{-1}\bigl[\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\bigr]^3$. Then by the fact that $\hat B_{2i}$ is a zero-mean random sequence independent across $i$, Lemma A.1(ii), and $V_{nT} \le C\sum_{i=1}^{n}2^{J_i+1}$, we have $E(\sum_{i=1}^{n}\hat B_{2i})^2 = \sum_{i=1}^{n}E\hat B_{2i}^2 = O[(2^{2\bar J}/T)V_{nT}]$. Hence, $V_{nT}^{-1/2}\sum_{i=1}^{n}\hat B_{2i} \stackrel{p}{\to} 0$ by Chebyshev's inequality and $2^{2\bar J}/T \to 0$.

    (c) By reasoning similar to (b), we can obtain $V_{nT}^{-1/2}\hat B_{3i} \stackrel{p}{\to} 0$. Q.E.D.

    PROOF OF PROPOSITION A.4: Given (A.11) and (A.12), we have $\hat A_i = \hat U_i + \hat B_{4i} + \hat B_{5i} + \hat B_{6i}$. It suffices to show $V_{nT}^{-1/2}\sum_{i=1}^{n}\hat B_{ci} \stackrel{p}{\to} 0$ for $c = 4, 5, 6$. (a) We first consider $\hat B_{4i}$ in (A.11). From Lee and Hong (2001, Proof of Theorem 1), we have for each $i$ and for $T_i$ sufficiently large,

    EB̂24i ≤C(qi/Ti)[Ti−1∑h=1

    Ti−1∑m=1

    |bJi (h�m)]2

    ≤ C2(q̄2J̄ /T )Ti−1