  • Unit Roots, Cointegration, and Structural Change

Time series analysis has undergone many changes in recent years with the advent of unit roots and cointegration. Maddala and Kim present a comprehensive review of these important developments and examine structural change. The volume provides an analysis of unit root tests, problems with unit root testing, estimation of cointegrated systems, cointegration tests, and econometric estimation with integrated regressors. The authors also present the Bayesian approach to these problems and bootstrap methods for small-sample inference. The chapters on structural change discuss the problems of unit root tests and cointegration under structural change, outliers and robust methods, the Markov switching model, and Harvey's structural time series model. Unit Roots, Cointegration, and Structural Change is a major contribution to Themes in Modern Econometrics, of interest both to specialists and to graduate and upper-undergraduate students.

G. S. MADDALA is University Eminent Scholar at the Ohio State University and one of the most distinguished econometricians writing today. His many acclaimed publications include Limited-Dependent and Qualitative Variables in Econometrics (Cambridge, 1983), Econometrics (McGraw-Hill, 1977), and Introduction to Econometrics (Macmillan, 1988, 1992).

IN-MOO KIM is Professor of Economics at Sung Kyun Kwan University, Seoul, Korea.

  • UNIT ROOTS, COINTEGRATION, AND STRUCTURAL CHANGE

    G. S. Maddala
    The Ohio State University

    In-Moo Kim
    Sung Kyun Kwan University

    CAMBRIDGE UNIVERSITY PRESS

  • CAMBRIDGE UNIVERSITY PRESS
    Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

    Cambridge University Press
    The Edinburgh Building, Cambridge CB2 8RU, UK

    Published in the United States of America by Cambridge University Press, New York

    www.cambridge.org
    Information on this title: www.cambridge.org/9780521582575

    © Cambridge University Press 1998

    This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

    First published 1998
    Sixth printing 2004

    A catalogue record for this publication is available from the British Library

    ISBN 978-0-521-58257-5 hardback
    ISBN 978-0-521-58782-2 paperback

    Transferred to digital printing 2007

  • To my parents

    G. S. Maddala

    To Jong Han, Jung Youn, and So Youn

    In-Moo Kim

  • Contents

    Figures
    Tables
    Preface

    Part I Introduction and basic concepts

    1 Introduction
      References

    2 Basic concepts
      2.1 Stochastic processes
      2.2 Some commonly used stationary models
      2.3 Box-Jenkins methods
      2.4 Integrated variables and cointegration
      2.5 Spurious regression
      2.6 Deterministic trend and stochastic trend
      2.7 Detrending methods
      2.8 VAR, ECM, and ADL
      2.9 Unit root tests
      2.10 Cointegration tests and ECM
      2.11 Summary
      References

    Part II Unit roots and cointegration

    3 Unit roots
      3.1 Introduction
      3.2 Unit roots and Wiener processes
      3.3 Unit root tests without a deterministic trend
      3.4 DF test with a linear deterministic trend
      3.5 Specification of deterministic trends
      3.6 Unit root tests for a wide class of errors
      3.7 Sargan-Bhargava and Bhargava tests
      3.8 Variance ratio tests
      3.9 Tests for TSP versus DSP
      3.10 Forecasting from TS versus DS models
      3.11 Summary and conclusions
      References

    4 Issues in unit root testing
      4.1 Introduction
      4.2 Size distortion and low power of unit root tests
      4.3 Solutions to the problems of size and power
      4.4 Problem of overdifferencing: MA roots
      4.5 Tests with stationarity as null
      4.6 Confirmatory analysis
      4.7 Frequency of observations and power of unit root tests
      4.8 Other types of nonstationarity
      4.9 Panel data unit root tests
      4.10 Uncertain unit roots and the pre-testing problem
      4.11 Other unit root tests
      4.12 Median-unbiased estimation
      4.13 Summary and conclusions
      References

    5 Estimation of cointegrated systems
      5.1 Introduction
      5.2 A general CI system
      5.3 A two-variable model: Engle-Granger methods
      5.4 A triangular system
      5.5 System estimation methods
      5.6 The identification problem
      5.7 Finite sample evidence
      5.8 Forecasting in cointegrated systems
      5.9 Miscellaneous other problems
      5.10 Summary and conclusions
      References

    6 Tests for cointegration
      6.1 Introduction
      6.2 Single equation methods: residual-based tests
      6.3 Single equation methods: ECM tests
      6.4 Tests with cointegration as null
      6.5 Multiple equation methods
      6.6 Cointegration tests based on LCCA
      6.7 Other tests for cointegration
      6.8 Miscellaneous other problems
      6.9 Of what use are cointegration tests?
      6.10 Conclusions
      References

    7 Econometric modeling with integrated regressors
      7.1 I(1) regressors not cointegrated
      7.2 I(1) regressors cointegrated
      7.3 Unbalanced equations
      7.4 Lagged dependent variables: the ARDL model
      7.5 Uncertain unit roots
      7.6 Uncertain unit roots and cointegration
      7.7 Summary and conclusions
      References

    Part III Extensions of the basic model

    8 The Bayesian analysis of stochastic trends
      8.1 Introduction to Bayesian inference
      8.2 The posterior distribution of an autoregressive parameter
      8.3 Bayesian inference on the Nelson-Plosser data
      8.4 The debate on the appropriate prior
      8.5 Classical tests versus Bayesian tests
      8.6 Priors and time units of measurement
      8.7 On testing point null hypotheses
      8.8 Further comments on prior distributions
      8.9 Bayesian inference on cointegrated systems
      8.10 Bayesian long-run prediction
      8.11 Conclusion
      References

    9 Fractional unit roots and fractional cointegration
      9.1 Some definitions
      9.2 Unit root tests against fractional alternatives
      9.3 Estimation of ARFIMA models
      9.4 Estimation of fractionally cointegrated models
      9.5 Empirical relevance of fractional unit roots
      9.6 Summary and conclusions
      References

    10 Small sample inference: bootstrap methods
      10.1 Introduction
      10.2 A review of the bootstrap approach
      10.3 The AR(1) model
      10.4 Bootstrapping unit root tests
      10.5 The moving block bootstrap and extensions
      10.6 Issues in bootstrapping cointegrating regressions
      10.7 Miscellaneous other applications
      10.8 Conclusions
      References

    11 Cointegrated systems with I(2) variables
      11.1 Determination of the order of differencing
      11.2 Cointegration analysis with I(2) and I(1) variables
      11.3 Empirical applications
      11.4 Summary and conclusions
      References

    12 Seasonal unit roots and seasonal cointegration
      12.1 Effect of seasonal adjustment
      12.2 Seasonal integration
      12.3 Tests for seasonal unit roots
      12.4 The unobserved component model
      12.5 Seasonal cointegration
      12.6 Estimation of seasonally cointegrated systems
      12.7 Empirical evidence
      12.8 Periodic autoregression and periodic integration
      12.9 Periodic cointegration and seasonal cointegration
      12.10 Time aggregation and systematic sampling
      12.11 Conclusion
      References

    Part IV Structural change

    13 Structural change, unit roots, and cointegration
      13.1 Tests for structural change
      13.2 Tests with known break points
      13.3 Tests with unknown break points
      13.4 A summary assessment
      13.5 Tests for unit roots under structural change
      13.6 The Bayesian approach
      13.7 A summary assessment of the empirical work
      13.8 Effect of structural change on cointegration tests
      13.9 Tests for structural change in cointegrated relationships
      13.10 Miscellaneous other issues
      13.11 Practical conclusions
      References

    14 Outliers and unit roots
      14.1 Introduction
      14.2 Different types of outliers in time series models
      14.3 Effects of outliers on unit root tests
      14.4 Outlier detection
      14.5 Robust unit root tests
      14.6 Robust estimation of cointegrating regressions
      14.7 Outliers and seasonal unit roots
      14.8 Conclusions
      References

    15 Regime switching models and structural time series models
      15.1 The switching regression model
      15.2 The Markov switching regression model
      15.3 The Hamilton model
      15.4 On the usefulness of the MSR model
      15.5 Extensions of the MSR model
      15.6 Gradual regime switching models
      15.7 A model with parameters following a random walk
      15.8 A general state-space model
      15.9 Derivation of the Kalman filter
      15.10 Harvey's structural time series model (1989)
      15.11 Further comments on structural time series models
      15.12 Summary and conclusions
      References

    16 Future directions
      References

    Appendix 1 A brief guide to asymptotic theory
    Author index
    Subject index

  • Figures

    2.1 Correlogram of an AR(2) model
    2.2 Examples of two AR(1) processes with a drift
    2.3 The variances of $x_t$ and $y_t$
    2.4 The autocorrelations of $x_t$ and $y_t$
    2.5 Cointegrated and independent I(1) variables
    2.6 ARIMA(0,1,1) and its components
    3.1 Random walk and step function
    8.1 Marginal posterior distributions of $\rho$ when $\rho = 1$

  • Tables

    2.1 Regression of integrated variables
    3.1 Critical values for Dickey-Fuller tests
    3.2 Asymptotic distributions of the t-ratios for different DGPs
    3.3 Critical values for the Schmidt-Phillips LM test
    3.4 Nelson and Plosser's results
    4.1 Critical values of $DF_{max}$ statistics
    4.2 Critical values for the Elliott-Rothenberg-Stock DF-GLS test
    4.3 Critical values for the Hwang-Schmidt DF-GLS test (t-test)
    4.4 Critical values for the KPSS test
    4.5 Quantiles of the LS estimator in an AR(1) model with drift and trend
    6.1 Critical values for the ADF $t$-statistic and $Z_t$
    6.2 Critical values for the $Z_\alpha$
    6.3 Response surface estimates of critical values
    6.4 Critical values for the Harris and Inder test
    6.5 Quantiles of the asymptotic distribution of Johansen's LR test statistics
    6.6 Critical values of the LCCA-based tests
    7.1 Features of regressions among series with various orders of integration
    8.1 Posterior probabilities for the Nelson-Plosser data
    12.1 Critical values for seasonal unit roots in quarterly data
    12.2 Critical values for seasonal unit roots in monthly data
    13.1 Asymptotic critical values for the diagnostic test

  • Nothing is so powerful as an idea whose time has come.

    Victor Hugo

    The Gods love the obscure and hate the obvious.

    Brihadaranyaka Upanishad

    Undue emphasis on niceties is a disease to which persons with mathematical training are especially prone.

    G. A. Barnard, "A Comment on E. S. Pearson's Paper," Biometrika, 1947, 34, 123-128.

    Simplicity, simplicity, simplicity! I say, let your affairs be as two or three, and not a hundred or a thousand. Simplify, simplify.

    H. D. Thoreau: Walden

  • Preface

    The area of unit roots, cointegration, and structural change has been an area of intense and active research during the past decade. Developments have been proceeding at a fast pace. However, almost all the books are technically oriented and do not bring together the different strands of research in this area. Even if many new developments are going to take place, we thought it was time to provide an overview of this area for the benefit of empirical as well as theoretical researchers. Those who are doing empirical research will benefit from the comprehensive coverage of the book. For those who are doing theoretical research, particularly graduate students starting on their dissertation work, the present book will provide an overview and perspective of this area. It is very easy for graduate students to get lost in the intricate algebraic detail of a particular procedure and lose sight of the general framework their work fits into.

    Given the broad coverage we have aimed at, it is possible that we have missed several papers. This is not because they are not important but because of oversight and/or our inability to cover too many topics.

    To keep the book within reasonable length and also to provide accessibility to a broad readership, we have omitted the proofs and derivations throughout. These can be found by interested readers in the papers cited. Parts of the book were used at different times in graduate courses at the University of Florida, the Ohio State University, Caltech, the State University of New York at Buffalo, and Sung Kyun Kwan University in Korea.

    We would like to thank Adrian Pagan, Peter Phillips, and an anonymous referee for many helpful comments on an earlier draft. Thanks are also due to Professor Chul-Hwan Kim of Ajou University, Young Se Kim of Sung Kyun Kwan University, and Marine Carrasco of the Ohio State University for their helpful comments. Responsibility for any remaining errors is ours. We would also like to thank Patrick McCartan at the Cambridge University Press for his patience in the production of this book.

    G. S. Maddala
    The Ohio State University, U.S.A.

    In-Moo Kim
    Sung Kyun Kwan University, Seoul, Korea

  • Part I Introduction and basic concepts

    This part consists of two chapters. Chapter 1 is just an outline of the book. Chapter 2 introduces the basic concepts: stochastic processes; stationarity; the different kinds of commonly used stationary models (MA, AR, ARMA); Box-Jenkins methods; integrated variables and cointegration; spurious regression; deterministic and stochastic trends; detrending methods; VAR, ECM, and ADL models; and tests for unit roots and cointegration.

    All these topics are pursued in subsequent chapters, and some of the statements made here (regarding unit root and cointegration tests) are qualified there. The point here is to explain what all these terms mean.

  • 1 Introduction

    During the last decade, the econometric literature on unit roots and cointegration has literally exploded. The statistical theory relating to first-order autoregressive processes where the autoregressive parameter is equal to one (unstable process) or greater than one (explosive process) was developed by Anderson (1959), White (1958, 1959), and Rao (1961) (see Fuller (1985) for a review). However, the econometric literature on unit roots took off after the publication of the paper by Nelson and Plosser (1982), which argued that most macroeconomic series have unit roots and that this is important for the analysis of macroeconomic policies.

    Similar is the story on cointegration. Yule (1926) suggested that regressions based on trending time series data can be spurious. This problem of spurious regressions was further pursued by Granger and Newbold (1974), and this also led to the development of the concept of cointegration (loosely speaking, lack of cointegration means spurious regression). Again, the pathbreaking paper by Granger (1981), first presented at a conference at the University of Florida in 1980, did not catch fire until about five years later, and now the literature on cointegration has exploded. As for historical antecedents, Hendry and Morgan (1989) argue that Frisch's concept of multicollinearity in 1934 can be viewed as a forerunner of the modern concept of cointegration.

    The recent developments on unit roots and cointegration have changed the way time series analysis is conducted. Of course, the publication of the book by Box and Jenkins (1970) changed the methods of time series analysis, but the recent developments have formalized and made systematic the somewhat ad hoc methods in Box and Jenkins. Moreover, the asymptotic theory for these models has been developed.

    Traditionally, the analysis of time series consisted of a decomposition of the series into trend, seasonal, and cyclical components. We can write the time series $x_t$ as

    $$x_t = T_t + S_t + C_t$$

    where $T_t$ = trend, $S_t$ = seasonal, and $C_t$ = the remaining component, which we might call the cyclical component. The trend and seasonal were first removed, and then the residual was explained by an elaborate model. The trend, which is a long-term component, was considered to be beyond the scope of explanation, and so was the seasonal. Attention thus focused entirely on the short-term dynamics as described by $C_t$.

    The earliest approaches to trend removal consisted of regressing the time series on $t$ (if the trend is considered linear) or on a polynomial of $t$ (if the trend is considered to be nonlinear). Elaborate methods have been devised for the removal of the seasonal. These are described in Hylleberg (1992), but one common method used in the analysis of seasonally unadjusted data was to use seasonal dummies (see Lovell, 1963).

    By contrast to the regression methods for trend removal and seasonal adjustment, the methods suggested in Box and Jenkins (1970) consisted of removing the trend and seasonal by successive differencing. Define

    $$\Delta x_t = x_t - x_{t-1}$$
    $$\Delta^2 x_t = \Delta\Delta x_t = x_t - 2x_{t-1} + x_{t-2}$$
    $$\Delta_4 x_t = x_t - x_{t-4}$$
    $$\Delta_{12} x_t = x_t - x_{t-12}$$

    A linear trend is removed by considering $\Delta x_t$, a quadratic trend by considering $\Delta^2 x_t$. With quarterly data the seasonal is removed by considering $\Delta_4 x_t$. With monthly data the seasonal is removed by considering $\Delta_{12} x_t$.

    Thus, there are two approaches to the removal of trend and seasonal:

    (i) regression methods,
    (ii) differencing methods.

    During the 1980s there were two major developments. The first was a systematic discussion of these two methods and the derivation of tests to determine which is more appropriate for a given series. The second major development was the argument that trend and seasonal contain important information and that they are to be explained rather than removed. Methods have been devised to estimate long-run economic relationships. If we are considering two time series, say $x_t$ and $y_t$, then the trend in $x_t$ may be related to the trend in $y_t$ (common trends) and the seasonal in $x_t$ may be related to the seasonal in $y_t$ (common seasonals). Thus modeling the trend and the seasonal should form an integral part of the analysis of the time series, rather than a concentration on the short-run dynamics between $x_t$ and $y_t$.

    Two questions also arose in this context. If one is also considering the problem of determining long-run relationships, how long a time series do we need to consider? Is it 20 years or 100 years, and is having 240 monthly observations better than having 20 yearly observations? Also, if we are considering a span of 100 years, would the parameters in the estimated relationships be stable over such a long period? This is the problem of structural change. These are the issues that will be covered in this book. In the following chapters these problems are discussed in detail.

    The book is divided into four parts.

    Part I: Introduction and basic concepts
    This consists of this chapter and chapter 2:

    Chapter 2 introduces the basic concepts of ARMA models, unit roots and cointegration, spurious regression, Vector Autoregression (VAR), and Error Correction Models (ECM).

    Part II: Unit roots and cointegration
    This consists of chapters 3 to 7:

    Chapter 3 discusses the different unit root tests. It also has an introduction to Wiener processes that will repeatedly be used in the rest of the book.

    Chapter 4 discusses issues relating to the power of unit root tests, tests using stationarity as the null, tests for MA unit roots, LM tests for unit roots, and other important problems with unit root testing.

    Chapter 5 discusses the different methods of estimation of cointegrated systems.

    Chapter 6 discusses different tests for cointegration. These use no cointegration as the null. The chapter also covers tests using cointegration as the null.

    Chapter 7 discusses the issues that arise in modeling with integrated regressors.


    Part III: Extensions of the basic model and alternative approaches to inference
    This consists of chapters 8 to 12:

    Chapter 8 is on Bayesian analysis of unit roots and cointegration and the Bayesian approach to model selection.

    Chapter 9 is on fractional unit roots and fractional cointegration.

    Chapter 10 is on bootstrap methods, which are alternatives to asymptotic inference in the preceding chapters.

    Chapter 11 extends the analysis of the previous chapters to the case of I(2) variables. It discusses issues of testing I(2) versus I(1) and I(1) versus I(2) as well as modeling systems with I(2), I(1), and I(0) variables.

    Chapter 12 is devoted to the analysis of seasonal data, tests for seasonal unit roots, tests for seasonal cointegration, and estimation of seasonally cointegrated systems.

    Part IV: Structural change
    Chapters 13 to 15 are devoted to analysis of structural change. Specifically, they discuss the effects of structural change on unit root tests and cointegration tests:

    Chapter 13 discusses structural change and unit roots.

    Chapter 14 is on outlier problems and robust estimation methods. It discusses the effects of different types of outliers on unit root tests, and robust estimation methods in the presence of outliers.

    Chapter 15 discusses regime switching models and structural time series models.

    Finally, chapter 16 presents some avenues for further research.

    Throughout the book, the emphasis is on the intuition behind the different procedures and their practical usefulness. The algebraic detail is omitted in most cases because interested readers can refer to the original work cited. We have tried to emphasize the basic ideas, so that the book will be of guidance to empirical researchers.

  • References

    Anderson, T.W. (1959), "On Asymptotic Distributions of Estimates of Parameters of Stochastic Difference Equations," Annals of Mathematical Statistics, 30, 676-687.
    Box, G.E.P. and G.M. Jenkins (1970), Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.
    Fuller, W.A. (1985), "Nonstationary Autoregressive Time Series," Handbook of Statistics, 5, North-Holland Publishing Co., Amsterdam, 1-23.
    Granger, C.W.J. (1981), "Some Properties of Time Series Data and Their Use in Econometric Model Specification," Journal of Econometrics, 16, 121-130.
    Granger, C.W.J. and P. Newbold (1974), "Spurious Regression in Econometrics," Journal of Econometrics, 2, 111-120.
    Hendry, D.F. and M.S. Morgan (1989), "A Re-Analysis of Confluence Analysis," Oxford Economic Papers, 44, 35-52.
    Hylleberg, S. (ed.) (1992), Modeling Seasonality, Oxford University Press, Oxford.
    Lovell, M.C. (1963), "Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis," Journal of the American Statistical Association, 58, 993-1010.
    Nelson, C.R. and C.I. Plosser (1982), "Trends and Random Walks in Macroeconomic Time Series," Journal of Monetary Economics, 10, 139-162.
    Rao, M.M. (1961), "Consistency and Limit Distributions of Estimators of Parameters in Explosive Stochastic Difference Equations," Annals of Mathematical Statistics, 32, 195-218.
    White, J.S. (1958), "The Limiting Distribution of the Serial Correlation Coefficient in the Explosive Case," Annals of Mathematical Statistics, 29, 1188-1197.
    White, J.S. (1959), "The Limiting Distribution of the Serial Correlation Coefficient in the Explosive Case II," Annals of Mathematical Statistics, 30, 831-834.
    Yule, G.U. (1926), "Why Do We Sometimes Get Nonsense Correlations Between Time Series? A Study in Sampling and the Nature of Time Series," Journal of the Royal Statistical Society, 89, 1-64.

  • 2 Basic concepts

    The purpose of this chapter is to introduce several terms that will be used repeatedly in the subsequent chapters, and to explain their meaning. The terms included are: stationarity, ARMA models, integrated variables, Box-Jenkins methods, unit roots, cointegration, deterministic and stochastic trends, spurious regression, spurious periodicity and trend, vector autoregression (VAR) models, and error correction models (ECMs).

    2.1 Stochastic processes

    From a theoretical point of view a time series is a collection of random variables $\{X_t\}$. Such a collection of random variables ordered in time is called a stochastic process. The word stochastic has a Greek origin and means pertaining to chance. If $t$ is a continuous variable, it is customary to denote the random variable by $X(t)$, and if $t$ is a discrete variable, it is customary to denote it by $X_t$. An example of a continuous random variable $X(t)$ is the recording of an electrocardiogram. Examples of discrete random variables $X_t$ are data on unemployment, money supply, closing stock prices, and so on. We will be considering discrete processes only, and so we shall use the notation $X_t$ or $X(t)$ interchangeably.

    The probability structure of the sequence of random variables $\{X_t\}$ is determined by the joint distribution of the stochastic process. The question arises, however, since $T$ (time) is commonly an infinite set, whether we need an infinite dimensional distribution to define the probability structure of the stochastic process. Kolmogorov (1933) showed that when the stochastic process satisfies certain regularity conditions it can be described by a finite dimensional distribution. That is, under certain conditions the probabilistic structure of the


    stochastic process $\{X_t\}$ is completely specified by the joint distribution $F(X_{t_1}, \ldots, X_{t_n})$ for all values of $n$ (a positive integer) and any subset $(t_1, \ldots, t_n)$ of $T$. One of the regularity conditions is symmetry: reshuffling the ordering of the index does not change the distribution. The other is compatibility: the dimensionality of the joint distribution can be reduced by marginalization.

    Since the definition of a stochastic process by the joint distribution is too general, it is customary to define the stochastic process in terms of the first and second moments of the variable $X_t$. Given that, for a specific $t$, $X_t$ is a random variable, we can denote its distribution and density functions by $F(X_t)$ and $f(X_t)$ respectively. The parametric family of densities is determined by the first and second moments:

    $$\text{Mean: } \mu_t = E(X_t)$$
    $$\text{Variance: } \sigma_t^2 = \mathrm{var}(X_t)$$
    $$\text{Autocovariance: } \gamma_{t_1,t_2} = \mathrm{cov}(X_{t_1}, X_{t_2})$$

    The distribution of a stochastic process is thus characterized by the first and second moments, and they are both functions of $t$. Note that if $X_t$ follows a normal distribution, its distribution is completely characterized by the first and second moments; such a process is called a Gaussian process.

    The fact that the unknown parameters $\mu_t$, $\sigma_t^2$, $\gamma_{t_1,t_2}$ change with $t$ presents us with a difficult problem. There are (finitely many, but still) too many parameters to be estimated. However, we have just a sample of size 1 on each of the random variables. For example, if we say that the unemployment rate at the end of this week is a random variable, we have just one observation on this particular random variable in a week. There is no way of getting another observation, so we have what is called a single realization. This feature compels us to specify some highly restrictive models for the statistical structure of the stochastic process. Given a single realization, we need to reduce the number of parameters, and the question is how to reduce the number of parameters $\mu_t$, $\sigma_t^2$, $\gamma_{t_1,t_2}$.

    Reducing the number of parameters to be estimated can be done by imposing certain restrictions. Restrictions come in two forms:

    (i) stationarity: restrictions on the time heterogeneity of the process,
    (ii) asymptotic independence: restrictions on the memory of the process.


    These two restrictions reduce the number of parameters to be estimated and also facilitate the derivation of asymptotic results. We shall discuss stationarity here but omit discussion of asymptotic independence. This can be found in Spanos (1986, chapter 8).

    Stationarity A time series is said to be strictly stationary if the joint distribution of $X_{t_1}, \ldots, X_{t_n}$ is the same as the joint distribution of $X_{t_1+\tau}, \ldots, X_{t_n+\tau}$ for all $t_1, \ldots, t_n$ and $\tau$. The distribution of the stationary process remains unchanged when shifted in time by an arbitrary value $\tau$. Thus the parameters which characterize the distribution of the process do not depend on $t$, but on the lag $\tau$. The concept of stationarity is difficult to verify in practice because it is defined in terms of the distribution function. For this reason the concept of stationarity defined in terms of moments is commonly preferred.

    A stochastic process $\{X_t, t \in T\}$ is said to be $l$th-order stationary if for any subset $(t_1, t_2, \ldots, t_n)$ of $T$ and any $\tau$ the joint moments are

    $$E\left[X_{t_1}^{l_1} X_{t_2}^{l_2} \cdots X_{t_n}^{l_n}\right] = E\left[X_{t_1+\tau}^{l_1} X_{t_2+\tau}^{l_2} \cdots X_{t_n+\tau}^{l_n}\right]$$

    where $l_1 + l_2 + \cdots + l_n \le l$. Let us take positive integers for $l_1, l_2, \ldots, l_n$ and $l$. When $l = 1$, i.e., $l_1 = 1$,

    $$E(X_t) = E(X_{t+\tau}) = \mu \ \text{(a constant)}$$

    and the process $\{X_t\}$ is said to be first-order stationary. When $l = 2$, the possible cases are $(l_1 = 1, l_2 = 0)$, $(l_1 = 2, l_2 = 0)$, and $(l_1 = 1, l_2 = 1)$. According to these three cases the process $\{X_t\}$ has its joint moments as follows

    $$E(X_t) = E(X_{t+\tau}) = \mu \ \text{(a constant)}$$
    $$E(X_t^2) = E(X_{t+\tau}^2) = \sigma^2 \ \text{(a constant)}$$
    $$\mathrm{cov}(X_{t_1}, X_{t_2}) = \mathrm{cov}(X_{t_1+\tau}, X_{t_2+\tau}) = \gamma_{t_1,t_2} = \gamma_\tau$$

    where $t_1 - t_2 = \tau$. The mean and variance of $X_t$ are constant and the covariances of $X_t$ depend only on the lag or interval $\tau = t_1 - t_2$, not on $t_1$ or $t_2$. The process $X_t$ is then said to be second-order stationary.

    Second-order stationarity is also called weak, wide-sense, or covariance stationarity. In modeling time series, second-order stationarity is the most commonly used form of stationarity. This is partly due to the fact that for a normal (or Gaussian) stationary process, second-order stationarity is equivalent to strict stationarity. If $X_t$ follows a multivariate normal distribution, then since the multivariate normal distribution is completely characterized by the first and second moments, the two concepts of strict stationarity and weak stationarity are equivalent (recall the case of $n = 2$). For other distributions this is not so.

    In order to see how stationarity reduces the number of parameters, let us consider a Gaussian process $\{X_t\}$ and the parameters $\theta$ for the subset $(t_1, \ldots, t_n)$ of $T$. Without the assumption of stationarity the joint distribution of the process $\{X_t\}$ is characterized by the vector of parameters

    $$\theta = \left[\mu_{t_i}, \mathrm{cov}(X_{t_i}, X_{t_j})\right] \quad \text{for } i, j = 1, \ldots, n$$

    which is an $\left(n + \frac{n(n+1)}{2}\right) \times 1$ vector. By imposing stationarity, as we have seen above, the vector of parameters is reduced to

    $$\theta = [\mu, \sigma^2, \gamma_\tau]$$

    which is an $(n+1) \times 1$ vector. A sizeable reduction in the number of unknown parameters results from imposing stationarity. Note, however, that even in the case of stationarity the number of parameters (specifically the $\gamma_\tau$) increases as $\tau$ increases, i.e., as the size of $T$ increases. This is the reason why we need a further restriction on the memory of the process, which concerns the meaningful size of $\tau$ regardless of the size of $T$.

    2.2 Some commonly used stationary models

    We shall now discuss some commonly used stationary processes. We shall denote the autocovariance function by acvf and the autocorrelation function by acf.

    2.2.1 Purely random process

    This is a discrete process $X_t$ consisting of a sequence of independent identically distributed (iid) random variables. It has a constant mean and constant variance. Its acvf is given by

    $$\gamma(\tau) = \begin{cases} \sigma^2 & \text{if } \tau = 0 \\ 0 & \text{if } \tau \neq 0 \end{cases}$$

    and the acf is given by

    $$\rho(\tau) = \begin{cases} 1 & \text{if } \tau = 0 \\ 0 & \text{if } \tau \neq 0 \end{cases}$$


    A purely random process is also called white noise. A white-noise process is a second-order stationary process and has no memory. If $X_t$ is also assumed to be normal, then the process is strictly stationary.

    2.2.2 Moving-average (MA) processes

    Suppose that $\{e_t\}$ is a purely random process with mean zero and variance $\sigma^2$. Then a process $\{X_t\}$ defined by

    $$X_t = e_t + \beta_1 e_{t-1} + \cdots + \beta_q e_{t-q}$$

    is called a moving-average process of order $q$ and is denoted by MA($q$). Since the $e$s are unobserved variables, we scale them so that $\beta_0 = 1$. Since $E(e_t) = 0$ for all $t$, we have $E(X_t) = 0$. And the $e_t$ are independent with a common variance $\sigma^2$. Further, writing out the expressions for $X_t$ and $X_{t-\tau}$ in terms of the $e$s and picking up the common terms (since the $e$s are independent), we get

    $$\gamma(\tau) = \mathrm{cov}(X_t, X_{t-\tau}) = \begin{cases} \sigma^2(\beta_\tau + \beta_1\beta_{\tau+1} + \cdots + \beta_{q-\tau}\beta_q) & \text{for } \tau \le q \\ 0 & \text{for } \tau > q \end{cases}$$

    Also considering $\mathrm{cov}(X_t, X_{t+\tau})$, we get the same expression as for $\gamma(\tau)$. Hence $\gamma(\tau) = \gamma(-\tau)$. The acf can be obtained by dividing $\gamma(\tau)$ by $\mathrm{var}(X_t)$. For the MA process, $\rho(\tau) = 0$ for $\tau > q$, that is, the autocorrelations are zero for lags greater than the order of the process. Since $\gamma(\tau)$ is independent of $t$, the MA($q$) process is weakly stationary. Note that no restrictions on the $\beta_i$ are needed to prove the stationarity of the MA process.

    To facilitate our notation we shall use the lag operator $L$. It is defined by $L^j X_t = X_{t-j}$ for all $j$. Thus $LX_t = X_{t-1}$, $L^2 X_t = X_{t-2}$, $L^{-1}X_t = X_{t+1}$, and so on. With this notation the MA($q$) process can be written as (since $\beta_0 = 1$)

    $$X_t = (1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)e_t = \beta(L)e_t$$

    The polynomial in $L$ has $q$ roots and we can write

    $$X_t = (1 - \pi_1 L)(1 - \pi_2 L)\cdots(1 - \pi_q L)e_t$$

    where $\pi_1, \pi_2, \ldots, \pi_q$ are the roots of the equation

    $$z^q + \beta_1 z^{q-1} + \cdots + \beta_q = 0$$

    After estimating the model we can calculate the residuals from $e_t = [\beta(L)]^{-1}X_t$ provided that $[\beta(L)]^{-1}$ converges. This condition is called the invertibility condition. The condition for invertibility is that $|\pi_i| < 1$


    for all $i$.¹ This implies that an MA($q$) process can be written as an AR($\infty$) process uniquely.

    ¹ An alternative statement often found in books on time series is that the roots of the equation $1 + \beta_1 z + \beta_2 z^2 + \cdots + \beta_q z^q = 0$ all lie outside the unit circle.

    For instance, for the MA(2) process

    $$X_t = (1 + \beta_1 L + \beta_2 L^2)e_t \quad (2.1)$$

    $\pi_1$ and $\pi_2$ are roots of the quadratic equation $z^2 + \beta_1 z + \beta_2 = 0$. The condition $|\pi_i| < 1$ gives

    $$\left|\frac{-\beta_1 \pm \sqrt{\beta_1^2 - 4\beta_2}}{2}\right| < 1$$

    This gives the result that $\beta_1$ and $\beta_2$ must satisfy

    $$\beta_2 + \beta_1 > -1$$
    $$\beta_2 - \beta_1 > -1 \quad (2.2)$$
    $$|\beta_2| < 1$$

    The last condition is derived from the fact that $\beta_2 = \pi_1\pi_2$, the product of the roots. The first two conditions are derived from the fact that if $\beta_1^2 - 4\beta_2 > 0$, then $\beta_1^2 - 4\beta_2 < (2 + \beta_1)^2$ and $\beta_1^2 - 4\beta_2 < (2 - \beta_1)^2$. Under condition (2.2) the MA(2) process (2.1) can be written as an AR($\infty$) uniquely.
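As a quick numerical check of conditions (2.2), one can compute the roots directly and compare them with the inequalities. The sketch below is ours (the helper name is hypothetical), using the parameterization $z^2 + \beta_1 z + \beta_2 = 0$ from above:

```python
import numpy as np

def ma2_invertible(b1, b2):
    """Check MA(2) invertibility two ways: the root condition |pi_i| < 1
    (roots of z^2 + b1*z + b2 = 0) and the inequalities (2.2)."""
    roots = np.roots([1.0, b1, b2])          # roots of z^2 + b1 z + b2
    by_roots = bool(np.all(np.abs(roots) < 1.0))
    by_ineq = (b2 + b1 > -1) and (b2 - b1 > -1) and (abs(b2) < 1)
    return by_roots, by_ineq

print(ma2_invertible(0.5, 0.06))   # (True, True): roots -0.2 and -0.3, inside the unit circle
print(ma2_invertible(1.0, 2.0))    # (False, False): |beta_2| < 1 fails, roots have modulus sqrt(2)
```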

    Moving-average processes arise in econometrics mostly through trend elimination methods. One procedure often used for trend elimination is that of successive differencing of the time series $X_t$. If we have

    $$X_t = a_0 + a_1 t + a_2 t^2 + e_t$$

    where $e_t$ is a purely random process, successive differencing of $X_t$ will eliminate the trend, but the resulting series is a moving-average process that can show a cycle. Thus the trend-eliminated series can show a cycle even when there was none in the original series. This phenomenon of spurious cycles is known as the Slutsky effect (Slutsky, 1937).

    2.2.3 Autoregressive (AR) processes

    Suppose again that $e_t$ is a purely random process with mean zero and variance $\sigma^2$. Then the process $X_t$ given by

    $$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \cdots + \alpha_p X_{t-p} + e_t \quad (2.3)$$


    is called an autoregressive process of order $p$ and is denoted by AR($p$). Since the expression is like a multiple regression equation, it is called regressive. However, it is a regression of $X_t$ on its own past values; hence it is autoregressive.

    In terms of the lag operator $L$, the AR process (2.3) can be written as

    $$(1 - \alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p)X_t = e_t \quad (2.4)$$

    or

    $$X_t = \frac{1}{1 - \alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p}\, e_t = \frac{e_t}{(1 - \pi_1 L)(1 - \pi_2 L)\cdots(1 - \pi_p L)}$$

    where $\pi_1, \pi_2, \ldots, \pi_p$ are the roots of the equation

    $$z^p - \alpha_1 z^{p-1} - \cdots - \alpha_p = 0$$

    The condition for the expansion of (2.4) to be valid and the variance of $X_t$ to be finite is that $|\pi_i| < 1$ for all $i$.

    To find the acvf, we could expand (2.3), but the expressions are messy. An alternative procedure is to assume that the process is stationary and see what the acf are. To do this we multiply equation (2.3) throughout by $X_{t-\tau}$, take expectations of all the terms, and divide throughout by $\mathrm{var}(X_t)$, which is assumed finite. This gives us

    $$\rho(\tau) = \alpha_1\rho(\tau - 1) + \cdots + \alpha_p\rho(\tau - p)$$

    Substituting $\tau = 1, 2, \ldots, p$ and noting $\rho(-\tau) = \rho(\tau)$, we get equations to determine the $p$ parameters $\alpha_1, \alpha_2, \ldots, \alpha_p$. These equations are known as the Yule-Walker equations.

    To illustrate these procedures we will consider an AR(2) process

    $$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + e_t$$

    $\pi_1$ and $\pi_2$ are the roots of the equation

    $$z^2 - \alpha_1 z - \alpha_2 = 0$$

    Thus $|\pi_i| < 1$ implies that

    $$\left|\frac{\alpha_1 \pm \sqrt{\alpha_1^2 + 4\alpha_2}}{2}\right| < 1$$

    This gives

    $$\alpha_1 + \alpha_2 < 1$$
    $$\alpha_1 - \alpha_2 > -1 \quad (2.5)$$
    $$|\alpha_2| < 1$$

    (The conditions are similar to the conditions (2.2) derived for the invertibility of the MA(2) process.)

    In the case of the AR(2) process we can also obtain the $\rho(\tau)$ recursively using the Yule-Walker equations. We know that $\rho(0) = 1$ and

    $$\rho(1) = \frac{\alpha_1}{1 - \alpha_2}$$

    Thus

    $$\rho(2) = \alpha_1\rho(1) + \alpha_2\rho(0) = \frac{\alpha_1^2}{1 - \alpha_2} + \alpha_2$$

    $$\rho(3) = \alpha_1\rho(2) + \alpha_2\rho(1)$$

    and so on. As an example, consider the AR(2) process

    $$X_t = X_{t-1} - 0.5X_{t-2} + e_t$$

    Here $\alpha_1 = 1.0$ and $\alpha_2 = -0.5$. Note that conditions (2.5) for weak stationarity are satisfied. However, since $\alpha_1^2 + 4\alpha_2 < 0$ the roots are complex and $\rho(\tau)$ will be a sinusoidal function. A convenient method to compute $\rho(\tau)$ is to use the Yule-Walker equations

    $$\rho(\tau) = \rho(\tau - 1) - 0.5\rho(\tau - 2)$$

    Note that $\rho(0) = 1$ and $\rho(1) = \alpha_1/(1 - \alpha_2) = 0.6666$. We then obtain the autocorrelations for $\tau = 2, 3, \ldots, 13$ recursively. This method can be used whether the roots are real or complex. Figure 2.1 shows a plot of this correlogram.

    Fig. 2.1. Correlogram of an AR(2) model
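The recursion just described is easy to run. The sketch below (ours, not from the book) reproduces the autocorrelations behind figure 2.1:

```python
# Autocorrelations of X_t = X_{t-1} - 0.5 X_{t-2} + e_t via the Yule-Walker recursion
a1, a2 = 1.0, -0.5
rho = [1.0, a1 / (1.0 - a2)]           # rho(0) = 1, rho(1) = alpha_1/(1 - alpha_2) = 0.6666
for tau in range(2, 14):
    rho.append(a1 * rho[-1] + a2 * rho[-2])

for tau, r in enumerate(rho):
    print(f"rho({tau}) = {r:+.4f}")    # damped sinusoidal pattern, since the roots are complex
```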

    2.2.4 Autoregressive moving-average (ARMA) processes

    We will now discuss models that are combinations of the AR and MA models. These are called ARMA models. An ARMA($p, q$) model is defined as

    $$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \cdots + \alpha_p X_{t-p} + e_t + \beta_1 e_{t-1} + \cdots + \beta_q e_{t-q}$$


    where $e_t$ is a purely random process with mean zero and variance $\sigma^2$. The motivation for these models is that they lead to parsimonious representations of higher-order AR($\infty$) or MA($\infty$) processes.

    Using the lag operator $L$, we can write this as

    $$\phi(L)X_t = \theta(L)e_t$$

    where $\phi(L) = 1 - \alpha_1 L - \cdots - \alpha_p L^p$ and $\theta(L) = 1 + \beta_1 L + \cdots + \beta_q L^q$.

    Consider the simplest of these models, the ARMA(1,1) process

    $$X_t = \alpha X_{t-1} + e_t + \beta e_{t-1}$$

    In terms of the lag operator $L$ this can be written as

    $$(1 - \alpha L)X_t = (1 + \beta L)e_t$$

    or

    $$X_t = \frac{1 + \beta L}{1 - \alpha L}\, e_t = \left[1 + (\alpha + \beta)L + \alpha(\alpha + \beta)L^2 + \alpha^2(\alpha + \beta)L^3 + \cdots\right]e_t$$

    Since $\{e_t\}$ is a purely random process with mean zero and variance $\sigma^2$ we get

    $$\mathrm{var}(X_t) = \left[1 + (\alpha + \beta)^2 + \alpha^2(\alpha + \beta)^2 + \cdots\right]\sigma^2 = \left[1 + \frac{(\alpha + \beta)^2}{1 - \alpha^2}\right]\sigma^2 = \frac{1 + \beta^2 + 2\alpha\beta}{1 - \alpha^2}\,\sigma^2$$

    Also

    $$\mathrm{cov}(X_t, X_{t-1}) = \left[(\alpha + \beta) + \alpha(\alpha + \beta)^2 + \alpha^3(\alpha + \beta)^2 + \cdots\right]\sigma^2 = \frac{(\alpha + \beta)(1 + \alpha\beta)}{1 - \alpha^2}\,\sigma^2$$

    Hence

    $$\rho(1) = \frac{\mathrm{cov}(X_t, X_{t-1})}{\mathrm{var}(X_t)} = \frac{(\alpha + \beta)(1 + \alpha\beta)}{1 + \beta^2 + 2\alpha\beta}$$

    Successive values of $\rho(\tau)$ can be obtained from the recurrence relation $\rho(\tau) = \alpha\rho(\tau - 1)$ for $\tau \ge 2$. For the AR(1) process $\rho(1) = \alpha$. It can be verified that $\rho(1)$ for the ARMA(1,1) process is $>$ or $< \alpha$ depending on whether $\beta > 0$ or $< 0$, respectively.
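The $\rho(1)$ formula can be checked against a long simulated series. This is our illustration, with arbitrary parameter values:

```python
import numpy as np

# Verify rho(1) = (alpha+beta)(1+alpha*beta)/(1+beta^2+2*alpha*beta) by simulation
alpha, beta = 0.7, 0.4
rng = np.random.default_rng(11)
T = 100_000
e = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + e[t] + beta * e[t - 1]

xc = x - x.mean()
rho1_sample = np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc)
rho1_theory = (alpha + beta) * (1 + alpha * beta) / (1 + beta**2 + 2 * alpha * beta)
print(rho1_sample, rho1_theory)  # both about 0.82, and 0.82 > alpha since beta > 0
```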

    2.3 Box-Jenkins methods

    The Box-Jenkins method is one of the most widely used methodologies for the analysis of time series data. The influential work of Box and Jenkins (1970) shifted professional attention away from the stationary serially correlated deviations from deterministic trend paradigm toward the ARIMA($p, d, q$) paradigm. It is popular because of its generality; it can handle any series, stationary or not, with or without seasonal elements, and it has well-documented computer programs. It is perhaps the last factor that contributed most to its popularity. Although Box


    and Jenkins have been neither the originators nor the most important contributors in the field of ARMA models (for earlier discussion, see Quenouille, 1957), they have popularized these models and made them readily accessible to everyone, so much so that ARMA models are often referred to as Box-Jenkins models.

    The Box-Jenkins methodology consists of the following five steps.

    1. Differencing to achieve stationarity How do we determine whether a time series is stationary? We can do this by studying the graph of the correlogram of the series. The correlogram of a stationary series drops off as $\tau$, the number of lags, becomes large, but this is not the case for a nonstationary series. Thus the common procedure is to plot the correlogram of the given series $y_t$ and of its successive differences $\Delta y_t$, $\Delta^2 y_t$, and so on, and look at the correlograms at each stage. We keep differencing until the correlogram dampens.
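A minimal sketch of this step (ours, not the book's): compute the sample correlogram of a series and of its first difference, differencing until the correlogram dies out. The sample_acf helper is a hypothetical utility written for this illustration:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r(1..nlags) of a series x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(42)
y = np.cumsum(rng.standard_normal(500))      # random walk: nonstationary

print(sample_acf(y, 5))            # stays near 1: the correlogram does not damp
print(sample_acf(np.diff(y), 5))   # near 0: after one difference the correlogram dies out
```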

    2. Identification of a tentative model Once we have used the differencing procedure to get a stationary time series, we examine the correlogram to decide on the appropriate orders of the AR and MA components. The correlogram of an MA process is zero after a point; that of an AR process declines geometrically. The correlograms of ARMA processes show different patterns (but all dampen after a while). Based on these, one arrives at a tentative ARMA model. This step involves more of a judgmental procedure than the use of any clear-cut rules.

    3. Estimation of the model The next step is the estimation of the tentative ARMA model identified in step 2. The estimation of AR models is straightforward: we estimate them by ordinary least squares (OLS), minimizing the error sum of squares $\sum e_t^2$. In the case of MA models, we cannot write the error sum of squares $\sum e_t^2$ simply as a function of the observed $y$s and the parameters as in the AR model. What we can do is to write down the covariance matrix of the moving-average error and, assuming normality, use the maximum likelihood method of estimation. An alternative procedure suggested by Box and Jenkins is the grid-search procedure. In this procedure we compute the $e_t$ by successive substitution for each value of the MA parameters and choose the set of values of the parameters that minimizes the error sum of squares $\sum e_t^2$. For ARMA models, again the problem is with the MA component: we either use ML methods or use the grid-search procedure for the MA component. Ansley (1979) provides an algorithm for the exact likelihood of the mixed autoregressive moving-average process.

    4. Diagnostic checking When an AR, MA, or ARMA model has been fitted to a given time series, it is advisable to check that the model does really give an adequate description of the data. There are two criteria often used that reflect the closeness of fit and the number of parameters estimated. One is the Akaike Information Criterion (AIC) and the other is the Schwarz Bayesian Information Criterion (BIC). If $p$ is the total number of parameters estimated, we have

    $$AIC(p) = n\log\hat{\sigma}^2 + 2p$$
    $$BIC(p) = n\log\hat{\sigma}^2 + p\log n$$

    where $n$ is the sample size and $\hat{\sigma}^2$ is the estimated residual variance. For alternative Lagrangian multiplier (LM) test statistics, see Maddala (1992, pp. 540-542).
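As a supplementary sketch (ours), the following computes both criteria from a residual vector, assuming $\hat{\sigma}^2 = \mathrm{RSS}/n$ (the ML-style scale estimate):

```python
import numpy as np

def aic_bic(residuals, p):
    """AIC(p) = n log(sigma2) + 2p and BIC(p) = n log(sigma2) + p log(n),
    with sigma2 = RSS/n (assumed ML scale estimate)."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    sigma2 = np.sum(e**2) / n
    return n * np.log(sigma2) + 2 * p, n * np.log(sigma2) + p * np.log(n)

# Compare candidate models by their residuals; choose the one minimizing the criterion
rng = np.random.default_rng(1)
print(aic_bic(rng.standard_normal(200), p=2))
print(aic_bic(0.98 * rng.standard_normal(200), p=4))  # slightly better fit, more parameters
```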

    5. Forecasting Suppose that we have estimated the model with $n$ observations. We want to forecast $x_{n+k}$. This is called a $k$-periods-ahead forecast. First we need to write out the expression for $x_{n+k}$, and then replace all future values $x_{n+j}$ ($0 < j < k$) by their forecasts and $e_{n+j}$ ($j > 0$) by zero (since their expected value is zero). We also replace $e_{n-j}$ ($j \ge 0$) by the predicted residuals. For example, from the ARMA(2,2) model

    $$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + e_t + \beta_1 e_{t-1} + \beta_2 e_{t-2}$$

    we have the expression for the $k$-period-ahead forecast

    $$\hat{x}_{n+k} = \alpha_1 \hat{x}_{n+k-1} + \alpha_2 \hat{x}_{n+k-2} + \hat{e}_{n+k} + \beta_1 \hat{e}_{n+k-1} + \beta_2 \hat{e}_{n+k-2}$$

    When $k = 2$, we replace $x_n$ by the observed $x_n$ and $x_{n+1}$ by its forecast. The unknown future errors $e_{n+2}$ and $e_{n+1}$ are replaced by zero, while $e_n$ is replaced by the predicted residual

    $$\hat{e}_t = z_t - \alpha_1 z_{t-1} - \alpha_2 z_{t-2}$$

    where $z_t = (1 + \beta_1 L + \beta_2 L^2)^{-1}x_t$.
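The forecasting recursion can be written in a few lines. The helper below is ours (the name and all coefficient values are hypothetical), and it assumes the fitted parameters and the predicted residuals $\hat{e}_1, \ldots, \hat{e}_n$ are already available:

```python
def arma22_forecast(x, e_hat, a1, a2, b1, b2, k):
    """k-period-ahead forecasts from an ARMA(2,2): future e's are set to zero,
    past e's are the predicted residuals, and future x's are their own forecasts."""
    n = len(x)
    x = list(x)
    e = list(e_hat) + [0.0] * k   # e[0..n-1] = predicted residuals, e[n..] = 0
    for j in range(1, k + 1):
        # x_{n+j} = a1*x_{n+j-1} + a2*x_{n+j-2} + e_{n+j} + b1*e_{n+j-1} + b2*e_{n+j-2}
        xf = (a1 * x[n + j - 2] + a2 * x[n + j - 3]
              + e[n + j - 1] + b1 * e[n + j - 2] + b2 * e[n + j - 3])
        x.append(xf)
    return x[n:]

x_obs = [0.2, 0.5, 0.1, 0.4, 0.3]        # observed x_1..x_n (made up)
e_hat = [0.1, -0.2, 0.05, 0.0, 0.15]     # predicted residuals (made up)
print(arma22_forecast(x_obs, e_hat, a1=0.6, a2=0.2, b1=0.3, b2=0.1, k=3))
```

For $k = 2$ the loop reduces exactly to the $\hat{x}_{n+2}$ expression given in the text: the AR terms use $\hat{x}_{n+1}$ and $x_n$, the future errors are zero, and only $\beta_2\hat{e}_n$ survives from the MA part.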

    2.4 Integrated variables and cointegration

    In time series analysis we do not confine ourselves to the analysis of stationary time series. In fact, most of the time series we encounter are nonstationary.

    Consider the following processes

    $$x_t = \rho x_{t-1} + u_t, \quad |\rho| < 1$$
    $$y_t = y_{t-1} + v_t$$

    The error terms $u_t$ and $v_t$ are assumed to be normally independently identically distributed with mean zero and unit variance, $u_t, v_t \sim \mathrm{iin}(0, 1)$, i.e., purely random processes. Both $x_t$ and $y_t$ are AR(1) models. The difference between the two models is that $y_t$ is the special case of the $x_t$ process with $\rho = 1$ and is called a random walk model. It is also referred to as an AR(1) model with a unit root, since the root of the AR(1) equation is 1 (unity). When we consider the statistical behavior of the two processes by investigating the mean (the first moment), the


    variance, and autocovariance (the second moments), they are completely different. Although the two processes belong to the same AR(1) class, $x_t$ is a stationary process, while $y_t$ is a nonstationary process.

    The two stochastic processes can be expressed as the sum of the initial observation and the errors by successive substitution

    $$x_t = \rho x_{t-1} + u_t = \rho(\rho x_{t-2} + u_{t-1}) + u_t = \cdots = \rho^t x_0 + \sum_{i=0}^{t-1}\rho^i u_{t-i}$$

    Similarly, in the unit root case

    $$y_t = y_0 + \sum_{i=0}^{t-1} v_{t-i}$$

    Since both series are expressed as the sum of the initial observation and the errors, it can be said that the autoregressive model has been transformed to the moving-average form.

    Suppose that the initial observations are zero, $x_0 = 0$ and $y_0 = 0$. The means of the two processes are

    $$E(x_t) = 0 \quad \text{and} \quad E(y_t) = 0$$

    and the variances are

    $$\mathrm{var}(x_t) = \sum_{i=0}^{t-1}\rho^{2i}\,\mathrm{var}(u_{t-i}) \rightarrow \frac{1}{1-\rho^2}$$

    where $\rightarrow$ means converges to asymptotically (as $t \rightarrow \infty$) and

    $$\mathrm{var}(y_t) = \sum_{i=0}^{t-1}\mathrm{var}(v_{t-i}) = t$$

    The autocovariances of the two series are

    $$\gamma_\tau^x = E(x_t x_{t-\tau}) = E\left[\left(\sum_{i=0}^{t-1}\rho^i u_{t-i}\right)\left(\sum_{i=0}^{t-\tau-1}\rho^i u_{t-\tau-i}\right)\right] = \rho^\tau\sum_{i=0}^{t-\tau-1}\rho^{2i} \rightarrow \frac{\rho^\tau}{1-\rho^2}$$

    and

    $$\gamma_\tau^y = E(y_t y_{t-\tau}) = E\left[\left(\sum_{i=0}^{t-1}v_{t-i}\right)\left(\sum_{i=0}^{t-\tau-1}v_{t-\tau-i}\right)\right] = (t - \tau)$$


    since the errors are assumed to be iin and $\mathrm{cov}(u_t, u_s) = 0$, $t \neq s$. The means of $x_t$ and $y_t$ are the same, but the variances (including autocovariances) are different. The important thing to note is that the variances and the autocovariances of $y_t$ are functions of $t$, while those of $x_t$ converge to a constant asymptotically. Thus as $t$ increases the variance of $y_t$ increases, while the variance of $x_t$ converges to a constant.

    The above example shows that the two processes $x_t$ and $y_t$ have different statistical properties. The variance of the stationary stochastic process $x_t$ converges to a constant, while the variance of the random walk process $y_t$ increases as $t$ increases. Now if we add a constant to the AR(1) model, then the means of the two processes also behave differently. Consider the AR(1) processes with a constant (or drift) as follows

    $$x_t = a + \rho x_{t-1} + u_t, \quad |\rho| < 1$$
    $$y_t = a + y_{t-1} + v_t$$

    Successive substitution yields

    $$x_t = \rho^t x_0 + a\sum_{i=0}^{t-1}\rho^i + \sum_{i=0}^{t-1}\rho^i u_{t-i}$$

    and

    $$y_t = y_0 + at + \sum_{i=0}^{t-1}v_{t-i} \quad (2.6)$$

    Note that the $y_t$ series contains a (deterministic) trend $t$. If the initial observations are zero, $x_0 = 0$ and $y_0 = 0$, then the means of the two processes are

    $$E(x_t) = a\sum_{i=0}^{t-1}\rho^i \rightarrow \frac{a}{1-\rho} \quad \text{and} \quad E(y_t) = at$$

    but the variances and the autocovariances are the same as those derived from the AR(1) model without the constant. By adding a constant to the AR(1) processes, the means of the two processes, as well as the variances, are different. Both the mean and variance of $y_t$ are time varying, while those of $x_t$ converge to constants.

    To illustrate these properties, we generate 150 observations of $x_t$ and $y_t$ with $a = 0.5$ and $\rho = 0.9$. The innovations $u_t$ and $v_t$ are generated by using a pseudo-random number generator. Figure 2.2 shows the typical shapes of the two stochastic processes. If we slice the time domain into windows of, say, $\tau = 30$ periods, we find that the process

  • 2.4 Integrated variables and cointegration 23


    $x_t$ passes the mean of the $x_t$ process at least once, while the process $y_t$ does not. The process $x_t$ is pulled toward its mean (that is, it is mean-reverting) and fluctuates randomly around the mean (no systematic changes). On the other hand, the process $y_t$ increases systematically as $t \rightarrow \infty$ (or sometimes systematically decreases) and there is no force moving it toward a mean.

    Fig. 2.2. Examples of two AR(1) processes with a drift
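The experiment behind figure 2.2 is easy to replicate; the following sketch (ours) regenerates the two series with the stated parameter values:

```python
import numpy as np

# a = 0.5, rho = 0.9, 150 observations, as described in the text
rng = np.random.default_rng(7)
a, rho, T = 0.5, 0.9, 150
u, v = rng.standard_normal(T), rng.standard_normal(T)

x = np.zeros(T)   # stationary AR(1) with drift: x_t = a + rho*x_{t-1} + u_t
y = np.zeros(T)   # random walk with drift:      y_t = a + y_{t-1} + v_t
for t in range(1, T):
    x[t] = a + rho * x[t - 1] + u[t]
    y[t] = a + y[t - 1] + v[t]

print(x[-5:])  # fluctuates around a/(1-rho) = 5: mean-reverting
print(y[-5:])  # drifts systematically upward, roughly a*t
```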

    The variances of the two series are computed after the first 20 periods; figure 2.3 illustrates them. The variance of $y_t$ increases as $t$ increases, while the variance of $x_t$ converges to a constant ($1/(1-\rho^2)$) after about $t > 70$.

    Fig. 2.3. The variances of $x_t$ and $y_t$

    Figure 2.4 shows the correlograms of $x_t$ and $y_t$. As we have seen,

    $$\rho_\tau^x = \frac{\gamma_\tau^x}{\gamma_0^x} = \rho^\tau$$

    thus we can expect $\rho_\tau^x \rightarrow 0$ as $\tau \rightarrow \infty$ since $|\rho| < 1$. For the nonstationary series $y_t$, since

    $$\rho_\tau^y = \frac{t - \tau}{\sqrt{t(t-\tau)}} = \sqrt{\frac{t-\tau}{t}}$$

    the values of $\rho_\tau^y$ will not come down to zero except for a very large value of the lag.

    Fig. 2.4. The autocorrelations of $x_t$ and $y_t$

    Since the variance of the nonstationary series is not constant over time (it is not covariance stationary, or simply nonstationary), conventional asymptotic theory cannot be applied to such series. One of the easiest ways to analyze them is to make them stationary by differencing. In our example, the random walk series $y_t$ can be transformed to a stationary series by differencing once

    $$\Delta y_t = y_t - y_{t-1} = (1 - L)y_t = v_t$$

    where $L$ is the lag operator. Since the error $v_t$ is assumed to be independently normal, the first difference of $y_t$ is stationary. The variance of $\Delta y_t$ is constant over the sample period.

    When a nonstationary series can be transformed to a stationary series by differencing once, the series is said to be integrated of order 1 and is denoted by I(1). If the series needs to be differenced $k$ times to become stationary, then the series is said to be I($k$). In our example, $y_t$ is an I(1) variable, since the series needs to be differenced once to be stationary, while $x_t$ is an I(0) variable. An I($k$) series ($k \neq 0$) is also called a difference-stationary process (DSP). When $\Delta^d X_t$ is a stationary series that can be represented by an ARMA($p, q$) model, we say that $X_t$ is an autoregressive integrated moving-average (ARIMA) process. Since the number of differences is equal to the order of integration, $X_t$ is denoted as an ARIMA($p, d, q$) process.

    Another important class is the trend-stationary process (TSP). Consider the series

    $$z_t = a + \delta t + e_t \quad (2.7)$$

    The mean of $z_t$ is $E(z_t) = a + \delta t$ and is not constant over the sample period, while the variance of $z_t$ is $\mathrm{var}(z_t) = \sigma^2$ and constant. Although the mean of $z_t$ is not constant over the period, it can be forecast perfectly whenever we know the value of $t$ and the parameters $a$ and $\delta$. In this sense $z_t$ is stationary around the deterministic trend $t$, and it can be transformed to stationarity by regressing it on time. Note that both the DSP model, equation (2.6), and the TSP model, equation (2.7), exhibit a linear trend, but the appropriate method of eliminating the trend differs.

    Most econometric analysis is based on the variances and covariances among the variables. For example, the OLS estimator from the regression of $y_t$ on $x_t$ is the ratio of the covariance between $y_t$ and $x_t$ to the variance of $x_t$. Thus if the variances of the variables behave differently, conventional asymptotic theory is not applicable. When the orders of integration are different, the variance of each process behaves differently. For example, if $y_t$ is an I(0) variable and $x_t$ is I(1), the OLS estimator from the regression of $y_t$ on $x_t$ converges to zero asymptotically, since the denominator of the OLS estimator, the variance of $x_t$, increases as $t$ increases and thus dominates the numerator, the covariance between $x_t$ and $y_t$. That is, the OLS estimator does not have an asymptotic distribution. (It is degenerate under the conventional normalization of $\sqrt{T}$.) We need a normalization of $T$ rather than $\sqrt{T}$.

    Cointegration

    An important property of I(1) variables is that there can be linear combinations of these variables that are I(0). If this is so, then these variables are said to be cointegrated. The concept of cointegration was introduced by Granger (1981). Suppose that we consider two variables $y_t$ and $x_t$ that are I(1). Then $y_t$ and $x_t$ are said to be cointegrated if there exists a $\beta$ such that $y_t - \beta x_t$ is I(0). This is denoted by saying that $y_t$ and $x_t$ are CI(1,1). More generally, if $y_t$ is I($d$) and $x_t$ is I($d$), then $y_t$ and $x_t$ are CI($d, b$) if $y_t - \beta x_t$ is I($d - b$) with $b > 0$. What this means is that the regression equation

    $$y_t = \beta x_t + u_t$$

    makes sense because $y_t$ and $x_t$ do not drift too far apart from each other over time. Thus, there is a long-run equilibrium relationship between them. If $y_t$ and $x_t$ are not cointegrated, that is, $y_t - \beta x_t = u_t$ is also I(1), then $y_t$ and $x_t$ would drift apart from each other over time. In this case the relationship between $y_t$ and $x_t$ that we obtain by regressing $y_t$ on $x_t$ would be spurious. We shall discuss spurious regression in the next section.

    To fix ideas, let us consider a simple system of equations

    $$x_t = u_t, \quad u_t = u_{t-1} + e_{1t}$$
    $$y_t + \alpha x_t = v_t, \quad v_t = \rho v_{t-1} + e_{2t}$$

    In this system of equations, $x_t$ and $y_t$ are I(1) variables regardless of the value of $\rho$. If $\rho = 1$, then the linear combination $y_t + \alpha x_t \sim I(1)$, and thus $x_t$ and $y_t$ are two independent random walks. If $|\rho| < 1$, then $y_t + \alpha x_t \sim I(0)$, and thus $x_t$ and $y_t$ are cointegrated. In order to see how two independent random walk variables and two cointegrated variables drift apart from each other over time, we draw the typical shapes of

    the two cases based on 100 observations of $x_t$ and $y_t$ with $\alpha = 1$ and $\rho = 0.8, 1$. The errors $e_{1t}$ and $e_{2t}$ are generated from iin(0,1). Figure 2.5 shows the typical shapes of the three stochastic processes, where $x_t$ is an I(1) variable and $y_t$ is either an independent I(1) variable (when $\rho = 1$) or a cointegrated I(1) variable (when $\rho = 0.8$ and $\alpha = 1$). The two cointegrated I(1) variables $x_t$ and $y_t$ (solid line and dashed line) show some tendency not to drift too far apart (they move together), while the two independent I(1) variables $x_t$ and $y_t$ (solid line and dotted line) show no such tendency.

    Fig. 2.5. Cointegrated and independent I(1) variables
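The system above can be simulated directly. The sketch below (our code) generates $x_t$ together with a cointegrated $y_t$ ($\rho = 0.8$) and an independent random-walk $y_t$ ($\rho = 1$), and checks the linear combination $y_t + \alpha x_t$:

```python
import numpy as np

# x_t = u_t with u_t a random walk; y_t defined through y_t + alpha*x_t = v_t
rng = np.random.default_rng(3)
T, alpha = 100, 1.0
u = np.cumsum(rng.standard_normal(T))   # u_t = u_{t-1} + e_1t, so x_t is I(1)
x = u

def make_y(rho):
    v = np.zeros(T)
    e2 = rng.standard_normal(T)
    for t in range(1, T):
        v[t] = rho * v[t - 1] + e2[t]
    return v - alpha * x                # ensures y_t + alpha*x_t = v_t

y_coint = make_y(0.8)   # |rho| < 1: y_t + alpha*x_t is I(0), so x and y are cointegrated
y_indep = make_y(1.0)   # rho = 1:  y_t + alpha*x_t is itself a random walk

print(np.std(y_coint + alpha * x))  # small and stable: the combination is stationary
print(np.std(y_indep + alpha * x))  # large: the combination wanders like a random walk
```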

    The concept of cointegration can be extended to higher orders of integrated variables. We can have I(2) variables that are cointegrated to produce an I(1) variable. When dealing with I(2) variables, different types of cointegration can occur. First, linear combinations of I(2) variables can be I(1) or even I(0); second, some linear combinations of I(1) variables can cointegrate with first-differences of the I(2) variables to produce an I(0) variable. In the modeling of demand for money equations, if $m_t$ = log of nominal money, $y_t$ = log of income, and $p_t$ = log of prices, it has been found that $m_t$ and $p_t$ are possibly I(2) while real money ($m_t - p_t$) and velocity ($m_t - p_t - y_t$) are possibly I(1).

    2.5 Spurious regression

    Consider two uncorrelated random walk processes

    $$y_t = y_{t-1} + u_t, \quad u_t \sim \mathrm{iin}(0, \sigma_u^2)$$
    $$x_t = x_{t-1} + v_t, \quad v_t \sim \mathrm{iin}(0, \sigma_v^2)$$

    where $u_t$ and $v_t$ are assumed to be serially uncorrelated as well as mutually uncorrelated. Consider the regression

    $$y_t = \beta_0 + \beta_1 x_t + e_t$$

    Since $y_t$ and $x_t$ are uncorrelated random walk processes, we would expect that the $R^2$ from this regression would tend to zero. However, this is not the case. The parameter $\beta_1$ detects correlation, and Yule (1926) showed long ago that spurious correlation can persist even in large samples in nonstationary time series. If the two time series are growing over time, they can be correlated even if the increments in each series are uncorrelated. Thus, we have to be cautious when interpreting regressions with I(1) variables.

    This point was also illustrated in Granger and Newbold (1974), who present some examples with artificially generated data where the errors $u_t$ and $v_t$ were generated independently so that there was no relationship between $y_t$ and $x_t$, but the correlations between $y_t$ and $y_{t-1}$, and between $x_t$ and $x_{t-1}$, were high. The regression of $y$ on $x$ gave a high $R^2$ but a low Durbin-Watson (DW) statistic. When the regression was run in first differences, the $R^2$ was close to zero and the DW statistic was close to 2, thus demonstrating that there was no relationship between $y$ and $x$ and that the $R^2$ obtained was spurious.
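A small Monte Carlo in the spirit of the Granger-Newbold experiment (our sketch, with arbitrary sample sizes) shows the contrast between levels and first differences:

```python
import numpy as np

# Regress one independent random walk on another, in levels and in differences
rng = np.random.default_rng(1974)
T, n_rep = 100, 1000

def r2_of_regression(y, x):
    b = np.polyfit(x, y, 1)           # OLS of y on x with an intercept
    resid = y - np.polyval(b, x)
    return 1.0 - resid.var() / y.var()

r2_levels, r2_diffs = [], []
for _ in range(n_rep):
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    r2_levels.append(r2_of_regression(y, x))
    r2_diffs.append(r2_of_regression(np.diff(y), np.diff(x)))

print(np.mean(r2_levels))  # sizable "spurious" R^2 despite independence
print(np.mean(r2_diffs))   # near zero once the series are differenced
```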

    Phillips (1986) shows that with two independent random walk processes with no drift (like the ones we have considered) the least squares regression

    $$y_t = \hat{\beta}_0 + \hat{\beta}_1 x_t + \hat{u}_t \quad (2.8)$$

    leads to a divergent estimator $\hat{\beta}_0$ and convergence of $\hat{\beta}_1$ to a random variable. By deriving the limiting distributions of the OLS estimator and test statistics, including the $t$- and DW statistics and $R^2$, Phillips explains the same results found by the simulation of Granger and Newbold (1974).

Entorf (1992) shows that the results are altered if we consider independent random walk processes with drifts. Suppose that we have

y_t = a_y + y_{t-1} + u_t
x_t = a_x + x_{t-1} + v_t

then Entorf shows that in the regression (2.8)

β̂_1 →_p a_y / a_x

where →_p denotes convergence in probability. Thus β̂_1 converges to a constant rather than a random variable. Entorf also finds that β̂_0 has a divergent distribution. Thus, the results from spurious regression depend on whether we consider random walk processes with drifts or with no drifts.

2.6 Deterministic trend and stochastic trend

As we have seen, integrated variables exhibit systematic variation, but the variation is hardly predictable. This type of variation is called a stochastic trend. On the other hand, trends which are completely predictable (if we know the coefficient of time) are known as deterministic trends. The specification of a deterministic trend can be any functional form of time. For example, the deterministic trend DT_t can be any of the following:

DT_t = 0 (zero), c (constant), α + βt (linear trend)

or

DT_t = Σ_{i=0}^{p} β_i t^i (polynomial time trend)

or

DT_t = α_0 + β_0 t,  t = 1, ..., m
DT_t = α_1 + β_1 t,  t = m + 1, ..., T (segmented trend)

To fix ideas about the deterministic trend and the stochastic trend, let us consider the following ARIMA(0,1,1) model with a drift (a constant term)

Δy_t = a + e_t + γe_{t-1}

where e_t is assumed to be an iid error. Let y_0 = e_0 = 0, so that y_t can be written by successive substitution as

y_t = a + y_{t-1} + e_t + γe_{t-1}
    = 2a + y_{t-2} + (e_t + γe_{t-1}) + (e_{t-1} + γe_{t-2})
    = ...
    = at + (1 + γ) Σ_{i=1}^{t} e_i - γe_t

Letting

DT_t = at
ST_t = (1 + γ) Σ_{i=1}^{t} e_i
C_t = -γe_t

we can rewrite y_t as

y_t = DT_t + Z_t = DT_t + ST_t + C_t   (2.9)

Here DT_t is a deterministic trend in y_t, and Z_t is the noise function or stochastic component of y_t. The noise function Z_t can be decomposed as the sum of the stochastic trend ST_t and the cyclical component C_t. The cyclical component is assumed to be a mean-zero stationary process. The stochastic trend incorporates all random shocks (e_1 to e_t) that have permanent effects on the level of y_t. The sum of the deterministic trend DT_t and the stochastic trend ST_t is the overall trend and the permanent component of y_t.

If we denote the permanent component of y_t as y_t^P, then y_t can be rewritten as

y_t = y_t^P + C_t

where the permanent component of y_t is y_t^P = DT_t + ST_t, the sum of the deterministic trend and the stochastic trend. It can be shown that the permanent component of y_t is a random walk with drift such that

y_t^P = at + (1 + γ) Σ_{i=1}^{t} e_i = a + y_{t-1}^P + (1 + γ)e_t

To see the typical shape of each component we generate the ARIMA(0,1,1) series by setting a = 0.008 and γ = 0.3. The innovations e_t are generated from a normal distribution N(0, 0.053²). Figure 2.6 shows the typical shape of the series and each component. The solid line is the graph of the generated ARIMA(0,1,1) series. The dashed line is the deterministic trend DT_t = 0.008t, the short-dashed line is the stochastic trend ST_t = 1.3 Σ_{i=1}^{t} e_i, and the dotted line is the cyclical component C_t = -0.3e_t. The sum of the two trends and the cyclical component is the generated ARIMA(0,1,1) series.

[Figure: time plot of the generated ARIMA(0,1,1) series, its deterministic trend, stochastic trend, and cyclical component]
Fig. 2.6. ARIMA(0,1,1) and its components
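The following sketch (ours) generates the series and its components exactly as described, using the stated values a = 0.008, γ = 0.3, and N(0, 0.053²) innovations:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 160
a, gamma, sigma = 0.008, 0.3, 0.053
e = sigma * rng.standard_normal(T + 1)
e[0] = 0.0                                  # y_0 = e_0 = 0 as in the text

t = np.arange(1, T + 1)
DT = a * t                                  # deterministic trend
ST = (1 + gamma) * np.cumsum(e[1:])         # stochastic trend (permanent shocks)
C = -gamma * e[1:]                          # cyclical (transitory) component
y = DT + ST + C                             # the ARIMA(0,1,1) series

# check the construction: first differences follow an MA(1) with drift
dy = np.diff(np.concatenate(([0.0], y)))
assert np.allclose(dy, a + e[1:] + gamma * e[:-1])
```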

The decomposition of the noise function Z_t into ST_t and C_t that we have seen in equation (2.9) can be extended to any ARIMA(p,1,q) model. Beveridge and Nelson (1981) show that any ARIMA(p,1,q) model can be represented as a stochastic trend plus a stationary component. The noise function Z_t is assumed to be described by an autoregressive-moving-average process

A(L)Z_t = B(L)e_t

where A(L) and B(L) are polynomials in the lag operator L of order p and q, respectively, and e_t is assumed to be a sequence of iid errors. Suppose that the polynomial A(L) has a unit root; then we can write

A(L) = (1 - L)A*(L)

where A*(L) has roots strictly outside the unit circle. The first-difference of the noise function is

(1 - L)A*(L)Z_t = A*(L)ΔZ_t = B(L)e_t


Table 2.1. Regression of integrated variables

x_t \ y_t        Deterministic          Stochastic
Deterministic    Regression valid       Spurious regression
Stochastic       Spurious regression    Spurious regression unless
                                        y_t and x_t are cointegrated

so that

ΔZ_t = A*(L)^{-1}B(L)e_t = ψ(L)e_t   (2.10)

and

ψ(L) = ψ(1) + (1 - L)ψ*(L)

where ψ*(L) = (1 - L)^{-1}[ψ(L) - ψ(1)]. Applying the operator (1 - L)^{-1} to both sides of (2.10) yields

Z_t = ψ(1) Σ_{i=1}^{t} e_i + ψ*(L)e_t = ST_t + C_t

Thus, any ARIMA(p,1,q) model, here the noise function Z_t, can be decomposed into a stochastic trend ST_t and a cyclical component C_t.
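As a quick consistency check (our worked example, not part of the original text), the decomposition can be applied to the ARIMA(0,1,1) noise function of section 2.6, where ΔZ_t = (1 + γL)e_t:

```latex
\psi(L) = 1 + \gamma L, \qquad \psi(1) = 1 + \gamma,
\qquad
\psi^*(L) = (1 - L)^{-1}\bigl[\psi(L) - \psi(1)\bigr]
          = (1 - L)^{-1}(\gamma L - \gamma) = -\gamma
```

so that Z_t = (1 + γ) Σ_{i=1}^{t} e_i - γe_t, reproducing the stochastic trend ST_t and the cyclical component C_t of section 2.6.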

In the previous sections we talked of spurious regressions arising from trending variables. Whether such regressions are spurious or not depends on the type of trend - whether it is deterministic or stochastic. Table 2.1 shows when the regressions of y_t on x_t are valid. In the case of random walk processes with drifts, considered by Entorf, note that both y_t and x_t can be expressed as a function of time plus a random walk process with no drift.

2.7 Detrending methods

The method used for detrending depends on whether the time series is TSP (a trend-stationary process, or a process that is stationary around a trend) or DSP (a difference-stationary process, or a process that is stationary in first-differences). We shall be discussing problems of distinguishing between TSP and DSP in the next two chapters. The empirical evidence on this is mixed. One summary conclusion that emerges from the voluminous empirical work is that the evidence in favor of deterministic trends is stronger for real variables than for nominal variables.

There have been several papers that have studied the consequences of underdifferencing or overdifferencing. If the time series is DSP and we treat it as TSP, this is a case of underdifferencing. If the time series is TSP, but we treat it as DSP, we have a case of overdifferencing. However, the serial correlation properties of the resulting errors from the misspecified processes need to be considered. For instance, if the regression relationship is correctly specified in first-differences, i.e.

Δy_t = βΔx_t + e_t

this implies that

y_t = α + βx_t + u_t

where u_t = e_t + e_{t-1} + ... is serially correlated and nonstationary. On the other hand, if the regression relationship is correctly specified in levels, i.e.

y_t = α + βx_t + v_t

this implies that

Δy_t = βΔx_t + v_t - v_{t-1}

The errors follow a noninvertible moving-average process. The only question is whether OLS estimation of this equation with first-order MA errors leads us astray.

Plosser and Schwert argue that even if the MA coefficient is somewhat underestimated, the sampling distribution of β̂ does not frequently lead to incorrect conclusions, and hence "the cost of overdifferencing may not be large when care is taken to analyze the properties of regression disturbances" (1978, p. 643).

Nelson and Kang (1984) list several ways in which investigators would be led to misleading results if they estimate underdifferenced relationships. But these results hold if we do not correct for serial correlation in the errors. Their results on pages 79-80 show that the consequences of underdifferencing are not as serious if serial correlation in the errors is taken into account. Plosser and Schwert (1978, p. 638) argue that "the real issue is not differencing but an appropriate appreciation of the role of the error term in regression models." This point is discussed further in McCallum (1993), with reference to regression models with lagged dependent variables. Note that in all this discussion we have been considering DSP without trend and TSP with a linear trend.


There is also some discussion of the effect of detrending on the periodic properties of the detrended series. Nelson and Kang (1981) argue that if the true process is DSP (with errors not exhibiting any cycles) and trend removal is done by regression on time (treating it as TSP), then the detrended series exhibits spurious periodicity.

2.8 VAR, ECM, and ADL

The vector autoregressive (VAR) model is just a multiple time series generalization of the AR model. The multiple time series generalization of the ARMA model is the VARMA model, but we shall not consider it here. The VAR model has been popularized by Sims (1980) and it also forms a starting point for the analysis of cointegrating regression. In matrix notation, the VAR model for k variables can be written as

Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + ... + A_p Y_{t-p} + u_t

where Y_t' = (y_{1t}, y_{2t}, ..., y_{kt}), A_1, A_2, ..., A_p are k × k matrices, and u_t is a k-dimensional vector of errors with E(u_t) = 0 and E(u_t u_t') = Σ positive definite. More compactly, the VAR model can be represented as

Y_t = A(L)Y_t + u_t

where L is the lag operator. Since the VAR model is nothing but the stacked form of stationary AR(p) models and the regressors are the same for all the equations, the estimation of the VAR model is straightforward. The maximum likelihood estimator (MLE) reduces to the OLS estimator for each equation in the VAR model. The MLE of Σ is also provided by the OLS residuals û_t, giving Σ̂ = Ê(û_t û_t').

The above results apply only to the unrestricted VAR model. In practice it has been found that the unrestricted VAR model gives very erratic estimates (because of high multicollinearity among the explanatory variables) and several restricted versions have been suggested. Also, when some of the variables in Y_t are I(1), one needs to use them in first-differences. If some of these I(1) variables are cointegrated, this imposes further restrictions on the parameters of the VAR model. These problems will be discussed in subsequent chapters.
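A minimal sketch (ours) of equation-by-equation OLS estimation of an unrestricted VAR, here via statsmodels' VAR class on simulated stationary data; the coefficient matrix, sample size, and variable names are illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(3)
T = 300
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])                 # a stationary coefficient matrix
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A1 @ Y[t - 1] + rng.standard_normal(2)

res = VAR(pd.DataFrame(Y, columns=["y1", "y2"])).fit(1)
print(res.coefs[0])                         # OLS/MLE estimate of A1
print(res.sigma_u)                          # residual covariance estimate
```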

The error correction model (ECM), first introduced into the econometric literature by Sargan (1964) and popularized by Davidson et al. (1978), has been a viable alternative to the VAR model. For some time in the 1980s, the American econometricians were all estimating VARs and the European econometricians were all estimating ECMs. There are several interpretations of the error correction model and these are discussed in Algoskoufis and Smith (1991). The main characteristic of ECMs as compared with the VARs is the notion of an equilibrium long-run relationship and the introduction of past disequilibrium as explanatory variables in the dynamic behavior of current variables. The recent revival in the popularity of the ECMs has been based on the demonstration by Granger and Weiss (1983) that if two variables are integrated of order 1 and are cointegrated, they can be modeled as having been generated by an ECM.

The ECM links the realized value y_t to its target value y_t* = β'z_t. In its simplest form, it can be written as

Δy_t = λ_1 Δy_t* + λ_2 (y*_{t-1} - y_{t-1})

where λ_1 > 0 and λ_2 > 0. The last term represents past disequilibrium. The partial adjustment model is given by

Δy_t = λ(y_t* - y_{t-1}) = λΔy_t* + λ(y*_{t-1} - y_{t-1})

Thus the partial adjustment model corresponds to the ECM with λ_1 = λ_2.

Another class of models often considered is the autoregressive distributed lag (ADL) model discussed in Hendry, Pagan, and Sargan (1984). A general ADL model with p regressors, m lags in y, and n lags in each of the p regressors is denoted by ADL(m, n; p). It is given by

y_t = α_0 + Σ_{i=1}^{m} α_i y_{t-i} + Σ_{j=1}^{p} Σ_{i=0}^{n} β_{ji} x_{j,t-i} + e_t

Such models are also called dynamic linear regression (DLR) models. Consider the simplest ADL(1,1;1) model

y_t = α_0 + α_1 y_{t-1} + β_0 x_t + β_1 x_{t-1} + e_t   (2.11)

where it is assumed that e_t ~ iid(0, σ²) and |α_1| < 1. We shall show the connection between this and the ECM model. In long-run equilibrium y_t = y_{t-1} = y and x_t = x_{t-1} = x; then we can write

y = α_0/(1 - α_1) + [(β_0 + β_1)/(1 - α_1)] x

Thus, the long-run response of y to a change in x is given by

k = (β_0 + β_1)/(1 - α_1)


Now write equation (2.11) as

y_t - y_{t-1} = α_0 + (α_1 - 1)y_{t-1} + β_0 (x_t - x_{t-1}) + (β_0 + β_1)x_{t-1} + e_t

Writing β_0 + β_1 = k(1 - α_1) we get

Δy_t = α_0 + (α_1 - 1)(y_{t-1} - kx_{t-1}) + β_0 Δx_t + e_t   (2.12)

Note that (y_{t-1} - kx_{t-1}) is last period's disequilibrium. Thus (2.12) is the ECM that is implied by the ADL(1,1;1) model.

If we write (2.12) as

Δy_t = α_0 + φy_{t-1} + γx_{t-1} + β_0 Δx_t + e_t   (2.13)

with φ = α_1 - 1 and γ = β_0 + β_1 = -kφ, then clearly an estimate of the long-run response k is given by

k̂ = -γ̂/φ̂
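A small numerical sketch (ours; the parameter values are illustrative) of recovering the long-run response from an estimated ADL(1,1;1):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 500
a0, alpha1, beta0, beta1 = 0.1, 0.6, 0.3, 0.1   # implies k = 0.4/0.4 = 1
x = np.cumsum(rng.standard_normal(T))           # an I(1) regressor
y = np.zeros(T)
for t in range(1, T):
    y[t] = a0 + alpha1 * y[t - 1] + beta0 * x[t] + beta1 * x[t - 1] \
           + rng.standard_normal()

# estimate the ADL by OLS and form k_hat = (b0 + b1)/(1 - a1)
X = sm.add_constant(np.column_stack([y[:-1], x[1:], x[:-1]]))
fit = sm.OLS(y[1:], X).fit()
_, a1_hat, b0_hat, b1_hat = fit.params
print((b0_hat + b1_hat) / (1 - a1_hat))         # should be near 1
```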


One other point to note is that we can write equation (2.12) as

Δy_t = α_0 + (α_1 - 1)(y_{t-1} - x_{t-1}) + β_0 Δx_t + (β_0 + β_1 + α_1 - 1)x_{t-1} + e_t   (2.15)

Note that the estimate of the error correction term, namely (α_1 - 1), is the same in both (2.12) and (2.15). What we have done is to set the long-run response in the error correction term equal to 1 and to add x_{t-1} as an explanatory variable. This simple but important result will be used when we come to cointegration tests based on the ECM. The generalization of these results to a general ADL(m, n; p) model is straightforward except for some extra notation and will not be pursued here.¹

2.9 Unit root tests

As discussed earlier, the question of whether to detrend or to difference a time series prior to further analysis depends on whether the time series is trend-stationary (TSP) or difference-stationary (DSP). If the series is trend-stationary, the data generating process (DGP) for y_t can be written as

y_t = γ_0 + γ_1 t + e_t

where t is time and e_t is a stationary ARMA process. If it is difference-stationary, the DGP for y_t can be written as

y_t = α_0 + y_{t-1} + e_t

where e_t is again a stationary ARMA process. If the e_t are serially uncorrelated, then this is a random walk with drift α_0.

Following Bhargava (1986), we can nest these two models in the following model

y_t = γ_0 + γ_1 t + u_t
u_t = ρu_{t-1} + e_t

so that

y_t = γ_0 + γ_1 t + ρ[y_{t-1} - γ_0 - γ_1(t - 1)] + e_t   (2.16)

¹ For details see chapter 2 of Banerjee et al. (1993).


where e_t is a stationary process. If |ρ| < 1, y_t is trend-stationary. If |ρ| = 1, y_t is difference-stationary. Equation (2.16) can be written as

y_t = β_0 + β_1 t + ρy_{t-1} + e_t   (2.17)

or

Δy_t = β_0 + β_1 t + (ρ - 1)y_{t-1} + e_t   (2.18)

where β_0 = γ_0(1 - ρ) + γ_1 ρ and β_1 = γ_1(1 - ρ). Note that if ρ = 1, then β_1 = 0. If we start with the model

y_t = γ_0 + u_t
u_t = ρu_{t-1} + e_t

then we get

y_t = γ_0(1 - ρ) + ρy_{t-1} + e_t

or

y_t = β_0 + ρy_{t-1} + e_t

    with /?o = 0 if p = 1. If we have a quadratic trend, then equation (2.16)would be

    yt = 7o + 7i* + 72*2 + p[yt-i - 7o - 7i(* - 1) - 72(* - I)2] + etwhich can be written as

    Vt = Po + Pit 4- /32*2 4- pyt-i 4 et (2.19)where

    A) = 7o(l - p ) + (7i

    and

    02 = 72(1 - P )Thus, if p = 1, then /32 = 0. We shall discuss the problems caused bythis in later chapters.

It is customary to test the hypothesis ρ = 1 against the one-sided alternative |ρ| < 1. This is called a unit root test. One cannot, however, use the usual t-test to test ρ = 1 in equation (2.17) because under the null hypothesis y_t is I(1), and hence the t-statistic does not have an asymptotic normal distribution. The relevant asymptotic distribution, based on Wiener processes, will be discussed in chapter 3.


Equations (2.16) and (2.17) have been derived by considering the issue of testing whether a series is difference-stationary or trend-stationary. By contrast, the tests for unit roots developed by Dickey (1976), Fuller (1976), and Dickey and Fuller (1979), commonly known as Dickey-Fuller tests, are based on a simple autoregression with or without a constant or time trend. They are based on testing ρ = 1 in the equations

y_t = ρy_{t-1} + e_t   (2.20)
y_t = β_0 + ρy_{t-1} + e_t   (2.21)
y_t = β_0 + β_1 t + ρy_{t-1} + e_t   (2.22)

However, as we noted earlier, the Bhargava formulation implies that in equation (2.21), β_0 = 0 if ρ = 1 and in equation (2.22), β_1 = 0 if ρ = 1. No such restrictions are imposed in the formulations (2.21) and (2.22). As a consequence, the parameters in equations (2.21) and (2.22) have different interpretations under the null and the alternative (see Schmidt and Phillips, 1992). For instance, in equation (2.21), under the null hypothesis ρ = 1, β_0 represents the coefficient of trend, whereas under the alternative y_t is stationary around the level β_0/(1 - ρ) (see the discussion in section 2.6 earlier). Similarly, by successive substitution, we can show that in equation (2.22), under the null hypothesis ρ = 1, the parameters β_0 and β_1 represent coefficients of t and t² in a quadratic trend, whereas under the alternative they represent the level and the coefficient of t in a linear trend. Because of these problems, the Bhargava-type formulation is preferred in unit root testing.
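In practice the test is usually carried out with an augmented autoregression; a minimal sketch (ours) using statsmodels' adfuller, whose regression="c" and "ct" options correspond to equations (2.21) and (2.22):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(11)
y = np.cumsum(rng.standard_normal(250))       # a driftless random walk

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print(stat, pvalue, crit)                     # expect no rejection of rho = 1
```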

2.10 Cointegration tests and ECM

As we mentioned earlier (section 2.4), if a linear combination of I(1) variables is stationary or I(0), then the variables are said to be cointegrated. Suppose we have a set of k variables y_t which are all I(1) and β'y_t = u_t is I(0); then β is said to be a cointegrating vector and the equation β'y_t = u_t is called the cointegrating regression. Note that some elements of the cointegrating vector can be zero, but not all of them. If there are two such vectors β_1 and β_2, so that β_1'y_t = u_{1t} and β_2'y_t = u_{2t} are both I(0), then any linear combination of these vectors is also a cointegrating vector, because linear combinations of I(0) variables are I(0). There is, thus, an identification problem. Unless we bring in some extraneous information, we cannot identify the long-run equilibrium relationship. These problems will be illustrated in later chapters.

We shall discuss here the relationship between cointegration and the ECM. To fix ideas, consider the case of two variables y_{1t} and y_{2t} that are both I(1). In this case, if we write y_{1t} = βy_{2t} + u_t and u_t is I(0), then y_{1t} and y_{2t} are cointegrated and β is the cointegrating vector (a scalar in the case of two variables). In the two-variable case, if there is cointegration, we can show that β is unique: if we also have y_{1t} = γy_{2t} + v_t where v_t is also I(0), then by subtraction (β - γ)y_{2t} + u_t - v_t is I(0). But u_t - v_t is I(0), which means (β - γ)y_{2t} is I(0). This is not possible since y_{2t} is I(1). Hence we should have β = γ. But in the case of more than two variables, the cointegrating vector is no longer unique.

How do we know whether y_{1t} and y_{2t} are cointegrated? We can test whether the error in the cointegrating regression is I(1). Thus, the hypothesis that u_t has a unit root is a hypothesis that there is no cointegration. Tests for cointegration thus have no cointegration as the null hypothesis. On the other hand, if the null hypothesis is that there is cointegration, this has to be based on stationarity as the null hypothesis for u_t. These issues will be discussed in the chapter on tests for cointegration.

The earliest cointegration test is the one suggested in Engle and Granger (1987), which consists of estimating the cointegrating regression by OLS, obtaining the residuals û_t, and applying unit root tests to û_t. Since the û_t are themselves estimates, new critical values need to be tabulated. Several extensions of this test have been proposed and critical values have been tabulated for these tests. Since they are all based on û_t, they are called residual-based tests.
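A minimal sketch (ours) of the residual-based test using statsmodels' coint, which runs the cointegrating regression by OLS and applies an ADF-type test to the residuals with the appropriate critical values:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(2)
T = 300
x = np.cumsum(rng.standard_normal(T))
y = 2.0 * x + rng.standard_normal(T)    # cointegrated by construction

stat, pvalue, crit = coint(y, x)
print(stat, pvalue, crit)               # small p-value: reject no cointegration
```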

Another set of tests are those that are based on the ECM. These tests are based on what is known as the Granger representation theorem, which says that if a set of I(1) variables are cointegrated, they can be regarded as being generated by an ECM.

To illustrate these ideas, consider a simple two-equation model used in Engle and Granger (1987)

y_{1t} + βy_{2t} = u_{1t},   u_{1t} = u_{1,t-1} + e_{1t}   (2.23)
y_{1t} + αy_{2t} = u_{2t},   u_{2t} = ρu_{2,t-1} + e_{2t},   |ρ| < 1   (2.24)

where e_{1t} and e_{2t} are possibly correlated white-noise errors. The model is internally consistent only if α ≠ β, because if α = β it is impossible to find any values of y_{1t} and y_{2t} that satisfy both equations. The reduced forms for y_{1t} and y_{2t} are

y_{1t} = [α/(α - β)]u_{1t} - [β/(α - β)]u_{2t}
y_{2t} = -[1/(α - β)]u_{1t} + [1/(α - β)]u_{2t}

These equations also make clear the fact that both y_{1t} and y_{2t} are driven by a common I(1) variable u_{1t}. This is known as the common trend representation of the cointegrated system. Also, equation (2.24) states that a linear combination of y_{1t} and y_{2t} is stationary. Hence y_{1t} and y_{2t} are cointegrated with cointegration coefficient α. Note that if ρ = 1, then u_{2t} is also I(1) and hence there is no cointegration. The null hypothesis ρ = 1 is thus a test of the hypothesis of no cointegration. The Engle-Granger test is thus based on equation (2.24).

There is also an autoregressive representation for this system. Equations (2.23) and (2.24) can be written as

Δy_{1t} = βδ y_{1,t-1} + αβδ y_{2,t-1} + η_{1t}
Δy_{2t} = -δ y_{1,t-1} - αδ y_{2,t-1} + η_{2t}

where δ = (1 - ρ)/(α - β) and η_{1t} and η_{2t} are linear combinations of e_{1t} and e_{2t}. If we write z_t = y_{1t} + αy_{2t}, then equation (2.24) implies

z_t = ρz_{t-1} + e_{2t}

or

Δz_t = (ρ - 1)z_{t-1} + e_{2t}

or

Δy_{1t} = -αΔy_{2t} + (ρ - 1)z_{t-1} + e_{2t}   (2.25)

This is in the form of an ECM where z_{t-1} represents past disequilibrium.
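A small sketch (ours; α = 1, β = 2, ρ = 0.5 are illustrative values) that generates data from (2.23)-(2.24) and verifies the ECM identity (2.25):

```python
import numpy as np

rng = np.random.default_rng(9)
T = 500
alpha, beta, rho = 1.0, 2.0, 0.5
e1 = rng.standard_normal(T)
e2 = rng.standard_normal(T)

u1 = np.cumsum(e1)                      # u_{1t}: the common I(1) trend
u2 = np.zeros(T)
for t in range(1, T):
    u2[t] = rho * u2[t - 1] + e2[t]     # u_{2t}: stationary AR(1)

# reduced forms
y2 = (u2 - u1) / (alpha - beta)
y1 = u1 - beta * y2
z = y1 + alpha * y2                     # equals u_{2t}, the stationary combination

# verify (2.25): dy1_t = -alpha*dy2_t + (rho - 1)*z_{t-1} + e2_t
lhs = np.diff(y1)
rhs = -alpha * np.diff(y2) + (rho - 1) * z[:-1] + e2[1:]
assert np.allclose(lhs, rhs)
```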

2.11 Summary

This chapter provides an introduction to the basic models that will be discussed in detail in later chapters. Besides an introduction to ARMA models and Box-Jenkins methods, the chapter introduces trend and difference stationarity (TSP and DSP), unit roots and cointegration, vector autoregressive (VAR) models, and error correction models (ECMs).

The next two chapters will discuss problems concerning unit roots and the subsequent two chapters will discuss cointegration.


References

Algoskoufis, G. and R. Smith (1991), "On Error Correction Models: Specification, Interpretation, Estimation," Journal of Economic Surveys, 5, 97-128.

Ansley, C.F. (1979), "An Algorithm for the Exact Likelihood of Mixed Autoregressive Moving Average Process," Biometrika, 66, 59-65.

Banerjee, A., J.J. Dolado, J.W. Galbraith, and D.F. Hendry (1993), Cointegration, Error Correction, and the Econometric Analysis of Non-Stationary Data, Oxford University Press, Oxford.

Bardsen, G. (1989), "The Estimation of Long-Run Coefficients from Error-Correction Models," Oxford Bulletin of Economics and Statistics, 51, 345-350.

Beveridge, S. and C.R. Nelson (1981), "A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the 'Business Cycle'," Journal of Monetary Economics, 7, 151-174.

Bewley, R. (1979), "The Direct Estimation of the Equilibrium Response in a Linear Dynamic Model," Economics Letters, 3, 357-361.

Bhargava, A. (1986), "On the Theory of Testing for Unit Roots in Observed Time Series," Review of Economic Studies, 53, 137-160.

Box, G.E.P. and G.M. Jenkins (1970), Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.

Box, G.E.P. and D.A. Pierce (1970), "Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average Time Series Models," Journal of the American Statistical Association, 65, 1509-1526.

Davidson, J.E.H., D.F. Hendry, F. Srba, and S. Yeo (1978), "Econometric Modelling of the Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the United Kingdom," Economic Journal, 88, 661-692.

Dickey, D.A. (1976), "Estimation and Hypothesis Testing for Nonstationary Time Series," Ph.D. dissertation, Iowa State University.

Dickey, D.A. and W.A. Fuller (1979), "Distribution of the Estimators for Autoregressive Time Series with a Unit Root," Journal of the American Statistical Association, 74, 427-431.

Engle, R.F. and C.W.J. Granger (1987), "Co-integration and Error Correction: Representation, Estimation and Testing," Econometrica, 55, 251-276.

Entorf, H. (1992), "Random Walk with Drift, Simultaneous Errors, and Small Samples: Simulating the Bird's Eye View," Institut National de la Statistique et des Etudes Economiques.

Fuller, W.A. (1976), Introduction to Statistical Time Series, John Wiley, New York.

Granger, C.W.J. (1981), "Some Properties of Time Series Data and Their Use in Econometric Model Specification," Journal of Econometrics, 16, 121-130.

Granger, C.W.J. and P. Newbold (1974), "Spurious Regressions in Econometrics," Journal of Econometrics, 2, 111-120.

Granger, C.W.J. and A.A. Weiss (1983), "Time-Series Analysis of Error-Correction Models," in S. Karlin, T. Amemiya, and L.A. Goodman (eds.), Studies in Econometrics, Time Series and Multivariate Statistics, Academic Press, New York.

Hendry, D.F., A.R. Pagan, and J.D. Sargan (1984), "Dynamic Specification," in Z. Griliches and M.D. Intriligator (eds.), Handbook of Econometrics II, North-Holland, Amsterdam, 1023-1100.

Kolmogorov, A. (1933), Foundations of the Theory of Probability, published in 1950 by Chelsea, New York.

Maddala, G.S. (1992), Introduction to Econometrics, 2nd ed., Macmillan, New York.

McCallum, B.T. (1993), "Unit Roots in Macroeconomic Time Series: Some Critical Issues," Economic Quarterly, Federal Reserve Bank of Richmond, 79, 13-43.

Nelson, C.R. and H. Kang (1981), "Spurious Periodicity in Inappropriately Detrended Time Series," Journal of Monetary Economics, 10, 139-162.

(1984), "Pitfalls in the Use of Time as an Explanatory Variable in Regression," Journal of Business and Economic Statistics, 2, 73-82.

Phillips, P.C.B. (1986), "Understanding Spurious Regression in Econometrics," Journal of Econometrics, 33, 311-340.

Plosser, C.I. and W.G. Schwert (1978), "Money, Income and Sunspots: Measuring Economic Relationships and the Effects of Differencing," Journal of Monetary Economics, 4, 637-660.

Quenouille, M. (1957), The Analysis of Multiple Time Series, Charles Griffin, London.

Sargan, J.D. (1964), "Wages and Prices in the United Kingdom: A Study in Econometric Methodology," in P.E. Hart, G. Mills, and J.K. Whitaker (eds.), Econometric Analysis for National Economic Planning, Butterworth, London; reprinted in D.F. Hendry and K.F. Wallis (eds.), Econometrics and Quantitative Economics, Basil Blackwell, Oxford, 1984.

Schmidt, P. and P.C.B. Phillips (1992), "LM Test for a Unit Root in the Presence of Deterministic Trends," Oxford Bulletin of Economics and Statistics, 54, 257-287.

    Sims, C. (1980), "Macroeconomics and Reality," Econometrica, 48, 1-48.

Slutsky, E. (1937), "The Summation of Random Causes as the Source of Cyclic Processes," Econometrica, 5, 105.

Spanos, A. (1986), Statistical Foundations of Econometric Modelling, Cambridge University Press, Cambridge.

Wickens, M.R. and T.S. Breusch (1988), "Dynamic Specification, the Long Run and the Estimation of Transformed Regression Models," Economic Journal, 98, 189-205.

Yule, G.U. (1926), "Why Do We Sometimes Get Nonsense Correlations Between Time Series? A Study in Sampling and the Nature of Time Series," Journal of the Royal Statistical Society, 89, 1-64.

Part II

Unit roots and cointegration

This part contains five chapters that form the core material that needs to be understood to follow the rest of the book.

Chapter 3 gives a brief introduction to Wiener processes. We do not go into these in great detail because we do not go into details of the derivations of asymptotic distributions. Those interested in these can refer to the source material. (Many empirical researchers do not need the derivations.) We next discuss the importance of scaling factors in the derivation of asymptotic distributions. Next we discuss the Dickey-Fuller (DF) distribution and the DF tests, the ADF test and the problem of selection of lag length (a problem that needs special attention). Next we discuss the Phillips-Perron (PP) tests, Sargan-Bhargava tests, variance ratio tests, and finally forecasting problems.

Although often used, the ADF and PP tests are useless in practice and should not be used. Some useful modifications of these tests are discussed in chapter 4. The material covered in this chapter forms the basis of all the modifications discussed in the next chapter.

Chapter 4 considers several issues in unit root testing. The reason why there are so many unit root tests is that there is no uniformly powerful test for the unit root hypothesis. We discuss several of the tests for completeness. Some of them are not worth considering, but they are all promoted by the respective authors, and the Nelson-Plosser data set is used as a guinea pig for every new test suggested.

We first discuss the problems of size distortions and low power of unit root tests and then some solutions to these problems. We also discuss tests for stationarity as null, the oft-quoted KPSS test. We do not recommend its use - it has the same low power problems as the ADF and PP tests. It is discussed here because it is often referred to - as useful for confirmatory analysis in conjunction with the ADF and PP tests. But we feel that such confirmatory analysis is an illusion (with two tests that lack power). Some useful modifications of the ADF, PP, and KPSS tests are discussed and these should be preferred. This chapter also discusses in detail panel data unit root tests. The Levin-Lin test is very often used (in fact overused) and we discuss useful modifications of this test.

Chapter 5 is on estimation of cointegrated systems and inference on the estimated coefficients. We first discuss the two-variable model and the normalization issue (not usually discussed). Second, we discuss a triangular system, the FM-OLS method, and several other methods involving lags and leads. Third, we discuss system estimation methods - the Johansen procedure (widely used) and the Box-Tiao method (rarely used). Fourth, we discuss identification problems, since the system methods identify only the cointegration space but not the individual cointegrating vectors without more prior information. Fifth, we discuss several Monte Carlo studies that have been conducted to study the different estimation methods, but very few unambiguous conclusions emerge except that the Engle-Granger two-step method should be avoided. Finally, we discuss several miscellaneous issues such as forecasting from cointegrated systems, threshold cointegration, and so on.

Chapter 6, on tests for cointegration, is complementary to chapter 5. It discusses the commonly used residual-based tests (these should be avoided), ECM-based tests, tests with cointegration as null, and tests associated with system-based estimation methods (the Johansen and Box-Tiao methods). The chapter also has a sceptical note on the use of cointegration tests. One important problem discussed (often completely ignored) is the pre-testing problem - cointegration itself depends on preliminary tests for unit roots and thus there is a question of the appropriate significance levels to use in cointegration tests.

Chapter 7, the final chapter in this part, is on inferential procedures in regression models involving I(1) and I(0) regressors, unbalanced equations, the problem of uncertain unit roots, and testing procedures under uncertainty about unit roots and cointegration. Many of these problems are not commonly discussed in books.

3

Unit roots

    3.1 Introduction

In the previous chapter we discussed the econometric problems that might occur when we run a regression with variables of different orders of integration. To avoid this problem, the first thing we have to do is to identify the correct order of integration of each variable. In the context of ARIMA modeling, this identification is equivalent to determining the parameter d in the ARIMA(p, d, q) model. The Box-Jenkins approach discussed in the previous chapter suggested the use of visual inspection of correlograms fo