Introduction to Statistical Time Series
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: Vic Barnett, Ralph A. Bradley, Nicholas I. Fisher, J. Stuart Hunter, J. B. Kadane, David G. Kendall, David W. Scott, Adrian F. M. Smith, Jozef L. Teugels, Geoffrey S. Watson
A complete list of the titles in this series appears at the end of this volume
Introduction to Statistical Time Series Second Edition
WAYNE A. FULLER Iowa State University
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Brisbane / Toronto / Singapore
A NOTE TO THE READER
This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.
This text is printed on acid-free paper.
Copyright © 1996 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, E-Mail: PERMREQ@WILEY.COM.
To order books or for customer service please call 1-800-CALL-WILEY (225-5945).
Library of Congress Cataloging in Publication Data:
Fuller, Wayne A.
Introduction to statistical time series / Wayne A. Fuller. -- 2nd ed.
p. cm. -- (Wiley series in probability and statistics)
"A Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN 0-471-55239-9 (cloth : alk. paper)
1. Time-series analysis. 2. Regression analysis. I. Title. II. Series.
QA280.F84 1996
519.5'5--dc20
95-14875
1 0 9 8 7 6 5 4 3
To Evelyn
Contents
Preface to the First Edition xi
Preface to the Second Edition xiii
List of Principal Results xv
List of Examples xxi

1. Introduction 1
   1.1 Probability Spaces 1
   1.2 Time Series 3
   1.3 Examples of Stochastic Processes 4
   1.4 Properties of the Autocovariance and Autocorrelation Functions 7
   1.5 Complex Valued Time Series 12
   1.6 Periodic Functions and Periodic Time Series 13
   1.7 Vector Valued Time Series 15
   References 17
   Exercises 17

2. Moving Average and Autoregressive Processes 21
   2.1 Moving Average Processes 21
   2.2 Absolutely Summable Sequences and Infinite Moving Averages 26
   2.3 An Introduction to Autoregressive Time Series 39
   2.4 Difference Equations 41
   2.5 The Second Order Autoregressive Time Series 54
   2.6 Alternative Representations of Autoregressive and Moving Average Processes 58
   2.7 Autoregressive Moving Average Time Series 70
   2.8 Vector Processes 75
   2.9 Prediction 79
   2.10 The Wold Decomposition 94
   2.11 Long Memory Processes 98
   References 101
   Exercises 101

3. Introduction to Fourier Analysis 112
   3.1 Systems of Orthogonal Functions -- Fourier Coefficients 112
   3.2 Complex Representation of Trigonometric Series 130
   3.3 Fourier Transform -- Functions Defined on the Real Line 132
   3.4 Fourier Transform of a Convolution 136
   References 139
   Exercises 139

4. Spectral Theory and Filtering 143
   4.1 The Spectrum 143
   4.2 Circulants -- Diagonalization of the Covariance Matrix of a Stationary Process 149
   4.3 The Spectral Density of Moving Average and Autoregressive Time Series 155
   4.4 Vector Processes 169
   4.5 Measurement Error -- Signal Detection 181
   4.6 State Space Models and Kalman Filtering 187
   References 205
   Exercises 205

5. Some Large Sample Theory 214
   5.1 Order in Probability 214
   5.2 Convergence in Distribution 227
   5.3 Central Limit Theorems 233
   5.4 Approximating a Sequence of Expectations 240
   5.5 Estimation for Nonlinear Models 250
       5.5.1 Estimators that Minimize an Objective Function 250
       5.5.2 One-Step Estimation 268
   5.6 Instrumental Variables 273
   5.7 Estimated Generalized Least Squares 279
   5.8 Sequences of Roots of Polynomials 290
   References 299
   Exercises 299

6. Estimation of the Mean and Autocorrelations 308
   6.1 Estimation of the Mean 308
   6.2 Estimators of the Autocovariance and Autocorrelation Functions 313
   6.3 Central Limit Theorems for Stationary Time Series 320
   6.4 Estimation of the Cross Covariances 339
   References 348
   Exercises 348

7. The Periodogram, Estimated Spectrum 355
   7.1 The Periodogram 355
   7.2 Smoothing, Estimating the Spectrum 366
   7.3 Other Estimators of the Spectrum 380
   7.4 Multivariate Spectral Estimates 385
   References 400
   Exercises 400

8. Parameter Estimation 404
   8.1 First Order Autoregressive Time Series 404
   8.2 Higher Order Autoregressive Time Series 407
       8.2.1 Least Squares Estimation for Univariate Processes 407
       8.2.2 Alternative Estimators for Autoregressive Time Series 413
       8.2.3 Multivariate Autoregressive Time Series 419
   8.3 Moving Average Time Series 421
   8.4 Autoregressive Moving Average Time Series 429
   8.5 Prediction with Estimated Parameters 443
   8.6 Nonlinear Processes 451
   8.7 Missing and Outlier Observations 458
   8.8 Long Memory Processes 466
   References 471
   Exercises 471

9. Regression, Trend, and Seasonality 475
   9.1 Global Least Squares 476
   9.2 Grafted Polynomials 480
   9.3 Estimation Based on Least Squares Residuals 484
       9.3.1 Estimated Autocorrelations 484
       9.3.2 Estimated Variance Functions 488
   9.4 Moving Averages -- Linear Filtering 497
       9.4.1 Moving Averages for the Mean 497
       9.4.2 Moving Averages of Integrated Time Series 502
       9.4.3 Seasonal Adjustment 504
       9.4.4 Differences 507
   9.5 Structural Models 509
   9.6 Some Effects of Moving Average Operators 513
   9.7 Regression with Time Series Errors 518
   9.8 Regression Equations with Lagged Dependent Variables and Time Series Errors 530
   References 538
   Exercises 538

10. Unit Root and Explosive Time Series 546
   10.1 Unit Root Autoregressive Time Series 546
       10.1.1 The Autoregressive Process with a Unit Root 546
       10.1.2 Random Walk with Drift 565
       10.1.3 Alternative Estimators 568
       10.1.4 Prediction for Unit Root Autoregressions 582
   10.2 Explosive Autoregressive Time Series 583
   10.3 Multivariate Autoregressive Processes with Unit Roots 596
       10.3.1 Multivariate Random Walk 596
       10.3.2 Vector Process with a Single Unit Root 599
       10.3.3 Vector Process with Several Unit Roots 617
   10.4 Testing for a Unit Root in a Moving Average Model 629
   References 638
   Exercises 638
   10.A Percentiles for Unit Root Distributions 641
   10.B Data Used in Examples 653

Bibliography 664

Index 689
Preface to the First Edition
This textbook was developed from a course in time series given at Iowa State University. The classes were composed primarily of graduate students in economics and statistics. Prerequisites for the course were an introductory graduate course in the theory of statistics and a course in linear regression analysis. Since the students entering the course had varied backgrounds, chapters containing elementary results in Fourier analysis and large sample statistics, as well as a section on difference equations, were included in the presentation.
The theorem-proof format was followed because it offered a convenient method of organizing the material. No attempt was made to present the most general results available. Instead, the objective was to give results with practical content whose proofs were generally consistent with the prerequisites. Since many of the statistics students had completed advanced courses, a few theorems were presented at a level of mathematical sophistication beyond the prerequisites. Homework requiring application of the statistical methods was an integral part of the course.
By emphasizing the relationship of the techniques to regression analysis and using data sets of moderate size, most of the homework problems can be worked with any of a number of statistical packages. One such package is SAS (Statistical Analysis System, available through the Institute of Statistics, North Carolina State University). SAS contains a segment for periodogram computations that is particularly suited to this text. The system also contains a segment for regression with time series errors compatible with the presentation in Chapter 9. Another package is available from International Mathematical and Statistical Library, Inc.; this package has a chapter on time series programs.
There is some flexibility in the order in which the material can be covered. For example, the major portions of Chapters 1, 2, 5, 6, 8, and 9 can be treated in that order with little difficulty. Portions of the later chapters deal with spectral matters, but these are not central to the development of those chapters. The discussion of multivariate time series is positioned in separate sections so that it may be introduced at any point.
I thank A. R. Gallant for the proofs of several theorems and for the repair of others; J. J. Goebel for a careful reading of the manuscript that led to numerous substantive improvements and the removal of uncounted mistakes; and D. A.
Dickey, M. Hidiroglou, R. J. Klemm, and G. H. K. Wang for computing examples and for proofreading. G. E. Battese, R. L. Carter, K. R. Crouse, J. D. Cryer, D. P. Hasza, J. D. Jobson, B. Macpherson, J. Mellon, D. A. Pierce and K. N. Wolter also read portions of the manuscript. I also thank my colleagues, R. Groeneveld, D. Isaacson, and O. Kempthorne, for useful comments and discussions. I am indebted to a seminar conducted by Marc Nerlove at Stanford University for the organization of some of the material on Fourier analysis and spectral theory. A portion of the research was supported by joint statistical agreements with the U.S. Bureau of the Census.
I thank Margaret Nichols for the repeated typings required to bring the manuscript to final form and Avmelle Jacobson for transforming much of the original illegible draft into typescript.
WAYNE A. FULLER
Ames, Iowa
February 1976
Preface to the Second Edition
Considerable development in statistical time series has occurred since the first edition was published in 1976. Notable areas of activity include nonstationary models, nonlinear estimation, multivariate models, state space representations and empirical model identification. The second edition attempts to incorporate new results and to respond to recent emphases while retaining the basic format of the first edition.
With the exception of new sections on the Wold decomposition, partial autocorrelation, long memory processes, and the Kalman filter, Chapters one through four are essentially unchanged from the first edition. Chapter 5 has been enlarged, with additional material on central limit theorems for martingale differences, an expanded treatment of nonlinear estimation, a section on estimated generalized least squares, and a section on the roots of polynomials. Chapter 6 and Chapter 8 have been revised using the asymptotic theory of Chapter 5. Also, the discussion of estimation methods has been modified to reflect advances in computing. Chapter 9 has been revised and the material on the estimation of regression equations has been expanded.
The material on nonstationary autoregressive models is now in a separate chapter, Chapter 10. New tests for unit roots in univariate processes and in vector processes have been added.
As with the first edition, the material is arranged in sections so that there is considerable flexibility to the order in which topics can be covered.
I thank David Dickey and Heon Jin Park for constructing the tables of Chapter 10. I thank Anthony An, Rohit Deo, David Hasza, N. K. Nagaraj, Sastry Pantula, Heon Jin Park, Savas Papadopoulos, Sahadeb Sarkar, Dongwan Shin, and George H. K. Wang for many useful suggestions. I am particularly indebted to Sastry Pantula who assisted with the material of Chapters 5, 8, 9, and 10 and made substantial contributions to other parts of the manuscript, including proofs of several results. Sahadeb Sarkar contributed to the material on nonlinear estimation of Chapter 5, Todd Sanger contributed to the discussion of estimated generalized least squares, Yasuo Amemiya contributed to the section on roots of polynomials, Rohit Deo contributed to the material on long memory processes, Sastry Pantula, Sahadeb Sarkar and Dongwan Shin contributed to the material on the limiting
distribution of estimators for autoregressive moving averages, and Heon Jin Park contributed to the sections on unit root autoregressive processes. I thank Abdoulaye Adam, Jay Breidt, Rohit Deo, Kevin Dodd, Savas Papadopoulos, and Anindya Roy for computing examples. I thank SAS Institute, Cary, NC, for providing computing support to Heon Jin Park for the construction of tables for unit root tests. The research for the second edition was partly supported by joint statistical agreements with the U.S. Bureau of the Census.
I thank Judy Shafer for the extensive word processing required during preparation of the second edition.
WAYNE A. FULLER
Ames, Iowa
November 1995
List of Principal Results
Theorem  Topic
1.4.1 Covariance function is positive semidefinite, 7
1.4.2 Covariance function is even, 8
1.4.3 Correlation function on real line is a characteristic function, 9
1.4.4 Correlation function on integers is a characteristic function, 9
1.5.1 Covariance function of complex series is positive semidefinite, 13
2.2.1 Weighted average of random variables, where weights are absolutely summable, defines a random variable, 31
2.2.2 Covariance of two infinite sums of random variables, 33
2.2.3 Convergence in mean square of sum of random variables, 35
2.4.1 Order of a polynomial is reduced by differencing, 46
2.4.2 Jordan canonical form of a matrix, 51
2.6.1 Representation of autoregressive process as an infinite moving average, 59
2.6.2 Representation of invertible moving average as an infinite autoregression, 65
2.6.3 Moving average representation of a time series based on covariance function, 66
2.6.4 Canonical representation of moving average time series, 68
2.7.1 Representation of autoregressive moving average as an infinite moving average, 72
2.7.2 Representation of autoregressive moving average as an infinite autoregression, 74
2.8.1 Representation of vector autoregression as an infinite moving average, 77
2.8.2 Representation of vector moving average as an infinite autoregression, 78
2.9.1 Minimum mean square error predictor, 80
2.9.2 Durbin-Levinson algorithm for constructing predictors, 82
2.9.3 Predictors as a function of previous prediction errors, 86
2.9.4 Limit of prediction error is a moving average, 89
2.10.1 Limit of one period prediction error, 94
2.10.2 Wold decomposition, 96
3.1.1 Sine and cosine functions form an orthogonal basis for N dimensional vectors, 112
3.1.2 Sine and cosine functions are orthogonal on [-π, π], 116
3.1.3 Bessel's inequality, 118
3.1.4 Fourier coefficients are zero if and only if function is zero, 119
3.1.5 If Fourier coefficients are zero, integral of function is zero, 119
3.1.6 Integral of function defined in terms of Fourier coefficients, 120
3.1.7 Pointwise representation of a function by Fourier series, 123
3.1.8 Absolute convergence of Fourier series for a class of functions, 125
3.1.9 The correlation function defines a continuous spectral density, 127
3.1.10 The Fourier series of a continuous function is Cesàro summable, 129
3.3.1 Fourier integral theorem, 133
3.4.1 Fourier transform of a convolution, 137
4.2.1 Approximate diagonalization of covariance matrix with orthogonal sine-cosine functions, 154
4.3.1 Spectral density of an infinite moving average of a time series, 156
4.3.2 Moving average representation of a time series based on covariance function, 162
4.3.3 Moving average representation of a time series based on continuous spectral density, 163
4.3.4 Autoregressive representation of a time series based on continuous spectral density, 165
4.3.5 Spectral density of moving average with square summable coefficients, 167
4.4.1 Spectral density of vector process, 179
4.5.1 Linear filter for time series observed subject to measurement error, 183
5.1.1 Chebyshev's inequality, 219
5.1.2 Common probability limits, 221
5.1.3 Convergence in rth mean implies convergence in probability, 221
5.1.4 Probability limit of a continuous function, 222
5.1.5 The algebra of O_p, 223
5.1.6 The algebra of o_p, 224
5.1.7 Taylor expansion about a random point, 226
5.2.1 Convergence in law when difference converges in probability, 228
5.2.2 Helly-Bray Theorem, 230
5.2.3 Joint convergence of distribution functions and characteristic functions, 230
5.2.4 Convergence in distribution of continuous functions, 230
5.2.5 Joint convergence in law of independent sequences, 230
5.2.6 Joint convergence in law when one element converges to a constant, 232
5.3.1 Lindeberg central limit theorem, 233
5.3.2 Liapounov central limit theorem, 233
5.3.3 Central limit theorem for vectors, 234
5.3.4 Central limit theorem for martingale differences, 235
5.3.5 Functional central limit theorem, 236
5.3.6 Convergence of functions of partial sums, 237
5.3.7 Multivariate functional central limit theorem, 238
5.3.8 Almost sure convergence of martingales, 239
5.4.1 Moments of products of sample means, 242
5.4.2 Approximate expectation of functions of means, 243
5.4.3 Approximate expectation of functions of vector means, 244
5.4.4 Bounded integrals of functions of random variables, 247
5.5.1 Limiting properties of nonlinear least squares estimator, 256
5.5.2 Limiting distribution of estimator defined by an objective function, 260
5.5.3 Consistency of nonlinear estimator with different rates of convergence, 262
5.5.4 Limiting properties of one-step Gauss-Newton estimator, 269
5.6.1 Limiting properties of instrumental variable estimators, 275
5.7.1 Convergence of estimated generalized least squares estimator to generalized least squares estimator, 280
5.7.2 Central limit theorem for estimated generalized least squares, 284
5.7.3 Estimated generalized least squares with finite number of covariance parameters, 286
5.7.4 Estimated generalized least squares based on simple least squares residuals, 289
5.8.1 Convergence of roots of a sequence of polynomials, 295
5.8.2 Differentials of roots of determinantal equation, 298
6.1.1 Convergence in mean square of sample mean, 309
6.1.2 Variance of sample mean as function of the spectral density, 310
6.1.3 Limiting efficiency of sample mean, 312
6.2.1 Covariances of sample autocovariances, 314
6.2.2 Covariances of sample autocovariances, mean estimated, 316
6.2.3 Covariances of sample correlations, 317
6.3.1 Central limit theorem for m-dependent sequences, 321
6.3.2 Convergence in probability of mean of an infinite moving average, 325
6.3.3 Central limit theorem for mean of infinite moving average, 326
6.3.4 Central limit theorem for linear function of infinite moving average, 329
6.3.5 Consistency of sample autocovariances, 331
6.3.6 Central limit theorem for autocovariances, 333
6.4.1 Covariances of sample autocovariances of vector time series, 342
6.4.2 Central limit theorem for cross covariances, one series independent (0, d), 345
7.1.1 Expected value of periodogram, 359
7.1.2 Limiting distribution of periodogram ordinates, 360
7.2.1 Covariances of periodogram ordinates, 369
7.2.2 Limiting behavior of weighted averages of periodogram ordinates, 372
7.3.1 Limiting behavior of estimated spectral density based on weighted autocovariances, 382
7.4.1 Diagonalization of covariance matrix of bivariate process, 387
7.4.2 Distribution of sample Fourier coefficients for bivariate process, 389
7.4.3 Distribution of smoothed bivariate periodogram, 390
8.2.1 Limiting distribution of regression estimators of parameters of pth order autoregressive process, 408
8.2.2 Equivalence of alternative estimators of parameters of autoregressive process, 418
8.2.3 Limiting distribution of estimators of parameter of pth order vector autoregressive process, 420
8.3.1 Limiting distribution of nonlinear estimator of parameter of first order moving average, 424
8.4.1 Limiting distribution of nonlinear estimator of vector of parameters of autoregressive moving average, 432
8.4.2 Equivalence of alternative estimators for autoregressive moving average, 434
8.5.1 Order of error in prediction due to estimated parameters, 444
8.5.2 Expectation of prediction error, 445
8.5.3 Order n^{-1} approximation to variance of prediction error, 446
8.6.1 Polynomial autoregression, 452
8.8.1 Maximum likelihood estimators for long memory processes, 470
9.1.1 Limiting distribution of simple least squares estimated parameters of regression model with time series errors, 478
9.1.2 Spectral representation of covariance matrix of simple least squares estimator and of generalized least squares estimator, 479
9.1.3 Asymptotic relative efficiency of simple least squares and generalized least squares, 480
9.3.1 Properties of autocovariances computed from least squares residuals, 485
9.4.1 Centered moving average estimator of polynomial trend, 501
9.4.2 Trend moving average removes autoregressive unit root, 502
9.4.3 Effect of a moving average for polynomial trend removal, 504
9.7.1 Asymptotic equivalence of generalized least squares and estimated generalized least squares for model with autoregressive errors, 521
9.7.2 Limiting distribution of maximum likelihood estimator of regression model with autoregressive errors, 526
9.8.1 Limiting distribution of least squares estimator of model with lagged dependent variables, 530
9.8.2 Limiting distribution of instrumental variable estimator of model with lagged dependent variables, 532
9.8.3 Limiting properties of autocovariances computed from instrumental variable residuals, 534
10.1.1 Limiting distribution of least squares estimator of first order autoregressive process with unit root, 550
10.1.2 Limiting distribution of least squares estimator of pth order autoregressive process with a unit root, 556
10.1.3 Limit distribution of least squares estimator of first order unit root process with mean estimated, 561
10.1.4 Limiting distribution of least squares estimator of pth order process with a unit root and mean estimated, 563
10.1.5 Limiting distribution of least squares estimator of unit root autoregression with drift, 566
10.1.6 Limiting distribution of least squares estimator of unit root autoregression with time in fitted model, 567
10.1.7 Limiting distribution of symmetric estimator of first order unit root autoregressive process, 570
10.1.8 Limiting distribution of symmetric estimators adjusted for mean and time trend, 571
10.1.9 Limiting distribution of symmetric test statistics for pth order autoregressive process with a unit root, 573
10.1.10 Limiting distribution of maximum likelihood estimator of unit root, 573
10.1.11 Order of error in prediction of unit root process with estimated parameters, 582
10.2.1 Limiting distribution of explosive autoregressive estimator, 585
10.2.2 Limiting distribution of vector of least squares estimators for pth order autoregression with an explosive root, 589
10.3.1 Limiting distribution of least squares estimator of vector of estimators of equation containing the lagged dependent variable with unit coefficient, 600
10.3.2 Limiting distribution of least squares estimator when one of the explanatory variables is a unit root process, 603
10.3.3 Limiting distribution of least squares estimator of coefficients of vector process with unit roots, 610
10.3.4 Limiting distribution of maximum likelihood estimator for multivariate process with a single root, 613
10.3.5 Limiting distribution of maximum likelihood estimator for multivariate process with g unit roots, 619
List of Examples
Number  Topic
1.3.1 Finite index set, 4
1.3.2 White noise, 5
1.3.3 A nonstationary time series, 5
1.3.4 Continuous time series, 6
2.1.1 First order moving average time series, 23
2.1.2 Second order moving average time series, 24
2.3.1 First order autoregressive time series, 40
2.5.1 Correlogram of time series, 57
2.9.1 Prediction for unemployment rate, 90
2.9.2 Prediction for autoregressive moving average, 91
4.1.1 Spectral distribution function composed of jumps, 147
4.1.2 Spectral distribution function of series with white noise component, 148
4.5.1 Filter for time series observed subject to measurement error, 184
4.6.1 Kalman filter for Des Moines River data, 192
4.6.2 Kalman filter, Des Moines River, mean unknown, 193
4.6.3 Kalman filter, autoregressive unit root, 195
4.6.4 Kalman filter for missing observations, 198
4.6.5 Predictions constructed with the Kalman filter, 202
5.1.1 Taylor expansion about a random point, 227
5.4.1 Approximation to expectation, 248
5.5.1 Transformation of model with different rates of convergence, 265
5.5.2 Failure of convergence of second derivative of nonlinear model, 267
5.5.3 Gauss-Newton estimation, 272
5.6.1 Instrumental variable estimation, 277
6.3.1 Sample autocorrelations and means of unemployment rate, 336
6.4.1 Autocorrelations and cross correlations of Boone-Saylorville data, 345
7.1.1 Periodogram for wheat yield data, 363
7.2.1 Periodogram of autoregression, 375
7.2.2 Periodogram of unemployment rate, 379
7.4.1 Cross spectrum computations, 394
8.2.1 Autoregressive fit to unemployment rate, 412
8.3.1 Estimation of first order moving average, 427
8.4.1 Autoregressive moving average fit to artificial data, 439
8.4.2 Autoregressive moving average fit to unemployment rate, 440
8.5.1 Prediction with estimated parameters, 449
8.6.1 Nonlinear models for lynx data, 455
8.7.1 Missing observations, 459
8.7.2 Outlier observations, 462
9.2.1 Grafted quadratic fit to U.S. wheat yields, 482
9.3.1 Variance as a function of the mean, 490
9.3.2 Stochastic volatility model, 495
9.5.1 Structural model for wheat yields, 510
9.7.1 Regression with autoregressive errors, spirit consumption, 522
9.7.2 Nonlinear estimation of trend in wheat yields with autoregressive error, 527
9.7.3 Nonlinear estimation for spirit consumption model, 528
9.8.1 Regression with lagged dependent variable and autoregressive errors, 535
10.1.1 Estimation and testing for a unit root, ordinary least squares, 564
10.1.2 Testing for a unit root, symmetric and likelihood procedures, 577
10.1.3 Estimation for process with autoregressive root in (-1, 1], 581
10.1.4 Prediction for a process with a unit root, 583
10.2.1 Estimation with an explosive root, 593
10.2.2 Prediction for an explosive process, 596
10.3.1 Estimation of regression with autoregressive explanatory variable, 606
10.3.2 Estimation and testing for vector process with unit roots, 624
10.4.1 Test for a moving average unit root, 635
CHAPTER 1
Introduction
The analysis of time series applies to many fields. In economics the recorded history of the economy is often in the form of time series. Economic behavior is quantified in such time series as the consumer price index, unemployment, gross national product, population, and production. The natural sciences also furnish many examples of time series. The water level in a lake, the air temperature, the yields of corn in Iowa, and the size of natural populations are all collected over time. Growth models that arise naturally in biology and in the social sciences represent an entire field in themselves.
The mathematical theory of time series has been an area of considerable activity in recent years. Applications in the physical sciences such as the development of designs for airplanes and rockets, the improvement of radar and other electronic devices, and the investigation of certain production and chemical processes have resulted in considerable interest in time series analysis. This recent work should not disguise the fact that the analysis of time series is one of the oldest activities of scientific man.
A successful application of statistical methods to the real world requires a melding of statistical theory and knowledge of the material under study. We shall confine ourselves to the statistical treatment of moderately well-behaved time series, but we shall illustrate some techniques with real data.
1.1. PROBABILITY SPACES
When investigating outcomes of a game, an experiment, or some natural phenomenon, it is useful to have a representation for all possible outcomes. The individual outcomes, denoted by ω, are called elementary events. The set of all possible elementary events is called the sure event and is denoted by Ω. An example is the tossing of a die, where we could take Ω = {one spot shows, two spots show, …, six spots show} or, more simply, Ω = {1, 2, 3, 4, 5, 6}.
Let A be a subset of Ω, and let 𝒜 be a collection of such subsets. If we observe the outcome ω and ω is in A, we say that A has occurred. Intuitively, it is possible
to specify P(A), the probability that (or expected long-run frequency with which) A will occur. It is reasonable to require that the function P(A) satisfy:
AXIOM 1. P(A) ≥ 0 for every A in 𝒜.

AXIOM 2. P(Ω) = 1.

AXIOM 3. If A_1, A_2, … is a countable sequence from 𝒜 and A_i ∩ A_j is the null set for all i ≠ j, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
Using our die tossing example, if the die is fair we would take P(A) = (1/6) × [the number of elementary events ω in A]. Thus P({1, 3, 5}) = (1/6) × 3 = 1/2. It may be verified that Axioms 1 to 3 are satisfied for 𝒜 equal to 𝒫(Ω), the collection of all possible subsets of Ω.
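The fair-die measure is easy to sketch in a few lines of Python (the code is our illustration, not part of the original text): an event is any subset of Ω, and P counts the elementary events it contains.

```python
from fractions import Fraction

# Fair-die probability space: Omega = {1, ..., 6} and, for any event A
# (a subset of Omega), P(A) = (1/6) * [number of elementary events in A].
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(A) for the fair die."""
    return Fraction(len(event), len(omega))

print(prob({1, 3, 5}))  # the odd outcomes: P = 3/6 = 1/2
```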
Unfortunately, for technical mathematical reasons, it is not always possible to define P(A) for all A in 𝒫(Ω) and also to satisfy Axiom 3. To eliminate this difficulty, the class of subsets 𝒜 of Ω on which P is defined is restricted. The collection 𝒜 is required to satisfy:
1. If A is in 𝒜, then the complement A^c is also in 𝒜.
2. If A_1, A_2, … is a countable sequence from 𝒜, then ∪_{i=1}^∞ A_i is in 𝒜.
3. The null set is in 𝒜.

A nonempty collection 𝒜 of subsets of Ω that satisfies conditions 1 to 3 is said to be a sigma-algebra or sigma-field.
We are now in a position to give a formal definition of a probability space. A probability space, represented by (Ω, 𝒜, P), is the sure event Ω together with a sigma-algebra 𝒜 of subsets of Ω and a function P(A) defined on 𝒜 that satisfies Axioms 1 to 3.
For our purposes it is unnecessary to consider the subject of probability spaces in detail. In simple situations such as tossing a die, Ω is easy to enumerate, and P satisfies Axioms 1 to 3 for 𝒜 equal to the set of all subsets of Ω.
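For a finite Ω the three closure conditions can be checked mechanically. The Python sketch below (ours, not the book's) reduces countable unions to pairwise unions, which suffices for a finite collection; it confirms that the power set of the die space is a sigma-algebra while a smaller collection that omits a complement is not.

```python
from itertools import combinations

# Check conditions 1-3 of the text for a finite Omega. For a finite
# collection, closure under countable unions reduces to closure under
# pairwise (hence finite) unions.
def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_sigma_algebra(omega, collection):
    coll = {frozenset(a) for a in collection}
    if frozenset() not in coll:                       # condition 3: null set
        return False
    for a in coll:                                    # condition 1: complements
        if frozenset(omega) - a not in coll:
            return False
    for a in coll:                                    # condition 2: unions
        for b in coll:
            if a | b not in coll:
                return False
    return True

omega = {1, 2, 3, 4, 5, 6}
print(is_sigma_algebra(omega, powerset(omega)))       # True
print(is_sigma_algebra(omega, [set(), {1}, omega]))   # False: {1}^c is missing
```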
Although it is conceptually possible to enumerate all possible outcomes of an experiment, it may be a practical impossibility to do so, and for most purposes it is unnecessary to do so. It is usually enough to record the outcome by some function that assumes values on the real line. That is, we assign to each outcome ω a real number X(ω), and if ω is observed, we record X(ω). In our die tossing example we could take X(ω) = 1 if the player wins and −1 if the house wins.
Formally, a random variable X is a real valued function defined on Ω such that the set {ω : X(ω) ≤ x} is a member of 𝒜 for every real number x. The function F_X(x) = P({ω : X(ω) ≤ x}) is called the distribution function of the random variable X.
The reader who wishes to explore further the subjects of probability spaces, random variables, and distribution functions for stochastic processes may read Tucker (1967, pp. 1-33). The preceding brief introduction will suffice for our purposes.
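A short Python sketch (again ours) of a random variable on the die space and its distribution function; the winning rule used here (the player wins on an even count of spots) is an illustrative assumption, since the text does not fix one.

```python
from fractions import Fraction

# Random variable on the die space, following the text: X(w) = 1 if the
# player wins and -1 if the house wins. The winning rule (player wins on
# an even count of spots) is an assumption made for illustration.
omega = {1, 2, 3, 4, 5, 6}

def X(w):
    return 1 if w % 2 == 0 else -1

def F(x):
    """Distribution function F_X(x) = P({w : X(w) <= x})."""
    event = {w for w in omega if X(w) <= x}
    return Fraction(len(event), len(omega))

print(F(-1), F(0), F(1))  # 1/2 1/2 1
```

F jumps by 1/2 at x = −1 and by 1/2 at x = 1, the two values X can take.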
1.2. TIME SERIES
Let (Ω, 𝒜, P) be a probability space, and let T be an index set. A real valued time series (or stochastic process) is a real valued function X(t, ω) defined on T × Ω such that for each fixed t, X(t, ω) is a random variable on (Ω, 𝒜, P). The function X(t, ω) is often written X_t(ω) or X_t, and a time series can be considered as a collection {X_t : t ∈ T} of random variables.
For fixed ω, X(t, ω) is a real valued function of t. This function of t is called a realization or a sample function. If we look at a plot of some recorded time series such as the gross national product, it is important to realize that conceptually we are looking at a plot of X(t, ω) with ω fixed. The collection of all possible realizations is called the ensemble of functions or the ensemble of realizations.
If the index set contains exactly one element, the stochastic process is a single random variable and we have defined the distribution function of the process. For stochastic processes with more than one random variable we need to consider the joint distribution function. The joint distribution function of a finite set of random variables {X_{t_1}, X_{t_2}, ..., X_{t_n}} from the collection {X_t : t ∈ T} is defined by

F_{X_{t_1}, X_{t_2}, ..., X_{t_n}}(x_1, x_2, ..., x_n) = P{ω : X_{t_1} ≤ x_1, X_{t_2} ≤ x_2, ..., X_{t_n} ≤ x_n}.    (1.2.1)
A time series is called strictly stationary if

F_{X_{t_1}, X_{t_2}, ..., X_{t_m}}(x_1, x_2, ..., x_m) = F_{X_{t_1+h}, X_{t_2+h}, ..., X_{t_m+h}}(x_1, x_2, ..., x_m),

where the equality must hold for all possible (nonempty finite distinct) sets of indices t_1, t_2, ..., t_m and t_1 + h, t_2 + h, ..., t_m + h in the index set and all (x_1, x_2, ..., x_m) in the range of the random variables. Note that the indices t_1, t_2, ..., t_m are not necessarily consecutive. If a time series is strictly stationary, we see that the distribution function of the random variable is the same at every point in the index set. Furthermore, the joint distribution depends only on the distance between the elements in the index set, and not on their actual values. Naturally this does not mean a particular realization will appear the same as another realization.
If {X_t : t ∈ T} is a strictly stationary time series with E{|X_t|} < ∞, then the expected value of X_t is a constant for all t, since the distribution function is the same for all t. Likewise, if E{X_t²} < ∞, then the variance of X_t is a constant for all t.
A time series is defined completely in a probabilistic sense if one knows the cumulative distribution (1.2.1) for any finite set of random variables (X_{t_1}, X_{t_2}, ..., X_{t_n}). However, in most applications, the form of the distribution function is not known. A great deal can be accomplished, however, by dealing
only with the first two moments of the time series. In line with this approach we define a time series to be weakly stationary if:
1. The expected value of X_t is a constant for all t.
2. The covariance matrix of (X_{t_1}, X_{t_2}, ..., X_{t_n}) is the same as the covariance matrix of (X_{t_1+h}, X_{t_2+h}, ..., X_{t_n+h}) for all nonempty finite sets of indices (t_1, t_2, ..., t_n), and all h such that t_1, t_2, ..., t_n, t_1 + h, t_2 + h, ..., t_n + h are contained in the index set.
As before, t_1, t_2, ..., t_n are not necessarily consecutive members of the index set. Also, since the expected value of X_t is a constant, it may conveniently be taken as 0. The covariance matrix, by definition, is a function only of the distance between observations. That is, the covariance of X_{t+h} and X_t depends only on the distance, h, and we may write

γ(h) = E{X_t X_{t+h}},

where E{X_t} has been taken to be zero. The function γ(h) is called the autocovariance of X_t. When there is no danger of confusion, we shall abbreviate the expression to covariance.
The terms stationary in the wide sense, covariance stationary, second order stationary, and stationary are also used to describe a weakly stationary time series. It follows from the definitions that a strictly stationary process with the first two moments finite is also weakly stationary. However, a strictly stationary time series may not possess finite moments and hence may not be covariance stationary.
Many time series as they occur in practice are not stationary. For example, the economies of many countries are developing or growing. Therefore, the typical economic indicators will be showing a "trend" through time. This trend may be in either the mean, the variance, or both. Such nonstationary time series are sometimes called evolutionary. A good portion of the practical analysis of time series is connected with the transformation of an evolving time series into a stationary time series. In later sections we shall consider several of the procedures used in this connection. Many of these techniques will be familiar to the reader because they are closely related to least squares and regression.
1.3. EXAMPLES OF STOCHASTIC PROCESSES
Example 1.3.1. Let the index set be T = {1, 2}, and let the space of outcomes be the possible outcomes associated with tossing two dice, one at "time" t = 1 and one at time t = 2. Then

Ω = {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6}.
Define
X(t, ω) = t + [value on die t]².

Therefore, for a particular ω, say ω = (1, 3), the realization or sample function would be (1 + 1, 2 + 9) = (2, 11). In this case, both Ω and T are finite, and we are able to determine that there are 36 possible realizations. AA
Example 1.3.2. One of the most important time series is a sequence of uncorrelated random variables, say {e_t : t ∈ (0, ±1, ±2, ...)}, each with zero mean and finite variance, σ² > 0. We say that e_t is a sequence of uncorrelated (0, σ²) random variables.
This time series is sometimes called white noise. Note that the index set is the set of all integers. The set Ω is determined by the range of the random variables. Let us assume that the e_t are normally distributed and therefore have range (−∞, ∞). Then ω ∈ Ω is a real valued infinite-dimensional vector with an element associated with each integer. The covariance function of {e_t} is given by
γ(h) = σ², h = 0,
γ(h) = 0, otherwise.
Because of the importance of this time series, we shall reserve the symbol e_t for a sequence of uncorrelated random variables with zero mean and positive finite variance. On occasion we shall further assume that the variables are independent and perhaps of a specified distribution. These additional assumptions will be stated when used.
In a similar manner, the most commonly used index set will be the set of all integers. If the index set is not stated, the reader may assume that it is the set of all integers. AA
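For readers who compute, the white noise of Example 1.3.2 is easy to simulate, and its covariance function is easy to estimate from a single realization. The following sketch is ours, not the text's: the function names and the choice σ² = 4 are illustrative. It draws a normal white noise realization and checks that the sample autocovariance is near σ² at h = 0 and near zero at other lags.

```python
import random

# Sketch (our construction): simulate the white noise of Example 1.3.2 with
# sigma^2 = 4 and estimate gamma(h); expect gamma(0) near 4 and gamma(3) near 0.

def white_noise(n, sigma=1.0, seed=0):
    """Draw n independent normal (0, sigma^2) variables: one realization of e_t."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def sample_autocovariance(x, h):
    """Estimate gamma(h) = E{e_t e_{t+h}} for a zero-mean realization x."""
    n = len(x)
    return sum(x[t] * x[t + h] for t in range(n - h)) / (n - h)

e = white_noise(20000, sigma=2.0)
gamma0 = sample_autocovariance(e, 0)   # near sigma^2 = 4
gamma3 = sample_autocovariance(e, 3)   # near 0
```

That time averages recover the covariance function here is a consequence of properties studied later; for a general process only the ensemble average is guaranteed to do so.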
Example 1.3.3. Consider the time series

X_t = B_1 + B_2 t,  t ∈ [0, 1],

where B = (B_1, B_2)' is distributed as a bivariate normal random variable with mean β = (β_1, β_2)' and covariance matrix

( σ_11  σ_12 )
( σ_21  σ_22 ).
Any realization of this process yields a function continuous on the interval [0, 1], and the process therefore is called continuous. Such a process might represent the outcome of an experiment to estimate the linear response to an input variable measured on the interval [0, 1]. The B's are then the estimated regression coefficients. The set Ω is the two-dimensional space of real numbers. Each
experiment conducted to obtain a set of estimates of β_1 and β_2 constitutes a realization of the process.
The mean function of X_t is given by

m(t) = E{B_1 + B_2 t} = β_1 + β_2 t,
and the covariance function by

γ(t, s) = Cov{X_t, X_s} = σ_11 + (t + s)σ_12 + tsσ_22.

It is clear that this process is not stationary. AA
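A small Monte Carlo sketch can make the ensemble point of view concrete. The parameter values below are our illustrative choices, not the text's: β = (1, 2)', σ_11 = 1, σ_22 = 4, and σ_12 = 0. Sample moments taken across many realizations should approach m(t) = β_1 + β_2 t and γ(t, s) = σ_11 + (t + s)σ_12 + tsσ_22.

```python
import random

# Monte Carlo sketch (our parameter choices) of the random-line process
# X_t = B1 + B2*t on [0, 1], with B1 ~ N(1, 1), B2 ~ N(2, 4), Cov(B1, B2) = 0.
rng = random.Random(1)

def realization():
    """Draw (B1, B2) once and return the whole sample function t -> B1 + B2*t."""
    b1 = 1.0 + rng.gauss(0.0, 1.0)
    b2 = 2.0 + rng.gauss(0.0, 2.0)
    return lambda t: b1 + b2 * t

t, s = 0.3, 0.8
draws = [(x(t), x(s)) for x in (realization() for _ in range(50000))]

mean_t = sum(u for u, _ in draws) / len(draws)   # theory: m(0.3) = 1 + 2(0.3) = 1.6
mean_s = sum(v for _, v in draws) / len(draws)   # theory: m(0.8) = 2.6
cov_ts = sum(u * v for u, v in draws) / len(draws) - mean_t * mean_s
# theory: gamma(0.3, 0.8) = 1 + (0.3)(0.8)(4) = 1.96
```

Note that each call to `realization()` fixes ω (that is, a (B_1, B_2) pair) and returns the entire sample function; the averaging is over realizations, not over time.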
Example 1.3.4. For the reader familiar only with the study of multivariate statistics, the idea of a random variable that is both continuous and random in time may require a moment of reflection. On the other hand, such processes occur in nature. For example, the water level of a lake or river can be plotted as a continuous function of time. Furthermore, such a plot might appear so smooth as to support the idea of a derivative or instantaneous rate of change in level.
Consider such a process, {X_t : t ∈ (−∞, ∞)}, and let the covariance of the process be given by

γ(h) = E{(X_t − μ)(X_{t+h} − μ)} = Ae^{−αh²},  α > 0.
Thus, for example, if time is measured in hours and the river level at 7:00 A.M. is reported daily, then the covariance between daily reports is Ae^{−576α}. Likewise, the covariance between the change from 7:00 A.M. to 8:00 A.M. and the change from 8:00 A.M. to 9:00 A.M. is given by

E{(X_{t+1} − X_t)(X_{t+2} − X_{t+1})} = −A[1 − 2e^{−α} + e^{−4α}].
Note that the variance of the change from t to t + h is

Var{X_{t+h} − X_t} = 2A[1 − e^{−αh²}],

and

lim_{h→0} Var{X_{t+h} − X_t} = 0.
A process with this property is called mean square continuous. We might define the rate of change per unit of time by

R_t(h) = h^{−1}(X_{t+h} − X_t).
For a fixed h > 0, R_t(h) is a well-defined random variable and

Var{R_t(h)} = 2Ah^{−2}[1 − e^{−αh²}].

Furthermore, by L'Hospital's rule,

lim_{h→0} Var{R_t(h)} = lim_{h→0} 2A[2αhe^{−αh²}] / (2h) = 2Aα.
Stochastic processes for which this limit exists are called mean square differentiable. AA
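The two limits above can be checked numerically from the covariance function alone, since Var{X_{t+h} − X_t} = 2[γ(0) − γ(h)]. The sketch below uses the illustrative values A = 3 and α = 0.5, which are our choices, not the text's.

```python
import math

# Numeric check (illustrative A = 3.0, alpha = 0.5) of mean square continuity
# and differentiability for gamma(h) = A * exp(-alpha * h**2).
A, alpha = 3.0, 0.5

def var_increment(h):
    """Var{X_{t+h} - X_t} = 2*[gamma(0) - gamma(h)] = 2*A*(1 - exp(-alpha*h^2))."""
    return 2.0 * A * (1.0 - math.exp(-alpha * h * h))

def var_rate(h):
    """Var{R_t(h)} = Var{X_{t+h} - X_t} / h^2; should approach 2*A*alpha."""
    return var_increment(h) / (h * h)

small = var_increment(0.001)   # near 0: mean square continuity
slope = var_rate(0.001)        # near 2*A*alpha = 3.0: mean square differentiability
```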
1.4. PROPERTIES OF THE AUTOCOVARIANCE AND AUTOCORRELATION FUNCTIONS
To compare the basic properties of time series, it is often useful to have a function that is not influenced by the units of measurement. To this end, we define the autocorrelation function of a stationary time series by

ρ(h) = [γ(0)]^{−1}γ(h).

Thus the autocorrelation function is the autocovariance function normalized to be one at h = 0. As with the autocovariance function, we shall abbreviate the expression to correlation function when no confusion will result.
The autocovariance and autocorrelation functions of stationary time series possess several distinguishing characteristics. A function f(x) defined for x ∈ χ is said to be positive semidefinite if it satisfies

Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j f(t_i − t_j) ≥ 0    (1.4.1)

for any set of real numbers (a_1, a_2, ..., a_n) and for any (t_1, t_2, ..., t_n) such that t_i − t_j is in χ for all (i, j). Some authors reserve the term positive semidefinite for a function for which the sum in (1.4.1) is zero for at least one nonzero vector (a_1, a_2, ..., a_n) and use nonnegative definite to describe functions satisfying (1.4.1). We shall use both terms for functions satisfying (1.4.1).
Theorem 1.4.1. The covariance function of the stationary time series {X_t : t ∈ T} is nonnegative definite.
Proof. Without loss of generality, let E{X_t} = 0. Let (t_1, t_2, ..., t_n) ∈ T, let (a_1, a_2, ..., a_n) be any set of real numbers, and let γ(t_k − t_j) be the covariance
between X_{t_j} and X_{t_k}. We know that the variance of a random variable, when it is defined, is nonnegative. Therefore,

0 ≤ Var{ Σ_{i=1}^{n} a_i X_{t_i} } = Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j γ(t_i − t_j).

A
If we set n = 2 in (1.4.1), we have

a_1²γ(0) + 2a_1 a_2 γ(t_2 − t_1) + a_2²γ(0) ≥ 0,

which implies

|γ(t_2 − t_1)| ≤ γ(0).

For t_2 − t_1 = h we set −a_1 = a_2 = 1 and then set a_1 = a_2 = 1 to obtain the well-known property of correlations:

−1 ≤ ρ(h) ≤ 1.
The concepts of even and odd functions (about zero) will prove useful in our study. For example, cos t is an even function. The use of this description is apparent when one notes the symmetry of cos t on the interval [−π, π]. Similarly, sin t is an odd function. In general, an even function, f(t), defined on a domain T, is a function that satisfies f(t) = f(−t) for all t and −t in T. An odd function g(t) is a function satisfying g(t) = −g(−t) for all t and −t in T. Many simple properties of even and odd functions follow immediately. For example, if f(t) is an even function and g(t) is an odd function, where both are integrable on the interval [−A, A], then, for 0 ≤ b ≤ A,

∫_{−b}^{b} f(t) dt = 2 ∫_{0}^{b} f(t) dt  and  ∫_{−b}^{b} g(t) dt = 0.
As an exercise, the reader may verify that the product of two even functions is even, the product of two odd functions is even, the product of an odd and an even function is odd, the sum of even functions is even, and the sum of odd functions is odd.
Theorem 1.4.2. The covariance function of a real valued stationary time series is an even function of h. That is, γ(h) = γ(−h).
Proof. We assume, without loss of generality, that E{X,} = 0. By stationarity,
E{X_t X_{t+h}} = γ(h)
for all t and t + h contained in the index set. Therefore, if we set t_0 = t − h,

γ(−h) = E{X_{t−h} X_t} = E{X_{t_0} X_{t_0+h}} = γ(h).

A
Given this theorem, we shall often evaluate γ(h) for real valued time series for nonnegative h only. Should we fail to so specify, the reader may always safely substitute |h| for h in a covariance function.
In the study of statistical distribution functions the characteristic function of a distribution function is defined by

φ(h) = ∫_{−∞}^{∞} e^{ixh} dG(x),

where the integral is a Lebesgue–Stieltjes integral, G(x) is the distribution function, and e^{ixh} is the complex exponential defined by e^{ixh} = cos xh + i sin xh. It is readily established that the function φ(h) satisfies:

1. φ(0) = 1.
2. |φ(h)| ≤ 1 for all h ∈ (−∞, ∞).
3. φ(h) is uniformly continuous on (−∞, ∞).
See, for example, Tucker (1967, p. 42 ff.) or Gnedenko (1967, p. 266 ff.). It can be shown that a continuous function φ(h) with φ(0) = 1 is a characteristic function if and only if it is nonnegative definite. For example, see Gnedenko (1967, pp. 290, 387).
Theorem 1.4.3. The real valued function ρ(h) is the correlation function of a mean square continuous stationary real valued time series with index set T = (−∞, ∞) if and only if it is representable in the form

ρ(h) = ∫_{−∞}^{∞} cos(xh) dG(x),

where G(x) is a symmetric distribution function, and the integral is a Lebesgue–Stieltjes integral.
For the index set T = {0, ±1, ±2, ...}, the corresponding theorem is:

Theorem 1.4.4. The real valued function ρ(h) is the correlation function of a real valued stationary time series with index set T = {0, ±1, ±2, ...} if and only if
it is representable in the form

ρ(h) = ∫_{−π}^{π} cos(xh) dG(x),

where G(x) is a symmetric distribution function.
This theorem will be discussed in Chapter 3. Associated with the autocorrelation function of a time series is the partial autocorrelation function. Before discussing ideas of partial correlation for time series, we recall the partial correlation coefficient of multivariate analysis. Let Y = (Y_1, Y_2, ..., Y_p)' be a p-variate random variable with nonsingular covariance matrix. Then the partial correlation between Y_1 and Y_2 after Y_3 is

ρ_{12.3} = (ρ_{12} − ρ_{13}ρ_{23})[(1 − ρ_{13}²)(1 − ρ_{23}²)]^{−1/2},

where ρ_{ij} = (σ_{ii}σ_{jj})^{−1/2}σ_{ij} and σ_{ij} is the covariance between Y_i and Y_j. An alternative definition of ρ_{12.3} in terms of regression coefficients is
ρ_{12.3}² = β_{12.3}β_{21.3},    (1.4.4)

where β_{ij.k} is the partial regression coefficient for Y_j in the regression of Y_i on Y_j and Y_k. For example,

β_{13.2} = [Var{Y_3 − β_{32}Y_2}]^{−1} Cov{Y_1 − β_{12}Y_2, Y_3 − β_{32}Y_2}.
Recall that β_{ij.k} is the simple population regression coefficient for the regression of Y_i − β_{ik}Y_k on Y_j − β_{jk}Y_k, where β_{jk} = σ_{kk}^{−1}σ_{jk}. Therefore, the partial correlation between Y_i and Y_j after Y_k is the simple correlation between Y_i − β_{ik}Y_k and Y_j − β_{jk}Y_k. Also note that the sign of ρ_{ij.k} is always equal to the sign of β_{ij.k}. The multiple correlation associated with the regression of Y_i on Y_j and Y_k is denoted by R_{i(jk)}, and is defined by the equation
1 − R²_{i(jk)} = (1 − ρ_{ij}²)(1 − ρ_{ik.j}²) = (1 − ρ_{ik}²)(1 − ρ_{ij.k}²).    (1.4.5)
Using (1.4.4), the extension of the definition to higher order partial correlations is straightforward. The squared partial correlation between Y_i and Y_j after adjusting for Y_k and Y_l is

ρ_{ij.kl}² = β_{ij.kl}β_{ji.kl},    (1.4.6)
where

(β_{ij.kl})   (σ_{jj} σ_{jk} σ_{jl})⁻¹ (σ_{ij})
(β_{ik.jl}) = (σ_{kj} σ_{kk} σ_{kl})   (σ_{ik})
(β_{il.jk})   (σ_{lj} σ_{lk} σ_{ll})   (σ_{il}).    (1.4.7)
The partial correlation ρ_{ij.kl} can also be interpreted as the simple correlation between Y_i − β_{ik.l}Y_k − β_{il.k}Y_l and Y_j − β_{jk.l}Y_k − β_{jl.k}Y_l. The multiple correlation associated with the regression of Y_i on the set (Y_j, Y_k, Y_l) satisfies

1 − R²_{i(jkl)} = (1 − ρ_{ij}²)(1 − ρ_{ik.j}²)(1 − ρ_{il.jk}²).    (1.4.8)

The squared partial correlation between Y_1 and Y_2 after adjusting for (Y_3, Y_4, ..., Y_p) is

ρ²_{12.3,4,...,p} = β_{12.3,4,...,p} β_{21.3,4,...,p},
where β_{12.3,4,...,p} is the population regression coefficient for Y_2 in the regression of Y_1 on Y_2, Y_3, ..., Y_p, and β_{21.3,4,...,p} is the population regression coefficient for Y_1 in the regression of Y_2 on Y_1, Y_3, Y_4, ..., Y_p. The multiple correlation between Y_1 and the set (Y_2, Y_3, ..., Y_p) satisfies the equation
1 − R²_{1(2,3,...,p)} = (1 − ρ_{12}²)(1 − ρ_{13.2}²)(1 − ρ_{14.2,3}²) ··· (1 − ρ_{1p.2,3,...,p−1}²).    (1.4.9)
For a covariance stationary time series Y_t with R²_{1(2,3,...,h)} < 1, the partial autocorrelation function is denoted by φ(h) and, for 0 < h ≤ r, is the partial correlation between Y_t and Y_{t+h} after adjusting for Y_{t+1}, Y_{t+2}, ..., Y_{t+h−1}. It is understood that φ(0) = 1 and φ(1) = ρ(1). The partial autocorrelation function is defined in an analogous way for h < 0. Because ρ(h) is symmetric, φ(h) is symmetric. Let θ_{i,h} be the population regression coefficient for Y_{t−i}, 1 ≤ i ≤ h, in the regression of Y_t on Y_{t−1}, Y_{t−2}, ..., Y_{t−h}. The population regression equation is

Y_t = θ_{1,h}Y_{t−1} + θ_{2,h}Y_{t−2} + ··· + θ_{h,h}Y_{t−h} + a_{t,h},    (1.4.10)

where

(θ_{1,h}, θ_{2,h}, ..., θ_{h,h})' = [ρ(i − j)]⁻¹(ρ(1), ρ(2), ..., ρ(h))',    (1.4.11)

[ρ(i − j)] being the h × h matrix with ijth element ρ(i − j);
the matrix of correlations is positive definite by the assumption that R²_{1(2,3,...,h)} < 1, and a_{t,h} is the population regression residual. From (1.4.11),
φ(h) = θ_{h,h},    (1.4.12)
because the coefficient for Y_{t−h} in the regression of Y_t on Y_{t−1}, Y_{t−2}, ..., Y_{t−h} is equal to the coefficient for Y_t in the regression of Y_{t−h} on Y_{t−h+1}, Y_{t−h+2}, ..., Y_t. The equality follows from the symmetry of the autocorrelation function ρ(h). From (1.4.9) we have

1 − R²_{t(t−1,...,t−h)} = [1 − φ²(1)][1 − φ²(2)] ··· [1 − φ²(h)].    (1.4.13)
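Equations (1.4.11) and (1.4.12) translate directly into a small computation: build the h × h correlation matrix, solve the linear system for the regression coefficients, and read off φ(h) = θ_{h,h}. The sketch below is ours (the solver and function names are not from the text); it checks, for a correlation function with ρ(1) = 0.7/1.49 and ρ(h) = 0 for h > 1, that φ(2) = −ρ²(1)/[1 − ρ²(1)].

```python
# Sketch (our helper names) of (1.4.11)-(1.4.12): phi(h) is the last coefficient
# theta_{h,h} obtained by solving the h x h correlation system.

def solve(a, b):
    """Solve the small dense linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def pacf(rho, h):
    """phi(h) = theta_{h,h} from the correlation system (1.4.11)."""
    corr = [[rho(abs(i - j)) for j in range(h)] for i in range(h)]
    rhs = [rho(j + 1) for j in range(h)]
    return solve(corr, rhs)[-1]

# correlation function with rho(1) = 0.7/1.49 and rho(h) = 0 for h > 1
r1 = 0.7 / 1.49
rho = lambda h: 1.0 if h == 0 else (r1 if h == 1 else 0.0)
phi1 = pacf(rho, 1)   # equals rho(1)
phi2 = pacf(rho, 2)   # theory: -r1**2 / (1 - r1**2)
```

This is the correlation function of a first order moving average (Section 2.1), and the computed φ(2) agrees with the partial correlation derived there.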
1.5. COMPLEX VALUED TIME SERIES
Occasionally it is advantageous, from a theoretical point of view, to consider complex valued time series. Letting X_t and Y_t be two real valued time series, we define the complex valued time series Z_t by

Z_t = X_t + iY_t.    (1.5.1)
The expected value of Z_t is given by

E{Z_t} = E{X_t} + iE{Y_t},    (1.5.2)

and we note that

E*{Z_t} = E{Z_t*} = E{X_t} − iE{Y_t},    (1.5.3)
where the symbol "*" is used to denote the complex conjugate. The covariance of Z_t and Z_{t+h} is defined as

Cov{Z_{t+h}, Z_t} = E{[Z_{t+h} − E{Z_{t+h}}][Z_t − E{Z_t}]*}.    (1.5.4)
Note that the variance of a complex valued process is always real and nonnegative, since it is the sum of the variances of two real valued random variables.
The definitions of stationarity for complex time series are completely analogous to those for real time series. Thus, a complex time series Z_t is weakly stationary if the expected value of Z_t is a constant for all t and the covariance matrix of (Z_{t_1}, Z_{t_2}, ..., Z_{t_n}) is the same as the covariance matrix of (Z_{t_1+h}, Z_{t_2+h}, ..., Z_{t_n+h}), where all indices are contained in the index set.
From (1.5.4), the autocovariance of a stationary complex time series Z_t with zero mean is given by

γ_Z(h) = E{Z_{t+h}Z_t*} = γ_X(h) + γ_Y(h) + i[γ_{YX}(h) − γ_{XY}(h)] = g_1(h) + ig_2(h).    (1.5.5)

We see that g_1(h) is a symmetric or even function of h, and g_2(h) is an odd function of h. By the definition of the complex conjugate, we have
γ_Z*(h) = γ_Z(−h).    (1.5.6)
Therefore, the autocovariance function of a complex time series is skew symmetric, where (1.5.6) is the definition of a skew symmetric function.
A complex valued function γ(·), defined on the integers, is positive semidefinite if

Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j* γ(t_i − t_j) ≥ 0    (1.5.7)

for any set of n complex numbers (a_1, a_2, ..., a_n) and any integers (t_1, t_2, ..., t_n). Thus, as in the real valued case, we have:
Theorem 1.5.1. The covariance function of a stationary complex valued time series is positive semidefinite.
It follows from Theorem 1.5.1 that the correlation inequality holds for complex random variables; that is,

|γ(h)| ≤ γ(0).
In the sequel, if we use the simple term "time series," the reader may assume that we are speaking of a real valued time series. All complex valued time series will be identified as such.
1.6. PERIODIC FUNCTIONS AND PERIODIC TIME SERIES
Periodic functions play an important role in the analysis of empirical time series. We define a function f(t) with domain T to be periodic if there exists an H > 0 such that for all t, t + H ∈ T,

f(t + H) = f(t),

where H is the period of the function. That is, the function f(t) takes on all of its
possible values in an interval of length H. For any positive integer k, kH is also a period of the function. While examples of perfectly periodic functions are rare, there are situations where observed time series may be decomposed into the sum of two time series, one of which is periodic or nearly periodic. Even casual observation of many economic time series will disclose seasonal variation wherein peaks and troughs occur at approximately the same month each year. Seasonal variation is apparent in many natural time series, such as the water level in a lake, daily temperature, wind velocities, and the levels of the tides. Many of these time series also display regular daily variation that has the appearance of rough "cycles."
The trigonometric functions have traditionally been used to approximate periodic behavior. A function obeying the sine–cosine type periodicity is completely specified by three parameters. Thus, we write

f(t) = A sin(λt + φ),    (1.6.1)

where the amplitude is the absolute value of A, λ is the frequency, and φ is the phase angle. The frequency is the number of times the function repeats itself in a period of length 2π. The phase angle is a "shift parameter" in that it determines the points (−λ^{−1}φ plus integer multiples of λ^{−1}π) where the function is zero. A parametrization that is more useful in the estimation of such a function can be constructed by using the trigonometric identity
sin(λt + φ) = sin λt cos φ + cos λt sin φ.

Thus (1.6.1) becomes

f(t) = B_1 cos λt + B_2 sin λt,    (1.6.2)

where B_1 = A sin φ and B_2 = A cos φ.
Let us consider a simple type of time series whose realizations will display perfect periodicity. Define {X_t : t ∈ (0, ±1, ±2, ...)} by

X_t = e_1 cos λt + e_2 sin λt,    (1.6.3)

where e_1 and e_2 are independent drawings from a normal (0, 1) population. Note that the realization is completely determined by the two random variables e_1 and e_2. The amplitude and phase angle vary from realization to realization, but the period is the same for every realization.
The stochastic properties of this time series are easily derived:
E{X_t} = E{e_1 cos λt + e_2 sin λt} = 0;

γ(h) = E{(e_1 cos λt + e_2 sin λt)[e_1 cos λ(t + h) + e_2 sin λ(t + h)]}
     = E{e_1² cos λt cos λ(t + h) + e_2² sin λt sin λ(t + h)    (1.6.4)
         + e_1e_2 cos λt sin λ(t + h) + e_2e_1 sin λt cos λ(t + h)}
     = cos λt cos λ(t + h) + sin λt sin λ(t + h)
     = cos λh.
We see that the process is stationary. This example also serves to emphasize the fact that the covariance function is obtained by averaging the product X_t X_{t+h} over realizations.
It is clear that any time series defined by the finite sum

X_t = Σ_{j=1}^{M} (e_{1j} cos λ_j t + e_{2j} sin λ_j t),    (1.6.5)

where e_{1j} and e_{2j} are independent drawings from a normal (0, σ_j²) population and λ_k ≠ λ_j for k ≠ j, will display periodic behavior. The representation (1.6.5) is a useful approximation for portions of some empirical time series, and we shall see that ideas associated with this representation are important for the theoretical study of time series as well.
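The statement that the covariance function (1.6.4) is an average over realizations can be verified by simulation: draw many (e_1, e_2) pairs, form X_t and X_{t+h} for each, and average the products. The values λ = 0.8, t = 5, h = 3 in the sketch below are our illustrative choices, not values from the text.

```python
import math
import random

# Ensemble check of (1.6.4) (lam, t, h are illustrative choices): averaging
# X_t * X_{t+h} over realizations of X_t = e1*cos(lam*t) + e2*sin(lam*t)
# should recover gamma(h) = cos(lam*h).
rng = random.Random(7)
lam, t, h = 0.8, 5, 3

n_real = 100000
total = 0.0
for _ in range(n_real):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_t = e1 * math.cos(lam * t) + e2 * math.sin(lam * t)
    x_th = e1 * math.cos(lam * (t + h)) + e2 * math.sin(lam * (t + h))
    total += x_t * x_th

gamma_hat = total / n_real   # theory: cos(lam * h) = cos(2.4)
```

Note that averaging X_t X_{t+h} over t within a single realization would not work here without care, since a single realization is a fixed sinusoid; the theoretical covariance is an ensemble quantity.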
1.7. VECTOR VALUED TIME SERIES
Most of the concepts associated with univariate time series have immediate generalizations to vector valued time series. We now introduce representations for multivariate processes.
The k-dimensional time series {X_t : t = 0, ±1, ±2, ...} is defined by

X_t = [X_{1t}, X_{2t}, ..., X_{kt}]',    (1.7.1)

where {X_{it} : t = 0, ±1, ±2, ...}, i = 1, 2, ..., k, are scalar time series. The expected value of X_t is

E{X_t'} = [E{X_{1t}}, E{X_{2t}}, ..., E{X_{kt}}].
Assuming the mean is zero, we define the covariance matrix of X_t and X_{t+h} by E{X_{t+h}X_t'}.
As with scalar time series, we define X_t to be covariance stationary if:
1. The expected value of X_t is a constant function of time.
2. The covariance matrix of X_t and X_{t+h} is the same as the covariance matrix of X_j and X_{j+h} for all t, t + h, j, j + h in the index set.
If a vector time series is stationary, then every component scalar time series is stationary. However, a vector of scalar stationary time series is not necessarily a vector stationary time series.
The second stationarity condition means that we can express the covariance matrix as a function of h only, and, assuming E{X_t} = 0, we write

Γ(h) = E{X_{t+h}X_t'}    (1.7.2)

for stationary time series. Note that the diagonal elements of this matrix are the autocovariances of the X_{it}. The off-diagonal elements are the cross covariances of X_{it} and X_{jt}. The element E{X_{i,t+h}X_{jt}} is not necessarily equal to E{X_{it}X_{j,t+h}}, and hence Γ(h) is not necessarily equal to Γ(−h). For example, let
X_{1t} = e_t,
X_{2t} = e_t + βe_{t−1}.    (1.7.3)
Then

Γ(1) = E{X_{t+1}X_t'} = σ² ( 0  0 )
                           ( β  β ).

However,

Γ(−1) = E{X_{t−1}X_t'} = σ² ( 0  β )
                            ( 0  β ),

and it is clear that

Γ(1) = Γ'(−1) ≠ Γ(−1).    (1.7.4)

It is easy to verify that (1.7.4) holds for all vector stationary time series, and we state the result as a lemma.
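The asymmetry Γ(1) ≠ Γ(−1) for the series (1.7.3) shows up directly in sample cross covariances. The sketch below uses β = 0.5 and σ² = 1, our illustrative choices, and estimates Γ(1), whose theoretical value is σ² times the matrix with rows (0, 0) and (β, β).

```python
import random

# Sample check (beta = 0.5, sigma^2 = 1, illustrative) that for (1.7.3)
# Gamma(1) = sigma^2 * [[0, 0], [beta, beta]], so Gamma(1) = Gamma'(-1) != Gamma(-1).
rng = random.Random(3)
beta, n = 0.5, 100000
e = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = e[1:]                                          # X_{1t} = e_t
x2 = [e[t] + beta * e[t - 1] for t in range(1, n)]  # X_{2t} = e_t + beta*e_{t-1}

def cross_cov(a, b, h):
    """Estimate E{a_{t+h} b_t} for zero-mean series a and b of equal length."""
    m = len(a) - h
    return sum(a[t + h] * b[t] for t in range(m)) / m

gamma1 = [[cross_cov(p, q, 1) for q in (x1, x2)] for p in (x1, x2)]
# theory: [[0, 0], [0.5, 0.5]]
```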
Lemma 1.7.1. The autocovariance matrix of a vector stationary time series satisfies Γ(h) = Γ'(−h).

The nonnegative definite property of the scalar autocovariance function is maintained essentially unchanged for vector processes.
Lemma 1.7.2. The covariance function of a vector stationary time series {X_t : t ∈ T} is a nonnegative definite function in that

Σ_{i=1}^{n} Σ_{j=1}^{n} a_i'Γ(t_i − t_j)a_j ≥ 0

for any set of real vectors {a_1, a_2, ..., a_n} and any set of indices {t_1, t_2, ..., t_n} ∈ T.

Proof. The result can be obtained by evaluating the variance of Σ_{i=1}^{n} a_i'X_{t_i}.

A
We define the correlation matrix of X_t and X_{t+h} by

P(h) = D_0^{−1}Γ(h)D_0^{−1},    (1.7.5)

where D_0 is a diagonal matrix with the square roots of the variances of the X_{it} as diagonal elements; that is,

D_0² = diag{γ_{11}(0), γ_{22}(0), ..., γ_{kk}(0)}.

The ijth element of P(h), ρ_{ij}(h), is called the cross correlation of X_{it} and X_{jt}. For the time series of (1.7.3) we have

P(1) = ( 0                 0              )
       ( β(1 + β²)^{−1/2}  β(1 + β²)^{−1} ).
REFERENCES

Chung (1968), Gnedenko (1967), Loève (1963), Rao (1973), Tucker (1967), Yaglom (1962).
EXERCISES
1. Determine the mean and covariance function for Example 1.3.1 of Section 1.3.
2. Discuss the stationarity of the following time series:
(a) {X_t : t ∈ (0, ±1, ±2, ...)}, where, for t odd, X_t is the value of a randomly chosen observation from a normal distribution with mean 1/2 and variance 1/4, and, for t even,

X_t = 1 if the toss of a true coin results in a head,
X_t = 0 if the toss of a true coin results in a tail.
(b) {X_t : t ∈ (0, ±1, ±2, ...)} is a time series of independent identically distributed random variables whose distribution function is that of Student's t-distribution with one degree of freedom.
(c) X_t = c, t = 0,
    X_t = ρX_{t−1} + e_t, t = 1, 2, ...,

where c is a constant, |ρ| < 1, and the e_t are iid(0, 1) random variables.
3. Which of the following processes is covariance stationary for T = {t : t ∈ (0, 1, 2, ...)}, where the e_i are independent identically distributed (0, 1) random variables and a_1, a_2 are fixed real numbers?
(a) e_1 + e_2 cos t.
(b) e_1 + e_2 cos t + e_3 sin t.
(c) a_1 + e_1 cos t.
(d) a_1 + e_1 cos t + e_2 sin t.
(e) e_1 + a_1 cos t.
(f) a_1 + e_1 a_2^t + e_2, 0 < a_2 < 1.
4. (a) Prove the following theorem.

Theorem. If X_t and Y_t are independent covariance stationary time series, then aX_t + bY_t, where a and b are real numbers, is a covariance stationary time series.
(b) Let Z_t = X_t + Y_t. Give an example to show that Z_t covariance stationary does not imply that X_t is covariance stationary.
5. Give a covariance stationary time series such that ρ(h) ≠ 0 for all h.
6. Which of the following functions is the covariance function of a stationary time series? Explain why or why not.
(a) g(h) = 1 + |h|, h = 0, ±1, ±2, ...;
(b) g(h) = −1/2, h = ±1, …;
(c) g(h) = 1 + (1/2) sin 4h, h = 0, ±1, ±2, ...;
(d) g(h) = 1 + (1/2) cos 4h, h = 0, ±1, ±2, ...;
(e) g(h) = 1, h = 0; 0 otherwise;
(f) g(h) = 1, h = 0, ±1; 0 otherwise.
7. Let e_t be a time series of normal independent (0, σ²) random variables. Define

Y_t = e_t if t is odd,
Y_t = −e_t if t is even.

Is the complex valued time series Z_t = e_t + iY_t stationary?
8. Let

ρ(h) = 1, h = 0,
ρ(h) = a, h = ±1,
ρ(h) = 0 otherwise.
(a) For what values of a is ρ(h) the autocorrelation function of a stationary time series?
(b) Compute the partial autocorrelation function associated with ρ(h), given that a is such that ρ(h) is an autocorrelation function.
9. Let the time series Y_t be defined by

Y_t = e_1 cos 0.5πt + e_2 sin 0.5πt + e_t

for t = 3, 4, ..., where e_t ~ NI(0, 1) and NI(0, 1) denotes normally and independently distributed with zero mean and unit variance. Compute the partial autocorrelation function of Y_t for h = 1, 2, 3, 4.
10. Show that if β_{12.3} of (1.4.4) is zero, then β_{21.3} is also zero. Extend this result to the pth order partial regression coefficients β_{1p.2,...,p−1} and β_{p1.2,...,p−1}.
11. Assume that for the stationary time series Y_t there is some r such that R²_{1(2,3,...,r)} = 1 and R²_{1(2,3,...,r−1)} < 1, where R²_{1(2,3,...,r)} is the squared multiple correlation between Y_1 and (Y_2, ..., Y_r). Show that |φ(r − 1)| = 1 and that the correlation matrix in (1.4.11) is singular for h ≥ r.
12. Let {Z_t} be a sequence of normal independent (0, 1) random variables defined on the integers.
(a) Give the mean and covariance function of the time series (i) Y_t = Z_tZ_{t−1} and (ii) X_t = Z_t + Z_{t−1}.
(b) Is the time series

X_t = …, t even,
X_t = …, t odd,

covariance stationary? Is it strictly stationary?
CHAPTER 2
Moving Average and Autoregressive Processes
In any study, one of the first undertakings is to describe the item under investigation. Typically there is no unique description and, as the discipline develops, alternatives appear that prove more useful in some applications.
The joint distribution function (1.2.1) gives a complete description of the distributional properties of a time series. The covariance function provides a less complete description, but one from which useful conclusions can be drawn. As we have noted, the correlation function has functional properties analogous to those of the characteristic function of a statistical distribution function. Therefore, the distribution function associated with the correlation function provides an alternative representation for a time series. This representation will be introduced in Chapter 3 and discussed in Chapter 4.
Another method of describing a time series is to express it as a function of more elementary time series. A sequence of independent identically distributed random variables is a very simple type of time series. In applications it is often possible to express the observed time series as a relatively simple function of this elementary time series. In particular, many time series have been represented as linear combinations of uncorrelated (0, σ²) or of independent (0, σ²) random variables. We now consider such representations.
2.1. MOVING AVERAGE PROCESSES
We shall call the time series {X_t : t ∈ (0, ±1, ±2, ...)}, defined by

X_t = Σ_{j=−M}^{M} a_j e_{t−j},    (2.1.1)

where M is a nonnegative integer, the a_j are real numbers, a_M ≠ 0, and the e_t are uncorrelated (0, σ²) random variables, a finite moving average time series or a finite moving average process.
If there exists no finite M such that a_j = 0 for all j with absolute value greater than M, then {X_t} is an infinite moving average process and is given the representation

X_t = Σ_{j=−∞}^{∞} a_j e_{t−j}.    (2.1.2)

The time series defined by

X_t = Σ_{j=0}^{M} a_j e_{t−j},    (2.1.3)
where a_0 ≠ 0 and a_M ≠ 0, is called a one-sided moving average of order M. Note that the number of terms in the sum on the right-hand side of (2.1.3) is M + 1. In particular, (2.1.3) is a left moving average, since nonzero weights are applied only to e_t and to e's with indexes less than t. We shall most frequently use the one-sided representation of moving average time series. Note that there would be no loss of generality in considering only the one-sided representation. If in the representation (2.1.1) we define the random variable ε_t by ε_t = e_{t+M}, we have, for (2.1.1),

X_t = Σ_{j=0}^{2M} β_j ε_{t−j},    (2.1.4)

where β_j = a_{j−M}.
Likewise we can, without loss of generality, take β_0 in (2.1.4) to be one. If β_0 is not one, we simply define

u_s = β_0 ε_s,
δ_s = β_s β_0^{−1},  s = 0, 1, 2, ..., 2M,

and obtain the representation

X_t = Σ_{s=0}^{2M} δ_s u_{t−s},    (2.1.5)

where δ_0 = 1 and the u_t are uncorrelated (0, β_0²σ²) random variables.
The covariance function of an Mth order moving average is distinctive in that it is zero for all h satisfying |h| > M. For the time series (2.1.3) we have

E{X_t} = 0,

γ(h) = σ² Σ_{j=0}^{M−h} a_{j+h} a_j, h = 0, 1, ..., M,
γ(h) = 0, h > M.    (2.1.6)
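Formula (2.1.6) is a one-line computation given the weights. The sketch below (the function name is ours) evaluates it for the first order average X_t = e_t + 0.7e_{t−1}, reproducing the value γ(0) = 1.49 used in Example 2.1.1.

```python
# Direct implementation (our function name) of (2.1.6) for the one-sided
# moving average X_t = sum_{j=0}^{M} a_j e_{t-j} with uncorrelated (0, sigma2) e_t.

def ma_autocovariance(a, sigma2, h):
    """gamma(h) = sigma2 * sum_{j=0}^{M-|h|} a_{j+|h|} a_j; zero for |h| > M."""
    h = abs(h)
    if h >= len(a):
        return 0.0
    return sigma2 * sum(a[j + h] * a[j] for j in range(len(a) - h))

# first order example X_t = e_t + 0.7 e_{t-1} with sigma2 = 1:
g0 = ma_autocovariance([1.0, 0.7], 1.0, 0)   # 1 + 0.49 = 1.49
g1 = ma_autocovariance([1.0, 0.7], 1.0, 1)   # 0.7
g2 = ma_autocovariance([1.0, 0.7], 1.0, 2)   # 0.0
```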
As an example, consider the simple first order time series

X_t = e_t + αe_{t−1}.

Clearly,

E{X_t} = 0  and  γ(0) = (1 + α²)σ²,

and

γ(1) = γ(−1) = ασ²,  γ(h) = 0 for |h| > 1.
By (2.1.6) only the first autocorrelation of a first order moving average is nonzero, while all autocorrelations of order greater than two for a second order moving average are zero. For a qth order moving average, ρ(q) ≠ 0, and all higher order autocorrelations are zero. There are further restrictions on the correlation function of moving average time series. Consider the first order moving average
X_t = e_t + αe_{t−1}.    (2.1.7)

Then

ρ(1) = α(1 + α²)^{−1}.    (2.1.8)
Using elementary calculus, it is easy to prove that p( 1) achieves a maximum of 0.5 for a = 1 and a minimum of -0.5 for a = -1.
Thus, the first autocorrelation of a first order moving average process will always fall in the interval [-0.5,0.5]. For any p in (0.0.5) [or in (-0.5, O)] there are two a-values yielding such a p, one in the interval (0.1) [or in (- 1,O)) and one in the interval ( 1 , ~ ) [or (-03, -I)].
Example 2.1.1. In Figure 2.1.1 we display 100 observations from a realization of the time series

$$X_t = e_t + 0.7 e_{t-1},$$

where the e_t are normal independent (0, 1) random variables. The positive correlation between adjacent observations is clear to the eye and is emphasized by the plot of X_t against X_{t-1} of Figure 2.1.2. The correlation between X_t and X_{t-1} is (1.49)^{-1}(0.7) = 0.4698, and this is the slope of the line plotted on the figure. On the other hand, since the e_t are independent, X_t is independent of X_{t-2}, and this is clear from the plot of X_t against X_{t-2} in Figure 2.1.3.
Figure 2.1.1. One hundred observations from the time series X_t = e_t + 0.7e_{t-1}.
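The behavior described in Example 2.1.1 is easy to reproduce numerically. The following sketch is not part of the original text; the seed, sample size, and function names are arbitrary choices. It simulates the MA(1) series and checks that the sample autocorrelations at lags 1 and 2 are near the theoretical values 0.7/1.49 ≈ 0.47 and 0.

```python
import random

random.seed(1)
alpha, n = 0.7, 20000

# e_t independent N(0, 1); X_t = e_t + 0.7 e_{t-1}
e = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [e[t] + alpha * e[t - 1] for t in range(1, n + 1)]

def sample_autocorr(series, h):
    """Sample autocorrelation at lag h (mean known to be zero)."""
    m = len(series)
    c0 = sum(v * v for v in series) / m
    ch = sum(series[t] * series[t - h] for t in range(h, m)) / m
    return ch / c0

rho1 = alpha / (1.0 + alpha ** 2)   # theoretical rho(1) = 0.7/1.49
print(round(rho1, 4))                              # 0.4698
print(abs(sample_autocorr(x, 1) - rho1) < 0.05)    # True
print(abs(sample_autocorr(x, 2)) < 0.05)           # True
```

With 20,000 observations the sampling error of an autocorrelation is on the order of 0.01, so the agreement with the theoretical values is well within the stated tolerance.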
While the simple correlation between X_t and X_{t-2} is zero, the partial correlation between X_t and X_{t-2} adjusted for X_{t-1} is not zero. The correlation between X_t and X_{t-1} is ρ(1) = (1 + α²)^{-1}α, which is the same as that between X_{t-1} and X_{t-2}. Therefore, the partial correlation between X_t and X_{t-2} adjusted for X_{t-1} is

$$\frac{\mathrm{Cov}\{X_t - cX_{t-1},\ X_{t-2} - cX_{t-1}\}}{\mathrm{Var}\{X_t - cX_{t-1}\}} = \frac{-\alpha^2}{1 + \alpha^2 + \alpha^4},$$

where c = (1 + α²)^{-1}α. In Figure 2.1.4, X_t − 0.7(1.49)^{-1}X_{t-1} is plotted against X_{t-2} − 0.7(1.49)^{-1}X_{t-1} for the time series of Figure 2.1.1. For this time series the theoretical partial autocorrelation is −0.283, and this is the slope of the line plotted in Figure 2.1.4.
AA
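The partial-correlation computation above can be checked by direct arithmetic on the MA(1) autocovariances. The sketch below is not part of the text; it takes σ² = 1 and α = 0.7, the values used in the example.

```python
alpha = 0.7
c = alpha / (1 + alpha ** 2)            # c = (1 + alpha^2)^{-1} alpha

# Autocovariances of X_t = e_t + alpha e_{t-1} with sigma^2 = 1:
g0, g1, g2 = 1 + alpha ** 2, alpha, 0.0

cov = g2 - 2 * c * g1 + c ** 2 * g0     # Cov{X_t - cX_{t-1}, X_{t-2} - cX_{t-1}}
var = g0 - 2 * c * g1 + c ** 2 * g0     # Var{X_t - cX_{t-1}}
partial = cov / var

# Closed form -alpha^2 / (1 + alpha^2 + alpha^4):
closed = -alpha ** 2 / (1 + alpha ** 2 + alpha ** 4)
print(round(partial, 3))                # -0.283
print(abs(partial - closed) < 1e-12)    # True
```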
Example 2.1.2. One hundred observations from the second order moving average

$$X_t = e_t - 1.40 e_{t-1} + 0.48 e_{t-2}$$

are displayed in Figure 2.1.5. The correlation function of this process is

$$\rho(h) = \begin{cases} -0.65, & h = 1, \\ \phantom{-}0.15, & h = 2, \\ \phantom{-}0, & h \ge 3. \end{cases}$$
Once again the nature of the correlation can be observed from the plot. Adjacent observations are often on opposite sides of the mean, indicating that the first order autocorrelation is negative. This correlation is also apparent in Figure 2.1.6, where X_t is plotted against X_{t-1}. AA

Figure 2.1.2. Plot of X_t against X_{t-1} for 100 observations from X_t = e_t + 0.7e_{t-1}.
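The correlation function displayed in Example 2.1.2 follows from (2.1.6). A short sketch (not in the original text; σ² is set to 1) reproduces the stated values:

```python
a = [1.0, -1.40, 0.48]      # weights of X_t = e_t - 1.40 e_{t-1} + 0.48 e_{t-2}

def gamma(h):
    """gamma(h) = sigma^2 * sum_j a_j a_{j+h} (sigma^2 = 1), zero for h > 2."""
    return sum(a[j] * a[j + h] for j in range(len(a) - h)) if h < len(a) else 0.0

g0 = gamma(0)
print(round(gamma(1) / g0, 2))   # -0.65
print(round(gamma(2) / g0, 2))   # 0.15
print(gamma(3))                  # 0.0
```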
At the beginning of this section we defined a moving average time series to be a linear combination of a sequence of uncorrelated random variables. It is also common to speak of forming a moving average of an arbitrary time series X_t. Thus, Y_t defined by

$$Y_t = \sum_{j=-M}^{M} a_j X_{t+j}$$

is a moving average of the time series X_t. The reader may prove:

Lemma 2.1.1. If {X_t: t ∈ (0, ±1, ±2, …)} is a stationary time series with mean zero and covariance function γ_X(h), then
$$Y_t = \sum_{i=-M}^{M} a_i X_{t+i},$$

where M is finite and the a_i are real numbers, is a stationary time series with mean zero and covariance function

$$\gamma_Y(h) = \sum_{i=-M}^{M} \sum_{j=-M}^{M} a_i a_j \gamma_X(h + j - i).$$

Figure 2.1.3. Plot of X_t against X_{t-2} for 100 observations from X_t = e_t + 0.7e_{t-1}.
2.2. ABSOLUTELY SUMMABLE SEQUENCES AND INFINITE MOVING AVERAGES

Before investigating the properties of infinite moving average time series, we review some results on the convergence of partial sums of infinite sequences of real numbers.

Recall that an infinite sequence is a function whose domain is the set
Figure 2.1.4. Plot of X_t − 0.47X_{t-1} against X_{t-2} − 0.47X_{t-1} for 100 observations from X_t = e_t + 0.7e_{t-1}.
T_1 = (1, 2, …) of natural numbers. A doubly infinite sequence is a function whose domain is the set T_2 = (0, ±1, ±2, …) of all integers. When no confusion will result, we shall use the word sequence for a function whose domain is either T_1 or T_2. We denote the infinite sequence by {u_j}_{j=1}^∞ and the doubly infinite sequence by {a_j}_{j=-∞}^∞. When the domain is clear, the notation will be abbreviated to {u_j} or {a_j}.

The infinite series created from the sequence {u_j}_{j=1}^∞ is the sequence of partial sums {s_j}_{j=1}^∞ defined by

$$s_j = \sum_{i=1}^{j} u_i. \qquad (2.2.1)$$

The symbols $\sum_{j=1}^{\infty} u_j$ and Σ{u_j} are often used to represent both the infinite sequence {s_j}_{j=1}^∞ generated by the sequence {u_j} and the limit
Figure 2.1.5. One hundred observations from the time series X_t = e_t − 1.40e_{t-1} + 0.48e_{t-2}.
when the limit is defined. If the sequence {s_j} has a finite limit, we say that the series Σ{u_j} is convergent. If the limit is not defined, the series is divergent.

If the limit

$$\lim_{n\to\infty} \sum_{j=1}^{n} |u_j|$$

is finite, the series Σ{u_j} is said to be absolutely convergent, and we write $\sum_{j=1}^{\infty} |u_j| < \infty$. We also describe this situation by saying the sequence {u_j} is absolutely summable. For a doubly infinite sequence, if the limit
$$\lim_{n\to\infty} \sum_{j=-n}^{n} |a_j|$$

exists and is finite, then the series is absolutely convergent, and we write

$$\sum_{j=-\infty}^{\infty} |a_j| < \infty,$$

and say that the sequence {a_j} is absolutely summable.
The following elementary properties of absolutely summable sequences will be useful in our investigation of moving average time series.

1. If {u_j} is absolutely summable, then the series Σ{u_j} is convergent and

$$\Big|\sum_{j=1}^{\infty} u_j\Big| \le \sum_{j=1}^{\infty} |u_j|.$$

The reader can verify that the converse is not true by considering $\sum_{j=1}^{\infty} (-1)^j j^{-1}$ and $\sum_{j=1}^{\infty} j^{-1}$.
2. Given two absolutely summable sequences {a_j} and {b_j}, the sequences {a_j + b_j} and {a_j b_j} are absolutely summable:

$$\sum_{j=-\infty}^{\infty} |a_j + b_j| \le \sum_{j=-\infty}^{\infty} |a_j| + \sum_{j=-\infty}^{\infty} |b_j| < \infty,$$

$$\sum_{j=-\infty}^{\infty} |a_j b_j| \le \Big(\sup_j |b_j|\Big) \sum_{j=-\infty}^{\infty} |a_j| < \infty.$$

Of course, $\sum_{j=-\infty}^{\infty} (|a_j| + |b_j|)^2 < \infty$, since {(|a_j| + |b_j|)²} is the square of an absolutely summable sequence.
3. The convolution of two absolutely summable sequences {a_j} and {b_j}, defined by

$$c_m = \sum_{j=-\infty}^{\infty} a_j b_{m-j},$$

is absolutely summable:

$$\sum_{m=-\infty}^{\infty} |c_m| \le \sum_{m=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} |a_j|\,|b_{m-j}| = \Big(\sum_{j=-\infty}^{\infty} |a_j|\Big)\Big(\sum_{j=-\infty}^{\infty} |b_j|\Big) < \infty.$$

This result generalizes, and, for example, if we define

$$d_m = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} a_j b_k c_{m-j-k},$$

where {a_j}, {b_j}, and {c_j} are absolutely summable, then

$$\sum_{m=-\infty}^{\infty} |d_m| \le \Big(\sum_{j=-\infty}^{\infty} |a_j|\Big)\Big(\sum_{j=-\infty}^{\infty} |b_j|\Big)\Big(\sum_{j=-\infty}^{\infty} |c_j|\Big) < \infty.$$
The infinite moving average time series

$$X_t = \sum_{j=-\infty}^{\infty} a_j e_{t-j}, \qquad t = 0, \pm 1, \pm 2, \ldots,$$

was introduced in equation (2.1.2) of Section 2.1. In investigating the behavior of infinite moving averages we will repeatedly interchange the summation and expectation operations. As justification for this procedure, we give the following theorems. The theorems may be skipped by those not familiar with real analysis. The organization of the proofs follows suggestions made by Torres (1986) and Pantula (1988a). We begin with a lemma on almost sure convergence of a sequence of random functions.
Lemma 2.2.1. Let {Z_j} be a sequence of random variables defined on the probability space (Ω, 𝒜, P). Assume

$$\sum_{j=-\infty}^{\infty} E\{|Z_j|\} < \infty.$$

Then $\sum_{j=-\infty}^{\infty} Z_j$ is defined as an almost sure limit and

$$E\Big\{\sum_{j=-\infty}^{\infty} Z_j\Big\} = \sum_{j=-\infty}^{\infty} E\{Z_j\}.$$
Proof. Let a nondecreasing sequence of random variables be defined by

$$Y_k(\omega) = \sum_{j=-k}^{k} |Z_j(\omega)|.$$

Therefore, the limit as k → ∞ is defined. By assumption, E{|Z_j|} is finite for each j and it follows that

$$E\{Y_k\} \le M < \infty$$

for some M and all k. Therefore, by the monotone convergence theorem,

$$E\Big\{\lim_{k\to\infty} Y_k\Big\} = \lim_{k\to\infty} E\{Y_k\} \le M.$$

See, for example, Royden (1989, p. 265). Also $\sum_{j=-\infty}^{\infty} |Z_j| < \infty$ a.s. Therefore, $\sum_{j=-\infty}^{\infty} Z_j$ is defined as an almost sure limit and

$$|S_n| \le \sum_{j=-\infty}^{\infty} |Z_j| \quad \text{a.s.},$$

where $S_n = \sum_{j=-n}^{n} Z_j$. It follows, by the dominated convergence theorem, that

$$\lim_{n\to\infty} E\{S_n\} = E\Big\{\lim_{n\to\infty} S_n\Big\} = E\Big\{\sum_{j=-\infty}^{\infty} Z_j\Big\}. \qquad A$$
Theorem 2.2.1. Let the sequence of real numbers {a_j} and the sequence of random variables {Z_t} satisfy

$$\sum_{j=-\infty}^{\infty} |a_j| \le M < \infty$$

and E{Z_t²} ≤ K, t = 0, ±1, ±2, …, for some finite K. Then there exists a sequence of random variables {X_t} such that, for t = 0, ±1, ±2, …,

$$X_t = \sum_{j=-\infty}^{\infty} a_j Z_{t-j} \quad \text{a.s.},$$

and E{X_t²} ≤ M²K.
Proof. By Lemma 2.2.1, X_t is well defined as an almost sure limit and

$$E\{|X_t|\} \le \sum_{j=-\infty}^{\infty} |a_j|\, E\{|Z_{t-j}|\} \le MK^{1/2}.$$

Because (|Z_t| − |Z_j|)² ≥ 0, we have E{|Z_t| |Z_j|} ≤ K for all t, j. It follows that

$$\sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} |a_j|\,|a_k|\, E\{|Z_{t-j}|\,|Z_{t-k}|\} \le M^2 K,$$

and applying Lemma 2.2.1, we have that

$$E\{X_t^2\} = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} a_j a_k E\{Z_{t-j} Z_{t-k}\} \le M^2 K$$

for all t = 0, ±1, ±2, …. It follows that

$$E\Big\{\Big(X_t - \sum_{j=-n}^{n} a_j Z_{t-j}\Big)^2\Big\} \le K\Big(\sum_{|j|>n} |a_j|\Big)^2,$$

and $\sum_{|j|>n} |a_j|$ converges to zero as n → ∞. A
Because almost sure convergence implies convergence in probability, the sequence of random variables {X_t} of Theorem 2.2.1 is defined as a limit in probability, as a limit in mean square, and as an almost sure limit. The second moment condition is required for convergence in mean square, and the first absolute moment is required for almost sure convergence. Hereafter, we shall often use the infinite sum with no modifiers to represent the limit random variable.
Theorem 2.2.2. Let the sequences of real numbers {a_j}, {b_j} be absolutely summable. Let {Z_t} be a sequence of random variables such that E{Z_t²} ≤ K for t = 0, ±1, ±2, …, and for some finite K. Define

$$X_t = \sum_{j=-\infty}^{\infty} a_j Z_{t-j}, \qquad Y_t = \sum_{j=-\infty}^{\infty} b_j Z_{t-j}.$$

Then

$$E\{X_t\} = \sum_{j=-\infty}^{\infty} a_j E\{Z_{t-j}\}$$

and

$$E\{X_t Y_s\} = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} a_j b_k E\{Z_{t-j} Z_{s-k}\}.$$

Proof. The expectation result follows from Lemma 2.2.1 and the proof of Theorem 2.2.1. By assumption,

$$\sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} E\{|a_j Z_{t-j} b_k Z_{s-k}|\} \le K \sum_{j=-\infty}^{\infty} |a_j| \sum_{k=-\infty}^{\infty} |b_k| < \infty,$$

and the conclusion follows by Lemma 2.2.1. A
Corollary 2.2.2.1. Let the sequences of real numbers {a_j} and {b_j} be absolutely summable. Let {e_t} be a sequence of uncorrelated (0, σ²) random variables, and define for t = 0, ±1, ±2, …

$$X_t = \sum_{j=-\infty}^{\infty} a_j e_{t-j}, \qquad Y_t = \sum_{j=-\infty}^{\infty} b_j e_{t-j}.$$

Then E{X_t} = 0 and

$$E\{X_t Y_t\} = \sigma^2 \sum_{j=-\infty}^{\infty} a_j b_j.$$

Proof. The sequence {e_t} satisfies the hypotheses of Theorem 2.2.2 with K = σ². Since E{e_t} = 0 and E{e_t e_j} = 0 for t ≠ j, we have the results. A
Corollary 2.2.2.2. Let {X_t} be given by

$$X_t = \sum_{j=-\infty}^{\infty} a_j e_{t-j},$$

and define

$$Y_t = \sum_{j=-\infty}^{\infty} b_j X_{t-j},$$

where {e_t} is a sequence of uncorrelated (0, σ²) random variables and {a_j} and {b_j} are absolutely summable sequences of real numbers. Then {Y_t} is a stationary moving average time series with

$$Y_t = \sum_{m=-\infty}^{\infty} c_m e_{t-m}, \qquad c_m = \sum_{j=-\infty}^{\infty} b_j a_{m-j}, \qquad (2.2.2)$$

and covariance function

$$\gamma_Y(h) = \sigma^2 \sum_{m=-\infty}^{\infty} c_m c_{m+h}, \qquad h = 0, 1, 2, \ldots.$$
Proof. By definition,

$$Y_t = \sum_{j=-\infty}^{\infty} b_j \sum_{k=-\infty}^{\infty} a_k e_{t-j-k}.$$

Since {a_j} and {b_j} are absolutely summable, we may interchange the order of summation (Fubini's theorem). Setting m = j + k, we have

$$Y_t = \sum_{m=-\infty}^{\infty} \Big(\sum_{j=-\infty}^{\infty} b_j a_{m-j}\Big) e_{t-m} = \sum_{m=-\infty}^{\infty} c_m e_{t-m},$$

where $c_m = \sum_{j=-\infty}^{\infty} b_j a_{m-j}$. By result 3 on the convolution of absolutely summable sequences, {c_m} is absolutely summable. Therefore,
$$\sum_{m=-\infty}^{\infty} c_m c_{m-h}$$

is also absolutely summable. Using Theorem 2.2.2,

$$\gamma_Y(h) = E\{Y_t Y_{t+h}\} = \sigma^2 \sum_{m=-\infty}^{\infty} c_m c_{m+h}. \qquad A$$
We may paraphrase this corollary by saying that an infinite moving average (where the coefficients are absolutely summable) of an infinite moving average stationary time series is itself an infinite moving average stationary time series. If the {a_j} and {b_j} are exponentially declining, then the {c_m} of Corollary 2.2.2.2 is exponentially declining.
Corollary 2.2.2.3. Let the {a_j} and {b_j} of Corollary 2.2.2.2 satisfy

$$|a_j| < M\lambda^{|j|} \quad \text{and} \quad |b_j| < M\lambda^{|j|}$$

for some M < ∞ and 0 < λ < 1. Let Y_t be as defined in Corollary 2.2.2.2. Then

$$Y_t = \sum_{m=-\infty}^{\infty} c_m e_{t-m},$$

where $c_m = \sum_{j=-\infty}^{\infty} b_j a_{m-j}$ and |c_m| < M_1 λ_1^{|m|} for some M_1 < ∞ and any λ_1 with λ < λ_1 < 1.

Proof. We have

$$|c_m| = \Big|\sum_{j=-\infty}^{\infty} b_j a_{m-j}\Big| \le M^2 \lambda^{|m|}\Big(|m| + 1 + 2\sum_{j=1}^{\infty} \lambda^{2j}\Big),$$

and $(|m| + 1)\lambda^{|m|} \le M_2 \lambda_1^{|m|}$ for some M_2 < ∞ and any λ_1 in (λ, 1). A
In Theorems 2.2.1 and 2.2.2, we made mild assumptions on the Z_t and assumed the sequence of coefficients, {a_j}, to be absolutely summable. In Theorem 2.2.3, we give a mean square result for the infinite sum of the elements of a time series under weaker assumptions on the coefficients.
Theorem 2.2.3. Let {a_j} be a sequence of real numbers such that {a_j²} is summable. Let Z_t be the time series

$$Z_t = \sum_{j=-\infty}^{\infty} b_j e_{t-j},$$

where {e_t} is a sequence of uncorrelated (0, σ²) random variables and {b_j} is absolutely summable. Then there exists a sequence of random variables {X_t} such that, for t = 0, ±1, ±2, …,

$$X_t = \lim_{n\to\infty} \sum_{j=-n}^{n} a_j Z_{t-j} \quad \text{in mean square},$$

E{X_t} = 0, and

$$\gamma_X(h) = E\{X_t X_{t+h}\} = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} a_j a_k \gamma_Z(h + j - k). \qquad (2.2.3)$$
Proof. By the Cauchy–Schwarz inequality, we have, for n > m and h > 0,

$$\sum_{j=m+1}^{n} |a_j a_{j+h}| \le \Big(\sum_{j=m+1}^{n} a_j^2\Big)^{1/2}\Big(\sum_{j=m+1}^{n} a_{j+h}^2\Big)^{1/2}. \qquad (2.2.4)$$

Since Σ a_j² < ∞, given ε > 0, there exists an N_0 such that, for all n > m > N_0,

$$\sum_{j=m+1}^{n} |a_j a_{j+h}| < \epsilon \qquad (2.2.5)$$

uniformly in h. Let $Y_{nt} = \sum_{j=-n}^{n} a_j Z_{t-j}$. For n > m > N_0,

$$E\{(Y_{nt} - Y_{mt})^2\} = \sum_{m<|j|\le n}\ \sum_{m<|k|\le n} a_j a_k \gamma_Z(j - k) \le \epsilon \sum_{h=-\infty}^{\infty} |\gamma_Z(h)| \qquad (2.2.6)$$

by (2.2.5). Because, by Corollary 2.2.2.2, $\sum_h |\gamma_Z(h)| < \infty$, it follows that the right side of (2.2.6) converges to zero as m → ∞ and n → ∞. Therefore, Y_{nt} is Cauchy in mean square and there exists a random variable X_t such that E{X_t²} < ∞ and

$$E\{(X_t - Y_{nt})^2\} \to 0 \quad \text{as } n \to \infty.$$

See, for example, Tucker (1967, p. 105). Also, $E\{X_t\} = \lim_{n\to\infty} E\{Y_{nt}\} = 0$ and $E\{X_t X_{t+h}\} = \lim_{n\to\infty} E\{Y_{nt} Y_{n,t+h}\}$.
We write

$$Y_{nt} = \sum_{j=-n}^{n} a_j Z_{t-j} = \sum_{j=-\infty}^{\infty} d_{jn} Z_{t-j},$$

where

$$d_{jn} = \begin{cases} a_j & \text{if } |j| \le n, \\ 0 & \text{if } |j| > n. \end{cases}$$

Since {d_{jn}} and {b_j} are absolutely summable, we have

$$E\{Y_{nt} Y_{n,t+h}\} = \sigma^2 \sum_{m=-\infty}^{\infty} c_{mn} c_{m+h,n},$$

where $c_{mn} = \sum_{j=-\infty}^{\infty} d_{jn} b_{m-j}$. Also, $c_{mn} \to c_m = \sum_{j=-\infty}^{\infty} a_j b_{m-j}$ as n → ∞. Therefore,

$$\gamma_X(h) = \lim_{n\to\infty} E\{Y_{nt} Y_{n,t+h}\} = \sigma^2 \sum_{m=-\infty}^{\infty} c_m c_{m+h},$$

because {c_m} is square summable. See Exercise 2.45. Therefore, by the dominated convergence theorem, the limit may be taken inside the sum, and the conclusion follows. A
Corollary 2.2.3. Let $\{c_j\}_{j=-\infty}^{\infty}$ be square summable and let $\{d_j\}_{j=-\infty}^{\infty}$ be absolutely summable. Let

$$Z_t = \sum_{j=-\infty}^{\infty} c_j e_{t-j},$$

where {e_t} is a sequence of uncorrelated (0, σ²) random variables. Then there exists a sequence of random variables $\{X_t\}_{t=-\infty}^{\infty}$ such that

$$X_t = \lim_{n\to\infty} \sum_{j=-n}^{n} d_j Z_{t-j} \quad \text{a.s.},$$

E{X_t} = 0, and

$$\gamma_X(h) = \sigma^2 \sum_{j=-\infty}^{\infty} g_j g_{j+h}, \quad \text{where } g_j = \sum_{k=-\infty}^{\infty} c_k d_{j-k}.$$
Proof. From Theorem 2.2.3, we have that Z_t is well defined in mean square with E{Z_t} = 0 and $\gamma_Z(h) = \sigma^2 \sum_{j=-\infty}^{\infty} c_j c_{j+h}$. Now, the existence of the sequence of random variables $\{X_t\}_{t=-\infty}^{\infty}$ such that E{X_t} = 0 and

$$X_t = \lim_{n\to\infty} \sum_{j=-n}^{n} d_j Z_{t-j} \quad \text{a.s.}$$

follows from Theorem 2.2.1. Furthermore, from Theorem 2.2.2, we have

$$E\{X_t X_{t+h}\} = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} d_j d_k \gamma_Z(h + j - k).$$

Define $g_j = \sum_{k=-\infty}^{\infty} c_k d_{j-k}$. Using the dominated convergence theorem, we have

$$E\{X_t X_{t+h}\} = \sigma^2 \sum_{j=-\infty}^{\infty} g_j g_{j+h}. \qquad A$$
2.3. AN INTRODUCTION TO AUTOREGRESSIVE TIME SERIES
Many time series encountered in practice are well approximated by the representation

$$\sum_{i=0}^{p} \alpha_i X_{t-i} = e_t, \qquad (2.3.1)$$

where α_0 ≠ 0, α_p ≠ 0, and the e_t are uncorrelated (0, σ²) random variables. The sequence {X_t} is called a pth order autoregressive time series. The defining equation (2.3.1) is sometimes called a stochastic difference equation.
To study such processes, let us consider the first order autoregressive time series, which we express in its most common form,

$$X_t = \rho X_{t-1} + e_t. \qquad (2.3.2)$$
By repeated substitution of

$$X_{t-i} = \rho X_{t-i-1} + e_{t-i}$$

for i = 1, 2, …, N into (2.3.2), we obtain

$$X_t = \rho^N X_{t-N} + \sum_{i=0}^{N-1} \rho^i e_{t-i}. \qquad (2.3.3)$$

Under the assumptions that |ρ| < 1 and E{X_t²} ≤ K < ∞, we have

$$\lim_{N\to\infty} E\Big\{\Big(X_t - \sum_{i=0}^{N-1} \rho^i e_{t-i}\Big)^2\Big\} = \lim_{N\to\infty} \rho^{2N} E\{X_{t-N}^2\} = 0. \qquad (2.3.4)$$
Thus, if e_t is defined for t ∈ {0, ±1, ±2, …} and X_t satisfies the difference equation (2.3.2) with |ρ| < 1, then we may express X_t as an infinite moving average of the e_t:

$$X_t = \sum_{i=0}^{\infty} \rho^i e_{t-i}.$$
It follows that E{X_t} = 0 for all t, and

$$\gamma(h) = \sigma^2 \sum_{i=0}^{\infty} \rho^i \rho^{i+h} = \frac{\sigma^2 \rho^h}{1 - \rho^2} \qquad (2.3.5)$$

for h = 0, 1, …. The covariance function for h ≠ 0 is also rapidly obtained by making use of a form of (2.3.3),

$$X_t = \rho^h X_{t-h} + \sum_{j=0}^{h-1} \rho^j e_{t-j}, \qquad h = 1, 2, \ldots. \qquad (2.3.6)$$
Since X_{t-h} is a function of e's with subscript less than or equal to t − h, X_{t-h} is uncorrelated with e_{t-h+1}, e_{t-h+2}, …, e_t. Therefore, we have, after multiplying both sides of (2.3.6) by X_{t-h} and taking expectations,

$$\gamma(h) = \rho^h \gamma(0), \qquad h = 1, 2, \ldots. \qquad (2.3.7)$$

The correlation function is seen to be a monotonically declining function for ρ > 0, while for ρ < 0 the function is alternately positive and negative, the absolute value declining at a geometric rate.
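The geometric decay in (2.3.7) can be checked directly from the moving average representation. The sketch below is not part of the text; truncating the infinite sum at 500 terms is an arbitrary numerical choice (with ρ = 0.7 the omitted tail is negligible).

```python
rho, N = 0.7, 500
w = [rho ** i for i in range(N)]       # weights of X_t = sum_i rho^i e_{t-i}

def gamma(h):
    """gamma(h) = sigma^2 sum_i w_i w_{i+h}, sigma^2 = 1, truncated at N."""
    return sum(w[i] * w[i + h] for i in range(N - h))

g0 = gamma(0)
print(round(g0, 6))                    # 1.960784  ~ (1 - rho^2)^{-1}
for h in (1, 2, 3):
    print(round(gamma(h) / g0, 6))     # rho^h: 0.7, 0.49, 0.343
```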
Example 2.3.1. Figure 2.3.1 displays 100 observations from the first order process

$$X_t = 0.7 X_{t-1} + e_t.$$

The current observation, X_t, is plotted against the preceding observation, X_{t-1}, in Figure 2.3.2 and against X_{t-2} in Figure 2.3.3. The correlation between X_t and X_{t-2} is (0.7)² = 0.49, but the partial correlation between X_t and X_{t-2} adjusted for X_{t-1} is zero, since

$$\mathrm{Cov}\{X_t - \rho X_{t-1},\ X_{t-2} - \rho X_{t-1}\} = (\rho^2 - \rho^2 - \rho^2 + \rho^2)\gamma(0) = 0.$$

The zero partial correlation is illustrated by Figure 2.3.4, which contains a plot of X_t − 0.7X_{t-1} against X_{t-2} − 0.7X_{t-1}. In fact, for any h ∈ (2, 3, …), the partial correlation between X_t and X_{t-h} adjusted for X_{t-1} is zero. This follows because

$$\mathrm{Cov}\{X_t - \rho X_{t-1},\ X_{t-h} - \rho^{h-1} X_{t-1}\} = (\rho^h - \rho^{h-1}\rho - \rho\rho^{h-1} + \rho^h)\gamma(0) = 0, \qquad h = 2, 3, \ldots.$$

This important property of the first order autoregressive time series can be
Figure 2.3.1. One hundred observations from the time series X_t = 0.7X_{t-1} + e_t.
characterized by saying that all the useful (correlational) information about X_t in the entire realization previous to X_t is contained in X_{t-1}. Compare this result with the partial correlational properties of the moving average process discussed in Section 2.1. AA
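The vanishing of every higher-order partial covariance can be confirmed numerically. The sketch below is not in the text; it fixes γ(0) = 1 (an arbitrary normalization) and evaluates the covariance expansion for several lags.

```python
rho = 0.7

def gamma(h):
    """AR(1) autocovariance normalized so that gamma(0) = 1: gamma(h) = rho^|h|."""
    return rho ** abs(h)

# Cov{X_t - rho X_{t-1}, X_{t-h} - rho^{h-1} X_{t-1}} expanded term by term:
for h in range(2, 7):
    cov = gamma(h) - rho ** (h - 1) * gamma(1) - rho * gamma(h - 1) + rho ** h
    print(h, abs(cov) < 1e-12)   # True for every h >= 2
```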
Linear difference equations and their solutions play an important role in the study of autoregressive time series as well as in other areas of time series analysis. Therefore we digress to present the needed elementary results before studying higher order autoregressive processes. The initial development follows Goldberg (1958).
2.4. DIFFERENCE EQUATIONS
Given a sequence $\{y_t\}_{t=0}^{\infty}$, the first difference is defined by

$$\Delta y_t = y_t - y_{t-1}, \qquad t = 1, 2, \ldots, \qquad (2.4.1)$$
and the nth difference is defined by

$$\Delta^n y_t = \Delta^{n-1} y_t - \Delta^{n-1} y_{t-1} = \sum_{r=0}^{n} (-1)^r \binom{n}{r} y_{t-r}, \qquad t = n, n+1, \ldots,$$

where

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

are the binomial coefficients.
The equation

$$y_t + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} = r_t, \qquad t = n, n+1, \ldots, \qquad (2.4.2)$$

where the a_i are real constants, a_n ≠ 0, and r_t is a real function of t, is a linear difference equation of order n with constant coefficients. The values y_{t-1}, y_{t-2}, …, y_{t-n} are sometimes called lagged values of y_t. Equation (2.4.2)
could be expressed in terms of y_{t-n} and the differences of y_t; for example,

$$\Delta^n y_t + b_1 \Delta^{n-1} y_{t-1} + b_2 \Delta^{n-2} y_{t-2} + \cdots + b_{n-1} \Delta y_{t-n+1} + b_n y_{t-n} = r_t, \qquad (2.4.3)$$

where the b_i's are linear functions of the a_i's.
A third representation¹ of (2.4.2) is possible. Let the symbol ℬ denote the operation of replacing y_t by y_{t-1}; that is,

$$\mathcal{B} y_t = y_{t-1}.$$

¹ The reader may wonder at the benefits associated with several alternative notations. All are heavily used in the literature and all have advantages for certain operations.
Figure 2.3.4. Plot of X_t − 0.7X_{t-1} against X_{t-2} − 0.7X_{t-1} for 100 observations from X_t = 0.7X_{t-1} + e_t.
Using the backward shift operator ℬ, we have

$$\Delta y_t = y_t - \mathcal{B} y_t = (1 - \mathcal{B}) y_t.$$

As with the difference operator, we denote repeated application of the operator with the appropriate exponent; for example,

$$\mathcal{B}^3 y_t = \mathcal{B}\mathcal{B}^2 y_t = y_{t-3}.$$

Therefore, we can use the operator symbol to write (2.4.2) as

$$(1 + a_1 \mathcal{B} + a_2 \mathcal{B}^2 + \cdots + a_n \mathcal{B}^n) y_t = r_t, \qquad t = n, n+1, \ldots.$$

Associated with (2.4.2) is the reduced or homogeneous difference equation

$$y_t + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} = 0. \qquad (2.4.4)$$
From the linearity of (2.4.2) and (2.4.4) we have the following properties.

1. If y^{(1)} and y^{(2)} are solutions of (2.4.4), then b_1 y^{(1)} + b_2 y^{(2)} is a solution of (2.4.4), where b_1 and b_2 are arbitrary constants.
2. If Y is a solution of (2.4.4) and y† a solution of (2.4.2), then Y + y† is a solution of (2.4.2).

y† is called a particular solution and Y + y† a general solution of the difference equation (2.4.2).

Equations (2.4.2) and (2.4.4) specify y_t as a function of the preceding n values of y. By using an inductive proof it is possible to demonstrate that the linear difference equation of order n has one and only one solution for which values at n consecutive t values are arbitrarily prescribed. This important result means that when we obtain n linearly independent solutions to the difference equation, we can construct the general solution.
The first order homogeneous difference equation

$$y_t = a y_{t-1}, \qquad t = 1, 2, \ldots, \qquad (2.4.5)$$

has the unique solution

$$y_t = a^t y_0, \qquad (2.4.6)$$

where y_0 is the value of y at t = 0. Note that if |a| < 1, the solution tends to zero as t increases. Conversely, if |a| > 1, the absolute value of the solution sequence increases without bound. If a is negative, the solution oscillates with alternately positive and negative values. The first order nonhomogeneous difference equation

$$y_t = a y_{t-1} + b \qquad (2.4.7)$$

has the unique solution

$$y_t = \begin{cases} a^t y_0 + b(1-a)^{-1}(1 - a^t), & a \ne 1, \\ y_0 + bt, & a = 1. \end{cases} \qquad (2.4.8)$$
If the constant b in the difference equation (2.4.7) is replaced by a function of time, for example,

$$y_t = a y_{t-1} + b_t, \qquad t = 1, 2, \ldots, \qquad (2.4.9)$$

then the solution is given by

$$y_t = y_0 a^t + \sum_{j=0}^{t-1} a^j b_{t-j}, \qquad t = 1, 2, \ldots, \qquad (2.4.10)$$

where y_0 is the value of y_t at time 0.
The solutions (2.4.6), (2.4.8), and (2.4.10) are readily verified by differencing. The fact that the linear function is reduced by differencing to the constant function is of particular interest. The generalization of this result is important in the analysis of nonstationary time series.
Theorem 2.4.1. Let y_t be a polynomial of degree n whose domain is the integers. Then the first difference Δy_t is expressible as a polynomial of degree n − 1 in t.
Proof. Since

$$y_t = \beta_0 + \beta_1 t + \cdots + \beta_n t^n$$

and

$$y_{t-1} = \beta_0 + \beta_1 (t-1) + \cdots + \beta_n (t-1)^n,$$

we have

$$\Delta y_t = y_t - y_{t-1} = \gamma_0 + \gamma_1 t + \cdots + \gamma_{n-1} t^{n-1},$$

where γ_{n-1} = nβ_n. A
It follows that by taking repeated differences, one can reduce a polynomial to the zero function. We state the result as a corollary.
Corollary 2.4.1. Let y_t be a polynomial of degree n whose domain is the integers. Then the nth difference Δⁿy_t is a constant function, and the (n + 1)st difference Δ^{n+1} y_t is the zero function.
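Theorem 2.4.1 and its corollary are easy to see in a small computation. The following sketch is not part of the text; the cubic polynomial is an arbitrary example.

```python
def diff(seq):
    """First difference: (delta y)_t = y_t - y_{t-1}."""
    return [seq[t] - seq[t - 1] for t in range(1, len(seq))]

y = [2 * t ** 3 - t + 5 for t in range(10)]   # a polynomial of degree 3

d3 = diff(diff(diff(y)))   # third difference: the constant 2 * 3! = 12
d4 = diff(d3)              # fourth difference: identically zero
print(d3)   # [12, 12, 12, 12, 12, 12, 12]
print(d4)   # [0, 0, 0, 0, 0, 0]
```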
In investigating difference equations of higher order, the nature of the solution is a function of the auxiliary equation, sometimes called the characteristic equation. For equation (2.4.2) the auxiliary equation is

$$m^n + a_1 m^{n-1} + \cdots + a_n = 0. \qquad (2.4.11)$$
This polynomial equation will have n (not necessarily distinct) roots, and the behavior of the solution is intimately related to these roots.
Consider the second order linear homogeneous equation

$$y_t + a_1 y_{t-1} + a_2 y_{t-2} = 0, \qquad (2.4.12)$$

where a_2 ≠ 0. The two roots of the auxiliary equation

$$m^2 + a_1 m + a_2 = 0 \qquad (2.4.13)$$

will fall into one of three categories:

1. Real and distinct.
2. Real and equal.
3. A complex conjugate pair.
The general solution of the homogeneous difference equation (2.4.12) for the three categories is:

1.
$$y_t = b_1 m_1^t + b_2 m_2^t, \qquad (2.4.14)$$
where b_1 and b_2 are constants determined by the initial conditions, and m_1 and m_2 are the roots of (2.4.13).

2.
$$y_t = (b_1 + b_2 t) m^t, \qquad (2.4.15)$$
where m is the repeated root of (2.4.13).

3.
$$y_t = b_1 m_1^t + b_1^* m_1^{*t}, \qquad (2.4.16)$$
where b_1^* is the complex conjugate of b_1 and m_2 = m_1^* is the complex conjugate of m_1.
If the complex roots are expressed as

$$m_1, m_2 = r(\cos\theta \pm i\sin\theta),$$

where r = a_2^{1/2} = |m_1| and

$$(\cos\theta,\ \sin\theta) = (2r)^{-1}\big(-a_1,\ |a_1^2 - 4a_2|^{1/2}\big),$$
then the solution can be expressed as

$$y_t = r^t (d_1 \cos t\theta + d_2 \sin t\theta) = g_1 r^t \cos(t\theta + g_2), \qquad (2.4.17)$$

where d_1, d_2, g_1, and g_2 are real constants. Substitution of (2.4.14) into (2.4.12) immediately establishes that (2.4.14) is a solution and, in fact, that m_1^t and m_2^t are independent solutions when m_1 ≠ m_2. For a repeated root, substitution of (2.4.15) into (2.4.12) gives

$$(b_1 + b_2 t) m^{t-2}(m^2 + a_1 m + a_2) - b_2 m^{t-2}(a_1 m + 2a_2) = 0.$$

The first term is zero because m is a root of (2.4.13). For m a repeated root, ma_1 + 2a_2 = 0, proving that (2.4.15) is the general solution. As with unequal real roots, substitution of (2.4.16) into (2.4.12) demonstrates that (2.4.16) is the solution in the case of complex roots.
The solution for a linear homogeneous difference equation of order n can be obtained using the roots of the auxiliary equation. The solution is a sum of n terms where:

1. For every real and distinct root m, a term of the form bm^t is included.
2. For every real root of order p (a root repeated p times), a term of the form

$$(b_1 + b_2 t + \cdots + b_p t^{p-1}) m^t$$

is included.
3. For each pair of unrepeated complex conjugate roots, a term of the form

$$g r^t \cos(t\theta + \varphi)$$

is included, where r and θ are defined following (2.4.16).
4. For a pair of complex conjugate roots occurring p times, a term of the form

$$r^t\big[g_1 \cos(t\theta + \varphi_1) + g_2 t \cos(t\theta + \varphi_2) + \cdots + g_p t^{p-1} \cos(t\theta + \varphi_p)\big]$$

is included.
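The root-based solution can be verified numerically. The following sketch is not part of the text; the coefficients give the distinct real roots 0.8 and 0.6, and the initial values are arbitrary.

```python
a1, a2 = -1.40, 0.48
# Roots of the auxiliary equation m^2 + a1 m + a2 = 0:
disc = (a1 ** 2 - 4 * a2) ** 0.5
m1, m2 = (-a1 + disc) / 2, (-a1 - disc) / 2
print(round(m1, 6), round(m2, 6))   # 0.8 0.6

# Category 1 solution y_t = b1 m1^t + b2 m2^t fitted to y_0 = 1, y_1 = 0.5:
y0, y1 = 1.0, 0.5
b1 = (y1 - m2 * y0) / (m1 - m2)
b2 = y0 - b1
y = [b1 * m1 ** t + b2 * m2 ** t for t in range(20)]

# Verify the difference equation y_t + a1 y_{t-1} + a2 y_{t-2} = 0:
ok = all(abs(y[t] + a1 * y[t - 1] + a2 * y[t - 2]) < 1e-12 for t in range(2, 20))
print(ok)   # True
```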
For the nonhomogeneous difference equation (2.4.2), with initial conditions y_0, y_1, …, y_{n-1}, a particular solution is

$$y_t^{\dagger} = \sum_{j=0}^{t} w_j r_{t-j}, \qquad t = n, n+1, \ldots,$$

where w_0 = 1,

$$\sum_{r=0}^{j} a_r w_{j-r} = 0, \qquad j = 1, 2, \ldots, n-1,$$

$$\sum_{r=0}^{n} a_r w_{j-r} = 0, \qquad j = n, n+1, \ldots,$$

with a_0 = 1, and r_0, r_1, …, r_{n-1} are defined by

$$y_t = \sum_{j=0}^{t} w_j r_{t-j}, \qquad t = 0, 1, \ldots, n-1.$$
Observe that the weights w_j satisfy the same homogeneous difference equation as the y_t. Therefore, w_j can be expressed as a sum of the four types of terms that define the solution to the homogeneous difference equation.
Systems of difference equations in more than one function of time arise naturally in many models of the physical and economic sciences. To extend our discussion to such models, consider the system of difference equations in the two functions y_{1t} and y_{2t}:

$$y_{1t} = a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + b_{1t},$$
$$y_{2t} = a_{21} y_{1,t-1} + a_{22} y_{2,t-1} + b_{2t}, \qquad (2.4.18)$$

where a_{11}, a_{12}, a_{21}, a_{22} are real constants, b_{1t} and b_{2t} are real functions of time, and t = 1, 2, …. The system may be expressed in matrix notation as

$$\mathbf{y}_t = \mathbf{A}\mathbf{y}_{t-1} + \mathbf{b}_t, \qquad (2.4.19)$$

where

$$\mathbf{y}_t' = (y_{1t}, y_{2t}), \qquad \mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad \mathbf{b}_t' = (b_{1t}, b_{2t}).$$
Proceeding from the solution (2.4.6) of the first order homogeneous scalar difference equation, we might postulate that the homogeneous equation

$$\mathbf{y}_t = \mathbf{A}\mathbf{y}_{t-1} \qquad (2.4.20)$$

would have the solution

$$\mathbf{y}_t = \mathbf{A}^t \mathbf{y}_0, \qquad (2.4.21)$$

and that the equation (2.4.19) would have the solution

$$\mathbf{y}_t = \mathbf{A}^t \mathbf{y}_0 + \sum_{j=0}^{t-1} \mathbf{A}^j \mathbf{b}_{t-j}, \qquad (2.4.22)$$

where we understand that A⁰ = I. Direct substitution of (2.4.21) into (2.4.20) and (2.4.22) into (2.4.19) verifies that these are the solutions of the matrix analogs of the first order scalar equations.

In the scalar case the behavior of the solution depends in a critical manner on the magnitude of the coefficient a for the first order equation and on the magnitude of the roots of the characteristic equation for the higher order equations. If the roots are all less than one in absolute value, then the effect of the initial conditions is transient, the initial conditions being multiplied by the product of powers of the roots and polynomials in t.
In the vector case the quantities analogous to the roots of the characteristic equation are the roots of the determinantal equation

$$|\mathbf{A} - m\mathbf{I}| = 0.$$

For the moment, we assume that the roots of the determinantal equation are distinct. Let the two roots of the matrix A of (2.4.19) be m_1 and m_2, and define the vectors q_{.1} and q_{.2} by the equations

$$(\mathbf{A} - m_1\mathbf{I})\mathbf{q}_{.1} = \mathbf{0}, \qquad (\mathbf{A} - m_2\mathbf{I})\mathbf{q}_{.2} = \mathbf{0}, \qquad \mathbf{q}_{.1}'\mathbf{q}_{.1} = \mathbf{q}_{.2}'\mathbf{q}_{.2} = 1.$$

By construction, the matrix Q = (q_{.1}, q_{.2}) is such that
$$\mathbf{A}\mathbf{Q} = \mathbf{Q}\mathbf{M}, \qquad (2.4.23)$$

where M = diag(m_1, m_2). Since the roots are distinct, the matrix Q is nonsingular. Hence,

$$\mathbf{Q}^{-1}\mathbf{A}\mathbf{Q} = \mathbf{M}. \qquad (2.4.24)$$

Define z_t' = (z_{1t}, z_{2t}) and c_t' = (c_{1t}, c_{2t}) by

$$\mathbf{z}_t = \mathbf{Q}^{-1}\mathbf{y}_t, \qquad \mathbf{c}_t = \mathbf{Q}^{-1}\mathbf{b}_t.$$
Then, multiplying (2.4.19) by Q^{-1}, we have

$$\mathbf{z}_t = \mathbf{Q}^{-1}\mathbf{A}\mathbf{Q}\mathbf{z}_{t-1} + \mathbf{c}_t$$

or

$$\mathbf{z}_t = \mathbf{M}\mathbf{z}_{t-1} + \mathbf{c}_t.$$

The transformation Q^{-1} reduces the system of difference equations to two simple first order difference equations with solutions

$$z_{it} = m_i^t z_{i0} + \sum_{j=0}^{t-1} m_i^j c_{i,t-j}, \qquad i = 1, 2, \qquad (2.4.25)$$

where we have set y_{10} = b_{10} and y_{20} = b_{20} for notational convenience. Therefore, the effect of the initial conditions is transient if |m_1| < 1 and |m_2| < 1. The solutions (2.4.25) may be expressed in terms of the original variables as
$$\mathbf{y}_t = \mathbf{Q}\mathbf{M}^t\mathbf{Q}^{-1}\mathbf{y}_0 + \sum_{j=0}^{t-1} \mathbf{Q}\mathbf{M}^j\mathbf{Q}^{-1}\mathbf{b}_{t-j}$$

or

$$\mathbf{y}_t = \sum_{j=0}^{t} \mathbf{Q}\mathbf{M}^j\mathbf{Q}^{-1}\mathbf{b}_{t-j} = \sum_{j=0}^{t} \mathbf{A}^j\mathbf{b}_{t-j}, \qquad (2.4.26)$$

where the last representation follows from A² = QMQ^{-1}QMQ^{-1} = QM²Q^{-1}, etc.

If the n × n matrix A has multiple roots, it cannot always be reduced to a diagonal matrix by a transformation matrix Q. However, A can be reduced to a matrix whose form is slightly more complicated.
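The relation A^t = QM^tQ^{-1} for distinct roots can be illustrated with a small computation. The sketch below is not part of the text; the 2 × 2 matrix is an arbitrary example, and the eigenvector form q = (a_{12}, m − a_{11})' is one convenient (unnormalized) choice.

```python
A = [[0.9, 0.2],
     [0.1, 0.6]]   # arbitrary 2x2 example with distinct real roots

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Roots of |A - mI| = 0, i.e., m^2 - tr(A) m + det(A) = 0:
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = (tr ** 2 - 4 * det) ** 0.5
m1, m2 = (tr + disc) / 2, (tr - disc) / 2

# Columns of Q solve (A - m_i I) q = 0; q = (a12, m_i - a11)' works here.
Q = [[A[0][1], A[0][1]],
     [m1 - A[0][0], m2 - A[0][0]]]
dQ = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
Qinv = [[Q[1][1] / dQ, -Q[0][1] / dQ],
        [-Q[1][0] / dQ, Q[0][0] / dQ]]

# A^5 by repeated multiplication versus Q M^5 Q^{-1}:
P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(5):
    P = matmul(P, A)
M5 = [[m1 ** 5, 0.0], [0.0, m2 ** 5]]
P2 = matmul(matmul(Q, M5), Qinv)

ok = all(abs(P[i][j] - P2[i][j]) < 1e-10 for i in range(2) for j in range(2))
print(ok)   # True
```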
Theorem 2.4.2. Let A_i be the k_i × k_i matrix

$$\mathbf{A}_i = \begin{pmatrix} m_i & 1 & 0 & \cdots & 0 \\ 0 & m_i & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & m_i \end{pmatrix},$$

where A_i = m_i when k_i = 1. Then there exists a matrix Q such that

$$\mathbf{Q}^{-1}\mathbf{A}\mathbf{Q} = \begin{pmatrix} \mathbf{A}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \cdots & \mathbf{0} \\ \vdots & & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}_r \end{pmatrix} = \mathbf{\Lambda}, \qquad (2.4.27)$$

where $\sum_{i=1}^{r} k_i = n$ and the m_i, i = 1, 2, …, r, are the characteristic roots of A. If m_i is a repeated root, it may appear in more than one block of Λ, but the total number of times it appears is equal to its multiplicity.

Proof. See Finkbeiner (1960) or Miller (1963). A
The representation Λ is called the Jordan canonical form. The powers of the matrix A_i are given by

$$\mathbf{A}_i^t = \begin{pmatrix} m_i^t & \binom{t}{1} m_i^{t-1} & \cdots & \binom{t}{k_i-1} m_i^{t-k_i+1} \\ 0 & m_i^t & \cdots & \binom{t}{k_i-2} m_i^{t-k_i+2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & m_i^t \end{pmatrix}. \qquad (2.4.28)$$
Therefore, the solution of the first order vector difference equation with initial conditions y_0 = b_0 is given by

$$\mathbf{y}_t = \sum_{j=0}^{t} \mathbf{Q}\mathbf{\Lambda}^j\mathbf{Q}^{-1}\mathbf{b}_{t-j}, \qquad (2.4.29)$$

and the effect of the initial conditions goes to zero as t → ∞ if all of the roots of |A − mI| = 0 are less than one in absolute value.
To further illustrate the solution, consider a system of dimension two with repeated root that has been reduced to the Jordan canonical form:

$$y_{1t} = m y_{1,t-1} + y_{2,t-1},$$
$$y_{2t} = m y_{2,t-1}.$$
It follows that

$$y_{2t} = y_{20} m^t,$$

and y_{1t} is given as the solution of the nonhomogeneous equation

$$y_{1t} = m y_{1,t-1} + y_{20} m^{t-1},$$

whence, by (2.4.10),

$$y_{1t} = y_{10} m^t + t y_{20} m^{t-1}.$$
This approach may be extended to treat higher order vector difference equations. For example, consider the second order vector difference equation

$$\mathbf{y}_t = \mathbf{A}_1 \mathbf{y}_{t-1} + \mathbf{A}_2 \mathbf{y}_{t-2}, \qquad (2.4.30)$$

where A_1 and A_2 are k × k matrices and y_t' = (y_{1t}, y_{2t}, …, y_{kt}). The system of equations may also be written as

$$\mathbf{x}_t = \mathbf{A}\mathbf{x}_{t-1}, \qquad (2.4.31)$$

where x_t' = (y_t', y_{t-1}') and

$$\mathbf{A} = \begin{pmatrix} \mathbf{A}_1 & \mathbf{A}_2 \\ \mathbf{I} & \mathbf{0} \end{pmatrix}. \qquad (2.4.32)$$
Thus we have converted the original second order vector difference equation of dimension k to a first order difference equation of dimension 2k. The properties of the solution will depend on the nature of the Jordan canonical form of A. In particular, the effect of the initial conditions will be transient if all of the roots of |A − mI| = 0 are less than one in absolute value. This condition is equivalent to the condition that the roots of

$$|\mathbf{I}m^2 - \mathbf{A}_1 m - \mathbf{A}_2| = 0$$

are less than one in absolute value. To prove this equivalence, form the matrix product

$$\begin{pmatrix} \mathbf{I} & \mathbf{A}_2 m^{-1} \\ \mathbf{0} & \mathbf{I} \end{pmatrix}\begin{pmatrix} \mathbf{A}_1 - m\mathbf{I} & \mathbf{A}_2 \\ \mathbf{I} & -m\mathbf{I} \end{pmatrix} = \begin{pmatrix} \mathbf{A}_1 - m\mathbf{I} + \mathbf{A}_2 m^{-1} & \mathbf{0} \\ \mathbf{I} & -m\mathbf{I} \end{pmatrix}$$

and take the determinant of both sides.
and take the determinant of both sides. kt the nth order vector difference equation be given by
y , + A,y,-, + A2yI-2 + * * +A,y,-, = r, , t = 1,2,. . . , (2.4.33) where y, is a &-vector and r, is a real vector valued function of time. The associateh matrix is
A =
Then the solution is
-Ai -A2 * * * -An-, -A,,'
I 0 * - . 0 0
0 0 . * * I 0 .
54 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES
where the W, are k X k matrices defined by
W 0 = I , W, + A , = O ,
W2+A,W, + A 2 = 0 ,
W, +A,W,-, + * * * +AnWjAn =0, j = n , n + 1,. . . , 7 is the vector composed of the first k entries in the nk-vector x, = Aka, and x; = (y;, yl ,, . . , , y;-,,). It follows tbat W, is the upper left block of the matrix A’.
2.5. THE SECOND ORDER AUTOREGRESSIVE TIME SERIES

We consider the stationary second order autoregressive time series defined by the stochastic difference equation

$$X_t + a_1 X_{t-1} + a_2 X_{t-2} = e_t, \qquad t \in (0, \pm 1, \pm 2, \ldots), \qquad (2.5.1)$$

where the e_t are uncorrelated (0, σ²) random variables, a_2 ≠ 0, and the roots of m² + a_1 m + a_2 = 0 are less than one in absolute value. Recalling that we were able to express the first order autoregressive process as a weighted average of past values of e_t, we postulate that X_t can be expressed as
$$X_t = \sum_{j=0}^{\infty} w_j e_{t-j}, \qquad (2.5.2)$$

where the w_j are functions of a_1 and a_2. We display this representation as

$$X_t = w_0 e_t + w_1 e_{t-1} + w_2 e_{t-2} + w_3 e_{t-3} + \cdots,$$
$$X_{t-1} = w_0 e_{t-1} + w_1 e_{t-2} + w_2 e_{t-3} + \cdots,$$
$$X_{t-2} = w_0 e_{t-2} + w_1 e_{t-3} + \cdots.$$

If the X_t are to satisfy the difference equation (2.5.1), then we must have
$$w_0 = 1, \qquad w_1 + a_1 w_0 = 0, \qquad (2.5.3)$$
$$w_j + a_1 w_{j-1} + a_2 w_{j-2} = 0, \qquad j = 2, 3, \ldots.$$

Thus the w_j are given by the general solution to the second order linear homogeneous difference equation subject to the initial conditions specified by the equations in w_0 and w_1. Therefore, if the roots are distinct, we have

$$w_j = (m_1 - m_2)^{-1} m_1^{j+1} + (m_2 - m_1)^{-1} m_2^{j+1}, \qquad j = 0, 1, 2, \ldots,$$
and if the roots are equal,

$$w_j = (1 + j) m^j, \qquad j = 0, 1, 2, \ldots.$$

In Theorem 2.6.1 of the next section, we prove that the representation (2.5.2) holds for a stationary time series satisfying (2.5.1).
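The closed-form weights can be checked against the recursion (2.5.3). The sketch below is not part of the text; it uses the coefficients of Example 2.5.1, whose roots are 0.8 and 0.6.

```python
a1, a2 = -1.40, 0.48      # so m^2 + a1 m + a2 = 0 has roots m1 = 0.8, m2 = 0.6
m1, m2 = 0.8, 0.6

# Recursion (2.5.3): w_0 = 1, w_1 = -a1, w_j = -a1 w_{j-1} - a2 w_{j-2}
w = [1.0, -a1]
for j in range(2, 15):
    w.append(-a1 * w[j - 1] - a2 * w[j - 2])

# Closed form for distinct roots:
closed = [(m1 - m2) ** -1 * m1 ** (j + 1) + (m2 - m1) ** -1 * m2 ** (j + 1)
          for j in range(15)]

ok = all(abs(w[j] - closed[j]) < 1e-10 for j in range(15))
print(ok)                               # True
print([round(v, 6) for v in w[:3]])     # [1.0, 1.4, 1.48]
```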
The covariance function for the second order autoregressive time series defined in (2.5.2) is

$$\gamma(h) = \sigma^2 \sum_{j=h}^{\infty} w_j w_{j-h}, \qquad h = 0, 1, 2, \ldots. \qquad (2.5.4)$$
We shall investigate the covariance function in a slightly different manner. Multiply (2.5.1) by X_{t-h} (h ≥ 0) to obtain

$$X_t X_{t-h} + a_1 X_{t-1} X_{t-h} + a_2 X_{t-2} X_{t-h} = e_t X_{t-h}. \qquad (2.5.5)$$

Because X_t is expressible as a weighted average of e_t and previous e's,

$$E\{e_t X_{t-h}\} = \begin{cases} \sigma^2, & h = 0, \\ 0, & h > 0. \end{cases}$$

Thus, taking the expectation of both sides of (2.5.5), we have

$$\gamma(h) + a_1 \gamma(h-1) + a_2 \gamma(h-2) = 0, \qquad h = 1, 2, \ldots. \qquad (2.5.6)$$

That is, the covariance function, for h > 0, satisfies the homogeneous difference equation associated with the stochastic difference equation defining the time series.
The equations (2.5.6) are called the Yule–Walker equations. With the aid of these equations we can obtain a number of alternative expressions for the autocovariances, autocorrelations, and coefficients of the original representation. Using the two equations associated with h = 1 and h = 2, we may solve for a_1 and a_2 as functions of the autocovariances:

$$a_1 = \frac{\gamma(1)\gamma(2) - \gamma(0)\gamma(1)}{\gamma^2(0) - \gamma^2(1)}, \qquad a_2 = \frac{\gamma^2(1) - \gamma(0)\gamma(2)}{\gamma^2(0) - \gamma^2(1)}. \qquad (2.5.7)$$
If we use the three equations (2.5.6) associated with h = 0, 1, 2, we can obtain expressions for the autocovariances in terms of the coefficients:

$$\gamma(0) = \frac{(1 + a_2)\sigma^2}{(1 - a_2)[(1 + a_2)^2 - a_1^2]}, \qquad \gamma(1) = \frac{-a_1 \sigma^2}{(1 - a_2)[(1 + a_2)^2 - a_1^2]}. \qquad (2.5.8)$$
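The Yule–Walker relations can be verified numerically. The sketch below is not part of the text; it builds the autocovariances of the process of Example 2.5.1 from its moving average weights (truncation at 400 terms is an arbitrary numerical choice) and recovers a_1 and a_2 by solving the h = 1, 2 equations.

```python
a1, a2 = -1.40, 0.48       # X_t - 1.40 X_{t-1} + 0.48 X_{t-2} = e_t

# Moving average weights from the recursion (2.5.3), truncated:
w = [1.0, -a1]
for j in range(2, 400):
    w.append(-a1 * w[j - 1] - a2 * w[j - 2])

def gamma(h):
    """gamma(h) = sigma^2 sum_j w_j w_{j+h} with sigma^2 = 1."""
    return sum(w[j] * w[j + h] for j in range(len(w) - h))

g0, g1, g2 = gamma(0), gamma(1), gamma(2)

# Solve the h = 1, 2 Yule-Walker equations for the coefficients:
d = g0 ** 2 - g1 ** 2
a1_hat = (g1 * g2 - g0 * g1) / d
a2_hat = (g1 ** 2 - g0 * g2) / d
print(round(a1_hat, 6), round(a2_hat, 6))   # -1.4 0.48
```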
56 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES
Using γ(0) and γ(1) of (2.5.8) as the initial conditions together with the general solution of the homogeneous equation, we can obtain the expression for the covariance function.

However, it is somewhat simpler to derive the correlation function first. To obtain ρ(1) as a function of the coefficients, we divide equation (2.5.6) for h = 1 by γ(0), giving

    ρ(1) + a_1 + a_2 ρ(1) = 0

and

    ρ(1) = -a_1 (1 + a_2)^{-1} .   (2.5.9)
Using (2.5.9) and ρ(0) = 1 as initial conditions, we have, for unequal roots (real or complex),

    ρ(h) = [(m_1 - m_2)(1 + m_1 m_2)]^{-1} [m_1^{h+1}(1 - m_2²) - m_2^{h+1}(1 - m_1²)] ,   (2.5.10)
                                                           h = 0, 1, 2, ... .

If the roots are complex, the autocorrelation function may also be expressed as

    ρ(h) = b^h (sin ψ)^{-1} sin(hθ + ψ) ,   h = 0, 1, 2, ... ,   (2.5.11)

where

    b = a_2^{1/2} ,   cos θ = -a_1 (2 a_2^{1/2})^{-1} ,   tan ψ = (1 + b²)(1 - b²)^{-1} tan θ .
For roots real and equal the autocorrelation function is given by

    ρ(h) = [1 + h(1 - m²)(1 + m²)^{-1}] m^h ,   h = 0, 1, 2, ... .   (2.5.12)
If we substitute a_1 = -(m_1 + m_2) and a_2 = m_1 m_2 into (2.5.8), we can express the variance as a function of the roots:

    γ(0) = (1 + m_1 m_2)σ² [(1 - m_1 m_2)(1 - m_1²)(1 - m_2²)]^{-1} .   (2.5.13)
Equation (2.5.13) together with (2.5.10) and (2.5.12) may be used to express the covariance function in terms of the roots.
Example 2.5.1. Figure 2.5.1 contains a plot of the correlation function, sometimes called the correlogram, for the time series

    X_t = 1.40X_{t-1} - 0.48X_{t-2} + e_t .

The roots of the auxiliary equation are 0.8 and 0.6. The correlogram has much the same appearance as that of a first order autoregressive process with large ρ. The empirical correlograms of many economic time series have this general appearance, the first few correlations being quite large.
The correlogram in Figure 2.5.2 is that associated with the time series

    X_t = X_{t-1} - 0.89X_{t-2} + e_t .
[Figure 2.5.1. Correlogram for X_t = 1.40X_{t-1} - 0.48X_{t-2} + e_t; ρ(h) plotted against h = 0, 6, 12, ..., 42.]
[Figure 2.5.2. Correlogram for X_t = X_{t-1} - 0.89X_{t-2} + e_t; ρ(h) plotted against h = 0, 6, 12, ..., 36.]
The roots of the auxiliary equation are the complex pair 0.5 ± 0.8i. In this case the correlogram has the distinctive "declining cyclical" appearance associated with complex roots. This correlogram illustrates how such a process can generate a time series with the appearance of moderately regular "cycles." In the present case cos θ = 0.53 and θ = 0.322π, where cos θ is defined in (2.5.11). Thus the apparent period of the cyclical behavior would be 2(0.322)^{-1} = 6.2 time units.    A
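The correlograms in the two examples can be generated directly from the Yule-Walker recursion. A small sketch (the initial value ρ(1) = -a_1/(1 + a_2) is (2.5.9); function names are mine):

```python
import math

# Sketch: correlogram of an AR(2) process from the Yule-Walker recursion
# rho(h) = -a1*rho(h-1) - a2*rho(h-2), with rho(0) = 1 and
# rho(1) = -a1/(1+a2) as in (2.5.9).

def ar2_correlogram(a1, a2, nlags):
    rho = [1.0, -a1 / (1.0 + a2)]
    for _ in range(2, nlags + 1):
        rho.append(-a1 * rho[-1] - a2 * rho[-2])
    return rho

# Figure 2.5.2 example: X_t = X_{t-1} - 0.89 X_{t-2} + e_t, i.e. a1 = -1, a2 = 0.89.
rho = ar2_correlogram(-1.0, 0.89, 40)

# Complex roots 0.5 +- 0.8i: cos(theta) = -a1 / (2*sqrt(a2)) as in (2.5.11).
theta = math.acos(1.0 / (2.0 * math.sqrt(0.89)))
apparent_period = 2.0 * math.pi / theta    # about 6.2 time units
```

The recursion reproduces the oscillating, geometrically damped correlogram of Figure 2.5.2, and the implied period agrees with the 6.2 time units computed in the example.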
2.6. ALTERNATIVE REPRESENTATIONS OF AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES
We have argued that the stationary second order autoregressive time series can also be represented as an infinite moving average time series. We now prove that a time series satisfying a pth order difference equation has a representation as an infinite moving average.
Theorem 2.6.1. Let {X_t} be a time series defined on the integers with E{X_t²} < K for all t. Suppose X_t satisfies

    X_t + Σ_{j=1}^{p} a_j X_{t-j} = e_t ,   t = 0, ±1, ±2, ... ,   (2.6.1)

where {e_t}, t = 0, ±1, ±2, ..., is a sequence of uncorrelated (0, σ²) random variables. Let m_1, m_2, ..., m_p be the roots of

    m^p + Σ_{j=1}^{p} a_j m^{p-j} = 0 ,   (2.6.2)

and assume |m_i| < 1, i = 1, 2, ..., p. Then X_t is covariance stationary. Furthermore, X_t is given as a limit in mean square by

    X_t = Σ_{j=0}^∞ w_j e_{t-j} ,   (2.6.3)

where {w_j}_{j=0}^∞ is the unique solution of the homogeneous difference equation

    w_j + a_1 w_{j-1} + ... + a_p w_{j-p} = 0 ,   j = p, p + 1, ... ,   (2.6.4)

subject to the boundary conditions w_0 = 1 and

    w_j + Σ_{i=1}^{j} a_i w_{j-i} = 0 ,   j = 1, 2, ..., p - 1 .
The limit (2.6.3) is also an almost sure limit.
Proof. Let Y_t = (X_t, X_{t-1}, ..., X_{t-p+1})′, ε_t = (e_t, 0, ..., 0)′, and

    A = [ -a_1  -a_2  -a_3  ...  -a_{p-1}  -a_p ]
        [   1     0     0   ...     0        0  ]
        [   0     1     0   ...     0        0  ]   (2.6.5)
        [  ..................................... ]
        [   0     0     0   ...     1        0  ] .

Then Y_t = AY_{t-1} + ε_t, and by repeated substitution,

    Y_t = Σ_{j=0}^{n-1} A^j ε_{t-j} + A^n Y_{t-n}

for n ≥ 1 and t = 0, ±1, ±2, ... . Therefore,
    X_t = Σ_{j=0}^{n-1} A^j_{(11)} e_{t-j} + Σ_{i=1}^{p} A^n_{(1i)} X_{t-n-i+1} ,

where A^j_{(1i)} is the (1, i)th element of the matrix A^j. Consider the determinant

    | λ + a_1   a_2   a_3  ...  a_{p-1}  a_p |
    |   -1       λ     0   ...    0       0  |
    |    0      -1     λ   ...    0       0  |
    |  ....................................  |
    |    0       0     0   ...   -1       λ  | .

Beginning with column 1, successively multiply column i by λ and add the result to column i + 1. The resulting determinant has λ^p + a_1 λ^{p-1} + ... + a_p as the last element of its first row, -1 in each subdiagonal position, and zeros elsewhere below the first row, so that expansion along the last column gives

    |λI - A| = λ^p + a_1 λ^{p-1} + ... + a_{p-1} λ + a_p .
By the Cayley-Hamilton theorem,

    A^p + a_1 A^{p-1} + ... + a_{p-1} A + a_p I = 0 .   (2.6.6)

For j ≥ p, multiply the matrix equation (2.6.6) by A^{j-p} to obtain

    A^j + a_1 A^{j-1} + ... + a_p A^{j-p} = 0 .   (2.6.7)

Set w_j = A^j_{(11)}, the first element of A^j. Then

    w_j + a_1 w_{j-1} + a_2 w_{j-2} + ... + a_p w_{j-p} = 0 ,   j ≥ p .
It is readily verified that w_j = A^j_{(11)} satisfies the initial conditions associated with (2.6.4) in the statement of the theorem. Therefore,

    X_t = Σ_{j=0}^{n-1} w_j e_{t-j} + Σ_{i=1}^{p} A^n_{(1i)} X_{t-n-i+1}

and

    E{(X_t - Σ_{j=0}^{n-1} w_j e_{t-j})²} = E{(Σ_{i=1}^{p} A^n_{(1i)} X_{t-n-i+1})²}

for n = 1, 2, ... . Because each element of A^n satisfies the homogeneous difference equation (2.6.7), we have

    A^j_{(1i)} + a_1 A^{j-1}_{(1i)} + ... + a_p A^{j-p}_{(1i)} = 0   for j ≥ p ,

and there exists a c such that |A^j_{(1i)}| < cλ^j for i = 1, 2, ..., p, where 1 > λ > M and M is the largest of the absolute values of the roots m_i. See Exercise 2.24. It follows that

    E[(X_t - Σ_{j=0}^{n-1} w_j e_{t-j})²] ≤ (cpλ^n)² K → 0

as n → ∞, because 0 < λ < 1. Because Σ_{j=0}^∞ |w_j| < ∞, X_t is covariance stationary. Now,

    Σ_{n=1}^∞ E{|Σ_{i=1}^{p} A^n_{(1i)} X_{t-n-i+1}|} ≤ Σ_{n=1}^∞ cpλ^n K^{1/2} < ∞ ,

and it follows that the limit (2.6.3) holds almost surely.    A
Because X_t can be written as a weighted sum of current and past e_t's, it follows that e_t is uncorrelated with X_{t-h} for h = 1, 2, ... . This enables us to derive many useful properties of the autoregressive time series and of the autocovariance function of the autoregressive time series.
Corollary 2.6.1.1. Let X_t be stationary and satisfy

    X_t + a_1 X_{t-1} + ... + a_p X_{t-p} = e_t ,

where the e_t are uncorrelated (0, σ²) random variables and the roots of the characteristic polynomial

    m^p + a_1 m^{p-1} + ... + a_p = 0

are less than one in absolute value. Then

    γ(0) + a_1 γ(1) + ... + a_p γ(p) = σ²

and

    γ(h) + a_1 γ(h-1) + ... + a_p γ(h-p) = 0 ,   h = 1, 2, ... .   (2.6.8)
Proof. Reserved for the reader. See equation (2.5.6). A
The system (2.6.8) defines the a's in terms of the γ's and vice versa. For example, for a stationary third order autoregressive process, we have

    [ γ(0)  γ(1)  γ(2) ] [ a_1 ]       [ γ(1) ]
    [ γ(1)  γ(0)  γ(1) ] [ a_2 ]  = -  [ γ(2) ]   (2.6.9)
    [ γ(2)  γ(1)  γ(0) ] [ a_3 ]       [ γ(3) ]

and

    γ(0) + a_1 γ(1) + a_2 γ(2) + a_3 γ(3) = σ² .
Using the representation results, we show that the partial autocorrelation function, introduced in equation (1.4.12), of a pth order autoregressive process is zero for k > p.
Corollary 2.6.1.2. Let X_t be a stationary pth order autoregressive process with all of the roots of the characteristic polynomial less than one in absolute value, and let φ(k) be the partial autocorrelation function. Then

    φ(k) = 0 ,   k = p + 1, p + 2, ... .

Proof. By the definition of the partial autocorrelation, φ(p+1) is the correlation between the residual obtained in the population regression of X_t on X_{t-1}, X_{t-2}, ..., X_{t-p} and the residual obtained in the population regression of X_{t-p-1} on X_{t-1}, X_{t-2}, ..., X_{t-p}. By the definition of the autoregressive process, the residual obtained in the population regression of X_t on X_{t-1}, X_{t-2}, ..., X_{t-p} is e_t, and the coefficients are -a_i, i = 1, 2, ..., p. By the representation of Theorem 2.6.1, e_t is uncorrelated with X_{t-h}, h ≥ 1, and hence with any linear combination of the X_{t-h}, h ≥ 1. Therefore the residual of (1.4.11), and hence φ(k), is zero for k = p + 1. If φ_i = -a_i for i = 1, 2, ..., p and φ_i = 0 for i = p + 1, ..., k, then the residual in the regression of X_t on X_{t-1}, ..., X_{t-k} is again e_t, and the result follows for all k > p.    A
Whenever discussing stationary autoregressive time series, we have assumed that the roots of the auxiliary equation were less than one in absolute value. This is because we have explicitly or implicitly visualized the time series as being created in a forward manner. If a time series is created in a forward manner, the effect of the initial conditions will go to zero only if the roots are less than one in absolute value. For example, if we define

    X_t = ρX_{t-1} + e_t ,   t = 1, 2, ... ,   (2.6.11)

where it is understood that X_t is formed by adding e_t to ρX_{t-1} and e_t is a sequence of uncorrelated (0, σ²) random variables, then {X_t : t = 0, 1, 2, ...} is stationary for |ρ| < 1 and initial variance (1 - ρ²)^{-1}σ². However, for |ρ| ≥ 1, the time series formed by adding e_t to ρX_{t-1} is nonstationary for every choice of initial variance.
On the other hand, there is a stationary time series that satisfies the difference equation

    X_t = ρX_{t-1} + e_t ,   t = 0, ±1, ±2, ... ,   (2.6.12)

for |ρ| > 1. To see this, let us consider the stationary time series
    X_t = 0.8X_{t-1} + ε_t
        = Σ_{j=0}^∞ (0.8)^j ε_{t-j} ,   t = 0, ±1, ±2, ... ,   (2.6.13)

where the ε_t are uncorrelated (0, σ²) random variables. If we change the direction in which we count on the integers, so that t is replaced by -t and t - 1 by -t + 1, and divide (2.6.13) by 0.8, we have

    X_{t+1} = 1.25X_t - 1.25ε_t   (2.6.14)

and

    X_t = Σ_{j=0}^∞ (0.8)^j ε_{t+j} ,   t = 0, ±1, ±2, ... .   (2.6.15)
By setting e_{t+1} = -1.25ε_t, equation (2.6.14) can be written in the form (2.6.12) with ρ = 1.25.
The covariance function for the time series (2.6.15) is the same as the covariance function of the time series (2.6.13). Thus, if a time series {X_t : t ∈ (0, ±1, ±2, ...)} has an autocorrelation function of the form ρ^{|h|}, where 0 < |ρ| < 1, it can be written as a forward moving average of uncorrelated random variables
or as a backward moving average of uncorrelated random variables. Likewise,

    X_t - ρX_{t-1} = e_t   (2.6.16)

defines a sequence of uncorrelated random variables, as does

    X_t - ρ^{-1}X_{t-1} = Z_t .   (2.6.17)
From the representations (2.6.13) and (2.6.15), one sees that the only stationary time series with a unit root is the trivial time series that is a constant (with probability one) for all t. See Exercise 2.19.
While both the e_t of (2.6.16) and the Z_t of (2.6.17) are uncorrelated random variables, the variance of Z_t is larger than the variance of the e_t by a factor of ρ^{-2}. This explains why the representation (2.6.16) is the one that appears in the applications of stationary autoregressive processes. That is, one typically chooses the ρ in an equation such as (2.6.16) to minimize the variance of e_t. It is worth mentioning that nonstationary autoregressive representations with roots greater than or equal to one in absolute value have appeared in practice. Estimation for such time series is discussed in Chapter 10. That the stationary autoregressive time series can be given either a forward or a backward representation also finds some use, and we state the result before proceeding.
Corollary 2.6.1.3. Let the covariance function of the time series {X_t : t ∈ (0, ±1, ±2, ...)} with zero mean satisfy the difference equation

    Σ_{j=0}^{p} a_j γ(h - j) = 0 ,   h = 1, 2, ... ,

where a_0 = 1 and the roots of the characteristic equation

    m^p + a_1 m^{p-1} + ... + a_p = 0

are less than one in absolute value. Then X_t satisfies the stochastic difference equation

    Σ_{j=0}^{p} a_j X_{t-j} = e_t ,

where {e_t} is a sequence of uncorrelated (0, σ²) random variables, and also satisfies the stochastic difference equation

    Σ_{j=0}^{p} a_j X_{t+j} = u_t ,

where {u_t} is a sequence of uncorrelated (0, σ²) random variables.
Proof. Omitted. A
Having demonstrated that a stationary finite autoregressive time series can be given an infinite moving average representation, we now obtain an alternative representation for the finite moving average time series. Because the finite moving average time series can be viewed as a difference equation in e_t, we have a result parallel to that of Theorem 2.6.1.
Theorem 2.6.2. Let the time series {X_t : t ∈ (0, ±1, ±2, ...)} be defined by

    X_t = e_t + b_1 e_{t-1} + b_2 e_{t-2} + ... + b_q e_{t-q} ,   t = 0, ±1, ±2, ... ,

where b_q ≠ 0, the roots of the characteristic equation

    m^q + b_1 m^{q-1} + ... + b_q = 0

are less than one in absolute value, and {e_t} is a sequence of uncorrelated (0, σ²) random variables. Then X_t can be expressed as an infinite autoregressive process

    Σ_{j=0}^∞ c_j X_{t-j} = e_t ,   (2.6.18)

where the coefficients c_j satisfy the homogeneous difference equation

    c_j + b_1 c_{j-1} + b_2 c_{j-2} + ... + b_q c_{j-q} = 0 ,   j = q, q + 1, ... ,   (2.6.19)

with the initial conditions c_0 = 1, c_1 = -b_1,

    c_2 = -b_1 c_1 - b_2 ,
    ...
    c_{q-1} = -b_1 c_{q-2} - b_2 c_{q-3} - ... - b_{q-1} .
Proof. Reserved for the reader. A
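The recursion (2.6.19) is easy to check numerically: convolving the c_j with the moving average coefficients must return the identity filter, so that Σ_j c_j X_{t-j} = e_t. A minimal sketch (the values b_1 = 0.55, b_2 = -0.20 are illustrative; they correspond to an invertible MA(2)):

```python
# Sketch: the autoregressive coefficients c_j of (2.6.18)-(2.6.19) for an
# invertible MA(2) X_t = e_t + b1*e_{t-1} + b2*e_{t-2}, with a check that the
# convolution of (c_j) with (1, b1, b2) is the identity filter.

def inversion_coefficients(b, n):
    """b = [b1, ..., bq]; returns c_0, ..., c_{n-1} from the recursion
    c_0 = 1, c_j = -sum_{i=1}^{q} b_i c_{j-i} (terms with j - i < 0 omitted)."""
    c = [1.0]
    for j in range(1, n):
        c.append(-sum(b[i - 1] * c[j - i] for i in range(1, len(b) + 1)
                      if j - i >= 0))
    return c

b = [0.55, -0.20]
c = inversion_coefficients(b, 30)

# Convolution of (c_j) with (1, b1, b2): 1 at lag 0, 0 at every later lag.
filt = [1.0] + b
conv = [sum(c[i] * filt[k - i] for i in range(len(c)) if 0 <= k - i < len(filt))
        for k in range(len(c))]
```

Because the roots of m² + 0.55m - 0.20 = 0 are 0.25 and -0.8, both less than one in absolute value, the c_j decay geometrically and the infinite autoregression is well defined.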
The autocorrelation function of the pth order autoregressive process is nonzero for some integer greater than N_0 for all N_0. See Exercise 2.33. By Corollary 2.6.1.2, the partial autocorrelation function of the pth order autoregressive process is zero for argument greater than p. The opposite conditions hold for the autocorrelations of the moving average process.
Corollary 2.6.2.1. Let the time series X_t be the qth order moving average defined in Theorem 2.6.2. Let N_0 be given. Then there is some k > N_0 such that the partial autocorrelation function φ(k) ≠ 0.
Proof. By our definition, X_t = Σ_{i=0}^{q} b_i e_{t-i}, where b_0 = 1 and b_q ≠ 0. If φ(k) = 0 for all k > N_0, then the coefficients c_j in Theorem 2.6.2 must be zero for j > N_0. By Theorem 2.6.2 the coefficients c_j satisfy the difference equation (2.6.19). If all c_j are equal to zero for j > N_0, then c_{N_0} must be zero if it is to satisfy (2.6.19). Because b_q ≠ 0, this leads to the conclusion that all c_j are zero, which contradicts the initial condition for (c_0, c_1, ..., c_q).    A
In discussing finite moving averages we placed no restrictions on the coefficients, and, for example, the time series

    U_t = e_t - e_{t-1}   (2.6.20)

is clearly stationary. The root of the auxiliary equation for this difference equation is one, and therefore the condition of Theorem 2.6.2 is not met. An attempt to express e_t as an autoregressive process using that theorem will fail because the remainder associated with an autoregressive representation of order n will be e_{t-n-1}.
Time series satisfying the conditions of Theorem 2.6.2 are sometimes called invertible moving averages. From the example (2.6.20) we see that not all moving average processes are invertible.
In our earlier discussion of the moving average time series we demonstrated that we could always assign the value one to the coefficient of e_t. We were also able to obtain all autocorrelation functions of the first order moving average type for an α restricted to the range [-1, 1]. We are now in a position to generalize this result.
Theorem 2.6.3. Given a time series X_t with zero mean and autocorrelation function

    ρ_X(h) = 1 ,      h = 0 ,
           = ρ(1) ,   h = 1 ,
           = 0 ,      h > 1 ,

where |ρ(1)| < 0.5, there exists an α, |α| < 1, and a sequence of uncorrelated random variables {e_t} such that X_t is defined by

    X_t = e_t + αe_{t-1} .

Proof. The equation ρ(1) = (1 + α²)^{-1}α in α has one root that is less than one in absolute value and one that is greater than one in absolute value. The root of smaller absolute value is chosen, and we define e_t by

    e_t = Σ_{j=0}^∞ (-α)^j X_{t-j} .

By Theorem 2.2.1 this random variable is well defined as a limit in squared mean,
and by Theorem 2.2.2,

    γ_e(h) = Σ_{j=0}^∞ (-α)^{h+2j} γ_X(0) + Σ_{j=0}^∞ (1 + α²)(-α)^{h-1+2j} γ_X(1)
           = (-α)^h (1 - α²)^{-1} γ_X(0) + (1 + α²)(-α)^{h-1}(1 - α²)^{-1} γ_X(1)
           = 0 ,   h > 0 .    A
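The root selection in Theorem 2.6.3 can be made explicit. A small sketch (the quadratic ρ(1)α² - α + ρ(1) = 0 has two real roots whose product is one; the function name and the test value ρ(1) = 0.4 are mine, for illustration):

```python
import math

# Sketch: recovering the invertible MA(1) coefficient alpha from a first
# order autocorrelation rho1 with |rho1| < 0.5, as in Theorem 2.6.3.
# rho1*(1 + alpha^2) = alpha has two real roots; the root of smaller
# absolute value is chosen.

def ma1_coefficient(rho1):
    if abs(rho1) >= 0.5:
        raise ValueError("need |rho(1)| < 0.5")
    if rho1 == 0.0:
        return 0.0
    disc = math.sqrt(1.0 - 4.0 * rho1 ** 2)
    return (1.0 - disc) / (2.0 * rho1)     # smaller root in absolute value

alpha = ma1_coefficient(0.4)
rho_back = alpha / (1.0 + alpha ** 2)      # reproduces the input rho(1)
```

For ρ(1) = 0.4 the two roots are 0.5 and 2.0; the choice α = 0.5 gives the invertible representation, while α = 2.0 corresponds to the alternative representation with larger error variance discussed around (2.6.16)-(2.6.17).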
The dualities between autoregressive and moving average representations established in Theorems 2.6.1 and 2.6.2 can also be described using formal operations with the backward shift operator. We recall that the first order autoregressive time series

    X_t - ρX_{t-1} = (1 - ρ𝓑)X_t = e_t ,   |ρ| < 1 ,

can be written as

    X_t = (1 - ρ𝓑)^{-1} e_t ,

where it is understood that 𝓑^0 is the identity operator, and we have written

    (1 - ρ𝓑)^{-1} = Σ_{j=0}^∞ (ρ𝓑)^j

in analogy to the expansion that holds for a real number ρx with |ρx| < 1.

The fact that the operator can be formally manipulated as if it were a real number of absolute value one furnishes a useful way of obtaining the alternative expressions for moving average and autoregressive processes whose characteristic equations have roots less than one in absolute value. Thus the second order autoregressive time series
    X_t + a_1 X_{t-1} + a_2 X_{t-2} = e_t

can also be written as

    (1 + a_1 𝓑 + a_2 𝓑²)X_t = e_t

or

    X_t = (1 + a_1 𝓑 + a_2 𝓑²)^{-1} e_t

or

    X_t = (1 - m_1 𝓑)^{-1}(1 - m_2 𝓑)^{-1} e_t ,

where m_1 and m_2 are the roots of the characteristic equation
    m² + a_1 m + a_2 = 0 .
Since the roots of the quadratic in 𝓑,

    1 + a_1 𝓑 + a_2 𝓑² = 0 ,

are the reciprocals of the roots of the characteristic equation, the restriction on the roots is often stated as the requirement that the roots of

    1 + a_1 𝓑 + a_2 𝓑² = 0

be greater than one in absolute value.
be greater than one in absolute value. Using the backward shift operator, the invertible second order moving average
Y, = e, + b,e,- , + b 2 q 2
can be given the representation
y,=(l + b , B +b,S2)e,
or

    Y_t = (1 - g_1 𝓑)(1 - g_2 𝓑)e_t ,

where g_1 and g_2, the roots of

    g² + b_1 g + b_2 = 0 ,

are less than one in absolute value. The autoregressive representation of the moving average process is then given by

    e_t = (1 - g_1 𝓑)^{-1}(1 - g_2 𝓑)^{-1} Y_t .
The backward shift representation of the moving average process is useful in illustrating the fact that any finite moving average process whose characteristic equation has some roots greater than one and some less than one in absolute value can be given a representation whose characteristic equation has all roots less than one in absolute value.
Theorem 2.6.4. Let {X_t : t ∈ (0, ±1, ±2, ...)} have the representation

    X_t = Π_{i=1}^{p} (1 - m_i 𝓑) e_t ,

where the e_t are uncorrelated (0, σ²) random variables and none of the m_i, i = 1, 2, ..., p, are of unit absolute value. Let m_1, m_2, ..., m_L, 0 < L ≤ p, be greater than one in absolute value. Then X_t also has the representation

    X_t = Π_{i=1}^{L} (1 - m_i^{-1}𝓑) Π_{i=L+1}^{p} (1 - m_i 𝓑) ε_t ,

where the ε_t are uncorrelated (0, σ²[Π_{i=1}^{L} m_i]²).
Proof. We define the polynomial P(𝓑) in 𝓑 by

    P(𝓑) = Π_{i=1}^{L} (1 - m_i^{-1}𝓑) .

The roots of P(𝓑) = 0 are greater than one in absolute value, and the roots of m^L P(m^{-1}) = 0 are less than one in absolute value. Therefore the time series

    P^{-1}(𝓑)X_t = Σ_{j=0}^∞ w_j X_{t-j} ,

where the w_j satisfy the difference equation associated with P(𝓑) and the initial conditions outlined in Theorem 2.6.2, is well defined. Let m_i be a real root greater than one in absolute value, and define Z_t by

    Z_t = Π_{j≠i} (1 - m_j 𝓑) e_t .

Then the two time series Z_t - m_i Z_{t-1} and m_i(Z_t - m_i^{-1}Z_{t-1}) have the same covariance function. Likewise, if m_1 and m_2 = m_1* are a complex conjugate pair of roots with absolute value greater than one, Z_t - (m_1 + m_2)Z_{t-1} + |m_1|²Z_{t-2} has the same covariance function as |m_1|²[Z_t - (m_1^{-1} + m_2^{-1})Z_{t-1} + |m_1|^{-2}Z_{t-2}]. By repeated application of these arguments, we conclude that the original time series

    X_t = Π_{i=1}^{p} (1 - m_i 𝓑) e_t

and the time series

    Π_{i=1}^{L} (1 - m_i^{-1}𝓑) Π_{i=L+1}^{p} (1 - m_i 𝓑) [Π_{i=1}^{L} m_i] e_t

have the same covariance function. Therefore, with ε_t = [Π_{i=1}^{L} m_i]e_t, a sequence of uncorrelated (0, σ²[Π_{i=1}^{L} m_i]²) random variables, the two representations of X_t have the same covariance function.    A
As an example of this theorem, the moving average

    X_t = e_t - 3.2e_{t-1} - 3.2e_{t-2} = (1 - 4.0𝓑)(1 + 0.8𝓑)e_t ,

where the e_t are uncorrelated (0, σ²), also has the representation

    X_t = ε_t + 0.55ε_{t-1} - 0.20ε_{t-2}
        = (1 - 0.25𝓑)(1 + 0.8𝓑)ε_t ,

where the ε_t are uncorrelated (0, 16σ²). The reader may check that γ(0), γ(1), and γ(2) are 21.48σ², 7.04σ², and -3.20σ², respectively, for this moving average.
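The check suggested to the reader can be sketched in a few lines. For a moving average X_t = e_t + b_1 e_{t-1} + b_2 e_{t-2} with error variance v, the autocovariances are γ(h) = v Σ_j b_j b_{j+h} with b_0 = 1:

```python
# Sketch: verifying that the two moving average representations in the
# example have the same autocovariances (taking sigma^2 = 1).

def ma_autocov(b, v, h):
    """Autocovariance at lag h of e_t + b[0]*e_{t-1} + ... with Var(e)=v."""
    coef = [1.0] + list(b)
    return v * sum(coef[j] * coef[j + h] for j in range(len(coef) - h))

orig = [ma_autocov([-3.2, -3.2], 1.0, h) for h in range(3)]   # original form
refl = [ma_autocov([0.55, -0.20], 16.0, h) for h in range(3)]  # root reflected
```

Both lists come out to (21.48, 7.04, -3.20), confirming the covariance equivalence, with the reflected root 4.0 contributing the factor 4² = 16 to the error variance.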
2.7. AUTOREGRESSIVE MOVING AVERAGE TIME SERIES
Having considered autoregressive and moving average processes, it is natural to investigate time series defined by the combination of low order autoregressive and moving average components. The sequence {X_t : t ∈ (0, ±1, ±2, ...)}, defined by

    X_t + a_1 X_{t-1} + ... + a_p X_{t-p} = e_t + b_1 e_{t-1} + ... + b_q e_{t-q} ,   (2.7.1)

where a_p ≠ 0, b_q ≠ 0, and {e_t} is a sequence of uncorrelated (0, σ²) random variables, is called an autoregressive moving average time series of order (p, q), which we shall abbreviate to autoregressive moving average (p, q). The notation ARMA(p, q) is also commonly used.
The autoregressive moving average (1, 1),

    X_t = θ_1 X_{t-1} + e_t + b_1 e_{t-1} ,   (2.7.2)

where |θ_1| < 1, has furnished a useful approximation for some time series encountered in practice. Note that we have written the coefficient on X_{t-1} as θ_1, so that a_1 = -θ_1 in the earlier notation, to simplify the representations to follow. If we let
    u_t = e_t + b_1 e_{t-1}
and write

    X_t = θ_1 X_{t-1} + u_t ,

we can express X_t of (2.7.2) as an infinite moving average of the u_t,

    X_t = Σ_{j=0}^∞ θ_1^j u_{t-j} ,

and hence of the e_t,

    X_t = e_t + (θ_1 + b_1) Σ_{j=1}^∞ θ_1^{j-1} e_{t-j} ,   (2.7.3)
where the random variables are defined by Theorem 2.2.1. If |b_1| < 1, we can also express e_t as an infinite moving average of the u_t, and hence of the X_t,

    e_t = X_t - (θ_1 + b_1) Σ_{j=1}^∞ (-b_1)^{j-1} X_{t-j} .   (2.7.4)

If we let Z_t denote the first order autoregressive process

    Z_t = θ_1 Z_{t-1} + e_t ,
then
    X_t = Z_t + b_1 Z_{t-1} .   (2.7.5)
That is, the time series can be expressed as a first order moving average of a first order autoregressive time series. Using this representation, we express the autocovariance function of X_t in terms of that of Z_t,

    γ_X(h) = (1 + b_1²)γ_Z(h) + b_1[γ_Z(h-1) + γ_Z(h+1)] ,

to obtain

    γ_X(0) = (1 - θ_1²)^{-1}(1 + 2b_1θ_1 + b_1²)σ²

and

    γ_X(h) = θ_1^{h-1}(1 - θ_1²)^{-1}(θ_1 + b_1)(1 + b_1θ_1)σ² ,   h = 1, 2, ... .   (2.7.7)

Thus, for h ≥ 1, the autocorrelation function of the autoregressive moving average (1, 1) has the same appearance as that of a first order autoregressive time series in that it is declining at a geometric rate where the rate is θ_1.
From equation (2.7.7) we see that γ(h) is zero for h ≥ 1 if b_1 = -θ_1. The autoregressive moving average (1, 1) process reduces to a sequence of uncorrelated random variables in such a case, which is also clear from the representation in (2.7.3).
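The geometric decline and the degenerate case b_1 = -θ_1 can be checked numerically from the closed form implied by (2.7.7). A sketch (the parameter values θ_1 = 0.8, b_1 = 0.3 are illustrative):

```python
# Sketch: ARMA(1,1) autocorrelation function implied by (2.7.7):
# rho(h) = theta^(h-1)*(theta+b1)*(1+b1*theta)/(1+2*b1*theta+b1^2), h >= 1.
# The ratio rho(h+1)/rho(h) equals theta, and rho vanishes when b1 = -theta.

def arma11_rho(theta, b1, h):
    if h == 0:
        return 1.0
    num = (theta + b1) * (1.0 + b1 * theta)
    den = 1.0 + 2.0 * b1 * theta + b1 ** 2
    return theta ** (h - 1) * num / den

rhos = [arma11_rho(0.8, 0.3, h) for h in range(1, 6)]
ratio = rhos[1] / rhos[0]              # geometric rate: equals theta = 0.8
white = arma11_rho(0.8, -0.8, 3)       # degenerate case b1 = -theta: zero
```
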
To avoid this kind of degeneracy, the autoregressive moving average of order (p, q) is often defined with the condition that none of the roots of

    m^p + a_1 m^{p-1} + ... + a_p = 0   (2.7.8)

are roots of

    m^q + b_1 m^{q-1} + ... + b_q = 0 .   (2.7.9)
The autoregressive and moving average representations of (2.7.3) and (2.7.4) generalize to higher order processes in the obvious manner. We state the generalizations as theorems.
Theorem 2.7.1. Let the stationary time series X_t satisfy

    Σ_{j=0}^{p} a_j X_{t-j} = Σ_{j=0}^{q} b_j e_{t-j} ,   (2.7.10)

where a_0 = b_0 = 1, the roots of (2.7.8) are less than one in absolute value, and {e_t} is a sequence of uncorrelated (0, σ²) random variables. Then X_t has the representation

    X_t = Σ_{j=0}^∞ v_j e_{t-j} ,   (2.7.11)

where v_0 = 1, v_j = 0 if j < 0, and

    v_j = b_j - Σ_{i=1}^{p} a_i v_{j-i} ,

with b_j = 0 for j > q.
Proof. Writing

    Σ_{j=0}^{p} a_j X_{t-j} = u_t ,

we obtain

    X_t = Σ_{j=0}^∞ w_j u_{t-j} = Σ_{j=0}^∞ w_j (Σ_{i=0}^{q} b_i e_{t-j-i}) ,

where u_t = Σ_{i=0}^{q} b_i e_{t-i} and the w_j are defined in Theorem 2.6.1. Since the w_j are absolutely summable, this representation is well defined as a limit in mean square. We then write

    X_t = Σ_{j=0}^∞ v_j e_{t-j}

and obtain the result by equating coefficients of e_{t-i}, i = 0, 1, ... .    A
The first few v_j of Theorem 2.7.1 can be used to compute the first few covariances of the process. If we multiply the defining equation of Theorem 2.7.1 by X_{t-r} for r = 0, 1, ..., p and take expectations, we have

    Σ_{j=0}^{p} a_j γ(r - j) = σ² Σ_{j=r}^{q} b_j v_{j-r} ,   r = 0, 1, ..., p ,

where it is understood that the summation on the right of the equality is zero for r > q.
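The recursion for the v_j is directly computable. A sketch, checked against the ARMA(1,1) closed form v_j = (θ_1 + b_1)θ_1^{j-1} obtained in (2.7.3), with a_1 = -θ_1 in the parameterization of Theorem 2.7.1 (the values 0.8 and 0.3 are illustrative):

```python
# Sketch: moving average weights v_j of Theorem 2.7.1, computed from
# v_0 = 1 and v_j = b_j - sum_{i=1}^{p} a_i v_{j-i}, with b_j = 0 for j > q.

def arma_ma_weights(a, b, n):
    """a = [a1, ..., ap], b = [b1, ..., bq]; returns v_0, ..., v_{n-1}."""
    v = [1.0]
    for j in range(1, n):
        bj = b[j - 1] if j <= len(b) else 0.0
        v.append(bj - sum(a[i - 1] * v[j - i] for i in range(1, len(a) + 1)
                          if j - i >= 0))
    return v

theta, b1 = 0.8, 0.3
v = arma_ma_weights([-theta], [b1], 15)
v_closed = [1.0] + [(theta + b1) * theta ** (j - 1) for j in range(1, 15)]
```
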
Theorem 2.7.2. Let the stationary time series X_t satisfy

    Σ_{j=0}^{p} a_j X_{t-j} = Σ_{j=0}^{q} b_j e_{t-j} ,

where a_0 = b_0 = 1, the roots of (2.7.8) and of (2.7.9) are less than one in absolute value, and {e_t} is a sequence of uncorrelated (0, σ²) random variables. Then X_t has the representation

    Σ_{j=0}^∞ d_j X_{t-j} = e_t ,   (2.7.12)

where d_0 = 1, d_j = 0 for j < 0, and

    d_j = -Σ_{i=1}^{q} b_i d_{j-i} + a_j ,

with a_j = 0 for j > p.

Proof. Reserved for the reader.    A
The backward shift operator is often used to give a compact representation for the autoregressive moving average. Let

    A(m) = 1 + a_1 m + ... + a_p m^p

and

    B(m) = 1 + b_1 m + ... + b_q m^q

be the polynomials associated with the autoregressive moving average (2.7.10). Then we can write

    A(𝓑)X_t = B(𝓑)e_t ,   (2.7.13)

or, if the roots of A(m) = 0 are greater than one in absolute value,

    X_t = A^{-1}(𝓑)B(𝓑)e_t ,   (2.7.14)

or, if the roots of B(m) = 0 are also greater than one in absolute value,

    e_t = B^{-1}(𝓑)A(𝓑)X_t .   (2.7.15)
The representation (2.7.14) corresponds to (2.7.11), and the representation (2.7.15) corresponds to (2.7.12). Also see Exercise 2.26.
The equation (2.7.13) defines an autoregressive moving average constructed with e_t variables. By (2.7.14), we could apply the same moving average operator to any sequence. Thus, if C(m) and D(m) are polynomials whose roots are greater than one in absolute value, and if X_t is the time series of (2.7.13), we might define the new time series Y_t by

    C(𝓑)Y_t = D(𝓑)X_t

or

    Y_t = C^{-1}(𝓑)D(𝓑)X_t = C^{-1}(𝓑)D(𝓑)A^{-1}(𝓑)B(𝓑)e_t
        = G^{-1}(𝓑)H(𝓑)e_t ,   (2.7.16)

where G(𝓑) = C(𝓑)A(𝓑) and H(𝓑) = D(𝓑)B(𝓑). If there are any common factors in G(𝓑) and H(𝓑), they can be removed to obtain the standard representation. On the basis of this discussion, we might say that an autoregressive moving average of an autoregressive moving average is an autoregressive moving average.
2.8. VECTOR PROCESSES
From an operational standpoint the generalizations of moving average and autoregressive representations to the vector case are obtained by substituting matrix and vector expressions for the scalar expressions of the preceding sections.
Consider a first order moving average process of dimension two:
    X_{1t} = b_{11}e_{1,t-1} + b_{12}e_{2,t-1} + e_{1t} ,
    X_{2t} = b_{21}e_{1,t-1} + b_{22}e_{2,t-1} + e_{2t} .   (2.8.1)

The system may be written in matrix notation as

    X_t = Be_{t-1} + e_t ,

where {e_t} is a sequence of uncorrelated (0, Σ) vector random variables, B is the matrix of coefficients b_{ij}, and we assume B contains at least one nonzero element. Then the autocovariance function of X_t is
    Γ(h) = BΣB′ + Σ ,   h = 0 ,
         = ΣB′ ,        h = 1 ,
         = BΣ ,         h = -1 ,
         = 0            otherwise.
Note that Γ(h) = Γ′(-h), as we would expect from Lemma 1.7.1. The qth order moving average time series of dimension k is defined by

    X_t = Σ_{j=0}^{q} B_j e_{t-j} ,

where the B_j are k × k matrices with B_0 = I and at least one element of B_q not zero, and the e_t are uncorrelated (0, Σ) random variables. Then

    Γ(h) = Σ_{j=0}^{q-h} B_j Σ B′_{j+h} ,    0 ≤ h ≤ q ,
         = Σ_{j=0}^{q+h} B_{j-h} Σ B′_j ,   -q ≤ h < 0 ,   (2.8.2)
         = 0 ,                               q < |h| .
As with the scalar process, the autocovariance matrix is zero for |h| > q. In investigating the properties of infinite moving average and autoregressive processes in the scalar case, we made extensive use of absolutely summable sequences. We now extend this concept to matrix sequences.
Definition 2.8.1. Let {G_j}_{j=1}^∞ be a sequence of k × r matrices, where the elements of G_j are g_{im}(j), i = 1, 2, ..., k, m = 1, 2, ..., r. If each of the kr sequences {g_{im}(j)}_{j=1}^∞ is absolutely summable, we say that the sequence {G_j}_{j=1}^∞ is absolutely summable.
The infinite moving average

    X_t = Σ_{j=0}^∞ B_j e_{t-j} ,

where the e_t are uncorrelated (0, Σ) random variables, will be well defined as an almost sure limit if {B_j} is absolutely summable. Thus, for example, if X_t is a vector stationary time series of dimension k with zero mean and absolutely summable covariance function and {T_j}_{j=-∞}^∞ is an absolutely summable sequence of k × k matrices, then

    Y_t = Σ_{j=-∞}^∞ T_j X_{t-j}

is well defined as an almost sure limit and

    Γ_Y(h) = Σ_{i=-∞}^∞ Σ_{j=-∞}^∞ T_i Γ_X(h + i - j) T′_j .
The vector autoregressive process of order p and dimension k is defined by

    Σ_{j=0}^{p} A_j X_{t-j} = e_t ,   (2.8.3)

where the e_t are uncorrelated k-dimensional (0, Σ) random variables and the A_j are k × k matrices with A_0 = I and A_p ≠ 0.
Any autoregressive process can be expressed as a first order vector process. Let Y_t = (X′_t, X′_{t-1}, ..., X′_{t-p+1})′ and ε_t = (e′_t, 0′, ..., 0′)′, where the X_t satisfy (2.8.3). Then we can write

    Y_t + AY_{t-1} = ε_t .
This representation for the scalar autoregressive process was used in the proof of Theorem 2.6.1.
The process (2.8.3) can be expressed as an infinite one-sided moving average of the e, when the roots of the characteristic equation are less than one in absolute value.
Theorem 2.8.1. Let the stationary time series {X_t : t ∈ (0, ±1, ±2, ...)} satisfy (2.8.3), and let the roots of the determinantal equation

    |A_0 m^p + A_1 m^{p-1} + ... + A_p| = 0   (2.8.4)

be less than one in absolute value. Then X_t can be expressed as an infinite moving average of the e_t, say

    X_t = Σ_{j=0}^∞ K_j e_{t-j} ,

where K_0 = I, K_1 = -A_1,

    K_2 = -A_1 K_1 - A_2 ,
    ...
    K_s = -Σ_{j=1}^{p} A_j K_{s-j} ,   s = p, p + 1, ... .
Proof. Essentially the same argument as used at the beginning of Section 2.5 may be used to demonstrate that Σ_{j=0}^∞ K_j e_{t-j} is the solution of the difference equation. Since the roots of the determinantal equation are less than one in absolute value, the sequence of matrices {K_j} is absolutely summable and the random variables are well defined as almost sure limits.    A
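The matrix recursion for the K_j mirrors the scalar case. A sketch for a bivariate first order autoregression (the coefficient matrix A1 is an arbitrary illustrative choice whose eigenvalues satisfy the root condition):

```python
import numpy as np

# Sketch: matrix weights K_j of Theorem 2.8.1, K_0 = I and
# K_s = -sum_{j=1}^{p} A_j K_{s-j}. For p = 1 this gives K_s = (-A1)^s.

def var_ma_weights(A_list, n):
    """A_list = [A1, ..., Ap] (square matrices); returns [K_0, ..., K_{n-1}]."""
    k = A_list[0].shape[0]
    K = [np.eye(k)]
    for s in range(1, n):
        Ks = sum(-A_list[j - 1] @ K[s - j] for j in range(1, len(A_list) + 1)
                 if s - j >= 0)
        K.append(Ks)
    return K

A1 = np.array([[-0.5, 0.1],
               [0.0, -0.3]])       # eigenvalues of -A1 are 0.5 and 0.3
K = var_ma_weights([A1], 10)
```

Because the roots are less than one in absolute value, the K_j decay to zero, which is the absolute summability used in the proof.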
Given Theorem 2.8.1, we can construct the multivariate Yule-Walker equations for the autocovariances of the process. Multiplying (2.8.3) on the right by X′_{t-s}, s ≥ 0, and taking expectations, we obtain

    Σ_{j=0}^{p} A_j Γ(j - s) = Σ ,   s = 0 ,
                             = 0 ,   s = 1, 2, ... ,   (2.8.5)

where we have used

    E{e_t X′_{t-s}} = Σ ,   s = 0 ,
                    = 0 ,   otherwise.

These equations are of the same form as those given in equation (2.5.6) and Corollary 2.6.1.1, but they contain a larger number of elements.

Theorem 2.8.2. Let the time series X_t be defined by

    X_t = Σ_{j=0}^{q} B_j e_{t-j} ,
where the e_t are uncorrelated k-dimensional vector random variables with zero mean and covariance matrix Σ, the B_j are k × k matrices with B_0 = I and B_q ≠ 0, and the kq roots of the determinantal equation

    |B_0 m^q + B_1 m^{q-1} + ... + B_q| = 0   (2.8.6)

are less than one in absolute value. Then X_t can be expressed as an infinite autoregressive process

    e_t = Σ_{j=0}^∞ C_j X_{t-j} ,

where the C_j satisfy C_0 = I, C_1 = -B_1,

    C_2 = -B_1 C_1 - B_2 ,
    ...
    C_s = -Σ_{j=1}^{q} B_j C_{s-j} ,   s = q, q + 1, ... .
Proof. Reserved for the reader. A
It can also be demonstrated that an autoregressive moving average of the form

    Σ_{j=0}^{p} A_j X_{t-j} = Σ_{j=0}^{q} B_j e_{t-j} ,

where the roots of (2.8.4) and of (2.8.6) are less than one in absolute value, can be given either an infinite autoregressive or an infinite moving average representation with matrix weights of the same form as the scalar weights of Theorems 2.7.1 and 2.7.2.
2.9. PREDICTION
One of the important problems in time series analysis is the following: Given n observations on a realization, predict the (n + s)th observation in the realization, where s is a positive integer. The prediction is sometimes called the forecast of the (n + s)th observation. Because of the functional nature of a realization, prediction is also called extrapolation.
In some areas of statistics the term predictor is applied to a function of observations used to approximate an unknown random quantity, and the term estimator is applied to a function of observations used to approximate an unknown fixed quantity. Some authors use the term estimator to describe the function in both situations. In time series, the term filter is often used for the function of observations used to approximate an unknown value of the random process at time t. If only values of the process prior to t are used in the approximating function for the time series, the function is called a predictor. We adopt this use and apply the term predictor to functions used to approximate a future term of the realization. We will use estimator as a generic term for a function of observations used to approximate either a fixed or a random quantity.
In order to discuss alternative predictors, it is necessary to have a criterion by which the performance of a predictor is measured. The usual criterion, and the one that we adopt, is the mean square error of the predictor. For example, if Ŷ_{n+s}(Y_1, ..., Y_n) is the predictor of Y_{n+s} based on the n observations Y_1, Y_2, ..., Y_n, then the mean square error (MSE) of the predictor is

    MSE{Ŷ_{n+s}} = E{[Y_{n+s} - Ŷ_{n+s}(Y_1, ..., Y_n)]²} .   (2.9.1)
Generally, the problem of determining optimal predictors requires that the class of predictors be restricted. We shall investigate linear predictors.
The best linear predictor for a stationary time series with known covariance function and known mean is easily obtained.
Theorem 2.9.1. Given n observations on a realization of a time series with zero mean and known covariance matrix, the minimum mean square error linear predictor of the (n + s)th, s = 1, 2, ..., observation is

    Ŷ_{n+s}(Y_1, ..., Y_n) = y′b_{ns} ,   (2.9.2)

where y′ = (Y_1, Y_2, ..., Y_n), V_{nn} = E{yy′}, v_{n+s} = E{yY_{n+s}}, b_{ns} = V⁻_{nn}v_{n+s}, and V⁻_{nn} is a generalized inverse of V_{nn}. [See Rao (1973) for a discussion of generalized inverses.] The mean of the prediction error is zero, and the variance of the prediction error is

    E{(Y_{n+s} - y′b_{ns})²} = E{Y²_{n+s}} - v′_{n+s}V⁻_{nn}v_{n+s} .   (2.9.3)
Proof. To find the minimum mean square error linear predictor, we minimize

    E{(Y_{n+s} - y′b)²}

with respect to the vector of coefficients b. Differentiating the quadratic form with respect to b and equating the derivative to zero, we find that a b that minimizes the expectation is b_{ns} = V⁻_{nn}v_{n+s}. While b_{ns} is not necessarily unique, the predictor y′b_{ns} is unique. See Exercise 2.42. The mean of the prediction error is zero because the mean of the process is zero and the predictor is a linear function of the observations. The variance of the prediction error follows immediately from least squares regression theory.    A
To understand why the generalized inverse was required in Theorem 2.9.1, consider the time series

X_t = e_1 cos πt,   (2.9.4)

where e_1 is normally distributed with zero mean and variance σ². This time series is perfectly periodic with period two and covariance function γ(h) = σ² cos πh. The matrix V_{33} for a sample of three observations is

V_{33} = σ² [  1  −1   1
              −1   1  −1
               1  −1   1 ],   (2.9.5)

which is easily seen to be singular. Thus there are many possible linear combinations of past values that can be used to predict future values. For example, one can predict X_t by −X_{t−1}, or by X_{t−2}, or by −⅓X_{t−1} + ⅔X_{t−2}.
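The role of the generalized inverse in Theorem 2.9.1 can be illustrated numerically. The sketch below (our code, not the text's) uses NumPy's `pinv` as the generalized inverse for the singular matrix (2.9.5) and predicts X_4 from a realization, which necessarily has the form (a, −a, a):

```python
import numpy as np

# Covariance function of X_t = e_1 cos(pi t): gamma(h) = sigma^2 cos(pi h).
sigma2 = 1.0
gamma = lambda h: sigma2 * np.cos(np.pi * h)

# Covariance matrix of (X_1, X_2, X_3); it is singular (rank one).
V = np.array([[gamma(abs(i - j)) for j in range(3)] for i in range(3)])
assert np.linalg.matrix_rank(V) == 1

# Cross covariances E{y X_4} and predictor coefficients b = V^+ c (2.9.2).
c = np.array([gamma(4 - t) for t in (1, 2, 3)])
b = np.linalg.pinv(V) @ c

# Any realization has the form (a, -a, a); the predictor returns -a exactly.
a = 2.5
y = np.array([a, -a, a])
print(y @ b)   # -2.5 (up to floating point), the perfect prediction of X_4
```

Any generalized inverse gives the same predicted value here, even though the coefficient vector b is not unique; `pinv` simply selects the Moore–Penrose choice.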
The time series of equation (2.9.4) is a member of the class of time series defined by

X_t = e_1 cos λt + e_2 sin λt,   (2.9.6)

where e_i, i = 1, 2, are independent (0, σ²) random variables and λ ∈ (0, π]. Given λ and two observations, it is possible to predict perfectly all future observations in a given realization. That is, for n ≥ 2,

(e_1, e_2)' = [ cos λ    sin λ  ]⁻¹ (X_1, X_2)',
              [ cos 2λ   sin 2λ ]

and all future observations can be predicted by substituting these e's into the defining equation (2.9.6).
This example illustrates that in predicting future observations of a realization from past observations of that realization, the element ω of Ω is fixed. In the current example, ω essentially is a two-dimensional vector and can be perfectly identified for a particular realization by a few observations on that realization. In the more common situation, ω ∈ Ω is infinite-dimensional, and we can never determine it perfectly from a finite number of observations on the realization.
Time series for which it is possible to use the observations Y_t, Y_{t−1}, ... of a realization to predict future observations with zero mean square error are called deterministic. They are also sometimes called singular, a term that is particularly meaningful when one looks at the covariance matrix (2.9.5). Time series that have a nonzero lower bound on the prediction error are called regular or nondeterministic.
Definition 2.9.1. A time series is nonsingular (regular, nondeterministic) if the sequence of mean square errors of one-period prediction, τ²_{n1}, is bounded away from zero. A time series is singular (deterministic) if

lim_{n→∞} τ²_{n1} = 0.
The vector of coefficients in Theorem 2.9.1 for one-period predictions can be computed recursively for nondeterministic time series. The recursive procedure is known as the Durbin–Levinson algorithm. See Durbin (1960) and Morettin (1984).
Theorem 2.9.2. Let Y_t be a zero mean, covariance stationary, nondeterministic time series defined on the integers. Then the coefficients defining the predictor of Y_{n+1} given in (2.9.2) of Theorem 2.9.1, written Ŷ_{n+1} = Σ_{j=1}^{n} b_{n,j} Y_{n+1−j}, satisfy

b_{n,n} = τ⁻²_{n−1,1} [γ(n) − Σ_{j=1}^{n−1} b_{n−1,j} γ(n − j)],   (2.9.7)

b_{n,j} = b_{n−1,j} − b_{n,n} b_{n−1,n−j},   j = 1, 2, ..., n − 1,   (2.9.8)

and

τ²_{n,1} = τ²_{n−1,1}(1 − b²_{n,n})   (2.9.9)

for n = 1, 2, ..., where τ²_{0,1} = γ(0) and empty sums are zero.
Proof. It follows from the symmetry of the covariance function of a stationary process that the coefficients of the predictor of Y_n based upon (Y_1, ..., Y_{n−1}) are the coefficients of the predictor of Y_1 based upon (Y_n, ..., Y_2). That is,

Ŷ_n(Y_1, ..., Y_{n−1}) = Σ_{j=1}^{n−1} b_{n−1,j} Y_{n−j}

and

Ŷ_1(Y_n, ..., Y_2) = Σ_{j=1}^{n−1} b_{n−1,j} Y_{1+j},

where Ŷ_1(Y_n, ..., Y_2) is the predictor of Y_1 based on (Y_2, ..., Y_n) and b_{n−1} = (b_{n−1,1}, ..., b_{n−1,n−1})' is the vector of (2.9.2) for s = 1. The coefficient vector b_{n−1} is unique because the covariance matrix of a nondeterministic time series is positive definite.
The best predictor of Y_{n+1} based on (Y_1, ..., Y_n) can be constructed from the regression of Y_{n+1} on the n-dimensional vector [Y_1 − Ŷ_1(Y_n, ..., Y_2), Y_2, ..., Y_n]. By stationarity, the coefficients for the regression of Y_{n+1} on (Y_2, ..., Y_n) are the same as the coefficients for the regression of Y_n on (Y_1, ..., Y_{n−1}). Because Ŷ_1(Y_n, ..., Y_2) is the best linear predictor, Y_1 − Ŷ_1(Y_n, ..., Y_2) is uncorrelated with (Y_2, ..., Y_n), and the regression coefficient for Y_1 − Ŷ_1(Y_n, ..., Y_2) is

b_{n,n} = τ⁻²_{n−1,1} E{[Y_1 − Ŷ_1(Y_n, ..., Y_2)] Y_{n+1}} = τ⁻²_{n−1,1} [γ(n) − Σ_{j=1}^{n−1} b_{n−1,j} γ(n − j)].

Therefore,

Ŷ_{n+1}(Y_1, ..., Y_n) = Σ_{j=1}^{n−1} b_{n−1,j} Y_{n+1−j} + b_{n,n}[Y_1 − Ŷ_1(Y_n, ..., Y_2)],

and we have the results (2.9.7) and (2.9.8). The error mean square for the prediction of Y_{n+1} based on Y_1, Y_2, ..., Y_n is equal to the prediction variance for the prediction of Y_{n+1} based on Y_2, Y_3, ..., Y_n reduced by the sum of squares due to Y_1 after adjusting for Y_2, Y_3, ..., Y_n. The variance of the prediction error for Ŷ_{n+1}(Y_2, ..., Y_n) is τ²_{n−1,1}, and b²_{n,n} is the squared correlation between Y_{n+1} − Ŷ_{n+1}(Y_2, ..., Y_n) and Y_1 − Ŷ_1(Y_n, ..., Y_2), giving the result (2.9.9). ∎
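The Durbin–Levinson recursions take only a few lines of code. The sketch below is our implementation, not the book's; `b[j-1]` holds the coefficient b_{n,j} of Y_{n+1−j}. As a check, an AR(1) autocovariance function γ(h) = ρ^h(1 − ρ²)⁻¹ must yield coefficients (ρ, 0, ..., 0) and a one-step error variance of σ² = 1:

```python
import numpy as np

def durbin_levinson(gamma, n):
    """One-step predictor coefficients and error variances from gamma[0..n]."""
    b = np.array([gamma[1] / gamma[0]])
    tau2 = [gamma[0] * (1.0 - b[0] ** 2)]          # tau_{1,1}^2
    for k in range(2, n + 1):
        # (2.9.7): coefficient of the most distant observation
        bkk = (gamma[k] - np.dot(b, gamma[k - 1:0:-1])) / tau2[-1]
        # (2.9.8): update the remaining coefficients
        b = np.append(b - bkk * b[::-1], bkk)
        # (2.9.9): update the one-step prediction error variance
        tau2.append(tau2[-1] * (1.0 - bkk ** 2))
    return b, tau2

rho = 0.6
gamma = np.array([rho ** h for h in range(6)]) / (1.0 - rho ** 2)
b, tau2 = durbin_levinson(gamma, 5)
print(np.allclose(b, [rho, 0, 0, 0, 0]), round(tau2[-1], 6))   # True 1.0
```

The recursion needs only O(n²) operations, in contrast with the O(n³) cost of inverting V_{nn} directly.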
Theorem 2.9.1 gives a quite general solution to the problem of linear prediction, but the computation of n × n generalized inverses can become inconvenient if one has a large number of predictions to make. Theorem 2.9.2 presents a recursive method of computing the coefficients of the lagged Y's required for prediction. However, in general, these computations also become sizable as n increases. Fortunately, the forecast equations for autoregressive processes are very simple, and those for invertible moving average processes are fairly simple.
We begin with the first order autoregressive process, assuming the parameter to be known. Let the time series Y, be defined by
Y_t = ρY_{t−1} + e_t,   |ρ| < 1,

where the e_t are independent (0, σ²) random variables. We wish to construct a predictor of Y_{n+1} = ρY_n + e_{n+1}. Clearly, the "best predictor" of ρY_n is ρY_n.
Therefore, we need only predict e_{n+1}. But, by assumption, e_{n+1} is independent of Y_n, Y_{n−1}, .... Hence, the best predictor of e_{n+1} is zero, since predicting e_{n+1} is equivalent to predicting a random selection from a distribution with zero mean and finite variance. Therefore, we choose as a predictor of Y_{n+1} at time n

Ŷ_{n+1}(Y_1, ..., Y_n) = ρY_n.   (2.9.10)
Notice that we constructed this predictor by finding the expected value of Y_{n+1} given Y_n, Y_{n−1}, ..., Y_1. Because of the nature of the time series, knowledge of Y_{n−1}, Y_{n−2}, ..., Y_1 added no information beyond that contained in knowledge of Y_n. It is clear that the mean square error of this predictor is

E{[Y_{n+1} − Ŷ_{n+1}(Y_1, ..., Y_n)]²} = E{e²_{n+1}} = σ².
To predict more than one period into the future, we recall the representation

Y_{n+s} = ρ^s Y_n + Σ_{j=1}^{s} ρ^{s−j} e_{n+j},   s = 1, 2, ....   (2.9.11)
It follows that, for independent e_t, the conditional expectation of Y_{n+s} given Y_1, Y_2, ..., Y_n is

E{Y_{n+s} | Y_1, ..., Y_n} = ρ^s Y_n,   (2.9.12)

and this is the best predictor for Y_{n+s}.
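As a small numeric sketch of (2.9.11)–(2.9.12) (illustrative values of ours: ρ = 0.8, σ² = 1, Y_n = 2):

```python
# s-step AR(1) prediction and its error variance from (2.9.11)-(2.9.12):
# Yhat_{n+s} = rho^s Y_n,  MSE = sigma^2 * (1 + rho^2 + ... + rho^(2(s-1))).
rho, sigma2, y_n = 0.8, 1.0, 2.0
for s in (1, 2, 3):
    pred = rho ** s * y_n
    mse = sigma2 * sum(rho ** (2 * j) for j in range(s))
    print(s, round(pred, 4), round(mse, 4))
# 1 1.6 1.0
# 2 1.28 1.64
# 3 1.024 2.0496
```

As s grows, the predictor decays geometrically toward the mean and the error variance rises toward the variance of the process, σ²(1 − ρ²)⁻¹.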
These ideas generalize immediately to higher order processes. Let the pth order autoregressive process be defined by

Y_t = Σ_{j=1}^{p} α_j Y_{t−j} + e_t,

where the roots of

m^p − α_1 m^{p−1} − ⋯ − α_p = 0

are less than one in absolute value and the e_t are uncorrelated (0, σ²) random variables. On the basis of the arguments for the first order process, we choose as the one-period-ahead linear predictor for the stationary pth order autoregressive process

Ŷ_{n+1}(Y_1, ..., Y_n) = Σ_{j=1}^{p} α_j Y_{n+1−j}.   (2.9.13)
It is clear that this predictor minimizes (2.9.1), since the prediction error is e_{n+1} and this is uncorrelated with Y_n and all previous Y's. If the e_t are independent random variables, the predictor is the expected value of Y_{n+1} conditional on Y_n, Y_{n−1}, ..., Y_1. If the e_t are only uncorrelated, we cannot make this statement. (See Exercise 2.16.) To obtain the best two-period linear predictor, we note that

Y_{n+2} = α_1 (Σ_{j=1}^{p} α_j Y_{n+1−j} + e_{n+1}) + Σ_{j=2}^{p} α_j Y_{n+2−j} + e_{n+2},

where it is understood that α_{p+i} = 0 for i = 1, 2, ... and, to simplify the subscripting, we have taken p ≥ 2. Since e_{n+2} and e_{n+1} are uncorrelated with (Y_1, Y_2, ..., Y_n), the two-period predictor can be constructed as

Ŷ_{n+2}(Y_1, ..., Y_n) = α_1 Ŷ_{n+1}(Y_1, ..., Y_n) + Σ_{j=2}^{p} α_j Y_{n+2−j}.   (2.9.14)

In general, we can build predictors for s periods ahead by substituting the predictors for earlier periods into (2.9.13). Hence, for p ≥ 2,
Ŷ_{n+s}(Y_1, ..., Y_n) = Σ_{j=1}^{s−1} α_j Ŷ_{n−j+s}(Y_1, ..., Y_n) + Σ_{j=s}^{p} α_j Y_{n−j+s},   s = 2, 3, ..., p,

                       = Σ_{j=1}^{p} α_j Ŷ_{n−j+s}(Y_1, ..., Y_n),   s = p + 1, p + 2, ....   (2.9.15)
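The recursion (2.9.15) is conveniently checked in code. The sketch below (ours) applies it to the AR(2) unemployment model of Example 2.9.1, later in this section, and reproduces the four 1973 predictions computed there:

```python
# Recursive s-step AR(p) prediction (2.9.15), sketched for the mean-adjusted
# AR(2) of Example 2.9.1: Y_t - 4.77 = 1.54(Y_{t-1} - 4.77) - 0.67(Y_{t-2} - 4.77) + e_t.
alpha, mu = [1.54, -0.67], 4.77
obs = [5.83, 5.77, 5.53, 5.30]          # the four 1972 quarters

dev = [y - mu for y in obs]             # deviations from the mean
for s in range(4):                      # four 1973 quarters
    # predictors of earlier periods are substituted for unobserved Y's
    dev.append(alpha[0] * dev[-1] + alpha[1] * dev[-2])
preds = [round(mu + d, 2) for d in dev[4:]]
print(preds)                            # [5.08, 4.89, 4.75, 4.65]
```

Once s exceeds p, every term on the right side of (2.9.15) is itself a predictor, so the loop body needs no special cases.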
We now turn to prediction for the qth order moving average time series

Y_t = Σ_{j=1}^{q} β_j e_{t−j} + e_t.   (2.9.16)

We assume the β_j, j = 1, 2, ..., q, are known. The pth order autoregressive process expresses Y_t as a linear combination of p previous Y's and e_t, while the qth order moving average process expresses Y_t as a linear combination of q previous e's and e_t. In both cases, e_t is uncorrelated with previous Y's and uncorrelated with previous e's. Suppose we knew the e_t for t = n, n − 1, ..., n − q + 1 for the process (2.9.16). Then, by arguments analogous to those used for the autoregressive process, the best linear predictor for s periods ahead would be

Ŷ_{n+s}(e_1, ..., e_n) = Σ_{j=s}^{q} β_j e_{n+s−j},   s = 1, 2, ..., q,
                       = 0,   s = q + 1, q + 2, ....   (2.9.17)
If the e_t's are independent,

Ŷ_{n+s}(e_1, ..., e_n) = E{Y_{n+s} | e_1, e_2, ..., e_n}.

Because we do not know the e_t, we must develop predictors based upon estimators of the e_t, where the estimators of the e_t are functions of the observed Y_t. We first show that the predictions of Theorem 2.9.1 can be expressed in terms of the one-period prediction errors of previous predictions when Y_t is nondeterministic. The recursive computation is based on the Gram–Schmidt orthogonalization. Brockwell and Davis (1991, p. 171) call the procedure the innovation algorithm, where the Z_t are the innovations. We give the proof for stationary time series, but the result holds for more general covariance functions.
Theorem 2.9.3. Let Y_t be a zero mean, stationary, nondeterministic time series defined on the integers. Let

Z_1 = Y_1,
Z_t = Y_t − Σ_{j=1}^{t−1} c_{tj} Z_j,   t = 2, 3, ...,   (2.9.18)

where

c_{t1} = κ⁻²_1 γ(t − 1),
c_{ti} = κ⁻²_i [γ(t − i) − Σ_{j=1}^{i−1} c_{ij} c_{tj} κ²_j],   i = 2, 3, ..., t − 1,   (2.9.19)

and

κ²_1 = γ(0),
κ²_t = γ(0) − Σ_{j=1}^{t−1} c²_{tj} κ²_j.   (2.9.20)

Then the best predictor of Y_{n+s} given (Y_1, ..., Y_n) is

Ŷ_{n+s}(Y_1, ..., Y_n) = Σ_{j=1}^{n} c_{n+s,j} Z_j,   (2.9.21)

and the variance of the prediction error is

E{[Y_{n+s} − Ŷ_{n+s}(Y_1, ..., Y_n)]²} = κ²_{n+1},   s = 1,
                                       = γ(0) − Σ_{j=1}^{n} c²_{n+s,j} κ²_j,   s = 2, 3, ....   (2.9.22)
Proof. Because Z_t is the residual from the regression of Y_t on (Z_{t−1}, Z_{t−2}, ..., Z_1), Z_t is uncorrelated with (Z_{t−1}, Z_{t−2}, ..., Z_1), and hence the Z's are mutually uncorrelated. The covariance between Y_t and Z_i for t > i is

Cov{Y_t, Z_i} = Cov{Y_t, Y_i − Σ_{r=1}^{i−1} c_{ir} Z_r}
             = γ(t − i) − Σ_{j=1}^{i−1} c_{ij} c_{tj} κ²_j,

where we have used the zero correlation property of the Z_t and the fact that Cov{Y_t, Z_j} = c_{tj} κ²_j. Dividing Cov{Y_t, Z_i} by the variance of Z_i, we obtain the expression for c_{ti}. The variance of the prediction error is the regression error mean square, and by the zero correlation property we have (2.9.20).

Because of the zero correlation property, the coefficients of (Z_n, ..., Z_1) in the regression of Y_{n+s} on (Z_n, ..., Z_1) are the same as the coefficients of (Z_n, ..., Z_1) in the regression of Y_{n+s} on (Z_{n+s−1}, ..., Z_1). The result (2.9.21) follows. The result (2.9.22) is an immediate consequence of (2.9.21) and (2.9.20). ∎
The recursive equations of Theorem 2.9.3 are particularly effective for predictions of finite moving average processes. Because the autocovariances of the qth order moving average are zero for h > q, the c_{ti} of Theorem 2.9.3 are zero for i < t − q.
Corollary 2.9.3. Let Y_t be the qth order moving average

Y_t = Σ_{j=1}^{q} β_j e_{t−j} + e_t,

where the e_t are uncorrelated (0, σ²) random variables. Assume that the Z_j, c_{ji}, and κ²_j of Theorem 2.9.3 have been constructed for j = 1, 2, ..., t − 1, where t − 1 > q. Then the c_{ti} of Theorem 2.9.3 are

c_{ti} = 0,   i < t − q,

c_{t,t−q} = κ⁻²_{t−q} γ(q),

and

c_{ti} = κ⁻²_i [γ(t − i) − Σ_{j=t−q}^{i−1} c_{ij} c_{tj} κ²_j]

for i = t − q + 1, ..., t − 1.
Proof. Omitted. ∎
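A compact implementation of the innovation algorithm (ours; `c[t-2][i-1]` stores c_{t,i}) can be checked on the invertible MA(1) time series Y_t = e_t + 0.7e_{t−1}: the error variance κ²_t should decline toward σ² = 1 and c_{t,t−1} toward β_1 = 0.7:

```python
import numpy as np

def innovations(gamma, n):
    """Recursions (2.9.18)-(2.9.20): kappa2[t-1] holds kappa_t^2 and
    c[t-2][i-1] holds c_{t,i} for t = 2, ..., n."""
    kappa2 = [gamma[0]]                      # kappa_1^2 = gamma(0)
    c = []
    for t in range(2, n + 1):
        row = np.zeros(t - 1)
        for i in range(1, t):                # (2.9.19)
            s = sum(c[i - 2][j - 1] * row[j - 1] * kappa2[j - 1]
                    for j in range(1, i))
            row[i - 1] = (gamma[t - i] - s) / kappa2[i - 1]
        c.append(row)
        # (2.9.20): error variance of the next one-step prediction
        kappa2.append(gamma[0] - sum(row[j] ** 2 * kappa2[j] for j in range(t - 1)))
    return c, kappa2

beta, sigma2 = 0.7, 1.0
gam = np.zeros(31)
gam[0] = (1 + beta ** 2) * sigma2            # gamma(0) = 1.49
gam[1] = beta * sigma2                       # gamma(1) = 0.70
c, kappa2 = innovations(gam, 30)
print(round(kappa2[-1], 4), round(c[-1][-1], 4))   # 1.0 0.7
```

In agreement with Corollary 2.9.3, every row of `c` has at most one nonzero entry here, because γ(h) = 0 for h > 1.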
If the roots of the moving average process are less than one in absolute value, then the Z_t of Theorem 2.9.3 converges to e_t, κ²_t converges to σ², and c_{t,t−j} converges to β_j as t increases. For invertible moving averages, the coefficients are often approximated by the β_j. Let (Z̃_1, Z̃_2, ..., Z̃_q) be initial estimates of (e_1, e_2, ..., e_q), and let

Z̃_t = Y_t − Σ_{j=1}^{q} β_j Z̃_{t−j},   t = q + 1, q + 2, ....   (2.9.23)

Then Z̃_n, for n > q, is a linear combination of Y_n, Y_{n−1}, ..., Y_{q+1} and the initial estimates, where the weights are the c_j defined in Theorem 2.6.2. For an invertible moving average, the |c_j| converge to zero as j increases, and Z̃_n converges to e_n in mean square as n increases. The initial values of (2.9.23) can be computed using Theorem 2.9.3 or by setting Z̃_{−q+1}, Z̃_{−q+2}, ..., Z̃_0 equal to zero and using (2.9.23) for t = 1, 2, ..., q. Therefore, an easily computed s-period predictor for the qth order invertible moving average, appropriate for large n, is

Ŷ_{n+s}(Y_1, ..., Y_n) = Σ_{j=s}^{q} β_j Z̃_{n+s−j},   s = 1, 2, ..., q,
                       = 0,   s = q + 1, q + 2, ...,

where Z̃_t is defined in (2.9.23). The variance of the prediction error for Ŷ_{n+1}(Y_1, ..., Y_n) goes to σ² as n increases.
The procedures for autoregressive and moving average processes are easily extended to the autoregressive moving average time series defined by

Y_t = Σ_{j=1}^{p} α_j Y_{t−j} + Σ_{i=1}^{q} β_i e_{t−i} + e_t,

where the roots of

m^p − α_1 m^{p−1} − ⋯ − α_p = 0

and of

m^q + β_1 m^{q−1} + ⋯ + β_q = 0

are less than one in absolute value and the e_t are uncorrelated (0, σ²) random variables. For such processes the predictor for large n is given by

Ŷ_{n+s}(Y_1, ..., Y_n) = Σ_{j=1}^{p} α_j Y_{n+1−j} + Σ_{i=1}^{q} β_i Z̃_{n+1−i},   s = 1,

                       = Σ_{j=1}^{s−1} α_j Ŷ_{n+s−j}(Y_1, ..., Y_n) + Σ_{j=s}^{p} α_j Y_{n+s−j} + Σ_{i=s}^{q} β_i Z̃_{n+s−i},   s = 2, 3, ...,   (2.9.25)

where

Z̃_t = 0,   t = p, p − 1, ...,
Z̃_t = Y_t − Σ_{j=1}^{p} α_j Y_{t−j} − Σ_{i=1}^{q} β_i Z̃_{t−i},   t = p + 1, p + 2, ..., n,

and it is understood that α_j = 0 for j > p,

Σ_{i=s}^{q} β_i Z̃_{n−i+s} = 0,   s = q + 1, q + 2, ...,

and

Σ_{j=s}^{p} α_j Y_{n−j+s} = 0,   s = p + 1, p + 2, ....
The stationary autoregressive invertible moving average, which contains the autoregressive and invertible moving average processes as special cases, can be expressed as

Y_t = Σ_{j=0}^{∞} ψ_j e_{t−j},   (2.9.26)

where ψ_0 = 1, Σ_{j=0}^{∞} |ψ_j| < ∞, and the e_t are uncorrelated (0, σ²) random variables. The variances of the forecast errors for predictions of more than one period are readily obtained when the time series is written in this form.
Theorem 2.9.4. Given a stationary autoregressive invertible moving average with the representation (2.9.26) and the predictor

Ŷ_{n+s}(Y_1, ..., Y_n) = Σ_{j=s}^{n+s−1} ψ_j Z̃_{n+s−j},

where Z̃_t, t = 1, 2, ..., n, is defined following (2.9.25), then

lim_{n→∞} E{[Y_{n+s} − Ŷ_{n+s}(Y_1, ..., Y_n)]²} = σ² Σ_{j=0}^{s−1} ψ²_j.

Proof. The result follows immediately from Theorem 2.2.1. ∎
Using Theorem 2.9.4, we can construct the covariance matrix of the s predictors for Y_{n+1}, Y_{n+2}, ..., Y_{n+s}. Let Y'_{n,s} = (Y_{n+1}, Y_{n+2}, ..., Y_{n+s}), and let

Ŷ'_{n,s} = (Ŷ_{n+1}(Y_1, ..., Y_n), Ŷ_{n+2}(Y_1, Y_2, ..., Y_n), ..., Ŷ_{n+s}(Y_1, ..., Y_n)).

Then

lim_{n→∞} E{(Y_{n,s} − Ŷ_{n,s})(Y_{n,s} − Ŷ_{n,s})'} = W W' σ²,   (2.9.27)

where W is the s × s lower triangular matrix whose ijth element, for i ≥ j, is ψ_{i−j}.
Note that the predictors we have considered are unbiased; that is,

E{Y_{n+s} − Ŷ_{n+s}(Y_1, ..., Y_n)} = 0.

Therefore, the mean square error of the predictor is the variance. To use this information to establish confidence limits for the predictor, we require knowledge of the distribution of the e_t. If the time series is normal, confidence intervals can be constructed using normal theory. That is, a 1 − α level confidence interval is given by

Ŷ_{n+s}(Y_1, ..., Y_n) ± z_{α/2} [Var{Y_{n+s} − Ŷ_{n+s}(Y_1, ..., Y_n)}]^{1/2},

where z_{α/2} is the value such that α/2 of the probability of the standard normal distribution is beyond z_{α/2}.
Example 2.9.1. In Example 6.3.1, we demonstrate that the U.S. quarterly seasonally adjusted unemployment rate for the period 1948–1972 behaves much like the time series

Y_t − 4.77 = 1.54(Y_{t−1} − 4.77) − 0.67(Y_{t−2} − 4.77) + e_t,

where the e_t are uncorrelated (0, 0.115) random variables. Let us assume that we know that this is the proper representation. The four observations for 1972 are 5.83, 5.77, 5.53, and 5.30. The predictions for 1973 are

Ŷ_{73-1}(Y_{72-1}, ..., Y_{72-4}) = 4.77 + 1.54(0.53) − 0.67(0.76) = 5.08,
Ŷ_{73-2}(Y_{72-1}, ..., Y_{72-4}) = 4.77 + 1.54(0.3070) − 0.67(0.53) = 4.89,
Ŷ_{73-3}(Y_{72-1}, ..., Y_{72-4}) = 4.77 + 1.54(0.1177) − 0.67(0.3070) = 4.75,
Ŷ_{73-4}(Y_{72-1}, ..., Y_{72-4}) = 4.77 + 1.54(−0.0245) − 0.67(0.1177) = 4.65.

The covariance matrix of the prediction errors can be obtained by using the representation Y_t = Σ_{j=0}^{∞} w_j e_{t−j}, where the w's are defined in Theorem 2.6.1. We have a_1 = −1.54, a_2 = 0.67, and

w_0 = 1,   w_1 = 1.54,
w_2 = (1.54)² − 0.67 = 1.702,
w_3 = 1.54(1.702) − 0.67(1.54) = 1.589.
Therefore, the covariance matrix of equation (2.9.27) is given by the product 0.115 W W', where

W = [ 1.00   0      0      0
      1.54   1.00   0      0
      1.70   1.54   1.00   0
      1.59   1.70   1.54   1.00 ],

that is,

[ 0.115  0.177  0.196  0.183
  0.177  0.388  0.478  0.477
  0.196  0.478  0.720  0.789
  0.183  0.477  0.789  1.011 ].
We observe that there is a considerable increase in the variance of the predictor as s increases from 1 to 4. Also, there is a high positive correlation between the predictors. ∎
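The w-weights and the matrix 0.115 W W' above can be verified numerically (a sketch; the third decimal can differ slightly from the displayed matrix because the w's there are rounded):

```python
import numpy as np

# w-weights of the AR(2) of Example 2.9.1: w_j = 1.54 w_{j-1} - 0.67 w_{j-2}.
w = [1.0, 1.54]
for _ in range(2):
    w.append(1.54 * w[-1] - 0.67 * w[-2])

# Lower triangular W with W[i, j] = w_{i-j}; error covariance = 0.115 * W W'.
W = np.array([[w[i - j] if i >= j else 0.0 for j in range(4)] for i in range(4)])
cov = 0.115 * W @ W.T
print(np.round(np.diag(cov), 3))   # close to [0.115, 0.388, 0.720, 1.011]
```

The diagonal reproduces the one- through four-step prediction variances; the off-diagonal entries give the covariances between forecast errors at different horizons.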
Example 2.9.2. In Table 2.9.1 are seven observations from the time series

Y_t = 0.9Y_{t−1} + 0.7e_{t−1} + e_t,

where the e_t are normal independent (0, 1) random variables. The first 11 autocorrelations for this time series are, approximately, 1.00, 0.95, 0.85, 0.77, 0.69, 0.62, 0.56, 0.50, 0.45, 0.41, and 0.37. The variance of the time series is about 14.4737. To construct the linear predictor for Y_8, we obtain the seven weights by solving
Table 2.9.1. Observations and predictions for the time series Y_t = 0.9Y_{t−1} + 0.7e_{t−1} + e_t

 t    Observation    Prediction
 1      −1.58            —
 2      −2.86            —
 3      −3.67            —
 4      −2.33            —
 5       0.42            —
 6       2.87            —
 7       3.43            —
 8        —             3.04
 9        —             2.73
10        —             2.46
the system of equations

Σ_{j=1}^{7} b_j ρ(|i − j|) = ρ(8 − i),   i = 1, 2, ..., 7,   (2.9.28)

where the right side is the vector (0.50, 0.56, 0.62, 0.69, 0.77, 0.85, 0.95)'.
This system of equations gives the weights for the one-period-ahead forecast. The vector of weights b'_{7,1} = (0.06, −0.18, 0.32, −0.50, 0.75, −1.10, 1.59) is applied to (Y_1, Y_2, Y_3, Y_4, Y_5, Y_6, Y_7). [In obtaining the solution more digits were used than are displayed in equation (2.9.28).] The predictor for two periods ahead is obtained by replacing the right side of (2.9.28) by (0.45, 0.50, 0.56, 0.62, 0.69, 0.77, 0.85)', and the weights for the three-period-ahead predictor are obtained by using (0.41, 0.45, 0.50, 0.56, 0.62, 0.69, 0.77)' as the right side. The predictions using these weights are displayed in Table 2.9.1.
The covariance matrix of the prediction errors is

E{(Y_{7,3} − Ŷ_{7,3})(Y_{7,3} − Ŷ_{7,3})'} = [ 1.005  1.604  1.444
                                              1.604  3.563  3.907
                                              1.444  3.907  5.637 ],

where Ŷ_{7,3} and Y_{7,3} are defined following Theorem 2.9.4. The covariance matrix of prediction errors was obtained as AVA', where V is the 10 × 10 covariance matrix for 10 observations and

A = [ −0.06  0.18  −0.32  0.50  −0.75  1.10  −1.59  1  0  0
      −0.05  0.16  −0.29  0.45  −0.68  0.99  −1.44  0  1  0
      −0.05  0.14  −0.26  0.41  −0.61  0.89  −1.29  0  0  1 ].
If we had an infinite past to use in prediction, the variance of the one-period-ahead prediction error would be one, the variance of e_t. Thus, the loss in prediction efficiency from having only seven observations is about 0.5 percent. Since the time series is normal, we can construct 95% confidence intervals for the predictions. We obtain the intervals (1.08, 5.00), (−0.97, 6.43), and (−2.19, 7.11) for the predictions one, two, and three periods ahead, respectively.
To construct a predictor using the Z̃-method, we set Z̃_1 = 0 and calculate

Z̃_2 = Y_2 − 0.9Y_1 − 0.7Z̃_1 = −2.86 − 0.9(−1.58) = −1.438,
Z̃_3 = Y_3 − 0.9Y_2 − 0.7Z̃_2 = −3.67 − 0.9(−2.86) − 0.7(−1.438) = −0.089,
Z̃_4 = −2.33 − 0.9(−3.67) − 0.7(−0.089) = 1.035,
Z̃_5 = 0.42 − 0.9(−2.33) − 0.7(1.035) = 1.793,
Z̃_6 = 2.87 − 0.9(0.42) − 0.7(1.793) = 1.237,
Z̃_7 = 3.43 − 0.9(2.87) − 0.7(1.237) = −0.019.
Therefore, the predictor of Y_8 is

Ŷ_8(Y_1, ..., Y_7) = 0.9Y_7 + 0.7Z̃_7 = 3.074,

and

Ŷ_9(Y_1, ..., Y_7) = 0.9Ŷ_8(Y_1, ..., Y_7) = 2.766,
Ŷ_10(Y_1, ..., Y_7) = 0.9Ŷ_9(Y_1, ..., Y_7) = 2.490.
In this case the prediction using the approximate method of setting Z̃_1 = 0 is quite close to the prediction obtained by the method of Theorem 2.9.1, even though the sample size is small. If the coefficient of e_{t−1} had been closer to one, the difference would have been larger.
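The Z̃ recursion and the resulting predictors can be sketched as follows (our code; it carries full precision, so the last digits can differ slightly from a rounded hand computation):

```python
# The Z-tilde recursion for Y_t = 0.9 Y_{t-1} + 0.7 e_{t-1} + e_t, applied
# to the seven observations of Table 2.9.1 with the starting value Z_1 = 0.
y = [-1.58, -2.86, -3.67, -2.33, 0.42, 2.87, 3.43]
z = [0.0]
for t in range(1, 7):
    z.append(y[t] - 0.9 * y[t - 1] - 0.7 * z[-1])

pred = 0.9 * y[-1] + 0.7 * z[-1]        # one-step predictor of Y_8
preds = [pred]
for _ in range(2):                      # s > 1: only the AR part survives
    preds.append(0.9 * preds[-1])
print([round(v, 3) for v in z])         # [0.0, -1.438, -0.089, 1.036, 1.792, 1.238, -0.019]
print([round(p, 2) for p in preds])     # [3.07, 2.77, 2.49]
```

These values agree with the exact predictions 3.04, 2.73, and 2.46 of Table 2.9.1 to within a few hundredths.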
By Theorem 2.7.1, we may write our autoregressive moving average process as

Y_t = Σ_{j=0}^{∞} ψ_j e_{t−j},

where

ψ_0 = 1,   ψ_1 = 0.9 + 0.7 = 1.6,   ψ_j = 0.9ψ_{j−1} = (0.9)^{j−1}(1.6),   j = 2, 3, ....
Therefore, the large sample covariance matrix of prediction errors for predictors one, two, and three periods ahead is

[ 1.00  1.60  1.44
  1.60  3.56  3.90
  1.44  3.90  5.63 ],

which is very close to the covariance matrix of the best predictor based on n = 7 observations. ∎
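The large-sample matrix follows directly from the ψ-weights and (2.9.27); a sketch:

```python
import numpy as np

# psi-weights for Y_t = 0.9 Y_{t-1} + 0.7 e_{t-1} + e_t: 1, 1.6, 0.9(1.6), ...
psi = [1.0, 1.6, 0.9 * 1.6]

# Lower triangular W of (2.9.27) and the limit covariance matrix W W' (sigma^2 = 1).
W = np.array([[psi[i - j] if i >= j else 0.0 for j in range(3)] for i in range(3)])
print(np.round(W @ W.T, 2))
# rows approximately (1.00, 1.60, 1.44), (1.60, 3.56, 3.90), (1.44, 3.90, 5.63)
```

Comparing with the exact matrix AVA' above shows how little is lost by treating n = 7 observations as an infinite past for this process.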
2.10 THE WOLD DECOMPOSITION
We obtained moving average representations for stationary autoregressive and autoregressive moving average processes in Sections 2.6 and 2.7. In this section, we give a moving average representation, due to Wold (1938), that is appropriate for any stationary process. We first investigate the properties of the sequence of one-period prediction errors of a stationary nondeterministic process. We are particularly interested in the limit as n becomes large.
Theorem 2.10.1. Let {Y_t} be a stationary, nondeterministic time series defined on the integers. Then there exists a unique (a.s.) time series {e_t} defined as the limit in mean square of the one-period prediction error as n becomes large. Furthermore, {e_t} is a sequence of uncorrelated random variables with mean zero and common variance σ², where

σ² = lim_{n→∞} τ²_{n1},   (2.10.1)

and τ²_{n1} is the variance of the one-period prediction error defined in Theorem 2.9.1.
Proof. Without loss of generality, let the mean of Y_t be zero. Define the new time series

W_t = Y_t − Σ_{i=1}^{t−1} b_{t−1,1,i} Y_{t−i},   t = 2, 3, ...,   (2.10.2)

where

b'_{t−1,1} = (b_{t−1,1,1}, b_{t−1,1,2}, ..., b_{t−1,1,t−1}),   (2.10.3)

E{W²_t} = τ²_{t−1,1},   t = 2, 3, ...,   (2.10.4)

and b_{t1} and τ²_{t1} are defined in Theorem 2.9.1. Because Y_t is regular, E{W²_t} is bounded below and the vectors b_{t−1,1} are unique. For a fixed number of lagged values used in prediction, the sequence of prediction errors is a stationary time series.
Let

e_t(n) = Y_t − Σ_{i=1}^{n} b_{n,1,i} Y_{t−i},   (2.10.5)

where the predictor is expressed in the orthogonalized variables as

Σ_{i=1}^{n} b_{n,1,i} Y_{t−i} = Σ_{j=1}^{n} c_j W_{t,j},   (2.10.6)

with

c_j = τ⁻²_{j−1,1} E{Y_t W_{t,j}},   (2.10.7)

and W_{t,j} is the error in the best linear prediction of Y_{t−n+j−1} from the j − 1 preceding observations (Y_{t−n}, ..., Y_{t−n+j−2}), so that Var{W_{t,j}} = τ²_{j−1,1}.
Then

E{e²_t(n)} = γ_Y(0) − Σ_{j=1}^{n} c²_j τ²_{j−1,1} = τ²_{n1}.   (2.10.8)

By (2.10.8), τ²_{n1} is monotone decreasing in n, and by the assumption that Y_t is regular, τ²_{n1} is bounded below. Therefore, the limit

lim_{n→∞} τ²_{n1} = σ²   (2.10.9)

is well defined, and {e_t(n)} converges in mean square to a limit random variable e_t as n → ∞.
By the definitions, E{e²_t(n)} is uniformly bounded because E{e²_t(n)} ≤ γ_Y(0). Therefore, by Exercise 4.14 of Royden (1989, p. 93),

E{e_t e_{t+h}} = lim_{n→∞} E{e_t(n) e_{t+h}(n)}.

Similarly,

E{e_t} = lim_{n→∞} E{e_t(n)} = 0.
The covariance between e_t(n) and e_{t+h}(n), for n > h > 0, involves only the coefficients b_{n,1,i} with i > n − h, because e_{t+h}(n) is uncorrelated with Y_{t+h−1}, Y_{t+h−2}, ..., Y_{t+h−n}. Because the sum of squares of those coefficients converges to zero as n increases, we have

lim_{n→∞} E{e_t(n) e_{t+h}(n)} = E{e_t e_{t+h}} = 0.

Similarly,

E{e²_t} = lim_{n→∞} E{e²_t(n)} = σ². ∎
We now give Wold's theorem for the representation of a stationary time series. The theorem states that it is possible to express every stationary time series as the sum of two time series: an infinite moving average time series and a singular time series.
Theorem 2.10.2 (Wold decomposition). Let Y_t be a covariance stationary time series defined on T = {0, ±1, ±2, ...}. Then Y_t has the representation

Y_t = X_t + Z_t,   (2.10.10)

where

X_t = Σ_{j=0}^{∞} a_j e_{t−j},

{Z_t} is singular and uncorrelated with {X_t}, a_0 = 1, Σ_{j=0}^{∞} a²_j < ∞, the e_t are the uncorrelated (0, σ²) random variables defined in Theorem 2.10.1, and X_t is defined as a limit in mean square.
Proof. If Y_t is singular, then Y_t = Z_t and X_t = 0. Suppose Y_t is nonsingular. Let {e_t} be the time series defined by Theorem 2.10.1. Let

X_t(n) = Σ_{j=0}^{n} a_j e_{t−j},

where

a_j = σ⁻² E{Y_t e_{t−j}}

and a_0 = 1. We have

Σ_{j=0}^{n} a²_j σ² = E{X²_t(n)} ≤ γ_Y(0),

and Σ_{j=0}^{∞} a²_j < ∞. It follows that

X_t(n) → X_t = Σ_{j=0}^{∞} a_j e_{t−j}

in mean square. Let Z_t = Y_t − X_t. Using the arguments of Theorem 2.10.1, it follows that

E{Z_t e_{t−h}} = 0 for all h.

Therefore, the processes {Z_t} and {X_t} are uncorrelated. For any set of real numbers {δ_i : i = 0, 1, ..., p} with δ_0 = 1,

E{(Σ_{i=0}^{p} δ_i Y_{t−i})²} = E{(Σ_{i=0}^{p} δ_i X_{t−i})²} + E{(Σ_{i=0}^{p} δ_i Z_{t−i})²} ≥ V{Σ_{i=0}^{p} δ_i Z_{t−i}},

where we abbreviate Var{X_t} to V{X_t}. By (2.10.9), given ε > 0, there is some p and a set of coefficients, with δ_i equal to the negative of the coefficients of the best one-period predictor, such that

E{(Σ_{i=0}^{p} δ_i Y_{t−i})²} < σ² + ε.

Also, because the coefficient of e_t in Σ_{i=0}^{p} δ_i X_{t−i} is δ_0 a_0 = 1,

E{(Σ_{i=0}^{p} δ_i X_{t−i})²} ≥ σ²,

so that V{Σ_{i=0}^{p} δ_i Z_{t−i}} < ε, and we conclude that {Z_t} is singular. ∎
Note that the mean can be included in the singular part of the representation (2.10.10). Proofs of the Wold decomposition using Hilbert space arguments are available in Anderson (1971, p. 420) and Brockwell and Davis (1991, p. 187).
2.11. LONG MEMORY PROCESSES
In Section 2.7, we demonstrated that the autocorrelation function of an autoregressive moving average process declines at an exponential rate. By the results of Section 2.10, any nondeterministic stationary process can be represented as an infinite moving average of uncorrelated random variables. The coefficients of the sum decline exponentially if the original process is an autoregressive moving average. Because of this rapid rate of decline, finite autoregressive moving averages are sometimes called short memory processes.
Consider a weighted average of uncorrelated (0, σ²) random variables

Y_t = Σ_{j=0}^{∞} ψ_j e_{t−j},   (2.11.1)

in which the coefficients of e_{t−j} are declining at the rate j^{d−1} for some d ∈ (−0.5, 0.5). Such a time series is well defined as a limit in mean square by Theorem 2.2.3. For the special case ψ_j = (j + 1)^{d−1}, we have

E{Y²_t} = Σ_{j=0}^{∞} (j + 1)^{2(d−1)} σ² ≤ [1 − (2d − 1)⁻¹] σ²,   (2.11.2)

γ_Y(h) = E{Y_t Y_{t+h}} = Σ_{i=1}^{∞} i^{d−1}(i + h)^{d−1} σ²,   h > 0,   (2.11.3)

where we have used the integral approximation to the sum in (2.11.2). For large h, γ_Y(h) is approximately equal to c h^{2d−1}, where c is a constant. See Exercise 2.44. We say that a process X_t is a long memory process if ρ_X(h) is approximately proportional to h^{2d−1} for d ∈ (−0.5, 0.5).
A time series that has received considerable attention is

X_t = Σ_{j=0}^{∞} ψ_j e_{t−j},   (2.11.4)

where ψ_0 = 1,

ψ_j = [Γ(j + 1) Γ(d)]⁻¹ Γ(j + d) = Π_{i=1}^{j} i⁻¹(i − 1 + d),   (2.11.5)

and Γ(·) is the gamma function. Using Stirling's approximation that Γ(s + 1) ≈ (2πs)^{1/2} e^{−s} s^s for large s, we have

ψ_j ≈ [Γ(d)]⁻¹ j^{d−1}   (2.11.6)

for large j. It can be shown that the autocovariance, autocorrelation, and partial autocorrelation of the time series (2.11.4) are

γ_X(h) = σ² Γ(1 − 2d) Γ(h + d) [Γ(d) Γ(1 − d) Γ(h + 1 − d)]⁻¹,   (2.11.7)

ρ_X(h) = Γ(1 − d) Γ(h + d) [Γ(d) Γ(h + 1 − d)]⁻¹,   (2.11.8)

and

φ_{hh} = d(h − d)⁻¹.

Using Stirling's approximation,

ρ_X(h) ≈ [Γ(d)]⁻¹ Γ(1 − d) h^{2d−1}

for large h. Thus, the moving average with weights (2.11.5) is a long memory process.
The time series (2.11.4) can be written

X_t = (1 − B)^{−d} e_t,   (2.11.9)

where (1 − B)^r is defined by the expansion

(1 − B)^r = 1 − rB − (2!)⁻¹ r(1 − r)B² − (3!)⁻¹ r(1 − r)(2 − r)B³ − ⋯

for −0.5 < r < 0 and 0 < r < 0.5. For r = 0, the operator is the identity operator. The operator (1 − B)^d for d ∈ (−0.5, 0.5) is called the fractional difference operator. The jth coefficient in the expansion of (1 − B)^{−d} is the ψ_j of (2.11.5).
For the time series (2.11.4), we can also write

(1 − B)^d X_t = e_t   (2.11.10)

or

X_t = −Σ_{j=1}^{∞} π_j(d) X_{t−j} + e_t,   (2.11.11)

where

π_j(d) = [Γ(j + 1) Γ(−d)]⁻¹ Γ(j − d) = Π_{i=1}^{j} i⁻¹(i − 1 − d).   (2.11.12)

If the time series satisfies (2.11.10) with d ∈ (−0.5, 0.5), then it has an infinite moving average representation in which the coefficients decline at the rate j^{d−1} and an infinite autoregressive representation in which the coefficients decline at the
Table 2.11.1. Coefficients for fractionally differenced time series

                 d = 0.4                        d = −0.4
Index       ψ         π        ρ(j)        ψ         π        ρ(j)
  1      0.4000   −0.4000    0.6667    −0.4000    0.4000   −0.2857
  2      0.2800   −0.1200    0.5833    −0.1200    0.2800   −0.0714
  3      0.2240   −0.0640    0.5385    −0.0640    0.2240   −0.0336
  4      0.1904   −0.0416    0.5085    −0.0416    0.1904   −0.0199
  5      0.1676   −0.0300    0.4864    −0.0300    0.1676   −0.0132
  6      0.1508   −0.0230    0.4691    −0.0230    0.1508   −0.0095
  7      0.1379   −0.0184    0.4548    −0.0184    0.1379   −0.0072
  8      0.1275   −0.0152    0.4429    −0.0152    0.1275   −0.0057
  9      0.1190   −0.0128    0.4326    −0.0128    0.1190   −0.0046
 10      0.1119   −0.0110    0.4236    −0.0110    0.1119   −0.0038
 15      0.0881   −0.0062    0.3906    −0.0062    0.0881   −0.0018
 20      0.0743   −0.0041    0.3688    −0.0041    0.0743   −0.0011
 25      0.0650   −0.0030    0.3527    −0.0030    0.0650   −0.0007
 30      0.0583   −0.0023    0.3400    −0.0023    0.0583   −0.0005
 35      0.0532   −0.0019    0.3297    −0.0019    0.0532   −0.0004
 40      0.0491   −0.0015    0.3210    −0.0015    0.0491   −0.0003
rate j^{−d−1}. Thus, if d > 0, the moving average coefficients are square summable and the autoregressive coefficients are absolutely summable. If d < 0, the moving average coefficients are absolutely summable and the autoregressive coefficients are square summable. Examples of the coefficients are given in Table 2.11.1.

To generalize the class of time series with long memory properties, we assume Y_t satisfies

(1 − B)^d Y_t = Z_t,   (2.11.13)

where Z_t is the stationary autoregressive moving average satisfying

Z_t = Σ_{j=1}^{p} α_j Z_{t−j} + Σ_{i=1}^{q} β_i e_{t−i} + e_t,   (2.11.14)

the e_t are uncorrelated (0, σ²) random variables, the roots of the characteristic equations associated with (2.11.14) are less than one in absolute value, and d ∈ (−0.5, 0.5). The time series Y_t is sometimes called a fractionally differenced autoregressive moving average. The representation is sometimes abbreviated to ARFIMA(p, d, q). By Theorem 2.2.2 and Theorem 2.2.3, the time series Y_t can be given an infinite moving average or an infinite autoregressive representation.
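The product forms in (2.11.5) and (2.11.12) make the entries of Table 2.11.1 easy to reproduce (a sketch, shown for d = 0.4):

```python
# Fractional-difference coefficients via the product forms (2.11.5), (2.11.12).
def psi(j, d):          # moving average weight psi_j = prod_{i=1}^{j} (i-1+d)/i
    out = 1.0
    for i in range(1, j + 1):
        out *= (i - 1 + d) / i
    return out

def pi_(j, d):          # autoregressive weight pi_j = prod_{i=1}^{j} (i-1-d)/i
    out = 1.0
    for i in range(1, j + 1):
        out *= (i - 1 - d) / i
    return out

d = 0.4
print([round(psi(j, d), 4) for j in (1, 2, 3, 4)])   # [0.4, 0.28, 0.224, 0.1904]
print([round(pi_(j, d), 4) for j in (1, 2, 3)])      # [-0.4, -0.12, -0.064]
```

Note the symmetry visible in Table 2.11.1: the ψ-weights for d = −0.4 equal the π-weights for d = 0.4, and conversely, since changing the sign of d exchanges the two product forms.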
REFERENCES

Sections 2.1, 2.3, 2.5–2.9. Anderson (1971), Bartlett (1966), Box, Jenkins, and Reinsel (1994), Hannan (1970), Kendall and Stuart (1966), Pierce (1970a), Wold (1938), Yule (1927).
Section 2.2. Bartle (1964), Pantula (1988a), Royden (1989), Torres (1986), Tucker (1967).
Section 2.4. Bellman (1960), Finkbeiner (1960), Goldberg (1958), Hildebrand (1968), Kempthorne and Folks (1971), Miller (1963, 1968).
Section 2.10. Anderson (1971), Brockwell and Davis (1991), Royden (1989), Tucker (1967), Wold (1938).
Section 2.11. Beran (1992), Cox (1984), Deo (1995), Granger (1980), Granger and Joyeux (1980), Hosking (1981, 1982), Mandelbrot (1969).
EXERCISES
1. Express the coefficients b_j of (2.4.3) in terms of the coefficients a_j of (2.4.2) for n = 3.

2. Given that {e_t : t ∈ (0, ±1, ±2, ...)} is a sequence of independent (0, σ²) random variables, define

(a) Derive and graph γ(h) and ρ(h).
(b) Express Y_t as a moving average of e_t.
(c) Is it possible to express e_t as a finite moving average of X_t?
3. For the second order autoregressive process, obtain an expression for ρ(h) as a function of α_1 and α_2 analogous to (2.5.10).
4. Given a second order moving average time series, what are the largest and smallest possible values for ρ(1)? For ρ(2)? For ρ(3)? Give an example of a moving average time series such that γ(0) = 2, γ(1) = 0, γ(2) = −1, and γ(h) = 0 for |h| > 2.
5. Consider the difference equation

y_t − 1.6y_{t−1} + 0.89y_{t−2} = 0.

(a) Give the characteristic equation, and find its roots.
(b) Give the expression for y_t if y_0 = 0.8 and y_1 = 0.9.
(c) If e_t ~ NI(0, σ²), σ² > 0, t = {0, ±1, ±2, ...}, is there a stationary time series satisfying

Y_t − 1.6Y_{t−1} + 0.89Y_{t−2} = e_t?

Explain.
(d) Let the time series Y_t, t = 2, 3, 4, ..., be defined by

Y_t = 1.6Y_{t−1} − 0.89Y_{t−2} + e_t,

where (Y_0, Y_1) = (0, 0) and e_t ~ NI(0, σ²). Give the covariance matrix for (Y_2, Y_3, Y_4).
(e) Define (Y_0, Y_1) so that Y_t of (d) is a stationary time series.
6. Find four (admittedly similar) time series with the covariance function

γ(h) = 1,   h = 0,
     = 0   otherwise.
7. Draw pictures illustrating the differences among the theoretical correlation functions for
(a) A (finite) moving average process.
(b) A (finite) autoregressive process.
(c) A strictly periodic process of the form

X_t = Σ_{i=1}^{M} {e_{1i} cos λ_i t + e_{2i} sin λ_i t},   M < ∞,

where e_{1i} and e_{2i} are independent (0, σ²) random variables independent of e_{1j} and e_{2j} for all i ≠ j, and the λ_i, i = 1, 2, ..., M, are distinct frequencies.
8. Consider the time series {X_t} defined by

X_t + α_1 X_{t−1} + α_2 X_{t−2} + α_3 X_{t−3} = e_t,

where the e_t are uncorrelated (0, σ²) random variables and the roots of

m³ + α_1 m² + α_2 m + α_3 = 0

are less than one in absolute value. Develop a system of equations defining γ(0) as a function of the α's.
9. Let the time series Y_t be defined by

Y_t = β_0 + β_1 t + X_t,   t = 1, 2, ...,

where

X_t = e_t + 0.6e_{t−1},

{e_t : t ∈ (0, 1, 2, ...)} is a sequence of normal independent (0, σ²) random variables, and β_0 and β_1 are fixed numbers. Give the mean and covariance functions for the time series.
10. The second order autoregressive time series

X_t + α_1 X_{t−1} + α_2 X_{t−2} = e_t,

where the e_t are uncorrelated (0, σ²) random variables and the roots of m² + α_1 m + α_2 = 0 are less than one in absolute value, can be expressed as the infinite moving average

X_t = Σ_{j=0}^{∞} w_j e_{t−j}.

For the following three time series find and plot w_j for j = 0, 1, ..., 8 and ρ(h) for h = 0, 1, ..., 8:
(a) X_t + 1.6X_{t−1} + 0.64X_{t−2} = e_t.
(b) X_t − 0.4X_{t−1} − 0.45X_{t−2} = e_t.
(c) X_t − 1.2X_{t−1} + 0.85X_{t−2} = e_t.
11. Given the time series

Y_t − 0.8Y_{t−1} = e_t + 0.7e_{t−1} + 0.6e_{t−2},

where the e_t are uncorrelated (0, σ²) random variables, find and plot ρ(h) for h = 0, 1, 2, ..., 8.
12. Prove Theorem 2.1.1.
13. Find a second order autoregressive time series with complex roots whose absolute value is 0.8 that will give realizations with an apparent cycle of 20 time periods.
14. Given that X_t is a first order autoregressive process with parameter ρ = 0.8, use Theorem 2.9.1 to find the best one- and two-period-ahead predictions based on three observations.
15. Let the time series X_t satisfy

(X_t − 4) − 0.8(X_{t−1} − 4) = e_t,

where the e_t are uncorrelated (0, 7) random variables. Five observations from a realization of the time series are given below:

t    X_t
1    5.9
2    4.9
3    2.2
4    2.0
5    4.9

Predict X_t for t = 6, 7, 8, and estimate the covariance matrix for your predictions; that is, estimate the matrix E{(X_{5,3} − X̂_{5,3})(X_{5,3} − X̂_{5,3})'}.
16. Let Y_t be defined by

Y_t = ρY_{t−1} + e_t,   |ρ| < 1,

where {e_t, t ∈ (0, ±1, ±2, ...)} is obtained from the sequence {u_t} of normal independent (0, 1) random variables as follows:

e_t = u_t,   t = 0, ±2, ±4, ...,
e_t = 2^{−1/2}(u²_{t−1} − 1),   t = ±1, ±3, ....

Is this a strictly stationary process? Is it a covariance stationary process? Given a sample of ten observations (Y_1, Y_2, ..., Y_10), what is the best linear predictor of Y_11? Can you construct a better predictor of Y_11?
EXERCISES
17. Let $Y_t$ be defined by
$$Y_t = e_t + \beta e_{t-1}, \quad |\beta| < 1.$$
Show that the predictor of $Y_{n+1}$ given in (2.9.17) is equivalent to the predictor
$$\hat{Y}_{n+1}(Y_1, \ldots, Y_n) = -\sum_{j=1}^{n} (-\beta)^j Y_{n+1-j}.$$
18. Let $\{e_t\}$ be a sequence of independent $(0, 1)$ random variables. Let the time series $X_t$ be defined by:
(a) $X_t = e_t$.
(b) $X_t = e_t - 1.2e_{t-1} + 0.36e_{t-2}$.
(c) $X_t = e_1 \cos 0.4t + e_2 \sin 0.4t + 6$.
(d) $X_t = e_t + 1.2e_{t-1}$.
Assume the covariance structure is known and we have available an infinite past for prediction, that is, $\{X_t: t \in (\ldots, -1, 0, 1, \ldots, n)\}$ is known. What is the variance of the prediction error for a predictor of $X_{n+1}$ for each case above?
19. Show that if a stationary time series satisfies the difference equation
$$Y_t - Y_{t-1} = e_t,$$
then $E\{e_t^2\} = 0$.
20. Let $\{X_t: t \in (0, \pm 1, \pm 2, \ldots)\}$ be defined by
$$X_t + a_1 X_{t-1} + a_2 X_{t-2} = e_t,$$
where the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables, and the roots $r_1$ and $r_2$ of
$$m^2 + a_1 m + a_2 = 0$$
are less than one in absolute value.
(a) Let $r_1$ and $r_2$ be real. Show that $X_t - r_1 X_{t-1}$ is a first order autoregressive process with parameter $r_2$.
(b) If the roots $r_1$ and $r_2$ form a complex conjugate pair, how would you describe the time series $Y_t = X_t - r_1 X_{t-1}$?
21. Assume that $X_t$ is a stationary, zero mean, normal time series with autocorrelation function $\rho_X(h)$. Let $Y_t = X_t^2$. Show that the autocorrelation function of $Y_t$ is $\rho_Y(h) = \rho_X^2(h)$.
22. Let $\Delta y_t = a_0 + a_1 \cos \omega t$, $t = 1, 2, \ldots$, where $a_0$, $a_1$, and $\omega$ are real numbers, and $y_0 = 0$. Find $y_t$.
23. Let $Y_t = e_t - e_{t-1}$, where the $e_t$ are normal independent $(0, \sigma^2)$ random variables. Given $n$ observations from a realization, predict the $(n+1)$st observation in the realization. What is the mean square error of your predictor? Hint: Consider the time series
$$Z_t = \sum_{j=1}^{t} Y_j = e_t - e_0$$
and predict $Z_{n+1}$. How would your answer change if $e_0$ were considered fixed?
24. Prove the following lemma.
Lemma. Let $a > b \ge 0$ and let $p$ be a nonnegative integer. Then there exists an $M$ such that $(t+1)^p b^t < M a^t$ for all $t \ge 0$.
25. Let
$$Y_t - 0.8Y_{t-1} = e_t - 0.9e_{t-1},$$
where the $e_t$ are uncorrelated $(0, 1)$ random variables. Using Theorem 2.7.1 obtain $E\{Y_t e_{t-1}\}$ and $E\{Y_{t-1} e_{t-1}\}$. Multiply the equation defining $Y_t$ by $Y_t$ and by $Y_{t-1}$ and take expectations, to obtain a system of equations defining $\gamma_Y(0)$ and $\gamma_Y(1)$. Solve for the variance of $Y_t$.
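The two-equation system described above can be solved directly. A sketch under the convention $Y_t = 0.8Y_{t-1} + e_t - 0.9e_{t-1}$ with $\sigma^2 = 1$ (variable names are illustrative):

```python
# Solve the two moment equations for gamma(0) and gamma(1) of
# Y_t = 0.8*Y_{t-1} + e_t - 0.9*e_{t-1}, with sigma^2 = 1.
phi, theta, s2 = 0.8, -0.9, 1.0

# E{Y_t e_t} = sigma^2 and E{Y_t e_{t-1}} = (phi + theta)*sigma^2
Ye0 = s2
Ye1 = (phi + theta) * s2

# gamma(0) = phi*gamma(1) + Ye0 + theta*Ye1
# gamma(1) = phi*gamma(0) + theta*E{Y_{t-1} e_{t-1}} = phi*gamma(0) + theta*s2
# substituting the second equation into the first:
g0 = (Ye0 + theta * Ye1 + phi * theta * s2) / (1 - phi ** 2)
g1 = phi * g0 + theta * s2
print(g0, g1)                # variance gamma(0) = 37/36
```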
26. Using the backward shift operator, we can write the autoregressive moving average process of order $(p, q)$ as
$$\left(\sum_{i=0}^{p} a_i \mathcal{B}^i\right) X_t = \left(\sum_{i=0}^{q} b_i \mathcal{B}^i\right) e_t.$$
Show that the coefficients of the moving average representation given in Theorem 2.7.1 can be obtained from the quotient $(\sum_{i=0}^{p} a_i \mathcal{B}^i)^{-1}(\sum_{i=0}^{q} b_i \mathcal{B}^i)$ by long division. Obtain the autoregressive representation of Theorem 2.7.2 in the same manner.
27. Demonstrate how the $n$th order scalar difference equation
$$y_t + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} = 0$$
can be written as a first order vector difference equation of dimension $n$. If $\mathbf{A}$ denotes the matrix of coefficients of the resulting first order difference equation, show that the roots of $|\mathbf{A} - m\mathbf{I}| = 0$ are the same as the roots of
$$m^n + a_1 m^{n-1} + \cdots + a_n = 0.$$
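The coincidence of the companion-matrix eigenvalues with the roots of the characteristic polynomial can be checked numerically; a sketch assuming NumPy (`companion` is an illustrative name):

```python
# Companion-matrix form of a scalar difference equation; the eigenvalues of
# A coincide with the roots of m^n + a1*m^{n-1} + ... + an = 0.
import numpy as np

def companion(a):
    """a = [a1, ..., an] for y_t = -a1*y_{t-1} - ... - an*y_{t-n}."""
    n = len(a)
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a)     # first row carries the coefficients
    A[1:, :-1] = np.eye(n - 1)   # subdiagonal shifts the state vector
    return A

a = [-1.2, 0.85]                 # y_t - 1.2*y_{t-1} + 0.85*y_{t-2} = 0
eig = np.sort_complex(np.linalg.eigvals(companion(a)))
roots = np.sort_complex(np.roots([1.0] + a))
print(eig, roots)
```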
28. Find $\boldsymbol{\Gamma}(h)$ for the vector autoregressive time series
$$\mathbf{X}_t + \mathbf{A}_1 \mathbf{X}_{t-1} = \mathbf{e}_t,$$
where $\mathbf{e}_t = (e_{1t}, e_{2t})'$ is a sequence of normal independent $(\mathbf{0}, \boldsymbol{\Sigma})$ random variables.
Hint: Express the covariance matrix as a linear function of the covariance functions of the two scalar autoregressive processes associated with the canonical form. Assume that we have observations $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ with $\mathbf{X}_n' = (3, 2)$. Predict $\mathbf{X}_{n+1}$ and $\mathbf{X}_{n+2}$ and obtain the covariance matrix of the prediction errors.
29. The cobweb theory has been suggested as a model for the price behavior of some agricultural commodities. The model is composed of a demand equation and a supply equation, where $Q_t$ is the quantity produced for time $t$, $P_t$ is the price at time $t$, and $\mathbf{U}_t' = (U_{1t}, U_{2t})$ is a stationary vector time series. Assuming $\mathbf{U}_t$ is a sequence of uncorrelated $(\mathbf{0}, \boldsymbol{\Sigma})$ vector random variables, express the model as a vector autoregressive process. Under what conditions will the process be stationary? If $U_{1t} = e_{1t} + b_{11} e_{1,t-1}$ and $U_{2t} = e_{2t} + b_{22} e_{2,t-1}$, where $\mathbf{e}_t = (e_{1t}, e_{2t})'$ is a sequence of uncorrelated $(\mathbf{0}, \sigma^2 \mathbf{I})$ vector random variables, how would you describe the vector time series $(Q_t, P_t)'$?
30. Let $e_t$ be a sequence of independent $(0, \sigma^2)$ random variables, and let $\{u_j\}$ be an absolutely summable sequence of real numbers. Use Theorem 2.2.2 to show that
$$X_t = \sum_{j=0}^{\infty} u_j e_{t-j}$$
is defined as an almost sure limit and $X_t$ is a stationary process with zero mean and covariance function $\gamma(h) = \sigma^2 \sum_{j=0}^{\infty} u_j u_{j+|h|}$.
31. Show that the assumption that $E\{W_t^2\}$ is bounded below guarantees the existence of the inverse of $\mathbf{V}_{nn}$ in Theorem 2.9.1 used to define (2.10.3).
32. Let the stationary $p$th order autoregressive process satisfy
$$\sum_{i=0}^{p} a_i X_{t-i} = e_t,$$
where $a_0 = 1$ and $e_t$ is a sequence of uncorrelated $(0, \sigma^2)$ variables. The equation can also be written
$$\alpha(\mathcal{B}) X_t = e_t,$$
where $\mathcal{B}$ is the back shift operator defined in Section 2.4, and
$$\alpha(\mathcal{B}) = 1 + a_1 \mathcal{B} + a_2 \mathcal{B}^2 + \cdots + a_p \mathcal{B}^p.$$
Some authors define a stationary process $X_t$ to be causal if it is possible to represent $X_t$ as
$$X_t = \sum_{j=0}^{\infty} w_j e_{t-j}, \quad t = 0, \pm 1, \pm 2, \ldots,$$
where $\sum_{j=0}^{\infty} |w_j| < \infty$. Show that the following statements are equivalent:
(i) $X_t$ is causal.
(ii) The roots of the characteristic equation
$$m^p + a_1 m^{p-1} + \cdots + a_p = 0$$
are less than one in absolute value.
(iii) $\alpha(z) \ne 0$ for all complex numbers $z$ such that $|z| \le 1$.
(iv) All roots of $\alpha(z) = 0$ are greater than one in absolute value.
33. Using (2.6.8) and the fact that $a_p \ne 0$, show that the autocorrelation function of the $p$th order autoregressive process is nonzero for some integer greater than $N_0$ for all $N_0$.
34. Let
$$Y_t = e \cos \pi t + Z_t, \quad t = 1, 2, \ldots,$$
where $e \sim N(0, \sigma_{ee})$, independent of $Z_t \sim \text{NI}(0, \sigma_{ZZ})$. Let $\sigma_{ee}$ and $\sigma_{ZZ}$ be known.
(a) Given three observations $(Y_1, Y_2, Y_3)$, what is the minimum mean square error predictor of $Y_4$? Give the variance of the prediction error.
(b) What is the limiting expression for the predictor of $Y_{n+1}$ given $(Y_1, Y_2, \ldots, Y_n)$, where the limit is taken as $n$ increases? What is the limit of the variance of the prediction error?
35. Let $(Y_1, \ldots, Y_n)$ be observations on the first order moving average,
$$Y_t = e_t + \beta_1 e_{t-1},$$
where $e_t \sim \text{NI}(0, \sigma^2)$ and $|\beta_1| < 1$. Construct the best predictor $\hat{e}_3(Y_1, Y_2, Y_3)$ of $e_3$. What is the variance of the prediction error of $\hat{e}_3(Y_1, Y_2, Y_3)$? What is the limit of the variance of the prediction error of $\hat{e}_n(Y_1, \ldots, Y_n)$ as $n \to \infty$?
36. Let the stationary vector autoregressive process $\mathbf{Y}_t$ satisfy
$$\mathbf{Y}_t + \sum_{j=1}^{p} \mathbf{A}_j \mathbf{Y}_{t-j} = \mathbf{e}_t,$$
where the $\mathbf{e}_t$ are independent $(\mathbf{0}, \boldsymbol{\Sigma}_{ee})$ random vectors. Show that the process also satisfies an equation of the form
$$\mathbf{Y}_t + \sum_{j=1}^{p} \mathbf{B}_j \mathbf{Y}_{t+j} = \mathbf{a}_t,$$
where the $\mathbf{a}_t$ are uncorrelated $(\mathbf{0}, \boldsymbol{\Sigma}_{aa})$ random vectors. Give the equations defining the $\mathbf{B}_j$, $j = 1, 2, \ldots, p$, as a function of the $\boldsymbol{\Gamma}(h)$. Show that if $p = 1$, $\mathbf{A}_1 = \mathbf{A}_1'$, and $\boldsymbol{\Sigma}_{ee} = \mathbf{I}$, then $\mathbf{B}_1 = \mathbf{A}_1$.
37. Let $a_t$ be a sequence of uncorrelated $(0, 1)$ random variables. Put the following time series in canonical form as moving averages of uncorrelated random variables $e_t$:
(a) $a_t + 2a_{t-1}$,
(b) $a_{t-1} + 4a_t + a_{t+1}$,
(c) $a_t - 2.5a_{t-1} + a_{t-2}$.
Give the variance of $e_t$ in each case.
38. Let $\mathbf{V}_{nn}$ denote the $n \times n$ covariance matrix of $n$ observations $(Y_1, Y_2, \ldots, Y_n)$ on the first order moving average process
$$Y_t = e_t + \beta e_{t-1},$$
where the $e_t$ are uncorrelated $(0, 1)$ random variables. Let $D_n$ be the determinant of $\mathbf{V}_{nn}$. Show that
$$D_n = (1 + \beta^2) D_{n-1} - \beta^2 D_{n-2}$$
for $n = 2, 3, \ldots$, with $D_0 = 1$ and $D_1 = 1 + \beta^2$. Hence, show that $D_n = (1 - \beta^2)^{-1}(1 - \beta^{2(n+1)})$, $n = 1, 2, \ldots$, for $|\beta| < 1$, and $D_n = n + 1$, $n = 1, 2, \ldots$, for $|\beta| = 1$.
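The recursion and the closed form can be verified numerically; a sketch assuming NumPy, with the illustrative choices $\beta = 0.5$ and $n = 8$:

```python
# Determinant recursion for the covariance matrix of the MA(1) process
# Y_t = e_t + beta*e_{t-1} with unit error variance.
import numpy as np

def ma1_cov(beta, n):
    V = np.zeros((n, n))
    np.fill_diagonal(V, 1 + beta ** 2)
    idx = np.arange(n - 1)
    V[idx, idx + 1] = beta       # gamma(1) = beta on the off-diagonals
    V[idx + 1, idx] = beta
    return V

beta, n = 0.5, 8
D = [1.0, 1 + beta ** 2]
for k in range(2, n + 1):
    D.append((1 + beta ** 2) * D[-1] - beta ** 2 * D[-2])

closed = (1 - beta ** 2) ** -1 * (1 - beta ** (2 * (n + 1)))
print(D[n], closed, np.linalg.det(ma1_cov(beta, n)))
```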
39. Let $Y_t$ be a zero mean stationary time series with $\gamma(0) > 0$, $\lim_{h \to \infty} \gamma(h) = 0$, and $\rho(h) = 0$ for $h > K$. Show that $Y_t$ is nondeterministic.
40. Consider the model $Y_t = \mathbf{x}_t' \boldsymbol{\beta} + Z_t$, $t = 1, \ldots, n$, where $Z_t$ is a process with mean zero and known covariance function $\text{Cov}(Z_t, Z_s) = \gamma_{ts}$.
(a) Show that the best linear unbiased estimator for $\boldsymbol{\beta}$ is
$$\hat{\boldsymbol{\beta}}_G = (\mathbf{X}'\mathbf{V}_{nn}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}_{nn}^{-1}\mathbf{y},$$
where $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)'$, $\mathbf{y} = (Y_1, \ldots, Y_n)'$, and $\mathbf{V}_{nn} = \{\gamma_{ts}\}$ is nonsingular.
(b) Given $Y_1, \ldots, Y_n$, $\mathbf{x}_{n+1}$, and $\mathbf{V}_{n1} = \text{Cov}(\mathbf{y}, Y_{n+1})$, show that the best linear unbiased predictor of $Y_{n+1} = \mathbf{x}_{n+1}'\boldsymbol{\beta} + Z_{n+1}$ is
$$\mathbf{x}_{n+1}'\hat{\boldsymbol{\beta}}_G + \mathbf{b}_{1n}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_G),$$
where $\mathbf{b}_{1n} = \mathbf{V}_{nn}^{-1}\mathbf{V}_{n1}$.
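The estimator and predictor of this exercise can be illustrated numerically. The sketch below assumes NumPy and an AR(1)-type covariance $\gamma_{ts} = \rho^{|t-s|}$ purely for illustration; the simulated data, seed, and parameter values are hypothetical choices, not from the text:

```python
# Generalized least squares (BLUE) and the best linear unbiased predictor.
import numpy as np

rng = np.random.default_rng(0)
n, rho = 30, 0.6
t = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), t])          # x_t = (1, t)'
V = rho ** np.abs(np.subtract.outer(t, t))    # Cov(Z_t, Z_s) = rho^{|t-s|}
beta = np.array([2.0, 0.5])
y = X @ beta + np.linalg.cholesky(V) @ rng.standard_normal(n)

Vinv = np.linalg.inv(V)
beta_g = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # BLUE of beta

# BLUP of Y_{n+1} = x_{n+1}' beta + Z_{n+1}
x_next = np.array([1.0, n + 1])
v_n1 = rho ** np.abs(t - (n + 1))             # Cov(y, Y_{n+1})
pred = x_next @ beta_g + v_n1 @ Vinv @ (y - X @ beta_g)
print(beta_g, pred)
```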
41. Consider a stationary and invertible time series $Y_t$ given by
$$Y_t = \sum_{j=0}^{\infty} w_j e_{t-j} = -\sum_{j=1}^{\infty} c_j Y_{t-j} + e_t,$$
where $\sum_{j=0}^{\infty} |w_j| < \infty$, $\sum_{j=1}^{\infty} |c_j| < \infty$, and the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables. Show that
$$\lim_{n \to \infty} E\{(Y_n - \hat{Y}_n)^2\} = \sigma^2,$$
where $\hat{Y}_n = \hat{Y}(Y_{n-1}, \ldots, Y_1)$ is the best linear predictor of $Y_n$ given $(Y_{n-1}, \ldots, Y_1)$.
42. In Theorem 2.9.1, the fact that the system of equations $\mathbf{V}_{nn}\mathbf{b} = \mathbf{V}_{n1}$ is consistent is used. Show that this system is always consistent and that $\mathbf{y}'\mathbf{b}_n$ in (2.9.2) is unique a.e. for different choices of generalized inverses of $\mathbf{V}_{nn}$.
43. Consider a sequence satisfying $e_t = Z_t(\beta_0 + \beta_1 e_{t-1}^2)^{1/2}$, where $Z_t \sim \text{NI}(0, 1)$, $\beta_0 > 0$, and $0 \le \beta_1 < 1$. That is, the distribution of $e_t$ given the past is $N(0, h_t)$, where $h_t = \beta_0 + \beta_1 e_{t-1}^2$. Thus, the conditional variance depends on the past errors. The model
$$Y_t = \alpha_1 Y_{t-1} + e_t$$
is a special case of autoregressive conditionally heteroscedastic models called an ARCH(1) model. See Engle (1982). Note that
$$e_t^2 = Z_t^2(\beta_0 + \beta_1 e_{t-1}^2) = Z_t^2[\beta_0 + \beta_1 Z_{t-1}^2(\beta_0 + \beta_1 e_{t-2}^2)] = \cdots = \beta_0 \sum_{j=0}^{\infty} \beta_1^j \prod_{i=0}^{j} Z_{t-i}^2 \quad \text{a.s.},$$
where it is assumed that the process started with finite variance in the indefinite past.
(a) Show that $\{e_t\}$ is a sequence of uncorrelated $[0, (1 - \beta_1)^{-1}\beta_0]$ random variables.
(b) Show that if $3\beta_1^2 < 1$, then $E\{e_t^4\}$ exists. Find $E\{e_t^4\}$.
(c) Consider $X_t = e_t^2$. Assuming $3\beta_1^2 < 1$, show that $X_t$ is stationary. Give its autocorrelation function.
(d) Consider the stationary ARCH(1) model, $Y_t = \alpha_1 Y_{t-1} + e_t$, where $|\alpha_1| < 1$. Assume $(\alpha_1, \beta_0, \beta_1)$ are known.
(i) What is the best predictor $\hat{Y}_{n+s}$ for $Y_{n+s}$ given $Y_1, \ldots, Y_n$?
(ii) Find $V\{Y_{n+s} - \hat{Y}_{n+s}\}$, the unconditional forecast error variance.
(iii) Find $V\{(Y_{n+s} - \hat{Y}_{n+s}) \mid (e_1, \ldots, e_n)\}$, the conditional forecast error variance.
(iv) Show that $V\{(Y_{n+2} - \hat{Y}_{n+2}) \mid (e_1, \ldots, e_n)\}$ may be less than $V\{(Y_{n+1} - \hat{Y}_{n+1}) \mid (e_1, \ldots, e_n)\}$ for some $\beta_0$, $\beta_1$, $\alpha_1$, and $e_n^2$. (That is, the conditional forecast error variance for two-step-ahead forecasting may be less than that for one-step-ahead forecasting.)
(v) Show that
$$\lim_{s \to \infty} V\{Y_{n+s} - \hat{Y}_{n+s} \mid (e_1, \ldots, e_n)\} = (1 - \alpha_1^2)^{-1}\sigma^2 \quad \text{a.s.},$$
where $\sigma^2 = (1 - \beta_1)^{-1}\beta_0$.
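A simulation sketch of the ARCH(1) error sequence defined above, giving a sample-based check of part (a); the seed, sample size, and parameter values ($\beta_0 = 1$, $\beta_1 = 0.5$) are arbitrary illustrative choices:

```python
# Simulate e_t = Z_t*(b0 + b1*e_{t-1}^2)^{1/2} and check that the sample
# variance is near (1 - b1)^{-1}*b0 = 2, that e_t is nearly uncorrelated,
# and that e_t^2 shows the positive lag-1 correlation of an ARCH process.
import numpy as np

rng = np.random.default_rng(1)
b0, b1, n = 1.0, 0.5, 200_000
z = rng.standard_normal(n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = z[t] * np.sqrt(b0 + b1 * e[t - 1] ** 2)

var_e = e.var()
corr_e = np.corrcoef(e[1:], e[:-1])[0, 1]             # near zero
corr_e2 = np.corrcoef(e[1:] ** 2, e[:-1] ** 2)[0, 1]  # positive (near b1)
print(var_e, corr_e, corr_e2)
```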
44. Use the facts that, for $-0.5 < d < 0.5$ and $h > 0$,
$$2^{d-1} h^{d-1} i^{d-1} \le i^{d-1}(i + h)^{d-1} \le h^{d-1} i^{d-1} \quad \text{for } 1 \le i \le h,$$
$$(i + h)^{2d-2} \le i^{d-1}(i + h)^{d-1} \le 2^{1-d}(i + h)^{2d-2} \quad \text{for } i > h,$$
to show that $\gamma(h)$ of (2.11.3) is bounded above by $c_1 h^{2d-1}$ and bounded below by $c_2 h^{2d-1}$, where $c_1$ and $c_2$ are constants.
45. Let $\{a_j\}_{j=-\infty}^{\infty}$ and $\{b_j\}_{j=-\infty}^{\infty}$ be sequences of real numbers satisfying
$$\sum_{j=-\infty}^{\infty} |a_j| < \infty \quad \text{and} \quad \sum_{j=-\infty}^{\infty} |b_j| < \infty.$$
Show that the convolution
$$c_s = \sum_{j=-\infty}^{\infty} b_j a_{s-j} = \sum_{j=-\infty}^{\infty} a_j b_{s-j}$$
is well defined.
CHAPTER 3
Introduction to Fourier Analysis
3.1. SYSTEMS OF ORTHOGONAL FUNCTIONS - FOURIER COEFFICIENTS
In many areas of applied mathematics it is convenient to approximate a function by a linear combination of elementary functions. The reader is familiar with the Weierstrass theorem, which states that any continuous function on a compact set may be approximated by a polynomial. Likewise, it may be convenient to construct a set of vectors called a basis such that all other vectors may be expressed as linear combinations of the elements of the basis. Often the basis vectors are constructed to be orthogonal, that is, the sum of the products of the elements of any pair is zero. The system that is of particular interest to us in the analysis of time series is the system of trigonometric polynomials.
Assume that we have a function defined on a finite number of points, N. We shall investigate the properties of the set of functions $\{\cos(2\pi mt/N), \sin(2\pi mt/N): t = 0, 1, \ldots, N-1 \text{ and } m = 0, 1, \ldots, L[N]\}$, where $L[N]$ is the largest integer less than or equal to $N/2$. The reader should observe that the points have been indexed by $t$ running from zero to $N - 1$, while $m$ runs from zero to $L[N]$. Note that for $m = 0$ the cosine is identically equal to 1. The sine function is identically zero for $m = 0$ and for $m = N/2$ if $N$ is even. Therefore, it is to be understood that these sine functions are not included in a discussion of the collection of functions. The collection of interest will always contain exactly $N$ functions, none of which is identically zero. We shall demonstrate that these functions, defined on the integers $0, 1, \ldots, N - 1$, are orthogonal, and we shall derive the sum of squares for each function. This constitutes a proof that the $N$ functions defined on the integers furnish an orthogonal basis for the $N$-dimensional vector space.
Theorem 3.1.1. Given that $m$ and $r$ are contained in the set $\{0, 1, 2, \ldots, L[N]\}$, then
$$\sum_{t=0}^{N-1} \cos \frac{2\pi m}{N} t \cos \frac{2\pi r}{N} t = \begin{cases} N, & m = r = 0 \ \text{or} \ \dfrac{N}{2}, \\ \dfrac{N}{2}, & m = r \ne 0 \ \text{or} \ \dfrac{N}{2}, \\ 0, & m \ne r; \end{cases}$$
$$\sum_{t=0}^{N-1} \sin \frac{2\pi m}{N} t \cos \frac{2\pi r}{N} t = 0, \quad \text{all } m, r;$$
$$\sum_{t=0}^{N-1} \sin \frac{2\pi m}{N} t \sin \frac{2\pi r}{N} t = \begin{cases} \dfrac{N}{2}, & m = r \ne 0 \ \text{or} \ \dfrac{N}{2}, \\ 0, & \text{otherwise}. \end{cases}$$
Proof. Consider first the sum of the products of two cosine functions, and let
$$S(m, r) = \sum_{t=0}^{N-1} \cos \frac{2\pi m}{N} t \cos \frac{2\pi r}{N} t = \frac{1}{2} \sum_{t=0}^{N-1} \left[\cos \frac{2\pi t}{N}(m + r) + \cos \frac{2\pi t}{N}(m - r)\right]. \quad (3.1.1)$$
For $m = r = 0$ or $m = r = N/2$ ($N$ even) the cosine terms on the right-hand side of (3.1.1) are always equal to one, and we have $S(m, r) = N$. For $m = r$, but not equal to zero and not equal to $N/2$ if $N$ is even, the sum (3.1.1) reduces to
$$S(m, r) = \frac{1}{2} \sum_{t=0}^{N-1} \cos \frac{2\pi t}{N}(2m) + \frac{1}{2} N,$$
where the summation of cosines is given by
$$\sum_{t=0}^{N-1} \cos \frac{2\pi t}{N}(2m) = \frac{1}{2} \sum_{t=0}^{N-1} \left[e^{i(2\pi t/N)(2m)} + e^{-i(2\pi t/N)(2m)}\right].$$
The two sums are geometric series whose rates are $\exp\{i(2m)2\pi/N\}$ and $\exp\{-i(2m)2\pi/N\}$, respectively, and the first term is one in both cases. Now $(2m)2\pi/N$ is not an integer multiple of $2\pi$; therefore the rates are not unity. Applying the well-known formula $\sum_{t=0}^{N-1} A^t = (1 - A^N)/(1 - A)$, the partial sums are
$$\frac{1 - \exp\{i(2m)2\pi\}}{1 - \exp\{i(2m)2\pi/N\}} \quad \text{and} \quad \frac{1 - \exp\{-i(2m)2\pi\}}{1 - \exp\{-i(2m)2\pi/N\}}.$$
Since $\exp\{i(2m)2\pi\} = \exp\{-i(2m)2\pi\} = 1$, the numerators of the partial sums are 0, and $S(m, r)$ reduces to $N/2$. For $m \ne r$, we have
$$S(m, r) = \frac{1}{4} \sum_{t=0}^{N-1} \left[e^{i(2\pi t/N)(m+r)} + e^{-i(2\pi t/N)(m+r)} + e^{i(2\pi t/N)(m-r)} + e^{-i(2\pi t/N)(m-r)}\right] = 0.$$
The sum of products of a sine and cosine function is given by
$$\sum_{t=0}^{N-1} \sin \frac{2\pi m}{N} t \cos \frac{2\pi r}{N} t = \frac{1}{2} \sum_{t=0}^{N-1} \left[\sin \frac{2\pi t}{N}(m + r) + \sin \frac{2\pi t}{N}(m - r)\right] = 0,$$
where the proof follows the same pattern as that for the product of cosines with $m = r \ne 0$. We leave the details to the reader. ▲
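The orthogonality relations of Theorem 3.1.1 are easy to confirm numerically for a small $N$; a sketch with $N = 8$ (the helper names `c`, `s`, and `dot` are illustrative):

```python
# Numerical check of Theorem 3.1.1: sums of squares and cross products of
# cos(2*pi*m*t/N) and sin(2*pi*m*t/N) on t = 0, 1, ..., N-1.
import math

N = 8
t = range(N)

def c(m):
    return [math.cos(2 * math.pi * m * u / N) for u in t]

def s(m):
    return [math.sin(2 * math.pi * m * u / N) for u in t]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# sum of squares is N for m = 0 or N/2, N/2 otherwise; cross products vanish
print(dot(c(0), c(0)), dot(c(2), c(2)), dot(c(1), c(3)), dot(s(1), c(1)))
```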
Having demonstrated that the $N$ functions form an orthogonal basis, it follows that any function $f(t)$ defined on $N$ integers can be represented by
$$f(t) = \sum_{m=0}^{L[N]} (a_m \cos \omega_m t + b_m \sin \omega_m t), \quad t = 0, 1, \ldots, N - 1, \quad (3.1.2)$$
where
$$\omega_m = \frac{2\pi m}{N}, \quad m = 0, 1, 2, \ldots, L[N];$$
$$a_m = \begin{cases} \dfrac{2}{N} \displaystyle\sum_{t=0}^{N-1} f(t) \cos \omega_m t, & m = 1, 2, \ldots, L[N-1], \\[2ex] \dfrac{1}{N} \displaystyle\sum_{t=0}^{N-1} f(t) \cos \omega_m t, & m = 0, \ \text{and} \ m = \dfrac{N}{2} \ \text{if $N$ is even}; \end{cases}$$
$$b_m = \frac{2}{N} \sum_{t=0}^{N-1} f(t) \sin \omega_m t, \quad m = 1, 2, \ldots, L[N-1].$$
The $a_m$ and $b_m$ are called Fourier coefficients. One way to obtain the representation (3.1.2) is to find the $a_m$ and $b_m$ such that
$$\sum_{t=0}^{N-1} \left\{f(t) - \sum_{m=0}^{L[N]} (a_m \cos \omega_m t + b_m \sin \omega_m t)\right\}^2 \quad (3.1.3)$$
is a minimum. Differentiating with respect to the $a_j$ and $b_j$ and setting the derivatives equal to zero, we obtain
$$\sum_{t=0}^{N-1} \left\{f(t) - \sum_{m=0}^{L[N]} (a_m \cos \omega_m t + b_m \sin \omega_m t)\right\} \cos \omega_j t = 0, \quad j = 0, 1, 2, \ldots, L[N],$$
$$\sum_{t=0}^{N-1} \left\{f(t) - \sum_{m=0}^{L[N]} (a_m \cos \omega_m t + b_m \sin \omega_m t)\right\} \sin \omega_j t = 0, \quad j = 1, 2, \ldots, L[N-1].$$
By the results of Theorem 3.1.1, these equations reduce to
$$\sum_{t=0}^{N-1} f(t) \cos \omega_m t = a_m \sum_{t=0}^{N-1} \cos^2 \omega_m t, \quad m = 0, 1, \ldots, L[N],$$
$$\sum_{t=0}^{N-1} f(t) \sin \omega_m t = b_m \sum_{t=0}^{N-1} \sin^2 \omega_m t, \quad m = 1, \ldots, L[N-1],$$
and we have the coefficients of (3.1.2). Thus we see that the coefficients are the regression coefficients obtained by regressing the vector $f(t)$ on the vectors $\cos \omega_m t$ and $\sin \omega_m t$. By the orthogonality of the functions, the multiple regression coefficients are the simple regression coefficients.
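The coefficient formulas above can be exercised directly: compute $a_m$ and $b_m$ by the simple-regression sums and verify that (3.1.2) reproduces $f(t)$ exactly. A sketch for even $N$ (the data vector is an arbitrary choice):

```python
# Finite Fourier coefficients of (3.1.2) as simple regression coefficients,
# then exact reconstruction of f(t). N is taken even here, so the divisor
# is N at m = 0 and m = N/2 and N/2 otherwise.
import math

N = 8
f = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.5, 0.0]
L = N // 2                         # L[N] for even N

a = [0.0] * (L + 1)
b = [0.0] * (L + 1)
for m in range(L + 1):
    w = 2 * math.pi * m / N
    sc = sum(f[t] * math.cos(w * t) for t in range(N))
    ss = sum(f[t] * math.sin(w * t) for t in range(N))
    a[m] = sc / N if m in (0, L) else 2 * sc / N
    if 0 < m < L:                  # sin terms vanish at m = 0 and m = N/2
        b[m] = 2 * ss / N

recon = [sum(a[m] * math.cos(2 * math.pi * m * t / N) +
             b[m] * math.sin(2 * math.pi * m * t / N) for m in range(L + 1))
         for t in range(N)]
print(recon)
```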
Because of the different sums of squares for $\cos 0t$ and $\cos \pi t$, the $a$'s are sometimes all defined with a common divisor, $N/2$, and the first and last term of the series modified accordingly. Often $N$ is restricted to be odd to simplify the discussion. Specifically, for $N$ odd, we have
$$f(t) = \frac{a_0}{2} + \sum_{m=1}^{(N-1)/2} (a_m \cos \omega_m t + b_m \sin \omega_m t), \quad t = 0, 1, \ldots, N - 1, \quad (3.1.4)$$
where
$$a_m = \frac{2}{N} \sum_{t=0}^{N-1} f(t) \cos \omega_m t, \quad m = 0, 1, \ldots, \frac{N-1}{2},$$
$$b_m = \frac{2}{N} \sum_{t=0}^{N-1} f(t) \sin \omega_m t, \quad m = 1, 2, \ldots, \frac{N-1}{2}.$$
The reader will have no difficulty in identifying the definitions being used. If the leading term is given as $a_0/2$, the definition (3.1.4) is being used; if not, (3.1.2) is being used.
The preceding material demonstrates that any finite vector can be represented as a linear combination of vectors defined by the sine and cosine functions. We now turn to the investigation of similar representations for functions defined on the real line.
We first consider functions defined on a finite interval of the real line. Since the interval is finite, we may code the end points in any convenient manner. When dealing with trigonometric functions, it will be most convenient to treat intervals whose length is a multiple of $2\pi$. We shall most frequently take the interval to be $[-\pi, \pi]$.
Definition 3.1.1. An infinite system of square integrable functions $\{\varphi_j(x)\}_{j=0}^{\infty}$ is orthogonal on $[a, b]$ if
$$\int_a^b \varphi_j(x) \varphi_k(x)\,dx = 0, \quad j \ne k,$$
and
$$\int_a^b \varphi_j^2(x)\,dx > 0 \quad \text{for all } j.$$
The following theorem states that the system of trigonometric functions is orthogonal on the interval $[-\pi, \pi]$.
Theorem 3.1.2. Given that $m$ and $j$ are nonnegative integers, then
$$\int_{-\pi}^{\pi} \sin mx \sin jx\,dx = \begin{cases} 0, & m \ne j, \\ \pi, & m = j \ne 0, \\ 0, & m = j = 0; \end{cases}$$
$$\int_{-\pi}^{\pi} \cos mx \cos jx\,dx = \begin{cases} 2\pi, & m = j = 0, \\ \pi, & m = j \ne 0, \\ 0, & m \ne j. \end{cases}$$

Proof. Reserved for the reader. ▲
The sum
$$S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos \lambda_k x + b_k \sin \lambda_k x), \quad (3.1.5)$$
$$\lambda_k = \frac{2\pi k}{T}, \quad k = 0, 1, \ldots, n,$$
is called a trigonometric polynomial of order (or degree) $n$ and period $T$. We shall take $T = 2\pi$ (i.e., $\lambda_k = k$) in the sequel unless stated otherwise. If we let $n$ increase without bound, we have the infinite trigonometric series
$$\frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx).$$
By Theorem 3.1.2 we know that the sine and cosine functions are orthogonal on the interval $[-\pi, \pi]$, or on any interval of length $2\pi$. We shall investigate the ability of a trigonometric polynomial to approximate a function defined on this interval.
In the finite theory of least squares we are given a vector $\mathbf{y}$ of dimension $n$, which we desire to approximate by a linear combination of the vectors $\mathbf{x}_k$, $k = 1, 2, \ldots, p$. The coefficients of the predicting equation
$$\hat{\mathbf{y}} = \sum_{k=1}^{p} b_k \mathbf{x}_k$$
are obtained by minimizing
$$(\mathbf{y} - \hat{\mathbf{y}})'(\mathbf{y} - \hat{\mathbf{y}})$$
with respect to the $b_k$. The analogous procedure for a function $f(x)$ defined on the interval $[-\pi, \pi]$ is to minimize the integral
$$\int_{-\pi}^{\pi} \left\{f(x) - \sum_{k=0}^{n} (a_k \cos \lambda_k x + b_k \sin \lambda_k x)\right\}^2 dx. \quad (3.1.7)$$
Of course, $\{f(x) - \sum_{k=0}^{n} (a_k \cos \lambda_k x + b_k \sin \lambda_k x)\}^2$ must be integrable. In this section we shall say that a function $g(x)$ is integrable over $[a, b]$ if it is continuous, or if it has a finite number of discontinuities [at which $g(x)$ can be either bounded or unbounded], provided the improper Riemann integral exists. More general definitions could be used. Our treatment in the sequel follows closely that of Tolstov (1962), and we use his definitions.
The reader may verify by expansion of the square and differentiation of the resultant products that the coefficients that minimize (3.1.7) are
$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos \lambda_k x\,dx, \quad k = 0, 1, 2, \ldots, \quad (3.1.8)$$
$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin \lambda_k x\,dx, \quad k = 1, 2, \ldots, \quad (3.1.9)$$
where the first term is written as $a_0/2$. In this form the approximating sum is
$$S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx). \quad (3.1.10)$$
When $T$, the length of the interval, is not equal to $2\pi$, $\lambda_k$ is set equal to $2\pi k/T$, and we obtain the formulas
$$a_k = \frac{2}{T} \int_{-T/2}^{T/2} f(x) \cos \lambda_k x\,dx, \quad b_k = \frac{2}{T} \int_{-T/2}^{T/2} f(x) \sin \lambda_k x\,dx.$$
Observe that $\pi[(a_0^2/2) + \sum_{k=1}^{n} (a_k^2 + b_k^2)]$ is completely analogous to the sum of squares due to regression on $n$ orthogonal independent variables of finite least squares theory. We repeat below the theorem analogous to the statement in finite least squares that the multiple correlation coefficient is less than or equal to one.
Theorem 3.1.3 (Bessel's inequality). Let $a_k$, $b_k$, and $S_n(x)$ be defined by (3.1.8), (3.1.9), and (3.1.10), respectively. If $f(x)$ defined on $[-\pi, \pi]$ is square integrable, then
$$\frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2) \le \frac{1}{\pi} \int_{-\pi}^{\pi} f^2(x)\,dx. \quad (3.1.11)$$

Proof. Since the square is always nonnegative,
$$0 \le \int_{-\pi}^{\pi} [f(x) - S_n(x)]^2\,dx. \quad (3.1.12)$$
By the definitions of $a_k$ and $b_k$ and the orthogonality of the sine and cosine functions, we have
$$\int_{-\pi}^{\pi} [f(x) - S_n(x)]^2\,dx = \int_{-\pi}^{\pi} f^2(x)\,dx - \pi\left[\frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2)\right],$$
and the result follows. ▲
Definition 3.1.2. A system of functions $\{\varphi_j(x)\}_{j=0}^{\infty}$ is complete if there does not exist a function $f(x)$ such that
$$0 < \int_{-\pi}^{\pi} f^2(x)\,dx < \infty$$
and
$$\int_{-\pi}^{\pi} f(x) \varphi_j(x)\,dx = 0, \quad j = 0, 1, \ldots.$$
We shall give only a few of the theorems of Fourier analysis. The following two theorems constitute a proof of completeness of the trigonometric functions. Lebesgue’s proofs are given in Rogosinski (1959).
Theorem 3.1.4. If $f(x)$ is continuous on the interval $[-\pi, \pi]$, then all the Fourier coefficients are zero if and only if $f(x) \equiv 0$.

Theorem 3.1.5. If $f(x)$ defined on the interval $[-\pi, \pi]$ is integrable and if all the Fourier coefficients $\{a_k, b_k: k = 0, 1, 2, \ldots\}$ are zero, then
$$\int_{-\pi}^{\pi} |f(x)|\,dx = 0.$$

Proof. Omitted. ▲
By Theorem 3.1.4 the Fourier coefficients of a continuous function are unique. Theorem 3.1.5 generalizes the result to absolutely integrable functions. Neither furnishes information on the nature of the convergence of the sequences $\{a_k\}$ and $\{b_k\}$.
Theorem 3.1.6 (Parseval's theorem) answers the question of convergence for square integrable functions.
Theorem 3.1.6. If $f(x)$ defined on $[-\pi, \pi]$ is square integrable, then
$$\lim_{n \to \infty} \int_{-\pi}^{\pi} [f(x) - S_n(x)]^2\,dx = 0,$$
or, equivalently,
$$\frac{a_0^2}{2} + \sum_{k=1}^{\infty} (a_k^2 + b_k^2) = \frac{1}{\pi} \int_{-\pi}^{\pi} f^2(x)\,dx.$$
Proof. By Bessel's inequality, the partial sum of squares of the Fourier coefficients converges for any square integrable function. (The sum of squares is monotone increasing and bounded.) Therefore, the function
$$g(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx),$$
where $a_k$ and $b_k$ are the Fourier coefficients of a square integrable function $f(x)$, is square integrable on $[-\pi, \pi]$. Now $D(x) = f(x) - g(x)$ is square integrable and all Fourier coefficients of $D(x)$ are zero; hence, by Theorem 3.1.5,
$$\int_{-\pi}^{\pi} D^2(x)\,dx = 0. \quad ▲$$
Thus, any square integrable function defined on an interval can be approximated in mean square by a trigonometric polynomial.
As one might suspect, the convergence of the sequence of functions $S_n(x)$ at a point $x_0$ requires additional conditions. We shall closely follow Tolstov's (1962) text in presenting a proof of the pointwise convergence of $S_n(x_0)$ to $f(x_0)$. The following definitions are needed.
Definition 3.1.3. The left limit of $f(x)$ at $x_0$ is given by
$$f(x_0^-) = \lim_{x \to x_0,\, x < x_0} f(x),$$
provided the limit exists and is finite. The limit of $f(x)$ as $x \to x_0$, $x > x_0$, is called the right limit of $f(x)$ and is denoted by $f(x_0^+)$, provided it exists and is finite.
Definition 3.1.4. The right derivative of $f(x)$ at $x = x_0$ is defined by
$$f'(x_0^+) = \lim_{h \to 0^+} \frac{f(x_0 + h) - f(x_0^+)}{h},$$
and the left derivative by
$$f'(x_0^-) = \lim_{h \to 0^+} \frac{f(x_0^-) - f(x_0 - h)}{h},$$
provided the limits exist and are finite.
Definition 3.1.5. The function $f(x)$ has a jump discontinuity at the point $x_0$ if $f(x_0^+) \ne f(x_0^-)$. From the definitions it is clear that the jump $f(x_0^+) - f(x_0^-)$ is finite.
Definition 3.1.6. A function $f(x)$ is smooth on the interval $[c, d]$ if it has a continuous derivative on the interval $[c, d]$.
Definition 3.1.7. A function $f(x)$ is piecewise smooth on the interval $[c, d]$ if
(i) $f(x)$ and $f'(x)$ are both continuous, or
(ii) $f(x)$ and $f'(x)$ have a finite number of jump discontinuities.
Bessel's inequality guarantees that the Fourier coefficients of any square integrable function go to zero as the frequency increases; that is,
$$\lim_{m \to \infty} a_m = \lim_{m \to \infty} b_m = 0.$$
The following lemma is a proof of the same result for a different class of functions.
Lemma 3.1.1. If $f(x)$ is a piecewise smooth function on $[c, d]$, then
$$\lim_{m \to \infty} \int_c^d f(x) \cos mx\,dx = \lim_{m \to \infty} \int_c^d f(x) \sin mx\,dx = 0.$$

Proof. Integrating by parts gives
$$\int_c^d f(x) \sin mx\,dx = \frac{1}{m}\left\{\left[-f(x) \cos mx\right]_c^d + \int_c^d f'(x) \cos mx\,dx\right\}.$$
Since both $f(x)$ and $f'(x)$ contain at most a finite number of jump discontinuities on the interval $[c, d]$, they are both bounded. Therefore, the quantity contained within the curly braces is bounded, and the result follows immediately. ▲
Using the fact that any absolutely integrable function can be approximated in the mean by a piecewise smooth function, it is possible to extend this result to any absolutely integrable function. We state this generalization without proof.
Lemma 3.1.1A. The Fourier coefficients $a_m$ and $b_m$ of an absolutely integrable function defined on a finite interval approach zero as $m \to \infty$.
Lemmas 3.1.2 and 3.1.3 are presented as preliminaries to the proof of Theorem 3.1.7.
Lemma 3.1.2. Define
$$A_n(u) = \frac{1}{2} + \sum_{j=1}^{n} \cos ju.$$
Then
$$A_n(u) = \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)} \quad (3.1.13)$$
and
$$\frac{1}{\pi} \int_0^{\pi} \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du = \frac{1}{2}. \quad (3.1.14)$$

Proof. Multiplying both sides of the definition of $A_n(u)$ by $2\sin(u/2)$, we have
$$2A_n(u)\sin(u/2) = \sin(u/2) + 2\sum_{j=1}^{n} \cos ju \sin(u/2) = \sin(u/2) + \sum_{j=1}^{n} \left[\sin\left(j + \tfrac{1}{2}\right)u - \sin\left(j - \tfrac{1}{2}\right)u\right] = \sin\left(n + \tfrac{1}{2}\right)u,$$
and the result (3.1.13) follows. Since the integral of $\cos ju$, $j = 1, 2, \ldots$, is zero on an interval of length $2\pi$, $\int_{-\pi}^{\pi} A_n(u)\,du = \pi$, and hence
$$\frac{1}{\pi} \int_{-\pi}^{\pi} \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du = 1. \quad (3.1.15)$$
The result (3.1.14) follows, since $A_n(u)$ is an even function. ▲
Lemma 3.1.3. The partial sum
$$S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)$$
may be expressed in the form
$$S_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x + u) \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du, \quad (3.1.16)$$
where $a_k$ and $b_k$ are the Fourier coefficients of a periodic function $f(x)$ of period $2\pi$ defined in (3.1.8) and (3.1.9).

Proof. Substituting (3.1.8) and (3.1.9) into (3.1.10), we have
$$S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} \left[\left\{\frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos kt\,dt\right\} \cos kx + \left\{\frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin kt\,dt\right\} \sin kx\right] = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\left[\frac{1}{2} + \sum_{k=1}^{n} \cos k(t - x)\right] dt.$$
Using (3.1.13) of Lemma 3.1.2, we have
$$S_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \frac{\sin(n + \frac{1}{2})(t - x)}{2\sin[(t - x)/2]}\,dt.$$
Setting $u = t - x$ and noting that the integral of a periodic function of period $2\pi$ over the interval $[-\pi - x, \pi - x]$ is the same as the integral over $[-\pi, \pi]$, we have
$$S_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x + u) \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du. \quad ▲$$
Equation (3.1.16) is called the integral formula for the partial sum of a Fourier series.
Theorem 3.1.7. Let $f(x)$ be an absolutely integrable function of period $2\pi$. Then:
(i) At a point of continuity where $f(x)$ has a right derivative and a left derivative,
$$\lim_{n \to \infty} S_n(x) = f(x).$$
(ii) At every point of discontinuity where $f(x)$ has a right and a left derivative,
$$\lim_{n \to \infty} S_n(x) = \frac{f(x^+) + f(x^-)}{2}.$$
Proof. Consider first a point of continuity. We wish to show that the difference
$$S_n(x) - f(x) \quad (3.1.17)$$
tends to zero. Multiplying (3.1.15) of Lemma 3.1.2 by $f(x)$, we have
$$f(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du. \quad (3.1.18)$$
Therefore, the difference (3.1.17) can be written as
$$S_n(x) - f(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{f(x + u) - f(x)}{u} \cdot \frac{u}{2\sin(u/2)} \sin\left(n + \tfrac{1}{2}\right)u\,du.$$
Now, $u^{-1}[f(x + u) - f(x)]$ is absolutely integrable, since the existence of the left and right derivatives means that the ratio is bounded as $u$ approaches zero. Also, $[2\sin(u/2)]^{-1}u$ is bounded. Therefore,
$$\frac{f(x + u) - f(x)}{u} \cdot \frac{u}{2\sin(u/2)}$$
is absolutely integrable. Hence, by Lemma 3.1.1A,
$$\lim_{n \to \infty} [S_n(x) - f(x)] = 0,$$
giving us the desired result for a point of continuity. For a point of jump discontinuity we must prove
$$\lim_{n \to \infty} S_n(x) = \frac{f(x^+) + f(x^-)}{2}.$$
Using (3.1.14) of Lemma 3.1.2, we have
$$\frac{f(x^+)}{2} = \frac{1}{\pi} \int_0^{\pi} f(x^+) \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du$$
and
$$\frac{f(x^-)}{2} = \frac{1}{\pi} \int_{-\pi}^{0} f(x^-) \frac{\sin(n + \frac{1}{2})u}{2\sin(u/2)}\,du.$$
The same arguments on boundedness and integrability that were used for a point of continuity can be used to show that
$$\lim_{n \to \infty} \frac{1}{\pi} \int_0^{\pi} \frac{f(x + u) - f(x^+)}{u} \cdot \frac{u}{2\sin(u/2)} \sin\left(n + \tfrac{1}{2}\right)u\,du = 0$$
and
$$\lim_{n \to \infty} \frac{1}{\pi} \int_{-\pi}^{0} \frac{f(x + u) - f(x^-)}{u} \cdot \frac{u}{2\sin(u/2)} \sin\left(n + \tfrac{1}{2}\right)u\,du = 0. \quad ▲$$
Theorem 3.1.8. Let $f(x)$ be a continuous periodic function of period $2\pi$ with derivative $f'(x)$ that is square integrable. Then the Fourier series of $f(x)$ converges to $f(x)$ absolutely and uniformly.

Proof. Under the assumptions we may integrate by parts to obtain
$$a_h = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos hx\,dx = -\frac{1}{\pi h} \int_{-\pi}^{\pi} f'(x) \sin hx\,dx.$$
Therefore, the Fourier coefficients of the function are directly related to the Fourier coefficients of the derivative by $a_h = -b_h'/h$, $b_h = a_h'/h$, where
$$a_h' = \frac{1}{\pi} \int_{-\pi}^{\pi} f'(x) \cos hx\,dx, \quad b_h' = \frac{1}{\pi} \int_{-\pi}^{\pi} f'(x) \sin hx\,dx.$$
The Fourier coefficients of the derivative are well defined by the assumption that $f'(x)$ is square integrable. Furthermore, by Theorem 3.1.6 (Parseval's theorem), the series
$$\sum_{h=1}^{\infty} \left[(a_h')^2 + (b_h')^2\right]$$
converges. Since
$$|a_h| + |b_h| = \frac{|b_h'| + |a_h'|}{h} \le \frac{1}{h^2} + \frac{(a_h')^2 + (b_h')^2}{2},$$
we have that $\sum_{h=1}^{\infty} (|a_h| + |b_h|)$ converges. Now,
$$|a_h \cos hx + b_h \sin hx| \le |a_h| + |b_h|,$$
and therefore, by the Weierstrass M-test, the trigonometric series
$$\frac{a_0}{2} + \sum_{h=1}^{\infty} (a_h \cos hx + b_h \sin hx)$$
converges to $f(x)$ absolutely and uniformly. ▲
As a simple consequence of the proof of Theorem 3.1.8, we have the following important result.
Corollary 3.1.8. If
$$\sum_{h=1}^{\infty} (|a_h| + |b_h|)$$
converges, then the associated trigonometric series
$$\frac{a_0}{2} + \sum_{h=1}^{\infty} (a_h \cos hx + b_h \sin hx)$$
converges absolutely and uniformly to a continuous periodic function of period $2\pi$ of which it is the Fourier series.
We are now in a position to prove a portion of Theorem 1.4.4 of Chapter 1. In the proof we require a result that will be used at later points, and therefore we state it as a lemma.
Lemma 3.1.4 (Kronecker's lemma). If the sequence $\{a_j\}$ is such that
$$\sum_{j=1}^{\infty} |a_j| < \infty,$$
then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} j|a_j| = 0.$$

Proof. By assumption, given $\epsilon > 0$, there exists an $N$ such that
$$\sum_{j=N+1}^{\infty} |a_j| < \frac{\epsilon}{2}.$$
Therefore, for $n > N$, we have
$$\frac{1}{n} \sum_{j=1}^{n} j|a_j| \le \frac{1}{n} \sum_{j=1}^{N} j|a_j| + \sum_{j=N+1}^{n} |a_j| \le \frac{1}{n} \sum_{j=1}^{N} j|a_j| + \frac{\epsilon}{2}.$$
Clearly, for fixed $N$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{N} j|a_j| = 0,$$
and since $\epsilon$ was arbitrary, the result follows. ▲
Theorem 3.1.9. Let the correlation function $\rho(h)$ of a stationary time series be absolutely summable. Then there exists a continuous function $f(\omega)$ such that:
(i) $\rho(h) = \int_{-\pi}^{\pi} f(\omega) \cos \omega h\,d\omega$.
(ii) $f(\omega) \ge 0$.
(iii) $\int_{-\pi}^{\pi} f(\omega)\,d\omega = 1$.
(iv) $f(\omega)$ is an even function.

Proof. By Corollary 3.1.8,
$$g(\omega) = \frac{1}{2} + \sum_{h=1}^{\infty} \rho(h) \cos h\omega$$
is a well-defined continuous function. Now, by the positive semidefinite property of the correlation function,
$$\sum_{m=1}^{n} \sum_{q=1}^{n} \rho(m - q) \cos m\omega \cos q\omega \ge 0$$
and
$$\sum_{m=1}^{n} \sum_{q=1}^{n} \rho(m - q) \sin m\omega \sin q\omega \ge 0.$$
Hence,
$$0 \le \sum_{m=1}^{n} \sum_{q=1}^{n} \rho(m - q)[\cos m\omega \cos q\omega + \sin m\omega \sin q\omega] = \sum_{m=1}^{n} \sum_{q=1}^{n} \rho(m - q) \cos(m - q)\omega.$$
Letting $m - q = h$, we have
$$0 \le \frac{1}{n} \sum_{h=-(n-1)}^{n-1} (n - |h|) \rho(h) \cos h\omega = \sum_{h=-(n-1)}^{n-1} \left(1 - \frac{|h|}{n}\right) \rho(h) \cos h\omega.$$
Now, $\rho(h) \cos h\omega$ is absolutely summable, and hence, by Lemma 3.1.4,
$$\lim_{n \to \infty} \sum_{h=-(n-1)}^{n-1} \frac{|h|}{n} \rho(h) \cos h\omega = 0.$$
Therefore,
$$0 \le \lim_{n \to \infty} \sum_{h=-(n-1)}^{n-1} \left(1 - \frac{|h|}{n}\right) \rho(h) \cos h\omega = \sum_{h=-\infty}^{\infty} \rho(h) \cos h\omega = 2g(\omega).$$
Having shown that $g(\omega)$ satisfies conditions (i) and (ii), we need only multiply $g(\omega)$ by a constant to meet condition (iii). Since
$$\frac{1}{\pi} \int_{-\pi}^{\pi} g(\omega)\,d\omega = 1,$$
the appropriate constant is $\pi^{-1}$, and we define $f(\omega)$ by
$$f(\omega) = \frac{1}{\pi}\left[\frac{1}{2} + \sum_{h=1}^{\infty} \rho(h) \cos h\omega\right].$$
The function $f(\omega)$ is an even function, since it is the uniform limit of a sum of even functions (cosines). ▲
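Theorem 3.1.9 can be illustrated numerically with the AR(1) correlation function $\rho(h) = 0.5^{|h|}$: the truncated series for $f(\omega)$ is nonnegative, even, and integrates to approximately one over $[-\pi, \pi]$. A sketch (the truncation point and grid size are arbitrary choices):

```python
# f(w) = (1/pi)[1/2 + sum_h rho(h) cos(h*w)] for rho(h) = 0.5^{|h|},
# checked for nonnegativity, evenness, and unit integral.
import math

rho = lambda h: 0.5 ** abs(h)

def f(w, H=60):
    return (0.5 + sum(rho(h) * math.cos(h * w) for h in range(1, H))) / math.pi

# trapezoid rule over [-pi, pi]
n = 2000
grid = [-math.pi + 2 * math.pi * i / n for i in range(n + 1)]
vals = [f(w) for w in grid]
integral = sum((vals[i] + vals[i + 1]) / 2 for i in range(n)) * (2 * math.pi / n)
print(integral, min(vals))
```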
In Theorem 3.1.8 the square integrability of the derivative was used to demonstrate the convergence of the Fourier series. In fact, the Fourier series of a continuous function not meeting such restrictions need not converge. However, Cesàro's method may be used to recover any continuous periodic function from its Fourier series. Given the sequence $\{S_n\}_{n=1}^{\infty}$, the sequence $\{C_n\}$ defined by
$$C_n = \frac{1}{n} \sum_{j=0}^{n-1} S_j$$
is called the sequence of arithmetic means of $\{S_n\}$. If the sequence $\{C_n\}$ is convergent, we say the sequence $\{S_n\}$ is Cesàro summable. If the original sequence was convergent, then $\{C_n\}$ converges.
Lemma 3.1.5. If the sequence $\{S_n\}$ converges to $s$, then the sequence $\{C_n\}$ converges to $s$.

Proof. By hypothesis, given $\epsilon > 0$, we may choose an $N$ such that $|S_j - s| < \frac{1}{2}\epsilon$ for all $j > N$. For $n > N$, we have
$$|C_n - s| \le \frac{1}{n} \sum_{j \le N} |S_j - s| + \frac{1}{n} \sum_{N < j < n} |S_j - s| \le \frac{1}{n} \sum_{j \le N} |S_j - s| + \frac{\epsilon}{2}.$$
Since we can choose an $n$ large enough so that the first term is less than $\frac{1}{2}\epsilon$, the result follows. ▲
Theorem 3.1.10. Let $f(\omega)$ be a continuous function of period $2\pi$. Then the Fourier series of $f(\omega)$ is uniformly summable to $f(\omega)$ by the method of Cesàro.

Proof. The Cesàro sum is
$$C_n(\omega) = \frac{1}{n} \sum_{j=0}^{n-1} S_j(\omega) = \frac{1}{n\pi} \int_{-\pi}^{\pi} f(\omega + u) \sum_{j=0}^{n-1} \frac{\sin(j + \frac{1}{2})u}{2\sin(u/2)}\,du, \quad (3.1.19)$$
where we have used Lemma 3.1.3. Now,
$$\sum_{j=0}^{n-1} \sin\left(j + \tfrac{1}{2}\right)u = \frac{\sin^2(nu/2)}{\sin(u/2)}.$$
Therefore,
$$|C_n(\omega) - f(\omega)| \le \frac{1}{2n\pi} \int_{-\pi}^{\pi} |f(\omega + u) - f(\omega)| \frac{\sin^2(nu/2)}{\sin^2(u/2)}\,du, \quad (3.1.20)$$
where we have used
$$\frac{1}{2n\pi} \int_{-\pi}^{\pi} \frac{\sin^2(nu/2)}{\sin^2(u/2)}\,du = 1.$$
Since $f(\omega)$ is uniformly continuous on $[-\pi, \pi]$, given any $\epsilon > 0$, there is a $\delta > 0$ such that
$$|f(\omega + u) - f(\omega)| < \frac{\epsilon}{4} \quad \text{for all } |u| < \delta \text{ and all } \omega \in [-\pi, \pi].$$
We write the second integral of (3.1.20), that over $[0, \pi]$, as
$$\frac{1}{2n\pi} \int_0^{\delta} |f(\omega + u) - f(\omega)| \frac{\sin^2(nu/2)}{\sin^2(u/2)}\,du + \frac{1}{2n\pi} \int_{\delta}^{\pi} |f(\omega + u) - f(\omega)| \frac{\sin^2(nu/2)}{\sin^2(u/2)}\,du \le \frac{\epsilon}{4} + \frac{M}{n\sin^2(\delta/2)},$$
where $M$ is the maximum of $|f(\omega)|$ on $[-\pi, \pi]$. A similar argument holds for the first integral of (3.1.20). Therefore, there exists an $N$ such that (3.1.20) is less than $\epsilon$ for all $n > N$ and all $\omega \in [-\pi, \pi]$. ▲
3.2. COMPLEX REPRESENTATION OF TRIGONOMETRIC SERIES
We can represent the trigonometric series in a complex form that is somewhat more compact and that will prove useful in certain applications. Since
$$\cos \theta = \frac{e^{i\theta} + e^{-i\theta}}{2} = \frac{\cos \theta + i\sin \theta + \cos \theta - i\sin \theta}{2}$$
and
$$\sin \theta = \frac{e^{i\theta} - e^{-i\theta}}{2i},$$
we have
$$a_k \cos kx + b_k \sin kx = a_k \frac{e^{ikx} + e^{-ikx}}{2} - ib_k \frac{e^{ikx} - e^{-ikx}}{2}.$$
Thus we could write the approximating sum for a function $f(x)$, defined on $[-\pi, \pi]$, as
$$S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx) = \sum_{k=-n}^{n} c_k e^{ikx},$$
where
$$c_k = \frac{a_k - ib_k}{2}, \quad c_{-k} = c_k^* = \frac{a_k + ib_k}{2}, \quad k = 0, 1, 2, \ldots,$$
with $b_0 = 0$. The coefficients $c_k$ are given by the integrals
$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx.$$
Consider now a function $X_t$ defined on $N$ integers. Let us identify the integers by $\{-m, -(m-1), \ldots, -1, 0, 1, \ldots, m\}$ if $N$ is odd and by $\{-(m-1), -(m-2), \ldots, -1, 0, 1, \ldots, m\}$ if $N$ is even, and write
$$X_t = \sum_{k} c_k e^{i(2\pi kt/N)}. \quad (3.2.1)$$
Taking $N$ to be even, the $c_k$ are given by
$$c_k = \frac{1}{N} \sum_{t=-(m-1)}^{m} X_t e^{-i(2\pi kt/N)}, \quad k = -(m-1), \ldots, 0, \ldots, m. \quad (3.2.2)$$
Note that $c_0$ and $c_m$ do not require separate definitions. The complex form thus makes manipulation somewhat easier, since the correct divisor need not be specified by frequency.
If N is odd and X_t is an even function (i.e., X_t = X_{−t}), then the coefficients c_k are real. Conversely, if X_t is an odd function, the coefficients c_k of equation (3.2.2) are pure imaginary.
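These relations are easy to verify numerically. The sketch below (with an arbitrary illustrative sequence; the variable names are our own) computes the complex coefficients (3.2.2) for an even number of points and confirms that the inverse transform recovers the sequence exactly.

```python
import numpy as np

# Complex Fourier coefficients for a sequence X_t on the N = 2m integers
# t = -(m-1), ..., 0, ..., m, as in equation (3.2.2). Data are arbitrary.
m = 4
t = np.arange(-(m - 1), m + 1)           # t = -(m-1), ..., m
X = np.cos(np.pi * t / m) + 0.5 * t      # an arbitrary real sequence

k = np.arange(-(m - 1), m + 1)
# c_k = (2m)^{-1} sum_t X_t e^{-i pi k t / m}
c = np.array([(X * np.exp(-1j * np.pi * kk * t / m)).sum() / (2 * m) for kk in k])

# The inverse transform X_t = sum_k c_k e^{i pi k t / m} recovers X_t
X_rec = np.array([(c * np.exp(1j * np.pi * k * tt / m)).sum() for tt in t])
print(np.allclose(X_rec, X))             # True
```

The exactness of the recovery reflects the orthogonality of the complex exponentials over 2m consecutive integers.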
3.3. FOURIER TRANSFORM-FUNCTIONS DEFINED ON THE REAL LINE
The results of Theorems 3.1.8 and 3.1.9 represent a special case of a more general result known as the Fourier integral theorem. By Theorem 3.1.8 the sequence of Fourier coefficients for a continuous periodic function with square integrable derivative can be used to construct a sequence of functions that converges to the original function. This result can be stated in a very compact form by substituting the definition of a_k and b_k into the statement of Theorem 3.1.7 to obtain

f(x) = Σ_{k=−∞}^{∞} [(2π)^{−1} ∫_{−π}^{π} f(t) e^{−ikt} dt] e^{ikx} . (3.3.1)
Since the reciprocal relationships are well defined, we could also write, using the notation of Theorem 3.1.9,

ρ(h) = ∫_{−π}^{π} f(ω) e^{iωh} dω , where f(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} ρ(h) e^{−iωh} .

We say that ρ(h) and f(ω) form a transform pair. We shall associate the constant and the negative exponential with one transform and call this the Fourier transform or spectral density. The terms spectrum or spectral function are also
used. Thus, in Theorem 3.1.9, the function f(ω) defined by

f(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} ρ(h) e^{−iωh}

is the Fourier transform of ρ(h), or the spectral density associated with ρ(h).
The transform in (3.3.1) with positive exponent and no constant we call the inverse transform or the characteristic function. Thus the correlation function

ρ(h) = ∫_{−π}^{π} f(ω) e^{iωh} dω

is the inverse transform, or characteristic function, of f(ω). We have mentioned several times that this is the statistical characteristic function if f(ω) is a probability density.
These definitions are merely to aid us in remembering the transform to which we have attached the constant 1/2π. We trust that the reader will not be disturbed to find that our definitional placement of the constant may differ from that of authors in other fields.
Theorem 3.1.8 was for a periodic function. However, there exists a considerable body of theory applicable to integrable functions defined on the real line. For an integrable function f(x) defined on the real line we formally define the Fourier transform by

c(u) = (2π)^{−1} ∫_{−∞}^{∞} f(x) e^{−iux} dx , (3.3.2)

where u ∈ (−∞, ∞). The Fourier integral theorem states that if the function f(x) meets certain regularity conditions, the inverse transform of the Fourier transform is again the function. We state one version without proof. [See Tolstov (1962, p. 188) for a proof.]
Theorem 3.3.1. Let f(x) be an absolutely integrable continuous function defined on the real line with a right and a left derivative at every point. Then, for all x ∈ (−∞, ∞),

f(x) = (2π)^{−1} ∫_{−∞}^{∞} e^{iux} ∫_{−∞}^{∞} f(t) e^{−iut} dt du . (3.3.3)
Table 3.3.1 contains a summary of the Fourier transforms of the different types of functions we have considered. We have presented theorems for more general functions than the continuous functions described in the table. Likewise, the finite transform obviously holds for a function defined on an odd number of integers. For the first, third, and fourth entries in the column headed "Inverse Transform" we have first listed the domain of the original function. However, the inverse transform, being a multiple of the transform of the transform, is defined for the values indicated in parentheses.
Table 3.3.1. Summary of Fourier Transforms

Type of Function: Finite sequence X_t
  Domain of Function: t = −(n − 1), ..., 0, ..., n
  Fourier Transform: c_k = (2n)^{−1} Σ_{t=−(n−1)}^{n} X_t e^{−iπkt/n} , k = −(n − 1), ..., 0, ..., n
  Inverse Transform: X_t = Σ_{k=−(n−1)}^{n} c_k e^{iπkt/n} , t = −(n − 1), ..., 0, ..., n (can be extended to t = 0, ±1, ±2, ...)

Type of Function: Infinite sequence γ(h), absolutely summable
  Domain of Function: h = 0, ±1, ±2, ...
  Fourier Transform: f(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ(h) e^{−iωh} , ω ∈ (−π, π]
  Inverse Transform: γ(h) = ∫_{−π}^{π} f(ω) e^{iωh} dω , h = 0, ±1, ±2, ...

Type of Function: Continuous piecewise smooth f(x), periodic of period 2π
  Domain of Function: x ∈ [−π, π]
  Fourier Transform: c_k = (2π)^{−1} ∫_{−π}^{π} f(x) e^{−ikx} dx , k = 0, ±1, ±2, ...
  Inverse Transform: f(x) = Σ_{k=−∞}^{∞} c_k e^{ikx} , x ∈ [−π, π] (can be extended as a periodic function on the real line)

Type of Function: Continuous piecewise smooth f(x), periodic of period T
  Domain of Function: x ∈ (−T/2, T/2)
  Fourier Transform: c_k = T^{−1} ∫_{−T/2}^{T/2} f(x) e^{−i2πkx/T} dx , k = 0, ±1, ±2, ...
  Inverse Transform: f(x) = Σ_{k=−∞}^{∞} c_k e^{i2πkx/T} , x ∈ (−T/2, T/2) (can be extended to the real line)
Given a continuous function defined on an interval, it may be convenient to record the function at a finite number of equally spaced points and find the Fourier transform of the N points. Time series are often created by reading "continuous" functions of time at fixed intervals. For example, river level and temperature have the appearance of continuous functions of time, but we may record for analysis the readings at only a few times during a day or season.
Let g(x) be a continuous function with square integrable derivative defined on the interval [−T, T]. We evaluate the function at the 2m points

x_t = (t/m)T , t = −(m − 1), −(m − 2), ..., m − 1, m .

The complex form of the Fourier coefficients for the vector g(x_t) is given by

c_k^{(m)} = (2m)^{−1} Σ_{t=−(m−1)}^{m} g(x_t) e^{−iπkt/m} , k = −(m − 1), ..., 0, ..., m ,

and g(x_t) is expressible as

g(x_t) = Σ_{k=−(m−1)}^{m} c_k^{(m)} e^{iπkt/m} , t = −(m − 1), −(m − 2), ..., m . (3.3.4)

The superscript (m) on the coefficients is to remind us that they are based on 2m equally spaced values of g(x).
If we compute the Fourier coefficients for the function using the integral form (3.2.1), we have

c_k = (2T)^{−1} ∫_{−T}^{T} g(x) e^{−iπkx/T} dx , k = 0, ±1, ±2, ... ,

and the original function evaluated at the points x_t, t = −(m − 1), −(m − 2), ..., m, is given by

g(x_t) = Σ_{k=−∞}^{∞} c_k e^{iπkt/m} .

By the periodicity of the complex exponential we have

g(x_t) = Σ_{k=−(m−1)}^{m} [c_k + Σ_{s=1}^{∞} (c_{k−2ms} + c_{k+2ms})] e^{iπkt/m} . (3.3.5)
Table 3.3.2. Computed Coefficients for Function (3.3.6) Observed at 10 Points

Period   Frequency   Cosine Coefficient c_k^{(5)} + c_{−k}^{(5)}   Alias Frequencies Contributing
  −         0            0.5                                        10/20, 20/20
 20        1/20          0.0                                        −
 10        2/20          0.0                                        −
 20/3      3/20          1.6                                        13/20, 23/20
  5        4/20          0.0                                        −
  4        5/20          0.3                                        15/20
Equating the coefficients of e^{iπkt/m} in (3.3.4) and (3.3.5), we have

c_k^{(m)} = c_k + Σ_{s=1}^{∞} (c_{k−2ms} + c_{k+2ms}) .

It follows that the coefficient for the kth frequency of the function defined at 2m points is the sum of the coefficients of the continuous function at the k, k + 2m, k − 2m, k + 4m, k − 4m, ... frequencies. The frequencies k ± 2m, k ± 4m, ... are called the aliases of the kth frequency.
In our representation the distance between two points is T/m. The cosine of period 2T/m, or frequency m/2T, is the function of highest frequency that will be used in the Fourier transform. The frequency m/2T is called the Nyquist frequency. The aliases of an observed frequency are frequencies obtained by adding or subtracting integer multiples of twice the Nyquist frequency. To illustrate these ideas, let g(x) defined on [−10, 10] be given by

g(x) = 1.0 cos 2π(3/20)x + 0.5 cos 2π(10/20)x + 0.4 cos 2π(13/20)x + 0.3 cos 2π(15/20)x + 0.2 cos 2π(23/20)x . (3.3.6)
If the function is observed at the 10 points −8, −6, ..., 8, 10 and the {c_k^{(5)}: k = −4, −3, ..., 4, 5} are computed by

c_k^{(5)} = (10)^{−1} Σ_{t=−4}^{5} g(x_t) e^{−iπkt/5} ,

we will obtain the coefficients given in Table 3.3.2.
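The aliasing arithmetic can be checked directly. In the sketch below the five cosine frequencies of (3.3.6) are those inferred from Table 3.3.2 (an assumption on our part), the function is sampled at the 10 points, and the cosine coefficients at the observed frequencies k/20 are recomputed.

```python
import numpy as np

# Function (3.3.6); frequencies (in cycles per unit of x) inferred from
# Table 3.3.2 and treated here as assumptions.
freqs = [3/20, 10/20, 13/20, 15/20, 23/20]
amps  = [1.0,  0.5,   0.4,   0.3,   0.2]

def g(x):
    return sum(a * np.cos(2 * np.pi * f * x) for a, f in zip(amps, freqs))

m, T = 5, 10
t = np.arange(-(m - 1), m + 1)      # t = -4, ..., 5
x = (t / m) * T                     # sample points -8, -6, ..., 8, 10

k = np.arange(-(m - 1), m + 1)
c = np.array([(g(x) * np.exp(-1j * np.pi * kk * t / m)).sum() / (2 * m) for kk in k])

# Cosine coefficient at observed frequency k/20: c_0, c_k + c_{-k}, and c_5
cos_coef = ([c[k == 0][0].real]
            + [(c[k == j][0] + c[k == -j][0]).real for j in range(1, 5)]
            + [c[k == 5][0].real])
print(np.round(cos_coef, 2))
```

The computed coefficients reproduce the 0.5, 0.0, 0.0, 1.6, 0.0, 0.3 column of Table 3.3.2: the components at 10/20, 13/20, 15/20, and 23/20 fold onto the frequencies 0, 3/20, 5/20, and 3/20, respectively.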
3.4. FOURIER TRANSFORM OF A CONVOLUTION
Fourier transform theory is particularly useful for certain function-of-function problems. We consider one such problem that occurs in statistics and statistical time series.
For absolutely integrable functions f(x) and g(x) defined on the real line, the function

h(x) = ∫_{−∞}^{∞} f(u) g(x − u) du

is called the convolution of f(x) and g(x). Since f(x) and g(x) are absolutely integrable, h(x) is absolutely integrable.
We have the following theorem on the Fourier transform of a convolution.

Theorem 3.4.1. If the functions f(x) and g(x) are absolutely integrable, then the Fourier transform of the convolution of f(x) and g(x) is given by

c_h(ω) = 2π c_f(ω) c_g(ω) ,

where

c_f(ω) = (2π)^{−1} ∫_{−∞}^{∞} f(x) e^{−iωx} dx

and

c_g(ω) = (2π)^{−1} ∫_{−∞}^{∞} g(x) e^{−iωx} dx

are the Fourier transforms of f(x) and g(x).

Proof. The integrals defining c_f(ω), c_g(ω), and c_h(ω) are absolutely convergent. Hence,

c_h(ω) = (2π)^{−1} ∫_{−∞}^{∞} e^{−iωx} ∫_{−∞}^{∞} f(u) g(x − u) du dx
       = (2π)^{−1} ∫_{−∞}^{∞} f(u) e^{−iωu} ∫_{−∞}^{∞} g(x − u) e^{−iω(x−u)} dx du
       = 2π c_f(ω) c_g(ω) ,

where the absolute integrability has enabled us to interchange the order of integration. ▲
We discussed the convolution of two absolutely summable sequences in Section 2.2 and demonstrated that the convolution was absolutely summable.
Corollary 3.4.1.1. Given that {a_j} and {b_j} are absolutely summable, the Fourier transform of

d_m = Σ_{j=−∞}^{∞} a_{m−j} b_j

is

f_d(ω) = 2π f_a(ω) f_b(ω) ,

where

f_a(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} a_j e^{−iωj}

and

f_b(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} b_j e^{−iωj} .

Proof. The Fourier transform of {d_m} is

f_d(ω) = (2π)^{−1} Σ_{m=−∞}^{∞} Σ_{j=−∞}^{∞} a_{m−j} b_j e^{−iωm} = (2π)^{−1} Σ_{j=−∞}^{∞} b_j e^{−iωj} Σ_{m=−∞}^{∞} a_{m−j} e^{−iω(m−j)} = 2π f_a(ω) f_b(ω) . ▲
We may paraphrase these results as follows: the spectral density (Fourier transform) of a convolution is the product of the spectral densities (Fourier transforms) multiplied by 2π. That the converse is also true is clear from the proofs. We give a direct statement and proof for the product of absolutely summable sequences.
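Corollary 3.4.1.1 can be verified numerically for short (hence absolutely summable) sequences; the values below are arbitrary, and the transform convention is the one used in the text, f(ω) = (2π)^{−1} Σ_j x_j e^{−iωj}.

```python
import numpy as np

# Sequences stored as {index: value}; values are arbitrary illustrations.
a = {0: 1.0, 1: -0.6, 2: 0.25}
b = {-1: 0.5, 0: 1.0, 1: 0.5}

def transform(seq, w):
    # f(w) = (2 pi)^{-1} sum_j seq_j e^{-i w j}
    return sum(v * np.exp(-1j * w * j) for j, v in seq.items()) / (2 * np.pi)

# Convolution d_m = sum_j a_{m-j} b_j
d = {}
for j, bv in b.items():
    for i, av in a.items():
        d[i + j] = d.get(i + j, 0.0) + av * bv

w = 0.7                                   # an arbitrary frequency
lhs = transform(d, w)                     # transform of the convolution
rhs = 2 * np.pi * transform(a, w) * transform(b, w)
print(np.allclose(lhs, rhs))              # True
```

The factor 2π is exactly the constant attached to the transform convention; with a constant-free transform it would disappear.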
Corollary 3.4.1.2. Let {a_m} and {b_m} be absolutely summable, and define d_m by

d_m = a_m b_m .

Then

f_d(ω) = ∫_{−π}^{π} f_a(ω − u) f_b(u) du ,

where f_a(ω) and f_b(ω) are defined in Corollary 3.4.1.1.

Proof. We have

f_d(ω) = (2π)^{−1} Σ_{m=−∞}^{∞} a_m b_m e^{−iωm} = (2π)^{−1} Σ_{m=−∞}^{∞} a_m e^{−iωm} ∫_{−π}^{π} f_b(u) e^{ium} du = ∫_{−π}^{π} f_a(ω − u) f_b(u) du ,

where we have substituted b_m = ∫_{−π}^{π} f_b(u) e^{ium} du and interchanged the order of summation and integration. ▲
REFERENCES
Jenkins and Watts (1968), Lighthill (1970), Nerlove (1964), Rogosinski (1959), Tolstov (1962), Zygmund (1959).
EXERCISES
1. Give the 12 orthogonal sine and cosine functions that furnish an orthogonal basis for the 12-dimensional vector space. Find the linear combinations of these vectors that yield the vectors (1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1).
2. Expand the following functions defined on [−π, π] in Fourier series:
(a) f(ω) = cos aω, where a is not an integer.
(b) f(ω) = sin aω, where a is not an integer.
(c) f(ω) = e^{aω}, where a ≠ 0.
3. Define g(x) by
(a) Find the Fourier coefficients for g(x).
(b) What is the maximum value for

S_3(x) = ½a_0 + Σ_{k=1}^{3} (a_k cos kx + b_k sin kx)?

Where does this maximum occur? What is the maximum value of S_5(x)? S_7(x)? The fact that the approximating function always overestimates the true function near the point of discontinuity is called Gibbs' phenomenon.
4. Prove Theorem 3.1.2.
5. Let f(x) be the periodic function defined on the real line by

where j = 0, ±1, ±2, ... and 0 < b < π. Find the Fourier transform of f(x).
6. Let

f(x) = 1 , −b < x < b , and 0 otherwise,

where b is a positive number and f(x) is defined on the real line. Find the Fourier transform of f(x). Show that the limit of this transform at zero is infinity as b → ∞. Show that as b → ∞ the transform is bounded except at zero.
7. Let

δ_n(x) = n/2 , −n^{−1} < x < n^{−1} , and 0 otherwise,

where n is a positive integer and δ_n(x) is defined on the real line. Find the Fourier transform of δ_n(x). Show that, as n → ∞, the transform tends to the constant function of unit height.
8. Let the generalized function δ(x) represent a sequence of functions {δ_n(x)}_{n=1}^{∞} such that

lim_{n→∞} ∫_{−∞}^{∞} δ_n(x) f(x) dx = f(0) ,

where f(x) is a continuous absolutely integrable function defined on the real line. Then δ(x) is called Dirac's delta function. Show that the sequence of functions {δ_n(x)} of Exercise 7 defines a generalized function.
9. Let g_n(x) = (n/π)^{1/2} e^{−nx²} for x ∈ (−∞, ∞). Show that the sequence {g_n(x)}_{n=1}^{∞} yields a Dirac delta function as defined in Exercise 8.
10. Let f(x) defined for x ∈ (−∞, ∞) have the Fourier transform c(u). Show that the Fourier transform of f(ax + b) is |a|^{−1} e^{ibu/a} c(u/a), a ≠ 0.
11. Assume that the price of a commodity is recorded on the last day of each month for a period of 144 months. The finite Fourier coefficients for the data are computed using the formulas following (3.1.2). Which coefficients will be affected if there is a weekly periodicity in prices that is perfectly represented by a sine wave of period 7 days? Assume that there are 30.437 days in a month. Which coefficients will be affected if the weekly periodicity in prices can be represented by the sum of two sine waves, one of period 7 days and one of period 3.5 days? See Granger and Hatanaka (1964).
12. Let f(x) and g(x) be absolutely integrable functions, and define

Show that c(x) satisfies
14. Give a direct proof of Corollary 3.4.1.2 for finite transfoxms. That is, for the two functions y(h) and w(h) defined on the 2n - 1 integers h = 0, 2 1,
142 INTRODUcllON TO FOURIER ANALYSIS
22, . . . , +(n - I), Show that
2 nk *=-* k=0, 21, 2 2 (..., +(n- 1).
15. Let f(ω) be a nonnegative even continuous periodic function of period 2π. Show that

γ(h) = ∫_{−π}^{π} f(ω) e^{−iωh} dω , h = 0, ±1, ±2, ... ,

is an even positive semidefinite function.
C H A P T E R 4
Spectral Theory and Filtering
In Chapter 1 we discussed the correlation function as a way of characterizing a time series. In Chapter 2 we investigated representations of time series in terms of more elementary time series. These two descriptions are sometimes called descriptions in the time domain because of the obvious importance of the index set in the representations.
In Chapter 3 we introduced the Fourier transform of the correlation function. For certain correlation functions we demonstrated the uniqueness of the transform and showed that the correlation function is expressible as the inverse transform of the Fourier transform. The Fourier transform of the absolutely summable correlation function was called the spectral density. The spectral density furnishes another important representation of the time series. Because of the periodic nature of the trigonometric functions, the Fourier transform is often called the representation in the frequency domain.
4.1. THE SPECTRUM
In Chapter 1 we stated the result that the correlation function of a time series is analogous to a statistical characteristic function and may be expressed in the form

ρ(h) = ∫_{−π}^{π} e^{iωh} dG(ω) ,

where the integral is a Lebesgue-Stieltjes integral and G(ω) is a statistical distribution function. In Theorem 3.1.9 we proved this result for time series with absolutely summable covariance functions.

Since the covariance function of a stationary time series is the correlation function multiplied by the variance of the process, we have

γ(h) = ∫_{−π}^{π} e^{iωh} dF(ω) , (4.1.1a)
where

dF(ω) = γ(0) dG(ω) .

Both of the functions G(ω) and F(ω) have been called the spectral distribution function in time series analysis. The spectral distribution function is a nondecreasing function that, for our purposes, can be assumed to be composed of the sum of two parts: an absolutely continuous portion and a step function.¹ We take (4.1.1a) as the definitional relationship between the spectral distribution function and the covariance function. The spectral distribution function is also sometimes called the integrated spectrum.
Let us assume γ(h) is absolutely summable. Then, by Theorem 3.1.9, f(ω) defined by

f(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ(h) e^{−iωh} (4.1.2)

is a continuous nonnegative even function, and

γ(h) = ∫_{−π}^{π} f(ω) e^{iωh} dω .

Thus for time series with absolutely summable covariance function, dF(ω) = f(ω) dω, where f(ω) was introduced as the spectral density function in Section 3.3.
Recall that we have taken {e_t} to be a time series of uncorrelated (0, σ²) random variables. The spectral density of {e_t} is

f_e(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ_e(h) e^{−iωh} = σ²/2π ,
¹ Any statistical distribution function can be decomposed into three components: (1) a step function containing at most a countable number of finite jumps; (2) an absolutely continuous function; and (3) a "continuous singular" function. The third portion will be ignored in our treatment. See Tucker (1967, p. 15 ff.). While not formally correct, one may think of F(ω) as the sum of two parts, a step function with jumps at the points ω_j, j = −M, −(M − 1), ..., M − 1, M, and a function with continuous first derivative. Then the Lebesgue-Stieltjes integral ∫ g(ω) dF(ω) is the sum of Σ_{j=−M}^{M} g(ω_j) I(ω_j) and ∫ g(ω) f(ω) dω, where I(ω_j) is the height of the jump in F(ω) at the point ω_j, f(ω) is the derivative of the continuous portion of F(ω), and ∫ g(ω) f(ω) dω is a Riemann integral.
which is positive, continuous, and trivially periodic. Similarly,

γ_e(h) = ∫_{−π}^{π} (σ²/2π) e^{iωh} dω = ∫_{−π}^{π} (σ²/2π) cos ωh dω = σ² , h = 0 , and 0 otherwise.
As noted previously, the time series e_t is often called white noise. The reason for this description is now more apparent. The spectrum is a constant multiple of the variance and, in analogy to white light, one might say that all frequencies contribute equally to the variance.
If the spectral distribution function contains finite jumps, we can visualize the spectrum containing discrete “spikes” at the jump points in the same way that we view discrete probability distributions. We now investigate a time series whose spectral distribution function is a step function.
Let a time series be composed of a finite sum of simple processes of the form (1.6.3). That is, define the time series X_t by

X_t = Σ_{j=0}^{M} (A_j cos ω_j t + B_j sin ω_j t) , (4.1.3)

where the A_j and B_j are random variables with zero mean and

E{A_j²} = E{B_j²} = σ_j² , j = 0, 1, 2, ..., M ,
E{A_j A_k} = E{B_j B_k} = 0 , j ≠ k ,
E{A_j B_k} = 0 for all j, k ,

and the ω_j, j = 0, 1, ..., M, are distinct frequencies contained in the interval [−π, π]. By (1.6.4), we have
γ_X(h) = E{X_t X_{t+h}} = Σ_{j=0}^{M} σ_j² [cos ω_j t cos ω_j(t + h) + sin ω_j t sin ω_j(t + h)] = Σ_{j=0}^{M} σ_j² cos ω_j h . (4.1.4)
Since the function γ_X(h) is composed of a finite sum of cosine functions, the graph of σ_j² against ω_j (or j) will give us a picture of the relative contribution of the variance associated with frequency ω_j to the variance of the time series. This is
true because the variance of X_t is given by

γ_X(0) = Σ_{j=0}^{M} σ_j² .
While we permitted our original frequencies in (4.1.3) to lie anywhere in the interval [−π, π], it is clear that with no loss of generality we could have restricted the frequencies to [0, π]. That is, the covariance function for

X_t = A_j cos(−ω_j)t + B_j sin(−ω_j)t

is the same as that of

X_t = A_j cos ω_j t + B_j sin ω_j t .

This suggests that for a covariance function of the form σ_j² cos ω_j h we associate one half of the variance with the frequency −ω_j and one half with the frequency ω_j. To this end, we set

I(ω_j) = I(−ω_j) = ½σ_j² , j ≠ 0 , and I(0) = σ_0² .

To facilitate our representation, we assume that ω_0 = 0 and then write the sum (4.1.4) as

γ_X(h) = Σ_{j=−M}^{M} I(ω_j) e^{iω_j h} , (4.1.5)

where ω_{−j} = −ω_j. We say that a time series with a covariance function of the form (4.1.5) has a discrete spectrum or a line spectrum. Equation (4.1.5) may be written as the Lebesgue-Stieltjes integral
γ_X(h) = ∫_{−π}^{π} e^{iωh} dF(ω) , (4.1.5a)

where F(ω) is a step function with jumps of height ½σ_j² at the points ω_j and −ω_j, j ≠ 0, and a jump of height σ_0² at ω_0 = 0. By construction, the jumps I(ω_j) are symmetric about zero, and we have expressed γ_X(h) in the general form (4.1.1a). We used this method of construction because the covariance function γ_X(h) = Σ_{j=0}^{M} σ_j² cos ω_j h is not absolutely summable and we could not directly apply (4.1.2).
For our purposes it is sufficient for us to be able to recognize the two types of autocovariance functions and associated spectra: (1) the autocovariance function
that is the sum of a finite number of cosine functions and is associated with a spectral distribution function, which is a step function; and (2) the absolutely summable autocovariance function that is associated with a continuous spectral density.
Example 4.1.1. Let us consider an example. Assume that the covariance function is given by (4.1.4) with the variances and frequencies specified by Table 4.1.1. Defining I(ω_j) as in (4.1.5), we have

I(0) = σ_0² and I(−ω_j) = I(ω_j) = ½σ_j² , j = 1, 2, 3, 4 .
The original variances are plotted against frequency in Figure 4.1.1, and the line spectrum is plotted in Figure 4.1.2. The associated spectral distribution function is given in Figure 4.1.3. ▲▲

Table 4.1.1. Examples of Variances for Time Series of Form (4.1.3)
Figure 4.1.2. Graph of the line spectrum for the time series of Table 4.1.1.

Figure 4.1.3. Spectral distribution function associated with the spectrum of Figure 4.1.2.
Example 4.1.2. As a second example, let X_t be defined by

X_t = A_1 cos(½π)t + B_1 sin(½π)t + e_t , (4.1.6)

where the e_t, t = 0, ±1, ±2, ..., are independent (0, 0.2π) random variables independent of A_1 and B_1, which are independent (0, 0.8π) random variables. Letting

I(−½π) = I(½π) = ½(0.8π) = 0.4π ,

it follows that the spectral distribution function has jumps of height 0.4π at the points ±½π, and the e_t component contributes an absolutely continuous portion with derivative

f_e(ω) = (2π)^{−1}(0.2π) = 0.1 .

Therefore, the spectral distribution function is the sum of the step function with jumps of 0.4π at ±½π and the absolutely continuous function with derivative 0.1. The autocovariance function of X_t is

γ_X(h) = Σ_{j=−1}^{1} I(ω_j) cos ω_j h + ∫_{−π}^{π} 0.1 cos ωh dω
       = π , h = 0 ,
       = 0.8π cos ½πh , otherwise,

where ω_{−1} = −½π and ω_1 = ½π. The reader may verify this expression by evaluating E{X_t X_{t+h}} using the definition of X_t given in (4.1.6). ▲▲
4.2. CIRCULANTS-DIAGONALIZATION OF THE COVARIANCE MATRIX OF STATIONARY PROCESSES
In this section we investigate some properties of matrices encountered in time series analysis and use these properties to obtain the matrix that will approximately diagonalize the n × n covariance matrix of n observations from a stationary time series with absolutely summable covariance function. Let the n × n covariance matrix be denoted by Γ. Then

Γ =
| γ(0)    γ(1)    ...  γ(n−1) |
| γ(1)    γ(0)    ...  γ(n−2) |
|  ...                   ...  |
| γ(n−1)  γ(n−2)  ...  γ(0)   | .
It is well known² that for any n × n positive semidefinite covariance matrix Γ, there exists an M satisfying M'M = I such that

M'ΓM = diag(λ_1, λ_2, ..., λ_n) ,

where λ_i, i = 1, 2, ..., n, are the characteristic roots of Γ. Our investigation will demonstrate that for large n the λ_i are approximately equal to 2πf(ω_j), where f(ω) is the spectral density of X_t and ω_j = 2πj/n, j = 0, 1, 2, ..., n − 1. This permits us to interpret the spectral density as a multiple of the variance of the orthogonal random variables defined by the transformation M. We initiate our study by introducing a matrix whose roots have a particularly simple representation.
A matrix of the form

Γ_c =
| γ(0)    γ(1)    γ(2)    ...  γ(n−2)  γ(n−1) |
| γ(n−1)  γ(0)    γ(1)    ...  γ(n−3)  γ(n−2) |
| γ(n−2)  γ(n−1)  γ(0)    ...  γ(n−4)  γ(n−3) |
|  ...                                   ...  |
| γ(1)    γ(2)    γ(3)    ...  γ(n−1)  γ(0)   |   (4.2.1)

is called a circular matrix or circulant. Of course, in this definition γ(j) may be any number, but we use the covariance notation, since our immediate application will be to a matrix whose elements are covariances.

The characteristic roots λ_j and vectors x_j of the matrix (4.2.1) satisfy the equation

Γ_c x_j = λ_j x_j , j = 1, 2, ..., n , (4.2.2)

and we have

γ(0)x_{1j} + γ(1)x_{2j} + ... + γ(n − 2)x_{(n−1)j} + γ(n − 1)x_{nj} = λ_j x_{1j} ,
γ(n − 1)x_{1j} + γ(0)x_{2j} + ... + γ(n − 3)x_{(n−1)j} + γ(n − 2)x_{nj} = λ_j x_{2j} ,
...
γ(1)x_{1j} + γ(2)x_{2j} + ... + γ(n − 1)x_{(n−1)j} + γ(0)x_{nj} = λ_j x_{nj} , (4.2.3)

where x_{kj} is the kth element of the jth characteristic vector. Let r_j be a root of the scalar equation r^n = 1 and set x_{kj} = r_j^k. The system of equations (4.2.3) becomes

γ(0)r_j + γ(1)r_j² + ... + γ(n − 2)r_j^{n−1} + γ(n − 1)r_j^n = λ_j r_j ,
γ(n − 1)r_j + γ(0)r_j² + ... + γ(n − 3)r_j^{n−1} + γ(n − 2)r_j^n = λ_j r_j² ,
... (4.2.4)
If we multiply the first equation of (4.2.4) by r_j^{n−1}, the second by r_j^{n−2}, and so forth, using r_j^{n+k} = r_j^k, we see that we shall obtain equality for each equation if

λ_j = Σ_{h=0}^{n−1} γ(h) r_j^h . (4.2.5)
The equation r^n = 1 has n distinct roots, e^{i2πj/n}, j = 1, 2, ..., n, which may also be expressed as r_j = e^{−i2πj/n}, j = 0, 1, ..., n − 1. Therefore, the n characteristic vectors associated with the n characteristic roots are given by

x_j = g*_{j·} = n^{−1/2} [1, e^{−i2πj/n}, e^{−i2π2j/n}, ..., e^{−i2π(n−1)j/n}]' , j = 0, 1, ..., n − 1 , (4.2.6)

where g*_{j·} is the conjugate transpose of the row vector

g_{j·} = n^{−1/2} [1, e^{i2πj/n}, e^{i2π2j/n}, ..., e^{i2π(n−1)j/n}] .

If we define the matrix G by

G = (g'_{0·}, g'_{1·}, ..., g'_{(n−1)·})' ,

then

GG* = I , GΓ_cG* = diag(λ_0, λ_1, ..., λ_{n−1}) ,

where G* is the conjugate transpose of G.
Setting γ(1) = γ(n − 1), γ(2) = γ(n − 2), ..., γ(h) = γ(n − h), ..., in Γ_c, we obtain the circular symmetric matrix

Γ_c =
| γ(0)  γ(1)  γ(2)  ...  γ(2)  γ(1) |
| γ(1)  γ(0)  γ(1)  ...  γ(3)  γ(2) |
|  ...                         ...  |
| γ(1)  γ(2)  γ(3)  ...  γ(1)  γ(0) | , (4.2.7)

whose characteristic roots are

λ_j = Σ_{h=0}^{n−1} γ(h) e^{−i2πjh/n} , j = 0, 1, ..., n − 1 , (4.2.8)

where we have used the periodic property e^{−i2πj(n−h)/n} = e^{i2πjh/n}. Note that these roots are real, as they must be for a real symmetric matrix.
In some applications it is preferable to work with a real matrix rather than the complex matrix G. Consider first the case of n odd. Equation (4.2.8) may also be written

λ_j = Σ_{h=−(n−1)/2}^{(n−1)/2} γ(h) cos (2π/n)hj , j = 0, 1, ..., n − 1 . (4.2.9)

Since for 0 ≤ m ≤ 2π we have cos m = cos(2π − m), we see that there is one root for j = 0 (or j = n) and (n − 1)/2 roots of multiplicity two associated with j = 1, 2, ..., (n − 1)/2. For each of these repeated roots we can find two real orthogonal vectors. These are chosen to be

(2/n)^{1/2} [1, cos (2πj/n), cos (2πj2/n), ..., cos (2πj(n − 1)/n)]'

and

(2/n)^{1/2} [0, sin (2πj/n), sin (2πj2/n), ..., sin (2πj(n − 1)/n)]' . (4.2.10)

If we choose the vectors in (4.2.6) associated with the jth and (n − j)th roots of one, the vectors in (4.2.10) are given by

2^{−1/2}(g*_{j·} + g*_{(n−j)·}) and i2^{−1/2}(g*_{j·} − g*_{(n−j)·}) .

Much the same pattern holds for the roots of a circular symmetric matrix of dimension n × n where n is even. There is a characteristic vector n^{−1/2}(1, 1, ..., 1)' associated with j = 0, and a vector n^{−1/2}(1, −1, 1, ..., −1)' associated with j = n/2. The remaining (n/2) − 1 roots have multiplicity two, and the roots are given by (4.2.8).
Define the orthogonal matrix Q by setting n^{1/2}2^{−1/2}Q' equal to

| 2^{−1/2}  2^{−1/2}       2^{−1/2}       ...  2^{−1/2}            |
| 1         cos 2π(1/n)    cos 2π(2/n)    ...  cos 2π((n − 1)/n)   |
| 0         sin 2π(1/n)    sin 2π(2/n)    ...  sin 2π((n − 1)/n)   |
| 1         cos 4π(1/n)    cos 4π(2/n)    ...  cos 4π((n − 1)/n)   |
| 0         sin 4π(1/n)    sin 4π(2/n)    ...  sin 4π((n − 1)/n)   |
|  ...                                                     ...     |
| 0         sin (n − 1)π(1/n)             ...  sin (n − 1)π((n − 1)/n) |   (4.2.11)
CIRCULANTS-DIAGONALUATION OF THE COVARIANCE MATRIX 153
(4.2.10), and our illustration (4.2.11) is for odd n. Let X_t be a stationary time series with absolutely summable covariance function γ(h). For odd n, define the n × n diagonal matrix D by

D = diag{d_1, d_2, ..., d_n} , (4.2.12)

where

d_1 = f(0) and d_{2j} = d_{2j+1} = f(2πjn^{−1}) , j = 1, 2, ..., 0.5(n − 1) ,

with f(ω) the spectral density of X_t. For Γ_c defined by (4.2.7), the matrix Q'Γ_cQ is a diagonal matrix whose elements converge to the elements of 2πD as n increases. This also holds for even n if the definition of Q is slightly modified. An additional row,

n^{−1/2} [1, −1, 1, ..., 1, −1] ,

which is the characteristic vector associated with j = n/2, is added to the Q' of (4.2.11) when n is even. The last entry in D for even n is

d_n = f(π) . (4.2.13)
The covariance matrix for n observations on X_t is given by

Γ =
| γ(0)    γ(1)    ...  γ(n−1) |
| γ(1)    γ(0)    ...  γ(n−2) |
|  ...                   ...  |
| γ(n−1)  γ(n−2)  ...  γ(0)   | . (4.2.14)

For γ(h) absolutely summable, we now demonstrate that Q'ΓQ also converges to 2πD.
Let q_{·i} = [q_{1i}, q_{2i}, ..., q_{ni}]' be the ith column of Q. We have, for Γ defined in (4.2.14) and Γ_c defined in (4.2.7),

q'_{·i}Γq_{·j} − q'_{·i}Γ_cq_{·j} = Σ_{h=M+1}^{n−1} [γ(h) − γ(n − h)] Σ_{|k−r|=h} q_{ki}q_{rj} , (4.2.15)

where M = (n − 1)/2 if n is odd and M = (n/2) − 1 if n is even. Now the absolute value of (4.2.15) is less than

(4/n) Σ_{g=1}^{n−M−1} g|γ(g)| + 4 Σ_{h=M+1}^{n−1} |γ(h)| , (4.2.16)

since |q_{ki}q_{rj}| ≤ 2/n for all k, i, r, j ∈ {1, 2, ..., n} and the number of pairs (k, r) with |k − r| = h is 2(n − h). As n increases, the limit of the first term is zero by Lemma 3.1.4, and the limit of the second term is zero by the absolute summability of γ(h). Therefore, the elements of Q'ΓQ converge to those of 2πD. We state the result as a theorem.
Theorem 4.2.1. Let Γ be the covariance matrix of n observations from a stationary time series with absolutely summable covariance function. Let Q be defined by (4.2.11), and take D to be the n × n diagonal matrix defined in (4.2.12). Then, given ε > 0, there exists an n_0 such that for n > n_0 every element of the matrix

Q'ΓQ − 2πD

is less than ε in magnitude.
Corollary 4.2.1. Let Γ be the covariance matrix of n observations from a stationary time series with covariance function that satisfies

Σ_{h=−∞}^{∞} |h| |γ(h)| = L < ∞ .

Let Q and D be as defined in Theorem 4.2.1. Then every element of the matrix Q'ΓQ − 2πD is less than 4Ln^{−1} in magnitude.
Proof. By (4.2.16), the difference between q'_{·i}Γ_cq_{·j} and q'_{·i}Γq_{·j} is bounded by a multiple of Ln^{−1}, since each term of (4.2.16) is bounded by a multiple of n^{−1} Σ_h |h||γ(h)|. Now Q'Γ_cQ is a diagonal matrix, and the difference between an element of Q'Γ_cQ and an element of 2πD is the tail sum Σ_{|h|>M} |γ(h)| ≤ M^{−1} Σ_{|h|>M} |h||γ(h)|, which is also bounded by a multiple of Ln^{−1}. ▲
We have demonstrated that, asymptotically, the Q defined by (4.2.11) or the G defined by (4.2.6) will diagonalize all Γ-matrices associated with stationary time series with absolutely summable covariance functions. That is, the transformation Q applied to the n observations defines n new random variables that are "nearly" uncorrelated. Each of the new random variables is a linear combination of the n original observations with weights given in (4.2.10). The variances of both the 2jth and (2j + 1)th random variables created by the transformation are approximately 2πf(2πjn^{−1}).
For time series with absolutely summable covariance function the variance result holds for the random variable defined for any ω, not just for integer multiples of 2πn^{−1}. That is, the complex random variable

Z_n(ω) = n^{−1/2} Σ_{t=1}^{n} X_t e^{−iωt}

has variance given by

E{Z_n(ω)Z̄_n(ω)} = n^{−1} Σ_{t=1}^{n} Σ_{s=1}^{n} γ(t − s) e^{−iω(t−s)} = Σ_{h=−(n−1)}^{n−1} (1 − n^{−1}|h|) γ(h) e^{−iωh} .

By Lemma 3.1.4,

lim_{n→∞} E{Z_n(ω)Z̄_n(ω)} = Σ_{h=−∞}^{∞} γ(h) e^{−iωh} = 2πf(ω) .

We shall return to this point in Chapter 7 and investigate functions of the random variables created by applying the transformation (4.2.10) or (4.2.6) to a finite number of observations from a realization.
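The exact eigenvalue formula (4.2.5)/(4.2.8) for a circular symmetric matrix, from which the asymptotic results of this section follow, is easy to confirm numerically. The covariance sequence γ(h) = 0.5^{min(h, n−h)} below is an arbitrary choice satisfying the symmetry γ(h) = γ(n − h).

```python
import numpy as np

# Characteristic roots of a circular symmetric matrix: lambda_j equals
# sum_h gamma(h) exp(-i 2 pi j h / n), compared with numerical eigenvalues.
n = 9
gam = np.array([0.5 ** min(h, n - h) for h in range(n)])   # gamma(h) = gamma(n-h)
Gamma_c = np.array([[gam[(r - k) % n] for r in range(n)] for k in range(n)])

lam = np.array([(gam * np.exp(-1j * 2 * np.pi * j * np.arange(n) / n)).sum()
                for j in range(n)]).real                   # roots are real
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvalsh(Gamma_c))))   # True
```

The repeated roots for j and n − j are visible in the sorted eigenvalue list, matching the multiplicity-two discussion above.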
4.3. THE SPECTRAL DENSITY OF MOVING AVERAGE AND AUTOREGRESSIVE TIME SERIES
By the Fourier integral theorem we know that the Fourier transform of an absolutely summable sequence is unique and that the sequence is given once again by the inverse transform of the transform. Therefore, we expect the spectral density of a particular kind of time series, say a finite moving average, to have a
distinctive form. We shall see that this is the case: the spectral densities of autoregressive and moving average processes are easily recognizable.
Stationary moving average, stationary autoregressive, and stationary autoregressive moving average time series are contained within the representation

X_t = Σ_{j=−∞}^{∞} b_j e_{t−j} , (4.3.0)

where

Σ_{j=−∞}^{∞} |b_j| < ∞

and the e_t are uncorrelated (0, σ²) random variables. Therefore, if we obtain the spectral density of a time series of the form (4.3.0), we shall have the spectral density for such types of time series. In fact, it is convenient to treat the case of an infinite weighted sum of a more general time series.
Theorem 4.3.1. Let X_t be a stationary time series with an absolutely summable covariance function and let {a_j}_{j=−∞}^{∞} be absolutely summable. Then the spectral density of

Y_t = Σ_{j=−∞}^{∞} a_j X_{t−j}

is

f_Y(ω) = 4π² f_a(ω) f_a*(ω) f_X(ω) , (4.3.1)

where f_X(ω) is the spectral density of X_t,

f_a(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} a_j e^{−iωj}

is the Fourier transform of {a_j}, and

f_a*(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} a_j e^{iωj}

is the complex conjugate of the Fourier transform of {a_j}.
Proof. Given the absolute summability, we may interchange integration and summation and calculate the expectation term by term. Therefore, letting E{X_t} = 0, we have

γ_Y(h) = E{Y_t Y_{t+h}} = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} a_j a_k γ_X(h − k + j) .

If we set p = h − k + j, then

γ_Y(h) = Σ_{j=−∞}^{∞} Σ_{p=−∞}^{∞} a_j a_{j+h−p} γ_X(p) . (4.3.2)

Since γ_Y(h) is absolutely summable, we may write

f_Y(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ_Y(h) e^{−iωh}
       = (2π)^{−1} Σ_h Σ_j Σ_p a_j a_{j+h−p} γ_X(p) e^{−iωh}
       = (2π)^{−1} Σ_j Σ_s Σ_p a_j a_s γ_X(p) e^{−iω(p+s−j)}
       = [Σ_j a_j e^{iωj}][Σ_s a_s e^{−iωs}][(2π)^{−1} Σ_p γ_X(p) e^{−iωp}]
       = 4π² f_a*(ω) f_a(ω) f_X(ω) ,

where we have made the transformation s = j + h − p. ▲
The theorem is particularly useful, and we shall make repeated application of the result. If the X_t of the theorem are assumed to be uncorrelated random variables, we obtain the spectral density of a moving average time series.
Corollary 4.3.1.1. The spectral density of the moving average process

X_t = Σ_{j=−∞}^{∞} a_j e_{t−j} ,

where {e_t} is a sequence of uncorrelated (0, σ²) random variables and the sequence {a_j} is absolutely summable, is

f_X(ω) = (σ²/2π) Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} a_j a_k e^{−iω(j−k)} = (σ²/2π) |Σ_{j=−∞}^{∞} a_j e^{−iωj}|² . (4.3.3)

Proof. The result follows immediately from Theorem 4.3.1. ▲
Let us use (4.3.3) to calculate the spectral density of the finite moving average

X_t = Σ_{j=0}^{p} a_j e_{t−j} . (4.3.4)

We have

f_X(ω) = (2π)^{−1} Σ_{h=−p}^{p} γ_X(h) e^{−iωh} = (2π)^{−1} Σ_{h=−p}^{p} σ² Σ_{j=0}^{p−|h|} a_j a_{j+|h|} e^{−iωh} = (σ²/2π) |Σ_{j=0}^{p} a_j e^{−iωj}|² . (4.3.5)

Equation (4.3.5) is a proof of the corollary for finite p and serves to reinforce our confidence in the general result. The spectral density of a finite moving average may, alternatively, be expressed in terms of the roots of the auxiliary equation. The quantity

Σ_{j=0}^{p} a_j e^{−iωj}

is seen to be a polynomial in e^{−iω}. Therefore, it may be written as

Σ_{j=0}^{p} a_j e^{−iωj} = Π_{j=1}^{p} (1 − m_j e^{−iω}) ,

where a_0 = 1 and the m_j are the roots of

Σ_{j=0}^{p} a_j m^{p−j} = 0 .
It follows that the spectral density of the {X_t} defined by (4.3.4) is

f_X(ω) = (σ²/2π) Π_{j=1}^{p} (1 − m_j e^{−iω})(1 − m_j e^{iω}) .
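Equation (4.3.5) can be checked for arbitrary weights by transforming the autocovariances γ_X(h) = σ² Σ_j a_j a_{j+|h|} directly:

```python
import numpy as np

# Finite moving average with illustrative (hypothetical) weights a_0, a_1, a_2.
a = np.array([1.0, -0.7, 0.3])
sigma2 = 1.5
p = len(a) - 1

# gamma(h) = sigma^2 * sum_j a_j a_{j+h}, h = 0, ..., p
gamma = {h: sigma2 * (a[:len(a) - h] * a[h:]).sum() for h in range(p + 1)}

w = 1.1                                  # an arbitrary frequency
lhs = (gamma[0] + 2 * sum(gamma[h] * np.cos(w * h)
                          for h in range(1, p + 1))) / (2 * np.pi)
rhs = sigma2 / (2 * np.pi) * abs(np.exp(-1j * w * np.arange(p + 1)) @ a) ** 2
print(np.isclose(lhs, rhs))              # True
```

The left side is the transform (2π)^{−1} Σ γ(h) e^{−iωh} written in cosine form; the right side is the squared-modulus form of (4.3.5).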
We can also use Corollary 4.3.1.1 to compute the spectral density of the first order autoregressive process

X_t = ρX_{t−1} + e_t = Σ_{j=0}^{∞} ρ^j e_{t−j} , |ρ| < 1 .

The transform of the weights a_j = ρ^j is given by

f_a(ω) = (2π)^{−1} Σ_{j=0}^{∞} ρ^j e^{−iωj} = (2π)^{−1} (1 − ρe^{−iω})^{−1} ,

and the complex conjugate by

f_a*(ω) = (2π)^{−1} (1 − ρe^{iω})^{−1} .

Therefore,

f_X(ω) = 4π² f_a(ω) f_a*(ω) f_e(ω) = (σ²/2π)(1 − 2ρ cos ω + ρ²)^{−1} , (4.3.7)

since f_e(ω), the spectral density of the uncorrelated sequence {e_t}, is σ²/2π for all ω.

We note that we could also find the spectral density of X_t by setting Y_t = e_t in (4.3.1) and using the known spectral density of e_t. Applying this approach to the pth order autoregressive time series, we have the following corollary.
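A numerical check of (4.3.7), using the known covariance function γ(h) = σ²ρ^{|h|}(1 − ρ²)^{−1} of the first order autoregressive process and truncating the transform sum (truncation point and parameter values are arbitrary):

```python
import numpy as np

# Compare the truncated transform of gamma(h) with the closed form (4.3.7).
rho, sigma2, w = 0.6, 2.0, 0.9
H = 200                                  # truncation point; rho^H is negligible
h = np.arange(-H, H + 1)
gamma = sigma2 * rho ** np.abs(h) / (1 - rho ** 2)
f_trunc = (gamma * np.exp(-1j * w * h)).sum().real / (2 * np.pi)
f_closed = sigma2 / (2 * np.pi * (1 - 2 * rho * np.cos(w) + rho ** 2))
print(np.isclose(f_trunc, f_closed))     # True
```

Because |ρ| < 1, the truncation error decays geometrically, so a modest H suffices.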
Corollary 4.3.1.2. The spectral density of the pth order autoregressive time series {X_t} defined by

Σ_{j=0}^{p} α_j X_{t−j} = e_t ,

where α_0 = 1, α_p ≠ 0, the e_t are uncorrelated (0, σ²) random variables, and the roots {m_j: j = 1, 2, ..., p} of

R(m) = Σ_{j=0}^{p} α_j m^{p−j} = 0

are less than one in absolute value, is given by

f_X(ω) = (σ²/2π) |Σ_{j=0}^{p} α_j e^{−iωj}|^{−2} = (σ²/2π) Π_{j=1}^{p} (1 − 2m_j cos ω + m_j²)^{−1} . (4.3.8)
Proof. Apply Theorem 4.3.1 to the finite moving average time series

e_t = Σ_{j=0}^{p} α_j X_{t−j}

to obtain

f_e(ω) = T(ω) f_X(ω),

where

T(ω) = |Σ_{j=0}^{p} α_j e^{−iωj}|² = R(e^{iω}) R(e^{−iω}).

Neither R(e^{iω}) nor R(e^{−iω}) can be zero, since the roots of R(m) cannot have an absolute value of one. Therefore, T(ω) is not equal to zero for any ω, and we are able to write

f_X(ω) = f_e(ω) T^{−1}(ω).

This is the first representation given in (4.3.8). The alternative form for f_X(ω) follows by writing T(ω) in the factored form

T(ω) = Π_{j=1}^{p} (1 − 2m_j cos ω + m_j²).  ∎
We can also write (4.3.8) in the factored form

f_X(ω) = (2π)^{−1} σ² Π_{j=1}^{p} (1 − m_j e^{−iω})^{−1}(1 − m_j e^{iω})^{−1}.

The spectral density of the autoregressive moving average process is sometimes called a rational spectral density because it is the ratio of polynomials in e^{−iω}.
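The two expressions in (4.3.8) — the reciprocal squared modulus of the autoregressive polynomial and the product over its roots — can be compared numerically. The sketch below uses a hypothetical AR(2) whose auxiliary equation has the real roots m₁ = 0.5 and m₂ = 0.3 (chosen for illustration only).

```python
import numpy as np

# Hypothetical AR(2): roots m1, m2 of R(m) = m^2 + a1*m + a2 = 0,
# so alpha = (1, -(m1+m2), m1*m2).
m1, m2 = 0.5, 0.3
alpha = np.array([1.0, -(m1 + m2), m1 * m2])
sigma2 = 1.0

w = np.linspace(-np.pi, np.pi, 401)
z = np.exp(-1j * w)

# First form of (4.3.8): modulus of the polynomial in e^{-iw}.
form1 = sigma2 / (2 * np.pi) / np.abs(alpha[0] + alpha[1] * z + alpha[2] * z**2) ** 2
# Second form of (4.3.8): product over the roots.
form2 = sigma2 / (2 * np.pi) / ((1 - 2 * m1 * np.cos(w) + m1**2) *
                                (1 - 2 * m2 * np.cos(w) + m2**2))
```

The two arrays agree to machine precision, as the algebra of the proof requires.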
Corollary 4.3.1.3. Let an autoregressive moving average process of order (p, q) be defined by

Σ_{j=0}^{p} α_j X_{t−j} = Σ_{k=0}^{q} β_k e_{t−k},

where α_0 = β_0 = 1, α_p ≠ 0, β_q ≠ 0, the e_t are uncorrelated (0, σ²) random variables, and the roots {m_j: j = 1, 2, …, p} of

R(m) = Σ_{j=0}^{p} α_j m^{p−j} = 0

are less than one in absolute value. Then the spectral density of {X_t} is given by

f_X(ω) = (2π)^{−1} σ² [Π_{k=1}^{q} (1 − 2r_k cos ω + r_k²)][Π_{j=1}^{p} (1 − 2m_j cos ω + m_j²)]^{−1},   (4.3.9)

where {r_k: k = 1, 2, …, q} are the roots of

Σ_{k=0}^{q} β_k r^{q−k} = 0.   (4.3.10)

Proof. Reserved for the reader. ∎
In Theorem 2.6.4 we demonstrated that a finite moving average process with no roots of unit absolute value always has a representation with characteristic equation whose roots are less than one in absolute value. In Theorem 2.6.3 we obtained a moving average representation for a time series with |ρ(1)| < 0.5 and ρ(h) = 0, h = ±2, ±3, … . We can now extend Theorem 2.6.3 to time series with a finite number of nonzero covariances.
Theorem 4.3.2. Let the stationary time series X_t have zero mean and spectral density

f(ω) = (2π)^{−1} Σ_{h=−q}^{q} γ(h) e^{−iωh},

where f(ω) is strictly positive, γ(h) is the covariance function of X_t, and γ(q) ≠ 0. Then X_t has the representation

X_t = Σ_{j=0}^{q} β_j e_{t−j},

where β_0 = 1, the e_t are uncorrelated (0, σ²) random variables with σ² = γ(0)[Σ_{j=0}^{q} β_j²]^{−1}, and the roots of

Σ_{j=0}^{q} β_j m^{q−j} = 0

are less than one in absolute value.
Proof. The function

A(z) = [γ(q)]^{−1} Σ_{h=−q}^{q} γ(h) z^{q+h}

is a polynomial of degree 2q in z with unit coefficient for z^{2q}. Therefore, it can be written in the factored form

A(z) = Π_{j=1}^{2q} (z − m_j),

where the m_j are the roots of A(z) = 0. If A(m_s) = 0, then m_s ≠ 0, and by the symmetry γ(h) = γ(−h),

A(m_s^{−1}) = 0.

Since f(ω) is strictly positive, none of the roots are of unit absolute value. Also, the coefficients are real, so that any complex roots occur as conjugate pairs. Therefore, we can arrange the roots in q pairs, (m_r, m_{−r}) = (m_r, m_r^{−1}), where multiple roots are repeated the proper number of times. We let m_r, r = 1, 2, …, q, denote the roots less than one in absolute value. Using these roots, we define β_j, for j = 0, 1, 2, …, q, by

Π_{r=1}^{q} (z − m_r) = Σ_{j=0}^{q} β_j z^{q−j}.
Therefore,

z^{−q} γ(q) A(z) = z^{−q} γ(q) Π_{r=1}^{q} (z − m_r) Π_{r=1}^{q} (z − m_r^{−1}).

It follows that the expression in e^{−iω} defining f(ω) can be written as

f(ω) = 2π σ² f_β(ω) f̄_β(ω),

where f_β(ω) = (2π)^{−1} Σ_{j=0}^{q} β_j e^{−iωj} is the Fourier transform of {β_j} and σ² = γ(0)(Σ_{j=0}^{q} β_j²)^{−1}. Now define the sequence of random variables {e_t} by e_t = Σ_{j=0}^{∞} c_j X_{t−j}, where the sequence {c_j} is defined in Theorem 2.6.2. These c_j are such that X_t = Σ_{j=0}^{q} β_j e_{t−j}, which means that f_β(ω) f_c(ω) = (2π)^{−2}. Therefore, the spectral density of e_t is

f_e(ω) = (2π)² f_c(ω) f̄_c(ω) f(ω) = (2π)^{−1} σ²,

and the e_t are uncorrelated with variance σ². ∎
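The construction in the proof can be carried out numerically for q = 1. The sketch below assumes the hypothetical covariances γ(0) = 1 and γ(1) = 0.4 (so that |ρ(1)| < 0.5), factors A(z), and recovers the moving average coefficient and error variance.

```python
import numpy as np

# Hypothetical covariances of an MA(1)-type series.
g0, g1 = 1.0, 0.4

# A(z) = gamma(1)^{-1} (gamma(1) z^2 + gamma(0) z + gamma(1)):
# monic of degree 2q = 2, with roots in reciprocal pairs.
roots = np.roots([1.0, g0 / g1, 1.0])
m = roots[np.abs(roots) < 1][0].real     # the root inside the unit circle

# prod_r (z - m_r) = z - m, so beta_0 = 1 and beta_1 = -m.
beta = np.array([1.0, -m])
sigma2 = g0 / np.sum(beta**2)            # sigma^2 = gamma(0)[sum beta_j^2]^{-1}

# Check: the MA(1) X_t = e_t + beta_1 e_{t-1} reproduces the covariances.
check0 = sigma2 * (beta[0]**2 + beta[1]**2)
check1 = sigma2 * beta[0] * beta[1]
```

Here the root inside the unit circle is −0.5, giving β₁ = 0.5 and σ² = 0.8, and the implied moving average covariances match γ(0) and γ(1).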
By our earlier results in Fourier series we know that a continuous periodic function can be approximated by a trigonometric polynomial. This means that a time series with a continuous spectral density is "very nearly" a moving average process and also "very nearly" an autoregressive process.
Theorem 4.3.3. Let g(ω) be a nonnegative even continuous periodic function of period 2π. Then, given an ε > 0, there is a time series with the representation

X_t = Σ_{j=0}^{q} β_j e_{t−j}

such that |f_X(ω) − g(ω)| < ε for all ω ∈ [−π, π], where β_0 = 1, q is a finite integer, the e_t are uncorrelated (0, σ²) random variables, and the roots of

Σ_{j=0}^{q} β_j m^{q−j} = 0

are less than one in absolute value.
Proof. The result is trivial if g(ω) ≡ 0; hence, we assume g(ω) > 0 for some ω. Given ε > 0, let δ = 2εG(4πM + 3G)^{−1} and set

C(ω) = g(ω) if g(ω) > δ,  C(ω) = δ otherwise,

where

M = max_ω g(ω),  G = ∫_{−π}^{π} g(ω) dω.
Then by Theorem 3.1.10, there is a q such that |f_U(ω) − C(ω)| < ½δ for all ω ∈ [−π, π], where

f_U(ω) = (2π)^{−1} Σ_{h=−q}^{q} γ(h) e^{−iωh}

and

γ(h) = ∫_{−π}^{π} C(ω) e^{iωh} dω.

By Exercise 4.14, γ(h) is positive semidefinite, and f_U(ω) is of the same form as the spectral density in Theorem 4.3.2. Therefore, we may write

f_U(ω) = (2π)^{−1} σ² |Σ_{j=0}^{q} β_j e^{−iωj}|²,

where β_0 = 1, the roots of Σ_{j=0}^{q} β_j m^{q−j} = 0 are less than one in absolute value, and σ² = [Σ_{j=0}^{q} β_j²]^{−1} ∫_{−π}^{π} C(ω) dω. Setting
X_t = Σ_{j=0}^{q} β_j e_{t−j}, where the e_t are uncorrelated (0, σ²) random variables, we obtain the desired time series. ∎
Theorem 4.3.4. Let g(ω) be a nonnegative even continuous periodic function of period 2π. Then, given an ε > 0, there is a time series with the representation

Σ_{j=0}^{p} α_j X_{t−j} = e_t

such that |f_X(ω) − g(ω)| < ε for all ω ∈ [−π, π], where α_0 = 1, p is a finite integer, the roots of

Σ_{j=0}^{p} α_j m^{p−j} = 0

are less than one in absolute value, and the e_t are uncorrelated (0, σ²) random variables.
Proof. The result is trivial if g(ω) ≡ 0. Assuming g(ω) > 0 for some ω, define

Also define G = max_ω g(ω), and let 0 < δ < ε[G(2G + ε)]^{−1}. Then, by Theorem 3.1.10, there is a finite p such that

for all ω ∈ [−π, π]. By Theorem 4.3.2,

where α_0 = 1, M is a constant, and the roots of Σ_{j=0}^{p} α_j m^{p−j} = 0 are less than one in absolute value. Hence, by defining

and σ² = 2πM^{−1}, we have the conclusion. ∎
Operations on a time series are sometimes called filtering. In some areas, particularly in electronics, physical devices are constructed to filter time series. Examples include the modification of radio waves and radar signals. The creation of a linearly weighted sum is the application of a linear filter, the weights being identified as the filter. If, as in (4.3.1), the weights are constant over time, the filter
is called time invariant.³ Obviously such operations change the behavior of the time series, and in practice this is often the objective. By investigating the properties of the filter, one is able to make statements about the change in behavior.
It is clear that moving average and autoregressive processes are obtained from a white noise process by the application of a linear time invariant filter. Thus, using this terminology, we have been studying the effects of a linear time invariant filter on the spectral density of a time series. To further illustrate the effects of the application of a linear filter, consider the
simple time series

X_t = a[e^{i(ωt+θ)} + e^{−i(ωt+θ)}] = 2a cos(ωt + θ),

and apply the absolutely summable linear filter {a_j} to obtain

Y_t = Σ_{j=−∞}^{∞} a_j X_{t−j}.

We may express the Fourier transform of {a_j} in a general complex form,

2π f_a(ω) = Σ_{j=−∞}^{∞} a_j e^{−iωj} = u(ω) + iv(ω)
          = 𝒢(ω)[cos φ(ω) + i sin φ(ω)] = 𝒢(ω) e^{iφ(ω)},   (4.3.11)

where

𝒢(ω) = [u²(ω) + v²(ω)]^{1/2},  φ(ω) = tan^{−1}[v(ω)/u(ω)].

The reader should note the convention used in defining the sign of v(ω)/u(ω): v(ω) is the coefficient of i (not −i) in the original transform.
In this notation, the filtered cosine wave becomes

Y_t = a𝒢(ω)[e^{i[ωt+θ+φ(ω)]} + e^{−i[ωt+θ+φ(ω)]}]
    = 2a𝒢(ω) cos[ωt + θ + φ(ω)].

Thus, the process Y_t is also a cosine function with the original period, but with amplitude 2a𝒢(ω) and phase angle θ + φ(ω). The function 𝒢(ω) is called the
³ We may define the time invariant property as follows. If the output of a filter g applied to X_t is Y_t, then the filter is time invariant if g(X_{t+h}) = Y_{t+h} for all integer h.
gain of the filter, and the function φ(ω) is the phase angle (or simply phase) of the filter. The rationale for these terms is clear when one observes the effect of the filter on the cosine time series. The transform 2πf_a(ω) is called the frequency response function or the transfer function of the filter. From (4.3.1) we know that the spectrum of a time series Y_t created by linearly filtering a time series X_t is the spectrum of the original series, f_X(ω), multiplied by the squared gain of the filter. The squared gain is sometimes called the power transfer function.
A particularly simple filter is the perfect delay. Let {X_t: t ∈ (0, ±1, ±2, …)} be a stationary time series. The time series delayed or lagged by τ periods is defined by

Y_t = X_{t−τ},

where the filter is defined by a_τ = 1 and a_j = 0, j ≠ τ. Hence the frequency response function of the filter is e^{−iωτ} = cos ωτ − i sin ωτ, the gain is (cos² ωτ + sin² ωτ)^{1/2} = 1, and the phase angle is tan^{−1}(−sin ωτ/cos ωτ) = −ωτ. A cosine wave of frequency ω and corresponding period 2πω^{−1} completes (2π)^{−1}ωτ cycles during a "time period" of length τ. Thus the phase of such a cosine wave lagged τ periods is shifted by the quantity (2π)^{−1}ωτ.
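The gain and phase of the delay filter can be confirmed numerically; the lag τ = 3 below is an arbitrary illustrative choice.

```python
import numpy as np

# Perfect delay Y_t = X_{t-tau}: the single weight a_tau = 1 gives
# frequency response e^{-i w tau}, gain 1, and phase -w*tau.
tau = 3
w = np.linspace(0.01, np.pi - 0.01, 200)   # interior frequencies

response = np.exp(-1j * w * tau)           # frequency response function
gain = np.abs(response)                    # should be identically 1
phase = np.unwrap(np.angle(response))      # unwrap to recover -w*tau
```

Unwrapping is needed only because the principal-value angle wraps at −π; the recovered phase is exactly the line −ωτ.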
Theorem 4.3.1 is a very useful theorem for absolutely summable filters. In some situations, a mean square result applicable to a broader range of filters and time series is appropriate.
Theorem 4.3.5. Assume X_t is a zero mean stationary time series with spectral density f_X(ω). Assume {c_j}_{j=−∞}^{∞} is a square summable sequence and that

∫_{−π}^{π} |Σ_{j=−∞}^{∞} c_j e^{−iωj}|² f_X(ω) dω   (4.3.12)

converges. Define

Y_t = Σ_{j=−∞}^{∞} c_j X_{t−j},  t = 0, ±1, ±2, … .

Then Y_t is defined as a limit in mean square, and

γ_Y(h) = ∫_{−π}^{π} e^{iωh} f_Y(ω) dω,

where

f_Y(ω) = |Σ_{j=−∞}^{∞} c_j e^{−iωj}|² f_X(ω).   (4.3.13)
Proof. Because (4.3.12) converges, we can interchange orders of integration and summation, and we have

Because of the one-to-one correspondence in mean square between the Fourier coefficients of a square integrable function and the function, f_Y(ω) is the function specified in (4.3.13). ∎
In Section 2.11, we introduced the long memory processes that satisfy

(1 − ℬ)^d Y_t = Z_t   (4.3.14)

or

Y_t = (1 − ℬ)^{−d} Z_t,   (4.3.15)

where Z_t is a finite autoregressive moving average and ℬ is the backshift operator. By Theorem 3.1.6, there is a function f_Y(ω) defined as a limit in mean square,

f_Y(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ_Y(h) e^{−iωh},

because γ_Y(h) is square summable. Also,

γ_Y(h) = ∫_{−π}^{π} f_Y(ω) e^{iωh} dω.

Consider the simple case in which the Z_t of (4.3.14) is e_t, a time series of uncorrelated (0, σ²) random variables. Then, by Theorem 4.3.5,

f_Y(ω) = (1 − e^{−iω})^{−d}(1 − e^{iω})^{−d} f_e(ω) = (2π)^{−1} |1 − e^{iω}|^{−2d} σ² = (2π)^{−1} [4 sin²(0.5ω)]^{−d} σ².

Observe that f_Y(ω) is unbounded at ω = 0 when d > 0. Because x^{−1} sin x → 1 as x → 0,

f_Y(ω) ≈ (2π)^{−1} ω^{−2d} σ²
as ω → 0. Sometimes a long memory process is defined to be a process whose spectral density is approximately equal to a multiple of ω^{−2d} for −0.5 < d < 0.5 and ω near zero. It follows from the spectral density that the covariance function of Y_t is not absolutely summable when 0 < d < 0.5.
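The closeness of the exact long memory spectral density to the low-frequency approximation (2π)^{−1}ω^{−2d}σ² can be illustrated numerically; the value d = 0.3 below is an arbitrary choice in (0, 0.5).

```python
import numpy as np

# Exact density (2*pi)^{-1} [4 sin^2(w/2)]^{-d} sigma^2 versus the
# approximation (2*pi)^{-1} w^{-2d} sigma^2 near w = 0.
d, sigma2 = 0.3, 1.0

def f_exact(w):
    return sigma2 / (2 * np.pi) * (4 * np.sin(0.5 * w) ** 2) ** (-d)

def f_approx(w):
    return sigma2 / (2 * np.pi) * w ** (-2 * d)

# The ratio tends to one as w -> 0 because (sin x)/x -> 1.
ratio_small = f_exact(1e-4) / f_approx(1e-4)
ratio_large = f_exact(2.0) / f_approx(2.0)
```

At ω = 10⁻⁴ the two densities are indistinguishable, while away from zero the approximation is noticeably off, which is why the definition refers only to ω near zero.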
4.4. VECTOR PROCESSES
The spectral representation of vector time series follows in a straightforward manner from that of scalar time series. We denote the cross covariance of two zero mean stationary time series X_jt and X_mt by γ_jm(h) = {Γ(h)}_{jm} = E{X_jt X_m,t+h} and assume that {γ_jm(h)} is absolutely summable. Then

f_jm(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ_jm(h) e^{−iωh}   (4.4.1)

is a continuous periodic function of ω, which we call the cross spectral function of X_jt and X_mt. Since γ_jm(h) may not be symmetric about 0, f_jm(ω) is, in general, a complex valued function. As such, it can be written as

f_jm(ω) = c_jm(ω) − i q_jm(ω),

where c_jm(ω) and q_jm(ω) are real valued functions of ω. The function c_jm(ω) is called the coincident spectral density, or simply the cospectrum. The function q_jm(ω) is called the quadrature spectral density. The function c_jm(ω) is the cosine portion of the transform and is an even function of ω, and q_jm(ω) is the sine portion and is an odd function of ω. Thus we may define these quantities as the transforms

c_jm(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} ½[γ_jm(h) + γ_jm(−h)] e^{−iωh},
q_jm(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} ½ i[γ_jm(h) − γ_jm(−h)] e^{−iωh},

or, in real terms,

c_jm(ω) = (2π)^{−1} {γ_jm(0) + Σ_{h=1}^{∞} [γ_jm(h) + γ_jm(−h)] cos ωh},
q_jm(ω) = (2π)^{−1} Σ_{h=1}^{∞} [γ_jm(h) − γ_jm(−h)] sin ωh.   (4.4.2)
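For a concrete illustration of the cospectrum and quadrature spectrum, take X_jt to be white noise and X_mt its one-period lag, so that the only nonzero cross covariance is at h = 1 (the variance σ² = 2 below is illustrative).

```python
import numpy as np

# X_t white noise with variance sigma2, Y_t = X_{t-1}:
# gamma_XY(h) = sigma2 at h = 1, zero otherwise, so
# f_XY(w) = (2*pi)^{-1} sigma2 e^{-iw} = c_XY(w) - i q_XY(w).
sigma2 = 2.0
w = np.linspace(-np.pi, np.pi, 301)

f_xy = sigma2 / (2 * np.pi) * np.exp(-1j * w)
c_xy = f_xy.real      # cospectrum: even in w
q_xy = -f_xy.imag     # quadrature spectrum: odd in w
```

The computed cospectrum is σ²(2π)⁻¹cos ω (even) and the quadrature spectrum is σ²(2π)⁻¹sin ω (odd), matching the transforms above with γ(1) the only nonzero covariance.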
By the Fourier integral theorem,

γ_jm(h) = ∫_{−π}^{π} f_jm(ω) e^{iωh} dω.

If we let f(ω) denote the matrix with typical element f_jm(ω), we have the matrix representations

Γ(h) = ∫_{−π}^{π} e^{iωh} f(ω) dω,   (4.4.3)

f(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} Γ(h) e^{−iωh}.   (4.4.4)

For a general stationary time series we can write

Γ(h) = ∫_{−π}^{π} e^{iωh} dF(ω)

in complete analogy to (4.1.1a). Let us investigate some of the properties of the matrix f(ω).
Definition 4.4.1. A square complex valued matrix B is called a Hermitian matrix if it is equal to its conjugate transpose, that is,
B=B*,
where thejmth element of B* is the complex conjugate of the mjth element of B.
Definition 4.4.2. A Hermitian matrix B is positive definite if for any complex vector w such that w*w > 0,

w*Bw > 0,

and it is positive semidefinite if

w*Bw ≥ 0.
Lemma 4.4.1. For stationary vector time series of dimension k satisfying

Σ_{h=−∞}^{∞} |γ_jm(h)| < ∞

for j, m = 1, 2, …, k, the matrix f(ω) is a positive semidefinite Hermitian matrix for all ω in [−π, π].
Proof. The matrix f(ω) is Hermitian, since, for all ω in [−π, π] and for j, m = 1, 2, …, k,

f_jm(ω) = (2π)^{−1} Σ_{h=−∞}^{∞} γ_jm(h) e^{−iωh} = (2π)^{−1} Σ_{h=−∞}^{∞} γ_mj(−h) e^{−iωh} = f̄_mj(ω).
Consider the complex valued time series
Z_t = a*X_t = Σ_{j=1}^{k} ā_j X_jt,

where a′ = (a_1, a_2, …, a_k) is a vector of arbitrary complex numbers. The autocovariance function of Z_t is given by
Now γ_Z(h) is positive semidefinite, and hence, for any n,
and
Taking the limit as n → ∞, we have

which establishes that f(ω) is positive semidefinite. ∎
It follows immediately from Lemma 4.4.1 that the determinant of any two by two matrix of the form

[ f_jj(ω)  f_jm(ω) ]
[ f_mj(ω)  f_mm(ω) ]

is nonnegative. Hence,

|f_jm(ω)|² ≤ f_jj(ω) f_mm(ω).   (4.4.6)

The quantity

𝒦²_jm(ω) = |f_jm(ω)|² [f_jj(ω) f_mm(ω)]^{−1}   (4.4.7)

is called the squared coherency function. The spectral density may be zero at certain frequencies, in which case 𝒦²_jm(ω) is of the form 0/0. We adopt the convention of assigning zero to the coherency in such situations. The inequality (4.4.6), sometimes written as

𝒦²_jm(ω) ≤ 1,   (4.4.8)
is called the coherency inequality.
To further appreciate the properties of f(ω), consider the time series

X_1t = A_1 cos rt + B_1 sin rt,
X_2t = A_2 cos rt + B_2 sin rt,   (4.4.9)

where r ∈ (0, π) and (A_1, B_1, A_2, B_2)′ is distributed as a multivariate normal with zero mean and covariance matrix

[ σ_11    0     σ_13   σ_14 ]
[   0    σ_11  −σ_14   σ_13 ]
[ σ_13  −σ_14   σ_22     0  ]
[ σ_14   σ_13     0    σ_22 ].

Then

γ_11(h) = σ_11 cos rh,  γ_22(h) = σ_22 cos rh,

γ_12(h) = E{A_1 A_2 [cos rt] cos r(t + h) + A_1 B_2 [cos rt] sin r(t + h) + B_1 A_2 [sin rt] cos r(t + h) + B_1 B_2 [sin rt] sin r(t + h)}
        = σ_13 cos rh + σ_14 sin rh,

and γ_21(h) = γ_12(−h).
The matrix analog of the spectral distribution function introduced in Section 4.1 is the matrix F(ω) with typical element F_jm(ω),
where F_jm(ω) is the cross spectral distribution function. For the example (4.4.9), F_11(ω) is a step function with jumps of height ½σ_11 at ±r, F_22(ω) is a step function with jumps of height ½σ_22 at ±r, and F_12(ω) is a complex valued function where Re F_12(ω) is a step function with a jump of height ½σ_13 at ±r, and Im F_12(ω) is a step function with a jump of +½σ_14 at −r and a jump of −½σ_14 at r. Since the elements of F(ω) are pure jump functions, we have a pure line spectrum. The real portion of the cross line spectrum is one-half the covariance between the coefficients of the cosine portions of the two time series, which is also one-half the covariance between the coefficients of the sine portions. The absolute value of the imaginary portion of the cross line spectrum is one-half the covariance between the coefficient of the cosine portion of the first time series and the coefficient of the sine portion of the second time series. This is one-half the negative of the covariance between the coefficient of the sine of the first time series and the coefficient of the cosine of the second. To consider the point further, we write the time series X_2t as a sine wave,
X_2t = ψ sin(rt + φ),

where

ψ = (A_2² + B_2²)^{1/2},  φ = tan^{−1}(A_2/B_2).

Let X_3t be the cosine with the same amplitude and phase,

X_3t = ψ cos(rt + φ) = B_2 cos rt − A_2 sin rt.

It follows that X_2t is uncorrelated with X_3t. The covariance between X_1t and X_2t is

E{X_1t X_2t} = E{(A_1 cos rt + B_1 sin rt)(A_2 cos rt + B_2 sin rt)} = σ_13(cos² rt + sin² rt) = σ_13,

and the covariance between X_1t and X_3t is

E{X_1t X_3t} = E{(A_1 cos rt + B_1 sin rt)(B_2 cos rt − A_2 sin rt)} = σ_14.

The covariance between X_1t and X_2t is proportional to the real part of the cross spectrum at r. The covariance between X_1t and X_3t is proportional to the imaginary
portion of the cross spectrum at r. The fact that X_3t is X_2t "shifted" by a phase angle of π/2 explains why the real portion of the cross spectrum is called the cospectrum and the imaginary part is called the quadrature spectrum. The squared coherency is the multiple correlation between X_1t and the pair X_2t, X_3t, that is,
We now introduce some cross spectral quantities useful in the analysis of input-output systems. Let the bivariate time series (X_t, Y_t)′ have absolutely summable covariance function. We may then write the cross spectral function as

f_XY(ω) = A_XY(ω) e^{iφ_XY(ω)},   (4.4.10)

where

A_XY(ω) = [c²_XY(ω) + q²_XY(ω)]^{1/2},  φ_XY(ω) = tan^{−1}[−q_XY(ω)/c_XY(ω)].

We use the convention of setting φ_XY(ω) = 0 when both c_XY(ω) and q_XY(ω) are zero. The quantity φ_XY(ω) is called the phase spectrum, and A_XY(ω) is called the cross amplitude spectrum. The gain of Y_t over X_t is defined by

𝒢_YX(ω) = A_XY(ω)[f_XX(ω)]^{−1}   (4.4.11)

for those ω where f_XX(ω) > 0.
Let us assume that an absolutely summable linear filter {a_j} is applied to an input time series X_t with zero mean, absolutely summable covariance function, and everywhere positive spectral density to yield an output time series

Y_t = Σ_{j=−∞}^{∞} a_j X_{t−j}.   (4.4.12)
The cross covariance function is

γ_XY(h) = E{X_t Y_{t+h}} = Σ_{j=−∞}^{∞} a_j γ_XX(h − j),

and, by Corollary 3.4.1.1, the cross spectral function is

f_XY(ω) = 2π f_a(ω) f_XX(ω),   (4.4.13)

where

f_a(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} a_j e^{−iωj}.

It follows that f_XY(ω)[f_XX(ω)]^{−1} is the transfer function of the filter. Recall that the phase φ(ω) and gain 𝒢(ω) of the filter {a_j} were defined by

2π f_a(ω) = 𝒢(ω) e^{iφ(ω)}.

Since f_XY(ω) is the product of 2π f_a(ω) and f_XX(ω), where f_XX(ω) is a real function of ω, it follows that the phase spectrum is the same as the phase of the filter. The cross amplitude spectrum is the product of the spectrum of the input time series and the gain of the filter. That is,

A_XY(ω) = 𝒢(ω) f_XX(ω),

and the gain of Y_t over X_t is simply the gain of the filter. By Theorem 4.3.1, the spectral density of Y_t is

f_YY(ω) = 𝒢²(ω) f_XX(ω),

and it follows that the squared coherency, for |f_XY(ω)| > 0, is

𝒦²_XY(ω) = |f_XY(ω)|² [f_XX(ω) f_YY(ω)]^{−1} = 𝒢²(ω) f²_XX(ω)[𝒢²(ω) f²_XX(ω)]^{−1} = 1.
This is an interesting result, because it shows that the squared coherency between an output time series created by the application of a linear filter and the original time series is one at all frequencies. The addition of an error (noise) time series to the output will produce a coherency less than one in a linear system. For example, consider the bivariate time series

X_1t = β X_{1,t−1} + e_1t,
X_2t = a_0 X_1t + a_1 X_{1,t−1} + e_2t,   (4.4.16)

where |β| < 1 and {(e_1t, e_2t)′} is a sequence of uncorrelated vector random variables with E{e²_1t} = σ_11, E{e²_2t} = σ_22, and E{e_1t e_2,t+h} = 0 for all t and h.
The input time series X_1t is a first order autoregressive time series, and the output X_2t is a linear function of X_1t, X_{1,t−1}, and e_2t. The autocovariance and cross
covariance functions are therefore easily computed:
where g_11(ω) is the spectral density of e_1t and g_22(ω) is the spectral density of e_2t; that is,
Table 4.4.1. Autocovariance and Cross Covariance Functions of the Time Series Defined in (4.4.16)

  h    γ_11(h)  γ_22(h)  γ_12(h)  γ_12(h)[γ_11(0)γ_22(0)]^{−1/2}
 −6    0.262    0.596    0.341    0.213
 −5    0.328    0.745    0.426    0.267
 −4    0.410    0.932    0.532    0.333
 −3    0.512    1.165    0.666    0.417
 −2    0.640    1.456    0.832    0.521
 −1    0.800    1.820    1.040    0.651
  0    1.000    2.550    1.300    0.814
  1    0.800    1.820    1.400    0.877
  2    0.640    1.456    1.120    0.701
  3    0.512    1.165    0.896    0.561
  4    0.410    0.932    0.717    0.449
  5    0.328    0.745    0.573    0.359
  6    0.262    0.596    0.459    0.287
The cospectrum and quadrature spectral density are
the phase spectrum and cross amplitude spectrum are
and the squared coherency is
where

r(ω) = g_22(ω)[(2π)² f_a(ω) f̄_a(ω) f_11(ω)]^{−1}.

Since g_22(ω) is positive at all frequencies, the squared coherency is strictly less than one at all frequencies. The quantity r(ω) is sometimes called the noise to signal ratio in physical applications. Although the presence of noise reduces the squared coherency, the ratio of the cross spectrum to the spectrum of the input series still gives the transfer function of the filter. This is because the time series e_2t is uncorrelated with the input X_1t. The quantity

f_22(ω) − |f_12(ω)|² [f_11(ω)]^{−1}

is sometimes called the error spectral density or error spectrum. We see that for a model such as (4.4.16), the error spectrum is g_22(ω).
For the example of Table 4.4.1 the elements of the spectral matrix are
It is interesting that the spectral density of X_2t is that of an autoregressive moving average (1,1) process. Note that if X_2t had been defined as the simple sum of X_1t and e_2t, the spectral density would also have been that of an autoregressive moving average (1,1) process but with different parameters.
The noise to signal ratio is

r(ω) = g_22(ω)[(2π)² f_a(ω) f̄_a(ω) f_11(ω)]^{−1} = (1 − 0.8e^{−iω})(1 − 0.8e^{iω})[(0.72)(0.5 + e^{−iω})(0.5 + e^{iω})]^{−1},

and the squared coherency is

𝒦²_12(ω) = (0.45 + 0.36 cos ω)(1.27 − 0.44 cos ω)^{−1}.
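The closed form for the squared coherency can be verified numerically from the model (4.4.16), using the parameter values implied by Table 4.4.1 (β = 0.8, a₀ = 0.5, a₁ = 1, σ₁₁ = 0.36, σ₂₂ = 0.5; these values are inferred from the table rather than restated here):

```python
import numpy as np

beta, s11, s22 = 0.8, 0.36, 0.5
w = np.linspace(-np.pi, np.pi, 501)
z = np.exp(-1j * w)

f11 = s11 / (2 * np.pi * np.abs(1 - beta * z) ** 2)   # AR(1) input spectrum
H = 0.5 + z                                           # filter transfer function
f12 = H * f11                                         # cross spectrum
f22 = np.abs(H) ** 2 * f11 + s22 / (2 * np.pi)        # output spectrum (+ noise)
K2 = np.abs(f12) ** 2 / (f11 * f22)                   # squared coherency

closed_form = (0.45 + 0.36 * np.cos(w)) / (1.27 - 0.44 * np.cos(w))
```

The pointwise agreement confirms the displayed formula, and the computed coherency is strictly below one at every frequency because g₂₂(ω) > 0.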
Let us consider the example a bit further. The input time series X_1t is autocorrelated. Let us filter both the input and output with the same filter, choosing the filter so that the input time series becomes a sequence of uncorrelated random variables. Thus we define

X_3t = X_1t − β X_{1,t−1} = e_1t,
X_4t = X_2t − β X_{2,t−1},

and it follows that

X_4t = a_0 X_3t + a_1 X_{3,t−1} + e_2t − β e_2,t−1.

The cross covariance function of X_3t and X_4t then has a particularly simple form:

γ_34(h) = a_0 σ_11, h = 0;  a_1 σ_11, h = 1;  0 otherwise.

By transforming the input series to white noise, the cross covariance function is transformed into the coefficients of the function (or linear filter) that defines X_4t as a function of X_3t. The spectral matrix of X_3t and X_4t has elements
The reader may verify that

φ_34(ω) = tan^{−1}[−a_1 sin ω (a_0 + a_1 cos ω)^{−1}].
As we expected from (4.4.11) and (4.4.13), the phase spectrum is unchanged by the transformation, since we transformed both input and output with the same filter. The filter changed the input spectrum, and as a result, that portion of the cross amplitude spectrum associated with the input was altered. Also, the error spectral density

(2π)^{−1} σ_22 (1 − 2β cos ω + β²)

is that of a moving average with parameter −β. The reader may verify that 𝒦²_34(ω) is the same as 𝒦²_12(ω).
We have introduced and illustrated the ideas of phase spectrum and amplitude spectrum using the input-output model. Naturally, these quantities can be computed, in much the same manner as we compute the correlation for a bivariate normal distribution, without recourse to this model. The same is true of squared coherency and error spectrum, which have immediate generalizations to higher dimensions.
The effect of the application of a matrix filter to a vector time series is summarized in Theorem 4.4.1.
Theorem 4.4.1. Let X_t be a real k-dimensional stationary time series with absolutely summable covariance matrix and let {A_j}_{j=−∞}^{∞} be a sequence of real k × k matrices that is absolutely summable. Then the spectral density of

Y_t = Σ_{j=−∞}^{∞} A_j X_{t−j}

is

f_Y(ω) = (2π)² f_A(ω) f_X(ω) f*_A(ω),

where

f_A(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} A_j e^{−iωj}

and f*_A(ω) is the conjugate transpose of f_A(ω),

f*_A(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} A_j′ e^{iωj}.
Proof. The proof parallels that of Theorem 4.3.1. We have
and the result follows. ∎
As in the scalar case, the spectral matrix of autoregressive and moving average time series follows immediately from Theorem 4.4.1.
Corollary 4.4.1.1. The spectral density of the moving average process

X_t = Σ_{j=−∞}^{∞} B_j e_{t−j},

where {e_t} is a sequence of uncorrelated (0, Σ) vector random variables and the sequence {B_j} is absolutely summable, is

f_X(ω) = (2π)² f_B(ω)[(2π)^{−1} Σ] f*_B(ω) = 2π f_B(ω) Σ f*_B(ω).
Corollary 4.4.1.2. Define the vector autoregressive process X_t by

Σ_{j=0}^{p} A_j X_{t−j} = e_t,

where A_0 = I, A_p ≠ 0, the e_t are uncorrelated (0, Σ) vector random variables, and the roots of

|Σ_{j=0}^{p} A_j m^{p−j}| = 0

are less than one in absolute value. Then the spectral density of X_t is

f_X(ω) = (2π)^{−3} [f_A(ω)]^{−1} Σ [f*_A(ω)]^{−1},

where f_A(ω) is defined in Theorem 4.4.1.
4.5. MEASUREMENT ERROR - SIGNAL DETECTION
In any statistical model the manner in which the "errors" enter the model is very important. In models of the "input-output" or "independent-variable-dependent-variable" form, measurement error in the output or dependent variable is relatively easy to handle. On the other hand, measurement error in the input or independent variable typically introduces additional complexity into the analysis. In the simple regression model with normal errors, the presence of normal measurement error in the independent variable requires additional information, such as knowledge of the variance of the measurement error, before one can estimate the slope of the regression line.
In time series analysis the distinction between independent variable and dependent variable may be blurred. For example, in predicting a future observation in the realization, the past observations play the role of independent variables, while the future observation plays the role of (unknown) dependent variable. As a result, considerable care is required in specifying and treating measurement errors in such analyses. One important problem where errors of observation play a central role is the estimation of the values of an underlying time series that is observed with measurement error.
To introduce the problem, let {X_t: t ∈ (0, ±1, ±2, …)} be a stationary time series with zero mean. Because of measurement error we do not observe X_t directly. Instead, we observe

Y_t = X_t + u_t,   (4.5.1)

where {u_t: t ∈ (0, ±1, ±2, …)} is a time series with u_t independent of X_j for all t, j. The u_t are the measurement errors or the noise in the system. A problem of signal measurement or signal detection is the construction of an estimator of X_t given a realization on Y_t (or a portion of the realization). We assume the covariance functions of X_t and u_t are known.
We first consider the problem in the time domain and restrict ourselves to finding a linear filter that minimizes the mean square error of our estimator of X_t. Thus we desire weights {a_j: j = −L, −(L − 1), …, M − 1, M}, for L ≥ 0 and M ≥ 0 fixed, such that

E{[X_t − Σ_{j=−L}^{M} a_j Y_{t−j}]²}   (4.5.2)

is a minimum. We set the derivatives of (4.5.2) with respect to the a_j equal to zero and obtain the system of equations

Σ_{j=−L}^{M} a_j [γ_XX(j − r) + γ_uu(j − r)] = γ_XX(r),  r = −L, −(L − 1), …, M − 1, M.   (4.5.3)
For modest choices of L and M this system of linear equations is easily solved for the a_j. The mean square error of the estimator can then be obtained from (4.5.2).
To obtain the set of weights one would use if one had available the entire realization, we investigate the problem in the frequency domain. This permits us to establish the bound on the mean square error and to use this bound to evaluate the performance of a filter with a finite number of weights. We assume that u_t and X_t have continuous spectral densities with bounded derivatives. The spectral density of Y_t and the cross spectral density follow immediately from the independence:

f_YY(ω) = f_XX(ω) + f_uu(ω),
f_XY(ω) = f_XX(ω).

We assume that f_YY(ω) is strictly positive and that f_YY^{−1}(ω) has bounded first derivative.
We shall search in the class of absolutely summable filters {a_j} for that filter such that our estimator of X_t,

X̂_t = Σ_{j=−∞}^{∞} a_j Y_{t−j},

has minimum mean square error. Now the mean square error is

E{[X_t − Σ_{j=−∞}^{∞} a_j Y_{t−j}]²},   (4.5.4)

and W_t = X_t − Σ_{j=−∞}^{∞} a_j Y_{t−j} is a stationary time series with a spectral density, say f_WW(ω). Hence, the variance of W_t is

γ_WW(0) = E{W_t²} = ∫_{−π}^{π} f_WW(ω) dω
        = ∫_{−π}^{π} {f_XX(ω) − 2π[f_a(ω) f_YX(ω) + f̄_a(ω) f_XY(ω)] + (2π)² f_a(ω) f̄_a(ω) f_YY(ω)} dω,   (4.5.5)

where f_a(ω) = (2π)^{−1} Σ_{j=−∞}^{∞} a_j e^{−iωj}.
We have converted the problem of finding the a_j that minimize (4.5.4) to the problem of finding the f_a(ω) that minimizes (4.5.5). This problem retains the same form and appearance as the classical regression problem with f_a(ω) playing the role of the vector of coefficients. Therefore, we take as our candidate solution

2π f_a(ω) = [f_YY(ω)]^{−1} f_YX(ω),   (4.5.6)

which gives

∫_{−π}^{π} {f_XX(ω) − [f_YY(ω)]^{−1} |f_XY(ω)|²} dω   (4.5.7)

for the mean square error. The weights a_j are given by the inverse transform of f_a(ω),

a_j = ∫_{−π}^{π} f_a(ω) e^{iωj} dω,  j = 0, ±1, ±2, … .

To demonstrate that these weights yield the minimum value for the mean square error, we consider an alternative to (4.5.6):

2π f_b(ω) = 2π f_a(ω) + 2π f_d(ω),

where f_b(ω) must be in the class of functions such that the integral defining the mean square error exists. The mean square error is then

∫_{−π}^{π} {f_XX(ω) − 2π[f_b(ω) f_YX(ω) + f̄_b(ω) f_XY(ω)] + (2π)² f_b(ω) f̄_b(ω) f_YY(ω)} dω
= ∫_{−π}^{π} {f_XX(ω) − [f_YY(ω)]^{−1} |f_XY(ω)|²} dω + ∫_{−π}^{π} (2π)² f_d(ω) f̄_d(ω) f_YY(ω) dω.   (4.5.8)

Since f_d(ω) f̄_d(ω) f_YY(ω) is nonnegative, we conclude that the f_a(ω) of (4.5.6) yields the minimum value for (4.5.5).
Note that 2π f_a(ω) = [f_YY(ω)]^{−1} f_YX(ω) is a real valued symmetric function of ω because f_YX(ω) = f_XX(ω) is a real valued symmetric function of ω. Therefore, the weights a_j are also symmetric about zero.
We summarize in Theorem 4.5.1.
Theorem 4.5.1. Let {X_t: t ∈ (0, ±1, ±2, …)} and {u_t: t ∈ (0, ±1, ±2, …)} be independent zero mean stationary time series, and define Y_t = X_t + u_t. Let f_YY(ω), f_YY^{−1}(ω), and f_XX(ω) be continuous with bounded first derivatives. Then the best linear filter for extracting X_t from a realization of Y_t is given by

a_j = ∫_{−π}^{π} f_a(ω) e^{iωj} dω,

where

2π f_a(ω) = [f_YY(ω)]^{−1} f_XX(ω).

Furthermore, the mean square error of Σ_{j=−∞}^{∞} a_j Y_{t−j} as an estimator for X_t is

∫_{−π}^{π} {f_XX(ω) − [f_YY(ω)]^{−1} |f_XY(ω)|²} dω = ∫_{−π}^{π} f_XX(ω)[1 − 𝒦²_XY(ω)] dω.

Proof. Since the derivatives of f_YY^{−1}(ω) and f_YX(ω) are bounded, the derivative of f_YY^{−1}(ω) f_YX(ω) is square integrable. Therefore, by Theorem 3.1.8, the Fourier coefficients of f_YY^{−1}(ω) f_YX(ω), the a_j, are absolutely summable, and by Theorem 2.2.1, Σ_{j=−∞}^{∞} a_j Y_{t−j} is a well-defined random variable. That {a_j} is the best linear filter follows from (4.5.8), and the mean square error of the filter follows from (4.5.7). ∎
Example 4.5.1. To illustrate the ideas of this section, we use some data on the sediment suspended in the water of the Des Moines River at Boone, Iowa. A portion of the data obtained by daily sampling of the water during 1973 are displayed in Table 4.5.1. The data are the logarithm of the parts per million of suspended sediment. Since the laboratory determinations are made on a small sample of water collected from the river, the readings can be represented as

Y_t = X_t + u_t,

where Y_t is the recorded value, X_t is the "true" average sediment in the river water, and u_t is the measurement error introduced by sampling and laboratory determination. Assume that X_t can be represented as a first order autoregressive process

X_t − 5.28 = 0.81(X_{t−1} − 5.28) + e_t,

where the e_t are independent (0, 0.172) random variables. Assume further that u_t is a sequence of independent (0, 0.053) random variables independent of X_j for all t, j.
To construct a filter {a_{−2}, a_{−1}, a_0, a_1, a_2} that will best estimate X_t using the observations {Y_{t−2}, Y_{t−1}, Y_t, Y_{t+1}, Y_{t+2}}, we solve the system of equations
[0.553 0.405 0.328 0.266 0.215] [a_{−2}]   [0.328]
[0.405 0.553 0.405 0.328 0.266] [a_{−1}]   [0.405]
[0.328 0.405 0.553 0.405 0.328] [ a_0  ] = [0.500]
[0.266 0.328 0.405 0.553 0.405] [ a_1  ]   [0.405]
[0.215 0.266 0.328 0.405 0.553] [ a_2  ]   [0.328]

to obtain

a = (a_{−2}, a_{−1}, a_0, a_1, a_2) = (0.023, 0.120, 0.702, 0.120, 0.023).
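The system above is easily solved in a few lines of code; here the covariances are generated from ρ = 0.81, σ_e² = 0.172, σ_u² = 0.053 rather than copied from the rounded display, so the recovered weights match the text to the stated precision.

```python
import numpy as np

rho, se2, su2 = 0.81, 0.172, 0.053

def gxx(h):
    # AR(1) covariance: gamma_XX(h) = sigma_e^2/(1 - rho^2) * rho^|h|
    return se2 / (1 - rho**2) * rho ** abs(h)

lags = range(-2, 3)
# Left side of (4.5.3): gamma_XX(j - r) + gamma_uu(j - r)
G = np.array([[gxx(j - r) + (su2 if j == r else 0.0) for j in lags]
              for r in lags])
rhs = np.array([gxx(r) for r in lags])      # gamma_XX(r)
a = np.linalg.solve(G, rhs)

# Mean square error 0.500 - a (0.328, 0.405, 0.500, 0.405, 0.328)'
mse = gxx(0) - a @ rhs
```

The solution reproduces the symmetric weights (0.023, 0.120, 0.702, 0.120, 0.023) and the mean square error 0.0372 reported below.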
Table 4.5.1. Logarithm of Sediment Suspended in Des Moines River, Boone, Iowa, 1973
Daily Observations | Smoothed, Two-Sided Filter | Smoothed, One-Sided Filter
5.44 5.38 5.43 5.22 5.28 5.21 5.23 5.33 5.58 6.18 6.16 6.07 6.56 5.93 5.70 5.36 5.17 5.35 5.5 1 5.80 5.29 5.28 5.27 5.17
- 5.40 5.26 5.27 5.22 5.25 5.37 5.63 6.08 6.14 6.13 6.38 5.96 5.69 5.39 5.24 5.36 5.51 5.67 5.36 5.29 -
- 5.26 5.28 5.22 5.23 5.31 5.53 6.04 6.10 6.04 6.42 5.99 5.14 5.42 5.22 5.33 5.47 5.72 5.37 5.30 5.27 5.19
SOURCE: U.S. Department of Interior Geological Survey-Water Resources Division, Sediment Concentration Notes, Des Moines River, Boone, Iowa.
The mean square error of the filtered time series 5.28 + Σ_{j=−2}^{2} a_j(Y_{t−j} − 5.28) as an estimator of X_t is 0.500 − a(0.328, 0.405, 0.500, 0.405, 0.328)′ = 0.0372. The data transformed by this filter are displayed in the second column of Table 4.5.1. Note that the filtered data are "smoother" in that the variance of changes from one period to the next is smaller for the filtered data than for the original data.
We now obtain a one-sided filter that can be used to estimate the most recent value of X_t using only the most recent and the four preceding values of Y_t. The estimator of X_t is given by

5.28 + b_0(Y_t − 5.28) + b_1(Y_{t−1} − 5.28) + b_2(Y_{t−2} − 5.28) + b_3(Y_{t−3} − 5.28) + b_4(Y_{t−4} − 5.28),
where the weights are the solution of

[0.553 0.405 0.328 0.266 0.215] [b_0]   [0.500]
[0.405 0.553 0.405 0.328 0.266] [b_1]   [0.405]
[0.328 0.405 0.553 0.405 0.328] [b_2] = [0.328]
[0.266 0.328 0.405 0.553 0.405] [b_3]   [0.266]
[0.215 0.266 0.328 0.405 0.553] [b_4]   [0.215]
This one-sided estimator of X, has a mean square error of 0.0419. The filtered dafa using this filter are given in the last column of Table 4.5.1. To obtain the minimum value for the mean square error of the two-sided filter,
we evaluated (4.5.7), where
= 0.328 0.405 0.553 0.405 0.328 0.328 = 0.023 . 0.266 0.328 0.405 0.553 0.405 0.266 0.005 0.215 0.266 0.328 0.405 0.553 0.215 0.0o0
f_X(ω) = 0.172 [2π(1 − 0.81e^{-iω})(1 − 0.81e^{iω})]^{-1},

f_Y(ω) = 0.172 [2π(1 − 0.81e^{-iω})(1 − 0.81e^{iω})]^{-1} + 0.053 (2π)^{-1}
       = 0.2524(1 − 0.170e^{iω})(1 − 0.170e^{-iω}) [2π(1 − 0.81e^{-iω})(1 − 0.81e^{iω})]^{-1},

obtaining the minimum mean square error

0.500 − 0.463 = 0.037.

The infinite sequence of weights is given by the inverse transform of

0.172 [(0.2524)(1 − 0.17e^{-iω})(1 − 0.17e^{iω})]^{-1},

which yields
(…, 0.0034, 0.0203, 0.1193, 0.7017, 0.1193, 0.0203, 0.0034, …).

While it is possible to use spectral methods to evaluate the minimum mean square error of a one-sided filter [see, for example, Yaglom (1962, p. 97)], we examine the problem in a slightly different manner. Since Y_t is an autoregressive moving average (1,1) process, the methods of Section 2.9 can be used to obtain a one-period-ahead predictor Ŷ_t(Y_{t-1}, Y_{t-2}, …) based on an infinite past. As Y_t = X_t + u_t, where u_t is a sequence of independent random variables, the best predictor of X_t based on Y_{t-1}, Y_{t-2}, … must be the same as the best predictor of Y_t based on Y_{t-1}, Y_{t-2}, …. Furthermore, given the predictor, the partial correlation between any Y_{t-j}, j > 0, and X_t is zero. Therefore, to obtain the best filter for X_t using Y_t, Y_{t-1}, …, we find the optimal linear combination of Ŷ_t(Y_{t-1}, Y_{t-2}, …) and Y_t.
Denote the linear combination by c_1 Y_t + c_2 Ŷ_t(Y_{t-1}, Y_{t-2}, …), where

(c_1)   (0.5530  0.3006)^{-1} (0.5000)   (0.7900)
(c_2) = (0.3006  0.3006)      (0.3006) = (0.2100),

the matrix

(0.5530  0.3006)
(0.3006  0.3006)

is the covariance matrix of [Y_t, Ŷ_t(Y_{t-1}, Y_{t-2}, …)], and (0.5000, 0.3006)′ is the vector of covariances between [Y_t, Ŷ_t(Y_{t-1}, Y_{t-2}, …)] and X_t. It follows that the minimum mean square error for a one-sided predictor is 0.0419. AA
4.6. STATE SPACE MODELS AND KALMAN FILTERING
We begin our discussion under the assumption that the univariate time series X_t is a stationary, zero mean, first order autoregressive process. We write

X_t = αX_{t-1} + e_t,  t = 1, 2, …,   (4.6.1)

where the e_t are independent identically distributed (0, σ_e²) random variables, denoted by e_t ~ II(0, σ_e²). Assume we are unable to observe X_t directly. Instead, we observe Y_t, where

Y_t = X_t + u_t,  t = 1, 2, …,   (4.6.2)

and u_t is the measurement error. We assume u_t ~ II(0, σ_u²) and that u_t is independent of e_j for all t and j.
The model (4.6.1)-(4.6.2) is a special case of the state space representation of time series. Equation (4.6.1) is called the state equation or the transition equation, and X_t is called the state of the system at time t. Equation (4.6.2) is called the measurement equation or the observation equation. The model (4.6.1)-(4.6.2) was introduced in the Des Moines River example of Section 4.5. In that example it is very natural to think of the unknown level of the river as the true "state" of nature.
In Section 4.5, we constructed linear filters to estimate the values of the time series X_t that is observed subject to measurement error. The filters were designed to minimize the mean square error of the estimation error for a particular set of observations. In many applications, the observation set is composed of all previous observations plus the current observation. In some engineering applications, it is important to have an efficient method of computing the current estimated value requiring as little storage of information as possible. Such computational methods have been developed by Kalman (1960, 1963) and others. The methods are often
called Kalman filters. In this section, we study the state space model and the Kalman filter.
Let the model (4.6.1) and (4.6.2) hold, and assume that an initial estimator for X_0, denoted by x̂_0, is available. Let

x̂_0 = X_0 + v_0,   (4.6.3)

where v_0 is a (0, σ_{v0}²) random variable, independent of (u_t, e_t), t = 1, 2, …. The initial estimator x̂_0 and the parameters σ_u², σ_e², σ_{v0}², and α are assumed known. Equation (4.6.3) is called the initial condition equation (or starting equation). A possible choice of x̂_0 for the model (4.6.1) is x̂_0 = 0. With x̂_0 = 0, we have

σ_{v0}² = γ_X(0) = (1 − α²)^{-1}σ_e².

At time t = 1, we have the observation (Y_1, x̂_0) and knowledge of σ_u², σ_e², σ_{v0}², and α to use in constructing an estimator (predictor) of X_1. On the basis of the specification, we have

Y_1 = X_1 + u_1,
αx̂_0 = X_1 + w_1,   (4.6.4)

where w_1 = αv_0 − e_1. The system of equations (4.6.4) can be written in matrix form as

Z_1 = JX_1 + ε_1,

where Z_1 = (Y_1, αx̂_0)′, J = (1, 1)′, and ε_1 = (u_1, w_1)′. Because u_1, e_1, and v_0 are mutually uncorrelated, the covariance matrix of ε_1 is diag(σ_u², σ_{w1}²), where σ_{w1}² = σ_e² + α²σ_{v0}². Therefore, it is natural to construct an estimator of X_1 as the weighted average of Y_1 and αx̂_0, where the weights are proportional to the inverses of the variances of u_1 and w_1. Thus,

x̂_1 = (σ_u^{-2} + σ_{w1}^{-2})^{-1}(σ_u^{-2}Y_1 + σ_{w1}^{-2}αx̂_0).   (4.6.5)
The estimator (4.6.5) is constructed by analogy to linear regression theory. In the problem formulation (4.6.4), the information about the unknown random value, X_1, is contained in the second equation of system (4.6.4). See Exercise 4.19. The same approach can be used to construct succeeding estimators.
Let the error in x̂_1 as an estimator of X_1 be v_1, where

v_1 = x̂_1 − X_1 = (σ_u^{-2} + σ_{w1}^{-2})^{-1}(σ_u^{-2}u_1 + σ_{w1}^{-2}w_1).   (4.6.6)

The variance of v_1 is

σ_{v1}² = (σ_u^{-2} + σ_{w1}^{-2})^{-1}.   (4.6.7)

At time t = 2, it is desired to estimate X_2 using Y_2, Y_1, and x̂_0. Now αx̂_1 is the best
predictor of X_2 constructed from the data (Y_1, x̂_0). Therefore, we need only combine the information in αx̂_1 with that in Y_2 to obtain the best estimator of X_2. Using (4.6.1)-(4.6.2) and the identity

αx̂_1 = αX_1 + α(x̂_1 − X_1) = αX_1 + αv_1,
we have the system of two equations containing X_2,

Y_2 = X_2 + u_2,
αx̂_1 = X_2 + w_2,   (4.6.8)

where w_2 = αv_1 − e_2. Because the vectors (u_t, e_t) are uncorrelated, we have

E{(u_2, e_2, v_1)′(u_2, e_2, v_1)} = diag(σ_u², σ_e², σ_{v1}²)

and

V{(u_2, w_2)′} = diag(σ_u², σ_e² + α²σ_{v1}²).
It follows that the best estimator of X_2, given (x̂_0, Y_1, Y_2), is

x̂_2 = (σ_u^{-2} + σ_{w2}^{-2})^{-1}(σ_u^{-2}Y_2 + σ_{w2}^{-2}αx̂_1),   (4.6.9)

where σ_{w2}² = σ_e² + α²σ_{v1}². Letting the error x̂_2 − X_2 be

v_2 = (σ_u^{-2} + σ_{w2}^{-2})^{-1}(σ_u^{-2}u_2 + σ_{w2}^{-2}w_2),

the variance of v_2 is σ_{v2}² = (σ_u^{-2} + σ_{w2}^{-2})^{-1}.

Because x̂_{t-1} contains all of the information about X_t available from the previous observations, the estimator of X_t for general t is

x̂_t = (σ_u^{-2} + σ_{wt}^{-2})^{-1}(σ_u^{-2}Y_t + σ_{wt}^{-2}αx̂_{t-1}),   (4.6.10)

and

σ_{vt}² = (σ_u^{-2} + σ_{wt}^{-2})^{-1},

where σ_{wt}² = σ_e² + α²σ_{v,t-1}². Equation (4.6.10) can be rearranged to yield

x̂_t = (σ_u² + σ_{wt}²)^{-1}(σ_{wt}²Y_t + σ_u²αx̂_{t-1})   (4.6.11)
or
x̂_t = αx̂_{t-1} + σ_{wt}²(σ_u² + σ_{wt}²)^{-1}(Y_t − αx̂_{t-1}).   (4.6.12)
The first term on the right of (4.6.12) is the estimator of X_t based upon x̂_{t-1}. The second term is the negative of an estimator of the error made in predicting X_t with αx̂_{t-1}. We can give a direct justification of (4.6.12) as follows. Because u_t − w_t = Y_t − αx̂_{t-1} is a linear combination of Y_t and x̂_{t-1}, the linear estimator of X_t based on

(u_t − w_t, x̂_{t-1}) = (Y_t − αx̂_{t-1}, x̂_{t-1})

is the same as the linear estimator based on (Y_t, x̂_{t-1}). Now u_t − w_t is uncorrelated with x̂_{t-1}. It follows, by the properties of regression (or of Hilbert spaces), that the best linear estimator of X_t based on (x̂_{t-1}, Y_t) is the best linear estimator of X_t based on x̂_{t-1}, plus the best linear estimator of X_t − αx̂_{t-1} based on Y_t − αx̂_{t-1}. Hence, the best estimator of X_t is given by (4.6.12). See Exercise 4.25.
An equation equivalent to (4.6.12) is

x̂_t = Y_t − σ_u²(σ_u² + σ_{wt}²)^{-1}(Y_t − αx̂_{t-1}).   (4.6.13)

The form (4.6.12) appears in the original signal extraction literature, and the form (4.6.13) appears in the random model prediction literature. In the form (4.6.13), an estimator of the error u_t is subtracted from Y_t to obtain the estimator of X_t.
The variance of v_t satisfies the equation

σ_{vt}² = σ_{wt}² − (σ_{wt}²)²(σ_u² + σ_{wt}²)^{-1}.   (4.6.14)

Hence, the data required at time t + 1 to construct the estimator of X_{t+1} and to construct the variance of the estimation error are the elements of (Y_{t+1}, x̂_t, σ_{vt}²).
The equations (4.6.12) and (4.6.14) are sometimes called the updating equations of the Kalman filter.
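The scalar updating equations can be sketched directly in code. The fragment below (Python; the function and variable names are ours) implements (4.6.12) and (4.6.14) for the Des Moines River parameters of Section 4.5 and runs the first ten observations, in deviations from the mean 5.28.

```python
# A minimal sketch of the scalar Kalman updating equations (4.6.12) and
# (4.6.14) for X_t = alpha X_{t-1} + e_t, Y_t = X_t + u_t.  Parameter values
# are those of the Des Moines River example; names are ours.
alpha, var_e, var_u = 0.81, 0.172, 0.053

def kalman_step(x_prev, var_v_prev, y):
    var_w = var_e + alpha ** 2 * var_v_prev           # variance of w_t
    gain = var_w / (var_u + var_w)                    # weight on the innovation
    x = alpha * x_prev + gain * (y - alpha * x_prev)  # equation (4.6.12)
    var_v = var_w - var_w ** 2 / (var_u + var_w)      # equation (4.6.14)
    return x, var_v

# Initiate with the population mean (so the deviation is 0) and variance 0.500.
y_obs = [5.44, 5.38, 5.43, 5.22, 5.28, 5.21, 5.23, 5.33, 5.58, 6.18]
x_hat, var_v = 0.0, 0.500
estimates, variances = [], []
for y in y_obs:
    x_hat, var_v = kalman_step(x_hat, var_v, y - 5.28)
    estimates.append(5.28 + x_hat)
    variances.append(var_v)
```

Only (Y_{t+1}, x̂_t, σ²_vt) must be carried between iterations, which is the storage-economy point made in the text.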
The results for the simple model (4.6.1)-(4.6.2) generalize to p-dimensional vector time series X_t and to the situation wherein the observed vector Y_t is the sum of a known linear function of X_t and measurement error. Let

X_t = A_tX_{t-1} + e_t,  t = 1, 2, …,   (4.6.15)
Y_t = H_tX_t + u_t,  t = 1, 2, …,   (4.6.16)
x̂_0 = X_0 + v_0,   (4.6.17)

where X_t is a p-dimensional column vector, Y_t is an r-dimensional column vector, {H_t} is a sequence of known r × p matrices, {A_t} is a sequence of known p × p matrices, x̂_0 is a known initial vector, and {(u_t′, e_t′)} is a sequence of uncorrelated,
zero mean, vector random variables with known covariance matrix

E{(u_t′, e_t′)′(u_t′, e_t′)} = diag(Σ_uu,tt, Σ_ee,tt).

Equation (4.6.15) is the state equation, equation (4.6.16) is the measurement equation, and equation (4.6.17) is the initial equation. As before, we assume v_0 of the initial equation to be uncorrelated with u_t and e_t, t = 1, 2, …. Considerable generalization of the model is obtained by permitting the variances to be functions of t and by the inclusion of the matrices H_t in the measurement equation. Many different forms of state space representations appear in the literature.
For the vector model, the system of equations analogous to (4.6.8) is

Y_t = H_tX_t + u_t,
A_tx̂_{t-1} = X_t + w_t   (4.6.18)

for t = 1, 2, …, where w_t = A_tv_{t-1} − e_t and v_t = x̂_t − X_t. If we assume that Σ_ww,tt is nonsingular, the best estimator of X_t in (4.6.18) is

x̂_t = (H_t′Σ_uu,tt^{-1}H_t + Σ_ww,tt^{-1})^{-1}(H_t′Σ_uu,tt^{-1}Y_t + Σ_ww,tt^{-1}A_tx̂_{t-1}),   (4.6.19)

where

Σ_ww,tt = A_tΣ_vv,t-1,t-1A_t′ + Σ_ee,tt,   (4.6.20)

Σ_vv,tt = (H_t′Σ_uu,tt^{-1}H_t + Σ_ww,tt^{-1})^{-1}
        = Σ_ww,tt − Σ_ww,ttH_t′D_t^{-1}H_tΣ_ww,tt.   (4.6.21)

The estimator (4.6.19) can also be written as

x̂_t = A_tx̂_{t-1} + Σ_ww,ttH_t′D_t^{-1}(Y_t − H_tA_tx̂_{t-1}),   (4.6.23)

where

D_t = H_tΣ_ww,ttH_t′ + Σ_uu,tt   (4.6.22)

is the covariance matrix of Y_t − H_tA_tx̂_{t-1}, and Σ_ww,ttH_t′ determines the covariance between w_t and Y_t − H_tA_tx̂_{t-1}. Therefore, the estimator (4.6.23) is the difference between the unbiased estimator A_tx̂_{t-1} of X_t and an unbiased estimator of the error w_t in A_tx̂_{t-1}.
Equation (4.6.20), the second equation of (4.6.21), and equation (4.6.23) form a set of equations that can be used to construct x̂_t given (Y_t, x̂_{t-1}, Σ_vv,t-1,t-1). If this system of equations is used for updating, only the matrix D_t is required to be nonsingular. The matrix D_t should always be nonsingular, because there is little reason for the subject matter specialist to consider singular observation vectors Y_t.
The vector updating equations analogous to (4.6.13), and an alternative expression for Σ_vv,tt, follow from (4.6.20)-(4.6.23) in the same manner as in the scalar case.
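The vector predict-and-update cycle of (4.6.20)-(4.6.23) can be sketched as a small function. This is our own illustrative implementation (assuming numpy; names such as kalman_update, P, Q, R are ours), checked here against the scalar Des Moines River case, for which the error variance should approach the limit 0.0419.

```python
import numpy as np

# A sketch of the vector updating equations (4.6.20)-(4.6.23).
def kalman_update(x_prev, P_prev, y, A, H, Q, R):
    """One step of the filter for X_t = A X_{t-1} + e_t, Y_t = H X_t + u_t.

    P_prev is Sigma_vv,t-1,t-1; Q is Sigma_ee; R is Sigma_uu.
    """
    Pw = A @ P_prev @ A.T + Q                  # Sigma_ww,tt, equation (4.6.20)
    D = H @ Pw @ H.T + R                       # D_t, equation (4.6.22)
    K = Pw @ H.T @ np.linalg.inv(D)            # gain on the innovation
    x = A @ x_prev + K @ (y - H @ A @ x_prev)  # equation (4.6.23)
    P = Pw - K @ H @ Pw                        # Sigma_vv,tt, equation (4.6.21)
    return x, P

# Scalar check (Des Moines River example, deviations from the mean 5.28).
A = np.array([[0.81]]); H = np.array([[1.0]])
Q = np.array([[0.172]]); R = np.array([[0.053]])
x, P = np.zeros(1), np.array([[0.500]])
for y in [0.16, 0.10, 0.15, -0.06, 0.00, -0.07, -0.05, 0.05]:
    x, P = kalman_update(x, P, np.array([y]), A, H, Q, R)
```

The same function applies unchanged to the vector examples that follow, since only the shapes of A, H, Q, and R change.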
Example 4.6.1. In this example, we construct the Kalman filter for the Des Moines River example of Section 4.5. The model for the data is

X_t − 5.28 = 0.81(X_{t-1} − 5.28) + e_t,
Y_t = X_t + u_t,   (4.6.24)

where e_t ~ II(0, 0.172) and u_t ~ II(0, 0.053). The sediment time series has a nonzero mean, so the difference X_t − 5.28 plays the role of the X_t of (4.6.1).

We begin with the first observation of Table 4.6.1, which we call Y_1. If X_t is a stationary process, as we assume for the Des Moines River example, we can use the population mean to initiate the filter. If we use the population mean as an estimator of X_1, the variance of the estimation error is 0.500, which is the variance of the X_t process. The system of equations (4.6.4) becomes

Y_1 − 5.28 = 0.16 = (X_1 − 5.28) + u_1,
0 = (X_1 − 5.28) + w_1,
Table 4.6.1. Estimates of Sediment Constructed with Kalman Filter

 t    Y_t     x̂_t       σ²_vt     σ²_wt
 1    5.44    5.42467   0.04792   0.50000
 2    5.38    5.38355   0.04205   0.20344
 3    5.43    5.41613   0.04188   0.19959
 4    5.22    5.25574   0.04187   0.19948
 5    5.28    5.27588   0.04187   0.19947
 6    5.21    5.22399   0.04187   0.19947
 7    5.23    5.23097   0.04187   0.19947
 8    5.33    5.31117   0.04187   0.19947
 9    5.58    5.52232   0.04187   0.19947
10    6.18    6.03227   0.04187   0.19947
11    6.16    6.10318   0.04187   0.19947
12    6.07    6.04413   0.04187   0.19947
13    6.56    6.42123   0.04187   0.19947
14    5.93    5.98760   0.04187   0.19947
15    5.70    5.73215   0.04187   0.19947
where V{(u_1, w_1)′} = diag(0.053, 0.500). Therefore, by (4.6.10),

x̂_1 = 5.28 + (18.868 + 2.000)^{-1}[18.868(Y_1 − 5.28) + 0] = 5.28 + 0.145 = 5.425.

The reader may verify that the coefficient for Y_1 − 5.28 is the covariance between X_1 and Y_1 divided by the variance of Y_1. The variance of the error in the estimator of X_1 is σ_{v1}² = 0.0479.
The estimate of X_2 is

x̂_2 = 5.28 + [(0.053)^{-1} + (0.2034)^{-1}]^{-1}[(0.053)^{-1}(Y_2 − 5.28) + (0.2034)^{-1}(0.81)(x̂_1 − 5.28)]
    = 0.2075 + 0.7933Y_2 + 0.1674x̂_1 = 5.3836,

where σ_{w2}² = σ_e² + α²σ_{v1}² = 0.2034. The variance of x̂_2 − X_2 is

σ_{v2}² = 0.2034 − (0.2034)²(0.053 + 0.2034)^{-1} = 0.0420.
The estimate for X_3 is

x̂_3 = 5.28 + [(0.053)^{-1} + (0.1996)^{-1}]^{-1}[(0.053)^{-1}(Y_3 − 5.28) + (0.1996)^{-1}(0.81)(x̂_2 − 5.28)] = 5.4161,

and the variance of the estimation error is

σ_{v3}² = 0.1996 − (0.1996)²(0.053 + 0.1996)^{-1} = 0.0419.
The estimates and variances for the remaining observations are given in Table 4.6.1. Note that the variance of the estimation error is approaching 0.0419. This limiting variance, denoted by σ_v², was derived in Example 4.5.1 as the mean square error based on an infinite past. The variance of w_t stabilizes at

σ_w² = σ_e² + α²σ_v² = 0.172 + (0.81)²(0.0419) = 0.1995.

It follows that equation (4.6.12) stabilizes at

x̂_t = 5.28 + 0.81(x̂_{t-1} − 5.28) + 0.7901[Y_t − 5.28 − 0.81(x̂_{t-1} − 5.28)],

where 0.7901 = (σ_u² + σ_w²)^{-1}σ_w² and α = 0.81. AA
Example 4.6.2. We investigate estimation of XI for the Des Moines River
example under the assumption that the mean of the process is unknown. We retain the assumption that the other parameters of the model are known. We write the model as

Z_t = 0.81Z_{t-1} + e_{t1},
μ_t = μ_{t-1},
Y_t = μ_t + Z_t + u_t,   (4.6.25)

where μ is the mean of the process. The first two equations of (4.6.25) are the state equations, and the last equation is the measurement equation. In terms of the model (4.6.15)-(4.6.17), σ_uu,tt = σ_uu = 0.053,

A_t = A = diag(α, 1) = diag(0.81, 1),  Σ_ee,tt = Σ_ee = diag(0.172, 0),

X_t′ = (Z_t, μ), e_t′ = (e_{t1}, 0), and H_t = (1, 1). Under the model, each Y_t is unbiased for μ, and the variance of Y_t − μ is σ_ZZ + σ_uu.
To initiate the filter, we use the knowledge that Z_t is a random variable with mean zero and variance σ_ZZ = σ_Z². Letting Y_0 be the first observation of Table 4.6.2, we can form the system of equations

0 = Z_0 + v_0,   (4.6.26)
Y_0 = μ + Z_0 + u_0.   (4.6.27)

We are treating μ as a fixed unknown parameter, so we do not have an initial equation of the type (4.6.26) for μ. The first real observation furnishes the first information about μ. From the system (4.6.26)-(4.6.27), we obtain the estimator

x̂_0′ = (Ẑ_0, μ̂_0) = (0, Y_0)

with covariance matrix

Σ_vv,00 = ( σ_ZZ    −σ_ZZ       )
          (−σ_ZZ    σ_ZZ + σ_uu),

where σ_ZZ = 0.500 and σ_uu = 0.053. Using x̂_0′ = (0, 5.44) and equation (4.6.19), we have

(Ẑ_1)   ( 0.46953  −0.45252)(18.868Y_1 + 19.5884)   (−0.0193)
(μ̂_1) = (−0.45252   0.47901)(18.868Y_1 + 24.1832) = ( 5.4100),

and from (4.6.20),

Σ_ww,11 = (0.172  0)   (0.81  0)( 0.500  −0.500)(0.81  0)   ( 0.500  −0.405)
          (0      0) + (0     1)(−0.500   0.553)(0     1) = (−0.405   0.553).
If we use (4.6.23),

x̂_1 = (0.81  0)(0   )   ( 0.500  −0.405)(1)
      (0     1)(5.44) + (−0.405   0.553)(1)(0.29605)^{-1}(5.38 − 5.44) = (−0.0193, 5.4100)′,

where

D_1 = 0.053 + (1, 1)Σ_ww,11(1, 1)′ = 0.29605.

It follows that

Σ_vv,11 = ( 0.46953  −0.45252)
          (−0.45252   0.47901).

The estimate of the sediment at t = 1 is

Ŝ_1 = Ẑ_1 + μ̂_1 = −0.0193 + 5.4100 = 5.3907.

The variance of the error in the estimated sediment at time one is

(1, 1)Σ_vv,11(1, 1)′ = 0.04351.
The estimates for t = 2 are constructed with

Σ_vv,22 = ( 0.43387  −0.41230)
          (−0.41230   0.43367),

where

Σ_ww,22 = ( 0.4801  −0.3665)
          (−0.3665   0.4790).

The estimates of S_t = Z_t + μ, based on Y_t, Y_{t-1}, …, Y_0, are given in Table 4.6.2. Note that the estimation of the mean contributes modestly to the variance of Ŝ_t − S_t. The variance of μ̂_t is declining approximately at the rate t^{-1}. While the variance of the estimator of S_t = μ + Z_t will eventually approach 0.0419, the approach is slowed by the estimation error in μ̂_t. AA
Example 4.6.3. In the previous examples, we applied the Kalman filter to a stationary time series. Stationarity made it relatively easy to find starting values for the filter. In this example, we consider filtering for a time series in which the autoregressive part has a unit root. Let the model be

Z_0 = 0,  Z_t = Z_{t-1} + e_t,  t = 1, 2, …,   (4.6.28)
Y_t = θ + Z_t + u_t,  t = 0, 1, …,   (4.6.29)
Table 4.6.2. Estimates of Sediment Constructed with Kalman Filter, Unknown Mean

 t    Y_t     Ŝ_t = Ẑ_t + μ̂_t   μ̂_t      V{Ŝ_t − S_t}   V{μ̂_t − μ}
 0    5.44    5.44000           5.44000   0.05300        0.55300
 1    5.38    5.39074           5.41001   0.04351        0.47901
 2    5.43    5.42324           5.42436   0.04293        0.43367
 3    5.22    5.25915           5.35070   0.04280        0.39757
 4    5.28    5.27933           5.35185   0.04272        0.36722
 5    5.21    5.22621           5.32613   0.04266        0.34121
 6    5.23    5.23298           5.32174   0.04261        0.31864
 7    5.33    5.31422           5.34347   0.04256        0.29887
 8    5.58    5.52856           5.40987   0.04252        0.28141
 9    6.18    6.04632           5.57235   0.04249        0.26588
10    6.16    6.11947           5.61890   0.04246        0.25198
11    6.07    6.06090           5.62881   0.04243        0.23965
12    6.56    6.44377           5.74903   0.04240        0.22811
13    5.93    6.00652           5.67363   0.04238        0.21780
14    5.70    5.74886           5.62767   0.04236        0.20838
where (e_t, u_t)′ ~ NI[0, diag(σ_e², σ_u²)]. We treat the true part of Y_0, denoted by θ, as a fixed unknown constant to be estimated. The Y-data of Table 4.6.3 were generated by the model with (σ_e², σ_u²) = (0.25, 0.16). In terms of the model (4.6.15)-(4.6.17), σ_uu,tt = σ_uu = 0.16, A_t = A = I, H_t = H = (1, 1), and Σ_ee,tt = diag(0.25, 0).

Table 4.6.3. Kalman Filter Applied to a Unit Root Process
 t    Y_t        ŷ_t = θ̂_t + Ẑ_t   θ̂_t       V{ŷ_t − θ − Z_t}   V{θ̂_t − θ}   Cov{θ̂_t − θ, ŷ_t − θ − Z_t}
 0     2.30797    2.30797           2.30797    0.16000            0.16000       0.16000
 1     2.54141    2.47588           2.37350    0.11509            0.11509       0.04491
 2     3.08044    2.89622           2.42521    0.11125            0.11125       0.01369
 3     1.35846    1.83049           2.38483    0.11089            0.11089       0.00420
 4     1.55019    1.63629           2.38257    0.11085            0.11085       0.00129
 5     2.34068    2.12430           2.38432    0.11085            0.11085       0.00040
 6     1.33786    1.57945           2.38372    0.11085            0.11085       0.00012
 7     0.98497    1.16759           2.38358    0.11085            0.11085       0.00004
 8     1.17314    1.17143           2.38358    0.11085            0.11085       0.00001
 9     0.65385    0.81285           2.38357    0.11085            0.11085       0.00000
10     0.35140    0.49315           2.38357    0.11085            0.11085       0.00000
11     0.47546    0.48090           2.38357    0.11085            0.11085       0.00000
12    −0.56643   −0.24470           2.38356    0.11085            0.11085       0.00000
13     0.04359   −0.04497           2.38356    0.11085            0.11085       0.00000
14    −0.25374   −0.18961           2.38356    0.11085            0.11085       0.00000
The model specifies Z_0 = 0 and Y_0 = θ + u_0. Therefore, we initiate the filter with

x̂_0′ = (Ẑ_0, θ̂_0) = (0, Y_0) = (0, 2.30797)

and

Σ_vv,00 = diag(0, 0.16).

Then

Σ_ww,11 = diag(0.25, 0.16),
D_1 = 0.16 + (1, 1)[diag(0.25, 0.16)](1, 1)′ = 0.57,

and

x̂_1′ = (Ẑ_1, θ̂_1) = (0.1024, 2.3735).

The estimate of θ + Z_1 is ŷ_1 = θ̂_1 + Ẑ_1 = 2.4759.
The remaining estimates and the variances of the estimators are given in Table 4.6.3. There are several interesting aspects of the results. First, the variance of the estimator of θ stabilizes rather quickly. Observations after the first five add very little information about the true value at time zero. Associated with the stabilization of the variance of θ̂_t is the fact that the estimate of θ + Z_t at time t depends very little on values of Y_r for t − r > 5. Also, the variance of the estimator of θ at time t is equal to the variance of the estimator of θ + Z_t. This is because the estimator of θ, expressed as a function of (Y_0, Y_1, …, Y_t), is the mirror image of the estimator of θ + Z_t.
After a few initial observations, the estimation equation stabilizes. For this problem, the limiting updating equation for ŷ_t = θ̂_t + Ẑ_t can be written as

ŷ_t = ŷ_{t-1} + c(Y_t − ŷ_{t-1}),   (4.6.30)

where

c = (σ_w² + σ_u²)^{-1}σ_w²,
σ_w² = 0.5σ_e²[1 + (1 + 4σ_e^{-2}σ_u²)^{1/2}],
σ_v² = σ_w² − (σ_w² + σ_u²)^{-1}σ_w⁴.

The expression for σ_w² was obtained from the expression following (4.6.10) by setting σ_{vt}² = σ_{v,t-1}² and σ_{wt}² = σ_{w,t-1}².
The ŷ_t of (4.6.30) is a convex combination of ŷ_{t-1} and Y_t. Hence, the sum of the weights on the Y_{t-j}, j = 0, 1, …, t, that define ŷ_t is one. This means that estimates constructed with this model maintain the level of the original series. If the true Y_t contains a positive time trend, the predictor (4.6.30) will have a negative bias. AA
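The limiting constants of (4.6.30) can be verified both from the closed form and by iterating the variance recursion from the starting value σ²_{v0} = 0.16 of Table 4.6.3. A short sketch (names ours):

```python
import math

# Steady-state quantities for the unit root example (4.6.28)-(4.6.29)
# with (var_e, var_u) = (0.25, 0.16), using the closed form given
# with equation (4.6.30).
var_e, var_u = 0.25, 0.16

var_w = 0.5 * var_e * (1.0 + math.sqrt(1.0 + 4.0 * var_u / var_e))
c = var_w / (var_w + var_u)                   # smoothing constant in (4.6.30)
var_v = var_w - var_w ** 2 / (var_w + var_u)  # limiting filter error variance

# The same limit is reached by iterating the scalar variance recursion
# from the starting value 0.16 used in Table 4.6.3.
v = 0.16
for _ in range(50):
    w = v + var_e                             # unit root: no damping of v
    v = w - w ** 2 / (w + var_u)
```

The iteration converges to the table's value 0.11085, which confirms that the fixed point of the recursion is the root of the quadratic underlying the closed form.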
In our discussion to this point, we have used information on the Y-process through Y_t to construct an estimator for X_t. It follows from (4.6.15) that predictors of future values can be constructed from the estimator of X_t. For example, the predictor of X_{t+1} constructed with data through Y_t is

x̂_{t+1|t} = A_{t+1}x̂_t,   (4.6.31)

and the variance of the prediction error V{x̂_{t+1|t} − X_{t+1}} is

Σ_{t+1|t} = Σ_ww,t+1,t+1 = A_{t+1}Σ_vv,ttA_{t+1}′ + Σ_ee,t+1,t+1.   (4.6.32)

The formula (4.6.31) can be applied recursively to obtain predictions for any number of periods. Thus,

x̂_{t+ℓ|t} = A_{t+ℓ}x̂_{t+ℓ-1|t},

and the variance of the prediction error is

Σ_{t+ℓ|t} = A_{t+ℓ}Σ_{t+ℓ-1|t}A_{t+ℓ}′ + Σ_ee,t+ℓ,t+ℓ.
The prediction formulas can also be used in constructing estimators of X_t when data are missing. Assume that (Y_1, Y_2, …, Y_{r-1}, Y_{r+1}) is available, that the Kalman filter has been initiated prior to time r − 1, and that the objective is to estimate X_{r+1}. At time r, the best estimator of X_r is

x̂_r = x̂_{r|r-1} = A_rx̂_{r-1},

and V{X_r − A_rx̂_{r-1}} = Σ_ww,rr. Because there is no Y-information at time r, we have Σ_vv,rr = Σ_ww,rr. That is, the error in the estimator of X_r constructed with x̂_{r-1} is the final estimation error. At time r + 1, when Y_{r+1} is available, the best estimator of X_{r+1} is

x̂_{r+1} = A_{r+1}x̂_r + Σ_ww,r+1,r+1H_{r+1}′D_{r+1}^{-1}(Y_{r+1} − H_{r+1}A_{r+1}x̂_r),

where Σ_ww,r+1,r+1, D_{r+1}, and Σ_vv,r+1,r+1 are given by (4.6.20), (4.6.22), and (4.6.21), respectively.
Example 4.6.4. We use the Kalman filter and the data of Example 4.6.1 to construct estimators in the presence of missing data. We assume that the mechanism that causes data to be missing is independent of the (Y_t, X_t) process.
Table 4.6.4 contains the data of Table 4.6.1 with observations 7, 11, and 12
Table 4.6.4. Estimates of Sediment Constructed with Kalman Filter

 t    Y_t     x̂_t       σ²_vt     σ²_wt
 1    5.44    5.42467   0.04792   0.50000
 2    5.38    5.38355   0.04205   0.20344
 3    5.43    5.41613   0.04188   0.19959
 4    5.22    5.25574   0.04187   0.19948
 5    5.28    5.27588   0.04187   0.19947
 6    5.21    5.22399   0.04187   0.19947
 7     —      5.23463   0.19947   0.19947
 8    5.33    5.31708   0.04511   0.30287
 9    5.58    5.52380   0.04197   0.20160
10    6.18    6.03256   0.04188   0.19954
11     —      5.88957   0.19948   0.19948
12     —      5.77375   0.30288   0.30288
13    6.56    6.44992   0.04637   0.37072
14    5.93    5.99176   0.04200   0.20242
15    5.70    5.73285   0.04188   0.19956
missing. Recall that the model of Example 4.6.1 is

X_t − 5.28 = 0.81(X_{t-1} − 5.28) + e_t,
Y_t = X_t + u_t,

where (e_t, u_t)′ ~ II[0, diag(0.172, 0.053)]. Given the previous data, the predictor of X_7 is

x̂_7 = 5.28 + 0.81(5.22399 − 5.28) = 5.23463,

and the variance of the estimation error is σ²_{v7} = 0.19947. The estimator for X_8 is

x̂_8 = 5.28 + 0.81(−0.04537) + (0.30287)(0.35587)^{-1}(0.05 + 0.03675) = 5.31708,

where Y_8 − 5.28 = 0.05,

σ²_{w8} = σ_e² + α²σ²_{v7} = 0.172 + (0.81)²(0.19947) = 0.30287,
D_8 = 0.053 + 0.30287 = 0.35587,
σ²_{v8} = 0.30287 − (0.30287)²(0.35587)^{-1} = 0.04511.
The calculations are analogous for x̂_9, x̂_10, and x̂_11. The estimator for X_12 is

x̂_12 = 5.28 + 0.81(5.88957 − 5.28) = 5.77375,
where
σ²_{w,12} = σ_e² + α²σ²_{v,11} = 0.172 + (0.81)²(0.19948) = 0.30288.

The estimator for X_13 given in Table 4.6.4 is constructed with

(σ²_{w,13}, D_13, σ²_{v,13}) = (0.37072, 0.42372, 0.04637).
These calculations illustrate the importance of Y_t in the estimation of X_t for these data. AA
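The missing-data logic of Example 4.6.4 amounts to skipping the update step whenever Y_t is unavailable and carrying the prediction forward. A sketch (our own names; None marks a missing observation):

```python
# Kalman filtering with missing observations, as in Example 4.6.4: when Y_t
# is missing, the update is skipped and the predictor alpha * x_hat (with its
# prediction variance) carries forward.  Model and data are from Example 4.6.1.
alpha, var_e, var_u, mean = 0.81, 0.172, 0.053, 5.28

y_obs = [5.44, 5.38, 5.43, 5.22, 5.28, 5.21, None, 5.33, 5.58, 6.18,
         None, None, 6.56, 5.93, 5.70]          # None marks a missing Y_t

x_hat, var_v = 0.0, 0.500                       # deviations from the mean
estimates, variances = [], []
for y in y_obs:
    x_pred = alpha * x_hat
    var_w = var_e + alpha ** 2 * var_v
    if y is None:                               # no Y-information at time t
        x_hat, var_v = x_pred, var_w
    else:
        gain = var_w / (var_u + var_w)
        x_hat = x_pred + gain * ((y - mean) - x_pred)
        var_v = var_w - var_w ** 2 / (var_u + var_w)
    estimates.append(mean + x_hat)
    variances.append(var_v)
```

Running this reproduces the entries of Table 4.6.4, including the inflated error variances 0.19947, 0.19948, and 0.30288 at the three missing times.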
In some situations, it is useful to express an autoregressive moving average in the state space form. The vector first order autoregressive process is in the form (4.6.15) with a fixed matrix A and zero measurement error. Therefore, because any autoregressive model can be put in the vector first order form (see Section 2.8), any pure autoregressive model can be put in the state space form. We shall see that there are alternative state space representations.
Akaike (1974) suggested a state space representation for autoregressive moving averages. To introduce the ideas, consider a second order moving average
Y_t = β_2e_{t-2} + β_1e_{t-1} + e_t,   (4.6.33)

where the e_t are independent (0, σ_e²) random variables. Let

X_t′ = (E{Y_t | t}, E{Y_{t+1} | t}, E{Y_{t+2} | t})
     = (Y_t, β_1e_t + β_2e_{t-1}, β_2e_t),   (4.6.34)

where E{Y_{t+j} | t} is the expected value of Y_{t+j} given the infinite past (Y_t, Y_{t-1}, …). The vector X_t contains information equivalent to the vector (e_t, e_{t-1}), which is the information required for any future predictions of Y_t. Furthermore, X_t satisfies the first order vector autoregression

X_t = AX_{t-1} + e_t,   (4.6.35)

where

A = (0  1  0)
    (0  0  1)
    (0  0  0)

and e_t′ = (1, β_1, β_2)e_t. The state space representation of (4.6.33) is completed by adding the "observation equation"

Y_t = (1, 0, 0)X_t = HX_t   (4.6.36)
to (4.6.35). To initiate the filter for the second order moving average, we begin with the
vector x̂_0′ = (0, 0, 0). The covariance matrix of x̂_0 − X_0 is

Σ_vv,00 = σ_e² (1 + β_1² + β_2²   β_1 + β_1β_2   β_2   )
               (β_1 + β_1β_2      β_1² + β_2²    β_1β_2)
               (β_2               β_1β_2         β_2²  ).

The reader may verify that

x̂_1 = (1 + β_1² + β_2²)^{-1}(1 + β_1² + β_2², β_1 + β_1β_2, β_2)′Y_1   (4.6.37)

and

x̂_{2|1} = Ax̂_1 = (1 + β_1² + β_2²)^{-1}(β_1 + β_1β_2, β_2, 0)′Y_1.   (4.6.38)

The vector x̂_1′ is the best predictor of (Y_1, Y_2, Y_3) given Y_1, and the vector x̂_{2|1} is the best predictor of (Y_2, Y_3, Y_4) given Y_1. The predictor of (Y_2, Y_3, Y_4) could have been obtained directly from the regression of (Y_2, Y_3, Y_4) on Y_1. The predictor for X_2 given (Y_1, Y_2) and subsequent predictions can be constructed using the equations (4.6.20), (4.6.21), and (4.6.23).
We now give a state space representation for a stationary autoregressive moving average of order (p, q). Let

Y_t = Σ_{j=1}^{p} α_jY_{t-j} + Σ_{i=1}^{q} β_ie_{t-i} + e_t,   (4.6.39)

where the e_t are independent (0, σ_e²) random variables. The vector of conditional expectations, X_t, of (4.6.34) becomes

X_t′ = (E{Y_t | t}, E{Y_{t+1} | t}, …, E{Y_{t+m-1} | t}),   (4.6.40)

where m = max(p, q + 1). From Theorem 2.7.1, we have

Y_t = Σ_{i=0}^{∞} ψ_ie_{t-i},   (4.6.41)

where the ψ_i are defined in that theorem. It follows that

E{Y_{t+j} | t} = E{Y_{t+j} | t − 1} + ψ_je_t   (4.6.42)
for j = 1, 2, …, m − 1. From equations (4.6.41) and (4.6.42), we have

X_t = AX_{t-1} + e_t,   (4.6.43)

where

A = (0     1        0        …   0  )
    (0     0        1        …   0  )
    (⋮     ⋮        ⋮            ⋮  )
    (0     0        0        …   1  )
    (α_m   α_{m-1}  α_{m-2}  …   α_1),

e_t′ = (1, ψ_1, …, ψ_{m-1})e_t, and it is understood that α_{p+1}, …, α_m are zero if m > p. Equation (4.6.43) and the equation

Y_t = (1, 0, …, 0)X_t   (4.6.44)

form a state space representation for a stationary autoregressive moving average process.
Example 4.6.5. In this example, we use the Kalman filter to construct predictions for the stationary autoregressive moving average model

Y_t = 0.8Y_{t-1} + e_t + 0.6e_{t-1} + 0.58e_{t-2},   (4.6.45)

where e_t ~ NI(0, 1). The second column of Table 4.6.5 contains observations generated by the process (4.6.45). By Theorem 2.7.1,

Y_t = Σ_{i=0}^{∞} ψ_ie_{t-i},

where (ψ_0, ψ_1, ψ_2) = (1.00, 1.40, 1.70). The state space representation of the model (4.6.45) is

X_t = AX_{t-1} + e_t,
Y_t = (1, 0, 0)X_t,   (4.6.46)

where e_t′ = (1.00, 1.40, 1.70)e_t
Table 4.6.5. Kalman Filter Used to Construct Predictors for an Autoregressive Moving Average

 t    Y_t      Ŷ_{t+1|t}   Ŷ_{t+2|t}   V{Ŷ_{t+1|t} − Y_{t+1}}   V{Ŷ_{t+2|t} − Y_{t+2}}
 1    3.240     3.008       2.578       1.515                    4.033
 2    1.643     0.699       0.036       1.162                    3.200
 3    2.521     2.457       2.875       1.150                    3.170
 4    3.122     3.779       3.359       1.049                    3.074
 5    3.788     3.371       2.702       1.032                    3.001
 6    2.706     1.782       1.052       1.024                    3.001
 7    4.016     4.169       4.601       1.008                    2.977
 8    5.656     6.680       6.199       1.007                    2.969
 9    6.467     5.903       4.600       1.004                    2.968
10    7.047     6.201       5.622       1.002                    2.963
11    4.284     2.939       1.241       1.002                    2.962
12    2.587     0.749       0.395       1.001                    2.961
13   −0.421    −1.242      −1.671       1.000                    2.960
14    0.149     0.276       1.028       1.000                    2.960
15   −1.012    −0.776      −1.368       1.000                    2.960
and
A = (0  1  0  )
    (0  0  1  )
    (0  0  0.8).

Multiplying (4.6.45) by Y_t and Y_{t-1} and taking expectations, we have

γ_Y(0) − 0.80γ_Y(1) = 2.8260,
γ_Y(1) − 0.80γ_Y(0) = 1.4120.

It follows that [γ_Y(0), γ_Y(1)] = [10.9878, 10.2022]. If we initiate the filter with

x̂_0′ = (0, 0, 0),

the covariance matrix of x̂_0 − X_0 is the variance of [Y_t, E{Y_{t+1} | t}, E{Y_{t+2} | t}], namely,

Σ_vv,00 = (10.988  10.202   8.742)
          (10.202   9.988   8.802)
          ( 8.742   8.802   8.028).

To construct the estimator of X_1 given x̂_0 and Y_1, we compute
Σ_ww,11 = Σ_ee + AΣ_vv,00A′ = Σ_vv,00,

D_1 = (1, 0, 0)Σ_ww,11(1, 0, 0)′ = γ_Y(0) = 10.988,
and
x̂_1′ = (1.000, 0.929, 0.796)Y_1 = (3.240, 3.008, 2.578).   (4.6.47)

The reader can verify that the coefficients in (4.6.47) are γ_Y^{-1}(0)[γ_Y(0), γ_Y(1), γ_Y(2)]. The covariance matrix of x̂_1 − X_1 is

Σ_vv,11 = (0   0       0    )
          (0   0.516   0.685)
          (0   0.685   1.073).
The predictor of X_2 given x̂_1 is

x̂_{2|1} = Ax̂_1,

and the covariance matrix of the prediction error is

Σ_{2|1} = (1.515  2.085  2.248)
          (2.085  3.033  3.238)
          (2.248  3.238  3.577),

where Σ_{2|1} = AΣ_vv,11A′ + Σ_ee. The quantities required to construct x̂_2 are

Σ_ww,22 = Σ_{2|1} = (1.515  2.085  2.248)
                    (2.085  3.033  3.238)
                    (2.248  3.238  3.577)

and

D_2 = (1, 0, 0)Σ_ww,22(1, 0, 0)′ = 1.515.
The predictions and variances of the prediction errors are given in the third and fifth columns, respectively, of Table 4.6.5. As the initial effects die out, the estimator of X_t stabilizes at

x̂_t = Ax̂_{t-1} + (1.0, 1.4, 1.7)′[Y_t − (1.0, 0, 0)Ax̂_{t-1}] = AX_{t-1} + (1.0, 1.4, 1.7)′e_t.
The predictor stabilizes at

x̂_{t+1|t} = Ax̂_t = AX_t,

and the covariance matrix of the prediction error stabilizes at

Σ_{t+1|t} = V{x̂_{t+1|t} − X_{t+1}} = Σ_ee.

The second entry in x̂_{t+1|t} is the predictor of E{Y_{t+2} | t + 1}, based on
observations through time t. Thus, it is also the predictor of Y_{t+2} based on observations through time t. The limiting variance of x̂_{t+1,2|t} − Y_{t+2} is (1 + ψ_1²)σ², while the second entry on the diagonal of Σ_ee is ψ_1²σ². In the last column of Table 4.6.5 we give

V{Ŷ_{t+2|t} − Y_{t+2}} = σ² + V_{t,22},

where V_{t,22} is the second entry on the diagonal of V{x̂_{t+1|t} − X_{t+1}} and σ² = 1. AA
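The computations of Example 4.6.5 can be sketched numerically. The fragment below (assuming numpy; variable names are ours, and we take the model to be Y_t = 0.8Y_{t-1} + e_t + 0.6e_{t-1} + 0.58e_{t-2}, the coefficients implied by ψ = (1.00, 1.40, 1.70) and the displayed moment equations) builds the Akaike state space form, obtains the stationary state covariance by iterating P = APA′ + Σ_ee, and then iterates the filter variance recursion. With no measurement error, the one-step prediction variance of Y_t should fall to σ² = 1.

```python
import numpy as np

# Akaike state space form for the ARMA model of Example 4.6.5.
psi = np.array([1.00, 1.40, 1.70])
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.8]])
Q = np.outer(psi, psi)                  # Sigma_ee = psi psi' sigma^2, sigma^2 = 1

# Stationary state covariance: fixed point of P = A P A' + Q.  Its (1,1)
# element is gamma_Y(0) and its (1,2) element is gamma_Y(1).
P0 = Q.copy()
for _ in range(200):
    P0 = A @ P0 @ A.T + Q

# Variance recursion of the filter.  With no measurement error the update
# conditions on Y_t exactly, so the one-step variance D approaches 1.
P = P0.copy()
for _ in range(40):
    Pw = A @ P @ A.T + Q                # prediction covariance (= Sigma_ww)
    D = float(Pw[0, 0])                 # Var(Y_t - Yhat_t); no noise added
    P = Pw - np.outer(Pw[:, 0], Pw[0, :]) / D
```

After convergence the prediction covariance Pw approaches Σ_ee, so its (2,2) entry approaches ψ_1² = 1.96, and the two-step prediction variance of Y approaches 1 + 1.96 = 2.96, the limits shown in Table 4.6.5.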
REFERENCES
Sections 4.1, 4.3-4.5. Anderson (1971), Beran (1992), Blackman and Tukey (1959), Bloomfield (1976), Brillinger (1975), Brockwell and Davis (1991), Granger and Hatanaka (1964), Hannan (1960, 1970), Harris (1967), Jenkins and Watts (1968), Kendall and Stuart (1966), Koopmans (1974), Whittle (1963), Yaglom (1962).
Section 4.2. Amemiya and Fuller (1967), Bellman (1960), Wahba (1968).
Section 4.6. Akaike (1974), Anderson and Moore (1979), Diderrich (1985), Duncan and Horn (1972), Jones (1980), Kalman (1960, 1963), Kalman and Bucy (1961), Meinhold and Singpurwalla (1983), Sage (1968), Sallas and Harville (1981), Sorenson (1966).
EXERCISES
1. Which of the following functions is the spectral density of a stationary time series? Explain why or why not.
(a) f(ω) = 1 + ω², −π < ω ≤ π.
(b) f(ω) = 1 + (1/2)ω, −π < ω ≤ π.
(c) f(ω) = 4/6 + cos 13ω, −π ≤ ω ≤ π.
2. Let {X_t : t ∈ (0, ±1, ±2, …)} be defined by X_t = e_t + 0.4e_{t-1}. Compute the autocovariance function γ(h) and the spectral density f(ω), given that the e_t are independently and identically distributed (0, σ²) random variables.
3. Give the spectral density for the time series defined by

X_t − βX_{t-1} = e_t + α_1e_{t-1} + α_2e_{t-2},  t = 0, ±1, ±2, …,

where |β| < 1, the roots of m² + α_1m + α_2 = 0 are less than one in absolute value, and the e_t are uncorrelated (0, σ²) random variables.
4. Let

X_t + α_1X_{t-1} + α_2X_{t-2} = Z_t
and
Z_t + b_1Z_{t-1} + b_2Z_{t-2} = e_t,

where the e_t are uncorrelated (0, σ²) random variables, and the roots of both m² + α_1m + α_2 = 0 and r² + b_1r + b_2 = 0 are less than one in absolute value. Give an expression for the spectral density of X_t. How do you describe the time series X_t?
5. Find the covariance function and spectral distribution function for the time series

X_t = u_1 cos t + u_2 sin t + Y_t,

where (u_1, u_2)′ is distributed as a bivariate normal random variable with zero mean and diagonal covariance matrix diag(2, 2), Y_t = e_t − e_{t-1}, and the e_t are independent (0, 3) random variables, independent of (u_1, u_2)′.
6. Given the following spectral distribution function:

F_X(ω) = { ω + π,     −π ≤ ω < −π/2,
           ω + 5π/4,  −π/2 ≤ ω < π/2,
           ω + 6π/4,  π/2 ≤ ω ≤ π.

What is the variance of X_t? What is the spectral distribution function of X_t − X_{t-1}? Is there a k such that X_t − X_{t-k} will have a continuous spectral distribution function?
7. Prove Corollary 4.3.1.3.
8. Let {e_t : t ∈ (0, ±1, ±2, …)} be a time series of uncorrelated (0, σ²) random variables. Let X_t = e_t + 0.5e_{t-1}. Give the covariance matrix Γ(h) and the spectral matrix f(ω) for (X_t, e_t)′.
9. Let {e_{1t}} and {e_{2t}} be two independent sequences of uncorrelated random variables with variances 0.34 and 0.50, respectively. Let

X_{1t} = 0.8X_{1,t-1} + e_{1t},
X_{2t} = X_{1t} + e_{2t}.
10. The complex vector random variable X of dimension q is distributed as a complex multivariate normal if the real vector [(Re X)′, (Im X)′]′ is distributed as a multivariate normal with mean [(Re μ)′, (Im μ)′]′ and covariance
matrix
(1/2)(Re Σ   −Im Σ)
     (Im Σ    Re Σ),

where μ = E{X} and Σ is a positive semidefinite Hermitian matrix. Let (A_1, B_1, A_2, B_2) be the multivariate normal random variable defined following (4.4.9). Show that X = (A_1 + iB_1, A_2 + iB_2)′ is distributed as a complex bivariate normal random variable. Give the Hermitian matrix Σ.
11. Let {a_j}_{j=-∞}^{∞} and {b_j}_{j=-∞}^{∞} be absolutely summable, and let {X_t : t ∈ (0, ±1, ±2, …)} be a stationary time series with absolutely summable covariance function. Consider the following filtering operation: (1) apply the filter {a_j} to the time series X_t to obtain a time series Z_t, and then (2) apply the filter {b_j} to Z_t to obtain a time series Y_t. What is the transfer function of this filtering operation? Express the spectral density of Y_t as a function of the spectral density of X_t and of the transfer function.
12. Let {a_j}, {b_j}, and {X_t} be as defined in Exercise 11, and define
Y_t = Σ_{j=-∞}^{∞} (a_j + b_j) X_{t-j}.
Express the spectral density of Y_t as a function of the spectral density of X_t and the transfer functions of {a_j} and {b_j}.
13. Let X_t and Y_t be defined by
X_t = 0.9X_{t-1} + e_t,  Y_t = X_t + u_t,
where {e_t} is a sequence of normal independent (0, 1) random variables independent of the sequence {u_t}. Let u_t satisfy the difference equation
u_t + 0.5u_{t-1} = v_t,
where {v_t} is a sequence of normal independent (0, 0.3) random variables. Assuming that only Y_t is observed, construct the filter {a_j: j = -2, -1, 0, 1, 2} so that
Σ_{j=-2}^{2} a_j Y_{t-j}
is the minimum mean square error estimator of X_t. Construct the one-sided filter {b_j: j = 0, 1, ..., 5} to estimate X_t. How does the mean square error of these filters compare with the lower bound for linear filters?
14. Let f(ω) be an even nonnegative continuous periodic function of period 2π. Let
α(h) = ∫_{-π}^{π} f(ω) e^{-iωh} dω.
Show that, for q a positive integer,
γ(h) = (1 - |h| q^{-1}) α(h),  h = 0, ±1, ±2, ..., ±q,
γ(h) = 0  otherwise,
is the covariance function of a stationary time series. (Hint: See Exercise 3.15 and Theorem 3.1.10.)
15. Let X_t be a time series with continuous spectral density f_X(ω). Let Y_t be a time series satisfying
Σ_{j=0}^{p} a_j Y_{t-j} = e_t,
where the e_t are uncorrelated (0, σ²) random variables, a_0 = 1, and Y_t is defined by Theorem 4.3.4.
(a) Show that if |f_Y(ω) - f_X(ω)| < ε for all ω, then |γ_Y(h) - γ_X(h)| < 2πε for all h.
(b) Let f_X(ω) be strictly positive. Prove that, given ε > 0, there is a p and a set {a_j: j = 0, 1, ..., p} with a_0 = 1 such that the time series Z_t defined by
Σ_{j=0}^{p} a_j Z_{t-j} = e_t
satisfies
|f_Z(ω) - f_X(ω)| < ε
for all ω, where f_Z(ω) is the spectral density of Z_t. Show that
Σ_{h=1}^{∞} [γ_Z(h) - γ_X(h)]² ≤ 2π²ε².
(c) Let f_X(ω) be strictly positive. Show that, given ε > 0, one may define two autoregressive time series Y_{1t} and Y_{2t}, with spectral densities f_1(ω) and f_2(ω), such that f_1(ω) ≤ f_X(ω) ≤ f_2(ω) for all ω.
(d) For the three time series defined in part (c), prove that
Σ_{t=1}^{n} Σ_{s=1}^{n} a_t a_s γ_1(t-s) ≤ Σ_{t=1}^{n} Σ_{s=1}^{n} a_t a_s γ_X(t-s) ≤ Σ_{t=1}^{n} Σ_{s=1}^{n} a_t a_s γ_2(t-s)
for any fixed real numbers {a_t: t = 1, 2, ..., n}.

16. Let
X_{1t} = e_t - 0.8e_{t-1},  X_{2t} = u_t - 0.5u_{t-1},
where {e_t} is a sequence of independent (0, 1) random variables independent of {u_t}, a sequence of independent (0, 6) random variables. Express
Y_t = X_{1t} + X_{2t}
as a moving average process.
17. Let Y_t be defined by
Y_t = S_t + Z_t,  S_t = 0.9S_{t-1} + u_t,  Z_t = 0.8Z_{t-1} + e_t,
where {(e_t, u_t)'} is a sequence of normal independent (0, Σ) random variables with Σ = diag(0.1, 0.6). Construct the optimum filter {a_j: j = -9, -8, ..., 8, 9} to estimate S_t, where the estimator is defined by Σ_{j=-9}^{9} a_j Y_{t-j}. Construct the best one-sided filter {b_j: j = 0, 1, ..., 8, 9} to estimate S_t. Compare the mean square error of these filters with the lower bound for linear filters.
18. Let C and D be k × k nonsingular matrices. Show that
(C^{-1} + D^{-1})^{-1} D^{-1} = C(C + D)^{-1}.
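A quick numerical spot-check of this identity can be run as follows (a hypothetical script, not part of the text; C and D are random positive definite test matrices, which are in particular nonsingular):

```python
import numpy as np

# Check (C^{-1} + D^{-1})^{-1} D^{-1} = C (C + D)^{-1} on random test matrices.
rng = np.random.default_rng(0)
k = 4
A = rng.standard_normal((k, k))
B = rng.standard_normal((k, k))
C = A @ A.T + k * np.eye(k)   # symmetric positive definite, hence nonsingular
D = B @ B.T + k * np.eye(k)

lhs = np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(D)) @ np.linalg.inv(D)
rhs = C @ np.linalg.inv(C + D)
max_abs_err = np.max(np.abs(lhs - rhs))
print(max_abs_err)  # machine-precision agreement
```

Such a check is no substitute for the algebraic proof, which follows from (C^{-1} + D^{-1})^{-1} = C(C + D)^{-1}D.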
19. Let X ~ N(0, σ_X²), and let Y = X + u, where u ~ N(0, σ_u²) is independent of X. Give the covariance matrix of (Y, X) and the conditional expected value of X given Y. Consider the regression problem
(Y, 0)' = (1, 1)'X + (u, ε)',
where (u, ε)' ~ N[(0, 0)', diag(σ_u², σ_X²)]. Show that the generalized regression estimator of X constructed with known (σ_u², σ_X²) is the conditional expected value of X given Y.
20. Let Σ_{11} be an r × r nonsingular matrix, Σ_{22} a p × p nonsingular matrix, and H an r × p matrix. Show that
[H'Σ_{11}^{-1}H + Σ_{22}^{-1}]^{-1} = Σ_{22} - Σ_{22}H'(Σ_{11} + HΣ_{22}H')^{-1}HΣ_{22},
and hence verify (4.6.21). [See Duncan and Horn (1972).]
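This is a Woodbury-type identity, and it too can be spot-checked numerically (an informal sketch; the dimensions and test matrices below are arbitrary choices, not quantities from the text):

```python
import numpy as np

# Check [H' S11^{-1} H + S22^{-1}]^{-1} = S22 - S22 H' (S11 + H S22 H')^{-1} H S22.
rng = np.random.default_rng(1)
r, p = 5, 3
A = rng.standard_normal((r, r))
B = rng.standard_normal((p, p))
S11 = A @ A.T + r * np.eye(r)   # r x r nonsingular
S22 = B @ B.T + p * np.eye(p)   # p x p nonsingular
H = rng.standard_normal((r, p))

lhs = np.linalg.inv(H.T @ np.linalg.inv(S11) @ H + np.linalg.inv(S22))
rhs = S22 - S22 @ H.T @ np.linalg.inv(S11 + H @ S22 @ H.T) @ H @ S22
woodbury_err = np.max(np.abs(lhs - rhs))
print(woodbury_err)
```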
21. Using Exercise 20, verify equation (4.6.23).
22. Let the model (4.6.1)-(4.6.2) hold, and assume it is desired to construct a recursive estimator of X_{t+s} given the observations Y_1, Y_2, ..., Y_t. Construct such an estimator. Consider both s > 0 and s < 0.
23. Let the following linear model hold:
Y_1 = X_1β + u_1,  Y_2 = X_2β + u_2,
where Y_1 is n_1 × 1, X_1 is n_1 × k, β is k × 1, Y_2 is n_2 × 1, X_2 is n_2 × k, and (u_1', u_2')' ~ N[0, Iσ²]. Let X_1'X_1 be nonsingular, and let
β̂_1 = (X_1'X_1)^{-1}X_1'Y_1
and X_1'X_1 be given. Construct the best linear unbiased estimator of β as a function of β̂_1, X_1'X_1, X_2, and Y_2.
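One candidate answer can be checked by simulation (a sketch under the assumption that the combined estimator is β̂ = (X_1'X_1 + X_2'X_2)^{-1}(X_1'X_1 β̂_1 + X_2'Y_2); this should then agree with ordinary least squares on the stacked data):

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, k = 30, 20, 3
beta = np.array([1.0, -2.0, 0.5])
X1 = rng.standard_normal((n1, k))
X2 = rng.standard_normal((n2, k))
Y1 = X1 @ beta + rng.standard_normal(n1)
Y2 = X2 @ beta + rng.standard_normal(n2)

bhat1 = np.linalg.solve(X1.T @ X1, X1.T @ Y1)          # first-sample OLS
combined = np.linalg.solve(X1.T @ X1 + X2.T @ X2,
                           X1.T @ X1 @ bhat1 + X2.T @ Y2)

# OLS on the stacked data for comparison.
X = np.vstack([X1, X2])
Y = np.concatenate([Y1, Y2])
stacked = np.linalg.solve(X.T @ X, X.T @ Y)
combo_err = np.max(np.abs(combined - stacked))
print(combo_err)
```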
24. Prove Corollary 4.2.2.
25. Prove the following lemma.
Lemma. Let Σ be the nonsingular covariance matrix of the vector Y' = (Y_1', Y_2', Y_3'). Let (Ŷ_{2|1}', Ŷ_{3|1}') be the best linear estimator of (Y_2', Y_3') based upon Y_1. Let
(e_{2|1}', e_{3|1}') = [(Y_2 - Ŷ_{2|1})', (Y_3 - Ŷ_{3|1})'],
and let V be the covariance matrix of (e_{2|1}', e_{3|1}')', with submatrices V_{ij}, i, j = 2, 3. Then the best linear estimator of Y_3 given (Y_1', Y_2') is the best linear estimator of Y_3 given Y_1, plus the best linear estimator of e_{3|1} given e_{2|1}. That is,
Ŷ_{3|(1,2)} = Ŷ_{3|1} + ê_{3|2},
where
ê_{3|2} = V_{32}V_{22}^{-1}e_{2|1}.

26. Show that the predictor of U_{n+1} given (Y_1, ..., Y_n) of Example 4.6.4 is the first element of A'X̂_{n|n}. Show that the variance of the prediction error is the upper left element of
A P_{n|n} A' + Σ_{εε} + Σ_{uu}.

27. Assume that the time series U_t has a spectral density f_U(ω) such that f_U(ω) > 0 for all ω and such that the Fourier coefficients of [f_U(ω)]^{-1} are absolutely summable. Cleveland (1972) defines the inverse autocorrelation function by ρ_I(h) = [γ_I(0)]^{-1}γ_I(h), where
γ_I(h) = (4π²)^{-1} ∫_{-π}^{π} [f_U(ω)]^{-1} e^{iωh} dω.
(a) Show that if U_t is a stationary autoregressive process satisfying
Σ_{i=0}^{p} α_i U_{t-i} = e_t,  α_0 = 1,
then the ρ_I(h) are the autocorrelations of the moving average process
X_t = e_t + Σ_{i=1}^{p} α_i e_{t-i}.
(b) Show that if U_t is an ARMA(p, q) process, then ρ_I(h) is the autocorrelation of the ARMA(q, p) process with the autoregressive and moving average coefficients interchanged.
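Part (a) can be checked numerically for an AR(1) (an informal sketch; alpha and sigma2 are arbitrary test values). The spectral density of U_t + alpha U_{t-1} = e_t is f(ω) = σ²(2π)^{-1}|1 + αe^{-iω}|^{-2}, and ρ_I(1) should equal the MA(1) autocorrelation α/(1 + α²):

```python
import numpy as np

alpha, sigma2 = 0.6, 1.0
N = 4096
# Midpoint grid on (-pi, pi); midpoint sums are very accurate for periodic integrands.
w = -np.pi + (np.arange(N) + 0.5) * (2 * np.pi / N)
f = sigma2 / (2 * np.pi) / np.abs(1 + alpha * np.exp(-1j * w)) ** 2

def gamma_inv(h):
    # gamma_I(h) = (4 pi^2)^{-1} Int [f(w)]^{-1} e^{iwh} dw
    integrand = np.real(np.exp(1j * w * h) / f)
    return integrand.sum() * (2 * np.pi / N) / (4 * np.pi ** 2)

rho_I1 = gamma_inv(1) / gamma_inv(0)
print(rho_I1, alpha / (1 + alpha ** 2))  # the two values should agree
```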
28. Let X_t have the spectral density
f_X(ω) = (2π)^{-1},  π/4 ≤ |ω| ≤ π/2,
f_X(ω) = 0  otherwise.
The time series X_t is sometimes called "bandpassed white noise." Give the covariance function of X_t. Plot the covariance function. Using the covariance function (or otherwise), approximate X_t by a third order autoregressive process. Plot the spectral density of the third order process with the spectral density of the original process. What would be the error made in predicting X_t one period ahead using the third order autoregressive process?
Given the availability of a random number generator, explain how you would create 200 observations from the time series X_t using the "bandpass" idea. Assume you wish to create a normal time series.
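One possible "bandpass" construction is sketched below (the grid size M and the seed are arbitrary choices, and this is only one of several valid answers). Integrating the density gives γ(h) = (πh)^{-1}[sin(πh/2) - sin(πh/4)] and γ(0) = 1/4; a normal series with this covariance structure can be approximated by a sum of sinusoids at band frequencies with independent normal amplitudes:

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma(h):
    if h == 0:
        return 0.25
    return (np.sin(np.pi * h / 2) - np.sin(np.pi * h / 4)) / (np.pi * h)

# Approximate the band [pi/4, pi/2] by M equally spaced frequencies; give each an
# independent normal amplitude pair with variance 2 f(w_j) dw, so variances sum to 1/4.
M = 64
dw = (np.pi / 2 - np.pi / 4) / M
wj = np.pi / 4 + (np.arange(M) + 0.5) * dw
var_j = 2 * (1 / (2 * np.pi)) * dw
a = rng.normal(0.0, np.sqrt(var_j), M)
b = rng.normal(0.0, np.sqrt(var_j), M)

t = np.arange(1, 201)                       # 200 observations
X = (a[:, None] * np.cos(np.outer(wj, t)) +
     b[:, None] * np.sin(np.outer(wj, t))).sum(axis=0)
print(len(X), gamma(0), gamma(1))
```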
29. Let
Y_t = S_t + u_t,
where S_t is a stationary time series satisfying
S_t = 0.9S_{t-1} + e_t,
where the e_t are NI(0, 1.9) random variables. Let u_t be a sequence of NI(0, 1) random variables, where u_t is independent of S_j for all t, j. Construct a centered two-sided filter of length seven to estimate S_t given (Y_{t-3}, ..., Y_{t+3}). Construct a one-sided filter of length 6 using (Y_t, Y_{t-1}, ..., Y_{t-5}) to estimate S_t. For the two-sided filter define
X_t = S_t - Ŝ_t,
where Ŝ_t is the estimate of S_t computed using the filter. Compare the variance of X_t with the lower bound for a two-sided filter.
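A numerical route to the two-sided filter is to solve the normal equations directly (a sketch, not the book's worked solution). Here γ_S(h) = 1.9/(1 - 0.81) · 0.9^{|h|} = 10 · 0.9^{|h|}, and γ_Y(h) adds the noise variance 1 at h = 0; the weights a_j, j = -3, ..., 3, solve Σ_j a_j γ_Y(i - j) = γ_S(i):

```python
import numpy as np

phi, var_e, var_u = 0.9, 1.9, 1.0
gS = lambda h: var_e / (1 - phi ** 2) * phi ** abs(h)       # signal autocovariance
gY = lambda h: gS(h) + (var_u if h == 0 else 0.0)           # observed autocovariance

idx = np.arange(-3, 4)
G = np.array([[gY(i - j) for j in idx] for i in idx])       # 7 x 7 Toeplitz system
g = np.array([gS(i) for i in idx])
a = np.linalg.solve(G, g)                                   # filter weights
mse = gS(0) - a @ g                                         # filtering mean square error
print(np.round(a, 4))
print(round(mse, 4))
```

By symmetry of the covariances, the weights come out symmetric about j = 0, and the mean square error must be below the error of the trivial single-tap filter.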
30. Let Y_t be a stationary autoregressive process satisfying
Σ_{i=0}^{p} a_i Y_{t-i} = e_t,  or  B(ℬ)Y_t = e_t,
where a_0 = 1, B(ℬ) = Σ_{i=0}^{p} a_i ℬ^i, the e_t are independent (0, σ²) random variables, and ℬ is the backshift operator. The process also satisfies
Σ_{i=0}^{p} a_i Y_{t+i} = α_t,  or  A(ℬ)Y_t = α_t,
where A(ℬ) = Σ_{i=0}^{p} a_i ℬ^{-i} and the α_t are uncorrelated (0, σ²) random variables. Show that the partial correlation between Y_t and Y_{t-p-j} given (Y_{t-1}, Y_{t-2}, ..., Y_{t-p-j+1}) is zero for j = 1, 2, ..., and that the partial correlation between Y_t and Y_{t+p+j} given (Y_{t+1}, Y_{t+2}, ..., Y_{t+p+j-1}) is zero for j = 1, 2, ....
31. Let
Y_t = X_t + u_t,  X_t = θX_{t-1} + e_t,
where e_t ~ NI(0, σ_e²), u_t ~ NI(0, σ_u²), {e_t} is independent of {u_t}, and X_t is stationary with |θ| < 1. Show that
Y_t = θY_{t-1} + v_t + βv_{t-1},
where β(1 + β²)^{-1} = -[σ_e² + (1 + θ²)σ_u²]^{-1}θσ_u², the v_t are NI(0, σ_v²), and (1 + β²)σ_v² = σ_e² + (1 + θ²)σ_u².
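The moment matching behind this reduction can be verified for one parameter choice (θ, σ_e², σ_u² below are arbitrary test values). The process w_t = Y_t - θY_{t-1} = e_t + u_t - θu_{t-1} has lag-0 autocovariance σ_e² + (1 + θ²)σ_u² and lag-1 autocovariance -θσ_u²; matching an MA(1) v_t + βv_{t-1} requires (1 + β²)σ_v² = c_0 and βσ_v² = c_1:

```python
import numpy as np

theta, var_e, var_u = 0.8, 1.0, 0.5
c0 = var_e + (1 + theta ** 2) * var_u    # lag-0 autocovariance of w_t
c1 = -theta * var_u                      # lag-1 autocovariance of w_t

# beta / (1 + beta^2) = c1 / c0; take the invertible root (|beta| < 1).
r = c1 / c0
beta = (1 - np.sqrt(1 - 4 * r ** 2)) / (2 * r)
var_v = c1 / beta

print(round(beta, 4), round(var_v, 4))
```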
C H A P T E R 5
Some Large Sample Theory
So far, we have been interested in ways of representing time series and describing their properties. In most practical situations we have a portion of a realization, or of several realizations, and we wish a description (an estimate of the parameters) of the time series.
Most of the presently available results on the estimation of the covariance function, the parameters of autoregressive and moving average processes, and the spectral density rest on large sample theory. Therefore, we shall present some results in large sample statistics.
5.1. ORDER IN PROBABILITY
Concepts of relative magnitude or order of magnitude are useful in investigating limiting behavior of random variables. We first define the concepts of order as used in real analysis. Let {a_n}_{n=1}^{∞} be a sequence of real numbers and {g_n}_{n=1}^{∞} a sequence of positive real numbers.
Definition 5.1.1. We say a_n is of smaller order than g_n and write
a_n = o(g_n)
if
lim_{n→∞} g_n^{-1}a_n = 0.
Definition 5.1.2. We say a_n is at most of order g_n and write
a_n = O(g_n)
if there exists a real number M such that g_n^{-1}|a_n| ≤ M for all n.
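A tiny numerical illustration of the two definitions (an informal sketch; the sequences are arbitrary examples): a_n = log n is o(n) because the ratio tends to zero, and a_n = 3n + 5 is O(n) because the ratio stays bounded, e.g. by M = 8 for n ≥ 1.

```python
import math

ratios = [math.log(n) / n for n in (10, 100, 1000, 10000)]   # -> 0, so log n = o(n)
bounded = all((3 * n + 5) / n <= 8 for n in range(1, 10001)) # 3n + 5 = O(n)
print(ratios[-1], bounded)
```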
The properties of Lemma 5.1.1 are easily established using the definitions of order and the properties of limits.
Lemma 5.1.1. Let {a_n} and {b_n} be sequences of real numbers. Let {f_n} and {g_n} be sequences of positive real numbers.
(i) If a_n = o(f_n) and b_n = o(g_n), then a_nb_n = o(f_ng_n), |a_n|^s = o(f_n^s) for s > 0, and a_n + b_n = o(max{f_n, g_n}).
(ii) If a_n = O(f_n) and b_n = O(g_n), then a_nb_n = O(f_ng_n), |a_n|^s = O(f_n^s) for s > 0, and a_n + b_n = O(max{f_n, g_n}).
(iii) If a_n = o(f_n) and b_n = O(g_n), then a_nb_n = o(f_ng_n).
The concepts of order, when applied to random variables, are closely related to convergence in probability.
Definition 5.1.3. The sequence of random variables {X_n} converges in probability to the random variable X, and we write
plim X_n = X
(the probability limit of X_n is X), if for every ε > 0
lim_{n→∞} P{|X_n - X| > ε} = 0.
An equivalent definition is that for every ε > 0 and δ > 0 there exists an N such that
P{|X_n - X| > ε} < δ
for all n > N. The notation
X_n →^P X
is also frequently used to indicate that X_n converges in probability to X.
For sequences of random variables, definitions of order in probability were introduced by Mann and Wald (1943b). Let {X_n} be a sequence of random variables and {g_n} a sequence of positive real numbers.
Definition 5.1.4. We say X_n is of smaller order in probability than g_n and write
X_n = o_p(g_n)
if
plim g_n^{-1}X_n = 0.
Definition 5.1.5. We say X_n is at most of order in probability g_n and write
X_n = O_p(g_n)
if, for every ε > 0, there exists a positive real number M_ε such that
P{|X_n| ≥ M_εg_n} ≤ ε
for all n. If X_n = O_p(g_n), we sometimes say that X_n is bounded in probability by g_n.
We define a vector random variable to be O_p(g_n) if every element of the vector is O_p(g_n), as follows.
Definition 5.1.6. If X_n is a k-dimensional random variable, then X_n is at most of order in probability g_n, and we write
X_n = O_p(g_n),
if, for every ε > 0, there exists a positive real number M_ε such that
P{|X_{jn}| ≥ M_εg_n} ≤ ε,  j = 1, 2, ..., k,
for all n. We say X_n is of smaller order in probability than g_n and write
X_n = o_p(g_n)
if, for every ε > 0 and δ > 0, there exists an N such that for all n > N,
P{|X_{jn}| > εg_n} < δ,  j = 1, 2, ..., k.
Note that k might be a function of n, and X_n could still satisfy the definition. However, it is clear that the M_ε of the definition is a function of ε only (and not of n).
A matrix random variable may be viewed as a vector random variable with the elements displayed in a particular manner, or as a collection of vector random variables. Therefore, we shall define the order of matrix random variables in an analogous manner.
Definition 5.1.7. A k × r matrix B_n of random variables is at most of order in probability g_n, and we write
B_n = O_p(g_n),
if for every ε > 0 there exists a positive real number M_ε such that
P{|b_{ijn}| ≥ M_εg_n} ≤ ε,  i = 1, 2, ..., k,  j = 1, 2, ..., r,
for all n, where the b_{ijn} are the elements of B_n. We say that B_n is of smaller order in probability than g_n and write
B_n = o_p(g_n)
if for every ε > 0 and δ > 0 there exists an N such that for all n > N,
P{|b_{ijn}| > εg_n} < δ,  i = 1, 2, ..., k,  j = 1, 2, ..., r.
For real numbers a_i, i = 1, 2, ..., n, we know that |Σ_{i=1}^{n} a_i| ≤ Σ_{i=1}^{n} |a_i|. The following lemma, resting on this property, furnishes a useful bound on the probability that the absolute value of a sum of random variables exceeds a given number.

Lemma 5.1.2. Let X_i, i = 1, 2, ..., n, be k-dimensional random variables. Then, for every ε > 0,
P{|Σ_{i=1}^{n} X_i| ≥ ε} ≤ Σ_{i=1}^{n} P{|X_i| ≥ n^{-1}ε}.

Proof. Let ε > 0 be arbitrary. We see that if |Σ_{i=1}^{n} X_i| ≥ ε, then |X_i| ≥ ε/n for at least one i ∈ {1, 2, ..., n}. Therefore, the event {|Σ_{i=1}^{n} X_i| ≥ ε} is contained in the union of the events {|X_i| ≥ n^{-1}ε}, and the result follows. ∎
Definition 5.1.3 applies for vector random variables of fixed dimension as well as for scalar random variables, if it is understood that |X_n - X| is the common Euclidean distance.
Lemma 5.1.3. Let X_n be a k-dimensional random variable such that
plim X_{jn} = X_j,  j = 1, 2, ..., k,
where X_{jn} is the jth element of X_n. Then, for k fixed,
plim X_n = X.

Proof. By hypothesis, for each j and for every ε > 0 and δ > 0, there exists an integer N_j such that for all n > N_j,
P{|X_{jn} - X_j| > k^{-1/2}ε} ≤ k^{-1}δ.
Let N be the maximum of {N_1, N_2, ..., N_k}. Using Lemma 5.1.2, we have
P{|X_n - X| > ε} ≤ Σ_{j=1}^{k} P{|X_{jn} - X_j| > k^{-1/2}ε} ≤ δ
for n > N. ∎
The proof of Lemma 5.1.3 should also help to make it clear that if k is not fixed, then the fact that X_{jn} = o_p(1) for each j does not necessarily imply that lim |X_n| = 0. The vector random variable composed of n entries all equal to n^{-1/2} furnishes a counterexample for k = n.
We shall demonstrate later (Theorems 5.1.5 and 5.1.6) that operations valid for order are also valid for order in probability. Since it is relatively easy to establish the properties analogous to those of Lemma 5.1.1, we do so at this time.
Lemma 5.1.4. Let {f_n} and {g_n} be sequences of positive real numbers, and let {X_n} and {Y_n} be sequences of random variables.
(i) If X_n = o_p(f_n) and Y_n = o_p(g_n), then X_nY_n = o_p(f_ng_n), |X_n|^s = o_p(f_n^s) for s > 0, and X_n + Y_n = o_p(max{f_n, g_n}).
(ii) If X_n = O_p(f_n) and Y_n = O_p(g_n), then X_nY_n = O_p(f_ng_n), |X_n|^s = O_p(f_n^s) for s > 0, and X_n + Y_n = O_p(max{f_n, g_n}).
(iii) If X_n = o_p(f_n) and Y_n = O_p(g_n), then X_nY_n = o_p(f_ng_n).

Proof. We investigate only part i, leaving parts ii and iii as an exercise. By arguments similar to those of Lemma 5.1.3, |X_nY_n| > εf_ng_n implies that |X_n| > ε^{1/2}f_n or (and) |Y_n| > ε^{1/2}g_n. By hypothesis, given ε > 0 and δ > 0, there is an N such that
P{|X_n| > ε^{1/2}f_n} < 0.5δ,
P{|Y_n| > ε^{1/2}g_n} < 0.5δ
for n > N. Therefore,
P{|X_nY_n| > εf_ng_n} < δ
for n > N. The second result of part i follows from
P{|X_n|^s > εf_n^s} = P{|X_n| > ε^{1/s}f_n},
which holds for all ε > 0. Let q_n = max{f_n, g_n}. Given ε > 0 and δ > 0, there exists an N such that
P{|X_n| > 0.5εq_n} < 0.5δ,  P{|Y_n| > 0.5εq_n} < 0.5δ
for n > N. Hence, the third result of part i follows by Lemma 5.1.2. ∎
One of the most useful tools for establishing the order in probability of random variables is Chebyshev's inequality.

Theorem 5.1.1 (Chebyshev's inequality). Let r > 0, let X be a random variable such that E{|X|^r} < ∞, and let F(x) be the distribution function of X. Then, for every ε > 0 and finite A,
P{|X - A| ≥ ε} ≤ ε^{-r}E{|X - A|^r}.

Proof. Let us denote by S the set of x for which |x - A| ≥ ε and by S̄ the set of x for which |x - A| < ε. Then
E{|X - A|^r} = ∫_S |x - A|^r dF(x) + ∫_{S̄} |x - A|^r dF(x)
≥ ε^r ∫_S dF(x) = ε^r P{|X - A| ≥ ε}. ∎

It follows from Chebyshev's inequality that any random variable with finite variance is bounded in probability by the square root of its second moment about the origin.
Corollary 5.1.1.1. Let {X_n} be a sequence of random variables and {a_n} a sequence of positive real numbers such that
E{X_n²} = O(a_n²).
Then
X_n = O_p(a_n).

Proof. By assumption there exists an M_1 such that
E{X_n²} ≤ M_1a_n²
for all n. By Chebyshev's inequality, for any M_ε > 0,
P{|X_n| ≥ M_εa_n} ≤ M_ε^{-2}a_n^{-2}E{X_n²} ≤ M_1M_ε^{-2}.
Hence, given ε > 0, we choose M_ε ≥ M_1^{1/2}ε^{-1/2}, and the result follows. ∎
If the sequence {X_n} has zero mean, or a mean whose order is less than or equal to the order of the standard error, then the order in probability of the sequence is the order of the standard error.

Corollary 5.1.1.2. Let the sequence of random variables {X_n} satisfy
E{(X_n - E{X_n})²} = O(a_n²)  and  E{X_n} = O(a_n),
where {a_n} is a sequence of positive real numbers. Then
X_n = O_p(a_n).

Proof. By the assumptions and by property ii of Lemma 5.1.1,
E{X_n²} = E{(X_n - E{X_n})²} + (E{X_n})² = O(a_n²),
and the result follows by Corollary 5.1.1.1. ∎
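The corollary can be watched in action by Monte Carlo (an informal sketch; the iid normal population, seed, and ε are arbitrary choices). For the sample mean of iid (0, 1) variables, E{X̄_n²} = n^{-1}, so X̄_n = O_p(n^{-1/2}): taking M_ε = ε^{-1/2} (the choice M_1^{1/2}ε^{-1/2} with M_1 = 1), the exceedance frequency should stay below ε at every n:

```python
import numpy as np

rng = np.random.default_rng(4)
eps = 0.1
M_eps = eps ** -0.5
reps = 5000
freqs = []
for n in (10, 100, 1000):
    xbar = rng.standard_normal((reps, n)).mean(axis=1)
    freqs.append(np.mean(np.abs(xbar) >= M_eps * n ** -0.5))
print(freqs)  # each frequency should be at most about eps = 0.1
```

Chebyshev is conservative here; the observed frequencies are far below the bound ε.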
Let the probability limits of two sequences of random variables be defined. We now demonstrate that the sequences have a common probability limit if the probability limit of the sequence of differences is zero.
Theorem 5.1.2. Let {X_n} and {Y_n} be sequences of random variables such that
plim (X_n - Y_n) = 0.
If there exists a random variable X such that plim X_n = X, then
plim Y_n = X.

Proof. Given ε > 0 and δ > 0, there exists, by hypothesis, an N such that for n > N,
P{|Y_n - X_n| ≥ 0.5ε} ≤ 0.5δ  and  P{|X_n - X| ≥ 0.5ε} ≤ 0.5δ.
Applying Lemma 5.1.2, for n > N,
P{|Y_n - X| ≥ ε} ≤ δ. ∎
Definition 5.1.8. For r ≥ 1, the sequence of random variables {X_n} converges in rth mean to the random variable X if E{|X_n|^r} < ∞ for all n and
E{|X_n - X|^r} → 0  as n → ∞.
We denote convergence in rth mean by writing X_n →^r X.
We note that if E{|X_n - X_m|^r} → 0 as n → ∞ and m → ∞, then there exists a random variable X such that X_n →^r X. Using Chebyshev's inequality, it is easy to demonstrate that convergence in rth mean implies convergence in probability.
Theorem 5.1.3. Let {X_n} be a sequence of random variables with finite rth moments. If there exists a random variable X such that X_n →^r X, then plim X_n = X.

Proof. Given ε > 0,
P{|X_n - X| ≥ ε} ≤ ε^{-r}E{|X_n - X|^r}
by Chebyshev's inequality. For δ > 0 there is, by hypothesis, an integer N = N(ε, δ) such that for all n > N,
E{|X_n - X|^r} < ε^r δ,
and therefore, for n > N,
P{|X_n - X| ≥ ε} < δ. ∎
One useful consequence is Corollary 5.1.3.1, which can be paraphrased as follows. If the sequence of differences of two sequences of random variables converges in squared mean to zero, then the two sequences of random variables have a common probability limit if the limit exists.
Corollary 5.1.3.1. Let {X_n} and {Y_n} be sequences of random variables such that
lim_{n→∞} E{(X_n - Y_n)²} = 0.
If there exists a random variable X such that plim X_n = X, then plim Y_n = X.

Proof. By Theorem 5.1.3, we have that plim (X_n - Y_n) = 0. The conclusion follows by Theorem 5.1.2. ∎
Corollary 5.1.3.2. If the sequence of random variables {Y_n} is such that
lim_{n→∞} E{Y_n} = μ
and
lim_{n→∞} Var{Y_n} = 0,
then plim Y_n = μ.

Proof. The proof follows directly by letting the constants μ and {E{Y_n}} be, respectively, the X and {X_n} of Corollary 5.1.3.1. ∎
Since we often work with functions of sequences of random variables, the following theorem is very important. The theorem states that if the function g(x) is continuous, then "the probability limit of the function is the function of the probability limit."

Theorem 5.1.4. Let {X_n} be a sequence of real valued k-dimensional random variables such that plim X_n = X. Let g(x) be a function mapping the real k-dimensional vector x into a real p-dimensional space. Let g(x) be continuous. Then plim g(X_n) = g(X).
Proof. Given ε > 0 and δ > 0, let A be a closed and bounded k-dimensional set such that
P{X ∈ A} ≥ 1 - 0.5δ.
Since g(x) is continuous, it is uniformly continuous on A, and there exists a δ_ε such that
|g(x_1) - g(x_2)| < ε
if |x_1 - x_2| < δ_ε and x_1 is in A. Since plim X_n = X, there exists an N such that for n > N,
P{|X_n - X| > δ_ε} < 0.5δ.
Therefore, for n > N,
P{|g(X_n) - g(X)| > ε} < δ. ∎
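An informal simulation companion to Theorem 5.1.4 (not from the text; the distributions, the function g(x) = e^x, the seed, and the threshold 0.05 are arbitrary choices) shows the conclusion emerging: as X_n collapses onto X, the fraction of realizations with g(X_n) far from g(X) shrinks.

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 10000
X = rng.standard_normal(reps)
fracs = []
for n in (10, 1000):
    Xn = X + rng.standard_normal(reps) / n   # plim X_n = X
    fracs.append(np.mean(np.abs(np.exp(Xn) - np.exp(X)) > 0.05))
print(fracs)  # exceedance fractions shrink as n grows
```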
Theorem 5.1.4 can be extended to functions that are continuous except on a set D where P{X ∈ D} = 0. See, for example, Tucker (1967, p. 104).
Mann and Wald (1943b) demonstrated that the algebra of the common order relationships holds for order in probability. The following two theorems are similar to a paraphrase of Mann and Wald's result given by Pratt (1959). The proof follows more closely that of Mann and Wald, however.
Theorem 5.1.5. Let {X_n} be a sequence of k-dimensional random variables with elements {X_{jn}: j = 1, 2, ..., k}, and let {r_n} be a sequence of k-dimensional vectors with positive real elements {r_{jn}: j = 1, 2, ..., k} such that
X_{jn} = O_p(r_{jn}),  j = 1, 2, ..., t,
X_{jn} = o_p(r_{jn}),  j = t + 1, t + 2, ..., k.
Let g_n(x) be a sequence of real valued (Borel measurable) functions defined on k-dimensional Euclidean space, and let {s_n} be a sequence of positive real numbers. Let {a_n} be a nonrandom sequence of k-dimensional vectors. If
g_n(a_n) = O(s_n)
for all sequences {a_n} such that
a_{jn} = O(r_{jn}),  j = 1, 2, ..., t,
a_{jn} = o(r_{jn}),  j = t + 1, t + 2, ..., k,
then
g_n(X_n) = O_p(s_n).

Proof. Let ε > 0 be given. By assumption there exist real numbers M_1, M_2, ..., M_t and sequences {M_{jn}}, j = t + 1, t + 2, ..., k, such that lim_{n→∞} M_{jn} = 0 and
P{|X_{jn}| > M_jr_{jn}} < ε/k,  j = 1, 2, ..., t,
P{|X_{jn}| > M_{jn}r_{jn}} < ε/k,  j = t + 1, t + 2, ..., k,
for all n. Let {A_n} be a sequence of k-dimensional sets defined by
A_n = {a: |a_j| ≤ M_jr_{jn}, j = 1, ..., t;  |a_j| ≤ M_{jn}r_{jn}, j = t + 1, ..., k}.
Then, for a ∈ A_n, there exists an M such that |g_n(a)| ≤ Ms_n. Hence,
|g_n(X_n)| ≤ Ms_n
for all X_n contained in A_n, and the result follows, since the A_n were constructed so that P{X_n ∈ A_n} > 1 - ε for all n. ∎
Theorem 5.1.6. Let the assumptions of Theorem 5.1.5 be satisfied, but with O(s_n) replaced by o(s_n) in the hypothesis and O_p(s_n) replaced by o_p(s_n) in the conclusion.

Proof. The set A_n is constructed exactly as in the proof of Theorem 5.1.5. There then exists a sequence {b_n} such that lim_{n→∞} b_n = 0 and
|g_n(a)| ≤ b_ns_n
for a contained in A_n. Therefore, |g_n(X_n)| ≤ b_ns_n for all X_n contained in A_n, and the result follows from the construction of the A_n. ∎
Corollary 5.1.5. Let {X_n} be a sequence of scalar random variables such that
X_n = a + O_p(r_n),
where r_n → 0 as n → ∞. If g(x) is a function with s continuous derivatives at x = a,
then
g(X_n) = g(a) + g^{(1)}(a)(X_n - a) + ... + [(s - 1)!]^{-1}g^{(s-1)}(a)(X_n - a)^{s-1} + O_p(r_n^s),
where g^{(j)}(a) is the jth derivative of g(x) evaluated at x = a.
Proof. Since the statement holds for a sequence of real numbers, the result follows from Theorem 5.1.5. A direct proof can be obtained by expanding g(x) in a Taylor series with remainder:
g(X_n) = g(a) + g^{(1)}(a)(X_n - a) + ... + [(s - 1)!]^{-1}g^{(s-1)}(a)(X_n - a)^{s-1} + (s!)^{-1}g^{(s)}(b)(X_n - a)^s,
where b is between X_n and a. Since g^{(s)}(x) is continuous at a, for n sufficiently large, g^{(s)}(b) is bounded in probability [i.e., g^{(s)}(b) = O_p(1)]. Therefore,
(s!)^{-1}g^{(s)}(b)(X_n - a)^s = O_p(r_n^s). ∎
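A simulation sketch of the corollary with s = 2 (not from the text; g(x) = e^x, a = 1, and X_n = a + n^{-1/2}Z are arbitrary illustrative choices, so r_n = n^{-1/2}): the remainder after the first-order term, rescaled by r_n^{-2} = n, should remain stable in distribution as n grows, which is what O_p(r_n²) asserts.

```python
import numpy as np

rng = np.random.default_rng(6)
a, reps = 1.0, 10000
q95s = []
for n in (100, 10000):
    Z = rng.standard_normal(reps)
    Xn = a + Z / np.sqrt(n)
    # remainder of the first-order Taylor approximation of exp at a
    remainder = np.exp(Xn) - np.exp(a) - np.exp(a) * (Xn - a)
    q95s.append(np.percentile(np.abs(remainder) * n, 95))
print(q95s)  # roughly constant across n: the remainder is O_p(n^{-1})
```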
Corollary 5.1.6. If
X_n = a + O_p(r_n)
in the hypothesis of Corollary 5.1.5 is replaced by
X_n = a + o_p(r_n),
then the remainder O_p(r_n^s) is replaced by o_p(r_n^s).

Proof. The proof is nearly identical to that of Corollary 5.1.5. ∎
Note that the condition on the derivative defining the remainder can be weakened. As we saw in the proof of Corollary 5.1.5, we need only that g^{(s)}(b) is bounded in probability. Also see Exercise 5.30.
The corollaries generalize immediately to vector random variables. For example, let
X_n = a + o_p(r_n),
where X_n = (X_{1n}, X_{2n}, ..., X_{kn})', a = (a_1, a_2, ..., a_k)', and r_n → 0 as n → ∞. Let g(x) be a real valued function defined on k-dimensional Euclidean space with
continuous partial derivatives of order three at a; then, for example,
g(X_n) = g(a) + Σ_{j=1}^{k} [∂g(a)/∂x_j](X_{jn} - a_j) + 2^{-1} Σ_{i=1}^{k} Σ_{j=1}^{k} [∂²g(a)/∂x_i∂x_j](X_{in} - a_i)(X_{jn} - a_j) + o_p(r_n²),
where ∂g(a)/∂x_j is the partial derivative of g(x) with respect to x_j evaluated at x = a, and ∂²g(a)/∂x_i∂x_j is the second partial derivative of g(x) with respect to x_i and x_j evaluated at x = a.
The Taylor expansions of Corollaries 5.1.5 and 5.1.6 are about a fixed point a. Expansions about a random point are also valid under certain conditions.
Theorem 5.1.7. Let {X_n}, {W_n} be two sequences of k-dimensional vector random variables defined on the same probability space. Let l(X_n, W_n) be the line segment joining X_n and W_n. For every ε > 0, suppose there is an N_ε and a set B_ε in k-dimensional Euclidean space such that
P{l(X_n, W_n) ⊂ B_ε} ≥ 1 - ε
for all n > N_ε. Let g(x) be a real valued function that is continuous with continuous partial derivatives through order s on B_ε for every ε > 0. Suppose the absolute values of the sth order partial derivatives are bounded on B_ε and that
X_n - W_n = O_p(r_n).
Then
g(X_n) = g(W_n) + Σ_{i=1}^{k} g^{(i)}(W_n)(X_{in} - W_{in})
+ 2^{-1} Σ_{i=1}^{k} Σ_{j=1}^{k} g^{(ij)}(W_n)(X_{in} - W_{in})(X_{jn} - W_{jn}) + ...
+ [(s - 1)!]^{-1} Σ_{i=1}^{k} ... Σ_{t=1}^{k} g^{(i...t)}(W_n)(X_{in} - W_{in}) ... (X_{tn} - W_{tn})
+ O_p(r_n^s),
where g^{(i)}(W_n) is the first partial derivative of g(x) with respect to x_i evaluated at x = W_n, and g^{(i...t)}(W_n) is the sth partial derivative of g(x) with respect to the elements of x whose indexes are i, ..., t, evaluated at x = W_n.
Proof. Let ε > 0 be given, and let B_ε be the associated set. Then g(x) and the first s derivatives of g(x) are continuous on B_ε. For l(X_n, W_n) ⊂ B_ε, one can expand the function in a Taylor series with exact remainder. For scalar X_n, the expansion is
g(X_n) = g(W_n) + g^{(1)}(W_n)(X_n - W_n) + ...
+ [(s - 1)!]^{-1}g^{(s-1)}(W_n)(X_n - W_n)^{s-1} + (s!)^{-1}g^{(s)}(W_n*)(X_n - W_n)^s,
where W_n* is on the line segment joining X_n and W_n, and for scalar arguments, g^{(j)}(W_n) denotes the jth derivative. Now g^{(s)}(W_n*) is bounded on B_ε, and X_n - W_n = O_p(r_n). It follows that
(s!)^{-1}g^{(s)}(W_n*)(X_n - W_n)^s = O_p(r_n^s).
The arguments are analogous for the vector case, and the result is established. ∎
Example 5.1.1. Let W_n = W, where W ~ N(0, 1), and let
X_n = W + Z_n,
where Z_n ~ U(-n^{-1/2}, n^{-1/2}), independent of W. Let g(x) = x^{-2}. Let 1 > ε > 0 be given, and set
B_ε = {x: |x| > 2^{-1}ε},
B_{2ε} = {x: |x| > ε}.
The ordinate of the standard normal distribution is about 0.40 at zero. Therefore, P{W ∈ B_ε} ≥ 1 - 0.4ε and P{W ∈ B_{2ε}} ≥ 1 - 0.8ε. If W ∈ B_{2ε} and n > 4ε^{-2}, then X_n ∈ B_ε. Hence, if n > 4ε^{-2}, the probability is greater than 1 - ε that l(X_n, W_n) is in B_ε. The function x^{-2} is continuous with continuous derivatives on B_ε. The absolute values of the first derivative and of the second derivative are bounded by finite multiples of ε^{-3} and ε^{-4}, respectively, for x ∈ B_ε. Therefore, the conditions of Theorem 5.1.7 are met and
g(X_n) = W^{-2} - 2W^{-3}(X_n - W) + O_p(n^{-1})
= W^{-2} + O_p(n^{-1/2}).
Because W² is a chi-square random variable, g(X_n) converges in probability to the reciprocal of a chi-square random variable, by Theorem 5.1.4. Theorem 5.1.7 enables us to establish the order of the remainder in the approximation. ∎
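The order claim of the example can be probed by simulation (an informal sketch; the seed and replication count are arbitrary). Rescaling the remainder of the first-order expansion by n, its typical size (the median is used because x^{-2} produces heavy tails near x = 0) should stay bounded as n grows:

```python
import numpy as np

rng = np.random.default_rng(7)
reps = 10000
W = rng.standard_normal(reps)
med_scaled = []
for n in (100, 10000):
    Z = rng.uniform(-n ** -0.5, n ** -0.5, reps)
    Xn = W + Z
    # remainder of g(X_n) = W^{-2} - 2 W^{-3}(X_n - W) + O_p(n^{-1})
    remainder = Xn ** -2 - W ** -2 + 2 * W ** -3 * Z
    med_scaled.append(np.median(np.abs(remainder) * n))
print(med_scaled)  # stays bounded as n grows, consistent with O_p(n^{-1})
```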
5.2. CONVERGENCE IN DISTRIBUTION
In the preceding section we discussed conditions under which a sequence of random variables converges in probability to a limit random variable. A second type of convergence important in statistics is the convergence of a sequence of distribution functions to a limit function. The classical example of such convergence is given by the central limit theorem, wherein the sequence of distribution functions converges pointwise to the normal distribution function.
Definition 5.2.1. If {X_n} is a sequence of random variables with distribution functions {F_{X_n}(x)}, then {X_n} is said to converge in distribution (or in law) to the random variable X with distribution function F_X(x), and we write X_n →^L X, if
lim_{n→∞} F_{X_n}(x) = F_X(x)
at all x for which F_X(x) is continuous.
Note that the sequence of distribution functions is converging to a function that is itself a distribution function. Some authors define this type of convergence by saying the sequence {F_{X_n}(x)} converges completely to F_X(x). Thus our symbolism
X_n →^L X
is understood to mean that F_{X_n}(x) converges to the distribution function of the random variable X. The notation
F_{X_n}(x) →^C F_X(x)
is also used.
Theorem 5.2.1. Let {X_n} and {Y_n} be sequences of random variables such that
plim (X_n - Y_n) = 0.
If there exists a random variable X such that
X_n →^L X,
then
Y_n →^L X.

Proof. Let W and Z be random variables with distribution functions F_W(w) and F_Z(z), respectively, and fix ε > 0 and δ > 0. We first show that
P{|Z - W| > ε} ≤ δ
implies that
F_W(z - ε) - δ ≤ F_Z(z) ≤ F_W(z + ε) + δ
for all z. This result holds because
F_W(z - ε) - F_Z(z) = P{W ≤ z - ε} - P{Z ≤ z}
≤ P{W ≤ z - ε and Z > z}
≤ P{(Z - W) > ε}
≤ P{|Z - W| > ε} ≤ δ,
and, in a similar manner,
F_Z(z) - F_W(z + ε) ≤ P{|Z - W| > ε} ≤ δ.
Let x_0 be a continuity point of F_X(x). Then, given δ > 0, there is an η > 0 such that |F_X(x) - F_X(x_0)| < δ/4 for |x - x_0| ≤ η, and F_X(x) is continuous at x_0 - η and at x_0 + η. Furthermore, for this δ and η, there is an N_1 such that, for n > N_1,
P{|X_n - Y_n| > η} < δ/4
and therefore
F_{X_n}(x - η) - δ/4 ≤ F_{Y_n}(x) ≤ F_{X_n}(x + η) + δ/4
for all x. Also, there is an N_2 such that, for n > N_2,
|F_{X_n}(x_0 - η) - F_X(x_0 - η)| < δ/4
and
|F_{X_n}(x_0 + η) - F_X(x_0 + η)| < δ/4.
Therefore, given the continuity point x_0 and δ > 0, there is an η > 0 and an N = max(N_1, N_2) such that for n > N,
|F_{Y_n}(x_0) - F_X(x_0)| < δ. ∎

As a corollary we have the result that convergence in probability implies convergence in law.
Corollary 5.2.1.1. Let {X_n} be a sequence of random variables. If there exists a random variable X such that plim X_n = X, then X_n →^L X.
Corollary 5.2.1.2. Let {X_n} and X be random variables such that plim X_n = X. If g(x) is a continuous function, then the distribution of g(X_n) converges to the distribution of g(X).

Proof. This follows immediately, since by Theorem 5.1.4, plim g(X_n) = g(X). ∎

We state the following two important theorems without proof.
Theorem 5.2.2 (Helly-Bray). If {F_n(x)} is a sequence of distribution functions over k-dimensional Euclidean space ℛ^(k) such that F_n(x) →^C F(x), then
lim_{n→∞} ∫ g(x) dF_n(x) = ∫ g(x) dF(x)
for every bounded continuous function g(x).
Theorem 5.2.3. Let {F_n(x)} be a sequence of distribution functions over ℛ^(k) with corresponding characteristic functions {φ_n(u)}.
(i) If F_n(x) →^C F(x), then φ_n(u) → φ(u) at all u ∈ ℛ^(k), where φ(u) is the characteristic function associated with F(x).
(ii) Continuity theorem. If φ_n(u) converges pointwise to a function φ(u) that is continuous at (0, 0, ..., 0) ∈ ℛ^(k), then φ(u) is the characteristic function of a distribution function F(x) and F_n(x) →^C F(x).
Theorem 5.2.4. Let {X_n} be a sequence of k-dimensional random variables with distribution functions {F_{X_n}(x)} such that F_{X_n}(x) →^C F_X(x), and let T be a continuous mapping from ℛ^(k) to ℛ^(p). Then
F_{T(X_n)}(y) →^C F_{T(X)}(y).

Proof. By the Helly-Bray theorem, the characteristic function of T(X_n) converges to the characteristic function of T(X), and the result follows. ∎
Theorem 5.2.5. Let {X_n} and {Y_n} be two sequences of k-dimensional random variables such that X_n is independent of Y_n for all n. If there exist random variables X and Y such that F_{X_n}(x) →^C F_X(x) and F_{Y_n}(y) →^C F_Y(y), then
F_{X_nY_n}(x, y) →^C F_X(x)F_Y(y).

Proof. The characteristic function of (X_n', Y_n')' is given by
φ_{X_nY_n}(u, v) = φ_{X_n}(u)φ_{Y_n}(v).
Now, by the Helly-Bray theorem, φ_{X_n}(u) → φ_X(u) and φ_{Y_n}(v) → φ_Y(v). Therefore,
φ_{XY}(u, v) = lim_{n→∞} φ_{X_n}(u)φ_{Y_n}(v) = φ_X(u)φ_Y(v).
By the continuity theorem, this implies that F_{X_nY_n}(x, y) →^C F_{XY}(x, y), where F_{XY}(x, y) is the distribution function of independent random variables associated with the characteristic function φ_{XY}(u, v). ∎
From Corollary 5.2.1.1, we know that convergence in probability implies convergence in law. For the special case wherein a sequence of random variables converges in law to a constant random variable, the converse is also true.
Lemma 5.2.1. Let {Y_n} be a sequence of p-dimensional random variables with corresponding distribution functions {F_{Y_n}(y)}. Let Y be a p-dimensional random variable with distribution function F_Y(y) such that P{Y = b} = 1, where b is a constant vector, and
F_{Y_n}(y) →^C F_Y(y).
Then, given ε > 0, there exists an N such that, for n > N,
P{|Y_n - b| ≥ ε} < ε.

Proof. Let B = {y: y_1 > b_1 - ε/p, y_2 > b_2 - ε/p, ..., y_p > b_p - ε/p}. Then F_Y(y) = 0 on the complement of B. Fix ε > 0. As F_{Y_n}(y) →^C F_Y(y), there exists an N_0 such that, for n > N_0,
F_{Y_n}(b_1 - ε/p, b_2 - ε/p, ..., b_p - ε/p) < ε/2.
There also exists an N_1 such that, for n > N_1, 1 - F_{Y_n}(b_1 + ε/p, b_2 + ε/p, ..., b_p + ε/p) < ε/2. Therefore, for n > max(N_0, N_1),
P{|Y_n - b| ≥ ε} < ε. ∎
Theorem 5.2.6. Let {(X_n', Y_n')'} be a sequence of (k + p)-dimensional random variables, where X_n is k-dimensional. Let the sequence of joint distribution functions be denoted by {F_{X_nY_n}(x, y)} and the sequences of marginal distribution functions by {F_{X_n}(x)} and {F_{Y_n}(y)}. If there exists a k-dimensional random variable X and a p-dimensional random variable Y such that F_{X_n}(x) →^C F_X(x) and F_{Y_n}(y) →^C F_Y(y), where P{Y = b} = 1 and b = (b_1, b_2, ..., b_p)' is a constant vector, then
F_{X_nY_n}(x, y) →^C F_{XY}(x, y).

Proof. Now P{Y = b} = 1 implies that
F_X(x) = F_{XY}(x, b),
that F_{XY}(x, y) = 0 if any element of y is less than the corresponding element of b, and that F_{XY}(x, y) = F_X(x) if every element of y is greater than or equal to the corresponding element of b. Fix ε > 0, and consider a point (x_0, y_0) where at least one element of y_0 is less than the corresponding element of b by an amount ε. Then F_{XY}(x_0, y_0) = 0. However, there is an N_0 such that, for n > N_0,
F_{X_nY_n}(x_0, y_0) ≤ P{|Y_n - b| ≥ ε} < ε
by Lemma 5.2.1. Let (x_0, y_1) be a continuity point of F_{XY}(x, y) where every element of y_1 exceeds the corresponding element of b by ε/p^{1/2} > 0. Because F_{X_n}(x) →^C F_X(x) and F_{Y_n}(y) →^C F_Y(y), we can choose N_1 such that, for n > N_1, |F_{X_n}(x_0) - F_X(x_0)| < ε/2 and |F_{Y_n}(y_1) - F_Y(y_1)| < ε/2. Hence,
|F_{X_nY_n}(x_0, y_1) - F_{XY}(x_0, y_1)| < ε
for n > max(N_0, N_1). ∎
Utilizing Theorems 5.2.4 and 5.2.6, we obtain the following corollary.
Corollary 5.2.6.1. Let {X_n} and {Y_n} be two sequences of k-dimensional random variables. If there exists a k-dimensional random variable Y and a fixed vector b such that Y_n →L Y and X_n →p b, then

(X'_n, Y'_n)' →L (b', Y')' .
Corollary 5.2.6.2. Let {Y_n} be a sequence of k-dimensional random variables, and let {A_n} be a sequence of k × k random matrices. If there exists a random vector Y and a fixed nonsingular matrix A such that Y_n →L Y and A_n →p A, then

A_n^{-1} Y_n →L A^{-1} Y .
5.3. CENTRAL LIMIT THEOREMS
The exact distributions of many statistics encountered in practice have not been obtained. Fortunately, many statistics in the class of continuous functions of means or of sample moments converge in distribution to normal random variables. We give without proof the following central limit theorem.
Theorem 5.3.1 (Lindeberg central limit theorem). Let {Z_t: t = 1, 2, ...} be a sequence of independent random variables with distribution functions {F_t(z)}. Let E{Z_t} = μ_t, E{(Z_t − μ_t)²} = σ²_t, and assume

lim_{n→∞} V_n^{-1} Σ_{t=1}^n ∫_{|z−μ_t| > ε V_n^{1/2}} (z − μ_t)² dF_t(z) = 0    (5.3.1)

for all ε > 0, where V_n = Σ_{t=1}^n σ²_t. Then

V_n^{-1/2} Σ_{t=1}^n (Z_t − μ_t) →L N(0, 1) ,

where N(0, 1) denotes the normal distribution with mean zero and variance one. A form of the central limit theorem whose conditions are often more easily verified is the following theorem.
Theorem 5.3.2 (Liapounov central limit theorem). Let {Z_t: t = 1, 2, ...} be a sequence of independent random variables with distribution functions {F_t(z)}. Let E{Z_t} = μ_t and E{(Z_t − μ_t)²} = σ²_t. If

V_n^{-(2+δ)/2} Σ_{t=1}^n E{|Z_t − μ_t|^{2+δ}} → 0 as n → ∞

for some δ > 0, then

V_n^{-1/2} Σ_{t=1}^n (Z_t − μ_t) →L N(0, 1) .

Proof. Let δ > 0 and ε > 0, and define the set A_t by

A_t = {z: |z − μ_t| > ε V_n^{1/2}} .

Then

V_n^{-1} Σ_{t=1}^n ∫_{A_t} (z − μ_t)² dF_t(z) ≤ ε^{-δ} V_n^{-(2+δ)/2} Σ_{t=1}^n ∫ |z − μ_t|^{2+δ} dF_t(z) ,

which, by hypothesis, goes to zero as n → ∞. Therefore, the condition on the 2 + δ moment implies the condition of the Lindeberg theorem. ∎
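The Liapounov condition is easy to check in a simulation. The sketch below uses a hypothetical sequence of independent but not identically distributed variables (uniform on a growing interval, an assumption for illustration only), standardizes the sum by V_n^{1/2}, and verifies that the standardized draws behave like N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)


def standardized_sum(n, rng):
    """One draw of V_n^{-1/2} * sum_t (Z_t - mu_t) for independent, non-iid Z_t.

    Hypothetical choice: Z_t uniform on (-t^{1/4}, t^{1/4}), so mu_t = 0 and
    sigma_t^2 = t^{1/2}/3; the 2 + delta moments are bounded relative to V_n,
    so the Liapounov condition of Theorem 5.3.2 holds.
    """
    t = np.arange(1, n + 1)
    half_width = t ** 0.25
    z = rng.uniform(-half_width, half_width)     # independent Z_t
    v_n = np.sum(half_width ** 2) / 3.0          # V_n = sum_t sigma_t^2
    return z.sum() / np.sqrt(v_n)


# Repeated draws of the standardized sum should look standard normal.
draws = np.array([standardized_sum(2000, rng) for _ in range(4000)])
```

Each draw has variance exactly one by construction; the central limit theorem is what makes the empirical distribution approximately normal.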
The reader is referred to the texts of Tucker (1967), Gnedenko (1967), and Loève (1963) for discussions of these theorems. For a proof of the following extension of the central limit theorems to the multivariate case, see Varadarajan (1958).
Theorem 5.3.3. Let {Z_n: n = 1, 2, ...} be a sequence of k-dimensional random variables with distribution functions {F_{Z_n}(z)}. Let F_{λ,n}(x) be the distribution function of X_n = λ'Z_n, where λ is a fixed vector. A necessary and sufficient condition for F_{Z_n}(z) to converge to the k-variate distribution function F(z) is that F_{λ,n}(x) converges to a limit for each λ.
In most of our applications of Theorem 5.3.3, each F_{λ,n}(x) will be converging to a normal distribution function, and hence the vector random variable Z_n will converge in distribution to a multivariate normal.
The Lindeberg and Liapounov central limit theorems are for independent random variables. It is possible to obtain the limiting normal distribution for sequences that satisfy weaker conditions. One type of sequence that has been studied is the martingale process.
Definition 5.3.1. Let {X_t}_{t=1}^∞ be a sequence of random variables defined on the probability space (Ω, 𝒜, P). The sequence is a martingale if, for all t,

E{|X_t|} < ∞

and

E{X_t | 𝒜_{t−1}} = X_{t−1} a.s.,

where 𝒜_t is the sigma-field generated by {X_1, X_2, ..., X_t}. It is an immediate consequence of the definition that, for s ≤ t − 1,

E{X_t | 𝒜_s} = X_s a.s.

Definition 5.3.2. Let {X_t}_{t=1}^∞ be a martingale sequence defined on (Ω, 𝒜, P). Let Z_t = X_t − X_{t−1}. Then the sequence {Z_t}_{t=1}^∞ is called a sequence of martingale differences.
We give a central limit theorem for martingale differences. Theorem 5.3.4 is due to Brown (1971). Related results have been obtained by Dvoretsky (1972), Scott (1973), and McLeish (1974). Also see Hall and Heyde (1980) and Pollard (1984).
Theorem 5.3.4. Let {Z_tn: 1 ≤ t ≤ n, n ≥ 1} denote a triangular array of random variables defined on the probability space (Ω, 𝒜, P), and let {𝒜_tn: 0 ≤ t ≤ n, n ≥ 1} be any triangular array of sub-sigma-fields of 𝒜 such that for each n and 1 ≤ t ≤ n, Z_tn is 𝒜_tn-measurable and 𝒜_{t−1,n} is contained in 𝒜_tn. For 1 ≤ k ≤ n, 1 ≤ j ≤ n, and n ≥ 1, let

S_kn = Σ_{t=1}^k Z_tn ,

σ²_tn = E{Z²_tn | 𝒜_{t−1,n}} ,

V²_jn = Σ_{t=1}^j σ²_tn ,

and

s²_nn = E{V²_nn} .

Assume that E{Z_tn | 𝒜_{t−1,n}} = 0 a.s., that

s_nn^{-2} V²_nn →p 1 as n → ∞ ,

and that, for every ε > 0,

lim_{n→∞} s_nn^{-2} Σ_{t=1}^n E{Z²_tn I(|Z_tn| ≥ ε s_nn)} = 0 ,

where I(A) denotes the indicator function of a set A. Then, as n → ∞,

s_nn^{-1} S_nn →L N(0, 1) .
Proof. Omitted. ∎
Corollary 5.3.4. Let {e_tn} be a triangular array of martingale differences satisfying E{e_tn | 𝒜_{t−1,n}} = 0 and

E{|e_tn|^{2+δ} | 𝒜_{t−1,n}} < M < ∞ a.s.

for some δ > 0, where 𝒜_tn is the sigma-field generated by {e_jn: j ≤ t}. Let {w_tn: 1 ≤ t ≤ n, n ≥ 1} be a triangular array of constants, Σ_{t=1}^n w²_tn ≠ 0 for all n, satisfying the condition that no single w²_tn dominates Σ_{t=1}^n w²_tn as n → ∞. Then the weighted sum Σ_{t=1}^n w_tn e_tn, divided by the square root of its variance, converges in law to an N(0, 1) random variable.

Proof. Reserved for the reader. ∎
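A martingale difference sequence need not consist of independent variables; only the conditional mean given the past must vanish. The following sketch builds such a sequence with a hypothetical past-dependent multiplier (an assumption for illustration, not a construction from the text) and normalizes the sum by the accumulated conditional variance, in the spirit of Theorem 5.3.4.

```python
import math

import numpy as np

rng = np.random.default_rng(1)


def normalized_martingale_sum(n, rng):
    """V_nn^{-1} S_nn for martingale differences Z_t = h_{t-1} * eps_t.

    h_{t-1} depends only on the previous difference (so it is measurable
    with respect to the past), and eps_t is an independent mean-zero
    innovation, so E{Z_t | A_{t-1}} = 0 and sigma_t^2 = h_{t-1}^2.
    """
    eps = rng.standard_normal(n)
    prev, total, v2 = 0.0, 0.0, 0.0
    for t in range(n):
        h = 1.0 + 0.5 * math.tanh(prev)   # A_{t-1}-measurable multiplier
        z = h * eps[t]
        v2 += h * h                       # accumulate conditional variances
        total += z
        prev = z
    return total / math.sqrt(v2)


draws = np.array([normalized_martingale_sum(500, rng) for _ in range(2000)])
```

Even though the Z_t are dependent through the multiplier, the normalized sums are approximately standard normal.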
We now give a functional central limit theorem which is an extension of Theorem 5.3.4. For related results, we refer the reader to Billingsley (1968). In the following, we let D = D[0, 1] denote the space of functions f(u) on [0, 1] that are right continuous and have left limits. Let (Ω, 𝒜, P) be a probability space, and let X(u, ω) be a random function on D defined by an element ω of Ω. We will often abbreviate X(u, ω) to X(u) or to X.
An important random function is the Wiener process. If W is a standard Wiener process, then W(u) is normally distributed for every u in [0, 1], where W(u) ~ N(0, u). Furthermore, W(u) has normal independent increments. A process on [0, 1] has independent increments if, for

0 ≤ u_1 ≤ u_2 ≤ ··· ≤ u_k ≤ 1 ,

the random variables

W(u_2) − W(u_1), W(u_3) − W(u_2), ..., W(u_k) − W(u_{k−1})

are independent. The weak convergence or convergence in distribution of a sequence of random elements X_n in D to a random element X in D will be denoted by X_n ⇒ X or by X_n →L X.
Theorem 5.3.5. Let {e_t}_{t=1}^∞ be a sequence of random variables, and let 𝒜_{t−1} be the sigma-field generated by {e_1, e_2, ..., e_{t−1}}. Assume that

E{e_t | 𝒜_{t−1}} = 0 a.s., E{e²_t | 𝒜_{t−1}} = σ² a.s.,

and

E{|e_t|^{2+δ} | 𝒜_{t−1}} < M < ∞ a.s.

for some δ > 0. Let

S_n(u) = n^{-1/2} σ^{-1} Σ_{t=1}^{[nu]} e_t , 0 ≤ u ≤ 1 ,

where [nu] denotes the integer part of nu and the sum is zero if [nu] = 0. Then

S_n ⇒ W ,

where W is a standard Wiener process.
Proof. See Billingsley (1968) or Chan and Wei (1988). ∎
The conclusion of Theorem 5.3.5 holds for e_t that are iid(0, σ²) random variables. That form of the theorem is known as Donsker's theorem. See Billingsley (1968, p. 68).
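Donsker's theorem can be visualized by simulating the partial-sum random function S_n(u). The sketch below (iid standard normal e_t, so σ = 1, an assumed distribution for illustration) checks that S_n(1/2) is distributed approximately as W(1/2) ~ N(0, 1/2).

```python
import numpy as np

rng = np.random.default_rng(2)


def s_n(e, u):
    """S_n(u) = n^{-1/2} sigma^{-1} * sum_{t=1}^{[nu]} e_t, with sigma = 1.

    This is the partial-sum random function of Theorem 5.3.5; the sum is
    zero when the integer part [nu] is zero.
    """
    n = len(e)
    k = int(n * u)                         # [nu], the integer part of nu
    return e[:k].sum() / np.sqrt(n) if k > 0 else 0.0


n, reps = 2000, 4000
# Evaluate the process at u = 1/2 across many replications.
vals = np.array([s_n(rng.standard_normal(n), 0.5) for _ in range(reps)])
```

The empirical variance of the draws should be near u = 1/2, the variance of the limiting Wiener process at that point.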
We now give a result for the limiting distribution of continuous functionals of S_n. The result follows from the continuous mapping theorem. See Theorem 5.1 of Billingsley (1968).
Theorem 5.3.6. Let S_n and S be random elements in D. For 0 ≤ s ≤ 1, let

Z_in(s) = ∫_0^s u^{k_i} f_i[S_n(u)] du

and

Z_i(s) = ∫_0^s u^{k_i} f_i[S(u)] du ,

where f_1, ..., f_q are real valued continuous functions on the real line and k_i ≥ 0. If S_n ⇒ S, then

(S_n, Z_{1n}, ..., Z_{qn}) ⇒ (S, Z_1, ..., Z_q) .

Proof. Omitted. ∎
The following corollary is useful in deriving the limiting distribution of estimators of the parameters of nonstationary autoregressive processes.
Corollary 5.3.6. Let {e_t} be a sequence satisfying the conditions of Theorem 5.3.5. Define

Y_t = Σ_{j=1}^t e_j

for t ≥ 1, with Y_0 = 0. Then

n^{-2} Σ_{t=1}^n Y²_{t−1} →L σ² ∫_0^1 W²(u) du .

Proof. From Theorem 5.3.5, we have that

n^{-1/2} Σ_{t=1}^{[nu]} e_t

converges weakly to σW(u), where W(u) is the Wiener process. If we let k_1 = 0 and f_1(x) = x², then

Z_{1n}(1) = ∫_0^1 [n^{-1/2} Y_{[nu]}]² du = n^{-2} Σ_{t=1}^n Y²_{t−1}

and

Z_1(1) = σ² ∫_0^1 W²(u) du .

The result follows from Theorem 5.3.6. ∎
Theorem 5.3.7 is the vector version of the functional central limit theorem. We define a standard vector Wiener process of dimension k to be a process such that the elements of W(u) are independent standard Wiener processes.
Theorem 5.3.7. Let {e_t}_{t=1}^∞ be a sequence of random vectors, and let 𝒜_{t−1} be the sigma-field generated by {e_1, e_2, ..., e_{t−1}}. Assume

E{e_t | 𝒜_{t−1}} = 0 a.s., E{e_t e'_t | 𝒜_{t−1}} = Σ_ee a.s.,

and

E{|e_t|^{2+δ} | 𝒜_{t−1}} < M < ∞ a.s.

for some δ > 0, or assume the e_t are iid(0, Σ_ee), where Σ_ee is positive definite. Let

Y_t = Σ_{j=1}^t e_j

and

S_n(u) = n^{-1/2} Σ_ee^{-1/2} Y_{[nu]} , 0 ≤ u ≤ 1 ,

where [nu] is the integer part of nu, and Σ_ee^{1/2} is the symmetric square root of Σ_ee. Then

S_n(u) ⇒ W ,

where W is a standard vector Wiener process. Also,
n^{-1} Σ_ee^{-1/2} Σ_{t=1}^n Y_{t−1} e'_t Σ_ee^{-1/2} →L ∫_0^1 W(u) [dW(u)]'

and

n^{-1} Σ_ee^{-1/2} Σ_{t=1}^n (Y_{t−1} − Ȳ) e'_t Σ_ee^{-1/2} →L ∫_0^1 [W(u) − W̄] [dW(u)]' ,

where Ȳ = n^{-1} Σ_{t=1}^n Y_{t−1} and W̄ = ∫_0^1 W(u) du.

Proof. See Phillips and Durlauf (1986). ∎
We shall have use for a strong consistency result for martingales.
Theorem 5.3.8. Let {S_n = Σ_{t=1}^n X_t, 𝒜_n, n ≥ 1} be a martingale, and let {M_n, n ≥ 1} be a nondecreasing sequence of positive random variables where M_n is 𝒜_{n−1}-measurable for each n. Let

lim_{n→∞} M_n = ∞ a.s.

and

Σ_{t=1}^∞ M_t^{-p} E{|X_t|^p | 𝒜_{t−1}} < ∞ a.s.

for some 1 ≤ p ≤ 2. Then

M_n^{-1} S_n → 0 a.s.

Proof. See Hall and Heyde (1980, p. 36). ∎
Corollary 5.3.8. Let {e_t}_{t=1}^∞ be a sequence of random variables, and let 𝒜_{t−1} be the sigma-field generated by {e_1, e_2, ..., e_{t−1}}. Assume

E{e²_t | 𝒜_{t−1}} = σ² a.s. and E{|e_t|^{2+δ} | 𝒜_{t−1}} < M < ∞ a.s.

for some δ > 0. Then

n^{-1} Σ_{t=1}^n e²_t → σ² a.s.

Proof. By assumption, for p = 1 + δ/2,

Σ_{t=1}^∞ t^{-p} E{|e²_t − σ²|^p | 𝒜_{t−1}} < ∞ a.s.

The conclusion follows by letting M_t = t in Theorem 5.3.8. ∎
5.4. APPROXIMATING A SEQUENCE OF EXPECTATIONS
Taylor expansions and the order in probability concepts introduced in Section 5.1 are very important in investigating the limiting behavior of sample statistics. Care must be taken, however, in understanding the meaning of statements about such behavior. To illustrate, consider the sequence {x̄_n^{-1}}, where x̄_n is the mean of n normal independent (μ, 1) random variables, μ ≠ 0. By Corollary 5.1.5, we may write
x̄_n^{-1} = μ^{-1} − μ^{-2}(x̄_n − μ) + μ^{-3}(x̄_n − μ)² − μ^{-4}(x̄_n − μ)³ + O_p(n^{-2}) .   (5.4.1)

It follows that

plim_{n→∞} n^{1/2}[(x̄_n^{-1} − μ^{-1}) + μ^{-2}(x̄_n − μ)] = 0 .

Therefore, by Theorem 5.2.1, the limiting distribution of n^{1/2}(x̄_n^{-1} − μ^{-1}) is the
same as the limiting distribution of −n^{1/2}μ^{-2}(x̄_n − μ). The distribution of −n^{1/2}μ^{-2}(x̄_n − μ) is N(0, μ^{-4}) for all n, and it follows that the limiting distribution of n^{1/2}(x̄_n^{-1} − μ^{-1}) is N(0, μ^{-4}).
On the other hand, it can be demonstrated that E{x̄_n^{-1}} exists for no finite n. Since the expectation of (x̄_n^{-1} − μ^{-1}) is not defined, it is clear that one cannot speak of the sequence of expectations {E{x̄_n^{-1}}}, and it is incorrect to say that E{x̄_n^{-1}} converges to μ^{-1}.
The example illustrates that a random variable Y_n may converge in probability, and hence in distribution, to a random variable Y that possesses finite moments even though E{Y_n} is not defined. If we know that Y_n has finite moments of order r > 1, we may be able to determine that the sequence of expectations differs from a given sequence by an amount of specified order. The conditions required to permit such statements are typically more stringent than those required to obtain convergence in distribution. In this section we investigate these conditions and develop approximations to the expectation of functions of mean or "meanlike" statistics. In preparation for that investigation we consider the expectations of integer powers of sample means of random variables with zero population means. Let (x̄_n, ȳ_n, z̄_n, w̄_n)' be a vector of sample means computed from a random sample selected from a distribution function with zero mean vector and finite fourth moments. Then

E{x̄_n ȳ_n z̄_n} = n^{-2} E{XYZ} ,

where E{XYZ} is the expectation of the product of the original random variables. Similarly,

E{x̄_n ȳ_n z̄_n w̄_n} = n^{-3} E{XYZW} + n^{-3}(n − 1)(σ_{XY}σ_{ZW} + σ_{XZ}σ_{YW} + σ_{XW}σ_{YZ}) ,
where σ_{XY} is the covariance between X and Y, σ_{ZW} is the covariance between Z and W, and so forth.
We note that the expectation of a product of either three or four means is O(n^{-2}). This is an example of a general result that we state as a theorem. Our proof follows closely that of Hansen, Hurwitz, and Madow (1953).
Theorem 5.4.1. Let x̄_n = (x̄_{1n}, x̄_{2n}, ..., x̄_{mn})' be the mean of a random sample of n vector random variables selected from a distribution function with mean vector zero and finite Bth moment. Consider the sequence {x̄_n}_{n=1}^∞, and let b_1, b_2, ..., b_m be nonnegative integers such that B = Σ_{j=1}^m b_j. Then

E{x̄_{1n}^{b_1} x̄_{2n}^{b_2} ··· x̄_{mn}^{b_m}} = O(n^{-B/2}) if B is even, and O(n^{-(B+1)/2}) if B is odd.
Proof. Now E{x̄_{1n}^{b_1} x̄_{2n}^{b_2} ··· x̄_{mn}^{b_m}} can be expanded into a sum of terms, each of the form n^{-B} times the expectation of a product of B of the original random variables with various subscripts. If there is a subscript matched by no other subscript, the expected value of the product is zero. (Recall that in the four-variable case this included terms of the form X_i Y_j Z_k W_l in which some subscript appears exactly once.) If every subscript agrees with at least one other subscript, we group the terms with common subscripts to obtain, say, H groups. The expected value is then the product of the H expected values. The sum contains n(n − 1) ··· (n − H + 1) terms for a particular configuration of H different subscripts. The order of n^{-B}[n(n − 1) ··· (n − H + 1)] is n^{-B+H} and will be maximized if we choose H as large as possible. If B is even, the largest H that gives a nonzero expectation is B/2, in which case we have B/2 groups, each containing two indexes. If B is odd, the largest H that gives a nonzero expectation is (B − 1)/2, in which case we have (B − 1)/2 − 1 groups of two and one group of three. ∎
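The orders in Theorem 5.4.1 can be checked against exact moment formulas for the mean of n iid zero-mean observations: E{x̄³_n} = μ₃ n^{-2} (B = 3 odd, order n^{-(B+1)/2}) and E{x̄⁴_n} = 3σ⁴ n^{-2} + (μ₄ − 3σ⁴) n^{-3} (B = 4 even, order n^{-B/2}). The sketch below verifies both by simulation, using centered exponential draws, a distribution assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Centered exponential(1) observations: mean 0, sigma^2 = 1, mu_3 = 2, mu_4 = 9.
n, reps = 50, 50_000
x = rng.exponential(1.0, size=(reps, n)) - 1.0
xbar = x.mean(axis=1)

# Exact moments of the sample mean, showing the orders of Theorem 5.4.1:
e3_exact = 2.0 / n ** 2                          # B = 3 odd:  O(n^{-2})
e4_exact = 3.0 / n ** 2 + (9.0 - 3.0) / n ** 3   # B = 4 even: O(n^{-2})

# Monte Carlo estimates of the same moments.
e3_sim = (xbar ** 3).mean()
e4_sim = (xbar ** 4).mean()
```

The leading term of the fourth moment, 3σ⁴ n^{-2}, is exactly the "B/2 groups of pairs" contribution identified in the proof.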
The following lemma is used in the proof of the principal results of this section.
Lemma 5.4.1. Let {X_n} be a sequence of k-dimensional random variables with corresponding distribution functions {F_n(x)} such that

∫ |x_i − μ_i|^{2s} dF_n(x) = O(a_n^{2s}) , i = 1, 2, ..., k ,

where the integral is over ℛ^{(k)}, a_n > 0, s is a positive integer, E{X_n} = μ = (μ_1, μ_2, ..., μ_k)', |x| = [Σ_{i=1}^k x_i²]^{1/2} is the Euclidean norm, and

lim_{n→∞} a_n = 0 .

Then

∫ Π_{i=1}^k |x_i − μ_i|^{p_i} dF_n(x) = O(a_n^r) ,

where the p_i, i = 1, 2, ..., k, are nonnegative real numbers satisfying

r = Σ_{i=1}^k p_i ≤ 2s .
Proof. Without loss of generality, set all μ_i = 0. Define A = [−1, 1], and let I_A(x) be the indicator function with value one for x ∈ A and zero for x ∉ A. Then, for 0 ≤ q ≤ 2s,

|x_i|^q ≤ I_A(x_i) + |x_i|^{2s} ,

so that each factor of the product is dominated by an integrable function, where the integrals are over ℛ^{(k)}. By the Hölder inequality,

∫ Π_{i=1}^k |x_i|^{p_i} dF_n(x) ≤ Π_{i=1}^k [∫ |x_i|^{2s} dF_n(x)]^{p_i/(2s)} = O(a_n^r) ,

where r = Σ_{i=1}^k p_i. ∎
Theorem 5.4.2. Let {X_n} be a sequence of real valued random variables with corresponding distribution functions {F_n(x)}, and let {f_n(x)} be a sequence of real valued functions. Assume that for some positive integers s and N_0:

(i) ∫ |x − μ|^{2s} dF_n(x) = a_n^{2s}, where a_n → 0 as n → ∞.
(ii) ∫ |f_n(x)|² dF_n(x) = O(1).
(iii) f_n^{(s)}(x) is continuous in x over a closed and bounded interval S for n greater than N_0, where f_n^{(j)}(x) denotes the jth derivative of f_n(x) evaluated at x.
(iv) μ is an interior point of S.
(v) There is a K such that, for n > N_0,

|f_n^{(j)}(x)| ≤ K for x in S and j = 1, 2, ..., s .

Then

E{f_n(X_n)} = f_n(μ) + Σ_{j=1}^{s−1} (j!)^{-1} f_n^{(j)}(μ) E{(X_n − μ)^j} + R_n ,

where

R_n = O(a_n) if s = 1 ,
R_n = O(a_n^s) if s > 1 .

Proof. See the proof of Theorem 5.4.3. ∎
Theorem 5.4.3. Let {X_n} be a sequence of k-dimensional random variables with corresponding distribution functions {F_n(x)}, and let {f_n(x)} be a sequence of functions mapping ℛ^{(k)} into ℛ. Let δ ∈ (0, ∞), and define a = δ^{-1}(1 + δ). Assume that for some positive integers s and N_0:

(i) ∫ |x − μ|^{sa} dF_n(x) = a_n^{sa}, where a_n → 0 as n → ∞.
(ii) ∫ |f_n(x)|^{1+δ} dF_n(x) = O(1).
(iii) f_n^{(i_1 ··· i_s)}(x) is continuous in x over a closed and bounded sphere S for all n greater than N_0, where f_n^{(i_1 ··· i_s)}(x) denotes an sth order partial derivative of f_n evaluated at x.
(iv) μ is an interior point of S.
(v) There is a finite number K such that, for n > N_0, the partial derivatives of f_n(x) of orders one through s are bounded in absolute value by K on S.

Then

E{f_n(X_n)} = f_n(μ) + Σ_{j=1}^{s−1} (j!)^{-1} Σ_{i_1=1}^k ··· Σ_{i_j=1}^k f_n^{(i_1 ··· i_j)}(μ) E{(X_{i_1 n} − μ_{i_1}) ··· (X_{i_j n} − μ_{i_j})} + R_n ,

where

R_n = O(a_n) if s = 1 ,
R_n = O(a_n^s) if s > 1 .

The result also holds if we replace (ii) with the condition that the f_n(x) are uniformly bounded for n sufficiently large and assume that (i), (iii), (iv), and (v) hold for a = 1.
Proof. We consider only those n greater than N_0. By Taylor's theorem there is a sequence of functions {Y_n} mapping S into S such that, for x in S, f_n(x) is equal to the Taylor expansion about μ through terms of order s − 1 plus a remainder in which the sth order partial derivatives are evaluated at Y_n(x). Let

I_S(x) = 1 if x ∈ S, and 0 otherwise.

By condition (v), the absolute value of the remainder is bounded on S by

(s!)^{-1} Σ_{i_1=1}^k ··· Σ_{i_s=1}^k K|x − μ|^s = (s!)^{-1} K k^s |x − μ|^s .

We now have that, for s > 1, the difference between E{f_n(X_n)} and the expectation of the expansion multiplied by I_S(X_n) is composed of the expectation of the remainder term and of integrals over the complement of S. The expectation of the remainder is O(a_n^s) by condition (i) and Lemma 5.4.1, and the integrals over the complement of S are O(a_n^s) by condition (ii), the Hölder inequality, and the Chebyshev bound on P{X_n ∉ S}. ∎
We now give an extension of Theorem 5.4.3 to a product, where one factor is a function of the type studied in Theorem 5.4.3 and the second need not converge to zero.
Corollary 5.4.3. Let the assumptions of Theorem 5.4.3 hold. Let Z_n be a sequence of random variables defined on the same space as X_n. Let F_n(x, z) be the distribution function of (X_n, Z_n). Assume
Then
Theorems 5.4.2 and 5.4.3 require that ∫ |f_n(x)|^{1+δ} dF_n(x) be bounded for all n sufficiently large. The following theorem gives sufficient conditions for a sequence of integrals ∫ |f_n(x)| dF_n(x) to be O(1).
Theorem 5.4.4. Let {f_n(x)} be a sequence of real valued (measurable) functions, and let {X_n} be a sequence of k-dimensional random variables with corresponding distribution functions {F_n(x)}. Assume that:

(i) |f_n(x)| ≤ K_1 for x ∈ S̄, where S is a bounded open set containing μ, S̄ is the closure of S, and K_1 is a finite constant.
(ii) |f_n(x)| ≤ Y(x) n^ρ for some ρ > 0 and for a function Y(·) such that ∫ |Y(x)|^γ dF_n(x) = O(1) for some γ, 1 < γ < ∞.
(iii) ∫ |x − μ|^r dF_n(x) = O(n^{-ρq}) for a positive integer r and a q such that 1/q + 1/γ = 1.

Then

∫ |f_n(x)| dF_n(x) = O(1) .

The result also holds for q = 1 given that (i), (iii), and |f_n(x)| ≤ K_2 n^ρ hold for all x, some ρ > 0, and K_2 a finite constant.
Proof. Let δ > 0 be such that {x: |x − μ| < δ} ⊂ S, and define A = {x: |x − μ| ≥ δ}. By Chebyshev's inequality,

P{X_n ∈ A} ≤ δ^{-r} ∫ |x − μ|^r dF_n(x) = O(n^{-ρq}) ,

and, by the Hölder inequality,

∫ |f_n(x)| dF_n(x) ≤ K_1 + n^ρ [P{X_n ∈ A}]^{1/q} [∫ |Y(x)|^γ dF_n(x)]^{1/γ} = K_1 + n^ρ [P{X_n ∈ A}]^{1/q} [O(1)] = O(1) .

For |f_n(x)| ≤ K_2 n^ρ and q = 1, we have

∫_A |f_n(x)| dF_n(x) ≤ K_2 n^ρ P{X_n ∈ A} = O(1) . ∎
Example 5.4.1. To illustrate some of the ideas of this section, we consider the problem of estimating log μ, where μ > 0 is the mean of a random variable with finite sixth moment. Let x̄_n be the mean of a simple random sample of n such observations. Since log x is not defined for x ≤ 0, we consider an estimator f_n(x̄_n), where f_n(x) agrees with log x away from the origin and is modified near zero so that it is defined everywhere. Suppose that we desire an approximation to the sequence of expectations of f_n(x̄_n) to order n^{-1}. We first apply Theorem 5.4.2 with a_n = O(n^{-1/2}) and s = 3. By Theorem 5.4.1, we have that

E{(x̄_n − μ)^{2s}} = O(n^{-s}) , s = 1, 2, 3 ,   (5.4.2)
so that condition i of Theorem 5.4.2 is met. To establish condition ii, we demonstrate that |f_n(x)|² satisfies the conditions of Theorem 5.4.4. Since f_n(x) is continuous for x > 0, condition i of Theorem 5.4.4 is satisfied. For ρ = 1, γ = q = 2, and a suitable choice of Y(x), we have that |f_n(x)|² ≤ nY(x) and ∫ |Y(x)|² dF_n(x) = O(1). This is condition ii of Theorem 5.4.4; from equation (5.4.2) we see that condition iii also holds for r = 4, 5, or 6. Therefore, the conditions of Theorem 5.4.4 are satisfied and the integral of |f_n(x)|² is order one.
Since the third derivative of log x is continuous for positive x, we may choose an N_0 > μ^{-1} and an interval containing μ such that f_n^{(3)}(x) is continuous on that interval for all n > N_0. To illustrate, let μ = 0.1, and take the interval S to be [0.05, 0.15]. Then, for n > 20, f_n^{(3)}(x) is continuous on S. Therefore, conditions iii, iv, and v of Theorem 5.4.2 are also met, and we may write
E{f_n(x̄_n)} = log μ − (2μ²)^{-1} E{(x̄_n − μ)²} + O(n^{-3/2}) = log μ − (2nμ²)^{-1}σ² + O(n^{-3/2}) ,

where σ² is the variance of the observations. To order n^{-1}, the bias in f_n(x̄_n) as an estimator of log μ is −(2nμ²)^{-1}σ².
Since x̄_n possesses finite sixth moments, we can decrease the above remainder term to O(n^{-2}) by carrying the expansion to one more term and using the more general form of Theorem 5.4.3. By equation (5.4.2), condition i of Theorem 5.4.3 holds for s = 4, a = 3/2, and a_n = O(n^{-1/2}), so that δ = 2 by definition. Condition ii is established by demonstrating that |f_n(x)|³ satisfies the conditions of Theorem 5.4.4. Defining Y(x) appropriately and letting ρ = 3/2, we see that |f_n(x)|³ ≤ Y(x)n^ρ and

∫ |Y(x)|² dF_n(x) = O(1) ,

as x̄_n has finite third moments. Therefore, we are using γ = q = 2 and ρq = 3. Since ∫ |x − μ|^6 dF_n(x) = O(n^{-3}) and |f_n(x)|³ is continuous, the conditions of Theorem 5.4.4 are met. The sixth derivative of log x is continuous for positive x,
and we can find an N_0 and a set S such that conditions iii, iv, and v of Theorem 5.4.3 are also met. Because f_n^{(3)}(μ) = O(1) and E{(x̄_n − μ)³} = O(n^{-2}), we have

E{f_n(x̄_n)} = log μ − (2nμ²)^{-1}σ² + O(n^{-2}) .
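The bias approximation of the example can be checked numerically. The sketch below uses normal observations with hypothetical values μ = 2, σ² = 1, and n = 100 (chosen so that x̄_n is essentially never near zero and the logarithm itself can stand in for the modified estimator f_n); for normal observations the sample mean is exactly N(μ, σ²/n), so it can be drawn directly.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative values (not from the text): mu = 2, sigma^2 = 1, n = 100.
mu, sigma2, n, reps = 2.0, 1.0, 100, 400_000

# For normal observations, xbar ~ N(mu, sigma2/n) exactly.
xbar = rng.normal(mu, np.sqrt(sigma2 / n), size=reps)

# Simulated bias of log(xbar) versus the order-n^{-1} approximation.
bias_sim = np.log(xbar).mean() - np.log(mu)
bias_approx = -sigma2 / (2.0 * n * mu ** 2)   # -(2 n mu^2)^{-1} sigma^2
```

The agreement improves as n grows, with the discrepancy of order n^{-2}, as the expansion predicts.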
5.5. ESTIMATION FOR NONLINEAR MODELS
5.5.1. Estimators That Minimize an Objective Function
The expected value of a time series is sometimes expressible as a nonlinear function of unknown parameters and observable functions of time. For example, we might have, for {Y_t: t ∈ (0, 1, 2, ...)}, a mean function such as

E{Y_t} = exp(λ x_t) ,

where {x_t} is a sequence of known constants and λ is unknown. The estimation of a parameter such as λ is considerably more complicated than the estimation of a parameter that enters the expected value function in a linear manner.
We assume the model

Y_t = f_t(θ⁰) + e_t   (5.5.1)

for t = 1, 2, ..., where the random variables are defined on a complete probability space (Ω, 𝒜, P), the vector θ⁰ = (θ⁰_1, θ⁰_2, ..., θ⁰_k)' of unknown parameters lies in a compact subset Θ of k-dimensional Euclidean space ℛ^k, the function f_t is defined on Ω × Θ with the form of f_t known, and f_t(ω, ·) is continuous on Θ almost surely, for all t. The expression f_t(ω, ·) represents the function for a particular ω in Ω. The dependence of the function on ω is generally suppressed for notational convenience. The function f_t may depend on an input vector x_t, and sometimes we may write the model as

Y_t = f(x_t; θ⁰) + e_t .

The dimension of x_t need not be fixed, and it may increase with t. Some elements of x_t may be fixed and some may be random.
The derivatives of the function f_t(θ) are very important in the treatment of this problem, and we introduce a shorthand notation for them. Let f_t^{(j)}(θ̇) denote the first partial derivative of f_t(θ) with respect to the jth element of θ evaluated at the point θ = θ̇. Likewise, let f_t^{(js)}(θ̇) denote the second partial derivative of f_t(θ) with respect to the jth and sth elements of θ evaluated at the point θ = θ̇. The third partial derivative, f_t^{(jsv)}(θ̇), is defined in a similar manner. Let

F_t(θ) = [f_t^{(1)}(θ), f_t^{(2)}(θ), ..., f_t^{(k)}(θ)]   (5.5.2)

be the vector of first derivatives, and let

F_n(θ) = [F'_1(θ), F'_2(θ), ..., F'_n(θ)]'

be the n × k matrix of first derivatives. Let an estimator of the unknown θ⁰ be the θ that minimizes Q_n(θ), where Q_n(θ) is a function of the observations and θ. The least squares estimator of θ⁰ is the θ that minimizes

Q_n(θ) = n^{-1} Σ_{t=1}^n [Y_t − f_t(θ)]² .   (5.5.3)

For least squares, the function evaluated at θ⁰ is

Q_n(θ⁰) = n^{-1} Σ_{t=1}^n e²_t .
The function Q_n(θ) is the negative of the logarithm of the likelihood in the case of maximum likelihood estimation for the model (5.5.1) with normal e_t.
In order to obtain a limiting distribution for the estimator, the sequence {x_t, e_t}, the function f_t(θ), and the parameter space Θ must satisfy certain conditions. Stronger conditions in one area will permit weaker conditions in another. The random variables and the function must be such that the estimator is uniquely defined, at least for large n. Furthermore, the function being minimized must have a unique minimum "close to" the true parameter value, and the distance must decrease as the sample size increases.
These ideas are made more precise in the following theorems. We begin with results for the general problem and then consider special cases. Let θ̂_n in Θ be a measurable function that minimizes Q_n(θ) on Θ almost surely. Lemma 5.5.1 contains sufficient conditions for the consistency of θ̂_n. Lemma 5.5.1 is given in Wu (1981) and is based on the idea used by Wald (1949) in a proof of the strong consistency of the maximum likelihood estimator.
Lemma 5.5.1. Let θ̂_n in Θ be a measurable function that minimizes an objective function Q_n(θ) on Θ almost surely.

(i) Suppose, for any η > 0,

lim inf_{n→∞} inf_{|θ−θ⁰|≥η} [Q_n(θ) − Q_n(θ⁰)] > 0   (5.5.4)

almost surely. Then θ̂_n → θ⁰ almost surely as n → ∞.

(ii) Suppose, for any η > 0,

lim_{n→∞} P{ inf_{|θ−θ⁰|≥η} [Q_n(θ) − Q_n(θ⁰)] > 0 } = 1 .   (5.5.5)

Then θ̂_n → θ⁰ in probability as n → ∞.
Proof. A proof of part i is given by Wu (1981). We prove part ii. Suppose θ̂_n does not converge to θ⁰ in probability as n → ∞. Then there exist η > 0, ε_0 > 0, and a subsequence {n_j} such that

P{|θ̂_{n_j} − θ⁰| ≥ η} ≥ ε_0 for all j ,

implying

P{ inf_{|θ−θ⁰|≥η} [Q_{n_j}(θ) − Q_{n_j}(θ⁰)] ≤ 0 } ≥ ε_0 for all j .

Therefore,

lim sup_{n→∞} P{ inf_{|θ−θ⁰|≥η} [Q_n(θ) − Q_n(θ⁰)] ≤ 0 } ≥ ε_0 > 0 ,

which is a contradiction of the condition (5.5.5). Hence, if (5.5.5) holds, then θ̂_n converges to θ⁰ in probability as n → ∞. ∎
The conditions in Lemma 5.5.1 are rather general. Gallant and White (1988, pp. 18-19) give a set of sufficient conditions for strong consistency. We give a form of their result in Lemma 5.5.2. The condition (5.5.6) of the lemma is a uniform law of large numbers for the function Q_n(θ). This condition guarantees that the sample function will be close to the limit function as the sample size increases. The condition (5.5.7) insures that the limit of Q̄_n(θ) has a minimum at θ⁰. It is sometimes called the identification condition.
Lemma 5.5.2. Given (Ω, 𝒜, P) and a compact set Θ that is a subset of ℛ^k, let Q_n: Ω × Θ → ℛ be a random function continuous on Θ a.s., for n = 1, 2, ... . Let θ̂_n be as defined in Lemma 5.5.1. Suppose there exists a sequence of nonrandom functions Q̄_n: Θ → ℛ such that

Q_n(θ) − Q̄_n(θ) → 0 a.s. as n → ∞   (5.5.6)

uniformly on Θ. Assume that for any η > 0

lim inf_{n→∞} inf_{|θ−θ⁰|≥η} [Q̄_n(θ) − Q̄_n(θ⁰)] > 0 .   (5.5.7)

Then θ̂_n − θ⁰ → 0 a.s. as n → ∞.
Proof. Now

Q_n(θ) − Q_n(θ⁰) = [Q_n(θ) − Q̄_n(θ)] + [Q̄_n(θ) − Q̄_n(θ⁰)] + [Q̄_n(θ⁰) − Q_n(θ⁰)] .

The first term on the right converges to zero as n → ∞ because, by the assumption (5.5.6), there exists a B in 𝒜 with P(B) = 1 such that, given any ε > 0, for each ω in B there exists an integer N(ω, ε) < ∞ such that for all n > N(ω, ε),

sup_{θ∈Θ} |Q_n(ω, θ) − Q̄_n(θ)| < ε .

The difference Q̄_n(θ⁰) − Q_n(θ⁰) also converges to zero as n → ∞ by (5.5.6). The conclusion follows from the assumption (5.5.7) and Lemma 5.5.1. ∎
A critical assumption of Lemma 5.5.2 is (5.5.6), which guarantees that the random functions Q_n(θ) behave like nonrandom functions Q̄_n(θ) for large n. If Q_n(θ) can be written as

Q_n(θ) = n^{-1} Σ_{t=1}^n q_t(θ) ,   (5.5.8)

then a natural choice for Q̄_n(θ) is

Q̄_n(θ) = n^{-1} Σ_{t=1}^n E{q_t(θ)} .   (5.5.9)
Gallant and White (1988, p. 23) give a uniform law of large numbers for a function of the form (5.5.8) which is a slight modification of the result of Andrews (1986). We give the result as Lemma 5.5.3.
Lemma 5.5.3. Let q_t(θ) be a function from Ω × Θ to ℛ, t = 1, 2, ..., where (Ω, 𝒜, P) is the underlying probability space and Θ is a compact subset of ℛ^k. Assume q_t(ω, ·) is continuous in θ almost surely. Assume q_t(·, θ*) is 𝒜-measurable for each θ* in Θ and for t = 1, 2, ... . For given θ* in Θ and δ > 0, let

A(θ*, δ) = {θ ∈ Θ: |θ − θ*| ≤ δ} .

Assume that for each θ* in Θ there exist a constant δ_0 > 0 and positive random variables L_t such that

n^{-1} Σ_{t=1}^n E{L_t} = O(1)

and

|q_t(θ) − q_t(θ*)| ≤ L_t|θ − θ*| , t = 1, 2, ... , a.s.

for all θ in the closure of A(θ*, δ_0). Also assume that, for all 0 < δ ≤ δ_0, where δ_0 may depend on θ*,

lim_{n→∞} n^{-1} Σ_{t=1}^n [I_t(δ) − E{I_t(δ)}] = 0 a.s.

and

lim_{n→∞} n^{-1} Σ_{t=1}^n [S_t(δ) − E{S_t(δ)}] = 0 a.s.,

where

I_t(δ) = inf{q_t(θ): θ ∈ A(θ*, δ)} and S_t(δ) = sup{q_t(θ): θ ∈ A(θ*, δ)} .

Then

lim_{n→∞} sup_{θ∈Θ} | n^{-1} Σ_{t=1}^n [q_t(θ) − E{q_t(θ)}] | = 0 a.s.

Proof. See Gallant and White (1988, p. 38). ∎
It may be easier to obtain a uniform law of large numbers for the function Q_n(θ) directly than for {q_t(θ), t = 1, 2, ..., n}. A uniform strong law of large numbers for Q_n(θ) is given in Lemma 5.5.4, and a uniform weak law of large numbers is given in Lemma 5.5.5.
Lemma 5.5.4. Let Θ be a convex, compact subset of k-dimensional Euclidean space, ℛ^k, and let {Q_n(θ): n = 1, 2, ...} be a sequence of random functions. Assume:

(i) For each θ in Θ,

Q_n(θ) − E{Q_n(θ)} → 0 a.s.

(ii) There exist random variables L_n such that

(a) |Q_n(θ_1) − Q_n(θ_2)| ≤ L_n|θ_1 − θ_2| for θ_1, θ_2 in Θ ,
(b) lim sup_{n→∞} L_n < ∞ a.s.,
(c) sup_n E{L_n} < ∞ .

Then

lim_{n→∞} sup_{θ∈Θ} |Q_n(θ) − E{Q_n(θ)}| = 0 a.s.

Proof. Omitted. ∎
Lemma 5.5.5. Let Θ be a convex, compact subset of k-dimensional Euclidean space, and let {Q_n(θ): n = 1, 2, ...} be a sequence of random functions. Assume:

(i) For each θ in Θ,

plim_{n→∞} [Q_n(θ) − Q(θ)] = 0

for some Q(θ).

(ii) There exists a sequence of positive random variables {L_n} and an L such that, for θ_1 and θ_2 in Θ,

(a) |Q_n(θ_1) − Q_n(θ_2)| ≤ L_n|θ_1 − θ_2| ,
(b) |Q(θ_1) − Q(θ_2)| ≤ L|θ_1 − θ_2| ,
(c) L_n = O_p(1) and L = O_p(1).

Then

plim_{n→∞} [Q_n(θ) − Q(θ)] = 0

uniformly in θ in Θ.
Proof. Given η > 0, consider the open sets G_i = {θ: |θ − θ_i| < η} defined by the θ_i and η. Since Θ is compact, there exist a k̄ = k̄(η) and θ_1, θ_2, ..., θ_k̄ such that Θ = ∪_{i=1}^{k̄} G_i. Now,

sup_{θ∈Θ} |Q_n(θ) − Q(θ)| ≤ max_i sup_{θ∈G_i} [ |Q_n(θ) − Q_n(θ_i)| + |Q_n(θ_i) − Q(θ_i)| + |Q(θ_i) − Q(θ)| ] .

Given ε > 0 and τ > 0, by (i) there exists an N_i = N_i(ε, τ) such that for n > N_i,

P{|Q_n(θ_i) − Q(θ_i)| > 3^{-1}ε} < τ .

By (ii)(c), for given δ > 0 there exists M > 0 such that

sup_n P[L_n > M] < 3^{-1}δ

and P[L > M] < 3^{-1}δ. Thus, given ε > 0 and δ > 0, choose η = (3M)^{-1}ε. Then,

P{ sup_{θ∈G_i} |Q_n(θ) − Q_n(θ_i)| > 3^{-1}ε } < 3^{-1}δ .

Similarly,

P{ sup_{θ∈G_i} |Q(θ) − Q(θ_i)| > 3^{-1}ε } < 3^{-1}δ .

Choosing τ = (3k̄)^{-1}δ gives the result. ∎
Given conditions for consistency of θ̂_n, we investigate the limiting distribution of the estimator. We begin by giving conditions under which the least squares estimator converges in law to a normal random variable. In the theorem statement we assume the estimator to be consistent. The other assumptions of the theorem are sufficient for the existence of a consistent sequence of estimators (see Theorem 5.5.3), but are not sufficient to rule out multiple local minima. The theorem is general enough to cover models that contain lagged values of Y_t in the explanatory vector x_t. To permit this, the sigma-field 𝒜_{t−1} of the theorem is generated by x_1 through x_t and e_1 through e_{t−1}. For a vector a, let |a| = (a'a)^{1/2}.
Theorem 5.5.1. Let the model (5.5.1) hold, and let θ̂_n be a weakly consistent estimator of θ⁰ that minimizes Q_n(θ) of (5.5.3). Let 𝒜_{t−1} be the sigma-field generated by

{x_1, x_2, ..., x_t, e_1, e_2, ..., e_{t−1}} .

Assume:

(i) There exists a convex, compact neighborhood S of the true parameter vector θ⁰ such that f_t(·, θ) is measurable for each θ in S, f_t(ω, ·) is almost surely twice continuously differentiable with respect to θ on S, S is in Θ, and θ⁰ is an interior point of S.

(ii) The vector sequence {[F_t(θ⁰)e_t, e_t]} satisfies

E{[F_t(θ⁰)e_t, e_t, e²_t] | 𝒜_{t−1}} = [0, 0, σ²] ,
E{F'_t(θ⁰)F_t(θ⁰)e²_t | 𝒜_{t−1}} = F'_t(θ⁰)F_t(θ⁰)σ² ,
E{|[F_t(θ⁰)e_t, e_t]|^{2+δ} | 𝒜_{t−1}} < M_F < ∞ ,

a.s., for all t, where δ > 0, F_t(θ) is defined in (5.5.2), and M_F is a positive constant.

(iii) There exists a k × k fixed matrix B(θ) such that B(θ) is continuous at θ⁰,

plim_{n→∞} [0.5 ∂²Q_n(θ)/∂θ ∂θ' − B(θ)] = 0 uniformly on S ,

plim_{n→∞} n^{-1}F'_n(θ⁰)F_n(θ⁰) = lim_{n→∞} n^{-1}E{F'_n(θ⁰)F_n(θ⁰)} = B(θ⁰) ,

and B(θ⁰) is positive definite.

Let

s² = (n − k)^{-1} Σ_{t=1}^n [Y_t − f(x_t; θ̂_n)]² .   (5.5.10)

Then

(a) n^{1/2}(θ̂_n − θ⁰) →L N[0, B^{-1}(θ⁰)σ²] ,   (5.5.11)

and

(b) s² converges to σ² in probability.
Proof. Because the estimators θ̂_n lie inside the convex, compact neighborhood S of θ⁰ with probability approaching one as n increases, θ̂_n satisfies

∂Q_n(θ)/∂θ |_{θ=θ̂_n} = 0

with probability approaching one as n increases. We can expand the first derivative in a Taylor series about θ⁰ to obtain

0 = ∂Q_n(θ⁰)/∂θ + [∂²Q_n(θ̄_n)/∂θ ∂θ'](θ̂_n − θ⁰)   (5.5.12)

for θ̂_n in S, where θ̄_n is on the line segment joining θ̂_n and θ⁰. By assumption iii and the consistency of θ̂_n,

plim_{n→∞} [0.5 ∂²Q_n(θ̄_n)/∂θ ∂θ' − B(θ⁰)] = 0 .

Now

∂Q_n(θ⁰)/∂θ = −2n^{-1}F'_n(θ⁰)e_n ,

where e_n = (e_1, e_2, ..., e_n)'. Therefore

θ̂_n − θ⁰ = B^{-1}(θ⁰) n^{-1}F'_n(θ⁰)e_n + r_n ,   (5.5.13)

where r_n is of smaller order in probability than θ̂_n − θ⁰.
To prove that

n^{-1/2}F'_n(θ⁰)e_n →L N[0, B(θ⁰)σ²] ,   (5.5.14)

it is enough to show that, for any k-dimensional vector λ, λ ≠ 0,

λ'[n^{-1/2}F'_n(θ⁰)e_n] = Σ_{t=1}^n n^{-1/2}[Σ_{j=1}^k λ_j f_t^{(j)}(θ⁰)] e_t

converges in law to a univariate normal distribution. We prove this by showing that the conditions of Theorem 5.3.4 hold. Let

Z_tn = n^{-1/2}[Σ_{j=1}^k λ_j f_t^{(j)}(θ⁰)] e_t , V²_nn = Σ_{t=1}^n E{Z²_tn | 𝒜_{t−1}} ,
and

s²_nn = E{V²_nn} = σ² λ'[E{n^{-1}F'_n(θ⁰)F_n(θ⁰)}]λ .

By assumption iii,

σ^{-2}(V²_nn − s²_nn) = λ'[n^{-1}F'_n(θ⁰)F_n(θ⁰)]λ − λ'[E{n^{-1}F'_n(θ⁰)F_n(θ⁰)}]λ

converges to zero in probability, and

s²_nn → σ² λ'B(θ⁰)λ > 0 .

Hence, condition i of Theorem 5.3.4 is satisfied. Now, for any arbitrary ε > 0,

s_nn^{-2} Σ_{t=1}^n E{Z²_tn I(|Z_tn| ≥ ε s_nn)} ≤ ε^{-δ} s_nn^{-(2+δ)} n^{-δ/2} M_λ ,

which converges to zero as n → ∞, where I(A) is the indicator function for the set A and M_λ is a finite constant obtained from the moment bound of assumption ii. Hence, condition ii of Theorem 5.3.4 is satisfied and

n^{-1/2}F'_n(θ⁰)e_n →L N[0, B(θ⁰)σ²] .   (5.5.16)
By (5.5.12)-(5.5.14), using Corollary 5.2.6.2, we have

n^{1/2}(θ̂_n − θ⁰) →L N[0, B^{-1}(θ⁰)σ²] .   (5.5.15)

To obtain the limit of s², we note that
s² = (n − k)^{-1} Σ_{t=1}^n [e_t − F_t(θ̃_n)(θ̂_n − θ⁰)]² ,

where θ̃_n is on the line segment joining θ̂_n and θ⁰, and we have used (5.5.15) to bound the terms involving θ̂_n − θ⁰. Therefore

plim_{n→∞} [s² − (n − k)^{-1} Σ_{t=1}^n e²_t] = 0 .   (5.5.17)

By assumption ii, the e_t have bounded 2 + δ conditional moments. Therefore, by Corollary 5.3.8,

lim_{n→∞} (n − k)^{-1} Σ_{t=1}^n e²_t = σ² almost surely.

Conclusion b follows by (5.5.17). ∎
Because s² is converging to σ² and n^{-1}F'_n(θ̂_n)F_n(θ̂_n) is converging to B(θ⁰), we can use

s²[F'_n(θ̂_n)F_n(θ̂_n)]^{-1}

as an estimator of the variance of θ̂_n. The statistics computed as the usual "t-statistics" of regression will be distributed as N(0, 1) random variables in the limit.
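A minimal numerical sketch of nonlinear least squares and the variance estimator s²[F'_n(θ̂_n)F_n(θ̂_n)]^{-1}, for a hypothetical one-parameter model Y_t = exp(θ x_t) + e_t fitted by Gauss-Newton iteration (the model, sample size, and parameter values are assumptions for illustration, not Fuller's worked example):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical model: Y_t = exp(theta * x_t) + e_t, e_t iid N(0, 0.01).
n = 400
x = np.linspace(0.0, 1.0, n)
theta_true = 1.5
y = np.exp(theta_true * x) + 0.1 * rng.standard_normal(n)

theta = 1.0                                  # starting value
for _ in range(50):                          # Gauss-Newton iterations
    f = np.exp(theta * x)                    # f_t(theta)
    F = (x * f).reshape(-1, 1)               # n x k matrix of first derivatives
    step = np.linalg.solve(F.T @ F, F.T @ (y - f))
    theta += step.item()

# s^2 of (5.5.10) and the variance estimator s^2 [F'F]^{-1}; F here is from
# the final iteration, so it is evaluated at (essentially) the converged value.
k = 1
s2 = np.sum((y - np.exp(theta * x)) ** 2) / (n - k)
var_hat = s2 * np.linalg.inv(F.T @ F)
t_stat = (theta - theta_true) / np.sqrt(var_hat.item())
```

The ratio t_stat behaves like an N(0, 1) variable across repeated samples, which is the content of the remark about regression "t-statistics".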
There are situations in which the estimator θ̂_n may have a limiting distribution that is not normal. Also, for some functions Q_n(θ), the covariance matrix of the limiting distribution may not take the simple form of (5.5.11). Some of these situations are covered by Theorem 5.5.2.
Theorem 5.5.2. Suppose that:

(i) The assumptions (5.5.6) and (5.5.7) of Lemma 5.5.2 hold.
(ii) The first and second derivatives of Q_n(θ) with respect to θ are continuous for all n on a convex, compact neighborhood S containing θ⁰ as an interior point.
(iii) n^{1/2} ∂Q_n(θ⁰)/∂θ →L X, where X represents the limiting distribution.
(iv) There exists a k × k fixed matrix function B(θ) such that B(θ) is continuous on S, B(θ⁰) is a symmetric, positive definite matrix, and

lim_{n→∞} [∂²Q_n(θ)/∂θ ∂θ' − 2B(θ)] = 0

almost surely, uniformly on S.

Let θ̂_n be a measurable function that minimizes the objective function Q_n(θ). Then

n^{1/2}(θ̂_n − θ⁰) →L −2^{-1}B^{-1}(θ⁰)X .
Proop. By assumption i, the estimator 4 converges to 8' almost surely, and 8, lies inside the convex, compact neighborhood S of 8' for large n, almost surely. Therefore, we can expand the partial derivative of Q,(&) with respect to 8 in a Taylor series about 8', for large n, almost surely. For #n on the line segment joining 9 and 8' we have
∂Q_n(θ̂_n)/∂θ = ∂Q_n(θ⁰)/∂θ + [∂²Q_n(θ̄_n)/∂θ ∂θ'](θ̂_n − θ⁰) = 0  (5.5.18)
for large n, almost surely. By assumption iv, the second partial derivative of Q_n(θ) evaluated at θ̄_n is converging to 2B(θ⁰) a.s. Therefore, using assumption iii, (5.5.18), and Corollary 5.2.6.2, we have the conclusion. △
If, in Theorem 5.5.2, X ~ N[0, Σ_XX], then
n^{1/2}(θ̂ − θ⁰) →L N[0, B^{-1}(θ⁰)Σ_XX B^{-1}(θ⁰)] .
If Σ_XX = B(θ⁰)σ², we obtain the result of Theorem 5.5.1 that the limiting distribution is N[0, B^{-1}(θ⁰)σ²].
In Theorem 5.5.1 and Theorem 5.5.2, it is n^{1/2}(θ̂ − θ⁰) that converges in distribution to a nondegenerate random variable. In assumption iii of Theorem 5.5.1, the matrix of derivatives divided by n is assumed to converge to a nonsingular matrix. The normalization of the sums of squares by n and the corresponding normalization of θ̂ − θ⁰ with n^{1/2} is not always appropriate.
In time series analyses, it sometimes happens that the sum of squares of the derivative with respect to one parameter increases at a faster rate than the derivative with respect to a second parameter. Consider the simple model
Y_t = β₀ + β₁t + e_t ,
with vector of derivatives F(x_t; β) = [1, t]. It follows that the sum of squares of the derivative with respect to β₀ increases at the rate n, while the sum of squares of the derivative with respect to β₁ increases at the rate n³. Therefore, Theorem 5.5.1 could not be used to obtain the limiting distribution of the least squares estimator of the parameter vector. Following Sarkar (1990), we now give a theorem, Theorem 5.5.3, that is applicable to situations in which the sums of squares of the first derivatives increase at different rates. Standard results are obtained by setting M_n and a_n of the theorem equal to n^{1/2}I and n^{1/2}, respectively.
The assumptions of Theorem 5.5.3 are local in that assumption d is a local property of Q_n(θ) in the vicinity of θ⁰. This assumption permits the existence of other local minima. Thus, the conclusion of Theorem 5.5.3 is local in that the sequence of estimators associated with the local region of assumption d is consistent for θ⁰.
We assume that an estimator of θ is constructed by minimizing the function Q_n(θ). Let the vector of first partial derivatives be
U_n(θ) = ∂Q_n(θ)/∂θ
and the matrix of second partial derivatives be
B_n(θ) = ∂²Q_n(θ)/∂θ ∂θ' .
If the minimum of Q_n(θ) occurs in the interior of the parameter space, then the θ̂ that minimizes Q_n(θ) is a solution of the system
U_n(θ) = 0 .  (5.5.19)
Theorem 5.5.3. Suppose the true value θ⁰ is an interior point of the parameter space Θ. Assume that there is a sequence of square matrices {M_n} and a sequence of real numbers {a_n} such that
(a) lim_{n→∞} M_n^{-1}B_n(θ⁰)M_n^{-1'} = B a.s., where B is a k × k symmetric, positive definite matrix;
(b) lim_{n→∞} a_n^{-1}M_n^{-1}U_n(θ⁰) = 0 a.s.;
(c) lim_{n→∞} a_n = ∞;
(d) there exist a δ > 0 and random variables L_n such that for all n, all θ in S_{nδ₁}, and all δ₁, 0 < δ₁ ≤ δ,
||M_n^{-1}B_n(θ)M_n^{-1'} − M_n^{-1}B_n(θ⁰)M_n^{-1'}|| ≤ L_nδ₁ a.s.,
and
lim sup_{n→∞} L_n < ∞ a.s.,
where
S_{nδ₁} = {θ ∈ Θ: ||a_n^{-1}M_n'(θ − θ⁰)|| < δ₁}
and, for any k × r matrix C, ||C||² = tr CC' denotes the sum of squares of the elements of C.
Then there exists a sequence of roots of (5.5.19), denoted by {θ̂_n}, such that
lim_{n→∞} ||a_n^{-1}M_n'(θ̂_n − θ⁰)|| = 0 a.s.
Proof. Given ε > 0 and 0 < δ₀ ≤ δ, it is possible to define a set A, an n₀, and positive constants K₁, K₂, and K₃ such that P{A} ≥ 1 − ε and, for all ω in A and all n > n₀,
a_n² ≥ K₁ ,  (5.5.20)
γ_s[M_n^{-1}B_n(θ⁰)M_n^{-1'}] ≥ 3K₂ ,  (5.5.21)
||M_n^{-1}B_n(θ⁰)M_n^{-1'} − B||² < K₂ ,  (5.5.22)
||M_n^{-1}[B_n(θ) − B_n(θ⁰)]M_n^{-1'}|| < K₃δ₀ ,
and L_n(ω) ≤ K₃ for all θ in S_{nδ₀}, by assumptions c, a, a, b, and d, respectively, where γ_s(C) is the smallest root of the matrix C. Let δ₀* = min(δ₀, K₃^{-1}K₂). Then, for ω in A, θ in S_{nδ₀*}, and n > n₀, we have
||b_n|| < K₂δ₀* ,  (5.5.23)
and
||M_n^{-1}[B_n(θ̄_n) − B_n(θ⁰)]M_n^{-1'}|| ≤ K₂ ,  (5.5.24)
where ||b_n|| = ||0.5a_n^{-1}M_n^{-1}U_n(θ⁰)||. Let the Taylor expansion of Q_n(θ) about θ⁰ be
Q_n(θ) = Q_n(θ⁰) + (θ − θ⁰)'U_n(θ⁰) + 0.5(θ − θ⁰)'B_n(θ̄_n)(θ − θ⁰)
 = Q_n(θ⁰) + (θ − θ⁰)'U_n(θ⁰) + 0.5(θ − θ⁰)'[B_n(θ̄_n) − B_n(θ⁰)](θ − θ⁰)
  + 0.5(θ − θ⁰)'B_n(θ⁰)(θ − θ⁰) ,  (5.5.25)
where θ̄_n lies on the line segment joining θ and θ⁰. Using (5.5.24),
(θ − θ⁰)'M_n{M_n^{-1}[B_n(θ̄_n) − B_n(θ⁰)]M_n^{-1'}}M_n'(θ − θ⁰) ≤ ||M_n'(θ − θ⁰)||²K₂  (5.5.26)
for all θ in S_{nδ₀*}. By (5.5.21),
(θ − θ⁰)'B_n(θ⁰)(θ − θ⁰) ≥ 3K₂||M_n'(θ − θ⁰)||² .  (5.5.27)
Let R(S_{nδ₀*}) be the boundary of S_{nδ₀*}. From the expansion (5.5.25), using (5.5.26), (5.5.27), and (5.5.20),
Q_n(θ) − Q_n(θ⁰) ≥ ||M_n'(θ − θ⁰)||²[2K₂ + ||M_n'(θ − θ⁰)||^{-2}(θ − θ⁰)'a_nM_nb_n]  (5.5.28)
and
Q_n(θ) − Q_n(θ⁰) ≥ 2K₁K₂δ₀*²
for all ω in A, all n > n₀, and all θ in R(S_{nδ₀*}). It follows that Q_n(θ) must attain a minimum in the interior of S_{nδ₀*}, at which point the system (5.5.19) is satisfied. Therefore,
P{||a_n^{-1}M_n'(θ̂_n − θ⁰)|| < δ₀*} ≥ 1 − ε  for n > n₀ .
Now, let A_j, n_j, and δ_j* denote the A, n₀, and δ₀* associated with ε_j = 2^{-j}. Since ε is arbitrary, we can choose the δ_j* to be a sequence decreasing to zero and take the n_j to be an increasing sequence. Define Ā = lim inf A_j. Then P{Ā} = 1. Given η > 0 and a fixed ω in Ā, there exists a j₀ such that ω is in A_j for all j > j₀. Choose j₁ > j₀ such that δ_{j₁}* < η. Then, for j > j₁ and n > n_j, we have
||a_n^{-1}M_n'(θ̂_n − θ⁰)|| ≤ δ_j* < η . △
Corollary 5.5.3.1. Let assumptions a through d of Theorem 5.5.3 hold in probability instead of almost surely. Then there exists a sequence of roots of (5.5.19), denoted by {θ̂_n}, such that
plim_{n→∞} ||a_n^{-1}M_n'(θ̂_n − θ⁰)|| = 0 .
Proof. Omitted. △
If the roots of a_n^{-2}M_nM_n' are bounded away from zero, then θ̂_n is consistent for θ⁰. If θ̂_n is consistent, and if the properly normalized first and second derivatives converge in distribution, then the estimator of θ converges in distribution.
Corollary 5.5.3.2. Let assumptions a through d of Theorem 5.5.3 hold in probability. In addition, assume θ̂_n is consistent for θ⁰ and
(e) {M_n^{-1}U_n(θ⁰), M_n^{-1}B_n(θ⁰)M_n^{-1'}} →L (X, B) as n → ∞.
Then
M_n'(θ̂_n − θ⁰) →L −B^{-1}X as n → ∞ ,
where B is defined in assumption a and U_n(θ⁰) is the vector of first partial derivatives evaluated at θ = θ⁰.
Proof. Because θ̂_n is converging to θ⁰ in probability, there is a compact set S containing θ⁰ as an interior point such that, given ε > 0, the probability that θ̂_n is in S is greater than 1 − ε for n greater than some N_ε. For θ̂_n in S, we expand U_n(θ) in a first order Taylor series about θ⁰ and multiply the Taylor series expansion by M_n^{-1} to obtain
0 = M_n^{-1}U_n(θ̂_n) = M_n^{-1}U_n(θ⁰) + M_n^{-1}B_n(θ̄_n)M_n^{-1'}M_n'(θ̂_n − θ⁰) ,
where θ̄_n is an intermediate point between θ̂_n and θ⁰. Note that
M_n^{-1}B_n(θ̄_n)M_n^{-1'} = M_n^{-1}B_n(θ⁰)M_n^{-1'} + M_n^{-1}[B_n(θ̄_n) − B_n(θ⁰)]M_n^{-1'}
and the conclusion follows by assumptions a, d, and e. △
By Theorem 5.5.3, there is a sequence of roots of (5.5.19) that converges to θ⁰. Hence, the assumption of the corollary that θ̂_n is consistent is equivalent to assuming that we are considering the consistent sequence of roots.
Example 5.5.1. Consider the model
Y_t = β₀ + β₀β₁t + β₁x_t + e_t ,  (5.5.30)
where e_t ~ NI(0, σ²), β₁ ≠ 0, and x_t = 1 if t is odd, x_t = 0 if t is even.
Assume that a sample of size n is available and that an estimator of (β₀, β₁)' = β is constructed as the (β̂₀, β̂₁) that minimizes the sum of squares. The limit of the appropriately normalized matrix of sums of squares and products of the first derivatives is singular if β₀ ≠ 0 and β₁ ≠ 0. We now show how we can apply Theorem 5.5.3 to this problem. Let
M_n^{-1} = ( 1  −β₀ ; 0  β₁ ) diag(n^{3/2}, n^{1/2})^{-1} .
Setting a_n = n^{1/2}, we have
The second derivatives of Q_n(β) are continuous with continuous derivatives, and hence condition d of Theorem 5.5.3 also holds. It follows that the estimator of β is consistent for β. Using Theorem 5.3.4, one can show that
M_n^{-1}U_n(β⁰) →L N(0, V_aa) ,
where V_aa is the matrix of (5.5.31) multiplied by σ².
An alternative method of defining a sequence of estimators with a nonsingular covariance matrix is to reparametrize the model. Assume that β₁ ≠ 0. Let
(α₀, α₁) = (β₀β₁, β₁) ,
from which
(β₀, β₁) = (α₁^{-1}α₀, α₁)
and
Y_t = α₁^{-1}α₀ + α₀t + α₁x_t + e_t .
The vector of first derivatives of the α-form of the model is G_t(α). It follows that
where H_n = diag(n³, n). It can be shown that
H_n^{-1/2} Σ_{t=1}^{n} G_t'(α)e_t →L N[0, Aσ²] .
Thus, the least squares estimator of β is consistent and there is a normalizer that gives a limiting normal distribution. However, the two normalizers that we used are functions of the true, unknown parameters. The practitioner prefers a distribution based on a normalizer that is a function of the sample.
It is natural to consider a normalizer derived from the derivatives evaluated at the least squares estimator. For the model in the α-form, let
K_n = Σ_{t=1}^{n} G_t'(α̂)G_t(α̂) .
One can show that
K_n^{1/2}(α̂ − α) →L N(0, Iσ²) .
It follows that one can use the derivatives output by most nonlinear least squares programs to construct approximate tests and confidence intervals for the parameters of the model. △△
Example 5.5.2. As an example where Theorem 5.5.3 does not hold, consider the model
Y_t = β + β²t + e_t ,
where F(x_t; β) = (1 + 2βt) and e_t ~ NI(0, σ²). Assume that the model is to be estimated by least squares by choosing β̂ to minimize the sum of squares of deviations.
Then, if β⁰ = 0,
B_n(β⁰) = 1 + 2n^{-1} Σ_{t=1}^{n} te_t .
The quantity n^{-1} Σ_{t=1}^{n} te_t does not converge to zero in probability. If one divides B_n(β⁰) by n^{1/2}, the normalized quantity converges in distribution, but not in probability. If one divides B_n(β⁰) by a quantity increasing at a rate faster than n^{1/2}, the limit is zero. Therefore, the model with β⁰ = 0 does not satisfy assumption a of Theorem 5.5.3. However, one can use Lemma 5.5.1 to demonstrate that the least squares estimator is a consistent estimator of β⁰ when β⁰ = 0. △△
5.5.1. One-Step Estimation
An important special case of nonlinear estimation occurs when one is able to obtain an initial consistent estimator of the parameter vector θ⁰. In such cases, asymptotic results can be obtained for a single step or for a finite number of steps of a sequential minimization procedure.
We retain the model (5.5.1), which we write as
Y_t = f(x_t; θ⁰) + e_t ,  t = 1, 2, . . . ,  (5.5.32)
where the e_t are iid(0, σ²) random variables or are (0, σ²) martingale differences satisfying condition ii of Theorem 5.5.1. Let θ̃ be an initial consistent estimator of the unknown true value θ⁰. A Taylor's series expansion of f(x_t; θ⁰) about θ̃ gives
f(x_t; θ⁰) = f(x_t; θ̃) + Σ_{j=1}^{k} f^{(j)}(x_t; θ̃)(θ_j⁰ − θ̃_j) + d_t(x_t; θ̃) ,  (5.5.33)
where θ_j⁰ and θ̃_j are the jth elements of θ⁰ and θ̃, respectively,
and θ̄_t is on the line segment joining θ̃ and θ⁰. On the basis of equation (5.5.33), we consider the modified sum of squares
Q̃_n(δ) = n^{-1}[w − F(θ̃)δ]'[w − F(θ̃)δ] ,
ESTIMATION FOR NONLlNEAR MODELS 269
where δ = θ − θ̃, w is the n × 1 vector with tth element given by w_t = Y_t − f(x_t; θ̃), and F(θ̃) is the n × k matrix with tth row F_t(θ̃) defined in (5.5.2). Minimizing Q̃_n(δ) with respect to δ leads to
θ̂ = θ̃ + δ̂  (5.5.34)
as an estimator of θ⁰, where
δ̂ = [F'(θ̃)F(θ̃)]^{-1}F'(θ̃)w .
We call θ̂ the one-step Gauss-Newton estimator.
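To make the one-step estimator concrete, here is a minimal sketch, ours rather than anything from the text, of the single Gauss-Newton step for the two-parameter model f(x; θ) = θ₁ + θ₂x₁ + θ₂²x₂ used later in Example 5.5.3; the function and variable names are our own:

```python
# One-step Gauss-Newton: theta_hat = theta_tilde + [F'F]^{-1} F' w, where
# w_t = y_t - f(x_t; theta~) and F_t(theta~) is the derivative row.

def one_step_gauss_newton(data, theta):
    """data: list of (y, x1, x2); theta: initial consistent (theta1, theta2)."""
    t1, t2 = theta
    # Residuals and derivative rows (1, x1 + 2*theta2*x2) at theta~.
    w = [y - (t1 + t2 * x1 + t2 * t2 * x2) for y, x1, x2 in data]
    F = [(1.0, x1 + 2.0 * t2 * x2) for _, x1, x2 in data]
    # Accumulate F'F (2 x 2) and F'w (2 x 1), then solve the 2 x 2 system.
    a = sum(f[0] * f[0] for f in F)
    b = sum(f[0] * f[1] for f in F)
    d = sum(f[1] * f[1] for f in F)
    g0 = sum(f[0] * wi for f, wi in zip(F, w))
    g1 = sum(f[1] * wi for f, wi in zip(F, w))
    det = a * d - b * b
    delta = ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)
    return (t1 + delta[0], t2 + delta[1])
```

If θ̃ is already the least squares solution, w is orthogonal to the columns of F(θ̃) and the computed step δ̂ is zero, so the least squares estimator is a fixed point of the update.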
Theorem 5.5.4. Assume that the nonlinear model (5.5.32) holds. Let θ̂ be the one-step Gauss-Newton estimator defined in (5.5.34). Assume:
(1) There is an open set S such that S is in Θ, θ⁰ ∈ S, and
plim_{n→∞} n^{-1}F'(θ)F(θ) = B(θ)
is nonsingular for all θ in S, where F(θ) is the n × k matrix with tjth element given by f^{(j)}(x_t; θ).
(2) plim_{n→∞} n^{-1}G'(θ)G(θ) = L(θ) uniformly in θ on the closure S̄ of S, where the elements of L(θ) are continuous functions of θ on S̄, and G(θ) is an n × (1 + k + k² + k³) matrix with tth row composed of f(x_t; θ) and its first, second, and third derivatives with respect to θ.
Then
θ̂ − θ⁰ = [F'(θ⁰)F(θ⁰)]^{-1}F'(θ⁰)e + O_p(max{a_n², a_nn^{-1/2}}) .
Furthermore, if n^{-1/2} Σ_{t=1}^{n} F_t'(θ⁰)e_t →L N[0, B(θ⁰)σ²] and a_n² = o(n^{-1/2}), then
n^{1/2}(θ̂ − θ⁰) →L N[0, B^{-1}(θ⁰)σ²] .
Proof. Because θ̃ − θ⁰ is O_p(a_n), given η₁ > 0, one can choose N₀ such that the probability is greater than 1 − η₁ that θ̃ is in S for all n > N₀. On the basis of a Taylor expansion, we can write, for θ̃ in S,
δ̂ = [F'(θ̃)F(θ̃)]^{-1}F'(θ̃)[f(θ⁰) − f(θ̃) + e]
 = [F'(θ̃)F(θ̃)]^{-1}F'(θ̃)[F(θ̃)δ⁰ + e] + [F'(θ̃)F(θ̃)]^{-1}R(θ̃) ,
where f(θ) is the n × 1 vector with tth element f(x_t; θ), e is the n × 1 vector with tth element e_t, δ⁰ = θ⁰ − θ̃, and the jth element of n^{-1}R(θ̃) is
a remainder that is quadratic in the elements of θ̃ − θ⁰, in which θ̄ is on the line segment joining θ̃ and θ⁰. The elements of L(θ) are bounded on S̄, and, given ε₁ > 0, there exists an N₁ such that the probability is greater than 1 − ε₁ that the elements of
n^{-1}G'(θ)G(θ) differ from the respective elements of L(θ) by less than ε₁ for all θ ∈ S̄ and all n > N₁. Therefore,
n^{-1}R(θ̃) = O_p(a_n²) .  (5.5.35)
Because B(θ) is nonsingular on S, there is an N₂ such that F'(θ)F(θ) is nonsingular with probability greater than 1 − ε₂ for all θ in S and all n > N₂. It follows that
plim_{n→∞} [n^{-1}F'(θ̃)F(θ̃)]^{-1} = B^{-1}(θ⁰) ,
where the jsth element of B(θ⁰) is plim_{n→∞} n^{-1} Σ_{t=1}^{n} f^{(j)}(x_t; θ⁰)f^{(s)}(x_t; θ⁰).
Therefore,
[F'(θ̃)F(θ̃)]^{-1}R(θ̃) = O_p(a_n²) .
The jth element of n^{-1}F'(θ̃)e is obtained by expanding f^{(j)}(x_t; θ̃) about θ⁰, where θ† is on the line segment joining θ̃ and θ⁰. By assumption 2,
n^{-1} Σ_{t=1}^{n} [f^{(js)}(x_t; θ†) − f^{(js)}(x_t; θ⁰)]² →P 0
as n → ∞. Therefore,
n^{-1}F'(θ̃)e = n^{-1}F'(θ⁰)e + o_p(n^{-1/2}) ,  (5.5.36)
and the results follow from (5.5.35) and (5.5.36). △
To estimate the variance of the limiting distribution of n^{1/2}(θ̂ − θ⁰), we must estimate B^{-1}(θ⁰) and σ². By the assumptions, [n^{-1}F'(θ̃)F(θ̃)]^{-1} and [n^{-1}F'(θ̂)F(θ̂)]^{-1} are consistent estimators for B^{-1}(θ⁰). Also
s² = (n − k)^{-1} Σ_{t=1}^{n} [Y_t − f(x_t; θ̂)]²  (5.5.37)
is a consistent estimator of σ². Hence, using the matrix [F'(θ̂)F(θ̂)]^{-1} and the mean square s², all of the standard linear regression theory holds approximately for θ̂.
The theorem demonstrates that the one-step estimator is asymptotically unchanged by additional iteration if θ̃ − θ⁰ = o_p(n^{-1/4}). For small samples, we may choose to iterate the procedure. For a particular sample we are not guaranteed that iteration of (5.5.34), using the θ̂ of the previous step as the initial estimator, will converge. Therefore, if one iterates, the estimator θ̂ should be replaced at each step by
θ̂_ν = θ̃ + νδ̂ ,
where θ̃ is the estimator of the previous step and ν ∈ (0, 1] is chosen so that n^{-1} Σ_{t=1}^{n} [Y_t − f(x_t; θ̂_ν)]² is less than n^{-1} Σ_{t=1}^{n} [Y_t − f(x_t; θ̃)]², and so that θ̂_ν ∈ Θ. This iteration furnishes a method of obtaining the least squares estimator that minimizes (5.5.3).
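The step-length rule just described can be sketched as follows; the model, the helper names, and the cap on the number of halvings are our illustrative choices, not the book's:

```python
# Damped Gauss-Newton update: try the full step delta, and halve nu until
# the residual sum of squares actually decreases.

def sse(data, theta):
    """Residual sum of squares for f(x; theta) = t1 + t2*x1 + t2^2*x2."""
    t1, t2 = theta
    return sum((y - (t1 + t2 * x1 + t2 * t2 * x2)) ** 2 for y, x1, x2 in data)

def damped_update(data, theta, delta, max_halvings=30):
    """Return theta + nu*delta with nu in (0, 1] chosen so the SSE drops."""
    base = sse(data, theta)
    nu = 1.0
    for _ in range(max_halvings):
        cand = (theta[0] + nu * delta[0], theta[1] + nu * delta[1])
        if sse(data, cand) < base:
            return cand
        nu *= 0.5            # halve the step length and try again
    return theta             # no improving step found; keep current iterate
```

Because δ̂ is a descent direction for the sum of squares, a small enough ν always improves the fit; the halving loop simply searches for such a ν.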
Example 5.5.3. To illustrate the Gauss-Newton procedure, consider the model
Y_t = θ₁ + θ₂x_{t1} + θ₂²x_{t2} + e_t ,  (5.5.38)
where the e_t are normal independent (0, σ²) random variables. While the superscript 0 was used on θ in our derivation to identify the true parameter value, it is not a common practice to use that notation when discussing a particular application of the procedure. Observations generated by this model are given in Table 5.5.1. To obtain initial estimators of θ₁ and θ₂, we ignore the nonlinear restriction and regress Y_t on x_{t0} ≡ 1, x_{t1}, and x_{t2}. This gives the regression equation
Ŷ_t = 0.877 + 1.262x_{t1} + 1.150x_{t2} .
Assuming that n^{-1} Σ x_{t1}², n^{-1} Σ x_{t2}², and n^{-1} Σ x_{t1}x_{t2} converge to form a positive definite matrix, the coefficients for x_{t1} and x_{t2} are estimators for θ₂ and θ₂², respectively, with errors O_p(n^{-1/2}). We note that (1.150)^{1/2} is also a consistent estimate of θ₂. Using 0.877 and 1.262 as initial estimators, we compute
w_t = Y_t − 0.877 − 1.262x_{t1} − 1.593x_{t2} .
The rows of F(θ̃) are given by
F_t(θ̃) = (1, x_{t1} + 2.524x_{t2}) ,  t = 1, 2, . . . , 6 .
Table 5.5.1. Data and Regression Variables Used in the Estimation of the Parameters of the Model (5.5.38)

t   Y_t  x_{t0}  x_{t1}  x_{t2}  w_t     x_{t1} + 2.524x_{t2}
1    9     1       2       4    −0.777   12.100
2   19     1       7       8    −3.464   27.199
3   11     1       1       9    −5.484   23.724
4   14     1       3       7    −1.821   20.674
5    9     1       7       0    −0.714    7.000
6    3     1       0       2    −1.065    5.050
Regressing w_t on x_{t0} and x_{t1} + 2θ̃₂x_{t2}, we obtain
δ̂ = (0.386, −0.163)'
and
θ̂ = θ̃ + δ̂ = (0.877, 1.262)' + (0.386, −0.163)' = (1.263, 1.099)'
as the one-step Gauss-Newton estimate. Our computations illustrate the nature of the derivatives that enter the covariance matrix of the asymptotic distribution. In practice, we would use one of the several nonlinear least squares programs to estimate the parameters. These programs often have the option of specifying the initial values for the iteration or permitting the program to use a search technique. In this example, we have excellent start values. If we use the start values (0.877, 1.262), the nonlinear least squares estimate is θ̂ = (1.2413, 1.0913)', the estimated covariance matrix is
V̂{θ̂} = ( 1.3756  −0.0763 ; −0.0763  0.00054 ) ,
and the residual mean square is 1.7346. We used procedure NLIN of SAS [SAS (1989)] for the computations. △△
5.6. INSTRUMENTAL VARIABLES
In many applications, estimators of the parameters of an equation of the regression type are desired, but the classical assumption that the matrix of explanatory variables is fixed is violated. Some of the columns of the matrix of explanatory variables may be measured with error and (or) may be generated by a stochastic mechanism such that the assumption that the error in the equation is independent of the explanatory variables becomes suspect.
Let us assume that we have the model
y = Φβ + Xλ + z ,  (5.6.1)
where β is a k₁ × 1 vector of unknown parameters, λ is a k₂ × 1 vector of unknown parameters, y is an n × 1 vector, Φ is an n × k₁ matrix, X is an n × k₂ matrix, and z is an n × 1 vector of unknown random variables with zero mean. The matrix Φ is fixed, but the elements of X may contain a random component that is correlated with z.
Estimators obtained by ordinary least squares may be seriously biased because of the correlation between z and X. If information is available on variables that do not enter the equation, it may be possible to use these variables to obtain consistent estimators of the parameters. Such variables are called instrumental variables. The instrumental variables must be correlated with the variables entering the matrix X but not with the random components of the model. Assume that we have observations on k₄ instrumental variables, k₄ ≥ k₂. We denote the n × k₄ matrix of
observations on the instrumental variables by ψ and assume the elements of ψ are fixed. We express X as a linear combination of Φ and ψ:
X = ΦS₁ + ψS₂ + w = (Φ : ψ)S + w ,  (5.6.2)
where S = (S₁', S₂')' is the matrix of coefficients in the regression of X on (Φ : ψ). Note that the residuals w follow from the definition of S. Therefore, w may be a sum of random and fixed components, but the fixed component is orthogonal to Φ and ψ by construction.
The instrumental variable estimators we consider are obtained by regressing X on Φ and ψ, computing the estimated values X̂ from this regression, and then replacing X by X̂ in the regression equation
y = Φβ + Xλ + z .
The instrumental variable estimators of β and λ are given by the regression of y on Φ and X̂:
(β̂', λ̂')' = [(Φ : X̂)'(Φ : X̂)]^{-1}(Φ : X̂)'y ,  (5.6.3)
where
X̂ = (Φ : ψ)[(Φ : ψ)'(Φ : ψ)]^{-1}(Φ : ψ)'X .
These estimators are called two-stage least squares estimators in the econometrics literature. An alternative instrumental variable estimator is the limited information maximum likelihood estimator. See Johnston (1984) and Fuller (1987). To investigate the properties of estimator (5.6.3), we assume:
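A minimal numerical sketch of the two-stage procedure just described may help: one fixed column (an intercept), one possibly endogenous column, and one instrument. This is our illustration, not code from the text, and the function names are our own:

```python
# Two-stage least squares: regress x on (phi, psi), form the fitted values
# x_hat, then regress y on (phi, x_hat).

def ols2(u, v, y):
    """Coefficients (a, b) minimizing sum (y - a*u - b*v)^2 via the
    2 x 2 normal equations."""
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    svv = sum(vi * vi for vi in v)
    sy_u = sum(ui * yi for ui, yi in zip(u, y))
    sy_v = sum(vi * yi for vi, yi in zip(v, y))
    det = suu * svv - suv * suv
    return ((svv * sy_u - suv * sy_v) / det, (suu * sy_v - suv * sy_u) / det)

def two_stage_ls(phi, x, psi, y):
    # Stage 1: regress x on (phi, psi) and form the fitted values x_hat.
    s1, s2 = ols2(phi, psi, x)
    x_hat = [s1 * p + s2 * q for p, q in zip(phi, psi)]
    # Stage 2: regress y on (phi, x_hat); returns (beta_hat, lambda_hat).
    return ols2(phi, x_hat, y)
```

When the random part of x is correlated with z, the second-stage regression remains consistent while the direct regression of y on (phi, x) does not, which is the point of the construction.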
1. Q_n = D₁n^{-1}(Φ : ψ)'(Φ : ψ)D₁n^{-1} is a nonsingular matrix for all n > k₁ + k₄, and lim_{n→∞} Q_n = Q, where Q is nonsingular and D₁n is a diagonal matrix whose elements are the square roots of the diagonal elements of (Φ : ψ)'(Φ : ψ).
2. M_n is nonsingular for all n > k₁ + k₂, and
lim_{n→∞} M_n = M ,
where M is a positive definite matrix,
M_n = D₂n^{-1}(Φ : x̄)'(Φ : x̄)D₂n^{-1} ,  x̄ = (Φ : ψ)S ,
and D₂n is a diagonal matrix whose elements are the square roots of the diagonal elements of (Φ : x̄)'(Φ : x̄).
3. lim_{n→∞} R_n = R, where R is finite and
R_n = E{D₂n^{-1}[Φ : x̄]'zz'[Φ : x̄]D₂n^{-1}} .
4. (a) lim_{n→∞} B_n = B, where B is finite and
B_n = E{D₁n^{-1}(Φ : ψ)'zz'(Φ : ψ)D₁n^{-1}} .
(b) lim_{n→∞} G_{nij} = G_{ij}, i, j = 1, 2, . . . , k₂, where G_{ij} is finite,
G_{nij} = E{D₁n^{-1}[Φ : ψ]'w_iw_j'[Φ : ψ]D₁n^{-1}} ,
and w_i is the ith column of the matrix w.
5. (a) lim_{n→∞} d_{jnii} = ∞, j = 1, 2, i = 1, 2, . . . , k₁ + k_{6−2j}, where d_{jnii} is the ith diagonal element of D_{jn}.
(b) lim_{n→∞} (Σ_{t=1}^{n} φ_{ti}²)^{-1}φ_{ni}² = 0 , i = 1, 2, . . . , k₁ ,
lim_{n→∞} (Σ_{t=1}^{n} ψ_{tj}²)^{-1}ψ_{nj}² = 0 , j = 1, 2, . . . , k₄ ,
where φ_{ti} is the tith element of Φ and ψ_{tj} is the tjth element of ψ.
Theorem 5.6.1. Let the model of (5.6.1) to (5.6.2) and assumptions 1 through 4 and 5(a) hold. Then
D₂n(θ̂ − θ) = M_n^{-1}D₂n^{-1}(Φ : x̄)'z + o_p(1) ,
where θ = (β', λ')',
θ̂ = [(Φ : X̂)'(Φ : X̂)]^{-1}(Φ : X̂)'y ,
and
X̂ = (Φ : ψ)Ŝ .
Proof. Define M̂_n to be the matrix M_n with x̄ replaced by X̂. By assumption 4, the variance of D₁n(Ŝ − S) is of order one, and by assumption 3, the variance of D₂n^{-1}(Φ : x̄)'z is of order one. Thus,
Similarly,
Using
and
we have
D₂n(θ̂ − θ) = M̂_n^{-1}D₂n^{-1}(Φ : X̂)'z = [M_n^{-1} + o_p(1)]D₂n^{-1}(Φ : X̂)'z + o_p(1)
 = M_n^{-1}D₂n^{-1}(Φ : x̄)'z + o_p(1) . △
Since lim_{n→∞} M_n = M, we can also write
D₂n(θ̂ − θ) = M^{-1}D₂n^{-1}(Φ : x̄)'z + o_p(1) .  (5.6.4)
In many applications n^{-1/2}D₁n and n^{-1/2}D₂n have finite nonsingular limits. In these cases n^{1/2} can be used as the normalizing factor, and the remainder in (5.6.4) is o_p(n^{-1/2}).
It follows from Theorem 5.6.1 that the sampling behavior of D₂n(θ̂ − θ) is approximately that of M_n^{-1}D₂n^{-1}(Φ : x̄)'z, which has variance M_n^{-1}R_nM_n^{-1}, where R_n was defined in assumption 3. In some situations it is reasonable to assume the elements of z are independent (0, σ_z²) random variables.
Corollary 5.6.1.1. Let the model (5.6.1) to (5.6.2) and assumptions 1 to 5 hold with the elements of z independently distributed (0, σ_z²) random variables with
E{z_t⁴} = ησ_z⁴. Then
Furthermore, a consistent estimator for σ_z² is
s_z² = (n − k₁ − k₂)^{-1} ẑ'ẑ ,  (5.6.5)
where ẑ = y − (Φ : X)θ̂.
Proof. A proof is not presented here. The normality result is a special case of Theorem 6.3.4 of Chapter 6. That s_z² is a consistent estimator of σ_z² can be demonstrated by the arguments of Theorem 9.8.3. △
Our analysis treated Φ and ψ as fixed. Theorem 5.6.1 will hold for Φ and (or) ψ random, provided the second moments exist, the probability limits analogous to the limits of assumptions 1, 2, and 5 exist, and the error terms in y and X are independent of Φ and ψ.
A discussion of instrumental variables particularly applicable when k₄ is larger than k₂ is given in Sargan (1958). A model where the method of instrumental variables is appropriate will be discussed in Chapter 9.
Example 5.6.1. To illustrate the method of instrumental variables, we use some data collected in an animal feeding experiment. Twenty-four lots of pigs were fed six different rations characterized by the percentage of protein in the ration. The remainder of the ration was primarily carbohydrate from corn. We simplify by calling this remainder corn. Twelve of the lots were weighed after two weeks, and twelve after four weeks. The logarithms of the gain and of the feed consumed are given in Table 5.6.1. We consider the model
G_i = β₀ + β₁P_i + β₂C_i + Z_i ,  i = 1, 2, . . . , 24,
where G_i is the logarithm of gain, C_i is the logarithm of corn consumed, P_i is the logarithm of protein consumed, and Z_i is the random error for the ith lot.
It is clear that neither corn nor protein, but instead their ratio, is fixed by the experimental design. The observations on corn and protein for a particular ration are constrained to lie on a ray through the origin with slope corresponding to the ratio of the percentages of the two items in the ration. The logarithms of these observations will lie on parallel lines. Since C_i − P_i is fixed, we rewrite the model as
G_i = β₀ + (β₁ + β₂)P_i + β₂(C_i − P_i) + Z_i .
In terms of the notation of (5.6.1), G_i = y_i and P_i = X_i. Candidates for ψ are functions of C_i − P_i and of time on feed. As one simple model for the protein
Table 5.6.1. Gain and Feed Consumed by 24 Lots of Pigs

Lot  Time on Feed  Log Gain  Log Corn  Log Protein
i    (Weeks)       G         C         P         P̂       ẑ
1    2    4.477   5.366   3.465   3.601   −0.008
2    2    4.564   5.488   3.587   3.601   −0.042
3    2    4.673   5.462   3.647   3.682    0.035
4    2    4.736   5.598   3.783   3.682   −0.036
5    2    4.718   5.521   3.787   3.757   −0.033
6    2    4.868   5.580   3.846   3.757    0.059
7    2    4.754   5.516   3.858   3.828   −0.043
8    2    4.844   5.556   3.898   3.828    0.007
9    2    4.836   5.470   3.884   3.895    0.035
10   2    4.828   5.463   3.877   3.895    0.034
11   2    4.745   5.392   3.876   3.961   −0.026
12   2    4.852   5.457   3.941   3.961    0.017
13   4    5.384   6.300   4.399   4.453   −0.021
14   4    5.493   6.386   4.485   4.453    0.003
15   4    5.513   6.350   4.535   4.537    0.001
16   4    5.583   6.380   4.565   4.537    0.041
17   4    5.545   6.314   4.580   4.616    0.013
18   4    5.613   6.368   4.634   4.616    0.028
19   4    5.687   6.391   4.733   4.690    0.028
20   4    5.591   6.356   4.698   4.690   −0.033
21   4    5.591   6.288   4.702   4.760   −0.015
22   4    5.700   6.368   4.782   4.760    0.015
23   4    5.700   6.355   4.839   4.828   −0.019
24   4    5.656   6.332   4.816   4.828   −0.041
Source: Data courtesy of Research Department, Moorman Manufacturing Company. The data are a portion of a larger experiment conducted by the Moorman Manufacturing Company in 1974.
consumption we suggest
P_i = δ₀ + δ₁(C_i − P_i) + δ₂t_i + δ₃(t_i − 3)(C_i − P_i) + W_i ,
where t_i is the time on feed of the ith lot. The ordinary least squares estimate of the equation is
P̂_i = 4.45 − 0.95(C_i − P_i) + 0.46t_i − 0.02(t_i − 3)(C_i − P_i) ,
      (0.48)  (0.09)           (0.15)   (0.09)
where the numbers in parentheses are the estimated standard errors of the regression coefficients. If the W_i are independent (0, σ²) random variables, the usual regression assumptions are satisfied. The interaction term contributes very little to the regression, but the time coefficient is highly significant. This supports
assumption 2 because it suggests that the partial correlation between P_i and t_i after adjusting for C_i − P_i is not zero. This, in turn, implies that the matrix M_n is nonsingular. The P̂ values for this regression are given in Table 5.6.1. Regressing G_i on P̂_i and C_i − P_i, we obtain
Ĝ_i = 0.49 + 0.98P̂_i + 0.31(C_i − P_i) .
In this problem it is reasonable to treat the Z_i's as independent random
variables. We also assume that they have common variance. The estimated residuals are shown in the last column of Table 5.6.1. These must be computed directly as
ẑ_i = G_i − 0.49 − 0.98P_i − 0.31(C_i − P_i) .
The residuals obtained in the second round regression computations are G_i − 0.49 − 0.98P̂_i − 0.31(C_i − P_i) and are inappropriate for the construction of variance estimates. From the ẑ_i's we obtain
s_z² = (21)^{-1} Σ_{i=1}^{24} ẑ_i² = 0.0010 .
The inverse of the matrix used in computing the estimates is
( 14.73  −1.32  −5.37
  −1.32   0.23   0.21
  −5.37   0.21   2.62 ) ,
and it follows that the estimated standard errors of the estimates are 0.121, 0.015, and 0.051, respectively. △△
5.7. ESTIMATED GENERALIZED LEAST SQUARES
In this section, we investigate estimation for models in which the covariance matrix of the error is estimated. Our treatment is restricted to linear models, but many of the results extend to nonlinear models.
Consider the linear model
Y = Xβ + u ,  (5.7.1)
where Y is an n × 1 vector, X is an n × k matrix, and β is the k × 1 vector of unknown parameters. We assume
E{(u, uu') | X} = (0, V_uu) ,  (5.7.2)
where V_uu is positive definite. In many of our applications, u' = (u₁, u₂, . . . , u_n) will be a portion of a realization of a time series. For example, the time series may be a pth order autoregressive process.
For known V_uu, the generalized least squares estimator of β is
β̃ = (X'V_uu^{-1}X)^{-1}X'V_uu^{-1}Y ,  (5.7.3)
where we have assumed V_uu and X'V_uu^{-1}X to be nonsingular. The conditional variance of the estimator is
V{β̃ | X} = (X'V_uu^{-1}X)^{-1} .  (5.7.4)
Often V_uu is not known. We are interested in the situation where an estimator of V_uu, denoted by V̂_uu, is used to construct an estimator of β. We define the estimated generalized least squares estimator by
β̂ = (X'V̂_uu^{-1}X)^{-1}X'V̂_uu^{-1}Y ,  (5.7.5)
where we assume V̂_uu and X'V̂_uu^{-1}X are nonsingular. The estimated generalized least squares estimator of β is consistent for β under mild conditions. In the theorem, we use a general normalizer matrix, M_n'. A natural choice for M_n' is G_n^{1/2} = (X'V_uu^{-1}X)^{1/2}.
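Before stating the theorem, a minimal numerical sketch of (5.7.5) may help. The simplest estimable V_uu is diagonal, in which case estimated generalized least squares is weighted least squares with estimated weights. The single-regressor version and the two-group variance model below are our illustration, not the book's:

```python
# Estimated GLS for one regressor and diagonal V_uu:
# beta_hat = (x' V^{-1} x)^{-1} x' V^{-1} Y.

def egls_single_regressor(x, y, v_hat):
    """v_hat: estimated error variances (the diagonal of V_uu-hat)."""
    num = sum(xi * yi / vi for xi, yi, vi in zip(x, y, v_hat))
    den = sum(xi * xi / vi for xi, vi in zip(x, v_hat))
    return num / den

def estimate_two_group_variances(resid, group):
    """Estimate a variance per group label from preliminary OLS residuals,
    and return the estimated variance for each observation."""
    out = {}
    for g in set(group):
        rs = [r for r, gi in zip(resid, group) if gi == g]
        out[g] = sum(r * r for r in rs) / len(rs)
    return [out[g] for g in group]
```

The two-step recipe is: fit ordinary least squares, estimate the variances from the residuals, then reweight. The theorem that follows is precisely the statement that, under its conditions, this two-step estimator behaves asymptotically like the estimator computed with known V_uu.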
Theorem 5.7.1. Let the model (5.7.1) and (5.7.2) hold. Assume there exists a sequence of estimators V̂_uu and a sequence of nonsingular matrices {M_n} such that
M_n^{-1}X'V_uu^{-1}XM_n^{-1'} = O_p(1) and [M_n^{-1}X'V_uu^{-1}XM_n^{-1'}]^{-1} = O_p(1) ,  (5.7.6)
M_n^{-1}X'V_uu^{-1}u = O_p(1) ,  (5.7.7)
M_n^{-1}X'(V̂_uu^{-1} − V_uu^{-1})XM_n^{-1'} = O_p(ξ_n) ,  (5.7.8)
M_n^{-1}X'(V̂_uu^{-1} − V_uu^{-1})u = O_p(ξ_n) ,  (5.7.9)
where ξ_n → 0 as n → ∞. Then
M_n'(β̂ − β̃) = O_p(ξ_n) .
Proof. By the assumption (5.7.8),
M_n^{-1}(Ĝ_n − G_n)M_n^{-1'} = O_p(ξ_n) ,  (5.7.10)
where
(G_n, Ĝ_n) = (X'V_uu^{-1}X, X'V̂_uu^{-1}X) .  (5.7.11)
Therefore
M_n'(β̂ − β̃) = M_n'(Ĝ_n^{-1} − G_n^{-1})M_nM_n^{-1}X'V_uu^{-1}u
 + M_n'Ĝ_n^{-1}M_nM_n^{-1}X'(V̂_uu^{-1} − V_uu^{-1})u = O_p(ξ_n) . △
In working with models such as (5.7.1)-(5.7.2), it is often assumed that
M_n^{-1}X'V_uu^{-1}XM_n^{-1'} →P A₀ ,  (5.7.12)
where A₀ is a fixed positive definite matrix. This assumption is sufficient for the condition (5.7.6). If there exists a sequence {M_n} that satisfies the assumptions of Theorem 5.7.1, then any nonsingular H_n such that G_n = H_nH_n' also will satisfy the assumptions because (5.7.6) implies that H_n^{-1}M_n = O_p(1).
Theorem 5.7.1 is given for a general normalizing matrix M_n. If M_n is chosen to be G_n^{1/2}, where G_n is defined in (5.7.11), then (5.7.6) and (5.7.7) follow directly.
Under the assumptions of Theorem 5.7.1, the limiting distribution of the normalized estimated generalized least squares estimator is the same as the limiting distribution of the normalized generalized least squares estimator constructed with known V_uu, provided the limiting distribution exists. Note that the estimator V̂_uu can converge to V_uu rather slowly.
Corollary 5.7.1.1. Let the assumptions (5.7.6)-(5.7.9) of Theorem 5.7.1 hold. In addition assume
M_n^{-1}G_nM_n^{-1'} →P A₀ ,  (5.7.13)
where A₀ is a fixed positive definite matrix, and
M_n'(β̃ − β) →L N(0, A₀^{-1}) .  (5.7.14)
Then
M_n'(β̂ − β) →L N(0, A₀^{-1})  (5.7.15)
and
(β̂ − β)'Ĝ_n(β̂ − β) →L χ²(k) ,  (5.7.16)
where Ĝ_n is defined in (5.7.11) and χ²(k) is a chi-square random variable with k degrees of freedom.
Proof. The assumption (5.7.13) implies the assumption (5.7.6) of Theorem 5.7.1. Hence, (5.7.15) follows from the variance of the generalized least squares
estimator and (5.7.14). Also, (5.7.16) follows from
where we have used (5.7.13), (5.7.4), and the assumption (5.7.8). △
The conditions (5.7.6) and (5.7.7) of Theorem 5.7.1 hold if M_n' is chosen equal to G_n^{1/2}. However, in practice, one wishes a matrix M_n' that is a function of the data. In some situations, there is a known transformation such that the estimated parameters of the transformed model can be normalized with a diagonal matrix. The transformation can be a function of n. In Corollary 5.7.1.2, we demonstrate how the estimator in such a case can be normalized using sample statistics.
Corollary 5.7.1.2. Let the assumptions of Theorem 5.7.1 hold with M_n = D_n^{1/2}, where D_n = diag{X'V_uu^{-1}X}. Also assume (5.7.14) holds with M_n = D_n^{1/2}. Then
D̂_n^{1/2}(β̂ − β) →L N(0, A₀^{-1}) ,
where D̂_n = diag{X'V̂_uu^{-1}X}.
Proof. By assumption (5.7.8) of Theorem 5.7.1,
Given the conditions of Theorem 5.7.1 and Corollary 5.7.1.1, the estimated variance of the estimated generalized least squares estimator can be used to construct pivotal statistics. We state the result in Corollary 5.7.1.3.
Corollary 5.7.1.3. Let the assumptions of Theorem 5.7.1 and Corollary 5.7.1.1 hold, where {M_n} is a sequence of fixed matrices. Then
σ̂_{nλ}^{-1}λ_n'(β̂ − β) →L N(0, 1) ,
where σ̂_{nλ}² = λ_n'Ĝ_n^{-1}λ_n, Ĝ_n is defined in (5.7.11), and λ_n is a fixed nonzero vector that can depend on n.
Proof. Under the assumptions, we show that
σ_{nλ}^{-1}λ_n'(β̂ − β) →L N(0, 1) ,
where σ_{nλ}² = λ_n'G_n^{-1}λ_n.
By Corollary 5.7.1.1, we have
By Skorohod's theorem [see Theorem 2.9.6 of Billingsley (1979)], there exist random vectors Z_n and Z on a common probability space such that Z_n has the same distribution as L_n = M_n'(β̂ − β), Z ~ N(0, A₀^{-1}), and lim Z_n = Z a.s. Therefore,
σ_{nλ}^{-1}λ_n'M_n^{-1'}Z_n = σ_{nλ}^{-1}λ_n'M_n^{-1'}Z + σ_{nλ}^{-1}λ_n'M_n^{-1'}(Z_n − Z) .
Since σ_{nλ}^{-2}λ_n'M_n^{-1'}A₀^{-1}M_n^{-1}λ_n = 1, we have
and
where γ_max(A₀) is the maximum eigenvalue of A₀. Therefore, σ_{nλ}^{-1}λ_n'M_n^{-1'}Z_n, and hence σ_{nλ}^{-1}λ_n'M_n^{-1'}L_n, converges in distribution to the standard normal distribution. See Sarkar (1992, Section 7.1.3).
From the assumption (5.7.8) of Theorem 5.7.1 and the assumption (5.7.13) of Corollary 5.7.1.1,
M_n^{-1}Ĝ_nM_n^{-1'} = M_n^{-1}G_nM_n^{-1'} + O_p(ξ_n) = A₀ + o_p(1) ,
Therefore σ̂_{nλ}^{-1}σ_{nλ} converges to one in probability, and the result follows. △
Under the conditions of Theorem 5.7.1, the asymptotic distribution of the estimated generalized least squares estimator is the same as that of the generalized least squares estimator, provided the limiting distribution of the generalized least squares estimator exists. The following theorem gives conditions such that the generalized least squares estimator is asymptotically normal. To obtain a limiting normal distribution, the transformed errors must satisfy a central limit theorem. In Theorem 5.7.2, we assume the transformed errors to be independent. For example, if {u_t} is an autoregressive process with independent increments, then the assumption is satisfied.
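For the first order autoregressive case just mentioned, such a transformation T_n can be written down explicitly. The following sketch is our construction, assuming unit error variance, so that V_uu has ijth element ρ^{|i−j|}; it builds T_n and checks numerically that T_nV_uuT_n' is the identity, which is what makes the transformed errors e = T_nu uncorrelated with variance one:

```python
# Whitening transformation for AR(1) errors u_t = rho*u_{t-1} + eps_t:
# the first row keeps u_1, later rows quasi-difference (u_t - rho*u_{t-1}),
# each scaled to give the transformed error variance one.

def ar1_whitener(rho, n):
    s = (1.0 - rho * rho) ** 0.5
    T = [[0.0] * n for _ in range(n)]
    T[0][0] = 1.0
    for t in range(1, n):
        T[t][t - 1] = -rho / s
        T[t][t] = 1.0 / s
    return T

def ar1_cov(rho, n):
    """Covariance matrix of a unit-variance AR(1): V_ij = rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]
```

Since T_n'T_n = V_uu^{-1}, applying T_n to Y and X and running ordinary least squares on the transformed data reproduces the generalized least squares estimator (5.7.3).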
Theorem 5.7.2. Assume the model (5.7.1) and (5.7.2) with fixed X. Assume there exists a sequence of fixed, nonsingular matrices \{M_n\} such that

    \lim_{n\to\infty} M_n^{-1} X' V_{uu}^{-1} X M_n^{-1\prime} = \bar{A},    (5.7.19)

where \bar{A} is positive definite. Assume there exists a sequence of nonsingular transformations \{T_n\} such that the elements of e = T_n u are independent with zero expectation, variance one, and

    E\{|e_{tn}|^{2+\delta}\} < K

for some \delta > 0 and finite K. Also assume that

    \lim_{n\to\infty} \sup_{1 \le j \le n,\ 1 \le i \le k} |(M_n^{-1} X' T_n')_{ij}| = 0.    (5.7.20)

Then

    M_n'(\tilde{\beta} - \beta) \xrightarrow{\mathcal{L}} N(0, \bar{A}^{-1}),

where \tilde{\beta} is the generalized least squares estimator defined in (5.7.3).
Proof. Consider the linear combination

    \lambda' M_n^{-1} X' T_n' e = \sum_{t=1}^{n} c_{tn} e_{tn},

where

    c_{tn} = \lambda' M_n^{-1} X' T_{n,t}',

\lambda is an arbitrary k-dimensional vector with 0 < |\lambda| < \infty, and T_{n,t} is the tth row of T_n. By construction, the variance of \sum_{t=1}^{n} c_{tn} e_{tn}, denoted by V_n, is

    V_n = \sum_{t=1}^{n} c_{tn}^2.

By the assumption (5.7.19),

    \lim_{n\to\infty} \sum_{t=1}^{n} c_{tn}^2 = \lim_{n\to\infty} \lambda' M_n^{-1} X' T_n' T_n X M_n^{-1\prime} \lambda = \lambda'\bar{A}\lambda > 0.    (5.7.21)

Hence, V_n^{-1} is bounded. Let h_{ij} be the ijth element of M_n^{-1} X' T_n'. Then

    c_{tn}^2 \le |\lambda|^2 \sum_{i=1}^{k} h_{it}^2,

which implies that

    \sup_{1 \le t \le n} c_{tn}^2 \le k|\lambda|^2 \sup_{i,t} h_{it}^2.

By the assumption (5.7.20) and (5.7.21),

    \lim_{n\to\infty} V_n^{-1} \sup_{1 \le t \le n} c_{tn}^2 = 0.

Hence, by Corollary 5.3.4,

    V_n^{-1/2} \sum_{t=1}^{n} c_{tn} e_{tn} \xrightarrow{\mathcal{L}} N(0, 1).

Since \lambda is arbitrary, by the assumption (5.7.19) we have

    M_n^{-1} X' V_{uu}^{-1} u \xrightarrow{\mathcal{L}} N(0, \bar{A}),

and we obtain the conclusion. ∎
To this point, no structure has been imposed on V_{uu} other than that it must be positive definite. The number of unknown parameters in V_{uu} could conceivably grow with the sample size n. A model of practical importance is

    E\{u \mid X\} = 0,
    E\{uu' \mid X\} = V_{uu} = V_{uu}(\theta^0),    (5.7.22)

where \theta is an l \times 1 vector of unknown parameters, l is fixed, and \theta^0 is the true value. The parameter space for \theta is \Theta. It is assumed that the form of the function V_{uu}(\theta) is known and that V_{uu}(\theta) is a continuous function of \theta. If, for example, the time series is known to be a pth order autoregressive process, the vector \theta will contain the parameters of the autoregressive process. We are interested in the situation in which an estimator of \theta^0, denoted by \hat{\theta}, is used to construct an estimator of V_{uu}(\theta^0), denoted by \hat{V}_{uu} = V_{uu}(\hat{\theta}).

The following theorem gives sufficient conditions for the estimated generalized least squares estimator to have the same asymptotic distribution as the generalized least squares estimator for general normalizing matrices under the model (5.7.1) and (5.7.22).
Theorem 5.7.3. Let the model (5.7.1) and (5.7.22) hold. Let \theta = (\theta_1, \theta_2, \ldots, \theta_l)' and let

    B_{ni}(\theta) = \partial V_{uu}^{-1}(\theta)/\partial \theta_i

be continuous in \theta. Let \tilde{\beta} be defined by (5.7.5). Assume there exists a sequence of nonsingular matrices \{M_n\} such that

    M_n' G_n^{-1} M_n = O(1),    (5.7.23)

    M_n^{-1} X' V_{uu}^{-1}(\theta^0) u = O_p(1),    (5.7.24)

where G_n = X' V_{uu}^{-1}(\theta^0) X. Also assume

    M_n^{-1} X' B_{ni}(\theta) X M_n^{-1\prime} = O_p(1),    (5.7.25)

    M_n^{-1} X' B_{ni}(\theta) u = O_p(1),    (5.7.26)

for i = 1, 2, \ldots, l, uniformly in an open neighborhood of \theta^0, denoted by C(\theta^0). Let an estimator of \theta^0, denoted by \hat{\theta}, be available, and assume

    \hat{\theta} - \theta^0 = O_p(h_n),    (5.7.27)

where h_n \to 0 as n \to \infty. Then the conclusion of Theorem 5.7.1 holds.
Proof. By a Taylor expansion,

    M_n^{-1} X' V_{uu}^{-1}(\hat{\theta}) X M_n^{-1\prime} = M_n^{-1} X' V_{uu}^{-1}(\theta^0) X M_n^{-1\prime} + \sum_{i=1}^{l} M_n^{-1} X' B_{ni}^*(\theta^0, \hat{\theta}) X M_n^{-1\prime}(\hat{\theta}_i - \theta_i^0),

where \theta^0 is the true value of \theta and B_{ni}^*(\theta^0, \hat{\theta}) is a matrix whose js element is the js element of B_{ni}(\theta) evaluated at a point on the line segment joining \hat{\theta} and \theta^0. Let the probability space be (\Omega, \mathcal{A}, P). Let \epsilon > 0 be given. Choose \delta such that \{\theta: |\theta^0 - \theta| < \delta\} \subset C(\theta^0), where for any vector v, |v|^2 = v'v. Since \hat{\theta} converges in probability to \theta^0, there exists an N_1 and a set D_{1,n} \in \mathcal{A} such that P(D_{1,n}) > 1 - \epsilon/2 and the evaluation points of the js elements of B_{ni}^*(\theta^0, \hat{\theta}) lie in C(\theta^0) for j = 1, 2, \ldots, n, for s = 1, 2, \ldots, n, for all n > N_1, and for all \omega \in D_{1,n}. By the assumption (5.7.25), there exists a K, an N_2, and a set D_{2,n} \in \mathcal{A} with P(D_{2,n}) > 1 - \epsilon/2 such that

    \|M_n^{-1} X' B_{ni}(\theta) X M_n^{-1\prime}\| < K

for all n \ge N_2, all \omega \in D_{2,n}, and all \theta \in C(\theta^0), where \|H\| = [\operatorname{tr}(H'H)]^{1/2}. Let D_n = D_{1,n} \cap D_{2,n}, and observe that P(D_n) > 1 - \epsilon. Let N = \max(N_1, N_2). Therefore, for all n > N and for all \omega \in D_n, the second term of the expansion is bounded in norm by K\sum_{i=1}^{l} |\hat{\theta}_i - \theta_i^0|. Hence,

    M_n^{-1} X'[V_{uu}^{-1}(\hat{\theta}) - V_{uu}^{-1}(\theta^0)] X M_n^{-1\prime} = o_p(1),

which implies the assumption (5.7.8) of Theorem 5.7.1. By a similar expansion,

    M_n^{-1} X' V_{uu}^{-1}(\hat{\theta}) u = M_n^{-1} X' V_{uu}^{-1}(\theta^0) u + \sum_{i=1}^{l} M_n^{-1} X' B_{ni}^*(\theta^0, \hat{\theta}) u (\hat{\theta}_i - \theta_i^0),

where B_{ni}^*(\theta^0, \hat{\theta}) is analogous to the matrix of the first expansion. Using the assumption (5.7.26) and an argument similar to the preceding one,

    M_n^{-1} X' B_{ni}^*(\theta^0, \hat{\theta}) u = O_p(1).

Thus the assumption (5.7.9) of Theorem 5.7.1 is satisfied. Because the assumptions (5.7.23) and (5.7.24) are the same as the assumptions (5.7.6) and (5.7.7) of Theorem 5.7.1, the result follows. ∎
A sufficient condition for (5.7.23) is

    M_n^{-1} G_n M_n^{-1\prime} \to \bar{A},    (5.7.29)

where \bar{A} is a fixed positive definite matrix, and a sufficient condition for (5.7.25) is

    M_n^{-1} X' B_{ni}(\theta) X M_n^{-1\prime} \to A_i(\theta),  i = 1, 2, \ldots, l,

uniformly in a neighborhood of \theta^0 as n \to \infty, where the A_i(\theta) are continuous in \theta. Under the assumptions of Theorem 5.7.3, the limiting distribution of the normalized estimated generalized least squares estimator is the same as the limiting distribution of the normalized generalized least squares estimator constructed with known V_{uu}, provided the limiting distribution exists. Note that the estimator of \theta^0 can converge to \theta^0 at a relatively slow rate.
A common procedure is to estimate \beta by the ordinary least squares estimator,

    \hat{\beta} = (X'X)^{-1}X'Y,    (5.7.30)

compute residuals

    \hat{u} = Y - X\hat{\beta},    (5.7.31)

and use these residuals to estimate \theta. Then the estimator of \theta is used in (5.7.5) to estimate \beta. In order for this procedure to be effective, the estimator of \theta based on \hat{u} must be a consistent estimator of \theta^0, and this, in turn, requires \hat{\beta} of (5.7.30) to be a consistent estimator of \beta. The following lemma gives sufficient conditions for the estimator of \theta^0 based on the ordinary least squares residuals to be consistent.
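The two-step procedure of (5.7.30)–(5.7.31) can be sketched numerically. In this illustration (the data-generating values, the AR(1) error structure, and the first order autocorrelation estimator of \theta are illustrative choices, not from the text), \theta is estimated from the OLS residuals and the estimated generalized least squares step is carried out by differencing with \hat{\theta}:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
beta_true, theta_true = np.array([1.0, 2.0]), 0.7

# AR(1) errors u_t = theta*u_{t-1} + e_t (illustrative covariance model).
e = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = theta_true * u[t - 1] + e[t]
Y = X @ beta_true + u

# Step 1: OLS (5.7.30) and residuals (5.7.31).
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
uhat = Y - X @ beta_ols

# Step 2: estimate theta from the residuals (first order autocorrelation).
theta_hat = (uhat[1:] @ uhat[:-1]) / (uhat[:-1] @ uhat[:-1])

# Step 3: estimated GLS with V_uu(theta_hat), via quasi-differencing
# (avoids forming the n x n inverse; the first observation is dropped).
Xs = X[1:] - theta_hat * X[:-1]
Ys = Y[1:] - theta_hat * Y[:-1]
beta_egls = np.linalg.lstsq(Xs, Ys, rcond=None)[0]
```

With the seeded data, theta_hat is close to 0.7 and beta_egls is close to the true coefficient vector, illustrating the consistency discussed above.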
Lemma 5.7.1. Consider the model (5.7.1). Let \hat{\theta} = \hat{\theta}(u) be an estimator of \theta^0 based upon the true u, and let

    \bar{u} = [I - cX(X'X)^{-1}X']u,

where 0 \le c \le 1. Assume that \hat{\theta}(u) is a continuous function of u with a continuous first derivative, and that

    \hat{\theta}(u) - \theta^0 = O_p(\xi_n),    (5.7.32)

where \xi_n \to 0 as n \to \infty. Also assume

    \sum_{t=1}^{n} \frac{\partial \hat{\theta}_j}{\partial u_t}(\bar{u})\, X_t(\hat{\beta} - \beta) = o_p(\xi_n),    (5.7.33)

for j = 1, 2, \ldots, l, uniformly in c for 0 \le c \le 1, where \hat{\beta} is defined in (5.7.30). Let \hat{\theta}(\hat{u}) be an estimator of \theta^0 constructed with the residuals defined in (5.7.31). Then

    \hat{\theta}(\hat{u}) = \hat{\theta}(u) + o_p(\xi_n) = \theta^0 + O_p(\xi_n).

Proof. By a Taylor expansion about u, we have

    \hat{\theta}_j(\hat{u}) = \hat{\theta}_j(u) + \sum_{t=1}^{n} \frac{\partial \hat{\theta}_j}{\partial u_t}(\bar{u})(\hat{u}_t - u_t),    (5.7.34)

where \bar{u} is on the line segment joining u and \hat{u}, X_t is the tth row of X, and \hat{u}_t - u_t = -X_t(\hat{\beta} - \beta). Because \bar{u} is of the form given in the assumptions, it follows from (5.7.33) that

    \hat{\theta}(\hat{u}) = \hat{\theta}(u) + o_p(\xi_n) = \theta^0 + O_p(\xi_n). ∎
Given that \hat{\theta} is a consistent estimator of \theta^0, the difference between the estimator of \beta based upon \hat{\theta} and the estimator of \beta based on \theta^0 converges to zero.
Theorem 5.7.4. Let the assumptions (5.7.23)–(5.7.26) of Theorem 5.7.3 hold. In addition, assume (5.7.32),

    M_n'(X'X)^{-1}M_n = O(1),    (5.7.35)

    E\{M_n^{-1} X' V_{uu}(\theta^0) X M_n^{-1\prime}\} = O(1),    (5.7.36)

and

    \sum_{t=1}^{n} \frac{\partial \hat{\theta}_j}{\partial u_t}(\bar{u})\, X_t M_n^{-1\prime} = o_p(\xi_n),    (5.7.37)

uniformly in c for 0 \le c \le 1 and j = 1, 2, \ldots, l, where X_t is the tth row of X and \bar{u} is defined in Lemma 5.7.1. Then

    M_n'[\tilde{\beta}(\hat{\theta}(\hat{u})) - \beta] = M_n'[\tilde{\beta}(\theta^0) - \beta] + o_p(1).

If

    M_n'[\tilde{\beta}(\theta^0) - \beta] \xrightarrow{\mathcal{L}} N(0, \bar{A}^{-1}),    (5.7.38)

then

    M_n'[\tilde{\beta}(\hat{\theta}(\hat{u})) - \beta] \xrightarrow{\mathcal{L}} N(0, \bar{A}^{-1}).    (5.7.39)
Proof. Under the assumptions (5.7.36) and (5.7.22),

    E\{M_n^{-1} X' uu' X M_n^{-1\prime}\} = E\{M_n^{-1} X' V_{uu}(\theta^0) X M_n^{-1\prime}\} = O(1),

which implies that M_n^{-1} X' u = O_p(1) and

    M_n'(\hat{\beta} - \beta) = M_n'(X'X)^{-1} M_n M_n^{-1} X' u = O_p(1).    (5.7.40)

By (5.7.40) and the assumption (5.7.37),

    \sum_{t=1}^{n} \frac{\partial \hat{\theta}_j}{\partial u_t}(\bar{u})\, X_t(\hat{\beta} - \beta) = o_p(\xi_n).    (5.7.41)

By (5.7.41), the assumption (5.7.32), and Lemma 5.7.1, we have \hat{\theta}(\hat{u}) = \theta^0 + O_p(\xi_n), and the assumption (5.7.27) of Theorem 5.7.3 holds. Also, the other assumptions of Theorem 5.7.3 hold for \hat{\theta}(\hat{u}). Therefore,

    M_n'[\tilde{\beta}(\hat{\theta}(\hat{u})) - \beta] = M_n'[\tilde{\beta}(\theta^0) - \beta] + o_p(1),

and (5.7.39) follows from the assumption (5.7.38). ∎
Under the conditions of Theorem 5.7.4, the procedure of using the ordinary least squares residuals to estimate the covariance matrix, and then using the estimated covariance matrix in the estimated generalized least squares estimator, produces an estimator of \beta with the same large sample properties as the estimator constructed with known \theta^0. Also,

    \hat{V}\{\tilde{\beta}\} = [X' V_{uu}^{-1}(\hat{\theta}(\hat{u})) X]^{-1}

can be used as an estimator of the covariance matrix of the approximate distribution of \tilde{\beta}.
5.8. SEQUENCES OF ROOTS OF POLYNOMIALS
In Chapter 2, we saw that the roots of the characteristic polynomial were important in defining the behavior of autoregressive and autoregressive moving average time series. Hence, the relationships between the coefficients of a polynomial and the roots are of interest. We write the pth order polynomial as

    g(m) = m^p + a_1 m^{p-1} + \cdots + a_p = (m - m_1)(m - m_2)\cdots(m - m_p),    (5.8.1)

where the m_i, i = 1, 2, \ldots, p, are the roots of g(m) = 0. Let \bar{m}_1, \bar{m}_2, \ldots, \bar{m}_q denote the distinct roots with multiplicities r_1, r_2, \ldots, r_q, where \sum_{i=1}^{q} r_i = p. If r_i = 1, we say that \bar{m}_i is a simple root. The coefficients can be expressed in terms of the roots by
    a_1 = -\sum_{i=1}^{p} m_i,
    a_2 = \sum_{i=1}^{p-1}\sum_{j=i+1}^{p} m_i m_j,
    a_3 = -\sum_{i=1}^{p-2}\sum_{j=i+1}^{p-1}\sum_{k=j+1}^{p} m_i m_j m_k,    (5.8.2)
    \vdots
    a_p = (-1)^p \prod_{i=1}^{p} m_i.
Thus, the coefficients are continuously differentiable functions of the roots. The functions (5.8.2) define a mapping from a p-dimensional space to a p-dimensional space. The differentials are
    da_1 = -\sum_{j=1}^{p} dm_j,
    da_2 = -a_1 \sum_{j=1}^{p} dm_j - \sum_{j=1}^{p} m_j\, dm_j,
    da_3 = -a_2 \sum_{j=1}^{p} dm_j - a_1 \sum_{j=1}^{p} m_j\, dm_j - \sum_{j=1}^{p} m_j^2\, dm_j,    (5.8.3)
    \vdots
    da_p = -a_{p-1} \sum_{j=1}^{p} dm_j - a_{p-2} \sum_{j=1}^{p} m_j\, dm_j - \cdots - \sum_{j=1}^{p} m_j^{p-1}\, dm_j.
Thus

    da = BH\, dm,    (5.8.4)

where da = (da_1, da_2, \ldots, da_p)', dm = (dm_1, dm_2, \ldots, dm_p)',

    B = \begin{pmatrix}
      0 & 0 & \cdots & 0 & -1 \\
      0 & 0 & \cdots & -1 & -a_1 \\
      \vdots & & & & \vdots \\
      0 & -1 & \cdots & -a_{p-3} & -a_{p-2} \\
      -1 & -a_1 & \cdots & -a_{p-2} & -a_{p-1}
    \end{pmatrix},    (5.8.5)

and

    H = \begin{pmatrix}
      m_1^{p-1} & m_2^{p-1} & \cdots & m_p^{p-1} \\
      m_1^{p-2} & m_2^{p-2} & \cdots & m_p^{p-2} \\
      \vdots & & & \vdots \\
      1 & 1 & \cdots & 1
    \end{pmatrix}.    (5.8.6)
The matrix H is nonsingular if all roots are simple. Because B is nonsingular, it is possible to solve for the differentials of the m_j as functions of the differentials of the a_j when all roots are simple.
If the coefficients of the polynomial are real, the complex roots occur in complex conjugate pairs. In that case, we can use a transformation to define real quantities

    c_1 = m_1 + m_2,    (5.8.7)
    c_2 = (m_1 - m_2)^2,    (5.8.8)
for every complex conjugate pair. The inverse transformation is

    m_1 = 0.5(c_1 + c_2^{1/2}),
    m_2 = 0.5(c_1 - c_2^{1/2}),    (5.8.9)

where we adopt the convention that m_1 is defined with the positive coefficient on the radical and m_2 is defined with the negative coefficient. The inverse transformation has continuous derivatives except at c_2 = 0, the point where m_1 = m_2. The derivatives of the coefficients with respect to c_1 and with respect to c_2 are derivatives of real valued functions with respect to real valued arguments. From (5.8.6) we see that repeated roots will produce a singularity in the Jacobian of the transformation. However, even if there are repeated roots, the roots of a polynomial are continuous functions of the coefficients of the polynomial.
Lemma 5.8.1. Let g(m) of (5.8.1) have distinct roots \bar{m}_1, \bar{m}_2, \ldots, \bar{m}_q with multiplicities r_1, r_2, \ldots, r_q, where a_1, a_2, \ldots, a_p are complex numbers and \sum_{j=1}^{q} r_j = p. Let

    G(m) = m^p + (a_1 + \delta_1)m^{p-1} + \cdots + (a_p + \delta_p),

where \delta_1, \ldots, \delta_p are complex numbers. For i = 1, 2, \ldots, q, let \epsilon_i be an arbitrary real number satisfying

    0 < \epsilon_i < \min_{j \ne i} |\bar{m}_j - \bar{m}_i|.

Then there exists a \delta > 0 such that, if |\delta_i| < \delta for i = 1, 2, \ldots, p, then G(m) has precisely r_i roots in the circle \phi_i with center at \bar{m}_i and radius \epsilon_i, for i = 1, 2, \ldots, q.

Proof. Omitted. See Marden (1966, p. 3). ∎
To extend the results of Lemma 5.8.1 to sequences of polynomials, let

    g_n(m) = m^p + a_{1n}m^{p-1} + \cdots + a_{pn},    (5.8.10)

where a_{in}, i = 1, 2, \ldots, p, are sequences of real numbers such that a_{in} \to a_i, i = 1, 2, \ldots, p, as n \to \infty, where the a_i are the coefficients of (5.8.1). Let \bar{m}_1, \bar{m}_2, \ldots, \bar{m}_q be the distinct roots of (5.8.1) with multiplicities r_1, r_2, \ldots, r_q, and let \epsilon be the arbitrary real number defined in Lemma 5.8.1. Then, by Lemma 5.8.1, there exists an N such that, for n \ge N, g_n(m) has precisely r_i roots in the circle \phi_i with center \bar{m}_i and radius \epsilon, for i = 1, \ldots, q.
We now investigate the special case where a root has multiplicity two. To study the relation between the coefficients and the roots in the neighborhood of m_1 = m_2, we use the transformation introduced in (5.8.7) and (5.8.8) and assume m_3, m_4, \ldots, m_p are simple roots. Then

    a_1 = -c_1 - \sum_{j=3}^{p} m_j,

    a_2 = m_1 m_2 + c_1 \sum_{j=3}^{p} m_j + \sum_{j=3}^{p-1}\sum_{i=j+1}^{p} m_i m_j
        = 0.25(c_1^2 - c_2) + c_1 \sum_{j=3}^{p} m_j + \sum_{j=3}^{p-1}\sum_{i=j+1}^{p} m_i m_j,    (5.8.11)

    \vdots

    a_p = 0.25(c_1^2 - c_2)(-1)^p \prod_{j=3}^{p} m_j.
We observe that the partial derivatives of the coefficients with respect to c_1, evaluated at m_1 = m_2, are equal to the partial derivatives of the functions (5.8.2) with respect to m_1. Also, the difference m_1 - m_2 enters the expressions for the coefficients only through its square c_2. In the proof of Lemma 5.8.2, we show that c_1, c_2, m_3, \ldots, m_p, where m_3, \ldots, m_p are simple roots, can be expressed as locally continuously differentiable functions of the coefficients. Thus, Lemma 5.8.2 extends the result of Lemma 5.8.1 to the case in which some of the roots are of multiplicity two. The bound on the difference between two equal roots is the square root of the bound on the error in a simple root.
Lemma 5.8.2. Let a_n = (a_{1n}, a_{2n}, \ldots, a_{pn}), n = 1, 2, \ldots, be a sequence of real vectors such that

    |a_n - a| = O(M_n),

where M_n \to 0 as n \to \infty and a = (a_1, a_2, \ldots, a_p) is the vector of real coefficients of (5.8.1). Let m_{in}, i = 1, 2, \ldots, p, be the roots of

    g_n(m) = m^p + a_{1n}m^{p-1} + \cdots + a_{pn} = 0,

and let m_i, i = 1, 2, \ldots, p, be the roots of (5.8.1). Assume the roots of (5.8.1) are either simple roots or roots of multiplicity two. If m_i is a simple root of (5.8.1), then

    |m_{in} - m_i| \le K M_n

for some K < \infty, where it is understood that for n sufficiently large, K can be chosen such that there is exactly one m_{in} satisfying the inequality for each simple m_i. If m_1 = m_2, then

    |m_{1n} + m_{2n} - m_1 - m_2| \le K M_n,
    |m_{1n} - m_{2n}|^2 \le K M_n

for some K < \infty, where, for n sufficiently large, K can be chosen such that exactly one pair of roots satisfies the inequalities.
Proof. Let the roots, denoted by m = (m_1, m_2, \ldots, m_p), be arranged so that the repeated roots occur as pairs in the first part of the vector. Let (m_1, m_2), (m_3, m_4), \ldots, (m_{2k-1}, m_{2k}) be the pairs of equal roots. Let

    u = (c_{11}, c_{21}, \ldots, c_{1k}, c_{2k}, m_{2k+1}, \ldots, m_p)',

where (c_{1i}, c_{2i}) is the transformation (5.8.7)–(5.8.8) applied to the ith pair, and

    da = B H_1\, du,    (5.8.12)

where B is defined in (5.8.5), H is defined in (5.8.6), H_1 = HJ, J is the Jacobian of the transformation of u into m, and the block of J associated with the ith pair is

    J_i = \begin{pmatrix} 0.5 & 0.25 c_{2i}^{-1/2} \\ 0.5 & -0.25 c_{2i}^{-1/2} \end{pmatrix}.

The matrix H is a Vandermonde matrix with determinant

    |H| = \prod_{1 \le i < j \le p} (m_j - m_i),

and the determinant of H_1 is |H_1| = |H||J|. The determinant |H_1| is not zero at m_{2i-1} = m_{2i} because every zero product in the determinant of H is removed by the transformation. Hence, the elements of u are locally continuously differentiable functions of a, and the results follow. ∎
It is important to note that the bound given for the root difference in Lemma 5.8.2 can be attained. Consider, for example, the sequence of polynomials

    g_n(m) = m^2 - (2 + 2n^{-1})m + 1 + n^{-1} + n^{-2}.

The roots of g_n(m) = 0 are 1 + n^{-1} + n^{-1/2} and 1 + n^{-1} - n^{-1/2}. Thus, the difference between the two roots is O(n^{-1/2}), while the coefficients converge at the rate n^{-1}. The conclusion of Lemma 5.8.2 can also be obtained from the following theorem.
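The rates in this example are easy to confirm numerically (a sketch; numpy's polynomial root finder is used for the extraction):

```python
import numpy as np

# g_n(m) = m^2 - (2 + 2/n)m + 1 + 1/n + 1/n^2 converges to (m - 1)^2,
# a double root. Its roots are 1 + 1/n +/- n**-0.5, so the gap between
# them is 2*n**-0.5 = O(n**-0.5), although the coefficients converge
# to their limits at the faster rate O(n**-1).
for n in [100, 10_000, 1_000_000]:
    r = np.roots([1.0, -(2 + 2 / n), 1 + 1 / n + 1 / n**2])
    gap = abs(r[0] - r[1])
    assert np.isclose(gap, 2 / np.sqrt(n))
```

The square-root slowdown near a repeated root is exactly the phenomenon described by Lemma 5.8.2.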
Theorem 5.8.1. Let a_n = (a_{1n}, a_{2n}, \ldots, a_{pn}), n = 1, 2, \ldots, be a sequence of (possibly complex valued) vectors, and let a = (a_1, a_2, \ldots, a_p) be the vector of (possibly complex valued) coefficients of (5.8.1). Assume that

    |a_n - a| = O(\kappa_n),    (5.8.13)

where \kappa_n \to 0 as n \to \infty. Let \bar{m}_1, \bar{m}_2, \ldots, \bar{m}_q denote the distinct roots of g(m) = 0 with multiplicities r_1, r_2, \ldots, r_q, where \sum_{i=1}^{q} r_i = p. Let \epsilon be an arbitrary real number satisfying

    0 < \epsilon < 0.5 \min_{2 \le j \le q} |\bar{m}_1 - \bar{m}_j|,

and let r_1 = s. Then there exists an N such that if n \ge N, the equation

    m^p + a_{1n}m^{p-1} + \cdots + a_{pn} = 0    (5.8.14)

has exactly s roots in the circle \phi_1 with center \bar{m}_1 and radius \epsilon, and

    \max_{1 \le i \le s} |e_i| = O(\kappa_n^{1/s}),    (5.8.15)

where e_i = m_{ni} - \bar{m}_1, i = 1, 2, \ldots, s, in which m_{ni}, i = 1, 2, \ldots, s, are the s roots in \phi_1.
Proof. That s roots will fall in \phi_1 when n \ge N follows from Lemma 5.8.1. Therefore, we only demonstrate the order result.

If \bar{m}_1 \ne 0, we can replace m in (5.8.1) with m - \bar{m}_1 to obtain a polynomial with s zero roots. Therefore, without loss of generality, we assume \bar{m}_1 = 0. Then a_p = a_{p-1} = \cdots = a_{p-s+1} = 0. The polynomial (5.8.14) can be written as

    m^p + a_{1n}m^{p-1} + \cdots + a_{pn} = (m^s + d_{1n}m^{s-1} + \cdots + d_{sn})(m^{p-s} + b_{1n}m^{p-s-1} + \cdots + b_{p-s,n}),

where the first factor contains the s roots in \phi_1, so that the d_{jn} are, up to sign, the elementary symmetric functions of e_1, e_2, \ldots, e_s. Equating coefficients gives equations of the form

    a_{pn} = d_{sn} b_{p-s,n},  a_{p-1,n} = d_{s-1,n} b_{p-s,n} + d_{sn} b_{p-s-1,n},  \ldots    (5.8.17)

Because the a_{in}, i = 1, 2, \ldots, p, converge, the b_{jn}, j = 1, 2, \ldots, p - s, also converge, and we write \lim_{n\to\infty} b_{jn} = b_j. Since the multiplicity of \bar{m}_1 = 0 is s, we have b_{p-s} \ne 0. Thus, for example,

    |d_{sn}| = |a_{pn}|\,|b_{p-s,n}|^{-1} = O(\kappa_n)

from the first equation of (5.8.17). The same argument applied to the remaining equations of (5.8.17) gives d_{jn} = O(\kappa_n) for j = 1, 2, \ldots, s, and (5.8.15) follows because the e_i are the roots of m^s + d_{1n}m^{s-1} + \cdots + d_{sn} = 0. ∎
Corollary 5.8.1.1. If the assumption (5.8.13) is replaced with

    |a_n - a| = O_p(\kappa_n),

then the O(\kappa_n^{1/s}) of (5.8.15) is replaced with O_p(\kappa_n^{1/s}).

Proof. Omitted. ∎
Corollary 5.8.1.2. Let the assumptions of Theorem 5.8.1 hold with s = 2. Then, for n \ge N,

    |m_{n1} - m_{n2}| = O(\kappa_n^{1/2}),

where m_{n1} and m_{n2} are the two roots in \phi_1.

Proof. We have, for m_{n1} and m_{n2} in \phi_1,

    |m_{n1} - m_{n2}| \le |m_{n1} - \bar{m}_1| + |m_{n2} - \bar{m}_1| = O(\kappa_n^{1/2}),

by Lemma 5.8.1 and by Theorem 5.8.1. ∎
We now give a result on the roots and vectors of square matrices. Let A be any p \times p matrix with possibly complex elements. The roots of the determinantal equation

    |A - mI| = 0    (5.8.18)

are called the characteristic roots of A, the eigenvalues of A, the eigenroots of A, or simply the roots of A. The vector u_i that satisfies the equation

    (A - m_iI)u_i = 0,    (5.8.19)

where m_i is a root of (5.8.18), is called a characteristic vector of A or an eigenvector of A. The following theorem, taken from Magnus and Neudecker (1988), demonstrates that the simple roots of A are continuously differentiable functions of the elements of A. The theorem gives the differentials of the roots and of the vectors.
Theorem 5.8.2. Let m_0 be a simple root of (5.8.18) for A_0, where A_0 is any (possibly complex) p \times p matrix. Let u_0 be the associated eigenvector. Then a complex valued function m(A) and a complex vector valued function u(A) are defined for all A in some neighborhood S of A_0, such that

    m(A_0) = m_0,  u(A_0) = u_0,    (5.8.20)

and

    Au(A) = m(A)u(A)  and  u_0^* u(A) = 1    (5.8.21)

for all A in S, where u_0^* is the conjugate transpose of u_0. Furthermore, the functions are differentiable any number of times on S, and the differentials are

    dm = (v_0^* u_0)^{-1} v_0^*(dA)u_0    (5.8.22)

and

    du = (A_0 - m_0 I)^+ [I - (v_0^* u_0)^{-1} u_0 v_0^*](dA)u_0,    (5.8.23)

where B^+ is the Moore–Penrose generalized inverse of B, v_0 is the eigenvector associated with the eigenvalue m_0^* of A_0^*, satisfying

    A_0^* v_0 = m_0^* v_0,

and B^* is the conjugate transpose of B.

Proof. Omitted. See Magnus and Neudecker (1988, p. 161). ∎
If A_0 is a real symmetric matrix, then the roots are real, and the expressions (5.8.22) and (5.8.23) simplify to

    dm = u_0'(dA)u_0    (5.8.24)

and

    du = (A_0 - m_0 I)^+(dA)u_0.    (5.8.25)
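Expression (5.8.24) can be checked numerically: for a symmetric matrix, a small symmetric perturbation dA moves a simple root by u_0'(dA)u_0 to first order. A sketch with an arbitrary seeded test matrix (the matrix and step size are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 4))
A0 = 0.5 * (A0 + A0.T)                 # real symmetric, so roots are real

vals, vecs = np.linalg.eigh(A0)
m0, u0 = vals[-1], vecs[:, -1]         # largest root is simple a.s.; u0'u0 = 1

dA = 1e-6 * rng.standard_normal((4, 4))
dA = 0.5 * (dA + dA.T)                 # symmetric perturbation

m_new = np.linalg.eigh(A0 + dA)[0][-1]
# First-order change dm = u0'(dA)u0, per (5.8.24); the error is
# second order in the size of dA.
assert np.isclose(m_new - m0, u0 @ dA @ u0, rtol=1e-2, atol=1e-12)
```

The analogous check for a nonsymmetric matrix would use the left eigenvector v_0 as in (5.8.22).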
It follows from Theorem 5.8.1 that if the elements of a matrix are converging to limiting values at the rate M_n, then a simple root m_i is converging to a limiting value at the same rate.
The results for the roots of a matrix are applicable to the roots of a polynomial written in the form (5.8.1). Let the p \times p matrix A be defined by

    A = \begin{pmatrix}
      -a_1 & -a_2 & -a_3 & \cdots & -a_{p-1} & -a_p \\
      1 & 0 & 0 & \cdots & 0 & 0 \\
      0 & 1 & 0 & \cdots & 0 & 0 \\
      \vdots & & & & & \vdots \\
      0 & 0 & 0 & \cdots & 1 & 0
    \end{pmatrix},    (5.8.26)

where the a_i are the coefficients of (5.8.1). Then the roots of the determinantal equation

    |A - mI| = 0    (5.8.27)

are the same as the roots of (5.8.1).
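The equivalence can be verified directly: the eigenvalues of the companion matrix (5.8.26) reproduce the roots of g(m). A sketch with illustrative roots 2, -1, and 0.5:

```python
import numpy as np

# g(m) = m^3 + a_1 m^2 + a_2 m + a_3, chosen to have roots 2, -1, 0.5.
roots = np.array([2.0, -1.0, 0.5])
_, a1, a2, a3 = np.poly(roots)          # np.poly returns (1, a_1, a_2, a_3)

# Companion matrix of form (5.8.26).
A = np.array([[-a1, -a2, -a3],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# The roots of |A - mI| = 0 are the roots of g(m).
eig = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(eig, np.sort(roots))
```

This is also how numerical libraries typically compute polynomial roots, so the perturbation theory for matrix roots applies directly to polynomial roots.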
REFERENCES

Section 5.1. Chernoff (1956), Chung (1968), Mann and Wald (1943b), Pratt (1959), Tucker (1967).

Sections 5.2–5.3. Billingsley (1968), Brown (1971), Chan and Wei (1988), Chung (1968), Cramér (1946), Dvoretzky (1972), Gnedenko (1967), Hall and Heyde (1980), Loève (1963), Pollard (1984), Rao (1973), Scott (1973), Tucker (1967), Varadarajan (1958), Wilks (1962).

Section 5.4. Cramér (1946), DeGracie and Fuller (1972), Hansen, Hurwitz, and Madow (1953), Hatanaka (1973).

Section 5.5. Andrews (1987), Gallant (1971, 1987), Gallant and White (1988), Gumpertz and Pantula (1992), Hartley (1961), Hartley and Booker (1965), Jennrich (1969), Klimko and Nelson (1978), Malinvaud (1970b), Pötscher and Prucha (1991a,b), Sarkar (1990), Wald (1949), Wu (1981).

Section 5.6. Basmann (1957), Fuller (1987), Sargan (1958, 1959), Theil (1961).

Section 5.7. Carroll and Ruppert (1988), Jobson and Fuller (1980), Rothenberg (1984).

Section 5.8. Amemiya (1990), Magnus and Neudecker (1988), Marden (1966), Sanger (1992), Toyooka (1982, 1990), Van Der Genugten (1983).
EXERCISES

1. Let \{a_n\} and \{b_n\} be sequences of real numbers, and let \{f_n\} and \{g_n\} be sequences of positive real numbers such that a_n = O(f_n) and b_n = O(g_n). Show that
(a) |a_n|^s = O(f_n^s), s > 0.
2. Let \{f_n\}, \{g_n\}, and \{r_n\} be sequences of positive real numbers, and let \{X_n\}, \{Y_n\}, and \{Z_n\} be sequences of random variables such that X_n = O_p(f_n), Y_n = O_p(g_n), and Z_n = o_p(r_n). Without recourse to Theorems 5.1.5 and 5.1.6, show that:
(a) |X_n|^s = O_p(f_n^s), s > 0.
(b) X_nY_n = O_p(f_ng_n).
(c) X_n + Y_n = O_p(\max\{f_n, g_n\}).
(d) X_nZ_n = o_p(f_nr_n).
(e) If g_n/r_n = o(1), then Y_n + Z_n = o_p(r_n).
3. Let X_1, \ldots, X_n and Y_1, \ldots, Y_n be two independent random samples, each of size n. Let the X_i be N(0, 4) and the Y_i be N(2, 9). Find the order in probability of the following statistics:
(a) \bar{X}_n. (b) \bar{Y}_n. (c) \bar{X}_n^2. (d) \bar{Y}_n^2. (e) \bar{X}_n\bar{Y}_n. (f) (\bar{X}_n + 1).
Find the most meaningful expression (i.e., the smallest quantity for which the order in probability statement is true).
4. Prove that \operatorname{plim}_{n\to\infty} \hat{\theta} = \theta^0 implies that there exists a sequence of positive real numbers \{a_n\} such that \lim_{n\to\infty} a_n = 0 and \hat{\theta} - \theta^0 = O_p(a_n).
5. Let \{a_j\} be a sequence of constants satisfying \sum_{j=1}^{\infty} |a_j| < \infty; also let \{X_{tn}: t = 1, 2, \ldots, n;\ n = 1, 2, \ldots\} be a triangular array of random variables such that E\{X_{tn}^2\} = O(b_n^2), t = 1, 2, \ldots, n, where \lim_{n\to\infty} b_n = 0. Prove that \sum_{t=1}^{n} a_t X_{tn} converges in probability to zero.
6. Let \bar{X}_n be distributed as a normal (\mu, \sigma^2/n) random variable with \mu \ne 0, and define Y_n = \bar{X}_n^2 - \mu^2. Expand Y_n in a Taylor series through terms of O_p(n^{-1}). Find the expectation of these terms.
7. Let F_1, F_2, \ldots, F_M be a finite collection of distribution functions with finite variances and common mean \mu. A sequence of random variables \{X_t: t \in (1, 2, \ldots)\} is created by randomly choosing, for each t, one of the distribution functions and then making a random selection from the chosen distribution. Show that, as n \to \infty,

    n^{1/2}\Big(n^{-1}\sum_{t=1}^{n} \sigma_t^2\Big)^{-1/2}(\bar{X}_n - \mu) \xrightarrow{\mathcal{L}} N(0, 1),

where \sigma_t^2 is the variance of the distribution chosen for index t. Does

    n^{1/2}\Big(M^{-1}\sum_{j=1}^{M} \sigma_j^2\Big)^{-1/2}(\bar{X}_n - \mu) \xrightarrow{\mathcal{L}} N(0, 1),

where \sigma_j^2 is the variance of the jth distribution, j = 1, 2, \ldots, M?

8. Let X_t be normal independent (\mu, \sigma^2) random variables, and let \bar{X}_n = n^{-1}\sum_{t=1}^{n} X_t.
(a) It is known that \mu \ne 0. How would you approximate the distribution of \bar{X}_n^{-1} in large samples?
(b) It is known that \mu = 0. How would you approximate the distribution of \bar{X}_n^{-1} in large samples?
Explain and justify your answers, giving the normalizing constants necessary to produce nondegenerate limiting distributions.
9. Let \bar{X}_n be distributed as a normal (\mu, \sigma^2/n) random variable, \mu > 0.
(a) Find E\{Y_n\} to order (1/n), where

    Y_n = \operatorname{sgn}(\bar{X}_n)\,|\bar{X}_n|^{1/2}

and \operatorname{sgn} \bar{X}_n denotes the sign of \bar{X}_n.
(b) Find E\{(Y_n - \mu^{1/2})^2\} to order (1/n).
(c) What is the limiting distribution of n^{1/2}(Y_n - \mu^{1/2})?
10. Let \bar{X}_n be distributed as a normal (\mu, \sigma^2/n) random variable, \mu \ne 0, and define

    Z_n = \bar{X}_n(\bar{X}_n^2 + a)^{-1},  a > 0.

Show that E\{\bar{X}_n^{-1}\} is not defined but that E\{Z_n\} is defined. Show further that Z_n satisfies the conditions of Theorem 5.4.3, and find the expectation of Z_n through terms of O(1/n).
11. Let (X, Y)' be distributed as a bivariate normal random variable with mean (\mu_X, \mu_Y)' and covariance matrix \Sigma. Given a sample of size n from this bivariate population, derive an estimator for the product \mu_X\mu_Y that is unbiased to O(1/n).
12. Assume the model

    Y_t = \alpha + \beta\lambda^{-x_t} + u_t,

where the u_t are distributed as normal independent (0, \sigma^2) random variables. We have available the following data:

     t    Y_t    x_t        t    Y_t    x_t
     1    47.3   0.0        7   136.5   2.0
     2    87.0   0.4        8   132.0   4.0
     3   120.1   0.8        9    68.8   0.0
     4   130.4   1.6       10   138.1   1.5
     5    58.8   0.0       11   145.7   3.0
     6   111.9   1.0       12   143.0   5.9

where Y is yield of corn and x is applied nitrogen. Given the initial values \hat{\alpha} = 143, \hat{\beta} = -85, \hat{\lambda} = 1.20, carry out two iterations of the Gauss–Newton procedure. Using the estimates obtained at the second iteration, estimate the covariance matrix of the estimator.
13. In the illustration of Section 5.5 only the coefficient of x_{t1} was used in constructing an initial estimate of \theta_1. Identifying the original equation as

    Y_t = \theta_0 + \theta_1 x_{t1} + \theta_1^2 x_{t2} + e_t,

construct an estimator for the covariance matrix of (\hat{\theta}_0, \hat{\theta}_1, \hat{\theta}_2), where the coefficients \hat{\theta}_0, \hat{\theta}_1, and \hat{\theta}_2 are the ordinary least squares estimates. Using this covariance matrix, find the \lambda that minimizes the estimated variance of

    \lambda\hat{\theta}_1 + (1 - \lambda)\hat{\theta}_2^{1/2}

as an estimator of \theta_1. Use the estimated linear combination as an initial estimate in the Gauss–Newton procedure.
14. Assuming that the e_t are normal independent (0, \sigma^2) random variables, obtain the likelihood function associated with the model (5.5.1). By evaluating the expectations of the second partial derivatives of the likelihood function with respect to the parameters, demonstrate that the asymptotic covariance matrix of the maximum likelihood estimator is the same as that given in Theorem 5.5.1.
15. Let Y_t satisfy the model

    Y_t = \theta_1 + \theta_2 e^{-\theta_3 t} + e_t,  t = 1, 2, \ldots,

where \theta_2 > 0 and \{e_t\} is a sequence of normal independent (0, \sigma^2) random variables. Does this model satisfy assumptions 1 and 2 of Theorem 5.4.4? Would the model with t in the exponent replaced by x_t, where \{x_t\} = (1, 2, 3, 4, 1, 2, 3, 4, \ldots), satisfy the three assumptions?
16. An experiment is conducted to study the relationship between the phosphate content of the leaf and the yield of grain for the corn plant. In the experiment different levels of phosphate fertilizer were applied to the soil of 20 experimental plots. The leaf phosphate and the grain yield of the corn were recorded for each plot. Denoting yield by Y, applied phosphate by A, and leaf phosphate by P, the sums of squares and cross products matrix for A, P, and Y was computed as

    [ \Sigma A^2   \Sigma AP   \Sigma AY ]   [ 69,600  16,120  3,948 ]
    [ \Sigma PA   \Sigma P^2   \Sigma PY ] = [ 16,120   8,519  1,491 ]
    [ \Sigma YA   \Sigma YP   \Sigma Y^2 ]   [  3,948   1,491    739 ]

The model

    Y_t = \beta P_t + u_t,  t = 1, 2, \ldots, 20,

where the u_t are normal independent (0, \sigma^2) random variables, is postulated. Estimate \beta by the method of instrumental variables, using A_t as the instrumental variable. Estimate the standard error of your coefficient. Compare your estimate with the least squares estimate.
17. Show that if X_n = O_p(a_n) and a_n = o(b_n), then X_n = o_p(b_n).
18. Prove the following corollary to Theorem 5.5.1.

Corollary 5.5.1. Let the assumptions of Theorem 5.5.1 hold. Also assume

    E\{(e_t^2 - \sigma^2)^2 \mid \mathcal{A}_{t-1}\} = \kappa.

Then

    n^{1/2}(\hat{\sigma}^2 - \sigma^2) \xrightarrow{\mathcal{L}} N(0, \kappa).

19. Let s^2 be defined by (5.5.57). Show that

    s^2 = (n - k)^{-1} e'\{I - F(\theta^0)[F'(\theta^0)F(\theta^0)]^{-1}F'(\theta^0)\}e + O_p(\max\{a_n^3, n^{-1/2}a_n^2\})

under the assumptions of Theorem 5.5.4.
20. Show that if the assumptions (5.7.6) through (5.7.9) hold and if M_n^{-1} = o(1), then \tilde{\beta} converges to \beta in probability, where \tilde{\beta} is defined in (5.7.5).
21. Let X_{tn}, t = 1, 2, \ldots, n, n = 1, 2, \ldots, be a triangular array of random variables. Let \{g_t\}_{t=1}^{\infty} and \{h_n\}_{n=1}^{\infty} be sequences of positive real numbers. Show that if

    E\{|X_{tn}|\} = g_t h_n,  t = 1, 2, \ldots, n,  n = 1, 2, \ldots,

then X_{tn} = O_p(g_t h_n). Show that X_{tn} = O_p(g_t h_n) does not imply that E\{|X_{tn}|\} = O(g_t h_n).
22. Let the model (5.7.1)–(5.7.2) hold with x_t = t and V_{uu} = \operatorname{diag}\{e^{\theta}, e^{2\theta}, \ldots, e^{n\theta}\} for \theta \in \Theta = (1, \infty). Let M_n = G_n^{1/2}. Suppose an estimator of \theta^0, the true value, is available such that \hat{\theta} - \theta^0 = O_p(n^{-1/2}).
(a) Show that \|V_{uu}^{-1}(\hat{\theta}) - V_{uu}^{-1}(\theta^0)\|_2, where \|A\|_2 is the square root of the largest eigenvalue of A'A, does not go to zero in probability.
(b) Show that the assumption (5.7.8) holds for this model.
23. Let

    a_n(z) = a_{0n} + a_{1n}z + \cdots + a_{kn}z^k,

and let

    a(z) = a_0 + a_1 z + \cdots + a_k z^k = 0

have distinct roots \bar{z}_1, \bar{z}_2, \ldots, \bar{z}_q with multiplicities r_1, r_2, \ldots, r_q. Let

    a_{in} - a_i = O_p(n^{-\xi})

for i = 1, 2, \ldots, k and some \xi > 0. Prove that, given \delta > 0, there exist an M and an N such that, for n > N, the probability is greater than 1 - \delta that exactly r_i of the roots of a_n(z) are within M n^{-\xi/r_i} of \bar{z}_i, for i = 1, 2, \ldots, q.
24. The variance of \hat{\beta} conditional on x_t, t = 1, 2, \ldots, n, is

    V\{\hat{\beta} \mid (x_1, \ldots, x_n)\} = \Big(\sum_{t=1}^{n} x_t' x_t\Big)^{-1}\sigma^2.

Show that

    \operatorname{plim}_{n\to\infty} n^{-3} \sum_{t=1}^{n} x_t' x_t = 3^{-1} J'J,

where J = (1, 1). Construct a sequence of matrices \{M_n\} appropriate for this model.
25. Use Theorem 5.3.4 to prove the assertion made in Example 5.5.1 that

    M_n^{-1} U_n(\theta^0) \xrightarrow{\mathcal{L}} N(0, V_{aa}).
26. Let (a_{1n}, a_{2n})' \sim N[(-1.40, 0.49)', n^{-1}I]. Let (m_{1n}, m_{2n}) be the roots of

    m^2 + a_{1n}m + a_{2n} = 0,

where |m_{1n}| \ge |m_{2n}|. What can you say about the distribution of n^{1/2}(m_{1n} - 0.7, m_{2n} - 0.7) as n increases?
27. Let (a_{1n}, a_{2n})' \sim N[(-1.20, 0.61)', n^{-1}I]. Let (m_{1n}, m_{2n}) be the roots of

    m^2 + a_{1n}m + a_{2n} = 0.

What can you say about the distribution of n^{1/2}(m_{1n} - m_1, m_{2n} - m_2)', where (m_1, m_2) are the roots of

    m^2 - 1.20m + 0.61 = 0?

Hint: See Exercise 10 of Chapter 4.
28. Consider the sequence of polynomials

    g_n(m) = m^2 - 3n^{-1}m + 2n^{-2}.

What is the order of the difference of the two roots? How do you explain the difference between this result and the example of the text? Use the proof of Theorem 5.8.1 to prove a general result for the order of the difference of the two roots of a second order polynomial that is converging to a polynomial with repeated roots.
29. Let \{Y_i\}_{i=1}^{n} be a sequence of k-dimensional random variables, where Y_i \sim NI(\mu J, \Sigma) and J is a k-dimensional column vector of ones. Find the limiting distribution, as n \to \infty, of

    \hat{\mu}_g = (J'S^{-1}J)^{-1} J'S^{-1}\bar{Y},

where \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i and

    S = (n - 1)^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})'.

Compare the variance of the limiting distribution of \hat{\mu}_g with that of \hat{\mu} = k^{-1}J'\bar{Y}.
30. Prove the following.

Corollary 5.1.6.2. Let \{X_n\} be a sequence of scalar random variables such that

    X_n = a + O_p(r_n),

where r_n \to 0 as n \to \infty. If g(x) and g'(x) are continuous at a, then

    g(X_n) = g(a) + g'(a)(X_n - a) + o_p(r_n).
31. Let

    Y_t = e^{\beta t} + a_t,  t = 0, 1, 2, \ldots,

where \beta \in (-\infty, 0) and a_t \sim NI(0, \sigma^2). Let \hat{\beta} be the value of \beta that minimizes

    Q_n(\beta) = \sum_{t=0}^{n} (Y_t - e^{\beta t})^2

for \beta \in (-\infty, 0). Are the conditions of Theorem 5.5.1 satisfied for M_n = 1 and a_n = n^{-\xi} for some \xi > 0? What do you conclude about the consistency of \hat{\beta}?
CHAPTER 6
Estimation of the Mean and Autocorrelations
In this chapter we shall derive some large sample results for the sampling behavior of the estimated mean, covariances, and autocorrelations.
6.1. ESTIMATION OF THE MEAN
Consider a stationary time series X_t, with mean \mu, which we desire to estimate. If it were possible to obtain a number of independent realizations, then the average of the realization averages would converge in mean square to \mu as the number of realizations increased. That is, given m samples of n observations each,

    \lim_{m\to\infty} E\Big\{\Big[m^{-1}\sum_{j=1}^{m} \bar{x}_{n(j)} - \mu\Big]^2\Big\} = 0,

where \bar{x}_{n(j)} = n^{-1}\sum_{t=1}^{n} X_{t(j)} is the mean of the n observations from the jth realization.
However, in many areas of application, it is difficult or impossible to obtain multiple realizations. For example, most economic time series constitute a single realization. Therefore, the question becomes whether or not we can use the average of a single realization to estimate the mean. Clearly, the sample mean \bar{x}_n is an unbiased estimator for the mean of a covariance stationary time series. If the mean square error of the sample mean as an estimator of the population mean approaches zero as the number of observations included in the mean increases, we say that the time series is ergodic for the mean. We now investigate conditions under which the time series is ergodic for the mean. Theorem 6.1.1 demonstrates that the sample mean may be a consistent estimator for nonstationary time series if the nonstationarity is of a transient nature. The theorem follows Parzen (1962).
"hearem 6.1.1. Let {Xl: t E (1,2, . . .)} be a time series satisfying
lim E{X,} = p ,
lim COV{~,,, X,,} = 0, f - w
"3-
- I where IS, = n Xf=, X,. Then
Proof. Now

    E\{(\bar{x}_n - \mu)^2\} = \operatorname{Var}\{\bar{x}_n\} + [E\{\bar{x}_n\} - \mu]^2,

where the second term on the right converges to zero by Lemma 3.1.5. Furthermore,

    \operatorname{Var}\{\bar{x}_n\} = \frac{1}{n^2}\sum_{t=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}\{X_t, X_j\},

which also converges to zero by Lemma 3.1.5. ∎
Corollary 6.1.1.1. Let \{X_t\} be a stationary time series whose covariance function \gamma(h) converges to zero as h gets large. Then

    \lim_{n\to\infty} \operatorname{Var}\{\bar{x}_n\} = 0.

Proof. By Lemma 3.1.5, the convergence of \gamma(h) to zero implies that \operatorname{Cov}\{\bar{x}_n, X_n\} = n^{-1}\sum_{h=0}^{n-1} \gamma(h) converges to zero, and the result follows by Theorem 6.1.1. ∎
Corollary 6.1.1.2. A stationary time series with absolutely summable covariance function is ergodic for the mean. Furthermore,

$$\lim_{n\to\infty} n\operatorname{Var}\{\bar{x}_n\} = \sum_{h=-\infty}^{\infty}\gamma(h).$$
310 ESTIMATION OF THE MEAN AND AUTOCORRELATIONS
Proof. The assumption of absolute summability implies

$$\lim_{h\to\infty}\gamma(h) = 0,$$

and ergodicity follows from Corollary 6.1.1.1. We have

$$n\operatorname{Var}\{\bar{x}_n\} = \sum_{h=-(n-1)}^{n-1}\left(1 - \frac{|h|}{n}\right)\gamma(h),$$

and $n\operatorname{Var}\{\bar{x}_n\}$ converges to the stated result by Lemma 3.1.4. ▲
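As a numerical illustration (the AR(1) parameter values below are my own, not from the text), the finite-sample quantity $n\operatorname{Var}\{\bar{x}_n\}$ can be compared with its limit $\sum_h \gamma(h)$:

```python
import numpy as np

# Illustrative check (not from the text) of Corollary 6.1.1.2 for an
# AR(1) process with gamma(h) = gamma0 * rho^|h|:
# n * Var{x_bar_n} = sum_{|h|<n} (1 - |h|/n) gamma(h)  ->  sum_h gamma(h).
rho, sigma2 = 0.6, 1.0
gamma0 = sigma2 / (1 - rho**2)

def n_var_mean(n):
    h = np.arange(-(n - 1), n)
    return np.sum((1 - np.abs(h) / n) * gamma0 * rho ** np.abs(h))

limit = gamma0 * (1 + rho) / (1 - rho)   # sum of gamma(h) over all h = 6.25
print(n_var_mean(50), n_var_mean(5000), limit)
```

The gap between the finite-n value and the limit shrinks at the rate $n^{-1}$, consistent with the corollary.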
Theorem 6.1.2. If the spectral density of a stationary time series $X_t$ is continuous, then

$$\lim_{n\to\infty} n\operatorname{Var}\{\bar{x}_n\} = 2\pi f(0), \qquad (6.1.1)$$

where $f(0)$ is the spectral density of $X_t$ evaluated at zero.

Proof. By Theorem 3.1.10 the Fourier series of a continuous periodic function is uniformly summable by the method of Cesàro. The autocovariances $\gamma(k)$ are equal to $2\pi$ times the $a_k$ of that theorem. Therefore,

$$\lim_{n\to\infty}\sum_{|h|<n}\left(1 - \frac{|h|}{n}\right)\gamma(h) = 2\pi f(0). \qquad ▲$$
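A sketch of the equality $2\pi f(0) = \sum_h \gamma(h)$ for an AR(1), again with illustrative parameters of my own:

```python
import numpy as np

# Illustrative check (not from the text): for an AR(1), the spectral
# density is f(w) = sigma2 / (2*pi*|1 - rho*e^{-iw}|^2), so
# 2*pi*f(0) = sigma2/(1 - rho)^2 should equal the sum of gamma(h).
rho, sigma2 = 0.6, 1.0
two_pi_f0 = sigma2 / (1 - rho) ** 2

gamma0 = sigma2 / (1 - rho**2)
sum_gamma = gamma0 * (1 + rho) / (1 - rho)
print(two_pi_f0, sum_gamma)   # both 6.25
```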
Since the absolute summability of the covariance function implies that $f(\omega)$ is continuous, it follows that (6.1.1) holds for a time series with absolutely summable covariance function. Thus, for a wide class of time series, the sample mean has a variance that is declining at the rate $n^{-1}$. In large samples the variance is approximately the spectral density evaluated at zero multiplied by $2\pi n^{-1}$. To investigate the efficiency of the sample mean as an estimator of $\mu$, we write
$$Y_t = \mu + Z_t,$$

where $Z_t$ is a time series with zero mean, and we define $V$ to be the covariance matrix for n observations on $Y_t$. Thus,

$$V = E\{\mathbf{z}\mathbf{z}'\},$$

where $\mathbf{z}$ is the column vector of n observations on $Z_t$. If the covariance matrix is
known and nonsingular, the best linear unbiased estimator of the mean is given by

$$\hat{\mu} = (\mathbf{1}'V^{-1}\mathbf{1})^{-1}\mathbf{1}'V^{-1}\mathbf{y}, \qquad (6.1.2)$$

where $\mathbf{1}$ is a column vector composed of n ones and $\mathbf{y}$ is the vector of n observations on $Y_t$. The variance of $\hat{\mu}$ is

$$\operatorname{Var}\{\hat{\mu}\} = (\mathbf{1}'V^{-1}\mathbf{1})^{-1}, \qquad (6.1.3)$$

whereas the variance of $\bar{y}_n = n^{-1}\sum_{t=1}^{n} Y_t$ is

$$\operatorname{Var}\{\bar{y}_n\} = n^{-2}\mathbf{1}'V\mathbf{1}. \qquad (6.1.4)$$
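The comparison between (6.1.3) and (6.1.4) can be sketched numerically; the AR(1) covariance matrix below is my own illustration, not the text's example:

```python
import numpy as np

# Sketch comparing (6.1.3), the BLUE variance, with (6.1.4), the
# sample-mean variance, for an assumed AR(1) covariance matrix.
n, rho, sigma2 = 100, 0.6, 1.0
gamma0 = sigma2 / (1 - rho**2)
idx = np.arange(n)
V = gamma0 * rho ** np.abs(idx[:, None] - idx[None, :])  # V_{tj} = gamma(|t-j|)
one = np.ones(n)

var_blue = 1.0 / (one @ np.linalg.solve(V, one))   # (1'V^{-1}1)^{-1}
var_mean = (one @ V @ one) / n**2                  # n^{-2} 1'V1
print(var_blue, var_mean)  # nearly equal even at n = 100
```

The two variances agree to about one percent here, anticipating Theorem 6.1.3: the sample mean is asymptotically as efficient as the generalized least squares estimator.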
Let $Y_t$ be a pth order stationary autoregressive process defined by

$$\sum_{j=0}^{p} a_j (Y_{t-j} - \mu) = e_t, \qquad a_0 = 1, \qquad (6.1.5)$$

where the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables, and the roots of the characteristic equation are less than one in absolute value. For known $V$, the estimator (6.1.2) can be constructed by transforming the observations into a sequence of uncorrelated constant variance observations. Using the Gram–Schmidt orthogonalization procedure, we obtain

$$w_t = \sum_{j=0}^{t-1} c_{tj} Y_{t-j}, \qquad t = 1, 2, \ldots, p, \qquad (6.1.6)$$

$$w_t = Y_t + \sum_{j=1}^{p} a_j Y_{t-j}, \qquad t = p + 1, p + 2, \ldots, n,$$

where $c_{10} = \gamma^{-1/2}(0)\sigma$, $c_{20} = \{[1 - \rho^2(1)]\gamma(0)\}^{-1/2}\sigma$, $c_{21} = -\rho(1)\{[1 - \rho^2(1)]\gamma(0)\}^{-1/2}\sigma$, and so forth; the $w_t$ are uncorrelated with common variance $\sigma^2$. The expected values are

$$E\{w_t\} = \left(\sum_{j=0}^{t-1} c_{tj}\right)\mu, \quad t = 1, \ldots, p; \qquad E\{w_t\} = \left(1 + \sum_{j=1}^{p} a_j\right)\mu, \quad t = p + 1, \ldots, n.$$
In matrix notation we let $T$ denote the transformation defined in (6.1.6). Then $\sigma^{-2}T'T = V^{-1}$, $E\{T\mathbf{y}\} = T\mathbf{1}\mu$, and

$$\hat{\mu} = (\mathbf{1}'T'T\mathbf{1})^{-1}\mathbf{1}'T'T\mathbf{y}. \qquad (6.1.7)$$

With the aid of this transformation, we can demonstrate that the sample mean has the same asymptotic efficiency as the best linear unbiased estimator.
Theorem 6.1.3. Let $Y_t$ be the stationary pth order autoregressive process defined in (6.1.5). Then

$$\lim_{n\to\infty} n\operatorname{Var}\{\bar{y}_n\} = \lim_{n\to\infty} n\operatorname{Var}\{\hat{\mu}\},$$

where $\hat{\mu}$ is defined in (6.1.2).
Proof. Without loss of generality, we let $\sigma^2 = 1$. The spectral density of $Y_t$ is then

$$f_Y(\omega) = \frac{1}{2\pi}\left|\sum_{j=0}^{p} a_j e^{-\mathrm{i}\omega j}\right|^{-2},$$

where $a_0 = 1$. With the exception of $2p^2$ terms in the upper left and lower right corners of $T'T$, the elements of $V^{-1}$ are given by

$$v^{tr} = \begin{cases} \sum_{s=0}^{p-h} a_s a_{s+h}, & |t - r| = h \le p, \\ 0, & \text{otherwise.} \end{cases} \qquad (6.1.8)$$

The values of the elements $v^{tr}$ in (6.1.8) depend only on $|t - r|$, and we recognize $\sum_{s=0}^{p-|h|} a_s a_{s+|h|} = \gamma_a(h)$, say, as the covariance function of a pth order moving average. Therefore, for $n > 2p$,

$$n^{-1}\mathbf{1}'V^{-1}\mathbf{1} = \sum_{h=-p}^{p}\gamma_a(h) + O(n^{-1}) = \left(\sum_{j=0}^{p} a_j\right)^2 + O(n^{-1}).$$

It follows that

$$\lim_{n\to\infty} n\operatorname{Var}\{\hat{\mu}\} = \left(\sum_{j=0}^{p} a_j\right)^{-2} = 2\pi f_Y(0) = \lim_{n\to\infty} n\operatorname{Var}\{\bar{y}_n\}. \qquad ▲$$
Using the fact that a general class of spectral densities can be approximated by the spectral density of an autoregressive process (see Theorem 4.3.4), Grenander and Rosenblatt (1957) have shown that the mean and certain other linear estimators have the same asymptotic efficiency as the generalized least squares estimator for time series with spectral densities in that class.
6.2. ESTIMATORS OF THE AUTOCOVARIANCE AND AUTOCORRELATION FUNCTIONS
While the sample mean is a natural estimator to consider for the mean of a stationary time series, a number of estimators have been proposed for the covariance function. If the mean is known and, without loss of generality, taken to be zero, the estimator

$$\tilde{\gamma}(h) = \frac{1}{n-h}\sum_{t=1}^{n-h} X_t X_{t+h} \qquad (6.2.1)$$

is seen to be the mean of $n - h$ observations from the time series, say,

$$Z_{th} = X_t X_{t+h}.$$

For stationary time series, $E\{Z_{th}\} = \gamma(h)$ for all t, and it follows that $\tilde{\gamma}(h)$ is an unbiased estimator of $\gamma(h)$. In most practical situations the mean is unknown and must be estimated. We list below two possible estimators of the covariance function when the mean is estimated. In both expressions h is taken to be greater than or equal to zero:

$$\hat{\gamma}^{+}(h) = \frac{1}{n-h}\sum_{t=1}^{n-h}(X_t - \bar{x}_n)(X_{t+h} - \bar{x}_n), \qquad (6.2.2)$$

$$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(X_t - \bar{x}_n)(X_{t+h} - \bar{x}_n). \qquad (6.2.3)$$

It is clear that these estimators differ by factors that become small at the rate $n^{-1}$. Unlike $\tilde{\gamma}(h)$, neither of the estimators is unbiased. The bias is given in Theorem 6.2.2 of this section.
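A minimal sketch (my own, with placeholder data) of the two divisor conventions in (6.2.2) and (6.2.3):

```python
import numpy as np

# Illustration (my own sketch) of the autocovariance estimators
# (6.2.2), with divisor n - h, and (6.2.3), with divisor n.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)          # placeholder series; any data works
n, xbar = x.size, x.mean()

def gamma_plus(h):                    # (6.2.2): divisor n - h
    return np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / (n - h)

def gamma_hat(h):                     # (6.2.3): divisor n
    return np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / n

# the two differ exactly by the factor (n - h)/n, i.e. by O(1/n)
print(gamma_hat(5) / gamma_plus(5))   # about (200 - 5)/200 = 0.975
```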
The estimator $\hat{\gamma}(h)$ can be shown to have smaller mean square error than $\hat{\gamma}^{+}(h)$ for certain types of time series. This estimator also has the advantage of guaranteeing positive definiteness of the estimated covariance function. In most of our applications we shall use the estimator $\hat{\gamma}(h)$. As one might expect, the variances of the estimated autocovariances are much more complicated than those of the mean. The theorem we present is due to Bartlett (1946). A result needed in the proof will be useful in later sections, and we state it as a lemma.
Lemma 6.2.1. Let $\{b_j\}_{j=-\infty}^{\infty}$ and $\{c_j\}_{j=-\infty}^{\infty}$ be two absolutely summable sequences. Then, for fixed integers r, h, and d, $d \ge 0$,

$$\lim_{n\to\infty} n^{-1}\sum_{t=1}^{n}\sum_{s=1}^{n+d} b_{s-t+r}\,c_{s-t+h} = \sum_{p=-\infty}^{\infty} b_{p+r}\,c_{p+h}.$$

Proof. Let $p = s - t$. Then

$$n^{-1}\sum_{t=1}^{n}\sum_{s=1}^{n+d} b_{s-t+r}\,c_{s-t+h} = n^{-1}\sum_{t=1}^{n}\sum_{p=1-t}^{n+d-t} b_{p+r}\,c_{p+h}.$$

Now,

$$\left|n^{-1}\sum_{t=1}^{n}\sum_{p=1-t}^{n+d-t} b_{p+r}\,c_{p+h}\right| \le n^{-1}\sum_{t=1}^{n}\sum_{p=-\infty}^{\infty} |b_{p+r}|\,|c_{p+h}| < \infty,$$

with the inequality resulting from the inclusion of additional terms and the introduction of absolute values; the dominated inner sums converge to $\sum_{p=-\infty}^{\infty} b_{p+r} c_{p+h}$, and the Cesàro average converges to the same limit. ▲
Theorem 6.2.1. Let the time series $\{X_t\}$ be defined by

$$X_t = \sum_{j=-\infty}^{\infty} b_j e_{t-j},$$

where the sequence $\{b_j\}$ is absolutely summable and the $e_t$ are independent $(0, \sigma^2)$ random variables with $E\{e_t^4\} = \eta\sigma^4$. Then, for fixed h and q ($h \ge q \ge 0$),

$$\lim_{n\to\infty} n\operatorname{Cov}\{\tilde{\gamma}(h), \tilde{\gamma}(q)\} = (\eta - 3)\gamma(h)\gamma(q) + \sum_{p=-\infty}^{\infty}\big[\gamma(p)\gamma(p-h+q) + \gamma(p+q)\gamma(p-h)\big], \qquad (6.2.4)$$

where $\tilde{\gamma}(h)$ is defined in (6.2.1).
Proof. Using

$$E\{e_t e_u e_v e_w\} = \begin{cases} \eta\sigma^4, & t = u = v = w, \\ \sigma^4, & \text{if subscripts are equal in pairs but not all equal}, \\ 0, & \text{otherwise}, \end{cases}$$

it follows that

$$E\{X_t X_{t+h} X_s X_{s+q}\} = \gamma(h)\gamma(q) + \gamma(s-t)\gamma(s-t-h+q) + \gamma(s-t+q)\gamma(s-t-h) + (\eta - 3)\sigma^4\sum_{j} b_j b_{j+h} b_{j+s-t} b_{j+s-t+q}. \qquad (6.2.5)$$

Thus,

$$\operatorname{Cov}\{\tilde{\gamma}(h), \tilde{\gamma}(q)\} = [(n-h)(n-q)]^{-1}\sum_{t=1}^{n-h}\sum_{s=1}^{n-q}\Big[(\eta - 3)\sigma^4\sum_{j} b_j b_{j+h} b_{j+s-t} b_{j+s-t+q} + \gamma(s-t)\gamma(s-t-h+q) + \gamma(s-t+q)\gamma(s-t-h)\Big], \qquad (6.2.6)$$

and the conclusion follows by Lemma 6.2.1. ▲
For normal time series $\eta = 3$, and we have

$$\operatorname{Cov}\{\tilde{\gamma}(h), \tilde{\gamma}(q)\} \doteq n^{-1}\sum_{p=-\infty}^{\infty}\big[\gamma(p)\gamma(p-h+q) + \gamma(p+q)\gamma(p-h)\big].$$
Corollary 6.2.1.1. Given a time series $\{e_t: t \in (0, \pm 1, \pm 2, \ldots)\}$, where the $e_t$ are normal independent $(0, \sigma^2)$ random variables, then for $h, q \ge 0$,

$$\operatorname{Cov}\{\tilde{\gamma}_e(h), \tilde{\gamma}_e(q)\} = \begin{cases} 2\sigma^4 n^{-1}, & h = q = 0, \\ \sigma^4(n-h)^{-1}, & h = q \ne 0, \\ 0, & \text{otherwise.} \end{cases}$$

Proof. Reserved for the reader. ▲
In Theorem 6.2.1 the estimator was constructed assuming the mean to be known. As we have mentioned, the estimation of the unknown mean introduces a bias into the estimated covariance. However, the variance formulas presented in Theorem 6.2.1 remain valid approximations for the estimator defined in equation (6.2.2).
Theorem 6.2.2. Given fixed $h \ge q \ge 0$ and a time series $X_t$ satisfying the assumptions of Theorem 6.2.1,

$$E\{\hat{\gamma}^{+}(h)\} = \gamma(h) - \operatorname{Var}\{\bar{x}_n\} + O(n^{-2})$$

and

$$\operatorname{Cov}\{\hat{\gamma}^{+}(h), \hat{\gamma}^{+}(q)\} = \operatorname{Cov}\{\tilde{\gamma}(h), \tilde{\gamma}(q)\} + O(n^{-2}).$$

Proof. Now

$$\hat{\gamma}^{+}(h) = \tilde{\gamma}(h) - (n-h)^{-1}\bar{x}_n\sum_{t=1}^{n-h}(X_t + X_{t+h}) + \bar{x}_n^2,$$

and, taking expectations term by term, we have the first result. Since by an application of (6.2.5) we have $\operatorname{Var}\{\bar{x}_n^2\} = O(n^{-2})$, the second conclusion also follows. ▲
Using Theorems 6.2.1 and 6.2.2 and the results of Chapter 5, we can approximate the mean and variance of the estimated correlation function. If the mean is known, we consider the estimated autocorrelation

$$\tilde{r}(h) = [\tilde{\gamma}(0)]^{-1}\tilde{\gamma}(h),$$

and if the mean is unknown, the estimator

$$\hat{r}(h) = [\hat{\gamma}(0)]^{-1}\hat{\gamma}(h). \qquad (6.2.7)$$

If the denominator of the estimator is zero, we define the estimator to be zero.
Theorem 6.2.3. Let the time series $\{X_t\}$ be defined by

$$X_t = \sum_{j=-\infty}^{\infty} b_j e_{t-j},$$

where $\{b_j\}$ is absolutely summable and the $e_t$ are independent $(0, \sigma^2)$ random variables with $E\{e_t^4\} = \eta\sigma^4$. Then, for $h \ge q > 0$,

$$\operatorname{Cov}\{\hat{r}(h), \hat{r}(q)\} = n^{-1}\sum_{p=-\infty}^{\infty}\big[\rho(p)\rho(p-h+q) + \rho(p+q)\rho(p-h) - 2\rho(q)\rho(p)\rho(p-h) - 2\rho(h)\rho(p)\rho(p-q) + 2\rho(h)\rho(q)\rho^2(p)\big] + O(n^{-2}) \qquad (6.2.8)$$

and

$$E\{\hat{r}(h)\} = \rho(h) + [\gamma(0)]^{-2}\big[\rho(h)\operatorname{Var}\{\hat{\gamma}(0)\} - \operatorname{Cov}\{\hat{\gamma}(h), \hat{\gamma}(0)\}\big] + O(n^{-2}).$$

Proof. The estimated autocorrelations are bounded and are differentiable functions of the estimated covariances on a closed set containing the true parameter vector as an interior point. Furthermore, the derivatives are bounded on that set. Hence, the conditions of Theorem 5.4.3 are met with $\{\hat{\gamma}(h), \hat{\gamma}(q), \hat{\gamma}(0)\}$ playing the role of $\{X_n\}$ of that theorem. Since the function $\hat{r}(h)$ is bounded, we take $a = 1$.

Expanding $[\hat{r}(h) - \rho(h)][\hat{r}(q) - \rho(q)]$ through third order terms and using Theorem 5.4.1 to establish that the expected value of the third order moments is $O(n^{-2})$, we have the result (6.2.8). The remaining results are established in a similar manner. ▲
For the first order autoregressive time series $X_t = \rho X_{t-1} + e_t$ it is relatively easy to evaluate the variances of the estimated autocorrelations. We have, for $h > 0$,

$$\operatorname{Var}\{\hat{r}(h)\} \doteq n^{-1}\left[\frac{(1 + \rho^2)(1 - \rho^{2h})}{1 - \rho^2} - 2h\rho^{2h}\right]. \qquad (6.2.9)$$

We note that for large h,

$$\operatorname{Var}\{\hat{r}(h)\} \doteq \frac{n-h}{n^2}\,\frac{1 + \rho^2}{1 - \rho^2}. \qquad (6.2.10)$$

For a time series where the correlations approach zero rapidly, the variance of $\hat{r}(h)$ for large h can be approximated by the first term of (6.2.8). That is, for such a time series and for h such that $\rho(h) \doteq 0$, we have

$$\operatorname{Var}\{\hat{r}(h)\} \doteq \frac{1}{n}\sum_{p=-\infty}^{\infty}\rho^2(p). \qquad (6.2.11)$$
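The large-lag approximation (6.2.11) can be evaluated for an AR(1), where $\rho(p) = \rho^{|p|}$ (the parameter values are my own illustration):

```python
import numpy as np

# Sketch (my own check) of the large-lag approximation (6.2.11) for an
# AR(1) with rho(p) = rho^|p|: sum_p rho(p)^2 = (1 + rho^2)/(1 - rho^2).
rho, n = 0.7, 100
p = np.arange(-500, 501)                      # truncated infinite sum
var_approx = np.sum(rho ** (2 * np.abs(p))) / n
closed_form = (1 + rho**2) / ((1 - rho**2) * n)
print(var_approx, closed_form)                # both about 0.0292
```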
We are particularly interested in the behavior of the estimated autocorrelations for a time series of independent random variables, since this is often a working hypothesis in time series analysis.
Corollary 6.2.3. Let $\{e_t\}$ be a sequence of independent $(0, \sigma^2)$ random variables with finite sixth moment. Then, for $h \ge q > 0$,

$$E\{\hat{r}(h)\} = -\frac{n-h}{n(n-1)} + O(n^{-2})$$

and

$$\operatorname{Cov}\{\hat{r}(h), \hat{r}(q)\} = \begin{cases} n^{-2}(n-h) + O(n^{-2}), & h = q, \\ O(n^{-2}), & \text{otherwise.} \end{cases}$$

Proof. Omitted. ▲
For large n the bias in $\hat{r}(h)$ is negligible. However, it is easy to reduce the bias in the null case of independent random variables. It is suggested that the estimator

$$\bar{r}(h) = \hat{r}(h) + \frac{n-h}{n(n-1)} \qquad (6.2.12)$$

be used for hypothesis testing when only a small number of observations are available. For time series of the type specified in Corollary 6.2.3 and $h, q > 0$,

$$E\{\bar{r}(h)\} = O(n^{-2}).$$
In the next section we prove that the $\hat{r}(h)$ and $\bar{r}(h)$ are approximately normally distributed. The approximate distribution of the autocorrelations will be adequate for most purposes, but we mention one exact distributional result. A statistic closely related to the first order autocorrelation is the von Neumann ratio:¹

$$d_n = \frac{\sum_{t=2}^{n}(X_t - X_{t-1})^2}{\sum_{t=1}^{n}(X_t - \bar{x}_n)^2}. \qquad (6.2.13)$$

We see that

$$d_n = 2 - 2\hat{r}(1) - \frac{(X_1 - \bar{x}_n)^2 + (X_n - \bar{x}_n)^2}{\sum_{t=1}^{n}(X_t - \bar{x}_n)^2}.$$

If the $X_t$ are normal independent $(\mu, \sigma^2)$ random variables, it is possible to show that $E\{d_n\} = 2$. Therefore, $r_u = -\tfrac{1}{2}(d_n - 2)$ is an unbiased estimator of zero in that case. von Neumann (1941) and Hart (1942) have given the exact distribution of $r_u$ under the assumption that $X_t$ is a sequence of normal independent $(\mu, \sigma^2)$ random variables. Tables of percentage points are given in Hart (1942) and in Anderson (1971, p. 345). Inspection of these tables demonstrates that the
¹ The ratio is sometimes defined with the multiplier $n/(n-1)$.
percentage points of $t_v = r_u(n+1)^{1/2}(1 - r_u^2)^{-1/2}$ are approximately those of Student's t with $n + 3$ degrees of freedom for n greater than 10.

Clearly the distribution of $\bar{r}(1)[1 - \bar{r}^2(1)]^{-1/2}(n+1)^{1/2}$, where $\bar{r}(1)$ is defined in (6.2.12), is close to the distribution of $t_v$ and therefore may also be approximated by Student's t with $n + 3$ degrees of freedom when the observations are independent normal random variables.
Kendall and Stuart (1966) and Anderson (1971) present discussions of the distributional theory of statistics such as $d_n$.
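A short sketch (my own illustration, with simulated data) of the von Neumann ratio and its algebraic relation to $\hat{r}(1)$:

```python
import numpy as np

# Sketch (my own illustration) of the von Neumann ratio (6.2.13) and the
# exact identity relating it to the first-order autocorrelation.
rng = np.random.default_rng(1)
x = rng.standard_normal(50)
n, xbar = x.size, x.mean()

d = np.sum(np.diff(x) ** 2) / np.sum((x - xbar) ** 2)
r1 = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / np.sum((x - xbar) ** 2)

# identity: d = 2 - 2*r1 - [(x_1 - xbar)^2 + (x_n - xbar)^2] / sum (x_t - xbar)^2
end_terms = ((x[0] - xbar) ** 2 + (x[-1] - xbar) ** 2) / np.sum((x - xbar) ** 2)
print(d, 2 - 2 * r1 - end_terms)   # equal up to rounding
```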
6.3. CENTRAL LIMIT THEOREMS FOR STATIONARY TIME SERIES
The results of this chapter have, so far, been concerned with the mean and variance of certain sample statistics computed from a single realization. In order to perform tests of hypotheses or set confidence limits for the underlying parameters, some distribution theory is required.
That the mean of a finite moving average is, in the limit, normally distributed is a simple extension of the central limit theorem of Section 5.3.
Proposition 6.3.1. Let $\{X_t: t \in (0, \pm 1, \pm 2, \ldots)\}$ be defined by

$$X_t = \mu + \sum_{j=0}^{m} b_j e_{t-j},$$

where $b_0 = 1$, $\sum_{j=0}^{m} b_j \ne 0$, and the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables. Assume $n^{1/2}\bar{e}_n$ converges in distribution to a $N(0, \sigma^2)$ random variable. Then

$$n^{1/2}(\bar{x}_n - \mu) \xrightarrow{d} N\left(0, \left(\sum_{j=0}^{m} b_j\right)^2\sigma^2\right).$$

Proof. We have

$$n^{1/2}(\bar{x}_n - \mu) = \left(\sum_{j=0}^{m} b_j\right)n^{-1/2}\sum_{t=1}^{n} e_t + n^{-1/2}\sum_{s=0}^{m-1}\sum_{j=s+1}^{m} b_j e_{-s} - n^{-1/2}\sum_{s=0}^{m-1}\sum_{j=s+1}^{m} b_j e_{n-s}.$$

Both $n^{-1/2}\sum_{s=0}^{m-1}\sum_{j=s+1}^{m} b_j e_{-s}$ and $n^{-1/2}\sum_{s=0}^{m-1}\sum_{j=s+1}^{m} b_j e_{n-s}$ converge in probability to zero as n increases, by Chebyshev's inequality.
Therefore, the limiting distribution of $n^{1/2}(\bar{x}_n - \mu)$ is the limiting distribution of $n^{1/2}\left(\sum_{j=0}^{m} b_j\right)\bar{e}_n$, and the result follows. ▲

If $\sum_{j=0}^{m} b_j = 0$, then $\bar{x}_n - \mu$ has a variance that approaches zero at a rate faster than $n^{-1}$. The reader can verify this by considering, for example, the time series $X_t = \mu + e_t - e_{t-1}$. In such a case the result holds in the sense that $n^{1/2}(\bar{x}_n - \mu)$ is converging to the singular (zero variance) normal distribution.

Moving average time series of independent random variables are special cases of a more general class of time series called m-dependent.
Definition 6.3.1. The sequence of random variables $\{Z_t: t \in (0, \pm 1, \pm 2, \ldots)\}$ is said to be m-dependent if $s - r > m$, where m is a positive integer, implies that the two sets

$$\{\ldots, Z_{r-2}, Z_{r-1}, Z_r\}, \qquad \{Z_s, Z_{s+1}, Z_{s+2}, \ldots\}$$

are independent.

We give a theorem for such time series due to Hoeffding and Robbins (1948).
Theorem 6.3.1. Let $\{Z_t: t \in (1, 2, \ldots)\}$ be a sequence of m-dependent random variables with $E\{Z_t\} = 0$, $E\{Z_t^2\} = \sigma_t^2 \le \beta < \infty$, and $E\{|Z_t|^{2+2\delta}\} \le \beta^{2+2\delta}$ for some $\delta > 0$. Let the limit

$$\lim_{p\to\infty} p^{-1}\sum_{j=1}^{p} A_{t+j} = A,$$

$A \ne 0$, be uniform for $t = 1, 2, \ldots$, where

$$A_t = E\{Z_t^2\} + 2\sum_{j=1}^{m} E\{Z_t Z_{t-j}\}.$$

Then

$$n^{-1/2}\sum_{t=1}^{n} Z_t \xrightarrow{d} N(0, A).$$
Proof. Fix an $\alpha$ with $0 < \alpha < 0.25 - \varepsilon_\delta$, where

$$0.25 > \varepsilon_\delta > \max\{0,\; 0.25(1 + 2\delta)^{-1}(1 - 2\delta)\}.$$

Let k be the largest integer less than $n^{\alpha}$, and let p be the largest integer less than $k^{-1}n$. Define

$$S_p = n^{-1/2}\sum_{i=1}^{p} Y_i, \qquad Y_i = \sum_{j=k(i-1)+1}^{ki-m} Z_j.$$
Then the difference

$$D_n = n^{-1/2}\sum_{t=1}^{n} Z_t - S_p = n^{-1/2}\left[\sum_{i=1}^{p-1}\sum_{j=1}^{m} Z_{ki-m+j} + \sum_{t=pk-m+1}^{n} Z_t\right].$$

For n large enough so that $k > 2m$, the sums $Y_i$, $i = 1, 2, \ldots, p$, are independent. By the assumption on the moments, we have

$$\operatorname{Var}\left\{\sum_{j=1}^{m} Z_{ki-m+j}\right\} \le m^2\beta^2 \quad\text{and}\quad \operatorname{Var}\left\{\sum_{t=pk-m+1}^{n} Z_t\right\} \le (k+m)^2\beta^2.$$

It follows that

$$\operatorname{Var}\{D_n\} \le n^{-1}\beta^2\{m^2(p-1) + (k+m)^2\} = o(1),$$

and $n^{-1/2}\sum_{t=1}^{n} Z_t$ converges in mean square to $S_p$. Since $Z_t$ is correlated only with $Z_{t-m}, \ldots, Z_{t-1}, Z_{t+1}, \ldots, Z_{t+m}$, the addition of a $Z_t$ to a sum containing the m preceding terms $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-m}$ increases the variance by the amount $A_t$. Since $\operatorname{Var}\{\sum_{j=1}^{m} Z_{k(i-1)+j}\} \le m^2\beta^2$ and $p^{-1}\sum_{j=1}^{p} A_{t+j}$ converges uniformly to A, we have

$$\lim_{n\to\infty}\operatorname{Var}\{S_p\} = A.$$

The $Y_i$ are independent with uniformly bounded $2 + 2\delta$ moments. Hence, $p^{-1/2}\sum_{i=1}^{p} k^{-1/2}Y_i$ satisfies the conditions of Liapounov's central limit theorem, and the result follows. ▲
For the mth order moving average, the $A_t$ of Theorem 6.3.1 is the same for all t and is

$$\gamma(0) + 2\sum_{j=1}^{m}\gamma(j) = \sum_{j=0}^{m} b_j^2\sigma^2 + 2\sum_{j=1}^{m}\sum_{s=0}^{m-j} b_s b_{s+j}\sigma^2 = \left(\sum_{j=0}^{m} b_j\right)^2\sigma^2,$$

which agrees with Proposition 6.3.1.
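The identity above can be checked numerically; the MA(2) weights below are my own illustration:

```python
import numpy as np

# Sketch (my own check): for an MA(m) X_t = sum_j b_j e_{t-j}, the limit
# variance A of Theorem 6.3.1 equals gamma(0) + 2*sum_{j=1}^m gamma(j)
# = (sum_j b_j)^2 * sigma^2.
b = np.array([1.0, 0.4, -0.3])          # illustrative MA(2) weights
sigma2 = 2.0
m = b.size - 1

gamma = [sigma2 * np.sum(b[: b.size - j] * b[j:]) for j in range(m + 1)]
A = gamma[0] + 2 * np.sum(gamma[1:])
print(A, b.sum() ** 2 * sigma2)         # both 2.42
```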
The results of Proposition 6.3.1 and Theorem 6.3.1 may be generalized to infinite moving averages of independent random variables. Our proofs follow those of Diananda (1953) and Anderson (1959, 1971). We first state a lemma required in the proofs of the primary theorems.
Lemma 6.3.1. Let the random variables $\xi_n$ with distribution functions $F_{\xi_n}(z)$ be defined by

$$\xi_n = S_{kn} + D_{kn}$$

for $k = 1, 2, \ldots$ and $n = 1, 2, \ldots$. Let

$$\operatorname{plim}_{k\to\infty} D_{kn} = 0$$

uniformly in n. Assume that, for each fixed k, $F_{S_{kn}}(z) \to F_{S_k}(z)$ as $n \to \infty$ at the continuity points of $F_{S_k}(z)$, and that $F_{S_k}(z) \to F_\xi(z)$ as $k \to \infty$ at the continuity points of $F_\xi(z)$. Then

$$F_{\xi_n}(z) \to F_\xi(z) \quad\text{as } n \to \infty$$

at every continuity point z of $F_\xi(z)$.
Proof. Let $z_0$ be a point of continuity of $F_\xi(z)$, and fix $\delta > 0$. There is an $\varepsilon > 0$ such that

$$|F_\xi(z) - F_\xi(z_0)| < 0.25\delta$$

for z in $[z_0 - \varepsilon, z_0 + \varepsilon]$, and $F_\xi(z)$ is continuous at $z_0 - \varepsilon$ and at $z_0 + \varepsilon$. By hypothesis, there exists a $K_0$ such that

$$|F_{S_k}(z) - F_\xi(z)| < 0.25\delta$$

for $k > K_0$ at the continuity points in question. This means that

$$|F_{S_k}(z) - F_\xi(z_0)| < 0.5\delta$$

for $k > K_0$ and $z_0 - \varepsilon \le z \le z_0 + \varepsilon$. Now there is a $K_1$ such that for $k > K_1$,

$$P\{|D_{kn}| > \varepsilon\} < 0.25\delta$$

for all n. By the arguments used in the proof of Theorem 5.2.1, this implies that

$$F_{S_{kn}}(z - \varepsilon) - 0.25\delta \le F_{\xi_n}(z) \le F_{S_{kn}}(z + \varepsilon) + 0.25\delta$$

for all z. Fix k at the maximum of $K_0$ and $K_1$. Let $z_1$ and $z_2$ be continuity points of $F_{S_k}(z)$ such that $z_0 - \varepsilon \le z_1 < z_0$ and $z_0 < z_2 \le z_0 + \varepsilon$. Then there exists an N such that for $n > N$,

$$|F_{S_{kn}}(z_1) - F_{S_k}(z_1)| < 0.25\delta, \qquad |F_{S_{kn}}(z_2) - F_{S_k}(z_2)| < 0.25\delta.$$

Therefore, for $n > N$,

$$F_\xi(z_0) - \delta \le F_{S_k}(z_1) - 0.5\delta \le F_{S_{kn}}(z_1) - 0.25\delta \le F_{\xi_n}(z_0) \le F_{S_{kn}}(z_2) + 0.25\delta \le F_{S_k}(z_2) + 0.5\delta \le F_\xi(z_0) + \delta. \qquad ▲$$

The following two lemmas are the convergence in probability and almost sure results analogous to the convergence in distribution result of Lemma 6.3.1.
Lemma 6.3.2. Let $(\xi_n, S_{kn}, D_{kn})$, $k = 1, 2, \ldots$, $n = 1, 2, \ldots$, be random variables defined on the probability space $(\Omega, \mathcal{A}, P)$ satisfying

$$\xi_n = S_{kn} + D_{kn}. \qquad (6.3.1)$$

Assume:

(i) one has

$$\operatorname{plim}_{k\to\infty} D_{kn} = 0 \quad\text{uniformly in } n; \qquad (6.3.2)$$

(ii) for every fixed k there is an $S_k$ satisfying

$$\operatorname{plim}_{n\to\infty}(S_{kn} - S_k) = 0; \qquad (6.3.3)$$

(iii) one has

$$\operatorname{plim}_{k\to\infty}(S_k - \xi) = 0. \qquad (6.3.4)$$

Then

$$\operatorname{plim}_{n\to\infty}(\xi_n - \xi) = 0.$$

Proof. Omitted. ▲
Lemma 6.3.3. Let the assumption (6.3.1) of Lemma 6.3.2 hold. Replace the assumptions (6.3.2), (6.3.3), and (6.3.4) with the following:

(i) one has

$$\lim_{k\to\infty} D_{kn} = 0 \text{ a.s., uniformly in } n; \qquad (6.3.5)$$

(ii) for every fixed k, there is an $S_k$ satisfying

$$\lim_{n\to\infty}(S_{kn} - S_k) = 0 \text{ a.s.}; \qquad (6.3.6)$$

(iii) one has

$$\lim_{k\to\infty}(S_k - \xi) = 0 \text{ a.s.} \qquad (6.3.7)$$

Then

$$\lim_{n\to\infty}(\xi_n - \xi) = 0 \text{ a.s.}$$

Proof. Omitted. ▲
We use Lemma 6.3.2 to prove that the sample mean of an infinite moving average converges in probability to the population mean under weak conditions on the $e_t$. For example, the conclusion holds for independently identically distributed $e_t$ with zero mean. Corollary 6.1.1.2 used finite variance to obtain a stronger conclusion.
Theorem 6.3.2. Let the time series $X_t$ be defined by

$$X_t = \mu + \sum_{j=-\infty}^{\infty} b_j e_{t-j},$$

where $\{b_j\}$ is absolutely summable and the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables. Then

$$\operatorname{plim}_{n\to\infty} \bar{x}_n = \mu.$$
Proof. We apply Lemma 6.3.2. Let $\xi_n = \bar{x}_n - \mu$,

$$S_{kn} = \sum_{|j|\le k} b_j\left(n^{-1}\sum_{t=1}^{n} e_{t-j}\right) \qquad (6.3.8)$$

and

$$D_{kn} = \sum_{|j|>k} b_j\left(n^{-1}\sum_{t=1}^{n} e_{t-j}\right).$$

By the assumption that the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables, $n^{-1}\sum_{t=1}^{n} e_{t-j}$ converges to zero in probability as $n \to \infty$. Hence, the sum (6.3.8) converges to zero $(= S_k)$ as $n \to \infty$ for every fixed k, and condition (ii) of Lemma 6.3.2 is satisfied. Condition (iii) is trivially satisfied for $\xi = 0$. Now

$$P\{|D_{kn}| > \varepsilon\} \le \varepsilon^{-2}\operatorname{Var}\{D_{kn}\} \le M\varepsilon^{-2}\sum_{|j|>k} |b_j|$$

for some finite M, by Chebyshev's inequality. Hence, condition (i) of Lemma 6.3.2 is satisfied and the conclusion follows. ▲
Using Lemma 6.3.1, we show that if the $e_t$ of an infinite moving average satisfy a central limit theorem, then the moving average also satisfies a central limit theorem.
Theorem 6.3.3. Let $X_t$ be a covariance stationary time series satisfying

$$X_t = \mu + \sum_{j=-\infty}^{\infty} a_j e_{t-j}, \qquad \sum_{j=-\infty}^{\infty} |a_j| < \infty, \qquad \sum_{j=-\infty}^{\infty} a_j \ne 0,$$

where the $e_t$ are uncorrelated $(0, \sigma^2)$ random variables such that $n^{1/2}\bar{e}_n$ converges in distribution to a $N(0, \sigma^2)$ random variable. Then

$$n^{1/2}(\bar{x}_n - \mu) \xrightarrow{d} N(0, V),$$

where

$$V = \left(\sum_{j=-\infty}^{\infty} a_j\right)^2\sigma^2.$$
Proof. Let

$$W_{kt} = \sum_{|j|\le k} a_j e_{t-j}, \qquad D_{kn} = n^{1/2}(\bar{x}_n - \mu) - n^{-1/2}\sum_{t=1}^{n} W_{kt} = n^{-1/2}\sum_{t=1}^{n}\sum_{|j|>k} a_j e_{t-j},$$

and define the normalized sums $S_{kn} = n^{-1/2}\sum_{t=1}^{n} W_{kt}$. For a fixed k, the tail process $\sum_{|j|>k} a_j e_{t-j}$ is a stationary time series with absolutely summable covariance function $\gamma_k(h)$, and

$$\operatorname{Var}\{D_{kn}\} \le \sum_{h=-(n-1)}^{n-1} |\gamma_k(h)| \le \sigma^2\left(\sum_{|j|>k} |a_j|\right)^2.$$

Therefore, by Chebyshev's inequality, $D_{kn}$ converges to zero as $k \to \infty$ uniformly in n. For fixed k, $W_{kt}$ is a finite moving average of the $e_t$ that satisfies a central limit theorem. Therefore, by Proposition 6.3.1, as n tends to infinity,

$$S_{kn} \xrightarrow{d} N\left(0, \left(\sum_{|j|\le k} a_j\right)^2\sigma^2\right).$$

As k increases, the variance of the limiting distribution converges to $\left(\sum_{j=-\infty}^{\infty} a_j\right)^2\sigma^2$. The conclusion follows by Lemma 6.3.1. ▲
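A Monte Carlo sketch (my own, with assumed parameter values) of Theorem 6.3.3 for the AR(1), which is an infinite moving average with weights $a^j$:

```python
import numpy as np

# Monte Carlo sketch (illustrative, not from the text) of Theorem 6.3.3:
# for the AR(1) x_t = a*x_{t-1} + e_t (an infinite moving average with
# weights a^j), n^(1/2) * x_bar_n is approximately N(0, V) with
# V = (sum_j a^j)^2 * sigma^2 = sigma^2/(1 - a)^2.
rng = np.random.default_rng(42)
a, sigma, n, burn, reps = 0.5, 1.0, 500, 100, 1000
V = sigma**2 / (1 - a) ** 2              # = 4.0

stats = np.empty(reps)
for r in range(reps):
    e = rng.normal(0, sigma, n + burn)
    x = np.empty(n + burn)
    x[0] = e[0]
    for t in range(1, n + burn):         # AR(1) recursion with burn-in
        x[t] = a * x[t - 1] + e[t]
    stats[r] = np.sqrt(n) * x[burn:].mean()

print(stats.var())                        # close to V = 4.0
```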
The stationary autoregressive moving average is a special case of an infinite moving average. Because the weights decline exponentially for the autoregressive moving average, the sample mean of $Y_t$ can be expressed as a function of the sample mean of the $e_t$ with remainder of order in probability $n^{-1}$.
Corollary 6.3.3. Let $Y_t$ be a stationary finite order autoregressive moving average satisfying

$$\sum_{j=0}^{p} a_j (Y_{t-j} - \mu) = \sum_{j=0}^{q} b_j e_{t-j},$$

where $a_0 = 1$, $b_0 = 1$, the roots of the autoregressive characteristic equation are less than one in absolute value, and $\{e_t\}$ is a sequence of uncorrelated $(0, \sigma^2)$ random variables. Then

$$\bar{y}_n - \mu = \left(\sum_{j=0}^{\infty} w_j\right)\bar{e}_n + O_p(n^{-1}),$$

where the $w_j$ are defined in Theorem 2.7.1 and $\sum_{j=0}^{\infty} w_j = \left(\sum_{j=0}^{p} a_j\right)^{-1}\sum_{j=0}^{q} b_j$. If $n^{1/2}\bar{e}_n$ converges in distribution to a $N(0, \sigma^2)$ random variable, then

$$n^{1/2}(\bar{y}_n - \mu) \xrightarrow{d} N\left(0, \left(\sum_{j=0}^{\infty} w_j\right)^2\sigma^2\right).$$

Proof. By the representation of Theorem 2.7.1,

$$n(\bar{y}_n - \mu) = \sum_{t=1}^{n}\sum_{i=0}^{\infty} w_i e_{t-i} = \left(\sum_{i=0}^{\infty} w_i\right)\sum_{t=1}^{n} e_t + R_n,$$

where the remainder $R_n$ collects the terms in $e_t$ for $t \le 0$ and the tail weights. Now, by Exercise 2.24, there is some M and $0 < \lambda < 1$ such that $|w_j| < M\lambda^j$ for all $j \ge 0$. Hence, there is a K such that $\operatorname{Var}\{R_n\} \le K$ for all n. Dividing the variances by $n^2$, we obtain the first result. The limiting distribution result is an immediate consequence of the order in probability result and the assumption that $n^{1/2}\bar{e}_n$ converges in distribution. ▲
We now obtain a central limit theorem for a linear function of a realization of a stationary time series. We shall state the assumptions of this theorem in a manner that permits us to use the Lindeberg central limit theorem.
Theorem 6.3.4. Let $\{X_t: t \in T = (0, \pm 1, \pm 2, \ldots)\}$ be a time series defined by

$$X_t = \sum_{j=0}^{\infty} a_j e_{t-j},$$

where $\sum_{j=0}^{\infty} |a_j| < \infty$, and the $e_t$ are independent $(0, \sigma^2)$ random variables with distribution functions $F_t(e)$ such that

$$\lim_{L\to\infty}\,\sup_{t\in T}\int_{|e|>L} e^2\,dF_t(e) = 0.$$

Furthermore, let $\{C_t\}_{t=1}^{\infty}$ be a sequence of fixed real numbers satisfying

(i) $\lim_{n\to\infty}\left(\sum_{t=1}^{n} C_t^2\right)^{-1}\max_{1\le t\le n} C_t^2 = 0$;

(ii) $V = \lim_{n\to\infty}\left(\sum_{t=1}^{n} C_t^2\right)^{-1}\operatorname{Var}\left\{\sum_{t=1}^{n} C_t X_t\right\}$ exists and is nonzero.

Then

$$\left(\sum_{t=1}^{n} C_t^2\right)^{-1/2}\sum_{t=1}^{n} C_t X_t \xrightarrow{d} N(0, V).$$
Proof. Following the proof of Theorem 6.3.3, we set

$$X_t = X_{kt} + W_{kt}, \qquad X_{kt} = \sum_{j=0}^{k} a_j e_{t-j}, \qquad W_{kt} = \sum_{j=k+1}^{\infty} a_j e_{t-j},$$

and note that

$$\operatorname{Var}\left\{\left(\sum_{t=1}^{n} C_t^2\right)^{-1/2}\sum_{t=1}^{n} C_t W_{kt}\right\} \le \sum_{h=-(n-1)}^{n-1} |\gamma_{W_k}(h)|,$$

which, by the proof of Theorem 6.3.3, can be made arbitrarily small by choosing k sufficiently large.

For fixed k, $X_{kt}$ is a finite moving average, and, following the proof of Theorem 6.3.1, we have

$$\sum_{t=1}^{n} C_t X_{kt} = \sum_{t=1}^{n} b_t e_t + \Delta_n,$$

where $b_t = \sum_{j=0}^{k} a_j C_{t+j}$, the remainder $\Delta_n$ collects end effects, and the random variables $b_t e_t$ are independent with mean zero and variance $b_t^2\sigma^2$. Let $V_n = \sum_{t=1}^{n} b_t^2\sigma^2$,
n
S, = V i l " 2 brer, 1 - 1
Mn= sup b , , 2
t S r S a
and consider, for S > 0,
where
and
R, ={e: lel>M; 112 V, 112 6 ) .
Clearly, R, C R, for all t c n . Assumptions i and ii imply that
2 - 1 n Because V=lim,,, (X:-] C,) Z,,, bgo2#0, we have
which, in turn, implies that Jima+- M,l lZV, l 'Z = 00. By assumption, the supremum of the integral over R, goes to zero. Hence, the
assumptions of the Lindeberg central limit theorem are met, and S,, converges in distribution to a nonnal random variable with zero mean and unit variance. Since
CENTRAL LIMIT THEOREMS FOR STATIONARY TIME SERIES 331
and
the result follows by Lemma 6.3.1. A
Because a stationary finite order autoregressive moving average time series can be expressed as an infinite moving average with absolutely summable covariance function, the conclusions of Theorem 6.3.4 hold for such time series. Also, the condition that the $e_t$ are independent $(0, \sigma^2)$ random variables can be replaced by the condition that the $e_t$ are martingale differences with $E\{e_t^2 \mid \mathcal{A}_{t-1}\} = \sigma^2$ a.s. and bounded $2 + \delta$ ($\delta > 0$) moments, where $\mathcal{A}_{t-1}$ is the sigma-field generated by $e_{t-1}, e_{t-2}, \ldots$.

We now investigate the large sample properties of the estimated autocovariances and autocorrelations. We use Lemma 6.3.2 to demonstrate that the estimated autocovariances converge in probability to the true values. The assumptions and conclusions of Theorem 6.3.5 differ from those of Theorem 6.2.1. In Theorem 6.2.1, the existence of fourth moments enabled us to obtain variances of the estimated autocovariances. In Theorem 6.3.5, weaker assumptions are used to obtain convergence in probability of the sample autocovariances.
Theorem 6.3.5. Let the stationary time series $Y_t$ be defined by

$$Y_t - \mu = \sum_{j=-\infty}^{\infty} b_j e_{t-j},$$

where the $b_j$ are absolutely summable and the $e_t$ are independent identically distributed $(0, \sigma^2)$ random variables. Then

$$\operatorname{plim}_{n\to\infty}\tilde{\gamma}(h) = \gamma(h) \quad\text{and}\quad \operatorname{plim}_{n\to\infty}\hat{\gamma}(h) = \gamma(h).$$

Proof. Now

$$\tilde{\gamma}(h) = (n-h)^{-1}\sum_{t=1}^{n-h}\sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} b_i b_j e_{t-i}e_{t+h-j}.$$

Also,

$$\operatorname{plim}_{n\to\infty}(n-h)^{-1}\sum_{t=1}^{n-h} e_{t-i}^2 = \sigma^2$$

by Theorem 6.3.2.
To show that the term

$$(n-h)^{-1}\sum_{t=1}^{n-h}\sum_{i\ne j-h} b_i b_j e_{t-i}e_{t+h-j}$$

converges to zero, we apply Lemma 6.3.2, letting

$$\xi_n = (n-h)^{-1}\sum_{t=1}^{n-h}\sum_{i\ne j-h} b_i b_j e_{t-i}e_{t+h-j},$$

$$S_{kn} = (n-h)^{-1}\sum_{t=1}^{n-h}\sum_{\substack{|i|\le k,\,|j|\le k \\ i\ne j-h}} b_i b_j e_{t-i}e_{t+h-j},$$

and $D_{kn} = \xi_n - S_{kn}$, split into $D_{kn}^{(1)}$, $D_{kn}^{(2)}$, and $D_{kn}^{(3)}$ according to which of the indices exceed k in absolute value.

For fixed i, j with $i \ne j - h$, using Chebyshev's inequality, it can be proved that

$$\operatorname{plim}_{n\to\infty}(n-h)^{-1}\sum_{t=1}^{n-h} e_{t-i}e_{t+h-j} = 0.$$

Hence, for any fixed k, $\operatorname{plim}_{n\to\infty} S_{kn} = 0$. Now, by Chebyshev's inequality,

$$\operatorname{plim}_{k\to\infty} D_{kn}^{(1)} = 0 \quad\text{uniformly in } n.$$

In a similar manner, we can prove

$$\operatorname{plim}_{k\to\infty}[D_{kn}^{(2)} + D_{kn}^{(3)}] = 0 \quad\text{uniformly in } n,$$

and it follows that $\tilde{\gamma}(h)$ converges in probability to $\gamma(h)$. The result for $\hat{\gamma}(h)$ follows because, by the proof of Theorem 6.2.2, $\hat{\gamma}(h) - \tilde{\gamma}(h) = O_p(n^{-1})$. ▲
Corollary 6.3.5. Let the stationary time series $Y_t$ satisfy

$$Y_t - \mu = \sum_{j=-\infty}^{\infty} b_j e_{t-j},$$

where the $b_j$ are absolutely summable and the $e_t$ are martingale differences with $E\{e_t^2 \mid \mathcal{A}_{t-1}\} = \sigma^2$ a.s. and bounded $2 + \delta$ moments for some $\delta > 0$, where $\mathcal{A}_{t-1}$ is the sigma-field generated by $e_{t-1}, e_{t-2}, \ldots$. Then

$$\operatorname{plim}_{n\to\infty}\hat{\gamma}(h) = \gamma(h).$$

Proof. The proof parallels that of Theorem 6.3.5. ▲
The vector of a finite number of sample autocovariances converges in distribution to a normal vector under mild assumptions.
"heorern 63.6. Let U, be the stationary time series defined by
where the et are independent (0, v2) random variables with fourth moment 7 p 4
and finite 4 + S absolute moment for some S > 0, and the bj are absolutely summabie. Let K be fixed. Then the limiting distribution of n"2[sO) - do), K1) - Hl), . . . , 3K) - HK)]' is multivariate normal with mean zero and co- variance matrix V whose elements ace defined by (6.2.4) of Theorem 6.2.1.
Proof. The estimated covariance, for $h = 0, 1, 2, \ldots, K$, is

$$\hat{\gamma}(h) = n^{-1}\sum_{t=1}^{n-h}(Y_t - \mu)(Y_{t+h} - \mu) - n^{-1}(\bar{y}_n - \mu)\sum_{t=1}^{n-h}[(Y_t - \mu) + (Y_{t+h} - \mu)] + n^{-1}(n-h)(\bar{y}_n - \mu)^2, \qquad (6.3.9)$$

and the last two terms, when multiplied by $n^{1/2}$, converge in probability to zero. Therefore, in investigating the limiting distribution of $n^{1/2}[\hat{\gamma}(h) - \gamma(h)]$ we need only consider the first term on the right of (6.3.9). Let

$$Y_t - \mu = X_{mt} + W_{mt},$$

where

$$X_{mt} = \sum_{|j|\le m} b_j e_{t-j}, \qquad W_{mt} = \sum_{|j|>m} b_j e_{t-j},$$

and let $S_{mhn} = n^{-1/2}\sum_{t=1}^{n-h} X_{mt}X_{m,t+h}$. Following the proof of Theorem 6.2.1, we have that the variance of the difference

$$D_{mn} = n^{-1/2}\sum_{t=1}^{n-h}\big[(Y_t - \mu)(Y_{t+h} - \mu) - X_{mt}X_{m,t+h}\big]$$

is bounded by a quantity $L_{mn}$, where $L_{mn} \to 0$ uniformly in n as $m \to \infty$ because $\sum_{|j|>m} |b_j| \to 0$ as $m \to \infty$. Therefore, $D_{mn} \to 0$ uniformly in n as $m \to \infty$.
Consider a linear combination of the errors in the estimated covariances of $X_{mt}$,

$$S_m = n^{-1/2}\sum_{t=1}^{n}\sum_{h=0}^{K}\lambda_h[Z_{th} - \gamma_{X_m}(h)],$$

where the $\lambda_h$ are arbitrary real numbers (not all zero) and

$$Z_{th} = X_{mt}X_{m,t+h}, \qquad h = 0, 1, 2, \ldots, K.$$

Now, $Z_{th}$ is an $(m+h)$-dependent covariance stationary time series with mean $\gamma_{X_m}(h)$ and covariance function

$$\operatorname{Cov}\{Z_{th}, Z_{sq}\} = \gamma_m(s-t)\gamma_m(s-t-h+q) + \gamma_m(s-t+q)\gamma_m(s-t-h) + (\eta-3)\sigma^4\sum_j b_j^{(m)} b_{j+h}^{(m)} b_{j+s-t}^{(m)} b_{j+s-t+q}^{(m)}, \qquad (6.3.10)$$

where we have used (6.2.5), $\gamma_m(\cdot)$ is the covariance function of $X_{mt}$, and $b_j^{(m)} = b_j$ for $|j| \le m$ and is zero otherwise. Thus, the weighted average of the $Z_{th}$'s,

$$U_t = \sum_{h=0}^{K}\lambda_h Z_{th},$$

is a stationary time series. Furthermore, the time series $U_t$ is $(m+K)$-dependent and has finite $2 + \delta/2$ moment. Therefore, by Theorem 6.3.1, $S_m$ converges in distribution to a normal random variable. Since the $\lambda_h$ are arbitrary, the vector random variable

$$n^{1/2}[\tilde{\gamma}_{X_m}(0) - \gamma_{X_m}(0),\;\tilde{\gamma}_{X_m}(1) - \gamma_{X_m}(1),\;\ldots,\;\tilde{\gamma}_{X_m}(K) - \gamma_{X_m}(K)]'$$

converges in distribution to a multivariate normal by Theorem 5.3.3, where the covariance matrix is defined by (6.3.10). Because $E\{X_{mt}X_{m,t+h}\}$ converges to $E\{(Y_t - \mu)(Y_{t+h} - \mu)\}$ as $m \to \infty$, the conditions of Lemma 6.3.1 are satisfied and we obtain the conclusion. ▲
Theorem 6.3.6 can be proven for $e_t$ that are martingale differences. See Exercises 6.23 and 6.24, and Hannan and Heyde (1972).

Generally it is the estimated autocorrelations that are subjected to analysis, and hence their limiting distribution is of interest.
Corollary 6.3.6.1. Let the assumptions of Theorem 6.3.6 hold. Then the vector $n^{1/2}[\hat{r}(1) - \rho(1),\;\hat{r}(2) - \rho(2),\;\ldots,\;\hat{r}(K) - \rho(K)]'$ converges in distribution to a multivariate normal with mean zero and covariance matrix $G$, where the hqth element of $G$ is

$$\sum_{p=-\infty}^{\infty}\big[\rho(p)\rho(p-h+q) + \rho(p+q)\rho(p-h) - 2\rho(q)\rho(p)\rho(p-h) - 2\rho(h)\rho(p)\rho(p-q) + 2\rho(h)\rho(q)\rho^2(p)\big].$$

Proof. Since the $\hat{r}(h)$ are continuous differentiable functions of the $\hat{\gamma}(h)$, the result follows from Theorems 5.1.4, 6.2.3, and 6.3.6. ▲
Observe that if the original time series $X_t$ is a sequence of independent identically distributed random variables with finite moments, the sample correlations will be nearly independent in large samples. Because of the importance of this result in the testing of time series for independence, we state it as a corollary.
Corollary 6.3.6.2. Let the time series $e_t$ be a sequence of independent $(0, \sigma^2)$ random variables with uniformly bounded $4 + \delta$ moments for some $\delta > 0$. Let $\bar{r}(h)$ be defined by (6.2.12), and let K be a fixed integer. Then $n(n-h)^{-1/2}\bar{r}(h)$, $h = 1, 2, \ldots, K$, converge in distribution to independent normal (0, 1) random variables.

Proof. Omitted. ▲
Example 6.3.1. The quarterly seasonally adjusted United States unemployment rate from 1948 to 1972 is given in Table 6.3.1 and displayed in Figure 6.3.1. The mean unemployment rate is 4.77. The autocorrelation function estimated using (6.2.7) is given in Figure 6.3.2 and Table 6.3.2. This plot is sometimes called a correlogram.

To carry out statistical analyses we assume the time series can be treated as a stationary time series with finite sixth moment. If the original time series was a sequence of uncorrelated random variables, we would expect about 95% of the estimated correlations to fall between the lines plus and minus $1.96n^{-1}(n-h)^{1/2}$. Obviously unemployment is not an uncorrelated time series. Casual inspection of the correlogram might lead one to conclude that the time series contains a periodic component with a period of about 54 quarters, since the estimated correlations for h equal to 21 through 33 are negative and fall below the lower 1.96 sigma bound for an uncorrelated time series. However, because the time series is highly correlated at small lags, the variance of the estimated correlations at large lags is much larger than the variance of correlations computed from a white noise sequence.
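The $\pm 1.96n^{-1}(n-h)^{1/2}$ bounds are easy to reproduce; with $n = 100$ they match the third column of Table 6.3.2:

```python
import numpy as np

# Bounds +/- 1.96 * n^{-1} * (n - h)^{1/2} used to screen a correlogram
# against the white-noise hypothesis; n = 100 as in Example 6.3.1.
n = 100
h = np.arange(1, 41)
bound = 1.96 * np.sqrt(n - h) / n
print(bound[0], bound[-1])   # 0.1950 and 0.1518, matching Table 6.3.2
```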
The first few autocorrelations of the unemployment time series are in good agreement with those generated by the second order autoregressive process

$$X_t = 1.5356X_{t-1} - 0.6692X_{t-2} + e_t,$$

where the $e_t$ are uncorrelated (0, 0.1155) random variables. We shall discuss the estimation of the parameters of autoregressive time series in Chapter 8. However, the fact that the sample autocorrelations are consistent estimators of the population correlations permits us to obtain consistent estimators of the autoregressive parameters of a second order process from (2.5.7). The general agreement between the correlations for the second order autoregressive process and the sample correlations is clear from Table 6.3.2.
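A sketch of that consistent-estimator computation: solving the two Yule–Walker equations (the text cites (2.5.7)) with the sample correlations from Table 6.3.2 recovers coefficients close to those quoted:

```python
# Yule-Walker equations for an AR(2): rho(1) = th1 + th2*rho(1) and
# rho(2) = th1*rho(1) + th2.  Solved here with the sample correlations
# of Table 6.3.2; the text cites equation (2.5.7) for this step.
r1, r2 = 0.9200, 0.7436

theta1 = r1 * (1 - r2) / (1 - r1**2)
theta2 = (r2 - r1**2) / (1 - r1**2)
print(round(theta1, 4), round(theta2, 4))   # 1.5357 -0.6693, near the
                                            # quoted 1.5356 and -0.6692
```

The small discrepancy presumably comes from the text using unrounded sample correlations.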
The roots of the second order autoregressive process are $0.7678 \pm 0.2823\mathrm{i}$. We recall that the correlation function of such a second order process can be written as

$$\rho(h) = b_1 m_1^h + b_2 m_2^h, \qquad h = 0, 1, 2, \ldots,$$
Table 6.3.1. U.S. Unemployment Rate (Quarterly, Seasonally Adjusted), 1948–1972

Year   Q1    Q2    Q3    Q4
1948   3.73  3.67  3.77  3.83
1949   4.67  5.87  6.70  6.97
1950   6.40  5.57  4.63  4.23
1951   3.50  3.10  3.17  3.37
1952   3.07  2.97  3.23  2.83
1953   2.70  2.57  2.73  3.70
1954   5.27  5.80  5.97  5.33
1955   4.73  4.40  4.10  4.23
1956   4.03  4.20  4.13  4.13
1957   3.93  4.10  4.23  4.93
1958   6.30  7.37  7.33  6.37
1959   5.83  5.10  5.27  5.60
1960   5.13  5.23  5.53  6.27
1961   6.80  7.00  6.77  6.20
1962   5.63  5.53  5.57  5.53
1963   5.77  5.73  5.50  5.57
1964   5.47  5.20  5.00  5.00
1965   4.90  4.67  4.37  4.10
1966   3.87  3.80  3.77  3.70
1967   3.77  3.83  3.83  3.93
1968   3.73  3.57  3.53  3.43
1969   3.37  3.43  3.60  3.60
1970   4.17  4.80  5.17  5.87
1971   5.93  5.97  5.97  5.97
1972   5.83  5.77  5.53  5.30

Sources: Business Statistics, 1971 Biennial Edition, pp. 68 and 233, and Survey of Current Business, January 1972 and January 1973. Quarterly data are the averages of monthly data.
where

$$(b_1, b_2) = (m_1 - m_2)^{-1}\big(\rho(1) - m_2,\; m_1 - \rho(1)\big).$$

[Figure 6.3.1. United States quarterly seasonally adjusted unemployment rate.]

For the unemployment time series the estimated parameters are $\hat{b}_1 = 0.500 - 0.2706\mathrm{i}$ and $\hat{b}_2 = \hat{b}_1^{c} = 0.500 + 0.2706\mathrm{i}$. Using these values, we can estimate the variance of the estimated autocorrelations for large lags using (6.2.11). We have

$$\sum_{p=-\infty}^{\infty}\hat{\rho}^2(p) = 4.812.$$

Thus the estimated standard error of the estimated autocorrelations at large lags is about 0.22, and the observed correlations at lags near 27 could arise from such a process.
Given that the time series was generated by the second order autoregressive mechanism, the variance of the sample mean can be estimated by Corollary
[Figure 6.3.2. Correlogram of quarterly seasonally adjusted unemployment rate.]
6.1.1.2. By that corollary,

$$n\operatorname{Var}\{\bar{y}_n\} \doteq \sum_{h=-\infty}^{\infty}\hat{\gamma}(h) = \hat{\gamma}(0)\sum_{h=-\infty}^{\infty}\hat{\rho}(h) = \frac{0.1155}{(1 - 1.5356 + 0.6692)^2} = 6.47.$$

For this highly correlated process the variance of the mean is about five times that of an uncorrelated time series with the same variance. ▲▲
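Both numerical summaries of the example, $\sum_p \hat{\rho}^2(p) \approx 4.81$ and $n\operatorname{Var}\{\bar{y}_n\} \approx 6.47$, can be reproduced from the fitted AR(2):

```python
import numpy as np

# Sketch (my own check) of the two numerical summaries in Example 6.3.1,
# computed from the fitted AR(2) x_t = 1.5356 x_{t-1} - 0.6692 x_{t-2} + e_t.
th1, th2, s2 = 1.5356, -0.6692, 0.1155

# correlations from the Yule-Walker recursion rho(h) = th1*rho(h-1) + th2*rho(h-2)
rho = np.empty(400)
rho[0], rho[1] = 1.0, th1 / (1 - th2)
for h in range(2, 400):
    rho[h] = th1 * rho[h - 1] + th2 * rho[h - 2]
sum_rho_sq = rho[0] ** 2 + 2 * np.sum(rho[1:] ** 2)   # (6.2.11) quantity

# limiting n*Var of the mean: sigma^2 / (1 - th1 - th2)^2, per (6.1.1)
n_var_mean = s2 / (1 - th1 - th2) ** 2
print(round(sum_rho_sq, 2), round(n_var_mean, 2))     # 4.81 6.47
```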
6.4. ESTIMATION OF THE CROSS COVARIANCES
In our Section 1.7 discussion of vector valued time series we introduced the k × k
Table 6.3.2. Estimated Autocorrelations, Quarterly U.S. Seasonally Adjusted Unemployment Rate, 1948-1972

Lag   Estimated      1.96n⁻¹(n − h)^{1/2}   Correlations for
h     Correlations                          Second Order Process
 0      1.0000            -                   1.0000
 1      0.9200          0.1950                0.9200
 2      0.7436          0.1940                0.7436
 3      0.5348          0.1930                0.5262
 4      0.3476          0.1920                0.3105
 5      0.2165          0.1910                0.1247
 6      0.1394          0.1900               -0.0164
 7      0.0963          0.1890               -0.1085
 8      0.0740          0.1880               -0.1557
 9      0.0664          0.1870               -0.1665
10      0.0556          0.1859               -0.1515
11      0.0352          0.1849               -0.1212
12      0.0109          0.1839               -0.0848
13     -0.0064          0.1828               -0.0490
14     -0.0135          0.1818               -0.0186
15      0.0004          0.1807                0.0043
16      0.0229          0.1796                0.0190
17      0.0223          0.1786                0.0263
18     -0.0126          0.1775                0.0277
19     -0.0762          0.1764                0.0249
20     -0.1557          0.1753                0.0197
21     -0.2351          0.1742                0.0136
22     -0.2975          0.1731                0.0077
23     -0.3412          0.1720                0.0027
24     -0.3599          0.1709               -0.0010
25     -0.3483          0.1697               -0.0033
26     -0.3236          0.1686               -0.0044
27     -0.3090          0.1675               -0.0046
28     -0.3142          0.1663               -0.0041
29     -0.3299          0.1652               -0.0032
30     -0.3396          0.1640               -0.0022
31     -0.3235          0.1628               -0.0012
32     -0.2744          0.1616               -0.0004
33     -0.2058          0.1604                0.0002
34     -0.1378          0.1592                0.0006
35     -0.0922          0.1580                0.0007
36     -0.0816          0.1568                0.0008
37     -0.1027          0.1555                0.0007
38     -0.1340          0.1543                0.0005
39     -0.1590          0.1531                0.0004
40     -0.1669          0.1518                0.0002
covariance matrix

Γ(h) = E{(Xₜ − μ)(Xₜ₊ₕ − μ)′},

where Xₜ is a stationary k-dimensional time series and μ = E{Xₜ}. The jth diagonal element of the matrix is the autocovariance of X_{jt}, γ_{jj}(h) = E{(X_{jt} − μ_j)(X_{j,t+h} − μ_j)}, and the ijth element is the cross covariance between X_{it} and X_{j,t+h},

γ_{ij}(h) = E{(X_{it} − μ_i)(X_{j,t+h} − μ_j)}.
Expressions for the estimated cross covariance analogous to those of (6.2.1) and (6.2.3) are

γ̂_{ij}(h) = n⁻¹ Σ_{t=1}^{n−h} X_{it} X_{j,t+h},  h = 0, 1, …, n − 1,

for the means known and taken to be zero, and

γ̃_{ij}(h) = n⁻¹ Σ_{t=1}^{n−h} (X_{it} − x̄_i)(X_{j,t+h} − x̄_j),  h = 0, 1, …, n − 1,

where the unknown means are estimated by x̄_i = n⁻¹ Σ_{t=1}^{n} X_{it}. By (1.7.4), we can also write γ̂_{ij}(−h) = γ̂_{ji}(h). The corresponding estimators of the cross correlations are

r̂_{ij}(h) = [γ̂_{ii}(0)γ̂_{jj}(0)]^{−1/2} γ̂_{ij}(h)

and

r̃_{ij}(h) = [γ̃_{ii}(0)γ̃_{jj}(0)]^{−1/2} γ̃_{ij}(h).
By our earlier results (see Theorem 6.2.2) the estimation of the mean in estimator (6.4.2) introduces a bias that is O(n⁻¹) for time series with absolutely summable covariance function. The properties of the estimated cross covariances are analogous to the properties of the estimated autocovariance given in Theorems 6.2.1 and 6.2.3. To simplify the derivation, we only present the results for normal time series.
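The estimators above translate directly into code. The sketch below implements the mean-corrected estimator and its cross correlation; the simulated series are hypothetical stand-ins (the second series is the first delayed one period), not data from the text.

```python
import numpy as np

# Mean-corrected cross covariance estimator and cross correlation, a sketch
# of (6.4.2); for negative lags the convention gamma~_12(-h) = gamma~_21(h)
# from (1.7.4) is used.
def cross_cov(x, y, h):
    # n^{-1} sum_{t=1}^{n-h} (x_t - xbar)(y_{t+h} - ybar), h >= 0
    if h < 0:
        return cross_cov(y, x, -h)
    n = len(x)
    xd = x - x.mean()
    yd = y - y.mean()
    return np.sum(xd[: n - h] * yd[h:]) / n

def cross_corr(x, y, h):
    return cross_cov(x, y, h) / np.sqrt(cross_cov(x, x, 0) * cross_cov(y, y, 0))

rng = np.random.default_rng(0)
e = rng.normal(size=500)
x1 = e[1:]      # X_1t = e_{t+1}
x2 = e[:-1]     # X_2t = e_t, so X_{2,t+1} = X_1t: a one-period delay
print(round(cross_corr(x1, x2, 1), 2))
```

Because the delayed series matches the input at lag one, r̂₁₂(1) is near one while r̂₁₂(0) is near zero; under the null case discussed below the estimated cross correlations have variance near n⁻¹, so values several times n^{−1/2} signal real cross correlation.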
Theorem 6.4.1. Let the bivariate stationary normal time series Xₜ be such that
Then
where the remainder term enters because the mean is estimated. Evaluating the expectation, we have
Using Lemma 6.2.1 to take the limit, we have the stated result. A
Since the cross correlations are simple functions of the cross covariances, we can obtain a similar expression for the covariance of the estimated cross correlations.
Corollary 6.4.1.1. Given the bivariate stationary normal time series of Theorem 6.4.1,
Proof. By Theorem 5.5.1 we may use the first term in the Taylor series to obtain the leading term in the covariance expression. Evaluating Cov{γ̂₁₂(h), γ̂₁₂(q)}, Cov{γ̂₁₂(h), ½[γ̂₁₁(0) + γ̂₂₂(0)]}, Cov{γ̂₁₂(q), ½[γ̂₁₁(0) + γ̂₂₂(0)]}, and Var{½[γ̂₁₁(0) + γ̂₂₂(0)]} by the methods of Theorem 6.2.1, we obtain the conclusion. A
Perhaps the most important aspect of these rather cumbersome results is that the covariances are decreasing at the rate n⁻¹. Also, certain special cases are of interest. One working hypothesis is that the two time series are uncorrelated. If X₁ₜ is a sequence of independent normal random variables, we obtain a particularly simple result.
Corollary 6.4.1.2. Let Xₜ be a bivariate stationary normal time series satisfying

γ₁₁(h) = 0 for h ≠ 0,  Σ_{h=−∞}^{∞} |γ₂₂(h)| < ∞,

and

γ₁₂(h) = 0, all h.

Then

Cov{r̃₁₂(h), r̃₁₂(q)} ≐ n⁻¹ρ₂₂(q − h).
In the null case, the variance of the estimated cross correlation is approximately n⁻¹, and the correlation between estimated cross correlations is the autocorrelation of X₂ₜ multiplied by n⁻¹. If the two time series are independent and neither time series is autocorrelated, then the estimated cross correlations are uncorrelated with an approximate variance of n⁻¹.
It is possible to demonstrate that the sample covariances are consistent estimators under much weaker conditions.
Lemma 6.4.1. Let

X₁ₜ = Σ_{j=0}^{∞} αⱼ e_{1,t−j},   X₂ₜ = Σ_{j=0}^{∞} βⱼ e_{2,t−j},

where {αⱼ} and {βⱼ} are absolutely summable and {eₜ} = {(e₁ₜ, e₂ₜ)′} is a sequence of independent (0, Σ) random variables. Assume E{|e_{it}|^{2+δ}} < L for i = 1, 2 and some
δ > 0, or that the eₜ are identically distributed. Then

plim_{n→∞} n⁻¹ Σ_{t=1}^{n−h} X₁ₜ X₂,ₜ₊ₕ = γ₁₂(h).
Proof. Define
fix h, and consider
If j ≠ i − h,
where σ₁² is the variance of e₁ₜ. If j = i − h and σ₁₂ = E{e₁ₜe₂ₜ}, then
by the weak law of large numbers. [See, for example, Chung (1968).] Now
By Chebyshev's inequality
and it follows that
uniformly in n. Convergence in probability implies convergence in distribution, and by an application of Lemma 6.3.1 we have that n⁻¹ Σ_{t=1}^{n−h} X₁ₜX₂,ₜ₊ₕ converges in distribution to the constant γ₁₂(h). The result follows by Lemma 5.2.1. A
Theorem 6.4.2. Let {e₁ₜ} and {Xₜ} be independent time series, where {e₁ₜ} is a sequence of independent (0, σ₁²) random variables with uniformly bounded third moment and {Xₜ} is defined by
Xₜ = Σ_{j=0}^{∞} αⱼ e_{2,t−j},

where Σ_{j=0}^{∞} |αⱼ| < ∞ and {e₂ₜ} is a sequence of independent (0, σ₂²) random variables with uniformly bounded third moment. Then, for fixed h > 0,
Proof. We write
and note that, by Lemma 6.4.1,
For fixed h, the time series
is weakly stationary with
and bounded third moment. Asymptotic normality follows by a modest extension of Theorem 6.3.3. A
Example 6.4.1. In Table 6.4.1 we present the sample autocorrelations and
Table 6.4.1. Sample Correlation Functions for Suspended Sediment in Des Moines River at Boone, Iowa and Saylorville, Iowa
       Autocorrelation   Autocorrelation   Cross Correlation
h      Boone r̂₁₁(h)      Saylorville r̂₂₂(h)   Boone-Saylorville r̂₁₂(h)
-12      0.20              0.10               0.10
-11      0.19              0.13               0.07
-10      0.22              0.16               0.08
 -9      0.25              0.16               0.08
 -8      0.29              0.16               0.13
 -7      0.29              0.17               0.18
 -6      0.29              0.15               0.21
 -5      0.32              0.18               0.21
 -4      0.39              0.27               0.23
 -3      0.48              0.42               0.30
 -2      0.62              0.60               0.40
 -1      0.76              0.81               0.53
  0      1.00              1.00               0.64
  1      0.76              0.81               0.74
  2      0.62              0.60               0.67
  3      0.48              0.42               0.53
  4      0.39              0.27               0.42
  5      0.32              0.18               0.32
  6      0.29              0.15               0.26
  7      0.29              0.17               0.26
  8      0.29              0.16               0.25
  9      0.25              0.16               0.29
 10      0.22              0.16               0.31
 11      0.19              0.13               0.28
 12      0.20              0.10               0.33
cross correlations for the bivariate time series Yₜ = (Y₁ₜ, Y₂ₜ)′, where Y₁ₜ is the logarithm of suspended sediment in the water of the Des Moines River at Boone, Iowa, and Y₂ₜ is the logarithm of suspended sediment in the water at Saylorville, Iowa. Saylorville is approximately 48 miles downstream from Boone. The sample data were 205 daily observations collected from April to October 1973.
There are no large tributaries entering the Des Moines River between Boone and Saylorville, and a correlation between the readings at the two points is expected. Since Saylorville is some distance downstream, the correlation pattern should reflect the time required for water to move between the two points. In fact, the largest sample cross correlation is between the Saylorville reading at time t + 1 and the Boone reading at time t. Also, estimates of γ₁₂(h), h > 0, are consistently larger than the estimates of γ₁₂(−h).
The Boone time series was discussed in Section 4.5. There it was assumed that
the time series Y₁ₜ could be represented as the sum of the "true process" and a measurement error. The underlying true value X₁ₜ was assumed to be a first order autoregressive process with parameter 0.81. If we define the time series

W₁ₜ = Y₁ₜ − 0.81Y₁,ₜ₋₁,   W₂ₜ = Y₂ₜ − 0.81Y₂,ₜ₋₁,

the transformed true process for Boone is a sequence of uncorrelated random variables, although the observed time series W₁ₜ will show a small negative first order autocorrelation.
Table 6.4.2 contains the first few estimated correlations for Wₜ. Note that the estimated cross correlation at zero is quite small. Under the null hypothesis that the cross correlations are zero and that the autocorrelations of W₁ₜ and W₂ₜ are zero
Table 6.4.2. Sample Correlation Functions for Transformed Suspended Sediment in Des Moines River at Boone, Iowa and Saylorville, Iowa
       Autocorrelation   Autocorrelation   Cross Correlation
h      W₁ₜ               W₂ₜ               W₁ₜ with W₂ₜ
-12      0.05              0.05              0.04
-11     -0.06              0.02             -0.07
-10      0.03              0.08              0.05
 -9      0.01              0.01              0.11
 -8      0.11              0.04              0.03
 -7      0.05              0.07              0.08
 -6     -0.03             -0.06              0.10
 -5      0.04             -0.12              0.03
 -4     -0.01             -0.07             -0.05
 -3     -0.07              0.02             -0.01
 -2      0.05             -0.04             -0.04
 -1     -0.17              0.15              0.06
  0      1.00              1.00              0.06
  1     -0.17              0.15              0.41
  2      0.05             -0.04              0.24
  3     -0.07              0.02             -0.02
  4     -0.01             -0.07              0.01
  5      0.04             -0.12              0.04
  6     -0.03             -0.06             -0.10
  7      0.05              0.07              0.08
  8      0.11              0.04             -0.08
  9      0.01              0.01              0.05
 10      0.03              0.08              0.16
 11     -0.06              0.02              0.08
 12      0.05              0.05              0.01
after a lag of two, the estimated variance of the sample cross correlations is

(205)⁻¹[1 + 2(−0.17)(0.15) + 2(0.05)(−0.04)] = 0.0046.
Under these hypotheses, the estimated standard error of the estimated cross correlation is 0.068.
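The arithmetic can be checked directly; a minimal sketch using the lag one and lag two autocorrelations of W₁ₜ and W₂ₜ from Table 6.4.2:

```python
# Null-hypothesis variance of the sample cross correlations, truncated
# after lag two as in the text: n^{-1}[1 + 2 sum_j r11(j) r22(j)].
n = 205
r11 = {1: -0.17, 2: 0.05}
r22 = {1: 0.15, 2: -0.04}
var_r12 = (1 + 2 * sum(r11[j] * r22[j] for j in (1, 2))) / n
se = var_r12 ** 0.5
print(round(var_r12, 4), round(se, 3))  # → 0.0046 0.068
```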
The hypothesis of zero cross correlation is rejected by the estimates r̂₁₂(1), r̂₁₂(2), since they are several times as large as the estimated standard error. The two nonzero sample cross correlations suggest that the input-output model is more complicated than that of a simple integer delay. It might be a simple delay of over one day, or it is possible that the mixing action of moving water produces a more complicated lag structure. AA
REFERENCES

Section 6.1. Grenander and Rosenblatt (1957), —— (1970), Parzen (1958, 1962).
Section 6.2. Anderson (1971), Bartlett (1946, 1966), Hart (1942), Kendall (1954), Kendall and Stuart (1966), Marriott and Pope (1954), von Neumann (1941, 1942).
Section 6.3. Anderson (1959, 1971), Anderson and Walker (1964), Diananda (1953), Eicker (1963), Hannan and Heyde (1972), Hoeffding and Robbins (1948), Moran (1947).
Section 6.4. Bartlett (1966), Box, Jenkins, and Reinsel (1994), Hannan (1970).
EXERCISES
1. Let Yₜ = μ + Xₜ, where Xₜ = eₜ + 0.4eₜ₋₁ and the eₜ are uncorrelated (0, σ²) random variables. Compute the variance of ȳₙ = n⁻¹ Σ_{t=1}^{n} Yₜ. What is lim_{n→∞} n Var{ȳₙ}?
2. Let the time series {Uₜ: t ∈ (1, 2, …)} be defined by

Uₜ = α + ρUₜ₋₁ + eₜ,

where U₀ is fixed, {eₜ: t ∈ (1, 2, …)} is a sequence of independent (0, σ²) random variables, and |ρ| < 1. Find E{Uₜ} and Var{Uₜ}. Show that Uₜ satisfies the conditions of Theorem 6.1.1. What value does the sample mean of Uₜ converge to?
3. Let Xₜ = eₜ + 0.5eₜ₋₁, where the eₜ are normal independent (0, σ²) random variables. Letting
4. Evaluate (6.2.4) for the first order autoregressive process

Xₜ = ρXₜ₋₁ + eₜ,

where |ρ| < 1 and the eₜ are normal independent (0, σ²) random variables.
5. Given the finite moving average

where the eₜ are normal independent (0, σ²) random variables, is there a distance d = h − q such that γ̂(h) and γ̂(q) are uncorrelated?
6. Use the realization (10.1.10) and equation (6.2.2) to construct the estimated (3 × 3) covariance matrix for a realization of size 3. Show that the resulting matrix is not positive definite. Use the fact that

for m = 0, 1, …, n − 1, h = 0, 1, …, n − 1, where, for j = 1, 2, …, 2n − 1,

to prove that (6.2.3) yields an estimated covariance matrix that is always positive semidefinite.
7. Give the variance of x̄ₙ, n = 1, 2, …, for {Xₜ: t ∈ (1, 2, …)} defined by

Xₜ = μ + eₜ − eₜ₋₁,

where {eₜ: t ∈ (0, 1, 2, …)} is a sequence of independent identically distributed (0, σ²) random variables. Do you think there is a function wₙ such that wₙ(x̄ₙ − μ) converges in distribution to N(0, 1)?
8. Prove the following result, which is used in Theorem 6.3.4. If the sequence {cⱼ} is such that

then

lim_{n→∞} max_{1≤t≤n} (Σ_{j=1}^{n} cⱼ²)⁻¹ cₜ² = 0.
9. Prove Corollary 6.3.5.
10. The data in the accompanying table are the average weekly gross hours per production worker on the payrolls of manufacturing establishments (seasonally adjusted).
Year    I      II     III    IV
1948   40.30  40.23  39.97  39.70
1949   39.23  38.77  39.23  39.30
1950   39.70  40.27  40.90  40.97
1951   40.90  40.93  40.43  40.37
1952   40.63  40.33  40.60  41.07
1953   41.00  40.87  40.27  39.80
1954   39.53  39.47  39.60  39.90
1955   40.47  40.73  40.60  40.93
1956   40.60  40.30  40.27  40.47
1957   40.37  39.97  39.80  39.13
1958   38.73  38.80  39.40  39.70
1959   40.23  40.53  40.20  40.03
1960   40.17  39.87  39.63  39.07
1961   39.27  39.70  39.87  40.40
1962   40.27  40.53  40.47  40.27
1963   40.37  40.40  40.50  40.57
1964   40.40  40.77  40.67  40.90
1965   41.27  41.10  41.03  41.30
1966   41.53  41.47  41.33  41.10
1967   40.60  40.43  40.67  40.70
1968   40.60  40.63  40.83  40.77
1969   40.57  40.73  40.63  40.53
1970   40.17  39.87  39.73  39.50
1971   39.80  39.39  39.77  40.07
1972   40.30  40.67  40.67  40.87

Sources: Business Statistics, 1971, pp. 74 and 237, and Survey of Current Business, Jan. 1972, Jan. 1973. The quarterly data are the averages of monthly data.
(a) Estimate the covariance function γ(h), assuming the mean unknown.
(b) Estimate the correlation function ρ(h), assuming the mean unknown.
(c) Using large sample theory, test the hypothesis H₀: ρ(1) = 0, assuming ρ(h) = 0, h > 1.
11. Using Hart's (1942) tables for the percentage points of d, or Anderson's (1971) tables for r₁, obtain the percentage points for t₁ = r₁(n + 1)^{1/2}(1 − r₁²)^{−1/2} for n = 10 and 15. Compare these values with the percentage points of Student's t with 13 and 18 degrees of freedom.
12. Using the truncation argument of Theorem 6.3.3, complete the proof of Theorem 6.4.2 by showing that

n^{−1/2} Σ_{t=1}^{n−h} e₁ₜ Xₜ₊ₕ

converges in distribution to a normal random variable.
13. Denoting the data of Exercise 10 by X₁ₜ, and that of Table 6.3.1 by X₂ₜ, compute the cross covariance and cross correlation functions. Define

Y₁ₜ = X₁ₜ − 1.53X₁,ₜ₋₁ + 0.66X₁,ₜ₋₂,
Y₂ₜ = X₂ₜ − 1.53X₂,ₜ₋₁ + 0.66X₂,ₜ₋₂.

Compute the cross covariance and cross correlation functions for (Y₁ₜ, Y₂ₜ). Plot the cross correlation function of (Y₁ₜ, Y₂ₜ). Compute the variance of r̂₁₂(h) under the assumption that Y₁ₜ is a sequence of uncorrelated random variables. Plot the standard errors on your figure.
14. Let Xₜ be a time series with positive continuous spectral density f(ω).
(a) Show that, given ε > 0, one may define two moving average time series
W,, and W,, with spectral densities
(b) Let {uₜ: t = 1, 2, …} be such that

See Exercise 4.15 and Grenander and Rosenblatt (1957, Chapter 7).
15. Prove Lemma 6.3.2.
16. Let Xₜ = Σ_{j=0}^{∞} bⱼeₜ₋ⱼ. Show that
17. Let Uₜ be the stationary autoregressive moving average

Prove that if Σⱼ bⱼ = 0, then Var{Ūₙ} = O(n⁻²).
18. Let Xₜ be a covariance stationary time series satisfying

where Σ_{j=0}^{∞} |αⱼ| < ∞, Σ_{j=0}^{∞} αⱼ ≠ 0, and the eₜ are uncorrelated (0, σ²) random variables. Show that
19. Let

Prove
20. Let Uₜ be a stationary time series with absolutely summable covariance function, and let

Xₜ = Σ_{j=0}^{∞} αⱼ Uₜ₋ⱼ,

where |αⱼ| < Mλʲ for some M < ∞ and 0 < λ < 1. Show that
21. Construct an alternative proof for Corollary 6.3.3 by averaging both sides of the defining equation to obtain
22. For a moving average of order q, show that n^{1/2}r̂(h) converges in distribution to N[0, 1 + 2 Σ_{k=1}^{q} ρ²(k)] for |h| > q.
23. Prove the following.

Result 6.3.1. Let {eₜ} be a sequence of random variables, let 𝒜ₜ be the sigma-field generated by {eₛ: s ≤ t}, and assume the eₜ satisfy
(a) E{(eₜ, eₜ² − σ², eₜ⁴ − ησ⁴)′ | 𝒜ₜ₋₁} = (0, 0, 0)′ a.s.,
(b) plim_{n→∞} n⁻¹ Σ_{t=1}^{n} E{eₜ²eₜ₋ₕ | 𝒜ₜ₋₁} = lim_{n→∞} n⁻¹ Σ_{t=1}^{n} E{eₜ²eₜ₋ₕ} = 0 for every fixed h > 0,
(c) E{|eₜ|^{4+δ}} < M < ∞ for some δ > 0.
Then, for any fixed K,

n^{1/2}[γ̂ₑ(0) − σ², γ̂ₑ(1), …, γ̂ₑ(K)]′ →d N[0, σ⁴ diag(η − 1, 1, …, 1)].
24. Using Result 6.3.1 of Exercise 23, prove the following.
Result 6.3.2. Let Yₜ = Σ_{j=0}^{∞} bⱼeₜ₋ⱼ, where the eₜ satisfy the conditions of Result 6.3.1. Then, for any fixed K,

n^{1/2}[γ̂_Y(0) − γ_Y(0), γ̂_Y(1) − γ_Y(1), …, γ̂_Y(K) − γ_Y(K)]′ →d N(0, V),

where the elements of V are defined by (6.2.4) of Theorem 6.2.1.
CHAPTER 7
The Periodogram, Estimated Spectrum
In this chapter we shall investigate estimators of the spectral density of time series with absolutely summable covariance function. The spectral density of a time series was defined in Section 4.1 as the Fourier transform of the covariance function. We also noted in Section 4.2 that the variances of the random variables defined by the Fourier coefficients of the original time series are, approximately, multiples of the spectral density. These two results suggest methods of estimating the spectral density.
The study of the Fourier coefficients was popular among economists in the 1920s and 1930s, and the student of economics may be interested in the discussion of Davis (1941) and the studies cited by Tintner (1952). Also see Granger and Hatanaka (1964) and Nold (1972). Recent applications are more common in the engineering literature. See, for example, Marple (1987).
7.1. THE PERIODOGRAM
Given a finite realization from a time series, we can represent the n observations by the trigonometric polynomial
Xₜ = a₀/2 + Σ_{k=1}^{m} (aₖ cos ωₖt + bₖ sin ωₖt),

where

ωₖ = 2πk/n,  k = 0, 1, 2, …, m,

aₖ = (2/n) Σ_{t=1}^{n} Xₜ cos ωₖt,  k = 0, 1, 2, …, m,   (7.1.1)
bₖ = (2/n) Σ_{t=1}^{n} Xₜ sin ωₖt,  k = 1, 2, …, m,
and we have assumed n odd and equal to 2m + 1. As before, we recognize the Fourier coefficients as regression coefficients. We can use standard regression analysis to partition the total sum of squares for the n observations. The sum of squares removed by the regression of Xₜ on cos ωₖt is the regression coefficient multiplied by the sum of cross products; that is, the sum of squares due to aₖ is
aₖ Σ_{t=1}^{n} Xₜ cos ωₖt = (n/2)aₖ²,   (7.1.2)

and the sum of squares removed by cos ωₖt and sin ωₖt is

(n/2)(aₖ² + bₖ²).   (7.1.3)
We might call the quantity in (7.1.3) the sum of squares associated with frequency ωₖ. Thus the total sum of squares for the n = 2m + 1 observations may be partitioned into m + 1 components. One component is associated with the mean. Each of the remaining m components is the sum of the two squares associated with the m nonzero frequencies. The partition is displayed in Table 7.1.1.
Should the number of observations be even and denoted by 2m, there is only one regression variable associated with the mth frequency: cos πt. Then the sum of squares for the mth frequency has one degree of freedom and is given by n⁻¹(Σ_{t=1}^{n} Xₜ cos πt)² = ¼naₘ².
One might divide all of the sums of squares of Table 7.1.1 by the degrees of

Table 7.1.1. Analysis of Variance Table for a Sample of Size n = 2m + 1

Source                   Degrees of Freedom   Sum of Squares
Mean                             1            (n/4)a₀² = n x̄ₙ²
Frequency ω₁ = 2π/n              2            (n/2)(a₁² + b₁²)
Frequency ω₂ = 4π/n              2            (n/2)(a₂² + b₂²)
  ⋮                              ⋮                  ⋮
Frequency ωₘ = 2mπ/n             2            (n/2)(aₘ² + bₘ²)
Total                            n            Σ_{t=1}^{n} Xₜ²
freedom and consider the mean squares. However, it is more common to investigate the sums of squares, multiplying the sums of squares with one degree of freedom by two. The function of frequency given by these normalized sums of squares is called the periodogram. Thus, the periodogram is defined by
Iₙ(ωₖ) = (n/2)(aₖ² + bₖ²),  k = 1, 2, …, m,   (7.1.4)
where m is the smallest integer greater than or equal to (n - 1)/2. Most computer programs designed to compute the periodogram for large data
sets use an algorithm based on the fast Fourier transform. See Cooley, Lewis, and Welch (1967) for references on the fast Fourier transform. Singleton (1969) gives a Fortran program for the transform, and Bloomfield (1976, p. 61) discusses the procedure.

If {Xₜ} is a sequence of normal independent (0, σ²) random variables, then the aₖ and bₖ, being linear combinations of the Xₜ, will be normally distributed. Since the sine and cosine functions are orthogonal, the aₖ and bₖ are independent. In this case those entries in Table 7.1.1 with two degrees of freedom divided by σ² are distributed as independent chi-squares with two degrees of freedom.

The periodogram may also be defined in terms of the original observations as

Iₙ(ωₖ) = (2/n)[(Σ_{t=1}^{n} Xₜ cos ωₖt)² + (Σ_{t=1}^{n} Xₜ sin ωₖt)²],  k = 0, 1, …, m.   (7.1.5)
Note that if we define the complex coefficients cₖ by

cₖ = n⁻¹ Σ_{t=1}^{n} Xₜ e^{−iωₖt},

then

Iₙ(ωₖ) = 2n|cₖ|².   (7.1.6)

Thus, the periodogram ordinate at ωₖ is a multiple of the squared norm of the complex Fourier coefficient of the time series associated with the frequency ωₖ.
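As a concrete check of definition (7.1.4), the relation (7.1.6), and the analysis-of-variance partition of Table 7.1.1, the sketch below computes the ordinates both from the regression coefficients and from the FFT; the simulated series and seed are arbitrary illustrations.

```python
import numpy as np

# Periodogram for odd n = 2m + 1: I_n(w_k) = (n/2)(a_k^2 + b_k^2) from the
# Fourier (regression) coefficients of (7.1.1), and the same ordinates from
# the FFT, since I_n(w_k) = (2/n)|sum_t X_t exp(-i w_k t)|^2 = 2n|c_k|^2.
rng = np.random.default_rng(2)
n = 101
m = (n - 1) // 2
x = rng.normal(size=n)
t = np.arange(1, n + 1)

I_direct = np.empty(m)
for k in range(1, m + 1):
    w = 2.0 * np.pi * k / n
    a = (2.0 / n) * np.sum(x * np.cos(w * t))
    b = (2.0 / n) * np.sum(x * np.sin(w * t))
    I_direct[k - 1] = (n / 2.0) * (a * a + b * b)

c = np.fft.fft(x)
I_fft = (2.0 / n) * np.abs(c[1 : m + 1]) ** 2
assert np.allclose(I_direct, I_fft)

# Analysis-of-variance identity of Table 7.1.1:
# n*(mean)^2 + the m frequency sums of squares = total sum of squares.
assert np.isclose(n * x.mean() ** 2 + I_direct.sum(), np.sum(x * x))
```

The FFT route is how the computation is organized in practice for large n, as the text notes.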
The periodogram is also expressible as a multiple of the Fourier transform of the estimated covariance function. If ωₖ ≠ 0, we can write
where μ = E{Xₜ}. Therefore,
for k = 1,2,. . . , m. In this double sum there are n combinations with t - j = 0, n - 1 combinations with t - j = 1, and so forth. Therefore, by letting p = t - j , we obtain the following result.
Result 7.1.1. The kth periodogram ordinate is given by

Iₙ(ωₖ) = 2 Σ_{h=−(n−1)}^{n−1} γ̂(h) cos ωₖh = 4πf̂(ωₖ),   (7.1.7)

where

f̂(ω) = (2π)⁻¹ Σ_{h=−(n−1)}^{n−1} γ̂(h) cos ωh.   (7.1.8)
The coefficients aₖ and bₖ (k ≠ 0) that are computed using Xₜ − x̄ₙ are identical to those computed using Xₜ. Therefore, by substituting Xₜ − x̄ₙ for Xₜ in (7.1.5), we can also write

Iₙ(ωₖ) = 2 Σ_{h=−(n−1)}^{n−1} γ̃(h) cos ωₖh,   (7.1.9)

where k ≠ 0 and

γ̃(h) = γ̃(−h) = n⁻¹ Σ_{t=1}^{n−h} (Xₜ − x̄ₙ)(Xₜ₊ₕ − x̄ₙ),  h ≥ 0.   (7.1.10)
The function f̂(ω) of equation (7.1.8) is a continuous periodic function of ω defined for all ω. The periodogram has been defined only for the discrete set of points ωₖ = 2πk/n, k = 0, 1, 2, …, m. In investigating the limiting properties of the periodogram, it is convenient to have a function defined for all ω ∈ [0, π]. To this end we introduce the function
K(n, ω) = k  for  π(2k − 1)/n < ω ≤ π(2k + 1)/n,  k = 0, ±1, ±2, …,

and take

Iₙ(ω) = Iₙ(ω_{K(n,ω)}).
Thus, Iₙ(ω) for ω ∈ [0, π] is a step function that takes the value Iₙ(ωₖ) on the interval (π(2k − 1)/n, π(2k + 1)/n].
The word periodogram is used with considerable flexibility in the literature. Despite the apparent conflict in terms, our definition of the periodogram as a function of frequency rather than of period is a common one. We have chosen to define the periodogram for the discrete frequencies ωₖ = 2πk/n, k = 0, 1, 2, …, m, and to extend it to all ω as a step function. An alternative definition of the periodogram is 4πf̂(ω), where f̂(ω) is given in (7.1.8), in which case one automatically has a function for all ω. Our definition will prove convenient in obtaining the limiting properties of estimators of the spectrum.
The distributional properties of the periodogram ordinates are easily obtained when the time series is normal white noise. To establish the properties of the periodogram for other time series, we first obtain the limiting value of the expectation of Iₙ(ω) for a time series with absolutely summable covariance function.
Theorem 7.1.1. Let Xₜ be a stationary time series with E{Xₜ} = μ and absolutely summable covariance function. Then

lim_{n→∞} E{Iₙ(ω)} = 4πf(ω),  ω ≠ 0,

lim_{n→∞} E{Iₙ(0) − 2nμ²} = 4πf(0),  ω = 0.

Proof. Since E{γ̂(p)} = (1 − n⁻¹|p|)γ(p), it follows from (7.1.7) that
Now
360 THE PERIODOGRAM, ESTIMATED SPECTRUM
and the latter sum goes to zero as n → ∞ by Lemma 3.1.4. The sequence gₙ(ω) = 2 Σ_{h=−(n−1)}^{n−1} (1 − n⁻¹|h|)γ(h) cos ωh converges uniformly to 4πf(ω) by Corollary 3.1.8, and ω_{K(n,ω)} converges to ω by construction. A
The normalized coefficients 2^{−1/2}n^{1/2}aₖ and 2^{−1/2}n^{1/2}bₖ are the random variables obtained by applying to the observations the transformation discussed in Section 4.2. Therefore, for a time series with absolutely summable covariance function, these random variables are in the limit uncorrelated and have variance given by a multiple of the spectral density evaluated at the associated frequency.
The importance of this result is difficult to overemphasize. For a wide class of time series we are able to transform an observed set of n observations into a set of n statistics that are nearly uncorrelated. All except the first, the sample mean, have zero expected value. The variance of these random variables is, approximately, a simple function of the spectral density.
Theorem 7.1.2. Let Xₜ be a time series defined by

Xₜ = Σ_{j=−∞}^{∞} αⱼ eₜ₋ⱼ,

where {αⱼ} is absolutely summable and the eₜ are independent identically distributed (0, σ²) random variables. Let f(ω) be positive for all ω. Then, for ω and λ in (0, π) and ω ≠ λ, the sequences [2πf(ω)]⁻¹Iₙ(ω) and [2πf(λ)]⁻¹Iₙ(λ) converge in distribution to independent chi-square random variables, each with two degrees of freedom.
Proof. Consider
uniformly in n. Fixing r, we have
where
and Rₙ converges in probability to zero. As uₜ is uniformly bounded by (say) M, we have for ε > 0
and n^{−1/2} Σ_{t=1}^{n} uₜeₜ converges in distribution to a normal random variable by the Lindeberg condition. The asymptotic normality of 2^{−1/2}n^{1/2}a_{K(n,ω)} follows by Lemma 6.3.1. The same arguments hold for any linear combination of the elements of

2^{−1/2}n^{1/2}[a_{K(n,ω)}, b_{K(n,ω)}, a_{K(n,λ)}, b_{K(n,λ)}]′,  ω ≠ λ,

so that the vector converges in distribution to a multivariate normal random variable. The covariance matrix is given by Theorem 4.2.1 and is

diag[2πf(ω), 2πf(ω), 2πf(λ), 2πf(λ)].
Hence, by Theorem 5.2.4, the limiting distribution of Iₙ(ω)/2πf(ω) and Iₙ(λ)/2πf(λ) is that of two independent chi-square random variables with two degrees of freedom. A
Thus, for many nonnormal processes, we may treat the periodogram ordinates as multiples of chi-square random variables. If the original time series is a sequence of independent (0, σ²) random variables, then the periodogram ordinates all have the same expected value. However, for a time series with a nonzero autocorrelation structure, the ordinates will have different expected values. These facts have been used in constructing tests based on the periodogram.
Perhaps it is most natural to use the periodogram to search for “cycles” or “periodicities” in the data. For example, let us hypothesize that a time series is well represented by
Xₜ = μ + A cos ωt + B sin ωt + eₜ,   (7.1.11)
where the eₜ are normal independent (0, σ²) random variables and A and B are fixed.
First assume that ω is known and of the form 2πk/n, where k is an integer. To test the hypothesis that A = B = 0 against the alternative A ≠ 0 or B ≠ 0, we can use
F(2, 2m − 2) = (2m − 2)(aₖ² + bₖ²) / [2 Σ_{j=1, j≠k}^{m} (aⱼ² + bⱼ²)],

where F(2, 2m − 2) has the F-distribution with 2 and 2m − 2 degrees of freedom. Note that the sum of squares for the mean is not included in the denominator, since we postulated a general mean μ. If ω cannot be expressed as 2πk/n, where k is an integer, then the regression associated with (7.1.11) can be computed and the usual regression test constructed.
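A sketch of this F test follows; the series, frequency index k0, and amplitudes are hypothetical illustrations, not data from the text.

```python
import numpy as np

# F test for a postulated frequency w_0 = 2*pi*k0/n, for simulated data
# X_t = mu + A cos(w_0 t) + B sin(w_0 t) + e_t.
rng = np.random.default_rng(3)
n = 101
m = (n - 1) // 2
t = np.arange(1, n + 1)
k0 = 10                             # hypothetical frequency index
w0 = 2.0 * np.pi * k0 / n
x = 1.0 + 0.8 * np.cos(w0 * t) + 0.3 * np.sin(w0 * t) \
    + rng.normal(scale=0.5, size=n)

# Sum of squares for each nonzero frequency (two degrees of freedom each).
ss = np.empty(m)
for k in range(1, m + 1):
    w = 2.0 * np.pi * k / n
    a = (2.0 / n) * np.sum(x * np.cos(w * t))
    b = (2.0 / n) * np.sum(x * np.sin(w * t))
    ss[k - 1] = (n / 2.0) * (a * a + b * b)

# F with 2 and 2m - 2 degrees of freedom; the mean sum of squares is
# excluded from the denominator, as in the text.
F = (ss[k0 - 1] / 2.0) / ((ss.sum() - ss[k0 - 1]) / (2.0 * m - 2.0))
print(F > 10.0)  # → True
```

With the signal amplitudes above, the F statistic is far beyond conventional critical values, while under A = B = 0 it would follow the F(2, 2m − 2) distribution.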
We sometimes believe a time series contains a periodic component, but are unwilling to postulate the periodic function to be a perfect sine wave. For example, we may feel that a monthly time series contains a seasonal component, a large portion of which is because of a high value for December. We know that any periodic function defined on the integers, with integral period H, can be represented by

a₀/2 + Σ_{k=1}^{L[H]} [aₖ cos(2πk/H)t + bₖ sin(2πk/H)t],
where L[H] is the largest integer less than or equal to H/2 . For the monthly time series one might postulate
To test the hypothesis of no seasonal effect (i.e., all Aₖ and Bₖ equal zero), we form Snedecor's F as the ratio of the mean square for the six seasonal frequencies to the mean square for the remaining frequencies. Note that these tests assume eₜ to be a sequence of normal independent (0, σ²) random variables.
The periodogram has also been used to search for "hidden periodicities." In the above examples we postulated the frequency or frequencies of interest and hence, under the null hypothesis, the ratio of the mean squares has the F-distribution. However, we might postulate the null model
Xₜ = μ + eₜ
and the alternative model
Xₜ = μ + A cos ωt + B sin ωt + eₜ,
where w is unknown. In such a case one might search out the largest periodogram ordinate and ask if
this ordinate can reasonably be considered the largest in a random sample of size m selected from a distribution function that is a multiple of a chi-square with two degrees of freedom. A statistic that can be used to test the hypothesis is

ζ = [Σ_{j=1}^{m} Iₙ(ωⱼ)]⁻¹ Iₙ(L),

where Iₙ(L) is the largest periodogram ordinate in a sample of m periodogram ordinates each with two degrees of freedom. Fisher (1929) demonstrated that, for g > 0,

P{ζ > g} = Σ_{j=1}^{k} (−1)^{j−1} C(m, j)(1 − jg)^{m−1},

where ζ is constructed from the periodogram of a sequence of normal independent (μ, σ²) random variables and k is the largest integer less than g⁻¹. A table of the distribution of ζ is given by Davis (1941). Wilks (1962, p. 529) contains a derivation of the distribution. In Table 7.1.2 we give the 1, 5, and 10 percentage points for the distribution.
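Fisher's tail probability is easy to evaluate directly; the sketch below implements the alternating-sum formula, with a ratio-to-average value r (the form tabulated in Table 7.1.2) converted to g = r/m.

```python
from math import comb

# Fisher's (1929) tail probability for the largest normalized periodogram
# ordinate zeta = I_max / (sum of the m ordinates):
#   P{zeta > g} = sum_{j=1}^k (-1)^(j-1) C(m, j) (1 - j g)^(m-1),
# k the largest integer less than 1/g.
def fisher_p(g, m):
    k = int(1.0 / g)
    if k * g == 1.0:          # k must be strictly less than 1/g
        k -= 1
    return sum((-1) ** (j - 1) * comb(m, j) * (1.0 - j * g) ** (m - 1)
               for j in range(1, k + 1))

# Check against Table 7.1.2: for m = 2 ordinates the tabulated 5% point of
# the ratio-to-average is 1.950, i.e. g = 0.975.
print(round(fisher_p(1.950 / 2, 2), 3))  # → 0.05
```

For the wheat-yield example below, a ratio-to-average of 9.92 among m = 31 ordinates gives g = 9.92/31 and a tail probability of roughly 0.0003, consistent with rejection at the 1% level.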
Under the null hypothesis that a time series is normal white noise, the periodogram ordinates are multiples of independent chi-squares, each with two degrees of freedom. Hence, any number of other "goodness of fit" tests are available to test the hypothesis of independence. Bartlett (1966, p. 318) suggested a test based on the normalized cumulative periodogram

Cₖ = [Σ_{j=1}^{m} Iₙ(ωⱼ)]⁻¹ Σ_{j=1}^{k} Iₙ(ωⱼ).   (7.1.12)
The normalized cumulative periodogram for k = 1, 2, …, m − 1 has the same distribution function as that of an ordered sample of size m − 1 selected from the uniform (0, 1) distribution. Therefore, if we plot the normalized periodogram as a sample distribution function and apply the Kolmogorov-Smirnov test of the hypothesis that it is a sample distribution function for a sample of m − 1 selected from a uniform (0, 1) distribution, we have a test of the hypothesis that the original time series is white noise. This testing procedure has been discussed by Durbin (1967, 1969).
Example 7.1.1. In Section 9.2 a grafted quadratic trend is fitted to United States wheat yields from 1908 to 1971. The periodogram computed for the deviations from the trend is given in Table 7.1.3. As we are working with deviations from regression, the distributions of the test statistics are only approximately those discussed above. Durbin (1969) has demonstrated that the critical points for the Kolmogorov-Smirnov-type test statistic using deviations from regression differ from those for an unaltered time series by a quantity that is o(n⁻¹).
Table 7.1.2. Percentage Points for the Ratio of Largest Periodogram Ordinate to the Average
                     Probability of a Larger Value
Number of Ordinates    0.10     0.05     0.01
   2                  1.900    1.950    1.990
   3                  2.452    2.613    2.827
   4                  2.830    3.072    3.457
   5                  3.120    3.419    3.943
   6                  3.354    3.697    4.331
   7                  3.552    3.928    4.651
   8                  3.722    4.125    4.921
   9                  3.872    4.297    5.154
  10                  4.005    4.450    5.358
  15                  4.511    5.019    6.103
  20                  4.862    5.408    6.594
  25                  5.130    5.701    6.955
  30                  5.346    5.935    7.237
  40                  5.681    6.295    7.663
  50                  5.937    6.567    7.977
  60                  6.144    6.785    8.225
  70                  6.317    6.967    8.428
  80                  6.465    7.122    8.601
  90                  6.595    7.258    8.750
 100                  6.711    7.378    8.882
 150                  7.151    7.832    9.372
 200                  7.458    8.147    9.707
 250                  7.694    8.389    9.960
 300                  7.886    8.584   10.164
 350                  8.047    8.748   10.334
 400                  8.186    8.889   10.480
 500                  8.418    9.123   10.721
 600                  8.606    9.313   10.916
 700                  8.764    9.473   11.079
 800                  8.901    9.612   11.220
 900                  9.022    9.733   11.344
1000                  9.130    9.842   11.454
To test the hypothesis that the largest ordinate is the largest in a random sample of 31 estimates, we form the ratio of the fourth ordinate to the average of ordinates 1 to 31:
Table 7.1.3. Periodogram of Deviations of United States Wheat Yields from Trend
 k    Period in Years   Periodogram Ordinate
 1        64.0               0.54
 2        32.0               5.15
 3        21.3               4.74
 4        16.0              53.50
 5        12.8               9.86
 6        10.7               3.02
 7         9.1               3.20
 8         8.0               2.62
 9         7.1               3.56
10         6.4               2.75
11         5.8               7.99
12         5.3               5.89
13         4.9               2.08
14         4.6               2.61
15         4.3               3.44
16         4.0               1.47
17         3.8               1.44
18         3.6               0.12
19         3.4               8.77
20         3.2               1.03
21         3.0               4.00
22         2.9               0.41
23         2.8               0.10
24         2.7              17.03
25         2.6               5.73
26         2.5               0.15
27         2.4               1.90
28         2.3               3.53
29         2.2               1.91
30         2.1               6.64
31         2.1               1.96
32         2.0              12.50
53.50/5.391 = 9.92.
From Table 7.1.2 we see that the 1% point for this ratio is about 7.28. Thus, the null hypothesis is rejected at this level. While we may be somewhat reluctant to accept the existence of a perfect sine cycle of length 16 years on the basis of 64 observations, it is unlikely that the deviation from trend is a white noise time series.
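The ratio can be recomputed from the Table 7.1.3 values; ordinate 32, which has only one degree of freedom, is excluded from the average as in the text.

```python
# Ratio of the largest ordinate to the average of ordinates 1 through 31,
# Table 7.1.3 values (the text truncates the average to 5.391).
ords = [0.54, 5.15, 4.74, 53.50, 9.86, 3.02, 3.20, 2.62, 3.56, 2.75,
        7.99, 5.89, 2.08, 2.61, 3.44, 1.47, 1.44, 0.12, 8.77, 1.03,
        4.00, 0.41, 0.10, 17.03, 5.73, 0.15, 1.90, 3.53, 1.91, 6.64,
        1.96]
avg = sum(ords) / len(ords)
ratio = max(ords) / avg
print(round(avg, 3), round(ratio, 2))  # → 5.392 9.92
```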
The cumulative periodogram for the wheat yield data is displayed in Figure
Figure 7.1.1. Normalized cumulative periodogram for deviations of wheat yield from trend [plotted against k = 0 to 32].
7.1.1. We have followed the common practice of plotting the cumulative periodogram against k. The value of the cumulative periodogram for k is plotted on the interval (k - 1, k]. The upper and lower 5% bounds for the Kolmogorov-Smirnov test have been drawn in the figure. The lines are k/31 + 0.245 and k/31 - 0.245, where 0.245 is the 5% point of the Kolmogorov-Smirnov statistic for sample size 31. The tables of Birnbaum (1952) indicate that for m - 1 > 30, the 95% point for the Kolmogorov-Smirnov statistic is approximately 1.36(m - 1)^{-1/2} and the 99% point is approximately 1.63(m - 1)^{-1/2}. In constructing the cumulative periodogram we included all 32 ordinates, even though the last ordinate has only one degree of freedom. Since the normalized cumulative periodogram passes above the upper 5% line, the data reject the hypothesis of independence, primarily because of the large ordinate at k = 4. AA
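The normalized cumulative periodogram and the Kolmogorov-Smirnov bounds described above can be computed as follows. This is a hedged sketch with illustrative names; the coefficient 1.36 is the approximate 5% value and 1.63 the 1% value from Birnbaum (1952).

```python
import numpy as np

def cumulative_periodogram(I):
    """Normalized cumulative periodogram of the m ordinates I_1, ..., I_m."""
    I = np.asarray(I, dtype=float)
    return np.cumsum(I) / np.sum(I)

def ks_bounds(m, coeff=1.36):
    """Approximate Kolmogorov-Smirnov bounds k/(m - 1) +/- c (m - 1)^(-1/2),
    k = 1, ..., m; c = 1.36 for the 5% level, 1.63 for the 1% level."""
    k = np.arange(1, m + 1)
    half_width = coeff / np.sqrt(m - 1)
    return k / (m - 1) - half_width, k / (m - 1) + half_width
```

With m = 32 ordinates the half-width is 1.36/sqrt(31), about 0.245, reproducing the lines drawn in Figure 7.1.1.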
7.2. SMOOTHING, ESTIMATING THE SPECTRUM
It is clear from the development of the distributional properties of the periodogram that increasing the sample size has little effect on the behavior of the estimated ordinate for a particular frequency. In fact, if the time series is normal (0, 1) white noise, the distribution of the periodogram ordinate is a two-degree-of-freedom chi-square independent of sample size. The number of periodogram ordinates increases as the sample size increases, but the efficiency of the estimator for a particular frequency remains unchanged.
If the spectral density is a continuous function of ω, it is natural to consider an average of local values of the periodogram to obtain a better estimate of the spectral density. Before treating such estimators, we present an additional result on the covariance properties of the periodogram.
In investigating local averages of the periodogram we find it convenient to make a somewhat stronger assumption about the rate at which the covariance function approaches zero. We shall assume that the covariances are such that

    Σ_{h=-∞}^{∞} |h|^{1/2} |γ(h)| < ∞.

This is a fairly modest assumption and would be satisfied, for example, by any stationary finite autoregressive process. A sufficient condition for

    Σ_{h=-∞}^{∞} |h|^{1/2} |γ(h)| < ∞

for a time series X_t is that

    Σ_{j=0}^{∞} |j|^{1/2} |a_j| < ∞,

where

    X_t = Σ_{j=0}^{∞} a_j e_{t-j}

and {e_t} is a sequence of uncorrelated (0, σ²) random variables. This is because

    Σ_{h} |h|^{1/2} |γ(h)| ≤ σ² Σ_h Σ_j |h|^{1/2} |a_j| |a_{j+h}| ≤ 2σ² ( Σ_j |j|^{1/2} |a_j| )( Σ_j |a_j| ) < ∞.
Before giving a theorem on the covariance properties of the periodogram, we present two lemmas useful in the proof.
[The statements of Lemmas 7.2.1 and 7.2.2, which bound sums of the form Σ_t cos ω_k t cos ω_j (t + h) and the corresponding sine sums, are not recoverable from this transcript.]
where we have used Lemma 7.2.1. The sine result follows in a completely analogous manner. A
Theorem 7.2.1. Let the time series X_t be defined by

    X_t = Σ_{j=0}^{∞} a_j e_{t-j},

where the e_t are independent (0, σ²) random variables with fourth moment ησ⁴ and

    Σ_{j=0}^{∞} |j|^{1/2} |a_j| < ∞.

Then

    Cov{I_n(ω_k), I_n(ω_j)} = O(n^{-1}), ω_k ≠ ω_j,
    Var{I_n(ω_k)} = (4π)² f²(ω_k) + O(n^{-1}), ω_k ≠ 0, π,
    Var{I_n(0)} = 2(4π)² f²(0) + o(1).

Furthermore, for the sequence composed only of even-sized samples,

    Var{I_n(π)} = 2(4π)² f²(π) + o(1).
Proof. By the definition (7.1.6) of the periodogram, we have
and
(7.2.1)
where we have used (6.2.5) and a_j = 0 for j < 0. Now,
by the absolute summability of {a_j}. If ω_k = ω_j = 0, or if n is even and ω_k = ω_j = π, the second term of (7.2.1) is

For ω_k = ω_j ≠ 0, π, the second term of (7.2.1) is

where the inequality follows from Lemma 7.2.2. For ω_k = ω_j the third term reduces to

By Lemma 7.2.2, for ω_k ≠ ω_j,

and the absolute value of the second term shown above is less than n^{-1} Σ_{p=-(n-1)}^{n-1} |p|^{1/2} |γ(p)|, which, by assumption, is O(n^{-1}). By a similar argument, the third term of (7.2.1) is O(n^{-1}) when ω_k ≠ ω_j. Thus, if ω_k = ω_j ≠ 0, π,
If ω_k = ω_j = 0, then

    Var{I_n(0)} = 2(4π)² f²(0) + o(1),
and the stated results follow. A
Corollary 7.2.1. Let X_t be the time series defined in Theorem 7.2.1. In addition, assume the e_t are NI(0, σ²) random variables and that

    Σ_{j=0}^{∞} |j| |a_j| < ∞.

Then

    Cov{I_n(ω_j), I_n(ω_k)} = O(n^{-2}), ω_j ≠ ω_k.
Proof. Under normality, the first term of (7.2.1) is zero because η = 3 for the normal distribution. Under the assumption that Σ_{j=0}^{∞} |j| |a_j| < ∞, we have

    Σ_{h=-∞}^{∞} |h| |γ(h)| < ∞,

and the remaining two terms of (7.2.1) are O(n^{-2}). See Exercise 7.9. A
Since the covariances between periodogram ordinates are small, the variance of the periodogram estimator of the spectral density at a particular frequency can be reduced by averaging adjacent periodogram ordinates. The simplest such estimator is defined by

    f̄(ω_k) = (2d + 1)^{-1} Σ_{j=-d}^{d} f̂(ω_{k+j}),

where

    f̂(ω_k) = (4π)^{-1} I_n(ω_k).

In general, we consider the linear function of the periodogram ordinates

    f̄(ω_k) = Σ_{j=-d}^{d} W(j) f̂(ω_{k+j}),    (7.2.2)
where Σ_{j=-d}^{d} W(j) = 1. The weight function W(j) is typically symmetric about zero with a maximum at zero.
We may extend f̄(ω_k) to all ω by defining f̄(ω) = f̄(ω_{K(n,ω)}), where the function K(n, ω) was introduced in Section 7.1. In practice the values of f̄(ω_k) are often connected by lines to obtain a continuous function of ω.
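The moving average (7.2.2) with rectangular weights can be sketched as follows. This is a minimal illustration under stated assumptions: the ordinate sequence is extended by even symmetry at its two ends, a simplification standing in for the even periodic extension of I_n(ω) discussed later in the chapter.

```python
import numpy as np

def smoothed_spectral_estimate(I, d):
    """Simple moving average of 2d+1 periodogram ordinates, divided by 4*pi,
    as an estimate of the spectral density at each Fourier frequency
    (rectangular weights W(j) = 1/(2d+1)).  The ordinate sequence is
    reflected at both ends before averaging."""
    I = np.asarray(I, dtype=float)
    # reflect d ordinates about each end of the sequence
    padded = np.concatenate([I[d:0:-1], I, I[-2:-d - 2:-1]])
    weights = np.full(2 * d + 1, 1.0 / (2 * d + 1))
    return np.convolve(padded, weights, mode="valid") / (4 * np.pi)
```

The result has the same length as the input, one smoothed value per ordinate.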
Theorem 7.2.2. Let the time series X_t be defined by

    X_t = Σ_{j=0}^{∞} a_j e_{t-j},

where the e_t are independent (0, σ²) random variables with fourth moment ησ⁴ and

    Σ_{j=0}^{∞} |j|^{1/2} |a_j| < ∞.

Let d_n be an increasing sequence of positive integers satisfying

    lim_{n→∞} d_n = ∞,
    lim_{n→∞} d_n n^{-1} = 0.

Let the weight function W_n(j), j = 0, ±1, ±2, …, ±d_n, satisfy

    W_n(j) = W_n(-j) ≥ 0,
    Σ_{j=-d_n}^{d_n} W_n(j) = 1,
    lim_{n→∞} Σ_{j=-d_n}^{d_n} W_n²(j) = 0.

Then f̄(ω_{K(n,ω)}) defined by (7.2.2) satisfies

    lim_{n→∞} E{f̄(ω_{K(n,ω)})} = f(ω),
    lim_{n→∞} Var{f̄(ω_{K(n,ω)})} = 0.
Proof. Now
Since f(ω) is uniformly continuous, given ε > 0, there exists an N such that for n > N, |f(ω_j) - f(ω)| < ε for all ω_j in the interval [ω_{K(n,ω-2πd_n/n)}, ω_{K(n,ω+2πd_n/n)}].
Therefore,
By Theorem 7.2.1, for 2πd_n n^{-1} ≤ ω ≤ π - 2πd_n n^{-1},
and, by the argument used for the expectation,
If ω = 0 or π, only d_n + 1 (or d_n) distinct estimates f̂(ω_j) are averaged, since, for example, f̂(2πn^{-1}) = f̂(-2πn^{-1}). Therefore,

    Var{f̄(0)} = O( Σ_{j=-d_n}^{d_n} W_n²(j) ) = o(1),

and the result follows. A
Corollary 7.2.2. Let the assumptions of Theorem 7.2.2 be satisfied with W_n(j) = (2d_n + 1)^{-1}. Then, for ω ≠ 0, π,

    Var{f̄(ω_{K(n,ω)})} = (2d_n + 1)^{-1} f²(ω) [1 + o(1)].
Perhaps it is worthwhile to pause and summarize our results for the periodogram estimators. First, the periodogram ordinates I_n(ω_k) are the sums of squares associated with sine and cosine regression variables for the frequency ω_k. For a
wide class of time series the I_n(ω_k) are approximately independently distributed as [2πf(ω_k)]χ²₂, that is, as a multiple of a chi-square with two degrees of freedom (Theorems 7.1.1 and 7.2.1).
If f(ω) is a continuous function of ω, then, for large n, adjacent periodogram ordinates have approximately the same mean and variance. Therefore, an average of 2d + 1 adjacent ordinates has approximately the same mean and a variance (2d + 1)^{-1} times that of the original ordinates. It is possible to construct a sequence of estimators based on realizations of increasing size wherein the number of adjacent ordinates being averaged increases (at a slower rate than n) so that the average, when divided by 4π, is a consistent estimator of f(ω) (Theorem 7.2.2). The consistency result is less than fully appealing, since it does not tell us how many terms to include in the average for any particular time series. Some general conclusions are possible. For most time series the average of the periodogram ordinates will be a biased estimator of 4πf(ω). For the largest portion of the range of most functions, this bias will increase as the number of terms being averaged increases. On the other hand, we can expect the variance of the average to decline as additional terms are added. (See Exercise 7.11.) Therefore, the mean square error of our average as an estimator of the spectral density will decline as long as the increase in the squared bias is less than the decrease in the variance. The white noise time series furnishes the limiting case. Since the spectral density is a constant function, the best procedure is to include all ordinates in the average [i.e., to use the overall average to estimate 4πf(ω) for all ω]. For a time series of known structure we could determine the optimum number of terms to include in the weight function. However, if we possess that degree of knowledge, the estimation problem is no longer of interest. The practitioner, as he works with data, will develop certain rules of thumb for particular kinds of data.
For data with unknown structure it would seem advisable to construct several averages of varying length before reaching conclusions about the nature of the spectral density.
The approximate distributional properties of the smoothed periodogram can be used to construct a confidence interval for the estimated spectral density. Under the conditions of Theorem 7.2.2, the I_n(ω_k) are approximately distributed as independent chi-squares, and therefore f̄(ω) is approximately distributed as a linear combination of chi-square random variables. One common approximation to such a distribution is a chi-square distribution with degrees of freedom determined by the variance of the distribution.
Result 7.2.1. Let X_t satisfy the assumptions of Theorem 7.2.2 and let f(ω) > 0. Then, for 2πd_n n^{-1} < ω < π(1 - 2d_n n^{-1}), f^{-1}(ω) f̄(ω) is approximately distributed as a chi-square random variable divided by its degrees of freedom ν, where

    ν = 2 [ Σ_{j=-d_n}^{d_n} W_n²(j) ]^{-1}.
An approximate 1 - α level confidence interval for f(ω) can be constructed as

    ( ν f̄(ω) [χ²_{ν,α/2}]^{-1} , ν f̄(ω) [χ²_{ν,1-α/2}]^{-1} ),    (7.2.3)

where χ²_{ν,α/2} is the α/2 tabular value for the chi-square distribution with ν degrees of freedom (the point exceeded with probability α/2). Since the variance of f̄(ω) is a multiple of f²(ω), the logarithm of f̄(ω) for time series with f(ω) strictly positive will have approximately constant variance,

    Var{log f̄(ω)} ≈ [f(ω)]^{-2} Var{f̄(ω)} ≈ 2ν^{-1}.

Therefore, log f̄(ω) is often plotted as a function of ω. Approximate confidence intervals for log f(ω) are given by

    log f̄(ω) + log(ν [χ²_{ν,α/2}]^{-1}) ≤ log f(ω) ≤ log f̄(ω) + log(ν [χ²_{ν,1-α/2}]^{-1}).    (7.2.4)
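The interval computation can be imitated numerically. A hedged sketch with illustrative names: the chi-square tabular values are supplied by the caller (as from a table), so no distribution routine is assumed.

```python
import numpy as np

def equivalent_degrees_of_freedom(weights):
    """nu = 2 / sum(W_j^2) for a weighted average of periodogram ordinates
    whose weights sum to one."""
    w = np.asarray(weights, dtype=float)
    assert abs(w.sum() - 1.0) < 1e-12       # weights must sum to one
    return 2.0 / np.sum(w ** 2)

def spectral_confidence_interval(fbar, nu, chi2_lower_tail, chi2_upper_tail):
    """Interval (7.2.3): (nu*fbar/chi2_upper_tail, nu*fbar/chi2_lower_tail),
    where chi2_upper_tail and chi2_lower_tail are the upper and lower alpha/2
    chi-square tabular values for nu degrees of freedom."""
    return nu * fbar / chi2_upper_tail, nu * fbar / chi2_lower_tail
```

For rectangular weights with d = 5 (eleven ordinates) the equivalent degrees of freedom are 2(2d + 1) = 22.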
Example 7.2.1. As an illustration of spectral estimation, we consider an artificially generated time series. Table 7.2.1 contains 100 observations for the time series
    X_t = 0.7 X_{t-1} + e_t,
where the e_t are computer generated normal independent (0, 1) random variables. The periodogram for this sample is given in Figure 7.2.1. We have connected the ordinates with straight lines. The approximate expected value of the periodogram ordinates is 4πf(ω) = 2(1.49 - 1.4 cos ω)^{-1} and has also been plotted in the figure. Both the average value and the variance are much larger for ordinates associated with the smaller frequencies.
We have labeled the frequency axis in radians. Thus, the fastest frequency we can observe is π, which is 0.50 cycles per time unit. This corresponds to a cycle with a period of two time units. In applications there are natural time units such as hours, months, or years, and one may choose to label the axis in terms of the frequency in these units.
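The construction of Example 7.2.1 can be imitated as follows; this is a sketch, not the original computation — the seed and burn-in length are arbitrary choices, and the periodogram is obtained through the finite Fourier transform.

```python
import numpy as np

# simulate n = 100 observations of X_t = 0.7 X_{t-1} + e_t, e_t ~ NID(0, 1)
rng = np.random.default_rng(1996)
n, rho, burn = 100, 0.7, 50

e = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = rho * x[t - 1] + e[t]
x = x[burn:]                       # retain the last n observations

# periodogram I_n(w_k) = (2/n)|sum_t x_t exp(-i w_k t)|^2 via the FFT
I = (2.0 / n) * np.abs(np.fft.fft(x - x.mean())[1 : n // 2]) ** 2

# approximate expected value of the ordinates for this AR(1) process
w = 2 * np.pi * np.arange(1, n // 2) / n
four_pi_f = 2.0 / (1.49 - 1.4 * np.cos(w))
```

Plotting I against w alongside four_pi_f reproduces the qualitative behavior of Figure 7.2.1: larger mean and variance at the low frequencies.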
In Figure 7.2.2 we display the smoothed periodogram, where the smoothed value at ω_k is

    Ī(ω_k) = 5^{-1} Σ_{j=-2}^{2} I_n(ω_{k+j}).
The smoothed periodogram roughly assumes the shape of the spectral density. Because the standard error of the smoothed estimator for 2 < k < 48 is about 0.45 of the true value, there is considerable variability in the plot. The smoothing
Table 7.2.1. One Hundred Observations from a First Order Autoregressive Time Series with ρ = 0.7
      First 25   Second 25   Third 25   Fourth 25
  1    0.874     -0.613      -0.366      0.850
  2    0.110     -1.420       2.345      0.113
  3   -0.183      2.501      -0.308     -0.044
  4    1.657      0.723      -0.391      1.649
  5   -0.257     -0.095       2.498      1.051
  6   -0.971      1.330       0.803      0.371
  7    1.307      0.116      -1.622      3.404
  8   -1.454     -2.941       2.445      0.296
  9   -2.814      2.805       1.501     -1.784
 10    1.439      0.880      -2.471      1.240
 11   -0.672     -3.508       1.116      0.436
 12   -2.979      0.448       0.930     -0.779
 13    0.377      1.168       0.869     -0.488
 14    1.999      1.786      -0.960      1.376
 15    0.123     -0.579       1.613      0.093
 16   -1.674      2.030      -0.731     -0.366
 17    0.616     -1.253      -0.922      0.667
 18   -2.213     -1.174       0.707     -0.252
 19   -1.685      1.029       0.403     -0.955
 20   -0.948      0.046       0.091      0.254
 21    2.750      1.673       2.286      1.220
 22   -0.256      0.252       0.325     -0.338
 23    0.378      0.127      -2.006     -2.380
 24   -2.024     -1.085       1.037     -0.467
 25   -0.794     -0.493      -0.157      0.659
introduces a correlation between observations less than 2d + 1 units apart. In one sense this was the objective of the smoothing, since it produces a plot that more nearly approximates that of the spectral density. On the other hand, if the estimated spectral density is above the true density, it is apt to remain so for some distance. To compute smoothed values near zero and π, the periodic nature of the function I_n(ω) is used. In most applications the mean is estimated, and therefore I_n(0) is not computed. We follow this practice in our example. To compute the smoothed value at zero we set I_n(ω_0) = I_n(ω_1) and compute

    Ī(ω_0) = 5^{-1} Σ_{j=-2}^{2} I_n(ω_j),

which, by the even periodic property of I_n(ω), is given by

    Ī(ω_0) = 5^{-1} [ I_n(ω_0) + 2 I_n(ω_1) + 2 I_n(ω_2) ].
Figure 7.2.1. Periodogram computed from the 100 autoregressive observations of Table 7.2.1 compared with 4πf(ω).
In our example, replacing I_100(ω_0) by I_100(ω_1),

    Ī(ω_0) = 5^{-1} [ 3(22.004) + 2(9.230) ] = 16.894.
Similarly
As the sample size is even, there is a one-degree-of-freedom periodogram ordinate at π. The smoothed estimate at π is
In Figure 7.2.3 we plot the logarithm of the average of 11 periodogram ordinates (d = 5). The 95% confidence intervals are also plotted in Figure 7.2.3.
Figure 7.2.2. Smoothed periodogram (d = 2) computed from the 100 autoregressive observations of Table 7.2.1 compared with 4πf(ω).
They were constructed using (7.2.4), so that the upper bound is

    log f̄(ω_k) + log(ν [χ²_{ν,1-α/2}]^{-1})

and the lower bound is

    log f̄(ω_k) + log(ν [χ²_{ν,α/2}]^{-1}).

This interval is appropriate for 6 ≤ k ≤ 44 and adequate for k = 45. Confidence intervals for other values of k could be constructed using the variance of the estimator. For example, the smoothed value at zero is

    Ī(ω_0) = 11^{-1} [ 3 I_n(ω_1) + 2 I_n(ω_2) + 2 I_n(ω_3) + 2 I_n(ω_4) + 2 I_n(ω_5) ],

and the variance is approximately (25/121)[4πf(0)]² = 0.21[4πf(0)]². Since the
Figure 7.2.3. Logarithm of smoothed periodogram (d = 5) and confidence interval for logarithm of 4πf(ω) computed from the 100 autoregressive observations of Table 7.2.1.
variance of a chi-square with 10 degrees of freedom divided by its degrees of freedom is 0.20, we can establish a confidence interval for 4πf(0) using the critical points for a chi-square with 10 degrees of freedom. A similar approach can be used for 1 ≤ k ≤ 5 and 46 ≤ k ≤ 50.
Example 7.2.2. As a second example of spectral estimation, we consider the United States monthly unemployment rate from October 1949 to September 1974. Figure 7.2.4 is a plot of the logarithm of the smoothed periodogram using the rectangular weights and d = 5. Also included in the figure are lines defining the 95% confidence interval. This plot displays characteristics typical of many economic time series. The smoothed periodogram is high at small frequencies, indicating a large positive autocorrelation for observations close together in time. Second, there are peaks at the seasonal frequencies π, 5π/6, 2π/3, π/2, π/3, π/6, indicating the presence of a seasonal component in the time series. The periodogram has been smoothed using the simple average, and the flatness of the seasonal peaks indicates that they are being dominated by the center frequencies. That is, the shape of the peak is roughly that of the weights being used in the smoothing. Table 7.2.2 contains the ordinates at and near the seasonal frequencies.
Figure 7.2.4. Logarithm of smoothed periodogram for monthly United States unemployment rate for October 1949 through September 1974 (n = 300, d = 5).
With the exception of π, the ordinates at the seasonal frequencies are much larger than the other ordinates. Also, it seems that the ordinates adjacent to the seasonal frequencies are larger than those farther away. This suggests that the "seasonality" in the time series is not perfectly periodic. That is, more than the six seasonal frequencies are required to completely explain the peaks in the estimated spectral density. AA
7.3. OTHER ESTIMATORS OF THE SPECTRUM
The smoothed periodogram is a weighted average of the Fourier transform of the sample autocovariance. An alternative method of obtaining an estimated spectral density is to apply weights to the estimated covariance function and then transform the "smoothed" covariance function. The impetus for this procedure came from a desire to reduce the computational costs of computing covariances for realizations with very large numbers of observations. Thus, the weight function has traditionally been chosen to be nonzero for the first few autocovariances and zero otherwise. The development of computer routines using the fast Fourier transform reduced the cost of computing finite Fourier transforms and reduced the emphasis on the use of the weighted autocovariance estimator of the spectrum.
Table 7.2.2. Periodogram Ordinates Near Seasonal Frequencies

Frequency (Radians)   Ordinate
      0.461            0.127
      0.482            0.607
      0.503            5.328
      0.523           31.584
      0.545            2.904
      0.565            0.557
      0.586            0.069

      0.984            0.053
      1.005            0.223
      1.026            0.092
      1.047           23.347
      1.068            0.310
      1.089            0.253
      1.110            0.027

      1.508            0.191
      1.529            0.011
      1.550            0.177
      1.571            6.402
      1.592            0.478
      1.613            0.142
      1.634            0.012

      2.032            0.025
      2.052            0.021
      2.073            0.279
      2.094            0.575
      2.115            0.012
      2.136            0.011
      2.157            0.008

      2.555            0.028
      2.576            0.010
      2.597            0.286
      2.618            4.556
      2.639            0.038
      2.660            0.017
      2.681            0.022

      3.079            0.008
      3.100            0.003
      3.120            0.294
      3.142            0.064
Let w(x) be a bounded even continuous function satisfying

    w(0) = 1,
    |w(x)| ≤ 1 for all x,
    w(x) = 0, |x| > 1.    (7.3.1)

Then a weighted estimator of the spectral density is

    f̂(ω) = (2π)^{-1} Σ_{h=-g_n}^{g_n} w(h g_n^{-1}) γ̂(h) cos ωh,    (7.3.2)

where g_n < n is the chosen point of truncation and

    γ̂(h) = γ̂(-h) = n^{-1} Σ_{t=1}^{n-h} (X_t - x̄_n)(X_{t+h} - x̄_n), 0 ≤ h ≤ n - 1.
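The weighted-covariance estimator (7.3.2) can be sketched directly; a minimal illustration with hypothetical function names, shown here with the modified Bartlett (triangular) lag window discussed below.

```python
import numpy as np

def autocovariances(x, max_lag):
    """gamma_hat(h) = n^{-1} sum_t (x_t - xbar)(x_{t+h} - xbar), h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - h], xc[h:]) / n for h in range(max_lag + 1)])

def lag_window_estimate(x, g, window, freqs):
    """Weighted-covariance spectral estimate of (7.3.2):
    f_hat(w) = (2*pi)^{-1}[gamma(0) + 2 sum_{h=1}^{g} w(h/g) gamma(h) cos(wh)]."""
    gam = autocovariances(x, g)
    h = np.arange(1, g + 1)
    wts = np.array([window(v / g) for v in h])
    freqs = np.asarray(freqs, dtype=float)
    return (gam[0] + 2.0 * np.cos(np.outer(freqs, h)) @ (wts * gam[1:])) / (2 * np.pi)

bartlett = lambda u: 1.0 - abs(u)   # modified Bartlett (triangular) lag window
```

Substituting a different window function changes only the weights applied to the estimated covariances.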
From Section 3.4 we know that the transform of a convolution is the product of the transforms. Conversely, the transform of a product is the convolution of the
transforms. In the current context we define the following function (or transform):

    f̃(ω) = (2π)^{-1} Σ_{h=-(n-1)}^{n-1} γ̂(h) e^{-iωh}.

The function f̃(ω) is a continuous function of ω and is an unweighted estimator of the continuous spectral density. By the uniqueness of Fourier transforms, we can write

    γ̂(h) = ∫_{-π}^{π} f̃(ω) e^{iωh} dω, h = 0, ±1, ±2, …, ±(n - 1).
We define the transform of w(h g_n^{-1}) similarly as

    W(ω) = (2π)^{-1} Σ_{h=-(n-1)}^{n-1} w(h g_n^{-1}) e^{-iωh},

where W(ω) is also a continuous function. It follows that

    w(h g_n^{-1}) = ∫_{-π}^{π} W(ω) e^{iωh} dω, h = 0, ±1, ±2, …, ±(n - 1).

Note that w(0) = 1 means that ∫_{-π}^{π} W(s) ds = 1. Then, by Exercise 3.14,

    f̂(ω) = ∫_{-π}^{π} W(ω - s) f̃(s) ds.

One should remember that both W(s) and f̃(s) are even periodic functions. Thus, the estimated spectrum obtained from the weighted estimated covariances w(h g_n^{-1}) γ̂(h) is a weighted average (convolution) of the spectrum estimated from the original covariances, where the weight function is the transform of the weights applied to the covariances.
The function W(ω) is called the kernel or spectral window. The weight function w(x) is often called the lag window.
Theorem 7.3.1. Let g_n be a sequence of positive integers such that

    lim_{n→∞} g_n = ∞,
    lim_{n→∞} n^{-1} g_n = 0,
and let X_t be a time series defined by

    X_t = Σ_{j=0}^{∞} a_j e_{t-j},

where the e_t are independent (0, σ²) random variables with fourth moment ησ⁴, and {a_j} is absolutely summable. Let f̂(ω) be defined by (7.3.2), where w(x) is a bounded even continuous function satisfying the conditions (7.3.1). Then

    lim_{n→∞} E{f̂(ω)} = f(ω)

and, for ω ≠ 0, π,

    lim_{n→∞} n g_n^{-1} Var{f̂(ω)} = f²(ω) ∫_{-1}^{1} w²(x) dx.
Proof. See, for example, Anderson (1971, Chapter 9) or Hannan (1970, Chapter 5). A
The choice of a truncation point g_n for a particular sample of n observations is not determined by the asymptotic theory of Theorem 7.3.1. As in our discussion of the smoothed periodogram, the variance generally increases as g_n increases, but the bias in the estimator will typically decrease as g_n increases. It is possible to determine the order of the bias as a function of the properties of the weight function w(x) and the speed with which γ(h) approaches zero [see Parzen (1961)], but this still does not solve the problem for a given sample and unknown covariance structure.
Approximate confidence limits for the spectral density can be constructed in the same manner as that used for the smoothed periodogram estimator.
Result 7.3.1. Let X_t satisfy the assumptions of Theorem 7.3.1, let w(x) satisfy (7.3.1), and let f(ω) > 0. Then, for πν(2n)^{-1} < ω < π - πν(2n)^{-1}, f^{-1}(ω) f̂(ω) is approximately distributed as a chi-square random variable divided by its degrees of freedom ν, where

    ν = 2n [ g_n ∫_{-1}^{1} w²(x) dx ]^{-1}.
Considerable research has been conducted on the weights w(x) to use in estimating the spectrum. One of the simplest windows is obtained by truncating the sequence of autocovariances at g_n. This procedure is equivalent to applying the window

    w(x) = 1, |x| ≤ 1; w(x) = 0 otherwise.    (7.3.3)

The function w(x) is sometimes called a truncated or rectangular window. While w(x) does not meet the conditions of Theorem 7.3.1, it can be shown that the conclusion holds. The spectral window for the function (7.3.3),

    W(s) = (2π)^{-1} sin[(g_n + ½)s] / sin(s/2),

takes on negative values, and it is possible that the weighted average f̂(ω) will be negative for some ω. This is generally considered an undesirable attribute, since f(ω) ≥ 0 for all ω.
Bartlett (1950) suggested splitting an observed time series of n observations into p groups of M observations each. The periodogram is then computed for each group, and the estimator for the ordinate associated with a particular frequency is taken to be the average of the p estimators; that is,

    f̂(ω_k) = p^{-1} Σ_{s=1}^{p} f̂_s(ω_k),

where f̂_s(ω_k) is the estimator for the ordinate at frequency ω_k obtained from the sth subsample.
Bartlett's estimator is closely related to the estimator with lag window

    w(x) = 1 - |x|, |x| ≤ 1; w(x) = 0 otherwise.

This window has been called modified Bartlett or triangular. Setting g_n = M, the spectral window is given by
Using Lemma 3.1.2,

    W_B(ω) = sin²(Mω/2) / [2πM sin²(ω/2)].
To evaluate the variance of the modified Bartlett estimator, we have
    Σ_{h=-M}^{M} w²(h M^{-1}) = Σ_{h=-M}^{M} (1 - |h| M^{-1})² ≈ (2/3) M.
Thus, for the modified Bartlett estimator with covariances truncated at M, the variance of the estimated spectrum is approximately

    Var{f̂(ω)} ≈ (2M)(3n)^{-1} f²(ω), ω ≠ 0, π.
Blackman and Tukey (1959) suggested the weight function

    w(x) = 1 - 2a + 2a cos πx, |x| ≤ 1; w(x) = 0 otherwise.

The use of the window with a = 0.23 they called "hamming", and the use of the window with a = 0.25, "hanning." Parzen (1961) suggested the weight function

    w(x) = 1 - 6x² + 6|x|³, |x| ≤ 1/2,
    w(x) = 2(1 - |x|)³, 1/2 ≤ |x| ≤ 1,
    w(x) = 0 otherwise.
This kernel will always produce nonnegative estimators. Brillinger (1975, Chapter 3) contains a discussion of these and other kernels.
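The lag windows discussed above can be written directly; a small sketch with illustrative names, vectorized over the argument.

```python
import numpy as np

def bartlett_w(x):
    """Modified Bartlett (triangular) lag window."""
    return np.where(np.abs(x) <= 1, 1 - np.abs(x), 0.0)

def blackman_tukey_w(x, a):
    """w(x) = 1 - 2a + 2a cos(pi x) for |x| <= 1;
    a = 0.23 gives 'hamming', a = 0.25 gives 'hanning'."""
    return np.where(np.abs(x) <= 1, 1 - 2 * a + 2 * a * np.cos(np.pi * x), 0.0)

def parzen_w(x):
    """Parzen (1961) window, which always yields a nonnegative estimate."""
    ax = np.abs(x)
    return np.where(ax <= 0.5, 1 - 6 * ax ** 2 + 6 * ax ** 3,
                    np.where(ax <= 1, 2 * (1 - ax) ** 3, 0.0))
```

Each function satisfies the conditions (7.3.1): w(0) = 1, |w(x)| ≤ 1, and w(x) = 0 for |x| > 1.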
Another method of estimating the spectral density is to estimate an autoregressive moving average model for the time series and then use the spectral density defined by that model as the estimator of the spectral density. Because of its simplicity, the pure autoregressive model is often used. The smoothness of the estimated spectral density is a function of the order of the model fit. A number of criteria for determining the order have been suggested. See Akaike (1969a), Parzen (1974, 1977), Marple (1987), Newton (1988), and the references cited in Section 8.4.
The Burg method of fitting the autoregressive process, discussed in Section 8.2.2, is often used in estimating the spectral density. The method has the advantage that the roots of the fitted autoregressive process are always less than one in absolute value, and the computations are such that it is relatively easy to consider autoregressive models of different orders.
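An autoregressive spectral estimate of the kind described here can be sketched with the Yule-Walker equations. This is an illustration under stated assumptions: the Burg recursion of Section 8.2.2 would replace the covariance step, and the function name is hypothetical.

```python
import numpy as np

def yule_walker_spectrum(x, p, freqs):
    """Fit an AR(p) model by solving the Yule-Walker equations, then
    evaluate the implied spectral density
    f(w) = sigma2 / (2*pi*|1 - sum_j phi_j exp(-i j w)|^2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # sample autocovariances gamma_hat(0), ..., gamma_hat(p)
    gam = np.array([np.dot(xc[: n - h], xc[h:]) / n for h in range(p + 1)])
    # Yule-Walker system R phi = (gamma(1), ..., gamma(p))'
    R = np.array([[gam[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gam[1 : p + 1])
    sigma2 = gam[0] - phi @ gam[1 : p + 1]          # innovation variance
    w = np.asarray(freqs, dtype=float)
    transfer = 1 - np.exp(-1j * np.outer(w, np.arange(1, p + 1))) @ phi
    return sigma2 / (2 * np.pi * np.abs(transfer) ** 2)
```

The order p controls the smoothness of the resulting estimate, as noted in the text.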
7.4. MULTIVARIATE SPECTRAL ESTIMATES
We now investigate estimators of the spectral parameters for vector time series. Let n observations be available on the bivariate time series Y_t = (Y_{1t}, Y_{2t})′. The Fourier coefficients a_{1k}, b_{1k}, a_{2k}, and b_{2k} can be computed for the two time series by the formulas following (7.1.1). Having studied estimators of f_{11}(ω) and f_{22}(ω) in the preceding sections, it remains only to investigate estimators of f_{12}(ω) and of the associated quantities such as the phase spectrum and the squared coherency.
Recalling the transformation introduced in Section 4.2, which yields the normalized Fourier coefficients, we shall investigate the joint distributional properties of

    2^{-1} n^{1/2} (a_{2k} + i b_{2k}) = n^{-1/2} Σ_{t=1}^{n} Y_{2t} e^{iω_k t},    (7.4.1)

where ω_k = 2πk/n, k = 0, 1, …, n - 1. Define a transformation matrix H by
(7.4.2)
where G is an n X n matrix with rows given by
The matrix G was introduced in (4.2.6) and is the matrix that diagonalizes a circular matrix. Let
be the 2n × 2n covariance matrix of y = (y′_1, y′_2)′, where

    V = [ V_{11}  V_{12}
          V_{21}  V_{22} ],    (7.4.3)

and V_{ij} is the n × n matrix whose tsth element is γ_{ij}(s - t); for example, the last row of V_{11} is

    ( γ_{11}(-n + 1), γ_{11}(-n + 2), γ_{11}(-n + 3), …, γ_{11}(0) ).    (7.4.4)

In Theorem 4.2.1, for time series with absolutely summable covariance function, we demonstrated that the elements of G V_{ii} G* converge to the elements of the diagonal matrix 2πD_{ii}, where the elements of D_{ii} are f_{ii}(ω) evaluated at ω_k = 2πk/n, k = 0, 1, 2, …, n - 1.
It remains to investigate the behavior of G V_{12} G*. To this end define the circular matrix V_{12c}.    (7.4.5)

Then G V_{12c} G* is a diagonal matrix with elements

    Σ_{h=-M}^{M} γ_{12}(h) e^{-i2πkh/n}, k = 0, 1, …, n - 1,    (7.4.6)

where we have assumed n is odd and set M = (n - 1)/2. If n is even, the sum is from -M + 1 to M, where M = n/2. If γ_{12}(h) is absolutely summable, we obtain the following theorem.
Theorem 7.4.1. Let Y_t be a stationary bivariate time series with absolutely summable autocovariance function. Let V of (7.4.3) be the covariance matrix for n observations. Then, given ε > 0, there exists an N such that for n > N, every element of the matrix

    H V H* - 2πD

is less than ε in magnitude, where D is the 2n × 2n matrix composed of the blocks D_{ij}, each D_{ij} being diagonal with elements f_{ij}(ω_k), and ω_k = 2πk/n, k = 0, 1, …, n - 1.
Proof. The result for D_{11} and D_{22} follows by Theorem 4.2.1. The result for D_{12} is obtained by arguments completely analogous to those of Section 4.2, by showing that the elements of G V_{12} G* - G V_{12c} G* converge to zero as n increases. The details are reserved for the reader. A
If we make a stronger assumption about the autocovariance function we obtain the stronger result parallel to Corollary 4.2.1.
Corollary 7.4.1.1. Let Y_t be a stationary bivariate time series with an autocovariance function that satisfies

    Σ_{h=-∞}^{∞} |h| |γ_{ij}(h)| < L < ∞, i, j = 1, 2.

Let V be as defined in (7.4.3), H as defined in (7.4.2), and D as defined in Theorem 7.4.1. Then every element of the matrix H V H* - 2πD is less than 3L/n in magnitude.
Proof. The proof parallels that of Corollary 4.2.1 and is reserved for the reader. A
By Theorem 7.4.1, the complex coefficients 2^{-1} n^{1/2} (a_{1k} + i b_{1k}) and 2^{-1} n^{1/2} (a_{2j} + i b_{2j}) are nearly uncorrelated in large samples if j ≠ k. Since

    a_{ik} - i b_{ik} = a_{i,n-k} + i b_{i,n-k}, i = 1, 2, k = 1, 2, …, n - 1,

it follows that
(7.4.7)
That is, the covariance between a_{1k} and a_{2k} is approximately equal to that between b_{1k} and b_{2k}, while the covariance between b_{1k} and a_{2k} is approximately the negative of the covariance between b_{2k} and a_{1k}.
We define the cross periodogram by
To obtain a function defined at all ω, we recall the function
(7.4.9)
Corollary 7.4.1.2. Let Y_t be a stationary bivariate time series with absolutely summable covariance function. Then

    lim_{n→∞} E{ I_{12n}(ω_{K(n,ω)}) } = 4π f_{12}(ω).

Proof. The result is an immediate consequence of Theorem 7.4.1. A
We now obtain some distributional results for spectral estimates. To simplify the presentation, we assume that Y_t is a normal time series.
Theorem 7.4.2. Let Y_t be a bivariate normal time series with covariance function that satisfies

    Σ_{h=-∞}^{∞} |h| |γ_{ij}(h)| < L < ∞, i, j = 1, 2.

Then r_k = 2^{-1} n^{1/2} (a_{1k}, b_{1k}, a_{2k}, b_{2k})′ is distributed as a multivariate normal random variable with zero mean and covariance matrix determined by f_{11}(ω_k), f_{22}(ω_k), c_{12}(ω_k), and q_{12}(ω_k), for ω_k ≠ 0, π, where f_{12}(ω_k) = c_{12}(ω_k) - i q_{12}(ω_k). Also

    E{ r_k r′_j } = O(n^{-1}), j ≠ k.
It follows that
and for ω_k ≠ 0, π,
Proof. Since the a_{ik} and b_{ik} are linear combinations of normal random variables, the normality is immediate. The moment properties follow from the moments of the normal distribution and from Corollary 7.4.1.1. A
We may construct smoothed estimators of the cross spectral density in the same manner that we constructed smoothed estimators in Section 7.2. Let

    f̂_{12}(ω_k) = Σ_{j=-d}^{d} W_n(j) (4π)^{-1} I_{12n}(ω_{k+j}),    (7.4.10)

where W_n(j), j = 0, ±1, ±2, …, ±d, is a weight function.
Theorem 7.4.3. Let Y_t be a bivariate normal time series with covariance function that satisfies Σ_{h=-∞}^{∞} |h| |γ_{ij}(h)| < L < ∞, i, j = 1, 2. Let d_n be a sequence of positive integers satisfying

    lim_{n→∞} d_n = ∞,
    lim_{n→∞} d_n n^{-1} = 0,

and let W_n(j), j = 0, ±1, ±2, …, ±d_n, satisfy the conditions of Theorem 7.2.2. Then f̂_{12}(ω_{K(n,ω)}) is a consistent estimator of f_{12}(ω), where f̂_{12}(ω_{K(n,ω)}) is defined by (7.4.10) and K(n, ω) is defined in (7.4.9).

Proof. Reserved for the reader. A
The properties of the other multivariate spectral estimators follow from Theorem 7.4.2 and Theorem 7.4.3. Table 7.4.1 will be used to illustrate the computations and emphasize the relationships to normal regression theory. Notice that the number of entries in a column of Table 7.4.1 is 2(2d + 1). We set
Table 7.4.1. Statistics Used in Computation of Cross Spectra at ω_k = 2πk/n, d = 2

Fourier Coefficients   Fourier Coefficients   Signed Fourier
for Y_2                for Y_1                Coefficients for Y_1

a_{2,k-2}              a_{1,k-2}              -b_{1,k-2}
b_{2,k-2}              b_{1,k-2}               a_{1,k-2}
a_{2,k-1}              a_{1,k-1}              -b_{1,k-1}
b_{2,k-1}              b_{1,k-1}               a_{1,k-1}
a_{2,k}                a_{1,k}                -b_{1,k}
b_{2,k}                b_{1,k}                 a_{1,k}
a_{2,k+1}              a_{1,k+1}              -b_{1,k+1}
b_{2,k+1}              b_{1,k+1}               a_{1,k+1}
a_{2,k+2}              a_{1,k+2}              -b_{1,k+2}
b_{2,k+2}              b_{1,k+2}               a_{1,k+2}
W_n(j) = (2d + 1)^{-1} for j = 0, ±1, ±2, …, ±d. Then the cospectrum is estimated by

    ĉ_{12}(ω_k) = (n/4π) [2(2d + 1)]^{-1} Σ_{j=k-d}^{k+d} (a_{2j} a_{1j} + b_{2j} b_{1j}),

which is the mean of the cross products of the first two columns of Table 7.4.1 multiplied by n/4π. The quadrature spectrum is estimated by

    q̂_{12}(ω_k) = (n/4π) [2(2d + 1)]^{-1} Σ_{j=k-d}^{k+d} (b_{2j} a_{1j} - a_{2j} b_{1j}),

which is the mean of the cross products of the first and third columns of Table 7.4.1 multiplied by n/4π.
The estimator of the squared coherency for a bivariate time series, computed from the smoothed periodogram estimator of f(ω), is given by

    K̂²_{12}(ω_k) = [ ĉ²_{12}(ω_k) + q̂²_{12}(ω_k) ] [ f̂_{11}(ω_k) f̂_{22}(ω_k) ]^{-1}.

This quantity is recognizable as the multiple correlation coefficient of normal regression theory obtained by regressing the first column of Table 7.4.1 on the second and third columns. By construction, the second and third columns are orthogonal.
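The regression view of the coherency estimate can be imitated numerically. This is a hypothetical sketch: the scale factors n/4π cancel in the squared coherency, and the sign convention for the quadrature sum may differ from that of Table 7.4.1.

```python
import numpy as np

def fourier_coefficients(y):
    """a_k, b_k for k = 1, ..., floor((n-1)/2), for the mean-corrected series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    yc = y - y.mean()
    k = np.arange(1, (n - 1) // 2 + 1)
    t = np.arange(1, n + 1)
    arg = 2 * np.pi * np.outer(k, t) / n
    return (2.0 / n) * np.cos(arg) @ yc, (2.0 / n) * np.sin(arg) @ yc

def smoothed_coherency(y1, y2, k, d):
    """Squared coherency estimate at the k-th ordinate (1-based) from the
    2d+1 adjacent frequencies, scale factors cancelling in the ratio."""
    a1, b1 = fourier_coefficients(y1)
    a2, b2 = fourier_coefficients(y2)
    sl = slice(k - 1 - d, k + d)                   # ordinates k-d, ..., k+d
    c = np.sum(a2[sl] * a1[sl] + b2[sl] * b1[sl])  # cospectrum sum (unscaled)
    q = np.sum(b2[sl] * a1[sl] - a2[sl] * b1[sl])  # quadrature sum (unscaled)
    p1 = np.sum(a1[sl] ** 2 + b1[sl] ** 2)
    p2 = np.sum(a2[sl] ** 2 + b2[sl] ** 2)
    return (c ** 2 + q ** 2) / (p1 * p2)
```

When the two series are identical the estimate is one, the degenerate perfect-fit case noted below for d = 0.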
The estimation of the squared coherency generalizes immediately to higher
dimensions. If there is a second explanatory variable, the Fourier coefficients of this variable are added to Table 7.4.1 in the same form as the columns for Y_1. Then the multiple squared coherency is the multiple correlation coefficient associated with the regression of the column for Y_2 on the four columns for the two explanatory variables. An estimator of the error spectrum, or residual spectrum, of Y_2 after Y_1 is

    f̂_{2·1}(ω_k) = (2d + 1)(2d)^{-1} [1 - K̂²_{12}(ω_k)] f̂_{22}(ω_k).    (7.4.14)

This is the residual mean square for the regression of the first column of Table 7.4.1 on the second and third, multiplied by n/4π. Many authors define the estimator of the error spectrum without the factor (2d + 1)/(2d). We include it to make the analogy to multiple regression complete. It also serves to remind us that K̂²_{12}(ω) is identically one if computed for d = 0. A test of the hypothesis that K²_{12}(ω_k) = 0 is given by the statistic

    F = 2d K̂²_{12}(ω_k) [1 - K̂²_{12}(ω_k)]^{-1},    (7.4.15)

which is approximately distributed as Snedecor's F with 2 and 4d (d > 0) degrees of freedom under the null hypothesis. This is the test of the hypothesis that the regression coefficients associated with columns two and three of Table 7.4.1 are zero.

If K²_{12}(ω_k) ≠ 0, the distribution of K̂²_{12}(ω_k) is approximately that of the multiple correlation coefficient [see, for example, Anderson (1984, p. 134)]. Tables and graphs useful in constructing confidence intervals for K²_{12}(ω_k) have been given by Amos and Koopmans (1963). For many degrees of freedom and K²_{12}(ω_k) ≠ 0, K̂²_{12}(ω_k) is approximately normally distributed with variance

    Var{K̂²_{12}(ω_k)} ≈ 2 K²_{12}(ω_k) [1 - K²_{12}(ω_k)]² (2d + 1)^{-1}.    (7.4.16)
The estimated phase spectrum is

    φ̂_{12}(ω_k) = tan^{-1}[ -q̂_{12}(ω_k) / ĉ_{12}(ω_k) ],    (7.4.17)

where it is understood that φ̂_{12}(ω_k) is the angle in (-π, π] between the positive half of the c_{12}(ω_k) axis and the ray from the origin through (ĉ_{12}(ω_k), -q̂_{12}(ω_k)). The sample distribution of this quantity depends in a critical manner on the true coherency between the two time series. If K²_{12}(ω_k) = 0, then, conditional on f̂_{11}(ω_k), the variables ĉ_{12}(ω_k)/f̂_{11}(ω_k) and q̂_{12}(ω_k)/f̂_{11}(ω_k) are approximately distributed as independent normal (0, f_{22}(ω_k)[2(2d + 1) f_{11}(ω_k)]^{-1}) random variables. This is because ĉ_{12}(ω_k)/f̂_{11}(ω_k) and q̂_{12}(ω_k)/f̂_{11}(ω_k) are the regression coefficients obtained by regressing column one of Table 7.4.1 on columns two and three. It is well known that the ratio of two independent normal random variables with zero mean and common variance has the Cauchy distribution, and that the arc
MULTIVARIATE SPECTRAL ESTIMATES 393
tangent of the ratio has a uniform distribution. Therefore, if K²₁₂(ω_k) = 0, the principal value of φ̂₁₂(ω_k) will be approximately uniformly distributed on the interval (−π/2, π/2].
If K²₁₂(ω_k) ≠ 0, then φ̂₁₂(ω_k) will converge in distribution to a normal random variable. While approximate confidence limits could be established on the basis of the limiting distribution, it seems preferable to set confidence limits using the normality of ĉ₁₂(ω_k)/f̂₁₁(ω_k) and q̂₁₂(ω_k)/f̂₁₁(ω_k).
Fieller's method [see Fieller (1954)] can be used to construct a confidence interval for φ₁₂(ω). This procedure follows from the fact that the statement −q₁₂(ω)/c₁₂(ω) = R₁₂(ω) is equivalent to the statement c₁₂(ω)R₁₂(ω) + q₁₂(ω) = 0. Therefore, the method of setting confidence intervals for the sum c₁₂(ω)R₁₂(ω) + q₁₂(ω) can be used to determine a confidence interval for R₁₂(ω) = tan φ₁₂(ω) and hence for φ₁₂(ω). The (1 − α)-level confidence interval for the principal value of φ₁₂(ω) is the set of φ₁₂(ω) in [−π/2, π/2] such that
    sin²[φ₁₂(ω) − φ̂₁₂(ω)] ≤ t²_α [ĉ²₁₂(ω) + q̂²₁₂(ω)]⁻¹ V̂ar{ĉ₁₂(ω)} ,    (7.4.18)
where
    V̂ar{ĉ₁₂(ω)} = (4d + 2)⁻¹ f̂₁₁(ω) f̂ᵉ₂₂(ω) ,
t_α is such that P{|t| > t_α} = α, and t is distributed as Student's t with 4d degrees of freedom.
To obtain a confidence interval for φ₁₂(ω) in the interval (−π, π], it is necessary to modify Fieller's method. We suggest the following procedure to establish an approximate (1 − α)-level confidence interval. Let F_{2,4d}(α) denote the α point of the F-distribution with 2 and 4d degrees of freedom. The possibilities for the interval fall into two categories.
Note that the criterion for category 1 is satisfied when the F-statistic of (7.4.15) is less than F_{2,4d}(α). Assuming ĉ₁₂(ω) and q̂₁₂(ω) to be normally distributed, it can be proven that the outlined procedure furnishes a confidence interval with probability at least 1 − α of covering the true φ₁₂(ω). If the true coherency is zero, the interval will have length 2π with probability 1 − α.
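For the category in which the F-test rejects zero coherency, the half-width of the phase interval can be computed from the squared coherency alone, because the variance expression reduces algebraically to t²_α(1 − K̂²)/(4d K̂²) inside the arcsine. The sketch below assumes that reduction and uses the numbers of Example 7.4.1 later in the section; it is an illustration, not the book's general procedure.

```python
import math

def phase_interval(phi_hat, k2, d, t_alpha):
    """Fieller-type interval for the phase, category 2 case.

    Half-width: S = arcsin( t_alpha * sqrt((1 - K2) / (4*d*K2)) ).
    """
    s = math.asin(t_alpha * math.sqrt((1.0 - k2) / (4.0 * d * k2)))
    return phi_hat - s, phi_hat + s

# Example 7.4.1: phi_hat = -0.2970, K2 = 0.8128, d = 5, t_{0.05} = 2.086.
lo, hi = phase_interval(-0.2970, 0.8128, 5, 2.086)
```

This reproduces the interval (−0.5228, −0.0712) with half-width 0.2258 obtained in the example.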
Recall that the cross amplitude spectrum is
    A₁₂(ω) = [c²₁₂(ω) + q²₁₂(ω)]^{1/2} = |f₁₂(ω)| ,
394 THE PERIODOGRAM, ESTIMATED SPECTRUM
and the gain of X₂ₜ over X₁ₜ is

    ψ₁₂(ω) = f₁₁⁻¹(ω) A₁₂(ω) .

Estimators of these quantities are

    Â₁₂(ω) = [ĉ²₁₂(ω) + q̂²₁₂(ω)]^{1/2} ,    ψ̂₁₂(ω) = f̂₁₁⁻¹(ω) Â₁₂(ω) .    (7.4.20)
It is possible to establish approximate confidence intervals for these quantities using the approximate normality of ĉ₁₂(ω)/f̂₁₁(ω) and q̂₁₂(ω)/f̂₁₁(ω). As a consequence of this normality,
has, approximately, the F-distribution with 2 and 4d degrees of freedom. Therefore, those c₁₂(ω) and q₁₂(ω) for which (7.4.21) is less than the α percentage tabular value of the F-distribution form a (1 − α)-level confidence region. Let
Assuming normal ĉ₁₂(ω) and q̂₁₂(ω), we conclude that [Â_L(ω), Â_U(ω)] is a confidence interval for A₁₂(ω) of at least level 1 − α. The confidence interval for gain is that of A₁₂(ω) divided by f̂₁₁(ω).
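A hedged sketch of the interval for gain: the half-width form below is inferred from the arithmetic of Example 7.4.1 (error spectrum times the tabular F value, divided by (2d + 1) f̂₁₁), so treat it as an illustration of that computation rather than the general formula.

```python
import math

def gain_interval(gain_hat, f11, f22_err, d, f_alpha):
    """Approximate confidence interval for the gain.

    Half-width (inferred from Example 7.4.1):
        sqrt( f22_err * f_alpha / ((2*d + 1) * f11) ).
    """
    h = math.sqrt(f22_err * f_alpha / ((2.0 * d + 1.0) * f11))
    return gain_hat - h, gain_hat + h

# Example 7.4.1 inputs: gain 0.6700, f11 = 0.5809, error spectrum 0.0661,
# d = 5, and the 5% F value 3.49 for 2 and 20 degrees of freedom.
lo, hi = gain_interval(0.6700, 0.5809, 0.0661, 5, 3.49)
```

This reproduces the interval [0.48, 0.86] given in the example.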
Example 7.4.1. We use the data on the sediment in the Des Moines River discussed in Section 6.4 to illustrate some of the cross spectral computations. Table 7.4.2 contains the Fourier coefficients for the first 11 frequencies for the 205 observations. For d = 5, these are the statistics used to estimate the spectral quantities at ω₆ = 0.0293 cycles per day.
Using the rectangular weight function, we have
Table 7.4.2. Statistics Used in Computing Smoothed Estimates of Cross Spectrum for Sediment in the Des Moines River at Boone and Saylorville for Frequency of 0.0293 Cycles per Day, d = 5

 k   Frequency (cycles per day)   Period (days)    a_k      b_k
 1   0.0049                       205.00          -0.194    0.305
 2   0.0098                       102.50           0.175    0.161
 3   0.0146                        68.33           0.011    0.013
 4   0.0195                        51.25           0.058    0.147
 5   0.0244                        41.00           0.067    0.042
 6   0.0293                        34.17           0.021    0.061
 7   0.0341                        29.29           0.322    0.059
 8   0.0390                        25.63           0.065   -0.067
 9   0.0439                        22.78          -0.053    0.019
10   0.0488                        20.50           0.281    0.037
11   0.0537                        18.64           0.081   -0.062
    = 4.6768 − 1.4313 i .
It follows that

    f̂₂₂(0.0293) = 0.3209 ,
    f̂₁₁(0.0293) = 0.5809 ,
    f̂₁₂(0.0293) = 0.3722 − 0.1138 i .
The error spectrum for Saylorville is estimated by the residual sum of squares obtained from the regression of the Saylorville column on the Boone columns
multiplied by 205[2(10)(4π)]⁻¹. We have
    f̂ᵉ₂₂(0.0293) = 205[2(10)(4π)]⁻¹[0.4327 − 0.6407(0.5019) − (−0.1960)(−0.1536)]
                 = 0.0661 .
The squared coherency is

    K̂²₁₂(0.0293) = [(0.5809)(0.3209)]⁻¹(0.3892)² = 0.8128 .

The F-statistic to test the hypothesis of zero coherency is

    F = 20 K̂²₁₂(0.0293) {2[1 − K̂²₁₂(0.0293)]}⁻¹ = 43.42 .
This is well beyond the 1% tabular value of 5.85 for 2 and 20 degrees of freedom, and it is clear that the two time series are not independent. The estimate of the phase spectrum is

    φ̂₁₂(0.0293) = tan⁻¹[−1.4313/4.6768] = −0.2970 radians .
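The arithmetic of this example can be checked directly from the three smoothed estimates; small discrepancies with the printed values reflect rounding in the inputs.

```python
import math

f11, f22 = 0.5809, 0.3209          # smoothed spectra at 0.0293 cycles/day
f12 = complex(0.3722, -0.1138)     # smoothed cross spectrum, c12 - i*q12

amp = abs(f12)                          # cross amplitude
k2 = amp**2 / (f11 * f22)               # squared coherency
F = 4 * 5 * k2 / (2 * (1 - k2))         # d = 5, so 2 and 20 d.f.
phase = math.atan2(f12.imag, f12.real)  # angle of (c12, -q12)
```

The computed values agree with the printed 0.3892, 0.8128, 43.42, and −0.2970 to roughly three decimals.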
Figure 7.4.1. Estimated phase spectrum, with upper and lower confidence limits, for Boone-Saylorville sediment data (d = 5 and rectangular weight function).
Let us establish a 95% confidence interval for φ₁₂(0.0293). Because the F-test rejects the hypothesis of zero coherency at the 5% level, the criterion of category 2 is satisfied. Consequently, the confidence interval for φ₁₂(ω₆) is (−0.5228, −0.0712), where
    S = sin⁻¹{(2.086)²(0.1872)[20(0.8128)]⁻¹}^{1/2} = 0.2258 .
The estimated gain of X₂ₜ over X₁ₜ is

    ψ̂₁₂(ω₆) = f̂₁₁⁻¹(0.0293)|f̂₁₂(0.0293)| = 0.3892/0.5809 = 0.6700 .
A 95% confidence interval for ψ₁₂(ω₆) is given by [ψ̂_L, ψ̂_U], where

    ψ̂_L = 0.6700 − [(11)⁻¹(0.5809)⁻¹(0.0661)(3.49)]^{1/2} = 0.4800 ,
    ψ̂_U = 0.6700 + [(11)⁻¹(0.5809)⁻¹(0.0661)(3.49)]^{1/2} = 0.8600 .
Figure 7.4.1 is a plot of φ̂₁₂(ω) and the confidence interval for φ₁₂(ω) (d = 5 and the rectangular weight function) for the Boone-Saylorville sediment data. Recall that q₁₂(ω) is an odd function of ω with q₁₂(0) = q₁₂(π) = 0. Therefore, we
Figure 7.4.2. Squared coherency for the Boone-Saylorville sediment data (d = 5 and rectangular weight function).
define the imaginary part of I₁₂ₙ(−ω_k) to be the negative of the imaginary part of I₁₂ₙ(ω_k). Likewise, the imaginary part of I₁₂ₙ(π + λ) is set equal to the negative of the imaginary part of I₁₂ₙ(π − λ). If I₁₂ₙ(0) or I₁₂ₙ(π) are computed, the imaginary part is zero. As a result, q̂₁₂(0) = 0, and q̂₁₂(π) = 0 if n is even. It follows that the estimated phase at zero is 0 if ĉ₁₂(0) > 0 and is π if ĉ₁₂(0) < 0. In plotting the estimated phase, a continuous function of ω is desirable. Therefore, in creating Figure 7.4.1, that angle in the set
that differed least from the angle previously chosen for ω_{k−1}, k = 2, 3, ..., 102, was plotted. The general downward slope of φ̂₁₂(ω) is associated with the fact that the readings at Saylorville lag behind those at Boone. If the relationship were a perfect one period lag, φ̂₁₂(ω) would be estimating a straight line with a negative slope of one radian per radian. The estimated function seems to differ enough from such a straight line to suggest a more complicated lag relationship.
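The continuity rule just described (at each frequency, pick the angle nearest the previously plotted one) can be coded as follows. The candidate set of shifts by multiples of π is an assumption of this sketch, since the arctangent determines the phase only up to a half turn.

```python
import math

def unwrap_phase(phases):
    """Plot-ready phase sequence: at each frequency choose, among the
    computed angle and its shifts by multiples of pi, the value that
    differs least from the previous choice."""
    out = [phases[0]]
    for p in phases[1:]:
        best = min((p + m * math.pi for m in range(-4, 5)),
                   key=lambda c: abs(c - out[-1]))
        out.append(best)
    return out

# A steadily declining phase whose principal values jump back into
# (-pi/2, pi/2]; unwrapping restores the downward-sloping line.
phases = [-0.5, -1.0, -1.5, math.pi - 2.0]
out = unwrap_phase(phases)
```

Here the last principal value, π − 2.0, is pulled down to −2.0, continuing the slope.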
Figure 7.4.2 contains a plot of squared coherency for the sediment data. The 5%
Figure 7.4.3. Plot of 4πf̂₂₂(ω) and 4πf̂ᵉ₂₂(ω) for Saylorville sediment (d = 5 and rectangular weight function).
point for the F-distribution with 2 and 20 degrees of freedom is 3.49. On the basis of (7.4.15), any K̂²₁₂(ω) greater than 0.259 would be judged significant at that level. A line has been drawn at this height in the figure.
Similar information is contained in Figure 7.4.3, where the smoothed periodogram for Saylorville and 4πf̂ᵉ₂₂(ω) are plotted on the same graph. The estimated error spectrum lies well below the spectrum of the original time series for low frequencies, but the two nearly coincide for high frequencies. The estimated error spectrum is clearly not that of white noise, since it is considerably higher at low frequencies than at high. One might be led to consider a first or second order autoregressive process as a model for the error.
Figure 7.4.4 is a plot of the estimated gain of Saylorville over Boone. The 95% confidence interval plotted on the graph was computed using the limits (7.4.22) divided by f̂₁₁(ω). Note that the lower confidence bound for gain is zero whenever the squared coherency falls below 0.259.
Figure 7.4.4. Gain of Saylorville over Boone (d = 5 and rectangular weight function).
REFERENCES
Section 7.1. Bartlett (1950, 1966), Birnbaum (1952), Davis (1941), Durbin (1967, 1968, 1969), Eisenhart, Hastay, and Wallis (1947), Fisher (1929), Grenander and Rosenblatt (1957), Hannan (1958, 1960), Harris (1967), Kendall and Stuart (1966), Koopmans (1974), Tintner (1952), Tukey (1961), Wilks (1962).
Sections 7.2-7.3. Akaike (1969), Anderson (1971), Bartlett (1950), Blackman and Tukey (1959), Bloomfield (1976), Box (1954), Brillinger (1975), Hannan (1960, 1970), Jenkins (1961), Jenkins and Watts (1968), Koopmans (1974), Marple (1987), Newton (1988), Otnes and Enochson (1972), Parzen (1957, 1961, 1974, 1977), Priestley (1981).
Section 7.4. Amos and Koopmans (1963), Anderson (1984), Brillinger (1975), Fieller (1954), Fishman (1969), Hannan (1970), Jenkins and Watts (1968), Koopmans (1974), Scheffé (1970), Wahba (1968).
EXERCISES
1. Compute the periodogram for the data of Table 6.4.1. Calculate the smoothed periodogram using 2, 4, and 8 for d and rectangular weights. Plot the smoothed periodograms, and observe the differences in smoothness and in the height and width of the peak at zero. Compute a 95% confidence interval for 4πf(ω) using the smoothed periodogram with d = 4. Plot the logarithm of the smoothed periodogram and the confidence interval for log 4πf(ω).
2. Given in the accompanying table is the quarterly gasoline consumption in California from 1960 to 1973 in millions of gallons.
              Quarter
Year      I      II     III     IV
1960    1335   1443   1529   1447
1961    1363   1501   1576   1495
1962    1464   1450   1611   1612
1963    1516   1660   1738   1652
1964    1639   1754   1839   1736
1965    1699   1812   1901   1821
1966    1763   1937   2001   1894
1967    1829   1966   2068   1983
1968    1939   2099   2201   2081
1969    2008   2232   2299   2204
1970    2152   2313   2393   2278
1971    2191   2402   2450   2387
1972    2391   2549   2602   2529
1973    2454   2647   2689   2549
Source: U.S. Dept. of Transportation (1975), Review and Analysis of Gasoline Consumption in the United States from 19— to the Present, and U.S. Dept. of Transportation, News, various issues.
Using these data:
(a) Compute the periodogram.
(b) Obtain the smoothed periodogram by computing the centered moving average
(c) Fit the regression model

        y_t = α + βt + z_t .

    Repeat parts (a) and (b) for the regression residuals ẑ_t.
(d) Compute the smoothed periodogram for ẑ_t of part (c) using the symmetric weight function W₃(j), where

        j = 0 ,
        0.2 ,  j = 1 ,
        0.05 , j = 3 ,

(e) Fit the regression model

    where

        a_tj = 1 if t falls in the jth quarter, and a_tj = 0 otherwise.

    Repeat parts (a) and (b) for the residuals from the fitted regression. Compute and plot a 95% confidence interval for the estimated spectral density.
3. Let the time series X_t be defined by

        X_t = e_t + 0.8 e_{t−1} ,

where {e_t} is a sequence of normal independent (0, 1) random variables. Given a sample of 10,000 observations from such a time series, what is the approximate joint distribution of the periodogram ordinates associated with ω₂₅₀₀ = 2π(2500)/10,000 and ω₁₂₅₀ = 2π(1250)/10,000?
4. Prove the sine result of Lemma 7.2.2.
5. Prove that the covariance function of a stationary finite autoregressive process

satisfies
6. Use the moment properties of the normal distribution to demonstrate the portion of Theorem 7.4.2 that states that
7. Let X₁ₜ denote the United States quarterly unemployment rate of Table 6.4.1, and let X₂ₜ denote the weekly gross hours per production worker given in Exercise 13 of Chapter 6. Compute the periodogram quantities I₁₁ₙ(ω_k), I₁₂ₙ(ω_k), and I₂₂ₙ(ω_k). Compute the smoothed estimates using d = 5 and the rectangular weight function. Compute and plot K̂²₁₂(ω_k) and φ̂₁₂(ω_k). Obtain and plot a confidence interval for φ₁₂(ω_k) and for the gain of X₂ₜ over X₁ₜ. Treating hours per week as the dependent variable, plot 4πf̂ᵉ₂₂(ω).
8. Show that
where
0 otherwise.
9. Let X_t be a time series defined by

where {e_t} is a sequence of independent (0, σ²) random variables with fourth moment ησ⁴ and
(a) Show that

for such a time series.
(b) Let d_n, W_n(j), and f(ω) be as defined in Theorem 7.2.2. Show that

        E{f̂(ω)} = f(ω) + o(n⁻¹d_n) .
10. Given X₁, X₂, ..., X_n, let γ̂_X(h) be defined by (7.1.10). Consider the augmented observations

    n + 1 ≤ t ≤ 2n − 1 .
(a) Show that the periodogram of Y_t can be written

        I_{Y,2n−1}(ω_j) = (2n − 1)⁻¹ 2n Σ_{h=−n+1}^{n−1} γ̂_X(h) e^{i ω_j h} ,

    where ω_j = (2n − 1)⁻¹ 2πj.
(b) Show that

    where ω_j = (2n − 1)⁻¹ 2πj and
11. Suppose X₁, X₂, ..., X_{p+1} are uncorrelated random variables. Show that

        Var{(p + 1)⁻¹ Σ_{i=1}^{p+1} X_i} ≤ Var{p⁻¹ Σ_{i=1}^{p} X_i}

unless
CHAPTER 8
Parameter Estimation
8.1. FIRST ORDER AUTOREGRESSIVE TIME SERIES
The stationary time series defined by
    Y_t − μ = ρ(Y_{t−1} − μ) + e_t ,    (8.1.1)
where the e_t are normal independent (0, σ²) random variables and |ρ| < 1, is one of the simplest and most heavily used models in time series analysis. It is often a satisfactory representation of the error time series in economic models. This model also underlies many tests of the hypothesis that the observed time series is a sequence of independently and identically distributed random variables.
One estimator of ρ is the first order autocorrelation coefficient discussed in Chapter 6,

    ρ̂ = [Σ_{t=1}^{n} (Y_t − ȳ_n)²]⁻¹ Σ_{t=1}^{n−1} (Y_t − ȳ_n)(Y_{t+1} − ȳ_n) ,    (8.1.2)

where ȳ_n = n⁻¹ Σ_{t=1}^n Y_t. To introduce some other estimators of the parameter ρ, let us consider the distribution of the Y_t for the normal time series defined by (8.1.1).
The expected value of Y_t is μ, and the expected value of Y_t − ρY_{t−1} is λ = (1 − ρ)μ. For a sample of n observations we can write

    Y_1 = μ + v_1 ,
    Y_t = λ + ρY_{t−1} + e_t ,   t = 2, 3, ..., n ,    (8.1.3)

where the vector (v_1, e_2, e_3, ..., e_n) is distributed as a multivariate normal with zero mean and covariance matrix

    Σ = diag{(1 − ρ²)⁻¹σ², σ², σ², ..., σ²} .    (8.1.4)
It follows that twice the logarithm of the likelihood of a sample of n observations is

    2 log L(y; μ, ρ, σ²) = −n log 2π − n log σ² + log(1 − ρ²)
        − σ⁻²{(1 − ρ²)(Y_1 − μ)² + Σ_{t=2}^n [Y_t − μ − ρ(Y_{t−1} − μ)]²} .    (8.1.5)
The computation of the maximum likelihood estimators is greatly simplified if we treat Y_1 as fixed and investigate the conditional likelihood. This is also an appropriate model in some experimental situations. For example, if we initiate an experiment at time 1 with an initial input of Y_1, it is very reasonable to condition on this initial input.
To construct the conditional likelihood, we consider the last n − 1 equations of (8.1.3). Maximizing twice the logarithm of the likelihood,

    2 log L(y; λ, ρ, σ² | Y_1) = −(n − 1) log 2π − (n − 1) log σ²
        − σ⁻² Σ_{t=2}^n (Y_t − λ − ρY_{t−1})² ,    (8.1.6)

leads to the estimators

    ρ̂ = [Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})²]⁻¹ Σ_{t=2}^n (Y_t − ȳ_{(0)})(Y_{t−1} − ȳ_{(−1)}) ,
    λ̂ = ȳ_{(0)} − ρ̂ ȳ_{(−1)} ,    (8.1.7)
    σ̂² = (n − 1)⁻¹ Σ_{t=2}^n [(Y_t − ȳ_{(0)}) − ρ̂(Y_{t−1} − ȳ_{(−1)})]² ,

where (ȳ_{(−1)}, ȳ_{(0)}) = (n − 1)⁻¹ Σ_{t=2}^n (Y_{t−1}, Y_t). These estimators are, strictly speaking, not the maximum likelihood estimators for the model stated in (8.1.1). The estimator ρ̂ can take on values outside of (−1, 1), while the maximum likelihood estimator is constrained to the parameter space.
The estimators of λ and ρ are those that would be obtained by applying least squares to the last n − 1 equations. The least squares estimator of σ²,

    s² = (n − 3)⁻¹ Σ_{t=2}^n [(Y_t − ȳ_{(0)}) − ρ̂(Y_{t−1} − ȳ_{(−1)})]² ,    (8.1.8)
is typically used in place of σ̂². The least squares estimator for ρ differs from (8.1.2) by terms whose order in probability is n⁻¹. Therefore, by (6.2.9) and Corollary 6.3.6.1, n^{1/2}(ρ̂ − ρ) is approximately normally distributed with mean zero and variance equal to 1 − ρ². The limiting distribution is also derived in the next section. Note that the estimator
of ρ defined by (8.1.7) can be greater than one in absolute value, while that defined in (8.1.2) cannot.
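A minimal sketch of the conditional least squares fit: the regression of Y_t on Y_{t−1} with an intercept, applied to the last n − 1 observations. The simulated series, parameter values, and seed are illustrative assumptions.

```python
import numpy as np

def ar1_ls(y):
    """Least squares fit of Y_t = lam + rho*Y_{t-1} + e_t
    using the last n-1 observations."""
    y = np.asarray(y, dtype=float)
    y0, y1 = y[1:], y[:-1]                 # Y_t and Y_{t-1}, t = 2..n
    m0, m1 = y0.mean(), y1.mean()
    rho = np.sum((y0 - m0) * (y1 - m1)) / np.sum((y1 - m1) ** 2)
    lam = m0 - rho * m1
    return lam, rho

# Illustrative data: AR(1) with lambda = 1.0 and rho = 0.5.
rng = np.random.default_rng(1)
e = rng.standard_normal(500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 1.0 + 0.5 * y[t - 1] + e[t]
lam, rho = ar1_ls(y)
```

Since the fit is an ordinary straight-line regression of Y_t on Y_{t−1}, it agrees with a generic least squares line fit to the same pairs.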
Let us now return to a consideration of the unconditional likelihood as given by equation (8.1.5). Differentiating the log likelihood with respect to μ, ρ, and σ², and setting the derivatives equal to zero, we obtain

    μ̂ = [2 + (n − 2)(1 − ρ̂)]⁻¹ [Y_1 + (1 − ρ̂) Σ_{t=2}^{n−1} Y_t + Y_n] .    (8.1.9)
If μ is known, Anderson (1971, p. 354) shows that the maximum likelihood estimator of ρ is a root of the cubic equation

    f(ρ) = ρ³ + c₁ρ² + c₂ρ + c₃ = 0 ,    (8.1.10)

where c₃ = −(n − 2)⁻¹n c₁ ,
and y_t = Y_t − μ. Hasza (1980) gives explicit expressions for the three roots of (8.1.10) and shows that there is a root in each of the intervals (−∞, −1), (−1, 1), and (1, ∞). If Y_t is stationary, then
    [c₁, c₂] = −[ρ̃, (n − 2)⁻¹n] + O_p(n⁻¹) ,

where ρ̃ = [Σ_{t=2}^n y²_{t−1}]⁻¹ Σ_{t=2}^n y_{t−1}y_t is the least squares estimator. We show in the next section that ρ̃ − ρ₀ = O_p(n^{−1/2}), where ρ₀ is the true value. Hence,

    f(ρ) →_p ρ³ − ρ₀ρ² − ρ + ρ₀ = (ρ² − 1)(ρ − ρ₀) .
It follows from the results of Section 5.8 that the three roots of f(ρ) = 0 converge in probability to −1, ρ₀, and 1, respectively. Therefore, the root in (−1, 1) is consistent for ρ₀. We show in Section 8.4 that the least squares estimator (8.1.7) and the maximum likelihood estimator have the same limiting distribution for stationary processes.
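The limiting factorization can be checked numerically. The cubic below uses the limit coefficients (c₁, c₂, c₃) → (−ρ₀, −1, ρ₀) for ρ₀ = 0.5, so its roots are exactly −1, ρ₀, and 1, with the middle root the consistent one.

```python
import numpy as np

# Limit of the cubic: rho^3 - rho0*rho^2 - rho + rho0 = (rho^2 - 1)(rho - rho0).
rho0 = 0.5
roots = np.sort(np.roots([1.0, -rho0, -1.0, rho0]).real)
```

In practice one would form the cubic from the sample quantities, solve it the same way, and keep the root lying in (−1, 1).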
If μ is unknown, Gonzalez-Farias (1992) showed that the unconditional
maximum likelihood estimator is a solution of a fifth degree polynomial. A numerical solution can be obtained by iterating equation (8.1.10) and the estimator for μ in (8.1.9), beginning with μ̂ = ȳ_n.
8.2. HIGHER ORDER AUTOREGRESSIVE TIME SERIES
8.2.1. Least Squares Estimation for Univariate Processes
In this section we study the ordinary least squares estimators of the pth order autoregressive process. Consider the time series
    Y_t + Σ_{i=1}^p α_i Y_{t−i} = θ₀ + e_t ,    (8.2.1)

where the roots of

    m^p + Σ_{i=1}^p α_i m^{p−i} = 0    (8.2.2)

are less than one in absolute value and the e_t are uncorrelated (0, σ²) random variables with additional properties to be specified. Because the procedures of this section are closely related to multiple regression, it is convenient to write (8.2.1) as
    Y_t = θ₀ + Σ_{i=1}^p θ_i Y_{t−i} + e_t ,    (8.2.3)

where θ_i = −α_i, i = 1, 2, ..., p. If the process is stationary, the expression

    Y_t − μ = Σ_{i=1}^p θ_i (Y_{t−i} − μ) + e_t ,    (8.2.4)

where E{Y_t} = μ = (1 + Σ_{i=1}^p α_i)⁻¹ θ₀, is also useful. The ordinary least squares estimator of θ = (θ₀, θ₁, ..., θ_p)′ is

    θ̂ = [Σ_{t=p+1}^n X′_t X_t]⁻¹ Σ_{t=p+1}^n X′_t Y_t ,    (8.2.5)

where X_t = (1, Y_{t−1}, Y_{t−2}, ..., Y_{t−p}).
By the properties of autoregressive processes, E{Y_{t−j} e_t} = 0 for j > 0 and E{X′_t e_t} = 0. Also, if the process is stationary with autocovariance function γ(h), then
and under the conditions of Theorem 6.2.1,

It follows that the error in θ̂ is O_p(n^{−1/2}). Also, by (8.2.7) the least squares estimator is asymptotically equivalent to the estimator

where γ̂(h) is defined in (6.2.3). The estimator (8.2.8) is sometimes called the Yule-Walker estimator.
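The asymptotic equivalence of the two estimators is easy to see on simulated data. The sketch below builds the Yule-Walker estimator from γ̂(h) with divisor n and compares it with the ordinary least squares regression; the AR(2) parameters and seed are illustrative assumptions.

```python
import numpy as np

def ols_ar(y, p):
    """Regression of Y_t on (1, Y_{t-1}, ..., Y_{t-p}); intercept dropped."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - i:-i] for i in range(1, p + 1)])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1:]

def yule_walker(y, p):
    """Solve the autocovariance equations built from gamma_hat(h)."""
    x = y - y.mean()
    n = len(x)
    g = np.array([x[:n - h] @ x[h:] / n for h in range(p + 1)])
    G = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(G, g[1:])

# Illustrative stationary AR(2): Y_t = 1.1 Y_{t-1} - 0.3 Y_{t-2} + e_t.
rng = np.random.default_rng(0)
n = 2000
y = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    y[t] = 1.1 * y[t - 1] - 0.3 * y[t - 2] + e[t]
b_ols = ols_ar(y, 2)
b_yw = yule_walker(y, 2)
```

For n = 2000 the two estimates differ only in the third decimal, consistent with the O_p(n⁻¹) difference between them.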
The least squares estimator of σ² is

    s² = (n − 2p − 1)⁻¹ Σ_{t=p+1}^n ê²_t ,    (8.2.9)

where ê_t = Y_t − X_t θ̂. The divisor for s² is defined by analogy to ordinary regression theory. There are n − p observations in the regression, and p + 1 parameters are estimated.
The asymptotic properties of θ̂ and s² are given in the following theorem. The theorem is stated and proven for martingale difference errors, but the result also holds for e_t that are iid(0, σ²) random variables.
Theorem 8.2.1. Let Y_t satisfy

    Y_t = θ₀ + Σ_{j=1}^p θ_j Y_{t−j} + e_t ,   t = p + 1, p + 2, ... .    (8.2.10)

Assume (Y_1, Y_2, ..., Y_p) is fixed, or assume (Y_1, Y_2, ..., Y_p) is independent of e_t for t > p and E{|Y_i|^{2+ν}} < ∞ for i = 1, 2, ..., p and some ν > 0. Let the roots of

    m^p − Σ_{j=1}^p θ_j m^{p−j} = 0    (8.2.11)

be less than one in absolute value. Suppose {e_t}_{t=−∞}^{∞} is a sequence of (0, σ²) random variables with
    E{e_t | 𝒜_{t−1}} = 0 a.s.,   E{e²_t | 𝒜_{t−1}} = σ² a.s.,

and

    E{|e_t|^{2+ν} | 𝒜_{t−1}} < M < ∞ a.s.

for all t and some ν > 0, where 𝒜_{t−1} is the sigma-field generated by {e_j : j ≤ t − 1}. If (Y_1, Y_2, ..., Y_p) is random, the sigma-field is generated by {e_j : j ≤ t − 1} and (Y_1, Y_2, ..., Y_p). Let

    θ̂ = [Σ_{t=p+1}^n X′_t X_t]⁻¹ Σ_{t=p+1}^n X′_t Y_t ,

where X_t = (1, Y_{t−1}, Y_{t−2}, ..., Y_{t−p}). Then

    (a) n^{1/2}(θ̂ − θ) →_L N(0, A⁻¹σ²), and
    (b) s² →_p σ²,

where

    A = lim_{n→∞} n⁻¹ Σ_{t=p+1}^n E{X′_t X_t} ,

and s² is defined in (8.2.9).
Proof. We have

    n^{1/2}(θ̂ − θ) = [n⁻¹ Σ_{t=p+1}^n X′_t X_t]⁻¹ n^{−1/2} Σ_{t=p+1}^n X′_t e_t .

Given ε > 0, there is some N₀ such that Σ_{t=p+1}^n X′_t X_t is nonsingular with probability greater than 1 − ε for n > N₀. Let η be a column vector of arbitrary real numbers such that η′η ≠ 0, and consider those samples for which Σ_{t=p+1}^n X′_t X_t is nonsingular. Let

    S_n = Σ_{t=p+1}^n Z_{tn} ,

where Z_{tn} = n^{−1/2} η′X′_t e_t. Then E{Z_{tn} | 𝒜_{t−1}} = 0 a.s.,

    σ²_{tn} = E{Z²_{tn} | 𝒜_{t−1}} = n⁻¹ σ² η′X′_t X_t η ,

and
By Theorem 6.3.5,

Now

    lim_{n→∞} s²_n = lim_{n→∞} E{S²_n} = η′Aη σ² ,

and hence condition (ii) of Theorem 5.3.4 is satisfied. We now investigate

    s_n^{−(2+ν)} Σ_{t=p+1}^n E{|Z_{tn}|^{2+ν}} ≤ s_n^{−(2+ν)} n^{−1−ν/2} Σ_{t=p+1}^n E{|η′X′_t|^{2+ν}} .

Let w_j be the weights of Theorem 2.6.1. By Hölder's inequality,

Therefore, E{|η′X′_t|^{2+ν}} is bounded, condition (iii) of Theorem 5.3.4 is satisfied, and result (a) follows.
To prove part (b), we observe that ê_t = e_t − X_t(θ̂ − θ) and

    Σ_{t=p+1}^n ê²_t = Σ_{t=p+1}^n e²_t − (n − p)(θ̂ − θ)′A_n(θ̂ − θ)
                     = Σ_{t=p+1}^n e²_t + O_p(1) ,

where A_n = (n − p)⁻¹ Σ_{t=p+1}^n X′_t X_t. The conclusion of part (b) follows because (n − 2p − 1)⁻¹ Σ_{t=p+1}^n e²_t converges to σ² a.s. by Corollary 5.3.8.
Because the matrix A_n = (n − p)⁻¹ Σ_{t=p+1}^n X′_t X_t converges to A in probability, the usual regression distribution theory holds, approximately, for the autoregressive model. For example, v̂_{ii}^{−1/2}(θ̂_i − θ_i), where v̂_{ii} is the ith diagonal element of [Σ_{t=p+1}^n X′_t X_t]⁻¹s², is approximately a N(0, 1) random variable.
While we have obtained pleasant asymptotic results, the behavior of the estimators in small samples can deviate considerably from that based on asymptotic theory. If the roots of the autoregressive equation are near zero, the approach to normality is quite rapid. For example, if ρ = 0 in the first order autoregressive process, the normal or t-distribution approximations will perform very well for n > 30. On the other hand, if the roots are near one in absolute value, very large samples may be required before the distribution is well approximated by the normal.
In Figure 8.2.1 we present the empirical density of the least squares estimator ρ̂
Figure 8.2.1. Estimated density of ρ̂ compared with normal approximation for ρ = 0.9 and n = 100. (Dashed line is normal density.)
defined in (8.1.7). The density was estimated from 20,000 samples of size 100. The observations were generated by the autoregressive equation

    Y_t = 0.9 Y_{t−1} + e_t ,

where the e_t are normal independent (0, 1) random variables. The empirical distribution displays a skewness similar to that which we would encounter in sampling from the binomial distribution. The mean of the empirical distribution is 0.861, and the variance is 0.0032. The distribution obtained from the normal approximation has a mean of 0.90 and a variance of 0.0019.
The mean of the empirical distribution agrees fairly well with the approximation obtained by the methods of Section 5.4. The bias approximated from a first order Taylor series is E{ρ̂ − ρ} = −n⁻¹(1 + 3ρ). This approximation to the expectation has been discussed by Marriott and Pope (1954) and Kendall (1954). Also see Pantula and Fuller (1985) and Shaman and Stine (1988).
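A small Monte Carlo study of the kind described above is easy to reproduce. The sketch below uses 2,000 replicates rather than the 20,000 of the text, and a particular seed; for ρ = 0.9 and n = 100 the bias approximation −n⁻¹(1 + 3ρ) predicts a mean near 0.863.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, reps = 0.9, 100, 2000
est = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = e[0] / np.sqrt(1 - rho**2)     # stationary starting value
    for t in range(1, n):
        y[t] = rho * y[t - 1] + e[t]
    # conditional least squares estimator of rho (regression with intercept)
    y0, y1 = y[1:], y[:-1]
    est[r] = (np.sum((y0 - y0.mean()) * (y1 - y1.mean()))
              / np.sum((y1 - y1.mean()) ** 2))

approx_mean = rho - (1 + 3 * rho) / n     # Taylor-series bias approximation
```

The empirical mean lands close to the 0.861 reported in the text, and well below the true value 0.9.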
Often the practitioner must determine the degree of the autoregressive process as well as estimate the parameters. If it is possible to specify a maximum for the degree of the process, a process of that degree can first be estimated and high order terms discarded using the standard regression statistics. Anderson (1962) gives a procedure for this decision problem. Various model building methods based on regression theory can be used. Several such procedures are described in Draper and Smith (1981). In Section 8.4, we discuss other order determination procedures.
Often one inspects the residuals from the fit and perhaps computes the autocorrelations of these residuals. If the model is correct, the sample autocorrelations estimate zero with an error that is O_p(n^{−1/2}), but the variances of these estimators are generally smaller than the variances of estimators computed from a time series of independent random variables. See Box and Pierce (1970) and Ljung and Box (1978). Thus, while it is good practice to inspect the residuals, it is suggested that final tests of model adequacy be constructed by adding terms to the model and testing the hypothesis that the true value of the added coefficients is zero.
Example 8.2.1. To illustrate the regression estimation of the autoregressive process, we use the unemployment time series investigated in Section 6.3. The second order process, estimated by regressing Y_t − ȳ_n on Y_{t−1} − ȳ_n and Y_{t−2} − ȳ_n, is

    Ŷ_t − 4.77 = 1.568 (Y_{t−1} − 4.77) − 0.699 (Y_{t−2} − 4.77) ,
                (0.073)               (0.073)

where ȳ_n = 4.77 and the numbers below the coefficients are the estimated standard errors obtained from the regression analysis. The residual mean square is 0.105.
If we condition the analysis on the first two observations and regress Y_t on Y_{t−1} and Y_{t−2}, including an intercept term in the regression, we obtain
    Ŷ_t = 0.63 + 1.568 Y_{t−1} − 0.699 Y_{t−2} .
         (0.13)  (0.073)        (0.073)
The coefficients are slightly different from those in Section 6.4, since the coefficients in Section 6.4 were obtained from equation (8.2.8).
To check on the adequacy of the second order representation, we fit a fifth order process. The results are summarized in Table 8.2.1. The F-test for the hypothesis that the time series is second order autoregressive against the alternative that it is fifth order is

    F = 0.478[3(0.101)]⁻¹ = 1.578 .

The tabular 0.10 point for Snedecor's F with 3 and 89 degrees of freedom is 2.15, and so the null hypothesis is accepted at that level.
8.2.2. Alternative Estimators for Autoregressive Time Series
The regression method of estimation is simple, easily understood, and asymptotically efficient for the parameters of stationary autoregressive processes. However, given the power of modern computing equipment, other procedures that are more efficient in small samples and (or) appropriate for certain models can be considered.
If the Y_t are normally distributed, the logarithm of the likelihood of a sample of n observations from a stationary pth order autoregressive process is the generalization of (8.1.5),

where Y′ = (Y_1, Y_2, ..., Y_n), J′ = (1, 1, ..., 1), Σ_{nn} is the covariance matrix of Y expressed as a function of (σ², θ_1, θ_2, ..., θ_p), and
Table 8.2.1. Analysis of Variance for Quarterly Seasonally Adjusted Unemployment Rate, 1948 to 1972

Source                                              Degrees of Freedom   Mean Square
Y_{t−1}                                                      1             112.949
Y_{t−2} after Y_{t−1}                                        1               9.481
Y_{t−3} after Y_{t−1}, Y_{t−2}                               1               0.305
Y_{t−4} after Y_{t−1}, Y_{t−2}, Y_{t−3}                      1               0.159
Y_{t−5} after Y_{t−1}, Y_{t−2}, Y_{t−3}, Y_{t−4}             1               0.014
Error                                                       89               0.101
    (8.2.13)
Several computer packages contain algorithms that compute the maximum likelihood estimator. In Section 8.4, it is proven that the limiting distribution of the maximum likelihood estimator is the same as the limiting distribution of the ordinary least squares estimator.
By Corollary 2.6.1.3, a stationary autoregressive process can be given a forward representation

or a backward representation

where {e_t} and {u_t} are sequences of serially uncorrelated (0, σ²) random variables. The ordinary least squares estimator of α = (α_1, α_2, ..., α_p) is the value of α that minimizes the sum of squares of the estimated e_t. One could also construct an estimator that minimizes the sum of squares of the estimated u_t. This suggests a class of estimators, where the estimator of α is the α that minimizes
    (8.2.14)

The ordinary least squares estimator is obtained by setting w_t ≡ 1. The estimator obtained by setting w_t = 0.5 was studied by Dickey, Hasza, and Fuller (1984). We call the estimator with w_t = 0.5 the simple symmetric estimator. For the zero mean first order process, the simple symmetric estimator is

    ρ̂_s = [0.5 Σ_{t=2}^n (Y²_{t−1} + Y²_t)]⁻¹ Σ_{t=2}^n Y_{t−1}Y_t .

Because |Y_{t−1}Y_t| ≤ 0.5(Y²_{t−1} + Y²_t), ρ̂_s for the first order process is always less than or equal to one in absolute value.
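A sketch of the simple symmetric estimator for the zero mean first order case; the divisor follows the averaging of Y²_{t−1} and Y²_t discussed above, which by the quoted inequality keeps the estimate in [−1, 1] for any data.

```python
import numpy as np

def simple_symmetric_ar1(y):
    """Simple symmetric estimator for a zero-mean AR(1):
    sum Y_{t-1} Y_t divided by 0.5 * sum (Y_{t-1}^2 + Y_t^2)."""
    y = np.asarray(y, dtype=float)
    num = np.sum(y[:-1] * y[1:])
    den = 0.5 * np.sum(y[:-1] ** 2 + y[1:] ** 2)
    return num / den

# Arbitrary illustrative data; the estimate is bounded by 1 in absolute value.
rng = np.random.default_rng(2)
y = rng.standard_normal(50)
rho_s = simple_symmetric_ar1(y)
```

For a perfectly persistent sequence such as (1, 1, 1) the estimator attains exactly 1.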
We call the estimator constructed with

    w_t = 0 ,                       t = 1, 2, ..., p ,
    w_t = (n − 2p + 2)⁻¹(t − p) ,   t = p + 1, p + 2, ..., n − p + 1 ,    (8.2.15)
    w_t = 1 ,                       t = n − p + 2, n − p + 3, ..., n ,

the weighted symmetric estimator. The weights (8.2.15) assume n ≥ 2p. The weighted symmetric estimator is nearly identical to the maximum likelihood estimator unless one of the estimated roots is close to one. The roots of the weighted symmetric estimator with weights (8.2.15) are not restricted to be less than one in absolute value. For the zero mean first order process, the weighted symmetric estimator with weights (8.2.15) is
Thus, the least squares estimator and the weighted symmetric estimator differ in the weights given to Y_1 and Y_n in the divisor.
Table 8.2.2 contains the variables required for the symmetric estimation of the pth order process. The estimator is

    α̂ = −(X′WX)⁻¹X′WY ,

where X is the (2n − 2p) × p matrix below the headings −α_1, −α_2, ..., −α_p, Y is the (2n − 2p)-dimensional column vector called the dependent variable, and W is the (2n − 2p) × (2n − 2p) diagonal matrix whose elements are given in the "Weight" column. An estimator of the covariance matrix of α̂ is
where

    σ̂² = (n − p − 1)⁻¹ (Y′WY − α̂′X′WY) .
                                     Parameter
Weight          Dependent Variable   −α_1        −α_2        ...   −α_p
w_{p+1}         Y_{p+1}              Y_p         Y_{p−1}     ...   Y_1
w_{p+2}         Y_{p+2}              Y_{p+1}     Y_p         ...   Y_2
  ...             ...                  ...         ...       ...    ...
w_n             Y_n                  Y_{n−1}     Y_{n−2}     ...   Y_{n−p}
1 − w_{n−p+1}   Y_{n−p}              Y_{n−p+1}   Y_{n−p+2}   ...   Y_n
1 − w_{n−p}     Y_{n−p−1}            Y_{n−p}     Y_{n−p+1}   ...   Y_{n−1}
  ...             ...                  ...         ...       ...    ...
1 − w_2         Y_1                  Y_2         Y_3         ...   Y_{p+1}
If the mean is unknown, there are several ways to proceed. One option is to replace each Y_t in the table with Y_t − ȳ, where ȳ = n⁻¹ Σ_{t=1}^n Y_t. The second is to replace the elements in each column of the table with Y_t − ȳ_{(i)}, where ȳ_{(i)} is the mean of the elements in the ith column. The third is to add a column of ones to the table and use the ordinary least squares formulas. The procedures are asymptotically equivalent, but the work of Park (1990) indicates that the use of separate means, or the column of ones, is preferred in small samples when the process has a root close to one in absolute value. If the estimator of the mean is of interest, the mean can be estimated by ȳ, or one can use estimated generalized least squares, where the covariance matrix is based on the estimated autoregressive parameters.
If a regression program has a missing value option that omits an observation if any variable is missing, one can obtain the estimators by creating a data set composed of the original observations, followed by a missing value, followed by the original data in reverse order. The created data set contains 2n + 1 "observations." Lagging the created vector p times gives the p explanatory variables required for the regression. Then calling the regression option that deletes observations with a missing value enables one to compute the simple symmetric estimator for any autoregression up to order p. The addition of weights is required to compute the weighted symmetric estimator.
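The missing-value trick is easy to verify in code. The sketch below treats the zero mean first order case without an intercept: the mirrored data set, with NaN rows dropped, reproduces the simple symmetric estimate exactly.

```python
import numpy as np

def symmetric_via_missing(y):
    """AR(1), zero mean: append NaN and the reversed series, lag once,
    drop pairs containing NaN, and run no-intercept least squares."""
    z = np.concatenate([y, [np.nan], y[::-1]])
    x, d = z[:-1], z[1:]                 # regressor, dependent variable
    keep = ~(np.isnan(x) | np.isnan(d))
    x, d = x[keep], d[keep]
    return np.sum(x * d) / np.sum(x * x)

rng = np.random.default_rng(3)
y = rng.standard_normal(40)
a = symmetric_via_missing(y)
# direct simple symmetric estimate for comparison
b = np.sum(y[:-1] * y[1:]) / (0.5 * np.sum(y[:-1]**2 + y[1:]**2))
```

The two agree because the doubled data contribute each product Y_{t−1}Y_t twice while each squared term Y²_{t−1} and Y²_t appears once in the divisor.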
The ideas of partial autocorrelation introduced in Section 1.4 can be used in a sequential estimation scheme for autoregressive models. Let a sample of n observations (Y_1, Y_2, ..., Y_n) be available, and define X_t = Y_t − ȳ_n, where ȳ_n = n⁻¹ Σ_{t=1}^n Y_t. Let

    φ̂_1 = [Σ_{t=2}^n X²_{t−1} Σ_{t=2}^n X²_t]^{−1/2} Σ_{t=2}^n X_{t−1}X_t    (8.2.16)

be an estimator of the first autocorrelation. Then an estimator of the first order autoregressive equation is
    Ŷ_t = ȳ_n(1 − φ̂_1) + φ̂_1 Y_{t−1} ,    (8.2.17)
and an estimator of the residual mean square for the autoregression is
(8.2.18)
where
A test that the first order autocomiation is zero under the assumption drat higher order partial autocorrelations are zero is
t_1 = [(n − 2)^{−1}(1 − φ̂_1²)]^{−1/2} φ̂_1 .  (8.2.19)
Under the null hypothesis, this statistic is approximately distributed as a N(0, 1) random variable in large samples.
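The computation of (8.2.16) and (8.2.19) can be sketched as follows; the function name `first_pacf_test` and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def first_pacf_test(y):
    """Sketch of (8.2.16)-(8.2.19): estimate the first autocorrelation
    from mean-adjusted data and form the approximate N(0, 1) statistic
    t_1 for testing that it is zero."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = y - y.mean()
    # (8.2.16): cross product of x_t and x_{t-1}, t = 2, ..., n,
    # normalized so that the estimate is bounded by one
    phi1 = (np.sum(x[1:] * x[:-1])
            / np.sqrt(np.sum(x[:-1] ** 2) * np.sum(x[1:] ** 2)))
    # (8.2.19): t_1 = [(n - 2)^{-1}(1 - phi1^2)]^{-1/2} phi1
    t1 = phi1 / np.sqrt((1 - phi1 ** 2) / (n - 2))
    return phi1, t1
```

The Cauchy–Schwarz inequality guarantees that the normalized estimate lies strictly inside (−1, 1) for nondegenerate data, so the test statistic is always defined.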
Higher order partial autocorrelations, higher order autoregressions, higher order autocorrelations, and tests can be computed with the following formulas:
(8.2.20)
where
and r̂(0) = 1. The estimated partial autocorrelations φ̂_i are defined for i = 1, 2, …, n − 1, and t_i and φ̂_{j(i)} are defined for i = 1, 2, …, n − 2. A test that φ_i = 0, under the assumption that φ_j = 0 for j > i, is t_i, which is the generalization of t_1 of (8.2.19).
If a pth order autoregression is selected as the representation for the time series, the covariance matrix of the vector of coefficients can be estimated with
V̂{φ̂_p} = (n − 1 − p)^{−1} P̂^{−1} σ̂²_{(p)} ,  (8.2.21)
where

P̂ = [ 1         r̂(1)      r̂(2)     ⋯  r̂(p−1)
      r̂(1)      1         r̂(1)     ⋯  r̂(p−2)
      ⋮                                ⋮
      r̂(p−1)   r̂(p−2)    r̂(p−3)   ⋯  1      ] ,

φ̂_p = (φ̂_1, φ̂_2, …, φ̂_p)′, and r̂(h) is defined in (8.2.20). The matrix P̂^{−1} is also given by
B̂′Ŝ^{−1}B̂ ,  (8.2.22)
where Ŝ is the diagonal matrix whose ith diagonal element is σ̂²_{(i−1)} and B̂ is the lower triangular matrix

B̂ = [ 1                0               0      ⋯  0
      −φ̂_{1(1)}        1               0      ⋯  0
      −φ̂_{2(2)}       −φ̂_{1(2)}        1      ⋯  0
      ⋮                                           ⋮
      −φ̂_{p−1(p−1)}   −φ̂_{p−2(p−1)}    ⋯          1 ] .

An alternative estimator of the partial autocorrelation is

φ̃_1 = 2[Σ_{t=2}^{n} (X_t² + X_{t−1}²)]^{−1} Σ_{t=2}^{n} X_t X_{t−1} .  (8.2.23)
It can be verified that φ̃_1 is also always less than one in absolute value. The estimator (8.2.23) in combination with (8.2.20) was suggested by Burg (1975).
The sequential method of computing the autoregression has the advantage for stationary time series that all roots of the estimated autoregression are less than one in absolute value. The estimators obtained by the sequential methods are very similar to the simple symmetric estimators. The sequential procedure also uses all available observations at each step. If regressions are only computed by regression in one direction, moving from an equation of order p − 1 to an equation of order p involves dropping an observation and adding an explanatory variable. Hence, the maximum possible order for the one-direction regression procedure is ½n. The sequential procedure defines the autoregression up to order n − 1.
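A minimal sketch of the sequential (Burg-type) scheme described above, assuming NumPy and the hypothetical name `burg_pacf`: each step estimates a partial autocorrelation from forward and backward residuals, so every estimate is at most one in absolute value, and the coefficient vector is updated by the usual order-recursion.

```python
import numpy as np

def burg_pacf(y, p):
    """Sketch of the sequential scheme suggested by Burg (1975):
    estimate partial autocorrelations from forward and backward
    residuals and update coefficients order by order."""
    x = np.asarray(y, dtype=float) - np.mean(y)
    f, b = x[1:].copy(), x[:-1].copy()    # forward/backward residuals
    pacf, coef = [], np.array([])
    for _ in range(p):
        k = 2.0 * np.sum(f * b) / np.sum(f ** 2 + b ** 2)
        pacf.append(k)
        coef = np.append(coef - k * coef[::-1], k)   # order update
        f, b = f[1:] - k * b[1:], b[:-1] - k * f[:-1]
    return pacf, coef
```

Because each k is a normalized cross product of the two residual series, |k| ≤ 1 by construction, which is the stationarity-preserving property noted in the text.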
The alternative estimators are asymptotically equivalent for stationary autoregressive processes.

Theorem 8.2.2. Let the assumptions of Theorem 8.2.1 hold. Then the limiting distribution of n^{1/2}(θ̂ − θ), where θ̂ is the maximum likelihood estimator, the simple symmetric estimator, the partial correlation estimator, or the weighted symmetric estimator, is the same as that given for the ordinary least squares estimator in Theorem 8.2.1.
Proof. Omitted. ∆
The maximum likelihood estimator is available in many computer packages and performs well in simulation studies for the correct model. The simple symmetric estimator, the weighted symmetric estimator, the Burg estimator, and the maximum likelihood estimator have similar efficiencies for stationary processes. The maximum likelihood estimator and the weighted symmetric estimator perform better than other estimators for processes with roots close to one in absolute value. The maximum normal likelihood estimator and the partial correlation methods
produce estimated equations such that the roots of the estimated characteristic equation are all less than one in absolute value. It is possible for the roots associated with ordinary least squares or with the weighted symmetric estimator to be greater than one in absolute value. The ordinary least squares estimator performs well for forward prediction and is recommended as a preliminary estimator if it is possible that the process is nonstationary with a root greater than one.
8.2.3. Multivariate Autoregressive Time Series
In this subsection, we extend the autoregressive estimation procedures to vector valued processes. Let Y, be a k-dimensional stationary process that satisfies the equation
Y_t − μ + Σ_{i=1}^{p} A_i(Y_{t−i} − μ) = e_t  (8.2.24)

for t = p + 1, p + 2, …, where the e_t are independent (0, Σ) random variables or martingale differences. If the process is stationary, E{Y_t} = μ. The equation can also be written

Y_t + Σ_{i=1}^{p} A_i Y_{t−i} = θ_0 + e_t ,  (8.2.25)
where θ_0 is a k-dimensional column vector. Let Y_1, Y_2, …, Y_n be observed. Discussion of estimation for vector autoregressive processes can proceed by analogy to the univariate case. Each of the equations in (8.2.25) can be considered as a regression equation. We write the ith equation of (8.2.25) as
Y_{ti} = θ_{0i} − Σ_{j=1}^{p} A_{ji} Y_{t−j} + e_{ti} ,  (8.2.26)

where Y_{ti} is the ith element of Y_t, θ_{0i} is the ith element of θ_0, A_{ji} is the ith row of A_j, and e_{ti} is the ith element of e_t. Defining

θ_i = (θ_{0i}, −A_{1i}, −A_{2i}, …, −A_{pi})′ ,  i = 1, 2, …, k ,

and X_t = (1, Y′_{t−1}, Y′_{t−2}, …, Y′_{t−p}), we can write equation (8.2.26) as
Y_{ti} = X_t θ_i + e_{ti} .  (8.2.27)
On the basis of our scalar autoregressive results, we are led to consider the estimators
θ̂_i = [Σ_{t=p+1}^{n} X′_t X_t]^{−1} Σ_{t=p+1}^{n} X′_t Y_{ti} ,  i = 1, 2, …, k.  (8.2.28)
If we let A′ = (−θ_0, A_1, A_2, …, A_p)′, then the system (8.2.25) can be written
Y′_t = −X_t A′ + e′_t ,  t = p + 1, p + 2, …, n ,  (8.2.29)
and the ordinary least squares estimator of A′ is

Â′ = −[Σ_{t=p+1}^{n} X′_t X_t]^{−1} Σ_{t=p+1}^{n} X′_t Y′_t ,  (8.2.30)
where θ̂_i is the ith column of −Â′ and θ_i is the ith column of −A′. The least squares estimator of Σ is
Σ̂ = [n − (k + 1)p − 1]^{−1} Σ_{t=p+1}^{n} ê_t ê′_t ,  (8.2.31)

where

ê_t = Y_t − θ̂_0 + Σ_{i=1}^{p} Â_i Y_{t−i}

and Â is defined in (8.2.30). An alternative method of computing the estimator of (A_1, A_2, …, A_p) = A_{(1)} is as

Â′_{(1)} = −[Σ_{t=p+1}^{n} U′_t U_t]^{−1} Σ_{t=p+1}^{n} U′_t (Y_t − Ȳ)′ ,  (8.2.32)

where U_t = (Y′_{t−1} − Ȳ′, Y′_{t−2} − Ȳ′, …, Y′_{t−p} − Ȳ′) and Ȳ = n^{−1} Σ_{t=1}^{n} Y_t. The distributions of the estimators are the vector analogs of the distributions of Theorem 8.2.1.
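The equationwise least squares computation can be sketched as follows; the function name `var_ols` is hypothetical, and the routine returns coefficients in the regression convention Y_t = θ_0 + Σ_i Φ_i Y_{t−i} + e_t, so that Φ_i corresponds to −A_i in the sign convention of (8.2.25).

```python
import numpy as np

def var_ols(Y, p):
    """Sketch of the equationwise OLS fit behind (8.2.28)-(8.2.30).
    Y is an n x k array.  Returns the intercept estimate (length k)
    and a list of k x k matrices Phi[i] in the regression convention
    Y_t = theta0 + sum_i Phi_i Y_{t-i} + e_t."""
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    # design row X_t = (1, Y'_{t-1}, ..., Y'_{t-p}), t = p+1, ..., n
    X = np.column_stack([np.ones(n - p)] +
                        [Y[p - i:n - i] for i in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)   # (1 + kp) x k
    theta0 = B[0]
    Phi = [B[1 + (i - 1) * k: 1 + i * k].T for i in range(1, p + 1)]
    return theta0, Phi
```

Fitting all k equations at once with a single least squares call is exactly equationwise OLS, because each column of the coefficient matrix solves its own normal equations with the common design matrix.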
Theorem 8.2.3. Let the vector time series Y_t satisfy

Σ_{j=0}^{p} A_j Y_{t−j} = θ_0 + e_t

for t = p + 1, p + 2, …, where {e_t} is a sequence of k-dimensional random variables and the A_j are fixed k × k matrices such that A_0 = I and the roots of
|Σ_{j=0}^{p} A_j m^{p−j}| = 0

are less than one in absolute value. Suppose {e_t} is a sequence of (0, Σ) random vectors with

E{(e_t, e_t e′_t) | 𝒜_{t−1}} = (0, Σ)  a.s.

and

E{|e_t|^{2+ν} | 𝒜_{t−1}} < L < ∞  a.s.
for all t and some ν > 0, where 𝒜_{t−1} is the sigma-field generated by {e_j : j ≤ t − 1}. Assume

G = plim_{n→∞} (n − p)^{−1} Σ_{t=p+1}^{n} X′_t X_t

is positive definite. If (Y_1, Y_2, …, Y_p) are random, it is assumed that E{|Y_i|^{2+ν}} < ∞ for i = 1, 2, …, p and the sigma-field 𝒜_{t−1} is generated by {e_j : j ≤ t − 1} and (Y_1, Y_2, …, Y_p). Then

n^{1/2}(θ̂ − θ)′ →_L N(0, Σ ⊗ G^{−1})

and Σ̂ →_p Σ, where θ̂ is defined in (8.2.28), Σ̂ is defined in (8.2.31),

θ′ = (θ′_1, θ′_2, …, θ′_k) ,

and Σ ⊗ G^{−1} is the Kronecker product of the matrices Σ and G^{−1}.
Proof. We only outline the proof, because it differs in no substantive way from that of Theorem 8.2.1. Since Y_t is converging to a stationary time series, plim (n − p)^{−1} Σ_{t=p+1}^{n} X′_t X_t = G by Corollary 6.3.5. Therefore, the limiting distribution of θ̂ follows from that of n^{1/2}(n − p)^{−1} Σ_{t=p+1}^{n} X′_t e_{ti}, i = 1, 2, …, k. By arguments parallel to those of Theorem 8.2.1, we obtain the limiting distribution. If Σ is singular, then the limiting distribution contains singular components. ∆
An alternative estimator for the vector process is the maximum likelihood estimator, which can be computed by numerical procedures. The limiting distribution of the maximum likelihood estimator is the same as that of the least squares estimator for stationary vector processes.
8.3. MOVING AVERAGE TIME SERIES
We have seen that ordinary least squares regression procedures can be used to obtain efficient estimators for the parameters of autoregressive time series. Unfortunately, the estimation for moving average processes is less simple. By the results of Chapter 2 we know that there is a relationship between the correlation function of a moving average time series and the parameters of the time series. For example, for the first order process, ρ(1) = β(1 + β²)^{−1}, where β is the parameter of the process. On this basis, we might be led to estimate β from an estimator of ρ(1). Since we demonstrated in Chapter 6 that the sample autocorrelation r̂(1) estimates ρ(1) with an error which is O_p(n^{−1/2}), it follows that
β̂ = { [2r̂(1)]^{−1}{1 − [1 − 4r̂²(1)]^{1/2}} ,   0 < |r̂(1)| ≤ 0.5 ,
    { 0 ,                                      r̂(1) = 0 ,
    { −1 ,                                     r̂(1) < −0.5 ,
    { 1 ,                                      r̂(1) > 0.5 ,      (8.3.1)
estimates β with an error of the same order. Obviously, if r̂(1) lies outside the range (−0.5, 0.5) by a significant amount, the model of a first order moving average is suspect.
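The moment estimator (8.3.1) can be implemented directly; the function name `ma1_from_r1` is an illustrative assumption.

```python
import numpy as np

def ma1_from_r1(y):
    """Sketch of the moment estimator (8.3.1): invert
    rho(1) = beta / (1 + beta^2) at the sample first autocorrelation,
    choosing the invertible root and truncating at +-1 outside the
    attainable range."""
    y = np.asarray(y, dtype=float)
    x = y - y.mean()
    r1 = np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)
    if abs(r1) < 1e-12:
        return 0.0
    if r1 > 0.5:
        return 1.0
    if r1 < -0.5:
        return -1.0
    return (1 - np.sqrt(1 - 4 * r1 ** 2)) / (2 * r1)
```

The quadratic ρβ² − β + ρ = 0 has two roots; the root with the minus sign is the one inside the unit interval, which is why (8.3.1) uses it.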
It follows from equation (6.2.8) that
Var{r̂(1)} = n^{−1}(1 + β²)^{−4}(1 + β² + 4β⁴ + β⁶ + β⁸) + O(n^{−2}) .  (8.3.2)
Because the derivative of ρ(1) with respect to β is (1 + β²)^{−2}(1 − β²), the approximate variance of β̂ is, for |β| < 1,
Var{β̂} = n^{−1}(1 − β²)^{−2}(1 + β² + 4β⁴ + β⁶ + β⁸) .  (8.3.3)
While the estimator (8.3.1) is consistent, we shall see that it is inefficient for β ≠ 0. Roughly speaking, the other sample autocorrelations contain information about β.
More efficient estimators can be obtained by least squares or by likelihood methods. We first present an estimation procedure based on the Gauss-Newton method of estimating the parameters of a nonlinear function discussed in Chapter 5. One purpose for describing the procedure is to demonstrate the nature of the derivatives that define the limiting distribution of the estimator. In practice, there are a number of computer programs available to perform the numerical estimation. Maximum likelihood estimation is discussed in Section 8.4.
Consider the first order moving average
Y_t = e_t + β e_{t−1} ,  (8.3.4)
where |β| < 1 and the e_t are independent (0, σ²) random variables. Equation (8.3.4) may also be written
e_t = −β e_{t−1} + Y_t  (8.3.5)

and, using our previous difference equation results, we have
e_t = Σ_{j=0}^{t−1} (−β)^j Y_{t−j} + (−β)^t e_0 ,  (8.3.6)

which we may write as Y_t = f_t(Y; β, e_0) + e_t, where f_1(Y; β, e_0) = βe_0 and

f_t(Y; β, e_0) = −Σ_{j=1}^{t−1} (−β)^j Y_{t−j} − (−β)^t e_0  (8.3.7)

for t > 1. The expression (8.3.6) places the estimation problem in the nonlinear estimation format of Section 5.5.2. We assume initial estimators β̃ and ẽ_0 satisfying (β̃ − β) = O_p(n^{−1/4}) and
ẽ_0 = O_p(1) are available. These requirements are satisfied if one uses ẽ_0 = 0 and the estimator β̂ of (8.3.1). The one-step Gauss–Newton estimator of β is obtained by regressing the deviations
e_t(Y; β̃) = Y_t − f_t(Y; β̃, ẽ_0) = Σ_{j=0}^{t−1} (−β̃)^j Y_{t−j} + (−β̃)^t ẽ_0  (8.3.8)
on the first derivative of f_t(Y; β, e_0) evaluated at β = β̃; that derivative is

W_t(Y; β̃) = Σ_{j=1}^{t−1} j(−β̃)^{j−1} Y_{t−j} + t(−β̃)^{t−1} ẽ_0 .  (8.3.9)
We could also include e_0 as a "random parameter" to be estimated. The inclusion of the derivative for a change in e_0 does not affect the limiting distribution of the estimator of β for invertible moving averages. Therefore, we simplify our discussion by considering only the derivative with respect to β.
The computation of e_t(Y; β̃) and W_t(Y; β̃) is simplified by noting that both satisfy difference equations:

e_t(Y; β̃) = Y_t − β̃ e_{t−1}(Y; β̃)  (8.3.10)

and

W_t(Y; β̃) = e_{t−1}(Y; β̃) − β̃ W_{t−1}(Y; β̃) .  (8.3.11)
The difference equation for e_t(Y; β̃) follows directly from (8.3.5). Equation (8.3.11) can be obtained by differentiating both sides of
e_t(Y; β) = Y_t − β e_{t−1}(Y; β)
with respect to β and evaluating the resulting expression at β = β̃. Regressing e_t(Y; β̃) on W_t(Y; β̃), we obtain an estimator of β − β̃. The improved estimator of β is then

β̂ = β̃ + [Σ_{t=1}^{n} W_t²(Y; β̃)]^{−1} Σ_{t=1}^{n} W_t(Y; β̃) e_t(Y; β̃) .  (8.3.12)
The asymptotic properties of β̂ are developed in Theorem 8.3.1. An interesting result is that the large sample behavior of the estimator of the moving average parameter β is the same as that of the estimator of the autoregressive process with parameter −β. The limiting distribution follows from the fact that the derivative (8.3.11) evaluated at the true β is an autoregressive process.
Theorem 8.3.1. Let Y_t satisfy (8.3.4), where |β⁰| < 1, β⁰ is the true value, and the e_t are independent (0, σ²) random variables with E{|e_t|^{2+ν}} < L < ∞ for some ν > 0. Let ẽ_0 and β̃ be initial estimators satisfying ẽ_0 = O_p(1), β̃ − β⁰ = O_p(n^{−1/4}), and |β̃| < 1. Then
n^{1/2}(β̂ − β⁰) →_L N(0, 1 − (β⁰)²) ,
where β̂ is defined in (8.3.12). Also, σ̂² →_p (σ⁰)², where σ⁰ is the true value of σ and
σ̂² = n^{−1} Σ_{t=1}^{n} e_t²(Y; β̂) .
Proof. The model (8.3.6) is of the form (5.5.52) discussed in Section 5.5.2. The first derivative of f_t(Y; β) is given in (8.3.9). The next two derivatives are
∂²f_t(Y; β̃)/∂β² = −Σ_{j=2}^{t−1} j(j − 1)(−β̃)^{j−2} Y_{t−j} − t(t − 1)(−β̃)^{t−2} ẽ_0

and

∂³f_t(Y; β̃)/∂β³ = Σ_{j=3}^{t−1} j(j − 1)(j − 2)(−β̃)^{j−3} Y_{t−j} + t(t − 1)(t − 2)(−β̃)^{t−3} ẽ_0 ,
where it is understood that the summation is defined as if Y_t = 0 for t ≤ 0. Let S be a closed interval containing β as an interior point and such that max_{β∈S} |β| ≤ λ < 1. By Corollary 2.2.2.3, f_t(Y; β) and the derivatives are moving averages with exponentially declining weights for all β in S. Hence, for β in S they converge to stationary infinite moving average time series, the effect of e_0 being transient. It follows from Theorem 6.3.5 that the sample covariances and autocovariances of the four time series converge in probability for all β in S and the limits are continuous functions of β. Convergence is uniform on the compact set S. Hence, conditions 1 and 2 of Theorem 5.5.4 are satisfied and
If β = β⁰,
W_t(Y; β⁰) = Σ_{j=1}^{t−1} (−β⁰)^{j−1} e_{t−j} + (−β⁰)^{t−1}[t ẽ_0 − (t − 1)e_0]
and W_t(Y; β⁰) is converging to a stationary first order autoregressive process with parameter −β⁰. Hence,
by Theorem 6.3.5, and
by the arguments used in the proof of Theorem 8.2.1. Thus, the limiting distribution for n^{1/2}(β̂ − β⁰) is established.
Because σ̂² is a continuous function of β̂ and because n^{−1} Σ_{t=1}^{n} e_t²(Y; β) converges uniformly on S, it follows that σ̂² converges to (σ⁰)² in probability. ∆
Comparison of the result of Theorem 8.3.1 and equation (8.3.3) establishes the large sample inefficiency of the estimator constructed from the first order autocorrelation. By the results of Theorem 8.3.1, we can use the regular regression statistics as approximations when drawing inferences about β.
In our discussion we have assumed that the mean of the time series was known and taken to be zero. It can be demonstrated that the asymptotic results hold for Y_t replaced by Y_t − ȳ_n, where ȳ_n is the sample mean.
The procedure can be iterated using β̂ as the initial estimator. In the next section we discuss estimators that minimize a sum of squares criterion or maximize a likelihood criterion. The asymptotic distribution of those estimators is the same as that obtained in Theorem 8.3.1. However, the estimators of the next section generally perform better in small samples.
A method of obtaining initial estimators that is applicable to higher order processes is an estimation procedure suggested by Durbin (1959). By Theorem 2.6.2, any qth order moving average
Y_t = Σ_{i=1}^{q} β_i e_{t−i} + e_t  (8.3.13)
for which the roots of the characteristic equation are less than one can be represented in the form
Y_t = −Σ_{j=1}^{∞} c_j Y_{t−j} + e_t ,

where the weights satisfy the difference equations c_1 = −β_1, c_2 = −β_2 − β_1c_1, and

c_j = −Σ_{m=1}^{q} β_m c_{j−m} ,  j = q + 1, q + 2, … .  (8.3.14)
Since the weights cj are sums of powers of the roots, they decline in absolute value, and one can terminate the sum at a convenient finite number, say k. Then we can write
Y_t ≈ −Σ_{j=1}^{k} c_j Y_{t−j} + e_t .  (8.3.15)
On the basis of this approximation, we treat Y_t as a finite autoregressive process and estimate the c_j, j = 1, 2, …, k, by the regression procedures of Section 8.2. As the true weights satisfy equation (8.3.14), we treat the estimated ĉ_j's as a finite autoregressive process satisfying (8.3.14) and estimate the β's. That is, we treat
−ĉ_j = Σ_{m=1}^{q} β_m ĉ_{j−m} + error_j ,  j = 1, 2, …, k ,  (8.3.16)

as a regression equation and estimate the β's by regressing −ĉ_j on ĉ_{j−1}, ĉ_{j−2}, …, ĉ_{j−q}, where the appropriate modifications must be made for j = 1, 2, …, q, as per (8.3.14). If we let {k_n} be a sequence such that k_n = o(n^{1/3}) and k_n → ∞ as n → ∞, it is possible to use the results of Berk (1974) to demonstrate that
Σ_{j=1}^{k_n} (ĉ_j − c_j)² = O_p(k_n n^{−1}) .
It follows that the preliminary estimators constructed from the ĉ_j will have an error that is o_p(n^{−1/4}).
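Durbin's two-stage procedure can be sketched as follows, assuming NumPy; the function name `durbin_ma_initial` and the choice of truncation point k are illustrative.

```python
import numpy as np

def durbin_ma_initial(y, q, k):
    """Sketch of Durbin's (1959) two-stage initial estimator: fit a
    long autoregression of order k to obtain weights c_1, ..., c_k,
    then regress -c_j on c_{j-1}, ..., c_{j-q} (with c_0 = 1 and
    c_j = 0 for j < 0) to estimate beta_1, ..., beta_q."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(y)
    # Stage 1: Y_t = -sum_j c_j Y_{t-j} + e_t, truncated at order k
    X = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
    a, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    c = -a                                 # estimates of c_1, ..., c_k
    # Stage 2: treat the weight recursion (8.3.14) as a regression
    full = np.concatenate([np.zeros(q), [1.0], c])  # c_{1-q}, ..., c_k
    Z = np.column_stack([[full[j - m + q] for j in range(1, k + 1)]
                         for m in range(1, q + 1)])
    beta, *_ = np.linalg.lstsq(Z, -c, rcond=None)
    return beta
```

For q = 1 the second stage is exactly the regression of the first column of a table like Table 8.3.2 on the second column.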
In carrying out the Gauss–Newton procedure, initial values ẽ_{1−q}, ẽ_{2−q}, …, ẽ_0 are required. The simplest procedure is to set them equal to zero. Alternatively, the autoregressive equation (8.3.15) can be used to estimate the Y-values preceding the sample period. Recall that a stationary autoregressive process can be written in either a forward or a backward manner (Corollary 2.6.1.2) and, as a result, the extrapolation formula for Y_0 is of the same form as that for Y_{n+1}. If the true process is a qth order moving average, one uses the autoregressive equation to
estimate q values, since the best predictors for Y_t, t ≤ −q, are zero. Thus, one predicts Y_0, Y_{−1}, …, Y_{1−q}, sets Y_t = 0 for t < 1 − q, and uses equation (8.3.13) and the predicted Y-values to estimate e_{2−q}, …, e_0.
Example 8.3.1. We illustrate the Gauss–Newton procedure by fitting a first order moving average to an artificially generated time series. Table 8.3.1 contains 100 observations on Y_t defined by Y_t = 0.7e_{t−1} + e_t, where the e_t are computer generated normal independent (0, 1) random variables. We assume that we know the mean of the time series is zero.
As the first step in the analysis, we fit a seventh order autoregressive model to the data. This yields
Ŷ_t = 0.685Y_{t−1} − 0.584Y_{t−2} + 0.400Y_{t−3} − 0.198Y_{t−4}
      (0.108)       (0.131)       (0.149)       (0.152)

    + 0.020Y_{t−5} + 0.014Y_{t−6} + 0.017Y_{t−7} ,
      (0.149)       (0.133)       (0.108)
Table 8.3.1. One Hundred Observations from a First Order Moving Average Time Series with β = 0.7

  t   First 25   Second 25   Third 25   Fourth 25
  1    1.432      1.176      -1.311      2.607
  2   -0.343      0.846      -0.105      1.572
  3   -1.759      0.079       0.313     -0.261
  4   -2.537      0.815      -0.890     -0.686
  5   -0.295      2.566      -1.778     -2.079
  6    0.689      1.675      -0.202     -2.569
  7   -0.633      0.933       0.450     -0.524
  8   -0.662      0.284      -0.127      0.044
  9   -0.229      0.568      -0.463     -0.088
 10   -0.851      0.515       0.344     -1.333
 11   -3.361     -0.436      -1.412     -1.977
 12   -0.912      0.567      -1.525      0.120
 13    1.594      1.040      -0.017      1.558
 14    1.618      0.064      -0.525      0.904
 15   -1.260     -1.051      -2.689     -1.437
 16    0.288     -1.845      -0.211      0.427
 17    0.858      0.281       2.145      0.051
 18   -1.752     -0.136       0.787      0.120
 19   -0.960     -0.992      -0.452      1.460
 20    1.738      0.321       1.267     -0.493
 21   -1.008      2.621       2.316     -0.888
 22   -1.589      2.804       0.258     -0.530
 23    0.289      2.174      -1.645     -2.757
 24   -0.580      1.897      -1.552     -1.452
 25    1.213     -0.781      -0.213      0.158
where the numbers in parentheses are the estimated standard errors obtained from a standard regression program. The residual mean square for the regression is 1.22. The first few regression coefficients are declining in absolute magnitude with alternating signs. Since the third coefficient exceeds twice its standard error, a second order autoregressive process would be judged an inadequate representation for this realization. Thus, even if one did not know the nature of the process generating the data, the moving average representation would be suggested as a possibility by the regression coefficients. The regression coefficients estimate the negatives of the c_j, j ≥ 1, of Theorem 2.6.2. By that theorem, β = −c_1 and c_j = −βc_{j−1}, j = 2, 3, … . Therefore, we arrange the regression coefficients as in Table 8.3.2. The initial estimator of β is obtained by regressing the first column on the second column of that table. This regression yields β̃ = 0.697.
Our initial estimator for e_0 is

ẽ_0 = 0.685Y_1 − 0.584Y_2 + 0.400Y_3 − 0.198Y_4 + 0.020Y_5 + 0.014Y_6 + 0.017Y_7 = 0.974.
The values of e_t(Y; 0.697) are calculated using (8.3.10), and the values of W_t(Y; 0.697) are calculated using (8.3.11). We have
e_t(Y; 0.697) = { Y_1 − 0.697ẽ_0 = 1.432 − 0.697(0.974) = 0.753 ,   t = 1 ,
                { Y_t − 0.697e_{t−1}(Y; 0.697) ,                    t = 2, 3, …, 100 ,

and

W_t(Y; 0.697) = { ẽ_0 = 0.974 ,                                     t = 1 ,
                { e_{t−1}(Y; 0.697) − 0.697W_{t−1}(Y; 0.697) ,      t = 2, 3, …, 100 .
The first five observations are displayed in Table 8.3.3. Regressing e_t(Y; β̃) on W_t(Y; β̃) gives a coefficient of 0.037 and an estimate of β̂ = 0.734. The estimated standard error is 0.076, and the residual mean square is 1.16.
Table 8.3.2. Observations for Regression Estimation of an Initial Estimate for β

 j   Regression Coefficient −ĉ_j   Multiplier of β
 1          0.685                        1
 2         −0.584                       −0.685
 3          0.400                        0.584
 4         −0.198                       −0.400
 5          0.020                        0.198
 6          0.014                       −0.020
 7          0.017                       −0.014
Table 8.3.3. First Five Observations Used in Gauss–Newton Computation

 t    e_t(Y; 0.697)   W_t(Y; 0.697)
 1     0.753           0.974
 2    −0.868           0.074
 3    −1.154          −0.920
 4    −1.732          −0.513
 5     0.913          −1.375
Maximum likelihood estimation of the parameters of moving average processes is discussed in Section 8.4. Most computer programs offer the user the option of specifying initial values for the likelihood maximization routine or of permitting the program to find initial values. Because of our preliminary analysis, we use 0.697 as our initial value in the likelihood computations. The maximum likelihood estimator computed in SAS/ETS® is 0.725 with an estimated standard error of 0.071. The maximum likelihood estimator of the error variance adjusted for degrees of freedom is σ̂² = 1.17. The similarity of the two sets of estimates is a reflection of the fact that the estimators have the same limiting distribution. ∆
The theory we have presented is all for large samples. There is some evidence that samples must be fairly large before the results are applicable. For example, Macpherson (1981) and Nelson (1974) have conducted Monte Carlo studies in which the empirical variances for the estimated first order moving average parameter are about 1.1 to 1.2 times that based on large sample theory for samples of size 100 and 0 ≤ β ≤ 0.7. Unlike the estimator for the autoregressive process, the distribution of the nonlinear estimator of β for β near zero differs considerably from that suggested by asymptotic theory for n as large as 100. In the Macpherson study the variance of β̂ for β = 0.1 and n = 100 was 1.17 times that suggested by asymptotic theory.
In Figure 8.3.1 we compare an estimate of the density of β̂ for β = 0.7 with the normal density suggested by the asymptotic theory. The empirical density is based on 15,000 samples of size 100. The estimator of β is the maximum likelihood estimator determined by a grid search procedure.
The estimated density for β̂ is fairly symmetric about 0.7 with a mean of 0.706. However, the empirical density is considerably flatter than the normal approximation. The variance of the empirical distribution is 0.0064, compared to 0.0051 for the normal approximation.
8.4. AUTOREGRESSIVE MOVING AVERAGE TIME SERIES
In this section, we treat estimation of the parameters of time series with representation
[Figure 8.3.1. Estimated density of maximum likelihood estimator of β̂ compared with normal approximation for β = 0.7 and n = 100. (Dashed line is normal density.) Horizontal axis: parameter estimate, 0.4 to 1.0.]
Y_t + Σ_{j=1}^{p} α_j Y_{t−j} = e_t + Σ_{i=1}^{q} β_i e_{t−i} ,  (8.4.1)

where the e_t are independent (0, σ²) random variables, and the roots of

A(m; α) = m^p + Σ_{j=1}^{p} α_j m^{p−j} = 0  (8.4.2)

and of

B(s; β) = s^q + Σ_{i=1}^{q} β_i s^{q−i} = 0  (8.4.3)
are less than one in absolute value. Estimators of the parameters of the process can be defined in a number of different ways. We consider three estimators obtained by minimizing three
different functions. If Y_t is a stationary normal time series, the logarithm of the likelihood is
L_{1n}(ξ) = −0.5n log 2π − 0.5 log|Σ_{YY}(ξ)| − 0.5Y′Σ_{YY}^{−1}(ξ)Y ,  (8.4.4)
where ξ′ = (θ′, σ²), θ′ = (α_1, α_2, …, α_p, β_1, β_2, …, β_q), Y′ = (Y_1, Y_2, …, Y_n), and Σ_{YY} = Σ_{YY}(ξ) = E{YY′}.
The estimator that maximizes L_{1n}(ξ) is often called the maximum likelihood estimator or Gaussian likelihood estimator even if Y_t is not normal. Let σ^{−2}Σ_{YY}(ξ) = M_{YY}(θ). Then the maximum likelihood estimator of θ can be obtained by minimizing
l_n(θ) = n^{−1}|M_{YY}(θ)|^{1/n} Y′M_{YY}^{−1}(θ)Y .  (8.4.5)
The estimator of θ obtained by minimizing
Q_{1n}(θ) = n^{−1}Y′M_{YY}^{−1}(θ)Y  (8.4.6)
is called the least squares estimator or the unconditional least squares estimator. An approximation to the least squares estimator is the estimator that minimizes
Q_{2n}(θ) = n^{−1} Σ_{t=1}^{n} [Σ_{j=0}^{t−1} d_j(θ) Y_{t−j}]² ,  (8.4.7)
where the d_j(θ) are defined in Theorem 2.7.2. Observe that Q_{2n}(θ) is the average of the squares obtained by truncating the infinite autoregressive representation for e_t. In Corollary 8.4.1 of this section, we show that the estimators that minimize l_n(θ), Q_{1n}(θ), and Q_{2n}(θ) have the same limiting distribution for stationary invertible time series.
To obtain the partial derivatives associated with the estimation, we express (8.4.1) as
e_t(Y; θ) = Y_t + Σ_{j=1}^{p} α_j Y_{t−j} − Σ_{i=1}^{q} β_i e_{t−i}(Y; θ) .  (8.4.8)
Differentiating both sides of (8.4.8), we have

W_{α_j,t}(Y; θ) = ∂e_t(Y; θ)/∂α_j = Y_{t−j} − Σ_{i=1}^{q} β_i W_{α_j,t−i}(Y; θ) ,
W_{β_i,t}(Y; θ) = ∂e_t(Y; θ)/∂β_i = −e_{t−i}(Y; θ) − Σ_{m=1}^{q} β_m W_{β_i,t−m}(Y; θ) .  (8.4.9)
Using the initial conditions
W_{α_j,t}(Y; θ) = 0 ,  j = 1, 2, …, p ,
W_{β_i,t}(Y; θ) = 0 ,  i = 1, 2, …, q ,

for t ≤ p and e_{t−i}(Y; θ) = 0 for t − i ≤ p, the derivatives of Q_{2n}(θ) are defined recursively. The W_{α_j,t}(Y; θ) and W_{β_i,t}(Y; θ) are autoregressive moving averages of Y_t. Therefore, if the roots of (8.4.2) and (8.4.3) associated with the θ at which the derivatives are evaluated are less than one in absolute value, the effect of the initial conditions dies out and the derivatives converge to autoregressive moving average time series as t increases.
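The recursive computation of the residuals (8.4.8) and derivative series just described can be sketched as follows, with the start-up convention of zeroing residuals and derivatives for t ≤ p; the function name is an illustrative assumption.

```python
import numpy as np

def arma_resid_and_derivs(y, alpha, beta):
    """Sketch of the recursions (8.4.8)-(8.4.9): compute the truncated
    residuals e_t(Y; theta) and the derivative series used by
    Gauss-Newton, with residuals and derivatives set to zero for the
    first p observations, as in the text."""
    y = np.asarray(y, dtype=float)
    n, p, q = len(y), len(alpha), len(beta)
    e = np.zeros(n)
    Wa = np.zeros((p, n))            # d e_t / d alpha_j
    Wb = np.zeros((q, n))            # d e_t / d beta_i
    for t in range(p, n):
        e[t] = (y[t]
                + sum(alpha[j] * y[t - 1 - j] for j in range(p))
                - sum(beta[i] * e[t - 1 - i]
                      for i in range(q) if t - 1 - i >= 0))
        for j in range(p):
            Wa[j, t] = y[t - 1 - j] - sum(
                beta[i] * Wa[j, t - 1 - i]
                for i in range(q) if t - 1 - i >= 0)
        for i in range(q):
            Wb[i, t] = -(e[t - 1 - i] if t - 1 - i >= 0 else 0.0) - sum(
                beta[m] * Wb[i, t - 1 - m]
                for m in range(q) if t - 1 - m >= 0)
    return e, Wa, Wb
```

Because the zero-initialized arrays supply the start-up values, the index guards are needed only to avoid negative Python indices.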
The large sample properties of the estimator associated with Q_{2n}(θ) are given in Theorem 8.4.1. We prove the theorem for iid(0, σ²) errors, but the result also holds for martingale difference errors satisfying the conditions of Theorem 8.2.1.

Theorem 8.4.1. Let the stationary time series Y_t satisfy (8.4.1), where the e_t are iid(0, σ²) random variables. The parameter space Θ is such that all roots of (8.4.2) and (8.4.3) are less than one in absolute value, and (8.4.2) and (8.4.3) have no common root. Let θ⁰ denote the true parameter, which is in the interior of the parameter space Θ. Let θ̂ be the value of θ in the closure of Θ that minimizes Q_{2n}(θ). Then
θ̂ →_p θ⁰

and

n^{1/2}(θ̂ − θ⁰) →_L N(0, V_{θθ}(θ⁰)) ,  (8.4.10)

where
Proof. Let Θ_e be a compact space such that, for any θ ∈ Θ_e, the roots of (8.4.2) are less than or equal to one in absolute value, the roots of (8.4.3) are less than or equal to 1 − ε in absolute value for some ε > 0, and θ⁰ is in the interior of Θ_e. For any θ in Θ_e,
where
and the polynomials are defined in (8.4.2) and (8.4.3). The time series Z_t(θ) is a stationary autoregressive moving average because the roots of B(s; β) are less than one in absolute value. See Corollary 2.2.2.3, equation (2.7.14), and Theorem 6.3.5. The convergence is uniform by Lemma 5.5.5, because Z_t(θ) and its derivatives are infinite moving averages with exponentially declining coefficients. Now
defines the unique minimum variance prediction error for the predictor of Y_t based on Y_{t−1}, Y_{t−2}, … . See Section 2.9. Therefore, if θ ≠ θ⁰ and θ ∈ Θ_e,
V{Z_t(θ)} > V{Z_t(θ⁰)}. If any of the roots of B(s; β) are equal to one in absolute value, Q_{2n}(θ) increases without bound as n → ∞. It follows that the condition (5.5.5) of Lemma 5.5.1 is satisfied for θ̂ defined by the minimum of Q_{2n}(θ) over the closure of Θ. Hence, θ̂ converges to θ⁰ in probability as n → ∞. Because Q_{2n}(θ) is a continuous function of θ that converges uniformly to V{Z_t(θ)} on Θ_e, and θ̂ →_p θ⁰, it follows that
The first derivatives of e_t(Y; θ) are defined in (8.4.9). The second derivatives of e_t(Y; θ) can be defined in a similar manner. For example,
Therefore, the second derivatives are also autoregressive moving averages. It follows that the matrix
converges uniformly to B(θ) = lim_{t→∞} 0.5E{W′_{θt}W_{θt}}, which is a continuous function of θ on some convex compact neighborhood S of θ⁰ containing θ⁰ as an interior point.
By the Taylor series arguments used in the proof of Theorem 5.5.1,
where B(θ⁰) is defined in that theorem and r_n is of smaller order than θ̂ − θ⁰.
Now,
because the d_j(θ⁰) decline exponentially. The vector W_{θt}(Y; θ⁰) is a function of (Y_1, Y_2, …, Y_{t−1}) and hence is independent of e_t. Following the arguments in the proof of Theorem 5.5.1, we obtain the asymptotic normality of
n^{−1/2} Σ_{t=1}^{n} W′_{θt}(Y; θ⁰) e_t ,
and hence of n^{1/2}(θ̂ − θ⁰). Theorem 5.5.1 does not apply directly, because Y_t + Σ_{j=1}^{t−1} d_j(θ)Y_{t−j} are not independent and not identically distributed. However, as t increases, every element of W′_{θt}(Y; θ⁰) converges to an autoregressive moving average time series in the e_t, and we obtain V_{θθ}(θ⁰) as the covariance matrix of the limiting distribution. ∆
The three estimators defined by (8.4.5), (8.4.6), and (8.4.7) have the same limiting behavior.

Theorem 8.4.2. Let θ̂_m and θ̂_1 be the estimators obtained by minimizing (8.4.5) and (8.4.6), respectively. Then under the assumptions of Theorem 8.4.1, the limiting distribution of n^{1/2}(θ̂_m − θ⁰) is equal to the limiting distribution of n^{1/2}(θ̂_1 − θ⁰) and is that given in (8.4.10).

Proof. Let Y*_t = Y*_t(θ) be an autoregressive moving average (p, q) with parameter θ ∈ Θ_e and uncorrelated (0, σ²) errors e*_t, where Θ_e is defined in the proof of Theorem 8.4.1. Let

v = (Y_{1−p}, Y_{2−p}, …, Y_0, e_{1−p}, e_{2−p}, …, e_0)′ ,

Y′_n = (Y_1, Y_2, …, Y_n), and e′_n = (e_1, e_2, …, e_n). Then
[ v   ]   [ I ]     [ 0    ]
[ e_n ] = [ K ] v + [ DY_n ] ,

where D = D(θ) is the n × n lower triangular matrix with d_{ij} = d_{i−j}(θ), d_j(θ) is defined in Theorem 2.7.2, 0 is a (p + q) × n matrix of zeros, I is the (p + q) × (p + q) identity, and K* = (I, K′)′ is a matrix whose elements satisfy
k*_{ij} = δ_{ij} ,  1 ≤ i ≤ p + q ,  1 ≤ j ≤ p + q ,

k*_{ij} = −Σ_{r=1}^{q} β_r k*_{i−r,j} − α_{i−j−q} ,  p + q + 1 ≤ i ≤ n + p + q ,  1 ≤ j ≤ p ,

k*_{ij} = −Σ_{r=1}^{q} β_r k*_{i−r,j} ,  p + q + 1 ≤ i ≤ n + p + q ,  p + 1 ≤ j ≤ p + q ,

where δ_{ij} is the Kronecker delta and α_m is taken to be zero for m outside {1, 2, …, p}. Note that the k_{ij} = k_{ij}(θ) satisfy the same difference
equation as the d_j(θ), but with different initial values. Galbraith and Galbraith (1974) give the expressions
M_{YY}^{−1} = D′D − D′K(A^{−1} + K′K)^{−1}K′D  (8.4.11)
and
where
The dependence of all matrices on θ has been suppressed to simplify the expressions. Observe that Q_{2n} = n^{−1}Y′D′DY and
Q_{2n} − Q_{1n} = n^{−1}Y′D′K(A^{−1} + K′K)^{−1}K′DY .
To show that Q_{2n} − Q_{1n} converges to zero, we look at the difference on the set Θ_e. Now
for some M < ∞ and some 0 < λ < 1, because the d_j(θ) and the elements of K decline exponentially. Also, A is a positive definite matrix for θ ∈ Θ_e, and the determinant is uniformly bounded above and below by positive numbers for all n and all θ ∈ Θ_e. It follows that
sup_{θ∈Θ_e} |Q_{2n}(θ) − Q_{1n}(θ)| → 0  a.s.

We now investigate the ratio l_n(θ)[Q_{1n}(θ)]^{−1}. Let the n × n lower triangular matrix T = T(θ) define the prediction error made in using the best predictor based on (Y_{t−1}, Y_{t−2}, …, Y_1) to predict Y_t. See Theorem 2.9.1 and Theorem 2.10.1. Then,
and
where h_{11} > σ² and h_{ii} → σ² as i → ∞. Also, letting Σ_{j=1}^{i−1} b_j Y_{i−j} be the best predictor of Y_i,
for some M < ∞ and 0 < λ < 1. Thus,
for some M < ∞ and 0 < λ < 1. It follows that
l_n(θ)[Q_{1n}(θ)]^{−1} = |M_{YY}(θ)|^{1/n} → 1 .
Let the (p + q)-vector Z = Z(θ) = K′DY.
Now the g_i(θ), their first derivatives, and their second derivatives are exponentially declining in i for θ ∈ Θ_e. Also, the first and second derivatives of the elements of K(θ) are exponentially declining in i. The first and second derivatives of A are bounded on Θ_e. Therefore, l_n(θ) − Q_{1n}(θ), Q_{1n}(θ) − Q_{2n}(θ), and Q_{2n}(θ) − l_n(θ) and their first and second derivatives converge uniformly to zero in probability. By the stated derivative properties of K(θ) and A(θ), the first and second
derivatives of n^{−1} log|M_{YY}(θ)| converge uniformly to zero in probability for θ ∈ Θ_e. Therefore, the limits of l_n(θ), of Q_{1n}(θ), and of Q_{2n}(θ) are the same, and the limits of the first and second derivatives of the three quantities are also the same. It follows that the limiting distributions of the three estimators are the same. ∆
We derived the limiting distribution of the estimators for the time series with known mean. Because the limiting behavior of sample autocovariances computed with mean adjusted data is the same as that for autocovariances computed with known mean, the results extend to the unknown mean case.
The mean squares and products of the estimated derivatives converge to the mean squares and products of the derivatives based on the true 8'. Therefore, the usual nonlinear least squares estimated covariance matrix can be used for inference.
There are a number of computer programs available that compute the Gaussian
maximum likelihood estimates or approximations to them. It is good practice to use the autoregressive representation technique introduced in Section 8.3 to obtain initial estimates for these programs. By Theorem 2.7.2, we can represent the invertible autoregressive moving average by the infinite autoregressive process
Y_t = −Σ_{j=1}^{∞} d_j Y_{t−j} + e_t,    (8.4.12)
where the d_j are defined in Theorem 2.7.2. Hence, by terminating the sum at a convenient finite number, say k, and estimating the autoregressive parameters d_1, d_2, ..., d_k, we can use the definitions of the d_j to obtain initial estimates of the autoregressive moving average parameters.
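As a sketch of this inversion for the ARMA(1,1) case (our own illustration; the function name and the check values are not from the text), the relations of Theorem 2.7.2 for Y_t + αY_{t−1} = e_t + βe_{t−1} give d_j = (α − β)(−β)^{j−1}, which can be solved for starting values:

```python
# Sketch (ours, not from the text): starting values for an ARMA(1,1),
# Y_t + alpha*Y_{t-1} = e_t + beta*e_{t-1}, from a truncated long-AR fit.
# Theorem 2.7.2 gives d_j = (alpha - beta) * (-beta)**(j-1), so
# beta = -d_2/d_1 and alpha = d_1 + beta.

def arma11_initial(d):
    """d: estimated long-autoregression coefficients d_1, d_2, ..."""
    beta = -d[1] / d[0]
    alpha = d[0] + beta
    return alpha, beta

# Check on exact d_j generated from known parameters.
alpha, beta = -0.8, 0.5
d = [(alpha - beta) * (-beta) ** (j - 1) for j in range(1, 6)]
print(arma11_initial(d))  # recovers approximately (-0.8, 0.5)
```

In practice the d_j would come from an ordinary least squares fit of the truncated autoregression, so the recovered values are starting points only.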
In using a nonlinear method such as the Gauss-Newton procedure, one must beware of certain degeneracies that can occur. For example, consider the autoregressive moving average (1,1) time series. If we specify zero initial estimates for both parameters, the derivative with respect to α_1 evaluated at α_1 = β_1 = 0 is −Y_{t−1}. Likewise, the derivative with respect to β_1 evaluated at α_1 = β_1 = 0 is Y_{t−1}, and the matrix of partial derivatives to be inverted is clearly singular. At second glance, this is not a particularly startling result. It means that a first order autoregressive process with small α_1 behaves very much like a first order moving average process with small β_1 and that both behave much like an autoregressive moving average (1,1) time series where both α_1 and β_1 are small. Therefore, one should consider the autoregressive moving average (1,1) representation only if at least one of the trial parameter values is well away from zero. Because the autoregressive moving average (1,1) time series with α_1 = β_1 is a sequence of uncorrelated random variables, the singularity occurs whenever the initial values are taken to be equal.
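The singularity can be checked numerically: at α_1 = β_1 = 0 the two derivative columns are −Y_{t−1} and +Y_{t−1}, so the cross-product matrix to be inverted has rank one (a small illustration with arbitrary made-up data):

```python
import numpy as np

# At alpha_1 = beta_1 = 0 the Gauss-Newton derivative columns for an
# ARMA(1,1) are -Y_{t-1} (for alpha_1) and +Y_{t-1} (for beta_1).
y = np.array([0.3, -1.2, 0.7, 1.5, -0.4, 0.9])  # arbitrary illustrative series
X = np.column_stack([-y[:-1], y[:-1]])          # derivative matrix at (0, 0)
print(np.linalg.matrix_rank(X.T @ X))           # 1: the matrix is singular
```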
In developing our estimation theory for autoregressive moving averages, we have assumed that estimation is carried out for a correctly specified model. In practice, one often is involved in specifying the model at the same time that one is constructing estimates. A number of criteria have been suggested for use in model selection, some developed with time series applications in mind. Because the estimated model parameters are asymptotically normally distributed and because the estimation procedures are closely related to regression, model selection procedures developed for regression models are also applicable to the auto- regressive moving average problem. A test based on such statistics was used in Example 8.2.1.
Another selection procedure is based on the variance of regression prediction. See Mallows (1973) and Akaike (1969a). Assume one fits the regression model
y = Xβ + e,    (8.4.13)
where y is n × 1, X is n × k, β is k × 1, and e ~ (0, σ²I). If one predicts the y-value for each of the observed rows X_t of X, the average of the n prediction variances is
n^{−1} Σ_{t=1}^{n} E{(ỹ_t − y_t)²} = n^{−1}(n + k)σ²,    (8.4.14)
where ỹ_t = X_tβ̂, β̂ = (X'X)^{−1}X'y, and k is the dimension of X_t. Thus, one might choose the model that minimizes an estimator of the mean square prediction error. To estimate (8.4.14), one requires an estimator of σ². The estimator of σ² that is most often suggested in the regression literature is

σ̂² = (n − k_m)^{−1}(y'y − β̂'_m X'_m X_m β̂_m),    (8.4.15)
where X_m is the model of highest dimension and that dimension is k_m. Thus, the model is chosen to minimize
MPE = n^{−1}(n + k)σ̂².    (8.4.16)
The use of the regression residual mean square for the particular set of regression variables being evaluated as the estimator of σ² generally leads to the same model selection.
A criterion closely related to the mean square prediction error is the criterion called AIC, introduced by Akaike (1973). This criterion is
AIC = −2 log L(θ̂) + 2k,    (8.4.17)
where L(θ̂) is the likelihood function evaluated at the maximum likelihood estimator θ̂, and k is the dimension of θ. For normal autoregressive moving average models,
−2 log L(θ̂) = log|Σ_{YY}(θ̂)| + n + n log 2π,
where Σ_{YY}(θ̂) is the covariance matrix of (Y_1, Y_2, ..., Y_n)' evaluated at θ = θ̂. The determinant of Σ_{YY}(θ) can be expressed as a product of the prediction error variances (see Theorem 2.9.3). Also, the variances of the prediction errors converge to the error variance σ² of the process as t increases. Therefore, log|Σ_{YY}(θ̂)| is close to n log σ̂_e², where σ̂_e² = σ_e²(θ̂) is the maximum likelihood estimator of σ². It follows that AIC and MPE are closely related, because
n^{−1} log|Σ_{YY}(θ̂)| + 2n^{−1}k ≐ log[(1 + 2n^{−1}k)σ̂_e²] ≐ log MPE,
when the σ̂² of MPE is close to (n − k)^{−1}nσ̂_e². The AIC criterion is widely used, although it is known that this criterion tends to select rather high order models and will overestimate the true order of a finite order autoregressive model. Because of the tendency to overestimate the order of the model, a number of related criteria have been developed, and considerable research has been conducted on the use of model selection criteria. See Parzen (1974, 1977), Shibata (1976, 1986), Hannan
and Quinn (1979), Hannan (1980), Hannan and Rissanen (1982), Bhansali (1991), Findley (1985), Schwarz (1978), and Findley and Wei (1989).
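The closeness of AIC and MPE noted above can be illustrated numerically; the values of n, k, and σ̂_e² below are our own, not the text's:

```python
import math

# Compare n^{-1}AIC (dropping the constant 1 + log 2*pi) with log MPE.
n, k = 100, 5
s2_e = 1.0                        # maximum likelihood residual variance
s2 = n / (n - k) * s2_e           # degrees-of-freedom adjusted estimator
mpe = (n + k) / n * s2            # MPE = n^{-1}(n + k) * sigma-hat^2
lhs = math.log(s2_e) + 2 * k / n  # n^{-1}AIC less the constant terms
print(lhs, math.log(mpe))         # the two are nearly equal
```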
Example 8.4.1. The data of Table 10.B.2 of Appendix 10.B are artificially created data generated by an autoregressive moving average model. As a first step in the analysis, we fit a pure autoregressive model of order 10 by ordinary least squares. The fitted model is
Ŷ_t = −0.099 + 2.153 Y_{t−1} − 2.032 Y_{t−2} + 0.778 Y_{t−3}
     (0.104)  (0.105)         (0.250)         (0.332)
     + 0.072 Y_{t−4} − 0.440 Y_{t−5} + 0.363 Y_{t−6}
      (0.352)         (0.361)         (0.366)
     − 0.100 Y_{t−7} − 0.144 Y_{t−8} + 0.150 Y_{t−9} − 0.082 Y_{t−10},
      (0.368)         (0.355)         (0.268)         (0.113)
where the numbers in parentheses are the estimated standard errors of an ordinary regression program. We added seven observations equal to the sample mean to the beginning of the data set and then created ten lags of the data. This is a compromise between the estimators (8.2.5) and (8.2.8). Also, we keep the same number of observations when we fit reduced autoregressive models. There are 97 observations in the regression. The residual mean square with 86 degrees of freedom is 1.020. The fourth order autoregressive process estimated by ordinary least squares based on 97 observations is
Ŷ_t = −0.086 + 2.170 Y_{t−1} − 2.074 Y_{t−2} + 0.913 Y_{t−3} − 0.209 Y_{t−4}
     (0.102)  (0.099)         (0.224)         (0.225)         (0.100)
with a residual mean square of 0.996. This model might be judged acceptable if one were restricting oneself to autoregressive processes because the test for the fourth order versus the tenth order gives F = 0.63, where the distribution of F can be approximated by Snedecor’s F with 6 and 86 degrees of freedom.
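The quoted F statistic can be reproduced from the two residual mean squares and their degrees of freedom as stated above:

```python
# F test of the AR(4) model against the AR(10) model.
df_full, ms_full = 86, 1.020   # AR(10): 97 observations, 11 coefficients
df_red, ms_red = 92, 0.996     # AR(4): 97 observations, 5 coefficients
rss_full = df_full * ms_full
rss_red = df_red * ms_red
F = ((rss_red - rss_full) / (df_red - df_full)) / ms_full
print(round(F, 2))             # approximately the 0.63 reported in the text
```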
We consider several alternative models estimated by Gaussian maximum likelihood in Table 8.4.1. All calculations were done in SAS/ETS®. The estimate
Table 8.4.1. Comparison of Alternative Models Estimated by Maximum Likelihood

    Model        σ̂²      AIC     MPE
    AR(10)       1.005   301.4   1.115
    AR(4)        0.996   294.5   1.046
    ARMA(2,1)    1.032   296.9   1.074
    ARMA(2,2)    0.991   294.0   1.041
    ARMA(2,3)    0.990   295.0   1.049
    ARMA(3,1)    1.010   295.8   1.061
    ARMA(3,2)    0.992   295.2   1.052
of σ² given in the table is the maximum likelihood estimator adjusted for degrees of freedom,

σ̂² = (n − r)^{−1} Y'M̂_{YY}^{−1}Y,

where Y' = (Y_1, Y_2, ..., Y_n), M̂_{YY} is the estimate of M_{YY}, Σ_{YY} = σ²M_{YY} is the n × n covariance matrix of Y, and r is the total number of parameters estimated. The mean square prediction error was calculated as

MPE = n^{−1}(n + r)σ̂²,
where n = 100, and the AIC is defined in (8.4.17). Several of the models have similar properties. On the basis of the AIC criterion,
one would choose the autoregressive moving average (2,2). For this model, the estimated mean is -0.26 with a standard error of 0.56, and the other estimated parameters are
(α̂_1, α̂_2, β̂_1, β̂_2) = (−1.502, 0.843, 0.715, 0.287),

with estimated standard errors (0.109), (0.110), (0.061), (0.059). ▲▲
Example 8.4.2. As an example of estimation for a process containing several parameters, we fit an autoregressive moving average to the United States monthly unemployment rate from October 1949 to September 1974 (300 observations). The periodogram of this time series was discussed in Section 7.2. Because of the very large contribution to the total sum of squares from the seasonal frequencies, it seems reasonable to treat the time series as if there were a different mean for each month. Therefore, we analyze the deviations from monthly means, which we denote by Y_t. The fact that the periodogram ordinates close to the seasonal frequencies are large relative to those separated from the seasonal frequencies leads us to expect a seasonal component in a representation for Y_t.
We fit a regression of Y_t on Y_{t−1}, Y_{t−2}, Y_{t−3}, and on lags 12 through 15, 24 through 27, 36 through 39, and 48 through 51. Notice that we are anticipating a model of the "component" or "multiplicative" type, so that when we include a variable of lag 12, we also include the next three lags. That is, we are anticipating a model of the form
(1 − θ_1B − θ_2B^2 − θ_3B^3)(1 − θ_4B^12 − θ_5B^24 − θ_6B^36 − θ_7B^48)Y_t
    = Y_t − θ_1Y_{t−1} − θ_2Y_{t−2} − θ_3Y_{t−3} − θ_4Y_{t−12}
      + θ_1θ_4Y_{t−13} + θ_2θ_4Y_{t−14} + ··· + θ_3θ_7Y_{t−51} = e_t.

In calculating the regression equation we added 36 zeros to the beginning of the data set, lagged Y_t the requisite number of times, and used the last 285 observations in the regression. This is a compromise between the forms (8.2.5) and (8.2.8) for the estimation of the autoregressive parameters. The regression vectors for the explanatory variables Y_{t−1}, Y_{t−2}, Y_{t−3}, Y_{t−12}, Y_{t−13}, Y_{t−14}, and Y_{t−15}
Table 8.4.2. Regression Coefficients Obtained in Preliminary Autoregressive Fit to United States Monthly Unemployment Rate

    Variable      Coefficient    Standard Error of Coefficient
    Y_{t−1}        1.08          0.061
    Y_{t−2}        0.06          0.091
    Y_{t−3}       −0.16          0.063
    Y_{t−12}       0.14          0.060
    Y_{t−13}      −0.22          0.086
    Y_{t−14}       0.00          0.086
    Y_{t−15}       0.06          0.059
    Y_{t−24}       0.18          0.053
    Y_{t−25}      −0.17          0.075
    Y_{t−26}      −0.09          0.075
    Y_{t−27}       0.09          0.054
    Y_{t−36}       0.08          0.054
    Y_{t−37}      −0.13          0.075
    Y_{t−38}       0.04          0.075
    Y_{t−39}       0.00          0.054
    Y_{t−48}       0.11          0.055
    Y_{t−49}      −0.01          0.075
    Y_{t−50}      −0.01          0.075
    Y_{t−51}      −0.09          0.053
contain all observed values, but the vectors for longer lags contain zeros for some of the initial observations.
The regression coefficients and standard errors are given in Table 8.4.2. The data seem to be consistent with the component type of model. The coefficients for Y_{t−12i−1} are approximately the negatives of the coefficients for Y_{t−12i} for i = 1, 2, 3. Even more consistently, the sum of the three coefficients for Y_{t−12i−j}, j = 1, 2, 3, is approximately the negative of the coefficient for Y_{t−12i}, i = 1, 2, 3, 4. The individual coefficients show variation about the anticipated relationships, but they give us no reason to reject the component model. The residual mean square for this regression is 0.0634 with 254 degrees of freedom. There are 285 observations in the regression and 19 regression variables. We deduct an additional 12 degrees of freedom for the 12 means previously estimated. We also fit the model with Y_{t−1}, Y_{t−2}, Y_{t−3}, Y_{t−4} and the corresponding lags of 12, as well as the model with Y_{t−1}, Y_{t−2}, Y_{t−3}, Y_{t−4}, Y_{t−5} and the corresponding lags of 12. Since the coefficient on Y_{t−3} is almost twice its standard error, while the coefficients on Y_{t−4} and Y_{t−5} were small, we take the third order autoregressive process as our tentative model for the nonseasonal component.
The coefficients for Y_{t−12}, Y_{t−24}, Y_{t−36}, Y_{t−48} are of the same sign, are of small magnitude relative to one, and are declining slowly. The autoregressive coefficients for an autoregressive moving average can display this behavior. Therefore, we consider
(1 − θ_1B − θ_2B^2 − θ_3B^3)(1 − φB^12)Y_t = e_t + βe_{t−12}    (8.4.18)

as a potential model. On the basis of Theorem 2.7.2, the regression coefficients for lags of multiples of 12 should satisfy, approximately, the relations of Table 8.4.3. Regressing the first column of that table on the second two columns, we obtain the initial estimates φ̂ = 0.97, β̂ = −0.83. The estimate of φ is of fairly large absolute value to be estimated from only four coefficients, but we are only interested in obtaining crude values that can be used as starting values for the nonlinear estimation. The maximum likelihood estimate of the parameter vector of model (8.4.18) using (1.08, 0.06, −0.16, 0.97, −0.83) as the initial vector is
(θ̂_1, θ̂_2, θ̂_3, φ̂, β̂) = (1.152, −0.002, −0.195, 0.817, −0.651)
with estimated standard errors of (0.055, 0.085, 0.053, 0.067, 0.093). The residual mean square error is 0.0626 with 268 degrees of freedom. Since this residual mean square is smaller than that associated with the previous regressions, the hypothesis that the restrictions associated with the autoregressive moving average representation are valid is easily accepted. As a check on model adequacy beyond that given by our initial regressions, we estimated four alternative models, each containing one additional term. In no case was the added term significant at the 5% level using the approximate tests based on the regression statistics.
One interpretation of our final model is of some interest. Define X_t = Y_t − 1.152Y_{t−1} + 0.002Y_{t−2} + 0.195Y_{t−3}. Then X_t has the autoregressive moving average representation

X_t = 0.817X_{t−12} − 0.651e_{t−12} + e_t,
where the e_t are uncorrelated (0, 0.0626) random variables. Now X_t would have this representation if it were the sum of two independent time series X_t = S_t + v_t, where v_t is a sequence of uncorrelated (0, 0.0499) random variables,

S_t = 0.817S_{t−12} + u_t,
Table 8.4.3. Calculation of Initial Estimates of φ and β

         Regression     Multiplier    Multiplier
    j    Coefficient    for φ         for β
    1    0.14           1              1
    2    0.18           0             −0.14
    3    0.08           0             −0.18
    4    0.11           0             −0.08
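The initial values φ̂ = 0.97 and β̂ = −0.83 quoted above can be reproduced by the least squares fit of the first column of Table 8.4.3 on the two multiplier columns:

```python
import numpy as np

# Least squares fit of the lag-12j regression coefficients on the
# multipliers of Table 8.4.3.
c = np.array([0.14, 0.18, 0.08, 0.11])   # coefficients for lags 12, 24, 36, 48
X = np.array([[1.0, 1.0],                # columns: multiplier for phi, for beta
              [0.0, -0.14],
              [0.0, -0.18],
              [0.0, -0.08]])
phi, beta = np.linalg.lstsq(X, c, rcond=None)[0]
print(round(phi, 2), round(beta, 2))     # 0.97 -0.83
```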
and u_t is a sequence of (0, 0.0059) random variables. In such a representation, S_t can be viewed as the "seasonal component," and the methods of Section 4.5 could be used to construct a filter to estimate S_t. ▲▲
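The component variances 0.0499 and 0.0059 follow from matching the lag-0 and lag-12 autocovariances of u_t + v_t − 0.817v_{t−12} with those of e_t − 0.651e_{t−12}; a sketch of that calculation:

```python
# Solve for the component variances in X_t = S_t + v_t,
# S_t = 0.817 S_{t-12} + u_t, given X_t = 0.817 X_{t-12} - 0.651 e_{t-12} + e_t.
phi, beta, s2_e = 0.817, 0.651, 0.0626
s2_v = beta * s2_e / phi                               # lag-12 covariance match
s2_u = (1 + beta ** 2) * s2_e - (1 + phi ** 2) * s2_v  # lag-0 variance match
print(round(s2_v, 4), round(s2_u, 4))                  # close to 0.0499 and 0.0059
```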
8.5. PREDICTION WITH ESTIMATED PARAMETERS
We now investigate the use of the estimated parameters of autoregressive moving average time series in prediction. Prediction was introduced in Section 2.9 assuming the parameters to be known. The estimators of the parameters of stationary finite order autoregressive invertible moving average time series discussed in this chapter possess errors whose order in probability is n^{−1/2}. For such time series, the use of the estimated parameters in prediction increases the prediction error by a quantity of O_p(n^{−1/2}).
Let the time series Y_t be defined by

Y_t + α_1Y_{t−1} + ··· + α_pY_{t−p} = e_t + β_1e_{t−1} + ··· + β_qe_{t−q},    (8.5.1)

where the roots of

m^p + Σ_{i=1}^{p} α_i m^{p−i} = 0

and of

r^q + Σ_{i=1}^{q} β_i r^{q−i} = 0

are less than one in absolute value and the e_t are independent (0, σ²) random variables with E{e_t⁴} = ησ⁴. Let

θ = (−α_1, ..., −α_p, β_1, ..., β_q)'

denote the vector of parameters of the process. When θ is known, the best one-period-ahead predictor for Y_{n+1} is given by Theorems 2.9.1 and 2.9.3. The predictor for large n is given in (2.9.25). The large n predictor obtained by replacing the α_i and β_i in (2.9.25) by α̂_i and β̂_i is

Ŷ_{n+1}(Y_1, ..., Y_n) = −Σ_{i=1}^{p} α̂_iY_{n+1−i} + Σ_{i=1}^{q} β̂_iê_{n+1−i},    (8.5.2)

where

ê_t = Y_t + Σ_{i=1}^{p} α̂_iY_{t−i} − Σ_{j=1}^{q} β̂_jê_{t−j},    t = p + 1, p + 2, ..., n,
ê_t = 0,    t = p − q + 1, p − q + 2, ..., p.    (8.5.3)
Theorem 8.5.1. Let Y_t be the time series defined in (8.5.1). Let θ̂ be an estimator of θ = (−α_1, ..., −α_p, β_1, ..., β_q)' such that θ̂ − θ = O_p(n^{−1/2}). Then

Ŷ_{n+1}(Y_1, ..., Y_n) − Ỹ_{n+1}(Y_1, ..., Y_n) = O_p(n^{−1/2}),

where Ỹ_{n+1}(Y_1, ..., Y_n) is defined in (2.9.25) and Ŷ_{n+1}(Y_1, ..., Y_n) in (8.5.2).
Proof. We write the difference Ŷ_{n+1}(Y_1, ..., Y_n) − Ỹ_{n+1}(Y_1, ..., Y_n) as a first order Taylor expansion in the elements of θ̂ − θ, with derivatives evaluated at a point θ* between θ̂ and θ. For θ such that the roots of the characteristic equations are less than one in absolute value, the derivatives multiplying the θ̂_j − θ_j are, except for the initial effects, stationary time series. Since θ̂ − θ = O_p(n^{−1/2}), the result follows. □
Theorem 8.5.1 generalizes immediately to predictions s periods ahead. On the basis of this result, the prediction variance formulas of Section 2.9 can be used as approximations for the predictor Ŷ_{n+s}(Y_1, ..., Y_n).
Additional results are available for the error in predictions for the stationary pth order autoregressive process. Let Y_t be the stationary process satisfying

Y_t + α_1Y_{t−1} + ··· + α_pY_{t−p} = α_0 + e_t,    (8.5.4)

where the roots of

m^p + α_1m^{p−1} + ··· + α_p = 0    (8.5.5)

are less than one in absolute value, and the e_t are independent (0, σ²) random variables. We can also write

Y_t = AY_{t−1} + e_t,    (8.5.6)

where Y_t = (Y_t, Y_{t−1}, ..., Y_{t−p+1}, 1)', e_t = (e_t, 0, ..., 0)', and

        −α_1  −α_2  −α_3  ···  −α_{p−1}  −α_p   α_0
          1     0     0   ···     0        0      0
          0     1     0   ···     0        0      0
    A =   0     0     1   ···     0        0      0
          ⋮     ⋮     ⋮           ⋮        ⋮      ⋮
          0     0     0   ···     1        0      0
          0     0     0   ···     0        0      1  .
The least squares estimator of α = (−α_1, −α_2, ..., −α_p, α_0)' is

α̂ = [ Σ_{t=p+1}^{n} Y_{t−1}Y'_{t−1} ]^{−1} Σ_{t=p+1}^{n} Y_{t−1}Y_t,    (8.5.7)

and the estimator of A, denoted by Â, is obtained from A by replacing the α_i with the estimators α̂_i. Let the least squares predictor of Y_{n+s}, based upon (8.5.7), be

Ŷ_{n+s} = Â^s Y_n    (8.5.8)

for s ≥ 1.
Theorem 8.5.2. Let the model (8.5.4) hold with the e_t independent identically distributed random variables with a symmetric distribution function. Let Y_1, Y_2, ..., Y_p be symmetrically distributed with finite variance, independent of e_{p+1}, e_{p+2}, .... Assume the distributions are such that the expectation of Ŷ_{n+s} exists for n > p. Then, for n > p,

E{Ŷ_{n+s} − Y_{n+s}} = 0.
Proof. The least squares estimators α̂_i, i = 1, 2, ..., p, can be expressed as functions of products of pairs of the observations.
Therefore, the α̂_i, i = 1, 2, ..., p, are even functions of (Y_1, Y_2, ..., Y_n), and α̂_0 is an odd function of (Y_1, Y_2, ..., Y_n). Note that adding a constant to each Y_t will change the predictor by the same amount. Therefore, there is no loss of generality in the assumption that α_0 = 0. The prediction error is
Y_{n+s} − Ŷ_{n+s} = Σ_{i=0}^{s−1} A^ie_{n+s−i} + (A^s − Â^s)Y_n.
The last element of the last column of Â^s is one. All other entries of the last column are multiples of α̂_0, where the multiples are functions of α̂_1, α̂_2, ..., α̂_p, and the functions may be zero for small s. Let (Y_1, Y_2, ..., Y_n) = Y be a realization, and let (−Y_1, −Y_2, ..., −Y_n) = Ỹ. Then the value of (Â^s − A^s)Y_n for Y is the negative of the value for Ỹ. The result follows because E{e_t} = 0, and because the distribution of (Y_1, Y_2, ..., Y_n) is symmetric. □
We now obtain an approximation for the variance of the prediction error, conditional on (Y_n, Y_{n−1}, ..., Y_{n−p+1}).
Theorem 8.5.3. Let Y_t be a stationary normal time series satisfying the model (8.5.4) with the roots of (8.5.5) less than one in absolute value. Then the mean square error of the predictor Ŷ_{n+s} of (8.5.8), conditional on Y_n, is

E{(Y_{n+s} − Ŷ_{n+s})(Y_{n+s} − Ŷ_{n+s})' | Y_n}
    = σ² Σ_{j=0}^{s−1} A^jMA^{j'}
      + n^{−1}σ² Σ_{j=0}^{s−1} Σ_{k=0}^{s−1} A^jMA^{k'} Ỹ'_{n,s−j−1}Γ^{−1}Ỹ_{n,s−k−1} + R_n,    (8.5.9)

where Γ = E{Y_tY'_t}, ||R_n||² = tr{R'_nR_n}, Ỹ_{n,j} = A^jY_n for j = 0, 1, ..., s, and M is a matrix with one as the upper left element and zeros elsewhere.
Proof. We have

Y_{n+s} = A^sY_n + Σ_{j=0}^{s−1} A^je_{n+s−j},
where A⁰ = I. Therefore, the error in the predictor is

Y_{n+s} − Ŷ_{n+s} = Σ_{j=0}^{s−1} A^je_{n+s−j} − (Â^s − A^s)Y_n.
It follows that

E{(Y_{n+s} − Ŷ_{n+s})(Y_{n+s} − Ŷ_{n+s})' | Y_n}
    = σ² Σ_{j=0}^{s−1} A^jMA^{j'} + E{(Â^s − A^s)Y_nY'_n(Â^s − A^s)' | Y_n}.
Because Â = A + O_p(n^{−1/2}), we may expand Â^s in a first order Taylor series about A to obtain

Â^s − A^s = Σ_{j=0}^{s−1} A^j(Â − A)A^{s−j−1} + R_{2n}.

By the normality, E{||Â − A||⁴} = O(n^{−2}), and we obtain

E{(Â^s − A^s)Y_nY'_n(Â^s − A^s)' | Y_n}
    = Σ_{j=0}^{s−1} Σ_{k=0}^{s−1} A^jE{(Â − A)Ỹ_{n,s−j−1}Ỹ'_{n,s−k−1}(Â − A)' | Y_n}A^{k'} + R_{1n},    (8.5.10)

where E{||R_{1n}|| | Y_n} = O_p(n^{−3/2}). Note that the upper left element of (Â − A)Ỹ_{n,j}Ỹ'_{n,k}(Â − A)' is (α̂ − α)'Ỹ_{n,j}Ỹ'_{n,k}(α̂ − α), and all other elements are zero. Therefore, because the Ỹ_{n,j} are functions of Y_n, the conditional expectations in (8.5.10) may be evaluated, and under our assumptions,
E{||R_n||} = O(n^{−3/2}). □

Note that the upper left element of A^jMA^{j'} is w_j², where the w_j satisfy the difference equation (2.6.4). For the first order process (p = 1), the expression for the conditional mean square error of Y_{n+1} − Ŷ_{n+1} reduces to

E{(Y_{n+1} − Ŷ_{n+1})² | Y_n} = σ²{1 + n^{−1}[1 + (Y_n − μ)²γ^{−1}(0)]} + R_n,
where μ = E{Y_t} and γ(0) = E{(Y_t − μ)²}. If the mean is known to be zero and not estimated, we obtain for p = 1

E{(Y_{n+1} − Ŷ_{n+1})² | Y_n} = σ²[1 + n^{−1}Y_n²γ^{−1}(0)] + R_n.
The unconditional mean square error of Ŷ_{n+s} is the expected value of (8.5.9).
Corollary 8.5.3. Let the assumptions of Theorem 8.5.3 hold. Then

E{(Y_{n+s} − Ŷ_{n+s})(Y_{n+s} − Ŷ_{n+s})'}
    = σ² Σ_{j=0}^{s−1} A^jMA^{j'}
      + n^{−1}σ² Σ_{j=0}^{s−1} Σ_{k=0}^{s−1} A^jMA^{k'} tr[(A^{s−j−1}Γ)'(Γ^{−1}A^{s−k−1})]
      + O(n^{−3/2}).    (8.5.11)
Proof. Omitted. □
The approximate mean square error of the predictor given in (8.5.9) can be expressed in scalar form as

E{(Y_{n+s} − Ŷ_{n+s})² | Y_n} ≐ σ² Σ_{j=0}^{s−1} w_j²
    + n^{−1}σ² Σ_{j=0}^{s−1} Σ_{k=0}^{s−1} w_jw_k Ỹ'_{n,s−j−1}Γ^{−1}Ỹ_{n,s−k−1},    (8.5.12)

where the w_i are defined in Theorem 2.6.1. For s = 1, we have

E{(Y_{n+1} − Ŷ_{n+1})² | Y_n} ≐ σ² + n^{−1}σ² Y'_nΓ^{−1}Y_n,    (8.5.13)

an approximation that is often used. Let
σ̂² = (n − p − 1)^{−1} Σ_{t=p+1}^{n} (Y_t − α̂'Y_{t−1})²

and Γ̂ = n^{−1} Σ_{t=p+1}^{n} Y_{t−1}Y'_{t−1}. An estimator, V̂{Y_{n+s} − Ŷ_{n+s}}, of the mean square
error of prediction may be obtained from (8.5.9) by ignoring the remainder term and replacing A, Γ, and σ² by their estimators Â, Γ̂, and σ̂², respectively. Let

V̂{Y_{n+s} − Ŷ_{n+s}} = σ̂² Σ_{j=0}^{s−1} Â^jMÂ^{j'}
    + n^{−1}σ̂² Σ_{j=0}^{s−1} Σ_{k=0}^{s−1} Â^jMÂ^{k'} (Â^{s−j−1}Y_n)'Γ̂^{−1}(Â^{s−k−1}Y_n).    (8.5.14)
For s = 1, equation (8.5.14) reduces to the estimated variance of the error in predicting Y_{n+1} obtained by treating the problem as a fixed-X regression prediction of Y_{n+1} given X_{n+1} = Y_n. Predictions more than one period ahead are nonlinear functions of the estimated parameters.
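A sketch of this s = 1 case (the function name and the illustrative numbers are ours): the estimated one-step variance is σ̂²(1 + n^{−1}Y'_nΓ̂^{−1}Y_n), the fixed-X regression prediction variance.

```python
import numpy as np

# One-step prediction variance for an estimated autoregression:
# sigma2 * (1 + n^{-1} * Y_n' Gamma^{-1} Y_n).
def one_step_pred_var(sigma2, gamma, y_n, n):
    y_n = np.atleast_1d(y_n).astype(float)
    G = np.atleast_2d(gamma).astype(float)
    return sigma2 * (1.0 + y_n @ np.linalg.solve(G, y_n) / n)

# Illustrative AR(1): sigma2 = 1, gamma(0) = 2, Y_n = 1.5, n = 100.
v = one_step_pred_var(1.0, [[2.0]], [1.5], 100)
print(round(v, 5))  # 1.01125
```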
Example 8.5.1. A number of computer programs are available to compute predictions. Procedure ARIMA of SAS/ETS® was used to estimate models in Example 8.4.1. That program computes predictions that are essentially the estimated minimum mean square error predictor. The estimated prediction variance for the s-period prediction computed by the program is the formula that ignores the estimation error,

V̂{Y_{n+s} − Ŷ_{n+s}} = σ̂² Σ_{j=0}^{s−1} ŵ_j²,    (8.5.15)

where the ŵ_j are estimators of the coefficients of Theorem 2.7.1. Predictions for the next three periods using the estimated autoregressive moving average (2,2) of Example 8.4.1 are −7.96, −5.17, and −1.14 for one, two, and three periods, respectively. The standard errors output by the program are 1.00, 2.42, and 3.67 for one, two, and three periods, respectively.
If we use Gaussian maximum likelihood to estimate the fourth order autoregressive process, we obtain the model

Ŷ_t − μ̂ = 2.188 (Y_{t−1} − μ̂) − 2.130 (Y_{t−2} − μ̂)
          (0.098)             (0.220)
        + 0.977 (Y_{t−3} − μ̂) − 0.232 (Y_{t−4} − μ̂),
          (0.220)             (0.098)

where μ̂ = −0.270 (0.493). The predictions for the next three periods based on the fourth order autoregressive model are −8.11, −5.37, and −1.30 for one, two, and three periods, respectively. The standard errors that do not contain a component for estimation error, as output by the program, are 1.00, 2.40, and 3.58 for one, two, and three periods, respectively. To illustrate the effect on the estimated standard errors of the component due to
estimating parameters, we use the ordinary least squares estimates of the fourth
order autoregression. The estimated model is given in Example 8.4.1. To compute predictions, we use the method of indicator variables and a nonlinear regression program. To construct predictions for three periods, we write the regression model

Y_t = θ_0 + θ_1x_{t1} + θ_2x_{t2} + θ_3x_{t3} + θ_4x_{t4} + θ_5x_{t5}
      + (θ_6 − θ_1θ_5)x_{t6} + (θ_7 − θ_1θ_6 − θ_2θ_5)x_{t7} + e_t,
where the regression variables are defined in Table 8.5.1. The estimator of (θ_5, θ_6, θ_7) obtained by nonlinear least squares is the predictor of (Y_{n+1}, Y_{n+2}, Y_{n+3}). The estimated standard errors for (θ̂_5, θ̂_6, θ̂_7) output by a nonlinear regression program are standard errors for the predictions containing a component for estimation error. The estimated predictions and the estimated standard errors for the nonlinear regression are given in the last column of Table 8.5.2.
The next to last column of Table 8.5.2 contains the standard errors computed with the formula (8.5.15) using the ordinary least squares coefficients. The differences between the standard errors of the last two columns are due to the
Table 8.5.2. Alternative Predictors for Data of Example 8.4.1

                               Method and Model
    Prediction   ML, ARMA(2,2)   ML, AR(4)   OLS, AR(4)   OLS, AR(4)
    Period       (8.5.15)        (8.5.15)    (8.5.15)     (Nonlinear)
    n + 1        −7.96           −8.11       −8.17        −8.17
                 (1.00)          (1.00)      (1.00)       (1.01)
    n + 2        −5.17           −5.37       −5.51        −5.51
                 (2.42)          (2.40)      (2.38)       (2.45)
    n + 3        −1.14           −1.30       −1.48        −1.48
                 (3.67)          (3.58)      (3.55)       (3.68)
estimated effect of the estimation error. The estimation error makes a larger contribution to the estimated standard errors for the longer predictions.
Table 8.5.2 also contains predictions and standard errors for two models estimated by maximum likelihood. The maximum likelihood estimates of the autoregressive process differ slightly from the ordinary least squares estimates, and hence the predictions differ slightly. The autoregressive moving average predictions differ somewhat from the predictions based on the autoregressive model, but the differences are small relative to the standard errors. The differences among the predictions for the different models are larger than the differences among the estimated standard errors. ▲▲
8.6. NONLINEAR PROCESSES
In this section, we consider estimation for time series that fall outside the class of finite parameter autoregressive moving averages. As we have seen, estimation for autoregressive time series has many analogies to linear regression theory. Estimation for moving average time series is a particular nonlinear estimation problem, in that the model can be expressed as an autoregression in which the coefficients satisfy nonlinear restrictions. We now study extensions in which the conditional expectation of Y_t given (Y_{t−1}, ..., Y_0) is a nonlinear function of (Y_{t−1}, ..., Y_0).
Assume

Y_t = f(Z_t; θ) + e_t,    t = 1, 2, ...,    (8.6.1)

where Z_t = (Y_{t−1}, Y_{t−2}, ..., Y_0), θ = (θ_1, θ_2, ..., θ_k)' is the parameter vector, and {e_t} is a sequence of iid(0, σ²) random variables. Let θ̂ be the value of θ that minimizes

Σ_{t=1}^{n} [Y_t − f(Z_t; θ)]².
Assume f(Z_t; θ) has continuous first and second derivatives with respect to θ for all t on a convex compact set S, S ⊂ Θ, where the true value θ⁰ is an interior point of S, and Θ is the parameter space. The model (8.6.1) is the same as the model (5.5.1) of Chapter 5. In Theorem 5.5.1, we gave conditions under which the nonlinear least squares estimator has a normal distribution in the limit. Basically, the derivatives must satisfy some convergence criteria.
In ordinary fitting problems, where Y_t is known to depend on a vector Z_t, it is common practice to approximate an unknown functional relationship with a polynomial in Z_t. The fitted function may be used directly as an approximation to the conditional expectation, or it may serve as an intermediate step in developing a nonlinear model with realistic properties. We demonstrate that similar approximations can be used in time series analysis.
Theorem 8.6.1. Let Y_t be a strictly stationary time series with finite 2(r + 1) moments for some positive integer r. Let X_t be a vector of polynomials in lags of Y_t:

X_t = (1, Y_{t−1}, Y_{t−2}, ..., Y_{t−p}, Y²_{t−1}, Y_{t−1}Y_{t−2}, ..., Y^r_{t−p}).
Assume

Y_t = f(Z_t; θ) + e_t,

where {e_t} is a sequence of iid(0, σ²) random variables, Z_t is the vector (Y_{t−1}, Y_{t−2}, ..., Y_0), and θ is a vector of parameters. Assume condition (8.6.3), where E{(Y_t, X_t)'(Y_t, X_t)} is positive definite. Let

ξ̂ = [ Σ_{t=1}^{n} X'_tX_t ]^{−1} Σ_{t=1}^{n} X'_tY_t    (8.6.4)

and Ŷ_t = X_tξ̂. Then ξ̂ − ξ = O_p(n^{−1/2}), where ξ is the population regression coefficient of Y_t on X_t.
Proof. By construction, E{X'_t[f(Z_t; θ) − X_tξ]} = 0. Also,

n^{−1} Σ_{t=1}^{n} X'_t[f(Z_t; θ) − X_tξ] = o_p(1)    (8.6.5)
by the assumption (8.6.3). We demonstrate a stronger result in obtaining the order of ξ̂ − ξ. By assumption, e_t is independent of Y_{t−h} for h ≥ 1. Therefore, letting 𝒜_{t−1} denote the sigma-field generated by (Y_{t−1}, Y_{t−2}, ...), X'_te_t satisfies the conditions of Theorem 5.3.4 because E{X'_te_t | 𝒜_{t−1}} = 0. Therefore,

n^{−1}[E{X'_tX_t}]^{−1} Σ_{t=1}^{n} X'_te_t = O_p(n^{−1/2}),

and ξ̂ − ξ = O_p(n^{−1/2}). Because θ̂ − θ⁰ = O_p(n^{−1/2}), the conclusion follows, where we have used (8.6.3). □
It is a direct consequence of Theorem 8.6.1 that polynomial regressions can be used to test for nonlinearity in the conditional expectation.
Corollary 8.6.1. Let Y_t be the strictly stationary autoregressive process satisfying

Y_t = θ_0 + Σ_{i=1}^{p} θ_iY_{t−i} + e_t,

where {e_t} is a sequence of iid(0, σ²) random variables. Assume Y_t satisfies the moment conditions of Theorem 8.6.1, including the assumption (8.6.3). Let ξ̂ be defined by (8.6.4), where ξ̂' = (ξ̂'_1, ξ̂'_2) and ξ̂_1 is the vector of coefficients of (1, Y_{t−1}, ..., Y_{t−p}). Let
σ̂² = (n − k)^{−1} Σ_{t=1}^{n} (Y_t − Ŷ_t)²,

where k is the dimension of X_t and Ŷ_t = X_tξ̂. Then

n^{1/2} ξ̂_2 →_L N(0, V_{22}),

where

V = σ² [E{X'_tX_t}]^{−1}

and the partition of V conforms to the partition of ξ̂.
Proof. Under the assumptions, ξ_2 = 0 and f(Z_t; θ) − X_tξ = 0. The limiting normal distribution follows from the proof of Theorem 8.6.1. □
It follows from Corollary 8.6.1 that the null distribution of the test statistic

F = k_2^{−1} ξ̂'_2 [V̂{ξ̂_2}]^{−1} ξ̂_2,

where V̂{ξ̂_2} = σ̂²[(Σ_{t=1}^{n} X'_tX_t)^{−1}]_{22}, k_2 is the dimension of ξ_2, and k is the dimension of ξ, is approximately that of Snedecor's F with k_2 and n − k degrees of freedom. The distribution when f(Z_t; θ) is a nonlinear function depends on the degree to which f(Z_t; θ) is well approximated by a polynomial. If the approximation is good, the test statistic will have good power. Other tests for nonlinearity are discussed by Tong (1990, p. 221), Hinich (1982), and Tsay (1986).
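A sketch of the polynomial test (all specifics, including the simulated series, are ours): fit the linear autoregression and a quadratic augmentation by ordinary least squares and form the usual F statistic for the added block.

```python
import numpy as np

# Polynomial-regression test for nonlinearity on a simulated linear AR(1),
# so the null hypothesis of a linear conditional mean holds.
rng = np.random.default_rng(0)
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

Y, L = y[1:], y[:-1]
X0 = np.column_stack([np.ones(n - 1), L])    # linear terms
X1 = np.column_stack([X0, L ** 2])           # plus the quadratic term

def rss(X):
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    return float(np.sum((Y - X @ b) ** 2))

k2, k = 1, X1.shape[1]
F = ((rss(X0) - rss(X1)) / k2) / (rss(X1) / (n - 1 - k))
print(F)  # compare with the F(1, n - 1 - k) 5% point, about 3.87
```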
There are several nonlinear models that have received special attention in the time series literature. One is the threshold autoregressive model that has been studied extensively by Tong (1983, 1990). A simple first order threshold model is

Y_t = θ_1Y_{t−1} + θ_2 s(Y_{t−1}, λ)Y_{t−1} + e_t,    (8.6.6)

where

s(Y_{t−1}, λ) = 1  if Y_{t−1} ≥ λ,
s(Y_{t−1}, λ) = 0  otherwise.    (8.6.7)
The indicator function, s(Y_{t−1}, λ), divides the behavior of the process into two regimes. If Y_{t−1} ≥ λ, the conditional expected value of Y_t given Y_{t−1} is (θ_1 + θ_2)Y_{t−1}. If Y_{t−1} < λ, the conditional expected value of Y_t given Y_{t−1} is θ_1Y_{t−1}. Notice that the conditional expected value of Y_t is not a continuous function of Y_{t−1} for θ_2 ≠ 0 and λ ≠ 0.
Threshold models with more than two regimes and models with more than one lag of Y entering the equation are easily constructed. Also, the indicator function s(·) can be a function of different lags of Y and (or) a function of other variables. Practitioners are often interested in whether or not a coefficient has changed at some point in time. In such a case, the indicator function can be a function of time.
Most threshold models do not satisfy the assumptions of our theorems when the parameters specifying the regimes are unknown. The conditional expected value of Y_t given Y_{t−1} defined by threshold model (8.6.6) is not a continuous function of Y_{t−1}, and the conditional expected value is not continuous in λ. Models that are continuous and differentiable in the parameters can be obtained by replacing the function s(·) defining the regimes with a continuous differentiable function.
Candidate functions are continuous differentiable statistical cumulative distribution functions. An example of such a model is

Y_t = θ_1Y_{t−1} + θ_2 ℓ(Y_{t−2}, κ)Y_{t−1} + e_t,    (8.6.8)

where

ℓ(x, κ) = [1 + exp{κ_1(x − κ_2)}]^{−1}    (8.6.9)

is the logistic function. The parameter κ_1 can be fixed or can be a parameter to be estimated. The conditional mean function of (8.6.8) is continuous and differentiable in (θ_1, θ_2, κ_1, κ_2) for κ_1 in (0, ∞). Tong (1990, p. 107) calls models with s(·) a smooth function, such as (8.6.9), smoothed threshold autoregressive models. See Jones (1978) and Ozaki (1980).
Example 8.6.1. One of the most analyzed realizations of a time series is the series on the number of lynx trapped in the Mackenzie River district of Canada, based on the records of the Hudson Bay Company as compiled by Elton and Nicholson (1942). The data are annual records for the period 1821-1934. The biological interest in the time series arises from the fact that lynx are predators heavily dependent on the snowshoe hare. The first statistical model for the data is that of Moran (1953). Tong (1990, p. 360) contains a description of other analyses and a detailed investigation of the series. We present a few computations heavily influenced by Tong (1990). The observation of analysis is log_10 of the original observations. The sample mean of the time series is 2.904, the smallest value of Y_t − Ȳ is −1.313, and the largest value of Y_t − Ȳ is 0.941. The second order autoregressive model estimated by ordinary least squares is

Ŷ_t = 1.057 + 1.384 Y_{t−1} − 0.747 Y_{t−2},    (8.6.10)
     (0.122) (0.064)         (0.064)

where σ̂² = 0.053, and the numbers in parentheses are the ordinary least squares standard errors.
Let us assume that we are interested in a model based on Y_{t−1} and Y_{t−2} and are willing to consider a nonlinear model. For the moment, we ignore information from previously estimated nonlinear models. If we estimate a quadratic model as a first approximation, we obtain
j , = 0.079 + 1.366 yl-l - 0.785 Y, -~ + 0.063 Y:-, (0.034) (0.073) (0.068) (0.159)
2 + 0.172 Y , - ~ Y , - ~ - 0.450 JJ,- .~ (0.284) (0.172)
where y, = U, - p = Y, - 2.904 and &' = 0.0442. Also see Cox (1977). The ordinary regression F-test for the hypothesis that the three coefficients of the quadratic terms are zero is
F(3, 106) = (0.0442)^{-1}(0.3661) = 8.28 ,
456 PARAMETER ESTIMATION
and the hypothesis of zero values is strongly rejected. The coefficient on y_{t-1}^2 is small, and one might consider the estimated model
ŷ_t = 0.089 + 1.350 y_{t-1} − 0.772 y_{t-2} + 0.272 y_{t-1} y_{t-2} − 0.497 y_{t-2}^2 ,
      (0.027) (0.059)       (0.059)       (0.129)               (0.123)        (8.6.11)
where σ̂^2 = 0.0438.
The estimated conditional expectation of Y_t, given (y_{t-1}, y_{t-2}), is changed very little from that in (8.6.11) if we replace y_{t-2} in the last two terms of (8.6.11) by a bounded function of y_{t-2} that is nearly linear in the interval (−1.1, 1.1). Such a function is
g(y_{t-2}; κ_1, κ_2) = [1 + exp{κ_1(y_{t-2} − κ_2)}]^{-1} .   (8.6.12)
If κ_1 is very small in absolute value, the function is approximately linear. As κ_1(y_{t-2} − κ_2) moves from −2 to 2, g(y_{t-2}; κ_1, κ_2) moves from 0.88 to 0.12, and as κ_1(y_{t-2} − κ_2) moves from −3 to 3, g(y_{t-2}; κ_1, κ_2) moves from 0.9526 to 0.0474. If κ_1 is very large in absolute value, the function is essentially a step function. Also, as |κ_1| increases, the derivative with respect to κ_1 approaches the zero function except for values very close to κ_2. The range of y_t = Y_t − ȳ is from −1.313 to 0.941. Thus, g(y_{t-2}; −2.5, 0) is nearly linear for the range of the data.
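The numerical claims about the transition function (8.6.12) are easy to check directly; a small sketch (the function name is ours):

```python
import math

def g(y, k1, k2):
    """Transition function of (8.6.12): [1 + exp{k1*(y - k2)}]**(-1)."""
    return 1.0 / (1.0 + math.exp(k1 * (y - k2)))

# as k1*(y - k2) moves from -2 to 2, g moves from about 0.88 to 0.12;
# over (-3, 3) it moves from about 0.9526 to 0.0474
vals = [g(x, 1.0, 0.0) for x in (-3.0, -2.0, 2.0, 3.0)]

# a large |k1| gives an essentially sharp step at k2
step_like = g(0.01, -1000.0, 0.0)   # close to 1 just above k2 when k1 < 0
```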
The estimated equation obtained by replacing y_{t-2} with g(y_{t-2}; −2.5, 0) in the last two terms of (8.6.11) is
(8.6.13)
with σ̂^2 = 0.0432. We note that g(y_{t-2}; κ_1, κ_2) converges to the indicator function with jump of height one at the point κ_2 as κ_1 → −∞. Therefore, the threshold model is the limit as (κ_1, κ_2) → (−∞, κ̄_2), where κ̄_2 is the point dividing the space into two regimes.
We fit the nonlinear model obtained by letting κ_1 and κ_2 of (8.6.12) be parameters to be estimated. In the estimation we restricted κ_1 to the interval [−15, −0.5], and the minimum sum of squares occurred on the boundary κ_1 = −15. The estimated function is
(8.6.14)
where σ̂^2 = 0.0420 and the standard error of κ̂_2 is 0.078. The standard errors are computed treating κ_1 as known. If the standard errors are computed treating κ_1 as
NONLINEAR PROCESSES 457
estimated, the standard error of κ̂_2 is larger than the estimated value. This reflects the fact that the derivative with respect to κ_2 approaches zero as κ_1 → −∞.
The residual sum of squares for the model (8.6.14) is 4.448, while that for (8.6.13) is 4.620. The reduction due to fitting κ_1 and κ_2 is 0.172, and the F for a test against (−2.5, 0) is 2.05, which is less than the 5% tabular value of 3.08. If we assumed that we were only estimating κ_2, then the improvement in the fit would be significant. The κ̂_1 of −15 means that about 76% of the shift occurs in an interval of length 0.27 centered at the estimated value of 0.357. The estimated variance of the original time series is 0.314. Therefore, the interval in which most of the estimated shift takes place is about one-half of one standard deviation of the original time series.
A threshold model fitted using κ̂_2 to define the regimes is
ŷ_t = 0.102 + 1.278 y_{t-1} − 0.456 y_{t-2}   if y_{t-2} < 0.36 ,
      (0.030) (0.071)       (0.086)

ŷ_t = 0.166 + 1.527 y_{t-1} − 1.239 y_{t-2}   if y_{t-2} ≥ 0.36 ,
      (0.152) (0.108)       (0.264)
where σ̂^2 = 0.0445. Tong (1990) has given threshold models with smaller residual mean square.
It is reasonable to conclude that the process generating the data is nonlinear, but a number of nonlinear models are consistent with the data. Selection of a final model would rest heavily on subject matter considerations. AA
A second nonlinear model that has been studied extensively is the bilinear model. See, for example, Granger and Andersen (1978), Subba Rao and Gabr (1984), and Tong (1990). The first order model can be written
Y_t = α_1 Y_{t-1} + β_1 Y_{t-1} e_{t-1} + e_t ,   (8.6.15)
where e_t ~ NI(0, σ^2). Sufficient conditions for (8.6.15) to define a stationary process are α_1^2 + β_1^2 σ^2 < 1 or E{log|α_1 + β_1 e_t|} < 0. Both conditions are intuitive in that they are analogous to a requirement that, on average, the coefficient of Y_{t-1} be less than one in absolute value. See Tong (1981) and Quinn (1982).
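A quick simulation of the first order bilinear model (8.6.15) illustrates the stationary case. The parameter values below are our own choice, with α_1^2 + β_1^2 σ^2 = 0.41 < 1; for this model a direct moment calculation (ours) gives E{Y_t} = β_1 σ^2 (1 − α_1)^{-1}, because E{Y_{t-1} e_{t-1}} = σ^2:

```python
import numpy as np

def simulate_bilinear(a1, b1, sigma, n, seed=0):
    """Simulate Y_t = a1*Y_{t-1} + b1*Y_{t-1}*e_{t-1} + e_t, e_t iid N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a1 * y[t - 1] + b1 * y[t - 1] * e[t - 1] + e[t]
    return y

# a1^2 + b1^2*sigma^2 = 0.25 + 0.16 = 0.41 < 1, so the second-moment
# condition holds; the long-run mean should be near b1*sigma^2/(1 - a1) = 0.8
y = simulate_bilinear(0.5, 0.4, 1.0, 200_000)
```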
A third class of nonlinear models is the random coefficient autoregressive models. An example of such a model is

Y_t = Σ_{j=1}^{p} (α_j + a_{jt}) Y_{t-j} + e_t ,

where a_t' = (a_{1t}, a_{2t}, ..., a_{pt}) is a vector of mean zero random coefficients, and e_t is independent of a_t. Nicholls and Quinn (1982) is a treatment of such models.
8.7. MISSING AND OUTLIER OBSERVATIONS
Estimation with missing observations requires a specification of the mechanism that is responsible for the observations being missing. The most common specification is that observations are missing at random. See Rubin (1976) and Little and Rubin (1987). Under the missing-at-random specification, the likelihood can be written as
L(Y_{[a]}; θ) = ∫ L(Y_{[a]}, Y_{[b]}; θ) dY_{[b]} ,   (8.7.1)

where Y_{[a]} is the vector of Y_t actually observed, and Y_{[b]} is the vector of missing observations. The subscripts of Y are placed in square brackets because they do not necessarily correspond to the index t.
A simple missing mechanism specifies some probability q that an observation is missing and specifies that this probability is equal for all observations. A more general model would permit the probability to depend on the index t, but in order to estimate θ by maximizing only L(Y_{[a]}; θ), the probability that an observation is missing must not be a function of Y.
Assume that Y_t is a normal stationary autoregressive moving average of order (p, q) and that m observations in a sample of size n are missing at random. Then

L(Y_{[a]}; θ) = (2π)^{-(n-m)/2} |Σ_{aa}|^{-1/2} exp{−0.5 Y_{[a]}' Σ_{aa}^{-1} Y_{[a]}} ,

where Y_{[a]} is the (n − m)-dimensional vector of observations and Σ_{aa} is the covariance matrix of Y_{[a]}. While it is relatively easy to write down the likelihood, there has been considerable research on computational methods.
Jones (1980) suggested using the Kalman filter representation of Section 4.6 to construct maximum likelihood estimators for autoregressive moving averages. Let

X_{t|r} = (E{Y_t | r}, E{Y_{t+1} | r}, ..., E{Y_{t+s-1} | r})' ,

where r ≤ t, s = max(p, q + 1), and E{Y_{t+j} | r} = E{Y_{t+j} | (Y_1, Y_2, ..., Y_r)}. Let X_{t|t,j} denote the jth element of X_{t|t}, and abbreviate X_{t|t} to X_t. The Kalman recursion is initiated at t = 1 with X_0 of (4.6.43) equal to 0 and with the initial mean squared error matrix equal to the covariance matrix of X_1. Let the sample be complete. Then one can write the log-likelihood as
−2 log L(θ) = n log(2π σ^2) + Σ_{t=1}^{n} log v_{tt} + σ^{-2} Σ_{t=1}^{n} v_{tt}^{-1} (Y_t − Ŷ_{t|t-1})^2 ,   (8.7.3)
where Ŷ_{t|t-1} is the best linear predictor of Y_t given (Y_{t-1}, ..., Y_1), and

v_{tt} = V{Y_t − Ŷ_{t|t-1}} .   (8.7.4)
We have suppressed the dependence of v_{tt} and Ŷ_{t|t-1} on the parameters to simplify the notation. In Example 4.6.4, we illustrated how the usual Kalman recursion formulas can be used in the presence of missing data. Thus, the maximum likelihood estimators can be obtained by maximizing (8.7.3) with respect to the unknown parameters, using the formulas of Section 4.6, where the sum in (8.7.3) is over the elements of Y_{[a]}.
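For a first order autoregression the decomposition (8.7.3) with missing observations reduces to a scalar Kalman recursion: a missing Y_t simply contributes no likelihood term and no update. The sketch below (our own functions, AR(1) only, zero mean) checks the recursion against the direct multivariate normal likelihood of the observed subvector:

```python
import numpy as np

def ar1_loglik_missing(y, rho, sig2):
    """Gaussian log-likelihood of an AR(1) sample, np.nan marking missing
    values, via the prediction error decomposition of (8.7.3)."""
    a, p = 0.0, sig2 / (1.0 - rho ** 2)      # prior mean, variance for Y_1
    ll = 0.0
    for obs in y:
        if not np.isnan(obs):
            ll -= 0.5 * (np.log(2.0 * np.pi * p) + (obs - a) ** 2 / p)
            a, p = obs, 0.0                  # Y_t observed without error
        a, p = rho * a, rho ** 2 * p + sig2  # time update: predict next Y
    return ll

def ar1_loglik_direct(y, rho, sig2):
    """Same likelihood from the joint normal density of the observed subvector."""
    idx = np.where(~np.isnan(y))[0]
    yo = y[idx]
    cov = sig2 / (1.0 - rho ** 2) * rho ** np.abs(idx[:, None] - idx[None, :])
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(idx) * np.log(2.0 * np.pi) + logdet
                   + yo @ np.linalg.solve(cov, yo))

# simulate a short AR(1) path and delete three points
rng = np.random.default_rng(1)
y = np.zeros(30)
y[0] = rng.normal(0.0, np.sqrt(1.0 / (1.0 - 0.36)))
for t in range(1, 30):
    y[t] = 0.6 * y[t - 1] + rng.normal()
y[[7, 8, 19]] = np.nan
```

Maximizing the first function over (ρ, σ^2) would then give the maximum likelihood estimators, as in the text.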
If there are a modest number of missing values, the method of indicator variables can be used in conjunction with an ordinary maximum likelihood program to construct the maximum likelihood estimators. We illustrate the procedure in Example 8.7.1.
Example 8.7.1. The data of Table 8.7.1 are computer generated observations from a second order autoregressive process. We assume that observations 40, 57, and 58 are missing at random. One method of computing the maximum likelihood estimates under this assumption is the method of indicator variables illustrated in the construction of predictions in Example 8.5.1.
Table 8.7.1. Data for Example 8.7.1
1.467 7.181 8.939 5.810
-0.267 -4.053 -5.427 -3.624 -0.704
1.256
2.162 3.232 1.076
-2.572 -4.135 -3.836 -1.185
0.787 1.446 2.425
2.722 0.201
-3.239 -5.287 -5.491 -3.210
0.682 4.971 6.265 3.347
-0.860 -3.697 -4.030 -1.710 -0.217 -1.254 -0.618
0.087 0.553
-1.198 -1.803
0.096 -0.113 -0.023 -0.746 -1.072 -1.235 -0.943
1.218 3.173 2.158 1.859 2.676 3.011 1.540
-5.081 3.900
-0.107 -4.723 -7.357 -4.938
1.647 6.374 5.993 3.705
-0.784 -3.036 -3.516 -1.510
0.294 1.637 0.633
-1.447 -4.060 -5.559 -3.814
2.067
4.156 4.224 2.605
-0.561 -1.842 -1.580
0.266 1.526 3.401 3.450 1.361
-2.423 -5.100 -5.433 -3.377 -0.070
2.924 3.432 3.209 1.745
To implement the procedure, we define
X_{t1} = −1 if t = 40, and X_{t1} = 0 otherwise ;
X_{t2} = −1 if t = 57, and X_{t2} = 0 otherwise ;
X_{t3} = −1 if t = 58, and X_{t3} = 0 otherwise .
Then the model
Y_t = β_0 + β_1 X_{t1} + β_2 X_{t2} + β_3 X_{t3} + u_t ,
u_t + α_1 u_{t-1} + α_2 u_{t-2} = e_t ,   (8.7.5)
where e_t ~ NI(0, σ^2), is estimated by maximum likelihood. Using the maximum likelihood option of the AUTOREG procedure of SAS®, we obtain
(β̂_0, β̂_1, β̂_2, β̂_3, α̂_1, α̂_2, σ̂^2)
  = (−0.001, −0.040, 1.442, 3.284, −1.377, −0.899, 1.183) ,
    (0.207)  (0.554) (0.787) (0.786) (0.045)  (0.042) (0.173)
where σ̂^2 is the estimator adjusted for degrees of freedom. Observe that (β̂_1, β̂_2, β̂_3) is the best estimator of (Y_40, Y_57, Y_58) given the other observations and that (α̂_1, α̂_2) are the estimates of the autoregressive parameters. The standard errors for Y_57 and Y_58 are larger than the standard error of the estimator of Y_40 because Y_57 and Y_58 are adjacent missing observations. AA
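The indicator variable device works because, with the autoregressive coefficients held fixed, every innovation e_t is linear in the missing values, so the minimizing missing values solve a small linear least squares problem. The sketch below (our own function) does only that linear step; the AUTOREG fit in the example estimates the autoregressive parameters jointly, which we do not attempt here:

```python
import numpy as np

def fill_missing_ar2(y, a1, a2):
    """Given y with np.nan at missing points and fixed coefficients in
    y_t = a1*y_{t-1} + a2*y_{t-2} + e_t, choose the missing values to
    minimize sum_t e_t**2 (a linear least squares problem)."""
    n = len(y)
    miss = list(np.where(np.isnan(y))[0])
    y0 = np.nan_to_num(y)                      # zeros where missing
    A = np.zeros((n - 2, len(miss)))
    b = np.zeros(n - 2)
    for t in range(2, n):
        b[t - 2] = -(y0[t] - a1 * y0[t - 1] - a2 * y0[t - 2])
        for lag, c in ((0, 1.0), (1, -a1), (2, -a2)):
            if t - lag in miss:
                A[t - 2, miss.index(t - lag)] = c
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # innovations are A@x - b
    filled = y0.copy()
    filled[miss] = x
    return filled
```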
In any statistical analysis, it is wise practice to inspect the data at every stage of the analysis for extreme or unusual observations. Such observations are often called outlier observations. In time series analysis, the first step in the analysis is generally an inspection of plots of the original data. Such plots help one define the nature of the data. The plot of the data against time is often sufficient to identify very extreme observations, particularly extreme observations created by errors in recording data.
As with many important concepts, what defines an outlier is difficult to specify. This is because deciding what is unusual requires a complete specification of the statistical model, a specification beyond that common in practice. Thus, in a sample of 100 normal (0, 1) random variables, a value of s_{(n-1)}^{-1}(X_{(n)} − X̄_{(n-1)}) = 5.00, where X_{(n)} is the largest observation, s_{(n-1)} is the root mean square for the 99 smallest observations, and X̄_{(n-1)} is the mean of the 99 smallest observations, is unusual. In a sample of 100 from a Cauchy distribution, the same value is less unusual.
Fox (1972) defined two types of unusual observations in time series. The first type is called an additive outlier. As an example model for additive outliers, assume that we observe the process

Y_t = X_t + δ_{tm} ξ_m ,   (8.7.6)
where X_t is a stationary autoregressive moving average, δ_{tm} is zero for t ≠ m and one for t = m, and ξ_m is the value of the perturbation at t = m. Under such a model, one might attempt to estimate ξ_m and to remove the effect of ξ_m on the estimates of the autoregressive parameters. One can expand the specification by permitting more than one outlier.
The second type of outlier is called an innovation outlier. To illustrate, assume that Y_t is a stationary autoregressive process satisfying

Σ_{j=0}^{p} α_j Y_{t-j} = e_t , α_0 = 1 ,   (8.7.7)

where e_t = a_t + δ_{tm} ξ_m, and (a_t, ξ_m) has the properties described following (8.7.6). Then ξ_m is called an innovation outlier.
If we assume that we know the point m at which an outlier may have occurred, treat the unknown ξ_m as a parameter to be estimated, and assume the process is normal, then one can use likelihood methods to estimate ξ_m and to test the hypothesis that ξ_m = 0. If the point at which an outlier may have occurred is unknown, then a reasonable procedure is to search over the possible values. To construct the likelihood ratio for every value of t, or for every possible pair of values, is a large computational task. Therefore, it is common practice to fit a model to the original data and then inspect the residuals to see if any are unusual. The inspection is often done on the basis of plots. In an autoregressive process, an innovation outlier will generally appear as a single residual of large absolute value. This is because only the single observation fails to satisfy the autoregressive model. On the other hand, an additive outlier will affect the residual for the point at which it occurs and will also affect the p following residuals, because they also fail to satisfy the basic autoregressive model. Statistics for testing for outliers were suggested by Fox (1972), and extensions have been considered by several authors, including Chang, Tiao, and Chen (1988) and Tsay (1988).
Assume we have an autoregressive process of order p with known parameters. The predicted value for Y_m, given that Y_m is not observed, is a function of

y_{(m)} = (Y_{m-p}, ..., Y_{m-1}, Y_{m+1}, ..., Y_{m+p})' .

Therefore, one can construct the linear filter, say H, which when applied to y_{(m)} gives an estimator of Y_m. The difference between Y_m and H y_{(m)}, divided by the estimated standard deviation of Y_m − H y_{(m)}, provides evidence on whether or not Y_m is an additive outlier. For a first order process with parameter α_1,

H y_{(m)} = −α_1 (1 + α_1^2)^{-1} (Y_{m-1} + Y_{m+1}) ,

and the variance of the difference is σ^2 (1 + α_1^2)^{-1}. In practice, it is necessary to replace the unknown parameters with estimates.
If the parameters of the autoregressive process have been estimated and the residuals calculated, then the residuals at the points m, m + 1, ..., m + p are affected by the presence of an additive outlier. Thus, an estimate of ξ_m calculated from the residuals is
ξ̂_m = [Σ_{j=0}^{p} α̂_j^2]^{-1} Σ_{i=0}^{p} α̂_i ê_{m+i} ,   (8.7.9)

where α̂_0 = 1 and ê_{m+i} = Σ_{j=0}^{p} α̂_j Y_{m+i-j}, i = 0, 1, ..., p. A test statistic for the hypothesis that ξ_m = 0 is
t_m = [σ̂^2 (Σ_{j=0}^{p} α̂_j^2)^{-1}]^{-1/2} ξ̂_m ,   (8.7.10)

where σ̂^2 is an estimator of σ^2, such as the regression residual mean square. If the process is normal, m is not determined by the data, and ξ_m = 0, then t_m is, approximately, a N(0, 1) random variable.
An innovation outlier at time m affects only the residual at time m. Thus, an estimate of ξ_m is the residual at the point m. Fox (1972) and Chang, Tiao, and Chen (1988) have discussed choosing between the two types of outliers. A simple procedure is to choose the type of outlier that gives the largest absolute value of the estimate of ξ_m.

In Example 8.7.2, we use model fitting procedures similar to those used in Example 8.7.1 to identify unusual observations, to classify the unusual observations, and to construct estimates of the autoregressive parameters.
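The statistics (8.7.9) and (8.7.10) can be sketched directly from the definitions (our own function; times are 0-indexed, residuals use the convention ê_t = Σ_{j=0}^{p} α̂_j Y_{t-j} with α̂_0 = 1, and m must satisfy p ≤ m ≤ n − p − 1):

```python
import numpy as np

def additive_outlier_stat(y, alphas, m, sig2):
    """xi-hat of (8.7.9) and t_m of (8.7.10) for a suspected additive
    outlier at time m."""
    a = np.concatenate(([1.0], np.asarray(alphas, dtype=float)))
    p = len(a) - 1
    # residuals e_t = sum_j a_j * y_{t-j}, defined for t = p, ..., n-1
    ehat = np.array([a @ y[t - np.arange(p + 1)] for t in range(p, len(y))])
    r = ehat[m - p : m - p + p + 1]        # residuals at times m, ..., m+p
    xi = (a @ r) / (a @ a)                 # (8.7.9)
    t_m = xi / np.sqrt(sig2 / (a @ a))     # (8.7.10)
    return xi, t_m

# a pure additive outlier of size 5 in an otherwise zero series is
# recovered exactly
y = np.zeros(50)
y[20] += 5.0
xi, t_m = additive_outlier_stat(y, [-1.3, 0.8], 20, 1.0)
```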
Example 8.7.2. Table 8.7.2 contains 100 computer generated observations from a second order autoregression. The basic model is
Table 8.7.2. Data for Example 8.7.2

25.267 27.081 26.819 25.110 20.733 16.947 15.573 17.376 20.296 22.256 23.162 24.232 22.076 18.428 16.865 17.164 19.815 21.787 22.446 23.425
23.722 21.201 17.761 15.713 15.509 17.790 21.682 25.971 27.265 24.347 20.140 17.303 16.970 19.290 20.783 19.746 20.382 21.087 21.553 22.063
19.802 19.197 21.096 20.887 20.977 20.254 19.928 19.765 20.057 22.218 24.173 23.158 22.859 23.676 24.011 22.540 22.818 24.914 26.081 24.900
20.893 16.277 13.643 16.062 25.712 31.666 30.120 25.048 17.757 14.199 14.525 18.885 23.230 25.915 24.403 20.349 15.452 12.609 14.620 22.137
26.266 27.653 25.962 21.455 22.665 17.355 19.120 21.462 24.929 26.190 24.300 19.655 15.589 14.117 15.886 19.861 24.060 25.628 25.755 23.786
where e_t ~ NI(0, σ^2). The parameters of the second order autoregressive process estimated by maximum likelihood are

(μ̂, α̂_1, α̂_2, σ̂^2) = (21.150, −1.308, 0.840, 2.20) .   (8.7.12)
                      (0.276)  (0.053) (0.053) (0.32)
Figure 8.7.1 is a plot of the standardized residuals from the second order autoregression, plotted against time. The standardized residuals are
Z_1 = (Y_1 − μ̂) γ̂_Y(0)^{-1/2} ,
Z_2 = [(Y_2 − μ̂) − ρ̂(1)(Y_1 − μ̂)] {γ̂_Y(0)[1 − ρ̂^2(1)]}^{-1/2} ,
Z_t = [Y_t − μ̂ + α̂_1(Y_{t-1} − μ̂) + α̂_2(Y_{t-2} − μ̂)] σ̂^{-1} , t = 3, 4, ..., n ,
FIGURE 8.7.1. Residuals from second order autoregressive fit. [Plot of standardized residuals against observation number, 0 to 100.]
where σ̂^2 = 2.20 is the estimated variance of e_t. There is a large deviation at t = 65 and three large deviations for t = 85, 86, and 87. In practice, one would attempt to determine if an error has been made in data recording or if there is a subject matter basis for the unusual observations. We proceed under the assumption that no explanation for the large deviations is available.
The plot suggests that there is an innovation outlier at t = 65, because there is a large deviation only at that point, the deviations at t = 66 and t = 67 appearing rather ordinary. The deviations associated with t = 85, 86, and 87 indicate that there may be an additive outlier at t = 85. There is a large positive deviation at t = 85, followed by a negative deviation at t = 86, followed by a positive deviation at t = 87, and these signs match the signs of the autoregressive coefficients. The value of t_m of (8.7.10) for m = 85 is 5.81, while the standardized residual is σ̂^{-1} ê_85 = 3.48. Thus, the data support the presence of an additive outlier over the presence of an innovation outlier at t = 85.
To estimate the parameters of the model treating Y_65 as an innovation outlier and Y_85 as an additive outlier, we write a regression model
Y_t = X_t β + u_t ,   (8.7.13)
where u_t is the zero mean second order autoregressive process. In this representation, an additive outlier can be introduced by creating an indicator variable for the point. For our example, we let X_{t1} = 1 for the parameter μ and let
X_{t2} = 1 if t = 85, and X_{t2} = 0 otherwise .   (8.7.14)
The process of maximizing the likelihood for the model (8.7.13) with X_t = (X_{t1}, X_{t2}) can be visualized as transforming the data on the basis of the autoregressive model and then finding the β that minimizes the regression residual mean square. Thus, after transformation,
Y_85 = μ + ξ_85 − α_1(Y_84 − μ) − α_2(Y_83 − μ) + e_85 ,
Y_86 = μ − α_1(Y_85 − μ − ξ_85) − α_2(Y_84 − μ) + e_86 ,
Y_87 = μ − α_1(Y_86 − μ) − α_2(Y_85 − μ − ξ_85) + e_87 ,
where (β_1, β_2) = (μ, ξ_85). A different X-variable is required for an innovation outlier in regression model
(8.7.13). We require a variable which, after the autoregressive transformation, is one for t = m and is zero for t ≠ m. If we know (α_1, α_2), the variable

X_{t3} = 0 if t < 65 ,
X_{t3} = 1 if t = 65 ,   (8.7.15)
X_{t3} = −α_1 X_{t-1,3} − α_2 X_{t-2,3} if t > 65
will satisfy the requirements for m = 65, because X_{t3} satisfies the difference equation. We created the vector X_t = (X_{t1}, X_{t2}, X_{t3}) using (α̂_1, α̂_2) = (−1.308, 0.840) from our initial fit of the autoregressive model. The model (8.7.13), estimated by maximum likelihood using the ARIMA procedure in SAS®, gives the estimates
(μ̂, ξ̂_65, ξ̂_85, α̂_1, α̂_2, σ̂^2) = (20.991, 4.65, 5.27, −1.390, 0.877, 1.157) .
                                  (0.220) (0.55) (1.07) (0.048)  (0.048) (0.170)
If we iterate, using (α̂_1, α̂_2) = (−1.390, 0.877) to create a new X_{t3} variable, we obtain

(μ̂, ξ̂_65, ξ̂_85, α̂_1, α̂_2, σ̂^2) = (20.982, 4.64, 5.16, −1.395, 0.880, 1.179) .   (8.7.16)
                                  (0.223) (0.55) (1.10) (0.048)  (0.047) (0.173)
Note that iteration changed the estimates very little. The estimates of ξ_65 and ξ_85 are estimated differences between the observed values and the values expected under the autoregressive model, where the deviation is assumed to be an additive outlier for t = 85 and an innovation outlier for t = 65. Because we identified these points from the residual plots, we should not use ordinary critical values in making tests. Monte Carlo studies by Chang, Tiao, and Chen (1988) suggest that the largest t in a sample of 100 will exceed 4.00 about 1% of the time when the null model is true. Therefore, both observations are suspect.
Removing the outliers produces a large reduction in the estimated variance: the estimated variance in (8.7.16) is only about 55% of that in (8.7.12). The changes in the autoregressive coefficients due to removing the outliers are between one and two standard errors, and the estimated mean is changed by about one-half the standard error.
If we use the estimates given in (8.7.12) to create predictions for the next three periods, we obtain
(Ŷ_101, Ŷ_102, Ŷ_103) = (20.73, 18.38, 17.89) .
                        (1.48)  (2.44)  (2.76)
The corresponding predictions constructed with (8.7.16) are
The largest effect of removing the outliers is a reduction in the estimated standard error of the prediction errors. One should evaluate these effects when deciding to remove outliers. If the true model is one with long tailed errors, then removing
outliers and using the smaller estimate of σ^2 will lead to confidence intervals for predictions that are too narrow. AA
The estimation procedure of Example 8.7.2 assigns zero weight to observations that are identified as outliers. Alternatively, the presence of unusual observations might lead to a modification of the working model and the application of estimation procedures to the modified model. One might specify a long tailed distribution or a mixture distribution for the errors and use maximum likelihood estimation under the specified model. See Kitagawa (1987), Peña and Guttman (1989), and Durbin and Cordero (1993).
Estimators that perform well under a wide range of statistical models are called robust estimators. See Hampel et al. (1986) and Huber (1981) for general discussions of robust procedures. Robust procedures that automatically downweight large deviation observations have also been considered for time series. See, for example, Martin (1980), h i s s (1987), and Bruce and Martin (1989).
8.8. LONG MEMORY PROCESSES
We introduced long memory time series in Section 2.11. The long memory time series that is called fractionally differenced noise satisfies

Σ_{j=0}^{∞} κ_j(d) Y_{t-j} = e_t   (8.8.1)

and

Y_t = Σ_{j=0}^{∞} ψ_j(d) e_{t-j} ,

where

κ_j(d) = [Γ(j + 1)Γ(−d)]^{-1} Γ(j − d) ,
ψ_j(d) = [Γ(j + 1)Γ(d)]^{-1} Γ(j + d) ,
and −0.5 < d < 0.5. As might be expected, the estimation theory associated with such processes differs from that of short memory processes. For example, consider the variance of the sample mean. We have

V{ȳ_n} = n^{-2} Σ_{h=-n+1}^{n-1} (n − |h|) γ_Y(h) .

If γ_Y(h) is proportional to h^{2d-1} and d < 0, then γ_Y(h) is absolutely summable and V{ȳ_n} = O(n^{-1}). If 0 < d < 0.5, γ_Y(h) is not summable, but Yajima (1988) has shown that
lim_{n→∞} n^{1-2d} V{ȳ_n} = d^{-1} Γ(1 − 2d) [Γ(d) Γ(1 − d)(1 + 2d)]^{-1} σ^2 .   (8.8.2)
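The limit (8.8.2) can be checked numerically using the standard closed form for the autocovariances of fractionally differenced noise, γ(h) = σ^2 Γ(1 − 2d) Γ(h + d) / [Γ(d) Γ(1 − d) Γ(h + 1 − d)] (Hosking's formula, not derived in this text); a sketch with our own function names:

```python
import math
import numpy as np

def frac_noise_acov(d, nlags, sig2=1.0):
    """gamma(h), h = 0, ..., nlags, for fractionally differenced noise,
    via gamma(0) = sig2*Gamma(1-2d)/Gamma(1-d)**2 and the ratio
    gamma(h)/gamma(h-1) = (h-1+d)/(h-d)."""
    g = np.empty(nlags + 1)
    g[0] = sig2 * math.gamma(1.0 - 2.0 * d) / math.gamma(1.0 - d) ** 2
    for h in range(1, nlags + 1):
        g[h] = g[h - 1] * (h - 1.0 + d) / (h - d)
    return g

def var_mean(d, n, sig2=1.0):
    """V{ybar_n} = n**-2 * sum over |h| < n of (n - |h|)*gamma(h)."""
    g = frac_noise_acov(d, n - 1, sig2)
    h = np.arange(1, n)
    return (n * g[0] + 2.0 * np.sum((n - h) * g[h])) / n ** 2

d = 0.3
limit = (math.gamma(1.0 - 2.0 * d)
         / (d * (1.0 + 2.0 * d) * math.gamma(d) * math.gamma(1.0 - d)))
ratio = 5000 ** (1.0 - 2.0 * d) * var_mean(d, 5000) / limit  # near 1
```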
The distributions of the sample autocovariances are difficult to obtain, but it is relatively easy to show that the sample autocovariances are consistent.
Lemma 8.8.1. Let U_t be the infinite moving average time series

U_t = Σ_{j=0}^{∞} ψ_j e_{t-j} ,

where the e_t are independent (0, σ^2) random variables with bounded fourth moments and the ψ_j are square summable. Let

γ̂(h) = n^{-1} Σ_{t=1}^{n-h} U_t U_{t+h} .

Then plim_{n→∞} γ̂(h) = γ_U(h) for h = 0, 1, 2, ... .
and

uniformly in n, because lim_{k→∞} E{(X_t − U_t)^2} = 0 uniformly in t. Therefore, by Lemma 6.3.2,

plim_{n→∞} γ̂(h) = γ_U(h) .   A
To introduce estimation for the parameters of long memory processes, assume
we have n observations from the Y_t of (8.8.1) and we desire an estimator of d. Consider the estimator of d obtained by minimizing

Q_n(d) = n^{-1} Σ_{t=1}^{n} [Σ_{j=0}^{t-1} κ_j(d) Y_{t-j}]^2   (8.8.3)

with respect to d. Note that the infinite autoregressive representation of Y_t is truncated at length t − 1 in (8.8.3). The function (8.8.3) is of the same type as (8.4.7).
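The weights κ_j(d) in (8.8.3) satisfy κ_0 = 1 and, from the ratio of Gamma functions, κ_j(d) = κ_{j-1}(d)(j − 1 − d)/j, so the truncated objective is easy to evaluate; a sketch (our function names):

```python
import numpy as np

def kappa_weights(d, n):
    """kappa_j(d) = Gamma(j - d)/[Gamma(j + 1) Gamma(-d)], j = 0, ..., n-1,
    computed via kappa_0 = 1, kappa_j = kappa_{j-1}*(j - 1 - d)/j."""
    k = np.ones(n)
    for j in range(1, n):
        k[j] = k[j - 1] * (j - 1 - d) / j
    return k

def Q_n(d, y):
    """Truncated sum of squares objective (8.8.3)."""
    k = kappa_weights(d, len(y))
    s = np.array([k[: t + 1] @ y[t::-1] for t in range(len(y))])
    return float(s @ s) / len(y)
```

An estimate of d is then obtained by minimizing Q_n over a grid or with a one-dimensional optimizer.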
Lemma 8.8.2. Let the model (8.8.1) hold, let d_0 ∈ Θ be the true parameter, where Θ = [d_L, d_U] ⊂ (0.0, 0.5). Assume the e_t are independent identically distributed (0, σ^2) random variables with finite fourth moment. Let d̂ be the value of d ∈ Θ that minimizes (8.8.3). Then plim_{n→∞} d̂ = d_0.
Proof. We show that d̂ is a consistent estimator of d_0 by showing that

lim_{n→∞} P{ inf_{|d-d_0|≥η} [Q_n(d) − Q_n(d_0)] > 0 } = 1   (8.8.4)

for all η > 0. Consider the difference

where Q_n(d) = n^{-1} Σ_{t=1}^{n} S_t^2(d), S_t(d) = Σ_{j=0}^{t-1} κ_j(d) Y_{t-j}, and
Now,
= o(t^{-2d}) .

Therefore,
and
for all d_1 and d_2 in Θ, where
(8.8.5)
(8.8.6)
and |b_j(d)| ≤ M j^{-1-d} log j as j → ∞. Since the coefficients κ_j(d) and b_j(d) are absolutely summable and attain their supremum at some point in the set [d_L, d_U], it follows that the supremum of the derivative on the right of (8.8.5) is O_p(1). Thus, by Lemma 5.5.5, plim_{n→∞} C_n(d) = 0 uniformly in d. Therefore, we consider the d associated with the minimum of Σ_{t=1}^{n} A_t^2(d).
We have

plim_{n→∞} n^{-1} Σ_{t=1}^{n} A_t^2(d) = E{A_t^2(d)}

because A_t(d) = Σ_i g_i(d) e_{t-i}, where g_i(d) is square summable by Theorem 2.2.3,

g_i(d) = Σ_{γ=-∞}^{∞} κ_{i-γ}(d) ψ_γ(d_0) ,

and it is understood that κ_j(d) = 0 and ψ_j(d_0) = 0 for j < 0. Again, by the mean value theorem,
for all d_1 and d_2 in Θ, and, by the properties of κ_j(d) and b_j(d), the supremum on the right is O_p(1). Now,
for some M < ∞, because sup_{d∈Θ} |κ_j(d)| and sup_{d∈Θ} |b_j(d)| are absolutely summable; we have used the dominated convergence theorem. Hence, by Lemma 5.5.5,
plim_{n→∞} n^{-1} Σ_{t=1}^{n} A_t^2(d) = E{A_t^2(d)}

uniformly in d. Because E{A_t^2(d)} reaches its minimum at d_0, the condition (8.8.4) is established. A
Results on estimators of d and of the parameters of the process

(1 − B)^d Y_t = Z_t ,   (8.8.7)

where Z_t is a stationary autoregressive moving average, have been obtained by a number of authors. Maximum likelihood estimation for the normal distribution model has been studied by Fox and Taqqu (1986), Dahlhaus (1989), Haslett and Raftery (1989), and Beran (1992). Properties of Gaussian maximum likelihood estimators for linear processes have been investigated by Yajima (1985) and Giraitis and Surgailis (1990). Also see Robinson (1994a). The following theorem is due to Dahlhaus (1989). The result was extended to linear processes by Giraitis and Surgailis (1990).
Theorem 8.8.1. Let Y_t satisfy (8.8.7), where d ∈ [d_L, d_U] ⊂ (0.0, 0.5), and Z_t is a stationary normal autoregressive moving average with (k − 1)-dimensional parameter vector φ, where φ is in a compact parameter space Θ_φ. Let θ = (d, φ')'. Let θ̂ be the value of θ that maximizes the likelihood. Then

n^{1/2}(θ̂ − θ_0) →_L N(0, V_{θθ}) ,

where

V_{θθ}^{-1} = (4π)^{-1} ∫_{-π}^{π} [∂ log f_Y(ω; θ)/∂θ][∂ log f_Y(ω; θ)/∂θ'] dω ,

f_Y(ω) is the spectral density of Y_t, and the derivatives are evaluated at the true θ_0.
Proof. Omitted. See Dahlhaus (1989). A
The inverse of the covariance matrix of Theorem 8.8.1 is also the limit of the expected value of n^{-1} h_θ(Y) h_θ'(Y), where h_θ(Y) is the vector of partial derivatives of the log-likelihood with respect to θ,
log L(Y; θ) is the log-likelihood function, and the derivatives are evaluated at the true θ.
REFERENCES

Section 8.1. Anderson (1971), Gonzalez-Farias (1992), Hasza (1980), Jobson (1972), Koopmans (1942).

Section 8.2. Anderson (1959, 1962, 1971), Box and Pierce (1970), Draper and Smith (1981), Hannan (1970), Kendall (1954), Mann and Wald (1943a), Marriott and Pope (1954), Reinsel (1993), Salem (1971), Shaman and Stine (1988).

Sections 8.3, 8.4. Anderson and Takemura (1986), Be& (1974), Box, Jenkins, and Reinsel (1994), Brockwell and Davis (1991), Cryer and Ledolter (1981), Durbin (1959), Eltinge (1991), Hannan (1979), Kendall and Stuart (1966), Macpherson (1975), Nelson (1974), Pierce (1970a), Sarkar (1990), Walker (1961), Wold (1938).

Section 8.5. Davisson (1965), Fuller (1980), Fuller and Hasza (1980, 1981), Hasza (1977), Phillips (1979), Yamamoto (1976).

Section 8.6. Granger and Andersen (1978), Priestley (1988), Quinn (1982), Subba Rao and Gabr (1984), Tong (1983, 1990).

Section 8.7. Chang, Tiao, and Chen (1988), Fox (1972), Jones (1980), Ljung (1993), Tsay (1988).

Section 8.8. Beran (1992), Dahlhaus (1989), Deo (1995), Fox and Taqqu (1986), Yajima (1985).
EXERCISES
1. Using the first 25 observations of Table 7.2.1, estimate ρ and λ using the equations (8.1.7). Compute the estimated standard errors of ρ̂ and λ̂ using the standard regression formulas. Estimate ρ using the first equation of (8.1.9). Then compute the root of (8.1.10) using y_t = Y_t − ȳ. Iterate the computations.
2. Let Y_t be a stationary time series. Compare the limiting value of the coefficient obtained in the regression of Y_t − r̂(1)Y_{t-1} on Y_{t-1} − r̂(1)Y_{t-2} with the limiting value of the regression coefficient of Y_{t-2} in the multiple regression of Y_t on Y_{t-1} and Y_{t-2}.
3. Compare the variance of ȳ_n for a first order autoregressive process with Var{μ̂} and Var{μ̃}, where μ̂ = λ̂(1 − ρ)^{-1}, and λ̂ and λ̃ are defined in (8.1.7) and (8.1.9), respectively. In computing Var{μ̂} and Var{μ̃}, assume that ρ is known without error. What are the numerical values for n = 10 and ρ = 0.7? For n = 10 and ρ = 0.9?
4. Assume that 100 observations on a time series gave the following estimates:
γ̂(0) = 200 , r̂(1) = 0.8 , r̂(2) = 0.7 , r̂(3) = 0.5 .
Test the hypothesis that the time series is first order autoregressive against the alternative that it is second order autoregressive.
5. The estimated autocorrelations for a sample of 100 observations on the time series {X_t} are r̂(1) = 0.8, r̂(2) = 0.5, and r̂(3) = 0.4.
(a) Assuming that the time series {X_t} is defined by

X_t = θ_1 X_{t-1} + θ_2 X_{t-2} + e_t ,

where the e_t are normal independent (0, σ^2) random variables, estimate θ_1 and θ_2.
(b) Test the hypothesis that the order of the autoregressive process is two against the alternative that the order is three.
6. Show that for a fixed β, β ≠ β_0, the derivative W_t(β) of (8.3.9) converges to an autoregressive moving average of order (2, 1). Give the parameters.
7. Fit a first order moving average to the first fifty observations in Table 8.3.1. Predict the next observation in the realization. Establish an approximate 95% confidence interval for your prediction.
8. Assume that the e_t of Theorem 8.3.1 are independent with 4 + δ moments for some δ > 0. Show that n^{1/2}(σ̂^2 − σ^2) converges in distribution to a normal random variable.
9. Prove Theorem 8.3.1 for the estimator constructed with Y_t − ȳ_n replacing Y_t in all defining equations.
10. Fit an autoregressive moving average (1, 1) to the data of Table 8.3.1.
11. The sample variance of the Boone sediment time series discussed in Section 6.4 is 0.580, and the sample variance of the Saylorville sediment time series is 0.337. Let X_{1t} and X_{2t} represent the deviations from the sample mean of the Boone and Saylorville sediment, respectively. Using the correlations of Table 6.4.1, estimate the following models:

X_{2t} = θ_1 X_{1,t-1} + θ_2 X_{1,t-2} + e_t ,
X_{2t} = θ_1 X_{1,t-1} + θ_2 X_{1,t-2} + θ_3 X_{2,t-1} + e_t ,
X_{2t} = θ_1 X_{1,t-1} + θ_2 X_{1,t-2} + θ_3 X_{1,t-3} + θ_4 X_{2,t-1} + θ_5 X_{2,t-2} + e_t .

On the basis of these regressions, suggest a model for predicting Saylorville sediment, given previous observations on Boone and Saylorville sediment.
12. Fit autoregressive equations of order 1 through 7 to the data of Exercise 10 of Chapter 6. Choose a model for these data.
13. A test statistic for partial autocorrelations is given in (8.2.19). An alternative statistic is t_i* = n^{1/2} φ̂_{ii}. Show that

t_i* →_L N(0, 1)

under the assumptions of Theorem 8.2.1 and the assumption that φ_{jj} = 0 for j ≥ i.
14. A simple symmetric estimator with mean adjustment can be defined as a function of the deviations Y_t − ȳ_n. Show that
15. We can express the Q(φ) of (8.2.14) as

where c_{ijt} = 0 if t − j < 1 or t − i < 1. Show that c_{00t} = 1 for t = 2, 3, ..., n for the weights as defined in (8.2.15). What are c_{00t} and c_{11t} for w_t = 0.5? What are c_{00t} and c_{11t} for the weights of (8.2.15)?
16. Compute and plot the estimated autocorrelation functions for the AR(4) and ARMA(2, 2) models of Table 8.4.2 for h = 0, 1, ..., 12. Does this help explain why it is difficult to choose between these models?
17. Consider a first order stationary autoregressive process Y_t = ρY_{t-1} + e_t, with |ρ| < 1 and e_t satisfying the conditions of Theorem 8.2.1. Prove Theorem 8.2.2 for the first order autoregressive process by showing that the difference between any two of the estimators of ρ is O_p(n^{-1}).
18. Consider a first order invertible moving average process Y_t = e_t + βe_{t-1}, where |β| < 1, and the e_t are iid(0, σ^2) random variables with bounded fourth moments. Let ĉ_1, ĉ_2, ..., ĉ_k be the regression coefficients in (8.3.15). Let β̂_{D,k} denote Durbin's estimator given by −[Σ_i ĉ_{i-1}^2]^{-1} [Σ_i ĉ_{i-1} ĉ_i], where ĉ_0 = −1. Find the asymptotic distribution of β̂_{D,k} for a fixed k. Find the limit as k → ∞ of the asymptotic mean and the variance of β̂_{D,k}.
19. Let (α̂_1, α̂_2, σ̂^2) be the estimated parameters for the second order autoregressive process of Example 8.7.1. Use (α̂_1, α̂_2, σ̂^2) to construct estimates of the first five autocovariances of the process. Then use observations (Y_38, Y_39, Y_41, Y_42) and the estimated autocovariances to estimate Y_40. Use (Y_55, Y_56, Y_59, Y_60) to estimate (Y_57, Y_58).
20. Assume that Y_t is the first order normal stationary time series
Y_t = e_t + βe_{t-1} , e_t ~ NI(0, σ^2) .
Show that the log-likelihood can be written

log L(σ^2, β) = −0.5 n log 2π − 0.5 Σ_{t=1}^{n} log V_t − 0.5 n log σ^2 − 0.5 σ^{-2} Σ_{t=1}^{n} V_t^{-1} Z_t^2 ,

where Z_1 = Y_1, V_1 = 1 + β^2, Z_t = Y_t − V_{t-1}^{-1} β Z_{t-1} for t = 2, 3, ..., n, and
V_t = 1 + β^2 − V_{t-1}^{-1} β^2 , t = 2, 3, ..., n .

21. Prove the following.
Lemma. Let Y_t and e_t satisfy the assumptions of Theorem 8.2.1. Let c_{nt}, t = 1, 2, ..., n and n = 1, 2, ..., be a triangular array of constants with

Σ_{t=1}^{n} c_{nt}^2 = 1 and lim_{n→∞} sup_{1≤t≤n} c_{nt}^2 = 0 .
Then
forJ = 1.2,. . . , p .
C H A P T E R 9
Regression, Trend, and Seasonality
The majority of the theory presented to this point assumes that the time series under investigation is stationary. Many time series encountered in practice are not stationary. They may fail to be stationary for any of several reasons:
1. The mean is a function of time, other than the constant function. 2. The variance is a function of time, other than the constant function. 3. The time series is generated by a nonstationary stochastic mechanism.
We consider processes of the third type in Chapter 10. Examples of hypothesized nonstationarity of the first kind occur most frequently in the applied literature, but there are also many examples of heterogeneous variances.
The traditional model for economic time series is
Y_t = T_t + S_t + Z_t,  (9.0.1)
where T_t is the "trend" component, S_t is the "seasonal" component, and Z_t is the "irregular" or "random" component. In our terminology, Z_t is a stationary time series. Often T_t is further decomposed into "cyclical" and "long-term" components. Casual inspection of many economic time series leads one to conclude that the mean is not constant through time, and that monthly or quarterly time series display a type of "periodic" behavior wherein peaks and troughs occur at "nearly the same" time each year. However, these two aspects of the time series typically do not exhaust the variability, and therefore the random component is included in the representation.
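The decomposition (9.0.1) acquires meaning through the procedure used to estimate its components. As one minimal illustrative procedure (a sketch, not a method prescribed in the text): estimate T_t by an ordinary least squares line, estimate S_t by the mean of the detrended series in each position of the seasonal cycle, and take Z_t as the remainder. The data and all values below are hypothetical.

```python
# Sketch of one possible additive decomposition Y_t = T_t + S_t + Z_t:
# linear trend by OLS, seasonal component by per-month means of the
# detrended series, irregular component as the remainder.  The components
# are defined by this procedure, not by (9.0.1) itself.

def decompose(y, period=12):
    n = len(y)
    t = list(range(1, n + 1))
    # OLS line y = b0 + b1*t
    tbar = sum(t) / n
    ybar = sum(y) / n
    b1 = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) \
         / sum((ti - tbar) ** 2 for ti in t)
    b0 = ybar - b1 * tbar
    trend = [b0 + b1 * ti for ti in t]
    detr = [yi - tr for yi, tr in zip(y, trend)]
    # seasonal effect: mean of detrended values in each position of the cycle
    seas_means = [sum(detr[m::period]) / len(detr[m::period])
                  for m in range(period)]
    adj = sum(seas_means) / period        # force seasonal effects to sum to zero
    seas_means = [s - adj for s in seas_means]
    seasonal = [seas_means[i % period] for i in range(n)]
    irregular = [d - s for d, s in zip(detr, seasonal)]
    return trend, seasonal, irregular
```

On a noiseless synthetic series built from a line plus a mean-zero seasonal pattern, the procedure recovers the components almost exactly; on real data the irregular component Z_t carries the remaining variation.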
While the model (9.0.1) is an old one indeed, a precise definition of the components has not evolved. This is not necessarily to be viewed as a weakness of the representation. In fact, the terms acquire meaning only when a procedure is used to estimate them, and the meaning is determined by the procedure. The reader should not be disturbed by this. An example from another area might serve to clarify the issue. The “intelligence quotient” of a person is the person’s score on an I.Q. test, and I.Q. acquires meaning only in the context of the procedure used to
determine it. Although the test may be based on a theory of mental behavior, the I.Q. test score should not be taken to be the only estimator of that attribute of humans we commonly call intelligence.
For a particular economic time series and a particular objective, one model and estimation procedure for trend and seasonality may suffice; for a different time series or a different objective, an alternative specification may be preferred.
We shall now study some of the procedures used to estimate trend and seasonality and (or) to reduce a nonstationary time series to stationarity. Since the mean function of a time series may be a function of other time series or of fixed functions of time, we are led to consider the estimation of regression equations wherein the error is a time series.
9.1. GLOBAL LEAST SQUARES
In many situations we are able to specify the mean of a time series to be a simple function of time, often a low order polynomial in t or trigonometric polynomial in t. A sample of n observations can then be represented by
y = Phi*beta + z,  (9.1.1)
where beta' = (beta_1, beta_2, . . . , beta_r) is a vector of unknown parameters, the columns phi_i of Phi, i = 1, 2, . . . , r, are n-dimensional column vectors, and Z_t is a zero mean time series. In many situations, we may be willing to assume that Z_t is stationary, but we also consider estimation under weaker assumptions.

The elements of phi_i, say phi_{ti}, are functions of time. For example, we might have phi_{t1} = 1, phi_{t2} = t, phi_{t3} = t^2. The elements of phi_i may also be random functions of time, for example, a stationary time series. In the random case, we shall assume that z is independent of Phi, and investigate the behavior of the estimators conditional on a particular realization of phi_{ti}. Thus, in this section, all phi_{ti} will be treated as fixed functions of time. Notice that y, Phi, beta, and z might properly be subscripted by n. To simplify the notation, we have omitted the subscript.
The simple least squares estimator of beta is

beta-hat_S = (Phi'Phi)^{-1}Phi'y.  (9.1.2)
In (9.1.2) and throughout this section, we assume Phi'Phi is nonsingular. Assume that the time series is such that the matrix V_zz = E{zz'} is nonsingular. Then the generalized least squares (best linear unbiased) estimator of beta is

beta-tilde_G = (Phi'V_zz^{-1}Phi)^{-1}Phi'V_zz^{-1}y.  (9.1.3)
The covariance matrix of beta-hat_S is

E{(beta-hat_S - beta)(beta-hat_S - beta)'} = (Phi'Phi)^{-1}Phi'V_zz Phi(Phi'Phi)^{-1},  (9.1.4)

while that of beta-tilde_G is

E{(beta-tilde_G - beta)(beta-tilde_G - beta)'} = (Phi'V_zz^{-1}Phi)^{-1}.
It is well known that beta-tilde_G is superior to beta-hat_S in that the variance of any linear contrast lambda'beta-tilde_G is no larger than the variance of the corresponding linear contrast lambda'beta-hat_S. However, the construction of beta-tilde_G requires knowledge of V_zz, and generally V_zz is not known. In fact, one may wish to estimate the mean function of Y_t prior to investigating the covariance structure of the error time series. Therefore, the properties of the simple least squares estimator are of interest. To investigate the large sample behavior of least squares estimators, let

D_n = diag(d_{11n}, d_{22n}, . . . , d_{rrn}),  d_{iin}^2 = phi_i'phi_i,

where phi_i is the ith column of the matrix Phi. The least squares estimator is consistent for beta under mild assumptions.
Proposition 9.1.1. Let the model (9.1.1) hold, where Z_t is a zero mean time series and the phi_{ti}, i = 1, 2, . . . , r, t = 1, 2, . . . , are fixed functions of time. Assume

lim_{n->infinity} d_{iin} = infinity,  i = 1, 2, . . . , r,  (9.1.5)

lim_{n->infinity} D_n^{-1}Phi'Phi D_n^{-1} = A_{11},  (9.1.6)

where A_{11} is nonsingular, and

lim_{n->infinity} D_n^{-1}Phi'V_zz Phi D_n^{-1} = B.  (9.1.7)

Then plim_{n->infinity} beta-hat_S = beta.
Proof. The covariance matrix of the least squares estimator is given in (9.1.4), and the variance of the normalized estimator is

V{D_n(beta-hat_S - beta)} = D_n(Phi'Phi)^{-1}D_n D_n^{-1}Phi'V_zz Phi D_n^{-1} D_n(Phi'Phi)^{-1}D_n.

It follows that D_n(beta-hat_S - beta) = O_p(1), because

lim_{n->infinity} V{D_n(beta-hat_S - beta)} = A_{11}^{-1} B A_{11}^{-1}.

Because d_{iin} -> infinity for each i, the conclusion follows. ∎
Under stronger assumptions, the limiting distribution of the least squares estimator can be obtained. Assume that the real valued functions phi_{ti} of t satisfy

lim_{n->infinity} d_{iin}^{-2} phi_{ni}^2 = 0,  i = 1, 2, . . . , r,  (9.1.8)

and

lim_{n->infinity} Sum_{t=1}^{n-h} d_{iin}^{-1} d_{jjn}^{-1} phi_{ti} phi_{t+h,j} = a_{hij},  (9.1.9)

h = 0, 1, 2, . . . , i, j = 1, 2, . . . , r, where A_0 is nonsingular and A_h is the matrix with typical element a_{hij}. The assumptions on the phi_{ti} are quite modest and, for example, are satisfied by polynomial, trigonometric polynomial, and logarithmic functions of time. The following theorem demonstrates that the limiting distribution of the vector of standardized estimators is normal for a wide class of stationary time series.
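Condition (9.1.9) can be checked numerically for simple regressors. For the linear trend phi_{t1} = t, the normalized lagged cross products converge to a_{h11} = 1 for every fixed h, consistent with a spectral measure M(omega) that increases only at omega = 0 (a point taken up again after Theorem 9.1.3). A quick sketch, with n chosen arbitrarily:

```python
# Numerical check of condition (9.1.9) for the linear trend phi_t = t:
# a_h = lim_n d_n^{-2} * sum_{t=1}^{n-h} t*(t+h), with d_n^2 = sum_{t=1}^n t^2.
# For every fixed h the limit is 1.

def a_hat(n, h):
    d2 = sum(t * t for t in range(1, n + 1))            # d_{11n}^2
    s = sum(t * (t + h) for t in range(1, n - h + 1))   # lagged cross product
    return s / d2

for h in (0, 1, 5):
    print(h, a_hat(20000, h))    # each value is within 0.001 of 1
```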
Theorem 9.1.1. Let Z_t be a stationary time series defined by

Z_t = Sum_{j=0}^{infinity} w_j e_{t-j},  t = 0, +-1, +-2, . . . ,

where {w_j} is absolutely summable and the e_t are independent (0, sigma^2) random variables with distribution functions F_t(e) such that

lim_{delta->infinity} sup_{t=1,2,...} Integral_{|e|>delta} e^2 dF_t(e) = 0.

Let the phi_{ti}, i = 1, 2, . . . , r, t = 1, 2, . . . , be fixed functions of time satisfying the assumptions (9.1.8) and (9.1.9). Then

D_n(beta-hat_S - beta) ->_L N(0, A_0^{-1} B A_0^{-1}),  (9.1.10)

where A_0 is the matrix with elements a_{0ij} defined in (9.1.9), B is defined in (9.1.7), and beta-hat_S is defined in (9.1.2). The covariance matrix is nonsingular if B of (9.1.7) is nonsingular.
Proof. Consider the linear combination Sum_{t=1}^{n} c_t Z_t, where

c_t = Sum_{i=1}^{r} lambda_i d_{iin}^{-1} phi_{ti}

and the lambda_i are arbitrary real numbers. Now, by our assumptions, c_t is completely analogous to (Sum_{j=1}^{n} C_j^2)^{-1/2} C_t of Theorem 6.3.4. By Theorem 6.3.4, Sum_{t=1}^{n} c_t Z_t converges to a normal random variable with variance

Sum_{i=1}^{r} Sum_{j=1}^{r} lambda_i lambda_j b_{ij},

where b_{ij} = Sum_{h=-infinity}^{infinity} a_{hij} gamma_Z(h) is the ijth element of B of (9.1.7). Since lim_{n->infinity} D_n^{-1}Phi'Phi D_n^{-1} = A_0 and since lambda was arbitrary, the result follows from Theorem 5.3.3. ∎
The matrix A_h with ijth element equal to a_{hij} of (9.1.9) is analogous to the matrix Gamma(h) of Section 4.4, and therefore the spectral representation

A_h = Integral_{-pi}^{pi} e^{i*omega*h} dM(omega)  (9.1.11)

holds, where M(omega_2) - M(omega_1) is a positive semidefinite Hermitian matrix for all -pi <= omega_1 <= omega_2 <= pi, and A_0 = M(pi) - M(-pi). We state without proof some of the results of Grenander and Rosenblatt (1957), which are based on this representation.
Theorem 9.1.2. Let the assumptions (9.1.1), (9.1.8), and (9.1.9) hold, and let the spectral density of the stationary time series Z_t be positive for all omega. Then

lim_{n->infinity} D_n^{-1}Phi'V_zz Phi D_n^{-1} = 2pi Integral_{-pi}^{pi} f_Z(omega) dM(omega)  (9.1.12)

and

lim_{n->infinity} D_n^{-1}Phi'V_zz^{-1}Phi D_n^{-1} = (2pi)^{-1} Integral_{-pi}^{pi} f_Z^{-1}(omega) dM(omega),  (9.1.13)

where f_Z(omega) is the spectral density function of the process Z_t.
Proof. See Grenander and Rosenblatt (1957). ∎
The asymptotic efficiency of the simple least squares estimator and the generalized least squares estimator is the same if the two covariance matrices determined by (9.1.12) and (9.1.13) are equal. Following Grenander and Rosenblatt, we denote the set of points of increase of M(omega) by S. That is, the set S contains all omega such that for any interval (omega_1, omega_2)
with omega_1 < omega < omega_2, M(omega_2) - M(omega_1) is a positive semidefinite matrix and not the null matrix.
Theorem 9.1.3. Let the assumptions (9.1.1), (9.1.8), and (9.1.9) hold, and let the spectral density of the stationary time series Z_t be positive for all omega. Then the simple least squares estimator and the generalized least squares estimator have the same asymptotic efficiency if and only if the set S is composed of q distinct points, omega_1, omega_2, . . . , omega_q, q <= r.
Proof. See Grenander and Rosenblatt (1957). ∎
It can be shown that polynomials and trigonometric polynomials satisfy the conditions of Theorem 9.1.3. For example, we established the result for the special case of a constant mean function and autoregressive Y_t in Section 6.1. One may easily establish for the linear trend that

lim_{n->infinity} d_{11n}^{-2} Sum_{t=1}^{n-h} t(t + h) = 1,  h = 0, 1, 2, . . . ,

from which it follows that the set S of points of increase is the single point omega_1 = 0.

The reader should not forget that these are asymptotic results. If the sample is
of moderate size, it may be desirable to estimate V_zz and transform the data to obtain final estimates of the trend function. Also, the simple least squares estimators may be asymptotically efficient, but it does not follow that the simple formulas for the estimated variances of the coefficients are consistent. In fact, the estimated variances may be badly biased. See Section 9.7.
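The finite-sample advantage of generalized least squares when V_zz is known can be seen by simulation. The sketch below is illustrative only: it uses a fixed, irregular regressor (so that simple least squares is not asymptotically efficient in the sense of Theorem 9.1.3), AR(1) errors, and it assumes the AR(1) parameter is known to the GLS estimator; all parameter values are hypothetical.

```python
# Monte Carlo sketch: simple least squares versus generalized least squares
# for y_t = beta*x_t + Z_t, where Z_t = rho*Z_{t-1} + e_t and rho is known
# to the GLS estimator (computed via the Prais-Winsten transformation).
import random

def simulate(beta=1.0, rho=0.7, n=200, reps=400, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]      # fixed, irregular regressor
    c = (1.0 - rho * rho) ** 0.5
    ols_est, gls_est = [], []
    for _ in range(reps):
        z = rng.gauss(0, 1) / c                  # stationary AR(1) start
        y = []
        for t in range(n):
            y.append(beta * x[t] + z)
            z = rho * z + rng.gauss(0, 1)
        # simple least squares slope
        b_ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        # Prais-Winsten transform, then least squares on transformed data
        xs = [c * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
        ys = [c * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
        b_gls = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)
        ols_est.append(b_ols)
        gls_est.append(b_gls)
    def var(v):
        m = sum(v) / len(v)
        return sum((u - m) ** 2 for u in v) / len(v)
    return var(ols_est), var(gls_est)
```

In this design the sampling variance of the GLS slope is roughly (1 - rho^2)/(1 + rho^2), about a third, of the simple least squares variance; for a polynomial or trigonometric regressor the two would be asymptotically equivalent.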
9.2. GRAFTED POLYNOMIALS
In many applications the mean function is believed to be a "smooth" function of time but the functional form is not known. While it is difficult to define the term "smooth" in this context, several aspects of functional behavior can be identified. For a function defined on the real line, the function would be judged to be continuous and, in most situations, to have a continuous first derivative. This specification is incomplete in that one also often expects few changes in the sign of the first derivative.
Obviously low order polynomials satisfy the stated requirements. Also, by the Weierstrass approximation theorem, we know that any continuous function defined on a compact interval of the real line can be uniformly approximated by a polynomial. Consequently, polynomials have been heavily used to approximate the mean function. However, if the mean function is such that higher order polynomials are required, the approximating function may be judged unsatisfactory in that it contains a large number of changes in sign of the derivative. An alternative
approximation that generally overcomes this problem is to approximate segments of the function by low order polynomials and then join the segments to form a continuous function. The segments may be joined together so that the derivatives of a desired order are continuous.
Our presentation is based on the fact that, on the real line, the functions

phi_{ti} = (t - A_i)^k for t > A_i, and phi_{ti} = 0 otherwise,

where i = 1, 2, . . . , M, k and M are positive integers, and A_1 < A_2 < ... < A_M, are continuous with continuous (k - 1)st derivatives. It follows that

g(t) = Sum_{j=0}^{k} b_j t^j + Sum_{i=1}^{M} b_{k+i} phi_{ti}

is also continuous with continuous (k - 1)st derivative. We call the function g(t) a grafted polynomial of degree k.
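The construction above can be sketched directly in code. The knots and coefficients below are hypothetical; the numerical check confirms that for k = 2 the grafted polynomial g(t) and its first derivative are continuous at each knot A_i.

```python
# Grafted (truncated power) basis of degree k: phi_i(t) = (t - A_i)^k for
# t > A_i and 0 otherwise.  For k = 2 each phi_i, and hence any linear
# combination g(t), has a continuous first derivative at the knots.

def phi(t, knot, k=2):
    return (t - knot) ** k if t > knot else 0.0

def g(t, poly_coef, knot_coef, knots, k=2):
    # g(t) = sum_j b_j t^j + sum_i b_{k+i} phi_i(t)
    val = sum(b * t ** j for j, b in enumerate(poly_coef))
    val += sum(c * phi(t, a, k) for c, a in zip(knot_coef, knots))
    return val

knots = [10.0, 20.0, 30.0]                      # illustrative knot placement
b_poly, b_knot = [1.0, 0.5, -0.02], [0.03, -0.05, 0.04]
eps = 1e-6
for a in knots:
    left, mid, right = (g(a - eps, b_poly, b_knot, knots),
                        g(a, b_poly, b_knot, knots),
                        g(a + eps, b_poly, b_knot, knots))
    dleft, dright = (mid - left) / eps, (right - mid) / eps
    assert abs(left - right) < 1e-4      # g continuous at the knot
    assert abs(dleft - dright) < 1e-4    # first derivative continuous
```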
To illustrate the use of grafted polynomials in the estimation of the mean function of a time series, let us make the following assumption: "The time series may be divided into periods of length A such that the mean function in each period is adequately approximated by a quadratic in time. Furthermore, the mean function possesses a continuous first derivative." Let n observations indexed by t = 1, 2, . . . , n be available. We construct the functions phi_{ti}, i = 1, 2, . . . , M, for k = 2 and A_i = Ai, where M is an integer such that |A(M + 1) - n| < A. In this formulation we assume that, if n is not equal to A(M + 1), the last interval is the only one whose length is not A. At the formal level all we need do is regress our time series upon t, t^2, phi_{t1}, phi_{t2}, . . . , phi_{tM} to obtain estimates of b_j, j = 0, 1, . . . , M + 2. However, if M is at all large, we can expect to encounter numerical problems in obtaining the inverse and regression coefficients. To reduce the possibility of numerical problems, we suggest that the phi's be replaced by the linear combinations
w_{ti} = phi_{ti} - 3 phi_{t,i+1} + 3 phi_{t,i+2} - phi_{t,i+3},  i = 1, 2, . . . , M,

where, for convenience, the functions phi_{t,M+1}, phi_{t,M+2}, and phi_{t,M+3} are defined by the same formula for all t. Note that

w_{ti} = (t - Ai)^2,  Ai < t < (i + 1)A,
       = (t - Ai)^2 - 3[t - (i + 1)A]^2,  (i + 1)A <= t < (i + 2)A,
       = [t - (i + 3)A]^2,  (i + 2)A <= t < (i + 3)A,
       = 0,  otherwise.
Since the function wti is symmetric about (i + 1.5)A, the w-variables can be
written down immediately. The w_{ti} remain correlated, but there should be little trouble in obtaining the inverse.
This procedure is felt to have merit when one is called on to extrapolate a time series. Since polynomials tend to plus or minus infinity as t increases, practitioners typically hesitate to use high order polynomials in extrapolation. Although a time series may display a nonlinear trend, we might wish to extrapolate on the basis of a linear trend. To accomplish this, we approximate the trend of the last K periods of a time series of n observations by a straight line tangent to a higher degree trend for the earlier portion of the time series. If the mean function for the first part of the sample is to be approximated by a grafted quadratic one could construct the regression variables
for i = 1, 2, . . . , M, where |K + AM - n| < A. Using these variables, the estimated regression equation can be written in terms of the estimated regression coefficients b-hat_j. The forecast equation for the mean function is Y-hat_t = b-hat_0 + b-hat_1 t, t = n + 1, n + 2, . . . .
We can also estimate a mean function that is linear for both the first and last portions of the observational period. Using a grafted quadratic in periods of length A for the remainder of the function, the required regression variables would be, for example,
phi_{t1} = t,

phi_{t,i+1} = [t - K - A(i - 1)]^2,  K + A(i - 1) < t <= K + Ai,
            = A^2 + 2A(t - K - Ai),  t > K + Ai,
            = 0,  otherwise,
for i = 1, 2, . . . , M, where n > K + MA and the mean function is linear for t < K and t > K + MA. If M is large, a transformation similar to that discussed above can be used to reduce computational problems.
Example 9.2.1. Yields of wheat in the United States for the period 1908 through 1991 are given in Table 9.2.1. In this example, we use only the data from 1908 through 1971. Yields for the first 20 to 30 years of the period are relatively constant, but there is a definite increase in yields from the late 1930s through 1971. As an approximation to the trend line, we fit a function that is constant for the first 25 years, increases at a quadratic rate until 1961, and is linear from 1961
Table 9.2.1. United States Wheat Yields (1908 to 1991)

Year  Yield    Year  Yield    Year  Yield
1908  14.3     1936  12.8     1964  25.8
1909  15.5     1937  13.6     1965  26.5
1910  13.7     1938  13.3     1966  26.3
1911  12.4     1939  14.1     1967  25.9
1912  15.1     1940  15.3     1968  28.4
1913  14.4     1941  16.8     1969  30.6
1914  16.1     1942  19.5     1970  31.0
1915  16.7     1943  16.4     1971  33.9
1916  11.9     1944  17.7     1972  32.7
1917  13.2     1945  17.0     1973  31.7
1918  14.8     1946  17.2     1974  27.3
1919  12.9     1947  18.2     1975  30.6
1920  13.5     1948  17.9     1976  30.3
1921  12.7     1949  14.5     1977  30.7
1922  13.8     1950  16.5     1978  31.4
1923  13.3     1951  16.0     1979  34.2
1924  16.0     1952  18.4     1980  33.5
1925  12.8     1953  17.3     1981  34.5
1926  14.7     1954  18.1     1982  35.5
1927  14.7     1955  19.8     1983  39.4
1928  15.4     1956  20.2     1984  38.8
1929  13.0     1957  21.8     1985  37.5
1930  14.2     1958  27.5     1986  34.4
1931  16.3     1959  21.6     1987  37.7
1932  13.1     1960  26.1     1988  34.1
1933  11.2     1961  23.9     1989  32.7
1934  12.1     1962  25.0     1990  39.5
1935  12.2     1963  25.2     1991  34.3

SOURCE: U.S. Department of Agriculture, Agricultural Statistics, various issues.
to 1971. A continuous function with continuous derivative that satisfies these requirements is

phi_{t1} = 0,  1 <= t <= 25,
         = (t - 25)^2,  25 < t <= 54,  (9.2.1)
         = 841 + 58(t - 54),  54 < t,

where t = 1 for 1908.

The trend line obtained by regressing the 64 yield observations on the vector (1, phi_{t1}) is displayed in Figure 9.2.1. The equation for the estimated trend line is

Y-hat_t = 13.97 + 0.0123 phi_{t1},

and the residual mean square is 2.80.
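The fit in Example 9.2.1 is an ordinary two-parameter regression once phi_{t1} of (9.2.1) is constructed. A minimal sketch, using a noiseless synthetic series in place of the wheat data (so the recovered coefficients are the synthetic ones, not 13.97 and 0.0123):

```python
# Sketch of the grafted-trend regression of Example 9.2.1: construct the
# regressor phi_t of (9.2.1) and regress y_t on (1, phi_t) by least squares.

def phi(t):
    # constant to t = 25, quadratic to t = 54, linear (tangent) thereafter;
    # phi(54) = 841 and slope 58 continue the quadratic's value and derivative
    if t <= 25:
        return 0.0
    if t <= 54:
        return float((t - 25) ** 2)
    return 841.0 + 58.0 * (t - 54)

def fit_trend(y):
    n = len(y)
    x = [phi(t) for t in range(1, n + 1)]
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

# noiseless synthetic check: y_t = 14 + 0.012*phi(t) is recovered exactly
y = [14.0 + 0.012 * phi(t) for t in range(1, 65)]
b0, b1 = fit_trend(y)
```

On the actual 64 wheat yields of Table 9.2.1, the same two-parameter fit reproduces the estimates reported in the example.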
Figure 9.2.1. U.S. wheat yields 1908-1971 and grafted quadratic trend. [Figure: yield plotted against time, 0 to 60, with the fitted trend line.]
The trend in the data and the properties of the time series will be investigated further in Example 9.5.1 and Example 9.7.2.
In the example and in our presentation we assumed that one was willing to specify the points at which the different polynomials are joined. The problem of using the data to estimate the join points has been considered by Gallant and Fuller (1973).
Grafted polynomials passing through a set of given points and joined in such a way that they satisfy certain restrictions on the derivatives are called spline functions in approximation theory. A discussion of such functions and their properties is contained in Ahlberg, Nilson, and Walsh (1967), Greville (1969), and Wahba (1990).
9.3. ESTIMATION BASED ON LEAST SQUARES RESIDUALS
9.3.1. Estimated Autocorrelations
One of the reasons for estimating the trend function is to enable one to investigate the properties of the time series Z_t. Let the regression model be that defined in (9.1.1):
y = Phi*beta + z.  (9.3.1)

The calculated residuals using the simple least squares estimator are

Z-hat_t = Y_t - phi_t beta-hat_S = Z_t - phi_t(beta-hat_S - beta),  (9.3.2)

where phi_t is the tth row of Phi. The estimated autocovariance of Z_t computed using the estimated Z_t is

gamma-hat_{Z-hat}(h) = n^{-1} Sum_{t=1}^{n-h} Z-hat_t Z-hat_{t+h},  h = 0, 1, 2, . . . , n - 1.  (9.3.3)
Theorem 9.3.1. Let Y_t be defined by the model (9.3.1), where Z_t = Sum_{j=0}^{infinity} w_j e_{t-j}, {w_j} is absolutely summable, {e_t} is a sequence of independent (0, sigma^2) random variables with bounded 2 + delta (delta > 0) moments, and the phi_{ti}, i = 1, 2, . . . , r, t = 1, 2, . . . , are fixed functions of time satisfying the assumptions (9.1.5) and (9.1.6). Then

gamma-hat_{Z-hat}(h) - gamma-hat_Z(h) = O_p(n^{-1}),

where gamma-hat_{Z-hat}(h) is defined in (9.3.3) and

gamma-hat_Z(h) = n^{-1} Sum_{t=1}^{n-h} Z_t Z_{t+h},  h = 0, 1, 2, . . . , n - 1.
Proof. We have, for h = 0, 1, . . . , n - 1,

gamma-hat_{Z-hat}(h) - gamma-hat_Z(h) = -n^{-1}(beta-hat_S - beta)' Sum_{t=1}^{n-h} phi_t' Z_{t+h} - n^{-1} Sum_{t=1}^{n-h} Z_t phi_{t+h}(beta-hat_S - beta) + n^{-1}(beta-hat_S - beta)' Sum_{t=1}^{n-h} phi_t' phi_{t+h}(beta-hat_S - beta),

where D_n is the diagonal matrix whose elements are the square roots of the elements of the principal diagonal of Phi'Phi. By the absolute summability of gamma_Z(h), the assumption (9.1.5), and the assumption (9.1.6),

Sum_{t=1}^{n-h} Z_t phi_{t+h} D_n^{-1} = O_p(1)

and D_n(beta-hat_S - beta) = O_p(1). Since each element of the r x r matrix

D_n^{-1} Sum_{t=1}^{n-h} phi_t' phi_{t+h} D_n^{-1}

is less than one in absolute value, the result follows. ∎
It is an immediate consequence of Theorem 9.3.1 that the limiting behavior of the estimated autocovariances and autocorrelations defined by Theorem 6.3.5 and Corollary 6.3.5 also holds for autocovariances and autocorrelations computed using Z-hat_t in place of Z_t. The bias in the estimated autocorrelations is O(n^{-1}) and therefore can be ignored in large samples. However, if the sample is small and if several phi_{ti} are included in the regression, the bias in gamma-hat_{Z-hat}(h) may be sizable. To briefly investigate the nature of the bias in gamma-hat_{Z-hat}(h), we use the statistic studied
by Durbin and Watson (1950, 1951). They suggested the von Neumann ratio (see Section 6.2) computed from the calculated residuals as a test of the hypothesis of independent errors. The statistic is
d = (Z-hat'H Z-hat)(Z-hat'Z-hat)^{-1},  (9.3.4)

where H is the n x n matrix

H = [  1  -1   0  ...   0   0
      -1   2  -1  ...   0   0
       0  -1   2  ...   0   0
       .................... 
       0   0   0  ...   2  -1
       0   0   0  ...  -1   1 ],

Z-hat = [I - Phi(Phi'Phi)^{-1}Phi']z is the vector of calculated residuals, and z is the vector of observations on the original time series. Recall that

d = [Sum_{t=1}^{n-1} (Z-hat_{t+1} - Z-hat_t)^2][Sum_{t=1}^{n} Z-hat_t^2]^{-1},

where Z-hat_t denotes the tth element of Z-hat.
The expected values of the numerator and denominator of d are given by

E{Z-hat'H Z-hat} = tr{[I - Phi(Phi'Phi)^{-1}Phi']H[I - Phi(Phi'Phi)^{-1}Phi']V_zz}  (9.3.5)

and

E{Z-hat'Z-hat} = tr[V_zz - V_zz Phi(Phi'Phi)^{-1}Phi'],  (9.3.6)
where V_zz = E{zz'}, and tr B is the trace of the matrix B. It is easy to see that the expectation (9.3.5) may differ considerably from 2(n - 1)[gamma_Z(0) - gamma_Z(1)]. The expectation of the ratio is not necessarily equal to the ratio of the expectations, and this may further increase the bias in some cases (e.g., see Exercise 9.3).
We now turn to a consideration of d as a test of independence. For normal independent errors, Durbin and Watson have shown that the ratio can be reduced to a canonical form by a simultaneous diagonalization of the numerator and denominator quadratic forms. Therefore, in this case, we can write

d = (Sum_i lambda_i xi_i^2)(Sum_i delta_i xi_i^2)^{-1},

where it is assumed that the model contains an intercept and k' other independent variables, the lambda_i are the roots of [I - Phi(Phi'Phi)^{-1}Phi']H[I - Phi(Phi'Phi)^{-1}Phi'], the delta_i are the roots of I - Phi(Phi'Phi)^{-1}Phi', and the xi_i are normal independent random variables with mean zero and variance sigma_i^2 = sigma^2. Furthermore, the distribution of the ratio is independent of the distribution of the denominator. The exact distribution of d, depending on the roots lambda and delta, can be obtained. For normal independent errors Z_t, Durbin and Watson were able to obtain upper and lower bounds for the distribution, and they tabled the percentage points of these bounding distributions.
For small sample sizes and (or) a large number of independent variables the bounding distributions may differ considerably. Durbin and Watson suggested that the distribution of (1/4)d for a particular Phi-matrix could be approximated by a beta distribution with the same first two moments as (1/4)d.
As pointed out in Section 6.2, in the null case the t-statistic associated with the first order autocorrelation computed from the von Neumann ratio has percentage points approximately equal to those of Student's t with n + 3 degrees of freedom. For the d-statistic computed from the regression residuals it is possible to use the moments of the d-statistic to develop a similar approximation. The expected value of d under the assumption of independent errors is
E{d} = (tr H - tr[Phi'H Phi(Phi'Phi)^{-1}])(n - k' - 1)^{-1}
     = {2(n - 1) - tr[(Delta Phi)'(Delta Phi)(Phi'Phi)^{-1}]}(n - k' - 1)^{-1},  (9.3.7)
where (Delta Phi)'(Delta Phi) is the matrix of sums of squares and products of the first differences. Consider the statistics

r-hat = r_1 + (n - k')^{-1}(1 - r_1^2)  (9.3.8)

and

t_d = (n - k' + 3)^{1/2} r-hat (1 - r-hat^2)^{-1/2},  (9.3.9)

where r_1 = (1/2)(2 - d). The factor 1 - r_1^2 keeps r-hat in the interval [-1, 1] for most samples. For normal independent (0, sigma^2) errors the expected value of r-hat is O(n^{-2}) and the variance of r-hat differs from that of an estimated autocorrelation computed from a sample of size n - k' by terms of order n^{-1}. This suggests that the calculated t_d be compared with the tabular value of Student's t for n - k' + 3 degrees of freedom. By using the second moment associated with the particular Phi-matrix, the t-statistic could be further modified to improve the approximation, though there is little reason to believe that the resulting approximation would be as accurate as the beta approximation suggested by Durbin and Watson.
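The computation of d and of the implied first order autocorrelation r_1 = (1/2)(2 - d) from the calculated residuals can be sketched as follows; the residual series in the check is artificial.

```python
# Durbin-Watson statistic d = sum (Z_{t+1} - Z_t)^2 / sum Z_t^2 computed
# from calculated regression residuals, and the implied r_1 = 0.5*(2 - d).
# d near 2 suggests uncorrelated errors; d near 0 (r_1 near 1) positive
# autocorrelation; d near 4 (r_1 near -1) negative autocorrelation.

def durbin_watson(resid):
    num = sum((resid[t + 1] - resid[t]) ** 2 for t in range(len(resid) - 1))
    den = sum(r * r for r in resid)
    return num / den

def r1_from_d(d):
    return 0.5 * (2.0 - d)
```

For an alternating (strongly negatively autocorrelated) residual vector of length 10, d = 3.6 and r_1 = -0.8 exactly.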
9.3.2. Estimated Variance Functions
The residuals can be used to investigate the nature of the variance of Z_t as well as to study the autocorrelation structure. By Proposition 9.1.1, the least squares estimator of beta is consistent in the presence of variance heterogeneity. A simple form of variance heterogeneity is that in which the variance of the errors depends on fixed functions of time.
In Proposition 9.3.1, we show that the errors in the least squares estimated autoregressive coefficients are O_p(n^{-1/2}) in the presence of variance heterogeneity.
Proposition 9.3.1. Let Z_t be a time series satisfying

Z_t + Sum_{i=1}^{p} alpha_i Z_{t-i} = a_t,  a_t = sigma_{tt} e_t,  (9.3.10)

for t = 0, +-1, +-2, . . . , where the e_t are independent (0, 1) random variables with bounded 2 + delta moments, delta > 0, {sigma_{tt}} is a fixed sequence bounded above and bounded away from zero, and the roots of the characteristic equation associated with (9.3.10) are less than one in absolute value. Define the least squares estimator of alpha by

alpha-hat = -(Sum_{t=p+1}^{n} X_t'X_t)^{-1} Sum_{t=p+1}^{n} X_t'Z_t,

where X_t = (Z_{t-1}, Z_{t-2}, . . . , Z_{t-p}). Then

G_n^{-1/2}(alpha-hat - alpha) ->_L N(0, I),

where

G_n = (Sum_{t=p+1}^{n} X_t'X_t)^{-1}(Sum_{t=p+1}^{n} X_t'X_t sigma_{tt}^2)(Sum_{t=p+1}^{n} X_t'X_t)^{-1}.  (9.3.11)
Proof. The Z_t can be written as Sum_{j=0}^{infinity} w_j a_{t-j}, where the w_j are defined in (2.6.4). Now,

n^{-1} Sum_{t=1}^{n-h} Z_t Z_{t+h} = n^{-1} Sum_{t=1}^{n-h} Sum_{j=0}^{infinity} w_j w_{j+h} a_{t-j}^2 + n^{-1} Sum_{t=1}^{n-h} Sum_{j=0}^{infinity} Sum_{i=0, i != j}^{infinity} w_j w_{i+h} a_{t-i} a_{t-j}.  (9.3.12)

The a_t are independent and have bounded 2 + delta moments. Therefore,

n^{-1} Sum_{t=1}^{n-h} Sum_{j=0}^{infinity} w_j w_{j+h}(a_{t-j}^2 - sigma_{t-j,t-j}^2) ->_p 0

by Theorem 6.3.2. Using the arguments of the proof of Theorem 6.3.5, it can be shown that the second term of (9.3.12) converges to zero in probability. Therefore,

n^{-1} Sum_{t=1}^{n-h} Z_t Z_{t+h} - n^{-1} Sum_{t=1}^{n-h} Sum_{j=0}^{infinity} w_j w_{j+h} sigma_{t-j,t-j}^2 ->_p 0.

The error in the least squares estimator is

alpha-hat - alpha = -(Sum_{t=p+1}^{n} X_t'X_t)^{-1} Sum_{t=p+1}^{n} X_t'a_t,  (9.3.13)

and by the assumption (9.3.10),

E{(X_t'a_t, X_t'X_t a_t^2) | A_{t-1}} = (0, X_t'X_t sigma_{tt}^2)  (9.3.14)

for t = 0, +-1, +-2, . . . , where A_{t-1} is the sigma-field generated by Z_{t-1}, Z_{t-2}, . . . . Because a_t^2 has a bounded moment of order greater than one, the arguments associated with obtaining the probability limit of (9.3.12) can be used to show that

plim_{n->infinity} (n - p)^{-1}[Sum_{t=p+1}^{n} X_t'X_t sigma_{tt}^2 - Sum_{t=p+1}^{n} E{X_t'X_t}sigma_{tt}^2] = 0.

It follows that

(Sum_{t=p+1}^{n} E{X_t'X_t}sigma_{tt}^2)^{-1/2} Sum_{t=p+1}^{n} X_t'a_t ->_L N(0, I)  (9.3.15)

by Theorem 5.3.4. ∎
The result of Proposition 9.3.1 can be extended to the estimated autoregressive parameters computed from least squares residuals.
Proposition 9.3.2. Let the model (9.1.1) hold, where Z_t is the autoregressive time series of (9.3.10) and the phi_{ti}, i = 1, 2, . . . , r, t = 1, 2, . . . , are fixed functions of time. Let the assumptions (9.1.5) and (9.1.6) of Proposition 9.1.1 and the assumptions of Proposition 9.3.1 hold. Let Z-hat_t be the least squares residuals of (9.3.2), and let

alpha-tilde = -(Sum_{t=p+1}^{n} X-hat_t'X-hat_t)^{-1} Sum_{t=p+1}^{n} X-hat_t'Z-hat_t,

where X-hat_t = (Z-hat_{t-1}, Z-hat_{t-2}, . . . , Z-hat_{t-p}). Then

G_n^{-1/2}(alpha-tilde - alpha) ->_L N(0, I),

where G_n is defined in (9.3.11).
Proof. Because |w_j| is bounded by a multiple of lambda^j for some 0 < lambda < 1, one can use the arguments of the proof of Theorem 9.3.1 to show that alpha-tilde - alpha-hat = o_p(n^{-1/2}), and the conclusion follows from Proposition 9.3.1. ∎
Example 9.3.1. The 150 observations in Table 9.3.1 were computer generated. A plot of the data suggests that the mean function is approximately a linear function of time. If we fit a linear trend by ordinary least squares, we obtain

Y-hat_t = 15.472 + 0.3991 t.  (9.3.16)

If a second order autoregression is fitted to the residuals by ordinary least squares, we obtain

Z-hat_t = 1.524 Z-hat_{t-1} - 0.616 Z-hat_{t-2},  (9.3.17)
          (0.067)            (0.067)
Table 9.3.1. Data for Example 9.3.1 (read down the columns)

16.45   25.13   46.49   49.57   73.07
16.39   26.21   47.78   51.03   72.83
16.25   26.98   48.05   50.80   70.05
16.23   28.60   46.54   51.49   66.30
17.16   30.80   43.15   52.91   61.47
17.55   32.65   40.34   56.95   60.47
17.48   32.92   41.30   59.32   61.30
18.32   34.41   42.49   59.76   62.38
18.31   35.65   43.00   59.29   63.46
19.09   37.15   43.75   58.74   62.38
19.53   37.49   45.86   57.80   61.61
18.08   37.64   45.95   56.27   61.82
19.21   37.03   44.87   52.98   61.03
19.96   36.78   44.22   52.27   62.18
20.69   37.26   43.95   56.17   62.60
21.15   36.83   43.52   61.33   61.83
19.69   36.38   41.51   64.69   64.23
19.36   36.79   39.41   68.49   65.62
18.83   35.87   36.07   70.19   67.17
19.84   35.89   37.23   71.47   68.61
22.20   36.21   40.14   70.98   69.36
22.42   36.95   40.41   68.63   69.91
22.18   37.93   41.06   67.47   70.28
22.38   36.55   39.56   67.68   71.92
22.45   33.94   40.33   68.47   72.48
23.47   34.28   43.90   69.52   71.59
25.03   37.51   47.28   71.47   70.19
25.41   40.42   48.71   73.33   69.69
24.91   41.70   49.46   72.17   73.55
25.49   43.39   50.61   72.72   77.55
where s^2 = 1.646 and the standard errors are those of the ordinary regression program.
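The two-stage calculation of Example 9.3.1 (least squares trend, then a second order autoregression fitted to the residuals by least squares) can be sketched as follows. The series is simulated with the example's coefficients used as true values, so the fitted numbers are illustrative, not those reported in the example.

```python
# Two-stage fit as in Example 9.3.1: (1) regress Y_t on (1, t) by OLS,
# (2) regress the residual Z_t on (Z_{t-1}, Z_{t-2}) without intercept.
import random

def fit_line(y):
    n = len(y)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(y) / n
    b1 = sum((a - tbar) * (b - ybar) for a, b in zip(t, y)) \
         / sum((a - tbar) ** 2 for a in t)
    return ybar - b1 * tbar, b1

def fit_ar2(z):
    # normal equations for z_t = c1*z_{t-1} + c2*z_{t-2} + a_t
    s11 = sum(z[t - 1] ** 2 for t in range(2, len(z)))
    s22 = sum(z[t - 2] ** 2 for t in range(2, len(z)))
    s12 = sum(z[t - 1] * z[t - 2] for t in range(2, len(z)))
    s1y = sum(z[t - 1] * z[t] for t in range(2, len(z)))
    s2y = sum(z[t - 2] * z[t] for t in range(2, len(z)))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(3)
z1, z2, y = 0.0, 0.0, []
for t in range(1, 2001):                 # longer series than the example
    z = 1.524 * z1 - 0.616 * z2 + rng.gauss(0, 1)
    y.append(15.47 + 0.4 * t + z)
    z1, z2 = z, z1
b0, b1 = fit_line(y)
resid = [yi - (b0 + b1 * (i + 1)) for i, yi in enumerate(y)]
c1, c2 = fit_ar2(resid)
```

With 2000 observations the fitted trend slope and autoregressive coefficients land close to the true values 0.4, 1.524, and -0.616.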
If the absolute values of the residuals from the autoregression are plotted against time, there is an increase in the mean of the absolute values. Several models can be considered for the variance of the errors. We postulate a model in which the variance is related to the mean. Our model for the variance of a_t is

sigma_{tt}^2 = E{a_t^2} = e^{kappa_1}[E{Y_t}]^{kappa_2},  (9.3.18)

where a_t is the error in the autoregression estimated by (9.3.17). We estimated (kappa_1, kappa_2), replacing a_t^2 with a-hat_t^2 from regression (9.3.17) and replacing E{Y_t} with Y-hat_t from (9.3.16), by maximum likelihood under the assumption that the a_t are NI(0, sigma_{tt}^2). The estimates are
(kappa-hat_1, kappa-hat_2) = (-3.533, 1.051),
                             (0.558)  (0.148)
where the standard errors are from the inverse of the estimated information matrix.

We have estimated the variance function for the a_t, but if the variance function is a smooth function, the variance of the autoregressive process Z_t will be nearly a multiple of the variance of the a_t. Therefore, we define new variables

(Y-tilde_t, phi-tilde_{t1}, phi-tilde_{t2}) = sigma-hat_{at}^{-1}(Y_t, phi_{t1}, phi_{t2}),

where sigma-hat_{at}^2 is the function (9.3.18) evaluated at E{Y_t} = Y-hat_t and (kappa_1, kappa_2) = (-3.533, 1.051). The parameters of the model

Y_t = beta_1 + beta_2 t + Z_t,  Z_t + alpha_1 Z_{t-1} + alpha_2 Z_{t-2} = a_t,  (9.3.19)

estimated by Gaussian maximum likelihood, are

(beta-hat_1, beta-hat_2, alpha-hat_1, alpha-hat_2, sigma-hat^2) = (15.113, 0.407, -1.473, 0.570, 0.995).
                                                                  (1.465) (0.021) (0.068) (0.068) (0.090)
Estimation for a model such as (9.3.19) is discussed in Section 9.7. The estimate of (beta_1, beta_2, alpha_1, alpha_2) is not greatly different from that obtained in (9.3.16) and (9.3.17) by ordinary least squares. However, the confidence interval for a prediction is different under the model in which the variance is a function of the mean. The standard errors for predictions one, two, and three periods ahead are (1.283, 2.328, 3.178) for the model computed under the assumption of homogeneous variances. The corresponding prediction standard errors are (1.673, 2.985, 4.023) for the model with the variance function of (9.3.18).
In Example 9.3.1, the variance was assumed to be a fixed function of time. Models in which the variance is a random function have also been considered in the literature. Most often the variance at the current time is expressed as a function of the past behavior of the time series. Under mild assumptions, it can be shown that the least squares estimator of the parameter vector of the autoregressive process has a normal distribution in the limit.
Proposition 9.3.3. Let (Z_t, sigma_{tt}), t = 0, +-1, +-2, . . . , be a covariance stationary time series with Z_t = Sum_{j=0}^{infinity} w_j a_{t-j} and a_t = sigma_{tt}e_t, where the w_j are absolutely summable, the e_t are iid(0, 1) random variables with 2 + delta_1 (delta_1 > 0) moments, and e_{t+j} is independent of sigma_{tt} for j >= 0. If, in addition, Z_t is an autoregressive process satisfying

Z_t + Sum_{i=1}^{p} alpha_i Z_{t-i} = a_t,

where the roots of the characteristic equation are less than one in absolute value, and if

plim_{n->infinity} n^{-1} Sum_{t=p+1}^{n} X_t'X_t sigma_{tt}^2 = E{X_t'X_t sigma_{tt}^2},  (9.3.20)

then

G_n^{-1/2}(alpha-hat - alpha) ->_L N(0, I),

where alpha-hat, G_n, and X_t are defined in Proposition 9.3.1.
Proof. The sequence {a_t} is a martingale difference sequence and

n^{-1/2} Sum_{t=1}^{n} a_t ->_L N(0, sigma_a^2)

by Theorem 5.3.4. Hence, n^{1/2}Z-bar_n also converges in distribution to a normal random variable by Theorem 6.3.3. Thus, n^{1/2}Z-bar_n = O_p(1).

We expand n^{-1} Sum_{t=1}^{n-h} Z_t Z_{t+h} as in (9.3.12) and observe that

n^{-1} Sum_{t=1}^{n-h} Sum_{j=0}^{infinity} w_j w_{j+h}(a_{t-j}^2 - E{a_{t-j}^2}) ->_p 0

by Theorem 6.3.2. It can be shown that

plim_{n->infinity} n^{-1} Sum_{t=1}^{n-h} Sum_{j=0}^{infinity} Sum_{i=0, i != j}^{infinity} w_j w_{i+h} a_{t-i} a_{t-j} = 0

by applying the arguments of the proof of Theorem 6.3.5. Therefore, the sample covariance is consistent for the population covariance.

The error in the least squares estimator is given in (9.3.13). Under the present assumptions, (9.3.14) holds and, by the assumption (9.3.20) and Theorem 5.3.4, we obtain the limiting normal distribution result of (9.3.15). The limiting result for G_n^{-1/2}(alpha-hat - alpha) then follows. ∎
Engle (1982) introduced a variance model called the autoregressive conditional heteroscedastic model (ARCH). An example is

Z_t = e_t(1 + kappa Z_{t-1}^2)^{1/2},  (9.3.21)

where the e_t are NI(0, 1) random variables and 0 <= kappa < 3^{-1/2}. Some properties of this model are developed in Exercise 2.43 of Chapter 2. Clearly, many extensions of the simple model are possible, and a number have been considered in the literature. See Engle and Bollerslev (1986), Weiss (1984, 1986), Pantula (1986), Bollerslev, Chou, and Kroner (1992), Higgins and Bera (1992), and Granger and Terasvirta (1993).
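The ARCH mechanism can be illustrated by simulation. The sketch below uses the generic ARCH(1) recursion Z_t = e_t(a_0 + a_1 Z_{t-1}^2)^{1/2}, a standard form but not necessarily the exact parameterization of (9.3.21); all parameter values are hypothetical, and 3 a_1^2 < 1 guarantees finite fourth moments.

```python
# Simulation sketch of an ARCH(1) process Z_t = e_t*(a0 + a1*Z_{t-1}^2)^{1/2}
# with e_t ~ NI(0,1).  The unconditional variance is a0/(1 - a1), and when
# fourth moments exist the autocorrelation of Z_t^2 at lag h is a1^h, even
# though the Z_t themselves are uncorrelated.
import random

def simulate_arch1(a0=1.0, a1=0.3, n=200000, seed=7):
    rng = random.Random(seed)
    z, out = 0.0, []
    for _ in range(n):
        sigma = (a0 + a1 * z * z) ** 0.5   # conditional standard deviation
        z = sigma * rng.gauss(0, 1)
        out.append(z)
    return out
```

With a_0 = 1 and a_1 = 0.3, the simulated series has sample variance near 1/0.7, about 1.43, and lag-one autocorrelation of the squared values near 0.3.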
A model closely related to the ARCH model postulates the variances to be an unobserved stochastic process. See Melino and Turnbull (1990), Jacquier, Polson, and Rossi (1994), and Harvey, Ruiz, and Shephard (1994). An example model is

Z_t = theta Z_{t-1} + a_t,  (9.3.22)
a_t = sigma_{tt} e_t,  (9.3.23)
h_t - mu_h = delta(h_{t-1} - mu_h) + b_t,  (9.3.24)

where h_t = log sigma_{tt}^2, and (e_t, b_t)' ~ NI[(0, 0)', diag(1, sigma_b^2)]. In (9.3.24), the logarithm of the unobserved variance process is a first order autoregressive process. Thus, sigma_{tt} is always positive.
If the a_t are such that log a²_t has second moments, it is natural to take logarithms of (9.3.23) to create a linear model. Then,

log a²_t = 2h_t + log e²_t .   (9.3.25)
If e_t is normally distributed, then log e²_t is distributed as log chi-square, which has mean of about −1.27 and variance of about 4.93. The distribution of log e²_t is heavily skewed to the left. If (9.3.24) is a stationary autoregressive process and e_t is normally distributed, then log a²_t is a stationary autoregressive moving average process. Nonlinear estimation methods can be used to estimate the parameters of
ESTIMATION BASED ON LEAST SQUARES RESIDUALS 495
(9.3.24) treating the variance of log e²_t as known or unknown. In Example 9.3.2, we consider a modification of the log transformation in the estimation.
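The stated moments of the log chi-square distribution can be checked from the standard digamma identities ψ(1/2) = −γ − 2 log 2 and ψ′(1/2) = π²/2. A minimal sketch (only the standard library is assumed; not part of the text's estimation procedure):

```python
# If e ~ N(0,1), then log e^2 is distributed as log chi-square(1) with
#   E[log e^2]   = psi(1/2) + log 2 = -gamma - log 2  (about -1.27)
#   Var[log e^2] = psi'(1/2) = pi^2 / 2               (about  4.93)
import math

EULER_GAMMA = 0.5772156649015329              # Euler-Mascheroni constant
mean_log_chisq1 = -EULER_GAMMA - math.log(2.0)
var_log_chisq1 = math.pi ** 2 / 2.0
print(mean_log_chisq1, var_log_chisq1)
```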
Example 9.3.2. As an example of the fitting of a stochastic volatility model, we consider the prices of June futures contracts for the Standard and Poor's 500 index, collected as the average price per minute from 8:30 a.m. to 3:15 p.m. on April 13, 1987. Our basic observations are the first differences of the prices. The data are considered by Breidt and Carriquiry (1994). If a first order autoregression is fitted to the data, the estimated coefficient is 0.340 with a standard error of 0.047. The residuals from the first order autoregression, denoted by â_t, are plotted against time in Figure 9.3.1.
The observations in Figure 9.3.1 show evidence of a serially correlated variance. At the end of the series, large deviations tend to be followed by large deviations, while in the middle, small deviations tend to be followed by small deviations. This dependence is reflected in the sample autocorrelations for the â²_t. The first five autocorrelations of the â²_t are 0.204, 0.501, 0.205, 0.346, and 0.263. These correlations are significantly different from zero because the standard error under zero correlation is about 0.05.
We wish to fit a variance model of the form (9.3.23)-(9.3.24) to the residuals from the autoregressive fit. In practice, there are some disadvantages to a logarithmic transformation of the â²_t as defined in (9.3.25). There may be some zero values for â²_t, and the original distribution may not be normal. Therefore, we
Figure 9.3.1. Residuals from autoregression of price differences.
496 REGRESSION, TREND, AND SEASONALITY
suggest the following estimation scheme that is more robust against nonnormality. Let δ be a small fraction, say 0.02, of the average of the â²_t. Let

u_t = log(a²_t + δ) − δ(a²_t + δ)^{−1} = 2h_t + w_t ,   (9.3.26)

where

w_t = log(e²_t + σ_{at}^{−2}δ) − σ_{at}^{−2}δ(e²_t + σ_{at}^{−2}δ)^{−1} .   (9.3.27)
If e²_t is large, w_t differs little from log e²_t. For example, the difference is less than 0.016 if σ_{at}^{−2}δ = 0.02 and e²_t > 0.10. For normal (0, 1) random variables and σ_{at}^{−2}δ = 0.02, the mean of w_t is about −1.09 and the variance is about 3.19. Thus, the δ-modification of the log transformation produces variables with smaller variance than log e²_t because the function (9.3.26) is bounded below by log δ − 1. Also, the distribution of w_t is much less skewed than that of log e²_t.
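A small simulation illustrates the qualitative claims for the modified transformation: w_t is bounded below by log δ − 1, dominates log e²_t pointwise (since log(1 + u) ≥ u/(1 + u)), and has smaller variance. An illustrative sketch only; the sample size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 0.02                                  # the quantity sigma_at^{-2} * delta of (9.3.27)
e2 = rng.standard_normal(400_000) ** 2    # e_t^2 for normal (0,1) e_t

w = np.log(e2 + d) - d / (e2 + d)         # delta-modified transformation
log_e2 = np.log(e2)                       # plain log chi-square(1)
```

The variance reduction (about 3.19 versus 4.93) comes from lifting the long left tail of log e²_t.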
The u_t of (9.3.26) can be used to estimate the parameters of the h_t-process up to an additive constant. For example, if our specification for the h_t-process is a pth order autoregressive process, we write

u_t − μ_u = X_t + g_t ,  X_t = Σ_{i=1}^{p} ψ_i X_{t−i} + b_t ,   (9.3.28)

where μ_u = E{2h_t} + E{w_t}, g_t = w_t − E{w_t}, X_t = 2h_t − E{2h_t}, w_t is defined in (9.3.27), {b_t} is independent of {g_t}, and b_t ~ II(0, σ_b²). Because of the nature of our transformation, the conditional distributions of the g_t given h_t are only approximately equal.
We use a first order autoregressive model for the h_t of stock prices. Then the u_t of (9.3.26) is an autoregressive moving average, which we write as

u_t − μ_u = ψ(u_{t−1} − μ_u) + η_t + β η_{t−1} .   (9.3.29)
We replace a²_t with â²_t in the definition of u_t of (9.3.26) and set δ = 0.00058, where the average of the â²_t is 0.0290. The estimates of the parameters are

(ψ̂, β̂, σ̂²_η) = (0.939, −0.816, 2.961) ,
       (0.033)  (0.055)
From (9.3.29) and (9.3.28), βσ²_η = −ψσ²_g and

(1 + β²)σ²_η = σ²_b + (1 + ψ²)σ²_g .

It follows that

σ̂²_g = −ψ̂^{−1} β̂ σ̂²_η = 2.5742 ,
σ̂²_b = (1 + β̂²)σ̂²_η − (1 + ψ̂²)σ̂²_g = 0.0903 .
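The moment relations can be solved numerically at the reported estimates; the arithmetic below reproduces the text's values to rounding (a sketch with the estimates hard-coded):

```python
# Fitted ARMA(1,1) parameters for u_t: (psi, beta, sigma_eta^2)
psi, beta, s2_eta = 0.939, -0.816, 2.961

# beta * s2_eta = -psi * s2_g  and  (1 + beta^2) s2_eta = s2_b + (1 + psi^2) s2_g
s2_g = -beta * s2_eta / psi
s2_b = (1.0 + beta ** 2) * s2_eta - (1.0 + psi ** 2) * s2_g
print(s2_g, s2_b)
```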
The fact that σ̂²_g differs considerably from 3.19 suggests that the original e_t are not normally distributed. Given an estimate of the parameter ψ of the X_t-process, we can construct smoothed estimates of X_t, t = 1, 2, . . . , n, and, hence, estimates of 2h_t, up to an additive constant. Then the constant can be estimated as a multiplicative constant in the original (nonlogarithmic) scale. Let ζ be defined by
σ²_{at} = ζ exp{X_t} ,  ζ = exp{2μ_h} ,   (9.3.30)

where μ_h = E{h_t}. Then an estimator of ζ is

ζ̂ = n^{−1} Σ_{t=1}^{n} â²_t exp{−X̂_t} ,   (9.3.31)

where X̂_t is the smoothed estimator of X_t constructed from the fitted model (9.3.28). The quantities

σ̂²_{at} = ζ̂ exp{X̂_t}   (9.3.32)

are smoothed estimates of the unknown σ²_{at}. If the σ̂²_{at} are quite variable over the sample, a second round of calculations can be carried out in which the initial transformation is (9.3.26) with δ replaced by δ_t = R σ̂²_{at}, where R is a constant, such as 0.02. In this example, the estimates are changed very little by the second round of computations. However, in some situations there is a considerable change, and in general a second round of computation is recommended. See Breidt and Carriquiry (1994).
Given the estimates of σ²_g, σ²_b, and ψ for the price data, we constructed smoothed estimates X̂_t using the Kalman filter recursions for fixed interval smoothing described, for example, in Harvey (1989, p. 154) and Anderson and Moore (1979, Chapter 7). The estimate of ζ of (9.3.30) is ζ̂ = 0.01550. The smoothed estimates of the standard deviations σ̂_{at} from (9.3.32) are plotted against time in Figure 9.3.2. They reflect the pattern observed in the data. That is, there is a period of high volatility near the end of the series and low volatility in the middle of the observation period. ▲▲
9.4. MOVING AVERAGES-LINEAR FILTERING

9.4.1. Moving Averages for the Mean
In the previous sections we considered methods of estimating the mean function for the entire period of observation. One may be interested in a simple approximation to the mean as a part of a preliminary investigation, where one is not willing to specify the mean function for the entire period, or one may desire a relatively simple method of removing the mean function that will permit simple
Figure 9.3.2. Smoothed estimates of σ_{at} for price differences.
extrapolation of the time series. In such cases the method of moving averages may be appropriate.
One basis for the method of moving averages is the presumption that, for a period of M observations, the mean is adequately approximated by a specified function. The function is typically linear in the parameters and most commonly is a polynomial in t. Thus, the specification is

Y_{t+j} = g(j; β_t) + Z_{t+j} ,  j = −M₁, −M₁ + 1, . . . , M₂ ,   (9.4.1)

where Z_t is a stationary time series with zero expectation, β_t is a vector of parameters, and M = M₁ + M₂ + 1. The form of the approximating function g is assumed to hold for all t, but the parameters are permitted to be a function of t. Both the "local" and "approximate" nature of the specification should now be clear. If the functional form for the expectation of Y_t held exactly in the interval, then it would hold for the entire realization, and the constants M₁ and M₂ would become t − 1 and n − t, respectively.
Given specification (9.4.1), a set of weights, w_j(s), are constructed that, when applied to the Y_{t+j}, j = −M₁, −M₁ + 1, . . . , M₂ − 1, M₂, furnish an estimator of g(s; β_t) for the specified s. It follows that an estimator of Z_{t+s} is given by

Y_{t+s} − ĝ(s; β̂_t) ,

where

ĝ(s; β̂_t) = Σ_{j=−M₁}^{M₂} w_j(s) Y_{t+j} .
In the terminology of Section 4.3 the set of weights {w_j(s)} is a linear filter.

Let us consider an example. Assume that for a period of five observations the time series is adequately represented by

Y_{t+j} = β_{0t} + β_{1t} j + β_{2t} j² + Z_{t+j} ,  j = −2, −1, 0, 1, 2 ,   (9.4.2)

where Z_t is a stationary time series with zero expectation. Using this specification, we calculate the least squares estimator for the trend value of the center observation, s = 0, as a linear function of the five observations. The model may be written in matrix form as

y_t = Φβ_t + z_t ,

where
y_t = (Y_{t−2}, Y_{t−1}, Y_t, Y_{t+1}, Y_{t+2})′ is the vector in the first column of Table 9.4.1, Φ is the matrix defined by the second, third, and fourth columns of Table 9.4.1 (those columns headed by β_{0t}, β_{1t}, and β_{2t}), and z_t = (Z_{t−2}, Z_{t−1}, Z_t, Z_{t+1}, Z_{t+2})′ is the vector of (unobservable) elements of the stationary time series. The least squares estimator of the mean at j = 0, ĝ(0; β̂_t), is given by

ĝ(0; β̂_t) = (1, 0, 0)(Φ′Φ)^{−1}Φ′y_t ,

where the vector (1, 0, 0) is the value of the vector of independent variables associated with the third observation. That is, (1, 0, 0) is the third row of the matrix Φ and is associated with j = 0. The vector of weights
Table 9.4.1. Calculation of Weights for a Five Period Quadratic Moving Average

                                                              Weights for Trend
                              Weights for Trend               Adjusted Series
  y_t      β_{0t} β_{1t} β_{2t}   g(0;β_t)  g(1;β_t)  g(2;β_t)     (s = 0)
  Y_{t−2}    1     −2     4       −6/70    −10/70      6/70         6/70
  Y_{t−1}    1     −1     1       24/70     12/70    −10/70       −24/70
  Y_t        1      0     0       34/70     24/70     −6/70        36/70
  Y_{t+1}    1      1     1       24/70     26/70     18/70       −24/70
  Y_{t+2}    1      2     4       −6/70     18/70     62/70         6/70
w′ = (1, 0, 0)(Φ′Φ)^{−1}Φ′ ,

once computed, can be applied to any vector y_t of five observations. These weights are given in the fifth column of Table 9.4.1 under the heading "g(0; β_t)." The least squares estimator of Z_t for the third observation (s = 0) is given by

Y_t − ĝ(0; β̂_t) = Y_t − (1, 0, 0)(Φ′Φ)^{−1}Φ′y_t .

The vector of weights giving the least squares estimator of Z_t is presented in the last column of Table 9.4.1. One may readily verify that the inner product of this vector of weights with each of the columns of the original matrix Φ is zero. Thus, if the original time series satisfies the specification (9.4.2), the time series of estimated residuals created by applying the weights in the last column will be a stationary time series with zero mean. The created time series X_t is given by

X_t = Σ_{j=−2}^{2} v_j Y_{t+j} ,

where v_j denotes the weights in the last column of Table 9.4.1. Although X_t can be thought of as an estimator of Z_t, X_t is a linear combination of the five Z_{t+j} included in the moving average.
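The weights of Table 9.4.1 can be reproduced directly from the least squares formulas. A sketch using numpy:

```python
import numpy as np

j = np.arange(-2, 3)
Phi = np.column_stack([np.ones(5), j, j ** 2])   # columns beta_0t, beta_1t, beta_2t

# w(s) = Phi (Phi'Phi)^{-1} phi_s' gives the trend weights at j = s
H = Phi @ np.linalg.inv(Phi.T @ Phi)
w_trend = H @ np.array([1.0, 0.0, 0.0])          # g(0; beta_t) column of Table 9.4.1
w_adjusted = np.eye(5)[2] - w_trend              # last column: residual weights, s = 0
```

The adjusted weights annihilate each column of Φ, so any quadratic trend is removed exactly.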
A regression program that computes the estimated value and deviation from fit for each observation on the dependent variable is a convenient method of computation. Form a regression problem with the independent variables given by the trend specification (e.g., constant, linear, and quadratic) and the dependent variable defined by a vector v with a one in the position associated with j = s and zeros elsewhere, where the estimated mean is to be computed for j = s. The vector of weights for the estimated mean is then given by

w = v̂ = Φ(Φ′Φ)^{−1}Φ′v = Φ(Φ′Φ)^{−1}φ′_{s·} ,   (9.4.3)

where φ_{s·} is the vector of observations on the independent variables associated with j = s. Similarly, the vector of weights for computing the trend adjusted time series at j = s is given by

v − v̂ = v − Φ(Φ′Φ)^{−1}Φ′v = v − Φ(Φ′Φ)^{−1}φ′_{s·} .   (9.4.4)

The computation of (9.4.3) is recognized as the computation of ŷ associated with the regression of the vector v on Φ. The weights in (9.4.4) are then the deviations from fit of the same regression.
The moving average estimator of the trend and of the trend adjusted time series are frequently calculated for j = 0 and M₁ = M₂. This configuration is called a centered moving average. The popularity of the centered moving average is perhaps due to the following theorem.

Theorem 9.4.1. The least squares weights constructed for the estimator of trend for a centered moving average of 2M + 1 observations, M a positive integer, under the assumption of a pth degree polynomial, p nonnegative and even, are the same as those computed for the centered moving average of 2M + 1 observations under the assumption of a (p + 1)st degree polynomial.
Proof. We construct our matrix Φ in a manner analogous to that of Table 9.4.1, that is, with columns given by j^k, k = 0, 1, 2, . . . , p, j = 0, ±1, ±2, . . . , ±M. Since we are predicting trend for j = 0, the regression coefficients of all odd powers are multiplied by zero. That is, the elements of φ_{0·} in (9.4.3) associated with the odd powers are all zero. It remains only to show that the regression coefficients of the even powers remain unchanged by the presence or absence of the (p + 1)st polynomial in the regression. But

Σ_{j=−M}^{M} j^{2k−1} = 0

for k a positive integer. Therefore, the presence or absence of odd powers of j in the matrix Φ leaves the coefficients of the even powers unchanged, and the result follows. ▲
The disadvantage of a centered moving average is the loss of observations at the beginning and at the end of the realization. The loss of observations at the end of the observed time series is particularly critical if the objective of the study is to forecast future observations. If the trend and trend adjusted time series are computed for the end observations using the same model, the variance-covariance structure of these estimates differs from those computed for the center of the time series. We illustrate with our example.

Assume that the moving average associated with equation (9.4.2) and Table 9.4.1 is applied to a sequence of independent identically distributed random variables with unit variance. That is, the trend value for the first observation is given by applying weights g(−2; β_t) to the first five observations, and the trend adjusted value for the first observation is obtained by subtracting the trend value from Y₁. The weights g(−2; β_t) are the weights g(2; β_t) arranged in reverse order. For the second observation the weights g(−1; β_t) are used. For t = 3, 4, . . . , n − 2, the weights g(0; β_t) are used. Denote the trend adjusted time series by X_t, t = 1, 2, . . . , n. If 3 ≤ t ≤ n − 2 and 3 ≤ t + h ≤ n − 2, then

Cov{X_t, X_{t+h}} =
  2520/4900 ,  h = 0 ,
 −2016/4900 ,  |h| = 1 ,
  1008/4900 ,  |h| = 2 ,
  −288/4900 ,  |h| = 3 ,
    36/4900 ,  |h| = 4 ,
  0 ,          otherwise .
For the first observation,

Cov{X₁, X_t} =
   560/4900 ,  t = 1 ,
 −1260/4900 ,  t = 2 ,
   420/4900 ,  t = 3 ,
   252/4900 ,  t = 4 ,
  −420/4900 ,  t = 5 ,
   204/4900 ,  t = 6 ,
   −36/4900 ,  t = 7 ,
  0 ,          otherwise .
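These covariances can be verified numerically by laying the three weight patterns against a common stretch of iid unit-variance variables. A sketch (the indexing conventions are mine):

```python
import numpy as np

j = np.arange(-2, 3)
Phi = np.column_stack([np.ones(5), j, j ** 2])
A = np.eye(5) - Phi @ np.linalg.inv(Phi.T @ Phi) @ Phi.T   # rows: adjusted weights, s = -2..2

# Weight rows of X_1, ..., X_7 against Y_1, ..., Y_9 (iid, unit variance)
W = np.zeros((7, 9))
W[0, :5] = A[0]                 # first observation: s = -2 weights
W[1, :5] = A[1]                 # second observation: s = -1 weights
for t in range(2, 7):           # interior observations: centered weights
    W[t, t - 2:t + 3] = A[2]

cov_x1 = W @ W[0]               # Cov{X_1, X_t}, t = 1, ..., 7
print(np.round(cov_x1 * 4900))
```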
9.4.2. Moving Averages of Integrated Time Series
In Chapter 10 we shall study nonstationary time series that can be represented as an autoregressive process with a root of unit absolute value. We have been constructing moving average weights to remove a mean that is polynomial in time. We shall see that weights constructed to remove such a mean will also eliminate the nonstationarity arising from an autoregressive component with unit root. The time series {W_t: t ∈ (0, 1, 2, . . .)} is called an integrated time series of order s if it is defined by

W_t = Σ_{r_s=0}^{t} Σ_{r_{s−1}=0}^{r_s} ⋯ Σ_{r_1=0}^{r_2} Z_{r_1} ,

where {Z_t: t ∈ (0, 1, 2, . . .)} is a stationary time series with zero mean and positive spectral density at ω = 0.
Theorem 9.4.2. A moving average constructed to remove the mean will reduce a first order integrated time series to stationarity, and a moving average constructed to remove a linear trend will reduce a second order integrated time series to stationarity.
Proof. We first prove that a moving average constructed to remove the mean (zero degree polynomial) will reduce a first order integrated time series W_t = Σ_{s=0}^{t} Z_s to stationarity. Define the time series created by applying a moving average to remove the mean by

X_t = Σ_{j=1}^{M} c_j W_{t+j} ,  t = 0, 1, 2, . . . ,

where the weights satisfy

Σ_{j=1}^{M} c_j = 0 .

Then

X_t = Σ_{j=1}^{M} c_j Σ_{s=0}^{t+j} Z_s
    = Σ_{j=1}^{M} c_j (W_t + Σ_{r=1}^{j} Z_{t+r})
    = Σ_{j=1}^{M} c_j Σ_{r=1}^{j} Z_{t+r} ,

which is a finite moving average of a stationary time series and therefore is stationary.
Consider next the second order integrated time series

V_t = Σ_{r=0}^{t} W_r = Σ_{r=0}^{t} Σ_{j=0}^{r} Z_j ,

where W_t is a first order integrated time series. The weights d_j constructed to remove a linear trend satisfy Σ_j d_j = 0 and Σ_j j d_j = 0. Therefore, the time series created by applying such weights is given by

Σ_{j=1}^{M} d_j V_{t+j} = Σ_{j=1}^{M} d_j Σ_{r=1}^{j} Σ_{s=1}^{r} Z_{t+s} ,

which, once again, is a finite moving average of a stationary time series. ▲
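The algebraic identity at the heart of the proof can be checked on a simulated path. A sketch; the weights (1, −2, 1), which sum to zero, are one arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(40)
w = np.cumsum(z)                 # W_t = Z_0 + ... + Z_t, first order integrated

c = np.array([1.0, -2.0, 1.0])   # c_1, ..., c_M with sum zero
t = 10

# X_t = sum_j c_j W_{t+j}
x_t = sum(c[k] * w[t + k + 1] for k in range(len(c)))
# Proof identity: X_t = sum_j c_j sum_{r=1}^{j} Z_{t+r}, a finite MA in Z
x_alt = sum(c[k] * z[t + 1:t + k + 2].sum() for k in range(len(c)))
```

The filtered value depends only on increments inside the window, which is why the level of the random walk drops out.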
The reader may extend this theorem to higher orders. Moving averages are sometimes repeatedly applied to the same time series.
Theorem 9.4.3 can be used to show that the repeated application of a moving average constructed to remove a low order polynomial trend will remove a high order polynomial trend.
Theorem 9.4.3. Let p and q be integers, p ≥ 0, q ≥ 1. A moving average constructed to remove a pth degree polynomial trend will reduce a (p + q)th degree polynomial trend to degree q − 1.
Proof. We write the trend function as

f(t) = Σ_{r=0}^{p+q} β_r t^r .

If a moving average with weights {w_j: j = −M₁, −M₁ + 1, . . . , M₂} is applied to this function, we obtain

Σ_{j=−M₁}^{M₂} w_j f(t + j) = Σ_{j=−M₁}^{M₂} w_j Σ_{r=0}^{p+q} β_r (t + j)^r .

Interchanging the order of summation and using the fact that a filter constructed to remove a pth degree polynomial trend satisfies

Σ_{j=−M₁}^{M₂} w_j j^k = 0 ,  k = 0, 1, . . . , p ,

we obtain the conclusion. By the binomial expansion of (t + j)^r, the only terms that survive the filtering contain j^k with k ≥ p + 1, and such terms multiply powers of t no greater than r − p − 1 ≤ q − 1. ▲
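A numerical check of the degree reduction: the five-point weights that remove a quadratic (p = 2; last column of Table 9.4.1), applied to the quartic t⁴, leave a polynomial of degree at most q − 1 = 1, so its second difference vanishes. In fact these particular weights also annihilate cubics (Theorem 9.4.1), and the filtered quartic is constant. A sketch:

```python
import numpy as np

v = np.array([6, -24, 36, -24, 6]) / 70.0   # removes quadratic (and cubic) trends
j = np.arange(-2, 3)

t = np.arange(0, 30)
filtered = ((t[:, None] + j) ** 4) @ v      # filter applied to the trend f(t) = t^4
```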
9.4.3. Seasonal Adjustment
Moving averages have been heavily used in the analysis of time series displaying seasonal variation. Assume that Y_t is a monthly time series that can be represented locally by

Y_{t+j} = α_t + β_t j + Σ_{m=1}^{12} δ_{tm} D_{t+j,m} + Z_{t+j} ,   (9.4.5)

where

D_{tm} = 1 if Y_t is observed in month m, and D_{tm} = 0 otherwise,
Σ_{m=1}^{12} δ_{tm} = 0 ,

and Z_t is a stationary time series with zero mean. In this representation α_t + β_t j is the trend component and Σ_{m=1}^{12} δ_{tm} D_{t+j,m} is the seasonal component. Since the sum of the seasonal effects is zero over a period of 12 observations, it follows that a moving average such as

Ȳ_t = (1/12)(½Y_{t−6} + Y_{t−5} + Y_{t−4} + ⋯ + Y_{t+4} + Y_{t+5} + ½Y_{t+6})   (9.4.6)

contains the trend but not the seasonal component. Hence, the difference

Y_t − Ȳ_t = Σ_{m=1}^{12} δ_{tm} D_{tm} + Σ_{j=−6}^{6} d_j Z_{t+j}   (9.4.7)

constructed for the time series (9.4.5) contains the original seasonal component but no trend. A moving average of the time series Y_t − Ȳ_t can then be used to estimate the seasonal component of the time series. For example,

ŝ_t = (1/5) Σ_{i=−2}^{2} (Y_{t+12i} − Ȳ_{t+12i})   (9.4.8)

furnishes an estimator of the seasonal component for time t based on 5 years of data. The 12 values of ŝ_t computed for a year by this formula do not necessarily sum to zero. Therefore, one may modify the estimators to achieve a zero sum for the year by defining

s̃_{t(k)} = ŝ_{t(k)} − (1/12) Σ_{m=1}^{12} ŝ_{t(m)} ,   (9.4.9)

where ŝ_{t(k)} is the seasonal component at time t that is associated with the kth month. The seasonally adjusted time series is then given by the difference Y_t − s̃_{t(k)}.
The seasonal adjustment procedure described above was developed from an additive model and the adjustment was accomplished with a difference. It is quite common to specify a multiplicative model and use ratios in the construction.
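A quick check that the weights of (9.4.6) reproduce a local linear trend exactly while annihilating any period-12 seasonal whose effects sum to zero. A sketch with made-up seasonal effects:

```python
import numpy as np

w = np.r_[0.5, np.ones(11), 0.5] / 12.0       # weights of (9.4.6); sum to one

delta = np.array([3, -1, 2, -4, 0, 1, -2, 4, -1, -3, 2, -1])   # sums to zero
t = np.arange(204)
y = 5.0 + 0.3 * t + np.tile(delta, 17)        # trend plus period-12 seasonal

ybar = np.convolve(y, w, mode="valid")        # centered average, centers t = 6..197
trend = 5.0 + 0.3 * np.arange(6, 198)
```

The half weights on the two end months make the 13-term filter span exactly one seasonal period, so the seasonal component cancels.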
It is also possible to construct directly a set of weights using regression procedures and a model such as (9.4.5). To illustrate the procedure of seasonal adjustment based directly on a regression model, we assume that a quarterly time series can be represented for a period of 21 quarters by

Y_{t+j} = Σ_{r=0}^{3} β_r j^r + Σ_{k=1}^{4} α_k D_{t+j,k} + Z_{t+j} ,  j = −10, −9, . . . , 10 ,

where

D_{tk} = 1 if Y_t is observed in quarter k, and D_{tk} = 0 otherwise,

and

Σ_{k=1}^{4} α_k = 0 .

We call Σ_{k=1}^{4} α_k D_{tk} the quarter (or seasonal) effect and Σ_{r=0}^{3} β_r j^r the trend effect. The first seven columns of Table 9.4.2 give a Φ matrix for this problem. We have incorporated the restriction Σ_{k=1}^{4} α_k = 0 by setting α₄ = −α₁ − α₂ − α₃.
With our coding of the variables, the trend value for t̄, the center observation, is given by β₀, and the seasonal value for t̄ is given by α₁. To compute the weights needed for the trend value at t̄, we regress the β₀-column on the remaining six columns, compute the deviations, and divide each deviation by the sum of squares of the deviations. It is readily verified that this is equivalent to computing the vector of weights by

w′_{tr} = J₁(Φ′Φ)^{−1}Φ′ ,

where

J₁ = (1, 0, 0, 0, 0, 0, 0) .

The weights for the seasonal component at time t̄ can be calculated by regressing the column associated with α₁ on the remaining columns, computing the deviations, and dividing each deviation by the sum of squares. These operations are equivalent to computing

w′_{se} = J_α(Φ′Φ)^{−1}Φ′ ,

where

J_α = (0, 0, 0, 0, 1, 0, 0) .
Let v be a vector with a one in the 11th (t̄th) position and zeros elsewhere. Then the weights for the trend and seasonally adjusted time series are given by
Table 9.4.2. Calculation of Weights for the Trend and Seasonal Components of a Quarterly Time Series

                          Φ                                 Weights
 Index   1    j    j²    j³    D₁  D₂  D₃     Trend    Seasonal   Adjusted
  −10    1  −10   100  −1000    0   0   1    −0.0477   −0.0314     0.0791
   −9    1   −9    81   −729   −1  −1  −1    −0.0304   −0.0407     0.0711
   −8    1   −8    64   −512    1   0   0    −0.0036    0.1562    −0.1526
   −7    1   −7    49   −343    0   1   0     0.0232   −0.0469     0.0237
   −6    1   −6    36   −216    0   0   1     0.0595   −0.0437    −0.0158
   −5    1   −5    25   −125   −1  −1  −1     0.0634   −0.0515    −0.0119
   −4    1   −4    16    −64    1   0   0     0.0768    0.1469    −0.2237
   −3    1   −3     9    −27    0   1   0     0.0902   −0.0546    −0.0356
   −2    1   −2     4     −8    0   0   1     0.1131   −0.0499    −0.0632
   −1    1   −1     1     −1   −1  −1  −1     0.1036   −0.0562    −0.0474
    0    1    0     0      0    1   0   0     0.1036    0.1438     0.7526
    1    1    1     1      1    0   1   0     0.1036   −0.0562    −0.0474
    2    1    2     4      8    0   0   1     0.1131   −0.0499    −0.0632
    3    1    3     9     27   −1  −1  −1     0.0902   −0.0546    −0.0356
    4    1    4    16     64    1   0   0     0.0768    0.1469    −0.2237
    5    1    5    25    125    0   1   0     0.0634   −0.0515    −0.0119
    6    1    6    36    216    0   0   1     0.0595   −0.0437    −0.0158
    7    1    7    49    343   −1  −1  −1     0.0232   −0.0469     0.0237
    8    1    8    64    512    1   0   0    −0.0036    0.1562    −0.1526
    9    1    9    81    729    0   1   0    −0.0304   −0.0407     0.0711
   10    1   10   100   1000    0   0   1    −0.0477   −0.0314     0.0791
w_a = v − w_{tr} − w_{se} . The weights w_a can also be obtained by regressing v on Φ and computing the deviations from regression.
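The construction can be verified numerically. The sketch below rebuilds the Φ matrix of Table 9.4.2 (with the center observation in quarter 1, as in the table) and checks the defining properties of the three weight vectors: the trend weights reproduce the constant column and annihilate the others, the seasonal weights pick off α₁, and the adjusted weights annihilate every column of Φ.

```python
import numpy as np

j = np.arange(-10, 11)
# Quarter dummies with alpha_4 = -alpha_1 - alpha_2 - alpha_3; center (j = 0) is quarter 1
D = np.zeros((21, 3))
for i, q in enumerate(j % 4):
    D[i] = [-1.0, -1.0, -1.0] if q == 3 else np.eye(3)[q]

Phi = np.column_stack([np.ones(21), j, j ** 2, j ** 3, D])
M = np.linalg.inv(Phi.T @ Phi) @ Phi.T        # rows are coefficient weight vectors

w_tr = M[0]                                   # trend value at the center (beta_0)
w_se = M[4]                                   # quarter-1 seasonal effect (alpha_1)
v = np.eye(21)[10]                            # indicator of the 11th observation
w_a = v - w_tr - w_se                         # trend and seasonally adjusted weights
print(np.round(w_tr, 4))                      # compare with the Trend column above
```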
9.4.4. Differences
Differences have been used heavily when the objective is to reduce a time series to stationarity and there is little interest in estimating the mean function of the time series. Differences of the appropriate order will remove nonstationarity associated with locally polynomial trends in the mean and will reduce integrated time series to stationarity. For example, if Y_t is defined by

Y_t = α + βt + W_t ,   (9.4.10)

where

W_t = Σ_{j=0}^{t} Z_{t−j}
and Z_t is a stationary time series with zero mean, then

ΔY_t = Y_t − Y_{t−1} = β + Z_t

is a stationary time series. The first difference operator can be viewed as a multiple of the moving average constructed to remove a zero degree polynomial trend from the second of two observations. The second difference is a multiple of the moving average constructed to remove a linear trend from the third of three observations. The second difference can also be viewed as a multiple of the moving average constructed to remove a linear trend from the second of three observations, and so forth. Therefore, the results of the previous subsections and of Section 2.4 may be combined in the following lemma.
Lemma 9.4.1. Let Y_t be a time series that is the sum of a polynomial trend of degree r and an integrated time series of order p. Then the qth difference of Y_t, where r ≤ q and p ≤ q, is a stationary time series. The mean of Δ^q Y_t is zero for r ≤ q − 1.

Proof. See Corollary 2.4.1 and Theorems 9.4.2 and 9.4.3. ▲
Differences of lag other than one are important in transforming time series. For the function f(t) defined on the integers, we define the difference of lag H by

Δ_{(H)} f(t) = f(t) − f(t − H) ,   (9.4.11)

where H is a positive integer.
In Section 1.6 we defined a periodic function of period H with domain T to be a function satisfying

f(t + H) = f(t)  ∀ t, t + H ∈ T .
For T the set of integers and H a positive integer, the difference of lag H of a periodic function of period H is identically zero. Therefore, differences of lag H have been used to remove seasonal and other periodic components from time series. For example, if the expected value of a monthly time series is written as

E{Y_t} = μ + βt + Σ_{i=1}^{12} δ_i D_{ti} ,   (9.4.12)

where

D_{ti} = 1 if Y_t is observed in month i, and D_{ti} = 0 otherwise,

and

Σ_{i=1}^{12} δ_i = 0 ,
then the expected value of Δ_{(12)}Y_t is E{Y_t − Y_{t−12}} = 12β. The difference of lag 12 removed the periodic component and reduced the linear trend to a constant as well. Also, a difference of any finite lag will reduce a first order integrated time series to stationarity.

Mixtures of differences of different lags can be used. For example, if we take the first difference of the difference of lag 12 (or the difference of lag 12 of the first difference) of the time series Y_t of (9.4.12), the expected value of the resultant time series is zero; that is, E{Δ_{(12)} ΔY_t} = 0.
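The assertion E{Δ_{(12)}ΔY_t} = 0 can be checked on a deterministic mean function. A sketch with arbitrary month effects:

```python
import numpy as np

delta = np.array([2, -1, 3, -2, 0, 1, -3, 2, 1, -1, -2, 0])   # month effects
t = np.arange(120)
mu = 7.0 + 0.5 * t + np.tile(delta, 10)   # E{Y_t} of (9.4.12) with beta = 0.5

d12 = mu[12:] - mu[:-12]                  # lag-12 difference: constant 12 * beta
d = np.diff(d12)                          # then first difference: identically zero
```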
The effect of repeated application of differences of different lags is summarized in Lemma 9.4.2.

Lemma 9.4.2. Let the time series Y_t be the sum of (1) a polynomial trend of degree r, (2) an integrated time series of order p, and (3) a sum of periodic functions of periods H₁, H₂, . . . , H_q, where r ≤ q and p ≤ q. Then the difference X_t = Δ_{(H₁)}Δ_{(H₂)} ⋯ Δ_{(H_q)}Y_t, where 1 ≤ H_i < ∞ for all i, is a stationary time series. If r ≤ q − 1, the mean of X_t is zero.

Proof. Reserved for the reader. ▲
9.5. STRUCTURAL MODELS
In Sections 9.1 and 9.2, we demonstrated the use of specified functions of time to approximate a mean function. In Section 9.4, we studied moving averages as local approximations to the mean function. Stochastic functions are also used as models for the means of time series. Models with stochastic means are sometimes called structural models in econometrics. See, for example, Harvey (1989). To introduce the approach, consider the simple model

Y_t = μ_t + e_t ,  μ_t = μ_{t−1} + a_t ,  t = 1, 2, . . . ,   (9.5.1)

where (e_t, a_t)′ ~ NI(0, diag(σ_e², σ_a²)). Models such as (9.5.1) are also called unobserved components models and can be considered a special case of (9.0.1). It follows from (9.5.1) that

X_t = ΔY_t = a_t + e_t − e_{t−1}   (9.5.2)

and hence X_t is a normally distributed stationary process with mean zero and autocovariances γ(0) = σ_a² + 2σ_e², γ(1) = −σ_e², and γ(h) = 0 for |h| > 1. That is, X_t is a first order moving average time series with representation

X_t = u_t + βu_{t−1}   (9.5.3)
with mean zero and variance σ_u² = (1 + β²)^{−1}(2σ_e² + σ_a²). If we let κ = σ_e^{−2}σ_a², then κ = −2 − β^{−1}(1 + β²) and σ_e² = (κ + 2)^{−1}(1 + β²)σ_u². When σ_a² = 0, the model (9.5.1) reduces to the constant mean model and X_t = e_t − e_{t−1} is a noninvertible moving average.
Given a sample segment from a realization of Y_t, we can estimate the unknown parameters (σ_e², σ_a²) by fitting a first order moving average to X_t = ΔY_t, with the parameter β restricted to [−1, 0). The resulting estimates of β and σ_u² are used to construct estimates of (σ_e², σ_a²). Given an estimate of (σ_e², σ_a²), estimates of the μ_t can be obtained with filtering methods. Because of the form (9.5.1), it is natural to use the Kalman filter procedures of Section 4.6 to estimate the μ_t.
It is also clear that a prediction constructed with estimates of σ_e² and σ_a² is identical to the prediction obtained by using the moving average representation for X_t.

A more general trend model is obtained by including a random change component. The expanded model is

Y_t = μ_t + e_{t1} ,
μ_t = μ_{t−1} + β_{t−1} + e_{t2} ,   (9.5.4)
β_t = β_{t−1} + e_{t3} ,

where (e_{t1}, e_{t2}, e_{t3})′ ~ NI(0, diag{σ₁², σ₂², σ₃²}). The component β_t is a local linear trend.
If we take second differences of Y_t, we obtain

X_t = Δ²Y_t = e_{t−1,3} + e_{t2} − e_{t−1,2} + e_{t1} − 2e_{t−1,1} + e_{t−2,1} .   (9.5.5)

By the assumptions on the e_{ti}, we have

[γ(0), γ(1), γ(2)] = [σ₃² + 2σ₂² + 6σ₁², −σ₂² − 4σ₁², σ₁²] ,

and γ(h) = 0 for h > 2. Thus, X_t = Δ²Y_t can be represented as a second order moving average
X_t = u_t + β₁u_{t−1} + β₂u_{t−2} .   (9.5.6)

If we let κ₂ = σ₁^{−2}σ₂² and κ₃ = σ₁^{−2}σ₃², then

κ₂ = −4 − ρ^{−1}(2)ρ(1) ,
κ₃ = ρ^{−1}(2) − 2κ₂ − 6 ,
σ₁² = (κ₃ + 2κ₂ + 6)^{−1}(1 + β₁² + β₂²)σ_u² .

As with the simple model, the fact that variances are nonnegative restricts the possible values for (β₁, β₂).
Example 9.5.1. To illustrate the use of the structural model to estimate the trend, we use the data on wheat yields of Example 9.2.1. We consider the simple model

Y_t = μ_t + e_t ,  μ_t = μ_{t−1} + a_t ,   (9.5.7)
where (e_t, a_t)′ ~ II(0, diag{σ_e², σ_a²}). We let X_t = ΔY_t and fit the first order moving average of (9.5.3) to the first differences of the data to obtain

X_t = −0.4109 u_{t−1} + u_t ,
      (0.1010)

with σ̂_u² = 4.4019. Eighty-three differences were used, and the zero mean model is estimated. Using
β(1 + β²)^{−1} = −σ_e²(2σ_e² + σ_a²)^{−1} ,
(1 + β²)σ_u² = 2σ_e² + σ_a² ,

we obtain (σ̂_e², σ̂_a²) = (1.8086, 1.5278). Using V̂{σ̂_u²} = 0.4726 and Taylor approximation methods, the estimated standard errors are 0.5267 and 0.5756 for σ̂_e² and σ̂_a², respectively.
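The moment inversion can be reproduced from the two displayed relations. A sketch with the reported estimates hard-coded:

```python
beta, s2_u = -0.4109, 4.4019                  # fitted MA(1) coefficient and variance

s2_e = -beta * s2_u                           # from beta * s2_u = -sigma_e^2
s2_a = (1.0 + beta ** 2) * s2_u - 2.0 * s2_e  # from (1 + beta^2) s2_u = 2 s2_e + s2_a
print(s2_e, s2_a)
```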
Given the estimates and the model, one can use filtering procedures to estimate the individual μ_t. We use the covariance structure to construct a linear filter. If the process begins at time one with a fixed unknown initial value, denoted by μ₁, the covariance matrix of the vector Y = (Y₁, Y₂, . . . , Y_n)′ is

V_{YY} = Iσ_e² + LL′σ_a² ,

where L is an n × n lower triangular matrix with L_{ij} = 1 for j < i and L_{ij} = 0 for j ≥ i. The best linear unbiased estimator of the unknown initial value is
μ̂₁ = (J′V_{YY}^{−1}J)^{−1}J′V_{YY}^{−1}Y ,   (9.5.8)

where J′ is an n-dimensional row vector composed of all ones. Now, from (9.5.7),

μ = (μ₁, μ₂, . . . , μ_n)′ = μ₁J + La ,

where a = (a₁, a₂, . . . , a_n)′. Therefore, the best unbiased estimator of μ is

μ̂ = Jμ̂₁ + σ_a²LL′V_{YY}^{−1}(Y − Jμ̂₁)
   = {J(J′V_{YY}^{−1}J)^{−1}J′V_{YY}^{−1} + σ_a²LL′V_{YY}^{−1}[I − J(J′V_{YY}^{−1}J)^{−1}J′V_{YY}^{−1}]}Y .   (9.5.9)
For fixed σ_e^{−2}σ_a², the estimator is a linear function of Y, say KY, and the weights to be applied to the elements of Y to estimate any particular μ_t decline rapidly as
the distance between j and t increases. In equation (9.5.9), μ̂ is an n-dimensional vector, but any subset of the original vector of observations can be used to construct a smaller vector of estimates. Using our estimates of σ_e² and σ_a², the vector of optimal weights for estimating μ_{m+6} using (Y_{m+1}, Y_{m+2}, . . . , Y_{m+10}, Y_{m+11}) is

H_w = (0.007, 0.013, 0.029, 0.071, 0.171, 0.418, 0.171, 0.071, 0.029, 0.013, 0.007) .

Notice that the sum of the weights is one and that the weights are symmetric about the center weight. If K denotes the matrix multiplying Y in (9.5.9), the vector H_w is the sixth row of the eleven by eleven matrix K.
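The matrix K of (9.5.9) can be constructed directly for n = 11 with the estimated variances; its sixth row gives the center weights. A sketch (the construction of L and V follows the text; the printed value 0.418 is used only as a check):

```python
import numpy as np

n, s2_e, s2_a = 11, 1.8086, 1.5278
Lmat = np.tril(np.ones((n, n)), k=-1)          # L_ij = 1 for j < i, else 0
V = s2_e * np.eye(n) + s2_a * (Lmat @ Lmat.T)  # V_YY of the text
Vi = np.linalg.inv(V)
J = np.ones((n, 1))

P = J @ np.linalg.inv(J.T @ Vi @ J) @ J.T @ Vi
K = P + s2_a * (Lmat @ Lmat.T) @ Vi @ (np.eye(n) - P)

H = K[5]                                       # weights for the center value
print(np.round(H, 3))
```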
Figure 9.5.1 contains a plot of the estimated μ_t values. Values for 1913 through 1986 were computed using H_w. That is, the estimates are centered moving averages with weights H_w. The end values were computed using the optimal estimator based on the first eleven and last eleven observations. The vector of estimated variances of the errors μ̂_t − μ_t for the last six observations is

(0.755, 0.755, 0.757, 0.764, 0.808, 1.066) .
Figure 9.5.1. U.S. wheat yields 1908-1991 and structural trend estimates from model (9.5.7).
The variance for μ̂_{n−5} is the variance for all estimators in the center of the observation vector, that is, more than 5 years from the end points. The estimated variances are the last six diagonal elements of the matrix

LL′σ_a² − KLL′σ_a² − LL′K′σ_a² + KV_{YY}K′ .

The random walk model for μ_t provides a rather flexible model for the mean function. As a result, the μ̂_t of Figure 9.5.1 form a much more variable estimated mean function than the grafted polynomial function of Example 9.2.1. The random walk model might be questioned for these data because of the rather extended period of increasing yields. Although one may not be satisfied with the final estimates of μ_t obtained from the simple structural model, the estimates furnish useful information about the general movement of the time series.
Under the model (9.5.7), the predictor for μ_{n+s}, s > 0, is μ̂_n. The variance of the prediction error is V{μ̂_n − μ_{n+s}} = V{μ̂_n − μ_n} + sσ_a². Thus, the predictions of the mean for 1992, 1993, and 1994 are (35.46, 35.46, 35.46), respectively, and the estimated standard errors of the estimated means are (1.61, 2.03, 2.38). The predictions of the individual yields for 1992, 1993, and 1994 are also (35.46, 35.46, 35.46). The variances of the prediction errors are the variances of the estimated means increased by σ_e². The estimated standard errors of the prediction errors are (2.03, 2.38, 2.68). ▲▲
Example 9.5.1 illustrates the fact that the structural model can be used for preliminary investigation of a time series. We discussed the construction of moving averages of arbitrary length based on polynomial models in Section 9.4. By specifying a structural model for the trend, we also obtain a moving average estimator of the trend. The "length" of the moving average is based on estimates constructed with the data.
9.6. SOME EFFECTS OF MOVING AVERAGE OPERATORS
Linear moving averages have well-defined effects on the correlation and spectral properties of stationary time series. These were discussed earlier (see Theorems 2.1.1 and 4.3.1), but the effects of trend-removal filters are of sufficient importance to merit special investigation.
Proposition 9.6.1. Let X_t be a stationary time series with absolutely summable covariance function, and let Y_t = Σ_{j=−L}^{M} a_j X_{t+j}, where L and M are nonnegative integers and the weights a_j satisfy Σ_{j=−L}^{M} a_j = 0. Then the spectral density of Y_t evaluated at zero is zero; that is, f_Y(0) = 0.
Proof. By Theorem 4.3.1, the spectral density of Y_t is given by

f_Y(ω) = f_X(ω) 𝒜(ω) 𝒜̄(ω),

where

𝒜(ω) = Σ_{j=−L}^{M} a_j e^{−iωj}.

Since Σ_{j=−L}^{M} a_j = 0, it follows that 𝒜(0) = 0. A
Since f_Y(0) = (2π)^{-1} Σ_{h=−∞}^{∞} γ_Y(h), it follows that the sum of the autocovariances is zero for a stationary time series that has been filtered to remove the mean. For example, one may check that the covariances of Y_t and Y_{t+h}, h = −4, −3, …, 4, for the example of Table 9.4.1 sum to zero.
The weights in the last column of Table 9.4.1 were constructed to remove a quadratic trend. The transfer function of that filter is

𝒜(ω) = (70)^{-1}(36 − 48 cos ω + 12 cos 2ω),

and the squared gain is

|𝒜(ω)|² = (70)^{-2}(36 − 48 cos ω + 12 cos 2ω)².
The squared gain is plotted in Figure 9.6.1. The squared gain is zero at zero and rises very slowly. Since the weights remove a quadratic trend, the filter removes much of the power from the spectral density at low frequencies. Since the spectral density of white noise is constant, the squared gain is a multiple of the spectral density of a moving average of uncorrelated random variables where the coefficients are given in Table 9.4.1.
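The transfer function quoted above implies the symmetric weights (1/70)(6, −24, 36, −24, 6), since 𝒜(ω) = a₀ + 2a₁ cos ω + 2a₂ cos 2ω for a symmetric filter. A short numerical sketch (assuming those inferred weights) confirms that the gain is zero at the origin, that the squared gain matches the closed form above, and that the filter annihilates a quadratic trend exactly:

```python
import numpy as np

# Weights inferred from the printed transfer function (not listed in this excerpt):
a = np.array([6.0, -24.0, 36.0, -24.0, 6.0]) / 70.0
j = np.arange(-2, 3)

def transfer(w):
    """A(w) = sum_j a_j exp(-i w j)."""
    return np.sum(a * np.exp(-1j * w * j))

# Zero gain at the origin, as Proposition 9.6.1 requires for zero-sum weights.
print(abs(transfer(0.0)))  # ~0

# Squared gain agrees with (70)^{-2}(36 - 48 cos w + 12 cos 2w)^2.
w = 1.3
closed = ((36 - 48*np.cos(w) + 12*np.cos(2*w)) / 70.0) ** 2
print(np.isclose(abs(transfer(w))**2, closed))  # True

# The filter annihilates a quadratic trend exactly.
t = np.arange(2, 50)
x = 3.0 + 0.5*t + 0.2*t**2
y = np.convolve(x, a[::-1], mode="valid")  # sliding dot product: sum_j a_j x_{t+j}
print(np.max(np.abs(y)))  # ~0
```

The third check is the defining property of the weights: they were constructed so that constants, linear, and quadratic sequences are reduced to zero.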
The squared gain of the first difference operator is displayed in Figure 9.6.2. While it is of the same general appearance as the function of Figure 9.6.1, the squared gain of the first difference operator rises from zero more rapidly.
Figure 9.6.1. Squared gain of filter in Table 9.4.1.
Figure 9.6.2. Squared gain of first difference operator.
If a time series contains a perfect sine component, the difference of the time series will also contain a perfect sine component of the same period, but of different amplitude and phase. That is, if Y_t = sin ωt, the difference is

ΔY_t = sin ωt − sin ω(t − 1)
     = 2 sin(ω/2) cos ω(t − ½).

We note that the amplitude of the sine wave is changed. If ω is such that |sin ½ω| < ½, the amplitude will be reduced. This will be true for long period waves. On the other hand, for short period waves, π/3 < ω < π, the amplitude will be increased. Note that this agrees with Figure 9.6.2, which shows the squared gain to be greater than one for ω > π/3.
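The identity and the amplitude rule can be checked directly; the sketch below verifies the trigonometric identity numerically and evaluates the amplitude 2 sin(ω/2) on both sides of the crossover frequency π/3:

```python
import numpy as np

# Check: sin(wt) - sin(w(t-1)) = 2 sin(w/2) cos(w(t - 1/2)),
# with amplitude reduced for w < pi/3 and amplified for w > pi/3.
t = np.arange(1, 200)
for w in (np.pi/6, np.pi/3, 2*np.pi/3):
    lhs = np.sin(w*t) - np.sin(w*(t - 1))
    rhs = 2*np.sin(w/2)*np.cos(w*(t - 0.5))
    assert np.allclose(lhs, rhs)

# Amplitudes of the differenced wave at the three frequencies:
print(2*np.sin(np.pi/12), 2*np.sin(np.pi/6), 2*np.sin(np.pi/3))
# below one, exactly one, above one, respectively
```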
Filtering a stationary time series with least squares weights to remove the seasonal effects will reduce the power of the spectral density to zero at the seasonal frequencies. For a time series with p observations per period of interest, the seasonal frequencies are defined to be 2πmp^{-1}, m = 1, 2, …, L[p], where L[p] is the largest integer less than or equal to p/2. For example, with a monthly time series, the seasonal frequencies are π/6, π/3, π/2, 2π/3, 5π/6, and π. We have not included the zero frequency because most seasonal adjustment schemes are not constructed to remove the mean.
Proposition 9.6.2. Let a_j be a least squares linear filter of length R constructed to remove seasonal variation from a time series with p (R ≥ p) observations per period of interest. Let the time series have an absolutely summable covariance function. Then the spectral density of the filtered time series is zero at the seasonal frequencies.
Proof. A linear least squares filter constructed to remove the seasonal effects satisfies

Σ_{j=1}^{R} a_j sin(2πmp^{-1}j) = 0,   Σ_{j=1}^{R} a_j cos(2πmp^{-1}j) = 0,   m = 1, 2, …, L[p].

This is so because any periodic function of period p defined on the integers can be represented as a sum of p sines and cosines. We have

f_Y(ω) = f_X(ω) 𝒜(ω) 𝒜̄(ω),

and, setting ω = 2πmp^{-1},

𝒜(2πmp^{-1}) = Σ_{j=1}^{R} a_j cos(2πmp^{-1}j) − i Σ_{j=1}^{R} a_j sin(2πmp^{-1}j) = 0

for m = 1, 2, …, L[p]. A
The first difference is a filter satisfying the conditions of Proposition 9.6.1, and the difference of lag p satisfies the conditions of Proposition 9.6.2. We now consider the effects of difference operators on autoregressive moving average time series.
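For instance, for a monthly series (p = 12) the lag-12 difference Y_t = X_t − X_{t−12} has transfer function 1 − e^{−i12ω}; a short check confirms that its gain vanishes at every seasonal frequency:

```python
import numpy as np

# Gain of the lag-12 difference at the seasonal frequencies 2*pi*m/12, m = 1,...,6.
p = 12
seasonal = 2*np.pi*np.arange(1, p//2 + 1)/p
gain2 = np.abs(1 - np.exp(-1j*p*seasonal))**2
print(gain2.max())  # ~0: the filter removes all power at the seasonal frequencies
```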
Proposition 9.6.3. Let X_t be an autoregressive moving average time series of order (p, q) expressible as

X_t + α₁X_{t−1} + ⋯ + α_pX_{t−p} = e_t + β₁e_{t−1} + ⋯ + β_qe_{t−q},

where {e_t} is a sequence of uncorrelated (0, σ²) random variables. Let the roots of m^p + α₁m^{p−1} + ⋯ + α_p = 0 be less than one in absolute value, and let the roots of s^q + β₁s^{q−1} + ⋯ + β_q = 0 be s₁, s₂, …, s_q. Then the first difference Y_t = X_t − X_{t−1} is an autoregressive moving average (p, q + 1) with the autoregressive portion unchanged and the roots of the moving average portion given by s₁, s₂, …, s_q, 1.
Proof. The spectral density of X_t is given by

f_X(ω) = (2π)^{-1} σ² Π_{j=1}^{q} |e^{iω} − s_j|² [Π_{i=1}^{p} |e^{iω} − m_i|²]^{-1},

where m₁, m₂, …, m_p are the roots of the autoregressive characteristic equation. The spectral density of Y_t is given by

f_Y(ω) = |1 − e^{−iω}|² f_X(ω) = (2π)^{-1} σ² Π_{j=1}^{q+1} |e^{iω} − s_j|² [Π_{i=1}^{p} |e^{iω} − m_i|²]^{-1},

where s_{q+1} = 1. A
It follows immediately from Proposition 9.6.3 that the kth difference of the stationary autoregressive moving average time series of order (p, q) is an autoregressive moving average time series of order (p, q + k), where at least k of the roots of the auxiliary equation associated with the moving average are one. Differences of lag r have similar effects on a time series.
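The root bookkeeping can be illustrated numerically: first differencing multiplies the moving average characteristic polynomial by (s − 1), appending a unit root while leaving the other roots untouched. The MA(2) coefficients below are illustrative, not from the text:

```python
import numpy as np

# X_t = e_t + b1 e_{t-1} + b2 e_{t-2}; its MA characteristic polynomial is
# s^2 + b1 s + b2.  Differencing multiplies that polynomial by (s - 1).
b1, b2 = 0.5, 0.06
ma_x = np.array([1.0, b1, b2])
ma_y = np.convolve(ma_x, [1.0, -1.0])  # polynomial multiplication by (s - 1)

roots_x = np.roots(ma_x)
roots_y = np.roots(ma_y)
print(np.sort(roots_x))  # roots of the original MA portion
print(np.sort(roots_y))  # the same roots plus a root at 1
```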
Proposition 9.6.4. Given a time series X_t satisfying the assumptions of Proposition 9.6.3, the difference of lag r of X_t is an autoregressive moving average time series of order (p, q + r), where the autoregressive portion is unchanged and the roots of the moving average portion are s₁, s₂, …, s_q plus the r roots of the equation s^r = 1.
Proof. The spectral density of Y_t = X_t − X_{t−r} is given by

f_Y(ω) = (1 − e^{−iωr})(1 − e^{iωr}) f_X(ω),

and, for example, the factor e^{iωr} − 1 can be written as

e^{iωr} − 1 = Π_{j=q+1}^{q+r} (e^{iω} − s_j),

where the s_j, j = q+1, q+2, …, q+r, are the r roots of s^r = 1. A
These results have important practical implications. If one is attempting to identify an autoregressive moving average model for a time series that has been differenced, it is wise to consider a moving average of order at least as large as the order of differences applied to the original time series. If the time series was stationary before differencing, the characteristic polynomial of the moving average
portion of the differenced time series contains at least one unit root and the estimation theory of Section 8.3 is not applicable.
9.7. REGRESSION WITH TIME SERIES ERRORS
In this section we treat the problem of obtaining estimates of the parameters of regression equations with time series errors. We investigated the properties of the ordinary least squares estimators in Section 9.1 and found that for some special types of independent variables the ordinary least squares estimators are asymptotically fully efficient. For other independent variables, including most stationary time series, the ordinary least squares estimators are not efficient. Moreover, in most situations, the variance estimators associated with ordinary least squares are biased. To illustrate some of these ideas, consider the simple model
Y_t = βX_t + Z_t,
(Z_t, X_t) = (ρZ_{t−1}, λX_{t−1}) + (e_t, u_t),    (9.7.1)

where the e_t are normal independent (0, σ_e²) random variables, the u_t are normal independent (0, σ_u²) random variables, e_t is independent of u_j for all t, j, |ρ| < 1, and |λ| < 1. Under the model (9.7.1), the expected value of β̂, conditional on the observed X_t, t = 1, 2, …, n, is β, and hence β̂ is unbiased. The variance of the least squares estimator of β conditional on X = (X₁, X₂, …, X_n)' is given by equation (9.1.4),

V{β̂ | X} = (Σ_{t=1}^{n} X_t²)^{-2} Σ_{t=1}^{n} Σ_{j=1}^{n} X_tX_j γ_Z(t − j),

where σ_X² = γ_X(0). Now the sample autocovariance of X_t converges to the population autocovariance, and

Σ_{t=1}^{n} Σ_{j=1}^{n} X_tX_j γ_Z(t − j) ≈ nσ_X²σ_Z²(1 + ρλ)(1 − ρλ)^{-1},

where σ_Z² = γ_Z(0). Hence,

V{β̂ | X} ≈ n^{-1}σ_X^{-2}σ_Z²(1 + ρλ)(1 − ρλ)^{-1}.
If ρ is known, the variance of the generalized least squares estimator computed with known ρ, conditional on X, is

V{β̃ | X} ≈ n^{-1}σ_X^{-2}σ_Z²(1 − ρ²)(1 + ρ² − 2ρλ)^{-1}.

It follows that the large sample relative efficiency of generalized least squares to ordinary least squares is

(1 + ρλ)(1 + ρ² − 2ρλ)(1 − ρλ)^{-1}(1 − ρ²)^{-1}.

This expression is greater than one for all nonzero (ρ, λ) less than one in absolute value. For ρ = λ = (0.5)^{1/2}, the relative efficiency is 300%!
The estimated variance of the ordinary least squares estimator obtained by the usual formulas is

V̂{β̂} = (Σ_{t=1}^{n} X_t²)^{-1} s²,

where s² = (n − 1)^{-1} Σ_{t=1}^{n} (Y_t − β̂X_t)². Since V{β̂ | X} = O_p(n^{-1}), we have β̂ − β = O_p(n^{-1/2}) and s² converges to the variance of Z_t. Therefore,

V̂{β̂} ≈ n^{-1}σ_X^{-2}σ_Z²,

and

V̂{β̂}/V{β̂ | X} ≈ (1 − ρλ)(1 + ρλ)^{-1}.

The common least squares estimator of variance will be an over- or underestimate, depending on the signs of ρ and λ. If ρ = λ = (0.5)^{1/2}, the ordinary least squares estimator will be estimating a quantity approximately one-third of the true variance. Given these results on the efficiency of ordinary least squares and on the bias in the ordinary least squares estimator of variance, it is natural to consider an estimator such as (9.1.3), where the estimated autocorrelations are used in place of the unknown parameters.
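A small Monte Carlo sketch of the efficiency claim (sample size and replication count are illustrative, not from the text): simulate model (9.7.1) with ρ = λ = (0.5)^{1/2}, estimate β by ordinary least squares and by generalized least squares with the known ρ, and compare variances:

```python
import numpy as np

# Monte Carlo comparison of OLS and GLS for Y_t = beta X_t + Z_t,
# Z and X independent AR(1) processes with rho = lam = sqrt(0.5).
rng = np.random.default_rng(1)
rho = lam = np.sqrt(0.5)
beta, n, reps = 1.0, 200, 2000
ols_est = np.empty(reps); gls_est = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n); u = rng.standard_normal(n)
    z = np.empty(n); x = np.empty(n)
    z[0] = e[0]/np.sqrt(1 - rho**2); x[0] = u[0]/np.sqrt(1 - lam**2)
    for t in range(1, n):
        z[t] = rho*z[t-1] + e[t]
        x[t] = lam*x[t-1] + u[t]
    y = beta*x + z
    ols_est[r] = (x @ y) / (x @ x)
    # GLS via the Prais-Winsten transformation with the true rho:
    ys = np.r_[np.sqrt(1 - rho**2)*y[0], y[1:] - rho*y[:-1]]
    xs = np.r_[np.sqrt(1 - rho**2)*x[0], x[1:] - rho*x[:-1]]
    gls_est[r] = (xs @ ys) / (xs @ xs)

ratio = ols_est.var() / gls_est.var()
print(round(ratio, 2))  # near 3, the 300% relative efficiency quoted above
```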
In Section 9.3 we established that the autocorrelations of the error time series could be estimated from the calculated regression residuals. We apply those results to our current model.
Proposition 9.7.1. Let Y_t satisfy

Y_t = Σ_{i=1}^{k} β_iφ_{ti} + Z_t,    (9.7.2)

where Z_t is a stationary autoregressive process,

Z_t + Σ_{i=1}^{p} α_iZ_{t−i} = e_t,

the roots of the characteristic equation are less than one in absolute value, the φ_{ti} satisfy (9.1.7), (9.1.8), and (9.1.9), and {e_t} is a sequence of independent (0, σ²) random variables with E{e_t⁴} = ησ⁴. Denote the simple least squares estimator of β by β̂ = (β̂₁, β̂₂, …, β̂_k)' and the simple least squares residuals by Ẑ_t. Let α̂ be the estimator of α = (α₁, α₂, …, α_p)' obtained by regressing Ẑ_t on (Ẑ_{t−1}, Ẑ_{t−2}, …, Ẑ_{t−p}), t = p+1, p+2, …, n, and let α̃ be the estimator obtained by regressing Z_t on (Z_{t−1}, Z_{t−2}, …, Z_{t−p}). Then

α̂ − α̃ = O_p(n^{-1}).

Proof. The result is an immediate consequence of Theorem 9.3.1. A
Given an estimator of the correlational structure of the error time series, we wish to construct the estimated generalized least squares estimator. Often the most expeditious way to obtain this estimator is to transform the data. In the present case, the Gram-Schmidt orthogonalization leads one to the transformed variables

ξ_t = e_t = Z_t + Σ_{i=1}^{p} α_iZ_{t−i},   t = p+1, p+2, …, n,    (9.7.3)

where ξ₁ = γ^{-1/2}(0)σZ₁, ξ₂ = {[1 − ρ²(1)]γ(0)}^{-1/2}σ[Z₂ − ρ(1)Z₁], etc. The ξ_t are uncorrelated with constant variance σ². For the first order autoregressive time series the transformed variables are

ξ₁ = (1 − ρ²)^{1/2}Z₁,   ξ_t = Z_t − ρZ_{t−1},   t = 2, 3, …, n.
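The first order case can be verified numerically: applied to the covariance matrix of an AR(1) process, the transformation yields exactly uncorrelated, equal-variance errors. The values ρ = 0.7, σ² = 1, n = 8 below are arbitrary:

```python
import numpy as np

# Verify that T V T' = sigma^2 I for the AR(1) transformation
# xi_1 = sqrt(1 - rho^2) Z_1, xi_t = Z_t - rho Z_{t-1} (t >= 2).
rho, sigma2, n = 0.7, 1.0, 8
gamma0 = sigma2 / (1 - rho**2)
idx = np.arange(n)
V = gamma0 * rho**np.abs(idx[:, None] - idx[None, :])  # Cov(Z_1, ..., Z_n)

T = np.zeros((n, n))
T[0, 0] = np.sqrt(1 - rho**2)
for t in range(1, n):
    T[t, t-1], T[t, t] = -rho, 1.0

C = T @ V @ T.T
print(np.allclose(C, sigma2*np.eye(n)))  # True
```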
As a second example of the transformation, consider the second order autoregressive process

Z_t = 1.53Z_{t−1} − 0.66Z_{t−2} + e_t,
where the variance of e_t is σ². Then by (2.5.8) the variance of Z_t is 11.77σ². The correlations for the time series are given in Table 6.4.2. The transformation is

ξ₁ = (11.77)^{-1/2}Z₁,
ξ₂ = (1.7718)^{-1/2}(Z₂ − 0.9217Z₁),
ξ_t = e_t = Z_t − 1.53Z_{t−1} + 0.66Z_{t−2},   t = 3, 4, …, n.
To define the estimated generalized least squares estimator, we can express our original model (9.7.1) in the matrix notation of Section 9.1 as

y = Φβ + z.

We let V_{zz} = E{zz'} and ε = Tz, where T is the n × n transformation matrix defined in (9.7.3). Then the estimated generalized least squares estimator is obtained by regressing T̂y on T̂Φ,

β̂ = [Φ'T̂'T̂Φ]^{-1}Φ'T̂'T̂y = [Φ'V̂_{zz}^{-1}Φ]^{-1}Φ'V̂_{zz}^{-1}y,    (9.7.4)

where V̂_{zz}^{-1} = T̂'T̂σ̂^{-2}, σ̂² = (n − 2p)^{-1} Σ_{t=p+1}^{n} (Ẑ_t + Σ_{i=1}^{p} α̂_iẐ_{t−i})², and T̂ is obtained from T by replacing α_j with α̂_j, σ with σ̂, and so forth.
In Section 5.7 we demonstrated that the limiting distributions of the generalized least squares estimator and of the estimated generalized least squares estimator are the same under mild assumptions. By Theorem 5.7.4 and Proposition 9.7.1, the estimated generalized least squares estimator for an error process that is an autoregressive moving average will have the same limiting distribution as the generalized least squares estimator based on the true parameters, under mild assumptions. In Theorem 9.7.1, we give a direct proof of the result for autoregressive errors using the transformation T̂.
Theorem 9.7.1. Let Y_t satisfy the model (9.7.2), where the φ_{ti} satisfy the assumptions (9.1.7), (9.1.8), and (9.1.9). Then

D_n(β̂ − β̃_G) = O_p(n^{-1/2}),

where β̃_G = [Φ'V_{zz}^{-1}Φ]^{-1}Φ'V_{zz}^{-1}y, β̂ is defined in (9.7.4), d_{nii} = (Σ_{t=1}^{n} φ_{ti}²)^{1/2}, and D_n = diag(d_{n11}, d_{n22}, …, d_{nkk}).

Proof. We have, for T̂_t. the tth row of T̂,

ε̂_t = T̂_t.z = ε_t + Σ_{j=1}^{p} (α̂_j − α_j)Z_{t−j},
for t = p + 1, p + 2, …, n, with similar expressions holding for t = 1, 2, …, p. By Proposition 9.7.1 and Theorem 8.2.1, α̂ − α = O_p(n^{-1/2}), and it follows that

D_n^{-1}Φ'(T̂'T̂ − T'T)z = O_p(n^{-1/2}).

Therefore,

D_n(β̂ − β) = (D_n^{-1}Φ'T̂'T̂ΦD_n^{-1})^{-1}D_n^{-1}Φ'T̂'T̂z
            = (D_n^{-1}Φ'V_{zz}^{-1}ΦD_n^{-1})^{-1}D_n^{-1}Φ'V_{zz}^{-1}z + O_p(n^{-1/2})
            = D_n(β̃_G − β) + O_p(n^{-1/2}). A
Since the residual mean square for the regression of T̂y on T̂Φ converges in probability to σ², one can use the ordinary regression statistics for approximate tests and confidence intervals.
Because of the infinite autoregressive nature of invertible moving average time series, an exact transformation of the form (9.7.3) is cumbersome. However, the approximate difference equation transformation will be adequate for many purposes. For example, if

Z_t = e_t + be_{t−1},   |b| < 1,

the transformation

ξ₁ = (1 + b²)^{-1/2}Z₁,
ξ₂ = (1 + b²)^{1/2}(1 + b² + b⁴)^{-1/2}[Z₂ − (1 + b²)^{-1}bZ₁],    (9.7.5)
ξ_t = Z_t − bξ_{t−1},   t = 3, 4, …, n,

will generally be satisfactory when b is not too close to one in absolute value.
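A numerical sketch of the approximate transformation (9.7.5) with the illustrative value b = 0.5: the transformed errors are not exactly white, but the residual correlations are small and decay quickly down the sample:

```python
import numpy as np

# Apply the approximate MA(1) transformation to the exact MA(1) covariance
# matrix and measure how far the result is from sigma^2 I.
b, sigma2, n = 0.5, 1.0, 25
V = np.zeros((n, n))                      # Cov(Z) for Z_t = e_t + b e_{t-1}
np.fill_diagonal(V, (1 + b**2)*sigma2)
i = np.arange(n - 1)
V[i, i+1] = V[i+1, i] = b*sigma2

T = np.zeros((n, n))
T[0, 0] = (1 + b**2)**-0.5
c2 = (1 + b**2)**0.5 / (1 + b**2 + b**4)**0.5
T[1, 0], T[1, 1] = -c2*b/(1 + b**2), c2
for t in range(2, n):                     # xi_t = Z_t - b xi_{t-1}
    T[t] = -b*T[t-1]
    T[t, t] += 1.0

C = T @ V @ T.T
corr = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
off = np.abs(corr - np.eye(n)).max()
print(round(off, 4))  # largest residual correlation; small for b = 0.5
```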
Example 9.7.1. To illustrate the fitting of a regression model with time series errors, we use the data studied by Prest (1949). The data were discussed by Durbin and Watson (1951), and we use the data as they appear in that article. The data, displayed in Table 9.7.1, pertain to the consumption of spirits in the United Kingdom from 1870 to 1938. The dependent variable Y_t is the annual per capita consumption of spirits in the United Kingdom. The explanatory variables φ_{t1} and φ_{t2} are per capita income and price of spirits, respectively, both deflated by a general price index. All data are in logarithms. The model suggested by Prest can be written
Table 9.7.1. Annual Consumption of Spirits in the United Kingdom from 1870 to 1938

Year   Consumption Y_t   Income φ_{t1}   Price φ_{t2}
1870   1.9565   1.7669   1.9176
1871   1.9794   1.7766   1.9059
1872   2.0120   1.7764   1.8798
1873   2.0449   1.7942   1.8727
1874   2.0561   1.8156   1.8984
1875   2.0678   1.8083   1.9137
1876   2.0561   1.8083   1.9176
1877   2.0428   1.8067   1.9176
1878   2.0290   1.8166   1.9420
1879   1.9980   1.8041   1.9547
1880   1.9884   1.8053   1.9379
1881   1.9835   1.8242   1.9462
1882   1.9773   1.8395   1.9504
1883   1.9748   1.8464   1.9504
1884   1.9629   1.8492   1.9723
1885   1.9396   1.8668   2.0000
1886   1.9309   1.8783   2.0097
1887   1.9271   1.8914   2.0146
1888   1.9239   1.9166   2.0146
1889   1.9414   1.9363   2.0097
1890   1.9685   1.9548   2.0097
1891   1.9727   1.9453   2.0097
1892   1.9736   1.9292   2.0048
1893   1.9499   1.9209   2.0097
1894   1.9432   1.9510   2.0296
1895   1.9569   1.9776   2.0399
1896   1.9647   1.9814   2.0399
1897   1.9710   1.9819   2.0296
1898   1.9719   1.9828   2.0146
1899   1.9956   2.0076   2.0245
1900   2.0000   2.0000   2.0000
1901   1.9904   1.9939   2.0048
1902   1.9752   1.9933   2.0048
1903   1.9494   1.9797   2.0000
1904   1.9332   1.9772   1.9952
1905   1.9139   1.9924   1.9952
1906   1.9091   2.0117   1.9905
1907   1.9139   2.0204   1.9813
1908   1.8886   2.0018   1.9905
1909   1.7945   2.0038   1.9859
1910   1.7644   2.0099   2.0518
1911   1.7817   2.0174   2.0474
1912   1.7784   2.0279   2.0341
1913   1.7945   2.0359   2.0255
1914   1.7888   2.0216   2.0341
1915   1.8751   1.9896   1.9445
1916   1.7853   1.9843   1.9939
1917   1.6075   1.9764   2.2082
1918   1.5185   1.9965   2.2700
1919   1.6513   2.0652   2.2430
1920   1.6247   2.0369   2.2567
1921   1.5391   1.9723   2.2988
1922   1.4922   1.9797   2.3723
1923   1.4606   2.0136   2.4105
1924   1.4551   2.0165   2.4081
1925   1.4425   2.0213   2.4081
1926   1.4023   2.0206   2.4367
1927   1.3991   2.0563   2.4284
1928   1.3798   2.0579   2.4310
1929   1.3782   2.0649   2.4363
1930   1.3366   2.0582   2.4552
1931   1.3026   2.0517   2.4838
1932   1.2592   2.0491   2.4958
1933   1.2635   2.0766   2.5048
1934   1.2549   2.0890   2.5017
1935   1.2527   2.1059   2.4958
1936   1.2763   2.1205   2.4838
1937   1.2906   2.1205   2.4636
1938   1.2721   2.1182   2.4580

SOURCE: Durbin and Watson (1951). Reproduced with permission of the Trustees of Biometrika and of the authors.
Y_t = β₀ + β₁φ_{t1} + β₂φ_{t2} + β₃φ_{t3} + β₄φ_{t4} + Z_t,

where 1869 is the origin for t, φ_{t3} = 10^{-2}t, φ_{t4} = 10^{-4}(t − 35)², and we assume Z_t is a stationary time series.
The ordinary least squares regression equation is

Ŷ_t = 2.14 + 0.69φ_{t1} − 0.63φ_{t2} − 0.95φ_{t3} − 1.15φ_{t4}.
     (0.28)  (0.14)      (0.05)      (0.09)      (0.16)
The residual mean square is 0.00098. The numbers in parentheses are the standard errors computed by the ordinary regression formulas and would be a part of the output in any standard regression computer program. They are not proper estimators of the standard errors when the error in the equation is autocorrelated. We shall return to this point.
The Durbin-Watson d is 0.5265. Under the null hypothesis of normal independent errors, the expected value of d is

E{d} = [2(68) − 0.67](64)^{-1} = 2.1145,

where

tr{(ΔΦ)'(ΔΦ)(Φ'Φ)^{-1}} = 0.67.

Therefore, from equations (9.3.8) and (9.3.9),

ρ̂ = 0.7367 + (0.5)(0.1145)(66)(65)^{-1}(1 − 0.5427) = 0.7633

and

t_{ρ̂} = (66)^{1/2}(0.7633)(0.4174)^{-1/2} = 9.60.
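The arithmetic of these two displays can be reproduced directly, using the constants quoted above (d = 0.5265, the correction 0.1145 = E{d} − 2, and the factors 66 and 65):

```python
# Reproduce the Durbin-Watson-based estimate of rho and its t statistic.
d = 0.5265
rho0 = 1 - d/2                                      # 0.73675, printed as 0.7367
rho_hat = rho0 + 0.5*0.1145*(66/65)*(1 - rho0**2)   # 1 - rho0^2 = 0.4572
t_stat = 66**0.5 * rho_hat / (1 - rho_hat**2)**0.5
print(round(rho_hat, 4), round(t_stat, 2))  # 0.7633 9.6
```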
Obviously, the hypothesis of independent errors is rejected. To investigate the nature of the autocorrelation, we fit several autoregressive models to the regression residuals. The results are summarized in Table 9.7.2. The statistics of that table are consistent with the hypothesis that the error time series is a first order autoregressive process. Therefore, we construct the transformation of (9.7.3) for a first order autoregressive process with ρ̂ = 0.7633. The transformation matrix is
T̂ = [  0.6460    0         0        ⋯    0         0
      −0.7633    1         0        ⋯    0         0
       0        −0.7633    1        ⋯    0         0
       ⋮                                  ⋮
       0         0         0        ⋯   −0.7633    1 ],

where T̂ is a 69 × 69 matrix and 0.6460 = [1 − (0.7633)²]^{1/2}. Regressing the transformed Y_t on the transformed φ_{ti}, i = 0, 1, …, 4, we obtain the estimated generalized least squares equation.
The numbers in parentheses are consistent estimates of the standard errors of the
Table 9.7.2. Analysis of Variance for Residuals from Spirit Consumption Regression

Source                                        Degrees of Freedom   Mean Square
Ẑ_{t−1}                                        1                   0.03174
Ẑ_{t−2} after Ẑ_{t−1}                          1                   0.00013
Ẑ_{t−3} after Ẑ_{t−1}, Ẑ_{t−2}                 1                   0.00036
Ẑ_{t−4} after Ẑ_{t−1}, Ẑ_{t−2}, Ẑ_{t−3}        1                   0.00004
Error                                         56                   0.00049
estimators. They are computed from the transformed data by the usual regression formulas. The error mean square for the transformed regression is 0.000417.
Every one of the estimated standard errors for the generalized least squares estimator based on ρ̂ = 0.7633 is greater than the standard error calculated by the usual formulas for the simple least squares estimates. Of course, we know that the standard formulas give biased estimators for simple least squares when the errors are autocorrelated. To demonstrate the magnitude of this bias, we estimate the variance of the simple least squares estimators, assuming that the error time series is

Z_t = 0.7633Z_{t−1} + e_t,

where e_t is a sequence of uncorrelated (0, 0.000417) random variables. The covariance matrix of the simple least squares estimator is estimated by (Φ'Φ)^{-1}Φ'V̂_{zz}Φ(Φ'Φ)^{-1}, where Φ is the 69 × 5 matrix of observations on the independent variables and V̂_{zz} is the estimated covariance matrix for 69 observations from a first order autoregressive time series with ρ̂ = 0.7633.
Alternative estimates of the standard errors of the coefficients are compared in Table 9.7.3. The consistent estimates of the standard errors of simple least squares obtained from the matrix (Φ'Φ)^{-1}Φ'V̂_{zz}Φ(Φ'Φ)^{-1} are approximately twice those obtained from the simple formulas. This means that the variance estimates
Table 9.7.3. Comparison of Alternative Estimators of Standard Errors of Estimators for Spirits Example

Coefficient   Estimator for Simple Least   Consistent Estimator for   Consistent Estimator for
              Squares Assuming ρ = 0       Simple Least Squares       Generalized Least Squares
β₀            0.282                        0.545                      0.304
β₁            0.138                        0.262                      0.146
β₂            0.054                        0.112                      0.072
β₃            0.088                        0.179                      0.108
β₄            0.161                        0.313                      0.266
computed under the assumption of zero correlation are approximately one-fourth of the consistent estimates. The estimated standard errors for the generalized least squares estimators range from about one-half (0.56 for income) to nine-tenths (0.85 for time squared) of those for simple least squares. This means that the estimated efficiency of simple least squares relative to generalized least squares ranges from about three-tenths to about seven-tenths.
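The bias calculation can be sketched numerically. The design matrix below is a stand-in (intercept, trend, and centered quadratic; it is not the actual spirits regressor matrix), with ρ̂ = 0.7633 and σ̂² = 0.000417 taken from the example:

```python
import numpy as np

# Compare the usual OLS variance formula with the correct
# (X'X)^{-1} X'VX (X'X)^{-1} when the error is AR(1) with rho = 0.7633.
rho, sigma2, n = 0.7633, 0.000417, 69
idx = np.arange(n)
V = sigma2/(1 - rho**2) * rho**np.abs(idx[:, None] - idx[None, :])

# Hypothetical smooth regressors (intercept, trend, centered quadratic):
X = np.column_stack([np.ones(n), 1e-2*(idx + 1), 1e-4*(idx + 1 - 35.0)**2])
XtX_inv = np.linalg.inv(X.T @ X)
correct = XtX_inv @ X.T @ V @ X @ XtX_inv
naive = XtX_inv * (sigma2/(1 - rho**2))   # usual formula, s^2 ~ Var(Z_t)

ratios = np.sqrt(np.diag(correct) / np.diag(naive))
print(np.round(ratios, 2))
# all ratios well above one: the usual formulas understate the standard errors
```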
The computations for regression models with autoregressive errors are available in several computer programs. The procedure AUTOREG of SAS Institute, Inc. (1989) is one convenient program. If we specify a first order autoregressive error and the Yule-Walker procedure of estimating the autoregressive process, we obtain an estimate ρ̂ = 0.7160 with a standard error of 0.0879. The estimate of ρ differs from ours because of the different estimator used. The estimated regression equation is nearly identical to the one given above. AA
In Example 9.7.1 we gave details of a two-step computational procedure for estimation of a regression equation with time series errors. Computer programs offer one the option of specifying and estimating the complete model, either by the two-step procedure or as a single maximization problem.
Assume the time series error in a linear regression model to be an autoregressive process. Thus, we write

Y_t = φ_t.β + Z_t,   t = 1, 2, …, n,    (9.7.6a)

Z_t + Σ_{i=1}^{p} α_iZ_{t−i} = e_t,    (9.7.6b)
where {e_t} is a sequence of NI(0, σ²) random variables, and φ_t., t = 1, 2, …, n, are k-dimensional row vectors of explanatory variables. If the process Z_t is a stationary normal process, the log likelihood is

L(Y; θ) = −0.5[n log 2π + log|V_{nn}| + (Y − Φβ)'V_{nn}^{-1}(Y − Φβ)],    (9.7.7)

where V_{nn} = V_{nn}(α, σ²) is the n × n covariance matrix of (Z₁, Z₂, …, Z_n)', Y is the n-dimensional column vector (Y₁, Y₂, …, Y_n)', Φ is the n × k matrix Φ = (φ'₁., φ'₂., …, φ'_n.)', and θ' = (β₁, β₂, …, β_k, α₁, α₂, …, α_p, σ²).

The theorems of Section 5.5 can be used to obtain the limiting distribution of the maximum likelihood estimator. The limiting distribution of the maximum likelihood estimator of β is the same as the limiting distribution of the generalized least squares estimator, if the latter exists.
Theorem 9.7.2. Let Y_t satisfy equation (9.7.6a), where Z_t is a stationary process with the roots of the characteristic equation associated with (9.7.6b) less than one in absolute value. Assume the e_t are iid(0, σ²) random variables or that the e_t satisfy the martingale difference conditions of Corollary 5.3.4. Let {γ_{1n}} and {m_{nn}} be sequences such that
and
Let θ̂ be the estimator that maximizes the log Gaussian likelihood (9.7.7), and let θ⁰ be the true parameter vector. Then

M_n^{1/2}(β̂ − β⁰) →_L N(0, I),

n^{1/2}(α̂ − α⁰) →_L N(0, σ²A^{-1}),

σ̂² →_p σ², and M_n^{1/2}(β̂ − β⁰) is independent of n^{1/2}(α̂ − α⁰) in the limit, where A = E{(Z_{t−1}, Z_{t−2}, …, Z_{t−p})'(Z_{t−1}, Z_{t−2}, …, Z_{t−p})}.
Proof. Omitted. A
Example 9.7.2. We use the wheat yield data of Example 9.2.1 for the period 1908 to 1991 to illustrate the nonlinear fitting. The μ̂_t from Example 9.5.1 and a plot of the data suggest that wheat yields were nearly constant for the first 25 years and for the last several, say five, years. Therefore, we consider a mean function that is constant for the first 25 years and for the last five years. A function that is continuous with continuous first derivative that meets these requirements is

φ_{t2} = { 0,                                      1 ≤ t ≤ 25,
           (t − 25)²,                              25 ≤ t ≤ 54,
           841 + 58(t − 54),                       54 ≤ t ≤ 70,
           841 + 58(t − 54) − 2.9(t − 70)²,        70 ≤ t ≤ 80,
           2059,                                   t ≥ 80,

where t = 1 for 1908. The first part of the function was used in Example 9.2.1. We assume the errors follow a first order autoregressive process. Then our model is

Y_t = β₀ + β₁φ_{t2} + Z_t    (9.7.8)

for t = 1, 2, …, n, where Z_t is the stationary process satisfying

Z_t + α₁Z_{t−1} = e_t,

and we assume the e_t are iid(0, σ_e²).
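The continuity claims for the grafted function can be checked numerically with one-sided difference quotients at the join points; the printed constants (841, 58, 2.9, 2059) make both the function and its first derivative continuous:

```python
# Grafted trend function of Example 9.7.2 and a smoothness check at the joins.
def phi2(t):
    if t <= 25: return 0.0
    if t <= 54: return (t - 25.0)**2
    if t <= 70: return 841.0 + 58.0*(t - 54.0)
    if t <= 80: return 841.0 + 58.0*(t - 54.0) - 2.9*(t - 70.0)**2
    return 2059.0

h = 1e-4
for t0 in (25, 54, 70, 80):
    left = (phi2(t0) - phi2(t0 - h)) / h
    right = (phi2(t0 + h) - phi2(t0)) / h
    print(t0, phi2(t0), round(left, 2), round(right, 2))
# left and right slopes agree at every join; phi2 levels off at 2059 for t >= 80
```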
If we use procedure AUTOREG of SAS/ETS® to compute the Gaussian maximum likelihood estimates, we obtain
β̂₀ = 14.257,   β̂₁ = 0.01075,   α̂₁ = −0.292,   σ̂_e² = 3.427.
(0.380)        (0.00037)       (0.107)
The model predictions for the next three observations are

(Ŷ₁₉₉₂, Ŷ₁₉₉₃, Ŷ₁₉₉₄) = (35.78, 36.21, 36.34).
                        (1.95)  (2.05)  (2.07)
The prediction for 1992 is similar to that obtained in Example 9.5.1. The model of Example 9.5.1 postulates a random walk for the mean function and uncorrelated deviations from the mean function. Hence, the predictions for all future periods are the best estimate of the yield for 1991, and the estimate for 1991 is heavily influenced by the actual observation for 1991. The predictions under the model (9.7.8) are based on the estimated trend associated with the function φ_{t2}. Because the function is constant for t ≥ 80, the predictions ultimately fall on a horizontal line. Because the yield for 1991 is about 2.1 bushels below the trend line, the prediction for 1992 is about 0.29 × 2.1 = 0.61 bushels below the trend line, the prediction for 1993 is about (0.29)² × 2.1 = 0.18 bushels below the trend line, and the prediction for 1994 is about (0.29)³ × 2.1 = 0.05 bushels below the trend line. Because the 1991 yield is below the trend line and the estimated autocorrelation is positive, the predictions for the next three years increase toward the trend line.
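The quoted decay arithmetic, reproduced directly (0.29 is the estimated autoregressive coefficient and 2.1 bushels the 1991 deviation below trend):

```python
# Geometric decay of the predicted deviation from trend under an AR(1) error.
dev_1991 = -2.1
preds = {1991 + s: round(0.29**s * dev_1991, 2) for s in (1, 2, 3)}
print(preds)  # {1992: -0.61, 1993: -0.18, 1994: -0.05}
```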
The standard errors of the predictions constructed in this example are smaller than those of the predictions of Example 9.5.1. In this example, the mean is assumed to be of a known functional form. In Example 9.5.1, the mean is assumed to be a random walk. If the mean function is known, then it is easier to predict than the random walk. However, in using the estimated trend function for prediction, one projects the trend function into the future. Often practitioners hesitate to project polynomial trends very far into the future. In our current example, the most recent past of the mean function is assumed to be constant, and we might be less worried about our trend assumption than we would have been in constructing predictions in 1971 using the function of Example 9.2.1. To project linearly in 1971, one would need to believe that the improvements in varieties and pesticides and the use of fertilizer that led to increased yields during the observation period would continue at the same rate during the prediction period. In the case of wheat yields the linear trend predictions would have performed well for about fifteen years, from 1971 until about 1986. AA
Example 9.7.3. As a second illustration of nonlinear estimation, we use the data on consumption of spirits of Table 9.7.1. If we use nonlinear least squares and fit the model with a first order autoregressive error to the last n − 1 observations, we obtain

Ŷ_t = 2.421 + 0.716φ_{t1} − 0.818φ_{t2} − 0.846φ_{t3} − 0.560φ_{t4},
     (0.304)  (0.149)      (0.074)      (0.119)      (0.380)

Ẑ_t = 0.788Ẑ_{t−1},
     (0.078)
and σ̂² = 0.00041, where φ_{t3} = 10^{-2}t and φ_{t4} = 10^{-4}(t − 35)². If we fit the full model by maximum likelihood, we obtain

Ŷ_t = 2.391 + 0.727φ_{t1} − 0.822φ_{t2} − 0.773φ_{t3} − 0.932φ_{t4},
     (0.308)  (0.150)      (0.075)      (0.113)      (0.300)

Ẑ_t = 0.812Ẑ_{t−1},
     (0.076)

and σ̂² = 0.00042. As expected, the results are very similar to those obtained in Example 9.7.1 because the three procedures have the same asymptotic efficiency.
If we fit the model with a second order autoregressive error by maximum likelihood, we obtain

Ŷ_t = 2.473 + 0.704φ_{t1} − 0.832φ_{t2} − 0.845φ_{t3} − 0.470φ_{t4},
     (0.333)  (0.156)      (0.077)      (0.127)      (0.420)

Ẑ_t = 0.742Ẑ_{t−1} + 0.054Ẑ_{t−2},
     (0.140)        (0.139)

and σ̂² = 0.00042. The second order coefficient in the autoregressive model is small relative to its standard error, agreeing with the results of Table 9.7.2. AA
Regression models in which the current value of a time series Y_t is a function of current and lagged values of a time series X_t are sometimes called transfer function models. The models are often written in terms of the backshift operator. Thus, the simple model

Y_t = β₁X_t + β₂X_{t−1} + e_t

can be written as

Y_t = (β₁ + β₂𝓑)X_t + e_t.

More complicated models may contain ratios of polynomials in the backshift operator. For example,

Y_t = [A(𝓑)]^{-1}B(𝓑)X_t + e_t = Σ_{j=0}^{∞} c_jX_{t−j} + e_t,

where A(𝓑) = 1 − a₁𝓑 − ⋯ − a_r𝓑^r, B(𝓑) = b₀ − b₁𝓑 − ⋯ − b_m𝓑^m, the c_j are the coefficients defined by [A(𝓑)]^{-1}B(𝓑), and it is assumed that the roots of A(𝓑) = 0 are greater than one in absolute value so that the coefficients converge to zero. The set of coefficients (c₀, c₁, …) is called the impulse response function. Such models fall into the class of nonlinear regression models covered by the theory of Section 5.5, provided X_t is reasonably well behaved.
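The impulse response coefficients can be computed by the recursion c_j = b_j + Σ_i a_i c_{j−i} implied by A(𝓑)C(𝓑) = B(𝓑) when A(𝓑) = 1 − Σ_i a_i𝓑^i and the numerator is written as Σ_j b_j𝓑^j (sign conventions absorbed into the b_j). The polynomial coefficients below are hypothetical:

```python
import numpy as np

# Impulse response of [A(B)]^{-1} B(B) with A(B) = 1 - 0.6 B, B(B) = 1 + 0.4 B
# (hypothetical coefficients, chosen so the root of A is outside the unit circle).
a = [0.6]            # a_1, ..., a_r
b = [1.0, 0.4]       # numerator coefficients b_0, ..., b_m
m = 12
c = np.zeros(m)
for j in range(m):
    c[j] = (b[j] if j < len(b) else 0.0) \
        + sum(a[i-1]*c[j-i] for i in range(1, min(j, len(a)) + 1))
print(np.round(c[:6], 4))  # coefficients decay geometrically since |0.6| < 1
```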
9.8. REGRESSION EQUATIONS WITH LAGGED DEPENDENT VARIABLES AND TIME SERIES ERRORS
In the classical regression problem the error in the equation is assumed to be distributed independently of the independent variables in the equation. Independent variables with this property we call ordinary. In our investigation of the estimation of the autoregressive parameters in Chapter 8, we noted that the autoregressive representation had the appearance of a regression problem, but that the vector of errors was not independent of the vector of lagged values of the time series. A generalization of the autoregressive model is one containing ordinary independent variables and also lagged values of the dependent variable as explanatory variables.
Consider the model

Y_t = x_tθ + e_t    (9.8.1)

for t = 1, 2, …, where x_t = (φ_{t1}, φ_{t2}, …, φ_{tk}, Y_{t−1}, Y_{t−2}, …, Y_{t−p}), θ = (β₁, β₂, …, β_k, −α₁, −α₂, …, −α_p)', the φ_{ti} are fixed constants, and the e_t are uncorrelated (0, σ²) random variables. The characteristic equation associated with equation (9.8.1) is

m^p + α₁m^{p−1} + ⋯ + α_p = 0.    (9.8.2)
Such models are common in economics. See Judge et al. (1985, Chapter 10). Ordinary least squares is a natural way to estimate the parameters of (9.8.1). Let θ̂ be defined by the equation

θ̂ = [Σ_{t=1}^{n} x_t'x_t]^{-1} Σ_{t=1}^{n} x_t'Y_t.    (9.8.3)

The limiting distribution of the properly standardized θ̂ is normal under mild assumptions.
Theorem 9.8.1. Let Y_t satisfy equation (9.8.1), where Y₀, Y₋₁, …, Y₋ₚ₊₁ are fixed. Assume

E{(e_t, e_t²) | 𝒜_{t−1}} = (0, σ²) a.s.

and

E{|e_t|^{2+ν} | 𝒜_{t−1}} < L < ∞ a.s.

for all t and some ν > 0, where 𝒜_{t−1} is the sigma-field generated by
and
(9.8.5)
Let θ̂ be defined by (9.8.3). Then

M_n^{1/2}(θ̂ − θ⁰) →_L N(0, I).

Proof. Let η' be an arbitrary (k + p)-dimensional row vector, and define a_{tn} = η'M_n^{-1/2}x_t' and V²_{1n} = Σ_{t=1}^{n} a²_{tn}. By assumption,

plim_{n→∞} s_n^{-2}V²_{1n} = 1,

and condition (ii) of Theorem 5.3.4 is satisfied. Let R_t = {e: |e| > |η'M_n^{-1/2}x_t'|^{-1}ε} for t = 1, 2, …, and let F_e(e) be the distribution function of e_t. Because s_n^{-2}V²_{1n} →_p 1 and because the supremum of the integrals of e² over R_t goes to zero as n increases, condition (iii) of Theorem 5.3.4 is satisfied. Because η is arbitrary, the joint normality is established. A
In Theorem 9.8.1, conditions on the nature of the difference equation associated with (9.8.1) are imposed through the assumptions (9.8.4) and (9.8.5). If {φ_t} is a fixed sequence, where φ_t = (φ_{t1}, φ_{t2}, …, φ_{tk}), then the existence of a sequence of fixed matrices {M_{1n}} satisfying the analogous limit condition, together with the assumption that the roots of (9.8.2) are less than one in absolute value, is sufficient for (9.8.4) and (9.8.5). See Fuller, Hasza, and Goebel (1981).
The conclusion of Theorem 9.8.1 can be obtained with martingale difference errors or with iid(0, σ²) errors. The martingale difference property is critical to the proof. If the e_t are autocorrelated, E{Y_{t−1}e_t} may not be zero, in which case the least squares estimator is no longer consistent.

To obtain consistent estimators for the parameters in the presence of autocorrelated errors, we use the method of instrumental variables introduced in Section 5.6. The lagged values of φ_{ti} are the natural instrumental variables. To simplify the presentation, we consider a model with two independent variables and two lagged values of Y_t,
Y_t = β₁φ_{t1} + β₂φ_{t2} + λ₁Y_{t-1} + λ₂Y_{t-2} + Z_t,  t = 1, 2, ...,   (9.8.6)
where Z_t is a stationary time series with properties described in Theorem 9.8.2.

There are several methods of instrumental variable estimation. In econometrics, the two common instrumental variable procedures are called two-stage least squares and limited information maximum likelihood. The two procedures have the same asymptotic efficiency. For the model (9.8.6), the first step of the two-stage procedure is the regression of Y_{t-1} and Y_{t-2} on φ_{t1}, φ_{t2}, φ_{t-1,1}, φ_{t-1,2}, φ_{t-2,1}, and φ_{t-2,2}. The predicted values Ŷ_{t-1} and Ŷ_{t-2} are obtained from these regressions, and the instrumental variable estimates are obtained by regressing Y_t on (φ_{t1}, φ_{t2}, Ŷ_{t-1}, Ŷ_{t-2}).

To apply an instrumental variable procedure to the model (9.8.6), we assume the matrix of sums of squares and products of (φ_{t1}, φ_{t2}, φ_{t-1,1}, φ_{t-1,2}, φ_{t-2,1}, φ_{t-2,2}) to be nonsingular and to satisfy assumption 1 preceding Theorem 5.6.1. We require only two instrumental variables in order to estimate the parameters λ₁ and λ₂, and some of the variables may be omitted if singularities appear in practice. For example, if φ_t = (φ_{t0}, φ_{t1}, φ_{t2}), where φ_{t0} ≡ 1 and φ_{t1} = t, then the matrix with rows (1, t, φ_{t2}, t - 1, φ_{t-1,2}, t - 2, φ_{t-2,2}) will be singular. One can delete variables from the regression to create a nonsingular regression problem. In the example case one can omit φ_{t-1,1} = t - 1 and φ_{t-2,1} = t - 2.

In Theorem 9.8.2, we show that the instrumental variable estimator is consistent for the model (9.8.6). The conclusions generalize immediately to any number of φ's and any number of lagged Y's.

Theorem 9.8.2. Let {Y_t: t ∈ (1, 2, ...)} satisfy the model (9.8.6), where

Z_t = Σ_{j=0}^∞ b_j e_{t-j},

the b_j are absolutely summable, the roots of m² - λ₁m - λ₂ = 0 are less than one in absolute value, (Y_{-1}, Y_0) are bounded in probability, and the e_t are independent (0, σ²) random variables with E{|e_t|^{2+δ}} < L^{2+δ} for some finite L and some δ > 0.
Let φ_{t1} and φ_{t2} satisfy the assumptions (9.1.8) and (9.1.9). Let
Q = lim_{n→∞} Q_n
be nonsingular, where
Let θ̂ = (β̂₁, β̂₂, λ̂₁, λ̂₂) be the instrumental variable estimator, and let
D_n(θ̂ - θ) = O_p(1).
Proof. By the difference equation properties of our model, we may write
where U_t = Σ_{j=0}^{t-1} w_j Z_{t-j}, the w_j are defined in terms of λ₁ and λ₂ by Theorem 2.6.1, and c_{1t} and c_{2t} go to zero as t goes to infinity. Equation (9.8.7) gives Y_{t-1} and Y_{t-2} as functions of lagged values of φ_{t1} and φ_{t2} and of a random term U_t. By our assumptions on the φ_t, and since β₁ and β₂ are not both zero, the current and lagged values of φ_t satisfy the conditions placed on Q_n and Q in Theorem 5.6.1. Likewise, Z_t satisfies the conditions on Z_t of Theorem 5.6.1.
The B_n of assumption 4 preceding Theorem 5.6.1 is
B_n = D_n^{-1}(Φ′_n Q_n^{-1} Φ_n)′ Γ (Φ′_n Q_n^{-1} Φ_n) D_n^{-1},
where the ijth element of Γ is γ_Z(|i - j|), and the matrix B = lim_{n→∞} B_n is of the same form as the matrix B of Theorem 9.1.1. The matrices G_n of assumption 4 preceding Theorem 5.6.1 are of a similar form, with the covariance matrix of U_{t-1}, of U_{t-2}, or of U_{t-1} with U_{t-2} replacing that of Z_t.
Therefore, the conditions of Theorem 5.6.1 are satisfied, and the order in probability of the error in the estimators is given by that theorem. ∎
By Theorem 9.8.2 the instrumental variable estimator is consistent when the error in the equation is a stationary moving average time series with absolutely summable coefficients. Although the instrumental variable estimators are not the most efficient estimators, they are not dependent on a specification for the error process. Hence, they can be used as preliminary estimators. In some applications, one will use the instrumental variable estimators to obtain information about the error process. Given the instrumental variable estimates β̂₁, β̂₂, λ̂₁, λ̂₂ of the model (9.8.6), we can calculate the estimated residuals
The presence of the lagged Y-values in the equation means that the results of Theorem 9.3.1 do not hold for autocorrelations computed from these residuals. However, we are able to obtain a somewhat weaker result.
Theorem 9.8.3. Given the assumptions of Theorem 9.8.2, the sample covariances calculated from the instrumental variable residuals satisfy
γ̂_Z(h) - γ̃_Z(h) = O_p(r_n),

where γ̃_Z(h) is defined in (9.8.8), r_n = max{n^{-1/2}, S_n^{-1}}, and S_n is defined in Theorem 9.8.2.
Proof. Using the definition of Ẑ_t, we have, for h = 0, 1, ..., n - 1,
Therefore,
where D_n is defined in Theorem 9.8.2. ∎
Because the estimated autocorrelations are consistent estimators, the estimated autocorrelations of the residuals can be used to develop a model for the error process. Given an original model such as (9.8.6) and a specified form for the error process, it is natural to estimate the entire model by a nonlinear fitting procedure.
To introduce such models, let Z_t be the first order autoregressive process
Z_t = ρZ_{t-1} + e_t,   (9.8.9)
where |ρ| < 1 and the e_t are independent (0, σ²) random variables. We may then write (9.8.6) as a nonlinear model with lagged values of Y_t among the explanatory variables, and whose error term e_t is a sequence of independent random variables. That is, substituting (9.8.6) into (9.8.9), we obtain
Y_t = β₁φ_{t1} + β₂φ_{t2} + (λ₁ + ρ)Y_{t-1} + (λ₂ - ρλ₁)Y_{t-2} - ρβ₁φ_{t-1,1} - ρβ₂φ_{t-1,2} - ρλ₂Y_{t-3} + e_t
    = f(W_t; θ) + e_t (say),  t = 4, 5, ..., n,   (9.8.10)
where W_t = (φ_{t1}, φ_{t2}, φ_{t-1,1}, φ_{t-1,2}, Y_{t-1}, Y_{t-2}, Y_{t-3}) and θ′ = (θ₁, θ₂, θ₃, θ₄, θ₅) = (β₁, β₂, λ₁, λ₂, ρ). Note that we have expanded the vector of parameters to include ρ.
By Theorem 5.5.3 and its corollaries, the least squares estimator of θ will be consistent and, when properly standardized, asymptotically normally distributed under mild assumptions. Note that the model (9.8.10) is of the same form as (9.8.1) with coefficients that satisfy nonlinear restrictions.
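As a concrete illustration of this nonlinear fitting idea, the following sketch (not from the text; the data, names, and parameter values are all illustrative assumptions) simulates a model of the form (9.8.6) with a first order autoregressive error and estimates (β₁, β₂, λ₁, λ₂, ρ) by least squares applied to the transformed equation, in the spirit of (9.8.10):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n = 3000
x1 = rng.normal(size=n)   # plays the role of phi_t1
x2 = rng.normal(size=n)   # plays the role of phi_t2
b1, b2, l1, l2, rho = 1.0, 0.5, 0.5, -0.3, 0.4

# Generate Y_t = b1*x1_t + b2*x2_t + l1*Y_{t-1} + l2*Y_{t-2} + Z_t,
# with Z_t = rho*Z_{t-1} + e_t a first order autoregressive error.
y = np.zeros(n)
z = 0.0
for t in range(2, n):
    z = rho * z + rng.normal()
    y[t] = b1 * x1[t] + b2 * x2[t] + l1 * y[t - 1] + l2 * y[t - 2] + z

def resid(theta):
    b1_, b2_, l1_, l2_, rho_ = theta
    # Z_t implied by the structural part, for t = 3, ..., n
    zt = y[2:] - b1_ * x1[2:] - b2_ * x2[2:] - l1_ * y[1:-1] - l2_ * y[:-2]
    # e_t = Z_t - rho*Z_{t-1}: independent errors when the model is right
    return zt[1:] - rho_ * zt[:-1]

# Warm start from the (inconsistent) ordinary least squares fit, rho = 0.
X = np.column_stack([x1[2:], x2[2:], y[1:-1], y[:-2]])
start = np.append(np.linalg.lstsq(X, y[2:], rcond=None)[0], 0.0)
fit = least_squares(resid, x0=start)
print(np.round(fit.x, 2))
```

With a moderately long simulated series the estimates land near the generating values; in practice one would also compute standard errors from the final Jacobian.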
Example 9.8.1. To illustrate the estimation of a model containing lagged values of the dependent variable, we use a model studied by Ball and St. Cyr (1966). The data, displayed in Table 9.8.1, are for production and employment in the bricks, pottery, glass, and cement industry of the United Kingdom. The
Table 9.8.1. Production Index and Employment of the Bricks, Pottery, Glass, and Cement Industry of the United Kingdom

Year-Quarter   Logarithm of Production Index   Logarithm of Employment (000)
1955-1 4.615 5.844 2 4.682 5.846 3 4.644 5.852 4 4.727 5.864
1956-1 4.625 5.855 2 4.700 5.844 3 4.635 5.841 4 4.663 5.838
1957-1 4.625 5.823 2 4.644 5.817 3 4.575 5.823 4 4.635 5.817
1958-1 4.595 5.802 2 4.625 5.790 3 4.564 5.790 4 4.654 5.784
1959-1 4.595 5.778 2 4.700 5.781 3 4.654 5.802 4 4.727 5.814
1960-1 4.719 5.820 2 4.787 5.826 3 4.727 5.838 4 4.787 5.841
1961-1 4.771 5.838 2 4.812 5.844 3 4.796 5.846 4 4.828 5.855
1962-1 4.762 5.852 2 4.868 5.864 3 4.804 5.864 4 4.844 5.861
1963-1 4.754 5.844 2 4.875 5.835 3 4.860 5.841 4 4.963 5.852
1964-1 4.956 5.849 2 5.004 5.858
SOURCE: Ball and St. Cyr (1966). Reproduced from Review of Economic Studies Vol. 33, No. 3 with the permission of the authors and editor.
quarterly data cover the period 1955-1 to 1964-2. The model studied by Ball and St. Cyr can be written as
Y_t = β₀ + β₁φ_t + β₂t + β₃D_{t1} + β₄D_{t2} + β₅D_{t3} + λY_{t-1} + Z_t,   (9.8.11)
where

Y_t = the logarithm of employment in thousands of the bricks, pottery, glass, and cement industry of the United Kingdom,
φ_t = the logarithm of the production index of the industry,
t = time with 1954-4 as the origin,

and

D_{tj} = 1 if observation t occurs in quarter j, -1 if observation t occurs in quarter 4, 0 otherwise (j = 1, 2, 3).

We assume Z_t is a zero mean stationary time series, β₁ ≠ 0, and |λ| < 1.

To construct the instrumental variable estimator, we use the limited information maximum likelihood procedure SYSLIN of SAS/ETS®. In the terminology of that program Y_t and Y_{t-1} are endogenous variables and φ_t, φ_{t-1}, t, D_{t1}, D_{t2}, and D_{t3} are instruments. The instrumental variable estimated equation is

Ŷ_t = 0.27 + 0.077φ_t - 0.00042t - 0.00492D_{t1} - 0.00336D_{t2} + 0.00728D_{t3} + 0.894Y_{t-1}.
The program calculates estimated standard errors under the assumption that the errors are uncorrelated. The estimated standard errors are biased if the errors are autocorrelated. Because we are considering the possibility that the errors are autocorrelated, we do not present the estimated standard errors.
An analysis of the estimated residuals Ẑ_t, such as that in Table 9.7.2, indicated that the data are consistent with the hypothesis that Z_t is a first order autoregressive process with a parameter of about 0.42. To estimate the model with autoregressive errors, we use the maximum
likelihood option of procedure ARIMA of SAS/ETS®. We specify the model (9.8.11) with a first order autoregressive error. While the error in the first observation is correlated with the lagged dependent variable for that observation, we chose to include that observation in the fit. Thus, the program is used to maximize a pseudolikelihood that treats Y_{t-1} as an ordinary explanatory variable. The estimated model is
Ŷ_t = 0.88 + 0.097φ_t - 0.00055t - 0.00368D_{t1} - 0.00432D_{t2} + 0.00734D_{t3} + 0.773Y_{t-1}
     (0.41) (0.032)   (0.00030)  (0.00186)      (0.00181)      (0.00162)      (0.081)

and ρ̂ = 0.386 (0.194), where the numbers in parentheses are the estimated
standard errors. Although this is a fairly small sample with a fairly large number of explanatory variables, the maximum likelihood estimates are similar to the instrumental variable estimates. Because of the small sample and the presence of time among the explanatory variables, it is a reasonable conjecture that the estimator λ̂ possesses a negative bias.
There are two important reasons for recognizing the potential for autocorrelated errors in an analysis such as this one. First, the simple least squares estimators are no longer consistent when the errors are autocorrelated. Second, the estimators of the variances of the estimators may be seriously biased. A hypothesis of some interest in the study of employment was the hypothesis that β₁ + λ = 1. This corresponds to the hypothesis of constant returns to scale for labor input. In the simple least squares analysis the sum of these coefficients is 0.89 with an estimated standard error of 0.045. The resulting "t-statistic" of -2.5 suggests that there are increasing returns to scale to labor in this industry. On the other hand, the analysis recognizing the possibility of autocorrelated errors estimates the sum as 0.88, but with a standard error of 0.072. The associated "t-statistic" of -1.7 and the potential for small sample bias in the estimators could lead one to accept the hypothesis that the sum is one. ∎
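The two-stage least squares computation described earlier in this section can be sketched in a few lines. This is an illustrative simulation, not the book's SAS computation; the data and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
b1, b2, l1, l2, rho = 1.0, 0.5, 0.5, -0.3, 0.4

y = np.zeros(n)
z = 0.0
for t in range(2, n):
    z = rho * z + rng.normal()              # autocorrelated error Z_t
    y[t] = b1 * x1[t] + b2 * x2[t] + l1 * y[t - 1] + l2 * y[t - 2] + z

t = np.arange(4, n)                          # drop start-up observations
# Stage 1: regress Y_{t-1} and Y_{t-2} on current and lagged x's.
inst = np.column_stack([x1[t], x2[t], x1[t - 1], x2[t - 1],
                        x1[t - 2], x2[t - 2]])
yhat1 = inst @ np.linalg.lstsq(inst, y[t - 1], rcond=None)[0]
yhat2 = inst @ np.linalg.lstsq(inst, y[t - 2], rcond=None)[0]
# Stage 2: regress Y_t on (x1_t, x2_t, Yhat_{t-1}, Yhat_{t-2}).
X_iv = np.column_stack([x1[t], x2[t], yhat1, yhat2])
theta_iv = np.linalg.lstsq(X_iv, y[t], rcond=None)[0]
# Ordinary least squares with the actual lagged Y's is inconsistent here.
X_ols = np.column_stack([x1[t], x2[t], y[t - 1], y[t - 2]])
theta_ols = np.linalg.lstsq(X_ols, y[t], rcond=None)[0]
print(np.round(theta_iv, 2))
print(np.round(theta_ols, 2))
```

The instrumental variable estimates stay near the generating values even though the error is autocorrelated, which is the point of the procedure.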
REFERENCES
Section 9.1. Anderson (1971), Grenander (1954), Grenander and Rosenblatt (1957), Hannan (1970).
Section 9.2. Ahlberg, Nilson, and Walsh (1967), Fuller (1969), Gallant and Fuller (1973), Greville (1969), Poirier (1973).
Section 9.3. Bollerslev, Chou, and Kroner (1992), Durbin and Watson (1950, 1951), Engle (1982), Granger and Teräsvirta (1993), Hannan (1969), Harvey, Ruiz, and Shephard (1994), Melino and Turnbull (1990), von Neumann (1941, 1942).
Section 9.4. Davis (1941), Grether and Nerlove (1970), Jorgenson (1964), Kendall and Stuart (1966), Lovell (1963), Malinvaud (1970a), Nerlove (1964), Stephenson and Farr (1972), Tintner (1940, 1952).
Section 9.5. Harvey (1989).
Section 9.6. Davis (1941), Durbin (1962, 1963), Hannan (1958), Tintner (1940, 1952), Rao and Tintner (1963).
Section 9.7. Amemiya (1973), Cochrane and Orcutt (1949), Durbin (1960), Hildreth (1969), Malinvaud (1970a), Prest (1949).
Section 9.8. Aigner (1971), Amemiya and Fuller (1967), Ball and St. Cyr (1966), Dhrymes (1971), Fuller and Martin (1961), Griliches (1967), Hatanaka (1974), Malinvaud (1970a), Nagaraj and Fuller (1991, 1992), Wallis (1967).
EXERCISES
1. Let the time series X_t be defined by
where
Z_t = ρZ_{t-1} + e_t,  |ρ| < 1,  t = 0, ±1, ±2, ..., and {e_t} is a sequence of independent (0, σ²) random variables. Compute the covariance matrices of β̂_S of (9.1.2) and β̂_G of (9.1.3) for ρ = 0.9 and n = 5 and 10.
2. The data in the accompanying table are the forecasts of the yield of corn in Illinois released by the United States Department of Agriculture in August of the production year. (a) For this set of data estimate a mean function meeting the following
restrictions: (i) The mean function is linear from 1944 through 1956, quadratic from 1956 through 1960, linear from 1960 through 1965, quadratic from 1965 through 1969, and linear from 1969 through 1974.
(ii) The mean function, considered as a continuous function of time, has a continuous first derivative.
(b) Reestimate the mean function subject to the restriction that the derivative is zero for the last five years. Test the hypothesis that the slope for the last five years is zero.
(c) Plot the data and the estimated mean function.

August Forecast of Corn Yield for Illinois

Year  Yield   Year  Yield   Year  Yield
1944  45.5    1955  58.0    1965  88.0
1945  43.0    1956  59.0    1966  96.0
1946  55.0    1957  52.0    1967  103.0
1947  45.0    1958  64.0    1968  95.0
1948  58.0    1959  63.0    1969  93.0
1949  61.0    1960  63.0    1970  92.0
1950  58.0    1961  73.0    1971  97.0
1951  56.0    1962  79.0    1972  105.0
1952  56.0    1963  81.0    1973  86.0
1953  58.0    1964  85.0    1974  82.0
1954  45.0
SOURCE: U.S. Department of Agriculture, Crop Production, various issues.
3. Evaluate (9.3.5) and (9.3.6) for n = 10 and the model of Exercise 1. The covariance between Σ_{t=2}^n Z_tZ_{t-1} and Σ_{t=1}^n Z_t² for the first order autoregressive process can be evaluated using Theorem 6.2.1. Using these approximations and the approximation
find the approximate expected value of r̂(1) for the model and sample sizes of Exercise 1.
4. Using a period of three observations, construct the moving average that removes a linear trend from the third observation, and hence show that the second difference operator is a multiple of that moving average.
5. Define X_t by
X_t = ρX_{t-1} + e_t,  |ρ| < 1,
where the e_t are independent (0, σ²) random variables. Find the autocorrelation functions and spectral densities of
Plot the spectral densities for p = 0.5.
6. Consider the model

where

δ_{ti} = 1 for quarter i (i = 1, 2, 3, 4), 0 otherwise,

Σ_{i=1}^4 α_i = 0,

and Z_t is a stationary time series. Assuming that the model holds for a period of 20 observations (5 years), construct the weights needed to decompose the most recent observation in a set of 20 observations into "trend," "seasonal," and "remainder." Define trend to be β₀ + β₁t + β₂t² for the tth observation, and seasonal for quarter i to be the α_i associated with that quarter.

7. Obtain the conclusion of Proposition 9.6.3 directly by differencing the autoregressive moving average time series.

8. Prove Lemma 9.4.1.

9. Prove Lemma 9.4.2.

10. Using the data of Exercise 2 and the model of part b of that exercise, construct the statistics (9.3.8) and (9.3.9).
11. Assume that the data of Exercise 2 of Chapter 7 satisfy the model
where

D_{tj} = 1 for quarter j, 0 otherwise,

and Z_t is the stationary autoregressive process

Z_t = θ₁Z_{t-1} + θ₂Z_{t-2} + e_t,
and {e_t} is a sequence of normal independent (0, σ²) random variables.
(a) Estimate the parameters of the model. Estimate the standard errors of your estimators. Do the data support the model for the error time series?
(b) Using the model of part a, estimate gasoline consumption for the four quarters of 1974. Construct 95% confidence intervals for your predictions, using normal theory. Actual consumption during 1974 was [2223, 2540, 2596, 2529]. What happened? Computational hint: If a computer program is available that fits regression models with autoregressive errors, the parameters and predictions can be obtained in one fitting operation. Add four "observations" and four independent variables to the data set. The four added independent variables are defined by

δ_{t,4+i} = -1 if observation t occurs in quarter i of 1974, and 0 otherwise,

for i = 1, 2, 3, 4. The four observations on the original data for 1973 and the four added "observations" for 1974 are displayed in the accompanying table. The values of the original independent variables for the prediction period are the true values of these independent variables, while the values of Y_t for the prediction period are zero. The augmented regression model with autoregressive errors is then estimated. The coefficients for the four added independent variables are the predictions, and the estimated standard errors are appropriate for the predictions.
Year  Quarter  Dependent Variable  δ_{t1} δ_{t2} δ_{t3} δ_{t4}   t   Independent Variables for Predictions

1973    1          2454              1  0  0  0    53    0  0  0  0
        2          2647              0  1  0  0    54    0  0  0  0
        3          2689              0  0  1  0    55    0  0  0  0
        4          2549              0  0  0  1    56    0  0  0  0
1974    1             0              1  0  0  0    57   -1  0  0  0
        2             0              0  1  0  0    58    0 -1  0  0
        3             0              0  0  1  0    59    0  0 -1  0
        4             0              0  0  0  1    60    0  0  0 -1
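The augmented-regression trick in the hint is easy to verify in the ordinary least squares case: each added column enters only its own added row, so the fitted residual for that row is forced to zero and the column's coefficient equals the prediction. A minimal sketch with made-up data (the exercise applies the same mechanics inside a regression-with-autoregressive-errors fit):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), np.arange(n)])   # intercept and trend
beta = np.array([10.0, 0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

x_new = np.array([1.0, float(n)])                 # prediction-period row
# Augment: one extra row with Y = 0 and an extra column equal to -1 there.
X_aug = np.zeros((n + 1, 3))
X_aug[:n, :2] = X
X_aug[n, :2] = x_new
X_aug[n, 2] = -1.0
y_aug = np.append(y, 0.0)

coef = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
# The added-column coefficient reproduces the prediction x_new' beta_hat,
# and the original coefficients are unchanged.
print(coef[2], x_new @ beta_hat)
```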
12. Using the Prest data of Table 9.7.1, estimate the parameters of the model
Y_t = β₀ + β₁t + β₂(t - 35)² + β₃φ_{t1} + β₄φ_{t2} + λY_{t-1} + Z_t,  Z_t = ρZ_{t-1} + e_t,
where {e_t} is a sequence of independent (0, σ²) random variables with E{e_t⁴} = ησ⁴ and |ρ| < 1, |λ| < 1.
13. Let Y_t satisfy
Y_t = αφ_t + Z_t,  Z_t = ρZ_{t-1} + e_t,  t = 1, 2, ...,

where |ρ| < 1, {e_t} is a sequence of independent (0, σ²) random variables with E{e_t⁴} = ησ⁴, and {φ_t} satisfies the assumptions (9.1.8) and (9.1.9). Let
and ρ̂ is constructed from the first order autocorrelation of the residuals by the rule (8.3.1) and restricted to (-1, 1).
(a) Show that

ρ̂ - ρ̃ = O_p(n^{-1}),

where
Z̃_t = (1 - ρ²)^{1/2} Z_1,  t = 1,
Z̃_t = Z_t - ρZ_{t-1},  t = 2, 3, ..., n,

and φ̃_t is the tth row of Φ̃.
(b) Show that

[Σ_{t=1}^n φ̃_t²]^{1/2} (α̃ - α̂) = o_p(n^{-1/2}),
where
and
V = E{ZZ′}.

14. Let {Y_t} satisfy

Y_t = Σ_{j=1}^k β_j φ_{tj} + Σ_{j=1}^s λ_j Y_{t-j} + e_t,  t = 1, 2, ...,
where {e_t} is a sequence of independent (0, σ²) random variables with E{e_t⁴} = ησ⁴ and the roots of m^s - Σ_{j=1}^s λ_j m^{s-j} = 0 are less than one in absolute value. State and prove Theorem 9.8.1 for this model.
15. Add a linear term to the wheat model of Section 9.2, and fit the model with a first order autoregressive error.
16. Let φ_t = (φ_{t1}, φ_{t2}, φ_{t3}) = (1, t, t + (-1)^t). Does this vector satisfy the assumptions (9.1.8) and (9.1.9) with A_{nn} nonsingular? Is there a linear transformation of φ_t that satisfies the two assumptions with A_{nn} nonsingular?
17. Let the model (9.7.1) hold with φ_t that satisfy the assumptions (9.1.8) and (9.1.9). Show that the conditions of Theorem 5.7.4 are satisfied for {Z_t} that is a stationary autoregressive moving average with iid(0, σ²) errors.
18. Using the estimates of Example 9.5.1, construct the vector H_t to estimate the trend using (Y_{t-3}, Y_{t-2}, ..., Y_{t+2}, Y_{t+3}). What is the estimated variance of the estimator constructed with the vector?
19. The U.S. wheat yields for 1992 and 1993 are 39.4 and 38.3. Are these yields consistent with the model of Example 9.7.2? Fit the model of Example 9.7.2 to the data for 1908 through 1993, and predict yields for 1994, 1995, and 1996. Fit the model in which (9.7.8) is replaced with
where
⋯,  1 ≤ t ≤ 70,
⋯,  70 ≤ t ≤ 80,
100 + 20(t - 80),  t ≥ 80.
Use the fitted model to predict yields for 1994, 1995, and 1996.
20. The U.S. wheat yields for 1992 and 1993 are given in Exercise 19 as 39.4 and 38.3, respectively. Fit the structural model of Example 9.5.1 to the data for 1908 through 1993. Predict yields for 1994, 1995, and 1996.
21. Let Z_t be a stationary time series with absolutely summable covariance function. Show that the assumptions (9.1.8) and (9.1.9) are sufficient for the limit (9.1.7) to exist. Give an example of a Z_t and φ_t such that B of (9.1.7) is singular.
22. Let Z_t satisfy the ARCH model of (9.3.21). Show that Z_t satisfies the assumptions of Proposition 9.3.3.
23. Consider the model
Z_t = θZ_{t-1} + a_t,  |θ| < 1,
a_t = (k₁ + k₂a²_{t-1})^{1/2} e_t,
where the e_t are NI(0, 1) random variables and 0 ≤ k₂ ≤ 0.3. Let θ̂_OLS and (k̂_{1,OLS}, k̂_{2,OLS}) denote the least squares estimators obtained by regressing Z_t on Z_{t-1} and â²_t on (1, â²_{t-1}), respectively, where â_t = Z_t - θ̂_OLS Z_{t-1}. Define ĥ_t = k̂_{1,OLS} + k̂_{2,OLS}â²_{t-1}. Let (k̂_{1,GLS}, k̂_{2,GLS}) be the estimator of (k₁, k₂) obtained by regressing ĥ_t^{-1}â²_t on (ĥ_t^{-1}, ĥ_t^{-1}â²_{t-1}). Define ĥ_{GLS,t} = k̂_{1,GLS} + k̂_{2,GLS}â²_{t-1}. Let θ̂_GLS denote the estimated generalized least squares estimator obtained by regressing ĥ_t^{-1/2}Z_t on ĥ_t^{-1/2}Z_{t-1}.
(a) Find the asymptotic distribution of

n^{1/2}(θ̂_OLS - θ, k̂_{1,OLS} - k₁, k̂_{2,OLS} - k₂)′.

(b) Find the asymptotic distribution of

n^{1/2}(θ̂_GLS - θ, k̂_{1,GLS} - k₁, k̂_{2,GLS} - k₂)′.
(c) Consider the conditional maximum likelihood estimators θ̂_ML, k̂_{1,ML}, and k̂_{2,ML} obtained by minimizing
where h_t = k₁ + k₂(Z_{t-1} - θZ_{t-2})². Find the asymptotic distribution of n^{1/2}(θ̂_ML - θ, k̂_{1,ML} - k₁, k̂_{2,ML} - k₂)′. Compare the limiting distributions of θ̂_OLS, θ̂_GLS, and θ̂_ML.
CHAPTER 10
Unit Root and Explosive Time Series
In Chapter 9, we studied time series in which the mean was a function of time. A second important class of nonstationary time series are those that satisfy nonstationary stochastic difference equations. We begin our discussion with least squares estimators for the univariate unit root autoregressive process. Alternative estimation procedures for unit root processes are discussed in Section 10.1.3. Tests based on the methods of Section 10.1.3 have been demonstrated to be more powerful against the unknown mean stationary alternative than tests based on ordinary least squares regression. The remaining sections are devoted to autoregressive processes with a root greater than one, multivariate unit root autoregressive processes, and time series with a moving average unit root.
10.1. UNIT ROOT AUTOREGRESSIVE TIME SERIES
10.1.1. The Autoregressive Process with a Unit Root
Consider the first order time series
Y_t = ρY_{t-1} + e_t,  t = 1, 2, ...,  Y_0 = 0,   (10.1.1)
where the e_t are normal independent (0, σ²) random variables. In Section 2.1, we showed that the solution to the difference equation is
Y_t = Σ_{j=0}^{t-1} ρ^j e_{t-j}.   (10.1.2)
One case of particular interest is obtained when ρ = 1. Then Y_t is the sum of t independent (0, σ²) random variables. The time series (10.1.1) with ρ = 1 is sometimes called a random walk.
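A quick illustrative simulation (not from the text) of (10.1.1) with ρ = 1: the solution (10.1.2) reduces to a cumulative sum of the errors, so Var(Y_t) = tσ²:

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n, sigma = 20000, 50, 1.0
e = rng.normal(scale=sigma, size=(reps, n))
Y = np.cumsum(e, axis=1)            # Y_t = e_1 + ... + e_t, with Y_0 = 0
print(Y[:, -1].var())               # close to n * sigma^2 = 50
```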
By the results of Section 8.1, the maximum likelihood estimator of ρ conditional on Y_1, for any real valued ρ, is
and
(10.1.3)
(10.1.4)
If ρ = 1, then Y_t = Σ_{j=1}^t e_j, and the mean and variance of the numerator and the denominator of (10.1.4) can be computed. We have
E{Σ_{t=2}^n Y²_{t-1}} = ½n(n - 1)σ²,   (10.1.5)
(10.1.6)
(10.1.7)
Normality was used in obtaining (10.1.6) and (10.1.8). The other results remain true for e_t that are iid(0, σ²).
These results differ in a notable way from those obtained for |ρ| < 1. When |ρ| < 1, every entry in the covariance matrix of (Σ_{t=2}^n Y_{t-1}e_t, Σ_{t=2}^n Y²_{t-1}) is of order n, and the limit of the expected value of n^{-1} Σ_{t=2}^n Y²_{t-1} is σ²(1 - ρ²)^{-1}. If ρ = 1, the variance of the numerator of (10.1.4) is of order n² instead of order n. Also, the mean of the denominator of (10.1.4) is of order n², and the variance is of order n⁴. The denominator divided by 0.5n(n - 1)σ² gives a random variable with mean one and variance 4(n² - n + 1)[3(n² - n)]^{-1}. Because the variance converges to 4/3, the normalized denominator does not converge to a constant in mean square.
The moment results and the fact that Σ Y_t² is a quadratic form in (e₁, e₂, ..., e_n) can be used to show that the error in ρ̂, as an estimator of one, is O_p(n^{-1}). Thus, for ρ = 1, the least squares estimator converges in probability to the true value more rapidly than does the estimator for |ρ| < 1.
Lemma 10.1.1. Let ρ̂ be defined by (10.1.3), and assume Y_t satisfies (10.1.2) with ρ = 1, where the e_t are normal independent (0, σ²) random variables. Then
ρ̂ - 1 = O_p(n^{-1}).
Proof. The denominator of ρ̂ can be written as

Σ_{t=2}^n Y²_{t-1} = e′A_n e,

where e′ = (e₁, e₂, ..., e_{n-1}) and
A_n = [a_{ij}],  a_{ij} = n - max(i, j);  that is,

A_n =
⎡ n-1  n-2  n-3  ⋯  1 ⎤
⎢ n-2  n-2  n-3  ⋯  1 ⎥
⎢ n-3  n-3  n-3  ⋯  1 ⎥
⎢  ⋮    ⋮    ⋮   ⋱  ⋮ ⎥
⎣  1    1    1   ⋯  1 ⎦ .   (10.1.9)

The matrix A_n can be diagonalized by the (n - 1) × (n - 1) orthonormal matrix M_n whose ijth element is
m_{ij} = 2(2n - 1)^{-1/2} cos[(4n - 2)^{-1}(2j - 1)(2i - 1)π].   (10.1.10)
Then
M_n A_n M′_n = Λ_n ≡ diag(λ_{1n}, λ_{2n}, ..., λ_{n-1,n}),

where

λ_{in} = ¼ sec²[(n - i)π(2n - 1)^{-1}]   (10.1.11)
for i = 1, 2, ..., n - 1. See Rutherford (1946). Hence, the denominator of ρ̂ - 1 can be written as

e′A_n e = Σ_{i=1}^{n-1} λ_{in} U²_{in},   (10.1.12)
where

U_{in} = Σ_{j=1}^{n-1} m_{ij} e_j,  i = 1, 2, ..., n - 1,

and the U_{in} are NI(0, σ²) random variables. By the moment results (10.1.5) and (10.1.6),
If we fix i and take the limit as n → ∞, we have

lim_{n→∞} n^{-2} λ_{in} = 4[(2i - 1)π]^{-2}

for i = 1, 2, .... Therefore, the largest root, λ_{1n}, increases at the rate n² as n increases. Given ε > 0,
P{e′A_n e < εn²} ≤ P{λ_{1n}U²_{1n} < εn²} ≤ P{U²_{1n} < 2ε}.
Because σ^{-2}U²_{1n} is a chi-square random variable, given Δ > 0, there exists an ε > 0 such that P{U²_{1n} < 2ε} < Δ. It follows that, given Δ > 0, there exists a Kε^{-1} such that

P{(e′A_n e)^{-1} n(n - 1) > Kε^{-1}} < Δ
for all n and, by (10.1.6),
Because n^{-2} Σ_{t=2}^n Y_{t-1}e_t = O_p(n^{-1}), it follows that ρ̂ - 1 = O_p(n^{-1}). ∎
The numerator of ρ̂ - 1 can be written in an alternative, informative manner as
Hence, as n → ∞,

2(nσ²)^{-1} Σ_{t=2}^n Y_{t-1}e_t →ᴸ χ²₁ - 1,
where χ²₁ is the chi-square distribution with one degree of freedom. The probability that a one-degree-of-freedom chi-square random variable is less than one is 0.6826. Therefore, because the denominator is always positive, the probability that ρ̂ < 1, given ρ = 1, approaches 0.6826 as n gets large. Although a chi-square distribution is skewed to the right, the high correlation between the numerator and denominator of ρ̂ - 1 strongly dampens the skewness. In fact, the distribution of ρ̂ displays skewness to the left.
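These claims are easy to check by simulation; the following sketch (illustrative values only, not from the text) computes the least squares estimator for replicated random walks:

```python
import numpy as np

rng = np.random.default_rng(4)
reps, n = 5000, 200
rho_hat = np.empty(reps)
for r in range(reps):
    Y = np.cumsum(rng.normal(size=n))            # random walk, Y_0 = 0
    rho_hat[r] = (Y[:-1] @ Y[1:]) / (Y[:-1] @ Y[:-1])
print((rho_hat < 1).mean())                      # near 0.68
stat = n * (rho_hat - 1)
print(np.median(stat) > stat.mean())             # left skew: mean < median
```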
To establish the limiting distribution of the least squares estimator of ρ, given that ρ = 1, we will use the following lemma.
Lemma 10.1.2. Let {U_{tn}: 1 ≤ t ≤ n, n ≥ 1} denote a triangular array of random variables defined on the probability space (Ω, 𝒜, P). Assume
550 UNIT ROOT AND EXPLOSIVE TIN SElUES
E{(U_{tn}, U²_{tn}, U_{tn}U_{sn})} = (0, σ², 0)
for 1 ≤ t ≠ s ≤ n. Let {w_i: i = 1, 2, ...} be a sequence of real numbers, and let {w_{in}: i = 1, 2, ..., n; n = 1, 2, ...} be a triangular array of real numbers. If

lim_{n→∞} Σ_{i=1}^n w²_{in} = Σ_{i=1}^∞ w_i² < ∞

and

lim_{n→∞} w_{in} = w_i,  i = 1, 2, ...,

then

plim_{n→∞} (Σ_{i=1}^n w_{in}U_{in} - Σ_{i=1}^n w_iU_{in}) = 0.
Proof. Let ε > 0 be given. Then we can choose an M such that
and
for all n > M. Furthermore, given M, we can choose N₀ > M such that n > N₀ implies
σ² Σ_{i=1}^M (w_{in} - w_i)² < ε/3

and

2σ² Σ_{i=M+1}^n w²_{in} < ε/3.
Hence, for all n > N₀,
E{[Σ_{i=1}^n (w_{in} - w_i)U_{in}]²} < ε. ∎
We now give the primary result for the distribution of the least squares estimator of the autoregressive parameter when the true process is a random walk.
Theorem 10.1.1. Let
Y_t = Y_{t-1} + e_t,  t = 1, 2, ...,
where Y_0 = 0. Assume {e_t}_{t=1}^∞ satisfies

E{(e_t, e_t²) | 𝒜_{t-1}} = (0, σ²) a.s.,

E{|e_t|^{2+δ} | 𝒜_{t-1}} < M < ∞ a.s.

for some δ > 0, where 𝒜_t is the sigma-field generated by {e₁, e₂, ..., e_t}. Then
n(ρ̂ - 1) →ᴸ (2G)^{-1}(T² - 1),
where ρ̂ is defined in (10.1.3),

G = Σ_{i=1}^∞ γ_i² V_i²,  T = Σ_{i=1}^∞ 2^{1/2} γ_i V_i,

γ_i = (-1)^{i+1} 2[(2i - 1)π]^{-1},

{V_i}_{i=1}^∞ is a sequence of NI(0, 1) random variables, and (G, T) is defined as a limit in mean square.
Proof. The estimator ρ̂ is scale invariant. Hence, there is no loss of generality in assuming σ² = 1.
From the definition of ρ̂ and from (10.1.13), we have

n(ρ̂ - 1) = (2G_n)^{-1}(T_n² - 1) + o_p(1),

where n^{-1} Σ_{t=1}^n e_t² = σ² + o_p(1) by Corollary 5.3.5.
To establish the limit distribution result, we show that (G_n, T_n) converges in distribution to (G, T). Let

U_n = (U_{1n}, U_{2n}, ..., U_{n-1,n})′ = M_n e_n,

where e_n = (e₁, e₂, ..., e_{n-1})′ and M_n is the matrix with (i, j)th element given by (10.1.10). Then
T_n = n^{-1/2} J′_n e_n = n^{-1/2} J′_n M′_n U_n,
where J_n = (1, 1, ..., 1)′. Let k_{in} be the ith element of n^{-1/2}J′_nM′_n. The element k_{in} is the covariance between T_n and U_{in}. Using this fact and the results of Jolley (1961, p. 78), it can be shown that, for fixed i,
lim_{n→∞} k_{in} = 2^{1/2} γ_i.
See Dickey (1976, p. 29). Also, 2 Σ_{i=1}^∞ γ_i² = 1; see Jolley (1961, p. 56). Let
where λ_{in} is defined in (10.1.11). Then T_n - T*_n converges in probability to zero by Lemma 10.1.2. Also,
because
The entries in n^{1/2}M_n are bounded, and the sum of squares of each row of M_n is one. Therefore, by Corollary 5.3.4, for fixed l,
(U_{1n}, U_{2n}, ..., U_{ln}) →ᴸ (V₁, V₂, ..., V_l),

where (V₁, V₂, ..., V_l) ~ N(0, I). Note that
Because the sequence {γ_i²}_{i=1}^∞ is summable, the vector
converges to zero in probability as l tends to infinity, uniformly in n. Therefore, by Lemma 6.3.1, (G_n, T_n) converges in distribution to (G, T) and the result is established. ∎
We gave the proof of Theorem 10.1.1 for e_t that are martingale differences. The result also holds for e_t that are iid(0, σ²) random variables.

The limiting distribution of Theorem 10.1.1 can also be obtained by using Theorems 5.3.5 and 5.3.6. By Corollary 5.3.6, with σ² = 1,
(G_n, T_n) →ᴸ (G, T),
where

G = ∫₀¹ W²(t) dt,  T = W(1),   (10.1.14)

and W(t) is the standard Wiener process defined in Section 5.3.

In Theorem 10.1.1, we assumed Y_0 = 0 to simplify the presentation. Because
for any finite Y_0, the conclusion of Theorem 10.1.1 is true for any finite Y_0.

The first part of Table 10.A.1 contains percentiles for the empirical distribution
of ρ̂ given that ρ = 1. Normally distributed errors were used to construct the table for n = 25, 50, 100, 250, and 500. The total number of observations available is n, while n - 1 sums of squares and products are used in calculating the regression coefficient. The entries for n = ∞ are the percentiles for the limiting distribution of Theorem 10.1.1.
The table may also be used for the distribution of ρ̂ for ρ = -1, since, for symmetrically distributed e_t, the distribution for ρ = -1 is the mirror image of the distribution for ρ = 1. If the e_t are not symmetrically distributed, the limiting distribution for ρ = -1 remains the mirror image of the limiting distribution for ρ = 1.
Corollary 10.1.1.1. Let the model of Theorem 10.1.1 hold. Then
lim_{n→∞} (P{ρ̂ - ρ > a | ρ = 1} - P{ρ̂ - ρ < -a | ρ = -1}) = 0
for all real a. If, in addition, the e_t are independently and identically distributed with distribution that is symmetric about zero, then
P{ρ̂ - ρ > a | ρ = 1} = P{ρ̂ - ρ < -a | ρ = -1}
for all real a and all n ≥ 2.
Proof. Let X_t = Σ_{j=0}^{t-1} (-1)^j e_{t-j}. In a manner similar to that used to obtain (10.1.13), we have
Therefore, for the estimator of ρ when ρ = -1, P{ρ̂ + 1 < a} is
Although the sign of e_i in the weighted sum Σ_{j=0}^{t-1} (-1)^j e_{t-j} is not the same for all t > i, the sign is always opposite of that for e_{i+1}, and it follows that
If the e_t are iid(0, σ²) and if the distribution of e_t, t = 1, 2, ..., is symmetric, then the distributional properties of the sequence (-e₁, e₂, -e₃, ...) are the same as the distributional properties of the sequence (e₁, e₂, e₃, ...), and we conclude that, for symmetric e_t and for any a,
Also, the large sample distribution of ρ̂ given ρ = -1 is the mirror image of the distribution for ρ = 1 for e_t that are not symmetric, because the V_i have a limiting normal distribution for the sequence (-e₁, e₂, -e₃, ...) as well as for the sequence (e₁, e₂, e₃, ...). ∎

A natural statistic to use in testing the hypothesis that ρ = 1 is the test statistic one would calculate in ordinary linear regression:

τ̂ = [s²(Σ_{t=2}^n Y²_{t-1})^{-1}]^{-1/2}(ρ̂ - 1),   (10.1.15)
where s² = (n - 2)^{-1} Σ_{t=2}^n (Y_t - ρ̂Y_{t-1})².
Corollary 10.1.1.2. Let the model of Theorem 10.1.1 hold. Then
τ̂ →ᴸ (4G)^{-1/2}(T² - 1),

where τ̂ is defined in (10.1.15).
Proof. We have
Therefore, s² →ᵖ σ² by Corollary 5.3.8. Because
τ̂ = (4G_n)^{-1/2}(T_n² - 1) + o_p(1),
we have the result. ∎
Percentage points for the empirical distribution of τ̂ are given in Table 10.A.2. The percentiles for n = ∞ are the percentiles of the distribution of Corollary 10.1.1.2.
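A rough illustrative version of such a tabulation, for the studentized statistic under ρ = 1 (a sketch only, not the book's computation; the published table rests on far more careful work):

```python
import numpy as np

rng = np.random.default_rng(5)
reps, n = 5000, 250
tau = np.empty(reps)
for r in range(reps):
    Y = np.cumsum(rng.normal(size=n))            # random walk under the null
    den = Y[:-1] @ Y[:-1]
    rho_hat = (Y[:-1] @ Y[1:]) / den
    resid = Y[1:] - rho_hat * Y[:-1]
    s2 = resid @ resid / (n - 2)
    tau[r] = (rho_hat - 1.0) / np.sqrt(s2 / den)
print(np.percentile(tau, [5, 50, 95]))
```

The simulated percentiles are well to the left of the standard normal ones, which is why the ordinary t tables cannot be used for this test.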
To extend the results for the first order process with p = 1 to the pth order autoregressive process, we consider the time series
Y_t = Σ_{j=1}^t Z_j ,  t = 1, 2, ... ,   (10.1.16)

where {Z_t: t ∈ (0, ±1, ±2, ...)} is a (p − 1)st order stationary autoregressive time series with the representation

Z_t = Σ_{i=2}^p θ_i Z_{t−i+1} + e_t ,   (10.1.17)
the e_t are martingale difference random variables with the properties defined in Theorem 10.1.1, and the absolute value of the largest root of

m^{p−1} − Σ_{i=2}^p θ_i m^{p−i} = 0   (10.1.18)

is less than some λ < 1. By (10.1.16) and (10.1.17), we may write

Y_t + Σ_{j=1}^p α_j Y_{t−j} = e_t ,  t = p + 1, p + 2, ... ,   (10.1.19)
where p − 1 of the roots of

m^p + Σ_{i=1}^p α_i m^{p−i} = 0

are the p − 1 roots of (10.1.18), and the remaining root is one. In a problem where a root of one is suspected, it is operationally desirable and theoretically convenient to write the autoregressive equation so that the unit root is isolated as a coefficient. To this end, we write
Y_t = θ₁Y_{t−1} + Σ_{i=2}^p θ_i Z_{t−i+1} + e_t   (10.1.20)
556 UNIT ROOT AND ExpLOsNE TIME SERIES
for t = p + 1, p + 2, ..., where p ≥ 2, θ_i = Σ_{j=i}^p α_j for i = 2, 3, ..., p, and θ₁ = −Σ_{j=1}^p α_j. If there is a unit root, θ₁ = 1.
If one knew that θ₁ = 1, one would regress Y_t − Y_{t−1} = Z_t on Z_{t−1}, Z_{t−2}, ..., Z_{t−p+1} and obtain an estimator of (θ₂, θ₃, ..., θ_p), say (ᾱ₂, ᾱ₃, ..., ᾱ_p), with the properties described in Section 8.2. It is interesting that, given θ₁ = 1, the regression of Y_t on Y_{t−1}, Z_{t−1}, ..., Z_{t−p+1} yields an estimator of θ₁, say θ̂₁, such that the large sample distribution of nc(θ̂₁ − 1) is the same as that of n(ρ̂ − 1), where ρ̂ is the estimator (10.1.3) and c is a constant defined in Theorem 10.1.2. The large sample distribution of (θ̂₂, θ̂₃, ..., θ̂_p) is the same as that of the regression coefficients in the regression of Z_t on Z_{t−1}, Z_{t−2}, ..., Z_{t−p+1}.
Theorem 10.1.2. Let Y_t be defined by (10.1.16). Suppose the e_t satisfy the assumptions of Theorem 10.1.1. Let θ̂′ = (θ̂₁, θ̂₂, ..., θ̂_p) be the vector of regression coefficients obtained by regressing Y_t on Y_{t−1}, Z_{t−1}, ..., Z_{t−p+1}, t = p + 1, p + 2, ..., n, and define D_n to be the diagonal matrix

D_n = diag{n, n^{1/2}, ..., n^{1/2}} .

Define c = Σ_{i=0}^∞ w_i = (1 − Σ_{i=2}^p θ_i)^{-1}, where the w_i satisfy the homogeneous difference equation with characteristic equation (10.1.18) and initial conditions w₀ = 1 and w_i = 0 for i < 0. Let θ̄′ = (θ̄₁, ᾱ₂, ..., ᾱ_p), where

θ̄₁ = 1 + c^{-1}( Σ_{t=p+1}^n W_{t−1}² )^{-1} Σ_{t=p+1}^n W_{t−1}e_t ,

W_t = Σ_{j=1}^t e_j, and (ᾱ₂, ᾱ₃, ..., ᾱ_p)′ is the vector of regression coefficients obtained by regressing Z_t on Z_{t−1}, Z_{t−2}, ..., Z_{t−p+1}. Then

plim_{n→∞} D_n(θ̂ − θ̄) = 0 .

Furthermore, n(θ̂₁ − 1) is independent of n^{1/2}(θ̂₂ − θ₂, ..., θ̂_p − θ_p) in the limit.
Proof. Using the definition of Z_t as the weighted sum of past e_t, we can write

Y_t = Σ_{j=1}^t Z_j = cW_t + Q_t + R_t ,

where Q_t = Σ_{j=0}^{t−1} g_j e_{t−j}, g_j = −Σ_{i=j+1}^∞ w_i, and R_t = Σ_{j=0}^∞ e_{−j} Σ_{i=j+1}^{j+t} w_i. Since the absolute value of the largest root of (10.1.18) is less than λ < 1, by Exercise 2.24, |w_i| is less than M₁λ^i and |g_j| is less than M₂λ^j for some finite M₁ and M₂. It
follows that E{Q_tQ_{t+h}} and E{Z_tZ_{t+h}} are bounded by a multiple of λ^{|h|}. Also, the expectations of Q_t², W_tZ_t, n^{-1}Y_tQ_t, and R_t² are bounded. Using these results, it can be shown that

D_n^{-1}(A_n − B_n)D_n^{-1} = o_p(1) ,   (10.1.21)

where A_n is the matrix of sums of squares and cross products of the regressors (Y_{t−1}, Z_{t−1}, ..., Z_{t−p+1}) and

B_n = block diag{ c² Σ_{t=p+1}^n W_{t−1}² , C_n } ,

where C_n is the (p − 1) × (p − 1) matrix of sums of squares and cross products of Z_{t−1}, Z_{t−2}, ..., Z_{t−p+1}. The lower right (p − 1) × (p − 1) submatrix of A_n is C_n. Furthermore, the ijth element of n^{-1}C_n is converging in probability to γ_Z(i − j). By the convergence results for n^{-2} Σ W_{t−1}², we have D_n^{-1}B_nD_n^{-1} = O_p(1) and D_nB_n^{-1}D_n = O_p(1). Therefore, using equation (10.1.21),

D_n(A_n^{-1} − B_n^{-1})D_n = o_p(1) .
Similarly, the differences between the normalized sums of cross products of the regressors with e_t and the corresponding normalized sums computed with (cW_{t−1}, Z_{t−1}, ..., Z_{t−p+1}) are o_p(1), and the probability limit result is established. From (10.1.13),

n^{-1} Σ_{t=p+1}^n W_{t−1}e_t = 0.5(n^{1/2}ē)² − 0.5σ² + O_p(n^{-1/2}) .
Let U_{1t} be defined in (10.1.12), let X_t = (Z_{t−1}, Z_{t−2}, ..., Z_{t−p+1}), and let Λ = E{X_t′X_t}. Now the linear combination
where m_nt is defined in (10.1.10) and b₁₁, ..., b₁_l, b₂, b₃₁, ..., b_{3,p−1} are fixed constants, satisfies the assumptions of Theorem 5.3.4. Therefore, the normalized sums converge jointly for every fixed l. It follows, by Lemma 6.3.1, that the vector of normalized sums converges in distribution to (G, T, ξ′), where (G_n, T_n) is defined in the proof of Theorem 10.1.1, ξ is a N(0, Λσ²) random vector, and (G, T) is independent of ξ. A
Corollary 10.1.2.1. Let the assumptions of Theorem 10.1.2 hold. Then

plim_{n→∞} [ cn(θ̂₁ − 1) − n(ρ̂_W − 1) ] = 0 ,

where ρ̂_W is the estimator (10.1.3) computed with W_t in place of Y_t, and W_t and c are defined in Theorem 10.1.2.
Proof. By the proof of Theorem 10.1.2 and Theorem 6.3.5, the result follows. A
On the basis of Corollary 10.1.2.1, Table 10.A.2 can be used to investigate the
hypothesis that one of the roots in a pth order autoregressive process is unity. That is, the pivotal statistic for θ̂₁, computed by an ordinary least squares program,
τ̂ = [V̂{θ̂₁}]^{-1/2}(θ̂₁ − 1) ,   (10.1.22)
where V̂{θ̂₁} is the ordinary least squares estimator of the variance of θ̂₁, has the limiting distribution given in Corollary 10.1.1.2. Also, the regression pivotals can be used to test hypotheses about the coefficients of the stationary process. For example, the hypothesis that the original process is of order p − 1 is equivalent to the hypothesis that α_p = 0, which is equivalent to the hypothesis that θ_p = 0. A test statistic for this hypothesis is the ordinary regression pivotal statistic
t_p = [V̂{θ̂_p}]^{-1/2} θ̂_p ,
where V̂{θ̂_p} is the ordinary least squares estimator of the variance of θ̂_p. Under the assumptions of Theorem 10.1.2, t_p is asymptotically normal when θ_p = 0.
We note that the coefficient for Y_{t−1} in the regression of Y_t on Y_{t−1}, Z_{t−1}, ..., Z_{t−p+1} is not the largest root of the fitted autoregressive equation. To see this, consider the second order process. The fitted equation is
Ŷ_t = (m̂₁ + m̂₂)Y_{t−1} − m̂₁m̂₂Y_{t−2} ,

where (m̂₁, m̂₂) are the two roots of the estimated characteristic equation. It follows that the estimated equation in Y_{t−1} and Z_{t−1} is

Ŷ_t = [m̂₁ + m̂₂(1 − m̂₁)]Y_{t−1} + m̂₁m̂₂(Y_{t−1} − Y_{t−2}) .

Also, the error in the coefficient of Y_{t−1} as an estimator of one is

θ̂₁ − 1 = (m̂₁ − 1)(1 − m̂₂) .
Because c = (1 − m₂)^{-1}, the limiting distribution of n(m̂₁ − 1) is the same as the limiting distribution of n(ρ̂ − 1) given in Table 10.A.1. We state the generalization of this property as a corollary.
Corollary 10.1.2.2. Assume that the model (10.1.16)–(10.1.17) holds, and express Y_t in the form (10.1.19), where one of the roots of

m^p + Σ_{i=1}^p α_i m^{p−i} = 0

is equal to one and the other roots are less than one in absolute value. Let α̂′ = (α̂₁, α̂₂, ..., α̂_p) be the least squares estimator of α estimated subject to the restriction that m̂₁ is real, where (m̂₁, m̂₂, ..., m̂_p) are the roots of the estimated characteristic equation with |m̂₁| ≥ |m̂₂| ≥ ··· ≥ |m̂_p|. Then
n(m̂₁ − 1) →d (2G)^{-1}(T² − 1) ,

where (G, T) is defined in Theorem 10.1.1.
Proof. By (10.1.16) and (10.1.17), we have

m^p + Σ_{i=1}^p α_i m^{p−i} = (m − 1)( m^{p−1} − Σ_{i=2}^p θ_i m^{p−i} ) ,
θ̂₁ − 1 = O_p(n^{-1}), and θ̂_i − θ_i = O_p(n^{-1/2}) for i = 2, ..., p, where the θ̂_i, i = 1, 2, ..., p, are the regression coefficients of Theorem 10.1.2. Therefore, by Corollary 5.8.1, the roots of
m^p + Σ_{i=1}^p α̂_i m^{p−i} = 0 ,
where the relation between θ̂ and α̂ is defined by (10.1.20), converge in probability to the roots of the equation defined with (α₁, ..., α_p). The estimated polynomial for the ordinary least squares estimator evaluated at m = 1 is
Using the fact that θ̂₁ = −Σ_{i=1}^p α̂_i and the limiting result for the roots,

1 − θ̂₁ = 1 + Σ_{i=1}^p α̂_i = Π_{i=1}^p (1 − m̂_i) .
This equality also demonstrates that, given ε > 0, there is an N_ε such that for n > N_ε, the probability that m̂₁ computed with the ordinary least squares estimator of α is real is greater than 1 − ε. The conclusion then follows from the limiting result for nc(θ̂₁ − 1) demonstrated in the proof of Theorem 10.1.2.
A
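The second order illustration preceding the corollary can be checked numerically. In the Python sketch below (our own example, with assumed roots m̂₁ = 0.95 and m̂₂ = 0.5), the coefficient of Y_{t−1} differs from the largest root, and the factorization θ̂₁ − 1 = (m̂₁ − 1)(1 − m̂₂) holds.

```python
# Numeric check (ours) that the Y_{t-1} coefficient is not the largest root.
import numpy as np

m1_, m2_ = 0.95, 0.5                  # assumed roots of a fitted AR(2)
theta1 = m1_ + m2_ * (1.0 - m1_)      # coefficient of Y_{t-1} in the rewrite
factored = (m1_ - 1.0) * (1.0 - m2_)  # claimed factorization of theta1 - 1
# recover the roots from the fitted AR(2) coefficients (m1 + m2, -m1 m2)
roots = np.sort(np.real(np.roots([1.0, -(m1_ + m2_), m1_ * m2_])))
```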
For the model (10.1.1) with |ρ| < 1, the limiting behavior of the estimator of ρ is the same whether the mean is known or estimated. The result is no longer true when ρ = 1. Consider the estimator
ρ̂_μ = [ Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})² ]^{-1} Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})(Y_t − ȳ_{(0)}) ,   (10.1.23)
where

[ȳ_{(0)}, ȳ_{(−1)}] = (n − 1)^{-1} Σ_{t=2}^n (Y_t, Y_{t−1}) .
When ρ = 1, we have (3^{-1}nσ²)^{-1/2}ȳ →d N(0, 1), and this random variable makes a contribution to the limiting distribution of n(ρ̂_μ − 1). Also see the moment results of Exercise 10.4. The limit random variable associated with ȳ is denoted by H in Theorem 10.1.3.
Theorem 10.1.3. Let the assumptions of Theorem 10.1.1 hold. Let ρ̂_μ be defined by (10.1.23), and let

τ̂_μ = [V̂{ρ̂_μ}]^{-1/2}(ρ̂_μ − 1) ,   (10.1.24)

where V̂{ρ̂_μ} = s²[ Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})² ]^{-1} and s² is the residual mean square from the fitted regression. Then

n(ρ̂_μ − 1) →d (G − H²)^{-1}[0.5(T² − 1) − TH]

and

τ̂_μ →d (G − H²)^{-1/2}[0.5(T² − 1) − TH] ,
where H = ∫₀¹ W(t) dt, W(t) is the Wiener process, and G and T are defined in Theorem 10.1.1.
Proof. The method of proof is the same as that of Theorem 10.1.1. Let H_n = L_n′(e₁, e₂, ..., e_{n−1})′, and define U_t and M_n as in the proof of Theorem 10.1.1. Then
where L_n′ = n^{-3/2}(n − 1, n − 2, ..., 1). Let q_{in} be the ith element of L_n′M_n^{1/2}. This element is the covariance between H_n and U_{in}, and it can be shown that, for fixed i,

lim_{n→∞} q_{in} = 2^{1/2} γ_i² .
The remainder of the proof follows that of Theorem 10.1.1. A
The second part of Table 10.A.1 contains empirical percentiles for the distribution of n(ρ̂_μ − 1), and the second part of Table 10.A.2 contains the empirical percentiles of the corresponding studentized statistic, given that ρ = 1. The values for n = ∞ are the percentiles of the limiting distribution of Theorem 10.1.3.
It can be demonstrated that plim n(ρ̂_μ − ρ̂) = 0 when ρ = −1. That is, estimating the mean does not alter the limiting distribution when ρ = −1. Therefore, the first part of Tables 10.A.1 and 10.A.2 can be used to approximate the distributions of ρ̂_μ and τ̂_μ when ρ = −1.
The distributions of Theorem 10.1.3 are obtained under the model (10.1.1) with ρ = 1. This point can be emphasized by writing an extended model as

Y_t = (1 − ρ)μ + ρY_{t−1} + e_t ,

where μ and ρ are unknown parameters. The null model for the test τ̂_μ is ρ = 1. The extended model then reduces to (10.1.1). The alternative model with ρ ≠ 1 permits a nonzero value for θ₀ = (1 − ρ)μ. Thus, the test based on τ̂_μ is invariant to the mean μ of the alternative model.
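A Python sketch (our code, simulated data) of the mean-adjusted statistics: the regression of Y_t on (1, Y_{t−1}) yields ρ̂_μ and τ̂_μ, and adding a constant to the series leaves τ̂_μ unchanged, illustrating the invariance just noted.

```python
# Sketch (ours) of the mean-adjusted estimator (10.1.23) and pivotal tau_mu.
import numpy as np

def df_tau_mu(y):
    y = np.asarray(y, dtype=float)
    yl = y[:-1] - np.mean(y[:-1])
    yc = y[1:] - np.mean(y[1:])
    rhohat = np.sum(yl * yc) / np.sum(yl ** 2)
    resid = yc - rhohat * yl
    s2 = np.sum(resid ** 2) / (len(yc) - 2)     # intercept and slope fitted
    se = np.sqrt(s2 / np.sum(yl ** 2))
    return rhohat, (rhohat - 1.0) / se

rng = np.random.default_rng(2)
y = 10.0 + np.cumsum(rng.standard_normal(400))  # random walk started at 10
rho_mu, tau_mu = df_tau_mu(y)
_, tau_mu_shift = df_tau_mu(y + 1000.0)          # invariance to the mean
```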
The limiting distribution of n^{3/2}(θ̂₁ − 1), where θ̂₁ is the least squares estimator and θ₁ = 1, for the model

Y_t = θ₀ + θ₁Y_{t−1} + e_t

with θ₀ ≠ 0 is normal, and is discussed in Section 10.1.2. It follows from Theorem 10.1.3 that the distribution of the ordinary least
squares estimator of θ₀ for the model (10.1.1) with ρ = 1 is not normal. Let ΔY_t = Y_t − Y_{t−1} and consider the equation

ΔY_t = θ₀ + (θ₁ − 1)Y_{t−1} + e_t ,   (10.1.25)

with the unknown θ₀ equal to zero. The estimated regression equation can be written

ΔY_t = Δȳ + (θ̂₁ − 1)(Y_{t−1} − ȳ_{(−1)}) + ê_t ,   (10.1.26)

where ȳ_{(−1)} is defined in (10.1.23), Δȳ = (n − 1)^{-1} Σ_{t=2}^n ΔY_t, and θ̂₁ = ρ̂_μ.
Under the null model, ΔY_t = e_t and Δȳ = ē. It follows that

θ̂₀ = ē − (θ̂₁ − 1)ȳ_{(−1)}

and

n^{1/2}σ^{-1}θ̂₀ →d T − (G − H²)^{-1}[0.5(T² − 1) − TH]H ,   (10.1.27)

where G, T, and H are defined in Theorem 10.1.1 and Theorem 10.1.3. Dickey (1976) and Dickey and Fuller (1981) have discussed the distribution of n^{1/2}θ̂₀. Under the normal distribution null model, Δȳ of (10.1.26) is distributed as a normal random variable. Also, because θ̂₁ is a quadratic function of (Y₁, ..., Y_n), Δȳ is uncorrelated with θ̂₁ for normal e_t. Therefore, estimating the equation in the form (10.1.26) yields two uncorrelated estimators whose null distributions are easily interpreted.
The distributional results for the estimator of the parameters of the first order autoregressive model with intercept extend to the pth order process in much the same manner as the results for the model with no intercept. It follows from Theorem 10.1.4 that the regression pivotal calculated to test θ₁ = 1 has the tabled distribution τ̂_μ when the fitted model has an intercept.
Theorem 10.1.4. Let Y_t be defined by (10.1.16). Suppose the assumptions of Theorem 10.1.2 hold. Let θ̂ = (θ̂₀, θ̂₁, θ̂₂, ..., θ̂_p)′ be the vector of regression coefficients obtained by regressing Y_t on (1, Y_{t−1}, Z_{t−1}, ..., Z_{t−p+1}), t = p + 1, p + 2, ..., n. Define D_n to be the diagonal matrix

D_n = diag(n^{1/2}, n, n^{1/2}, n^{1/2}, ..., n^{1/2}) ,
and let θ̄ be the vector such that θ̄₀ = θ̂₀ and the remaining elements are the mean-adjusted analogues of the elements of the vector θ̄ of Theorem 10.1.2, where W_t and c are defined in Theorem 10.1.2, Z̄ = (n − 1)^{-1} Σ_{t=2}^n Z_t, and (ᾱ₂, ᾱ₃, ..., ᾱ_p) is the vector of regression coefficients obtained by regressing Z_t on (Z_{t−1}, Z_{t−2}, ..., Z_{t−p+1}). Let τ̂_μ be the ordinary regression pivotal computed by dividing θ̂₁ − 1 by the regression estimated standard error of θ̂₁.
Then

plim_{n→∞} D_n(θ̂ − θ̄) = 0 ,

n(θ̂₁ − θ₁) is independent of n^{1/2}[θ̂₂ − θ₂, ..., θ̂_p − θ_p] in the limit, and the limiting distribution of the pivotal for θ̂₁ is the limiting distribution of τ̂_μ given in Theorem 10.1.3.
Proof. Omitted. A
There has been extensive research on models with unit roots. We give only a few references. Hasza and Fuller (1982) studied models with two unit roots. Dickey, Hasza, and Fuller (1984) gave results for seasonal models, and Tiao and Tsay (1983) gave results for models with complex roots. Said and Dickey (1984) show that fitting a high order autoregressive model is an appropriate way to test for a unit root in a model of unknown form. Chan and Wei (1988) and Pantula (1989) give results that cover models with a number of unit roots. Phillips (1986, 1987a-c, 1988) and his co-workers have given many results on both univariate and vector processes. Shin (1990) and Yap and Reinsel (1995a,b) give tests for a unit root in the autoregressive portion of an autoregressive moving average. Ahtola and Tiao (1984), Chan (1988), Chan and Wei (1987), and Phillips (1987c) have studied the behavior of estimators when ρ is close to one but not equal to one.
Example 10.1.1. In this example we analyze the one-year treasury bill interest rate for the period January 1960 through August 1979. This is one of three interest rate series studied by Stock and Watson (1988). The data are given in Table 10.B.1 of Appendix 10.B. We postulate the time series to be an autoregressive process, and we are interested in testing the hypothesis that the process has a unit root. In this example, we use the ordinary least squares method of fitting. If we fit a sixth order autoregressive process, the fitted equation is
Ŷ_t = 0.104 + 1.332 Y_{t−1} − 0.453 Y_{t−2} + 0.127 Y_{t−3}
     (0.064)  (0.067)       (0.111)       (0.115)

   + 0.043 Y_{t−4} − 0.079 Y_{t−5} + 0.014 Y_{t−6} ,
     (0.115)       (0.111)       (0.067)
and the regression residual mean square is 0.0833. There are 230 observations in the regression, and the residual mean square has 223 degrees of freedom. The coefficients for Y_{t−3}, Y_{t−4}, Y_{t−5}, and Y_{t−6} are small relative to the least squares standard errors. If we drop Y_{t−6} from the regression, the t-statistic for Y_{t−5} is −1.036. If we drop Y_{t−5} from the regression, the t-statistic for Y_{t−4} is −0.884, and if we drop Y_{t−4} from the regression, the t-statistic for Y_{t−3} is 1.427. By Theorem 8.2.1, the t-statistics are approximately normally distributed for a stationary process. By Theorem 10.1.4, the t-statistics are approximately normally distributed if the process contains a single unit root. Thus, the testing procedure is appropriate in either case. The F-test for the coefficients of Y_{t−4}, Y_{t−5}, and Y_{t−6}, computed as the sum of the squared t-statistics divided by 3, is 0.63. We compute the F-statistic in this way because different numbers of observations were used in the different regressions. The value of F is such that one easily accepts the hypothesis that the three coefficients are zero. The t-statistic for the coefficient of Y_{t−3} is such that one could either drop the coefficient or retain it in the analysis. We choose to retain the coefficient, and we proceed with the analysis using the third order model.
We are interested in testing for a unit root against the alternative of a stationary process, but we are willing to entertain the possibility of two unit roots. Dickey and Pantula (1987) suggest a sequential testing procedure in such situations. One begins with the greatest number of unit roots one is willing to consider. In our case, we begin by testing for a second unit root under the maintained hypothesis of a single unit root. We are willing to assume there is no intercept in the model for the differences. The estimated regression is
where the numbers in parentheses are the regression estimated standard errors. Because the τ̂-statistic τ̂ = (0.076)^{-1}(−0.763) = −10.09 is less than the 1% tabular value of −2.58 in the first part of Table 10.A.2, we reject the hypothesis of a second unit root.
The regression equation for the test of a unit root in the original time series is
ΔŶ_t = 0.082 − 0.012 Y_{t−1} + 0.343 ΔY_{t−1} − 0.094 ΔY_{t−2} .
      (0.062) (0.011)       (0.065)        (0.066)
Because the τ̂_μ statistic of −1.09 for the coefficient of Y_{t−1} is greater than the 10% tabular value of −2.57 from the second part of Table 10.A.2, we would accept the hypothesis of a unit root at that level. The analysis of these data is extended in Example 10.1.2. A
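The sequential order of testing used in the example can be sketched in Python on simulated data (our own illustration, not the treasury bill series): first studentize in the differenced series to test a second unit root, then test a single unit root in the levels.

```python
# Sketch (ours) of the Dickey-Pantula order of testing on simulated data.
import numpy as np

def tau_no_mean(x):
    """tau for the no-intercept regression of x_t on x_{t-1}."""
    xl, xc = x[:-1], x[1:]
    rho = np.sum(xl * xc) / np.sum(xl ** 2)
    s2 = np.sum((xc - rho * xl) ** 2) / (len(xc) - 1)
    return (rho - 1.0) / np.sqrt(s2 / np.sum(xl ** 2))

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(300))   # exactly one unit root
dy = np.diff(y)

tau_second = tau_no_mean(dy)   # step 1: second unit root; large negative
tau_first = tau_no_mean(y)     # step 2: single unit root; typically modest
```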
10.1.2. Random Walk with Drift
In this section, we obtain the limiting distribution of the least squares estimator of (θ₀, θ₁) for the model

Y_t = θ₀ + θ₁Y_{t−1} + e_t ,  t = 1, 2, ... ;  Y_t = Y₀ ,  t = 0 ,   (10.1.28)
under the assumption that θ₀ ≠ 0 and θ₁ = 1. The model (10.1.28) with θ₀ ≠ 0 is sometimes called a random walk with drift, where θ₀ is the drift parameter. Under (10.1.28),
Y_t = Y₀ + θ₀t + W_t ,   (10.1.29)
where W_t = Σ_{j=1}^t e_j. When θ₀ ≠ 0, the time trend will dominate the long-run behavior of Y_t of (10.1.29). This is because W_t = O_p(t^{1/2}) and hence W_t is small in probability relative to t. The least squares estimator of θ = (θ₀, θ₁)′ is θ̂ = (θ̂₀, θ̂₁)′, where
θ̂₁ = [ Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})² ]^{-1} Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})(Y_t − ȳ_{(0)}) ,
θ̂₀ = ȳ_{(0)} − θ̂₁ȳ_{(−1)} ,   (10.1.30)

where ȳ_{(i)} = (n − 1)^{-1} Σ_{t=2}^n Y_{t+i} ,  i = 0, −1 .
We give the limiting distribution of the least squares estimator under the assumption that θ₀ ≠ 0 and θ₁ = 1.
Theorem 10.1.5. Let the process Y_t satisfy (10.1.28), where {e_t} satisfies the conditions of Theorem 10.1.1 or is a sequence of independent identically distributed (0, σ²) random variables. Assume θ₀ ≠ 0 and θ₁ = 1. Let the least squares estimator be given by (10.1.30). Then the vector [n^{1/2}(θ̂₀ − θ₀), n^{3/2}(θ̂₁ − 1)] converges in distribution to a bivariate normal random variable.
Proof. We have

θ̂₁ − 1 = [ Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})² ]^{-1} Σ_{t=2}^n (Y_{t−1} − ȳ_{(−1)})e_t

and

Y_{t−1} − ȳ_{(−1)} = θ₀(t − t̄) + W_{t−1} − W̄_{(−1)} ,

where t̄ = 0.5(n + 1) and W̄_{(−1)} = (n − 1)^{-1} Σ_{t=2}^n W_{t−1}. By our previous results, the sum of squares of Y_{t−1} − ȳ_{(−1)} is dominated by the trend component, and n^{3/2}(θ̂₁ − 1) converges in distribution to a normal random variable. Similarly,

n^{1/2}(θ̂₀ − θ₀) →d N(0, 4σ²) .

The joint limiting distribution follows by Theorem 6.3.4. A
The regression estimated equation for the model (10.1.28) can be written in the form

ΔY_t = Δȳ + (θ̂₁ − 1)(Y_{t−1} − ȳ_{(−1)}) ,   (10.1.31)

introduced in (10.1.26). This form has the advantage that Δȳ is asymptotically normally distributed with mean θ₀ for all θ₀ when θ₁ = 1.
A time series of the form (10.1.28) with θ₀ ≠ 0 will tend, over time, to increase if θ₀ > 0 and to decrease if θ₀ < 0. A natural alternative model for a time series displaying such behavior is as the sum of a stationary time series and a time trend. Let
Y_t = φ₀ + φ₁t + X_t ,   (10.1.32)
where X_t is an autoregressive time series satisfying

X_t = θ₁X_{t−1} + Σ_{j=2}^p θ_j(X_{t−j+1} − X_{t−j}) + e_t .   (10.1.33)

By substituting (10.1.33) into (10.1.32), we obtain

Y_t = ψ₀ + ψ₁t + θ₁Y_{t−1} + Σ_{j=2}^p θ_j(Y_{t−j+1} − Y_{t−j}) + e_t ,   (10.1.34)

where ψ₀ = φ₀(1 − θ₁) + θ₁φ₁ − φ₁ Σ_{j=2}^p θ_j and ψ₁ = φ₁(1 − θ₁). Thus, one way to investigate the hypothesis that Y_t contains a unit root is to fit the regression model (10.1.34) by ordinary least squares and test the hypothesis that θ₁ = 1. The distribution of the test statistic differs from that obtained when time is not included in the regression.
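Fitting (10.1.34) with p = 1 amounts to the regression of Y_t on (1, t, Y_{t−1}). A Python sketch (our code, on simulated data; the trend and autoregressive parameters are our choices):

```python
# Sketch (ours): regress Y_t on (1, t, Y_{t-1}) and studentize theta1 - 1.
import numpy as np

def df_tau_trend(y):
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n - 1), np.arange(2, n + 1, dtype=float), y[:-1]])
    b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    resid = y[1:] - X @ b
    s2 = resid @ resid / (n - 1 - 3)
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[2], (b[2] - 1.0) / np.sqrt(cov[2, 2])

rng = np.random.default_rng(4)
n = 400
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + e[t]
y_trend_stat = 1.0 + 0.05 * np.arange(n) + x     # trend-stationary alternative
y_drift_walk = np.cumsum(rng.standard_normal(n)) + 0.05 * np.arange(n)

theta1_ts, tau_ts = df_tau_trend(y_trend_stat)   # expect strongly negative tau
theta1_rw, tau_rw = df_tau_trend(y_drift_walk)   # expect theta1 near one
```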
Theorem 10.1.6. Let Y_t satisfy (10.1.34), where {e_t}_{t=1}^∞ is such that

E{(e_t, e_t²) | 𝒜_{t−1}} = (0, σ²) a.s.,

E{|e_t|^{2+δ} | 𝒜_{t−1}} < M < ∞ a.s.

for some δ > 0 and t = 1, 2, ..., where 𝒜_t is the sigma-field generated by (e₁, e₂, ..., e_t). Assume θ₁ = 1.
Let (ψ̂₀, ψ̂₁, θ̂₁, θ̂₂, ..., θ̂_p) be the ordinary least squares regression coefficients, and let

τ̂_τ = [V̂{θ̂₁}]^{-1/2}(θ̂₁ − 1)

be the ordinary regression pivotal for θ̂₁. Then
τ̂_τ →d 0.5(G − H² − 3K²)^{-1/2}[(T − 2H)(T − 6K) − 1] ,

where

K = 2 ∫₀¹ (t − 0.5)W(t) dt ,

W(t) is the Wiener process, and T, G, and H are defined in Theorem 10.1.1 and Theorem 10.1.3. Also, the limiting distribution of n^{1/2}[(θ̂₂ − θ₂), (θ̂₃ − θ₃), ..., (θ̂_p − θ_p)] is the same as the limiting distribution of the least squares estimator obtained by regressing ΔX_t on ΔX_{t−1}, ΔX_{t−2}, ..., ΔX_{t−p+1}.
Proof. Omitted. See Dickey and Fuller (1979). A
In (10.1.34), we expressed Y_t as a function of the vector (1, t, Y_{t−1}, ΔY_{t−1}, ..., ΔY_{t−p+1}) and computed the estimates by ordinary least squares. A second estimation procedure that has been heavily used in practice is to adjust the observations by subtracting the mean function as estimated by ordinary least squares. The conclusions of Theorem 10.1.4 and of Theorem 10.1.6 also hold for the coefficients obtained in the regression of y_t on (y_{t−1}, Δy_{t−1}, ..., Δy_{t−p+1}), where y_t is the deviation of Y_t from the estimated mean function.
10.1.3. Alternative Estimators

In the stationary case, we presented a number of alternative estimators of the parameters of the autoregressive process. The maximum likelihood estimator associated with (8.1.9), the least squares estimator (8.2.6), and the weighted symmetric estimator associated with (8.2.14) all have the same limiting distribution when Y_t is a stationary autoregressive process. However, the limiting distributions differ if the process has a unit root.
We begin our discussion with the first order model and let

Y_t = ρY_{t−1} + e_t ,  t = 2, 3, ... ,   (10.1.35)

where the e_t are independent (0, σ²) random variables. The simple symmetric estimator of ρ minimizes (8.2.14) with w_t ≡ 0.5 and is

ρ̂_s = [ Σ_{t=2}^n 0.5(Y_{t−1}² + Y_t²) ]^{-1} Σ_{t=2}^n Y_{t−1}Y_t .   (10.1.36)
The error in the simple symmetric estimator is

ρ̂_s − ρ = [ 0.5 Σ_{t=2}^n (Y_{t−1}² + Y_t²) ]^{-1} Σ_{t=2}^n [ Y_{t−1}Y_t − 0.5ρ(Y_{t−1}² + Y_t²) ] ,   (10.1.37)

and if ρ = 1, the error reduces to

ρ̂_s − 1 = −[ (Y₁² + Y_n²) + 2 Σ_{t=2}^{n−1} Y_t² ]^{-1} Σ_{t=2}^n e_t² ,   (10.1.38)

where we have used (10.1.13).
An estimator of σ² associated with (10.1.36) is σ̂_s² = (n − 2)^{-1}Q_s(ρ̂_s), where Q_s is (8.2.14) with w_t ≡ 0.5. Then

σ̂_s² = (n − 2)^{-1}(1 − ρ̂_s²)[ 0.5(Y₁² + Y_n²) + Σ_{t=2}^{n−1} Y_t² ] .   (10.1.39)

If this estimator of variance is used to construct a pivotal statistic, the pivotal reduces to a function of ρ̂_s. Letting

τ̂_s = σ̂_s^{-1} [ 0.5(Y₁² + Y_n²) + Σ_{t=2}^{n−1} Y_t² ]^{1/2} (ρ̂_s − 1) ,   (10.1.40)

we have

τ̂_s = −(n − 2)^{1/2}(1 + ρ̂_s)^{-1/2}(1 − ρ̂_s)^{1/2} .   (10.1.41)
The estimator constructed with the weights of (8.2.15) is called the weighted symmetric estimator and can be written as

ρ̂_w = [ Σ_{t=2}^{n−1} Y_t² + n^{-1} Σ_{t=1}^n Y_t² ]^{-1} Σ_{t=2}^n Y_{t−1}Y_t .   (10.1.42)
The error in the weighted symmetric estimator when ρ = 1 is

ρ̂_w − 1 = [ Σ_{t=2}^{n−1} Y_t² + n^{-1} Σ_{t=1}^n Y_t² ]^{-1} [ 0.5(Y₁² + Y_n²) − 0.5 Σ_{t=2}^n e_t² − n^{-1} Σ_{t=1}^n Y_t² ] .   (10.1.43)
The pivotal statistic for the weighted symmetric estimator is

τ̂_w = σ̂_w^{-1} [ Σ_{t=2}^{n−1} Y_t² + n^{-1} Σ_{t=1}^n Y_t² ]^{1/2} (ρ̂_w − 1) ,   (10.1.44)

where σ̂_w² is the estimator of σ² constructed by dividing Q_w(ρ̂_w) by n − 2. The limiting distributions of the symmetric estimators follow from (10.1.38) and (10.1.43).
Theorem 10.1.7. Let the assumptions of Theorem 10.1.1 hold, and let ρ̂_s be defined by (10.1.36), τ̂_s by (10.1.40), ρ̂_w by (10.1.42), and τ̂_w by (10.1.44). Then

n(ρ̂_s − 1) →d −(2G)^{-1} ,  τ̂_s →d −0.5G^{-1/2} ,

n(ρ̂_w − 1) →d 0.5G^{-1}(T² − 1) − 1 ,

τ̂_w →d 0.5G^{-1/2}(T² − 1) − G^{1/2} ,

where G and T are defined in Theorem 10.1.1.
Proof. From (10.1.38) we have

n(ρ̂_s − 1) = −0.5[ n^{-2}σ^{-2} Σ_{t=2}^{n−1} Y_t² ]^{-1} + o_p(1) ,

and the first result follows from Theorem 10.1.1. The limiting distribution for τ̂_s is a consequence of (10.1.41).
Using (10.1.43) and the limiting distributions of n^{-2} Σ_{t=2}^{n−1} Y_t², n^{-1}Y_n², and n^{-1} Σ_{t=2}^n Y_{t−1}e_t derived in Theorem 10.1.1, we obtain the results for n(ρ̂_w − 1) and τ̂_w. A
The first part of Table 10.A.3 contains percentiles of the distribution of τ̂_s, and the first part of Table 10.A.4 contains the percentiles of the distribution of τ̂_w. The percentiles were generated using the Monte Carlo method. The percentiles for the limiting distributions were constructed by approximating the infinite sums that define (G, T, H, K) with finite sums. The raw Monte Carlo percentiles were smoothed using a function of n^{-1}. The standard errors of the estimates of Table 10.A.3 are generally less than 0.01, and those of Table 10.A.4 are generally less than 0.007.
The difference ρ̂_s − 1 is always negative, and this is reflected in the percentiles of Table 10.A.3. The percentiles of n(ρ̂_s − 1) are defined in terms of those of Table 10.A.3 by equation (10.1.41). The distribution of ρ̂_s has somewhat smaller tails than the distribution of the ordinary least squares estimator. For example, the distance between the 0.025 and 0.975 percentiles of the limiting distribution is 11.03 for ρ̂_s and is 12.10 for ρ̂. About 10% of the values of ρ̂_w are greater than one, and the 90th percentile of τ̂_w is close to zero.
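The simple and weighted symmetric estimators are short computations. A Python sketch (our code, simulated data), including a numerical check of the error identity underlying (10.1.38):

```python
# Sketch (ours) of the simple symmetric (10.1.36) and weighted symmetric
# (10.1.42) estimators, and the pivotal identity (10.1.41).
import numpy as np

def simple_symmetric(y):
    return np.sum(y[:-1] * y[1:]) / (0.5 * np.sum(y[:-1] ** 2 + y[1:] ** 2))

def weighted_symmetric(y):
    n = len(y)
    den = np.sum(y[1:-1] ** 2) + np.sum(y ** 2) / n
    return np.sum(y[:-1] * y[1:]) / den

rng = np.random.default_rng(5)
y = np.cumsum(rng.standard_normal(300))
n = len(y)
rho_s = simple_symmetric(y)
tau_s = -np.sqrt((n - 2) * (1.0 - rho_s) / (1.0 + rho_s))   # identity (10.1.41)
rho_w = weighted_symmetric(y)
# error identity behind (10.1.38), with e_t replaced by the differences
alt = -np.sum(np.diff(y) ** 2) / (y[0] ** 2 + y[-1] ** 2 + 2.0 * np.sum(y[1:-1] ** 2))
```

Because ρ̂_s − 1 is a negative ratio of sums of squares, ρ̂_s is always below one and τ̂_s is always negative.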
The distributions of the symmetric estimators are a function of other variables included in the regression model, as is the distribution of the ordinary least squares estimator. Let ŷ_{t,j} be an estimated "mean function" for the time series Y_t. Examples of ŷ_{t,j} are the sample mean

ŷ_{t,1} = ȳ = n^{-1} Σ_{s=1}^n Y_s ,

the linear regression trend function

ŷ_{t,2} = b₀ + b₁t ,   (10.1.45)
where (b₀, b₁) is the vector of regression coefficients obtained in the ordinary least squares regression of Y_t on (1, t), and the quadratic regression trend function

ŷ_{t,3} = b₀ + b₁t + b₂t² ,

where (b₀, b₁, b₂) is the vector of regression coefficients obtained in the ordinary least squares regression of Y_t on (1, t, t²).
The limiting distributions of the τ-statistics are given in Theorem 10.1.8. The limiting distributions of n(ρ̂_{gj} − 1), g = s, w and j = 1, 2, are obtained by squaring the denominators in the expressions. For example, the limiting distribution of n(ρ̂_{s1} − 1) is the distribution of −0.5(G − H²)^{-1}.
Theorem 10.1.8. Let the assumptions of Theorem 10.1.7 hold. Let ρ̂_{sj} be the estimator obtained by replacing Y_t with Y_t − ŷ_{t,j} in (10.1.36), and let τ̂_{sj} be the pivotal obtained by replacing Y_t with Y_t − ŷ_{t,j} in (10.1.40). Let ρ̂_{wj} and τ̂_{wj} be constructed in an analogous manner using (10.1.42) and (10.1.44), respectively. Then
τ̂_{s1} →d −0.5(G − H²)^{-1/2} ,

τ̂_{s2} →d −0.5(G − H² − 3K²)^{-1/2} ,

τ̂_{w1} →d (G − H²)^{-1/2}[0.5(T² − 1) − TH − G + 2H²] ,

τ̂_{w2} →d (G − H² − 3K²)^{-1/2}{ 0.5[(T − 2H)(T − 6K) − 1] + (H − 3K)² − (G − H² − 3K²) } ,
where G, T, H, and K are defined in Theorem 10.1.6.
Proof. The limiting distributions of the simple symmetric test statistics are obtained by replacing Y_t with Y_t − ŷ_{t,j} in (10.1.40), where ρ̂_s − 1 is given in (10.1.38), and the limiting distributions of the weighted symmetric test statistics are obtained by replacing Y_t with Y_t − ŷ_{t,j} in (10.1.44), where ρ̂_w − 1 is given in
(10.1.43). The limiting distributions of the test statistics then follow from the joint limiting distribution of the components. A
The distributions of the pivotal statistics for the mean adjusted case are given in the second parts of Tables 10.A.3 and 10.A.4. The distributions for the linear trend adjusted case are given in the third parts of the tables. In addition, the distribution for the simple symmetric pivotal is given for the quadratic trend adjusted case.
The distributions of the symmetric estimators generalize to higher order processes in the same manner as the ordinary least squares estimators. Table 10.1.1 contains a data arrangement that can be used for the weighted regression estimation of the autoregressive model written as

Y_t = θ₁Y_{t−1} + Σ_{i=2}^p θ_i Z_{t−i+1} + e_t ,

where Z_t = Y_t − Y_{t−1}. If the model contains a mean function, the procedure is adjusted as described in Section 8.2.2. The hypothesis of a unit root is tested by testing the hypothesis that θ₁ = 1.
Table 10.1.1. Data Arrangement for Regression Estimation of Autoregressive Parameters by the Weighted Symmetric Procedure

Weight     Dependent Variable   θ₁        θ₂               ...
w_{p+1}    Y_{p+1}              Y_p       Y_p − Y_{p−1}    ...
w_{p+2}    Y_{p+2}              Y_{p+1}   Y_{p+1} − Y_p    ...
...        ...                  ...       ...
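The data arrangement of Table 10.1.1 can be constructed mechanically. The Python sketch below (our code) builds the forward and time-reversed rows for an AR(p); the weights shown are illustrative placeholders, not the weights of (8.2.15).

```python
# Sketch (ours) of a Table 10.1.1-style arrangement; weights are placeholders.
import numpy as np

def ws_arrangement(y, p):
    """Forward rows regress Y_t on (Y_{t-1}, Z_{t-1}, ..., Z_{t-p+1});
    the same construction is then applied to the time-reversed series."""
    y = np.asarray(y, dtype=float)
    blocks = []
    for series in (y, y[::-1]):
        z = np.diff(series, prepend=np.nan)       # Z_t = Y_t - Y_{t-1}
        for t in range(p, len(series)):
            row = [series[t - 1]] + [z[t - j] for j in range(1, p)]
            w = 0.5                                # illustrative, not (8.2.15)
            blocks.append((w, series[t], row))
    wts = np.array([b[0] for b in blocks])
    dep = np.array([b[1] for b in blocks])
    X = np.array([b[2] for b in blocks])
    return wts, dep, X

rng = np.random.default_rng(6)
y = np.cumsum(rng.standard_normal(50))
w, d, X = ws_arrangement(y, p=2)     # 2 * (50 - 2) stacked rows
```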
Theorem 10.1.9. Let the assumptions of Theorem 10.1.2 hold. Let w_{ts} ≡ 0.5, and let w_{tw} be defined in (8.2.15). Let θ̂_g be the value of θ that minimizes

Σ_{j=1}^2 Σ_{t=p+1}^n w_{tg}( Y_{t,j} − θ₁Y_{t−1,j} − θ₂Z_{t−1,j} − ··· − θ_pZ_{t−p+1,j} )² ,

where j indexes the two data arrangements (original and time-reversed) of Table 10.1.1.
Proof. Omitted. A
The estimator of ρ for the autoregressive process (10.1.35) constructed by maximizing the normal stationary likelihood has a different limiting distribution than either that of ordinary least squares or those of the symmetric estimators when ρ = 1. The limiting distribution of that estimator, which we call the maximum likelihood estimator, has been given by Gonzalez-Farias (1992).
Theorem 10.1.10. Let the assumptions of Theorem 10.1.7 hold. Let ρ̂_{ml} be the value of ρ that maximizes

L(Y; ρ, σ²) = −0.5n log 2π − 0.5n log σ² + 0.5 log(1 − ρ²)
  − (2σ²)^{-1}[ (1 − ρ²)Y₁² + Σ_{t=2}^n (Y_t − ρY_{t−1})² ] .   (10.1.48)
Then
n(ρ̂_{ml} − 1) →d 0.25G^{-1}{ (T² − 1) − [ (T² − 1)² + 8G ]^{1/2} } ,
where G and T are defined in Theorem 10.1.1.
Proof. From (8.1.10), the cubic equation defining (ρ − 1) = δ can be written

g_n(δ) = δ³ + (3 + c₁)δ² + (3 + 2c₁ + c₂)δ + (1 + c₁ + c₂ + c₃) = 0 ,   (10.1.49)

where c₁ = −ρ̂_w, c₃ = (n − 2)^{-1}nρ̂_w, ρ̂_w is the weighted symmetric estimator (10.1.42), and c₂ is the remaining coefficient determined by (8.1.10).
We observe that

3 + c₁ = 3 − ρ̂_w = 2 + O_p(n^{-1}) ,

3 + 2c₁ + c₂ = −2(ρ̂_w − 1 + n^{-1}) + O_p(n^{-2}) ,

1 + c₁ + c₂ + c₃ = 2n^{-1}(ρ̂_w − 1 + n^{-1}) − γ̃_n + O_p(n^{-3}) .
It follows that the roots of (10.1.49) converge in probability to 0, 0, and −2. Hence, the root in (−2, 0) is converging in probability to zero. Also,

n²γ̃_n →d G^{-1}T² ,

n(ρ̂_w − 1) →d G^{-1}[0.5(T² − 1) − G] ,

n²(1 + c₁ + c₂ + c₃) →d −G^{-1} .
Let

g_{0n}(δ) = a_{0n}δ² + a_{1n}δ + a_{2n} ,

where a_{0n} = 3 + c₁, a_{1n} = 3 + 2c₁ + c₂, and a_{2n} = 1 + c₁ + c₂ + c₃. Because n²a_{2n} →d −G^{-1}, the roots of g_{0n}(δ) = 0 are real and of opposite sign with probability approaching one as n increases. Letting δ_n denote the negative root of g_{0n}(δ) = 0, we have
nδ_n = n(2a_{0n})^{-1}[ −a_{1n} − ( a_{1n}² − 4a_{0n}a_{2n} )^{1/2} ]

= 0.5n( ρ̂_w − 1 + n^{-1} − [ (ρ̂_w − 1 − n^{-1})² + 2γ̃_n − 4n^{-2} ]^{1/2} ) + O_p(n^{-1})

→d 0.25G^{-1}{ (T² − 1) − [ (T² − 1)² + 8G ]^{1/2} } .   (10.1.50)
Also, g_n(δ) = g_{0n}(δ) + δ³ and δ_n = O_p(n^{-1}), so that the cubic term is negligible at δ_n for large n. It follows that n(δ_{mn} − δ_n) →p 0, where δ_{mn} = ρ̂_{ml} − 1. Therefore, the limiting distribution of nδ_{mn} is the same as the limiting distribution of nδ_n. A
From (10.1.50), we see that the value of ρ̂_{ml} − 1 is close to the value of ρ̂_w − 1 unless ρ̂_w − 1 is positive, or unless ρ̂_w − 1 is negative and 2(n²γ̃_n − 2) is large relative to ρ̂_w − 1. The limiting distribution of 2(n²γ̃_n − 2) is the distribution of 2G^{-1}(T² − 2G), where E{T² − 2G} = 0. Empirical studies show that the 5% and 10% points of the distribution of n(ρ̂_{ml} − 1) are very similar to those of the distribution of n(ρ̂_w − 1).
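The maximum likelihood estimator of Theorem 10.1.10 can be computed by direct search. The Python sketch below (our code, simulated data) profiles σ² out of (10.1.48) and maximizes over ρ on a grid; the grid limits and resolution are our choices.

```python
# Sketch (ours): maximize the stationary likelihood (10.1.48) over rho,
# with sigma^2 profiled out analytically.
import numpy as np

def profile_loglik(y, rho):
    n = len(y)
    ss = (1.0 - rho ** 2) * y[0] ** 2 + np.sum((y[1:] - rho * y[:-1]) ** 2)
    s2 = ss / n                                   # maximizing value of sigma^2
    return -0.5 * n * np.log(2.0 * np.pi * s2) + 0.5 * np.log(1.0 - rho ** 2) - 0.5 * n

def ml_rho(y):
    grid = np.linspace(-0.999, 0.999, 1999)
    r0 = grid[int(np.argmax([profile_loglik(y, r) for r in grid]))]
    fine = np.linspace(max(-0.999, r0 - 0.002), min(0.999, r0 + 0.002), 2001)
    return fine[int(np.argmax([profile_loglik(y, r) for r in fine]))]

rng = np.random.default_rng(7)
y_unit = np.cumsum(rng.standard_normal(200))      # rho = 1 data
e = rng.standard_normal(200)
y_stat = np.empty(200)
y_stat[0] = e[0]
for t in range(1, 200):
    y_stat[t] = 0.5 * y_stat[t - 1] + e[t]

rho_ml_unit = float(ml_rho(y_unit))               # strictly below one
rho_ml_stat = float(ml_rho(y_stat))
```

The log(1 − ρ²) term forces the stationary-likelihood estimate below one even when the data contain a unit root.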
If μ is unknown and the estimator of (ρ, μ, σ²) is obtained by maximizing the likelihood (8.1.5), the limiting distribution of the estimator of ρ differs from that of Theorem 10.1.10. Let ρ̂_{mμ} denote the estimator of ρ that maximizes the likelihood (8.1.5). If Y_t satisfies the assumptions of Theorem 10.1.7, Gonzalez-Farias (1992) has shown that

n(ρ̂_{mμ} − 1) →d ζ ,

where ζ is the negative root of

a₄ζ⁴ + a₃ζ³ + a₂ζ² + a₁ζ + a₀ = 0 ,

[a₄, a₃, a₂] = [ −4, −T² + 1 − 2TH + H² − 8(G − H²), 2(G − H²) ] ,

a₁ = −4T² + 8 + 8TH − 8H² + 2(T − 2H)² ,

a₀ = 8(G − H²) + 4T − 5 − 8TH + H² ,

and G, H, and T are defined in Theorem 10.1.6. Gonzalez-Farias (1992) also showed empirically that the limiting distribution of the estimator of ρ that maximizes (8.1.5) is nearly indistinguishable from the limiting distribution of the estimator that maximizes the known mean likelihood (10.1.48) with μ replaced by ȳ.
Corollary 10.1.10. Let the assumptions of Theorem 10.1.7 hold. Let ρ̂_{m1} be the value of ρ that maximizes (10.1.48) with y_{t,1} = Y_t − ŷ_{t,1} = Y_t − ȳ replacing Y_t, and let ρ̂_{m2} be the value of ρ that maximizes (10.1.48) with y_{t,2} = Y_t − ŷ_{t,2} replacing Y_t, where ŷ_{t,2} is the ordinary least squares estimator of the linear trend function. Then, for j = 1, 2, n(ρ̂_{mj} − 1) converges in distribution to a limit expressed in terms of g_j and q_j,
where g_j is the limiting distribution of n(ρ̂_{wj} − 1), obtained by multiplying the distribution of τ̂_{wj} of Theorem 10.1.8 by (G − H²)^{-1/2} for j = 1 and by (G − H² − 3K²)^{-1/2} for j = 2,

q₁ = (G − H²)^{-1}[ H² + (T − H)² ] ,

and

q₂ = (G − H² − 3K²)^{-1}[ (H − 3K)² + (T − H − 3K)² ] .

Proof. The results follow from (10.1.50). A
A pivotal statistic for the test of ρ = 1 can be constructed by dividing ρ̂_{ml} − 1 by the estimated standard error of ρ̂_{ml}. For higher order processes, the test is constructed by testing the hypothesis that the sum of the autoregressive coefficients is one.
There are several methods available for computing the estimated covariance matrix of the estimated coefficients. One procedure is to write the concentrated log likelihood of (8.4.5) in the form
$$\sum_{t=1}^{n} g_t^2(Y; \theta), \qquad (10.1.51)$$
where $g_t(Y; \theta) = c(\theta)z_t(Y; \theta)$, the $z_t(Y; \theta)$ are uncorrelated $(0, \sigma^2)$ random variables when the function is evaluated at the true $\theta$, and $c(\theta)$ is a constant depending on $\theta$. Then the maximum likelihood estimators are obtained by minimizing (10.1.51), and an estimator of the covariance matrix of the maximum likelihood estimator $\hat\theta$
is
$$\hat V\{\hat\theta\} = \hat\sigma^2\Big[\sum_{t=1}^{n} h_t(Y; \hat\theta)\,h_t'(Y; \hat\theta)\Big]^{-1}, \qquad (10.1.52)$$
where $h_t(Y; \hat\theta)$ is the vector of derivatives of $g_t(Y; \theta)$ with respect to $\theta$ evaluated at the maximum likelihood estimator,
$$\hat\sigma^2 = (n - r)^{-1}\sum_{t=1}^{n} z_t^2(Y; \hat\theta) \qquad (10.1.53)$$
is the maximum likelihood estimator of $\sigma^2$ adjusted for degrees of freedom, and $r$ is the number of parameters estimated.
Table 10.A.5 contains the percentiles of the test statistics based on the maximum likelihood estimator and the estimated covariance matrix (10.1.52). This table was constructed by the same procedures used to construct Table 10.A.4. We have presented a number of test statistics for the unit root problem. The ordinary least squares estimators are easy to compute and can be computed with an
UNIT ROOT AUTOREGRESSIVE TIME SERIES 577
ordinary regression program. They may be used in exploratory procedures even when other programs are available. The simple symmetric estimator is next in ease of computation and can also be computed with an ordinary least squares regression program.
If the objective is to test the hypothesis of a unit root against the alternative of a stationary process with unknown mean, the maximum likelihood and weighted symmetric procedures are recommended. Monte Carlo studies have demonstrated that these two procedures have nearly identical power against the stationary unknown mean alternative and that they are much more powerful than ordinary least squares in that case. The power of the test based on the simple symmetric estimator falls between that of ordinary least squares and that of maximum likelihood, being closer to that of maximum likelihood. See, for example, Pantula, Gonzalez-Farias, and Fuller (1994) and Park and Fuller (1995).
If a test for a unit root against the alternative of a model with known mean is desired, the ordinary least squares procedure is recommended. See Park and Fuller (1995).
Example 10.1.2. We continue the analysis of the one-year treasury bill interest rate begun in Example 10.1.1. The simple symmetric estimator of the third order process is
$$\Delta\hat Y_t = 0.106 - 0.020\,Y_{t-1} + 0.354\,\Delta Y_{t-1} - 0.089\,\Delta Y_{t-2}.$$
  (0.062)  (0.011)  (0.065)  (0.066)
The regression equation was computed using an ordinary regression program and the data arrangement of Table 10.1.1 with an intercept included in the regression. The estimated error variance is 0.083. The value of the statistic for a test of a unit root is -1.76, which is greater than the 0.10 tabular value of -2.34 given in Table 10.A.3.
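A test regression of this type can be computed with any least squares routine. The sketch below (our function; ordinary unweighted least squares rather than the symmetric arrangements of this section) regresses $\Delta Y_t$ on an intercept, $Y_{t-1}$, and lagged differences and returns the $t$-ratio of the $Y_{t-1}$ coefficient; the result is referred to tables such as Table 10.A.3, not to Student's $t$:

```python
import numpy as np

def unit_root_tau(y, p=3):
    """t-ratio of the Y_{t-1} coefficient in the regression of dY_t on
    (1, Y_{t-1}, dY_{t-1}, ..., dY_{t-p+1}).  Sketch only; compare the
    value with unit root percentile tables, not with Student's t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                     # dy[t-1] = Y_t - Y_{t-1}
    rows, resp = [], []
    for t in range(p, len(y)):
        lagged = [dy[t - 1 - j] for j in range(1, p)]   # dY_{t-1}, ...
        rows.append([1.0, y[t - 1]] + lagged)
        resp.append(dy[t - 1])                          # dY_t
    X = np.array(rows)
    z = np.array(resp)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    s2 = resid @ resid / (len(z) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])
```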
If we use the weights of (8.2.15), we obtain
$$\Delta\hat Y_t = 0.090 - 0.015\,Y_{t-1} + 0.360\,\Delta Y_{t-1} - 0.092\,\Delta Y_{t-2}$$
  (0.062)  (0.011)  (0.065)  (0.066)
as the weighted symmetric least squares estimated equation. The test statistic for a unit root is -1.36, which is greater than the 0.10 tabular value of -2.23 given in Table 10.A.4.
The maximum likelihood estimates for the third order process are
$$(\hat\alpha_1, \hat\alpha_2, \hat\alpha_3) = (1.3436,\ -0.4510,\ 0.0915),$$
  (0.0650)  (0.1053)  (0.0656)
where we used procedure ARIMA of SAS/ETS to compute the maximum likelihood estimator. This program uses the Marquardt algorithm and equation (10.1.52) to compute the estimated covariance matrix of the coefficients. The sum
of the estimated autoregressive coefficients is 0.9841, and the estimated standard error of the sum of the coefficients is 0.0119. Therefore, the maximum likelihood pivotal for testing the hypothesis of a unit root is -1.34. The computed value is greater than the 0.10 tabular value of -2.32 given in the second part of Table 10.A.5.
In this example, all procedures lead to essentially the same conclusions. As stated in the text, the weighted symmetric and maximum likelihood procedures are the recommended procedures when the mean is estimated. In this example, and in the majority of practical situations, the weighted symmetric estimator and the maximum likelihood estimator will both reject the hypothesis of a unit root or both accept the hypothesis when the test is against the stationary alternative. AA
From the tabled distribution of $n(\hat\rho - 1)$ of Table 10.A.1, it is clear that the least squares estimator of the unit root parameter of an autoregressive process is biased. Also, the estimator of the autoregressive parameter is biased in the stationary case. See Section 8.2. In the unit root case, the bias is of order $n^{-1}$, the same order as the standard deviation of the estimator.
There are several possible situations associated with the use of data to investigate an autoregressive process. One may be willing, on the basis of subject matter knowledge, to specify that the process is stationary. Then the use of an estimation procedure, such as maximum likelihood based on the stationary likelihood, that restricts all roots of the process to be less than one in absolute value, is appropriate. In a second situation, one may have strong theoretical reasons for believing a unit root is present. If so, one may choose to test for a unit root and, if the test is consistent with the hypothesis of a unit root, estimate the process subject to the restriction of a unit root. In the third situation, a unit root is a distinct possibility, but not to the extent that it is a model to be maintained unless rejected by the data. In this third case, it is natural to adopt an estimator that is less biased in the vicinity of the unit root than are the ordinary estimators.
Assume that the model is a $p$th order autoregressive process. Assume we wish to restrict the estimator of the largest root to the interval $(-1, 1]$. Under the assumption that only a single root of one is possible and that there is no drift if a root is equal to one, an estimation procedure is the following.
1. Estimate the model
$$Y_t = \theta_0 + \theta_1 Y_{t-1} + \sum_{j=2}^{p} \theta_j\,\Delta Y_{t-j+1} + e_t \qquad (10.1.54)$$
by weighted symmetric least squares, and compute the statistic $\hat\tau_{w1}$ for the test that $\theta_1 = 1$. This can be done by fitting the weighted least squares regression of $y_t$ on $(y_{t-1}, \Delta y_{t-1}, \ldots, \Delta y_{t-p+1})$, where $y_t = Y_t - \bar y_n$.
2. Construct a new estimator of $\theta_1$,
$$\tilde\theta_1 = \hat\theta_{w1} + c(\hat\tau_{w1})\,[\hat V\{\hat\theta_{w1}\}]^{0.5}, \qquad (10.1.55)$$
UNIT ROOT AUTOREGRESSIVE TIME SERIES 579
where $\hat\theta_{w1}$ is the estimator from step 1,
$$c(\hat\tau_{w1}) = \begin{cases} -\hat\tau_{w1} & \text{if } \hat\tau_{w1} \ge -1.2, \\ 0.035672(\hat\tau_{w1} + 7.0)^2 & \text{if } -7.00 < \hat\tau_{w1} < -1.2, \\ 0 & \text{if } \hat\tau_{w1} \le -7.00, \end{cases}$$
and $\hat V\{\hat\theta_{w1}\}$ is the estimated variance of the weighted least squares estimator.
3. Estimate $(\theta_2, \ldots, \theta_p)$ by the regression of $y_t - \tilde\theta_1 y_{t-1}$ on $\Delta y_{t-j}$, $j = 1, 2, \ldots, p - 1$. Denote the estimators by $(\tilde\theta_2, \ldots, \tilde\theta_p)$.
In the presence of a unit root ($\theta_1 = 1$), the outlined estimation procedure is approximately median unbiased for $\theta_1$ because the median of $\hat\tau_{w1}$ is about $-1.20$ when $\theta_1 = 1$. Of course, if $\tilde\theta_1$ is restricted to the interval $(-1, 1]$, it is not possible to construct a reasonable estimator of $\theta_1$ that is mean unbiased for all $\theta_1 \in (-1, 1]$. The function $c(\hat\tau_{w1})$ of (10.1.55) was chosen to be a smooth function of $\hat\tau_{w1}$ with value 1.20 at $\hat\tau_{w1} = -1.20$. The modified estimator differs from the weighted least squares estimator if $\hat\tau_{w1} > -7.00$. This set of $\hat\tau_{w1}$ corresponds, approximately, to $\hat\rho_w > (n + 49)^{-1}(n - 49)$. The weighted least squares estimator $\hat\rho_w$ is biased for most values of $\theta_1$, but the bias is small relative to the standard error for $\theta_1$ considerably less than one and $n$ large.
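Step 2 of the procedure is easy to state in code. A minimal sketch of (10.1.55), with function names ours; the piecewise function is continuous, since $0.035672(5.8)^2 \approx 1.20$:

```python
def c_weight(tau):
    """The smoothing function c(tau_w1) of (10.1.55): equal to -tau for
    tau >= -1.2, a quadratic on (-7.0, -1.2), and zero below -7.0."""
    if tau >= -1.2:
        return -tau
    if tau > -7.0:
        return 0.035672 * (tau + 7.0) ** 2
    return 0.0

def modified_theta1(theta_w1, tau_w1, var_theta_w1):
    """Step 2: the weighted symmetric estimate plus c(tau) times the
    estimated standard error of that estimate."""
    return theta_w1 + c_weight(tau_w1) * var_theta_w1 ** 0.5
```

With $\hat\tau_{w1} = -1.36$ and a standard error near 0.011, as in Example 10.1.3 below, the adjustment is roughly $1.13 \times 0.011 \approx 0.013$.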
The empirical properties of the weighted symmetric estimator and the estimator (10.1.55) are compared for the first order process in Table 10.1.2. The true model is
$$Y_t = \rho Y_{t-1} + e_t,$$
where $e_t \sim \mathrm{NI}(0, 1)$. The generated process is stationary if $\rho < 1$, and $Y_0 = 0$ if $\rho = 1$. The weighted least squares estimator, denoted by $\hat\rho_w$, is defined by (10.1.42) with $Y_t$ replaced by $Y_t - \bar y$, provided the estimator is less than one, and is equal to one otherwise. The entries in the table are based on 3000 samples.
The estimator $\tilde\rho$ defined by (10.1.55) has a uniformly smaller median bias than the weighted least squares estimator $\hat\rho_w$. The mean bias is also uniformly smaller. However, the empirical variance of $\hat\rho_w$ is smaller than that of $\tilde\rho$ for $\rho$ differing from one by more than $8n^{-1}$, and the mean square error of $\hat\rho_w$ is smaller than that of $\tilde\rho$ for $\rho$ differing from one by more than four standard errors of the estimator. In those situations, the mean square error of $\tilde\rho$ exceeds that of $\hat\rho_w$ by modest percentages. The largest percentage difference in the table is about 8%. On the other hand, for all three values of $n$, the modified estimator has a mean square error less than one half that of $\hat\rho_w$ for $\rho = 1$. For a fixed $\rho$, such as 0.75, the mean square error multiplied by $n$ declines as $n$ increases for both estimators, primarily because of the decrease in bias.
The last two columns of Table 10.1.2 contain the empirical 2.5 percentiles of the statistics
Table 10.1.2. Empirical Properties of Estimator (10.1.55) and Weighted Least Squares Estimator for First Order Process

            Median              n MSE               2.5 Percentile
  rho     rho_w    rho~      rho_w    rho~        tau_w    tau~_1

n = 50
  0.15    0.122    0.126     1.046    1.127       -2.25    -2.25
  0.50    0.456    0.491     0.998    1.064       -2.36    -2.31
  0.75    0.695    0.757     0.857    0.748       -2.52    -2.28
  0.85    0.791    0.860     0.782    0.567       -2.58    -2.22
  0.90    0.838    0.909     0.747    0.467       -2.64    -2.20
  0.95    0.884    0.955     0.720    0.357       -2.71    -2.19
  0.98    0.912    0.981     0.710    0.309       -2.82    -2.25
  1.00    0.933    1.000     0.728    0.311       -2.83    -2.22

n = 100
  0.50    0.479    0.483     0.836    0.887       -2.13    -2.13
  0.75    0.724    0.747     0.630    0.624       -2.27    -2.16
  0.85    0.822    0.855     0.521    0.452       -2.38    -2.13
  0.90    0.870    0.905     0.459    0.352       -2.45    -2.11
  0.95    0.919    0.956     0.395    0.241       -2.57    -2.10
  0.98    0.948    0.984     0.363    0.170       -2.71    -2.15
  0.99    0.957    0.992     0.360    0.156       -2.75    -2.16
  1.00    0.965    1.000     0.379    0.159       -2.80    -2.17

n = 200
  0.75    0.738    0.742     0.562    0.590       -2.29    -2.28
  0.85    0.836    0.847     0.421    0.421       -2.33    -2.25
  0.90    0.886    0.901     0.342    0.313       -2.40    -2.20
  0.95    0.935    0.953     0.262    0.199       -2.52    -2.18
  0.98    0.964    0.983     0.216    0.125       -2.69    -2.21
  0.99    0.974    0.992     0.204    0.101       -2.74    -2.19
  1.00    0.983    1.000     0.203    0.091       -2.80    -2.17
and
where $\hat\rho_w$ is the weighted symmetric estimator and $\tilde\rho$ is the estimator (10.1.55). Because $\tilde\rho$ has small median bias, the pivotal statistic $\tilde\tau_1$ has a median close to zero for all values of $\rho$. Because the 2.5 percentile for $\hat\tau_w$ is $-2.80$ for $n = 200$ and $\rho = 1$, the 2.5
percentile of $\tilde\tau_1$ is $-2.17$ for $n = 200$ and $\rho = 1$. The estimator $\hat\rho_w$ has a negative bias for positive $\rho$, and the 2.5 percentile is less than $-2.0$ for $\rho$'s that are a considerable distance from one. The percentiles for $\hat\tau_w$ show a steady decrease to about $-2.8$ at $\rho = 1.00$. The empirical 97.5 percentiles of $\tilde\tau_1$ and $\hat\tau_w$ are compared in Table 10.1.3. The percentiles of $\hat\tau_w$ decline as $\rho$ moves towards one. The maximum value of the 97.5 percentile for $\tilde\tau_1$ is approximately equal to 2.30 for $1 - \rho = 12n^{-1}$. At least 2.5% of the estimates $\tilde\rho$ are equal to one for all three sample sizes when $1 - \rho = 6n^{-1}$. The majority of the 97.5 percentiles of $\tilde\tau_1$ fall below 2.30, and the majority of the 2.5 percentiles are above $-2.30$. Therefore, the interval
$$(\tilde\rho - 2.30\,\hat\sigma\{\tilde\rho\},\ \tilde\rho + 2.30\,\hat\sigma\{\tilde\rho\}) \qquad (10.1.56)$$
will have coverage greater than 95% for most parameter values and for sample sizes in the 50 to 200 range. Because $\tilde\rho$ is restricted to $(-1, 1]$, the actual confidence interval is the intersection of (10.1.56) and $(-1, 1]$.
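A sketch of the resulting interval rule, with the 2.30 multiplier from the discussion above and the intersection with $(-1, 1]$ (function name and the numbers in the test are ours):

```python
def unit_root_confint(rho_mod, se_rho, crit=2.30):
    """Interval (10.1.56) intersected with (-1, 1]: the modified estimate
    plus or minus crit standard errors, clipped so the upper end never
    exceeds one and the lower end never falls below minus one."""
    lower = max(rho_mod - crit * se_rho, -1.0)
    upper = min(rho_mod + crit * se_rho, 1.0)
    return lower, upper
```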
Example 10.1.3. We continue the analysis of Example 10.1.2 under the assumption that the largest root of the autoregressive process is in the interval $(-1, 1]$. The statistic $\hat\tau_{w1}$ from Example 10.1.2 is $-1.36$. The modified estimator of (10.1.55) is
$$\tilde\theta_1 = \hat\theta_{w1} + 0.013 = 0.997.$$
Regressing $y_t - 0.997y_{t-1}$ on $\Delta y_{t-1}$ and $\Delta y_{t-2}$, we obtain the estimated equation
$$\hat y_t = 0.997\,y_{t-1} + 0.355\,\Delta y_{t-1} - 0.103\,\Delta y_{t-2},$$
  (0.013)  (0.065)  (0.066)
where $y_t = Y_t - \bar y_n$. The standard errors for the coefficients of $\Delta y_{t-1}$ and $\Delta y_{t-2}$ are those computed for the original weighted regression of Example 10.1.2. The standard error for the coefficient of $y_{t-1}$ is the standard error from Example 10.1.2 increased by 15%. With these standard errors, normal tables can be used for approximate tests and confidence intervals for the parameters of the equation. An
Table 10.1.3. Empirical 97.5 Percentiles of tau_w and tau~_1

            n = 50           n = 100          n = 200
  1 - rho   tau_w   tau~_1   tau_w   tau~_1   tau_w   tau~_1
  24/n      1.49    2.02     1.58    2.17     1.55    2.17
  12/n      1.26    2.27     1.34    2.29     1.36    2.32
   6/n      1.04    2.04*    1.11    2.08*    1.05    2.06*

* More than 2.5% of the estimates equal one.
approximate 95% confidence interval for $\theta_1$ under the assumption that the largest root is in $(-1, 1]$ is $[0.971, 1]$.
The characteristic equation associated with the modified estimates is
$$m^3 - 1.352m^2 + 0.458m - 0.103 = 0,$$
and the largest root is 0.996. Using a Taylor expansion and the estimated covariance matrix with the variance of the coefficient of $y_{t-1}$ increased by 15%, the standard error of the largest root is 0.013. An approximate 95% confidence interval for the largest root is $[0.970, 1]$. AA
10.1.4. Prediction for Unit Root Autoregressions
It is an important result that prediction procedures appropriate for stationary time series are also appropriate for time series with an autoregressive unit root. As in the stationary case, estimation of the parameters adds a term that is $O_p(n^{-1/2})$ to the prediction error when the autoregression contains a unit root.
Theorem 10.1.11. Let $Y_t$ satisfy
$$Y_t = \alpha_0 + \theta_1 Y_{t-1} + \sum_{j=2}^{p} \theta_j (Y_{t-j+1} - Y_{t-j}) + e_t,$$
where $\theta_1 = \sum_{i=1}^{p} \alpha_i$, $\theta_j = -\sum_{i=j}^{p} \alpha_i$ for $j = 2, 3, \ldots, p$, and the $e_t$ are iid$(0, \sigma^2)$ random variables. Let $Y_1, Y_2, \ldots, Y_p$ be fixed, and let $\hat\alpha$ and $\hat\theta$ be the least squares estimators of $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_p)'$ and $\theta = (\theta_0, \theta_1, \ldots, \theta_p)'$. Assume that one of the roots of the characteristic equation is one and that the other roots are less than one in absolute value. Let $\hat Y_{n+s}$ be the predictor of $Y_{n+s}$ based upon $\hat\alpha$. Then
$$Y_{n+s} - \hat Y_{n+s} = \sum_{j=0}^{s-1} w_j e_{n+s-j} + O_p(n^{-1/2}),$$
where the $w_j$ satisfy the difference equation
subject to the initial conditions $w_0 = 1$ and
Proof. By Theorems 10.1.4 and 10.1.5, $\hat\alpha - \alpha = O_p(n^{-1/2})$ and
It follows that
because $\Delta Y_t = O_p(1)$, $Y_t = O_p(n^{1/2})$ if $\alpha_0 = 0$, and $Y_t = O_p(n)$ if $\alpha_0 \ne 0$. Repeated substitution into the expression
yields the result for s > 1. A
It follows from Theorem 10.1.11 that the estimator of the prediction variance given in (8.5.14) for the stationary process can also be used for a process with a unit root.
Example 10.1.4. By Theorem 8.5.1 and Theorem 10.1.11, the prediction variance formula that ignores the effect of estimation error can be used for models whose autoregressive part has roots less than or equal to one in absolute value. In Examples 10.1.1 and 10.1.2, we found the behavior of the one-year treasury bill interest rate to be consistent with the hypothesis of a unit root. Least squares or maximum likelihood can be used to construct predictors that are appropriate whether or not the process has a unit root.
The one-, two-, and three-period forecasts computed with the third order process estimated by stationary maximum likelihood are 9.01, 8.95, and 8.87, respectively. The standard errors estimated using the approximation that ignores the estimation error are 0.291, 0.486, and 0.626 for the one-, two-, and three-period forecasts. The corresponding approximate standard errors computed by the least squares prediction formulas are 0.292, 0.493, and 0.641. The least squares standard errors were computed by fitting the model with three explanatory variables for the three predictions as described in Example 8.5.1. AA
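Multi-step forecasts in examples like this one come from the usual AR recursion with earlier forecasts substituted for unobserved values; a sketch (function name and the coefficients in the test are ours; `coefs[0]` is the lag-one coefficient):

```python
import numpy as np

def ar_forecast(y, coefs, intercept=0.0, steps=3):
    """Multi-step forecasts from a fitted AR(p):
    Yhat_{n+s} = intercept + sum_j coefs[j] * Y_{n+s-1-j},
    with forecasts substituted for unobserved future values.
    The recursion is the same whether or not a root equals one."""
    hist = list(np.asarray(y, dtype=float))
    out = []
    for _ in range(steps):
        yhat = intercept + sum(c * hist[-1 - j] for j, c in enumerate(coefs))
        hist.append(yhat)
        out.append(yhat)
    return out
```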
10.2. EXPLOSIVE AUTOREGRESSIVE TIME SERIES
We begin our study of explosive autoregressive time series with the first order process. Let
$$V_t = \begin{cases} \theta_1 V_{t-1} + e_t, & t = 1, 2, \ldots, \\ 0, & t = 0, \end{cases} \qquad (10.2.1)$$
where $|\theta_1| > 1$ and $\{e_t\}_{t=1}^{\infty}$ is a sequence of i.i.d. random variables with mean zero and positive variance $\sigma^2$. Equation (10.2.1) is understood to mean that $V_t$ is created by adding $e_t$ to $\theta_1 V_{t-1}$. As with other autoregressive processes, repeated substitution enables us to express $V_t$ as a weighted average of the $e_t$,
$$V_t = \sum_{i=0}^{t-1} \theta_1^{i}\, e_{t-i}. \qquad (10.2.2)$$
It follows that, for $|\theta_1| > 1$, the variance of $V_t$ is
$$V\{V_t\} = (\theta_1^2 - 1)^{-1}(\theta_1^{2t} - 1)\sigma^2 \qquad (10.2.3)$$
and the variance increases exponentially in $t$. In expression (10.2.2), the term in $e_1$ is $\theta_1^{t-1}e_1$. When $|\theta_1| > 1$, the weight $\theta_1^{t-1}$ increases exponentially as $t$ increases. Hence, the variance contribution from $e_1$ is of order $\theta_1^{2t}$. If $|\theta_1| < 1$, the weight applied to the first observation declines to zero as $t$ increases. If $|\theta_1| = 1$, the weight of the first observation becomes small relative to the standard deviation of the process. To further understand the importance of the first few $e_t$ in the model with $|\theta_1| > 1$, we observe that
$$V_t = \theta_1^{t} \sum_{i=1}^{t} \theta_1^{-i} e_i. \qquad (10.2.4)$$
Letting $X_t = \sum_{i=1}^{t} \theta_1^{-i} e_i$, we have
$$X_t \to X \quad \text{a.s.} \qquad (10.2.5)$$
as $t$ increases, where $X = \sum_{i=1}^{\infty} \theta_1^{-i} e_i$. Thus, for large $t$, the behavior of $V_t$ is essentially that of an exponential multiple of the random variable $X$.
The model (10.2.1) extended to include an intercept term and a general initial value is
$$Y_t = \begin{cases} \theta_0 + \theta_1 Y_{t-1} + e_t, & t = 1, 2, \ldots, \\ y_0, & t = 0, \end{cases} \qquad (10.2.6)$$
where
$$\bar y_{(i)} = n^{-1} \sum_{t=1}^{n} Y_{t+i}, \quad i = 0, -1.$$
If $y_0$ is fixed and $e_t \sim \mathrm{NI}(0, \sigma^2)$, the least squares estimators are the maximum likelihood estimators.
Rubin (1950) showed that $\mathrm{plim}\,\hat\theta_1 = \theta_1$ for the least squares estimator of the model (10.2.1). White (1958, 1959) considered the asymptotic properties of the least squares statistics for (10.2.1) under the assumption that $e_t \sim \mathrm{NI}(0, \sigma^2)$. White used moment generating function techniques to obtain his results. Anderson (1959) extended White's results using the representation (10.2.4). Hasza (1977) studied estimators for the general model (10.2.6), and Theorem 10.2.1 is taken from that work.
Theorem 10.2.1. Let $\{Y_t\}_{t=0}^{\infty}$ be defined by (10.2.6) with $|\theta_1| > 1$ and $\{e_t\}$ a sequence of iid$(0, \sigma^2)$ random variables. Let the least squares estimators be defined by (10.2.7). The limiting distribution of $(\theta_1^2 - 1)^{-1}\theta_1^{n}(\hat\theta_1 - \theta_1)$ is that of $W_1 W_2^{-1}$, where $W_1$ and $W_2$ are independent random variables,
and $\delta_1 = y_0 + \theta_0(\theta_1 - 1)^{-1}$. The limiting distribution of $n^{1/2}(\hat\theta_0 - \theta_0)$ is that of a $N(0, \sigma^2)$ random variable, and $n^{1/2}(\hat\theta_0 - \theta_0)$ is independent of $\theta_1^{n}(\hat\theta_1 - \theta_1)$ in the limit. If $e_t \sim \mathrm{NI}(0, \sigma^2)$, $y_0 = 0$, and $\theta_0 = 0$, then $(\theta_1^2 - 1)^{-1}\theta_1^{n}(\hat\theta_1 - \theta_1)$ converges in distribution to a Cauchy random variable.
Proof. We may write $Y_t$ as
where
Now $X_t$ converges a.s. as $t \to \infty$ to a random variable
(10.2.8)
where $\delta_1 = y_0 + \theta_0(\theta_1 - 1)^{-1}$. Then $\theta_1^{-t} Y_t \to X$ a.s. as $t$ increases. Note that
The error in the least squares estimator of $\theta_1$ is
(10.2.9)
where
using
and
(10.2.11)
we have
By similar arguments,
and
It follows that
$$(\theta_1^2 - 1)^{-1}\theta_1^{n}(\hat\theta_1 - \theta_1) = X^{-1}Z_n + O_p(n^{-1/2}), \qquad (10.2.14)$$
and
$Z_n$ and $X$ are asymptotically independent. Therefore, $(\theta_1^2 - 1)^{-1}\theta_1^{n}(\hat\theta_1 - \theta_1)$ has a nondegenerate limiting distribution. If the $e_t$'s are $\mathrm{NI}(0, \sigma^2)$ random variables, then $Z_n$ and $X$, both being linear combinations of the $e_t$'s, are normally distributed. The mean of $Z_n$ is zero and the mean of $X$ is $\delta_1$, so that the limiting distribution of $(\theta_1^2 - 1)^{-1}\theta_1^{n}(\hat\theta_1 - \theta_1)$ is that of $W_1 W_2^{-1}$, where $(W_1, W_2)$ is a normal vector. If $\delta_1 = 0$ and the $e_t$ are normal, then the limiting distribution of $(\theta_1^2 - 1)^{-1}\theta_1^{n}(\hat\theta_1 - \theta_1)$ is that of the ratio of two independent standard normal variables, which is Cauchy. Now, from (10.2.7),
$$\hat\theta_0 - \theta_0 = \bar e_{(0)} - (\hat\theta_1 - \theta_1)\bar y_{(-1)} = \bar e_{(0)} + o_p(n^{-1/2})$$
by (10.2.13) and (10.2.14). Also, $n^{1/2}\bar e_{(0)}$ is independent of $X$ and of $Z_n$ in the limit, because the elements of
are independent and the vector $L_n - (n^{1/2}\bar e_{(0)}, Z_n)$ converges in probability to zero for an appropriate choice of $k_n$, for example. A
The distributions of other regression statistics follow from Theorem 10.2.1. Let the regression residual mean square be
Corollary 10.2.1.1. Suppose the assumptions of Theorem 10.2.1 hold. Then
$$\underset{n\to\infty}{\mathrm{plim}}\ \hat\sigma^2 = \sigma^2.$$
Proof. Using the least squares definition of $\hat\theta_1$,
$$\hat\sigma^2 = (n - 2)^{-1}\Big[\sum_{t=1}^{n} (e_t - \bar e_{(0)})^2 - (\hat\theta_1 - \theta_1)^2 \sum_{t=1}^{n} (Y_{t-1} - \bar y_{(-1)})^2\Big].$$
We have shown that $(\hat\theta_1 - \theta_1) = O_p(|\theta_1|^{-n})$ and that $\sum_{t=1}^{n}(Y_{t-1} - \bar y_{(-1)})^2 = O_p(\theta_1^{2n})$. Because the $e_t$ are iid$(0, \sigma^2)$, the mean square of the $e_t$ converges in probability to $\sigma^2$. A
Corollary 10.2.1.2. Suppose that $e_t \sim \mathrm{NI}(0, \sigma^2)$. Then, under the model of Theorem 10.2.1,
Proof. We have
The random variables $X$ and $Z_n$ are asymptotically independent, and the variance of $Z_n$ converges to $(\theta_1^2 - 1)^{-1}\sigma^2$. The result follows, because $\hat\sigma^2 \to \sigma^2$ by Corollary 10.2.1.1. A
Because the pivotal $\hat\tau$ is approximately $N(0, 1)$ in large samples, $\hat\tau$ may be used to test hypotheses concerning $\theta_1$ and to set confidence intervals for $\theta_1$ when the original errors are normally distributed.
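The practical content of Corollary 10.2.1.2 — that the regression pivotal is approximately standard normal even in the explosive case — can be checked by simulation. A minimal sketch (our function name and parameter values, with $y_0 = 0$ and $\theta_0 = 0$ and a no-intercept fit):

```python
import numpy as np

def pivot_sample(theta1=1.3, n=100, reps=500, seed=1):
    """Monte Carlo sketch of the pivotal statistic for theta1 in the
    explosive AR(1) with normal errors: the t-ratio of
    (theta1_hat - theta1) from the least squares fit without intercept.
    The values should look roughly standard normal."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(reps):
        e = rng.standard_normal(n)
        y = np.empty(n + 1)
        y[0] = 0.0
        for t in range(1, n + 1):
            y[t] = theta1 * y[t - 1] + e[t - 1]
        den = y[:-1] @ y[:-1]
        th = (y[:-1] @ y[1:]) / den          # least squares slope
        resid = y[1:] - th * y[:-1]
        s2 = resid @ resid / (n - 1)
        out.append((th - theta1) / np.sqrt(s2 / den))
    return np.array(out)
```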
We now extend these results to the $p$th order process with a single root larger than one in absolute value. Let
$$Y_t = \sum_{j=1}^{p} \alpha_j Y_{t-j} + e_t, \quad t = 1, 2, \ldots, \qquad (10.2.17)$$
with initial values $Y_t$, $t = -p+1, -p+2, \ldots, 0$,
where $(Y_{-p+1}, \ldots, Y_0)$ is an initial vector and $\{e_t\}_{t=-\infty}^{\infty}$ is a sequence of independent identically distributed $(0, \sigma^2)$ random variables. Let $|m_1| > |m_2| \ge |m_3| \ge \cdots \ge |m_p|$ be the roots of the characteristic equation
$$m^p - \alpha_1 m^{p-1} - \alpha_2 m^{p-2} - \cdots - \alpha_p = 0. \qquad (10.2.18)$$
Assume $|m_1| > 1$ and all other roots are less than one in absolute value. We can also write the first equation of model (10.2.17) as
where the roots of
(10.2.20)
are the $p - 1$ roots of (10.2.18) that are less than one in absolute value. We consider a conceptual regression with the true $m_1$, denoted by $m_1^0$, used to define the regression variables. Let
(10.2.21)
where
be the least squares estimator of $\theta = (\theta_0, \theta_1, \ldots, \theta_p)'$, where $\theta_1 = m_1^0$. The conclusion of Theorem 10.2.2 is analogous to that of Theorem 10.1.2 in that the limiting distribution of the coefficient of $Y_{t-1}$ is closely related to the limiting distribution in the first order case and the other coefficients have the limiting distribution associated with the stationary process.
Two alternative assumptions for the initial conditions can be considered. In the first, the vector $(Y_{-p+1}, Y_{-p+2}, \ldots, Y_0)$ is treated as fixed. In the second, we let
$$Y_t = \begin{cases} y_0 + Z_t, & t = 1, 2, \ldots, \\ y_0, & t = 0, \end{cases} \qquad (10.2.22)$$
where $y_0$ is fixed and $\{Z_t\}_{t=-\infty}^{\infty}$ is a stationary autoregressive process with characteristic equation (10.2.20). The limiting distributions are the same under the two formulations.
Theorem 10.2.2. Suppose the model (10.2.17), (10.2.22) holds with $|\theta_1^0| > 1$ and the roots of (10.2.20) less than one in absolute value, where $\theta^0$ is the true parameter vector. Let $\hat\theta$ be defined by (10.2.21), and let
$$\hat\delta = [(\hat\theta_2 - \theta_2^0), (\hat\theta_3 - \theta_3^0), \ldots, (\hat\theta_p - \theta_p^0)]'.$$
590 UNIT ROOT AND EXPLOSIVE TIME SERIES
Then $n^{1/2}(\hat\theta_0 - \theta_0^0)$ converges to a random variable with mean zero and variance $\sigma^2$, and
Proof. We have
where
$$D_n = \mathrm{diag}\Big[n,\ \sum_{t=1}^{n} Y_{t-1}^2,\ n, \ldots, n\Big].$$
By the arguments used in Theorem 10.2.1
and $U_n$ converges to
To simplify the notation, we now write $m_1$ for $m_1^0$. It follows from the arguments used to obtain (10.2.12) and (10.2.14) that
Now,
We have
and
Therefore,
and
It follows that
where the row of $A^*$ associated with $\hat\theta_1$ is $(0, 1, 0, \ldots, 0)$. The matrix obtained by deleting the second row and column of $A^*$ is $A$. By the arguments of Theorem 8.2.1,
converges to a normal vector random variable, and we have the limiting result for $\hat\delta$. The remaining results follow from arguments used in the proof of Theorem 10.2.1. A
In Theorem 10.2.2 we obtained the limiting distribution of estimators where some of the explanatory variables were a function of the unknown true maximum root. These results can be used to construct confidence intervals for the largest root. This operation is illustrated in Example 10.2.1. In Corollary 10.2.2 we give the limiting distribution of the ordinary regression pivotal for the largest root.
Corollary 10.2.2. Let $(\hat\theta_0, \hat m_1, \hat\theta_2, \ldots, \hat\theta_p)$ be the nonlinear least squares estimator of the parameters of (10.2.19), where $|\hat m_1|$ is at least as large as the largest absolute value of the roots of the estimated equation (10.2.20). Let the assumptions of Theorem 10.2.2 hold with $e_t \sim \mathrm{NI}(0, \sigma^2)$. Let
Then
where $\hat c_{11}$ is the second diagonal element, the element associated with $\hat m_1$.
Proof. The nonlinear least squares estimator for $m_1$, based on the model (10.2.19), differs from the root calculated with the characteristic equation of the linear least squares fit of (10.2.17) only in that the $\hat m_1$ from the nonlinear fit is always a real number. By Theorem 10.2.2, the linear least squares estimators are consistent. Because the true largest root is real, the probability approaches one that the two estimation procedures agree. The estimated characteristic equation, based on the estimator of (10.2.21), is
The coefficients of $Y_{t-j}$, $j = 1, 2, \ldots, p$, obtained from the estimator (10.2.21) are the same as those obtained by the least squares fit of the model (10.2.17). Therefore, the roots of (10.2.23) are the same as the roots of the estimated characteristic equation associated with the least squares estimator of (10.2.17).
By a Taylor expansion, we have that the largest root of (10.2.23), denoted by $\hat m_1$, satisfies
and $\hat m_1 - m_1^0 = O_p(|m_1^0|^{-n})$. It follows that
Also.
(10.2.25)
and $U$ is defined in the proof of Theorem 10.2.2. By the arguments used in the proof of Theorem 10.2.2,
Therefore,
Because the multiplier for $\hat\theta_1 - m_1^0$ in (10.2.24) is the square root of the multiplier for $(m_1^0)^{2n}[(m_1^0)^2 - 1]U^2$ in (10.2.25), the result follows from the limiting distribution of $\hat\theta_1 - \theta_1^0$ given in Theorem 10.2.2. A
Example 10.2.1. Engle and Kraft (1981) analyzed the logarithm of the implicit price deflator for the gross national product as an autoregressive time series. The analysis we present follows Fuller (1984). We simplify the model of Engle and Kraft and use data for the period 1955 first quarter through 1980 third quarter. If we use least squares to estimate an autoregressive equation of the third order, 100 observations are used in the regression. The estimated characteristic equation associated with the third order process is
$$m^3 - 1.429m^2 + 0.133m + 0.290 = 0,$$
and the largest root of this equation is 1.0178. Because the largest root is greater than one, the estimated model is explosive. Before accepting an explosive model, it is reasonable to test the hypothesis that the largest root is one. To test the hypothesis of a unit root, we regress the first difference on $Y_{t-1}$ and the lagged first differences. The estimated equation is
where the numbers in parentheses are the estimated standard errors obtained from the ordinary least squares regression program. By Theorem 10.1.4, the statistic
$$\hat\tau = (0.0020)^{-1}(0.0054) = 2.72$$
has the distribution tabulated in the second part of Table 10.A.2 when the largest root is one. According to that table, $\hat\tau$ will exceed 0.63 about 1% of the time when the true root is one. Therefore, the hypothesis of a unit root is easily rejected.
To set confidence limits for the largest root, we use Corollary 10.2.2. Let the coefficient of $Y_{t-1}$ in the regression of $Y_t - m_1 Y_{t-1}$ on $1$, $Y_{t-1}$, $Y_{t-1} - m_1 Y_{t-2}$, and $Y_{t-2} - m_1 Y_{t-3}$ be denoted by $\hat c$. If $m_1 > 1$ is the largest root of the characteristic equation and if all other roots are less than one in absolute value, then the limiting distribution of the statistic $\hat t = (\mathrm{s.e.}\ \hat c)^{-1}\hat c$, where $\mathrm{s.e.}\ \hat c$ is the ordinary least squares standard error, is that of a $N(0, 1)$ random variable. Therefore, we define a confidence set for $m_1$ to be those $m_1$ such that the absolute value of the calculated $\hat t$ is less than the tabular value of Student's $t$ for the desired confidence level. The confidence set can be constructed by fitting models for a range of $m_1$-values and choosing the values for which the test statistic is equal to the tabular value. For our data
$$\hat Y_t - 1.0091\,Y_{t-1} = -0.0211 + 0.00274\,Y_{t-1} + 0.417\,(Y_{t-1} - 1.0091\,Y_{t-2}) + 0.288\,(Y_{t-2} - 1.0091\,Y_{t-3})$$
  (0.0082)  (0.00139)  (0.098)  (0.099)
and
$$\hat Y_t - 1.0256\,Y_{t-1} = -0.0211 - 0.00254\,Y_{t-1} + 0.4{\cdots}\,(Y_{t-1} - 1.0256\,Y_{t-2}) + 0.283\,(Y_{t-2} - 1.0256\,Y_{t-3}).$$
  (0.0082)  (0.00128)  (0.098)  (0.097)
It follows that a 95% confidence interval for $m_1$ based on the large sample theory is (1.0091, 1.0256).
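The grid-search construction of the confidence set described above can be sketched as follows (hypothetical data and grid; `crit` plays the role of the tabular value):

```python
import numpy as np

def root_confidence_set(y, m_grid, crit=1.96):
    """For each trial m, regress Y_t - m*Y_{t-1} on
    (1, Y_{t-1}, Y_{t-1} - m*Y_{t-2}, Y_{t-2} - m*Y_{t-3}) and keep m
    when the t-ratio of the Y_{t-1} coefficient is below crit in
    absolute value, following Corollary 10.2.2.  Sketch only."""
    y = np.asarray(y, dtype=float)
    keep = []
    for m in m_grid:
        resp = y[3:] - m * y[2:-1]
        X = np.column_stack([
            np.ones(len(resp)),
            y[2:-1],                   # Y_{t-1}
            y[2:-1] - m * y[1:-2],     # Y_{t-1} - m Y_{t-2}
            y[1:-2] - m * y[:-3],      # Y_{t-2} - m Y_{t-3}
        ])
        beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
        resid = resp - X @ beta
        s2 = resid @ resid / (len(resp) - X.shape[1])
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        if abs(beta[1] / se) < crit:
            keep.append(m)
    return keep
```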
By Corollary 10.2.2, we can construct tests for the stationary part of the model. For example, the ordinary regression $t$-statistic for $Y_{t-3} - m_1 Y_{t-4}$ in the regression of $Y_t - m_1 Y_{t-1}$ on $(1, Y_{t-1}, Y_{t-1} - m_1 Y_{t-2}, Y_{t-2} - m_1 Y_{t-3}, Y_{t-3} - m_1 Y_{t-4})$ has a $N(0, 1)$ distribution in the limit. Because the $t$-statistic for the hypothesis that the coefficient for $Y_{t-3} - m_1 Y_{t-4}$ is zero is identical (for any $m_1 \ne 0$) to the $t$-statistic for the coefficient of $Y_{t-4}$ in the regression of $Y_t$ on $(1, Y_{t-1}, Y_{t-2}, Y_{t-3}, Y_{t-4})$, we have a test of the hypothesis that the process is third order against the hypothesis that it is fourth order. The fitted fourth order model, using the estimated value of $m_1$, is
$$\hat Y_t - 1.0178\,Y_{t-1} = -0.0207 - 0.00000\,Y_{t-1} + 0.409\,(Y_{t-1} - 1.0178\,Y_{t-2}) + 0.280\,(Y_{t-2} - 1.0178\,Y_{t-3}) + 0.012\,(Y_{t-3} - 1.0178\,Y_{t-4}).$$
  (0.0096)  (0.00114)  (0.104)  (0.108)  (0.104)
Because the regression pivotal for $Y_{t-4}$ is $t = (0.104)^{-1}(0.012) = 0.12$, we easily
accept the hypothesis that the process is third order. The argument extends to the use of an $F$ statistic to test that the vector of coefficients for a set of variables is zero. AA
We now consider the problem of predicting $Y_{n+s}$ for the explosive autoregressive model. We derive the results for the first order model
$$Y_t = \theta_1 Y_{t-1} + e_t,$$
where the $e_t$ are $\mathrm{NI}(0, \sigma^2)$ random variables. Given $(Y_1, Y_2, \ldots, Y_n)$, we take as our prediction of $Y_{n+s}$
$$\hat Y_{n+s} = \hat\theta_1^{s} Y_n, \qquad (10.2.26)$$
where $s > 0$. The error in our prediction is
where
Also,
where we have used $\theta_1^{-n} Y_n = X + O_p(|\theta_1|^{-n})$. Now $Z_n$ and $\sum_{j=1}^{s} \theta_1^{s-j} e_{n+j}$ are independent,
$$\mathrm{Var}\{Z_n\} = \sigma^2(\theta_1^2 - 1)^{-1}(1 - \theta_1^{-2n}),$$
Therefore, the variance of the leading term in the prediction error is
$$\mathrm{Var}\{c_n\} = \big[s^2\theta_1^{2(s-1)}(\theta_1^2 - 1) + (\theta_1^2 - 1)^{-1}(\theta_1^{2s} - 1)\big]\sigma^2 + O(|\theta_1|^{-2n}), \qquad (10.2.27)$$
where
The variance of the prediction error may be estimated by
$$\hat V\{\hat Y_{n+s} - Y_{n+s}\} = \hat\sigma^2\Big[\frac{\hat\theta_1^{2s} - 1}{\hat\theta_1^2 - 1} + (s\hat\theta_1^{s-1})^2\, Y_n^2 \Big(\sum_{t=2}^{n} Y_{t-1}^2\Big)^{-1}\Big]. \qquad (10.2.28)$$
For s = 1, this estimated variance reduces to the familiar regression formula for the variance of a prediction.
In the explosive case, unlike the case with $|\theta_1| \le 1$, the error in $\hat\theta_1$ contributes to the leading term of the variance of the prediction error. It follows that the usual approximations, such as (8.5.13), are not formally correct when the process has a root larger than one. However, the approximations based upon regression formulas remain appropriate for predictions with $s > 1$ and for higher order processes with a root greater than one in absolute value.
Example 10.2.2. In this example, we construct predictions using the model of Example 10.2.1 and the method of indicator variables described in Example 8.5.1. To obtain predictions for three periods, we write
$$Y_t = \theta_0 + \theta_1 X_{t1} + \theta_2 X_{t2} + \theta_3 X_{t3} + \delta_5 X_{t5} + (\delta_6 - \theta_1\delta_5)X_{t6} + (\delta_7 - \theta_1\delta_6 - \theta_2\delta_5)X_{t7} + e_t,$$
where the regression variables are defined in Table 8.5.1 of Example 8.5.1 and the estimator of $(\delta_5, \delta_6, \delta_7)$ is the predictor of $(Y_{n+1}, Y_{n+2}, Y_{n+3})$. Using a nonlinear least squares program and the data for 1955-1 through 1980-4, we obtain
$$(\hat\delta_5, \hat\delta_6, \hat\delta_7) = (4.5144,\ 4.5410,\ 4.5672),$$
  (0.0044)  (0.0074)  (0.0107)
where the numbers in parentheses are the standard errors output by the nonlinear least squares program. There are 117 observations used in the regression, and the residual mean square is $\hat\sigma^2 = 0.0000179$. Using this estimate of $\sigma^2$, the approximation (8.5.13) gives 0.0042 as the standard error for the one-period prediction. Thus, for a root close to one, the difference between the appropriate estimator and the approximation based on the stationary model is not large for a one-period prediction. AA
10.3. MULTIVARIATE AUTOREGRESSIVE PROCESSES WITH UNIT ROOTS
10.3.1. Multivariate Random Walk
Before studying estimators for the general autoregressive model, we give the distribution of the least squares estimator for a multivariate version of the first order unit root process. Let
Y_t = Y_{t-1} + e_t,  (10.3.1)
where Y_t is a k-dimensional column vector and the e_t are independent identically distributed (0, Σ_ee) random variables. It is always possible to transform the vector Y_t into a new vector X_t that satisfies an equation of the form (10.3.1) with Σ_ee = I. One transformation is
X_t = TY_t,

where T = Σ_ee^{-1/2} = Q D_e^{-1/2} Q', Q is the orthogonal matrix such that

Q'Σ_ee Q = D_e,

Q'Q = I, and D_e is diagonal. We write the general first order autoregressive process as
Y_t = H_1 Y_{t-1} + e_t

and consider the least squares estimator of H_1,

Ĥ_1 = [Σ_{t=2}^n Y_t Y_{t-1}'][Σ_{t=2}^n Y_{t-1} Y_{t-1}']^{-1}  (10.3.2)

= H_1 + [Σ_{t=2}^n e_t Y_{t-1}'][Σ_{t=2}^n Y_{t-1} Y_{t-1}']^{-1}.  (10.3.3)

If H_1 = I, the normalized error in the least squares estimator is

n(Ĥ_1 − I) = [n^{-1} Σ_{t=2}^n e_t Y_{t-1}'][n^{-2} Σ_{t=2}^n Y_{t-1} Y_{t-1}']^{-1}.
The quantities in the expression converge in distribution to random variables that are analogous to the limit random variables of Theorem 10.1.1. The first order vector autoregressive model with an intercept is
Y_t = H_0 + H_1 Y_{t-1} + e_t,  (10.3.4)
where H_0 is a k-dimensional column vector. The ordinary least squares estimator of H_1 for the model (10.3.4) is
Ĥ_{1μ}' = [Σ_{t=2}^n (Y_{t-1} − ȳ_{(-1)})(Y_{t-1} − ȳ_{(-1)})']^{-1} Σ_{t=2}^n (Y_{t-1} − ȳ_{(-1)})(Y_t − ȳ_{(0)})',  (10.3.5)

where (ȳ_{(-1)}, ȳ_{(0)}) = (n − 1)^{-1} Σ_{t=2}^n (Y_{t-1}, Y_t).
Lemma 10.3.1. Let the model (10.3.1) hold. Let Y_0 be a vector random variable with finite covariance matrix, perhaps the zero matrix, independent of {e_t}. Let {e_t}_{t=1}^∞ be a sequence of random vectors, and let 𝒜_{t-1} be the sigma-field generated by {e_1, e_2, ..., e_{t-1}}. Assume

E{(e_t, e_t e_t') | 𝒜_{t-1}} = (0, Σ_ee) a.s.

and

E{|e_t|^{2+δ} | 𝒜_{t-1}} < M < ∞ a.s.

for some δ > 0, or assume the e_t are iid(0, Σ_ee), where Σ_ee is positive definite. Let Ĥ_1 be defined by (10.3.3), and Ĥ_{1μ} by (10.3.5). Then
nZjj’(<ijt; - 1 ) Z ~ ” ’ ~ G-IY,
and
where I 0
G = 2 y?U,U: = I, W(r)W(r) dr , i== I
I ( P m
Y = 2(y, + q)-’yfqlJ,Ui - 0.51 = I, W(r) dW’(r) , i s 1 j = l
a
q = It] 2”’yiU, =W(l),
3 = (-1)’+’2[(21- 1)wI-l , i - I
U, = (Ull, is the standard vector Wiener process.
. , , Ukl)’, 1 = 1,2,. . . , is a sequence of NI(0, I) vectors, and W(r)
Proof. The result follows immediately from Theorem 5.3.7. See Phillips and Durlauf (1986) and Nagaraj (1990). ∎
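The rate of convergence in Lemma 10.3.1 can be checked by simulation. A minimal sketch in our own notation (with Σ_ee = I, so no rescaling by Σ_ee^{±1/2} is needed):

```python
# Least squares estimator (10.3.2) for a multivariate random walk
# Y_t = Y_{t-1} + e_t with Sigma_ee = I; a simulation sketch.
import numpy as np

def ols_H1(Y):
    """H1_hat = (sum_t Y_t Y_{t-1}') (sum_t Y_{t-1} Y_{t-1}')^{-1}."""
    A = Y[1:].T @ Y[:-1]      # sum of Y_t Y_{t-1}'
    B = Y[:-1].T @ Y[:-1]     # sum of Y_{t-1} Y_{t-1}'
    return A @ np.linalg.inv(B)

rng = np.random.default_rng(42)
n, k = 2000, 3
Y = np.cumsum(rng.standard_normal((n, k)), axis=0)  # k-dim random walk
H1_hat = ols_H1(Y)
err = n * (H1_hat - np.eye(k))   # O_p(1); limit is G^{-1}Y of the lemma
```

Repeating the draw many times traces out the nonstandard G^{-1}Y limit distribution; a single draw simply shows that Ĥ_1 − I is of order n^{-1} rather than n^{-1/2}.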
The extension of Theorem 5.3.7 to a moving average process will be used repeatedly in this section.
Lemma 10.3.2. Let {e_t} satisfy the assumptions of Theorem 5.3.7. Let X_t = Σ_{j=0}^∞ K_j e_{t-j}, where

Σ_{j=0}^∞ ||K_j|| < ∞,  Σ_{j=0}^∞ K_j = C,

||K_j|| is the maximum of the absolute values of the elements of K_j, and C is a nonsingular real matrix. Then
where G, Y, ξ, and W(u) are defined in Lemma 10.3.1.
Proof. Using the arguments in the proof of Theorem 10.1.2, the sums of squares and products of the Y_t can be approximated by the sums of squares and products of C Σ e_i. The results then follow from Theorem 5.3.7. ∎
10.3.2. Vector Process with a Single Unit Root

In this section we examine estimation of the parameters of a vector autoregressive process whose characteristic equation has a unit root. We begin by considering least squares estimation of the parameters for two special equations. The first is
Y_t = β_1 Y_{t-1} + β'X_{t-1} + e_t,  t = 1, 2, ...,  (10.3.6)

where the e_t are independent (0, σ_ee) random variables with bounded 2 + δ (δ > 0) moments and Y_t is a scalar. Assume E{e_t | 𝒜_{t-1}} = 0, where 𝒜_{t-1} is the sigma-field generated by [(Y_{t-1}, X_{t-1}, e_{t-1}), (Y_{t-2}, X_{t-2}, e_{t-2}), ..., (Y_0, X_0, e_0)]. Assume that X_t is a zero mean stationary k-dimensional pth order autoregressive process satisfying

X_t = Σ_{j=1}^p A_j X_{t-j} + ε_t,  (10.3.7)

where the ε_t are independent (0, Σ_εε) random variables with bounded 2 + δ (δ > 0) moments.
If β_1 = 1, then from (10.3.6),

Y_t = Y_0 + Σ_{j=1}^t (β'X_{j-1} + e_j).  (10.3.8)

Also, from (10.3.7) and Theorem 2.8.1, β'X_{t-1} has a representation as an infinite moving average in the ε_t. Hence, by the proof of Theorem 10.1.2,

Y_t = Y_0 + Σ_{j=1}^t u_j,  (10.3.9)

where

u_t = β'X_{t-1} + e_t  (10.3.10)

and c' = β' Σ_{j=0}^∞ K_j. These properties of the processes can be used to obtain the limiting distribution of the least squares estimator of (β_1, β')'.
Theorem 10.3.1. Let the model (10.3.6)-(10.3.7) hold with β_1 = 1, where (e_t, ε_t')' is a sequence of independent zero mean vectors with common covariance matrix and bounded 2 + δ (δ > 0) moments. Let θ̂ be the least squares estimator of θ = (β_1, β')', where

θ̂ = [Σ_{t=1}^n F_t'F_t]^{-1} Σ_{t=1}^n F_t'Y_t  (10.3.11)

and F_t = (Y_{t-1}, X_{t-1}'). Let

t_1 = [V̂{β̂_1}]^{-1/2}(β̂_1 − 1),

where V̂{β̂_1} is the least squares estimator of the variance of β̂_1. Then β̂_1 − 1 = O_p(n^{-1}),

t_1 →^𝓛 ρ_ue τ̃ + (1 − ρ_ue²)^{1/2} d,

and

n^{1/2}(β̂ − β) →^𝓛 N(0, Γ_xx^{-1}σ_ee),

where Γ_xx = E{X_t X_t'}, τ̃ is defined in Corollary 10.1.1.2 and has the limiting distribution given there, u_t is defined in (10.3.10), ρ_ue is the correlation between u_t and e_t, d is a N(0, 1) random variable, and τ̃ is independent of d. If (β_1, β') = (1, 0), then

n(β̂_1 − 1) →^𝓛 (2G)^{-1}(T² − 1),

where (G, T) is defined in Theorem 10.1.1.
Proof. From representation (10.3.10) and Theorem 10.1.1, we have
Also
because
Following the arguments used in the proof of Theorem 10.1.2, we have
Thus,
where
and D_n = diag(n, n^{1/2}, n^{1/2}, ..., n^{1/2}). Therefore,
If β = 0, then u_t = e_t,
and
n(β̂_1 − 1) →^𝓛 (2G)^{-1}(T² − 1).
The distributional result for n^{1/2}(β̂ − β) follows because n^{-1/2} Σ_{t=1}^n X_{t-1}e_t satisfies the conditions of Theorem 5.3.3 with Z_t = n^{-1/2}X_{t-1}e_t.
To establish the limiting result for t_1, we see that
Now
where b_eu = σ_uu^{-1}σ_ue and d_t = e_t − b_eu u_t is uncorrelated with u_t. By Corollary 10.1.1.1,
By Lemma 10.3.1,
where the limit normal random variables (U_{ui}, U_{di}) are defined for the process (u_t, d_t) by the expression following (10.1.12). Because u_t and d_t are uncorrelated, U_{ui} and U_{dj} are independent. Therefore, the conditional distribution of the limit random variable, conditional on {U_{u1}, U_{u2}, ...}, is normal with mean zero for almost every {U_{u1}, U_{u2}, ...}. It follows that the standardized variable (Σ_{i=1}^∞ γ_i² U_{ui}²)^{-1/2} Σ_{i=1}^∞ γ_i U_{ui} U_{di} is independent of τ̃, and

t_1 →^𝓛 ρ_ue τ̃ + (1 − ρ_ue²)^{1/2} d,  (10.3.13)

where d is a N(0, 1) random variable independent of τ̃. ∎
The X_{t-1} of the model (10.3.6) is a rather general vector. The vector could contain current values of an explanatory variable and might contain variables of the form ΔY_{t-i} = Y_{t-i} − Y_{t-i-1} for i ≥ 1. If X_{t-1} contains only ΔY_{t-i}, i =
1, 2, ..., p − 1, then we obtain Theorem 10.1.2 as a special case of Theorem 10.3.1.
In Theorem 10.3.1, the model contains no intercept. If the model contains a constant term, then the pivotal statistic for the coefficient of lagged Y is a linear combination of τ̃_μ of Table 10.A.2 and a N(0, 1) random variable.
Note that the usual regression statistics are appropriate, in large samples, for testing and confidence statements concerning the β-parameters. The usual pivotal statistic for the coefficient of Y_{t-1} has the distribution of Table 10.A.2 if ρ_ue = 0. This can happen if the only elements of X_{t-1} with nonzero coefficients are elements of the form ΔY_{t-i}, i ≥ 1. If one is interested in testing β_1 = 1 in a model such as (10.3.6), the statistic of Theorem 10.3.4 can be computed for the vector process (Y_t, X_t')'.
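The contrast between the two kinds of pivotal statistics can be seen in a simulation sketch (the model below, its parameter values, and the sample size are our own illustrative choices): in the regression of Y_t on (Y_{t-1}, X_{t-1}) with β_1 = 1, the t-statistic for the stationary regressor is approximately N(0, 1), while the statistic for Y_{t-1} has a nonstandard limit.

```python
# Sketch of regression (10.3.6): a unit root in Y with a stationary
# AR(1) regressor X.  Settings are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n = 1000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + eps[t]           # stationary AR(1)
e = rng.standard_normal(n)
y = np.zeros(n)
beta = 0.3
for t in range(1, n):
    y[t] = y[t - 1] + beta * x[t - 1] + e[t]  # beta_1 = 1 (unit root)

F = np.column_stack([y[:-1], x[:-1]])        # F_t = (Y_{t-1}, X_{t-1})
theta, *_ = np.linalg.lstsq(F, y[1:], rcond=None)
resid = y[1:] - F @ theta
s2 = float(resid @ resid) / (n - 1 - 2)
cov = s2 * np.linalg.inv(F.T @ F)
t_beta1 = (theta[0] - 1.0) / np.sqrt(cov[0, 0])   # nonstandard limit
t_beta = (theta[1] - beta) / np.sqrt(cov[1, 1])   # approximately N(0, 1)
```

Across Monte Carlo replications, t_beta behaves like a standard normal deviate, whereas t_beta1 follows the skewed unit root distribution unless ρ_ue = 0.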
A special case of the model (10.3.6) is that in which Y_t is a function of a vector, say X_{t-1}, and an error that is autoregressive with parameter β_1. The model can be written

or as

Under rather general conditions for X_t and β_1 = 1, the nonlinear least squares estimator of the parameters satisfies
We next consider an equation in which one of the explanatory variables satisfies an autoregressive equation with a unit root.
Theorem 10.3.2. Let

X_t = β_1 Y_{t-1} + β_2'Z_{t-1} + e_t,  (10.3.14)

Y_t + Σ_{j=1}^p α_j Y_{t-j} = u_t,  (10.3.15)

Z_t + Σ_{j=1}^r B_j Z_{t-j} = v_t,  (10.3.16)

where (e_t, v_t', u_t) is a sequence of independent zero mean random vectors with covariance matrix Σ and bounded 2 + δ (δ > 0) moments. Let one of the roots of

m^p + Σ_{j=1}^p α_j m^{p-j} = 0

be one, and let the remaining p − 1 roots be less than one in absolute value. Let Z_t be a stationary process, where the roots of the characteristic equation associated with (10.3.16) are less than one in absolute value. Let (β̂_1, β̂_2) be the least squares estimator and t_1 the usual pivotal statistic for β̂_1. Then

β̂_1 − β_1 = O_p(n^{-1}),

t_1 →^𝓛 ρ_ue τ̃ + (1 − ρ_ue²)^{1/2} d,

and

n^{1/2}(β̂_2 − β_2) →^𝓛 N(0, Γ_zz^{-1}σ_ee),  (10.3.17)

where Γ_zz = E{Z_t Z_t'}, τ̃ is defined in Corollary 10.1.1.2, ρ_ue is the correlation between u_t and e_t, u_t is defined in (10.3.15), d is a N(0, 1) random variable, and τ̃ is independent of d.
Proof. From the proof of Theorem 10.1.2,

where c = [1 + Σ_{i=1}^p (i − 1)α_i]^{-1}, G is defined in Theorem 10.1.1, and
The error in the least squares estimator is
where D_n = diag(n, n^{1/2}, n^{1/2}, ..., n^{1/2}),
and we have used
Thus,
The limiting distribution of n^{-1/2} Σ_{t=1}^n Z_{t-1}e_t follows from Theorem 5.3.3. The derivation of the limiting distribution of t_1 follows that for t_1 of Theorem 10.3.1. ∎
Theorem 10.3.2 identifies the difficulties in making inferential statements about parameters of equations such as (10.3.15) where the explanatory variable is a time series. In Chapter 9 we assumed the explanatory variables to be fixed sequences. An operationally equivalent assumption is that the explanatory variables form a random sequence independent of the error sequence. In such cases, the t-statistics are N(0, 1) random variables in the limit. If an explanatory variable is an autoregressive process, the distribution of the t-statistics depends on the roots of the process and on the correlation properties of the error processes. If the autoregressive process defining the explanatory variable has a unit root and if the error in the autoregressive process is correlated with the error in the equation, then by Theorem 10.3.2 the usual t-statistics have nonstandard limiting distributions. If the explanatory process is stationary, then the t-statistics for the parameters of the model (10.3.6) are N(0, 1) random variables in the limit.
If it is known that the explanatory process has a unit root, then imposing that condition on the system (10.3.14)-(10.3.15) produces efficient estimates of the parameters of interest, and the t-statistics converge in distribution to N(0, 1) random variables. See Phillips (1991) and Exercise 10.12. However, in many practical situations it is not known that the Y_t-process of equation (10.3.15) has a
unit root. In such situations, the investigator must recognize the potential effect of a unit root in the Y_t-process on inferences about β_1.
Example 10.3.1. In this example, we consider estimation for the model

X_t = β_0 + β_1 Y_{t-1} + e_t,
Y_t = θ_1 Y_{t-1} + θ_2 ΔY_{t-1} + u_t,

where (e_t, u_t)' ~ NI(0, Σ). Our situation is assumed to be that described in Section 10.1.3 in that we specify θ_1 ∈ (−1, 1] and μ = 0 if θ_1 = 1. The data of Table 10.B.4 were artificially generated to satisfy the model. The ordinary least squares estimates of the two equations are
X̂_t = 12.992 + 0.7783 Y_{t-1},
        (0.099)   (0.0075)

Ŷ_t = 0.9798 Y_{t-1} + 0.7869 ΔY_{t-1},
       (0.0076)         (0.0597)
where ΔY_{t-1} = Y_{t-1} − Y_{t-2}, and the ordinary least squares standard errors are given in parentheses. The estimated covariance matrix of (e_t, u_t) is
0.9618 0.7291
and the estimated correlation between e_t and u_t is ρ̂_12 = 0.751. The weighted least squares estimate of θ_1 is θ̂_{1,W} = 0.9850, and the test for a unit root is τ̂_W = −1.94. The estimator of θ_1 constructed from (10.1.55) is θ̃_W = 0.9921.
To estimate the parameters of the X-equation, we use a system method of estimation such as full information maximum likelihood or three-stage least squares, available in SAS/ETS®. The two equations are estimated simultaneously, subject to the restriction that θ_1 = 0.9921. The standard errors computed by the program should not be used as the standard errors of the estimators. They are only appropriate if it is known that θ_1 = 0.9921. We suggest that the standard errors from the initial ordinary least squares regression be modified as in Example 10.1.3. Thus, the standard error for Y_{t-1} in the regression of Y_t on Y_{t-1} is increased by 15%, and the standard error for Y_{t-1} in the regression of X_t on (1, Y_{t-1}) is multiplied by 1 + 0.15ρ̂_12. Our estimate of the model is
X̂_t = 12.937 + 0.7876 Y_{t-1},
        (0.099)   (0.0083)

Ŷ_t = 0.9921 Y_{t-1} + 0.7308 ΔY_{t-1},
       (0.0087)         (0.0597)
where the numbers in parentheses are the modified ordinary least squares standard errors. ∎
We now investigate estimation for the vector autoregressive process. Let Y_t be a pth order autoregressive process of dimension k. Let

Y_t + Σ_{i=1}^p A_i Y_{t-i} = e_t,  t = 1, 2, ...,  (10.3.18)
where the e_t are independent and identically distributed (0, Σ_ee) random vectors, and the A_i are k × k parameter matrices. The representation (10.3.18) was introduced in Section 2.8. Assume that the characteristic equation

|I m^p + A_1 m^{p-1} + ··· + A_p| = 0  (10.3.19)

has g roots equal to one and that all remaining roots are less than one in absolute value. By analogy to (10.1.29), we can write
Y_t = H_1 Y_{t-1} + Σ_{i=2}^p H_i ΔY_{t-i+1} + e_t,  (10.3.20)

where H_1 = −Σ_{j=1}^p A_j and

H_i = Σ_{j=i}^p A_j  (10.3.21)

for i = 2, 3, ..., p. We will study the process in which the portion of the Jordan canonical form of H_1 associated with the unit roots is a g-dimensional identity matrix.
By Theorem 2.4.2, there exists a Q such that

Q^{-1}H_1 Q = Λ,  (10.3.22)

where the elements of Λ are determined by the roots of H_1. If the roots of H_1 are distinct, Λ is a diagonal matrix with the roots on the diagonal. If H_1 has repeated roots, the matrix Λ is of the form described in Theorem 2.4.2. Assume that Λ is block diagonal with the g × g identity matrix as the first block and Λ_22 as the second block, where Λ_22 − I is nonsingular.
Let T be a nonsingular matrix whose first g rows are the first g rows of Q^{-1}, the rows associated with the roots of one. The remaining rows of T are linear combinations of the rows of Q^{-1} chosen such that all elements of T are real. The matrix T can be chosen so that the upper left g × g matrix of TΣ_ee T' is an identity matrix. Let X_t = TY_t. Then
X_t = Φ_1 X_{t-1} + Σ_{i=2}^p Φ_i ΔX_{t-i+1} + a_t,  (10.3.23)

where Φ_i = TH_i T^{-1}, i = 1, 2, ..., p, and a_t = Te_t. The matrix Φ_1 has a g × g identity matrix as the upper left block. The remaining elements of the first g rows and first g columns of Φ_1 are zero. It follows that the vector process composed of the last k − g elements of X_t is a vector autoregressive process with all roots less than one in absolute value.
A vector process Y_t for which there exists a transformation T such that k − g components of the transformed vector are stationary and g components are unit root processes is called cointegrated of order k − g in the econometrics literature. See Granger (1981) and Engle and Granger (1987). Granger's original definition of cointegration required every element of Y_t to contain a unit root. The last k − g rows of T that define stationary processes are called cointegrating vectors.
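The definition can be made concrete with a two-dimensional sketch (our own construction, with illustrative settings): Y_{1t} a random walk and Y_{2t} = Y_{1t} + u_t with u_t iid, so that g = 1, k − g = 1, and (−1, 1) is a cointegrating vector.

```python
# A cointegrated pair: y1 is a random walk (the unit root component)
# and y2 = y1 + noise, so the spread y2 - y1 is stationary.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
y1 = np.cumsum(rng.standard_normal(n))     # unit root process
y2 = y1 + rng.standard_normal(n)           # shares the trend of y1
spread = y2 - y1                           # stationary linear combination

# the "cointegrating regression" of y2 on y1 recovers slope 1
slope = np.polyfit(y1, y2, 1)[0]
```

The levels wander with variance growing in n, while the spread has constant variance; the least squares slope of the levels regression is superconsistent for the cointegrating coefficient.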
A canonical form for the pth order process of interest is

Y_t = H_1 Y_{t-1} + Σ_{i=2}^p H_i ΔY_{t-i+1} + e_t,  (10.3.24)

where (Y_{1t}', Y_{2t}') = Y_t', Y_{1t} is the vector composed of the first g elements of Y_t, the upper left g × g portion of Σ_ee is I_g, H_1 = block diag{I_g, H_{1,22}}, and H_{1,22} − I_{k-g} is nonsingular. The conformable partition of e_t of (10.3.24) is (e_{1t}', e_{2t}')', where e_{1t} is a g-dimensional vector. We can write

e_{2t}' = e_{1t}'Σ_{ee,12} + a_{2t}',  (10.3.25)

where Σ_{ee,12} is the g × (k − g) upper right part of Σ_ee, a_{2t} is uncorrelated with e_{1t}, and

E{a_{2t}a_{2t}'} = Σ_{aa,22}.  (10.3.26)
The vector (ΔY_{1t}', Y_{2t}')' is a stationary process and can be expressed as an infinite moving average in the e_t by Theorem 2.8.1. Also,

(10.3.27)

where H_{j,11} is the upper left g × g block of H_j and H_{j,12} is the upper right g × (k − g) block of H_j. Because Σ_{i=1}^t ΔY_{1i} = Y_{1t} − Y_{10} and ΔY_{1t} is stationary, we have

Y_{1t} = C_g Σ_{l=1}^t e_{1l} + O_p(1),  (10.3.28)

where the O_p(1) remainder is uncorrelated with e_s for s > t.
Let the ordinary least squares estimators of the parameters of the model (10.3.24) be

Ĥ' = [Σ_{t=p+1}^n L_{t-1}'L_{t-1}]^{-1} Σ_{t=p+1}^n L_{t-1}'Y_t'  (10.3.29)

and

Σ̂_ee = [n − (p + 1)k]^{-1} Σ_{t=p+1}^n ê_t ê_t',  (10.3.30)

where

L_{t-1} = (Y_{t-1}', ΔY_{t-1}', ..., ΔY_{t-p+1}'),

H' = (H_1, H_2, ..., H_p)',

and ê_t' = Y_t' − L_{t-1}Ĥ'. The ordinary least squares estimator of the "covariance matrix" of vec Ĥ' is

Σ̂_ee ⊗ V^{-1},  (10.3.31)

where V = Σ_{t=p+1}^n L_{t-1}'L_{t-1}. This matrix will normalize the estimators, but it is not an estimated covariance matrix in the traditional sense because some of its elements cannot be standardized to converge to constants. This is analogous to the results for the univariate case discussed in Section 10.1. The matrix V is kp × kp, and we denote the ijth k × k block of the matrix by V_{ij}.
From the derivation of the canonical form (10.3.24), we know that the least
squares estimator is invariant to linear transformations in the sense that if T is nonsingular, then the least squares estimators of the coefficients for the transformed model based on TY_t are TĤ_iT^{-1}, i = 1, 2, ..., p, where Ĥ_i, i = 1, 2, ..., p, are the least squares estimators for the original model. Also, the estimator of the error covariance matrix for the transformed problem is TΣ̂_ee T', where Σ̂_ee is the regression residual mean square for the original problem. We now give the limiting distribution of the estimators for the canonical model.
Theorem 10.3.3. Let Y_t be a k-dimensional pth order autoregressive process written in the canonical form (10.3.24). Assume the e_t are iid(0, Σ_ee) random variables or that the e_t are martingale differences satisfying the conditions of Lemma 10.3.1. Partition Y_t' as (Y_{1t}', Y_{2t}'), where Y_{1t} is the g-dimensional vector with g unit roots. Assume that (ΔY_{1t}', Y_{2t}')' is a stationary process with E{ΔY_{1t}} = 0 and Y_{10} = 0. Let the ordinary least squares estimators be defined by (10.3.29) and (10.3.30). Then

where C_g = (I − Σ_{j=2}^p H_{j,11})^{-1} is defined in (10.3.28), Σ_{aa,22} is defined in (10.3.25),

D_n = diag(n, n, ..., n, n^{1/2}, n^{1/2}, ..., n^{1/2}),

(G, Y) is defined in Lemma 10.3.1, G_{gg} is the upper left g × g portion of G, Y_{g·} = (Y_{11}, Y_{12}) is the matrix composed of the first g rows of Y, and the elements of the remaining limit matrix are zero mean normal random variables with

and

Also, Σ̂_ee converges in probability to Σ_ee.
Proof. By (10.3.28) and by the arguments used in the proof of Theorem 10.1.2,

Likewise, by Theorem 5.3.7 and Theorem 8.2.3,

where the limit is a matrix of zero mean normal random variables with covariance matrix

Σ_ee ⊗ Σ_{LL,22}.

The covariance between n^{-1/2} Σ ΔY_{t-j}e_t', j ≥ 1, and n^{-1} Σ Y_{t-1}e_t' goes to zero as n increases, and the limit random variables are independent. See Lai and Wei (1985a) and Chan and Wei (1988). Using
we obtain the limiting distribution. The estimated covariance matrix Σ̂_ee converges to Σ_ee because

[n − (p + 1)k]^{-1} Σ_{t=p+1}^n (Y_t' − L_{t-1}Ĥ')'(Y_t' − L_{t-1}Ĥ')
  = [n − (p + 1)k]^{-1} Σ_{t=p+1}^n e_t e_t' + O_p(n^{-1}).  ∎
Theorem 10.3.3 gives the distribution for the parameters of the canonical form (10.3.24). The coefficient matrices for any other representation are linear combinations of the elements of the canonical coefficient matrices. Hence, the limiting distributions are defined in terms of those of Theorem 10.3.3. As in the univariate case, the limiting distribution of coefficients associated with unit roots differs from the limiting distribution of coefficients associated with the stationary portion of the model.
The following corollary to Theorem 10.3.3 was derived for g = 1 by Fountis and Dickey (1989).
Corollary 10.3.3. Let Y_t be a k-dimensional pth order autoregressive process with g unit roots and all other roots less than one in absolute value. Assume the process can be transformed to the canonical form (10.3.24) with g unit roots. Let the assumptions on e_t of Theorem 10.3.3 hold. Let λ̂_1 ≥ λ̂_2 ≥ ··· ≥ λ̂_g be the g roots of

|I m^p + Â_1 m^{p-1} + ··· + Â_p| = 0

with largest absolute values, where the Â_i, i = 1, 2, ..., p, are the least squares estimators. Then the limiting distribution of n(λ̂_i − 1), i = 1, 2, ..., g, is the distribution of the g roots of

|λI − Y'G^{-1}| = 0,

where (G, Y) is the g × 2g matrix defined in Lemma 10.3.1.
Proof. The roots of the characteristic equation are unchanged by nonsingular transformations, and we assume, without loss of generality, that the equation is in the canonical form (10.3.24).
By Theorem 10.3.3, the least squares coefficients are converging to the true values. Therefore, by the results of Section 5.8, the roots of the characteristic equation are converging to the true roots. We expand the largest roots about one and write m̂ = 1 + δ + o_p(δ). Then the local approximation to the determinantal equation becomes
|I(1 + pδ) + Â_1[1 + (p − 1)δ] + Â_2[1 + (p − 2)δ] + ··· + Â_p| = 0.
Using the definitions of Ĥ_i in terms of the Â_i, we have
(10.3.32)
The lower right portion of I − Ĥ_1 is converging to a nonsingular matrix. The matrix I − Σ_{i=2}^p Ĥ_i is converging to a matrix, say M, with C_g^{-1} as the upper left g × g block. Therefore, we multiply the matrix in (10.3.32) on the left by the nonsingular matrix

where M_{21} is the lower left (k − g) × g portion of M. The values of δ that satisfy the resulting determinantal equation are equal to the values that satisfy (10.3.32). Using the fact that the first g columns of Ĥ_1 − I are O_p(n^{-1}), the g smallest roots of (10.3.32) are converging to the g smallest roots of

|C_g(Ĥ_{1,11} − I_g) − δI_g| = 0.

The result follows from the limiting distribution of Ĥ_{1,11} − I_g given in Theorem 10.3.3. ∎
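The roots of the determinantal equation |I m^p + A_1 m^{p-1} + ··· + A_p| = 0 used in the corollary can be computed as eigenvalues of the companion matrix of the process written Y_t = −A_1Y_{t-1} − ··· − A_pY_{t-p} + e_t. A sketch (the function name is ours):

```python
# Characteristic roots of |I m^p + A1 m^{p-1} + ... + Ap| = 0 via the
# companion matrix of Y_t + A1 Y_{t-1} + ... + Ap Y_{t-p} = e_t.
import numpy as np

def char_roots(A_list):
    """A_list = [A1, ..., Ap], each k x k; returns the kp roots."""
    k = A_list[0].shape[0]
    p = len(A_list)
    top = np.hstack([-A for A in A_list])      # coefficients on lags 1..p
    bottom = np.eye(k * (p - 1), k * p)        # shift identity rows
    companion = np.vstack([top, bottom])
    return np.linalg.eigvals(companion)

# univariate AR(2) check: m^2 - 1.5 m + 0.5 = 0 has roots 1 and 0.5,
# i.e. A1 = -1.5, A2 = 0.5 in the Y_t + A1 Y_{t-1} + A2 Y_{t-2} form
roots = char_roots([np.array([[-1.5]]), np.array([[0.5]])])
```

Applying char_roots to the least squares estimates Â_i and standardizing the largest roots as n(λ̂_i − 1) gives the statistics of Corollary 10.3.3; the same function handles k × k coefficient blocks.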
We now turn to the problem of estimating the parameters of (10.3.18) subject to the restriction that one of the roots of (10.3.19) is equal to one. If we write the model in the form (10.3.20), the determinant of H_1 − I is zero under the null model and there exists a vector κ_1, with κ_1'κ_1 ≠ 0, such that

κ_1'(H_1 − I) = 0.  (10.3.33)

Therefore, if the first observations are treated as fixed, the maximum likelihood estimator of (H_1, ..., H_p) is obtained by imposing the restriction (10.3.33) on the least squares estimator. Assume the first element of κ_1, denoted by κ_11, is not zero, and let κ_11^{-1}κ_1' = (1, −B_{0,12}). If we multiply (10.3.20) by κ_11^{-1}κ_1', we have
ΔY_{1t} = B_{0,12}ΔY_{2t} + Σ_{i=1}^{p-1} B_{i+1,1·}ΔY_{t-i} + v_{1t},  (10.3.34)

where B_{i+1,1·} = κ_11^{-1}κ_1'H_{i+1}, Y_t' = (Y_{1t}, Y_{2t}'), and v_{1t} = κ_11^{-1}κ_1'e_t.
Let θ = (B_{0,12}, B_{2,1·}, ..., B_{p,1·}). Let Π̂ = (Π̂_1, Π̂_2, Π̂_3) be the matrix of regression coefficients obtained in the regression of [ΔY_{1t}, ΔY_{2t}', (ΔY_{t-1}', ..., ΔY_{t-p+1}')] on (Y_{t-1}', ΔY_{t-1}', ..., ΔY_{t-p+1}'). Of course, the elements of Π̂_3 are zeros and ones. Let λ̂_k be the smallest root of
where

is a k(p − 1) × k(p − 1) matrix of zeros, Σ̂_ee is defined in (10.3.30), and L_{t-1} = (Y_{t-1}', ΔY_{t-1}', ..., ΔY_{t-p+1}'). Then, the maximum likelihood estimator of θ is

(10.3.36)

where Σ̂_{22} denotes the lower right (kp − 1) × (kp − 1) submatrix, and Σ̂_{21} the lower left (kp − 1) × 1 submatrix, of the corresponding moment matrix.
The vector θ̂ is called the limited information maximum likelihood estimator in the econometric literature. Hence, θ̂ can be calculated by any of the many econometric computer packages. The limiting distribution of the test statistic and the limiting distribution of the coefficients associated with the conditional likelihood were obtained by Johansen (1988) and Ahn and Reinsel (1990). We give a proof that differs somewhat from the proofs of those authors.
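Numerically, a smallest root such as λ̂_k is obtained from a determinantal equation of the form |S − λΣ̂| = 0, a generalized symmetric eigenvalue problem. A sketch (the matrices S and Sigma below are illustrative placeholders, not Fuller's exact quantities in (10.3.35)):

```python
# Roots of |S - lambda * Sigma| = 0 for symmetric S and positive
# definite Sigma, via Cholesky reduction to an ordinary eigenproblem.
import numpy as np

def det_equation_roots(S, Sigma):
    """Sorted roots of |S - lambda Sigma| = 0."""
    L = np.linalg.cholesky(Sigma)
    Linv = np.linalg.inv(L)
    M = Linv @ S @ Linv.T            # symmetric matrix, same roots
    return np.sort(np.linalg.eigvalsh(M))

rng = np.random.default_rng(11)
X = rng.standard_normal((200, 3))
S = X.T @ X / 200                    # an illustrative moment matrix
Sigma = np.eye(3)
lam = det_equation_roots(S, Sigma)
smallest = lam[0]                    # analogue of the smallest root
```

The Cholesky reduction preserves the roots because |S − λΣ| = |L|²·|L^{-1}S L^{-1'} − λI|, so the smallest generalized root is the smallest ordinary eigenvalue of the transformed matrix.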
Theorem 10.3.4. Suppose the model (10.3.18) holds with exactly one root of (10.3.19) equal to one and exactly one root of H_1 equal to one. Assume the e_t are iid(0, Σ_ee) random variables or that the e_t are martingale differences satisfying the conditions of Lemma 10.3.1. Assume that Y_t is such that κ_1 can be normalized to give equation (10.3.34). Let θ̂ of (10.3.36) be the estimator of θ. Let
where τ̂ →^𝓛 τ̃, τ̃ is defined and its distribution given in Corollary 10.1.1.2, and λ̂_k is defined in (10.3.35).
Proof. Let V_{t-1} be the deviation from fit for the regression of Y_{t-1}' on (ΔY_{t-1}', ..., ΔY_{t-p+1}'). Then λ̂_k is also the smallest root of

where (Ĥ_{11}, Ĥ_{12}) is the upper k × k portion of (Ĥ_1, Ĥ_2). Note that (Ĥ_{11}, Ĥ_{12}) = Ĥ_1 − I.
Let T_{11} be a transformation matrix such that the unit root is associated with the first element of X_t = T_{11}Y_t. That is, X_t satisfies the equation

X_t = H_{x1}X_{t-1} + Σ_{i=2}^p H_{xi}ΔX_{t-i+1} + T_{11}e_t,

where H_{xi} = T_{11}H_iT_{11}^{-1}, H_i is defined in (10.3.21), and the first row and first column of H_{x1} − I are vectors of zeros. Then
(Ĥ_{11}, Ĥ_{12}) Σ_{t=p+1}^n V_{t-1}V_{t-1}'(Ĥ_{11}, Ĥ_{12})'
  = (Ĥ_{11}, Ĥ_{12})T_{11}^{-1} Σ_{t=p+1}^n X̃_{t-1}X̃_{t-1}'T_{11}^{-1'}(Ĥ_{11}, Ĥ_{12})'
  = G̃ Σ_{t=p+1}^n X̃_{t-1}X̃_{t-1}'G̃',

where G̃ = (G̃_1, G̃_2) = (Ĥ_{11}, Ĥ_{12})T_{11}^{-1}, and X̃_{t-1} is the deviation from fit for the regression of X_{t-1} on (ΔY_{t-1}', ..., ΔY_{t-p+1}'). Also,
where λ̂_k is the smallest root of (10.3.35), and the roots of (10.3.35) are the roots of
for i = 1, 2, ..., k and j = 1, 2, ..., k, by the arguments used in the proof of Theorem 10.1.2. Therefore, the elements of the first row and column of the standardized matrix are o_p(1). Also, because (X̃_{2t}, ..., X̃_{kt}) is stationary with positive definite covariance matrix, the matrix of mean squares and products of (X̃_{2t}, ..., X̃_{kt}) is an estimator of a nonsingular matrix. Also, by the assumption that H_1 − I has one zero root, G̃_2 is of rank k − 1 and

where M is a nonsingular matrix. It follows that λ̂_k = o_p(1). Also see Theorem 2.4.1 of Fuller (1987, p. 151). From the definition of θ̂,

where B_{0,1·} = (1, −B_{0,12}) and Π is the probability limit of Π̂. Because the regression variables form a stationary process, the asymptotic normality of n^{1/2}(θ̂ − θ) follows from Theorem 5.3.4. See Theorem 8.2.1.
Note that L_{t-1}(Π̂_2, Π̂_3) is the predicted value obtained in the regression on L_{t-1}. Because Π̂ is converging to Π, B̂_{0,12} is converging to B_{0,12}, and Σ̂_ee is converging to Σ_ee, we have

(n − p)V̂_θθ →^P Ω^{-1}B_{0,1·}Σ_ee B_{0,1·}',

and the distributional result for V̂_θθ^{-1/2}(θ̂ − θ) is established.
We now obtain the limiting distribution of λ̂_k. Because the roots are unchanged by linear transformation, we assume, without loss of generality, that the model is in the canonical form. In the canonical form, B_{0,1·} = (1, 0, ..., 0). From (10.3.37),
where
and the first row of Q is a zero vector. It follows that
The standardized matrix
D;;'2[8;: - g2(&8xx@-'g;]s;:",
is converging to a matrix with (1, 0, ..., 0) as the first row. Therefore,

and λ̂_k converges in distribution as stated. See the proof of Theorem 10.1.2. ∎
The results extend to the model with an intercept.
Corollary 10.3.4. Let the model

Y_t = H_0 + H_1Y_{t-1} + Σ_{i=2}^p H_iΔY_{t-i+1} + e_t,  t = 2, 3, ..., n,

be estimated by ordinary least squares, where H_0 is a k-dimensional column vector. Let κ_1^{0'}H_0^0 = 0, where κ_1^{0'}(H_1^0 − I) = 0, and κ_1^0 and H_i^0 are the true parameters of the process. Let all statistics be defined as in Theorem 10.3.4 except that the regression equations contain an intercept. Let the error assumptions of Theorem 10.3.4 hold. Let θ̂ = (B̂_{0,12}, ĥ_0, B̂_{2,1·}, ..., B̂_{p,1·}), where ĥ_0 is the estimated intercept for the model (10.3.34) expanded to include an intercept. Let V̂_θθ be defined as in Theorem 10.3.4 with

Ψ_t = (ΔY_{2t}, ΔY_{3t}, ..., ΔY_{kt}, 1, ΔY_{t-1}', ΔY_{t-2}', ..., ΔY_{t-p+1}').
where τ̂_μ →^𝓛 τ̃_μ, the limit random variable τ̃_μ has the distribution given in Theorem 10.1.3, and λ̂_k is defined by (10.3.35) with L_{t-1} expanded to include a one.
Proof. Omitted. ∎
One method of estimating the entire system (10.3.20) subject to the restriction (10.3.33) is to use a nonlinear estimation program. The computations are illustrated in Example 10.3.2 of the next subsection.
10.3.3. Vector Process with Several Unit Roots

We now extend our discussion to the estimation of the multivariate process (10.3.18) subject to the restriction that exactly g roots of (10.3.19) are equal to one, under the assumption that the part of the Jordan canonical form of H_1 associated with the unit roots is the g-dimensional identity matrix. This means that there are g vectors κ_i such that

κ_i'(H_1 − I) = 0,  (10.3.38)

where the κ_i define a g-dimensional subspace. Let

S_hh = (Ĥ_1 − I)V_{11}^{-1}(Ĥ_1 − I)',  (10.3.39)

where V_{11} is the portion of the inverse of V associated with Y_{t-1}. Let

R̂ = (R̂_1, R̂_2, ..., R̂_k),

where Σ̂_ee is defined in (10.3.30). That is,

(S_hh − λ̂_iΣ̂_ee)R̂_i = 0,  i = 1, 2, ..., k,  (10.3.40)

where the λ̂_i, i = 1, 2, ..., k, are the roots of

|S_hh − λΣ̂_ee| = 0  (10.3.41)

and R̂_i'Σ̂_ee R̂_i = 1.
For normal e_t, the method of maximum likelihood defines the estimated g-dimensional subspace with the g vectors associated with the g smallest roots of (10.3.41). The vectors R̂_i, i = k − g + 1, k − g + 2, ..., k, define a g-dimensional
subspace, but any set of g vectors of rank g formed as linear combinations of the R̂_i can be used to define the subspace.
We have solved the problem of estimating those linear combinations of the original variables that are unit root processes. One could ask instead for those linear combinations of the original variables that define stationary processes, the cointegrating vectors. The cointegrating vectors are obtained in Example 10.3.2 using a maximum likelihood computer program. The cointegrating vectors can be computed directly using a determinantal equation different from (10.3.41). Define S_00, S_10, and S_01 by S_01 = S_10',

and

where d_f is the degrees of freedom for Σ̂_ee. For the first order model, V_{11}^{-1} = Σ_{t=2}^n Y_{t-1}Y_{t-1}' and
Then, the k − g cointegrating vectors are the vectors associated with the k − g largest roots of

|S_10 S_00^{-1}S_01 − νV_{11}^{-1}| = 0.  (10.3.44)

The roots ν̂_i of (10.3.44) are related to the roots λ̂_i of (10.3.41) by

ν̂_i = (1 + d_f^{-1}λ̂_i)^{-1}d_f^{-1}λ̂_i  (10.3.45)

and λ̂_i = (1 − ν̂_i)^{-1}d_f ν̂_i.
Given that (10.3.38) holds, there is some arrangement of the elements of Y_t such that

(10.3.46)

where (I, −B_{0,12}) is a g × k matrix obtained as a nonsingular transformation of (κ_1, κ_2, ..., κ_g). Thus, if we let B_0 denote the corresponding transformation matrix, we can multiply the model (10.3.20) by B_0 to obtain

where v_t = B_0e_t, and the first g rows of B_1 are composed of zeros. Then the maximum likelihood estimator of B_{0,12} is

(10.3.47)

where

S̃_hh = S_hh R̂[Diag(λ̂_1 − 1, λ̂_2 − 1, ..., λ̂_{k-g} − 1, 0, 0, ..., 0)]R̂'S_hh,

Σ̂_{aa,22} is the lower right (k − g) × (k − g) submatrix of Σ̂_aa, R̂ is the matrix of vectors R̂_i, and the R̂_i, i = 1, 2, ..., k, are the characteristic vectors in the metric Σ̂_ee associated with equation (10.3.40). A derivation of this estimator in a different context is given in Fuller (1987, Theorems 4.1.1 and 4.1.2). The estimators of B_{j,1·}, j = 2, 3, ..., p, are given by the regression of B̂_0ΔY_t on (ΔY_{t-1}', ΔY_{t-2}', ..., ΔY_{t-p+1}').
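The moment matrices and the determinantal equation (10.3.44) parallel the reduced rank regression of Johansen (1988). A first order sketch in our own notation (k = 2, one cointegrating relation; the S matrices below are sample analogues, and the normalization by n is a convenience):

```python
# Reduced rank (Johansen-type) calculation for a first order system
# Delta Y_t = Pi Y_{t-1} + e_t; an illustrative sketch.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
e = rng.standard_normal((n, 2))
y1 = np.cumsum(e[:, 0])              # common stochastic trend
y2 = y1 + e[:, 1]                    # cointegrated with y1
Y = np.column_stack([y1, y2])

dY = np.diff(Y, axis=0)
Ylag = Y[:-1]
S00 = dY.T @ dY / n                  # moments of Delta Y
S01 = dY.T @ Ylag / n                # cross moments
S11 = Ylag.T @ Ylag / n              # moments of lagged levels

# roots of |S01' S00^{-1} S01 - nu S11| = 0 (squared canonical corr.)
M = np.linalg.inv(S11) @ S01.T @ np.linalg.inv(S00) @ S01
w, V = np.linalg.eig(M)
order = np.argsort(w.real)[::-1]
nu = w.real[order]
beta = V[:, order[0]].real           # estimated cointegrating vector
```

Here nu[0] stays bounded away from zero (one cointegrating relation) while nu[1] is O_p(n^{-1}) (one unit root), and beta is proportional to (1, −1), the direction of the stationary spread y2 − y1.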
We now give the limiting distributions for some of the statistics associated with maximum likelihood estimation.
Theorem 10.3.5. Assume that the model (10.3.19), (10.3.46) holds, where the e_t are iid(0, Σ_ee) random vectors, g of the roots of H_1 are one, and k − g of the roots are less than one in absolute value. Assume that the part of the Jordan canonical form associated with the unit roots is diagonal. Let Ĥ_1 be the least squares estimator of H_1 obtained in the least squares fit of equation (10.3.20). Let λ̂_1 ≥ λ̂_2 ≥ ··· ≥ λ̂_k be the roots of the equation

|(Ĥ_1 − I)V_{11}^{-1}(Ĥ_1 − I)' − λΣ̂_ee| = 0,

where Σ̂_ee is the least squares estimator of Σ_ee, and V_{11} is defined with (10.3.39). Then the distribution of (λ̂_{k-g+1}, λ̂_{k-g+2}, ..., λ̂_k) converges to the distribution of the roots (ν̃_1, ν̃_2, ..., ν̃_g) of

|Y_{gg}'G_{gg}^{-1}Y_{gg} − νI_g| = 0,

where G_{gg} and Y_{gg} are g × g matrices defined in Lemma 10.3.1.
If the statistics are computed using deviations from the mean, the distribution of (λ̂_{k-g+1}, λ̂_{k-g+2}, ..., λ̂_k) converges to the distribution of the roots of

|(Y_{gg} − ξη')'(G_{gg} − ξξ')^{-1}(Y_{gg} − ξη') − νI_g| = 0,

where ξ and η are g-dimensional vectors defined in Lemma 10.3.1.
Let B̂_{0,12} be defined by (10.3.47). Let B_{j,1·} be the g × k matrix composed of the
first g rows of B_j, j = 2, 3, ..., p, of (10.3.46), and let B̂_{j,1·}, j = 2, 3, ..., p, be the associated estimators of B_{j,1·}. Then

n^{1/2} vec{(B̂ − B)'} →^𝓛 N(0, V_BB),

where V_BB = Σ_vv ⊗ Ω^{-1} and Ω = E{Ψ_tΨ_t'}.
Proof. We outline the proof for the no-intercept model. The model can be transformed to the canonical form (10.3.24), and the roots of (10.3.41) computed with the canonical form are the same as those computed with the original variables.
Let v_{t-1} be the column vector of residuals obtained in the regression of Y_{t-1} on (ΔY'_{t-1}, ..., ΔY'_{t-p+1}), and let s_ii be the regression residual mean square for the regression of ΔY_it on (Y'_{t-1}, ΔY'_{t-1}, ..., ΔY'_{t-p+1}). Then, in the set of ordinary regression statistics, the coefficients for Y_{t-1} in the regression of ΔY_t on (Y'_{t-1}, ΔY'_{t-1}, ..., ΔY'_{t-p+1}) are the regression coefficients for the regression of ΔY_t on v_{t-1}, and
is the estimated covariance matrix for that vector of coefficients. For the model (10.3.26) in canonical form,
and D_n = diag(n, n, ..., n, n^{1/2}, n^{1/2}, ..., n^{1/2}). Also,
MULTIVARIATE AUTOREGRESSIVE PROCESSES WITH UNIT ROOTS 621
where Y is the limit random variable for n^{-1} Σ_{t=2}^n (Σ_{j=1}^{t-1} e_j)e'_t, Y_gg is the upper left g x g submatrix of Y, Y_12 is the upper right g x (k - g) submatrix of Y, and the elements of Y_12 are normal random variables. See the proof of Theorem 10.3.1. Then
where
The nonsingular matrix n^{1/2}M_{n22}(H'_22 - I) dominates R_{n22}, which is O_p(1). Thus,
+ remainder ,
where the remainder terms are of small order relative to the included terms. Now M'_{n21}M_{n21} is a nonsingular matrix multiplied by n, R_{n11} is O_p(1), R_{n21} is O_p(1), and the upper left g x g portion of Σ̂_ee converges to I_g. Therefore, the g smallest roots of (10.3.41) converge to the roots of
Because the elements of Σ̂_ee are small relative to M_n and the g smallest roots are O_p(1), the limiting behavior of B̂_{0,12} of (10.3.47) is that of
Therefore, the limiting behavior of n^{1/2}(B̂_{0,12} - B_{0,12}) is the limiting behavior of
In the canonical form, B̂_{0,12} is estimating the zero matrix with an error that is
converging to a matrix of normal random variables. Because the maximum likelihood estimators are invariant to linear transformations, the distribution for the estimators of the parameters of the original model is the appropriate transformation of the canonical B̂_{0,12}. The limiting distribution of the vector of other elements of B̂ is obtained by the arguments used in the proof of Theorem 10.3.4. A
Table 10.A.6 of Appendix 10.A contains percentiles to be used in testing for unit roots in the characteristic equation (10.3.19). The entries in the table are the percentiles associated with
λ̃_i = (1 + d_f^{-1} λ̂_i)^{-1} λ̂_i , (10.3.48)
where λ̂_i, i = 1, 2, ..., k, are the roots of the determinantal equation (10.3.41), and d_f is the degrees of freedom for Σ̂_ee. Monte Carlo studies conducted by Heon Jin Park (personal communication) indicate that in small samples with g > 1 the statistics based upon λ̃_i have power superior to that of the corresponding tests based on λ̂_i.
The first row of Table 10.A.6 is the distribution of the test statistic for testing the hypothesis of a single unit root against the alternative of no unit roots, the second row is for the hypothesis of two unit roots against the alternative of no unit roots, the third row is for the hypothesis of two unit roots against the alternative of one unit root, the fourth row is for the hypothesis of three unit roots against the alternative of no unit roots, etc. The test statistic for row 1 is λ̃_k, the test statistic for row 2 is λ̃_{k-1} + λ̃_k, the test statistic for row 3 is λ̃_{k-1}, the test statistic for row 4 is λ̃_{k-2} + λ̃_{k-1} + λ̃_k, the test statistic for row 5 is λ̃_{k-2} + λ̃_{k-1}, etc. The numbers at the left are the degrees of freedom associated with Σ̂_ee.
Table 10.A.7 contains the percentiles of analogous statistics for the model (10.3.20) estimated with an intercept. Table 10.A.8 contains the percentiles of analogous statistics for the model estimated with a time trend.
The tables were constructed by Heon Jin Park using the Monte Carlo method. The limiting percentiles were calculated by simulation using the infinite series representations of Lemma 10.3.1. The percentiles were smoothed using a function of d_f, where d_f is the degrees of freedom. The standard errors of the entries in the tables are a function of the size of the entries and range from 0.001 for the smallest entries to 0.25 for the largest entries.
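The Monte Carlo construction of such tables can be imitated in miniature. The sketch below (an illustration with our own function name and settings, not Park's or Dickey's actual program) simulates percentiles of the no-intercept statistic n(ρ̂ - 1) for a driftless random walk, the quantity tabulated in Table 10.A.1:

```python
import numpy as np

def n_rho_percentiles(n=100, reps=20000, probs=(0.01, 0.05, 0.5, 0.95, 0.99), seed=0):
    """Monte Carlo percentiles of n*(rho_hat - 1) for a driftless random walk.

    rho_hat is the no-intercept least squares coefficient of Y_{t-1};
    this imitates, in miniature, the simulations behind Table 10.A.1.
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        y = np.cumsum(rng.standard_normal(n))   # random walk with Y_0 = 0
        y_lag, y_cur = y[:-1], y[1:]
        rho_hat = np.dot(y_lag, y_cur) / np.dot(y_lag, y_lag)
        stats[r] = n * (rho_hat - 1.0)
    return np.quantile(stats, probs)
```

Smoothing across sample sizes and degrees of freedom, as done for the printed tables, is omitted here.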
We now discuss the estimation of H_1 - I of equation (10.3.20) subject to the linear constraints associated with the presence of unit roots. Assume that the least squares statistics have been computed, and let Ĥ_1 be the least squares estimator of H_1. Let V̂_11 be the part of the inverse of the matrix of sums of squares and products associated with Y_{t-1}. Each column of Ĥ'_1 is a vector of regression coefficients, and the ordinary least squares estimator of the "covariance matrix" of a column is V̂_11 multiplied by the error variance. Thus, arguing in a nonrigorous manner, the rows of
have the property that they are estimated to be uncorrelated.
MULTIVARIATE AUMREGRBSSWE PROCESSES WITH UNIT ROOTS 623
Under the hypothesis of g unit roots in the original system, equivalent to g zero roots for H_1 - I, there is a B_{0,1·} such that
(H'_1 - I)B'_{0,1·} = 0 . (10.3.49)
The estimator of B_{0,1·}, up to a linear transformation, is given by the g characteristic vectors of (10.3.40) associated with the g smallest roots. Let
R̂'_1 = V̂_11^{-1/2}(Ĥ'_1 - I)
and
R'_1 = V̂_11^{-1/2}(H'_1 - I) .
Now each column of ν̂ = B̂_{0,1·}R̂_1 contains g linear contrasts that are estimating zero, and B̂_{0,1·}R̂_1 is an estimator of those contrasts. Therefore, an improved estimator of a column of R_1 is obtained by subtracting from the column of R̂_1 an estimator of the error in R̂_1 constructed with the estimator of the g linear contrasts. Thus, the estimator is
where ν̂' = R̂'_1B̂'_{0,1·}, Σ̂_νν = B̂_{0,1·}Σ̂_ee B̂'_{0,1·}, and Σ̂_νe = B̂_{0,1·}Σ̂_ee. It follows that the estimator of H'_1 - I is
H̃'_1 - I = V̂_11^{1/2}R̃'_1 = (Ĥ'_1 - I)(I - B̂'_{0,1·}Σ̂_νν^{-1}Σ̂_νe) , (10.3.51)
The estimated covariance matrix of the estimators r̃_i, i = 1, 2, ..., k, is
Σ̂_rr = Σ̂_ee - Σ̂_eν Σ̂_νν^{-1} Σ̂_νe .
Therefore, the estimated covariance matrix of vec(H̃_1 - I) is

V̂{vec(H̃_1 - I)} = (V̂_11^{1/2} ⊗ I)(I ⊗ Σ̂_rr)(V̂_11^{1/2} ⊗ I)' .
The estimator of H_1 - I is seen to be the least squares estimator adjusted by the estimators of zero, where the coefficient matrix is the matrix of regression coefficients for the regression of the error in Ĥ_1 - I on the estimators of zero. This formulation can be used to construct the maximum likelihood estimators of the entire model. Let
d̂ = vec{(Ĥ_1 - I)'(I, -B̂_{0,12})'} be the kg-dimensional vector of estimators of zero. Then
follows from regression theory. If the original vector autoregressive process is
Y_t = H_0 + H_1Y_{t-1} + H_2ΔY_{t-1} + ... + H_pΔY_{t-p+1} + e_t , (10.3.53)
we may wish to estimate the vector of intercepts Ho with the restriction that
B_{0,1·}H_0 = 0 . (10.3.54)
This is the restriction that the linear combinations of the original variables that are unit root processes are processes with no drift. The estimator of B_{0,1·} is given by the eigenvectors associated with the smallest roots of
|(Ĥ_0, Ĥ_1 - I)V̂_**^{-1}(Ĥ_0, Ĥ_1 - I)' - λΣ̂_ee| = 0 , (10.3.55)
where V̂_** is the portion of the estimated covariance matrix of the least squares estimators associated with the vector of regressor variables (1, Y'_{t-1}).
In Example 10.3.2 we demonstrate how existing software for simultaneous equation systems can be used to construct the estimators and estimated standard errors of the estimators.
Example 10.3.2. To illustrate the testing and estimation methods for multivariate autoregressive time series, we use the data on U.S. interest rates studied by Stock and Watson (1988). The three series are the federal funds rate, denoted by Y_1t; the 90-day treasury bill rate, denoted by Y_2t; and the one-year treasury bill rate, denoted by Y_3t. The data are 236 monthly observations for the period January 1960 through August 1979. The one-year treasury bill interest rate was studied as a univariate series in Examples 10.1.1 and 10.1.2.
Following the procedures of Example 10.1.1, a second order process fit to the differences yields
Δ²Y_1t = 0.017 - 0.550 ΔY_{1,t-1} - 0.161 Δ²Y_{1,t-1} ,
        (0.025)  (0.075)           (0.065)

Δ²Y_2t = 0.021 - 0.782 ΔY_{2,t-1} - 0.025 Δ²Y_{2,t-1} ,
        (0.021)  (0.082)           (0.065)

Δ²Y_3t = 0.018 - 0.768 ΔY_{3,t-1} + 0.105 Δ²Y_{3,t-1} .
        (0.019)  (0.076)           (0.065)

(10.3.56)
MULTIVARIATE AUTOW-SIVE PROCESSES WITH UNIT ROOFS 625
The τ̂_μ statistics for the hypothesis of a second unit root under the maintained hypothesis of a single unit root are -7.37, -9.52, and -10.13, respectively. Thus, in all cases, the hypothesis of a second unit root is rejected at the one percent level.
Third order autoregressive models fit by ordinary least squares to each series give
ΔY_1t = 0.108 - 0.017 Y_{1,t-1} + 0.297 ΔY_{1,t-1} + 0.177 ΔY_{1,t-2} ,
       (0.062)  (0.010)          (0.065)           (0.0··)

ΔY_2t = 0.078 - 0.011 Y_{2,t-1} + 0.200 ΔY_{2,t-1} + 0.035 ΔY_{2,t-2} ,
       (0.062)  (0.012)          (0.066)           (0.066)
                                                              (10.3.57)
ΔY_3t = 0.082 - 0.012 Y_{3,t-1} + 0.343 ΔY_{3,t-1} - 0.094 ΔY_{3,t-2} .
       (0.062)  (0.011)          (0.065)           (0.064)
The three residual mean squares are 0.143, 0.098, and 0.083 for the federal funds rate, 90-day treasury rate, and one-year treasury rate, respectively. There are 233 observations in the regression, and the residual mean square has 229 degrees of freedom. The τ̂_μ statistics for the hypothesis of a unit root in the univariate processes are -1.62, -0.98, and -1.09. The 10% point for τ̂_μ for a sample with 229 degrees of freedom is about -2.57. Hence, the null hypothesis of a unit root is not rejected for any of the series. Models were also fit with time as an additional explanatory variable. In all cases, the hypothesis of a unit root was accepted at the 10% level. On the basis of these results, we proceed as if each series were an autoregressive process with a single unit root and other roots less than one in absolute value.
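The univariate fits of (10.3.57) are ordinary least squares regressions of ΔY_t on an intercept, Y_{t-1}, and lagged differences, with τ̂_μ the t-ratio of the Y_{t-1} coefficient. A minimal sketch of that computation follows; the function name and interface are illustrative, and the interest rate data themselves are not reproduced here:

```python
import numpy as np

def tau_mu(y, p=3):
    """OLS tau_mu statistic for a unit root in a pth order AR with intercept.

    Regresses dY_t on (1, Y_{t-1}, dY_{t-1}, ..., dY_{t-p+1}) and returns the
    t-ratio of the Y_{t-1} coefficient.  A sketch, not the book's exact code.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                  # dy[k] = y[k+1] - y[k]
    rows, resp = [], []
    for t in range(p, len(y)):
        lagged_diffs = [dy[t - 1 - j] for j in range(1, p)]   # dY_{t-1},...,dY_{t-p+1}
        rows.append([1.0, y[t - 1]] + lagged_diffs)
        resp.append(dy[t - 1])                                 # dY_t
    X, z = np.array(rows), np.array(resp)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    s2 = resid @ resid / (len(z) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])
```

The statistic is compared with the τ̂_μ percentiles of Table 10.A.2, not with the normal distribution.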
We write the multivariate process as
or as
ΔY_t = H_0 + (H_1 - I)Y_{t-1} + H_2ΔY_{t-1} + H_3ΔY_{t-2} + e_t , (10.3.59)
where Y_t = (Y_1t, Y_2t, Y_3t)' is the column vector of interest rates, H_0 is a three-dimensional column vector, and H_i, i = 1, 2, 3, are 3 x 3 matrices of parameters. If the multivariate process has a single unit root, the rank of H_1 - I is 2. The rank of H_1 - I is one if the multivariate process has two unit roots.
To investigate the number of unit roots associated with the multivariate process, we compute the roots of
|(Ĥ_1 - I)V̂_11^{-1}(Ĥ_1 - I)' - λΣ̂_ee| = 0 , (10.3.60)
where Ĥ_1 is the least squares estimate of H_1, Σ̂_ee is the least squares estimate of Σ_ee, and V̂_11σ̂_ii is the least squares estimated covariance matrix of the ith row of Ĥ_1. The matrix (Ĥ_1 - I)V̂_11^{-1}(Ĥ_1 - I)' can be computed by subtracting the residual
sum of squares and products matrix for the full model from that for the reduced model
ΔY_t = H_0 + H_2ΔY_{t-1} + H_3ΔY_{t-2} + e_t . (10.3.61)
The full model has nine explanatory variables (Y'_{t-1}, ΔY'_{t-1}, ΔY'_{t-2}) plus the intercept and is fitted to 233 observations. Hence, the estimator of Σ_ee has 223 degrees of freedom. The reduced model contains (ΔY'_{t-1}, ΔY'_{t-2}) and the intercept. The matrices are
Ĥ'_1 - I = ( -0.167   0.043   0.020
             (0.043) (0.034) (0.033)
              0.314  -0.245  -0.058
             (0.121) (0.098) (0.094)
             -0.109   0.189   0.019
             (0.098) (0.080) (0.077) ) ,

Σ̂_ee = ( 0.1304  0.0500  0.0516
         0.0500  0.0856  0.0712
         0.0516  0.0712  0.0788 ) ,                      (10.3.62)

(Ĥ_1 - I)V̂_11^{-1}(Ĥ_1 - I)' = (  2.0636  -0.4693  -0.1571
                                  -0.4693   0.5912   0.1752
                                  -0.1571   0.1752   0.1574 ) .   (10.3.63)
The standard errors as output by an ordinary regression program are given in parentheses below the estimated elements of (Ĥ_1 - I)'. The estimates of H_1 - I are not normally distributed in the limit when the Y_t-process contains unit roots.
The three roots of (10.3.60) are 1.60, 12.67, and 32.30. The test statistic tabled by Park is
Σ_{i=k-g+1}^k (1 + d_f^{-1} λ̂_i)^{-1} λ̂_i = Σ_{i=k-g+1}^k λ̃_i ,
where d_f is the degrees of freedom for Σ̂_ee, λ̂_1 ≥ λ̂_2 ≥ ... ≥ λ̂_k, and g is the hypothesized number of unit roots in the process. The test statistic for the hypothesis of a single unit root is λ̃_3 = 1.59. Because the median of the statistic for a single root is 2.43, the hypothesis of one unit root is easily accepted. The test for two unit roots against the alternative of one unit root is based on the second smallest root, λ̃_2 = 11.99. On the basis of this statistic, the hypothesis of two unit roots is accepted at the 10% level because the 10% point for the largest of two λ̃ given two unit roots is about 12.5. The hypothesis of three unit roots is rejected because λ̃_1 = 28.21 exceeds the 1% tabular value for the largest of three λ̃ given three unit roots. For illustrative purposes, we estimate the process as a multivariate process with two unit roots and diagonal Jordan canonical form.
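The root calculation just described is a generalized eigenvalue problem, det(A - λΣ̂_ee) = 0 with A = (Ĥ_1 - I)V̂_11^{-1}(Ĥ_1 - I)'. The fragment below redoes the computation from the matrices displayed in (10.3.62)-(10.3.63); the code itself is an illustration, not the computation used for the book:

```python
import numpy as np

# A is (H1_hat - I) V11_hat^{-1} (H1_hat - I)' from (10.3.63); Sigma is the
# estimated error covariance matrix from (10.3.62).  The adjusted statistics
# lam_tilde = (1 + lam/d_f)^{-1} lam of (10.3.48) use d_f = 223.
A = np.array([[ 2.0636, -0.4693, -0.1571],
              [-0.4693,  0.5912,  0.1752],
              [-0.1571,  0.1752,  0.1574]])
Sigma = np.array([[0.1304, 0.0500, 0.0516],
                  [0.0500, 0.0856, 0.0712],
                  [0.0516, 0.0712, 0.0788]])

# det(A - lam*Sigma) = 0  <=>  lam is an eigenvalue of Sigma^{-1} A
lam = np.sort(np.linalg.eigvals(np.linalg.solve(Sigma, A)).real)
d_f = 223
lam_tilde = lam / (1.0 + lam / d_f)
print(lam)        # approximately [ 1.60, 12.67, 32.30]
print(lam_tilde)  # approximately [ 1.59, 11.99, 28.21]
```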
The eigenvectors associated with the determinantal equation (10.3.60) are
k̂_1 = (0.4374, 0.1042, 3.1472)' ,
k̂_2 = (1.8061, 5.3600, -6.3725)' ,
k̂_3 = (2.6368, -4.3171, 1.8279)' .
(10.3.64)
If the vector process contains two unit roots, the two time series defined by
(W_1t, W_2t) = (k̂'_1Y_t, k̂'_2Y_t) (10.3.65)
are estimated to be the unit root processes. Furthermore, the errors in these two processes are estimated to be uncorrelated. Thus, it is estimated that
(Ĥ_1 - I)(k̂_1, k̂_2) = (0, 0) (10.3.66)
and that
(k̂_1, k̂_2)'Σ̂_ee(k̂_1, k̂_2) = I .

The determinantal equation (10.3.44) that can be used to define the cointegrating vectors is
| ( 15.00  6.47  7.18       ( 1261.6  897.0  834.9
     6.47  5.26  5.50   - λ    897.0  679.4  639.1
     7.18  5.50  6.69 )        834.9  639.1  615.8 ) | = 0 .
The roots are 0.1264, 0.0537, and 0.0071, and the associated vectors are
(-0.104, 0.313, -0.188)' ,
(-0.056, -0.106, 0.195)' , (10.3.67)
(0.004, 0.056, -0.025)' .
If we assume that there are two unit roots, then the first vector of (10.3.67) defines the single cointegrating vector.
We note that if (10.3.66) holds,
(Ĥ_1 - I)(k̂_1, k̂_2)K_11 = (0, 0) (10.3.68)
for any nonsingular 2 x 2 matrix K_11. In particular, we could let
(10.3.69)
The K_11 of (10.3.69) is nonsingular if Y_1t and Y_2t are both unit root processes and are not both functions of a single common unit root process. Therefore, in practice, care should be exercised in the choice of normalization. For our example, we assume that
where
(H_1 - I)'β = 0 , (10.3.70)
We create a square nonsingular matrix by augmenting β with a column (0, 0, 1)' to obtain
If we multiply (10.3.59) by B', we obtain the restricted model
ΔY_1t = θ_11Y_{1,t-1} + θ_12Y_{2,t-1} + θ_13Y_{3,t-1} + W_tδ_1 + u_1t ,
ΔY_2t = β_21ΔY_1t + W_tδ_2 + u_2t , (10.3.71)
ΔY_3t = β_31ΔY_1t + W_tδ_3 + u_3t ,
where W_t = (1, ΔY'_{t-1}, ΔY'_{t-2}). This model is in the form of a set of simultaneous equations. A number of programs are available to estimate the parameters of such models. We use the full information maximum likelihood option of SAS/ETS to estimate the parameters. The estimates are
ΔY_1t = -0.130 Y_{1,t-1} + 0.396 Y_{2,t-1} - 0.238 Y_{3,t-1} + W_tδ̂_1 ,
        (0.039)           (0.110)           (0.080)

ΔY_2t = -0.487 ΔY_1t + W_tδ̂_2 ,
        (0.319)                                          (10.3.72)

ΔY_3t = -0.122 ΔY_1t + W_tδ̂_3 ,
        (0.238)
where we have omitted the numerical values of (δ̂_1, δ̂_2, δ̂_3) to simplify the display. The vector (-0.130, 0.396, -0.238) applied to Y_t defines a process that is estimated to be stationary. The vector is a multiple of the cointegrating vector given as the first vector of (10.3.67).
The estimator of H'_1 - I estimated subject to the restriction that the rank of H_1 - I is one is
H̃'_1 - I = ( -0.130   0.063   0.016
             (0.039) (0.030) (0.027)
              0.396  -0.192  -0.048
             (0.110) (0.091) (0.087)
             -0.238   0.116   0.029
             (0.080) (0.058) (0.052) ) .    (10.3.73)
TESTING FOR A UNIT ROOT IN A MOVING AVERAGE MODEL 629
The standard errors of the estimates are given in parentheses below the estimates. The estimates are not all normally distributed. However, the difference between the estimate and the true value divided by the standard error converges to a N(0, 1) random variable for a correctly specified model. Therefore, the usual regression testing and confidence interval procedures can be applied to the restricted estimates under the assumption that the process contains two unit roots. The tests are subject to the usual preliminary test bias if one estimates the number of unit roots on the basis of hypothesis tests. The two time series defined by
Y_2t + 0.487Y_1t
and Y_3t + 0.122Y_1t
are processes estimated to have unit roots. These two processes are linear combinations of the two processes in (10.3.65). That is, the two processes are obtained by transforming the first two vectors of (10.3.64).
AA
10.4. TESTING FOR A UNIT ROOT IN A MOVING AVERAGE MODEL
The distributional results developed for the autoregressive process with a unit root can be used to test the hypothesis that a moving average process has a unit root. Let the model be
Y_t = e_t + βe_{t-1} , t = 1, 2, ... , (10.4.1)
where the e_t are iid(0, σ²) random variables. We assume e_0 is unknown and to be estimated. We have
Y_1 = βe_0 + e_1 , Y_t = f_t(Y; β, e_0) + e_t , t = 2, 3, ... , (10.4.2)
where

f_t(Y; β, e_0) = -Σ_{j=1}^{t-1} (-β)^j Y_{t-j} - (-β)^t e_0 .
On the basis of equation (10.4.2) we define the function e_t = e_t(Y; β, e_0), where

e_t(Y; β, e_0) = Σ_{j=0}^{t-1} (-β)^j Y_{t-j} + (-β)^t e_0 .
Then
and
If β = -1, we have from (10.4.1)

Σ_{j=1}^t Y_j = -e_0 + e_t . (10.4.3)
Therefore, treating e_0 as a fixed parameter to be estimated,

ê_0 = -n^{-1} Σ_{t=1}^n Σ_{j=1}^t Y_j (10.4.4)

is the best linear unbiased estimator of e_0 when β = -1, and the error in ê_0 is ê_0 - e_0 = -n^{-1} Σ_{t=1}^n e_t.
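A short numerical check of the telescoping argument behind (10.4.3) and (10.4.4); the simulation settings are arbitrary:

```python
import numpy as np

# With beta = -1 the partial sums of Y_t = e_t - e_{t-1} telescope to
# e_t - e_0, so minus their average estimates e_0 with error -ebar_n.
rng = np.random.default_rng(2)
e = rng.standard_normal(501)          # e_0, e_1, ..., e_500
y = e[1:] - e[:-1]                    # Y_t = e_t - e_{t-1}, t = 1..500
partial = np.cumsum(y)                # sum_{j<=t} Y_j = e_t - e_0
assert np.allclose(partial, e[1:] - e[0])
e0_hat = -partial.mean()              # the estimator (10.4.4)
assert np.isclose(e0_hat - e[0], -e[1:].mean())   # error is -ebar_n
```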
We shall construct a test of the hypothesis that β = -1 by considering one step of a Gauss-Newton iteration for (β, e_0) with (-1, ê_0) as the initial value for (β, e_0). The dependent variable in such an iteration is
e_t(Y; -1, ê_0) = Σ_{j=1}^t Y_j + ê_0 . (10.4.5)
Under the null hypothesis,
e_t(Y; -1, ê_0) = e_t - ē_n .
Let W_t(Y; β, e_0) denote the partial derivative of f_t(Y; β, e_0) with respect to β, and note that W_t(Y; β, e_0) is the negative of the partial derivative of e_t(Y; β, e_0) with respect to β evaluated at (β, e_0). Then, under the null,
The W_t(Y; -1, ê_0) satisfy the recursive relationship
W_1(Y; -1, ê_0) = ê_0 ,
W_2(Y; -1, ê_0) = Y_1 + 2ê_0 ,
W_t(Y; -1, ê_0) = Y_{t-1} + 2W_{t-1}(Y; -1, ê_0) - W_{t-2}(Y; -1, ê_0) ,
t = 3, 4, ... ,
and the e_t(Y; -1, ê_0) satisfy the recursive relationship
TIis"G FOR A UNIT ROOT IN A MOVlNG AVERAGE MODEL a1
e_1(Y; -1, ê_0) = Y_1 + ê_0 , e_t(Y; -1, ê_0) = Y_t + e_{t-1}(Y; -1, ê_0) , t = 2, 3, ... .
The Gauss-Newton iteration consists of regressing e_t(Y; -1, ê_0) on W_t(Y; -1, ê_0) and a second variable that is identically equal to negative one. The regression equation is
e_t(Y; -1, ê_0) = W_t(Y; -1, ê_0) Δβ - Δe_0 + e_t , (10.4.7)
and the estimator of Δβ is

where W̃_t = W_t(Y; -1, ê_0), W̄_n = n^{-1} Σ_{t=1}^n W̃_t, and ẽ_t = e_t(Y; -1, ê_0). The pivotal statistic associated with (10.4.7) is
(10.4.8)
where

s² = (n - 2)^{-1} Σ_{t=1}^n (ẽ_t + Δê_0 - W̃_t Δβ̂)²
and Δê_0 is the estimated regression coefficient for the regression equation (10.4.7). The limiting distribution of n Δβ̂ follows from the results of Section 10.1.1.
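The one-step Gauss-Newton test assembles ê_0, the residuals ẽ_t, and the derivatives W_t from their recursions and then runs a single ordinary regression. A sketch under the null parametrization (our own function name and interface, not the book's code):

```python
import numpy as np

def ma_unit_root_tau(y):
    """One Gauss-Newton step test of beta = -1 in Y_t = e_t + beta*e_{t-1}.

    Builds e0_hat of (10.4.4), the residuals e_t(Y; -1, e0_hat), and the
    derivatives W_t via their recursions, then regresses the residuals on
    (W_t, -1).  Returns (n*delta_beta, tau), tau being the pivotal (10.4.8).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    s = np.cumsum(y)                 # partial sums  sum_{j<=t} Y_j
    e0 = -s.mean()                   # (10.4.4)
    e_tilde = s + e0                 # e_t(Y; -1, e0_hat) by (10.4.5)
    w = np.empty(n)
    w[0] = e0
    if n > 1:
        w[1] = y[0] + 2.0 * e0
    for t in range(2, n):
        w[t] = y[t - 1] + 2.0 * w[t - 1] - w[t - 2]
    X = np.column_stack([w, -np.ones(n)])   # regressors (W_t, -1)
    coef, *_ = np.linalg.lstsq(X, e_tilde, rcond=None)
    resid = e_tilde - X @ coef
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    tau = coef[0] / np.sqrt(cov[0, 0])
    return n * coef[0], tau
```

The resulting tau is referred to the right tail of Table 10.A.12, as discussed below.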
Lemma 10.4.1. Let Y_t satisfy the model (10.4.1) with β_0 = -1. Then

n Δβ̂ converges in distribution to -0.5[G - H² - TK + (12)^{-1}T²]^{-1} ,
τ̂ converges in distribution to -0.5[G - H² - TK + (12)^{-1}T²]^{-1/2} ,

where Δβ̂ is defined in (10.4.7), G, T, and H are defined in Theorem 10.1.3, and K is defined in Theorem 10.1.6.
Proof. Under the null model, the regression coefficient estimating the change in β is
and
Hence,
Σ_{t=1}^n (Σ_{j=0}^{t-1} e_j)e_t = 0.5(X_n² - Σ_{t=0}^n e_t²) ,

and

n Δβ̂ = -0.5[G_n - H_n² - T_nK_n + (12)^{-1}T_n²]^{-1} , (10.4.10)
where
and
K_n = n^{-5/2} Σ_{j=1}^{n-1} (n - j)(j - 1)e_j .
The result for n Δβ̂ follows by the arguments of Theorem 10.1.2. Because Δβ̂ = O_p(n^{-1}) and Δê_0 = O_p(n^{-1/2}), s² converges to σ² and we have the result for τ̂. A
Percentiles for the test statistic τ̂ are given in Table 10.A.12 of Appendix 10.A.
If the true β_0 is greater than -1, the Gauss-Newton procedure should produce a positive Δβ̂ to move the estimate away from -1. However, under the null model all values of Δβ̂ are negative. Therefore, the null hypothesis is rejected for the least negative of the values, the right tail of Table 10.A.12.
Consider the testing situation in which the null model is
Y_t = e_t - e_{t-1} , t = 1, 2, ... , (10.4.11)
and the alternative model is
Y_t = μ + βe_{t-1} + e_t , t = 1, 2, ... , (10.4.12)
where |β| < 1. Under the alternative model,
Y_t = f_t(Y; μ, β, e_0) + e_t , (10.4.13)
where
for t = 2, 3, ... . The partial derivatives are such that one step of a Gauss-Newton iteration for (μ, β, e_0) with (0, -1, ê_0) as the initial value is given by the regression of e_t(Y; 0, -1, ê_0) defined in (10.4.5) on (t, W_t, -1), where ê_0 is defined in (10.4.4) and W_t = W_t(Y; 0, -1, ê_0) is defined in (10.4.6). Under the null, the distribution of the regression pivotal for W_t(Y; 0, -1, ê_0) is that tabulated as τ̂_τ in Table 10.A.2.
Lemma 10.4.2. Let the model (10.4.1) hold with β = -1. Let Δβ̂ be the coefficient of W_t in the regression of e_t(Y; 0, -1, ê_0) on (t, W_t, -1), and let τ̂_τ be the corresponding regression pivotal. Then

τ̂_τ converges in distribution to 0.5(G - H² - 3K²)^{-1/2}[(T - 2H)(T - 6K) - 1] ,
where G, T, H, and K are defined in Theorem 10.1.6.
Proof. Under the null model,
where
and X_t = Σ_{j=1}^t e_j. This is the same as the corresponding expression for the first order model (10.1.34), and the result follows from Theorem 10.1.6. A
A testing situation in which the null model contains a constant term is also of interest. The null model is
Y_t = μ + e_t - e_{t-1} , t = 1, 2, ... , (10.4.14)
and the alternative model is (10.4.12) with |β| < 1. The test statistic is constructed in a similar manner to that for the null model (10.4.11). The initial values for μ and e_0 are
and
ê_0 = -n^{-1} Σ_{t=1}^n Σ_{j=1}^t Y_j + 0.5μ̂(n + 1) .
The value for ẽ_1 is Y_1 + ê_0 - μ̂; we have ẽ_t = Σ_{j=1}^t Y_j + ê_0 - tμ̂ for t = 2, 3, ..., n, and
W_t(Y; μ̂, -1, ê_0) = -0.5t(t - 1)μ̂ + Σ_{j=1}^{t-1} jY_{t-j} + tê_0 .
The test statistic τ̂_μ is the regression pivotal for Δβ associated with the regression equation
e_t(Y; μ̂, -1, ê_0) = (t - 1)Δμ + W_t(Y; μ̂, -1, ê_0)Δβ - Δe_0 + e_t . (10.4.15)
The limiting distribution of the test statistic is given in Lemma 10.4.3. The distribution of the test statistic is tabled in the second part of Table 10.A.12.
Lemma 10.4.3. Let the model (10.4.12) hold with β = -1. Let Δβ̂ be the regression coefficient for W_t(Y; μ̂, -1, ê_0) in the regression (10.4.15), and let τ̂_μ be the corresponding regression pivotal. Then
where D = G - H² - 3K² + 0.5(T - 2H)² - (T - 2H)(12L - H - 3K) ,
G, H, and K are defined in Theorem 10.1.6, and W(r) is the standard Wiener process.
Proof. Omitted. See Arellano and Pantula (1995). A
The testing procedure can be extended to the autoregressive moving average process. Let the model be
or

(10.4.16)
where the roots of
are less than one in absolute value, ζ_1 = Σ_{j=1}^q β_j, and ζ_j = -Σ_{i=j}^q β_i, j = 2, 3, ..., q. Assume it is desired to test ζ_1 = -1. The derivatives of the function with respect to the autoregressive moving average parameters are given in (8.4.9) of Section 8.4. If we set ζ_1 = -1 in the original model, we have
for t = p + 1, p + 2, ... , where X_t = Σ_{j=1}^t Y_j, and
κ = -e_0 - Σ_{i=2}^q ··· .
Therefore, treating κ as fixed, X_t is a stationary process (or is converging to a stationary process) with mean κ. If the original model contains a mean μ, then the model for X_t contains the mean function κ_0 + κ_1 t. The testing procedure consists of the following steps:
1. Create the variable X_t = Σ_{j=1}^t Y_j. If the null model is of the form (10.4.16), estimate θ = (α_1, α_2, ..., α_p, ζ_2, ..., ζ_q) and κ by any of the procedures of Section 8.4. If the null model contains an unknown mean, estimate θ and (κ_0, κ_1) by any of the procedures of Section 8.4. Let ẽ_t = e_t(Y; θ̂, κ̂) be the residuals from this fitting operation for (10.4.16).
2. For the model (10.4.16), evaluate the derivatives at (θ', κ, ζ_1) = (θ̂', κ̂, -1) and carry out one step of the Gauss-Newton procedure with ẽ_t as the dependent variable. The test of ζ_1 = -1 is a test for a unit root of the moving average part of the process. The critical values for the test are the τ̂ of Table 10.A.9. If the null model contains an unknown mean, (κ_0, κ_1) is a part of the model and changes in these two parameters are estimated as part of the Gauss-Newton step. Critical values for the test of the unknown-mean null model based on ζ̂_1 are the values given as τ̂_μ in Table 10.A.9.
Tanaka (1990), Tsay (1993), Saikkonen and Luukkonen (1993), and Breitung (1994) have discussed alternative testing procedures.
Example 10.4.1. To illustrate testing for a moving average unit root, we use computer generated data. The data are given in Table 10.B.3. The model for the data is
Y_t - μ + α_1(Y_{t-1} - μ) + α_2(Y_{t-2} - μ) = e_t + β_1e_{t-1} + β_2e_{t-2}
 = e_t + ζ_1e_{t-1} + ζ_2(e_{t-1} - e_{t-2}) ,
(10.4.17)
where the e_t are NI(0, σ²) random variables, ζ_1 = β_1 + β_2, and ζ_2 = -β_2. The maximum likelihood estimates of the parameters are
(μ̂, α̂_1, α̂_2, β̂_1, β̂_2) = (-0.021, -1.375, 0.601, -0.471, -0.401) .
                            (0.060)  (0.100)  (0.099)  (0.118)  (0.117)
The roots of the moving average polynomial are 0.911 and -0.440. To test the hypothesis that one of the roots of the moving average polynomial is one, we create the variable X_t = Σ_{j=1}^t Y_j and estimate the parameters of the model
(10.4.18)
where κ_0 = -e_0 - β_2e_{-1} and κ_1 = μ(1 + α_1 + α_2), by maximum likelihood. The estimates are
(α̂_1, α̂_2, ζ̂_2, κ̂_0, κ̂_1) = (-1.385, 0.573, 0.481, 0.532, -0.009) .
                              (0.100)  (0.098)  (0.110)  (1.427)  (0.024)
We construct a test for the null model of the form (10.4.17). The dependent variable for the Gauss-Newton step is the residual ẽ_t obtained from the fit of model (10.4.18). We write the regression equation as
The derivatives defining the explanatory variables are given by the recursive equations
Q_{α1,t} = X_{t-1} + Q_{α1,t-1} - ζ̂_2(Q_{α1,t-1} - Q_{α1,t-2}) ,
Q_{α1,2} = X_1 ,
Q_{α2,t} = X_{t-2} + Q_{α2,t-1} - ζ̂_2(Q_{α2,t-1} - Q_{α2,t-2}) ,
Q_{ζ1,t} = ẽ_{t-1} + Q_{ζ1,t-1} - ζ̂_2(Q_{ζ1,t-1} - Q_{ζ1,t-2}) ,
Q_{ζ2,t} = ẽ_{t-1} - ẽ_{t-2} + Q_{ζ2,t-1} - ζ̂_2(Q_{ζ2,t-1} - Q_{ζ2,t-2}) ,

where the equations hold for t ≥ 3, and it is understood that the values are zero for t = 1 and t = 2 unless otherwise specified.
TESTING FOR A UNIT ROOT IN A MOVDJG AVERAGE MODEL 637
Regressing ẽ_t on the derivatives in a regression with 98 observations, we obtain the estimated regression equation
ẽ_t = 0.23 - 0.003(t - 2) - 0.011Q_{α1,t} - 0.016Q_{α2,t} - 0.081Q_{ζ1,t} + 0.058Q_{ζ2,t} .
     (0.32) (0.004)        (0.102)        (0.101)        (0.051)        (0.114)
The test that the moving average contains a unit root is -(0.051)^{-1}(0.081) = -1.59.
The hypothesis of a unit root is just accepted at the 5% level, because -1.59 is just less than the tabular value of -1.563. The hypothesis is rejected at the 10% level because the calculated value is greater than the 10% tabular value of -1.711.
If the null model is (10.4.17) with (ζ_1, μ) = (-1, 0), and the alternative is the model (10.4.17) with μ = 0, we estimate the model in X_t that contains only κ_0. The maximum likelihood estimates are
(α̂_1, α̂_2, ζ̂_2, κ̂_0) = (-1.385, 0.572, 0.483, 0.074) . (10.4.20)
                          (0.099)  (0.098)  (0.109)  (0.724)
The derivatives for (α_1, α_2, ζ_1, ζ_2) are found in the same manner as for the alternative model with a mean. One step of the Gauss-Newton procedure is computed with the intercept but no time variable. The result of the Gauss-Newton step is
ẽ_t = -0.11 - 0.002Q_{α1,t} - 0.016Q_{α2,t} - 0.059Q_{ζ1,t} + 0.051Q_{ζ2,t}
     (0.26)  (0.102)        (0.101)        (0.042)        (0.112)
and the test statistic is -(0.042)^{-1}(0.059) = -1.40. In this case, the hypothesis of a moving average unit root is accepted at the 10% level, because the calculated value is less than the tabular value of -1.282.
If the null is (10.4.17) with (ζ_1, μ) = (-1, 0) and the alternative is the model (10.4.17), we use the residuals associated with the estimates (10.4.20) but include time in the Gauss-Newton step. The result is
ẽ_t = 0.43 - 0.006(t - 2) - 0.009Q_{α1,t} - 0.018Q_{α2,t} - 0.102Q_{ζ1,t} + 0.070Q_{ζ2,t} ,
     (0.36) (0.004)        (0.102)        (0.100)        (0.053)        (0.113)
and the test statistic is -1.91. Because the 10% critical value of the test obtained from Table 10.A.2 is -1.22, the null hypothesis is accepted at the 10% level. AA
REFERENCES

Section 10.1. Bhargava (1986), Dickey (1976), Dickey and Fuller (1979, 1981), Dickey, Hasza, and Miller (1984), Dickey and Said (1982), Elliott, Rothenberg, and Stock (1992), Evans and Savin (1981a,b), Gonzalez-Farias and Dickey (1992), Lai and Wei (1985a,b), Pantula (1982, 1988a), Pantula, Gonzalez-Farias, and Fuller (1994), Phillips (1987a-c), Phillips and Perron (1988), Rao (1961), Reeves (1972), Sargan and Bhargava (1983), Stigum (1974), White (1958).
Section 10.2. Anderson (1959), Basawa (1987), Fuller and Hasza (1980, 1981), Hasza (1977), Rao (1961, 1978a,b), Rubin (1950), Stigum (1974), Venkataraman (1967, 1973), White (1958, 1959).
Section 10.3. Ahn and Reinsel (1990), Chan and Wei (1988), Fountis and Dickey (1989), Fuller, Hasza, and Goebel (1981), Johansen (1988), Johansen and Juselius (1990), Lai and Wei (1985a,b), Nagaraj and Fuller (1991), Phillips (1987a-c, 1988, 1991), Phillips and Durlauf (1986), Stock and Watson (1988).
Section 10.4. Arellano (1992), Arellano and Pantula (1995), Chang (1989), Tsay (1993).
EXERCISES
1. Prove

for fixed X_0, where the e_t are iid(0, σ²).
2. Let the model (10.1.1) hold with ρ = 1. (a) Show that
n[(Σ_{t=1}^n Y_t²)^{-1} Σ_{t=2}^n Y_{t-1}Y_t - 1] converges in distribution to -(2G)^{-1}(T² + 1) .
(b) Show that
3. Let Y_t be the pth order autoregressive process with a unit root considered in Theorem 10.1.2. Let m̂_1 be the largest root of the characteristic equation associated with the ordinary least squares estimator. Show that

n{[(Σ_{t=2}^n Y_t²)^{-1} Σ_{t=2}^n Y_{t-1}²]^{1/2} m̂_1 - 1} converges in distribution to -(2G)^{-1} ,
where G is defined in Theorem 10.1.1.
EXERCISES 639
4. Assume that Y_t = Y_{t-1} + e_t, t = 1, 2, ..., where Y_0 = 0 and e_t ~ NI(0, σ²). Show that
5. Assume that the time series Y_t satisfies the equation

Y_t = ρY_{t-1} + e_t , t = 1, 2, ... ,

where Y_0 = 0 and e_t ~ NI(0, 1). Let ρ̃ = ρ̂_n if ρ̂_n ≤ 1 and ρ̃ = 1 if ρ̂_n > 1, where ρ̂_n is the value of ρ that maximizes (10.1.47). Show that the limiting distribution of n(ρ̃ - 1) has a smaller second moment about zero than the limiting distribution of n(ρ̂_n - 1) when ρ_0 = 1.
6. Show that, under normality, the likelihood ratio statistic for testing H_0: α_1 = α_1^0 against H_A: α_1 ≠ α_1^0 for the model (10.2.6) is a function of τ̂ defined in Corollary 10.2.1.2.
7. Plot the residuals from the model of Example 10.1.1 against time. Do you believe the original errors are identically distributed over time? Do you believe the original errors are normally distributed? Fit autoregressive models of orders 2, 3, and 4 to the square roots of the original observations. Plot the residuals from the second order model.
8. Let the assumptions of Theorem 10.3.1 hold for the model
Y_t = β_0 + β'X_{t-1} + θ_1Y_{t-1} + e_t .
Show that
where
t_θ = [V̂{θ̂_1}]^{-1/2}(θ̂_1 - 1) ,
V̂{θ̂_1} is the ordinary least squares estimator of the variance of the ordinary least squares estimator θ̂_1, τ̂ has the distribution of Table 10.A.2, and d is a N(0, 1) random variable independent of τ̂.
9. Use the facts that Φ̂ satisfies (10.3.40) and Φ̂'Σ̂_eeΦ̂ = I, where Λ̂ = diag(λ̂_1, λ̂_2, ..., λ̂_k), to show that the expression (10.3.47) for B̂_{0,12} is equivalent to that in (10.3.37) when g = 1.
10. Let the following model hold:

Y_1t = π_1Y_{2,t-1} + e_1t ,
Y_2t = Y_{2,t-1} + e_2t

for t = 2, 3, ..., where Y_20 = 0 and (e_1t, e_2t)' ~ NI(0, Σ). Show that the maximum likelihood estimator of π_1 is given by the coefficient of Y_{2,t-1} in the regression of Y_1t on (Y_{2,t-1}, Y_2t - Y_{2,t-1}). What is the coefficient of Y_2t - Y_{2,t-1} estimating? Show that the limiting distribution of the usual regression t-statistic for the coefficient of Y_{2,t-1} in the multiple regression is that of a N(0, 1) random variable. See Phillips (1991).
11. Let model (10.1.25) hold with θ_1 = 1 and assume e_t ~ NI(0, σ²). Show that Δ̂ of (10.1.26) is uncorrelated with θ̂_1 - 1 but that n^{3/2}Δ̂ is not independent of n(θ̂_1 - 1) in the limit.
12. Assume that (Z_t, Y_t) of Table 10.B.4 satisfy the model

Z_t = α_0 + α_1Y_t + e_3t ,
Y_t = π_20 + π_21Y_{t-1} + π_22ΔY_{t-1} + e_2t ,

where (e_3t, e_2t)' ~ NI(0, Σ_ee), π_21 ∈ (-1, 1], and π_20 = 0 if π_21 = 1. Estimate the parameters of the model by ordinary least squares and by the system method of Example 10.3.1. Also estimate the parameters subject to the restriction that (π_20, π_21) = (0, 1).
APPENDIX A
Appendix 1 O.A. Percentiles for Unit Root Distributions
Table 10.A.1. Empirical Cumulative Distribution of n(ρ̂ - 1) for ρ = 1

Probability of a Smaller Value

Statistic    n      0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

ρ̂           25    -11.8    -9.3    -7.3    -5.3   -0.82    1.01    1.41    1.78    2.28
             50    -12.8    -9.9    -7.7    -5.5   -0.84    0.97    1.34    1.69    2.16
            100    -13.3   -10.2    -7.9    -5.6   -0.85    0.95    1.31    1.65    2.09
            250    -13.6   -10.4    -8.0    -5.7   -0.86    0.94    1.29    1.62    2.05
            500    -13.7   -10.4    -8.0    -5.7   -0.86    0.93    1.29    1.61    2.04
             ∞     -13.7   -10.5    -8.1    -5.7   -0.86    0.93    1.28    1.60    2.03

ρ̂_μ         25    -17.2   -14.6   -12.5   -10.2   -4.22   -0.76    0.00    0.64    1.39
             50    -18.9   -15.7   -13.3   -10.7   -4.29   -0.81   -0.07    0.53    1.22
            100    -19.8   -16.3   -13.7   -11.0   -4.32   -0.83   -0.11    0.47    1.13
            250    -20.3   -16.7   -13.9   -11.1   -4.34   -0.84   -0.13    0.44    1.08
            500    -20.5   -16.8   -14.0   -11.2   -4.35   -0.85   -0.14    0.42    1.07
             ∞     -20.6   -16.9   -14.1   -11.3   -4.36   -0.85   -0.14    0.41    1.05

ρ̂_τ         25    -22.5   -20.0   -17.9   -15.6   -8.49   -3.65   -2.51   -1.53   -0.46
             50    -25.8   -22.4   -19.7   -16.8   -8.80   -3.71   -2.60   -1.67   -0.67
            100    -27.4   -23.7   -20.6   -17.5   -8.96   -3.74   -2.63   -1.74   -0.76
            250    -28.5   -24.4   -21.3   -17.9   -9.05   -3.76   -2.65   -1.79   -0.83
            500    -28.9   -24.7   -21.5   -18.1   -9.08   -3.76   -2.66   -1.80   -0.86
             ∞     -29.4   -25.0   -21.7   -18.3   -9.11   -3.77   -2.67   -1.81   -0.88

NOTE. This table was constructed by David A. Dickey using the Monte Carlo method. Details are given in Dickey (1976). Standard errors of the estimates vary, but most are less than 0.10 for entries in the left half of the table and less than 0.02 for entries in the right half of the table. Some entries differ slightly from those of the first edition.
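The note describes Monte Carlo construction of the table. A minimal sketch of one such simulation, for the n(ρ̂ - 1) statistic with no mean adjustment, follows; standard normal errors, Y_0 = 0, and the replicate count are assumptions of the sketch, not details from Dickey (1976).

```python
import random

def n_rho_minus_1(n, seed):
    """One Monte Carlo draw of n*(rho_hat - 1), where rho_hat is the
    no-intercept OLS coefficient from regressing Y_t on Y_{t-1}
    for a driftless random walk with Y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    num = sum(y[t - 1] * y[t] for t in range(1, n + 1))
    den = sum(y[t - 1] ** 2 for t in range(1, n + 1))
    return n * (num / den - 1.0)

# Approximate the n = 100 row of the table from 2000 replicates.
draws = sorted(n_rho_minus_1(100, s) for s in range(2000))
median = draws[len(draws) // 2]
print(median)  # should fall near the tabulated 50th percentile, about -0.85
```

Tail percentiles need far more replicates than the median to reach the accuracy quoted in the note.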
Introduction to Statistical Time Series WAYNE A. FULLER
Copyright 0 1996 by John Wiley & Sons, Inc.
Table 10.A.2. Empirical Cumulative Distribution of τ̂ for ρ = 1

Probability of a Smaller Value

Statistic    n      0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

τ̂           25    -2.65   -2.26   -1.95   -1.60   -0.47    0.92    1.33    1.70    2.15
             50    -2.62   -2.25   -1.95   -1.61   -0.49    0.91    1.31    1.66    2.08
            100    -2.60   -2.24   -1.95   -1.61   -0.50    0.90    1.29    1.64    2.04
            250    -2.58   -2.24   -1.95   -1.62   -0.50    0.89    1.28    1.63    2.02
            500    -2.58   -2.23   -1.95   -1.62   -0.50    0.89    1.28    1.62    2.01
             ∞     -2.58   -2.23   -1.95   -1.62   -0.51    0.89    1.28    1.62    2.01

τ̂_μ         25    -3.75   -3.33   -2.99   -2.64   -1.53   -0.37    0.00    0.34    0.71
             50    -3.59   -3.23   -2.93   -2.60   -1.55   -0.41   -0.04    0.28    0.66
            100    -3.50   -3.17   -2.90   -2.59   -1.56   -0.42   -0.06    0.26    0.63
            250    -3.45   -3.14   -2.88   -2.58   -1.56   -0.42   -0.07    0.24    0.62
            500    -3.44   -3.13   -2.87   -2.57   -1.57   -0.44   -0.07    0.24    0.61
             ∞     -3.42   -3.12   -2.86   -2.57   -1.57   -0.44   -0.08    0.23    0.60

τ̂_τ         25    -4.38   -3.95   -3.60   -3.24   -2.14   -1.14   -0.81   -0.50   -0.15
             50    -4.16   -3.80   -3.50   -3.18   -2.16   -1.19   -0.87   -0.58   -0.24
            100    -4.05   -3.73   -3.45   -3.15   -2.17   -1.22   -0.90   -0.62   -0.28
            250    -3.98   -3.69   -3.42   -3.13   -2.18   -1.23   -0.92   -0.64   -0.31
            500    -3.97   -3.67   -3.42   -3.13   -2.18   -1.24   -0.93   -0.65   -0.32
             ∞     -3.96   -3.67   -3.41   -3.13   -2.18   -1.25   -0.94   -0.66   -0.32

NOTE. This table was constructed by David A. Dickey using the Monte Carlo method. Details are given in Dickey (1976). Standard errors of the estimates vary, but most are less than 0.014. Some entries differ slightly from those of the first edition.
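Table 10.A.2 is the reference distribution for the studentized statistic. A sketch of that statistic for the no-adjustment rows (τ̂), under the same random walk assumptions as above, is:

```python
import math
import random

def tau_hat(y):
    """Studentized unit root statistic (rho_hat - 1)/se(rho_hat) from the
    no-intercept OLS regression of Y_t on Y_{t-1}."""
    n = len(y) - 1
    num = sum(y[t - 1] * y[t] for t in range(1, n + 1))
    den = sum(y[t - 1] ** 2 for t in range(1, n + 1))
    rho = num / den
    sse = sum((y[t] - rho * y[t - 1]) ** 2 for t in range(1, n + 1))
    s2 = sse / (n - 1)                       # residual mean square
    return (rho - 1.0) / math.sqrt(s2 / den)

# One random walk of length 100; compare with the n = 100 row of the table:
# the lower-tail 5 percent point for tau-hat (no adjustment) is -1.95.
rng = random.Random(7)
y = [0.0]
for _ in range(100):
    y.append(y[-1] + rng.gauss(0.0, 1.0))
stat = tau_hat(y)
print(stat, stat < -1.95)
```

A test that removes a mean or a linear trend would compare instead with the τ̂_μ or τ̂_τ rows of the table.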
Table 10.A.3. Cumulative Distribution of Simple Symmetric τ̂_s for ρ = 1

Probability of a Smaller Value

Sample Size n    0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

No Adjustment
    25          -2.72   -2.35   -2.05   -1.74   -0.87   -0.43   -0.37   -0.33   -0.29
    50          -2.71   -2.36   -2.08   -1.78   -0.90   -0.44   -0.38   -0.33   -0.29
   100          -2.70   -2.37   -2.09   -1.79   -0.91   -0.45   -0.38   -0.34   -0.30
   250          -2.70   -2.37   -2.10   -1.80   -0.92   -0.46   -0.39   -0.34   -0.30
   500          -2.70   -2.37   -2.10   -1.80   -0.92   -0.46   -0.39   -0.34   -0.30
    ∞           -2.69   -2.37   -2.10   -1.81   -0.93   -0.46   -0.39   -0.34   -0.30

Mean Removed
    25          -3.40   -3.02   -2.71   -2.37   -1.42   -0.83   -0.73   -0.65   -0.59
    50          -3.28   -2.94   -2.66   -2.35   -1.44   -0.84   -0.73   -0.65   -0.58
   100          -3.23   -2.90   -2.64   -2.34   -1.44   -0.84   -0.73   -0.65   -0.58
   250          -3.20   -2.88   -2.62   -2.34   -1.45   -0.85   -0.73   -0.66   -0.58
   500          -3.19   -2.88   -2.62   -2.33   -1.45   -0.85   -0.73   -0.66   -0.58
    ∞           -3.17   -2.87   -2.62   -2.33   -1.45   -0.85   -0.73   -0.66   -0.58

Linear Trend Removed
    25          -4.19   -3.76   -3.45   -3.09   -2.10   -1.42   -1.28   -1.18   -1.07
    50          -3.99   -3.66   -3.36   -3.04   -2.11   -1.44   -1.29   -1.18   -1.07
   100          -3.89   -3.57   -3.31   -3.02   -2.12   -1.44   -1.30   -1.19   -1.07
   250          -3.84   -3.54   -3.29   -3.01   -2.12   -1.45   -1.30   -1.19   -1.07
   500          -3.82   -3.52   -3.28   -3.00   -2.12   -1.45   -1.30   -1.19   -1.07
    ∞           -3.80   -3.51   -3.27   -2.99   -2.12   -1.45   -1.30   -1.19   -1.07

Quadratic Trend Removed
    25          -4.75   -4.30   -3.97   -3.61   -2.57   -1.84   -1.69   -1.57   -1.45
    50          -4.50   -4.14   -3.85   -3.54   -2.58   -1.86   -1.70   -1.58   -1.45
   100          -4.38   -4.05   -3.79   -3.50   -2.58   -1.87   -1.70   -1.58   -1.45
   250          -4.30   -4.00   -3.76   -3.47   -2.58   -1.87   -1.70   -1.58   -1.44
   500          -4.28   -3.99   -3.74   -3.47   -2.58   -1.87   -1.71   -1.58   -1.44
    ∞           -4.26   -3.97   -3.73   -3.46   -2.58   -1.87   -1.71   -1.58   -1.44

NOTE. This table was constructed by David A. Dickey.
Table 10.A.4. Cumulative Distribution of Weighted Symmetric τ̂_w for ρ = 1

Probability of a Smaller Value

Sample Size n    0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

No Adjustment
    25          -2.73   -2.37   -2.09   -1.80   -1.05   -0.05    0.24    0.48    0.80
    50          -2.74   -2.40   -2.13   -1.85   -1.09   -0.06    0.24    0.51    0.83
   100          -2.74   -2.42   -2.16   -1.88   -1.10   -0.06    0.25    0.53    0.85
   250          -2.75   -2.43   -2.17   -1.89   -1.12   -0.06    0.25    0.53    0.86
   500          -2.75   -2.43   -2.18   -1.90   -1.12   -0.06    0.25    0.54    0.86
    ∞           -2.76   -2.44   -2.18   -1.90   -1.12   -0.07    0.26    0.54    0.86

Mean Removed
    25          -3.33   -2.92   -2.60   -2.26   -1.19   -0.07    0.25    0.51    0.84
    50          -3.21   -2.85   -2.57   -2.25   -1.19   -0.04    0.30    0.58    0.93
   100          -3.16   -2.82   -2.55   -2.24   -1.20   -0.02    0.32    0.62    0.98
   250          -3.12   -2.80   -2.54   -2.23   -1.20   -0.01    0.34    0.64    1.01
   500          -3.11   -2.80   -2.53   -2.23   -1.20   -0.00    0.34    0.65    1.01
    ∞           -3.10   -2.79   -2.52   -2.22   -1.20   -0.00    0.35    0.65    1.01

Linear Trend Removed
    25          -4.11   -3.70   -3.37   -3.02   -1.98   -1.07   -0.82   -0.60   -0.35
    50          -3.93   -3.57   -3.28   -2.96   -1.96   -1.01   -0.72   -0.48   -0.19
   100          -3.84   -3.51   -3.24   -2.94   -1.96   -0.97   -0.68   -0.42   -0.11
   250          -3.78   -3.47   -3.21   -2.92   -1.95   -0.95   -0.65   -0.38   -0.06
   500          -3.76   -3.45   -3.20   -2.91   -1.95   -0.95   -0.64   -0.37   -0.05
    ∞           -3.75   -3.45   -3.19   -2.91   -1.94   -0.94   -0.63   -0.36   -0.03

NOTE. This table was constructed by Heon Jin Park.
Table 10.A.5. Cumulative Distribution of Maximum Likelihood τ̂_m for ρ = 1

Probability of a Smaller Value

Sample Size n    0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

No Adjustment
    25          -2.80   -2.44   -2.16   -1.86   -0.99   -0.40   -0.32   -0.27   -0.23
    50          -2.78   -2.44   -2.17   -1.87   -1.00   -0.40   -0.32   -0.27   -0.22
   100          -2.77   -2.44   -2.17   -1.88   -1.00   -0.39   -0.32   -0.27   -0.22
   250          -2.76   -2.44   -2.17   -1.88   -1.00   -0.39   -0.32   -0.27   -0.22
   500          -2.75   -2.44   -2.17   -1.88   -1.00   -0.39   -0.31   -0.27   -0.22
    ∞           -2.75   -2.44   -2.17   -1.88   -1.00   -0.39   -0.31   -0.27   -0.22

Mean Removed
    25          -3.52   -3.11   -2.78   -2.44   -1.44   -0.65   -0.51   -0.42   -0.35
    50          -3.33   -2.98   -2.69   -2.38   -1.42   -0.64   -0.51   -0.42   -0.34
   100          -3.24   -2.92   -2.64   -2.34   -1.41   -0.64   -0.50   -0.42   -0.34
   250          -3.19   -2.88   -2.61   -2.32   -1.40   -0.64   -0.50   -0.41   -0.34
   500          -3.17   -2.86   -2.60   -2.31   -1.39   -0.64   -0.50   -0.41   -0.34
    ∞           -3.16   -2.85   -2.59   -2.30   -1.38   -0.63   -0.50   -0.41   -0.34

Linear Trend Removed
    25          -4.40   -3.97   -3.62   -3.25   -2.17   -1.35   -1.15   -1.00   -0.87
    50          -4.07   -3.72   -3.43   -3.11   -2.12   -1.34   -1.14   -1.00   -0.86
   100          -3.92   -3.60   -3.34   -3.04   -2.10   -1.33   -1.14   -0.99   -0.86
   250          -3.84   -3.54   -3.28   -3.00   -2.08   -1.32   -1.13   -0.99   -0.85
   500          -3.82   -3.52   -3.27   -2.98   -2.08   -1.32   -1.13   -0.99   -0.85
    ∞           -3.79   -3.50   -3.25   -2.97   -2.07   -1.31   -1.13   -0.99   -0.85

NOTE. This table was constructed by Heon Jin Park.
Table 10.A.6. Vector Ordinary Least Squares: No Adjustment

Probability of a Smaller Value

D.f.  H0  Ha   0.01   0.025   0.05   0.10   0.50   0.90   0.95   0.975   0.99
23 1 0 0.00 0.00 0.01 0.02 0.58 2.70 3.68 4.62 5.80
22 2 0 1.18 1.49 1.84 2.31 4.75 8.35 9.53 10.60 11.89 2 1 1.01 1.28 1.58 1.99 4.16 7.42 8.48 9.47 10.59
21 3 0 5.38 6.11 6.80 7.67 11.34 15.76 17.09 18.34 19.83 3 1 5.11 5.83 6.50 7.32 10.84 15.06 16.34 17.49 18.86 3 2 3.34 3.82 4.30 4.90 7.48 10.63 11.58 12.37 13.32
20 4 0 12.06 13.11 14.04 15.19 19.62 24.63 26.10 27.47 29.00 4 1 11.78 12.82 13.72 14.84 19.15 24.03 25.46 26.75 28.29 4 2 9.78 10.67 11.46 12.44 16.17 20.27 21.43 22.51 23.70 4 3 5.81 6.40 6.92 7.55 10.05 12.77 13.54 14.19 14.91
48 1 0 0.00 0.00 0.01 0.02 0.59 2.83 3.90 4.96 6.36
47 2 0 1.21 1.56 1.93 2.43 5.12 9.35 10.84 12.23 13.90 2 1 1.04 1.34 1.65 2.10 4.50 8.39 9.79 11.02 12.67
46 3 0 5.80 6.64 7.43 8.44 12.84 18.53 20.38 22.06 24.13 3 1 5.53 6.35 7.10 8.06 12.31 17.84 19.62 21.24 23.19 3 2 3.61 4.17 4.73 5.42 8.62 12.95 14.38 15.70 17.23
45 4 0 13.55 14.85 16.04 17.47 23.31 30.26 32.42 34.36 36.67 4 1 13.26 14.53 15.68 17.10 22.81 29.65 31.80 33.67 35.88 4 2 11.08 12.15 13.19 14.43 19.49 25.56 27.40 29.07 31.01 4 3 6.62 7.34 8.02 8.87 12.41 16.82 18.19 19.38 20.83
98 1 0 0.00 0.00 0.01 0.02 0.59 2.91 4.03 5.16 6.69
97 2 0 1.23 1.59 1.97 2.49 5.31 9.91 11.56 13.15 15.08 2 1 1.06 1.37 1.69 2.15 4.67 8.91 10.49 11.98 13.84
96 3 0 6.00 6.89 7.74 8.83 13.62 20.08 22.23 24.21 26.64 3 1 5.73 6.59 7.39 8.44 13.09 19.37 21.46 23.40 25.74 3 2 3.74 4.34 4.94 5.69 9.23 14.29 16.00 17.59 19.56
95 4 0 14.33 15.76 17.06 18.66 25.32 33.50 36.10 38.44 41.23 4 1 14.01 15.41 16.68 18.26 24.81 32.86 35.44 37.74 40.48 4 2 11.72 12.93 14.07 15.46 21.32 28.61 30.94 33.02 35.46 4 3 7.05 7.84 8.60 9.56 13.71 19.19 20.97 22.61 24.61
Table 10.A.6.-Continued

Probability of a Smaller Value

D.f.  H0  Ha   0.01   0.025   0.05   0.10   0.50   0.90   0.95   0.975   0.99
248 1 0 0.00 0.00 0.01 0.02 0.60 2.95 4.10 5.27 6.87
247 2 0 1.25 1.61 1.99 2.53 5.42 10.26 12.01 13.73 15.87 2 1 1.07 1.38 1.71 2.18 4.77 9.25 10.93 12.58 14.59
246 3 0 6.12 7.03 7.92 9.05 14.11 21.07 23.42 25.60 28.28 3 1 5.83 6.71 7.56 8.65 13.57 20.35 22.65 24.79 27.41 3 2 3.82 4.44 5.06 5.85 9.61 15.15 17.05 18.83 21.11
245 4 0 14.80 16.32 17.69 19.39 26.57 35.59 38.49 41.15 44.27 4 1 14.46 15.94 17.30 18.98 26.06 34.95 37.81 40.45 43.55 4 2 12.11 13.42 14.62 16.10 22.47 30.59 33.25 35.66 38.48 4 3 7.31 8.15 8.96 9.99 14.54 20.74 22.83 24.79 27.21
498 1 0 0.00 0.00 0.01 0.02 0.60 2.96 4.12 5.30 6.91
497 2 0 1.25 1.61 2.00 2.54 5.46 10.37 12.16 13.92 16.14 2 1 1.08 1.39 1.72 2.19 4.80 9.36 11.08 12.75 14.86
496 3 0 6.16 7.07 7.98 9.12 14.28 21.42 23.84 26.07 28.85 3 1 5.86 6.75 7.61 8.72 13.73 20.69 23.06 25.27 27.98 3 2 3.84 4.47 5.09 5.90 9.74 15.43 17.42 19.28 21.66
495 4 0 14.95 16.50 17.90 19.65 26.99 36.31 39.32 42.10 45.34 4 1 14.62 16.12 17.51 19.23 26.48 35.67 38.64 41.42 44.62 4 2 12.23 13.57 14.80 16.33 22.86 31.27 34.04 36.58 39.56 4 3 7.40 8.27 9.09 10.14 14.82 21.28 23.49 25.55 28.13
∞ 1 0 0.00 0.00 0.01 0.02 0.60 2.97 4.13 5.31 6.94
2 0 1.25 1.62 2.01 2.55 5.49 10.48 12.31 14.10 16.42 2 1 1.08 1.39 1.72 2.20 4.83 9.48 11.23 12.88 15.14
3 0 6.19 7.11 8.03 9.18 14.46 21.78 24.26 26.54 29.41 3 1 5.89 6.78 7.66 8.77 13.91 21.06 23.47 25.75 28.56 3 2 3.87 4.49 5.13 5.95 9.88 15.71 17.82 19.77 22.22
4 0 15.10 16.67 18.12 19.91 27.41 37.04 40.17 43.08 46.47 4 1 14.78 16.29 17.73 19.50 26.89 36.40 39.51 42.44 45.69 4 2 12.36 13.72 15.00 16.56 23.25 31.96 34.81 37.54 40.71 4 3 7.50 8.39 9.21 10.31 15.11 21.83 24.18 26.31 29.07
NOTE. This table was constructed by Heon Jin Park. See equation (10.3.48) for the definition of the statistics.
Table 10.A.7. Vector Ordinary Least Squares: Mean Removed

Probability of a Smaller Value

D.f.  H0  Ha   0.01   0.025   0.05   0.10   0.50   0.90   0.95   0.975   0.99

22  1 0  0.00 0.02 0.07 0.24 2.13 5.20 6.25 7.23 8.44
21  2 0  2.51 2.99 3.51 4.18 7.32 11.27 12.57 13.68 15.03
    2 1  2.07 2.46 2.85 3.38 5.79 8.92 9.89 10.75 11.75
20  3 0  7.90 8.74 9.50 10.46 14.38 18.93 20.30 21.53 23.03
    3 1  7.33 8.11 8.80 9.67 13.18 17.17 18.36 19.43 20.65
    3 2  4.53 5.05 5.53 6.13 8.59 11.41 12.23 12.93 13.73
19  4 0  15.22 16.27 17.23 18.37 22.74 27.65 29.10 30.41 32.M
    4 1  14.62 15.63 16.53 17.60 21.69 26.18 27.50 28.66 29.99
    4 2  11.78 12.62 13.37 14.28 17.69 21.34 22.41 23.30 24.32
    4 3  6.79 7.33 7.82 8.39 10.64 13.02 13.65 14.19 14.81
47  1 0  0.00 0.02 0.07 0.25 2.30 5.88 7.20 8.43 9.99
46  2 0  2.69 3.28 3.85 4.63 8.33 13.42 15.07 16.57 18.40
    2 1  2.23 2.68 3.13 3.74 6.65 10.80 12.25 13.51 15.10
45  3 0  8.92 9.93 10.91 12.13 17.22 23.46 25.41 27.19 29.30
    3 1  8.28 9.23 10.11 11.24 15.87 21.60 23.36 24.98 26.90
    3 2  5.15 5.81 6.43 7.20 10.54 14.88 16.23 17.44 18.92
44  4 0  18.02 19.46 20.75 22.33 28.58 35.83 38.06 40.07 42.40
    4 1  17.33 18.74 19.95 21.45 27.42 34.28 36.36 38.28 40.49
    4 2  14.05 15.23 16.28 17.57 22.72 28.65 30.48 32.09 33.99
    4 3  8.19 8.96 9.65 10.50 14.01 18.23 19.54 20.65 22.10
97  1 0  0.00 0.02 0.07 0.26 2.38 6.26 7.72 9.10 10.82
96  2 0  2.77 3.40 4.03 4.86 8.89 14.60 16.53 18.29 20.50
    2 1  2.31 2.79 3.27 3.93 7.12 11.90 13.59 15.11 17.07
95  3 0  9.44 10.58 11.66 13.03 18.79 26.10 28.45 30.61 33.26
    3 1  8.78 9.84 10.82 12.08 17.38 24.18 26.36 28.37 30.83
    3 2  5.48 6.21 6.91 7.77 11.65 16.93 18.67 20.30 22.26
94  4 0  19.47 21.12 22.61 24.47 31.87 40.67 43.45 45.94 48.96
    4 1  18.75 20.34 21.76 23.53 30.65 39.07 41.72 44.14 47.05
    4 2  15.24 16.59 17.83 19.35 25.59 33.08 35.44 37.58 40.13
    4 3  8.93 9.81 10.64 11.66 15.99 21.50 23.29 24.91 26.97
Table 10.A.7.-Continued

Probability of a Smaller Value

D.f.  H0  Ha   0.01   0.025   0.05   0.10   0.50   0.90   0.95   0.975   0.99
247 1 0 0.00 0.02 0.07 0.26 2.43 6.47 8.01 9.48 11.34
246 2 0 2.82 3.47 4.13 5.00 9.23 15.36 17.47 19.42 21.93 2 1 2.35 2.86 3.36 4.04 7.41 12.60 14.44 16.17 18.38
245 3 0 9.76 10.99 12.13 13.58 19.77 27.83 30.45 32.88 35.92 3 1 9.08 10.21 11.26 12.60 18.33 25.85 28.33 30.63 33.49 3 2 5.69 6.46 7.20 8.13 12.35 18.29 20.32 22.24 24.57
244 4 0 20.38 22.15 23.78 25.81 33.97 43.88 47.05 49.89 53.48 4 1 19.64 21.35 22.90 24.85 32.71 42.25 45.32 48.09 51.53 4 2 15.99 17.45 18.81 20.50 27.44 36.03 38.79 41.32 44.39 4 3 9.41 10.36 11.27 12.40 17.28 23.73 25.91 27.93 30.44
497 1 0 0.00 0.02 0.07 0.26 2.45 6.52 8.10 9.59 11.53
496 2 0 2.84 3.49 4.15 5.04 9.33 15.62 17.79 19.82 22.43 2 1 2.36 2.88 3.39 4.07 7.50 12.83 14.72 16.54 18.85
495 3 0 9.86 11.13 12.28 13.75 20.10 28.43 31.15 33.68 36.84 3 1 9.19 10.34 11.41 12.76 18.65 26.43 29.02 31.41 34.42 3 2 5.76 6.54 7.30 8.25 12.59 18.77 20.91 22.92 25.38
494 4 0 20.69 22.51 24.19 26.27 34.70 45.00 48.32 51.30 55.09 4 1 19.96 21.69 23.30 25.30 33.42 43.37 46.59 49.48 53.13 4 2 16.25 17.75 19.15 20.90 28.08 37.07 39.98 42.65 45.93 4 3 9.58 10.56 11.49 12.65 17.72 24.53 26.85 29.02 31.70
∞ 1 0 0.00 0.02 0.07 0.26 2.46 6.56 8.16 9.68 11.74
2 0 2.86 3.51 4.17 5.07 9.41 15.89 18.11 20.21 22.94 2 1 2.36 2.88 3.41 4.10 7.59 13.05 14.99 16.93 19.36
3 0 9.97 11.25 12.45 13.91 20.42 29.03 31.85 34.49 37.73 3 1 9.29 10.46 11.56 12.92 18.96 27.02 29.70 32.18 35.32 3 2 5.83 6.62 7.39 8.37 12.83 19.27 21.54 23.59 26.19
4 0 21.04 22.89 24.63 26.74 35.44 46.17 49.62 52.77 56.73 4 1 20.31 22.07 23.73 25.76 34.14 44.55 47.90 50.91 54.79 4 2 16.54 18.09 19.52 21.33 28.72 38.12 41.20 44.00 47.55 4 3 9.78 10.78 11.73 12.92 18.17 25.33 27.85 30.17 33.01
NOTE. This table was constructed by Heon Jin Park. See equation (10.3.48) for the definition of the statistics.
Table 10.A.8. Vector Ordinary Least Squares: Linear Trend Removed

Probability of a Smaller Value

D.f.  H0  Ha   0.01   0.025   0.05   0.10   0.50   0.90   0.95   0.975   0.99

21  1 0  0.08 0.33 0.71 1.32 3.89 7.19 8.24 9.22 10.32
20  2 0  4.20 4.93 5.62 6.49 10.09 14.22 15.44 16.51 17.78
    2 1  3.24 3.73 4.20 4.79 7.31 10.30 11.18 11.96 12.83
19  3 0  10.65 11.62 12.49 13.56 17.72 22.26 23.62 24.78 26.18
    3 1  9.56 10.38 11.14 12.04 15.50 19.26 20.33 21.27 22.39
    3 2  5.73 6.26 6.75 7.35 9.70 12.22 12.95 13.54 14.21
18  4 0  18.56 19.68 20.64 21.84 26.30 31.04 32.44 33.67 35.12
    4 1  17.52 18.51 19.40 20.47 24.41 28.53 29.74 30.79 31.96
    4 2  13.77 14.55 15.28 16.17 19.37 22.62 23.56 24.34 25.22
    4 3  7.79 8.28 8.74 9.30 11.34 13.39 13.93 14.38 14.87
46  1 0  0.11 0.40 0.80 1.43 4.32 8.44 9.86 11.13 12.76
45  2 0  4.71 5.56 6.41 7.46 11.95 17.50 19.31 20.90 22.79
    2 1  3.57 4.16 4.75 5.48 8.74 13.06 14.46 15.75 17.29
44  3 0  12.52 13.79 14.96 16.38 22.08 28.75 30.81 32.67 34.83
    3 1  11.18 12.30 13.30 14.51 19.47 25.33 27.13 28.77 30.70
    3 2  6.79 7.49 8.16 8.98 12.46 16.71 18.07 19.31 20.78
43  4 0  22.95 24.58 25.98 27.73 34.47 42.06 44.25 46.18 48.65
    4 1  21.60 23.05 24.39 26.00 32.18 39.12 41.20 43.01 45.21
    4 2  17.06 18.31 19.46 20.82 26.03 31.92 33.69 35.21 37.14
    4 3  9.77 10.59 11.31 12.17 15.69 19.81 21.05 22.17 23.45
96  1 0  0.13 0.43 0.86 1.50 4.54 9.09 10.72 12.23 14.13
95  2 0  4.96 5.89 6.79 7.94 12.95 19.36 21.46 23.41 25.75
    2 1  3.76 4.39 5.02 5.84 9.51 14.62 16.36 17.98 20.02
94  3 0  13.48 14.88 16.22 17.85 24.49 32.53 35.06 37.37 40.12
    3 1  12.02 13.26 14.40 15.82 21.69 28.90 31.22 33.31 35.84
    3 2  7.32 8.12 8.89 9.85 14.04 19.47 21.28 22.90 24.95
93  4 0  25.21 27.12 28.81 30.92 39.08 48.57 51.44 54.05 57.18
    4 1  23.71 25.44 27.06 29.01 36.60 45.50 48.22 50.68 53.57
    4 2  18.80 20.32 21.70 23.35 29.88 37.63 40.07 42.16 44.76
    4 3  10.85 11.83 12.71 13.78 18.26 23.89 25.69 27.33 29.28
Table 10.A.8.-Continued

Probability of a Smaller Value

D.f.  H0  Ha   0.01   0.025   0.05   0.10   0.50   0.90   0.95   0.975   0.99
246 1 0 0.14 0.44 0.88 1.54 4.68 9.49 11.26 12.93 15.03
245 2 0 5.10 6.08 7.02 8.22 13.56 20.57 22.87 25.06 27.74 2 1 3.86 4.53 5.18 6.05 9.99 15.66 17.63 19.48 21.85
244 3 0 14.05 15.55 16.98 18.75 26.02 35.01 37.89 40.52 43.72 3 1 12.53 13.84 15.08 16.63 23.10 31.26 33.95 36.37 39.36 3 2 7.63 8.51 9.34 10.40 15.05 21.33 23.47 25.39 27.84
243 4 0 26.61 28.71 30.58 32.92 42.05 52.92 56.31 59.42 63.11 4 1 25.01 26.95 28.73 30.89 39.47 49.76 52.99 55.92 59.43 4 2 19.91 21.59 23.11 24.94 32.38 41.49 44.42 47.00 50.16 4 3 11.54 12.61 13.59 14.81 19.96 26.71 28.96 31.02 33.54
496 1 0 0.14 0.45 0.89 1.55 4.72 9.62 11.44 13.17 15.34
495 2 0 5.15 6.13 7.09 8.32 13.76 20.99 23.39 25.66 28.46 2 1 3.89 4.57 5.24 6.11 10.15 16.03 18.09 20.03 22.49
494 3 0 14.23 15.77 17.23 19.05 26.55 35.87 38.88 41.64 45.02 3 1 12.70 14.04 15.30 16.91 23.59 32.10 34.91 37.45 40.63 3 2 7.72 8.64 9.49 10.58 15.40 21.98 24.25 26.28 28.89
493 4 0 27.09 29.26 31.18 33.61 43.09 54.47 58.05 61.33 65.28 4 1 25.46 27.48 29.30 31.53 40.46 51.26 54.69 57.80 61.59 4 2 20.31 22.03 23.58 25.49 33.26 42.87 45.95 48.74 52.18 4 3 11.79 12.88 13.89 15.16 20.56 27.73 30.15 32.38 35.13
∞ 1 0 0.14 0.45 0.89 1.55 4.77 9.77 11.63 13.41 15.67
2 0 5.19 6.16 7.16 8.41 13.95 21.44 24.00 26.32 29.24 2 1 3.91 4.60 5.29 6.17 10.30 16.43 18.59 20.64 23.15
3 0 14.40 16.00 17.47 19.36 27.08 36.76 39.92 42.83 46.41 3 1 12.85 14.23 15.54 17.19 24.08 32.97 35.92 38.57 41.97 3 2 7.82 8.77 9.64 10.78 15.75 22.64 25.05 27.25 30.00
4 0 27.60 29.86 31.81 34.31 44.16 56.11 59.88 63.27 67.64 4 1 25.94 28.06 29.89 32.18 41.48 52.80 56.47 59.74 63.97 4 2 20.75 22.50 24.06 26.05 34.15 44.30 47.50 50.57 54.44 4 3 12.06 13.16 14.19 15.51 21.19 28.78 31.42 33.81 36.88
NOTE. This table was constructed by Heon Jin Park. See equation (10.3.48) for the definition of the statistics.
Table 10.A.9. Percentiles of Test Statistics for Test of Unit Root of Moving Average

Probability of a Smaller Value

Sample Size n    0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

τ̂ of (10.4.8)
    25          -3.87   -3.47   -3.16   -2.83   -1.88   -1.28   -1.16   -1.08   -1.00
    50          -3.71   -3.37   -3.09   -2.79   -1.89   -1.28   -1.16   -1.07   -0.98
   100          -3.64   -3.32   -3.06   -2.77   -1.89   -1.28   -1.16   -1.06   -0.97
   250          -3.59   -3.29   -3.04   -2.76   -1.90   -1.28   -1.16   -1.06   -0.97
   500          -3.58   -3.28   -3.03   -2.75   -1.90   -1.28   -1.16   -1.06   -0.97
   750          -3.57   -3.28   -3.02   -2.75   -1.90   -1.28   -1.16   -1.06   -0.96
    ∞           -3.56   -3.27   -3.02   -2.75   -1.90   -1.28   -1.16   -1.06   -0.96

τ̂_μ of (10.4.15)
    25          -4.51   -4.09   -3.75   -3.40   -2.39   -1.69   -1.55   -1.45   -1.34
    50          -4.28   -3.93   -3.64   -3.33   -2.39   -1.70   -1.56   -1.45   -1.34
   100          -4.17   -3.85   -3.58   -3.29   -2.39   -1.71   -1.56   -1.45   -1.33
   250          -4.10   -3.80   -3.55   -3.27   -2.40   -1.72   -1.56   -1.45   -1.33
   500          -4.08   -3.78   -3.54   -3.27   -2.40   -1.72   -1.56   -1.45   -1.33
   750          -4.07   -3.78   -3.53   -3.26   -2.40   -1.72   -1.57   -1.45   -1.33
    ∞           -4.06   -3.77   -3.53   -3.26   -2.40   -1.72   -1.57   -1.45   -1.33

NOTE. This table was constructed by David A. Dickey.
APPENDIX B

Appendix 10.B. Data Used in Examples
Table 10.B.1. Seasonally Adjusted Interest Rates

Obs.  U1  U2  U3
1 2 3 4 5 6 7 8 9 10 1 1 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
4.12808 4.1 1977 4.02260 4.08650 3.89452 3.29188 3.13677 2.78538 2.41740 2.34835 2.35063 2.00812 1.58808 2.68977 2.20260 1.66650 2.02452 1.70188 1.06678 1.80538 1.69740 2.13835 2.53M2 2.35812 2.27808 2.51977 2.88260 2.85650 2.33452
4.39345 4.10203 3.39690 3.23849 3.33345 2.52677 2.25655 2.15797 2.39310 2.29151 2.32655 2.18323 2.28345 2.56203 2.47690 2.29849 2.33345 2.39677 2.19655 2.24797 2.193 10 2.29151 2.43655 2.53323 2.76345 2.87203 2.80690 2.73849 2.72345
5.02055 4.47400 3.821 10 3.62040 3.65055 3.15180 2.95945 2.64600 2.71890 2.84960 2.79945 2.58820 2.70055 2.92400 2.901 10 2.81040 2.79055 2.85180 2.7 1945 2.73600 2.73890 2.82960 2.82945 2.91820 3.26055 3.38400 3.12110 2.97040 2.98055
Table 10.B.1.-Continued

Obs.  U1  U2  U3
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
2.65188 2.61678 2.73538 2.71740 2.77835 2,85062 2.95812 3,04808 3.14977 3.16260 3.06651 3.04452 2.96188 2.92678 3.29538 3.29740 3.37835 3.39062 2.4081 1 3.61808 3.62937 3.61260 3.63651 3.54453 3.47 188 3.32678 3.30538 3.26740 3.23835 3.43062 3.8781 1 4.03808 4.12977 4.22260 4.2565 1 4.14453 4.01 188 3.99678 3.92538 3,82740 3.95835 4.01062 4.3481 1 4.55808
2.79677 2.87655 2.67797 2.69310 2.73 15 1 2.78655 2.80323 2.95345 3.06203 2.97690 3.90849 2.96345 3.05677 3.13656 3.17797 3.293 10 3.44151 3.47655 3.45323 3.56344 3.67203 3.62690 3.47849 3.52345 3.54677 3.41656 3.35797 3.44309 3.56151 3.59655 3.77323 3.85344 4.07203 4.01691 3.93849 3.93345 3.86677 3.79656 3.69797 3.83309 4.02151 4.04655 4.31323 4.63344
4.74977 4.79203
2.94180 3.09945 2.92600 2.84890 2.82960 2.86945 2.88820 3.07055 3.17399 3.1 1 110 3.10040 3.13055 3.16180 3.32946 3.32601 3.42890 3.53960 3.59945 3.63820 3.75054 3.88399 3.921 10 3.82040 3.78055 3.75180 3.56946 3.49601 3.58890 3.71960 3.78945 3.90820 3.98054 4.17399 4.161 10 4.07040 4.03055 3.94180 3.81946 3.78601 3.94890 4.08960 4.15945 4.50820 4.76054 4.98399
Obs.  U1  U2  U3
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 1 0 101 102 103 104 105 106 107 108 109 110 I l l 112 113 114 115 116 117 118 119
4.83260 4.8365 1 4.94453 5.14188 5.20678 5.33538 5.21740 5.40835 5.68062 5.4281 1 5.07808 5.14977 4.71260 4.21651 3.98453 3.95188 3.69678 3.69538 3.8 1739 3.75835 4.03062 4.5381 1 4.73808
5.23260 5.92651 6.16453 6.04188 5.92678 5.83538 5.59739 5.79835 5.72062 6.048 1 1 6.43807 6.78977 6.97260 7.5765 1 8.71453 8.87188 8.51678 8.99538 8.96739 8.87835 8.76062
4.86977
4.67691 4.62849 4.68345 4.56677 4.75656 4.81797 5.28309 5.34151 5.27655 4.89323 4.76344 4.70203 4.34691 3.84849 3.64345
4.16656 4.12797 4.33309 4.55 15 1 4.68656 4.90323 5.04344 5.12203 525691 5.38849 5.70344 5.58677 5.26656 4.94797 5.10309 5.34151 5.40656 5.89323 6.18344 6.26203 6.10691 6.11849 6.08344 6.50677 6.95656 6.83797 7.0309 6.99151 7.19656
3.60677
4.95111 4.81040 4.92055 4.83180 4.86946 5.16601 5.65889 5.44960 5.41945 4.94819 4.68054 4.74399 4.321 11 3.97040 3.95055 4.21 181 4.82946 4.87601 4.95889 5.14960 5,31945 5.53819 5.37054 5.41399 5.561 11 5.54040 5.91055 5.73181 5.31946 4.97601 5.03889 5.25960 5.43945 5.9 18 19 6.14054 6.38399 6.341 11 6.1 1040 6.18055 6.93181 7.11946 7.10601 7.20889 7.13960 7.28945
Table 10.B.1.-Continued

Obs.  U1  U2  U3
1 20 121 1 22 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 1 62 I 63 164
8.99810 9.11807 9.12977 7.94260 8.26651 7.98453 7.57 188 7.1 1678 6.41538 6.10739 6.07835 5.51062 4.92810 4.27807 3.86977 3.89260 4.31651 4.67453 4.88188 5.21679 5.37538 5.36739 5.07835 4.82062 4.16810 3.63807 3.43977 4.0126 4.3365 4.3 145 4.4319 4.4568 4.6054 4.6874 4.9183 4.9706 5.3581 6.0781 6.7298 7.2726 7.2865 7.8845 8.4619
10.3068 10.3054
7.75323 7.91344 7.27203 6,71691 6.51849 6.88344 6.7467'7 6.40657 6.26797 6.04309 5.90151 5.23656 4.80322 4.48343 3.84203 3.4669 1 3.86849 4.18344 4.81678 5.35657 4.79798 4.60309 4.45 151 4.17656 3.94322 3.42343 3.34202 3.81691 3.7 1 849 3.73344 3.97678 3.93657
4.57309 4.73151 4.73656 5.00322 5.45343 5.74202 6.17691 6.26849 6.40344 7.25678 7.96657 8.52798
3.87798
7.56819 7.58054 7.22399 6.641 11 6.60040 7.19055 7.121 81 6.54947 6.37601 6.24889 6.15960 5.31945 4.78819 4.47053 4.01 399 3.741 11 4.16040 4.71055 5.38181 5.66947 5.34601 5.04889 4.67960 4.41945 4.34819 3.89053 4.23399 4.571 11 4.72040 4.53055 4.76181 4.82947 4.72601 5.29889 5.31960 5.12945 5.22819 5.65053 6.10399 6.67111 6.58040 6.70055 7.10181 7.89947 8.14601
Table 10.B.1.-Continued

Obs.  U1  U2  U3
165 166 167 168 1 69 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 1 93 194 195 1% 197 198 199 200 201 202 203 204 205 206 207 208 209
10.5974 9.8883 9.9406 9.978 1 9.7881 9.1198 9.5326 10.6765 1 1.3545 11.9019 12.8268 11.8154 11.1574 9.9383 9.3606 8.5581 7.2681 6.3898 5.7226 5.6565 5.2645 5.5219 6.0068 5.9454 6.0574 5.6983 5.1306 5.2281 5.008 1 4.9198 5.0226 4.9865 5.3345 5.4519 5.2168 5.0954 5.0674 4.9083 4.8606 4.6781 4.7481 4.8298 4.8726 4.8965 5.3945
8.20309 7.21151 7.78656 7.38322 7.81343 7.26202 8.04692 8.33849 8.27344 7.96678 7.50657 8.81798 7.97308 7.45151 7.42656 7.08322 6.30343 5.64202 5.57692 5.61849 5.27344 5.40678 6.08657 6.29798 6.33308 5.95151 5.43656 5.37322 4.91343 5.02202 5.08692 4.86849 5.24344 5.47678 5.18657 4.99798 4.99308 4.91151 4.70656 4.28322 4.66343 4.81202 4.68692 4.54849 5.00344
7.92889 7.09960 7.32945 6.958 19 7.08053 6.68399 7.481 1 1 8.15040 8.28055 8.21181 7.96947 8.70601 8.37889 7,51960 7.21945 6.73819 6.34053 5.73399 5.841 1 1 6.47040 5.98055 5.91181 6.56947 6.98601 7.05889 6.40960 5.99945 6.10819 5.15053 5.70399 5.961 I1 5.61040 6.05055 6.17 18 1 5,74947 5.46601 5.35889 5.11960 4.92945 4588 19 5.07052 5.33399 5.331 11 5.17040 5.50055
Table 10.B.1.-Continued

Obs.  U1  U2  U3
210 21 1 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 23 I 232 233 234 235 236
5.3619 5.3268 5.7054 5.9574 6.3483 6.4206 6.5881 6.8381 6.9298 6.9726 7.0565 7.4045 7.5719 7.7 168 7.8454 8.2674 8.8383 9.6706
10.0581 10.208 1 10.2098 10.2726 10.1765 10.2845 10.2619 10.3768 10.7454
5.08678 5.14658 5.34798 5.72308 6.15151 6.05656 6.00322 6.48342 6.59202 6.37692 6.29849 6.45344 6,79678 6.96658 6.93798 7.76308
8.59656 9.01322 9.39342 9.46202 9.56692 9.46849 9.65344 9.12678 9.19658 9.37798
7.9aisi
5.46181 5.49948 5.79601 5.98889 6.44960 6.44945 6.46819 6.87052 7.03399 6.961 12 7.03040 7.35055 7.58181 7.71948 7.55601 7.86888 8.37960 9.12945 9.38819 9.61052 9.56399 9.521 12 9.35040 9.34055 8.86181 8.79948 8.98601
SOURCE. Banking and Monetary Statistics: 1941-1970, Federal Reserve System, Washington, D.C., and Annual Statistical Digest (1970-1979), Federal Reserve System, Washington, D.C. These data differ from those in the original sources because they are seasonally adjusted in a different manner.
Table 10.B.2. Artificial Autoregressive Moving Average Data

Time  Observation    Time  Observation    Time  Observation
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
-0.589 -0.665
1.655 2.689 1.695
-2.183 -6.983 - 10.053 - 10.566 -8.129 -2.242
4.792 9.154 9.521 6.793 2.258
-2.285 -4.200 -2.712 -0.547
0.389 0.731
- 1.075 -2.228 -2.305 -0.519
1.788 4.267 5.913 5.483 3.745 2.017 0.3 1 8
- 1.804
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
-4.546 -5.403 -4.41 1 -3.709 -0,834
2.498 3.103
-0.698 -6.6% - 10.295 -8.954 -4.039
2.415 8.275
12.414 13.220 9.959 3.281
-3.934 -8.842 -9.985 -7.533 -3.458
2.159 8.960
14.563 1 5.020 9.890 3.059
-3.910 -7.551 -8.689 -7.357
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 % 97 98 99
100
-5.698 -3.757 - 1.778 -0.637
1.473 3.078 3.157 3.902 5.557 6.295 4.099
-0.5 18 -5.132 -7.221 -6.819 -5.023 -3.389 -1.546
1.469 3.959 3.387 0.777
-2.547 -5.955 -8.043 -5.394 - 0.869
2.738 3.708 3.245
-0.757 -5.353 -8.213
Table 10.B.3. Computer Generated Moving Average Data
Time Observation Time Observation Time Observation
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
0.0824 -0.7421 -2.5863 - 1.0847
0.08 1 I 2.1080 1.2283 0.2222 0.1139 0.9652 1.7643 22358 0.8803 0.8975 0.8376
-0.3025 - 2.5 194 -2.7883 -3.5046 -4.1382 -3.6002 - 1.5214
2.0135 3.2380 1.5426 1.1661 0.6807
-0.8016 - 1 S89O -1.1519 - 1.33%
0.0265 1.785 1 0.1%5
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
-0.2092 1.1027 0.7200
-0.1 186 -0.7560 -1.1736
0.1670 0.4029 1.4065
-0.6458 - 1.8765 - 1.6670 -0.9132 0.9964 1.6263 0.5360 1 .O236 0.7965 0.2285
-0.2926 -1.31 10
0.3846 1.9833
-0.1662 -0.0513
3,0565 1.9968 0.4235
-1.3013 - 1,235 1 - 1.7 102 - 1.2697 -0.6817
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
100
-0.8659 -0.3166
1.2318 2.6148 I .3507 0.3786 0.0751
-0.1 134 -0.9633 -2.1476 - 1.6736
0.2056 2.3535 1.7980
-0.1434 - 1.47% - 1.4369 -0,7226
0.4259 1.3133 1.8455 0.5274
- 1.8385 -3.3115 - 1.7593
0.0670 0.4853 3.0867 1.7763 1.5428 0.2979
- 1.4703 - 1.4294
Table 10.B.4. Computer Generated Data for Vector Process

Time  X_t  Y_t  Z_t

1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
-5.59 -6.85 -7.25 -6.72 -9.56 -8.20 -7.93 -8.85 -9.55 -7.56 -6.47 -4.98 -3.14 - 1.65 -0.78 -0.65
2.67 5.28 5.96 8.97
13.18 14.49 17.45 19.15 20.99 20.56 19.87 19.88 18.84 18.25 17.46 16.03 14.89 14.69 15.46 14.56 14.45 14.27 14.63 14.12 14.89 15.09 14.16 13.46
- 10.78 - 12.1 1 -13.21 - 12.48 - 12.73 - 13.50 -13.66 - 13.72 - 14.25 - 12.61 -9.67 -7.05 -4.43 -2.79 -3.03 -3.47 -0.30
5.08 8.73
12.29 15.86 19.61 21.46 22.98 23.74 23.45 22.11 21.98 21.47 20.14 18.61 17.23 16.32 14.99 15.41 15.60 14.62 14.44 14.98 16.56 16.39 16.30 15.56 13.34
-6.64 -7.91 -8.12 -6.14 -9.75 -8.82 - 8.06 -8.89 -9.98 -6.24 -4.13 -2.88 -1.05 -0.33 -0.98 - 1.00
5.21 9.59 8.88
1 1.83 16.03 17.49 18.93 20.37 21.60 20.33 18.80 19.77 18.44 17.19 16.24 14.93 14.17 13.62 15.80 14.71 13.66 14.13 15.06 15.38 14.75 15.02 13.57 11.68
Time  X_t  Y_t  Z_t
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
13.96 12.93 10.99 9.5 I 9.63 8.58 8.00 5.16 3.46 4.60 3.10 3.53 5.53 5.20 6.41 6.20 6.30 7.63 8.95 9.25
1 1.25 15.56 17.16 19.77 19.82 24.38 23.36 24.19 25.47 23.57 22.30 22.44 23.50 23.66 25.15 26.57 27.17 28.03 28.43 27.36 27.87 27.17 26.52 24.90
12.44 11.53 11.53 10.19 8.48 7.14 5.00 3.39 I .33 1.13 1.14 2.69 4.21 4.70 4.57 5.29 5.93 7.08 9.84
1211 14.41 17.82 21.12 24.07 24.78 26.91 28.22 28.20 28.35 27.55 25.98 25.37 26.42 28.76 30.08 31.39 32.18 33.44 33.19 32.43 30.06 29-50 29.52 28.90
13.25 12.20 10.99 8.44 8.26 7.50 6.29 3.88 1.81 4.43 3.11 4.77 6.75 5.59 6.3 1 6.78 6.81 8.55
11.15 11.07 13.08 18.29 19.81 22.12 20.39 26.08 24-40 24.18 29.59 22.94 21.05 21.95 24.34 25.53 26.21 27.62 27.81 29.04 28.22 26.74 25.98 26.72 26.54 24.40
Table 10.B.4.-Continued

Time  X_t  Y_t  Z_t
89 90 91 92 93 94 95 96 97 98 99
100
24.56 24.04 21.84 23.68 22.50 20.36 20.53 20.59 19.75 17.93 15.65 15.12
27.58 26.97 26.34 26.11 24.44 24.31 23.21 22.00 20.66 18.60 16.22 14.62
23.50 23.56 21.33 23.50 21.16 20.30 19.60 19.63 18.67 16.28 13.75 13.83
Bibliography
Agiakloglou, C., and Newbold, P. (1992), Empirical evidence on Dickey-Fuller type tests. J. Time Series Anal. 13, 471-483.
Ahlberg, J. H., Nilson, E. N., and Walsh, J. L. (1967), The Theory of Splines and Their Applications. Academic Press, New York.
Ahn, S. K., and Reinsel, G. C. (1988), Nested reduced-rank autoregressive models for multiple time series. J. Amer. Statist. Assoc. 83, 849-858.
Ahn, S. K., and Reinsel, G. C. (1990), Estimation for partially nonstationary multivariate autoregressive models. J. Amer. Statist. Assoc. 85, 813-823.
Ahtola, J., and Tim, G. C. (19841, Parametor inference for a nearly nonstationary first order autoregressive model. Biometrika 71, 263-272.
Ahtola, J.. and Tiao, G. C. (1987), Distributions of least squares estimators of auto- regressive parameters for a process with complex roots on the unit circle. J. Time Series Anal. 8, 1-14.
Aigner, D. J. (1971). A compendium on estimation of the autoregressive moving average model from time series data. Inr. Econ, Rev. 12, 348-371.
Akaike, H. (1969a), Fitting autoregressive models for prediction. Ann. Inst. Stutist. Math. 21, 243-247.
Akaike, H. (1969b). Power spectrum estimation through autoregressive model fitting. Ann. Iwt. Statist. Math. 21, 407-419.
Akaike, H. (19731, Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrih 60, 255-265.
Akaike, H. (1974), Markovian representation of stochastic processes and its application to the analysis of autoregressive moving average prwesses. Ann. Inst. Slanist. Math. 26,
Amemiya, T. (1973). Generalized least square8 with an estimated autocovariance matrix. Econometrica 41, 723-732.
Amemiya, T.. and Fuller, W.A. (1%7), A comparative study of alternative estimators in a distributed lag model. Econometrica 35, 509-529.
Amemiya, Y. (1990). On the convergence of the ordered roots of it sequence of detednantal equations. Linear Algebra Appl. 127, 531-542.
Amos, R. E., and Koopmans, L. H. (19631, Tables of the distribution of the coefficient of coherence for stationary bivaciate Gaussian processes. Sandia Corporation Monograph SCR-483, Albuquerque, New Mexico.
Anderson, B. D. O., and Moore, J. B. (1979). Optimal Filtering. Prentice-Ha& Englewood Cliffs, New Jersey.
Anderson, R. L. (1942), Distribution of the serial correlation coefficient. Ann. Moth. Smisz.
664
363-387.
13, 1-13.
Introduction to Statistical Time Series WAYNE A. FULLER
Copyright 0 1996 by John Wiley & Sons, Inc.
Anderson, R. L., and Anderson, T. W. (1950), Distribution of the circular serial correlation coefficient for residuals from a fitted Fourier series. Ann. Math. Statist. 21, 59-81.
Anderson, T. W. (1948), On the theory of testing serial correlation. Skand. Aktuarietidskrift 31, 88-116.
Anderson, T. W. (1952), Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Statist. 23, 193-212.
Anderson, T. W. (1959), On asymptotic distributions of estimates of parameters of stochastic difference equations. Ann. Math. Statist. 30, 676-687.
Anderson, T. W. (1962), The choice of the degree of a polynomial regression as a multiple decision problem. Ann. Math. Statist. 33, 255-265.
Anderson, T. W. (1971), The Statistical Analysis of Time Series. Wiley, New York.
Anderson, T. W. (1984), Introduction to Multivariate Statistical Analysis (2nd ed.). Wiley, New York.
Anderson, T. W., and Darling, D. A. (1952), Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Statist. 23, 193-212.
Anderson, T. W., and Takemura, A. (1986), Why do noninvertible estimated moving averages occur? J. Time Series Anal. 7, 235-254.
Anderson, T. W., and Taylor, J. B. (1979), Strong consistency of least squares estimators in dynamic models. Ann. Statist. 7, 484-489.
Anderson, T. W., and Walker, A. M. (1964), On the asymptotic distribution of the autocorrelations of a sample from a linear stochastic process. Ann. Math. Statist. 35, 1296-1303.
Andrews, D. W. K. (1987), Least squares regression with integrated or dynamic regressors under weak error assumptions. Econometric Theory 3, 98-116.
Arellano, C. (1992), Testing for trend stationarity versus difference stationarity. Unpublished thesis, North Carolina State University, Raleigh, North Carolina.
Arellano, C., and Pantula, S. G. (1995), Testing for trend stationarity versus difference stationarity. J. Time Series Anal. 16, 147-164.
Ash, R. B. (1972), Real Analysis and Probability. Academic Press, New York.
Ball, R. J., and St. Cyr, E. B. A. (1966), Short term employment functions in British manufacturing industry. Rev. Econom. Stud. 33, 179-207.
Bartle, R. G. (1964), The Elements of Real Analysis. Wiley, New York.
Bartlett, M. S. (1946), On the theoretical specification and sampling properties of autocorrelated time series. Suppl. J. Roy. Statist. Soc. 8, 27-41.
Bartlett, M. S. (1950), Periodogram analysis and continuous spectra. Biometrika 37, 1-16.
Bartlett, M. S. (1966), An Introduction to Stochastic Processes with Special Reference to Methods and Applications (2nd ed.). Cambridge University Press, Cambridge.
Bartlett, M. S. (1978), An Introduction to Stochastic Processes (3rd ed.). Cambridge University Press, Cambridge.
Basawa, I. V. (1987), Asymptotic distributions of prediction errors and related tests of fit for nonstationary processes. Ann. Statist. 15, 46-58.
Basmann, R. L. (1957), A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 77-83.
Beamish, N., and Priestley, M. B. (1981), A study of AR and window spectral estimation. Appl. Statist. 30, No. 1.
Bell, W. R. (1983), A computer program (TEST) for detecting outliers in time series. 1983 Proceedings of the Business and Economic Statistics Section, Amer. Statist. Assoc., 634-639.
Bellman, R. (1960), Introduction to Matrix Analysis. McGraw-Hill, New York.
Beran, J. (1992), Statistical methods for data with long-range dependence. Statist. Sci. 7, 404-416.
Beran, J., and Ghosh, S. (1991), Slowly decaying correlations, testing normality, nuisance parameters. J. Amer. Statist. Assoc. 86, 785-791.
Berk, K. N. (1974), Consistent autoregressive spectral estimates. Ann. Statist. 2, 489-502.
Bernstein, S. (1927), Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 97, 1-59.
Bhansali, R. J. (1980), Autoregressive and window estimates of the inverse correlation function. Biometrika 67, 551-566.
Bhansali, R. J. (1981), Effects of not knowing the order of an autoregressive process I. J. Amer. Statist. Assoc. 76, 588-597.
Bhansali, R. J. (1991), Consistent recursive estimation of the order of an autoregressive moving average process. Int. Statist. Rev. 59, 81-96.
Bhansali, R. J. (1993), Order selection for linear time series models: a review. In Developments in Time Series Analysis (T. Subba Rao, ed.). Chapman and Hall, New York, 50-66.
Bhargava, A. (1986), On the theory of testing for unit roots in observed time series. Rev. Econ. Stud. 53, 369-384.
Billingsley, P. (1968), Convergence of Probability Measures. Wiley, New York.
Billingsley, P. (1979), Probability and Measure. Wiley, New York.
Birnbaum, Z. W. (1952), Numerical tabulation of the distribution of Kolmogorov's statistic for finite sample size. J. Amer. Statist. Assoc. 47, 425-441.
Blackman, R. B., and Tukey, J. W. (1959), The Measurement of Power Spectra from the Point of View of Communications Engineering. Dover, New York.
Bloomfield, P. (1976), Fourier Analysis of Time Series: An Introduction. Wiley, New York.
Bollerslev, T., Chou, R. Y., and Kroner, K. F. (1992), ARCH modeling in finance: A review of the theory and empirical evidence. J. Econometrics 52, 5-59.
Box, G. E. P. (1954), Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Ann. Math. Statist. 25, 290-316.
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994), Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, New Jersey.
Box, G. E. P., and Pierce, D. A. (1970), Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Amer. Statist. Assoc. 65, 1509-1526.
Box, G. E. P., and Tiao, G. C. (1977), A canonical analysis of multiple time series. Biometrika 64, 355-365.
Breidt, P. J., and Carriquiry, A. L. (1994), Stochastic volatility models: an application to S & P 500 Futures. Unpublished manuscript, Iowa State University, Ames, Iowa.
Breitung, J. (1994), Some simple tests of the moving-average unit root hypothesis. J. Time Series Anal. 15, 351-370.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, New York.
Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods (2nd ed.). Springer-Verlag, New York.
Brown, B. M. (1971), Martingale central limit theorems. Ann. Math. Statist. 42, 59-66.
Brown, B. M., and Hewitt, J. I. (1975), Asymptotic likelihood theory for diffusion processes. J. Appl. Prob. 12, 228-238.
Brown, R. G. (1962), Smoothing, Forecasting, and Prediction of Discrete Time Series. Prentice-Hall, Englewood Cliffs, New Jersey.
Bruce, A. G., and Martin, R. D. (1989), Leave-k-out diagnostics for time series (with discussion). J. Roy. Statist. Soc. B 51, 363-424.
Burg, J. P. (1975), Maximum entropy spectral analysis. Ph.D. thesis, Stanford University, Stanford, California.
Burguete, J., Gallant, A. R., and Souza, G. (1982), On unification of the asymptotic theory of nonlinear econometric models. Econ. Rev. 1, 151-190.
Burman, J. P. (1965), Moving seasonal adjustment of economic time series. J. Roy. Statist. Soc. A 128, 534-558.
Carroll, R. J., and Ruppert, D. (1988), Transformation and Weighting in Regression. Chapman and Hall, New York.
Chan, N. H. (1988), On the parameter inference for nearly nonstationary time series. J. Amer. Statist. Assoc. 83, No. 403, 857-862.
Chan, N. H., and Tsay, R. S. (1991), Asymptotic inference for non-invertible moving average time series. Unpublished manuscript, Carnegie Mellon University, Pittsburgh, Pennsylvania.
Chan, N. H., and Wei, C. Z. (1987), Asymptotic inference for nearly nonstationary AR(1) processes. Ann. Statist. 15, 1050-1063.
Chan, N. H., and Wei, C. Z. (1988), Limiting distributions of least squares estimates of unstable autoregressive processes. Ann. Statist. 16, 367-401.
Chang, I., Tiao, G. C., and Chen, C. (1988), Estimation of time series parameters in the presence of outliers. Technometrics 30, 193-204.
Chang, M. C. (1989), Testing for overdifferencing. Unpublished thesis, North Carolina State University, Raleigh, North Carolina.
Chernoff, H. (1956), Large sample theory: parametric case. Ann. Math. Statist. 27, 1-22.
Chung, K. L. (1968), A Course in Probability Theory. Harcourt, Brace and World, New York.
Cleveland, W. S. (1971), Fitting time series models for prediction. Technometrics 13, 713-723.
Cleveland, W. S. (1972), The inverse autocorrelations of a time series and their applications. Technometrics 14, 277-293.
Cochran, W. T., et al. (1967), What is the fast Fourier transform? IEEE Trans. Audio and Electroacoustics AU-15, No. 2, 45-55.
Cochrane, D., and Orcutt, G. H. (1949), Application of least squares regression to relationships containing autocorrelated error terms. J. Amer. Statist. Assoc. 44, 32-61.
Cooley, J. W., Lewis, P. A. W., and Welch, P. D. (1967), Historical notes on the fast Fourier transform. IEEE Trans. Audio and Electroacoustics AU-15, No. 2, 76-79.
Cox, D. D. (1991), Gaussian likelihood estimation for nearly nonstationary AR(1) processes. Ann. Statist. 19, 1129-1142.
Cox, D. R. (1977), Discussion of papers by Campbell and Walker, Tong, and Morris. J. Roy. Statist. Soc. A 140, 453-454.
Cox, D. R. (1984), Long-range dependence: a review. In Statistics: An Appraisal (H. A. David and H. T. David, eds.). Iowa State University Press, Ames, Iowa.
Cox, D. R., and Miller, H. D. (1965), The Theory of Stochastic Processes. Wiley, New York.
Cramér, H. (1940), On the theory of stationary random processes. Ann. Math. 41, 215-230.
Cramér, H. (1946), Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.
Crowder, M. J. (1980), On the asymptotic properties of least squares estimators in autoregression. Ann. Statist. 8, 132-146.
Cryer, J. D., and Ledolter, J. (1981), Small sample properties of the maximum likelihood estimator in the MA(1) model. Biometrika 68, 691-694.
Cryer, J. D., Nankervis, J. C., and Savin, N. E. (1989), Mirror-image distributions in AR(1) models. Econometric Theory 5, 36-52.
Dahlhaus, R. (1989), Efficient parameter estimation for self-similar processes. Ann. Statist. 17, 1749-1766.
Davies, R. B. (1973), Asymptotic inference in stationary Gaussian time-series. Adv. Appl. Prob. 5, 469-497.
Davis, H. T. (1941), The Analysis of Economic Time Series. Principia Press, Bloomington, Indiana.
Davisson, L. D. (1965), The prediction error of stationary Gaussian time series of unknown covariance. IEEE Trans. Inform. Theory 11, 527-532.
DeGracie, J. S., and Fuller, W. A. (1972), Estimation of the slope and analysis of covariance when the concomitant variable is measured with error. J. Amer. Statist. Assoc. 67, 930-937.
DeJong, D. N., Nankervis, J. C., Savin, N. E., and Whiteman, C. H. (1992a), Integration versus trend-stationarity in macroeconomic time series. Econometrica 60, 423-434.
DeJong, D. N., Nankervis, J. C., Savin, N. E., and Whiteman, C. H. (1992b), The power problems of unit root tests for time series with autoregressive errors. J. Econometrics 53, 323-343.
Denby, L., and Martin, R. D. (1979), Robust estimation of the first order autoregressive parameter. J. Amer. Statist. Assoc. 74, 140-146.
Deo, R. (1995), Tests for unit roots in multivariate autoregressive processes. Ph.D. dissertation, Iowa State University, Ames, Iowa.
Dhrymes, P. J. (1971), Distributed Lags: Problems of Estimation and Formulation. Holden-Day, San Francisco.
Diananda, P. H. (1953), Some probability limit theorems with statistical applications. Proc. Cambridge Philos. Soc. 49, 239-246.
Dickey, D. A. (1976), Estimation and hypothesis testing in nonstationary time series. Ph.D. thesis, Iowa State University, Ames, Iowa.
Dickey, D. A., Bell, W. R., and Miller, R. B. (1986), Unit roots in time series models: Tests and implications. Amer. Statistician 40, 12-26.
Dickey, D. A., and Fuller, W. A. (1979), Distribution of the estimators for autoregressive time series with a unit root. J. Amer. Statist. Assoc. 74, 427-431.
Dickey, D. A., and Fuller, W. A. (1981), Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49, 1057-1072.
Dickey, D. A., Hasza, D. P., and Fuller, W. A. (1984), Testing for unit roots in seasonal time series. J. Amer. Statist. Assoc. 79, 355-367.
Dickey, D. A., and Pantula, S. G. (1987), Determining the order of differencing in autoregressive processes. J. Business and Econ. Statist. 5, 455-461.
Dickey, D. A., and Said, S. E. (1982), Testing ARIMA(p, 1, q) versus ARMA(p + 1, q). In Applied Time Series Analysis (O. D. Anderson, ed.). North Holland, Amsterdam.
Diderrich, G. T. (1985), The Kalman filter from the perspective of Goldberger-Theil estimators. Amer. Statistician 39, 193-198.
Diebold, F. X., and Nerlove, M. (1990), Unit roots in economic time series: a selective survey. Advances in Econometrics 8, 3-69.
Draper, N. R., and Smith, H. (1981), Applied Regression Analysis (2nd ed.). Wiley, New York.
Dufour, J. M., and King, M. L. (1991), Optimal invariant tests for the autocorrelation coefficient in linear regressions with stationary or nonstationary AR(1) errors. J. Econometrics 47, 115-143.
Duncan, D. B., and Horn, S. D. (1972), Linear dynamic recursive estimation from the viewpoint of regression analysis. J. Amer. Statist. Assoc. 67, 815-821.
Durbin, J. (1959), Efficient estimation of parameters in moving-average models. Biometrika 46, 306-316.
Durbin, J. (1960), Estimation of parameters in time-series regression models. J. Roy. Statist. Soc. B 22, 139-153.
Durbin, J. (1961), Efficient fitting of linear models for continuous stationary time series from discrete data. Bull. Int. Statist. Inst. 38, 273-281.
Durbin, J. (1962), Trend elimination by moving-average and variate-difference filters. Bull. Int. Statist. Inst. 39, 131-141.
Durbin, J. (1963), Trend elimination for the purpose of estimating seasonal and periodic components of time series. Proc. Symp. Time Series Analysis, Brown University (M. Rosenblatt, ed.). Wiley, New York, 3-16.
Durbin, J. (1967), Tests of serial independence based on the cumulated periodogram. Bull. Int. Statist. Inst. 42, 1039-1049.
Durbin, J. (1968), The probability that the sample distribution function lies between two parallel straight lines. Ann. Math. Statist. 39, 398-411.
Durbin, J. (1969), Tests for serial correlation in regression analysis based on the periodogram of least-squares residuals. Biometrika 56, 1-15.
Durbin, J. (1973), Distribution Theory for Tests Based on the Sample Distribution Function. Regional Conference Series in Applied Mathematics No. 9, SIAM, Philadelphia.
Durbin, J., and Cordero, M. (1993), Handling structural shifts, outliers and heavy-tailed distributions in state space time series models. Unpublished manuscript, London School of Economics and Political Science, London.
Durbin, J., and Watson, G. S. (1950), Testing for serial correlation in least squares regression, I. Biometrika 37, 409-428.
Durbin, J., and Watson, G. S. (1951), Testing for serial correlation in least squares regression, II. Biometrika 38, 159-178.
Dvoretzky, A. (1972), Asymptotic normality for sums of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Prob. 2, University of California Press, Berkeley, 513-535.
Eberlein, E. (1986), On strong invariance principles under dependence assumptions. Ann. Probab. 14, 260-270.
Eicker, F. (1963), Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Statist. 34, 447-456.
Eicker, F. (1967), Limit theorems for regressions with unequal and dependent errors. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, University of California Press, Berkeley, 59-82.
Eisenhart, C., Hastay, M. W., and Wallis, W. A. (1947), Selected Techniques of Statistical Analysis. McGraw-Hill, New York.
Elliott, G. (1993), Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution. Unpublished manuscript, Harvard University, Cambridge, Massachusetts.
Elliott, G., Rothenberg, T. J., and Stock, J. H. (1992), Efficient tests for an autoregressive unit root. Unpublished paper presented at the NBER-NSF Time Series Seminar, Chicago.
Elliott, G., and Stock, J. H. (1992), Efficient tests for an autoregressive unit root. Paper presented at the NBER-NSF Time Series Seminar, Chicago.
Eltinge, J. L. (1991), Exponential convergence properties of autocovariance matrix inverses and latent vector prediction coefficients. Unpublished manuscript, Department of Statistics, Texas A & M University, College Station, Texas.
Elton, C., and Nicholson, M. (1942), The ten-year cycle in numbers of lynx in Canada. J. Animal Ecology 11, 215-244.
Engle, R. F. (1982), Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
Engle, R. F., and Bollerslev, T. (1986), Modelling the persistence of conditional variances. Econometric Reviews 5, 1-50.
Engle, R. F., and Granger, C. W. J. (1987), Co-integration and error correction: representation, estimation, and testing. Econometrica 55, 251-276.
Engle, R. F., and Kraft, D. F. (1981), Multiperiod forecast error variances of inflation estimated from ARCH models. Proc. ASA-Census-NBER Conference on Applied Time Series Analysis of Economic Data (A. Zellner, ed.), 176-177.
Evans, G. B. A., and Savin, N. E. (1981a), The calculation of the limiting distribution of the least squares estimator of the parameter in a random walk model. Ann. Statist. 9, 1114-1118.
Evans, G. B. A., and Savin, N. E. (1981b), Testing for unit roots: 1. Econometrica 49, 753-777.
Evans, G. B. A., and Savin, N. E. (1984), Testing for unit roots: 2. Econometrica 52, 1241-1249.
Ezekiel, M., and Fox, K. A. (1959), Methods of Correlation and Regression Analysis. Wiley, New York.
Fieller, E. C. (1932), The distribution of the index in a normal bivariate population. Biometrika 24, 428-440.
Fieller, E. C. (1954), Some problems in interval estimation. J. Roy. Statist. Soc. B 16, 175-185.
Findley, D. F. (1980), Large sample behavior of the S-array of seasonally nonstationary ARMA series. In Time Series Analysis (O. D. Anderson and M. R. Perryman, eds.). North Holland, Amsterdam, 163-170.
Findley, D. F. (1985), On the unbiasedness property of AIC for exact or approximating linear stochastic time series models. J. Time Series Anal. 6, 229-252.
Findley, D. F., and Wei, C. Z. (1989), Beyond chi-square: likelihood ratio procedures for comparing non-nested possibly incorrect regressors. Unpublished manuscript, U.S. Bureau of the Census, Washington.
Finkbeiner, D. T. (1960), Introduction to Matrices and Linear Transformations. W. H. Freeman, San Francisco.
Fisher, R. A. (1929), Tests of significance in harmonic analysis. Proc. Roy. Soc. London Ser. A 125, 54-59.
Fishman, G. S. (1969), Spectral Methods in Econometrics. Harvard University Press, Cambridge, Massachusetts.
Fountis, N. G. (1983), Testing for unit roots in multivariate autoregressions. Unpublished Ph.D. thesis, North Carolina State University, Raleigh, North Carolina.
Fountis, N. G., and Dickey, D. A. (1989), Testing for a unit root nonstationarity in multivariate autoregressive time series. Ann. Statist. 17, 419-428.
Fox, A. J. (1972), Outliers in time series. J. Roy. Statist. Soc. B 34, 350-363.
Fox, R., and Taqqu, M. S. (1985), Noncentral limit theorems for quadratic forms in random variables having long-range dependence. Ann. Prob. 13, 428-446.
Fox, R., and Taqqu, M. S. (1986), Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series. Ann. Statist. 14, 517-532.
Franklin, J. N. (1968), Matrix Theory. Prentice-Hall, Englewood Cliffs, New Jersey.
Friedman, M., and Schwartz, A. J. (1963), A Monetary History of the United States 1867-1960. Princeton University Press, Princeton, New Jersey.
Fuller, W. A. (1969), Grafted polynomials as approximating functions. Australian J. Agr. Econ. 13, 35-46.
Fuller, W. A. (1979), Testing the autoregressive process for a unit root. Paper presented at the 42nd Session of the International Statistical Institute, Manila.
Fuller, W. A. (1980), The use of indicator variables in computing predictions. J. Econometrics 12, 231-243.
Fuller, W. A. (1984), Nonstationary autoregressive time series. In Handbook of Statistics, 5, Time Series in the Time Domain (E. J. Hannan, P. R. Krishnaiah, and M. M. Rao, eds.). North Holland, Amsterdam, 1-23.
Fuller, W. A. (1987), Measurement Error Models. Wiley, New York.
Fuller, W. A., and Hasza, D. P. (1980), Predictors for the first-order autoregressive process. J. Econometrics 13, 139-157.
Fuller, W. A., and Hasza, D. P. (1981), Properties of predictors for autoregressive time series. J. Amer. Statist. Assoc. 76, 155-161.
Fuller, W. A., Hasza, D. P., and Goebel, J. J. (1981), Estimation of the parameters of stochastic difference equations. Ann. Statist. 9, 531-543.
Fuller, W. A., and Martin, J. E. (1961), The effects of autocorrelated errors on the statistical estimation of distributed lag models. J. Farm Econ. 43, 71-82.
Galbraith, R. F., and Galbraith, J. I. (1974), On the inverses of some patterned matrices arising in the theory of stationary time series. J. Appl. Prob. 11, 63-71.
Gallant, A. R. (1971), Statistical Inference for Nonlinear Regression Models. Ph.D. thesis, Iowa State University, Ames, Iowa.
Gallant, A. R. (1975a), Nonlinear regression. Amer. Statistician 29, 73-81.
Gallant, A. R. (1975b), The power of the likelihood ratio test of location in nonlinear regression models. J. Amer. Statist. Assoc. 70, 199-203.
Gallant, A. R. (1987), Nonlinear Statistical Models. Wiley, New York.
Gallant, A. R., and Fuller, W. A. (1973), Fitting segmented polynomial regression models whose join points have to be estimated. J. Amer. Statist. Assoc. 68, 144-147.
Gallant, A. R., and White, H. (1988), A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.
Gay, R., and Heyde, C. C. (1990), On a class of random field models which allows long range dependence. Biometrika 77, 401-403.
Geweke, J., and Porter-Hudak, S. (1983), The estimation and application of long memory time series models. J. Time Series Anal. 4, 221-237.
Giraitis, L., and Surgailis, D. (1990), A central limit theorem for quadratic forms in strongly dependent linear variables and its application to asymptotic normality of Whittle's estimate. Prob. Theory and Related Fields 86, 87-104.
Gnedenko, B. V. (1967), The Theory of Probability (translated by B. D. Seckler). Chelsea, New York.
Goldberg, S. (1958), Introduction to Difference Equations. Wiley, New York.
Gonzalez-Farias, G. M. (1992), A new unit root test for autoregressive time series. Ph.D. dissertation, North Carolina State University, Raleigh, North Carolina.
Gonzalez-Farias, G., and Dickey, D. A. (1994), An unconditional maximum likelihood test for a unit root. 1994 Proceedings of the Business and Economic Statistics Section, Amer. Statist. Assoc., 139-143.
Gould, J. P., and Nelson, C. R. (1974), The stochastic structure of the velocity of money. Amer. Econ. Rev. 64, 405-417.
Gradshteyn, I. S., and Ryzhik, I. M. (1965), Tables of Integrals, Series and Products (4th ed.). Academic Press, New York.
Granger, C. W. J. (1980), Long memory relationships and the aggregation of dynamic models. J. Econometrics 14, 227-238.
Granger, C. W. J. (1981), Some properties of time series data and their use in econometric model specification. J. Econometrics 16, 121-130.
Granger, C. W. J., and Andersen, A. P. (1978), An Introduction to Bilinear Time Series Models. Vandenhoeck and Ruprecht, Göttingen.
Granger, C. W. J., and Hatanaka, M. (1964), Spectral Analysis of Economic Time Series. Princeton University Press, Princeton, New Jersey.
Granger, C. W. J., and Joyeux, R. (1980), An introduction to long-range time series models and fractional differencing. J. Time Series Anal. 1, 15-30.
Granger, C. W. J., and Teräsvirta, T. (1993), Modelling Nonlinear Economic Relationships. Oxford University Press, Oxford.
Granger, C. W. J., and Weiss, A. A. (1983), Time series analysis of error correction models. In Studies in Econometrics, Time Series and Multivariate Statistics (S. Karlin, T. Amemiya and L. A. Goodman, eds.). Academic Press, New York, 255-278.
Grenander, U. (1950), Stochastic processes and statistical inference. Ark. Mat. 1, 195-275.
Grenander, U. (1954), On the estimation of regression coefficients in the case of an autocorrelated disturbance. Ann. Math. Statist. 25, 252-272.
Grenander, U. (1957), Modern trends in time series analysis. Sankhya 18, 149-158.
Grenander, U., and Rosenblatt, M. (1957), Statistical Analysis of Stationary Time Series. Wiley, New York.
Grether, D. M., and Nerlove, M. (1970), Some properties of "optimal" seasonal adjustment. Econometrica 38, 682-703.
Greville, T. N. E. (1969), Theory and Applications of Spline Functions. Academic Press, New York.
Griliches, Z. (1967), Distributed lags: a survey. Econometrica 35, 16-49.
Grizzle, J. E., and Allen, D. M. (1969), Analysis of growth and dose response curves. Biometrics 25, 357-381.
Gumpertz, M. L., and Pantula, S. G. (1992), Nonlinear regression with variance components. J. Amer. Statist. Assoc. 87, 201-209.
Guo, L., Huang, D. W., and Hannan, E. J. (1990), On ARX(∞) approximation. J. Multivariate Anal. 32, 17-47.
Hall, A. (1994), Testing for a unit root in time series with pretest data-based model selection. J. Business and Econ. Statist. 12, 461-470.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application. Academic Press, New York.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986), Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Hannan, E. J. (1958), The estimation of spectral density after trend removal. J. Roy. Statist. Soc. B 20, 323-333.
Hannan, E. J. (1960), Time Series Analysis. Methuen, London.
Hannan, E. J. (1961), A central limit theorem for systems of regressions. Proc. Cambridge Phil. Soc. 57, 583-588.
Hannan, E. J. (1963), The estimation of seasonal variation in economic time series. J. Amer. Statist. Assoc. 58, 31-44.
Hannan, E. J. (1965), The estimation of relationships involving distributed lags. Econometrica 33, 206-224.
Hannan, E. J. (1969), A note on an exact test for trend and serial correlation. Econometrica 37, 485-489.
Hannan, E. J. (1970), Multiple Time Series. Wiley, New York.
Hannan, E. J. (1971), Nonlinear time series regression. J. Appl. Prob. 8, 767-780.
Hannan, E. J. (1979), The central limit theorem for time series regression. Stoch. Processes Appl. 9, 281-289.
Hannan, E. J. (1980), The estimation of the order of an ARMA process. Ann. Statist. 8, 1071-1081.
Hannan, E. J. (1987), Rational transfer function approximation. Statist. Sci. 2, 135-161.
Hannan, E. J., and Deistler, M. (1988), The Statistical Theory of Linear Systems. Wiley, New York.
Hannan, E. J., Dunsmuir, W. T. M., and Deistler, M. (1980), Estimation of vector ARMAX models. J. Multivariate Anal. 10, 275-295.
Hannan, E. J., and Heyde, C. C. (1972), On limit theorems for quadratic functions of discrete time series. Ann. Math. Statist. 43, 2058-2066.
Hannan, E. J., and Kavalieris, L. (1984), A method for autoregressive-moving average estimation. Biometrika 72, 273-280.
Hannan, E. J., and Nicholls, D. F. (1972), The estimation of mixed regression, autoregression, moving average and distributed lag models. Econometrica 40, 529-548.
Hannan, E. J., and Quinn, B. G. (1979), The determination of the order of an autoregression. J. Roy. Statist. Soc. B 41, 190-195.
Hannan, E. J., and Rissanen, J. (1982), Recursive estimation of mixed autoregressive-moving average order. Biometrika 69, 81-94. Correction (1983), 70, 303.
Hansen, J., and Lebedeff, S. (1987), Global trends of measured surface air temperature. J. Geophys. Res. 92, 13345-13372.
Hansen, J., and Lebedeff, S. (1988), Global surface air temperatures: update through 1987. Geophys. Res. Lett. 15, 323-326.
Hansen, M. H., Hurwitz, W. N., and Madow, W. G. (1953), Sample Survey Methods and Theory, II. Wiley, New York.
Harris, B. (1967), Spectral Analysis of Time Series. Wiley, New York.
Hart, B. I. (1942), Significance levels for the ratio of the mean square successive difference to the variance. Ann. Math. Statist. 13, 445-447.
Hartley, H. O. (1961), The modified Gauss-Newton method for the fitting of nonlinear regression functions by least squares. Technometrics 3, 269-280.
Hartley, H. O., and Booker, A. (1965), Nonlinear least squares estimation. Ann. Math. Statist. 36, 638-650.
Harvey, A. C. (1984), A unified view of statistical forecasting procedures (with comments). J. Forecasting 3, 245-283.
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.
Harvey, A. C., and Durbin, J. (1986), The effects of seat belt legislation on British road casualties: a case study in structural time series modelling (with discussion). J. Roy. Statist. Soc. A 149, 187-227.
Harvey, A. C., and Pierse, R. G. (1984), Estimating missing observations in economic time series. J. Amer. Statist. Assoc. 79, 125-131.
Harvey, A. C., Ruiz, E., and Shephard, N. (1994), Multivariate stochastic variance models. Rev. Econ. Stud. 61, 247-264.
Harville, D. A. (1986), Using ordinary least squares software to compute combined intra-interblock estimates of treatment contrasts. Amer. Statistician 40, 153-157.
Haslett, J., and Raftery, A. E. (1989), Space-time modelling with long-memory dependence: assessing Ireland's wind power resource (with discussion). J. Roy. Statist. Soc. C 38, 1-21.
BlBLNXiRAPHY 675
Hasza, D. P. (1977), Estimation in nonstationary time series. Ph.D. thesis, Iowa State University, Ames, Iowa.
Hasza, D. P. (1980), A note on maximum likelihood estimation for the first-order autoregressive process. Comm. Statist. Theory and Methods A9, No. 13, 1411-1415.
Hasza, D. P., and Fuller, W. A. (1979), Estimation for autoregressive processes with unit roots. Ann. Statist. 7, 1106-1120.
Hasza, D. P., and Fuller, W. A. (1982), Testing for nonstationary parameter specifications in seasonal time series models. Ann. Statist. 10, 1209-1216.
Hatanaka, M. (1973), On the existence and the approximation formulae for the moments of the k-class estimators. Econ. Stud. Quart. 24, 1-15.
Hatanaka, M. (1974), An efficient two-step estimator for the dynamic adjustment model with autoregressive errors. J. Econometrics 2, 199-220.
Herrndorf, N. (1984), A functional central limit theorem for weakly dependent sequences of random variables. Ann. Prob. 12, 141-153.
Higgins, M. L., and Bera, A. K. (1992), A class of nonlinear ARCH models. Int. Econ. Rev. 33, 137-158.
Hildebrand, F. B. (1968), Finite-Difference Equations and Simulations. Prentice-Hall, Englewood Cliffs, New Jersey.
Hildreth, C. (1969), Asymptotic distribution of maximum likelihood estimators in a linear model with autoregressive disturbances. Ann. Math. Statist. 40, 583-594.
Hillmer, S. C., Bell, W. R., and Tiao, G. C. (1983), Modeling considerations in the seasonal adjustment of economic time series. In Applied Time Series Analysis of Economic Data (A. Zellner, ed.). United States Bureau of the Census, Washington.
Hinich, M. (1982), Testing for Gaussianity and linearity of a time series. J. Time Series Anal. 3, 169-176.
Hoeffding, W., and Robbins, H. (1948), The central limit theorem for dependent random variables. Duke Math. J. 15, 773-780.
Hosking, J. R. M. (1981), Fractional differencing. Biometrika 68, 165-176.
Hosking, J. R. M. (1982), Some models of persistence in time series. In Time Series Analysis: Theory and Practice 1 (O. D. Anderson, ed.). North-Holland, Amsterdam.
Huang, D., and Anh, V. V. (1993), Estimation of the nonstationary factor in ARUMA models. J. Time Series Anal. 14, 27-46.
Huber, P. J. (1981), Robust Statistics. Wiley, New York.
Ibragimov, I. A., and Linnik, Y. V. (1971), Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen.
Jacquier, E., Polson, N. G., and Rossi, P. E. (1994), Bayesian analysis of stochastic volatility models. J. Business and Econ. Statist. 12, 371-389.
Jenkins, G. M. (1961), General considerations in the analysis of spectra. Technometrics 3, 133-166.
Jenkins, G. M., and Watts, D. G. (1968), Spectral Analysis and Its Applications. Holden-Day, San Francisco.
Jennrich, R. I. (1969), Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist. 40, 633-643.
Jobson, J. D. (1972), Estimation for linear models with unknown diagonal covariance matrix. Unpublished Ph.D. thesis, Iowa State University, Ames, Iowa.
Jobson, J. D., and Fuller, W. A. (1980), Least squares estimation when the covariance matrix and the parameter vector are functionally related. J. Amer. Statist. Assoc. 75, 176-181.
Johansen, S. (1988), Statistical analysis of cointegration vectors. J. Econ. Dynamics and Control 12, 231-254.
Johansen, S. (1991), Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551-1580.
Johansen, S., and Juselius, K. (1990), Maximum likelihood estimation and inference on cointegration, with applications to the demand for money. Oxford Bull. Econ. and Statist. 52, 169-210.
Johnston, J. (1984), Econometric Methods (3rd ed.). McGraw-Hill, New York.
Jolley, L. B. W. (1961), Summation of Series (2nd ed.). Dover, New York.
Jones, D. A. (1978), Nonlinear autoregressive processes. Proc. Roy. Soc. London A 360, 71-95.
Jones, R. H. (1980), Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22, 389-395.
Jorgenson, D. W. (1964), Minimum variance, linear, unbiased seasonal adjustment of economic time series. J. Amer. Statist. Assoc. 59, 681-724.
Judge, G. C., Griffiths, W. E., Hill, R. C., Lütkepohl, H., and Lee, T.-C. (1985), The Theory and Practice of Econometrics. Wiley, New York.
Kalman, R. E. (1960), A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng. 82, 35-45.
Kalman, R. E. (1963), New methods in Wiener filtering theory. In Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability (J. L. Bogdanoff and F. Kozin, eds.). Wiley, New York.
Kalman, R. E., and Bucy, R. S. (1961), New results in linear filtering and prediction theory. Trans. Amer. Soc. Mech. Eng. J. Basic Eng. 83, 95-108.
Kawashima, H. (1980), Parameter estimation of autoregressive integrated processes by least squares. Ann. Statist. 8, 423-435.
Kempthorne, O., and Folks, L. (1971), Probability, Statistics and Data Analysis. Iowa State University Press, Ames, Iowa.
Kendall, M. G. (1954), Note on bias in the estimation of autocorrelation. Biometrika 41, 403-404.
Kendall, M. G., and Stuart, A. (1966), The Advanced Theory of Statistics 3. Hafner, New York.
King, M. L. (1980), Robust tests for spherical symmetry and their application to least squares regression. Ann. Statist. 8, 1265-1271.
King, M. L. (1988), Towards a theory of point optimal testing. Econometric Rev. 6, 169-218.
Kitagawa, G. (1987), Non-Gaussian state-space modelling of nonstationary time series (with discussion). J. Amer. Statist. Assoc. 82, 1032-1063.
Klimko, L. A., and Nelson, P. I. (1978), On conditional least squares estimation for stochastic processes. Ann. Statist. 6, 629-642.
Kohn, R., and Ansley, C. F. (1986), Estimation, prediction, and interpolation for ARIMA models with missing data. J. Amer. Statist. Assoc. 81, 751-761.
Koopmans, L. H. (1974), The Spectral Analysis of Time Series. Academic Press, New York.
Koopmans, T. (1942), Serial correlation and quadratic forms in normal variables. Ann. Math. Statist. 13, 14-33.
Koopmans, T. C., Rubin, H., and Leipnik, R. B. (1950), Measuring the equation systems of dynamic economics. In Statistical Inference in Dynamic Economic Models (T. C. Koopmans, ed.). Wiley, New York.
Kramer, K. H. (1963), Tables for constructing confidence limits on the multiple correlation coefficient. J. Amer. Statist. Assoc. 58, 1082-1085.
Krämer, W. (1986), Least squares regression when the independent variable follows an ARIMA process. J. Amer. Statist. Assoc. 81, 150-154.
Kreiss, J. P. (1987), On adaptive estimation in stationary ARMA processes. Ann. Statist. 15, 112-133.
Lai, T. L., and Siegmund, D. (1983), Fixed accuracy estimation of an autoregressive parameter. Ann. Statist. 11, 478-485.
Lai, T. L., and Wei, C. Z. (1982a), Asymptotic properties of projections with applications to stochastic regression problems. J. Multivariate Anal. 12, 346-370.
Lai, T. L., and Wei, C. Z. (1982b), Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10, 154-166.
Lai, T. L., and Wei, C. Z. (1982c), A note on martingale difference sequences satisfying the local Marcinkiewicz-Zygmund condition. Columbia University Tech. Report, New York.
Lai, T. L., and Wei, C. Z. (1985a), Asymptotic properties of general autoregressive models and strong consistency of least squares estimates of their parameters. J. Multivariate Anal. 13, 1-23.
Lai, T. L., and Wei, C. Z. (1985b), Asymptotic properties of multivariate weighted sums with applications to stochastic regression in linear dynamic systems. In Multivariate Analysis VI (P. R. Krishnaiah, ed.). North-Holland, Amsterdam, 375-393.
Lighthill, M. J. (1970), Introduction to Fourier Analysis and Generalised Functions. Cambridge University Press, Cambridge.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis with Missing Data. Wiley, New York.
Liviatan, N. (1963), Consistent estimation of distributed lags. Int. Econ. Rev. 4, 44-52.
Ljung, G. M. (1993), On outlier detection in time series. J. Roy. Statist. Soc. B 55, 559-567.
Ljung, G. M., and Box, G. E. P. (1978), On a measure of lack of fit in time series models. Biometrika 65, 297-303.
Loève, M. (1963), Probability Theory (3rd ed.). Van Nostrand, Princeton, New Jersey.
Lomnicki, Z. A., and Zaremba, S. K. (1957), On estimating the spectral density function of a stochastic process. J. Roy. Statist. Soc. B 19, 13-37.
Lovell, M. C. (1963), Seasonal adjustment of economic time series and multiple regression analysis. J. Amer. Statist. Assoc. 58, 993-1010.
MacNeill, I. B. (1978), Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist. 6, 422-433.
Macpherson, B. D. (1981), Properties of estimators for the parameter of the first order moving average process. Ph.D. thesis, Iowa State University, Ames, Iowa.
Macpherson, B. D., and Fuller, W. A. (1983), Consistency of the least squares estimator of the first order moving average parameter. Ann. Statist. 11, 326-329.
Magnus, J. R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York.
Malinvaud, E. (1970a), Statistical Methods of Econometrics. North-Holland, Amsterdam.
Malinvaud, E. (1970b), The consistency of non-linear regressions. Ann. Math. Statist. 41, 956-969.
Mallows, C. L. (1973), Some comments on C_p. Technometrics 15, 661-675.
Mandelbrot, B. B. (1969), Long-run linearity, locally Gaussian process, H-spectra and infinite variances. Int. Econ. Rev. 10, 82-111.
Mandelbrot, B. B., and Van Ness, J. W. (1968), Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10, 422-437.
Mann, H. B., and Wald, A. (1943a), On the statistical treatment of linear stochastic difference equations. Econometrica 11, 173-220.
Mann, H. B., and Wald, A. (1943b), On stochastic limit and order relationships. Ann. Math. Statist. 14, 217-226.
Marden, M. (1966), Geometry of Polynomials. Amer. Math. Soc., Providence, Rhode Island.
Marriott, F. H. C., and Pope, J. A. (1954), Bias in the estimation of autocorrelations. Biometrika 41, 390-402.
Marple, S. L. (1987), Digital Spectral Analysis with Applications. Prentice-Hall, Englewood Cliffs, New Jersey.
Martin, R. D. (1980), Robust estimation of autoregressive models. In Directions in Time Series (D. R. Brillinger and G. C. Tiao, eds.). Institute of Mathematical Statistics, Hayward, California, 228-254.
Martin, R. D., and Yohai, V. J. (1986), Influence functionals for time series. Ann. Statist. 14, 781-818.
McClave, J. T. (1974), A comparison of moving average estimation procedures. Comm. Statist. 3, 865-883.
McDougall, A. J. (1994), Robust methods for autoregressive moving average estimation. J. Roy. Statist. Soc. B 56, 189-207.
McLeish, D. L. (1974), Dependent central limit theorems and invariance principles. Ann. Prob. 2, 620-628.
McLeish, D. L. (1975), A maximal inequality and dependent strong laws. Ann. Prob. 3, 829-839.
Meinhold, R. J., and Singpurwalla, N. D. (1983), Understanding the Kalman filter. Amer. Statistician 37, 123-127.
Melino, A., and Turnbull, S. (1990), Pricing foreign currency options with stochastic volatility. J. Econometrics 45, 239-265.
Miller, K. S. (1963), Linear Differential Equations in the Real Domain. Norton, New York.
Miller, K. S. (1968), Linear Difference Equations. Benjamin, New York.
Moran, P. A. P. (1947), Some theorems on time series I. Biometrika 34, 281-291.
Moran, P. A. P. (1953), The statistical analysis of the Canadian lynx cycle, I: structure and prediction. Australian J. Zool. 1, 163-173.
Morettin, P. A. (1984), The Levinson algorithm and its applications in time series analysis. Int. Statist. Rev. 52, 83-92.
Muirhead, C. R. (1986), Distinguishing outlier types in time series. J. Roy. Statist. Soc. B 48, 39-47.
Muirhead, R. J. (1982), Aspects of Multivariate Statistical Theory. Wiley, New York.
Nabeya, S., and Tanaka, K. (1990a), A general approach to the limiting distribution for estimators in time series regression with nonstable autoregressive errors. Econometrica 58, 145-163.
Nabeya, S., and Tanaka, K. (1990b), Limiting powers of unit root tests in time series regression. J. Econometrics 46, 247-271.
Nagaraj, N. K. (1990), Results on estimation for vector unit root processes. Unpublished manuscript, Iowa State University, Ames, Iowa.
Nagaraj, N. K., and Fuller, W. A. (1991), Estimation of the parameters of linear time series models subject to nonlinear restrictions. Ann. Statist. 19, 1143-1154.
Nagaraj, N. K., and Fuller, W. A. (1992), Least squares estimation of the linear model with autoregressive errors. In New Directions in Time Series Analysis, Part I (D. Brillinger et al., eds.). IMA Volumes in Mathematics and Its Applications, Vol. 45, Springer, New York, 215-225.
Nankervis, J. C., and Savin, N. E. (1988), The Student's t approximation in a stationary first order autoregressive model. Econometrica 56, 119-145.
Narasimham, G. V. L. (1969), Some properties of estimators occurring in the theory of linear stochastic processes. In Lecture Notes in Operations Research and Mathematical Economics (M. Beckman and H. P. Künzi, eds.). Springer, Berlin.
Nelson, C. R. (1974), The first-order moving average process: identification, estimation and prediction. J. Econometrics 2, 121-141.
Nelson, C. R., and Plosser, C. I. (1982), Trends and random walks in macroeconomic time series: some evidence and implications. J. Monetary Econ. 10, 129-162.
Nerlove, M. (1964a), Notes on Fourier series and analysis for economists. Multilith prepared under Grant NSF-GS-142 to Stanford University, Palo Alto, California.
Nerlove, M. (1964b), Spectral analysis of seasonal adjustment procedures. Econometrica 32, 241-286.
Newbold, P. (1974), The exact likelihood function for a mixed autoregressive-moving average process. Biometrika 61, 423-426.
Newey, W. K., and West, K. D. (1987), A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-708.
Newton, H. J. (1988), TIMESLAB: A Time Series Analysis Laboratory. Wadsworth and Brooks/Cole, Pacific Grove, California.
Newton, H. J., and Pagano, M. (1984), Simultaneous confidence bands for autoregressive spectra. Biometrika 71, 197-202.
Nicholls, D. F. (1976), The efficient estimation of vector linear time series models. Biometrika 63, 381-390.
Nicholls, D. F., Pagan, A. R., and Terrell, R. D. (1975), The estimation and use of models with moving average disturbance terms: a survey. Int. Econ. Rev. 16, 113-134.
Nicholls, D. F., and Quinn, B. G. (1982), Random Coefficient Autoregressive Models: An Introduction. Springer Lecture Notes in Statistics, No. 11. Springer, New York.
Nold, F. C. (1972), A bibliography of applications of techniques of spectral analysis to economic time series. Technical Report No. 66, Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California.
Nyblom, J., and Mäkeläinen, T. (1983), Comparisons of tests for the presence of random walk coefficients in a simple linear model. J. Amer. Statist. Assoc. 78, 856-864.
Olshen, R. A. (1967), Asymptotic properties of the periodogram of a discrete stationary process. J. Appl. Prob. 4, 508-528.
Orcutt, G. H., and Winokur, H. S. (1969), First order autoregression: inference, estimation, and prediction. Econometrica 37, 1-14.
Otnes, R. K., and Enochson, L. (1972), Digital Time Series Analysis. Wiley, New York.
Otto, M. C., and Bell, W. R. (1990), Two issues in time series outlier detection using indicator variables. 1990 Proc. Business and Economic Statistics Section, Amer. Statist. Assoc., 182-187.
Ozaki, T. (1980), Non-linear threshold models for non-linear random vibrations. J. Appl. Prob. 17, 84-93.
Pantula, S. G. (1982), Properties of the parameters of autoregressive time series. Ph.D. thesis, Iowa State University, Ames, Iowa.
Pantula, S. G. (1986), On asymptotic properties of the least squares estimators for autoregressive time series with a unit root. Sankhyā Ser. A 48, 208-218.
Pantula, S. G. (1988a), Statistical time series. Unpublished manuscript for Statistics 613, North Carolina State University, Raleigh, North Carolina.
Pantula, S. G. (1988b), Estimation of autoregressive models with ARCH errors. Sankhyā Ser. B 50, 119-138.
Pantula, S. G. (1989), Testing for unit roots in time series data. Econometric Theory 5, 256-271.
Pantula, S. G., and Fuller, W. A. (1985), Mean estimation bias in least squares estimation of autoregressive processes. J. Econometrics 27, 99-121.
Pantula, S. G., and Fuller, W. A. (1993), The large sample distribution of the roots of the second order autoregressive polynomial. Biometrika 80, 919-923.
Pantula, S. G., Gonzalez-Farias, G., and Fuller, W. A. (1994), A comparison of unit root test criteria. J. Business and Econ. Statist. 13, 449-459.
Park, H. J. (1990), Alternative estimators of the parameters of the autoregressive process. Ph.D. dissertation, Iowa State University, Ames, Iowa.
Park, H. J., and Fuller, W. A. (1995), Alternative estimators and unit root tests for the autoregressive process. J. Time Series Anal. 16, 415-429.
Park, J., and Phillips, P. C. B. (1988), Statistical inference in regressions with integrated processes: Part I. Econometric Theory 4, 468-497.
Park, J., and Phillips, P. C. B. (1989), Statistical inference in regressions with integrated processes: Part II. Econometric Theory 5, 95-131.
Parzen, E. (1957), On consistent estimates of the spectrum of a stationary time series. Ann. Math. Statist. 28, 329-348.
Parzen, E. (1958), On conditions that a stochastic process be ergodic. Ann. Math. Statist. 29, 299-301.
Parzen, E. (1961), Mathematical considerations in the estimation of spectra. Technometrics 3, 167-190.
Parzen, E. (1962), Stochastic Processes. Holden-Day, San Francisco.
Parzen, E. (1969), Multiple time series modeling. In Multivariate Analysis II (P. R. Krishnaiah, ed.). Academic Press, New York.
Parzen, E. (1974), Some recent advances in time series modeling. IEEE Trans. Automatic Control AC-19, 723-729.
Parzen, E. (1977), Multiple time series modeling: determining the order of the approximating autoregressive scheme. In Multivariate Analysis IV (P. R. Krishnaiah, ed.). North-Holland, Amsterdam, 283-295.
Peña, D., and Guttman, I. (1989), Bayesian approach to robustifying the Kalman filter. In Bayesian Analysis of Time Series and Dynamic Models (J. C. Spall, ed.). Marcel Dekker, New York, 227-253.
Perron, P. (1989), The calculation of the limiting distribution of the least-squares estimator in a near-integrated model. Econometric Theory 5, 241-255.
Phillips, P. C. B. (1979), The sampling distribution of forecasts from a first-order autoregression. J. Econometrics 9, 241-262.
Phillips, P. C. B. (1986), Understanding spurious regressions in econometrics. J. Econometrics 33, 311-340.
Phillips, P. C. B. (1987a), Time series regression with a unit root. Econometrica 55, 277-301.
Phillips, P. C. B. (1987b), Towards a unified asymptotic theory of autoregression. Biometrika 74, 535-548.
Phillips, P. C. B. (1987c), Asymptotic expansions in nonstationary vector autoregressions. Econometric Theory 3, 45-68.
Phillips, P. C. B. (1988), Weak convergence to the matrix stochastic integral ∫ B dB'. J. Multivariate Anal. 24, 252-264.
Phillips, P. C. B. (1991), Optimal inference in cointegrated systems. Econometrica 59, 283-306.
Phillips, P. C. B. (1994), Some exact distribution theory for maximum likelihood estimators of cointegrating coefficients in error correction models. Econometrica 62, 73-93.
Phillips, P. C. B., and Durlauf, S. N. (1986), Multiple time series regression with integrated processes. Rev. Econ. Stud. 53, 473-496.
Phillips, P. C. B., and Hansen, B. E. (1990), Statistical inference in instrumental variables regression with I(1) processes. Rev. Econ. Stud. 57, 99-125.
Phillips, P. C. B., and Ouliaris, S. (1990), Asymptotic properties of residual based tests for cointegration. Econometrica 58, 165-193.
Phillips, P. C. B., and Perron, P. (1988), Testing for a unit root in time series regression. Biometrika 75, 335-346.
Phillips, P. C. B., and Solo, V. (1992), Asymptotics for linear processes. Ann. Statist. 20, 971-1001.
Pierce, D. A. (1970a), A duality between autoregressive and moving average processes concerning their least squares parameter estimates. Ann. Math. Statist. 41, 422-426.
Pierce, D. A. (1970b), Rational distributed lag models with autoregressive-moving average errors. 1970 Proc. Business and Economic Statistics Section, Amer. Statist. Assoc., 591-596.
Pierce, D. A. (1971), Least squares estimation in the regression model with autoregressive-moving average errors. Biometrika 58, 299-312.
Pierce, D. A. (1972), Least squares estimation in dynamic-disturbance time series models. Biometrika 59, 73-78.
Plosser, C. I., and Schwert, G. W. (1977), Estimation of a noninvertible moving average process: the case of overdifferencing. J. Econometrics 6, 199-224.
Poirier, D. J. (1973), Piecewise regression using cubic splines. J. Amer. Statist. Assoc. 68, 515-524.
Pollard, D. (1984), Convergence of Stochastic Processes. Springer, New York.
Pötscher, B. M., and Prucha, I. R. (1991a), Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part I: consistency and approximation concepts. Econometric Rev. 10, 125-216.
Pötscher, B. M., and Prucha, I. R. (1991b), Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part II: asymptotic normality. Econometric Rev. 10, 253-325.
Pratt, J. W. (1959), On a general concept of "in probability." Ann. Math. Statist. 30, 549-558.
Prest, A. R. (1949), Some experiments in demand analysis. Rev. Econ. and Statist. 31, 33-49.
Priestley, M. B. (1981), Spectral Analysis and Time Series. Academic Press, New York.
Priestley, M. B. (1988), Non-linear and Non-stationary Time Series Analysis. Academic Press, San Diego.
Quenouille, M. H. (1957), The Analysis of Multiple Time Series. Hafner, New York.
Quinn, B. G. (1982), A note on the existence of strictly stationary solutions to bilinear equations. J. Time Series Anal. 3, 249-252.
Rao, C. R. (1959), Some problems involving linear hypothesis in multivariate analysis. Biometrika 46, 49-58.
Rao, C. R. (1965), The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika 52, 447-458.
Rao, C. R. (1967), Least squares theory using an estimated dispersion matrix and its application to measurement of signals. In Fifth Berkeley Symp. Math. Statist. Prob. 1, 355-372.
Rao, C. R. (1973), Linear Statistical Inference and its Applications (2nd ed.). Wiley, New York.
Rao, J. N. K., and Tintner, G. (1963), On the variate difference method. Australian J. Statist. 5, 106-116.
Rao, M. M. (1961), Consistency and limit distributions of estimators of parameters in explosive stochastic difference equations. Ann. Math. Statist. 32, 195-218.
Rao, M. M. (1978a), Asymptotic distribution of an estimator of the boundary parameter of an unstable process. Ann. Statist. 6, 185-190. Correction (1980), 8, 1403.
Rao, M. M. (1978b), Covariance analysis of nonstationary time series. In Developments in Statistics (P. R. Krishnaiah, ed.). Academic Press, New York.
Reeves, J. E. (1972), The distribution of the maximum likelihood estimator of the parameter in the first-order autoregressive series. Biometrika 59, 387-394.
Reinsel, G. C. (1993), Elements of Multivariate Time Series Analysis. Springer, New York.
Reinsel, G. C., and Ahn, S. K. (1989), Asymptotic distribution of the likelihood ratio test for cointegration in the nonstationary vector AR model. Unpublished report, University of Wisconsin, Madison.
Robinson, P. M. (1972), Nonlinear regression for multiple time series. J. Appl. Prob. 9, 758-768.
Robinson, P. M. (1994a), Semiparametric analysis of long-memory time series. Ann. Statist. 22, 515-539.
Robinson, P. M. (1994b), Efficient tests of nonstationary hypotheses. J. Amer. Statist. Assoc. 89, 1420-1437.
Rogosinski, W. (1959), Fourier Series (translated by H. Cohn and F. Steinhardt). Chelsea, New York.
Rosenblatt, M. (ed.) (1963), Proc. Symp. Time Series Analysis, Brown University, June 11-14, 1962. Wiley, New York.
Rothenberg, T. J. (1984), Approximate normality of generalized least squares estimates. Econometrica 52, 811-825.
Royden, H. L. (1989), Real Analysis (3rd ed.). Macmillan, New York.
Rubin, D. B. (1976), Inference and missing data. Biometrika 63, 581-592.
Rubin, H. (1950), Consistency of maximum-likelihood estimates in the explosive case. In Statistical Inference in Dynamic Economic Models (T. C. Koopmans, ed.). Wiley, New York.
Rutherford, D. E. (1946), Some continuant determinants arising in physics and chemistry. Proceedings of the Royal Society of Edinburgh, Sect. A 62, 229-236.
Sage, A. P. (1968), Optimum Systems Control. Prentice-Hall, Englewood Cliffs, New Jersey.
Said, E. S., and Dickey, D. A. (1984), Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71, 599-607.
Saikkonen, P., and Luukkonen, R. (1993), Testing for a moving average unit root in autoregressive integrated moving average models. J. Amer. Statist. Assoc. 88, 596-601.
Salem, A. S. (1971), Investigation of alternative estimators of the parameters of autoregressive processes. Unpublished M.S. thesis, Iowa State University, Ames, Iowa.
Sallas, W. M., and Harville, D. A. (1981), Best linear recursive estimation for mixed linear models. J. Amer. Statist. Assoc. 76, 860-869.
Samaranayake, V. A., and Hasza, D. P. (1983), The asymptotic properties of the sample autocorrelations for a multiple autoregressive process with one unit root. University of Missouri, Rolla, Missouri.
Sanger, T. M. (1992), Estimated generalized least squares estimation for the heterogeneous measurement error model. Ph.D. thesis, Iowa State University, Ames, Iowa.
Sargan, J. D. (1958), The estimation of economic relationships using instrumental variables. Econometrica 26, 393-415.
Sargan, J. D. (1959), The estimation of relationships with autocorrelated residuals by the use of instrumental variables. J. Roy. Statist. Soc. B 21, 91-105.
Sargan, J. D., and Bhargava, A. (1983), Testing residuals from least squares regression for being generated by the Gaussian random walk. Econometrica 51, 153-174.
Sarkar, S. (1990), Nonlinear least squares estimators with differential rates of convergence. Ph.D. thesis, Iowa State University, Ames, Iowa.
SAS (1984), SAS/ETS User's Guide, Version 5 ed. SAS Institute, Cary, North Carolina.
SAS Institute Inc. (1989), SAS/STAT User's Guide, Version 6 (4th ed.), Vol. 2. SAS Institute, Cary, North Carolina.
Scheffé, H. (1970), Multiple testing versus multiple estimation. Improper confidence sets. Estimation of directions and ratios. Ann. Math. Statist. 41, 1-29.
Schmidt, P., and Phillips, P. C. B. (1992), Testing for a unit root in the presence of deterministic trends. Oxford Bull. Econ. and Statist. 54, 257-288.
Schwarz, G. (1978), Estimating the dimension of a model. Ann. Statist. 6, 461-464.
Schwert, G. W. (1989), Tests for unit roots: a Monte Carlo investigation. J. Business and Econ. Statist. 7, 147-160.
Scott, D. J. (1973), Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach. Adv. Appl. Prob. 5, 119-137.
Sen, D. L., and Dickey, D. A. (1987), Symmetric test for second differencing in univariate time series. J. Business and Econ. Statist. 5, 463-473.
Shaman, P., and Stine, R. A. (1988), The bias of autoregressive coefficient estimators. J. Amer. Statist. Assoc. 83, 842-848.
Shibata, R. (1976), Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63, 117-126.
Shibata, R. (1986), Consistency of model selection and parameter estimation. In Essays in Time Series and Allied Processes (J. M. Gani and M. B. Priestley, eds.). Appl. Prob. Trust, Sheffield, 126-141.
Shin, D. (1990), Estimation for the autoregressive moving average model with a unit root. Ph.D. thesis, Iowa State University, Ames, Iowa.
Shin, D., and Fuller, W. A. (1991), Estimation for the autoregressive moving average with an autoregressive unit root. Unpublished manuscript, Iowa State University, Ames, Iowa.
Shively, T. S. (1988), An exact test for a stochastic coefficient in a time series regression model. J. Time Series Anal. 9, 81-88.
Sims, C. A., Stock, J. H., and Watson, M. W. (1990), Inference in linear time series models with some unit roots. Econometrica 58, 113-144.
Singleton, R. C. (1969), An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. Audio and Electroacoustics AU-17, No. 2, 93-103.
Solo, V. (1984), The order of differencing in ARIMA models. J. Amer. Statist. Assoc. 79, 916-921.
Sorenson, H. W. (1966), Kalman filtering techniques. In Advances in Control Systems 3 (C. T. Leondes, ed.). Academic Press, New York.
Stephenson, J. A., and Farr, H. T. (1972), Seasonal adjustment of economic data by application of the general linear statistical model. J. Amer. Statist. Assoc. 67, 37-45.
Stigum, B. P. (1974), Asymptotic properties of dynamic stochastic parameter estimates (III). J. Multivariate Anal. 4, 351-381.
Stigum, B. P. (1975), Asymptotic properties of autoregressive integrated moving average processes. Stochastic Processes Appl. 3, 315-344.
Stigum, B. P. (1976), Least squares and stochastic difference equations. J. Econometrics 4, 349-370.
Stine, R. A., and Shaman, P. (1990), Bias of autoregressive spectral estimators. J. Amer. Statist. Assoc. 85, 1091-1098.
Stock, J. H. (1987), Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica 55, 1035-1056.
Stock, J. H., and Watson, M. W. (1988), Testing for common trends. J. Amer. Statist. Assoc. 83, 1097-1107.
Stout, W. F. (1968), Some results on the complete and almost sure convergence of linear combinations of independent variables and martingale differences. Ann. Math. Statist. 39, 1549-1562.
Strasser, H. (1986), Martingale difference arrays and stochastic integrals. Prob. Theory and Related Fields 72, 83-98.
Stuart, A., and Ord, J. K. (1987), Kendall's Advanced Theory of Statistics, Vol. 1. Oxford University Press, New York.
Stuart, A., and Ord, J. K. (1991), Kendall's Advanced Theory of Statistics, Vol. 2. Oxford University Press, New York.
Subba Rao, T., and Gabr, M. M. (1984), An Introduction to Bispectral Analysis and Bilinear Time Series Models. Springer Lecture Notes in Statistics, Springer, New York.
Sweeting, T. J. (1980), Uniform asymptotic normality of the maximum likelihood estimator. Ann. Statist. 8, 1375-1381.
Tanaka, K. (1990), Testing for a moving average unit root. Econometric Theory 6, 433-444.
Taqqu, M. S. (1988), Self-similar processes. In Encyclopedia of Statistical Sciences, Vol. 8. Wiley, New York, 352-357.
Theil, H. (1961), Economic Forecasts and Policy. North-Holland, Amsterdam.
Thompson, L. M. (1969), Weather and technology in the production of wheat in the United States. J. Soil and Water Conservation 24, No. 6, 219-224.
Thompson, L. M. (1973), Cyclical weather patterns in the middle latitudes. J. Soil and Water Conservation 28, No. 2, 87-89.
Tiao, G. C., and Tsay, R. S. (1983), Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Ann. Statist. 11, 856-871.
Tintner, G. (1940), The Variate Difference Method. Principia Press, Bloomington, Indiana.
Tintner, G. (1952), Econometrics. Wiley, New York.
Tolstov, G. P. (1962), Fourier Series (translated by R. A. Silverman). Prentice-Hall, Englewood Cliffs, New Jersey.
Tong, H. (1981), A note on a Markov bilinear stochastic process in discrete time. J. Time Series Anal. 2, 279-284.
Tong, H. (1983), Threshold Models in Non-linear Time Series Analysis. Lecture Notes in Statistics, No. 21. Springer, Heidelberg.
Tong, H. (1990), Non-linear Time Series: A Dynamical System Approach. Oxford University Press, New York.
Torres, V. A. (1986), A note on teaching infinite moving averages. Amer. Statistician 40, 40-41.
Toyooka, Y. (1982), Second-order expansion of mean squared error matrix of generalized least squares estimator with estimated parameters. Biometrika 69, 269-273.
Toyooka, Y. (1990), No estimation effect up to the second order for the covariance parameter of GLSE and MLE in a regression with a general covariance structure. Math. Japonica 35, 745-757.
Tsay, R. S. (1986), Nonlinearity tests for time series. Biometrika 73, 461-466.
Tsay, R. S. (1988), Outliers, level shifts and variance changes in time series. J. Forecasting 7, 1-40.
Tsay, R. S. (1993), Testing for noninvertible models with applications. J. Business and Econ. Statist. 11, 225-233.
Tucker, H. G. (1967), A Graduate Course in Probability. Academic Press, New York.
Tukey, J. W. (1961), Discussion, emphasizing the connection between analysis of variance and spectrum analysis. Technometrics 3, 1-29.
Ulrych, T. J., and Bishop, T. N. (1975), Maximum entropy spectral analysis and autoregressive decomposition. Rev. Geophys. and Space Phys. 13, 183-200.
Van der Genugten, B. B. (1983), The asymptotic behavior of the estimated generalized least squares method in the linear regression model. Statist. Neerlandica 37, 127-141.
Varadarajan, V. S. (1958), A useful convergence theorem. Sankhyā 20, 221-222.
Venkataraman, K. N. (1967), A note on the least square estimators of the parameters of a second order linear stochastic difference equation. Calcutta Statist. Assoc. Bull. 16, 15-28.
Venkataraman, K. N. (1968), Some limit theorems on a linear stochastic difference equation with a constant term, and their statistical applications. Sankhyā Ser. A 30, 51-74.
Venkataraman, K. N. (1973), Some convergence theorems on a second order linear explosive stochastic difference equation with a constant term. J. Indian Statist. Assoc. 11, 47-69.
von Neumann, J. (1941), Distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist. 12, 367-395.
von Neumann, J. (1942), A further remark concerning the distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist. 13, 86-88.
Wahba, G. (1968), On the distribution of some statistics useful in the analysis of jointly stationary time series. Ann. Math. Statist. 39, 1849-1862.
Wahba, G. (1990), Spline Models for Observational Data. SIAM, Philadelphia.
Wald, A. (1949), Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20, 595-601.
Walker, A. M. (1961), Large-sample estimation of parameters for moving-average models. Biometrika 48, 343-357.
Walker, A. M. (1964), Asymptotic properties of least-squares estimates of the spectrum of a stationary non-deterministic time series. J. Australian Math. Soc. 4, 363-384.
Wallis, K. F. (1967), Lagged dependent variables and serially correlated errors: a reappraisal of three pass least squares. Rev. Econ. Statist. 49, 555-567.
Wallis, K. F. (1972), Testing for fourth order autocorrelation in quarterly regression equations. Econometrica 40, 617-638.
Watson, G. S. (1955), Serial correlation in regression analysis I. Biometrika 42, 327-341.
Watson, G. S., and Hannan, E. J. (1956), Serial correlation in regression analysis II. Biometrika 43, 436-445.
Wegman, E. J. (1974), Some results on nonstationary first order autoregression. Technometrics 16, 321-322.
Weiss, A. A. (1984), ARMA models with ARCH errors. J. Time Series Anal. 5, 129-143.
Weiss, A. A. (1986), ARCH and bilinear time series models: comparison and combination. J. Business and Econ. Statist. 4, 59-70.
Weiss, L. (1971), Asymptotic properties of maximum likelihood estimators in some nonstandard cases. J. Amer. Statist. Assoc. 66, 345-350.
Weiss, L. (1973), Asymptotic properties of maximum likelihood estimators in some nonstandard cases, II. J. Amer. Statist. Assoc. 68, 428-430.
White, H., and Domowitz, I. (1984), Nonlinear regression with dependent observations. Econometrica 52, 143-161.
White, J. S. (1958), The limiting distribution of the serial correlation coefficient in the explosive case. Ann. Math. Statist. 29, 1188-1197.
White, J. S. (1959), The limiting distribution of the serial correlation coefficient in the explosive case II. Ann. Math. Statist. 30, 831-834.
Whittle, P. (1951), Hypothesis Testing in Time Series Analysis. Almqvist and Wiksell, Uppsala.
Whittle, P. (1953), The analysis of multiple stationary time series. J. Roy. Statist. Soc. B 15, 125-139.
Whittle, P. (1963), Prediction and Regulation by Linear Least-Square Methods. Van Nostrand, Princeton, New Jersey.
Wilks, S. S. (1962), Mathematical Statistics. Wiley, New York.
Williams, E. J. (1959), Regression Analysis. Wiley, New York.
Williamson, M. (1972), The Analysis of Biological Populations. Edward Arnold, London.
Wilson, G. T. (1969), Factorization of the generating function of a pure moving average process. SIAM J. Numer. Anal. 6, 1-7.
Wincek, M. A., and Reinsel, G. C. (1986), An exact maximum likelihood estimation procedure for regression-ARMA time series models with possibly nonconsecutive data. J. Roy. Statist. Soc. B 48, 303-313.
Withers, C. S. (1981), Conditions for linear processes to be strong mixing. Z. Wahr. verw. Geb. 57, 477-480.
Wold, H. O. A. (1938), A Study in the Analysis of Stationary Time Series (2nd ed., 1954). Almqvist and Wiksell, Uppsala.
Wold, H. O. A. (1965), Bibliography on Time Series and Stochastic Processes. Oliver and Boyd, Edinburgh.
Woodward, W. A., and Gray, H. L. (1991), "Global warming" and the problem of testing for trend in time series data. Department of Statistical Science, Southern Methodist University.
Wu, C. F. (1981), Asymptotic theory for nonlinear least squares estimation. Ann. Statist. 9, 501-513.
Wu, L. S.-Y., Hosking, J. R. M., and Ravishanker, N. (1993), Reallocation outliers in time series. Appl. Statist. 42, 301-313.
Yaglom, A. M. (1962), An Introduction to the Theory of Stationary Random Functions (translated by R. A. Silverman). Prentice-Hall, Englewood Cliffs, New Jersey.
Yajima, Y. (1985), On estimation of long-memory time series models. Australian J. Statist. 27, 303-320.
Yajima, Y. (1988), On estimation of a regression model with long-memory stationary errors. Ann. Statist. 16, 791-807.
Yajima, Y. (1989), A central limit theorem of Fourier transform of strongly dependent stationary processes. J. Time Series Anal. 10, 375-383.
Yajima, Y. (1991), Asymptotic properties of the LSE in a regression model with long-memory stationary errors. Ann. Statist. 19, 158-177.
Yamamoto, T. (1976), Asymptotic mean square prediction error for an autoregressive model with estimated coefficients. Appl. Statist. 25, 123-127.
Yap, S. F., and Reinsel, G. C. (1995a), Results on estimation and testing for a unit root in the nonstationary autoregressive moving-average model. J. Time Series Anal. 16, 339-353.
Yap, S. F., and Reinsel, G. C. (1995b), Estimation and testing for unit roots in a partially nonstationary vector autoregressive moving average model. J. Amer. Statist. Assoc. 90, 253-267.
Yule, G. U. (1927), On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philos. Trans. Roy. Soc. London Ser. A 226, 267-298.
Zygmund, A. (1959), Trigonometric Series. Cambridge University Press, Cambridge.
Index
Absolutely summable sequence, 28
  of matrices, 76
  see also Sequence
ACF, see Autocorrelation function
Additive outlier, 460
AIC, 438
Alias, 136
Amplitude, 14
Amplitude spectrum, 174. See also Cross amplitude spectrum
Analysis of variance, 356
AR, see Autoregressive time series
ARCH, 110 (Exercise 43), 494
ARIMA, 101
Arithmetic means, method of, 129. See also Cesàro summability
ARMA, 70. See also Autoregressive moving average time series
Associated polynomial, see Characteristic equation
Autocorrelation function, 7-10
  definition, 7
  estimation, 317
    bias, 317
    covariance of estimators, 317
    distribution of estimators, 335
    least squares residuals, 484-486
  partial, 10-12, 24, 27
  spectral representation, 127
  for vector valued time series, 17
  see also Autocovariance
Autocovariance, 4
  of autoregressive moving average time series, 72
  of autoregressive time series, 40, 55, 62
    difference equation for, 55, 62
  of complex valued time series, 12
  estimation, 313
    bias, 316
    consistency, 331, 343
    covariance of estimators, 314-315
    distribution, 333
    least squares residuals, 484-486
    variance of, 314-315
  of moving average time series, 22
  properties of, 7-10
  of second order autoregressive process, 55
  spectral representation, 127
  of vector process, 15-17
Autoregression, on residuals, 519-520. See also Autoregressive time series, estimation
Autoregressive moving average time series, 70-75
  autocorrelation function, 72
  autoregressive representation, 74
  estimation, 431-443
    properties, 432-434
  model selection, 437-438
  moving average representation, 72-73
  prediction, 88-90
  spectral density, 161
  state space representation, 200
  vector, 79
Autoregressive time series, 36-41, 54-58
  autocorrelation function, 40, 56, 61
  backward representation, 63-64
  estimation for, 404-421
    bias, 412
    distribution, 408
    empirical density, 412
    first order, 404-407
    higher order, 407-418
    least squares, 407-413
    maximum likelihood, 405, 413-414
    multivariate, 419-421
Autoregressive time series, estimation for (Continued)
    nonlinear, 451
    order of, 412, 437-439
    residual mean square, 408
    from residuals, 519-520
  first order, 39-41
  infinite order, 65
  likelihood for, 405, 413-414
  model selection, 412, 437-439
  moving average representation, 39, 59, 63
  nonstationary, 546, 583
  order of, 39
  partial correlation in, 40, 62
  prediction for, 83-85
    with estimated parameters, 444-449
  as representation for spectral density, 165
  root representation for, 67-68
  second order, 54-58
    autocorrelation, 56
    autocovariance function, 55, 61-62
    correlogram, 57
  spectral density of, 159
  threshold, 454
  unit root, 64, 546, 555-560, 563-564
  vector, 77-78
    estimation, 419-421
Autoregressive transformation, 520. See also Gram-Schmidt orthogonalization
Auxiliary equation, 46. See also Characteristic equation
Backward shift operator, 43-44, 67
  autoregressive moving average representation, 74
Bandpassed white noise, 212
Bartlett window, 384
Basis vectors, 112
Bessel's inequality, 118
Bilinear model, 457
Bounded in probability, 216. See also Order in probability
Bricks, pottery, glass and cement, 535
Burg estimator, 418
Causal, 108 (Exercise 32)
Centered moving average, 501
Central limit theorem, 233
  for ARMA parameters, 432
  for finite moving average series, 320
  for infinite moving average series, 326
  Liapounov, 233
  Lindeberg, 233
  for linear functions, 329
  for martingale differences, 235
  for m-dependent random variables, 321
  multivariate, 234
  for regression coefficients, 478
  for sample autocovariance, 333
Cesàro summability, 129
  for Fourier series, 129
Characteristic equation, of autoregressive series, 54, 59, 64
  of difference equation, 46
  of moving average time series, 65
  roots greater than one, 68-70
  roots, 46-48
  of vector difference equation, 50
Characteristic function, 9, 133, 143
Characteristic roots, of autoregressive series, 46-48
  of circulant matrix, 151
  of explosive process, 611-612
  of matrix, 298
  of moving average time series, 65
  of unit root process, 591
Chebyshev's inequality, 219
Circulant, 150
  characteristic roots of, 151
  characteristic vectors of, 151
Circular matrix, 150
  symmetric, 151
Coherency, 172
  inequality, 172
  see also Squared coherency
Coincident spectral density, 169
Cointegrated time series, 608
Cointegrating vectors, 608
  estimators, 617, 627
Complete convergence, 228. See also Convergence in distribution
Complete system of functions, 119
Complex exponential, 9
Complex multivariate normal, 207
Complex trigonometric series, 130
Complex valued time series, 12-13
Component model, 475, 509. See also Structural model
Conjugate transpose, 151, 170
Continuity theorem, 230
Continuous process, 5
Convergence, complete, 228
  in distribution, 227
  of functions, 222
  of matrix functions, 233 (Exercise 10)
  in law, 228
  in mean square (squared mean), 31-32, 35, 120
  of moving average, 31-37
  in probability, 215
  in r-th mean, 221
  of series, 27-29
  in squared mean (mean square), 31-32, 35, 120
  of sum of random variables, 31-37
  weak, 236
Convolution, 30, 136
  Fourier transform of, 136-138
  of sequence, 30
Correlation, cross, 17
Correlation function, 7-9. See also Autocorrelation function; Cross correlation
Correlation matrix, 17
Correlation, partial, 10-12
  autoregressive time series, 40-41
  moving average time series, 24
  see also Partial autocorrelation
Correlogram, 57. See also Autocorrelation function
Cospectrum, 169
  estimation of, 391
Covariance, 4. See also Autocovariance; Cross covariance
Cross amplitude spectrum, 174
  confidence interval for, 394
  estimation, 393-394
Cross correlation, 17
  estimation, 341-342
    consistency of, 343
    covariance of, 342
    distribution of, 345
Cross covariance, 15
  estimation, 339-343
    consistency, 343
    covariance of, 342
Cross periodogram, 388
  definition, 388
  expected value of, 388
  smoothed, 389-394
    expected value of, 390
    variance of, 390
Cross spectral function, 169
Cycle, 14
Delay, 167
  frequency response function of, 167
  phase angle of, 167
Delta function, 141 (Exercise 8)
Des Moines River, 184, 192, 346, 394
  sample correlations, 347
Determinantal equation, 50, 77, 78
Deterministic time series, 81, 96
Diagnostic checking, see Residuals, least squares
Difference, 41, 507-509
  of autoregressive moving average series, 516
  effects of, 513-514, 516
  fractional, 100
  gain of operator, 514-515
  of integrated time series, 508
  of lag h, 508
  as moving average, 508
  of periodic functions, 508
  of polynomials, 46, 508
  of time series, 507
Difference equation, 41-54
  auxiliary equation, 46
  characteristic equation, 46
  determinantal equation, 50
  general solution, 45
  homogeneous, 44
  Jordan canonical form, 52
  particular solution, 45
  reduced, 44
  second order, 47-48
  solution of n-th order, 48
  solution of second order, 47-48
  solution of vector, 50-51, 53-54
  stochastic, 39
  systems of, 49-54
  vector, 49-54
  vector n-th order, 53-54
Dirac's delta function, 141 (Exercise 8)
Distributed lags, see Regression with lagged variables
Distribution function, 2-3
Donsker's theorem, 237
Drift, random walk, 565
Durbin-Levinson algorithm, 82
Durbin's moving average estimator, 425
  error in, 426
Durbin-Watson d, 486
  t approximation for, 488
Ensemble, 3
Ergodic, 308
Error spectral density, 177
Error spectrum, 177
  estimation of, 392
Estimated generalized least squares, 280
  distribution of, 280
  based on residuals, 288
Estimator, 79. See also under individual methods and parameters
Even function, 8-9
Event, 1
Expectation, of products of means, 242
  of sequence of functions, 243
Explanatory variable, 605
Explosive autoregression, 583-596
  estimation, 584-596
  prediction, 595-596
Extrapolation, 79-94
  polynomial, 482
  see also Prediction
Fast Fourier transform, 357
Fieller's confidence interval, 393
Filter, 79, 165
  delay, 167
  frequency response function, 167
  gain of, 167
  Kalman, 188
  linear, 165, 497-509
  matrix, 179
  for mean, 497-502
  mean square error of, 184
  moving average, 497-509
  phase of, 167
  power transfer function, 167
  for signal measurement, 181
  time invariant, 166
  transfer function, 167
  see also Moving average; Difference; Autoregressive transformation
Fisher's periodogram test, 363
  table, 361
Folding frequency, see Nyquist frequency
Forecast, 79-94. See also Prediction
Fourier coefficients, 114
  analysis of variance, 356
  complex, 130-131
  of finite vector, 114
  of function on an interval, 117
  multivariate, 385-386
    distribution of, 389
  as regression coefficients, 114-135, 356
  of time series, 355-356
  uniqueness, 119
Fourier Integral Theorem, 132
Fourier series, Cesàro summability, 129
  convergence, 121-130
  integral formula, 123
  of periodic function, 125
  see also Fourier transform; Trigonometric polynomial
Fourier transform, 132
  of convolution, 136-138
  of correlation function, 127, 132
  of covariance function, 357. See also Spectral density
  of estimated covariance function, 357. See also Periodogram
  inverse transform, 133
  pair, 132
  of product, 138-139
  on real line, 132
  table of, 134
Fourier transform pair, 132
Fractional difference, 100
Frequency, 14
Frequency domain, 143
Frequency response function, 167
Fubini's theorem, 34
Function, complete system of, 119
  delta, 141 (Exercise 8)
  distribution, 2-3
  even, 8-9
  expectation of, 243
  generalized, 140 (Exercise 8)
  integrable, 117
  odd, 8-9
  periodic, 13, 14
  piecewise smooth, 121
  sample, 3
  smooth, 121
  symmetric distribution, 10
  trigonometric, 13
Gain, 174
  bivariate time series, 174
  confidence interval for, 394
  of difference operator, 514
  estimation, 394
  of filter, 167
  of trend removal filter, 513-514
Gauss-Newton estimator, 269, 437
  estimator of variance of, 271
  moving average time series, 422-424
  one step, 269
  unit root moving average, 630
Generalized function, 140 (Exercise 8)
Generalized inverse, 80
Generalized least squares, 280
  efficiency, 479-480
  estimated, 280, 520-522
  for trend, 476
Gibb's phenomenon, 140 (Exercise 3)
Grafted polynomials, 480-484
  definition, 481
  for trend, 482-484
Gram-Schmidt orthogonalization, 86, 520
Helly-Bray theorem, 230
Hermitian matrix, 170
Heterogeneous variances, 488, 492-497
Hölder inequality, 243
Homogeneous difference equation, 44. See also Difference equation
Implicit price deflator, 593
Impulse response function, 529
Index set, 3, 5
Initial condition equation, 188
Innovation algorithm, 86
Innovation outlier, 460, 461
Input-output system, 174
Instrumental variables, 273, 532-534
  distribution of estimator, 275
  estimator, 274
  example of, 277
  for lagged dependent variables, 532-534
Integral formula, Fourier series, 123
Integrated spectrum, 144
Integrated time series, 502
  moving average for, 502-503
  see also Unit root autoregression
Inverse autocorrelation function, 211
Inverse transform, 133
Invertible moving average, 66. See also Moving average time series
Joint distribution function, 3
Jordan canonical form, 52
Jump discontinuity, 122
Kalman filter, 188, 497
  initial condition equation for, 188
  with missing data, 198
  for predictions, 198, 202
  for unit root process, 195
  updating equations, 190
  see also State space
Kernel, 382. See also Window
Kolmogorov-Smirnov test, 363
Kronecker's Lemma, 126
Lagged value, 42
  in regression, 530
Lag window, 382
Law of large numbers, 253-255. See also Convergence in probability
Least squares estimator, 251
  for autoregressive time series, 405-412. See also Autoregressive time series, estimation
  of autoregressive unit root, 603
  distribution of, 256, 478
  efficiency of, 312, 479-480, 518-519
  of explosive process, 584, 591
  Fourier coefficients, 114, 117
  of mean, 311
  of mean function, 476-480
  moving average, 497
  for moving average time series, 421-429. See also Moving average time series, estimation
  nonlinear, 251. See also Gauss-Newton estimator
  for prediction, 79-80. See also Prediction
  residuals, 484-497
    autoregressive parameters from, 490
    heterogeneous variances, 488, 492-497
  with time series errors, 518-529
  for trend, 476
  with trigonometric functions, 117
  variance of, 256
  variance estimator, 260
  of vector unit root process, 578-629
  see also Generalized least squares; Maximum likelihood; various models
Lebesgue-Stieltjes integral, 9
Liapounov central limit theorem, 233
Likelihood estimation, see Maximum likelihood estimation
Limited information maximum likelihood, 274
Lindeberg central limit theorem, 233
Linear filter, 165, 497-509
Linear operator, see Filter
Line spectrum, 146
Long memory time series, 98-101
  estimation for, 466-471
  spectral density of, 168
Lynx, 455
Martingale difference, 235
Martingale process, 234
Matrix, absolutely summable sequence, 76
  circulant, 150
  circular, 150
  circular symmetric, 151
  covariance of least squares estimator, 256. See also Least squares estimator; Gauss-Newton estimator; Generalized least squares; Instrumental variables
  covariance of n observations, 80
  covariance of prediction errors, 90-91
  covariance of vector time series, 15
  diagonalization of covariance, 151-154
Matrix (Continued)
  difference equations, 49-54
  Hermitian, 170
  Jordan canonical form, 52
  positive definite, 154
  positive semidefinite, 154
Maximum likelihood estimation, autoregression, 405, 413-414
  autoregressive moving average, 431, 434
  lagged dependent variable, 535-538
  regression model with autoregressive errors, 526-527
  unit root autoregressive process, 546-547, 574-578
m-dependent, 321
Mean, best linear estimator of, 311
  central limit theorems for, 321, 329
  consistency of, 309, 325-326
  distribution of sample, 321
  efficiency of sample, 312
  estimation of, 308
  function, 475-476
  local approximation to, 497-499
  moving average estimator, 497-502
  structural model for, 509-513
  product of sample, 242
  of time series, 4
  variance of, 309
Mean prediction error, 438
Mean square approximation, 120
Mean square continuous, 6
Mean square convergence, 31-32, 35
Mean square differentiable, 7
Measurement error, 181
Measurement equation, 187
Missing data, Kalman filter for, 198
  maximum likelihood with, 458-459
Moorman Manufacturing Company, 278
Moving average, 21-26, 31-39, 497-509
  centered, 501
  computation of, 499-500, 506-507
  covariance function, 22
  covariance of, 33-39, 501-502
  difference as, 508
  filter, 497-509
  for integrated time series, 502
  least squares, 499
  for mean, 498-501
  mean square convergence of, 31-32, 35
  for nonstationarity, 502
  for polynomial mean, 501, 504
  for seasonal, 504-507, 515
  for seasonal component, 504-507
  spectral density, effect on, 513-514
  for trend, 498-501, 513
  see also Filter; Difference
Moving average process, 21-26. See also Moving average time series
Moving average time series, 21-26
  autocorrelation function, 22-23
  autoregressive representation, 65
  covariance function, 22-23
  estimation for, 421-429
    autocovariance estimator, 421-422
    distribution, 424, 432
    Durbin's procedure, 425
    empirical density, 429
    first order, 421-425
    higher order, 430-432
    initial estimators, 425
    residual mean square, 424
  first order, 23-25, 66
  finite, 21-22
  infinite, 23
  invertible, 66
  one sided, 22
  order of, 22
  partial correlation in, 23, 65
  prediction for, 85-88
  as representation for spectral density, 163
  as representation for general time series, 96
  root representation for, 66
  spectral density of, 156
  unit root, 66, 629-636
  vector, 75-76
Multiplicative model, 440
Multivariate random walk, 596-599
Multivariate spectral estimation, 385-399
Multivariate time series, 15-17. See also Vector valued time series
NI(0, σ²), 19 (Exercise 9)
Noise to signal ratio, 177
Nondeterministic, 81, 86, 94
Nonlinear model, 250
  autoregression, 451
  estimator for, 251, 269
Nonlinear regression, 250. See also Gauss-Newton estimator
Nonlinearity, test for, 453-454
Nonnegative definite, 7, 9
Nonstationary autoregression, 546-596. See also Explosive autoregression; Unit root autoregression
Nonstationary time series, 4, 475, 547
  types of, 475
  see also Nonstationary autoregression
Nyquist frequency, 136
Observation equation, 187
Odd function, 8, 9
One-step Gauss-Newton, 269
One-step prediction, see Prediction
Order, of autoregressive moving average time series, 70
  of autoregressive time series, 39
  of difference equation, 41, 53
  of expected value of functions, 214-250
  of expected value of moments, 242
  of magnitude, 214
  of moving average time series, 22
  in probability, 215
  of trigonometric polynomial, 117
Ordinary independent variable, 530
Ordinate, periodogram, 358
Orthogonal basis, 112
Orthogonal system of functions, 112, 116
Orthogonal vectors, 112
Orthogonality, of trigonometric functions, 112-114, 116
Outliers, 460-466
  additive, 460
  innovation, 461
PACF, see Partial autocorrelation
Parseval's theorem, 120
Partial autocorrelation, 10-12
  autoregressive time series, 40, 62
  estimators, 416
  moving average time series, 23-24, 65
Partial regression, 10
Period, 13
Periodic function, 13
  difference of, 508
Periodic time series, 13-14
Periodicity, test for, 363
Periodogram, 357-363
  chi-square distribution of, 360
  confidence interval based on, 375
  covariance of ordinates, 369
  cross, 388. See also Cross periodogram
  cumulative, 363
  definition, 357
  distribution of, 360
  expected value of, 359
  as Fourier transform of sample covariances, 357
  Fisher's test, 363
  Kolmogorov-Smirnov test, 363
  smoothed, 371-374
    distribution of, 372, 374
    expected value of, 372
    variance of, 372
  test for periodicity, 363
Phase angle, 14
  of delay, 167
  of filter, 167
Phase spectrum, 174
  confidence interval for, 393
  estimation, 392
Pig feeding experiment, 277
Polynomial, extrapolation, 482
  grafted, 480
  moving average for, 501, 504
  trend, difference of, 508
  trigonometric, 112
Positive definite Hermitian matrix, 170
Positive semidefinite, 7
  Hermitian matrix, 170
Power spectrum, see Spectral density
Power transfer function, 167
Prediction, 79-94
  for autoregressive moving average, 88-94
  for autoregressive time series, 83-85
  confidence interval for, 90
  estimated explosive process, 595-596
  with estimated parameters, 443-457
  estimated unit root process, 582-583
  first order autoregressive, 83-84
  with Kalman filter, 198, 202
  linear, 80
  minimum mean square error, 80
  for moving average time series, 85-88
  polynomial, 482
  recursive equations for, 86-87
  variance of, 80
Probability limit, 215
  of functions, 222
Probability space, 1-2
Process, 3-5. See also Time series; various types
Product, expectation of, 242
Quadrature spectral density, 169
Quadrature spectrum, 169
  estimation of, 391
Random variable, 2
Random coefficient autoregression, 457
Random walk, 546
  with drift, 565
  estimation, 546
  vector, 596-599
  see also Unit root
Realization, 3, 5, 81
Regression, with lagged variables, 530-538
  for time series, 518-538
  see also Least squares estimator; Generalized least squares; Partial regression
Regular time series, 81, 86, 94
Residuals, least squares, 484-497. See also Least squares estimator
Response function, see Frequency response function
Roots, of characteristic equation, 46-48
  of matrices, 298
  of polynomial, 291
  see also Characteristic roots
Sample function, 3
Seasonal adjustment, 504-507. See also Filter; Moving average
Seasonal component, 14, 475, 504-507
  moving average estimation, 504-507
  see also Mean; Filter; Moving average
Sequence, 26-39
  absolutely summable, 28
  convergent, 28
  convolution of, 30
  of expectations, 243
  infinite, 26-27
  of matrices, 76
  of uncorrelated random variables, 5
  white noise, 5
Serial correlation, see Autocorrelation; Cross correlation
Serial covariance, see Autocovariance; Cross covariance
Series, infinite, 27
  absolutely convergent, 28
  convergent, 28
  divergent, 28
  Fourier, 123-130. See also Fourier series
  trigonometric, 117
Shift operator, see Backward shift operator; Lagged value
Short memory process, 98
Sigma-algebra, 2
Sigma-field, 2
Signal measurement, 181
Singular time series, 81, 96
Skew symmetric, 13
Smoothed periodogram, 371-374
Smoothing, see Moving average; Filter; Periodogram
Spectral density, 132
  approximation for, 163, 165
  autoregressive estimator, 385
  of autoregressive moving average, 161
  autoregressive representation, 165
  of autoregressive time series, 159
  coincident, 169
  confidence interval for, 375
  covariance estimator of, 382
    distribution of, 382-384
    windows for, 384-385
  of convolution, 138
  cross, 169. See also Cross spectrum
  of differenced time series, 513-514
  error, 177
  estimation, 355-385. See also Cross periodogram; Periodogram; Spectrum estimation
  existence of, 127
  of filtered time series, 156, 166-167, 179-180, 513-514
  of long memory process, 168
  moving average representation, 163
  of moving average time series, 156
  properties of, 127
  quadrature, 169
  rational, 161
  of seasonally adjusted time series, 515-516
  of vector autoregressive process, 180
  of vector moving average process, 180
  of vector time series, 172
    estimation, 385-399
  see also Spectrum
Spectral distribution function, 144
  of vector process, 172
Spectral kernel, 382
Spectral window, 382. See also Window
Spectrum, 132
  amplitude, 174
  cospectrum, 169
  cross, 169
  discrete, 146
  error, 177
    estimation of, 392
  estimation, 355-385. See also Cross periodogram; Periodogram; Spectral density, covariance estimator
  integrated, 144
  line, 146
  phase, 174
  quadrature, 169
  see also Spectral density
Spline functions, 484. See also Grafted polynomials
Spirits, 522, 528
Squared coherency, 172, 175
  confidence interval, 392
  estimation, 391
  multiple, 391-392
  test of zero coherency, 396
Starting equation, 188
State equation, 187
State space, 187
  measurement equation, 187
  representation for ARMA, 200
  state equation, 187
Stationarity, 3-4
  covariance, 4
  second order, 4
  strictly, 3
  weakly, 4
  wide sense, 4
  see also Nonstationary time series
estimation, 404-407. See also Autoregressive time series, estimation
vector, 77
see also Autoregressive time series
Stochastic difference equation, 39
Stochastic process, 3-5
  continuous, 6
  definition, 3
  mean square continuous, 6
  see also Time series
Stochastic volatility model, 494-497
Strictly stationary, 3
Structural model, 509-513
Symmetric estimators, 414-416
  simple, 414
  simple for unit root, 568-573
  weighted, 414-416
  weighted for unit root, 570-574
Taylor's series, for expectations, 214-250
  for random functions, 224-226
Threshold autoregression, 454
  smoothed, 455
Time domain, 143
Time series, causal, 108 (Exercise 32)
  complex, 12-13
  covariance stationary, 4
  continuous, 5
  definition, 3
  deterministic, 81
  long memory, 98-101
  nonsingular, 82
  nonstationary, 4
  periodic, 13
  realization of, 81
  regular, 81, 94
  short memory, 98
  singular, 81, 96
  stationary, 3, 4
  strictly stationary, 3
  see also Autoregressive time series; Moving average time series; various time series entries
Transfer function, 167
  model, 529
  of trend removal filter, 514
Transformation, autoregressive
  Gram-Schmidt orthogonalization, 86
  see also Filter; Moving average
Transform pair, 132
Transition equation, 187
Treasury bill rate, 564, 577, 581, 624
Trend, 4, 475
  filter for, 497-502
  global estimation of, 476-480
  grafted polynomial, 480-482
  moving average estimation, 497-502
  structural model for, 509-513
  see also Mean; Least squares estimator; Moving average
Trigonometric functions, 14, 112
  amplitude, 14
  completeness of, 119
  frequency, 14
  orthogonality of, on integers, 112-114
    on interval, 116
  period, 14
  phase, 14
  polynomial, 112, 355
  sum on integers, 112-114
  see also Fourier series; Fourier transform
Trigonometric polynomial, 112
  order of, 117
Trigonometric series, 117
  complex representation, 130
Two-stage least squares, 274, 532
Unemployment rate, 336, 379, 412
  autoregressive representation, 412
  correlogram for, 339-340
  monthly, 379
Unit root, autoregression, 63-64, 546
  autoregressive estimation, 547-583
    distribution, 550-551, 556, 561-563
    drift, 565-568
    first order, 547-563
    higher order, 555-560
    maximum likelihood, 573-577
    mean estimated, 560-565
    median unbiased, 578-579
    trend estimated, 567-568
  explanatory variable, 603-606
  first order, 546
  Kalman filter for, 195
  moments of, 547, 639 (Exercise 4)
  moving average, 66, 629-636
  prediction for, 573-577
  test for, univariate, 553, 555, 558, 568-577
    tables, 641-645
Unit root (Continued)
  test for, vector, 622
    tables, 646-652
  vector autoregression, 596-629
Unobserved components, 509
Updating equation, 190
Variance, 4
  heterogeneous, 488, 492-497
  see also Autocovariance; Cross covariance
Vector process, 15-17, 75-79. See also Vector valued time series
Vector valued time series, 15-17, 75-79
  autoregressive, 77-78
    unit root, 607-610
  covariance matrix, 15
  covariance stationary, 15
  estimation, 419-421. See also Cross covariance, estimation; Cross periodogram
  moving average, 75-76
  spectral density, 172
    estimation, 385-399
  unit roots, tests for, 622
    tables, 646-652
  see also Cross correlation; Cross spectral function
von Neumann ratio, 319, 486
  t approximation for, 488
Weak convergence, 236
Weakly stationary, 4
Weighted average, see Filter; Moving average
Wheat yields, 510, 527
White noise, 5
  band passed, 212
Wiener process, 236, 238, 596-599
Window, 382, 384
  Bartlett, 384
  Blackman-Tukey, 385
  hamming, 385
  hanning, 385
  lag, 382
  Parzen, 385
  rectangular, 384
  spectral, 382
  triangular, 384
  truncated, 384
Wold decomposition, 94-98
Yule-Walker equations, 55, 408
  multivariate, 78