Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


TRANSCRIPT

Page 1: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)
Page 2: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Introduction to Statistical Time Series

Page 3: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: Vic Barnett, Ralph A. Bradley, Nicholas I. Fisher, J. Stuart Hunter, J. B. Kadane, David G. Kendall, David W. Scott, Adrian F. M. Smith, Jozef L. Teugels, Geoffrey S. Watson

A complete list of the titles in this series appears at the end of this volume

Page 4: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Introduction to Statistical Time Series Second Edition

WAYNE A. FULLER Iowa State University

A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York  Chichester  Brisbane  Toronto  Singapore

Page 5: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

A NOTE TO THE READER. This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.

This text is printed on acid-free paper.

Copyright © 1996 by John Wiley & Sons, Inc.

All rights reserved. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, E-Mail: PERMREQ@WILEY.COM.

To order books or for customer service please call 1-(800)-CALL-WILEY (225-5945).

Library of Congress Cataloging in Publication Data:

Fuller, Wayne A.
Introduction to statistical time series / Wayne A. Fuller. -- 2nd ed.
p. cm. -- (Wiley series in probability and statistics)
"A Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN 0-471-55239-9 (cloth : alk. paper)
1. Time-series analysis. 2. Regression analysis. I. Title. II. Series.
QA280.F84 1996
519.5'5--dc20      95-14875

10 9 8 7 6 5 4 3

Page 6: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

To Evelyn

Page 7: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Contents

Preface to the First Edition xi
Preface to the Second Edition xiii
List of Principal Results xv
List of Examples xxi

1. Introduction 1
1.1 Probability Spaces 1
1.2 Time Series 3
1.3 Examples of Stochastic Processes 4
1.4 Properties of the Autocovariance and Autocorrelation Functions 7
1.5 Complex Valued Time Series 12
1.6 Periodic Functions and Periodic Time Series 13
1.7 Vector Valued Time Series 15
References 17
Exercises 17

2. Moving Average and Autoregressive Processes 21
2.1 Moving Average Processes 21
2.2 Absolutely Summable Sequences and Infinite Moving Averages 26
2.3 An Introduction to Autoregressive Time Series 39
2.4 Difference Equations 41
2.5 The Second Order Autoregressive Time Series 54
2.6 Alternative Representations of Autoregressive and Moving Average Processes 58
2.7 Autoregressive Moving Average Time Series 70
2.8 Vector Processes 75
2.9 Prediction 79
2.10 The Wold Decomposition 94

Page 8: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

2.11 Long Memory Processes 98
References 101
Exercises 101

3. Introduction to Fourier Analysis 112
3.1 Systems of Orthogonal Functions-Fourier Coefficients 112
3.2 Complex Representation of Trigonometric Series 130
3.3 Fourier Transform-Functions Defined on the Real Line 132
3.4 Fourier Transform of a Convolution 136
References 139
Exercises 139

4. Spectral Theory and Filtering 143
4.1 The Spectrum 143
4.2 Circulants-Diagonalization of the Covariance Matrix of a Stationary Process 149
4.3 The Spectral Density of Moving Average and Autoregressive Time Series 155
4.4 Vector Processes 169
4.5 Measurement Error-Signal Detection 181
4.6 State Space Models and Kalman Filtering 187
References 205
Exercises 205

5. Some Large Sample Theory 214
5.1 Order in Probability 214
5.2 Convergence in Distribution 227
5.3 Central Limit Theorems 233
5.4 Approximating a Sequence of Expectations 240
5.5 Estimation for Nonlinear Models 250
5.5.1 Estimators that Minimize an Objective Function 250
5.5.2 One-Step Estimation 268
5.6 Instrumental Variables 273
5.7 Estimated Generalized Least Squares 279
5.8 Sequences of Roots of Polynomials 290
References 299
Exercises 299

Page 9: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

6. Estimation of the Mean and Autocorrelations 308
6.1 Estimation of the Mean 308
6.2 Estimators of the Autocovariance and Autocorrelation Functions 313
6.3 Central Limit Theorems for Stationary Time Series 320
6.4 Estimation of the Cross Covariances 339
References 348
Exercises 348

7. The Periodogram, Estimated Spectrum 355
7.1 The Periodogram 355
7.2 Smoothing, Estimating the Spectrum 366
7.3 Other Estimators of the Spectrum 380
7.4 Multivariate Spectral Estimates 385
References 400
Exercises 400

8. Parameter Estimation 404
8.1 First Order Autoregressive Time Series 404
8.2 Higher Order Autoregressive Time Series 407
8.2.1 Least Squares Estimation for Univariate Processes 407
8.2.2 Alternative Estimators for Autoregressive Time Series 413
8.2.3 Multivariate Autoregressive Time Series 419
8.3 Moving Average Time Series 421
8.4 Autoregressive Moving Average Time Series 429
8.5 Prediction with Estimated Parameters 443
8.6 Nonlinear Processes 451
8.7 Missing and Outlier Observations 458
8.8 Long Memory Processes 466
References 471
Exercises 471

9. Regression, Trend, and Seasonality 475
9.1 Global Least Squares 476
9.2 Grafted Polynomials 480
9.3 Estimation Based on Least Squares Residuals 484
9.3.1 Estimated Autocorrelations 484
9.3.2 Estimated Variance Functions 488

Page 10: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

9.4 Moving Averages-Linear Filtering 497
9.4.1 Moving Averages for the Mean 497
9.4.2 Moving Averages of Integrated Time Series 502
9.4.3 Seasonal Adjustment 504
9.4.4 Differences 507
9.5 Structural Models 509
9.6 Some Effects of Moving Average Operators 513
9.7 Regression with Time Series Errors 518
9.8 Regression Equations with Lagged Dependent Variables and Time Series Errors 530
References 538
Exercises 538

10. Unit Root and Explosive Time Series 546
10.1 Unit Root Autoregressive Time Series 546
10.1.1 The Autoregressive Process with a Unit Root 546
10.1.2 Random Walk with Drift 565
10.1.3 Alternative Estimators 568
10.1.4 Prediction for Unit Root Autoregressions 582
10.2 Explosive Autoregressive Time Series 583
10.3 Multivariate Autoregressive Processes with Unit Roots 596
10.3.1 Multivariate Random Walk 596
10.3.2 Vector Process with a Single Unit Root 599
10.3.3 Vector Process with Several Unit Roots 617
10.4 Testing for a Unit Root in a Moving Average Model 629
References 638
Exercises 638
10.A Percentiles for Unit Root Distributions 641
10.B Data Used in Examples 653

Bibliography 664

Index 689

Page 11: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Preface to the First Edition

This textbook was developed from a course in time series given at Iowa State University. The classes were composed primarily of graduate students in economics and statistics. Prerequisites for the course were an introductory graduate course in the theory of statistics and a course in linear regression analysis. Since the students entering the course had varied backgrounds, chapters containing elementary results in Fourier analysis and large sample statistics, as well as a section on difference equations, were included in the presentation.

The theorem-proof format was followed because it offered a convenient method of organizing the material. No attempt was made to present the most general results available. Instead, the objective was to give results with practical content whose proofs were generally consistent with the prerequisites. Since many of the statistics students had completed advanced courses, a few theorems were presented at a level of mathematical sophistication beyond the prerequisites. Homework requiring application of the statistical methods was an integral part of the course.

By emphasizing the relationship of the techniques to regression analysis and using data sets of moderate size, most of the homework problems can be worked with any of a number of statistical packages. One such package is SAS (Statistical Analysis System, available through the Institute of Statistics, North Carolina State University). SAS contains a segment for periodogram computations that is particularly suited to this text. The system also contains a segment for regression with time series errors compatible with the presentation in Chapter 9. Another package is available from International Mathematical and Statistical Library, Inc.; this package has a chapter on time series programs.

There is some flexibility in the order in which the material can be covered. For example, the major portions of Chapters 1, 2, 5, 6, 8, and 9 can be treated in that order with little difficulty. Portions of the later chapters deal with spectral matters, but these are not central to the development of those chapters. The discussion of multivariate time series is positioned in separate sections so that it may be introduced at any point.

I thank A. R. Gallant for the proofs of several theorems and for the repair of others; J. J. Goebel for a careful reading of the manuscript that led to numerous substantive improvements and the removal of uncounted mistakes; and D. A.


Page 12: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


Dickey, M. Hidiroglou, R. J. Klemm, and G. H. K. Wang for computing examples and for proofreading. G. E. Battese, R. L. Carter, K. R. Crwse, J. D. Cryer, D. P. Hasza, J. D. Jobson, B. Macpherson, J. Mellon, D. A. Pierce and K. N. Wolter also read portions of the manuscript. I also thank my colleagues, R. Groeneveld, D. Isaacson, and O. Kempthorne, for useful comments and discussions. I am indebted to a seminar conducted by Marc Nerlove at Stanford University for the organization of some of the material on Fourier analysis and spectral theory. A portion of the research was supported by joint statistical agreements with the U.S. Bureau of the Census.

I thank Margaret Nichols for the repeated typings required to bring the manuscript to final form and Avmelle Jacobson for transforming much of the original illegible draft into typescript.

WAYNE A. FULLER

Ames, Iowa
February 1976

Page 13: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Preface to the Second Edition

Considerable development in statistical time series has occurred since the first edition was published in 1976. Notable areas of activity include nonstationary models, nonlinear estimation, multivariate models, state space representations and empirical model identification. The second edition attempts to incorporate new results and to respond to recent emphases while retaining the basic format of the first edition.

With the exception of new sections on the Wold decomposition, partial autocorrelation, long memory processes, and the Kalman filter, Chapters one through four are essentially unchanged from the first edition. Chapter 5 has been enlarged, with additional material on central limit theorems for martingale differences, an expanded treatment of nonlinear estimation, a section on estimated generalized least squares, and a section on the roots of polynomials. Chapter 6 and Chapter 8 have been revised using the asymptotic theory of Chapter 5. Also, the discussion of estimation methods has been modified to reflect advances in computing. Chapter 9 has been revised and the material on the estimation of regression equations has been expanded.

The material on nonstationary autoregressive models is now in a separate chapter, Chapter 10. New tests for unit roots in univariate processes and in vector processes have been added.

As with the first edition, the material is arranged in sections so that there is considerable flexibility to the order in which topics can be covered.

I thank David Dickey and Heon Jin Park for constructing the tables of Chapter 10. I thank Anthony An, Rohit Deo, David Hasza, N. K. Nagaraj, Sastry Pantula, Heon Jin Park, Savas Papadopoulos, Sahadeb Sarkar, Dongwan Shin, and George H. K. Wang for many useful suggestions. I am particularly indebted to Sastry Pantula who assisted with the material of Chapters 5, 8, 9, and 10 and made substantial contributions to other parts of the manuscript, including proofs of several results. Sahadeb Sarkar contributed to the material on nonlinear estimation of Chapter 5, Todd Sanger contributed to the discussion of estimated generalized least squares, Yasuo Amemiya contributed to the section on roots of polynomials, Rohit Deo contributed to the material on long memory processes, Sastry Pantula, Sahadeb Sarkar and Dongwan Shin contributed to the material on the limiting


Page 14: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


distribution of estimators for autoregressive moving averages, and Heon Jin Park contributed to the sections on unit root autoregressive processes. I thank Abdoulaye Adam, Jay Breidt, Rohit Deo, Kevin Dodd, Savas Papadopoulos, and Anindya Roy for computing examples. I thank SAS Institute, Cary, NC, for providing computing support to Heon Jin Park for the construction of tables for unit root tests. The research for the second edition was partly supported by joint statistical agreements with the U.S. Bureau of the Census.

I thank Judy Shafer for the extensive word processing required during preparation of the second edition.

WAYNE A. FULLER

Ames, Iowa
November 1995

Page 15: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

List of Principal Results

Theorem  Topic
1.4.1  Covariance function is positive semidefinite, 7
1.4.2  Covariance function is even, 8
1.4.3  Correlation function on real line is a characteristic function, 9
1.4.4  Correlation function on integers is a characteristic function, 9
1.5.1  Covariance function of complex series is positive semidefinite, 13
2.2.1  Weighted average of random variables, where weights are absolutely summable, defines a random variable, 31
2.2.2  Covariance of two infinite sums of random variables, 33
2.2.3  Convergence in mean square of sum of random variables, 35
2.4.1  Order of a polynomial is reduced by differencing, 46
2.4.2  Jordan canonical form of a matrix, 51
2.6.1  Representation of autoregressive process as an infinite moving average, 59
2.6.2  Representation of invertible moving average as an infinite autoregression, 65
2.6.3  Moving average representation of a time series based on covariance function, 66
2.6.4  Canonical representation of moving average time series, 68
2.7.1  Representation of autoregressive moving average as an infinite moving average, 72
2.7.2  Representation of autoregressive moving average as an infinite autoregression, 74
2.8.1  Representation of vector autoregression as an infinite moving average, 77
2.8.2  Representation of vector moving average as an infinite autoregression, 78
2.9.1  Minimum mean square error predictor, 80
2.9.2  Durbin-Levinson algorithm for constructing predictors, 82

Page 16: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

2.9.3  Predictors as a function of previous prediction errors, 86
2.9.4  Limit of prediction error is a moving average, 89
2.10.1  Limit of one period prediction error, 94
2.10.2  Wold decomposition, 96
3.1.1  Sine and cosine functions form an orthogonal basis for N dimensional vectors, 112
3.1.2  Sine and cosine functions are orthogonal on [-π, π], 116
3.1.3  Bessel's inequality, 118
3.1.4  Fourier coefficients are zero if and only if function is zero, 119
3.1.5  If Fourier coefficients are zero, integral of function is zero, 119
3.1.6  Integral of function defined in terms of Fourier coefficients, 120
3.1.7  Pointwise representation of a function by Fourier series, 123
3.1.8  Absolute convergence of Fourier series for a class of functions, 125
3.1.9  The correlation function defines a continuous spectral density, 127
3.1.10  The Fourier series of a continuous function is Cesàro summable, 129
3.3.1  Fourier integral theorem, 133
3.4.1  Fourier transform of a convolution, 137
4.2.1  Approximate diagonalization of covariance matrix with orthogonal sine-cosine functions, 154
4.3.1  Spectral density of an infinite moving average of a time series, 156
4.3.2  Moving average representation of a time series based on covariance function, 162
4.3.3  Moving average representation of a time series based on continuous spectral density, 163
4.3.4  Autoregressive representation of a time series based on continuous spectral density, 165
4.3.5  Spectral density of moving average with square summable coefficients, 167
4.4.1  Spectral density of vector process, 179
4.5.1  Linear filter for time series observed subject to measurement error, 183
5.1.1  Chebyshev's inequality, 219
5.1.2  Common probability limits, 221
5.1.3  Convergence in rth mean implies convergence in probability, 221
5.1.4  Probability limit of a continuous function, 222
5.1.5  The algebra of O_p, 223
5.1.6  The algebra of o_p, 224
5.1.7  Taylor expansion about a random point, 226
5.2.1  Convergence in law when difference converges in probability, 228

Page 17: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

5.2.2  Helly-Bray Theorem, 230
5.2.3  Joint convergence of distribution functions and characteristic functions, 230
5.2.4  Convergence in distribution of continuous functions, 230
5.2.5  Joint convergence in law of independent sequences, 230
5.2.6  Joint convergence in law when one element converges to a constant, 232
5.3.1  Lindeberg central limit theorem, 233
5.3.2  Liapounov central limit theorem, 233
5.3.3  Central limit theorem for vectors, 234
5.3.4  Central limit theorem for martingale differences, 235
5.3.5  Functional central limit theorem, 236
5.3.6  Convergence of functions of partial sums, 237
5.3.7  Multivariate functional central limit theorem, 238
5.3.8  Almost sure convergence of martingales, 239
5.4.1  Moments of products of sample means, 242
5.4.2  Approximate expectation of functions of means, 243
5.4.3  Approximate expectation of functions of vector means, 244
5.4.4  Bounded integrals of functions of random variables, 247
5.5.1  Limiting properties of nonlinear least squares estimator, 256
5.5.2  Limiting distribution of estimator defined by an objective function, 260
5.5.3  Consistency of nonlinear estimator with different rates of convergence, 262
5.5.4  Limiting properties of one-step Gauss-Newton estimator, 269
5.6.1  Limiting properties of instrumental variable estimators, 275
5.7.1  Convergence of estimated generalized least squares estimator to generalized least squares estimator, 280
5.7.2  Central limit theorem for estimated generalized least squares, 284
5.7.3  Estimated generalized least squares with finite number of covariance parameters, 286
5.7.4  Estimated generalized least squares based on simple least squares residuals, 289
5.8.1  Convergence of roots of a sequence of polynomials, 295
5.8.2  Differentials of roots of determinantal equation, 298
6.1.1  Convergence in mean square of sample mean, 309
6.1.2  Variance of sample mean as function of the spectral density, 310
6.1.3  Limiting efficiency of sample mean, 312
6.2.1  Covariances of sample autocovariances, 314
6.2.2  Covariances of sample autocovariances, mean estimated, 316

Page 18: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

6.2.3  Covariances of sample correlations, 317
6.3.1  Central limit theorem for m-dependent sequences, 321
6.3.2  Convergence in probability of mean of an infinite moving average, 325
6.3.3  Central limit theorem for mean of infinite moving average, 326
6.3.4  Central limit theorem for linear function of infinite moving average, 329
6.3.5  Consistency of sample autocovariances, 331
6.3.6  Central limit theorem for autocovariances, 333
6.4.1  Covariances of sample autocovariances of vector time series, 342
6.4.2  Central limit theorem for cross covariances, one series independent (0, σ²), 345
7.1.1  Expected value of periodogram, 359
7.1.2  Limiting distribution of periodogram ordinates, 360
7.2.1  Covariances of periodogram ordinates, 369
7.2.2  Limiting behavior of weighted averages of periodogram ordinates, 372
7.3.1  Limiting behavior of estimated spectral density based on weighted autocovariances, 382
7.4.1  Diagonalization of covariance matrix of bivariate process, 387
7.4.2  Distribution of sample Fourier coefficients for bivariate process, 389
7.4.3  Distribution of smoothed bivariate periodogram, 390
8.2.1  Limiting distribution of regression estimators of parameters of pth order autoregressive process, 408
8.2.2  Equivalence of alternative estimators of parameters of autoregressive process, 418
8.2.3  Limiting distribution of estimators of parameter of pth order vector autoregressive process, 420
8.3.1  Limiting distribution of nonlinear estimator of parameter of first order moving average, 424
8.4.1  Limiting distribution of nonlinear estimator of vector of parameters of autoregressive moving average, 432
8.4.2  Equivalence of alternative estimators for autoregressive moving average, 434
8.5.1  Order of error in prediction due to estimated parameters, 444
8.5.2  Expectation of prediction error, 445
8.5.3  Order n⁻¹ approximation to variance of prediction error, 446
8.6.1  Polynomial autoregression, 452
8.8.1  Maximum likelihood estimators for long memory processes, 470
9.1.1  Limiting distribution of simple least squares estimated parameters of regression model with time series errors, 478

Page 19: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

9.1.2  Spectral representation of covariance matrix of simple least squares estimator and of generalized least squares estimator, 479
9.1.3  Asymptotic relative efficiency of simple least squares and generalized least squares, 480
9.3.1  Properties of autocovariances computed from least squares residuals, 485
9.4.1  Centered moving average estimator of polynomial trend, 501
9.4.2  Trend moving average removes autoregressive unit root, 502
9.4.3  Effect of a moving average for polynomial trend removal, 504
9.7.1  Asymptotic equivalence of generalized least squares and estimated generalized least squares for model with autoregressive errors, 521
9.7.2  Limiting distribution of maximum likelihood estimator of regression model with autoregressive errors, 526
9.8.1  Limiting distribution of least squares estimator of model with lagged dependent variables, 530
9.8.2  Limiting distribution of instrumental variable estimator of model with lagged dependent variables, 532
9.8.3  Limiting properties of autocovariances computed from instrumental variable residuals, 534
10.1.1  Limiting distribution of least squares estimator of first order autoregressive process with unit root, 550
10.1.2  Limiting distribution of least squares estimator of pth order autoregressive process with a unit root, 556
10.1.3  Limit distribution of least squares estimator of first order unit root process with mean estimated, 561
10.1.4  Limiting distribution of least squares estimator of pth order process with a unit root and mean estimated, 563
10.1.5  Limiting distribution of least squares estimator of unit root autoregression with drift, 566
10.1.6  Limiting distribution of least squares estimator of unit root autoregression with time in fitted model, 567
10.1.7  Limiting distribution of symmetric estimator of first order unit root autoregressive process, 570
10.1.8  Limiting distribution of symmetric estimators adjusted for mean and time trend, 571
10.1.9  Limiting distribution of symmetric test statistics for pth order autoregressive process with a unit root, 573
10.1.10  Limiting distribution of maximum likelihood estimator of unit root, 573
10.1.11  Order of error in prediction of unit root process with estimated parameters, 582

Page 20: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

10.2.1  Limiting distribution of explosive autoregressive estimator, 585
10.2.2  Limiting distribution of vector of least squares estimators for pth order autoregression with an explosive root, 589
10.3.1  Limiting distribution of least squares estimator of vector of estimators of equation containing the lagged dependent variable with unit coefficient, 600
10.3.2  Limiting distribution of least squares estimator when one of the explanatory variables is a unit root process, 603
10.3.3  Limiting distribution of least squares estimator of coefficients of vector process with unit roots, 610
10.3.4  Limiting distribution of maximum likelihood estimator for multivariate process with a single root, 613
10.3.5  Limiting distribution of maximum likelihood estimator for multivariate process with g unit roots, 619

Page 21: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

List of Examples

Number  Topic
1.3.1  Finite index set, 4
1.3.2  White noise, 5
1.3.3  A nonstationary time series, 5
1.3.4  Continuous time series, 6
2.1.1  First order moving average time series, 23
2.1.2  Second order moving average time series, 24
2.3.1  First order autoregressive time series, 40
2.5.1  Correlogram of time series, 57
2.9.1  Prediction for unemployment rate, 90
2.9.2  Prediction for autoregressive moving average, 91
4.1.1  Spectral distribution function composed of jumps, 147
4.1.2  Spectral distribution function of series with white noise component, 148
4.5.1  Filter for time series observed subject to measurement error, 184
4.6.1  Kalman filter for Des Moines River data, 192
4.6.2  Kalman filter, Des Moines River, mean unknown, 193
4.6.3  Kalman filter, autoregressive unit root, 195
4.6.4  Kalman filter for missing observations, 198
4.6.5  Predictions constructed with the Kalman filter, 202
5.1.1  Taylor expansion about a random point, 227
5.4.1  Approximation to expectation, 248
5.5.1  Transformation of model with different rates of convergence, 265
5.5.2  Failure of convergence of second derivative of nonlinear model, 267
5.5.3  Gauss-Newton estimation, 272
5.6.1  Instrumental variable estimation, 277
6.3.1  Sample autocorrelations and means of unemployment rate, 336

Page 22: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

6.4.1  Autocorrelations and cross correlations of Boone-Saylorville data, 345
7.1.1  Periodogram for wheat yield data, 363
7.2.1  Periodogram of autoregression, 375
7.2.2  Periodogram of unemployment rate, 379
7.4.1  Cross spectrum computations, 394
8.2.1  Autoregressive fit to unemployment rate, 412
8.3.1  Estimation of first order moving average, 427
8.4.1  Autoregressive moving average fit to artificial data, 439
8.4.2  Autoregressive moving average fit to unemployment rate, 440
8.5.1  Prediction with estimated parameters, 449
8.6.1  Nonlinear models for lynx data, 455
8.7.1  Missing observations, 459
8.7.2  Outlier observations, 462
9.2.1  Grafted quadratic fit to U.S. wheat yields, 482
9.3.1  Variance as a function of the mean, 490
9.3.2  Stochastic volatility model, 495
9.5.1  Structural model for wheat yields, 510
9.7.1  Regression with autoregressive errors, spirit consumption, 522
9.7.2  Nonlinear estimation of trend in wheat yields with autoregressive error, 527
9.7.3  Nonlinear estimation for spirit consumption model, 528
9.8.1  Regression with lagged dependent variable and autoregressive errors, 535
10.1.1  Estimation and testing for a unit root, ordinary least squares, 564
10.1.2  Testing for a unit root, symmetric and likelihood procedures, 577
10.1.3  Estimation for process with autoregressive root in (-1,1], 581
10.1.4  Prediction for a process with a unit root, 583
10.2.1  Estimation with an explosive root, 593
10.2.2  Prediction for an explosive process, 596
10.3.1  Estimation of regression with autoregressive explanatory variable, 606
10.3.2  Estimation and testing for vector process with unit roots, 624
10.4.1  Test for a moving average unit root, 635

Page 23: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Introduction to Statistical Time Series

Page 24: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CHAPTER 1

Introduction

The analysis of time series applies to many fields. In economics the recorded history of the economy is often in the form of time series. Economic behavior is quantified in such time series as the consumer price index, unemployment, gross national product, population, and production. The natural sciences also furnish many examples of time series. The water level in a lake, the air temperature, the yields of corn in Iowa, and the size of natural populations are all collected over time. Growth models that arise naturally in biology and in the social sciences represent an entire field in themselves.

The mathematical theory of time series has been an area of considerable activity in recent years. Applications in the physical sciences such as the development of designs for airplanes and rockets, the improvement of radar and other electronic devices, and the investigation of certain production and chemical processes have resulted in considerable interest in time series analysis. This recent work should not disguise the fact that the analysis of time series is one of the oldest activities of scientific man.

A successful application of statistical methods to the real world requires a melding of statistical theory and knowledge of the material under study. We shall confine ourselves to the statistical treatment of moderately well-behaved time series, but we shall illustrate some techniques with real data.

1.1. PROBABILITY SPACES

When investigating outcomes of a game, an experiment, or some natural phenomenon, it is useful to have a representation for all possible outcomes. The individual outcomes, denoted by $\omega$, are called elementary events. The set of all possible elementary events is called the sure event and is denoted by $\Omega$. An example is the tossing of a die, where we could take $\Omega = \{$one spot shows, two spots show, $\ldots$, six spots show$\}$ or, more simply, $\Omega = \{1, 2, 3, 4, 5, 6\}$.

Let $A$ be a subset of $\Omega$, and let $\mathcal{A}$ be a collection of such subsets. If we observe the outcome $\omega$ and $\omega$ is in $A$, we say that $A$ has occurred. Intuitively, it is possible



Page 25: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


to specify P(A), the probability that (or expected long-run frequency with which) A will occur. It is reasonable to require that the function P(A) satisfy:

AXIOM 1. $P(A) \ge 0$ for every $A$ in $\mathcal{A}$.

AXIOM 2. $P(\Omega) = 1$.

AXIOM 3. If $A_1, A_2, \ldots$ is a countable sequence from $\mathcal{A}$ and $A_i \cap A_j$ is the null set for all $i \ne j$, then $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)$.

Using our die tossing example, if the die is fair we would take $P(A) = \tfrac{1}{6} \times$ [the number of elementary events $\omega$ in $A$]. Thus $P(\{1, 3, 5\}) = \tfrac{1}{6} \times 3 = \tfrac{1}{2}$. It may be verified that Axioms 1 to 3 are satisfied for $\mathcal{A}$ equal to $\mathcal{P}(\Omega)$, the collection of all possible subsets of $\Omega$.

Unfortunately, for technical mathematical reasons, it is not always possible to define $P(A)$ for all $A$ in $\mathcal{P}(\Omega)$ and also to satisfy Axiom 3. To eliminate this difficulty, the class of subsets $\mathcal{A}$ of $\Omega$ on which $P$ is defined is restricted. The collection $\mathcal{A}$ is required to satisfy:

1. If $A$ is in $\mathcal{A}$, then the complement $A^c$ is also in $\mathcal{A}$.
2. If $A_1, A_2, \ldots$ is a countable sequence from $\mathcal{A}$, then $\bigcup_{i=1}^{\infty} A_i$ is in $\mathcal{A}$.
3. The null set is in $\mathcal{A}$.

A nonempty collection $\mathcal{A}$ of subsets of $\Omega$ that satisfies conditions 1 to 3 is said to be a sigma-algebra or sigma-field.

We are now in a position to give a formal definition of a probability space. A probability space, represented by $(\Omega, \mathcal{A}, P)$, is the sure event $\Omega$ together with a sigma-algebra $\mathcal{A}$ of subsets of $\Omega$ and a function $P(A)$ defined on $\mathcal{A}$ that satisfies Axioms 1 to 3.

For our purposes it is unnecessary to consider the subject of probability spaces in detail. In simple situations such as tossing a die, $\Omega$ is easy to enumerate, and $P$ satisfies Axioms 1 to 3 for $\mathcal{A}$ equal to the set of all subsets of $\Omega$.

Although it is conceptually possible to enumerate all possible outcomes of an experiment, it may be a practical impossibility to do so, and for most purposes it is unnecessary to do so. It is usually enough to record the outcome by some function that assumes values on the real line. That is, we assign to each outcome $\omega$ a real number $X(\omega)$, and if $\omega$ is observed, we record $X(\omega)$. In our die tossing example we could take $X(\omega) = 1$ if the player wins and $-1$ if the house wins.

Formally, a random variable $X$ is a real valued function defined on $\Omega$ such that the set $\{\omega : X(\omega) \le x\}$ is a member of $\mathcal{A}$ for every real number $x$. The function $F_X(x) = P(\{\omega : X(\omega) \le x\})$ is called the distribution function of the random variable $X$.
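As a minimal sketch of these definitions, the code below uses the die example: the probability measure assigns each subset of the six faces its relative size, a hypothetical win/lose rule defines a random variable $X$, and the distribution function $F_X(x) = P(\{\omega : X(\omega) \le x\})$ is evaluated at a few points. The win/lose rule is invented for illustration and is not from the text.

```python
from fractions import Fraction

# Sure event for one toss of a fair die, and the probability of any subset A.
omega_all = {1, 2, 3, 4, 5, 6}

def P(A):
    return Fraction(len(A & omega_all), 6)

# A hypothetical random variable: the player wins (+1) on an even face, loses (-1) otherwise.
def X(omega):
    return 1 if omega % 2 == 0 else -1

# Distribution function F_X(x) = P({omega : X(omega) <= x}).
def F(x):
    return P({omega for omega in omega_all if X(omega) <= x})

print(F(-2), F(-1), F(0), F(1))   # 0, 1/2, 1/2, 1
```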

The reader who wishes to explore further the subjects of probability spaces, random variables, and distribution functions for stochastic processes may read Tucker (1967, pp. 1-33). The preceding brief introduction will suffice for our purposes.

Page 26: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


1.2. TIME SERIES

Let $(\Omega, \mathcal{A}, P)$ be a probability space, and let $T$ be an index set. A real valued time series (or stochastic process) is a real valued function $X(t, \omega)$ defined on $T \times \Omega$ such that for each fixed $t$, $X(t, \omega)$ is a random variable on $(\Omega, \mathcal{A}, P)$. The function $X(t, \omega)$ is often written $X_t(\omega)$ or $X_t$, and a time series can be considered as a collection $\{X_t : t \in T\}$ of random variables.

For fixed $\omega$, $X(t, \omega)$ is a real valued function of $t$. This function of $t$ is called a realization or a sample function. If we look at a plot of some recorded time series such as the gross national product, it is important to realize that conceptually we are looking at a plot of $X(t, \omega)$ with $\omega$ fixed. The collection of all possible realizations is called the ensemble of functions or the ensemble of realizations.
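A short sketch of the realization/ensemble distinction, using a simulated process chosen purely for illustration (a random walk; the process and all parameter values are assumptions, not from the text): fixing a row of the array fixes $\omega$ and gives one realization as a function of $t$; fixing a column fixes $t$ and gives a sample from the random variable $X_t$ over the ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)

T = np.arange(50)             # index set t = 0, 1, ..., 49
n_realizations = 5            # size of the simulated ensemble

# Illustrative process: partial sums of normal draws (a random walk).
ensemble = np.cumsum(rng.normal(size=(n_realizations, T.size)), axis=1)

one_realization = ensemble[0]     # omega fixed: a function of t
x_at_t10 = ensemble[:, 10]        # t fixed: a random variable sampled over omega

print(one_realization[:5])
print(x_at_t10)
```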

If the index set contains exactly one element, the stochastic process is a single random variable and we have defined the distribution function of the process. For stochastic processes with more than one random variable we need to consider the joint distribution function. The joint distribution function of a finite set of random variables $\{X_{t_1}, X_{t_2}, \ldots, X_{t_n}\}$ from the collection $\{X_t : t \in T\}$ is defined by
\[ F_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n) = P\{\omega : X_{t_1} \le x_1, X_{t_2} \le x_2, \ldots, X_{t_n} \le x_n\} . \tag{1.2.1} \]

A time series is called strictly stationary if
\[ F_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n) = F_{X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_n+h}}(x_1, x_2, \ldots, x_n) , \]
where the equality must hold for all possible (nonempty finite distinct) sets of indices $t_1, t_2, \ldots, t_n$ and $t_1 + h, t_2 + h, \ldots, t_n + h$ in the index set and all $(x_1, x_2, \ldots, x_n)$ in the range of the random variable $X_t$. Note that the indices $t_1, t_2, \ldots, t_n$ are not necessarily consecutive. If a time series is strictly stationary, we see that the distribution function of the random variable is the same at every point in the index set. Furthermore, the joint distribution depends only on the distance between the elements in the index set, and not on their actual values. Naturally this does not mean a particular realization will appear the same as another realization.

If $\{X_t : t \in T\}$ is a strictly stationary time series with $E\{|X_t|\} < \infty$, then the expected value of $X_t$ is a constant for all $t$, since the distribution function is the same for all $t$. Likewise, if $E\{X_t^2\} < \infty$, then the variance of $X_t$ is a constant for all $t$.

A time series is defined completely in a probabilistic sense if one knows the cumulative distribution (1.2.1) for any finite set of random variables $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$. However, in most applications, the form of the distribution function is not known. A great deal can be accomplished, however, by dealing

Page 27: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


only with the first two moments of the time series. In line with this approach we define a time series to be weakly stationary if:

1. The expected value of $X_t$ is a constant for all $t$.
2. The covariance matrix of $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ is the same as the covariance matrix of $(X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_n+h})$ for all nonempty finite sets of indices $(t_1, t_2, \ldots, t_n)$, and all $h$ such that $t_1, t_2, \ldots, t_n, t_1 + h, t_2 + h, \ldots, t_n + h$ are contained in the index set.

As before, $t_1, t_2, \ldots, t_n$ are not necessarily consecutive members of the index set. Also, since the expected value of $X_t$ is a constant, it may conveniently be taken as 0. The covariance matrix, by definition, is a function only of the distance between observations. That is, the covariance of $X_{t+h}$ and $X_t$ depends only on the distance, $h$, and we may write
\[ \gamma(h) = E\{X_{t+h} X_t\} , \]
where $E\{X_t\}$ has been taken to be zero. The function $\gamma(h)$ is called the autocovariance of $X_t$. When there is no danger of confusion, we shall abbreviate the expression to covariance.

The terms stationary in the wide sense, covariance stationary, second order stationary, and stationary are also used to describe a weakly stationary time series. It follows from the definitions that a strictly stationary process with the first two moments finite is also weakly stationary. However, a strictly stationary time series may not possess finite moments and hence may not be covariance stationary.

Many time series as they occur in practice are not stationary. For example, the economies of many countries are developing or growing. Therefore, the typical economic indicators will be showing a "trend" through time. This trend may be in either the mean, the variance, or both. Such nonstationary time series are sometimes called evolutionary. A good portion of the practical analysis of time series is connected with the transformation of an evolving time series into a stationary time series. In later sections we shall consider several of the procedures used in this connection. Many of these techniques will be familiar to the reader because they are closely related to least squares and regression.
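The transformation of an evolving series into a roughly stationary one is developed in later chapters; as a minimal sketch of the idea under assumed parameter values (the trend, noise level, and data below are invented for illustration), a linear trend in the mean can be removed either by regressing on time and keeping the residuals or by first differencing.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200, dtype=float)

# An "evolutionary" series: linear trend in the mean plus stationary noise.
x = 2.0 + 0.05 * t + rng.normal(scale=1.0, size=t.size)

# (a) Regression detrending: regress x on (1, t) and keep the residuals.
design = np.column_stack([np.ones_like(t), t])
beta, *_ = np.linalg.lstsq(design, x, rcond=None)
residuals = x - design @ beta

# (b) First differencing: x_t - x_{t-1} also removes a linear trend in the mean.
differences = np.diff(x)

print(np.round(beta, 3))                      # roughly (2.0, 0.05)
print(round(residuals.mean(), 3), round(differences.mean(), 3))
```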

1.3. EXAMPLES OF STOCHASTIC PROCESSES

Example 1.3.1. Let the index set be $T = \{1, 2\}$, and let the space of outcomes be the possible outcomes associated with tossing two dice, one at "time" $t = 1$ and one at time $t = 2$. Then
\[ \Omega = \{1, 2, 3, 4, 5, 6\} \times \{1, 2, 3, 4, 5, 6\} . \]

Page 28: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


Define
\[ X(t, \omega) = t + [\text{value on die } t]^2 . \]
Therefore, for a particular $\omega$, say $\omega_0 = (1, 3)$, the realization or sample function would be $(1 + 1, 2 + 9) = (2, 11)$. In this case, both $\Omega$ and $T$ are finite, and we are able to determine that there are 36 possible realizations. AA

Example 1.3.2. One of the most important time series is a sequence of uncorrelated random variables, say $\{e_t : t \in (0, \pm 1, \pm 2, \ldots)\}$, each with zero mean and finite variance, $\sigma^2 > 0$. We say that $e_t$ is a sequence of uncorrelated $(0, \sigma^2)$ random variables.

This time series is sometimes called white noise. Note that the index set is the set of all integers. The set $\Omega$ is determined by the range of the random variables. Let us assume that the $e_t$ are normally distributed and therefore have range $(-\infty, \infty)$. Then $\omega \in \Omega$ is a real valued infinite-dimensional vector with an element associated with each integer. The covariance function of $\{e_t\}$ is given by
\[ \gamma(h) = \begin{cases} \sigma^2 , & h = 0 , \\ 0 , & \text{otherwise} . \end{cases} \]

Because of the importance of this time series, we shall reserve the symbol $e_t$ for a sequence of uncorrelated random variables with zero mean and positive finite variance. On occasion we shall further assume that the variables are independent and perhaps of a specified distribution. These additional assumptions will be stated when used.

In a similar manner the most commonly used index set will be the set of all integers. If the index set is not stated, the reader may assume that it is the set of all integers. AA
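A minimal numerical sketch of Example 1.3.2, with an assumed variance: simulating a normal white noise sequence and checking that the sample autocovariances sit near $\sigma^2$ at lag 0 and near 0 at other lags. The sample-autocovariance formula anticipates Chapter 6 and is used here only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0                 # assumed variance for the illustration
n = 10_000

# e_t: uncorrelated (0, sigma^2) random variables (here, independent normals).
e = rng.normal(scale=np.sqrt(sigma2), size=n)

def sample_autocovariance(x, h):
    """Estimate gamma(h) = E{X_{t+h} X_t} for a (nearly) zero-mean series."""
    x = x - x.mean()
    return np.dot(x[h:], x[: n - h]) / n

for h in range(4):
    print(h, round(sample_autocovariance(e, h), 3))
# Expected: gamma(0) close to 4.0 and gamma(h) close to 0 for h > 0.
```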

Example 1.3.3. Consider the time series
\[ X_t = B_1 + B_2 t , \qquad t \in [0, 1] , \]
where $B = (B_1, B_2)'$ is distributed as a bivariate normal random variable with mean $\boldsymbol{\beta} = (\beta_1, \beta_2)'$ and covariance matrix
\[ \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} . \]

Any realization of this process yields a function continuous on the interval $[0, 1]$, and the process therefore is called continuous. Such a process might represent the outcome of an experiment to estimate the linear response to an input variable measured on the interval $[0, 1]$. The $B$'s are then the estimated regression coefficients. The set $\Omega$ is the two-dimensional space of real numbers. Each

Page 29: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


experiment conducted to obtain a set of estimates of $\beta_1$ and $\beta_2$ constitutes a realization of the process.

The mean function of $X_t$ is given by
\[ m(t) = E\{X_t\} = \beta_1 + \beta_2 t , \]
and the covariance function by
\[ \gamma(t, s) = E\{[X_t - m(t)][X_s - m(s)]\} = \sigma_{11} + (t + s)\sigma_{12} + ts\,\sigma_{22} . \]
It is clear that this process is not stationary. AA
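A sketch of Example 1.3.3 under assumed parameter values (the particular means and covariance matrix below are hypothetical): each realization is a straight line on $[0, 1]$, and averaging over many realizations recovers a mean function that depends on $t$, which is why the process cannot be stationary.

```python
import numpy as np

rng = np.random.default_rng(3)

beta = np.array([1.0, 2.0])                    # (beta_1, beta_2), assumed
sigma = np.array([[0.5, 0.1],
                  [0.1, 0.2]])                 # covariance of (B_1, B_2), assumed

t = np.linspace(0.0, 1.0, 11)
B = rng.multivariate_normal(beta, sigma, size=5000)   # one row per realization

# X(t, omega) = B_1 + B_2 t : every realization is a continuous (linear) function of t.
X = B[:, [0]] + B[:, [1]] * t                  # shape (realizations, len(t))

# Ensemble averages approximate m(t) = beta_1 + beta_2 t, which changes with t.
print(np.round(X.mean(axis=0), 2))
print(np.round(beta[0] + beta[1] * t, 2))
```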

Example 1.3.4. For the reader familiar only with the study of multivariate statistics, the idea of a random variable that is both continuous and random in time may require a moment of reflection. On the other hand, such processes occur in nature. For example, the water level of a lake or river can be plotted as a continuous function of time. Furthermore, such a plot might appear so smooth as to support the idea of a derivative or instantaneous rate of change in level.

Consider such a process, $\{X_t : t \in (-\infty, \infty)\}$, and let the covariance of the process be given by
\[ \gamma(h) = E\{(X_t - \mu)(X_{t+h} - \mu)\} = A e^{-\alpha h^2} , \qquad \alpha > 0 . \]

Thus, for example, if time is measured in hours and the river level at 7:00 A.M. is reported daily, then the covariance between daily reports is $Ae^{-576\alpha}$. Likewise, the covariance between the change from 7:00 A.M. to 8:00 A.M. and the change from 8:00 A.M. to 9:00 A.M. is given by
\[ E\{(X_{t+1} - X_t)(X_{t+2} - X_{t+1})\} = -A\big[1 - 2e^{-\alpha} + e^{-4\alpha}\big] . \]

Note that the variance of the change from $t$ to $t + h$ is
\[ \mathrm{Var}\{X_{t+h} - X_t\} = 2A\big[1 - e^{-\alpha h^2}\big] , \]
and
\[ \lim_{h \to 0} \mathrm{Var}\{X_{t+h} - X_t\} = 0 . \]

A process with this property is called mean square continuous. We might define the rate of change per unit of time by
\[ R_t(h) = h^{-1}(X_{t+h} - X_t) . \]

Page 30: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


For a fixed $h > 0$, $R_t(h)$ is a well-defined random variable and
\[ \mathrm{Var}\{R_t(h)\} = h^{-2}\,\mathrm{Var}\{X_{t+h} - X_t\} = 2A h^{-2}\big[1 - e^{-\alpha h^2}\big] . \]
Furthermore, by L'Hospital's rule,
\[ \lim_{h \to 0} \mathrm{Var}\{R_t(h)\} = \lim_{h \to 0} \frac{2A\big[2\alpha h e^{-\alpha h^2}\big]}{2h} = 2A\alpha . \]

Stochastic processes for which this limit exists are called mean square differentiable. AA
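A numerical sketch of Example 1.3.4 for assumed values of $A$ and $\alpha$: the variance of the change $X_{t+h} - X_t$ shrinks to zero as $h \to 0$ (mean square continuity), while the variance of the difference quotient $R_t(h)$ approaches $2A\alpha$ (mean square differentiability).

```python
import numpy as np

A, alpha = 1.5, 0.3          # assumed constants in gamma(h) = A * exp(-alpha * h**2)

def var_change(h):
    """Var{X_{t+h} - X_t} = 2A[1 - exp(-alpha h^2)]."""
    return 2.0 * A * (1.0 - np.exp(-alpha * h**2))

def var_rate(h):
    """Var{R_t(h)} with R_t(h) = (X_{t+h} - X_t)/h."""
    return var_change(h) / h**2

for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, round(var_change(h), 8), round(var_rate(h), 6))
# var_change -> 0, while var_rate -> 2*A*alpha = 0.9, matching the limits in the text.
```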

1.4. PROPERTIES OF THE AUTOCOVARIANCE AND AUTOCORRELATION FUNCTIONS

To compare the basic properties of time series, it is often useful to have a function that is not influenced by the units of measurement. To this end, we define the autocorrelation function of a stationary time series by
\[ \rho(h) = \gamma(h)[\gamma(0)]^{-1} . \]

Thus the autocorrelation function is the autocovariance function normalized to be one at h = 0. As with the autocovariance function, we shall abbreviate the expression to correlation function when no confusion will result.

The autocovariance and autocorrelation functions of stationary time series possess several distinguishing characteristics. A function $f(x)$ defined for $x \in \chi$ is said to be positive semidefinite if it satisfies
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j f(t_i - t_j) \ge 0 \tag{1.4.1} \]
for any set of real numbers $(a_1, a_2, \ldots, a_n)$ and for any $(t_1, t_2, \ldots, t_n)$ such that $t_i - t_j$ is in $\chi$ for all $(i, j)$. Some authors reserve the term positive semidefinite for a function that is zero for at least one nonzero vector $(a_1, a_2, \ldots, a_n)$ and use nonnegative definite to describe functions satisfying (1.4.1). We shall use both terms for functions satisfying (1.4.1).

Theorem 1.4.1. The covariance function of the stationary time series $\{X_t : t \in T\}$ is nonnegative definite.

Proof. Without loss of generality, let $E\{X_t\} = 0$. Let $(t_1, t_2, \ldots, t_n) \in T$, let $(a_1, a_2, \ldots, a_n)$ be any set of real numbers, and let $\gamma(t_k - t_j)$ be the covariance

Page 31: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


between $X_{t_j}$ and $X_{t_k}$. We know that the variance of a random variable, when it is defined, is nonnegative. Therefore,
\[ 0 \le \mathrm{Var}\Big\{\sum_{j=1}^{n} a_j X_{t_j}\Big\} = \sum_{j=1}^{n} \sum_{k=1}^{n} a_j a_k \gamma(t_k - t_j) . \]

A

If we set $n = 2$ in (1.4.1), we have
\[ a_1^2 \gamma(0) + 2 a_1 a_2 \gamma(t_1 - t_2) + a_2^2 \gamma(0) \ge 0 , \]
which implies
\[ |\gamma(t_1 - t_2)| \le \gamma(0) . \]
For $t_1 - t_2 = h$ we set $-a_1 = a_2 = 1$ and then set $a_1 = a_2 = 1$ to obtain the well-known property of correlations:
\[ -1 \le \rho(h) \le 1 . \]
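A small numerical check of (1.4.1) and the correlation bound, using the covariance function $\gamma(h) = Ae^{-\alpha h^2}$ of Example 1.3.4 as a stand-in with assumed values of $A$ and $\alpha$: the matrix $[\gamma(t_i - t_j)]$ built from an arbitrary set of time points has no negative eigenvalues, and every correlation lies in $[-1, 1]$.

```python
import numpy as np

A, alpha = 1.0, 0.2                              # assumed constants

def gamma(h):
    return A * np.exp(-alpha * h**2)

t = np.array([0.0, 1.0, 2.5, 4.0, 7.0])          # arbitrary time points
G = gamma(t[:, None] - t[None, :])               # (i, j) entry is gamma(t_i - t_j)

eigenvalues = np.linalg.eigvalsh(G)
print(np.round(eigenvalues, 6))                  # all >= 0: nonnegative definite

rho = G / gamma(0.0)
print(bool(np.all(np.abs(rho) <= 1.0)))          # True: -1 <= rho(h) <= 1
```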

The concepts of even and odd functions (about zero) will prove useful in our study. For example, $\cos t$ is an even function. The use of this description is apparent when one notes the symmetry of $\cos t$ on the interval $[-\pi, \pi]$. Similarly, $\sin t$ is an odd function. In general, an even function, $f(t)$, defined on a domain $T$, is a function that satisfies $f(t) = f(-t)$ for all $t$ and $-t$ in $T$. An odd function $g(t)$ is a function satisfying $g(t) = -g(-t)$ for all $t$ and $-t$ in $T$. Many simple properties of even and odd functions follow immediately. For example, if $f(t)$ is an even function and $g(t)$ is an odd function, where both are integrable on the interval $[-A, A]$, then, for $0 \le b \le A$,
\[ \int_{-b}^{b} f(t)\,dt = 2\int_{0}^{b} f(t)\,dt \qquad \text{and} \qquad \int_{-b}^{b} g(t)\,dt = 0 . \]

As an exercise, the reader may verify that the product of two even functions is even, the product of two odd functions is even, the product of an odd and an even function is odd, the sum of even functions is even, and the sum of odd functions is odd.

Theorem 1.4.2. The covariance function of a real valued stationary time series is an even function of $h$. That is, $\gamma(h) = \gamma(-h)$.

Page 32: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


Proof. We assume, without loss of generality, that $E\{X_t\} = 0$. By stationarity,
\[ E\{X_t X_{t+h}\} = \gamma(h) \]
for all $t$ and $t + h$ contained in the index set. Therefore, if we set $t_0 = t_1 - h$,
\[ \gamma(-h) = E\{X_{t_1} X_{t_1 - h}\} = E\{X_{t_0 + h} X_{t_0}\} = \gamma(h) . \]

A

Given this theorem, we shall often evaluate $\gamma(h)$ for real valued time series for nonnegative $h$ only. Should we fail to so specify, the reader may always safely substitute $|h|$ for $h$ in a covariance function.

In the study of statistical distribution functions the characteristic function of a distribution function is defined by
\[ \varphi(h) = \int_{-\infty}^{\infty} e^{\mathrm{i}xh}\,dG(x) , \]
where the integral is a Lebesgue-Stieltjes integral, $G(x)$ is the distribution function, and $e^{\mathrm{i}xh}$ is the complex exponential defined by $e^{\mathrm{i}xh} = \cos xh + \mathrm{i}\sin xh$. It is readily established that the function $\varphi(h)$ satisfies:

1. $\varphi(0) = 1$.
2. $|\varphi(h)| \le 1$ for all $h \in (-\infty, \infty)$.
3. $\varphi(h)$ is uniformly continuous on $(-\infty, \infty)$.

See, for example, Tucker (1967, p. 42 ff.) or Gnedenko (1967, p. 266 ff.). It can be shown that a continuous function $\varphi(h)$ with $\varphi(0) = 1$ is a characteristic function if and only if it is nonnegative definite. For example, see Gnedenko (1967, pp. 290, 387).

Theorem 1.4.3. The real valued function $\rho(h)$ is the correlation function of a mean square continuous stationary real valued time series with index set $T = (-\infty, \infty)$ if and only if it is representable in the form
\[ \rho(h) = \int_{-\infty}^{\infty} e^{\mathrm{i}xh}\,dG(x) , \]
where $G(x)$ is a symmetric distribution function, and the integral is a Lebesgue-Stieltjes integral.

For the index set $T = \{0, \pm 1, \pm 2, \ldots\}$, the corresponding theorem is:

Theorem 1.4.4. The real valued function $\rho(h)$ is the correlation function of a real valued stationary time series with index set $T = \{0, \pm 1, \pm 2, \ldots\}$ if and only if

Page 33: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


it is representable in the form
\[ \rho(h) = \int_{-\pi}^{\pi} e^{\mathrm{i}xh}\,dG(x) , \]
where $G(x)$ is a symmetric distribution function.

This theorem will be discussed in Chapter 3.

Associated with the autocorrelation function of a time series is the partial autocorrelation function. Before discussing ideas of partial correlation for time series, we recall the partial correlation coefficient of multivariate analysis. Let $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_p)$ be a $p$-variate random variable with nonsingular covariance matrix. Then the partial correlation between $Y_1$ and $Y_2$ after $Y_3$ is
\[ \rho_{12.3} = (1 - \rho_{13}^2)^{-1/2}(1 - \rho_{23}^2)^{-1/2}(\rho_{12} - \rho_{13}\rho_{23}) , \]
where $\rho_{ij} = (\sigma_{ii}\sigma_{jj})^{-1/2}\sigma_{ij}$ and $\sigma_{ij}$ is the covariance between $Y_i$ and $Y_j$. An alternative definition of $\rho_{12.3}$ in terms of regression coefficients is
\[ \rho_{12.3}^2 = \beta_{12.3}\,\beta_{21.3} , \tag{1.4.4} \]

where $\beta_{ij.k}$ is the partial regression coefficient for $Y_j$ in the regression of $Y_i$ on $Y_j$ and $Y_k$. For example, $\beta_{13.2}$ is the partial regression coefficient for $Y_3$ in the regression of $Y_1$ on $Y_2$ and $Y_3$.

Recall that $\beta_{ij.k}$ is the simple population regression coefficient for the regression of $Y_i - \beta_{ik}Y_k$ on $Y_j - \beta_{jk}Y_k$, where $\beta_{ik} = \sigma_{kk}^{-1}\sigma_{ik}$. Therefore, the partial correlation between $Y_i$ and $Y_j$ after $Y_k$ is the simple correlation between $Y_i - \beta_{ik}Y_k$ and $Y_j - \beta_{jk}Y_k$. Also note that the sign of $\rho_{ij.k}$ is always equal to the sign of $\beta_{ij.k}$. The multiple correlation associated with the regression of $Y_i$ on $Y_j$ and $Y_k$ is denoted by $R_{i(jk)}$ and is defined by the equation
\[ 1 - R_{i(jk)}^2 = (1 - \rho_{ij}^2)(1 - \rho_{ik.j}^2) = (1 - \rho_{ik}^2)(1 - \rho_{ij.k}^2) . \tag{1.4.5} \]

Using (1.4.4), the extension of the definition to higher order partial correlations is straightforward. The squared partial correlation between $Y_i$ and $Y_j$ after adjusting for $Y_k$ and $Y_l$ is
\[ \rho_{ij.kl}^2 = \beta_{ij.kl}\,\beta_{ji.kl} , \tag{1.4.6} \]

Page 34: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


where
\[ \begin{pmatrix} \beta_{ij.kl} \\ \beta_{ik.jl} \\ \beta_{il.jk} \end{pmatrix} = \begin{pmatrix} \sigma_{jj} & \sigma_{jk} & \sigma_{jl} \\ \sigma_{kj} & \sigma_{kk} & \sigma_{kl} \\ \sigma_{lj} & \sigma_{lk} & \sigma_{ll} \end{pmatrix}^{-1} \begin{pmatrix} \sigma_{ij} \\ \sigma_{ik} \\ \sigma_{il} \end{pmatrix} . \tag{1.4.7} \]

The partial correlation $\rho_{ij.kl}$ can also be interpreted as the simple correlation between $Y_i - \beta_{ik.l}Y_k - \beta_{il.k}Y_l$ and $Y_j - \beta_{jk.l}Y_k - \beta_{jl.k}Y_l$. The multiple correlation associated with the regression of $Y_i$ on the set $(Y_j, Y_k, Y_l)$ satisfies
\[ 1 - R_{i(jkl)}^2 = (1 - \rho_{ij}^2)(1 - \rho_{ik.j}^2)(1 - \rho_{il.jk}^2) . \tag{1.4.8} \]
The squared partial correlation between $Y_1$ and $Y_2$ after adjusting for $(Y_3, Y_4, \ldots, Y_p)$ is
\[ \rho_{12.34\cdots p}^2 = \beta_{12.34\cdots p}\,\beta_{21.34\cdots p} , \]
where $\beta_{12.34\cdots p}$ is the population regression coefficient for $Y_2$ in the regression of $Y_1$ on $Y_2, Y_3, \ldots, Y_p$, and $\beta_{21.34\cdots p}$ is the population regression coefficient for $Y_1$ in the regression of $Y_2$ on $Y_1, Y_3, Y_4, \ldots, Y_p$. The multiple correlation between $Y_1$ and the set $(Y_2, Y_3, \ldots, Y_p)$ satisfies the equation
\[ 1 - R_{1(2,3,\ldots,p)}^2 = (1 - \rho_{12}^2)(1 - \rho_{13.2}^2)(1 - \rho_{14.23}^2)\cdots(1 - \rho_{1p.23\cdots(p-1)}^2) . \tag{1.4.9} \]

For a covariance stationary time series $Y_t$ with $R_{1(2,3,\ldots,h+1)}^2 < 1$, the partial autocorrelation function is denoted by $\phi(h)$ and, for $h > 0$, is the partial correlation between $Y_t$ and $Y_{t+h}$ after adjusting for $Y_{t+1}, Y_{t+2}, \ldots, Y_{t+h-1}$. It is understood that $\phi(0) = 1$ and $\phi(1) = \rho(1)$. The partial autocorrelation function is defined in an analogous way for $h < 0$. Because $\rho(h)$ is symmetric, $\phi(h)$ is symmetric. Let $\theta_{ih}$ be the population regression coefficient for $Y_{t-i}$, $1 \le i \le h$, in the regression of $Y_t$ on $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-h}$. The population regression equation is
\[ Y_t = \theta_{1h}Y_{t-1} + \theta_{2h}Y_{t-2} + \cdots + \theta_{hh}Y_{t-h} + a_{t,h} , \tag{1.4.10} \]
where the vector of coefficients satisfies
\[ \mathbf{P}_h\,(\theta_{1h}, \theta_{2h}, \ldots, \theta_{hh})' = (\rho(1), \rho(2), \ldots, \rho(h))' , \tag{1.4.11} \]
with $\mathbf{P}_h$ the $h \times h$ matrix whose $ij$th element is $\rho(i - j)$;

Page 35: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


the matrix of correlations is positive definite by the assumption that $R_{1(2,3,\ldots,h+1)}^2 < 1$, and $a_{t,h}$ is the population regression residual. From (1.4.11),
\[ \phi(h) = \theta_{hh} \tag{1.4.12} \]
because the coefficient for $Y_{t-h}$ in the regression of $Y_t$ on $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-h}$ is equal to the coefficient for $Y_t$ in the regression of $Y_{t-h}$ on $Y_{t-h+1}, Y_{t-h+2}, \ldots, Y_t$. The equality follows from the symmetry of the autocorrelation function $\rho(h)$. From (1.4.9) we have
\[ 1 - R_{1(2,3,\ldots,h+1)}^2 = \prod_{j=1}^{h} \big[1 - \phi^2(j)\big] . \tag{1.4.13} \]
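A sketch of the relations described in (1.4.10)-(1.4.12) for an assumed autocorrelation function: solve the correlation equations for the regression coefficients $\theta_{ih}$ and take $\phi(h) = \theta_{hh}$. The autocorrelation $\rho(h) = 0.6^{|h|}$ is purely an illustrative choice (it corresponds to a first order autoregression, treated in Chapter 2), for which the partial autocorrelations beyond lag 1 should be essentially zero.

```python
import numpy as np

def rho(h):
    """Assumed autocorrelation function: rho(h) = 0.6 ** |h|."""
    return 0.6 ** abs(h)

def partial_autocorrelation(h):
    """phi(h) = theta_hh, the last coefficient in the regression of Y_t on its h lags."""
    if h == 0:
        return 1.0
    P = np.array([[rho(i - j) for j in range(h)] for i in range(h)])   # correlation matrix
    r = np.array([rho(i + 1) for i in range(h)])                       # (rho(1), ..., rho(h))'
    theta = np.linalg.solve(P, r)                                      # normal equations (1.4.11)
    return theta[-1]                                                   # relation (1.4.12)

for h in range(5):
    print(h, round(partial_autocorrelation(h), 6))
# phi(0) = 1, phi(1) = 0.6, and phi(h) is essentially 0 for h >= 2 under this rho.
```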

1.5. COMPLEX VALUED TIME SERIES

Occasionally it is advantageous, from a theoretical point of view, to consider complex valued time series. Letting $X_t$ and $Y_t$ be two real valued time series, we define the complex valued time series $Z_t$ by
\[ Z_t = X_t + \mathrm{i}Y_t . \tag{1.5.1} \]

The expected value of $Z_t$ is given by
\[ E\{Z_t\} = E\{X_t\} + \mathrm{i}E\{Y_t\} , \tag{1.5.2} \]
and we note that
\[ E^*\{Z_t\} = E\{Z_t^*\} = E\{X_t\} - \mathrm{i}E\{Y_t\} , \tag{1.5.3} \]

where the symbol "$*$" is used to denote the complex conjugate. The covariance of $Z_t$ and $Z_{t+h}$ is defined as
\[ \gamma_Z(h) = E\big\{\big[Z_{t+h} - E\{Z_{t+h}\}\big]\big[Z_t - E\{Z_t\}\big]^*\big\} . \tag{1.5.4} \]

Note that the variance of a complex valued process is always real and nonnegative, since it is the sum of the variances of two real valued random variables.

The definitions of stationarity for complex time series are completely analogous to those for real time series. Thus, a complex time series $Z_t$ is weakly stationary if the expected value of $Z_t$ is a constant for all $t$ and the covariance matrix of $(Z_{t_1}, Z_{t_2}, \ldots, Z_{t_n})$ is the same as the covariance matrix of $(Z_{t_1+h}, Z_{t_2+h}, \ldots, Z_{t_n+h})$, where all indices are contained in the index set.

Page 36: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


From (1.5.4), the autocovariance of a stationary complex time series $Z_t$ with zero mean is given by
\[ \gamma_Z(h) = E\{Z_{t+h} Z_t^*\} . \tag{1.5.5} \]
We see that the real part of $\gamma_Z(h)$ is a symmetric or even function of $h$, and the imaginary part of $\gamma_Z(h)$ is an odd function of $h$. By the definition of the complex conjugate, we have
\[ \gamma_Z^*(h) = \gamma_Z(-h) . \tag{1.5.6} \]

Therefore, the autocovariance function of a complex time series is skew symmetric, where (1.5.6) is the definition of a skew symmetric function.

A complex valued function $\gamma(\,\cdot\,)$, defined on the integers, is positive semidefinite if
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j^* \gamma(t_i - t_j) \ge 0 \tag{1.5.7} \]
for any set of $n$ complex numbers $(a_1, a_2, \ldots, a_n)$ and any integers $(t_1, t_2, \ldots, t_n)$. Thus, as in the real valued case, we have:

Theorem 1.5.1. The covariance function of a stationary complex valued time series is positive semidefinite.

It follows from Theorem 1.5.1 that the correlation inequality holds for complex random variables; that is,
\[ |\gamma_Z(h)| \le \gamma_Z(0) . \]

In the sequel, if we use the simple term "time series," the reader may assume that we are speaking of a real valued time series. All complex valued time series will be identified as such.
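A short numerical sketch of (1.5.4)-(1.5.6) for a hypothetical complex series $Z_t = X_t + \mathrm{i}Y_t$ built from white noise (the construction of $X_t$ and $Y_t$ below is invented for illustration): the lag-0 covariance is real and nonnegative, and the sample autocovariance satisfies $\gamma^*(h) = \gamma(-h)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# A hypothetical complex valued series: X_t white noise, Y_t a lagged copy plus noise.
e = rng.normal(size=n + 1)
x = e[1:]
y = 0.8 * e[:-1] + 0.3 * rng.normal(size=n)
z = x + 1j * y

def gamma_z(h):
    """Sample version of gamma(h) = E{Z_{t+h} Z_t^*} for a (nearly) zero-mean series."""
    zc = z - z.mean()
    if h >= 0:
        return np.mean(zc[h:] * np.conj(zc[: n - h]))
    return np.mean(zc[: n + h] * np.conj(zc[-h:]))

print(np.round(gamma_z(0), 3))                   # real and nonnegative: Var{X_t} + Var{Y_t}
print(np.round(gamma_z(1), 3), np.round(np.conj(gamma_z(-1)), 3))   # equal, as (1.5.6) requires
```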

1.6. PERIODIC FUNCTIONS AND PERIODIC TIME SERIES

Periodic functions play an important role in the analysis of empirical time series. We define a function $f(t)$ with domain $T$ to be periodic if there exists an $H > 0$ such that for all $t$, $t + H \in T$,
\[ f(t + H) = f(t) , \]
where $H$ is the period of the function. That is, the function $f(t)$ takes on all of its

Page 37: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)


possible values in an interval of length $H$. For any positive integer $k$, $kH$ is also a period of the function. While examples of perfectly periodic functions are rare, there are situations where observed time series may be decomposed into the sum of two time series, one of which is periodic or nearly periodic. Even casual observation of many economic time series will disclose seasonal variation wherein peaks and troughs occur at approximately the same month each year. Seasonal variation is apparent in many natural time series, such as the water level in a lake, daily temperature, wind speeds and velocity, and the levels of the tides. Many of these time series also display regular daily variation that has the appearance of rough "cycles."

The trigonometric functions have traditionally been used to approximate periodic behavior. A function obeying the sine-cosine type periodicity is completely specified by three parameters. Thus, we write
\[ f(t) = A \sin(\lambda t + \varphi) , \tag{1.6.1} \]
where the amplitude is the absolute value of $A$, $\lambda$ is the frequency, and $\varphi$ is the phase angle. The frequency is the number of times the function repeats itself in a period of length $2\pi$. The phase angle is a "shift parameter" in that it determines the points ($-\lambda^{-1}\varphi$ plus integer multiples of $\lambda^{-1}\pi$) where the function is zero. A parametrization that is more useful in the estimation of such a function can be constructed by using the trigonometric identity
\[ \sin(\lambda t + \varphi) = \sin \lambda t \cos \varphi + \cos \lambda t \sin \varphi . \]

Thus (1.6.1) becomes

At) = B, cos At + B, sin A t , (1.6.2)

where B, = A sin Q and B, = A cos 4p.

Let us consider a simple type of time series whose realizations will display perfect periodicity. Define {X_t : t ∈ (0, ±1, ±2, ...)} by

$$X_t = e_1 \cos \lambda t + e_2 \sin \lambda t , \qquad (1.6.3)$$

where e_1 and e_2 are independent drawings from a normal (0, 1) population. Note that the realization is completely determined by the two random variables e_1 and e_2. The amplitude and phase angle vary from realization to realization, but the period is the same for every realization.

The stochastic properties of this time series are easily derived:

$$E\{X_t\} = E\{e_1 \cos \lambda t + e_2 \sin \lambda t\} = 0 ;$$

$$\begin{aligned}
\gamma(h) &= E\{(e_1 \cos \lambda t + e_2 \sin \lambda t)[e_1 \cos \lambda(t+h) + e_2 \sin \lambda(t+h)]\} \\
&= E\{e_1^2 \cos \lambda t \cos \lambda(t+h) + e_2^2 \sin \lambda t \sin \lambda(t+h) \\
&\qquad + e_1 e_2 \cos \lambda t \sin \lambda(t+h) + e_2 e_1 \sin \lambda t \cos \lambda(t+h)\} \\
&= \cos \lambda t \cos \lambda(t+h) + \sin \lambda t \sin \lambda(t+h) = \cos \lambda h . \qquad (1.6.4)
\end{aligned}$$

We see that the process is stationary. This example also serves to emphasize the fact that the covariance function is obtained by averaging the product X_t X_{t+h} over realizations.

It is clear that any time series defined by the finite sum

$$X_t = \sum_{j=1}^{M} (e_{1j} \cos \lambda_j t + e_{2j} \sin \lambda_j t) , \qquad (1.6.5)$$

where the e_{1j} and e_{2j} are independent drawings from a normal (0, σ_j²) population and λ_k ≠ λ_j for k ≠ j, will display periodic behavior. The representation (1.6.5) is a useful approximation for portions of some empirical time series, and we shall see that ideas associated with this representation are important for the theoretical study of time series as well.
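The averaging over realizations can be illustrated numerically. The following Python sketch (an illustration, not part of the original text; the frequency λ = 0.5 and the number of realizations are arbitrary choices) generates many independent realizations of (1.6.3) and averages the product X_t X_{t+h} across realizations; the result is close to cos λh.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5          # frequency lambda (arbitrary choice for illustration)
n_real = 20000     # number of independent realizations
t = np.arange(0, 30)

# Each realization is completely determined by the two N(0,1) draws e1 and e2.
e1 = rng.standard_normal(n_real)[:, None]
e2 = rng.standard_normal(n_real)[:, None]
X = e1 * np.cos(lam * t) + e2 * np.sin(lam * t)   # shape (n_real, len(t))

# Average the product X_t X_{t+h} across realizations for a fixed t.
t0 = 5
for h in range(5):
    gamma_hat = np.mean(X[:, t0] * X[:, t0 + h])
    print(h, round(gamma_hat, 3), round(np.cos(lam * h), 3))
```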

1.7. VECTOR VALUED TIME SERIES

Most of the concepts associated with univariate time series have immediate generalizations to vector valued time series. We now introduce representations for multivariate processes.

The k-dimensional time series {X_t : t = 0, ±1, ±2, ...} is defined by

$$\mathbf{X}_t = [X_{1t}, X_{2t}, \ldots, X_{kt}]' , \qquad (1.7.1)$$

where {X_{it} : t = 0, ±1, ±2, ...}, i = 1, 2, ..., k, are scalar time series. The expected value of X_t is

$$E\{\mathbf{X}_t'\} = [E\{X_{1t}\}, E\{X_{2t}\}, \ldots, E\{X_{kt}\}] .$$

Assuming the mean is zero, we define the covariance matrix of X_t and X_{t+h} by

$$\Gamma(h) = E\{\mathbf{X}_t \mathbf{X}_{t+h}'\} . \qquad (1.7.2)$$

As with scalar time series, we define X_t to be covariance stationary if:

1. The expected value of X_t is a constant function of time.
2. The covariance matrix of X_t and X_{t+h} is the same as the covariance matrix of X_j and X_{j+h} for all t, t + h, j, j + h in the index set.

If a vector time series is stationary, then every component scalar time series is stationary. However, a vector of scalar stationary time series is not necessarily a vector stationary time series.

The second stationarity condition means that we can express the covariance matrix as a function of h only, and, assuming E{X_t} = 0, we write

$$\Gamma(h) = E\{\mathbf{X}_t \mathbf{X}_{t+h}'\}$$

for stationary time series. Note that the diagonal elements of this matrix are the autocovariances of the X_{it}. The off-diagonal elements are the cross covariances of X_{it} and X_{jt}. The element E{X_{it}X_{j,t+h}} is not necessarily equal to E{X_{i,t+h}X_{jt}}, and hence Γ(h) is not necessarily equal to Γ(−h). For example, let

$$X_{1t} = e_t , \qquad X_{2t} = e_t + \beta e_{t-1} , \qquad (1.7.3)$$

where the e_t are uncorrelated (0, σ²) random variables.

Then

$$\Gamma(1) = E\{\mathbf{X}_t \mathbf{X}_{t+1}'\} = \sigma^2 \begin{bmatrix} 0 & \beta \\ 0 & \beta \end{bmatrix} .$$

However,

$$\Gamma(-1) = E\{\mathbf{X}_t \mathbf{X}_{t-1}'\} = \sigma^2 \begin{bmatrix} 0 & 0 \\ \beta & \beta \end{bmatrix} ,$$

and it is clear that

$$\Gamma(1) = \Gamma'(-1) \ne \Gamma(-1) . \qquad (1.7.4)$$

It is easy to verify that (1.7.4) holds for all vector stationary time series, and we state the result as a lemma.

Lemma 1.7.1. The autocovariance matrix of a vector stationary time series satisfies Γ(h) = Γ'(−h).
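A small simulation, given here only as an illustration and assuming uncorrelated (0, 1) errors, checks this relation for the two-dimensional series (1.7.3). The Python sketch below estimates Γ(1) and Γ(−1) by sample averages and verifies that Γ(1) is close to Γ'(−1).

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n = 0.6, 200000
e = rng.standard_normal(n + 1)        # uncorrelated (0,1) errors
X1 = e[1:]                            # X_{1t} = e_t
X2 = e[1:] + beta * e[:-1]            # X_{2t} = e_t + beta*e_{t-1}
X = np.column_stack([X1, X2])         # rows are X_t'

def gamma(h):
    # sample version of E{X_t X_{t+h}'}; h may be negative
    if h >= 0:
        A, B = X[: n - h], X[h:]
    else:
        A, B = X[-h:], X[: n + h]
    return (A.T @ B) / len(A)

print(np.round(gamma(1), 3))      # approx [[0, beta], [0, beta]]
print(np.round(gamma(-1).T, 3))   # approximately the same matrix
```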

The nonnegative definite property of the scalar autocovariance function is maintained essentially unchanged for vedor processes.


Lemma 1.7.2. The covariance function of a vector stationary time series {X_t : t ∈ T} is a nonnegative definite function in that

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \mathbf{a}_i' \Gamma(t_i - t_j) \mathbf{a}_j \ge 0$$

for any set of real vectors {a_1, a_2, ..., a_n} and any set of indices {t_1, t_2, ..., t_n} ∈ T.

Proof. The result can be obtained by evaluating the variance of Σ_{i=1}^{n} a_i' X_{t_i}.   A

We define the correlation matrix of X_t and X_{t+h} by

$$\mathbf{P}(h) = \mathbf{D}_0^{-1} \Gamma(h) \mathbf{D}_0^{-1} , \qquad (1.7.5)$$

where D_0 is a diagonal matrix with the square roots of the variances of the X_{it} as diagonal elements; that is,

$$\mathbf{D}_0^2 = \mathrm{diag}\{\gamma_{11}(0), \gamma_{22}(0), \ldots, \gamma_{kk}(0)\} .$$

The ijth element of P(h), ρ_{ij}(h), is called the cross correlation of X_{it} and X_{jt}.

For the time series of (1.7.3) we have

REFERENCES

Chung (1968), Gnedenko (1967), Loève (1963), Rao (1973), Tucker (1967), Yaglom (1962).

EXERCISES

1. Determine the mean and covariance function for Example 1 of Section 1.3.

2. Discuss the stationarity of the following time series:

Page 41: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

18 INTRODUCTION

(a) {X_t : t ∈ (0, ±1, ±2, ...)}, where
    X_t = value of a randomly chosen observation from a normal distribution with mean 1/2 and variance 1/4, if t is odd;
    X_t = 1 if the toss of a true coin results in a head, and X_t = 0 if the toss results in a tail, if t is even.

(b) {X_t : t ∈ (0, ±1, ±2, ...)} is a time series of independent identically distributed random variables whose distribution function is that of Student's t-distribution with one degree of freedom.

(c) X_t = c, t = 0,
    X_t = ρX_{t-1} + e_t, t = 1, 2, ...,

where c is a constant, |ρ| < 1, and the e_t are iid(0, 1) random variables.

3. Which of the following processes is covariance stationary for T = {t : t ∈ (0, 1, 2, ...)}, where the e_i are independent identically distributed (0, 1) random variables and a_1, a_2 are fixed real numbers?
(a) e_1 + e_2 cos t.
(b) e_1 + e_2 cos t + e_3 sin t.
(c) a_1 + e_1 cos t.
(d) a_1 + e_1 cos t + e_2 sin t.
(e) e_1 + a_1 cos t.
(f) a_1 + e_1 a_2^t + e_2, 0 < a_2 < 1.

4. (a) Prove the following theorem.

Theorem. If X_t and Y_t are independent covariance stationary time series, then aX_t + bY_t, where a and b are real numbers, is a covariance stationary time series.

(b) Let Z_t = X_t + Y_t. Give an example to show that Z_t covariance stationary does not imply that X_t is covariance stationary.

5. Give a covariance stationary time series such that ρ(h) ≠ 0 for all h.

6. Which of the following functions is the covariance function of a stationary time series? Explain why or why not.

Page 42: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 19

(a) g(h) = 1 + |h|, h = 0, ±1, ±2, ...;
(b) g(h) = −1/2, h = 0, ±1, ±2, ...;
(c) g(h) = 1 + (1/2) sin 4h, h = 0, ±1, ±2, ...;
(d) g(h) = 1 + (1/2) cos 4h, h = 0, ±1, ±2, ...;
(e) g(h) = 1, h = 0,
         = 0 otherwise;
(f) g(h) = 1, h = 0, ±1,
         = 0 otherwise.

7. Let e_t be a time series of normal independent (0, σ²) random variables. Define

Y_t = e_t if t is odd,
    = −e_t if t is even.

Is the complex valued time series Z_t = e_t + iY_t stationary?

8. Let

ρ(h) = 1, h = 0,
     = a, h = ±1,
     = 0 otherwise.

(a) For what values of a is ρ(h) the autocorrelation function of a stationary time series?
(b) Compute the partial autocorrelation function associated with ρ(h), given that a is such that ρ(h) is an autocorrelation function.

9. Let the time series U_t be defined by

U_t = e_1 cos 0.5πt + e_2 sin 0.5πt + e_t

for t = 3, 4, ..., where e_t ~ NI(0, 1) and NI(0, 1) denotes normally and independently distributed with zero mean and unit variance. Compute the partial autocorrelation function of U_t for h = 1, 2, 3, 4.

10. Show that if β_{12.3} of (1.4.4) is zero, then β_{21.3} is also zero. Extend this result to the pth order partial regression coefficients β_{1p.2,...,p−1} and β_{p1.2,...,p−1}.

11. Assume that for the stationary time series Y_t there is some r such that R²_{1(2,3,...,r)} = 1 and R²_{1(2,3,...,r−1)} < 1, where R²_{1(2,3,...,r)} is the squared multiple correlation between Y_1 and (Y_2, ..., Y_r). Show that |φ(r − 1)| = 1 and that the correlation matrix in (1.4.11) is singular for h ≥ r.


12. Let {Z_t} be a sequence of normal independent (0, 1) random variables defined on the integers.
(a) Give the mean and covariance function of the time series (i) U_t = Z_t Z_{t−1} and (ii) X_t = Z_t + Z_{t−1}.
(b) Is the time series

Y_t = Z_t, t even,

covariance stationary? Is it strictly stationary?


CHAPTER 2

Moving Average and Autoregressive Processes

In any study, one of the first undertakings is to describe the item under investigation. Typically there is no unique description and, as the discipline develops, alternatives appear that prove more useful in some applications.

The joint distribution function (1.2.1) gives a complete description of the distributional properties of a time series. The covariance function provides a less complete description, but one from which useful conclusions can be drawn. As we have noted, the correlation function has functional properties analogous to those of the characteristic function of a statistical distribution function. Therefore, the distribution function associated with the correlation function provides an alternative representation for a time series. This representation will be introduced in Chapter 3 and discussed in Chapter 4.

Another method of describing a time series is to express it as a function of more elementary time series. A sequence of independent identically distributed random variables is a very simple type of time series. In applications it is often possible to express the observed time series as a relatively simple function of this elementary time series. In particular, many time series have been represented as linear combinations of uncorrelated (0, σ²) or of independent (0, σ²) random variables. We now consider such representations.

2.1. MOVING AVERAGE PROCESSES

We shall call the time series {X_t : t ∈ (0, ±1, ±2, ...)}, defined by

$$X_t = \sum_{j=-M}^{M} \alpha_j e_{t-j} , \qquad (2.1.1)$$

where M is a nonnegative integer, the α_j are real numbers, α_{-M} ≠ 0, and the e_t are uncorrelated (0, σ²) random variables, a finite moving average time series or a finite moving average process.


If there exists no finite M such that α_j = 0 for all j with absolute value greater than M, then {X_t} is an infinite moving average process and is given the representation

$$X_t = \sum_{j=-\infty}^{\infty} \alpha_j e_{t-j} . \qquad (2.1.2)$$

The time series defined by

$$X_t = \sum_{j=0}^{M} \alpha_j e_{t-j} , \qquad (2.1.3)$$

where α_0 ≠ 0 and α_M ≠ 0, is called a one-sided moving average of order M. Note that the number of terms in the sum on the right-hand side of (2.1.3) is M + 1. In particular, (2.1.3) is a left-moving average, since nonzero weights are applied only to e_t and to e's with indexes less than t. We shall most frequently use the one-sided representation of moving average time series. Note that there would be no loss of generality in considering only the one-sided representation. If in the representation (2.1.1) we define the random variable ε_t by ε_t = e_{t+M}, we have, for (2.1.1),

$$X_t = \sum_{j=-M}^{M} \alpha_j e_{t-j} = \sum_{j=0}^{2M} \beta_j \varepsilon_{t-j} , \qquad (2.1.4)$$

where β_j = α_{j−M}.

Likewise we can, without loss of generality, take β_0 in (2.1.4) to be one. If β_0 is not one, we simply define

$$u_{t-s} = \beta_0 \varepsilon_{t-s} , \qquad \delta_s = \beta_s \beta_0^{-1} , \quad s = 0, 1, 2, \ldots, 2M ,$$

and obtain the representation

$$X_t = \sum_{s=0}^{2M} \delta_s u_{t-s} , \qquad (2.1.5)$$

where δ_0 = 1 and the u_t are uncorrelated (0, β_0²σ²) random variables.

The covariance function of an Mth order moving average is distinctive in that it is zero for all h satisfying |h| > M. For the time series (2.1.3) we have

$$E\{X_t\} = 0 ,$$

$$\gamma_X(h) = \begin{cases} \sigma^2 \sum_{j=0}^{M-|h|} \alpha_j \alpha_{j+|h|} , & |h| \le M , \\ 0 , & |h| > M . \end{cases} \qquad (2.1.6)$$


As an example, consider the simple time series

Clearly,

and

By (2.1.6) only the first autocorrelation of a first order moving average is nonzero, while all autocorrelations of order greater than two for a second order moving average are zero. For a qth order moving average, ρ(q) ≠ 0, and all higher order autocorrelations are zero. There are further restrictions on the correlation function of moving average time series. Consider the first order moving average

$$X_t = e_t + \alpha e_{t-1} . \qquad (2.1.7)$$

Then

$$\rho(1) = \frac{\alpha}{1 + \alpha^2} . \qquad (2.1.8)$$

Using elementary calculus, it is easy to prove that ρ(1) achieves a maximum of 0.5 for α = 1 and a minimum of −0.5 for α = −1.

Thus, the first autocorrelation of a first order moving average process will always fall in the interval [−0.5, 0.5]. For any ρ in (0, 0.5) [or in (−0.5, 0)] there are two α-values yielding such a ρ, one in the interval (0, 1) [or in (−1, 0)] and one in the interval (1, ∞) [or (−∞, −1)].
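For a given admissible ρ(1), the two α-values can be obtained by solving the quadratic ρα² − α + ρ = 0. The following Python sketch (an illustration, not part of the original text; the function name is arbitrary) returns both roots; note that their product is one, so the two solutions are reciprocals of each other.

```python
import numpy as np

def ma1_alphas(rho):
    """Return the two alpha values with rho(1) = alpha / (1 + alpha^2)."""
    if rho == 0.0 or abs(rho) > 0.5:
        raise ValueError("rho must be nonzero and lie in [-0.5, 0.5]")
    disc = np.sqrt(1.0 - 4.0 * rho ** 2)
    return (1.0 - disc) / (2.0 * rho), (1.0 + disc) / (2.0 * rho)

a_small, a_large = ma1_alphas(0.4698)
print(a_small, a_large, a_small * a_large)   # roughly 0.7 and 1/0.7; product is 1
```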

Example 2.1.1. In Figure 2.1.1 we display 100 observations from a realization of the time series

$$X_t = e_t + 0.7 e_{t-1} ,$$

where the e_t are normal independent (0, 1) random variables. The positive correlation between adjacent observations is clear to the eye and is emphasized by the plot of X_t against X_{t−1} of Figure 2.1.2. The correlation between X_t and X_{t−1} is (1.49)^{-1}(0.7) = 0.4698, and this is the slope of the line plotted on the figure. On the other hand, since the e_t are independent, X_t is independent of X_{t−2}, and this is clear from the plot of X_t against X_{t−2} in Figure 2.1.3.

Figure 2.1.1. One hundred observations from the time series X_t = e_t + 0.7e_{t−1}.

While the simple correlation between X_t and X_{t−2} is zero, the partial correlation between X_t and X_{t−2} adjusted for X_{t−1} is not zero. The correlation between X_t and X_{t−1} is ρ(1) = (1 + α²)^{-1}α, which is the same as that between X_{t−1} and X_{t−2}. Therefore, the partial correlation between X_t and X_{t−2} adjusted for X_{t−1} is

$$\frac{\mathrm{Cov}\{X_t - cX_{t-1},\, X_{t-2} - cX_{t-1}\}}{\mathrm{Var}\{X_t - cX_{t-1}\}} = \frac{-\alpha^2}{1 + \alpha^2 + \alpha^4} ,$$

where c = (1 + α²)^{-1}α. In Figure 2.1.4, X_t − 0.7(1.49)^{-1}X_{t−1} is plotted against X_{t−2} − 0.7(1.49)^{-1}X_{t−1} for the time series of Figure 2.1.1. For this time series the theoretical partial autocorrelation is −0.283, and this is the slope of the line plotted in Figure 2.1.4.

AA

Example 2.1.2. One hundred observations from the second order moving average

$$X_t = e_t - 1.40 e_{t-1} + 0.48 e_{t-2}$$

are displayed in Figure 2.1.5. The correlation function of this process is

$$\rho(h) = \begin{cases} -0.65 , & h = 1 , \\ \phantom{-}0.15 , & h = 2 , \\ \phantom{-}0 , & h \ge 3 . \end{cases}$$
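These autocorrelations can be computed directly from (2.1.6). The Python sketch below (an illustration only; the helper name ma_acf is arbitrary) evaluates the autocorrelations of a finite moving average from its coefficients and reproduces the values −0.65 and 0.15 for this example.

```python
import numpy as np

def ma_acf(alphas, max_lag):
    """Autocorrelations of X_t = sum_j alphas[j]*e_{t-j} with uncorrelated errors."""
    a = np.asarray(alphas, dtype=float)
    gamma = np.array([np.sum(a[: len(a) - h] * a[h:]) if h < len(a) else 0.0
                      for h in range(max_lag + 1)])
    return gamma / gamma[0]

# Example 2.1.2: X_t = e_t - 1.40 e_{t-1} + 0.48 e_{t-2}
print(np.round(ma_acf([1.0, -1.40, 0.48], 4), 3))   # approx [1, -0.65, 0.15, 0, 0]
```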

Once again the nature of the correlation can be observed from the plot. Adjacent

Figure 2.1.2. Plot of X_t against X_{t−1} for 100 observations from X_t = e_t + 0.7e_{t−1}.

observations are often on opposite sides of the mean, indicating that the first order autocorrelation is negative. This correlation is also apparent in Figure 2.1.6, where X, is plotted against X, -, . AA

At the beginning of this section we defined a moving average time series to be a linear combination of a SeQuence of uncorrelated random variables. It is also common to speak of forming a moving average of an arbitrary time series X,. Thus, Y, defined by

is a moving average of the time series XI. The reader may prove:

Lemma 2.1.1. If {X_t : t ∈ (0, ±1, ±2, ...)} is a stationary time series with mean zero and covariance function γ_X(h), then

$$Y_t = \sum_{i=-M}^{M} a_i X_{t+i} ,$$

where M is finite and the a_i are real numbers, is a stationary time series with mean zero and covariance function

$$\gamma_Y(h) = \sum_{i=-M}^{M} \sum_{j=-M}^{M} a_i a_j \gamma_X(h + j - i) .$$

Figure 2.1.3. Plot of X_t against X_{t−2} for 100 observations from X_t = e_t + 0.7e_{t−1}.

2.2. ABSOLUTELY SUMMABLE SEQUENCES AND INFINITE MOVING AVERAGES

Before investigating the properties of infinite moving average time series, we review some results on the convergence of partial sums of infinite sequences of real numbers.

Recall that an injhire sequence is a function whose domain is the set

Figure 2.1.4. Plot of X_t − 0.47X_{t−1} against X_{t−2} − 0.47X_{t−1} for 100 observations from X_t = e_t + 0.7e_{t−1}.

T, = (1,2,. . .) of natural numbers. A &z.&ly inJinite sequence is a function whose domain is the set T2 = (0, 21, 22,. . .) of all integers. When no confusion will result, we shall use the word sequence for a function whose domain is either T, or T,. We denote the infinite sequence by {u,},?=, and the doubly infinite sequence by {aj},----. When the domain is clear, the notation will be abbreviated to {uj} or

The inJinite series created from the sequence {uj}/"-, is the sequence of partid Perhaps to uj.

sums {sj}lm_, defined by

(2.2.1)

The symbols Z,-=, uj and X{uj} are often used to represent both the infinite sequence {si}/"ol generated by the sequence {u,} and the limit

Figure 2.1.5. One hundred observations from the time series X_t = e_t − 1.40e_{t−1} + 0.48e_{t−2}.

when the limit is defined. If the sequence {s,) has a finite limit, we say that the series Z{uj} is convergent. If the limit is not defined, the series is divergent.

If the limit

$$\lim_{n \to \infty} \sum_{j=1}^{n} |a_j|$$

is finite, the series Σ{a_j} is said to be absolutely convergent, and we write Σ_{j=1}^{∞} |a_j| < ∞. We also describe this situation by saying the sequence {a_j} is absolutely summable. For a doubly infinite sequence, if the limit

Figure 2.1.6. Plot of X_t against X_{t−1} for 100 observations from X_t = e_t − 1.40e_{t−1} + 0.48e_{t−2}.

$$\lim_{n \to \infty} \sum_{j=-n}^{n} |a_j|$$

exists and is finite, then the series is absolutely convergent, and we write

$$\sum_{j=-\infty}^{\infty} |a_j| < \infty ,$$

and say that the sequence {a_j} is absolutely summable.

The following elementary properties of absolutely summable sequences will be useful in our investigation of moving average time series.

1 . If (u,} is absolutely summable, then


The reader can verify that the converse is not hue by considering X;= j -' and ZJm=, j - ' .

2. Given two absolutely summable sequences {uj} and {bj}, then the sequences {aj + b,} and {uJbj} are absolutely summable:

m m m m

m m m

Of course, XT=-.. (lujl + [bj1)2<m, since {(lujl + 16jl)2) is the square of an absolutely summable sequence.

3. The convolution of two absolutely summable sequences {aj} and {bj}, defined by

is absolutely summable:

This result generalizes, and, for example, if we define m

where {u,}, {bj}, and {cj} are absolutely summable, then

j = - *

The infinite moving average time series

$$X_t = \sum_{j=-\infty}^{\infty} \alpha_j e_{t-j} , \qquad t = 0, \pm 1, \pm 2, \ldots ,$$

was introduced in equation (2.1.2) of Section 2.1. In investigating the behavior of infinite moving averages we will repeatedly interchange the summation and expectation operations. As justification for this procedure, we give the following theorems. The theorems may be skipped by those not familiar with real analysis. The organization of the proofs follows suggestions made by Torres (1986) and Pantula (1988a). We begin with a lemma on almost sure convergence of a sequence of random functions.
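The practical meaning of absolute summability can be illustrated by truncating an infinite moving average. In the Python sketch below (an illustration, not part of the text; the weights a_j = 0.9^{|j|} and the sample are arbitrary choices), the partial sums Σ_{|j| ≤ n} a_j e_{t−j} change less and less as n grows, in keeping with the limit results of this section.

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.standard_normal(4001)      # plays the role of the e_t (uncorrelated (0,1))
t = 2000                           # a fixed time index well inside the sample

def partial_sum(n):
    j = np.arange(-n, n + 1)
    a = 0.9 ** np.abs(j)           # absolutely summable weights
    return np.sum(a * e[t - j])

for n in (5, 10, 20, 40, 80):
    print(n, round(partial_sum(n), 6))
# successive partial sums stabilize, illustrating the almost sure / mean square limit
```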


Lemma 2.2.1. Let {q} be a sequence of random variables defined on the probability space (a, Sa, P). Assume

Then )=I"- is defined as an almost sure knit and

h L Let a nondecreasing sequence of random variables be defined by

U W ) = 2 IZ,Wl j = - k

Therefore, the limit as k -+ CQ is defined. By assumption, E{IZ,l} is finite for each j and it follows that

E{Y,} C M <

for some M and all k. Therefore, by the monotone convergence theorem,

See, for example, Royden (1989, p. 265). AIso C,m=-= 151 < as. Therefore,

and

where S,, = X;=-,, q. It follows, by the dominated convergence theorem, that

A

Theorem 2.2.1. Let the sequence of real numbers {a_j} and the sequence of random variables {Z_t} satisfy

$$\sum_{j=-\infty}^{\infty} |a_j| \le M < \infty$$

and E{Z_t²} ≤ K, t = 0, ±1, ±2, ..., for some finite K. Then there exists a sequence of random variables {X_t} such that, for t = 0, ±1, ±2, ...,

$$X_t = \sum_{j=-\infty}^{\infty} a_j Z_{t-j} \quad \text{a.s.},$$

and E{X_t²} ≤ M²K.

Proof. By Lemma 2.2.1, X, is well defined as an almost sure limit and

Because (p,l- 1q1)’ B 0, we have E{lZll lZ,l} d K for all t, j . It follows that

and applying Lemma 2.2.1, we have that

and

for all t = 0, 2 1,+2,. . . . It follows that

and XI,,,,,luj( converges to zero as n -+m. A

Because almost sure convergence implies convergence in probability, the sequence of random variables {X,} of Theorem 2.2.1 is defined as a limit in probability, as a limit in mean square, and as an almost sure limit. The second moment condition is required for convergence in mean square, and the first


absolute moment is required for almost sure convergence. Hereafter, we shall often use the infinite sum with no modifiers to represent the limit random variable.

Theorem 23.2. Let the sequence of real numbers {ul}, {b,} be absolutely summable. Let {Z,} be a sequence of random variables such that E{Z:} 6 K for r=0,+-1,+2 ,..., and for some finite K. Define

Then

and

Proof. The expectation result follows from Lemma 2.2.1 and the proof of Theorem 2.2.1. By assumption,

and the conclusion follows by Lemma 2.2.1. A

Corollary 2.2.2.1. Let the sequences of real numbers {u,} and (6,) be absolutely summable. Let {e,} be a sequence of uncorrelated (0, a') random variables, and define for t = 0, 5 1, +2, . . .

Then, E{X,} = 0 and co

E{X,Y, }=uZ 2 a,bj. j - -m

Proof. The sequence {el} satisfies the hypotheses of Theorem 2.2.2 with A K = uz. Since E{e,} = 0 and E{e,ej} = 0 for t Z j , we have the results.

Corollary 2.2.2.2. Let {X,} be given by m


and define


Y, = bjXt-, , j - - m

where {e,} is a sequence of uncorrelated (0, a’) random variables and {aj} and {b,} are absolutely summable sequences of real numbers. Then {Y,} is a stationary moving average time series and

m m

m

(2.2.2)

m

h=O

Proof. By definition, m W

Since {a,} and {b,} are absolutely summable, we my interchange the order of summation (Fubini’s theorem). Setting m = j + k, we have

m m

m p - m

where c, = 2,m_-m b,a,-j. By result 3 on the convolution of absolutely summble sequences, {c,} is absolutely summable. Therefore,

2 cmcm-hu m‘-W

is also absolutely summable. Using Theorem 2.2.2,

00

2 y#)= 2 c,c,-hu ,=-m

m


m m m

m m

A

We may paraphrase this corollary by saying that an infinite moving average (where the coefficients are absolutely summabie) of an infinite moving average stationary time series is itself an infinite moving average stationary time series. If the {uj} and (6,) are exponentially declining, then {c,} of Corollary 2.2.2.2 is exponentially declining.

Corollary 2.2.23. Let the {uJ} and {bj) of Corollary 2.2.2.2 satisfy

IujI <MA'" and ibjl <MA'"

for some M < 00 and 0 < A < 1. Let Y, be as defined in Corollary 2.2.2.2. Then

where c, = ZJm= __ bJa, ._ and Ic, I < M, A'"' for some M, < 00.

Pmof. We have m

Ic,~ = I 2 bja,-i 1 S M 2 A C l ( m + 1 f 2 AZJ) . j = - m j = I

A

In Theorems 2.2.1 and 2.2.2, we made mild assumptions on the Z, and assumed the sequence of coefficients, {a,}, to be absolutely summable. In Theorem 2.2.3, we give a mean square result for the infinite sum of the elements of a time series under weaker assumptions on the coefficients.

Theorem 2.2.3. Let {aj} be a sequence of real numbers such that {a;} is suminable. Let Z, be the time series

m

where (e,} is a sequence of uncorrelated (0,(r2) random variables and {bj} is absolutely swnmable. Then there exists a sequence of random variables {X,} such that for t =o, 51.22, * . . ,


E{X,) = 0, and


(2.2.3)

Proof. By the Cauchy-Schwa inequality, we have, for R > m and h > 0,

Since 2 u: < =, given E > 0, there exists an No such that, for all n > m >No,

l J ~ + l a J u j + h l <' (2.2.5)

uniformly in h. Let Yn, = X;= -,, ujZI-, . For n > m >No,

Now

where

by (2.2.5). Because, by Corollary 2.2.2.2, I Ix(h)l C 00, it follows that the right side of (2.2.6) converges to zero in mean square as rn + m and n +m. Therefore, q,f is Cauchy in squared mean and there exists a random variable XI such that E{X:) < ~4 and

E{(Xf - Ynl)'} -+ 0 as R -+ 00 9

See, for example, Tucker (1967, p. 105). Also, E{X,}=lim,,, E{Yn,}=O and E~X,X,+, ) = limn+ mnlYnl+h).


We write m m m

where

aj if I j l d n , o if l j l > n .

djn =

Since d,,, and b, are absolutely summable, we have m m m

where ob m

Also,

Therefore,

m

because {m,} is square summable. See Exercise 2.45. Therefore, by the dominated convergence theorem,

Corollary 2.23. Let { c ~ } : ~ be square summable and let {dj}z- be absolutely summable. Let

m

Z~ = 2 cJe , , , ,=-m


where e, is a sequence of uncorreiated (0, (r2) random variables. Then there exists a sequence of random variables {XI}:- such that

E(X} = 0, and

Proof. From Theorem 2.2.3, we have that 2, is well defined in mean square with E{Z,} = 0 and ~ ( h ) = u2 Xi"p-- cJc +h. Now, the existence of the sequence of random variables {XI}:- such that E{Xj = 0 and

n-rw

follows from Theorem 2.2.1. Furthermore, from Theorem 2.2.2, we have

where gj = 2;- -=ckd,+. Using the dominated convergence theorem, we have

n OD


Therefore,

= O . A

2.3. AN INTRODUCTION TO AUTOREGRESSIVE TIME SERIES

Many time series encountered in practice are well approximated by the representation

$$\sum_{i=0}^{p} \alpha_i X_{t-i} = e_t , \qquad (2.3.1)$$

where α_0 ≠ 0, α_p ≠ 0, and the e_t are uncorrelated (0, σ²) random variables. The sequence {X_t} is called a pth order autoregressive time series. The defining equation (2.3.1) is sometimes called a stochastic difference equation.

To study such processes, let us consider the first order autoregressive time series, which we express in its most common form,

$$X_t = \rho X_{t-1} + e_t . \qquad (2.3.2)$$

By repeated substitution of

$$X_{t-i} = \rho X_{t-i-1} + e_{t-i}$$

for i = 1, 2, ..., N into (2.3.2), we obtain

$$X_t = \rho^N X_{t-N} + \sum_{j=0}^{N-1} \rho^j e_{t-j} . \qquad (2.3.3)$$

Under the assumptions that |ρ| < 1 and E{X_t²} < K < ∞, we have

$$\lim_{N \to \infty} E\Big\{ \Big( X_t - \sum_{j=0}^{N-1} \rho^j e_{t-j} \Big)^{\!2} \Big\} = 0 . \qquad (2.3.4)$$

Thus, if e_t is defined for t ∈ (0, ±1, ±2, ...) and X_t satisfies the difference equation (2.3.2) with |ρ| < 1, then we may express X_t as an infinite moving average of the e_t:

$$X_t = \sum_{i=0}^{\infty} \rho^i e_{t-i} .$$


It follows that E{X_t} = 0 for all t, and

$$\gamma(h) = \sigma^2 \sum_{i=0}^{\infty} \rho^i \rho^{i+h} = \sigma^2 (1 - \rho^2)^{-1} \rho^h \qquad (2.3.5)$$

for h = 0, 1, .... The covariance function for h ≠ 0 is also rapidly obtained by making use of a form of (2.3.3),

$$X_t = \rho^h X_{t-h} + \sum_{j=0}^{h-1} \rho^j e_{t-j} , \qquad h = 1, 2, \ldots . \qquad (2.3.6)$$

Since X_{t−h} is a function of e's with subscript less than or equal to t − h, X_{t−h} is uncorrelated with e_{t−h+1}, e_{t−h+2}, .... Therefore, we have, after multiplying both sides of (2.3.6) by X_{t−h} and taking expectations,

$$\gamma(h) = \rho^h \gamma(0) , \qquad h = 1, 2, \ldots . \qquad (2.3.7)$$

The correlation function is seen to be a monotonically declining function for ρ > 0, while for ρ < 0 the function is alternately positive and negative, the absolute value declining at a geometric rate.

Example 2.3.1. Figure 2.3.1 displays 100 observations from the first order process

$$X_t = 0.7 X_{t-1} + e_t .$$

The current observation, X_t, is plotted against the preceding observation, X_{t−1}, in Figure 2.3.2 and against X_{t−2} in Figure 2.3.3. The correlation between X_t and X_{t−2} is (0.7)² = 0.49, but the partial correlation between X_t and X_{t−2} adjusted for X_{t−1} is zero, since

$$\mathrm{Cov}\{X_t - \rho X_{t-1},\, X_{t-2} - \rho X_{t-1}\} = (\rho^2 - \rho^2 - \rho^2 + \rho^2)\gamma(0) = 0 .$$

The zero partial correlation is illustrated by Figure 2.3.4, which contains a plot of X_t − 0.7X_{t−1} against X_{t−2} − 0.7X_{t−1}. In fact, for any h ∈ (2, 3, ...), the partial correlation between X_t and X_{t−h} adjusted for X_{t−1} is zero. This follows because

$$\mathrm{Cov}\{X_t - \rho X_{t-1},\, X_{t-h} - \rho^{h-1} X_{t-1}\} = (\rho^h - \rho^{h-1}\rho - \rho\rho^{h-1} + \rho^h)\gamma(0) = 0 , \qquad h = 2, 3, \ldots .$$

This important property of the first order autoregressive time series can be

Figure 2.3.1. One hundred observations from the time series X_t = 0.7X_{t−1} + e_t.

characterized by saying that all the useful (correlational) information about X_t in the entire realization previous to X_t is contained in X_{t−1}. Compare this result with the partial correlational properties of the moving average process discussed in Section 2.1. AA
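A short simulation, offered only as an illustration with ρ = 0.7 and normal errors, reproduces both properties: the sample autocorrelations are close to 0.7^h, and the sample correlation of the lag-one regression residuals (the lag-two partial autocorrelation) is close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
rho, n = 0.7, 100000
e = rng.standard_normal(n)
X = np.zeros(n)
for t in range(1, n):
    X[t] = rho * X[t - 1] + e[t]          # X_t = 0.7 X_{t-1} + e_t
X = X[1000:]                               # drop a burn-in so the start-up value is forgotten

def acf(x, h):
    x = x - x.mean()
    return np.dot(x[:-h], x[h:]) / np.dot(x, x)

r1, r2 = acf(X, 1), acf(X, 2)
print(round(r1, 3), round(r2, 3))          # roughly 0.7 and 0.49
# partial autocorrelation at lag 2: correlation of the two lag-1 regression residuals
u = X[2:] - r1 * X[1:-1]
v = X[:-2] - r1 * X[1:-1]
print(round(np.corrcoef(u, v)[0, 1], 3))   # roughly 0
```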

Linear difference equations and their solutions play an important role in the study of autoregressive time series as well as in other areas of time series analysis. Therefore we digress to present the needed elementary results before studying higher order autoregressive processes. The initial development follows Goldberg (1958).

2.4. DIFFERENCE EQUATIONS

Given a sequence {y_t}_{t=0}^{∞}, the first difference is defined by

$$\Delta y_t = y_t - y_{t-1} , \qquad t = 1, 2, \ldots , \qquad (2.4.1)$$


and the nth difference is defined by

$$\Delta^n y_t = \Delta^{n-1} y_t - \Delta^{n-1} y_{t-1} = \sum_{r=0}^{n} (-1)^r \binom{n}{r} y_{t-r} , \qquad t = n, n+1, \ldots ,$$

where

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

are the binomial coefficients. The equation

$$y_t + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} = r_t , \qquad t = n, n+1, \ldots , \qquad (2.4.2)$$

where the a_i are real constants, a_n ≠ 0, and r_t is a real function of t, is a linear difference equation of order n with constant coefficients. The values y_{t−1}, y_{t−2}, ..., y_{t−n} are sometimes called lagged values of y_t. Equation (2.4.2)


could be expressed in terms of y_{t−n} and the differences of y_t; for example,

$$\Delta^n y_t + b_1 \Delta^{n-1} y_{t-1} + b_2 \Delta^{n-2} y_{t-2} + \cdots + b_{n-1} \Delta y_{t-n+1} + b_n y_{t-n} = r_t , \qquad (2.4.3)$$

where the b_i's are linear functions of the a_i's.

A third representation¹ of (2.4.2) is possible. Let the symbol ℬ denote the operation of replacing y_t by y_{t−1}; that is,

$$\mathscr{B} y_t = y_{t-1} .$$

¹ The reader may wonder at the benefits associated with several alternative notations. All are heavily used in the literature and all have advantages for certain operations.

Figure 2.3.4. Plot of X_t − 0.7X_{t−1} against X_{t−2} − 0.7X_{t−1} for 100 observations from X_t = 0.7X_{t−1} + e_t.

Using the backward shift operator ℬ, we have

$$\Delta y_t = y_t - \mathscr{B} y_t = (1 - \mathscr{B}) y_t .$$

As with the difference operator, we denote repeated application of the operator with the appropriate exponent; for example,

$$\mathscr{B}^3 y_t = \mathscr{B}^2 y_{t-1} = y_{t-3} .$$

Therefore, we can use the operator symbol to write (2.4.2) as

$$(1 + a_1 \mathscr{B} + a_2 \mathscr{B}^2 + \cdots + a_n \mathscr{B}^n) y_t = r_t , \qquad t = n, n+1, \ldots .$$

Associated with (2.4.2) is the reduced or homogeneous difference equation

$$y_t + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} = 0 . \qquad (2.4.4)$$


From the linearity of (2.4.2) and (2.4.4) we have the following properties.

1. If y^{(1)} and y^{(2)} are solutions of (2.4.4), then b_1 y^{(1)} + b_2 y^{(2)} is a solution of (2.4.4), where b_1 and b_2 are arbitrary constants.
2. If Y is a solution of (2.4.4) and y* a solution of (2.4.2), then Y + y* is a solution of (2.4.2).

Here y* is called a particular solution and Y + y* a general solution of the difference equation (2.4.2).

Equations (2.4.2) and (2.4.4) specify y_t as a function of the preceding n values of y. By using an inductive proof it is possible to demonstrate that the linear difference equation of order n has one and only one solution for which values at n consecutive t values are arbitrarily prescribed. This important result means that when we obtain n linearly independent solutions to the difference equation, we can construct the general solution.

The first order homogeneous difference equation

$$y_t = a y_{t-1} , \qquad t = 1, 2, \ldots , \qquad (2.4.5)$$

has the unique solution

$$y_t = a^t y_0 , \qquad (2.4.6)$$

where y_0 is the value of y at t = 0. Note that if |a| < 1, the solution tends to zero as t increases. Conversely, if |a| > 1, the absolute value of the solution sequence increases without bound. If a is negative, the solution oscillates with alternately positive and negative values. The first order nonhomogeneous difference equation

$$y_t = a y_{t-1} + b \qquad (2.4.7)$$

has the unique solution

$$y_t = \begin{cases} a^t y_0 + \dfrac{1 - a^t}{1 - a}\, b , & a \ne 1 , \\[1ex] y_0 + bt , & a = 1 . \end{cases} \qquad (2.4.8)$$

If the constant b in the difference equation (2.4.7) is replaced by a function of time, for example,

$$y_t = a y_{t-1} + b_t , \qquad t = 1, 2, \ldots , \qquad (2.4.9)$$

then the solution is given by

$$y_t = y_0 a^t + \sum_{j=0}^{t-1} a^j b_{t-j} , \qquad t = 1, 2, \ldots , \qquad (2.4.10)$$

where y_0 is the value of y_t at time 0.
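The solution (2.4.10) is easily checked numerically. The Python sketch below (an illustration with arbitrary a, y_0, and b_t) iterates the difference equation directly and compares the result with the closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
a, T = 0.8, 25
b = rng.standard_normal(T + 1)      # b_t for t = 1, ..., T (b[0] unused)
y0 = 2.0

# iterate the difference equation y_t = a*y_{t-1} + b_t directly
y = [y0]
for t in range(1, T + 1):
    y.append(a * y[-1] + b[t])

# closed form (2.4.10): y_t = y0*a^t + sum_{j=0}^{t-1} a^j b_{t-j}
t = T
closed = y0 * a ** t + sum(a ** j * b[t - j] for j in range(t))
print(round(y[-1], 10), round(closed, 10))   # identical up to rounding
```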


The solutions (2.4.6), (2.4.8), and (2.4.10) are readily verified by differencing. The fact that the linear function is reduced by differencing to the constant function is of particular interest. The generalization of this result is important in the analysis of nonstationary time series.

Theorem 2.4.1. Let y_t be a polynomial of degree n whose domain is the integers. Then the first difference Δy_t is expressible as a polynomial of degree n − 1 in t.

proot. since

and

we have

where yn - = nPn . A

It follows that by taking repeated differences, one can reduce a polynomial to the zero function. We state the result as a corollary.

Corollary 2.4.1. Let yf be a polynomial of degree n whose domain is the integers. Then the nth difference Any, is a constant function, and the (n + 1)st difference A"' *y, is the zero function.

In investigating difference equations of higher order, the nature of the solution is a function of the auxiliary equation, sometimes called the characteristic equation. For equation (2.4.2) the auxiliary equation is

$$m^n + a_1 m^{n-1} + \cdots + a_n = 0 . \qquad (2.4.11)$$


This polynomial equation will have n (not necessarily distinct) mots, and the behavior of the solution is intimately related to these roots.

Consider the second order linear homogeneous equation

$$y_t + a_1 y_{t-1} + a_2 y_{t-2} = 0 , \qquad (2.4.12)$$

where a_2 ≠ 0. The two roots of the auxiliary equation

$$m^2 + a_1 m + a_2 = 0 \qquad (2.4.13)$$

will fall into one of three categories:

1. Real and distinct.
2. Real and equal.
3. A complex conjugate pair.

The general solution of the homogeneous difference equation (2.4.12) for the three categories is:

1.

$$y_t = b_1 m_1^t + b_2 m_2^t , \qquad (2.4.14)$$

where b_1 and b_2 are constants determined by the initial conditions, and m_1 and m_2 are the roots of (2.4.13).

2.

$$y_t = (b_1 + b_2 t) m^t , \qquad (2.4.15)$$

where m is the repeated root of (2.4.13).

3.

$$y_t = b_1 m_1^t + b_1^{*} m_2^t , \qquad (2.4.16)$$

where b_1^{*} is the complex conjugate of b_1 and m_2 = m_1^{*} is the complex conjugate of m_1.

If the complex roots are expressed as

$$r(\cos \theta \pm i \sin \theta) ,$$

where r = a_2^{1/2} = |m_1| and

$$(\cos \theta, \sin \theta) = (2r)^{-1}\left( -a_1 ,\; |a_1^2 - 4a_2|^{1/2} \right) ,$$


then the solution can be expressed as

$$y_t = r^t (d_1 \cos \theta t + d_2 \sin \theta t) = g_1 r^t \cos(\theta t + g_2) , \qquad (2.4.17)$$

where d,, d,, g,, and g, are real constants. Substitution of (2.4.14) into (2.4.12) immediately establishes tbat (2.4.14) is a

solution and, in fact, that mi and mi are independent solutions when m, # m2. For a repeated root, substitution of (2.4.15) into (2.4.12) gives

(6, + b2t)m'-2(m2 + u p + a,) - b,m'-2(u,m + h,) = 0 .

The first term is zero because m is a root of (2.4.13). For m a repeated root, mu, + 2u, = 0, proving that (2.4.15) is the general solution.

As with unequal real roots, substitution of (2.4.16) into (2.4.12) demonstrates that (2.4.16) is the solution in the case of complex roots.

The solution for a linear homogeneous difference equation of order n can be obtained using the roots of the auxiliary equation. The solution is a sum of n terms where:

1. For every real and distinct root m, a term of the form bm^t is included.
2. For every real root of order p (a root repeated p times), a term of the form

$$(b_1 + b_2 t + \cdots + b_p t^{p-1}) m^t$$

is included.
3. For each pair of unrepeated complex conjugate roots, a term of the form

$$a r^t \cos(t\theta + \varphi)$$

is included, where r and θ are defined following (2.4.16).
4. For a pair of complex conjugate roots occurring p times, a term of the form

$$r^t [\, a_1 \cos(t\theta + \varphi_1) + a_2 t \cos(t\theta + \varphi_2) + \cdots + a_p t^{p-1} \cos(t\theta + \varphi_p) \,]$$

is included.

yo. y , , . . . , y n - , , a particular solution is For the nonhomogeneous difference equation (2.4.2), with initial conditions

I

y ; = z w j r ' - j , t = n , n + 1 I . . . ,

j - 0

where w,, = 1, i

C a,wj+ = 0 , j = I , 2, . . , n - 1 , r = O

n

~ u l w , - , = ~ , j = n , n + 1 ,..., 1x0


and ro, rl, . . . , r,- , are defined by I

y' = c wjr'-j, t = 0,1,. . . , n - I . / - 0

Observe that the weights wI satisfy the same homogeneous difference equation as the y,. Therefore, wj can be expressed as a sum of the four types of terms that define the solution to the homogeneous difference equation.

Systems of difference equations in more than one function of time arise naturally in many models of the physical and economic sciences. To extend our discussion to such models, consider the system of difference equations in the two functions y_{1t} and y_{2t}:

$$y_{1t} = a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + b_{1t} ,$$
$$y_{2t} = a_{21} y_{1,t-1} + a_{22} y_{2,t-1} + b_{2t} , \qquad (2.4.18)$$

where a_{11}, a_{12}, a_{21}, a_{22} are real constants, b_{1t} and b_{2t} are real functions of time, and t = 1, 2, .... The system may be expressed in matrix notation as

$$\mathbf{y}_t = \mathbf{A} \mathbf{y}_{t-1} + \mathbf{b}_t , \qquad (2.4.19)$$

where

$$\mathbf{y}_t' = (y_{1t}, y_{2t}) , \qquad \mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} , \qquad \mathbf{b}_t' = (b_{1t}, b_{2t}) .$$

Proceeding from the solution (2.4.6) of the first order homogeneous scalar difference equation, we might postulate that the homogeneous equation

$$\mathbf{y}_t = \mathbf{A} \mathbf{y}_{t-1} \qquad (2.4.20)$$

would have the solution

$$\mathbf{y}_t = \mathbf{A}^t \mathbf{y}_0 , \qquad (2.4.21)$$

and that the equation (2.4.19) would have the solution

$$\mathbf{y}_t = \mathbf{A}^t \mathbf{y}_0 + \sum_{j=0}^{t-1} \mathbf{A}^j \mathbf{b}_{t-j} , \qquad (2.4.22)$$

where we understand that A^0 = I. Direct substitution of (2.4.21) into (2.4.20) and (2.4.22) into (2.4.19) verifies that these are the solutions of the matrix analogs of the first order scalar equations.

In the scalar case the behavior of the solution depends in a critical manner on the magnitude of the coefficient a for the first order equation and on the magnitude of the roots of the characteristic equation for the higher order equations. If the


roots are all less than one in absolute value, then the effect of the initial conditions is transient, the initial conditions being multiplied by the product of powers of the roots and polynomials in t.

In the vector case the quantities analogous to the roots of the characteristic equation are the roots of the determinantal equation

$$|\mathbf{A} - m\mathbf{I}| = 0 .$$

For the moment, we assume that the roots of the determinantal equation are distinct. Let the two roots of the matrix A of (2.4.19) be m_1 and m_2, and define the vectors q_{·1} and q_{·2} by the equations

$$(\mathbf{A} - m_1 \mathbf{I})\mathbf{q}_{\cdot 1} = \mathbf{0} , \qquad (\mathbf{A} - m_2 \mathbf{I})\mathbf{q}_{\cdot 2} = \mathbf{0} , \qquad \mathbf{q}_{\cdot 1}'\mathbf{q}_{\cdot 1} = \mathbf{q}_{\cdot 2}'\mathbf{q}_{\cdot 2} = 1 .$$

By construction, the matrix Q = (q_{·1}, q_{·2}) is such that

$$\mathbf{A}\mathbf{Q} = \mathbf{Q}\mathbf{M} , \qquad (2.4.23)$$

where M = diag(m_1, m_2). Since the roots are distinct, the matrix Q is nonsingular. Hence,

$$\mathbf{Q}^{-1}\mathbf{A}\mathbf{Q} = \mathbf{M} . \qquad (2.4.24)$$

Define z_t' = (z_{1t}, z_{2t}) and c_t' = (c_{1t}, c_{2t}) by

$$\mathbf{z}_t = \mathbf{Q}^{-1}\mathbf{y}_t , \qquad \mathbf{c}_t = \mathbf{Q}^{-1}\mathbf{b}_t .$$

Then, multiplying (2.4.19) by Q^{-1}, we have

$$\mathbf{z}_t = \mathbf{Q}^{-1}\mathbf{A}\mathbf{Q}\,\mathbf{z}_{t-1} + \mathbf{c}_t$$

or

$$\mathbf{z}_t = \mathbf{M}\mathbf{z}_{t-1} + \mathbf{c}_t .$$

The transformation Q^{-1} reduces the system of difference equations to two simple first order difference equations with solutions


where we have set y_{10} = b_{10} and y_{20} = b_{20} for notational convenience. Therefore, the effect of the initial conditions is transient if |m_1| < 1 and |m_2| < 1. The solutions (2.4.25) may be expressed in terms of the original variables as

$$\mathbf{y}_t = \mathbf{Q}\mathbf{z}_t$$

or

$$\mathbf{y}_t = \sum_{j=0}^{t} \mathbf{Q}\mathbf{M}^j\mathbf{Q}^{-1}\mathbf{b}_{t-j} , \qquad (2.4.26)$$

where the last representation follows from A² = QMQ^{-1}QMQ^{-1} = QM²Q^{-1}, etc. If the n × n matrix A has multiple roots, it cannot always be reduced to a diagonal matrix by a transformation matrix Q. However, A can be reduced to a matrix whose form is slightly more complicated.
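For distinct roots, the representation A^t = QM^tQ^{-1} underlying (2.4.26) can be verified numerically. The following Python sketch (an illustration only; the 2 × 2 matrix is an arbitrary example with distinct roots less than one in absolute value) computes the roots and the matrix Q with numpy and compares QM^tQ^{-1} with the direct matrix power.

```python
import numpy as np

A = np.array([[0.5, 0.3],
              [0.2, 0.4]])                        # roots 0.7 and 0.2, both below one

m, Q = np.linalg.eig(A)                           # roots of |A - mI| = 0 and matrix Q
t = 8
At_eig = Q @ np.diag(m ** t) @ np.linalg.inv(Q)   # Q M^t Q^{-1}
print(np.round(At_eig, 8))
print(np.round(np.linalg.matrix_power(A, t), 8))  # agrees with repeated multiplication
print(np.abs(m))                                  # both roots below one, so A^t -> 0
```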

Theorem 2.4.2. Let A, be the k, X ki matrix

0 0 0 * * - 0 m,

where A, =mi when ki = 1. Then there exists a matrix Q such that

Q-'AQ = = A , (2.4.27)

0 0 * * *

where Xi=, k, = n and the mi, i = 1.2,. . . , r, are the characteristic roots of A, If mi is a repeated root, it may appear in more than one block of A, but the total number of times it appears is equal to its multiplicity.

Proof. See Finkbeiner (1960) or Miller (1963). A


The representation A is called the Jordan canonical fonn. The powers of the matrix A, are given by

0 0

0 0 0 ... mi

. (2.4.28)

Therefore, the solution of the first order vector difference equation with initial conditions yo = bo is given by

I

y, = 2 QA'Q-'b,, , j - 0

(2.4.29)

and the effect of the initial cooditions goes to zero as t + m if al l of the roots of IA - rnII = 0 are less than one in absolute value.

To further illustrate the solution, consider a system of dimension two with repeated root that has been reduced to the Jordan canonical form:

Y , , - -my1,,-1 +Y2.PI 1

Y2, = mY2.r-1 +

It follows that

Y,, = Y20mf

Y I , - ~ Y , , , - ~ +yzoml-' 9

and y l , is given as the solution of the nonhomogeneous equation

-

whence

This approach may be extended to treat higher order vector difference equations. For example, consider the second order vector difference equation


where A , and A, are k X k matrices and y; = (y,,, Y2,.. . . , Yk,). The System of equations may also be written as

x, =Ax,-, . (2.4.31)

where xi = (yi, Y:- ) and

(2.4.32)

Thus we have converted the original second order vector difference equation of dimension & to a first order difference equation of dimension 28. The properties of the solution will depend on the nature of the Jordan canonical form of A. In particular the effect of the initial conditions will be transient if all of the roots of IA - mII = 0 are less than one in absolute value. This condition is equivalent to the condition that the mots of

lImz - A,m - A,I = 0

are less than one in absolute value. To prove this equivalence, form the matrix product

(i A,ljl-')(A, -mI

=(A, - "Io+ A,rn-' I

- mI

and take the determinant of both sides. kt the nth order vector difference equation be given by

y , + A,y,-, + A2yI-2 + * * +A,y,-, = r, , t = 1,2,. . . , (2.4.33) where y, is a &-vector and r, is a real vector valued function of time. The associateh matrix is

A =

Then the solution is

-Ai -A2 * * * -An-, -A,,'

I 0 * - . 0 0

0 0 . * * I 0 .


where the W, are k X k matrices defined by

W 0 = I , W, + A , = O ,

W2+A,W, + A 2 = 0 ,

W, +A,W,-, + * * * +AnWjAn =0, j = n , n + 1,. . . , 7 is the vector composed of the first k entries in the nk-vector x, = Aka, and x; = (y;, yl ,, . . , , y;-,,). It follows tbat W, is the upper left block of the matrix A’.

2.5. THE SECOND ORDER AUTOREGRESSIVE TIME SERIES

We consider the stationary second order autoregressive time series defined by the stochastic difference equation

$$X_t + \alpha_1 X_{t-1} + \alpha_2 X_{t-2} = e_t , \qquad t \in (0, \pm 1, \pm 2, \ldots) , \qquad (2.5.1)$$

where the e_t are uncorrelated (0, σ²) random variables, α_2 ≠ 0, and the roots of m² + α_1 m + α_2 = 0 are less than one in absolute value. Recalling that we were able to express the first order autoregressive process as a weighted average of past values of e_t, we postulate that X_t can be expressed as

$$X_t = \sum_{j=0}^{\infty} w_j e_{t-j} , \qquad (2.5.2)$$

where the w_j are functions of α_1 and α_2. We display this representation as

$$\begin{aligned}
X_t &= w_0 e_t + w_1 e_{t-1} + w_2 e_{t-2} + w_3 e_{t-3} + \cdots , \\
X_{t-1} &= \phantom{w_0 e_t + {}} w_0 e_{t-1} + w_1 e_{t-2} + w_2 e_{t-3} + \cdots , \\
X_{t-2} &= \phantom{w_0 e_t + w_1 e_{t-1} + {}} w_0 e_{t-2} + w_1 e_{t-3} + \cdots .
\end{aligned}$$

If the X_t are to satisfy the difference equation (2.5.1), then we must have

$$w_0 = 1 ,$$
$$w_1 + \alpha_1 w_0 = 0 , \qquad (2.5.3)$$
$$w_j + \alpha_1 w_{j-1} + \alpha_2 w_{j-2} = 0 , \qquad j = 2, 3, \ldots .$$

Thus the w_j are given by the general solution to the second order linear homogeneous difference equation subject to the initial conditions specified by the equations in w_0 and w_1. Therefore, if the roots are distinct, we have

$$w_j = (m_1 - m_2)^{-1} m_1^{j+1} + (m_2 - m_1)^{-1} m_2^{j+1} , \qquad j = 0, 1, 2, \ldots ,$$


and if the roots are equal,

$$w_j = (1 + j) m^j , \qquad j = 0, 1, 2, \ldots .$$

In Theorem 2.6.1 of the next section, we prove that the representation (2.5.2) holds for a stationary time series satisfying (2.5.1).
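The weights w_j can be generated either from the recursion (2.5.3) or from the closed form for distinct roots. The Python sketch below (an illustration using α_1 = −1.4, α_2 = 0.48, for which the roots are 0.8 and 0.6) computes both and shows that they agree.

```python
import numpy as np

a1, a2 = -1.4, 0.48           # X_t - 1.4 X_{t-1} + 0.48 X_{t-2} = e_t
m1, m2 = np.roots([1.0, a1, a2])   # roots 0.8 and 0.6

# recursion (2.5.3): w_0 = 1, w_1 = -a1, w_j = -a1*w_{j-1} - a2*w_{j-2}
w = [1.0, -a1]
for j in range(2, 10):
    w.append(-a1 * w[j - 1] - a2 * w[j - 2])

# closed form for distinct roots
w_closed = [(m1 ** (j + 1) - m2 ** (j + 1)) / (m1 - m2) for j in range(10)]
print(np.round(w, 4))
print(np.round(w_closed, 4))   # the two sequences agree
```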

The covariance function for the second order autoregressive time series defined in (2.5.2) is

$$\gamma(h) = \sigma^2 \sum_{j=h}^{\infty} w_j w_{j-h} , \qquad h = 0, 1, 2, \ldots . \qquad (2.5.4)$$

We shall investigate the covariance function in a slightly different manner. Multiply (2.5.1) by X_{t−h} (h ≥ 0) to obtain

$$X_{t-h} X_t + \alpha_1 X_{t-h} X_{t-1} + \alpha_2 X_{t-h} X_{t-2} = X_{t-h} e_t . \qquad (2.5.5)$$

Because X_{t−h} is expressible as a weighted average of e_{t−h} and previous e's,

$$E\{X_{t-h} e_t\} = 0 , \qquad h = 1, 2, \ldots .$$

Thus, taking the expectation of both sides of (2.5.5), we have

$$\gamma(h) + \alpha_1 \gamma(h-1) + \alpha_2 \gamma(h-2) = 0 , \qquad h = 1, 2, \ldots . \qquad (2.5.6)$$

That is, the covariance function, for h > 0, satisfies the homogeneous difference equation associated with the stochastic difference equation defining the time series.

The equations (2.5.6) are called the Yule-Walker equations. With the aid of these equations we can obtain a number of alternative expressions for the autocovariances, autocorrelations, and coefficients of the original representation. Using the two equations associated with h = 1 and h = 2, we may solve for α_1 and α_2 as functions of the autocovariances:

$$\alpha_1 = \frac{\gamma(1)\gamma(2) - \gamma(0)\gamma(1)}{\gamma^2(0) - \gamma^2(1)} , \qquad \alpha_2 = \frac{\gamma^2(1) - \gamma(0)\gamma(2)}{\gamma^2(0) - \gamma^2(1)} . \qquad (2.5.7)$$

If we use the three equations (2.5.6) associated with h = 0, 1, 2, we can obtain expressions for the autocovariances in terms of the coefficients:

$$\gamma(0) = \frac{(1+\alpha_2)\,\sigma^2}{(1-\alpha_2)[(1+\alpha_2)^2 - \alpha_1^2]} , \qquad \gamma(1) = \frac{-\alpha_1}{1+\alpha_2}\,\gamma(0) , \qquad \gamma(2) = \Big( \frac{\alpha_1^2}{1+\alpha_2} - \alpha_2 \Big) \gamma(0) . \qquad (2.5.8)$$
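Given α_1, α_2, and σ², the Yule-Walker equations for h = 0, 1, 2 form a linear system in γ(0), γ(1), γ(2). The Python sketch below (an illustration only, using arbitrary coefficients) solves that system numerically and compares γ(0) with the closed-form expression above.

```python
import numpy as np

a1, a2, s2 = -1.4, 0.48, 1.0   # X_t - 1.4 X_{t-1} + 0.48 X_{t-2} = e_t

# Yule-Walker equations for h = 0, 1, 2 as a linear system in
# (gamma(0), gamma(1), gamma(2)), using gamma(-h) = gamma(h):
#   gamma(0) + a1*gamma(1) + a2*gamma(2) = s2
#   a1*gamma(0) + (1 + a2)*gamma(1)      = 0
#   a2*gamma(0) + a1*gamma(1) + gamma(2) = 0
M = np.array([[1.0, a1, a2],
              [a1, 1.0 + a2, 0.0],
              [a2, a1, 1.0]])
g = np.linalg.solve(M, np.array([s2, 0.0, 0.0]))
print(np.round(g, 4))

# closed form for gamma(0) in terms of the coefficients
g0 = (1 + a2) * s2 / ((1 - a2) * ((1 + a2) ** 2 - a1 ** 2))
print(round(g0, 4))
```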


Using γ(0) and γ(1) of (2.5.8) as the initial conditions together with the general solution of the homogeneous equation, we can obtain the expression for the covariance function.

However, it is somewhat simpler to derive the correlation function first. To obtain ρ(1) as a function of the coefficients, we divide equation (2.5.6) for h = 1 by γ(0), giving

$$\rho(1) + \alpha_1 + \alpha_2 \rho(1) = 0$$

and

$$\rho(1) = \frac{-\alpha_1}{1 + \alpha_2} . \qquad (2.5.9)$$

Using (2.5.9) and ρ(0) = 1 as initial conditions, we have, for unequal roots (real or complex),

$$\rho(h) = [(m_1 - m_2)(1 + m_1 m_2)]^{-1} [m_1^{h+1}(1 - m_2^2) - m_2^{h+1}(1 - m_1^2)] , \qquad h = 0, 1, 2, \ldots . \qquad (2.5.10)$$

If the roots are complex, the autocorrelation function may also be expressed as

(2.5.1 1)

where

For roots real and equal the autocorrelation function is given by

$$\rho(h) = \left[ 1 + h\, \frac{1 - m^2}{1 + m^2} \right] m^h , \qquad h = 0, 1, 2, \ldots . \qquad (2.5.12)$$

If we substitute α_1 = −(m_1 + m_2) and α_2 = m_1 m_2 into (2.5.8), we can express the variance as a function of the roots:

(2.5.13)


Equation (2.5.13) together with (2.5.10) and (2.5.12) may be used to express the covariance function in terms of the roots.

Example 2.5.1. Figure 2.5.1 contains a plot of the correlation function, sometimes called the correlogram, for the time series

$$X_t = 1.40 X_{t-1} - 0.48 X_{t-2} + e_t .$$

The roots of the auxiliary equation are 0.8 and 0.6. The correlogram has much the same appearance as that of a first order autoregressive process with large ρ. The empirical correlograms of many economic time series have this general appearance, the first few correlations being quite large.

The correlogram in Figure 2.5.2 is that associated with the time series

$$X_t = X_{t-1} - 0.89 X_{t-2} + e_t .$$

Figure 2.5.1. Correlogram for X_t = 1.40X_{t−1} − 0.48X_{t−2} + e_t.

Figure 2.5.2. Correlogram for X_t = X_{t−1} − 0.89X_{t−2} + e_t.

The roots of the auxiliary equation are the complex pair 0.5 ± 0.8i. In this case the correlogram has the distinctive "declining cyclical" appearance associated with complex roots. This correlogram illustrates how such a process can generate a time series with the appearance of moderately regular "cycles." In the present case cos θ = 0.53 and θ = 0.322π, where cos θ is defined in (2.5.11). Thus the apparent period of the cyclical behavior would be 2(0.322)^{-1} = 6.2 time units. AA
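The correlogram of Figure 2.5.2 and the apparent period can be reproduced from the Yule-Walker recursion and the roots of the auxiliary equation. The Python sketch below (an illustration only) generates ρ(h) recursively and recovers the modulus of the complex roots and the period of about 6.2 time units.

```python
import numpy as np

a1, a2 = -1.0, 0.89     # X_t = X_{t-1} - 0.89 X_{t-2} + e_t in the chapter's sign convention

# rho(h) from rho(0) = 1, rho(1) = -a1/(1 + a2), and the Yule-Walker recursion
rho = [1.0, -a1 / (1.0 + a2)]
for h in range(2, 25):
    rho.append(-a1 * rho[h - 1] - a2 * rho[h - 2])
print(np.round(rho[:13], 3))

m = np.roots([1.0, a1, a2])             # complex pair 0.5 +/- 0.8i
theta = np.angle(m[0])
print(round(abs(m[0]), 3), round(2 * np.pi / abs(theta), 1))   # modulus and period near 6.2
```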

2.6. ALTERNATIVE REPRESENTATIONS OF AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES

We have argued that the stationary second order autoregressive time series can dso be represented as an infinite moving average time series. We now prove that a time series satisfying a pth order difference equation has a representation as an infinite moving average.


Theorem 2.6.1. Let {X_t} be a time series defined on the integers with E{X_t²} < K for all t. Suppose X_t satisfies

$$X_t + \sum_{j=1}^{p} \alpha_j X_{t-j} = e_t , \qquad t = 0, \pm 1, \pm 2, \ldots , \qquad (2.6.1)$$

where {e_t}, t = 0, ±1, ±2, ..., is a sequence of uncorrelated (0, σ²) random variables. Let m_1, m_2, ..., m_p be the roots of

$$m^p + \sum_{j=1}^{p} \alpha_j m^{p-j} = 0 , \qquad (2.6.2)$$

and assume |m_i| < 1, i = 1, 2, ..., p. Then X_t is covariance stationary. Furthermore, X_t is given as a limit in mean square by

$$X_t = \sum_{j=0}^{\infty} w_j e_{t-j} , \qquad (2.6.3)$$

where {w_j}_{j=0}^{∞} is the unique solution of the homogeneous difference equation

$$w_j + \alpha_1 w_{j-1} + \cdots + \alpha_p w_{j-p} = 0 , \qquad j = p, p+1, \ldots , \qquad (2.6.4)$$

subject to the boundary conditions w_0 = 1 and

$$w_j + \sum_{i=1}^{j} \alpha_i w_{j-i} = 0 , \qquad j = 1, 2, \ldots, p-1 .$$

The limit (2.6.3) is also an almost sure limit.

Proof. LetY,=(X,,X,-, ,..., XI-p+l)’r r,=(e,,O, ..., 0)’,and

A =

-a, -a2 -a3 . - - - ap-l -up’

1 0 0 ..* 0 0

0 1 0 0 0

0 0 0 * . * 1 0 .

(2.6.5)

Then

r-1


for n 9 1 and t = O , t l , r t 2 ,... . Therefore,

where A:ij) is the ijth element of the matrix A'. Consider the detenninant

0 -1 A * * . 0 0 . . . . . .

0 0 0 * * ' -1 A

Beginning with column 1, successively multipIy column i by A and add the result to column i + 1 to obtain

0 ... -1 0 0

0 ... 0 -1 0

0 ... 0 0 0

= ( r \ P + a , A p - ' + . * * + a p - l ~ + aP)(-1)'"-' .

By the Cayley-Hamilton theorem,

AP + a,AP-' + * * + ap-, A + apI = 0 .

For j P p multiply the matrix equation (2.6.6) by A'-' to obtain

A' + a,AJ-' + . . .

Set wj = A:, ,), the 6rst element of A'. Then

w, + aI wj- , + q w j - 2 + - * - + apwj-p = 0 , j * p .

(2.6.6)

(2.6.7)


It i s readily verified that wj = A{ l satisfies the initial conditions associated with (2.6.4) in the statement of the theorem, Therefore,

P n- I

and

for n = I , 2, . . . . Because each element of A" satisfies the homogeneous difference equation (2.6.7), we have

A { ! ~ ) i- a r , ~ { ; , . ; + - +- a,~{;: = o for j a p ,

and there exists a c such that lA{li)l <cA' for i = 1,2,. . . , p, where 1 >A>M and M is the Iargest of the absolute values of the roots m,. See Exercise 2.24. It follows that

E [ (x, - 2 Wje'-,)'] c ( C P A ~ ~ K J-0

+ O

as n 4 0 3 , because 0 < A < 1. Because ZJm=,, I w , ~ < m, X, Now,

is covariance stationary.

C cpAnK' l 2 ,

and it follows that the limit (2.6.3) holds almost surely. A

Because X, can be written as a weighted sum of current and past e,'s, it follows that c, is uncorrelated with X,+, for h = 1,2, . . . . This enables us to derive many useful properties of the autoregressive time series and of the autocovariance function of the autoregressive time series.

Corollary 2.6.1.1. Let X_t be stationary and satisfy

$$X_t + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} = e_t ,$$

where the e_t are uncorrelated (0, σ²) random variables and the roots of the characteristic polynomial

$$m^p + \alpha_1 m^{p-1} + \cdots + \alpha_p = 0$$

are less than one in absolute value. Then

$$\gamma(0) + \alpha_1 \gamma(1) + \cdots + \alpha_p \gamma(p) = \sigma^2$$

and

$$\gamma(h) + \alpha_1 \gamma(h-1) + \cdots + \alpha_p \gamma(h-p) = 0 , \qquad h = 1, 2, \ldots . \qquad (2.6.8)$$

Proof. Reserved for the reader. See equation (2.5.6). A

The system (2.6.8) defines the a’s in terms of the y’s and vice versa. For example, for a stationary third order autoregressive process, we have

and

(2.6.9)

Using the representation results, we show that the partial autocorrelation function, introduced in equation (1.4.12). of a pth order autoregressive process is zero €or k > p.

Corollary 2.6.1.2. Let X_t be a stationary pth order autoregressive process with all of the roots of the characteristic polynomial less than one in absolute value, and let φ(k) be the partial autocorrelation function. Then

$$\phi(k) = 0 , \qquad k = p+1, p+2, \ldots .$$

Proof. By the definition of the partial autocorrelation, φ(p+1) is the correlation between the residual obtained in the population regression of X_t on X_{t−1}, X_{t−2}, ..., X_{t−p} and the residual obtained in the population regression of X_{t−p−1} on X_{t−1}, X_{t−2}, ..., X_{t−p}. By the definition of the autoregressive process, the residual obtained in the population regression of X_t on X_{t−1}, X_{t−2}, ... is e_t, and the coefficients are −α_i, i = 1, 2, ..., p. By the representation of Theorem 2.6.1, e_t is uncorrelated with X_{t−i}, i ≥ 1, and hence with any linear combination of the X_{t−i}, i ≥ 1. Therefore, the residual correlation of (1.4.11), and hence φ(k), is zero for k = p + 1. If φ_i = −α_i for i = 1, 2, ..., p and φ_i = 0 for i = p + 1, ..., k, the same argument applies, and the result follows. A
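The vanishing of φ(k) for k > p can be checked numerically with the Durbin-Levinson recursion, which computes the partial autocorrelations from the autocorrelations. The Python sketch below (an illustration, not part of the text; the function name is arbitrary) applies the recursion to the autocorrelations of a second order autoregressive process and returns values that are zero beyond lag 2.

```python
import numpy as np

def pacf_from_acf(rho):
    """Durbin-Levinson: partial autocorrelations phi(k) from rho(0), rho(1), ..."""
    phi_prev, pacf = [], []
    for k in range(1, len(rho)):
        if k == 1:
            phi_k = [rho[1]]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            a = num / den
            phi_k = [phi_prev[j] - a * phi_prev[k - 2 - j] for j in range(k - 1)] + [a]
        pacf.append(phi_k[-1])
        phi_prev = phi_k
    return pacf

# autocorrelations of the AR(2) process X_t = 1.4 X_{t-1} - 0.48 X_{t-2} + e_t
a1, a2 = -1.4, 0.48
rho = [1.0, -a1 / (1 + a2)]
for h in range(2, 8):
    rho.append(-a1 * rho[h - 1] - a2 * rho[h - 2])
print(np.round(pacf_from_acf(rho), 4))   # phi(1), phi(2) nonzero; later values are 0
```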

Whenever discussing stationary autoregressive time series, we have assumed that the roots of the auxiliary equation were less than one in absolute value. This is because we have explicitly or implicitly visualized the time series as being created in a forward manner. If a time series is created in a forward manner, the effect of the initial conditions will go to zero only if the mts are less than one in absolute value. For example, if we define

(2.6.1 1)

where it is understood that X, is formed by adding e, to pX, - and e, is a sequence of uncorrelated (0, a’) random variables, then {X, : t = 0,1,2, . . .} is stationary for lpl C 1 and a, = (1 - pZ)-I” . However, for IpI 3 1, the time series formed by adding e, to pX,-, is nonstationary for all a,.

On the other hand, there is a stationary time series that satisfies the difference equation

$$X_t = \rho X_{t-1} + e_t , \qquad t = 0, \pm 1, \pm 2, \ldots , \qquad (2.6.12)$$

for |ρ| > 1. To see this, let us consider the stationary time series

$$X_t = 0.8 X_{t-1} + \varepsilon_t = \sum_{j=0}^{\infty} (0.8)^j \varepsilon_{t-j} , \qquad t = 0, \pm 1, \pm 2, \ldots , \qquad (2.6.13)$$

where the ε_t are uncorrelated (0, σ²) random variables. If we change the direction in which we count on the integers, so that t becomes −t and t − 1 becomes −t + 1, and divide (2.6.13) by 0.8, we have

$$X_{t+1} = 1.25 X_t - 1.25 \varepsilon_t \qquad (2.6.14)$$

and

$$X_t = \sum_{j=0}^{\infty} (0.8)^j \varepsilon_{t+j} , \qquad t = 0, \pm 1, \pm 2, \ldots . \qquad (2.6.15)$$

By setting e_{t+1} = −1.25ε_t, equation (2.6.14) can be written in the form (2.6.12) with ρ = 1.25.

The covariance function for the time series (2.6.15) is the same as the covariance function of the time series (2.6.13). Thus, if a time series {X, : t E (0, 2 1, 22, . . .)) has an autocorrelation function of the form plh’, where 0 < lpl< 1, it can be written as a forward moving average of uncorrelated random variables

Page 87: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

64 MOVING AVERAGE AND AUTOREGRESSIVE F'RocesSeS

or as a backward moving average of uncorrelated random variables. Likewise,

XI - pXl-, =e l (2.6.16)

defines a sequence of uncorrelated random variables, as does

x; - p - ' x t - , =z;. (2.6.17)

From the representations (2.6.13) and (2.6.15), one sees that the only stationary time series with a unit root is the trivial time series that is a constant (with probability one) far all t. See Exercise 2.19.

While both the e, of (2.6.16) and the Z, of (2.6.17) are uncorrelated random variables, the variance of Z, is larger than the variance of the e, by a factor of p-*. This explains why the representation (2.6.16) is the one that appears in the applications of stationary autoregressive processes. That is, one typically chooses the p in an equation such as (2.6.16) to minimize the variance of e,. It is worth mentioning that nonstationary autoregressive representations with roots greater than or equal to one in absolute value have appeared in practice. Estimation for such time Series is discussed in Chapter 10. That the stationary autoregressive time series can be given either a forward or a backward representation also finds some use, and we state the result before proceeding.

cordlary 2.6.13. Let the covariance function of the time series {X,: t E (0,+1, 52, . . .)) with zero mean satisfy the difference equation

P

where q, = 1 and the mots of the characteristic equation

are less than one in absolute value. Then X, satisfies the stochastic difference equation

n

where {el} is a sequence of umrrelated (0,a2) random variables, and also satisfies the stochastic difference equation

where {u,} is a sequence of uncorrelated (0,a2) random variables.

Page 88: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ALTERNATIVE REPRESENTATIONS OF MOVING AVERAGE PROCESSES 65

Proof. Omitted. A

Having demonstrated that a stationary finite autoregressive time series can be given an infinite moving average representation, we now obtain an alternative representation for the finite moving average time series. Because the finite moving average tim series can be viewed as a difference equation in e,, we have a result parallel to that of Theotem 2.6.1.

beo or ern 2.6.2. Let the time series (X, : t E (0, t 1, . . .) be defined by

X,=e,+b,e , - , +b2e , - ,+ . . .+b ,e , - , , t = 0 , & 1 , t 2 ,...,

where b, Z 0, the roots of the characteristic equation

are less than one in absolute value, and {e,} is a sequence of uncorrelated (0, d) random variables. Then X, can be expressed as an infinite autoregressive process

m

Z C ~ X , - ~ = e, , j - 0

(2.6.18)

where the coefficients cj satisfy the homogeneous difference equation

cj + b,cj- , + b,c,-, + - + bqcj-, = O , j =q. q + 1,. . . , (2.6.19)

with the initial conditions co = 1, c , = -b, ,

c,= -b,c, - b, ,

cq- , = -b,c,_, - b,cq-, - * * - bq-, *

Proof. Reserved for the reader. A

The autocorrelation function of the pth autoregressive process is nonzero for some integer greater than No for all No. See Exercise 2.33. By Corollary 2.6.1.2, the partial autocorrelation function of the pth order autoregressive process is zero for argument greater than p. The opposite conditions hold for the autocorrelations of the moving average process.

Comflsry 2.6.2.1. Let the time series X, be the qth order moving average defined in Theorem 2.6.2. Let No be given. Then there is some k > No such that the partial autocorrelation function 4(&) # 0.

hf. By our definition, X , = X:,, b,e,+, where b, = 1 and 6, # 0. If Qyk) =

Page 89: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

66 MOVING AVERAGE AND AUTORGORESSIVE PROCESSES

0 for all k >No, then the coefficients cj in Theorem 2.6.2 must be zero for j > No. By Theorem 2.6.2 the coefficients c, satisfy the difference equation (2.6.19). If al l cj are equal to zero for j >No, then cNo must be zero if it is to satisfy (2.6.19). Because bq # 0, this leads to the conclusion that all c, are zero, which contradicts

). A the initial condition for (co, c I , . . . , cq

In discussing finite moving averages we placed no restrictions on the co- efficients, and, for example, the time series

U, = e, - (2.6.20)

is clearly stationary. The root of the auxiliary equation for th is difference equation is one, and therefore the condition of Theorem 2.6.2 is not met. An attempt to express e, as an autoregressive process using that theorem will fail because the remainder associated with an autoregressive representation of order n will be ef-"-I*

Time series satisfying the conditions of Theorem 2.6.2 are sometimes called invertible moving averages. From the example (2.6.20) we see that not all moving average processes are invertibfe.

In our earlier discussion of the moving average time series we demonstrated that we could always assign the value one to the coefficient of e,. We were also able to obtain all autocorrelation functions of the first order moving average type for an a restricted to the range [- 1.11. We are now in a position to generalize this resuit.

Theorem 2.6.3. Given a time series XI with zero mean and autocorrelation function

1, h = O , p,(h)= dl), h = l , I 0 , h > l ,

where Ip(l)l< 0.5, there exists an a, 1.1 < 1, and a sequence of unmW random variables {e,) such that X, is defined by

XI = e l + cue,-, . pro~f. The equation ~ ( 1 ) = (1 + a2)-'a in a has one root that is less than

one in absolute value and one that is greater than one in absolute value. The root of smaller absolute value is chosen, and we define el by

By Theorem 2.2.1 this random variable is well defined as a limit in squared mean,

Page 90: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ALTERNATIVE REPRESENTATIONS OF MOVING AVERAGE PROCESSES 67

and by Theorem 2.2.2, m W

m m

X(h) = (-a)h'2j%(0) f ( I + a2)(-a)h-'+ZJ %(I) J"0 j = O

( I + a2)(-a)h-' a

1 - a 1 -a2 = 0 , h > O . A

The dualities between autoregressive and moving average representations established in Theorems 2.6.1 and 2.6.2 can also be described using formal operations with the backward shift operator. We recall that the first order autoregressive time series

K - PK- , = (1 - P W K = e, . / P I < 1 ,

can be written as

where it is understood that 53' is the identity operator, and we have written

( I - p a ) - ' = 2 (Bp)' 150

in analogy to the expansion that holds for a real number IpSeI 4 1. The fact that the operator can be formally manipulated as if it were a real

number with absolute value 1 furnishes a useful way of obtaining the alternative expressions for moving average and autoregressive processes whose characteristic equations have roots less than one in absolute value. Thus the second order autoregressive time series

XI + + a2Xl-, = e,

can also be written as

or

Page 91: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

68

or

MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

where rn, and m2 are the roots of the characteristic equation

rn2 + a,m + a, = 0 .

Since the roots of the quadratic in 98

1 + a , 9 + a , 9 ~ = 0

are the reciprocals of the mots of the characteristic equation, the restriction on the mots is often stated as the requirement that the roots of

I + a , B + a2i@' 50

be greater than one in absolute value. Using the backward shift operator, the invertible second order moving average

Y, = e, + b,e,- , + b 2 q 2

can be given the representation

y,=(l + b , B +b,S2)e,

or

r, = (1 - 8, W(l - g, w e , 5

g2 + b,g +b2 = o , where g, and g,, the roots of

are less than one in absolute value. The autoregressive representation of the moving average process is &en given by

The backward shift representation of the moving average process is useful in illustrating the fact that any finite moving average process whose characteristic equation has some roots greater than one and some less than one can be given a representation whose characteristic equation has all roots less than one in absolute value.

Theorem 2.6A. Let {X, : t E (0, f 1,22. . . .)} have the representation "

Page 92: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ALTERNATIVE REPRESENTATIONS OF MOVING AVERAGE PROCESSES 69

where the e, are uncomlated (0,~') random variables and none of the mi, i = 1.2,. . , , p, are of unit absolute value. Let rn,,m,, . . . , mL, O < L G p , be greater than one in absolute value. Then X, also has the representation

where the q are uncorrelated (0, cr'[IIf=, mil2).

prod. We define the polynomial 2( 8) in 3 by

The mots of 2(3) are greater than one in absoiute value, and the roots of mP$(m-') are less than one in absolute! value. Therefore the time series

where the wj satisfy the difference equation

and the initial conditions outlined in Theorem 2.6.2, is well defined. Let mi be a real root greater than one in absolute value, and &fine 2, by

Then the two time series Z, -m,Zf-l and ml(Zf - m ; ' Z , - , ) have the same covariance function. Likewise if m , and m, = rn: are a complex conjugate pair of mts with absolute value greater than one, 2 - (m, + mz)Z,- , + Im, 122f-z has the same covariance function as Im,1*[z, - (m;' + m ; l ) Z , - , + I~,~-~z,-~I. By repeated application of these arguments, we conclude that the original time

S e r i e s

Page 93: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

70

and the time series

MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

have the same covariance function. Therefore,

9 = 3 - I ( 3 ) X ,

and

have the same covariance function.

As an example of this theorem, the moving average

X, = el - 3.2er-, - 3.2e,-, = (1 - 4.0B)(l f 0.83)e1 ,

where the e, are uncorrelated (0, d), also has the representation

XI = e, + 0.55~,-, - 0.20~~,_,

= (1 - O.WSe)(l+ 0.8@)% ,

A

where the 6, are uncorrelated (0,160~). The reader may check that ~ ( 0 ) . ~ ( l ) , and ~ ( 2 ) an 21.48az, 7.04~~. and -3.20a2, respectively, for this moving average.

2.7. AUTOREGRESSIVE MOVING AVERAGE TIME SERIES

Having considend autoregressive and moving average processes, it is natural to investigate time series defined by the combination of low order autoregressive and moving average components. The sequence {XI : t E (0,L 1,22 , . . .)), defined by

where ap # 0, bq # 0, and (el} is a sequence of unconelated (0, a') random variables, is called an autoregressive moving average time series of order ( p , q), which we shall abbreviate to autoregressive moving average ( p , 4). The notation ARMA(p, q) is also commonly used.

The autoregressive moving average (1, l),

(2.7.2)

Page 94: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ALTOREGRESSIVE MOVfNG AVERAGE TIME SERIES 71

where lf?,l< 1, has furnished a useful approximation for some time series encountered in practice. Note that we have defined the coefficient on X,- I as -6, to simplify the representations to follow. If we let

u, = e, + b,e,-l

and write

x, = e,x,-, + u, ,

we can express XI of (2.7.2) as an infinite moving average of the u,, m

and hence of the el,

m

=el + (6, + b , ) x 6{-ier-J, j = I

(2.7.3)

where the random variables are defined by Theorem 2.2.1. If lb, I < 1, we can also express e, as an infinite moving average of the u,,

and hence of the X,, I)

e,=X,-(6, + b l ) x (-bi)’-’X,-j,

If we let 2, denote the first order autoregressive process

j - I (2.7.4)

then

X, =Z, + b ,Z , - , . (2.7.5)

That is, the time series can be expressed as a first order moving average of a first order autoregressive time series. Using this representation, we express the autocovariance function of X, in terms of that of Z,,

Page 95: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

72

to obtain

MOVING AVERAGE AND AUTOREGRESSIVE PRWSSES

Hence,

and

Thus, for h 3 1, the autocorrelation function of the autoregressive moving average ( 1,l) has the same appearance as that of a first order autoregressive time series in that it is declining at a geometric rate where the rate is 6,.

From equation (2.7.7) we see that ~ ( h ) is zero for h 3 1 if b, = -el. The autoregressive moving average (1,l) prucess reduces to a sequence of uncorre- lated random variables in such a case, which is also clear from the representation in (2.7.3).

To avoid this kind of degeneracy, the autoregressive moving average of order (p, q) is often defined with the condition that none of the roots of

are roots of

mq + 6,mQ-' + * 4- 6, = O . (2.7.9)

The autoregressive and moving average representations of (2.7.3) and (2.7.4) generalize to higher order processes in the obvious manner. We state the generalizations as theorems.

Theorem 2.7.1. Let the stationary time series X, satisfy

(2.7.10)

where a, = b, = 1, the roots of (2.7.8) are less than one in absolute value, and {e,} is a sequence of uncorrelated (0.a') random variables. Then X, has the

Page 96: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOREORESSIVE MOVING AVERAGE TIME SERIFS

representation

XI = C 0: yer-/ '

j = o

73

(2.7.1 1)

whereq,=l,v=Oifj<O,and

P

y = bj - C a i y - , i - I

with b j = O f o r j > q .

Proof. writing

P

C a 7 f - j = u, , j = O

we obtain

m m / a \

XI = 2 W ] U , - ~ = C w j ( , x bit-,,,) J 'o j - 0 i=o

where u, = Zymo hie,+ and the wj are defined in Theorem 2.6.1. Since the wj are absolutely summable, this representation is well defined as a limit in mean square. We then write

a D \

and obtain the result by equating coefficients of e, +, i = 0.1, . . . . A

The first few u, of Theorem 2.7.1 can be used to compute the first few covariances of the process. If we multiply the defining equation of Theorem 2.7.1 by for r = 0, 1, . . . , p and take expectations, we have

where it is understood that the summation on the right of the equality is zero for r > q.

Page 97: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

74 MOVING AVERAQE AND AUTDREORESSIVE PROCESSES

Theorem 27.2. Let the stationary time series XI satisfj D a

where u0 = b, = 1, the roots of (2.7.8) and of (2.7.9) are less than one in absolute value, and {e,} is a sequence of uncorrelated (0, cr') random variables. Then XI has the representation

(2.7.12)

wheredo=l,dj=Oforj<O, and 0

d, = - 2 bidj-, + uj , , = I

with a, = 0 for j >p.

Proof. Reserved for the reader. A

The backward shift operator is often used to give a compact representation for the autoregressive moving average. Let

A(m) = 1 + u,m + - - - + u p P

and

B(m) = 1 + b,m + * 3. b,m9

be the polynomials associated with the autoregressive moving average (2.7.10). Then we can write

A ( 8 ) x , =wb, (2.7.13)

or, if the roots of A(m) = 0 art! greater than one in absolute value,

XI = A-'( B)B( B)e, (2.7.14)

or, if the roots of B(m) = 0 are also greater than one in absolute value,

el = B-'( 8 ) A ( BP, . (2.7.15)

The representation (2.7.14) corresponds to (2.7.11), and the repmentation (2.7.15) corresponds to (2.7.12). Also see Exercise 2.26.

The equation (2.7.13) defines an autoregressive moving average constructed with el variables. By (2.7.14). we could apply the same moving average operator to any sequence. Thus, if C(m) and D(m) are polynomials whose roots are greater

Page 98: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VECTOR PROCESSES 75

than one in absolute value, and if XI is the time series of (2.7.13), we might define the new time series Y, by

or

v, = c- I ( @)o( a>x, = C-'( 46))D(@)A-'( B)B( a ) e ,

=G- ' (B)H(g)e, ,

(2.7.16)

where G( 93) = C( B)A( a) and H( B) = o( a)B( a). If there are any common factors in G(B) and H(99), they can be removed to obtain the standard representation. On the basis of this discussion, we might say that an autoregressive moving average of an autoregressive moving average is an autoregressive moving average.

2.8. VECTOR PROCESSES

From an operational standpoint the generalizations of moving average and autoregressive representations to the vector case are obtained by substituting matrix and vector expressions for the scalar expressions of the preceding sections.

Consider a first order moving average process of dimension two:

x,, = bttet,t-, + h 2 e 2 . r - I +el , * (2.8.1)

The system may be written in matrix notation as

XI = Be,-l +el,

where {e,} is a sequence of uncorrelated (0,X) vector random variables,

and we assume B contains at least one nonzero element. Then the autocovariance function of X, is

Page 99: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

76

and it follows that

MOVING AVERAGE AND AUTOREORESSIVE PROCESSES

fBI;B' + 2, h = 0, h = l , h = - 1 , r (h) =

l o otherwise.

Note that r(h) = P(-h) , as we would expect from Lemma 1.7.1. The qth order moving average time series of dimension k is defined by

where the Bj are R X k matrices with B, = I and at least one element of B, not zero, and the e, are uncomlated (0,C) random variables. Then

B,ZB,!+h, 0 6 h "4, i -0

r ( h ) = q f h

Bj-,ZB;, -4 ZS h 0, I j = O

(2.8.2)

10 9 I h h .

As with the scalar process, the autocovariance matrix is zero for In investigating the properties of infinite moving average and autoregressive

processes in the scalar case, we made extensive use of absolutely summable sequences. We now extend this concept to matrix sequences.

q.

Deftnition 2.8.1. Let {Gj}%, be a sequence of k X r matrices where the elements of G, are gi,,,(j), i = 1.2,. . . , k, m = 1,2,. . . , r. If each of the kr sequences {gi,,,(j)},m=l is absolutely summable , we say that the sequence {G,},m=, is absolutely summable.

The infinite moving average m

x, = 2 Bier-/ j -0

where the el are unconelated (0 .2) random variables, will be well defined as an almost sure limit if {B,} is absolutely summable. Thus, for example, if XI is a vector stationary time series of dimension k with zero mean and absolutely summable covariance function and {T.},m- is an absolutely summable sequence of k X k matrices, then

Page 100: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VECTOR PROCESSES 77

is well defined as an almost sure limit and

The vector autoregressive process of order p and dimension k is defined by

2 A,X,, = e, I

j-0 (2.8.3)

where the e, are uncomlated k-dimensional(0, Z) random variables and the Aj are k X k matrices with A, = I and A, # 0.

Any autoregressive process can be expressed as a first order vector process. Let Y, = (Xi, X I f - , , . . . , X,'-p+ ,)' and r, = (e:, 0,. . . , O)l, where the X, satisfy (2.8.3). Then we can write

Y, +AY,-, = e,,

This representation for the scalar autoregressive process was used in the proof of Theorem 2.6.1.

The process (2.8.3) can be expressed as an infinite one-sided moving average of the e, when the roots of the characteristic equation are less than one in absolute value.

Theorem 28.1. Let the stationary time series {X,: r E (0,+1,+2,. . .)} satisfy (2.8.3), and let the roots of the determinautal equation

lA,mP + A,mp-' f * - - + Apl = O (2.8.4)

be less than one in absolute value. Then X, can be expressed as an infinite moving average of the e,, say

m

Page 101: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

78 MOVING AVERAGE AND AUTOREGReSSIVE PROCESSES

where KO =I, K, = -Al ,

X2= -A,K, - A , ,

P

Proof. Essentially the same argument as used at the b e m g of Section 2.5 may be used to demonstrate that 2~~oKje , - j is the solution of the difference equation. Since the roots of the determinantal equation are less than one in absolute value, the sequence of matrices {K,) is absolutely summable and the random variables are well defined as almost sure limits. A

Given Theorem 2.8.1, we can construct the multivariate Yule-Walker equations for the autocovariances of the process. Multiplying (2.8.3) on the right by X;-$, s 3 0 and taking expectations, we obtain

D

where we have used

2 Ajr(j) = 2, i = o

These equations are of the same form as Corollary 2.6.1.1, but they contain a larger

Theorein 2&2. Let the time series XI

s = o ,

(2.8.5)

s = 1,2,. . . ,

s = o , otherwise.

those given in equation (2.5.6) and number of elements.

be defined by

where the e, are uncorreIated k-dimensional vector random variables with zero mean and covariance matrix 2, the B, are k X R matrices with B, = I, B, # 0. and the kq roots of the detenninantal equation

IB,m9 + Blmq-l + * * + Bgl = 0 (2.8.6)

are less than one in absolute value. Then XI can be expressed as an infinite

Page 102: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDlCT'ION 79

autoregressive process m

e, = C c,x,-, , j - 0

where the C, satisfy C,=I, C, = -Bl,

C2=-B,C, - B 2 ,

Proof. Reserved for the reader. A

It can also be demonstrated that an autoregressive moving average of the form

where the roots of (2.8.4) and of (2.8.6) an less than one in absolute value, can be given either an infinite autoregressive or an infinite moving average representation with matrix weights of the same form as the scalar weights of Theorems 2.7.1 and 2.7.2.

2.9. PREDICTION

One of the important problems in time series analysis is the following: Given n observations on a realization, predict the (n + s)th observation in the realization, where s is a positive integer. The prediction is sometimes called theforecast of the (n + s)th observation. Because of the functional nature of a realization, prediction is also called extrapotation.

In some areas of statistics the term predictor is applied to a function of observations used to approximate an unknown random quantity, and the term estimator is applied to a function of observations used to approximate an unknown Jixad quantity. Some authors use the term estimator to describe the function in both situations, In time series, the tern $her is often used for the function of observations used to approximate an unknown value of the random process at time c. If only values of the process prior to t are used in the approximating function for the time series, the function is called a predictor. We adopt this use and apply the term predictor to functions used to approximate a future term of the realization. We will use estimator as a generic term for a function of observations used to approximate either a fixed or a random quantity.

Page 103: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

80 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

In order to discuss alternative predictors, it is necessary to have a criterion by which the performance of a predictor is measured. The usual criterion, and the one that we adopt, is the mean square e m of the predictor. For example, if ?,+,(Y,, . . . , Y,) is the predictor of Y,+, based on the n observations Y,, Y,, . . . , Y,, then the mean square m (MSE) of the predictor is

Generally, the problem of determining optimal predictors requires that the class of predictors be restricted. We shall investigate linear predictors.

The best linear predictor for a stationary time series with k n o w covariance function and known mean is easily obtained.

Theorem 2.9.1. Given n observations on a realization of a time series with zero mean and known covariance matrix, the minimum mean square error linear predictor of the (n + s)th, s = 1,2, . . . , observation is

f,+,(Y,, . . . , Y,) = y'b,, , (2.9.2)

where y' = (Y,, Y,, . . . , Y,), V,, = Eby ' } ,

and V,+, is a generalized inverse of V,,. [See Rao (1973) for a discussion of generalized inverses.] The mean of the prediction e m r is zero, and the variance of the prediction error is

(2.9.3)

Proof. To find the minimum mean square error linear predictor, we minimize

with respect to the vector of coefficients b. Differentiating the quadratic form with respect to b and equating the derivative to zero, we find that a b that minimizes the expectation is b,, = V;,V,,. While b,, is not necessarily unique, the predictor y'b,, is unique. See Exercise 2.42. The mean of the prediction error is zero because the mean of the process is zero and

Page 104: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTION 81

The variance of the prediction e m r follows immediately from least squares regression theory. A

To understand why the generalized inverse was required in Theorem 2.9.1,

XI = el cos m , (2.9.4)

where e, is normally distributed with zero mean and variance u2. This time series is perfectly periodic with period two and covariance function ~ ( h ) = u2 cos wh. The matrix V,, for a sample of three Observations is

consider the time series

(2.9.5)

which is easily seen to be singular. Thus there are many possible linear combinations of past values that can be used to predict fume values. For example, one can predict X, by -X,- I or by X,-2 or by - +XI- .f 3X1-,.

The time series of equation (2.9.4) is a member of the class of time series defined by

X, = e l cos At+e, sin At , (2.9.6)

where ei, i = 1,2, are independent (0, a') random variables and A E (0, TI . Given A and two observations, it is possible to predict perfectly all future observations in a given realization. That is, for n a 2,

L1=l 1 - 1 J Lr-1

and all future observations can be predicted by substituting these e's into the defining equation (2.9.6).

This example illustrates that in predicting future observations of a realization from past observations of that realization, the element oEf l is fixed. In the current example, o€a is a two-dimensional vector and can be perfectly identified for a particular realization by a few observations on that realization. In the more common situation, w E $2 is infinite-dimensional, and we can never determine it perfectly from a finite number of observations on the realization.

Time series for which it is possible to use the observations q, Y,-, , . . . of a realization to predict f u h observations with zero mean square e m r are called deterministic. They are also sometimes called singular, a term that is particularly meaningful when one looks at the covariance matrix (2.9.5). Time series that have a nonzero lower bound on the prediction error are called regular or nodeterninis- tic.

Page 105: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

82 MOVING AVERAGE AND AUTOReORESSIVE PROCESSES

DefZnition 2.9.1. A time series is nonshgular (regular, nondetenninistic) if the sequence of mean square e m of one-period prediction, T : ~ , is bounded away from zero. A time series is singular (u'eteministic) if

2 lim rf l l = 0 . n - w

The vector of coefficients in Theorem 2.9.1 for one-period predictions can be computed recursively for nondeterministic time series. The recursive procedure is known as the Durbin-bvinson algorithm. See Durbin (1960) a& Morettin (1984).

Theorem 2.9.2. Let be a zero mean, covariance stationary, nondetenninis- tic time series defined on the integers. Then the coefficients defining the predictor of Yn+l given in (2.9.2) of Theorem 2.9.1 satisfy

(2.9.7)

(2.9.8)

and

forn=1,2, ... .

prool. It follows from the symmetry of the covariance function of a stationary process that the coefficients of the predictor of Yn based upon (Y,, . . . , Y n V l ) are the coefficients of the predictor of Y, based upon (q, . . . , Y2). That is,

and n- I

Page 106: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICl’ION 83

where f,(Y,,. . . , Y2) is the predictor of Y, based on (Y2,. . . , Y,) and

is the vector of (2.9.2) for s = 1. The coefficient vector b,-,,, is unique because the covariance matrix of a nondeterministic time series is positive definite.

The best predictor of Y, + based on (Y , , . , . , Y,) can be constructed from the regression of Y , + ~ on the n-dimensional vector [Y~ - f , ( ~ , , , . . . , Y,), Y~,. . . , Y,I. By stationarity, the coefficients for the regression of Y,+, on (Y2, , , . , Y,) are the same as the coefficients for the regression of Y, on (Yl,. . . , Y,- , ) . Because ?l(Yn, . . . , Y2) is the best linear predictor, Y, - f, (Y,, , , . , Y2) is uncorrelated with (Y2,. . . , Y,), and the regression coefficient for Y, - f l (Y , , . . . , Yz) is

Therefore,

and we have the result (2.9.7). The e m r mean square for the prediction of Y,+, based on Y, , Yz, . . . , Y, is equal to the prediction variance for the prediction of Y,+ , based on Y2, Y,, . . . , Y, reduced by the sum of squares due to Y, after adjusting for Y2, Y3,. .., Y,. The variance of the prediction e m for ?,,+l(Y,, . . . , Y,) is 7,-,,,, and b:,],, is the squared conelation between Y,+, -

A 2

fn+ l (Y2 , . . . , Y,) and Y, - f ,(Y,, . . . , Y,), giving the result (2.9.9).

Theorem 2.9.1 gives a quite general solution to the problem of linear prediction, but the computation of n X n generalized inverses can become inconvenient if one has a large number of predictions to make. Theorem 2.9.2 presents a recursive method of computing the coefficients of the lagged Y‘s required for prediction. However, in general, these computations also become sizable as n increases. Fortunately, the forecast equations for autoregressive processes are very simple, and those for invertible moving average processes are fairly simple.

We begin with the first order autoregressive process, assuming the parameter to be known. Let the time series Y, be defined by

Y, = P Y , - ~ + e f t [ P I < 1 ,

where the e, are independent (0, a’) random variables. We wish to construct a predictor of Y,+l =pY, f e n + , . Clearly, the “best predictor” of pY, is pY,.

Page 107: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

84 MOMNQ AVERAGE AND AUTOREGReSSIVE PROCESSES

Therefore, we need only predict e,,,. But, by assumption, c independent of Y,,, Yn-,, , . . . Hence, the best predictor of en+, is zero, si I dcting en+, is equivalent to predicting a random selection from a distribution with zero mean and finite variance. Therefore, we choose as a predictor of Yn+] at time n

P,+,(Y,, . . ., Y,) = PY, . (2.9.10)

Notice that we constructed this predictor by finding the expect& value of Yn+l given Y,, Yn-,, . . . , Y l . Because of the nature of the time series, knowledge of Y,-,, Yn+. . . , Y, added no information beyond that contained in knowledge of Y,. It is clear that the mean square error of this predxtor is

E{EY"+, - ~,+I(Y,, . . , ,yR)J2}=~{e,2+1}=~*.

To predict more than one period into the future, we recall the representation s

Y,+, = P'Y, + 2 pf-je,,,+, , s = 1,2,. . . . (2.9. I 1)

It follows that, for independent el, the conditional expectation of Y,,, given Yls Yz,. . . , Y, is

I - 1

aY,., I Y,, *. * , Y,l=P"R 9 (2.9.12)

and h i s is the best predictor for &+,.

autoregressive process be defined by These ideas generalize immediately to higher order processes. Let the pth order

where the roots of

are less than one in absolute value and the el are uncorrelated (0, a') random variables. On the basis of the arguments for the first order process we choose as the one-period-ahead hear predictor for the stationary pth order autoregressive Pmcms

P

~ , + , ( Y , , . . ., Y,)= c t y , + I _ J * (2.9.13) j - 1

It is clear that this predictor minimizes (2.9.1), since the prediction erro~ is eAsl and this is uncorrelated witb Y, and all previous q ' s . If the e, are independent random variables, the predictor is the expected value of Y,+] conditional on Y,, Y,- I , . . . , Yl . If the e, are only uncorrelated, we cannot make this statement. (See

Page 108: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDIO'ION 85

Exercise 2.16). To obtain the best two-period linear prediction, we note that

where it is understood that eP+, = 0 for i = 1,2,. . . and, to simplify the subscripting, we have taken p 2 2. Since en+2 and en+, are uncorrelated with (Yt , Y,, . . . , Y,), the two-period predictor can be constructed as

P

In general, we can build predictors for s periods ahead by substituting the predictors for earlier periods into (2.9.13). Hence, for p b 2,

fn+,(Y,, . . . , Y,) = c qfn-,+JY,, . . . , Y,) + c qY,-,+, , s-1 P

s = 2,3, . . . , P , i-1 J'S

(2.9.15) D

= c, q f e - , + J Y I , . * - , Y,) 1 s = p + 1, p + 2,. . . . j - I

We now turn to prediction for the qth order moving average time series

Y, = C ge , - , + e, . j = 1

(2.9.16)

We assume S,, j = 1,2, . . . , q, are known. The pth order autoregressive pmcess expresses Y, as a linear combination of p previous Y's and e,, while the gth order moving average process expresses Y, as a linear cornbination of q previous e's and e,. In both cases, e, is uncomlated with previous Ys and uncorreiated with previous e's. Suppose we knew the e, for t = R, n - 1,. . . , R - q + I for the process (2.9.16). Then, by arguments analogous to those used for the auto- regressive process, the best linear predictor for s periods ahead would be

Page 109: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

86 MOVING AVERAGE AND AUTOREORESSIVE PROCESSES

If the el's are independent,

fn+$(e1, . . . , e,) = E{Y, + 8 I e l , e2, . . . en) . Because we do not know the e,, we must develop predictors based upon estimators of the e,, where the estimators of the e, are functions of the observed Y,. We first show that the predictions of Thearem 2.9.1 can be expressed in terms of the

The recursive computation is based on the Gram-Schmidt ortbgonalization. Brockweli and Davis (1991, p. 171) call the procedure the innovation algorithm, where the Z, are the innovations. We give the proof for stationary time series, but the result holds for more general covariance functions.

. . one-period prediction errors of previous pmhctions when U, is nollclefeRllllll stic.

Theorem 2.93. Let Y, be a zero mean, stationary, nondeterministic time series defined on the integers. Let

I- I . .

2, = U, - 2 c&, t=2 ,3 , . . . , j - I

(2.9.18)

(2.9.19) i - 1

K [ - ~ [ yY(t - i ) - c, ,cjj~;] , i = 2.3,. . . , f - 1 , c*i = j - I

and

Then the best predictor

and the variance of the

of Yn+, given (Y,, . . . , Y,) is

prediction error is

rK:+l. s = l ,

(2.9.20)

(2.921)

(2.9.22)

Proof. Because Z, is the residual from the regression of yi on (Zt-l,

Page 110: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTION 87

q-2, , . , , Z,), Z, is uncomlated with (Zip I , Zi-2,. . . , Zl), and hence the Z's are mutually uncorrelated. The covariance between U, and Zl for t > i is

Cov(u,, Z,) = cov

= 'K.(t - i ) - cov u,. z cyz, ( :I: ) ) ( r = t I- 1

I - 1 I - I

= r;(t - i ) - cov zI + Z ctrzr, t: ci,q i - 1

where we have used the zero correlation property of the Z,. Dividing Cov(Y,, Z,) by the variance of q, we obtain the expression for c,,. The variance of the prediction error is the regression error mean square, and by the zero correlation property we have (2.9.20).

Because of the m correlation property, the coefficients of (Z", . . . , Z,) in the regression of Ym on (Z8, . . . , 2, ) are the same as the coefficients of (Z,, . . . , Z, ) in the regression of Y,+, on (Z,,+s-l,. . . , Zl). The result (2.9.21) follows. The result (2.9.22) is an immediate consequence of (2.9.21) and (2.9.20). A

The recursive equations of Theorem 2.9.3 are particularly effective for predictions of finite moving average processes. Because the autocovariances of the qth order moving average are zero for h > q, the cfi of Theorem 2.9.3 are zero for i < t - q .

CoroUary 2.93. Let Y, be the qth order moving average

where the e, are uncorrelated (0.0') random variables. Assume that the Z,, cli, and ~ j ' of Theorem 2.9.3 have been constructed for j = 1.2,. . . , t - 1, where t - 1 > q. Then the cfi of Theorem 2.9.3 are

cf i=O, i < t - q , - 2

c:,:-q = K l - q x ( 4 )

and

f o r i = t - q + 1 , ..., t -1.

Page 111: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

88 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

Proof. omitted. A

If the roots of the moving average process are less than one in absolute value, then the 2, of Theorem 2.9.3 converges to e,, ~f converges to cr2, and qf - , converges to p, as t increases. For invertible moving averages, the coefficients are often approximated by the pi. Let (Zl ,Z2, . . . , gq) be initial estimates of (eI, e,, . . . , eJ, and let

4

z, = y, - z S,z,-j , t = q + I , q + 2 , ... I (2.9.23) i= 1

Then c?,, for n > q is

PI-1 n-1-0

where the cj are defined in Theotem 2.6.2. For an invertible moving average, the Icjl converge to zero as j increases, and Zn converges to e, in mean sqm as n increases. The initial values of (2.9.23) can be computed using Theorem 2.9.3 or by setting Z-q+l,i?-q+2,.,.,Zo equal to zero and using (2.9.23) for t = 1,2,. . . , q. Therefore, an easily computed s-pen'od predictor for the qth order invertible moving average, appropriate for large n, is

where E' is defined in (2.9.23). The prediction error for fa+ (Yl, . . . , Y,) goes to a2 as n increases.

The procedures for autoregressive and moving average processes are easily extended to the autoregressive moving average time series defined by

P P u, = 2 q v , , + c Bier-I + ef '

I- 1 i= 1

where the roots of

and of

Page 112: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTION 89

are less than one in absolute value and the e, are uncorrelated (0,a2) random variables. For such processes the predictor for large n is given by

s = l ,

s = 2.3,. .

(2.9.25)

. Where

ro 1 t = p, p - 1. . . . ,

and it is understood that 3 = 0 for j >p and 4

cp izn- i+s=o . s = q + l , q + 2 ,...,

' c ~ Y n - j + s = o , s = p + l , p + 2 ,....

i=s

P

j - a

The stationary autoregressive invertible moving average, which contains the autoregressive and invertible moving average processes as special cases, can be expressed as

m

where q, = 1, X,m'o lyl< 03, and the e, are uncorrelated (0, v2) random variables. The variances of the forecast errors for predictions of more than one period are readily obtained when the time seria is written in this form.

Theorem 2.9.4. Given a stationary autoregressive invertible moving average with the representation (2.9.26) and the predictor

n++--l .. . .

Un+,CY1,. . . , Yn> = c q,-,+, . j - s

where 2,. t = 1.2,. . . , n, is defined following (2.9.25). then

Page 113: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

m MOVING AVERAGE AND AUTOREGRESSIVE PRocesSES

Proof. The result follows immediately from Theorem 2.2.1. A

Using Theorem 2.9.4, we can construct the covariance matrix of the s predicton for Y,+,, YR+2,. . . , &+,. Let Y:.# = [Y,+I, Y , + 2 , . . . , Y,+,l, and let

*:.# = if,+,CY,, . * . f Y,), fR+2(YI, y2, * * * 9 %I,. . * , f,+,<Y,, . * * 1 Y,)l. Then

n-w l h E { V , . s - Pn.,XYn.s - q n . s I f I ... 1 Y VJ- I

I

u2. (2.9.27)

Note that the predictors we have considered are unbiased: that is,

w,,, - f,+,<y,, - * * * = 0

Therefore, the mean square e m of the predictor is the variance. To use this information to establish confidence limits for the predictor, we require knowledge of the distribution of the e,. If the time series is normal, confidence intervals can be constructed using normal theory. That is, a I - level confidence interval is given by

t+,v*, - 4 9 q)qJVar(Y,+, - P,+,<Y,, * * ' S Y")111/2 I

where r? is the value such that 1 of the probability of the standard normal distribution is beyond Ztv.

Example 29.1. In Example 6.3.1, we demonstrate that the U.S. quarterly seasonally adjusted unemployment rate for the period 1948-1972 behaves much likethetimeseries

U, - 4.77 = 1.54(U,-, - 4.77) - 0.67(U,-, - 4.77) + e, , where the e, are uncorrelated (0,O. 115) random variables. Let u8 assume that we know that this is the proper representation. The four observations for 1972 are 5.83, 5.77, 5.53, and 5.30. The predictions for 1973 are

~7, - l (Y72- l , . . . , Y72-1) = 4.77 + 1.54(0.53) - 0.67(0.76) = 5.08,

Page 114: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICYYON 91

f73-z(Y,z-t2. . -, Y72-4) = 4.77 + 1-54(0.3070) - 0.67(0*53) = 4.89,

Y7,-,(Y7,- I, . . . , Y72-4) = 4.77 + 1.54(0.1177) - 0.67(0.3070) = 4.75,

f73-4(Y7z-l,, . . , Y,2-4)=4.77 + 1.54(-0.0245)-0.67(0.1177)

= 4.65 . The covariance matrix of the prediction emrs can be obtained by using the

representation Y, = X,?wo w,e,-j, where the w’s are defined in Theorem 2.6.1. We have a, = - 1.54, az = 0.67, and

w , = l , w 1 = 1.54,

w2 = ( 1.54)’ - 0.67 = 1.702 ,

w, = 1.5q1.702) - 0.67(1.%) = 1.589.

Therefore, the covariance matrix of equation (2.9.27) given by the product

0.115 0.177 0.1% 0.183 0.177 0.388 0.478 0.477 0.1% 0.478 0.720 0.789 0.183 0.477 0.789 1.011

Where

1.59 1.70 1.54 1.00

We observe that there is a considerable increase in the variance of the predictor as s incresses from 1 to 4. Also, there is a high positive correlation between the predictors. AA

Example 29.2. In Table 2.9.1 are seven observations from the time series

U, = O.SU,_, + 0.7e,-, i- e, ,

where the e, are nonna( independent (0,l) random variables. The first 11 autocorrelations for this time series are, apjmximately, 1.00, 0.95, 0.85, 0.77, 0.69,0.62,0.56,0.50,0.45,0.41, and 0.37. The variance of the time series is about 14.4737. To construct the linear predictor for Y,, we obtain the seven weights by solving

Page 115: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

92 MOVING AVERAOE AND AUTOREGRESSIVE PROCESSES

Table 2.9.1. Observations and predictions for the time sen- Y, = 0.9x-, f O.7et-, + e,

I Observation Prediction

1 2 3 4 5 6 7 8 9

10

-1.58 -2.86 -3.67 -2.33

0.42 2.87 3.43 -

- 3.04 2.73 2.46

the system of quations

. (2.9.28)

0.95

This system of equations gives the weights for the one-period-ahead forecast. The vector of weights b;,l = (0.06, -0.18,0.32, -0.50,0.75, - 1.10, 1.59) is applied to (Ul, Y2, Y3, Y4, Y,, Y6, Y7). [In obtaining the solution more digits were used than are displayed in equation (2.9.28).] The predictor for two periods ahead is obtained by replacing the right side of (2.9.28) by (0.45,0.50,0.56,0.62,0.77, OM)', and the weights for the three-period-ahead predictor are obtained by using (0.41,0.45,0.50,0.56,0.62,0.69,0.77)' as the right side. The predictions using these weights are displayed in Table 2.9.1.

The covariance matrix of the prediction mrs is

1 1.005 1.604 1.444

( 1.444 3.907 5.637 E{(Y,,, - 'P7,3)(Y7 - 'P7,3)') = 1.604 3.563 3.907 ,

where 97,3 and Y7,3 are defined following Theorem 2.9.4. The covarianm matrix of prediction m r s was obtained as AVA', where V is the 10 X 10 covariance matrix for 10 observations and

-0.05 0.16 -0.29 0.45 -0.68 0.99 -1.44 0 1 0 . -0.06 0.18 -0.32 0.50 -0.75 1.10 -1.59 1 0 0

-0.05 0.14 -0.26 0.41 -0.61 0.89 -1.29 0 0 1 I A=[

Page 116: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDlCTlON 93

If we had an infinite past to use in prediction, the variance of the one-period- abead prediction error would be one, the variance of e,. Thus, the loss in prediction efficiency from having only seven observations is about 3%. Since the time series is normal, we can construct 95% confidence intervals for the predictions. We obtain the intervals (1.08, 5.00). (-0.97,6.43), and (-2.19,7.11) for tbe predic- tions one, two, and three periods ahead, respectively.

To construct a predictor using the &method, we set e', = O and calculate

Zz = Yz - 0.9Y, - 0.7Z1 = -2.86 - 0.9(- 1.58) = - 1.438, E, = Y, - 0.9Y2 - o.7Zz = -3.67 - 0.9(-2.86) - 0.7(- 1.438) = -0.089, E4 = -2.33 - 0.9(-3.67) - 0.7(-0.089) = 1.035, Z5 = 0.42 - 0.9(-2.33) - 0.7( 1.035) = 1.793 , Z6 = 2.87 - O.g(O.42) - 0.7( 1.793) = 1.237, Z7 = 3.43 - 0.9(2.86) - 0.7(1.227) = -0.003 .

Therefore, the predictor of Y8 is

I',(Y,, . . . , Y,) = 0.9Y7 + 0.7Z7 = 3.085 ,

and

p9(Y)* . . 9 Y7) = 0.9f8(Y# I - . p Y7) = 2.776 1

?,,,(Y,,. . . , Y,) = 0.9f..(y,, . . + , Y7) = 2.498.

In this case the prediction using the approximate method of setting El = 0 is quite close to the prediction obtained by the method of Theorem 2.9.1, even though the sample size is small. If the coefficient of e,-, had been closer to one, the difference would have been larger.

By Theorem 2.7.1, we may write our autoregressive moving average process as m

U, = C ye,-, , j = O

q = b o = l , q = b , -a l =0.7+0.9=1.6, t j = - ~ ~ t j - ~ = ( 0 . 9 ) ~ - , , j = 2 , 3 ,... .

Therefore, the large sample covariance matrix of prediction e m for predictors one, two, and three periods ahead is

1.00 1.60 1.44

1.44 3.90 5.63

Page 117: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

94 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

which is very close to the covariance matrix of the best predictor based on n = 7 obsemations. AA

2.10 THE WOLD DECOMPOSITION

We obtained moving average repmentations for statiomy autoregressive and autoregressive moving average processes h Sections 2.6 and 2.7. In this section, we give a moving average representation, due to Wold (1938), that is appropriate for any stationary process. We first investigate the properties of the sequence of one-period prediction errors of a stationary nondeterministic process. We are particularIy interested in the limit as n becomes large.

Theorem 218.1. Let (5) be a stationary, nondeterministic time series defined on the integers. Then there exists a unique (as.) time series (e,} defined as the limit in mean square of the one-period prediction error as n becomes large. Further- more, {e,} is a sequence of uncorrelated random variables with mean zero and common variance v2, where

(2.10.1)

and T:, is the variance of the one-period prediction error defined in Theorem 2.9.1.

Proop. Without loss of generality, let the mean of U, be zero. Define the new time series

(2.10.2)

i - I

where

bi-1.l = ( ’ ~ - i , ~ , ~ , ’ i - ~ , ~ , ~ ~ . ~ * B’i-1,i.t-l”’ (2.10.3)

E { w ~ L , ) = , i = 2,3,. . . , (2.10.4)

a d b,, and T:, are defined in Theorem 2.9.1. Because Y, is regular, E{W:) is bounded below and the vectors bi-l,, are unique. Each K, is a stationary time

Page 118: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE WOLD DECOMPOSITION

Let

where

95

(2.10.5)

(2.10.6)

(2.10.7)

Then n

(2.10.8) E{e:(n)) = YAO) - I: C;T:- I , l = T~~ .

By (2.10.8), T:~ is monotone decreasing in n, and by the assumption that Y, is regular, T:~ is bounded MOW. Therefore, the limit

2

j- I

2 2 lim rn1 = u n+w

(2.10.9)

is well defined and

By the definitions,

Page 119: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

96 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

because E{ef(n)} C %(O). Therefore, by Exercise 4.14 of Royden (1989, p. 93),

&,e,+,} =!$ E{e,(n)e,+h(n)l.

Similarly,

E{e,} = lim E{e,(n)} = 0 . n-v=

The covariance behween e,(n) and et+h(n), for n > h > 0, is

where we have used the fact that e,+,,(n) is uncorrelated with K+h- l , Yt+h-Z+ * * * x + h - n * Because

i - n - h + l m

we have

n - v lim E{e,(n)el+h(n)) = E{e,er+h) = 0 .

E(e :) = lim Ere f(n)J = CT* . Similarly,

A

We now give Wold's theorem for the representation of a stationary time series. The theorem states that it is possible to express every stationary time series as the sum of two time series: an infinite moving average time series and a singular time series.

n - m

Theorem 2.10.2 (Wold decomposition). Let U, be a covariance stationary time series defined on T = (0, k 1,+2, . . .}. Then Y, has the representation

u, =x, +z,, (2.10.10)

W h e r e m

x, = 2 a,e,-, I =o

Page 120: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ntE WOLD DECOMPOSITION w uncorrelated (0, a') random variables defined in Theorem 2.10.1, and X, is defined as a limit in mean square.

Proof. If Y, is singular, then Y, = 2, and X, = 0. Suppose Y, is nonsingular. Let {e,} be the time series defined by Theorem 2.10.1. Let

where

and

We have n n

and Xys0 a ; < 03. It follows that

i =O is0

in mean square. Let 2, = Y, -X,. Using the arguments follows that

of Theorem 2.10.1, it

h 3 O .

Therefore, the processes (2,) and {X,} are uncorrelated. For any set of real numbers { q : i = O , l , ..., p } with S o = l ,

where we abbreviate Var{X,} to V(X,}. By (2.10.9), given (F > 0, there is some p ,

Page 121: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

98 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

and a set of 4 defined by (2.10.6) such that

Also,

and we conclude that (2,) is singular. A

Note that the mean can be included in the singular part of the representation (2.10.10). Proofs of the Wold decomposition using Hilbert space arguments are available in Anderson (1971, p. 420) and Btockwell and Davis (1991, p. 187).

2.11. LONG MEMORY PROCESSES

In Section 2.7, we demonstrated that the autocorrelation function of an auto- regressive moving average process declines at an exponential rate. By the results of Section 2.10, any nondeterministic stationary process can be represented as an infinite moving average of uncorrelated random variables. The coefficients of the sum decline exponentially if the original process is an autoregressive moving average. Because of this rapid rate of decline, finite autoregressive moving averages are sometimes called short memory processes.

Consider a weighted average of uncorrelated (0, a') random variables m

(2.11.1)

in which the coefficients of e , , are declining at the rate id-' for some d E (-0.5,0.5). Such a time series is well defined as a limit in mean square by Theorem 2.2.3. For the special case y = ( j + l)d-', we have

E{Y:)= 2 ( j + 1)2'd-')~2*[1 - (2d- 1)~1(1S)~~']~2, (2.11.2) 1=0

m

.yu(h) = E{Y,Y,+,) = id'- ' ( i + l~)~-'u' , h > O , (2.11.3) i - 1

Page 122: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

LONG MEMORY PROCESSES 99

where we have used the integral approximation to the sum in (2.1 1.2). For large h, %(h) is approximately equal to ~ h ~ ~ - ~ , where c is a constant. See Exercise 2.44. We say that a process X, is a lung memoly process if px(h) is approximately p ~ p r t i 0 ~ 1 to h2d-1 for d E (-0.5,OS).

A time series that has received considerable attention is

(2.1 1.4)

where q,= 1,

and r( .) is the gamma function. Using Stirling’s approximation that r(s + 1)=(2~s)”~e- ’s~ for large s, we have

for large j . It can be shown that the autocovariance, autocorrelation, and partial autocorrelation of the time seria (2.11.4) are

(2.1 1.7)

and

(2.1 1.8)

Using Stirling’s approximation,

Thus, the moving average with weights (2.11.5) is a long memory process. The time series (2.11.4) can be written

X, = ( I - a)-$, , (2.1 1.9)

Page 123: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

100 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

where (1 - 8)' is defined by the expansion

(1 - 93)' = 1 - t 9 - (2!)-'r(l- r ) a 2 - (3!)-'r(l - t)(2 - r)a' - - * - for -0.5 < t C 0 and 0 < r < 0.5. For r = 0, the operator is the identity operator. The operator (1 - 9 3 f for d E (-0.5,0.5) is called the fracriontlr duereme opemtor. The jth coefficient in the expansion of (1 - is the y of (2.11.5).

For the time series (2.11.4), we can also write

(2.1 1.10) ( 1 - Se)"x, = e,

or

(2.1 1.1 1)

where

j

I = 1 5 ( d ) = [ r ( j + l ) r ( - -d ) ] - ' r ( j - d ) = n i - ' ( i - 1 - d ) . (2.11.12)

If the time series satisfies (2.1 1.10) witb d E (-0.5,0.5), then it has an infinite moving average representation in which the coefficients decline at the rate jd- ' and an infinite autoregressive representation in which the coefficients decline at the

Table 2.11.1. Coefficients for fractional differenced time series

d = 0.4 d = -0.4

Index Y 5 N) Y 5 P(j )

1 2 3 4 5 6 7 8 9

10 15 20 25 30 35 40

0.4OoO 0.2800 0.2240 0.1904 0.1676 0.1508 0.1379 0,1275 0.1190 0.1 119 0.088 I 0.0743 0.0650 0.0583 0.0532 0.049 1

-0.4OOO -0.1200 -0.0640 -0.0416 -0.0300 -0.0230 -0.0184 -0.0152 -0.0128 -0.01 10 -0.0062 -0.0041 -0.0030 - 0.0023 -0.0019 -0.0015

0.6667 0.5833 0.5385 0.5085 0.4864 0.4691 0.4548 0.4429 0.4326 0.4236 0.3906 0.3688 0.3527 0.3400 0.3297 0.3210

-0.4OOO -0.1200 -0.0640 -0.0416 -0.0300 -0.0230 -0.0184 -0.0152 - 0.0 128 -0.01 10 -0.0062 -0.0041 -0.0030 -0.0023 -0.0019 -0.0015

0.4000 0.2800 0.2240 0.1904 0.1676 0.1508 0.1379 0.1275 0.1 190 0.1 119 0.0881 0.0743 0.0650 0.0583 0.0532 0.0491

-0.2857 -0.0714 -0.0336 -0.0199 -0.0 132 -0.0095 -0.0072 -0.0057 -0.0046 -0.0038 -0.0018 -0.001 1 -0.0007 -0.0005 -0.OOO4 -0.0003

Page 124: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 101

rate j - d - ’ , Thus, if d > 0, the moving average coefficients are square summable and the autoregressive coefficients are absolutely summable. If d < 0, the moving average coefficients are absolutely summable and the autoregressive coefficients ace square summable. Examples of the coefficients are given in Table 2.11.1. To generalize the class of time series with long memory properlies, we assume

Y, satisfies

(1 - se)dy, = zt , (2.1 1.13)

where 2, is the stationary autoregressive moving average satisfying

(2.11.14)

e, are uncorrelated (0,a2) random variables, the roots of the characteristic equations associated with (2.11.14) are less than one in absolute value, and d E (-0.5,OS). The time series Y, is sometimes called a fractionally differenced autoregressive moving average. The representation is sometimes abbreviated to AFUMA(p, d, 4). By Theorem 2.2.2 and Theorem 2.2.3, the time series Y, can be given an infinite moving average or an infinite autoregressive repmentation.

REFERENCES

Sections 2.1, 23, 23-29. Anderson (1971). Bartlett (1966). Box, Jenkins and Reinsel ( l W ) , Hannan (1970), Kendall and Stuart (1966), Pierce (197Oa), Wold (1938), Yule (1927).

section 2.2. Bartle (19641, Pantula (1988a), Royden (1989), Torres (1986), Tucker (1967). Section 2.4. Bellman (1960). Finkbeinex (19601, Goldberg (1958). Hiidebrand (1%8).

Section 2.10. Anderson (19711, Brockwell and Davis (1991). Royden (1989), Tucker

Section 2.11. Beran (1992). Cox (19841, Deo (1995), Granger (1980), Granger and Joyeux

Kempthorne and Folks (1971). Miller (1963, 1968).

(1%7), Wold (1938).

(1980), Hosking (1981, 1982), Mandelbmt (1969).

EXERCISES

1.

2.

Express the Coefficients b, of (2.4.3) in terms of the coefficients u, of (2.4.2) for n = 3.

Given that {e,: t E (O,kl, 22, . . .)} is a sequence of independent (0, a*) random variables, define

Page 125: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

102 MOVING AVERAGE AND AUMREORESSIVE PROCESSES

(a) Derive and graph ~ ( h ) and ~ ( h ) . (b) Express Y, as a moving average of e,. (c) Is it possible to express e, as a finite moving average of X,?

3. For the second order autoregressive process, obtain an expression for p(h) as a function of a, and a2 analogous to (2.5.10).

4. Given a second order moving average time series, what are. the largest and smallest possible values for p(l)? For p(2)? For p(3)? Give an example of 8

moving average time series such that y(0) = 2, y(1) = 0, fi2) = - 1, and fib) = o for lhl> 2.

5. Consider the difference equation

(a) Give the characteristic equation, and find its roots. (b) Give the expression for y, if yo = 0.8 and yI = 0.9. (c) ~f e, -NI(O, a’), a* >o, t ={o, 4 1 ~ 2 2 , . . .I, is there a stationary time

Y, - 1.6Y,-, + 0.89Y,-, = e,?

series satisfying

Explain. (d) Let the time series U,, t = 2,3,4,. . , , be defined by

Y, = 1.6Y,-, - 0.89Y,-, + e, ,

where (Yo, Y,) = (0,O) and e, -M(O, n2). Give the covariance matrix for w 2 * y,, YJ.

(e) Define (Yo, Y,) so that Y, of (ti) is a stationary time series.

6. Find four (admittedly similar) time series with the covariance function

I , h = O ,

0 otherwise.

7. Draw pictures illustrating the differences among the thearetical correlation functions for (a) A (finite) moving average process. (b) A (finite) autoregressive process.

Page 126: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 103

(c) A strictly periodic process of the form

M ._

X, = {el, cos A,t t e2, sin hit}, M < Q) , i = l

where e,i and e2, are independent (0, a’) random variables independent of el, and ey for all i Z j , and A,, i = I , 2, . . . , M, are distinct frequencies.

8. Consider the time series {X,} defined by

X, + alX,-, + $X,-, + = e,,

where the e, are uncorrelared (0,a2) random variables and the roots of

m3 + almZ + a,m + = O

are less than one in absolute value. Develop a system of equations defining y(0) as a function of the a’s.

9. Let the time series Y, be defined by

Y, = p, + p r t + X, , t = 1.2,. . . , where

X, = e, + 0.6e,- I ,

{e,: t E (0, 1,2,.. .)) is a sequence of normal independent (0, c2) random variables, and Po and PI are fixed numbers. aive the mean and covariance functions for the time series.

10. The second order autoregressive time series

X,+a,X,-, +a,X, - ,=e , ,

where the e, are uncomlated ( 0 , ~ ~ ) random variables and the roots of mz f a l’p + a, = 0 are less than one in absolute value, can be expressed as the infinite moving average

X, = 2 wje,-, . J=o

For the following three time series find and plot w, for j = 0,1,. . , , 8 and p(h) forh=0,1, ..., 8: (a) X, + 1.6Xt-, + 0.64X,-, = e,. (b) X, - O.4Xf- I - 0.45Xf-, = e,.

(c) Xf-1.2X,-,+0.85X,-,=e,.

Page 127: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

1oQ MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

11. Given the time series

Y,-O.8Y,-, =el +0.7ef-, +0.6e,-,,

where the e, are uncorrelated (0, a?) random variables, find and plot p(h) for h = 0,1,2,. . . ,8.

12. Prove Theorem 2.1.1.

13. Find a second order autoregressive time series with complex r o d s whose absolute value is 0.8 that will give realizations with an apparent cycle of 20 time periods.

14. Given that X, is a first order autoregressive process with parameter p = 0.8, use Theorem 2.9.1 to find the best one- and two-period-ahead predictions based on three observations.

15. Let the time series XI satisfy

(X, - 4) - 0.8(Xf -, - 4) = el ,

where the e, are uncomlated (0, 7) random variables. Five observations from a realization of the time series are given below:

t XI

1 5.9 2 4.9 3 2.2 4 2.0 5 4.9

Predict X, for t = 6 , 7, 8, and estimate the covariance matrix for your predictions, that is, estimate the matrix E((x,,, - 25,3)(~5,3 - 2 ~ ) .

16. Let Y, be defined by

r, = p ~ , - , + e, . lpl< 1 ,

where {e,, c E (0, ? 1, 22, . . .)} is obtained from the sequence {u,) of n o d independent (0.1) random variables as follows:

t = 0,+2,+4, * . . , t = t l , + 3 , .... el=("' 2- *I12 @ , - , - I ) , 2

Is this a strictly stationary process? Is it a covariance stationary process? Given a sample of ten observations (Y,, Y,, . . . , Y,,), what is the best linear predictor of Y,,? Can you construct a better predictor of Y,,?

Page 128: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES

17. Let U, be defined by

105

? = e , + P e , - , . lPl<1.

Show that the predictor of Y,+ , given in (2.9.17) is equivalent to the predictor ~ ~ + l ( Y ] , . . . , Y , ) = - ~ ~ = l (-P)’Y,-,w

18. Let {el} be a sequence of independent (0,l) random variables. Let the time series X, be defined by: (a) X, = e,. (b) XI =el - 1.2e,-, + 0.36el-,. (c) X, = el cos 0 . 4 ~ + e, sin 0 . 4 ~ + 6. (d) X,=e,+ 1.2el-l. Assume the covariance structure is known and we have available an infinite past for prediction, that is, {X, : t E (. . . , - 1,0,1, . . . , n)} is known. What is the variance of the prediction error for a predictor of X,, for each case above?

19. Show that if a stationary time series satisfies the difference equation

Y, - =el,

then E{e:} = 0.

20. Let {X, : t E (0, 2 1,+2, . . .)} be defined by

XI + alX,-, +%XI-* =e,,

where the e, are uncorrelated (0, cr2) random variables, and the roots r , and r2 of

2 m + a , m + a j = O

are less than one in absolute value. (a) Let rl and r2 be real. Show that X, - r , X I - , is a first order autoregressive

process with parameter r2. (b) If the roots rl and r2 form a complex conjugate pair, how would you

describe the time series Y, = X, - rlXl - I ?

21. Assume that X, is a stationary, zero mean, normal time series with auto- correlation function &(h). Let Y, = X:. Show that the autocorrelation fuiiction of U, is p,(h) = p:(h).

22. LetAyl=a,+a,cosur,t=1,2 ,..., wherea,,a,,anduarerealnumbers, and yo = 0. Find yI .

Page 129: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

106 MOVINQ AVERAGE AND AUTOREGRESSIVE PROCESSES

23. Let Y, = e, - el- , , where the e, ace normal independent (0, rr') d o m variables. Given n observations ftom a realization, predict the (n + I)st observation in the realization. What is the mean square error of your predictor? Hint: Consider the time series

and predict Z,,, ,. How would your answer change if e, were considered fixed?

24. Prove the following lemma. Lemma. Let A > fi 3 0 and let p be a nonnegative integer. Then there exists an M such that (t f I)'@ < MAf for all t 2 0.

25. Let

V, - 0.8Y,-, = e, - 0.9el-, ,

where the e, are uncorrelated (0,l) random variables. Using Theorem 2.7.1 obtain E{Y,e,-,} and E{Y,-,e,.-,}. MuItiply the equation defining Y, by Y, and Y,- and take expectations, to obtain a system of equations defining %(O) and ~ ~ ( 1 ) . Solve for the variance of U,.

26. Using the backward shift operator, we can write the autoregressive moving average process of order ( p , q) as

Show that the coefficients of the moving average representation

given in Themem 2.7.1 can be obtained from the quotient (Zf=,, u, 93')-1(Xy=ob, a') by long division. Obtain the autoregressive repre- sentation of Theorem 2.7.2 in the same manner,

27. Demonstrate how the nth order scalar difference equation y, + a l y l - l i- u,y,-, f s - s + any,-,, = 0 can be written as a first order vector difference equation of dimension n. If A denotes the matrix of coefficients of the resulting first order difference equation, show that the rods of (A - rnI[ = 0 are the same as the roots of

m"+a,rn"- ' + . - . + a , = o .

Page 130: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 107

28. F i d r(h) for the vector autoregressive time series

where e, =(el,,ez,)' is a sequence of normal independent (0,X) random variables and

Hinr: Express the covariance matrix as a linear function of the covariance functions of the two scalar autoregressive processes associated with the canonical form. Assume that we have observations XI, X,, . . . , X, with Xl= (3 .2) . Predict X,,, and XR+, and obtain the covariance matrix of the prediction errors.

29. The cobweb theory has been suggested as a model for the price behavior of some agricultural commodities. The model is composed of a demand equation and a supply equation:

where Q, is the quantity produced for time r, P, is the price at time r, and U: = (UIr, U,,) is a stationary vector time series. Assuming U, is a sequence of uncorrelated (0,s) vector random variables, express the model as a vector autoregressive process. Under what conditions will the process be stationary? If Ulr=e lr+bl l e , , - , and U =e, ,+b, ,e ,~,- i where e,=(e,,,e,,)' is a sequence of uncorrelated (0, cr I) vector random variables, how would you describe the vector time series (Qf, P,)'?

f

30. Let e, be a sequence of independent (0, a*) random variables, and let {u,} be an absolutely summable sequence of real numbers. Use Theorem 2.2.2 to show that

is defined as an almost sw limit and X, is a stationary process with zero mean and

Page 131: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

108 MOVING AVERAGE AND AUTOREGRESSIVE PROCESSES

31. Show that the assumption that E{Wi} is bounded below gumtees the existence of the inverse of V,, in Theorem 2.9.1 used to define (2.10.3).

32. Let the stationary pth order autoregressive process satisfy

where a, = 1 and el is a sequence of uncorrelated (0, a’) variables. The equation can also be written

waw, = e, . where 3 is the back shift operator defined in Section 2.4, and

~ ( g e ) = 1 + a , S + u , ~ ’ + ~ ~ ~ + u p 4 8 ~ . Some authors define a stationary process U, to be causal if it is possible to represent U, as

m

~ = ~ w i e , - , , t=0,+-1,+2 ,..., i - 0

where ZT-,, Iwjl < 00. Show that the foUowing statements are equivalent: (i) XI is causal. (ii) The roots of the characteristic equation

mP+a,mP-‘ i- ... +ap = O

are less than one in absolute value. (iii) q z ) # 0 for all complex numbers z such that lzl C 1. (iv) All roots of @(z) = 0 are greater than one in absolute value.

33. Using (2.6.8) and the fact that ap # 0, show that the autocorrelation function of the pth order autoregressive process is nonzero for some integer greater than No for all No.

34. Let

Y,=e,cosm+Z,, t = 1 , 2 ,..., where e, - N(0, ae,), independent of 2, - NI(0. qz). Let a,, and qz be known. (a) Given three observations (Y,, Y2, Y3), what is the minimum mean square

error predictor of Y4? Give the variance of the prediction emr. (b) What is the limiting expression for the predictor of Ye+, given

(Y,, Yz,. . . , Y,), where the limit is taken as n increases? What is the limit of the variance of the prediction emr?

Page 132: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 109

35. Let (Y,, . . . Y,) be observations on the first order moving average,

Y, = e , +/?,el- , . where el - NI(0, cr2) and IS, I < 1. Construct the best predictor e^,(Y,, Y2, Y,) of e,. What is the variance of the prediction error of e^,(Y,, Y2, Y,)? What is the limit of the variance of the prediction error of e*,(Y,, . . . , Y,) as n + m?

36. Let the stationary vector autoregressive process Y, satisfy D

Y, + 2 A/Y,-~ = e, . j = I

where the e, are independent (0, Xe8) random vectors. Show that the process also satisfies an equation of the form

D

Y, + 2 B,Y,.+~ = a, , /= 1

where the a, are uncorrelated (0,2,,) random vectors. Give the equations defining the B,, j = 1,2,. . . , p, as a function of the r(h). Show that if p = I , A, = A:, and 2, = I, then 8,, = I.

37. Let a, be a sequence of uncorrelated (0,l) random variables. Put the following time series in canonical form as moving averages of uncorrelated random variables e l : (a) a, + &,-, , (b) a,- I + 4a, + a,,,, (c) a, - 2.5~,-, +

Give the variance of e, in each case.

38. Let V,, denote the n X n covariance matrix of n observations (Y,, Y2, . . . , Y,,) on the first order moving average process

~ = e , + P e , - , .

where the e, are uncorrelated (0,l) random variables. Let D,, be the determinant of V,. Show that

D, = (1 + Pz)D,-l - P2Dn-z

for n = 2,3,. . , , with Do = 1 and D, = 1 + 8'. Hence, show that D, = (1 - /32)-i(l-)62(n+')), n=1,2 ,..., for ISl<l, and D n = n + l , n=1 ,2 ,..., for 1Pt = 1.

39. Let Y, be a zero mean stationary time series with y(0) > 0, lim,+,, y(h) = 0, and &h) = 0 for h > K. Show that Y, is nondeterministic.

40. Consider the model Y, = x# f 2,. t = 1,. . . , n, where Z, is a process with mean zero and known covariance function Cov(Z,, Z,) = y,.

Page 133: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

110 MOVING AVERAGE AND AUMREGRESSIVE PROCESSES

(a) Show that the best linear unbiased estimator for f i is

BG = (x'v;lx)-lx'v;;y,

where X = (xl, . . . , x,)', y = (Yl , . . . Y,)', and V,, =Vb} is nonsingular. (b) Given Y l , . . . Y,, Xi+,, and V,, = Cov(y, Y,,,), show that the best linear

unbiased predictor of U,+, = x:+# + Z,,, is

d+,& + bi,(Y - XSO) 1

where b,, = Vi:V,,.

41. Consider a stationary and invertible time series Y, given by

where xJm= lwj} < w, Z,m- 171 < 00, and the e, are uncorrelated (0, a') random variables. Show that

where t,, = t(Y,- I, , . , , Y,J is the best lineat predictor of given V

42. In Theorem 2.9.1, the fact that the system of equations V,,b = V,, is consistent is used. Show that this system is always consistent and that y'b,, in (2.9.2) is unique ae. for different choices of generalized inverses of V,,.

43. Consider a sequence satisfying el =Z,(po + /31e:-l)1'2, where Z, - M(0, l), /3,, > 0 and 0 d /3, < 1. That is, the distribution of e, given the past is N(0, h,), where h, = Po + p,e:- I . Thus, the conditional variance depends on the past errors. The model

Y, = cuU,-, + e,

is a special case of autoregressive conditionally heteroscedastic models called an ARCH(1) model. See Engle (1982). Note that

e : = ~ : ( ~ o + ~ ~ e : -, ) = ~ : [ B ~ + P ~ z : - ~ ( B ~ + B ~ ~ : - ~ ) I . . .

j = o ' i-0 = A 5 A z:-j) a s ,

where it is assumed that the process started with finite variance in the indefinite past.

Page 134: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 111

(a) Show that {e,} is a sequence of uncorrelated [0, (1 - PI)-'&,] random

(b) Show that if 38; < 1, then E{e:} exists. Find E{e:}. (c) Consider X, = e:. Assuming 3& C 1, show that X, is stationary. Give its

(d) Consider the stationary ARCH(1) model, Y, = at Y,...l -t- e,, where Ia,l C 1.

variables.

autocorrelation function.

Assume (a,, Po, Pi) are known. (i) What is the best predictor fn+s for Yn+, given Yl, . . . , Y,? (ii) Find V{Y,+, - ?,,+,}, the unconditional forecast error variance. (iii) Find V{(Yn+, - fn+,) I (el, . . . ,en)}, the conditional forecast erro~

variance. (iv) Show that V{(Y,+, - fn+,)l(eI, . . . ,en)} may be less than V{(Y,+, -

~ + , ) ~ ( e , , . . . , en)} for some &, fll, a,, and e:. (That is, the conditional forecast error variance for two-stepahead forecasting may be less than that for one-stepahead forecasting.)

(v) Show that

limV{Y,+, - fn+$ [(el , . . . , e n ) } = (1 - aI) 2 - I u 2 as., s - w

where U' = (1 - #31)-1P,.

44. Use the facts that, for -0.5 < d < 0.5 and h > 0,

for I C i C h , 2d-Lid-I <id-l(l + h - I j ) d - l e i d - 1

( i + h ) Z d - Z < j d - I ( j + h ) d - l < 2 d - l j 2 d - 2 for i > h

to show that y ( h ) of (2.11.3) is bounded above by clhZd-' and bounded below by czh2", where cI and c2 are constants.

45. Let and {b,}rm be sequences of real numbers satisfying

m m

m rn

c,= 3 brar-, = 3 arb,-, . j = . - O O ,'--

Page 135: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 3

Introduction to Fourier Analysis

3.1. SYSTEMS OF ORTHOGONAL FUNCTIONS-M)URIER COEF'FICIENTS

In many areas of applied mathematics it is convenient to appmximate a function by a linear combination of elementary functions. The reader is familiar with the Weierstrass theorem, which states that any continuous function on a compact set may be approximated by a polynomial. Likewise, it may be convenient to construct a set of vectors called a busis such that all other vectors may be expressed as linear combinations of the elements of the basis. Often the basis vectors are constructed to be orthogonal, that is, the sum of the products of the elements of any pair is zero. The system that is of particular interest to us in the analysis of time series is the system of trigonometric polynomials.

Assume that we have a function defined on a finite number of points, N. We shall investigate the properties of the set of functions (cos(2mtlN), sin(2m7ulN): t = 0,1,. . . , N - 1 and m = 0.1,. . . , L"]}, where L[N] is the largest integer less than or equal to N/2. The reader should observe that the points have been indexed by t running from zero to N - 1, whiIe rn runs from zero to L"]. Note that for m = 0 the cosine is identically equal to 1. The sine function is identically zero for m = 0 and for m = N12 if N is even. Therefore, it is to be understood that these sine functions are not included in a discussion of the collection of functions. The collection of interest will always contain exactly N functions, none of which is identically zero. We shall demonstrate that these functions, defined on the integers 0, 1, , , , , N - 1, are orthogonal, and we shall derive the sum of squares for each function. This constitutes a proof that the N functions defined on the integers furnish an orthogonal basis for the N-dimensional vector space.

Theorem 3.1.1. Given that m and r are contained in the set {0,1,2, * . . , L"), then

N 2 ' N

m = r # O or - 2 '

2 m N- 1 I= cos- 1=0 N

112

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 136: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL PUNCIIONS-fQURIER COEFFICIENTS 113

2mn 2 m N - 1

2 sin-tcos-t=o, Vm,r; I S 0 N N

N 2mn m = r # O or 7 , N - l

1=0 0, m + r .

proof. Consider first the sum of the products of two cosine functions, and let

2mn 27rr s(m, r) = Z cos --p cos -t N - I

t==O N

2mt 27rt 1 N-l

= ';i 2 [ co8 7 (m + r) + cos - (m - r)] . (3.1.1) 1-0 N

For m = r = 0 or m = r = N/2 (N even) the cosine terms on the right-hand side of (3.1.1) are always equal to one, and we have

Form = r, but not quai to zero and not equal to N/2 if N is even, the sum (3.1.1) reduces to

1 N - l 2m 1

2 r=o S(m,r)=- 2 c o s ~ 2 m + y N ,

where the summation of cosines is given by

The two sums are geometric series whose rates are exp(i(2m)21r/N} and exp{-6(2m)27r/N}, respectively, and the first term is one in both cases. Now 42m)2dN is not an integer multiple of 27r; therefore the rates are not unity. Applying the well-known formula Zy=iI A' = (1 - A N ) / ( 1 - A), the partial sum is

Since exp(i(2m)Z.rr) = exp(&?r} = I, the numerators of the partial sums are 0,

Page 137: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

114 INTRODUCITON TO FQURIER ANALYSIS

and S(m, r) reduces to N/2. For m # r, we have

1 N - 1 = - 2 [ei(ZurIN)(m+r) - i<Zr f lN) (m+r)

+ e 4 1-0

1 4 2 a f I N)(m - r ) + - d2nf I N X m - r ) + e

= O .

The sum of products of a sine and cosine function is given by

2mn 21rr 2 sin -jp cos -t f='O N N - 1

where the proof follows the same pattern as that for the product of cosines with m = r # 0. We leave the details to the reader. A

Having demonstrated that the N functions form an orthogonal basis, it follows that any function f l f ) defined on N integers can be represented by

At) = (a, cos W,J + b, sin wmt), f = 0, 1, . . . , N - 1 , (3.1.2) m =O

where

2mn w, =- N ' m = 0, 1,2,. . . , L"]:

N - I

2 2 flt)cosw,t

c, flt)cosw,,,t

2 2 flt)sin w,t

f =o , m = l , 2 , ..., LrN-11, N a m = N - 1

N 1-0 , m = O , and m = y i f N is even: N N - I

1-0 , m = 1,2,. . . ,L[N- 11. N

i b,,, =

The a, and b, are called Fourier coeficients. One way to obtain the

Page 138: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL F"CI'IONS--POURIER COEFFICIENTS 115

representation (3.1.2) is to find the a, and b, such that

(3.1.3) LINl

{fir) - 2 (a, cos w,t + b, sin w,t) r=o m=O

is a minimum. Differentiating with respect to the aj and 6, and setting the derivatives equal to zero, we obtain

LW"

{fir) - 2 (a, cos wmr + b, sin w,t) 1=0 m =O

j=O,1,2 ,...* LEN.

LtNl

{fir) - 2 (a, cos o,r + b, sin w,t) 1-0 m-0

j = 1.2,. . . ,L[N- 11.

By the results of Theorem 3.1.1, these equations reduce to N- 1 N - I

2 fit)cos wmr = am cos2w,,,t, m = 0 ~ 1 , . . . , L [ N ] , I S 0 1-0

N- I N- I

2 f(t)sin w,t = b, 2 sin20,r, 1-0 t -o

m = 1 , . . . , L[N - I J ,

and we have the coefficients of (3.1.2). Thus we see that the coefficients are the regression coefficients obtained by regressing the vectorfft) on the vectors cos omt and sin w,t, By the orthogonality of the functions, the multiple regression coefficients are the simple regression coefficients.

Because of the different sum of squares for cos Or and cos rr, the a's are sometimes all defined with a common divisor, N / 2 , and the first and last term of the series modified accordingly. Often N is restricted to be odd to simplify the discussion. Specifically, for N odd, we have

( N - 1 ) / 2 QO f(r) = 5 f (a, cos wnrt + b, sin w,t) t = 0 , l , . . . , N - 1 ,

m=1

(3.1.4)

where N - i

2 c At) cos w,t 1-0 N - 1

am = N . m = O l l , . . . , - 2 ' N- I .. .

2 C f(t) sin wmt P O N- 1 , m=1,2 ,.... - N 2 . b, =

Page 139: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

116 INTRODUCTION TO FOURIER ANALYSIS

The reader will have no difficulty in identifying the definitions being used. If the leading team is given as a,/2, the definition (3.1.4) is being used; if not, (3.1.2) is being used.

The preceding material demonstrates that any finite vector can be represented as a linear combination of vectors defined by the sine and cosine functions. We now turn to the investigation of similar representations for functions defined on the real line.

We first consider functions defined on a finite interval of the real line. Since the interval is finite, we may code the end pints in any convenient manner. When dealing with trigonometric functions, it will be most convenient to t m t intervals whose length is a multiple of 2z We shall most frequently take the interval to be r-?1; 721.

Deffnltion 3.1.1. An infinite system of square integrable functions {a,},:=, is orthogonal on [a, b] if

and

The following theorem states that the system of trigonometric functions is orthogonal on the interval [- n, w].

Theorem 3.1.2. Given that m andj are nonnegative integers, then

0 , m + j , sinmxsin j x & = u, m = j Z O ,

0 , m = j = O ;

2 z , m = j = O ,

0 , m f j .

I cusmxcosjxdr=

R

Proof. Reserved for the reader. A

Page 140: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL FUNCTIONS-FOURJER COEFFICIENTS

The sum

117

(3.1.5)

21rk hk=- , k = O , I ,..., n ,

is called a mgonometric porynOmiul of order (or degree) n and period T. We shall take T = 2w (i.e., Ak = k) in the sequel unless stated otherwise. If we let n increase without bound, we have the infinite trigonoinetric series

m

By Theorem 3.1.2 we know that the sine and cosine functions ate orthogonal on the interval [-n, w], or on any interval of length 27r. We shall investigate the ability of a trigonometric polynomial to approximate a function defined on this interval.

In the finite theory of least squares we are given a vector y of dimension n, which we desire to approximate by a linear combination of the vectors x,, k = 1,2,. . . , p. The coefficients of the predicting equation

are obtained by minimizing

with respect t0 the bk. The analogous procedure for a function Ax) defined on the interval [ - w, v) is

to minimize the integral

(a, cos Ap + b, sin Ap) (3.1.7)

of come, m x ) - (ak cos A$ + bk sin A,x)J2 must be integrable. In this section we shall say that a function g(x) is integrable over [a, b] if it is continuous, or if it has a finite number of discontinuities [at which g(x) can be either bounded of unbounded], provided the hnpmper Riemann integral exists. More general definitions could be used. Our tteatment in the sequel follows closely that of Tolstov (1%2), and we use his definitions.

The reader may verify by expansion of the square and differentiation of the

Page 141: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

118 INTRODUCTION To FOURIER ANALYSIS

resultant products that the coefficients that minimize (3.1.7) are

a, = - I" cosApf(x)clx, k = 0 , 1 , 2 ,..., (3.1.8) - I f

where the first term is written as a0/2. In this form the approximating sum is

(3.1.10)

When T, the length of the interval, is not equal to ZT, A, is set equal to 2vk/T, and we obtain the formulas

Observe that .n[(ui/2) + X:=,, (a: f bi)] is completely analogous to the s u m of squares due to regression on n orthogonal independent variables of finite least squares theory. We repeat below the theorem analogous to the statement in finite Ieast squares that the multiple correlation coefficient is less than or equal to one.

Theorem 3.13 ( B d ' S kt bk, and S,,(X) be defined by (3.1.8). (3.1.9), and (3.1.10), respectively. If.@) defined on [-T, w] is square integrable, then

Proof. Since the square is always nonnegative,

(3.1.12)

By the definitions of a, and b, and the orthogonality of the sine and cosine

Page 142: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL ”CTIONS-FOURIER COEFFlClENTS 119

functions, we have

n

&= 1

and

A

De&Mon 3.13. A system of functions {q(~)},“=~ is complete if there does not exist a functionfix) such that

and

[flx)q(x)h=*, j = o , 1 ,... .

We shall give only a few of the theorems of Fourier analysis. The following two theorems constitute a proof of completeness of the trigonometric functions. Lebesgue’s proofs are given in Rogosinski (1959).

Theorem 3.1.4. If Ax) is continuous on the interval [-n, n], then all the Fourier coefficients are zeiu if and only if

Theorem 3.1.5. If Ax) defined on the interval [ - n, r ] is integrable and if all the Fourier coefficients {ak, b,: k = 0, 1’2,. . .} are zero, then

Proof. omitted. A

By Theorem 3.1.4 the Fourier coefficients of a continuous function are unique. Theorem 3.1.5 generalizes the result to absolutely integrable functions. Neither furnishes information on the nature of the convergence of the sequences {a,} and @,I.

Page 143: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

120 lIWRODUCTlON TO FOURIER ANALYSIS

Theorem 3.1.6 (Parseval's theorem) answers the question of convergence for square integrable functions.

Theorem 3.1.6. If j&) defined on [ - n; 7r] is square integrable, then

or, equivalently,

Proof. By Bessel's inequality, the partial sum of squares of the Fourier coefficients converges for any square integrable function. (The sum of squares is monotone increasing and bounded.) Themfore, the function

where a, and b, are the Fourier Coefficients of a square integrable functionf(x), is square integrable on [ - v, T]. Now D(x) =fix) - g(x) is square integrable and all Fourier coefficients of D(x) are zero; hence, by Theorem 3.1.5,

A

Thus, any square integrable function defined on an interval can be approxi- mated in mean square by a trigonometric polynomial.

As one might suspect, the convergence of the sequence of functions S,,(x) at a point no q u i r e s additional conditions. We shall closely follow Tolstov's (1962) text in presenting a proof of the pointwise convergence of Sn(xo) toflxo). The following definitions are needed.

Deftnition 3.13. The lefr limit offlx) is given by

lint fl%) =fix,) i--IXg X < X g

provided the limit exists and is finite. The limit offlx) as x+xo, x >xo, is called the right limit of f lx) and is denoted by f lx i), provided it exists and is finite.

Definition 3.1.4. The right derivative of Ax) at x = xo is defined by

Page 144: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL FUNCTIONS-FOURIER CQEpLlClENTS 121

and the lej? derivative by

h z 0

provided the limits exist and are finite.

Definition 3.13. The functionflx) has a jump discontinuity at the point x, if

From the definitions it is clear that the jump Ax:) -flxO) is finite. fld) +fix;).

DelWtian 3.1.6. A functionflx) is smooth on the interval [c,d] if it has a continuous derivative on the interval [c, dl.

Definition 3.1.7. A function flx) is piecewise smooth on the interval [c, d ] if

(i) Ax) andf’e) are both continuous, or (ii) flx) andf’(x) have a finite number of jump discontinuities.

Bessel’s inequality guarantees that the Fourier coefficients of any square integrable function go to zero as the frequency increases; that is,

lim urn = lim bm =: 0 . rn+ m+m

The following lemma is a proof of the same result for a different class of functions.

Lemma 3.1.1. Ifflx) is a piecewise smooth function on [c,d], then

Proof. Integrating by parts gives

Since bothflx) andf’(.x) contain at most a finite number of jump discontinuities on the interval [c ,d] , they are both bounded, Therefore, the quantity contained

A within the curly braces is bounded, and the result follows immediately.

Using the fact that any absolutely integrable function can be approximated in the mean by a piecewise smooth function, it is possible to extend this result to any absolutely integrable function. We state th is generalization without proof.

Page 145: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

122 INTRODUCTION TO FOURIER ANALYSIS

Lemma 3.1.1A. The Fourier coefficients a, and 6, of an absolutety integrable function defined on a finite interval approach zero as tn + 00.

Lemmas 3.1.2 and 3.1.3 are presented as preliminaries to the proof of Theorem 3.1.7.

Lemma 3.1.2. Define

1 " 2 j - 1

~ , ( u ) = - + s cos j u .

Then

sin (n + f)u A,(u) = 2 sin(u/2)

and

(3.1.13)

(3.1.14)

Proof. Multiplying both sides of the definition of A&) by 2sin(u/2), we have

n

/ = I 2An(u)sin(u/2) = sin(u/2) + 2 2 cos ju sin(u/2)

and the result (3.1.13) follows. Since the integral of cos ju, j = 1.2, . . . , is zero on an interval of length 2v,

and hence

1 sin (n + 4). IT -(R 2sin(u/2) dU=1.

The result (3.1.14) follows, since A&) is an even function.

(3.1.15)

A

Page 146: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL FUNCTIONS-FQURIER COEPRCIENTS 123

Lemma 3.13. The partial sum

Qo " S,,(x)=y+ 2 (ukcoskx+b,sinkx) k-1

may be expressed in the form

(3.1.16)

where a, and b, are the Fourier coefficients of a periodic functionfix) of period 2 n defined in (3.1.8) and (3.1.9).

Proof. Substituting (3.1.8) and (3.1.9) into (3.1.10), we have

[ { j-: flt)cos kt dt }cos kx + { f(t)sin kt dt }sin kx] 1 "

='I" n - w A?)[:+ k = l cosk( t -x ) 1 d f .

Using (3.1,13) of Lemma 3.1.2, we have

Setting u = t - x and noting that the integral of a periodic function of period 2n over the interval [ - n - x, n - 4 is the same as the integral over [ - n, PI, we have

U sin (n + $)u A

Equation t3.1.16) is called the integral formula for the partial sum of a Fourier series.

Theorem 3.1.7. Let f lx) be an absolutely integrable function of period 2n. Then:

(i) At a point of continuity where f i x ) has a right derivative and a left derivative,

Page 147: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

124 INTRODUCTION TO FOURIER ANALYSIS

(ii) At every p i n t of discontinuity wherefix) has a right and a left derivative,

Proof'. Consider first a point of continuity. We wish to show that the difference

equals mro. MuitipIying (3.1.15) of Lemma 3.1.2 by fix), we have

1 /I sin(n +$)u f ix) = - du I

7r -vflx) 2sin(u/2)

(3.1.17)

(3.1.18)

Therefore, the difference (3.1.17) can be written as

Now, u - ' m x + u) -fix)] is absolutely integrable, since the existence of the left and right derivatives means that the ratio is bounded as u approaches zero. Also, [2 sin(u/2)]-'u is bounded. Therefore,

fix + u) -m U

U

is absolutely integrable. Hence, by Lemma 3.1.1A,

giving us the desired result for a point of continuity. For a point of jump discontinuity we must prove

Using (3.1.14) of Lemma 3.1.2, we have

sin (n + +)u du 2 2 sin(ul2)

Page 148: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL tWNCTIONS--FOURIER COEFFICIENTS 125

and

The same arguments on boundedness and integrabiIity that were used for a point of continuity uui be used to show that

"Ax + u) - f i x + ) u sin (n + 4)u du

U 2 sin(ul2)

f l x + u ) - f l x - ) u sin (n + +)u du

U 2 sin@ / 2) =lim -

= O . A

Theorem 3.1.8. Let f i x ) be a continuous periodic function of period 2~ with derivativef'(x) that is square integrable. Then the Fourier series offlx) converges t o m ) absolutely and uniformly.

Proof. Under the assumptions we may integrate by parts to obtain

Therefore, the Fourier coefficients of the function are directly related to the Fourier coefficients of the derivative by a, = -bL/h, b, = a;/h, where

a;=- I' f'(x)cos hxak , 72 -ff

The Fourier coefficients of the derivative are well defined by the assumption that f ' ( x ) is square integrable. Furthermore, by Theorem 3.1.6 (Parseval's theorem), the series

Page 149: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

126

converges. Since

INTRoDUCrtON TO FOURIER ANALYSIS

we have

It follows that X;,, (IuhI + lbhl) converges. Now,

and therefore, by the Weierstrass M-test, the trigonometric series

converges tofIx) absolutely and uniformly. A

As a simple consequence of the proof of Theorem 3.1.8, we have the following important result.

Corollary 3.1.8. If

converges, then the associated trigonometric series

converges absolutely and uniformly to a continuous periadic function of period 2w of which it is the Fourier series.

We are now in a position to prove a portion of Theorern 1.4.4 of Chapter 1. In the proof we require a result that will be used at later points, and therefore we state it as a lemma

Lemma 3.1.4. (Kronecker’s lemma) If the sequence {aj} is such that

Page 150: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL FUNCTIONS-FOURIER COEFFICIENTS 127

then

Proof. By assumption, given Q > 0, there exists an N such that

Therefore, for n > N, we have

Clearly, for fixed N,

and since c: was arbitrary, the result follows. A

Theorem 3.1.9. Let the correlation function Ah) of a stationary time series be absolutely summable. Then there exists a continuous functionflw) such that:

(i) p(h) = J-", A w W s wh do. (ii) flo) 3 0.

(iii) Jr,, Aw) do = 1. (iv) f(w) is an even function.

proof. By Corollary 3.1.8

1 " do) = - + p(h)cos h o h i t

is a well-defined continuous function. Now, by the positive semidefinite property of the comlation function,

n n

and

2 2 p(m - q)sin mw sin qo a 0 . n=lq*1

Page 151: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

128 INTRODUCFION To FOURIER ANALYSIS

Hence,

i: 2 d m - q)[cos mw cos g o + sin mw sin qwl m = l q=1

Letting m - q = h, we have

n-I

h = -(n 2 - I ) (*)p(h)coshw*O

Now, p(h)cos hw is absolutely summable, and hence, by Lemma 3.1.4,

Therefore,

Having shown that g(w) satisfies conditions (i) and (ii), we need only muhiply g(o) by a constant to meet condition (iii). Since

g(w)dw = 1 , 1 " Ir I,

the appropriate constant is v-', and we define flu) by

The functionfiw) is an even function, since it is the uniform limit of a sum of even functions (cosines). A

In Theorern 3.1.8 the square integrability of the derivative was used to demonstrate the convergence of the Fourier series. In fact, the Fourier series of a continuous function not meeting such restrictions need not converge. However, C e s h ' s method may be used to recover any continuous periodic function from its

Page 152: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SYSTEMS OF ORTHOGONAL "CI'IONS--FOURIER COEFFICIENTS 129

Fourier series. Given the sequence {S,}T= I , the sequence {C,} defined by

is called the sequence of arithmetic means of {$}. If the sequence {C,} is convergent, we say the sequence {S,} is Cesciro summable. If the original sequence was convergent, then (C,} converges.

Lemma 3.1.5. If the sequence {S,} converges to s, then the sequence (C,,} converges to s.

Proof. By hypothesis, given e > 0, we may choose an N such that IS, - sl < $e for a l l j > N . For n>N, we have

Since we can choose an n large enough so that the first term is less than i e , the result follows. A

Theorem 3.1.10. Let Am) be a continuous function of period 277. Then the Fourier series of flu) is uniformly s d l e to flu) by the method of C e s h .

Proof. The C e s h sum is . n - l J

where we have used Lemma 3.1.3. Now,

(3.1.19)

Page 153: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

130 INTRODUCI'IQN To FOURIER ANALYSIS

Therefore,

where we have used

Since flo) is uniformly continuous on [- w, n], given any e > 0, there is a 6 >O such that

In@ + u) -A@)[ < ; for all lu1<6 and all w E [ - n , w]. We write the second integral of (3.1.20) as

+ 1" IJrw + u) -Ao)l du 27rnsin2(6/2) 0

€ M c-+ 4 nsinZ@/2) *

where M is the maximum of h w ) l on [-n, n]. A similar argument holds for the first integral of (3.1.20). Therefore, there exists an N such that (3.1.20) is less than E for all n >Nand all w E [-n, n]. A

3.2. COMPLEX REPRESENTATION OF TRIGONOMETRIC SERIES

We can represent the trigonometric series in a complex form that is somewhat more compact and that will prove useful in certain applications. Since

cos e + i s i n e +cus 8 - i s in 8 ele + e-& - 2 - 2 mse=

Page 154: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

COMPLEX REPRESENTATION OF TRIOONOMEIWC SERIES 131

and

we have

( e & ~ + 2 e - ~ ~ ) ( e~~ - e - i k ~

2 a, cos kx + 6, sin kx = a, - 4.’

Thus we coul write the approximating sum for a function Ax). defined on [ - v, v], as

0 0 Sn(x) = - + 2 (a, cos kr + 6, sin kx) = ckeYkx , & - I k = - n

where

aL + ib, , k=0,1,2 ,... . - &k

c,= , c- ,=c$=

The coefficients c, are given by the integrals

Consider now a function X, defined on N integers. Let us identify the integers by {-m, -(m -- l), . . . , -LO, 1,. . . , m} if N is odd and by {-(m - l), -(m - 2), . . . , -l,O, 1 , . . . ,m} if N is even. Taking N to be even, the c, are given by

k=-(m-1) ,..., 0 ,..., m , (3.2.2)

Page 155: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

132

and

INTRODUCITON TO FOURIER ANALYSIS

Note that co and c, do not require separate definitions. The complex form thus makes manipulation somewhat easier, since the correct divisor need not be specified by frequency.

If N is odd and X, is an even function (i.e.. XI =X-,), then the coefficients ct are red. Conversely, if X, is an odd function, the Coefficients ck of equation (3.2.1) are pure imaginary.

3.3. FOURIER TRANSFORM-FUNCTIONS DEFINED ON THE REAL LINE

The results of Theorems 3.1.8 and 3.1.9 represent a special case of a more general result known as the Fourier integral theorem. By Theorem 3. I .8 the sequence of Fourier coefficients for a continuous periodic function with square integrable derivative can be used to construct a sequence of functions that converges to the original function. This result can be stated in a very compact form by substituting the definition of ak and bk into the statement of Theorem 3.1.7 to obtain

(3.3.1)

Since the reciprocal relationships are well defined, we could also write, using the notation of Theorem 3.1.9,

We say that p(k) and f i x ) form a transfinn pair. We shall associate the constant and the negative exponential with one transform and call this the Fourier transform or spectral density. The terms spectrum or spectral function are also

Page 156: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

FOURIER TRANSFORM-FUNCTIONS DEFINED ON THE REAL LINE 133

used. Thus, in Theorem 3.1.9, the function flu) defined by

is the Fourier transform of p(h), or the spectral density associated with p(h).

inverse transform or the characteristic function. Thus the correlation function The transform in (3.3.1) with positive exponent and no constant we call the

is the inverse transform, or characteristic function, of Ao). We have mentioned several times that this is the statistical characteristic function if flu) is a probability density.

These definitions are merely to aid us in remembering the transform to which we have attached the constant 1/2m. We trust that the reader will not be disturbed to find that our definitional placement of the constant may differ from that of authors in other fields.

Theorem 3.1.8 was for a periodic function. However, there exists a considerable body of theory applicable to integrable functions defined on the real line. For an integrable functionflx) defined on the real line we formally define the Fourier transform by

(3.3.2)

where u E (-m, Q)). The Fourier integral theorem states that if the function f ix ) meets certain regularity conditions, the inverse transform of the Fourier transform is again the function. We state one version without proof. [See Tolstov (1962, p. 188) for a proof.]

Theorem 3.3.1. Let f lx ) be an absolutely integrable continuous function defined on the real line with a right and a left derivative at every point. Then, for ail x E (-w, m),

1 " f i x ) = 5 I_ e ' u x fit>e- 'u' dt du . (3.3.3)

Table 3.3.1 contains a summary of the Fourier transforms of the different types of functions we bave considered. We have presented theorems for more general functions than the continuous functions described in the table. Likewise, the finite transform obviously holds for a function defined on an odd number of integers. For the first, third, and fourth entries in the column headed "Inverse

Transform" we have first listed the domain of the original function. However, the inverse transform, being a multiple of the transform of the transform, is defined for the values indicated in parentheses.

Page 157: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

w

w

Tabl

e 33.1.

Sum

mar

y of

Fou

rier

tran

sform

s

Type

of F

unct

ion

Domain o

f Fun

ctio

n Fo

urier

Tra

nsfo

rm

Inve

rse T

rans

form

-(n

- l), . . . ,

o,. .

. ,n

Ct =

- 1

2 X

Ie-'

"k""

XI

= 2

Ck

ec

-kfl

n

Fini

te s

eque

nce

X,

2n I-

-("

-l)

I=

-m - a

)

t= -(

n -

I),.

. .,O

,. . . ,

n

(can

be ex

teM

ied to

r =

0, 2

1, f

2,. . .)

A = -(

n - 1)

. . . .

, 0, -

. . , n

Infin

ite se

quence

fib

) ab

solu

tely

su

mm

able

(0, 2

1, 2

2,.

. .) m

dh

) = I f

(o)e

""

"

do

1"

21r

It-.

-_

-*

fl

u)=

- r,

@)e

-'""

w€

(-7

T,

1r)

fio)

per

iodi

c of p

erio

d 27r

h =

0, 2 1.

22.

. . .

rn

f~

)-

C c

kem

tx

x E

[ - m,

w]

(can be

extended as

aper

iodi

c fu

nctio

n on th

e real li

m)

k---

C

ontin

uous

pie

cew

ise

[-n

; TI

c, =

- ' IT

flx)e-'t

xdx

2u -

n

t =

0,I

tl. 22.. . .

smoo

th jl

x)

- TI2

fF)=

2 C

ke=

2-

k=.-!

n co

ntiB

uous

perio

dic

(-T12, T1

2)

&

piec

ewise

smoo

th

(can

be

exte

nded

to

rhe

real

line

) k =

0.2

1,22

, . . .

x E (-

T12,

T12

) (can be

exte

nded

to r

eal li

ne)

Page 158: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

FOURIER TRANSFORM-FUNCTIONS DEFINED ON THE REAL LINE 135

Given a continuous function defined on an interval, it may be convenient to record the function at a finite number of equally spaced points and find the Fourier transform of the N points. Time series are often created by reading “continuous” functions of time at fixed intervals. For example, river level and temperature have the appearance of continuous functions of time, but we may recoTd for analysis the readings at only a few times during a day or season.

Let g(x) be a continuous function with square integrable derivative defined on the interval [-T, TI. We evaluate the function at the 2.m points

x, = ( : ) T , t = -(m - I), -(m - 2), . . . , m - 1, rn .

The complex form of the Fourier coefficients for the vector g(xl) is given by

and g(x,) is expressible as

5 C#8)er‘ lrkr/nt = , t = -(m - 11, -(nt - 21,. . . ,M . (3.3.4) k - - (m - 1 )

The superscript (m) on the coefficients is to remind us that they are based on 2m equally spaced vaiues of gQ.

If we compute the Fourier coefficients for the function using the integral form (3.2.1), we have

and the original function evaluated at the points x,, t = -(m - l), -(m - 2), . . . , m, is given by

k = - m

By the periodicity of the complex exponential we have

(3.3.5)

Page 159: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

136 JNTRODUCl'ION TO FOURIER ANALYSIS

Table 3.32. Cmputed Coellicients for Fhnctbn (3.3.6) observed at 10 Points

Period Preclaency cp' + c'-": Alias Frequencies

- 0 0.5 10120 20 1 120 0.0 - 10 2/20 0.0 - 2013 3/20 1.6 13/20,23/20

5 4/20 0.0 - 4 5/20 0.3 15/20

Cosine Coefficient Contributing

Equating the coefficients of eilrk'"" in (3.3.4) and (3.3.5), we have aD

cp' = ck + 2 (Ck-Zns +Ck+zmj) * s = l

It follows that the coefficient for the kth frequency of the hnction defined at 2m points is the sum of the coefficients of the continuous function at the k, k + 2m, k - 2m, k + 4m, k - 4m, . . . frequencies. The frequencies k - i - h , k+4m,. . . are called the aliases of the kth frequency.

In our representation the distance between two points is T/m. The cosine of period 2Tlm or frequency m/2T is the function of highest fresuency tbat will be used in the Fourier transform. The frequency m/2T is called the Nyquist frequency. The aliases of an observed frequency are frequencies obtained by adding or subtracting integer multiples of twice the Nyquist frequency. To illustrate these ideas, let g(x) defined on [-lo, 101 be given by

g(x) = 1.0 cos 27r& + 0.5 cos m

+ 0.4 cos 2 n S x + 0.3 cos 27r&

+ 0.2 cos 27rgx . (3.3.6)

If the function is observed at the 10 points -8, -6,. . . ,8,10 and the {ci'): k = -4, -3,. . . ,4,5} are computed by

we will obtain the coefficients given in Table 3.3.2.

3.4. FOURIER TRANSFORM OF A CONVOLUTION

Fourier transform theory is particularly useful for certain function-of-function problems. We consider one such problem that occurs in statistics and statistical time series,

Page 160: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

FOURIER TRANSFORM OF A CONVOLUTION 137

For absolutely integrable functions j@) and g(x) defined on the real line, the function

is called the convolution of f i x ) and g@). Since f ix ) and g(x) are absolutely integrable, ~ ( x ) is absolutely integrable.

We have the following theorem on the Fourier transform of a convolution.

"beorem 3.4.1. If the functions f ix ) and g(x) are absolutely integrable, then the Fourier transform of the convolution offlx) and g(x) is given by

where

and

are the Fourier transforms offlx) and g(x).

proof. The integrals defining c,(o), CAW), and c,(w) are absolutely con- vergent. Hence,

where the absolute integrability has enabled us to interchange the order of integration. A

Page 161: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

138 INTRODLJCIION To POURlER ANALYSIS

We discussed the convolution of two absolutely summabk quences in Section 2.2 and demonstrated that the convolution was absolutely summable.

Corollary 3.4.1.1. Given that {aJ} and {b,) ace abaolutely summable, the Fourier transform of

m

d,,, = 2 amejbj j * - m

is

Where

and

Proof. The Fourier transform of id,} is

We may paraphrase these results as follows: the spectral density (Fourier transform) of a convolution is the product of the spectral densities (Fourier transforms) multiplied by 2v. That the converse is also true is clear from the proofs. We give a direct statement and proof for the product of absolutely summable sequences.

Corollary 3.4.1.2. Let {a,) and {b,) be absolutely summable, and define d, by

d, = ambm .

Page 162: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES

Then

139

where f,(w) and &(w) are &fined in Corollary 3.4.1.1.

Proof. We have . m

REFERENCES

A

Jenkins and Watts (1968), Lighthill (1970). Nerlove (1964), Rogosinski (1959), Tolstov (19621, Zygmund (1959).

EXERCISES

1. Give the 12 orthogonal sine and cosine functions that furnish an orthogonal basis for the 12-dimensional vector space. Find the linear combinations of these vectors that yield the vectors ( I , 0,. . . , O ) , (0, l , O , . . . , O ) , . . . , (0, * . . ,o, 1).

2. Expand the following functions defined on [ - 7r, 771 in Fourier series: (a) Am) = cos ao, where a is not an integer. (b) f iw) = sin aw, where a is not an integer. (c) fro> = e.'"', where a z 0.

- I r c w c o

Page 163: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

140

3. &fine g(x) by

INTRODUCTION Vl FOURIER ANALYSIS

(a) Find the Fourier coefficients for g(x). (b) What is the maximum value for

l 3 S,(x) = 5 + 2 (ak cos Rx + b, sin Rx)?

& = I

Where does this maximum occur? What is the maximum value of S&)? S,(x)? The fact that the approximating hct ion always overestimates the true function near the point of discontinuity is called Gibbs' phenomenon.

4, Prove Theorem 3.1.2.

5. Let Ax) be the periodic function defined on the real tine by

where j = 0, -+I, 22,. . . and 0 < b < v. Find the Fourier transform of f ix) .

6. Let

-b C X < b , 0 otherwise,

where b is a positive number and f ix ) is defined on the real lint?. Find the Fourier transform of f(x). Show that the limit of this transform at zero is infinity as b -$ m. Show that as b -+ GQ the transform is bounded except at zero.

7. Let

1 I

otherwise, a,@) =

where n is a positive integer and S,(x) is defined on the real line. Find the Fourier transform of a,,@). Show that, as n -+a, the transform tends to the constant function of unit height.

8. Let the generalizedfunction &x) represent a sequence of functions {8,(x)}:=,

Page 164: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES

such that

141

wherefix) is a continuous absolutely integrable function defined on the real line. Then S(x) is called Dirac's delta function. Show that the sequence of functions {S,(.x)} of Exercise 7 defines a generalized function.

112 -nx2 9. kt g,(x) = (dn) e for x E (-a, a). Show that the sequence {g,,(x)};=, yields a Dirac delta function as defined in Exercise 8.

10. Letfin) defined for n E (-00, a) have the Fourier transform c(u). Show that the Fourier transform of flax + b) is lal-le'bu'"c(ula), a f 0.

11. Assume that price of a commodity is recorded on the last day of each month for a period of 144 months. The finite Fourier coefficients for the data are computed using the formulas following (3.1.2). Which coefficients will be affected if there is a weekly periodicity in prices that is perfectly represented by a sine wave of period 7 days? Assume that there are 30.437 days in a month Which coefficients will be affected if the weekly periodicity in prices can be repented by the sum of two sine waves, one of period 7 days and one of period 3 3 days? See Granger and Hatanaka (1964).

12. Let f l x ) and g(x) be absolutely integrable functions, and define

Show that d x ) satisfies

13. Let f l x ) and g(x) be continuous absohtely integrable functions defined on the real line. State and prove the result d o g o u s to Corollary 3.4.1.2 for 444 = f l - M x ) .

14. Give a direct proof of Corollary 3.4.1.2 for finite transfoxms. That is, for the two functions y(h) and w(h) defined on the 2n - 1 integers h = 0, 2 1,

Page 165: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

142 INTRODUcllON TO FOURIER ANALYSIS

22, . . . , +(n - I), Show that

2 nk *=-* k=0, 21, 2 2 (..., +(n- 1).

IS. Let f(o) be a nonnegative even continuous periodic function of period 27~. Show that

II

dh)= fjw)e-'"hdw, h = O , 21, 2 2 ,..., -W

is an even positive semidefinite function.

Page 166: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 4

Spectral Theory and Filtering

In Chapter 1 we discussed the correlation function as a way of characterizing a time series. In Chapter 2 we investigated representations of time series in terms of more elementary time series. These two descriptions are sometimes called descriptions in the time domain because of the obvious importance of the index set in the representations.

In Chapter 3 we introduced the Fourier transform of the correlation function. For certain correlation functions we demonstrated the uniqueness of the transform and showed that the correlation function is expressible as the inverse transform of the Fourier transform. The Fourier transform of the absolutely summable correla- tion function was called the spectral density. The spectral density furnishes another important representation of the time series. Because of the periodic nature of the trigonometric functions, the Fourier transform is often called the representation in the freqwncy domain.

4.1. THE SPECTRUM

In Chapter 1 we stated the result that the correlation function of a time series is analogous to a statistical characteristic function and my be expressed in the form

where the integral is a Lebesjpe-Stieltjes integral and mu) is a statistical distribution function. In Theorem 3.1.9 we proved this result for time series with absolutely summable covariance functions.

Since the covariance function of a stationary time series is the correlation function multiplied by the variance of the process, we have

(4.1.1a)

143

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 167: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

144

where

dF(o) = do) dG(o) . Both of the functions G(w) and F(w) have been called the spectral distribution function in time series analysis. The spectral distribution function is a non- decreasing function that, for our pwposes, can be assumed to be composed of the sum of two parts: an absolutely continuous portion and a step function.’ We take (4.1.1a) as the definitional relationship between the spectral distribution function and the covariance function. The spectral distribution function is also sometimes called the integrated spectrum.

Let us assume Hh) is absolutely summable. Then, by Theorem 3.1.9, flu) defined by

(4.1.2)

is a continuous nonnegative even function, and

Thus for time series with absolutely sumable covariance function, dF(w)= flu) do, where f(w> was introduced as the spectral density function in Section 3.3.

Recall that we have taken {e,} to be a time series of uncorrelated (0, r’) random variables. The spectral density of {e,} is

’ Any statistical distribution function can be decomposed into three components: ( I ) a step function containing at most a countable number of finite jumps; (2) an absolutely continuous function; and (3) a “continuous singular” function. The third portion will be ignored in our treatment. See Tucker (1%7, p. 15 ff.). While not formally correct. one may think of F(o) as the sum of two parts, a step function with jumps at the points o,, j = -M. -(M - I ) , . . . , M - 1. M, and a function with continuous first derivative. Then the Lebesgue-Stieltje-s integral J g(o) dF(w) is the sum of ZE-, g(o,)f(q) and J g(w)j(w) do. where I @ ) is the height of the jump in F(o) at the point 9, flu) is the derivative of the continuous portion of F(o), and g(o)j(w) do is a Riemann integral.

Page 168: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECIXUM 145

which is positive, continuous, and trivially periodic. Similarly,

“ I

“ 1

yJh) = c2ed** d o

- cz cos wh dw = I_, 27r c , h = O , 0 otherwise.

As noted previously, the time series e, is often called white noise. The reason for this description is now more apparent. The spectrum is a constant multiple of the variance and, in analogy to white light, one might say that all frequencies contribute equally to the variance.

If the spectral distribution function contains finite jumps, we can visualize the spectrum containing discrete “spikes” at the jump points in the same way that we view discrete probability distributions. We now investigate a time series whose spectral distribution function is a step function.

Let a time series be composed of a finite sum of simple processes of the form (1.6.3). Tha! is, define the time series U, by

M

Y, = 2 (Aj cos o y + E, sin ?t ) , (4.1.3) 1-0

where the Aj and B, are random variables with zero mean and

E{A:} = E{E;} = a; , j = O , 1,2 ,..., M , j + k ,

vj, k , E{EjEk} = E{A,A,} = 0 .

E{A,E,} = 0

and a+, j = 0,1, . . . , M, are distinct frequencies contained in the interval [ - n, 4. By (1.6.4), we have

M

y,(h) = E{y,y,+,} = z: v;rcos ay cos q(f + h)] J”0

=% crfcosyh. j - 0

(4.1.4)

Since the function y,,(h) is composed of a finite sum of cosine functions, the graph of u; against 9 (orj) will give us a picture of the relative contribution of the variance associated with frequency q to the variance of the time series. This is

Page 169: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

146 SPECTRAL THEORY AND FILTERING

true because the variance of is given by

While we permitted our original frequencies in (4.1.3) to lie anywhere in the interval [-n; T] , it is clear that with no loss of generality we could have restricted the frequencies to [0, T]. That is, the covariance function for

X, = A j cos ( - 9 ) t + B’ sin (-q)t

is the same as that of

X, = A, cos qt + Bj sin 9’.

his suggests that for a covariance function of the form c; cos u,h we associate one half of the variance with the frequency -9 and one! half with the frequency 9. To this end, we set

To facilitate our representation, we mume that o, = 0 and then write the sum (4.1.4) as

(4.1.5)

where 0-, = -9, We say that a time series with a covariance function of the form (4.1.5) has a

discrete spectrum or a line spectrum. Equation (4.1.5) may be Written as the Lebesgue-Stieltjes integral

U

- - \ e*’”’dF(o), (4.1.5a) J - u

where F(o) is a step function with jumps of height iu; at the points 9 and -9, id: 0, and a jump of height a: at w, = 0. By construction, the jumps Z(I$ are

symmetric about zero. and we have expressed ~ ( h ) in the general form (4.1.1a). We used this method of construction because the covariance function &(It)= ZEo u: CQS +h is not absolutely summable and we could not directly apply (4.1.2).

For our purposes it is sufficient for us to be able to recognize the two types of autocovariance functions and associated spectra: (1) the autocovariance function

Page 170: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECTRUM 147

4 n

I n

that is the sum of a finite number of cosine functions and is associated with a spectral distribution function, which is a step function; and (2) the absolutely summable autocovariance function that is associated with a continuous spectral density.

r

.I

Example 4.1.1. Let us consider an example. Assume that the covariance function is given by (4.1.4) with the variances and frequencies specified by Table 4.1.1. Defining I(?) as in (4.1.5), we have

Z(0) = 4 , f ( - q 7 r ) = Z ( ~ . . ) = ~ ,

1(- & 71.) = f(& 7r) = f , I( - a 7r) = f(i 7r) = ; , l(-i n) = f(i 7r) = .

The original variances are plotted against frequency in Figure 4.1.1, and the line s p e c t ~ m is plotted in Figure 4.1.2. The associated spectral distribution function is given in Figure 4.1.3. AA

1 1

Table 4.1.1. Examptes of Varianas for Time Series of Form (4.13)

I

Page 171: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

148 SPECTRAL THFORY AND FILTERING

Frequency FLgure 4.1.2. Graph of the line spectrum for the time series of Table 4.1.1.

I1 ‘ 1

- c

I I l l 1 1 1 I -n 0 R

Figure 4.13. Spectral distribution function associated with the spectrum of figure 4.1.2.

Frequency, w

Example 4.13. As a second example, let X, be defined by

W IT X, = Q, msyt + c, sin j - f + e, , (4.1.6)

where the e,, z = 0, 2 1.22.. . , , are independent (0.0.21~) random variables independent of the 4, j = 1,2, which are independent (0.0.8~) random variables. Letting

Page 172: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CIRCULANTS-DIAGONALEATION OF THE COVARIANCE MATRIX 149

it follows that

and

Therefore,

The autocovariance function of X, is

= I(w,)cos cqh + 1 ” 0.1 cos wh dw J = - l - w

h = O , f ’ n

0.8n cos yh otherwise,

where w-, = - d 2 and w, = d 2 . The reader may verify this expression by evaluating E{X,X,+,} using the definition of X, given in (4.1.6). AA

4.2. CIRCULANTS-DIAGONALIZATION OF THE COVARIANCE MATRIX OF STATIONARY PROCESSES

In this section we investigate some properties of matrices encountered in time series analysis and use these properties to obtain the matrix that will approximately diagonalize the n X n covariance matrix of n observations from a stationary time series with absolutely summable covariance function. Let the n X n covariance matrix be denoted by r. Then

r =

Page 173: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

150 SPECTRAL THEORY AND PILTERiNG

It is well known' that for any n X R positive semidefinite covariance matrix r, there exists an M satisfying M'M = I such that

M'I'M = diag(A,, A,, . . . A,,), where A,l i = 1,2,. . . , n, are the characteristic roots of l'. Our investigation will demonstrate that for large n the Ai are approximately equal to 2@0,), whereflw) is thespectraldensityofX, and0+=2?~jln, j=0, 1 , 2 , . . . 1 n - 1.Thispermitsus to interpret the spectral density as a multiple of the variance of the orthogod random variables defined by the transformation M. We initiate our study by introducing a matrix whose roots have a particularly simple representation.

A matrix of the form

y(l) * * - r(n-2) r(n-I ) f i n - 1) y(0) * * . fin-3) fin-2) fin-2) Hn-1) * . . r(n-4) fin-3)

r, = [ ] (4.2.1)

is called a circular matrix or circulunt. Of course, in this definition f i j ) may be any number, but we use the covariance notation, since our immediate application will be to a matrix whose elements are covariances.

The characteristic roots Aj and vectors xj of the matrix (4.2.1) satisfy the equation

(4.2.2)

and we have

fi2) r(3) - * * Y(0) f i l l f i l l 742) * . * fin- 1) 740)

rcxj = A j x j , j = 1,2,. . . , n ,

y(l)xjj f y(2)x, ,+**'+y(n- lbn-fi,j+fi0bmj=A~,,j* where xy is the kth element of the jth characteristic vector. Let r, be a root of the scalar equation r n = 1 and set xkj = r:. The system of equations (4.2.3) becomes

r(0)rj + fil)r; + + - 2)t;"-' + fin - l)t$'=Ajr,,

f in - l)r, + H0)r; + - * * + fin - 3)r;-' + Hn - 2)r; = Air;, (4.2.4)

Page 174: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CIRCULANTS-DIAGONALIZATION OF THE COVARIANCE MATRIX 151

If we multiply the first equation of (4.2.4) by ry-’, the second by ry-’, and so forth, using rjn+k = rf, we see that we sball obtain equality for each equation if

n- I

Aj = 2 y(h)rj. h = O

(4.2.5)

he equation tn = I ha^ n distinct roots. eo2rrj1r, j = 1.2,. . . , n, which may , j = 0,1,. . . , n - 1. Therefore, the n charac- - e 2 u j l n also be expressed as rj = e

teristic vectors associated with the n characteiistic roots are given by = n - 1 / 2 [ 1 , e-*2rrjln -.2a2jln -rZrr(n- I ) j ln I

X j = gj , e ,... , e ] , j = O , I I . . . , n - I , (4.2.6)

where g; is the conjugate transpose of the row vector

I . -112 c2u j ln , e v 2 ~ 2 ~ t n c2n(n- l ) j l n g j . = n [I, e ,... * e

If we define the matrix G by

G = (g;.? a; , . . . , g:.)’ ,

then

GG*=I, Gr,G*=diag(~,A,, ..., A n , _ , ) ,

where G* is the conjugate transpose of G.

obtain the circular symmetric matrix Setting y(1) = r(n - l), y(2) = fin - 2)*. . . , Hh) = f in - h), . , . in r,, we

(4.2.7)

(4.2.8)

where we have used the periodic property e-02T’(n-h)’n - - @a2rrjhln . Note that these roots are real, as they must be for a real symmetric matrix.

Page 175: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

152 SPECTRAL W R Y AND FILTERING

In some applications it is preferable to work with a rea1 matrix rather than the complex matrix G. Consider first the case of n odd. Eiquation (4.2.8) may also be Written

( n - I ) / 2 2 s A j = f i h ) c o s n h j , j = O , l , ..., n - 1 . (4.2.9)

h- - ( n - 1)12

SinceforOSrn~2~wehavecosm=cos(2r-m) , weseethatthereisaroot for j = 0 (or j = n) and (n - 1)/2 roots of multiplicity two associated with j = 1,2,. . . , (n - 1)/2, For each of these repeated roots we can find two real orthogonal vectors. These are chosen to be

and

If we choose the vectors in (4.2.6) associated with the j th and n - jth roots of one, the vectors in (4.2.10) are given by

2-"*(gj. +gn- j , . ) and 2-l'*4gj.

Much the same pattern holds for the roots of a circular symmetric matrix of dimension n Xn where n is even. There is a characteristic vector n (1,1, ..., 1) associated with j = O , and a vector n-1'2(l ,- l , l , . . . , - l) associated with j = n/2. The remaining (d2) - I roots have multiplicity two, and the roots are given by (4.2.8).

-112

Define the orthogonal matrix Q by setting t~"'2-"~Q' equal to

2 - I l 2 ... 2-'12 2-1/2 '2-Il2 n-1

cos 2 v - n ... 1 2

cos 2T - n cos 2 s - n 1 n - 1

sin 2 9 - n ... 1 2

sin 2 v - 0 sin29; n n - 1

1 cos 41r n ... cos 4 s y 2 cos4r- n

1

n - 1 1 n - 1 2 n - 1 n - l 2s; sin-25 * . . sin 7 2 r y-- 2 0 sin- 2 L1

(4.2.11)

Note that Q is the matrix composed of the n characteristic vectors defined by

Page 176: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CIRCULANTS-DIAGONALUATION OF THE COVARIANCE MATRIX 153

(4.2.10) and our illustration (4.2.1 1) is for odd n. Let X, be a stationary time series with absolutely summable covariance function Hh). For odd n, define the n X n diagonal matrix D by

D = diag{d,, d,, . . . , d,) , (4.2.12)

where

j = 1,2, . . . , OS(n - 1). For r, defined by (4.2.7) the matrix QT,Q is a diagonal matrix whose elements converge to 2721) as n increases. This also holds for even n if the definition of Q is slightly modified. An additional row,

- 1 1 2 n [I, -1, l , . . . , I , - 1 1 ,

which is the characteristic vector associated with j = n / 2 , is added to the Q' of (4.2.1 1) when n is even. The last entry in D for even n is

The covariance matrix for n observations on X, is given by

HO) f i l ) - . . f i n - 1 )

r = (4.2.14)

fin-1) Hn-2) - - . For Hh) absolutely summable, we now demonstrate that QTQ also converges to 27rD.

Let q = [ql , , qZi,. . . , qni]' be the ith column of Q. We have, for I' defined in (4.2.14) and r, defined in (4.2.7),

Page 177: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

154 SPECTRAL THBORY AND PLTERlNG

whete M = (n - 1)/2 if n is odd and M = (0112) - 1 if n is even. Now (4.2.15) is less than

since qkiqri G 21n for all k, i, r, j E (1.2, . . . , n). As n increases, the limit of the first term is zero by Lemma 3.1.4 and the limit of the second term is zero by the absolute summability of r(h). Therefore, the elements of Q'I'Q converge to 27rD. We state the result as a theorem.

Theorem 4.2.1. Let r be the covariance matrix of n observations fr+n a stationary time series with absolutely summable covariance function. Let Q be defined by (4.2.1 I), and take D to be the n X n diagonal matrix defined in (4.2.12). Then, given e > 0, there exists an n, such that for n > n, every element of the matrix

Q'rQ - 2nD is less than B in magnitude.

Corollary 4.2.1. Let r be the covariance matrix of n observations from a stationary time series with covariance function that satisfies

W

2 l~lIrcNl=L<cQ. h m - m

Let Q and D be as defined in Theorem 4.2.1. Then every element of the matrix Q'I'Q - 2nD is less than 4Lln in magnitude.

Proof. By (4.2.16) the difference between q.;I'#q.] and qlirq, is less than

Now QT,Q is a diagonal matrix, and the difference between an element of QT,Q and an element of 211D is

Page 178: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE S I ” R A L DENSITY OF MOVING AVERAGES 155

We have demonstrated that, asymptotically, the Q defined by (4.2.1 1) or the G defined by (4.2.6) will diagonalize all r-matrices associated with stationary time series with absolutely swnmable covariance functions. That is, the transformation Q applied to the n observations defines n new random variables that are “nearly” uncorrelated. Each of the new random variables is a linear combination of the n original observations with weights given in (4.2.10). The variances of both the 2jtb and (2j + 1)th random variables created by the transformation are approxi- mately 2rrf l2q’ln).

For time series with absolutely summable covariance function the variance result holds for the random variable defined for any w, not just for integer multiples of 2 d n . That is, the complex random variable

n

has variance given by

By Lemma 3.1.4,

We shall return to this point in Chapter 7 and investigate functions of the random variables created by applying the transformation (4.2.10) or (4.2.6) to a finite number of observations from a realization.

4.3. THE SPECTRAL DENSITY OF MOVING AVERAGE AND AUTOREGRESSIVE TIME SERIES

By the Fourier integral theorem we know that the Fourier transform of an absolutely summable sequence is unique and that the sequence is given once again by the inverse transform of the transform. Therefore, we expect the spectral density of a particular kind of time series, say a finite moving average, to have a

Page 179: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

156 SPECTRAL THEORY AND mLTERMG

distinctive form. We shall see that this is tbe case: the spectral densities of autoregressive and moving average processes are easily recognizable.

Stationary moving average, stationary autoregressive, and stationary auto- regressive moving average time series are contained within the representation

X, = bjeIdj (4.3.0) j - -0

where

and the e, are uncorrelated (0,~') random variables. Therefore, if we obtain the spectral density of a time series of the form (4.3.0), we shall have the spectral density for such types of time series. In fact, it is convenient to treat the case of an infinite weighted sum of a more general time series.

Theorem 4.3.1. Let X, be a stationary time series with an absolutely summable covariance function and let {aj}y=-m be absolutely summable. Then the spectral density of

is

wheref,(w) is the spectral density of XI,

is the Fourier transform of uj, and

is the complex conjugate of the Fourier transform of uj.

(4.3.1)

Proof. Given the absohte summability, we may interchange integration and summation and calculate the expectation term by term. Therefore, letting

Page 180: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECTJIAL DENSlTY OF MOVING AVERAGES 157

E{X,} = 0, we have

m m

Ifwe s e t p = h - k + j , then m m

(4.3.2)

Since yy(h) is absolutely summable, we may write

t m m m

where we have made the transformation s = j + h - p. A

The theorem is particularly useful, and we shall make repeated application of the resuit. If the X, of the theorem are assumed to be uncorrelated random variables, we obtain the spectral density of a moving average time series.

Corollary 4.3.1.1. The spectral density of the moving average process m

where {e,} is a sequence of uncorrelated (0, a*) random variables and the sequence {a,] is absolutely summable, is

(4.3.3)

Proof. The result follows immediately from T h m m 4.3.1. A

Page 181: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

158 SPECTRAL THEORY AND FILTERING

Let us use (4.3.3) to calculate the spectral density of the finite moving average

P

X, = C aje,-j , /=O

We have

(4.3.4)

(4.3.5)

Equation (4.3.5) is a proof of the corollary for finite p and Serves to reinforce our confidence in the gened result. The spectral density of a finite moving average may, alternatively, be expressed in terms of the roots of the auxiliary equation. The quantity

is seen to be a poiynominal in e-'". Therefore, it may be written as

where uo = 1 and the m, are the roots of

Page 182: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECTRAL DENSITY OF MOVING AVERAGES 159

It follows that the spectral density of the {X,} defined by (4.3.4) is

We can also use Corollary 4.3.1.1 to compnte the spectral density of the first order autoregressive process

The transform of the weights uJ = p' is given by

and the complex conjugate by

Therefore.

(4.3.7)

U2 1 2.rr 1 - 2 p c O s u + p 2 *

-- -

sincefc(o), the spectral density of the uncorrelated sequence {q}, is a2/27r for all

We note that we could also find the spectral density of X, by setting Y, = e, in (4.3.1) and using the known spectral density of e,. Applying this approach to the pth order autoregressive time series, we have the following corollary.

w.

Codary 4.3.1.2. The spectral density of the pth order autoregressive time series {X,} defined by

Page 183: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

160 SF'FLTRAL THEORY AND FILTERMI

where a. = 1, ap # 0, the e, are uncorrelated (0, rr2) random variables, and the roots {m,: j = 1,2,. . . , p } of

P

R(m) = 2 C Y / ~ ~ - ~ = 0 j = O

are less than one in absolute value, is given by

u2 1 2~ j - I ( 1 - 2m, cos o + m:) * =-n (4.3.8)

Proof. Apply Theorem 4.3.1 to the finite moving average time series 1

to obtain

Where

= R(eiw)R(e-'").

Neither R(e"") nor R(e-"") can be zero, since the mots of R(m) cannot have an absolute value of one. Therefore, no) is not equal to zero for any o, and we are able to write

f X ( 4 =f,(o)T-'(w) *

This is the first repmentation given in (4.3.8). The alternative form forfX(w) follows by writing no) in the factored form

A

Page 184: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECTRAL IIEfGITY OF MOVING AVERAGES 161

We can also write (4.3.8) in the factored form

The spectral density of the autoregressive moving average process is sometimes called a rational spectral density because it is the ratio of polynomials in e"".

Corollary 43.1.3. Let an autoregressive moving average process of order ( p , q) be defined by

where a,, = & = 1, ap #O, p9 # 0, the e, are uncomlated (0, u2) random variables, and the rods {mi: j = 1,2,. . . , p ) of

P

R(m) = 2 aJmP-j = 0 1-0

are less than one in absolute value. Then the spectral density of {X,} is given by

0 a

where {rk: k = 1,2.. . . , q} are the mots of

(4.3.10)

k=O

Proof. Reserved for the reader. A

In Theorem 2.6.4 we demonstrated that a finite moving average process with no roots of unit absolute value always has a representation with characteristic equation whose roots are less than one in absolute value. In Theorem 2.6.3 we obtained a moving average representation for a time series with IpCl)( < 0.5 and p(h) = 0, h = rt2, +3, . . , . We can now extend Theorem 2.6.3 to time series with a finite number of nonzero covariances.

Page 185: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

162 SPECTRAL THEORY AND FILTERING

Theorem 4.3.2. Let the stationary time series X, have zero mean and spectral density

where flu) is strictly positive, Hh) is the covariance function of X,, and Hq) # 0. Then X, has the representation

4 4

where Po = 1, the e, are uncomlated (0, a2) random variables with a' = HO)[X;=o Sf]-'. and the roots of

4

are less than one in absolute value.

Proof. The function

is a polynomial of degree 2q in z with unit coefficient for zZ4. Therefore, it can be written in the factored form

where the m, are the roots of A(z) = 0. If A(m,) = 0, then

and by symmetry

h a - q

Sinceflu) is strictly positive, none of the roots are of unit absolute value. Also, the coefficients are real, so that any complex roots occur as conjugate pairs. Thetefore, we can arrange tbe roots in q pairs, (m,, m-J = (mr, rn;'), where multiple roots are repeated the proper number of times. We let m,, r = 1.2,. . . , q. denote the roots less than one in absolute value. Using these roots, we define 4, for

Page 186: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPE€TRAL DENSITY OF MOVING AVERAGES 163

J=0 ,1 ,2 ,..., +by a a I3 (z - m,) = 2 gzq-J .

r= I j-0

Therefore,

4 4

z-4r(q)A(z) = z-4r(q) r= n I (z - m,) r = l n (z - m;‘ )

It foliows that the expression in e-’e, defining f(o) can be written as

where fa(o) is the Fourier transform of {g} and a* = r(O)(;Ci”=, @:)-’. Now define the sequence of random variables {e,) by e, = Z,Tm0 c,X,-~, where the sequence {c,} is defined in Theorem 2.6.2. These c, are such that X, = X;=o P,e,-,, which means that l f , (~)&(w)1~ = ( 2 ~ ) - ~ . Therefore, the spec3al density of e, is

and the e, are uncorrelated with variance c2. A

By our earlier results in Fourier series we know that a continuous periodic function can be approximated by a trigonometric polynomial. This means that a time series with a continuous spectral density is “very nearIy” a moving average process and also “very nearly” an autoregressive process,

Theorem 43.3. Let g(o) be a nonnegative even continuous periodic function of period 2v. Then, given an e 3 0, there is a time series with the representation

x, = x 4 45-J ]=O

such that Ifx(o) - g(w)I < for all o E [-P, ?r] where Pa = 1, q is a finite

Page 187: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

164 S M A L THEORY AND FILTERING

integer, the e, are uncmlated (0, a’) random variabies, and

Proof. The result is trivial if g(o) * 0; hence, we assume g(w) > 0 for some w. Given e > 0, let S = 2eC(4nM + 3G)-’ and set

g ( 4 if g ( 4 ’ 6 v

where

M=mwa g(o)

P

G = I_, g ( 4 d o *

Then by Theorem 3.1.10, there is a q such that I&(#) - C(o)l<+S for all w E [-T, R], where

and

By Exercise 4.14, r(h) is positive semidefinite, and fu(o) is of the same form as the spectral density in Theorem 4.3.2. Therefore, we may write

where 8, = 1, the roots of 2,9;, /$rnqPJ are less than one in absolute value, and T~ = [z,’=, g,”]-’ J:,, C(o) do. Setting

where

A

Page 188: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECTRAL DENSITY OF MOVING AVERAGES 165

Theorem 43.4. Let g(w) be a nonnegative even continuous periodic function of period 2n. Then, given an E > 0, there is a time series with the representation

such that Ifu(w) - g(w)l< t. for all w f [-n, n], where q-, = 1, p is a finite integer, the roots of

P

9 r n P - J = o J-0

are less than one in absolute value, and the e, are uncorrelated (0, v2) random variables.

Proof. The result is trivial if g(w) = 0, Assuming g(w) > 0 for some w define

Also define G = max g(w), and let 0 < S < e[G(2G + €)I-'. Then, by Themem 3.1.10, there is a finite p such that

for all w E [- .rr, lr]. By Theorem 4.3.2,

where a, = 1, M is a constant, and the roots of "=,, ajmp-J = 0 are less than one in absolute value. Hence, by defining

and u2 = 2nM-', we have the conclusion. A

Operations on a time series are sometimes called filtering. In some areas, particularly in electronics, physical devices are constructed to filter time series. Examples include the modification of radio waves and radar signals. The creation of a linearly weighted sum is the application of a linear filter, the weights being identified as the filter. If, as in (4.3.0), the weights are constant over time, the filter

Page 189: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

166 SPECTRAL THEORY AND FILTERING

is called time inv~riant.~ Obviously such operations change the behavior of the time series, and in practice this is often the objective. By investigating the properties of the filter, one is able to make statements about the change in behavior.

It is clear that moving average and autoregressive processes are obtained from a white noise process by the application of a linear time invariant fib. Thus, using this terminology, we have been studying the effects of a hear time invariant filter on the spectral density of a time series. To further illusbate the effects of the application of a linear filter, consider the

simple time series

and apply the absolutely summable linear filter uj to obtain m

j = - m

We may express the Fourier transform of uJ in a general complex form - @,,,, (say)

OD

271f,(w) = C, uje = u(o) + /s-0

= @(/(w)[cos do) + is in &o)] = @(&?'(0), (4.3.11)

where

The reader should note the convention used in defining the sign of u(w)/u(o): u(w) is the coefficient of i (not -6) in the original transform.

In this notation. the filtered cosine wave becomes

1 y, = a r M w ~ e * l m f + B + ' ( w ' l + e-sletr+8+r(41

= 2aflw)cos(wi + B + p(w)}.

Thus, the process Y, is also a cosine function with the original period, but with amplitude 2a@(o) and the phase angle /? + do). The function #(to) is called the

' we may define the time invariant property as follows. If the output of a filter g applied to X, is q, lhen the filter is time invariant if g(X,+,) = V,,, €or all integer h.

Page 190: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

THE SPECTRAL DENSITY OF MOVING AVERAGES 167

gain of the filter, and the function p(w) is the phase angle (or simply phase) of the filter. The rationale for these tenns is clear when one observes the effect of the filter on the cosine time series. The transform 2?rf,(w) is called the frequency responsefunction or the transferfunction of the filter. From (4.3.1) we know that the spectrum of a time series Y, created by linearly filtering a time series X, is the spectrum of the original series, fx(w), multiplied by the squared gain of the filter. The squared gain is sometimes called the power transfer function.

A particularly simple filter is the perfect delay. Let {X,: t E (0, 2 1, 22, . . .)} be a stationary time series. The time series delayed or lagged by 7 periods is defined bY

y, =X,-, I

where the filter is defined by u7 = 1 and uj = 0, j # 7. Hence the frequency response function of the Mter is e-'"' = cos m - i sin w7, the gain is (cos2w + sin2m)"* = 1, and the phase angle is tan-'(-sin m/cos urr) = - m. A cosine wave of frequency w and corresponding period 27rw-' completes (2n)- 'w7 cycles during a "time period" of length 7. Thus the phase of such a cosine wave lagged T periods is shifted by the quantity (27r)-'07.

Theorem 4.3.1 is a very useful theorem for absolutely summable filters. In some situations, a mean square result applicable to a broader range of filters and time series is appropriate.

Theorem 43.5. Assume X, is a zero mean stationary time series with spectral density fx(w). Assume {cj}T--.- is a square summable sequence and that

converges absolutely. Define m

y," 5 cjx,, , t = O , k I , t 2 ,... . j 3 - m

Then

and

where

(4.3.12)

(4.3.13)

Page 191: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

168 SeecTRAL THBORY AND FILTERING

Proof. Because (4.3.12) converges, we can interchange orders of integration and summation and we have

m .

Because of the one-to-one correspondence in mean square between the Fourier coefficients of a square integrable function and the function, f,(w) is the function specified in (4.3.13). A

In Section 2.11, we introduced the long memory processes that satisfy

(1 - aIdu, =z, (4.3.14)

or

y, = (1 - a,-“, , (4.3.15)

where Zf is a finite autoregressive moving average. By Theorem 3.1.6, there is a functionf,(o) defined as a iimit in mean square,

P

f&4 = ( 2 7 v c Y y ( W i W h * h = - m

because y,(h) is square summable. Also,

w j = I_-: fv(w)e*mh dw . Consider the simple case in which the Zf of (4.3.14) is e,, a time series of

uncorrelated (0, cr’) random variables. Then, by Theorem 4.3.5,

f,(o) = ( I - e-co)-d(l - e’”)-df,(w) = (2m)-71 - e’Qy-uJ = (2~ ) - ’ [4 sir1~(0.5w)]-~a’ .

Observe that fu(w) is unbounded at w = 0 when d > 0. Because x - ’ sin x -+ 1 as x 3 0 ,

fu(”) 3 (2 7r) - 0 - 2duZ

Page 192: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

V-R PROCESSES 169

as w -+ 0. Sometimes a long memory process is defined to be a process whose spectral density is approximately equal to a multiple of w-zd for -0.5 < d < 0.5 and w near zero.

It follows from the spectral density that

4.4. VECTOR PROCESSES

The spectral representation of vector time series follows in a straightforward manner from that of scalar time series. We denote the cross covariance of two zero mean stationary time series X,, and X,, by &(h) = r(h)} , ,n = E{xj,x,,,+h) and assume that {am(h)} is absolutely summable. Then

(4.4.1)

is a continuous periodic function of w, which we call the cross spectral function of and Xm,. Since xm(h) may not be symmetric about 0, &(w) is, in general, a

complex valued function. As such, it can be written as

&(w) = C j , ( 4 - iqi,(w) 7

where c,,(o) and qjm(w) are real valued functions of w. The function cjm(w) is called the coincident spectral density or simply the cospectrum. The function qj,(w) is called the quadrature spectral density. The function cim(w) is the cosine portion of the transform and is an even function of w, and q,,(w) is the sine portion and is an odd function of w. Thus we may define these quantities as the transforms

1 ” c, ,(4 = - c +[ym(W + r/m(-h)le-””h 9

2 w h = - m

or, in real terms,

r m

(4.4.2)

Page 193: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

170 SPECTRAL THEORY AND FILTERING

By the Fourier integral theorem,

If we let f(w) denote the matrix with typical element &(o), we have the matrix representations

r(h) = 1' eePhf'(u) dw , -5T

. m

(4.4.3)

(4.4.4)

For a general stationary time series we can write

in complete analogy to (4.1.Ia). k t us investigate some of the properties of the matrix f(w).

Definition 4.4.1. A square complex valued matrix B is called a Hermitian matrix if it is equal to its conjugate transpose, that is,

B=B*,

where thejmth element of B* is the complex conjugate of the mjth element of B.

Definition 4.4.2. A Hermitian matrix B is positive definite if for any complex vector w such that w*w>O,

w*Bw > 0 ,

and it is positive semidejinite if

w*Bw 3 0 ,

Lemma 4.4.1. For stationary vector time series of dimension k satisfying

for j , m = 1,2,. . . , k, the matrix f(w) is a positive semidefinite Hermitian matrix for all o in [-w, w].

Proof. The matrix f(o) is Hermitian, since, for all w in [-w, 4 and for

Page 194: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VECTOR PROCESSES

j , m = l , 2 ,..., k,

Consider the complex valued time series

k

Z, = alxl = C, ?xj,, j = I

where! d=(a1 ,a2 , ... , ark) is a vector of arbitrary autocovariance function of 2, is given by

171

complex numbers. The

Now ~ ( k ) is positive semidefinite, and hence, for any n,

and

Taking the limit as n 00, we have

k b

which establishes that f(o) is positive semidefinite. A

It follows immediately from Lemma 4.4.1 that the determinant of any two by two matrix of the form

Page 195: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

172 SPECTRAL THEORY AND FlLTERlNG

is nonnegative. Hence,

The quantity

(4.4.7)

is called the squared coherency funcfion. The spectraI density may be zero at certain frequencies, in which case X’,,,(o) is of the form O/O. We adopt the convention of assigning zero to the coherency in such situations. The inequality (4.4.6), sometimes written as

st,m(o) c 1 , (4.4.8)

is called the coherency inequality. To further appreciate the properties of f(w), consider the time series

X,,=A,cosrr+B,sinrr,

X,, = A , cos rt + B2 sin rt , (4.4.9)

where r E (0, T ) and (A ,, B , , A,, B2)‘ is distributed as a multivariate normal with zero mean and covariance matrix

Then

Y,,(h) = a,, cos rh , %2(h) = a,, cos ?

y, , (h) = E{A ,A [cos nlcos r(f + h) + A , B2 [cos rf ]sin r(r + h) + B,A,[sin rtJcos rfr + h) + B,B,[sin rtlsin r(t + h)}

= q3 cos rh + Q , ~ sin r h ,

and Y&) = %*(--h) -

The matrix analog of the spectral distribution function introduced in Section 4.1

Page 196: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VECTOR PROCESSES

is

173

where F,,(w) is the cross spectral distribution function. For the example (4.4.9). F l , ( w ) is a step function with jumps of height fo;, at kr , F22(w) is a step function with jumps of height ;a,, at +r, F , , (u ) is a complex valued function where Re F,,(o) is a step function with a jump of height at kr, and Im F,,(w) is a step function with a jump of +c14 at -r and a jump of -4cI4 at r. Since the elements of F(w) are pure jump functions, we have a pure line spectrum. The real portion of the cross line spectrum is one-half the covariance between the coefficients of the cosine portions of the two time series, which is also one-half the covariance between the coefficients of the sine portions. The absolute value of the imaginary portion of the cross line spectrum is one-half the covariance between the coefficient of the cosine portion of the first time series and the coefficient of the sine portion of the second time series. This is one-half the negative of the covariance between the coefficient of the sine of the first time series and the coefficient of the cosine of the second. To consider the point further, we write the time series X2, as a sine wave,

X,, = # sin(rz + p) ,

where

A2

4 p = tan-’ - .

Let X,, be the cosine with the same amplitude and phase,

X,, = $ cos(rr + p)

= B, COSH- A , sin r t .

It follows that X,, is uncorrelated with X3,. The covariance between XI, and X,, is

E(X,,X2,} = E((A I cos rt + B, sin rt)(A2 cos r? + B, sin rr)} 2 = v13(cos2rt + sin rt) = wI3 ,

and the covariance between X,, and X,, is

E{X,,X3,} = E{(A I cos rr + B , sin r?)(B, cos rt - A, sin rt)} = Cr,, .

The covariance between XI, and X,, is proportional to the real part of the cross spectrum at r. The covariance bemeen X,, and X,, is proportional to the imaginary

Page 197: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

174 SPtXTRAL THEORY AND FulTERINo

portion of the cross spectrum at r. The fact that X,, is X,, “shifted” by a phase angle of d 2 explains why the real portion of the cross spectrum is called the cospectntm and the imaginary part is called the quadrature spectrum. The squared coherency is the multiple correlation between XI, and the pair Xzrr X3,, that is,

We now introduce some cross spectral quantities useful in the analysis of input-output systems. Let the bivariate time series (X,, U,)‘ have absolutely summable covariance function. We may then write the cross spectral function as

fxr(w) = A,y(w)e”PXY‘”’ , (4.4.10)

where

We use the convention of setting y3ry(w) = 0 when both cxu(w) and qxy(o) are zero. The quantity (pxr(w) is calIed the phase spectrum, and A,,(@) is called the cross amplitude spectrum. The gain of Y, over X, is defined by

(4.4.11)

for those w where fxx(w) > 0. Let us assume that an absolutely summable linear filter is applied to an input

time series X, with zero mean, absolutely summable covariance function, and everywhere positive spectral density to yield an output time series

= i a,xl-j . (4.4.12) j= -a .

The cross covariance function is

and, by Corollary 3.4.1.1, the cross spectral function is

(4.4.13)

Page 198: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VECTOR PROCESSES

where

175

It follows that fxv(o)/fxx(w) is the transfer function of the filter. Recall that the phase do) and gain +(a) of the filter {aj} were defined by

Sincefx,(o) is the product of 2@(w) andfxx(w), where&(@) is a real function of o, it follows that the phase spectrum is the same as the phase of the filter. The cross amplitude spectrum is the product of the spectrum of the input time series and the gain of the filter. That is,

and the gain of Y, over XI is simply the gain of the filter. By Theorem 4.3.1, the spectral density of Y, is

and it follows that the squared coherency, for If,(w)l> 0, is

This is an interesting result, because it shows that the squared coherency between an output time series created by the application of a linear filter and the original time series is one at all frequencies. The addition of an error (noise) time series to the output will produce a coherency less than one in a linear system. For example, consider the bivariate time series

(4.4.16) X*r=@XI3r-1 + e l f

x,, = a r + %XI.,-, + e2r '

where 1/31 < 1 and {(eIf,e2,) '} is a sequence of uncorrelated vector random variables with E{e:,} = a,, , E{e:,) = a,,, and E{elfe2,f+h} = 0 for all t and h.

The input time series XI, is a first order autoregressive time series, and the output X,, is a linear function of X l f , XI.,-, , and eZr. The autocovariance and cross

Page 199: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

176 SPECTRAL THEORY AND FlLTERlNG

covariance functions are therefore easily computed:

where g, that is.

is the spectral density of el,, and g22(o) is the spectral density of e2,;

TaMe4.4.1. Autocovariance and Cross Covarlance Functions of t4e Time Series M n e d in (4.4.16)

-6 -5 -4 -3 -2 -1

0 1 2 3 4 5 6

0.262 0.328 0.410 0.512 0.640 0.800 1 .Ooo 0.800 0.640 0.5 12 0.4 10 0.328 0.262

0.5% 0.745 0.932 1.165 1.456 1.820 2.550 1.820 1.456 1.165 0.932 0.745 0.5%

0.341 0.426 0.532 0.666 0.832 1.040 1.300 1.400 1.120 0.8% 0.7 17 0.573 0.459

0.213 0.267 0.333 0.4 17 0.52 1 0.651 0.814 0.877 0.701 0.561 0.449 0.359 0.287

Page 200: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VECTOR PROCESSES 177

The cospectrum and quadrature spectral density are

the phase spectrum and cross amplitude spectrum are

and the squared coherency is

W h e r e

Since g,,(o) is positive at all frequencies, the squared coherency is strictly less than one at all frequencies. The quantity do) is sometimes called the noise ro signu2 ratio in physical applications. Although the presence of noise reduces the squared coherency, the ratio of the cross spectrum to the spectrum of the input series still gives the transfer function of the filter. This is because the time series ezf is uncmelated with the input Xlf . ?he quantity

is sometimes called the error spectral density or error spectrum. We see that for a model such as (4.4.16),

For the example of Table 4.4.1 the elements of the spectral matrix are

Page 201: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

178 SFWTRAL THHORY AND FILTERING

It is interesting that the spectral density of X,, is that of an autoregressive moving average (1,l) process. Note that if &, had been defined as the simple sum of X,, and ez,, the spectral density would also have been that of an autoregressive moving average (1, I ) process but with different parameters.

The noise to signal ratio is

?it4 = g z z ( 4 - (1 -0.8e-'")(l -0.8e'") -

1211f,(o)(%,(w) (0.72)(0.5 + e-"")(O.J -+ e'") '

and the squared coherency is

0.45 + 0.36 cos w x:2(w) = 1.27 - 0.44 cos w *

Let us consider the example a bit futher. The input time series XI, is autoconelated. Let us filter both the input and output with the same filter, choosing the filter so that the input time series becomes a sequence of uncorrelated random variables. Thus we define

x,, =XI, - @XI,,-l = el, 9

'41 = '21 - p'2.I - 1 9

and it follows that

The cross covariance function of X,, and X,, then has a particularly simple fonn:

qull , h = O , y3&)= a;q,, h = l , I , otherwise.

By transforming the input series to white noise, the cross covariance function is transfwmed into the coefficients of the function (or linear filter) that defines X,, as a function of X3,. The spectral matrix of X,, and X,, has elements

Page 202: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

VE(JT0R PROCESSES 179

The reader may verify that

-a2 sin w

As we expected from (4.4.1 1) and (4.4.13), the phase spectrum is unchanged by the transformation, since we transformed both input and output witb the same filter. The filter changed the input spectrum, and as a result, that portion of the cross amplitude spectrum associated with the input was altered. Also, the error spectral density

is that of a moving average with parameter -@. The reader may verify that xi,(") is the same as X:,(o).

We have introduced and illustrated the ideas of phase spectrum and amplitude spectrum using the input-output model. Naturally, these quantities can be computed, in much the same manner as we compute the correlation for a bivariate normal distribution, without recourse to this model. The same is true of squared coherency and error spectrum, which have immediate generalizations to higher dimensions.

The effect of the application of a matrix filter to a vector time series is summarized in Theorem 4.4.1.

Theorem 4.4.1. Let X, be a real k-dimensional stationary time series with absolutely summable covariance matrix and let {A,}JW,-m be a sequence of real k X k matrices that is absolutely summable. Then the spectral density of

is

where

Page 203: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

180 SPECTRAL THEORY AND FILTERING

and Q(w) is the conjugate transpose of fA(w),

. m

Proof. The proof parallels that of Theorem 4.3.1. We have

and the resuit follows. A

As in the scalar case, the spectral matrix of autoregressive and moving average time series follows immediately from Theorem 4.4.1.

Corollary 4.4.1.1. The spectral density of the moving average process

x, = 5 Bje,-, . 1- -m

where {e,} is a sequence of uncorrelated (0, X) vector random variables and the sequence {B,} is absolutely summable, is

= (2T)2f,9(0)2f*8(0).

Corollary 4.4.1.2. Define the vector autoregressive process X, by D

where A, = I, A,, # 0. the e, are uncorrelated (0,Z) vector random variables, and the roots of

I &Ajrnp-Jl = O

are less than one in absolute value. Then the spectral density of X, is

f,(W) = 2nrf~ , (w)~- '11[ f~ , (w) i - ' ,

where fA.(w) is defined in Theorem 4.4.1.

Page 204: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MEASUREMENT ERROR - SIGNAL DETECTION 181

43. MEASUREMENT ERROR - SIGNAL DETECTION

In any statistical model the manner in which the “emrs” enter the model is very important. In models of the “input-output” or “independent-variable-dependent- variable” form, measurement error in the output or dependent variable is relatively easy to handle. On the other hand, measurement error in the input or independent variable typically introduces additional complexity into the analysis. In the simple regression model with no& e m the presence of normal measurement error in the independent variable requires additional information, such as knowledge of the variance of the measurement error, before one can estimate the slope of the regression line.

In time series analysis the distinction between independent variable and dependent variable may be blurred. For example, in predicting a future observation in the realization, the past observations play the role of independent variables, while the future observation plays the role of (unknown) dependent variable. As a result, considerable care is required in specifying and treating measurement e m in such analyses. One important problem where errors of observation play a central role is the estimation of the values of an underlying time series that is observed with measurement error.

To introduce the problem, let {XI: t E (0, t 1, 22, . . .)} be a stationary time series with zero mean. Because of measurement error we do not observe X, directly. Instead, we observe

y, =x, + u, * (4.5.1)

where {u,: t E (0, t: 1,+-2,. . .)} is a time series with u, independent of X, for all t , j . The u, are the measurement mrs or the noise in the system. A problem of signal measurement or signal detection is the construction of an estimator of XI given a realization on Y, (or a portion of the realization). We assume the covariance functions of XI and u, are known.

We first consider the problem in the time dornain and restrict ourselves to finding a linear filter that minimizes the mean square error of our estimator of X,. Thus we desire weights {a,: j = -L, -(L - 11,. . . , M - 1, M}, for L 3 0 and M 3 0 fixed, such that

is a minimum. We set the derivatives of (4.5.2) with respect to the aj equal to zero and obtain the system of equations

M

zl aj[Yxx(j - r) + %“(j - r)l= .xrx(‘) 9

I= -L

r = -L, -(L - I), , . . , M - 1 , M . (4.5.3)

Page 205: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

182 SPECTRAL THEORY AND FILTERING

For modest choices of L and fU this system of liner equations is w i l y solved for the a,. The mean square error of the estimator can then be obtained from (4.5.2). To obtain the set of weights one would use if one had available the entire

realization, we investigate the problem in the frequency domain. This permits us to establish the bound on the mean square error and to use this bound to evaluate the performance of a filter with a finite number of weights. We assume that u, and X, have continuous spectral densities with bounded derivatives. The spectral density of Y, and the cross spectral density follow immediately from the independence:

hub) = f x x ( 4 + f , , ( W ) *

f x y ( 4 = f x x ( 4 *

We assume that frr(o) is strictly positive and that I;:(&) has bounded first derivative.

We shall search in the class of absolutely summable filters {aj} for that filter such that our estimator of X,,

m

Now the mean square e m r is

(4.5.4)

and W, = X, - X,yw-m ajY, , is a stationary time series with a spectral density, say fw,,,(y(o). Hence, the variance of is

7r

&W(O) = EW:} = f W W ( 4 - 7 r

1T

= I_, cfxx(4 - 2mXdf , , (@) +ff(w)fux(w)I

+ ( 2 T ) 2 f , ( w 3 4 f u u o ) y ~ ( d * (45.5)

We have converted the problem of finding the a, that minimize (4.5.4) to the problem of finding thef,(o) that minimizes (4.5.5). This problem retains the same form and appearance as the classical r e p s i o n problem with f,(w) playing the role

Page 206: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MEASUREMENT ERROR - SIGNAL DETECTION 183

of the vector of coefficients. Therefore, we take as our candidate solution

which gives

ra

(4.5.6)

(4.5.7)

The weights aj are given by the inverse transform off,(@), II

ai= \-IIf,(ok'"do* j=0,+1*-+2 ,... . To demonstrate that these weights yield the minimum value for the mean

square error, we consider an alternative to (4.5.6):

wheref,(w) must be in the class of functions such that the integral defining the mean square error exists. The mean square emor is then

c f x x ( 4 - 2 . r r t r , ( w ; x ( d + f t ( @ l f Y X ( d I

+ ( 2 7 4 2 f ( 4 f X ( ~ ) f y y O ) d o

Since If,(w)i2/fyy(w) is nonnegative, we conclude that the f,(w) of (4.5.6) yields the minimum value for (4.5.5).

Note that 212f,(w)=fi;(w)f,(o) is a real valued symmetric function of w because fyx(w) =fxx(o) is a real valued symmetric function of w. Therefore, the weights a, are also symmetric about zero.

We summarize in Theorem 4.5.1,

Theorem4.5.1. Let{X,:rE(O,k1,+2 ,... )}and{u,:rE(O,%l,-C2 ,... )}be independent zero mean stationary time series, and define Y, = X, + u,. Let fyu(w), f i ,!(w), andfxx(w) be continuous with bounded first derivatives. Then the best linear filter for extracting X, from a realization of Y, is given by

Page 207: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

184

Where

SPECTRAL THEORY AND FILTERING

Furthermore, the mean square error of X;=-ma,q-, as an estimator for X, is

U I_: cfxx(4 - r . f Y Y ( ~ ~ l - ' I & Y ( ~ ) 1 2 1 ~ ~ =I_, fXX(4[1 - G(41 dfiJ - Proof. Since the derivatives off;,!(#) andfyx(w) are bounded, the derivative

of f;,!(w)&',(o) is square integrable. Therefore, by Theorem 3.1.8, the Fourier coefficients of ~ ; , ! ( U & ~ ( W ) , the a,, are absolutely summable, and by Theorem 2.2.1, U , K - ~ is a well-defined random variable. That {a]} is the best linear filter follows from (4.5.81, and the mean square error of the filtex follows from (4.5.7). A

Example 4.5.1. To illustrate the ideas of this section, we use some data on the sediment suspended in the water of the Des Moines River at Boone, Iowa. A portion of the data obtained by daily sampling of the water during 1973 are displayed in Table 4.5.1. The data are the logarithm of the parts per million of suspended sediment. Since the laboratory determinations are made on a small sample of water collected from the river, the readings can be represented as

y c = x f + u f ,

where U, is the recorded value, XI is the "true" average sediment in the river water, and u, is the measurement error introduced by sampling and laboratory determination. Assume that XI can be represented as a first order autoregressive process

XI - 5.28 = 0.81(X,-, - 5.28) + el ,

where the e, are independent (0,0.172) random variables. Assume further that u, is a sequence of independent (0,0.053) random variables independent of X, for all

To construct a filter {a-2,a-,,a,,a,,a,} that will best estimate X, using the I, j .

observations { U , - 2 , Y,- ,, q, y i+ l , q+2}r we solve the system of equations

0.553 0.405 0.328 0.266 0.215 a _ , 0.328 0.405 0.553 0.405 0.328 0.266 a_, 0.405 0.328 0.405 0.553 0.405

0.328)( ;:) =(0.500) 0.405 0.266 0.328 0.405 0.553 0.405 0.215 0.266 0.328 0.405 0.553 0.328

to obtain

a = {a-,, a- ,, a,, a, , a,} = (0.023,0.120,0.702,0.120,0.023}.

Page 208: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MEASUREMENT ERROR - SIGNAL DETECTION 185

Tak45.1 . Logarithm of sedhnent Suspended in Des Moines River, Boone, Iowa, 1973

Smoothed Observations Daily Observations Two-Sided Filter One-sided Filter

5.44 5.38 5.43 5.22 5.28 5.21 5.23 5.33 5.58 6.18 6.16 6.07 6.56 5.93 5.70 5.36 5.17 5.35 5.5 1 5.80 5.29 5.28 5.27 5.17

- 5.40 5.26 5.27 5.22 5.25 5.37 5.63 6.08 6.14 6.13 6.38 5.96 5.69 5.39 5.24 5.36 5.51 5.67 5.36 5.29 -

- 5.26 5.28 5.22 5.23 5.31 5.53 6.04 6.10 6.04 6.42 5.99 5.14 5.42 5.22 5.33 5.47 5.72 5.37 5.30 5.27 5.19

SOURCE: U.S. Department of Interior Geological Survey-Water Resources Division, Sediment Concentration Notes, Des Moines River, Boone, Iowa.

The mean square error of the filtered time series 5.28 + Z:,=-2uj(&-i - 5.28) as an estimator of X, is 0.500 - a[0.328,0.405,0.500,0.405,0.328]’ = 0.0372. The data transformed by this filter are displayed in the second column of Table 4.5.1. Note that the filtered data are “smoother” in that the variance of changes from one period to the next is smaller for the filtered data than for the original data.

We now obtain a one-sided filter that can be used to estimate the most recent value of X, using only the most recent and the four preceding values of Y,. The estimator of X, is given by

5.28 + b,(& - 5.28) + b,(Y,-, - 5.28) -t bZ(q-l - 5.28) + b,(Y,-, - 5.28) + b4(Y-., - 5.281,

Page 209: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

186 SPECTRAL TweoRY AND FILTERING

where

(b) b, (0.405 0.553 0.405 0.553 0.328 0.405 0.266 0.328 0.215 0.266) - I (0.405) 0.500 (0.134) 0.790

This one-sided estimator of X, has a mean square error of 0.0419. The filtered dafa using this filter are given in the last column of Table 4.5.1. To obtain the minimum value for the mean square error of the two-sided filter,

we evaluated (4.5.7), where

= 0.328 0.405 0.553 0.405 0.328 0.328 = 0.023 . 0.266 0.328 0.405 0.553 0.405 0.266 0.005 0.215 0.266 0.328 0.405 0.553 0.215 0.0o0

0.172 fxx(” )= 2741 - 0.81e-’”)(l - 0.81e’”) ’

0.172 0.053 +- fyy(o)= 2?r(l - 0.81e-i“)(l - 0.81e’”) 27T

- 0.2524(1 - 0.170e’”)(l- O.l7Oe-‘”) - 2n(l - 0.81e-’’w)( 1 - 0.81e’”) ’

obtaining

= 0.500 - 0.463 = 0.037 . The infinite sequence of weights is given by the inverse transform of

0.172 2~(0.2524)(1 - 0.17e-‘”)(l - 0.17e””) ’

which yields

(. . . , 0.1 193,0.7017,0.1193,0.0203,0.0034,~ . .) . While it is possible to use specrral metfiods to evaluate the minimum mean

square error of a one-sided filter [see, for example, Yaglom (1952, p. 97)], we examine the problem in a slightly different manner. Since Y, is an autoregressive moving average (1,l) process, the methods of Section 2.9 can be used to obtain a one-perioci-ataead predictor t (~-] , Y,-~, . . .) based on an infinite past. AS Y, = X, + u,, where u, is a sequence of independent random variables, the best predictor of XI based on q-,, Y,-2,. . . must be the same as the best predictor of Y, based on

q-z,. . . . Furthennore, given the predictor, the partial correlation between any q-,, j >O, and X, is zero. Therefore, to obtain the best filter for X, using Y,. Y,-,, . . . , we find the optimal linear combination of f’r(Y,-,, qb2,. . .) and Y,.

Page 210: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MODELS AND KALMAN FILTERING 187

Denote the linear combination by c,Y, -I- c,t(Y,- ,, Y,-2, . . .), where

0.5530 0.3006) - *(“‘MOO) - (“‘“) (zy) =(0.3006 0.3006 0.3006 - 0.2100 ’

the matrix

) 0.5530 0.3006 ( 0.3006 0.3006

is the covariance matrix of [q, t($-,, Y,-2,. . .)I, and (0.5000,0.3~) is the vector of covariances between [Y,, Y,(q-l, yt+. . .)I and X,. It follows that the minimum mean square error for a one-sided predictor is 0.0419, AA

4.6. STATE SPACE MODELS AND KALMAN FILTERING

We begin our discussion under the assumption that the univariate time series XI is a stationary, zero mean, first order autoregressive process. We write

X~=QX,-~ + e , , r = l , 2 ,... , (4.6.1)

where e, am independent identically distributed (0, a:) random variables, denoted by e, - II(0, a:). Assume we are unable to observe X, directly. Instead, we observe Y,, where

Y,=XI+ul, t = 1 , 2 ,..., (4.6.2)

and u, is the measurement error. We assume u, -II(O, a:) and that u, is independent of e, for all t and j .

The model (4.6.1)-(4.6.2) is a special case of the state space representation of time series. Equation (4.6.1) is called the state equation or the transition equation, and X, is called the state of the system at time t. Equation (4.6.2) is called the measurement eguatwn or the observation equation. The model (4.6.1)-(4.6.2) was introduced in the Des Moines River example of Section 4.5. In that example it is very natural to think of the unknown level of the river as the true “state” of nature.

In Section 4.5, we constructed linear filters to estimate the values of the time series X, that is observed subject to measurement error. The filters were designed to minimize the mean square error of the estimation error for a particular set of observations. In many applications, the observation set is composed of all previous observations plus the current observation. In some engineering applications, it is important to have an efficient method of computing the current estimated value requiring as little storage of information as possible. Such computational methods have been developed by Kalman (1960, 1963) and others. The methods are often

Page 211: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

188 S&”RAL THEORY AND MLTERING

called Kalman jilters. In this section, we study the state space model and the Kalman filter.

Let the model (4.6.1) and (4.6.2) hold, and assume that an initial estimator for X,, denoted by i,, is available. Let

2, = x, + u, , (4.6.3)

where u, is a (0, a:,) random variable, independent of (u,, e,), t = 1,2, . . . . The initial estimator 2, and parameters a:, af, at,, and a are assumed known. Equation (4.6.3) is called the initial condition equation (or starting equation). A possible choice for 2, for the model (4.6.1) is 2, = 0. With i0 = 0, we have

At time t = 1, we have the observation (Y1,ko) and knowledge of a:, a:, and a to use in constructing an estimator (predictor) of XI. On the basis of the specification, we have

2 2 - I 2 avo = x(0) = (1 -a ) a,.

Y, = X I + u , ,

aJ?,=X, + w , ,

(4.6.4)

where w , = avo - e,. The system of equations (4.6.4) can be written in matrix form as

zg=JxI+~l,

where Z, = (Y,, &f0)’, J = (1, I)’, and E , = (u,, w,)’. Because u, , e l , and uo are mutually uncorrelated, the covariance matrix of E, is diag(a:,oi,), where ail = a, + a avo. merefore, it is natural to construct an estimator of X, as the weighted average of Y, and a&, where the weights are proportional to the inverses of the variances of u, and wI. Thus,

2 2 2

8, =(a;’ + o;y(a;’Y, + a,;a8,) . (4.6.5)

The estimator (4.6.5) is constructed by analogy to linear regression theory. In the problem formulation (4.6,4), the information about the unknown random value, X,, is contained in the second equation of system (4.6.4). See Exercise 4.19. The same approach can be used to construct succeeding estimators.

Let the error in 8, as an estimator of X, be u , , where

(4.6.6) - 2 - I u, =fl -XI ={a;’ + awl ) (a,-2uu, + cr,:w,) . The variance of u , is

(4.6.7)

At time t = 2, it is desired to estimate X, using Y,, Y , , and go. Now d, is the best

- 2 - I a;, = (a;2 + CT,, ) .

Page 212: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MODELS AND KALMAN FlLTERlNG 189

predictor of X 2 constructed from the data (Y,, $). Therefore, we need only combine the information in a i l with that in Y2 to obtain the best estimator of X , . Using (4.6.1)-(4.6.2) and the identity

= ax, + a(& - x,) = ax, + auI ,

we have the system of two equations containing X 2 ,

Y2=X2+uz.

aY?, =x2 + w , ,

(4.6.8)

where w2 = au, - e,. Because the vectors (u,, e,) are uncorrelated, we have

E&,e2, uI)'(u2,e2,u,)} = diag(at, a:, d) and

~ { ( u , , w,)) = diag(a,2, a: + a * ~ ~ ~ ) .

It fouows that the best estimator of x,, given ($, Y ~ , Y,), is

(4.6.9)

2 where a,* = a: + a'a:, . Letting the error g2 - X, be -2 - 1

0, = (Cri2 + uw2 (+2 + +2),

2 - 2 - 2 - I the variance of u, is uu2 = (a, + uvz) . previous observations, the estimator of X, for general t is

Because if-, contains all of the information about X, available from the

(4.6.10) -2 - 2f =(a;' + ~;;,">-'(u;~Y, + cw, axf-,),

and

= (ai2 + a;;:>- I a",

Equation (4.6.10) can be rearranged to yield

(4.6.11)

Page 213: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

190

or

SPECFRAL THEORY AND FILTERING

The first term on the right of (4.6.12) is the estimator of X, based upon 2,- I . The second term is the negative of an estimator of the error made in predicting X, with a& I. We can give a direct justification of (4.6.12) as follows. Because u, - w, = Y, - a$, -, is a linear combination of V, and 2, -, , the linear estimator of X, b a d on

(ut - w,,$-,) = (u, -

is the same as the linear estimator based on (K, 2, - , ). Now u, - w, is uncorrelated with 2, - I. It follows, by the properties of regression (or of Hilbert spaces), that the best linear estimator of X, based on ($-,, Y,) is the best linear estimator of X, based on 2,- , , plus the best linear estimator of X, - &, based on Y, - a$,-, . Hence, the best estimator of X, is given by (4.6.12). See Exercise 4.25.

An equation equivalent to (4.6.12) is

2, = u, - a,2(a,z + r;r)-t(x - a$,-,). (4.6.13)

The form (4.6.12) appears in the original signal extraction literature, and the form (4.6.13) appears in the random model prediction literature. In the form (4.6.13), an estimator of the error u, is subtracted from Y, to obtain the estimator of XI.

The variance of u, satisfies the equation

(4.6.14) 2 r", = - (..;,)'(a: f a y .

Hence, the data required at time t f 1 to construct the estimator of X,, I and to construct the variance of the estimation error are the elements of (v,+,, 2,. CT;,),

The equations (4.6.12) and (4.6.14) are sometimes called the updating equations of the Kalman filter.

The results for the simple model (4.6.1)-(4.6.2) generalize to p-dimensional vector time series X, and to the situation wherein the observed vector Y, is the sum of a known linear function of X, and measurement error. Let

X, =AIXI-I + e l , t = 1,2 ,..., (4.6.15)

Y, = H,X, + u, , I = 1,2, . . . , (4.6.16)

go = x, + v, , (4.6.17)

where XI is a pdimensional column vector, Y, is an rdimensional column vector, {HI} is a sequence of known r X p matrices, A, is a sequence of known p X p matrices, 2, is a known initial vector, and {(u:, e:)} is a sequence of uncmlated,

Page 214: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MODELS AND KALMAN F[LTERING 191

zero mean, vector random variables with known covariance matrix

Equation (4.6.15) is the state equation, equation (4.6.16) is the measurement equation, and equation (4.6.17) is the initial equation. As before, we assume vo of the initial equation to be uncorrelated with u, and el, t = 1,2, . . . . Considerabie generalization of the model is obtained by permitting the variances to be functions of r and by the inclusion of the matrices H, in the measurement equation. Many different forms of state space representations appear in the literature.

For the vector model, the system of equations analogous to (4.6.8) is

Y, = H,X, + u, ,

A,%, - I = X, + w,

(4.6.18)

for r = 1,2,. . . , where w, = A,v,- - e, and v, = 8, - XI. If we assume that Z,,,, is nonsingular, the best estimator of X, in (4.6.18) is

where

The estimator (4.6.19) can also be written as

where D, is the covariance matrix of Y, - R,A,fZ,- I and SWw,,H~ is the covariance between w, and Y, - H,A,%,- I . Therefore, the estimator (4.6.23) is the difference between the unbiased estimator A,%,-, of X, and an unbiased estimator of the e m r w, in A,%,.-l.

Equation (4.6.20), the second equation of (4.6.21), and uation (4.6.23) form a set of equations that can be used to construct 8, given Y,, , -, , If this system of equations is used for updating, only the matrix D, is required to be nonsingular. The matrix D, should always be nonsingular because there is little reason for the subject matter specialist to consider singular observation vectors Y,.

Page 215: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

192 SFJXXRAL THRORY AND FLTERMG

The vector of updating equations analogous to (4.6.13) is

and an alternative expression for Xu,,, is

Exampie 4.6.1. In this example, we construct the Kalman filter for the Des Moines River example of Section 4.5. The model for the data is

(4.6.24)

where e, - II(0,0.172) and u, - iI(O,O.O53). The sediment time series has a nonzero mean, so the diffemce XI - 5.28 plays the role of XI of (4.6.1).

We begin with the first observation of Table 4.6.1, which we call Yt . If XI is a stationary process, as we assume for the Des Moines River example, we can use the population mean to initiate the filter. If we use the population mean as an estimator of X,, the variance of the estimation error is 0.500, which is the variance of the X, process. The system of equations (4.6.4) becomes

Y, -5.28=0.16=(Xt -5.28)+u1 ,

O = ( X , -5 .28)+wI.

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15

5.44 5.38 5.43 5.22 5.28 5.2 1 5.23 5.33 5.58 6.18 6.16 6.07 6.56 5.93 5.70

5.42467 5.38355 5.4 161 3 5.25574 5.2 7 5 8 8 5.22399 5.23097 5.31 117 5.52232 6.03227 6.10318 6.04413 6.42 123 5.98760 5.73215

0.04792 0.04205 0.041 88 0.04187 0.041 87 0.04 1 87 0.04187 0.04187 0.04187 0.04187 0.04187 0.04187 0.04187 0.04187 0.04 1 87

0.50000 0.20344 0.19959 0.19948 0.19947 0.19947 0.19947 0.19947 0.19947 0.19947 0.19947 0.19947 0.19947 0.19947 0.19941

Page 216: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MODELS AND KALMAN FILTERING 193

where V((u,, w , ) } = diag(0.053,0.500). Therefore, by (4.6.10),

2, = 5.28 + (18.868 + 2.000)-'[18.868(YI - 5.28) + 01 = 5.28 + 0.145 = 5.425 .

The reader may verify that the coefficient for Yl - 5.28 is the covariance between X, and Y, divided by the variance of Y, . The variance of the error in the estimator of X I is cr," = 0.0479.

The estimate of X 2 is

z2 = 5.28 + [(0.053)-' + (0.2034)-']-'

X [(0.053)-'(Y2 - 5.28) + (0.2034)-'(0.81)(~, - 5.28)]

= 0.2075 + 0.7933Y2 + 0.16742, = 5.3836,

2 2 2 where ut2 = ue + a r,, = 0.2034. The variance of 22 - X, is

at2 = 0.2034 - (0.2034)2(0.053 + 0.2034)-' = 0.0420.

The estimate for X, is

(0.053)-'(Y3 - 5.28) + (4.0581)($2 - 5.28) (0.053)-' + (0.1996)-'

I!3 = 5.28 + = 5.4161.

and the variance of the estimation error is

u:3 =0.1996 - (0.1996)2(0.053 + 0.1996)-' = 0.0419.

The estimates and variances for the remaining observations are given in Table 4.6.1. Note that the variance of the estimation error is approaching 0.0419. This limiting variance, denoted by u:. was derived in Example 4.5.1 as the mean square error based on an infinite past. The variance of w, stabilizes at

ui = a: + a'ua,' = 0.172 4- (0.81)2(0.0419) = 0.1995.

It follows that equation (4.6.12) stabilizes at

gI = 5.28 + 0.81(2,-.l - 5.28) + 0.7901[Y, - 5.28 - 0.81(I!1-, - 5.28)],

where 0.7901 = (a: + u:)-'u,: and a = 0.81. AA

Example 4.6.2. We investigate estimation of XI for the Des Moines River

Page 217: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

194 SPECTRAL THEORY AND FILTERING

example under the assumption that the mean of the process is unknown. We retain the assumption that the other parameters of the model are known. We write the model as

(4.6.25)

where p is the mean of the process. The first two equations of (4.6.25) are the state. equations and the last equation is the measurement equation. in terms of the model (4.6.15)-(4.6.17). a,,,, = a,, = 0.053,

A, = A = diag(a, 1) = diag(0.81,l) , &,, = x,, = diag(0.172,0),

Xi = (Z,, p), ei = (e,,, 0), and HI = (1,l). Under the model, each Y, is unbiased for p, and the variance of U, - p is uzz + cuu.

To initiate the filter, we use the knowledge that 2, is a random variable with mean zero and variance azz = a:. Letting Yo be the first observation of TabIe 4.6.2, we can form the system of equations

(4.6.26)

(4.6.27)

0 = z, + u, ,

Yo= p + z, I- uo . We are treating p as a fixed unknown parameter, so we do not have an initial equation of the type (4.6.26) for p. The first real observation furnishes the first information about p. From the system (4.6.26)-(4.6.27), we obtain the estimator

g; = (20, A) = (0, Yo)

with covariance matrix

where a - = 0.500 and a,, = 0.053. Using 2; = (0,5.44) and equation (4.6.19), we have

0.46953 -0.45252 18.868Y, + 19.5884 (3 =( -0.45252 0.47901 18.868Yl + 24.1832 = (-0.0193,5.4100)' ,

and from (4.6.20),

0172 0 0.328 -0.405) =( ' * ~ l t =( '0 0) '(-0.405 0.553 -0.405

Page 218: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MODELS AND KALMAN FUTERMG 195

If we use (4.6.23),

'1 =(Ob8' ~ > ( S . ~ ) '(-0.405 Oso0 -OS4O5) 0.553 (:)(0.1430)-'(5.38 - 5.44)

= (-0.0193,5.4100)' ,

where

D, =0.053+(l,1)2,w,,(1,1)'=0.29605.

It follows that

0.46953 -0.45252) zuull =(-0.45252 0.47901

estimate of the sediment at t = 1 is

S, = gt + 4, = -0.0193 + 5.4100 = 5.3907 . The variance of the error in the estimated sediment at time one is

(1, l ) ~ n u , i ( l , 1)' =0.04351.

The estimates for t = 2 are

0.43387 -0.41230 -0.41230 0.43367

Where

0.4801 -0.3665) ' w w Z z = (-0.3665 0.4790 *

The estimates of S, = 2, + p, based on Y,, I-',-.,, . . . , Yo, are given in Table 4.6.2. Note that the estimation of the mean conttibutes modestly to the variance of $, - S,. 'Ihe variance of fi, is declining approximately at the rate t - I . While the variance of the estimator of S, = p + 2, will eventually approach 0.0419, the approach is slowed by the estimation error in &. AA

Example 4.63. In the previous examples, we appIied the Kalman filter to a stationary time series. Stationarity made it relatively easy to find starting values for the filter. In this example, we consider filteMg for a time series in which the autoregressive part has a unit root. Let the model be

t = O , Z,-,+e,, r = l , 2 ,..., (4.6.28)

y , = e + z I + u l , t = o , i ,..., (4.6.29)

Page 219: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

1% SPECTRAL THEORY AND FILTERING

Table 4.6.2. Estimates of Sedhnent Constructed with Kalman Filter, Unknown Mean

0 1 2 3 4 5 6 7 8 9 10 I1 12 13 14

5.44 5.38 5.43 5.22 5.28 5.2 1 5.23 5.33 5.58 6.18 6.16 6.07 6.56 5.93 5.70

5.44OOo 5.39074 5.42324 5.25915 5.27933 5.22621 5.23298 5.31422 5.52856 6.04632 6.1 1947 6.06090 6.44377 6.00652 5.74886

5.44ooo 5.41001 5.42436 5.35070 5.35 185 5.326 13 5.32174 5.34347 5.40987 5.57235 5.61890 5.62881 5.74903 5.67363 5.62767

0.05300 0.0435 1 0.04293 0.04280 0.04272 0.04266 0.04261 0.04256 0.04252 0.04249 0.04246 0.04243 0.04240 0.04238 0.04236

0.55300 0.47901 0.43367 0.39757 0.36722 0.34 12 1 0.31864 0.29887 0.28141 0.26588 0.25 198 0.23%5 0.228 1 1 0.21780 0.20838

where (e,* uI)' - NI[O, diag(cr:, (r:)]. We treat the true part of Yo, denoted by 8, as a fixed unknown constant to be estimated. The Y-data of Table 4.6.3 were generated by the model with (wt, a:) = (0.25,0.16). In terms of the model (4.6.15)- (4.6.17), a,,,, = o;, = 0.16, A, = A = I,

H, = H = (1, I ) , and X,, = diag(0.25,O) . Table 4.63. Kalman Fflter Applied to a Unit Root Process

f - 0 1 2 3 4 5 6 7 8 9 10 1 1 12 13 14

r, j , = a, + 2,

2.30797 2.54141 3.08044 1.35846 1.55019 2.34068 1.33786 0.98497 1 .I7314 0.65385 0.35140 0.47546

-0.56643 0.04359

- 0.25374

2.30797 2.47588 2.89622 1.83049 1.63629 2.12430 1.57945 1.16759 1.17143 0.8 1285 0.493 1 5 0.48090

-0.24470 -0.04497 -0.18% 1

2.30797 2.37350 2.4252 1 2.38483 2.38257 2.38432 2.38372 2.38358 2.38358 2.3 8 3 5 7 2.38357 2.38357 2.38356 2.38356 2.38356

0.16OOo 0.11509 0.1 1125 0.1 1089 0.1 1085 0.1 1085 0.1 1085 0.1 1085 0.1 1085 0.1 1085 0. I 1085 0.1 1085 0.1 1085 0.1 1085 0.1 1085

0.16ooO 0.1 1509 0.11 125 0.11089 0.11085 0.1 1085 0.1 1085 0.1 I085 0.1 1085 0.1 1085 0.1 1085 0.1 I085 0.1 1085 0.1 1085 0.1 I085

0.16OOo 0.0449 I 0.0 1 369 0.00420 0.00129 0.00040 0.00012 0.00004 O.oooO1 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000

Page 220: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MORELS AND KALMAN FILTERING 197

The model specifies 2, = 0 and U, = 8 + uo. Therefore, we initiate the filter with

86 = (2?o, do) = (0, Yo) = (0,2308)

and

= diag(0,0.16).

Then

= diag(0.25,0.16) , D, = 0.16 -b (1 , l)[diag(0.25,0.16)]( I, 1)' = 0.57 ,

and

= (0.1024,2.3735)' . The estimate of B + Z, is yI = 4, + 2, = 2.4759,

The remaining estimates and the variances of the estimators are given in Table 4.6.3. There are several interesting aspects of the results. First, the variance of the estimator of 8 stabilizes rather quickly. Observations after the first five add very little information about the true value at time zero. Associated with the stabiliza- tion of the variance of JI is the fact that the estimate of 8 + Z, at time t depends very little on values of Y, for t - r < 5. Also, the variance of the estimator of 8 at time t is equal to the variance of the estimator of B f Z,. This is because the estimator of t9, expressed as a function of (Yo, Yi, . . . , K), is the mirror image of the estimator of 8 + 2,.

After a few initial observations, the estimation equation stabilizes. For this probIem, the limiting updating equation for yI = B + 2, can be written as

9, = 9, - I + c(K - 9( - , 1 I (4.6.30)

where

2 2 - 1 2 c = @, + a,) a,. , u~=O.5ae,,,[1 + (1 + ~ C T ~ ~ ' , , U ~ ) ' ' ~ I ,

2 2 2 2 - 1 4 uu = 0, - (CT, + aw) a,.

The expression for e: was obtained from the expression following (4.6.10) by setting a, = uwl and cot = uw.l- 2 2 2 2

Page 221: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

198 SPECTRAL THEORY AND PILTERING

The j l of (4.6.30) is a convex combination of yl-i and U;. Hence, the sum of the weights on the Y,-,, j = 0, 1,. . . , f , that define 9, is one. This means that estimates constructed with this model maintain the level of the original series. If the true U, contains a positive time trend, the predictor (4.6.30) will have a negative bias. AA

In our discussion to this point, we have used information on the Y-process through U, to construct an estimator for XI. It follows from (4.6.15) that predictors of future values can be constructed from the estimator of XI. For example, tbe predictor of XI+, constructed with data t h g h Y, is

% + I & =Al+l% (4.6.3 1)

and the variance of the prerfction error V@, + - Xf + is

Zf+tlI = ~ w w , l + I . l + l =AI+i~uu11A:+I + ~ e e * f + I , f + l * (4.6.32)

The formula (4.6.31) can be applied recursively to obtain predictions for any number of periods. Thus,

CI CI

Xf+/[f =Af+r-iXf+/-i(f 1

Zf+/,l = A,+,%+,-IlfAf'+/ + % e * l + / . f + , *

and the variance of the prediction error is

The prediction formuIas can also be used in constructing estimators of X, when data are missing. Assume that (Y,,Y,, , . . , Yr-,, Y,+,) is available, that the Kalman filter has been initiated prior to time r - 1, and that the objective is to estimate X,+ At time r, the best estimator of X, is

2, = grIr- =A,%,- I ,

and V{X, - A,%,- ,} = Z,,,,. Because there is no Y-information at time r, we have X,,,, = Xu,,,. That is, the e m r in the estimator of X, constructed with 8,- , is the final estimation error. At time r + 1, when Y,+I is available, the best estimator of Xr+l is

g r + i =A,+ i g r + x w w . r + I.r+ l ~ ~ + ~ D ~ + ' ~ ( y r + 1 - H r + l A r + l ' r ) 9

where &w,r+l , ,+ l , I),+,, and Zuu,r+l , r+l are given by (4.6.201, (4.6.22), and (4.6.21), respectively.

Example 4.6.4. We use the Kalman filter and the data of Example 4.6.1 to construct estimators in the presence of missing data. We assume that the mechanism that causes data to be missing is independent of the (V,,Xl) process.

Table 4.6.4 contains the data of Table 4.6.1 with observations 7, 1 1, and 12

Page 222: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE smcE MODU AND KALMAN FJLTERING 199

Table 4.6.4. Estimates of Sediment ColrsGucted with Wman Filter

I Y, gt a:, a:,

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15

5.44 5.38 5.43 5.22 5.28 5.21

5.33 5.58 6.18

6.56 5.93 5.70

5.42467 5,38355 5.41613 5.25574 5.27588 5.22399 5.23463 5.31708 5.52380 6.03256 5.88957 5.77375 6.44992 5.99176 5.73285

0.04792 0.04205 0.04188 0.04187 0.04 187 0.04187 0.19947 0.045 1 I 0.04197 0.04 1 88 0.19948 0.30288 0.04637 0.04200 0.04188

0.J0000 0.20344 0.19959 0.19948 0.19947 0.19947 0.19947 0.30287 0.20160 0.19954 0.19948 0.30288 0.37072 0.20242 0.19956

missing. Recall that the model of Example 4.6.1 is

X, - 5.28 = 0.81(X1-, - 5.28) + e,,

yI=x,+u, ,

where (el, u,)’ -II[O, diag(O.172,0.053)]. Given the previous data, the predictor of X, is

2, =c. 5.28 + 0.81(5.22399 - 5.28) = 5.23463 ,

and the variance of the estimation emr is a:, = 0.19947. The estimator for X, is

= 5.28 + 0.81(-0.04537) + (0.30287)(0.35587)-1(0.05 f 0.03675) = 5.31708,

where Y, - 5.28 = 0.05,

a:, = a: + a‘a’,, = 0.172 + (0.81)20.19947 = 0.30287, R, = 0.053 + 0.30287 = 0.35587,

= 0.30287 - (0.30287)’(0.35587)-’ = 0.0451 1 .

, . TIE estimator for x,, is The calcuiations are analogous for

R,, = 5.28 + 0.81(5.88957 - 5.28) = 5.77375 ,

Page 223: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

zoo

where

SPECTRAL THEORY AND FILTERING

a:,lz = a: + a 2 a ~ , , , = 0.172 f (0.81)20.19948 = 0.30288.

The estimator for XI, given in Table 4.6.4 is conshucted with

2 ( ( T ~ , ~ ~ , D I 3 , &) = (0.37072,0.42372,0.04637).

These calculations illustrate the importance of Y, in the estimation of X, for these data. AA

In some situations, it is useful to express an autoregressive moving average in the state space form. The vector first order autoregressive process is in the fom (4.6.15) with a fixed matrix A and zero measurement error. Therefore, because any autoregressive model can be put in the vector first order form (see Section 2.8), any pure autoregressive model can be put in the state space form. We shall see that there are alternative state space representations.

h i k e (1974) suggested a state space represeatation for autoregressive moving averages. To introduce the ideas, consider a second order moving average

Y, = BS-2 f PI S - 1 f e, (4.6.33)

where the e, are independent (0, af) mdom variables. Let

x: = (HY, I4 HY,,, I ay,,, I & 9

= (u,, a%- t + PI st WS) 9 (4.6.34)

where E{YT I t ) is the expected value of Y,. given an infinite past (Y,, U,-, , . . .). The vector X, contains information equivalent to the vector (4, ef-,), which is the information required for any future predictions of Y,. Furthermore, X, satisfies the first order vector autoregression

X , = A X , _ , f e , , (4.6.35)

where

and e: = ( I , PI, a)%. The state space representation of (4.6.18) is completed by adding the “observation equation”

u, = (1, 0, O)X, = Hx, (4.6.36)

to (4.6.35). To initiate the filter for the second order moving average, we begin with the

Page 224: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STATE SPACE MODELS AND KALMAN FILTERING 201

vector 2; = (0, o,o). The covariance matrix of go - X, is

The reader may verify that

(4.6.37)

and

g2,, =A%, =(1 +&+&)-'(PI +P,&,&,O) 'Y, . (4.6.38)

The vector 8; is the best predictor of (Y,, Y,, Y,) given Y , , and the vector g2,, is the best predictor of (Y,, Y3, Y,) given Y, . The predictor of (Y2, Y,, Y,) could have been obtained directly from the regression of (Y,, Y3, Y4) on Y,. The predictor for X, given (Y,, Y,) . and subsequent predictions can be constructed using the equa- tions (4.6.20), (4.6.21), and (4.6.23).

We now give a state space qresentation for a stationary autoregressive moving average of order (p, q). Let

P 9 y,=z q q j + I : P,q-,+E,, (4.6.39) j = I I = ,

where the e, are independent (0, at) random variables. The vector of conditional expectations, XI, of (4.6.34) becomes

where m = max(p, q + 1). From Theorem 2.7.1, we have m

y, = x Y4-r 9 (4.6.41) i - 0

where the y are defined in that theorem. It follows that m

(4.6.42)

Page 225: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

202 SPECTRAL THEORY AND FILTERING

for j = I , 2, . . . , m - I . From equations (4.6.41) and (4.6.42), we have

X, =AX,-, + e l , (4.6.43)

where

A =

0 0

0 1 0 ... 1 ... 0 0

and it is understood that a,,,, . . . , ap+I are zero if m > p . Equation (4.6.43) and the equation

T=( l ,O , ..., O)X, (4.6.44)

form a state space representation for a stationary autoregressive moving average process.

Example 4.6.5. Xn this example, we use the Kalman filter to consttuct predictions for the stationary autoregressive moving average model

where ct -NI(O, 1). The second column of Table 4.6.5 contains observations generated by the process (4.6.45). By Theorem 2.7.1,

i -0

where (q,, y , y) = (1 .OO, 1.40.1.70). The state space representation of the model (4.6.45) is

X, =AX,-, + e l ,

w, = (LO, O)X, I

where

(4.6.46)

Page 226: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STAT@ SPACE MODeLS AND KALMAN FILTERING 203

Table4.6.5. Kalman Filter Used to Construct Predictors for an Autoregressive Moving Average

1 3.240 3.008 2.578 2 1.643 0.699 0.036 3 2.521 2.457 2.875 4 3.122 3.779 3.359 5 3.788 3.371 2.702 6 2.706 1.782 1.052 7 4.016 4.169 4.601 8 5.656 6.680 6.199 9 6.467 5.903 4.600

10 7.047 6.201 5.622 11 4.284 2.939 1.241 12 2.587 0.749 0.395 13 -0.421 -1.242 -1.671 14 0.149 0.276 1.028 15 -1.012 -0.776 -1.368

1.515 1.162 1.150 1.049 1.032 1.024 1.008 1.007 1.004 1 .002 1.002 1,001 1 .Ooo I .Ooo 1 .OO0

4.033 3.200 3.170 3.074 3.001 3.001 2.977 2.969 2.968 2.%3 2.%2 2.961 2.960 2.960 2.960

and

A = O O I . (1 : 008) Multiplying (4.6.45) by Y, and Y,-,, and taking expectations, we have

yY(0) - 0.8Oyy( 1) = 2.8260, ~ ~ ( 1 ) - 0.80%(0) = 1.4120.

It follows that [%(O), yv( l)] = [10.9878, 10.20221. If we initiate the filter with

2; = (0, 0,O) ,

the covariance matrix of 2, - X, is the variance of [Y,, E{Y,+, It}, E{Y,,, I ~ > I , namely,

10.202 9.988 8.802 . 10.988 10.202 8.742

8.742 8.802 8.028 ) L o o =( To construct the estimator of XI given 2, and Y, , we compute

=Z,, +A%,,&' = ~ E , u o o l

D, = (40, O)Zuv, ,( LO, 0)' = yu(0) = 10.988,

Page 227: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

204

and

SPECTRAL THEORY AND HLTERXNG

2; = (1.000,0.929,0.7%)Y I

= (3.240,3.008,2.578). (4.6.47)

The reader can verify that the coefficients in (4.6.47) are y;’(O)[x(O), ‘yu(l), yy(2)1. The covariance matrix of 8, - X, is

The predictor of X, given 8, is

22 , l =a1 ’

and the covariance matrix of the prediction error is

1 1.515 2.085 2,248

( 2.248 3.238 3.577 Ezll = 2.085 3.033 3.238 ,

where LIZ!, = A ~ u u I , A ’ + Zee, The quantities required to construct g2 are

1.515 2.085 2.248 2.085 3.033 3.238 2.248 3.238 3.577

and

The predictions and variances of the prediction errors are given in the third and fifth columns, respectively, of Table 4.6.5. As the initial effects die out, the estimator of X, stabilizes at

8, =A$,,-, + (1.0,1.4,1.7)‘[Y, - (1.0, O)Ag,-ll = AX,-, + (1.0, 1.4, 1.7)’el .

The predictor stabilizes at

%,+ $1, = = AX, 9

and the covariance matrix of the e c t i o n e m stabilizes at

&+,,,=V{%+,,t -X,+l )=%c.

The second entry in g,,,,, is the predictor of E{Y,+, It + l}, based on

Page 228: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 205

observations through time t. Thus, it is also the predictor of & + 2 based on observations through time t. The limiting variance ofA!f+l,2,f - Y , + ~ is (1 + u:)a2, while the second entry on the diagonal of Z,, is u:u2. In the last column of Table 4.6.5 we give

where V{$,+,,,,, - X,, and u2 = 1. AA

is the second entry on the diagonal of Z,,

REFERENCES

Wens 4.1, 43-45. Anderson (1971). Beran (1992), Blackman and Tukey (1959), Bloomfield (1976), Brillinger (1975). Brockweli and Davis (1991), Granger and Hatanaka (1964), Hannan (1960, 1970), Harris (1%7), Jenkins and Watts (1968), Kendall and Stuart (1966). Koopmans (1974). Whittle (1963), Yaglom (1962).

seetlon 4.2. Amemiya and Fuller (1967). Bellman (1960), Wahba (1968). Section 4.6. Akaike (1974), Anderson and Moore (1979), Diderrich (1983, Duncan and

Horn (19721, Jones (19801, Kalman (1960, 1%3), Kahan and Bucy (1961). Meinhoid and Singpunvalla (1983), Sage (1968). S a l k and Harville (1981), Sorenson (1966).

EXERCISES

1. Which of the following functions is the spectral density of a stationary time series? Explain why or why not. (a) ~ w ) = 1 - + 0 2 , - - n < w ~ r . (b) f (w)= l + $ w , - T C W G ~ :

(c) Jlw) = 476 + cos 130, - Q 4 w T.

2. Let {X,: t E (0, 2: 1, 22, . . . )} be defined by X, = e, + 0.4e,- ,. Compute the autocovariance function r(h) and the spectral density f(w), given that the e, are independently and identically distributed (0, u2) random variables.

3. Give the spectral density for the time series defined by

X , - / 3 X I - , = e , + q e , _ , +crje,_,, t =o, +1,+2,. . . , where value, and the e, are uncorreiated (0, u2) random variables.

< 1, the roots of ma2 + a,m + az = 0 are less than one in absolute

4. Let

X, + a&- I f = 2,

Page 229: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

206

and

SPECTRAL THeORY AND FTLTUUNG

where the el are uncorrehted (0, a2) random variables, and the roots of both m2 + q m + a; = 0 and rz + &r + = 0 are less than one in absolute value. Give an expression for the spectral density of X,. How do you describe the time series XI?

5. Find the covariance function and spectral distribution function for the time series

X, = u , cost + u2 sin f + V, ,

where (u,, uz)’ is distributed as a bivariate normal random variable with zero mean and diagonal covkance matrix diag(2,2), Y, = e, - e,- , , and the el are independent (0,3) random variables, independent of (u, , uz)’.

6. Given the following spectral distribution function:

n+ w , - n s w < -ni2, F,(W)= 5 1 ~ 4 - 0 , - d 2 G w < d 2 , I 9n+ 0 , ni2G w c T .

What is the variance of XI? What is the spectral distribution function of XI - X,- ,? Is there a k such that XI - X,.+ will have a continuous spectraI distribution function?

7. Prove Corollary 4.3.1.3.

8. Let {el: t E (0, t 1, 22, . . .)} be a time series of unconelated (0, a’) random + 0.5e1-,. Give the covariance matrix r(h) and the variables. Let XI =

spectral matrix tyo) for (Xl,ef)’.

9. Let {el,} and {ezf) be two independent sequences of uncorrelated random variables with variances 0.34 and 0.50 respectively. Let

XI, = O.8Xl1- , + e l l , x2, = x,, + %I

10. The complex vector random variable X of dimension q is distributed as a complex multivariate normal if the real vector [(Re X)’, (Im X)‘]’ is distribut- ed as a multivariate normal with mean [(Re p)’, (Im p)’]‘ and covariance

Page 230: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES

matrix

207

where p = E{X} and Z is a positive semidefinite Hermitian matrix. Let (A B , , A,, BJ be the multivariate normal random variable defined follow- ing (4.4.9). Show that X = ( A , + iB,, A, + 5B2)r is distributed as a complex bivariate normal random variable, Give the Hermitian matrix E.

11. Let {uj},m=-= and {b,}r='=_m be absolutely summable, and let {X,: tE (0,+1, k2,. , .)} be a stationary time series with absolutely summabie covariance function. Consider the following filtering operation: (1) apply the filter {aj} to the time series XI to obtain a time series Z, and then (2) apply the filter {b,} to 2, to obtain a time series Y,. What is the transfer function of this filtering operation? Express the spectral density of Y, as a function of the spectral density of X, and of the transfer function.

12. Let {aj}, {bj}, and {X,} be as defined in Exercise 11, and define m

Y, = (aj + b,)X,-, . j a - m

Express the spectral density of Y, as a function of the spectral density of X , and the transfer functions of {a,} and {b,}.

13, Let X, and Y, be defined by

X I = 0.9X,- I + e, , U , = x , + u , ,

where {e,} is a sequence of normal independent (0,l) random variables independent of the sequence {u,}. Let u, satisfy the difference equation

u, + 0.5u,-, = u, ,

where {u,} is a sequence of normal independent (0,0,3) random variables. Assuming that only U, is observed, construct the filter {u,: j = -2, - 1, 0,1,2} so that

2 c. qf-, j - -2

is the minimum mean square error estimator of X,. Construct the one-sided filter {b,: j = 0, 1, . . . , 5 } to estimate X,. How does the mean square error of these filters compare with the lower bound for linear filters?

Page 231: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

208 SPEcf R& THEORY AND FILTERING

14. Let f ( w ) be an even nonnegative continuous periodic function of period 27r. Let

%r

u(h) = I_ Aa)e-*ah do .

Show that, for q a positive integer,

!I - a(h), h = O , ? l , 22,. . . , + q ,

otherwise

is the covariance function of a stationary time series. (Hint: See Exercise 3.15 and Theorem 3.1.10.)

15. Let X, be a time series with continuous spectral density fx(w). Let K be a time series satisfying

P

Z q K - j = e , ,

Ifv(w) -fX(dl< (5 9

I'O

where the e, are uncorrelated (0, u2) random variables, a,, = 1, and Y, is defined by Theorem 4.3.4. (a) Show that (b) Let &(o) be strictly positive. Prove that given > 0 there is a p and a set

- y,,(h)l< 21re for all h.

(9: j = 0, 1,. . . , p } with a,, = 1 such that the time series Zt defined by

satisfies

for all w, where &(o) is the spectral density of 2, and ~ ( 0 ) = u2. Show that

m c yZ(h) =G 2 m 2 . h= I

(c) L,e$f,(w) be strictly positive. Show that, given e > 0, one may define two autoregressive time series 1',1 and Yz, with spectral densities

Page 232: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 209

(d) For the three time series defined in part (c) prove that

for any fixed real numbers {u,: t = 1.2, . . . , n}. 16. Let

XI, = e, - 0.8ef.., , X,, = u, - O . ~ U , - , ,

where {e,} is a sequence of independent (0,l) random variables independent of {u,}, a sequence of independent (0,6) random variables. Express

Y, =x,t +xz, as a moving average process.

17. Let Y, be defined by

y, = s, + 2, , s, = 0.9 SIP, + U, , 2, = 0.8 + e, ,

where the sequence {(e,,~,)'} is a sequence of normal independent (0,E) random variables with E. = diag(O.l,O.6). Construct the optimum filter {a,: j 7 -9, -8,. . . ,8,9} to estimate S,, where the estimator is defined by Zj=.-9u,Y,-j. Construct the best one-sided filter {bj: j = 0, 1, . . . ,8,9} to estimate S,. Compare the mean square error of these filters with the lower bound for linear filters.

18. Let C and D be k X k nonsiagular matrices. Show that

(C-' + D-')-'D--' = C(C + D)-' .

19. Let X-N(0 , ui), and let Y = X + u, where u -N(O, (r:) independent of X. Give the covariance matrix of (Y ,X) and the conditional expected value of X given Y. Consider the regression problem

(Y, 0)' = (1,l)'X + (u, €)' ,

where (u. E)' -N[(O,O)', diag{u:, a:)]. Show that the generalized regression

Page 233: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

210 SPECraAL THMRY AND FILTERING

estimator of X constructed with known (a:, .;') is the conditional expected value of X given Y.

20. Let Z,, be an t X r nonsingular matrix, Z,, a p X p nonsingular matrix, and H an r X p matrix. Show that

[H'E;,,'H f $;:]-' = E.,, - E.,,H'(E,, + HE,,H')-'HE.,, ,

and hence verify (4.6.21). [See Duncan and Horn (1972).]

21. Using Exercise 20, verify equation (4.6.23).

22. Let the model (4.6.1)-(4.6.2) hold, and assume it is desired to construct a recursive estimator of XI+, given the observations Y,, Yz,. . . , Y,. Construct such an estimator. Consider both s > 0 and s < 0.

23. Let the following linear model hold

Y ' = X , B + u , , Y, = x,p + u2 ,

where Y, is n, X 1, XI is n, X k, /3 is k X 1, Y, is n, X 1, X, is n, Xk, and (ui,u;)' -N[O,I~T']. Let XiX, be nonsingular. Let XIX, and where

be given. Construct the best linear unbiased estimator of p as a function of fit, X',X,, X,, and Y,.

24. Prove Corollary 4.2.2.

25. Prove the following lemma.

Lemma. Let Z be the nonsinguiar covariance matrix of the vector Y: = (Y:I,Y;2,Y:3), where

Page 234: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

BXERCISES 211

26.

27.

be the best linear estimator of (Y:,, Y:,) based upon Y, , . Let

(e:z,ef'3) = [ ( Y , ~ - fir2(1)f. (y13 - 9 f 3 , ~ ) f ~

and let V be the covariance matrix of (ef'z,e;3)f, where

Then the best linear estimator of Y,, given (Y;l,Y;z) is the best linear estimator of Y,, given Yfl , plus the best linear estimator of er311 given eftl,. That is,

't3/(1,2) = 9,311 +'I312 '

where E:312 = e~,V;~V2,, and

Show that the redictor of U,,, given (Y,, . , . , Y,) of Example 4.6.4 is the first

left element of element of A' 4 ,. Show that the variance of the prediction error is the upper

A~P,,,,A" +- A.X,J~ + P, .

Assume that the time series U, has a spectral density f,(w) such that f,(w) > 0 for all w and such that the Fourier coefficients of tf,(w))-' are absolutely summable. Cleveland (1972) defines the inverse autocorrelation function by pi(h) = [7+i(0)1-~yi(/z), where

U

ri(h) = (4v2)-'&(w)]-'e'"* dw . -II

(a) Show that if U, is a stationary autoregressive process satisfying

then the pi@) are the autocorrelations of the moving average process D

X, =e l + C ale,-, . (b) Show that if U, is an ARMA(p,q) process, then pi@) is the auto-

correlation of the ARMA(q, p ) process with the autoregressive and moving average coefficients interchanged.

i = I

Page 235: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

212 SPECTRAL THEORY AND FILTERIN0

28. Let XI have the spectral density

(2?~)-‘ , d 4 d o d 7d2, Ma)= ( 2 ~ ) - ’ , - 1 ~ / 2 a ~ ~ - d 4 ,

[O otherwise.

The time series XI is sometimes called “bandpassed white noise.” Give the covariance function of XI. Plot the covariance function. Using the covariance function (or otherwise), approximate X, by a third order autoregressive process. Plot the spectral density of the third order process with the spectral density of the original process. What would be the error made in predicting XI one period ahead using the third order autoregressive process?

Given the availability of a random number generator, explain how you would create 200 observations from the time series X, using the “bandpass” idea. Assume you wish to create a normal time series.

29. Let

y, = s, + u, ,

where S, is a stationary time series satisfying

S, = 0.9SI-, + e,

where the e, are M(0,1.9) random variables. Let u, be a sequence of NI(0,l) random variables where u, is independent of Sj for all & j . Construct a centered two-sided filter of length seven to estimate S, given (q-3, . . , Y,+3). Construct a one-sided filter of length 6 using (Y$, Y-,. . . . , q+) to estimate S,. For the two-sided filter define

x, = s, - $ ,

where 8, is the estimate of S, computed using the filter. Compare the v&riance of XI with the lower bound for a two-sided filter.

30. Let Y, be a stationary autoregressive process satisfying

P

2 a;q-,=e,, or B ( w Y , = c , ,

where a, = 1, B( a) = 2:=o a{$#’, e, are independent (0, a*) random vari- ables, and S is the backshift operator. The process also satisfies

i = O

P

2. a l y + , = a , , or A ( w Y , = ~ , , t-0

where A( 93) = Xr=o a, Op-’ and a, are uncorrelated (0, a’) random variables,

Page 236: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCEES 213

Show that the partial correlation between Y, and q-p given (U,-,, . . . , U,-,, . . , yt,,,) is zero for j = 1,2,. . . , and that the partial correlation between U, and Y,-p-j given (q-,,, . . . , U,-,, Y,,, , .. . q,,) is zero for j = 1,2,. . . .

31. Let

U,=x,+u,, X , = @ X 1 - , + e , ,

where e,-NI(0, a:), u,-NI(0, a:), {e,} is independent of {u,}, and X, is stationary with I@(< 1. Show that

U, = eq- , + U, +f lu , - , ,

where /? = -[a2 + (1 + 02)a:]-'8a:, u, are NI(0, a:), and (1 + pz)ui = a; + (1 + 82)af.

Page 237: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 5

Some Large Sample Theory

So far, we have been interested in ways of representing time series and describing their properties. In most practical situations we have a portion of a realization, or of several realizations, and we wish a description (an estimate of the panuneters) of the time series.

Most of the presently available results on the estimation of the covariance function, the parameters of autoregressive and moving average processes, and the spectral density rest on large sample theory. Therefore, we s h d present some results in large sample statistics.

5.1. ORDER IN PROBABILITY

Concepts of relative magnitude or order of magnitude are useful in investigating limiting behavior of random variables. We first define the concepts of order as used in real analysis. Let (a,}:=, be a sequence of real numbers and (g,}E, be a sequence of positive real numbers.

Definition 5.1.1. We say a, is of smaller order than g, and write

a, = o(g,,)

if

Definition 5.13. We say a, is at most of order g,, and write

a, = O(g,)

if there exists a real number A4 such that g,' la,[ Q M for all n.

order and the properties of limits.

214

The Properties of Lemma 5.1.1 are easily established using the definitions of

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 238: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ORDER IN PROBABILITY 215

kmma 5.1.1, Let {a,} and {b,,} be sequences of real numbers. Let x} and {g,,} be sequences of positive real numbers.

A

The concepts of ordm when applied to random variables are closely related to convergence in probability.

Debition 5.13. The sequence of random variables {X,,} converges in probability to the random variable X, and we write

plimX,, = x (the probability limit of X,, is X), if for every e > 0

lim ~{(x,, - X I > e} = 0. n+

An equivalent definition is that for every > 0 and S > 0 there exists an N such that for all n > N.

The notation P x,,+ x

is also frequently used to indicate that X,, converges in probability to X. For sequences of random variables, definitions of order in probability were

introduced by Mann and Wald (1943b). Let {X,,) be a sequence of random variables and {g,,} a sequence of positive real numbers.

Page 239: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

216 SOME LARGE SAMPLE THulRY

Definition 5.1.4. We say X,, is of smaller order in probability than g,, and Write

if

plim g, 'x , = 0 .

Delinition 5.1.5. We say X, is at most of order in probability g,, and write

if, for every c > 0, there exists a positive real number M, such that

W n l * M e g n I c c

for all n. If X,, = Op(g,), we sometimes say that X,, is bouded in probability by g,,. We

define a vector random variable to be Op(g,,) if every element of the vector is Op(g,,) as follows.

Definition 5.1.6. If X, is a &-dimensional random variable, then X, is at most of order in pmbabiIity g,, and we write

if, for every (5 > 0, there exists a positive real number M, such that

P{&,,l 3M,g , , )Q € , j h2.m . . P k 9

for all n. We say X,, is of smaller order in probability than g,, and write

if, for every E > 0 and S > 0, there exists an N such that for all n > N,

P{&.,,I > q,,} < s , j = 1,2, . . . , k . Note that R might be a function of n, and X, could still satisfy the definition.

However, it is clear that the M, of the definition is a function of f only (and not of n).

A matrix random variable may be viewed as a vector random variable with the elements displayed in a particular manner, or as a collection of vector random variables. Therefore, we shall define the order of matrix random variables in an analogous manner.

Defsnition 5.1.7. A k X r matrix B,, of random variables is at most of order in

Page 240: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ORDER IN PRO3ABILlTY 217

probability g,,, and we write

B, = OJg,) 9

if for every Q > 0 there exists a positive real number Me such that

P{lbjpI *Megn} v i = l , 2 ,..., &, j = 1 , 2 ...., r ,

for all n, where the b,, are the elements of B,. We say that B, is of smaller order in probability than g, and write

ifforevery e>OandS>OthereexistsanNsuchthatforalln>N,

P{lb,l>eg,}<S, i = 1 , 2 ,.... k , j = 1 , 2 ,..., r .

For real numbers a,, i = 1, , . . . , n, we know that [Zy=, ujl GX;b,la,l. The following lemma, resting on this property, furnishes a useful bound on the probability that the absolute value of a sum of random variables exceeds a given number.

Lemma 5.12. Let X i , i = 1,2,. . , , n, be &-dimensional random variables. Then, for every e > 0,

h f . Let e > o be arbitrary. We see that if XE, lX,l a e, then (x i / a e/n for at least one i E {1,2, . . . , n}. Therefore,

A

Definition 5.1.3 applies for vector random variables of fixed dimension as well as for scalar random variables, if it is understood that IX, -XI is the common Euclidean distance.

Lemma 5.13. Let X, be a Rdimensional random variable such that

plimXi,=$, j = l , 2 ,..., R ,

where X,, is the jth element of X,. Then, for k fixed,

plimX, = X.

Page 241: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

218 SOME LARGE SAMPLE THEORY

Proof. By hypothesis, for each j and for every E > 0 and S > O? there exists an integer N/ such that for ail n > l$

P{&, -x j (>k-”2r}sk-16 a

Let N be the maximum of {N,, N2, . . . * Nk}. Using Lemma 5.1.2. we have

k

P{IX, - X I > e } 4 C P { C X , - ~ ] > k - ” * € } s a j = I

for n > N. A

The proof of Lemma 5.1.3 should also help to make it clear that if k is not fixed, then the fact that X, = op( 1) does not necessarily imply that limlX,,l= 0. The vector random variable cornposed of n entries all equal to n-” furnishes a counterexample for k = n. We shall demonstrate later (Theorems 5.1.5 and 5.1.6) that operations valid for

order are also valid for order in probability. Since it is relatively easy to establish the properties analogous to those of Lemma 5.1.1, we do so at this time.

Lemma 5.1.4. Let Cf,} and { g,} be sequences of positive real numbers, and let {X,} and {Y,} be sequences of random variabIes.

Proof. We investigate only part i, leaving parts ii and iii as an exercise. By arguments similar to those of Lemma 5.1.3, ~,,Y,,I >f,g, implies that R,/f,l> 1 or (and) (Y,lg,,( > 1. By hypothesis, given e > 0 and S > 0, there is an N such

Page 242: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ORDER IN PROBABlLITY

that

Pflx,l'$I<0*5a *

P{[YnI > q,,} < 0.56

for n > N. Therefore,

219

for n >N. The second equality in part i follows from

which holds for ali E > 0. Let q, = maxcf,,, g,,}, Given E > 0 and S > 0, there exists an n such that

P{IX, I > 4 q,,} < $ s . P{lYn 1 > 3 cqn) < + 6

for n >N. Hence, the third result of part i foIlows by Lemma 5.1.2. A

One of the most useful tools for establishing the order in probability of random variables is Chebyshev's inequality.

Theorem 5.1.1 (Chebyshev's inequality). Let r>O, let X be a random variable such that E{IX1? < a, and let FOE) be the distribution function of X. Then, for every e > 0 and finite A,

Proof. Let us denote by S the set of x for which b - A1 a 1~ and by s" the set of x for which - A [ < E. Then,

\ ~r - A l w b ) = b - AlrdF(x) + I, 1. - A l ~ - ( x ) S

> er I, dF(x) = dP{(X - A1 3 E } . A

It follows from Chebyshev's inequality that any random variable with finite variance is bounded in probability by the square root of its second moment about the origin.

Page 243: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

220 SOME LARGE SAMPLE THEORY

Corollary 5.1.1.1. Let {X,} be a sequence of random variables and {a,} a sequence of positive real numbers such that

E{Xi } = O(a,2).

Then

Xn = Op(an) *

Proof. By assumption there exists an M, such that

for all n. By Chebyshev’s inequality, for any M, > 0,

Hence, given L: > 0, we choose M, 2 M, e-’”, and the result follows. A

If the sequence (X,} has zero mean or a mean whose order is less than or equal to the order of the standard error, then the order in probability of the sequence is the order of the standard error.

Corollary 5.1.13. Let the sequence of random variables {X,} satisfy

E{CXn - = O(a5

where {a,} is a sequence of positive real numbers. Then

Xn = Qp(an) *

Proof. By the assumptions and by property ii of Lemma 5.1.1,

w:} = Et<X, - ECQY} + (E(X,b2 = O(a:) 1

and the result follows by Corollary 51.1.1. A

Let the probability limits of two sequences of random variables be defined. We now demonstrate that the sequences have a common probability limit if the probability limit of the sequence of differences is zero.

Page 244: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ORDER IN PROBABILITY 221

Theorem 5.1.2. Let {X,) and {Y,,} be sequences of random variables such that

If there exists a random variable X such that plim X, = X, then

plim Y,, = X . Proof. Given E > 0 and 6 > 0, there exists, by hypothesis, an N such that for

n > N ,

P{IY,, - X,l3 O S E } 6 0.58 and P { P , -X[ 3 0 . 5 ~ ) 4 0.56 . Applying Lemma 5.1.2, for n > N,

Deftnition 5.1.8. For r 3 1, the sequence of random variables {X.} converges in rth man to the random variable X if E{lX,,l'} < 03 for all n and

E{IX,, - XI'} +o as n + -. We denote convergence in rth mean by writing X,, &X.

We note that if E{lX,,-Xmlr}+O as n+w and m - + q then there exists a random variable X such that X,,&X. Using Chebyshev's inequality it is easy to demonstrate that convergence in rth mean implies convergence in probability.

Theorem 5.1.3. Let {X,) be a sequence of random variables with fipnite rth moments. If there exists a random variable X such that X,, &X, then X, +X.

Prod. Given r > 0 ,

by Chebyshev's inequality. For 6 > 0 there is, by hypothesis, an integer N = N(E, 6) such that for d l n >N,

and therefore, for R > N,

One useful consequence is Corollary 5.1.3.1, which can be paraphrased as follows. If the sequence of differences of two sequences of random variables converges in squared mean to zero, then the two sequences of random variables have a common probability limit if the limit exists.

Page 245: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

222 SO&: OE SAMPLE THEORY

Corohry 5.1.3.1. Let {X,} and {Y,} be sequences of .dom variables such that

If there exists a random variable X such that plimX, = X, then plim Y, = X.

Proof. By Theorem 5.1.3, we have that pUm(X, - Y,) = 0. The conclusion follows by Theorem 5.1.2. A

Corollary 5.1.3.2. If the sequence of random variables {Y,} is such that

lim E{ Y,} = p n+m

and

then plim Y, = p.

Proof. The proof follows directly by letting the wnstants p and {E{Y,)} be, A respectively, the X and {X,} of Corollary 5.1.3.1.

Since we often work with functions of sequences of random variables, the following theorem is very important. The theorem states that if the function g(x) is continuous, then '+the probability limit of the function is the function of the probability limit."

Theorem 5.1.4. Let or,) be a sequence of real valued k-dimensional random variables such that plim X, = X. Let g(x) be a function mapping the reaI k- dimensional vector x into a real p-dimensional space. Let g(x) be continuous. Then plim g(X, ) = g(X).

Proof. Given E > 0 and S > 0, let A be a closed and bounded k-dimensional set such that

P{X E A) 2 1 - 0.58 . Since g(x) is continuous, it is uniformly continuous on A, and there exists a 8, such that

I&,) - g(x2)IC 6

if IP, - x21 < 6, and x, is in A. Since pIim X, = X, there exists an N such that for n>N,

P(IX, - XI > 8,) < +a .

Page 246: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ORDER IN PROBABILITY 223

Therefore, for n > N,

Theorem 5.1.4 can be extended to functions that are continuous except on a set D where P{X E D} = 0. See, for example, Tucker (1967, p. 104). Mann and Wald (I943b) demonstrated that the algebra of the common order

relationships holds for order in probability. The following two theorems are similar to a paraphrase of Mann and Wald's result given by Pratt (1959). The proof follows more closely that of Mann and Wald, however.

Theorem 5.1.5, Let {X,} be a sequence of k-dimensional random variables with elements (5,: j = 1,2, + + . , k}, and let {r,) be a sequence of k-dimensional vectors with positive real elements {rjn: j = 1,2,. . . , k} such that

3, = Op(r,,>, j = 192,. f . . t ,

q. ,=oP(rjfl) , j = t + l , t + 2 ,..., k.

Let g,(x) be a sequence of rea€ valued (Bore1 measurable) functions defined on k-dimensional Euclidian space, and let {s,} be a sequence of positive real numbers. Let {a,} be a nonrandom sequence of k-dimensional vectors. If

Sn(a") = 06,)

for all sequences {a,} such that

ujn = O(r,,) +

ujn = o(rj,),

j = 1.2,. . . , t , j = r + 1, t + 2, . . . , k ,

then

gn(Xn) = op(s, ) +

Proof. Set 6 > 0. By assumption there exist real numbers M,, M2, . . . , M, and sequences {Mj,,}, j = t + 1, t + 2,. . . , k, such that limn+ PAjn = 0 and

€ P{I$,,I>M,r, ,}<~, j = 1,2, . . . , t ,

P { & , ~ S M , ~ , , , } < ~ . c

j = t + 1,t + 2,. . . , k ,

Page 247: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

224 SOME LARGE SAMPLE THEORY

for all n. Let {A,,} be a sequence of k-dimensional sets defined by

Then, for a E A,,, there exists an M such that lg,(a)l< Msn. Hence,

Ign<Xn>l< Msn

for all X,, contained in A,,, and the result follows, since the A,, were constructed so A that P { X , E A,) > 1 - c for all n.

in the conclusion.

Proof. The set A, is constructed exactly as in the proof of Theorem 5.1.5. There then exists a sequence {b,,} such that b,,+ b,, = 0 and

I 8, (4 c bnsn

for a contained in A,,. Therefore, lgn(X,,)l< b,s,, for all X, contained in A,,, and A the result follows from the construction of the A,,.

Corollary 5.13. Let {X,} be a sequence of scalar random variables such that

x,, = a -I- q A r n ) P

where r, + 0 as n + @J. If g(n) is a function with s continuous derivatives at x = a,

Page 248: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ORDER IN PROBABILITY 225

then

"-"(a)(X, - a)$-' + O&), 1

(s - l)! +..*+-

where g"'(a) is the j th derivative of g(x) evaluated at x = a.

Proof. Since the statement holds for a sequence of real numbers, the result folIows Erom Theorem 5.1.5. A direct proof can be obtained by expanding g(x) in a Taylor series with remainder

gCS'(b)(X, - a)" S!

I

where b is between X, and a. Since go)(x) is continuous at a, for n sufficiently large, g"'(b) is bounded in probability [i.e.. g'"(b) = Op( l)]. Therefore,

g'"'(b)(X, - = O,(r:).

S! A

Corollary 5.1.6. If

x, = a + Up@,)

in the hypothesis of Corollary 5.1.5 is replaced by

then the remainder O,(r:) is replaced by op(r:),

Proof, The proof is nearly identical to that of Corollary 5.1.5. A

Note that the condition on the derivative defining the remainder can be weakened. As we saw in the proof of Corollary 5.1.5, we need only that g'"(6) is bounded in probability. Also see Exercise 5.30.

The corollaries genetalize immediateIy to vector random variables. For example, let

X, = a + op(r,,).

where Xn=(Xln,XZ ,,..,, XJ. a=(a, ,a , ,..., a&)', and r,,-+O as n - + w Let g(x) be a real valued function defined on k-dimensional Euclidean space with

Page 249: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

226 SOME LARGE SAMPLE THEORY

continuous partial derivatives of order three at a; then, for example,

where t3g(a)/dxj is the partial derivative of g(x) with respect to xj evaluated at x = a and d2g(a)/t3x, ax, is the second partial derivative of g(x) with respect to xi and xi evaluated at x = a.

The Taylor expansions of Corollaries 5.1.5 and 5.1.6 are about a fixed point a. Expansions about a random point are also valid under certain conditions.

Theorem 5.1.7. Let {X,,}, {W,,} be two sequences of kdimensional vector random variables defined on the same probability space. Let l(X,,, Wn) be the line segment joining X,, and W,,. For every (r > 0, suppose there is an N, and a set B, in k-dimensional Euclidean space, such that

P(l(X,, W,,) E Be} a 1 - €

for al l n >N,. Let g(x) be a real valued function that is continuous with continuous partial

derivatives through order s on B, for every > 0. Suppose the absolute values of the sth order partial derivatives are bounded on B, and that

x, - w,, = OJr,,) ?

k k

+ 2-' c, 2 g'V'(wn)(xi, - w,")(3,, - Y,,) + * ' ' i = l j = l

k k

+ [(s - l)!]-I 2 .c g(l-r)(w,,)(x,n - yn)...(xl,, - 4,)

+ OJr3 9

i=1 I = ]

where g(')(W,,) is the first partial derivative of g(x) with respect to xi evaluated at x = w,,, and g(*...')(W,,) is the sth partiai derivative of g(x) with respect to the elements of x whose indexes are i, . . . , t, evaluated at x = W,,.

Proof. Let 6 > 0 be given, and let Be be the associated set. Then g(x) and the first s derivatives of g(x) are continuous on Be. For Z(X,,, W,,) E Be, one can expand the function in a Taylor series with exact remainder. Far scalar X,,, the expansion

Page 250: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CONVERGENCE IN DISTRIBUTION 227

is

g(XJ = g(WJ + g(‘)(W,,NX, - W,) + * * *

-I- [(s - l)!]-’g‘‘-’)(W,)(X, - %)’-’ + (~!)-‘g‘”)(W,’)(X,, - W,)“ ,

where W: is on the line segment joining X, and W , and for scalar arguments, g”’(W,) denotes the jth derivative. Now g‘”(W,) is bounded on B,, and X, - W, = OJr,). It follows that

F

The arguments are analogous for the vector case, and the result is established. A

Example 5.1.1. Let W, = W, where W-N(0 , I). and let

X , = W + Z , ,

-112 where Z,, - U(-n-’”, n given, and set

), independent of W Let g(x) = x-2. Let 1 > t. > 0 be

B, = {x: 1x1 > 2-’e},

BZr = {x: 1x1 > e} . The ordinate of the standard normal distribution is about 0.40 at zero. Therefore, P{W E Be} 2 1 - 0.4~ and P{W E B2e} 3 1 - 0 . 8 ~ If W E B , , and n > 4 ~ - ~ , then X, E B,. Hence, if n > 4r-2 the probability is greater than 1 - E that f(X,, W,) is in Be. The function x - 2 is continuous with continuous derivatives on Be. The absolute value of the first derivative and of the second derivative are bounded by finite multiples of e-3 and E - ~ , respectively, for x E B , . Therefore. the conditions of Theorem 5.1.7 are met and

&= w-2 - ~ w - ~ ( x , , - W) + O , ( ~ - ’ )

= W - 2 f O p ( n - ” 2 ) .

Because W 2 is a chi-square random variable, g(X,,) converges in probability to the reciprocal of a chi-square random variable, by Theorem 5.1.4. Theorem 5.1.7 enables us to establish the order of the remainder in the approximation, AA

5.2. CONVERGENCE IN DISTRIBUTION

In the preceding section we discussed conditions under which a sequence of random variables converges in probability to a limit random variable. A second type of convergence important in statistics is the convergence of a sequence of distribution functions to a limit function. The classical example of such conver-

Page 251: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

228 SOME LARGE SAMPtE THEORY

gence is given by the central limit theorem wherein the sequence of distribution functions converges pointwise to the normal distribution function.

Definition 5.2.1, If {X,} is a sequence of random variables with distribution functions {F&)}, then {X,) is said to converge in distribution (or in$aw) to the random variable X with distribution function F'(x), and we write X, -+X, if

lim F,,(x) = Fx(x) n-W

at all x for which Fx(x) is continuous. Note that the sequence of distribution functions is converging to a function that

is itself a distribution function. Some authors define this type of convergence by saying the sequence {Fxd(x)} converges completely to F&). Thus our symbolism

4

is understood to mean that Fx,(x) converges to the distribution function of the random variable X. The notation

is also used.

Theorem 5.2.1. Let {X,} and {Y,} be sequences of random variables such that

plim(X, - Y,,) = 0 .

If there exists a random variable X such that

Y X , - - + X ,

then

Proof. Let W and Z be random variables with distribution functions F,(w) and F,(z), respectively, and fix e > 0 and S > 0. We first show that

P{IZ - w[ > 4) 4 s

implies that

F,(Z - €) - s F,(Z) e F,(Z + 6 ) + S

Page 252: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CONVERGENCE W DISTRIBUTfON 229

for all z. This result holds because

F& - €1 - F,(Z) = P { Z b Z - €} - P{W 6 z } C P{Z = P { Z < z - r and W > z }

z - E} - P{Z 6 z - E and W d Z }

P{(W- 2) > €}

q w - 21 > €} c 6 ,

and, in a similar manner,

Let xo be a continuity point for F,@). 'Then, given S > 0, there is an q > 0 such - x, lC q and FJx) is continuous at xo - g and at that IFx(.> - Fx(xo)l < $6 for

no + q. Furthermore, for this S and q, there is an N, such that, for n > N,,

PW" - y,t>rl)-=Jis

and therefore

for all x. Also, there is an N, such that, for n > N2,

and

IFxp, + 7) - FX(X0 + 4 < 26 '

Therefore, given the continuity point xo and S > 0, there is an q > O and an N = max(N,, N,) such that for n > N,

As a corollary we have the result that convergence in probability implies convergence in law.

Page 253: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

230 SOME LARGE SAMPLE THEORY

CoroJlary 53.1.1. Let {X,} be a sequence of ranc&mr iuiables. If there exists a random variable X such that plim X, = X, then X, +X.

Corollary 5.2.1.2. Let {X,} and X be random variables such that plim X, = X. If g(x) is a continuous function, then the distribution of g(X,) converges to the distribution of g(X).

Proof. This follows immediately, since by Theorem 5.1.4,

We state the following two important theorems without proof.

Theorem 5.2.2 (Helly-Bray). If {F,(x)] is a sequence 05 distribution hc t ions over k-dimensional Euclidean space @) such that F,(x) -#(x), then

for every bounded continuous function g(x).

Theorem 5.23. Let {F,(x)} be a sequence of distribution functions over 9"' with corresponding characteristic functions {pa(u)}.

(i) If F&)%(x), then q,,(u)+ &u) at ail u E 9!(k', where &u) is the characteristic function associated with F(x).

(ii) Continuity theorem. If y,,(u) converges pointwise to a function du) that is continuous at (0, 0, . . . , 0) E B@), then p(u) is the characteristic function of a distribution function F(x) and F,(x) SF(=).

Theorem 53.4. Let {X,} be a sequence of k-dimepional random variables with distribution functions {Fxl(x)} such that Fx,(x)+Fx(x), and let T be a continuous mapping from B") to ~9). m e n

Proof. By the Helty-Bray theorem, the characteristic function of RX,) A converges to the characteristic function of RX), and the result follows.

Theorem 5.2.5. Let {X,} and {Y,} be two sequences of k-dimensional random variables such that X, is independpt of Y, for ail ne If there exist random variables X and Y such that Fx,(x)-+Fx(x) and FyA(y)+F,(y), then

FX,Y,(% Y > A U X ) F Y ( Y ) '

Page 254: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CONVERGENCE IN DISTRIBUTION 231

Proof. The characteristic function of (X:,Y:)' is given by

Now, by the Helly-Bray theorem, e,(u) 3 e(u) and fipyn(v) + (pv(v). Therefore,

Pxy(u, v) =,l@ eFk,(u)fin(v) = %(U)4Py(V) *

c By the continuity theorem, this implies that F, (x,y)+FXy(x,y), where Fxy(x, y) is the distribution function of independent "ddom variables associated

A with the characteristic function yky(u, v).

From Corollary 5.2.1.1, we know that convergence in probability implies convergence in law. For the special case wherein a sequence of random variables converges in law to a constant random variable, the converse is also true.

Lemma 5.2.1. Let {Y,} be a sequence of p-dimensional random variables with corresponding distribution functions (Fym(y)}. Let Y be a pdimensional random variabie with distribution function Fy(y) such that P{Y = b} = 1, b is a constant vector, and

Then, given E > 0. there exists an N such that, for n > N,

P(IY, - b l 2 E } < E . Proof. t e t B = {y: y, > b, - d p , y2 > b, - e/p, . . . , yp > b, - d p } . Then

Fy(y) = 0 on the complement of B. Fix E > 0. As Fy (9) +F,Jy), there exists an No such that, for n > N o ,

C

E € E c F Y , , @ , ) = F ~ ~ ( ~ , +- ,b , P +-, P . . . ,bp -p ) <- 2P

There also exists an N, such that, for n > N,, 1 - Fy,(b, + dp, b, + SIP, . . . , b, +

Page 255: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

232 SOME LARGE SAMPLE "HE4)RY

rlp) < d2. Therefore, for n > max(No, N, ),

and it follows that

Theorem 5.26. Let {(XL, Y:)') be a sequence of (k +-p)-dimensional random variables where X, is k-dimensional. Let the sequence of joint distribution functions be denoted by {Fx,yn(x,y)} and the sequences of marginal distribution functions by {Fxm(x)} and {Fy (y)}. If then exists a k-dimensional -om variable X andca p-dimensional &om variable Y such that Fxn(x)-sFx(x) and Fym(y) +Fy(y), where P ( y = b} = 1 and b = (bl , b,, . . . , bp)' is a constant vector, then

C Fx,y,(x, Y)-+ Fxy(x. Y)

Proof. Now P{Y = b} = 1 implies that

Fx(x) = F x y k b) ,

that Fxy(x, y) = 0 if any element of y is less than the corresponding element of b, and that Fxy(x, y) = Fx(x) if every element of y is greater than or equal to the corresponding element of b. Fix e > 0, and consider a point (xo, yo) where at least one element of yo is less than the corresponding element of b by an amount E.

Then Fxy(xo, yo) = 0. However, there is an No such that, for n >No,

by Lemma 5.2.1. Let (xo,yl) be a continuity point of Fxy(x,y) where every element of yI exceeds the $orresponding element of b by elpl" > O . Because F,,Cx) -+Fi(x) and Fyn(y) +Fy(y), we can choose Nl such that, for n ,A',, lFxn(xxo) - Fx(xo)l < €12 and IFy,(yI) - Fy(yI)l < €12. Hence,

A

Page 256: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CENTRAL LIMIT THEOREMS 233

Utilizing Theorems 5.2.4 and 5.2.6, we obtain the following corollary.

Corollary 5.2.6,l. Let {X,} and {Y,} be two sequences of k-dimensional random variables. If thgre exists a &dimensional random variable Y and a fixed vector b such that Y,+Y and X,+b, then

Corollary 5.26.2. Let {Y,} be a sequence of k-dimensional random variables, and let {A,) be a sequence of k X k random matrices. 15 there egsts a random vector Y and a fixed nonsingular matrix A such that Y, +Y, A,, +A, then

A;IY,& A-IY.

5.3. CENTRAL LIMIT THEOREMS

The exact distributions of many statistics encountered in practice have not been obtained. Fortunately, many statistics in the class of continuous functions of means or of sample moments converge in distribution to normal random variables. We give without proof the following central limit theorem.

Theorem 53.1 (Lindeberg central limit theorem). Let {Zf: t = 1.2,. . .} be a sequence of independent random variables with distribution functions {F,(z)}. Let E{Z,) = u, E{(Z, - p$} = rrf, and assume

(5.3.1)

2 for all Q > 0, where V , = 2:-, rrr . Then,

where N(0,I) denotes the normal distribution with mean zero and variance one. A form of the central limit theorem whose conditions are often more easily

verified is the following theorem.

Theorem 5.3.2 (Liapounov central limit theorem). Let {Zf: t = 1,2, . . .} be a sequence of independent random variables with distribution functions {Ff(z)}. Let

Page 257: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

234 SCWE 1;.4RCj@ SAMPLE THEORY

for some S >O, then n

v y 2 c, (2, - W)J'N(O, 1). 1 - 1

Proof. LetS>Oands>O,anddefinethesetA, by

A , = {z: IZ - > eV:"}.

"hen

1 " d l+8/2 6 c I lz - &12+6 dF,(z) 9 v, € , = I

which, by hypothesis, goes to zero as n + m. Therefore, the condition on the 2 + S A moment i m p k the condition of the Lindeberg theorem.

The reader is referred to the texts of Tucker (1967), Gnedenko (1%7), and

For a proof of the following extension of the central limit theorems to the Lobve (1963) for discussions of these theorems.

multivariate case, see Varadarajan (1958).

Theorem 533. Let {Zn:n= 1,2,,..} be a sequence of k-dimensiond random variables with distribution functions {Fzm(z)}. Let F&) be the dis- tribution function of X,, = A'Z,, where A is a fixed vector. A necessary and sufficient condition for Fzm(z) to converge to the k-variate distribution function F(z) is that FAan(x) converges to a limit for each A.

In most of our applications of Theorem 53.3, each F,,,(x) will be converging to a normal distribution function and h e w the vector random variable 2, will converge in distribution to a multivariate normal.

The Lindeberg and Liapounov central limit theorems are for independent random variables. It is possible to obtain the limiting normal distribution for sequences that satisfy weaker conditions. One type of sequence that has been studied is the martingale process.

Delenition 53.1. Let {XI}:-, be a sequence of random variables defined on the

Page 258: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CENTRAL LIMIT THBORBMS 235

space {a, d, P}. The sequence is a martingale if, for all t,

E{lXf 1> < O0

and

where d, is the sigma-field generated by ( X I , X , , . . . , X f } . It is an immediate consequence of the definition that for s G t - 1,

DeMtioo 5.3.2. Let {X,}:=, be a martingale sequence defined on ($2, d, P}. Let Zf = XI - XI- I . Then the sequence {Z,}:- I is called a sequence of martingale digerences.

We give a centraI limit theorem for martingale differences. Theorem 5.3.4 is due to Brown (1971). Related results bave been obtained by Dvoretsky (1972), Scott (1973), and McLeish (1974). Also see Hall and Hey& (1980) and Pollard (1984).

Theorem 53.4. Let {Zm: 1 C t G n, n 3 1) denote a triangular array of random variables defined on the probability space (a, d, P), and let {dfn: 0 C t s n, n 3 1) be any triangular m y of sub-sigma-fields of d such that for each n and 1 G t C n, Zf,, is &,,,-measurable and dl-l.,, is contained in din. For 1 Gk =G n, 1 " j G n, and n a 1, let

k

stn = Ir3 9

~ i = E { Z ~ n I ~ - l , n I ,

qn= c a:,, I

r-1

1

1 = I

and

Assume

Page 259: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

236 SOME LARGE SAMPLE THEORY

where I(A) denotes the indicator function of a set A. Then, as n -+ 03, P

s;nlsnn+ N(0,l).

Proof. omitted. A

Corollary 53.4. Let {ef,,} be a triangular array satisfying

E{(e,,12+'ldf,_,.,}<M<w as.

for some 6 >0, where d,,,, is the sigma-field generated by {e),,: jsf} . Let {w,,,: 1 c t c n, n 3 1) be a triangular array of constants, x:=, w:,, + o for all n, satisfying

Then

Proof. Reserved for the reader. A

We now give a functional central limit theorem which is an extension of Theorem 5.3.4. For related results, we refer the reader to Billingsley (1968). In the following, we let D = D[O, I] denote the space of functionsflu) on [0,1] that are right continuous and have left limits. Let (0, d, P) be a probabiiity space, and let X(u, w ) be a random function on D defined by an element o of s1. We will often abbreviate X(u, w ) to X(u) or to X.

An important random function is the Wiener process. If W is a standard Wiener process, then W(u) is normally distributed for every u in [0,1], where W(u)- N(0, u). Furthermore, W(u) has normal independent increments. A process on [0,1] has independent increments if, for

O S ; u , € u , ~ * - . c u , s 1 ,

the random variables

W,) - W@,) , W(U, ) - WU,), . . . , WU,) - W(Uk-1) are independent. The weak convergence or convergence in distribution of a sequence of random eJements X,, in D to a random element X in D will be denoted by X, 3 X or by X,, +X.

Theorem 5.3.5. Let {e,}:' be a sequence of random variables. and let df-

Page 260: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CENTRAL LIMIT THEOREMS

be the sigma-field generated by {el, e,, . . . , e,- ,}, Assume that

and

for some S > 0. Let

where [nu] denotes the integer part of nu and the sum is zero if [nu] = 0. Then

Sn*W,

where W is a standard Wiener process,

Proof. See Billinglsey (1968) or Chan and Wei (1988). A

The conclusion of Theorem 5.3.5 holds for el that are iid(0, (r2) random variables. That form of the theorem is known as Donsker's theorem. See Billingsley (1968, p. 68).

We now give a result for the limiting distribution of continuous functionals of S,. The result follows from the continuous mapping theorem. See Theorem 5.1 of Billingsley (1968).

Theorem 53.6. Let S,, and S be random elements in D. For 0 C s C 1, let

Zi,,(d = 6 ukrXISn(u)I du

and

where fit . . . , f, are real valued continuous functions on the real line and ki 3 0. If S, + S, then

(S,,, z, ,,, . * * I Z,,,) * (S, z, 3 - * ' * Z,) . Proof. Omitted. A

The following corollary is useful in deriving the limiting distribution of estimators of the parameters of nonstationary autoregressive processes.

Page 261: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

238 SOME LARGE SAMYLE THEORY

Corollary 53.6. Let {el) be a sequence satisfying the conditions of Theorem 5.3.5. Define

far r b 1 with Yo = 0. Then

Proof. From Theorem 5.3.5, we have that inul

converges weakly to crW(u), where W(u) is the Wiener process. If we let

and f i ( x ) = x2, then

and

The result fallows from Theorem 5.3.6. A

Theorem 5.3.7 is the vector version of the functional central limit theorem. We define a standard vector Wiener process of dimension k to be a process such that the elements of W(u) are independent standard Wiener pmeases.

Thearem 5.3.7. Let { e , } ~ ~ , be a sequence of random vectors, and let 4- be the sigma-field generated by {el, e,, . . . , e,- I). Assume

and

Page 262: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

CEMaAL JJMlT THeoREus 239

E{le,l' *' 1 d,- } < M < m a.s.

far some 6 > 0, or assume the el are iid(0, XJ, where x,, is positive definite. Let I

Y, = Z ej j - I

and

s,,(u) = n- I i2 Z,1'2~fnul , o G u G 1 ,

where [nu) is the integer part of nu, and 2:;' is the symmetric sqm root of see. Then

Sn(u) * w ?

where W is a standard vector Wiener process. Also,

and R

n - ' X ; y Z (Yl-l -y)e;X,"2=Y-~' (1) . 1 - 1

Proof. See Phillips and Durlauf (1986). A

We shall have use for a strong consistency result for martingales.

Theorem 53.8. Let {S,, = Zy=, X,, Js,, n 2 1) be a martingale, and let {M,,, n 2 I} be a nondecreasing sequence of positive random variables where M,, is

measurable for each n. Let

Iim M,, = 03 as. n+

and m

2 M,VE{IX,~P I ~e,- , } < m as. 1'1

Page 263: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

240

for some 1 S p C 2 . Then

SOME LARGE SAMptE THEORY

Proof. See Hall and Heyde (1980, p. 36). A

Coronary 53.8. Let {a,),", be a sequence of random variables, and let 59,- I

be the sigma-field generated by {e,, e,, . . . , ef-,}. Assume

Pruof. B y assumption

The conclusion follows by letting Mi = t in Theorem 5.3.8. A

5.4. APPROXIMATING A SEQUENCE OF EXPECTATIONS

Taylor expansions and the order in probability concepts introduced in Section 5.1 are very important in investigating the limiting behavior of sample statistics. Care must be taken, however, in understanding the meaning of statements about such behavior. To illustrate, consider the sequence {fill, where fn is the mean of n normal independent (p, 1) random variables, p # 0. By Corollary 5.1.5, we may write

fil = p-I - p-,(..tn - p) + p 3 f n - p)2 - p-4(zn - r)3 + op<n-2) (5.4.1)

It follows that 112 - - I plim n [ (xn - p- ' ) + p-'(zn - p)l= 0.

Therefore, by Theorem 5.2.1, the limiting distribution of n 112 ( x , - - I -p-') is the

Page 264: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

APPROXIMATING A SEQUENCE OF EXPWTTATIONS 241

same as the limiting distribution of -n”’ p -‘ (4 - p). The distribution of -n p (f,, - p ) is N(0, for all n, and it follows that the limiting distribution of n”‘(.f,’ - p- ’ ) is N(0, P-~).

On the other hand, it can be demonstrated that E { X i ’ } exists for no finite n. since tbe expectation of (3,’ - p-’1 is not defined, it is clear that one cannot speak of the sequence of expectations { E { f i ’ } } , and it is incorrect to say that E ( x ~ ’ } converges to p-’.

The example illustrates that a random variable Y, may converge in probability and hence in distribution to a random variable Y that possesses finite moments even though E{Y,} is not defined. If we know that Y, has finite moments of order r > 1. we may be able to determine that the sequence of expectations differs from a given sequence by an amount of specified order. The conditions required to permit such statements are typically more stringent than those required to obtain convergence in distribution. In this section we investigate these conditions and develop approximations to the expectation of functions of mean or “meanlike” statistics. In preparation for that investigation we consider the expectations of integer powers of sample means of random variables with zero population means. Let (Ls,, y,,, Z,, fin)’ be a vector of sample means computed from a random sample selected from a distribution function with zero mean vector and finite fourth moments. Then

112 - 2

where E{XYZ} is the expectation of the product of the original random variables. Siroiiarly,

Page 265: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

242 SOME LAROE SAMPLE THEORY

where uq is the covariance between X and Y, a,, is the covariance between Z and W, and so forth.

We note that the expectation of a product of either three or four mews is his is an example of a g e n d result that we state as a theorem. Our

proof follows closely that of W e n , Hurwitz, and Madow (1953).

Theorem 5.4.1. Let t, = (Z,,, f2,, . . . , j,,,,)' be the mean of a random sample of n vector random variables selected from a distribution function with mean vector zero and 6nite Bth moment. Consider the sequence {%,},"=,, and let b,, b,, . . . , b, be nonnegative integers such that B = Zys, b,. Then

O(n-'") if B is even, O(n-''f1)'2) if B is odd.

E{f;:j;; . . 0 jbm} =

Proof. Now E{X':,&; - .fh} can be expanded into a sum of terms such as

If there is a subscript matched by no other subscript, the expected value of the product is zero. (Recall that in the four-variable case this included terms of the form X,qZ,W,, XiV,ZGk, and XiqZkZ,..) If every subscript agrees with at least one other subscript, we group the terms with common subscripts to obtain, say, H groups. The expected value is then the product of the H expected values. The sum contains n(n - 1). * a (n -H + 1) terms for a particular configuration of H different subscripts. The order of n-B[n(n - 1) - - (n - H + l)] is n-'+H and will be maximized if we choose H as large as possible, If B is even, the largest H that gives a nonzero expectation is B/2, in which case we have BIZ groups, each containing two indexes. If B is odd, the largest H that gives a nonzero expectation is (B - 1)/2, in which case we have (B - 1)/2 - 1 groups of two and one group of three. A

The following lemma is used in the proof of the principal results of this section.

Lemma 5.4.1. Let {X,} be a sequence of k-dimensional random variables with corresponding distribution functions {F,(x)} such that

bi - p,l'dF,(x) = q.3, i = 1.2, . . . , k,

where the integral is over B'~), a, > 0, s is a positive integer, E{x,} = p =

Page 266: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

APPROXIMATINO A SEQUENCE OF EXI”ATI0NS 243

(p! , H, . . . , k)’, 1x1 = [Zf=, x ; ] ’ ‘ ~ is the Euclidean norm, and

lim a, = 0 . n - w

Then

where the pi, i = 1,2,. . . , R, are nonnegative real numbers satisfying k

2 p;as . i- I

Proofi Without loss of generality, set all p, = 0. Define A = [- 1,1], and let IA(x) be the indicator function with value one for n E A and zero for x rZ A. Then, for oa q G s,

b,IQ G&(&) + GCir *

so that

where the integrals are over a‘”. By the Holder inequality,

where k

r=C P i . A

Theorem 5.43. Let {Xn} be a sequence of real valued tandom variables with corresponding distribution functions {Fn(x)}, and let cf,(x)} be a sequence of real valued functions. Assume that for some positive integers s and No:

(i) Jlx - pizs dF,(x) =a:, where a, 4 0 as n +w.

Ilf.(x)l* dFncX) = O(1).

flp’Cx) 5fnCx) .

(iii) f:’(x) is continuous in x over a closed and bounded interval S for n greater than No, where fi”(x) denotes the jth derivative of &(x) evaluated at x and

Page 267: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

244 SOME LAROE SAMPLE 'IIJEORY

(iv) p is an interior point of S. (v) There is a K such that, for n >No,

and

Then

where

s = l ,

s > l .

Proof. See the proof of Theorem 5.4.3. A

Theorem 5.43. Let {X,} be a sequence of k-dimensional random variables with corresponding distribution functions {FJx)}, and let un(x)} be a sequence of functions mapping 9"' into 9, Let S E (0, m), and define a = &-'(I + 6). Assume that for some positive integers s and No:

(i) Jlx - (ii) J&x)I'+' ~ F , ( x ) = o(I).

(iii) f t f l ..dJ

greater than No, where

dF,,(x) = a:, where a, -+O as n +=.

(x) is continuous in x over a closed and bounded sphere S for all n

f ;I..

(iv) p is an interior point of S. (v) There is a finite number K such that, for n >No,

Page 268: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

APPROXIMATING A SEQUENCE OF EXPECTATIONS 245

Then

where

The result also holds if we replace (ii) with the condition that the f,(x) are uniformly bounded for n sufficiently large and assume that (i), (iii)’ (iv), and (v) hold for a = 1.

Prod. We consider only those n greater than No. By Taylor’s theorem there is a sequence of functions {Y,} mapping S into S such that

where

1 i f x E S , 0 otherwise

and

k k

c(s!)-’ C * ’ - 2 Klx -

= (s!)-’Kk”Jx - . ) , - I i,=1

Page 269: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

246 SOME LARGE SAMPLE THEORY

We now have that, for s > 1,

However,

Page 270: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

APPROXWA'MNG A SEQUENCE OF EXPECTATIONS 247

We now give an extension of Theorem 5.4.3 to a product, where one factor is a function of the type studied in Theorem 5.4.3 and the second need not converge to zero.

Corollary 5.43. Let the assumptions of Theorem 5.4.3 hold. Let 2, be a sequence of random variables defined on the same space as X,. Let F,(x, z) be the distribution function of (X,,Z,). Assume

Then

Theorems 5.4.2 and 5.4.3 require that Jlf.(x)l'+* dF,(x) be bounded for all n sufficiently large. The following theorem gives sufficient conditions for a sequence of integrals Jl&(x)l dF,(x) to be O(1).

Theorem 5.4.4. Let U;,(x)} be a sequence of real valued (measurable) functions, and let {X"} be a sequence of k-dimensional random variables with corresponding distribution functions {F,(x)}. Assume that:

(i) If.(x)l d K, for x E S, where S is a bounded open set containing p, S is the

(ii) If,(x)lS Y(x)n' for some p > O and for a function Y(-) such that

(iii) Jlx - & I r dF,(x) = O(n-") for a positive integer r and an q such that

closure of S, and K, is a finite constant.

JIY(x)ly d ~ , ( x ) = q1) for some y. 1 < y <m.

l l q + l / y = 1.

Then

The result also holds for q = 1 given that (i), (iii), and lf,(x)I S K,np hold for all x, some p > 0, and K, a finite constant.

Page 271: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

248 SOME LARGE SAMPL6 THEORY

Proof. Let S > 0 be such that

By Chebyshev’s inequality,

= K, + R P [ P { X , EA}]””[0(1)] = O(1).

For lf,(x)I G K2np and q = 1, we have

= o(1). A

Example 5.4.1. To illustrate some of the ideas of this section, we consider the problem of estimating log p, where p > 0 is the mean of a random variable with finite sixth moment. Let f,, be the mean of a simple random sample of n such observations. Since log x is not defined for x S 0, we consider the estimator

Suppose that we desire an approximation to the sequence of expectations of f,(Q to order n-’. We first apply Theorem 5.4.2 with a, = O(n-*I2) and s = 3. By Theorem 5.4.1, we have that

- p)zs} = q n - 8 ) , = 1 2, I (5.4.2)

Page 272: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

APPROXLMATtNNG A SEQUENCE OF EXPECTATWNS 249

so that condition i of Themem 5.4.2 is met. To establish condition ii, we demonstrate that l f , ( ~ ) 1 ~ satisfies the conditions of Theorem 5.4.4. Since f,(x) is continuous for x>O, condition i of Theorem 5.4.4 is satisfied. For p = 1, y = 9 = 2 , and

we have that lf.(x)12 < nY(x) and f}Y(x)12 dF,(x) = @I). This is condition ii of Theorem 5.4.4; from equation (5.4.2) we see tbat condition iii also holds for r = 4, 5, or 6. Therefore, the conditions of Themem 5.4.4 are satisfied and the integral of lf.(x)I2 is order one.

Since the third derivative of log x is continuous for positive x, we may choose an No > p-’ and an interval containing y such that f y ’ ( x ) is continuous on that interval for all n >No. To illustrate, let 1.1 = 0.1, and take the interval S to be [0.05,0.15]. Then, for n > 20, f y ’ ( x ) is continuous on S. Therefore, conditions iii, iv, and v of Theorem 5.4.2 are aIsa met, and we may write

1 - - E((2, - + q n - y

2P2

where u2 is the variance of the observations. To order n-’, the bias inL(f,) as an estimator of log p is - ( Z I Z ~ ~ ) - ’ U ~ .

Since 3, possesses finite fifth moments, we can decrease the above remainder term to O(n-’) by carrying the expansion to one more term and using the more general form of Theorem 5.4.3. By equation (5.4.2), condition i of Theorem 5.4.3 holds for s = 4, a = 4, and a, = O(n-’”), so that S = 2 by definition. Condition ii is established by demonstrating that lf,(x)13 satisfies the conditions of Theorem 5.4.4. Defining

and letting p = 8 . we see that lf(x)I’< Y(x)nP and

jIycr)lz~F,(x)+ +x3/212dF,(x)=0(1),

as 2, has finite third moments. Therefore, we are using y = 7 = 2 and q p = 3. Since .@ - 4’ dF,(x) = 0(n-3) and lf.(x)I’ is continuous, the conditions of Theorem 5.4.4 are met. The sixth derivative of logx is continuous for positive x,

Page 273: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

250 SOME LARGE SAMPLE THEORY

and we can find an No and a set S such that conditions iii, iv, and v of Theorem 5.4.3 are also met. BecausefL3)(p) = O(1) and E{(%,, - p)’} = O(n-2), we have

= log p - (2np2)-’u2 -I- O ( 8 ) I AA

5.5. ESTIMATION FOR NONLINEAR MODELS

55.1. ktimators That Minimize an Objective Fuactioa

The expected value of a time series is sometimes expressible as a nonlinear function of unknown parameters and observable functions of time. For example, we might have for {q: r E (0,1,2,. . .)}

where {x,} is a sequence of known constants and A is unknown. The estimation of a parameter such as A is considerably more complicated than the estimation of a parameter that enters the expected value function in a linear manner.

We assume the model

for t = 1.2, . . . , where the random variables are defined on a complete probability space (a, d, P), the vector 6’ = (6p, 6p, . . . , @:I’ of unknown parameters lies in a compact subset 8 of k-dimensional Euclidean space Rk, the function& is defined on R X Q with the form off; known, andA(o, 0 ) is continuous on 8 almost surely, for all r. The expression&(w, * ) represents the function for a particular o in a. The dependence of the function on w is generally suppressed for notational con- venience. The functionf; may depend on an input vector s,, and sometimes we may write the model as

Y, =fix,; 6’) + e, . The dimension of x, need not be fixed, and it may increase with r. Some elements of x, may be fixed and some may be random.

The derivatives of the functionf;(@) are very important in the treatment of this problem, and we introduce a shorthand notation for them. Let f:’)( 6) denote the first partial derivative off;@) with respect to the jth element of 6 evaluated at the point 6 = &. Likewise, letfI’“’(6) denote the second partial derivative of&(@ with respect to the jth and sth elements of 8 evaluated at the point 6 = d. The third

Page 274: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODEM

partial derivative, f~’”’(6), is defined in a similar manner. Let

251

(5.5.2)

be the vector of first derivatives, and let

be the n X R matrix of first derivatives. Let an estimator of the unknown 8’ be the 8 that minimizes Qn(8), where en(@) is a function of the observations and 8. The least squares estimator of 8’ is

the 8 that minimizes n

Q,,(@) = [Y, -f;(@12. r = l

For least squares, the function evaluated at 8’ is

(5.5.3)

n

en,(@’) = n- ’ 2 e: . 1 - 1

The function Qn@) is the negative of the logarithm of the likelihood in the case of maximum likelihood estimation for the model (5.5.1) with normal el.

In order to obtain a limiting distribution for the estimator, the sequence {xl, el}, the function A(@), and the parameter space Q must satisfy certain conditions. Stronger Conditions in one a m will permit weaker conditions in another. The random variables and the function must be such that the estimator is uniquely defined, at least for large n. Furthermore, the function being minimized must have a unique minimum “close to” the true parameter value, and the distance must decrease as the sample size increases.

These ideas are made moiz precise in the following theorems. We begin with results for the general problem and then consider special cases. Let 4 in 8 be a measurable function that minimizes Q,(8) on fd almost surely. Lemma 5.5.1 contains sufficient conditions for the consistency of 4. Lemma 5.5.1 is given in Wu (1981) and is based on the idea used by Wald (1949) in a proof of the strong consistency of the maximum l ikelihood estimator.

Lemma 55.1. Let 4 in 69 be a measurable function that minimizes an objective function Qn(0) on 0 almost surely.

(i) Suppose, for any q > O ,

(5.5.4)

h o s t surely. men tj, + eo almost surely as n -+ 00.

Page 275: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

252 SOME LARGE SAMPLe THEORY

(ii) Suppose, for any q>O,

(5.5.5)

Then 4 4 6' in probability as n + *.

Proof. A proof of part i is given by Wu (1981). We p v e part ii. Suppose 4 does not converge to 8' in probability as n + m . Then there exist q>O, 1 5 e,, > 0, and a subsequence {n,} such that

implying

Therefore,

lim SUP P{ inf [Q,(6) - Q,(6')] 6 0) > e;, . n - w le-e0lrV

That is,

which is a contradiction of the condition (5.5.5). Hence, if (5.55) holds, then 4 A converges to 6' in probability as n + 01.

The conditions in Lemma 5.5.1 are rather general. Gallant and White (1988, pp. 18-19) give a set of sufficient conditions for strong consistency. We give a form of their result in Lemma 5.5.2. The condition (5.5.6) of the lemma is a uniform law of large numbers for the function Qn(6>. This condition guarantees that the sample function will be close to the limit function as the sample size increases. The condition (5.5.7) insures that the limit of QJ6) has a minimum at 8'. It is sometimes called the identification condition.

Lemma 5.5.2. Given (a, d, P) and a compact set 8 that is a subset of W', let Q : X 8 + 9 be a random function continuous on 43 a.s., for n = 1,2, . . . . Let 4 be as defined in Lemma 5.5.1. Suppose there exists a function Qn : Q) + W such that

Page 276: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODELS

uniformly on 8. Assume that for any q > 0

253

(5.5.7)

Then h-e0+o a.s. as n+=.

Proof. Now

The first term on the right of the inequality converges to zero as n + 00 because

and by the assumption (5.5.6), there exists a B in I with P(B)= 1 such that, given any e- > 0, for each o in B there exists an integer N(o , e-) < 00 such that for ail n > N(w, e-),

SUPIQ,(~, e) - Q,,(~)I .= c . e

The difference &@') - Q,,(6') also converges to zero as n -+m by (5.5.6). The A conclusion follows from the assumption (5.5.7) and Lemma 5.5.1.

A critical assumption of Lemma 5.5.2 is (5.5.6), which guarantees that the random functions en(@) behave like nonrandom functions Qn(e> for large n. If Q,(O) can be written as

then a natural choice for en(@) is

(5.5.8)

(5.5.9)

Gallant and White (1988, p. 23) give a uniform law of large numbers for a function of the form (5.5.8) which is a slight madifidon of the result of Andrews (1986). We give the result as Lemma 5.5.3.

Page 277: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

254 SOME LARGE SAMPLE THEORY

Lemma 5.5.3. Let qf(8) be a function from fi X 8 to 9, t = 1.2,. . . , where (a, d, P) is the underlying probability space, and 8 is a compact subset of ak. Assume q,(w,* is continuous in e almost surely. Assume q,(- ,eo) is d- measurable for each 8' in 6 and for t = 1,2, . . . . For given 8' in 8 and S > 0, let

A(&) = {e E 8: 16 - 0'1 C 6 ) .

Assume that for each 8' in 8 there exists a constant So > 0 and positive random variables Ly such that

R

n-! c E{LP} = q 1 ) ; = I

and

1q,(t9) - qf(60)/ Lyle - 8'1 , t = I , 2, . . . , as.

for all 8 in the closure of A(&. Also assume that for al l 0 C S C So, where So may depend on 8*,

n

lim n-' 2 [I,(s) - E{I,(s)}J = o a.s, r = 1 n 3 m

and

where

Proof. See Gallant and White (1988, p. 38). A

It may be easier to obtain a uniform law of large numbers for the function Qn(e) directly than for {q,@), t = 1.2,. . . , n}. A uniform strong law of large numbers for en(@) is given in Lemma 5.5.4, and a uniform weak law of large numbers is given in Lemma 5.5.5.

Page 278: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR N O W E A R MODELS 255

Lemma 55.4. Let 8 be a convex, compact subset of kdimensional Euclidean space, Bk, and let {Qn(6), n = 1,2,. . .} be a sequence of random functions. Assume

(i) For each @, in 8

(ii) Them exist random variables Ln such that

(b) lim sup 5, < 00 as., n-w+

(c) sup E{L,} < 03 . n

Then

n + m { lim sup I Q n ( @ ) - E ( p , ( 8 ) ~ } = 0 as.

Proof. omitted. A

Lemma 5.5.5. Let 8 be a convex, compact subset of k-dimensional Euclidean space, and let {Q,(@): n = 1,2,, . .} be a sequence of random functions. Assume:

(i) For each el in 6,

for some Q(@).

that for 0, and & in 8,

(b) lQ(8,) - eC%>l Ll8, - 41 I

(c) Ln = Up( 1) and L = Up( 1).

(ii) There exists a sequence of positive random variables {L,} and an L such

(a) IQn(# 1 - Qn(4)I -(LnI@l - @ I 9

Then

P W a,(@ - a@> = 0 n 4-

uniformly in 8 in 8.

Proof. Given > 0, consider the open set Gq = {& 149 - 91 < rl) defined by 4 and q. Since 6 is compact, there exists a k =k(q) and @,,%,. . . ,a, such that

Page 279: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

256 SOME LARGE SAMPLE THEORY

8 = ue, G,, where Gi = G,. Now,

Given Q >O and ?> 0, by (i) there exists an Ni = Nf(ti, 7) such that for n >4,

By (ii)(c)? for given S > 0 there exists M > 0 such that

sup P[Ln>M]<3-'S n

and P[L >MI < 3-'S. Thus, given Q > 0 and 6 > 0, choose 11 = ( ~ M ) - ' Q . Then,

Similarly,

Choosing r = (3k)-'S gives the result. A

Given conditions for consistency of 4, we investigate the limiting distribution of the estimator. We begin by giving conditions undet which the least squares estimator converges in law to a nonnal random variable. In the theorem statement we assume the estimator to be consistent. The o h r assumptions of the theorem are sufficient for the existence of a cunsistent sequence of estimators (see Theorem 5.5.3), but are not sufficient to rule out multiple local minima. The theorem is general enough to cover models that contain lagged values of U, in the explanatory vector x,, To permit this, the sigma-field d,-, of the theorem is generated by xl through x, and el through q-'. For a vector a, let la1 =

Theorem 5.5.1. Let the model (5.5.1) hold, and let 4 be a weakly consistent estimator of 8' that minimizes Q,,,(e) of (5.5.3). Let .Q(-, be the sigma-field

Page 280: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODELS 257

generated by

Assume: (i) There exists a convex, compact neighborhood S of the true parameter

vector 8' such that A( - ,8) is measurable for each 8 in S, &(to, - ) is almost surely twice continuously differentiable with respect to 8 on S, S is in 8 and 8' is an interior point of S.

(ii) The vector sequence {[F,(Oo)e,, e,]} satisfies

E{[Ff(Bokf. e,, e:lld,-,) = [O, 0, a*] *

E{F:<8')Ff(8')e>e~.S,- I ) = F:(8°)F,(80)a' mF,(@Ok,, e,l12+g 14- 11 < 4 < Q, 1

a.s., for all t, where S > 0, F,(8) is defined in (5.5.2), and MF is a positive constant.

(iii) There exists a k X k fixed matrix B(6) such that B(8) is continuous at 8 O ,

plim [ 0.5 aeaet - B(B)] = 0 uniformly on S , n+w

plim n-'P:,(8')F,,(8') = n+- lim n-'E{F~,(8°)F,,(80)} = B(8') , n - w

and B(8') is positive definite.

Let

Then

n

s2 = (n - & ) - I x [y , -fix,; bn)12. 1-1

(5.5.10)

(5.5.1 1)

and

(b) s2 converges to Q' in probability.

Proof. Because the estimators et lie inside the convex, $ompact neighborhood S of do with probability approaching one as n increases, 8, satisfies

with probability approaching one as n increases. We can expand the first derivative

Page 281: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

258 SOME LARGE SAMPLE THEORY

in a Taylor series about 8’ to obtain

(5.5.12)

for 4 in S, where & is on the lint? segment joining 4 and 8’. By assumption iii and the consistency of 4,

Now

where en = (el, e,, . . . ,en)’ . Therefore

d- eo = ~ - ~ ( t ~ ’ ) n - ’ F ~ , ( e ’ ) e ~ + r, , where rn is of smaller order in probability than d- 8’.

To prove that

n-1’2F:k(@o)en ”, “0, B(eo)a2]

it is enough to show that for any k-dimensional vector A, AZO,

(5.5.13)

(5.5.14)

(5.5.15)

k

A’[n-1’2F:k(eo)en] = t = l 2 n-’’’[X - 5 1 A,f;”(S’)e,]

converges in law to a univariate normal distribution. We prove this by showing that the conditions of Theorem 5.3.4 hold. Let

Page 282: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODELS 259

and s,, 2 = E{V:,} = ~ZA’[E{n-’F~~(Bo)F,~(Bo)}]A.

By assumption iii

u-~(V: , - sin) = A’[n-’F:,(eo)F,(eO)]A - A‘[E(n-’F&(8°)F,k(80)}]A

converges to zero in probability,

s;, + a2A‘B(tl0)A > 0, and

Hence, con4 ion i of Theorem 5.3.4 is satisfied. Now for any suyitmy e > 0, n

s;: c Erz,t,r(lz,.l~ eS,,)} j = I

n

-8 - (2+S) - (8 /2 ) 6 6 s,, n M,,

where I(A) is the indicator function for the set A. Hence, condition ii of Thearem 5.3.4 is satisfied and

n-l’zF:k(Oo)enL “0, B(Oo)az] . (5.5.16)

By (5.5.12)-(53.14). using Corollary 5.2.6.2, we have

n112($ - e o ) - 5 “0, ~ - ‘ ( e ~ ) ( r ~ l . To obtain the limit of sz, we note that

Page 283: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

260 SOME LARGe SAMPLE THEORY

where 6 is on the line segment joining & and 9' and we have used (5.5.15). Therefore

By assumption ii, the e, have bounded 2 + S conditional moment. Therefore, by Corollary 5,3,5,

n 2 ~ i m (n - k)-' 2 e: = cr almost surely.

t -1 ti--tm

Conclusion b follows by (5.5.17). A

Because s2 is converging to cr2 and n-'Fr:k(d)Fnk(d) is converging to B(9*), we can use

as an estimator of the variance of 4. The statistics computed as the usual "t-statistics" of regression will be distributed as N(0,l) random variables in the limit.

There ~IE situations in which the estimator dm may have a limiting distribution that is not normal. Also, for some functions Q,(6), the covariance matrix of the limiting distribution may not take the simple form of (5.5.11). Some of these situations are covered by Theorem 5.5.2.

Theorem 5.52. Suppose that:

(i) The assumptions (5.5.6) and (5.5.7) of Lemma 5.5.2 hold. (ii) The first and second derivatives of Qm(9) with respect to 9 are continuous

for all R on a convex, compact neighborhood S containing 8' as an interior point.

(iii) n"' dQn(e")la9~2X, where X represents the limiting distribution. (iv) There exists a k X k fixed matrix function B(f3) such that B(f3) is

continuous on S, B(f3') is a symmetric, positive definite matrix, and

n--tm iim [ a2Q,0 aeaer - 2B(8)] = 0

almost surely, uniformly on S.

Let 4 be a measurable function that minimizes the objective function Q,(9). Then

n1"(4 - 6 O ) Y ' - B-'(e')X I

Page 284: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODELS 261

Proop. By assumption i, the estimator 4 converges to 8' almost surely, and 8, lies inside the convex, compact neighborhood S of 8' for large n, almost surely. Therefore, we can expand the partial derivative of Q,(&) with respect to 8 in a Taylor series about 8', for large n, almost surely. For #n on the line segment joining 9 and 8' we have

(5.5.18)

for large n, almost surely. By assumption ivy the second parhal derivative of Q,(6) evaluated at 4 is converging to 2B(eo) 8,s. Therefore, using assumption iii,

A (5,5.18), and Corollary 5.2.6.2, we have the conclusion.

If, in Theorem 5.5.2, X - N[0,2xx], then

nl/*($ - e 0 ) 5 N[O,B-~~~~)X,B-~(~~)].

If X, =B(f9°)aZ, we obtain the result of Theorem 5.5.1 that the limiting distribution is "0, B-'(8')a2].

In Theorem 5.5.1 and Theorem 5.5.2, it is n1'2(8 - 6') that converges in distribution to a nondegenerate random variable. In assumption iii of Theorem 5.5.1, the matrix of derivatives divided by n is assumed to converge to a nonsingular matrix. The normalization of the sums of squares by n and the corresponding normalization of 8- 8' with n112 is not always appropriate.

In time series analyses, it sometimes happens that the sum of squares of the derivative with respect to one parameter increases at a faster rate than the derivative with respect to a second parameter. Consider the simple model

with vector of derivatives F(x,; #3) = [ I , t ] . It follows that the sum of squares of the derivative with respect to p0 increases at the rate n, while the sum of squares of the derivative with respect to p1 increases at the rate n3. %refore, Theorem 5.5.1 could not be used to obtain the limiting distribution of the least squares estimator of the parameter vector. Following Sarkar (1990), we now give a theorem, Theorem 5.5.3, that is applicable to situations in which the sums of squares of the first derivatives increase at different rates. Standard results are obtained by setting M, and a, of the theorem equal to n"'I and n1l2, respectively.

The assumptions of Theorem 5.5.3 are local in that assumption d is a local pmpeay of Qn(8) in the vicinity of 8'. This assumption permits the existence of other local minima. Thus, the conclusion of Theorem 5.5.3 is local in that the sequence of estimators associated with the local region of assumption d are consistent for 6'.

Page 285: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

262 SOME LARGE S A W THEORY

We assume that an estimator of 6 is constructed by minimizing the function Q,,(6). Let the vector of first partial derivatives be

and the matrix of second partial derivatives be

If the miaimum of Q,,(6) occurs in the interior of the parameter space, then the 6 that minimizes QJ6) is a solution of the system

u,(e) = 0 . (5.5.19)

heo or em 55.3. Suppose the true value 6’ is an interior point of the parameter space 8. Assume that there is a sequence of square matrices {M,} and a sequence of real numbers {a,,} such that

(a) limn+- M,’Bn(6’)M~” = B a.s., where B is a k X k symmetric, positive

(b) limn+ a,’M,’U,(e’) = 0 B.S.

(d) There exists a S > 0 and random variables L,, such that for all n, all 6 in

definite matrix, as.

(c) limn+ an = 00.

Snb, and all 4, 0 < 4 < 6,

I ~ ~ ’ B n ( 6 ) M ~ ” - M,’B,,(B’)M,’’I/ d Ln& a.s.,

and

lim sup L,, < w as., n-tm

where

sna = {e E 8: 1l.i .I M:V - eo)>ll < 4)

and for any k X r matrix C, l[Cl12 = tr CC‘ denotes the sum of squares of the elements of C.

Then there exists a sequence of roots of (5.5.19), denoted by {e}, such that

lim /b;’M;(# - 6’)1l= 0 IS. n 3 m

Proof. Given e > 0 and 0 < S, S S, it is possible to define a set A. an no, and positive constants K,, K,, and K3 such that P{A} 3 1 - e and for all o in A and alt

Page 286: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINBAR MODELS

n>no,

263

y,,,[M~'Bn(8')M,"1 zs 3K2 , (5.5.21)

(5.5.22) I~,'B,(fl')M,'' - BlJ2 < K2 , Ibf,'[B,(@) - B , ( ~ o ) l M ~ " ~ ~ < K 3 ~ o ,

and L,(o) =s K3 for all 8 in Snw by assumptions c, a, a, b, and d, respectively, where y,,,(C) is the smallest root of the matrix C. Let 8; = min(8,. K i ' K , ) . Then, for o in A, 8 in S,,,;, and n >no, we have

Ilanll< KzSX (5.5.23)

and

where Ib,,ll= Ip.5a;'M, 'U,(@')>l/. Let the Taylor expansion of Q,(8) about 8' be

Q,(e) = Q,,(e0) + os(e - eoyu,(eo) + (e - so)tB,(8,,)(e - eo) = Q,,(eo) + os(e - e0yu,(eo) + (e - e0)~~,(8,,) - ~,(e~)l(e - eo)

+ (e - eo)'B,(eo)(e - 8') , (5.5.25) *

where 8, lies on the line segment joining 8 and 8'. Using (5.5.24).

(8 - 8")'M,,{M, ' [B,(i,> - B,(eo)1M,"}M:(6 - 8') Iw#l- 8')1I2K2 (5.5.26)

for all 8 in Snb;. By (5.5.21),

(e - eo)'B,(eoxe - 8') a ~ K J M ~ - eo)>(12 . (5.5.27)

Let R(S,,;) be the boundary of SnS;,

From the expansion (5.5.25), using (5.5.26), (5.5.27), and (5.5.20),

Q,(@ - Q,,(Oo) 3 IIM;(8 - 8°)1)2[2K2 + (e - @')'IlM;(@ - ~0)I/-2a,M,,d,l (5.5.28)

Page 287: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

264

and

SOME LARGE SAMPLE THEORY

and

Q,(@ - Q,(eo) a S,*2KIK2

for all w in A, all n > no, and all B in R(Sn8;). It follows that Q,(@ must attain a minimum in the interior of Sns;, at which point the system (5.5.19) is satisfied. Therefore,

P(1/u,'ML( 8, - e0]>ll< S o } * r 1 - E for n >no .

Now, let A,, n,, and S,* denote A, no, and S,* associated with 4 = 2-I. Since 4 is arbitrary, we can choose the S: to be a sequence decreasing to zero and take n, to be an increasing sequence. Define A = lim inf A,. Then P(A) = 1. Given 7 > 0 and a fixed w in A, there exists a j,, such that w is in A) for all j > jo. Choose jl >jo such that s/, < q. Then, for j > j , and n > n,, we have

lb;1M;t4 - @"I1 c s, <7* A

Corollary 5.53.1. Let assumptions a through d of Theorem 5.5.3 hold in probability instead of almost surely. Then there exists a sequence of roots of (5.5.19), &noted by {8,}, such that

plim i[a,'ML( 8, - Oo)>l[ = 0. n-P

Proof. omitted. A

If the roots of U;'M,M; are bounded away from zero, then 8, is consistent for 8'. If 8, is consistent, and if the properly normalized first and second derivatives converge in distribution, then the estimator of 8 converges in distribution.

Corollpry 5.53.2. Let assumptions a through d of Theorem 5.5.3 hold in probability. In addition, assume 8, is consistent for B0 and

(e) {M,'U,(8°), M;'B,(OO)M;' '}z (X, B) as n +-J. Then

ML(@-t?o)-% -B-'X as n + = ,

Page 288: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODELS 265

where B is defined in assumption a and Un(flo) is the vector of first partial derivatives evaluated at 8 = 8'.

M. Because @ is converging to 8' in probability, there is a compact set S containing 8' as ao interior point such that, given E > 0, the probability that @, is in S is greater than 1 - e for n greater than some N,. For 8, in S, we expand UJ8) in a first order Taylor series about 8' and multiply the Taylor series expansion by Mi' to obtain

0 = M,'U,( 6) = M,'U,(6') + M,'B,( i&)Mi''M;( 8. - 8').

where 8" is an intermediate point between 8, and 8'. Note that

M;'Bn(e)M;'' =M,'Bn(6')M~'' + M;'[B,,(@) - B,(6')]M,''

and the conclusion follows by assumptions a, d, and e. A

By Theorem 5.5.3, there is a sequence of roots of (5.5.19) that converges to 8'. Hence, the assumption of the corollary that 6 is consistent is equivalent to assuming that we are considering the consistent sequence of roots.

Example 5.5.1. Consider the model

(5.5.30)

where e, - NI(0, a'), PI # 0, and

if t is odd, ''1 ={!I if t is even,

Assume that a sample of size n is available and that an estimator of (A, P,)' = f l is constructed as the (PO, P, ) that minimizes

Page 289: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

266 SOME LAROE SAMPLE mmr

is singular if so # 0 and PI # 0. We now show how we can apply Theorem 5.5.3 to this problem. Let

=( 1 -p")("' o)-'" o p ; 0 1

Setting a, = n1I2, we have n

The second derivatives of Q,,(/3) are continuous with continuous derivatives, and hence condition d of Theorem 5.5.3 also holds. It follows that the estimator of /3 is consistent for fi. Using Theorem 5.3.4, one can show that

M , ' U , ( P 0 ) A N O , Vaa)

where Vaa is tbe matrix of (5.5.31) multiplied by a'.

covariance matrix is to reparametrize the model. Assume that /3, #O. Let An alternative method of defining a sequence of estimators with a mnsingular

(% a,) = (8081r PI) 9

(A, P,) =(a;'%, a,).

from which

and

Y, =a;'% + apI1 + q,t + e l .

The vector of first derivatives of the a-form of the mode1 is

It follows that

Page 290: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLlNI3AR MODELS 267

where

and H, = diag(n3, n). It can be shown that n

Hi”‘ Gf(a)e,& “0, Aa21 . I= I

Thus, the least squares estimator of f l is consistent and there is a normalizer that gives a limiting n o d distribution. However, the two nonnalizers that we used are functions of the true, unknown parameters. The practitioner prefers a distribution based on a nonnalizer that is a function of the sample.

It is natural to consider a nonnalizer derived from the derivatives evaluated at the least squares estimator. For the model in the a-form, let

n

K, = 2 Gi(P)G,(&). 1=1

One can show that

and hence

Ki’”(&- a)* N(0,Io’).

It follows that one can use the derivatives output by most nonlinear least s q m programs to construct approximate tests and confidence intervals for the parame- ters of the model. AA

Example 5.53. As an example where Theorem 5.5.3 does not hold, consider the model

u, = /3 + p2t + e, ,

where F(x,; /3) = (1 + 2Sr) and e, - NI(0, a’). Assume that the model is to be estimated by least squares by choosing p to minimize

Then

Page 291: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

268

and

SOME LARGE SAMPLE THEORY

If Po = 0,

n

Bn(/3O) = 1 + 2n-' c te, . The quantity n-' Xr=, re, does not converge to zero in probability. If one divides Bn(p0) by n'", the normalized quantity converges in distribution, but not in probability. If one divides Bn(/3') by a quantity increasing at a rate faster than n"', the Limit is zero. merefore, the m ~ d e ~ with P'=O does not satisfy assumption a of Theorem 5.5.3. However, one can use Lemma 5.5.1 to demon- strate that the k s t squares estimator is a consistent estimator of /3' when /3O = 0. AA

5 . 5 1 one-step Estimation

An important special case of nonlinear estimation occurs when one is able to obtain an initial consistent estimator of the parameter vector 8'. In such cases, asymptotic results can be obtained for a single step or for a finite number of steps of a sequential minimization procedure.

We retain the model (5.5.1), which we write as

u, =AX,; eo) + e, , r = I , 2,. . . , (5.5.32)

where the e, are iid(0, a') random variables or are (0, P') martingale differences satisfying condition ii of Theorem 5.5.1. Let d be an initial consistent estiytor of the unknown true value 8'. A Taylor's series expansion of AxI; 0') about B gives

(5.5.33) k

fix,; eo) =Ax,; 8) + 2 f ( j ) (x , ; &e," - e,> + d,(x,; 8) , j - I

where 0; and 4 are thejth elements of 8' and d, respectively,

* and 0, is on the line segment joining d and 8'. On the basis of equation (5.5.33), we consider the modified sum of squares

= n-'[w - F(&]'[w - F(B)S] ,

Page 292: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLlNEAR MODELS 269

where 6 = 6 - 6, w is the n X 1 vector with rth element given by

w, = y, -fix,; a), and F(6) is the n X k matrix with tth row F,(6) defined in (5.5.2). Minimizing &ij> with respect to s leads to

I = & + # (5.5.34)

as an estimator of 6', where

f = IF'( a)F( &]-IF'( 6)w.

We call 8 the one-step Gauss-Newton estimator.

Theorem 55.4. Assume that the nonlinear model (5.5.32) holds. Let 4 be the one-step Gauss-Newton estimator defined in (5.5.34). Assume:

(1) There is an open set S such that S is in 63, 8'ES, and

piim ~-'F'(~)F(B) = ~ ( 6 ) n - w

is nonsingular for all 8 in S, where F(6) is the n X k matrix with tjth element given by f(j)(x,; 6).

(2) plim,,, n-'G'(@)G(6) = L(6) uniformly in 6 on the closure s of S. where the elements of L(6) are continuous functions of 6 on 3, and G(8) is an n X (1 -k k + k2 + k3) matrix with ?th row given by

Then

8 - 6' = ~'(Bo)F(6')~-'F'(8')e + O,(max{a,?, ~ , n - " ~ } ) ,

Furthemre, if n-"* X:,, F#3°)e,%V[Q,B(60)u2] and a: = o(n-"'), then

n"'( I - @')A "0, B-'(~')CT~] .

Proof. Because 6 - 8' is O,,(a,), given q, > 0, one can choose No such that

Page 293: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

270 SOME LARGE SAMPLE THEORY

the probability is greater than 1 - q, that d is in S for all n >No. On the basis of a T ~ Y I O ~ expansion, we can write, for d in S,

8 = [F'(d)F( &]-IF( e)[f(O0) - f( d) + el = [Ff(d)F(d)]-'F'(d)[F(d)do +el

+ [Ff(d)F(d)]-'R(h),

where f(8) is the n X 1 vector with tth elementflx,; 8), e is the n X 1 vector with tth element e,, 6' = 8' - d, and the jth element of n-'R(d) is

I

r = l s = l r = 1

in which 8 is on the line segment joining d and 8'. The elements of L(8) are bounded on $, and, given el >0, there exists an Nl such that the probability is greater than 1 - B, that the elements of

differ from the respective elements of L(8) by less than el for all 8 € $ and al l n > Nl . Therefore,

and

n-'R(d) = O,,(u;). (5.5.35)

Because B(8) is nonsingular on S, there is an N2 such that Ff(8)F(8) is nonsingular with probability greater than 1 - 9 for all 8 in S and all n >N2. It follows that

and

plim[n-'Ff(d)F(d)]-' = B-'(e0),

where thejsth element of B(8O) is

Page 294: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION FOR NONLINEAR MODELS 271

Therefore,

wf(d)F(d)]-'R(d) = U,(a:).

The jth element of n-'F'( d)e is

where Bt is on the line segment joining d and 8'. By assumption 2, n

n-' 2 p s r f ( l ( t l ; e+) - f (Jsr) (xf ; e 0 ) l 2 L o 1-1

as n + 00. "herefore,

and

The Fesults foIlow from (5.5.35) and (5.5.36). A

To estimate the variance of the limiting distribution of n"2( # - e0), we must estimate B-'(e0) and u2. By the assumptions, [n-'F'(d)F(d)]-' and [n-'F'(a)F( #)I-' are consistent estimators for B-'(9'). Also

n

s2 = (n - & ) - I c [y, -four; 6)12 f = = l

(5.5.37)

is a consistent estimator of a'. Hence, using the matrix [F'(d)F(&)]-' and the mean square s2, all of the standard linear regression theory holds approximately for d.

The theorem demonstrates that the one-step estimator is asymptotically unchanged by additional iteration if 6 - 8' = ~ , , ( n - " ~ ) . For small samples, we may choose to iterate the procedure. For a particular sample we are not guaranteed that iteration of (5.5.341, using B of the previous step as the initid estimator, will

Page 295: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

272 SOME LARGE SAMPLE THEORY

converge. Therefore, if one iterates, the estimator 8 should be replaced at each step bY

e , = $ + v &

where d is the estimator of the previous step and vE(0 , I] is chosen so that n-' Xy,, [Y, - f ix,; @)I2 is less than n-' Ey=, [Y, -fix,; $)I2, and so that €8. This iteration fumishes a method of obtaining the least squares estimator that minimizes (5.5.3).

Example 5.53. To illustrate the Gauss-Newton procedure, consider the

y , = e , + e , X , , + c I ~ x , , + ~ , , (5.5.38)

where the e, are normal independent ( 0 , ~ ~ ) random variables. While the superscript 0 was used on 8 in our derivation to identify the true parameter value, it is not a common practice to use that notation when discussing a particular application of the procedure. Observations generakd by this model are given in Table 5.5.1. To obtain initial estimators of e,, and e,, we ignore the nonlinear restriction and regress Y, on x,, =I 1, x,, , and x , ~ . This gives the regression equation

model

= 0.877 + 1 . 2 6 2 ~ ~ ~ + 1 . 1 5 0 ~ ~ ~ .

Assuming that n-' Z$,, n-' Z&, and n-' Z:X,~X,~ converge to form a positive definite matrix, the coefficients for n,, and x,' are estimators for 0, and O,, respectively, with emrs O,,(n-'"). We note that (1.150)"* is also a consistent estimate of 0'. Using 0.877 and 1.262 as initial estimators, we compute

W, = U, - 0.877 - 1.262,' - 1.593%,, . The rows of F(d) are given by

Ff,(d)=(l,nfl +24&, r = 1,2 ,...,, 6 .

Table 5.5.1. Data and Regression Varioblw Used in the EstImatiOn of the ParameLers of the Model (5.53)

1 9 1 2 4 -0.777 12.100 2 19 1 7 8 -3.464 27.199 3 11 1 1 9 -5.484 23.724 4 14 1 3 7 -1.821 20.674 5 9 1 7 0 -0.714 7.000 6 3 1 0 2 -1.065 5.050

Page 296: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

INSTRUMENTAL VARIABLES 273

Regressing w, on x , ~ and x , ~ + 28,xt2, we obtain

g=( 0.386) -0.163 ' 0.877) + ( 0.386) = (1.263)

e=(1.262 -0.163 1.099

as the one-step Gauss-Newton estimate. Our computations illustrate the nature of the derivatives that enter the

covariance matrix of the asymptotic distribution. In practice, we would use one of the several nonlinear least squares programs to estimate the parameters. These programs often have the option of specifying the initial values for the iteration or permitting the program to use a search technique. In this example, we have excellent start values. If we use the start values (0.877, 1.262), the nonlinear least squares estimate is a= (1.2413,1.0913)', the estimated covariance matrix is

q d l = ( 1.3756 -0.0763 ) -0.0763 0.00054 '

and the residual mean square is 1.7346. We used procedure "LEN of SAS@ [SAS (1989)l for the computations. AA

5.6. INSTRUMENTAL VARIABLES

In many applications, estimators of the parameters of an equation of the regression type are desired, but the classical assumption that the matrix of explanatory variables is fixed is violated. Some of the columns of the matrix of explanatory variables may be measured with error and (or) may be generated by a stochastic mechanism such that the assumption that the enor in the equation is independent of the explanatoq variables becomes suspect.

Let us assume that we have the model

y = Q,@ + XA + z , (5.6.1)

where @ is a k, X 1 vector of unknown parameters, A is a k, X 1 vector of unknown parameters, y is an n X 1 vector, Q, is an n X k , matrlx, X is an R X k, matrix, and z is an n X 1 vector of unknown random variables with zero mean, The matrix @ is fixed, but the elements of X may umtain a random component that is correlated with 2.

Estimators obtained by ordinary least squares may be seriously biased because of the correlation between z and X. If information is available on variables that do not enter the equation, it may be possible to use these variables to obtain consistent estimators of the parameters. Such variables are called instnunenral variables. The instrumental variables must be correlated with the variables entering the matrix X but not with the random components of the model. Assume that we have observations on k, instrumental variables, k, 3 k,. We denote the n X k, matrix of

Page 297: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

274 SOMB LARGE SAMPLE THEORY

observations on the instrumental variables by + and assume the elements of + are fixed. We express X as a linear combination of Cp and #:

X=@S, +w+w = (@ : #)S + W , (5 6 2 )

where

Note that the residuals w follow from the definition of 6. Therefore, w may be a sum of random and fixed components, but the fixed component is orthogonal to @ and + by construction,

The instrumental variable estimators we consider am obtained by regmssing X on @ and +, computing the estimated values 2 from this regression, and hen replacing X by 2 in the regression equation

y = @fl f XA + z .

The instrumental variable estimators of fl and A are given by the regression of y on # and t:

(5.6.3)

where

fit =(a: #)[(a: qb)'(Cp: +)]-'(a: #)'X. These estimators are called two-stage least squm estimators in the econa-

metria literature. An alternative instrumental variable estimator is the limited information maximum likelihood estimator. See Johnston (1984) and Fuller (1987). To investigate the properties of estimator (5.6.3), we mum:

1. Q, = DLi(@: #)'(a: +)D;,' is a nonsingular matrix for all n > k, +k,, and limn+ Q, = Q, where Q is nonsingular and D,, is a diagonal matrix whose elements are the square roots of the diagonal elements of (@: *)'(a: #).

2. M, is nonsinguiar for all n > k, + k,, and

lim M, = M, n-v-

Page 298: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

INSTRUMENTAL VARIABLES 275

where M is a positive definite matrix,

M, = D;,'(@ : X)'(@ : S)Dii , x = (cp: #)6 ,

and D,, is a diagonal matrix whose elements are the square roots of the diagonal elements of (a : X)'(@ : X).

3. limn+., R,, = R, where R is finite and

R, = E{Di, [a : %]'zz'[@ : %ID,'} .

4. (a) limn+, B, = B, where B is finite and

B, = E{Di,(@: #)') 'zz'(cP: #)D;i}

(b) limn+ G,,j = G,,, i , j = 1.2,. . . , k, + k,, where G, is finite,

G,, = E(D;i[a: #]'w,w>[cP: q%JD;,'},

and w,, is the ith column of the matrix w.

5. (a) limn+ dj,,, = 00, j = 1,2, i = 1,2, . . . , k, + k4-,, where d,,,, is the ith diagonal element of Din.

(b) !l(z pi) - I P P , " ~ = 0. i = 1,2, . . . , k, ,

lim 1,4$ +:=O, j = 1 , 2 ,..., k,, ) - I

where qi is the tith element of @ and #j is the tjth element of #.

Theorem 5.6.1. Let the model of (5.6.1) to (5.6.2) and assumptions 1 through 4 and 5(a) hold. Then

Dzn( 6 - 8) = Mi'Dk'(cP : X)'Z + op( 1) ,

where

e= [(a:a)'(~:a),-I(~:az>'y, a = (@: dr)&

and

Page 299: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

276 SOME LARGE SAMPLE THEORY

proof: Define M, to be the matrix M,, with X repiaced by 2. By assumption 4, the variance of D1,( 8 - 6 ) is of order one, and by assumption 3, the variance of D;,'(@ : X)'z is of order one. Thus,

Similarly,

Using

and

we have

DJd- @)=fi,'D~~(@:%)'z=Pkf;' +o,,(l)lD~~(@:%)'z + op(l)

= M; 'D&O : it)'^ + oP(i) . A

Since limn+ M, = M, we can also write

D2,,( 8 - 8) = M-'D&a : X)'Z + OF( 1). (5.6.4)

In many applications n-"*D,, and n-1'2D2,, have finite nonsingular limits. In these cases n'" can be used as the normalizing factor, and the reminder in (5.6.4)

It follows from Theorem 5.6.1 that the sampling behavior of D2,,(&- 0) is approximately that of M,'Di:(cp:x)'~, which has variance M;'R,,M,', where R, was defined in assumption 3. In some situations it is reasonable to assume the elements of z are independent (0, a:) random variables.

is o,(n-"2).

Corollary 5.41. Let the mudel (5.6.1) to (5.6.2) and assumptions 1 to 5 hold with the elements of z independently distributed (0,~:) random variables with

Page 300: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

JNSTRUMENTAL VARIABLES 277

E{z~} = qu:. Then

Furthermore, a consistent estimator for at is

1 n-k, -k2 2'2, 2 s, = (5.6.5)

where 2 = y - (a : X)d.

Proof. A proof is not presented here. The normality result is a special case of Theorem 6.3.4 of Chapter 6. That s,' is a consistent estimator of a: can be

A demonstrated by the arguments of Theorem 9.8.3.

Our analysis treated and qb as fixed. Theorem 5.6.1 will hold for CP and (or) 9 random, provided the second moments exist, the probability limits analogous to the limits of assumptions 1, 2, and 5 exist, and the error terms in y and X are independent of Q, and +.

A discussion of instnunental variables particularly applicable when k, is larger than k, is given in Sargan (1958). A model where the method of instrumental variables is appropriate will be discussed in Chapter 9.

Example 5.6.1. To illustrate the method of instrumental variables, we use some data collected in an animal feediig experiment. Twenty-four lots of pigs were fed six different rations characterized by the percentage of protein in the ration. The remainder of the ration was primarily carbohydrate from com. We simplify by calling this remainder corn. Twelve of the lots were weighed after two weeks, and twelve after four weeks. The logarithms of the gain and of the feed consumed are given in Table 5.6.1. We consider the model

Gi =P, +&Pi +&Ci + Z,, i = 1.2,. . . ,24,

where Gj is the logarithm of gain, Ci is the logarithm of corn consumed, Pi is the logarithm of protein consumed, and 2, is the random error for the ith lot.

It is clear that neither corn nor protein, but instead their ratio, is fixed by the experimental design. The observations on corn and protein for a particular ration are constrained to lie on a ray through the origin with slope corresponding to the ratio of the percentages of the two items in the ration. The logarithms of these observations will lie on parallel lines. Since Cj - PI is fixed, we rewrite the model as

Gi = + (pi + &>f'I + &(C, - Pi> + 2, - In terms of the notation of (5.6.1), Gi = and PI = X i . Candidates for & are functions of C, - P I and of time on feed. As one simple model for the protein

Page 301: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

278 SOME LARGE SAMPLE THBORY

Tnbk 5.6.1. Gain and Feed C~nsumed by 24 Lots d Plga

Lot on Feed Gain Corn Rotein Time Log Log

i Weeks G C P P 2

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4

4.477 4.564 4.673 4.736 4.718 4.868 4.754 4.844 4.836 4.828 4.745 4.852 5.384 5.493 5.513 5.583 5.545 5.613 5.687 5.591 5.591 5.700 5.700 5.656

5.366 5.488 5.462 5.598 5.521 5.580 5.516 5.556 5.470 5.463 5.392 5.457 6.300 6.386 6.350 6.380 6.314 6.368 6.391 6.356 6.288 6.368 6.355 6.332

3.465 3.587 3.647 3.783 3.787 3.846 3.858 3.898 3.884 3.877 3.876 3.941 4.399 4.485 4.535 4.565 4580 4.634 4.733 4.698 4.702 4.782 4.839 4.816

3.601 3.601 3.682 3.682 3.757 3.757 3.828 3.828 3.895 3.895 3.961 3.961 4.453 4.453 4.537 4.537 4.616 4.616 4.690 4.690 4.760 4.760 4.828 4.828

-0.008 -0.042

0.035 -0.036 -0.033

0.059 -0.043 0.007 0.035 0.034

-0.026 0.017

-0.021 0.003 0.001 0.04 1 0.01 3 0.028 0.028

-0.033 -0.015

0.015 -0.019 -0.041

Source: Data courtesy of Research Department, Moorman Manufacturing Company. The data are a portion of a larger experiment conducted by the Moorman Manufacturing Company in 1974.

consumption we suggest

P, = 8, + S,(C, - Pi) + &ti + 4(ti - 3#Ci - P i ) + W, ,

where ti is the time on feed of the ith lot. The ordinary least squares estimate of the equation is

gi = 4.45 - 0.95 (Ci -Pi) + 0.46ti - O.O2(t, - 3#C, - P i ) , (0.48) (0.09) (0.15) (0.09)

where the numbers in parentheses are the estimated standard emrs of the regression coefficients. If the r;C: are independent (0, a’) random variables, the usual regression assumptions are satisfied. The interaction term contributes very little to the regression, but the time coefficient is highly significant. This suppolts

Page 302: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTiUATED GENERALIZED L U S T SQUARES 279

assumption 2 because it suggests that the partial correlation between P, and ti after adjusting for C, - P, is not zefo. This, in turn, implies that the matrix M, is nonsingular. The hmlues for this regression are given in Table 5.6.1. Regressing G, on @, and C, - P,, we obtain

Gj = 0.49 + 0.98Pi + 0.31(Ci - Pi ) . In this problem it is reasonable to treat the 2 ’ s as independent random

variables. We also assume that they have common variance. The estimated residuals are shown in the last column of Table 5.6.1, These must be computed directly as

Gi - 0.49 - 0.98Pi - 0.31(Ci - Pi ) . The residuals obtained in the second round regression computations are Gi - 0.49 - 0.98@, - 0.31(C, - P i ) and are inappropriate for the construction of variance estimates, From the 2’s we obtain

24

$2 = (21)-’ 2 2; = 0,0010. i s 1

The inverse of the matrix used in computing the estimates is

14.73 -1.32 -5.37 (-1.32 0.23 0.21),

-5.37 0.21 2.62

aud it follows that the estimated standard errors of the estimates are (0.121), (0.015), and (0.051), respectively. AA

5.7. ESTIMATED GENERALIZED LEAST SQUARES

In this section, we investigate estimation for models in which the covariance matrix of the error is estimated. Our treatment is restricted to linear models, but many of the results extend to nonlinear models.

Consider the linear model

Y=X#f+u, (5.7.1)

whereY i s a n n x l v e c t o r , X i s a n n X k m a t r i x , a n d # f i s t h e k X l vectorof unknown parameters. We assume

mu, uu’) t XI = (0, V,,) 7 (5.7.2)

where V,, is positive definite. In many of our applications, u’ = (u,, u2, . . . , u,) will be a portion of a realization of a time series. For example, the time series may be a pth order autoregressive process.

Page 303: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

280 SOME LARGE SAMPLE THEORY

For known V,,, the generalized least squares estimator of f l is

j3 = (x'v,;'x)-'X'v,;'Y, (5.7.3)

where we have assumed V,, and X'Vi:X to be nonsingular. The conditional variance of the estimator is

V(j3 I X) = (x'v;;x)-' . (5.7,4)

Often V,, is not known. We are interested in the situation where an estimator of V,,, denoted by ?,,, is used to construct an estimator of fl. We define the estimated generalid least squares estimator by

@ = (x'6,-,'x)-'x'~~,'Y, (5.7.5)

where we assume ?,u and X'6iU'X are nonsjngular. The estimated generalized least squares estimator of f l is consistent for f l under mild conditions. In the theorem, we use a general nonnalizer matrix, Mi. A natural choice for Mi is G:" = (X'V,;'X)'/*.

Theorem 5.7.1. Let the model (5.7.1) and (5.7.2) hold. Assume there exists a sequence of estimators ?,, and a sequence of nonsingular matrices {M,} such that

M,'X'V,;'u = O J l ) , (5.7.7)

M,'X'(?;,' -V;,')U= O,(&), (5.7.9)

where &+ 0 as n 00. Then

Proop. By the assumption (5.7.8),

(5.7.10)

where

(G,,, a,,) = (X'V;u'X, X'?,;'X). (5.7.1 1)

Page 304: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATED GENFJRALEED LEAST SQUARES 281

Therefore

M;(@ - 8) =MA(&;' - G;~)M,M;~xv,;'u

+- M$;'M,M;~x~(~;,' -v;,')a = OP(5,) * A

In working with models such as (5.7.1)-(5.7.2). it is often assumed that

P M i 'X'V,;"M,'' + A,, (5.7.12)

where A, is a fixed positive definite matrix. This assumption is sufficient for the condition (5.7.6). If there exists a sequence (M,} that satisfies the assumptions of Theorem 5.7.1, then any nonsingular H, such that G, = H,HA also will satisfy the assumptions because (5.7.6) implies that Hi'M, = OP(l).

Theorem 5.7.1 is given for a general normalizing matrix M,. If M, is chosen to be G:", where G, is defined in (5.7.11), then (5.7.6) and (5.7.7) follow directly.

Under the assumptions of Theorem 5.7.1, the limiting distribution of the normalized estimated generalized least squares estimator is the same as the limiting distribution of the normalized generalized least squares estimator con- structed with known V,,, provided the limiting distribution exists. Note that the estimator V,, can converge to v,, rather slowly.

Corollary 5.7.1.1. Let the assumptions (5.7.10)-(5.7.12) of Theorem 5.7.2 hold. In addition assume

M;'G,M;'~--~~A,, (5.7.13)

where A, is a fixed positive definite matrix, and

ML(@ - /3)2 N(0, A;'). (5.7.14)

Then

I M;(B - MO, A;') I (5.7.15)

and

where G, is defined in (5.7.11) and ,yz(k) is a chi-square random variable with k degrees of freedom.

Roof. The assumption (5.7.13) implies the assumption (5.7.6) of Theorem 5.7.1. Hence, (5.7.15) follows from the variance of the generalized least squares

Page 305: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

202 SOME LARGE SAMPLE THEORY

estimator and (5.7.14). Also, (5.7.16) follows from

where we have used (5.7.13), (5.7.4), and the assumption (5.7.8). A

The conditions (5.7.6) and (5.7.7) of Theorem 5.7.1 hold if Mi is chosen equal to Gf”. However, in practice, one wishes a matrix ML that is a function of the data. In some situations, there is a known transfotmation such that the estimated parameters of the transformed model can be normalized with a diagonal matrix. The transformation can be a function of n. In Corollary 5.7.1.2, we demonstrate how the estimator in such a case can be normalized using sample statistics.

corollary 5.7.13. L,et the assumptions of Theorem 5.7.1 hold with M = D:”, where D,, = diagw’V,-,’X}. Also assume (5.7.14) holds with M, = D:”. Then

a:“(@ - B ) O ’ N(0, A;’),

where 8, = diag{X’viu1X}7 and

(5.7.17)

112x,0,1 xfi, 1 I 2 Where d = 6,

Proop. By assumption (5.7.8) of Theorem 5.7.1,

Given the Conditions of Theorem 5.7.1 and Corollary 5.7.1.1, the estimated variance of the estimated generalid least squares estimator can be used to construct pivotal statistics. We state the result in Corollary 5.7.1.3.

Page 306: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTlMATED G E N E R A L W LEAST SQUARES 283

Corollary 5.7.13. Let the assumptions of Theorem 5.7.1 and Corollary 5.7.1.1 hold, where {M,} is a sequence of fixed matrices. Then

6;; A:(B - /?)A N(0, I),

where that can depend on II.

= A$,'A,,, 4, is defined in (5.7.12)' and A, is a fixed nonzero vector

Proof. Under the assumptions, we show that

- I I - unA An(# - /?)z N(0, 1) *

Where

By Corollary 5.7.1.1, we have

By Skorohod's theorem [see Theorem 2.9.6 of Billingsley (197911, there exist random vectors Z, and Z on a common probability space such that 2, has the same distribution as L,, Z - N(0, A;'), and lim Z, = Z as. Therefore,

u,;'A~;''z, = U ; ; A ~ ; " Z + c;;A;M;"(z, - z).

Since t7;;A;M;"A;'Mi'An = 1 , we have

and

where g (A,) is the maximum eigenvalue of A,. Therefore, u;; A#f,"Z, and hence uiA AM;"Ln converges in distribution to the standard n o d distribution. See Sanger (1992, Section 7.1.3).

From the assumption (5.7.8) of Theorem 5.7.1 and the assumption (5.7.13) of Corollary 5.7.1.1,

"Y

M;%,M;" = M;'C,M;'' + up(s,) = A0 + op( l ) ,

Page 307: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

284 SOME LARGE SAMPLE TweoRY

- - o#) 7

where S,,,,, is the maximum element of 4. Therefore &,; qA converges to one in A probability, and the result follows.

Under the conditions of Theorem 5.7.1, the asymptotic distribution of the estimated generalized least squares estimator is the same as that of the generalized least squares estimator, provided the limiting distribution of the generalized least squares estimator exists. The following theorem gives conditions such that the generalized least squares estimator is asymptotically n o d . To obtain a limiting n o d distribution, the transformed errors must satisfy a central limit theorem. In Theorem 5.7.2, we assume the transformed errors to be independent. For example, if {u,} is an autoregressive process with independent increments, then the assumption is satisfied.

Theorem 5.7.2. Assume the model (5.7.1) and (5.7.2) with fixed X. Assume there exists a sequence of fixed, nonsingular matrices {M,,} such that

lim M,'X'V;:XM,'' =A,, , (5.7.19)

where A,, is positive definite. Assume there exists a sequence of nonsingular transformations pn} such that the elements of e = T,a are independent with zero expectation, variance one, and

n-m

E{Ietn12+'I < K for some 6 > 0 and finite K. Also assume that

lim sup [(M;'x'T:),( = 0. n--)m I C j s n

1ei.Gp

Then

M : ( B - m S N(O,A,') 7

where & is the generalized least squares estimator defined in (5.7.3).

(5.7.20)

Page 308: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATED GWERALIZED LEAST SQUARES 285

Proof. Consider the linear combination n

A'Ml'X'Tie = 2 c,"e,, , r - 1

Where

c, = I'M, 'X'T:,, ,

A is an arbitrary K-dimensionai vector with 0 < IAI < a, and T,,,, is the tth row of T,,. By construction, the variance of X:-l c,,e,,, denoted by V,, is

n n

Hence, V, Z:, is bounded. By the assumption (5.7.19), n

fim 2 c:,, =!& A'M;~X'T;T,XM;''A m+m ,- 1

= A'A,A > 0.

Let hi, be the ijth element of M;'x'T;. n u s ,

which implies that

By the assumption (5.7.20) and (5.7.21),

lim v,' sup c,, 2 = 0 . 16r6n n - w

Hence, by Corollary

Since A is arbitrary,

5.3.4, n

P v,"~ C c,,,e,,+ N(O, 1). f - 1

by the assumption (5.7.19) we have

M,'X'Vi:u< N(0, A,),

(5.7.21)

and we obtain the conclusion. A

To this point, no structure has been imposed on V,, other than that it must be

Page 309: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

286 SOME LAROE SAMPL.E THEORY

positive definite. The number of unknown parameters in VuM could conceivably grow with the sample size n. A model of practical importance is

E{u IX} = 0

E{UU' I x} =v,, =vu,(eo), (5.7.22)

where 8 is an 1 X 1 vector of unknown parameters, 1 is fixed, and 8' is the true value. The parameter space for 8 is 8. It is assumed that the form of the function V,,(@) is known and that V,,(8) is a continuous function of t l If, for exampk, the time series is known to be a pth order autoregressive process, the vector 8 will contain the parameters of the autoregressive process. We are interested in the situation in which an estimator of 4 denoted by 4, is used to construct an estimator of V,,(e), denoted by g,, =V,,(d).

The following theorem gives sufficient conditions for the estimated generalized least squares estimator to have the same asymptotic distribution as the generalized least squares estimator for general n d i z i n g matrices under the model (5.7.1) and (5.7.22).

Theorem 5.73. Let the model (5.7.1) and (5.7.22) hold. Let 8 = (el, e,, . . . ,q)' and let

be continuous in 8. Let B be defined by (5.7.5). Assume there exists a sequence of nonsingular matrices {M,} such that

M;G,'M,, = OJI), (5.7.23)

M,'X'V;,'(B')u = Op(l) , (5.7.24)

- 1 0 where G, = X'V,, (6 )X. Also assume

M,'X'Bn,(B)XM,'' = 0,,(1), (5.7.25)

M,'X'BRi(t9)u = Op( 1). (5.7.26)

for i = 1,2,. . . , I, uniformly in an o n neighborhood of 8', denoted by C(8'). Let an estimator of O0, denoted by rbe available, and assume

6- eo = o,(h,,), (5.7.27)

where [,+O as n+w. Then

Page 310: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATED GENERALIZED LEAST SQUARES 287

Proof* By a TayIor expansion

* where 8' is the true value of fl and Bni(t0, d) is a matrix whose j s element is the j s element of Bni(e) evaluated at a poiat on the line segment joining d and 0'. Let the probability space be (a, d, P). Let E > 0 be given. Choose S such that {e: }eo - el < S) c c(eo), where for any vector v, 1.1' = v'v. Since B converges in probability to 8', there exists an NI and a set D,," E d such that P(D1,,) > 1 - el2 and

forj=1,2 ,..., n, fors=l ,2 ,..., n,foralln>N,,andforallw€D,.,.Bythe assumption (5.7.25), there exists a K, an N2, and a set Dz,, E d with P(D,,,)> 1 - €12 such that '

for all R r N,, all w E D2,,. and all 8 E C(@'), where 1141 = [tr(H'H)J"2. Let D,, = D1,, n D,,,, and observe that P(DJ > 1 - c. Let N = max(N,, N2). Therefore, foralln>Nandforall W E D , ,

Hence,

which implies the assumption (5.7.28) of Theorem 5.7.1. By a similar expansion,

M;'X'V;,'(b)u = M,'X'V;,'(6°)u I

+ 2 M;'X'Bn,(Oo, d)u(q - ep), i-1

* where Bni(O0. 8) is analogous to Bnr(8', 8). Using the assumption (5.7.26) and an argument similar to the one used to show (5.7.28).

M;lX'~,i(eo, $)u = O,,( 1).

Thus the assumption (5.7.9) of Theorem 5.7.1 is satisfied. Because the assump-

Page 311: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

288 SOME LARGE SAMPLE THEORY

tions (5.7.23) and (5.7.24) are the same as the assumptions (5.7.6) and (5.7.7) of Theorem 5.7.1, the result follows. A

A sufficient condition for (5.7.23) is

M;~G,M;I~ 5 A,, (5.7.29)

where A, is a fixed positive definite matrix, and a sufficient condition for (5.7.25) is

M;~x'B,,(B"M,''~A,(~), i = 1,2,. . . , I ,

uniformly in a neighborhood of 8' as n 3 00, where the A,@) are continuous in 8. Under the assumptions of Theorern 5.7.3, the limiting distribution of the

normalid estimated generalized least squares estimator is the samc as the limiting distribution of the normalized generalized least squares estimator con- structed with known V,,, provided the limiting distribution exists. Note that the estimator of 6 can converge to 8' at a relatively slow rate.

A common procedure is to estimate f l by the ordinary least s q u m estimator,

/# = (x'x)-Ix'Y, (5.7.30)

compute residuals

ii=Y-XB, (5.7.31)

and use these residuals to estimate 8. Then the estimator of 6 is used in (5.75) to estimate fl. In mder for this procedure to be effective, the estimator of 8 based on U must be a consistent estimator of 6', and this, in turn, requires 8 of (5.7.30) to be a consistent estimator of p. The following lemma gives sufficient conditions for the estimator of 6' based on the ordinary least squares residuals to be consistent.

Lemma 5.7.1. Consider the model (5.7.1). Let d = &u) be an estimator of 6' based upon the true u, and let

* u = [I - cx(x'X)-'x']u, where 0 6 c G 1. Assume that &u) is a continuous function of u with a continuous first derivative, and that

- 6, = 0,Cf"). (5.7.32)

where f' 4 0 as n 3 00. Also assume

(5.7.33)

Page 312: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATED GENERALIZED LEAST SQUARES 289

forj = 1,2, . . . , l, uniformly in c for 0 d c d 1, where is defined in (5.7.30). Let 41) be an estimator of 8 constructed with the reaiduaI.9 defined in (5.7.31). Then

&U) = &u) + op(tn) = eo + o,,(~,,). Proof. By a Taylor expansion about u, we have

(5.7.34)

where 6 is on the line segment ‘oinin u and u, X i is the ith row of X, and ii, - u, = - X i @ - /3). Because u is of the form given in the assumptions, it d g follows from (5.7.33) that

&ii) = &I) + up(sn) = 8; + o,<f). A

Given that 4 is a consistent estimator of 8’, the difference between the estimator of I3 based upon 8 and the estimator of /? based on 8’ converges to zero.

Theorem 5.7.4. Let the assumptions (5.7.23)-(5.7.26) of Theorem 5.7.3 hold. In addition, assume (5.7.32),

M;(x’x)-’M, = op(i), (5.7.35)

E(M~’X’V,,(B’)XM~”) = 0(1), (5.7.36)

and

(5.7.37)

uniformly in c for O S c S 1 and j = 1,2,. . . ,1, where X, is the ith row of X, and 6 is defined in Lemma 5.7.1. Then

~ : r & &tz )) - 131 = w W 0 ) - 131 + oPc 6,) ,

(5.7.38)

Page 313: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

230

then

SOME LARGE SAMPLE THEORY

Proof. Under the assumptions (5.7.36) and (5.7.22),

E { E ~ ~ ' X ' U U ' X M ~ ' ' IXJ} = E(M~'X'V,,(t9°)XM~''} = ql),

which implies that Mi'X'u = Op(l) and

ML(@ - @) = M~(X'X)-'M,,M,'X'U = Op( 1) . (5.7.40)

By (5.7.40) and the assumption (5.7.37).

(5.7.4 1 )

By (5.7.41). the assumption (5.7.32), and Lemma 5.7.1, we have &ii) = e0+ Op(&), and the assumption (5.7.27) of Theorem 5.7.3 holds. Also, the other assumptions of Theorem 5.7.3 hold for &ti). Therefore,

~i[B&ii)) - BI =M:[A~') - BI + op(Sn)

and (5.7.39) follows from the assumption (5.7.38). A

Under the conditions of Theorem 5.7.4, the procedure of using the ordinary least squares residuals to estimate the covariance matrix and then using the estimated covariance matrix in the estimated generalized least squares estimator produces an estimator of /3 with the same large sample properties as the estimator constmcted with known 8. Also

can be used as an estimator of the covariance matrix of the approximate. distribution of @.

5.8. SEQUENCEs OF ROOTS OF P0LYNOMIAI.S

In Chapter 2, we saw that the roots of the characteristic polynomial were important in defining the behavior of autoregressive and autoregressive moving average time series. Hence, the relationships between the coefficients of a polynomial and the

Page 314: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SEQUENCES OF ROOTS W POLYNOMIALS 291

roots are of interest. We write the pth order polynomial as

g(m) = mp + ulmp- ‘ + - - - + ap = (m - rn,)(m - mz) * * - (m - mp) , (5.8.1)

where the mi, i = 1,2,. . . , p, are the roots of g(m) = 0. Let a], f i2 , . . . , tii9 denote the distinct roots with multiplicities r,, rz, . . . , r9, where Z;=]ri =p. If ri = 1, we say that rii, = m, is a simple root. The coefficients can be expressed in terms of the roots by

P P P

a l = - Z m j , a 2 = C C, m p j , i = I j = 1 + 1

(5.8.2) I-! P P P P

up = (-1)’ n m, . i s I

a 3 = - C X G m p y n , , . q . , i = l ]-i+l !==I+,

Thus, the coefficients are continuous differentiable functions of the mots. The functions (5.8.2) define a mapping from a p-dimensional space to a pdimensional space. The differentials are

P

P P

J - 1 I - I da, = -a,

ah3 = -a, 2 ahj -a , 2 mj dm, -

dmj - 2 m, ah, , P P P

m:dmj, I- 1 j = 1 ]=I

P P P

dap = -ap-] C dml - ap-2 2 mi dmj - - - - - m;-’ ahl . j= I I- 1 j= I

Thus

d a = B H d m ,

where da = (da,, da,, . . , , hP)’, dm = (dm,, ah,, . , . , ahp)’,

B =

0 0 0

0 ... 0 ...

-1 ... -1 0 -a, -1 -a2 -a1

-1 -ap-, -apv3 ...

(5.8.3)

(53.4)

(5.8.5)

Page 315: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

292

and

SOME LARGE SAMPLE THEORY

H=

m P - l mP- l ... m P - l 1 2 P

(5.8.6)

The matrix H is nonsingular if all mots are simple, Because B is nonsingular, it is possible to solve for the differentials of m, as functions of the differentials of aj when al l toots are simple.

If the coefficients of the polynomial are real, the complex roots are complex conjugate pairs. In that case, we can use a transfomtion to define real quantities

c, = m , + m 2 , (5.8.7)

(5.8.8) 2 c2 = (m, -m2)2

for every complex conjugate pair. The inverse transformation is

m, = OS(c, + c2),

m2 = O.S(c, - c2), (5.8.9)

where we adopt the convention that m, is defined with the positive coefficient on the radical and m2 is defined with the negative coefficient. The inverse transforma- tion has continuous derivatives except at c2 = 0, the point where m, = m2. The derivatives of the coefficients with respect to c, and with respect to ci are derivatives of real valued functions with respect to real valued arguments. From (5.8.6) we see that repeated r o d s will produce a singularity in the

Jacobian of the transformation. However, even if there are repeated roots, the mots of a polynomial are continuous functions of the coefficients of the polynomial.

Lenuna 5.8.1. Let g(m) of (5.8.1) have distinct roots rill, m2,. . . ,aq with multiplicities r , , r2 , . . . , rq, where a,.u2, . . . ,up are complex numbers, and q-, rj = p . Let

G(m) = mp + (u, + 6,)m p-' 4- - * - + (up + ap>,

where a,, . , . , Sp are complex numbers. For i = 1.2,. . . , q, let ei be an arbitrary real number satisfying

O € Cmin laj - & , I .

Page 316: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SEQUENCES OF ROOTS OF POLYNOMIALS 293

Then there exists a S > 0 such that, if 141 C S for i = 1,2,. . . , p, then G(m) has precisely ri mots in the circle 4 with centex at mi and radius 9, for i = 1,2, . . . , q.

Proof. Omitted. See Marden (1966, p. 3). A

To extend the results of Lemma 5.8.1 to sequences of polynomials, let

g,(m) = mp + al,mP-' + - - - + up" , (5.8.10)

where uinr i = 1,2,. . . , p, are sequences of reaI numbers such that a,, +ui, i = 1,2,. . . , p, as n+W, where the a, are the coefficients of (5.8.1). Let rft,,d,, . . . ,aq be the distinct roots of (5.8.1) with multiplicities r, , r z , . . . , rq, and let E be the arbitrary real number defined in Lemma 5.8.1. Then, by Lemma 5.8.1, there exists an N such that, for n bN, g,(m) has precisely ri mots in the sphere 4i with center rii, and radius E, for i = 1,. . . , q.

We now investigate the special case where a root has multiplicity two. To study the relation between the Coefficients and the roots in the neighborhood of m , = m,, we use the transformation introduced in (5.8.7) and (5.8.8) and assume m3, m4,, . . , mp are simple rods. Then

P

a, = -cl - 2 m, , j - 3

P P P

a , = m , m , + c 1 2 m , + 2 2 "inJ 1-3 j = 3 i = j + I

D D D

= 0.25(c: - cf) + c , m, + c c m,mj , j = 3 J = 3 i = j + l

(5.8.11)

0 R D

P

up = 0,25(c; - c:)(-l)' n mi. 1-3

We observe that the partial derivatives of the coefficients with respect to cI evaluated at mi = m,, am equal to the partial derivatives of the functions (5.8.2) with respect to m,. Also, c2 enters the expressions for the coefficients only as a square. In the proof of Lemma 5.8.2, we show that c,, cj, m3, . . . , mp, where c3 = ci and m3, . . . , mp are simple roots, can be expressed as locally continuous differentiable functions of the coefficients. Thus, Lemma 5.8.2 extends the result of Lemma 5.8.1 to the case in which some of the roots are of multiplicity two. The

Page 317: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

m SOME LAROB SAMPLE THEORY

bound on the difference between two equal roots is the square root of the bound on the error in a simple root.

Lemma 5.8.2. Let a, = (u,,,~,, , . . . ,up,), n = 1,2,. . . , be a sequence of real vectors such that

where M , + O as n+a and a=(u,,u,, . . .,a,,) is the vector of real coefficients of (5.8.1). Let mi,, i = 1,2,. . . , p, be the roots of

and let m,, i = 1,2,. . . , p , be the roots of (5.8.1). Assume the roots of (5.8.1) are either simple roots or roots of multiplicity two. If mi is a simple root of (5.8.1),

for some K < a , where it is understood that for n sufficiently large, K can be chosen such that there is exactly one mi,, satisfying the mequality for each simple mi.

If m , =m, ,

lmln + m z n - I KMn 9

Imln -m,12 KM,

for some K < 00, where, for n sufficiently large, K can be chosen such that exactly one pair of roots satisfies the inequalities.

Proof. Let the roots, denoted by m = (m, , m,, . . . , mP), be arranged so that the repeated mts OCCUT as pairs in the first part of the vector. Let (m, , m,), (m3.m4), . . . , (m2k-1 , m2)) be the pairs of equal roots. Let

and

da=BH,du, (5.8.12)

Page 318: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SEQuENces OF ROOTS OP POLYNOMIALS 295

B is defined in (5.8.3, H is defined in (5.8.6). J is the Jacobian of the transformation of u into m, and

0.5 0 . 2 5 ~ , " ~ 3, = ( 0.5 - 0 . 2 5 ~ ~ ~ _,,,).

The matrix H is a Vandermonde matrix with determinant

IH/= n (rnj--rni), 16i<j6p

and the determinant of H, is

The determinant lHil is not zero at mZi- I = mZi because every zero product in the determinant of H is removed by the transformation. Hence, the differentials of u

A am locally continuous differentiable functions of a, and the results follow.

It is important that the bound given for the difference in Lemma 5.8.2 can be achieved. Consider, for example, the sequence of polynomials

g,(m) = m2 - (2 + 2n-I)m + 1 + n-' + n-' .

The roots of g,(m)=O are 1 +n-* +n-'I2 and 1 +n-' Thus, the difference between the two roots is O(n-1'2). The conclusion of Lemma 5.8.2 can also be obtained from the following theorem.

Theorem 5.8.1. Let a,, = (a,,,, ab,. . . * up,,), n = 1,2,. . . , be a sequence of (possibly complex valued) vectors. and let a = (a I, a,, . . . * up) be the vector of (possibly complex valued) coefficients of (5.8.1). Assume that

la, - al = o(%) * (5.8.13)

where K, + O as n+w. Let *',a2,. . . , rii denote the distinct roots of g(m) = O with multiplicities r , , r,, . . . , rq9 where X i = ,rl = p . Let Q be an arbitrary real number satisfying

8

O<r<0.5 min lar - r f i j l , 2 6 / 9

and let r, = s. Then there exists an N such that if n 2 N, the equation

mp + a,,lmP-l + - - + a,,,, = o (5.8.14)

Page 319: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

2% SOME LARGE SAMPLE THEORY

has exactly s roots in the circle t#l with center 1, and radius E and

(5.8.15)

i Z j

where lei = m,, - MI, i = 1,2,. . . , s, in which mni. i = 1,2,, , . , s, are the s mots in t#l.

Proof. That s roots will fall in when n 3 N follows from Lemma 5.8.1. Therefore, we only demonstrate the order results.

If rii, # 0, we can replace m in (5.8.1) with rn - fi1 to obtain a polynomial with s zero roots. Therefore, without loss of generality, we assume m, = 0. Then

and

The polynomial (5.8.1) can be written as

where

Page 320: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SEQUENCES OF ROOTS OF POLYNOMIALS 297

(5.8.17)

Because the ani, i = 1,2,. . . , p, converge, the b,, j = 1,2,. , . , p - s, also converge, and we write

Since the multiplicity of tii, = 0 is s, we have bp-s # 0. Thus, for example,

from the first equation of (5.8.17). A

Corollary 5.8.1.1. If the assumption (5.8.13) is replaced with

la, - 4 = op(K,)

then O(K,) of (5.8.15) is replaced with Op(x,).

Pmof. omitted. A

Corollary 5.8.1.2. Let the assumptions of Theorem 5.8.1 hold with s = 2. Then, for n a N,

where m,, and mn2 are the two mots in c#+.

proof. We have, for m,, and mn2 in 4,.

by Lemma 5.8.1 and by Theorem 5.8.1. A

We now give a result on the rwts and vectors of square matrices. Let A be any p X p matrix with possibly complex elements. "he roots of the determinantal

Page 321: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

298

equation

SOME LARGE SAMPLE THEORY

1A - mil= 0 (5.8.18)

are called the characteristic roots of A, the eigenvalues of A, the eigenroots of A, or simpry the roots of A. The vector u, that satisfies the equation

(A - rn,I)u, = 0, (5.8.19)

where m, is a root of (5.8.18), is called a characteristic vector of A or an eigenvector of A. The following theorem, taken from Magnus and Neudecker (1988). demonstrates that the simple roots of A are continuous differentiable functions of the elements of A. The theorem gives the differentials of the roots and of the vectors.

Theorem 5.83. Let m, be a simple root of (5.8.18) for A, , where A, is any (possibly complex) p X p matrix. Let u, be the associated eigenvector. Then a complex valued function m(A) and a complex vector valued function u(A) are defined for all A in some neighborhood S of A , , such that

m(A,) = m, W,) = u, (5.8.20)

Au(A) = m(A)u(A), and n:(A)u(A) = 1 (5.8.21)

for all A in S, where u:(A) is the complex conjugate of u,(A). Furthennore, the functions are differentiable any number of times on S, and the differentials are

dm = ( v ~ ~ , ) - ~ v ~ ( d A ) u , (5.8.22)

and

du = (A, - rn,I)+[I - (v~u,)-'u,v~](dA)u, , (5.8.23)

where Bt is the Moore-Penrose generalized inverse of B, v, is the eigenvector associated with the eigenvalue my of A:, satisfying

A ~ v , =mTvI,

and B* is the complex conjugate of B.

Proof. Omitted. See Magnus and Neudecker (1988, p. 161). A

If A, is a real symmetric matrix, then the roots are real, and the expressions (5.8.20) and (5.8.21) simplify to

dm = u;(dA)u, , du = (A, - rn,I)t(dA)u, .

(5.8.24)

(5.8.25)

Page 322: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 299

It follows from Theorem 5.8.1 that if the coefficients of a matrix are converging to limiting values at the rate M,, then a simple root rn, is converging to a limiting value at the same me.

The results for the roots of a matrix are applicable to the roots of a polynomial written in the form (5.8.1). Let the p X p matrix A be defined by

-a, -a, -a3 * * * -ap-, t q I 0 I 0 . * - 0 0 1 0 0 ... 0

A = I I :

; I ’ (5.8.26)

I 0 0 0 . - . 1 O J where the ai are the coefficients of (5.8.1). Then the roots of the determinantal equation

IA - rnII = 0 (5.8.27)

are the same as the mts of (5.8.1).

REFERENCES

Section 5.1. Chernoff (1956). Chung (1%8). Mann and Wald (1943b), Pratt (1959). Tucker (1967).

Seetiom 5A5.3. Billingsley (1%8), Brown (1971), Chan and Wei (1988), Chung (1968). Cram& (1946), Dvorctsky (19721, Gnedenko (1967), Hall and Heyde (1980). Loeve (1%3), Pollard (1984). Rao (1973), Scott (1973), Tucker (1%7), Varadarajan (1958). Wilks (1962).

sec#on 5.4. Cradr (lpQ6). DeGracie and Fuller (19721, Hansen, Hurwitz, and Madow (1953). Hatanaka (1973).

Section 5.5. Andrews (19871, Gallant (1971, 1987). Gallant and White (1988). Gumpertz and Pantula (1992), Hartley (1961). Hartley and Booker (1965), Jennrich (1969), Klimko and Nelson (1978). Malinvaud (197Ob). Potscher and Prucha (1991a.b). Sarkar (1990), Wald (1949), Wu (1981).

W o n 5.6. Basmann (1957), Fuller (1987), Sargan (1958, 1959). Theil (1961). Section 5.7. Carroll and Ruppert (19881, Jobson and Fuller (1980). Rothenberg (1984).

Seetian 5.8. Amemiya (199O), Magnus and Neudecker (1988), Marden (1966). Sanger (1992). Toyooka (1982, 1990), Van Dcr Genugten (1983).

1. Let {a,} and {b,} be sequences of real numbers, and let urn} and {g,) be and b,, = O(g,,). sequences of positive real numbers such that Q, =

Show that (a) la,/ = OW:), s > 0.

Page 323: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

300 SOME LARGE SAMPLE THEORY

2. Let Cf,,}, {g,,}, and {r,,} be sequences of positive real numbers, and let {X,,}, {Y,,), and (2,) be sequences of random variables such that X,,=O,(f.), Y,, = Op(gn), and 2, = op(r,,). Without recourse to Theorems 5.1.5 and 5.1.6, show that: (a) Ix,,lJ = o,(f."), s > 0.

(c) x,, + Y,, = ~,(max(f , . 8,)). (b) X n u n = Op(f,gn).

(d) Xnzm = opcf,',).

(e) If g,,/r,,=o(l), then Y,,+Z,,=o,(r,,).

3. Let X,, . . . , X, and Y,, . , . , Y,, be two independent random samples, each of size n. Let the X, be N(0,4) and the yi be N ( 2 , 9 ) . Find the order in probability of the following statistics. (a) X,,. (d) r'f . (b) F,,. (e) X,,F,,.

Find the most meaningful expression (i,e., the smallest quantity for which the order in probability statement is true).

(c) x;. ( f ) (X,, + 1).

4. h v e that plim,,,, 8 = 8' implies that there exists a sequence of positive real numbers {a,,} such that lim,,+ a,, = 0 and 8 - 6' = O,(a,).

5. Let (a,} be a sequence of constants satisfling ZL, la,l< 00; also let {X,,,: t = 1,2, . , . , n; n = 1,2,. , .} be a triangular array of random variables such that E{X,,} = O(b:), t = 1,2, . . . , n, where limn+, b,, = 0. Prove

6. Let Z,, be distributed as a normal (p, a 2 / n ) random variable with p f 0, and define Y,, = 3; - f," e X p d Y,, in a Taylor's series through terms of O,(n-'). Find the expectation of these terms.

7. Let F, , F2,. , . , FM be a finite collection of distribution functions with finite variances and common mean p. A sequence of random variables {X,: t E (I , 2, . . .)) is created by randomly choosing, for each t, one of the distrihution functions and then making a random selection from the chosen distribution. Show that, as II +=,

Page 324: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 301

8.

9.

10.

11.

where a: is the variance of the distribution chosen for index t. Does

hf

n ” ’ ( M - ’ c . j - I .;)-*’2(n-’ I= i X + h ( O , I I ) ,

where a; is the variance of the jth distribution, j = 1,2, . . . , M? Let X, be normal independent (p, a’) random variables, and let 2, = n-’ E:= x,. (a) It is known that p # 0. How would you approximate the distribution of Xlf

(b) It is known that p = 0. How would you approximate the distribution of Xlf

Explain and justify your answers, giving the normalizing constants necessary to produce nondegenerate limiting distributions.

in large samples?

in large samples?

Let f,, be distributed as a normal (p, a 2 / n ) random variable, p > 0. (a) Find E{Y,} to order (1 /n), where

Y, = sgnf,, ~,,[”’

and sgn X,, denotes the sign of 3,. (b) Find E{(Y, - pi’*)2) to order (lh). (c) What is the limiting distribution of nl’*(Y, - pi”)?

Let f, be distributed as a normal (p, a 2 / n ) random variable, p # 0, and define

Mn z, = 2 ‘ &+a

Show that E{Xi ’} is not defined but hat E{Z,} is defined. Show further that Z, satisfies the conditions of Theorem 5.4.3, and find the expectation of 2, through terms of O( lh) .

Let (X, Y)‘ be distributed as a bivariate normal random variable with mean (n, h)’ and covariance matrix

Given a sample of size n from this bivariate population, derive an estimator for the product ~4 that is unbiased to q l l n ) .

Page 325: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

302

12. Assume the model

SOME LARGE SAMPLE THEORY

U, = a + h-”~ + u, ,

where the u, are distributed as n o d independent (0, u2) random variables. We have available the following data:

t u, XI t u, XI

1 47.3 0.0 7 136.5 2.0 2 87.0 0.4 8 132.0 4.0 3 120.1 0.8 9 68.8 0.0 4 130.4 1.6 10 138.1 1.5 5 58.8 0.0 11 145.7 3.0 6 111.9 1 .o 12 143.0 5.9

where Y is yield of corn and x is applied nitrogen. Given the initial values & = 143, 1 = -85, = 1.20. carry out two iterations of the Gauss-Newton procedure. Using the estimates obtained at the second iteration, estimate the covariance matrix of the estimator.

13. In the illustration of Section 5.5 only the coefficient of x,, was used in constructing an initial estimate of 0,. Identifying the original equation as

U, = so + elXli + a’%,* + e, ,

construct an estimator for the covariance matrix of ($,8,, &), where the coefficients 80, 8,, and L2 are the ordinary least squares estimates. Using this covariance matrix, find the A that minimizes the estimated variance of

Ad, + ( I -A)&

as an estimator of 6,. Use the estimated linear combination as an initial estimate in the Gauss-Newton procedure.

14. Assuming that the e, are normal independent (0, cr2) random variables, obtain the likelihood function associated with the model (5.5.1). By evaluating the expectations of the second partiat derivatives of the likelihood function with respect to the parameters, demonstrate that the asymptotic covariance matrix of the maximum likelihood estimator is the same as that given in Theorem 5.5.1.

15. Let U, satisfy the model

U, = e,, + 0, e-’ZL + el , t = I, 2, . . . ,

Page 326: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 303

where 62 > 0 and e, is a sequence of n o d independent (0, a2) mdom variables. Does this model satisfy assumptions 1 and 2 of Theorem 5.4.4? Would the model with t in the exponent replaced by x,, where {x,}= (1,2,3,4,1,2,3,4,. . .}, satisfy the three assumptions?

16. An experiment is conducted to study the relationship between the phosphate content of the leaf and the yield of grain for the corn plant. In the experiment different levels of phosphate fertilizer were applied to the soil of 20 experimental plots. The leaf phosphate and the grain yield of the corn were recorded for each plot. Denoting yield by Y, applied phosphate by A, and leaf phosphate by P, the sums of squares and cross products matrix for A, P, and Y were computed as

Z A 2 Z A P 2 A Y 69,600 16,120 3,948

( Z Y A XYP 3,948 1,491 739 Z P A ZP2 X P Y 16,120 8,519 1,491

The model

yr= /3P1+u , , t = l , 2 ,*.., 20,

where the u, are normal independent (0, cr2) random variables, is postulated. Estimate /3 by the method of instrumental variables, using A , as the instrumental variable. Estimate the standard error of your coefficients. Compare your estimate with the least squares estimafe.

17. Show that if X, = O,(u,) and a, = o(b,), then X, = o,,(b,).

18. Prove the following corollary to Theorem 5.5.1.

Corollary 55.1. Let the assumptions of Theorem 5.5.1 hold. Also assume

E{(e; - a2)21d,,_,} = K

and

Then

n112 2 - (s N(0. K) . 19. Let s2 be defined by (5.5.57). Show that

s2 = (n - k)-'e'{I - ~(e~)i[~'(e~)~(e~)1-'F(e~)}e + 0,(max{u,3, n-'''u:})

under the assumptions of Theorem 5.5.4.

Page 327: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

304 SOME LARGE SAMPLE THEORY

u). Show that if the assumptions (5.7.6) through (5.7.9) hold and if M i = op( l), then converges to fl in probability, where /# is defined in (5.7.5).

21. Let X,, t = 1,2,. . . , n = 1,2,. . . , be a triangular array of random variables. Let {gf}:=, and {h,},"=, be sequences of positive real numbers. Show that if

E{ktnI} = glh, , t = 1,2,. . . , n , n = 1,2,. . . , then

Show that X,, = O,(g,h,) does not imply that

e ze 22. Let the model (5.7.1)-(5.7.2) hold with XI = t and V,, = diag{e , e , . . . , en'} for e E 8 = (1, a). Let M, = G:". Suppose an estimator of eo, the true vAe, is available such that 8- eo = O,(n-"'). (a) Show that

does not go to zero in probability, where lbIl2 is the square root of the largest eigenvalue of A'A.

(b) Show that the assumption (5.7.8) holds for this model.

23. Let

and let

&) = a, + a,z + a * f agk = 0

have distinct roots Z,, &,. . . , ,Tq with multiplicities r l , r,, . . . , rq. Let

ain - a, = o,(n+)

for i = 1.2,. . . ,k and some (>O. Prove that, given a>O, there exists an N such that the probability is greater than 1 - 6 that exactly r, of the roots of ~ ( z ) are within Men-' of zi for i = 1.2,. . . , q.

Page 328: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 305

The variance of f i conditional on x,, t = 1.2,. . . , n, is

vcb I (X, , . . . , XJ} = (i: 1 = 1 x:.,) -I..: .

Show that

n

plim n-3 2 X:X, = 3-’J‘J, n - w t - 1

where J = (1,l). Construct a sequence of matrices (M,} such that

and such that

25. Use Theorem 5.3.4 to prove the assertion made in Example 5.5.1 that

M;’Un(@o)-i”, N(0, VaS).

26. Let (a,,,a2,)’-N[(-1.40,0.49)’,n-’I]. Let (mln,mh) be the roots of 2 m + aInm + a,, = 0 ,

where lmlnl Sr lm,,l. What can you say about the distribution of n1’2(ml, - 0.7, m2, - 0.7) as n increases?

27. Let (a,,,a,)‘-N[(-1.20,0.61)’, n-’l[l. Let (m,,, m2,J be the roots of

m2 +a,,m +a,, = o .

Page 329: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

306 SOME LARGE SAMPLE THBORY

What can you say about the distribution of n1'2(mln - m,, m,, - m,)', where (m, ,m2) are the mots of

m2 - 1 . 2 h -I- 0.61 = 01

Hint: See Exercise 10 of Chapter 4.

28. Consider the sequence of polynomials

gn(m) = m2 - 3n-'m + 2n-2.

What is the order of the difference of the two roots? How do you explain the difference between this result and the example of the text? Use the proof of Theorem 5.8.1 to prove a general result for the order of the difference of the two roots of a second order polynomial that is converging to a polynomial with repeated roots.

29. Let {Yi]y=l be a sequence of k-dimensional random variables, where Y i - NI(pJ, Z), and J is a kdimensional column vector of ones. Find the limiting distribution, as n + 00, of

where Y = n - ' X ; = , Y, and n

S = (n - I)-' 2 (yi - y)(Yi - 9)' . Compare the variance of the limiting distribution of pg with that of f i = k-'J'F.

i s 1

30. Prove the following.

Corollary 5.1.6.2. Let {X,,} be a sequence of scalar random variables such that

x, = a + O,(t..) *

wbere r, + 0 as n + 00. If g(n) and g'(x) are continuous at a, then

31. Let

q = e a ' i a , , t = 0 , 1 , 2 ,....

Page 330: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EILERCISES 307

where fi E (-m, 0) and a, -NI(O, a'). Let 4 be the value of fi that minimizes n

QJB) = C (Y, - e8f)2 1-0

for /3 E (-m, 0). Are the conditions of Theorem 5.5.1 satisfied for Mn = 1 and a, = n f for some 5 > O? What do you conclude about the consistency of 87

Page 331: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 6

Estimation of the Mean and Autocorrelations

In this chapter we shall derive some large sample results for the sampling behavior of the estimated mean, covariances, and autocorrelations.

6.1. ESTIMATION OF THE MEAN

Consider a stationary time series X,, with mean p, which we desire to estimate. If it were possible to obtain a number of independent realizations, then the average of the realization averages would converge in mean square to p as the number of realizations increased. That is, given m samples of n observations each,

- I where 4,) = n 2:ml XflIr is the mean of the n observations from the jth realization.

However, in m y areas of application, it is difficult or impossible to obtain multiple realizations. For example, most economic time series constitute a single realization. Therefore, the question becomes whether or not we can use the average of a single realization to estimate the mean. Clearly, the sample mean 2" is an unbiased estimator for the mean of a covariance stationary time series. If the mean square error of the sample mean as an estimator of the population mean approaches zero as the number of observations included in the mean increases, we say that the time series is ergodic for the mean. We now investigate conditions under which the time series is ergodic for the mean. Theorem 6.1.1 demonstrates that the sample mean may be a consistent estimator for wnstationary time series if the nonstationarity is of a transient nature. The theorem follows Parm (1962).

308

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 332: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION OF THE MEAN 309

"hearem 6.1.1. Let {Xl: t E (1,2, . . .)} be a time series satisfying

lim E{X,} = p ,

lim COV{~,,, X,,} = 0, f - w

"3-

- I where IS, = n Xf=, X,. Then

Proof. Now

where the second term on the right converges to zero by Lemma 3.1.5. Furthermore,

1 " " n t = i j = i

Var{Z"} = 7 c x COV(X,, 3)

which also converges to zefo by Lemma 3.1.5. A

Corollary 6.1.1.1. Let {X,} be a stationary time series whose covariance function f ih) converges to zero as h gets large. Then

lim Var@"} = 0. n - w

h f . By Lemma 3.1.5, the convergence of fib) to zero implies that Cov{Xn, X,,} = (1 ln) Xi:: Hh) converges to zero, and the result follows by Theorem 6.1.1. A

Corollary 6.1.1.2. A stationary time series with absolutely summable co- variance function is ergodic for the mean. Futhermore,

m

Page 333: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

310 ESTIMATION OF THE MEAN AND AUTOCORRELATIONS

€'roo€. The assumption of absolute summability implies

lim r(h) = 0 , h-W

and ergodicity follows from Corollary 6.1.1.1. We have

and n Va&} converges to the stated result by Lemma 3.1.4. A

Theorem 61.2. If the spectral density of a stationary time series X, is continuous, then

lim n Va@,,} = 2 d O ) , (6.1.1) n - w

where AO) is the spectral density of X, evaluated at zero.

Proo€. By Theorem 3.1.10 the Fourier series of a continuous periodic function is uniformly summable by the method of C e s h . The autocovarianws r(k) are equal to 12 times the uk of that theorem. Therefore,

A

Since the absolute summability of the covariance function implies that flu) is continuous, it follows that (6.1.1) holds for a time series with absolutely summable covariance fuaction. Thus, for a wide class of time series, the sample mean has a variance that is declining at the rate n-'. ~n large samples the variance is approximately the spectml density evaluated at zerr, multiplied by Zm-'. To investigate the efficiency of the sample mean as an estimator of p, we write

Y, =c1 +z,. where Z, is a time series with zero mean, and we define V to be the covariance matrix for n observations on Y,. Thus,

v = E(m'} ,

where z is the column vector of n observations on 2,. If the covariance matrix is


known and nonsingular, the best linear unbiased estimator of the mean is given by

μ̂ = (1'V^{-1}1)^{-1} 1'V^{-1}y,   (6.1.2)

where 1 is a column vector composed of n ones and y is the vector of n observations on Y_t. The variance of μ̂ is

Var{μ̂} = (1'V^{-1}1)^{-1},   (6.1.3)

whereas the variance of ȳ_n = n^{-1} Σ_{t=1}^n Y_t is

Var{ȳ_n} = n^{-2} 1'V1.   (6.1.4)

Let Y_t be a pth order stationary autoregressive process defined by

Σ_{j=0}^p a_j (Y_{t-j} − μ) = e_t,  a_0 = 1,   (6.1.5)

where the e_t are uncorrelated (0, σ²) random variables and the roots of the characteristic equation are less than one in absolute value. For known V, the estimator (6.1.2) can be constructed by transforming the observations into a sequence of uncorrelated constant variance observations. Using the Gram-Schmidt orthogonalization procedure, we obtain transformed observations W_1, W_2, ..., W_n with

W_t = Y_t + Σ_{j=1}^p a_j Y_{t-j},  t = p + 1, p + 2, ..., n,   (6.1.6)

where W_1 = γ^{-1/2}(0) Y_1, W_2 = {[1 − ρ²(1)]γ(0)}^{-1/2} Y_2 − ρ(1){[1 − ρ²(1)]γ(0)}^{-1/2} Y_1, and so forth. The expected values of the W_t are known multiples of μ.


In matrix notation we let T denote the transformation defined in (6.1.6). Then T'Tσ^{-2} = V^{-1}, E{Ty} = T1μ, and

μ̂ = (1'T'T1)^{-1} 1'T'Ty.   (6.1.7)

With the aid of this transformation, we can demonstrate that the sample mean has the same asymptotic efficiency as the best linear unbiased estimator.

Theorem 6.1.3. Let Y_t be the stationary pth order autoregressive process defined in (6.1.5). Then

lim_{n→∞} n Var{ȳ_n} = lim_{n→∞} n Var{μ̂},

where μ̂ is defined in (6.1.2).

Proof. Without loss of generality, we let σ² = 1. The spectral density of Y_t is then

f_Y(ω) = (2π)^{-1} |Σ_{j=0}^p a_j e^{-ijω}|^{-2},

where a_0 = 1. With the exception of 2p² terms in the upper left and lower right corners of T'T, the elements of V^{-1} are given by

v^{ir} = Σ_{s=0}^{p-h} a_s a_{s+h},  |i − r| = h ≤ p,
v^{ir} = 0,  otherwise.   (6.1.8)

The values of the elements v^{ir} in (6.1.8) depend only on |i − r|, and we recognize Σ_{s=0}^{p-|h|} a_s a_{s+|h|} = γ_a(h), say, as the covariance function of a pth order moving average. Therefore, for n > 2p,


It follows that

lim_{n→∞} n Var{μ̂} = lim_{n→∞} n (1'V^{-1}1)^{-1} = [Σ_{h=-p}^{p} γ_a(h)]^{-1} = 2πf_Y(0) = lim_{n→∞} n Var{ȳ_n}.  ▲

Using the fact that a general class of spectral densities can be approximated by the spectral density of an autoregressive process (see Theorem 4.3.4), Grenander and Rosenblatt (1957) have shown that the mean and certain other linear estimators have the same asymptotic efficiency as the generalized least squares estimator for time series with spectral densities in that class.
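As a numerical illustration of Theorem 6.1.3 (not part of the text), the following sketch compares the variance (6.1.3) of the best linear unbiased estimator with the variance (6.1.4) of the sample mean when V is the covariance matrix of n observations on a first order autoregressive process; the value of ρ and the sample sizes are arbitrary assumptions.

```python
import numpy as np

# Sketch: for an AR(1) with gamma(h) = rho^|h|/(1 - rho^2), both n*(1'V^{-1}1)^{-1}
# (the BLUE variance, eq. 6.1.3) and n^{-1} 1'V1 (the sample-mean variance, eq. 6.1.4,
# multiplied by n) approach 2*pi*f(0) = (1 - rho)^{-2}.
rho = 0.7
for n in (25, 100, 400):
    h = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    V = rho ** h / (1.0 - rho ** 2)
    one = np.ones(n)
    var_blue = 1.0 / (one @ np.linalg.solve(V, one))   # (6.1.3)
    var_mean = (one @ V @ one) / n ** 2                # (6.1.4)
    print(n, n * var_blue, n * var_mean)
```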

6.2. ESTIMATORS OF THE AUTOCOVARIANCE AND AUTOCORRELATION FUNCTIONS

While the sample mean is a natural estimator to consider for the mean of a stationary time series, a number of estimators have been proposed for the covariance function. If the mean is known and, without loss of generality, taken to be zero, the estimator

γ̃(h) = (n − h)^{-1} Σ_{t=1}^{n-h} X_t X_{t+h}   (6.2.1)

is seen to be the mean of n − h observations from the time series, say,

Z_{th} = X_t X_{t+h}.

For a stationary time series, E{Z_{th}} = γ(h) for all t, and it follows that γ̃(h) is an unbiased estimator of γ(h). In most practical situations the mean is unknown and must be estimated. We list below two possible estimators of the covariance function when the mean is estimated. In both expressions h is taken to be greater than or equal to zero:

γ̂⁺(h) = (n − h)^{-1} Σ_{t=1}^{n-h} (X_t − x̄_n)(X_{t+h} − x̄_n),   (6.2.2)

γ̂(h) = n^{-1} Σ_{t=1}^{n-h} (X_t − x̄_n)(X_{t+h} − x̄_n).   (6.2.3)

It is clear that these estimators differ by factors that become small at the rate n^{-1}. Unlike γ̃(h), neither of the estimators is unbiased. The bias is given in Theorem 6.2.2 of this section.
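The following is a minimal computational sketch (not from the text) of the two mean-corrected covariance estimators and of the estimated autocorrelation (6.2.7); the helper names and the simulated series are illustrative assumptions.

```python
import numpy as np

# Sketch: (6.2.2) divides by n - h, (6.2.3) divides by n; the divisor-n estimator
# yields a positive semidefinite estimated covariance function.
def acov(x, h, divisor_n=True):
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), np.mean(x)
    s = np.sum((x[: n - h] - xbar) * (x[h:] - xbar))
    return s / n if divisor_n else s / (n - h)

def acorr(x, h):
    # Estimated autocorrelation (6.2.7): r_hat(h) = gamma_hat(h) / gamma_hat(0).
    g0 = acov(x, 0)
    return 0.0 if g0 == 0 else acov(x, h) / g0

x = np.random.default_rng(1).normal(size=200)
print([round(acorr(x, h), 3) for h in range(1, 6)])
```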


The estimator γ̂(h) can be shown to have smaller mean square error than γ̂⁺(h) for certain types of time series. This estimator also has the advantage of guaranteeing positive definiteness of the estimated covariance function. In most of our applications we shall use the estimator γ̂(h). As one might expect, the variances of the estimated autocovariances are much more complicated than those of the mean. The theorem we present is due to Bartlett (1946). A result needed in the proof will be useful in later sections, and we state it as a lemma.

Lemma 6.2.1. Let {a_j}_{j=-∞}^∞ and {c_j}_{j=-∞}^∞ be two absolutely summable sequences. Then, for fixed integers r, h, and d, d ≥ 0,

lim_{n→∞} n^{-1} Σ_{t=1}^{n} Σ_{s=1}^{n+d} a_{s-t+r} c_{s-t+h} = Σ_{p=-∞}^∞ a_{p+r} c_{p+h}.

Proof. Let p = s - t. Then

. n+d n

Now,

with the inequality resulting from the inclusion of additional terms and the introduction of absolute values. A

Theorem 6.2.1. Let the time series {X_t} be defined by

X_t = Σ_{j=-∞}^∞ α_j e_{t-j},

where the sequence {α_j} is absolutely summable and the e_t are independent (0, σ²) random variables with E{e_t⁴} = ησ⁴. Then, for fixed h and q (h ≥ q ≥ 0),

lim_{n→∞} n Cov{γ̃(h), γ̃(q)} = (η − 3)γ(h)γ(q) + Σ_{p=-∞}^∞ [γ(p)γ(p − h + q) + γ(p + q)γ(p − h)],   (6.2.4)

where γ̃(h) is defined in (6.2.1).

Proof. Using

E{e_t e_s e_u e_v} = ησ⁴ if t = s = u = v,
E{e_t e_s e_u e_v} = σ⁴ if the subscripts are equal in pairs but not all equal,
E{e_t e_s e_u e_v} = 0 otherwise,

it follows that

Thus,

+ Y(s - t + q)Y(s - r - h)] . (6.2.6)

w w


For normal time series η = 3, and we have

Cov{γ̃(h), γ̃(q)} ≐ n^{-1} Σ_{p=-∞}^∞ [γ(p)γ(p − h + q) + γ(p + q)γ(p − h)].

Corollary 6.2.1.1. Given a time series {e_t: t ∈ (0, ±1, ±2, ...)}, where the e_t are normal independent (0, σ²) random variables, then for h, q ≥ 0,

Cov{γ̃_e(h), γ̃_e(q)} = 2n^{-1}σ⁴,  h = q = 0,
Cov{γ̃_e(h), γ̃_e(q)} = (n − h)^{-1}σ⁴,  h = q ≠ 0,
Cov{γ̃_e(h), γ̃_e(q)} = 0,  otherwise.

Proof. Reserved for the reader.  ▲

In Theorem 6.2.1 the estimator was constructed assuming the mean to be known. As we have mentioned, the estimation of the unknown mean introduces a bias into the estimated covariance. However, the variance formulas presented in Theorem 6.2.1 remain valid approximations for the estimator defined in equation (6.2.2).

Theorem 6.2.2. Given fixed h ≥ q ≥ 0 and a time series X_t satisfying the assumptions of Theorem 6.2.1,

and

Now


and we have the first result. Since, by an application of (6.2.5), Var{x̄_n²} = O(n^{-2}), the second conclusion also follows.  ▲

Using Theorems 6.2.1 and 6.2.2 and the results of Chapter 5, we can approximate the mean and variance of the estimated correlation function. If the mean is known, we consider the estimated autocorrelation

r̃(h) = [γ̃(0)]^{-1} γ̃(h),

and if the mean is unknown, the estimator

r̂(h) = [γ̂(0)]^{-1} γ̂(h).   (6.2.7)

If the denominator of the estimator is zero, we define the estimator to be zero.

Theorem 6.2.3. Let the time series {X_t} be defined by

+ r~o)l-2~pf~)var(Ko)~ - COV{Hh), %O>ll + O(n-').

Proof. The estimated autocorrelations are bounded and are differentiable functions of the estimated covariances on a closed set containing the true parameter vector as an interior point. Furthermore, the derivatives are bounded on that set. Hence, the conditions of Theorem 5.4.3 are met with {γ̂(h), γ̂(q), γ̂(0)}

playing the role of {X_n} of that theorem. Since the function r̂(h) is bounded, we take a = 1.

Expanding [r̂(h) − ρ(h)][r̂(q) − ρ(q)] through third order terms and using Theorem 5.4.1 to establish that the expected value of the third order terms is of order smaller than n^{-1}, we have the result (6.2.8). The remaining results are established in a similar manner.  ▲

For the first order autoregressive time series X_t = ρX_{t-1} + e_t it is relatively easy to evaluate the variances of the estimated autocorrelations. We have, for h > 0,

(6.2.9)

We note that, for large h,

Var{r̂(h)} ≐ n^{-2}(n − h)(1 + ρ²)(1 − ρ²)^{-1}.   (6.2.10)

For a time series whose correlations approach zero rapidly, the variance of r̂(h) for large h can be approximated by the first term of (6.2.8). That is, for such a time series and for h such that ρ(h) ≐ 0, we have

Var{r̂(h)} ≐ n^{-1} Σ_{p=-∞}^∞ ρ²(p).   (6.2.11)
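The following sketch (not from the text) evaluates the large-lag approximations (6.2.10)-(6.2.11) for a first order autoregressive process; the values of ρ and n are hypothetical.

```python
import numpy as np

# Sketch: for an AR(1) with parameter rho, sum_p rho^(2|p|) = (1 + rho^2)/(1 - rho^2),
# so (6.2.11) gives Var{r_hat(h)} ~ n^{-1}*(1 + rho^2)/(1 - rho^2) at large lags.
rho, n = 0.8, 100
var_large_lag = (1.0 + rho ** 2) / (n * (1.0 - rho ** 2))
print("approx. s.e. of r(h) at large h:", np.sqrt(var_large_lag))
print("white-noise s.e. 1/sqrt(n):     ", 1.0 / np.sqrt(n))
```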

We are particularly interested in the behavior of the estimated autocorrelations for a time series of independent random variables, since this is often a working hypothesis in time series analysis.

Corollary 6.2.3. Let {e_t} be a sequence of independent (0, σ²) random variables with finite sixth moment. Then, for h ≥ q > 0,

E{r̂(h)} = −(n − h)[n(n − 1)]^{-1} + o(n^{-1}),

with the corresponding covariances equal to zero otherwise.

Proof. Omitted.  ▲

For large n the bias in r̂(h) is negligible. However, it is easy to reduce the bias


in the null case of independent random variables. It is suggested that the estimator

(6.2.12)

be used for hypothesis testing when only a small number of observations are available. For time series of the type specified in Corollary 6.2.3 and h, q > 0,

E{r̄(h)} = O(n^{-2}).

In the next section we prove that the r̂(h) and r̄(h) are approximately normally distributed. The approximate distribution of the autocorrelations will be adequate for most purposes, but we mention one exact distributional result. A statistic closely related to the first order autocorrelation is the von Neumann ratio:¹

d_n = [Σ_{t=2}^n (X_t − X_{t-1})²] [Σ_{t=1}^n (X_t − x̄_n)²]^{-1}.   (6.2.13)

We see that

d_n = 2[1 − r̂(1)] − [(X_1 − x̄_n)² + (X_n − x̄_n)²] [Σ_{t=1}^n (X_t − x̄_n)²]^{-1}.

If the X_t are normal independent (μ, σ²) random variables, it is possible to show that E{d_n} = 2. Therefore, r_u = −½(d_n − 2) is an unbiased estimator of zero in that case. von Neumann (1941) and Hart (1942) have given the exact distribution of r_u under the assumption that X_t is a sequence of normal independent (μ, σ²) random variables. Tables of percentage points are given in Hart (1942) and in Anderson (1971, p. 345). Inspection of these tables demonstrates that the

¹ The ratio is sometimes defined with the multiplier n/(n − 1).


percentage points of t_v = r_u(n + 1)^{1/2}(1 − r_u²)^{-1/2} are approximately those of Student's t with n + 3 degrees of freedom for n greater than 10.

Clearly, the distribution of r̄(1)[1 − r̄²(1)]^{-1/2}(n + 1)^{1/2}, where r̄(1) is defined in (6.2.12), is close to the distribution of t_v and therefore may also be approximated by Student's t with n + 3 degrees of freedom when the observations are independent normal random variables.

Kendall and Stuart (1966) and Anderson (1971) present discussions of the distributional theory of statistics such as d_n.
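A short sketch (not from the text) computing the von Neumann ratio (6.2.13) and the Student-t approximation described above; the sign convention r_u = −½(d_n − 2) follows the reconstruction given earlier, and the simulated data are an illustrative assumption.

```python
import numpy as np

def von_neumann(x):
    # d_n of (6.2.13): successive squared differences over squared deviations from the mean.
    x = np.asarray(x, dtype=float)
    return np.sum(np.diff(x) ** 2) / np.sum((x - x.mean()) ** 2)

x = np.random.default_rng(2).normal(size=50)
n = len(x)
d = von_neumann(x)
r_u = -0.5 * (d - 2.0)                      # approximately the lag-one autocorrelation
t_v = r_u * np.sqrt(n + 1) / np.sqrt(1 - r_u ** 2)
print("d_n =", round(d, 3), " r_u =", round(r_u, 3), " t_v =", round(t_v, 3))
# Per the text, t_v is roughly Student's t with n + 3 degrees of freedom under independence.
```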

6.3. CENTRAL LIMIT THEOREMS FOR STATIONARY TIME SERIES

The results of this chapter have, so far, been concerned with the mean and variance of certain sample statistics computed from a single realization. In order to perform tests of hypotheses or set confidence limits for the underlying parameters, some distribution theory is required.

That the mean of a finite moving average is, in the limit, normally distributed is a simple extension of the central limit theorems of Section 5.3.

Proposition 6.3.1. Let {X_t: t ∈ (0, ±1, ±2, ...)} be defined by

X_t = μ + Σ_{j=0}^m b_j e_{t-j},

where b_0 = 1, Σ_{j=0}^m b_j ≠ 0, and the e_t are uncorrelated (0, σ²) random variables. Assume n^{1/2} ē_n converges in distribution to a N(0, σ²) random variable. Then

n^{1/2}(x̄_n − μ) converges in distribution to a N(0, (Σ_{j=0}^m b_j)² σ²) random variable.

Proof. We have

n^{1/2}(x̄_n − μ) = (Σ_{j=0}^m b_j) n^{1/2} ē_n − n^{-1/2} Σ_{s=0}^{m-1} Σ_{j=s+1}^m b_j e_{n-s} + n^{-1/2} Σ_{s=0}^{m-1} Σ_{j=s+1}^m b_j e_{-s}.

Both n^{-1/2} Σ_{s=0}^{m-1} Σ_{j=s+1}^m b_j e_{n-s} and n^{-1/2} Σ_{s=0}^{m-1} Σ_{j=s+1}^m b_j e_{-s} converge in probability to zero as n increases, by Chebyshev's inequality.


Therefore, the limiting distribution of n^{1/2}(x̄_n − μ) is the limiting distribution of (Σ_{j=0}^m b_j) n^{1/2} ē_n, and the result follows.  ▲

If Σ_{j=0}^m b_j = 0, then x̄_n − μ has a variance that approaches zero at a rate faster than n^{-1}. The reader can verify this by considering, for example, the time series X_t = μ + e_t − e_{t-1}. In such a case the theorem holds in the sense that n^{1/2}(x̄_n − μ) is converging to the singular (zero variance) normal distribution.

Moving average time series of independent random variables are special cases of a more general class of time series called m-dependent.

Definition 6.3.1. The sequence of random variables {Z_t: t ∈ (0, ±1, ±2, ...)} is said to be m-dependent if s − r > m, where m is a positive integer, implies that the two sets

{..., Z_{r-2}, Z_{r-1}, Z_r},  {Z_s, Z_{s+1}, Z_{s+2}, ...}

are independent. We give a theorem for such time series due to Hoeffding and Robbins (1948).

Theorem 6.3.1. Let {Z_t: t ∈ (1, 2, ...)} be a sequence of m-dependent random variables with E{Z_t} = 0, E{Z_t²} = σ_t² < β < ∞, and E{|Z_t|^{2+2δ}} ≤ β^{2+2δ} for some δ > 0. Let the limit

lim_{p→∞} p^{-1} Σ_{j=1}^p A_{t+j} = A,

A ≠ 0, be uniform for t = 1, 2, ..., where

A_t = E{Z_{t+m}²} + 2 Σ_{j=1}^m E{Z_{t+m-j} Z_{t+m}}.

Then

n^{-1/2} Σ_{t=1}^n Z_t converges in distribution to a N(0, A) random variable.

Proof. Fix an α, 0 < α < 0.25 − ε_δ, where

0.25 > ε_δ > max{0, 0.25(1 + 2δ)^{-1}(1 − 2δ)}.

Let k be the largest integer less than n^α, and let p be the largest integer less than k^{-1}n. Define


Then the difference

0, = n-112 I= f: 1 z , - sp = n-1’2 [ZL? z k i - m + j > + r = p k - m + l 2 zr] *

For n large enough so that k > b , the sums Zy=l Zk,-m+,, i = 1,2,. , . , p , are independent. By the assumption on the moments, we have

va{2 ’ P I zki-m+J} m2g2,

r=pk-m+l

I

It follows that

Val@“) s n-’/32{mZ(p - 1) + (k + m)2} = o( 1) ,

and n-l” Z:=l 2, converges in mean square to Sp. Since 2, is comlated only with Z,-,, Zt-m + . . . , Zt+,- I , Z,+,,,, the addition of a 2, to a sum containing the m preceding tern, Zl- , , Z,+, . . . , Z,-,, increases the variance by the amount Therefore,

and

Since Varp;=, Z c i - l , k + j } ~ m 2 p z , and p - 1 X i , , P A , , converges uniformly, we have

Now

and


Hence, p-"' 27-, k-"'Y, satisfies the conditions of Liapounov's central limit theorem, and the result follows. A
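A Monte Carlo sketch (not from the text) illustrating Theorem 6.3.1 for a 2-dependent series formed from a second order moving average; the coefficient values and sample sizes are arbitrary assumptions.

```python
import numpy as np

# Sketch: Z_t = e_t + b1*e_{t-1} + b2*e_{t-2} is 2-dependent, and n^{-1/2}*sum Z_t
# should be approximately N(0, A) with A = gamma(0) + 2*gamma(1) + 2*gamma(2)
# = (1 + b1 + b2)^2 * sigma^2.
b1, b2, sigma2, n, reps = 0.5, -0.3, 1.0, 500, 4000
rng = np.random.default_rng(3)
sums = np.empty(reps)
for r in range(reps):
    e = rng.normal(scale=np.sqrt(sigma2), size=n + 2)
    z = e[2:] + b1 * e[1:-1] + b2 * e[:-2]
    sums[r] = z.sum() / np.sqrt(n)
print("sample variance of n^{-1/2} sum Z_t:     ", sums.var())
print("theoretical A = (1 + b1 + b2)^2 * sigma^2:", (1 + b1 + b2) ** 2 * sigma2)
```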

For the mth order moving average, the A_t of Theorem 6.3.1 is the same for all t and is

γ(0) + 2 Σ_{j=1}^m γ(j) = Σ_{s=0}^m b_s² σ² + 2 Σ_{j=1}^m Σ_{s=0}^{m-j} b_s b_{s+j} σ² = (Σ_{j=0}^m b_j)² σ²,

which agrees with Proposition 6.3.1. The results of Proposition 6.3.1 and Theorem 6.3.1 may be generalized to infinite moving averages of independent random variables. Our proofs follow those of Diananda (1953) and Anderson (1959, 1971). We first state a lemma required in the proofs of the primary theorems.

Lemma 6.3.1. Let the random variables ξ_n with distribution functions F_{ξ_n}(z) be defined by

ξ_n = S_{kn} + D_{kn}

for k = 1, 2, ... and n = 1, 2, .... Let

plim_{k→∞} D_{kn} = 0

uniformly in n. Assume F_{S_{kn}}(z) converges to F_{S_k}(z) as n → ∞ for every fixed k, and that F_{S_k}(z) converges to F_ξ(z) as k → ∞. Then

F_{ξ_n}(z) → F_ξ(z) as n → ∞.

Proof. Let to be a point of continuity of Ft(z), and fix S > 0. There is an > 0 such that

~F&z) - Ft(zo>f < 0.256

for z in [to - E, zo + a], and F&) is continuous at to - E and at zo f E. By


hypothesis, there exists a KO such that

for k > KO. This means that

for k > KO and zo - ~ S z b z , + 6. Now there is a K, such that for k K, ,

P(I - s,, I > €} .= 0,256

for all n. By the arguments used in the proof of Theorem 5.2.1, this implies that

FS,(z - E ) - 0.256 Ffm(z) F',(z + E ) + 0.256

for all z. Fix k at the maximum of KO and K, . Let z1 and z2 be continuity points of F,&j

s u c h t h a t z , - ~ S z , < z o and2,€z,czo+eThenthereexistsanNsuchthat for n > N,

IFS,(Z, 1 - F&, >I < 02% s

IFSkm(Z2) - F&)I < 0.256

Therefore, for n > N,

F&z0) - 8 F $ k ( ~ l 1 - 0.56 C FskRtz, - 0.256

Ftn(Z,) Fs,,(z2) + 0.258 zs F#$Z,) + 0.58

F&zo) + 8 . A The following two lemmas are the convergence in probability and almost SUE

results analogous to the convergence in distribution result of Lemma 6.3.1.

Lemma 6.3.2. Let (ξ_n, S_{kn}, D_{kn}), k = 1, 2, ..., n = 1, 2, ..., be random variables defined on the probability space (Ω, 𝒜, P) satisfying

ξ_n = S_{kn} + D_{kn}.   (6.3.1)

Assume: (i) one has

plim_{k→∞} D_{kn} = 0 uniformly in n;   (6.3.2)


(ii) for every fixed k there is an S_k satisfying

plim_{n→∞} (S_{kn} − S_k) = 0;   (6.3.3)

(iii) one has

plim_{k→∞} (S_k − ξ) = 0.   (6.3.4)

Then

plim_{n→∞} (ξ_n − ξ) = 0.  ▲

Lemma 6.33. Let the assumption (6.3.1) of Lemma 6.3.2 hold. Replace the assumptions (6.3.2). (6.3.3). and (6.3.4) with the following:

(i) one has

lim Dkn = 0 a.s., uniformly in n ; k-cm

(ii) for every fixed k, there is an &. satisfying

lim (Skn - Sk ) = 0 as. ; n 4-

(iii) one has

lim (&. - 6 ) = 0 as. k - m

Then

lim (5. - 6) = 0 as. n-rm

Prod. Omitted.

(6.3.5)

(6.3.6)

(6.3.7)

A

We use Lemma 6.3.2 to prove that the sample mean of an infinite moving average converges in probability to the population mean under weak conditions on the e_t. For example, the conclusion holds for independently identically distributed e_t with zero mean. Corollary 6.1.1.2 used finite variance to obtain a stronger conclusion.

Theorem 6.3.2. Let the time series X_t be defined by


Proof. We apply Lemma 6.3.2. Let ξ_n = x̄_n − μ,

(6.3.8)

and

By the assumption that the e_t are uncorrelated (0, σ²) random variables, n^{-1} Σ_{t=1}^n e_{t-j} converges to zero in probability as n → ∞. Hence, the sum (6.3.8) converges to zero (= S_k) as n → ∞ for every fixed k, and condition (ii) of Lemma 6.3.2 is satisfied. Condition (iii) is trivially satisfied for ξ = 0. Now

P{|D_{kn}| > ε} ≤ M ε^{-1} Σ_{|j|>k} |α_j|

for some finite M, by Chebyshev's inequality. Hence, condition (i) of Lemma 6.3.2 is satisfied and the conclusion follows.  ▲

Using Lemma 6.3.1, we show that if the e_t of an infinite moving average satisfy a central limit theorem, then the moving average also satisfies a central limit theorem.

Theorem 6.3.3. Let X_t be a covariance stationary time series satisfying


Then

where

Proof. Let

and define the normalized sums

For a fixed k, W,, is a stationary time series such that ca

2 %(h) = c. q++lhla , h =o, "I, 52,. . . * J m k + 1

It follows that . n - I

Therefore, by Chebyshev's inequality, D_{kn} converges to zero as k → ∞ uniformly in n. For fixed k, the truncated process is a finite moving average of e_t that satisfies a central limit theorem. Therefore, by Proposition 6.3.1, as n tends to infinity, the normalized sum of the truncated process converges in distribution to a normal random variable. As k increases, the variance of the limiting distribution converges to (Σ_j α_j)² σ². The conclusion follows by Lemma 6.3.1.  ▲

The stationary autoregressive moving average is a special case of an infinite moving average. Because the weights decline exponentially for the autoregressive moving average, the sample mean of Y_t can be expressed as a function of the sample mean of the e_t with a remainder of order n^{-1} in probability.

Corollary 6.3.3. Let Y_t be a stationary finite order autoregressive moving average satisfying

Σ_{j=0}^p a_j Y_{t-j} = Σ_{j=0}^q b_j e_{t-j},

where a_0 = 1, b_0 = 1, the roots of the autoregressive characteristic equation are less than one in absolute value, and {e_t} is a sequence of uncorrelated (0, σ²) random variables. Then

ȳ_n = (Σ_{j=0}^∞ w_j) ē_n + O_p(n^{-1}),

where the w_j are defined in Theorem 2.7.1 and Σ_{j=0}^∞ w_j = (Σ_{j=0}^p a_j)^{-1} Σ_{j=0}^q b_j. If n^{1/2} ē_n converges in distribution to a N(0, σ²) random variable, then n^{1/2} ȳ_n converges in distribution to a N(0, (Σ_{j=0}^∞ w_j)² σ²) random variable.

Proof. By the representation of Theorem 2.7.1,

m n+i-I

- 'Y c y+1e,-i i=O J = i

Now, by Exercise 2.24, there is some M and 0 < λ < 1 such that |w_j| < Mλ^j for all j ≥ 0. Hence, there is a K such that the variances of the two remainder sums are bounded by K for all n. Dividing the variances by n², we obtain the first result. The limiting distribution result is an immediate consequence of the order in probability result and the assumption that n^{1/2} ē_n converges in distribution.  ▲

We now obtain a central limit theorem for a linear function of a realization of a


stationary time series. We shall state the assumptions of this theorem in a manner that permits us to use the Lindeberg central limit theorem.

Theorem 6.3.4. Let {X_t: t ∈ T = (0, ±1, ±2, ...)} be a time series defined by

X_t = Σ_{j=0}^∞ α_j e_{t-j},

where Σ_{j=0}^∞ |α_j| < ∞, and the e_t are independent (0, σ²) random variables with distribution functions F_t(e) such that

lim_{L→∞} sup_{t∈T} ∫_{|e|>L} e² dF_t(e) = 0.

Furthermore, let {C_t}_{t=1}^∞ be a sequence of fixed real numbers satisfying

(i i ClX1-% N(0, V) * f = l f=. l

prod. Following the proof of Theorem 6.3.3, we set

and note that

h - -(n - I )

which, by the proof of Theorem 6.3.3, can be made arbitrarily small by choosing k sufficiently large.

For fixed k, &, is a finite moving average, and, following the proof of Theorem


6.3.1, we have

where b, = 2$, a$&+,; and the random variables blel are independent with mean zero and variance b , a . Let V, = Z:nI b:a2,

n

S, = V i l " 2 brer, 1 - 1

Mn= sup b , , 2

t S r S a

and consider, for S > 0,

where

and

R, ={e: lel>M; 112 V, 112 6 ) .

Clearly, R_t ⊂ R_n for all t ≤ n. Assumptions (i) and (ii) imply that

V = lim_{n→∞} (Σ_{t=1}^n C_t²)^{-1} Σ_{t=1}^n b_t² σ² ≠ 0,

which, in turn, implies that lim_{n→∞} M_n^{-1/2} V_n^{1/2} = ∞. By assumption, the supremum of the integral over R_n goes to zero. Hence, the assumptions of the Lindeberg central limit theorem are met, and S_n converges in distribution to a normal random variable with zero mean and unit variance. Since


and

the result follows by Lemma 6.3.1. A

Because a stationary finite order autoregressive moving average time series can be expressed as an infinite moving average with absolutely summable covariance function, the conclusions of Theorem 6.3.4 hold for such time series. Also, the condition that the e_t are independent (0, σ²) random variables can be replaced by the condition that the e_t are martingale differences with E{e_t² | 𝒜_{t-1}} = σ² a.s. and bounded 2 + δ (δ > 0) moments, where 𝒜_{t-1} is the sigma-field generated by e_{t-1}, e_{t-2}, ....

We now investigate the large sample properties of the estimated autocovariances and autocorrelations. We use Lemma 6.3.2 to demonstrate that the estimated autocovariances converge in probability to the true values. The assumptions and conclusions of Theorem 6.3.5 differ from those of Theorem 6.2.1. In Theorem 6.2.1, the existence of fourth moments enabled us to obtain variances of the estimated autocovariances. In Theorem 6.3.5, weaker assumptions are used to obtain convergence in probability of the sample autocovariances.

Theorem 6.3.5. Let the stationary time series Y_t be defined by

Proof. Now

Also,

by Theorem 6.3.2.


To show that the term

converges to zero, we apply Lemma 6.3.2, letting

n - h

n-h

n-h

and

For fixed i, j with i + j , using Chebyshev’s inequality, it can be proved that

a-h

Hence, for any fixed k, plim_{n→∞} S_{kn} = 0. Now, by Chebyshev's inequality,

plim_{k→∞} D_{kn}^{(1)} = 0 uniformly in n.

In a similar manner, we can prove

plim_{k→∞} [D_{kn}^{(2)} + D_{kn}^{(3)}] = 0 uniformly in n,

and it follows that γ̃(h) converges in probability to γ(h). The result for γ̂(h) follows because, by the proof of Theorem 6.2.2, γ̂(h) − γ̃(h) = O_p(n^{-1}).  ▲


Corollary 6.3.5. Let the stationary time series Y_t satisfy

for some δ > 0, where 𝒜_{t-1} is the sigma-field generated by e_{t-1}, e_{t-2}, .... Then

Proof. The proof parallels that of Theorem 6.3.5.  ▲

The vector of a finite number of sample autocovariances converges in distribution to a normal vector under mild assumptions.
Theorem 6.3.6. Let Y_t be the stationary time series defined by

Y_t = Σ_{j=0}^∞ b_j e_{t-j},

where the e_t are independent (0, σ²) random variables with fourth moment ησ⁴ and finite 4 + δ absolute moment for some δ > 0, and the b_j are absolutely summable. Let K be fixed. Then the limiting distribution of n^{1/2}[γ̂(0) − γ(0), γ̂(1) − γ(1), ..., γ̂(K) − γ(K)]' is multivariate normal with mean zero and covariance matrix V whose elements are defined by (6.2.4) of Theorem 6.2.1.

Proof. The estimated covariance, for h = 0, 1, 2, ..., K, is

and the last two terms, when multiplied by n^{1/2}, converge in probability to zero. Therefore, in investigating the limiting distribution of n^{1/2}[γ̂(h) − γ(h)] we need only consider the first term on the right of (6.3.9). Let

Y_t = X_{mt} + W_{mt},


where

T n - h n-h

-112 n h where Smhn = n have

2,,, Xm,Xm,+h. Following the proof of Theorem 6.2.1, we

Now,

where L,,, 4 0 uniformly in n as m 4 0 0 because 21i,,m bj 4 0 as m 303. Therefore, Dmn + 0 uniformly in n as m + 00.

Consider a linear combination of the errors in the estimated covariances of Xmr,


where the λ_h are arbitrary real numbers (not all zero) and

Z_{th} = X_{mt} X_{m,t+h},  h = 0, 1, 2, ..., K.

Now, Z_{th} is an (m + h)-dependent covariance stationary time series with mean γ_{X_m}(h) and covariance function

(6.3.10)

where we have used (6.2.5). Thus, the weighted average of the Z_{th}'s,

U_t = Σ_{h=0}^K λ_h Z_{th},

is a stationary time series. Furthermore, the time series U_t is (m + K)-dependent, it has finite 2 + δ/2 moment, and the quantity A of Theorem 6.3.1 exists for U_t. Therefore, by Theorem 6.3.1, S_m converges in distribution to a normal random variable. Since the λ_h are arbitrary, the vector random variable

n^{1/2}[γ̂_{X_m}(0) − γ_{X_m}(0), γ̂_{X_m}(1) − γ_{X_m}(1), ..., γ̂_{X_m}(K) − γ_{X_m}(K)]'

converges in distribution to a multivariate normal by Theorem 5.3.3, where the covariance matrix is defined by (6.3.10). Because E{X_{mt} X_{m,t+h}} converges to E{Y_t Y_{t+h}} as m → ∞, the conditions of Lemma 6.3.1 are satisfied and we obtain the conclusion.  ▲

Theorem 6.3.6 can be proven for e_t that are martingale differences. See Exercises 6.23 and 6.24, and Hannan and Heyde (1972).

Generally it is the estimated autocorrelations that are subjected to analysis, and hence their limiting distribution is of interest.

Corollary 6.3.6.1. Let the assumptions of Theorem 6.3.6 hold. Then the vector n^{1/2}[r̂(1) − ρ(1), r̂(2) − ρ(2), ..., r̂(K) − ρ(K)]' converges in distribution to a multivariate normal with mean zero and covariance matrix G, where the hqth element of G is

Σ_{p=-∞}^∞ [ρ(p)ρ(p − h + q) + ρ(p + q)ρ(p − h) − 2ρ(q)ρ(p)ρ(p − h) − 2ρ(h)ρ(p)ρ(p − q) + 2ρ(h)ρ(q)ρ²(p)].

Proof. Since the r̂(h) are continuous differentiable functions of the γ̂(h), the result follows from Theorems 5.1.4, 6.2.3, and 6.3.5.  ▲


Observe that if the original time series X_t is a sequence of independent identically distributed random variables with finite moments, the sample correlations will be nearly independent in large samples. Because of the importance of this result in the testing of time series for independence, we state it as a corollary.

Corollary 6.3.6.2. Let the time series e_t be a sequence of independent (0, σ²) random variables with uniformly bounded 4 + δ moments for some δ > 0. Let r̄(h) be defined by (6.2.12), and let K be a fixed integer. Then n(n − h)^{-1/2} r̄(h), h = 1, 2, ..., K, converge in distribution to independent normal (0, 1) random variables.

Proof. Omitted.  ▲

Example 6.3.1. The quarterly seasonally adjusted United States unemployment rate from 1948 to 1972 is given in Table 6.3.1 and displayed in Figure 6.3.1. The mean unemployment rate is 4.77. The autocorrelation function estimated using (6.2.7) is given in Figure 6.3.2 and Table 6.3.2. This plot is sometimes called a correlogram.

To carry out statistical analyses we assume the time series can be treated as a stationary time series with finite sixth moment. If the original time series were a sequence of uncorrelated random variables, we would expect about 95% of the estimated correlations to fall between the lines plus and minus 1.96n^{-1}(n − h)^{1/2}. Obviously unemployment is not an uncorrelated time series. Casual inspection of the correlogram might lead one to conclude that the time series contains a periodic component with a period of about 54 quarters, since the estimated correlations for h equal to 21 through 33 are negative and below the 1.96 sigma bounds for an uncorrelated time series. However, because the time series is highly correlated at small lags, the variance of the estimated correlations at large lags is much larger than the variance of correlations computed from a white noise sequence.

The first few autocorrelations of the unemployment time series are in good agreement with those generated by the second order autoregressive process

X_t = 1.5356 X_{t-1} − 0.6692 X_{t-2} + e_t,

where the e_t are uncorrelated (0, 0.1155) random variables. We shall discuss the estimation of the parameters of autoregressive time series in Chapter 8. However, the fact that the sample autocorrelations are consistent estimators of the population correlations permits us to obtain consistent estimators of the autoregressive parameters of a second order process from (2.5.7). The general agreement between the correlations for the second order autoregressive process and the sample correlations is clear from Table 6.3.2.
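The following sketch (not from the text) shows how the quoted second order autoregressive coefficients can be recovered from the first two sample autocorrelations of Table 6.3.2; writing the relation the text cites as (2.5.7) in its standard Yule-Walker form is an assumption on my part.

```python
# Sketch: Yule-Walker estimates of the AR(2) coefficients from r(1) and r(2).
# r1 and r2 are the lag-one and lag-two sample autocorrelations in Table 6.3.2.
r1, r2 = 0.9200, 0.7436
theta1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
theta2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
print("theta1 =", round(theta1, 4), " theta2 =", round(theta2, 4))
# Gives approximately 1.5356 and -0.6692, the coefficients quoted in the text.
```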

The roots of the second order autoregressive process are 0.7678 ± 0.2823i. We recall that the correlation function of such a second order process can be written as

ρ(h) = b_1 m_1^h + b_2 m_2^h,  h = 0, 1, 2, ...,


Table 6.3.1. U.S. Unemployment Rate (Quarterly, Seasonally Adjusted), 1948-1972
Year Quarter Rate    Year Quarter Rate    Year Quarter Rate


1948

1949

1950

1951

1952

1953

1954

1955

1956

1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

3.73 3.67 3.77 3.83 4.67 5.87 6.70 6.97 6.40 5.57 4.63 4.23 3.50 3.10 3.17 3.37 3.07 2.97 3.23 2.83 2.70 2.57 2.73 3.70 5.27 5.80 5.97 5.33 4.73 4.40 4.10 4.23 4.03 4.20 4.13 4.13

1957

1958

1959

1960

1961

1962

1963

1964

1965

f 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

3.93 4.10 4.23 4.93 6.30 7.37 7.33 6.37 5.83 5.10 5.27 5.60 5.13 5.23 5.53 6.27 6.80 7.00 6.77 6.20 5.63 5.53 5.57 5.53 5.77 5.73 5.50 5.57 5.47 5.20 5.00 5.00 4.90 4.67 4.37 4.10

1966

1967

1968

1969

1970

1971

1972

1 2 3 4 1 2 3 4 I 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

3.87 3.80 3.77 3.70 3.77 3.83 3.83 3.93 3.73 3.57 3.53 3.43 3.37 3.43 3.60 3.60 4.17 4.80 5.17 5.87 5-93 5.97 5.97 5.97 5.83 5.77 5.53 5.30

Sources: Business Statistics, 1971 Biennial Edition, pp. 68 and 233, and Survey of Current Business, January 1972 and January 1973. Quarterly data are the averages of monthly data.

where

(b_1, b_2) = (m_2 − m_1)^{-1} [m_2 − ρ(1), ρ(1) − m_1].

For the unemployment time series the estimated parameters are b̂_1 = 0.500 −


Figure 6.3.1. United States quarterly seasonally adjusted unemployment rate.

0.2706i and b̂_2 = b̂_1* = 0.500 + 0.2706i. Using these values, we can estimate the variance of the estimated autocorrelations for large lags using (6.2.11). We have

Σ_{p=-∞}^∞ ρ̂²(p) = 4.812.

Thus the estimated standard error of the estimated autocorrelations at large lags is about 0.22, and the observed correlations at lags near 27 could arise from such a process.

Given that the time series was generated by the second order autoregressive mechanism, the variance of the sample mean can be estimated by Corollary


Figure 6.3.2. Correlogram of quarterly seasonally adjusted unemployment rate [r̂(h) plotted against lag h].

6.1.1.2. By that corollary,

n V̂ar{ȳ_n} = Σ_{h=-∞}^∞ γ̂(h) = γ̂(0) Σ_{h=-∞}^∞ ρ̂(h) = 0.1155 (1 − 1.5356 + 0.6692)^{-2} = 6.47.

For this highly correlated process the variance of the mean is about five times that of an uncorrelated time series with the same variance.  ▲

6.4. ESTIMATION OF THE CROSS COVARIANCES

In our Section 1.7 discussion of vector valued time series we introduced the k × k


Table 6.3.2. Estimated Autocorrelations, Quarterly U.S. Seasonally Adjusted Unemployment Rate, 1948-1972

Lag h    Estimated Correlations    1.96n^{-1}(n − h)^{1/2}    Correlations for Second Order Process

1.oooO 0.9200 0.7436 0.5348 0.3476 0.2165 0.1394 0.0963 0.0740 0.0664 0.0556 0.0352 0.0109

-0.0064 -0.0135

O.OOO4 0.0229 0.0223

-0.0126 -0.0762 -0.1557 -0.235 1 -0.2975 -0.3412 -0.3599 -0.3483 -0.3236 -0.3090 -0.3142 -0.3299 -0.33% -0.3235 -0.2744 -0.2058 -0.1378 -0.0922 -0.0816 -0.1027 -0.1340 -0.1590 -0.1669

- 0.1950 0.1940 0.1930 0.1920 0.1910 0.1900 0.1890 0.1880 0. I870 0.1859 0.1849 0.1839 0.1828 0.1818 0. I807 0.17% 0.1786 0.1775 0.1764 0.1753 0,1742 0.1731 0.1720 0.1709 0.1697 0.1686 0.1675 0.1663 0. I652 0.1640 0.1628 0.1616 0.1604 0.1592 0.1580 0.1568 0.1555 0,1543 0.1531 0.1518

I *oooo 0.9200 0.7436 0.5262 0.3 105 0.1247

-0.0164 -0.1085 -0.1557 -0.1665 -0.1515 -0.1212 -0.0848 -0.0490 -0.0 186

0.0043 0.0190 0.0263 0.0277 0.0249 0.0197 0.0136 0.0077 0.0027 -0.0010 -0.0033 -0.0044 -0.0046 -0.0041 -0.0032 -0.0022 -0.0012 -0.oo04

O.OOO2 O.OOO6 0.0007 O.OOO8 0.0007 0.0005 O.oo04 o.ooo2


covariance matrix

Γ(h) = E{(X_t − μ)(X_{t+h} − μ)'},

where X_t is a stationary k-dimensional time series and μ = E{X_t}. The jth diagonal element of the matrix is the autocovariance of X_{jt}, γ_{jj}(h) = E{(X_{jt} − μ_j)(X_{j,t+h} − μ_j)}, and the ijth element is the cross covariance between X_{it} and X_{jt},

γ_{ij}(h) = E{(X_{it} − μ_i)(X_{j,t+h} − μ_j)}.

Expressions for the estimated cross covariance analogous to those of (6.2.1) and (6.2.3) are

γ̃_{ij}(h) = (n − h)^{-1} Σ_{t=1}^{n-h} X_{it} X_{j,t+h}   (6.4.1)

for the means known and taken to be zero, and

γ̂_{ij}(h) = n^{-1} Σ_{t=1}^{n-h} (X_{it} − x̄_{i,n})(X_{j,t+h} − x̄_{j,n}),   (6.4.2)

where the unknown means are estimated by x̄_{i,n} = n^{-1} Σ_{t=1}^n X_{it}. By (1.7.4), we can also write γ_{ij}(h) = γ_{ji}(−h).

The corresponding estimators of the cross correlations are

r̃_{ij}(h) = [γ̃_{ii}(0) γ̃_{jj}(0)]^{-1/2} γ̃_{ij}(h)

and

r̂_{ij}(h) = [γ̂_{ii}(0) γ̂_{jj}(0)]^{-1/2} γ̂_{ij}(h).

By our earlier results (see Theorem 6.2.2), the estimation of the mean in estimator (6.4.2) introduces a bias that is O(n^{-1}) for time series with absolutely summable covariance function. The properties of the estimated cross covariances are analogous to the properties of the estimated autocovariances given in Theorems 6.2.1 and 6.2.3. To simplify the derivation, we only present the results for normal time series.
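A computational sketch (not from the text) of the mean-corrected cross covariance and cross correlation estimators just defined; the function names and the simulated delayed series are illustrative assumptions.

```python
import numpy as np

def cross_cov(x1, x2, h):
    # Divisor-n, mean-corrected cross covariance; negative h handled via gamma_12(h) = gamma_21(-h).
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n = len(x1)
    d1, d2 = x1 - x1.mean(), x2 - x2.mean()
    if h >= 0:
        return np.sum(d1[: n - h] * d2[h:]) / n
    return np.sum(d1[-h:] * d2[: n + h]) / n

def cross_corr(x1, x2, h):
    denom = np.sqrt(cross_cov(x1, x1, 0) * cross_cov(x2, x2, 0))
    return cross_cov(x1, x2, h) / denom

rng = np.random.default_rng(4)
e = rng.normal(size=300)
x1 = e[1:]                                         # X_{1,t} = e_t
x2 = 0.8 * e[:-1] + 0.5 * rng.normal(size=299)     # X_{2,t} echoes X_1 with a one-period delay
print([round(cross_corr(x1, x2, h), 2) for h in range(-2, 3)])   # peak expected at h = 1
```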


Theorem 6.4.1. Let the bivariate stationary normal time series X, be such that

Then

where the remainder term enters because the mean is estimated. Evaluating the expectation, we have

. * - h n - a

Using Lemma 6.2.1 to take the limit, we have the stated result. A

Since the cross comlations are simple functions of the cross covariauces, we can obtain a similar expression for the covariance of the estimated cross correlations.

Corollary 6.4.1.1. Given the bivariate stationary n o d time series of Theorem 6.4.1,


Proof. By Theorem 5.5.1 we may use the first term in Taylor's series to obtain the leading term in the covariance expression. Evaluating Cov{γ̂_{12}(h), γ̂_{12}(q)}, Cov{γ̂_{12}(h), ½[γ̂_{11}(0) + γ̂_{22}(0)]}, Cov{γ̂_{12}(q), ½[γ̂_{11}(0) + γ̂_{22}(0)]}, and Var{½[γ̂_{11}(0) + γ̂_{22}(0)]} by the methods of Theorem 6.2.1, we obtain the conclusion.  ▲

Perhaps the most important aspect of these rather cumbersome results is that the covariances are decreasing at the rate n^{-1}. Also, certain special cases are of interest. One working hypothesis is that the two time series are uncorrelated. If X_{1t} is a sequence of independent normal random variables, we obtain a particularly simple result.

Corollary 6.4.1.2. Let X_t be a bivariate stationary normal time series satisfying

and

γ_{12}(h) = 0, all h. Then

In the null case, the variance of the estimated cross correlation is approximately n^{-1}, and the covariance between estimated cross correlations is the autocorrelation of X_{2t} multiplied by n^{-1}. If the two time series are independent and neither time series is autocorrelated, then the estimated cross correlations are uncorrelated with an approximate variance of n^{-1}.

It is possible to demonstrate that the sample covariances are consistent estimators under much weaker conditions.

Lemma 6.4.1. Let

where {a,} and {B,} are absolutely summable and {e,} = {(el,, e2,)'} is a sequence of independent (0, S) random variables. Assume E{(e, ,(2+S} < L for i = 1,2 and some


S >O, or that the el are identically distributed. Then

n-h

Proof. Define

fix h, and consider

If j # i - h,

where a: is the variance of eil. If j = i - h and u12 = E{ellea}, then

by the weak law of large numbers. [See, for example, Chug (1968, p. lo$).] Now

By Chebyshev's inequality

and it follows that


uniformly in n. Convergence in probability implies convergence in distribution, and by an application of Lemma 6.3.1 we have that R Z,,, XIrX2 , r+k converges

A - 1 n h

in distribution to the constant ~ , ~ , ( h ) . The result follows by Lemma 5.2.1.
Theorem 6.4.2. Let {e_{1t}} and {X_t} be independent time series, where {e_{1t}} is a sequence of independent (0, σ_1²) random variables with uniformly bounded third moment and {X_t} is defined by

X_t = Σ_{j=0}^∞ α_j e_{2,t-j},

where Σ_{j=0}^∞ |α_j| < ∞ and {e_{2t}} is a sequence of independent (0, σ_2²) random variables with uniformly bounded third moment. Then, for fixed h > 0,

Proof. We write

and note that, by Lemma 6.4.1,

For fixed h, the time series

is weakly stationary with

and bounded third moment. Asymptotic normality follows by a modest extension of Theorem 6.3.3.  ▲

Example 6.4.1. In Table 6.4.1 we present the sample autocorrelations and


Table 6.4.1. Sample Correlation Functions for Suspended Sediment in Des Moines River at Boone, Iowa and Saylorville, Iowa

h    Autocorrelation Boone r̂_11(h)    Autocorrelation Saylorville r̂_22(h)    Cross Correlation Boone-Saylorville r̂_12(h)

h ? , I (h) +& +,&) - 12 0.20 0.10 0.10 -11 0.19 0.13 0.07 -10 0.22 0.16 0.08 -9 0.25 0.16 0.08 -8 0.29 0.16 0.13 -7 0.29 0.17 0.18 -6 0.29 0.15 0.21 -5 0.32 0.18 0.21 -4 0.39 0.27 0.23 -3 0.48 0.42 0.30 -2 0.62 0.60 0.40 -1 0.76 0.8 1 0.53

0 1 .oo 1 .oo 0.64 1 0.76 0.81 0.74 2 0.62 0.60 0.67 3 0.48 0.42 0.53 4 0.39 0.27 0.42 5 0.32 0.18 0.32 6 0.29 0.15 0.26 7 0.29 0.17 0.26 8 0.29 0.16 0.25 9 0.25 0.16 0.29

10 0.22 0.16 0.3 1 11 0.19 0.13 0.28 12 0.20 0.10 0.33

cross correlations for the bivariate time series Y_t = (Y_{1t}, Y_{2t})', where Y_{1t} is the logarithm of suspended sediment in the water of the Des Moines River at Boone, Iowa, and Y_{2t} is the logarithm of suspended sediment in the water at Saylorville, Iowa. Saylorville is approximately 48 miles downstream from Boone. The sample data were 205 daily observations collected from April to October 1973.

There are no large tributaries entering the Des Moines River between Boone and Saylorville, and a correlation between the readings at the two points is expected. Since Saylorville is some distance downstream, the correlation pattern should reflect the time required for water to move between the two points. In fact, the largest sample cross correlation is between the Saylorville reading at time t + 1 and the Boone reading at time t. Also, estimates of γ_{12}(h), h > 0, are consistently larger than the estimates of γ_{12}(−h).

The Boone time series was discussed in Section 4.5. There it was assumed that


the time series Y_{1t} could be represented as the sum of the "true process" and a measurement error. The underlying true value X_{1t} was assumed to be a first order autoregressive process with parameter 0.81. If we define the time series

W_{1t} = Y_{1t} − 0.81 Y_{1,t-1},  W_{2t} = Y_{2t} − 0.81 Y_{2,t-1},

the transformed true process for Boone is a sequence of uncorrelated random variables, although the observed time series W_{1t} will show a small negative first order autocorrelation.

Table 6.4.2 contains the first few estimated correlations for W_t. Note that the estimated cross correlation at zero is quite small. Under the null hypothesis that the cross correlations are zero and that the autocorrelations of W_{1t} and W_{2t} are zero

Table 6.4.2. Sample Correlation Functions for Transformed Suspended Sediment in Des Moines River at Boone, Iowa and Saylorville, Iowa

h    Autocorrelation W_{1t}    Autocorrelation W_{2t}    Cross Correlation W_{1t} with W_{2t}

-12 -11 - 10 -9 -8 -7 -6 -5 -4 -3 -2 -1

0 1 2 3 4 5 6 7 8 9

10 11 12

0.05 -0.06

0.03 0.01 0.11 0.05

-0.03 0.04

-0.01 -0.07

0.05 -0.17

1 .oo -0.17

0.05 -0.07 -0.01

0.04 -0.03

0.05 0.11 0.01 0.03

-0.06 0.05

0.05 0.02 0.08 0.01 0.04 0.07

-0.06 -0.12 -0.07

0.02 -0.04

0.15 1 .oo 0.15

- 0.04 0.02

-0.07 -0.12 -0.06

0.07 0.04 0.01 0.08 0.02 0.05

0.04 -0.07

0.05 0.11 0.03 0.08 0.10 0.03

-0.05 -0.01 -0.04

0.06 0.06 0.41 0.24

-0.02 0.01 0.04

-0.10 0.08

-0.08 0.05 0.16 0.08 0.0 1


after a lag of two, the estimated variance of the sample cross correlations is

205^{-1} [1 + 2(−0.17)(0.15) + 2(0.05)(−0.04)] = 0.0046.

Under these hypotheses, the estimated standard error of the estimated cross correlation is 0.068.

The hypothesis of zero cross correlation is rejected by the estimates r̂_{12}(1) and r̂_{12}(2), since they are several times as large as the estimated standard error. The two nonzero sample cross correlations suggest that the input-output model is more complicated than that of a simple integer delay. It might be a simple delay of over one day, or it is possible that the mixing action of moving water produces a more complicated lag structure.  ▲
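The following sketch (not from the text) reproduces the null-variance computation of this example; the lag-one and lag-two autocorrelations are those quoted for the transformed series.

```python
import numpy as np

# Sketch: Bartlett-type null variance of the estimated cross correlation,
# n^{-1} * [1 + 2 * sum_k rho11(k)*rho22(k)], using the values quoted in the text.
n = 205
rho11 = {1: -0.17, 2: 0.05}
rho22 = {1: 0.15, 2: -0.04}
var_null = (1 + 2 * sum(rho11[k] * rho22[k] for k in (1, 2))) / n
print("variance:", round(var_null, 4), " standard error:", round(np.sqrt(var_null), 3))
# Reproduces the values 0.0046 and 0.068 given in the example.
```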

REFERENCES

Section 6.1. Grenander and Rosenblatt (1957), Hannan (1970), Parzen (1958, 1962).
Section 6.2. Anderson (1971), Bartlett (1946, 1966), Hart (1942), Kendall (1954), Kendall and Stuart (1966), Marriott and Pope (1954), von Neumann (1941, 1942).
Section 6.3. Anderson (1959, 1971), Anderson and Walker (1964), Diananda (1953), Eicker (1963), Hannan and Heyde (1972), Hoeffding and Robbins (1948), Moran (1947).
Section 6.4. Bartlett (1966), Box, Jenkins, and Reinsel (1994), Hannan (1970).

EXERCISES

1. Let Y_t = μ + X_t, where X_t = e_t + 0.4e_{t-1} and the e_t are uncorrelated (0, σ²) random variables. Compute the variance of ȳ_n = n^{-1} Σ_{t=1}^n Y_t. What is lim_{n→∞} n Var{ȳ_n}?

2. Let the time series {Y_t: t ∈ (1, 2, ...)} be defined by

Y_t = α + ρY_{t-1} + e_t,

where Y_0 is fixed, {e_t: t ∈ (1, 2, ...)} is a sequence of independent (0, σ²) random variables, and |ρ| < 1. Find E{Y_t} and Var{Y_t}. Show that Y_t satisfies the conditions of Theorem 6.1.1. What value does the sample mean of Y_t converge to?


3. Let X_t = e_t + 0.5e_{t-1}, where the e_t are normal independent (0, σ²) random variables. Letting

4. Evaluate (6.2.4) for the first order autoregressive process

X_t = ρX_{t-1} + e_t,

where |ρ| < 1 and the e_t are normal independent (0, σ²) random variables.

5, Given the finite moving average M

where the e, are normal independent ( 0 , ~ ' ) random variables, is there a distance d = h - q such that %h) and vq) are uncomlated?

6. Use the realization (10.1.10) and equation (6.2.2) to construct the estimated (3 X 3) covariance matrix for a realization of size 3. Show that the resulting matrix is not positive definite. Use the fact that

form=0,1, ..., n - l , h = O , l , ..., n- l ,where , fo r j= l ,2 ,..., 2n-1,

to prove that (6.2.3.) yields an estimated covariance matrix that is always positive semidefinite.

7. Give the variance of x̄_n, n = 1, 2, ..., for {X_t: t ∈ (1, 2, ...)} defined by

X_t = μ + e_t − e_{t-1},

where {e_t: t ∈ (0, 1, 2, ...)} is a sequence of independent identically distributed (0, σ²) random variables. Do you think there is a function w_n such that w_n(x̄_n − μ) converges in distribution to a N(0, 1) random variable?


8. Prove the following result, which is used in Theorem 6.3.4. If the sequence (c,) is such that

then

n 3 m 1 6 1 6 n lim sup ( 2 c ; ) - ' c : = o . - 1

9. Prove Corollary 6.3.5.

10. The data in the accompanying table are the average weekly gross hours per production worker on the payrolls of manufacturing establishments (seasonally adjusted).

YeiU I II m Iv 1948 1949 I950 1951 1952 1953 1954 1955 1956 I957 1958 1959 1960 1961 1962 1963 1964 1%5 1966 1967 1968 1969 1970 1971 1972

40.30 39.23 39.70 40.90 40.63 41.00 39.53 40.47 40.60 40.37 38.73 40.23 40.17 39.27 40.27 40.37 40.40 41.27 41.53 40.60 40.60 40.57 40.17 39.80 40.30

40.23 38.77 40.27 40.93 40.33 40.87 39.47 40.73 40.30 39.97 38.80 40.53 39.87 39.70 40.53 40.40 40.77 41.10 41.47 40.43 40.63 40.73 39.87 39.39 40.67

39.97 33.23 40.90 40.43 40.60 40.27 39.60 40.60 40.27 39.80 39.40 40.20 39.63 39.87 40.47 40.50 40.67 41.03 41.33 40.67 40.83 40.63 39.73 39.77 40.67

39.70 39.30 40.97 40.37 41.07 39.80 39.90 40.93 40.47 39.13 39.70 40.03 39.07 40.40 40.27 40.57 40.90 41.30 41.10 40.70 40.77 40.53 39.50 40.07 40.87

Sources: Business Statistics, 1971, pp. 74 and 237, and Survey of Current Business, Jan. 1972, Jan. 1973. The quarterly data are the averages of monthly data.


(a) Estimate the covariance function γ(h), assuming the mean unknown.
(b) Estimate the correlation function ρ(h), assuming the mean unknown.
(c) Using large sample theory, test the hypothesis H_0: ρ(1) = 0, assuming ρ(h) = 0, h > 1.

11. Using Hart's (1942) tables for the percentage points of d_n, or Anderson's (1971) tables for r_u, obtain the percentage points for t_v = r_u(n + 1)^{1/2}(1 − r_u²)^{-1/2} for n = 10 and 15. Compare these values with the percentage points of Student's t with 13 and 18 degrees of freedom.

12 Using the truncation argument of Theorem 6.3.3, complete the proof of Theorem 6.4.2 by showing that

n-lr

(n ZIh

converges in distribution to a normal random variable.

13. Denoting the data of Exercise 10 by X_{1t} and that of Table 6.3.1 by X_{2t}, compute the cross covariance and cross correlation functions. Define

Y_{1t} = X_{1t} − 1.53X_{1,t-1} + 0.66X_{1,t-2},
Y_{2t} = X_{2t} − 1.53X_{2,t-1} + 0.66X_{2,t-2}.

Compute the cross covariance and cross correlation functions for (Y_{1t}, Y_{2t}). Plot the cross correlation function of (Y_{1t}, Y_{2t}). Compute the variance of r̂_{12}(h) under the assumption that Y_{1t} is a sequence of uncorrelated random variables. Plot the standard errors on your figure.

14. Let X, be a time series with positive continuous spectral density A(@). (a) Show that., given e > 0, one may define two moving average time series

W,, and W,, with spectral densities


(b) Let {uf: t = 1,2,. . .} be such that

See Exercise 4.15 and Grenander and Rosenblatt (1957, chapter 7).

15. Prove Lemma 6.3.2.

16. Let XI = Z;&, b,e,-/. Show that

17. Let U, be the stationary autoregressive moving average

Rove that if Xy=o bi = 0, then vVn} = ~ ( n - ~ ) .

18. Let X I be a covariance stationary time series satisfying

where E7eo 191 < 00, X7mo ‘yi # 0, and the el are uncomlated (0, cr’) random variables. Show that


19. Let m m


Rove

20. Let Y_t be a stationary time series with absolutely summable covariance function, and let

X_t = Σ_{j=0}^∞ α_j Y_{t-j},

where |α_j| < Mλ^j for some M < ∞ and 0 < λ < 1. Show that

21. Construct an alternative proof for Corollary 6.3.3 by averaging both sides of the defining equation to obtain

22. For a moving average of order q, show that n^{1/2} r̂(h) converges in distribution to N[0, 1 + 2 Σ_{k=1}^q ρ²(k)] for |h| > q.

23. Prove the following.

Result 6.3.1. Let e_t be a sequence of random variables, let 𝒜_t be the sigma-field generated by {e_s: s ≤ t}, and assume the e_t satisfy
(a) E{(e_t, e_t² − σ², e_t⁴ − ησ⁴) | 𝒜_{t-1}} = (0, 0, 0) a.s.,
(b) plim_{n→∞} n^{-1} Σ_{t=1}^n E{e_t² e_{t-h} | 𝒜_{t-1}} = lim_{n→∞} n^{-1} Σ_{t=1}^n E{e_t² e_{t-h}} = 0 for every fixed h > 0,
(c) E{|e_t|^{4+δ}} < L < ∞ for some δ > 0.

Then, for any fixed K,

n^{1/2}[γ̃_e(0) − σ², γ̃_e(1), ..., γ̃_e(K)]' converges in distribution to N{0, σ⁴ diag(η − 1, 1, ..., 1)}.

24. Using Result 6.3.1 of Exercise 23, prove the following.

Result 6.3.2. Let Y_t = Σ_{j=0}^∞ b_j e_{t-j}, where the e_t satisfy the conditions of Result 6.3.1. Then, for any fixed K,

n^{1/2}[γ̂_Y(0) − γ_Y(0), γ̂_Y(1) − γ_Y(1), ..., γ̂_Y(K) − γ_Y(K)]' converges in distribution to N(0, V),

where the elements of V are defined by (6.2.4) of Theorem 6.2.1.


CHAPTER 7

The Periodogram, Estimated Spectrum

In this chapter we shall investigate estimators of the spectral density of time series with absolutely summable covariance function. The spectral density of a time series was defined in Section 4.1 as the Fourier transform of the covariance function. We also noted in Section 4.2 that the variances of the random variables defined by the Fourier coefficients of the original time series are, approximately, multiples of the spectral density. These two results suggest methods of estimating the spectral density.

The study of the Fourier coefficients was popular among economists in the 1920s and 1930s, and the student of economics may be interested in the discussion of Davis (1941) and the studies cited by Tintner (1952). Also see Granger and Hatanaka (1964) and Nold (1972). Recent applications are more common in the engineering literature. See, for example, Marple (1987).

7.1. THE PERIODOGRAM

Given a finite realization from a time series, we can represent the n observations by the trigonometric polynomial

X_t = ½ a_0 + Σ_{k=1}^m (a_k cos ω_k t + b_k sin ω_k t),

where

ω_k = 2πk/n,  k = 0, 1, 2, ..., m,

a_k = (2/n) Σ_{t=1}^n X_t cos ω_k t,  k = 0, 1, 2, ..., m,

(7.1.1)


b_k = (2/n) Σ_{t=1}^n X_t sin ω_k t,  k = 1, 2, ..., m,

and we have assumed n odd and equal to 2m + 1. As before, we recognize the Fourier coefficients as regression coefficients. We can use the standard regression analysis to partition the total sum of squares for the n observations. The sum of squares removed by the regression of X_t on cos ω_k t is the regression coefficient multiplied by the sum of cross products; that is, the sum of squares due to a_k is

(n/2) a_k²,   (7.1.2)

and the sum of squares removed by cos ω_k t and sin ω_k t is

(n/2)(a_k² + b_k²).   (7.1.3)

We might call the quantity in (7.1.3) the sum of squares associated with frequency ω_k. Thus the total sum of squares for the n = 2m + 1 observations may be partitioned into m + 1 components. One component is associated with the mean. Each of the remaining m components is the sum of the two squares associated with the m nonzero frequencies. The partition is displayed in Table 7.1.1.

Should the number of observations be even and denoted by 2m, there is only one regression variable associated with the mth frequency: cos πt. Then the sum of squares for the mth frequency has one degree of freedom and is given by n^{-1}(Σ_{t=1}^n X_t cos πt)² = (n/2) a_m².

One might divide all of the sums of squares of Table 7.1.1 by the degrees of

Table 7.1.1. Analysis of Variance Table for a Sample of Size n = 2m + 1

Source                      Degrees of Freedom    Sum of Squares
Mean
Frequency ω_1 = 2π/n
Frequency ω_2 = 4π/n
...
Frequency ω_m = 2mπ/n
Total


freedom and consider the mean squares. However, it is more common to investigate the sums of squares, multiplying the sums of squares with one degree of freedom by two. The function of frequency given by these normalized sums of squares is called the periodogram. Thus, the periodogram is defined by

I_n(ω_k) = (n/2)(a_k² + b_k²),  k = 1, 2, ..., m,   (7.1.4)

where m is the smallest integer greater than or equal to (n − 1)/2. Most computer programs designed to compute the periodogram for large data sets use an algorithm based on the fast Fourier transform. See Cooley, Lewis, and Welch (1967) for references on the fast Fourier transform. Singleton (1969) gives a Fortran program for the transform, and Bloomfield (1976, p. 61) discusses the procedure.

If {X_t} is a sequence of normal independent (0, σ²) random variables, then the a_k and b_k, being linear combinations of the X_t, will be normally distributed. Since the sine and cosine functions are orthogonal, the a_k and b_k are independent. In this case those entries in Table 7.1.1 with two degrees of freedom divided by σ² are distributed as independent chi-squares with two degrees of freedom.

The periodogram may also be defined in terms of the original observations as

I_n(ω_k) = (2/n)[(Σ_{t=1}^n X_t cos ω_k t)² + (Σ_{t=1}^n X_t sin ω_k t)²],  k = 0, 1, ..., m.   (7.1.5)

Note that if we define the complex coefficients c_k by

c_k = n^{-1} Σ_{t=1}^n X_t e^{-iω_k t},

then

I_n(ω_k) = 2n |c_k|².   (7.1.6)

Thus, the periodogram ordinate at ω_k is a multiple of the squared norm of the complex Fourier coefficient of the time series associated with the frequency ω_k.
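A minimal sketch (not from the text) computing the periodogram ordinates (7.1.4)-(7.1.5) through the discrete Fourier transform, in the spirit of the fast Fourier transform discussion above; the simulated series and function name are illustrative assumptions.

```python
import numpy as np

def periodogram(x):
    # I_n(w_k) = (2/n)*|sum_t X_t exp(-i w_k t)|^2 at the Fourier frequencies w_k = 2*pi*k/n,
    # which equals (n/2)*(a_k^2 + b_k^2) of (7.1.4).
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = (n - 1) // 2 if n % 2 else n // 2
    dft = np.fft.fft(x)                      # dft[k] = sum_t x_t exp(-2*pi*i*k*t/n)
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    ordinates = (2.0 / n) * np.abs(dft[1: m + 1]) ** 2
    return freqs, ordinates

x = np.random.default_rng(5).normal(size=101)
freqs, I = periodogram(x)
# For unit-variance white noise, E{I_n(w_k)} is approximately 4*pi*f(w) = 2 (Theorem 7.1.1).
print(round(I.mean(), 3))
```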

The periodogram is also expressible as a multiple of the Fourier transform of the estimated covariance function. If μ ≠ 0, we can write


where μ = E{X_t}. Therefore,

for k = 1, 2, ..., m. In this double sum there are n combinations with t − j = 0, n − 1 combinations with t − j = 1, and so forth. Therefore, by letting p = t − j, we obtain the following result.

Result 7.1.1. The kth periodogram ordinate is given by

(7.1.7)

(7.1.8)

The coefficients a_k and b_k (k ≠ 0) that are computed using X_t − x̄_n are identical to those computed using X_t. Therefore, by substituting X_t − x̄_n for X_t in (7.1.5), we can also write

where k ≠ 0 and

(7.1.10)


The function f ( w ) of equation (7.1.8) is a continuous periodic function of w defined for all w. The periodogram has been defined only for the discrete set of points 0, = 2nk/n, k = 0 , 1 , 2 , . . . , m. In investigating the limiting properties of the periodogram, it is convenient to have a function defined for all w E [0, w]. To this end we introduce the function

$$K(n, \omega) = k \quad \text{for} \quad \frac{\pi(2k-1)}{n} < \omega \le \frac{\pi(2k+1)}{n}, \qquad k = 0, \pm 1, \pm 2, \ldots,$$

and take

$$I_n(\omega) = I_n(\omega_{K(n,\omega)}).$$

Thus, I_n(ω) for ω ∈ [0, π] is a step function that takes the value I_n(ω_k) on the interval (π(2k − 1)/n, π(2k + 1)/n].

The word periodogram is used with considerable flexibility in the literature. Despite the apparent conflict in terms, our definition of the periodogram as a function of frequency rather than of period is a common one. We have chosen to define the periodogram for the discrete frequencies ω_k = 2πk/n, k = 0, 1, 2, ..., m, and to extend it to all ω as a step function. An alternative definition of the periodogram is 4πf̂(ω), where f̂(ω) is given in (7.1.8), in which case one automatically has a function defined for all ω. Our definition will prove convenient in obtaining the limiting properties of estimators of the spectrum.

The distributional properties of the periodogram ordinates am easily obtained when the time series is n o d white noise. To establish the promies of the periodogram for other time series, we first obtain the limiting value of the expectation of I,(&) for a time series with absolutely summable covariance function.

Theorem 7.1.1. Let X_t be a stationary time series with E{X_t} = μ and absolutely summable covariance function. Then
$$\lim_{n\to\infty} E\{I_n(\omega)\} = 4\pi f(\omega), \qquad \omega \ne 0,$$
$$\lim_{n\to\infty} E\{I_n(0) - 2n\mu^2\} = 4\pi f(0), \qquad \omega = 0.$$

Proof. Since E{γ̂(p)} = γ(p), it follows from (7.1.7) that

Now


and the latter sum goes to zero as n → ∞ by Lemma 3.1.4. The sequence $g_n(\omega) = 2\sum_{h=-(n-1)}^{n-1}\left(1 - n^{-1}|h|\right)\gamma(h)\cos \omega h$ converges uniformly to 4πf(ω) by Corollary 3.1.8, and ω_{K(n,ω)} converges to ω by construction. A

The normalized coefficients $2^{-1/2}n^{1/2}a_k$ and $2^{-1/2}n^{1/2}b_k$ are the random variables obtained by applying to the observations the transformation discussed in Section 4.2. Therefore, for a time series with absolutely summable covariance function, these random variables are in the limit uncorrelated and have variance given by a multiple of the spectral density evaluated at the associated frequency.

The importance of this result is difficult to overemphasize. For a wide class of time series we are able to transform an observed set of n observations into a set of n statistics that are nearly uncorrelated. All except the first, the sample mean, have zero expected value. The variance of these random variables is, approximately, a simple function of the spectral density.

Theorem 7.1.2. Let X_t be a time series defined by

where {a_j} is absolutely summable and the e_t are independent identically distributed (0, σ²) random variables. Let f(ω) be positive for all ω. Then, for ω and λ in (0, π) and ω ≠ λ, the sequences [2πf(ω)]^{-1} I_n(ω) and [2πf(λ)]^{-1} I_n(λ) converge in distribution to independent chi-square random variables, each with two degrees of freedom.

proof. Consider

uniformly in n. Fixing r, we have


where

and R, converges in probability to zero. As a,,, is uniformly bounded by (say) M, we have for E > 0

n r

and n-I1* Xr= SnCu,et converges in distribution to a normal random variable by the Linhberg condition. The asymptotic normality of 2- n uK(n,a,) follows by Lemma 6.3.1. The same arguments hold for a linear combination of n“2uK(n,r),

w # A, converges in distribution to a multivariate normal random variable. The covariance matrix is given by Theorem 4.2.1 and is

t / 2 112

I I 2 112 112 112 n112bK(n,(n.w)’ uK(n,A)y ’ K ( n . W so that 2- IQK(n,w)> ’K(n.o)* uK(n.A)* b K ( n . l ) l ,

Hence, by Theorem 5.2.4, the limiting distribution of $I_n(\omega)/2\pi f(\omega)$ and $I_n(\lambda)/2\pi f(\lambda)$ is that of two independent chi-square random variables with two degrees of freedom. A

Thus, for many nonnormal processes, we may treat the periodogram ordinates as multiples of chi-square random variables. If the original time series is a sequence of independent (0, σ²) random variables, then the periodogram ordinates all have the same expected value. However, for a time series with a nonzero autocorrelation structure, the ordinates will have different expected values. These facts have been used in constructing tests based on the periodogram.

Perhaps it is most natural to use the periodogram to search for “cycles” or “periodicities” in the data. For example, let us hypothesize that a time series is well represented by

$$X_t = \mu + A\cos \omega t + B \sin \omega t + e_t, \qquad (7.1.11)$$

where the e_t are normal independent (0, σ²) random variables and A and B are fixed.


First assume that ω is known and of the form 2πk/n, where k is an integer. To test the hypothesis that A = B = 0 against the alternative A ≠ 0 or B ≠ 0, we can use

$$F_{2,\,2m-2} = \frac{(2m-2)\,(a_k^2 + b_k^2)}{2\sum_{j=1,\, j \ne k}^{m} (a_j^2 + b_j^2)}\,,$$

where F_{2,2m−2} has the F-distribution with 2 and 2m − 2 degrees of freedom. Note that the sum of squares for the mean is not included in the denominator, since we postulated a general mean μ. If ω cannot be expressed as 2πk/n, where k is an integer, then the regression associated with (7.1.11) can be computed and the usual regression test constructed.
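A minimal sketch of this F test, assuming the periodogram ordinates I_n(ω_1), ..., I_n(ω_m) have already been computed (Python with scipy for the F distribution; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def frequency_f_test(I, k):
    # Test A = B = 0 in model (7.1.11) at the postulated frequency w_k.
    # I holds the ordinates I_n(w_1), ..., I_n(w_m); k is 1-based.  Because
    # I_n(w_j) is proportional to a_j^2 + b_j^2, ratios of ordinates equal
    # ratios of the corresponding regression sums of squares.
    I = np.asarray(I, dtype=float)
    m = len(I)
    F = (I[k - 1] / 2.0) / ((I.sum() - I[k - 1]) / (2.0 * m - 2.0))
    p_value = stats.f.sf(F, 2, 2 * m - 2)
    return F, p_value
```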

We sometimes believe a time series mntains a periodic component, but are unwilling to postulate the periodic function to be a perfect sine wave. For example, we may feel that a monthly time series contains a seasonal compoIlent, a large portion of which is because of a high value for December. We know that any periodic function defined on the integers, with integral period H, can be represented by

$$\frac{a_0}{2} + \sum_{k=1}^{L[H]}\left(a_k \cos \frac{2\pi k}{H}\,t + b_k \sin \frac{2\pi k}{H}\,t\right),$$

where L[H] is the largest integer less than or equal to H/2 . For the monthly time series one might postulate

To test the hypothesis of no seasonal effect (i.e., all A_k and B_k equal zero), we form Snedecor's F as the ratio of the mean square for the six seasonal frequencies to the mean square for the remaining frequencies. Note that these tests assume e_t to be a sequence of normal independent (0, σ²) random variables.

The periodogram has also been used to search for "hidden periodicities." In the above examples we postulated the frequency or frequencies of interest and hence, under the null hypothesis, the ratio of the mean squares has the F-distribution. However, we might postulate the null model

$$X_t = \mu + e_t$$

and the alternative model

$$X_t = \mu + A\cos \omega t + B\sin \omega t + e_t,$$

where ω is unknown. In such a case one might search out the largest periodogram ordinate and ask if


this ordinate can reasonably be considered the largest in a random sample of size m selected from a distribution function that is a multiple of a chi-square with two degrees of freedom. A statistic that can be used to test the hypothesis is

where I_n(L) is the largest periodogram ordinate in a sample of m periodogram ordinates each with two degrees of freedom. Fisher (1929) demonstrated that, for g > 0,

where ξ is constructed from the periodogram of a sequence of normal independent (μ, σ²) random variables and k is the largest integer less than g^{-1}. A table of the distribution of ξ is given by Davis (1941). Wilks (1962, p. 529) contains a derivation of the distribution. In Table 7.1.2 we give the 1, 5, and 10 percentage points for the distribution.

Under the null hypothesis that a time series is normal white noise, the periodogram ordinates are multiples of independent chi-squares, each with two degrees of freedom. Hence, any number of other "goodness of fit" tests are available to test the hypothesis of independence. Bartlett (1966, p. 318) suggested a test based on the normalized cumulative periodogram

(7.1.12)

The normalized cumulative periodogram for k = 1, 2, ..., m − 1 has the same distribution function as that of an ordered sample of size m − 1 selected from the uniform (0, 1) distribution. Therefore, if we plot the normalized periodogram as a sample distribution function and apply the Kolmogorov-Smirnov test of the hypothesis that it is a sample distribution function for a sample of m − 1 selected from a uniform (0, 1) distribution, we have a test of the hypothesis that the original time series is white noise. This testing procedure has been discussed by Durbin (1967, 1969).
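The two test statistics just described can be sketched as follows (Python/numpy; the function names are illustrative and not from the text):

```python
import numpy as np

def largest_ordinate_ratio(I):
    # Ratio of the largest periodogram ordinate to the average ordinate,
    # the statistic whose percentage points appear in Table 7.1.2.
    I = np.asarray(I, dtype=float)
    return I.max() / I.mean()

def cumulative_periodogram(I):
    # Bartlett's normalized cumulative periodogram (7.1.12) and a
    # Kolmogorov-Smirnov-type distance from the uniform (0, 1) c.d.f.
    I = np.asarray(I, dtype=float)
    m = len(I)
    s = np.cumsum(I) / I.sum()
    u = s[:m - 1]                       # behaves like ordered uniforms under white noise
    N = m - 1
    i = np.arange(1, N + 1)
    D = max(np.max(i / N - u), np.max(u - (i - 1) / N))
    bound_5pct = 1.36 / np.sqrt(N)      # large-sample 5% point (Birnbaum)
    return s, D, bound_5pct
```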

Example 7.1.1. In Section 9.2 a grafted quadratic trend is fitted to United States wheat yields ffom 1908 to 1971. The periodogram computed for the deviations from the trend is given in Table 7.1.3. As we are working with deviations from regression, the distributions of the test statistics are only approximately those discussed above. Durbii (1%9) has demonstrated that the critical points for the Kolmogorov-Smirnov-type test statistic using deviations from regression differ from those for an unaltered time series by a quantity that is o(n-').


Table 7.1.2. Percentage Points for the Ratio of Largest Periodogram Ordinate to the Average

                        Probability of a Larger Value
Number of Ordinates      0.10      0.05      0.01
   2                     1.900     1.950     1.990
   3                     2.452     2.613     2.827
   4                     2.830     3.072     3.457
   5                     3.120     3.419     3.943
   6                     3.354     3.697     4.331
   7                     3.552     3.928     4.651
   8                     3.722     4.125     4.921
   9                     3.872     4.297     5.154
  10                     4.005     4.450     5.358
  15                     4.511     5.019     6.103
  20                     4.862     5.408     6.594
  25                     5.130     5.701     6.955
  30                     5.346     5.935     7.237
  40                     5.681     6.295     7.663
  50                     5.937     6.567     7.977
  60                     6.144     6.785     8.225
  70                     6.317     6.967     8.428
  80                     6.465     7.122     8.601
  90                     6.595     7.258     8.750
 100                     6.711     7.378     8.882
 150                     7.151     7.832     9.372
 200                     7.458     8.147     9.707
 250                     7.694     8.389     9.960
 300                     7.886     8.584    10.164
 350                     8.047     8.748    10.334
 400                     8.186     8.889    10.480
 500                     8.418     9.123    10.721
 600                     8.606     9.313    10.916
 700                     8.764     9.473    11.079
 800                     8.901     9.612    11.220
 900                     9.022     9.733    11.344
1000                     9.130     9.842    11.454

To test the hypothesis that the largest ordinate is the largest in a random sample of 31 estimates, we form the ratio of the fourth ordinate to the average of ordinates 1 to 31:


Table 7.1.3. Periodogram of Deviations of United States Wheat Yields from Trend

 k    Period in Years    Periodogram Ordinate
 1        64.0                 0.54
 2        32.0                 5.15
 3        21.3                 4.74
 4        16.0                53.50
 5        12.8                 9.86
 6        10.7                 3.02
 7         9.1                 3.20
 8         8.0                 2.62
 9         7.1                 3.56
10         6.4                 2.75
11         5.8                 7.99
12         5.3                 5.89
13         4.9                 2.08
14         4.6                 2.61
15         4.3                 3.44
16         4.0                 1.47
17         3.8                 1.44
18         3.6                 0.12
19         3.4                 8.77
20         3.2                 1.03
21         3.0                 4.00
22         2.9                 0.41
23         2.8                 0.10
24         2.7                17.03
25         2.6                 5.73
26         2.5                 0.15
27         2.4                 1.90
28         2.3                 3.53
29         2.2                 1.91
30         2.1                 6.64
31         2.1                 1.96
32         2.0                12.50

$$\frac{53.50}{5.391} = 9.92.$$

From Table 7.1.2 we see that the 1% point for this ratio is about 7.28. Thus, the null hypothesis is rejected at this level. While we may be somewhat reluctant to accept the existence of a perfect sine cycle of length 16 years on the basis of 64 observations, it is unlikely that the deviation from trend is a white noise time series.

The cumulative periodogram for the wheat yield data is displayed in Figure



Figure 7.1.1. Normalized cumulative periodogram for deviations of wheat yield from trend.

7.1.1. We have followed the common practice of plotting the cumulative periodogram against k. The value of the cumulative periodogram for k is plotted on the interval (k − 1, k]. The upper and lower 5% bounds for the Kolmogorov-Smirnov test have been drawn in the figure. The lines are k/31 + 0.245 and k/31 − 0.245, where 0.245 is the 5% point of the Kolmogorov-Smirnov statistic for sample size 31. The tables of Birnbaum (1952) indicate that for m − 1 > 30, the 95% point for the Kolmogorov-Smirnov statistic is approximately 1.36(m − 1)^{-1/2} and the 99% point is approximately 1.63(m − 1)^{-1/2}. In constructing the cumulative periodogram we included all 32 ordinates, even though the last ordinate has only one degree of freedom. Since the normalized cumulative periodogram passes above the upper 5% line, the data reject the hypothesis of independence, primarily because of the large ordinate at k = 4. AA

7.2. SMOOTHING, ESTIMATING THE SPECTRUM

It is clear from the development of the distributional properties of the periodogram that increasing the sample size has little effect on the behavior of the estimated ordinate for a particular frequency. In fact, if the time series is normal (0, 1) white noise, the distribution of the periodogram ordinate is a two-degree-of-freedom chi-square independent of sample size. The number of periodogram ordinates


increases as the sample size increases, but the efficiency of the estimator for a particular fresumcy remains unchanged.

If the spectral density is a continuous function of w, it is natural to consider an average of local values of the periodogram to obtain a better estimate of the spectral density. Before treating such estimators, we present an additional result on the covariance properties of the periodogram.

In investigating local averages of the periodogram we find it convenient to make a somewhat stronger assumption about the rate at which the covariance function approaches zero. We shall assume that the covariances are such that
$$\sum_{h=-\infty}^{\infty} |h|\,|\gamma(h)| < \infty.$$

This is a fairly modest assumption and would be satisfied, for example, by any stationary finite autoregressive process. A sufficient condition for

for a time series X, is that

where m

and {el} is a sequence of uncorrelated (0, a2) random variables. This is because

rn m

Before giving a theorem on the covariance properties of the periodogram, we present two lemmas useful in the proof.




where we have used Lemma 7.2.1. The sine result follows in a completely analogous manner. A

Theorem 7.2.1. Let the time series X_t be defined by

where the e_t are independent (0, σ²) random variables with fourth moment ησ⁴ and

Then

Furthermore, for the sequence composed only of even-sized samples,

Var{In(r)} = 2(4~)7~(7r) + 41).

Proop. By the definition (7.1.6) of the periodogram, we have

and

(7.2.1)


where we have used (6.2.5) and a, = 0, r < 0. Now,

4 n: -

by the abswte summability of 9. If q, = 9 = 0, or if n is even and q = 9 = n, the second term of (7.2.1) is

For wk = 9 $0, a the second term of (7.2.1) is

where the inequality follows from Lemma 7.2.2. For w, = 9 the third term reduces to

By Lemma 7.2.2, for w, # 9,

and the absolute value of the second term shown above is less than w' Y - I p = - ( n - l ) /p/lHp)1J2, which, by assumption, is an-'). By a similar argument, the third term of (7.2.1) is O(n-') when w, # q. Thus, if q, = 9 = 0 ZO, Ir,


If q = % = O , then

" - 1 2

2 h - - ( n - t ) z: 41(h) ] =2(4v)2f2(O),

and the stated results follow. A

Corollary 7.2.1. Let X_t be the time series defined in Theorem 7.2.1. In addition, assume the e_t are NI(0, σ²) random variables and that

m

Then

cOv{ln(q), lfl(q)) = O(nv2) B q @k

Proof. Under normality, the first term of (7.2.1) is zero because 71 = 3 for the normal distribution. Under the assumption that XymI j lql< *, we have

m

and the remaining two terms of (7.2.1) are See Exercise 7.9. A

Since the covariances between periodogram ordinates are small, the variance of the periodogram estimator of the spectral density at a particular frequency can be reduced by averaging adjacent periodogram ordinates. The simplest such estimator is defined by

where

In general, we consider the linear function of the periodogram ordinates
$$\hat{f}(\omega_k) = \sum_{j=-d}^{d} W(j)\,(4\pi)^{-1} I_n(\omega_{k+j}), \qquad (7.2.2)$$


where $\sum_{j=-d}^{d} W(j) = 1$. The weight function W(j) is typically symmetric about zero with a maximum at zero.

We may extend f̂(ω_k) to all ω by defining f̂(ω) = f̂(ω_{K(n,ω)}), where the function K(n, ω) was introduced in Section 7.1. In practice the values of f̂(ω_k) are often connected by lines to obtain a continuous function of ω.
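A sketch of the local averaging in (7.2.2), with rectangular weights as the default (Python/numpy; the names are illustrative, and end effects are handled only by restriction to interior frequencies):

```python
import numpy as np

def smoothed_periodogram(I, d, weights=None):
    # Average 2d+1 adjacent periodogram ordinates, as in (7.2.2), and divide by
    # 4*pi to estimate the spectral density f(w_k).  Rectangular weights are the
    # default; `weights` may supply a general W(j), j = -d, ..., d, summing to one.
    # Only interior ordinates are smoothed here; near 0 and pi the even periodic
    # extension of I_n described in the text would be used.
    I = np.asarray(I, dtype=float)
    if weights is None:
        weights = np.full(2 * d + 1, 1.0 / (2 * d + 1))
    m = len(I)
    est = np.array([np.sum(weights * I[k - d:k + d + 1]) for k in range(d, m - d)])
    return est / (4.0 * np.pi)
```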

Theorem 7.2.2. Let the time series XI be defined by

where the e, are independent (0, cr2) random variables witb fourth moment qu4 and

Let d_n be an increasing sequence of positive integers satisfying
$$\lim_{n\to\infty} d_n = \infty, \qquad \lim_{n\to\infty} \frac{d_n}{n} = 0.$$

Let the weight function Wn(.j), j = 0 , 2 1.22,. . . , -+-tin, satisfy

Then .&K(n,w)) defined by (7.2.2) satisfies

Proof. Now


Sinceflw) is u n i f d y continuous, given > 0, there exists an N such that for " 9 -fTo)l < for in the [wlp(i,w-Z3rd,/n)9 wK(n,e,+20d,/n)l'

Therefore,

By Theorem 7.2.1, for 2n-dn/n S o G ?r - 2wdnfn,

and, by the argument used for the expectation,

If o = 0 or w, only dn + 1 (or d,,) estimates i(q) are averaged, siace, for example, f ~ n - / n ) = j(-~n-/n). merefore,

VarNO)} = W,2(0)f2(O)

and the result follows. A

Corollary 7.2.2. Let the assumptions of Theorem 7.2.2 be satisfied with ~ , ( j ) = (un + I)-' . Then

Perhaps it is worthwhile to pause and summarize our results for the periodogram estimators. First, the periodogram ordinates I_n(ω_k) are the sums of squares associated with sine and cosine regression variables for the frequency ω_k. For a


wide class of time series the I_n(ω_k) are approximately independently distributed as $[2\pi f(\omega_k)]\chi_2^2$, that is, as a multiple of a chi-square with two degrees of freedom (Theorems 7.1.1 and 7.2.1).

If f(ω) is a continuous function of ω, then, for large n, adjacent periodogram ordinates have approximately the same mean and variance. Therefore, an average of 2d + 1 adjacent ordinates has approximately the same mean and a variance (2d + 1)^{-1} times that of the original ordinates. It is possible to construct a sequence of estimators based on realizations of increasing size wherein the number of adjacent ordinates being averaged increases (at a slower rate than n) so that the average, when divided by 4π, is a consistent estimator of f(ω) (Theorem 7.2.2). The consistency result is less than fully appealing, since it does not tell us how many terms to include in the average for any particular time series. Some general conclusions are possible. For most time series the average of the periodogram ordinates will be a biased estimator of 4πf(ω). For the largest portion of the range of most functions, this bias will increase as the number of terms being averaged increases. On the other hand, we can expect the variance of the average to decline as additional terms are added. (See Exercise 7.11.) Therefore, the mean square error of our average as an estimator of the spectral density will decline as long as the increase in the squared bias is less than the decrease in the variance. The white noise time series furnishes the limiting case. Since the spectral density is a constant function, the best procedure is to include all ordinates in the average [i.e., to use γ̂(0) to estimate 2πf(ω) for all ω]. For a time series of known structure we could determine the optimum number of terms to include in the weight function. However, if we possess that degree of knowledge, the estimation problem is no longer of interest. The practitioner, as he works with data, will develop certain rules of thumb for particular kinds of data. For data with unknown structure it would seem advisable to construct several averages of varying length before reaching conclusions about the nature of the spectral density.

The, approximate distributional properties of the smoothed periodogmn can be used to construct a confidence interval for the estimated spectral density. Under the conditions of Theorem 7.2.2, the I,&) are approximately distributed as in- dependent chi-squares, and therefore Ao) is approximately distributed as a linear combination of chi-square random variables. One common approximation to such a distribution is a chi-square distribution with degrees of freedom determined by the variance of the distribution.

Result 7.2.1. Let X_t satisfy the assumptions of Theorem 7.2.2 and let f(ω) > 0. Then, for πd_n/n < ω < π(1 − d_n/n), f^{-1}(ω)f̂(ω) is approximately distributed as a chi-square random variable divided by its degrees of freedom ν, where
$$\nu = 2\left[\,\sum_{j=-d_n}^{d_n} W_n^2(j)\right]^{-1}.$$


An approximate 1 − α level confidence interval for f(ω) can be constructed as
$$\left(\nu \hat{f}(\omega)\left[\chi^2_{\nu,\alpha/2}\right]^{-1},\ \nu \hat{f}(\omega)\left[\chi^2_{\nu,1-\alpha/2}\right]^{-1}\right), \qquad (7.2.3)$$
where χ²_{ν,α/2} is the α/2 tabular value for the chi-square distribution with ν degrees of freedom. (Point exceeded with probability α/2.)

Since the variance of f̂(ω) is a multiple of f²(ω), the logarithm of f̂(ω) for time series with f(ω) strictly positive will have approximately constant variance,
$$\mathrm{Var}\{\log \hat{f}(\omega)\} \doteq [f(\omega)]^{-2}\,\mathrm{Var}\{\hat{f}(\omega)\} \doteq 2\nu^{-1}.$$
Therefore, log f̂(ω) is often plotted as a function of ω. Approximate confidence intervals for log f(ω) are given by
$$\log \hat{f}(\omega) + \log \frac{\nu}{\chi^2_{\nu,\alpha/2}} \le \log f(\omega) \le \log \hat{f}(\omega) + \log \frac{\nu}{\chi^2_{\nu,1-\alpha/2}}. \qquad (7.2.4)$$
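Assuming the chi-square approximation of Result 7.2.1 with ν = 2[Σ_j W_n²(j)]^{-1}, a sketch of the interval (7.2.4) follows (Python with scipy.stats; names are illustrative):

```python
import numpy as np
from scipy import stats

def log_spectrum_interval(fhat, weights, alpha=0.05):
    # Approximate (1 - alpha)-level interval (7.2.4) for log f(w), treating
    # nu * fhat / f as chi-square with nu = 2 / sum_j W_n(j)^2 degrees of freedom.
    nu = 2.0 / np.sum(np.asarray(weights, dtype=float) ** 2)
    upper_pt = stats.chi2.ppf(1.0 - alpha / 2.0, nu)   # exceeded with prob alpha/2
    lower_pt = stats.chi2.ppf(alpha / 2.0, nu)         # exceeded with prob 1 - alpha/2
    lower = np.log(fhat) + np.log(nu / upper_pt)
    upper = np.log(fhat) + np.log(nu / lower_pt)
    return lower, upper, nu

# With rectangular weights and d = 5 (eleven ordinates), nu = 22, consistent with
# the interval used for interior frequencies in Example 7.2.1.
```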

Example 7.2.1. As an illustration of spectral estimation, we consider an artificially generated time series. Table 7.2.1 contains 100 observations for the time series

$$X_t = 0.7X_{t-1} + e_t,$$

where the e_t are computer generated normal independent (0, 1) random variables. The periodogram for this sample is given in Figure 7.2.1. We have connected the ordinates with straight lines. The approximate expected value of the periodogram ordinates is $4\pi f(\omega) = 2(1.49 - 1.4\cos\omega)^{-1}$ and has also been plotted in the figure. Both the average value and the variance are much larger for ordinates associated with the smaller frequencies.
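The example can be imitated (though not with the exact data of Table 7.2.1) with the following Python/numpy sketch, which simulates the autoregressive series, computes the periodogram, and evaluates 4πf(ω) for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi = 100, 0.7

# Simulate X_t = 0.7 X_{t-1} + e_t with normal (0, 1) errors (a stand-in for
# the series in Table 7.2.1), discarding a burn-in so the start-up is forgotten.
e = rng.standard_normal(n + 50)
x = np.zeros(n + 50)
for t in range(1, n + 50):
    x[t] = phi * x[t - 1] + e[t]
x = x[50:]

# Periodogram ordinates at w_k = 2*pi*k/n, k = 1, ..., 49, via the FFT,
# and the approximate expected value 4*pi*f(w) = 2 / (1.49 - 1.4 cos w).
c = np.fft.fft(x - x.mean()) / n
k = np.arange(1, (n - 1) // 2 + 1)
w = 2.0 * np.pi * k / n
I = 2.0 * n * np.abs(c[k]) ** 2
expected = 2.0 / (1.49 - 1.4 * np.cos(w))
```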

We have labeled the frequency axis in radians. Thus, the fastest frequency we can observe is r, which is 0.50 cycles per time unit. This corresponds to a cycle with a period of two time units. In applications there are natural time units such as hours, months, or years, and one may choose to label the axis in terms of the frequency in these units.

In Figure 7.2.2 we display the smoothed periodogram where the smoothed value at q is

The smoothed periodogram roughly assumes the shape of the spectral density. Because the standard error of the smoothed estimator for 2 < k < 48 is about 0.45 of the true value, there is considerable variability in the plot. The smoothing


Table 7.2.1. One Hundred Observations from a First Order Autoregressive Time Series with ρ = 0.7

First 25 Second 25 Third 25 Fourth 25

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

0.874 -0.613 - 0.366 0.850 0.110 - 1.420 2.345 0.113 -0.183 2.501 -0.308 -0.044 1.657 0.723 -0.391 1.649 -0.257 -0.095 2.498 1.051 -0.971 1.330 0.803 0.371 1.307 0.116 -1.622 3.404 -1.454 -2.941 2,445 0.2% -2.814 2.805 1.501 -1.784 1 A39 0.880 -2.471 1.240 -0.672 -3.508 1.116 0.436 -2.979 0.448 0.930 -0.779 0.377 1.168 0.869

-0.488 1.999 1.786 -0.960 1.376 0.123 -0.579 1.613 0.093 - 1.674 2.030 -0.731 -0.366 0.616 - 1.253 -0.922 0.667 -2.213 -1.174 0.707 -0.252 - 1.685 1.029 0.403

-0.955 -0.948

0.046 0.09 1 0.254 2.750 1.673 2.286 1.220

-0.256 0.252 0.325

-0.338 0.378 0.127 -2006 -2.380 - 2.024 - 1.085 1.037

-0.467 -0.794 -0.493 -0.157 0.659

introduces a Cosrelation between observations less than 2d + 1 units apart. In one sense this was the objective of the smoothing, since it produces a plot that more nearly approximates that of the spectral density. On the other hand, if the estimated spectral density is above the true density, it is apt to remain so for some distance. To compute smoothed values near zero and n; the periodic nature of the

function flu) is used. In most applications the mean is estimated and therefm ZJO) is not computed. We follow this practice in ow example. To compute the smoothed value at zero we set lJu,,) = I,,(ul) and compute

which, by the even periodic property of&@), is given by


Figure 7.2.1. Periodogram computed from the 100 autoregressive observations of Table 7.2.1 compared with 4πf(ω).

In our example, replacing I_{100}(ω_0) by I_{100}(ω_1), the smoothed value at zero is
$$\tfrac{1}{5}\left[3(22.004) + 2(9.230)\right] = 16.894.$$

Similarly

As the sample size is even, there is a one-degree-of-freedom periodogram ordinate for w. The smoothed estimate at w is

in Figure 7.2.3 we plot the logarithm of the average of 11 periodogram ordinates (d = 5). The 95% confidence intervals are also plotted in Figure 7.2.3.


Figure 7.2.2. Smoothed periodogram (d = 2) computed from the 100 autoregressive observations of Table 7.2.1 compared with 4πf(ω).

They were constructed using (7.2.4), so that the upper bound is

and the lower bound is

This interval is appropriate for 6 6 R rd 44 and adequate for k = 45. Confidence intervals for other values of R could be constructed using the variance of the estimator. For example, the smoothed value at zefo is

and the variance is approximately 0.21[4πf(0)]². Since the


Figure 7.2.3. Logarithm of smoothed periodogram (d = 5) and confidence interval for logarithm of 4πf(ω) computed from the 100 autoregressive observations of Table 7.2.1.

variance of a chi-square with 10 degrees of freedom divided by its degrees of freedom is 0.20, we can establish a confidence interval for 4πf(0) using the critical points for a chi-square with 10 degrees of freedom. A similar approach can be used for 1 ≤ k ≤ 5 and 46 ≤ k ≤ 50.

Example 7.2.2. As a second example of spectral estimation, we consider the United States monthly unemployment rate from October 1949 to September 1974. Figure 7.2.4 is a plot of the logarithm of the smoothed periodogram using the rectangular weights and d = 5. Also included in the figure are lines defining the 95% confidence interval. This plot displays characteristics typical of many economic time series. The smoothed periodogram is high at small frequencies, indicating a large positive autocorrelation for observations close together in time. Second, there are peaks at the seasonal frequencies π, 5π/6, 2π/3, π/2, π/3, π/6, indicating the presence of a seasonal component in the time series. The periodogram has been smoothed using the simple average, and the flatness of the seasonal peaks indicates that they are being dominated by the center frequencies. That is, the shape of the peak is roughly that of the weights being used in the smoothing. Table 7.2.2 contains the ordinates at and near the seasonal frequencies.


Figure 7.2.4. Logarithm of smoothed periodogram for monthly United States unemployment rate for October 1949 through September 1974 (n = 300, d = 5).

With the exception of T, the ordinates at the seasonal frequencies are much larger than the other ordinates. Also, it seems that the ordinates adjacent to the seasonal fieqwncies are kger than those farther away. This suggests that the “seasonali- ty” in the time series is not perfectly periodic. That is, mote than the six seasonal frequencics are required to completely explain the pealrs in the estimated spectral density. AA

7.3. OTHER ESTIMATORS OF THE SPECTRUM

The smoothed periodogram is a weighted average of the Fourier transform of the sample autocovariance. An alternative method of obtaining an estimated spectral density is to apply weights to the estimated covariance function and then transform the "smoothed" covariance function. The impetus for this procedure came from a desire to reduce the computational cost of computing covariances for realizations with very large numbers of observations. Thus, the weight function has traditionally been chosen to be nonzero for the first few autocovariances and zero otherwise. The development of computer routines using the fast Fourier transform reduced the cost of computing finite Fourier transforms and reduced the emphasis on the use of the weighted autocovariance estimator of the spectrum.


Table 7.2.2. Periodogram Ordinates Near Seasonal Frequencies

Frequency (Radians)   Ordinate      Frequency (Radians)   Ordinate
     0.461              0.127            2.032              0.025
     0.482              0.607            2.052              0.021
     0.503              5.328            2.073              0.279
     0.523             31.584            2.094              0.575
     0.545              2.904            2.115              0.012
     0.565              0.557            2.136              0.011
     0.586              0.069            2.157              0.008
     0.984              0.053            2.555              0.028
     1.005              0.223            2.576              0.010
     1.026              0.092            2.597              0.286
     1.047             23.347            2.618              4.556
     1.068              0.310            2.639              0.038
     1.089              0.253            2.660              0.017
     1.110              0.027            2.681              0.022
     1.508              0.191            3.079              0.008
     1.529              0.011            3.100              0.003
     1.550              0.177            3.120              0.294
     1.571              6.402            3.142              0.064
     1.592              0.478
     1.613              0.142
     1.634              0.012

Let w(x) be a bounded even continuous function satisfying
$$w(0) = 1, \qquad |w(x)| \le 1 \quad \text{for all } x, \qquad w(x) = 0 \quad \text{for } |x| > 1. \qquad (7.3.1)$$
Then a weighted estimator of the spectral density is

(7.3.2)

where g,, < n is the chosen point of truncation and

$$\hat{\gamma}(h) = \hat{\gamma}(-h) = n^{-1}\sum_{t=1}^{n-h}(X_t - \bar{x}_n)(X_{t+h} - \bar{x}_n), \qquad h \ge 0.$$
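Assuming the weighted estimator has the form (2π)^{-1} Σ_{|h|≤g_n} w(h/g_n) γ̂(h) cos ωh, consistent with the description of the weighted covariances w(h/g_n)γ̂(h) in the text, a sketch of the computation is (Python/numpy; the names are illustrative):

```python
import numpy as np

def acov(x):
    # gamma_hat(h) = n^{-1} sum_{t=1}^{n-h} (x_t - xbar)(x_{t+h} - xbar)
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[:n - h], x[h:]) / n for h in range(n)])

def lag_window_spectrum(x, g, window, freqs):
    # Weighted-covariance estimate at the frequencies in `freqs`:
    # (2*pi)^{-1} [gamma_hat(0) + 2 * sum_{h=1}^{g} w(h/g) gamma_hat(h) cos(w h)].
    gam = acov(x)
    h = np.arange(1, g + 1)
    wts = window(h / g)
    freqs = np.atleast_1d(np.asarray(freqs, dtype=float))
    return np.array([(gam[0] + 2.0 * np.sum(wts * gam[h] * np.cos(w * h))) / (2.0 * np.pi)
                     for w in freqs])
```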

From Section 3.4 we know that the transform of a convolution is the product of the transforms. Conversely, the transform of a product is the convolution of the


transforms. In the current context we define the following function (or transform):

$$\tilde{f}(\omega) = (2\pi)^{-1}\left[\hat{\gamma}(0) + 2\sum_{h=1}^{n-1} \hat{\gamma}(h)\cos \omega h\right].$$

The function f̃(ω) is a continuous function of ω and is an unweighted estimator of the continuous spectral density. By the uniqueness of Fourier transforms, we can write

We define the transform of w(n) similarly as

where W(w) is also a continuous function. It follows that

w(;) = W(w)eeUh d o I h = 0, 21, 22,. . . , f ( n - 1) I

Note that w(0) = 1 means that ,fZw W(s) ds = 1. Then, by Exercise 3.f4,

One should remember that both W(s) and f̃(s) are even periodic functions. Thus, the estimated spectrum obtained from the weighted estimated covariances w(h/g_n)γ̂(h) is a weighted average (convolution) of the spectrum estimated from the original covariances, where the weight function is the transform of the weights applied to the covariances.

The function W(ω) is called the kernel or spectral window. The weight function w(x) is often called the lag window.

Theorem 7.3.1. Let g_n be a sequence of positive integers such that

$$\lim_{n\to\infty} g_n = \infty, \qquad \lim_{n\to\infty} n^{-1} g_n = 0,$$


and let XI be a time series defined by

where the e, are independent (0, 02) random variables with fourth moment 7 p 4 ,

and (aj) is absolutely summable. Let

and

where w(x) is a bounded even continuous function satisfying (7.3.1). Then

and

the conditions

Proof. See, for example, Anderson (1971, Chapter 9) or Hannan (1970, Chapter 5). A

The choice of a truncation point g_n for a particular sample of n observations is not determined by the asymptotic theory of Theorem 7.3.1. As in our discussion of the smoothed periodogram, the variance generally increases as g_n increases, but the bias in the estimator will typically decrease as g_n increases. It is possible to determine the order of the bias as a function of the properties of the weight function w(x) and the speed with which γ(h) approaches zero [see Parzen (1961)], but this still does not solve the problem for a given sample and unknown covariance structure.

Approximate confidence limits for the spectral density can be constructed in the same manner as that used for the smoothed periodogram estimator.

Result 7.3.1. Let X_t satisfy the assumptions of Theorem 7.3.1, let w(x) satisfy


(7.3.1), and let f(ω) > 0. Then for πν/2n < ω < π − πν/2n, f^{-1}(ω)f̂(ω) is approximately distributed as a chi-square random variable divided by its degrees of freedom ν, where
$$\nu = \frac{2n}{g_n \int_{-1}^{1} w^2(x)\, dx}.$$

Considerable research has been conducted on the weights w(x) to use in estimating the spectrum. One of the simplest windows is obtained by truncating the sequence of autocovariances at g_n. This procedure is equivalent to applying the window

$$w(x) = \begin{cases} 1, & |x| \le 1, \\ 0, & |x| > 1. \end{cases} \qquad (7.3.3)$$

The function w(x) is sometimes called a truncated or rectangular window. While w(x) does not meet the conditions of Theorem 7.3.1, it can be shown that the conclusion holds. The spectral window for the function (7.3.3),

$$W(s) = \frac{1}{2\pi}\,\frac{\sin(g_n + \tfrac{1}{2})s}{\sin(s/2)},$$

takes on negative values, and it is possible that the weighted average f̂(ω) will be negative for some ω. This is generally considered an undesirable attribute, since f(ω) ≥ 0 for all ω.

Bartlett (1950) suggested splitting an observed time series of n observations into p groups of M observations each. The periodogram is tben computed for each group, and the estimator for the ordinate associated with a particular frequency is taken to be the average of the p estimators; that is,

where f,,(q) is the estimator for the ordinate at frequency w, obtained from the sth subsample.

Bartlett's estimator is closely related to the estimator with lag window
$$w(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

This window has been called modifid Bartktt or triangular. Setting g, = M, the spectral window is given by


Using Lemma 3.1.2,

$$W_B(\omega) = \frac{\sin^2(M\omega/2)}{2\pi M \sin^2(\omega/2)}.$$

To evaluate the variance of the modified Bartlett estimator, we have

$$\sum_{h=-M}^{M} w^2\!\left(\frac{h}{M}\right) = \sum_{h=-M}^{M} \left(\frac{M-|h|}{M}\right)^2 \doteq \frac{2}{3}M.$$

Thus, for the modified Bartlett estimator with covariances truncated at M, the variance of the estimated spectrum is approximately

Blackman and Tukey (1959) suggested the weight function

$$w(x) = \begin{cases} 1 - 2a + 2a\cos \pi x, & |x| \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

The use of the window with a = 0.23 they called "hamming", and the use of the window with a = 0.25, "hanning." Parzen (1961) suggested the weight function
$$w(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & |x| \le \tfrac{1}{2}, \\ 2(1 - |x|)^3, & \tfrac{1}{2} \le |x| \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

This kernel will always produce nonnegative estimators. Brillinger (1975, Chapter 3) contains a discussion of these and other kernels.
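The lag windows named above can be coded directly from their definitions; the sketch below (Python/numpy, illustrative names) can be used with the weighted-covariance routine sketched earlier.

```python
import numpy as np

def bartlett_window(x):
    # Modified Bartlett (triangular) lag window.
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def blackman_tukey_window(x, a=0.25):
    # Blackman-Tukey lag window; a = 0.25 is "hanning", a = 0.23 is "hamming".
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 1.0 - 2.0 * a + 2.0 * a * np.cos(np.pi * x), 0.0)

def parzen_window(x):
    # Parzen lag window; the resulting spectral estimate is always nonnegative.
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 0.5,
                    1.0 - 6.0 * x**2 + 6.0 * x**3,
                    np.where(x <= 1.0, 2.0 * (1.0 - x)**3, 0.0))

# Example: a Parzen-window estimate at a grid of frequencies with truncation g = 20:
# f_hat = lag_window_spectrum(x, 20, parzen_window, np.linspace(0.01, np.pi, 50))
```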

Another method of estimating the spectral density is to estimate an auto- regressive moving average model for the time series and then use the spectral density defined by that model as the estimator of the spectral density. Because of its simplicity, the pure autoregressive model is often used. The smoothness of the estimated spectral density is a function of the order of the model fit. A number of criteria for determining the order have been suggested. See Akaike (1969a), Parzen (1974, 1977), Marple (1987). Newton (1988), and the references cited in Section 8.4.

The Burg method of fitting the autoregressive process, discussed in Section 8.2.2, is often used in estimating it. The method has the advantage that the roots of the autoregressive process are always less than one and the computations are such that it is relatively easy to consider autoregressive models of different orders.
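A sketch of the autoregressive spectral estimator, here using the Yule-Walker equations rather than the Burg computations of Section 8.2.2 (Python/numpy; the function name is illustrative):

```python
import numpy as np

def ar_spectrum(x, p, freqs):
    # Fit an AR(p) by the Yule-Walker equations and evaluate the implied
    # spectral density f(w) = (sigma^2 / (2*pi)) |1 - sum_j phi_j e^{-i j w}|^{-2}.
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    gam = np.array([np.dot(x[:n - h], x[h:]) / n for h in range(p + 1)])
    R = np.array([[gam[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gam[1:])
    sigma2 = gam[0] - phi @ gam[1:]
    w = np.atleast_1d(np.asarray(freqs, dtype=float))
    transfer = 1.0 - (phi[:, None] * np.exp(-1j * np.outer(np.arange(1, p + 1), w))).sum(axis=0)
    return sigma2 / (2.0 * np.pi) / np.abs(transfer) ** 2
```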

7.4. MULTIVARIATE SPECTRAL ESTIMATES

We now investigate estimators of the spectral parameters for vector time series. Let n observations be available on the bivariate time series Y, = (Y,,, Yz,)’. The Fourier coefficients uIk , b,,, uZk, and b,, can be computed for the two time series


by the formulas following (7.1.1). Having studied estimators off,,(@) and&,(@) in the preceding sections, it remains only to investigate estimators 0fft2(0) and of the associated quantities such as the phase spectrum and the squared coherency.

Recalling the transformation introdwed in Section 4.2, which yields the normalized Fourier coefficients, we shall investigate the joint distributional properties of

$$2^{-1} n^{1/2}\left(a_{2k} + i\,b_{2k}\right) = n^{-1/2} \sum_{t=1}^{n} Y_{2t}\, e^{\,i\omega_k t}, \qquad (7.4.1)$$

where w, = 2?rk/n, k = 0,1,. . . , n - 1. Define a transformation matrix H by

(7.4.2)

where G is an n X n matrix with rows given by

The matrix G was introduced in (4.2.6), and is the matrix that will diagonalize a circular matrix. Let

be the 2n X 2n covariance matrix of y = (yi, yb)', where

and

v,, =

(7.4.3)

. (7.4.4)

yJ-n + 1) %,(-PI f 2) x, ( -n + 3) * * - Yl1(0) . In Theorem 4.2.1, for time series with absolutely summable covariance function, we demonstrated that the elements of GV,,G* converge to the elements of the diagonal matrix 27rDii, where the elements of Dii areXi(w) evaluated at @k = 27rkl n, k=O,1,2, ..., n - 1 .


It remains to investigate the behavior of GV,,G*. To this end define the circular matrix

VIZC = . ( 7 . 4 3

Then GV,,,G* is a diagonal matrix with elements

A4 - 1

, k = 0 , 1 , ..., n-1, (7.4.6) -*Pnkhln

h = - M

where we have assumed n is odd and set M = (n - 1) /2 . If n is even, the sum is from -M + 1 to M, where M = n / 2 , If y,,(h) is absolutely summable, we obtain the following theorem.

Theorem 7.4.1. Let Y_t be a stationary bivariate time series with absolutely summable autocovariance function. Let V of (7.4.3) be the covariance matrix for n observations. Then, given ε > 0, there exists an N such that for n > N, every element of the matrix

is less than E in magnitude, where

and q: = 2vk/n, k = 0,1,. . . , n - 1.

proof. The result for D,, and D,, follows by Theorem 4.2.1. The result for Dlz is obtained by arguments completely analogous to those of Section 4.2, by showing that the elements of GV,,G* - GV,,,G* converge to zero as n increases. The details are reserved for the reader. A

If we make a stronger assumption about the autocovariance function we obtain the stronger result parallel to Corollary 4.2.1.


Corollary 7.4.1.1. Let Y, be a stationary bivariate time series with an autocovariance function that satisfies

W

Let V be as defined in (7.4.3), H as defined in (7.4.2), and D as defined in Theorem 7.4.1. ?%en every element of the matrix W H * - 2nD is less than 3Lln.

Proof. Tbe proof parallels that of Corollary 4.2.1 and is reserved for the reader. A

- 1 112 By T h m m 7.4.1, the complex coefficients 2 n (alk + &,,) and 2- n (av + eBY) are nearly uncorrelated in large samples if j # k. Since

Uik - ib,k = t 6bj,n-k , i = l , 2 , k = 1 , 2 ,..., n - 1 ,

I 112

it follows that

(7.4.7)

That is, the covariance between alk and a,, is approximately equal to that between b,, and b,, while the covariance betwecn b,, and aZk is approximately the negative of the covariance between bzk and alk .

We define the cross penorbgram by

To obtain a function defined at all o, we recall the function

(7.4.9)

Cordlarg 7.4.1.2. Let Y, be a stationary bivariate time series with absolutely summable covariance function. Then

n - m lim H42n(4 = 4#12(W).

Proof. The result is an immediate consequence of Theorem 7.4.1. A


We now obtain some distributional results for spectral estimates. To simplify the presentation, we assume that Y, is a normal time series.

Theorem 7.4.2. Let Y, be a bivariSte normal time series with covariance function that satisfies

112 IIZ Then r k =2- n (a,k,blk,azt,b2t)' is distributed as a multivariate normal random variable with zero mean and covariance matrix

for w, f 0, T, where jJ%) = c12(wk) - .+q,2&). Also

j f k . E{r&} = U(n-') ,

It follows that

and for w, # 0, T,

Proof. Since the a, and bi, are linear combinations of normal random variables, the normality is immediate. The moment properties follow from the

A moments of the normal distribution and from Corollary 7.4.1.1.

We may construct smoothed estimators of the cross spectral density in the same manner that we constructed smoothed estimators in Section 7.2. Let

(7.4.10)

where

and W_n(j), j = 0, ±1, ±2, ..., ±d, is a weight function.

Theorem 7.4.3. Let Y_t be a bivariate normal time series with covariance function that satisfies $\sum_{h=-\infty}^{\infty} |h|\,|\gamma_{ij}(h)| < L < \infty$, i, j = 1, 2. Let d_n be a sequence of positive integers satisfying

$$\lim_{n\to\infty} d_n = \infty, \qquad \lim_{n\to\infty} \frac{d_n}{n} = 0,$$

and let Wn(j), j = 0, If: 1.22,. . . , hfn. satisfy

Then

where f,2(wKc,,,,,) is defined by (7.4.10) and K(n, o) is defined in (7.4.9).

Proof. Reserved for the reader. A

The properties of the other multivariate spectral estimators follow from Theorem 7.4.2 and Theorem 7.4.3. Table 7.4.1 will be used to illustrate the computations and emphasize the felationships to normal regression theory. Notice that the number of entries in a column of Table 7.4.1 is 2(2d + 1). We set


Table 7.4.1. Statistics Used in Computation of Cross Spectra at ω_k = 2πk/n, d = 2

Fourier Coefficients    Fourier Coefficients    Signed Fourier
for Y_2                 for Y_1                 Coefficients for Y_1
a_{2,k-2}               a_{1,k-2}               -b_{1,k-2}
b_{2,k-2}               b_{1,k-2}                a_{1,k-2}
a_{2,k-1}               a_{1,k-1}               -b_{1,k-1}
b_{2,k-1}               b_{1,k-1}                a_{1,k-1}
a_{2,k}                 a_{1,k}                 -b_{1,k}
b_{2,k}                 b_{1,k}                  a_{1,k}
a_{2,k+1}               a_{1,k+1}               -b_{1,k+1}
b_{2,k+1}               b_{1,k+1}                a_{1,k+1}
a_{2,k+2}               a_{1,k+2}               -b_{1,k+2}
b_{2,k+2}               b_{1,k+2}                a_{1,k+2}

W , ( j ) = (26 + l)-' forj = 0,k 1,+2,. . , , +d. Then the cospectrum is estimated by

which is the mean of the cross products of the first two columns of Table 7.4.1 multiplied by n/4m The quadrature spectrum is estimated by

which is the mean of the cross products of the first and third column of Table 7.4.1 multiplied by n/4m

The estimator of the squared coherency for a bivariate time series computed from the smoothed periodogram estimator of f(w),

is given by

This quantity is recognizable as the multiple correlation coefficient of normal regression theory obtained by regressing the first column of Table 7.4.1 on the second and third columns. By construction, the second and third columns are orthogonal.
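A sketch of these computations, following the column construction of Table 7.4.1 with rectangular weights (Python/numpy; the scaling by n/4π is taken from the description above, while the function and argument names are illustrative):

```python
import numpy as np

def cross_spectral_estimates(y1, y2, k, d):
    # Rectangular-weight smoothed estimates at w_k = 2*pi*k/n: f11, f22, the
    # cospectrum c12, the quadrature spectrum q12, and the squared coherency.
    # Each estimate is the mean cross product of the corresponding columns of
    # Table 7.4.1 (2(2d+1) entries) multiplied by n / (4*pi).
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n = len(y1)
    t = np.arange(1, n + 1)
    w = 2.0 * np.pi * np.arange(k - d, k + d + 1) / n
    C, S = np.cos(np.outer(w, t)), np.sin(np.outer(w, t))
    a1, b1 = (2.0 / n) * (C @ y1), (2.0 / n) * (S @ y1)
    a2, b2 = (2.0 / n) * (C @ y2), (2.0 / n) * (S @ y2)
    scale = n / (8.0 * np.pi * (2 * d + 1))      # (n / 4*pi) * mean over 2(2d+1) entries
    f11 = scale * np.sum(a1**2 + b1**2)
    f22 = scale * np.sum(a2**2 + b2**2)
    c12 = scale * np.sum(a2 * a1 + b2 * b1)
    q12 = scale * np.sum(a2 * (-b1) + b2 * a1)   # sign follows the "signed" column of Table 7.4.1
    coherency2 = (c12**2 + q12**2) / (f11 * f22)
    return f11, f22, c12, q12, coherency2
```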

The estimation of the squared coherency generalizes immediately to higher


dimensions. If there is a second explanatory variable, the Fourier coefficients of this variable are added to Table 7.4.1 in the same form as the columns for Yl . Then the multiple squared coherency is the multiple correlation coefficient associated with the regression of the column for Yz on the four columns for the two explanatory variables. An estimator of the error spectnun or residual spectrum of Y2 after Y, is

(7.4.14)

This is the residual mean square for the regression of the first column of Table 7.4.1 on the second and third multiplied by n/47r. Many authors define the estimator of the error spectrum without the factor (2d + 1)/2d. We include it to - make the analogy to multiple regression complete. It also serves to remind us that X2 (w ) is identically one if computed for d = 0. A test of the hypothesis that XJyt) = 0 is given by the statistic l2

(7.4.15)

which is approximately distributed as Snedecor's F with 2 and 4d (d > 0) degrees of freedom under the null hypothesis. This is the test of the hypothesis that the regression coefficients associated with columns two and three of Table 7.4.1 are zero.

If K²_{12}(ω_k) ≠ 0, the distribution of K̂²_{12}(ω_k) is approximately that of the multiple correlation coefficient [see, for example, Anderson (1984, p. 134)]. Tables and graphs useful in constructing confidence intervals for K²_{12}(ω_k) have been given by Amos and Koopmans (1963). For many degrees of freedom and K²_{12}(ω_k) ≠ 0, K̂²_{12}(ω_k) is approximately normally distributed with variance


(7.4.16)

The estimated phase spectrum is

where it is understood that φ̂_{12}(ω_k) is the angle in (−π, π] between the positive half of the ĉ_{12}(ω_k) axis and the ray from the origin through (ĉ_{12}(ω_k), −q̂_{12}(ω_k)). The sample distribution of this quantity depends in a critical manner on the true coherency between the two time series. If K²_{12}(ω_k) = 0, then, conditional on f̂_{11}(ω_k), the variables ĉ_{12}(ω_k)/f̂_{11}(ω_k) and q̂_{12}(ω_k)/f̂_{11}(ω_k) are approximately distributed as independent normal (0, f_{22}(ω_k)[2(2d + 1)f̂_{11}(ω_k)]^{-1}) random variables. This is because ĉ_{12}(ω_k)/f̂_{11}(ω_k) and q̂_{12}(ω_k)/f̂_{11}(ω_k) are the regression coefficients obtained by regressing column one of Table 7.4.1 on columns two and three. It is well known that the ratio of two independent normal random variables with zero mean and common variance has the Cauchy distribution, and that the arc


tangent of the ratio has a uniform distribution. Therefore, if X;2(wk) “0, the principal value of @)t2(wJ will be approximately uniformly distributed on the interval (-7r/2, d 2 ) .

If X 7 2 ( q ) + 0 , then &(w) will converge in distribution to a normal random variable. While approximate confidence limits could be established on the basis of the limiting distribution, it seems preferable to set confidence limits using the normality of ~ ~ ~ ( ~ f ~ ~ ( q ) and g,2(~kVf11(~k).

Fieller’s method [see Fieller (1954)] can be used to construct a confidence interval for p&). This procedure follows from the fact that the statement -ql2(w)/cI2(w) = R,,(w) is equivalent to the statement c,,(o)R,,(o) + qI2(o) = 0. Therefore, the method of setting confidence intervals for the sum c,&)RI2(w) + qt2(w) can be used to determine a confidence interval for R, , (o ) = tan plz(w) and hence for p12(w). The (1 - a)-level confidence interval for the principal value of p12(w) is the set of p12(w) in [-7r/2, 7r/2] such that

$$\sin^2\!\left[\varphi_{12}(\omega) - \hat{\varphi}_{12}(\omega)\right] \le t_\alpha^2\left[\hat{c}_{12}^2(\omega) + \hat{q}_{12}^2(\omega)\right]^{-1} \widehat{\mathrm{Var}}\{\hat{c}_{12}(\omega)\}, \qquad (7.4.18)$$

where

$$\widehat{\mathrm{Var}}\{\hat{c}_{12}(\omega)\} = (4d + 2)^{-1}\,\hat{f}_{11}(\omega)\,\hat{f}_{22}(\omega),$$

t, is such that P{ltl> t,} = a, and t is distributed as Student’s t with 4d degrees of freedom. To obtain a confidence interval for plz(w) in the interval (- 7r, a] it is necessary

to modify Fieller’s method. We suggest the following procedure to establish an approximate (1 - a)-levei confidence interval. Let Fid(a) denote the a% point of the Fdistribution with 2 and 4d degrees of freedom. The possibilities for the interval fall into two categories.

Note that the criterion for category 1 is satisfied when the F-statistic of (7.4.15) is less than F$(a). Assuming E l , ( @ ) and tjI2(w) to be normally distributed, it can be proven that the outlined procedure furnishes a confidence interval with probability at least 1 - a of covering the true ~(0). If the true coherency is zero, the interval will have length 27r with probability 1 - a.

Recall that the cross amplitude spectrum is

$$A_{12}(\omega) = \left[c_{12}^2(\omega) + q_{12}^2(\omega)\right]^{1/2} = |f_{12}(\omega)|$$


and the gain of X,, over XI, is

Estimator of these quantities are

$$\hat{g}_{12}(\omega) = \hat{f}_{11}^{-1}(\omega)\,\hat{A}_{12}(\omega). \qquad (7.4.20)$$

It is possible to establish approximate confidence intervals for these quantities using the approximate normality of t2,2(tu)/fll(w) and 4&)/fIl(w). As a consequence of this normality,

has, approximately, the F-distribution with 2 and 4d degrees of freedom. Therefore, those C,~(O) and q,2(u) for which (7.4.21) is less than the a percentage tabular value of the F-distribution form a ( I - a)-level confidence region. Let

Assuming normal ĉ_{12}(ω) and q̂_{12}(ω), we conclude that [A_L(ω), A_U(ω)] is a confidence interval for A_{12}(ω) of at least level 1 − α. The confidence interval for gain is that of A_{12}(ω) divided by f̂_{11}(ω).
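Given smoothed estimates f̂_{11}, ĉ_{12}, and q̂_{12}, the phase and gain estimates can be sketched as follows (Python/numpy; the function name is illustrative):

```python
import numpy as np

def phase_and_gain(f11, c12, q12):
    # Estimated phase: the angle of the ray through (c12, -q12), as described
    # in the text; estimated gain of the second series over the first:
    # cross amplitude |f12| divided by f11.
    phase = np.arctan2(-q12, c12)
    amplitude = np.hypot(c12, q12)
    gain = amplitude / f11
    return phase, gain

# For the values of Example 7.4.1 (c12 = 0.3722, q12 = 0.1138, f11 = 0.5809)
# this gives phase -0.297 radians and gain 0.670, matching the example.
```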

Example 7.4.1. We use the data on the sediment in the Des Moines River discussed in Section 6.4 to illustrate some of the c m spectral computations. Table 7.4.2 contains the Fourier coefficients for the first 11 frequencies for the 205 observations. For d = 5, these are the statistics used to estimate $&), where o6 = 0.0293 cycles per day.

Using the rectangular weight function, we have


Table 7.4.2. Statistics Used in Computing Smoothed Estimates of Cross Spectrum for Sediment in the Des Moines River at Boone and Saylorville for Frequency of 0.0293 Cycles per Day, d = 5

1

2

3

4

5

6

7

8

9

10

11

0.0049

O.Qo98

0.0146

0.01 95

0.0244

0.0293

0.034 1

0.0390

0.0439

0.0488

0.0537

205.00 -0.194 0.305

102.50 0.175 0.161

68.33 0.01 1 0.013

5 1.25 0.058 0.147

41.00 0.067 0.042

34.17 0.02 1 0.061

29.29 0.322 0.059

25.63 0.065 -0.067

22.78 -0.053 0.019

20.50 0.281 0.037

18.64 0.08 1 -0.062

-0.103 0.504 0.432 0.1% 0.047

-0.039 0.055 0.088 0.202 O.lt6 0.133 0.072 0.234

-0.012 -0.017 -0.019 -0.013

0.098 0.332

-0.053 0.152 0.024

-0.504 -0.103 -0.1%

0.432 0.039

-0.088 0.055

-0.116 0.202

-0.072 0.133 0.01 2 0.234 0.019

-0.017 -0.098 -0.013

0.053 0.332

-0.026 0.152

0.047

$$= 4.6768 - 1.4313\,i.$$

It follows that

$$\hat{f}_{22}(0.0293) = 0.3209,$$
$$\hat{f}_{11}(0.0293) = 0.5809,$$
$$\hat{f}_{12}(0.0293) = 0.3722 - 0.1138\,i.$$

The error spectrum for Saylorville is estimated by the residual sum of squares obtained from the regression of the Saylorville column on the Boone columns


multiplied by $205[2(10)(4\pi)]^{-1}$. We have
$$\hat{f}_{2\cdot 1}(0.0293) = \frac{205}{2(10)(4\pi)}\left[0.4327 - 0.6407(0.5019) - (-0.1960)(-0.1536)\right] = 0.0661.$$

The squared coherency is

The F-statistic to test the hypothesis of zero coherency is

202t,(0.0293) = 43.42. "' = 2[1 -%;,(0.0293)]

This is well beyond the 1% tabular value of 5.85 for 2 and 20 degrees of freedom, and it is clear that the two time series are not independent. The estimate of the phase spectrum is
$$\hat{\varphi}_{12}(0.0293) = \tan^{-1}\!\left[-1.4313/4.6768\right] = -0.2970 \ \text{radians}.$$

Figure 7.4.1. Estimated phase spectrum for Boone-Saylorville sediment data (d = 5 and rectangular weight function).


Let us establish a 95% confidence interval for φ_{12}(0.0293). Because the F-test rejects the hypothesis of zero coherency at the 5% level, the criterion of category 2 is satisfied. Consequently the confidence interval for φ_{12}(ω_6) is (−0.5228, −0.0712), where

$$S = \sin^{-1}\left\{\left[\frac{(2.086)^2(0.1872)}{20(0.8128)}\right]^{1/2}\right\} = 0.2258.$$

The estimated gain of X_{2t} over X_{1t} is
$$\hat{g}_{12}(0.0293) = \frac{\hat{A}_{12}(0.0293)}{\hat{f}_{11}(0.0293)} = \frac{0.3892}{0.5809} = 0.6700.$$

A 95% confidence interval for g_{12}(ω_6) is given by [ĝ_L, ĝ_U], where
$$\hat{g}_L = 0.6700 - \left[(11)^{-1}(0.5809)^{-1}(0.0661)(3.49)\right]^{1/2} = 0.4800,$$
$$\hat{g}_U = 0.6700 + \left[(11)^{-1}(0.5809)^{-1}(0.0661)(3.49)\right]^{1/2} = 0.8600.$$

Figure 7.4.1 is a plot of Qlz(o) and the confidence interval for plptz(o) (d = 5 and the rectangular weight function) for the Boone-Saylorville sediment data. Recall that q12(o) is an odd function of o with q12(0) = q,J.rr) = 0. Therefore, we

Figure 7.4.2. Squared coherency for the Boone-Saylorville sediment data (d = 5 and rectangular weight function).


define the imaginary part of Z I z m ( w k ) to be the negative of the imagirtary part of Zlzn(ok). Likewise, the imaginary part of Z12,,(a + A) is set equal to the negative of the imaginary part of Ilzm(rr - A). IfZ*&(O) or Zlzm(a) are computed, the imaginary part is zero. As a result, &(O) = 0, and &(a) = 0 if n is even. It follows that the estimated phase at zero is 0 if E,,(O) > 0 and is a if EI2(0) C 0. In plotting the estimated phase a continuous function of o is desirable. Therefore, in creating Figure 7.4.1, that angle in the set

that differed least from the angle previously chosen for q-l, k = 2,3,. . . ,102, was plotted. The general downward slope of &(o) is associated with the fact that the

readings at Saylorville lag behind those at Boone. If the relationship were a perfect one period lag, @j2(w) would be estimating a straight line with a negative slope of one radian per radian. The estimated function seems to differ enough from such a straight line to suggest a more complicated lag relationship.

Figure 7.4.2 contains a plot of squared coherency for the sediment data. The 5%

Figure 7.4.3. Plot of 4πf̂_{22}(ω) and 4πf̂_{2·1}(ω) for Saylorville sediment (d = 5 and rectangular weight function).


point for the F-distribution with 2 and 20 degrees of freedom is 3.49. On the basis of (7.4.15), any %;,(o) greater than 0.259 would be judged significant at that level. A lie has been drawn at this height in the figure. Similar information is contained in Figure 7.4.3, where the smoothed period-

ogram for Saylorville and 4Tf"(w) are plotted on the same graph. The estimated e m spectrum lies well below the spectrum of the original time series for low frequencies, but the two nearly coincide for high frequencies. The estimated error spectrum is clearly not that of white noise, since it is considerably higher at low fquencies than at high. One might be led to consider a first or second order autoregressive process as a model for the e m .

Figure 7.4.4 is a plot of the estimated gain of Saylorville over Boone. The 95% confidence interval plotted on the graph was computed using the limits (7.422) divided by $, (w). Note that the lower confidence bound for gain is zero whenever the squared coherency falls below 0.259.

Figure 7.4.4. Gain of Saylorville over Boone (d = 5 and rectangular weight function).


REFERENCES

Section 7.1. Bartlett (1950, 1966), Birnbaum (1952), Davis (1941), Durbin (1967, 1968, 1969), Eisenhart, Hastay, and Wallis (1947), Fisher (1929), Grenander and Rosenblatt (1957), Hannan (1958, 1960), Harris (1967), Kendall and Stuart (1966), Koopmans (1974), Tintner (1952), Tukey (1961), Wilks (1962).

Sections 7.2-7.3. Akaike (1969a), Anderson (1971), Bartlett (1950), Blackman and Tukey (1959), Bloomfield (1976), Box (1954), Brillinger (1975), Hannan (1960, 1970), Jenkins (1961), Jenkins and Watts (1968), Koopmans (1974), Marple (1987), Newton (1988), Otnes and Enochson (1972), Parzen (1957, 1961, 1974, 1977), Priestley (1981).

Section 7.4. Amos and Koopmans (1963), Anderson (1984), Brillinger (1975), Fieller (1954), Fishman (1969), Hannan (1970), Jenkins and Watts (1968), Koopmans (1974), Scheffé (1970), Wahba (1968).

EXERCISES

1. Compute the periodogcam for the data of Table 6.4.1. Calculate the smoothed peciodogram using 2,4 and 8 for d and rectangular weights. Plot the smoothed periodograms, and observe the differences in smoothness and in the height and width of the peak at zero. Compute a 95% confidence interval for 4 M w ) using the smoothed periodogmm with d=4. Plot the logarithm of the smoothed periodogram and the confidence interval for log4.rrs(o).

2. Given in the accompanying table is the quarterly gasoline consumption in California from 1960 to 1973 in millions of gallons.

Q- Year I I1 III IV 1960 1335 1443 1529 1447 1%1 1363 1501 1576 1495 1962 1464 1450 161 1 1612 1963 1516 1660 1738 1652 1964 1639 1754 1839 1736 1955 1699 1812 1901 1821 1966 1763 1937 2001 1894 1967 1829 1966 2068 1983 1968 1939 2099 2201 2081 1969 2008 2232 2299 2204 1970 2152 2313 2393 2278 1971 2191 2402 2450 2387 1972 2391 2549 2602 2529 1973 2454 2647 ’ 2689 2549

Source. U.S. Dept. of Transp&ation (1975). Review and Analysis of Gasoline Consumption in the United Stares from 1- to the Presenr, and US. Degi. of Trnnsportation, News, various issues.

Page 424: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 401

Using these data: (a) Compute the periodogram. (b) Obtain the smoothed periodogram by computing the centered moving

average

(c) Fit the regression model

y, = a f pt +z, . Repeat parts a and b for the regression residuals 2,.

weight function W,( j) where (d) Compute the smoothed periodogram for 2, of part c using the symmetric

j = O , 0.2, j = 1 ,

0.05 , j = 3,

(e) Fit the regression model

where

1 , jthquarter, atj ={ 0 otherwise.

Repeat parts a and b for the residuals from the fitted regression. Compute and plot a 95% confidence interval for the estimated spectral density.

3. Let the time series X, be defined by

X, =el +O.&,-, ,

where {e,} is a sequence of n o d independent (0,l) random variables. Given a sample of l0,OOO observations froin such a time series, what is the appmximate joint distribution of the periodogram ordinates associated with qsoo = 27r(2500)/10,000 and 0,2s0 =2n(1250)/10,O00?

4. Prove the sine result of Lemma 7.2.2.

5. Prove that the covariance function of a stationary finite autoregressive process

Page 425: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

44x2

satisfies

THE PBRIODOGRAM, ESIPMATED SPECTRUM

h = - n

6. Use the moment propedes of the n o d distribution to demonstrate the portion of Theorem 7.4.2 that states that

7. Let X,, denote the United States quarterly unemployment rate of Table 6.4.1, and let X,, denote the weekly gross hours per production worker given in Exercise 13 of Chapter 6. Compute the penodogram quantities lll,,(a+), ZIZn(q), and Z22n(ok). Compute the smoothed @mates using d = 5 and the rectangular weight function. Compute and plot X;*(q), qI2(uk). Obtain and plot a confidence interval for q2(uk) and for the gain of X,, over XI,. Treating hours per week as the dependent variable, plot 47&,(@).

8. Show that

where

l o otherwise.

9. Let XI be a time series defined by m

where {el} is a sequence of independent (0, 02) random variables with fourth moment 9u4 and

m

Page 426: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 403

(a) Show that e

for such a time series. (b) Let d,,, Wn(j), and A m ) be as defined in Theorem 7.2.2. Show that

E N @ ) } = A m ) + o(n-'d,) .

10. Given X,, X,, . . . X,, let &(h) be defined by (7.1.10). Consider the aug- mented observations

n + 1 s t s 2 n - 1.

(a) Show that the periodogram of Y, can be written n- I

zy,2n-1(q) = (2n - 1)-'2n 2 9x(h)e'"Jh, h = - n + l

where 9 = (2n - 1)-'27rj. (b) Show that

A ( j ) = ('Ln)-I(2n - l)"zc, *

where = (2n - 1)-'27rj and

r=O

11. Suppose XI X,, . . . , X,,, are unconelated random variables. Show that

vsr((p+ l ) - 1 5 1 x i } a 3 r { P - 1 Z X , ) P

i = I i = l

unless

Page 427: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 8

Parameter Estimation

8.1. FIRST ORDER AUTOREGRESSIVE TIME SERIES

The stationary time series defined by

U,-p=p(Y,-, - p ) + e , , (8.1.1)

where the e, are normal independent ( 0 , ~ ' ) random variables and lpl< 1, is one of the simplest and most heavily used models in time series analysis. It is often a satisfactory repsentation of the error time series in economic models. This model also underlies many tests of the hypothesis that the observed time series is a sequence of independently and identically distributed random variables.

One estimator of p is the first order autocornlation coefficient discussed in Chapter 6,

(8.1.2)

- I where ji, = n X:= I U,. To introduce some other estimators of the parameter p, let us consider the distribution of the U, for the normal time series defined by (8.1.1).

The expected value of U, is p, and the expected value of Y, - pY,-, is A = (1 - p)p. For a sample of n observations we can Write

or

Y l = P + Y U,=A+pU,- , +e, , t = 2 , 3 ,..., n ,

where the vector (q, e2, e3,. . . ,en) is distribted as a multivariate normal with zero mean and covariance matrix

(8.1.4) 2 - I 2 E=diag{(l - p ) u , U ~ , U ' , . . . ,a2}.

404

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 428: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

FTRST ORDER AUTOREGRESSIVE TiME SERIES 405

It follows that twice the logarithm of the likelihood of a sample of n observations is

2logz4y: p, p, U Z ) = -n l og2r - n log ff2 + log(1 - p2)

(8.1.5)

The computation of the maximum likelihood estimators is greatly simplified if we treat Yl as fixed and investigate the conditional likelihood. This is also an appropriate model in some experimental situations. For example, if we initiate an experiment at time 1 with an initial input of Y,, it is very reasonable to condition on this initial input.

To construct the conditional likelihood, we consider the last n - 1 equations of (8.1.3). Maximizing twice the logarithm of the likelihood,

2 logL(y: & p, a21 Y,) = -(n - 1) log 27r - (n - 1) log r2 n

- g - 2 (y , - A - py,-,)2 9

r=2

leads to the estimators

(8.1.6)

i = Y ( o , - f iY( - l ) 9 (8.1.7) n

G2=(n - l)-'c [(u, -Y(o))-b(y,-, - Y ( - , J 2 , r=2

where (yt-,), y(o,> = (n - i 1 - I Xy.=2 <Y,-,, Y,). These estimators are, strictly speak- ing, not the maximum likelihood estimators for the model stated in (8.1.1). The estimator can take on values outside of (-1,1), while the maximum likelihood estimator is constrained to the parameter space.

The estimators of A and p are those that would be obtained by applying least squares to the last n - 1 equations. he least squares estimator of c2,

n

s2=(n-33)-' c [ (y , -Y(o) ) - iXK-, -Y(- t ) )12 , (8.1.8) 1-2

is typicdly used in place of 6'. The least squares estimator for p differs from (8.1.2) by terms whose order in

probability is n-'. Therefore, by (6.2.9) and Corollary 6.3.6.1, n"'(@-p) is approximately normally distributed with mean zero and variance equal to 1 - pz. The limiting distribution is also derived in the next section. Note that the estimator

Page 429: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

406 PARAMEIER ESTIMATION

of p defined by (8.1.7) can be greater than one in absolute value, while that defined in (8.1.2) cannot.

Let us now retum to a consideration of the unconditional likelihood as given by equation (8.1.5). Differentiating the log likelihood with respect to 1u, p, and c2 and setting the derivatives equal to zero, we obtain

n - I

Ic = 12 + (n - 2)(1 - P ) I j YE + (1 - p) 2 y, + Y.] , 1-2

If y is known, Anderson (1971, p. 354) shows that the maximum likelihood estimator of p is a root of the cubic equation

f lp ) = p3 + c,p2 + c2p + c3 = 0 , (8.1.10)

where cj = - (n - 2)-’nc,,

and y, = U, - p. Hasza (1980) gives explicit expressions for the three roots of (8.LlO) and shows that there is a root in each of the intervals (-m, -1), (-1, 1) and (1, a). If yI is stationary, then

[CI, c21 = - [B, (n - 2)-’nI + O,(n-’) *

where the next section that b, - po = o,(n-I’’), where po is the true value. Hence,

= [Xy=2 y:-,]-’ Z:=, yI-Iyr is the least squares estimator. We show in

2 mP’ P3 - POP2 - P + Po = (P - 1)(P - Po) ‘

It follows from the results of Section 5.8 that the three mots offlp) = 0 converge in probability to - I , p,,, and 1, respectively. Therefore, the root in ( - 1 , f ) is consistent for po. We show in section 8.4 that the least squares estimator (8.1.7) and the maximum likelihood estimator have the same limiting distribution for stationary processes.

If y is unknown, Gonzalez-Farias (1992) showed that the unconditional

Page 430: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

H I G W ORDER AUTORBORESSIVE TIME SERIFS 407

maximum likelihood estimator is a solution of a fifth degree polynomial. A numerical solution can be obtained by iterating equation (8.1.10) and the estimator for p in (#.1.9), beginning with 4 = 9,.

8.2. HIGHER ORDER AUTOREGRJWIVE TIME SERIES

8.2.1. Least Squares EstDmation for Uaivarlate Processes In this section we study the ordinary least squares estimators of the pth order autoregressive process. Consider the time series

P

U, + C aiyt-, = 0, + e, , (8.2.1) f = 1

where the! mots of

(8.2.2)

are less than one in absolute value and the e, are uncorrelated (0, a') random variables with additional properties to be specified. Because the procedures of this section are closely related to multiple regression, it is convenient to write (8.2.1) as

where 4 = -q, i = 1,2,. . . , p. If the process is stationary, the expression

where E(Y,} = p = (1 + Z%, q)-lfJo, is also useful. The ordinary least squares estimator of 6 = (8,. el,. . . ,S,)' is

a=[ l=p+l i: x:x,]-' I=p+ i: I x;U,,

(8.2.4)

(8.2.5)

By the properties of autoregressive processes, E{Y,-,e,}=O for j > O and E{X;e,} = 0. Also, if the process is stationary with autocovariance function fih), then

Page 431: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

408 PARAMETER ESTIMATION

and under the conditions of Theorem 6.2.1,

It follows that the e m r in 8 is O,(n-’”). Also, by (8.2.7) the least squares estimator is asymptotically equivalent to the estimator

where %h) is defined in (6.2.3). The estimator (8.2.8) is sometimes called the Yule-Walker estimator.

he least squares estimator of a* is n

s2=(n-2p-1) - ’ I: s;, (8.2.9)

where Z, = Y, - X# The divisor for a’ is defined by analogy to ordinary regression theory. There are n -p observations in the regression, and p 4- 1 parameters are estimated.

The asymptotic properties of d and a’ are given in the following theorem. The theorem is stated and proven for martingale difference errors, but the result also holds for el that are iid(0, (r2) random variables.

r=p+ I

Theorem 8.2.1. Let Y, satisfy

P

x = t t i + X + ~ - ~ + e , , t = p + l , p + 2 ,.... (8.2.10) j - I

Assume (Y l , Y2,. . . , Y ) is fixed, or assume (Y,, Y2, . . . , Y,) is independent of e, for t > p and E{IY,12+’}<m for i = 1,2,. . . , p and some v>O. Let the roots of

(8.2.11)

be less than one in absolute value. Suppose {e,}]-, , is a sequence of (0, a‘) random variables with

Page 432: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

HIGHER ORDER AUTOREGRESSIVE TIME SERIES

and

for all t and some Y > 0, where d,-, is the sigma-field generated by {el : j C t - 1). If (Y, , Y2, . . . , Yp) is random, the sigma-field is generated by {ej : j S t - 1) and V I , y2, * 9 Yp>. Let

- 1

6= 1c x : x , i: x;q, 1 ,=p+l

where X l = ( l , q - I , q - 2 ,..., yt-p). Then

(a) n”’( b - 0)’. N(0, A-’02), and (b) 6 ’ 4 a’,

where

A = l i n-’ E(XiX,), ,+- 1=p+l

and 8’ is defined in (8.2.9).

Proof. We have

Given e>O, there is some No such that Z:=,+, XiX, is nonsingular with probability gteater than 1 - E for n >No. Let q be a column vector of arbitrary real numbers such that q’q#O, and consider those samples for which Y - p + I XiX, is nonsinguiar. Let

l=p+I r=p+l

where Z,, = n-112q‘Xief. Then E@,, I a?,-, I ) = 0 as.,

6 ~ = E { 2 ~ , ) 5 8 , _ , } = n - ’ ~ ’ X ; X f ~ * ,

and

imp+ I

Page 433: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

410 PARAMETER ESTIMATION

By Theorem 6.3.5,

Now

and 2 lim s,, =lim E{S;,} = q1Aqu2 ,

R - P m-tm

and hence condition ii of Theorem 5.3.4 is satisfied. We now investigate

<s;;-ye-~n-*-u'2L 2 E{Iq'x;lz+"}. 1=p+l

Let wj be the weights of Theorem 2.6.1. By Holder's inequality,

Therefore, E{Iq'X:12+Y} is bounded, condition iii of Theorem 5.3.4 is satisfied, and result a follows.

To prove part b, we observe that Z1 = e, - X,(d - 6) and

9 8: = i: e: - (n -p)(d-- @)'An(ij- e) f = p + l 1=p+l

= ji e,2+Op(1) t = p + l

Page 434: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

HIGHER ORDER AUTOREGRESSIVE TIME SERIES 41 1

where A, = (n - p ) - ’ Z2Ym,,+, XiX,. The conclusion of part b follows because A (n - 2p - I ) - ’ Xy=,+, e, converges to uz a.s. by Corollary 5.3.8.

Because the matrix A, = (n - p) - ’ X:=, + , X:X, converges to A in probability, the usual regression distribution theory holds, approximately, for the a u t o r e p - sive model. For exampie, u ~ ” ’ ( ~ - e), where vli is the ith diagonal element of (X:,,,, X:X,)-’&’, is approximately a N(0,l) random variable.

While we have obtained pleasant asymptotic results, the behavior of the estimators in small samples can deviate considerably Erom that based on asymptotic theory. If the roots of the autoregressive equation are near zero, the approach to normality is quite rapid. For example, if p = 0 in the first order autoregressive process, the normal or t-distribution approximations will perform very well for n > 30. On the other hand, if the roots are near one in absolute vdue, very large sampies may be required before the distribution is well approximated by the normal.

In Figure 8.2.1 we present the empirical density of the least squares estimator b

0.5 0.6 0.7 0.8 0.9 1 .o 1.1 Parameter Estimate

FIGURE 8.21. Estimated density of b comparcd with normal approximation for p = 0.9 and n = 100. (Dashed line i s m#mal density.)

Page 435: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

412 PARAMETER ESTIMATION

defined in (8.1.7). The density was estimated from 20,000 samples of size 100. The observations were generated by the autoregressive equation

y, = 0,9y,-, + e, ,

where the e, are normal independent (0. I ) random variables. The empirical distribution displays a skewness similar to that which we would encounter in sampling from the binomial distribution. The mean of the empirical distribution is 0.861, and the variance is 0.0032. The distribution obtained from the normal approximation has a mean of 0.90 and a variance! of 0.0019. The mean of the empirical distribution agrees fairly well with the approxi-

mation obtained by the methods of Section 5.4. The bias approximated from a first order Taylor series is E { b - p}=-n-'(l + 3p). This approximation to the expectation has been discussed by Mariott and Pope (1954) and Kendall(l954). Also see Pantula and Fuller (1985) and Shaman and Stine (1988).

Often the practitioner must determine the degree of autoregressive process as well as estimate the parameters. If it is possible to specify a maximum for the degree of the process, a process of that degree can first be estimated and high order terms discarded using the standard regression statistics. Anderson (1962) gives a procedure for this decision problem. Various model building methods based on regression theory can be used. Several such procedures are described in Draper and Smith (1981). In Section 8.4, we discuss other order determination procedures.

Often one inspects the residuals from the fit and perhaps computes the autocorrelations of these residuals. If the model is correct, the sample auto- correlations estimate zero with an e m that is Op(n-1'2), but the variances of these estimators are generally smaller than the variances of estimators computed h m a time series of independent random variables. See Box and Pierce (1970) and Ljung and Box (1978). Thus, while it is good practice to inspect the residuals, it is suggested that final tests of model adequacy be constructed by adding terms to the model and testing the hypothesis that the true value of the added coefficients is zero.

Example 8.2.1. To illustrate the regression estimation of the autoregressive process, we use the unemployment time series investigated in Section 6.3. The second order process, estimated by regressing Y, - j i , on - Y,, and Y,-z - Ya, is

P, - 4.77 = 1.568 (Y,-, - 4.77) - 0.699 (q-2 - 4.77), (0.073) (0.073)

where j$,, = 4.77 and the numbers below the coefficients are the estimated standard errors obtained from the regression analysis. The residual mean square is 0.105.

If we condition the analysis on the first two observations and regress Y, on Y,-, and Y,-z including an intercept term in the regression, we obtain

Page 436: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

HIGHER ORDER AUTOREGRESSIVE TIME SERIES 413

= 0.63 + 1.568 Y,-, - 0.699 K v 2 . (0.13) (0.073) (0.073)

The coefficients ace slightly different from those in Section 6.4, since the coefficients in Section 6.4 were obtained from equation (8.2.8).

To check on the adequacy of the second order representation, we fit a fifth order process. The results are summarized in Table 8.2.1. The F-test for the hypothesis that the time series is second order autoregressive against the alternative that it is fifth order is

F i g = 0.478[3(0.101)]-’ = 1.578.

The tabular 0.10 point for Snedecor’s F with 3 and 89 degrees of freedom is 2.15, and so the null hypothesis is accepted at that level. AA

8.23. Alternative Estimators for Autoregressive Time Series

The regression methd of estimation is simple, easily understood, and asymp- totically efficient for the parameters of stationary autoregressive processes. However, given the power of modern computing equipment, other procedures that are more efficient in small samples and (or) appropriate for certain models can be considered.

If the V, are normally distributed, the logarithm of the likelihood af a sample of n observations from a stationary pth order autoregressive process is the generali- zation of (8.1.5).

where Y’ = (Y,, Yz, . . . , Y,), J’ = (1,1,. . . ,1), Z,, is the covariance matrix of Y expressed as a function of (u’, e,, 8,. . . . , e,), and

Table 8.2.1. A ~ l J r s l s of Variance for Quarterly Season- ally Aajustea Unempioyment rate, 1948 to 1972

Degreesof Mean source Freedom Square

q-1 1 112.949 q - 2 after q-3 1 9.481 y-, after q-,, Y,-z 1 0.305 Y,-4 after Y,-l, Yr-2, Y,-3 1 0.159 K - 5 after q-1, K - 2 , y-3, q--4 1 0.014 Error 89 0.101

Page 437: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

414 PARAMETBR ESTIMATION

= E l - 2 - l 4 . (8.2.13)

Several computer packages contain algorithms that compute the maximum likelihood estimator. In Section 8.4, it is proven that the limiting distribution of the maximum likelihood estimator is the same as the limiting distribution of the ordinary least squares estimator.

By Corollary 2.6.1.3, a stationary autoregressive process can be given a forward representation

or a backward representation

where {el} and {u,} are sequences of serially uncorrelatd (0, rr2) random variables. The ordinary least squares estimator of a = (al, a i l . . . , ap) is the value of a that minimizes the sum of squares of the estimated e,. One could also construct an estimator that minimizes the sum of squares of the estimated u,. This suggests a class of estimators, where the estimator of a is the a that minimizes

(8.2.14)

The ordinary least squares estimator is obtained by setting w, 1. The estimator obtained by setting w, = 0.5 was studied by Dickey, Hasza, and Fuller (1984). We call the estimator with w, = 0.5 the simple symmetric estimator. For the zero mean first order process, the simple symmetric estimator is

Because I$-,$[ G00,5(Y~-, + Y:), GIs for the first order process is always less than or equal to one in absolute value.

We call the estimator constructed with

t = l , 2 ,..., p , t = p + 1, p + 2 , , . . , n - p + 1 , t = n - p + 2 , n - p + 3 , . . , , n ,

w, = (n - 2 p + 2)-'(t - p ) , (8.2.15) c : the weighted symmetric estimator. The weights (8.2.15) assume n 3 2 p . The

Page 438: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

weighted symmetric estimator is nearly identical to the maximum likelihood estimator unless one of the estimated roots is close to one. Tbe roots of the weighted symmetric estimator with weigbts (8.2.15) am not reshcted to be less than one in absolute value. For the zero mean 6rst order process, the weighted symmetric estimator with weights (8.2.15) is

Thus, the least squares estimam , and Mer in the weights given to Y, and

Tabk 8.2.2 contains the variables nquired for the symmetric estimation of the Y, in the divisor.

pth order process. Tbe estimator is

i3 = -(x’wX)-’x‘wy, where Xis the (2n - 2p) X p matrix below the headings -al , -%, . . . , -ap, Y is the (2n - 2p)-dimensional column vector called the dependent variable, and W is the (2n - 2p) diagonal matrix whose elements are given in the “Weight” column. An estimator of the covariance matrix of i3 is

Where

B2 = (n - p - l ) - ’ (Y’wy - B’X’WY).

Parameter

- 4 ... Dependent weight Vdrrbk - 0 1 -a,

Y l

y2

... yp-l W p + , 5. I u,

Wp+2 u ,+2 Yp+, Y, ...

Y” -p

y. Ym- 1

... Wn y. y.- l K - 2

1 - %-p+ I u, -p u , - p + , y,-,+, 1 - wn-p Ym-p-l Y” - p L P + ,

...

...

Page 439: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

416 PARAMETER ESTIMATION

If the mean is unknown, there are several ways to proceed. One option is to replace each U, in the table with Y, - J, where j = n-’ Zy=l Y,. The second is to replace the elements in each column of the table with U, - y(i), where ji(,) is the mean of the elements in the ith column. The third is to add a column of ones to the table and use the ordinary least squares formulas. The procedures are asymp- totically equivalent, but the work of Park (1990) indicates that the use of separate means, or the column of ones, is p r e f d in small sampies when the process has a root close to one in absolute value. If the estimator of the mean is of interest, the mean can be estimated by j , or one can use estimated generalized least squares where the covariance matrix is based on the estimated autoregressive parameters.

If a regression program has a missing value option that omits an gbservation if any variable is missing, one can obtain the estimators by creating a data set composed of the original observations followed by a missing value, followed by the original data in reverse order. The created data set contains 2n + 1 “observa- tions.” Lagging the created vector p times gives the p explanatory variables required for the regression. Then calling the regression option that deletes observations with a missing value enables one to compute the simple symmetric estimator for any autoregression up to order p. The addition of weights is required to compute the weighted symmetric estimator.

The ideas of partial autocomlation introduced in Section 1.4 can be used in a sequential estimation scheme for autoregressive models. Let a sample of n observations (Y,, Y 2 , . . . , Y,) be available, and define X, = U, - y,, where 7, = n-l x:=, u,. Let

-112 a

t = 2 t -2 r-2 (8.2.16)

be an estimator of the first autocomelation. Then an estimator of the first order autoregressive equation is

q = y n u - $1) + 4’u,-’ 9 (8.2.17)

and an estimator of the residual mean square for the autoregression is

(8.2.18)

where

A test that the first order autocomiation is zero under the assumption drat higher order partial autocorrelations are zero is

t , = [(n - z)-l(l - B f I ) ] - ’ ” 4 , . (8.2.19)

Page 440: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

HIGHER ORDER AUTOREGRESSIVE TIME SERIES 417

Under the null hypothesis, this statistic is approximately distributed as a N(0,l) random variable in large samples.

Higher order partial autocorrelations, higher order autoregressions, higher order autocorrelations, and tests can be computed with the following formulas:

(8.2.20)

where

and 20) = 1. The estimated partial autocorrelations ti ate defined for i = 1,2,. . . , n - 1, and t, and +f,i) are defined for i = 1,2,. . . ,n - 2. A test that 4, = 0, under the assumption that 4j = 0, j > i, is t,, which is the generalization of t , of (8.2.19).

If a pth order autoregression is selected as the representation for the time series, the covariance matrix of the vector of coefficients can be estimated with

V{C?,} = (n - 1 -p)-lP-l&;pp), (8.2.21 )

where

1 $(2) ..' &p-1) k l ) . . * 2 p - 2 )

1 1, ,-[ 2:) 1

2 p - 1 ) z ; ( p - 2 ) k p - 3 ) * * '

eP = (4,. &, . . . , dpP)', and a h ) is defined in (8.2.20). The matrix P-' is also given by

B ' S - I B , (8.2.22)

Page 441: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

418

where

PARAMEWER ESTIMATION

0 1

- 4 2

[email protected]

0 0 1

- Jp -3,p- I 1 ... 1. An alternative estimator of the partial autocornlation is

It can be verified that 4, is also always less than one in absolute value. The estimator (8.2.23) in combination with (8.2.20) was suggested by Burg (1975).

The sequential method of computing the automgmssion has the advantage for stationary time series that all roots of the estimated autoregression are less than one in absolute value. The estimators obtained by the sequential methods are very similar to the simple symmetric estimators. The sequential procedure also uses all available observations at each step. If regressions are only computed by regression in one direction, moving from an equation of order p - 1 to an equation of order p involves dropping an observation and adding an explanatov vsriable. Hence, the maximum possible order far the one-direction regression procedure is i n . The sequential procedure defines the autoregression up to order n - 1.

The alternative estimators are asymptotically equivalent for stationary auto- regressive processes.

"heorern 8.2.2. &et the assump$ons of Theorem 8.2.1 hold. Then the limiting distribution of n"2(6- 6). where 6 is the maximum likelihood estimator, the simple symmetric estimator, the partial correlation estimator, or the weighted symmetric estimator, is the same as that given for the ordinary least squares estimator in Theorem 8.2.1.

Proof. Omitted. A

The maximum likelihood estimator is available in many computer packages and performs well in simulation studies for the correct model. The simple symmetric estimator, the weighted symmetric estimator, the Burg estimator, and the maxi- mum likelihood estimator have similar efficiencies for stationary processes. The maximum likelihood estimator and the weighted symmetxic estimator perform better than other estimators for processes with roots close to one in absolute value. The maximum n o d likelihood estimator and the partial correlation methods

Page 442: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

HIGHER ORDER AUTOREORHSIVE TIMG SERIES 419

produce estimated equations such that the Foots of the estimated characteristic equation are all less than one in absolute value. It is possible for the roots associated with ordinary least squares or with the weighted symmetric estimator to be greater than one in absolute value. The ordinary least squares estimator perfonns well for forward prediction and is recommended as a preiiminary estimator if it is possible that the process is nonstationary with a root greater than one.

8.2.3. Multivariate Autoregrdve Time Series

In this subsection, we extend the autoregressive estimation procedures to vector valued processes. Let Y, be a k-dimensional stationary process that satisfies the equation

P

Y , - p + ~ A i ( Y , - i - p ) = e r (8.2.24)

for t = p + 1, p f 2,. . . , where e, are independent (0,s) tandom variables or martingale differences. If the process is stationary, E{Y,) = p. The equation can also be written

I = i

P

Y, + 2 AIY,+ = 4 + e f t i = 1

(8.2.25)

where t& is a k-dimensional column vector. Let Y,,Y, , . . ., be observed. Discussion of estimation for vector autoregressive processes can proceed by analogy to the univariate case. Each of the equations in (8.2.25) can be considered as a regression equation. We write the ith equation of (8.2.24) as

D

(8.2.26)

where Y,, is the ith element of Y,, 8,, is the ith element of $. A,, is the ith row of A,, and e,, is the ith element of e,.

and X, = (1, Y;-i, Y;-,,. . . , Y;-,), we can write equation (8.2.26) as

Defining = (e,,, -Ai, , , -Azk , . . . , -Ap, )', i = 1,2, . . . , k,

<, =X,q fe,, . (8.2.27)

On the basis of our scalar autoregressive results, we are led to consider the estimators

i = 1,2,. , .,k. (8.2.28)

If we let A' = (-4, A,, A,, . . . , Ap)', then the system (8.2.25) can be written

Page 443: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

420 PARAMETER ESTIMATION

Y:= -X,A'+e;, t = p + 1 1 p + 2 ,..., n , (8.2.29)

and the ordinary least squares estimator of A' is

(8.2.30)

where 4 is the ith column of -A' and 4 is the ith column of -A. The least squares estimator of 2 is

n

2 = [ n - ( k + l ) p - l ] - ' t: GrG;l (8.2.31) r=p+l

W h e r e

P

8, = Y, - 4 + I= A,Yl-p 1 - 1

and fi is defined in (8.2.30). An alternative method of computing the estimator of (A,, . . . AP) = Alzl is as

i: u:(yr-y)l* (8.2.32)

where U, = (Yl'-, - y'l Yim2 - ji', . . . , Y:-,, - 7') and = n-' Xf=, Y,. The dis- tributions of the estimators are the vector analogs of the distributions of Theorem 8.2.1.

r=p+ 1 1-=p+l

Theorem 8.2.3. Let the vector time series Y, satisfy D

C A ~ Y , - ~ = 8 + el j = O

for t = p + 1, p f 2,. . . , where {el} is a sequence of k-dimensional random variables and the Aj are fixed k X R matrices such that A, = I and the roots of

are less than one in absolute value. Suppose {el};s, is a sequence of (0 .2) random vectors with

E{(e,, ere:) 14-J = (0, a.s.

and

E{le112+' 1 &,) < L < QI a.s.

Page 444: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGE TIME SERW 421

for all t and some Y > 0, where 4- is the sigma-field generated by {e, : j G t - 1). Assume

is positive definite. If (Yl,Yz,. . . ,Yp) are random, it is assumed that E{IYi(’+”) < 03 for i = 1,2, . . . , p and the sigma-field is generated by (e): j G

n*”(d- 8)’. N(O,S@G-’) *

t - 1) and (Y,,Y,, . . . , Y,). Then

and %$s, where 4 is defined in (8.2.28), 2 is defined in (8.2.31),

6’ = (el, e;*. . . , 8;) ,

and Z 8 G - I is the Kronecker pmduct of the matrices 2 and G-’.

PmoC We only outline the proof, because it differs in no substantive way from that of Theorem 8.2.1. Since Y, is converging to a stationary time series, plim(n - p)-l 2:,,+, XiX, = G by Corollary 6.3.5. Therefore, the limiting dis- tribution of 6 follows from that of n*”(n -p) - ’ X:..p+l X: e,,, i = 1,2,. . . , k. By arguments paralleL to those of Theorem 8.2.1, we obtain the limiting distribution. If Z is singular, then the limiting distribution contains singular components. A

An alternative estimator for the vector process is the maximum likelihood estimator which can be computed by numerical procedures. The limiting dis- tribution of the maximum likelihood estimator is the stme as that of the least squares estimator for stationary vector processes.

8.3. MOVING AYERAGE TlME SERIES

We have seen that ordinary least squares regression procedures can be used to obtain efficient estimators for the parameters of autoregressive time series. Unfortunately, the estimation for moving average processes is less simple. By the results of Chapter 2 we know that there is a relationship between the

correlation function of a moving average time series and the parameters of the time series. For example, for the first order process, p( 1) = /3( 1 + /3’)-’, where f i is the parameter of the process. On this basis, we might be led to estimate f3 from an estimator of p(1). Since we demonstrated in Chapter 6 that

estimates p(1) with an error which is Op(n-f’Z), it follows that

Page 445: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

422 PARAMETER ESTIMATION

[2?(1)]-'{1 - [I -4i2(1)]"2}, O<)ql) (G0.5 , 41) < -0.5, 41) > 0.5 , (8.3.1) ji~={ ;;,

0 , 41)=0,

estimates j3 with an error of the same order, Obviously, if 31) lies outside the range (-0.5,0.5) by a significant amount, the mode1 of a first order moving average is suspect.

It follows from equation (6.2.8) that

var(q1)) = f i ( i + f i 2 p ( 1 + pz + 4p4 + p6 + pa) + qn-2). (8.3.2)

Because the derivative of p(1) with respect to j3 is (1 -i- /?2)12(1 - PZ), the approximate varianw of &, is, for IsI< 1,

var{jir}=n-'(1 -j32)-2(1 + j3z+4j34+p6+p8) . (8.3.3)

While the estimator (8.3.1) is consistent, we shall see that it is inefficient for /3 # 0. Roughly, the other sample autocorrelations contain information about /3.

More efficient estimators can be obtained by least squares or by likelihood methods. We first present an estimation procedure based on the Gauss-Newton method of estimating the parameters of a nonlinear function discussed in Chapter 5. One purpose for describing the procedure is to demonstrate the nature of the derivatives that define the limiting distribution of the estimator. In practice, there are a number of computer programs available to perform the numerical estimation. Maximum likelihood estimation is discussed in Section 8.4.

Consider the first order moving average

T = e , + & - , , (8.3.4)

where I#3l< 1 and the e, are independent (0, a2) random variables. Equation (8.3.4) may also be written

(8.3.5) e, = -Be,-, + 5 , and, using our previous difference equation results, we have

I- I

(8.3.6)

where

and

Page 446: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGE TIME SBRlRs 423

(8.3.7)

for t > 1. The expression (8.3.6) places the estimation problem in the nonlinear estimation format of Section 5.5.2. We assume initial estimators @ and Zo satisfying (@ - /3) = ~ ~ ( n - ” ~ ) and

Z,, = Op(l) are available. These requirements are satisfied if one uses Z, = 0 and the estimator & of (8.3.1). The one-step Gauss-Newton estimator of #? is obtained by regressing the deviations

I - I

e,(R 8) = Y, --x(Y; P, ZJ = C (-P>’Yt-, + (-B>’co (8.3.8) J’o

on the first derivative of i ( K 8, e,) evaluated at p = p; that derivative is

We could also include e, as a “random parameter” to be estimated. The inclusion of the derivative for a change in 6, does not affect the limiting distribution of the estimator of #? for invertible moving averages. Therefore, we simplify our discussion by considering only the derivative with respect to 8.

The computation of c,(Y;@) and W,(Y;P) is simplified by noting that both satisfy difference equations:

and

The difference equation for e,(Y;p) follows directly from (8.3.5). Equation (8.3.11) can be obtained by differentiating both sides of

e,(R B ) = Y, - Be,-, (K B)

with respect to p and evaluating the resulting expression at /3 = 8. improved estimator of #? is then

Regressing e,(U;@) on y(Y;p), we obtain an estimator of p - p . The

Page 447: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

424 PARAMETER ESTIMATION

The asymptotic properties of B are developed in Theorem 8.3.1. An interesting result is that the large sampb behavior of the estimator of the moving average parameter /3 is the same as that of the estimator of the autoregressive process with parameter -/3. The limiting distribution follows from the fact that the derivative (8.3.11) evaluated at the m e /3 is an autoregressive process.

Theorem 83.1. Let Y, satisfy (8.3.4), where 1/3’1< 1, Po is the true value, and the e, are independent (0, a’) random variables with E{ler[2+v) < L < 03 for some Y > 0. Let Z,, and 6 be initial estimators satisfying i?,, = Op( l), B - /3 = ~ , ( n - ” ~ ) , and 161 < 1. Then

n”2(/9- / ? O ) O ’ “0,l - ( P O ) ’ ] ,

where 6 is defined in (8.3.12). Also, & ’ 2 ~ ( c 1 0 ) 2 , where a” is the true value of a and

n

&* = n-’ C ej(E 4). r = 1

Proof. The model (8.3.6) is of the form (5.5.52) discussed in Section 5.5.2. The first derivative ofx(Y; @) is given in (8.3.9). The next two derivatives are

I- I

= C j(j - l)(j - 2)(-@)j-3q-, -t t(t - I)(r - 2~- /3 ) ‘ -~e , , J = 3

where it is understood that the summation is defined as if U, = 0 for t 6 0. Let $ be a closed interval containing p as an interior point and such that maxBES 1/31 C A < 1. By Corollary 2.2.2.3, f;(R /3) and the derivatives are moving averages with exponentially declining weights for all #3 in 3. Heoce, for @ in s they converge to stationary infinite moving average time series, the effect of e, being transient. It follows from Theorem 6.3.5 that the sample covariances and autocovariances of the four time series converge in probability for all @ in s and the limits are continuous functions of /3. Convergence is uniform on the compact set $. Hence, conditions 1 and 2 of Theorem 5.5.4 are satisfied and

Page 448: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGE TIME SERIES 425

If #3 = po,

= 2 (-@)'-let-j + (-po)r-l[ti;o - (f - l)eo] I = 1

and W,(Y; Po) is converging to a stationary first order autoregressive process with parameter -Po. Hence,

by Theorem 6.3.5, and

by the arguments used in the proof of Theorem 8.2.1. Thus, the Iimiting distribution for n"*@ - Po) is established.

Because 8' is a continuous function of ) and because n-' X:-, e:(Y; p) converges u n i f d y on S, it follows that 8' converges to (a')' in probability. A

Comparison of the result of Theorem 8.3.1 and equation (8.3.3) establishes the large sample inefficiency of the estimator constructed from the first order autocorrelation. By the results of Theorem 8.3.1, we can use the regular regression statistics as approximations when drawing inferences about #3.

In our discussion we have assumed that the mean of the time series was known and taken to be zero. It can be demonstrated that the asymptotic results hold for Y, replaced by Y, - p", where f,, is the sample mean.

The procedure can be iterated using 4 as the initial estimator. In the next section we discuss estimators that minimize a sum of squares criterion or maximize a likelihood criterion. The asymptotic distribution of those estimators is the same as that obtained in Theorem 8.3.1. However, the estimators of the next section generally perform better in small samples.

A method of obtaining initial estimators that is applicable to higher order processes is an estimation procedure suggested by Durbin (1959). By Theorem 2.6.2, any qth order moving average

Y, = C ~ s e r - a + er (8.3.13) r = I

for which the roots of the characteristic equation are less than one can be represented in the form

Page 449: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

426 PARAMETER ESTIMATION

m

y, = -2 cjy,-j + e, , / = I

where the weights satisfy the difference equation c, = -PI, cz = -& - pic, ,

c j= -2 pmcj-m, j = q + 1 , q + 2 , .... m = l

Since the weights cj are sums of powers of the roots, they decline in absolute value, and one can terminate the sum at a convenient finite number, say k. Then we can write

&

u,=- 2 cjyt-/ +- e, . (8.3.15) /= 1

On the basis of this approximation, we treat Y, as a finite autoregressive process and estimate cj, j = 1,2, . . . , k, by the regression procedures of Section 8.2. As the true weights satisfy equation (8.3.14), we treat the estimated c,’s as a fiNte autoregressive process satisfying (8.3.14) and estimate the p’s. That is, we treat

(8.3.16)

as a regression equation and estimate the P’s by regressing -Cj on

1,2, . . . , q, as per (8.3.14).

possible to use the results of Berk (1974) to demonstrate that

* * 9 C p q , A where the appropriate modifications must be made for j =

Ifwelet {k,}beasequencesuch thatk,,=o(n”’) andk,+masn-+oo,itis

k. 2 ( t j -C j )2=Op(k”n- ’ ) . j - I

It follows that the preliminary estimators constructed from the 6, will have an error that is oP(n-“’).

In carrying out the Gauss-Newton procedure, initial values Z, - q , Zz-q,. . . , Z,, are required. The simplest procedure is to set them equal to m. Alternatively, the autoregressive equation (8.3.15) can be used to estimate the Y-values preceding the sample period. Recall that a stationary autoregressive process can be written in either a forward or a backward manner (Corollary 2.6.1.2) and, as a result, the extrapolation formula for Yo is of the same form as that for Y,,,. If the true process is a qth order moving average, one uses the autoregressive equation to

Page 450: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGE TIME SERIES 427

estimate q values, since the best predictors for Y,, t S -4, are zero. Thus, one predicts Yo, Y-,, . . . , Yt-q , sets Y, = 0 for t < 1 - q, and uses equation (8.3.13) and the predicted Y-values to estimate e2-,, . , . , e,.

Example 83.1. We illustrate the Gauss-Newton procedure by fitting a first order moving average to an artificially generated time series. Table 8.3.1 contains 100 observations on X, defined by X, = 0.7e,-, + e,, where the e, are computer generated n o d independent (0,l) random variables. We assume that we know the mean of the time series is zero.

As the first step in the analysis, we fit a seventh order autoregressive model to the data. This yields

= 0.685 Y,-l - 0.584 Y,-2 + 0.400 Y,-3 - 0.198 K - 4

(0.108) (0.13 1) (0.149) (0.152)

f 0.020 Y,-5 + 0.014 q-6 + 0.017 Y,-7, (0.149) (0.133) (0.108)

Table 8.3.1. One Hundred Observations Prom a First Order Moving Average Time Series witb f l = 0.7

First 25 second 25 Third 25 Fourth 25

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

1.432 -0.343 - 1.759 -2.537 -0.295

0.689 -0.633 -0.662 -0.229 -0.851 -3.361 -0.912

1.594 1.618

- 1.260 0.288 0.858

- 1.752 -0.960

1.738 - 1.008 - 1 .589

0.289 -0.580

1.213

1.176 0.846 0.079 0.815 2.566 1.675 0.933 0.284 0.568 0.515

-0.436 0.567 1.040 0.064

-1.051 -1.845

0.281 -0.136 -0.992

0.321 2.621 2.804 2.174 1.897

-0.781

-1.311 -0.105

0.313 -0.890 - 1.778 -0.202

0.450 -0.127 -0.463 0.344

-1.412 - 1.525 -0.017 -0.525 - 2.689 -0.21 1

2.145 0.787

-0.452 1.267 2.316 0.258

- 1.645 - 1.552 -0.213

2.607 1.572

-0.261 -0.686 -2.079 -2.569 -0.524 0.044

-0.088 - 1.333 - 1.977

0.120 1.558 0.904

- 1.437 0.427 0.051 0.120 I .460

-0.493 -0.888 -0.530 -2.757 - 1.452

0.158

Page 451: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

428 PAhWETEJt ESTIMATION

where the numbers in parentheses are the estimated standard errors obtained from a standard regression program. The residual mean square for the regression is 1.22. The first few regression coefficients are declining in absolute magnitude with alternating signs. Since the third coefficient exceeds twice its standard error, a second order autoregressive process would be judged an inadequate representation for this realization. Thus, even if one did not know the nature of the process generating the data, the moving average representation would be suggested as a possibility by the regression coefficients. The regression coefficients estimate the negatives of the ci, j 1, of Theorem 2.6.2. By that theorem, /3 = -cI and c, = -pi- j = 2,3, . . . . Therefore, we arrange the regression coefficients as in Table 8.3.2, The initial estimator of B is obtained by regressing the firpt row on the second row of that table. This regression yields b=O.697.

Our initial estimator for e, is

to = 0.685Y, - 0.584Y2 + 0.400Y3 - 0.198Y4 I- 0.020U5 + 0.014Y6 + 0.017Y7 = 0.974.

The values of e,(U;0.697) are calculated using (8.3.10), and the values of W,(Y; 0.697) are calculated using (8.3.11). We have

e,(Y; 0,697) Y, - 0.697C0 = 1.432 - 0.697(0.974) = 0.753 , t = 1 , ={ Y, - 0.697et-,(Y; 0.697), t = 2 , 3 ,..., 1 0 0 ,

W,(Y; 0.697) to = 0.974, t = l , =( e,- ,(Y; 0.697) - 0.697v:_,(Y; 0.697), t = 2.3, . . . , 100.

The first five observations are displayed in Table 8.3.3. Regressing e,(Y; 8) on W,(Y; B ) gives a coefficient of 0.037 and an estimate of ) = 0.734. The estimated standard error is 0.076, and the residual mean square is 1.16.

Table 8.33. obsctnatlons for Regreapion Jibtimalion of anInitialEtimatcfor#J

Regression Coefficient Multiplier

i -5 of B

I 0.685 1 2 -0.584 -0.685 3 0.400 0.584 4 -0.198 -0.400 5 0.020 0.198 6 0.014 -0.020 7 0.017 -0.014

Page 452: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOREGRESSIVE MOVING AVERAGE TIME SERIES 429

Table 8.33. First Five Observations Used in Gauss- Newton ComputaUOn

r e,(K 0.697) W,(U; 0.697)

1 0.753 0.974 2 -0.868 0.074 3 -1.154 -0.920 4 - 1.732 -0.513 5 0.913 - 1.375

Maximum likelihood estimation of the parameters of moving average processes is discussed in Section 8.4. Most computer programs offer the user the option of specifying initial values for the likelihood maximization routine or of permitting the program to find initial values. Because of our preliminary analysis, we use 0.697 as our initial value in the likelihood computations. The maximum likelihood estimator computed in SAS/ETS@ is 0.725 with an estimated standard error of 0.071. The maximum likelihood estimator of the ecror variance adjusted for degrees of freedom is k2 = 1.17. The similarity of the two sets of estimates is a reflection of the fact that the estimators have the same limiting distribution. AA

The theory we have presented is all for large samples. There is some evidence that samples must be fairly large before the results are applicable. For example, Macpherson (1981) and Nelson (1974) have conducted Monte Carlo studies in which the empirical variances for the estimated first order moving average parameter are about 1.1 to 1.2 times that based on large sampIe theory for samples of size 100 and Odp $0.7. Unlike the estimator for the autoregressive process, the distribution of the nonlinear estimator of /3 for 8 near zero differs considerably from that suggested by asymptotic theory for n as large as 100. In the Macpherson study the variance of B for p = 0.1 and n = 100 was 1.17 times that suggested by asymptotic theory.

In Figure 8.3.1 we compare an estimate of the density of B for p = 0.7 with the normal density suggested by the asymptotic theory. The empirical density is based on 15,000 samples of size 100. Tfie estimator of /3 is the maximum likelihood estimator determined by a grid search procedure.

The estimated deosity for 4 is fairiy symmetric about 0.7 with a mean of 0.706. However, the empirical density is considerably flatter than the normal approxi- mation. The variance of the empirical distribution is 0.0064, compared to 0.0051 for the normal approximation.

8.4. AUTOREGRESSIVE MOVING AVERAGE TIME SERJES

In this section, we treat estimation of the parameters of time series with representation

Page 453: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

430 PfaRAMETw ESTIMATION

0.4 0.5 0.6 0.7 0.8 0.9 1 .O Parameter Estimate

FIGURE 83.1. Estimated density of maximum likeifiood estimator of f i compared with normal approximation for f i = 0.7 and n = 1.00. (Dashed line is normal density.)

P 4

Y, + C ajK-, = e, + E fie,-, , j = I i= I

where the e, are independent (0, u2) random variables, and the mots of P

~ ( m ; (u) = mp + (rim'-' = 0 , J " 1

(8.4.1)

(8.4.2)

and of 4

B(s; /3) = sq + 2 pisq-i = 0 (8.4.3) I - 1

are less than one in absolute value. Estimators of the parameters of the process can be defined in a number of

different ways. We consider three estimators obtained by minimizhg three

Page 454: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUMWRESSrVE MOVING AVERAGE TIME SERIES 431

different functions. If Y, is a stationary n o d time series, the logarithm of the likelihood is

L, , ( f ) = -0.51 log 2~ - 0.5 loglZyy(f)l - OSY'Z;;(f)Y (8.4.4)

where f ' = (6', v2), 6' = (al , a,, . . . , ap, &, &, . . . , @,I, Y' = (Y,, Y2,. . . , Y,). and 2,,=2,,(~) = E{YY').

The estimator that maximizes L,,(f) is often called the maximum likelihood estimator or Gaussian likelihood estimator even if Y, is not normal. Let a-22yAf)=M,,,(6). Then the maximum likelihood estimator of 6 can be obtained by minimizing

(8.4.5) Z n @ ) = n-llM,#)l I l n Y t Mij(8)Y.

The estimator of 6 obtained by minimizing

QJ6) = n-'Y'M;,!(B)Y (8.4.6)

is called the least squares estimator or the unconditional least squares estimator. An approximation to the least squares estimator is the estimator that minimizes

(8.4.7)

where the d,(6) are defined in Theorem 2.7.2. Observe that Q,,(6) is the average of the squares obtained by truncating the infinite autoregressive representation for el. In Corollary 8.4.1 of this section, we show that the estimators that minimize Zn(6), el,(@, and Qzn(6) have the same limiting distribution for stationary invertible time series.

To obtain the partial derivatives associated with the estimation, we express (8.4.1) as

P 9

el(& 6) = Y, + 2 - 2 p,el-i(Y; 6 ) . (8.4.8) j = 1 i= 1

Differentiating both sides of (8.4.8). we have

Using the initial conditions

Page 455: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

432 PARAMETER EST(MATI0N

w,,,(y,e)=o, j = i , 2 ,..., p ,

wpi,,(x e) = 0 9 i = 1,2, . , . , q ,

for t S p and e,-,(Y, 0) = 0 for t - i G p , the derivatives of QJ8) are defined recursively. The Wa,,(Y; 0) and WP,.,(Y; e) are autoregressive moving averages of Y,. Therefore, if the roots of (8.4.2) and (8.4.3) associated with the 8 at which the derivatives are evaluated are less than one in absolute value, the effect of the initial conditions dies out and the derivatives converge to autoregressive moving average time series as t increases.

The large sample properties of the estimator associated with Q2,(8) are given in Theorem 8.4.1. We prove the theorem for iid(0, g2) errors, but the result also holds for martingale difference errors satisfying the conditions of Theorem 8.2.1.

"heorem 8.4.1. Let the stationary time series Y, satisfj (8.4.1) where the el are iid(0,d) random variables. The parameter space Q is such that all roots of (8.4.2) and (8.4.3) are less than one in absolute value, and (8.4.2) and (8.4.3) have no common m u . Let 8' denote the true parameter, which is in the interior of the parameter space 8. Let 4 be the value of e in the closure of Q that minimizes Qz,(e)- men

and

- 1 0 2 P( 4 - O 0 ) 5 "0, v, (a ) 3 * (8.4.10)

where

Proof. Let (9, be a compact space such that, for any 6 E ee, the roots of (8.4.2) are less than or equal to one in absolute value, the roots of (8.4.3) are less than or equal to 1 - c in absolute value for some e > 0, and eo is in the interior of Se. For any 8 in S,,

where

Page 456: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOREORESSIVE MOVING AVERAGE TTME SERIES 433

and the polynomiaIs are defined in (8.4.2) and (8.4.3). The time series Z,(e) is a stationary autoregressive moving average because the roots of B(s; @) are less than one in absolute value. See Corollary 2.2.2.3, equation (2.7.14), and Theorem 6.3.5. The convergence is uniform by Lemma 5.5.5, because Z,(e) and its derivatives are infinite moving averages with exponentially declining coefficients. Now

defines the unique minimum variance prediction error for the predictor of U, based on q-l,q-z ,... - See Section 2.9. Therefore, if 6+8' and eEQ,,

V{Z,(@) ' V{z,(e">) - If any of the roots of B( 3; 8) are equal to one in absolute value, Q2,,(8) increases without bound as n -+m. It follows that the condition (5.5.5) of Lemma 5.5.1 is satisfied for 4 defined by the minimum of Q2,(f?) over the closure of 8. Hence, 4 converges to 8' in probability as n + m. Because &(a) is apcontinuous function of 8 thai converges uniformly to V{Z,(O)) on @,, and 4+eo, it follows that

The first derivatives of e,(Y; 0 ) are &lined in (8.4.9). The second derivatives e,,cr%> can be defined in a similar manner. For example,

Therefore, the second derivatives are also autoregressive moving follows that the matrix

averages. It

1

converges uniformly to B(B) = limf-.,m O.SE{W~,W,,), which is a continuous func- tion of 4 on some convex compact neighborhood S of 8' containing 8' as an interior point.

By the Taylor series arguments used in the proof of Theorem 5.5.1,

where B(6O) is defined in that theorem and r, is of smaller order than 4 - 8'.

Page 457: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

434 PARAMETER ESTIMATION

Now,

because the d,(B') decline exponentially. The vector WLl(Y 0') is a function of (Yl, Y,, . . . , and hence is independent of e,. Following the arguments in the proof of Theorem 5.5.1, we obtain the asymptotic n d t y of

ti

n-'', w;,(Y; @')el ,

and hence of nI"(4 - 0'). Theorem 5.5.1 does not apply directly, because Y, + Z;:; dJ(B)Y,-, are not independent and not identically distributed. However, as t increases, every element of W;,(Y; 6') converges to an autoregressive moving average time series in the c,, and we obtain V&'(U')~ as the covariance matrix of

1 - 1

the limiting distribution. A

The three estimators defined by (8.4.5), (8.4.6), and (8.4.7) have the same

Theorem 8.4.2. Let dm, and dl be the estimators obtained by minimizing (8.4.5) and (8.4.6), respective1 . Then under the assumptions of Theorem 8.4.1, the limiting distribution of n (I, - e') is equal to the limiting distribution of n'/2(d, - 8') and is that given in (8.4.10).

Proof. Let Yf = Y,(B) be an autoregressive moving average (p, q) with parameter 6 E Q, and uncmlated (0, a,) errors r,, where S, is defined in the proof of Theorem 8.4.1. Let

V = ( Y ~ - ~ , Y2-p , . . . , Yo, el-p, e2-,,, . . . , eo)

limiting behavior.

1 IY

* *

* * * * * * * * * * * * * * * Y,#, = (Y,, Y,, . . . , Yn), and e i =(e l , e,, . . . , en). Then

[:'I en =[;I t, +G> ::, where D = D(6) is the n X n lower triangular matrix with dii = d,@) = dl,-j,(f% d,(@) is defined in* Theorem 2.7.2, 0 is a ( p + q) X n matrix of zeros, 1 is ( p + q) X ( p + q), K = (I, K')' is a matrix whose elements satisfy

s, I 1 s i a p + q , 1 aj G p f q ,

- 2 p r k i - r + j - ai-J-q Y p + q + 1 S i S n + p + q,

- 5 p r l t - r , j 9 p + q + l S i S n + p + q , p + l S j S p + q ,

is the Kronecker delta. Note that the k, = k,(6) satisfy the same difference

1 d j G p , r- I

r-I I * *

tu =

and

Page 458: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOREGRESSIVE MOVING AVERAGE TIME SERIES 435

equation as the do(@), but with different initial values. Galbraith and Galbraith (1974) give the expressions

M;; = D'D - D'K(A-' + K'K)-'K'D (8.4.11)

and

where

The dependence of all matrices on 8 has been suppressed to simplify the expressions. Observe that Q2, = n-'Y'D'DY and

Q," - Q1, = n--'Y'D'K(A-' + KK)-'K'DY.

To show that Qzn - Q,, converges to zero, we look at the difference on the set 8,. Now

for some M < 00 and some O < A < 1, because the dr(8) and the elements of K decline exponentially. Also, A is a positive definite matrix for 8 E @,, and the determinant is uniformly bounded above and below by positive numbers for all n and al l 8 E 8, It follows that

SUP lQ,V) - Q,,(@)l= a.8.

We now investigate the ratio r,(e)[~,(e)]-' . Let the n x n lower triangular matrix T = T(8) define the prediction m r made in using the best predictor based on (y t - , , y l - z , . . . , U,) to predict Y,. See Theorem 2.9.1 and Theorem 2.10.1. Then,

rt.e~i3,

and

Page 459: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

436 PARAMETER ESTIMATION

where h,i >a2 and hi i+u2 as i+@. Also, letting 2;:; bjq-j be the best predictor of 6,

for some M < @ and O < A < 1. Thus,

* * for some M < 03 and & < ~ 0 . It foIIows that

i,,(e)[~~,,(e)j-~ = IMyje)lt'" c -+ 1.

* * Let the ( p + q)-vector Z = &e) = K'DY, where q(8) = Ey=l QlJ(8) and

n--i

Now the gJe), their first derivatives. and their second derivatives are exponential- ly declining in i for 6E ee. Also, the first and second derivatives of &,(@) are exponentially declining in i. The first and second derivatives of A are bounded on e. Therefore, r"(8) - Qln(f9), Qt,(6) - Q,,,(e), and Q,(e) - I, ,(@) and their first and second derivatives converge uniformly to zero in probability. By the stated derivative Properties of K(8) and A(@), the first and second

derivatives of n-'loglM,,(B)( converge uniformly to zero in probability for 8 E 69,. Therefore, the limits of l,,(@), of Q,,(8), and of Qzn(@) are the same, and the limits of the first and second derivatives of the three quantities are also the same. It follows that the limiting distributions of the three estimators are the same. A

We derived the limiting distribution of the estimators for the time series with known mean. Because the limiting behavior of sample autocovariances computed with mean adjusted data is the same as that for autocovariances computed with known mean, the results extend to the unknown mean case.

The mean squares and products of the estimated derivatives converge to the mean squares and products of the derivatives based on the true 8'. Therefore, the usual nonlinear least squares estimated covariance matrix can be used for inference.

There are a number of computer programs available that compute the Gaussian

Page 460: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOREORESSIVE MOVING AVERAGE TIME SERIES 437

maximum likelihood estimates or approximations to them. It is g o d practice to use the autoregressive representation technique introduced in Section 8.3 to obtain initial estimates for these programs. By Theorem 2.7.2, we can represent the invertible autoregressive moving average by the infinite autoregressive pmess

m

Y, = - 2 d,Y,-j + e, , j - I

(8.4.12)

where the d, are defined in Theorem 2.7.2. Hence, by terminating the sum at a convenient finite number, say k, and estimating the autoregressive parameters, d , , d,, . . . , d,, we can use the definitions of the dj to obtain initial estimates of the autoregressive moving average parameters.

In using a nonlinear method such as the Gauss-Newton procedure, one must beware of certain degeneracies that can occur. For example, consider the autoregressive moving average (1,l) time series. If we specify zero initial estimates for both parameters, the derivative with respect to a, evaIuated at a, =/3, = O is - Y,- ,. Likewise, the derivative with respect to P, evaluated at a, = /3, = 0 is Y,- , , and the matrix of partial derivatives to be inverted is clearly singular. At second glance, this is not a particularly startling result. It means that a first order autoregressive process with small at behaves very much like a first order moving average process with small PI and that both behave much like an autoregressive moving average (1,l) time series wbere both a, and 8, are small. Therefore, one should consider the autoregressive moving average (1 , l ) repre- sentation only if at least one of the trial parameter values is well away from zero. Because the autoregressive moving average (1,l) time series with at = /3, is a sequence of uncorrelated random variables, the singularity occurs whenever the initial values are taken to be equal.

In developing our estimation theory for autoregressive moving averages, we have assumed that estimation is carried out for a correctly specified model. In practice, one often is involved in specifying the model at the same time that one is constructing estimates. A number of criteria have been suggested for use in model selection, some developed with time series applications in mind. Because the estimated model parameters are asymptotically normally distributed and because the estimation procedures are closely related to regression, model selection procedures developed for regression models are also applicable to the auto- regressive moving average problem. A test based on such statistics was used in Example 8.2.1.

Another selection procedure is based on the variance of regression prediction, See Mallows (1973) and b i k e (1%9a). Assume one fits the regression model

y = X $ + e , (8.4.13)

where y is n X 1, X is n X k , /3 is k X 1, and e-(0, (r21). If one predicts the y-value for each of the observed X, rows of X, the average of the n prediction variances is

Page 461: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

438 PARAMETER ESTIMATION

(8.4.14)

where = X,B, @ = (X'X) X y, and k is the dimension of X,. Thus, one might choose the model that minimizes an estimator of the mean square prediction error. To estimate (8.4.14), one requires an estimator of cr2. The estimator of a2 that is most often suggested in the regression literature is

- 1 t

B2 = (n - k,)-'(y'y - @;x;x,B,), (8.4.15)

where X, is the model of highest dimension and that dimension is k,. Thus, the model is chosen to minimize

~ ~ ~ = n - ' ( n + k ) & ' . (8.4.16)

The use of the regression residual mean square for the particular set of regression variabIes being evaluated as the estimator of a2 generally leads to the same model selection.

A criterion closely related to the mean square prediction error is the Criterion called AIC, introduced by Akaike (1973). This criterion is

AIC = -2 log L(d) + 2 k , (8.4.17)

where L ( 6 ) is the likelihood function evaluated at the maximum likelihood estimator d, and k is the dimension of f2 For normal autoregressive moving average models,

-2 logL(6) = loglE,@)) + n + n log 27r,

where Eyy(b) is the covariance matrix of (Yl, Y2,. . . , YE) evaluated at 8 = b The determinant of ZY,(O) can be expressed as a product of the prediction error variances (see Theorem 2.9.3). Also, the variances of the prediction errors converge to the error variana u2 of the process as t increases. Therefore, 1 0 g ~ 2 ~ ~ ( d ) ~ is close to n log &:, where 8: = a:(@> is the maximum likelihood estimator of cr2. It follows that AIC and MPE are closely related, because

n-' log&,,(8)l+ 2n-'k=log[(l + 2n-'k)&~]=logMPE,

when the Ci2 of MPE is close to (n - k)-'nB;. The AIC criterion is widely used, although it is known that this criterion tends to select rather high order models and will overestimate the true order of a finite order autoregressive model. Because of the tendency to overestimate the order of the model, a number of related criteria have been developed, and considerable research has been conducted on the use of model selection Criteria, See Parzen (1974, 1977), Shibata (1976, 1986), Hannan

Page 462: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOReORESSIVE MOVING AVERAGE TIME SERIES 439

and Quinn (1979). Hannan (1980), Hannan and Rissanen (1982), Bhansali (1991), Findley (1983, Schwartz (1978), and Rndley and Wei (1989).

Example 8.4.1. The data of Table 10.B.2 of Appendix 10.B are artificially created data generated by an autoregressive moving average model. As a first step in the analysis, we fit a pure autoregressive model of order 10 by ordinary least squares. The fitted model is

$=- 0.099 + 2,153 q-, - 2.032 q-2 + 0.778 q-3 (0.104) (0.105) (0.250) (0.332)

(0.352) (0.361) (0.366) + 0.072 qV4 - 0.440 q-5 + 0.363 ?-6

- 0.100 Y,-, - 0.144 Y,-,+ 0.150 q-9- 0.082 (0.368) (0.355) (0.268) (0.1 13)

where the numbers in parentheses are the estimated standard errors of an ordinary regression program. We added seven observations equal to the sample mean to the beginning of the data set and then created ten lags of the data. This is a compromise between the estimators (8.2.5) and (8.2.8). Also, we keep the same number of observations when we fit reduced autoregressive models. There are 97 observations in the regression. The residual mean square with 86 degrees of freedom is 1.020. The fourth order autoregressive process estimated by ordinary least squares based on 97 observations is

$ = - 0.086 f 2.170Y,-, - 2.074 q-2+ 0.913 yt - -3 - 0.209 x-4 (0.102) (0.099) (0.224) (0.225) (0.100)

with a residual mean square of 0.996. This model might be judged acceptable if one were restricting oneself to autoregressive processes because the test for the fourth order versus the tenth order gives F = 0.63, where the distribution of F can be approximated by Snedecor’s F with 6 and 86 degrees of freedom.

We consider several alternative models estimated by Gaussian maximum ~kelihood in Table 8.4.1. All calculations were done in SAS/ETS*. The estimate

I Table8.4.1. ComparLson d Alternative Models Estl- mated by Maximum Likelihood Model

AR( 10) 1.005 301.4 1.115 0.996 294.5 1.046

ARMA(2.1) 1.032 296.9 1 .w4 -(222) 0.991 294.0 1.041 GRMA(2.3) 0.990 295.0 1 .om ARMA(3.1) 1.010 295.8 1.061 ARMA(3.2) 0.992 295.2 1.052

Page 463: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

of u2 given in the table is the maximum likelihood estimatol djusted for degrees of freedom,

e2 = (n - r)-I tr{YlM;;Y},

where Y‘ = (Y, , Y,, . . . , Y,), M,, is the estimate of My,, E,, = v Z M y y , 8, is the II X n covariance matrix of Y, and r is the total number of parameters estimated. The mean prediction e m was calculated as

W E = n-’(n + r)S2,

where n = 100, and the AIC is defined in (8.4.17). Several of the models have similar properties. On the basis of the AIC criterion,

one would choose the autoregressive moving average (2,2). For this model, the estimated mean is -0.26 with a standard error of 0.56, and the other estimated parameters are

(&’, 4, &, b,) = (- 1.502, 0.843, 0.715, 0.287). (0.109) (0.110) (0.061) (0.059) AA

Example 84.2. As an example of estimation for a prows containing several parameters, we fit an autoregressive muving average to the United States monthly unemployment rate from October 1949 to September 1974 (300 observations). The periodogram of this time series was discussed in Section 7.2. Because of the very large contribution to the total swn of squares from the seasonal frequencies, it seems reasonable to treat the time series as if there were a different mean for each month, Therefore, we analyze the deviations from monthly means, which we denote by U,. The fact that the periodogram ordinates close to the seasonal frequencies are large relative to those separated from the seasonal fquencies leads us to expect a seasonal component in a representation for Y,.

and V,-,,. Notice that we are anticipating a model of the “component” or “multiplicative” type, so that when we include a variable of lag 12, we also include the next three lags. That is, we are anticipating a model of the form

(1 -e,a-e,se2-e,a3xi - ~ 4 ~ 1 2 - ~ ~ s e 2 4 - e 6 s e 3 6 - e 7 g e 4 8 ) ~ = V, - e , ~ , - ~ - e , q 2 - e,y,-, - e,y,-,,

+ e,e,u,-,, + e2e4qMl4 + . - - + e,e,q_,, = e, . In calculating the regression equation we added 36 zeros to the beginning of the

data set, lagged Y, the requisite number of times, and used the last 285 observations in the regression. This is a compromise between the forms (8.2.5) and (8.2.8) for the estimation of the autoregressive parame€ers. The regression vectorsfortheexplanatoryvariables Y,-,, Y,-2, q-3r yI-12,U,--13, Y,-14,andY,-,,

Page 464: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

AUTOREGRESSIVE MOVING AVERAGE TLME SERIES 441

Table8.4.2. Regression CoeW~ents Obtained in P m liminary Autoregressive Fit to United States Monthly Unemployment Rate

Standard Error Variable Coefficient of Coefficient

Y- 1 1.08 0.061 T - 2 0.06 0.091

L I Z 0.14 0.060 Y - 3 -0.16 0.063

Y - L 3 -0.22 0.086 Y - 1 4 0.00 0.086 Y- LS 0.06 0.059 u,-24 0.18 0.053 Y - 2 5 -0.17 0.075 YI-26 -0.09 0.075 x - - 2 7 0.09 0.054 u,-36 0.08 0.054 Y-3, -0.13 0.075 Y-3, 0.04 0.075 Y - 3 9 0.00 0.054

Y - 4 9 -0.01 0.075 Y - S O -0.01 0.075 L, -0.09 0.053

K-.a, 0.11 0.055

contain all observed values, but the vectors for longer lags contain zeros for some of the initial observations.

The regression coefficients and standard e m n are given in Table 8.4.2. The data seem to be consistent with the component type of model. The coefficients for yI-,,,-, are approximately the negatives of the coefficients on Y,-lzi for i = 1,2,3. Even more consistently, the s u m of the three coefficients for U, - z, -, for j = 1,2,3 is approximately the negative of the coefficient for yI-lz,, i = 1, 2, 3, 4. The individual coefficients show variation about the anticipated relationships, but they give us no reason to reject the component model. The residual mean square for this regression is 0.0634 with 254 degrees of freedom. There are 285 observations in the regression and 19 regression variables. We deduct an additional 12 degrees of freedom for the 12 me8118 previously estimated. We also fit the model with U,- x- , , and the componding lags of 12 as well as the model with y I - , , yt-2. YtT3. yI-.,, qPs, and the corresponding lags of 12. Since the coefficient on y IW3 is almost twice its standard error, while the coefficients on yI-4 and yt-5 were small, we take the third order autoregressive process as our tentative model for the nonseasonal component.

The coefficients for yt-,2, yI-24, yI-36, yI-,, are of the same sign, are of small magnitude relative to one, and are declining slowly. The autoregressive co-

Page 465: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

442 PARAMETER ESTMAllON

efficients for an autoregressive moving average can display this behavior. Therefore, we consider

as a potential model. On the basis of Theorem 2.7.2, the regression coefficients for lags of multrples of 12 should satisfy, approximately, the relations of Table 8.4.3. Regressing the first column of that table on the second two columns, we obtain the initial estimates 8 = 0.97, B = -0.83. The estimate of $I is of fairly large absolute value to be estimated from only four coefficients, but we are only interested in obtaining crude values that can be used as star& values for the nonlinear estimation. The maximum likelihood estimate of the parameter vector of model (8.4.18) using (1.08, 0.06, -0.16, 0.97, -0.83) as the initial vector is

(J,, 4, a3, 4 ))=(1.152, -0.002, -0.195, 0.817, -0.651)

with estimated standard errors of (0.055, 0.085, 0.053, 0.067, 0.093). The residual mean square e m is 0.0626 with 268 degrees of freedom. Since this residual mean square is smaller than that associated with the previous regressions, the hypothesis that the restrictions associated witb the autoregressive moving average representa- tion are valid is easily accepted. As a check on model adequacy beyond that given by our initial regressions, we estimated four alternative models with the additional terms $+, e, - I , $-,,, In no case was the added term significant at the 5% level using the approximate tests based on the r e p s i o n statistics.

One interpretation of our final model is of some interest. Define X, = Y, - l.lSZY,-, + 0.002Y,-, -t 0.195Y,-,, Then XI has the autoregressive moving aver- age representation

X, = 0.817X,-1, - 0.651e,-,, + e, ,

where the e, are uncomlated (0, 0.0626) random variables. Now XI would have this representation if it were the sum of two independent time series X, = S, + u,, where v, is a sequence of uncorrelated (0,0.0499) random variables,

S, = 0,817S,-12 + u, ,

Table 8.4.3, cplculstfon of Initial Estimates of S and j#

Regression Coefficients Multipliers Multipliers

j -4 for s for #3

1 0.14 1 1 2 0.18 0 -0.14 3 0.08 0 -0.18 4 0.11 0 -0.08

Page 466: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTION WITH ESTlMATED PARAMETERS 443

and u, is a sequence of (0,0.0059) random variables. In such a representation, S, can be viewed as the "seasonal component" and the methods of Section 4.5 could be used to construct a filter to estimate S,. AA

8.5. PREDICTION WITH ESTIMAmD PARAMETERS

We now investigate the use of the estimated parameters of autoregressive moving average time series in prediction. Rediction was introduced in Section 2.9 assuming the parameters to be known. The estimators of the parameters of stationary finite order autoregressive invertible moving average time series discussed in this chapter possess errors whose order in probability is n-"2. For such time series, the use of the estimated parameters in prediction increases the prediction error by a quantity of OP(n-I'*).

Let the time series Y, be defined by

where the roots of

and of

9

rq + fir"-' = o

are less than one in absolute vdue and the e, are independent ( 0 , ~ ' ) random variables with E{ef} = 7p4. Let

i= 1

denote the vector of parameters of the process. When 8 is known, the best one-period-ahead predictor for U, is given by

Thewems 2.9.1 and 2.9.3. The predictor for large n is given in (2.9.25). The large n predictor obtained by replacing 9 and #?, in (2.9.25) by &, and ,&, is

where

Page 467: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

444 PARAMETER ESTIMATKlN

t = p -4 + I , p -4 + 2, * . . ? P YO*

(8.5.3)

Theorem 8S.1. Let yI be the time Series defined in (8.5.1). Let 6 be an estimator of 8 =(-a,, . . , , -apt p,, . . . , p,)' such that d- 8 = Up(n-1'2). Then

where f , + , ( Y , , . . . , U,) is defined in (2.9.25) and <+l(Yl , . . ., Y,) in (8.5.2).

Proof. We write D D

where 8' is between 6 and 8 and, for example,

and P ll

For I such that the mots of the characteristic equations are less t,an one in absolute value, the derivatives multiplying 6, - are, except for the initial effects,

A stationary time series. since 8- = ~~(n-''~), the result ~OIIOWS.

Theorem 8.5.1 generalizes immediately to predictions s peniods ahead. On the basis of th is result, the prediction variance formulas of Section 2.9 can be used as approximations for UK predictor fn+*(Y1,. . . , Y.1.

Additional results are available for the e m r in predictions for the stationary p-th order autoregressive process. Let Y, be the stationary process satisfying

D

(8.5.4)

Page 468: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTION WITH ESTIMATED PARAMmRS 445

A =

where the roots of

JI

(8.5.5)

are less than one in absolute value. and the e, are independent (0, cr2) random variables. We can also write

Y, =AY,-, fe , , (8.5.6)

where Y, = (U,, U , - l , . . . , y I - p + l , l)’,e, = (e,,O,. . . , O)’, and

--f% -cr, -a3 * . - - 9 - 1 -5 a,

1 0 0 * - * 0 0 0 0 I 0 * - . 0 0 0 0 0 1 .-. 0 0 0

0 0 0 . - - 1 0 0 I 0 0 0 .-. 0 0 1

The least squares estimator of a = (-ai, -%,. . . , --ap, a$‘ is

(8.5.7)

and the estimator of A, denoted by b, is obtaiwd from A by replacing the q with the estimator $. Let the least squares predictor of Y,,,, based upon (8.5.7). be

En+, = XY, (8.5.8)

for s 3 1.

Theorem 85.2. Let the model (8.5.4) hold with the e, independent identically distributed random variables with a symmetric distribution function. Let U,, Y2, . . . , Yp be symmetrically distributed with finite variance, independent of ep+ , , ep+2, . . . . Assume the distributions are such that

for n > p . Then, for n >p ,

h f . The least squares estimators 4. i = 1.2,. . . , p . can be expressed as functions of

Page 469: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

446 PARAMETBR ESTIMATJON

Therefore, the &,, i = 1,2,. . . , p, are even functions of (U,, Y,, . . . , Y,), and &,, is an odd function of (Yt, Y,, . . . , Y,). Note that adding a constant to each Y, will change the predictor by the same amount. Therefore, there is no loss of generality in the assumption that a, = 0. The prediction error is

S - 1

Y,+, - g,,, = 2 Aie,,+s-i + (A‘ - As)Y, . i=o

The last element of the last co~umn of iz” is one. AII other entries of the last column are multiples of $, where the multiples are functions of &,, &,, . . . ,ap and the functions may be zero for sma&l s. Let (Y,, Y,, . . , , Y,,) = Y be a realization, and let (-U,, -Y2,. . . , ;Yn) = 3 Then the value of (A“ - AJ)Y, for Y is the negative of the value for 2 The result follows because E{e,}=O, and

A because the distribution of (U, , Y,, , . . , Y,) is symmetric.

We now obtain an approximation for the variance of the prediction emr, conditional on (Y,, Y,-], . + . , Yn-p+ ,).

Theorem 83,3. Let Y, be a stationary normal time series satisfying the model (8.5.4) with the roots of (8.5.5) less than one in absolute value. Then the mean square errm of the predictor ?,+s of (8.5.8), conditional on Y,,, is

E{(Yn+j - Qn+s)(Yn+s - Qn+,>’ IYnI

where r = EOI’,Y:), llR,,]lz = t’R;R,,, Y,, = A’Y, forj = 0,1,. . . , s, and M is a matrix with one as the upper left element and zeros elsewhere.

Proof. We have . - I . . ,

Y,+, = A‘Y, + A’en+s-j , I-0

Page 470: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTTON WR'H ESTIMATED PARAMETERS 447

where Ao = I. Therefore, the e m in the predictor is r-1

-*S

Y,,, - q,,, = A'e,,,, - (A - As)Y, 1=0

It follows that

H(Ym+s - Q n + s M Y n + s - Qn+s)' IYnI s-1

= c2 2 A'MA" + E{(X - As)YnYi(A' - A")' I Y,} . i - 0

Because A = A + Op(n-1'2), we may expand b" in a first order Taylor series about A to obtain

a - I

By the normality, E{I& - #I*) = O(n-*), and we obtain

j - 0

(8.5.10)

where E(([R,,Il} = and E(IIRl,([ I Y.} = ~ ? ~ ( n - ~ " ) . Note that for t > n, q > n the upper left element of (A - A)TrTi(A - A)' is

(& - LT)'~,P:(& - LT) and all other elements are zero. Themfore, because P, and P, are functions of Y,,

where

Under om assumptions,

and

,qi&,l~ = 0 ( ~ - 3 ' ~ ) . A Note that the upper left element of A'MA'' is w/', where the w, satisfy the

Page 471: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

448 PARAMETER ESTlMATloN

difference equation (2.6.4). For the first order r s s ( p = I), the expression for the conditional mean square error of Y,+,f - n + r reduces to

where p = E { q ) and ~ ( 0 ) = E{(Y, - p)'}. If the mean is known to be zero and not estimated, we obtain for p = 1

2 - I E{(Y~+ I - fa+ 8 1' IYnI = (I"~ + Y,Y, (0) + Rn *

The unconditional mean square error of g,,, is the expected value of (8.5.9).

Cod1ai-y 8S3. Let the assumptions of Theorem 8.5.3 hold. Then

s- 1 r - l s-I

= (I"' 2 A'MA'' + n- 'u2 2 2 A'MA" (8.5.11) j = O j = O k=O

x tr[(As-'-'r)'(r-'As-k-')] + O ( ~ J - ~ ' ~ ) .

Proof. omitted A

The approximate mean square error of the predictor given in (8.5.9) can be expressed in scalar form as

(8.5.12)

where the wi are defined in Theorem 2.6.1. For s = 1, we have 1 2 ' E{(Y,+, - f m + l ) 2 1 ~ n ) = ~ 2 + ~ - (I" Ynr-Iyn

an approximation that is often used. Let

n

# = (n - p - l)-I x (y, - &'Y,-l)2 f m l

and f=n-'Zy,, Y,-lY;-l. An estimator, q(Yn+s-fIn.+I), of the mean square

Page 472: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

PREDICTION WITH ESTIMATED PARAMEfERS 449

error of prediction may be obtained from (8.5.9) by ignoring the remainder term and replacing A, r, and c2 by their estimators A, f, and a2, respectively. Let

s- I

V(Y,+, - P,,,) = a2 2 A k A t j j - 0

j = O k = O

(8.5.14)

For s = 1, equation (8.5.14) reduces to the estimated variance of the e m r in predicting Y,+l obtained by treating the problem as a hed-X regression prediction of Y,+, given Xn+l = Y,. Predictions more than one period ahead are nonlinear functions of the estimated parameters.

Example 85.1. A number of computer programs are available to compute predictions. Procedure ARIMA of SAS/ETS@ was used to estimate models in Example 8.4.1. That program computes predictions that are essentially the estimated minimum mean square error predictor. The estimated prediction variance for the s-period prediction computed by the program is the formula that ignores the estimation error,

.--I " 1

j = O (8.5.15)

where 9 are estimators of the coefficients of Theorem 2.7.1. Predictions for the next three periods using the estimated autoregressive moving average (2,2) of Example 8.4.1 are -7.96, -5.17, and -1.14 for one, two, and three periods, respectively. The standard errors output by the program are 1.00, 2.42, and 3.67 for one, two, and three periods, respectively.

If we use Gaussian maximum likelihood to estimate the fourth order auto- regressive process, we obtain the model

- j%= 2.188 (Y,-, - I;) - 2.130 ( K - 2 - j % ) (0.098) (0.220)

(0.220) (0.098) + 0.977 (Y,-3 - j % ) - 0.232 (q-4 - f i ) ,

where I; = -0.270 (0.493). The predictions for the next three periods based on the fourth order autoregressive model are -8.11, -5.37, and -1.30 for one, two, and three periods, respectively. The standard errors that do not contain a component for estimation error, as output by the program, are 1.00, 2.40, and 3.58 for one, two, and three periods, respectively. To illustrate the effect on the estimated standard e m of the component due to

estimating parameters, we use the ordinary least squares estimates of the fourth

Page 473: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

450 PARAMETER ESTIMATlON

order autoregression. The estimated model is given in Example 8.4.1. To compute predictions, we use the method of indicator variables and a nonlinear regression program. To construct predictions for three periods, we write the regression model

y, = eo -t- e,x,, + e2xf2 + e3xt3 + e4xt4 + e5x,, + ( ~ 6 - ~ ~ ~ 5 ) x 1 6 + ( ~ - e 1 e 6 - e ~ e ~ ) x t 7 + e t ,

where the regression variables are defined in Table 8.5.1. The estimator of (e,,e,,6,) obtained by nonlinear least squares is the predictor of (Yn+,,Yn+2,Yn+3). The estimated standard errors for (&, J6, 4) output by a nonlinear regression program are standard errom for the predictions containing a component for estimation error. The estimated predictions and the estimated standard errors for the nonlinear regression are given in the last column of Table 8.5.2.

The next to last column of Table 8.5.2 contains tbe standard errors computed with the formula (8.5.15) using the ordinary least squares coefficients. The differences between the standard errors of the last two columns are due to the

Table 8.5.2. Alternative Predictors for Data of Example 8.4.1

Method and model

Prediction ML, ARMA(2,Z) ML, AR(4) OLS, AR(4) OLS, AR(4) Period (8.5.15) (8.5.15) (8.5.15) (Nonlinear) n + l -7.96 -8.1 1 -8.17 -8.17

(1.W ( 1 *w ( 1.w (1.01)

n + 2 -5.17 -5.37 -5.51 -5.51 (2.42) (2.40) (2.38) (2.45)

n + 3 -1.14 - 1.30 - 1.48 - 1.48 (3.67) (3.58) (3.55) (3.68)

Page 474: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

NONLINeAR pRocEssBs 451

estimated effect of the estimation error. The estimation error makes a larger contribution to the estimated standard errors for the longer predictions.

Table 8.5.2 also contains predictions and standard errors for two models estimated by maximum likelihood, The maximum likelihood estimates of the autoregressive process differ slightly from the ordinary least squares estimates, and hence the predictions differ slightly. The autoregressive moving average predic- tions differ somewhat from the predictions based on the autoregressive model, but the differences are smaU relative to the standard errors. The differences among the predictions for the different models are larger than the differences among the estimated standard errors. AA

8.6. NONLINEAR PROCESSES

In this section, we consider estimation for time series that fall outside the class of finite parameter autoregressive moving averages. As we have seen, estimation for autoregressive time series has many analogies to linear regression theory. Estimation for moving average time series is a particular nonlinear estimation problem, in that the model can be expressed as an autoregression in which the coefficients satisfy nonlinear restrictions. We now study extensions in which the conditional expectation of Y, given ( Y t - l , . . . , Yo) is a nonlinear function of (u, -,.. . . . Yo).

Assume

Y, =fiz,; e) + e, , t = i , 2 , . . . , (8.6.1)

where Z, = (Y 1-1' Y t - 2 : : . . , Yo), 8 = (el, 0,. . . . , Ok)' is the parameter vector, and {e,} is a sequence of iid(0, a') random variables. Let d be the value of 8 that minimizes

AssumeflZ,; 6 ) has continuous first and second derivatives with respect to 6 for all t on a convex compact set S, S C 8, where the true value 8' is an interior point of S, and 8 is the parameter space. The model (8.6.1) is the same as the model (5.5.1) of Chapter 5. In Theorem 5.5.1, we gave conditions under which the nonlinear least squares estimator has a normal distribution in the limit. Basically, the derivatives must satisfy some convergence criteria.

In ordinary fitting problems, where U, is known to depend on a vector Z,, it is common practice to approximate an unknown functional relationship with a polynomial in Z,. The fitted function may be used directly as an approximation to the conditional expectation, or it may erne as an intermediate step in developing a nonlinear model with realistic properties. We demonstrate that similar approxi- mations can be used in time series analysis.

Page 475: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

452 PARAMf,lER ESTIMATION

Theorem 8.6.1. Let U, be a strictly stationary time series wsth finite 2(r+ 1) moments for some positive integer r. Let XI be a vector of polynomials in lags of r,:

XI = (1, K-1, q - 2 , . . . , KTpr Y I _ , t 2 U , - l U , - , * * * 9 y:-,> *

Assume

Y, =flZ,; 9) + el,

where {e,} is a sequence of iid(0,a’) random variables, 2, is the vector (r,-l, Y,-t, . . . , Yo), and 6 is a vector of parameters. Assume

where E{(Y,, Xl)‘(q, XI)} is positive definik Let

and

where = X I &

Proop. We have

By construction, E{XI’[f lZl; 6) -XI&) = 0. Also,

(8.6.5)

by the assumption (8.6.3). We demonstrate a stronger result in obtaining the order of 8- 6. By

assumption, el is independent of K-,, for h 9 1. Therefore, letting dI-, denote the sigma-field generated by (q- U,-’, . . .), X;e, satisfies the conditions of Theorem 5.3.4 because

Page 476: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

NONUNEAR PROCESSES 453

[ E { ~ X ; x r } ] - ' i 1 5 1 x;x,e,i .

Therefore,

and g- & =pp(n-"2). Because 6- 5: = OP(n-"'),

where we have used (8.6.3). A

It is a direct consequence of Theorem 8.6.1 that polynomial regressions can be used to test for nonlinearity in the conditional expectation.

Corollary 8.6.1. Let U, be the strictly stationary autoregressive process satisfy ing

P

Y, = tr, + C el&-, + e, . r = l

where {e,} is a sequence of iid(0, u2) random variables. Assume U, satisfies the moment conditions of Theorem 8.6.1, includinfi the assumption (8.6.3). Let 8 be defined by (8.6.5). where %= (&, &) and 6, is the vector of coefficients of (1, q-1,. . . , &-,I. Let

n

B2 = (n - Q-' 2 (u, - fy, 1-1

where k is the dimension of X, and = X,& Then

where

and the partition of f i confoms to the partition of 8.

Page 477: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

454 PARAMEER ESTIMATION

Proof. Under the assumptions,

andf(Z,; 8) - X,& = 0. The limiting normal distribution follows from the proof of Theorem 8.6.1. A

It follows from Corollary 8.6.1 that the null distribution of the test statistic

where k, is the dimension of t2 and k is the dimension of 8, is approximately that of Snedecor's F with k, and n - k degrees of freedom. The distribution when ffZ,; t9) is a nonlinear function depends on the degree to which fiZ,; 8) is well approximated by a polynomial. If the approximation is good, the test statistic will have good power. Other tests for nonlinearity are discussed by Tong (1990, p. 221), Hinich (1982), and Tsay (1986).

There are several nonlinear models that have received special attention in the time series literature. One is the threshold autoregressive model that has been studied extensively by Tong (1983, 1990). A simple first order threshold model is

where

(8.6.7)

The indicator function, s(Y,- ,, A), divides the behavior of the process into two regimes. If Y,-, 2 A , the conditional expected value of Y, given Y,-l is (6, + e2)Y,-,. If Y,-, < A , the conditional expected value of Y, given Y,-, is e,Y,-,. Notice that the conditional expected value of U, is not a continuous function of Y,- I for e2 # 0 and A # 0.

Threshold models with more than two regimes and models with more than one lag of Y entering the equation are easily constructed. Also, the indicator function 4.) can be a function of different lags of Y and (or) a function of other variables. Practitioners are often interested in whether or not a coefficient has changed at some point in time. In such a case, the indicator function can be a function of time.

Most threshold models do not satisfy the assumptions of our theorems when the parameters specifying the regimes are unknown. The conditional expected vaiue of Y, given Y,-* defined by threshold model (8.6.6) is not a continuous function of Y,- , , and the conditional expected value is not continuous in A. Models that are continuous and differentiable in the parameters can be obtained by replacing the function 4.) defining the regimes with a continuous differentiable function.

Page 478: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

NONLINEAR PROCESSES 455

Candidate functions are continuous differentiable statistical cumulative distribution functions. An example of such a model is

* U, = e, w,- I + e 2 m - 2 . K)L + e, , (8.6.8)

(8.6.9)

where 6(x, K) is the logistic function. The parameter K~ can be fixed or can be a parameter to be estimated. The conditional mean function of (8.6.8) is continuous and differentiable in (el, f?,, K,, q) for tcI in (0,"). Tong (1990, p. 107) calls models with s(.) a smooth function, such as (8.6.9), smoothed threshold autoregressive models. See Jones (1978) and Ozaki (1980).

&x, K) = [I + exdxc,(x - K2))~ -1 , *

Example 8.6.1. One of the most analyzed realizations of a time series is the series on the number of lynx trapped in the Mackenzie River district of Canada based on the records of the Hudson Bay Company as compiled by Elton and Nicholson (1942). The data are annual recurds for the period 1821-1934. The biological interest in the time series arises from the fact that lynx are predators heavily dependent on the snowshoe hare. The first statistical model for the data is that of Moran (1953). Tong (1990, p. 360) contains a description of other analyses and a detailed investigation of the series. We present a few computations heavily influenced by Tong (1990). The observation of analysis is log,,, of the original observations. The sample mean of the time series is 2.904, the smallest value of W, - f is - 1.313, and the largest value of U, - is 0.941. The second order autoregressive model estimated by ordinary least squares is

8 = 1.057 + 1.384 U , - l - 0.747 U , - 2 , (0.122) (0.064) (0.m) (8.6.10)

where 6' = 0.053, and the numbers in parentheses are the ordinary least squares standard errors.

Let us assume that we are interested in a model based on U,-l and q-2 and are willing to consider a nonlinear model. For the moment, we ignoG information from previously estimated nonliiear models. If we estimate a quadratic model as a first approximation, we obtain

j , = 0.079 + 1.366 yl-l - 0.785 Y, -~ + 0.063 Y:-, (0.034) (0.073) (0.068) (0.159)

2 + 0.172 Y , - ~ Y , - ~ - 0.450 JJ,- .~ (0.284) (0.172)

where y, = U, - p = Y, - 2.904 and &' = 0.0442. Also see Cox (1977). The ordinary regression F-test for the hypothesis that the three coefficients of the quadratic terms are zero is

F(3,106) = (0.0442)-'0.3661= 8.28,

Page 479: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

456 PARAMETER ESTIMATION

and the hypothesis of zero values is strongly rejected. The coefficient on yfbI is small, and one might consider the estimated model

j t = 0.089 + 1.350 yt-! - 0.772 y,-2 + 0.272 Y , - I Y , - ~ - 0.497 Y:-2 i

(0.027) (0.059) (0.059) (0.129) (0.123) (8.6.1 1)

where Ci2 = 0.0438. The estimated conditional expectation of Y,, given (y t - , , Y,-J, is changed very

little from that in (8.6.11) if we replace yIA2 in the last two tenns of (8.6.11) by a bounded function of Y , - ~ that is nearly linear in the interval (-1.1,l.l). Such a function is

g(Y,-,; K t , K2) = 11 4- exp(K,(Y,-, - K2)}1-' - (8.6.12)

If K~ is very small in absolute value, the function is approximately linear. As K,(Y, -~ - K ~ ) moves from -2 to 2, g(y,-,; K ~ , K ~ ) moves from 0.88 to 0.12, and as - K?) moves from -3 to 3, g(y,-2; q, K ~ ) moves from 0.9526 to 0.0474. If K~ is very large, the function is essentially a step function. Also, as K'

increases, the derivative with respect to K, approaches the zero function except for values very close to K ~ . The range of y, = Y, - y' is from - 1.313 to 0.94 1. Tbus, g(y,-,; -2.5,O) is nearly linear for the range of the data.

The estimated equation obtained by replacing y,-, with g(y, -2; -2.50) in the last two terms of (8.6.11) is

(8.6.13)

with Ci2 = 0,0432. We note that g(y,-,; K,, %) converges to the indicator function with jump of height one at the point K~ as K~ -+ -w. Therefore, the threshold model is the limit 8s ( K , , K2) 3 (-a, 22), where g2 is the point dividing the space into two regimes.

We fit the nonlinear model obtained by letting K , and K~ of (8.6.12) be parameters to be estimated. In the estimation we restricted K, to be the interval [-15, -0.51, and the minimum sum of squares m u d on the boundary K, = -15. The estimated function is

(8.6.14)

where Bz = 0.0420 and the standard error of k2 is 0.078. The standard errors are computed treating K, as known, If the standard errors are computed treating K~ as

Page 480: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

NONLINEAR PROCESSES 457

estimated, the standard error of 2, is larger than the estimated value, This reflects the fact that the derivative with respect to K~ approaches zero as tcI 4 -a.

The residual sum of squares for the model (8.6.14) is 4.448, while that for (8.6.13) is 4.620. The reduction due to fitting K~ and tcZ is 0.172, and the F for a test against (-2.5,O) is 2.05, which is less than the 5% tabular value of 3.08. If we assumed that we were only estimating tc2, then the improvement in the fit would be significant. The K, of - 15 means that about 76% of the shift occurs in an interval of length 0.27 centered at the estimated value of 0.357. The estimated variance of the original time series is 0.314. Therefore, the interval in which most of the estimated shift takes place is about one-half of one standard deviation of the original time series.

A threshold model fitted using g2 to define the regimes is

0.102 f 1.278 yral - 0.456 Y , - ~ if y,-,<0.36, (0.030) (0.071) (0.086)

0.166 + 1.527 yt-, - 1.239 y,+ if y , - , 3 0 . 3 6 , i (0.152) (0.108) (0.264)

jl =

where 8* = 0.0445. Tong (1990) has given threshold models with smaller residual mean square.

It is reasonable to conclude that the process generating the data is nonlinear, but a number of nonlinear models are consistent with the data. Selection of a final model would rest heavily on subject matter considerations. AA

A second nonlinear model that has been studied extensively is the bilinear model. See, for example, Granger and Andersen (19781, Subba Rao and Gabr (1984). and Tong (1930). The first order model can be written

q=a,Y,-, +BIY,- Ie,- , + e , , (8.6.15)

where el - NI(0, a2). Sufficient conditions for (8.6.15) to be a stationary process are a: + &v2 < 1 or E&j + e,l} < 1. Both conditions are intuitive in that they are analogous to a requirement that, on average, the coefficient of Y,- , be less than one in absolute value. See Tong (1981) and Quinn (1982).

A third class of nonlinear models is the random coefficient autoregressive models. An example of such a model is

where

a: = (a,,, a,,, . . . , a,,), and e, is independent of a,. Nichols and Quinn (1982) is a treatment of such models.

Page 481: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

458 PARAMETER ESTIMATION

8.7. MISSING AND OUTLIER OBSERVATIONS

Estimation with missing observations requires a specification of the mechanism that is responsible for the observations being missing. The most common specification is that observations are missing at random. See Rubin (1376) and Little and Rubin (1987). Under the missing-at-random specification, the likelihood can be written as

(8.7.1)

Y, is the vector of Y, actually observed, and Y, is the vector of missing observations. The subscripts of Y are placed in square brackets because they do not necessarily correspond to the index t.

A simple missing mechanism specifies some probability p,,, = q that an observation is missing and that this probability is equal for all observations. A more general model would permit the probability to depend on the index t, but in order to estimate 8 by maximizing only L(Y,; 8). the probability that the observation is missing must not be a function of Y.

Assume that Y, is a normal stationary autoregressive moving average of order ( p , q) and that m observations in a sample of size n are missing at random. Then

where YB is the (n -m)-dimensional vector of observations and Z,, is the covariance matrix of Y,. While it is relatively easy to write down the likelihood, there has been considerable research on computational methods.

Jones (1980) suggested using the Kalman filter representation of Section 4.6 to construct maximum likelihood estimators for autoregressive moving averages. Let

where r g t, s = max(p, q -k l), and E{Y,+] I r} = E{Y,+, I (Y,, Yz, . . . , Y,)). Let XI,?,, denote thejth element of X,,,, and abbreviate XI,, to X,. The Kalman recursion IS initiated at t = 1 with X, of (4.6.43) equal to 0, and I;vvoo equal to the covariance matrix of XI. Let the sample be complete. Then one can write the log-IikeEihood as

n 4

(8.7.3)

n log(2.rrtr2) + 2 log u,, + 2 o,'<Y, - I= I 1 3 1

Page 482: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MISSING AND OUTLIER OBSERVATIONS 459

where k,,,,f- , is the best linear predictor of Y, given (Y,-,, . . , , Y,), and

%=vfu, - ~ f , l , f - , l - (8.7.4)

We have suppressed the dependence of u,, and 9 ,.,,,-, on the parameters to simplify the notation. In Example 4.6.4, we illustrated how the usual Kalman recursion formulas can be used in the presence of missing data. Thus, the maximum likelihood estimators can be obtained by maximizing (8.7.3) with respect to the unknown parameters, using the formulas of Section 4.6, where the sum in (8.7.3) is over the elements of Y,.

If there are a modest number of missing values, the method of indicator variables can be used in conjunction with an ordinary maximum likelihood program to construct the maximum likelihood estimators. We illustrate the procedure in hample 8.7.1.

Example 8.7.1. The data of Table 8.7.1 are computer generated observations from a second order autoregressive process. We assume that observations 40, 57, and 58 are missing at random. One method of computing the maximum likelihood estimates under this assumption is the method of indicator variables illustrated in the construction of predictions in Example 8.5.1.

Tab& 8.7.1. Data for Example 8.7.1

1.467 7.181 8.939 5.810

-0.267 -4.053 -5.427 -3.624 -0.704

1.256 I

2.162 3.232 1.076

-2.572 -4.135 -3.836 -1.185

0.787 1.446 2.425

2.722 0.201

- 3.239 -5.287 -5.491 -3.210

0.682 4.97 1 6.265 3.347

-0.860 -3.697 -4.030 -1.710 -0.217 - 1.254 -0.618

0.087 0.553

-1.198 - 1.803

0.096 -0.113 -0.023 -0.746 - 1.072 - 1.235 -0.943

1.218 3.173 2.158 1.859 2.676 3.01 1 1.540

- 5.081 3.900

-0.107 -4.723 -7.357 -4.938

1.647 6.374 5.993 3.705

-0.784 - 3.036 -3.516 -1.510

0.294 1.637 0.633

- 1.447 -4.060 -5.559 -3.814

2.067

4.156 4.224 2.605

-0.561 - 1.842 - 1.580

0.266 1.526 3.401 3.450 1.361

-2.423 -5.100 -5.433 -3.377 -0.070

2.924 3.432 3.209 1.745

Page 483: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

460 PARAMEI'ER ESTIMATION

To implement the procedure, we define

-1 if t = M ,

= { 0 otherwise , -1 if t=57,

-1 if t = 5 8 ,

Then the model

(8.7.5) u, + a]fl,_l + aju,-2 = 6, 9

where el -NI(O, c2), is estimated by maximum likelihood. Using the maximum likelihood option of the AUTOREG procedure of SAS", we obtain

= P O + P l x t , + + @?'I3 + *

=(- 0.001, - 0.040, 1.442, 3.284, - 1.377, - 0.899, 1.183). (0.207) (0.554) (0.787) (0.786) (0.045) (0,042) (0.173)

where 8' is the estimator adjusted for degrees of freedom. Observe that (B,, b2, b3) is the best estimator of (Y40r Ys7, YS& given the other observations and that the parameters are equal to (&, ,%). The standard errors for Y5, and Y5, are larger than the standard error of the estimator of Y40 because Y,, and Y,, are adjacent missing observations. AA

In any statistical analysis, it is wise practice: to inspect the data at every stage of the analysis for extreme or unusual observations. Such observations are often called outlier observations. In time series analysis, the first step in the analysis is generally an inspection of plots of the original data. Such plots help one define the nature of the data. The plot of the data against time is often sufficient to identify very extreme observations, particularly extreme observations created by emus in recording data.

As with many important concepts, what defines an outlier is difficult to specify. This is because what is unusual requires a complete specification of the statistical model, a specification beyond that common in practice. Thus, in a sample of 100 normal (0,l) random variables, a value of sif-,)(Xtn) -Z(n-l,) = 5.00, where Xol, is the largest observation, is the root mean square for the 99 smallest observations, and .f(,,- I is the mean of the 99 smallest observations, is unusual. In a sample of 100 from a Cauchy distribution, the same value is less unusual.

Fox (1972) defined two types of unusual observations in time series. The first type is called an additive outlier. As an example model for additive outliers, assume that we observe the process

(8.7.6)

Page 484: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MISSING AND OUTLIER OBSERVATIONS 461

where XI is a stationary autoregressive moving average, 8, is zero for t # m and one for t = m, and l, is the value of the perturbation at r = m. Under such a model, one might attempt to estimate I, and to remove the effect of the 5, on the estimates of the autoregressive parameters. One can expand the specification by permitting more than one outlier.

The second type of outlier is called an innovation outlier. To illustrate, assume that Y, is a stationary autoregressive process satisfying

(8.7.7)

where e, = a, + S,,l,, and (a,,,,, 5,) has the properties described following (8.7.6). Then {, is called an innovation outlier.

If we assume that we know the point m at which an outlier may have occurred, treat the unknown &, as a parameter to be estimated, and assume is a normal process, then one can use likelihood methods to estimate 5, and to test the hypothesis that 5, = 0. If the point at which an outlier may have occurred is unknown, then a reasonable procedure is to search over the possible values. To construct the likelihood ratio for every value of t , or for every possible pair of values, is a large computational task. Therefore, it is common practice to fit a model to the original data and then inspect the residuals to see if any are unusual. The inspection is often done on the basis of plots. In an autoregressive process, an innovation outlier will generally appear as a single residual of large absolute value. This is because only the single observation fails to satisfy the autoregressive model. On the other hand, an additive outlier will affect the residual for the point at which it occurs and will also affect the p following residuals because they also fail to satisfy the basic autoregressive model. Statistics for testing for outliers were suggested by Fox (1972), and extensions have been considered by several authors, including Chang, Tiao, and Chen (1988) and Tsay (1988).

Assume we have an autoregressive process of order p with known parameters. The predicted vdue for Y,, given that Y, is not observed, is a function of

filter, say H, which when applied to the y, gives an estimator of Y,. The difference between Y, and Hyi divided by the estimated standard deviation of Y, - Hyk provides evidence on whether or not Y, is an additive outlier. For a first order process with parameter a,,

y, - - (Y,-,, . . . , Y,-,, Y,,,, . . . , Urn+,)'. Therefore, one can construct the linear

and the variance of the difference is a2( 1 + a:)-'. In practice, it is necessary to replace the unknown parameters with estimates.

If the parameters of the autoregressive process have been estimated and the residuals calculated, then the residuals at the points m, m 4- 1, . . . , m + p are affected by the presence of an additive outlier. Thus, an estimate of c,, calculated from the residuals. is

Page 485: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

462 PARAMETER ESTIMATION

(8.7.9)

where 4 = 1 and 2,+i = E?=, djYm++,, i =0, 1,. . . , p. A test statistic for the hypothesis that 5, = 0 is

(8.7.10)

where &' is an estimator of u2, such as the mpssion residual mean square. If the process is n o d , rn is not determined by the data, and 5, = 0, then t, is, approximately, a N(0,l) random variable.

An innovation outlier at time m affects only the residual at time rn. "Is, an estimate of 3, is the residuals at the point rn. Fox (1972) and Chang, Tho, and Chen (1988) have discussed choosing between the two types of outliers, A simple procedure is to choose the type of outlier that gives the largest absolute value of

In Example 8.7.2, we use model fitting procedures similar to those used in Example 8.7.1 to identify unusual observations, to classify the unusual observa- tions, and to construct estimates of the autoregressive parameters.

the estimate of lrn.

Example 8.7.2. Table 8.7.2 contains 100 computer generated observations from a second order autoregression. The basic model is

25.267 27.08 1 26.819 25.1 10 20.733 16.947 15.573 17.376 20.2% 22.256 23.162 24.232 22.076 18.428 16.865 17.164 19.815 21.787 22.446 23.425

23.722 21.201 17.761 15.713 15.509 17.790 21.682 25.971 27.265 24.347 20.140 17.303 16.970 19.290 20.783 19.746 20.382 21.087 21553 22.063

19.802 19.I97 21.096 20.887 20.977 20.254 19.928 19.765 20.057 22.2 18 24.173 23.158 22.859 23.676 24.01 1 22.540 22.818 24.914 26.081 24.900

20.893 16.277 13.643 16.062 25.712 31.666 30.120 25.048 17.757 14.199 14.525 18.885 23.230 25.915 24.403 20.349 15.452 12.609 14.620 22.137

26.266 27.653 25.962 21.455 22.665 17.355 19.120 21.462 24.929 26.190 24.300 19.655 15.589 14.1 17 15.886 19.861 24.060 25.628 25.75s 23.786

Page 486: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MISSING AND OUTLIER OBSERVATIONS 463

where e, - NI(0, a2). The parameters of the second order autoregressive process estimated by maximum likelihood are

(fi,&l,&2, a2)= (21.150, - 1.308, 0.840, 2.20). (0.276) (0.053) (0.053) (0.32) (8.7.12)

Figure 8.7.1 is a plot of the standardized residuais from the second order autoregression, plotted against time. The standardized residuals are

c1 = (Y, - &)~;““o), z* = [(Y2 - $1 - m ( Y , - P)I(~y(o)r1 - i52(1)11-1’2 >

Z, = [Y, - $ + hl(yt-, - f i ) + 4(yI-2 - $)la-’, t=2,3,. . . ,n,

0 20 40 60 80 100 Observation

FIGURE 8.7.1. Residuals from second Mder autofegressive fit.

Page 487: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

464 PARAMETER ESTIMATION

where &' = 2.20 is the estimated variance of el. There is a large deviation at t = 65 and three large deviations for t = 85, 86, and 87. In practice, one would attempt to determine if an error has been made in data recording or if there is a subject matter basis for the unusual observations. We proceed under the assump- tion that no explanation for the large deviations is available.

The plot suggests that there is an innovation outlier at t = 65, because there is a large deviation only at that point, the deviations at t = 66 and t = 67 appearing rather ordinary. The deviations associated with t = 8 5 , 86, and 87 indicate that there may be an additive outlier at t = 85. There is a large positive deviation at t = 85, followed by a negative deviation at t = 86, followed by a positive deviation at t = 87, and these signs match the signs of the autoregressive coefficients. The value of t,,, of (8.7.10) for m = 85 is 5.81, while the standardized residual is u eEJ = 3.48. Thus, the data support the presence of an additive outlier over the presence of an innovation outlier at t = 85.

To estimate the parameters of the model treating Y6$ as an innovation outlier and Y8, as an additive outlier, we write a regression model

A - I a

(8.7.13)

where u, is the zero mean second order autoregressive process. In this nzpresenta- tion, an additive outlier can be introduced by creating an indicator variable for the point. For our example, we let XI, = 1 for the parameter p and let

1 if r = 8 5 , 0 otherwise.

(8.7.14)

The process of maximizing the likelihood for the model (8.7.13) with XI = ( X l , , X 1 2 ) can be visualized as transforming the data on the basis of the autoregressive model and then finding the /3 that minimizes the regression residual mean square. Thus, after transformation,

YS5 = p + & - ry1(Y84 - p) - a2(y83 - p) + e85 *

yS6 = p - a1(Y85 - p -. ss,) - a2(y&4 - p) + e86 9

y8'7 = - cTI(y86 - - %(u8, - - & 5 ) + e87 I

where (PI. &I = (P, &). A different X-variable is requiied for an innovation outlier in regression model

(8.7.13). We require a variable which, after the autoregressive transformation, is one for t = m and is zero for t # m. If we know (mi, a'), the variable

if tC6.5, if t = 6 5 , (8.7.15)

= { %,x~- 1.3 ' a2xt-2,3 if t>65

Page 488: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MISSING AND OUTUER OBSERVATIONS 465

will satis$ the requirements for rn=65, because X,, satisfies the difference equation. We created the vector X t = ( X t l , X , z , X , 3 ) using (-1.308,0.840) from our initial fit of the autoregressive model. The model (8.7.13), estimated by maximum likelihood using the ARIMA procedure in SASQ, gives the estimates

(A &,5, f85, &,, $, 82)=(20.9!?1, 4.65, 5.27, (0.220) (0.55) (1.07)

- 1.390, 0.877, 1.157). (0,048) (0.048) (0.170)

If we iterate, using (6,. &J = (- 1.390, 0.877) to create a new X,, variable, we obtain

(G, &5, tB5, g,, &, 2) = (20.982, 4.64, 5.16, - 1.395, 0.880, 1.179) . (0.223) (0.55) ( I .lo) (0.048) (0.047) (0.173)

(8.7.16)

Note that iteration changed the estimates very little. The estimates of tS and lB5 are estimated differences between the observed

values and the values expected under the autoregressive model, where the deviation is assumed to be an additive outlier for r = 85 and an innovation outlier for r = 65. Because we identified these points from the residual plots, we should not use ordinary critical values in making tests. Monte Car10 studies by Chang, Tiao, and Chen (1988) suggest that the largest t in a sample of 100 will exceed 4.00 about 1% of the time when the null model is true. Therefore, both observations ate suspect.

Removing the outliers produces a large reduction in the estimated variance, where the estimated variance in (8.7.16) is only about 55% of that in (8.7.12). The changes in the autoregressive coefficients due to removing the outliers are between one and two standard errors, and the estimated mean is changed by about one-half the standard error.

If we use the estimates given in (8.7.12) to create predictions for the next three periods, we obtain

(Pi011 f102, f103) = (20.73, 18.38, 17.89). (1.48) (2.44) (2.76)

The corresponding predictions constntcted with (8.7.16) are

The largest effect of removing the outliers is a reduction in the estimated standard error of the prediction errors. One should evaluate these effects when deciding to remove outliers. If the true model is one with long tailed errors, then removing

Page 489: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

466 FARAMETeR ESTIMATlON

outliers and using the smaller estimate of Q* will lead to confidence intervals for predictions that are too narrow. AA

The estimation procedure of Example 8.7.2 assigns zero weight to observations that are identified as outliers. Alternatively, the presence of unusual observations might lead to a modification of the working model and the application of estimation procedures to the modified model. One might specify a long tailed distribution or a mixture distribution for the errors and use maximum likelihood estimation under the specified model. See Kitagawa (1987), Pefia and Gutrman (1989). and Durbin and Corder0 (1993).

Estimators that perform well under a wide range of statistical models are called robust estimators. See Hampel et al. (1986) and Huber (1981) for general discussions of robust procedures. Robust procedures that automatically down- weight large deviation observations have also been considered for time series. See, for example, Martin (1980), h i s s (1987), and Bruce and Martin (1989).

8.8, LONG MEMORY PROCESSES

We introduced long memory time series in Section 2.1 1. The long memory time series that is called fractionally differenced noise satisfies

(8.8.1)

and

$d) = [r(j + i)r(-d)~-~r(j - d ) ,

~ ( d ) = mi + ~)r(d)i - 'rcj + d ) ,

and -0.5 < d < 0.5, As might be expected, the estimation theory associated with such processes differs from that of short memory processes. For example, consider the variance of the sample mean. We have

If ~ ( h ) is proportional to hZd-' and d < 0, then ~ ( h ) is absolutely summable and V E n } = O(n- ). If 0 < d C 0.5, ~ ( h ) is not summable, but Yajima (1988) has shown that

Page 490: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

LONG MEMORY PROCESSES 467

lim ni-zd v&} = dr( 1 - u)[r(d)r( 1 - dw( i + 2.41 - . (8.8.2) n - s m

The distributions of the sample autacovariances are difficuIt to obtain, but it is relatively easy to show that the sample autocovariances are consistent.

Lemma 8.8.1. Let U, be the infinite moving average time series m

U, = C ye,.-, , ?PO

where e, are independent (0, a*) random variables with bounded fourth moments and the v, are square swnmable. Let

n-h

W) = n-’ x U,q+, . r * l

Then plim,,, +y(h) = %(h) for h = 0, 1,2,. . . .

and

uniformly in n, because lim,,,E{CX,- U,) ’ )=O, uniformly in t, Therefore, by Lemma 6.3.2,

A Plim + y W = Y y ( W * n+m

To introduce estimation for the parameters of long memory processes, assume

Page 491: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

468 PARAM- ESTIMATION

we have n observations from the Y, of (8.8.1) and we desire an estimator of d. Consider the estimator of d obtained by minimizing

(8.8.3)

with respect to d. Note that the infinite autoregressive representation of U, is truncated at length t - 1 in (8.8.3). The function (8.8.3) is of the same type as (8.4.7).

Lemma 8.82. Let the model (8.8.1) hold, let do E 03 be the true parameter, and let 8 = [d , d2,] C (0.0,O.S). Assume the e, are independent identically distributed (0, Q ) random variables with finite fourth moment E(e:) = &*. Let d be the value of d E 8 that minimizes (8.8.3). Then plim,,, d = do.

'f

Proof. We show that d is a consistent estimator of do by showing that

lim P{ inf [Q,(d) - Qfl(dO)l > 0) = 1 (8.8.4) "*)" Id-doIarl

for all q > O . Consider the difference

where Q,(d) = n - Z:= S:(d) and

Now,

= o(t-2d) Therefore

and

Page 492: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

LONG MEMORY PROCESSES 469

for all d , and d, in 8, where

(8.8.5)

(8.8.6)

and lbj(d)tGMj-l-d logj as j + m . Since the coefficients ~ ( d ) and bi(d) are absolutely summable and attain their supremum at some point in the set [d,,, d,,], it follows that the supremum of the derivative on the right of (8.8.5) is UJl). Thus, by Lemma 5.5.5, plim,,,, C,,(d) = 0 uniformly in d. Therefore, we consider the d associated with the minimum of Z:=, A:@).

We have n

plim n-' 2 A:@) = E{A:(d)}

because A,@) = XY-, gl(d)e,+, where g,(d) is square 2.2.3,

n-wJ , = I

summable by Theorem

m

gi(d) = C Ki-r(d)y(do) 9 ypi-m

and it is understood that ~ ( d ) = 0 and $do) = 0 for j < 0. Again, by the mean value theorem,

for all d , and d2 in 6, and, by the properties of Kj(d) and b,(d), the supremum on the right is UJl). Now,

Page 493: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

470 PARAMEFER ESTIMATION

for some M < a because SUPdce lq(d)l and SUp,,,elb,(d)l am absolutely summ- able; we have used the dominated convergence theorem. Hence, by Lemma 5.5.5,

n

plim n- ' A:@) = E{A :(d)} n+- / = I

uniformly in d. Because E{A:(d)} reaches its minimum at do, the condition (8.8.4) is established. A

Results on estimators of d and of the parameters of the process

(1 - &u, = 2, , ' (8.8.7)

where Z, is a stationary autoregressive moving average, have been obtained by a number of authors. Maximum likelihd estimation for the normal distribution model has been studied by Fox and Taqqu (1986), Dahlhaus (1989), Haslett and Rafbry (1989). and Beran (1992). Properties of Gaussian maximum likelihood estimators for linear processes have been investigated by Yajima (1985) and Giraitis and Surgailis (1990). Also see Robinson (1994a). The following theorem is due to Dahlbaus (1989). The result was extended to linear processes by Giraitis and Surgailis (1990).

Theorem 8.8.1. LRt Y, satisfy (8.8.71, where d E [dl0, d,,] C (0.0,0.5), and Z, is a stationary normal autoregressive moving average with (k - 1)dimensional parameter vector 4. where t& is in a compact parameter space Q,. Let 8 = (d, 8;)'. Let 4 be the value of B that maximizes the likeiihood. Then

n112(d- 810- N(B, Ve,)

where

fv(o) is the spectral density of Y,, and the derivatives are evaluated at the true 61

Proof. Omitted. See Dahlhaus (1989). A

The inverse of the covariance matrix of "beorem 8.8.1 is also the limit of the expected value of n-'b;(Y)h#(Y), where

Page 494: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 471

log L(Y : 8) is the log-likelihood function, and the derivatives are evaluated at the true 8.

Section 8.1. Anderson (1971), Gonzalez-Parias (19921, Hasza (1980), Jobson (1972). Koopmans (1942).

Section 8.2. Anderson (1959, 1962, 1971). Box and Pierce (1970). Drapef and Smith (19811, Hannan (1970), Kendall (1954), Mann and Wald (1943a). Maniott and Pope (1954). Reinsel (1993). Salem (1971). Shaman and Stine (1988).

Sections 8.3,8.4. Anderson and Takemura (1986), Be& (1974), Box, Jenkins and Reinsel (1994), Brockwell and Davis (1991). Cryer and Ledolter (1981), Durbin (1959). Eltinge (1991). Hannan (1979). Kendall and Stuart (1966), Macphetson (1975). Nelson (1974). Pierce (197Oa), Sarkar (1990), Walker (1%1), Wold (1938).

Sedion 8.5. Davisson (1965). Fuller (1980), Fuller and Hasza (1980,1981). Hasza (1977). Phillips (1979), Yamamoto (1976).

Section 8.6. Granger and Andersen (1978). Prieatley (1988). Quinn (1982), Subba Rao and Oabr (1984). Tong (1983, 1990).

SeetiOn 8.7. Chang, Tiao, and Chen (1988), Fox (19721, Jones (1980). Ljung (1993). Tsay (1988).

sectton 8.8. Beran (1992). Dahlhaus (1989). Deo (1995), Fox and Taqqu (1986), Yajima (1985).

EXERCISES

1, Using the first 25 observations of Table 7.2.1, estimate p and A using the equations (8.1.7). Compute the estimated standard errors of a and d using the standard regression formulas. Estitnate p using the first equation of (8.1.9). Then compute the root of (8.1.10) using y, = Y, - 3. Iterate the computations.

2. Let Y, be a stationary time series. Compare the limiting value of the coefficient obtained in the regression of Y, - t(l)Y,.-l on Y,-, - ?(l)Y,-* with the limiting value of the regression coefficient of Y,-2 in the multiple regression of Y, on Y,-l and Y,-2.

3. Compare the variance of y,, for a first order autoregressive process with Var(&} and Va@}, where & = d( 1 - p)-' , and A^ and 3 are defined in (8.1.7) and (8.1.9). respectively. In computing Var($) and Va@}, assume that p is known without error. What arc the numerical vahes for n = 10 and p = 0.71 For n = 10 and p = 0.9?

4. Assume that 100 observations on a time series gave the following estimates:

NO) = 200, fll) = 0.8, 42) = 0.7, 33) = 0.5.

Page 495: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

472 PARAMETER ESTIMATION

Test the hypothesis that the time series is first order autoregressive against the alternative that it is second order autoregressive.

5. The estimated autocorrelations for a sample of 100 observations on the time series {XI} are 41) = 0.8, 42) = 0.5, and $3) = 0.4. (a) Assuming that the time series {XI} is defined by

XI = ax,-1 + B2XI.-2 +el 9

where the e, are n o d independent (0, a’) random variables, estimate 8, and &.

(b) Test the hypothesis that the order of the autoregressive process is two against the dternative that the order is three.

6. Show that for a fixed B, B # Po, the derivative W,(C 8) of (8.3.9) converges to an autoregressive moving average of order (2,l). Give the parameters.

7. Fit a first order moving average to the first fifty observations in Table 8.3.1. Predict the next observation in the realization. Establish an approximate 95% confidence interval for your prediction.

8, Assume that the e, of Theorem 8.3.1 are independent with 4 + S moments for some 6 > 0. Show that n1’2[&z - ( v ~ ) ~ ] converges in distribution to a normal random variable.

9. Prove Theorem 8.3.1 for the estimator constructed with Y, - ji,, replacing V, in all defining equations.

10. Fit an autoregressive moving average (1.1) to the data of Table 8.3.1.

11. The sample variance of the Boone sediment time series d i s c u d in Section 6.4 is 0.580, and the sample variance of the SaylorvUe sediment time series is 0.337. Let XI, and X2, represent the deviations from the sample mean of the Boone and Saylorville sediment, respectively. Using the correlations of Table 6.4.1, estimate the following models:

x,, = @lXL1-1 + @2X1.,-2 9

x,, = @,X,,l-, + @2XI.,-Z + @,%*,-1 9

X21E 6 1 x l , ~ - l + 6Zxl,f-Z + e3xl , f -3 + 65X2.r-1 + e6xZJ-Z * A 7

On the basis of these regressions, suggest a model for predicting Saylorville sediment, given previous observations on Boone and Saylorville sediment.

Page 496: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 473

12. Fit autoregressive equations of order 1 through 7 to the data of Exercise 10 of Chapter 6. Choose a model for these data.

13. A test statistic for partial autocorrelations is given in (8.2.19). An alternative statistic is t: = n1’2&i. Show that

t 7 - 5 N ( 0 , l )

under the assumptions of Theorem 8.2.1 and the assumption that 4j = 0 for j 3 i.

14. A simple symmetric estimator with mean adjustment can be defined as a function of the deviations Y, - yn. Show that

15. We can express the Q(@) of (8.2.14) as

where q = l and c,=O if t - J < l or r - i < l . Show that cOl f=1 for t = 2.3, . . . , R for the weights as defined in (8.2.15). What are cool and c, for w, = 0.51 What are cooI and cI1, for the weights of (8.2.15)?

16. Compute and plot the estimated autocorrelation functions for the AR(4) and ARMA(2,2) models of Table 8.4.2 for h = 0, 1, . . . ,12. Does this help explain why it is difficult to choose between these models?

17. Consider a first order stationary autoregressive process Y, = pq-, + e,, with IpI < 1 and e, satisfying the conditions of Theorem 8.2.1. Prove Theorem 8.2.2 for the first order autoregressive process by showing that the difference between any two of the estimators of p is O ~ ( R - ” ~ ) .

18. Consider a first order invertible moving average process U, = f e,, where IpI < 1, and the eI are iid(0, a2) random variables with bounded fourth moments. Let 6 , . cZ,. . . , tx the regression coefficients in (8.3.15). Let hD.k

denote the Durbin’s estimator given by - [Zt , cly-l]-l [xt, 2,-,c?,], where to = - 1. Find the asymptotic distribution of @D,k for a fixed R. Find the limit as k + w of the asymptotic mean and the variance of SDqk.

19. Let (&,, 4, S2) be the estimated parameters for the secoad order auto-

Page 497: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

474 PA- ESTIMATION

regressive process of Example 8.7.1. Use (&*, $, 6’) to construct estimates of the first five autocovariances of the process. Then use observations (YS8, Y3,,, Y41, Y4J and the estimated autocovariances to estimate Yd0. Use (yss, y56, y s g ~ y60) to (y57, y58)’

20. Assume that Y, is the first order n o m l stationary time series

Y, = e, + Be,-,, e,-NI(O, a’).

Show that the log-likelihood can be written n

log Qc /3) = -0.51 log 2~ - 0.5 Ic, log l E 1

n

- 0.511 log c2 - 0 . 5 ~ - ~ Z: V;’Zf, r = I

w h e r e Z , = Y l , V I = ( 1 + ~ 2 ) , Z t = ~ - V ~ - 1 1 j 3 Z t - 1 forr=2,3 ,..., n,and

~=l+p’-V;-”, p z , t = 2 , 3 ,..., n. 21. Rove the following.

Lemma. Let Y, and er satisfy the assumptions of Theorem 8.2.1. Let cArr t = 1,2,. . . , n and n = 1.2. ..., be a triangular m a y of constants with

n

Z: c;tf = 1 and lim sup c:~ = 0. 1 5 1 ‘+01 l 6 f c n

Then

forJ = 1.2,. . . , p .

Page 498: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 9

Regression, Trend, and Seasonality

The majority of the theory presented to this point assumes that the time series under investigation is stationary. Many time series encountered in practice are not stationary. They may fail for any of seved reasons:

1. The mean is a function of time, other than the constant function. 2. The variance is a function of time, other than the constant function. 3. The time series is generated by a nonstationary stochastic mechanism.

We consider processes of the third type in Chapter 10. Examples of hypothesized nonstationarity of the first kind occur most frequently in the applied literature, but there are also many examples of heterogeneous variances.

The traditional model for economic time series is

u, = T, + S, + 2, , (9.0.1)

where T, is the “trend” component, S, is the “seasonal” component, and Z, is the “irregular” or “random” component. In our terminology, 2, is a stationary time series. Often T, is fufther decomposed into “cyclical” and ”long-term” com- ponents. Casual inspection of many economic time series leads one to conclude that the mean is not constant through time, and that monthly or quarterly time series display a type of “periodic” behavior wherein peaks and troughs occur at “nearly the same” time each year. However, these two aspects of the time series typically do not exhaust the variability, and therefore the random component is included in the representation.

While the model (9.0.1) is an old one indeed, a precise definition of the components has not evolved. This is not necessarily to be viewed as a weakness of the representation. In fact, the terms acquire meaning only when a procedure is used to estimate them, and the meaning is determined by the procedure. The reader should not be disturbed by this. An example from another area might serve to clarify the issue. The “intelligence quotient” of a person is the person’s score on an I.Q. test, and I.Q. acquires meaning only in the context of the procedure used to

475

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 499: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

476 REGRESSION, TIUPID, AND S E A S O N M Y

determine it. Although the test may be based on a theory of mental behavior, the I.Q. test score should not be taken to be the only estimator of that attribute of humans we commonly cali intelligence.

For a particular economic time series and a particular objective, one model and estimation procedure for trend and seasonality may suffice; for a different time series or a different objective, an alternative specification may be preferred.

We shall now study some of the procedures used to estimate trend and seasonality and (or) to reduce a nonstationary time series to stationarity. Since the mean function of a time series may be a function of other time seria or of fixed functions of time, we are led to consider the estimation of regression equations wherein the error is a time series.

9.1. GLOBAL LEAST SQUAREs

In many situations we are able to specify the mean of a time series to be a simple function of time, often a low order polynomial in t or trigonometric polynomial in t. A sample of R observations can then be represented by

y=*/3+2, (9.1.1)

where /3' = (p, , pZ, . . . , &) is a vector of unknown parameters,

F~, i = 1,2,. . . , r, are n-dimensional column vectors, and Z, is a zero mean time series. In many situations, we may be willing to assume that 2, is stationary, but we also consider estimation under weaker assumptions.

The elements of cp,, say p,,, are functions of time. For example, we might have Pr I = 1, 9 t z = 1, % =tZ . The elements of p, may atso be random functions of time, for example, a stationary time series. In the random case, we shall assume that z is independent of Q, and investigate the behavior of the estimators conditional on a particular realization of v,,. Thus, in this section, al l p(, will be treated as fixed functions of time. Notice that y, #, pi, and z might properly be subscripted by n. To simplify the notation, we have omitted the subscript.

The simple least squares estimator of /3 is

= (*faJ)-lwy. (9.1.2)

In (9.1.2) and throughout this section, we assume Cb'aJ is nonsingular. Assume that the time series is such that the matrix V,, = E(zz') is nonsingular. Then the generalized least squares (best linear unbiased) estimator of /3 is

(9.1.3)

Page 500: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

GLOBAL LEAST SQUARES 477

be covariance matrix of Bs is E{(& - @><& - #)'} = (*'@)-'*'vaw@'a)-' , (9.1.4)

while that of & is

It is well known that & is superior to bS in that the variance of any linear contrast A'& is no larger than the variance of the corresponding linear contrast A'& However, the construction of BG requires knowledge of V,, and generally V,, is not known. In fact, one may wish to estimate the mean function of U, prior to investigating the covariance structure of the error time series. Therefore, the properties of the simple least squares estimator are of interest. To investigate the large sampie behavior of least squares estimators, let

where 9, is the ith column of the matrix @. The least squares estimator is consistent for 8 under mild assumptions.

Propwition 9.1.1. Let the model (9.1.1) hold where 2, is a zero mean time series and qi, i = 1,2,. . . , r, t = 1,2,. . . , are fixed functions of time. Assume

lim d,,, = co, i = 1,2,. , . , r , (9.1.5) n - r m

n - w lim D,'@'@D,' = A,, (9.1.6)

Proof. "he covariance matrix of the least squares estimator is given in (9.1.4), and the variance of the n o d i z e d estimator is

V@,(& - @)} = D,(~'*)-'D,D,'@'V,,~D,'D,(~Qi)-'D, .

It follows that D,& - P O ) = OJI), because

n-rm lim V{D,(b, - Po)} = Ai'BAi ' , A

Under stronger assumptions, the limiting distribution of the least squares estimator can be obtained. Assume that the real valued functions qIf of t satisfy

Page 501: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

478 RWRESSION, TREND, AND SEASONALITY

and n-h

l b difd; C f l , f l+h , j = a-hsi j I (9.1.9)

h = 0, 1,2,. . . , i, j = 1,2,. . . , r, where A, is nonsingular and A,, is the matrix with typical element uhij. me assumptions on the (pti are quite modest and, for example, are satisfied by polynomial, trigonometric polynomial, and logarithmic functions of time. The following theorem demonstrates that the limiting dis- tribution of the vector of standardized estimators is normal for a wide class of stationary time series.

, = I n+-

Tbeorem 9.1.1. Let Z, be a stationary time series defined by a

z, = 2 ye,-j 9 t=O,+-l , 5 2 *..., j =O

where { y } is absolutely summable and the e, are independent (0,cr2) random variables with distribution functions F,(e) such that

lim sup' J e2 dF,(e) = 0 . 1-1.2, ... l4>8

Let the (pt,, i = 1.2,. . . , r, f = 1.2,. . . , be fixed functions of time satisfying the assumptions (9.1.8) and (9.1.9). Then

D,(& - N(O, A ~ ~ B A ; ' ) , (9.1.10)

where A, is the matrix with elements u,,~ defined in (9.1.9). B is defined in (9.1.7), and BS is defined in (9.1.2). Tbe covariance matrix is nonsingular if B of (9.1.7) is nonsingular.

Proof. Consider the linear combination X:-l cIZl, where

and the A, are arbitrary real numbers. Now, by our assumptions,

Page 502: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

GLOBAL LEAST SQUARES 479

2 -112 and c, is completely analogous to (Xi",lCj) 6.3.4, X:-l c,Z, converges to a normal random variable with variance

C, of Theorem 6.3.4. By Theorem

m I I

where b, = Z;=-,u,,%(h) is the ijth element of B of (9.1.7). Since limn+, D,'@'@D,' =A, and since A was arbitrary, the mult follows from Theorem 5.3.3. A

The matrix A, with ijth element equal to ahij of (9.1.9) is analogous to the matrix r(h) of Section 4.4, and therefore the spectral representation

(9.1.11)

holds, where M ( y ) - M(w,) is a positive semidefinite Hermitian matrix for all - T C o, < a+ G n; and A,, = M(v) - M(- r). We state without proof some of the results of Grenander and Rosenblatt (1957), which are based on this representation.

Theorem 9.13. Let the assumptions (9.1.1)> (9.1.8). and (9.1.9) hold, and Iet

lim D; ' W Y ~ ~ D ; I = 2 r I fz(o) ~IM(U),

the spectral density of the stationary time series Z, be positive for all w. Then 7r

-" n - w

(9.1.12)

(9.1.13)

wherefz(w) is the spectral density function of the process 2,.

Proof. See Grenander and Rosenblatt (1957). A

The asymptotic efficiency of the simple least squares and the generalized least squares is the same if the two covariance matrices (9.1.12) and (9.1.13) are equal. Following Grenander and Rosenblatt, we denote the set of points of increase of M(u) by S. That is, the set S contains all w such that for any interval (0,. y)

Page 503: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

480 REGRESSION, TREND, AND SEASONALITY

where w1 < o < %, M(y) - M(w,) is a positive semidefinite matrix and not the null matrix.

Theorem 9.13. Let the assumptions (9.1.1), (9.1.8), and (9.1.9) hold, and let the spectral density of the stationary time series Z, be positive for all w. Then the simple least squares estimator and the generalized least squares estimator have the same asymptotic efficiency if and only if the set S is composed of q distinct points, 0,’ y,. . . , oq, q G r .

Proof. See Grenander and Rosenblatt (1957). A

It can be shown that polynomials and trigonometric polynomials satisfy the conditions of Theorem 9.1.3. For example, we established the result for the speciai case of a constant mean function and autoregressive Y, in Section 6.1. One may easily establish for the linear trend that

from which it follows that the set S of points of increase is o, =O. The reader should not forget that these are asymptotic results. If the sample is

of moderate size, it may be desirable to estimate V and transform the data to obtain final estimates of the trend function. Also, the simple least squares estimators may be asymptotically efficient, but it does not follow that the simple formulas for the estimated variances of the coefficients are consistent. In fact, the estimated variances may be badly biased. See Section 9.7.

9.2. GRAFTED POLYNOMIALS

In many applications the mean function is believed to be a “smooth” function of time but the functional form is not known. While it is difficult to define the term “smooth” in this context, several aspects of functional behavior can be identified. For a function defined on the real line, the function would be judged to be continuous and, in most situations, to have a continuous fint derivative. This specification is incomplete in that one also often expects few changes in the sign of the first derivative.

Obviously low order polynomials satisfy the stated requirements. A h , by the Weierstrass approximation theorem, we know that any continuous function defined on a compact interval of the real line can be uniformly appximated by a polynomial. Consequently, polynomials have been heavily used to approximate the mean function. However, if the mean function is such that higher order polyno- mials are required, the approximating function may be judged unsatisfactory in that it contains a large number of changes in sign of the derivative. An alternative

Page 504: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

0- POtYNOMIALS 481

approximation that generally overcomes this problem is to approximate segments of the function by low order polynomials and then join the segments to form a continuous function. The segments may be joined together so that the derivatives of a desired order are continuous.

Our presentation is based on the fact that, on the real line, the functions

where i = 1,2,. . . ,M, R and M are positive integers, and continuous with continuous (k - 1)st derivatives. It follows that

is also continuous with continuous (k - 1)st derivative. We call the function g(t) a grafted polynomial of degree k.

To illustrate the use of grafted polynomials in the estimation of the mean function of a time series let us make the following assumption: "The time series may be divided into periods of length A such that the mean function in each period is adequately approximated by a quadratic in time. Furthermore the mean function possesses a continuous first derivative." Let R observations indexed by t = 1,2,. . . , n be available. We construct the functions q,, i = 1.2, . . . , M, for k = 2 and A, = Ai, where M is an integer such that IA(M + 1) - nl <A. In this formulation we assume that, if n # A(M + l), the last interval is the only one whose length is not A. At the formal level all we need do is regress our time series upon t, t2, q,,, cpt2 , . . . , ysM to obtain estimates of b,, J = 0,1,. . . ,M + 2. How- ever, if M is at all large, we can expect to encounter numerical problems in obtaining the inverse and regression coefficients. To reduce the possibility of numerical problems, we suggest that the p's be replaced by the linear combina- tions

w , ~ = ys, - 3 q , , + , + 343.,+, - q.i+3 i = h2 , . . - ,M, where, for convenience, we define

for all t. Note that

(f - Ai)* , A i c t c ( i + 1 ) A ,

(t - A i l 2 - 3 [ t - (i + I)Al2, (i + 1)AGt <(i + 2 ) A , [c - (i + 3)AI2, (i + 2 ) A S f <( i + 3 ) A ,

otherwise.

Wf, =

Since the function wti is symmetric about (i + 1.5)A, the w-variables can be

Page 505: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

482 REGRESSION, TRBND, AND SEASONALITY

written down immediately. The w,, remain correlated, but there should be. little trouble in obtaining the inverse.

This procedure is felt to have merit when one is called on to extrapolate a time series. Since polynomials tend to plus or minus infinity as t increases, practitioners typically hesitate to use high order polynomials in extrapolation. Although a time series may display a nonlinear trend, we might wish to extrapolate on the basis of a linear trend. To accomplish this, we approximate the trend of the last K periods of a time series of n observations by a straight line tangent to a higher degree trend for the earlier portion of the time series. If the mean function for the first part of the sample is to be approximated by a grafted quadratic one could construct the regression variables

for i = 1,2, . . . , M, where (K + AM - nI < A . Using thcse variables, the estimated regression equation can be written as

where the 6's are the estimated regression coefficients. The forecast equation for t h e m e a n f u n c t i o n i s ~ ~ , + ~ , t , t = n + l , n + 2 ,... .

We can also estimate a mean function that is linear for both the first and last portlons of the observational period. Using a grafted quadratic in periods of length A for the remainder of the function, the required regression variables would be, for example,

Q~I = t *

[t-K-A(i- l)]', K+A(i- l )<t=ZK+Ai , #,I +, = A* + 2A(t - K - Ai) , t > K + Ai ,

{ o otherwise

for i = 1,2,. . . , M, where n > K + M A and the mean function is linear for t < K and t > K f MA. If M is large, a transformation similar to that discussed above can be used to reduce computational problems.

Example 92.1. Yields of wheat in the United States for the period 1908 through 1991 are given in Table 9.2.1. In this example, we use onIy the data fmm 1908 through 1971. Yields for the first 20 to 30 years of the period are relatively constant, but there is a definite increase in yields from the late 1930s through 1971. As an approximation to the @end line, we fit a function that is constant for the first 25 years, increases at a quadratic rate until 1961, and is linear from 1961

Page 506: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

GRAFTED POLYNOMIALS 483

Table 9.2.1. United States Wheat Ylelds (1908 to 1991)

Year

1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 I920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935

- Yield Y W Yield Year Yield

14.3 15.5 13.7 12.4 15.1 14.4 16.1 16.7 11.9 13.2 14.8 12.9 13.5 12.7 13.8 13.3 16.0 12.8 14.7 14.7 15.4 13.0 14.2 16.3 13.1 11.2 12.1 12.2

1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 I954 1955 1956 1957

1959 1960 1961 1962 1963

1958

12.8 13.6 13.3 14.1 15.3 16.8 19.5 16.4 17.7 17.0 17.2

17.9 14.5 16.5 16.0

17.3 18.1

20.2

27.5 21.6 26.1 23.9 25.0 25.2

18.2

18.4

19.8

21.8

1964 1965 1966 1967 1%8 1%9 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991

25.8 26.5 26.3 25.9 28.4 30.6 31.0 33.9 32.7 31.7 27.3 30.6 30.3 30.7 31.4 34.2 33.5 34.5 35.5 39.4

37.5 34.4 37.7 34.1 32.7 39.5 34.3

38.8

SOURCE U.S. Department of Agriculture, Agricultural Statistics, various issues.

to 1971. A continuous function with continuous derivative that satisfies these requirements is

1 d t G 2 5 , 25 c t s 54, v , ~ = (t - 25)’ , (9.2.1) r 841+58(t-54), ’ 5 4 c t ,

where t = 1 for 1908. The trend line obtained by regressing the 64 yield observations on the vector

(1, R l ) is displayed in Figure 9.2.1. The equation for the estimated trend line is

I‘, = 13.97 + O.O123p,,

and the residual mean square is 2.80.

Page 507: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

484 REGRESSION. TREND, AND SEASONALITY

30

A

." -- PI

E 20

15

A I I I I I i

50 60 0 10 20 30 40 lime

Figure 9.2.1. U.S. wheat yields 1908-1971 end grafted quadretic trend.

The trend in the data and the properties of the time series will be investigated AA further in Example 9.5.1 and Example 9.7.2.

In the example aud in ow presentation we assumed that one was willing to specify the points at which the different plynomiah are joined. The problem of using the data to estimate the join points has been considered by Gallant and Fuller (1973).

Grafted polynomials passing through a set of given pints and joined in such a way that they satisfy certain restrictions on the derivatives are called spline funcrions in approximation theory. A discussion of such functions and heir properties is contained in Ahlberg. Nilson, and Walsh (1967), Greville (1%9), and Wahba (1990).

9.3. ESTIMATION BASED ON LEAST SQUARES RESIDUALS

93.1. Estimated Autocorrelations

One of the reasons for estimating the trend function is to enable one to investigate the properties of the time series Z,. Let the regression model be that defined in (9.1.1):

Page 508: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION BASED ON LEAST SQUARES RESIDUALS

y=@'B+z.

The calculated residuals using the simple least squares estimator are

2 = K - q . b s =zr - q,<& - B) 7

485

(9.3.1)

(9.3.2)

and uf, estimated Z, is

the rth row of (9. The estimated autocovariance of Zf computei. using the

I n - h T2(h) = ; 2 2$f+h , h = 0,1,2, . . . , n - 1 .

I = I (9.3.3)

Theorem 9.3.1. Let Y, be defined by the model (9.3.1) where Zf = ZJwm0 +e,-J, {T} is absolutely summable, {e,} is a sequence of independent (0, a') random variables witb bounded 2 f S (S > 0) moments, and the fit, i = 1,2, . . . , r, t = 1,2,. . . , are fixed functions of time satisfying the assumptions (9.1.5) and (9.1.6). Then

T2(W - W ) = 0 J n - I ) 7

where ?Ah) is defined in (9.3.3) and

. n - h

TJ~) = C Z,Z, +,, , h = 0, I , 2, . . . , n - I . , = I

Proof. We have, for h = 0.1,. . . , n - 1,

n - h n-h n -h

where Dn is the diagonal matrix whose elements are the square roots of the elements of the principaI diagonal of a'@. By the absolute summability of y,(h), tbe assumption (9.1.5), and the assumption (9.1.6)

n-h

2 Zr#+h, .Dil = Op(l) 9

1 = 1

Page 509: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

486 REORESSION, TREND, AND SEASONALITY

Since each element of the r X r matrix

is less than one in absolute value, the result follows. A

It is an immediate consequence of Theorem 9.3.1 that the limiting behavior of the estimated autocovaxiances and autocorrelations defined by Theorem 6.3.5 and Corollary 6.3.5 also holds for autocovariances and autocorrelations computed using 2, in place of 2,. The bias in the estimated autoconelations is O(n-’) and therefore can be i g n d in large samples. However, if the sample is small and if several R~ are included in the regression, the bias in j$(h) may be sizable. To briefly investigate rhe nature of the bias in 92(h) we use the statistic studied

by Durbin and Watson (1950, 1951). They suggested the von Neumann ratio (see Section 6.2) computed from the calculated residuals as a test of the hypothesis of independent e m . The statistic is

(9.3.4)

where

- 1 2 -1 ... # .

0 0 0 . - * -1 1

P = [I - @(@’@)-‘@’]z is the vector of calculated residuals, and z is the vector of observations on the original time series. Recall that

where

The expected values of the numerator and denominator of d are given by

and

Page 510: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION BASED ON LEAST SQUARES RESIDUALS 487

E{@’2) = trp,, -y,@(@’~)-l@’l, (9.3.6)

where V,, = E{zz’), and trB is the trace of the matrix B. It is easy to see that the expectation (9.3.5) may differ considerably from

2(n - l)[x(O) - x( l)]. The expectation of the ratio is not necessarily equal to the ratio of the expectations, and this may further increase the bias in some cases (e.g., see Exercise 9.3).

We now turn to a consideration of d as a test of independence. For normal independent e m s W i n and Watson have shown that the ratio can be reduced to a canonical form by a simultaneous diagonalization of the numerator and denominator quadratic forms. Therefore. in this case, we can write

where it is assumed that the model contains an intercept and k’ other independent variables, the A, are the roots of [I - WW@)-’@’lHfI - @(@’@)-‘@’I, the S, are the roots of I-*@’@)-’@‘, and the e. are normal independent random variables with mean zero and variance a: = uz. Furthermore, the distribution of the ratio is independent of the distribution of the denominator. The exact distribution of d, depending on the roots A and 8, can be obtained. For normal independent errors Z,, Durbin and Watson were able to obtain upper and lower bounds for the distribution, and they tabled the percentage points of these bounding distributions.

For small sample sizes and (or) a large number of independent variables the bounding distributions may differ considerably. Durbin and Watson suggested that the distribution of i d for a particular @-matrix could be approximated by a beta distribution with the same first two moments as $d.

As pointed out in Section 6.2, in the null case the f-statistic associated with the first order autocorrelation computed from the von Neumann ratio has percentage points approximateiy equal to those of Student’s f with n + 3 degrees of freedom. For the d-statistic computed from the regression residuals it is possible to use the moments of the d-statistic to develop a similar approximation. The expected value of d under the assumption of independent errors is

’2

E(d} = (tr H - tr[@’H@(@’@)-’])(n - k‘ - 1)-’

= {2(n - 1) - tr[(A@)’(A@)(W@)-’])(n - k’ - 1)-’ , (9.3.7)

where (A@)’(A@) is the matrix of sums of squares and products of the first differences. Consider the statistics

(9.3.8)

Page 511: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

488

and

REGRESSION, TREND. AND SEASONALITY

where r, = 3 2 - d) . The factor 1 - r2 keeps 6 in the interval [-1,1] for most samples. For normal independent (0, Q ) errors the expected value of 6 is O(n-*) and the variance of 3 differs from that of an estimated autocorrelation computed from a sample of size n - k' by terns of order n-'. This suggests that the calculated td be compared with the tabuIar value of Student's t for n - k' + 3 degrees of freedom. By using the second moment associated with the particular @-matrix, the t-statistic could be further modified to improve the approximation, though there is little reason to believe that the resulting approximation would be as accurate as the beta approximation suggested by Durbin and Watson.

4

93.2. Estimated Variance Functions

The residuals can be used to investigate the nature of the variance of Zf as well as to study the autocorrelation structure. By Proposition 9.1.1, the least squares estimator of #3 is consistent in the presence of variance heterogeneity. A simple form of variance heterogeneity is that in which the variance of the errors depends on fixed functions of time.

In Proposition 9.3.1, we show that the errors in the least squares estimated autoregressive coefficients are 0 J r 1 - l ' ~ ) in the presence of variance heterogeneity.

Proposition 93.1. Let Z, be a time series satisfying

n

(9.3.10)

for t = 0, 2 1, 22,. . . , where e, are independent (0,l) random variables with bounded 2 + 6 moments, S > 0, { Q ~ J is a fixed sequence bounded above by M, and bounded away from zero, and the roots of the characteristic quation associated with (9.3.10) are less than one in absolute value. Define the least squares estimator of a by

- i n

I=p+ 1

where

Page 512: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION BASED ON LEAST SQUARES RESIDUALS 489

Proof. The Z, can be written as X,m=owju,-j, where the w, are defined in (2.6.4). Now,

i+j

The a, are independent and have bounded 2 + 6 moments. Therefore,

and n r-h

2 2

? = I j=O

by Theorem 6.3.2.

second term of (9.3.12) converges to zero in probability. Therefore, Using the arguments of the proof of Theorem 6.3.5, it can be shown that the

n - h n - h - Z~Z,+,, - n-' Z 2

f=I j = O

The error in the least squares estimator is

&-a=-( 1=p+l i x;xr)-' t = p + l 2 x;a, (9.3.13)

and by the assumption (9.3.10),

E{(X:a,, X:X,a:)I 4-J = (0, X:Xpf,) (9.3.14)

for t = 0, t l , 2 2 , . . . , where a(-, is the sigma-field generated by Zf-l, Zf--2, . . . . Because a;-,o,?, has absolute moment greater than one, the arguments associated with obtaining the probability limit of (9.3.12) can be used to show that

plim (n -PI-'[ 2 x:xfcT:l - i: ~ ( X : X J ~ ~ , ] = 0 . n -B- t = p + I f=p+I

It follows that

Page 513: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REORESSION, TRPID, AND SEASONALITY 490

(9.3.15)

by Theurem 5.3.4. A

The result of Proposition 9.3.1 can be extended to the estimated autoregressive parameters computed from least squares residuals.

Propasition 9.3.2. Let the model (9.1.1) hold, where Zf is the autoregressive time series of (9.3.10) and f i p , i = 1,2, . . . , r, t = 1,2, . . . , are fixed functions of time. Let the assumptions (9.1.5) and (9.1.6) of Proposition 9:l.l and the assumptions of Proposition 9.3.1 hold. Let 2f be the least squares residuals of (9.3.2), and let

ti=-( i: $v;**)-' I=.P+l i w2:, t=p+ 1

Gi'''(t% - a)& N(O, I),

where Gn is defined in (9.3.11).

Proof. Because

for gome 0 < A < 1, one can use the arguments of the proof of Theorem 9.3.1 to show that

A

Example 9.3.1. The 150 observations in Table 9.3.1 were computer gener- ated. A plot of the data suggests that the mean function is approximately a linear function of time. If we fit a linear trend by ordinary Ieast squares, we obtain

= 15.472 + 0.3991t. (9.3.16)

If a second order autoregression is fitted to the residuaIs by ordinary least squares, we obtain

2, = 1.524 gf-, - 0.616 2f-2, (0.067) (0.067) (9.3.17)

Page 514: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION BASED ON M A S T SQUARES RESIDUALS 491

Table 93.1. Data fw Example 93.1

16.45 16.39 16.25 16.23 17.16 17.55 17.48 18.32 18.31 19.09 19.53 18.08 19.21 19.96 20.69 21.15 19.69 19.36 18.83 19.84 22.20 22.42 22.18 22.38 22.45 23.47 25.03 25.41 24.91 25.49

25.13 26.21 26.98 28.60 30.80 32.65 32.92 34.41 35.65 37.15 37.49 37.64 37.03 36.78 37.26 36.83 36.38 36.79 35.87 35.89 36.21 36.95 37.93 36.55 33.94 34.28 37.51 40.42 41.70 43.39

46.49 47.78 48.05 46.54 43.15 40.34 41.30 42.49 43.00 43.75 45.86 45.9s 44.87 44.22 43.95 43.52 41.51 39.41 36.07 37.23 40.14 40.4 1 41.06 39.56 40.33 43.90 47.28 48.71 49.46 50.61

49.57 51.03 50.80 5 1.49 52.91 56.95 59.32 59.76 59.29 58.74 57.80 56.27 52.98 52.27 56.17 61.33 64.69 68.49 70.19 7 1.47 70.98 68.63 67.47 67.68 68.47 69.52 71.47 73.33 72.17 72.72

73.07 72.83 70.05 66.30 61.47 60.47 61.30 62.38 63.46 62.38 61.61 61.82 61.03 62.18 62.60 61.83 64.23 65.62 67.17 68.61 69.36 69.91 70.28 71.92 72.48 71.59 70.19 69.69 73.55 77.55

where s2 = 1.646 and the standard errors are those of the ordinary regression program.

If the absolute values of the residuals from the autoregression are plotted against time, there is an increase in the mean of the absolute values. Several models can be considered for the variance of the errors. We postulate a model in which the variance is related to the mean. Our model for the variance of a, is

(9.3.18) 2 a,, = ~ ( a 3, = eKl [E{ q)l K* ,

where a, is the error in the autoregression estimated by (9.3.17). We estimated (K,. %), replacing a: with Ci 3 from regression (9.3.17) and replacing E{Y,} with 1 from (9.3.16). by maximum likelihood under the assumption that a, are NI(0, g.],). The estimates are

Page 515: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

4% REGRESSION, TREIW, AND SEASONALITY

(2,, k2) = (-3.533 , 1.051), (0.558) (0.148)

where the standard e m are from the inverse of the estimated information matrix. We have estimated the variance function for the a,, but if the variance function

is a smooth function, the variance of the autoregressive process Z, will be nearly a multiple of the variance of the a,. Therefore, we define new variables

(e;,'~, a;', ~ s ' t ) = (~1, (ptlt ( P , Z ) ,

where aa, is the function (9.3.18) evaluated at E{Y,)=$ and ( R , , % ) = (-3.533, 1.051). The parameters of the model

(9.3.19)

estimated by Gaussian maximum likelihood, are

(PI, &,a,,a2,a~)=(15.113, 0.407, - 1.473, 0.570, 0.995). (1.465) (0.021) (0.068) (0.068) (0.090)

Estimation for a model such as (9.3.19) is discussed in Section 9.7. The estimate of (B,, &, a,, %) is not greatly different from that obtained in (9.3.16) and (9.3.17) by ordinary least squares. However, the confidence interval for a prediction is different under the model in which the variance is a function of the mean. The standard e m for predictions one, two, and three periods ahead are (1.283, 2.328, 3.178) for the model computed under the assumption of homoge- neous variances. The corresponding prediction standard emrs are (1.673, 2.985,

A 4.023) for the model with the variance function of (9.3.18).

In Example 9.3.1, the variance was assumed to be a fixed function of time. Models in which the variance is a random function have also been considered in the literature. Most often the variance at the current time is expressed as a function of the past behavior of the time series. Under mild assumptioIls, it can be shown that the least squares estimator of the parameter vector of the autoregressive process has a normal distribution in the limit.

Projmition 9.33. Let (2,. q,), t = 0 . 2 1 . 5 . . . . , be a covariance stationary time series satisfying

Page 516: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION BASED ON LEAST SQUARES RESIDUALS 493

where the wj are absolutely swnmable, el are iid(0,l) random variables with 2 + S, (S, > 0) moments, el +, is independent of a,, for j 3 0, and

If, in addition, 2, is an autoregressive process satisfying P

z1 + C a ~ , - , =a, ,

where the roots of the characteristic equation are less than one in absolute value, and if

plim n-1 i: X:Xlafl = E{X:X,a~,), n-+- t - p + i

i s 1

(9.3.20)

then

G,"*(&- a)& N(0, I ) ,

where q &, G,, and X, are defined in Proposition 9.3.1.

proof. The sequence (a,} is a martingale difference sequence and

P n"*rS,+ N(0, a:)

by Theorem 5.3.4. Hence, n1"Zn also converges in distribution to a normal random variable by Theorem 6.3.3. Thus,

We expand n-' 2::; Z,Z,+,, as in (9.3.12) and observe that

Page 517: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

by Theorem 6.3.2. It can be shown that r - h m rn .. ..

plim C C C wjw,+,a,-,a,-j = o ? = I j = O i - O

i# j

by applying the arguments of the proof of Theorem 6.3.5. Therefore, the sample covariance is consistent for the population covariance.

The m r in the least squares estimator is given in (9.3.13). Under the present assumptions, (9.3.14) holds and, by the assumption (9.3.20) and Theorem 5.3.4, we obtain the limiting normal distribution result of (9.3.15). The limiting result for

A G,"'(& - cu) then follows.

Engle (1982) introduced a variance model called the autoregressive conditional heteroscedastic model (ARCH). An example is

(9.3.21)

where the e, are NI(0,l) random variables and 0 Q K' < 3-l". Some properties of this model are developed in Exercise 2.43 of Chapter 2. CIearIy, many extensions of the simple model are possible, and a number have been considered in the literature. See Engle and Bollersiev (1986), Weiss (1984, 1986), Pantula (1986), Bollerslev, Chou, and Kroner (1992). Higgins and Bera (1992). and Granger and Teriisvirta (1993),

A model closely related to the ARCH model postulates the variances to be an unobserved stochastic process. See Melino and Turnbull (1990), Jacquier, Polson, and Rossi (1994), and Harvey, Ruiz, and Shepbard (1994). An example mode1 is

z,=ez,_,+a,, (9.3.22) a, = u,,e, , (9.3.23)

h , - / S , = W , - , - & J + b , , (9.3.24)

where h, = log a,, and (e,, b,) - II[(O, 0), diag( 1, a:)]. In (9.3.241, the logarithm of the unobserved variance process is a first order autoregressive process. Thus, a,, is always positive.

If the a, are such that loga: has second moments, it is natural to take logarithms of (9.3.23) to create a linear model. Then,

log a: = ZZ, + log e: . (9.3.25)

If e, is normally distributed, then log e: is distributed as log chi square which has mean of about -1.27 and variance of about 4.93. The distribution of loge: is heavily skewed to the left. If (9.3.24) is a stationary autoregressive process and e, is normally distributed, then log a: is a stationary autoregressive moving average process. Nonlinear estimation methods can be used to estimate the parameters of

Page 518: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

ESTIMATION BASED ON LEAST SQUARES RESIDUALS 495

(9.3.24) treating the variance of loge: as known or unknown. In Example 9.3.2, we consider a modification of the log transformation in the estimation.

Exampie 933. As an example of the fitting of a stochastic volatility model, we consider the prices of June futures contracts for the Standard and Poor’s 500 index, collected as the average price per minute from 8:30 am. to 3:15 p.m. on April 13,1987. Our basic observations are the first differences of the prices. The data are considered by Breidt and M q u i r y (1994). If a first order autoregression is fitted to the data, the estimated coefficient is 0.340 with a standard error of 0.047. The residuals from the first order autoregression, denoted by d,, are plotted against time in Figure 9.3.1.

The observations in Figure 9.3.1 show evidence of a serially correlated variance. At the end of the series, large deviations tend to be followed by large deviations, while in the middle, small deviations tend to be followed by small deviations. This dependence is reflected in the sample autocorrelations for the Ci :. The first five autocorrelations of the 4 : are 0,204,0.501,0.205, 0.346, and 0.263. These correlations are significantly different from zero because the standard error under zero correlation is about 0.05.

We wish to fit a variance model of the form (9.3.23). (9.3.24) to the residuals from the autoregressive fit. In practice, there are some disadvantages to a logarithmic transformation of the a: as defined in (9.3.25). There may be some zero values for a:, and the original distribution may not be nonnal. Therefore, we

1.0-

0.5 -

- i .oJ ,

8:30 10:30 12:30 2:30 Time

Figure 93.1. Residuals fmm autoregression of price differences.

Page 519: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

4% REORESSION, TREND, AND SEASONALITY

suggest the following estimation scheme that is more robust against nonnormality. Let 5 be a s d fraction, say 0.02, of the average of the a:. Let

u, = log(u: + 6 ) - (u: + S)-- ' t = 2h, + w, , (9.3.26)

where

w, = log(e: + v,'f> - (e: + (r,2~)-'a,'~, (9.3.27)

If e: is large, w, differs little from log e:. For example, the difference is less than 0.016 if ai:[=0.02 and e:>0.10. For normal (0.1) random variables and uGt t= t= 0.02, the mean of w, is about -1.09 and the variance is about 3.19. Thus, the &modification of the log transformation produces variables with smaller variance than log e: because the function (9.3.26) is bounded below by log f - 1. Also, the distribution of w, is much less skewed than that of loge:.

The V, of (9.3.26) can be used to estimate the parameters of the h,-process up to an additive constant. For example, if our specification for the h,-prOcess is a pth order autoregressive process, we write

- 2

(9.3.28)

i - 1

where /+, = E{2h,} + E{w,}, g, = w, - E{wt}, XI = u", - E{2hl}, w, is defined in (9.3.27). {b,} is independent of {g,}, and b, - II(0, 0,'). Because of the nahm of our transformation, the conditional distributions of the g, given h, are only approxi- mately equal.

We use a first order autoregressive model for the h, of stock prices. Then the IJ, of (9.3.26) is an autoregressive moving average, which we write as

~,-llu=wJ,-, - h ) + % + B r l t - l * (9.3.29)

We replace u: with d : in the definition of U, of (9.3.26) and set f = 0.00058, where the average of the S: is 0.0290. The estimates of the parameters are

((2: B, &;) = (0.939, - 0.816,2.%1). (0.033) (0.055)

From (9.3.29) and (9.3.28), pa: = -@," and

(1 + P*),,Z = a; + ( 1 + +'>a,".

It follows that * - 1 1

&," = - + /3&: = 2.5742,

&.b' = &$ + 8' + $-'&1+ $31 = 0.0903.

Page 520: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGES-LINEAR PlLTERING 497

The fact that &: differs considerably from 3.19 suggests that the original e, are not normally distributed. Given an estimate of the parameter 9 of the X,-process, we can construct smoothed estimates of X,, r = 1,2,. , . , n and, hence, estimates of 2h,, up to an additive constant. Then the constant can be estimated as a multiplicative constant in the original (nonlogarithmic) scale. Let f be defined by

where p,, = E{h,}. Then an estimator of f is

(9.3.30)

(9.3.3 1)

where 2, is the smoothed estimator of X, constructed from the fitted model (9.3.28). The quantities

B:, = pexd2,) (9.3.32)

are smoothed estimates of the unknown air. be carried out in which the initial transformation is

If the Bq, arr: quite variable over the sample, a second round of calculations can

where R is a constant, such as 0.02, and &:, = lexP($,}. In this example, the estimates are changed very little by the second round of computations. However, in some situations there is a considerable change, and in general a second round of computation is recommended. See Breidt and Caniquiry (1994).

Given the estimates of Q:, a;, and JI for the price data, we constructed smoothed estimates $, using the Kalman filter recursions for fixed interval smoothing described, for example, in Harvey (1989, p. 154) and Anderson and Moore (1979, Chapter 7). The estimate of f of (9.3.30) is l= 0.01550. The smoothed estimates of the standard deviations go, from (9.3.32) are plotted against time in Figure 9.3.2. "hey reflect the pattern observed in the data. That is, there is a period of high volatility near the end of the series and low volatility in the middle of the observation period. A A

9.4. MOVING AVERAGES-LINEAR FILTERING

9.4.1. Moving Averages for tbe Mean

In the previous sections we considered methods of estimating the mean function for the entire period of observation. One may be interested in a simple approximation to the mean as a part of a prel imhy investigation, where one is not willing to specify the mean function for the entire period, or one may desire a relatively simple method of removing the mean function that will permit simple

Page 521: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

498 REGRESSION, TREND. AND SBASONALRY

Time Figme 9.33. Smoothed estimates of a, for price differences.

extrapolation of the time series. In such cases the method of moving averages may be appropriate.

One basis for the method of moving averages is the presumption that for a period of M observations, the mean is adequately approximated by a specified function. The function is typically linear in the parameters and most commonly is a polynomial in r. Thus, the specification is

where 2, is a stationary time series with zero expectation, is a vector of parameters, and M = M, + M2 + 1. The form of the approximating function g is assumed to hold for all r, but the parameters are permitted to be a function of r. Both the “local” and “approximte” nature of the specification should now be clear. If the functional form for the expectation of U, held exactly in the interval, then it would hold for the entire realization, and the constants M, and M, would become f - 1 and n - 1, respectively.

Given specification (9.4.1), a set of weights, w,(s), am constructed that, when applied to the ?+,, j = -hi,, -M, + 1,. . . , M2 - 1, M2, furnish an estimator of g(s; 4) for the specified s. It follows that an estimator of Z,,, is given by

y,+, - 8 s ; 8, ?

Where

Page 522: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGES-LINEAR PILTERING 499

In the terminology of Section 4.3 the set of weights {wj(s)) is a linear filter.

time Series is adequately represented by Let us consider an example. Assume that for a period of five observations the

where Z, is a stationary time seriles with zero expectation. Using this specification, we calculate the least squares estimator for the trend value of the center observation, s = 0, as a linear function of the five observations. The model m a y be written in matrix form as

where

y, = (Y,+ Y,-l , U,, U,,, , U,+z ) ' is the vector in the first column of Table 9.4.1, Q, is the matrix defined by the second, third, and fourth columns of Table 9.4.1 (those columns headed by &,, PI,, and a,), and z , = ~ Z , ~ ~ , ~ , ~ , . Z , ~ Z , + ~ , Z , ~ ~ ~ ' is the vector of (unobservable) elements of the stationary time series. The least squares estimator of the mean at j = 0, g(0; B,), is given by

= (1, 0, O)(*'@)-'@'y, 2

where the vector (1,O.O) is the value of the vector of independent variables associated with the third observation. That is, (1,0,0) is the third row of the matrix Q, and is associated with j = 0. The vector of weights

Table 9.4.1, WculatsOn of Weights for a Five Period Quadratic Moving Average

Weights for Trend Weights for Trend Adjusted Series

9, w, P,, P2, s(o;B,) g ( k B , ) t?(2;B,) (s = 0)

Y,-2 1 -2 4 -6170 -10170 6/70 6/70 y,-' 1 -1 1 24/70 12170 -10/70 -24170 Y, I 0 0 34/70 24/70 -6170 36/70 Y,+, 1 1 1 24/70 26/70 18/70 -24170 y1+2 1 2 4 -6170 18/70 62/70 6/70

Page 523: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

500 RWRESSION. TREND, AND SEASONALITY

w' = (1, O,O)(O'@)-'@' ,

once computed, can be applied to any vector y I of five observations. These weights are given in the fifth column of Table 9.4.1 under the heading "g(0; B,)." The least squares estimator of 2, for the third observation (s = 0) is given by

y, - = u, - (1,O. o)(@'@)-'O'yf . The vector of weights giving the least squares estimator of 2, is presented in the

last column of Table 9.4.1. One may readily ver@ that the inner product of this vector of weights with each of the columns of the original matrix @ is zero. Thus, if the original time series satisfies the specification (9.4.2), the tihe series of estimated residuaIs created by applying the weights in the last column will be a stationary time series with zero mean. The created time series X, is given by

where F, denotes the weights in the last column of Table 9.4.1. Although X, can be thought of as an estimator of Z,, X, is 8 linear combination of the five original 2, included in the moving average.

A regression program that computes the estimated value and deviation from fit for each observation on the dependent variable is a convenient method of computation. Form a regression problem with the independent variables given by the trend specification (e.g., constant, linear, and quadratic) and the dependent variable defined by

where the estimated mean is to be computed for j = s. The vector of weights for the estimated mean is then given by

(9.4.3) w = 6 = @(@'@)-'wv = aq@'*)-'q;. ,

where g, is the vector of observations on the independent variables associated with j = s. Similarly, the vector of weights for computing the trend adjusted time series at j = s is given by

(9.4.4) v - 6 = v - aqcP'@)--'@'v = v - aqwm)-'p;. . "be computation of (9.4.3) is recognized as the computatim of 9 associated

with the regression of the vector v on @. The weights in (9.4.4) are then the deviations from fit of the same regression.

Page 524: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGES-LINEAR FILTERING 501

The moving average estimator of the trend and of the trend adjusted time series are frequently calculated for j = 0 and M, = M2. This configuration is called a centered moving averuge. The popularity of the centered moving average is perhaps due to the following theorem.

Theorem 9.4.1. The least squares weights constructed for the estimator of trend for a centered moving average of W + 1 observations, M a positive integer, under the assumption of a pth degree polynomial, p nonnegative and even, are the same as those computed for the centered moving average of 21M + 1 observations under the assumption of a (p + 1)st degree polynomial.

Pmf. We construct our matrix Cg in a manner analogous to that of Table 9.4.1, that is, with columns given by jk, k =0, 1,2,. . . , p , j = 0, +1, +2,. . . , +M.

Since we are predicting trend for j = 0, the regression coefficients of all odd powers are multiplied by zero. That is, the elements of 0:. in (9.4.3) associated with the odd powers are all zero. It remains only to show that the regression coefficients of the even powers remain unchanged by the presence or absence of the (p + 1)st polynomial in the regression. But

j = - M

for k a positive integer. Therefore, the presence or absence of odd powers of j in the matrix Cg leaves the coefficients of the even powers unchanged, and the result follows. A

The disadvantage of a centered moving average is the loss of observations at the beginning and at the end of the realization. The loss of observations at the end of the observed time series is particularly critical if the objective of the study is to forecast fiiture observations. If the trend and trend adjusted time series are computed for the end observations using the same model, the variance-covariance structure of these estimates differs from those computed for the center of the time series. We illpstrate with our example.

Assume that the moving average associated with equation (9.4.2) and Table 9.4.1 is applied to a sequence of independent identically distributed random variables. That is, the trend value for the first observation is given by applying weights g(-2,8,) to the first five observations, and the trend adjusted value for the first observation is obtained by subtracting the trend value from Y,. The weights g(-2, &) are the weights g(2.8) ammged in reverse order. For the second observation the weights g(-1, P,) me used. For t = 3.4,. . . , n - 2, the weights g(0, P,) are used. Denote the trend adjusted time series by X,, t = 1,2,. . . , n. If 3GtGn-2 and 3Gt+h+En-2, then

Page 525: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

502 RBORESSION. TREND, AND SEASONALITY

For the first observation,

Cov{X, , x,} =

560/4900, - 1260/4900, 420/4900, 252/4900, -420/4900, 204/4900, -36/4900, 0

t = l , t = 2 , t = 3 , t = 4 , t = 5 , t = 6 , t = 7 , otherwise .

9.4.2. Moving Averages of Integrated The Series

In Chapter 10 we shall study nonstationary time series that can be represented as an autoregressive process with root of unit absolute value. We have been constructing moving average weights to remove a mean that is polynomial in time. We shall see that weights constructed to remove such a mean will also eliminate the nonstationarity arising from an autoregressive component with unit root. The time series {w, t E (0, 1.2, . . .)} is called an integrated time series of order s if it is defined by

where {&, t E (0,1,2,. . .)) is a stationary time series with zero mean and positive spectral density at w I= 0.

Theorem 9.4.2. A moving average coashructed to remove the mean will reduce a first order integrated time series to stationarity, and a moving average constructed to remove a linear trend will reduce a second order integrated time series to stationarity.

Proof. We first prove that a moving average constructed to remove the mean (zero degree polynomial) will reduce a first order integrated time series W,= Z:=,,Z, to stationarity. Define the time series created by applying a moving average to remove the mean by

Page 526: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGES--LINEAR FlLTERING 503

M

x, = il: C j w , + / , t = 0, 1,2,. . . I

j = I

where the weights satisfy M c cj = O .

1-1

Then

M f + j

x,= c CI 2 z* j = l s-0

J

= I - 3 I c,(w, + r=l z &+.) M I

= I= CJ c z,+, . j = 1 r = I

which is a finite moving average of a stationary time series and therefore is stationq.

Consider next the second order integrated time series

r = O j = O r=0 j - 1

where W, is a first order integrated time series. The weights dj constructed to remove a hear trend satisfy dj = 0 and Z r l Jdj = 0. Therefore, the time series created by applying such weights is given by

which, once again, is a finite moving average of a stationary time series. A

The reader may extend this theorem to higher orders. Moving averages are sometimes repeatedly applied to the same time series.

Page 527: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

504 REGRESSION, TREND, AND SEASONALJTY

Theorem 9.4.3 can be used to show that the repeated application of a moving average constructed to remove a low order polynomial trend will remove a high order polynomial trend.

Theorem 9.4.3. Let p and q be integers, p b 0, q a 1. A moving average constructed to remove a pth degree polynomial trend will reduce a (p + q)th degree polynomial trend to degree q - 1.

Proof. We write the trend function as

P+Q

r-0

If a moving average with weights {wj: J = -M,, -MI + 1, . . . , iU2} is applied to this function, we obtain

IntenAanging the order of summation and using the fact that a filter consmcted to remove a pth degree polynomial trend satisfies

we obtain the conclusion. A

9.43. Seasonal Adjustment

Moving averages have been heavily used in the analysis of time series displaying seasonal variation. Assume that is a monthly time series that can be represented locally by

12

(9.4.5)

1 if Y, is observed in month m, Dm ={ 0 otherwise,

Page 528: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGW-LWEAR FETERING 505

11 c a,, = 0, m = l

and Z, is a stationary time series with zero mean. In this representation at + B, j is the trend component and X:=, ~lmDl+j,m is the seasonal Component. Since the sum of the seasonal effects is zero over a period of 12 observations, it follows that a moving average such as

(9.4*6) i t = L ( L Y 12 2 f -6 + &-5 + T-4 + * * ' + q + 4 + x + 5 + + q + 6 )

the difference 12 6

u, - t = x &Qm f 2 djZ,+, (9.4.7)

constructed for the time series (9.4.5) contains the original seasonal component but no trend. A moving average of the time series Y, - % can then be used to estimate the seasonal component of the time series. For example,

m = l j=-6

(9.4.8)

furnishes an estimator of the seasonal component for time t based on 5 years of data. The 12 values of $, computed for a year by this formula do not necessarily sum to zero. Therefore, one may modify the estimators to achieve a zero sum for the year by defining

(9.4.9)

where is the seasonal component at time t that is associated with the kth month. The seasonally adjusted time series is then given by the difference

The seasonal adjustment procedure described above was developed from an additive model and the adjustment was accomplished with a difference. It is quite common to specify a multiplicative model and use ratios in the construction.

- 't(k)'

Page 529: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

506 REGRESSION, TREND, AND SEASONALITY

It is also possible to constnict directly a set of weights *sing regression procedures and a model sucb as (9.4.5). To illustrate the procedute of seasonal adjustment based directly on a regression model, we assume that a quarterly time series can be represented for a period of 21 quarters by

3 4

where

1 if U, is observed in quarter k, Dfk ={ 0 otherwise

and

We call X;=, ap,k the quarter (or seasonal) effect and Zjm0 &j' the trend effect. The first seven columns of Table 9.4.2 give a *matrix for this problem. We have incorporated the restriction 2:=1 a, = 0 by setting ah = -a, - a, - a,.

With our coding of the variables, the trend value for f , the center observation, is given by A, and the seasonal value for f is given by a,. To compute rhe weights needed for the trend value at f , we regress the &-column on the remaining six columns, compute the deviations, and divide each deviation by the sum of squares of the deviations. It is readily verified that this is equivalent to Computing the vector of weights by

where

J, = (1,0,0,0,0,0,0).

The weights for the seasonal component at time f can be calculated by regressing the column associated with a, on the remaining columns, computing the deviations, and dividing each deviation by the sum of squares. These operations are equivalent to computing

where

J,, = (O,O, 0,0,1, 090).

Let v be a vector with 1 in the 1 lth (tth) position and zeros elsewhere. Then the weights for the trend and seasonally adjusted time series are given by

Page 530: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MOVING AVERAGES-LINEAR FETERING 507

Table 9.4.2. Calculatdoa of Weights for tbe Trend and !kasonaI Components of a Quarterly Time &ria

cp Weights Index

-10 1 -10 100 -9 1 -9 81 -8 1 -8 64 -7 1 -7 49 -6 1 -6 36 -5 1 -5 25 -4 1 -4 16 -3 1 -3 9 -2 1 -2 4 -1 1 -1 1 0 1 0 0 1 1 1 1 2 1 2 4 3 1 3 9 4 1 4 16 5 1 5 2 5 6 1 6 36 7 1 7 49 8 1 8 6 4 9 1 9 81

10 1 10 100

- lo00 -729 -512 - 343 -216 - 125 -64 -27 -8 -1 0 1 8 27 64 125 216 343 512 729 lo00

0 0 1 -1 -1 -1

1 0 0 0 1 0 0 0 1

-1 -1 -1 1 0 0 0 1 0 0 0 1

-1 -1 -1 1 0 0 0 1 0 0 0 1

- 1 -1 -1 1 0 0 0 1 0 0 0 1

-1 -1 -1 1 0 0 0 1 0 0 0 1

-0.0477 -0.0304 -0.0036 0.0232 0.0595 0.0634 0.0768 0.0902 0.1131 0.1036 0.1036 0.1036 0.1 131 0.0902 0.0768 0.0634 0.0595 0.0232

-0.0036 -0.0304 -0.0477

-0.0314 -0.0407 0.1 562

-0.0469 -0.0437 -0.0515 0.1469

-0.0546 -0.0499 -0.0562 0.1438

-0.0562 -0.0499 -0.0546 0.1469

-0.0515 -0.0437 -0.0469 0.1 562

-0.0407 -0.0314

0.0791 0.071 1

- 0.1 526 0.0237

-0,0158 -0.0119 -0.2237 -0.0356 -0.0632 -0.0474 0.7526

-0.0474 -0.0632 -0.0356 -0.2237 -0.01 19 -0.0158 0.0237

-0.1526 0.07 1 1 0.079 I

w, = v - wso - w,, . The weights w, can also be obtained by regressing v on B, and computing the deviations h m regression.

9.4.4. Differences

Differences have been used heavily when the objective is to reduce a time series to stationarity and there is little interest in estimating the mean function of the time series. Differences of the appropriate order will remove nonstationatity associated witb locally polynomial trends in the mean and will reduce to stationarity integrated time series. For example, if Y, is defined by

u, = a +/3t + w,, (9.4.10)

where w, = c I zt-I

j - 0

Page 531: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

508 REGRESSION, "D, AND SEASONALITY

and 2, is a stationary time series with zero mean, then

AY, = U, - U,-l = fl +Z,

is a stationary time series. The first difference operator can be viewed as a multiple of the moving average

constructed to remove a zero degree polynomial trend from the second of two observations. The second difference is a muItiple of the moving average constructed to remove a linear trend from the third of three observations. The second difference can also be viewed as a multiple of the moving average constructed to remove a linear trend from the second of three observations, and so forth, Therefore, the results of the previous subsections and of Section 2.4 may be combined in the following lemma.

Lemma 9.4.1. Let U, be a time series that is the sum of a polynomial trend of degree r and an integrated time series of order p. Then the qth difference of Y,, where r d q and p S q, is a stationary time series. The mean of AqU, is zero for r c q - 1.

prool. See Corollary 2.4.1 and Theorems 9.4.2 and 9.4.3. A

Differences of lag other than one are important in transforming time series. For the function f i r ) defined on the integers, we define the diflerence of lag H by

Ac,,,f(t) =.IT?) -f(t - H) (9.4.1 1)

where H is a positive integer.

a function satisfying In Section 1.6 we defined a periodic function of period H with domain T to be

J l t + H ) = f ( t ) V t , # + H E T .

For T the set of integers and H a positive integer, the difference of lag H of a periodic function of period H is identically zero. "berefore, differences of lag H have been used to remove seasonal and other periodic components from #me series. For example, if the expected value of a monthly time series is written as

12

E{Y,)= p + flt + c qD,, . i = l

where

(9.4.12)

1 if U, is observed in month i , Dti ={ 0 otherwise,

12

1- 1

Page 532: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STHUCRJRAL MODEXS 509

then the expected value of A(12J is E(Y, - Y,- = 128. The difference of lag 12 removed the periodic component and reduced the linear trend to a constant as well. Also, a difference of any finite lag will reduce a first order integrated time series to stationarity.

Mixtures of differences of different lags can be used. For example, if we take the first difference of the difference of lag 12 (or the difference of lag 12 of the first difference) of the time series V, of (9.4.12), the expected value of the resultant time series is zero; that is, ,??{Acl2) AY,} = 0.

The effect of repeated application of differences of different lags is summarized in Lemma 9.4.2,

Lemma 9.43. Let the time series V, be the sum of (1) B polynomial trend of degree r, (2) an integrated time series of order p , and (3) a sum of periodic functions of order H,, H,, . . . , Hq, where r Gq and p d q. Then the difference X, = A(H,)A(H2)- .ACHq,Y,, where 1 SH, C a for all i, is a stationary time series. If r d q - 1, the mean of X, is zero.

Pmf. Reserved for the reader. A

93. STRUCTURAL MODELS

In Sections 9.1 and 9.2, we demonstrated the use of specified functions of time to approximate a mean function. In Section 9.4, we studied moving averages as local approximations to the mean function. Stochastic functions are also used as models for the means of time series. Models with stochastic means are sometimes called structural models in econometrics. See, for example, Harvey (1989). To introduce the approach, consider the simple model

U, =p+ + e l , & = & - I +a , ,

f = 1.2,. . . , (9.5.1)

where (e,,a,)'-NI(O,diag(u~, a:)). Models such as (9.5.1) are also called unobserved compoRents models and can be considered a special case of (9.0.1). It follows fkom (9.5.1) that

X,=AV,=a,+e,-e,-, (9.5.2)

and hence X, is a nonnally distributed stationary process with mean zero and autocovariances ~ ( 0 ) = a: + 2u:, ~ ( 1 ) = -uf, and ~ ( h ) = 0 for lhl> 1. That is, XI is a first order moving average time series with representation

Page 533: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

510 REORFSSION. TREND, AND SEASONALITY

with mean zero and variance a: = ( I + f12)-'(2a,2 + 5). If we let K = U ; ~ U ; , then K = 2 - @-'(I + f l )2 and a: = (K + 2)-'( 1 + @ )a:. When a," = 0, the model (9.5.1) reduces to the constant mean model and X, =e, -e,- , is a noninvertible moving average.

Given a sample segment from a dizat ion of Y,, we can estimate the unknown parameters (a:, a:) by fitting a first order moving average to X, = AU,, with the parameter p restricted to [ - 1,O). The resulting estimata of 43 and uz are used to construct estimates of (a:, a:). Given an estimate of (a:, aJ, estimates of the h can be obtained with filtering methods. Because of the form (9.5.1), it is natural to use the Kalman filter procedures of Section 4.6 to estimate the 14.

It is also clear that a prediction constructed with estimates of a: and t~," is identical to the prediction obtained by using the moving average representation for

A more general trend model is obtained by including a random change x,. component. The expanded model is

(9.5.4)

where (e,,, et2, er3) -NI(O, diag{a:, a;, af}). The component is a local linear trend.

If we take second differences of Y,, we obtain

(9.5.5)

By the assumptions on e,, we have

[a(O), %(I), ~ ( 2 ) ] = [af + 20; + 6a:, -a; - 4a:, a:] , and ~ ( h ) = 0 for h > 2, Thus, X, = A'Y, can be represented as a second order moving average

x ,=u ,+p ,u , - , +&u,-,. (9.5.6)

-2 2 - 2 2 If we let ~2 = a, a2 and K~ = ul P,, then

K2 = -4 - p;'(2)&(1) 3

K~ = p i ' ( 2 ) - 2 ~ , - 6 , a : = ( ~ ~ + 2 ~ ~ + 6 ) - 1 a,(l+j3:+&). 2

As with the simple model, the fact that variances are nonnegative restricts the possible values for (PI, a).

Example 9.5.1. To illustrate the use of the structural model to estimate the

Page 534: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

STRUCTURAL MODELS 511

trend, we use the data on wheat yields of Example 9.2.1. We consider the simple model

(9.5.7)

where (e,, a,)' - II(0, diag{of, o,"}). We let X, = AY, and fit the first order moving average of (9.5.3) to the first differences of the data to obtain

X, = - 0.4109 u,-, + U,

(0.1010)

with St = 4.4019. Eighty-three differences were used, and the zero mean model is estimated. Using

(1 + p 2 p p = -(2af i- aya;,

(1 + pZ)a,Z = 2a3 4- u," ,

we obtain (&.,", (is = (1.8086,1.5278). Using 0{8:} = 0.4726 and Taylor ap rox- imation methods, the estimated standard errors are 0.5267 and 0.5756 for 6, and a,", respectively.

Given the estimates and the model, one can use filtering procedures to estimate the individual p,. We use the covariance structure to construct a linear filter. If the p m s s begins at time one with a fixed unknown initial vaiue, denoted by p,, the covariance matrix of the vector (Y,, Y,, . . . , Y,)' = Y is

D

v,, = 1 ~ 3 + L L ' ~ , ~ ,

where L is an n X n lower triangular matrix with hj = 1 fo r j< i and LIj = O for jai. The best linear unbiased estimator of the unknown initial value is

p, = (J'v;,'J)-'J'v;;Y, (9.5.8)

where J' is an n-dimensional row vector composed of all ones. Now, from (9.5.7),

f i = (ELI 9 112, - - - 9 EL")' = ELI J + La

where a = (ui, a2,. . . , u,)'. Therefore, the best unbiased estimator of p is

@ = JP, + a;LL'Vi'(Y - J& > = {J(J'V;; J)- J'V;: + o:LL'V;' [I - J(J'V;,'J)- ' J'V;; ]}Y .

(9.5.9)

For fixed v,'a;*, the estimator is a linear function of Y, say KY, and the weights to be applied to the elements of yi to estimate any particular ~4 decline rapidly as

Page 535: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

512 REGRESSION. TREND, AND SEASONALITY

the distance betweenj and t increases. In equation (9.5.9), @ is an n-dimensional vector, but any subset of the original vector of observations can be used to construct a smaller vector of estimates. Using our estimates of a: and a:, the vector of optimal weights for estimating c1, using (Y,+ Ym+. . . . , Ym+4, Ym+J is

H,,, = (0.007,0.013,0.029,0.071,0.171,0.418,0.171,0.071,0.029,0.013,0.007).

Notice that the sum of the weights is one and that the weights are symmetric about the center weight. If K denotes the matrix multiplying Y in (9.5.9), the vector €I,,, is the sixth row of the eleven by eleven matrix K.

Figure 9.5.1 contains a plot of the estimated f i values.Values for 1913 through 1986 were computed using H,,,. That is, the estimates are centered moving averages with weights H,. The end values were computed using the optimal estimator based on the first eleven and last eleven observations. The vector of estimated variances of the e m s jit - f i for the last six observations is

(0.755,0.755,0.757,0.764,0.808,1.066).

m * 40

30 -

I! F Q

44

8 5

20 -

I 104 I I I I i I i I

0 10 20 30 40 50 60 70 80 90 Time

Figurn 9S.1. U.S. wheat yields 1908-1991 and structural trend estimates from model (9.5.7).

Page 536: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SOMB E€W?CTS OF MOViNG AVERAGE OPERATORS 513

The variance for Y,,-5 is the variance for all estimators in the center of the observation vector, that is, more than 5 years from the end points. The estimated variances are the last six diagonal elements of the matrix

LLV.,” - K L L ~ ~ ~ - LL’KW: + K ~ ~ K ’ . The random walk model for pt provides a rather flexible mode1 for the mean

function. As a result, the fi, of Figure 9.5.1 form a much more variable estimated mean function than the &rafted polynomial function of Example 9.2.1. The random walk model might be questioned for these data because of the rather extended period of increasing yields. Although one may not be satisified with the final estimates of & obtained from the simple structural model, the estimates furnish useful information about the general movement of the time series.

Under the model (9.5.7), the predictor for &+,, s > 0, is fin, The variance of the prediction e m r is V{i2n+s - &+J = V{bn - &Lh) -t- su,”. Thus, the predictions of the mean for 1992, 1993, and 1994 are (35.46,35.46,35.46), respectively, and the estimated standard errors of the estimated means are (1.61,2.03,2.38). The predictions of the individual yields for 1992, 1993, and 1994 are also (35.46,35.46,35.46). The variances of the prediction errors are the variances of the estimated means increased by (r:. The estimated standard errors of the prediction errors are (2.03,2.38,2.68). AA

Example 9.5.1 illustrates the fact that the structural model can be used for preliminary investigation of a time series. We discussed the construction of moving averages of arbitrary length based on polynomial models in Section 9.4. By specifying a structural model for the trend, we also obtain a moving average estimator of the trend. The “length” of the moving average is based on estimates constructed with the data

9.6. SOME EFFECTS OF MOVING AVERAGE OPERATORS

Linear moving averages have welidefined effects on the correlation and spectral properties of stationary time series. These were discussed earlier (see Theorems 2.1.1 and 4.3.1), but the effects of trend-removal filters are of sufficient importance to merit special investigation.

Proposition 9.6.1. Let X, be a stationary time series with absoiutely summ- able covariance function, and let Y, = Z E - L aT,,j, where L and M are nonnega- tive integers and the weights a, satisfy X: - aj = 0. Then the spectral density of Y, evaluated at zero is zero; that is, fu(0) = 0.

Proof. By Theorem 4.3.1, the spectral density of Y, is given by

f y ( 4 = <2.rr>2f,(@Vt(4&(4 I

Page 537: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

514

where

REGRESSION, TREND. AND SEASONALITY

Since c f l _ - L uj = 0, it follows thatA(0) = 0. A

Since fU(o)= ( 2 ~ ) ~ ' X;fp-m %(/I), it follows that the sum of the auto- covariances is zero for a stationary time series that has been filter& to remove the mean. For example, one may check that the covariances of X, and Xl+,,, h = -4, -3, . . . ,4, for the example of Table 9.4.1 s u m to zero.

Tbe weights in the last column of Table 9.4.1 were constructed to remove a quadratic trend The transfer function of that filter is

2 @ ( ~ ) = (70)-'(36 - 48 cos w + 12 CQS 20),

and the squared gain is

12@(0)1' (70)-'(36 - 48 cos w + 12 cos 20)'.

The squared gain is plotted in Figure 9.6.1. The squared gain is zero at zero and rises very slowly. Since the weights remove a quadratic trend, the filter removes much of the power from the spectral density at low frequencies. Since the spectral density of white noise is constant, the squared gain is a multiple of the spectral density of a moving average of uncorrelated random variables where the coefficients are given in Table 9.4.1.

The squared gain of the first difference operator is displayed in Figure 9.6.2. While it is of the same general appeafance as the function of Figure 9.6.1, the squared gain of the first difference operator rises from zero more rapidly.

1.5

0.5

0 0 d 4 d2 3d4 1F

0

Figure 9.6.1. Squand gain of filter in Table 9.4.1.

Page 538: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SOME EFFECTS OF MOVING AVERAGE OPERATORS 515

.- r ~ . o ~ m M

/ 4.0

..

U

a / 0 d4 Id2 3d4 n

0

Flgure 9.6.2. Squared gain of first difference operator.

If a time series contains a perfect sine component, the difference of the time series will also contain a perfect sine component of the same period, but of different amplitude and phase. That is, if Y, = sin wt, the difference is

AY, =sin ot - sin& - 1)

= 2 sin +wcos w(t -3). We note that the amplitude of the sine wave is changed. If w is such that lsin 301 <$, the amplitude will be reduced. This will be true for long period waves. On the other hand, for short period waves, 1713 < w < rr, the amplitude will be increased. Note that this agrees with Figure 9.6.2, which shows the transfer function to be greater than one for w > n13.

Filtering 8 stationary time series with least squares weights to remove the seasonal effects will reduce the power of the spectral density to zero at the seasonal fresuencies. For a time series with p observations per period of interest, the seasonal frequencies are defined to be 2mp-'w, rn = 1 ,2 , . . . , L [ p ] , where L [ p ] is the largest integer less than or equal p12. For example, with a monthly time series, the seasonal frequencies are w16, 1~13, w12, 271/3,5rr16, and w. We have not included the zero frequency because most seasonal adjustment schemes are not constructed to remove the mean.

PmposJtion 9.6.2. Let uj be a least squares linear filter of length R constructed to remove seasonal variation from a time series with p (Rap) observations per period of interest. Let the time series have an absolutely summable covariance function. Then the spectral density of the filtered time series is zero at the seasonal frequencies.

Froof. A lineat least squares filter constructed to remove the seasonal effects

Page 539: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

516

satisfies

REGRESSION, “D, AND SEASONALITY

2mn R

2 ajsin---j=O, j - I P

This is so because any periodic function of period p defined on the integers can be represented as a sum of p sines and cosines. We have

and, setting w = 2 m / p ,

form = 1.2, . . . , L,fpl . A

The first difference is a filter satisfying the conditions of Proposition 9.6.1, and the difference of lagp satisfies the conditions of Proposition 9.6.2. We now consider the effects of difference operators on autoregressive moving average time series.

Proposition 9.63. Let X, be an autoregressive moving average time series of order (p, q) expressible as

where {e,] is a sequence of uncorrelated (0, a’) random variables. Let the roots of m P + a,mP-l + - * + a- = 0 be less than one in absolute value, and let the roots ofsq+ /31sq - ’ + . . . + B , = ~ b e s , , s , , ..., s,.~henthefirstdifference x = x , - Xg-l is an autoregressive moving average ( p , 4 + 1) with the autoregressive portion uacbanged and the roots of the moving average portion given by sl. s2,. . . , sq, 1.

Pmf. The spectral density of X, is given by

Page 540: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

SOME E-S OF MOVING AVERAGE OPERATORS 517

The spectral density of Y, is given by

' ] = I

where sq+l = 1. A

It follows immediately from Proposition 9.6.3 that the kth difference of the stationary autoregressive moving average time series of order (p, q) is an autoregressive moving average time series of order (p, q + k), where at least k of the roots of the auxiliary equation associated with the moving average are one. Differences of lag r have similar effects on a time series.

Proposition 9.6.4. Given a time series X, satisfying the assumptions of Proposition 9.6.3, the difference of lag r of XI is an autoregressive moving average time series of order ( p , q + r) where the autoregressive portion is unchanged and the roots of the moving average portion are s l , s2, . . . , sq plus the r roots of the equation sr = 1.

Proof. The spectral density of Y, =XI - X,-? is given by

fJo) = (1 - e-""')(l - e'Yr)fX(o)

and, for example, the factor e-'"' - 1 can be written as a + r

wherethes,, j = q + l , q + 2 ...., q+r,aretherrmtsofs '=l . A

These results have important practical implications. If one is attempting to identify an autoregressive moving average model for a time series that has been differenced, it is wise to consider a moving average of order at least as large as the order of differences applied to the original time series. If the time series was stationaty before differencing, the characteristic polynomial of the moving average

Page 541: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

518 REGRESSION. TREND, AND SEASONALITY

portion of the differenced time series contains at least one unit root and the estimation theory of Section 8.3 is not applicable.

9.7. REGRESSION WITH TIME SERIES ERRORS

In this section we treat the problem of obtaining estimates of the parameters of regression equations with time series errors. We investigated the properties of the ordinary least squares estimators in Section 9.1 and found that for some special types of independent variables the ordinary least squares estimators are asymp- totically fully efficient. For other independent variables, including most stationary time series, the ordinary least squares estimators are not efficient. Moreover, in most situations, the variance estimators associated with ordinary least squares are biased. To illustrate some of these ideas, consider the simple model

(9.7.1) q=px,+z , ,

(Z,, X , ) = w,- f I I\x, - I 1 + (er * '

where the e, are n o d independent (0, a:) random variables, the ,re normal independent (0, a,") random variables, el is independent of uj for all 2, j , lpl< 1, and IAI < 1. Under the model (9.7.1), the expected value of

conditional on the observed XI, t = 1,2,. . . , n, is p, and hence js is unbiased.

( X I , X,, . . . , X n ) is given by equation (9.1.4), The variance of the least squares estimator of /3 conditional on X =

r = l j - 1

where Q: = x ( O ) . Now the sample autocovariance of X, converges to the population autocovariance, and

where u: = ~ ~ ( 0 ) . Hence,

Page 542: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION WITH TIME SERIES ERRORS 519

If p is known, the variance of the generalized least squares estimator computed with known p, conditional on X, is

It follows that the large sample relative efficiency of generaIized least squares to ordinary least squares is

"his expression is greater than one for all nonzero (p, A) less than one in absolute value. For p = A = (OS)'", the relative efficiency is 30096!

The estimated variance of the ordinary least squares estimator obtained by the usual formulas is

where s2 = (n - l)-I z;=l (y, - &x,,>'. Since V{bs 1 X} = OJn-'), we have bs - p = Op(n-"2) and s2 converges to

the variance of Z,. Therefore,

and

The common least squares estimator of variance will be an over- or underestimate, depending on the signs of p and A. If p = A = (0.5)1'2, the ordinary least squares estimator will be estimating a quantity approximately one-third of the true variance. Given these results on the efficiency of ordinary least squares and on the bias in the ordinary least squares estimatM of variance, it is natural to consider an estimator such as (9.1.3). where the estimated autocorrelations are used in place of the unknown parameters.

In Section 9.3 we established that the autocorrelations of the error time series could be estimated from the calculated regression residuals. We apply those results to our current model.

Propasition 9.7.1. Let Y, satisfy

Page 543: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

520 REGRESSION, TREND. AND SEASONALITY

(9.7.2)

where Z, is a stationary autoregressive process, D

the roots of the characteristic equation are less than one in absolute value, the 9- satisfy (9.1,7), (9.1.8), and (9.1.9), and {e,} is a sequence of independent (0, c2; random variables witb E{e:} = qc4. Denote the simple least squares estimator of /3 by B = (b,, A, . . . , &’ and the simple least squares residuals by &.

Let dir be the estimator of a = (a,, a2, . . . , ap)’ obtained by regressing 2, on (&,,&, - - . ,$Z,-p), t = p + 1, p + 2 , . . . ,n , and let & be the estimator obtained byregressing2, OI~(Z,~,,Z,~~,...,Z,~~). Then

dir - & = Op(n-’ ) .

The result is an immediate consequence of Theorem 9.3.1. Proof. A

Given an estimator of the correlational structure of the error time series, we wish to construct the estimated generalized least squares estimator. Often the most expeditious way to obtain this estimator is to Et.ansfonn the data. In the present case, the Gram-Schmidt orthogonalization leads one to the transformed variables

(9.7.3)

P

q = e , = ~ , + Z a , ~ ,-,, z = p + l , p + 2 ,..., n , i=1

where S,, = yL1”(O)q h2 = { [ 1 - p ~ ( 1 ) ] ~ ( 0 ) } - ” 2 u , =%(lX[l - p~(1))75(0)}-1’za, etc. The t; are uncomlated with constant variance u . For the first order autoregressive time seiiea the transformed variables are

El = (1 - p2)”2z1 , +=ZI-pZ,- l , t = 2 , 3 ,..., n.

As a second example of the transformation, consider the second order auto- regressive process

Z, = 1.532,-, - 0.662,.-, + e, ,

Page 544: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION WlTH TlME SERIES ERRORS 521

where the variance of e, is u2. Then by (2.5.8) the variance of 2, is 11.77~'. The correlations for the time series are given in Table 6.4.2. The transformation is

~~=(11.77) -"~2, ,

4 = (1.7718)-"'(22 - 0.92172,), 5 = e, = 2, - 1.532,-, + 0.662,-, , t=3,4 ,..., n .

To define the estimated generalized least squares estimator, we can express our original model (9.7.1) in the matrix notation of Section 9.1 as

y=cpg+z.

We let V,, = E{zz'} and E = Tz, where T is the n X n transformation matrix defined in (9.7.3). Then the estimated generalized least squares estimator is obtained by regressing ?y on %,

b= [@'*'9cp]-'cp'P?y = [@'$,'@]-'Cb'9,'y1 (9.7.4)

where 0,' = T'Tc?-', a2 = (n - 2p) - 1 ZimpTl(& n f Z,P=, c$,-~)~, and 9 is ob- tained from T by replacing aj with 4, u with B', and so forth.

In Section 5.7 we demonstrated that the limiting distributions of the generalized least squares estimator and of the estimated generalized least squares estimator are the same under mild assumptions. By Theorem 5.7.4 and Proposition 9.7.1, the estimated generalized least squares estimator for an error process that is an autoregressive moving average will have the same limiting distribution as the generalized least squares estimator based on the m e parameters, under mild assumptions. In Theorem 9.7.1, we give a direct proof of the result for autoregressive errors using the transformation 9.

Theorem 9.7.1. Let Y, satisfy the model (9.7.2), where the (Pri satisfy the assumptions (9.1.7), (9.1.8). and (9.1.9). Then

Dn(BG - B ) = op(n-"2) ?

where BG = [WV~'@]-lcP'V~lg, @ is defined in (9.7.4). dnii = (Xr=l P , ~ ) 2 112 , and

D, = diagVn, I , dn22, . . . I d,,,) . Proof. We have, for f',, the tth row of $ and 9' the ith column of a,

P

6 = *,.z = €, + c (4 - aj)z,-j, J = I

Page 545: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

522 REGRBSSION. TREND, AND SEASONALITY

for t = p + 1, p f 2,. . . ,n, with similar expressions holding for t = 1,2,. . . , p . By Proposition 9.7.1 and Theorem 8.2.1, di! - a! = O,,(n-“’), and it follows that

Therefore,

D,@ - p) = ( D , ~ Q ~ $ ‘ ~ ~ D ~ I ) - I D , * ~ ’ $ ~ ~ Z I

= @n-r#V;lrpD,’)-lD,lO‘V;lz + Op(n-1’2)

= D,(& - 8) + O,(~Z-~”). A

Since the residual mean square for the regression of “y on $4) converges in probability to uz, one can use the ordinary regression statistics for approximate tests and con6dence intervals.

Because of the infinite autoregressive nature of invertible moving average time series, an exact transformation of the form (9.7.3) is cumbersome. However, the approximate difference equation traasfonnation wili be adequate for many purposes. For example, if

Zr=et+be , - , , lbl<1,

the transformation

e, = (1 + b2)-”’2, , (1 + b2)”’(1 + b2 + b4)-”’[& - (1 f b2)-’bZ, J , (9.7.5)

$=Z,-b5-1 s t=3 ,4 ,..., n ,

wiH generally be satisfactory when b is not too close to one in absolute value.

Example 9.7.1. To illustrate the fitting of a regression model with time series errors, we use the data studied by Prest (1949). The data were discussed by Durbin and Watson ( 195 11, and we use the data as they appear in that article. The data, displayed in Table 9.7.1, pertain to the consumption of spirits in the United Kingdom from 1870 to 1938. The dependent variable Y, is the annual per capita consumption of spirits in the United Kingdom. The explanatory variables q, and y2z are per capita income and price of spirits, respectively, botb deflated by a general price index. All data are in logarithms. The model suggested by Prest can be written

Page 546: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION WITH TIME SERIES ERRORS 523

Table 9.7.1. AMWI Consumption of Spirits in the United Kingdom from 1870 to 1938

Consumption Income Price Y W v, 9, I 4112

1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904

1.9565 1.9794 2.0120 2.0449 2.0561 2.0678 2.0561 2.0428 2.0290 1.9980 1.9884 1.9835 1.9773 1.9748 1 .%29 1.9396 1.9309 1.9271 1.9239 1.9414 1.9685 1.9727 1.9736 1.9499 1.9432 1.9569 1.9647 1.9710 1.9719 1.9956 2.oooo 1.9904 1.9752 1.9494 1.9332

1.7669 1.9176 1.7766 1.9059 1.7764 1.8798 1.7942 1.8727 1.8156 1.8984 1.8083 1.9137 1.8083 1.9176 1.8067 1.9176 1.8166 1.9420 1.8041 1.9547 1.8053 1.9379 1.8242 1.9462 1.8395 1.9504 1.8464 1.9504 1.8492 1.9723 1.8668 2.oooO 1.8783 2.0097 1.8914 2.0146 1.9166 2.0146 1.9363 2.0097 1.9548 2.0097 1.9453 2.0097 1.9292 2.0048 1.9209 2.0097 1.9510 2.02% 1.9776 2.0399 1.9814 2.0399 1.9819 2.0296 1.9828 2.0146 2.0076 2.0245 2.oooo 2.oooo 1.9939 2.0048 1.9933 2.0048 1.9797 2 . m 1.9772 1.9952

Consumption Income Price Year v, Qfl %2

1905 1906 1907 1908 1909 1910 191 1 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 I 926 1927 I928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938

1.9139 1.9091 1.9139 1.8886 1.7945 1.7644 1.7817 1.7784 1.7945 1.7888 1.8751 1.7853 1.6075 1.5185 1.6513 1.6247 1.539 1 1.4922 1.4606 1.455 1 1.4425 1.4023 1.3991 1.3798 1.3782 1.3366 1.3026 1.2592 1.2635 1.2549 1.2527 1.2763 1.2906 1.2721

1.9924 1.9952 2.0117 1.9905 2.0204 1.9813 2.0018 1.9905 2.0038 1.9859 2.0099 2.0518 2.0174 2.0474 2.0279 2.034 I 2.0359 2.0255 2.0216 2.0341 1.9896 1.9445 1.9843 1.9939 1.9764 2.2082 1.9965 2.2700 2.0652 2.2430 2.0369 2.2567 1.9723 2.2988 1.9797 2.3723 2.0136 2.4105 2.0165 2.4081 2.0213 2.4081 2.0206 2.4367 2.0563 2.4284 2.0579 2.4310 2.0649 2.4363 2.0582 2.4552 2.0517 2.4838 2.0491 2.4958 2.0766 2.5048 2.0890 2.5017 2.1059 2.4958 2.1205 2.4838 2.1205 2.4636 2.1 182 2.4580

SOURCE: Durbin and Watson (1951). Rqmduced with pmnission of the Trustees of Biometrika and of the authors.

wkre 1869 is the origin for t , p,, = lO-'f, pf4 = lO-"(t - 35)'. and we assume 2, is a stationary time series.

The ordinary least squares regression equation is

E,= 2.14 + 0.69 q, - 0.63 YI,~ - 0.95 p,3 - 1 . 1 5 ~ ~ . (0.28) (0.14) (0.05) (0.09) (0.i6)

Page 547: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

524 REGRESSION. TREND, AND SEASONALITY

The residual mean square is 0.00098. The numbers in pentheses are the standard errors computed by the ordinary regression formulas and would be a part of the output in any standard regression computer program. They are nor proper estimators of the standard errors when the error in the equation is autocorrelated. We shall return to this point.

The Durbin-Watson d is 0.5265. Under the null hypothesis of normal independent errors, the expected value of d is

E{d} = [2(68) - 0.67](64)-' = 2.1 145,

where

@{(A@)'(A@XdW)-'} = 0.67.

Therefore, from equations (9.3.8) and (9.3.9),

b= 0.7367 + (0.5)(0.1145)(66)(65)-'(1 - 0.5427) = 0.7633

and

f d = (66)"2(0.7633)(0.4174)-"2 = 9.60.

Obviously, the hypothesis of independent errors is rejected. To investigate the nature of the autocorrelation, we fit several autoregressive models to the mgression residuals. The results are s u d z d in Table 9.7.2. The statistics of that table are consistent with the hypothesis that the error time series is a first order auto- regressive process. Therefore, we construct the transformation of (9.7.3) for a first order autoregressive process with 6 = 0.7633. The transformation matrix is

-0.7633 1

0

-0.7633 ::: 0 0 * * * !J, 1

-0.7633 1 ... 0 *=I 0 . -0.7633 1 * a *

where 'li' is a 69 X 69 matrix and 0.6460 = (1 -

the estimated generalized ieast squares equation

= [I - (0.7633)21"2. Regressing the transformed U, on the transformed q,, i = 0.1, . . . ,4, we obtain

The numbers in parentheses are consistent estimates of the standard errors of the

Page 548: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION WITH TIME SERIES ERRORS 52s

TaMe9.7.2. AapiysiS of Varlance for Residupls from Spirit Consumption Regression

S o m e Freedom square Degrees of Mean

0.03174

0.00013

2- I

1 0.00036 it,-, after 2,-,,zl-z

1 0.00004 %-, after Zl-,p&.2, 2,-, 1 0.00049

21-, aftcr;31-, I

Error 56

estimators. They are computed h m the transformed data by the usual regression formulas. The error mean square for the transformed regression is 0.000417.

Every one of the estimated standard errors for the generalized least squares estimator based on /i = 0.7633 is greater than the standard error calculated by the usual formulas for the simple least squares estimates. Of course, we know that the standard formulas give biased estimators far simple least squares when the errors are autoconrelated. To demonstrate the magnitude of this bias, we estimate the variance of the simple least squares estimators, assuming that the error time series is

Z, = 0.76332,- + e, ,

where e, is a sequence of uncorrelated (O,O.O00417) random variables. The covariance matrix of the simple least squares estimator is estimated by (@f@)-'Wtz@(WD)-', where Q, is the 69 X 5 matrix of observations on the independent variables and va is the estimated covariance matrix for 69 observa- tions from a first order autoregressive time series with b = 0.7633.

Alternative estimates of the standard errors of the coefficients are compared in Table 9.7.3. The consistent estimates of the standard errors of simple least squares obtained from the matrix (@'@)-'W~z@(@f9)-' are approximately twice those obtained from the simple formulas. This means that the variance estimates

Table 9.7.3. Comparison of Alternative Estlmatom of Standard Errors of Estimators for Spfrits Example

Estimator Consistent Consistent for Simple Estimator Estimator for

Least squares for Simple Generalized Coefficient Assuming p = 0 Least squares Least squares

Bll 0.282 0.54s 0.3W Bl 0.138 0.262 0.146 a 0.054 0.112 0.072

0.088 0.179 0.108 0.161 0.313 0.266

a 84

Page 549: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

526 REQRESSION. TREND, AND SEASONALITY

computed under the assumption of zero correlation are approximately one-fourth of the consistent estimates. The estimated standard enms for the generalized least squares estimators range fium about one-half (0.56 for income) to nine-tenths (0.85 for time squared) of those for simple least squares. This means that the estimated efficiency of simple least squares relative to generalized least squares ranges from about three-tenths to about seven-tenths.

The computations for regression models with autoregressive mrs m available in several computer programs. 'Itre procedute AUTOREG of SAS Institute, Inc. (1989) is one convenient program. If we specify a first order autoregressive error and the Yule-Walker procedure of estimating the autoregressive process, we obtain an estimate b = 0.7160 with a standard e m of 0.0879. The estimate of p differs from ours because of the different estimator used. The estimated regression equation is nearly identical to the one given above. AA

In Example 9.7.1 we gave details of a two-step computational procedure for estimation of a regression equation with time series errors. Computer programs offer one the option of specifying and estimating the complete model, either by the two-step procedure or as a single maximization problem.

Assume the time series error in a linear regression model to be an auto- regressive process. Thus, we write

Y,=q.p+Zl. t = 1 , 2 ,..., n , (9.7.6a)

P

z1 + C. aiz,+ = e, , i = l

(9.7.6b)

where {eJ is a sequence of M(0, a') random variables, and q,, t = 1,2,. . , , n, are &-dimensional row vectors of explanatory variables. If the process 2, is a stationary normal process, the log likelihood is

UY; e) = -o.s[n log 2~ + loglv,l+ (Y - @p)'v;'(y - @p)i, (9.7.7)

where V,, =V,,(a, a2) is the n X n covariance matrix of (2, ,Z2, . . . , Z,,), Y is the n-dimensional column vector (Y,, Y2, . . . , Y")', CP is the n X k matrix

The theorems of Section 5.5 can be used to obtain the limiting distribution of the maximum likelihood estimator. The limiting distribution of the maximum likelihood estimator of j3 is the same as the limiting distribution of the generalized least squares estimator, if the latter exists.

2 (pi., q;., . . . , qi.)', and 8' = (PI, &, . . . I B,, al, %, . . . , aps a ).

Theorem 9.7.2. Let Y, satisfy equation (9.7.6a). where 2, is a stationary process with the roots of the characteristic equation associated with (9.7.6b) less than one in absolute value. Assume the e, are iid(0, cr') random variables or that the e, satisfy the martingale difference conditions of Corollary 5.3.4. Let (y1.1 and m,,} be sequences such that

Page 550: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION WITH TIME SERIES ERRORS 527

and

Let d be the estimator that maximizes the log Gaussian likelihood (9.73, and let eo be the true parameter vector. Then

Mi(@ - Po)--% N(0, I),

n"*(& - a')-% N(0, A-'02),

8' 5 c', and ML(B - pa) is independent of r ~ " ~ ( & - a') in the limit, where A = N Z , - I , Zt+, . . . , Zt-,, )'(Z,- 1, 2,- ', . . . , z, -,, 11.

prool. omitted. A

Example 9.7.2. We use the wheat yield data of Example 9.2.1 for the period 1908 to 1991 to illustrate the nonlinear fitting. The fi, from Example 9.5.1 and a plot of the data suggest that wheat yields were nearly constant for the first 25 years and for the last several, say five, years. Therefore, we consider a mean function that is constant for the first 25 years and for the last five years. A function that is continuous with continuous first derivative that meets these requirements is

0, 1 =s t G 25, (t - 25)', 25 6 t 4 54, 841 + 58(t - 54) 54 c t s 70, 841 + 58(t - 54) - 2.9(t - 70)2, 7 0 c t 6 80, 2059, t a 8 0 ,

where t = 1 for 1908, The first part of the function was used in Example 9.2.1. We assume the errors follow a first order autoregressive process. Then our model is

u, = B, + BI Pa + 2, (9.7.8)

~ 2 = [

for t = 1.2, . . . , n, where 2, is the stationary process satisfying

Z t + a , Z , - , = e l .

and we assume the e, are iid(0, o:).

maximum likelihood estimates, we obtain If we use procedure AUTOREG of SAS/ETSo to compute the Gaussian

bo = 14.257, 8, = 0.01075, &, = - 0,292, &: = 3.427. (0.380) (0.00037) (0.107)

Page 551: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

52% REORPSSION, TREND, AND SEASONAUTY

The model predictions for the next three observations are

(flw,, f,993, f,,,,) = (35.78, 36.21, 36.34). (1.95) (2.05) (2.07)

The prediction for 1992 is similar to that obtained in Example 9.5.1, The model of Example 9.5.1 postulates a random walk for the mean function and uncomlated deviations from the mean function. Hence, the predictions for a11 future periods are the best estimate of the yield for 1991, and the estimate for 1991 is heavily influenced by the actual observation for 1991. The predictions under the model (9.7.8) ate based on the estimated trend associated with the function fi2. Because the function is constant for t Z 80, the predictions ultimately fall on a horizontal line. Because the yield for 1991 is about 2.1 bushels below the trend line, the prediction for 1992 is about 0.29 X 2.1 = 0.61 bushels below the trend line, the prediction for 1993 is about (0.29)2 X 2.1 = 0.18 bushels below the trend line, and the prediction for 1994 is abwt (0.29)3 X 2.1 = 0.05 bushels below the trend line. Because the 1991 yield is below the trend line and the estimated autocorrelation is positive, the predictions for the next three years increase toward the trend line.

The standard errors of the predictions constructed in this example are smaller than those of the predictions of Example 9.5.1. In this example, the mean is assumed to be of a known functional fcnm. In Example 9.5.1, the mean is assumed to be random walk. If the mean function is known, then it is easier to predict than the random walk. However, in using the estimated trend function for prediction, one projects the trend function into the future. Often practitioners hesitate to project polynomial trends very far into the future. In our current example, the most recent past of the mean function is assumed to be constant, and we might be less worried about our trend assumption than we would have been in constructing predictions in 1971 using the function of Example 9.2.1. To project linearly in 1971, one would need to believe that the improvements in varieties and pesticides and the use of fertilizer that led to increased yields during the observation period would continue at the same rate during the prediction period. In the case of wheat yields the linear trend predictions would have peformed well for about fifteen years, from 1971 until about 1986. AA

Example 9.7.3. As a second illustration of nonlinear estimation, we use the data on consumption of spirits of Table 9.7.1. If we use nonlinear least squares and fit the model with a first order autoregressive e m to the last n - 1 observations, we obtain

= 2.421 f 0.716 p,, - 0.818 ptZ - 0.846 ~5~ - 0.560 ~ 5 4

(0.304) (0.149) (0.074) (0.1 19) (0.380)

2, = 0.788 Zt-, , (0.078)

Page 552: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION WITH T W E SERE3 ERRORS 529

and 6’ = O.OOO41, where y)r3 = lO-’t and pf4 = 10-4(t - 35)’. If we fit the full model by maximum likelihood, we obtain

pI = 2.391 + 0.727 y ~ , - 0.822 y)rz - 0.773 qf3 - 0.932 pt4, (0.308) (0.150) (0.075) (0.1 13) (0.300)

2, = 0.812 Z,-, , (0.076)

and 6’ = 0.00042. As expected, the results are very similar to those obtained in Example 9.7.1 because the three procedures have the same asymptotic efficiency.

If we fit the model with a secolld order autoregressive error by maximum likelihood, we obrain

= 2.473 + 0.704 - 0.832 y~~ - 0.845 p,3 - 0.470 q r , (0.333) (0.156) (0.077) (0.127) (0.420)

= 0.742 Z f - I + 0.054 Z,-’ , (0.140) (0.139)

and 8’ = 0.00042. The second order coefficient in the autoregressive model is small relative to its standard error, agreeing with the results of Table 9.7.2. AA

Regression models in which the current value of a time series U, is a function of current and lagged values of a time series X, are sometimes called transfer function models. The models are often written in terms of the backshift operator. Thus, the simple model

U, = PIX, + PZXI - I + el

can be written as

More complicated models may contain ratios of polynomials in the backshift operator. For example,

E: = [ A ( & ? ) I - ’ B ( S B ) X , + e,

where A(&?) = 1 - a, 93-- - B(a) = 6 , - b,93-. - --b,B”, the ci are the coefficients defined by [A(B)]-’B(B), and it is assumed that the mots of A( a) = 0 are greater than one so that the coefficients converge to zero. The set of coefficients (c,,, c,, . . .) is called the impulse response function. Such models fall into the class of nonlinear regression models covered by the theory of Section 5.5, provided X, is reasonably well behaved.

Page 553: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

530 RBCIRESSION. TREND, AND SEASONALITY

9.8 REGRESSION EQUATIONS WITH LAGGED DEPENDENT VARIABLES AND TIME SERIES ERRORS

In the classid regression problem tbe emr in the equation is assumed to be distributed independently of the independent variables in the equation. indepen- dent variables with this property we call ordinary. In ow investigation of the estimation of the autoregressive parameters in Chapter 8, we noted that the autoregressive representation had the appearance of a regression problem, but that the vector of errors was not independent of the vector of lagged values of the time series. A generalization of the autoregressive model is one containing ordinary independent variables and also lagged values of the dependent variable as explanatory variables.

Consider the model

= X,0 f e, (9.8.1)

fOK t = 1 , 2 , ..., where x,=(~,p,), y)tz,... ~Q)lr,q-I,q-~,..,,q-~), e= (&, &, , , . , &, -a -%, . . . , -a,), the ql are fixed constants, and e, are uncorrelated (0, 02y random variables. The characteristic equation associated with equation (9.8.1) is

D

(9.8.2)

Such models are common in economics. See Judge et al. (1985, Chapter 10). Ordinary least squares is a natural way to estimate the parameters of (93.1). Let

denote the least squates estimator. The limiting distribution of the S t 4 W h t U d 8 is normal under mild assumptions.

equation

(9.8.3)

PmPerlY

Theorem 9.8.1. Let U, satisfy equation (9.8.1). where Yo, Y-,, . . . , Y-,+, are fixed. Assume

N e , , e:> I SZ,-~) = (0, r2) a.s.

and

E{lef1*+” I s z t , - , } < ~ < w as.

for all t and some u>O, where d,-, is the sigma-field generated by

Page 554: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION EQUATIONS WITH LAGGED DEPENDENT VARIABLES 531

and

(9.8.5)

Let b be defined by (9.8.3). Then

M:( b - O o ) 5 N(0, I) . Proof. Let q' be an arbitrary (k +p)-dimensional row vector, and &fine

where V l , = Z:=, a:,,. By assumption, -2 2 plims,, v,, = 1

and condition (ii) of Theorem 5.3.4 is satisfied. We have

- I ' where R, = {e: lei> I q'M, X,(-'e} for t = 1,2,. . . , F,(e) is the distribution function of e,, and

- 2 2 p Because s,,V,,, 1 and because the supremum af rhe integrals of e2 over R , goes to zero as n increases, condition (iii) of Theorem 5.3.4 is satisfied. Because q is arbitrary, the joint normality is established. A

In Theorem 9.8.1, conditions on the nature of the difference equation associated with (9.8.1) are imposed t h u g h the assumptions (9.8.4) and (9.8.5). If {@,} is a fixed sequence, whexe a, = (p,,, pr2,. . . , R~), then the existence of a sequence of fixed matrices {Mlln} such that

Page 555: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

532 REGRESSION, TREND, AND SEASONALITY

together with the assumption that the roots of (9.8.2) are less than one in absolute value, is sufficient for (9.8.4) and (9.8.5). See Fuller, Hasza, and Goebel (1981).

The conclusion of Theorem 9.8.1 can be obtained with martingale difference errors or with iid(0, a') errors. The martingale difference property is critical to the proof. If the e, are autwomlated, E{Y;-,e,} may not be zero, in which case the

To obtain consistent estimators for the parameters in the presence of autocorre- lated errors, we use the method of instrumental variables introduced in Section 5.6. The lagged values of O D , ~ are the natural instrumental variables. To simplify the presentation, we consider a model with two independent variables and two lagged values of Y,,

least squares estimator is no longer consistent. I

Y, = /?, ptI + he2 + A, Y,- , + A,<-, + Z, , t = 1,2, . . . , (9.8.6)

where 2, is a stationary time series with properties described in Theorem 9.8.2. There are several methods of instrumental variable estimation. In econometrics,

the two common instrumental variable procedures are called two-stage least squares and limited information maximum likelihood. The two procedures have the same asymptotic efficiency. For the model (9.8.6), the first step of the two-stage procedure is the regression of q-, and K - z on R,. ( p t - , , I , f l - 2 . 1 ~ Y)2,

OD, -,,*, and O D , - ~ , ~ . The predicted values K-, and t-2 are obtained from these regressions, and the instrumental variable estimates are OMained by regressing Y,

To apply an instrumental variable procedure to the model (9.8.6). we assume the matrix of sums of squares and products of ( p f I , pf2, O D , - I , ~ , f l - 2 . 1 , %-1.2,

O D , , - ~ , ~ ) to be nonsingular and to satisfy assumption 1 preceding Theorem 5.6.1. We requin: only two instrumental variables in order to estimate the parameters A, and 4, and some of the variables may be omitted if singularities appear in practice. For example, if Qh = (pa, 9,. p2), where O D , ~ 1 and OD,, = t, then the matrix with rows (1, t, eZ. t - 1, Y ) - ~ , ~ , t -2 , ~)1-~,d will be singular. One can delete variables from the regression to create a nonsingular regression problem. In the examplecaseonecanomit ~ ~ , - , , , = r - l and ~ ) - ~ , , = t - 2 .

In Theorem 9.8.2, we show that the instrumental variable estimator is consistent

on (4Ptl. Y)zt t - 1 9 t - 2 ) .

for the model (9.8.6). The conclusions generalize O D , ~ and any number of lagged Y's.

Theorem 9.83. Let {q: t E (1,2, . . .)} satisfy

immediately to any number of

the model (9.8.6), where m

Page 556: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION EQUATIONS WlTH LAGGED DEPENDENT VARIABLES 533

the bj are absolutely summable, the roots of m2 - A,m - 4 = 0 are less than one in absolute value, (Y- ,, Yo) are bounded in probability, and the e, are independent (0, a') random variables with E(]e,12"} < L2+& for some finite L and some 8 > 0.

Let pfI and cpI2 satisfy the assumptions (9.1.8) and (9.1.9). Let

Q =,'% Q,,

be nonsingular, where

Let 6 = (b, , &, &, &) be the instrumental variable estimator, and let

,= 1

D2,,(6- 8)=0,(1).

Proof. By the difference equation properties of our model, we may write

where U, = 2;:; wjZf,, the wj are defined in terms of A, and 4 by Theorem 2.6.1, and c,, and czf go to zero as t goes to infinity. Equation (9.8.7) gives q- , and q-2 as functions of lagged values of q l and e2 and of a random term U,. By our assumptims on the R~ and since PI and p2 are not both zero, the current and lagged values of R~ satisfy the conditions placed on Q, and 9 in Theorem 5.6.1. Likewise, Z, satisfies the conditions on Z, of Theorem 5.6.1.

The B, of assumption 4 preceding Theorem 5.6.1 is

B, = D;,,?Q,; #)'r(cP; W;.' ,

where the ijtb element of r is r;(li - j l ) , and tbe matrix B = limn+- B, is of the same form as the matrix B of Theorem 9.1.1. The matrices G,, of assumption 4 preceding Theorem 5.6.1 are of a similar form, with the covariance matrix of U,- , , of U, - 2 , or of U,-, with U,-2 replacing that of 2,.

Page 557: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

534 REGRESSION, TREND. AND SEASONALITY

Therefore, the conditions of Theorem 5.6.1 are satisfied, and the order in A probability of the emr in the estimators is given by that theorem.

By Theorem 9.8.2 the instrumental variable estimator is consistent when the enor in the equation is a stationary moving average time series with absolutely sununable coefficients. Although the instrumental variable estimators are not the most efficient estimators, they are not dependent on a specification for the emr process. Hence, they can be used as preliminaq estimators. In some applications, one will use the instrumental variable estimators to obtain information about the emr process. Given the instrumental variable estimates BI, b2, 6,, 4 of the model (9.8.6), we can calculate the estimated residuals

The presence of the lagged Y-values in the equation means that the results of Theorem 9.3.1 do not hold for autoconelations computed from these residuals. However, we are able to obtain a somewhat weaker result.

Theorem 9.83. Given the assumptions of Theorem 9.8.2, the sample covariances calculated from the instrumental variable residuals satisfy

9dh) - 9;(& = Op(rn) T

where

is defined in (9.8.8), rn = max(n-”2, S;’}, and S, is defined in Theorem 9.8.2.

Proof. Using the definition of 2, we have, for h = 0,1,. . . , n - 1,

Page 558: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION EQUATIONS WITH LAGGED DEPENDENT VARIABLES 535

Therefore,

where D,, is defined in T h m m 9.8.2. A

Because the estimated autocorrelations are consistent estimators, the estimated autocorrelations of the residuals can be used to develop a model for the error process. Given an original model such as (9.8.6) and a specified form for the error process, it is natural to estimate the entire model by a nonlinear fitting procedure.

To introduce such models, let Z, be the 6rst order autoregressive process

Zf = PZ,-, i- e, , (9.8.9)

where lpl< 1 and the el are independent (0, a’) random variables. We may then write (9.8.6) as a nonlinear model with lagged values of U, among the explanatory variables, and whose error term e, is a sequence of independent random variables. That is, substituting (9.8.6) into (9.8.9), we obtain

U,=/3,qJ,] +&Pf,+(A1 +P)U,-, + ( A , - P A i ) U , - ,

- Pfli%i,l - P&e)r-1,2 - P M - 3 +el

(say) =fiWf;il)+ef, t=4,5, ..., n , (9.8.10)

where W, = (ql, q2, e)r- l ,19 P ~ - ~ . , , T-, . T-,, U , - 3 ) and 6’ = (4,e2, fl3,e4, 4) = (PI, &, A,, A,, p). Note that we have expanded the vector of parameters to include P.

By Theorem 5.5.3 and its comllaries, the least squares estimator of 6 will be consistent and, when properly standardrze . 4 asymptotically normally distributed under mild assumptions. Note that the model (9.8.10) is of the same form as (9.8. I) with coefficients that satisfy nonlinear restrictions.

&ample 9.8.1. To illustrate the estimation of a made1 containing lagged values of the dependent variable, we use a model studied by Ball and St. Cyr (1966). The data, displayed in Table 9.8.1, are for production and employment in the bricks, pottery, glass, and cement industry of the United Kingdom. The

Page 559: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

536 REGRESSION, TREND. AND SEASDNALITY

TabIe9.8.1. Production Index and Empbynmt of the Urkks, Pottery, Glass, and Cement Industry of the United Kingdom Year- Logarithm of Logarithm of Quarter production Index Employment (OOO)

1955-1 4.615 5.844 2 4.682 5.846 3 4.644 5.852 4 4.727 5.864

1956- 1 4.625 5.855 2 4.700 5.844 3 4.635 5.841 4 4.663 5.838

1957- I 4.625 5.823 2 4.644 5.817 3 4.575 5.823 4 4.635 5.817

1958-1 4.595 5.802 2 4.625 5.790 3 4.564 5.790 4 4.654 5.784

1959-1 4.595 5.778 2 4.700 5.78 1 3 4.654 5.802 4 4.727 5.814

1960-1 4.719 5.820 2 4.787 5.826 3 4.727 5.838 4 4.787 5.841

1961-1 4.771 5.838 2 4.812 5.844 3 4.7% 5.846 4 4.828 5.855

1962- 1 4.762 5.852 2 4.868 5.864 3 4.804 5.864 4 4.844 5.861

1963-1 4.754 5.844 2 4.875 5.835 3 4.860 5.841 4 4.963 5.852

1964-1 4.956 5.849 2 5.004 5.858

SOURCE: Ball and St. Cyr (1966). Reproduced from Review of Economic Studies Vol. 33, No. 3 with rbe permission of the authors and editor.

Page 560: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REGRESSION EQUATIONS WITH LAGGED DEPENDENT VARIABLES 537

quarterly data cover the period 1955-1 to 1944-2. The model studied by Ball and St. Cyr can be written as

Y, + pi ~5 + + &oft f 8 4 0 1 2 + P$f, + AK-1 + Zf 9 (998.111

where Y, = the logarithm of employment in thousands of the bricks, pottery, glass,

qf = the logarithm of the production index of the industry, and cement industry of the United Kingdom,

t = time with 1954-4 as the origin.

1 - 1 0 otherwise.

if observation t occurs in quarter j , if observation t occurs in quarter 4,

We assume Z, is a zero mean stationary time series, PI # 0, and IAl To constntct the instrumental variable estimator, we use the limited information

maximum likelihood procedure SYSLEN of SAS/ETS". In the terminology of that program Y, and K-, are endogenous variables and q, q-,, 2, D1,, DI2, and D,, are instruments. The inswmental variable estimated equation is

%={

1.

8 = 0.27 + 0.077% - 0.00042t - 0.004920,, - 0.003360,, + 0.00728Df, + 0.894K3_, .

The pmgram calculates estimated standard errors under the assumption that the errors are uncorrelated. The estimated standard errors are biased if the emrs are autoconelated. Because we are considering the possibility that the errors are autocoxrelated, we do not present tbe estimated standard errors.

An analysis of the estimated residuals 2, such as that in Table 9.7.2, indicated that the data are consistent with the hypothesis that 2, is a first order autoregressive process with a parameter of about 0.42. To estimate the model with autoregtessive errors, we use the maximum

likelihood option of procedure ARINA of SAS/ETS". We specify the model (9.8.11) with a first order autoregressive error. While the error in the first observation is correlated with the lagged dependent variable for that observation, we chose to include that observation in the fit. Thus, the program is used to maximize a pseudolikelihood that treats Y,-, as an ordinary explanatory variable. The estimated model is

f: = 0.88 + 0.097 - 0.00055 t - 0.00368 D,, (0.41) (0.032) (0.00030) (0.00186)

(0.00181) (0.00162) (0.081) - 0.00432 Dt2 + 0.00734 D,3 + 0.773 K-1

and @=0.386 (0.194). where the numbers in parentheses are the estimated

Page 561: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

538 REGRESSION. l”D, AND SEASONALITY

standard errors, Although this is a fairly small sample with a fairly large number of explanatory variables, the maximum likelihood estimates are similar to the instrumental variable estimates. Because of the small sample and the presence of time among the explanatory variables, it is a reasonable conjectme that the estimator /i possesses a negative bias.

There are two important reasons for recognizing the potential for autocorrelated e m in an analysis such as this one. First, the simple least squares estimators are no longer consistent when the errors are autocorrelated. Second, the estimators of the variances of the estimators may be seriously biased. A hypothesis of some interest in the study of employment was the hypothesis that /3, + A = I. This corresponds to the hypothesis of constant returns to scale for labor input. In the simple least squares analysis the sum of these coefficients is 0.89 with an estimated standard emr of 0.045. The resulting “?-statistic” of -2.5 suggests that there are increasing returns to scale to labor in this industry. On the other hand, the analysis recognizing the possibility of autocorrelated errors estimates the sum as 0.88, but with a standard e m of 0.072. The associated “t-statistic” of -1.7 and the potential for small sample bias in the estimators could lead one to accept the hypothesis that the sum is one. AA

REFERENCES

section 9.1. Anderson (1971). Orenander (1954). Onmder and Rosenblatt (1957), Hannan (1970).

Section 9.2. Ahlberg, Nilson, and Walsh (19671, Fuller (1969). Gallant and Fuller (1973). Greville (19691, Poirier (1973).

W o n 9.3. Bollerelev, Chou, and Kroner (1992), Lhubin and Wamn (1950,1951), Engle (1982), Granger and Teriisvirta (1W3), Hannan (1969). Harvey, Ruiz, and Shephard (1994). Melino and Turnbull (1990). von Neumann (1941, 1942).

Section 9.4. Davis (1941). Grether and Nerlove (1970), Jorgenson (1964), Kendall and Stuart (19661, Love11 (19631, Malinvaud (197Oa). Nerlove (1964). Stephenson and Farr (1972). Tintner (1940, 1952).

Section 9.5. Harvey (1989). Seetion 9.6. Davis (1941), Durbin (1962, 1963), Hannan (1958), Tintner (1940, 1952),

Rao and Tintner (1%3). Section 9.7. Amemiya (1973), Cockan and Orcutt (1949). Durbin (1960). Hildretb (1969).

Malinvaud (197Oa), Rest (1949). Seetion 9.8. Aigner (1971). Amemiya and Fuller (1967). Ball and St. Cyr (1966). Dhynnes

(1971). Fuller and Martin (l%l), Griliches (1967). Hatanalca (1974). Malinvaud (197Oa). Nagamj and Fuller (1991, 1992), Wallis (1%7).

EXERCISES

1. Let the time series X, be defined by

Page 562: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES

where

539

~ , = p Z , - ~ + e , , l p l < I , t=O,+1,_+2, ..., and {e,) is a sequence of independent (0, a') random variables. Compute the covariance matrices of Bs of (9.1.2) and BG of (9.1.3) for p = 0.9 and n = 5 and 10.

2. The data in the accompanying table are the forecasts of the yield of Corn in Illinois released by the United States Department of Agriculture in August of the production year. (a) For this set of data estimate a mean function meeting the following

restrictions: (i) The mean function is h e a r from 1944 through 1956, quadratic from

1956 through 1960, linear from 1960 through 1965, quadratic from 1965 through 1%9, and linear from 1969 through 1974.

(ii) The mean function, considered as a continuous function of time, has a continuous first derivative.

(b) Reestimate the mean function subject to the restriction that the derivative is zero for the last five years. Test the hypothesis that the slope for the last five years is zero.

(c) Plot the data and the estimated mean function. Augnst Foreclrst of Corn YieM for IuInets

Year YieM Year Yield Year Yield

1 944 I 945 1946 1947 1948 1949 1950 1951 1952 1953 1 954

45.5 43.0 55.0 45.0 58.0 61.0 58.0 56.0 56.0 58.0 45.0

1955 1956 1957 1958 1959 1960 1961 1962 1963 1964

58.0 59.0 52.0 64.0 63.0 63.0 73.0 79.0 81.0 85.0

1965 1966 1%7 1968 1 %9 1970 1971 1972 1973 1974

88.0

96.0 103.0 95.0 93.0 92.0 97.0

105.0 86.0

82.0

SOURCE: U.S. Department of Agriculture, Crop Production, various issues.

3. Evaluate (9.3.5) and (9.3.6) for n = 10 and the model of Exercise 9.1. The covariance between Xrm2 Z,Z,-, and X:, I 2: for the &st order autoregressive process can be evaluated using Theorem 6.2.1. Using these approximations and the approximation

Page 563: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

540 REGRESSION, TREWD. AND SEASONALITY

find the approximate expected value of j2(1) for the model and sample sizes of Exercise 9.1.

4. Using a period of three observations, construct a moving average to remove a linear trend from the third observation and hence show that the second diffemce operator is a multiple of the moving average constructed to remove a linear trend from the third observation in a group of three observations.

5. Define XI by

x, = px,-, + @, , lpl< 1 9

where the el are independent (0, cr2) random variables. Find the auto- correlation functions and spectral densities of

Plot the spectral densities for p = 0.5.

6.

7.

8.

9.

10.

Consider the model

where

1 for quarter i (i = 1,2,3,4) , ‘ti ={ 0 otherwise,

4 c ar, =0 , I = ]

and Z, is a stationary time series. Assuming that the model holds for a period of 20 observations (5 years), construct the weights needed to decompose the most recent observation in a set of 20 observations into “trend,” “seasonal,” and “remainder.” Define trend to be & + /3,r + &rz for the tth observation, and s e a ~ 0 ~ 1 for quarter i to be the a, associated with that quarter.

Obtain the conclusion of Proposition 9.6.3 directly by differencing the autoregressive moving average time series.

Prove Lemma 9.4.1.

Prove Lemma 9.4.2.

Using the data of Exercise 2 and the model of part b of that exercise, construct the statistics (9.3.8) and (9.3.9).

Page 564: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 541

11. Assume that the data of Exercise 2 of Chapter 7 satisfy the model A

where I for quarter j ,

41 ={ 0 otherwise,

2, is the stationary autoregressive process z, = e,z,-, + e,z,-, + C, .

and {e,) is a sequence of normal independent (0, a’) random variables. (a) Estimate the parameters of the model. Estimate the standard errors of your

estimators. Do the data support the model for the error time series? (b) Using the model of part a, estimate gasoline consumption for the four

quarters of 1974. Construct 95% confidence intervals for your predictions, using normal theory. Actual consumption during I974 was [2223,2540,25%, 25291. What happened? Computational hint: If a computer program is available that fits regression models with auto- regressive errors, the parameters and predictions can be obtained in one fitting operation. Add four “observations” and four independent variables to the data set. The four added independent variables are defined by

%*5+‘ ={o otherwise -1 for 1974-i,

for i = 1,2,3,4. The four observations on the original data for 1973 and the four added “observations” for 1974 are displayed in the accompany- ing table. The values of the original independent variables for the prediction period m the true values of these independent variables, while the values of Y, for the prediction period are zero. The augmented regression model with autoregressive errors is then estimated. The coefficients for the four added independent variables are the predictions, and the estimated standard enms are appropriate for the predictions.

Independent Dependent Variables for

Year Quarter Variable a,, a,, a,, a,, t Predictions

1973 1 2454 1 0 0 0 53 0 0 0 0 2 2647 0 1 0 0 5 4 0 0 0 0 3 2689 0 0 1 0 55 0 0 0 0 4 2549 0 0 0 1 5 6 0 0 0 0

1974 1 0 1 0 0 0 5 7 - 1 0 0 0 2 0 0 1 0 0 5 8 0 - 1 0 0 3 0 0 0 1 0 5 9 0 0 - 1 0 4 0 0 0 0 1 6 0 0 0 0 - 1

Page 565: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

542 REORESSION, TREND, AND SEASQNALin

12. Using the Prest data of Table 9.7.1, estimate the parameters of the model

U, = P, -t plr + act - 35)' + p3pIl + p464(pf2 + AK-, + Z,, 2, = pZl-, + e, ,

where {e,} is a sequence of independent ( 0 , ~ ~ ) random variables with E{e:}=7p4 and IpI < 1, 1A1< 1.

13. Let U, satisfy

U, = aCq + 2, , Zl=e,fi?e,-,, t=1,2, ...,

where lpl< 1, {el} is a sequence of independent (0, cr2) random variables with E{ef)=rp4, and {qp,} satisfies the assumptions (9.1.8) and (9.1.9). Let

and /.$ is constructed from the first order autocornlation of the (8.3.1) and restricted to (-1,l). (a) Show that

by the lule

/&-)=O,(n-') ,

where

Page 566: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 543

(1 + p p z , , t = 1,

t = 2 , 3 ,..., n ,

and ‘f,. is the tth row of 9. Show that

112 [ i: p:] (4 - a) = op(n-”2), I - 1

where

and

14. Let {Y,} satisfy

v = E{zz‘}.

r I

x = C ~ , f i ~ + . C ~ , x - , + e , , t=1,2, ...*

where (e,) is a sequence of independent (0,a2) random varia~les with E{e:} = v4 and the mots of ms - Z;ci”.; AjmS-j = 0 are less than one in absolute value. State and prove Theorem 9.8.1 for this model.

150 J = 1

15. Add a linear term to the wheat model of Section 9.2, and fit the model with a first order autoregressive emf.

16. Let y1. = (1,4p,, , p,*) = (1, t, t + (- l),). Does this vector satisfy the assump- tions (9.1.8) and (9.1.9) with A, nonsingular? Is there a linear transformation of y3 that satisfies the two assumptions with A,, nonsingular?

17. Let the model (9.7.1) hold with q+, that satisfy the assumptions (9.1.8) and (9.1.9). Show that the conditions of Theorem 5.7.4 are satisfied for (2,) that is a stationary autoregressive moving average with iid(0, a2) errors.

1%. Using the estimates of Example 9.5.1, construct the vector H, to estimate % using (YmW3, Y,-*, . . . , Y,+?, Y,+,). What is the estimated variance of b,,, constructed with the vector?

19. The U.S. wheat yields for 1W and 1993 are 39.4 and 38.3. Are these yields consistent with the model of Example 9.7.27 Fit the model of Example 9.7.2 to the data for 1908 through 1993, and predict yields for 1994, 1995, and 1996. Fit the model in which (9.7.8) is replaced with

Page 567: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

REORESSION. TREND. AND SEASONALITY 544

where

1 s t s 7 0 , 70 G t Q 80,

100+20(t-80), t 3 8 0 .

Use the fitted model to predict yields for 1994, 1995, and 1996.

24. The U.S. wheat yields for 1992 and 1993 are given in Exercise 19 as 39.4 and 38.3, respectively. Fit the structural model of Example 9.5.1 to the data for 1908 through 1993. Predict yields for 1994,1995, and 1996.

21. Let Z, be a stationary time series with absolutely summable covariance function. Show that the assumptions (9.1.8) and (9.1.9) are sufficient for the limit (9.1.7) to exist. Give an example of a 2, and fi such that B of (9.1.7) is singular.

22. Let Z, satisfy the ARCH model of (9.3.21). Show that 2, satisfies the assumptions of Proposition 9.3.3.

23. Consider the model

z, = ez,- I + a, , let < 1 , a, = (k, -t k2a:- I )1’2e, ,

where the e, are M(0,l) random variables and Oak, dO.3. Let aOLs and (k,,,,, i2,0Ls)2denote the least squares estimators obtained by regressing Z, on Z,-l and a on (1, 6t-l), respectively, where 6, = Z, - domZ,-l. Define 6, = ( i1 ,OLs + L2.0Ls6:-1). ~ e t (i,,,Ls, i2,,,,) be the estimator of <kl, k2) obtained by regressing f ; , a , on (if-’, f;:*d:-,). Define =(i,,GLs + L s d :-I ).

Let BOLs denote the estimated generalized least squares estimator obtained by regressing fi;“2Z, on /;~1’z2f-,. (a) Find the asymptotic distribution of

- 1 0 2

n1’2($Ls - e, R,,,, - k,, i2.0Ls - k,)’ .

n1’2[$Ls - e, R,,,, - k,, Rz2.GLs - k2)’ . (b) Find the asymptotic distribution of

(c) Consider the conditional maximum likelihood estimators bL, and lZaML obtained by minimizing

Page 568: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 545

where h, = k, + k,(Z,_, - flZ,-2)2. Find the asympt~tic distribution of n ; e, &,+,, ; k,, &ML - Q'. Compare the limiting distributions 112

of eotsT eGu, a d eML.

Page 569: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

C H A P T E R 10

Unit Root and Explosive Time Series

In Chapter 9, we studied time series in which the mean was a function of time. A second important class of nonstationary time series are those that satisfy nonstationary stochastic difference equations. We begin OUT discussion with least squares estimators for the univariate unit root autoregressive process. Alternative estimation procedures for unit processes are discussed in Section 10.1.3. Tests based on the metbods of Section 10.1.3 have been demonstrated to be more powerful against the unknown mean stationary alternative than tests based on ordinary least squares regression. The remaining sections are devoted to auto- regressive processes with a root greater than one, multivariate unit root auto- regressive processes, and time series with a moving average unit root.

10.1. UNIT ROOT AUTOREGRESSIVE TIME SERIES

10.1.1. The AutoregregPlve procesS with a Unit Root

Consider the first order time series

Y,=pY,-, +e, , t = l , 2 ,..., Y 0 = O ,

( 10.1.1)

where the e, are normal independent (0. a*) random variables. In Section 2.1, we showed that the solution to the difference equation is

1 - 1

U, = 2 1-0

(10.1.2)

One case of particular interest is obtained when p = 1. Then Y, is the sum of t independent (0, a2) random variables. The time series (10.1.1) with p = 1 is sometimes called a random walk.

546

Introduction to Statistical Time Series WAYNE A. FULLER

Copyright 0 1996 by John Wiley & Sons, Inc.

Page 570: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE TIME SeRIEs 547

By the results of Section 8.1, the maximum likelihood estimator of p conditional on Y,, for any real valued p, is

and

(10. I .3)

(10.1.4)

If p = 1, then Y, = Xj-, el and the mean and variance of the numerator and the denominator of (10.1.4) can be computed. We have

= +n(n - I)aZ, (10.1.5)

(10.1 6)

(10.1.7)

Normality was used in obtaining (10.1.6) and (10.1.8). The other results remain true for e, that are iid(0, a').

These results differ in a notable way from those obtained for (pf C 1. When fpl< 1, every entry in the covariance matrix of (x:=~ Y,- le,, ~ y = ~ Y:- is of order n , ~ t h e l i m i t o f t h e e x p e c ~ v ~ u e o f n - ' ~ ~ , , ~ ' isaZ(l-p2)-' . ~fp-1, the variance of the numerator of (10.1.4) is order n instead of order n. Also, the mean of the denominator of (10.1.4) is of order n2, and the variance is order n4. The denominator divided by 0.5n(n - l)az gives a random variable with mean one and variance 4(nz - n + 1)[3(nz - n)]-'. Because the variance converges to 413, the normalized denominator does not converge to a constant in mean square.

The moment results and the fact that Z Yf is a quadratic form in (el, e,, . . . , en) can be used to show that the enur in j3, as an estimator of one, i s O,(n-'>. Thus, for p = 1, the least squares estimator converges in probability to the true value more rapidly than does the estimator for ipi < 1.

i:

Lemma 10.1.1. Let f i be defined by (10.1.3), and assume Y, satisfies (10.1.2) with p = 1, where the e, are normal independent ( 0 , ~ ' ) random variables. Then

Page 571: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

548 UNIT ROOT AND EXPIABIVE TIME SERIES

- 1 = O,(n-').

Proof. The denominator of $ a n be written as n

yf-, = e'Ane , r=2

where e' = (el, e,, . . . , and

n - 1 n - 2 n - 3 . * *

n - 2 n - 2 n - 3 . * . ( 10.1.9)

... The matrix A, can be diagonalized by the (n - 1) X (n - 1) orrhonormal matrix M, whose 0th element is

m,,, = 2(2n - 1)-"2cos[(4n - 2)-'(2j - 1)(2i - 1)vJ. (10.1.10)

Then

M,A,M: = A,, 3 diag(A,,, 4,,, . . , An- ,,,,I *

Where

A ~ , = $sec2[(n - i ) 4 2 n - I)-'] (10.1.11)

for i = 1,2, , , . , n - 1. S e e Rutherford (1946). Hence, the denominator of 3 - 1 can be written as

" - 1 .. . e'A,e = 2 Al,U;, ,

I = I (10.1.12)

where n - 1

uin = c mni,e, s i = 1 , 2 )..., n - 1 , I= 1

and the ZJ,, are NI(0,o') random variables. By the moment results (10.1.5) and (10.1.6)

" - 1

Page 572: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREORESSIVE TlME SERIES 549

If we fix i and take the limit as n -+ 03, we have 2 lim n-'Ain = 7 r EZ 4[(2i - l ) ~ r J - ~

n -w

for i = 1,2,, . . . Therefore, the largest root, A ~ , , increases at the rate n2 as n increases. Given E 2 0,

P{Uf, < 2E).

Because U:,, is a chi-square random variable, given A > 0, there exists an E > 0 such that P{Uin < 2 ~ ) < A. It follows that, given A > 0, there exists a KE-' such that

P{(e'A,e)-'n(n - I ) > KE-'} < A

for all n and, by (10.1.6),

Because n-2 X:G2 Y,-,e, = OJn-'), it follows that a - 1 = Op(n-')+ A

The numerator of - 1 can be written in an alternative informative manner as

Hence, as n + w ,

2(n(r*)-' i: q-le , + t ~ ' x : ,

where ,y: is the chi-square distribution with one degree of freedom. The probability that a onedegree-of-freedom chi-square random variable is less than one is 0.6826. Therefore, because the denominator is always positive, the probability that /i < 1, given p = 1, approaches 0.6826 as n gets large. Although a chi-square distribution is skewed to the right, the high correlation between the numerator and denominator of /i - 1 strongIy dampens the skewness. In fact, the distribution of /i displays skewness to the left.

To establish the limiting distribution of the least squares estimator of p, given that p = 1, we will use the following lemma.

r=2

Lemma 10.12. Let {U,n: 1 s t G n, n 2 1) denote a triangular array of random variables defined on the probability space (a, d, P). Assume

Page 573: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

550 UNIT ROOT AND EXPLOSIVE TIN SElUES

E W , , , u:n, U,,P,,)l= (0, c2, 0)

for 1 G t # s C n. Let {wi: i = 1,2, . . .) be a sequence of real numbers, and let {win: i = 1,2,. . . , n; n = 1,2,. . .) be a triangular m y of real numbers. If

and

l imw, ,=wi , i = l , 2 ,..., n-m

then

I'I i= I

Proof. Let e > 0 be given. Then we can choose an M such that

and

for all n > M. Furthermore, given M, we can choose No > M such that n >No implieS

€ M

(Y2 z (Wi" - wi)2 c I = 1

and

36 1=M+I 9'

Hence, for all n >No,

i= I i - I A

We now give the primary result for the distribution of the least squares estimator of the autoregressive parameter when the true process is a random w a k

Tbeorem 10.1.1. Let

Y,=Y,-,+e,, t = l , 2 ,...,

Page 574: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE TIME SERlES 551

where Yo = 0. Assume {ef}r=l satisfies

E { ( e f , e ~ ) I ~ - . l } = ( ~ , a’) as.,

E{le,(2+dldf-1}<M<~ 8.5.

for some S > 0, where 4 is the sigma-field generated by (el, e,, . . . , er}. Then

n(b - 1 ) P ’ (2G)-’(T2 - 1) ,

where 6 is defined in (10.1.3),

3: = (-l)‘+I 2[(2i - 1)wI-l ,

{Vi}r= I is a sequence of M(0,l) random variables, and (G, T) is defined as a limit in mean square.

Proof, The estimator b is scale invariant. Hence, there is no loss of generality in assuming cr2 = 1. LA

From the definition of p and from (10.1.13). we have

= (2G,,)-’(T,2 - 1 ) + oP( 1) ,

where n-’ X:=, e: = a2 +o,(l) by Corollary 5.3.5.

distribution to (G, T). Let To establish the limit distribution result, we show that (G,,, T,) converges in

U, = Win, U,,, . . . U,,-l.n)t = Mnen I

where en = (el, e,, . . . , en- (10.1.10). Then

and M, is the matrix with (i, j) th element given by

T,, = n-It2 JLe,, = n -112 J,M, I I U, ,

where J,, = (1, 1 , . . . , 1)‘. Let k,,, be the ith element of n-”’J;M,’. The element k,, is the covariance between T,, and Uin. Using this fact and the results of J o k y (1961, p. 78). it can be shown that, for fixed i,

lim kin = 21’27 . n +m

Page 575: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

552 UNIT ROOT AND EXPLlXIVE TIME SERIES

See Dickey (1976, p. 29). Also, 2 Z;",, y: = 1; see Jolley (1961, p. 56). Let

* where A, is defined in ( 10.1.1 1 ). Then Tn - T,, converges in probability to zero by Lemma 10.1.2. Also,

because

The entries in nl1'Mn are bounded, and the sum of squares of each row of M , is one. Therefore, by Corollary 5.3.4, for fixed 1,

3 Win, u,,, * * . * UJ+ (U1, up * - * 9 UJ 9

where (U, , U,, . . . , V, ) - N(0, I). Note that

Because the sequence {y:}Tel is summable, the vector

converges to zero in probability as 1 tends to infinity, uniformly in n. Therefore, by Lemma 6.3.1, (G,,,Tn) converges in distribution to (G,T) and the result is established. A

We gave the proof of Theorem 10.1.1 for e, that are martingale diffkrences. The

The limiting distribution of Theorem 10.1.1 can be obtained by using Theorem result also holds for e, that are iid(0, a2) random variables.

5.3.5 and 5.3.6. By Corollary 5.3.6, with u2 = I ,

Gn, T , ) s (G, T) ,

where

Page 576: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUMRBaRESSIVE TIME SERlEs 553

( 10.1.14)

and W(t) is the standard Wiener process defined in Section 5.3. In Theorem 10.1.1, we assumed Yo = 0 to simplify the presentation. Because

for any finite Yo, the conclusion of Theorem 10.1.1 is true for any finite Yo. The first part of Table 10.A.l contains percentiles for the empirical distribution

of a given that p = 1. Normally distributed errors were used to construct the table for n = 25,50, 100, 250, and 500. The total number of observations available is n, while n - 1 sums of squares and products are used in calculating the regression coefficient. The entries for n = 00 are the percentiles for the limiting distribution of Theorem 10.1.1.

The table may also be used for the distribution of a for p = - 1, since, for symmetrically distributed e,, the distribution for p = - 1 is the mimr image of the distribution for p = 1. If the e, are not symmetrically distributed, the limiting distribution for p = -1 remains the mirror image of the limiting distribution for p = 1.

Corollary 10.1.1.1. Let the model of Theorem 8.5.1 hold. Then

lim (P@ - p > a I p = 1) - P@ - p < -a I p = -I}) = o n-roo

for aU real a. If, in addition, the e, are independently and identically distributed with distribution that is symmetric about zero, then

P { p p > o I p = l } = P @ - p < - u I p = -1)

for all real a and all n 2 2 .

Proof. Let X, = X;:: (-l)’e,-j. In a manner similar to that used to obtain (10.1.13), we have

Therefore, for the estimator of p when p = - 1, P{b + 1 <a} is

Page 577: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

554 UNIT ROOT AND EXPLOSIVE TIME SERIES

Although the sign of e, in the weighted sum t > i, the sign is always opposite of that for

(- l)’eI-j is not the saxne for all and e,,,, and it follows that

If the el m iid(0, a2) and if the distribution of el, t = 1,2, . . . , is symmetric, then the distributional pmperties of the sequence (-el, e2, -e3, . . .) are the same as the distributional properties of the sequence (el, e2, eg, . . .), and we conclude that, for symmetric el and for any a,

!

Also, the large sample distribution of given p = -1 is the mirror image of the distribution for p = 1 for e, that are not symmetric, because the V, have a limiting n d distribution for the sequence (-el, e,, -e3’. . .) as well as for the sequence

A (el, e,, e3,. . .I. A ~ t ~ r a l statistic to use in testing the hypothesis that p = 1 is the test statistic

one would calculate in ordinary linear regression:

where

Cordlarg 10.1.1.2. Let the model of Theorem 10.1.1 hold. Then

+A (4G)-1’2(T2 - 1) p

where + is defined in (10.1.15).

Proof. We have

(10.1.15)

n

Page 578: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE " M E SERIES 555

Therefore, s2& a2 by Corollary 5.3.8. Because

= (4G,)-"'(T: - 1) + aP(l),

we have the result. A

Percentage points for the empirical distribution of ? are given in Table 10.A.2. The percentiles for n = 00 are the percentiles of the distribution of Corollary 10.1.1.2.

To extend the results for the first order process with p = 1 to the pth order autoregressive process, we consider the time series

I v,=zz);, t = l , 2 )..., ( 10.1.16)

where {Zl: t E ( O , r t l , 22, . . . )} is a ( p - 1)st order stationary autoregressive time series with the representation

j = 1

D

2, = c @,-,+, + e, , i-2

(10.1.17)

the e, iue martingale difference random variables with the properties defined in Theorem 10.1.1, and the absolute value of the largest root of

(10.1.18)

is less than some A < 1. By (10.1.16) and (10.1.17), we may write P ~ , + z ajq- ,=e , , t = p + l , p + 2 , ..., (10.1.19)

I" I

wherep- 1 of the roots of P

m p + 2 (rim'-' = o

are thep - 1 roots of (10.1.18). and the remaining root is one. In a problem where a root of one is suspected, it is operationally desirable and theoretically convenient to write the autoregressive equation so that the unit root is isolated as a coefficient. To this end, we write

j 5 1

( 10. I .20)

Page 579: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

556 UNIT ROOT AND ExpLOsNE TIME SERIES

for t = p + l , p + 2 , ..., where p a 2 , 4=Z,P,ia,, i = 2 , 3 ,..., p , and 0,= -Z;=, a,. If there is a unit mot, 0, = 1.

2, on Z,-,, ZJ-2, . . . , Zl-p+l and obtain an estimator of (02, 0,, . . . , 0,). say (6, &. . . , a,), with the properties described in Section 8.2. It is interesting that, given 8, = 1, the regression of U, on Y,- z,-, , . . . , Z, yields an estimator of el, say a,, such that the large sample distribution of nc(8, - 1) is the same as that of n(b - l), where b is the estimator (10.1.3) and c is a constant defined in Theorem 10.1.2. The large sample distribution of ( 4 , &, . . . 6,) is the same as that of the regression coefficients in the regression of 2, on q-,, Zt-2, . , . , Z,-,+,.

If one knew that 0, = 1, one would regress U, - Y,-l

Theorem 10.1.2. Let U, be &hed b (10.1.16). Suppose the el satisfy the assumptions of -em 10.1.1. k t d'=(#l, $ , . . ., 8,)' be the vector of regression coefficients obtained by regressing U, on U,- , , 2,- . . . , Z,-,+ ,, t = p + 1, p + 2,. . . ,n, and define D,, to be the diagonal matrix

I I2 D,, = diag{n, n"', . . . , n } . &fine = z;*,, wi = (1 - zTm2 ql, WIWE

and the w, satisfy the homogeneous difference equation with characteristic equation (10.1.18) and initial conditions wo = 1 and w, = 0 for i C 0. Let

where W, = Zj=lej, and (a2, a,, . . . , a,)' is the vector of regression coefficients obtained by regressing Z, on ZJ.-l, Zl -2 , . . . ,z,-p+l. Then

plim D,( d - #) = 0. n-+m

112 Furthermore, n(d1 - 1) is independent of n (0, - 4,. . . , 8, - 8,) in the limit.

hof. Using the definition of 2, as the weighted sum of past el, we can write

I / m \ m i + l

= cW, + Q, + R , ,

where Q, =xj:;=I gjet-j-t-1, gj = -XT,, wt, and R, = XT.o e-] Z::;+, w,. Since the absolute value of the largest root of ( 10.1.18) is less than A C 1, by Exercise 2.24, lw,l is less than MIAi and lgjl is less than M2Aj for some finite MI and M2. It

Page 580: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE TIME SERIES 557

follows that E(Q,Q,,,} and E{ZfZf+,} are bounded by a multiple of Also, the expectations of qQj, KZ,, n-'Y,q, and R: are bounded. Using these results, it can be shown that

(10.1.21)

and

where C, is the (p - 1) X (p - 1) matrix of sums of squares and cross products of Zf-I, ZfV2,. . . , Z,-,+,. The lower right (p - 1) X (p - 1) submatrix of A, is C,,. Furthermore, the ijth element of n-'C, is converging in probability to ~ ( i - j ) . By the convergence results for n-'X Wf-l, we have D,'B,,D,' = OJ1) and D,B;'D, = o~(I). Therefore, using equation (10.1.21),

D,(A;' - B;')D, = q i ) .

Similarly,

n n

. . . . r=p+l 1 = p + l

and the probability limit result is established. From (10.1.13).

n-' f: W,-,e, = 0.J(n"2e')2 - 0 . 5 ~ ~ + Op(n-1'2).

Let U.,,,=CV,,,U,,,, ..., U,,X X I = ( Z , - I . Z,-.2 .... . Z f - p + l ) . and A = E O F , h where Ui, is defined in (10.1.12). Now

, - p + l

Page 581: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AND EXPLOSIW TIME SERlEs

wbere mnit is defined in (10.1.10) and 6 , I , . . . , blr, b,, b3, , . . . , b3,p- are fixed constants, satisfies the assumptions of Theorem 5.3.4. Therefore,

for every fixed 1. It follows, by Lemma 6.3.1, that

converges in distribution to (G, T, g'), where (G,,, T,) is defined in the proof of Theorem 10.1.1, f is a N(0, Ia2) random variable, and (G, T ) is independent of 5. A

Corollary 10.131. Let the assumptions of Theorem 10.1.2 hold. Let

Then

piim [ K J ~ - 6 ) - kn(@ - e)j = 0 , n+m

where y, c, d, and 6 are defined in Theorem 10.1.1.

Proof. By the proof of Theorem 10.1.2,

by 'Theorem 6.3.5, the result follows. A

On the basis of Corollary 10.1.2.1, Table 10.A.2 can be used to investigate the

Page 582: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOReGRESSIVE TIME SERIES 559

hypothesis that one of the mots in a pth order autoremsive process is unity. That is, the pivotal statistic for 8, computed by an ordinary least squares program,

9= [q{dl)]-’/2(J, - I ) , (10.1.22)

where O{dl> is the ordinary least squares estimator of the variance of J,, has the limiting distribution given in Corollary 10.1.1.2. Also, the regression pivotals can be used to test hypotheses about the mfficients of the stationary process. For example, thc hypothesis that the original process is of order p - 1 is equivalent to the hypothesis that ap = 0, which is equivalent to the hypothesis that 19~ = 0. A test statistic for this hypothesis is the ordinary regression pivotal statistic

rp = [ O { t @ - ” z ~ p ,

where O{ap} is the ordinary least squares estimator of the variance of JP. Under the assumptions of Theorem 10.1.2, tp is asymptotically normal when 8, = 0.

We note that the coefficient for Y,- in the regression of Y, on Y,- l , Z,- ,, . . . , Z,-p+l is not the largest root of the fitted autoregressive equation. To see this, consider the second order process. The fitted equation is

= (A, + - fild12Y,-z,

where (d,, hill) are the two roots of the estimated characteristic equation. It follows that the estimated equation in Y,-l and Z,-l is

= [Al + h,(l - dl ) ]q- , + h,A2(q-1 - q-,) * Also, the error in the coefficient of q-, as an estimatur of one is

Because c = (1 - rn2)-’. the limiting distribution of n(hl - 1) is the same as the limiting distribution of n(j3 - 1) given in Table 10.A.l. We state the generalization of this property as a corollary.

Corollary 10.13.2. Assume that the model ( 10.1.16)-( 10.1.17) holds, and express Y, in the form (10.1,19), where one of the roots of

P

mp + C aim’-’ = o

is equal to one and the other roots are less than one in absolute value. Let &’ = ( G1, 4, . . . , $) be the least squares estimator of a estimated subject to the restriction that & is real, where (fi,, f i 2 , . . . , riip) are the roots of the estimated characteristic equation with rii, a lliizl 3 - - - B IriiJ. Then

i = l

Page 583: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

560 UNIT ROOT AND EXPLOSIVE TIME SERIES

n(fii, - 1 ) 3 (2C)3T2 - 1), where (G, T) is defined in Theorem 10.1.1.

Proof. By (10.1.16) and (10.1.17). we have P P - 1

mp + simp-' = (m - 1) i = l

aI - 1 =Op(n-') , and 4- 0, =Op(n-"2) for i = 2 , . . . , p , where the 4, i = 1,2,. . . , p, are the regression coefficients of Theorem 10.12. Therefore, by Corollary 5.8.1, the roots of

P

m p + x & m p - t = o , 1=1

where the relation between and is defined by (10.1.20), converge in probability to the roots of the equation defined with (a1,. . . , a p ) . The estimated polynomial for the ordinary least squares estimator evaluated at m = 1 is

Using the fact that 8, = -XTeI hi, and the limiting result for the roots,

This equality also demonstrates that given c > 0, there is some N, such that for n > N,, the probabiiity that h computed with the ordinary least squares estimator of (I; is re!al is greater than 1 - e. The conclusion follows because we demon- strated in the proof of Theorem 10.1.2 that

A

For the model (10.1.1) with lpl< 1, the limiting behavior of the estimator of p is the same whether the mean is known or estimated. The result is no longer true when p = 1. Consider the estimator

(10.1.23)

Page 584: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREaRBSSIVE TIME SERIES 561

where n

[Y(o),Y(-I)l= (n - 1 1 - I c (q, q - , , . r=2

When p = 1, we have (3-1n)-1’2j7 4 N(0,l) and this random variable makes a contribution to the limiting distribution of n(b, - 1). Also see the moment results of Exercise 10.4. The limit random variable associated with 9 is denoted by H in Theorem 10.1.3.

Theorem 10.13. Let the assumptions of Theorem 10.1.1 hold. Let b, be defined by (10.1.23), and let

7; = [P@,}1-”2(4 - 1) . (10.1.24)

Where

and

Then

and

P 2 112 ?,,d (G - H )- [0.5(T2 - 1) - TH] ,

where

iJ, -NI(O, l), W(t) is the Wiener process, and G, T, and 10.1.1.

are defined in Theorem

Prod. The method of proof is the same as that of Theorem 10.1.1. Let

and define U, and Mn as in the proof of Theorem 10.1.1. Then

Page 585: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

562 UNIT ROOT A M , EXPLOSIVB TIME S E W

-312 whereL:=n (n- l ,n-2, ..., l).Letql,, betheitheIementofL:M,'.This element is the covariance between H, and U,, and it can be shown that for fixed i

112 2 limq,,=2 Y i n - w

The maainder of the proof follows that of Theorem 10.1.1. A

The second part of Table 10.A.1 contains empirical percentiles for the distribution of n(b, - l), and the second part of Table 10.A.2 contains the empirical percentiles of the corresponding studentized statistic given that p = 1. The values for n = 00 are tbe percentiles of the Iimiting distribution of Theorem 10.1.3.

It can be demonstrated that plimn(fi, - /i) = 0 when p = -1. That is, estimating the mean does not alter the limiting distribution when p = -1. Therefore, the first part of Tables 1O.A.1 and 10.A.2 can be used to approximate the distributions of 6, and 7- when p = -1.

The distributions of Theorem 10.1.3 are obtained under the model (10.1.1) with p = 1. This point can be emphasized by writing an extended made1 as

U, = (1 - p)p + pY,-] + e, ,

where p and p are unknown parameters. The null model for the test 7 j is p = 1. The extended model then reduces to (10.1.1). The alternative model with p # 1 permits a nonzero value for 6, = (1 - p)p. Thus, the test based on ?& is invariant to the mean p of the alternative model.

The limiting distribution of n3"(8, - I), where dl is the bast squares estimator and 4, = 1, for the model

Y,=B,+61Y,-, +e,

with 0, # 0 is normal, and is discussed in Section 10.1.2. It follows from Theorem 10.1.3 that the distribution of the ordinary least

squares estimator of 6, for the model (10.1.1) with p = 1 is not normal. Let AY, = Y, - Y, - and consider the equation

AU, = 6, + <el - l)U,-' f e, , (10.1.25)

with the unknown e, equal to zero. The estimated regression equation can be written

where y(-ll is defined in (10.1,23), AP- (n - l)-' Xy-2 AX, and

Page 586: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE TIME SERIES

Under the null model, AY, = e, and AT= E. It follows that

io = B - (8, - l)y(-l)

and

n 1 ’ 2 d , , z T - (G - H2)-’[0.5(TZ - 1) - TH]H,

563

(10.1.27)

where G, T, and H are defined in Theorem 10.1.1 and Theorem 10.1.3. Dickey (1976) and Dickey and Fuller (1981) have discussed the distribution of nI’z80. Under the normal distribution null model, Ahp of (10.1.26) is distributed as a normal random variable. Also, because il is a quadratic function of (Y,, . . . , Y,), AT is uncorrelated with dl for normal e,. Therefore, estimating the equation in the form ( 10.1.26) yields two uncorrelated estimators whose null distributions are easily interpreted.

The distributional results for the estimator of the parameters of the first order autoregressive model with intercept extend to the pth order process in much the same manna as the results for the model with no intercept. It follows from Theorem 10.1.4 that the regredon pivotal calculated to test 6, = 1 has the tabled distribution ?* when the fitted made1 has an intercept.

Theorem 10.1.4. Let Y, be defined by (10.1.16). Suppose the assumptions of Theorem 10.1.2 hold Let 8= (4, Jl, d2, . . . , i,,)’ be the vector of regression coefficients obtained by regressing Y, on (1, Y,-l, Z,-.l,. . . , Z,-p+l 1. t = p i- 1, p + 2,. . . , n. Define Dn to be the diagonal matrix

D, = diag(nIt2, n,n1’2,n1’2, . . . , n I I 2 ),

and let 8- 8 be the vector such that #o = $o;

- 1 n-1 where W, and c are defined in Theorem 10.1.2, and = (n - 1) Z,,, 5, and ( g2, &, , . . ,8,) is the vector of regression coefficients obtained by regressing 2, on (Z,- I , Z, - , . . . , Z, -,,+, ). Let 1;T be the ordinary regression pivotal computed by dividing 4 - 1 by the regression estimated standard error of

plimDn(8- 8 ) = 0 ,

Then

n-*m

n(J1 - 8,) is independent of n”*[i2 - 0,. . . . , JP - 6,J in the limit, and the limiting distribution of T; is the limiting distribution of ?,, given in Theorem 10.1.3.

Page 587: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

564 UNIT ROOT AND EXptoSIVE TIME SERES

PmP. omitfed. A

There has been extensive research on models with unit roots. We give only a few references. Hasza and Fuller (1982) studied models with two unit roots. Dickey, Hasza, and Fuller (1984) gave results for seasonal models, and Tiao and Tsay (1983) gave results for models with complex mots. Said and Dickey (1984) show that fitting a high order autoregressive model is an appropriate way to test for a unit root in a model of unknown form. Chan and Wei (1988) and Pantula (1989) give results that cover models with a number of unit roats. Willips (1986, 1987a-c, 1988) and his co-workers have given many results on both univariate and vector processes. Shin (1990) and Yap and Reinsel (1995a,b) give tests for a unit root in the autoregressive portion of an autoregressive moving average. Ahtola and Tiao (1984). Chan (1988), Chan and Wei (1987), and Phillips (198%) have studied the behavior of estimators when p is close to one but not equal to one.

Example 10.1.1. In this example we analyze the one-year treasury bill interest rate for the period January 1960 through August 1979. This is one of three interest rate series studied by Stock and Watson (1988). The data are given in Table 10.B.1 of Appendix 10.B. We postulate the time series to be an auto- regressive process, and we are interested in testing the hypothesis that the process has a unit root. In this example, we use the ordinary least squares method of fitting. If we fit a sixth order autoregressive pmess, the fitted equation is

t= 0.104 + 1.332 Y,-, - 0.453 Y , - p + 0.127 FY3 (0.064) (0.067) (0.1 11) (0.115)

+ 0.043 q - 4 - 0.079 5 - 5 + 0.014 q - 6 (0.1 15) (0.1 11) (0.067)

and the regression residual mean square is 0.0833. Tbere are 230 observations in the regression, and the residual mean square bas 223 degrees of freedom. The coefficients for yt-3, yt-,,, x+ , and Y,-6 am small relative to the least squares standard errors. If we drop Te6 from the regression, the t-statistic for yt-s is - 1.036. If we drop x-.5 f m the regression, the t-statistic for Y,-4 is -0.884, and if we drop xa4 from the regression, the t-statistic for yt-3 is 1.427, By Theorem 8.2.1, the t-statistics are approximately normally distributed for a stationary process. By Theorem 10.1.4, the t-statistics are appximately normally distributed if the process contains a single unit root. Thus, the testing procedure is appropriate in either case. The F-test for the coefficients of qq4, q-,, and q - 6 , computed as the sum of the squared t-statistics divided by 3, is 0.63. We compute the F-statistic in this way because different numbers of observations were used in the different regressions. The value of F is such that one easily accepts the hypothesis that the three coefficients are zero. The t-statistic for the coefficient of q-3 is such that one could either drop the coefficient or retain it in the analysis. We choose to retain the coefficient, and we proceed with the analysis using the third order model.

Page 588: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUWReORESSIVE TlME SERlES 565

We are interested in testing for a unit mot against the alternative of a stationary process, but we are willing to entertain the possibility of two unit roots. Dickey and Pantula (1987) suggest a sequential testing procedure in such situations. One begins with the greatest numbex of unit mots one is willing to consider. In our case, we begin by testing for a second unit root under the maintained hypothesis of a single unit root. We are willing to assume there is no intercept in the model for the differences. The estimated regression is

where the numbers in parentheses are the regression estimated standard errom. Because the T-statistic +=(0.076)-’(-0.763) = -10.09 is less than the 1% tabular value of -2.58 in the first part of Table 10.A.2, we reject the hypothesis of a second unit root.

The regression equation for the test of a unit root in the original time series is

A t = 0.082 - 0.012 &-, + 0.343 A q - , - 0.094 Aq-, . (0.062) (0.01 1) (0.065) (0.066)

Because the +- statistic of -1.09 for the coefficient of &-, is greater than the 10% tabular value of -2.57 from the second part of Table 10.A.2, we would accept the hypothesis of a unit root at that level. The analysis of these data is extended in Example 10.1.2. AA

10.1.2. Rmdom Walk with Drift

In this section, we obtain the limiting distribution of the least squares estimator for (O,, 8,) of the model

eo+el&-,+e, , t = 1 , 2 ,..., &’{Yo, t = O , ( 10.1.28)

under the assumption that 8, f 0 and 8, = 1. The made1 (10.1.28) with 8, # 0 is sometimes called a random walk with d$t, where 8, is the drift panuneter. Under ( 10.1 28)

y, = Y, + eot + w, , (10.1.29)

where = Z;=l e,. When 0, # 0, the time trend wiU. dominate the long-run behavior of Y, of (10.1.29). This is because W, = QP(t1”) and hence W , is small in probability relative to t. The least squares estimator of 6 = (t90, 8,)’ is 4 = (do, J,)’, where

Page 589: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

566 UNJT ROOT AND I 'Spu)sIVE TIME SERIES

$0 = Y(0) - #lY(-l) (10.1.30)

y ( , ) = ( n - l)-I 2 y,+, , i = O , -1 . I -2

We give the limiting distribution of the least squares estimator under the assumption that O0 # 0 and el = 1.

Theorem 10.13. Let the process Y, satisfy (10.1.28) where {e,) satisfies the conditions of Theorem 10.1.1 or is a sequence of independent identically distributed (0, a') random variables. Assume 4 ZO and 0, = 1. Let the least squares estimator be given by (10.1.30). Then

Proot. We have

and

where T = 0.5(n + 1) and = (n - l ) - ' Z:-2 K-l. By our previous results, n

Similarly,

and

Page 590: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE TIME SERIES 567

nl"($ - N(0, 4v2) .

The joint limiting distribution follows by Theorem 6.3.4. A

The regression estimated equation for the model (10.1.28) can be written in the form

AU,=AF+(dl - l)(U,-] -y(-,,), (10.1.3 1)

i n t r o d d in (10.1.26). Tbis form has the advantage that AF is asymptotically normally distributed with mean 8, for all 8, when @, = 1.

A time series of the form (10.1.28) with 8, # 0 will tend, over time, to increase if 6, > 0 and to decrease if 0, < 0. A natural alternative model for a time series displaying such behavior is as the sum of a stationary time series and a time trend. Let

U, = 4) + 4,t +X,, (10.1+32)

where X, is an autoregressive time series satisfying

By substituting (10.1.33) into (10.1.32). we obtain

P

U, = e, I- ~lr t + el?-, + C gcv,,+, - q,) + e, , ( 1 0 . 1 . ~ j = 2

where I&, = Qio( 1 - 8,) + 0, q5, - 4, ZiS;, tj and el = c#+( 1 - 8,). Thus, one way to investigate the hypothesis that U, contains a unit root is to fit the regression model (10.1.34) by ordinary least squares and test the hypothesis that 8, = 1. The distribution of the test statistic differs fiwm that obtained when time is not included in the regression.

Theorem 10.1.6. Let Y, satisfy (10.1.34) where {eI}rPI is such that

E{(e,, I &,-,I = (0, a2) B.s.,

~ { l e , l ~ + ' J ~ - , } < ~ < ~ a.s.

for some 6 > 0 and t = 1,2, . . . , where dI is the sigma-field generated by (e,,e,, . . . , e,). Assume 89= I .

Let ($,, I$,, 6,. 4,. . . , ep) be the ordinary least squares regression coefficients, and let

Page 591: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

568 UNIT ROOT AND EXXPUISIVE TIME SERIES

be the ordinary r e p s i o n pivotal for 6,. Then

d ?T+ OS(G - H 2 - 3K2)-L'2[(T - W ) ( T - 6K) - I] ,

where

W(f) is the Wiener process, and T, G, H, x , snd U, are defined in Theorem 10.1.1 and Theorem 10.1.3. Also, the limiting distribution of

n112[(d2 - e;), (83 - e;), . . . , (4 - e;)i is the same as the limiting distribution of the least squares estimator obtained by regressing AK on . . ,

Proof. Omitted, See Dickey and Fulier (1979). A

In (10.1.34). we expressed Y, as a function of the vector (1, t, K-,, AT-, , . . . , AK-,,+,) and computed the estimates by ordinary least squms. A second estimation procedure that has been heavily used in practice is to adjust the observations by subtracting the mean functions as estimated by ordinary least squares. The conclusions of Theorem 10.1.4 and of Theorem 10.1.6 also hold for the coefficients obtained in the regression of y, on ( Y , - ~ , Ay,-, , . . . , Ay,-J where y, is the deviation of & from the estimated mean function.

10.13. Alternative Estimators In the stationary case, we presented a number of alternative estimators of the parameters of the autoregressive process. Tbe maximum likelihood estimator associated with (8.1,9), the least squares estimator (8.2.6), and the weighted symmetric estimator associated with (8.2.14) all have the same limiting dis- tribution when is a stationary autoregressive process. However, the limiting distributions differ if the pmcess has a unit root.

We begin our discussion with the first order model and let

+e,, t=2,3, ..., t ' 0 , (10.1.35)

where the e, are independent (0, a') random variables. The simple symmetric estimator for p minimizes (8.2.14) with w, ss 0.5 and is

( 10.1.36)

Page 592: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREOReSSlVE TIME SERIES 569

The error in the simple symmetric estimator is

(10.1.37)

and if p = 1, the erzo~ reduces to - I n

p$ - 1 = - [ <Y," - Y :I + 2 i: Y:- c e; , (10.1.38) t -2 r=2

where we have used (10.1.13).

Q, is (8.2.14) with w,=O5. Then An estimator of u2 associated with (10.1.36) is 5: = (n - 2)-'Qs(pS), where

If this estimator of variane is used to construct a pivotal statistic, the pivotal reduces to a function of A. Letting

tF.,2[0.5(Y: + Y,') + 2 Y:]-' (10.1.40) 1=2

we have

.i, = -(n - 2y2(1 + pJ-L12(1 - p y 2 . (10.1.41)

The estimator constructed with the weights of (8.2.15) is called the weighted symmetric estimator and can be written as

- I n

a - ( 10.1.42) r-2

The error in the weighted symmetric estimator when p = 1 is

(10.1.43)

The pivotal statistic for the weighted symmetric estimator is

- 1 -112

rw= A [ - 2 p 1 Y : + n - l e Y : ) cr, ] ( f i W - - 1 ) , (10.1.44) 1-2 1-1

where 8; is the estimator of u2 constructed by dividing Qw(pw) by n - 2. The

Page 593: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

570 UNlT ROOT AND W O S I V E TIME SERIES

Iimiting distributions of the symmetric estimators follow from (10.1.38) and ( 10.1.43).

Theorem 10.1.7. Let the assumptions of T h m m 10.1.1 hold, and let /?, be by (10.1.42), and ?w by (10.1.44). Then defined by (l0.1.36), .i, by (10.1.40),

P n(P, - I)--+ -(2G)-’, Y .i,+ -0.5G-’”,

n(&, - 1)-% 0.5G-’(T2 - 1) - 1 , P

?w--+ 0.SG-”2(TZ - 1) -

where G and Tare defined in ~ m m 10.1.1.

Proof. From (10.1.38) we have

= -0.5[ n-2 .:I - l a 2 + OJl) ,

and the first result follows from Theorem 10.1.1. The limiting distribution for 7: is a consequence of (10.1.41).

Using (10.1.43), and the limiting distributions of n-’X;=., 1 , and of n-’ 2rS2 Y,-,e, derived in Theorem 10.1.1, we obtain the results for n(& - 1) and eW. A

The first part of Table 10.A.3 contains percentiles of the distribution of t, and the first part of Table 10.A.4 contains the percentiles of the distribution of C. The percentiles were generated using the Monte Carlo method. The percentiles for the limiting distributions were constructed by approximating the infinite sums that define (G, T, H, K) with finite sums. The raw Monte Carlo percentiles were smoothed using a function of 1 2 - I . The standard errors of the estimates of Table 10.A.3 are generally less than 0.01, and those of Table 10.A.4 are generally less than 0.007.

- 1 is always negative, and this is reflected in the percentiles of Table 10.A.3. The percentiles of n(& - 1) are defined in tern of those of Table 10.A.3 by quation (10.1.41). The distribution of fis has somewhat smaller tails than the distribution of the ordinary least squares estimator. For example, the distance between the 0.025 and 0.975 percentiles of the limiting distribution is 11.03 for rii, and is 12.10 for b. About 10% of the values of pw are greater than one, and the 90th percentile of ?w is close to zero.

The difference

Page 594: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNlT ROOT AUTOREORESSIVE TIME SERIES 571

The distributions of the symmetric estimators are a W o n of other variables included in the regression model, as is the distribution of the ordinary least squares estimator. Let xk be an estimated “mean function” for the time series Y,. Examples of j;, arr: the sample mean

the linear regression trend function

fy ,2 = b, + 6 , f , (10.1.45)

where (b,, b , ) is the vector of regression coefficients obtained in the ordinary least squares regression of U, on (1, t), and the quadratic regression trend function

fYg3 = 60 + b,t + b2t 2 . where (bo, b, , b2) is the vector of regression coefficients obtained in the ordinary least squares regression of Y, on (1, t, t2).

The limiting distributions of the .r-statistics am given in Theorem 10.1.8. The limiting distributions of n(Pg1 - I), g = s, w and j = 1,2, are obtained by squaring the denominators in the expressions. For example, the limiting distribution of n(& - I ) is the distribution of -0.5(G - H2)-’,

Theorem 10.1.8. Let the assumptions of Theorem 10.1.7 hold. Let Ps, be the estimator obtained by replacing U, with y( - f y , j in (10.1.36), and Iet tj be the pivotal obtained by replacing Y, with U, - f y f j in (10.1.40). Let &, and gW, be constmcted in an analogous manner using (10.1.42) and (10.1.44). respectively. Then

Y 2 -112 PSI+ -0S(G - H ) ?s2+ -0.5(G - H 2 - 3KZ)-*”,

?wl + (G - H 2 ) - r’2[0.5(T2 - 1) - TH - G 4- 2 H 2 ] ,

, y:

X

Y 0.5[(T - 2HXT - 6K) - 11 f (H - 3K)* - (G - H 2 - 3K2) (G - H 2 - 3K2)’I2 Pw2- 9

where G, T, H, and K are defined in Theorem 10.1.6.

Proop. The limiting distributions of the simple symmetric test statistics are obtained by replacing Y, with Y, - f y , j in (10.1.45), where - 1 is given in (10.1.38), and the limiting distributions of the weighted symmetric test statistics are obtained by replacing Y, with U, - f y t j in (10.1.44), where p,,, - f is given in

Page 595: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

572 UNIT ROOT Ah: i EXPLQSIVE TIME SERIES

( 10.1.43). We have

"he limiting distributions of the test statistics follow from the joint Wting distribution of the components. A

The distributions of the pivotal statistics for the mean adjusted case are given in the second parts of Tables lO.A.3 and 10.A.4. The distributions for the linear trend adjusted case are given in the third parts of the tables. In addition, the distribution for the simple symmetric pivotal is given for the quadratic trend adjusted case.

The distributions of the symmetric estimators generalize to higher order processes in the same manner as the ordinary least squares estimators. Table 10.1.1 contains a data arrangement that can be used for the weighted regression estimation of the autoregressive model written as

where 2, = yt - Yl-t. If the model contains a mean function, the procedure is adjusted as described in Section 8.2.2. The hypothesis of a unit mot is tested by testing the hypothesis that @, = 1.

Table 10.1.1. Data Arrangement for Regresasron Eeumntlon 02 Antorrgresgfve Pprame- ters by the Weighted Symmetric Procedure

parameter Weight Dependent

B,

y 2 - y, y 3 - y2

... Variable 81 82

... y, - Yp-1 WP' l Yp, , YP

w p + 2 u,+, Y P + l y,+, - y, ...

Page 596: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNm ROOT AUTOREOReSSIVE TIME SERIES 573

Theorem 10.1.9. Let the assumptions of Theorem 10.1.2 hold. Let w,, 0.5, and let wf, be defined in (8.2.15). Let t& be the value of 8 that minimizes

2 i: wtg(Y1 , - 8 l Y f - l . j - e z z , - 1 , j - * * * - 8pZt -p + 1, j 1 f = p + l

pI.oof. omitted. A

The estimator of p for the autoregressive process (10.1.35) constructed by maximizing the n o d stationary likelihood has a different limiting distribution than either that of ordinary least squares or those of the symmetric estimators when p = 1. The limiting distribution of that estimator, which we call the maximum likelihood estimator, has been given by Gonzalez-Farias (1992).

Theorem 10.1.10. Let the assumptions of Theorem 10.1.7 hold. Let h,,, be the value of p that maxitnizes

L(Y; p, u') = -0.k log 2~ - 0.5n log u2 + 0.5 log( 1 - p 2 )

- ( 2 ~ ~ ) - ~ [ ( 1 - p ' ) Y ~ + ~ 1=2 (V, -pV,- , ) ' ] . (10.1.48)

Then

n(h,,, - 1 ) L 0.25G-'{(T2 - 1) - [(T2 - 1)' + 8G]"2},

where G and T are defined in Theorem 10.1.1.

Proof. From .(8.1.10), the cubic equation defining ( p - 1) = 6 can be written

&(a) = a3 + (3 + C,)S2 + (3 + 2c, + c2)6 + (1 + c, + c2 + cg) = 0, (10.1.49)

where c, = -fi, c3 = (n - 2)-'np,

Page 597: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

574

and

UNIT ROOT AND EXPLOSIVE TIME SERlBs

We observe that

3 +c, = 3 - p = 2 + Op(n-'),

3 + 2c' + c2 = -2(8 - 1 + n - * ) + op(n-2), 1 + c1 +c, + cj = 2n-'(p- I + n - ' ) - 6 + ~,(n-~). I

It follows that the roots of (10.1.49) converge in probability to 0, 0, and -2. Hence, the root in (-2.0) is converging in probability to zero. Also,

n2+5 G - ' T , ,

n(@ - 1 ) A G-'[0.5(T2 - 1) - GI , a

nZ(l + C, + C, + c 3 ) d -G-' . Let

g,@) = + a',& + a2, *

whereyaon = 3 + cl, uln = 3 + 2c, + c,, and a,, = 1 + c1 + C, + c3. Because n2a2,,+ -G-', tbe roots of g,(S)=O itre real and of opposite. sign with probability approaching one as A increases. Letting S, denote tfie negative root of g,(S) = 0, we have

nS, = n(&,n) - ' [ -a ,n - - h o n a 2 n ) 1 ' 2 ~

=0.5n(/?- 1 +n-' - [ ( p - 1 -n- ' ) , + 26- 4n - 2 1 112 )+ O,(n-')

-A 0.25G-'{(T2 - 1) - [(T' - 1)2 + 8G]"2}. (10.1.50)

Also,

and

Page 598: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREORESSIVE TIME SERIES 575

for n large, 8,,, C 4 for n large. It follows that

-nb + [n2b2 - 4n28~(2u,, + 6S*)]”’ a + 0, 2(2q),n + G*) n ( a m - 4.) =

where b = h,,& + u , ~ ~ + 38;. Therefore, the limiting distribution of n8, is the A same as the limiting distribution of n&.

From (10.1.50), we see that the value of li, - 1 is close to the value of bw - 1 unless jiw - 1 is positive, or unless j5, - 1 is negative and 2(n2fi - 2) is large relative to pw - 1. The limiting distribution of 2(n2;7 - 2) is the distribution of 2G-’(T2 - 2G), where E{T2 - 2G) = 0. Empirical studies show that the 5% and 10% points of the distribution of n(& - 1) are very similar to those of the distribution of n(& - 1).

If p is unknown and the estimator of (p, p, a’) is obtained by maximizing the likelihood (8.1.5), the limiting distribution of the estimator of p differs from that of Theorem 10.1.10. Let &, denote the estimator of p that maximizes the likelihood (8.1.5). If Y, satisfies the assumptions of Theorem 10.1.7, Gonzalez- Farias (1992) has shown that

9 4 P m l - 1 1 4 l 9

where 6 is the negative root of

a , ~ “ + a313 + a212 + a , 5 + a. = 0,

[a,, a3, a,] = [ -4, -T2 + 1 - 2TH + H 2 - 8(G - H’), 2(G - H 2 ) ] , -4T2 f 8 + 8TH - 8 H 2 4- 2(T - W)’ , Q ,

a,= 8(G -H2) t 4 T - 5 - 8TH + H2, and G , H , and T are defined in Theorem 10.1.6. Gonzalez-Farias (1992) also showed empirically that the limiting distribution of the estimator of p that maximizes (8.1.5) is nearly indistinguishable from the limiting distribution of the estimator that maximizes the known mean likelihood (10.1.48) where p is replaced with y’.

Corollary 10.1.10. Let the assumptions of Theorem 10.1.7 hold Let be the value of p that maximizes (10.1.a) with y,, = U, - = Y, -& replacing u,. and let Fm2 be the value of p that maximizes (10.1.48) with Y , ~ = Y, -&,2

replacing U,, where jYt2 is the ordinary least squares estimator of the linear trend function. Then

Page 599: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

576 UNIT ROOT AND EXPLOSIVE TIME SERIES

where gj is the limiting distribution of n(Pwj - 1) obtained by multiplying the distribution of PWj of Theorem 10.1.8 by (G - H2)-"' for j = 1 and by (G - H z - 3KZ)-"' for j = 2,

q, = (G - H2)- ' [H2 + (T - H)'] , and

= (G - H2 - 3K2)-'[(H - 3K)' + (T - H - 3K)'] . Proof. The results follow from (10.1 SO). A

A pivotal statistic for the test of p = 1 can be constructed by dividing @, - 1 by the estimated standard error of b,,,,. For higher order processes, the test is constructed by testing the bypothesis that the sum of the autoregressive co- efficients is one.

T h m are several methods available for computing the estimated covariance matrix of the estimated coefficients. One procedure is to write the concentrated log likelihood of (8.4.5) in the form

It

2 g:(Y; 6) , (10.1.51) 1-1

where g,(Y; 6) = c(6)zI(Y; 6), z,(Y; 8) are uncorrelated (0, a') random variables when the function is evaluated at the true 9 and 46) is a constant depending on 4 Then the maximum likelihood estimators are obtained by minimizing (10.1.51), and an estimator of the covariance matrix of the maximum likelihood estimator &m

is

(10.1.52)

where hi(Y &,) is the vector of derivatives of g,(Y 8) with respect to 8 evaluated at the maximum likelihood estimator,

I9

&: = (n - t)-' C z;(Y; 6) (10.1.53)

is the maximum likelihood estimator of a' adjusted for dewes of freedom, and r is the number of parameters estimated.

Table 10.A.5 contains the percentiles of tbe test statistics based on the maximum likelihood estimator and the estimated covariance matrix (10.1.52). This table was constructed by the same procedutes used to construct Table 10.A.4. We have presented a numbex of test statistics for the unit root problem. Tb

ordinary least squares estimators are easy to compute and can be computed with an

f P 1

Page 600: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTORBGRESSIVE TIME SERIES 577

ordinary regression program, They may be used in exploratory procedures even when other programs are available. The simple symmetric estimator is next in ease of computation and can also be computed with an ordinary least squares regression P*gram*

If the objective is to test the hypothesis of a unit root against the alternative of a stationary process with unknown mean, the maximum likelihood and weighted symmetric procedures are recommended. Monte Car10 studies have demonstrated that these two procedures have nearly identical power against the stafionary unknown mean alternative and that they are much more powerful than ordinary least squares in that case. The power of the test based on the simple symmetric estimator falls between that of ordinary least squares and that of maximum likelihood, being closer to that of maximum lielihaod. See for example, PantuIa, Gonzalez-Farias, and Fuller (1994) and Park and Fuller (1995).

If a test for a unit root against the alternative of a model with known mean is desired, the ordinary least squares procedure is recommended. See Park and Fuller (1995).

Example 10.1.2. We continue the analysis of the one-year treasury bill interest rate begun in Example 10.1.1. The simple symmetric estimator of the third order process is

A t = 0.106 - 0.020 -I- 0.354 AK-, - 0.089 A&-, . (0.062) (0.01 1) (0.065) (0.066)

The regression equation was computed using an ordinary regression program and the data arrangement of Table 10.1.1 with an intercept included in the regression. The estimated error variance is 0.083. The value of the statistic for a test of a unit mot is - 1.76, which is greater than the 0.10 tabular value of -2.34 given in Table 10.A.3.

If we use the weights of (8.2.15), we obtain

Aft = 0.090 - 0.015 &-, + 0.360 AT-, - 0.092 AT-, (0.062) (0.01 1) (0.065) (0.066)

as the weighted symmetric least squares estimated equation. The test statistic for a unit root is -1.36, which is greater than the 0.10 tabular value of -2.23 given in Table 10.A.4.

The maximum likelihood estimates for the third order process are

(-6,,-4, -bi,)=(1.3436, - 0.4510, 0.0915), (0.0650) (0.1053) (0.0656)

where we used procedure ARIMA of SAS/ETSe to compute the maximum likelihood estimator. This program uses the Marquardt algorithm and equation (10.1.52) to compute the estimated covariance matrix of the coefficients. The sum

Page 601: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

578 UNIT ROOT AND EXPLGSIVE TIMe SENtLS

of the estimated autoregressive coefficients is 0.9841, and the estimated standard e t ~ o ~ of the sum of the coefficients is 0.0119. Therefore, the maximum l i k e l i i pivotal for testing the bypotksis of a unit root is - 1.34. The computed value is greater than the 0.10 tabular value of -2.32 given in the second part of Table 10.A.5.

In thii example, all procedures lead to essentially the same conclusions. As stated in the text, the weighted symmetric and maximum likelihood procedures are the recommended pocedures when the mean is estimated. In this example, and in the majority of practical situations, the weighted symmetric estimator and the maximum likelihood estimator will both reject the hypothesis of a unit root or both accept the hypothesis when the test is against the stationary alternative. AA

From the tabled distribution of n@ - 1) of Table 10.A.1, it is clear that the least squares estimator of the unit root parameter of an autoregressive process is biased. Also, the estimator of the autoregressive parameter is biased in the stationary case. See Section 8.2. In the unit root case, the bias is order n-', the same order as the standard deviation of the estimator.

There are several possible situations associated with the use of data to investigate an autoregressive process. One may be willing, on the basis of subject matter knowledge, to specify that the process is statiomy. Then the use of an estimation procedure, such as maximum likelihood based on the stationary likelihood, that restricb all mots of the process to be less than one in absolute value, is appropniite. In a second situation, one may have sttong theoretical reasons for believing a unit root is present. If so, one may choose to test for a unit root and, if the test is consistent with the hypothesis of a unit mot, estimate the process subject to the restriction of a unit root. In the third situation, a unit root is a distinct possibility, but not to the extent that it is a model to be maintained unless rejected by the data. In this third case, it is natural to adopt an estimator that is less biased in the vicinity of the unit root than are the ordinary estimators.

Assume that the model is a pth order autoregressive process. Assume we wish to restrict the estimator of the largest root to the interval (-1, I]. Under the assumption that only a single root of one is possible and that there is no drift if a mot is equal to one, an estimation procedure is the following.

1. Estimate the model P

Y, =so + e,y,-, + C 9 A Y , - ~ + , + e, (10.1.54) j - 2

by weighted symmetric least squares, and compute the statistic CI for the test that el = 1. This can be done by fitting the weighted least squares regression of y, on (y,-,. A<-, , . . . , AU,-,+J where Y, = U, - ji,.

2. Construct a new estimator of 8,

8, = b, + dew, )r9{t31}10.5 , (10.1.55)

Page 602: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AUTOREGRESSIVE TIME SERIES 579

where 6, is the estimator from step I ,

+WI if t w 1 8 - l . 2 , c ( t W l ) = 0.035672(@w1 + 7.0)2 if -7.00< C -1.2, I. if .iwl 6 -7.00,

and 0{8,] is the estimated variance of the weighted least squares estimator. 3. Estimate (e2,. . . , %) by the regression of y, - t?lyr-l on AT-j, j =

1,2, , . . , p - I. Denote the estimators by (g2,. . . , OJ.

In the presence of a unit root (8, = l), the outlined estimation procedure is approximately median unbiased for 8, because the median of tWl is about - 1.20 when 8, = 1. Of course, if 8, is restricted to the interval (- 1, I], it is not possible to construct a reasonable estimator of 0, that is mean unbiased for all 8, E (- 1,1]. The function c(PW,) of (10.1.55) was chosen to be a smooth function of t,,,, with value of 1.20 at .iW1 = -1.20. The modified estimator differs from the weighted least squares estimator if +w, > -7.00. This set of t, corresponds, approximately, to j1 > (n + 49)-’(n - 49). The weighted least squares estimator J1 is biased for most values of el, but the bias is small relative to the standard error for 8, considerably less than one and n large.

The empirical properties of the weighted symmetric estimator and the estimator (10.1.55) are compared for the first order process in Table 10.1.2. The true model is

q=pU,- , +@,,

where e, -N(O, 1). The generated ptocess is stationary if p < 1, and Yo = 0 if p = 1. The weighted least squares estimator, denoted by a, is defined by (10.1.42) with U, replaced by Y, - y, provided the estimator is less than one, and is equal to one otherwise. The entries in the table are based on 3000 samples.

defined by (10.1.55) has a uniformly smaller median bias than the weighted least squares estimator Jw. The mean bias is also uniformly smaller. However, the empirical variance of bW is smaller than that of 6 for p differing from one by more than 8n-’, and the mean square error of bw is smaller than that of for p differing from one by more than four standard errors of the estimator. In those situations, the mean square error of exceeds that of b,,, by modest percentages. The largest percentage difference in the table is about 8%. On the other hand, for all three values of n, the modified estimator has a mean square m r less than one half that of for p = 1. For a fixed p, such as 0.75, the mean square emor multiplied by n declines as n increases for both estimators, primarily because of the decrease in bias.

The last two columns of Table 10.1.2 contain the empirical 2.5 percentiles of the statistics

The estimator

Page 603: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

580 UNIT ROOT AND EXPLOSIVE TIME SERlEs

Table 10.13. Empirical Properties of Estfmatar (10.155) and Weightea h a s t Squares Estimator for Flmt Order Proms

Median nMSE Pmntiles

0.15 0.50 0.75 0.85 0.90 0.95 0.98 1 .oo

0.50 0.75 0.85 0.90 0.95 0.98 0.99 1 .oo

0.75 0.85 0.90 0.95 0.98 0.99 1 .oo

0.122 0.456 0.695 0.791 0.838 0.884 0.912 0.933

0.479 0.724 0.822 0.870 0.9 19 0.948 0.957 0.965

0.738 0.836 0.886 0.935 0.964 0.974 0.983

0.126 0.49 1 0.757 0.860 0.909 0.955 0.98 1 1 .Ooo

0.483 0.747 0.855 0.905 0.956 0.984 0.992 1 .OOo

0.742 0.847 0.901 0.953 0.983 0.992 1 .Ooo

n-550

1.046 0.998 0.857 0.782 0.747 0.720 0.710 0.728

n = 100

0.836 0.630 0.521 0.459 0.395 0.363 0.360 0.379

n = U X )

0.562 0.421 0.342 0.262 0.216 0.204 0.203

1.127 1.064 0.748 0.567 0.467 0.357 0.309 0.311

0.887 0.624 0.452 0.352 0.241 0.170 0.156 0.159

0.590 0.421 0.313 0.199 0.125 0.101 0.09 1

- 2.25 -2.36 - 252 -2.58 -2.64 -2.71 -2.82 -2.83

-2.13 -2.27 -2.38 -2.45 -2.57 -2.71 -2.75 -2.80

- 2.29 -2.33 - 2.40 -2.52 -2.69 -2.74 -2.80

- 2.25 -2.31 -2.28 -2.22 - 2.20 -2.19 -2.25 -2.22

-2.13 -2.16 -2.13 -2.1 1 -2.10 -2.15 -2.16 -2.17

-2.28 -2.25 -2.20 -2.18 -2.21 -2.19 -2.17

and

where jjw is the weighted symmetric estimator and p is the estimator (10.1.55). Because p has small median bias, the pivotal statistic 1; has a median close to zero

Because the 2.5 percentile for +w, is -2.80 for n = 200 and p = 1, the 2.5 for all values of p.

Page 604: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AClTOREGRESSlVE TIME SERIES 581

percentile of fI is -2.17 for n = 200 and p = 1. The estimator bW has a negative bias for positive p, and the 2.5 percentile is less than -2.0 for p’s that are a considerable distance from one. The percentiles for il show a steady decrease to about -2.8 at p = 1.00. The empirical 97.5 percentiles of fl and i, are compared in Table 10.1.3. The

percentiles of i, decline as p moves towards one. The maximum vdue of the 97.5 percentile for fl is approximately equal to 2.30 for 1 - p = 12n-’. At least 2.5% of the estimates jj are equal to one for all three sample sizes when 1 - p = 6n-I. The majority of the 97.5 percentiles of f , fall below 2.30, and the majority of the 2.5 percentiles are above -2.30. Therefore, the interval

( 10.1.56)

will have coverage greater than 95% for most parameter values and for sample sizes in the 50 to 200 range. Because p is restricted to (- 1,1], the actual confidence interval is the intersection of (10.1.56) and (- 1,1].

Example 10.13. We continue the analysis of Example 10.1.2 under the assumption that the largest root of the autoregressive process is in the interval (-1.1). The statistic +wl from Example 10.1.2 is -1.36. The modified estimator of (10.1.55) is

8, = aW, + 0.013 = 0.997

Regressing y, - 0.997y,-, on AT-, and AU,-,, we obtain the estimated equation,

y,= O.997yt-, + 0.355AY,-, - 0.103 AT-, , (0.01 3) (0.065) (0.066)

where y, = U, - ji,. The standard errors for the coefficients of AT-, and AT-, are those computed for the original weighted regression of Example 10.1.2. The standard emr for the coefficient of is the standard error from Example 10.1.2 increased by 15%. With these standard mrs, normal tables can be used for approximate tests and confidence intervals for the parameters of the equation. An

Table 10.13. Empirical 975 Percentiles of T and i n=JO n-100 n=200 -

1 - P ‘1,973 ‘1,975 f, ,975 ‘1.97, tl,975 ‘1.975

Zn-‘ 1.49 2.02 1.58 2.17 1.55 2.17 12n-I I .26 2.27 1.34 2.29 1.36 2.32 6 n - I 1.04 2.04’ 1.11 2.08’ 1.05 2.Mt

Mom than 2.5% of the estimates equal one.

Page 605: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

582 UNIT ROOT AND E X W I V E TIME SERIES

approximate 95% confidence interval for 6, under the assumption that the largest root is in (-1, l] is [0.971,1].

The characteristic equation associated with the modified estimates is

m3 - 1.352m2 + 0.458m - 0.103 = 0,

and the largest root is 0.996. Using a Taylor expansion and the estimated covariance matrix with the variance of the coefficient of y t - l increased by 15%, the standard emr of the largest r o d is 0.013. An approximate 95% confidence interval for the largest root is r0.970, 11. AA

10.1.4. PredictrOn for Unit Root auto mom^ It is an important result that prediction procedures appropriate for stationary time series are also appropriate for time series with an autoregressive unit root. As in the stationary case, estimation of the parameters adds a term that is ~,<n-"' ) to the prediction error when the autoregression contains a unit root.

Theorem 10.1.11. Let Y, satisfy

P

= ~ , + B , Y , - , +1c q < ~ , - ~ + , - q - j ) + e , , j=2

where e, = 9, 4 = Xf-, ad, j = 2,3, . . . , p, and e, are iid(0, a2) random variables. Let Y,, Y,,.. ., Yp be fixed, and let & and 4 be the least squares estimators of a = (ao, a,, , . , , aP)' and 6 = (e,, S,, , . . ,@,)'. Assume that one of the roots of the characteristic equation is one and that the other root'. ire less than one in absolute value. Let fn+, be the predictor of Y ~ + ~ based upon rir. Then

s - 1

where the wj satisfl the difference equation D

subject to the initid conditions wo = 1 and

prool. By theorem^ 10.1.4 and 10.1.5, bir- O! = OP(n-I'*) a d

Page 606: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

583

It follows that

because ALIY, = OP( 1). Y, = Op(nl") if q, = 0, and Y, = Op(n) if a,, # 0. Repeated substitution into the expression

yields the result for s > 1. A

It follows from Theorem 10.1.11 that the estimator of the prediction variance given in (8.5.14) for the stationary process can also be used for a process with a unit root.

Example 10.1d. By Theorem 8.5.1 and Theorem 10.1.11, the prediction variance formula that ignores the effect of estimation error can be used for modeis whose autoregressive part has mots less than or equal to one in absolute value. In Examples 10.1.1 and 10.12, we found the behavior of the one-yeat treasury bill interest rate to be consistent with the hypothesis of a unit root. Least squares or maximum likelihood can be used to construct predictors that. are appropriate whether or not the process has a unit root.

The one-, two-, and three-period forecasts computed with the third order process estimated by stationary maximum Iikelihood are 9.01, 8.95, and 8.87, respectively. The standard errors estimated using the approximation that ignores the estimation error are 0.291, 0.486, and 0.626 for the one-, two-, and three- period forecasts. The corresponding approximate standard errors computed by the least squares prediction formulas are 0,292, 0.493, and 0,641. The least squares standard errors were computed by fitting the model with three explanatory variables for the three predictions as described in Example 8.5.1. AA

10.2. EXPLOSIVE AUTOREGRESSIVE TIME SERIES

We begin our study of explosive autoregressive time series with the first order pmcess. Let

Page 607: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

584 UNIT ROOT AND EXF'LOSIVE TIME SERIES

f?,V,-, +e,, t = l , 2 ,..., t = o , (10.2.1)

where I@, I > 1 and {e,}L -~ is a sequence of i.i.d. random variables with mean zero and positive variance u2. Equation (10.2.1) is understood to mean that V, is created by adding e, to elV,-,. As with other autoregressive processes, repeated substitution enables us to express V, as a weighted average of the e,,

r - 1

(10.2.2)

It foflows that, for lell > I, the variance of V, is

v{y,}= (ef - i)-'(eF - i ) ~ * (10.2.3)

and the variance increases exponentially in t. In expression (10.2.2), the term in el is 8:-'el. When lel1 > 1, the weight 6t-l

increases exponentially as t increases. Hence, the variance contribution from el is of order e y. If 16, I < 1, the weiat applied to the first observation declines to zem as t increases. If [el I = 1, the weight of the first observation becomes s d i relative to the standard deviation of the process. To furthet understand the importance of the fimt few e, in the model with

lol I > 1, we observe that

I

= e; Z: e; 'e , . (10.2.4) i s 1

Letting X, = e Liei, we have

X,+X a.s. (10.2.5)

as t increases, where X = X;=l 8;'ei. Thus for large t, the behavior of V, is essentially that of an exponential multiple of the random variable X,

The model (10.2.1) extended to include an intercept term and a general initiai value is

f?o+BIY,-I + e r , t = l , 2 ,..., Y, ={ Y o * t = O . (10.2.6)

where

Page 608: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXF'LOSIVE AUTOREGRESSIVE TIMe SERIES ss5

n

yct) = n- I y,+, , i = 0, - 1 . 1=I

If yo is fixed and e, - NI(0, a2), the least squares estimators are the maximum likelihood estimators.

Rubin (1950) showed that plim 8, = 8, for the least squares estimator of the model (10.2.1). White (1958, 1959) considered the asymptotic properties of the least squares statistics for (10.2.1) under the assumption that e, -M(o, a'). M i t e used moment generating function techniques to obtain his results. Anderson (1959) extended White's results using the representation (10.2.4). H a m (1977) studied estimators for the general model (10.2.6), and Theorem 10.2.1 is taken from that work.

Theorem 1021. Let {&}:==,, be defined by (10.2.6) with 1 and {e,} a sequence of iid(0.a') random variables. Let the least squares estimators be defined by (10.2.7). The limiting distribution of (6: - 1)-'0:(8, - 0,) is that of W,W;', where W, and W, are independent random variables,

/ m m

and 8,, = y o + Oo(el - l)-t, The limiting distribution of n"2(80 - ea) is that of a N(0, a') random variable, and n 'I2( do - $) is independent of e;( 0, - el) in the

in distribution to a Cauchy random variable. limit. If e, -M(O, a2), y o =0, and 0, = 0, then (fl; - 1) - I n - 4) converges

Proof. We may write Y, as

where

Now X, converges as. as r + to a random variable

(10.2.8)

where 6, = y o + eo(Ol - l)-'. Then e;'Y, -+X a.s. as t increases. Note that

Page 609: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

586 UNIT ROOT AND B)BU)SIVE "WE SERE3

The error in the least squares estimator of 6, is

(10.2.9)

where " f- I

using

and

( 10.2.1 1 )

we have

By similar arguments n

Page 610: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXPLOSIVE AUTOREGRESSIVE TIME SERIES 587

and

It follows that

(e: - i)-Ie;(b, - el) =x-’z, + O~(~I-~’~), (10.2.14)

and

2, and X are asymptotically independent. Therefore, (0: - 1)-’8;($, - 6,) has a nondegenerate limiting distribution. If

the et’s are NI(0, a’) random variables, then 2, and X, both being linear combinations of the e,’s, are normally distributed. The mean of the Z,, is zero and the mean of X is 4. so that the limiting distribution of (0; - l)-’@:(d, - 8,) is that of W, W; ’ , where ( W, , W,) is a n d vector. If S, = 0 and the e, are normal, then the limiting distribution of (6: - l)-’@;(d, - 8,) is that of the ratio of two independent standard normal variables, which is Cauchy. Now, from (10.2.7),

a0 - eo = a(,,) - (dl - el )y(- ,) = qo1 + o ~ n - 9

by (10.2.13) and (10.2.14). Also, n”zZ~,, is independent of X and of 2, in the limit, because the elements of

are independent and the vector L,, - (ntlzZ0, 2,) converges in probability to zero A for no < k,, < no.’, for example.

The distributions of other regression statistics follow from Theorem 10.2.1. Let the regression residual mean square be

Page 611: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

Codlory 10.2.1.1. Suppse the assumptions of Theorem 10.2.1 hold. Then

plim ci2 = a2. n - w

Proof. Using the lemt squares definition of $,

B2 = (n - 2)-1[ 5 (e, - 212 - <dl - el)2 i (T-, - y ~ ] . t = I r - I

We have shown that (4 -el) = U5(/Oll-") and that X;=l(u,-l -j&l))2 = Up(}61~-zn). Because the e, are iid(0, u ), the mean square of the e, converges in probability to a2. A

Cordlary 10.2.1.2. Suppose that e, - NI(0, u2). Then, under the model of Theorem 10.2.1.

W. We have

The random variables X and 2, are asymptotically independent, and thepvariance of 2, converges to (6;- 1)-'a2. The result follows, because 82-+a2 by corollary 10.2.1.1. A

Because the pivotal .i is approximately N(0,l) in large samplm, + may be used to test hypotheses concerning 8, and to set confidence intervals for 0, when the original errors are normally distributcd.

We now extend these results to the pth order process with a single root larger than one in absolute value, Let

P

( 10.2.17) j - I t = - p + l , - p + 2 ,..., 0,

where ( Y - ~ + ~ , . . . , yo) is an initial vector and {e,}:=-- is a sequence of independent identically distributed (0, a') random variables. Let lmll > Im,I 2 lm3( z - - .a Im,l be the roots of the characteristic quation

Page 612: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXPLOSIVE AUTORujRESSIVE TIME SBRIES 589

(10.2.18)

Assume Im,l> 1 and al l other mts ate less than one in absolute value. We can also write the first equation of model (10.2.17) as

where the roots of

(10.2.20)

are the p - 1 mts of (10.2.18) that are less than one in absolute value. We consider a conceptual regression with the true m denoted by my, used to define the regression variables. Let

(10.2.21)

where

be the least squares estimator of 6 = (e,, el,. . . , e,)’, where 8, = m i . The conclusion of Theorem 10.2.2 is analogous to that of Theorem 10.1.2 in that the limiting distribution of the coefficient of K - l is closely related to the limiting distribution in the first order case and the other coefficients have the limiting distribution associated with the stationary process.

Two alternative assumptions for the initial conditions can be considered. In the first, the vector (Y-~+,, Y - ~ + ~ , . . . , yo) is treated as fixed. In the second, we let

+zt , t = u , ..., y, ={Yo 9 t - 0 , (10.2.22)

where yo is fixed and {Zt};=-= is a stationary autoregressive process with characteristic equation (10.2.20). The limiting distributions are the same under the two formulations.

Theorem 10.2.2. Suppose the model (10.2.17), (10.2.22) holds with lePI > 1 and the roots of (10.2.20) less than one in absolute value, where 6’ is the true parameter vector. Let 4 be defined by (10.2.21), and let

6, = d’Z[($ - e&($ - 63,. . . , (4 - e,O)y .

Page 613: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

590 UNIT ROOT AND EXPLOSIVE TIME SERIES

Then ti:;:(&, - 0;) converges to a random variable witb mean zero and variance u2 and

Proof. We have

where

D, = diag[n, (2 Y t 1 ) , n,. . . ,n] .

By the arguments used in Theorem 10.2.1

and Uf converges to

To simplify the notation, we now write ml for my. It follows from the arguments used to obtain (10.2.12) and (10.2.14) that

Now,

We have

and

Page 614: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXPLOSIVE AUTOUEQIRESSIVE TIME SERIES 591

Therefore,

and

It follows that

where the row in A* associated with d1 is (0,1,0,. . . ,O). The matrix obtained by deleting the second row and column of A* is A. By the arguments of Theorem 8.2.1,

converges to a n o d vector random variable and we have the limiting result for 5,. The remaining results follow from arguments used in the proof of Theorem 10.2.1. A

In Theorem 10.2.2 we obtained the limiting distribution of estimators where some of the explanatory variables were a function of the unknown true maximum root. These results can be used to construct confidence intervals for the largest root. This operation is illustrated in Example 10.2.1. In Corollary 10.2.2 we give the limiting distribution of the ordinary regression pivotal for the largest root.

CorolIary 10.22. Let (ijO,rii,, l&,. . . , OP) be the nonlinear least squares estimator of the parameters of (10.2.19), where Itii,l is at Ieast as large as the largest absolute value of the roots of the estimated equation (10.2.20). Let the assumptions of Theorem 10.2.2 hold with e, -NI(O, a'). Let

Page 615: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

592 UNIT ROOT AND EXPLOSIVE TlME SERIES

Then

where 6,, is the second diagonal element of the element associated with fil.

k P , The nonlinear least squares estimator for m,, based on the model (10.2.19), differs from the root calculated with the characteristic equation of the linear least squares fit of (10.2.17) only in that the rii, from the nonlinear fit is always a real number. By Thwrem 10.2.2, the linear least squares estimators are consistent. Because the true largest root is real, the probability approaches one that the two estimation procedures agree. The estimated characteristic equation, based on the estimator of (10.2.21), is

The coefficients of U, j = 1,2, . . . , p, obtained from the estimator ( 10.2.2 1 ) are the same as those obtained by the least squares fit of the model (10.2.17). Therefore, the roots of (10.2.23) are the same as the toots of the estimated characteristic equation associated with the least squares estimatM of (10.2.17).

By a Taylor expansion, we have that the largest root of (10.2.23). denoted by rEt , , satisfies

and m, -my = OJlmyl-"). It follows that

Also.

Page 616: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

BxpLosIVE AUTOREoRESSIVE TME SERIES 593

(10.2.25)

and U is defined in the proof of Theorem 10.2.2. By the arguments used in the proof of Theorem 10.2.2,

Therefore,

Because the multiplier for 8, - m: in (10.2.24) is the square root of the multiplier for (m~)*"[(m0)* - 1]U2 in (10.2.25), the result follows from the

A limiting distFibution of &(dl - 0;) given in Theorem 10.2.2.

Example 10.2.2. Engle and Kraft (1981) analyzed the logarithm of the implicit price deflator for the gross national product as an autoregressive time series. The analysis we present follows Fuller (1984). We simplify the model of Engle and Kraft and use data for the period 1955 first quarter through 1980 third quarter. If we we least squares to estimate an autorepsive equation of the third order, 100 observations are used in the regression. The estimated characteristic equation associated with the third order process is

m3 - 1.429m2 + 0.133m + 0.290 = 0.

and the largest root of this equation is 1.0178. Because the largest root is greater than one, the estimated model is explosive. Before accepting an explosive model, it is reasonable to test the hypothesis that the largest root is one. To test the hypothesis of a unit rod, we regress the first difference on Y,-l and the lagged first diffmnces. The estimated equation is

where the numbers in parentheses are the estimated standard e m obtained from tbe ordinary least squam regression program. By Theorem 10.1.4, the statistic

7- = (0.00z0)-'(0.0054) = 2.72

Page 617: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

594 UNfT ROOT AND EXPLOSIVE TIME SERIES

has the distribution tabulated in the second part of Table 10.A.2 when the hugest root is one. According to that table, FP will exceed 0.63 about 1% of the time when the true root is one. Therefore, the hypothesis of a unit root is easily rejected

To set confidence limits for the largest root, we use Corollary 10.2.2. Let the coefficient of q-l in the regression of U, -mlY,-l on 1, q- , , q-l -mlY,-2, and ytW2 - m,q..., be denoted by 6. If m, > 1 is the largest root of the characteristic equation and if all other mts are less than one in absolute value, then the limiting distribution of the statistic i = (see, S>-lS, where s.e. 6 is the ordinary least squares standard enor, is that of a N(O.1) random variable. Therefore, we define a confidence set for m, to be those rn, such that the absolute value of the calculated E is less than the tabular value of Student’s t for the desired confidence level. The confidence set can be constructed by fitting models for a range of m,-values and choosing the values for which the test statistic is equal to the tabular value. For our data

~-1.0091Y,~,=-0.0211 + 0.00274Y,-,+ 0.4f7(Y,-l-1.0091Y,-2) (0.082) (0.00139) (0.098)

(0,099) + 0.288 (qV2 - 1.0091 q-,3)

and

f, - 1.0256 = - 0.0211 - 0.00254 Y,-* + 0 . M (q-1- 1.0256 q - 2 )

(0.0082) (0.00128) (0.098)

(0.097) f 0.283 (Y,-? - 1.0256 q-3) .

It follows that a 95% confidence interval for mI based on the large sample theory is (1.0091, 1.0256).

By Corollary 10.2.2, we can construct tests for the stationary part of the model. For example, the ordinary regression t-statistic for 5-3 - rn, yI-4 in the regression of $-m,Y, - l on ( l , Y , - , , ~ - l - m ~ Y , - ~ , ~ - 2 - ~ l Y , - ~ , ~ - 3 - m , Y , - 4 ) has a N(0,l) distribution in the limit. Because the t-statistic for the hypothesis that the coefficient for q-3 - m, q-4 is zero is identical (for any m, # 0) to the t-statistic for the coefficient of q.-4 in the regression of Y, on (1, Y,-, , yt+ yt--3, U,-,), we have a test of the hypothesis that the process is third order against the hypothesis that it is fourth order. The fitted fourth order model, using the estimated value of m, is

- 1.0178 Y,-l - 0.0207 - 0.00000 q-1 + 0.409 @‘,-I - 1.0178 <-2)

(0.00%) (0,001 14) (0.104)

(0.108) (0.104) + 0.280 (x - , - 1.0178 Y,-3) + 0.012 (Fe3 - 1.0178 K-4).

Because the regression pivotal for y1-4 is t = (0.104)-10.012 = 0.12, we easily

Page 618: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXPLOSIVE ALIT0RU;RESSNE TIME SERIES 595

accept the hypothesis that the process is third order. The argument extends to the use of an F to test that the vector of coefficients for a set of variables is zero. AA

We now consider the problem of predicting Y,+, for the explosive auto- regressive model. We derive the results for the first order model

Y,=O,Y,-, + e , ,

where the e, are NI(0, a') random variables, Given (Y,, Yz,. . . , Yn), we take as our prediction of Yn+s

t*s = &y, 9 (10.2.26)

where s>O. The error in our prediction is

where

Also,

- 2 n ) .

where we have used O;"U, =X+Op(tl;A). Now Z, and 2Jal 8f-je,+j are independent,

var(z,) = u'(e: - i)-I(i - e;211), and

Therefore, the variance of the leading term in the prediction error is 2 2 2fs-I)

VZK(C,,) = e l (e: - 1) + (e; - i)-'(e: - 1)i + oclelI-2n), (10.2.27)

where

Page 619: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

5% UNIT ROOT ANR BxpLoslvE TIME SER[ES

The variance of the prediction e m may be estimated by

(10.2.28) 1- s - 1 2 2 4:. - 1 + <so, ) Y , $(fn+, - Y,+,) = ci2 e; - 1 y - 1 YL,

For s = 1, this estimated variance reduces to the familiar regression formula for the variance of a prediction.

In the explosive case, unlike the case with lol I d 1, the m r in $I contributes to the leading term of the variance of the prediction e m . It follows that the usual approximations, such as (8,5.13), are not formally correct when the process bas a root larger than one. However, the approximations based upon regression formulas remain appropriate for predictions with s > 1 and for higher order processes with a root greater than one in absolute value.

Example 10.2.2. In this example, we construct predictions using the model of Example 10.2.1 and the method of indicator variables described in Example 8.5.1. To obtain predictions for three periods, we write

Y, = e, + e,x,, + 62Xf2 + e,x,, + e,x,, + (86 - e,e,)x,, + (e, - e,e, - e2e5)x,, + e, ,

where the regression variables are defined in Table 8.5.1 of Example 8.5.1 and the estimator of (S,, 06, 4) is the predictor of (Y,+,, Yn+2, Yn+,). Using a nonlinear least squares program and the data for 1955-1 through 1980-4, we obtain

f81-2, fa,-,) = (4.5144, 4.5410, 4.56721, (0.0044) (0.0074) (0.0107)

where the numbers in parentheses are the standard errors output by the nonlinear least squares program. There are 117 observations used in the regression, and the residual mean square is 6' = O.oooO179. Using this estimate of u2, the approxi- mation (8.5.13) gives 0.0042 as the standard error for the one-period prediction. Thus, for a root close to one the difference between the appropriate estimator and the approximation based on the stationary model is not large for a one-period predictim. AA

10.3. MULTIVARIATE AUTOREGRESSIVE PROCESSES WITH UNIT ROOTS 103.1. Multivariate M o m Walk Before studying estixnators for the general autoregressive mode€, we give the distribution of the least squares estimator for a multivariate version of the first order unit root process. Let

(10.3.1)

Page 620: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

where Y, is a &-dimensional column vector and the e, are independent identically distributed (0, Zee) random variables. It is always possible to transform the vector Y, into a new vector X, that satisfies an equation of the form (10.3.1) with Z,, = I. One transformation is

x, = TY, 9

where T =Xi:" = QD;1'2Q', Q' is the orthogonal matrix such that

Q'LQ = D,

Q'Q = I, and D, is diagonal. We write the general first order autoregressive process as

Y, = H,Y,-, +el

and consider the least squares estimator of Hi,

(10.3.2)

(10.3.3)

If HI = I, the nonmalized error in the least squares estimator is

The quantities in the expression converge in distribution to random variables that are analogous to the limit random variables of Theorem 10.1.1. The first order vector autoregressive model with an intercept is

Y, = Ho + BIY,-, + e l , (10.3.4)

where H, is a k-dimensional column vector. The ordinary least squares estimator of HI for the mode1 (10.3.4) is

- 1

A;,@ =p r=2 v,-, --~<-J~Y,-, - T ( - 1 4 n

x 2 (YI-l - Y(-l)xyt - k(0,)' 9 (10.3.5) ,=z

where (P,-,,, q o ) ) = (n - 11-l x;-*(y,-l, YI).

Lemma 103.1. Let the model (10.3.1) hold. Let Yo be a vector random variable with finite covariance matrix, perhaps the m o matrix, independent of {el). Let (el}Ll be a sequence of random vectors, and let 4-, be the sigma-field generated by {el, e2,. . . , eI-i}, Assume

We,, e,e:) I 4-,) = (0, Zee) 8.s.

and

Page 621: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

E{le,(’+’ I . C & ~ ) < M < ~ as.

for some 6 > 0, or assume the e, are iid(0, Z,,), where Z,, is positive definite. Let fii be defined by (10,3.3), and by (10.3.5). Then

nZjj’(<ijt; - 1 ) Z ~ ” ’ ~ G-IY,

and

where I 0

G = 2 y?U,U: = I, W(r)W(r) dr , i== I

I ( P m

Y = 2(y, + q)-’yfqlJ,Ui - 0.51 = I, W(r) dW’(r) , i s 1 j = l

a

q = It] 2”’yiU, =W(l),

3 = (-1)’+’2[(21- 1)wI-l , i - I

U, = (Ull, is the standard vector Wiener process.

. , , Ukl)’, 1 = 1,2,. . . , is a sequence of NI(0, I) vectors, and W(r)

Proof, The result follows immediately from Theorem 5.3.7, See Phillips and A Durlauf (1986) and Nagaraj (1990).

The extension of ‘I%eorem 5.3.7 to a moving average process wilt be used repeatedly in this section,

Lemma 103.2. Let {e,) satisfy the assumptiom of Theorem 5.3.7. Let

m m

2 1Fjll<=, 2 K,=C, 1-0 i-0

llKjl[ is the maximum of the absolute values of the elements of $. and C is a nonsingular real matrix. Then

Page 622: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUrTOREOReSSlVE PROCESSES WITH UNIT ROOTS 599

where G, Y, 5, and W(u) are defined in Lemma 10.3.1.

Proof. Using the arguments in the proof of Theorem 10.1.2, the sums of squares and products of the Y, can be approximated by the sums of squares and

A products of C I ei. The results then follow from Theorem 5.3.7.

103.2. Vector Process with B Sfirgle Unit Root In this section we examine estimation of the parameters of a vector autoregressive process whose characteristic equation has a unit root. We begin by considering least squares estimation of the parameters for two special equations. The first is

Y , = B , Y , _ , +e,, t = 1,2 ,... , (10.3.6)

where the e, are independent (0, TJ random variables with bounded 2 + S (8 > 0) moments and Y, is a scalar. Assume E{e, I g-,} = 0, where .$-I is the sigma-field generated by [(q- , X, - I , e,- , ), CYf-2, Xr-2t e, . . . , (Yo. KO, eo)l. Assume that X, is a zero mean stationary R-dimensional pth order autoregressive process satisfying

(10.3.7)

where the 4 are independent (0,XJ random variables with bounded 24-8 (8 > 0) moments.

If Bl = 1, then from (10.3.6),

(10.3.8)

Also, from (10.3.7) and Theorem 2.8.1, fl'X,-, has a representation as an infinite moving average

Page 623: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

600 UNIT ROOT AND E X W I V E TIME SERIBS

Hence, by the proof of Theorem 10.1.2, I

y,"Yo + 2 u/ , j - 1

(10.3.9)

(10.3.10)

where u, = ciq-, + e, and ci = /3' Ey,o Kj. These properties of the processes can be used to obtain the limiting distribution of the feast squares estimator of (6,. p').

Theorem 10.3.1. Let the model (10.3.6)-(10.3.7) hold with 6, = I, where (er, e;) is a sequence of independent zero mean vectors with common covariance matrix and bounded 2 + 6 (6 > 0) moments. Let 8 be the least squares estimator of (B,, p')', where

(10.3.1 1)

and F, = (F-l,X;-l). Let

t , = [0(81}~-"2(~1 - I ) ,

4, - 1 = op(?l-l), where ${q} is the least squares estimator of the variance of 8,. Then

and

t ~ ~ ( g - @I--% NO, r;;ce6), where r,, = E(x,X:}, Q 4 7;, ? is defined in Corollary 10.1.1.2, 4 has the limiting distribution of Corollw 10.1.1.2, u, is defined in (10.3.10), p,, is the correlation between u, and el, d is a N(0,l) random variable, and % is independent of d. If (@,,/.?')=(LO), then

n(8, - 1 ) s (2G)-'(T2 - I ) ,

where (G, T ) is defined in Theorem 10.1.1.

Proof. From representation (10.3.10) and Theorem 10.1.1, we have

Page 624: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AU'rOREGReSSIVB PROCESSBS WITH UNlT ROOTS 601

Also

because

Following the arguments used in the proof of Theorem 10.1.2, we have

Thus,

Where

and D,, = diag(n, n1I2, n i l2 , . . . , r i l l 2 ) . Therefore,

If /3 = 0, then Y, = e,,

and

n(d1 - 1 ) s (2G)-'(T2 - 1).

The distrib~tiona~ result for n'"@ - 8) follows because n-'" z:=~ X,-,e,

To establish the limiting result for t , , we see that satisfies tbe conditions of ~hmrem 5.3.3 with Z, = n-'/2~l-,e,.

Page 625: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

UNIT ROOT AND ExpzosIvEI TIME SERIES 682

Now

where be, = cr~uiaue, and d, = e, - bcuu, is uncorrelated with u,. By Corollary 10.1.1.1,

By Lemma 10.3.1,

n m m

where the limit nomal random variables (UU,,Udi) are defined for the p e s s (u,, d, ) by the expression following (10.1.12). Because u, and d, are u n c m l d , UMi and u d j are independent. Therefore, the conditional distribution of x,, conditional on {Uul, Uuz,. . .} is normal with mean zero and variance

2 2 -112 for almost every {u~,, vU2, . . .I. It follows that ( B t l y i U u i ) dent of 3. and

X d is indepen-

(10.3.13)

where f is a N ( 0 , l ) random variable independent of 7;. A

The X,.-l of the mode1 (10.3.6) is a rather general vector. The vector could contain current values of an explanatory variable and might contain variables of the form A&- ,=&- , . -q -,-, for i p l . If X,-, contains only A&-,, i =

Page 626: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTlVARIATE AUTOReGRESSIVE PROCESSES WITH UNIT ROOTS 603

1,2,. . . , p - 1, then we obtain Theorem 10.1.2 as a special case of Theorem 10.3.1. In Theorem 10.3.1, the model contains no intercept. If the model Contains a

constant term, then the pivotal statistic for the coefficient of lagged Y is a lineat combination of ?jp of Table 10.A.2 and a N(0,l) random variable.

Note that the usual regression statistics are appropriate, in large samples, for testing and confidence statements concerning the /?-parameters. The usual pivotal statistic for the coefficient for Y,- , has the distribution of Table 10.A.2 if p,. = 0. This can happen if the only elements of X,-, with nonzero coefficients are elements of the form j 3 1. If one is interested in testing p = 1 in a model such as (10.3.6), the statistic of Theorem 10.3.4 can be computed for the vector proms (q.x:).

A special case of the model (10.3.6) is that in which V, is a function of a vector, say XI - I , and an error that is autoregressive with parameter 0,. The model can be written’

or as

Under rather general conditions for XI,, and e, = 1, the nonlinear least squares estimator (&, 8, ) satisfies

We next consider an equation in which one of the explanatory variables satisfies an autoregressive equation with a unit root.

Theorem 10.3.2. Let

xr = &V,-, + 81Z,-I + e, ? (10.3.14)

P

Y,+&u,-j=u,. j - I (10.3.15)

Page 627: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

604 UNIT ROOT AND EXPLOSIVE TIMB SERIES

r

j = I Z, -t- BjZ , , = v, I (10.3.16)

where (e,, v:, u,) is a sequence of independent zero mean random vectors with covariance matrix I; and bounded 2 + 6 (6 > 0) moments. Let one of the roots of

mp -I- 2 ajmP-j = o

be one, and let the remaining p - 1 m& be less than one in absolute value. Let 2, be a stationary process where the roots of

P

j - I

B, - =o,Cn-9 ,

~fll+Pu,7;+(1-P,.) d ,

n1‘2(b2 - m - 5 MO, r h C ) , (10.3.17) a 2 II2

a where r, = E{Z,Z;), .i + 6, .i is defined in Corollary 10.1.1 .?. pa, is the correlation between u, and e,, u, is defined in (10.3.15), d is a N(O.1) random variable, and 6 is independent of d.

Proof. From the proof of Theorem 10.1.2,

where c = [I -t- Z7mh, (i - l)q]-’, G is defined in Theorem 10.1.1, and

The e m in the least squares estimator is

Page 628: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AIJTOREORESSIVB PROCESSES WITH UNIT ROOTS 605

n

1 I2 where Dn = diag(n, n'", nl", . , . , n ),

and we have used

Thus,

The limiting distribution of n-1'2 Zy,, Z,-,e, follows from Theorem 5.3.3. The derivation of the limiting distribution of t,, follows that for t i of Theorem

10.3.1. A

Theorem 10.3.2 identifies the difficulties in making inferential statements about parameters of equations such as (10.3.15) where the explanatq variable is a time series. In Chapter 9 we assumed the explanatory variables to be fixed sequences. An operationally equivalent assumption is that the explanatory variables form a random sequence independent of the e m sequence. In such cases, the t-statistics are N ( 0 , l ) random variables in the limit. If an explanatory variable is an autorepsive process, the distribution of the t-statistics depends on the roots of the process and on the correlation properties of the error processes. If the autoregressive process defining the explanatory variable has a unit root and if the error in the autoregressive process is correlated with the error in the equation, then by Theorem 10.3.2 the usual t-statistics have nonstandard limiting distributions. If the explanatory process is stationary, then the r-statistics for the parameters of the model (10.3.6) are N(0,l) random variables in the limit.

If it is known that the explanatory process has a unit root, then imposing that condition on the system (10.3.14). (10.3.15) produces efficient estimates of the parameters of interest, and the t-stahtics converge in distribution to N(0,l) random variables. See Phillips (1991) and Exercise 10.12. However, in many practical situations it is not known that the Y,-process of equation (10.3.15) has a

Page 629: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

606 UNlT ROOT AND EXPLOSIVE TIMS SERIES

unit root. In such situations, the investigator must recognize the potential effect of a unit root in the Y,-process on inferences about p,.

Example 103.1. In this example, we consider estimation for the model

where (el, u,)' - NI(0.E). Our situation is assumed to be that described in Section 10.1.3 in that we specify 0, €(-1,1] and 4 = 0 if 0, = 1. The +ta of Table 10.B.4 were artificially generated to satisfy the model. The ordinary least squares estimates of the two equations are

$, = 12.992 + 0.7783 y,- , (0.099) (0.0075)

j , = 0.9798 y,-, + 0.7869 AK-, , (0.0076) (0.0597)

where yr = U, - y,, and the ordinary least squares standard errors are given in parentheses. The estimated covariance matrix of (ett u,) is

0.9618 0.7291

and the estimated correlation between e, and u, is j3,2 = 0.751. The weighted least squares estimate of 0, is d,,,, = 0.9850, and the test for a unit root is .iwl = -1.94. The estimator of el constructed from (10.1.55) is #w, = 0.9921. To estimate the parameters of the X-equation, we use a system method of

estimation such as full information maximum likelihood or three-stage least squares, available in SAS/ETs'. The two equations are estimated simultaneously, subject to the restriction that 0, = 0.9921. The standard errors computed by the program shouid not be used as the standard errors of the estimators. They are only appropriate if it is known that 0, = 0.9921. We suggest that the standard e m from the initial ordinw least squares regression be modified as in Example 10.1.3. Thus, the standard e m for y,-, in the regression of y, on yf-, is increased by 15%, and the standard error for yI - , in the relpession of X, on ( l , ~ , - ~ ) is multiplied by 1 +0.15@,,. Our estimate of tbe model is

2, = 12.937 + 0.7876 Y,-) , (0.099) (0.0083)

9, = 0.9921 y,-, + 0.7308 AY,-, (0.0087) (0.0597)

Page 630: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARlATB AUTOREGRESSIVE PRQCESSES WITH UMT ROOTS 607

where the numbers in parentheses are the modified ordinary least squares standard errors. AA

We now investigate estimation for the vector autoregressive process. Let Y, be a pth order autoregressive process of dimension R Let

P

Y,+I:A,Y,_,=~,, t=1 ,2 ,..., i= I

(10.3.18)

where the e, are independent and identically distributed (0, Zee) random vectors, and the Aj are k X k parameter matrices. The representation (10.3.18) was introduced in Section 2.8. Assume that the chatacteristic equation

(10.3.19)

has g mots equal to one and that all remaining roots are less than one in absolute value. By analogy to (10.1.29, we can write

P

Y, =HIY,- l + 2 Hi AY,-i+I +e,, (10.3.20) i=2

Hp-i - - Aj 1-p - i

(10.3.21)

for i = 2,3,. . . , p - 1. We will study the process in which the portion of the Jordan canonical form of H, associated with the unit roots is a g-dimensional identity matrix.

By Theorem 2.4.2, there exists a Q such that

Q- 'H~Q = A , (10.3.22)

where the elements of A are determined by the roots of H, . If the roots of H I are distinct, A is a diagonal matrix with the roots on the diagonal. If H, has repeated roots, the matrix A is of the form described in Theorem 2.4.2. Assume that A is block diagonal with the g X g identity matrix as the first block and A,, as the second block, where A22 - I is nonsingular.

Let T be a nonsingular matrix whose first g rows are the first g rows of Q-', the rows associated with the roots of one. The remaining rows of T are linear combinations of the rows of Q-' chosen such that all elements of T are real. The matrix T can be chosen 50 that the upper left g X g matrix of TE.,,T' is an identity matrix. Let X, = TY,. Then

I

Page 631: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

608 WNlT ROOT AND EXPLOSIVE TIME SERIES

P

X,=cPIXI-, + C @ i A X I - i + l +a,, (10.3.23)

where @,=TH,T-', i=1,2, . . . , p , and a,=Te,. The matrix has a g X g identity matrix as the upper left block. The remaining elements of the first g rows and 6rst g columns of

1-2

are zero. It follows that the vector process

is a vector autoregressive process with all mots less than one in absolute value. A vector process Y, for which them exists a transformation T such that k - g

components of the transformed vector are stationary and g components are unit root processes is called coinregrated of or&r k - g in the econometrics literature. See Granger (1981) and Engle and Granger (1987). Granger's original definition of cointegration required every element of Y, to contain a unit root. The last k - g rows of T that define stationary processes are called cointegruting vectors.

A canonical form for the pth order process of interest is

P

Y , = H , Y , - , + ~ H , A Y , - , + l + e l , (10.3.24)

where (Y f,, Y;,) = Yi, Y,, is tbe vector composed of the first g elements of the Y,, the upper left g X g portion of Xea is I,, H, =block diag{Ig, HI.,,}, and HI,,, - Ik-* is nonsingular. The conformable partition of e, of (10.3.24) is (e;,,e;,)', where e,, is a g-dimensional vector. We can write

ezt = eltZe,lz + hi 9 (10.3.25)

1-2

where XeeIz is the g X (k - g) upper right part of Z,,, a,, is uncomlated with el, and

E(a,*a:J = ~ p o 2 2 * (10.3.26)

The vector (AY i f , Y;,) is a stationary process and can be expressed as

by Theorem 2.8.1. Also, f P f

(10.3.27)

where €I,,l , is the upper left g X g block of HJ and €I,,,, is the upper right g X (k - g) block of H,. Because E;=l AY,i = Y,, - Y,, and AYII is stationary, we have

Page 632: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTOREGRESSIVE PROCESSES WITH UNIT ROOTS 609

and

1

d g ~ , 2 el, + 0,(1) (10.3.28) l = I

where the O,( 1) remainder is uncorrelated with e, for s > 1.

(10.3.24) be Let the ordinary least squares estimators of the parameters of the model

and

(10.3.29)

(10.3.30)

where

Lc-l = ( Y ~ - l , A Y ~ - l . . . . . A Y ~ - p + l ) .

and 6: = Y; - LL-$'. lple ordinary least squares estimator of the "covariance matrix" of vec H' is

H' = (HI, H,, . . . , Hp)' ,

(10.3.31)

This matrix will normalize the estimators, but it is not an estimated covariance matrix in the traditional sense because some of its elements cannot be standardized to converge to constants. This is analogous to the results for the univariate case discussed in Section 10.1. The matrix V= X r S p + I L,!- I L,- I is kp X kp, and we denote the ijth k X k block of the matrix by v,. From the derivation of the canonical form (10.3.24), we b o w that the least

squares estimator is invariant to linear transformations in the sense that if T is nonsingular, then the least squares estimators of the coefficients for the trans- formed model based on TY, are Tt?,T-', i = 1,2 ,..., p , where fit, i = 1.2,. . . , p, are the least squares estimators for the original model. Also, the estimator of the error covariance matrix for the transformed problem is '&,TI, where E,, is the regression residual mean square for the original problem. We now give the limiting distribution of the estimators for the canonical model.

Page 633: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

610 UNIT ROOT AND expzoSIVE TIME SERIES

Theorem 10.33. Let Y, be a k-dimensional pth order autoregressive process written in the canonical form (10.3.24). Assume the c, am iid(O,E,,) random variables or that the e, are martingale differences satisfying the conditions of Lemma 10.3.1. Partition Y; as (Yi,,Yi,), where Y,, is the g-dimensional vector with g unit roots. Assume that (AY i,, Yb,) is a stationary process with E{AY ,,} = 0 and YI,* = 0. Let the ordinary least squares estimators be defined by (10.3.29) and (10.3.30). Then

*re C, = (I - ZT=2 Hj,l,)-i is defined in (10.3.28), %, is defined in (10.3.25).

D, = diag(n, n, . . . , n, nl", t ~ l ' ~ , . . . , r ~ " ~ ) ,

(G, Y) is defined in Lemma 10.3.1, G,, is the upper left g X g portion of G, Y, . = (Y I , Y 12) is the matrix composed of the first g rows of Y, the elements of q,, are zero mean normal mdom variables with

and

Also, e,, converges in probability to Xee.

Proof. By (10.3.28) and by the arguments used in the proof of Theorem 10.1.2,

Likewise, by Theorem 5.3.7 and Theorem 8.2.3,

where a2. is a matrix of zero mean n o d random variables with covari- matrix

V { V ~ s e e QD zL,zz *

The covariance between n-"2 Z AY,,, e:, j a 1, and n-' Z Y,-,e: goes to zero as n increases and the limit random variables are independent. See Lai and Wei (1985a) and Chan and Wei (1988). Using

Page 634: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTOREGRESSIVE PROCESSES WITH UNlT ROOTS 611

we obtain the limiting distribution. The estimated covariance matrix See converges to Zee be~ause

n

[n - ( p + l)k]--' 2 (Y: - Lt-,H')'(Y; -L,-,fi') I -pfl

= [n - ( p + 1)kI-l ere: + ~ , , ( n - ' ) . A 1=p+l

Theorem 10.3.3 gives the distribution for the parameters of the canonical form (10.3.24). The coefficient matrices for any other representation are linear combina- tions of the eIements of the canonical coefficient matrices. Hence, the limiting distributions are defined in terms of those of Theorem 10.3.3. As in the univariate case, the limiting distribution of coefficients associated with unit roots differ from the limiting distribution of coefficients associated with the stationary portion of the model.

The following corollary to Theorem 10.3.3 was derived for g = 1 by Fwntis and Dickey (1989).

Corollary 10.33. Let Y, be a k-dimensional pth order autoregressive process with g unit mts and all other roots less than one in absolute value. Assume the process can be transformed to the canonical form (10.3.24) with g unit roots. Let the assumptions on e, of Theorem 10.3.3 hold. Let &, 2 lir, 2 * - * 3 1, be the g roots of

l I m P +A,mP-' + . . . +Apl=o, with largest absolute values, where the A,, i = 1,2,. . . , p, are the least squares estimators. Then the limiting distribution of n(ki - l), i = 1,2,, . . , g, is the distribution of the g roots of

1h - Y ;.G-'l= 0 ,

where (G, Y) is the g X 2g ma& defined in Lemma 10.3.1.

Froof. The roots of the characteristic equation are unchanged by nonsingular transformations and we assume, without loss of generality, that the equation is in the canonical form (10.3.24).

By Theorem 10.3.3, the least squares coefficients are converging to the true values. Therefore, by tfie results of Section 5.8, the roots of the characteristic equation are converging to the true mots. We expand the largest toots about one and write &' = 1 + ri3 + op(S). Then the local approximation to the determinantal equation becoma

Page 635: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

612 UNlT ROOT AND EXPLOSIVE TIME SERIES

[ I ( l+pS)+A, f l + ( p - 1)S]+A2[l + ( p - 2 ) S J + . - + A p ~ = O .

Using the definitions of fit in terms of the A,, we have

(10.3.32)

The lower right rtion of I -a, is converging to a nonsingular matrix. The matrix 1 - xr-2 c i s converging to a matrix, say M, with ci* as tbe upper left g X g block, Therefore, we multiply the matrix in (10.3.32) on the left by the nonsingular matrix

where MZ1 is the lower left (k - g) X g portion of M, The values of S tbat satisfy the resulting deteminantal equation are e ual to the values that satisfy (10.3.32). using the fact that the first g columns of 8, - 1 are op(n-l) , the g smallest roots of (10.3.32) are converging to the g smallest roots of

IC,@l,,, - 1,) - I ,4 = 0 *

The result follows from the limiting distribution of at,,, - I, given in Theorem 10.3.3. A

We now turn to the problem of estimating the parameters of (10.3.18) subject to the restriction that one of the roots of (10.3.19) is equal to one. If we write the model in the form (10.3.20), the determinant of H, - I is zero under the null model and there exists a vector K,, with K ~ K , $0, such that

K;(H, -I)=O. (10.3.33)

Therefore, if the first observations are treated as fixed, the maximum likelihood estimator of (HI, , , . , €Ip) is obtained by imposing the restriction (10.3.33) on the least squares estimator, Assume the first element of q, denoted by K, is not zero, and let K; ;K; = (1, -Bo,i2). If we multiply (10.3.20) by K F ~ I K ; , we have

P - 1

AY,, = ~ 0 , 1 2 A Y ~ , + E Bi+t,l by,- , + 'it (10.3.34) I = 1

- I t where Bd+l,l. = K ~ ~ ~ K ~ H , + , , Y: = (Y,,.Yi,), and vl, = K , ~ qe,. Let 6 = (B,,i2r B2,, , . . . , BP.,.). Let li = (li,, li2, li3) be the matrix of regres-

sion coefficients obtained in the regression of [AY,,, AYI,, (AYi-,,. . . , AY,'-p+s)I on (Y,!- ,L'Y:- . . . , AY,!-p+l 1. Of course, the elements of &3 are zeros and ones. Let Ak be the smallest root of

Page 636: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTNARlATE AUTOREGRESSIVE PROCESSES WITH UNIT ROOTS 613

where

is a k(p - 1) X k(p - 1) matrix of zeros, e,, is defined in (10.3.30), and Lt- l= (Y~- . l , AY;-, ,.,., AY;-p+l). Then, the maximum likelihood es- timator of B is

OM P - 1 M p - 1 )

where eCez2 is the lower right (kp - 1) X (kp - 1) submatrix of e,,, and %ct21 is the lower left (@ - 1) X 1 submatrix of 3,.

The vector 6 is called the limited information maximum likelihood estimator in the econometric literature. Hence, 4 can be calculated by any of the many econometric computer packages. The limiting distributions of the test statistic and the limiting distribution of the coefficients associated with the conditional likelihood were obtained by Johansen (1988) and Ahn and Reinsel (1990). We give a proof that differs somewhat from the proofs of those authors.

Theorem 103.4. Suppose the model (10.3.18) holds with exactly one mot of (10.3.19) equal to one and exactly one root of HI equal to one. Assume the e, are iid(0, &) random variables or that the e, are mathgale differences satisfying the conditions of Lemma 10.3.1. Assume that Y, is such that K] can be mmalized to give equation (10.3.34). Let 8 of (10.3.36) be the estimator of B. Let

Page 637: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

614 UNIT ROOT AND EXPLOSIVE TIME SERIES

where 4 s 4, 4 is defined in Corollary 10.1.1.2, the distribution of + is given in COrOhJ’ 10.1.1.2, and 8ik is defined (10.3.35).

proot. Let Vf-l be the deviation from fit for the regression of Y;-l on (AYf’-l,. . . ,AY,!-,p+l). Then ik is also the smallest root of

where (hll, k12) is the upper k X R portion of (&l, h2). Note that (&,l, k12)= 8: -1.

Let T, I be a transformation matrix such that the unit root is associated with the first element of XI = TI lYf. That is, XI satisfies the equation

P

x, = HxlXf-1 + I: Hx, A&-,+, +TI,% 9 i-2

where Hxi =TllHiTL/, H, is defined in (10.3.21), and the first row and first column of H,, - I are vectors of zeros. Then

n

r-p+1 (*11, +,2)’ t: vf-lv;-l(kl19 6 1 2 )

n

= (hlI, kJT,’ 2 ~f-l%~-,T~~‘(hll, ii,2> r=p+l

n

,-p+l =&’ c X,-,X:-,&,

where %= (&, &) = T;:’(iil &,2), and %,-, is the deviation from fit for the regression of X,- on (AYi- . . . AY,!,,,,. 1. Also,

where ik is the smallest root of (10.3.35) and the roots of (10.3.35) are the roots of

Page 638: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTQREGRESSIVE PROCESSES WlTH UNIT ROOTS 615

for i = 1,2,. . . , k and j = 1,2,. . . , k, by the arguments used in the proof of T h m m 10.1.2. Therefore, the elements of the first row and column of 6 z & are oP(l). Also, because (x,,, . . . , &) is stationary with positive definite covariance matrix, the matrix of mean squares and products of (x2,, . . . ,&) is an estimator of a noxisingular matrix. Also, by the assumption that HI - I has one zero root, t2 is of rank R - 1 and

where M is a nonsingular matrix. It follows that i k = 0 ~ 1 ) . ~ l s o see Theorem 2.4.1 of Fuller (1987, p. 151). From the definition of d,

where Bo,l, = (1, -Bo.lz) and ar is the probability limit of li. Because (wZ, m3)'~iWl is a stationary pmess, the asymptotic normality of n1'2(d- 8) follows from Theorem 5.3.4. See Theorem 8.2.1.

obtained in the regression of q$ on Lf-l . Because * is converging to ar, f i , , , is converging to Bo,lZ, and e,, is converging to Z,,, we have

(n -P)?,,& a - l B o , I 3SB,B6., ,

Note that L,- I ( d?-, h3) is the predicted value for

and the distribution result for oii'2( d - 0) is established. We now obtain the limiting distribution of i k . Because the roots are unchanged

by linear transformation, we assume, without loss of generality, that the model is in the canonical form. In the canonical form, Bo,l. = (1, 0, . . . , 0). From (10.3.37),

where

Page 639: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

616 UNIT ROOT AM) EXPLOSIVE 'IWE s e m

and the first row of Q is a zero vector. It follows that

The standardized matrix

D;;'2[8;: - g2(&8xx@-'g;]s;:",

where B;; = diag !$,, is converging to a matrix with (1,0,. . . $0) as the first row. Therefore,

and & 3 8;. See the proof of Theorem 10.1.2. A

The results extend to the model with an intercept.

Corollary 10.3.4. Let the model P

Y,=H,+H,Y,-, + ~ H , A Y , - , + , + e , , t=2,3 ,..., n ,

be estimated by ordinary least squares where H is a &-dimensional column vector. Let KY'H: = 0, where K ~ ' ( H ; - I) = 0, and K I and HY are the true parameters of the process. Let all statistics be defined as in Theorem 10.3.4 except that the regression equations contain an intercept. Let the error assumptions of Theorem 10.3.4 hold. Let b= @,,,,, &, &, . . . , tipvl ), where tl is the estimated intercept for the model (10.3.34) expanded to include an intercept. k t 988 be defined as in Theorem 10.3.4 with

# = (AY2,. AY,,, . . . , AY,,, 1, AY:-,, AY:-,, . . . , AY:-,,).

i = 2

8

Page 640: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUMReGRESSIVE PROCESSES W m UNIT ROOTS 617

3 where +M + $,, the limit random variable e,, has the distribution given in Theorem 10.1.3, and Ak is defined by (10.3.35) with L,.-l expanded to include a one.

Proof. omitted. A

One method of estimating the entire system (10.3.20) subject to the restriction (10.3.33) is to use a nonlinear estimation program. The computations are illustrated in Example 10.3.1 of the next subsection.

10.33. Vector Process with Several Unit Roots We now extend our discussion to the estimation of the multivariate process (10.3.18) subject to the restriction that exactly g mts of (10.3.19) are equal to one, under the assumption that the part of the Jordan canonical form of H, associated with the unit roots is the g-dimensional identity matrix. This means that there are g vectors 4 such that

KI(H, -I) = 0, (10.3.38)

where the 4 define a g-dimensional subspace. Let

* s,, = (fi, - I)V,'(fi, - I)', (10.3.39)

where V, , is the portion of the inverse of

P = (k,, 4% * . . , 41, where e,, is defined in (10.3.30). That is,

( 6 h h - &see)@ = 0,

where &, i = 1,2,. . . , k, are the roots of

i = 1,2, . . . , k , (10.3.40)

Is,, - Ae,,I = 0 (10.3.41)

and R'Z,,& = 1. For normd e,, the method of maximum likelihood defines the estimated

g-dimensional subspace with the g vectors associated with the 8 smallest roots of (10.3.41). The vectors 4, i = k - g f 1, k - g + 2,. . . , k, define a gdimensional

Page 641: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

618 UNIT ROOT AND EXPLOSIVE TIME SBRIES

subspace, but any set of g vectors of rank g formed as linear combinations of the 4 can be used to define the subspace.

We have solved the problem of estimating those linear combinations of the original variables that are unit root processes. One could ask for those linear combinations of the original variables that define stationary processes, the cointegrating vectors. The cointegrating vectors are obtained in Example 10.3.2 using a maximum likelihood computer program. The cointegrating vectors can be computed directly using a determinantal equation different than (10.3.41), Define S O O I SIO, and so, by so, =s;o'

and

where d, is the degrees of freedom for %ee. For the first order model, V,' = X~=2Yr-lYr'.-l and

Then, the k - g cointegrating vectors are the vectors associated with the k - g largest mots of

~s, ,s~s , , - .v,'l= 0. (10.3.44)

The roots Q of (10.3.44) are related to the roots of (10.3.41) by

P = ( l + d i ' f i ) d i ' f i (10.3.45)

and L = ( 1 - P)-'kjr Given that (10.3.38) holds, there is some arrangement of the elements of Y,

such that

where (I, -Bo,,2) is a g X k matrix obtained as a nonsingular transformation of (K~. 5,. . . , KJ. Thus, if we let

we can multiply the model (10.3.20) by Bo to obtain

Page 642: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTOREGRBSSNE PROCESSES WITH UNIT ROOTS 619

where v, = Boer, and the first g rows of B, are composed of m s . Then the maximum likelihood estimator of B,,,, is

fro,,, = C 2 2 L 7 (10.3.47)

where

S,, = S,,R[Diag(i, - I, i2 - 1,. . . , i k - 8 - I, O,O, . . . , o>J&S,, ,

earZZ is the lower right (k - g) X (k - g) submatrix of em,, & is the matrix of vectors 4, and 4, i = 1,2,. . . , k, are the characteristic vectors in the metric gee associated with equation (10.3.40). A derivation of this estimator in a different context is given in Fuller (1987, Theorems 4.1.1 and 4.1.2). The estimators of B,,l,, j = 2,3,. . . , p , are given by the regression of f), AY, on (AY;-,, AY:-2.. . . , AY:-p+,).

We now give the limiting distributions for some of the statistics associated with maximum likelihood estimation.

Theorem 10.3.5. Assume that the model (10.3.19), (10.3.46) holds, where e, are iid(0, Zee) random vectors, g of the roots of H, are one, and R - g of the roots are less than one in absolute value. Assume that the part of the Jordan canonical form associated with the unit roots is diagonal. Let be the least squares estimator of H, obtained in the least squares fit of equation (10.3.20). Let i i a & % - . . ~ A k betherootsoftheequation

I(&, -I)V,’(A, - I ) ’ -A~ , I=O,

where e,, is the least squares estimator of Z,,, and V,, is defined with (10.3.39). Then the distribution of (&-,+Is &-g+2,. . . , ik) converges to the distribution of the roots ( f i , , V2,. . , , 4 ) of

where G,, and Y,, are g X g matrices defined in Lemma 10.3.1.

(ik-,+ I , i k - g + 2 , . . . , li,> converges to the distribution of the roots ( 4, G ~ , . . . , If the statistics are computed using deviations from the mean, the distribution of

of

KYg, - lq’Wg, - as‘>-’(Y,, - lq‘> - vIgl = 0 I

where 6 and q are g-dimensional vectors defined in Lemma 10.3.1. Let fi,,, be defined by (10.3.47). let Bj,*, be the g X k matrix composed of the

Page 643: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

620 UNIT ROOT AND EXPLOSIVE TIME SERIES

first g rows of Bj, j = 2,3, . . . , p, of (10.3.46), and let fij,*., j = 2,3,. . . , p , be the associated estimators of Bi,'., j = 2.3, . . . , p . Then

n"2vec((6 - B)'}-% N(0, VBB),

where V,, = Xu, 8 a- I , a = E{ $; &},

and

Proof. We outline the proof for the no-intercept model. The model can be transformed to the canonicai form (10.3.24) and the mots of (10.3.41) computed with the canonical form are the same as those computed with the Original variables.

Let v,-, be the column vector of residuals obtained in the regression of Y,-, on (AY,!- ),. . . , AY;-,+ ,), and let sit be the regression residual mean square for the regression of AU,, on ( Yf'- I , AY ;- . . . , AY f'-p + ). Then, in the set of ordinary regression statistics, the coefficients for Y,-' in the regression of AX, on (Y:-,, AY:-, , . . . , AY;-,+ I ) are the regression coefficients for the r e p s i o n of AY;., on y f - l and

is the estimated covariance matrix for that vector of Coefficients. For the model (10.326) in canonical form,

and D, = diag(n, n, . . . , n, nl", n i l 2 , . . . , Also,

Page 644: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTOREGRESSIVE PROCESSES WITH UNIT ROOTS 621

* * where Y is the limit 3 d o p variable for n-’ &’=, (Xi:: e,)e:, Y,, is the*upper left g X g submatrix of Y, Y,, is the upper right g X (k - g) submatrix of Y, and the elements of V,, are normal random variables. See the proof of Theorem 10.3.1. Tzlen

where

The nonsingular matrix n1’ZM2,(H~,2, - I) dominates Rn22t which is O,,( 1). Thus,

-I- remainder,

yhere*the remainder terms are of small order relative to the included terms. Now M:,Mn2, is a nonsingular matrix multiplied by n, R,,, is O,,(l), Rn21 is OJl), and the upper left g X g portion of gee converges to I,. Therefore, the g smallest roots of (10.3.41) converge to the roots of

* Because the elements of e,, am small relative to Mn and the g smallest mots are OJl), the limiting behavior of e, of (10.3.47) is that of

Therefore, the limiting behavior of n1’2(fio,1z - B,,J is the limiting behavior of

In the canonical form, ii,,, is estimating the zeco matrix with an error that is

Page 645: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

622 UNi" ROOT AND EXPLOSIVE IIME S w e S

converging to a mahrix of normal random variables. Because the maximum likelihood estimators are invariant to linear transformations, the distribution for the estimators of the parameters of the original model are the appropriate transforma- tion of the canonical &os12. The limiting distribution of the vector of other elements of b is obtained by the arguments used in the proof of Theorem 10.3.4. A

Table 10.A.6 of Appendix 10.A contains percentiles to be used in testing for unit mts in the characteristic equation (10.3.19). The entries in the table are the percentiles associated with

A, = (1 +df-'Ai)-',( , (10.3.48)

where li, i = 1,2,. . . ,A, are the roots of the &tednantal equation (10.3.41), and d, is the degrees of freedom for see. Monte Carlo studies conducted by Hoen Jin Park (personat communication) indicate that in smaR samples with g > 1 that statistics based upon 1, have power superior to the corresponding test based on $.

The first row of Table 10.A.6 is the distribution of the test statistic for testing the hypothesis of a single unit rod against the alternative of no unit roots, the second row is for the hypothesis of two unit roots against the alternative of no unit roots, the third row is for the hypothesis of two unit roots against the alternative of one unit root, tbe fourth row is for the hypothesis of three unit mts against the alternative of no unit roots, etc. m e test statistic for row 1 is X,, the test statistic for row 2 is ik-, + x k s the test statistic for row 3 is $ 9 the test statistic for row 4 is Xk+, + X k - , -I- X k , the test statistic for row 5 is &-, + X,, etc. he numkrs at the left are the degrees of freedom associated with z,,.

Table 10.A.7 contains the percentiles of d o g o u s statistics for the model (10.3.20) estimated with an intercept. Table 10.A.8 contains the percentiles of analogous statistics for the model estimated with a time trend.

The tables were constructed by Heon Jin Park using the Monte Car10 method. The limiting percentiles were calculated by simulation using the infinite series representations of Lemma 10.3.1. The percentiles were smoothed using a function of d,, where df is the d e p s of freedom. The standard errors of the entries in the tables are a function of the size of the entries and range from 0.001 for the smallest entries to 0.25 for the largest entries.

We now discuss the estimation of H, - I of equation (10.3.20) subject to the linear constraints associated with the presence of unit roots. Assume that the least squares statistics have been computed, and let fi, be the least squares estimator of HI. Let V,, be the part of the inverse of the matrix of sums of squares and products associated with Y,-,. Each column of ft: is a vector of regression coefficients and the ordinary least squares estimator of the "covariance matrix" of a column is V,, multiplied by the e m variance. Thus, arguing in a nuwigmuus manner, the rows of

*

have the property that they are estimated to be uncomlated.

Page 646: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUMREGRBSSWE PROCESSES WITH UNIT ROOTS 623

Under the hypothesis of g unit roots in the original system, equivalent to g zero roots for H, - I, there is a B0,,. such that

y’”(I3; - I)BL,,, = 0 . (10.3.49)

The estimator of Bo,,,, up to a linear transformation, is given by the g characteristic vectors of (10.3.40) associated with the g smallest roots. Let

il; =v;Ii’2(H; -I)

and

ri =V,”’(H; -I) .

Now each column of v = Bo,l.&, contains g linear contrasts that are estimating zero and &,,.a, is an estimator of those contrasts. Therefore, an improved estimator of a column of r, is obtained by subtracting from the column of Iff, an estimator of the e m in fi, constructed with the estimator of the g linear contrasts. Thus, the estimator is

where 9; = fii.,.@;,,,, eVv = Po,, zeefi;,,., and eve = &,,e,,. It follows that the estimator of Hi - I is

a; - I =v;;2P; = (4%; - 1x1 - ii;,,,2;w12ve), (10.3.51)

The estimated covariance matrix of the estimators i = 1.2, . . . , k, is

e r r = g e e - ~ c v ~ ~ v i ~ v e .

Therefore, the estimated covariance matrix of vecffi, - I) is

?{Vec(ii, -I)} =(V::2~I)(I~e,r)(V::2QDI)f *

The estimator of HI - I is seen to be the least squares estimator adjusted by the estimators of zero, where the coefficient matrix is the matrix of regression coefficients for the regression of the e m in 8, - I on the estimators of zero. his formulation can be used to construct the maximum likelihood estimators of the entire model. Let

d = ve~{(fi, - I)f(I, -Bo,,2)f} be the Rg-dimensional vector of estimators of zero. Then

Page 647: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

624 UNIT ROOT AND EXPLX)SIVE TIME SERIES

follows from regression theory. If the original vector autoregressive process is

Y, =H,+H,Y,-, +H,AY,-, +-.~+H,AY,-,+, +e,, (10.3.53)

we may wish to estimate the vector of intercepts Ho with the restriction that

Bo.,Ho = 0. (10.3.54)

This is the restriction that the linear combinations of the original variables that are unit root processes are processes with no drift. The estimator of Bo, , , is given by the eigenvectors associated with the smallest roots of

I(&, ft, - I ) v p o , A, - I)‘ - = 0, (10.33)

where V** is the portion of the estimated covariance matrix of the least squares estimators associated with the vector of regressor variables (1, Y,’- I ).

In Example 10.3.2 we demonstrate how existing software for simultaneous equation systems can be used to construct the estimators and estimated standard errors of the estimators.

Example 103.2. To illustrate the testing and estimation methods for multi- variate autoregressive time series, we use the data on U.S. interest rates studied by Stock and Watson (1988). The three series are the federal funds rate, denoted by Y,,; the %day treasury bill rate, denoted by Y2,; and the one-year treasury bill rate, denoted by Y3,. The data are 236 monthly observations for the period, January 1960 through August 1979, The one-year treasury bill interest rate was studied as a univariate series in Examples 10.1.1 and 10.1.2.

Following the procedures of Example 10.1.1, a second order process fit to tbe differences yields

A’f,, = 0.017 - 0.550 AY,,-, - 0.161 A2Yl,r-I , (0.025) (0.075) (0.065)

A’f,, = 0.021 - 0.782 AY2,,-, - 0.025 AZY2,,-, , (0.021) (0.082) (0.065)

A2f3, = 0,018 - 0.768 AY3,1-l + 0.105 A2Y3.,-, . (0.019) (0.076) (0.065)

(10.3.56)

Page 648: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTOW-SIVE PROCESSES WITH UNIT ROOFS 625

The +p statistics for the hypothesis of a second unit root under the maintained hypothesis of a single unit root are -7.37, -9.52, and -10.13, respectively. 'Ihus, in al l cases, the hypothesis of a second unit root is rejected at the one percent level.

"hiid order autoregressive models fit by ordinary least squares to each series gives

APl, = 0.108 - 0.017 Y,,,-, + 0.297 AY,,-, + 0.177 AYl,f-2, (0.062) (0.010) (0.065) (0.M)

AYz, = 0.078 - 0.01 1 Y2,, - , + 0.200 AYZ,, - I + N(0.035 AYZ,, .-2

(0.062) (0.012) (0.066) (0.066) (10.3.57)

A t . , = 0.082 - 0.012 Y3,r-I + 0.343 AY3*,-, - 0.094 dYs,,-2. (0.062) (0.01 1) (0.065) (0.064)

The three residual mean squares are 0.143, 0.098, and 0.083 for the federal funds rate, 9O-day treasury rate, and one-year treasury rate, respectively. There are 233 observations in the regression, and the residual mean square has 229 degrees of freedom. The 7; statistics for the hypothesis of a unit root in the univariate processes are - 1.62, -0.98, and -1.09. The 1096 point for 7: for a sample with 229 degrees of freedm is about -2.57. Hence, the nuU hypothesis of a unit mot is not rejected for each series. Models were also fit with time as an additional explanatory variable. In all cases, the hypothesis of a unit root was accepted at the 10% level. On the basis of these results, we proceed as if each series were an autoregressive process with a single unit root and other roots less than one in absolute value.

We write the multivariate process as

or as

BY, = H, + (H, - I)Y,-, + H, AYl - l + H, AY,-, + e, , (10.3.59)

where Y, = (Y,,, Y,,, Y3t)' is the column vector of interest rates, H, is a three- dimensional column vector, and H,, i = I , 2,3, are 3 X 3 matrices of parameters. If the multivariate process has a single unit root, the rank of HI - I is 2. The rank of H, - I is one if the multivariate process has two unit roots.

To investigate the number of unit mts associated with the multivariate process, we compute the roots of

/(al - I)VL;(fil - I)' - Ae,,l= 0 , (10.3.60)

where B, is the least squares estimate of H,, S,, is the least squares estimate of see, and V, ( jcrl i is the least squares estimated covariance matrix of the ith row of H,. The matrix (H, - I)V~l'(&, - I)' can be computed by subtracting the residual

Page 649: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

626 UNIT ROOT AND EXPLOSIVE TIME SERfES

sum of squares and products matrix for the full model from that for the reduced model

AY, =He + H, b y , - , + A, BY,-, + e, . (10.3.61)

The full model has nine explanatory variables (Y:-,, AY:- ,, AY:-2) plus the intercept and is fitted to 233 observations, Hence, the estimator of See has 223 degrees of freedom. The reduced model contains (AY,- I , and the intercept.. The matrices are

a ; - I =

'-0.167 0.043 0.020 (0.043) (0.034) (0.033)

0.314 -0.245 -0.058 (0.121) (0.098) (0.094)

-0.109 0.189 0.019 ,(0.098) (0.080) (0.077)

(10.3.62) 0.1304 0.0500 0.0516 0.0500 0.0856 0.0712 0.0516 0.0712 0.0788

) 2.0636 -0.4693 -0.1571

( -0.1571 0.1752 0.1574 (fi, - X)V,'(fh, -1)' = -0.4693 0.5912 0.1752 . (10.3.63)

The standard errors as output by an ordinary regression program are given in parentheses below the estimated elements of (k, -1)'. The estimates of HI - I are not normally distributed in the limit when the Y,-process contains unit roots.

The three roots of (10.3.60) are 1.60, 12.67, and 32.30. The test statistic tabled by Park is

k

2 (1+d;'ii)-IA= i: x,, i = k - g + l i - k - g + l

where ti, is the degrees of m o m for &, i, 3 4 P * * - zz & % and 8 is the hypothesized number of unit rvts in the process. The test statistic for the hypothesis of a single unit root is A, = 1.59. Because the median of the statistic for a single root is 2.43, the hypothesis of one unit root is easily accepted. The test for two unit roo3 against the alternative of one unit root is based on the second smallest root A, = 11.99. On the basis of this statistic, the hypothesis of two unit roots is accepted at the 10% level because the 10% point for the largest of two i given tw? unit roots is about 12.5. The hypothesis of three unit roots is pjected because A, = 28.21, exceeds the 1% tabular value for the largest of three A given three unit roots. For illustrative purposes, we estimate the process as a multivariate process with two unit rods and diagonal Jordan canonical form.

Page 650: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

MULTIVARIATE AUTORffiRESSIVE PROCESSES WITH UNIT ROOTS 627

The eigenvectors associated with the determinantal equation (10.3.60) are

kl = (0.4374,0.1042,3.1472)' , 4 = (1.8061,5.3600, -6.3725)' , k3 ~(2.6368, -4,3171,1.8279)' .

(10.3.64)

If the vector process contains two unit roots, the two time series defined by

CW,,,W,,, = ( d j Y , , q f , ) 7 (10.3.65)

are estimated to be the unit root processes. Furthermore, the errors in these two processes are estimated to be uncorrelated. Thus, it is estimated that

(H, - Mk1, &) = (0.0) (10.3.66)

and that

(kl, &)'XJk,, 4) = I - The determinantid equation (10.3.44) that can be used to define the cointegrat-

ing vectors is

897.0 679.4 639.1 15.00 6.47 7.18 1261.6 897.0 834.9 1 ( 6.47 5.26 5.50) - .( 7.18 5.50 6.69 834.9 639.1 615.8

The roots are 0.1264, 0.0537, and 0.0071, and the associated vectors are

(-0.104,0,313, -0.188)' , (-0.056, -0.106,0.195)' , (10.3.67) (0.004,0.056, -0.025)' .

If we assume that there are two unit roots, then the first vector of (10.3.67) defines the single cointegrating vector.

We note that if (10.3.66) holds,

(a, - 1x4, &)K;; = (0,O) (10.3.68)

for any nonsingular 2 X 2 matrix K, , . In particular, we could let

(10.3.69)

The K l , of (10.3.69) is nonsingular if Y,, and Y,, are both unit root processes and are not both functions of a single common unit root process. Therefore, in practice, care should be exercised in the choice of normalization. For our example, we assume that

Page 651: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

628

where

UNIT ROOT AND EXPLOSIVE TIME SERIES

(H, -I)'p=O,

(10.3.70)

We create a square nonsingular matrix by augmenting fl with a column (O,O, 1)' to obtain

If we multiply (10.3.59) by B', we obtain the restricted model

AY1~=et,Yl ,~- l +e12Y2.r-l +eB3Y3.r-l + w 1 6 1 ~ + s r l I

AY21 = P*, AYlI + w 1 e w + u21 ' (10.3.71)

Ay3r = a1 "1, + wt61w + ' 3 1

where W, = (1, AY,'... , , AY;-J. This model is in the fonn of a set of simultaneous equations. A number of programs are available to estimate the parameters of such models, We use the full information maximum likelihood option of SASIETS' to estimate the parameters. The estimates are

A f l 1 = - 0.130 Y, ,,-, + 0.3% Yz,,-, - 0.238 (0.039) (0.1 10) (0.080)

+W,4, ,

bY2, = - 0.487 AY,, + W,4w ,

AY3, = - 0.122 AY,, + W14,,, , (0.3 19)

(0.23 8)

(10.3.72)

where we have ommitted the numerical values of (4w, 4w, hW) to simplify the dispiay. The vector (-0.130,0.396, -0.238) applied to Y, defines a process that is estimated to be stationary. The vector is a multiple of the cointegrating vector given as the first vector of (10.3.67).

The estimator of Hi - I estimated subject to the restriction that the rank of H, - I is one is

fi;-+

-0.130 0.063 0.016 (0.039) (0.030) (0.027) 0.3% -0.192 -0.048

(0.1 10) (0.091) (0.087)

-0,238 0.116 0.029 (0,080) (0.058) (0,052)

(10.3.73)

Page 652: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

TESTING FOR A UNIT ROOT IN A MOVING AVERAGE MODEL 629

The standard errors of the estimates are given in parentheses below the estimates. The estimates are not al l normally distributed. However, the difference between the estimate and the true value divided by the standard error converges to a N(0,l) random variable for a correctly specified model. Therefore, the mud regression testing and confidence interval procedures can be applied to the restricted estimates under the assumption that the process contains two unit roots. The tests are subject to the usual preliminary test bias if one estimates the number of unit roots on the basis of hypothesis tests. The two time series defined by

U,, f 0.487Y1,

and Y3, + 0.122Y,,

are processes estimated to have unit roots. These two pmsses are linear combinations of the two processes in (10.3.65). That is, the two processes are obtained by transforming the first two vectors of (10.3.64).

"?*). AA 3.147 -6.372 0 1

10.4. TESTING FOR A U" ROOT IN A MOVING AVERAGE MODEL

"he distributional results developed for the autmgressive process with a unit root can be used to test the hypothesis that a moving average process has a unit root. Let the model be

Y, =el + Be,-I , t = 1,2,. . . , (10.4.1)

where the el ate iid(0, a*) random variables. We assume e, is unknown and to be estimated. We have

Yl = Pe, + e l . U,=f;(Y; P,eo) + el, t = 2 , 3 , . . . , (10.4.2)

where 1 - 1

. t (Y B, eo) = -2 - (-P)le0 . j * 1

On the basis of equation (10.4.2) we define the function el = e,(Y J3, eo). where

e , ~ P, eo) = C c-B>'u,-, + (-@)'e0. I - I

j - 0

Then

Page 653: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

630 UNlT ROOT AND EXPLOSIVE TIME SERIES

and

If p = -1, we have from (10.4.1) I

2 Y, = -e, + el. (10.4.3) i-1

Therefore, treating e, as a 6x4 parameter to be estimated, n r

1-1 ]=I 2, = -n-' 2 2 yi (10.4.4)

is the best linear unbiased estimator of e, when /? = - 1, and the error in i, is -Li, = -n Xr=, el.

We shall construct a test of the hypothesis tba: /3 = - 1 by considering one step of a Gauss-Newton itecafion for (p, e,) with (- 1,8,) as the initial value for (@, e,). The dependent variable in such an iteration is

- 1

(10.4.5)

Under the null hypothesis,

el(Y; -1, go) = e, - gn.

Let W,(Y; /3, e,) denote the partial derivative o f f l y @, e,) with respect to /3, and note that w,(Y; fi, 8,) is the negative of the partial derivative of e,(Y; p, e,) with respect to p evaluated at <S,G,). men, under the null,

The W,(Y; - I , E,) satisfy the recursive rekitionship

W,(Y; -1, 2,) = 2, , WJY; -1, 2,) = Y* + so , q(X--l,&,)=q-* +2w,-,(Y; -1 ,8J-q- .2w; -1,iQs

t =3,4, 1 . . ,

and the e1(X - 1, 2,) satisfy the recursive relationship

Page 654: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

TIis"G FOR A UNIT ROOT IN A MOVlNG AVERAGE MODEL a1

e,(Y; -1, 2,) = U, + 2,, e,(Y; -1, So) = Y, + e,-,(Y; -1, g o ) , I = 2,3,. . . .

The Gauss-Newton iteration consists of regre&ag e,(Y; - 1, i,) on W,(Y; - 1, 2,) and a second variable that is identically equal to negative one. The regression equation is

e,(Y; - 1, 2,) = W,(Y; - 1, go) Afl - Aeo + e, , (10.4.7)

and the estimator of AS is

where ~ = ~ ( Y ; - l . 8 0 ) , ~ n = = n - ' Z ~ = , ~ , andi?,=e,(Y;-l,i?,). Thepivotal statistic associated with (10.4.7) is

(10.4.8)

Where n

s2 = (II - 2)-' (2, + he ,̂ - A&)' 1= 1

and Ai?, is the estimated regression coefficient for the regression equation (10.4.7). The limiting distribution of n AB follows from the results of Section 10.1.1.

Lemma 10.4.1. Let U, satisfy the model (10.4.1) with flo = - 1. Then

II AS". -0.5[G - H2 - TK + (12)-'T2]-' , * a T+ -O.J[G - H* - TK + (12)- 'T2]-"' ,

where A b is defined in (10.4.7), G, T, and H are defined in Theorem 10.1.3, and K is defined in Theorem 10.1.6.

Proof. Under the null model, the repsion coefficient estimating the change in fi is

Page 655: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

632 UNIT ROOT AND EXI"VE TIME SGRIES

and

Hence,

1 = I i ('g e,)e, = 0.5(x: - 1=0 e:) ,

rn n

1=1

nA@= - I 2 ' G,, - N: - T,Kn f (12) T, (10.4.10)

Where

and n- 1

Kn = n-'" (n - j ) ( j - l)ej. J- 1

The result for n A s follows by the arguments of Theorem 10.1.2. Because A)? O,(n-') and Ago = Op(n-1'2), s2 converges to u2 and we have the result for r. A

* Percentiles for the test statistic rare given in Table 10.A.12 of Appendix 1O.A.

If the true Po is greater than - 1, the Gauss-Newton procedure should produce a positive A@ to move the estimate away from -1. HOW~V~X, under the null model all values of A@ are negative. Therefore, the null hypothesis is rejected for the least negative of the values, the right tail of Table 10.A.12.

Consider the testing situation in which the null model is

Y, =e, - e l - ] , r = 1,2, . . . , (10.4.1 1)

and the alternative model is

l', = p + /3el.-l + e l , t = 1,2,. . . , (10.4.12)

where < 1. Under the altemative model

Y, =f;(Y; p, B eo) 3- el * (10.4.13)

Page 656: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

TESTING FOR A UNIT ROOT IN A MOVING AVERAGE MODEL 633

where

for t = 2,3,. . . . The partial derivatives are such that one step of a Gauss-Newton iteration for (p, p, e,) with (0, - 1, go) as the initial value is given by the regression of e,(Y; 0, defined in (10.4.5) on (t, W,, -l), where 0, is defined in (10.4.4) and W, = W,(Y, 0, - L E O ) is defined in (10.4.6). Under the null, the distribution of the regression pivotal for W,(Y; 0, - 1, go) is that tabulated as 7, in Table 10.A.2.

Lemma 10.42. Let the model (10.4.1) hold with /? = -1. Let Afi be the coefficient of W, in the regression of e,(Y; 0, -1, 2,) on (t, W,, - I), and let 7: be the corresponding regression pivotal. Then

re 2 -112 ?7+ 0.5(G - H 2 - 3K ) [(T - 2H)(T - 6K) - 11 ,

where G, T, H, and K are defined in Theorem 10.1.6.

Prd , Under the null model

where

and XI = Z;=, ei. This is the same as the expression for d1 in the first order model (10.1.34), and the result follows from Theorem 10.1.6. A

A testing situation in which the null model contains a constant t m is also of interest. The null model is

K = p + e , - e , - , , f = 1,2,. * . , (10.4.14)

and the alternative model is (10.4.12) with lpl< 1. The test statistic is conshucted in a similar manner to that for the null model (10.4.11). The initial values for p and e, are

Page 657: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

634

and

UNIT ROOT AND BXPLOSIVB TlMfi SERIES

n f

Zo = -n-' 2 y/ + 0.5ji4n + 1). r = l / = l

The value for Z1 is Y , + g O - f i , we have Z,=2i=lY,+Zo-fp for t = 2,3 ,..., n,and

t - I

W,(Y; j i , - 1, Zo) = -OSt(t - 1)h + 2 jq-, + . j - 1

TIE test statistic tp is the regression pivotal for AP associated with the regression equation

e,(Y @, -1, go) = ( t - 1 ) A k + Y(Y; A -I ,&) A@ - Ae, +e, . (10.4.15)

The limiting distribution of the test statistic is given in Lemma 10.4.3. The distribution of the test statistic is tabled in the second part of Table 10.A.12.

Lemma 10.4.3. Let the model (10.4.12) hold witb @ = -1. Let A 4 be the regression coefficient for W,(Y; A -1, go) in the regression (10,4.15), and let Cfi be the corresponding regression pivotal. Then

where 0, = G - H2 - 3K2 + 0S(T - 2U)2 - (T - 2H)(12L - yi - 3K),

G, H, and K are defined in Theorem 10.1.6, and W(r) is the standard Wiener process.

proof. Omitted. See Arellano and Pantula (1995). A

The testing procedure can be extended to the autmgressive moving average process. Let the mode1 be

P a

or P a

(10.4.16)

Page 658: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

TEsnNO POR A UNlT ROOT 1N A MOVlNG AVERAGE MODEL 635

where the roots of

are less than one in absolute value, 5, = Xfsl p,, and a, = -ZyG1 4, j = 2.3,. . . , q. Assume it is desired to test 5, = -1. The derivatives of the function with respect to the autoregressive moving average parametem are given in (8.4.9) of Section 8.4. If we set f l = -1 in the original model, we have

P 4

f o r r = p + l , p + 2 ,..., whereX,=Eisly/,and

4

K = -e, - Z: i=2

Therefore, treating K as fixed, XI is a stationary process (or is converging to a stationary process) with mean K. If the original model contains a mean p, then the model for X, contains the mean function + K$. The testing procedure consists of the following steps:

1. Create the variable X, = 5. If the null model is of the form (10.4.16), estimate f ) = ( a , , q , .. .,ap, 12,. . . , l,) and K by any of the procedures of Section 8.4. If the null model contains an unknown mean, estimate 8 and (KO, K ~ ) by any of the procedures of Section 8.4. Let Z, = e,(Y,; P', 2, f ) be the residuals from this fitting operation for (10.4.16).

2. For the model (10.4.16), evaluate the derivatives at (@', K, 5,) = ( P ' , k, - 1) and carry out one step of the Gauss-Newton procedure with Z1 as the dependent variable. The test of a, = -1 is a test for a unit root of the moving average paa of the process. The critical values for the test are tbe *7 of Table 10.A.9. If the null model contains an unknown mean, (k,, E , ) is a part of t&e model and changes in tbese two parameters are estimated as part of the Gauss-Newton step. critical values for the test of the unknown-mean null model based on c, are the values given as *., in Table 10.A.9.

Tanaka (1990). Tsay (1993). Saikkonen and Luukkonen (1993). and Breihmg (1994) have discussed alternative testing procedures.

Example 10.4.1. To illustrate testing for a moving average unit root we use computer generated data. The data are given in Table 10.33.3. The model for the data is

Page 659: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

636 UNIT ROOT AND EXPLOSIVB TIME SERIES

y, - p + q(L.1 - P) + q A K - 2 - =e, +S,e,-1 + &el-,

=el + 4',e,-, + -e,-z),

(10.4.17)

where the er are NI(0, a*) random variables, l1 = P, + &, and 5, = -&. The maximum likelihood estimates of the parameters ate

(p, 61, &,&,&)=(- 0.021, - 1.375, 0.601, - 0.471, - 0.401). (0.060) (0.100) (0099) (0.118) (0.117)

The roots of the moving average polynomial are 0,911 and -0.440. To test the hypothesis that one of the roots of the moving average polynomial is one, we create the variable XI = 2jz1 u/ and estimate the parameters of the model

(10.4.18)

where K,-, = -e, - h e - , and K! = p( 1 + aI + %), by maximum likelihood. The estimates are

(GI, &, 12, 4, gl) = (- 1.385, 0.573, 0.481, 0.532, - 0.009). (0,100) (0.098) (0.110) (1.427) (0.024)

We construct a test for the null model of the form (10.4.17). The dependent variable for the Gauss-Newton step is the residual Zl obtained from the fit of model (10.4.18). We write the regression equation as

The derivatives defining the explanatory variables are given by the recursive equations

Q all = mT-1 + Q u , , t - i - lz(Q,,.,-i - Q o , , , - z )

=-Y1, t = 2 ,

Qa, , t= - L 2 + Qa2.r-i - L(Qu2, t - i - Q m z , t - 2 ) *

Q C , . l = 81-i + Q,, , t - i - l z ( Q ~ ~ , t - i - Qr,.c-z) 9

QL2,t = 8t - i - gt-2 + Qgz,r - i - L ( Q L z , t - i - Q ~ ~ . i - z )

where the equations hold for t 2 3, and it is understood that the values are zero for t = 1 and t = 2 unless otherwise specified.

Page 660: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

TESTING FOR A UNIT ROOT IN A MOVDJG AVERAGE MODEL 637

Regressing gr on the derivative in a regression with 98 observations, we obtain the estimated regression equation,

5, = 0.23 - 0.003 (r - 2)- O,OllQ,l,r- 0.016Q,2,1- 0.081QL1,,+ O.058QL2,,. (0.32) (0.004) (0.102) (0.101) (0.051) (0.114)

The test that the moving average contains a unit root is * T- = -(0.051)-'(0.081) = -1.39.

The hypothesis of a unit root is just accepted at the 5% level, because -1.59 is just less than the tabular value of -1.563. The hypothesis is rejected at the 10% level because the calculated value is greater than the 10% tabular value of - 1.71 1,

If the null model is (10.4.17) with ( f;, p ) = (- LO), and the alternative is the model (10.4.7), we estimate the model in X, that contains only u,,. The maximum likelihood estimates are

(it,, 4, 12, %)=(- 1.385, 0.572, 0.483, 0.074). (10.4.20) (0.099) (0.098) (0.109) (0.724)

The derivatives for (a,, %, &, &) are found in the same manner as for the alternative model with a mean. One step of the Gauss-Newton procedure is computed with the intercept but no time variable. The result of the Gauss-Newton step is

Er = - 0.11 - 0.002 Q,,,, - 0.016 Q,,, - 0.059 Q,,,, + 0.051 Qc2,f (0.26) (0.102) (0.101) (0.042) (0.112)

and the test statistic is % = - 1.40. In this case, the hypothesis of a moving average unit root is accepted at the 10% level, because the calculated value is less than the tabular value of -1.282.

If the null is (10.4.17) with (c,, p) = (- 1,O) and the alternative is the model (10.4.17), we use the residuals associated with the estimates (10.4.20) but include time in the Gauss-Newton step. The result is

2, = 0.43 - 0.006 (t - 2) - 0.009 Q,,, (0.36) (0.004) (0.102)

- 0.018 Q,,,, - 0.102 QC,., + 0.070 Qs,,, , (0.100) (0.053) (0.113)

and the test statistic is $, = - 1.91. Because the 10% critical value of the test obtained from Table lO.A.2 is -1.22, the null hypothesis is accepted at the 10% level. AA

Page 661: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

638 UNIT ROOT AND ExeLOsIVE TIME SeRIES

REFERENCES

Section 10.1. Bhargava (1986), Dickey (1976), Dickey and Fuller (1979, 1981), Dickey, Hasza, and Fuller (1984), Dickey and Said (1982), Elliott, Rothenberg, and Stock (1992), Evans and Savin (1981a,b), Gonzalez-Farias and Dickey (1992), Lai and Wei (1985a,b), Pantula (1982, 1988a), Pantula, Gonzalez-Farias, and Fuller (1994), Phillips (1987a-c), Phillips and Perron (1988), Rao (1961), Reeves (1972), Sargan and Bhargava (1983), Stigum (1974), White (1958).

Section 10.2. Anderson (1959), Basawa (1987), Fuller and Hasza (1980, 1981), Hasza (1977), Rao (1961, 1978a,b), Rubin (1950), Stigum (1974), Venkataraman (1967, 1973), White (1958, 1959).

Section 10.3. Ahn and Reinsel (1990), Chan and Wei (1988), Fountis and Dickey (1989), Fuller, Hasza, and Goebel (1981), Johansen (1988), Johansen and Juselius (1990), Lai and Wei (1985a,b), Nagaraj and Fuller (1991), Phillips (1987a-c, 1988, 1991), Phillips and Durlauf (1986), Stock and Watson (1988).

Section 10.4. Arellano (1992), Arellano and Pantula (1995), Chang (1989), Tsay (1993).

EXERCISES

1. Prove

for fixed X_t, where the e_t are iid(0, σ²).

2. Let the model (10.1.1) hold with ρ = 1. (a) Show that

n[ ( Σ_{t=1}^{n} Y_t² )^{-1} Σ_{t=2}^{n} Y_{t-1} Y_t - 1 ]  →_L  -(2G)^{-1}(T² + 1).

(b) Show that

3. Let Y_t be the pth order autoregressive process with a unit root considered in Theorem 10.1.2. Let m̂_n be the largest root of the characteristic equation associated with the ordinary least squares estimator. Show that

n{ [ ( Σ_{t=2}^{n} Y_t² )^{-1} Σ_{t=2}^{n} Y_{t-1}² ]^{1/2} m̂_n - 1 }  →_L  -(2G)^{-1},

where G is defined in Theorem 10.1.1.

Page 662: Introduction to Statistical Time Series (Wiley Series in Probability and Statistics)

EXERCISES 639

4. Assume that Y_t = Y_{t-1} + e_t, t = 1, 2, . . . , where Y_0 = 0, and e_t ~ NI(0, σ²). Show that

5. Assume that the time series Y_t satisfies the equation

Y_t = ρ Y_{t-1} + e_t,    t = 1, 2, . . . , where Y_0 = 0 and e_t ~ NI(0, 1). Let ρ̃_n = ρ̂_n if ρ̂_n ≤ 1 and ρ̃_n = 1 if ρ̂_n > 1, where

Let ρ̂_m be the value of ρ that maximizes (10.1.47). Show that the limiting distribution of n(ρ̃_n - 1) has a smaller second moment about zero than the limiting distribution of n(ρ̂_m - 1) when ρ_0 = 1.

6. Show that, under normality, the likelihood ratio statistic for testing H_0: α_1 = α_1^0 against H_A: α_1 ≠ α_1^0 for the model (10.2.6) is a function of τ̂ defined in Corollary 10.2.1.2.

7. Plot the residuals from the model of Example 10.1.1 against time. Do you believe the original errors are identically distributed over time? Do you believe the original errors are normally distributed? Fit autoregressive models of orders 2, 3, and 4 to the square roots of the original observations. Plot the residuals from the second order model.
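One way to carry out the fitting in Exercise 7 is by ordinary least squares, as in the Python sketch below; the array y, holding the original observations of Example 10.1.1, is supplied by the reader.

    import numpy as np

    def fit_ar_ols(y, p):
        # Fit an AR(p) with intercept by ordinary least squares.
        # Returns the coefficients (intercept first) and the residuals.
        Y = y[p:]
        X = np.column_stack([np.ones(len(Y))] +
                            [y[p - j:len(y) - j] for j in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return coef, Y - X @ coef

    # w = np.sqrt(y)                     # square roots of the observations
    # for p in (2, 3, 4):
    #     coef, resid = fit_ar_ols(w, p)
    # Plot the residuals from the p = 2 fit against time.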


8. Let the assumptions of Theorem 10.3.1 hold for the model

Y_t = β_0 + β′X_{t-1} + θ_1 Y_{t-1} + e_t.

Show that

where

t_{θ̂_1} = [V̂{θ̂_1}]^{-1/2}(θ̂_1 - 1),

V̂{θ̂_1} is the ordinary least squares estimator of the variance of the ordinary least squares estimator θ̂_1, τ̂ has the distribution of Table 10.A.2, and d is a N(0, 1) random variable independent of τ̂.

9. Use the facts that & k = 8 , and &Xe$=I, where A= diag(X,, &, . . . , &), to show that the expression (10.3.47) for BiSl2 is equivalent to that in (10.3.37) when g = I .

10. Let the following model hold

Y_{1t} = θ Y_{2,t-1} + e_{1t},
Y_{2t} = Y_{2,t-1} + e_{2t}

for t = 2, 3, . . . , where Y_{2,0} = 0 and (e_{1t}, e_{2t})′ ~ NI(0, Σ). Show that the maximum likelihood estimator of θ is given by the coefficient of Y_{2,t-1} in the regression of Y_{1t} on (Y_{2,t-1}, Y_{2t} - Y_{2,t-1}). What is the coefficient of Y_{2t} - Y_{2,t-1} estimating? Show that the limiting distribution of the usual regression t-statistic for the coefficient of Y_{2,t-1} in the multiple regression is that of a N(0, 1) random variable. See Phillips (1991).
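A small simulation, sketched below, illustrates the final claim of Exercise 10. The true coefficient of Y_{2,t-1} is set to one and the errors are taken to be independent standard normals purely for illustration; neither choice comes from the exercise itself.

    import numpy as np

    rng = np.random.default_rng(1)

    def t_statistic(n, theta=1.0):
        # Simulate the bivariate model and return the regression t-statistic
        # for the coefficient of Y_{2,t-1} in the regression of Y_{1t} on
        # (Y_{2,t-1}, Y_{2t} - Y_{2,t-1}).
        e1 = rng.standard_normal(n)
        e2 = rng.standard_normal(n)
        y2 = np.cumsum(e2)                          # Y_{2t}, with Y_{2,0} = 0
        y2_lag = np.concatenate(([0.0], y2[:-1]))   # Y_{2,t-1}
        y1 = theta * y2_lag + e1
        X = np.column_stack([y2_lag, e2])           # second column is the difference
        coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
        resid = y1 - X @ coef
        s2 = resid @ resid / (n - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
        return (coef[0] - theta) / se

    # np.array([t_statistic(200) for _ in range(5000)]) should look close to N(0, 1).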

11. Let model (10.1.25) hold with ρ_1 = 1 and assume e_t ~ NI(0, σ²). Show that μ̂ of (10.1.26) is uncorrelated with ρ̂_1 - 1 but that n^{1/2} μ̂ is not independent of n(ρ̂_1 - 1) in the limit.

12. Assume that (Z_t, Y_t) of Table 10.B.4 satisfy the model

Z_t = α_0 + α_1 Y_t + e_{3t},

Y_t = π_{20} + π_{21} Y_{t-1} + π_{22} ΔY_{t-1} + e_{2t},

where (e_{3t}, e_{2t})′ ~ NI(0, Σ_{ee}), π_{21} ∈ (-1, 1], and π_{20} = 0 if π_{21} = 1. Estimate the parameters of the model by ordinary least squares and by the system method of Example 10.3.1. Also estimate the parameters subject to the restriction that (π_{20}, π_{21}) = (0, 1).
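A sketch of the single-equation least squares computations for Exercise 12 is given below. The variable names follow the form of the model as printed above and should be checked against the text's own definitions; the restricted fit with (π_{20}, π_{21}) = (0, 1) reduces the second equation to a no-intercept regression of the first difference of Y_t on its own lag.

    import numpy as np

    def exercise_12_ols(Z, Y):
        # OLS for Z_t = a0 + a1*Y_t + e_3t and for
        # Y_t = p20 + p21*Y_{t-1} + p22*(Y_{t-1} - Y_{t-2}) + e_2t.
        X1 = np.column_stack([np.ones(len(Z)), Y])
        a, *_ = np.linalg.lstsq(X1, Z, rcond=None)

        y, y1, y2 = Y[2:], Y[1:-1], Y[:-2]
        X2 = np.column_stack([np.ones(len(y)), y1, y1 - y2])
        p, *_ = np.linalg.lstsq(X2, y, rcond=None)

        # Restricted fit with (p20, p21) = (0, 1): regress Y_t - Y_{t-1}
        # on Y_{t-1} - Y_{t-2} without an intercept.
        d, d1 = y - y1, y1 - y2
        p22_r = (d1 @ d) / (d1 @ d1)
        return a, p, p22_r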


A P P E N D I X A

Appendix 10.A. Percentiles for Unit Root Distributions

Table 10.A.1. Empirical Cumulative Distribution of n(ρ̂ - 1) for ρ = 1

Sample Size n 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

Probability of a Smaller Value

25 50

100 250 500 W

25 50

100 250 500 W

25 50

100 250 500 W

-11.8 - 12.8 -13.3 - 13.6 -13.7 - 13.7

- 17.2 -18.9 - 19.8 -20.3 -20.5 -20.6

-22.5 -25.8 -27.4 - 28s -28.9 -29.4

-9.3 -9.9 - 10.2 - 10.4 - 10.4 - 10.5

- 14.6 - 15.7 - 16.3 - 16.7 - 16.8 - 16.9

-20.0 -22.4 -23.7 -24.4 -24.7 - 25.0

-7.3 -7.7 -7.9 - 8.0 - 8.0 -8.1

-12.5 -13.3 -13.7 - 13.9 - 14.0 -14.1

- 17.9 - 19.7 -20.6 -21.3 -21.5 -21.7

ri -5.3 -0.82 -5.5 -0.84 -5.6 -0.85 -5.7 -0.86 -5.7 -0.86 -5.7 -0.86

4 -10.2 -4.22 -10.7 -4.29 -11.0 -4.32 -11.1 -4.34 -11.2 -4.35 -11.3 -4.36

A -15.6 -8.49 -16.8 -8.80 -17.5 -8.96 -17.9 -9.05 -18.1 -9.08 -18.3 -9.11

1.01 0.97 0.95 0.94 0.93 0.93

-0.76 -0.81 -0.83 -0.84 -0.85 -0.85

-3.65 -3.71 -3.74 -3.76 -3.76 -3.77

1.41 1.34 1.31 1.29 1.29 1.28

0.00 -0.07 -0.11 -0.13 -0.14 -0.14

-2.51 -2.60 -2.63 -2.65 -2.66 -2.67

1.78 1.69 1.65 1.62 1.61 1.60

0.64 0.53 0.47 0.44 0.42 0.41

-1.53 -1.67 -1.74 - 1.79 - 1.80 -1.81

2.28 2.16 2.09 2.05 2.04 2.03

1.39 1.22 1.13 1.08 1.07 1.05

-0.46 -0.67 -0.76 -0.83 -0.86 -0.88


NOTE. This table was constructed by David A. Dickey using the Monte Carlo method. Details are given in Dickey (1976). Standard errors of the estimates vary, but most are less than 0.10 for entries in the left half of the table and less than 0.02 for entries in the right half of the table. Some entries differ slightly from those of the first edition.
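The construction of such a table can be mimicked directly. The sketch below generates driftless Gaussian random walks, computes n(ρ̂ - 1) from the no-intercept least squares regression of Y_t on Y_{t-1}, and prints empirical percentiles that should be roughly comparable to the corresponding row of the first panel.

    import numpy as np

    rng = np.random.default_rng(0)

    def n_rho_minus_one(n):
        # One Monte Carlo draw of n*(rho_hat - 1) for a random walk, with
        # rho_hat from the no-intercept regression of Y_t on Y_{t-1}.
        y = np.cumsum(rng.standard_normal(n))
        y1, y0 = y[1:], y[:-1]
        return n * ((y0 @ y1) / (y0 @ y0) - 1.0)

    draws = np.array([n_rho_minus_one(100) for _ in range(20000)])
    print(np.percentile(draws, [1, 2.5, 5, 10, 50, 90, 95, 97.5, 99]))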


Table 10.A.2. Empirical Cumulative Distribution of τ̂ for ρ = 1

Sample Size n 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

Probability of a Smaller Value

9

25 50 100 250 500

W

25 50 100 250 500 02

25 50 100 250 So0 W

-2.65 -2.26 -1.95 -1.60 -0.47 0.92 1.33 1.70 -2.62 -2.25 -1.95 -1.61 -0.49 0.91 1.31 1.66 -2.60 -2.24 -1.95 -1.61 -0.50 0.90 1.29 1.64 -2.58 -2.24 -1.95 -1.62 -0.50 0.89 1.28 1.63 -2.58 -2.23 -1.95 -1.62 -0.50 0.89 1.28 1.62 -2.58 -2.23 -1.95 -1.62 -0.51 0.89 1.28 1.62

-3.75 -3.33 -2.99 -2.64 -1.53 -0.37 0.00 0.34 -3.59 -3.23 -2.93 -2.60 -1.55 -0.41 -0.04 0.28 -3.50 -3.17 -2.90 -2.59 -1.56 -0.42 -0.06 0.26 -3.45 -3.14 -2.88 -2.58 -1.56 -0.42 -0.07 0.24 -3.44 -3.13 -2.87 -2.57 -1.57 -0.44 -0.07 0.24 -3.42 -3.12 -2.86 -2.57 -1.57 -0.44 -0.08 0.23

t -4.38 -3.95 -3.60 -3.24 -2.14 -1.14 -0.81 -0.50 -4.16 -3.80 -3.50 -3.18 -2.16 -1.19 -0.87 -0.58 -4.05 -3.73 -3.45 -3.15 -2.17 -1.22 -0.90 -0.62 -3.98 -3.69 -3.42 -3.13 -2.18 -1.23 -0.92 -0.64 -3.97 -3.67 -3.42 -3.13 -2.18 -1.24 -0.93 -0.65 -3.96 -3.67 -3.41 -3.13 -2.18 -1.25 -0.94 -0.66

2.15 2.08 2.04 2.02 2.01 2.01

0.71 0.66 0.63 0.62 0.61 0.60

-0.15 -0.24 -0.28 -0.31 -0.32 -0.32

NOTE. This table was constructed by David A. Dickey using the Monte Carlo method. Details are given in Dickey (1976). Standard errors of the estimates vary, but most are less than 0.014. Some entries differ slightly from those of the first edition.
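The three panels of the table correspond to studentized statistics from regressions with no adjustment, with an intercept, and with an intercept and trend. The sketch below computes the standard construction, (ρ̂ - 1) divided by its regression standard error; the chapter's own definitions govern the exact form.

    import numpy as np

    def tau_statistics(y):
        # tau (no adjustment), tau_mu (intercept), tau_tau (intercept and
        # trend): the studentized coefficient of Y_{t-1} in a least squares
        # regression of Y_t on the indicated regressors.
        def tau(X, ydep, k):
            coef, *_ = np.linalg.lstsq(X, ydep, rcond=None)
            resid = ydep - X @ coef
            s2 = resid @ resid / (len(ydep) - X.shape[1])
            se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[k, k])
            return (coef[k] - 1.0) / se

        y1, y0 = y[1:], y[:-1]
        t = np.arange(2.0, len(y) + 1)
        return {
            "tau": tau(y0[:, None], y1, 0),
            "tau_mu": tau(np.column_stack([np.ones_like(y0), y0]), y1, 1),
            "tau_tau": tau(np.column_stack([np.ones_like(y0), t, y0]), y1, 2),
        }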


Table 10.A.3. Cumulative Distribution of Simple Symmetric τ̂ for ρ = 1

Samph Size n 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

Probability of a Smaller Value

25 50

100 250 500 m

25 50

100 250 500

OD

25 50

100 250 500 W

25 50

100 250 500 m

No Adjustment

-2.72 -2.35 -2.05 -1.74 -0.87 -0.43 -0.37 -0.33 -0.29 -2.71 -2.36 -2.08 -1.78 -0.90 -0.44 -0.38 -0.33 -0.29 -2.70 -2.37 -2.09 -1.79 -0.91 -0.45 -0.38 -0.34 -0.30 -2.70 -2.37 -2.10 -1.80 -0.92 -0.46 -0.39 -0.34 -0.30 -2.70 -2.37 -2.10 -1.80 -0.92 -0.46 -0.39 -0.34 -0.30 -2.69 -2.37 -2.10 -1.81 -0.93 -0.46 -0.39 -0.34 -0.30

Mean Removed

-3.40 -3.02 -2.71 -2.37 -1.42 -0.83 -0.73 -0.65 -0.59 -3.28 -2.94 -2.66 -2.35 -1.44 -0.84 -0.73 -0.65 -0.58 -3.23 -2.90 -2.64 -2.34 -1.44 -0.84 -0.73 -0.65 -0.58 -3.20 -2.88 -2.62 -2.34 -1.45 -0.85 -0.73 -0.66 -0.58 -3.19 -2.88 -2.62 -2.33 -1.45 -0.85 -0.73 -0.66 -0.58 -3.17 -2.87 -2.62 -2.33 -1.45 -0.85 -0.73 -0.66 -0.58

Linear Trend Removed

-4.19 -3.76 -3.45 -3.09 -2.10 -1.42 -1.28 -1.18 -1.07 -3.99 -3.36 -3.36 -3.04 -2.11 -1.44 -1.29 -1.18 -1.07 -3.89 -3.57 -3.31 -3.02 -2.12 -1.44 -1.30 -1.19 -1.07 -3.84 -3.54 -3.29 -3.01 -2.12 -1.45 -1.30 -1.19 -1.07 -3.82 -3.52 -3.28 -3.00 -2.12 -1.45 -1.30 -1.19 -1.07 -3.80 -3.51 -3.27 -299 -2.12 -1.45 -1.30 -1.19 -1.07

Quadratic Trend Removed

-4.75 -4.30 -3.97 -3.61 -2.57 -1.84 -1.69 -1.57 -1.45 -4.50 -4.14 -3.85 -3.54 -2.58 -1.86 -1.70 -1.58 -1.45 -4.38 -4.05 -3.79 -3.50 -2.58 -1.87 -1.70 -1.58 -1.45 -4.30 -4.00 -3.76 -3.47 -2.58 -1.87 -1.70 -1.58 -1.44 -4.28 -3.99 -3.74 -3.47 -2.58 -1.87 -1.71 -1.58 -1.44 -4.26 -3.97 -3.73 -3.46 -2.58 -1.87 -1.71 -1.58 -1.44

NOTE. This table was constructed by David A. Dickey.
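One common form of the symmetric lag-one estimator whose studentized version this table describes divides the cross product by the average of the two squared-term sums. The form below is offered only as an assumption to fix ideas; the chapter's definition is the one actually tabulated.

    import numpy as np

    def rho_simple_symmetric(y):
        # An assumed symmetric form: cross product of (Y_{t-1}, Y_t) divided
        # by the average of the two squared-term sums.
        y1, y0 = y[1:], y[:-1]
        return (y0 @ y1) / (0.5 * (y0 @ y0 + y1 @ y1))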


Table 10.A.4. Cumulative Distribution of Weighted Symmetric τ̂_w for ρ = 1

Sample Size n 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

Probability of a Smaller Value

2s 50 100 250 500 m

25 50 100 250 500 m

25 50 100 250 500 03

No Adjustment

-2.73 -2.37 -2.09 -1.80 -1.05 -0.05 -2.74 -2.40 -2.13 -1.85 -1.09 -0.06 -2.74 -2.42 -2.16 -1.88 -1.10 -0.06 -2.75 -2.43 -2.17 -1.89 -1.12 -0.06 -2.75 -2.43 -2.18 -1.90 -1.12 -0.06 -2.76 -2.44 -2.18 -1.90 -1.12 -0.07

Mean removed

-3.33 -2.92 -2.60 -2.26 -1.19 -0.07 -3.21 -2.85 -2.57 -2.25 -1.19 -0.04 -3.16 -2.82 -2.55 -2.24 -1.20 -0.02 -3.12 -2.80 -2.54 -2.23 -1.u) -0.01 -3.11 -2.80 -2.53 -2.23 -1.20 -0.00 -3.10 -2.79 -2.52 -2.22 -1.20 -0.00

Linear Trend Removed

-4.11 -3.70 -3.37 -3.02 -1.98 -1.07 -3.93 -3.57 -3.28 -2.96 -1.96 -1.01 -3.84 -3.51 -3.24 -2.94 -1.96 -0.97 -3.78 -3.47 -3.21 -2.92 -1.95 -0.95 -3.76 -3.45 -3.20 -2.91 -1.95 -0.95 -3.75 -3.45 -3.19 -2.91 -1.94 -0.94

0.24 0.48 0.24 0.51 0.25 0.53 0.25 0.53 0.25 0.54 0.26 0.54

0.25 0.51 0.30 0.58 0.32 0.62 0.34 0.64 0.34 0.65 0.35 0.65

-0.82 -0.60 -0.72 -0.48 -0.68 -0.42 -0.65 -0.38 -0.64 -0.37 -0.63 -0.36

0.80 0.83 0,85 0.86 0.86 0.86

0.84 0.93 0.98 1.01 1.01 1 .or

-0.35 -0.19 -0.11 -0.06 -0.05 -0.03

NOTE. This table was constructed by Heon Jin Park.


Table 10.A.5. Cumulative Distribution of Maximum Likelihood τ̂_m for ρ = 1

Sample Size n 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

No Adjustment

Probability of a Smaller Value

25 50

100 250 500 00

25 50

100 250 500 00

25 50

100 250 500 00

-2.80 -2.44 -2.16 -1.86 -0.99 -0.40 -0.32 -0.27 -2.78 -2.44 -2.17 -1.87 -1.00 -0.40 -0.32 -0.27 -2.77 -2.44 -2.17 -1.88 -1.00 -0.39 -0.32 -0.27 -2.76 -2.44 -2.17 -1.88 -1.00 -0.39 -0.32 -0.27 -2.75 -2.44 -2.17 -1.88 -1.00 -0.39 -0.31 -0.27 -2.75 -2.44 -2.17 -1.88 -1.00 -0.39 -0.31 -0.27

Mean Removed

-3.52 -3.11 -2.78 -2.44 -1.44 -0.65 -0.51 -0.42 -3.33 -2.98 -2.69 -2.38 -1.42 -0.64 -0.51 -0.42 -3.24 -2.92 -2.64 -2.34 -1.41 -0.64 -0.50 -0.42 -3.19 -2.88 -2.61 -2.32 -1.40 -0.64 -0.50 -0.41 -3.17 -2.86 -2.60 -2.31 -1.39 -0.64 -0.50 -0.41 -3.16 -2.85 -2.59 -2.30 -1.38 -0.63 -0.50 -0.41

Linear Trend Removed

-4.40 -3.97 -3.62 -3.25 -2.17 -1.35 -1.15 -1.00 -4.07 -3.72 -3.43 -3.11 -2.12 -1.34 -1.14 -1.00 -3.92 -3.60 -3.34 -3.04 -2.10 -1.33 -1.14 -0.99 -3.84 -3.54 -3.28 -3.00 -2.08 -1.32 -1.13 -0.99 -3.82 -3.52 -3.27 -2.98 -2.08 -1.32 -1.13 -0.99 -3.79 -3.50 -3.25 -2.97 -2.07 -1.31 -1.13 -0.99

-0.23 -0.22 -0.22 -0.22 -0.22 -0.22

-0.35 -0.34 -0.34 -0.34 -0.34 -0.34

-0.87 -0.86 -0.86 -0.85 -0.85 -0.85

NOTE. This table was constructed by Heon Jin Park.


Table 10.A.6. Vector Ordinary Least Squares: No Adjustment

Probability of a Smaller Value

D.f.  H_0  H_a  0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

23 1 0 0.00 0.00 0.01 0.02 0.58 2.70 3.68 4.62 5.80

22 2 0 1.18 1.49 1.84 2.31 4.75 8.35 9.53 1060 11.89 2 1 1.01 1.28 1.58 1.99 4.16 7,42 8.48 9.47 10.59

21 3 0 5.38 6.11 6.80 7.67 11.34 15.76 17.09 18.34 19.83 3 1 5.11 5.83 6.50 7.32 10.84 15.06 16.34 17.49 18.86 3 2 3.34 3.82 4.30 4.90 7.48 10.63 11.58 12.37 13.32

20 4 0 12.06 13.11 14.04 15.19 19.62 24.63 26.10 27.47 29.00 4 1 11.78 12.82 13.72 14.84 19.15 24.03 25.46 26.75 28.29 4 2 9.78 10.67 11.46 12.44 16.17 20.27 21.43 22.51 23.70 4 3 5.81 6.40 6.92 7.55 10.05 12.77 13.54 14.19 14.91

48 1 0 0.00 0.00 0.01 0.02 0.59 2.83 3.90 4.96 6.36

47 2 0 1.21 1.56 1.93 2.43 5.12 9.35 10.84 12.23 13.90 2 1 1.04 1.34 1.65 2.10 4.50 8.39 9.79 11.02 12.67

46 3 0 5.80 6.64 7.43 8.44 12.84 18.53 20.38 22.06 24.13 3 1 5.53 6.35 7.10 8.06 12.31 17.84 19.62 21.24 23.19 3 2 3.61 4.17 4.73 5.42 8+62 12.95 14.38 15.70 17.23

45 4 0 13.55 14.85 16.04 17.47 23.31 30.26 32.42 34.36 36.67 4 1 13.26 14.53 15.68 17.10 22.81 29-65 31.80 33.67 35.88 4 2 11.08 12.15 13.19 14.43 19.49 25.56 27.40 29.07 31.01 4 3 6.62 7.34 8.02 8.87 12.41 16.82 18.19 19.38 20.83

98 1 0 0.00 0.00 0.01 0.02 0.59 2.91 4.03 5.16 6.69

97 2 0 1.23 1.59 1.97 2.49 5.31 9.91 11.56 13.15 15.08 2 1 1.06 1.37 1.69 2.15 4.67 8.91 10.49 11.98 13.84

96 3 0 6.00 6.89 7.74 8.83 13.62 20.08 22.23 24.21 26.64 3 1 5.73 6.59 7.39 8.44 13.09 19.37 21.46 23,40 25.74 3 2 3.74 4.34 4.94 5.69 9.23 14.29 16.00 17.59 19.56

95 4 0 14.33 15.76 17.06 18.66 25.32 33.50 36.10 38.44 41.23 4 1 14.01 15.41 16.68 18.26 24.81 32.86 35.44 37.74 40.48 4 2 11.72 12.93 14.07 15.46 21.32 28.61 30.94 33.02 35.46 4 3 7.05 7.84 8.60 9.56 13.71 19.19 20.97 22.61 24.61


Table 10.A.6.-Continued

probability of a Smaller Value

D.f. H, H, 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

248 1 0 0.00 0.00 0.01 0.02 0.60 2.95 4.10 5.27 6.87

247 2 0 1.25 1.61 1.99 2.53 5.42 10.26 12.01 13.73 15.87 2 1 1.07 1.38 1.71 2.18 4.77 9.25 10.93 12.58 14.59

246 3 0 6.12 7.03 7.92 9.05 14.11 21.07 23.42 25.60 28.28 3 1 5.83 6.71 7.56 8.65 13.57 20.35 22.65 24.79 27.41 3 2 3.82 4.44 5.06 5.85 9.61 15.15 17.05 18.83 21.11

245 4 0 14.80 16.32 17.69 19.39 26.57 35.59 38.49 41.15 44.27 4 1 14.46 15.94 17.30 18.98 26.06 34.95 37.81 40.45 43.55 4 2 12.11 13.42 14.62 16.10 22.47 30.59 33.25 35.66 38.48 4 3 7.31 8.15 8.96 9.99 14.54 20.74 22.83 24.79 27.21

498 1 0 0.00 0.00 0.01 0.02 0.60 2.96 4.12 5.30 6.91

497 2 0 1.25 1.61 2.00 2.54 5.46 10.37 12.16 13.92 16.14 2 1 1.08 1.39 1.72 2.19 4.80 9.36 11.08 12.75 14.86

4% 3 0 6.16 7.07 7.98 9.12 14.28 21.42 23.84 26.07 28.85 3 1 5.86 6.75 7.61 8.72 13.73 20.69 23.06 25.27 27.98 3 2 3.84 4.47 5.09 5.90 9.74 15.43 17.42 19.28 21.66

495 4 0 14.95 16.50 17.90 19.65 26.99 36.31 39.32 42.10 45.34 4 1 14.62 16.12 17.51 19.23 26.48 35.67 38.64 41.42 44.62 4 2 12.23 13.57 14.80 16.33 22.86 31.27 34.04 36.58 39.56 4 3 7.40 8.27 9.09 10.14 14.82 21.28 23.49 25.55 28.13

00 1 0 0.00 0.00 0.01 0.02 0.60 2.97 4.13 5.31 6.94

2 0 1.25 1.62 2.01 2.55 5.49 10.48 12.31 14.10 16.42 2 1 1.08 1.39 1.72 2.20 4.83 9.48 11.23 12.88 15.14

3 0 6.19 7.11 8.03 9.18 14.46 21.78 24.26 26.54 29.41 3 1 5.89 6.78 7.66 8.77 13.91 21.06 23.47 25.75 28.56 3 2 3.87 4.49 5.13 5.95 9.88 15.71 17.82 19.77 22.22

4 0 15.10 16.67 18.12 19.91 27.41 37.04 40.17 43.08 46.47 4 1 14.78 16.29 17.73 19.50 26.89 36.40 39.51 42.44 45.69 4 2 12.36 13.72 15.00 16.56 23.25 31.96 34.81 37.54 40.71 4 3 7.50 8.39 9.21 10.31 15.11 21.83 24.18 26.31 29.07

NOTE. This table was constructed by Heon Jin Park. See equation (10.3.48) for the definition of the statistics.


Table 10.A.7. Vector Ordinary Least Squares: Mean Removed

D.f.

22

21

20

19

47

46

45

44

97

96

95

94

Probability of a Smaller Value

H, H, 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

1 0 0.00 0.02 0.07 0.24 2.13 5.20 6.25 7.23 8.44

2 0 2.51 2.99 3.51 4.18 7.32 11.27 12.57 13.68 15.03 2 1 2.07 2.46 2.85 3.38 5.79 8.92 9.89 10.75 11.75

3 0 7.90 8.74 9.50 10.46 14.38 18.93 20.30 21.53 23.03 3 1 7.33 8.11 8.80 9.67 13.18 17.17 18.36 19.43 20.65 3 2 4.53 5.05 5.53 6.13 8.59 11.41 12.23 12.93 13.73

4 0 15.22 16.27 17.23 18.37 22.74 27.65 29.10 30.41 32.M 4 1 14.62 15.63 16.53 17.60 21.69 26.18 27.50 28.66 29.99 4 2 11.78 12.62 13.37 14.28 17.69 21.34 22.41 23.30 24.32 4 3 6.79 7.33 7.82 8.39 10.64 13.02 13.65 14.19 14.81

1 0 0.00 0.02 0.07 0.25 2.30 5.88 7.20 8.43 9.99

2 0 2.69 3.28 3.85 4.63 8.33 13.42 15.07 16.57 18.40 2 1 2.23 2.68 3.13 3.74 6.65 10.80 12.25 13.51 15-10

3 0 8.92 9.93 10.91 12.13 17.22 23.46 25.41 27.19 29.30 3 1 8.28 9.23 10.11 11.24 15.87 21.60 23.36 24.98 26.90 3 2 5.15 5.81 6.43 7.20 10.54 14.88 16.23 17.44 18.92

4 0 18.02 19.46 20.75 22.33 28.58 35.83 38.06 40.07 42.40 4 1 11.33 18.74 19.95 21.45 27.42 34.28 36.36 38.28 40.49 4 2 14.05 15.23 16.28 17.57 22.72 28.65 30.48 32.09 33.99 4 3 8.19 8.96 9.65 10.50 14.01 18.23 19.54 20.65 22.10

I 0 0.00 0.02 0.07 0.26 2.38 6.26 7.72 9.10 10.82

2 0 2.77 3.40 4.03 4,86 8.89 14.60 16.53 18.29 20.50 2 1 2.31 2.79 3.27 3.93 7.12 11.90 13.59 15.11 17.07

3 0 9.44 10.58 11.66 13.03 18.79 26.10 28.45 30.61 33.26 3 1 8.78 9.84 10.82 12.08 17.38 24.18 26.36 28.37 30.83 3 2 5.48 6.21 6.91 7.77 11.65 16.93 18.67 20.30 22.26

4 0 19.47 21.12 22.61 24.47 31.87 40.67 43.45 45.94 48.% 4 I 18.75 20.34 21.76 23.53 30.65 39.07 41.72 44.14 47.05 4 2 15.24 16.59 17.83 19.35 25.59 33.08 35.44 37.58 40.13 4 3 8.93 9.81 10.64 11.66 15.99 21.50 23.29 24.91 26.97


Table 10.A.7.-Continued

Probability of a Smaller Value

D.f. Ho H, 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

247 1 0 0.00 0.02 0.07 0.26 2.43 6.47 8.01 9.48 11.34

246 2 0 2.82 3.47 4.13 5.00 9.23 15.36 17.47 19.42 21.93 2 1 2.35 2.86 3.36 4.04 7.41 12.60 14.44 16.17 18.38

245 3 0 9.76 10.99 12.13 13.58 19.77 27.83 30.45 32.88 35.92 3 1 9.08 10.21 11.26 12.60 18.33 25.85 28.33 30.63 33.49 3 2 5.69 6.46 7.20 8.13 12.35 18.29 20.32 22.24 24.57

244 4 0 20.38 22.15 23.78 25.81 33.97 43.88 47.05 49.89 53.48 4 1 19.64 21.35 22.90 24.85 32.71 42.25 45.32 48.09 51.53 4 2 15.99 17.45 18.81 20.50 27.44 36.03 38.79 41.32 44.39 4 3 9.41 10.36 11.27 12.40 17.28 23.73 25.91 27.93 30.44

497 1 0 0.00 0.02 0.07 0.26 2.45 6.52 8.10 9.59 11.53

4% 2 0 2.84 3.49 4.15 5.04 9.33 15.62 17.79 19.82 22.43 2 1 2.36 2.88 3.39 4.07 7.50 12.83 14.72 16.54 18.85

495 3 0 9.86 11.13 12.28 13.75 20.10 28.43 31.15 33.68 36.84 3 1 9.19 10.34 11.41 12.76 18.65 26.43 29.02 31.41 34.42 3 2 5.76 6.54 7.30 8.25 12.59 18.77 20.91 22.92 25.38

494 4 0 20.69 22.51 24.19 26.27 34.70 45.00 48.32 51.30 55.09 4 1 19.% 21.69 23.30 25.30 33.42 43.37 46.59 49.48 53.13 4 2 16.25 17.75 19.15 20.90 28.08 37.07 39.98 42.65 45.93 4 3 9.58 10.56 11.49 12.65 17.72 24.53 26.85 29.02 31.70

co 1 0 0.00 0.02 0.07 0.26 2.46 6.56 8.16 9.68 11.74

2 0 2.86 3.51 4.17 5.07 9.41 15.89 18.11 20.21 22.94 2 1 2.36 2.88 3.41 4.10 7.59 13.05 14.99 16.93 19.36

3 0 9.97 11.25 12.45 13.91 20.42 29.03 31.85 34.49 37.73 3 1 9.29 10.46 11.56 12.92 18.96 27.02 29.70 32.18 35.32 3 2 5.83 6.62 7.39 8.37 12.83 19.27 21.54 23.59 26.19

4 0 21.04 22.89 24.63 26.74 35.44 46.17 49.62 52.77 56.73 4 1 20.31 22.07 23.73 25.76 34.14 44.55 47.90 50.91 54.79 4 2 16.54 18.09 19.52 21.33 28.72 38.12 41.20 44.00 47.55 4 3 9.78 10.78 11.73 12.92 18.17 25.33 27.85 30.17 33.01

NOTE. This table was constructed by Heon Jin Park. See equation (10.3.48) for the definition of the statistics.


Table 10.A.8. Vector Ordinary Least Squares: Linear Trend Removed

Probability of a Smaller Value

D.f. H, H, 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

21

20

19

18

46

45

44

43

96

95

94

93

I

2 2

3 3 3

4 4 4 4

1

2 2

3 3 3

4 4 4 4

1

2 2

3 3 3

4 4 4 4

0 0.08 0.33 0.71 1.32 3.89 7.19 8.24 9.22 X0.32

0 4.20 4.93 5.62 6.49 10.09 14.22 15.44 16.51 17.78 1 3.24 3.73 4.20 4.79 7.31 10.30 11.18 11.96 12.83

0 10.65 11.62 12.49 13.56 17.72 22.26 23.62 24.78 26.18 1 9.56 10.38 11.14 12.04 15.50 19.26 20.33 21.27 22.39 2 5.73 6.26 6.75 7.35 9.70 12.22 12.95 13.54 14.21

0 18.56 19.68 20.64 21.84 26.30 31.04 32.44 33.67 35.12 1 17.52 18.51 19.40 20.47 24.41 28.53 29.74 30.79 31.% 2 13.77 14.55 15.28 16.17 19.37 22.62 23.56 24.34 25.22 3 7.79 8.28 8.74 9.30 11.34 13.39 13.93 14.38 14.87

0 0.11 0.40 0.80 1.43 4.32 8.44 9.86 11.13 12.76

0 4.71 5.56 6.41 7.46 11.95 17.50 19.31 20.90 22.79 1 3.57 4.16 4.75 5.48 8.74 13.06 14.46 15.75 17.29

0 12.52 13.79 14.96 16.38 22.08 28.75 30.81 32.67 34.83 1 11.18 1230 13.30 14.51 19.47 25.33 27.13 28.77 30.70 2 6.79 7.49 8.16 8.98 12.46 16.71 18.07 19.31 20.78

0 22.95 24.58 25.98 27.73 34.47 42.06 44.25 46.18 48.65 1 21.60 23.05 24.39 24.00 32.18 39.12 41.20 43.01 45.21 2 17.06 18.31 19.46 20.82 26.03 31.92 33.69 35.21 37.14 3 9.77 10.59 11.31 12.17 15.69 19.81 21.05 22.17 23.45

0 0.13 0.43 0.86 1.50 454 9.09 10.72 12.23 14.13

0 4.% 5.89 6.79 7.94 12.95 19.36 21.46 23.41 25.75 t 3.76 4.39 5.02 5.84 9.51 14.62 16.36 17.98 20.02

0 13.48 14.88 16.22 17.85 24.49 32.53 35.06 37.37 40.12 1 12.02 13.26 14.40 15.82 21.69 28.90 31.22 33.31 35.84 2 7.32 8.12 8.89 9.85 14.04 19.47 21.28 22.90 24.95

0 25.21 27.12 28.81 30.92 39.08 48.57 51.44 54.05 57.18 1 23.71 25.44 27.06 29.01 36.60 45.50 48.22 50.68 53.57 2 18.80 20.32 21.70 23.35 29.88 37.63 40.07 42.16 44.76 3 10.85 11.83 12.71 13.78 18.26 23.89 25.69 27.33 29.28


Table 10.A.8.-Continued

Probability of a Smaller Value

D.f. H, H, 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99 ~

246 1 0 0.14 0.44 0.88 1.54 4.68 9.49 11.26 12.93 15.03

245 2 0 5.10 6.08 7.02 8.22 13.56 20.57 22.87 25.06 27.74 2 1 3.86 4.53 5.18 6.05 9.99 15.66 17.63 19.48 21.85

244 3 0 14.05 15.55 16.98 18.75 26.02 35.01 37.89 40.52 43.72 3 I 12.53 13.84 15.08 16.63 23.10 31.26 33.95 3637 39.36 3 2 7.63 8.51 9.34 10.40 15.05 21.33 23.47 25.39 27.84

243 4 0 26.61 28.71 30.58 32.92 42.05 52.92 56.31 59.42 63.11 4 1 25-01 26.95 28.73 30.89 39.47 49.76 52.99 55.92 59.43 4 2 19.91 21.59 23.11 24.94 32.38 41.49 44.42 47.00 50.16 4 3 11.54 12.61 13.59 14.81 19.96 26.71 28.96 31.02 33.54

4% 1 0 0.14 0.45 0.89 1.55 4.72 9.62 11.44 13.17 15.34

495 2 0 5.15 6.13 7.09 8.32 13.76 20.99 23.39 25.66 28.46 2 1 3.89 4.57 5.24 6.11 10.15 16.03 18.09 20.03 22.49

494 3 0 14.23 15.77 17.23 19.05 26.55 35.87 38.88 41.64 45.02 3 1 12.70 14.04 15.30 16.91 23.59 32.10 34.91 37.45 40.63 3 2 7.72 8.64 9.49 10.58 15.40 21.98 24.25 26.28 28.89

493 4 0 27.09 29.26 31.18 33.61 43.09 54.47 58.05 61.33 65.28 4 1 25.46 27.48 29.30 31.53 40.46 51.26 54.69 57.80 61.59 4 2 20.31 22.03 23.58 25.49 33.26 42.87 45.95 48.74 52.18 4 3 11.79 12-88 13.89 15.16 20.56 27.73 30.15 32.38 35.13

m 1 0 0.14 0.45 0.89 1.55 4.77 9.77 11.63 13.41 15.67

2 0 5.19 6.16 7.16 8.41 13.95 21.44 24.00 26.32 29.24 2 1 3.91 4.60 5.29 6.17 10.30 16.43 18.59 20.64 23.15

3 0 14.40 16.00 17.47 19.36 27.08 36.76 39.92 42.83 46.41 3 1 12.85 14.23 15.54 17.19 24.08 32.97 35.92 38.57 41.97 3 2 7.82 8.77 9.64 10.78 15.75 22.64 25.05 27.25 30.00

4 0 27.60 29.86 31.81 34.31 44.16 56.11 59.88 63.27 67.64 4 1 25.94 28.06 29.89 32.18 41.48 52.80 56.47 59.74 63.97 4 2 20.75 22.50 24.06 26.05 34.15 44.30 47.50 50.57 54.44 4 3 12.06 13.16 14.19 15.51 21.19 28.78 31.42 33.81 36.88

NOTE. This table was constructed by Heon Jin Park. See equation (10.3.48) for the definition of the statistics.


Table 10.A.9. Percentiles of Test Statistics for Test of Unit Root of Moving Average

Sample Size n 0.01 0.025 0.05 0.10 0.50 0.90 0.95 0.975 0.99

Probability of a Smaller Value

τ̂ of (10.4.8)

25 -3.87 -3.47 -3.16 -2.83 -1.88 -1.28 -1.16 -1.08 -1.00 50 -3.71 -3.37 -3.09 -2.79 -1.89 -1.28 -1.16 -1.07 -0.98

100 -3.64 -3.32 -3.06 -2.77 -1.89 -1.28 -1.16 -1.06 -0.97 250 -3.59 -3.29 -3.04 -2.76 -1.90 -1.28 -1.16 -1.06 -0.97 500 -3.58 -3.28 -3.03 -2.75 -1.90 -1.28 -1.16 -1.06 -0.97 750 -3.57 -3.28 -3.02 -2.75 -1.90 -1.28 -1.16 -1.06 -0.96 W -3.56 -3.27 -3.02 -2.75 -1.90 -1.28 -1.16 -1.06 -O.%

τ̂_τ of (10.4.15)

25 -4.51 -4.09 -3.75 -3.40 -2.39 50 -4.28 -3.93 -3.64 -3.33 -2.39

100 -4.17 -3.85 -3.58 -3.29 -2.39 250 -4.10 -3.80 -3.55 -3.27 -2.40 500 -4.08 -3.78 -3.54 -3.27 -2.40 750 -4.07 -3.78 -3.53 -3.26 -2.40 m -4.06 -3.77 -3.53 -3.26 -2.40

NOTE. This table was constructed by David A. Dickey.

-1.69 -1.55 -1.45 -1.70 -1.56 -1.45 -1.71 -1.56 -1.45 -1.72 -1.56 -1.45 -1.72 -1.56 -1.45 -1.72 -1.57 -1.45 -1.72 -1.57 -1.45

-1.34 - 1.34 - 1.33 -1.33 - 1.33 -1.33 -1.33


A P P E N D I X B

Appendix 10.B. Data Used in Examples

Table 10.B.1. Seasonally Adjusted Interest Rates

Obs. U1 U2 U3

1 2 3 4 5 6 7 8 9 10 1 1 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

4.12808 4.1 1977 4.02260 4.08650 3.89452 3.29188 3.13677 2.78538 2.41740 2.34835 2.35063 2.00812 1.58808 2.68977 2.20260 1.66650 2.02452 1.70188 1.06678 1.80538 1.69740 2.13835 2.53M2 2.35812 2.27808 2.51977 2.88260 2.85650 2.33452

4.39345 4.10203 3.39690 3.23849 3.33345 2.52677 2.25655 2.15797 2.39310 2.29151 2.32655 2.18323 2.28345 2.56203 2.47690 2.29849 2.33345 2.39677 2.19655 2.24797 2.193 10 2.29151 2.43655 2.53323 2.76345 2.87203 2.80690 2.73849 2.72345

5.02055 4.47400 3.821 10 3.62040 3.65055 3.15180 2.95945 2.64600 2.71890 2.84960 2.79945 2.58820 2.70055 2.92400 2.901 10 2.81040 2.79055 2.85180 2.7 1945 2.73600 2.73890 2.82960 2.82945 2.91820 3.26055 3.38400 3.12110 2.97040 2.98055


Table 10.B.1.-Continued

Obs. u 1 u 2 u 3

30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74

2.65188 2.61678 2.73538 2.71740 2.77835 2,85062 2.95812 3,04808 3.14977 3.16260 3.06651 3.04452 2.96188 2.92678 3.29538 3.29740 3.37835 3.39062 2.4081 1 3.61808 3.62937 3.61260 3.63651 3.54453 3.47 188 3.32678 3.30538 3.26740 3.23835 3.43062 3.8781 1 4.03808 4.12977 4.22260 4.2565 1 4.14453 4.01 188 3.99678 3.92538 3,82740 3.95835 4.01062 4.3481 1 4.55808

2.79677 2.87655 2.67797 2.69310 2.73 15 1 2.78655 2.80323 2.95345 3.06203 2.97690 3.90849 2.96345 3.05677 3.13656 3.17797 3.293 10 3.44151 3.47655 3.45323 3.56344 3.67203 3.62690 3.47849 3.52345 3.54677 3.41656 3.35797 3.44309 3.56151 3.59655 3.77323 3.85344 4.07203 4.01691 3.93849 3.93345 3.86677 3.79656 3.69797 3.83309 4.02151 4.04655 4.31323 4.63344

4.74977 4.79203

2.94180 3.09945 2.92600 2.84890 2.82960 2.86945 2.88820 3.07055 3.17399 3.1 1 110 3.10040 3.13055 3.16180 3.32946 3.32601 3.42890 3.53960 3.59945 3.63820 3.75054 3.88399 3.921 10 3.82040 3.78055 3.75180 3.56946 3.49601 3.58890 3.71960 3.78945 3.90820 3.98054 4.17399 4.161 10 4.07040 4.03055 3.94180 3.81946 3.78601 3.94890 4.08960 4.15945 4.50820 4.76054 4.98399


Obs. Ul u2 u3

75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 1 0 101 102 103 104 105 106 107 108 109 110 I l l 112 113 114 115 116 117 118 119

4.83260 4.8365 1 4.94453 5.14188 5.20678 5.33538 5.21740 5.40835 5.68062 5.4281 1 5.07808 5.14977 4.71260 4.21651 3.98453 3.95188 3.69678 3.69538 3.8 1739 3.75835 4.03062 4.5381 1 4.73808

5.23260 5.92651 6.16453 6.04188 5.92678 5.83538 5.59739 5.79835 5.72062 6.048 1 1 6.43807 6.78977 6.97260 7.5765 1 8.71453 8.87188 8.51678 8.99538 8.96739 8.87835 8.76062

4.86977

4.67691 4.62849 4.68345 4.56677 4.75656 4.81797 5.28309 5.34151 5.27655 4.89323 4.76344 4.70203 4.34691 3.84849 3.64345

4.16656 4.12797 4.33309 4.55 15 1 4.68656 4.90323 5.04344 5.12203 525691 5.38849 5.70344 5.58677 5.26656 4.94797 5.10309 5.34151 5.40656 5.89323 6.18344 6.26203 6.10691 6.11849 6.08344 6.50677 6.95656 6.83797 7.0309 6.99151 7.19656

3.60677

4.95111 4.81040 4.92055 4.83180 4.86946 5.16601 5.65889 5.44960 5.41945 4.94819 4.68054 4.74399 4.321 11 3.97040 3.95055 4.21 181 4.82946 4.87601 4.95889 5.14960 5,31945 5.53819 5.37054 5.41399 5.561 11 5.54040 5.91055 5.73181 5.31946 4.97601 5.03889 5.25960 5.43945 5.9 18 19 6.14054 6.38399 6.341 11 6.1 1040 6.18055 6.93181 7.11946 7.10601 7.20889 7.13960 7.28945


Table 10.B.1.-Continued

Obs. Ul u 2 u 3

1 20 121 1 22 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 1 62 I 63 164

8.99810 9.11807 9.12977 7.94260 8.26651 7.98453 7.57 188 7.1 1678 6.41538 6.10739 6.07835 5.51062 4.92810 4.27807 3.86977 3.89260 4.31651 4.67453 4.88188 5.21679 5.37538 5.36739 5.07835 4.82062 4.16810 3.63807 3.43977 4.0126 4.3365 4.3 145 4.4319 4.4568 4.6054 4.6874 4.9183 4.9706 5.3581 6.0781 6.7298 7.2726 7.2865 7.8845 8.4619

10.3068 10.3054

7.75323 7.91344 7.27203 6,71691 6.51849 6.88344 6.7467'7 6.40657 6.26797 6.04309 5.90151 5.23656 4.80322 4.48343 3.84203 3.4669 1 3.86849 4.18344 4.81678 5.35657 4.79798 4.60309 4.45 151 4.17656 3.94322 3.42343 3.34202 3.81691 3.7 1 849 3.73344 3.97678 3.93657

4.57309 4.73151 4.73656 5.00322 5.45343 5.74202 6.17691 6.26849 6.40344 7.25678 7.96657 8.52798

3.87798

7.56819 7.58054 7.22399 6.641 11 6.60040 7.19055 7.121 81 6.54947 6.37601 6.24889 6.15960 5.31945 4.78819 4.47053 4.01 399 3.741 11 4.16040 4.71055 5.38181 5.66947 5.34601 5.04889 4.67960 4.41945 4.34819 3.89053 4.23399 4.571 11 4.72040 4.53055 4.76181 4.82947 4.72601 5.29889 5.31960 5.12945 5.22819 5.65053 6.10399 6.67111 6.58040 6.70055 7.10181 7.89947 8.14601


Table 10.B.1.-Continued

obs. u1 u2 u3

165 166 167 168 1 69 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 1 93 194 195 1% 197 198 199 200 201 202 203 204 205 206 207 208 209

10.5974 9.8883 9.9406 9.978 1 9.7881 9.1198 9.5326 10.6765 1 1.3545 11.9019 12.8268 11.8154 11.1574 9.9383 9.3606 8.5581 7.2681 6.3898 5.7226 5.6565 5.2645 5.5219 6.0068 5.9454 6.0574 5.6983 5.1306 5.2281 5.008 1 4.9198 5.0226 4.9865 5.3345 5.4519 5.2168 5.0954 5.0674 4.9083 4.8606 4.6781 4.7481 4.8298 4.8726 4.8965 5.3945

8.20309 7.21151 7.78656 7.38322 7.81343 7.26202 8.04692 8.33849 8.27344 7.96678 7.50657 8.81798 7.97308 7.45151 7.42656 7.08322 6.30343 5.64202 5.57692 5.61849 5.27344 5.40678 6.08657 6.29798 6.33308 5.95151 5.43656 5.37322 4.91343 5.02202 5.08692 4.86849 5.24344 5.47678 5.18657 4.99798 4.99308 4.91151 4.70656 4.28322 4.66343 4.81202 4.68692 4.54849 5.00344

7.92889 7.09960 7.32945 6.958 19 7.08053 6.68399 7.481 1 1 8.15040 8.28055 8.21181 7.96947 8.70601 8.37889 7,51960 7.21945 6.73819 6.34053 5.73399 5.841 1 1 6.47040 5.98055 5.91181 6.56947 6.98601 7.05889 6.40960 5.99945 6.10819 5.15053 5.70399 5.961 I1 5.61040 6.05055 6.17 18 1 5,74947 5.46601 5.35889 5.11960 4.92945 4588 19 5.07052 5.33399 5.331 11 5.17040 5.50055


Table 10.B.1.-Continued

ObS. u1 u2 u3

210 21 1 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 23 I 232 233 234 235 236

5.3619 5.3268 5.7054 5.9574 6.3483 6.4206 6.5881 6.8381 6.9298 6.9726 7.0565 7.4045 7.5719 7.7 168 7.8454 8.2674 8.8383 9.6706

10.0581 10.208 1 10.2098 10.2726 10.1765 10.2845 10.2619 10.3768 10.7454

5.08678 5.14658 5.34798 5.72308 6.15151 6.05656 6.00322 6.48342 6.59202 6.37692 6.29849 6.45344 6,79678 6.96658 6.93798 7.76308

8.59656 9.01322 9.39342 9.46202 9.56692 9.46849 9.65344 9.12678 9.19658 9.37798

7.9aisi

5.46181 5.49948 5.79601 5.98889 6.44960 6.44945 6.46819 6.87052 7.03399 6.961 12 7.03040 7.35055 7.58181 7.71948 7.55601 7.86888 8.37960 9.12945 9.38819 9.61052 9.56399 9.521 12 9.35040 9.34055 8.86181 8.79948 8.98601

SOURCE. Banking and Monetary Statistics: 1941-1970, Federal Reserve System, Washington, D.C., and Annual Statistical Digest (1970-1979), Federal Reserve System, Washington, D.C. These data differ from those in the original sources because they are seasonally adjusted in a different manner.
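These series appear in the vector examples of this chapter. As a sketch of a first computation one might perform with them, the code below fits a first order vector autoregression by least squares; U is assumed to be an array with one column per series and 236 rows, keyed in by the reader from the table.

    import numpy as np

    def var1_ols(U):
        # Fit U_t = c + A U_{t-1} + e_t by ordinary least squares,
        # returning the intercept vector c and the coefficient matrix A.
        Y = U[1:]
        X = np.column_stack([np.ones(len(U) - 1), U[:-1]])
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # B has shape (4, 3)
        return B[0], B[1:].T

    # Eigenvalues of A near one point to unit roots in the vector process,
    # the situation covered by Tables 10.A.6 through 10.A.8.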


Table 10.B.2. Artificial Autoregressive Moving Average Data

Time Observation Time Observation Time Observation

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

-0.589 -0.665

1.655 2.689 1.695

-2.183 -6.983 - 10.053 - 10.566 -8.129 -2.242

4.792 9.154 9.521 6.793 2.258

-2.285 -4.200 -2.712 -0.547

0.389 0.731

- 1.075 -2.228 -2.305 -0.519

1.788 4.267 5.913 5.483 3.745 2.017 0.3 1 8

- 1.804

35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67

-4.546 -5.403 -4.41 1 -3.709 -0,834

2.498 3.103

-0.698 -6.6% - 10.295 -8.954 -4.039

2.415 8.275

12.414 13.220 9.959 3.281

-3.934 -8.842 -9.985 -7.533 -3.458

2.159 8.960

14.563 1 5.020 9.890 3.059

-3.910 -7.551 -8.689 -7.357

68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 % 97 98 99

100

-5.698 -3.757 - 1.778 -0.637

1.473 3.078 3.157 3.902 5.557 6.295 4.099

-0.5 18 -5.132 -7.221 -6.819 -5.023 -3.389 -1.546

1.469 3.959 3.387 0.777

-2.547 -5.955 -8.043 -5.394 - 0.869

2.738 3.708 3.245

-0.757 -5.353 -8.213


Table 10.B.3. Computer Generated Moving Average Data

Time Observation Time Observation Time Observation

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

0.0824 -0.7421 -2.5863 - 1.0847

0.08 1 I 2.1080 1.2283 0.2222 0.1139 0.9652 1.7643 22358 0.8803 0.8975 0.8376

-0.3025 - 2.5 194 -2.7883 -3.5046 -4.1382 -3.6002 - 1.5214

2.0135 3.2380 1.5426 1.1661 0.6807

-0.8016 - 1 S89O -1.1519 - 1.33%

0.0265 1.785 1 0.1%5

35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67

-0.2092 1.1027 0.7200

-0.1 186 -0.7560 -1.1736

0.1670 0.4029 1.4065

-0.6458 - 1.8765 - 1.6670 -0.9132 0.9964 1.6263 0.5360 1 .O236 0.7965 0.2285

-0.2926 -1.31 10

0.3846 1.9833

-0.1662 -0.0513

3,0565 1.9968 0.4235

-1.3013 - 1,235 1 - 1.7 102 - 1.2697 -0.6817

68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

100

-0.8659 -0.3166

1.2318 2.6148 I .3507 0.3786 0.0751

-0.1 134 -0.9633 -2.1476 - 1.6736

0.2056 2.3535 1.7980

-0.1434 - 1.47% - 1.4369 -0,7226

0.4259 1.3133 1.8455 0.5274

- 1.8385 -3.3115 - 1.7593

0.0670 0.4853 3.0867 1.7763 1.5428 0.2979

- 1.4703 - 1.4294


Table 10.B.4. Computer Generated Data for Vector Processes

Time X_t Y_t Z_t
1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44

-5.59 -6.85 -7.25 -6.72 -9.56 -8.20 -7.93 -8.85 -9.55 -7.56 -6.47 -4.98 -3.14 - 1.65 -0.78 -0.65

2.67 5.28 5.96 8.97

13.18 14.49 17.45 19.15 20.99 20.56 19.87 19.88 18.84 18.25 17.46 16.03 14.89 14.69 15.46 14.56 14.45 14.27 14.63 14.12 14.89 15.09 14.16 13.46

- 10.78 - 12.1 1 -13.21 - 12.48 - 12.73 - 13.50 -13.66 - 13.72 - 14.25 - 12.61 -9.67 -7.05 -4.43 -2.79 -3.03 -3.47 -0.30

5.08 8.73

12.29 15.86 19.61 21.46 22.98 23.74 23.45 22.11 21.98 21.47 20.14 18.61 17.23 16.32 14.99 15.41 15.60 14.62 14.44 14.98 16.56 16.39 16.30 15.56 13.34

-6.64 -7.91 -8.12 -6.14 -9.75 -8.82 - 8.06 -8.89 -9.98 -6.24 -4.13 -2.88 -1.05 -0.33 -0.98 - 1.00

5.21 9.59 8.88

1 1.83 16.03 17.49 18.93 20.37 21.60 20.33 18.80 19.77 18.44 17.19 16.24 14.93 14.17 13.62 15.80 14.71 13.66 14.13 15.06 15.38 14.75 15.02 13.57 11.68


Time XI Y, Zl

45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88

13.96 12.93 10.99 9.5 I 9.63 8.58 8.00 5.16 3.46 4.60 3.10 3.53 5.53 5.20 6.41 6.20 6.30 7.63 8.95 9.25

1 1.25 15.56 17.16 19.77 19.82 24.38 23.36 24.19 25.47 23.57 22.30 22.44 23.50 23.66 25.15 26.57 27.17 28.03 28.43 27.36 27.87 27.17 26.52 24.90

12.44 11.53 11.53 10.19 8.48 7.14 5.00 3.39 I .33 1.13 1.14 2.69 4.21 4.70 4.57 5.29 5.93 7.08 9.84

1211 14.41 17.82 21.12 24.07 24.78 26.91 28.22 28.20 28.35 27.55 25.98 25.37 26.42 28.76 30.08 31.39 32.18 33.44 33.19 32.43 30.06 29-50 29.52 28.90

13.25 12.20 10.99 8.44 8.26 7.50 6.29 3.88 1.81 4.43 3.11 4.77 6.75 5.59 6.3 1 6.78 6.81 8.55

11.15 11.07 13.08 18.29 19.81 22.12 20.39 26.08 24-40 24.18 29.59 22.94 21.05 21.95 24.34 25.53 26.21 27.62 27.81 29.04 28.22 26.74 25.98 26.72 26.54 24.40


Table 10.B.4.-Continued

Time X, Y, 2,

89 90 91 92 93 94 95 96 97 98 99

100

24.56 24.04 21.84 23.68 22.50 20.36 20.53 20.59 19.75 17.93 15.65 15.12

27.58 26.97 26.34 26.11 24.44 24.31 23.21 22.00 20.66 18.60 16.22 14.62

23.50 23.56 21.33 23.50 21.16 20.30 19.60 19.63 18.67 16.28 13.75 13.83


Bibliography

Agiakloglou, C., and Newbold, P. (1992), Empirical evidence on Dickey-Fuller type tests. J. Time Series Anal. 13, 471-483.

Ahlberg, J. H., Nilson, E. N., and Walsh, J. L. (1967), The Theory of Splines and Their Applications. Academic Press, New York.

Ahn, S. K., and Reinsel, G. C. (1988), Nested reduced-rank autoregressive models for multiple time series. J. Amer. Statist. Assoc. 83, 849-858.

Ahn, S. K., and Reinsel, G. C. (1990), Estimation for partially nonstationary multivariate autoregressive models. J. Amer. Statist. Assoc. 85, 813-823.

Ahtola, J., and Tiao, G. C. (1984), Parameter inference for a nearly nonstationary first order autoregressive model. Biometrika 71, 263-272.

Ahtola, J., and Tiao, G. C. (1987), Distributions of least squares estimators of autoregressive parameters for a process with complex roots on the unit circle. J. Time Series Anal. 8, 1-14.

Aigner, D. J. (1971), A compendium on estimation of the autoregressive moving average model from time series data. Int. Econ. Rev. 12, 348-371.

Akaike, H. (1969a), Fitting autoregressive models for prediction. Ann. Inst. Statist. Math. 21, 243-247.

Akaike, H. (1969b), Power spectrum estimation through autoregressive model fitting. Ann. Inst. Statist. Math. 21, 407-419.

Akaike, H. (1973), Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60, 255-265.

Akaike, H. (1974), Markovian representation of stochastic processes and its application to the analysis of autoregressive moving average processes. Ann. Inst. Statist. Math. 26, 363-387.

Amemiya, T. (1973), Generalized least squares with an estimated autocovariance matrix. Econometrica 41, 723-732.

Amemiya, T., and Fuller, W. A. (1967), A comparative study of alternative estimators in a distributed lag model. Econometrica 35, 509-529.

Amemiya, Y. (1990), On the convergence of the ordered roots of a sequence of determinantal equations. Linear Algebra Appl. 127, 531-542.

Amos, R. E., and Koopmans, L. H. (1963), Tables of the distribution of the coefficient of coherence for stationary bivariate Gaussian processes. Sandia Corporation Monograph SCR-483, Albuquerque, New Mexico.

Anderson, B. D. O., and Moore, J. B. (1979), Optimal Filtering. Prentice-Hall, Englewood Cliffs, New Jersey.

Anderson, R. L. (1942), Distribution of the serial correlation coefficient. Ann. Math. Statist. 13, 1-13.


Anderson, R. L., and Anderson, T. W. (1950), Distribution of the circular serial correlation coefficient for residuals from a fitted Fourier series. Ann. Math. Statist. 21, 59-81.

Anderson, T. W. (1948), On the theory of testing serial correlation. Skand. Aktuarietidskrift 31, 88-116.

Anderson, T. W. (1952), Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Statist. 23, 193-212.

Anderson, T. W. (1959), On asymptotic distributions of estimates of parameters of stochastic difference equations. Ann. Math. Statist. 30, 676-687.

Anderson, T. W. (1962), The choice of the degree of a polynomial regression as a multiple decision problem. Ann. Math. Statist. 33, 255-265.

Anderson, T. W. (1971), The Statistical Analysis of Time Series. Wiley, New York.

Anderson, T. W. (1984), Introduction to Multivariate Statistical Analysis (2nd ed.). Wiley, New York.

Anderson, T. W., and Darling, D. A. (1952), Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Statist. 23, 193-212.

Anderson, T. W., and Takemura, A. (1986), Why do noninvertible estimated moving averages occur? J. Time Series Anal. 7, 235-254.

Anderson, T. W., and Taylor, J. B. (1979), Strong consistency of least squares estimators in dynamic models. Ann. Statist. 7, 484-489.

Anderson, T. W., and Walker, A. M. (1964), On the asymptotic distribution of the autocorrelations of a sample from a linear stochastic process. Ann. Math. Statist. 35, 1296-1303.

Andrews, D. W. K. (1987), Least squares regression with integrated or dynamic regressors under weak error assumptions. Econometric Theory 3, 98-116.

Arellano, C. (1992), Testing for trend stationarity versus difference stationarity. Unpublished thesis, North Carolina State University, Raleigh, North Carolina.

Arellano, C., and Pantula, S. G. (1995), Testing for trend stationarity versus difference stationarity. J. Time Series Anal. 16, 147-164.

Ash, R. B. (1972), Real Analysis and Probability. Academic Press, New York.

Ball, R. J., and St. Cyr, E. B. A. (1966), Short term employment functions in British manufacturing industry. Rev. Econom. Stud. 33, 179-207.

Bartle, R. G. (1964), The Elements of Real Analysis. Wiley, New York.

Bartlett, M. S. (1946), On the theoretical specification and sampling properties of autocorrelated time series. Suppl. J. Roy. Statist. Soc. 8, 27-41.

Bartlett, M. S. (1950), Periodogram analysis and continuous spectra. Biometrika 37, 1-16.

Bartlett, M. S. (1966), An Introduction to Stochastic Processes with Special Reference to Methods and Applications (2nd ed.). Cambridge University Press, Cambridge.

Bartlett, M. S. (1978), An Introduction to Stochastic Processes (3rd ed.). Cambridge University Press, Cambridge.

Basawa, I. V. (1987), Asymptotic distributions of prediction errors and related tests of fit for nonstationary processes. Ann. Statist. 15, 46-58.

Basmann, R. L. (1957), A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 77-83.

Beamish, N., and Priestley, M. B. (1981), A study of AR and window spectral estimation. Appl. Statist. 30, No. 1.


Bell, W. R. (1983), A computer program (TEST) for detecting 0ui:ws in time series. 1983 Proceedings of the Bvsiness and Economics Section, h r . Siatist. Assoc,, 634-639.

Bellman, R. (1960). Inrrodwtion to Matrix Analysis. McGraw-Hill, New York. Beran, J. (1992). Statistical methods for data with long-range dependence. Sf&t. Sci. 7,

Beran, J., and Ghosh, S. (lWl), Slowly decaying comlations, testing normality, nuisance

Berk, K. N. (1974). Consistent autoregressive spectral estimates. Ann. Statist. 2,489-502. Bernstein, S . (1927). Sur I’extension du thdorhe limite du calcul des probabiiWs aux

sommes de quantitibs d6pndantes. Math. Ann. 97, 1-59. Bhansali, R. J. (1980). Autoregremive and window estimates of the inverse correlation

function. Biometrika 67, 55 1-566. Bhansali, R. J. (1981), Effects of not knowing the order of an autoregressive process I. J.

Amer. Statist. Assoc. 76, 588-597. Bhansali, R. J. (1991), Consistent recursive estimation of the order of an autoregmsive

moving average process. Int. Statist. Rev. 59, 81-96. Bhansali, R. J. (1993). Order selection for linear time series models: a review. In

Developments in Time Series Analysis. (T. Subba Rao, ed.) Chapman and Hall, New

Bhargava, A. (1986), On the theory of testing for unit roots in observed time series. Rev.

Billinglsey, P. (1968). Convergence of Probdlity Measures. Wiley, New York. Biliingsley, F? (1979). ProWility and Measure. Wiley, New Yo&. Birnbaum, Z. W. (1952). Numercial tabulation of the distribution of Kolomogorov‘s statistic

Blackman, R. B., and Tukey. J. W. (1959). The Measurement of Power S’ctrafTom the

Bloomfield. P. (1976). Fourier Analysis of Time Series: An Introduction. Why, New Yo& Bollerslev, T,, Chou, R. Y., and Kroner, K. F. (1992), ARCH modeling in finance: A review

of the theory and empirical evidence. J. Econometrics 52, 5-59. Box, G. E. P. (1954), Some theorems on quadratic forms applied in the shdy of analysis of

variance problems, I. Effect of inequality of variance in the one-way classification. Ann. Math. Statist. 2!4, 290-316.

Box, G. E. P., Jenkins, G. M., and Reinsel. G. C. (1994), Time Series Analysis: Forecusting und Control. Prentice Hall, Englewood Cliffs, New Jersey.

Box, G. E. P., and Pierce, D. A. (1970), Distribution of residual autoconelations in autorepsive-integrated moving average time series models. J. Amer. Statist. Assoc.

Box, G. E. P., and Tiao, G. C. (1977). A canonid analysis of multiple time series. Biometrika, 64, 355-365.

Breidt, P. J., and CarriqUj., A. L. (1W4), Stochastic volatility models: an application to S & P 500 Futures. Unpublished manuscript, Iowa State University, Ames, Iowa.

Breitung, J. (1994), Some simple tests of the moving-average unit root hypothesis. J. Time Series Anal. 15, 351-370.

Brillinger, D. R. (1975). Erne Series: Data Analysis and Theory. Holt, Rinehart and

404-416.

parameters. J. Amer. Sratist. Assoc. 86, 785-791.

YO&. 50-66.

&on. Stud. 53, 369-384.

for finite sample size. J, Amer. Starist. Assoc. 47, 425-441.

Point of View of Communications Engineering. Dover, New York.

65, 1509-1526.


Winston, New York.

Springer-Verlag, New York. Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods (2nd ed.).

Brown, B. M. (1971), Martingale central limit theorems. Ann. Math. Srurist, 42, 59-66. Brown, B. M., and Hewitt, J. I. (1975). Asymptotic likelihood theory for diffusion

processes. J. Appl. Prob. 12, 228-238. Brown, R. G. (1962). Smoothing. Forecasting, and Prediction of Discrete Rme Series.

Prentice-Hall, Englewood Cliffs, New Jersey. Bruce, A. G., and Martin, R. D. (1989). Leave-&out diagnostics for time series (with

discussion). J. Roy. Statist. SOC. B 51, 363-424. Burg, J. P. (1975). Maximum entropy spectral analysis, Ph.D. thesis, Stanford Univesity,

Stanford, California. Burguete, J., Gallant, A. R., and Souza, 0. (1982), On unification of the asymptotic theory

of nonlinear econometric models. Econ. Rev. 1, 151-190. Burman, J. P. (1965), Moving seasonal adjustment of economic time series. J. Roy. Statist.

Soc. A 128, 534-558. Carmll, R. J., and Ruppert, D. (1988). Trahsfoormation and Weighting in Regression.

Chapman and Hall, New York. Chan, N. H. (1988). On the parameter inference for nearly nonstationary time series. J.

Amer. Statist. Assoc. 83, No. 403, 857-862. Chan, N. H.. and Tsay, R. S. (1991), Asymptotic inference for non-invertible moving

average time series. Unpublished manuscript. Carnegie Mellon University, Pittsburgh, Pennsylvania.

Chan, N. H., and Wei, C. Z. (1987), Asymptotic inference for nearly nonstationary AR(1) processes. Ann. Statist. 15, 1050-1063.

Chan, N. H., and Wei, C. Z. (1988), Limiting distributions of least square estimates of unstable autoregmssive processes. Ann. Sturist. 16, 367-41.

Chang, I., Tiao, G. C., and Chen, C. (1988), Estimation of time wries parameters in the presence of outliers. Technometrics 30, 193-204.

Chang, M. C. (1989), Testing for overdifference. Unpublished thesis, North Carolina State University, Raleigh, North Carolina.

Chernoff, H. (1956), Large sample theory: parameteric case. Ann. Math. Stutisr. 27, 1-22. Chung, K. L. (1968). A Course in Probubility Theory. Harcourt, Brace and World, New

Cleveland, W. S. (1971). Fitting time series models for prediction. Technometrics 13,

Cleveland, W. S. (1972), The inverse autocorrelations of a time series and their applications. Technometrics 14, 277-293.

Cochran, W. T., et al. (1%7), What is the fast Fourier transform? IEEE T r m . Audio and Electroacoustics AU-15, No. 2, 45-55.

Cochrane, D., and Orcutt, 0. H. (1949), Application of least squares regression to relationships containing autocorrclated error terms. J. Amer. Srafisr. hsoc . 44.32-61.

Colley, I. W., Lewis, P. A. W., and Welch. P. D. (1%7), Historical notes on the Fast Fourier Transfm. IEEE Trans. Audio and Electroacoustics AU-15. No. 2, 76-79.

Cox, D. D. (lWl), Gaussian likelihood estimation for nearly nonstationary AR(1) processes. Ann. Statist. 19, 1129-1142.

York.

713-723.


Cox, D. R. (1977). Discussion of papers by Campbell and Walker and 2 and Morris, J.

Cox, D. R. (19841, Long-range dependence: a review. In Statistics: An Appraisal ( H . A.

Cox, D. R., and Miller, H . D. (1%5), The Theory of Stochusric Processes. Wiley, New

Cram&, H. (1940). On the theory of stationary random procasses. Ann. Math. 41,215-230. Cram&, H. (1946), Mathematical Methods of Statistics. Princeton URiversity Press,

Crowder, M. J. (1980), On the asymptotic properties of lwt squares estimators in

Cryer, 1. D., and Ledolter, J. (1981). Small sample propities of the maximum likelihood

Cryer, J. D.. Nankervis, J, C., and Savin, N. E. (1989), Mirror-image distributions in AR(1)

Dahlhaus, R. (1989), Efficient parameter estimation for self-similar processes. Ann. Statist.

Davies, R. 8. (1973), Asymptotic inference in stationary Gaussian time-series. Adv. Appl. Prob. 5. 469-497.

Davis, H. T. (1941), The Analysis of Economic Time Series. Rincipia Press, Bloomington, Indiana,

Davisson, L. D. (1965). The prediction error of stationary Gaussian time series of unknown covariance, IEEE Trans. Ittjorm. Theory 11, 527-532.

W a c i e , J. S., and Fuller, W. A. (1972), Estimation of the slope and analysis of covariance when the concomitant variable is measured with error. J. Amer, Sfarist. Assoc. 67, 930-937.

DeJong, D. N., Nankervis, J. C., Savin, N. E., and Whiteman, C. H. (1992a), Integration versus trend-stationarity in macroeconomic time serics. Economefrica 60,423-434.

DeJong, D. N., Nankervis, J. C., SavJn, N. E., and Whiteman, C. H. (t992b). The power probfems of unit root tests for time series With autoregressive errors. J. Econometrics

Denby, L., and Martin, R. D. (1979). Robust estimation of thc fvst order autoregressive

Deo, R. (1995), Tests for unit roots in mdtivruiate autonpssive praesses. Rh.D.

Dhrymes, P. J. (1971), Distributed Lugs: Probiem of Estimation and FonnrJation.

Diananda, F? H. (1953)' Some probability limit theorems with staristical applications. Proc.

Dickey, D. A. (1976). Estimation and hypothesis testing in nonstationary time series. Ph.D.

Dickey, R, A., Be& W. R., and Miller, R. B. (1986), Unit roots series model: Tests and

Dickey, D. A,, and Fuller, W. A. (1979). Distribution of the estimators for autoregressive

Roy. Stafist. SOC. A 140, 453-454.

David and H. T. David, eds.). Iowa State University Press, A m , Iowa.

York.

Princeton, New Jersey.

autoregression. Ann. Statist. 8, 132-146.

estimator in the MA( 1) model. Biometrika 68, 691-694.

models. Econometric Theory 5, 36-52.

I

17, 1749-1766.

53, 323-343.

parameter. J. Amer. Statist. Assoc. 74, 140-146.

dissertation, Iowa State University, Ames, Iowa.

Holden-Day, San Francisco.

Cambridge Philos. Soc. 49,239-246.

thesis, Iowa State University, Ames, Iowa.

implications. Amer. Statistician 4, 12-26.

time series with a unit root. J. Amer. Statist. Assoc. 74, 427-431.


Dickey, D. A., and Fuller, W. A. (1981), Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49, 1057-1072.

Dickey, D. A., Hasza, D. P., and Fuller, W. A. (19&4), Testing for unit roots in seasonal time series. J. Amer. Statist. Assoc. 79. 355-367.

Dickey. D. A., and Pantula, S. G. (1987), Determining the order of differencing in autoregressive processes. J. Business and Econ. Statist. 5, 455-461.

Dickey, D. A., and Said, S. E. (1982), Testing ARIMA (p, 1, q) vmus ARMA (p f 1,q). In Applied Time Series Analysis (0. D. Anderson, ed.). North Holland, Amsterdam.

Didemch, G. T. (1985), The Kalman filter from the perspective of Goldberger-lbil estimators. Amer. Statistician 39, 193-198.

Diebold, F. X., and Nerlove, M. ( l W ) , Unit mots in economic time series: a selective survey. Econometrica 8, 3-69.

Draper, N. R., and Smith, H. (1981). Applied Regression AnaJysis (2nd 4.). Wiley, New Yo*.

Dufour, J. M., and King, M. L. (1991), Optimal invariant tests for the autocorrelation coefficient in linear regressions with stationary or nonstationary AR(1) errors. J. Econometrics 47, 115-143.

Duncan, D. B., and Horn, S. D. (1972), Linear dynamic recursive estimation from the viewpoint of regression analysis. J. h r . Statist. Assoc. 67, 815-821.

Durbin, J., (1959), Efficient estimation of parameters in moving-average models. B k t -

W i n , J. (1960), Estimation of parameters in time-series regression models. J. Roy. Srcltist.

Durbin, J. (1%1), Efficient fitting of linear models for continuous stationary time series from discrete data. Bull. Int. Statist. Inst. 38, 273-281.

Durbin, J. (1962). Trend elimination by movingaverage and variate-difference filters. Bull. Int. Statist. Inst. 39, 131-141.

Durbin, I. (1%3), Trend elimination for tbe purpose of estimating seasonal and periodic components of time series. Pmc. Symp. Time Series Analysis, Brown University (M. Rosenblatt. ed.). Wiley, New York, 3-16.

Durbin, J. (1%7), Tests of serial independence based on the cumulated periodogram. Bull. Int. Statist. Inst. 42, 1039-10119.

Durbin, J. (1%8), The probability that the sample dishibution function lies between two parallel straight lines. Ann. Math. Statist. 39, 398-411.

Durbin, J. (1%9). Tests for serial correlation in regression analysis based on the periodogram of least-squares residuals. Biometrika 56, 1-15.

Durbin, J. (1973), Distribution Theory for Tests Based on the Sample Distribution Function. Regional Conference Series in Applied Mathematics No. 9, SLAM, Philadel- phia.

hubin, J., and Cordero, M. (1993), Handling structural shifts, outliers and heavy-tailed distributions in state space time series models. Unpublished manuscript, London School of Economics and Political Science, London.

Durbin, J., and Watson, G. S. (1950). Testing for serial conelation in least squares regression, I. BiometriRa 37. 409428.

r i b 46, 306-316.

SW. B 22, 139-153.


Rurbin, J., and Watson, G. S. (1951), Testing for serial *lion in least squares regression, II. Biomehika 38, 159-178.

Dvorekky, A. (1972), Asymptotic normality for sums of d e p c h n t mdom variables. Sixth Berkeley Symp. Math. Statist. Prob. 2, University of California Press, Berkeley, 5 t 3-535.

Eberlain, E. (1986), On strong invariance principles under dependence assumptions. Ann. Probab. 14, 260-270.

Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Sratist. 34, 447-456.

Eicker, F. (1967), Limit theorems for regressions with unequal and dependent errors. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, University of California Press, Berkeley, 59-82.

Eisenhart, C., Hastay, M. W., and Wallis, W. A. (1947), Selected Techniques of Statistical Analysis. McGraw-Hill, New York.

Elliott, G. (1993), Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution. Unpublished manuscript, Harvard University, Cambridge, Massachusetts.

Elliott, G., Rothenberg, T. J., and Stock, J. H. (1992), Efficient tests for an autoregressive unit root. Unpublished paper presented at the NBER-NSF Time Series Seminar, Chicago.

Elliott, G., and Stock, J. H. (1992), Efficient tests for an autoregressive unit root. Paper presented at the NBER-NSF Time Series Seminar, Chicago.

Eltinge, J. L. (1991), Exponential convergence properties of autocovariance matrix inverses and latent vector prediction coefficients. Unpublished manuscript, Department of Statistics, Texas A & M University, College Station, Texas.

Elton, C., and Nicholson, M. (1942), The ten-year cycle in numbers of lynx in Canada. J. Animal Ecology 11, 215-244.

Engle, R. F. (1982), Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.

Engle, R. F., and Bollerslev, T. (1986), Modelling the persistence of conditional variances. Econometric Reviews 5, 1-50.

Engle, R. F., and Granger, C. W. J. (1987), Co-integration and error correction: representation, estimation, and testing. Econometrica 55, 251-276.

Engle, R. F., and Kraft, D. F. (1981), Multiperiod forecast error variances of inflation estimated from ARCH models. Proc. ASA-Census-NBER Conference on Applied Time Series Analysis of Economic Data (A. Zellner, ed.), 176-177.

Evans, G. B. A., and Savin, N. E. (1981a), The calculation of the limiting distribution of the least squares estimator of the parameter in a random walk model. Ann. Statist. 9, 1114-1118.

Evans, G. B. A., and Savin, N. E. (1981b), Testing for unit roots: 1. Econometrica 49, 753-777.

Evans, G. B. A., and Savin, N. E. (1984), Testing for unit roots: 2. Econometrica 52, 1241-1249.

Ezekiel, M., and Fox, K. A. (1959), Methods of Correlation and Regression Analysis. Wiley, New York.


Fieller, E. C. (1932), The distribution of the index in a normal bivariate population. Biometrika 24, 428-440.

Fieller, E. C. (1954), Some problems in interval estimation. J. Roy. Statist. Soc. B 16, 175-185.

Findley, D. F. (1980), Large sample behavior of the S-array of seasonally nonstationary ARMA series. In Time Series Analysis (O. D. Anderson and M. R. Perryman, eds.), North Holland, Amsterdam, 163-170.

Findley, D. F. (1985), On the unbiasedness property of AIC for exact or approximating linear stochastic time series models. J. Time Series Anal. 6, 229-252.

Findley, D. F., and Wei, C. Z. (1989), Beyond chi-square: likelihood ratio procedures for comparing non-nested possibly incorrect regressors. Unpublished manuscript, U.S. Bureau of the Census, Washington.

Finkbeiner, D. T. (1960), Introduction to Matrices and Linear Transformations. W. H. Freeman, San Francisco.

Fisher, R. A. (1929), Tests of significance in harmonic analysis. Proc. Roy. Soc. London Ser. A 125, 54-59.

Fishman, G. S. (1969), Spectral Methods in Econometrics. Harvard University Press, Cambridge, Massachusetts.

Fountis, N. G. (1983), Testing for unit roots in multivariate autoregressions. Unpublished Ph.D. thesis, North Carolina State University, Raleigh, North Carolina.

Fountis, N. G., and Dickey, D. A. (1989), Testing for a unit root nonstationarity in multivariate autoregressive time series. Ann. Statist. 17, 419-428.

Fox, A. J. (1972), Outliers in time series. J. Roy. Statist. Soc. B 34, 350-363.

Fox, R., and Taqqu, M. S. (1985), Noncentral limit theorems for quadratic forms in random variables having long-range dependence. Ann. Prob. 13, 428-446.

Fox, R., and Taqqu, M. S. (1986), Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series. Ann. Statist. 14, 517-532.

Franklin, J. N. (1968), Matrix Theory. Prentice-Hall, Englewood Cliffs, New Jersey.

Friedman, M., and Schwartz, A. J. (1963), A Monetary History of the United States 1867-1960. Princeton University Press, Princeton, New Jersey.

Fuller, W. A. (1969), Grafted polynomials as approximating functions. Australian J. Agr. Econ. 13, 35-46.

Fuller, W. A. (1979), Testing the autoregressive process for a unit root. Paper presented at the 42nd Session of the International Statistical Institute, Manila.

Fuller, W. A. (1980), The use of indicator variables in computing predictions. J. Econometrics 12, 231-243.

Fuller, W. A. (1984), Nonstationary autoregressive time series. In Handbook of Statistics, 5, Time Series in the Time Domain (E. J. Hannan, P. R. Krishnaiah, and M. M. Rao, eds.). North Holland, Amsterdam, 1-23.

Fuller, W. A. (1987), Measurement Error Models. Wiley, New York.

Fuller, W. A., and Hasza, D. P. (1980), Predictors for the first-order autoregressive process. J. Econometrics 13, 139-157.

Fuller, W. A., and Hasza, D. P. (1981), Properties of predictors for autoregressive time series. J. Amer. Statist. Assoc. 76, 155-161.


Fuller, W. A., Hasza, D. P., and Goebel, J. J. (1981), Estimation of the parameters of stochastic difference equations. Ann. Statist. 9, 531-543.

Fuller, W. A., and Martin, J. E. (1961), The effects of autocorrelated errors on the statistical estimation of distributed lag models. J. Farm Econ. 43, 71-82.

Galbraith, R. F., and Galbraith, J. I. (1974), On the inverses of some patterned matrices arising in the theory of stationary time series. J. Appl. Prob. 11, 63-71.

Gallant, A. R. (1970), Statistical Inference for Nonlinear Regression Models. Ph.D. thesis, Iowa State University, Ames, Iowa.

Gallant, A. R. (1975a), Nonlinear regression. Amer. Statistician 29, 73-81.

Gallant, A. R. (1975b), The power of the likelihood ratio test of location in nonlinear regression models. J. Amer. Statist. Assoc. 70, 199-203.

Gallant, A. R. (1987), Nonlinear Statistical Models. Wiley, New York.

Gallant, A. R., and Fuller, W. A. (1973), Fitting segmented polynomial regression models whose join points have to be estimated. J. Amer. Statist. Assoc. 68, 144-147.

Gallant, A. R., and White, H. (1988), A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.

Gay, R., and Heyde, C. C. (1990), On a class of random field models which allows long range dependence. Biometrika 77, 401-403.

Geweke, J., and Porter-Hudak, S. (1983), The estimation and application of long memory time series models. J. Time Series Anal. 4, 221-237.

Giraitis, L., and Surgailis, D. (1990), A central limit theorem for quadratic forms in strongly dependent linear variables and its application to asymptotic normality of Whittle's estimate. Prob. Theory and Related Fields 86, 87-104.

Gnedenko, B. V. (1967), The Theory of Probability (translated by B. D. Seckler). Chelsea, New York.

Goldberg, S. (1958), Introduction to Difference Equations. Wiley, New York.

Gonzalez-Farias, G. M. (1992), A new unit root test for autoregressive time series. Ph.D. dissertation, North Carolina State University, Raleigh, North Carolina.

Gonzalez-Farias, G., and Dickey, D. A. (1992), An unconditional maximum likelihood test for a unit root. 1992 Proceedings of the Business and Economic Statistics Section, Amer. Statist. Assoc., 139-143.

Gould, J. P., and Nelson, C. R. (1974), The stochastic structure of the velocity of money. Amer. Econ. Rev. 64, 405-417.

Gradshteyn, I. S., and Ryzhik, I. M. (1965), Tables of Integrals, Series and Products (4th ed.). Academic Press, New York.

Granger, C. W. J. (1980), Long memory relationships and the aggregation of dynamic models. J. Econometrics 14, 227-238.

Granger, C. W. J. (1981), Some properties of time series data and their use in econometric model specification. J. Econometrics 16, 121-130.

Granger, C. W. J., and Andersen, A. P. (1978), An Introduction to Bilinear Time Series Models. Vandenhoeck and Ruprecht, Göttingen.

Granger, C. W. J., and Hatanaka, M. (1964), Spectral Analysis of Economic Time Series. Princeton University Press, Princeton, New Jersey.

Granger, C. W. J., and Joyeux, R. (1980), An introduction to long-range time series models and fractional differencing. J. Time Series Anal. 1, 15-30.


Granger, C. W. J., and Teräsvirta, T. (1993), Modelling Nonlinear Economic Relationships. Oxford University Press, Oxford.

Granger, C. W. J., and Weiss, A. A. (1983), Time series analysis of error correction models. In Studies in Econometrics, Time Series, and Multivariate Statistics (S. Karlin, T. Amemiya, and L. A. Goodman, eds.), Academic Press, New York, 255-278.

Grenander, U. (1950), Stochastic processes and statistical inference. Ark. Mar. 1,195-275. Grenander, U. (1954). On the estimation of regression coefficients in the case of an

Grenander, U. (1957), Modern trends in time series analysis. Smkhya 18, 149-158. Grenander. U., and Rosenblatt, M. (1957). Statistical Anafysis of StatioMly Time Series.

Grether, D. M., and Nerlove, M. (1970). Some properties of "optimal" seasonal

Oreville, T. N. E. (1969). Theory and Applications of Spline Functions. Academic Press,

Griliches, Z. (1%7), Distributed lags: a swey. Econumetrica 35, 16-49. Grizzle, J. E., and Allen, D. M. (1969), Analysis of growth and dose response curves.

Gumpertz, M. L., and Pantula, S. G. (1992). Nonlinear regmsion with variance com-

Guo, L., Huang, D. W., and Hannan. E. J. (1990), On ARX(m) approximation. J.

Hall, A. (1992). Teating for a unit root in time series with pretest data-based model

Hall, P., and Heyde, C. C. (1980), Marringale Limit Theory and its Application. Academic

Hampel, F. R., Ronchetti, E. M., Rwsseeuw, P. J., and Stahel, W. A. (1986). Robusr

Hannan, E. J. (19561, The estimation of relationships involving distributed lags. Ecorw-

Hannan, E. J. (19581, The estimation of spectral density after trend removal. J. Roy. Sfarist.

Hannan, E. J. (1960), Time Series Anualysis. Methuen, London. Hannan, E. J. (19611, A central limit theorem for systems of regressions. Proc. Cambridge

Hannan, E. J. (1963). The estimation of seasonal variation in economic time series. J .

Hannan, E. J. (1%9), A note on an exact test for trend and serial correlation. Economerricu

Hannan, E. J. (1970). Multiple Time Series. Wiley, New York. Hannan, E. I. (1971). Nonlinear time series regression, 1. Appl. Prob. 8, 767-780. Hannan, E. J. (1979). The central limit theorem for time series regression. Stock Processes

Hannan, E. J. (1980). The estimation of the order of an ARMA process. Ann. Statist. 8,

autoconrelated disturbance. Ann. Math. Statist. 25,252-272.

Wiky, New York.

adjustment. Econometrics 38, 682-703.

New York.

Biometrics 25, 357-381.

ponents. J. Amer. Starist. Assoc. B7, 201-209.

Multivariate Anal. 32, 17-47.

selection. J. Business and Econ. Statist. 12,461-470.

Press, New York.

Statistics: The Approach Based on Influence Functions. Wiley, New York.

metrica 33,206-224.

SOC. B 20, 323-333.

Phil. SOC. S7, 583-588.

Amer. Statist, Assoc. 58, 31-44.

37, 485-489.

Appl. 9, 281-289.

107 1 - 108 1.


Hannan, E. J. (1987), Rational transfer function approximation, Statist. Sei. 2, 135-161. Hannan, E. J., and Deistler, M. (19881, The Statistical Theory of Linear Systems. Wiley,

Hannan, E. J., Dunsmuir, W. T. M., and Deistler, M. (1980), Estimation of vector ARUAX

Hannan, E J., and Hey&, C. C. (1972), 00 limit theorems for quadratic funtions of discrete

Hannan, E. I., and Kavalietis, L. (1984). A method for autoregressive-moving average

Hannan, E. J., and Nicholls, D. E (1972), Thc estimation of mixed regression, autoregm-

Hannan, E. J., and Q u b , B. G. (1979), The determination of the order of an autoregres-

Hannan, E. J., and Rissanen, J, (1982), Recursive estimation of mixed autoregressive-

Hansen, J., and Lebedeff, S. (19871, Global trends of measured surface air temper-. J.

Hansen, J., and L.ebedeff, S. (1988), Global surface air temperatures: update through 1987.

Hansen, M. H., HurwitZ, W. N., and Madow, W. G. (1953). Sump& Survey Merhocis and

H a ~ ~ i s . B. (1967). Spectral Analysis of Time Series. Wiley, New York. Hart, B. I. (1942). Significance levels for the ratio of the mean square successive difference

Hllrtley, H. 0. (l%l), The modified Gauss-Newton method for the fitting of nonlinear

Hartley, H. O., and Booker, A. (1965), Nonlinear least squares estimation. Ann. Math.

Harvey, A. C . (1984), A unified view of statistical forecasting procedures (with comments), J. Forecasting 3, 245-283.

Harvey, A. C. (1989). Forecasting. Structural Time Series Models and the Kalnuur Filter. Cambridge University Press, Cambridge.

Harvey, A. C., and Durbin, J. (1986), The effects of seat belt legislation on British road casualties: a case study in structural time series modelling (with discussion). J. Roy. Statist. Soc. A 149, 187-227.

Harvey, A. C., and Pierse, R. G. (1984), Estimating missing observations in economic time series. J. Amer. Statist. Assoc. 79, 125-131.

Harvey, A. C., Ruiz, E., and Shephard, N. (1994), Multivariate stochastic variance models. Rev. Econ. Stud. 61, 247-264.

Harville, D. A. (1986), Using ordinary least squares software to compute combined intra-interblock estimates of treatment contrasts. Amer. Statistician 40, 153-157.

Haslett, J., and Raftery, A. E. (1989), Space-time modelling with long-memory dependence: assessing Ireland's wind power resource (with discussion). J. Roy. Statist. Soc. C 38, 1-21.

New York.

models. J. Multivariate AM^. 10, 275-295.

time series. Ann. Math. Statist, 43, 2058-2066.

estimation. Biometrih 72, 273-280.

sion, moving average and distributed lag models. Econotnetrica 48, 529-548.

sion, J. Roy. Statist. Soc. B 41, 190-195.

moving average order. Biometrika 69, 81-94. Correction (1983). 70, 303.

Geophys. Res. 92, 13345-13372.

Geophys. Res. k i t . 15, 323-326.

Theov, 11. Wiley, New York.

to the variance. Ann. Math. Statist. 13, 445-447,

regression functions by least squares. Techtwmetrics 3. 269-230.

Statist. 36, 638-650.


Hasza, D. P. (1977), Estimation in nonstationary time series. Ph.D. thesis, Iowa State University, Ames, Iowa.

Hasza, D. P. (1980), A note on maximum likelihood estimation for the first-order autoregressive process. Comm. Statist. Theory and Methods A9, No. 13, 1411-1415.

Hasza, D. P., and Fuller, W. A. (1979), Estimation for autoregressive processes with unit roots. Ann. Statist. 7, 1106-1120.

Hasza, D. P., and Fuller, W. A. (1982), Testing for nonstationary parameter specifications in seasonal time series models. Ann. Statist. 10, 1209-1216.

Hatanaka, M. (1973), On the existence and the approximation formulae for the moments of the k-class estimators. Econ. Stud. Quart. 24, 1-15.

Hatanaka, M. (1974), An efficient two-step estimator for the dynamic adjustment model with autoregressive errors. J. Econometrics 2, 199-220.

Herrndorf, N. (1984), A functional central limit theorem for weakly dependent sequences of random variables. Ann. Prob. 12, 141-153.

Higgins, M. L., and Bera, A. K. (1992), A class of nonlinear ARCH models. Int. Econ. Rev. 33, 137-158.

Hildebrand, F. B. (1968), Finite-Difference Equations and Simulations. Prentice-Hall, Englewood Cliffs, New Jersey.

Hildreth, C. (1969), Asymptotic distribution of maximum likelihood estimators in a linear model with autoregressive disturbances. Ann. Math. Statist. 40, 583-594.

Hillmer, S. C., Bell, W. R., and Tiao, G. C. (1983), Modeling considerations in the seasonal adjustment of economic time series. In Applied Time Series Analysis of Economic Data (A. Zellner, ed.). United States Bureau of the Census, Washington.

Hinich, M. (1982), Testing for Gaussidty and linearity of a time series. 1. Time Series

Hoeffding. W., and Robbins, H. (1948), The central limit theorem for dependent random

Hosking, J. R. M. (1981). Fractional dif'ferencing. Bbmetriku 68. 165-176. Hosking, 1. R. M. (1982), Some models of persistence in time series. In Time Series

Analysis: Theory und Practice I (0. D. Anderson, 4.). North-Holland, Amsterdam. Huang, D., and Anh, V. V. (1993). Estimation of the nonstationary factor in ARUMA

models, J. Time Series Anal. 14, 27-46. Huber, P. J. (1981), Robust Stutisrics. Wiley, New York. lbragimov, I. A., and Linnik, Y. V. (19711, Independent and Stationary Sequences of

Jacquier. E., Polson, N. G., and Rossi, P. E. (1994). Bayesian analysis of stochastic

Jenkins, G. M. (1%1), General considerations in the analysis of spectra. Technometrics 3,

Jenkins, G. M., and Watts, D. G. (1968). Spectral Analysis and its Application. Holden-

Jennrich, R I. (1%9), Asymptotic properties of non-linear least squares estimators. Ann.

Jobson. J. D. (1972). Estimation for linear models with unknown diagonal covariance

AMl. 3. 169-176.

variables. Duke Math. J. 15, 773-780.

Random Variables. Wolters-Noonibff, Groningen.

volatility models. J. Business und Econ. Sfatisr. 12, 371-389.

133-166.

Day, San Francisco.

Math. Stutist. 40, 633-643.

matrix. Unpublished R.D. thesis, Iowa State University, Ames, Iowa.


Jobson, J. D., and Fuller, W. A. (1980). Least squares estimation when the covariance matrix and the parameter vector are functionally related. J. Amer. Statist. Assoc. 75,

Johansen, S. (1988), Statistical analysis of cointegration vectors. J. Econ. Dynamics and Control 12, 231-254.

Johansen, S. (1991), Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551-1580.

Johansen, S., and Juselius, K. (1990), Maximum likelihood estimation and inference on cointegration-with applications to the demand for money. Oxford Bull. Econ. and Statist. 52, 169-210.

176-1 81.

Johnston, J. (1984). Econometric Methods (3rd ad.). McGraw-Hill, New York. Jolley, L. B. W. (1%1), Swnmation of Series (2nd ed.). Dover, New Yo&. Jones, D. A. (1978), Nonlinear autoregressive processes. Proc. Roy. Soc. Am, 71-95. Jones, R. H. (1980), Maximum likelihood fitting of ARMA models to time series with

missing observations. Technometrics 22, 389-395. Jorgenson, D, W. (1964). Minimum variance, linear, unbiased seasonal adjustment of

economic time series. J. Amer. Statist. Assoc. 59, 681-724. Judge, G. C., Griffiths, W. E., Hill, R. C., uithkepohl, H., and Lee, T-C. (1985), & Theory

and Practice of Econometrics. Wiley, New Yo&. Kalman, R. E. (19601, A new approach to linear filtering and pradiction pblems. T m .

ASME J. Basic Eng. 82, 35-45. Kalman, R. E. (19631, New methods in Wiener filtering theory, In Proceedings ofthe First

Symposium on Engineering Applications of Random Function Theory and Probability (J. L. Bodanoff and F. Kazin, eds.). Wiley, New York.

Kalman. R. E., and Bucy, R. S. (1961), New results in linear filtering and prediction theory. Trans. Anter. Soc, Mech. Eng. J . Basic Eng. 83, 95-108.

ICawashima, H. (1980), Parameter estimation of autoregressive integrated processes by least squares. Ann. Statist. 8, 423-435.

Kempthome, O., and Folks, L. (1971), Probability, Statistics and Data Analysis. Iowa State University Press, Ames, Iowa.

Kendall, M. G. (1954), Note on bias in the estimation of autocorrelation. Biometrika 41, 403-404.

Kendall, M. G., and Stuart, A. (1966), The Advanced Theory of Statistics 3. Hafner, New York.

King, M. L. (1980), Robust tests for spherical symmetry and their application to least squares regression. Ann. Statist. 8, 1265-1271.

King, M. L. (1988), Towards a thsory of point optimal tcsting. Econ. Rev, 6, 169-218. Kitigawa, G. ( 1987), Non-Gaussian state-space modelling of nonstationary time series (with

Klimko, L. A., and Nelson, P. I. (1978), On conditional least squares estimation for

Kohn, R., and Ansley, C. P. (1986), Estimation, prediction, and interpolation for ARIMA

Koopmans, L. H. (1974). 7% Specrral Analysis of Time Series. Academic Press, New Yo&.

discussion). J. Amer. Statist. Assoc. 82, 1032-1063,

St~haStiC P~OCWS~S. Ann. StW'St. 6, 629-642.

models with missing data. J. Amer. Statist. Assoc. 81, 751-761.


Koopmans, T. (1942), Serial correlation and quadratic forms in normal variables. Ann. Math. Statist. 13, 14-33.

Koopmans, T. C., Rubin, H., and Leipnik, R. B. (1950), Measuring the equation systems of dynamic economics. In Statistical Inference in Dynamic Economic Models (T. C. Koopmans, ed.). Wiley, New York.

Kramer, K. H. (1963), Tables for constructing confidence limits on the multiple correlation coefficient. J. Amer. Statist. Assoc. 58, 1082-1085.

Krämer, W. (1986), Least squares regression when the independent variable follows an ARIMA process. J. Amer. Statist. Assoc. 81, 150-154.

Kreiss, J. P. (1987). On adaptive estimation in stationary ARMA processes. Ann. Statist. 15,

Lai, T. L., and Siegmund, D. (1983), Fixed accuracy estimation of an autoregressive parameter. Ann. Statist. 11, 478-485.

Lai, T. L., and Wei, C. Z. (1982a), Asymptotic properties of projections with applications to stochastic regression problems. J. Multivariate Anal. 12, 346-370.

Lai, T. L., and Wei, C. Z. (1982b), Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10, 154-166.

Lai, T. L., and Wei, C. Z. (1982c), A note on martingale difference sequences satisfying the local Marcinkiewicz-Zygmund condition. Columbia University Tech. Report, New York.

Lai, T. L., and Wei, C. Z. (19856, Asymptotic properties of general autoregressive models and strong consistency of least squares estimates of them parameters. J. Multivariate

Lai. T. L.. and Wei. C.Z. (198Sb). Asymptotic properties of multivariate weighted sums with applications to stochastic regression in linear dynamic systems. In Multivariate Analysis VZ (P. R. Krishnaiah. ed.). North-Holland, Amsterdam, 375-393.

Lighthill, M. J. (1970). Introduction to Fourier Analysis and Generalised Functwns. Cambridge University Press, Cambridge.

Little, R. J. A., and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley. New Yo&.

Liviatan, N. (1963), Consistent estimation of distributed lags. Znt. Econ. Rev. 4, 44-52. Ljung, G. M. (19931, On outlier detection in time series. J. Roy. Stutisr. Soc. B 55.

Ljung, G. M., and Box, G. E. P. (1978), On a measure of lack of fit in time series models.

Lohe, M. (1%3), Probability Theory (3rd ed.). Van Nostrand, Princeton, New Jersey. Lomnicki, 2 A., and Zaremba, S. K. (1957), On estimating the spectral density function of

a stochastic process. J. Roy. Statist. Soc. B 19, 13-37. LoveU, M. C. (1%3), Seasonal adjustment of economic time series and multiple regression

analysis. J. Amer. Statist. Assoc. 58, 993-1010. MacNeil, 1. B. (1978), Properties of sequences of partial sums of polynomial regression

residuals with applications to tests for change of regression at unknown times. Ann. Stafist. 6. 422-433.

1 12- 133.

Anal. 13, 1-23.

559-567.

Biomefrika 72, 299-315.


Macpherson, B. D. (1981), Properties of estimators for the parameter of the first order

Macphersm, B. D., and Fuller, W. A. (1983), Consistency of the least squares estimator of

Magnus, J. R., and Neudeclrer, H. (19881, Matrix Difserential Culculur with Applications in

Malinvaud, E. ( 197Oa), Stutisfical Methods in Econometrics. North-Holland, Amsterdam. Malinvaud. E. (1970b). The consistency of non-linear regressions. Ann. Murh. Statist. 41,

Mallows, C. L. (1973). Some comments on C,, Technomerrics 15, 661-675, Mandelbrot, B.B. (1969). Long-run linearity, locally Gaussian process. H-spectra and

Mandelbrot, B. B., and Van Ness, J. W. (1968), Fractional Brownian motions, fractional

Mann, H. B., and Wald, A. (1943a), On the statistical tnatment of linear stochastic

Mann, H. B., and Wald, A. (1W3b), On stochastic limit and order rclafionships. Ann. Math.

Marden, M. (1966), Geometry of Polynomials. Amer. Math. Soc., Providence, Rhode Island.

Marriott, F. H. C., and Pope, J. A. (1954), Bias in the estimation of autocorrelations. Biometrika 41, 390-402.

Marple, S. L. (1987), Digital Spectral Analysis with Applications. Prentice-Hall, Englewood Cliffs, New Jersey.

Martin, R. D. (1980), Robust estimation of autoregressive models. In Directions in Time Series (D. R. Brillinger and G. C. Tiao, eds.), Institute of Mathematical Statistics, Hayward, California, 228-254.

Martin. R. D., and Y0hai.V. J. (1986), Influence functionals for time series. Ann. Statist. 14,

McClave, J. T. (1974), A comparison of moving average estimation procedures. Comm.

McDougall, A. J. (1994), Robust methods for autoregressive moving average estimation. J.

McLeish, D. L. (1974), Dependent central limit theorems and invariance principles. Ann.

McLeish, D. L. (1975), A maximal inequality and dependent strong Iawa. Ann. Prob. 3,

McNeilI, 1. B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist. 6, 422-433.

Meinhold, R. J., and Singpurwalla, N. D. (1983), Understanding the Kalman filter. Amer. Statistician 37, 123-127.

Melino, A., and Turnbull, S. (1990), Pricing foreign currency options with stochastic volatility. J. Econometrics 45, 239-265.

Miller, K. S. (1963). Linear Di@erential Eqwtions in the Real Domain. Norton, New York.

moving average process. B.D. Thesis, Iowa State University, Ames, Iowa.

the first order moving average parameter. Ann. Sfutisz. 11, 326-329.

Stutistics and Econometrics. Wiley, New York.

956-969.

infinite variances. Inr. &on. Rev. 10, 82-111.

noises and applications. S I N Rev. 10, 422-437.

difference equations. E c o m t r i c a 11, 173-220.

Sintist. 14, 217-226.

781-818.

Statist. 3, 865-883.

ROY. Statlkt. SOC. B 56, 189-207.

Pmb. 2, 620-628.

829-839.


Miller, K. S. (1%8), Linear Diference Equations. Benajmin. New York. Moran, P. A. P. (1947). Some theorems on time series I. BIometrika 34, 281-291. Moran. P. A. P. (1953), The statistical analysis of the Canadian lynx cycle, I: structure and

Morettin, P. A. (1984), The Levinson algorithm and its applications in time series analysis.

Muirhead, C. R. (1986), Distinguishing outlier types in time series. J. Roy. Statisr. SOC. B

Muirhead, R. J. (1982), Aspects of Multivariate Statistical Theory. Wiley, New York. Nabeya, S., and Tanaka, K (1990a). A general approach to the limiting distribution for

estimators in time series regression with nonstable autongressive errors. Econornetrica

Nabeya, S., and Tan&, K. (1990b). Limiting powers of unit root tests in time series regression. J. Econometrics 46, 247-271.

Nagaraj, N. K. (1990). Results on estimation for vector unit root processes. Unpublished manuscript, Iowa State Univeristy, Ames, Iowa,

Nagaraj, N. K., and Fuller, W. A. (1991), Estimation of the parametets of linear time series models subject to nonlinear restrictions. Ann. Stm'st. 19, 1143-1 154.

Nagaraj, N. K., and Fuller, W. A. (1992), Least squares estimation of the linear model with autoregressive errors. In New Directions in Time Series Analysis, Part I (D. Brillinger et al., 4s.) . IMA Volumes in Mathematics and Its Applications, Vol. 45, Springer, New

Nankervis, J. C., and Savin, N. E. (1988). The Student's t approximation in a stationary first order autoregressive model. Ecotwmetrica 56, 119-145.

Narasimham, G. V. L. (1969), Some properties of estimators occurring in the theory of linear stochastic processes. In k c w e Notes in Operations Research and Mathematical Economics (M. Beckman and H. P. Kiinzi, eds.). Springer, Berlin.

Nelson, C. R. (1974). The first-order moving average process: identification, estimation and prediction. J. Econometrics 2, 121-141.

Nelson, C. R., and Plosser, C. I. (1982), Trends and random walks in macroeconomic time series: some evidence and implications. J. Monetary &on. 10, 129-162.

Nerlove, M. (1964a), Notes on Fourier series and analysis for economists, Multilith prepared under Grant NSF-GS-I42 to Stanford University, Palo Alto, Caliiornia.

Nerlove, M. (1964b). Spectral analysis of seasonal adjustment procedures. Econornetrica

Newbold, P. (1974), The exact likelihood function for a mixed autorepsive-moving

Newey, W. K., and West, K. D. (19871, A simple, positive definite, heteroscedasticity and

Newton, H. I. (1988). Timcskrb: A Time Series Analysis Laboratory. Wadsworth and

Newton, H. J., and Pagano, M. (1984). Simultaneous confidence bands for autoregressive

Nicholls, D. E (1976). nK efficient estimator of vector linear time series models.

prediction. Australian J. Zuol. 1, 163-173.

Int. &tist. Rev. 52, 83-92.

48,3947.

58, 145-163.

Ywk, 215-225.

32, 241-286.

average process. Biometrika 61, 423-426.

autocorrelation consistent cov&ance matrix. Econometdca 55, 703-708.

BmldCole , Pacific Grove, California.

spectra. Biometrikn 71, 197-Un.

Biometrika 63, 381-390.


Nicholls, D. F.. Pagan, A. R., and Terrell, R. D. (1975). The estimation and use of models with moving average disturbance terms: a survey. Inr. Econ. Rev. 16, 113-134.

NichoUs, D. F., and Quinn, B. G. (1982). Random CoefJicient Autoregressive Models: An Introduction, Springer Lecrure Notes in Statistics, No. 11. Springer, New York.

Nold, F. C. (1972). A bibliography of applications of techniques of spectral analysis to economic time series. Technical Report No. 66. Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, Caiifmia.

Nyblom, J., and Makelainen, T. (1983), Comparisons of tests for the presence of random walk coefficients in a simple linear model. J. Amer. Statist. Assoc. 78, 856-864.

Olshen, R. A. (1967), Asymptotic properties of the periodogram of a discrete stationary process. J . Appl. Prob. 4, 508-528.

Orcutt, G. H., and Winokur, H. S. (1969), First order autoregression: inference, estimation, and prediction. Econometrica 37, 1-14.

Otnes, €2. K., and Enochson, L. (1972)) Digitul Tim Series Analysis. Wiley, New Yo&. Otto, M. C., and Bell, W, R. (1990). Two issues in time series outlier detection using

indicator variables. 1990 Proc. Business and Economics Section. h e r . Statist. Assoc.,

Ozaki, T. (1980). Non-linear threshold models for non-linear random vibrations. J. Appl.

Pantula, S . 0. (1982), Roperties of the parameters of auto-regressive time series. PhD.

Pantula, S. G. (1986), On asymptotic properties of the least squares estimators for

Pantula, S. G. (1988a), Statistical time series. Unpublished manuscript for Statistics 613,

Pantula, S. G. (1988b), Estimation of autoregressive models with ARCH emrs. Sankhjk

Pantula, S. G. (1989). Testing for unit roots in time series data Economm*c Theory 5,

Pantula, S . G., and Fuller, W. A. (1985), Mean estimation bias in least squares estimation of

Pantula, S. G., and Fuller, W. A. (1993), The large sample distribution of the roots of the

Pantula, S . G.. Gonzalez-Paris, 0.. and Fuller, W. A, (1994), A comparison of unit root

Park. H. J. (1990), Alternative estimators of the parameters of the autoregressive process.

Park, H. J., and Fuller, W. A. (1995), Alternative estimators and unit mot tests for the

Park, J., and Phiilips. I? C. B. (1988), Statistical inference in regressions with integrated

Park, J., and Phillips, P. C. B. (1989), Statistical inference in regressions with integrated

Panen, E. (1957). On consistent estimates of the spectrum of a stationary time series. AM.


182-187.

Prob. 17, 84-93.

Thesis, Iowa State University. Arnes, Iowa.

autoregressive time Series with a unit root. Sankhyil Ser. A 48, 208-218.

North Carolina State University, Raleigh, North Carolina.

Ser. B, SO, 1 1 9-138.

256-27 1.

autoregressive processes. J. Econometrics 27, 99-121.

second order autoregressive polynomial, Biometrika SO, 919-923.

criteria. J. Business and &on. Stat&. 13, 449-459.

Ph.D. dissertation, Iowa State University, Amea, Iowa.

autoregressive process. J. Time Series Anal. 16, 415-429.

processes: Part I. Econometric Theory 4, 468-497.

processes: Part 11. Econometric Theory 5, 95-131.

Math. Statist. 28, 329-348.


P m n , E. (1958). conditions that a stochastic process be ergodic. Ann. Math. Statist. 29,

Parzen, E. (1%1), Mathematical considerations in the estimation of spectra. Technometrics

Parzen, E . (1962). Stochtic Processes. Holden-Day, San Francisco. Panen, E. (1%9), Multiple time series modeling. In Multivariate Analysis ZZ (P. R.

Krishnaiah, ed.), Academic Press, New Yak. Panen, E. (1974). Some recent advances in time series modeling. IEEE Trans. Automatic

Control AC-19, 723-729. P m n , E. (1977). Multiple time wries modeling: determining the order of the approxhat-

ing autoregressive scheme. In Mulrivariare Analysis N (P. R. Krishnaiah, ed.). North

Peiia, D., and Guttman, I. (1989). Bayesian approach to robustifying the Kalman Blter. In Bayesian Analysis of Time Series and Dynamic ModeLr (J. C. Spall, ed.). Marcel Dekker, New Yo& 227-253.

Perron, P. (1989), The calculation of the limiting distribution of the least-squares estimator in a near-integrated model. Economerric Theory 5, 241-255.

Phillips, P. C. B. (1979). The sampling distribution of forecasts from a firstader autoregression. J. Econometrics 9, 241-262.

Phillips, I! C. B. (1986), Understanding spurious regressions in econometrics. J. Econo-

Phillips, P. C. B. (1987a), Time series regression with a unit root. Econometrica 55,

Phillips, P. C. B. (1987b), Towards a unified asymptotic theory of autoregression. Biometrika 74, 535-548.

Phillips, P. C. B. (1987c), Asymptotic expansions in nonstationary vector autoregressions. Econometric Theory 2, 45-68.

Phillips, P. C. B. (1988). Weak convergence to rhe matrix stochastic integral .fi B dB'. J.

Phillips, P. C. B. (1991), Optimal inference in cointegrated systems. Econometrica 59,

Phillips, P. C. B. (1994). Some exact distribution themy far maximum likelihood estimators of cointegrating coefficients in error correction models. Econometrica 62, 73-93.

Phillips, P. C. B., and Durlauf, S. N. (1986), Multiple time series regression with integrated processes. Rev. Econ. Stud. 53, 473-496.

Phillips, P. C. B., and Hansen, B. E. (1990), Statistical inference in instrumental variables regression with I(1) processes. Rev. Econ. Stud. 57, 99-125.

Phillips, P. C. B., and Ouliaris, S. (1990), Asymptotic properties of residual based tests for cointegration. Econometrica 58, 165-193.

Phillips, P. C. B., and Perron, P. (1988), Testing for a unit root in time series regression. Biometrika 75, 335-346.

Phillips, P. C. B., and Solo, V (1992), Asymptotics for linear processes. Ann. Srurisr. 20,

299-301.

3, 167-190.

HolIand, Amsterdam, 283-295.

metrics 33, 311-340.

277-301.

Multivariate Anal. 24, 252-264.

283-306.

971-1001.


Pierce, D. A. (1970a), A duality between autoregressive and moving average processes concerning their least squares parameter estimates. Ann. Math. Statist. 41, 422-426.

Pierce, D. A. ( 1970b). Rational distributed lag models with autorepsive-moving average errors. 1970 Prw. Business and Economics Statistics Section, Amer. Statist. Assoc.

Pierce, D. A. (1971). Least squares estimation in the r e p s i o n model with autoregressive-

Pierce, D. A. (1972). Least s q u m estimation in dynamic-disturbance time series models.

Plosser, C. I., and Schwett, 0. W. (I977), Estimation of noninvextible moving average

Poirier, D. J. (1973), Piecewise regression using cubic splines. 1. Amer. Starist. Assoc. 68,

Pollard, D. (1984), Convergence of Stwhastic Processes. Springer, New Yo&. Potscher, B. M., and h h a , 1. R (lwla), Basic structure of the asymptotic theory in

dynamic nonlinear econometric models, part I consistency and approximation concepts, Ecowmeic Rev. 10, 125-216.

Potscher, B. M., and &ha, I. R. (1991b). Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part II: asymptotic normality. Economerric Rev. 10, 253-325.

Pratt, J. W. (1959), On a genml concept of “in probability.” Ann. Math. Statist. 30,

Prest, A. R. (1949). Some experiments in demand analysis. Rev. Econ. and Statist. 31,

Priestly, M. B. (1381), Specrd Analysis and Time Series. Academic Press, New York. Priestly, M. B. (1988). Non-linear and Non-stationary E m Series Analysis. Academic

Quenouille, M. H. (1957). Tlre Analysis of Multiple Time Series. H a , New York. Quinn, B. G. (1982), A note on the existence of strictly stationary solutions to bilinear

equations. J. Times Series Anal. 3, 249-252. Rao, C. R. (1959), Some problems involving lineat hypothesis in multivariate analysis.

Biometrika 46,49-58. Rao, C. R. (1965). The theory of least squares when the parameters are stochastic and its

apptication to the analysis of growth curves, Bwmrrika 52, 4 7 - 4 8 . Rao, C. R. (1%7), Least squares thcory using an estimated dispersion matrix and its

application to meamment of s ip is . In Fifth Berkeley Symp. Math. Statist. Prob. 1,

Rao, C. R. (1973), Linear Statistkal Inference and its Applications (2nd ed.). Wiley, New

Rao, J. N. K., and Tintnet, G. (1963). 00 the variate difference method. Australian J .

Rao, M. M. (1%1), Consistency and limit distributim of estimatom of parameters in

Rao, M. M. (lW8a). Asymptotic distribution of an estimator of the boundary parameter of

591-596.

moving average e m . Biometrikia 58, 299-312.

Biometrika 59, 73-78,

process: the case of overdifferencing. J. Economtrics 6, 199-224.

515-524.

549-558.

33-49,

Press, Saa Diego.

355-372.

York.

Statist. 5, 106-116.

explosive stochastic difference equations. Ann. Math, Statist. 32, 195-218.

an unstable process. Ann. Stat&. 6, 185-190. Correction (1980), 8, 1403.


Rao, M. M. (1978b), Covariance analysis of nonstationary time series. In Developments in

Reeves, J. E. (t972), The distribution of the maximum likelihood estimator of the parameter

Reinsel. G. C. (1993), Elements of Multivariate Time Series Analysis. Springer, New Yo& Reinsel, 0. C., and Ahn. S. K. (1989). Asymptotic distribution of the likelihood ratio test

for cointegration in the nonstatiomy vector AR model. Unpublished report, University of Wisconsin, Madison.

Robinson, P. M. (1972). Nonlinear regression for multiple time series. J. &PI. Prob. 9,

Robinson, P. M. (1994a). Semiparametric analysis of long-memory time series. Ann. Srutisz.

Robinson, P. M. (1994b). Efficient tests of nonstationary hypotheses. J. Amer. Stat&.

Rogosinski, W. (1959). Fourier Series (translated by H. Cohn and F. Steinhardt). Chelsea,

Rosenblatt, M. (ad.) (1963). Proc. Synrp. Time Series Analysis, Brown University. June

Rothenberg, T. J. (1984). Approximate nmality of generalized least squares estimates.

Royden. H. L. (1989), Real Anafysis (3rd ed.). Macmillan, New York. Rubin, D. B. (1976), inference and missing data. Biometriku 63. 581-592. Rubin, H. (1950), Consistency of maximum-likelihood estimates in the explosive case. In

Statistical Inference in Dynamic Economic Models (T. C. Koopmans, ed.). Wiley, New Yo&.

Rutherford, D. E. (1946), Some Continuant Determinants Arising in Physics and Chemistry. Proceedings of the Royal Society of Edinburgh, Sect. A, 62, 229-236.

Sage, A. P. (1968). Oprimum System Control. Prentice-Hall, Englewood Cliffs. New Jersey.

Said, E. S.. and Dickey, D. A. (1984). Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71, 599-607.

Saikkonen, P. and Luukkonen, R. (1993), Testing for a moving average unit root in autoregressive integrated moving average models. J. Amer. Stutist. Assoc. 88, 596- 601.

Salem, A. S. (1971), Investigation of alternative estimators of the parameters of auto- regressive processes. Unpublished M.S. thesis, Iowa State University, Ames, Iowa.

Sallas, W. M., and Harville, D. A. (1981). Best linear recursive estimation for mixed linear models. J. Amer. Statist. Assoc. 76, 860-869.

Samaranayake, V. A., and Hasm D. P. (1983). The asymptotic properties of the sample autocorrelations for a multiple autoregressive process with one unit root. University of Missouri. Rolla, Missouri.

Sanger, T. M. (19921, Estimated generalized least squares estimation for the heterogeneous measurement error model. PhD. thesis, Iowa State University, Am-, Iowa.

Sargan, J. D. (1958). The estimation of economic relationships using instnunental variables. Econometrica 26, 393-415.

Statistics (P. R. Krishnaiah, ed.). Academic Press, New York.

in the first-order autoregressive series. Biumenih 59, 387-394.

758-768.

22. 515-539.

ASSOC. 89, 1420-1437.

New Yo&

11-14, 1%2, Wiley, New York.

Econometrica 52, 811-825.


Sargan, J. D. (1959). The estimation of relationships with autocoxrelated residuals by the

Sargan, J. D., and Bhargava, A. (1983), Testing residuals from least squares regression for

Sarlcar, S. (1990), Nonlinear least squares estimators with differential rates of convergence.

SAS (1984), SASIETS User’s Gu&,Version 5 ed., SAS Institute, Cary, N.C. SAS Institute Inc. (1989). SASISTAT User’s Guide, Version 6 (4th ed.), Vol. 2. SAS

Scheffi, H. (1970), Multiple testing versus multiple estimation. Improper confidence sets.

Schmidt, P., and Phillips, F? C. B. (1992). Testing for a unit root in the presence of

Schwarz, G. (1978), Estimating the dimension of a model. Ann. Statist. 6, 461-464.

Schwert, G. W. (1989), Tests for unit roots: a Monte Carlo investigation. J. Business and Econ. Statist. 7, 147-160.

Scott, D. J. (1973), Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach. Adv. Appl. Prob. 5, 119-137.

Sen, D. L., and Dickey, D. A. (1987), Symmetric test for second differencing in univariate time series. J. Business and Econ. Statist. 5, 463-473.

Shaman, P., and Stine, R. A. (1988), The bias of autoregressive coefficient estimators. J. Amer. Statist. Assoc. 83, 842-848.

Shibata, R. (1976), Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63, 117-126.

Shibata, R. (1986), Consistency of model selection and parameter estimation. In Essays in Time Series and Allied Processes (J. M. Gani and M. B. Priestley, eds.), Appl. Prob. Trust, Sheffield, 126-141.

Shin, D. (1990). Estimation for the autoregressive moving average model with a unit root. Ph.D. thesis, Iowa State University, Ames, Iowa.

Shin, D., and Fuller, W. A. (1991), Estimation for the autoregressive moving average with an autoregressive unit root. Unpublished manuscript, Iowa State University, Ames, Iowa.

Shively, T. S. (1988), An exact text for a stochastic coefficient in a time series regression model. J. T h e Series Anal. 9, 81-88.

Sims, C. A,. Stock, J. H., and Watson, M. W. (1990), Inference in linear time Saies models with some unit mots. Ecometricrr 58, 113-144.

Singleton, R. C. (1969). An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. Audw und Electroacoustics AU-17, No. 2, 93-103.

Solo, V. (1984), The order of differencing in ARIMA models. J. h r . Stitfist. Assoc. 79,

Sorendon, H. W. (1966). Kalrnan littering techniques. In Advances in Control Systems 3 (D. T. Leondis, 4.). Academic Ress. New York.

Stephenson, J. A.. and Farr, H. T. (1972), Seasonal adjustment of economic data by application of the general linear statistical model. J. Amer. Statist. Assoc. 67, 37-45.

use of instrumental variables. J. Roy. Statist. Soc. B 21, 91-105.

being generated by the Gaussian random walk. Econometrica 51, 153-174.

Ph.D. thesis, Iowa State University, Ames, Iowa.

Institute, Cary, North Carolina.

Estimation of directions and ratios. Ann. Math. Statist. 41. 1-29.

deterministic trends. Ogord Bull. .&on. and Statist. M, 257-288.

916-92 1.


Stigum, B. P. (1974), Asymptotic properties of dynamic stochastic parameter estimates

Stigum, B. P. (1975), Asymptotic Properties of autoregressive integrated moving average

Stigum, B. P. (1976), Least squares and stochastic difference equations. J. Econometrics 4,

Stine. R. A., and Shaman, P. (1990), Bias of autoregressive spectral estimators. J. Amer.

Stock. J. H. (1987), Asymptotic properties of least squares estimators of cointegrating

Stock, J. H., and Wason, W. W. (1988), Testing for common trends. J. Amer. Statisr. Assoc.

Stout, W. F. (1%8), Some results on tbe complete and almost sure convergence of linear combinations of independent variables and martingale differences. Ann. Math. Sratist.

Strasser, H . (1986), Martingale difference arrays and stochastic integrals. Prob. Theory Md Related Fie& 72, 83-98.

Stuart, A,, and Ord, J. K. (1987), Kendall’s Advanced Theory of Stntistics, Vol. 1. Oxford University Press, New York.

Stuart. A., and Ord, J. K. (1991), Kendall’s Advanced Theory of Statistics, Vol. 2. Oxford University Press, New Yo&

Subba Rao, T., and aabr, M. M. (1984). An Introduction to Bispectral Analysis and Bilinear Time Series Models. Springer Lecture Notes in Statistics, Springer, New York.

Sweeting, T. J. (1980), Uniform asymptotic normality of the maximum likelihood estimator. Ann. Statist. 8, 1375-1381.

Tanaka, K (1990), Testing for a moving average unit root. Economerric Theory 6, 433-444.

Taqqu, M. S. (1988), Self-similar processes. In Encycbpediu of Sraristical Sciences, Vol. 8. Wiley, New York, 352-357.

Theil, H. (1961), Economic Forecusrs and Policy. North-Holland, Amsterdam. Thompson, L. M. (1969), Weather and technology in the production of wheat in the United

Thompson, L. M. (1973). Cyclical weather patterns in the middle latitudes. J. Soil and

Tiao, G. C., and Tsay, R. S. (1983), Consistency properties of least squares estimates of

Tintner, G. (1 940), TBe Variute Dflereme Method. Principia Press, Bloomington, Indiana. Tintner, G. (1952). Econometrics. Wiley, New York. Tolstov, G. P. (1%2), Fourier Series (translated by R. A. Silverman). Prentia-Hall,

Tong, H. (1981), A note on a Markov bilinear stochastic process in discrete time. J. Time

Tong, H. (1983). Threshold Models in Non-linear T h e Series Analysis. Lecture Notes in

(III). J. Multivariate Anal. 4, 351-381.

proce~se~. Stockarric P r ~ p e r t i e ~ Appl. 3, 315-344.

349-370.

Statist. ASSOC. 85, 1091-1098.

vectors. Econometrica 55, 1035-1056.

83, 1097-1107.

39, 199-1542.

States. J. Soil and Water Conservation 24, No. 6, 219-224.

Warer Conservation 28, No. 2, 87-89.

autoregressive parameters in ARUA models. Ann. Statist. 11. 854-871.

Englewood Cliffs, New Jersey.

Series Anal. 2, 279-284.

Statistics, No. 21. Springer, Heidelberg.


Tong, H. (1990), Non-linear Time Series: A Dynamical System Approach. Oxford University Press, New York.

Torres, V. A. (1986), A note on teaching infinite moving averages. Amer. Statistician 40, 40-41.

Toyooka, Y. (1982), Second-order expansion of mean squared error matrix of generalized least squares estimator with estimated parameters. Biometrika 69, 269-273.

Toyooka, Y. (1990), No estimation effect up to the second order for the covariance parameter of GLSE and MLE in a regression with a general covariance structure. Math. Japonica 35, 745-757.

Tsay, R, S. (1986), Nonlinearity tests for time series. Biomerrika 73,461-466. Tsay, R. S. (1988), Outliers, level shifts and variance changes in time series. J, Forecasting

Tsay. R. S . (1993), Testing for noninvertible models with applications. J. Business and

Tucker, H. G. (1%7), A G m d w e Course in Probability. Academic Press, New Yo&. Tukey, J. W. (l%l), Discussion, emphasizing the connection between analysis of variance

and spectrum analysis. Technomeirics 3, 1-29. ulyrych, T. J., and Bishop, T. N. (1975). Maximum entropy spectral analysis and

autoregressive decomposition. Rev. Geophys. and Space Fhys. 13, 183-200. Van der h u g t e n , B. B. (1983), The asymptotic behavior of the estimated geaeralized least

squares method in the linear regression model. Statist. Neerlandicu 37, 127-141. Varadarajan, V. S. (1958), A useful convergence theorem. Sankhyd 20, 221-222. Venkataraman, K. N. (19671, A note on the least square estimators of the paramem of a

second order linear stochastic difference equation. Calcutta Statisr. Assoc. Bull. 16,

Venkarman, K. N. (1968), Some limit theorems on a linear stochastic difference equation with a constant term, and their statistical applications. Sankhyd Ser. A 30,51-74.

Venkataraman, K. N. (19731, Some convergence theorems on a second order linear explosive stochastic difference equation with a constant term. J. IndiM Statist. Assoc.

von Neumann, J. (1941), Distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist. 12, 367-395.

von Neumann, J. (1942), A further remark concerning the distribution of the ratio of the mean square successive difference to the variance. Ann. Math. Statist. 13, 86-88.

Wahba, G. (1968). On the distribution of some statistics useful in the analysis of jointly stationary time series. Ann. Math. Statist. 39, 1849-1862.

Wahba, G. (1990), SpIine Models for Observational Data. SIAM, PhiIaddphia. Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math.

Walker, A. M. (1%1), Large-sample estimation of parameters for moving-average models.

Walker, A. M. (19641, Asymptotic properties of least-squm estimates of the spectnun of a

Wallis, K. F. (19671, Lagged dependent variables and serially correlated errors: a

7,140.

Econ. Statist. 11, 225-233.

15-28.

11.47-69.

Statist. 20, 595-601.

Biomtrika 48, 343-357.

stationary non-deterministic time series. J. Australian Math. Soc. 4, 363-384.

reappraisal of three pass least squares. Rev. &on. Statist. 49, 555-567.


Wallis, K. F. (1972), Testing for fourth order autocornlation in quarterly regression

Watson, G. S. (1955). Serial correlation in regression analysis I. Biometrika 42, 327-341. Watson, G. S., and Hsnnan, E. J. (1956), Serial correlation in regression analysis IT.

Wegman, E. J. (1974). Some results on nonstationary first order autoregression. Tech-

Weiss, A. A. (1984), ARMA models with ARCH errors. J. Time Series Anal. 5, 129-143. Weiss, A. A. (1986), ARCH and bilinear time series models: comparison and combination.

Weiss, L. (1971). Asymptotic properties of maximum likeliW estimators in some

Weiss. L. (1973). Asymptotic properties of maximum likelihood estimators in some

White, H., and Domowitz, I. (1984), Nonlinear regression with dependent observations.

White, J. S. (1958), The limiting distribution of the serial correlation coefficient in the

White, J. S. (1959). The limiting distribution of the serial correlation coefficient in the

Whittle, P. (1951). Hypothesis Testing in Time Series A ~ l y ~ i s . Almqvist and Wiksell.

Whittle, P. (1953), The analysis of multiple stationary time series. J. Roy. Srnrist. Soc. B 15,

Whittle, I? (1%3), Prediction and Regulation by Linear Least-Square Methods. Van

Wilks, S . S . (1%2), Mathematical Statistics. Wiley. New York. Williams, E. J. (1959), Regression Analysis. Wiley, New Yo&. Williamson, M. (1972). The Analysis of BwlogicuZ Populations. Edward Arnold, London. Wilson, G. T. (1%9), Factorization of the generating function of a pure moving average

process. SIAM J. Numer. Anal. 6, 1-7. Wincek, M. A., and Reinsel, G. C. (1986), An exact maximum likelihood estimation

procedure for regtession-ARMA time series models with possibly nonconsecutive

Withers, C. S. (1981). Conditions for linear processes to be strong mixing. Z. Wahr. verw. Geb. 57, 477-480.

Wold, H. 0. A. (1938), A Study in the Analysis of Stationary Time Series (2nd ed., 1954). Almqvist and Wiksell, Uppsala.

Wold, H. 0. A. (1965), Bibliography on Tinre Series and Stochastic Processes. Oliver and Boyd, Edinburgh.

Woodward, W. A.. and Gray, H. L. (1991). “Global warming” and the problem of testing for trend in time series data. Departmot of Statistical Science, Southern Methodist University.

equations. Economettica 40, 617-638.

Biometrika 43, 436-445.

m m e t r i ~ ~ 16, 321-322.

J. Business a d Econ. Statist. 4, 59-70.

nonstandard cases. J. Amer. Statist. Assoc. 66,345-350.

nonstandard cases, II. J. Amer. Statist. Assoc. 68,428-430.

Econometrics 52, 143-161.

explosive cam. Ann. Math. Statist. 29, 1188-1197.

explosive case II. Ann. Math. Statist. 30, 831-834.

Uppsala.

125- 139.

Nostrand, Princeton, New Jersey.

data 1. ROY. S I d t . SW. B 48, 303-313.


Wu. C. F. (1981). Asymptotic thwwy for nonlinear least 8quares estimation. Ann. Math.

Wu, L. S.-Y., Hosking, I. R. M., and Ravishanker, N, (1993). Reallocation outliers in time

Yaglom. A. M, (1962). An Inrroduction to the meory of Starionary Random Functions

Yajima, Y. (1985). On estimation of long-memory time series models. Austrufiun J. Sturist.

Yajima, Y. (1988), On estimation of a regression model with long-memory stationary errors.

Yajima. Y. (19891, A central Iimit t h w m of Fowim transform of strongly dependent stationary processes. J. Time Series AnaI. 10, 375-383.

Yajima, Y. (1991). Asymptotic properties of the LSE in a regression model with long- memory stationary errom. Ann. Statisr. 19, 158-177.

Yamamoto, T. (1976). Asymptotic mean square prediction error for an autoregtessive model with estimated coefficients. J, Appl. Sratisr. 25, 123-127.

Yap, S. F., and Reinsel, G. C. ( m a ) , Results on estimation and testing for a unit root in the nonstationary autoregressive moving-average model. J. T h e Series Anal. 16, 339-353.

Yap, S. F., and Reinsel, G. C. (1995b), Estimation and testing for unit roots in a partially nonstationary vector autoregressive moving average model. J. Amer. Siatist. Assoc. 90,

Yule, G. U. (1927), On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philos. Trans. Roy. Soc. London Ser. A 226, 267-298.

Stufist. 9, 501-513.

series. Appl. Statist. 42, 301-313.

(translated by R.A. Silverman). PrenticeHall, Englewd Cliffs, New Jersey.

27, 303-320.

Ann. Staht. 16, 791-807,

253-267.

Zygmund, A. (1959), Trigonometric Series. Cambridge University Press, Cambridge.


Index

Absolutely summable sequence, 28 of matrices, 76 see a h Sequence

ACF, see Autocorrelation function
Additive outlier, 460
AIC, 438
Alias, 136
Amplitude, 14
Amplitude spectrum, 174. See also Cross amplitude spectrum
Analysis of variance, 356
AR, see Autoregressive time series
ARCH, 110 (Exercise 43), 494
ARIMA, 101
Arithmetic means, method of, 129. See also Cesàro summability
ARMA, 70. See also Autoregressive moving average time series
Associated polynomial, see Characteristic equation
Autocorrelation function, 7-10
  definition, 7
  estimation, 317
    bias, 317
    covariance of estimators, 317
    distribution of estimators, 335
    least squares residuals, 484-486
  partial, 10-12, 24, 27
  spectral representation, 127
  for vector valued time series, 17
  see also Autocovariance

Autocovariance, 4
  of autoregressive moving average time series, 72
  of autoregressive time series, 40, 55, 62
    difference equation for, 55, 62
  of complex valued time series, 12
  estimation, 313
    bias, 316
    consistency, 331, 343
    covariance of estimators, 314-315
    distribution, 333
    least squares residuals, 484-486
    variance of, 314-315
  of moving average time series, 22
  properties of, 7-10
  of second order autoregressive process, 55
  spectral representation, 127
  of vector process, 15-17

Autoregression, on residuals, 519-520. See also Autoregressive time series, estimation
Autoregressive moving average time series, 70-75
  autocorrelation function, 72
  autoregressive representation, 74
  estimation, 431-443
    properties, 432-434
  model selection, 437-438
  moving average representation, 72-73
  prediction, 88-90
  spectral density, 161
  state space representation, 200
  vector, 79
Autoregressive time series, 36-41, 54-58
  autocorrelation function, 40, 56, 61
  backward representation, 63-64
  estimation for, 404-421
    bias, 412
    distribution, 408
    empirical density, 412
    first order, 404-407
    higher order, 407-418
    least squares, 407-413
    maximum likelihood, 405, 413-414
    multivariate, 419-421



    nonlinear, 451
    order of, 412, 437-439
    residual mean square, 408
    from residuals, 519-520
  first order, 39-41
  infinite order, 65
  likelihood for, 405, 413-414
  model selection, 412, 437-439
  moving average representation, 39, 59, 63
  nonstationary, 546, 583
  order of, 39
  partial correlation in, 40, 62
  prediction for, 83-85
    with estimated parameters, 444-449
  as representation for spectral density, 165
  root representation for, 67-68
  second order, 54-58
    autocorrelation, 56
    autocovariance function, 55, 61-62
    correlogram, 57
  spectral density of, 159
  threshold, 454
  unit root, 64, 546, 555-560, 563-564
  vector, 77-78

    estimation, 419-421
Autoregressive transformation, 520. See also Gram-Schmidt orthogonalization
Auxiliary equation, 46. See also Characteristic equation
Backward shift operator, 43-44, 67
  autoregressive moving average representation, 74
Bandpassed white noise, 212
Bartlett window, 384
Basis vectors, 112
Bessel's inequality, 118
Bilinear model, 457
Bounded in probability, 216. See also Order in probability
Bricks, pottery, glass and cement, 535
Burg estimator, 418

Causal, 108 (Exercise 32)
Centered moving average, 501
Central limit theorem, 233
  for ARMA parameters, 432
  for finite moving average series, 320
  for infinite moving average series, 326
  Liapounov, 233
  Lindeberg, 233
  for linear functions, 329
  for martingale differences, 235
  for m-dependent random variables, 321
  multivariate, 234
  for regression coefficients, 478
  for sample autocovariance, 333
Cesàro summability, 129
  for Fourier series, 129
Characteristic equation, of autoregressive series, 54, 59, 64
  of difference equation, 46
  of moving average time series, 65
  roots, 46-48
  roots greater than one, 68-70
  of vector difference equation, 50
Characteristic function, 9, 133, 143
Characteristic roots, of autoregressive series, 46-48
  of circulant matrix, 151
  of explosive process, 611-612
  of matrix, 298
  of moving average time series, 65
  of unit root process, 591

Chebyshev's inequality, 219
Circulant matrix, 150
  characteristic roots of, 151
  characteristic vectors of, 151
Circular matrix, 150
  symmetric, 151
Coherency, 172
  inequality, 172
  see also Squared coherency
Coincident spectral density, 169
Cointegrated time series, 608
Cointegrating vectors, 608
  estimators, 617, 627
Complete convergence, 228. See also Convergence in distribution
Complete system of functions, 119
Complex exponential, 9
Complex multivariate normal, 207 (Exercise 10)
Complex trigonometric series, 130
Complex valued time series, 12-13
Component model, 475, 509. See also Structural model
Conjugate transpose, 151, 170
Continuity theorem, 230
Continuous process, 5
Convergence, complete, 228
  in distribution, 227
  of functions, 222
  in law, 228
  of matrix functions, 233


  in mean square (squared mean), 31-32, 35, 120
  of moving average, 31-37
  in probability, 215
  in r-th mean, 221
  of series, 27-29
  in squared mean (mean square), 31-32, 35, 120
  of sum of random variables, 31-37
  weak, 236
Convolution, 30, 136
  Fourier transform of, 136-138
  of sequence, 30

Correlation, cross, 17
Correlation function, 7-9. See also Autocorrelation function; Cross correlation
Correlation matrix, 17
Correlation, partial, 10-12
  autoregressive time series, 40-41
  moving average time series, 24
  see also Partial autocorrelation
Correlogram, 57. See also Autocorrelation function
Cospectrum, 169
  estimation of, 391
Covariance, 4. See also Autocovariance; Cross covariance
Cross amplitude spectrum, 174
  confidence interval for, 394
  estimation, 393-394
Cross correlation, 17
  estimation, 341-342
    consistency of, 343
    covariance of, 342
    distribution of, 345
Cross covariance, 15
  estimation, 339-343
    consistency, 343
    covariance of, 342
Cross periodogram, 388
  definition, 388
  expected value of, 388
  smoothed, 389-394
    expected value of, 390
    variance of, 390
Cross spectral function, 169
Cycle, 14
Delay, 167
  frequency response function of, 167
  phase angle of, 167
Delta function, 141 (Exercise 8)
Des Moines River, 184, 192, 346, 394
  sample correlations, 347

Determinantal equation, 50, 77, 78
Deterministic time series, 81, 96
Diagnostic checking, see Residuals, least squares
Difference, 41, 507-509
  of autoregressive moving average series, 516
  effects of, 513-514, 516
  fractional, 100
  gain of operator, 514-515
  of integrated time series, 508
  of lag h, 508
  as moving average, 508
  of periodic functions, 508
  of polynomials, 46, 508
  of time series, 507
Difference equation, 41-54
  auxiliary equation, 46
  characteristic equation, 46
  determinantal equation, 50
  general solution, 45
  homogeneous, 44
  Jordan canonical form, 52
  particular solution, 45
  reduced, 44
  second order, 47-48
  solution of n-th order, 48
  solution of second order, 47-48
  solution of vector, 50-51, 53-54
  stochastic, 39
  systems of, 49-54
  vector, 49-54
  vector n-th order, 53-54
Dirac's delta function, 141 (Exercise 8)
Distributed lags, see Regression with lagged variables
Distribution function, 2-3
Donsker's theorem, 237
Drift, random walk, 565
Durbin-Levinson algorithm, 82
Durbin's moving average estimator, 425
  error in, 426
Durbin-Watson d, 486
  t approximation for, 488
Ensemble, 3
Ergodic, 308
Error spectral density, 177
Error spectrum, 177
  estimation of, 392
Estimated generalized least squares, 280
  distribution of, 280
  based on residuals, 288
Estimator, 79. See also specific methods and parameters


Even function, 8-9
Event, 1
Expectation, of products of means, 242
  of sequence of functions, 243
Explanatory variable, 605
Explosive autoregression, 583-596
  estimation, 584-596
  prediction, 595-596
Extrapolation, 79-94
  polynomial, 482
  see also Prediction
Fast Fourier transform, 357
Fieller's confidence interval, 393
Filter, 79, 165
  delay, 167
  frequency response function, 167
  gain of, 167
  Kalman, 188
  linear, 165, 497-509
  matrix, 179
  for mean, 497-502
  mean square error of, 184
  moving average, 497-509
  phase of, 167
  power transfer function, 167
  for signal measurement, 181
  time invariant, 166
  transfer function, 167
  see also Moving average; Difference; Autoregressive transformation
Fisher's periodogram test, 363
Folding frequency, see Nyquist frequency
Forecast, 79-94. See also Prediction
Fourier coefficients, 114
  analysis of variance, 356
  complex, 130-131
  of finite vector, 114
  of function on an interval, 117
  multivariate, 385-386
    distribution of, 389
  as regression coefficients, 114-135, 356
  table, 361
  of time series, 355-356
  uniqueness, 119
Fourier Integral Theorem, 132
Fourier series, Cesàro summability, 129
  convergence, 121-130
  integral formula, 123
  of periodic function, 125
  see also Fourier transform; Trigonometric polynomial
Fourier transform, 132
  of convolution, 136-138
  of correlation function, 127, 132
  of covariance function, 357. See also Spectral density
  of estimated covariance function, 357. See also Periodogram
  inverse transform, 133
  pair, 132
  of product, 138-139
  on real line, 132
  table of, 134
Fourier transform pair, 132
Fractional difference, 100
Frequency, 14
Frequency domain, 143
Frequency response function, 167
Fubini's theorem, 34
Function, complete system of, 119
  delta, 141 (Exercise 8)
  distribution, 2-3
  even, 8-9
  expectation of, 243
  generalized, 140 (Exercise 8)
  integrable, 117
  odd, 8-9
  periodic, 13, 14
  piecewise smooth, 121
  sample, 3
  smooth, 121
  symmetric distribution, 10
  trigonometric, 13
Gain, 174
  bivariate time series, 174
  confidence interval for, 394
  of difference operator, 514
  estimation, 394
  of filter, 167
  of trend removal filter, 513-514
Gauss-Newton estimator, 269, 437
  estimator of variance of, 271
  moving average time series, 422-424
  one step, 269
  unit root moving average, 630
Generalized function, 140 (Exercise 8)
Generalized inverse, 80
Generalized least squares, 280
  efficiency, 479-480
  estimated, 280, 520-522
  for trend, 476
Gibbs phenomenon, 140 (Exercise 3)
Grafted polynomials, 480-484
  definition, 481
  for trend, 482-484
Gram-Schmidt orthogonalization, 86, 520


Helly-Bray theorem, 230
Hermitian matrix, 170
Heterogeneous variances, 488, 492-497
Hölder inequality, 243
Homogeneous difference equation, 44. See also Difference equation
Implicit price deflator, 593
Impulse response function, 529
Index set, 3, 5
Initial condition equation, 188
Innovation algorithm, 86
Innovation outlier, 460, 461
Input-output system, 174
Instrumental variables, 273, 532-534
  distribution of estimator, 275
  estimator, 274
  example of, 277
  for lagged dependent variables, 532-534
Integral formula, Fourier series, 123
Integrated spectrum, 144
Integrated time series, 502
  moving average for, 502-503
  see also Unit root autoregression
Inverse autocorrelation function, 211
Inverse transform, 133
Invertible moving average, 66. See also Moving average time series
Joint distribution function, 3
Jordan canonical form, 52
Jump discontinuity, 122
Kalman filter, 188, 497
  initial condition equation for, 188
  with missing data, 198
  for predictions, 198, 202
  for unit root process, 195
  updating equations, 190
  see also State space
Kernel, 382. See also Window
Kolmogorov-Smirnov test, 363
Kronecker's Lemma, 126

Lagged value, 42
  in regression, 530
Lag window, 382
Law of large numbers, 253-255. See also Convergence in probability
Least squares estimator, 251
  for autoregressive time series, 405-412. See also Autoregressive time series, estimation
  of autoregressive unit root, 603
  distribution of, 256, 478
  efficiency of, 312, 479-480, 518-519
  of explosive process, 584, 591
  Fourier coefficients, 114, 117
  of mean, 311
  of mean function, 476-480
  moving average, 497
  for moving average time series, 421-429. See also Moving average time series, estimation
  nonlinear, 251. See also Gauss-Newton estimator
  for prediction, 79-80. See also Prediction
  residuals, 484-497
    autoregressive parameters from, 490
    heterogeneous variances, 488, 492-497
  with time series errors, 518-529
  for trend, 476
  with trigonometric functions, 117
  variance of, 256
  variance estimator, 260
  of vector unit root process, 578-629
  see also Generalized least squares; Maximum likelihood; various models
Lebesgue-Stieltjes integral, 9
Liapounov central limit theorem, 233
Likelihood estimation, see Maximum likelihood estimation
Limited information maximum likelihood, 274
Lindeberg central limit theorem, 233
Linear filter, 165, 497-509
Linear operator, see Filter
Line spectrum, 146
Long memory time series, 98-101
  estimation for, 466-471
  spectral density of, 168
Lynx, 455
Martingale difference, 235
Martingale process, 234
Matrix, absolutely summable sequence, 76
  circulant, 150
  circular, 150
  circular symmetric, 151
  covariance of least squares estimator, 256. See also Least squares estimator; Gauss-Newton estimator; Generalized least squares; Instrumental variables
  covariance of n observations, 80
  covariance of prediction errors, 90-91
  covariance of vector time series, 15
  diagonalization of covariance, 151-154


  difference equations, 49-54
  Hermitian, 170
  Jordan canonical form, 52
  positive definite, 154
  positive semidefinite, 154
Maximum likelihood estimation, autoregression, 405, 413-414
  autoregressive moving average, 431, 434
  lagged dependent variable, 535-538
  regression model with autoregressive errors, 526-527
  unit root autoregressive process, 546-547, 574-578
m-dependent, 321
Mean, best linear estimator of, 311
  central limit theorems for, 321, 329
  consistency of, 309, 325-326
  distribution of sample, 321
  efficiency of sample, 312
  estimation of, 308
  function, 475-476
  local approximation to, 497-499
  moving average estimator, 497-502
  structural model for, 509-513
  product of sample, 242
  of time series, 4
  variance of, 309
Mean prediction error, 438
Mean square approximation, 120
Mean square continuous, 6
Mean square convergence, 31-32, 35
Mean square differentiable, 7
Measurement error, 181
Measurement equation, 187
Missing data, Kalman filter for, 198
  maximum likelihood with, 458-459
Moorman Manufacturing Company, 278
Moving average, 21-26, 31-39, 497-509
  centered, 501
  computation of, 499-500, 506-507
  covariance function, 22
  covariance of, 33-39, 501-502
  difference as, 508
  filter, 497-509
  for integrated time series, 502
  least squares, 499
  for mean, 498-501
  mean square convergence of, 31-32, 35
  for nonstationarity, 502
  for polynomial mean, 501, 504
  for seasonal, 504-507, 515
  for seasonal component, 504-507
  spectral density, effect on, 513-514
  for trend, 498-501, 513
  see also Filter; Difference
Moving average process, 21-26. See also Moving average time series
Moving average time series, 21-26
  autocorrelation function, 22-23
  autoregressive representation, 65
  covariance function, 22-23
  estimation for, 421-429
    autocovariance estimator, 421-422
    distribution, 424, 432
    Durbin's procedure, 425
    empirical density, 429
    first order, 421-425
    higher order, 430-432
    initial estimators, 425
    residual mean square, 424
  first order, 23-25, 66
  finite, 21-22
  infinite, 23
  invertible, 66
  one sided, 22
  order of, 22
  partial correlation in, 23, 65
  prediction for, 85-88
  as representation for spectral density, 163
  as representation for general time series, 96
  root representation for, 66
  spectral density of, 156
  unit root, 66, 629-636
  vector, 75-76
Multiplicative model, 440
Multivariate random walk, 596-599
Multivariate spectral estimation, 385-399
Multivariate time series, 15-17. See also Vector valued time series

NI(0, σ²), 19 (Exercise 9)
Noise to signal ratio, 177
Nondeterministic, 81, 86, 94
Nonlinear model, 250
  autoregression, 451
  estimator for, 251, 269
Nonlinear regression, 250. See also Gauss-Newton estimator
Nonlinearity, test for, 453-454
Nonnegative definite, 7, 9
Nonstationary autoregression, 546-596. See also Explosive autoregression; Unit root autoregression
Nonstationary time series, 4, 475, 547
  types of, 475
  see also Nonstationary autoregression
Nyquist frequency, 136


Observation equation, 187
Odd function, 8, 9
One-step Gauss-Newton, 269
One-step prediction, see Prediction
Order, of autoregressive moving average time series, 70
  of autoregressive time series, 39
  of difference equation, 41, 53
  of expected value of functions, 214-250
  of expected value of moments, 242
  of magnitude, 214
  of moving average time series, 22
  in probability, 215
  of trigonometric polynomial, 117
Ordinary independent variable, 530
Ordinate, periodogram, 358
Orthogonal basis, 112
Orthogonal system of functions, 112, 116
Orthogonal vectors, 112
Orthogonality, of trigonometric functions, 112-114, 116
Outliers, 460-466
  additive, 460
  innovation, 461
PACF, see Partial autocorrelation
Parseval's theorem, 120
Partial autocorrelation, 10-12
  autoregressive time series, 40, 62
  estimators, 416
  moving average time series, 23-24, 65
Partial regression, 10
Period, 13
Periodic function, 13
  difference of, 508
Periodic time series, 13-14
Periodicity, test for, 363
Periodogram, 357-363
  chi-square distribution of, 360
  confidence interval based on, 375
  covariance of ordinates, 369
  cross, 388. See also Cross periodogram
  cumulative, 363
  definition, 357
  distribution of, 360
  expected value of, 359
  as Fourier transform of sample covariances, 357
  Fisher's test, 363
  Kolmogorov-Smirnov test, 363
  smoothed, 371-374
    distribution of, 372, 374
    expected value of, 372
    variance of, 372
  test for periodicity, 363
Phase angle, 14
  of delay, 167
  of filter, 167
Phase spectrum, 174
  confidence interval for, 393
  estimation, 392
Pig feeding experiment, 277
Polynomial, extrapolation, 482
  grafted, 480
  moving average for, 501, 504
  trend, difference of, 508
  trigonometric, 112
Positive definite Hermitian matrix, 170
Positive semidefinite, 7
  Hermitian matrix, 170
Power spectrum, see Spectral density
Power transfer function, 167
Prediction, 79-94
  for autoregressive moving average, 88-94
  for autoregressive time series, 83-85
  confidence interval for, 90
  estimated explosive process, 595-596
  with estimated parameters, 443-457
  estimated unit root process, 582-583
  first order autoregressive, 83-84
  with Kalman filter, 198, 202
  linear, 80
  minimum mean square error, 80
  for moving average time series, 85-88
  polynomial, 482
  recursive equations for, 86-87
  variance of, 80
Probability limit, 215
  of functions, 222
Probability space, 1-2
Process, 3-5. See also Time series; various types
Product, expectation of, 242
Quadrature spectral density, 169
Quadrature spectrum, 169
  estimation of, 391
Random coefficient autoregression, 457
Random variable, 2
Random walk, 546
  with drift, 565
  estimation, 546
  vector, 596-599
  see also Unit root
Realization, 3, 5, 81
Regression, with lagged variables, 530-538
  for time series, 518-538
  see also Least squares estimator; Generalized least squares; Partial regression


Regular time series, 81, 86, 94
Residuals, least squares, 484-497. See also Least squares estimator
Response function, see Frequency response function
Roots, of characteristic equation, 46-48
  of matrices, 298
  of polynomial, 291
  see also Characteristic roots
Sample function, 3
Seasonal adjustment, 504-507. See also Filter; Moving average
Seasonal component, 14, 475, 504-507
  moving average estimation, 504-507
  see also Mean; Filter; Moving average
Sequence, 26-39
  absolutely summable, 28
  convergent, 28
  convolution of, 30
  of expectations, 243
  infinite, 26-27
  of matrices, 76
  of uncorrelated random variables, 5
  white noise, 5
Serial correlation, see Autocorrelation; Cross correlation
Serial covariance, see Autocovariance; Cross covariance
Series, infinite, 27
  absolutely convergent, 28
  convergent, 28
  divergent, 28
  Fourier, 123-130. See also Fourier series
  trigonometric, 117
Shift operator, see Backward shift operator; Lagged value
Short memory process, 98
Sigma-algebra, 2
Sigma-field, 2
Signal measurement, 181
Singular time series, 81, 96
Skew symmetric, 13
Smoothed periodogram, 371-374
Smoothing, see Moving average; Filter; Periodogram
Spectral density, 132
  approximation for, 163, 165
  autoregressive estimator, 385
  of autoregressive moving average, 161
  autoregressive representation, 165
  of autoregressive time series, 159
  coincident, 169
  confidence interval for, 375
  of convolution, 138
  covariance estimator of, 382
    distribution of, 382-384
    windows for, 384-385
  cross, 169. See also Cross spectrum
  of differenced time series, 513-514
  error, 177
  estimation, 355-385. See also Cross periodogram; Periodogram; Spectrum estimation
  existence of, 127
  of filtered time series, 156, 166-167, 179-180, 513-514
  of long memory process, 168
  moving average representation, 163
  of moving average time series, 156
  properties of, 127
  quadrature, 169
  rational, 161
  of seasonally adjusted time series, 515-516
  of vector autoregressive process, 180
  of vector moving average process, 180
  of vector process, 172
  of vector time series, 172
    estimation, 385-399
  see also Spectrum
Spectral distribution function, 144
Spectral kernel, 382
Spectral window, 382. See also Window
Spectrum, 132
  amplitude, 174
  cospectrum, 169
  cross, 169
  discrete, 146
  error, 177
    estimation of, 392
  estimation, 355-385. See also Cross periodogram; Periodogram; Spectral density, covariance estimator
  integrated, 144
  line, 146
  phase, 174
  quadrature, 169
  see also Spectral density
Spirits, 522, 528
Spline functions, 484. See also Grafted polynomials
Squared coherency, 172, 175
  confidence interval, 392
  estimation, 391
  multiple, 391-392
  test of zero coherency, 396
Starting equation, 188
State equation, 187
State space, 187


  measurement equation, 187
  representation for ARMA, 200
  state equation, 187
Stationarity, 3-4
  covariance, 4
  second order, 4
  strictly, 3
  weakly, 4
  wide sense, 4
  see also Nonstationary time series
Stationary autoregressive time series, estimation, 404-407. See also Autoregressive time series, estimation
  vector, 77
  see also Autoregressive time series
Stochastic difference equation, 39
Stochastic process, 3-5
  continuous, 6
  definition, 3
  mean square continuous, 6
  see also Time series
Stochastic volatility model, 494-497
Strictly stationary, 3
Structural model, 509-513
Symmetric estimators, 414-416
  simple, 414
  simple for unit root, 568-573
  weighted, 414-416
  weighted for unit root, 570-574
Taylor's series, for expectations, 214-250
  for random functions, 224-226
Threshold autoregression, 454
  smoothed, 455
Time domain, 143
Time series, causal, 108 (Exercise 32)
  complex, 12-13
  covariance stationary, 4
  continuous, 5
  definition, 3
  deterministic, 81
  long memory, 98-101
  nonsingular, 82
  nonstationary, 4
  periodic, 13
  realization of, 81
  regular, 81, 94
  short memory, 98
  singular, 81, 96
  stationary, 3, 4
  strictly stationary, 3
  see also Autoregressive time series; Moving average time series; various time series entries
Transfer function, 167
  model, 529
  of trend removal filter, 514
Transformation, autoregressive
  Gram-Schmidt orthogonalization, 86
  see also Filter; Moving average
Transform pair, 132
Transition equation, 187
Treasury bill rate, 564, 577, 581, 624
Trend, 4, 475
  filter for, 497-502
  global estimation of, 476-480
  grafted polynomial, 480-482
  moving average estimation, 497-502
  structural model for, 509-513
  see also Mean; Least squares estimator; Moving average
Trigonometric functions, 14, 112
  amplitude, 14
  completeness of, 119
  frequency, 14
  orthogonality of, on integers, 112-114
  period, 14
  phase, 14
  polynomial, 112, 355
  sum on integers, 112-114
  see also Fourier series; Fourier transform
Trigonometric polynomial, 112
  order of, 117
Trigonometric series, 117
  complex representation, 130
  on interval, 116
Two-stage least squares, 274, 532
Unemployment rate, 336, 379, 412
  autoregressive representation, 412
  correlogram for, 339-340
  monthly, 379
Unit root, autoregression, 63-64, 546
  autoregressive estimation, 547-583
    distribution, 550-551, 556, 561-563
    drift, 565-568
    first order, 547-563
    higher order, 555-560
    maximum likelihood, 573-577
    mean estimated, 560-565
    median unbiased, 578-579
    trend estimated, 567-568
  explanatory variable, 603-606
  first order, 546
  Kalman filter for, 195
  moments of, 547, 639 (Exercise 4)
  moving average, 66, 629-636
  prediction for, 573-577
  test for, univariate, 553, 555, 558, 568-577
    tables, 641-645


  test for, vector, 622
    tables, 646-652
  vector autoregression, 596-629
Unobserved components, 509
Updating equation, 190
Variance, 4
  heterogeneous, 488, 492-497
  see also Autocovariance; Cross covariance
Vector process, 15-17, 75-79. See also Vector valued time series
Vector valued time series, 15-17, 75-79
  autoregressive, 77-78
    unit root, 607-610
  covariance matrix, 15
  covariance stationary, 15
  estimation, 419-421. See also Cross covariance, estimation; Cross periodogram
  moving average, 75-76
  spectral density, 172
    estimation, 385-399
  unit roots, tests for, 622
    tables, 646-652
  Wiener process, 238, 596-599
  see also Cross correlation; Cross spectral function
von Neumann ratio, 319, 486
  t approximation for, 488
Weak convergence, 236
Weakly stationary, 4
Weighted average, see Filter; Moving average
Wheat yields, 510, 527
White noise, 5
  band passed, 212
Wiener process, 236
Window, 382, 384
  Bartlett, 384
  Blackman-Tukey, 385
  hamming, 385
  hanning, 385
  lag, 382
  Parzen, 385
  rectangular, 384
  spectral, 382
  triangular, 384
  truncated, 384
Wold decomposition, 94-98
Yule-Walker equations, 55, 408
  multivariate, 78