time series

Download Time Series

If you can't read please download the document

Upload: pambuh01

Post on 03-Nov-2014

168 views

Category:

Documents


27 download

DESCRIPTION

Fime series - analysis and forecasting- econometrics

TRANSCRIPT

Preface

The theory and practice of the analysis of time series has followed two lines almost since its inception. One of these proceeds from the Fourier transformation of the data and the other from a parametric representation of the temporal relationships. Of course, the two lines are interrelated. The frequency analysis of data was surveyed in Volume 3 of the present Handbook of Statistics series, subtitled, Time Series in the Frequency Domain, edited by D. R. Brillinger and P. R. Krishnaiah. Time domain methods are dealt with in this volume. The methods are old, going back at least to the ideas of Prony in the eighteenth century, and owe a great deal to the work of Yule early this century. Several different techniques for classes of nonstationary processes have been developed by various analysts. By the very nature of the subject in these cases, the work tends to be either predominantly data analysis oriented with scant justifications, or mathematically oriented with inevitably advanced arguments. This volume contains descriptions of both these approaches by strengthening the former and minimizing the latter, and yet presenting the state-of-the-art in the subject. A brief indication of the work included is as follows. One of the successful parametric models is the classical autoregressive scheme, going back to the pioneering work of G. U. Yule, early in this century. The model is a difference equation with constant coefficients, and much of the classical work is done if the roots of its characteristic equation are interior to the unit circle. If the roots are of unit modulus, the analysis presents many difficulties. The advances made in recent years in this area are described in W. Fuller's article. An important development in the time domain area is the work of R. Kalman. It led to the emphasis on a formalization of rational transfer function systems as defined by an underlying state vector generated in a Markovian manner and observed subject to noise. This representation is connected with a rich structure theory whose understanding is central in the subject. It is surveyed in the article by M. Deistler. The structure and analysis of several classes of nonstationary time series that are not of autoregressive type but for which the ideas of Fourier analysis extend is given in the article by M. M. Rao; and the filtering and smoothing problems are discussed by D. K. Chang. Related results on what may be termed "asymptotically stationary" and allied time series have been surveyed in C. S. K. Bahagavan's paper. The papers by L. Ljung, P. Young and G. C. Tiao relate to the estimation

vi

Preface

problems in the dynamical modelling systems. Here Young's paper deals with the on-line (real time) calculations. One of the uses of these models has been to analyze the consequences of an intervention (such as the introduction of exhaust emission laws) and another to consider the outlier detection problems. These are discussed by Tiao and T. Ozaki. Though rational transfer function models are parametric, it is seldom the case that the model set contains the truth and the problem may better be viewed as one of selecting a structure from an infinite set in some asymptotically optimal manner. This point of view is explored by R. Shibata. Though least squares techniques, applied to the prediction errors, have dominated, there is a need to modify these to obtain estimators less influenced by discrepant observations. This is treated by Tiao and, in an extensive discussion, by R. D. Martin and V. J. Yohai. The model selection and unequally spaced data are natural problems in this area confronting the experimenter, and these are discussed by R. H. Jones. Since the time points may sometimes be under control of the experimenter, their optimal choice must be considered. This problem is treated by S. Cambanis. The modelling in the papers referred to above has been essentially linear. Ozaki presents an approach to the difficult problem of nonlinear modelling. The autoregressive models may have time varying parameters, and this is considered by D. F. Nicholls and A. R. Pagan. Their paper has special reference to econometric data as does also the paper by H. Theil and D. G. Fiebig who treat the problem where the regressor vectors in a multivariate system may be of a dimension higher than the number of time points for observation. The final two papers on applications by M. A. Cameron, P. J. Thomson and P. de Souza complement the areas covered by the preceding ones. These are designed to show two special applications, namely in signal attenuation estimation and speech recognition. Thus several aspects of the time domain analysis and the current trends are described in the different chapters of this volume. So they will be of interest not only to the research workers in the area of time series, but also to data analysts who use these techniques in their work. We wish to express our sincere appreciation to the authors for their excellent cooperation. We also thank the North-Holland Publishing Company for their cooperation. Eo J. Hannan P. R. Krishnaiah M. M. Rao

Contributors

C. S. K. Bhagavan, Dept. of Statistics, Andhra University, Waltair, India 530003 (Ch. H) S. Cambanis, Dept. of Statistics, University of North Carolina, Chapel Hill, NC 27514, USA (Ch. 13) M. A. Cameron, CSIRO, Division of Mathematics & Statistics, P.O. Box 218, Lindfield, N.S.W., Australia 2070 (Ch. 14) D. K. Chang, Dept. of Mathematics, California State University, Los Angeles, CA 90023, USA (Ch. 12) M. Deistler, Institute of Econometrics, Technical University of Vienna, Argentinierstr. 8, A 1040 Vienna, Austria (Ch. 9) P. de Souza, Dept. of Mathematics, Victoria University, Wellington, New Zealand (Ch. 15) D. G. Fiebig, University of Sydney, Sydney, N.S.W., Australia 2006 (Ch. 17) W. A. Fuller, Dept. of Statistics, Iowa State University, Ames, IA 50011, USA (Ch. 1) R. H. Jones, Scientific Computing Center, University of Colorado Medical Center, Box B-119, Denver, CO 80262, USA (Ch. 5) L. Ljung, Dept. of Electrical Engineering, Link6ping University, S-581 83 LinkSping, Sweden (Ch. 7) R. D. Martin, Dept. of Statistics, GN22, B313 Padelford Hall, University of Washington, Seattle, WA 98195, USA (Ch. 4) D. F. Nicholls, Statistics Dept., Australian National University, G.P.O. Box 4, Canberra, A.C.T., Australia 2601 (Ch. 16) T. Ozaki, The Institute of Statistical Mathematics, 4-6-7-Minami-Azabu, Minato-Ku, Tokyo, Japan (Ch. 2) A.R. Pagan, Statistics Dept., Australian National University, G.P.O. Box 4, Canberra, A.C.T., Australia 2601 (Ch. 16) M.M. Rao, Dept. of Mathematics, University of California, Riverside, CA 92521, USA (Ch. 10) R. Shibata, Dept. of Mathematics, Keio University, 3-14-1 Hiyoshi, Kohoku, Yokohama 223, Japan (Ch. 6) H. Theil, College of Business Administration, Dept. of Economics, University of Florida, Gainesville, FL 32611, USA (Ch. 17)xiii

xiv

Contributors

19. J. Thomson, Institute of Statistics and Operations Research, Victoria University, Wellington, New Zealand (Ch. 14, 15) G. C. Tiao, Graduate School of Business, University of Chicago, Chicago, IL 60637, USA (Ch. 3) V. J. Yohai, Department of Mathematics, Piso 7, University of Buenos Aires, Argentina (Ch. 4) P. Young, Dept. of Environmental Sciences, University of Lancaster, Lancaster L A I 4YQ, England (Ch. 8)

E. J. Hannan, P. R. Krishnaiah, M. M. Rao, eds., Handbook of Statistics, Vol. 5 Elsevier Science Publishers B.V. (1985) 1-23

1

Nonstationary

Autoregressive

Time

Series

Wayne A. Fuller

1. I n t r o d u c t i o n A m o d e l often u s e d to d e s c r i b e the b e h a v i o r of a v a r i a b l e o v e r time is the a u t o r e g r e s s i v e m o d e l . In this m o d e l it is a s s u m e d that the c u r r e n t value can be e x p r e s s e d as a function of p r e c e d i n g values a n d a r a n d o m error. If we let Yt d e n o t e the value of the v a r i a b l e at time t, the p t h - o r d e r real valued a u t o r e gressive time series is a s s u m e d to satisfyP

Y , = g ( t ) + ~ , c q Y , _ i + e ,,i-1

t-l,2

.....

(1.1)

w h e r e t h e e,, t - 1, 2 . . . . . are r a n d o m v a r i a b l e s a n d g ( t ) is a real v a l u e d fixed function of time. W e h a v e chosen to define the a u t o r e g r e s s i v e time series on the p o s i t i v e integers, but the time series might b e d e f i n e d on o t h e r d o m a i n s . T h e statistical b e h a v i o r of the t i m e series is d e t e r m i n e d by the initial values (I(0, Y--1. . . . . Y p+l), by the function g(t), by the coefficients (cq, a 2. . . . . C~p), and by the stochastic p r o p e r t i e s of the e,. W e shall, h e n c e f o r t h , assume that the e, h a v e z e r o m e a n a n d v a r i a n c e 0-2. A t a m i n i m u m we a s s u m e the e, to be u n c o r r e l a t e d . O f t e n we a s s u m e the e, to be i n d e p e n d e n t l y and i d e n t i c a l l y distributed. L e t t h e j o i n t d i s t r i b u t i o n function of a finite set {Y,,, Y'2 . . . . . Y,} of the 1/, be d e n o t e d by

Y,2

. . . . .

Y'2 . . . . Y'o) .

T h e t i m e series is strictly s t a t i o n a r y if F,,,,, r,2..... v,, (Y,,' Y,2. . . . . Y,) = F ~,~+,, W,2+h r,,,h(Y'~' Y'2. . . . . ..... Y',,)

for all p o s s i b l e sets of indices tl, t 2 , . . . , tn and t 1+ h, t 2+ h , . . . ,-t, + h in t h e set {1, 2 . . . . }. T h e time series is said to b e c o v a r i a n c e s t a t i o n a r y if

E{Y,} =

t=1,2 .....

2

W. A, Fuller

and E{(Y,-tx)(Yt.~h-/X)}=y(h), t=1,2 .... ; h=0,1,...,

w h e r e / x is a real n u m b e r and y ( h ) is a'real valued function of h. T o study the b e h a v i o r of the time series Y~ we solve the difference e q u a t i o n (1.1) and express I,', as a function of (el, e2 . . . . e,) and (Y0, Y < . . . . , Y-p+1). T h e difference equationp

coi= ~ c~jcoi-jj=l

(1.2)

with initial conditions co0 = 1 , coi=0, i= 1, - 2 . . . .

has solution of the formp

col = Z cjim~,j=l

(1.3)

where m i are the roots of the characteristic equationp

m p - ~ ~jm p-j = 0 ,j=l

(1.4)

the coefficients Cji

are

of the form (1.5)

cji = bji kj ,

and the bj are such that the initial conditions are satisfied. T h e e x p o n e n t kj is zero if the root rnj is a distinct root. A root with multiplicity r has r coefficients with k j = 0 , 1 . . . . . r - 1 . Using the coi, the time series Y, can be written ast-I p I t -1

Yt = 2 coie,-i + ~, co,+iY-i + ~]~ wig(ti=0 i= 0 i=0

i) .

(1.6)

T h e m e a n of Yt ist-I p 1

E { Y t } = ~, co,g(ti=0

i)+ ~, co,+iE{Y ,}.i= 0

(1.7)

T h e r e f o r e , if (1/0, Y-1 . . . . . Y-p+1) is a fixed vector, the variance of I", is a function of t and Y, is not stationary.

Nonstationary autoregressive time series

3

If the roots of (1.4) are less than one in absolute value, then m i goes to zero as i goes to infinity. O n e c o m m o n model is that in which g(t) =- a o. A s s u m e that (Yo, Y - l , . . . , Y-p+1) is a vector of r a n d o m variables with c o m m o n mean

o0(,-ko,)'i=1

c o m m o n variancea 0"2 E tO~ i=o (1.9)

and covariances

S{Yt.Yt+h}=0-2E(.oio)i+h,i=0

t,t+h=O,-1

.... ,-p+l.

(1.10)

If g(t) = %, if (Y0, Y-1 . . . . . Y--v+1) is i n d e p e n d e n t of (e0, e 1. . . . ), and if the initial conditions satisfy (1.8), (1.9) and (1.10), then Y, is covariance stationary. If the initial conditions do not satisfy (1.8), (1.9) and (1.10), the time series will display a different b e h a v i o r for small t than for large t. H o w e v e r , if g ( t ) = a0 and the roots of the characteristic e q u a t i o n are less than one in absolute value, the nonstationarity is transitory. In such a situation, the large-t b e h a v i o r is that of a stationary time series.

2. T h e first-order model W e begin our discussion with the first-order m o d e l

g,=ao+c~lY=%,

, 1-~ e,,

t=1,2 ..... t=0.

(2.1)

Given n o b s e r v a t i o n s on the process, several inference p r o b l e m s can be considered. O n e is the estimation of %. Closely related to the estimation p r o b l e m is the p r o b l e m of testing h y p o t h e s e s a b o u t oq, particularly the hypothesis that o~ = 1. Finally, one may be interested in predicting future observations. A natural e s t i m a t o r for (c~o,%) is the least squares e s t i m a t o r o b t a i n e d by regressing Y, oll Y,-1, including an intercept in the regression. T h e estimators aren _. 2 q lr n

(2.2)

4

W . A . Fuller

wheren

Y, 1,t 1 n t-1

These estimators are the m a x i m u m likelihood estimators for normal e, and fixed I/0. The distribution of c~1 depends upon the true value of al, the initial conditions, and the distribution of the e,. The error in the estimator of c~1 can be written

(~1 (~1-

(Yg-I- Y(-,))

~'~ (Y,-1- Y(-t))( e, e(0)).t=l

(2.3)

Under the assumption that the e, are uncorrelated, the expected value of the n u m e r a t o r is zero. The limiting behavior of the estimator is determined by the joint behavior of the sample m o m e n t s in the numerator and d e n o m i n a t o r of (2.3). The limiting distributions of c}~ are characterized in Table 2.1. For a time series with l a [ < 1, the limiting distribution of F/1/2(15~1 - O{1) is normal under quite weak assumptions. T h e first proof of the limiting normal distribution was given by Mann and Wald (1943). There have been a n u m b e r of extensions since that time. Because wi ~ 0 as n -~ % the initial value Y0, for any real Y0, will not influence the limiting distribution, though the influence for small samples could be large. The variance of the limiting distribution ofn l / 2 ( ~ 1 -- 0~1) is

O) i

~

l--

O'1.

Table 2.1 Limiting properties of the least squares estimator of crt Parameters Initial value I/0 any real any real any real !/0 = 0 Yo = 0 Y0 # 0 Distribution of et lID(0, 0-2) lID(0, 0-2) lID(0, 0.2) NID(0, 0.2) lID(0, 0.2)N I D ( 0 , 0.2)

Limiting distribution Standardizing function a nl/2(1 - a2) -1/2 n 3/2 n (a 2 - 1) '0" ( a 2 - I) toe'l' (a '2 -- 1) 'de

0"1 ]all < 10"ll = lai] = [all > 10"11> 1 1 1 1 1 I0"1[ > a

0"o any real ao # 0 ao 0 0"0 = 0 ao = 0 0"0 # 0

Form b N(0, 1) Normal Tabulated Cauchy 9 N(0, 1)/N(~:, 1)

aThe standardizing function is a multiplier of ( ~ distribution. bThe constant s = Y0+ o~0(1 og) 1. c

oct) that produces a n o n d e g e n e r a t e limiting

Nonstationary autoregressivetime series

5

The result of Table 2.1 is stated for independently and identically distributed random variables, but the limiting distribution of nU2(&l- cq) is also normal for et that are martingale differences. For example, see H a n n a n and Heyde (1972) and Crowder (1980). If Icq] = 1 and a 0 - 0, there is no simple closed form expression for the limiting distribution of n(&a- oq). The limiting distribution of n(~ 1 %) is that of a function of three random variables,

n(al-- ~,)--, [2(r- W2)] l[(T2 1) 2TW],where (F, T, W) ==

L

(2.4)

2 2 ~,,z,, E 2%,,z,, Z ~,~/2~',~,), ~ 2~.i=1 i=1

Yi-- (-1)i+12[( 2i -

l ) v ] -1 ,

and {Z/} is a sequence of NI(0, 1) random variables. Tables of the distribution are given in Fuller (1976) and the distribution has been discussed by Dickey and Fuller (1979). T h e estimator of oq constructed under the knowledge that a 0 = 0 has been studied by White (1958), Rao (1978a, 1978b), Dickey and Fuller (1979), and Evans and Savin (1981a). It is interesting that the normalization required to obtain a limiting distribution for 61 when [all = 1 is n, not n u2. The basis for the normalization is partly explained by examining the sum of squares in the denominator of ill. If Yt is stationary, E{Y~} is a constant for all t and

is nearly a constant multiple of n. This remains true for lall < 1 and any fixed real initial conditions. If lO/ll : 1 and a 0 = 0,

E { v , ~} = to-~andn

y,=

n g 2 = [2-'n(n q 1) = 6 q ( n 2 - 1)o.2 "

6 1, a0 = 0 and Y0 = 0, then Y, can be written ast-1

r , : E ' lleti=o t-1

i

= Ol tl i=0

Ol i1- t et ic~

j=t+l

where X : j=l

aTJej.

Therefore, aTtyt converges to the r a n d o m variable X as t becomes large. It is also true thatn

-2n (~1 t=l

--2 P , 2 1)X Y t - " ' + ( O / 1 -" .

2

T h e limiting properties of the estimator of % follow from these results. Because the sum of squares of Yt is increasing at the rate a 2n, the least squares 1 estimator of a 1 converges to o/1 very rapidly and it is necessary to multiply d 1 - % by a'~ to obtain a limiting distribution. The limiting distribution of a~'(~l - al) is that of the ratio of two r a n d o m variables. T h e variable X (or X plus a constant) is in the d e n o m i n a t o r and the n u m e r a t o r variable is an independent r a n d o m variable whose distribution is the limiting distribution ofn-1

1=0

Therefore, if s 0 - 0, I/0 = 0 and the e, are normally distributed, the limiting

Nonstationary autoregressive time series

7

distribution is that of a Cauchy random variable. This-result was obtained by White (1958) and has been extended by Anderson (1959), R a o (1961), Venkataraman (1967), Narasimham (1969), and Hasza (1977). If s 0 0 or Y0 0, the denominator random variable has a nonzero m e a n (see Table 2.1). If the e t are not normally distributed, the form of the limit distribution depends upon the form of the distribution of the e r To summarize, the least squares estimator of c~1 has a limiting distribution for any value of c~,, but the standardizing function of n required to obtain a limiting distribution is a function of cq, c~0 and Y0- Also, the form of the distribution is a function of the same three parameters. An interesting aspect of the limiting distribution of the estimator of % is that o-2 is not a p a r a m e t e r of the distribution. This is because the least squares estimator of o~1 is invariant to changes in the scale of Yr. The case of Icq] = 1 is clearly a boundary case. Fuller (197"9) has shown that slight changes in the definition of the estimator produce different limiting distributions. For example, if it is known that ]O{11~ 1, and if one has observations (Y0, Y , . . . , Y,), one might use the estimator

~1where

n 1 ( Y o - y)2 + Z ( Y t - Y)2 q-2(Ynt=l

n

Z (Yt-,- Y)(Yt-- Y),t=l

(2.6)

n

J3 = (n + 1)-' ~'~ Y~.t-0

This estimator is restricted to [ - 1 , 1] and is the estimator for the first-order process used in the m a x i m u m entropy method of spectral estimation described by Burg (1975) and Ulrych and Bishop (1975). If al = 1, thenL 1 n ( ~ l - gl)-)-2[/=~1 ~ u i ] 2] -1 , 2

(2.7)

where {ui} is a sequence of NID(0, 1) random variables Y2i-12= (4i2 2)-1 Y2i2= (4Z~)-1 ,

and Z i is the ith positive zero of the function t 2 sin t - t ' cos t. The limiting distribution was obtained in a different context by Anderson and Darling (1952) and is discussed by MacNeil (1978). The distribution defined in (2.7) is much easier to tabulate than that of 61, where 61 is defined in (2.2) because the characteristic function for (2.7) may be obtained and inverted numerically. Statistics closely related to 61 have been discussed by Durbin (1973), Sargan and Bhargava (1983) and Bhargava (1983). Lai and Siegmund (1983) consider a sampling scheme in which observations

8

W.A. Fuller

are taken from the time series until

ncE Y~-I > co-2,t=l

where c is a specified constant and n c is the smallest number such that the inequality holds. For this sampling scheme and the model with a 0 = 0 known, they show thatn~l

Yt-1)

2

\1t2

( d q - oq) + N(0, o e)

L

as c ~ % uniformly for - 1 ~< ~'1 ~ 1. Thus, for a particular kind of sampling, a limiting normal distribution is also obtained for the unit root case. The least squares estimator of a0 given in (2.2) can be written as

(2.8)Therefore, the distribution of d 0 is intimately related to that of & l - %. For the model with ]%] < 1, the limiting distribution of nl/2(60- o~0)is normal. For other situations, the limiting distribution is more complicated. The fact that the distribution of 61 does not depend on o-2 permits one to use the distribution of Table 2.1 for inference about a 1. Another statistic that is natural to use for inference purposes is the Studentized statistic= [ ~t~'r{61}1-1(~11),

(2.9)

whereg{~l} = = n

(Yt-

] or2,

I~ = "2

(g/ - -

2) -1 Z [ 1/, - Y{0~- c~l(Y, 1 - 'i2{ 0)12.t=l

The limiting distribution of the statistic [ also depends upon the true parameters of the model. The types of distributions are tabulated in Table 2.2. For those situations where the limiting distribution of the standardized least squares estimator oq is normal, the limiting distribution of the [-statistic is N(0, 1). The distribution of [ for loq] = 1 is a ratio of quadratic forms and has been tabulated by Dickey (1976). See Fuller (1976). One of the more interesting results of Table 2.2 is the fact that the limiting distribution of the -statistic is N(0, 1) for ]all > 1. This result emphasizes the unique place of tall = 1. The -statistic for estimator (2.6) has a limiting distribution that is a simple transformation of the limiting distribution of 61 . The properties of predictors for the first-order autoregressive process are

Nonstationary autoregressive time seriesTable 2.2 Limiting properties of the least squares 't-statistic' Parameters Initial value Y0 lall < Icql = I~ll = Icql > Icql > [all > I 1 1 1 1 1 any real s0 ~ 0 so = 0 0~0 = 0 a0 = 0 ao # 0 any real any real any real Yo - 0 Y~ - 0 Yo- 0 Distribution of et IID(0, 0-:) IID(0, 0-2) lID(0, 0 - 2 ) NID(0, 0-2) liD(0, 0-2) N1D(O, 0"2) Limiting distribution N(0, 1) N(0, 1) Tabulated N(0, 1) ?(0, 1) N(O, 1)

given in Table 2.3. Let Y,j denote the predictor constructed with known parameters. If the parameters are known and if the e t are independent, the best predictor of Y,+j given (Y0, Y~. . . . , Y,) is the conditional expectation ?o+j = E{Yo+j [ Y.} = d0+ = %(1+ a 1+... + The error in this predictor is Yn+j - Y,+: e,,.j+ale,.;1 +'' "q-Od 1

-1)+ a t Y . .

j-2

e, 0 the system is called a hard spring type and when c < 0 the system is called a soft spring type. A natural non-linear extension of the time series model for ship rolling may be the A R model with some non-linear terms such asXt = ~)lXt_ 1 -~ . . . + (~pXt_ p 4- O ( X t _ l . . . . . Xt_p) + E,,

(2.12)

where Q(Xt_l,..., Xt_p) is a polynomial of the variables xt_ 1. . . . , xt_ p. We call the non-linear A R model (2.12) a polynomial A R model. The validity of the polynomial A R model is checked by fitting both linear and polynomial A R models for part of the data (see Fig. 2.6) and by comparing the one-step-ahead prediction error variances obtained by applying both fitted models to the rest of the data. For example, we fitted the AR(7) model and an AR(7) model with a non-linear term x 31 (AR(7)+ ~-Xt_l) for the first 760 data points, 3 tx 1, x a. . . . . x760, of Fig. 2.2, and calculated the variances of the one-step-ahead prediction error for x761. . . . . xl000. The obtained prediction error variance d-2Lof ~2 the AR(7) model was ~rL= 0.1041 and the prediction error variance O'NL^2 the by AR(7) +Trxt_ model was ~rNL^2 0.1016. This means that the non-linear model ~3 = slightly improves the prediction performance of the ship rolling. Although the above polynomial A R model gives better predictions than the linear A R model, it has a fatal deficiency as a model for the dynamics of vibration systems. Simulations of fitted polynomial A R models almost always diverge even though the original ship rolling process or the process defined by a non-linear stochastic differential equation (2.11) is quite stable and nondivergent. Therefore, some other non-linear time series model which is not explosive is desired.2.3. E x p o n e n t i a l A R models

To see the reason why the polynomial A R models are explosive, let us

32

T. Ozaki

consider some simple polynomial A R models of vibration systems. The simplest A R model which can exhibit a r a n d o m vibration system is the AR(2) model X t 051Xt_1 ~- 052Xt 2 + F.t,

(2.13)

and the simplest polynomial A R model for a non-linear vibration system is xt = 05ixt_1 + 052xt_2+ 7rxt_13+ et. The spectrum of the process defined by (2.13) isO" 2

(2.14)

P(f)

~- I 1 - 051 e i 2 ~ r f 05 e-~2~S.2 2

2

(2.15)

The p e a k of the spectrum which characterizes the proper frequency of the vibration system is attained at 1 X / - 052 -- 4052, ~ f = 2"rr tan-1

05~

which is the argument of the roots of the characteristic equation, A 2 - 05,A --052 = O. (2.16)

When 052 is fixed to be a constant, the proper frequency is characterized by 051 as in Fig. 2.7. The polynomial A R model (2.14) is represented asxt = (051 + 7TXt_l)Xt_l -]- 052Xt_2 q- e t , 2gl 1

(2.17)

Fig. 2.7.

Non-linear time series models and dynamical systems

33

and is considered to have an amplitude-dependent first-order autoregressive coefficient (see Fig. 2.8). In m a n y vibration systems, the value of x~ may stay in some finite region [x,l < M and the roots of the equationa 2 - (l~lq- "JTx~)A -- 11)2 -..~ 0

(2.18)

may stay inside the unit circle for such x[s. However, the white noise e t is Gaussian distributed and may have a large value and the roots of (2.18) may lie outside the unit circle. Then the system begins to diverge at this stage. Since we are interested in the stochastic behaviour of x, mostly for ]xt[ < M, it may be reasonable to make the non-linear function approach a b o u n d as t ~ + oo as in Fig. 2.9. A time series model which can exhibit this characteristic is the following model x, = (1 ~r e -x,_, )x,__l+2ff~2Xt_2 q- g t .

(2.19)

The model is called an exponential A R model (Ozaki and Oda, 1978). The roots of the equation A2 - (~1-- "/r e -x2 ')A - 2 = 0

(2.2o)

always stay inside the unit circle for any x,_ 1 if 1, 2 and rr satisfy the condition that the roots A0 and 0 of A2-- ( 4 1 - 7r)A - (/)2 0,~-

(2.21)

and the roots , ~ and a- ofA 2 --I~IA .--4)2 = 0

(2.22)

all lie inside the unit circle (see Fig. 2.10).

. . . . . . . . . . . . . .

1 .........................

x

1

Fig. 2.8.

34

T. Ozaki

P

Xt-~

Fig. 2.9. In the above example, the second-order coefficient is fixed to 4~2 and the roots of both (2.21) and (2.22) all stay inside the unit circle. However, in the general vibration system, the damping coefficient is not constant in general. One example is the following van der Pol equation: - a(1x 2 ) . , ~ q-

b x = O,

(2.23)

where for x 2 < 1 the system has negative damping force and starts to oscillate and diverge, but for x 2 > 1 the system has positive damping force and it starts to damp out. The interplay of these two effects of opposite tendency produces a steady oscillation of a certain amplitude, which is called a limit cycle. When the system is disturbed by a white noise n, we have - a ( 1 - x2)x + bx = n, (2.24)

which produces a perturbed limit cycle process (see Fig. 2.11). The exponential A R model (2.19) is easily extended and applied (Haggan

Fig. 2.10.

Non-linear time series models and dynamical systems"" I ~ _ i~1111t. Wlgllllll i i BI~Ill I

35

x(t)

~(t)

n(t)

~

Fig~ 2.11. Analog simulation of (2.24). and Ozaki, 1981; O z a k i 1982a) for this kind of non-linear d a m p i n g system by m a k i n g the s e c o n d - o r d e r coefficient amplitude d e p e n d e n t asXt = (~1 q- 7/'1 e-X2-1)xt-1 q- (t~2 + 7r2 e-x2 l)xt 2 q- Ft "

(2.25)

If the coefficients satisfy the condition (C1), which is such that (C1) the roots )t o and A0 of A 2 - (4~1 + wi)A - (~b2 + -,r2)= 0 (2.26)

lie outside the unit circle, then x t starts to oscillate and diverge for small x t 1~ while if the coefficients satisfy the condition (C2) such that (C2) the roots of A~ and A~ of A2 ~b~A - ~b2 = 0 (2.27)

lie inside the unit circle, then x t starts to d a m p out w h e n xt_ ~ b e c o m e s too large. T h e result of these two effects is e x p e c t e d to p r o d u c e a similar sort of self-sustained oscillation as (2.23) if we suppress the white noise e t of (2.25). Fig.

36

T. Ozaki

2-

Q .L-)

kD

'0,00

I00.00

200,00

300.00

400,00

500,00

500.00

700

Fig. 2.12.

2.12 shows the limit cycles obtained for the model xt = (1.95 + 0.23 e-X2'-')xt_l - (0.96 + 0.24 e -x~ 9x, 2 + t-,, where the coefficients satisfy the above conditions (C1) and (C2). 2.4. Stationarity The necessary and sufficient condition for the AR(2) modelXt = (01Xt-1 + (02Xt-2 ~- ~'t

(2.28)

(2.29)

to be stationary is that the roots of the characteristic equation A 2 - (01A - (02 = 0 (2.30)

all lie inside the unit circle. For checking the stationarity of exponential model x, = ((o1+ ~rle x2'-l)xt 1+(,52 + 7rze x]-l)X, 2-~ e,, (2.31)

the following theorem about the ergodicity of a Markov chain on a norm space is useful. THEORE~ 2.1 (Tweedie, 1975). A Markov chain X , on a norm space with transition law p(x, y) is ergodic if p(x, y) is strongly continuous, i.e. p(x, y) is continuous with respect to x when y is fixed, and if there exists a compact set K and a positive value c > 0 which satisfy the following conditions,

(i)(ii)

E{Ilxo+,II- Ilxoll I x . = x}

-c

for x ~ K , for x ~ K ,

(2.32) (2.33)

E{[]X,+I[ [[X,][ [ X, = x} ~ T 2 ,

(2.57)

where 1r(x,_l)= ~r0+ ~r,xt-,+ ' " + ~rrx~-l. If we a p p r o x i m a t e f1(x,-1)by a con2 stant plus a H e r m i t i a n - t y p e p o l y n o m i a l thl + ( % + 7rlxt-i + " ' " + %x~_1)e -x'-~

(a) Linear threshold AR model

(b) Non-linear threshold AR model

(c) Exponential AR model

~(x)

~(x)'

~(x) IiI I x 0

--q

x

0

Fig. 2.17.

44

T. Ozaki

, ',(XO TI I I I i I

x2

,%

0

~;

/,~

xt

Fig. 2.18.

(see Fig. 2.17), we have the following e x t e n d e d e x p o n e n t i a l A R m o d e l (Ozaki, 1981a): x, = {4)1+ ( % + 7rlx,_ 1 + " " + 7r2c~_1) ex 2

'-i}x, 1 + et.

(2.58)

This model includes the exponential A R model as special case, s --- 0. It seems that non-linear models with continuous 4) functions have more versatile geometric structure than models with discontinuous step & functions such as linear threshold A R models. For example, if we design the 4) function ofx,+, = 4~(x,)x, + ~,+1

as in Fig. 2.18 by using non-linear threshold A R models or an extended exponential A R model, then (b(x,) = 1 at four points x t = (1, sol, (2 and ~:2 (see Fig. 2.18) and so they have four non-zero stable or unstable singular points to which x t converges or from which x t diverges when the white noise e, is suppressed. However, the linear threshold models do not have such a geometric structure, since the ~b function of the model is a discontinuous step function.

2. Z T h r e s h o l d structure

W e have used the threshold in some amplitude-dependent A R models to a p p r o x i m a t e the dynamics of the A R coefficients. The introduction of the threshold idea in such a situation may look somewhat ad hoc. However, there are often cases in nature, in physical or biological phenomena, where the threshold value has a significant physical meaning. The threshold structure does not necessarily mean that the system is switched from one linear system to another linear system depending on whether the concerned x t values crosses over the critical value. One example is the wave propagation of a nerve impulse (see Fig. 2.19) or a heart beat, which are supposed to form a fixed wave

Non-linear time series models and dynamical systems(a) Impulse above the threshold

45

P

(b) Impulse b e l o w the threshold

L___Fig. 2.19. pattern and propagate if an impulse is larger than a critical value, while if the impulse is less than the critical value the impulse wave dies out (see Fig. 2.19). Neurophysically, the wave propagation is realized by the flow of electrons along the axon which is caused by the change of membrane potential and a mathematical model, called the Hodgkin-Huxley equation, is presented for this dynamic phenomenon by Hoi3gkin and Huxley (1952). Starting from this Hodgkin-Huxley equation, Fitzhugh (1969) obtained the following non-linear dynamical system model for the dynamics of the potential V: dV . : = a ( V - Eo)3+(E=- V) - b ( V - E l ) ,tit

(2.59)

where ( V - E o ) B + = ( V - E o ) 3 for V>~Eo and ( V - E o ) 3 - O for V < E o , E 0 < E 1% E 2 and E 0, E 1 and E 2 are ionic equilibrium potentials determined by the sodium and potassium ion and some other ion. The coefficients a and b of (2.59) are values which are related to the sodium, potassium and some other ions. Since they are varying very slowly compared with V, they can be considered to be locally constant. From (2.59) we know that dV/dt is zero at A, B and C (see Fig. 2.20). The reference to the sign of dV/dt on the neighbouro hood of these points shows that A and C are stable singular points, while B is an unstable singular point. If V > B , V ~ C, but if V < B , V ~ A . Therefore, B is a 'threshold', separating two stable states which may be called the resting state A, and the excited state C. This kind of threshold structure is realized by the discrete time non-linear difference equationX,+l = 4,(x,)x,,

designing &(xt) as in Fig. 2.18. One example is the following model:x,+1 = (0.8 + 4x, e-X')x,,2 2

(2.60)

46

7". O z a k iIonic current

~

b(V-E1)

AEo E1 0

~ ( V - E o ) ~ (E2-V)E2

Fig. 2.20.where sc~ = 0.226 . . . . and so; = - 0 . 2 2 6 . . . are unstable singular points and ~:~ = 2.1294 . . . . sc~ = - 2 . 1 2 9 4 . . . and s%= 0 are stable singular points. If we apply an impulse to model (2.60), then xt goes to zero for t - + ~ if the magnitude of the impulse is less than the unstable singular point ~:~ but xt goes to ~:~ for t ~ m if the magnitude of the impulse is larger than the threshold value ~ (see Fig. 2.21). If we have a white noise input to the model defined by (2.60), we have the following model:

Xt+1= (0.8 -}-*x ,2 e x2/.x , + e , + ~ , 4 )

(2.61)

where et+~ is a Gaussian white noise. Fig. 2.22 shows the simulation of model (2.61), where xt fluctuates around one of the stable singular points and sometimes moves around from one stable singular point to another depending

cD cD.....

,%

c~

'?t____J g" r0.00

40.00

r

-r----T ......--F .......

80.00

O ,00Fig. 2.21.

:

40.00

-T"----T. . . . T. . . . .

60,00

Non-linear time series models and dynamical systems0 (xJ

47

~-I--

n'~rY"l"11~,'~-,

.q-lr,rr'F''ir'v

v.~n~r-T.'~

O

C) 130

io .00

-f

1

i

I

I

I

"3"--I

l--

I

2o.oo

4o.oo

6o.oo

8o.oo

I00,00

~i0 1Fig. 2.22.

on the white noise input. By looking at the data (Fig. 2.22) of the above example (2.61), people may think of two linear models, one above the threshold and one below the threshold. However, the data are actually described by o n e non-linear model. A similar non-linear phenomenon is realized by a non-linear time series model with time varying coefficients. For example, consider the following model:X,+ 1

= (~(t, Xt)X t H- 6 , + 1 ,

(2.62)

where~b(t, x,) = {0.8 + 0.4rt e -x2' + 4(1 - rt)x 2 e -~2'}

changes fromx,+l = (0.8 H 4x 2 e-X,)x, -tto2 ', +1

(2.63) (2.64)

xt+x = (0.8 + 0.4 e-X2')x, + e , < ,

(23 (%1

0

or')

T '0.00

r 20.00

1 ~

J 40.00

'-T

~" 60.00

1

T .............. J. . . . . . . . 1 I00.00 80.~u~'~

Fig. 2.23.

48

T. Ozaki

x'-

0

Fig. 2.24.

as "rt increases monotonically from 0 to 1 as t increases. The model (2.63), as we saw before, has three stable singular points ~:~, ~:z and so0= 0 and two unstable singular points s~ and ~ , while the model (2.64) has two stable singular points r t ~ = 0 . 8 3 . . . . and - q ~ = - 0 . 8 3 . . . , and one unstable singular point rl = 0 . Therefore, the stable singular point ~:0= 0 changes into an unstable singular point as time t passes and the process xt begins to move arouhd s% to one of the other stable singular points as in Fig. 2.23. The sudden change of an equilibrium point in the above example is considered to be a result of a smooth change of some potential function as in Fig. 2.24. This kind of structural change of the process caused by a gradual change of parameters is closely related with the topic treated in catastrophe theory (see, for example, Zeeman, 1977).

2.8. DistributionsWe have seen that a threshold structure is realized by a stationary non-linear time series model x,+l = (0.8 + 4x 2 e-X~)x, + e,+l, (2.65)

where x~ moves around from one stable singular point to another depending on the white noise input. However, the process defined by (2.65) has one and the same equilibrium distribution on the whole. Fig. 2.26 shows the histogram of the data generated by simulating the non-linear threshold A R model 1"(0.8+ 1.3x{ - 1.3xg)xt + e,+l for Ix, I < 1.0, for tx, f > 1.0, (2.66)

x,+l = t 0.8xt + et+x

which has the same structural property as (2.65). It has three stable singular points ~:0= 0, ~: = 0.9 and sc~ = - 0 . 9 and two unstable singular points ~:~ =

Non-linear time series models and dynamical systems

49

I

-1 .47

-0.63

0.2

1 .05

Fig. 2.25. 0.4358.~. and s~i = - 0 . 4 3 5 8 . . . Fig. 2.25 shows the histogram of the white noise used in the above simulation, where the number of data is N = 8000. It is obvious that the three peaks in Fig. 2.25 correspond to the three stable singular points G0, s~i and ~ , and the two valleys correspond to the two unstable singular points ~:i and s~. These correspondences remind us of the ~

1

-0.8I

-0.26

0.09

Fig. 2.26.

I

0.44

50

T.

Ozaki

Fig. 2.27. correspondence between the singular points of the dynamical system

Yc = f ( x )and its potential functionx

V(x) -= - f f ( y ) dy.For example, the dynamical system 2 = - 4 x + 5x 3 - x 5 has three stable singular points G0= 0, s~ = 2, ~:~ = - 2 and two unstable singular points ~:i - 1 and ~ = - 1 (see Fig. 2.27). ~ The stable singular points correspond to the valleys of the potential and unstable singular points correspond to the peaks of potential (see Fig. 2.28). Further, it is known that the equilibrium distribution W ( x ) of the diffusion process defined by the stochastic dynamical system

2 = f ( x ) + n(t)is given by

W ( x ) - Wo exp{-2 V(x)/0-2},where 0-2 is the variance of white noise n(t) and W0 is a normalizing constant. If we consider this structural correspondence between non-linear time series models and diffusion processes defined by stochastic dynamical systems, it may be natural to study the diffusion process and its time discretization scheme in the succeeding section.

Non-linear time series models and dynamical systems

51

V(J

0

X

Fig. 2.28.

3. Diffusion processes and their time discretizations

3.1. Stochastic d y n a m i c a l systems

A stochastic dynamical system is defined by= f(x) + ~(t).

(3.~)

where ~:(t) is a Gaussian white noise with variance cr 2, and so it is also r e p r e s e n t e d as5c = f ( x ) + ern (t) ,

(3.2)

where n ( t ) is a unit G a u s s i a n white noise whose v a r i a n c e is one. Since, for small r > 0, it holds that lim E [ A x ] _ f ( x ) ,.r~O T

limr~0

E[(Ax)2IT

~r2 , -0 (k/>3),

limr-*O

Et(ax)qT

where Ax = x ( t + r ) - x ( t ) = f ( x ) r + f[+'~ ds n ( s ) + o(r), we have, for the process defined by (3.2), the following F o k k e r - P l a n c k equation: 0p ot

1 02 0 Ox [f(x)pl + ~ x 5 [o'2p],

(3.3)

52

T. Ozaki

where p stands for the transition probability p(X[Xo, t) which means the probability that the process takes the value x at time t, given that it had the value x 0 at time t = 0. Thus the stochastic dynamical system uniquely defines a diffusion process with transition probability p(x I Xo, t) defined by the F o k k e r Planck equation (3.2). Conversely, the diffusion process defined by (3.3), obviously, uniquely defines the stochastic dynamical system (3.2). However, the rate of the growth of the variance, lim r~0

E[(Ax)T

,

of a general diffusion process is not a constant but a function of x. A general diffusion process is characterized by the following Fokker-Planck equation:

OpOt

0

1 02

[a(x)p] + ~ ~ [b(x)p] . Ox Ox

(3.4)

Then (3.4) uniquely defines the following stochastic differential equation (see, for example, Goel and Richter-Dyn, 1974)2 = f(x) + g(x)n(t),

(3.5)

where

f ( x ) : a(x),

g ( x ) - X/b(x).

On the other hand, a stochastic differential equation

2 - f ( x ) + g(x)n(t)uniquely defines a diffusion process whose Fokker-Planck equation is

019 Ot

1 0a 0 [f(x)p] + ~ ~x 2 [g2(x)p] . Ox

By the variable transformation

y = y(x)=

f

x dE g(~),

(3.6)

we have, from the stochastic differential equation (3.5), the following stochastic dynamical system:

= a ( y ) + n(t),

(3.7)

where n(t) is a Gaussian white noise with unit variance. We call the process y

Non-linear time series models and dynamical systems

53

the associated diffusion process of (3.4), and we call the dynamical system f~ = a(y) the associated dynamical system of (3.4). By the analogy with mechanics we define the potential function by

V(y) = -

f

Y

a 05) d)T

(3.8)

We note that the potential function (3.8) is different from the potential function well known in Markov process theory (Blumenthal and Getoor, 1968), and we call V(y) of (3.8) the potential function associated with the diffusion process or simply the associated potential function. The above discussion suggests that any diffusion process uniquely defines a variable transformation and a potential function with respect to the transformed variable.

3.2. Distribution systemsSince our interests are non-linear stationary time series with given equilibrium distributions, let us confine ourselves to homogeneous diffusion processes which have unique equilibrium distributions. The equilibrium distribution W(x) of the diffusion process (3.4) is given by W(x) = ~ C exp{2 fx [a(~)/b(~)]d~} (3.9)

where C is the normalizing constant. Wong (1963) showed that for any probability distribution function W(x) defined by the Pearson system

d W(x) c o+ Qx dx - do+ dlX + d2x2 W(x) ,

(3.10)

we can construct a diffusion process whose equilibrium distribution is W(x). Then the following proposition is obvious from the straightforward extension of Wong's logic: PROPOSITION 3.1~ For any distribution W(x) defined by the distribution system dW(x)

dx

c(x) W(x), d(x)

(3.11)

we can construct a diffusion process whose equilibrium distribution is W(x) as follows:Op O 1 02 Ot ..... Ox [{c(x) + d'(x)Ip] + ~ Ox~ [2d(x)p] , (3.12)

where c(x) and d(x) are analytic functions.

54

T. Ozaki

We call the distribution system (3.11) a generalized Pearson system. The system includes not only distributions of the Pearson system but also all the analytic exponential families ~g of distributions which are defined by the set of distributions {f} of the following forria:

W(x) = a (f)a (x) exp{ fi (f). t(x)},

(3.13)

where a and the fli of fl = ( i l l , . . . , ilk) are real-valued functions of ~, and a(x) and t(x)= (tl(X) . . . . . tk(X))' are analytic functions of x (Barndorff-Nielsen, 1978). From the definition of the generalized Pearson system the following propositions are also easily obtained.

The generalized Pearson system of the equilibrium distribution of the diffusion process defined by the Fokker-Planck equationPROPOSITION 3.2.

Op Ot isdW-

0 1 Oa [a(x)p] + ~ ~ [b(x)p] Ox Ox2 a ( x ) - b'(x)W(x).

(3.14)

dx

b(x)

(3.15)

PROPOSITION 3.3.

The generalized Pearson system of the equilibrium distribution of the diffusion process defined by the stochastic differential equation Yc= f(x) + g(x)n(t) isdW(x) dx=

(3.16)W(x).

2 f ( x ) - g(x)g'(x)-

g(x) 2

(3.1'7)

The generalized Pearson system of the diffusion process y associated with the diffusion process x defined by (3.16) isPROPOSrrlON 3.4. dW(y) dy -2a(y)W(y), (3.18)

wherec~ ( y ) = c~ ( y ( x ) ) -

f(x) g(x) "

(3.19)

The above correspondence between the generalized Pearson system and the diffusion process in Proposition 3.1 is unique if we restrict that c(x) and d(x) of (3.11) are mutually irreducible.

Non-linear time series models and dynamical systems

55

3.3. Local linearization of y = f(y) + n(t)A well-known method for the time discretization of

= f ( y ) + n(t)is to use the following Markov chain model: yt+A,- Yt = At. f ( y , ) + B,+a,- B,,

(3.20)

(3.21)

where B , + a t - B t is an increment of a process of Brownian motion and is distributed as a Gaussian distribution with variance At. The process y, defined by (3.21) is known to converge uniformly, for At-+0, to the original diffusion process y defined by (3.20) on a finite interval of time (Gikhman and Skorohod, 1965). The deterministic part, y,+a,-Yt = At "f(Yt), of (3.21) is known as the Euler method of discretization of the dynamical system Y = f(y). (3.22)

However, the Euler method is known to be unstable and explosive for any small At, if the initial value of y is in some region. For example, the trajectory y(t) of 3~ = _y3 (3.23)

is known to go to zero for any initial value of y. Its discretized model by the Euler method is Y,+at = Yt- At. y~, (3.24)

which is explosive, the trajectory going to infinity if the initial value Y0 is in the region ]Y0I> ~/2/At. It is also known that, for any small At, the Markov chain (3.21) is non-stationary if f ( y ) is a non-linear function which goes to + ~ for [ y l - ~ (Jones, 1978). The same thing can be said for some other more sophisticated discretization methods such as the H e u n m e t h o d or the R u n g e Kutta method (see, for example, Henrici, 1962). For the estimation and simulation of diffusion processes by a digital com~ puter, it is desirable to have a stationary Markov chain which converges to the concerned stationary diffusion process for At-~ 0. Our idea of obtaining such a stationary Markov chain is based on the following local linearization idea. When f ( y ) of (3.22) is linear as in= -~y, (3.25)

its analytic solution is obtained as

y(t) = Yo e-~'.

(3.26)

56

T. O z a k i

Therefore, we can define the discrete time dynamical system by y,+~, = e ~aty,, (3.27)

which coincides with y(t) of (3.26) on t, t + At, t + 2At . . . . . Also, the Markov chain defined byY t+At = e - ' ~ a t y t + k / ~ e t+at

(3.28)

is stationary if a > 0 , and the Markov chain converges to the stationary diffusion process:9 = - ~ y + n(t)~

If we approximate e -~a' of (3.27) by a first-order Taylor approximation, (3.27) becomes equivalent to the Euler method, which does not even coincide with the analytic solution (3.26) at t, t + At, t + 2 A t , . . . . Other discretization methods such as the Heun method and the R u n g e - K u t t a method are approximation methods which aim to be higher-order (2nd and 4th, respectively) Taylor approximations of e -~at. If we consider the general dynamical system (3.22) to be locally linear, i.e. linear for a small interval At, and if we use the analytic solution (3.26) for the small interval, we have a trajectory which coincides with the trajectory of the original dynamical system at least for linear f(y). This idea is realized by integrating, over [t, T), t ~< ~- < t + At, Y = 7f(Y) oy which is obtained by differentiating (3.22), assuming that

of

(3.29)

J, = Of 0 Oy

(3.30)

is constant on the interval, i.e. assuming that the system is linear on the interval. Then we have y(~-) = eJ'('-3~(t ) from which we have, by integrating again over [t, t ~ At), y(t 4 (3.31)

At) = y(t) + J;l(eJ'a'--

1)f(y,).

(3.32)

For Jt = 0 we have

y(t ~ at) = y(/) + a t f ( y , ) .

(3.33)

Non-linear time series models and dynamical systems

57

It is easily seen that the model defined by (3.32) and (3.33), which we call a locally linearized dynamical system, converges to 3 ) = f ( t ) f o r A t e 0 . It is also easily checked (see, for example, Gikhman and Skorohod, 1965) that the Markov chain defined by Y,+a, = where Yt + j;1 (e,,a, _ 1)f(yt) qS(Yt) = Yt + At . f(yt) for Jt O, for Jt = O, (3.35)

@(Y,)+ V ~ e , + a , ,

(3.34)

and et+~t is a Gaussian white noise with unit variance, converges to the diffusion process y(t) of (3.20). We call the model (3.34) the locally linearized Markov chain model of the stochastic dynamical system model (3.20). As we shall see later, the present local linearization method brings us, unlike the Euler method or other discretization methods, non-explosive discrete time dynamical systems. If f ( x ) is specified it is easy to check whether the locally linearized dynamical system is non-explosive or not. However, it may be sometimes useful if sufficient conditions for the non-explosiveness of the locally linearized dynamical system are given for the general dynamical system 9 = f(Y). The model (3.34) is rewritten in the following way: Y,+a, = 4' (Y,)Y, + V~Te t+at, where 4'(y,) = 1 + (e j'a'- 1)f(y,)/(.l, . y,) for y, 0 and Jt 0. For the y, to be non-explosive for t ~ , has only to satisfy[6(y,)l < 1

(3.36) (3.37) the function f ( y )

for large lY,I. From (3.37) it is obvious that we have (e j'a' - 1)f(yD/(J , . y,) < 0 ; hence 4'(Yt) < 1, for large ]YtI if f ( y ) satisfies the following condition: (A) f(y)O forlyl~ ~. for y - > - - m , (3.38)

of(y0y

:(=J(y)) -1 for lY]-* m This is equivalent to

T. Ozaki

(eJ~y~a' 1)f(y)/{J(y)y} > - 2

(3.39)

for lyl~m, W h e t h e r f(y) satisfies (3.39) or not very much d e p e n d s on the decreasing (or increasing) b e h a v i o u r of the function f ( y ) for [Yl--'m. F r o m now on, we will discuss the situation w h e r e y ~ m, because the s a m e l o g i c m a y be applied for the negative side. If

J(y)"~Othen we have

for y ~ m

e s)a'- I 2~tf(y) J(y) At y

- - >

Atf(y) y

-2

for y--~ m.

(3.40)

T h e r e f o r e , ~b(y) > - 1 for y -* m if f(y) satisfies J ( y ) ~ 0 for y ~ m. If J(y)-* c < 0 for y -* m, we have, for sufficiently small At,

eJ(y)at 1 .f(y___)) e - c a ' - l f ( y ) > _ 2

for y ~ m

(3.41)

J(y)

y

c

y

T h e r e f o r e , a sufficient condition for qS(y)> -1 for y ~ o~ is: (BI)

J(y)-*c 0 for Y>Yo, then we have q ~ ( y ) > 0 for Y>Yo. T o have q ~ ' ( y ) > 0 for Y>Y0 concaveness of f ( y ) is sufficient. To have q~(y0)_> 0 for some Y0, it is sufficient that there exists Yl ~> Y0 such that f(Yl) < - c X / y l Vc > 0. This is always satisfied if f(y) satisfies the condition (B;), and so (B~) is a sufficient condition for qS(y) > - 1 for y ~ oo. Examples of functions which satisfy (B;) are

f(y)=-y

e y2 and

f(y)=-y3.

The similar conditions of f ( y ) for y ~ --oo are obtained f r o m the same logic as follows:(C1)

J (y )'~ c 0"~ and for any c > 0 there exists Yl ~< Yo OY2 ]such that

f(Yl) > - cYl.From the above discussions we have the following theorem: THEOREM 3.1.

The locally linearized dynamical system (3.32) is non-explosive if the function f ( y ) of (3.22) satisfies the condition (A), any one of conditions (B1) or (B;) and any one of conditions (C0 or (C;).The non-explosiveness of the locally linearized dynamical system (3.32) is

60

T. Ozaki

closely related with the ergodicity of Markov chains on the continuous norm space. For the locally linearized Markov chains (3.34) to be ergodic, Theorem 2.1 requires q~(y) to be a continuous function of y and to have the shift back to centre property which is guaranteed by [q'(Y)/Yl = 14ffy)l < 1 forly[ ~ ~ . Therefore, we have the following theorem: THEOREM 3.2. The locally linearized Markov chain (3.34) is ergodic if f ( y ) satisfies the condition (A), any one of conditions (B1) or (B;) and,any one of conditions (C 0 or (C~). 3.4. Some examples Let us see some examples of diffusion processes which have some distributions of interest and their locally linearized Markov chain models. EXAMPLE 1. Ornstein-Uhlenbeck process. is defined by0p_ 0 10 2

The Ornstein-Uhlenbeck process

at

Ox [axp] + ~ Ox-- [o'2pl, ~

(3.42)

from which we have the following stochastic differential equation: Y= - ax + ~rn(t). The associated dynamical system is = -ay,(3.44)

(3.43)

where y = x/cr. We define the damping function z ( y ) of a dynamical system = f ( y ) byz(y) = -f(y) o

Then the damping function of (3.44) is a linear function (see Fig. 3.1) z(y) = ay. The associated potential function (see Fig. 3.2) is V(y) = a y2.Z

(3.45)

(3.46)

Non-linear time series models and dynamical systems

61

zlv)

/

Fig. 3.1.

Fig. 3.2.

T h e Pearson system of the equilibrium distribution of x of (3.42) is

dW(x) - 2ax W(x), dx o.2

(3.47)

and the distribution W ( x ) is the well-known Gaussian distribution (see Fig. 3.3)

/ a / ax2\ W ( x ) = ~ a 2 e x p , - --~-7-) T h e locally linearized M a r k o v chain model isXt = o . Y t , Yt+at = e - a

(3.48)

atYt + X/--~

et+at ,

(3.49)

which is an AR(1) model with a constant ~b function (see Fig. 3.4). EXAMPLE 2. 2 = --X 3. 2 - - x3 T h e dynamical system (3.50)

has a non-linear cubic damping function as in Fig. 3.5. If this dynamical system is

Wlxl

0

Fig. 3.3.

Fig. 3.4.

62

T. Ozaki

driven by a white noise of variance 0"2, we have 2 = - x 3 + o'n(t). The Fokker-Planck equation of the process x is _ _ = _ _ [x3p ] + _102 [0-2p]. 013 0Ot OX 20X 2

(3.51)

(3.52)

The associated dynamical system is obtained by employing the variable transformationy = x/o',

(3.53) (3.54)

giving= _0-2y3.

The associated potential function (see Fig. 3.6) is

V(y)

= ~

0-1 y4 .

(3.55)

The distribution system of the equilibrium distribution of x isd W(x) dx_

--2X 30"2 W ( x ) .

(3.56)

Then the distribution W ( x ) is given by (see Fig. 3.7)W ( x ) = W o exp - ~ 2

,

(3.57)

where W 0 is a normalizing constant.

V{y)

0Fig. 3.5. Fig. 3.6.

Non-linear time series models and dynamical systems

63

W(X)

~(Yt)

5

2

Fig. 3.7.

Fig. 3.8.

T h e locally linearized M a r k o v chain model is

xt = 0.Yt, Yt+at = 6(Yt)Y, + X/~te,+at.where2 1 2 2 qb(yt) = 3 + 5 exp(--30. 2xtyt).

(3.58) (3.59)

T h e figure of the ~b function is shown in Fig. 3.8. EXAMPLE 3. system 2 = --6X + 5.5X 3 - X5. T h e d a m p i n g function of the dynamical

2 = - 6 x + 5.5x 3 - x 5

(3.60)

has five zero points, so0 = 0, sc~ = ~22, ~:~ =-Xf~-~, sc~ = 2 and ( ~ = - 2 (see Fig. 3.9). T h e y are called singular points of the dynamical system. If an initial value x 0 of (3.60) is one of the five singular points, then x(t) stays at x 0 for any t > 0. If the d y n a m i c a l system is driven by a white noise 0-n(t), we have 2 = -6x+ 5.5x 3 x 5~

o'n(t).

(3.61)

T h e c o r r e s p o n d i n g F o k k e r - P l a n c k equation is

0t9Ot

0 [ ( - 6 x + 5.5x 3-- xS)p] + 1 0 2 0x~ [0-2p] ~ Ox

(3.62)

T h e associated dynamical system is 3? = - 6 y + 5.50-2y 3 - o4y5, where y = x/0-. The associated potential function is (see Fig. 3.10) 11"2 y4 +0. 4 y6

V ( y ) = 3y 2 - - - 8 -

--6

"

(3.63)

64T h e distribution system of x is dW(x)

7". Ozaki

- 1 2 x + l l x 3 - 2x 5

dx

0-2

W(x) ,

(3.64)

and the distribution W ( x ) is (see Fig. 3.11)

W(x)= Woexp{(-6x2+llx4-1x6)/0-2},

(3.65)

where W 0 is a normalizing constant. T h e locally linearized M a r k o v chain m o d e l is

x t = cryt , y,+~,, = 49(yt) + X/Net+at,where

q)(Y,) =! Y'

+ flY') ~ t ) [exp{J(yt)z~t} - 1]

for J(Yt) ~ 0, for J(y,) = 0 ,

y, + a t . f(y,) f ( y , ) = _ 6 y t + 5.50-2y3_ o '4 t,, yand

J(Yt) = - 6 + 16.50-2yt; - 50"4yt.aSince

cl)(yt)/y, ~ e -6a'

for

ly, l-" 0,

the ~b function of the locally linearized M a r k o v chain m o d e l is (see Fig. 3.12)

I1 + f ( Y t L [ e x p { J ( y t ) A t } - 1] 'I J~Yt)Yt ~P(Yt) = ' 1 ] + (--6y, + 16.50-2y3t -- 50-'ySt)At/

for J(Yt)Yt # O, for J(y,) --- O, for Yt = 0.

t e-6AtJ

Z(y)

/Fig. 3.9. Fig. 3.10.

Non-linear time series models and dynamical systemsw(l'

65

(Vt)

0

Z

Fig. 3.11.

Fig. 3.12.

EXAMPLE 4. by

Gamma-distributed process.

The Gamma distribution is defined

W(x) =

x ~-1 e x p ( - x / f l )

r(~)/3

(3.66)

Its Pearson system is dW(x)dx

(a-1)/3-x/3x

(3.67)

from which we have a diffusion process defined by the following Fokker= Planck equation0t9Ot

0Ox

l 02 [(a/3 - x)p] + ~ ~ [2/3xp].

(3.68)

The stochastic differential equation representation of the diffusion process is2 = (a - )fl - x + ~/~2flx" n ( t ) .

(3.69)

By the variable transformation

y = x/2~//3we have the stochastic dynamical systemy = ( a - ~ )1/ y - y/2+ n ( t ) .

0.70)

(3.71)

The damping function z ( y ) of the associated dynamical system isz ( y ) = y / 2 - (a - ~)/y. 1

(3.72)

661

T. Ozaki

As is seen in Fig. 3.13, if a >~ the damping function is negative for y < 1 V'2a - 1 while if a < ~ the damping function is always positive. The associated potential function (see Fig. 3.14) isV ( y ) = y2/4 - (a - 2) log y.

(3.73)

The shape of the distribution of Gamma distribution changes drastically at a = 1, while the critical value for the distribution of the associated process y ( t ) of (3.71) is ce = 12. The equilibrium distribution of y(t) is given by (see Fig. 3.15) 1 y(~_l) exp(_ ~ ) V(oL)2,,_1 (3.74)

W(y)-

when a =~1 the damping function is a linear function of y, and the potential function is a quadratic function. Therefore, the distribution of y is Gaussian for 1 a = ~. The locally linearized Markov chain model for the diffusion process x ( t ) is x, = (flyt)2/2fl, (3.75)Yt+a, = 49(y,) + X/ M e,+a,,

where

Φ(y_t) = y_t + [f(y_t)/J(y_t)][exp{J(y_t)Δt} − 1]   for J(y_t) ≠ 0 ,
Φ(y_t) = y_t + Δt·f(y_t)                            for J(y_t) = 0 ,    (3.76)

and

f(y_t) = (α − ½)/y_t − y_t/2 ,    J(y_t) = −(α − ½)/y_t² − ½ .

Although |Φ(y_t)/y_t| < 1 for y_t → ∞, the φ function is not bounded when α < ½ (see Fig. 3.16).

Fig. 3.13.
Fig. 3.14.

Fig. 3.15.
Fig. 3.16.
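As a quick numerical check of the boundedness remark above, the following Python fragment (my own, not from the chapter) evaluates φ(y) = Φ(y)/y from (3.76) near y = 0 for α below, at, and above ½; the step size Δt = 0.1 is an arbitrary choice:

    import numpy as np

    dt = 0.1  # arbitrary step size

    def phi_gamma(y, alpha):
        # phi(y) = Phi(y)/y for the Gamma diffusion, from (3.76)
        f = (alpha - 0.5)/y - y/2.0
        J = -(alpha - 0.5)/y**2 - 0.5
        return 1.0 + f/(J*y)*np.expm1(J*dt)

    for alpha in (0.25, 0.5, 1.0):
        print(alpha, phi_gamma(np.array([0.5, 0.1, 0.05]), alpha))
    # alpha = 0.25 (< 1/2): |phi| blows up as y -> 0;
    # alpha = 0.5 and 1.0: phi stays bounded near zero.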

θ₀ = −1, and note that γ(l) = γ(−l).

Autocorrelation function

The autocorrelation function

ρ(l) = γ(l)/γ(0)    (2.15)

can be obtained directly from (2.14). Note that if Φ(B) = 1, i.e. (1.3) is a moving average model of order q, MA(q), then

ρ(l) = −θ_q(1 + θ₁² + ⋯ + θ_q²)^{−1} ,   l = q ,
ρ(l) = 0 ,                               l > q .    (2.16)

This is an important property which will prove useful in the model building process.

Partial autocorrelation function

Let us define, for k = 1, 2, …; m = 0, 1, 2, …,

γ(k, m) = (γ(m + 1), …, γ(m + k))′    (2.17)

and

G(k, m) = [ γ(m)         γ(m − 1)     …  γ(m − k + 1) ]
          [ γ(m + 1)     γ(m)         …  γ(m − k + 2) ]
          [   ⋮                             ⋮          ]
          [ γ(m + k − 1) γ(m + k − 2) …  γ(m)         ] .

From (2.14), we have, for l = 1, …, p, a system of equations

γ(p, 0) = G(p, 0)Φ(p) + c,    (2.18)

where Φ(p) = (Φ₁, …, Φ_p)′ and c = (c₁, …, c_p)′. When θ(B) = 1, c = 0, and in this case we can express Φ_p as

Φ_p = |g(p)| |G(p, 0)/γ(0)|^{−1},    (2.19)

where

g(p) = [ 1          ρ(1)       …  ρ(p − 2)   ρ(1) ]
       [ ρ(1)       1          …  ρ(p − 3)   ρ(2) ]
       [   ⋮                                  ⋮    ]
       [ ρ(p − 1)   ρ(p − 2)   …  ρ(1)       ρ(p) ] .

This result then leads to defining the following function of the autocorrelation coefficients ρ(1), …, ρ(l):

φ(l) = ρ(1) ,                         l = 1 ,
φ(l) = |g(l)| |G(l, 0)/γ(0)|^{−1} ,   l > 1 ,    (2.20)

which is known as the partial autocorrelation function. It has the property that, for a stationary AR(p) model, i.e. θ(B) = 1,

φ(l) = Φ_p ,   l = p ,
φ(l) = 0 ,     l > p .    (2.21)

The important property of φ(l) is that it vanishes for l > p when the model is AR(p). This is akin to the property of the autocorrelation coefficients ρ(l) with respect to the MA(q) model, and will prove to be a useful tool in model building.
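As an aside, the sample version of (2.20) need not be computed from the determinants directly; a minimal Python sketch (my own; the function names are mine) obtains φ̂(l) as the last Yule-Walker coefficient at order l via the Levinson-Durbin recursion:

    import numpy as np

    def sample_acf(z, nlags):
        """Sample autocorrelations r(1), ..., r(nlags) of a series z."""
        z = np.asarray(z, float) - np.mean(z)
        c0 = np.dot(z, z)
        return np.array([np.dot(z[l:], z[:-l]) / c0 for l in range(1, nlags + 1)])

    def sample_pacf(z, nlags):
        """Sample partial autocorrelations: phi_hat(l) is the last
        Yule-Walker coefficient at order l (cf. (2.20))."""
        r = sample_acf(z, nlags)
        pacf, phi = [r[0]], np.array([r[0]])
        for l in range(2, nlags + 1):
            num = r[l-1] - np.dot(phi, r[l-2::-1])
            den = 1.0 - np.dot(phi, r[:l-1])
            a = num / den
            phi = np.concatenate([phi - a * phi[::-1], [a]])
            pacf.append(a)
        return np.array(pacf)

    # For an AR(p) series, sample_pacf should be near zero beyond lag p,
    # with Var(phi_hat(l)) roughly 1/n for l > p (cf. (2.46) below).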

Extended autocorrelation function

For the ARMA(p, q) model, we see from (2.14) that for l > q, letting

Φ^{(l)}(p) = G(p, l)^{−1} γ(p, l),    (2.22)

where Φ^{(l)}(p) = (Φ₁^{(l)}, …, Φ_p^{(l)})′, and letting

W_{p,t}^{(l)} = (1 − Φ₁^{(l)}B − ⋯ − Φ_p^{(l)}B^p)Z_t ,    (2.23)

then, since Φ^{(l)}(p) = Φ(p), the transformed process {W_{p,t}^{(l)}} follows a MA(q) model. Thus, if we let ρ(p, l) be the lag l autocorrelation of W_{p,t}^{(l)}, we have that

ρ(p, l) = −θ_q(1 + θ₁² + ⋯ + θ_q²)^{−1} ,   l = q ,
ρ(p, l) = 0 ,                               l > q .    (2.24)

In general, for k = 1, 2, 3, … and l = 1, 2, 3, …, let the k × 1 vector

Φ^{(l)}(k) = (Φ₁^{(l)}, …, Φ_k^{(l)})′

satisfy the equations

G(k, l)Φ^{(l)}(k) = γ(k, l),    (2.25)

and let ρ(k, l) be the lag l autocorrelation of the transformed process {W_{k,t}^{(l)}}, where W_{k,t}^{(l)} = (1 − Φ₁^{(l)}B − ⋯ − Φ_k^{(l)}B^k)Z_t. That is,

ρ(k, l) = b′G(k + 1, l)b / b′G(k + 1, 0)b,    (2.26)

where b′ = (1, −Φ^{(l)}(k)′), and it is easily seen that ρ(k, l) is a function of the autocorrelations ρ(1), …, ρ(k + l). Now, for k = p and l ≥ q, ρ(k, l) has the 'cutting off' property (2.24) for the ARMA(p, q) model, which is akin to the property of ρ(l) in (2.16) for the MA(q) model. Following the work of Tsay and Tiao (1984), we shall call ρ(k, l) the kth extended autocorrelation of lag l for Z_t. We shall also denote ρ(l) = ρ(0, l), so that ρ(k, l) is defined for k ≥ 0 and l ≥ 1. It can be readily shown that for a stationary ARMA(p, q) model, when k ≥ p,

ρ(k, l) = c ,   l = q + k − p ,
ρ(k, l) = 0 ,   l > q + k − p ,    (2.27)

where |c| < 1. The above property of ρ(k, l) will be exploited later in the model building process.
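To make (2.25)-(2.26) concrete, the following Python sketch (my own; the names are hypothetical) computes ρ(k, l) from a vector of autocorrelations ρ(0) = 1, ρ(1), …, ρ(k + l):

    import numpy as np

    def eacf(rho, k, l):
        """k-th extended autocorrelation of lag l, from (2.25)-(2.26).
        rho[j] is the lag-j autocorrelation, rho[0] = 1, length >= k + l + 1."""
        def G(dim, m):
            # (i, j) element is rho(|m + i - j|), i, j = 1, ..., dim (cf. (2.17))
            return np.array([[rho[abs(m + i - j)] for j in range(1, dim + 1)]
                             for i in range(1, dim + 1)])
        gamma_kl = np.array([rho[l + i] for i in range(1, k + 1)])
        phi = np.linalg.solve(G(k, l), gamma_kl)               # (2.25)
        b = np.concatenate([[1.0], -phi])                      # b' = (1, -phi')
        return (b @ G(k + 1, l) @ b) / (b @ G(k + 1, 0) @ b)   # (2.26)

    # For an ARMA(p, q) model with k >= p, eacf(rho, k, l) vanishes for
    # l > q + k - p, the 'cutting off' pattern of (2.27).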

2.2. Prediction theory

In this section, we discuss the problem of forecasting future observations from the ARMA(p, q) model (1.3). We shall assume that the model is known, i.e. all the parameters Φ₁, …, Φ_p, θ₁, …, θ_q and σ_a² are given. In practice, these parameters will, of course, have to be estimated from the data; for a discussion of the effect of the estimation errors on forecasts, see e.g. Yamamoto (1976). Basically, the forecasting problem is as follows. Suppose that the {Z_t} series begins at time m and we have available observations up to time T, Z_m, …, Z_T. What statements can then be made about the future observations Z_{T+l}, l = 1, 2, …, L? Clearly, all the information about Z_{T+1}, …, Z_{T+L} is contained in the conditional distribution p(Z_{T+1}, …, Z_{T+L} | Z(T)), where Z(T) = (Z_m, …, Z_T)′.


From the probabilistic structure assumed in (2.1), this conditional distribution is an L-dimensional multivariate normal distribution. In what follows, we obtain the mean vector and covariance matrix of this distribution and discuss their main properties. We shall denote by Ẑ_T(l) the conditional expectation

Ẑ_T(l) = E_T(Z_{T+l}) = E(Z_{T+l} | Z(T)),    (2.28)

which is the minimum mean squared error (m.m.s.e.) forecast of Z_{T+l}, and denote by e_T(l) the forecast error

e_T(l) = Z_{T+l} − Ẑ_T(l).    (2.29)

From (1.3) with C = 0 and (2.3), we have that for l ≥ 1,

Ẑ_T(l) = Φ₁Ẑ_T(l − 1) + ⋯ + Φ_pẐ_T(l − p) + â_T(l) − θ₁â_T(l − 1) − ⋯ − θ_qâ_T(l − q),    (2.30)

where Ẑ_T(j) = Z_{T+j} for j ≤ 0, and â_T(i) = E(a_{T+i} | Z(T)), so that â_T(i) = 0 for i > 0. Thus, the Ẑ_T(l)'s can be recursively calculated from (2.30) once the expected values â_T(−j), j = 0, …, q − 1, are determined, and for l > q the Ẑ_T(l)'s satisfy the difference equation

Φ(B)Ẑ_T(l) = 0,    (2.31)

where B now operates on l. To obtain â_T(−j), we have from (2.10) that

â_T(−j) = Z_{T−j} − Σ_{h=1}^{T−j−m} π_h Z_{T−j−h} + Σ_h π*_h E(W_{T−j−h} | Z(T)).    (2.32)

It can be shown that when all the zeros of θ(B) lie outside the unit circle, both π_h and π*_h approach zero as h → ∞, and for T − j ≫ m the third term on the right-hand side of (2.32) can be ignored, so that

â_T(−j) ≅ Z_{T−j} − Σ_{h=1}^{T−j−m} π_h Z_{T−j−h}.    (2.32a)

Thus, approximately, â_T(−j) depends only on Z_{T−j}, …, Z_m. Note that the requirement that all zeros of θ(B) lie outside the unit circle is known as the 'invertibility condition' of the ARMA(p, q) model. For a discussion of noninvertible models, see e.g. Harvey (1981).
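A minimal Python sketch (my own illustration) of the forecast recursion (2.30), with C = 0, taking â_T(i) = 0 for i > 0 and using within-sample residuals, e.g. from the recursion (2.59) below, for the â_T(−j)'s:

    import numpy as np

    def arma_forecast(z, a, phi, theta, L):
        """m.m.s.e. forecasts Z_T(1), ..., Z_T(L) from (2.30), C = 0.
        z: observations Z_1..Z_T; a: residuals a_1..a_T;
        phi, theta: AR and MA coefficient lists."""
        p, q = len(phi), len(theta)
        zhat = list(z)             # Z_T(j) = Z_{T+j} for j <= 0
        ahat = list(a) + [0.0]*L   # a_T(i) = 0 for i > 0
        T = len(z)
        for l in range(1, L + 1):
            ar = sum(phi[i-1] * zhat[T + l - 1 - i] for i in range(1, p + 1))
            ma = sum(theta[i-1] * ahat[T + l - 1 - i] for i in range(1, q + 1))
            zhat.append(ar + ahat[T + l - 1] - ma)
        return zhat[T:]

    # For l > q the forecasts satisfy Phi(B) Z_T(l) = 0, eq. (2.31),
    # so they eventually follow the AR difference equation.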

It is of interest to study the behavior of the forecasts Ẑ_T(l) as a function of the lead time l. From (2.31), we can write

Ẑ_T(l) = A₁^{(T)}α₁^l + ⋯ + A_{p₀}^{(T)}α_{p₀}^l ,    (2.33)

where, as in (2.5), p₀ ≤ p, α₁^{−1}, …, α_{p₀}^{−1} are the p₀ distinct zeros of Φ(B), and A₁^{(T)}, …, A_{p₀}^{(T)} are polynomials in l whose coefficients are linear functions of Z(T).

For l > q, the asymptotic variance of r(l) is

Var(r(l)) ≅ n^{−1}(1 + 2 Σ_{j=1}^{q} ρ²(j)).    (2.43)

By substituting r(j) for the unknown ρ(j) in (2.43), the estimated variances of the r(l)'s are often used to help specify the order q of a MA model.

SPACF

The sample partial autocorrelations

φ̂(l),   l = 1, 2, … ,    (2.44)


of Z_t are obtained by replacing the ρ(l)'s in (2.20) by their sample estimates r(l)'s. For stationary models,

φ̂(l) → φ(l)  in probability,    (2.45)

and the φ̂(l)'s are asymptotically normally distributed. Also, for a stationary AR(p) model,

Var(φ̂(l)) ≅ n^{−1} ,   l > p .    (2.46)

The properties in (2.45) and (2.46) make the SPACF a convenient tool for specifying the order p of a stationary AR model in practice. For nonstationary models, i.e. when Φ(B) contains the factor U(B) in (1.5), the asymptotic property of φ̂(l) is rather complex, however.

In the past, the SACF and SPACF have been the most commonly used statistical tools for tentative model specification. Specifically, a persistently high SACF signals the need for differencing; a moving average model is suggested by a SACF exhibiting a small number of large values at low lags, and an autoregressive model by a SPACF showing a similar 'cutting off' pattern. Also, for series exhibiting a strong seasonal behavior of period s, a persistently high SACF at lags which are multiples of s signals the need to apply the 'seasonal differencing' operator 1 − B^s to the data, and so on. The weaknesses of these two methods are (i) subjective judgement is often required to decide on the order of differencing, and (ii) for stationary mixed autoregressive moving average models, both the SACF and SPACF tend to exhibit a gradual 'tapering off' behavior, making specification of the orders of the autoregressive and moving average parts difficult.

ESACF

Recently, several approaches have been proposed to handle the mixed model specification problem. These include the R- and S-array methods of Gray et al. (1978) and the generalized partial autocorrelations of Woodward and Gray (1981). In what follows, we discuss the procedure proposed by Tsay and Tiao (1984), using what they called the extended sample autocorrelation function (ESACF), for tentative specification of the order (p, q) of the general nonstationary or stationary ARMA model (1.3). The proposed procedure eliminates the need to difference or in general transform the series to achieve stationarity, and directly specifies the values of p and q. For stationary ARMA models, estimates ρ̂(k, l) of the EACF ρ(k, l) as defined in (2.26) can be obtained upon replacing the ρ(l)'s in (2.26) by their sample counterparts r(l)'s. In this case, the estimated ρ̂(k, l)'s will be consistent for the ρ(k, l)'s, and hence the property (2.27) can be exploited for model identification. However, for nonstationary models, the ρ̂(k, l)'s will not in general have the asymptotic property given by the right-hand side of (2.27).


Now, for ARMA(p, q) models, one can view the extended sample autocorrelation function approach as consisting of the following two steps. We first attempt to find consistent estimates of the autoregressive parameters in order to transform Z_t into a moving average process. We then make use of the 'cutting off' property of the autocorrelation function of the transformed process for model identification. For estimating the autoregressive parameters, the following iterated regression approach has been proposed. First, let Φ̂₁^{(0)}(k), …, Φ̂_k^{(0)}(k) be the ordinary least squares (OLS) estimates from fitting the AR(k) regression to the data,

Z_t = Φ₁^{(0)}(k)Z_{t−1} + ⋯ + Φ_k^{(0)}(k)Z_{t−k} + e_{k,t}^{(0)} ,    (2.47)

where e_{k,t}^{(0)} denotes the error term. The 1st iterated AR(k) regression is given by

Z_t = Φ₁^{(1)}(k)Z_{t−1} + ⋯ + Φ_k^{(1)}(k)Z_{t−k} + β₁^{(1)}(k) ê_{k,t−1}^{(0)} + e_{k,t}^{(1)} ,    (2.48)

where ê_{k,t}^{(0)} = (1 − Φ̂₁^{(0)}(k)B − ⋯ − Φ̂_k^{(0)}(k)B^k)Z_t is the residual from (2.47) and e_{k,t}^{(1)} denotes the error term. This yields a new set of OLS estimates Φ̂₁^{(1)}(k), …, Φ̂_k^{(1)}(k). In general, for l = 1, 2, …, the estimates Φ̂₁^{(l)}(k), …, Φ̂_k^{(l)}(k) are obtained from the lth iterated AR(k) regression

Z_t = Φ₁^{(l)}(k)Z_{t−1} + ⋯ + Φ_k^{(l)}(k)Z_{t−k} + β₁^{(l)}(k) ê_{k,t−1}^{(l−1)} + ⋯ + β_l^{(l)}(k) ê_{k,t−l}^{(0)} + e_{k,t}^{(l)} ,    (2.49)

where

ê_{k,t}^{(i)} = (1 − Φ̂₁^{(i)}(k)B − ⋯ − Φ̂_k^{(i)}(k)B^k)Z_t − Σ_{h=1}^{i} β̂_h^{(i)}(k) ê_{k,t−h}^{(i−h)}

(i.e. the residuals from the ith iterated regression) and e_{k,t}^{(l)} is the error term. In practice, these iterated estimates Φ̂_j^{(l)}(k) can be obtained from the OLS estimates of the autoregressive coefficients from fitting AR(k), …, AR(k + l) to Z_t, using the recursion

Φ̂_j^{(l)}(k) = Φ̂_j^{(l−1)}(k + 1) − Φ̂_{j−1}^{(l)}(k) Φ̂_{k+1}^{(l−1)}(k + 1)/Φ̂_k^{(l−1)}(k),    (2.50)

where Φ̂₀^{(l)}(k) = −1, j = 1, …, k, k ≥ 1 and l ≥ 1. Based on some consistency results of OLS estimates of autoregressive parameters for nonstationary and stationary ARMA(p, q) models in Tiao and Tsay (1983), they show that for k = p,

Φ̂^{(l)}(p) → Φ(p)  in probability,   l ≥ q ,    (2.51)

where Φ̂^{(l)}(p) = (Φ̂₁^{(l)}(p), …, Φ̂_p^{(l)}(p))′.

Now, analogous to (2.26), the extended sample autocorrelation function r(k, l) is defined as

r(k, l) = r_l(Ŵ_{k,t}^{(l)}),    (2.52)

where r_l(Ŵ_{k,t}^{(l)}) is the lag l sample autocorrelation of the transformed series

Ŵ_{k,t}^{(l)} = (1 − Φ̂₁^{(l)}(k)B − ⋯ − Φ̂_k^{(l)}(k)B^k)Z_t .    (2.53)

Also, we may denote r(0, l) = r(l) for the ordinary sample autocorrelations, and shall call r(k, l) the kth extended sample autocorrelation of lag l. Tsay and Tiao show that for the general ARMA(p, q) model in (1.3), stationary or nonstationary, when k ≥ p,

r(k, l) → c ,   l = q + k − p ,
r(k, l) → 0 ,   l > q + k − p ,    (2.54)

in probability, where |c| < 1.

Tentative model specification via ESACF

The asymptotic property of the ESACF r(k, l) given by (2.54) can now be exploited to help tentatively identify ARMA(p, q) models in practice. For this purpose, it is useful to arrange the r(k, l)'s in a two-way table, as shown in Table 2.1, in which the first row gives the SACF, the second row gives the 1st ESACF, and so on. The rows are numbered 0, 1, 2, … to signify the AR order, and the columns in a similar way for the MA order. To illustrate the use of the table, suppose the true model is an ARMA(1, 2). For the SACF, it is well known that asymptotically r(0, l) ≠ 0 for l ≥ 2. Now, from (2.54) with p = 1 and q = 2, we see that (i) when k = 1, r(1, l) ≅ 0 for l ≥ 3, (ii) when k = 2, r(2, l) ≅ 0 for l ≥ 4, and so on. The full situation is shown in Table 2.2, where x denotes a nonzero value, 0 a zero value, and * a value between −1 and 1. The zero values are seen to form a triangle with boundaries given by the two lines k = 1 and l − k = 2. The row and column coordinates of the vertex correspond precisely to the AR and MA order, respectively.

Table 2.1
The ESACF table

AR\MA   0         1         2         3
0       r(0, 1)   r(0, 2)   r(0, 3)   r(0, 4)
1       r(1, 1)   r(1, 2)   r(1, 3)   r(1, 4)
2       r(2, 1)   r(2, 2)   r(2, 3)   r(2, 4)
3       r(3, 1)   r(3, 2)   r(3, 3)   r(3, 4)

Table 2.2
The asymptotic ESACF table for an ARMA(1, 2) model, where x denotes a nonzero value and * denotes a value between −1 and 1

AR\MA   0   1   2   3   4   5   6   7
0       *   x   x   x   x   x   x   x
1       *   x   0   0   0   0   0   0
2       *   x   x   0   0   0   0   0
3       *   x   x   x   0   0   0   0
4       *   x   x   x   x   0   0   0

In general, we are thus led to search the ESACF table for the vertex of a triangle of asymptotic 'zero' values having boundary lines k = c₁ ≥ 0 and l − k = c₂ > 0, and to tentatively identify p = c₁ and q = c₂ as the order of the ARMA model. In practice, for finite samples, the r(k, l)'s will not be zero. The asymptotic variance of the r(k, l)'s can be approximately obtained by using Bartlett's formula. As a crude but simple approximation, we may use the value (n − k − l)^{−1}, on the hypothesis that the transformed series Ŵ_{k,t}^{(l)} is white noise, to estimate the variance of r(k, l). Of course, it is understood that this simple approximation might underestimate the variance of r(k, l), and a further study of this subject is needed in the future. As a preliminary but informative guide for model specification, the ESACF table may be supplemented by an analogous table consisting of indicator symbols, with x denoting values greater or less than ±2 standard deviations and 0 for in-between values.
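The following Python sketch (mine; the row-trimming and mean-correction details are implementation choices, not prescriptions of the text) computes the ESACF table through the iterated AR(k) regressions (2.47)-(2.49) and builds the x/0 indicator table using the crude (n − k − l)^{−1} variance approximation:

    import numpy as np

    def _acf1(w, l):
        w = np.asarray(w, float) - np.mean(w)
        return np.dot(w[l:], w[:-l]) / np.dot(w, w)

    def esacf(z, kmax=5, lmax=8):
        """ESACF table r(k, l), k = 0..kmax, l = 1..lmax, via the iterated
        AR(k) regressions (2.47)-(2.49); row k = 0 is the ordinary SACF."""
        z = np.asarray(z, float) - np.mean(z)
        n = len(z)
        table = [[_acf1(z, l) for l in range(1, lmax + 1)]]
        for k in range(1, kmax + 1):
            resid, phis = [], []
            for i in range(lmax + 1):
                cols = [np.roll(z, j) for j in range(1, k + 1)]              # Z_{t-j}
                cols += [np.roll(resid[i - h], h) for h in range(1, i + 1)]  # e^(i-h)_{t-h}
                X = np.column_stack(cols)
                start = k + i                    # drop rows with start-up effects
                beta, *_ = np.linalg.lstsq(X[start:], z[start:], rcond=None)
                e = np.zeros(n)
                e[start:] = z[start:] - X[start:] @ beta
                resid.append(e)
                phis.append(beta[:k])            # Phi-hat^(i)(k) of (2.49)
            row = []
            for l in range(1, lmax + 1):
                phi = phis[l]                    # the l-th iterated AR estimates
                w = z[k:] - sum(phi[j - 1] * z[k - j:n - j] for j in range(1, k + 1))
                row.append(_acf1(w, l))          # r(k, l) of (2.52)-(2.53)
            table.append(row)
        return np.array(table)

    def esacf_symbols(z, kmax=5, lmax=8):
        """Indicator table: 'x' if |r(k, l)| > 2 (n - k - l)^{-1/2}, else '0';
        look for the vertex of the triangle of '0's at (p, q)."""
        tab, n = esacf(z, kmax, lmax), len(z)
        return [["x" if abs(tab[k, l]) > 2.0 / np.sqrt(n - k - l - 1) else "0"
                 for l in range(lmax)] for k in range(kmax + 1)]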

2.3.2. Estimation

Once the order (p, q) of the model (1.3) is tentatively specified, the parameters (C, Φ₁, …, Φ_p, θ₁, …, θ_q, σ_a²) can now be estimated by maximizing the corresponding likelihood function. An extensive literature exists on properties of the likelihood function, various simplifying approximations to this function, and asymptotic properties of the associated maximum likelihood estimates (see e.g. Anderson, 1971; Newbold, 1974; Fuller, 1976; Ljung and Box, 1979). In what follows, we consider two useful approximations, the first of which has been called the 'conditional likelihood function', proposed by Box and Jenkins (1970), and the second, the 'exact likelihood function', by Hillmer and Tiao (1979). With n observations Z = (Z₁, …, Z_n)′ from the model (1.3), and assuming m = 1, consider the transformed vector W = (W₁, …, W_n)′, where

W = DZ,    (2.55)

with D an n × n matrix analogous to that in (2.35). Now, partitioning W′ = (W₍₁₎′, W₍₂₎′), where W₍₁₎ = (W₁, …, W_p)′ and W₍₂₎ = (W_{p+1}, …, W_n)′, we can write the joint distribution of W as

p(W) = p(W₍₁₎ | W₍₂₎) p(W₍₂₎).    (2.56)

Both the 'conditional' and the 'exact' likelihood approaches are based on the distribution p(W₍₂₎), ignoring p(W₍₁₎ | W₍₂₎); and it can in fact be shown that, for moderately large n, the parameter estimates are little affected by p(W₍₁₎ | W₍₂₎). Now, from (1.3) and (2.55), the probabilistic structure of W₍₂₎ is given by

W_t = C − Σ_{i=1}^{q} θ_i a_{t−i} + a_t ,   t = p + 1, …, n .    (2.57)

The 'conditional' approach assumes that a_p = a_{p−1} = ⋯ = a_{p−q+1} = 0. In this case, the likelihood function can be written as

l(C, Φ, θ, σ_a² | Z) ∝ σ_a^{−(n−p)} exp{−(1/2σ_a²) Σ_{t=p+1}^{n} a_t²},    (2.58)

where, for given parameter values of (C, Φ, θ), the a_t's are recursively calculated from

a_t = Z_t − C − Σ_{i=1}^{p} Φ_i Z_{t−i} + Σ_{i=1}^{q} θ_i a_{t−i} .    (2.59)

Standard nonlinear least squares methods can now be employed to obtain estimates (Ĉ, Φ̂, θ̂) minimizing the sum of squares in the exponent of (2.58). That is,

S(Ĉ, Φ̂, θ̂) = min S(C, Φ, θ),    (2.60)

where S(C, Φ, θ) = Σ_{t=p+1}^{n} a_t². Also, the corresponding maximum likelihood estimate of σ_a² is

σ̂_a² = S(Ĉ, Φ̂, θ̂)/(n − p).    (2.61)
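As an illustration of the conditional approach, a short Python sketch (mine; scipy's Nelder-Mead optimizer is one arbitrary choice of nonlinear routine) that computes the residuals by (2.59) and minimizes S(C, Φ, θ):

    import numpy as np
    from scipy.optimize import minimize

    def css(params, z, p, q):
        """Conditional sum of squares S(C, Phi, theta) of (2.60), with the
        residuals computed recursively from (2.59) and a_t = 0 for t <= p."""
        C, phi, theta = params[0], params[1:1 + p], params[1 + p:1 + p + q]
        n = len(z)
        a = np.zeros(n)
        for t in range(p, n):
            a[t] = (z[t] - C
                    - sum(phi[i - 1] * z[t - i] for i in range(1, p + 1))
                    + sum(theta[i - 1] * a[t - i] for i in range(1, q + 1) if t - i >= 0))
        return np.sum(a[p:] ** 2)

    def fit_arma_css(z, p, q):
        z = np.asarray(z, float)
        res = minimize(css, np.zeros(1 + p + q), args=(z, p, q), method="Nelder-Mead")
        sigma2 = css(res.x, z, p, q) / (len(z) - p)   # (2.61)
        return res.x, sigma2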

In the 'exact' approach, the assumption a_p = ⋯ = a_{p−q+1} = 0 is not made, and after some algebraic reduction it can be shown that the likelihood function is

l(C, Φ, θ, σ_a² | Z) ∝ σ_a^{−(n−p)} |D̂|^{−1/2} exp{−(1/2σ_a²) Σ_{t=p−q+1}^{n} ã_t²}.    (2.62)

In (2.62), for t = p + 1, …, n,

ã_t = Z_t − C − Σ_{i=1}^{p} Φ_i Z_{t−i} + Σ_{i=1}^{q} θ_i ã_{t−i} ,    (2.63)

and for t = p − q + 1, …, p the vector ã₍₁₎ = (ã_{p−q+1}, …, ã_p)′ is given by

ã₍₁₎ = D̂^{−1} R′M′a,    (2.63a)

where D̂ = I_q + R′M′MR, M is the n′ × n′ lower triangular matrix

M = [ 1                        ]
    [ π₁         1             ]
    [  ⋮               ⋱       ]
    [ π_{n′−1}   ⋯   π₁    1   ] ,

and R is the n′ × q matrix whose (i, j)th element is π_{i−j} (with π₀ = 1 and π_h = 0 for h < 0), so that its last row is (π_{n′−1}, …, π_{n′−q});

n′ = n − p, I_q is a q × q identity matrix, the π_h's satisfy the relation

(1 − π₁B − π₂B² − ⋯)(1 − θ₁B − ⋯ − θ_qB^q) = 1,

and a = (a_{p+1}, …, a_n)′, the elements of which are given by (2.59). For a detailed derivation of (2.62), see Hillmer and Tiao (1979).

To obtain the maximum likelihood estimates of the parameters in (2.62), we see that the concentrated likelihood of (C, Φ, θ) is

max_{σ_a} l(C, Φ, θ, σ_a² | Z) ∝ [Σ_{t=p−q+1}^{n} b̃_t²]^{−(n−p)/2},    (2.64)

where b̃_t = |D̂|^{1/[2(n−p)]} ã_t. Thus, standard nonlinear routines can be used to obtain estimates (Ĉ, Φ̂, θ̂) minimizing the sum of squares

S*(C, Φ, θ) = Σ_{t=p−q+1}^{n} b̃_t² ,    (2.65)

and the corresponding maximum likelihood estimate of σ_a² is

σ̂_a² = (1/(n − p)) |D̂|^{−1/(n−p)} S*(Ĉ, Φ̂, θ̂).    (2.66)

It is clear from (2.59), (2.63) and (2.63a) that the exact approach is computationally more burdensome, but it can appreciably reduce the biases in estimating the moving average parameters θ associated with the conditional approach, especially when some of the zeros of θ(B) are near or on the unit circle. In practice, one uses the conditional approach in the initial phases of the iterative modeling process and switches to the exact method towards the end.

2.3.3. Diagnostic checking

Once the parameters of the tentatively specified model are obtained, it is important to perform various diagnostic checks on the fitted model to determine if it is indeed adequate in representing the time series being studied. Methods for detecting model inadequacies are primarily based on the residuals

â_t = Z_t − Ĉ − Σ_{i=1}^{p} Φ̂_i Z_{t−i} + Σ_{i=1}^{q} θ̂_i â_{t−i} ,   t = p + 1, …, n ,    (2.67)

from the fitted model. Useful tools include plotting the residuals against time to spot outliers (see the later discussion in Subsection 3.3) and changes in level and variability, and studying the sample autocorrelation function r_â(l) of the residuals to determine if it is consonant with that of a white noise process. A 'portmanteau' criterion, originally proposed by Box and Pierce (1970) and later modified by Ljung and Box (1978), is given by

Q = n(n + 2) Σ_{l=1}^{m} (n − l)^{−1} r_â²(l).    (2.68)

On the hypothesis that the Z_t's are generated from a stationary ARMA(p, q) model, Q in (2.68) obtained from the residuals will be approximately distributed as χ² with m − (p + q) degrees of freedom. It should be noted that in practice, when serious inadequacy occurs, the patterns of the individual r_â(l)'s often provide useful information about directions in which to modify the tentatively specified model.
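A small Python sketch (my own) of the portmanteau check (2.68):

    import numpy as np
    from scipy.stats import chi2

    def ljung_box(resid, m, p, q):
        """Portmanteau criterion Q of (2.68) on the residuals of a fitted
        ARMA(p, q) model, with chi-square reference on m - (p + q) d.f."""
        a = np.asarray(resid, float) - np.mean(resid)
        n = len(a)
        r = np.array([np.dot(a[l:], a[:-l]) for l in range(1, m + 1)]) / np.dot(a, a)
        Q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, m + 1)))
        return Q, chi2.sf(Q, df=m - (p + q))  # large Q (small p-value) signals inadequacy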

3. Transfer function models, intervention analysis and outlier detection

In this section, we discuss some properties of the transfer function model in (1.6), with special emphasis on its application to intervention analysis and outlier detection problems. In general, the input variables X_{jt} can be deterministic or stochastic. When the X_{jt}'s themselves are stochastic and follow Gaussian ARMA models, Box and Jenkins (1970) have proposed a modeling procedure which specifically deals with the case of one input variable. Although their procedure can in principle be extended to the case of several stochastically independent input variables, it becomes cumbersome to apply, and an alternative method via vector ARMA models has been suggested (see Tiao and Box, 1981). In what follows, we shall confine our discussion to deterministic inputs.


3.1. Intervention problems

In the analysis of economic and environmental time series data, it is frequently of interest to determine the effects of exogenous interventions, such as a change in fiscal policy or the implementation of certain pollution control measures, that occurred at some known time points. Standard statistical procedures, such as the t-test of the mean difference before and after the intervention, are often not appropriate because of (i) the dynamic characteristics of the intervention, and (ii) the existence of serial dependence in the observations. It is shown in Box and Tiao (1975) that a transfer function of the form (1.6) can be employed to study the effects of interventions. Specifically, suppose we wish to estimate simultaneously the effects of J interventions on an output series Y_t; we may make the X_{jt} indicator variables taking the values 1 and 0 to denote the occurrences and nonoccurrences of exogenous interventions, and use δ_j^{−1}(B)ω_j(B)B^{b_j} to model the dynamic effects on the output, where

δ_j(B) = 1 − δ_{1j}B − ⋯ − δ_{r_j j}B^{r_j} ,   ω_j(B) = ω_{0j} − ω_{1j}B − ⋯ − ω_{s_j j}B^{s_j} ,    (3.1)

and b_j is a nonnegative integer representing the delay or 'dead time'. The variables X_{jt} can assume the form of a step function X_{jt} = S_t^{(T_j)} or a pulse function X_{jt} = P_t^{(T_j)}, where

S_t^{(T_j)} = 0 for t < T_j, 1 for t ≥ T_j ;   P_t^{(T_j)} = 1 for t = T_j, 0 for t ≠ T_j ,    (3.2)

and note that (1 − B)S_t^{(T)} = P_t^{(T)}.

Fig. 3.1 shows the responses to a step and a pulse input for various transfer function models of practical interest. Specifically, for a step change in the input, (a) shows a step response with a one-period delay; (b) shows the more common situation of a 'first-order' dynamic response, in which the steady-state gain (eventual effect) is measured by ω/(1 − δ); and (c) represents the situation when δ = 1, in which the step change in the input produces a 'ramp' response, or trend, in the output. For a pulse input, (d) shows the situation in which the pulse input (e.g. a promotion campaign) has only a transient effect on the output (sales), with ω₁ measuring the initial increase and δ the rate of decline; (e) represents the situation in which, apart from the transient effect, the possibility is entertained that a residual gain (or loss) ω₂ in the output persists; and finally, (f) shows the situation of an immediate positive response followed by a decay and possibly a permanent residual effect. The last figure might represent the dynamic response of sales to a price increase. A positive ω₀ would represent an immediate rush of buying when a prospective price change was announced at time T; the initial reduction in sales occurring at time T + 1, when the price increase took effect, would be measured by ω₁ + ω₂, and the final effect of the price change would be represented by ω₂.

Fig. 3.1. Responses to a step and a pulse input.

Obviously, these dynamic transfer models may be readily extended to represent many situations of potential interest, and interventions extending over several time periods can be represented by indicator variables other than the pulse or the step functions.
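A short Python sketch (my own illustration; the parameter values are arbitrary) of the building blocks in (3.1)-(3.2): the step and pulse indicators, and the response of a first-order transfer function ωB/(1 − δB), whose steady-state gain under a step input is ω/(1 − δ) as noted above:

    import numpy as np

    def step(n, T):   # S_t^(T): 0 before T, 1 from T on (times 0-indexed)
        return (np.arange(n) >= T).astype(float)

    def pulse(n, T):  # P_t^(T) = (1 - B) S_t^(T)
        return (np.arange(n) == T).astype(float)

    def first_order_response(x, omega, delta, b=1):
        """U_t from delta(B) U_t = omega B^b X_t, i.e. U_t = delta*U_{t-1} + omega*X_{t-b}."""
        u = np.zeros(len(x))
        for t in range(len(x)):
            u[t] = delta * u[t-1] * (t > 0) + (omega * x[t-b] if t >= b else 0.0)
        return u

    n, T, omega, delta = 30, 10, 1.0, 0.6
    print(first_order_response(step(n, T), omega, delta)[-1])   # -> approx omega/(1-delta) = 2.5
    print(first_order_response(pulse(n, T), omega, delta)[-1])  # transient, decays to ~0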

3.2. Model building

In practice, one needs to tentatively specify both the dynamic models δ_j^{−1}(B)ω_j(B)B^{b_j} and an ARMA(p, q) model for the noise term N_t in (1.6). Parsimonious dynamic models are usually postulated to represent the expected effects of interventions. For tentative specification of a model for the noise term N_t, there are several possible alternatives. One may apply the identification procedures discussed earlier in Subsection 2.3 to data prior to the occurrences of the interventions, if a sufficiently large number of such observations is available. One may apply these procedures to the entire data set when the effects of the interventions are expected to be transient in nature. Finally, one may first estimate the impulse responses ν_{lj}, l = 1, …, m, for a


suitably large m, where

ν_j(B) = ν_{0j} + ν_{1j}B + ⋯ + ν_{mj}B^m ≈ δ_j^{−1}(B)ω_j(B)B^{b_j} ,

by ordinary least squares, and then apply the identification procedures to the residuals Y_t − Σ_{j=1}^{J} ν_j(B)X_{jt}. Once a model of the form (1.6) is tentatively specified, we can then estimate the intervention parameters and the parameters in the noise model for N_t simultaneously via maximum likelihood. Specifically, write

Y_t = C + Σ_{j=1}^{J} U_{jt} + Φ^{−1}(B)θ(B)a_t ,    (3.3)

where δ_j(B)U_{jt} = ω_j(B)B^{b_j}X_{jt}, so that for given values of the parameters in δ_j(B) and ω_j(B) the U_{jt}'s can be recursively calculated from the X_{jt}'s; we may then compute the a_t's recursively from Φ(B)(Y_t − C − Σ_{j=1}^{J} U_{jt}) = θ(B)a_t and apply nonlinear least squares methods to estimate all the parameters involved. Finally, diagnostic checks can be performed on the residuals to assess the adequacy of the model fit and to search for directions of improvement, if needed.

3.3. Detection of outliers in time series

In the above application of the transfer function model (1.6), the time points of occurrence of the interventions are supposed known. We now discuss a variant of these methods for handling situations in which the timings T_j of the exogenous interventions are unknown and the effects lead to what may be called aberrant observations, or outliers. We summarize the results on outlier detection in time series of Chang and Tiao (1983), following earlier work by Fox (1972).

Additive and innovational outliers

Let {Y_t} be the observable time series. We shall concentrate on two types of outliers, additive and innovational. An additive outlier (AO) is defined as

Y_t = N_t + ω ξ_t^{(t₀)} ,    (3.4)

while an innovational outlier (IO) is defined as

Y_t = N_t + [θ(B)/Φ(B)] ω ξ_t^{(t₀)} ,    (3.5)

where

ξ_t^{(t₀)} = 1 for t = t₀, 0 for t ≠ t₀ ,

and N_t follows the model (1.3). In terms of the a_t's in (1.3) with C