
STAT 4: Advanced Time Series Analysis & Forecasting

Summary Lecture Notes

Lecturer: A/Prof Shelton Peiris

Semester II - 2014

School of Mathematics and Statistics

The University of Sydney

ADVANCED TIME SERIES AND FORECASTING METHODS

1. Lecturer: A/Prof Shelton Peiris (Carslaw Room 819)

2. Objectives: Establish some advanced methods of modelling and analysing time series data. Particular attention will be given to the theoretical development of various methods related to the advanced topics given in the course outline.

3. Outcomes: Following this course, students will be able to develop their analytical thinking on theoretical aspects of time series beyond the topics covered. This may provide a path to further research and/or higher studies for interested students.

4. Course Outline:

(i) Review of Linear ARMA/ARIMA Time Series Models and their Properties.

(ii) An Introduction to Spectral Analysis of Time Series.

(iii) Fractional Differencing and Long Memory Time Series Modelling.

(iv) Generalized Fractional Processes; Gegenbauer Processes.

(v) Topics from Financial Time Series/Econometrics: ARCH and GARCH Models.

(vi) Time Series Modelling of Durations: Autoregressive Conditional Duration (ACD) and Stochastic Conditional Duration (SCD) Models.

(vii) An introduction to State-space Modelling and Kalman Filtering in Time Series.

5. Assumed Knowledge: Mathematical Statistics (Advanced knowledge at Intermediate and Senior Levels) including a course on Time Series Analysis or equivalent.

6. Method of Teaching and Learning: Lectures: 2 lectures (2 hours) a week
Tuesday 12.00pm (Carslaw 830)
Wednesday 3.00pm (Carslaw 830)

Assessments:

2 Assignments - 14%
1 Technical Report* - 11%
November Examination - 75%

*Note: This report (about 6-8 pages, typed or neatly handwritten) must include the analysis of a real time series data set using standard time series techniques. Evidence of using a suitable computer package (e.g. R, S+, SAS) is essential in your report. Students must clearly demonstrate their understanding of time series analysis through this report for full marks.

7. Suggested Reading:

(a) Time Series: Theory and Methods, Brockwell, P.J. & Davis, R.A. (Springer-Verlag, 1991).

(b) Time Series Analysis: Forecasting and Control, Box, G.E.P. & Jenkins, G.M. (Holden-Day, 1976).

(c) Spectral Analysis and Time Series, Priestley, M.B. (Academic Press, 1981).

(d) The Analysis of Financial Time Series, Tsay, R.S. (John Wiley, 2001).

(e) Modelling Financial Time Series with S-Plus, Zivot, E. and Wang, J. (Springer, New York, 2003).

Chapter 1

Review of Linear Time Series Models and their Properties

1.1 Preliminaries

1.1.1 Probability Space

Let Ω be the sample space of an experiment, and let A be a σ-algebra of subsets of Ω. If P is a probability measure (which assigns a probability to each event in A), then the triplet (Ω, A, P) is called a probability space.

1.1.2 Stochastic Process

A stochastic process is a family of random variables {Xt, t ∈ T} indexed by t, defined on (Ω, A, P), where T is a given index set.

For each fixed ω ∈ Ω, the function t ↦ Xt(ω) on T is known as a realisation or sample path of the process {Xt, t ∈ T}. The collection of all possible realisations is called the ensemble.

1.1.3 Stationarity of a Stochastic Process

The process Xt is said to be stationary up to order m if, for any admissible t1, . . . , tn and any k, all the joint moments up to order m of Xt1, . . . , Xtn exist and equal the corresponding joint moments up to order m of Xt1+k, . . . , Xtn+k. Thus,

E[Xt1^m1 · · · Xtn^mn] = E[Xt1+k^m1 · · · Xtn+k^mn]

for any k and all non-negative integers m1, . . . , mn such that m1 + . . . + mn ≤ m.


Some special results:

1. Set m2 = . . . = mn = 0 and take k = −t. Then,

E[Xt^m1] = E[X0^m1]

for any t. In other words, E[Xt^m1] is independent of t for all m1 ≤ m.

2. Set m3 = . . . = mn = 0 and take k = −t1. Thus,

E[Xt1^m1 Xt2^m2] = E[X0^m1 Xt2−t1^m2]

for any t1, t2 and all m1, m2 satisfying m1 + m2 ≤ m. In this case, for any t and s, E[Xt^m1 Xs^m2] is a function of (s − t) only.

Some special cases:

1. Stationarity up to order 1 (m = 1). In this case, E(Xt) = E(X0) = µ for all t.

2. Stationarity up to order 2 (m = 2). In this case, m1 and m2 satisfy m1 + m2 ≤ 2, so (m1, m2) ∈ {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (0, 2)}. Thus, E(Xt) = µ is a constant independent of t and E(Xt²) = c is also a constant independent of t. Furthermore, E(XtXs) is a function of (s − t) only. In practice, stationarity up to order 2 is referred to as second order stationarity or weak stationarity. In summary, a process is weakly stationary if:

(a) it has the same mean value µ for all t: E(Xt) = µ (independent of t);

(b) it has the same finite variance for all t: var(Xt) = σ² < ∞ for all t;

(c) the covariance between the values at any two time points s and t depends only on the interval (s − t) between the time points and not on their location: cov(Xt, Xt+k) = γ(k) is independent of t and depends only on k.

A time series is a set of observations taken sequentially at specified times.

1.1.4 Gaussian Time Series

The time series {Xt; t ∈ T} is said to be Gaussian if and only if all its finite-dimensional distributions are multivariate normal. In other words, for any admissible set t1, . . . , tn, the joint probability distribution of Xt1, . . . , Xtn is multivariate normal.

Now we give a number of formal definitions for later use.

1.1.5 Autocovariance and Autocorrelation Functions

The autocovariance function at lag k, γk, of a stationary process Xt is defined as:

cov(Xt, Xt+k) = γk = E[(Xt − µ)(Xt+k − µ)];   k = 0, ±1, ±2, . . .

Note that in defining the autocovariance function, we implicitly assume stationarity, in the form of a constant mean µ.


The autocorrelation function at lag k, ρk, is given by:

ρk = γk / γ0,

where γ0 = E[(Xt − µ)²] = var(Xt) = σX² for all t.

Properties of γk:

1. |γk| ≤ γ0 and γ−k = γk for all k.

2. For any set of time points and real constants a1, . . . , an we have

∑_{r=1}^{n} ∑_{s=1}^{n} ar as γr−s ≥ 0.

In other words, the autocovariance function is non-negative definite.

3. Suppose that Xt is a stationary time series with γs−r = cov(Xr, Xs) = γr−s = σrs. Let W = ∑_{i=1}^{n} ai Xi. Then,

var(W) = ∑_{r=1}^{n} ∑_{s=1}^{n} ar as cov(Xr, Xs) = ∑_{r=1}^{n} ∑_{s=1}^{n} ar as σrs.

1.1.6 Estimating γk and ρk.

Suppose that we have n observations x1, . . . , xn.

1. An estimate of γk is:

ck = (1/n) ∑_{t=1}^{n−k} (xt − x̄)(xt+k − x̄),   for k = 0, . . . , n − 1.

2. An estimate of ρk is:

rk = ck / c0 = ∑_{t=1}^{n−k} (xt − x̄)(xt+k − x̄) / ∑_{t=1}^{n} (xt − x̄)².
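For computation, the following minimal R sketch (the simulated AR(1) series is only an assumed example) reproduces r1 by hand and compares it with R's built-in acf(), which uses the same divisor n:

set.seed(1)
x <- arima.sim(list(ar = 0.7), n = 200)                 # example series; any numeric series will do
n <- length(x); xbar <- mean(x)
c0 <- sum((x - xbar)^2) / n                             # c_0
c1 <- sum((x[1:(n - 1)] - xbar) * (x[2:n] - xbar)) / n  # c_1
r1 <- c1 / c0                                           # r_1 = c_1 / c_0
r1
acf(x, lag.max = 1, plot = FALSE)                       # built-in estimate; the lag-1 value agrees with r1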

1.1.7 The Partial Autocorrelation Function

Suppose that Xt is a stationary time series. The partial autocorrelation at lag k of this series is the additional correlation between Xt and Xt−k when the linear effect of all intervening readings (Xt−1, . . . , Xt−k+1) has been removed from both Xt and Xt−k. This quantity is denoted by πk.

Recall that the linear effect on Yt from X1t, X2t, · · · , Xpt can be removed using the regression Yt = α0 + α1X1t + α2X2t + · · · + αpXpt + Ut. Under the standard regression assumptions, Ut = Yt − E(Yt | X1t, X2t, · · · , Xpt) gives the required expression.

Now, for a stationary time series,

πk = corr(Xt − E(Xt | Xt−1, . . . , Xt−k+1), Xt−k − E(Xt−k | Xt−1, . . . , Xt−k+1)).


It can be seen that for a stationary process, π1 and π2 in terms of ρ1 and ρ2 are:

π1 = ρ1,   π2 = (ρ2 − ρ1²) / (1 − ρ1²).

Exercise: Prove the results for π1 and π2 as shown above for any stationary time series.

Hint: To find π2, remove the linear effect of Xt−1 from Xt and Xt−2 by regressing:

Xt = αXt−1 + ut

and

Xt−2 = βXt−1 + vt.

Now evaluate π2 = corr(ut, vt) = corr(Xt − E(Xt | Xt−1), Xt−2 − E(Xt−2 | Xt−1)).

Exercise: Deduce that for an AR(1) process π1 = ρ1 and πk = 0, k ≥ 2.

Exercise: Consider the AR(2) process given by Xt = α1Xt−1 + α2Xt−2 + ut. Show that

π1 = ρ1,   π2 = (ρ2 − ρ1²) / (1 − ρ1²) = α2,   and πk = 0, k ≥ 3.

Exercise: Deduce that for an AR(p) process π1 = ρ1, π2 = (ρ2 − ρ1²) / (1 − ρ1²), · · · , πp = αp and πk = 0, k ≥ p + 1.

Exercise: Find expressions for π3 and πk, k ≥ 4, for an AR(3) process, and deduce that π3 = α3.

Note: The plot of πk, k ≥ 1, against k serves as a diagnostic tool complementing the role of the ACF.

[Figure: AR(2) example]


Estimating πk

To estimate πk, the partial autocorrelation at lag k, replace ρk by its estimate rk in the above expressions. For example,

π̂1 = r1,   π̂2 = (r2 − r1²) / (1 − r1²).
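A short R sketch (the simulated AR(2) series is an assumed example) illustrating π̂2 and its agreement with the built-in pacf():

set.seed(2)
x <- arima.sim(list(ar = c(0.5, 0.3)), n = 300)   # AR(2) example, so pi_2 should be near 0.3
r <- acf(x, lag.max = 2, plot = FALSE)$acf[2:3]   # r_1, r_2
pi2.hat <- (r[2] - r[1]^2) / (1 - r[1]^2)         # formula above
pi2.hat
pacf(x, lag.max = 2, plot = FALSE)                # the lag-2 partial autocorrelation agrees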

Now we review a number of linear time series models.

1.2 Linear Time Series Models

Consider time series with index set T = Z = {0, ±1, ±2, . . .}.

1.2.1 White Noise Process

The process {Zt; t ∈ Z} is called a purely random process or a white noise process if it consists of a sequence of uncorrelated (but not necessarily independent) random variables with mean zero and common variance σ². That is,

cov(Zr, Zs) = σ² if r = s, and 0 if r ≠ s,

for all r, s ∈ Z.

The process Zt is often called "white noise" and written as Zt ∼ WN(0, σ²). Note that the process Zt has no memory, in the sense that the value of the process at time t is uncorrelated with all past values up to time t − 1 and, in fact, with all future values of the process.

1.2.2 General Linear Process

Xt is said to be a general linear process if it can be expressed in the form

Xt = ∑_{j=0}^{∞} Ψj Zt−j,

where Zt ∼ WN(0, σ²) and {Ψj} is a given sequence of constants satisfying

∑_{j=0}^{∞} Ψj² < ∞.

The general linear process has zero mean, E(Xt) = 0 for all t (since E(Zt) = 0), and finite variance:

var(Xt) = σ² ∑_{j=0}^{∞} Ψj² < ∞.

Exercise: Find the autocovariance function of the above general linear process.


1.2.3 Autoregressive Moving Average Process

A very wide class of stationary processes can be generated by using the purely random process (or white noise) as building blocks. The most popular class is the Autoregressive Moving Average (ARMA) class, given by Box and Jenkins (1970).

Definition 1.2.1. The process {Xt; t ∈ Z} is said to be an ARMA(p, q) process if Xt is generated by the stochastic difference equation:

Xt = ∑_{j=1}^{p} φj Xt−j + Zt + ∑_{j=1}^{q} θj Zt−j = ∑_{j=1}^{p} φj B^j Xt + Zt + ∑_{j=1}^{q} θj B^j Zt,

where Zt ∼ WN(0, σ²) and B is the backshift operator such that B^j Xt = Xt−j, j ≥ 0. Then the above can be written as:

(1 − ∑_{j=1}^{p} φj B^j) Xt = (1 + ∑_{j=1}^{q} θj B^j) Zt

Φ(B) Xt = Θ(B) Zt,

where Φ(B) is the autoregressive polynomial of degree p and Θ(B) is the moving average polynomial of degree q.

1.2.4 Causal ARMA Process

An ARMA(p, q) process defined by the equation Φ(B)Xt = Θ(B)Zt is said to be causal if there exists a sequence of constants Ψj such that ∑_{j=0}^{∞} Ψj² < ∞ and

Xt = ∑_{j=0}^{∞} Ψj Zt−j

for all t ∈ Z. In other words, Xt is a linear combination of the present and past Zt's.

1.2.5 ARIMA(p, d, q) Processes

Suppose that we have a homogeneous non-stationary time series as shown in Fig 1. Fig 2 shows the series after differencing (the R commands used are given in footnote 1):


[Figure 1: the simulated series ts.sim plotted against Time (t = 0 to 300); Figure 2: its first differences, diff(ts.sim).]

The top plot exhibits similar segments; it is just the trend that differs. It is known that this type of non-stationary behaviour can be removed by taking successive lag differences of the series. If the series is non-seasonal, take lag one differences of the series. If the series is seasonal with period s, take lag s differences of the series. Using the backshift operator, the lag one (non-seasonal) shift is BXt = Xt−1 and the lag s (seasonal) shift is B^s Xt = Xt−s.

First differencing is defined as: Xt − Xt−1 = (1 − B)Xt = ∆Xt. The d-th difference is: Yt = (1 − B)^d Xt = ∆^d Xt.

Suppose that Xt is a non-stationary (non-seasonal) time series and Yt = (1 − B)^d Xt, d ≥ 1, is stationary. Now we can fit an ARMA(p, q) model for Yt such that Φ(B)Yt = Θ(B)Zt. Therefore this can be written as:

Φ(B)(1 − B)^d Xt = Θ(B)Zt,

where Φ(B) and Θ(B) are the AR and MA operators as defined previously. This model is known as ARIMA(p, d, q).

Note: In general, a homogeneous non-stationary ARIMA time series can be transformed to a stationary ARMA time series by successive differencing. We therefore consider further results for ARMA processes.
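As a quick illustration of this point (a sketch only; the simulated ARIMA(1,1,1) series and the chosen orders mirror the footnote below and are assumptions), fitting an ARIMA(p, d, q) in R is equivalent, up to estimation details, to differencing d times and fitting an ARMA(p, q) to the result:

set.seed(3)
ts.sim <- arima.sim(list(order = c(1, 1, 1), ar = 0.7, ma = 0.4), n = 300)
arima(ts.sim, order = c(1, 1, 1))                              # fits Phi(B)(1-B)X_t = Theta(B)Z_t
arima(diff(ts.sim), order = c(1, 0, 1), include.mean = FALSE)  # ARMA(1,1) on the differenced series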

1. Generated in R using:

ts.sim <- arima.sim(list(order = c(1, 1, 1), ar = 0.7, ma = 0.4), n = 300)
ts.plot(ts.sim)
ts.plot(diff(ts.sim))


1.3 Stationarity and Invertibility of ARMA Processes

An ARMA(p, q) process defined by Φ(B)Xt = Θ(B)Zt is said to be stationary if there is a solution such that Xt = ∑_{j=0}^{∞} Ψj Zt−j and ∑_{j=0}^{∞} Ψj² < ∞.

Note: A causal ARMA process is a stationary ARMA process. It is clear that the AR polynomial Φ(B) determines the stationarity. We can see this by writing:

Xt = Φ⁻¹(B)Θ(B)Zt,

where we write Ψ(B) = Φ⁻¹(B)Θ(B).

Note that a stationary ARMA(p, q) implies an MA(∞) process.

Invertibility

An ARMA(p, q) process defined by Φ(B)Xt = Θ(B)Zt is said to be invertible if there exists a sequence of constants πj such that ∑ |πj| < ∞ and

Zt = ∑_{j=0}^{∞} πj Xt−j.

Note that an invertible ARMA(p, q) implies an AR(∞) process.

1.3.1 Conditions for Stationarity and Invertibility

From the fundamental theorem of algebra, we have that Φ(B) can be factorised into p linear factors:

Φ(B) = (1 − ξ1B) · · · (1 − ξpB),

so we get

Φ⁻¹(B) = (1 − ξ1B)⁻¹ · · · (1 − ξpB)⁻¹.

Recall that

1 + x + x² + . . . = (1 − x)⁻¹

converges only when |x| < 1. Therefore, for Φ⁻¹(B) to be convergent, we require all of the (1 − ξiB)⁻¹ factors to be convergent and hence all |ξi| < 1.

A similar argument can be developed for the convergence of Θ⁻¹(B). We summarize these concepts in the theorem below:

Theorem 1.3.1. Let Xt be an ARMA(p, q) process for which the polynomials Φ(·) and Θ(·) have no common zeros. Then

1. Xt is stationary if the characteristic equation Φ(x) = 0 has all its roots outside the unit circle. The stationary solution is:

Xt = Φ⁻¹(B)Θ(B)Zt = Ψ(B)Zt.

2. Xt is invertible if the characteristic equation Θ(x) = 0 has all its roots outside the unit circle. The desired invertible representation is:

Zt = Φ(B)Θ⁻¹(B)Xt = π(B)Xt = ∑_{j=0}^{∞} πj Xt−j.


Note that this is an AR(∞) process.

Example 1.3.1. Show that the ARMA(1,1) process given by

Xt − 0.6Xt−1 = Zt + 0.3Zt−1,   Zt ∼ WN(0, σ²),

is stationary and invertible. Find its (1) stationary and (2) invertible solutions. Hence or otherwise find var(Xt).

1. The stationary solution can be found:

Xt = (1 − 0.6B)⁻¹(1 + 0.3B)Zt
   = (1 + 0.6B + 0.6²B² + . . .)(1 + 0.3B)Zt
   = [1 + (0.6 + 0.3)B + (0.6² + 0.6 × 0.3)B² + (0.6³ + 0.6² × 0.3)B³ + . . .]Zt
   = ∑_{j=0}^{∞} Ψj Zt−j,

where Ψ0 = 1 and Ψj = 0.6^{j−1}(0.6 + 0.3) = 0.6^{j−1} × 0.9 for j ≥ 1. Thus,

∑_{j=0}^{∞} Ψj² = 1 + 0.9² ∑_{j=1}^{∞} 0.6^{2(j−1)} = 1 + 0.9² ∑_{j=0}^{∞} (0.6²)^j = 1 + 0.9²(1 − 0.6²)⁻¹ = 2.265625 < ∞.

Note that we could simply have noted that, because we have

(1 − 0.6B)Xt = (1 + 0.3B)Zt,

the root of Φ(x) = 0 is given by 1 − 0.6x = 0, i.e. x = 5/3 > 1. Therefore Xt is stationary.

2. The root of Θ(x) = 0 is x = −10/3. Therefore Xt is invertible since |x| > 1. The invertible solution is given by:

Zt = (1 + 0.3B)⁻¹(1 − 0.6B)Xt = ∑_{j=0}^{∞} πj Xt−j.

3. Thus the variance can be found as var(Xt) = σ²(2.265625).
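The calculations in Example 1.3.1 can be checked numerically in R with the standard functions polyroot() and ARMAtoMA() (a sketch; the truncation at lag 100 is an assumption):

Mod(polyroot(c(1, -0.6)))          # root of Phi(x) = 1 - 0.6x: 1.667 > 1, so stationary
Mod(polyroot(c(1, 0.3)))           # root of Theta(x) = 1 + 0.3x: 3.333 > 1, so invertible
psi <- c(1, ARMAtoMA(ar = 0.6, ma = 0.3, lag.max = 100))   # Psi_0 = 1, Psi_j = 0.9 * 0.6^(j-1)
sum(psi^2)                         # approximately 2.265625, so var(X_t) = 2.265625 * sigma^2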


1.4 Yule Walker Equations

Theorem 1.4.1. Suppose that Φ(B)Xt = Zt is a stationary AR(p) process. Then the autocorrelation function ρk satisfies the following Yule-Walker equations:

Φ(B)ρk = 0,   k ≥ 1.

Proof. Consider

Φ(B)Xt = Xt − φ1Xt−1 − . . . − φpXt−p = Zt.

Multiply through by Xt−k and take expectations:

E(XtXt−k) − φ1E(Xt−1Xt−k) − . . . − φpE(Xt−pXt−k) = E(ZtXt−k).

Now, because Xt is stationary (and causal),

Xt = ∑_{j=0}^{∞} ψj Zt−j = Zt + ψ1Zt−1 + ψ2Zt−2 + . . .

and similarly,

Xt−k = Zt−k + ψ1Zt−k−1 + ψ2Zt−k−2 + . . .

We know that E(Xt−kZt) = 0 for k ≥ 1. Further, for k = 0:

E(XtZt) = E[Zt ∑_{j=0}^{∞} ψj Zt−j] = E(Zt²) + 0 = σ².

Now, since E(Xt) = 0,

γk = cov(Xt, Xt−k) = E(XtXt−k),

so we have:

γk − φ1γk−1 − . . . − φpγk−p = 0;   k ≥ 1.

Dividing through by γ0 gives:

ρk − φ1ρk−1 − . . . − φpρk−p = 0
(1 − φ1B − . . . − φpB^p)ρk = 0
Φ(B)ρk = 0.

1.4.1 Expression for ρk

Suppose that Φ(B) = (1 − ξ1B) · · · (1 − ξpB) is the factorisation of Φ(B). Since Xt is stationary, |ξi| < 1 for all i. Using the theory of difference equations it can be seen that

ρk = A1ξ1^k + . . . + Apξp^k,   k ≥ 0,


is a solution to the Yule-Walker equations for suitably chosen constants A1, . . . , Ap such that ∑_{i=1}^{p} Ai = 1 (to satisfy ρ0 = 1).

Note that from this representation it is easy to see that the decay of the ACF is exponential, as each |ξi| < 1. Furthermore, if two or more of the zeros are complex, then ρk is a mixture of exponentials and sine/cosine waves:

a + ib = re^{iθ} = r(cos θ + i sin θ),

where r = |a + ib|.

Exercise 1.4.1. Consider the following stationary AR(2) process:

Xt + α1Xt−1 + α2Xt−2 = Zt.

If α1² < 4α2 (i.e. α1² − 4α2 < 0, the discriminant is less than zero), show that

ρk = α2^{k/2} sin(kθ + ψ) / sin ψ,

where tan ψ = (1 + α2) tan θ / (1 − α2) and cos θ = −α1 / (2√α2).

(See Priestley (1981) p130).

Example: Consider the AR(2) process given by Xt + aXt−1 + bXt−2 = Zt with (1) a = −0.6, b = 0.08, (2) a = 0.4, b = 0.3, and Zt ∼ WN(0, 1). Find ρk and plot.

Solution:

Plots of these ACFs are given below (respectively):


[Plots of ρk against k for cases (1) and (2), respectively.]

The first graph represents the ACF when the roots are real and the second represents the case of complex roots (see footnote 2 for the code).

2. Generated with the following R code.

Real roots:
> polyroot(c(1, -0.6, 0.08))
[1] 2.5+0i 5.0-0i
> k <- 0:25
> rhok <- (16/9) * 0.4^k + (-7/9) * 0.2^k
> plot(k, rhok)

Complex roots:
> a <- 0.4
> b <- 0.3
> a^2 - 4*b
[1] -1.04
> polyroot(c(1, 0.4, 0.3))
[1] -0.666667+1.699673i -0.666667-1.699673i
> theta <- acos(-a / (2 * sqrt(b)))
> psi <- atan((1 + b) * tan(theta) / (1 - b))
> k <- 0:20
> rhok <- b^(k/2) * sin(k * theta + psi) / sin(psi)
> plot(k, rhok)


1.5 Autocovariance Generating Function

Theorem 1.5.1. Let Xt be a stationary ARMA(p, q) process. Then the autocovariance generating function of this stationary process is given by:

g(x) = ∑_{h=−∞}^{∞} γ(h) x^h = σ² Ψ(x) Ψ(x⁻¹),

where Ψ(x) = Θ(x)Φ⁻¹(x) = ∑_{j=0}^{∞} Ψj x^j.

Proof. Model: Φ(B)Xt = Θ(B)Zt. Causality implies Xt = ∑_{j=0}^{∞} Ψj Zt−j with ∑_{j=0}^{∞} Ψj² < ∞. Therefore we have autocovariance function:

γ(h) = cov(Xt, Xt+h) = E(XtXt+h) = σ² ∑_{j=0}^{∞} Ψj Ψj+h.

Let g(x) be the autocovariance generating function, i.e.

g(x) = ∑_{h=−∞}^{∞} γ(h) x^h = σ² ∑_{h=−∞}^{∞} ∑_{j=0}^{∞} Ψj Ψj+h x^h = σ² ∑_{j=0}^{∞} ∑_{h=−j}^{∞} Ψj Ψj+h x^h.

Let j + h = k; then

g(x) = σ² ∑_{j=0}^{∞} ∑_{k=0}^{∞} Ψj Ψk x^{k−j} = σ² (∑_{j=0}^{∞} Ψj x^{−j})(∑_{k=0}^{∞} Ψk x^k) = σ² Ψ(x⁻¹) Ψ(x).

Exercise 1.5.1 (AR(1) Case). Consider (1 − φB)Xt = Zt, so that

Ψ(x) = 1 / (1 − φx).

The autocovariance generating function is given by:

g(x) = σ² (1 / (1 − φx)) (1 / (1 − φx⁻¹))
     = σ² (1 + φx + φ²x² + . . .)(1 + φ/x + φ²/x² + . . .)
     = σ² [(1 + φ² + φ⁴ + . . .) + (φ + φ³ + φ⁵ + . . .)x + . . .].


So we can find the variance, γ0, as the coefficient of x⁰ in g(x):

γ0 = σ²(1 + φ² + φ⁴ + . . .) = σ² / (1 − φ²).

Similarly, we can find γ1:

γ1 = σ²(φ + φ³ + φ⁵ + . . .) = σ²φ / (1 − φ²),

and so on.
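These AR(1) autocovariances can be verified with R's ARMAacf(), which returns theoretical autocorrelations (multiplying by γ0 gives autocovariances); φ = 0.7 and σ² = 1 below are assumed values for illustration:

phi <- 0.7; sigma2 <- 1
gamma0 <- sigma2 / (1 - phi^2)             # coefficient of x^0 in g(x)
gamma1 <- sigma2 * phi / (1 - phi^2)       # coefficient of x^1 in g(x)
c(gamma0, gamma1)
gamma0 * ARMAacf(ar = phi, lag.max = 1)    # theoretical gamma_0 and gamma_1: same values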

Two General Results

Suppose that Xt and Yt are two stationary processes related by Yt = ∑_{j=−∞}^{∞} αj Xt−j, where ∑_{j=−∞}^{∞} αj² < ∞ (square summable). Then

γk^Y = ∑_{j} ∑_{l} αj αl γ^X_{k+l−j}

and

fY(ω) = |α(e^{−iω})|² fX(ω),   where α(e^{−iω}) = ∑_{j} αj e^{−iωj}.

1.6 Forecasting with ARMA Models

Suppose that Xt is a stationary and invertible ARMA(p, q) process. Then

Xt = ∑_{j=0}^{∞} ψj Zt−j

is a stationary solution. Write an l-step ahead forecast function of Xt+l, from the origin t, in the form Xt(l) = ∑_{j=0}^{∞} bj Zt−j for suitably chosen constants b0, b1, b2, . . .. The corresponding l-step ahead forecast error is:

εt(l) = Xt+l − Xt(l).

Now we can find constants b0, b1, b2, . . . such that E[εt(l)]² is a minimum, and the corresponding forecast function is called the minimum mean square error (mmse) forecast function.


Theorem 1.6.1. The mmse forecast function of Xt+l is given by the conditional mean

Xt(l) = E(Xt+l | Ft),

where Ft is the history of the process up to time t.

Proof.

εt(l) = Xt+l − Xt(l)
      = ∑_{j=0}^{∞} ψj Zt+l−j − ∑_{j=0}^{∞} bj Zt−j
      = ∑_{j=0}^{l−1} ψj Zt+l−j + ∑_{j=l}^{∞} ψj Zt+l−j − ∑_{j=0}^{∞} bj Zt−j
      = ∑_{j=0}^{l−1} ψj Zt+l−j + ∑_{k=0}^{∞} (ψl+k − bk) Zt−k.

The first sum involves only future shocks and the second only present and past shocks, so we have a linear combination of uncorrelated values and

var[εt(l)] = σ² ∑_{j=0}^{l−1} ψj² + σ² ∑_{k=0}^{∞} (ψl+k − bk)²,

which is minimised when

∑_{k=0}^{∞} (ψl+k − bk)² = 0,

which occurs when bk = ψl+k. As we originally specified Xt(l) = ∑_{j=0}^{∞} bj Zt−j, the corresponding mmse forecast function is thus

Xt(l) = ∑_{j=0}^{∞} ψj+l Zt−j.

This is found by noting that the true value is:

Xt+l = ∑_{j=0}^{∞} ψj Zt+l−j = Zt+l + ψ1Zt+l−1 + . . . + ψl−1Zt+1 + ψlZt + ψl+1Zt−1 + . . .

So the forecast function is:

Xt(l) = Xt+l − εt(l)
      = ∑_{j=0}^{∞} ψj Zt+l−j − ∑_{j=0}^{l−1} ψj Zt+l−j
      = Zt+l + ψ1Zt+l−1 + . . . + ψl−1Zt+1 + ψlZt + ψl+1Zt−1 + . . . − (Zt+l + ψ1Zt+l−1 + ψ2Zt+l−2 + . . . + ψl−1Zt+1)
      = ψlZt + ψl+1Zt−1 + ψl+2Zt−2 + . . .
      = ∑_{j=0}^{∞} ψj+l Zt−j
      = E(Xt+l | Ft).


The final conditional expectation follows by noting that, conditional on Ft, the future shocks Zt+1, . . . , Zt+l in the infinite sum for Xt+l have expectation zero, while the present and past shocks are known.

The forecast error is given by:

εt(l) = Zt+l + ψ1Zt+l−1 + . . . + ψl−1Zt+1,

which has corresponding variance:

var[εt(l)] = σ² ∑_{j=0}^{l−1} ψj².

If we assume normality then we can construct intervals using the distribution:

εt(l) ∼ N(0, σ² ∑_{j=0}^{l−1} ψj²).

Therefore,

(Xt+l − Xt(l)) / sqrt(σ² ∑_{j=0}^{l−1} ψj²) ∼ N(0, 1),

so a 100(1 − α)% forecast interval for Xt+l is:

(Xt(l) − z_{α/2} S,  Xt(l) + z_{α/2} S),

where S² = σ² ∑_{j=0}^{l−1} ψj².
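In practice these mmse forecasts and forecast intervals come from a fitted model; here is a minimal R sketch (the simulated ARMA(1,1) series, the fitted order and the 10-step horizon are assumptions) using arima() and predict():

set.seed(4)
x <- arima.sim(list(ar = 0.6, ma = 0.3), n = 200)
fit <- arima(x, order = c(1, 0, 1))
fc  <- predict(fit, n.ahead = 10)   # fc$pred = X_t(l); fc$se estimates S = sqrt(var[eps_t(l)])
cbind(forecast = fc$pred,
      lower = fc$pred - qnorm(0.975) * fc$se,   # approximate 95% forecast interval
      upper = fc$pred + qnorm(0.975) * fc$se)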


Chapter 2

Introduction to Spectral Analysis of Time Series

2.1 Spectral Density Functions

Let Xt be a stationary process with autocovariance function γh = cov(Xt, Xt+h). The spectral density function (sdf) of Xt, fX(ω), is given by:

fX(ω) = (1/2π) ∑_{j=−∞}^{∞} γj e^{−iωj},   −π < ω < π.

It’s basically the fourier transform of the acf. Note that we can write this in terms of thecovariances or correlations by noting that γ−j = γj and e−iθ = cos(θ) + i sin(θ):

fX(ω) = (1/2π) ∑_{j=−∞}^{∞} γj e^{−iωj}
       = (1/2π) [. . . + γ−1 e^{iω} + γ0 + γ1 e^{−iω} + . . .]
       = (1/2π) [γ0 + γ1(e^{iω} + e^{−iω}) + γ2(e^{2iω} + e^{−2iω}) + . . .]
       = (1/2π) [γ0 + 2γ1 cos(ω) + 2γ2 cos(2ω) + . . .]
       = (1/2π) [γ0 + 2 ∑_{j=1}^{∞} γj cos(ωj)]
       = (γ0/2π) [1 + 2 ∑_{j=1}^{∞} ρj cos(ωj)].

The normalised spectrum is obtained by dividing the spectrum by the variance γ0 (so that it integrates to one over (−π, π), as per the usual sense of normalisation):

fX*(ω) = fX(ω) / γ0 = (1/2π) [1 + 2 ∑_{j=1}^{∞} ρj cos(ωj)].

A plot of fX(ω) or fX*(ω) against ω is very useful in time series analysis.


2.1.1 ARMA(p, q) case

Theorem: The sdf of an ARMA(p, q) process can be expressed as a rational function:

fX(ω) = (σ²/2π) |Θ(e^{iω}) / Φ(e^{iω})|²,   −π < ω < π.   (2.1)

Proof: Use the autocovariance generating function and the linear filter results of Section 1.5.

Example 2.1.1 (Theoretical Spectrum of an AR(1) process). Consider (1 − φB)Xt = Zt and let Φ(B) = 1 − φB. Of course, in the AR(1) case Θ(B) = 1, i.e. there is no MA component.

Thus, using (2.1), we have

fX(ω) = (σ²/2π) · 1 / |1 − φe^{iω}|²,   −π < ω < π
       = (σ²/2π) · 1 / |1 − φ cos(ω) − iφ sin(ω)|².

Now, noting that if z = x + iy then |z| = sqrt(x² + y²), so |z|² = x² + y², and applying that to the present case, we have:

fX(ω) = (σ²/2π) · 1 / (1 − 2φ cos(ω) + φ² cos²(ω) + φ² sin²(ω))
       = (σ²/2π) · 1 / (1 − 2φ cos(ω) + φ²).

We can plot this, first by considering y = 1 − 2φ cos(ω) + φ². Turning points of y are given by y′ = 2φ sin(ω) = 0, i.e. at ω = 0, π. Further, y′′ = 2φ cos(ω). It is clear that for φ > 0, y has a minimum at ω = 0 and a maximum at ω = π. Further, a point of inflection occurs when y′′ = 0, i.e. at ω = π/2.


[Plot: theoretical spectrum of the AR(1) process.]

Example 2.1.2 (Theoretical Spectrum of an MA(1) process). Consider Xt = (1 − θB)Zt with Θ(B) = 1 − θB. From (2.1) we have

fX(ω) = (σ²/2π) |1 − θe^{iω}|²
       = (σ²/2π) |1 − θ cos ω − iθ sin ω|²
       = (σ²/2π) [(1 − θ cos ω)² + θ² sin² ω]
       = (σ²/2π) (1 − 2θ cos ω + θ² cos² ω + θ² sin² ω)
       = (σ²/2π) (1 − 2θ cos ω + θ²).

[Plot: theoretical spectrum of the MA(1) process.]


Example 2.1.3 (ARMA(1,1) process). Consider (1 − 0.6B)Xt = (1 + 0.3B)Zt. Again, using an application of (2.1), we have

fX(ω) = (σ²/2π) |(1 + 0.3e^{iω}) / (1 − 0.6e^{iω})|²,   −π < ω < π
       = (σ²/2π) × (1.09 + 0.6 cos ω) / (1.36 − 1.2 cos ω).

[Plot: theoretical spectrum of the ARMA(1,1) process.]
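The three theoretical spectra above are easily plotted in R using the closed forms just derived (a sketch; σ² = 1 throughout, and φ = 0.7, θ = 0.5 are assumed values for the AR(1) and MA(1) cases):

sigma2 <- 1
omega  <- seq(-pi, pi, length.out = 400)
f.ar1  <- sigma2 / (2 * pi) / (1 - 2 * 0.7 * cos(omega) + 0.7^2)                      # Example 2.1.1
f.ma1  <- sigma2 / (2 * pi) * (1 - 2 * 0.5 * cos(omega) + 0.5^2)                      # Example 2.1.2
f.arma <- sigma2 / (2 * pi) * (1.09 + 0.6 * cos(omega)) / (1.36 - 1.2 * cos(omega))   # Example 2.1.3
plot(omega, f.ar1, type = "l", xlab = "omega", ylab = "f(omega)")
lines(omega, f.ma1, lty = 2)
lines(omega, f.arma, lty = 3)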

2.2 Periodogram

Suppose that we have n observations X1, X2, · · · , Xn from a stationary time series. Since fX(ω) = (1/2π) ∑_{j=−∞}^{∞} γj e^{−iωj} is the theoretical spectrum (or spectral density function, sdf) of the process, a natural estimator for fX(ω) is defined by

In,X(ω) = (1/2π) ∑_{k=−(n−1)}^{n−1} ck e^{−iωk},

where ck = (1/n) ∑_{t=1}^{n−k} (xt − x̄)(xt+k − x̄) for k = 0, . . . , n − 1 (and c−k = ck).

Lemma: In,X(ωj) based on n observations is equivalent to

In,X(ωj) = (1/2πn) |∑_{t=1}^{n} Xt e^{−iωjt}|²,

where ωj = 2πj/n, j = 0, 1, · · · , [n/2]. (Recall that [n/2] = n/2 when n is even and (n − 1)/2 when n is odd.)


To prove the above Lemma, we use the following well-known trigonometric results.

For ωj = 2πj/n, j = −[n/2], . . . , [n/2], we have

1. ∑_{t=1}^{n} cos(tωj) = ∑_{t=1}^{n} sin(tωj) = 0 (for j ≠ 0);

2. ∑_{t=1}^{n} cos(tωj) cos(tωk) = 0 if j ≠ k;  n if j = k = 0 or n/2;  n/2 if j = k ≠ 0, n/2;

3. ∑_{t=1}^{n} sin(tωj) sin(tωk) = 0 if j ≠ k;  0 if j = k = 0 or n/2;  n/2 if j = k ≠ 0, n/2;

4. ∑_{t=1}^{n} sin(tωj) cos(tωk) = 0 for all j, k.

Proof of Lemma

Writing x̄ for the sample mean and noting that the second factor below is the complex conjugate of the first, we have

|∑_{t=1}^{n} Xt e^{−iωjt}|² = [∑_{s=1}^{n} Xs e^{−iωjs}] [∑_{t=1}^{n} Xt e^{iωjt}]
                            = [∑_{s=1}^{n} (Xs − x̄) e^{−iωjs}] [∑_{t=1}^{n} (Xt − x̄) e^{iωjt}]

(subtracting the constant mean changes nothing, since the sums of sines and cosines at the Fourier frequencies are zero)

                            = ∑_{s} ∑_{t} (Xs − x̄)(Xt − x̄) e^{−iωj(s−t)}.

Let k = s − t, so that −(n − 1) ≤ k ≤ (n − 1), i.e. |k| < n. Then the double sum equals

n ∑_{|k|<n} ck e^{−iωjk},

and hence the Lemma.

Note: This In,X(ω) is known as the sample periodogram. The sample periodogram In,X(ω) is used to estimate the spectrum and can also be used to estimate the parameters of a model; a method based on this approach in time series is known as the frequency domain approach. There are a number of alternative definitions for the sample periodogram; they are equivalent to each other apart from a scalar multiple.


1. Priestley (p. 395):

In(ωj) = (2/n) |∑_{t=1}^{n} Xt e^{−iωjt}|²,

where ωj = 2πj/n, j = 0, 1, . . . , [n/2].

2. Priestley (p. 395):

In(ωj) = (2/πn) |∑_{t=1}^{n} Xt e^{−iωjt}|²,

where ωj = 2πj/n.

3. Priestley (p. 399):

In(ωj) = 2 [c0 + 2 ∑_{s=1}^{n−1} cs cos(sωj)].

4. Chatfield (p. 125):

In(ωj) = (1/π) [c0 + 2 ∑_{k=1}^{n−1} ck cos(kωj)].
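The periodogram In,X(ωj) of Section 2.2 can be computed directly from its definition; a minimal R sketch (the simulated AR(1) series is an assumed example), after which R's spec.pgram() gives the same shape up to the scale factors listed above:

set.seed(5)
x <- arima.sim(list(ar = 0.7), n = 200)
n <- length(x); xbar <- mean(x)
j <- 1:floor(n / 2)
omega.j <- 2 * pi * j / n
In <- sapply(omega.j, function(w)
  Mod(sum((x - xbar) * exp(-1i * w * (1:n))))^2 / (2 * pi * n))   # I_{n,X}(omega_j)
plot(omega.j, In, type = "h", xlab = "omega_j", ylab = "periodogram")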

2.3 Estimation of a Discrete Spectrum

Suppose that we suspect that a given discrete time series of n observations contains deterministic periodic component(s). A standard model for a process with a purely discrete spectrum is then given by

Xt = ∑_{j=1}^{k} αj cos(ωjt + φj) + Zt,

where k, αj, ωj are constants, the φj are iid U(−π, π), Zt ∼ WN(0, σ²), and {Zt} and {φj} are mutually independent.

Write

Xt = ∑_{j=1}^{k} [Aj cos(tωj) + Bj sin(tωj)] + Zt,

where Aj = αj cos(φj) and Bj = −αj sin(φj). Note that Aj² + Bj² = αj² and tan φj = −Bj/Aj. We can find Aj and Bj by minimising ∑_{t=1}^{n} Zt² (i.e. use the least squares criterion).

Q = ∑_{t=1}^{n} Zt² = ∑_{t=1}^{n} {Xt − ∑_{j=1}^{k} [Aj cos(tωj) + Bj sin(tωj)]}².

Setting ∂Q/∂Aj = 0 and ∂Q/∂Bj = 0, for ωj = 2πj/n, gives the estimates:

Âj = (2/n) ∑_{t=1}^{n} Xt cos(tωj);   B̂j = (2/n) ∑_{t=1}^{n} Xt sin(tωj).

Now we look at some properties of these estimates Âj and B̂j.


2.3.1 Properties of Âj and B̂j

Since we have observations from a single realisation, it is reasonable to treat the φj as constants. This means the Aj and Bj are constants, and we have the following results:

1. E(Âj) = Aj

Proof.

E(Âj) = (2/n) ∑_{t=1}^{n} E(Xt) cos(tωj)
      = (2/n) ∑_{t=1}^{n} (∑_{l=1}^{k} [Al cos(tωl) + Bl sin(tωl)]) cos(tωj)
      = (2/n) [∑_{t=1}^{n} cos(tωj) ∑_{l=1}^{k} Al cos(tωl) + ∑_{t=1}^{n} cos(tωj) ∑_{l=1}^{k} Bl sin(tωl)].

Now we only get a contribution from the first term in the square brackets when l = j (2nd trig property) and 0 from the second term (4th trig property). Thus we have

E(Âj) = (2/n) [Aj n/2 + 0] = Aj.

2. E(B̂j) = Bj

Proof. Similar to the Âj case.

3. The variance of Âj:

var(Âj) = (2²/n²) ∑_{t=1}^{n} var(Xt) cos²(tωj).

Now

Xt = ∑_{j=1}^{k} [Aj cos(tωj) + Bj sin(tωj)] (deterministic) + Zt,

and therefore var(Xt) = var(Zt) = σ², so

var(Âj) = (4/n²) σ² ∑_{t=1}^{n} cos²(tωj) = (4/n²) σ² (n/2) = 2σ²/n,

using the second trigonometric property from before.

4. var(B̂j) = 2σ²/n is shown as for the Âj case.


5. cov(Âj, B̂j) = 0 for all j.

Periodogram Analysis

Let

A(ωj) = sqrt(2/n) ∑_{t=1}^{n} Xt cos(ωjt) = sqrt(n/2) Âj,
B(ωj) = sqrt(2/n) ∑_{t=1}^{n} Xt sin(ωjt) = sqrt(n/2) B̂j.

Recall that 4πIn(ωj) = [A(ωj)]² + [B(ωj)]², and hence the periodogram In(ωj) is related to A(ωj) and B(ωj) through:

I′n(ωj) = [A(ωj)]² + [B(ωj)]² = 4πIn(ωj).

In practice, we evaluate In(ωj) at ωj = 2πj/n; j = 0, 1, . . . , [n/2], where [·] denotes the integer part.

2.4 Distribution of In(ωj) under the null hypothesis H0 : αj = 0 for all j

Consider the model Xt = ∑_{j=1}^{k} αj cos(ωjt + φj) + Zt, where Zt ∼ WN(0, σ²).

Suppose that we wish to test H0 : αj = 0 for all j. That is, under the null hypothesis the series exhibits no periodic behaviour and Xt ∼ WN(0, σ²). Consider the following lemma:

Lemma 2.4.1. If Zt ∼ N(0, σ²) and the Zt are independently distributed, i.e. Zt ∼ NID(0, σ²), then under H0:

1. A(ωj) ∼ N(0, σ²) for j ≠ 0, [n/2];

2. B(ωj) ∼ N(0, σ²) for j ≠ 0, [n/2];

3. cov(A(ωj), B(ωk)) = 0 for all j, k;

4. cov(A(ωj), A(ωk)) = cov(B(ωj), B(ωk)) = 0 for all j ≠ k.

Proof. 1. Under H0, Xt ∼ N(0, σ²); therefore E(A(ωj)) = 0 and

var(A(ωj)) = (2/n) σ² ∑_{t=1}^{n} cos²(ωjt).

Clearly, for j = 0 the right-hand side is 2σ², and for j ≠ 0, n/2 (n even) the right-hand side is σ², by the trigonometric identities.

2, 3 and 4 follow similarly.

Theorem 2.4.1. Under H0, I′n(ωj) ∼ σ²χ²2 for j ≠ 0, [n/2] (i.e. only consider non-endpoints).


Proof. For j ≠ 0, [n/2] (n even), we have

I′n(ωj) = [A(ωj)]² + [B(ωj)]² = σ² [(A(ωj)/σ)² + (B(ωj)/σ)²] = σ² χ²2,

since adding the squares of two independent standard normals gives a χ²2 variable.

Since E(χ²p) = p and var(χ²p) = 2p, we have E[I′n(ωj)] = 2σ² and var[I′n(ωj)] = 4σ⁴ for j ≠ 0, [n/2].

The periodogram of a white noise series satisfies the above properties.

Note: It is clear that for j = 0, [n/2] (n even), we have var[I′n(ωj)] = 8σ⁴.

2.5 A Test for Periodogram Ordinates

This section develops a test to identify whether a given time series deviates from white noise, that is, to test the null hypothesis H0 : Xt ∼ NID(0, σ²).

Consider the model Xt = ∑_{j=1}^{k} αj cos(ωjt + φj) + Zt. If αj = 0 for all j, then Xt is a purely random process. In this case we have shown that I′n(ωj)/σ² ∼ χ²2. Further, it is easy to see that cov(In(ωj), In(ωk)) = 0 for all j ≠ k (see Priestley, 1981, p. 405). These results will be used to develop the following test.

Recall that the pdf of a χ²2 variable is f(x) = (1/2) exp(−x/2), x ≥ 0, and hence for z ≥ 0 and for j = 1, 2, · · · , [n/2] we have

P(I′n(ωj)/σ² ≤ z) = 1 − exp(−z/2).

Let

γ = max_{1 ≤ j ≤ [n/2]} I′n(ωj)/σ².

Now, under the null hypothesis H0 (using the independence of the periodogram ordinates), we have

P(γ ≥ z) = 1 − P(γ ≤ z) = 1 − P(max_j I′n(ωj)/σ² ≤ z) = 1 − P(I′n(ωj)/σ² ≤ z for all j).

This gives

P(γ ≥ z) = 1 − [1 − exp(−z/2)]^{[n/2]}.

Since σ² is not known in practice, we find an unbiased estimate under H0 as follows:

E(∑_{j=1}^{[n/2]} I′n(ωj)) = ∑_{j=1}^{[n/2]} E(I′n(ωj)) = ∑_{j=1}^{[n/2]} 2σ² = 2[n/2]σ².

This gives an unbiased estimate of σ² as:

σ̂² = (1/(2[n/2])) ∑_{j=1}^{[n/2]} I′n(ωj).


Note: For large n, σ̂² ≈ (1/n) ∑_{j=1}^{[n/2]} I′n(ωj), and using I′n(ωj) = 2 ∑_{k=−(n−1)}^{n−1} ck e^{−iωjk}, we have σ̂² ≈ c0 (the sample variance).

Now a modified (or studentized) statistic under H0 is

γ* = max_j I′n(ωj) / σ̂²,

and this can be used to test the null hypothesis, neglecting the sampling fluctuations of the denominator. That is, using the fact that the distribution of γ* will then be the same as that of γ, we have

P(γ* ≥ z) ≈ 1 − [1 − exp(−z/2)]^{[n/2]}.

Example: Suppose that a time series of length 114 gave the following: max_j I′n(ωj) = I′n(ω12) = 21.0246 and ∑_{j=1}^{57} I′n(ωj) = 35.23.

Thus σ̂² = 35.23/(2 × 57) = 0.3090, so γ* = 21.0246/0.3090 = 68.03.

Using the distribution of γ*, we find the upper 5% critical value z from 1 − [1 − exp(−z/2)]^{57} = 0.05, i.e. z = 14.03. Since γ* greatly exceeds this value, we have strong evidence against the null hypothesis.
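A sketch of this test in R (the white-noise series, its length and the 5% level are assumptions for illustration); under H0 the statistic γ* should only rarely exceed the critical value:

set.seed(6)
x <- rnorm(114)                                    # series generated under H0
n <- length(x); M <- floor(n / 2)
omega.j <- 2 * pi * (1:M) / n
A <- sapply(omega.j, function(w) sqrt(2 / n) * sum(x * cos(w * (1:n))))
B <- sapply(omega.j, function(w) sqrt(2 / n) * sum(x * sin(w * (1:n))))
Ip <- A^2 + B^2                                    # I'_n(omega_j)
sigma2.hat <- sum(Ip) / (2 * M)                    # unbiased estimate of sigma^2 under H0
gamma.star <- max(Ip) / sigma2.hat
crit <- -2 * log(1 - 0.95^(1 / M))                 # 5% critical value from P(gamma* >= z) = 0.05
c(gamma.star = gamma.star, critical.value = crit)  # reject H0 if gamma.star > critical.value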

2.6 Properties of the Periodogram

Suppose that Xt is a stationary time series. Recall that γk is an even function and fX(ω) = (1/2π) ∑_{k=−∞}^{∞} γk e^{−iωk}. Thus we have

fX(ω) = (1/2π) [γ0 + 2 ∑_{k=1}^{∞} γk cos(ωk)].

The corresponding periodogram estimate of fX(ω) is

In(ω) = (1/2π) [c0 + 2 ∑_{k=1}^{n−1} ck cos(kω)].

Now we establish the following properties:

• Unbiasedness of In(ω)

Since E(ck) → γk as n → ∞, clearly E(In(ω)) → fX(ω). Therefore In(ω) is asymptotically unbiased for fX(ω).

• Consistency of In(ω)

To study the variance var[In,X(ω)], we state the following theorem:

Theorem: Suppose that Xt = ∑_{j=−∞}^{∞} gj Zt−j, where ∑_{j=−∞}^{∞} |gj| |j|^α < ∞ for some α > 0. Then

In,X(ω) = (fX(ω)/σ²) In,Z(ω) + Rn(ω),

where In,Z(ω) is the periodogram of the white noise Zt.

Proof: See Priestley (1981), p. 424.

Noting that fX(ω) is deterministic, i.e. not a random variable, we have that the ratios

In,X(ωj) / fX(ωj),   j ≠ 0, [n/2],

are independently distributed as χ²2, so that

E[In,X(ω)/fX(ω)] = 2   and   var[In,X(ω)/fX(ω)] = 4.

Thus

var[In,X(ω)] = 4 [fX(ω)]².

Notes:

1. The above result tells us that the estimator In,X(ω) fluctuates widely.

2. Since var[In,X(ω)] ∼ fX²(ω) for large n, we conclude that the periodogram is not a consistent estimator, as its variance never goes to zero; i.e. In,X(ω) is not a consistent estimator of fX(ω).

However, we can introduce certain weights to smooth the periodogram and so achieve consistency of the resulting estimators. This is considered in the next section.


2.7 Lag window spectral density estimates

Since

In,X(ωj) = (1/2π) ∑_{|k|<n} ck e^{−iωjk}

is not consistent for fX(ωj), take the following class of estimators:

ĥX(ωj) = (1/2π) ∑_{|k|<m} λk ck e^{−iωjk},

where m < n is such that m/n → 0 as m, n → ∞, and {λk} is a sequence of suitable constants, or lag window.

Notes:

• One choice would be m = n^θ, 0 < θ < 1.

• The sequence {λk} (the lag window) must satisfy certain regularity conditions.

• The corresponding estimate ĥX(ω) is called a lag window spectral density estimate.

Some popular lag windows in practice

1. Truncated periodogram window:

λk = 1 for |k| ≤ m;  0 for |k| > m.

2. Bartlett window:

λk = 1 − |k|/m for |k| ≤ m;  0 for |k| > m.

3. Tukey window:

λk = 1 − 2a + 2a cos(πk/m) for |k| ≤ m;  0 for |k| > m.

4. Daniell window:

λk = sin(πk/m) / (πk/m) for all k.

Thus, instead of using

In,X(ωj) = (1/2π) ∑_{k=−(n−1)}^{n−1} ck e^{−iωjk},

find ĥX(ωj) using a suitable lag window {λk}.
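A lag-window estimate is easy to code directly; the sketch below uses the Bartlett window with assumed choices of series and truncation point m, and notes R's built-in smoothed periodogram as an alternative:

set.seed(7)
x <- arima.sim(list(ar = 0.7), n = 400)
n <- length(x); xbar <- mean(x); m <- 20
ck  <- sapply(0:m, function(k) sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n)  # c_0, ..., c_m
lam <- 1 - (0:m) / m                                   # Bartlett lag window (lambda_0 = 1)
omega <- seq(0, pi, length.out = 200)
h.hat <- sapply(omega, function(w)
  (ck[1] + 2 * sum(lam[-1] * ck[-1] * cos(w * (1:m)))) / (2 * pi))
plot(omega, h.hat, type = "l", xlab = "omega", ylab = "lag-window estimate")
# spec.pgram(x, spans = c(7, 7)) gives a comparable smoothed (Daniell-type) estimate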

2.8 Sampling properties of lag window estimates

For convenience, define the function W(θ) as

W(θ) = (1/2π) ∑_{|s|<n} λs e^{−iθs}.

This is called a spectral window and is equivalent to:

λs = ∫_{−π}^{π} W(θ) e^{iθs} dθ.

Note: W(θ) (or Wn(θ)) must satisfy the following regularity conditions:

1. Wn(θ) ≥ 0, i.e. non-negative.

2. For all ε > 0, Wn(θ) → 0 uniformly as n → ∞ for |θ| > ε.

3. ∫_{−π}^{π} Wn(θ) dθ = 1.

4. ∫_{−π}^{π} Wn²(θ) dθ < ∞ and (1/n) ∫_{−π}^{π} Wn²(θ) dθ → 0.

5. ∑_{|s|<n} (|s|/n) λn²(s) / ∑_{|s|<n} λn²(s) → 0 as n → ∞.

Now we have the following theorem for ĥ(ω):

Theorem:

ĥ(ω) = ∫_{−π}^{π} In(θ) W(ω − θ) dθ.

Proof: Consider

In(ω) = (1/2π) ∑_{|k|<n} ck e^{−iωk},

where ck = ∫_{−π}^{π} In(θ) e^{ikθ} dθ.

Therefore,

ĥX(ω) = (1/2π) ∑_{|k|<m} λk ck e^{−iωk}
       = (1/2π) ∑_{|k|<n} λk ck e^{−iωk}   (since λk = 0 for |k| > m)
       = (1/2π) ∑_{|k|<n} λk ∫_{−π}^{π} In(θ) e^{−ik(ω−θ)} dθ
       = ∫_{−π}^{π} In(θ) [(1/2π) ∑_{|k|<n} λk e^{−ik(ω−θ)}] dθ
       = ∫_{−π}^{π} In(θ) W(ω − θ) dθ.

Note: If we approximate the integral by a discrete sum over the ordinates ωj, then we have

ĥX(ω) ≈ (2π/n) ∑_{j} In(ωj) W(ω − ωj).

Since the In(ωj) have (scaled) χ² distributions, the distribution of ĥ(ω) is approximated in the form

ĥ(ω) / fX(ω) ∼ a χ²ν.   (A)

Now write

var(ĥ(ω)) ∼ (f²(ω)/n) ∑_{|s|<n} λn²(s),   ω ≠ 0, ±π.

The following theorem gives another approximation for var(ĥ(ω)).

Theorem: For large n we have

var(ĥ(ω)) ∼ (2π/n) fX²(ω) ∫_{−π}^{π} Wn²(ω − θ) dθ,   ω ≠ 0, π.

Proof: See Priestley, p. 454.

Note that since

Wn(θ) = (1/2π) ∑_{|s|<n} λn(s) e^{−isθ},

we have, by Parseval's identity,

∫_{−π}^{π} Wn²(θ) dθ = (1/2π) ∑_{|s|<n} λn²(s).

Thus we have

In,X(ωj) / (2π fX(ωj)) ∼ χ²2;   j ≠ 0, [n/2].


Evaluating a and ν in (A)

To find a and ν we use the following:

1. Using E[ĥ(ω)] = f(ω) and, from ĥ(ω) ∼ a f(ω) χ²ν, E[ĥ(ω)] = aν f(ω), we have

aν = 1.   (2.2)

2. From var(ĥ(ω)) = [a f(ω)]² 2ν, it is clear that

[a f(ω)]² 2ν = (f²(ω)/n) ∑_{|s|<n} λn²(s),

i.e.

2a²ν = (1/n) ∑_{|s|<n} λn²(s).

Using aν = 1 gives

2a(aν) = 2a = (1/n) ∑_{|s|<n} λn²(s),

and thus we have

a = (1/2n) ∑_{|s|<n} λn²(s).

Now,

ν = 2n / ∑_{|s|<n} λn²(s).

Example 2.8.1. Find a and ν for the truncated periodogram window given by

λk = 1 for |k| ≤ m;  0 otherwise.

Solution: Clearly, ν = 2n/(2m + 1) ≈ n/m for large m.

2.8.1 Confidence intervals for estimated sdf

To find a 100(1 − α)% confidence interval for f(ωj) we use the fact that

ĥ(ωj) / f(ωj) ∼ a χ²ν,   or equivalently   ĥ(ωj) / (a f(ωj)) ∼ χ²ν.

So a 100(1 − α)% CI for f(ωj) is obtained from

l < ĥ(ωj) / (a f(ωj)) < u

which gives

ν ĥ(ωj)/u < f(ωj) < ν ĥ(ωj)/l,

noting that ν = a⁻¹, where l and u are the critical values cutting off the lower and upper α/2 tails of the χ²ν distribution.
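In R, a smoothed periodogram object reports the equivalent degrees of freedom of the smoother, so an interval of this form can be computed directly (a sketch; the simulated series, the spans and the 95% level are assumptions):

set.seed(8)
x  <- arima.sim(list(ar = 0.7), n = 400)
sp <- spec.pgram(x, spans = c(7, 7), taper = 0, plot = FALSE)
nu <- sp$df                                        # equivalent degrees of freedom (plays the role of nu)
lower <- nu * sp$spec / qchisq(0.975, nu)
upper <- nu * sp$spec / qchisq(0.025, nu)
head(cbind(freq = sp$freq, estimate = sp$spec, lower, upper))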
