
Time Series Analysis
Lecture Notes for 475.726

Ross Ihaka
Statistics Department
University of Auckland

April 14, 2005


Contents

1 Introduction
  1.1 Time Series
  1.2 Stationarity and Non-Stationarity
  1.3 Some Examples
      1.3.1 Annual Auckland Rainfall
      1.3.2 Nile River Flow
      1.3.3 Yield on British Government Securities
      1.3.4 Ground Displacement in an Earthquake
      1.3.5 United States Housing Starts
      1.3.6 Iowa City Bus Ridership

2 Vector Space Theory
  2.1 Vectors In Two Dimensions
      2.1.1 Scalar Multiplication and Addition
      2.1.2 Norms and Inner Products
  2.2 General Vector Spaces
      2.2.1 Vector Spaces and Inner Products
      2.2.2 Some Examples
  2.3 Hilbert Spaces
      2.3.1 Subspaces
      2.3.2 Projections
  2.4 Hilbert Spaces and Prediction
      2.4.1 Linear Prediction
      2.4.2 General Prediction

3 Time Series Theory
  3.1 Time Series
  3.2 Hilbert Spaces and Stationary Time Series
  3.3 The Lag and Differencing Operators
  3.4 Linear Processes
  3.5 Autoregressive Series
      3.5.1 The AR(1) Series
      3.5.2 The AR(2) Series
      3.5.3 Computations
  3.6 Moving Average Series
      3.6.1 The MA(1) Series
      3.6.2 Invertibility
      3.6.3 Computation
  3.7 Autoregressive Moving Average Series
      3.7.1 The ARMA(1,1) Series
      3.7.2 The ARMA(p,q) Model
      3.7.3 Computation
      3.7.4 Common Factors
  3.8 The Partial Autocorrelation Function
      3.8.1 Computing the PACF
      3.8.2 Computation

4 Identifying Time Series Models
  4.1 ACF Estimation
  4.2 PACF Estimation
  4.3 System Identification
  4.4 Model Generalisation
      4.4.1 Non-Zero Means
      4.4.2 Deterministic Trends
      4.4.3 Models With Non-stationary AR Components
      4.4.4 The Effect of Differencing
  4.5 ARIMA Models

5 Fitting and Forecasting
  5.1 Model Fitting
      5.1.1 Computations
  5.2 Assessing Quality of Fit
  5.3 Residual Correlations
  5.4 Forecasting
      5.4.1 Computation
  5.5 Seasonal Models
      5.5.1 Purely Seasonal Models
      5.5.2 Models with Short-Term and Seasonal Components
      5.5.3 A More Complex Example

6 Frequency Domain Analysis
  6.1 Some Background
      6.1.1 Complex Exponentials, Sines and Cosines
      6.1.2 Properties of Cosinusoids
      6.1.3 Frequency and Angular Frequency
      6.1.4 Invariance and Complex Exponentials
  6.2 Filters and Filtering
      6.2.1 Filters
      6.2.2 Transfer Functions
      6.2.3 Filtering Sines and Cosines
      6.2.4 Filtering General Series
      6.2.5 Computing Transfer Functions
      6.2.6 Sequential Filtering
  6.3 Spectral Theory
      6.3.1 The Power Spectrum
      6.3.2 The Cramer Representation
      6.3.3 Using The Cramer Representation
      6.3.4 Power Spectrum Examples
  6.4 Statistical Inference
      6.4.1 Some Distribution Theory
      6.4.2 The Periodogram and its Distribution
      6.4.3 An Example - Sunspot Numbers
      6.4.4 Estimating The Power Spectrum
      6.4.5 Tapering and Prewhitening
      6.4.6 Cross Spectral Analysis
  6.5 Computation
      6.5.1 A Simple Spectral Analysis Package for R
      6.5.2 Power Spectrum Estimation
      6.5.3 Cross-Spectral Analysis
  6.6 Examples

Chapter 1

    Introduction

    1.1 Time Series

Time series arise as recordings of processes which vary over time. A recording can either be a continuous trace or a set of discrete observations. We will concentrate on the case where observations are made at discrete, equally spaced times. By appropriate choice of origin and scale we can take the observation times to be 1, 2, ..., T and we can denote the observations by Y_1, Y_2, ..., Y_T.

There are a number of things which are of interest in time series analysis. The most important of these are:

Smoothing: The observed Y_t are assumed to be the result of noise values ε_t additively contaminating a smooth signal μ_t,

    Y_t = μ_t + ε_t.

We may wish to recover the values of the underlying μ_t.

Modelling: We may wish to develop a simple mathematical model which explains the observed pattern of Y_1, Y_2, ..., Y_T. This model may depend on unknown parameters and these will need to be estimated.

Forecasting: On the basis of observations Y_1, Y_2, ..., Y_T, we may wish to predict what the value of Y_{T+L} will be (L ≥ 1), and possibly to give an indication of what the uncertainty is in the prediction.

Control: We may wish to intervene in the process which is producing the Y_t values in such a way that the future values are altered to produce a favourable outcome.

    1.2 Stationarity and Non-Stationarity

A key idea in time series is that of stationarity. Roughly speaking, a time series is stationary if its behaviour does not change over time. This means, for example, that the values always tend to vary about the same level and that their variability is constant over time. Stationary series have a rich theory and their behaviour is well understood. This means that they play a fundamental role in the study of time series.

Obviously, not all time series that we encounter are stationary. Indeed, non-stationary series tend to be the rule rather than the exception. However, many time series are related in simple ways to series which are stationary. Two important examples of this are:

Trend models: The series we observe is the sum of a deterministic trend series and a stationary noise series. A simple example is the linear trend model

    Y_t = β_0 + β_1 t + ε_t.

Another common trend model assumes that the series is the sum of a periodic "seasonal" effect and stationary noise. There are many other variations.

Integrated models: The time series we observe satisfies

    Y_{t+1} - Y_t = ε_{t+1},

where ε_t is a stationary series. A particularly important model of this kind is the random walk. In that case, the ε_t values are independent "shocks" which perturb the current state Y_t by an amount ε_{t+1} to produce a new state Y_{t+1}. A small simulation of this model is sketched below.
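One simple way to see this behaviour is to simulate it in R: a random walk is built by accumulating independent shocks, and differencing it recovers the stationary shock series (a minimal sketch using only base R functions).

set.seed(42)
eps <- rnorm(200)        # independent shocks: a stationary white noise series
y <- cumsum(eps)         # random walk: y[t+1] - y[t] = eps[t+1]
plot.ts(y)               # wanders; clearly non-stationary
plot.ts(diff(y))         # differencing recovers the stationary shocks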

    1.3 Some Examples

    1.3.1 Annual Auckland Rainfall

Figure 1.1 shows the annual amount of rainfall in Auckland for the years from 1949 to 2000. The general pattern of rainfall looks similar throughout the record, so this series could be regarded as being stationary. (There is a hint that rainfall amounts are declining over time, but this type of effect can occur over shortish time spans for stationary series.)

    1.3.2 Nile River Flow

Figure 1.2 shows the flow volume of the Nile at Aswan from 1871 to 1970. These are yearly values. The general pattern of this data does not change over time so it can be regarded as stationary (at least over this time period).

    1.3.3 Yield on British Government Securities

Figure 1.3 shows the percentage yield on British Government securities, monthly over a 21 year period. There is a steady long-term increase in the yields. Over the period of observation a trend-plus-stationary-series model looks like it might be appropriate. An integrated stationary series is another possibility.

1.3.4 Ground Displacement in an Earthquake

Figure 1.4 shows one component of the horizontal ground motion resulting from an earthquake. The initial motion (a little after 4 seconds) corresponds to the arrival of the p-wave and the large spike just before six seconds corresponds to the arrival of the s-wave. Later features correspond to the arrival of surface waves. This is an example of a transient signal; techniques appropriate for stationary series cannot be applied to it.

    1.3.5 United States Housing Starts

Figure 1.5 shows the monthly number of housing starts in the United States (in thousands). Housing starts are a leading economic indicator. This means that an increase in the number of housing starts indicates that economic growth is likely to follow, while a decline in housing starts indicates that a recession may be on the way.

    1.3.6 Iowa City Bus Ridership

Figure 1.6 shows the monthly average weekday bus ridership for Iowa City over the period from September 1971 to December 1982. There is clearly a strong seasonal effect superimposed on top of a general upward trend.

[Figure 1.1: Annual Auckland rainfall (in cm) from 1949 to 2000 (from Paul Cowpertwait).]

[Figure 1.2: Flow volume of the Nile at Aswan from 1871 to 1970 (from Durbin and Koopman).]

[Figure 1.3: Monthly percentage yield on British Government securities over a 21 year period (from Chatfield).]

[Figure 1.4: Horizontal ground displacement during a small Nevada earthquake (from Bill Peppin).]

[Figure 1.5: Housing starts in the United States (000s) (from S-Plus).]

[Figure 1.6: Average weekday bus ridership, Iowa City (monthly average), Sep 1971 to Dec 1982.]

Chapter 2

    Vector Space Theory

    2.1 Vectors In Two Dimensions

The theory which underlies time series analysis is quite technical in nature. In spite of this, a good deal of intuition can be developed by approaching the subject geometrically. The geometric approach is based on the ideas of vectors and vector spaces.

2.1.1 Scalar Multiplication and Addition

A good deal can be learnt about the theory of vectors by considering the two-dimensional case. You can think of two-dimensional vectors as being little arrows which have a length and a direction. Individual vectors can be stretched (altering their length but not their direction) and pairs of vectors can be added by placing them head to tail.

To be more precise, we'll suppose that a vector v extends for a distance x in the horizontal direction and a distance y in the vertical direction. This gives us a representation of the vector as a pair of numbers and we can write it as

    v = (x, y).

Doubling the length of the vector doubles the x and y values. In a similar way we can scale the length of the vector by any value, and this results in a similar scaling of the x and y values. This gives us a natural way of defining multiplication of a vector by a number:

    cv = (cx, cy).

A negative value for c produces a reversal of direction as well as a change of length of magnitude |c|.

Adding two vectors amounts to placing them head to tail and taking the sum to be the arrow which extends from the base of the first to the head of the second. This corresponds to adding the x and y values of the two vectors. For two vectors v_1 = (x_1, y_1) and v_2 = (x_2, y_2) the result is (x_1 + x_2, y_1 + y_2). This gives a natural definition of addition for vectors:

    v_1 + v_2 = (x_1 + x_2, y_1 + y_2).


    2.1.2 Norms and Inner Products

While coordinates give a complete description of vectors, it is often more useful to describe them in terms of the lengths of individual vectors and the angles between pairs of vectors. If we denote the length of a vector u by ‖u‖, then by Pythagoras' theorem we know

    ‖u‖ = √(x² + y²).

Vector length has a number of simple properties:

Positivity: For every vector u, we must have ‖u‖ ≥ 0, with equality if and only if u = 0.

Linearity: For every scalar c and vector u we have ‖cu‖ = |c| ‖u‖.

The Triangle Inequality: If u and v are vectors, then ‖u + v‖ ≤ ‖u‖ + ‖v‖.

The first and second of these properties are obvious, and the third is simply a statement that the shortest distance between two points is a straight line. The technical mathematical name for the length of a vector is the norm of the vector.

Although the length of an individual vector gives some information about it, it is also important to consider the angles between pairs of vectors. Suppose that we have vectors v_1 = (x_1, y_1) and v_2 = (x_2, y_2), and that we want to know the angle between them. If v_1 subtends an angle θ_1 with the x axis and v_2 subtends an angle θ_2 with the x axis, simple geometry tells us that:

    cos θ_i = x_i / ‖v_i‖,    sin θ_i = y_i / ‖v_i‖.

The angle we are interested in is θ_1 - θ_2 and its cosine and sine can be computed using the formulae

    cos(α - β) = cos α cos β + sin α sin β
    sin(α - β) = sin α cos β - cos α sin β,

so that

    cos(θ_1 - θ_2) = (x_1 x_2 + y_1 y_2) / (‖v_1‖ ‖v_2‖)
    sin(θ_1 - θ_2) = (x_2 y_1 - x_1 y_2) / (‖v_1‖ ‖v_2‖).

Knowledge of the sine and cosine values makes it possible to compute the angle between the vectors.

There are actually two angles between a pair of vectors; one measured clockwise and one measured counter-clockwise. Often we are interested in the smaller of these two angles. This can be determined from the cosine value (because cos θ = cos(-θ)). The cosine is determined by the lengths of the two vectors and the quantity x_1 x_2 + y_1 y_2. This quantity is called the inner product of the vectors v_1 and v_2 and is denoted by ⟨v_1, v_2⟩. Inner products have the following basic properties.


Positivity: For every vector v, we must have ⟨v, v⟩ ≥ 0, with ⟨v, v⟩ = 0 only when v = 0.

Linearity: For all vectors u, v, and w, and scalars α and β, we must have ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩.

Symmetry: For all vectors u and v, we must have ⟨u, v⟩ = ⟨v, u⟩ (or, in the case of complex vector spaces, ⟨u, v⟩ equals the complex conjugate of ⟨v, u⟩).

It is clear that the norm can be defined in terms of the inner product by

    ‖v‖² = ⟨v, v⟩.

It is easy to show that the properties of the norm ‖v‖ follow from those of the inner product.

Two vectors u and v are said to be orthogonal if ⟨u, v⟩ = 0. This means that the vectors are at right angles to each other.

    2.2 General Vector Spaces

The theory and intuition obtained from studying two-dimensional vectors carries over into more general situations. What matters is not the particular representation of vectors as pairs of coordinates, but rather the ideas of addition, scalar multiplication and inner product.

    2.2.1 Vector Spaces and Inner Products

The following concepts provide an abstract generalisation of vectors in two dimensions.

1. A vector space is a set of objects, called vectors, which is closed under addition and scalar multiplication. This means that when vectors are scaled or added, the result is a vector.

2. An inner product on a vector space is a function which takes two vectors u and v and returns a scalar, denoted by ⟨u, v⟩, which satisfies the conditions of positivity, linearity and symmetry described in section 2.1.2. A vector space with an associated inner product is called an inner product space.

3. The norm associated with an inner product space is defined in terms of the inner product as ‖u‖ = √⟨u, u⟩.

These ideas can be applied to quite general vector spaces. Although it may seem strange to apply ideas of direction and length to some of these spaces, thinking in terms of two- and three-dimensional pictures can be quite useful.

Example 2.2.1 The concept of two-dimensional vectors can be generalised by looking at the set of n-tuples of the form u = (u_1, u_2, ..., u_n), where each u_i is a real number. With addition and scalar multiplication defined element-wise, the set of all such n-tuples forms a vector space, usually denoted by R^n. An inner product can be defined on the space by

    ⟨u, v⟩ = Σ_{i=1}^{n} u_i v_i,

and this produces the norm

    ‖u‖ = ( Σ_{i=1}^{n} u_i² )^{1/2}.

This generalisation of vector ideas to n dimensions provides the basis for a good deal of statistical theory. This is especially true for linear models and multivariate analysis.

We will need a number of fundamental results on vector spaces. These follow directly from the definitions of vector space, inner product and norm, and do not rely on any special assumptions about the kind of vectors being considered.

The Cauchy-Schwarz Inequality. For any two vectors u and v,

    |⟨u, v⟩| ≤ ‖u‖ ‖v‖.    (2.1)

Proof: For any two vectors u and v and scalars α and β,

    0 ≤ ‖αu - βv‖² = ⟨αu - βv, αu - βv⟩ = α²‖u‖² - 2αβ⟨u, v⟩ + β²‖v‖².

Setting α = ‖v‖ and β = ⟨u, v⟩/‖v‖ yields

    0 ≤ ‖u‖²‖v‖² - 2⟨u, v⟩² + ⟨u, v⟩² = ‖u‖²‖v‖² - ⟨u, v⟩².

This can be rewritten as

    |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

The Triangle Inequality. For any two vectors u and v,

    ‖u + v‖ ≤ ‖u‖ + ‖v‖.

Proof: For any u and v,

    ‖u + v‖² = ⟨u + v, u + v⟩
             = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
             ≤ ‖u‖² + 2‖u‖‖v‖ + ‖v‖²    (Cauchy-Schwarz)
             = (‖u‖ + ‖v‖)².

The Angle Between Vectors. The Cauchy-Schwarz inequality means that

    -1 ≤ ⟨u, v⟩ / (‖u‖ ‖v‖) ≤ 1.

This means that, just as in the 2-d case, we can define the angle between vectors u and v by

    θ = cos⁻¹( ⟨u, v⟩ / (‖u‖ ‖v‖) ).


    2.2.2 Some Examples

The ideas in abstract vector space theory are derived from the concrete ideas of vectors in 2 and 3 dimensions. The real power of vector space theory is that it applies to much more general spaces. To show off the generality of the theory we'll now look at a couple of examples of less concrete spaces.

Example 2.2.2 The set F of all (real-valued) random variables with EX = 0 and E|X|² < ∞ forms a vector space, and an inner product can be defined on it by ⟨X, Y⟩ = E(XY).

Example 2.2.3 The set of all continuous functions on the interval [-1, 1] forms a vector space, and an inner product can be defined on it by ⟨f, g⟩ = ∫_{-1}^{1} f(x)g(x) dx.


There is also a technical uniqueness problem in this case. This is handled by regarding functions f and g as equal if

    ∫_{-1}^{1} |f(x) - g(x)|² dx = 0.

This happens, for example, if f and g differ only on a finite or countably infinite set of points. Integration theory gives a characterisation in terms of sets with measure zero.

    2.3 Hilbert Spaces

The vector spaces we've looked at so far work perfectly well for performing finite operations such as forming linear combinations. However, there can be problems when considering limits.

Consider the case of example 2.2.3. Each of the functions f_n defined by

    f_n(x) = 0,            x < -1/n,
             (nx + 1)/2,   -1/n ≤ x ≤ 1/n,
             1,            x > 1/n,

is continuous, and the sequence obviously converges to the function

    f(x) = 0,    x < 0,
           1/2,  x = 0,
           1,    x > 0.

This limit function is not in the space under consideration because it is not continuous.

Hilbert spaces add a requirement of completeness to those for an inner product space. In Hilbert spaces, sequences that look like they are converging will actually converge to an element of the space. To make this precise we need to define the meaning of convergence.

Convergence. Suppose that {u_n} is a sequence of vectors and u is a vector such that ‖u_n - u‖ → 0. Then u_n is said to converge to u, and this is denoted by u_n → u.

If {u_n} is a sequence of vectors such that u_n → u, and v is any vector, then by the Cauchy-Schwarz inequality ⟨u_n, v⟩ → ⟨u, v⟩. In a similar way ‖u_n‖ → ‖u‖. These properties are referred to as continuity of the inner product and norm.

A sequence for which ‖u_m - u_n‖ → 0 as m, n → ∞ is called a Cauchy sequence. It is easy to see that every convergent sequence is a Cauchy sequence. If, conversely, every Cauchy sequence converges to an element of the space, the space is said to be complete. A complete inner product space is called a Hilbert space.

Hilbert spaces preserve many of the important properties of R^n. In particular the notions of length and direction retain their intuitive meanings. This makes it possible to carry out mathematical arguments geometrically, and even to use pictures to understand what is happening in quite complex cases.


    2.3.1 Subspaces

A subset M of a Hilbert space H is called a linear manifold if whenever u and v are elements of M then so is αu + βv for any scalars α and β. A linear manifold which contains the limit of every Cauchy sequence of its elements is called a linear subspace of H. A linear subspace of a Hilbert space is itself a Hilbert space.

A vector v is orthogonal to a subspace M if ⟨v, u⟩ = 0 for every u ∈ M. The set of all vectors which are orthogonal to M is itself a subspace, denoted by M⊥ and called the orthogonal complement of M.

A set of vectors {v_α : α ∈ A} is said to generate a linear subspace M if M is the smallest subspace containing those vectors. It is relatively straightforward to show that this subspace consists of all linear combinations of the v_α together with the limits of all Cauchy sequences formed from these linear combinations. We will use the notation sp{v_α : α ∈ A} to indicate the subspace generated by the vectors {v_α : α ∈ A}.

    2.3.2 Projections

If M is a subspace of a Hilbert space H and v is a vector not in M, then the distance from M to v is defined to be min_{u∈M} ‖v - u‖. A crucial result in Hilbert space theory tells us how this minimum is attained.

The Projection Theorem. There is a unique vector P_M v ∈ M such that

    ‖v - P_M v‖ = min_{u∈M} ‖v - u‖.

The vector P_M v satisfies P_M v ∈ M and v - P_M v ⊥ M. It is called the orthogonal projection of v on M.

When M is generated by a set of elements {u_α : α ∈ A}, the condition v - P_M v ⊥ M is equivalent to the condition (v - P_M v) ⊥ u_α for every α ∈ A. In other words, ⟨v - P_M v, u_α⟩ = 0 for every α ∈ A. This produces the equations

    ⟨P_M v, u_α⟩ = ⟨v, u_α⟩,    α ∈ A,

which, together with the requirement P_M v ∈ M, completely determine P_M v.

When a subspace of a Hilbert space is generated by a countable set of orthogonal vectors {e_n}, the orthogonal projection has the form

    P_M v = Σ_n ( ⟨v, e_n⟩ / ‖e_n‖² ) e_n.

In the case of a finite set {e_1, ..., e_n} this is easy to see. The candidate projection is

    ( ⟨v, e_1⟩ / ‖e_1‖² ) e_1 + ... + ( ⟨v, e_n⟩ / ‖e_n‖² ) e_n,

and the orthogonality of {e_1, ..., e_n} means that

    ⟨ ( ⟨v, e_1⟩ / ‖e_1‖² ) e_1 + ... + ( ⟨v, e_n⟩ / ‖e_n‖² ) e_n, e_i ⟩ = ( ⟨v, e_i⟩ / ‖e_i‖² ) ⟨e_i, e_i⟩ = ⟨v, e_i⟩,

so the conditions above are verified. The general (countable) case follows from the continuity of the norm and inner product.

    2.4 Hilbert Spaces and Prediction

Consider the Hilbert space H consisting of all finite-variance random variables on some probability space, with inner product defined by

    ⟨X, Y⟩ = E(XY).

We will now look at the problem of predicting a variable Y using zero, one or more predicting random variables X_1, ..., X_n.

    2.4.1 Linear Prediction

The problem can be stated as follows. Given a response random variable Y and predictor random variables X_1, ..., X_n, what is the best way of predicting Y using a linear function of the Xs? This amounts to finding the coefficients which minimise the mean squared error

    E|Y - β_0 - β_1 X_1 - ... - β_n X_n|²,

or, in a Hilbert space setting,

    ‖Y - β_0 - β_1 X_1 - ... - β_n X_n‖².

The variables {1, X_1, ..., X_n} generate a subspace M of H, and the minimisation problem above amounts to finding the projection P_M Y. We'll approach this problem in steps, beginning with n = 0.

The subspace C generated by the constant random variable 1 consists of all constant random variables. Using the result above, the projection of a random variable Y with mean μ_Y and variance σ_Y² onto this subspace is

    P_C Y = ( ⟨Y, 1⟩ / ‖1‖² ) 1 = EY = μ_Y.

This tells us immediately that the value of c which minimises E[Y - c]² is μ_Y.

Now consider the subspace L generated by 1 and a single random variable X with mean μ_X and variance σ_X². This is clearly the same as the subspace generated by 1 and X - EX. Since 1 and X - EX are orthogonal, we can use the projection result of the previous section to compute the projection of Y onto L:

    P_L Y = ( ⟨Y, 1⟩ / ‖1‖² ) 1 + ( ⟨Y, X - EX⟩ / ‖X - EX‖² ) (X - EX)
          = EY + ( ⟨Y - EY, X - EX⟩ / ‖X - EX‖² ) (X - EX) + ( ⟨EY, X - EX⟩ / ‖X - EX‖² ) (X - EX)
          = EY + ( ⟨Y - EY, X - EX⟩ / ‖X - EX‖² ) (X - EX),


because ⟨EY, X - EX⟩ = 0.

Now ⟨Y - EY, X - EX⟩ = cov(Y, X), which is in turn equal to σ_Y σ_X ρ_{YX}, where ρ_{YX} is the correlation between Y and X. This means that

    P_L Y = μ_Y + ρ_{YX} (σ_Y / σ_X) (X - μ_X).

P_L Y is the best possible linear prediction of Y based on X, because among all predictions of the form β_0 + β_1 X it is the one which minimises

    E(Y - β_0 - β_1 X)².

The general case of n predictors proceeds in exactly the same way, but is more complicated because we must use progressive orthogonalisation of the set of variables {1, X_1, ..., X_n}. The final result is that the best predictor of Y is

    P_L Y = μ_Y + Σ_{YX} Σ_{XX}⁻¹ (X - μ_X),

where X represents the variables X_1, ..., X_n assembled into a vector, μ_X is the vector made up of the means of the Xs, Σ_{YX} is the vector of covariances between Y and each of the Xs, and Σ_{XX} is the variance-covariance matrix of the Xs.
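As a small numerical check of this formula (a sketch only; the simulated data and names below are illustrative), the coefficient vector Σ_{XX}⁻¹ Σ_{XY} can be computed from sample covariances and compared with an ordinary least squares fit:

set.seed(1)
n <- 1000
X <- cbind(rnorm(n), rnorm(n))                  # two predictors
Y <- 2 + 0.5 * X[, 1] - 0.3 * X[, 2] + rnorm(n)
beta <- solve(cov(X), cov(X, Y))                # Sigma_XX^{-1} Sigma_XY
Yhat <- mean(Y) + sweep(X, 2, colMeans(X)) %*% beta
max(abs(Yhat - fitted(lm(Y ~ X))))              # agreement up to rounding error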

    2.4.2 General Prediction

It is clear that linear prediction theory can be developed using Hilbert space theory. What is a little less clear is that Hilbert space theory also yields a general non-linear prediction theory.

Linear prediction uses only linear information about the predictors, and there is much more information available. In the one variable case, the additional information can be obtained by considering all possible (Borel) functions φ(X). These are still just random variables and so generate a subspace M of H. The projection onto this subspace gives the best possible predictor of Y based on X_1, ..., X_n.

With some technical mathematics it is possible to show that this projection is in fact just the conditional expectation E[Y | X]. More generally, it is possible to show that the best predictor of Y based on X_1, ..., X_n is E[Y | X_1, ..., X_n].

Although this is theoretically simple, it requires very strong assumptions about the distribution of Y and X_1, ..., X_n to actually compute an explicit value for the prediction. The simplest assumption to make is that Y and X_1, ..., X_n have a joint normal distribution. In this case, the general predictor and the linear one are identical.

Because of this it is almost always the case that linear prediction is preferred to general prediction.


Chapter 3

    Time Series Theory

    3.1 Time Series

We will assume that the time series values we observe are the realisations of random variables Y_1, ..., Y_T, which are in turn part of a larger stochastic process {Y_t : t ∈ Z}. It is this underlying process that will be the focus of our theoretical development.

Although it is best to distinguish the observed time series from the underlying stochastic process, the distinction is usually blurred and the term time series is used to refer to both the observations and the underlying process which generates them.

The mean and the variance of random variables have a special place in the theory of statistics. In time series analysis, the analogues of these are the mean function and the autocovariance function.

Definition 3.1.1 (Mean and Autocovariance Functions): The mean function of a time series is defined to be μ(t) = E Y_t and the autocovariance function is defined to be γ(s, t) = cov(Y_s, Y_t).

The mean and the autocovariance functions are fundamental parameters and it would be useful to obtain sample estimates of them. For general time series there are 2T + T(T - 1)/2 parameters associated with Y_1, ..., Y_T and it is not possible to estimate all these parameters from T data values.

To make any progress at all we must impose constraints on the time series we are investigating. The most common constraint is that of stationarity. There are two common definitions of stationarity.

Definition 3.1.2 (Strict Stationarity): A time series {Y_t : t ∈ Z} is said to be strictly stationary if for any k > 0 and any t_1, ..., t_k ∈ Z, the distribution of

    (Y_{t_1}, ..., Y_{t_k})

is the same as that of

    (Y_{t_1+u}, ..., Y_{t_k+u})

for every value of u.


This definition says that the stochastic behaviour of the process does not change through time. If Y_t is stationary then

    μ(t) = μ(0)

and

    γ(s, t) = γ(s - t, 0).

So for stationary series, the mean function is constant and the autocovariance function depends only on the time lag between the two values for which the covariance is being computed.

These two restrictions on the mean and covariance functions are enough for a reasonable amount of theory to be developed. Because of this, a less restrictive definition of stationarity is often used in place of strict stationarity.

Definition 3.1.3 (Weak Stationarity): A time series is said to be weakly, wide-sense or covariance stationary if E|Y_t|² < ∞, the mean function μ(t) is constant, and γ(t + u, t) does not depend on t.


[...] subspace M. For example, Y_{t+1} can be predicted by P_M Y_{t+1}, Y_{t+2} by P_M Y_{t+2}, and so on.

Computing these predictions requires a knowledge of the autocovariance function of the time series, and typically this is not known. We will spend a good deal of this chapter studying simple parametric models for time series. By fitting such models we will be able to determine the covariance structure of the time series and so be able to obtain predictions or forecasts of future values.

    3.3 The Lag and Differencing Operators

The lag operator L is defined for a time series {Y_t} by

    L Y_t = Y_{t-1}.

The operator can be defined for linear combinations by

    L(c_1 Y_{t_1} + c_2 Y_{t_2}) = c_1 Y_{t_1 - 1} + c_2 Y_{t_2 - 1}

and can be extended to all of H by a suitable definition for limits.

In addition to being linear, the lag operator preserves inner products:

    ⟨L Y_s, L Y_t⟩ = cov(Y_{s-1}, Y_{t-1}) = cov(Y_s, Y_t) = ⟨Y_s, Y_t⟩.

(An operator of this type is called a unitary operator.)

There is a natural calculus of operators on H. For example, we can define powers of L naturally by

    L² Y_t = L L Y_t = L Y_{t-1} = Y_{t-2}
    L³ Y_t = L L² Y_t = Y_{t-3}
    ...
    L^k Y_t = Y_{t-k}

and linear combinations by

    (L^k + L^l) Y_t = Y_{t-k} + Y_{t-l}.

Other operators can be defined in terms of L. The differencing operator ∇ defined by

    ∇Y_t = (1 - L) Y_t = Y_t - Y_{t-1}

is of fundamental importance when dealing with models for non-stationary time series. Again, we can define powers of this operator:

    ∇² Y_t = ∇(∇Y_t)
           = ∇(Y_t - Y_{t-1})
           = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})
           = Y_t - 2Y_{t-1} + Y_{t-2}.

We will not dwell on the rich Hilbert space theory associated with time series, but it is important to know that many of the operator manipulations which we will carry out can be placed on a rigorous footing.
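In R the lag and differencing operators are available directly; a brief sketch using standard stats functions:

y <- ts(cumsum(rnorm(20)))   # a non-stationary (integrated) series
lag(y, -1)                   # L y: at time t this shifted series takes the value y_{t-1}
diff(y)                      # (1 - L) y_t = y_t - y_{t-1}
diff(y, differences = 2)     # (1 - L)^2 y_t = y_t - 2 y_{t-1} + y_{t-2}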


    3.4 Linear Processes

We will now turn to an examination of a large class of useful time series models. These are almost all defined in terms of the lag operator L. As the simplest example, consider the autoregressive model defined by

    Y_t = φ Y_{t-1} + ε_t,    (3.1)

where φ is a constant with |φ| < 1 and ε_t is a sequence of uncorrelated random variables, each with mean 0 and variance σ². From a statistical point of view this model makes perfect sense, but it is not clear that there is any Y_t which satisfies this equation.

One way to see that there is a solution is to rearrange equation 3.1 and write it in its operator form

    (1 - φL) Y_t = ε_t.

Formally inverting the operator (1 - φL) leads to

    Y_t = (1 - φL)⁻¹ ε_t
        = Σ_{u=0}^{∞} φ^u L^u ε_t
        = Σ_{u=0}^{∞} φ^u ε_{t-u}.

The series on the right is defined as the limit as n → ∞ of

    Σ_{u=0}^{n} φ^u ε_{t-u}.

Loosely speaking, this limit exists if

    ‖ Σ_{u=n+1}^{∞} φ^u ε_{t-u} ‖² → 0.

Since

    ‖ Σ_{u=n+1}^{∞} φ^u ε_{t-u} ‖² = var( Σ_{u=n+1}^{∞} φ^u ε_{t-u} ) = Σ_{u=n+1}^{∞} |φ|^{2u} σ²

and |φ| < 1, there is indeed a well-defined solution of 3.1. Further, this solution can be written as an (infinite) moving average of current and earlier ε_t values.

This type of infinite moving average plays a special role in the theory of time series.

Definition 3.4.1 (Linear Processes) The time series Y_t defined by

    Y_t = Σ_{u=-∞}^{∞} ψ_u ε_{t-u},


where ε_t is a white-noise series and Σ_{u=-∞}^{∞} |ψ_u|² < ∞, is called a linear process.

3.5 Autoregressive Series

3.5.1 The AR(1) Series

The AR(1) series is defined by

    Y_t = φ Y_{t-1} + ε_t.    (3.2)

If the series is stationary, taking variances of both sides of 3.2 gives

    σ_Y² = φ² σ_Y² + σ²,    (3.3)

so that σ² = (1 - φ²) σ_Y². Since both variances are positive, σ² ≤ σ_Y² and hence 1 > φ².


There is an alternative view of this, using the operator formulation of equation 3.2, namely

    (1 - φL) Y_t = ε_t.

It is possible to formally invert the autoregressive operator to obtain

    (1 - φL)⁻¹ = Σ_{u=0}^{∞} φ^u L^u.

Applying this to the series {ε_t} produces the representation

    Y_t = Σ_{u=0}^{∞} φ^u ε_{t-u}.    (3.4)

If |φ| < 1 this series converges in mean square because Σ |φ|^{2u} < ∞. For |φ| > 1 the series defined by 3.4 does not converge. We can, however, rearrange the defining equation 3.2 in the form

    Y_t = (1/φ) Y_{t+1} - (1/φ) ε_{t+1},

and a similar argument to that above produces the representation

    Y_t = - Σ_{u=1}^{∞} φ^{-u} ε_{t+u}.

This series does converge, so there is a stationary series Y_t which satisfies 3.2. The resulting series is generally not regarded as satisfactory for modelling and forecasting because it is not causal, i.e. it depends on future values of ε_t.

If |φ| = 1 there is no stationary solution to 3.2. This means that for the purposes of modelling and forecasting stationary time series, we must restrict our attention to series for which |φ| < 1 or, equivalently, to series for which the root of the polynomial 1 - φz lies outside the unit circle in the complex plane.

If we multiply both sides of equation 3.2 by Y_{t-u} and take expectations we obtain

    E(Y_t Y_{t-u}) = φ E(Y_{t-1} Y_{t-u}) + E(ε_t Y_{t-u}).

The last term on the right is zero because, from the linear process representation, ε_t is independent of earlier Y_t values. This means that the autocovariances must satisfy the recursion

    γ(u) = φ γ(u - 1),    u = 1, 2, 3, ...

This is a first-order linear difference equation with solution

    γ(u) = φ^u γ(0),    u = 0, 1, 2, ...

[Figure 3.1: Autocorrelation functions for two AR(1) models, φ_1 = 0.7 and φ_1 = -0.7.]

By rearranging equation 3.3 we find γ(0) = σ²/(1 - φ²), and hence that

    γ(u) = φ^u σ² / (1 - φ²),    u = 0, 1, 2, ...

This in turn means that the autocorrelation function is given by

    ρ(u) = φ^u,    u = 0, 1, 2, ...

The autocorrelation functions for the AR(1) series with φ_1 = 0.7 and φ_1 = -0.7 are shown in figure 3.1. Both functions show exponential decay.
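The formula ρ(u) = φ^u can be checked numerically with the ARMAacf function (a quick sketch; 0.7 is just an example value of φ):

phi <- 0.7
ARMAacf(ar = phi, lag.max = 5)   # acf of the AR(1) model at lags 0 to 5
phi^(0:5)                        # rho(u) = phi^u gives the same values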

    3.5.2 The AR(2) Series

The AR(2) model is defined by

    Y_t = φ_1 Y_{t-1} + φ_2 Y_{t-2} + ε_t    (3.5)

or, in operator form,

    (1 - φ_1 L - φ_2 L²) Y_t = ε_t.

As in the AR(1) case we can consider inverting the AR operator. To see whether this is possible, we can consider factorising the operator

    1 - φ_1 L - φ_2 L²

and inverting each factor separately. Suppose that

    1 - φ_1 L - φ_2 L² = (1 - c_1 L)(1 - c_2 L),


then it is clear that we can invert the operator if we can invert each factor separately. This is possible if |c_1| < 1 and |c_2| < 1 or, equivalently, if the roots of the polynomial

    1 - φ_1 z - φ_2 z²

lie outside the unit circle. A little algebraic manipulation shows that this is equivalent to the conditions

    φ_1 + φ_2 < 1,    -φ_1 + φ_2 < 1,    φ_2 > -1.

These constraints define a triangular region in the φ_1, φ_2 plane. The region is shown as the shaded triangle in figure 3.3.

The autocovariance function for the AR(2) series can be investigated by multiplying both sides of equation 3.5 by Y_{t-u} and taking expectations:

    E(Y_t Y_{t-u}) = φ_1 E(Y_{t-1} Y_{t-u}) + φ_2 E(Y_{t-2} Y_{t-u}) + E(ε_t Y_{t-u}).

This in turn leads to the recurrence

    γ(u) = φ_1 γ(u - 1) + φ_2 γ(u - 2)

with initial conditions

    γ(0) = φ_1 γ(-1) + φ_2 γ(-2) + σ²
    γ(1) = φ_1 γ(0) + φ_2 γ(-1)

or, using the fact that γ(-u) = γ(u),

    γ(0) = φ_1 γ(1) + φ_2 γ(2) + σ²
    γ(1) = φ_1 γ(0) + φ_2 γ(1).

The solution to these equations has the form

    γ(u) = A_1 G_1^u + A_2 G_2^u,

where G_1⁻¹ and G_2⁻¹ are the roots of the polynomial

    1 - φ_1 z - φ_2 z²    (3.6)

and A_1 and A_2 are constants that can be determined from the initial conditions. In the case that the roots are equal, the solution has the general form

    γ(u) = (A_1 + A_2 u) G^u.

These equations indicate that the autocovariance function for the AR(2) series will exhibit (exponential) decay as u → ∞.

If G_k corresponds to a complex root, then

    G_k = |G_k| e^{iθ_k}

and hence

    G_k^u = |G_k|^u e^{iθ_k u} = |G_k|^u (cos θ_k u + i sin θ_k u).

Complex roots will thus introduce a pattern of decaying sinusoidal variation into the covariance function (or autocorrelation function). The region of the φ_1, φ_2 plane corresponding to complex roots is indicated by the cross-hatched region in figure 3.3.

[Figure 3.2: Autocorrelation functions for a variety of AR(2) models.]

[Figure 3.3: The regions of φ_1/φ_2 space where the series produced by the AR(2) scheme is stationary (indicated in grey) and has complex roots (indicated by cross-hatching).]

    The AR(p) Series

The AR(p) series is defined by

    Y_t = φ_1 Y_{t-1} + ... + φ_p Y_{t-p} + ε_t.    (3.7)

If the roots of

    1 - φ_1 z - ... - φ_p z^p    (3.8)

lie outside the unit circle, there is a stationary causal solution to 3.7.

The autocovariance function can be investigated by multiplying equation 3.7 by Y_{t-u} and taking expectations. This yields

    γ(u) = φ_1 γ(u - 1) + ... + φ_p γ(u - p).

This is a linear homogeneous difference equation and has the general solution

    γ(u) = A_1 G_1^u + ... + A_p G_p^u

(this is for distinct roots), where G_1, ..., G_p are the reciprocals of the roots of equation 3.8. Note that the stationarity condition means that γ(u) → 0, exhibiting exponential decay. As in the AR(2) case, complex roots will introduce oscillatory behaviour into the autocovariance function.

    3.5.3 Computations

R contains a good deal of time series functionality. On older versions of R (those prior to 1.7.0) you will need to type the command

> library(ts)

to ensure that this functionality is loaded.

The function polyroot can be used to find the roots of polynomials and so determine whether a proposed model is stationary. Consider the model

    Y_t = 2.5 Y_{t-1} - Y_{t-2} + ε_t

or its equivalent operator form

    (1 - 2.5L + L²) Y_t = ε_t.

We can compute the magnitudes of the roots of the polynomial 1 - 2.5z + z² with polyroot.

> Mod(polyroot(c(1,-2.5,1)))
[1] 0.5 2.0

The roots have magnitudes 0.5 and 2. Because the first of these is less than 1 in magnitude, the model is not stationary and causal.

For the model

    Y_t = 1.5 Y_{t-1} - 0.75 Y_{t-2} + ε_t

or its operator equivalent

    (1 - 1.5L + 0.75L²) Y_t = ε_t

we can check stationarity by examining the magnitudes of the roots of 1 - 1.5z + 0.75z².

> Mod(polyroot(c(1,-1.5,.75)))
[1] 1.154701 1.154701

Both roots are bigger than 1 in magnitude, so the series is stationary. We can obtain the roots themselves as follows.

> polyroot(c(1,-1.5,.75))
[1] 1+0.5773503i 1-0.5773503i

Because the roots are complex we can expect to see a cosine-like ripple in the autocovariance function and the autocorrelation function.

The autocorrelation function for a given model can be computed using the ARMAacf function. The acf for the model above can be computed and plotted as follows.

> plot(0:14, ARMAacf(ar=c(1.5,-.75), lag=14), type="h",
       xlab = "Lag", ylab = "ACF")
> abline(h = 0)

The result is shown in figure 3.4.

Finally, it may be useful to simulate a time series of a given form. We can create a time series from the model Y_t = 1.5 Y_{t-1} - 0.75 Y_{t-2} + ε_t and plot it with the following statements.

> x = arima.sim(model = list(ar=c(1.5,-.75)), n = 100)
> plot(x)

The result is shown in figure 3.5. Note that there is evidence that the series contains a quasi-periodic component with period about 12, as suggested by the autocorrelation function.

[Figure 3.4: The acf for the model Y_t = 1.5 Y_{t-1} - 0.75 Y_{t-2} + ε_t.]

[Figure 3.5: Simulation of the model Y_t = 1.5 Y_{t-1} - 0.75 Y_{t-2} + ε_t.]


    3.6 Moving Average Series

A time series {Y_t} which satisfies

    Y_t = ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}    (3.9)

(with {ε_t} white noise) is said to be a moving average process of order q, or MA(q) process. No additional conditions are required to ensure stationarity.

The autocovariance function for the MA(q) process is

    γ(u) = (1 + θ_1² + ... + θ_q²) σ²,                   u = 0,
           (θ_u + θ_1 θ_{u+1} + ... + θ_{q-u} θ_q) σ²,   u = 1, ..., q,
           0,                                            otherwise,

which says there is only a finite span of dependence on the series.

Note that it is easy to distinguish MA and AR series by the behaviour of their autocorrelation functions. The acf for an MA series cuts off sharply while that for an AR series decays exponentially (with a possible sinusoidal ripple superimposed).

    3.6.1 The MA(1) Series

The MA(1) series is defined by

    Y_t = ε_t + θ ε_{t-1}.    (3.10)

It has autocovariance function

    γ(u) = (1 + θ²) σ²,   u = 0,
           θ σ²,          u = 1,
           0,             otherwise,

and autocorrelation function

    ρ(u) = θ / (1 + θ²)   for u = 1,
           0              otherwise.    (3.11)
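A quick numerical check of 3.11 (a sketch; 0.9 is just an example value of θ):

theta <- 0.9
ARMAacf(ma = theta, lag.max = 3)   # acf at lags 0 to 3
theta / (1 + theta^2)              # theoretical rho(1)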

    3.6.2 Invertibility

If we replace θ by 1/θ and σ² by θ²σ², the autocorrelation function given by 3.11 is unchanged. There are thus two sets of parameter values which can explain the structure of the series.

For the general process defined by equation 3.9, there is a similar identifiability problem. The problem can be resolved by requiring that the operator

    1 + θ_1 L + ... + θ_q L^q

be invertible, i.e. that all roots of the polynomial

    1 + θ_1 z + ... + θ_q z^q

lie outside the unit circle.


[Figure 3.6: The acf for the model Y_t = ε_t + 0.9 ε_{t-1}.]

    3.6.3 Computation

The function polyroot can be used to check invertibility for MA models. Remember that the invertibility requirement is only there so that each MA model is defined by a single set of parameters.

The function ARMAacf can be used to compute the acf for MA series. For example, the acf of the model

    Y_t = ε_t + 0.9 ε_{t-1}

can be computed and plotted as follows.

> plot(0:14, ARMAacf(ma=.9, lag=14), type="h",
       xlab = "Lag", ylab = "ACF")
> abline(h = 0)

The result is shown in figure 3.6.

A simulation of the series can be computed and plotted as follows.

> x = arima.sim(model = list(ma=.9), n = 100)
> plot(x)

The result of the simulation is shown in figure 3.7.

    3.7 Autoregressive Moving Average Series

Definition 3.7.1 If a series satisfies

    Y_t = φ_1 Y_{t-1} + ... + φ_p Y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}    (3.12)


[Figure 3.7: Simulation of the model Y_t = ε_t + 0.9 ε_{t-1}.]

(with {ε_t} white noise), it is called an autoregressive moving average series of order (p, q), or an ARMA(p, q) series.

An ARMA(p, q) series is stationary if the roots of the polynomial

    1 - φ_1 z - ... - φ_p z^p

lie outside the unit circle.

    3.7.1 The ARMA(1,1) Series

The ARMA(1,1) series is defined by

    Y_t = φ Y_{t-1} + ε_t + θ ε_{t-1}.    (3.13)

To derive the autocovariance function for Y_t, note that

    E(ε_t Y_t) = E[ε_t (φ Y_{t-1} + ε_t + θ ε_{t-1})] = σ²

and

    E(ε_{t-1} Y_t) = E[ε_{t-1} (φ Y_{t-1} + ε_t + θ ε_{t-1})]
                   = φ σ² + θ σ²
                   = (φ + θ) σ².


Multiplying equation 3.13 by Y_{t-u} and taking expectations yields:

    γ(0) = φ γ(1) + (1 + θ(φ + θ)) σ²    for u = 0,
    γ(1) = φ γ(0) + θ σ²                 for u = 1,
    γ(u) = φ γ(u - 1)                    for u ≥ 2.

Solving the first two equations produces

    γ(0) = ( (1 + 2θφ + θ²) / (1 - φ²) ) σ²

and using the last equation recursively shows

    γ(u) = ( (1 + θφ)(φ + θ) / (1 - φ²) ) φ^{u-1} σ²    for u ≥ 1.

The autocorrelation function can then be computed as

    ρ(u) = ( (1 + θφ)(φ + θ) / (1 + 2θφ + θ²) ) φ^{u-1}    for u ≥ 1.

The pattern here is similar to that for the AR(1), except for the first term.

    3.7.2 The ARMA(p,q) Model

It is possible to make general statements about the behaviour of the general ARMA(p, q) series. When values are more than q time units apart, the memory of the moving-average part of the series is lost. The functions γ(u) and ρ(u) will then behave very similarly to those for the AR(p) series

    Y_t = φ_1 Y_{t-1} + ... + φ_p Y_{t-p} + ε_t

for large u, but the first few terms will exhibit additional structure.

    3.7.3 Computation

Stationarity can be checked by examining the roots of the characteristic polynomial of the AR operator, and the model parameterisation can be checked by examining the roots of the characteristic polynomial of the MA operator. Both checks can be carried out with polyroot.

The autocorrelation function for an ARMA series can be computed with ARMAacf. For the model

    Y_t = -0.5 Y_{t-1} + ε_t + 0.3 ε_{t-1}

this can be done as follows

> plot(0:14, ARMAacf(ar=-.5, ma=.3, lag=14), type="h",
       xlab = "Lag", ylab = "ACF")
> abline(h = 0)

and produces the result shown in figure 3.8.

Simulation can be carried out using arima.sim

> x = arima.sim(model = list(ar=-.5,ma=.3), n = 100)
> plot(x)

producing the result in figure 3.9.


[Figure 3.8: The acf for the model Y_t = -0.5 Y_{t-1} + ε_t + 0.3 ε_{t-1}.]

[Figure 3.9: Simulation of the model Y_t = -0.5 Y_{t-1} + ε_t + 0.3 ε_{t-1}.]


    3.7.4 Common Factors

If the AR and MA operators in an ARMA(p, q) model possess common factors, then the model is over-parameterised. By dividing through by the common factors we can obtain a simpler model giving an identical description of the series. It is important to recognise common factors in an ARMA model because they will produce numerical problems in model fitting.

    3.8 The Partial Autocorrelation Function

The autocorrelation function of an MA series exhibits different behaviour from that of AR and general ARMA series. The acf of an MA series cuts off sharply whereas those for AR and ARMA series exhibit exponential decay (with possible sinusoidal behaviour superimposed). This makes it possible to identify an ARMA series as being a purely MA one just by plotting its autocorrelation function. The partial autocorrelation function provides a similar way of identifying a series as a purely AR one.

Given a stretch of time series values

    ..., Y_{t-u}, Y_{t-u+1}, ..., Y_{t-1}, Y_t, ...

the partial correlation of Y_t and Y_{t-u} is the correlation between these random variables which is not conveyed through the intervening values.

If the Y values are normally distributed, the partial autocorrelation between Y_t and Y_{t-u} can be defined as

    φ(u) = cor(Y_t, Y_{t-u} | Y_{t-1}, ..., Y_{t-u+1}).

A more general approach is based on regression theory. Consider predicting Y_t based on Y_{t-1}, ..., Y_{t-u+1}. The prediction is

    Ŷ_t = β_1 Y_{t-1} + β_2 Y_{t-2} + ... + β_{u-1} Y_{t-u+1}

with the βs chosen to minimise

    E(Y_t - Ŷ_t)².

It is also possible to think backwards in time and consider predicting Y_{t-u} with the same set of predictors. The best predictor will be

    Ŷ_{t-u} = β_1 Y_{t-u+1} + β_2 Y_{t-u+2} + ... + β_{u-1} Y_{t-1}.

(The coefficients are the same because the correlation structure is the same whether the series is run forwards or backwards in time.)

The partial correlation function at lag u is the correlation between the two prediction errors,

    φ(u) = cor(Y_t - Ŷ_t, Y_{t-u} - Ŷ_{t-u}).

By convention we take φ(1) = ρ(1).


It is quite straightforward to compute the value of φ(2). Using the results of section 2.4.1, the best predictor of Y_t based on Y_{t-1} is just ρ(1) Y_{t-1}. Thus

    cov(Y_t - ρ(1) Y_{t-1}, Y_{t-2} - ρ(1) Y_{t-1}) = σ_Y² (ρ(2) - ρ(1)² - ρ(1)² + ρ(1)²)
                                                    = σ_Y² (ρ(2) - ρ(1)²)

and

    var(Y_t - ρ(1) Y_{t-1}) = σ_Y² (1 + ρ(1)² - 2ρ(1)²)
                            = σ_Y² (1 - ρ(1)²).

This means that

    φ(2) = (ρ(2) - ρ(1)²) / (1 - ρ(1)²).    (3.14)

Example 3.8.1 For the AR(1) series, recall that

    ρ(u) = φ^u    (u ≥ 0).

Substituting this into equation 3.14 we find

    φ(2) = (φ² - φ²) / (1 - φ²) = 0.

Example 3.8.2 For the MA(1) series,

    ρ(u) = θ / (1 + θ²)   if u = 1,
           0              otherwise.

Substituting this into 3.14 we find

    φ(2) = (0 - (θ/(1 + θ²))²) / (1 - (θ/(1 + θ²))²)
         = -θ² / ((1 + θ²)² - θ²)
         = -θ² / (1 + θ² + θ⁴).

More generally it is possible to show that

    φ(u) = -(-θ)^u (1 - θ²) / (1 - θ^{2(u+1)})    for u ≥ 1.
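This formula can be checked against the values that ARMAacf returns when its pacf argument is set (a sketch; θ = 0.9 is just an example value):

theta <- 0.9
u <- 1:5
-(-theta)^u * (1 - theta^2) / (1 - theta^(2 * (u + 1)))   # theoretical pacf values
ARMAacf(ma = theta, lag.max = 5, pacf = TRUE)             # values computed by R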

For the general AR(p) series, it is possible to show that φ(u) = 0 for all u > p. For such a series, the best predictor of Y_t using Y_{t-1}, ..., Y_{t-u+1} for u > p is

    φ_1 Y_{t-1} + ... + φ_p Y_{t-p},

because

    Y_t - (φ_1 Y_{t-1} + ... + φ_p Y_{t-p}) = ε_t

and ε_t is uncorrelated with Y_{t-1}, Y_{t-2}, ..., so that the fit cannot be improved. The prediction error corresponding to the best linear predictor of Y_{t-u} is based on Y_{t-1}, ..., Y_{t-u+1} and so must be uncorrelated with ε_t. This shows that φ(u) = 0.

For the general MA(q) series, it is possible to show that φ(u) decays exponentially as u → ∞.

    3.8.1 Computing the PACF

The definition of the partial autocorrelation function given in the previous section is conceptually simple, but it makes computations hard. In this section we'll see that there is an equivalent form which is computationally simple.

Consider the kth order autoregressive prediction of Y_{k+1},

    Ŷ_{k+1} = φ_{k1} Y_k + ... + φ_{kk} Y_1,    (3.15)

obtained by minimising E(Y_{k+1} - Ŷ_{k+1})². We will show that the kth partial autocorrelation value is given by φ(k) = φ_{kk}. The proof of this is a geometric one which takes place in the space H generated by the series {Y_t}.

We begin by defining the subspace H_1 = sp{Y_2, ..., Y_k} and associated projection P_{H_1}, the subspace H_2 = sp{Y_1 - P_{H_1} Y_1}, and the subspace H_k = sp{Y_1, ..., Y_k}. For any Y ∈ H,

    P_{H_k} Y = P_{H_1} Y + P_{H_2} Y.

Thus

    Ŷ_{k+1} = P_{H_k} Y_{k+1}
            = P_{H_1} Y_{k+1} + P_{H_2} Y_{k+1}
            = P_{H_1} Y_{k+1} + a (Y_1 - P_{H_1} Y_1),

where

    a = ⟨Y_{k+1}, Y_1 - P_{H_1} Y_1⟩ / ‖Y_1 - P_{H_1} Y_1‖².    (3.16)

Rearranging, we find

    Ŷ_{k+1} = P_{H_1}(Y_{k+1} - a Y_1) + a Y_1.

The first term on the right must be a linear combination of Y_2, ..., Y_k, so comparing with equation 3.15 we see that a = φ_{kk}.

Now, the kth partial correlation is defined as the correlation between the residuals from the regressions of Y_{k+1} and Y_1 on Y_2, ..., Y_k. But this is just

    cor(Y_{k+1} - P_{H_1} Y_{k+1}, Y_1 - P_{H_1} Y_1)
        = ⟨Y_{k+1} - P_{H_1} Y_{k+1}, Y_1 - P_{H_1} Y_1⟩ / ‖Y_1 - P_{H_1} Y_1‖²
        = ⟨Y_{k+1}, Y_1 - P_{H_1} Y_1⟩ / ‖Y_1 - P_{H_1} Y_1‖²
        = a

by equation 3.16.

A recursive way of computing the regression coefficients in equation 3.15 from the autocorrelation function was given by Levinson (1947) and Durbin (1960). The Durbin-Levinson algorithm updates the coefficients from the (k-1)st order model to those of the kth order model as follows:

    φ_{kk} = ( ρ(k) - Σ_{j=1}^{k-1} φ_{k-1,j} ρ(k - j) ) / ( 1 - Σ_{j=1}^{k-1} φ_{k-1,j} ρ(j) )

    φ_{k,j} = φ_{k-1,j} - φ_{kk} φ_{k-1,k-j},    j = 1, 2, ..., k - 1.
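The recursion is straightforward to program directly. The following sketch (the function name durbin.levinson is ours, not part of R) computes the pacf from a vector of autocorrelations ρ(1), ..., ρ(K) and checks the result against ARMAacf.

durbin.levinson <- function(rho) {
    K <- length(rho)
    pacf <- numeric(K)
    phi <- numeric(0)        # coefficients of the current AR fit
    for (k in 1:K) {
        if (k == 1) {
            phikk <- rho[1]
            phi <- phikk
        } else {
            phikk <- (rho[k] - sum(phi * rho[(k - 1):1])) /
                     (1 - sum(phi * rho[1:(k - 1)]))
            phi <- c(phi - phikk * rev(phi), phikk)
        }
        pacf[k] <- phikk
    }
    pacf
}

rho <- ARMAacf(ar = c(1.5, -.75), lag.max = 5)[-1]     # acf at lags 1 to 5
durbin.levinson(rho)
ARMAacf(ar = c(1.5, -.75), lag.max = 5, pacf = TRUE)   # same values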

    3.8.2 Computation

The R function ARMAacf can be used to obtain the partial autocorrelation function associated with a stationary ARMA series. The call to ARMAacf is identical to its use for obtaining the ordinary autocorrelation function, except that it has the additional argument pacf=TRUE.

The following code computes and plots the partial autocorrelation function for the ARMA(1,1) model with φ = -0.5 and θ = 0.3.

> plot(1:14, ARMAacf(ar=-.5, ma=.3, lag=14, pacf=TRUE),
       type="h", xlab = "Lag", ylab = "ACF")
> abline(h = 0)

The resulting plot is shown in figure 3.10.


[Figure 3.10: The partial acf for the model Y_t = -0.5 Y_{t-1} + ε_t + 0.3 ε_{t-1}.]

Chapter 4

Identifying Time Series Models

    4.1 ACF Estimation

We have seen that it is possible to distinguish between AR, MA and ARMA models by the behaviour of their acf and pacf functions. In practice, we don't know these functions and so we must estimate them.

Given a stretch of data Y_1, ..., Y_T, the usual estimate of the autocovariance function is

    γ̂(u) = (1/T) Σ_{t=1}^{T-u} (Y_{t+u} - Ȳ)(Y_t - Ȳ).

Note that this estimator is biased; an unbiased estimator would have a divisor of T - u in place of T. There are two reasons for using this estimator.

The first of these reasons is that it produces a γ̂(u) which is positive definite. This means that for any constants c_1, ..., c_k,

    Σ_{u=1}^{k} Σ_{v=1}^{k} c_u c_v γ̂(u - v) ≥ 0.

This ensures that our estimate of the variance of

    Σ_{u=1}^{k} c_u X_{t-u}

will be non-negative, something which might not be the case for the unbiased estimate.

The second reason is that for many time series γ(u) → 0 as u → ∞. For such time series, the biased estimate can have lower mean-squared error.

The estimate of ρ(u) based on γ̂(u) is

    r(u) = γ̂(u) / γ̂(0) = Σ_t (Y_{t+u} - Ȳ)(Y_t - Ȳ) / Σ_t (Y_t - Ȳ)².

(Again, this can have better mean-squared error properties than the estimate based on the unbiased estimate of γ(u).)
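The estimate is simple to compute directly. The following sketch (the function name sample.acf is ours) compares a direct implementation with the values from R's acf function, which uses the same divisor-T form of the estimate.

sample.acf <- function(y, max.lag) {
    len <- length(y)                 # T in the formula above
    ybar <- mean(y)
    gamma <- sapply(0:max.lag, function(u)
        sum((y[(1 + u):len] - ybar) * (y[1:(len - u)] - ybar)) / len)
    gamma / gamma[1]                 # r(u) = gamma.hat(u) / gamma.hat(0)
}

y <- arima.sim(model = list(ar = 0.7), n = 200)
sample.acf(y, 5)
acf(y, lag.max = 5, plot = FALSE)$acf[, 1, 1]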


In order to say whether an observed correlation is significantly different from zero, we need some distribution theory. Like most time series results, the theory here is asymptotic (as T → ∞). The original results in this area were obtained by Bartlett in 1947. We will look at results due to T. W. Anderson in 1971.

    Suppose that

    Yt = +u=0

    utu

    with the t independent and identically distributed with zero mean and non-zerovariance. Suppose that

    u=0

    |u|


Table 4.1: Large Sample Results for r_k for an AR(1) Model.

  φ     √var r(1)    √var r(2)    cor(r(1), r(2))    √var r(10)
  0.9    0.44/√T      0.807/√T         0.97            2.44/√T
  0.7    0.71/√T      1.12/√T          0.89            1.70/√T
  0.5    0.87/√T      1.15/√T          0.76            1.29/√T
  0.3    0.95/√T      1.08/√T          0.53            1.09/√T
  0.1    0.99/√T      1.01/√T          0.20            1.01/√T

Notice that the closer φ is to 1, the more accurate the estimate becomes. As u → ∞, φ^{2u} → 0. In that case

var r(u) ≈ (1/T) (1 + φ²)/(1 − φ²).

For values of φ close to 1 this produces large variances for the r(u).

For 0 < u ≤ v (after much algebra),

c_uv = (φ^{v−u} − φ^{v+u})(1 + φ²)/(1 − φ²) + (v − u)φ^{v−u} − (v + u)φ^{v+u}.

In particular,

cor(r(1), r(2)) ≈ 2φ ( (1 − φ²)/(1 + 2φ² − 3φ⁴) )^{1/2}.

Using these formulae it is possible to produce the results in table 4.1.

Example 4.1.3 The MA(1) Series

For the MA(1) series it is straightforward to show that

c_11 = 1 − 3ρ(1)² + 4ρ(1)⁴,
c_uu = 1 + 2ρ(1)²,    u > 1,
c_12 = 2ρ(1)(1 − ρ(1)²).

Using these results it is easy to produce the results in table 4.2.

Example 4.1.4 The General MA(q) Series

In the case of the general MA(q) series it is easy to see that

c_uu = 1 + 2 Σ_{v=1}^{q} ρ(v)²,    for u > q,

and hence that

var r(u) = (1/T) ( 1 + 2 Σ_{v=1}^{q} ρ(v)² ),    for u > q.
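For example (an illustrative sketch, not from the notes), the large-sample standard error of r(u) for u > q can be computed from this formula using the theoretical autocorrelations returned by ARMAacf; for an MA(1) with θ = 0.5 and T = 100 it reproduces the 1.15/√T entry of table 4.2.

## Sketch: large-sample standard error of r(u) for u > q in an MA(q) model,
## using var r(u) = (1 + 2 * sum(rho(v)^2, v = 1..q)) / T.
T = 100
q = 1
rho = ARMAacf(ma = 0.5, lag.max = q)[-1]   # theoretical rho(1), ..., rho(q)
se = sqrt((1 + 2 * sum(rho^2)) / T)
se * sqrt(T)                               # about 1.15, i.e. se = 1.15 / sqrt(T)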


Table 4.2: Large Sample Results for r_k for an MA(1) Model.

  θ     √var r(1)    √var r(k) (k > 1)    cor(r(1), r(2))
  0.9    0.71/√T         1.22/√T               0.86
  0.7    0.73/√T         1.20/√T               0.84
  0.5    0.79/√T         1.15/√T               0.74
  0.4    0.84/√T         1.11/√T               0.65

Notes

1. In practice we don't know the parameters of the model generating the data we have. We can still estimate the variances and covariances of the r(u) by substituting estimates of ρ(u) into the formulae above.

2. Note that there can be quite large correlations between the r(u) values, so caution must be used when examining plots of r(u).

    4.2 PACF Estimation

In section 3.8.1 we saw that the theoretical pacf can be computed by solving the Durbin-Levinson recursion

φ_kk = ( ρ(k) − Σ_{j=1}^{k−1} φ_{k−1,j} ρ(k − j) ) / ( 1 − Σ_{j=1}^{k−1} φ_{k−1,j} ρ(j) ),

φ_{k,j} = φ_{k−1,j} − φ_kk φ_{k−1,k−j},    j = 1, 2, . . . , k − 1,

and setting the partial autocorrelation at lag u equal to φ_uu.

In practice, the estimated autocorrelation function is used in place of the theoretical autocorrelation function to generate estimates of the partial autocorrelation function.
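A direct implementation of this idea (an illustrative sketch, not code from the notes) feeds the sample autocorrelations through the Durbin-Levinson recursion; the result should essentially agree with R's built-in pacf function.

## Sketch: estimated pacf obtained by applying the Durbin-Levinson
## recursion to the sample autocorrelation function.
sample.pacf = function(y, lag.max = 10) {
    r = drop(acf(y, lag.max = lag.max, plot = FALSE)$acf)[-1]  # r(1), ..., r(lag.max)
    out = numeric(lag.max)
    phi = numeric(0)                     # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for (k in 1:lag.max) {
        phi.kk = if (k == 1) r[1]
                 else (r[k] - sum(phi * r[(k - 1):1])) /
                      (1 - sum(phi * r[1:(k - 1)]))
        phi = c(phi - phi.kk * rev(phi), phi.kk)
        out[k] = phi.kk
    }
    out
}

set.seed(2)
y = arima.sim(model = list(ar = c(0.5, 0.25)), n = 500)
cbind(sample.pacf(y, 5), drop(pacf(y, lag.max = 5, plot = FALSE)$acf))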

To decide whether partial autocorrelation values are significantly different from zero, we can use a 1949 result of Quenouille which states that if the true underlying model is AR(p), then the estimated partial autocorrelations at lags greater than p are approximately independently normal with means equal to zero and variance 1/T. Thus ±2/√T can be used as critical limits on the estimated partial autocorrelations at lags u > p to test the hypothesis of an AR(p) model.


Figure 4.1: Monthly differences between the yield on mortgages and government loans in the Netherlands, January 1961 to March 1974.

    4.3 System Identification

Given a set of observations Y_1, . . . , Y_T we will need to decide what the appropriate model might be. The estimated acf and pacf are the tools which can be used to do this. If the acf exhibits slow decay and the pacf cuts off sharply after lag p, we would identify the series as AR(p). If the pacf shows slow decay and the acf shows a sharp cutoff after lag q, we would identify the series as MA(q). If both the acf and pacf show slow decay we would identify the series as mixed ARMA. In this case the orders of the AR and MA parts are not clear, but it is reasonable to first try ARMA(1,1) and move on to higher order models if the fit of this model is not good.
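These signatures can be rehearsed on simulated data before being applied to real series. The following sketch (illustrative only, not from the notes) simulates an AR(2), an MA(1) and an ARMA(1,1) series with arima.sim and plots the sample acf and pacf of each.

## Sketch: acf/pacf signatures for simulated AR, MA and ARMA series.
set.seed(42)
n = 500
series = list(
    "AR(2)"     = arima.sim(model = list(ar = c(0.6, 0.3)), n = n),
    "MA(1)"     = arima.sim(model = list(ma = 0.8), n = n),
    "ARMA(1,1)" = arima.sim(model = list(ar = 0.7, ma = 0.4), n = n))

opar = par(mfrow = c(3, 2))
for (nm in names(series)) {
    acf(series[[nm]],  main = paste("acf:", nm))
    pacf(series[[nm]], main = paste("pacf:", nm))
}
par(opar)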

    Example 4.3.1 Interest Yields

Figure 4.1 shows a plot of the monthly differences between the yield on mortgages and government loans in the Netherlands from January 1961 to March 1974. The series appears stationary, so we can attempt to use the acf and pacf to decide whether an AR, MA or ARMA model might be appropriate.

Figures 4.2 and 4.3 show the estimated acf and pacf functions for the yield series. The horizontal lines in the plots are drawn at the y values ±1.96/√159 (the series has 159 values). These provide 95% confidence limits for what can be expected under the hypothesis of white noise. Note that these limits are point-wise, so we would expect to see roughly 5% of the values lying outside the limits.

The acf plot shows evidence of slow decay, while the pacf plot shows a sharp cutoff after lag 1.


    Figure 4.2: The acf for the yield data.


    Figure 4.3: The pacf for the yield data.


    Figure 4.4: Yields from a chemical process (from Box and Jenkins).

On the basis of these two plots we might hypothesise that an AR(1) model was an appropriate description of the data.

    Example 4.3.2 Box and Jenkins Chemical Yields

Figure 4.4 shows an example from the classic time series text by Box and Jenkins. This series contains consecutive yields recorded from a chemical process. Again, the series is apparently stationary, so we can consider identifying an appropriate model on the basis of the acf and pacf.

Again, the acf seems to show slow decay, this time with alternating signs. The pacf shows a sudden cutoff after lag 1, suggesting that again an AR(1) model might be appropriate.

    4.4 Model Generalisation

ARMA series provide a flexible class of models for stationary mean-zero series, with AR and MA series being special cases.

Unfortunately, many series are clearly not in this general class of models. It is worthwhile looking at some generalisations of this class.

    4.4.1 Non-Zero Means

When a series {Y_t} has a non-zero mean μ, the mean can be subtracted and the deviations from the mean modelled as an ARMA series:

Y_t − μ = φ_1(Y_{t−1} − μ) + · · · + φ_p(Y_{t−p} − μ) + ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q}.


    Figure 4.5: The acf for the chemical process data.


    Figure 4.6: The pacf for the chemical process data.


Alternatively, the model can be adjusted by introducing a constant directly:

Y_t = φ_1 Y_{t−1} + · · · + φ_p Y_{t−p} + θ_0 + ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q}.

The two characterisations are connected by

θ_0 = μ (1 − φ_1 − · · · − φ_p),

so that

μ = θ_0 / (1 − φ_1 − · · · − φ_p)

or

θ_0 = μ (1 − φ_1 − · · · − φ_p).
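As a small illustration (not from the notes, with made-up numbers), the conversion between the two parameterisations is immediate:

## Sketch: converting between the mean mu and the constant theta0
## for hypothetical AR(2) coefficients.
phi = c(1.2, -0.5)              # hypothetical AR coefficients
mu  = 4.0                       # hypothetical series mean
theta0 = mu * (1 - sum(phi))    # theta0 = mu (1 - phi_1 - ... - phi_p)
c(theta0 = theta0, mu = theta0 / (1 - sum(phi)))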

    4.4.2 Deterministic Trends

Consider the model

Y_t = f(t) + Z_t,

where Z_t is a stationary ARMA series and f(t) is a deterministic function of t. Considering Y_t − f(t) reduces Y_t to an ARMA series. (If f(t) contains unknown parameters we can estimate them.)
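One simple way to proceed when f(t) is, say, a polynomial with unknown coefficients is to estimate it by least squares and then model the residuals as an ARMA series. The sketch below is only illustrative (the trend, AR coefficient and sample size are made up); R's arima function can also estimate the trend and the ARMA part jointly through its xreg argument.

## Sketch: removing an estimated linear trend, then fitting an AR(1)
## model to the detrended series.
set.seed(7)
t = 1:200
y = 2 + 0.05 * t + arima.sim(model = list(ar = 0.6), n = 200)

trend.fit = lm(y ~ t)                 # estimate f(t) = beta0 + beta1 * t
z = residuals(trend.fit)              # detrended series
arima(z, order = c(1, 0, 0), include.mean = FALSE)

## joint estimation of the trend and the AR(1) part:
arima(y, order = c(1, 0, 0), xreg = t)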

    4.4.3 Models With Non-stationary AR Components

We've seen that an AR model is stationary only when all the roots of its characteristic equation lie outside the unit circle; a root on or inside the unit circle produces a non-stationary series.

Example 4.4.1 Random Walks

A random walk is defined by the equation

Y_t = Y_{t−1} + ε_t,

where {ε_t} is a series of uncorrelated (perhaps independent) random variables. In operator form this equation is

(1 − L)Y_t = ε_t.

The equation 1 − z = 0 has a root at 1, so Y_t is non-stationary.

Example 4.4.2 Integrated Moving Averages

Consider the model

Y_t = Y_{t−1} + ε_t + θ ε_{t−1}.

This is similar to an ARMA(1,1) model, but again it is non-stationary.

Both these models can be transformed to stationarity by differencing, that is, transforming to ∇Y_t = Y_t − Y_{t−1}. In the first example we have

∇Y_t = ε_t,

which is the white noise model. In the second we have

∇Y_t = ε_t + θ ε_{t−1},

which is the MA(1) model.
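A small simulation (illustrative, not from the notes) confirms this: differencing a simulated random walk with diff leaves a series whose acf looks like that of white noise.

## Sketch: differencing a simulated random walk.
set.seed(3)
eps = rnorm(500)
y = cumsum(eps)          # random walk: Y_t = Y_{t-1} + eps_t
dy = diff(y)             # first difference, i.e. eps_t

opar = par(mfrow = c(1, 2))
acf(y,  main = "random walk")    # very slow decay
acf(dy, main = "differences")    # essentially white noise
par(opar)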


    4.4.4 The Effect of Differencing

Suppose that Y_t has a linear trend

Y_t = β_0 + β_1 t + Z_t,

where Z_t is stationary with E(Z_t) = 0. Then

∇Y_t = β_1 + ∇Z_t.

Differencing has removed the trend. Now suppose that {Y_t} has a deterministic quadratic trend

Y_t = β_0 + β_1 t + β_2 t² + Z_t;

then

∇Y_t = (β_0 + β_1 t + β_2 t²) − (β_0 + β_1(t − 1) + β_2(t − 1)²) + ∇Z_t
     = β_1 + β_2(t² − (t² − 2t + 1)) + ∇Z_t
     = (β_1 − β_2) + 2β_2 t + ∇Z_t
     = linear trend + stationary.

Differencing again produces

∇²Y_t = 2β_2 + ∇²Z_t.

In general, a polynomial trend of order k can be eliminated by differencing k times.

Now let's consider the case of a stochastic trend. Suppose that

Y_t = M_t + ε_t,

where M_t is a random process which changes slowly over time. In particular, we can assume that M_t is generated by a random walk model

M_t = M_{t−1} + τ_t,

with τ_t independent of ε_t. Then

∇Y_t = τ_t + ε_t − ε_{t−1}.

∇Y_t is stationary and has an autocorrelation function like that of an MA(1) series:

ρ(1) = −1 / ( 2 + (σ_τ/σ_ε)² ).

More generally, higher order stochastic trend models can be reduced to stationarity by repeated differencing.
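A quick simulation (illustrative only, not from the notes) makes the point: generate the level as a random walk, add observation noise, difference, and compare the lag-one sample autocorrelation of the differences with −1/(2 + (σ_τ/σ_ε)²).

## Sketch: a random-walk (stochastic) trend plus noise; the differenced
## series behaves like an MA(1).
set.seed(11)
n = 2000
sigma.tau = 0.5                            # sd of the level innovations
sigma.eps = 1                              # sd of the observation noise
M = cumsum(rnorm(n, sd = sigma.tau))       # slowly changing level
y = M + rnorm(n, sd = sigma.eps)
dy = diff(y)

drop(acf(dy, lag.max = 1, plot = FALSE)$acf)[2]   # sample rho(1)
-1 / (2 + (sigma.tau / sigma.eps)^2)              # theoretical value, about -0.44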


    4.5 ARIMA Models

If W_t = ∇^d Y_t is an ARMA(p,q) series then Y_t is said to be an integrated autoregressive moving-average (p, d, q) series, denoted ARIMA(p,d,q). If we write

φ(L) = 1 − φ_1 L − · · · − φ_p L^p

and

θ(L) = 1 + θ_1 L + · · · + θ_q L^q,

then we can write down the operator formulation

φ(L) ∇^d Y_t = θ(L) ε_t.

Example 4.5.1 The IMA(1,1) Model

This model is widely used in business and economics. It is defined by

Y_t = Y_{t−1} + ε_t + θ ε_{t−1}.

Y_t can be thought of as a random walk with correlated errors.

Notice that

Y_t = Y_{t−1} + ε_t + θ ε_{t−1}
    = Y_{t−2} + ε_{t−1} + θ ε_{t−2} + ε_t + θ ε_{t−1}
    = Y_{t−2} + ε_t + (1 + θ) ε_{t−1} + θ ε_{t−2}
    ...
    = Y_{−m} + ε_t + (1 + θ) ε_{t−1} + · · · + (1 + θ) ε_{−m} + θ ε_{−m−1}.

If we assume that Y_{−m} = 0 (i.e. observation started at time −m),

Y_t = ε_t + (1 + θ) ε_{t−1} + · · · + (1 + θ) ε_{−m} + θ ε_{−m−1}.

This representation can be used to derive the formulae

var(Y_t) = ( 1 + θ² + (1 + θ)²(t + m) ) σ²

and

cor(Y_t, Y_{t−k}) = ( 1 + θ² + (1 + θ)²(t + m − k) ) / ( var(Y_t) var(Y_{t−k}) )^{1/2}
                  ≈ ( (t + m − k)/(t + m) )^{1/2}
                  ≈ 1

for m large and k moderate. (We are considering behaviour after burn-in.) This means that we can expect to see very slow decay in the autocorrelation function.

The very slow decay of the acf function is characteristic of ARIMA series with d > 0. Figures 4.7 and 4.8 show the estimated autocorrelation functions from a simulated random walk and an integrated autoregressive model. Both functions show very slow declines in the acf.
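Plots of this kind are easy to reproduce by simulation; the sketch below (illustrative, with arbitrary seed and length) generates series like those behind figures 4.7 and 4.8 and plots their sample acfs.

## Sketch: the very slow acf decay of integrated series.
set.seed(5)
rw  = arima.sim(model = list(order = c(0, 1, 0)), n = 200)            # ARIMA(0,1,0)
ari = arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 200)  # ARIMA(1,1,0)

opar = par(mfrow = c(1, 2))
acf(rw,  main = "ARIMA(0,1,0)")
acf(ari, main = "ARIMA(1,1,0)")
par(opar)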


Figure 4.7: The autocorrelation function of the ARIMA(0,1,0) (random walk) model.


Figure 4.8: The autocorrelation function of the ARIMA(1,1,0) model with φ_1 = .5.

Chapter 5

    Fitting and Forecasting

    5.1 Model Fitting

Suppose that we have identified a particular ARIMA(p,d,q) model which appears to describe a given time series. We now need to fit the identified model and assess how well the model fits. Fitting is usually carried out using maximum likelihood. For a given set of model parameters, we calculate a series of one-step-ahead predictions

Ŷ_{k+1} = P_{Hk} Y_{k+1},

where Hk is the linear space spanned by Y_1, . . . , Y_k. The predictions are obtained in a recursive fashion using a process known as Kalman filtering. Each prediction results in a prediction error Y_{k+1} − Ŷ_{k+1}. These are, by construction, uncorrelated. If we add the requirement that the {Y_t} series is normally distributed, the prediction errors are independent normal random variables, and this can be used as the basis for computing the likelihood.

The parameters which need to be estimated are the AR coefficients φ_1, . . . , φ_p, the MA coefficients θ_1, . . . , θ_q and a constant term (either μ or θ_0 as outlined in section 4.4.1). Applying maximum likelihood produces both estimates and standard errors.

    5.1.1 Computations

Given a time series y in R, we can fit an ARIMA(p,d,q) model to the series as follows

    > z = arima(y, order=c(p, d, q))

    The estimation results can be inspected by printing them.

Example 5.1.1 U.S. Unemployment Rates

Figure 5.1 shows a plot of the seasonally adjusted quarterly United States unemployment rates from the first quarter of 1948 to the first quarter of 1978. If the data is stored as a time series in the R data set unemp, the plot can be produced with the command

    > plot(unemp)


Figure 5.1: United States quarterly unemployment rates (seasonally adjusted).

Neither the plot of the original series nor the acf (figure 5.2) shows the very slow decay which would strongly indicate that differencing is required, so we will leave the series undifferenced and attempt to find an appropriate ARMA model.

The acf and pacf functions for the series can be computed and plotted with the following commands

> acf(unemp)
> pacf(unemp)

The resulting plots (figures 5.2 and 5.3) show the strong signature of an AR(2) series (slow decay of the acf and sharp cutoff of the pacf after two lags).

With the series identified we can go about estimating the unknown parameters. We do this with the arima function in R.

> z = arima(unemp, order=c(2, 0, 0))
> z

Call:
arima(x = unemp, order = c(2, 0, 0))

Coefficients:
         ar1      ar2  intercept
      1.5499  -0.6472     5.0815
s.e.  0.0681   0.0686     0.3269

sigma^2 estimated as 0.1276:  log likelihood = -48.76,  aic = 105.53


    Figure 5.2: The acf for the unemployment series.


    Figure 5.3: The pacf for the unemployment series.


The fitted model in this case is

Y_t = 5.0815 + 1.5499 Y_{t−1} − 0.6472 Y_{t−2} + ε_t,

where ε_t has an estimated variance of 0.1276.

Example 5.1.2 Railroad Bond Yields

Figure 5.4 shows a plot of the monthly yields on AA rated railroad bonds (as a percentage times 100). With the data values stored in the R variable rrbonds, the plot was produced with the command

> plot(rrbonds, ylab = "Bond Yields")

In this case it is clear that the series is non-stationary and we need to transform to stationarity by taking differences. The series can be differenced and the result plotted as follows

> rrdiffs = diff(rrbonds)
> plot(rrdiffs, ylab = "Differenced Yields")

Figure 5.5 shows a plot of the differenced series, and it is clear from this plot that the assumption of stationarity is much more reasonable. To confirm this, and to suggest possible models, we need to examine the acf and pacf functions. These are shown in figures 5.6 and 5.7.

The model signature here is less certain than that of the previous example. A possible interpretation is that the acf is showing rapid decay and the pacf is showing a sharp cutoff after one lag. This suggests that the original series can be modelled as an ARIMA(1,1,0) series.

The estimation of parameters can be carried out for this model as follows

> z = arima(rrbonds, order=c(1,1,0))
> z

Call:
arima(x = rrbonds, order = c(1, 1, 0))

Coefficients:
         ar1
      0.4778
s.e.  0.0865

sigma^2 estimated as 84.46:  log likelihood = -367.47,  aic = 738.94

The fitted model in this case is

∇Y_t = 0.4778 ∇Y_{t−1} + ε_t,

with ε_t having an estimated variance of 84.46.

5.2 Assessing Quality of Fit

Once a model has been fitted to a set of data it is always important to assess how well the model fits. This is because the inferences we make depend crucially on the appropriateness of the fitted model.


Figure 5.4: Monthly AA railroad bond yields (% × 100).


    Figure 5.5: Differenced monthly AA railroad bond yields.


Figure 5.6: The ACF for the differenced monthly AA railroad bond yields.


Figure 5.7: The PACF for the differenced monthly AA railroad bond yields.


The usual way of assessing goodness of fit is through the examination of residuals.

In the case of autoregressive models it is easy to see how residuals might be defined. In the case of the AR(p) model

Y_t = φ_1 Y_{t−1} + · · · + φ_p Y_{t−p} + ε_t

it is clear that we can take the residuals to be

ε̂_t = Y_t − φ_1 Y_{t−1} − · · · − φ_p Y_{t−p}.

In the general ARMA case

Y_t = φ_1 Y_{t−1} + · · · + φ_p Y_{t−p} + ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q}

or

φ(L) Y_t = θ(L) ε_t,

we must first transform to autoregressive form by inverting the MA operator:

θ(L)^{−1} φ(L) Y_t = ε_t

or

Y_t = Σ_{u=1}^{∞} π_u Y_{t−u} + ε_t.

The residuals can then be defined as

ε̂_t = Y_t − Σ_{u=1}^{∞} π_u Y_{t−u}.

This is a useful theoretical approach, but in practice the residuals are obtained as a byproduct of the computation of the likelihood (they are the prediction errors from the one-step-ahead forecasts). If the ARMA model is correct (and the series is normally distributed) then the residuals are approximately independent normal random variables with mean zero and variance σ_ε².

A simple diagnostic is to plot the residuals and see whether they appear to be a white noise series. In the case of the US unemployment series we can do this as follows

> z = arima(unemp, order=c(2, 0, 0))
> plot(resid(z))

The results are shown in figure 5.8.

It is also possible to carry out tests of normality on the residuals. This can be done by producing a histogram of the residuals or by performing a normal quantile-quantile plot. This can be done as follows:

> hist(resid(z))
> qqnorm(resid(z))

A simple check of heteroscedasticity can be carried out by plotting the residuals against the fitted values. This is done with the following command:

> plot(unemp - resid(z), resid(z),
+      xy.lines = FALSE, xy.labels = FALSE)

    None of these plots indicate that there is any problem with the fit of the model.


    Figure 5.8: Residuals from the unemployment data.

    5.3 Residual Correlations

It is tempting to examine the quality of model fit by seeing whether the residuals form an uncorrelated sequence. One might, for example, plot the estimated acf of the residuals and look for lags where the correlations exceed ±2/√T. Unfortunately, while these limits are approximately correct for large lags, for small lags they overstate the variability of the estimated correlations. This should be no surprise: the effect of model fitting is to remove as much of the correlation present in the series as possible, so the correlations between the residuals should be closer to zero than for a non-fitted series.

In addition to checking whether there are large individual correlations present in a time series, it can be useful to pool information from successive correlations to see whether there is significant correlation left in the residuals.

One test statistic in common use is the modified Box-Pierce (or Ljung-Box-Pierce) statistic

Q = T(T + 2) Σ_{k=1}^{K} r_k² / (T − k).

If the true underlying model is ARMA(p,q), the distribution of Q is approximately χ² with K − p − q degrees of freedom.
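As an illustrative check (not from the notes), Q can be computed directly from the residual autocorrelations of a fitted model and compared with the value reported by R's Box.test function, discussed below. Here z is the AR(2) fit to the unemployment series and fitdf = 2 removes the p + q fitted parameters from the degrees of freedom.

## Sketch: the Ljung-Box statistic computed by hand and via Box.test.
z = arima(unemp, order = c(2, 0, 0))
e = resid(z)
T = length(e)
K = 10
r = drop(acf(e, lag.max = K, plot = FALSE)$acf)[-1]

Q = T * (T + 2) * sum(r^2 / (T - (1:K)))
pchisq(Q, df = K - 2, lower.tail = FALSE)        # p-value on K - p - q df

Box.test(e, lag = K, type = "Ljung", fitdf = 2)  # should give the same Q and p-value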

In R, the modified Box-Pierce statistic (and an older variant) can be computed with the function Box.test (you should specify type="Ljung"). An alternative is to use the function tsdiag, which plots the (standardised) residuals, the acf of the residuals and the modified Box-Pierce statistic for a variety of values of K. In the case of the US unemployment series, an appropriate


    command would be

    > tsdiag(z)

This produces the plot shown in figure 5.9. The first panel of the plot shows the (standardised) residuals from the model fit. The residuals look reasonably random, but during the period from 1965 to 1970 there is a sequence of values which are all negative. This indicates that there may be problems with the model fit. The second panel shows the autocorrelation function for the residuals. One of the correlations (that at a lag of 7 quarters) lies outside the two standard error bounds about zero. Since we would expect one in twenty of the estimated correlations to exceed these bounds, this does not provide evidence for a significant correlation. The third panel shows p-values for the Ljung-Box statistics at lags from one quarter to ten quarters. There is no evidence of significant correlation in this plot.

Overall, there is some evidence of lack of fit, but there is no obvious way of fixing the problem. Because of this we will move on to making forecasts for the series, but keep in mind that we need to be cautious in making use of these forecasts.

    5.4 Forecasting

Once we have decided on an appropriate time-series model, estimated its unknown parameters and established that the model fits well, we can turn to the problem of forecasting future values of the series.

The autoregressive representation

Y_t = Σ_{u=1}^{∞} π_u Y_{t−u} + ε_t

suggests predicting the next observation beyond Y_1, . . . , Y_T using

Ŷ_{T+1} = Σ_{u=1}^{∞} π̂_u Y_{T+1−u},

where the π̂_u are obtained by substituting the estimated parameters in place of the theoretical ones.

Once a forecast is obtained for Y_{T+1} we can use it to obtain a forecast for Y_{T+2}, and then use these two forecasts to generate a forecast for Y_{T+3}. The process can be continued to obtain forecasts out to any point in the future. Because uncertainty increases as we predict further and further from the data we have, we can expect the standard errors associated with our predictions to increase.

In practice, forecasts are generated by the same Kalman filter algorithm used to compute the likelihood for parameter estimation.

    5.4.1 Computation

Once a model has been fitted using arima, the function predict can be used to obtain forecasts. In the case of the US unemployment series we could obtain 10 future forecasts as follows:


    Figure 5.9: Residual diagnostics for the US unemployment series.


> z = arima(unemp, order=c(2, 0, 0))
> p = predict(z, n.ahead = 10)

    The predictions can be viewed with the command

> p$pred
         Qtr1     Qtr2     Qtr3     Qtr4
1978          5.812919 5.491268 5.243253
1979 5.067017 4.954379 4.893856 4.872947
1980 4.879708 4.903719 4.936557

    and their standard errors with

> p$se
          Qtr1      Qtr2      Qtr3      Qtr4
1978           0.3572328 0.6589087 0.9095040
1979 1.0969935 1.2248696 1.3040856 1.3479499
1980 1.3689669 1.3771201 1.3792859

It can be useful to plot the forecasts on the same graph as the original time series. This is a relatively complex task, and would probably be worth packaging up as an R function. Here is how 20 forecasts for the unemp series and their standard errors can be plotted

> p = predict(z, n.ahead = 20)
> xlim = range(time(unemp), time(p$pred))
> ylim = range(unemp, p$pred - 2 * p$se, p$pred + 2 * p$se)
> plot(unemp, xlim = xlim, ylim = ylim)
> lines(p$pred, lwd=2)
> lines(p$pred - 2 * p$se, lty = "dotted")
> lines(p$pred + 2 * p$se, lty = "dotted")

The result of this appears in figure 5.10.

Notice that the standard errors around the forecasts widen rapidly and the forecasts themselves seem to be tending to a constant value. If we extend the forecast period, this becomes more obvious, as shown in figure 5.11. In fact, over the long term, forecasts for stationary series ultimately converge to the mean of the series and the standard errors for the forecasts tend to the standard deviation of the series. This means that we can only expect to gain advantage from the use of short-term forecasts.

    5.5 Seasonal Models

    5.5.1 Purely Seasonal Models

Many time series exhibit strong seasonal characteristics. We'll use s to denote the seasonal period. For monthly series s = 12, and for quarterly series s = 4. These are the most common cases, but seasonal patterns can show up in other places (e.g. weekly patterns in daily observations or daily patterns in hourly data).


    Figure 5.10: Forecasts for the unemployment data.


    Figure 5.11: Long-term forecasts for the US unemployment series.


Seasonal effects can be modelled by including coefficients at lags which are multiples of the seasonal period. For example, the model

Y_t + Φ_1 Y_{t−s} + Φ_2 Y_{t−2s} + · · · + Φ_P Y_{t−Ps} = ε_t + Θ_1 ε_{t−s} + Θ_2 ε_{t−2s} + · · · + Θ_Q ε_{t−Qs}

is the seasonal analog of an ARMA model. Caution is required with models which only involve coefficients at multiples of the seasonal period, because they provide independent models for each of the s seasonal sub-series and make no statement about the relationship between the sub-series. For example, the model above applies equally well to the case where the seasonal sub-series are identical,

Y_{ks+1} = Y_{ks+2} = · · · = Y_{ks+s},

as it does to the case where they are independent.

5.5.2 Models with Short-Term and Seasonal Components

In practice, series will have a mixture of both seasonal and short-term dependencies. It is possible to consider models for such series by including both seasonal and non-seasonal terms. This can be done directly, but it turns out to be more useful to consider a slightly different class of model.

Recall that the general ARMA(p,q) model can be written as

φ(L) Y_t = θ(L) ε_t.

Each of the φ(L) and θ(L) operators can be written as a polynomial in L. These have the factorised form

(1 + a_1 L)(1 + a_2 L) · · · (1 + a_p L) Y_t = (1 + b_1 L)(1 + b_2 L) · · · (1 + b_q L) ε_t.

Non-stationarity can also be handled by introducing additional factors of the form (1 − L). This suggests a different way of handling seasonal effects. We can simply introduce additional factors of the form (1 + A L¹²) into the autoregressive operator or (1 + B L¹²) into the moving average operator.

The operator on each side of the general model is generally written as a product of a polynomial in L times a polynomial in L^s. The general seasonal ARMA model is

Φ(L^s) φ(L) Y_t = Θ(L^s) θ(L) ε_t,

where Φ is a polynomial of degree P, φ is a polynomial of degree p, Θ is a polynomial of degree Q and θ is a polynomial of degree q. Such a model is denoted ARMA(p, q) × (P, Q)_s.

Non-stationarity can be accommodated by introducing additional differencing terms. This produces a seasonal ARIMA model, the most general form of which is

Φ(L^s) φ(L) (1 − L)^d (1 − L^s)^D Y_t = Θ(L^s) θ(L) ε_t,

which is denoted by ARIMA(p, d, q) × (P, D, Q)_s.
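In R's arima function the seasonal part of such a model is specified through the seasonal argument. A generic call (the orders and the series y below are placeholders, not values from the notes) looks like this; the period defaults to the frequency of the series if it is omitted.

## Sketch: fitting an ARIMA(p,d,q) x (P,D,Q)_s model with arima.
## Here y is assumed to be a monthly ts object and the orders are placeholders.
fit = arima(y, order = c(1, 0, 1),
            seasonal = list(order = c(0, 1, 1), period = 12))
fit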

As an example, we'll consider the monthly average temperatures in Dubuque, Iowa, from January 1964 to December 1975. Figure 5.12 shows a plot of the series. It contains a very strong seasonal pattern. The strength of the pattern suggests that we should start by differencing the series at the seasonal lag.


Figure 5.12: Monthly average temperatures (°F) in Dubuque, Iowa.


    Figure 5.13: The seasonally differenced Dubuque temperature series.



Figure 5.14: The acf of the seasonally differenced Dubuque temperature series.

Figure 5.13 shows a plot of the seasonally differenced series and figure 5.14 shows its acf function. Together the plots indicate that the differenced series is stationary. To decide on an appropriate model we also need to inspect a plot of the pacf of the seasonally differenced series. This is shown in figure 5.15. When taken in conjunction with the structure of the acf, we have clear evidence that a seasonal MA(1) is an appropriate model for the differenced series.

The R function arima can be used to fit the full model, which is ARIMA(0, 0, 0) × (0, 1, 1)_12. Assuming that the series has been read into the variable tempdub, the fitting process and display of the results can be carried out as follows.

> z = arima(tempdub, seas = c(0,1,1))
> z

Call:
arima(x = tempdub, seasonal = c(0, 1, 1))

Coefficients:
         sma1
      -1.0000
s.e.   0.0967

sigma^2 estimated as 11.69:  log likelihood = -364.48,  aic = 732.96


Figure 5.15: The pacf of the seasonally differenced Dubuque temperature series.

Before producing forecasts of the series we need to check the residuals from the fitting process to see that they are (close to) white noise. As before, this can be done with the tsdiag function.

    > tsdiag(z)

    As shown by figure 5.16, the model seems to fit well.

    5.5.3 A More Complex Example

    Figure 5.17 shows a plot of a