Econometrics 2 — Fall 2005
Univariate Time Series Analysis;
ARIMA Models
Heino Bohn Nielsen
1 of 41
Univariate Time Series Analysis

• We consider a single time series, y1, y2, ..., yT. We want to construct simple models for yt as a function of the past: E[yt | history].

• Univariate models are useful for:
(1) Analyzing the dynamic properties of time series.
    What is the dynamic adjustment after a shock? Do shocks have transitory or permanent effects (presence of unit roots)?
(2) Forecasting. A model for E[yt | xt] is only useful for forecasting yt+1 if we know (or can forecast) xt+1.
(3) Univariate time series analysis is a way to introduce the tools necessary for analyzing more complicated models.
Outline of the Lecture

(1) Characterizing time dependence: ACF and PACF.
(2) Modelling time dependence: the ARMA(p,q) model.
(3) Examples:
    • AR(1).
    • AR(2).
    • MA(1).
(4) Lag operators, lag polynomials and invertibility.
(5) Model selection.
(6) Estimation.
(7) Forecasting.
Characterizing Time Dependence

• For a stationary time series the autocorrelation function (ACF) is

    ρk = Corr(yt, yt−k) = Cov(yt, yt−k) / √(V(yt)·V(yt−k)) = Cov(yt, yt−k) / V(yt) = γk/γ0.

An alternative measure is the partial autocorrelation function (PACF), which is the conditional correlation:

    θk = Corr(yt, yt−k | yt−1, ..., yt−k+1).

Note: ACF and PACF are bounded in [−1; 1], symmetric (ρk = ρ−k), and ρ0 = θ0 = 1.

• Simple estimators, ρ̂k and θ̂k, can be derived from OLS regressions:

    ACF:  yt = c + ρk·yt−k + residual
    PACF: yt = c + θ1·yt−1 + ... + θk·yt−k + residual

• For an IID time series it holds that V(ρ̂k) = V(θ̂k) = T⁻¹, and a 95% confidence band is given by ±2/√T.
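The two OLS regressions above translate directly into code. A minimal sketch assuming NumPy; the function names `acf_ols` and `pacf_ols` and the simulated IID series are illustrative, not part of the lecture:

```python
import numpy as np

def acf_ols(y, k):
    """Estimate rho_k by regressing y_t on a constant and y_{t-k}."""
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y)), y[:-k]])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

def pacf_ols(y, k):
    """Estimate theta_k as the last coefficient when regressing y_t on y_{t-1}, ..., y_{t-k}."""
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y))] + [y[k - j:-j] for j in range(1, k + 1)])
    return np.linalg.lstsq(X, Y, rcond=None)[0][-1]

rng = np.random.default_rng(0)
y = rng.standard_normal(500)      # IID series, so both estimates should be small
band = 2 / np.sqrt(len(y))        # 95% confidence band for an IID series
print(acf_ols(y, 1), pacf_ols(y, 2), band)
```

For an IID series both estimates should fall inside the ±2/√T band in roughly 95% of samples.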
Example: Danish GDP
[Figure: three panels for 1980-2000 — Danish GDP in logs, deviation from trend, and first difference — each with estimated ACF and PACF for lags 0-20.]
The ARMA(p,q) Model

• First define a white noise process, εt ∼ IID(0, σ²).

• The autoregressive AR(p) model is defined as

    yt = θ1yt−1 + θ2yt−2 + ... + θpyt−p + εt.

The systematic part of yt is a linear function of p lagged values. We need p (observed) initial values: y−(p−1), y−(p−2), ..., y−1, y0.

• The moving average MA(q) model is defined as

    yt = εt + α1εt−1 + α2εt−2 + ... + αqεt−q.

yt is a moving average of past shocks to the process. We need q initial values: ε−(q−1) = ε−(q−2) = ... = ε−1 = ε0 = 0.

• They can be combined into the ARMA(p,q) model

    yt = θ1yt−1 + ... + θpyt−p + εt + α1εt−1 + ... + αqεt−q.
Dynamic Properties of an AR(1) Model

• Consider the AR(1) model

    Yt = δ + θYt−1 + εt.

Assume for a moment that the process is stationary. As we will see later, this requires |θ| < 1.

• First we want to find the expectation. Stationarity implies that E[Yt] = E[Yt−1] = µ. We find

    E[Yt] = E[δ + θYt−1 + εt]
    E[Yt] = δ + θE[Yt−1] + E[εt]
    µ = δ + θµ
    (1 − θ)µ = δ
    µ = δ/(1 − θ).

Note the following:
(1) The effect of the constant term, δ, depends on the autoregressive parameter, θ.
(2) µ is not defined if θ = 1. This is excluded for a stationary process.
• Next we want to calculate the variance and the autocovariances. It is convenient to define the deviation from the mean, yt = Yt − µ, so that

    Yt = δ + θYt−1 + εt
    Yt = (1 − θ)µ + θYt−1 + εt
    Yt − µ = θ(Yt−1 − µ) + εt
    yt = θyt−1 + εt.

• We note that γ0 = V[Yt] = V[yt]. We find:

    V[yt] = E[yt²]
          = E[(θyt−1 + εt)²]
          = E[θ²yt−1² + εt² + 2θyt−1εt]
          = θ²E[yt−1²] + E[εt²] + 2θE[yt−1εt]
          = θ²V[yt−1] + σ² + 0.

Using stationarity, γ0 = V[yt] = V[yt−1], we get

    γ0(1 − θ²) = σ²  or  γ0 = σ²/(1 − θ²).
• The covariances, Cov[yt, yt−k] = E[ytyt−k], are given by

    γ1 = E[ytyt−1] = E[(θyt−1 + εt)yt−1] = θE[yt−1²] + E[yt−1εt] = θγ0 = θσ²/(1 − θ²)
    γ2 = E[ytyt−2] = E[(θyt−1 + εt)yt−2] = θE[yt−1yt−2] + E[εtyt−2] = θγ1 = θ²σ²/(1 − θ²)
    ...
    γk = E[ytyt−k] = θ^k·γ0.

• The ACF is given by

    ρk = γk/γ0 = θ^k·γ0/γ0 = θ^k.

• The PACF is simply the autoregressive coefficients: θ, 0, 0, ...
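The results ρk = θ^k and γ0 = σ²/(1 − θ²) can be checked by simulation. A small sketch assuming NumPy; θ = 0.8 and the sample size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, T = 0.8, 100_000
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = theta * y[t - 1] + eps[t]        # AR(1) recursion, started at y_0 = 0

def acf(y, k):
    """Empirical autocorrelation at lag k."""
    yd = y - y.mean()
    return np.dot(yd[k:], yd[:-k]) / np.dot(yd, yd)

print(y.var(), 1 / (1 - theta**2))          # empirical gamma_0 vs. sigma^2/(1 - theta^2)
for k in (1, 2, 3):
    print(k, acf(y, k), theta**k)           # empirical rho_k vs. theta^k
```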
Examples of Stationary AR(1) Models

[Figure: simulated paths (T = 100) of yt = εt, yt = 0.80·yt−1 + εt, and yt = −0.80·yt−1 + εt, each with theoretical and estimated ACF and PACF for lags 0-20.]
Examples of AR(1) Models

[Figure: simulated paths (T = 100) of yt = 0.50·yt−1 + εt, yt = 0.95·yt−1 + εt, yt = 1.00·yt−1 + εt, and yt = 1.05·yt−1 + εt.]
Dynamic Properties of an AR(2) Model

• Consider the AR(2) model given by

    Yt = δ + θ1Yt−1 + θ2Yt−2 + εt.

• Again we find the mean under stationarity:

    E[Yt] = δ + θ1E[Yt−1] + θ2E[Yt−2] + E[εt]
    E[Yt] = δ/(1 − θ1 − θ2) = µ.

• We then define the process yt = Yt − µ for which it holds that

    yt = θ1yt−1 + θ2yt−2 + εt.
• Multiplying both sides with yt and taking expectations yields

    E[yt²] = θ1E[yt−1yt] + θ2E[yt−2yt] + E[εtyt]
    γ0 = θ1γ1 + θ2γ2 + σ².

Multiplying instead with yt−1 yields

    E[ytyt−1] = θ1E[yt−1yt−1] + θ2E[yt−2yt−1] + E[εtyt−1]
    γ1 = θ1γ0 + θ2γ1.

Multiplying instead with yt−2 yields

    E[ytyt−2] = θ1E[yt−1yt−2] + θ2E[yt−2yt−2] + E[εtyt−2]
    γ2 = θ1γ1 + θ2γ0.

Multiplying instead with yt−3 yields

    E[ytyt−3] = θ1E[yt−1yt−3] + θ2E[yt−2yt−3] + E[εtyt−3]
    γ3 = θ1γ2 + θ2γ1.

• These are the so-called Yule-Walker equations.
• To find the variance we can substitute γ1 and γ2 into the equation for γ0. This is, however, a bit tedious.

• We can find the autocorrelations, ρk = γk/γ0, as

    ρ1 = θ1 + θ2ρ1
    ρ2 = θ1ρ1 + θ2
    ρk = θ1ρk−1 + θ2ρk−2,  k ≥ 3,

or alternatively

    ρ1 = θ1/(1 − θ2)
    ρ2 = θ1²/(1 − θ2) + θ2
    ρk = θ1ρk−1 + θ2ρk−2,  k ≥ 3.
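The closed-form solutions and the recursion for k ≥ 3 fit in a few lines; a sketch in plain Python with illustrative parameter values (chosen to satisfy stationarity):

```python
theta1, theta2 = 0.5, 0.3    # illustrative stationary AR(2) parameters

# closed forms from the first two Yule-Walker equations
rho1 = theta1 / (1 - theta2)
rho2 = theta1 * rho1 + theta2

# higher lags follow rho_k = theta1*rho_{k-1} + theta2*rho_{k-2}
rho = [1.0, rho1, rho2]
for k in range(3, 11):
    rho.append(theta1 * rho[k - 1] + theta2 * rho[k - 2])
print(rho[:5])
```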
Examples of AR(2) Models

[Figure: simulated paths (T = 100) of yt = 0.50·yt−1 + 0.40·yt−2 + εt, yt = −0.80·yt−2 + εt, and yt = 1.30·yt−1 − 0.80·yt−2 + εt, each with theoretical and estimated ACF and PACF for lags 0-20.]
Dynamic Properties of an MA(1) Model

• Consider the MA(1) model

    Yt = µ + εt + αεt−1.

• The mean is given by

    E[Yt] = E[µ + εt + αεt−1] = µ,

which is here identical to the constant term.

• Define the deviation from the mean: yt = Yt − µ.

• Next we find the variance:

    V[Yt] = E[yt²] = E[(εt + αεt−1)²] = E[εt²] + E[α²εt−1²] + E[2αεtεt−1] = (1 + α²)σ².
• The covariances, Cov[yt, yt−k] = E[ytyt−k], are given by

    γ1 = E[ytyt−1]
       = E[(εt + αεt−1)(εt−1 + αεt−2)]
       = E[εtεt−1 + αεtεt−2 + αεt−1² + α²εt−1εt−2] = ασ²
    γ2 = E[ytyt−2]
       = E[(εt + αεt−1)(εt−2 + αεt−3)]
       = E[εtεt−2 + αεtεt−3 + αεt−1εt−2 + α²εt−1εt−3] = 0
    ...
    γk = E[ytyt−k] = 0,  k ≥ 2.

• The ACF is given by

    ρ1 = γ1/γ0 = ασ²/((1 + α²)σ²) = α/(1 + α²)
    ρk = 0,  k ≥ 2.
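The cutoff in the ACF after lag one is easy to confirm by simulation. A sketch assuming NumPy, with α = 0.9 as an illustrative value:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, T = 0.9, 100_000
eps = rng.standard_normal(T + 1)
y = eps[1:] + alpha * eps[:-1]      # MA(1): y_t = eps_t + alpha*eps_{t-1}

def acf(y, k):
    """Empirical autocorrelation at lag k."""
    yd = y - y.mean()
    return np.dot(yd[k:], yd[:-k]) / np.dot(yd, yd)

print(acf(y, 1), alpha / (1 + alpha**2))    # should be close
print(acf(y, 2), acf(y, 3))                 # should be near zero
```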
Examples of MA Models

[Figure: simulated paths (T = 100) of yt = εt − 0.90·εt−1, yt = εt + 0.90·εt−1, and yt = εt − 0.60·εt−1 + 0.50·εt−2 + 0.50·εt−3, each with theoretical and estimated ACF and PACF for lags 0-20.]
The Lag and Difference Operators

• Now we introduce an important tool called the lag-operator, L. It has the property that

    L·yt = yt−1,

and, for example,

    L²yt = L(Lyt) = Lyt−1 = yt−2.

• Also define the first difference operator, ∆ = 1 − L, such that

    ∆yt = (1 − L)yt = yt − Lyt = yt − yt−1.

• The operators L and ∆ are not functions, but can be used in calculations.
Lag Polynomials

• Consider as an example the AR(2) model

    yt = θ1yt−1 + θ2yt−2 + εt.

This can be written as

    yt − θ1yt−1 − θ2yt−2 = εt
    yt − θ1Lyt − θ2L²yt = εt
    (1 − θ1L − θ2L²)yt = εt
    θ(L)yt = εt,

where

    θ(L) = 1 − θ1L − θ2L²

is a polynomial in L, denoted a lag-polynomial.

• Standard rules for calculating with polynomials also hold for polynomials in L.
Characteristic Equations and Roots

• For a model

    yt − θ1yt−1 − θ2yt−2 = εt
    θ(L)yt = εt,

we define the characteristic equation as

    θ(z) = 1 − θ1z − θ2z² = 0.

The solutions, z1 and z2, are denoted characteristic roots.

• An AR(p) has p roots. Some of them may be complex values, h ± v·i, where i = √−1.

• Recall that the roots can be used for factorizing the polynomial

    θ(z) = 1 − θ1z − θ2z² = (1 − φ1z)(1 − φ2z),

where φ1 = z1⁻¹ and φ2 = z2⁻¹ are the inverse roots.
Invertibility of Polynomials

• Define the inverse of a polynomial, θ⁻¹(L) of θ(L), so that

    θ⁻¹(L)θ(L) = 1.

• Consider the AR(1) case, θ(L) = 1 − θL, and look at the product

    (1 − θL)(1 + θL + θ²L² + θ³L³ + ... + θ^k L^k)
      = (1 − θL) + (θL − θ²L²) + (θ²L² − θ³L³) + (θ³L³ − θ⁴L⁴) + ...
      = 1 − θ^(k+1) L^(k+1).

If |θ| < 1, it holds that θ^(k+1) L^(k+1) → 0 as k → ∞, implying that

    θ⁻¹(L) = (1 − θL)⁻¹ = 1/(1 − θL) = 1 + θL + θ²L² + θ³L³ + ... = Σ_{i=0}^∞ θ^i L^i.

• If θ(L) is a finite polynomial, the inverse polynomial, θ⁻¹(L), is infinite.
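The telescoping product above is easy to verify numerically: `np.convolve` multiplies two polynomials given their coefficient sequences in powers of L. A small check with illustrative values of θ and k:

```python
import numpy as np

theta, k = 0.8, 10
a = np.array([1.0, -theta])            # coefficients of 1 - theta*L
b = theta ** np.arange(k + 1)          # coefficients of 1 + theta*L + ... + theta^k L^k
prod = np.convolve(a, b)               # polynomial product in L
print(prod)                            # 1, then zeros, then -theta^(k+1)
```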
ARMA Models in AR and MA Form

• Using lag polynomials we can rewrite the stationary ARMA(p,q) model as

    yt − θ1yt−1 − ... − θpyt−p = εt + α1εt−1 + ... + αqεt−q    (∗)
    θ(L)yt = α(L)εt,

where θ(L) and α(L) are finite polynomials.

• If θ(L) is invertible, (∗) can be written as the infinite MA(∞) model

    yt = θ⁻¹(L)α(L)εt
    yt = εt + γ1εt−1 + γ2εt−2 + ...

This is called the MA representation.

• If α(L) is invertible, (∗) can be written as an infinite AR(∞) model

    α⁻¹(L)θ(L)yt = εt
    yt − γ1yt−1 − γ2yt−2 − ... = εt.

This is called the AR representation.
Invertibility and Stationarity

• A finite order MA process is stationary by construction.
— It is a linear combination of stationary white noise terms.
— Invertibility is sometimes convenient for estimation and prediction.

• An infinite MA process is stationary if the coefficients, αi, converge to zero.
— We require that Σ_{i=1}^∞ αi² < ∞.

• An AR process is stationary if θ(L) is invertible.
— This is important for interpretation and inference.
— In the case of a root at unity standard results no longer hold. We return to unit roots later.
• Consider again the AR(2) model

    θ(z) = 1 − θ1z − θ2z² = (1 − φ1z)(1 − φ2z).

The polynomial is invertible if the factors (1 − φiL) are invertible, i.e. if

    |φ1| < 1 and |φ2| < 1.

• In general a polynomial, θ(L), is invertible if the characteristic roots, z1, ..., zp, are larger than one in absolute value. In complex cases, this corresponds to the roots being outside the complex unit circle (modulus larger than one).

[Figure: the unit circle in the complex plane; invertibility requires the roots to lie outside the circle.]
Solution to the AR(1) Model

• Consider the model

    Yt = δ + θYt−1 + εt
    (1 − θL)Yt = δ + εt.

The solution is given as

    Yt = (1 − θL)⁻¹(δ + εt)
       = (1 + θL + θ²L² + θ³L³ + ...)(δ + εt)
       = (1 + θ + θ² + θ³ + ...)δ + εt + θεt−1 + θ²εt−2 + θ³εt−3 + ...

• This is the MA-representation. The expectation is given by

    E[Yt] = (1 + θ + θ² + θ³ + ...)δ = δ/(1 − θ).
• An alternative solution method is recursive substitution:

    Yt = δ + θYt−1 + εt
       = δ + θ(δ + θYt−2 + εt−1) + εt
       = (1 + θ)δ + εt + θεt−1 + θ²Yt−2
       = (1 + θ)δ + εt + θεt−1 + θ²(δ + θYt−3 + εt−2)
       = (1 + θ + θ²)δ + εt + θεt−1 + θ²εt−2 + θ³Yt−3
       ...
       = (1 + θ + θ² + ... + θ^(t−2))δ + εt + θεt−1 + θ²εt−2 + ... + θ^(t−2)ε2 + θ^(t−1)Y1,

where we see the effect of the initial observation.

• The expectation is

    E[Yt] = (1 + θ + θ² + ... + θ^(t−2))δ + θ^(t−1)Y1 → δ/(1 − θ)  as t → ∞.
ARMA Models and Common Roots

• Consider the stationary ARMA(p,q) model

    yt − θ1yt−1 − ... − θpyt−p = εt + α1εt−1 + ... + αqεt−q
    θ(L)yt = α(L)εt
    (1 − φ1L)(1 − φ2L)···(1 − φpL)yt = (1 − ξ1L)(1 − ξ2L)···(1 − ξqL)εt.

• If φi = ξj for some i, j, they are denoted common roots or canceling roots. The ARMA(p,q) model is then equivalent to an ARMA(p−1,q−1) model.

• As an example, consider

    yt − yt−1 + 0.25yt−2 = εt − 0.5εt−1
    (1 − L + 0.25L²)yt = (1 − 0.5L)εt
    (1 − 0.5L)(1 − 0.5L)yt = (1 − 0.5L)εt
    (1 − 0.5L)yt = εt.
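The cancellation in the example can be verified by multiplying out the AR factors; a small check assuming NumPy:

```python
import numpy as np

# (1 - 0.5L)(1 - 0.5L) should reproduce the AR polynomial 1 - L + 0.25 L^2
ar = np.convolve([1.0, -0.5], [1.0, -0.5])
print(ar)
# The MA polynomial 1 - 0.5L shares the factor (1 - 0.5L), so the common
# root cancels, leaving the AR(1) model (1 - 0.5L) y_t = eps_t.
```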
Unit Roots and ARIMA Models

• A root at one is denoted a unit root. We consider the consequences later; here we just remove them by taking first differences.

• Consider an ARMA(p,q) model

    θ(L)yt = α(L)εt.

If there is a unit root in the AR polynomial, we can factorize it into

    θ(L) = (1 − L)(1 − φ2L)···(1 − φpL) = (1 − L)θ∗(L),

and we can write the model as

    θ∗(L)(1 − L)yt = α(L)εt
    θ∗(L)∆yt = α(L)εt.

• An ARMA(p,q) model for ∆^d yt is denoted an ARIMA(p,d,q) model for yt.
Example: Danish Real House Prices
[Figure: Danish real house prices, 1970-2005 — P = log(HousePrice/CPI) and the first difference ∆P.]
• Estimating an AR(2) model for 1972:1-2004:2 yields

    pt = 1.545·pt−1 − 0.5646·pt−2 + 0.003359,

with t-values 21.0, −7.58, and 1.29. The lag polynomial is given by

    θ(L) = 1 − 1.545·L + 0.5646·L²,

with inverse roots given by 0.953 and 0.592.

• One root is close to unity and we estimate an ARIMA(1,1,0) model for pt:

    ∆pt = 0.5442·∆pt−1 + 0.0008369,

with t-values 7.35 and 0.416. The second root is basically unchanged.
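The quoted inverse roots can be reproduced from the estimated lag polynomial; a small check assuming NumPy (`np.roots` expects the highest power first):

```python
import numpy as np

# characteristic polynomial: theta(z) = 1 - 1.545 z + 0.5646 z^2
roots = np.roots([0.5646, -1.545, 1.0])
inv_roots = np.sort(1 / roots)
print(inv_roots)      # approximately 0.592 and 0.953
```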
ARIMA(p,d,q) Model Selection

• Find a transformation of the process that is stationary, e.g. ∆^d Yt.

• Recall that for the stationary AR(p) model
— The ACF is infinite but convergent.
— The PACF is zero for lags larger than p.

• For the MA(q) model
— The ACF is zero for lags larger than q.
— The PACF is infinite but convergent.

• The ACF and PACF contain information on p and q and can be used to select relevant models.
• If alternative models are nested, they can be tested.
• Model selection can be based on information criteria of the form

    IC = log σ̂² + penalty(T, #parameters),

where the first term measures the fit (the likelihood) and the second penalizes the number of parameters. The information criteria should be minimized!

• Three important criteria:

    AIC = log σ̂² + 2·k/T
    HQ  = log σ̂² + 2·k·log(log(T))/T
    BIC = log σ̂² + k·log(T)/T,

where k is the number of estimated parameters, e.g. k = p + q.
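The criteria are straightforward to compute. The model-selection table later in the lecture appears to report them on the equivalent −2·logL/T scale (which shifts all models by the same constant and so gives the same ranking); a sketch assuming NumPy, checked against the ARMA(1,1) row:

```python
import numpy as np

def info_criteria(loglik, k, T):
    """AIC, HQ and BIC on the -2*logL/T normalization."""
    base = -2 * loglik / T
    return {
        "AIC": base + 2 * k / T,
        "HQ":  base + 2 * k * np.log(np.log(T)) / T,
        "BIC": base + k * np.log(T) / T,
    }

# ARMA(1,1): log-likelihood 299.99333, k = 3 parameters, T = 130 observations
ic = info_criteria(299.99333, 3, 130)
print(ic)   # approximately AIC -4.5691, HQ -4.5422, BIC -4.5030
```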
Example: Consumption-Income Ratio
[Figure: (A) consumption (c) and income (y) in logs, 1970-2000; (B) consumption-income ratio in logs; (C) ACF and (D) PACF for the series in (B), lags 0-20.]
Model T p log-lik SC HQ AIC
ARMA(2,2) 130 5 300.82151 -4.4408 -4.5063 -4.5511
ARMA(2,1) 130 4 300.39537 -4.4717 -4.5241 -4.5599
ARMA(2,0) 130 3 300.38908 -4.5090 -4.5483 -4.5752
ARMA(1,2) 130 4 300.42756 -4.4722 -4.5246 -4.5604
ARMA(1,1) 130 3 299.99333 -4.5030 -4.5422 -4.5691
ARMA(1,0) 130 2 296.17449 -4.4816 -4.5078 -4.5258
ARMA(0,0) 130 1 249.82604 -3.8060 -3.8191 -3.8281
---- Maximum likelihood estimation of ARFIMA(1,0,1) model ----
The estimation sample is: 1971 (1) - 2003 (2)
The dependent variable is: cy (ConsumptionData.in7)
Coefficient Std.Error t-value t-prob
AR-1 0.857361 0.05650 15.2 0.000
MA-1 -0.300821 0.09825 -3.06 0.003
Constant -0.0934110 0.009898 -9.44 0.000
log-likelihood 299.993327
sigma 0.0239986 sigma^2 0.000575934
---- Maximum likelihood estimation of ARFIMA(2,0,0) model ----
The estimation sample is: 1971 (1) - 2003 (2)
The dependent variable is: cy (ConsumptionData.in7)
Coefficient Std.Error t-value t-prob
AR-1 0.536183 0.08428 6.36 0.000
AR-2 0.250548 0.08479 2.95 0.004
Constant -0.0935407 0.009481 -9.87 0.000
log-likelihood 300.389084
sigma 0.0239238 sigma^2 0.000572349
Estimation of ARMA Models

• The natural estimator is maximum likelihood. With normal errors

    log L(θ, α, σ²) = −(T/2)·log(2πσ²) − Σ_{t=1}^T εt²/(2σ²),

where εt is the residual.

• For an AR(1) model we can write the residual as

    εt = Yt − δ − θ·Yt−1,

and OLS coincides with ML.

• It is usual to condition on the initial values. Alternatively we can postulate a distribution for the first observation, e.g.

    Y1 ∼ N(δ/(1 − θ), σ²/(1 − θ²)),

where the mean and variance are chosen as implied by the model for the rest of the observations. We say that Y1 is chosen from the invariant distribution.
• For the MA(1) model

    Yt = µ + εt + αεt−1,

the residuals can be found recursively as a function of the parameters:

    ε1 = Y1 − µ
    ε2 = Y2 − µ − αε1
    ε3 = Y3 − µ − αε2
    ...

Here, the initial value is ε0 = 0, but that could be relaxed if required by using the invariant distribution.

• The likelihood function can be maximized wrt. α and µ.
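The recursion translates directly into code. A sketch on simulated data, assuming NumPy; the grid search stands in for a proper numerical maximization (for fixed µ and σ², maximizing the Gaussian likelihood amounts to minimizing the residual sum of squares):

```python
import numpy as np

def ma1_residuals(y, mu, alpha):
    """Recursive MA(1) residuals, conditioning on eps_0 = 0."""
    eps = np.zeros(len(y))
    prev = 0.0
    for t, yt in enumerate(y):
        eps[t] = yt - mu - alpha * prev
        prev = eps[t]
    return eps

rng = np.random.default_rng(3)
e = rng.standard_normal(2001)
y = 1.0 + e[1:] + 0.5 * e[:-1]     # simulated MA(1): mu = 1, alpha = 0.5

grid = np.linspace(-0.95, 0.95, 191)
rss = [np.sum(ma1_residuals(y, y.mean(), a) ** 2) for a in grid]
alpha_hat = grid[int(np.argmin(rss))]
print(alpha_hat)                    # should be close to 0.5
```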
Forecasting

• It is easy to forecast with ARMA models. The main drawback is that there is no economic insight.

• We want to predict yT+k given all information up to time T, i.e. given the information set

    IT = {y−∞, ..., yT−1, yT}.

The optimal predictor is the conditional expectation

    yT+k|T = E[yT+k | IT].
• Consider the ARMA(1,1) model

    yt = θ·yt−1 + εt + αεt−1,  t = 1, 2, ..., T.

• To forecast we
— Substitute the estimated parameters for the true ones.
— Use estimated residuals up to time T. Hereafter, the best forecast is zero.

• The optimal forecasts will be

    yT+1|T = E[θ·yT + εT+1 + α·εT | IT] = θ̂·yT + α̂·ε̂T
    yT+2|T = E[θ·yT+1 + εT+2 + α·εT+1 | IT] = θ̂·yT+1|T.
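The two forecast steps generalize to any horizon: the MA term contributes only at the first step, after which the forecast decays autoregressively. A sketch in plain Python; the function name and all parameter values (θ̂ = 0.8, α̂ = 0.3, yT = 1.0, ε̂T = 0.2) are hypothetical:

```python
def arma11_forecast(y_T, eps_T, theta, alpha, horizon):
    """Point forecasts y_{T+1|T}, ..., y_{T+horizon|T} for an ARMA(1,1) model.

    Shocks after time T are forecast as zero, so the MA term only
    contributes at the first horizon."""
    forecasts = []
    prev_y, prev_eps = y_T, eps_T
    for _ in range(horizon):
        f = theta * prev_y + alpha * prev_eps
        forecasts.append(f)
        prev_y, prev_eps = f, 0.0   # beyond T+1, forecasts decay by theta
    return forecasts

print(arma11_forecast(1.0, 0.2, 0.8, 0.3, 3))
```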
[Figure: forecasts of the consumption-income ratio, 1990-2008, together with actual values.]