5. Estimation of Kalman Filters

1 Today's Agenda

1. Filtering
2. Examples of filters
3. Maximum Likelihood Estimation
4. Kalman Filtering
5. Estimation of Kalman Filters
6. Next: Spectral Representation of time series


Final:
• Open book
• Study:
– Lecture notes
– Book
– Homework
– Additional Material


2 Filtering

• Filtering is about transforming a time series with certain "characteristics" or properties into another time series with different characteristics.

• The concept of filtering is very general, and we will see it now in the time domain and later in the frequency domain.

• Filtering is best explained by examples.
• Define the lag operator L to be such that, for a time series $X_t$,

$$L X_t = X_{t-1}$$

(sometimes you will see L denoted as B, which people call the "backward" operator)

• The lag operator will be treated as a "number," i.e. we can define all the usual math operations on it, such as addition, multiplication, inversion, etc.
• For example, we can write

$$L^2 X_t = L(L X_t) = X_{t-2}, \qquad \frac{X_t}{1-L} = \sum_{j=0}^{\infty} L^j X_t = \sum_{j=0}^{\infty} X_{t-j}$$


• Example: We can represent the familiar AR(1) process $Y_t = \phi Y_{t-1} + \varepsilon_t$ as:

$$Y_t = \phi L Y_t + \varepsilon_t$$

or

$$Y_t - \phi L Y_t = \varepsilon_t, \qquad (1 - \phi L) Y_t = \varepsilon_t$$

• Example: We can represent an AR(2) process $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t$ as:

$$Y_t = \phi_1 L Y_t + \phi_2 L^2 Y_t + \varepsilon_t$$

or

$$Y_t - \phi_1 L Y_t - \phi_2 L^2 Y_t = \varepsilon_t, \qquad (1 - \phi_1 L - \phi_2 L^2) Y_t = \varepsilon_t$$


• In the AR(1) case, suppose that $\phi = 0.95$. Then the process is persistent.

• But suppose we have an AR(2) process with $\phi_1 = 0.95$ and $\phi_2 = -0.2$.

• Q: Is the process persistent?
• A:
• Q: What exactly do we mean by a "persistent" process?
• A: We want to know:

$$\frac{\partial y_{t+k}}{\partial \varepsilon_t}$$

• In other words, if there is a shock, or "news," how long the news would impact the process and to what extent.

• In the AR(1) case,

$$y_{t+k} = \phi y_{t+k-1} + \varepsilon_{t+k} = \varepsilon_{t+k} + \phi \varepsilon_{t+k-1} + \dots + \phi^k \varepsilon_t + \phi^{k+1} y_{t-1}$$

so that

$$\frac{\partial y_{t+k}}{\partial \varepsilon_t} = \phi^k$$

• $\partial y_{t+k} / \partial \varepsilon_t$ (as a function of k) is known as an "impulse response" function.

• But what about in the AR(2) or AR(k) case?


• We can write an AR(2) process as a vector AR(1) process:

$$\begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}, \qquad Y_t = \Phi Y_{t-1} + \epsilon_t$$

• Then the dynamics of $Y_t$ are driven by $\Phi$, because

$$Y_{t+k} = \epsilon_{t+k} + \dots + \Phi^k \epsilon_t + \Phi^{k+1} Y_{t-1}$$

• So, we have to characterize the behavior of $\Phi^k$.
• But $\Phi$ is a matrix and everything can get pretty messy.


• But there is an easy way out. Write:

$$\Phi = T \Lambda T^{-1}$$

where T is the matrix of eigenvectors and $\Lambda$ is a diagonal matrix with the eigenvalues along the diagonal.
• Then:

$$\Phi^2 = \Phi\Phi = T \Lambda T^{-1} T \Lambda T^{-1} = T \Lambda^2 T^{-1}$$

and, more generally,

$$\Phi^k = T \Lambda^k T^{-1}$$

• So the eigenvalues in $\Lambda$ will govern the behavior of $\Phi^k$. In particular, the eigenvalue that is largest in absolute value will be the most important to consider.


• Recall that the eigenvalues of $\Phi$ (in the AR(2) case) are the solutions to:

$$\left| \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0$$

• Or:

$$\lambda^2 - \phi_1 \lambda - \phi_2 = 0$$


• We can view $(1 - \phi_1 L - \phi_2 L^2)$ as a polynomial in the lag operator. Therefore, we can look for roots, or numbers $\lambda_1$ and $\lambda_2$ such that

$$(1 - \phi_1 L - \phi_2 L^2) = (1 - \lambda_1 L)(1 - \lambda_2 L) = 0$$
$$(1 - (\lambda_1 + \lambda_2) L + \lambda_1 \lambda_2 L^2) = 0$$

• Therefore, matching powers of L, we can find the roots by solving the equations $\lambda_1 + \lambda_2 = \phi_1$ and $\lambda_1 \lambda_2 = -\phi_2$.


• Example: Suppose $\phi_1 = 0.6$ and $\phi_2 = -0.08$, so that $Y_t = 0.6 L Y_t - 0.08 L^2 Y_t + \varepsilon_t$. You can check that

$$(1 - 0.6L + 0.08L^2) = (1 - 0.4L)(1 - 0.2L)$$

• But we want to be able to find $\lambda_1$ and $\lambda_2$ in general situations. How to do that?
• Think of a generic polynomial in z (we use z because the L operator has a particular meaning, i.e. it is the lag operator), as above:

$$(1 - \phi_1 z - \phi_2 z^2) = (1 - \lambda_1 z)(1 - \lambda_2 z)$$

• Now, we ask, what values of z will set the right-hand side to zero?
• The answer is: $z = 1/\lambda_1$ or $z = 1/\lambda_2$.
• But why is it important?
• Well, the same z must also set the left-hand side to zero.
• But we also know that the z that sets $(1 - \phi_1 z - \phi_2 z^2) = 0$ can be found by

$$z_{1,2} = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2}$$

• In other words, $z_1 = 1/\lambda_1$ and $z_2 = 1/\lambda_2$.


• So, can we always find the $\lambda$'s from the z's by inverting?
• Yes, except for the case when $\phi_1^2 + 4\phi_2 < 0$, because then we will have trouble evaluating $\sqrt{\phi_1^2 + 4\phi_2}$.
• In such a case $z_1$ and $z_2$ are complex numbers (and complex conjugates).

• We can write $z_1 = a + bi$, where $i = \sqrt{-1}$. You can check from the quadratic formula above that $a = \phi_1/(-2\phi_2)$ and $b = \sqrt{-\phi_1^2 - 4\phi_2}/(-2\phi_2)$.

• We can write the complex number $z_1$ in polar coordinates as

$$z_1 = R[\cos\theta + i\sin\theta]$$

where

$$R = \sqrt{a^2 + b^2}, \qquad \cos\theta = a/R, \qquad \sin\theta = b/R$$

• Now, we will use DeMoivre's theorem to write

$$z_1 = R[\cos\theta + i\sin\theta] = R e^{i\theta}$$

• Similarly

$$z_2 = R[\cos\theta - i\sin\theta] = R e^{-i\theta}$$


• Therefore

$$\lambda_1 = z_1^{-1} = R^{-1} e^{-i\theta}, \qquad \lambda_2 = z_2^{-1} = R^{-1} e^{i\theta}$$

• The possibility of having complex roots is really cool. We can see that an AR(2) process with complex roots can generate a process that has a very pronounced cyclical component (sine and cosine behavior).
• Q: Can we have cyclical behavior in an AR(1) process?

• We will come back to these same ideas when we get into the frequency domain representation of a series.
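• A small numerical illustration of the cycle (my own sketch, assuming numpy; the values $\phi_1 = 1.0$, $\phi_2 = -0.5$ are chosen so that $\phi_1^2 + 4\phi_2 < 0$):

```python
# Sketch: an AR(2) with complex roots has a damped, oscillating impulse response
# with cycle length 2*pi/theta, in the notation lambda_{1,2} = R^{-1} e^{-/+ i theta}.
import numpy as np

phi1, phi2 = 1.0, -0.5                  # phi1^2 + 4*phi2 = -1 < 0
lam = np.roots([1.0, -phi1, -phi2])     # solves lambda^2 - phi1*lambda - phi2 = 0
modulus = np.abs(lam[0])                # |lambda| = 1/R: the damping per period
theta = np.angle(lam[0])
print(modulus, 2 * np.pi / abs(theta))  # here: 0.707 and a cycle of 8 periods

# the impulse response follows the AR recursion and oscillates
irf = np.zeros(30)
irf[0], irf[1] = 1.0, phi1
for k in range(2, 30):
    irf[k] = phi1 * irf[k - 1] + phi2 * irf[k - 2]
```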


• As a final remark, the lag operator is extremely useful for manipulating time series. For instance, we can find the mean and the variance of an AR(1) process simply as:

$$Y_t = c + \phi L Y_t + \varepsilon_t$$

Hence, denoting $E(Y_t) = \mu$ (and noting that the lag of a constant is the constant itself), we obtain

$$\mu = c + \phi L \mu = c + \phi \mu \quad\Longrightarrow\quad \mu = \frac{c}{1 - \phi}$$

• Similarly for the variance. Denoting $Var(Y_t) = \gamma$, we can write

$$\gamma = \phi^2 \gamma + \sigma^2 \quad\Longrightarrow\quad \gamma = \frac{\sigma^2}{1 - \phi^2}$$


• Now that we are comfortable with the lag operator and polynomials in the lag operator, here is a filtering example:
• Suppose that $\varepsilon_t$ is an iid series with variance $Var(\varepsilon_t) = \sigma^2$.
• We are going to filter, or transform, the series $\varepsilon_t$ into another series, $X_t$.
• First, define the filter

$$F(L) = (1 - \theta L)$$

• Second, we obtain the $X_t$ series as

$$X_t = F(L)\varepsilon_t = (1 - \theta L)\varepsilon_t = \varepsilon_t - \theta\varepsilon_{t-1}$$

• The series $X_t$ is called a moving average of order 1, or MA(1), process.


• Note that the properties of $X_t$ are different from those of $\varepsilon_t$:
– $Var(\varepsilon_t) = \sigma^2$, $Cov(\varepsilon_t, \varepsilon_{t-k}) = 0$
– $Var(X_t) = \sigma^2(1 + \theta^2) > Var(\varepsilon_t)$
– $Cov(X_t, X_{t-1}) = Cov([\varepsilon_t - \theta\varepsilon_{t-1}][\varepsilon_{t-1} - \theta\varepsilon_{t-2}]) = -\theta\sigma^2$
– $Cov(X_t, X_{t-j}) = 0$, $j > 1$.
• So, the filter induces a slight (one-period) serial correlation in $X_t$.


• Suppose we define another filter, $G(L) = F(L)^{-1} = 1/(1 - \theta L)$.
• We will filter $\varepsilon_t$ to obtain a series $Y_t$, such that

$$Y_t = G(L)\varepsilon_t$$

• Note that $X_t$ and $Y_t$ are obtained from the same $\{\varepsilon_t\}$ process, but by applying different filters.
• We will investigate the properties of $Y_t$:

$$Y_t = \frac{\varepsilon_t}{1 - \theta L} \quad\Longrightarrow\quad Y_t(1 - \theta L) = \varepsilon_t \quad\Longrightarrow\quad Y_t = \theta Y_{t-1} + \varepsilon_t$$

• $Y_t$ is our familiar AR(1) process. We know that
– $Var(Y_t) = \frac{\sigma^2}{1 - \theta^2}$
– $Cov(Y_t, Y_{t-k}) = \frac{\sigma^2 \theta^k}{1 - \theta^2}$
• Therefore, the filter G(L) induces a persistence that lasts longer than only one period.


• The $H(L) = (1 - L)$ filter.
• Recall that if we have a series that is non-stationary, my advice was to take the first difference, and the new series would be stationary. We will illustrate this advice with a (quite general) example.
• Suppose we have a series, $Y_t$, composed of a stationary and a non-stationary series:

$$Y_t = Z_t + X_t$$
$$Z_t = Z_{t-1} + e_t$$
$$X_t = \phi X_{t-1} + u_t, \quad |\phi| < 1$$

• Note that we can rewrite $Z_t(1 - L) = e_t$ and $X_t(1 - \phi L) = u_t$.


• Therefore, we can write

$$Y_t = Z_t + X_t = \frac{e_t}{1 - L} + \frac{u_t}{1 - \phi L}$$
$$(1 - L)(1 - \phi L) Y_t = e_t (1 - \phi L) + u_t (1 - L)$$
$$(1 - L)(1 - \phi L) Y_t = \eta_t$$

• But, from here we can immediately see that one of the roots of the polynomial in L is 1. Therefore, this process will be non-stationary.

• Q: Why don’t we like non-stationary processes?


• We will stationarize $Y_t$ by filtering it with H(L).
• Let's call the filtered series $\tilde{Y}_t = H(L) Y_t$:

$$\tilde{Y}_t = (1 - L)\left[\frac{e_t}{1 - L} + \frac{u_t}{1 - \phi L}\right] = e_t + \frac{u_t (1 - L)}{1 - \phi L} = e_t + \frac{u_t}{1 - \phi L} - \frac{u_{t-1}}{1 - \phi L}$$
$$= \text{stationary} + \text{stationary} + \text{stationary} = \text{stationary}$$

• Therefore, the filter H(L) transformed the non-stationary series $Y_t$ into a stationary series $\tilde{Y}_t$.


• To summarize:
– We have defined the lag operator L
– Using the lag operator, we can define a filter function F(L)
– F(L) modifies the properties of a series $Y_t$
– It is useful to modify the properties of $Y_t$ (think stationarity, seasonality, etc.)
• Filtering is used everywhere:
– Stereos
– Cell phones
– The search for extra-terrestrials


3 Examples of filters

• We have already given a few examples of familiar filters. Here we discuss some more:
• Intuitively, we know that when we take a moving average of a series, we effectively smooth the series, i.e. we iron out the jagged, fast-moving fluctuations.
• Example:


[Figure 1: unfiltered $Y_t$ and filtered $Y_t$, plotted for t = 0, ..., 100]

• In the above example, we have created a series $Y_t = Z_t + X_t$.

• $Z_t = 0.9 Z_{t-1} + e_t$ and $X_t = 0.1 X_{t-1} + u_t$
• We think of $Z_t$ as the strong "signal," while $X_t$ is the almost-iid noise.
• In other words, $Z_t$ is responsible for the big swings in $Y_t$, whereas the jagged peaks are due to $X_t$.


• We want to smooth out the noise and be left with the signal.
• We can do that, as in the figure, by applying the filter

$$F(L) = \tfrac{1}{3}L + \tfrac{1}{3}L^0 + \tfrac{1}{3}L^{-1}$$

to $Y_t$.
• The filtered series is:

$$\tilde{Y}_t = F(L) Y_t = \left(\tfrac{1}{3}L + \tfrac{1}{3}L^0 + \tfrac{1}{3}L^{-1}\right) Y_t = \tfrac{1}{3}Y_{t-1} + \tfrac{1}{3}Y_t + \tfrac{1}{3}Y_{t+1}$$

• This is nothing but a moving average of $Y_t$ and its two adjacent values.

• In a similar fashion, we can smooth the series even more by taking a longer moving average.

• Example:

$$F(L) = \tfrac{1}{7}L^3 + \tfrac{1}{7}L^2 + \tfrac{1}{7}L + \tfrac{1}{7}L^0 + \tfrac{1}{7}L^{-1} + \tfrac{1}{7}L^{-2} + \tfrac{1}{7}L^{-3}$$


[Figure 2: unfiltered $Y_t$, the MA(3)-filtered series, and the MA(7)-filtered series, plotted for t = 0, ..., 100]

• The second filter, the MA(7) filter, induces an even smoother behavior in the filtered series.
• So, by filtering the series, we have effectively removed the components of $Y_t$ that are moving fast, or that induce the mostly unpredictable behavior in $Y_t$.
• If you have used some of the financial websites (Yahoo!), you know that they offer the option of displaying a price using an MA filter.
• But $F(L) = \tfrac{1}{7}L^3 + \tfrac{1}{7}L^2 + \tfrac{1}{7}L + \tfrac{1}{7}L^0 + \tfrac{1}{7}L^{-1} + \tfrac{1}{7}L^{-2} + \tfrac{1}{7}L^{-3}$ is a special filter. It is:
– Two-sided
– Symmetric
– Equally weighted
• The design of the filter depends on the application.


• The MA filter is the single most widely used filter in applied work (justifiably so, since it is simple and does the job).
• However, suppose we want to filter out a specific feature of the data.
– For example, corporate taxes are due quarterly. Therefore, any series that is tightly related to taxes will have a quarterly-frequency component to it. In other words, every time taxes are due, the series will exhibit a certain pattern.
– We might not want this quarterly pattern in taxes to influence our results. In other words, we want to seasonally adjust (or filter) the series so that quarterly fluctuations are not part of the new series. We can do that with a quarterly filter.
– There are prices, such as prices of commodities and utilities, that have a seasonal component. An entire industry of analysts is trying to forecast such fluctuations. For example, it is widely believed that the price of crude (and refined) oil has a 6-month cycle, peaking in the summer (travel season) and the winter (heating).
– Although this is usually (historically) true, we should not forget that we are dealing with random variables. For example, last year, the price of oil DECREASED during the summer months.


• So, the question is, how can we isolate the features of the data that happen every 3 or every 6 months?
• Can we decompose a series into periodic components, where each component has a different periodicity?
• Yes, but for that we will need a great deal of tools (next time).
• Before we leave the time domain representation, we will introduce the Kalman Filter and its estimation.


4 Maximum Likelihood Estimation

(Preliminaries for Kalman Filtering)
• Suppose we have the series $\{Y_1, Y_2, \dots, Y_T\}$ with a joint density $f_{Y_T \dots Y_1}(\theta)$ that depends on some parameters $\theta$ (such as means, variances, etc.)
• We observe a realization of $Y_t$.
• If we make some functional assumptions on f, we can think of f as the probability of having observed this particular sample, given the parameters $\theta$.
• The maximum likelihood estimate (MLE) of $\theta$ is the value of the parameters $\theta$ for which this sample is most likely to have been observed.
• In other words, $\hat{\theta}_{MLE}$ is the value that maximizes $f_{Y_T \dots Y_1}(\theta)$.


• Q: But how do we know what f (the true density of the data) is?
• A: We don't.
• Usually, we assume that f is normal, but this is strictly for simplicity. The fact that we have to make distributional assumptions limits the use of MLE in many financial applications.
• Recall that if the $Y_t$ are independent over time, then

$$f_{Y_T \dots Y_1}(\theta) = f_{Y_T}(\theta) f_{Y_{T-1}}(\theta) \cdots f_{Y_1}(\theta) = \prod_{i=1}^{T} f_{Y_i}(\theta)$$

• Sometimes it is more convenient to take the log of the likelihood function; then

$$\Lambda(\theta) = \log f_{Y_T \dots Y_1}(\theta) = \sum_{i=1}^{T} \log f_{Y_i}(\theta)$$


• However, in most time series applications, the independence assumption is untenable. Instead, we use a conditioning trick.
• Recall that

$$f_{Y_2 Y_1} = f_{Y_2 | Y_1} f_{Y_1}$$

• In a similar fashion, we can write

$$f_{Y_T \dots Y_1}(\theta) = f_{Y_T | Y_{T-1} \dots Y_1}(\theta)\, f_{Y_{T-1} | Y_{T-2} \dots Y_1}(\theta) \cdots f_{Y_1}(\theta)$$

• The log likelihood can be expressed as

$$\Lambda(\theta) = \log f_{Y_T \dots Y_1}(\theta) = \sum_{i=1}^{T} \log f_{Y_i | Y_{i-1}, \dots, Y_1}(\theta)$$


• Example: The log-likelihood of an AR(1) process

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t$$

• Suppose that $\varepsilon_t$ is iid $N(0, \sigma^2)$.
• Recall that $E(Y_t) = \frac{c}{1-\phi}$ and $Var(Y_t) = \frac{\sigma^2}{1-\phi^2}$.
• Since $Y_t$ is a linear function of the $\varepsilon_t$'s, it is also normal (a sum of normals is normal).
• Therefore, the (unconditional) density of $Y_t$ is normal.
• Result: If $Y_1$ and $Y_2$ are jointly normal, then the marginals are also normal.
• Therefore, $f_{Y_2|Y_1}$ is $N(c + \phi y_1, \sigma^2)$, or

$$f_{Y_2|Y_1} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_2 - c - \phi y_1)^2}{2\sigma^2} \right]$$


• Similarly, $f_{Y_3|Y_2}$ is $N(c + \phi y_2, \sigma^2)$, or

$$f_{Y_3|Y_2} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_3 - c - \phi y_2)^2}{2\sigma^2} \right]$$


Then the log likelihood can be written as

$$\Lambda(\theta) = \log f_{Y_1} + \sum_{t=2}^{T} \log f_{Y_t|Y_{t-1}}$$
$$= -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{\{y_1 - c/(1-\phi)\}^2}{2\sigma^2/(1-\phi^2)}$$
$$\quad - \frac{T-1}{2}\log(2\pi) - \frac{T-1}{2}\log(\sigma^2) - \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2}$$

• The unknown parameters are collected in $\theta = (c, \phi, \sigma)$.
• We can maximize $\Lambda(\theta)$ with respect to all those parameters and find the estimates that maximize the probability of having observed such a sample:

$$\max_{\theta} \Lambda(\theta)$$

• Sometimes, we can even impose constraints (such as $|\phi| < 1$).
• Q: Is it necessary to impose the constraint $\sigma^2 > 0$?


• Note: If we forget the first observation, then we can write (setting c = 0) the first-order condition:

$$-\sum_{t=2}^{T} \frac{\partial}{\partial\phi} \frac{(y_t - \phi y_{t-1})^2}{2\sigma^2} = 0$$
$$\sum_{t=2}^{T} y_{t-1}(y_t - \phi y_{t-1}) = 0$$
$$\hat{\phi} = \frac{\sum_{t=2}^{T} y_{t-1} y_t}{\sum_{t=2}^{T} y_{t-1}^2}$$

• RESULT: In the univariate linear regression case, OLS, GMM, and MLE are equivalent!!!


• To summarize the maximum likelihood principle:
(a) Make a distributional assumption about the data
(b) Use the conditioning to write the joint likelihood function
(c) For convenience, we work with the log-likelihood function
(d) Maximize the likelihood function with respect to the parameters
• There are some subtle points:
– We had to specify the unconditional distribution of the first observation
– We had to make an assumption about the dependence in the series
• But sometimes, MLE is the only way to go.
• MLE is particularly appealing if we know the distribution of the series. Most other deficiencies can be circumvented.


• Now, you will ask: What are the properties of $\hat{\theta}_{MLE}$? More specifically, is it consistent? What is its distribution, where

$$\hat{\theta}_{MLE} = \arg\max_{\theta} \Lambda(\theta)$$

• Yes, $\hat{\theta}_{MLE}$ is a consistent estimator of $\theta$.
• As you probably expect, the asymptotic distribution of $\hat{\theta}_{MLE}$ is normal.
• Result:

$$T^{1/2}\left(\hat{\theta}_{MLE} - \theta\right) \sim_a N(0, V)$$
$$V = \left[ -\frac{\partial^2 \Lambda(\theta)}{\partial\theta\,\partial\theta'} \bigg|_{\hat{\theta}_{MLE}} \right]^{-1}$$

or

$$V = \left[ \sum_{t=1}^{T} l\big(\hat{\theta}_{MLE}, y_t\big)\, l\big(\hat{\theta}_{MLE}, y_t\big)' \right]^{-1}, \qquad l\big(\hat{\theta}_{MLE}, y_t\big) = \frac{\partial \log f}{\partial\theta}\big(\hat{\theta}_{MLE}, y_t\big)$$

• But we will not dwell on proving those properties.


• Now we are ready to plunge into Kalman Filtering!!!!
• We will use MLE to estimate a complicated system of observable and unobservable components.
• As I mentioned at the beginning of this course, the Kalman Filtering techniques used in finance are not exactly analogous to those used in engineering.


5 Kalman Filtering

• History: Kalman (1960) paper
• Problem: We have a missile that we want to guide to its proper target.
– The trajectory of the missile IS observable from the control center.
– Most other circumstances, such as weather conditions, possible interception methods, etc., are NOT observable, but can be forecast.
– We want to guide the missile to its proper destination.
• In finance the setup is very similar, but the problem is different.
• In the missile case, the parameters of the system are known. The interest is, given those parameters, to control the missile to its proper destination.
• In finance, we want to estimate the parameters of the system. We are usually not concerned with a control problem, because there are very few instruments we can use as controls (although there are counter-examples).


5.1 Setup (Hamilton Ch. 13)

$$y_t = A' x_t + H' z_t + w_t$$
$$z_t = F z_{t-1} + v_t$$

where
• $y_t$ is the observable variable (think "returns")
– The first equation, the $y_t$ equation, is called the "space" or the "observation" equation.
• $z_t$ is the unobservable variable (think "volatility" or "state of the economy")
– The second equation, the $z_t$ equation, is called the "state" equation.
• $x_t$ is a vector of exogenous (or predetermined) variables (we can set $x_t = 0$ for now).
• $v_t$ and $w_t$ are iid and assumed to be uncorrelated at all lags:

$$E(w_t v_t') = 0$$

• Also $E(v_t v_t') = Q$ and $E(w_t w_t') = R$.
• The system of equations is known as a state-space representation.
• Any time series can be written in a state-space representation.


• In standard engineering problems, it is assumed that we know the parameters A, H, F, Q, R.
• The problem is to give impulses $x_t$ such that, given the states $z_t$, the missile is guided as closely to target as possible.
• In finance, we want to estimate the unknown parameters A, H, F, Q, R in order to understand where the system is going, given the states $z_t$. There is little attempt at guiding the system. In fact, we usually assume that $x_t = 1$ and $A = E(Y_t)$, or even that $x_t = 0$.


• Note: Any time series can be written as a state space.
• Example: AR(2): $Y_{t+1} - \mu = \phi_1(Y_t - \mu) + \phi_2(Y_{t-1} - \mu) + \varepsilon_{t+1}$
• State equation:

$$\begin{bmatrix} Y_{t+1} - \mu \\ Y_t - \mu \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} Y_t - \mu \\ Y_{t-1} - \mu \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}$$

• Observation equation:

$$y_t = \mu + \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} Y_t - \mu \\ Y_{t-1} - \mu \end{bmatrix}$$

• There are other state-space representations of $Y_t$. Can you write down another one?


• As a first step, we will assume that A, H, F, Q, R are known.
• Our goal will be to find the best linear forecast of the state (unobserved) vector $z_t$. Such a forecast is needed in control problems (to take decisions) and in finance (state of the economy, forecasts of unobserved volatility).
• The forecasts will be denoted by:

$$z_{t+1|t} = E(z_{t+1} | y_t, \dots, x_t, \dots)$$

and we assume that we are only taking linear projections of $z_{t+1}$ on $y_t, \dots, x_t, \dots$. Nonlinear Kalman Filters exist, but the results are a bit more complicated.
• The Kalman Filter calculates the forecasts $z_{t+1|t}$ recursively, starting with $z_{1|0}$, then $z_{2|1}$, ..., until $z_{T|T-1}$.
• Since $z_{t|t-1}$ is a forecast, we can ask how good a forecast it is.
• Therefore, we define

$$P_{t|t-1} = E\left[\left(z_t - z_{t|t-1}\right)\left(z_t - z_{t|t-1}\right)'\right]$$

which is the mean squared error (MSE) of the recursive forecast $z_{t|t-1}$.


• The Kalman Filter can be broken down into 5 steps:

1. Initialization of the recursion. We need $z_{1|0}$. Usually, we take $z_{1|0}$ to be the unconditional mean, $z_{1|0} = E(z_1)$. (Q: how can we estimate $E(z_1)$?) The associated error with this forecast is

$$P_{1|0} = E\left[\left(z_{1|0} - z_1\right)\left(z_{1|0} - z_1\right)'\right]$$


2. Forecasting $y_t$ (intermediate step). The ultimate goal is to calculate $z_{t|t-1}$, but we do that recursively. We will first need to forecast the value of $y_t$, based on available information:

$$E(y_t | x_t, z_t) = A' x_t + H' z_t$$

From the law of iterated expectations,

$$E_{t-1}(E_t(y_t)) = E_{t-1}(y_t) = y_{t|t-1} = A' x_t + H' z_{t|t-1}$$

The error from this forecast is

$$y_t - y_{t|t-1} = H'\left(z_t - z_{t|t-1}\right) + w_t$$

with MSE

$$E\left[\left(y_t - y_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right] = E\left[H'\left(z_t - z_{t|t-1}\right)\left(z_t - z_{t|t-1}\right)' H\right] + E[w_t w_t'] = H' P_{t|t-1} H + R$$


3. Updating step ($z_{t|t}$)
– Once we observe $y_t$, we can update our forecast of $z_t$, denoting it by $z_{t|t}$, before making the new forecast, $z_{t+1|t}$.
– We do this by calculating $E(z_t | y_t, x_t, \dots) = z_{t|t}$:

$$z_{t|t} = z_{t|t-1} + E\left[\left(z_t - z_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right] \left( E\left[\left(y_t - y_{t|t-1}\right)\left(y_t - y_{t|t-1}\right)'\right] \right)^{-1} \left(y_t - y_{t|t-1}\right)$$

– We can write this a bit more intuitively as:

$$z_{t|t} = z_{t|t-1} + \beta\left(y_t - y_{t|t-1}\right)$$

where $\beta$ is the OLS coefficient from regressing $(z_t - z_{t|t-1})$ on $(y_t - y_{t|t-1})$.
– The bigger the relationship between the two forecasting errors, the bigger the correction must be.


– It can be shown that

$$z_{t|t} = z_{t|t-1} + P_{t|t-1} H \left(H' P_{t|t-1} H + R\right)^{-1} \left(y_t - A' x_t - H' z_{t|t-1}\right)$$

– This updated forecast uses the old forecast $z_{t|t-1}$ and the just-observed values of $y_t$ and $x_t$.


4. Forecast $z_{t+1|t}$.
– Once we have an update of the old forecast, we can produce a new forecast, the forecast of $z_{t+1}$:

$$E_t(z_{t+1}) = E(z_{t+1} | y_t, x_t, \dots) = E(F z_t + v_{t+1} | y_t, x_t, \dots) = F\, E(z_t | y_t, x_t, \dots) + 0 = F z_{t|t}$$

– We can use the above equation to write

$$E_t(z_{t+1}) = F z_{t|t-1} + F P_{t|t-1} H \left(H' P_{t|t-1} H + R\right)^{-1} \left(y_t - A' x_t - H' z_{t|t-1}\right)$$

– We can also derive an equation for the forecast error as a recursion:

$$P_{t+1|t} = F\left[ P_{t|t-1} - P_{t|t-1} H \left(H' P_{t|t-1} H + R\right)^{-1} H' P_{t|t-1} \right] F' + Q$$

5. Go to step 2, until we reach T. Then we are done.


• Summary: The Kalman Filter produces
– The optimal forecasts $z_{t+1|t}$ and $y_{t+1|t}$ (optimal within the class of linear forecasts)
– We need some initialization assumptions
– We need to know the parameters of the system, i.e. A, H, F, Q, R.
• Now, we need to find a way to estimate the parameters A, H, F, Q, R.
• By far, the most popular method is MLE.
• Aside: simulation methods can get us away from the restrictive distributional assumptions on $\varepsilon_t$.


6 Estimation of Kalman Filters

• Suppose that $z_1$ and the shocks $(w_t, v_t)$ are jointly normally distributed.
• Under such an assumption, we can make the very strong claim that the forecasts $z_{t+1|t}$ and $y_{t+1|t}$ are optimal among any functions of $x_t, y_{t-1}, \dots$. In other words, if we have normal errors, we cannot produce better forecasts using the past data than the Kalman forecasts!!!
• If the errors are normal, then all variables in the linear system have a normal distribution.
• More specifically, the distribution of $y_t$ conditional on $x_t$ and $y_{t-1}, \dots$ is normal, or

$$y_t | x_t, y_{t-1}, \dots \sim N\left(A' x_t + H' z_{t|t-1},\; H' P_{t|t-1} H + R\right)$$

• Therefore, we can specify the likelihood function of $y_t | x_t, y_{t-1}$ as we did above:

$$f_{y_t | x_t, y_{t-1}} = (2\pi)^{-n/2} \left| H' P_{t|t-1} H + R \right|^{-1/2} \exp\left[ -\tfrac{1}{2} \left(y_t - A' x_t - H' z_{t|t-1}\right)' \left(H' P_{t|t-1} H + R\right)^{-1} \left(y_t - A' x_t - H' z_{t|t-1}\right) \right]$$


• The problem is to maximize

$$\max_{A,H,F,Q,R} \sum_{t=1}^{T} \log f_{y_t | x_t, y_{t-1}}$$

• Words of wisdom:
– This maximization problem can easily become unmanageable, even using modern computers. The problem is that searching for a global max is very tricky.
∗ A possible solution is to make as many restrictions as possible and then to relax them one by one.
∗ A second solution is to write a model that gives theoretical restrictions.
– Recall that there is more than one state-space representation of an AR process. This implies that some of the parameters in the state-space system are not identified. In other words, more than one value of the parameters (different combinations) can give rise to the same likelihood function.
∗ Then, which likelihood do we choose?
∗ We have to make restrictions so that we have an exactly identified problem.