multivariate time series


Upload: luigi-piva

Post on 11-Jul-2015



Certificate in Quantitative Finance

Module 6 Assessed Assignment

2012

Luigi Piva

Multi-Variate Time Series Analysis

A multivariate time series consists of several series, so the concepts of vector and matrix are central to multivariate time series analysis.

Many of the models and methods used in univariate analysis generalize directly to the multivariate case, but in some situations the generalization requires care, and in others we need new models and methods to capture the complex relationships between different series.

I chose five important energy futures and imported their daily closing prices into a spreadsheet. The time series cover the period from 31/05/2007 to 16/07/2012:

Crude Oil

Ethanol

Gasoline

Heating Oil

Natural Gas

The graph below shows the series. The price levels differ widely from one series to another. To view them together easily, all series are rescaled to start from the same value, one, and then move proportionally.


To plot this chart, all series start at one. In each subsequent period, the value equals the previous value plus that day's variation, i.e. the simple return:

$$r_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad I_0 = 1, \qquad I_t = I_{t-1} + r_t = 1 + \sum_{s=1}^{t} r_s$$

Here $P_t$ is today's close, $P_{t-1}$ is yesterday's close and $I_t$ is the plotted value. Each new variation is therefore added to the value accumulated up to that moment.
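The rescaling above can be sketched in Python with NumPy. The original work used a spreadsheet and MATLAB. The function name `cumulative_index` and the toy prices are illustrative assumptions. Following the text, returns are added to the running value rather than compounded:

```python
import numpy as np

def cumulative_index(prices):
    """Start every column at 1 and add each day's simple return to the
    running value (additive, as described in the text, not compounded)."""
    prices = np.asarray(prices, dtype=float)
    # simple daily returns: (today's close - yesterday's close) / yesterday's close
    returns = np.diff(prices, axis=0) / prices[:-1]
    start = np.ones((1,) + prices.shape[1:])
    return np.concatenate([start, 1.0 + np.cumsum(returns, axis=0)])

# toy closes for two hypothetical series (one per column)
closes = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [99.0, 45.0]])
index = cumulative_index(closes)
```

Compounding with `np.cumprod(1 + returns)` would be the usual alternative. The additive version is kept here because it is the construction the text describes.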

On an initial visual inspection, the series appear to be trending. Gasoline and Ethanol show a positive trend, while Crude Oil and Heating Oil move in a more oscillatory way. Natural Gas shows a negative trend. The futures do not appear to be mean-reverting, i.e. they do not seem to fluctuate around a constant mean value. Again visually, Crude Oil and Heating Oil seem to be related, as do Ethanol and Gasoline.

[Figure: rescaled index of the five futures, all starting at 1; x-axis: observation number (ticks from 1 to 1,277); y-axis: -1 to 3; legend: CL (Crude Oil), HO (Heating Oil), NG (Natural Gas), GS (Gasoline), ET (Ethanol)]


If we plot the daily returns, we get the following chart (from top to bottom: Crude Oil, Ethanol, Gasoline, Heating Oil and Natural Gas):

The behaviour is completely different: the values move around zero. Some series (Crude Oil, Ethanol, Heating Oil) show larger daily variations than the others (Gasoline, Natural Gas).


Flow Diagram

The flow chart below shows the main steps followed in this project.


Augmented Dickey-Fuller Test

We first look at the individual time series. A univariate time series is integrated if it can be brought to stationarity through differencing.

Using the Augmented Dickey-Fuller (ADF) test, we can check whether each individual series is stationary.

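The following table summarizes the results (h = 1 means that the unit-root null hypothesis is rejected at the 5% level):

| Series | h | p-value | ADF statistic |
|---|---|---|---|
| Crude Oil | 0 | 0.4513 | -0.5477 |
| Ethanol | 0 | 0.8781 | 0.7621 |
| Gasoline | 0 | 0.7175 | 0.1787 |
| Heating Oil | 0 | 0.5804 | -0.1955 |
| Natural Gas | 0 | 0.4215 | -0.5561 |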

For all the series, the unit-root null hypothesis is not rejected. We therefore cannot treat the price series as stationary, and they are probably integrated. To make them stationary we can take differences. The number of differences needed is the order of integration.

All five series are integrated of order one, I(1). They behave like an AR(1) process with a unit root (a random walk), which becomes stationary after one difference.

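We repeat the ADF test on the daily return series:

| Series | h | p-value | ADF statistic |
|---|---|---|---|
| Crude Oil | 1 | below tabulated minimum | -37.55 |
| Ethanol | 1 | below tabulated minimum | -36.28 |
| Gasoline | 1 | below tabulated minimum | -36.76 |
| Heating Oil | 1 | below tabulated minimum | -36.45 |
| Natural Gas | 1 | below tabulated minimum | -40.53 |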


The result is completely different: in all cases the null hypothesis is rejected. The return series are therefore stationary, i.e. integrated of order zero.
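The following is a minimal NumPy sketch of the (augmented) Dickey-Fuller t-statistic without constant or trend, to make the mechanics of the test concrete. It is not MATLAB's `adftest`: it only computes the statistic. That statistic must be compared with Dickey-Fuller critical values (about -1.95 at the 5% level for this no-constant form), not with Student-t values. The function name and the simulated data are illustrative assumptions.

```python
import numpy as np

def adf_stat(y, lags=0):
    """t-statistic of rho in the regression (no constant, no trend)
    dy_t = rho * y_{t-1} + sum_{i=1..lags} phi_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[k] = y[k+1] - y[k]
    target = dy[lags:]                   # start where all lagged differences exist
    cols = [y[lags:-1]]                  # y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])     # dy_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])

rng = np.random.default_rng(0)
shocks = rng.standard_normal(1000)       # stationary, like the daily returns
random_walk = np.cumsum(shocks)          # I(1), like the price levels
stat_levels = adf_stat(random_walk)
stat_returns = adf_stat(shocks, lags=1)
```

On simulated data the random walk gives a statistic close to zero (unit root not rejected). The white-noise series gives a large negative value, the same pattern as the two tables above.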

AR models are normally used to study stationary univariate time series. Their multivariate counterparts are VAR (Vector Autoregression) models.

We will now use VAR models to analyze the returns of the five energy futures.

Vector Autoregressive Models

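VAR is a simple and useful model for our vectors of returns. We think in terms of a model like the following:

$$Y_t = a + A\,Y_{t-1} + \varepsilon_t$$

Here $Y_t$ is an $[n \times 1]$ vector, $a$ is an $[n \times 1]$ vector of constants, $A$ is the $[n \times n]$ matrix of coefficients of the lagged vector $Y_{t-1}$, and $\varepsilon_t$ is the vector of innovations. In this case the lag of the model is equal to 1.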

Determining an appropriate number of lags

Among the various methods for choosing the most appropriate number of lags, we use the Akaike Information Criterion (AIC). It requires two quantities: the log-likelihood and the number of active parameters in the model.

In practice, we can obtain these quickly by fitting the VAR for several lag orders (1, 2, 3, 4, ...), keeping in mind that the smallest lags are the most likely candidates. To obtain the log-likelihood in MATLAB, simply display LLF after estimating the model parameters. To derive the number of active parameters:

[NumParam,NumActive] = vgxcount(EstSpec)   % EstSpec: the estimated model specification

To calculate the Akaike Information Criterion:

AIC = aicbic([LLF1, ...LLFn],[Np1,...Npn])

where LLFn is the log-likelihood and Npn the number of active parameters of the nth model.


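The lowest AIC value indicates the best lag.

| VAR(p) | Log-likelihood | NumParam | AIC |
|---|---|---|---|
| 1 | 1.5890e+004 | 5 | -31770 |
| 2 | 1.5936e+004 | 5 | -31862 |
| 3 | 1.6109e+004 | 5 | -32208 |
| 4 | 1.6188e+004 | 5 | -32366 |

By this criterion the lowest AIC is reached with four lags. We nevertheless proceed with the more parsimonious VAR(1) model, i.e. with a single lag.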
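The AIC column can be reproduced from the log-likelihoods as AIC = -2·LLF + 2k, where k is the number of parameters. This is the formula MATLAB's `aicbic` applies. The Python sketch below checks the table. It also recomputes AIC with the standard parameter count of a VAR(p) with a constant, n + n²·p (innovation covariance excluded), which differs from the count of 5 shown in the table. The function names are illustrative.

```python
import numpy as np

def aic(loglik, num_params):
    """Akaike Information Criterion: -2*logL + 2*k (lower is better)."""
    return -2.0 * np.asarray(loglik, dtype=float) + 2.0 * np.asarray(num_params, dtype=float)

def var_param_count(n, p, constant=True):
    """Mean-equation parameters of a VAR(p) with n series:
    n*n*p AR coefficients plus n constants (covariance not counted)."""
    return n * n * p + (n if constant else 0)

lags = np.array([1, 2, 3, 4])
loglik = np.array([1.5890e4, 1.5936e4, 1.6109e4, 1.6188e4])   # from the table

aic_table = aic(loglik, [5, 5, 5, 5])            # parameter count as reported
aic_full = aic(loglik, var_param_count(5, lags))  # 30, 55, 80, 105 parameters
best_lag = lags[np.argmin(aic_full)]
```

With either parameter count, the minimum AIC falls at p = 4 for these log-likelihoods.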

VAR(1) Parameters Estimation

To estimate the model in MATLAB we follow these steps:

1. Import the stationary time series (the returns), collected in an Excel matrix with one series per column and one row per observation.

2. Create the VAR model

We want to build a VAR model with one lag, a constant and five series:

Model = vgxset('n',5,'nAR',1,'Constant',true)

3. Fit the model to the data

We also want the estimates of the constants, of the AR parameters and of the covariance matrix of the innovations:

[EstSpec,EstStdErrors,LLF,W] = vgxvarx(Model, DataMatrix);

and then we display the results:

vgxdisp(EstSpec,EstStdErrors)

Then we obtain the estimates of the parameters:


and the covariance matrix of the residuals
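Equation-by-equation least squares illustrates the computation behind a VAR(1) with a constant. For an unrestricted Gaussian VAR it gives the same coefficient estimates as conditional maximum likelihood. This Python/NumPy sketch returns the constant vector, the coefficient matrix A and the residual covariance, using the maximum-likelihood divisor T. It is not `vgxvarx` itself, and the names and simulated data are illustrative assumptions.

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of Y_t = a + A Y_{t-1} + e_t for a (T x n) data matrix."""
    Y = np.asarray(Y, dtype=float)
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])   # regressors [1, Y_{t-1}]
    target = Y[1:]                                       # Y_t
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)    # shape (1 + n, n)
    a = coef[0]                                          # constants
    A = coef[1:].T                                       # row i = equation of series i
    resid = target - X @ coef
    sigma = resid.T @ resid / len(target)                # ML divisor T
    return a, A, sigma

# simulated two-series system with known coefficients
rng = np.random.default_rng(42)
A_true = np.array([[0.3, 0.1], [0.0, -0.2]])
a_true = np.array([0.01, -0.02])
Y = np.zeros((5000, 2))
for t in range(1, len(Y)):
    Y[t] = a_true + A_true @ Y[t - 1] + rng.standard_normal(2)
a_hat, A_hat, sigma_hat = fit_var1(Y)
```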

Stability Check

Once the model is fitted, we can check its stability. The model has no MA terms (it is a pure AR model), so it is invertible by definition.

[isStable, isInvertible] = vgxqual(EstSpec);

The function returns two logical values (0 = false, 1 = true). They indicate whether the model is stable and whether it is invertible.


In our case both values are 1 (true): the model is stable and invertible.
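A VAR(1) is stable when every eigenvalue of A has modulus strictly below one. For a VAR(p), the same condition applies to the companion matrix. A minimal NumPy sketch of that condition, independent of `vgxqual` (the function name is illustrative):

```python
import numpy as np

def is_stable_var1(A, tol=1e-12):
    """True if all eigenvalues of the VAR(1) coefficient matrix lie
    strictly inside the unit circle."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return bool(np.max(np.abs(eigenvalues)) < 1.0 - tol)

print(is_stable_var1([[0.5, 0.2], [0.1, 0.3]]))   # True: eigenvalues about 0.57 and 0.23
print(is_stable_var1([[1.0, 0.0], [0.0, 0.5]]))   # False: unit root
```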

Forecasts using a VAR model

We can use the estimated VAR model to forecast future values of the series studied.

[ypred,ycov] = vgxpred(EstSpec,5,[],DataMatrix)

This instruction iterates the estimated model forward. It produces forecasts of the changes in the futures prices for the next 5 periods, together with their covariances.
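For a VAR(1), point forecasts come from iterating the fitted equation with the shocks set to zero, each step feeding on the previous forecast. The sketch below takes hypothetical inputs `a`, `A` and the last observation `y_last`. It computes point forecasts only, not the forecast covariances that `vgxpred` also returns.

```python
import numpy as np

def forecast_var1(a, A, y_last, horizon):
    """Iterated point forecasts y_{T+h} = a + A y_{T+h-1}, h = 1..horizon."""
    a, A = np.asarray(a, dtype=float), np.asarray(A, dtype=float)
    y = np.asarray(y_last, dtype=float)
    out = []
    for _ in range(horizon):
        y = a + A @ y          # one step ahead, innovation set to its mean (zero)
        out.append(y)
    return np.array(out)

# with a stable A, forecasts decay toward the unconditional mean (I - A)^{-1} a
path = forecast_var1([0.0, 0.0], 0.5 * np.eye(2), [1.0, 2.0], horizon=5)
```

This geometric decay is one reason such forecasts carry little information beyond the first few steps.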


[Figure: five panels, CL, HO, GS, ET and NG; x-axis: periods 1 to 16; y-axis: returns, roughly from -0.1 to 0.1]


Later in this project, we could check whether these changes are consistent with the forecasts of our VEC models on closing prices.

Closing Prices Time Series

We have already seen from the ADF tests that the price series are not stationary. We want confirmation from the KPSS test. With 'trend' set to false, as below, it evaluates the null hypothesis that a univariate time series is level stationary against the alternative that it has a unit root. Here we expect the null to be rejected, i.e. the series to be integrated.

[h0,pVal0] = kpsstest(TimeSeries,'trend',false)

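KPSS Test (h = 1 means that the stationarity null hypothesis is rejected)

| Series | h | p-value |
|---|---|---|
| Crude Oil | 1 | >0.01 |
| Ethanol | 1 | >0.01 |
| Gasoline | 1 | >0.01 |
| Heating Oil | 1 | >0.01 |
| Natural Gas | 1 | >0.01 |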

The KPSS test rejects stationarity for all five futures price series, which confirms that the processes are integrated. We could then determine the order of integration of each series as the number of differences required to make it stationary.
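The KPSS statistic is simple to compute:

1. Demean the series (the level-stationary case, as with 'trend' set to false).
2. Accumulate the residuals into partial sums S_t.
3. Scale their sum of squares by T² and an estimate of the long-run variance.

The NumPy sketch below uses Bartlett (Newey-West) weights with a user-chosen number of lags. The function name and simulated data are illustrative assumptions, and the code is not `kpsstest` itself. For this version, the 5% critical value is 0.463, and larger values reject stationarity.

```python
import numpy as np

def kpss_level_stat(y, lags=0):
    """KPSS statistic for the null hypothesis of level stationarity."""
    y = np.asarray(y, dtype=float)
    e = y - y.mean()                   # residuals from a regression on a constant
    T = len(e)
    S = np.cumsum(e)                   # partial sums S_t
    # long-run variance estimate with Bartlett (Newey-West) weights
    s2 = e @ e / T
    for l in range(1, lags + 1):
        s2 += 2.0 * (1.0 - l / (lags + 1)) * (e[l:] @ e[:-l]) / T
    return (S @ S) / (T ** 2 * s2)

rng = np.random.default_rng(7)
random_walk = np.cumsum(rng.standard_normal(1000))   # integrated, like prices
white_noise = rng.standard_normal(1000)              # stationary
stat_rw, stat_wn = kpss_level_stat(random_walk), kpss_level_stat(white_noise)
```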

Returning to the flowchart: we have rejected stationarity and found evidence of integrated processes. We therefore continue along the right-hand branch of the scheme and apply a first test of cointegration.


Engle-Granger Test for Cointegration

To look for a cointegration relationship, we run the Engle-Granger test and a t-test, this time on the entire matrix of futures prices.

The test regression has the form:

Y(:,1) = Y(:,2:end)*b + X*a + e

The left-hand side holds the regressand, the first series. The right-hand side holds the regressors (series 2 to 5 in our case), with X collecting the deterministic terms. The key element is the estimated residual series, which estimates a linear combination of the variables. If the residuals are stationary, the variables are cointegrated.

[hEG,pValEG] = egcitest(DataMatrix,'test',{'t1'})

We obtain the following results:

| Test | h | p-value | t |
|---|---|---|---|
| Engle-Granger | 1 | 0.0615 | --- |
| t-statistic | 1 | --- | 0.1 |

Both tests indicate cointegration among the futures prices.

At this point, we want to identify the cointegration relationship.

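We extract the coefficient vector b and the intercept c0 from the regression statistics returned by egcitest. We then form the linear combination given by the regression:

[~,~,~,~,reg] = egcitest(Y,'test','t1');   % reg: statistics of the cointegrating regression
c0 = reg.coeff(1);      % intercept
b = reg.coeff(2:5);     % coefficients of series 2 to 5
plot(Y*[1;-b]-c0,'LineWidth',2)   % residual series = cointegrating combination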
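The two steps behind the Engle-Granger procedure can be sketched in NumPy. First, regress the first series on a constant and the others. Then run a Dickey-Fuller regression on the residuals. This illustration relies on simplifying assumptions (no lagged differences in the residual regression, simulated data) and does not reimplement `egcitest`. The residual statistic must be compared with Engle-Granger critical values, which are more negative than the ordinary Dickey-Fuller ones.

```python
import numpy as np

def engle_granger(Y):
    """Step 1: OLS of Y[:,0] on [1, Y[:,1:]].
    Step 2: Dickey-Fuller t-statistic (no constant) on the residuals."""
    Y = np.asarray(Y, dtype=float)
    X = np.hstack([np.ones((len(Y), 1)), Y[:, 1:]])
    coef, *_ = np.linalg.lstsq(X, Y[:, 0], rcond=None)
    c0, b = coef[0], coef[1:]
    resid = Y[:, 0] - c0 - Y[:, 1:] @ b     # same combination as Y*[1;-b]-c0
    lagged, diff = resid[:-1], np.diff(resid)
    rho = (lagged @ diff) / (lagged @ lagged)
    e = diff - rho * lagged
    se = np.sqrt((e @ e) / (len(diff) - 1) / (lagged @ lagged))
    return c0, b, rho / se

rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_normal(1000))           # common stochastic trend
y = 2.0 + 0.5 * x + rng.standard_normal(1000)      # cointegrated with x
c0, b, t_stat = engle_granger(np.column_stack([y, x]))
```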


We obtain a new variable, the linear combination of the five futures.

The series looks reasonably stationary, moving around zero with different volatility clusters, which is a further indication of a cointegration relationship.

Cointegrated systems are modelled with Vector Error Correction (VEC) models, also called cointegrated VARs.


Vector Error Correction Models

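Once a cointegration relationship has been determined, the remaining coefficients of the VEC model can be estimated by ordinary least squares. Cointegrated variables share a common stochastic trend, and deviations from their long-run equilibrium are gradually corrected. The model expresses this through an error-correction term. The expression for a VEC(q) model, where q is the number of lags, is the following:

$$\Delta y_t = A B' y_{t-1} + \sum_{i=1}^{q} \Gamma_i \, \Delta y_{t-i} + D x_t + \varepsilon_t$$

Here the $\Gamma_i$ are the coefficient matrices of the lagged differences.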

Estimating a VEC Model

Once the cointegration relationship is found, we can determine the remaining coefficients of the model. The summation term in the VEC model is similar to a VAR in differences. The really different term is AB'y_{t-1}:

- The columns of B are the cointegrating vectors, so B'y_{t-1} measures the deviations from the long-run equilibrium.
- A contains the speeds of adjustment to these imbalances.
- Dx represents exogenous variables (not present in our case).

The matrix product AB' contains our error-correction coefficients.
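Suppose B is fixed, for instance from the Engle-Granger regression, where B = [1; -b]. The remaining VEC coefficients then come from ordinary least squares of Δy_t on a constant, the error-correction term B'y_{t-1} and q lagged differences. The NumPy sketch below illustrates that second step. The function name, the inclusion of a constant and the simulated two-series system are assumptions for the example, and `Gammas` holds the lag matrices Γ_i.

```python
import numpy as np

def estimate_vec(Y, beta, q=1):
    """OLS for dy_t = c + A (beta' y_{t-1}) + sum_i Gamma_i dy_{t-i} + e_t,
    taking the cointegrating matrix beta (n x r) as given."""
    Y = np.asarray(Y, dtype=float)
    beta = np.asarray(beta, dtype=float).reshape(Y.shape[1], -1)
    n, r = beta.shape
    dY = np.diff(Y, axis=0)                  # dY[k] = y_{k+1} - y_k
    T = len(dY) - q                          # usable observations
    ect = Y[:-1] @ beta                      # error-correction term beta' y_{t-1}
    regressors = [np.ones((T, 1)), ect[q:]]
    for i in range(1, q + 1):
        regressors.append(dY[q - i:len(dY) - i])   # lagged differences dy_{t-i}
    X = np.hstack(regressors)
    coef, *_ = np.linalg.lstsq(X, dY[q:], rcond=None)
    c = coef[0]                              # constants
    A = coef[1:1 + r].T                      # adjustment speeds (n x r)
    Gammas = [coef[1 + r + (i - 1) * n:1 + r + i * n].T for i in range(1, q + 1)]
    return c, A, Gammas

# simulated system: y1 - y2 is stationary, adjustment speeds A = [-0.2, 0.1]'
rng = np.random.default_rng(1)
A_true = np.array([[-0.2], [0.1]])
beta = np.array([[1.0], [-1.0]])
Y = np.zeros((5000, 2))
for t in range(1, len(Y)):
    Y[t] = Y[t - 1] + A_true @ (beta.T @ Y[t - 1]) + rng.standard_normal(2)
c_hat, A_hat, Gammas_hat = estimate_vec(Y, beta, q=1)
```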


Estimated Residual Covariance Matrix


Simulations and Forecasts

Once the model coefficients are estimated, the underlying data-generating process can be simulated. For example, the code in the script generates a single Monte Carlo forecast path:
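The script itself is not reproduced here. The following NumPy sketch shows one way to draw a single Monte Carlo path from a VEC(1) with Gaussian innovations. At each step it draws a shock with the given covariance (via a Cholesky factor), updates Δy_t from the model equation, and adds it to the level. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_vec1_path(y_last, dy_last, c, A, beta, Gamma, Sigma, horizon, rng):
    """One Monte Carlo path of dy_t = c + A beta' y_{t-1} + Gamma dy_{t-1} + eps_t,
    eps_t ~ N(0, Sigma). Returns simulated levels, shape (horizon, n)."""
    chol = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    y = np.asarray(y_last, dtype=float).copy()
    dy = np.asarray(dy_last, dtype=float).copy()
    c, A, beta, Gamma = (np.asarray(m, dtype=float) for m in (c, A, beta, Gamma))
    path = np.empty((horizon, len(y)))
    for h in range(horizon):
        eps = chol @ rng.standard_normal(len(y))      # correlated Gaussian shock
        dy = c + A @ (beta.T @ y) + Gamma @ dy + eps   # VEC(1) equation
        y = y + dy                                     # back to levels
        path[h] = y
    return path

# hypothetical two-series example
rng = np.random.default_rng(0)
path = simulate_vec1_path(y_last=[100.0, 98.0], dy_last=[0.0, 0.0],
                          c=[0.0, 0.0], A=[[-0.1], [0.05]], beta=[[1.0], [-1.0]],
                          Gamma=np.zeros((2, 2)), Sigma=np.eye(2), horizon=10, rng=rng)
```

Repeating the call many times and averaging the paths gives a simulation-based forecast, with the spread across paths indicating its uncertainty.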


Over the coming days we can test the effectiveness of the forecast against the actual evolution of the markets. However, the predictions of such models clearly lose value beyond one or two periods, especially with daily data and without seasonal or exogenous components.

Limits of Engle-Granger Regression

The Engle-Granger method has several limitations. First, it identifies only a single cointegration relationship. It also requires one of the variables to be designated as the "first" one, the regressand. This choice, usually arbitrary, affects both the test results and the model estimation.

We go a bit further by permuting the five series and estimating the cointegration relationship with each variable in turn as the regressand.

The table shows the results of the t-statistic test:


| Permutation | h | p-value |
|---|---|---|
| 1 | 1 | 0.0010 |
| 2 | 1 | 0.0010 |
| 3 | 1 | 0.0063 |
| 4 | 1 | 0.0010 |
| 5 | 1 | 0.0010 |

In our case, it makes little difference which of the five series is chosen as the regressand, with the other four as regressors.

The plot shows the five cointegration relationships detected by the permutation. The common scale makes their different magnitudes hard to compare, but all relationships appear stationary and mean-reverting.


Another limitation of the Engle-Granger method is that it is a two-step procedure. A first regression estimates the residual series, and a second regression tests it for a unit root. Errors in the first estimate are necessarily carried into the second.

Furthermore, the cointegration relationships estimated by the Engle-Granger method enter the definition of the VEC model. As a result, the VEC model estimation also becomes a two-step procedure.

Johansen Test for Cointegration

The Johansen test for cointegration addresses many of the limitations of the Engle-Granger method. It avoids two-step estimators and provides comprehensive tests in the presence of multiple cointegrating relationships.

Johansen's approach incorporates the maximum-likelihood test procedure into the model estimation itself, avoiding conditional estimates. It also provides a framework for testing restrictions on the cointegrating relationships.

The key point of the Johansen method is the relationship between the rank of the impact matrix C = AB' and its eigenvalues. The eigenvalues depend on the form of the VEC model, and in particular on the composition of its deterministic terms. The method determines the cointegration rank by testing how many eigenvalues are statistically different from 0.

We now run the test for the cointegration rank using the default H1 model, whose deterministic part has the form:

$$A(B'y_{t-1} + c_0) + c_1$$

[~,~,~,~,mles] = jcitest(Y,'model','H1','lags',1,'display','params');

The output name mles refers to the fact that the test procedure is based on maximum-likelihood estimation.
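The core of the Johansen computation can be sketched in NumPy:

1. Partial out the constant and lagged differences from Δy_t and y_{t-1}.
2. Form the moment matrices S00, S11 and S01.
3. Solve the eigenvalue problem and compute the trace statistics -T Σ ln(1 - λ_i).

The constant is left unrestricted, which is equivalent to the H1 form A(B'y_{t-1}+c0)+c1 above. This illustration runs on simulated data and is not `jcitest`. Its statistics must be compared with Johansen critical values such as those in the cValue column below.

```python
import numpy as np

def johansen_trace(Y, q=1):
    """Eigenvalues and trace statistics of the Johansen procedure with an
    unrestricted constant and q lagged differences (illustrative sketch)."""
    Y = np.asarray(Y, dtype=float)
    dY = np.diff(Y, axis=0)                      # dY[k] = y_{k+1} - y_k
    T = len(dY) - q                              # effective sample size
    Z0 = dY[q:]                                  # dy_t
    Z1 = Y[q:-1]                                 # y_{t-1}
    Z2 = np.hstack([np.ones((T, 1))] +
                   [dY[q - i:len(dY) - i] for i in range(1, q + 1)])

    def residuals(Z):
        # remove the constant and the short-run dynamics by OLS
        coef, *_ = np.linalg.lstsq(Z2, Z, rcond=None)
        return Z - Z2 @ coef

    R0, R1 = residuals(Z0), residuals(Z1)
    S00, S11, S01 = R0.T @ R0 / T, R1.T @ R1 / T, R0.T @ R1 / T
    # eigenvalues of S11^{-1} S10 S00^{-1} S01 (squared canonical correlations)
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    trace = np.array([-T * np.log(1.0 - eig[r:]).sum() for r in range(len(eig))])
    return eig, trace

# three simulated series sharing two random-walk trends: one cointegrating relation
rng = np.random.default_rng(5)
w = np.cumsum(rng.standard_normal((2000, 2)), axis=0)
y3 = 0.5 * w[:, 0] + 0.5 * w[:, 1] + rng.standard_normal(2000)
eig, trace = johansen_trace(np.column_stack([w, y3]), q=1)
```

Here trace[r] tests the null of rank at most r. The statistics are read sequentially from r = 0 upward, stopping at the first rank whose null is not rejected.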


The results of the Johansen test give us more information than the Engle-Granger test. We do not consider the case of rank zero (no cointegration, i.e. a VAR in differences) or the case of rank 5 (data stationary in levels).

Results Summary (Test 1)

Data: Y
Effective sample size: 1294
Model: H1
Lags: 1
Statistic: trace
Significance level: 0.05

| r | h | stat | cValue | pValue | eigVal |
|---|---|---|---|---|---|
| 0 | 1 | 198.0842 | 95.7541 | 0.0010 | 0.0988 |
| 1 | 0 | 63.4945 | 69.8187 | 0.1443 | 0.0247 |
| 2 | 0 | 31.1656 | 47.8564 | 0.6613 | 0.0134 |
| 3 | 0 | 13.7345 | 29.7976 | 0.8550 | 0.0080 |
| 4 | 0 | 3.3247 | 15.4948 | 0.9503 | 0.0026 |
| 5 | 0 | 0.0048 | 3.8415 | 0.9445 | 0.0000 |

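As expected, the null hypothesis is rejected for rank zero. It is not rejected for any rank from 1 upward: at r = 1, stat = 63.49 < cValue = 69.82 and the p-value is 0.1443. Read sequentially, the trace test therefore points to a single cointegrating relationship, and only the first eigenvalue (0.0988) stands out from the others. The parameters below are nevertheless estimated with rank = 4.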

Parameters Estimation

Besides testing for cointegration relationships, jcitest produces maximum-likelihood estimates of the coefficients of the VEC model. We estimate the parameters for a VEC model with lag = 1 and rank = 4:


Comparing the Cointegration Analysis Strategies

Comparing the Engle-Granger and Johansen approaches may be difficult for several reasons. First of all, the two methods are fundamentally different and may reach different inferences from the same data.


The Engle-Granger two-step method estimates the VEC model by first estimating the cointegration relationship and then the remaining coefficients. This is very different from Johansen's maximum-likelihood method.

However, if both approaches start from the same data and search for the same underlying relationships, they should give broadly comparable results. Normalized cointegrating relationships found by either method should reflect the mechanisms of the process generating the data. VEC models built from these relationships should also have comparable predictive power.


In our case, the cointegration relationships obtained with the Engle-Granger and Johansen methods are very similar. Since the methodologies differ, this convergence is an important confirmation.

Conclusions

Although the forecasting power of econometric models is questionable, these models have possible practical uses.

In particular, the VEC models obtained with the Johansen procedure can also be used to make predictions, and they seem to be more accurate.

We could, for example, study the effect of these five series on the price of energy at different timeframes (daily, hourly, high frequency). In doing so, we could add exogenous variables, such as the dollar index or meteorological factors.

Genetic algorithms (available in MATLAB and other software) could be very useful for improving the effectiveness of these models.