Certificate in Quantitative Finance
Module 6 Assessed Assignment
2012
Luigi Piva
Multi-Variate Time Series Analysis
A multivariate time series consists of several series, so the concepts of vector and matrix are important in multivariate time series analysis.
Many of the models and methods used in univariate analysis can be generalized directly to the multivariate case, but in some situations the generalization requires some attention. In other situations, we need new models and methods to handle the complex relationships between the different series.
I chose five important energy futures and imported their daily closing prices into a spreadsheet.
The time series cover the period from 31/05/2007 to 16/07/2012:
Crude Oil
Ethanol
Gasoline
Heating Oil
Natural Gas
The graph below shows the series. Since the price levels differ widely, all series are rescaled to start from the same value, one, and then move proportionally, so that they can easily be viewed together.
To build this chart, every series starts at one. In each subsequent period, the series is increased by the daily variation (simple return)
$$ r_t = \frac{P_t - P_{t-1}}{P_{t-1}} $$
where $P_t$ is today's close and $P_{t-1}$ is yesterday's close. Each new variation is added to the value accumulated up to that moment:
$$ I_0 = 1, \qquad I_t = I_{t-1} + r_t $$
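The rescaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet used in the project; the function names `normalized_index` and `compounded_index` are hypothetical. The second function is our own addition for comparison only: it shows the compounded alternative (price divided by the first price), which is not the method used in the text.

```python
def normalized_index(closes):
    """Additive index as described in the text: start at 1, then add each daily return."""
    index = [1.0]
    for yesterday, today in zip(closes, closes[1:]):
        daily_return = (today - yesterday) / yesterday  # simple return r_t
        index.append(index[-1] + daily_return)          # I_t = I_{t-1} + r_t
    return index


def compounded_index(closes):
    """Compounded alternative (not used in the text): I_t = P_t / P_0."""
    return [price / closes[0] for price in closes]


if __name__ == "__main__":
    closes = [100.0, 110.0, 99.0]  # hypothetical closing prices
    print(normalized_index(closes))
    print(compounded_index(closes))
```

With the toy prices above, the additive index returns to about 1.0 after a +10% day followed by a -10% day, while the compounded index ends at about 0.99, so the additive version only approximates relative price levels.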
At a first visual inspection, the series appear to be trending. Gasoline and Ethanol show a positive trend, while Crude Oil and Heating Oil follow a more oscillatory path, and Natural Gas shows a negative trend. The futures do not appear to be mean-reverting, i.e. they do not seem to fluctuate around a constant mean value. Again visually, Crude Oil and Heating Oil seem to be related, as do Ethanol and Gasoline.
[Figure: normalized price series, all starting at 1, for CL, HO, NG, GS and ET]
Plotting the daily returns instead gives the following chart (from top to bottom: Crude Oil, Ethanol, Gasoline, Heating Oil and Natural Gas).
The behaviour is completely different: the values move around zero. Some series (Crude Oil, Ethanol, Heating Oil) show larger daily variations than others (Gasoline, Natural Gas).
Flow Diagram
In the flow chart below we see the major steps we will follow in this project.
Augmented Dickey-Fuller Test
We are first interested in the individual time series. A univariate time series is integrated if it can be brought to stationarity through differencing.
Using the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, we can test each series individually and see whether it is stationary.
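A minimal sketch of the simplest member of the ADF family, assuming the plain Dickey-Fuller regression dy_t = gamma * y_{t-1} + e_t with no constant and no lagged differences. The helper names `df_statistic` and `integration_order` are hypothetical, and -1.95 is the approximate asymptotic 5% critical value for this no-constant case. The second function applies the definition of integration given above: keep differencing until the unit root is rejected.

```python
import random


def df_statistic(y):
    """Dickey-Fuller t-statistic for dy_t = gamma * y_{t-1} + e_t (no constant, no lagged differences)."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(x * d for x, d in zip(lagged, diffs)) / sxx   # OLS slope without intercept
    resid = [d - gamma * x for x, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in resid) / (len(resid) - 1)          # residual variance
    return gamma / (s2 / sxx) ** 0.5                           # t-ratio of gamma


DF_CRITICAL_5PCT = -1.95  # approximate asymptotic 5% value for the no-constant case


def integration_order(y, max_d=2):
    """Difference the series until the DF test rejects a unit root; None if it never does."""
    for d in range(max_d + 1):
        if df_statistic(y) < DF_CRITICAL_5PCT:
            return d
        y = [b - a for a, b in zip(y[:-1], y[1:])]
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0, 1) for _ in range(500)]
    walk, level = [], 0.0
    for shock in noise:
        level += shock
        walk.append(level)
    print("white noise:", round(df_statistic(noise), 2))
    print("random walk:", round(df_statistic(walk), 2), "order:", integration_order(walk))
```

Because this variant has no constant, it can be misleading for series whose mean is far from zero; the ADF variants with a constant or a trend, and with lagged differences, are then more appropriate.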
The following table summarizes the results (h = 1 means that the null hypothesis of a unit root is rejected):
Series        h   p-value   ADF statistic
Crude Oil     0   0.4513    -0.5477
Ethanol       0   0.8781     0.7621
Gasoline      0   0.7175     0.1787
Heating Oil   0   0.5804    -0.1955
Natural Gas   0   0.4215    -0.5561
For all the price series the null hypothesis of a unit root is not rejected: the series are not stationary and are probably integrated. To make them stationary we can take differences; the number of differences needed to make a series stationary is its order of integration.
For all five series the order of integration is one, I(1): the prices can be compared to an AR(1) process with a unit coefficient, i.e. a random walk.
We repeat the ADF test on the daily return series (<Min: the p-value is below the smallest tabulated value):
Series        h   p-value   ADF statistic
Crude Oil     1   <Min      -37.55
Ethanol       1   <Min      -36.28
Gasoline      1   <Min      -36.76
Heating Oil   1   <Min      -36.45
Natural Gas   1   <Min      -40.53
The result is completely different: in all cases the null hypothesis is rejected, and the return series are stationary, i.e. not integrated, I(0).
AR models are normally used to study stationary univariate time series; in the multivariate case we refer to VAR (Vector AutoRegression) models.
We will now use VAR models to analyze the returns of the five energy futures.
Vector Autoregressive Models
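VAR is a simple and useful model for our vectors of returns. We will think in terms of a model like the following:
$$ Y_t = c + A\,Y_{t-1} + \varepsilon_t $$
where $Y_t$ is an $n \times 1$ vector, $c$ an $n \times 1$ vector of constants, $A$ the $n \times n$ matrix of coefficients of the lagged variable $Y_{t-1}$, and $\varepsilon_t$ the vector of innovations. In this case the lag of the model is equal to one.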
Determining an appropriate number of lags
Among the various methods for choosing the most appropriate number of lags, we use the Akaike Information Criterion (AIC), which requires two quantities for each candidate model: the log-likelihood and the number of active parameters.
In practice, we can quickly obtain these values by fitting the VAR for different lags (1, 2, 3, 4, ...), keeping in mind that small lags are the most likely candidates. In MATLAB, the log-likelihood is the LLF output returned by vgxvarx when the model parameters are estimated. To derive the number of active parameters of an estimated specification EstSpec:
[NumParam,NumActive] = vgxcount(EstSpec)
To calculate the Akaike Information Criterion:
AIC = aicbic([LLF1, ..., LLFn], [Np1, ..., Npn])
where LLFn is the log-likelihood and Npn the number of active parameters of the nth model.
The lowest AIC value indicates the best lag.
VAR(p)   Log-likelihood   NumParam   AIC
1        1.5890e+004      5          -31770
2        1.5936e+004      5          -31862
3        1.6109e+004      5          -32208
4        1.6188e+004      5          -32366
Note that, by the lowest-AIC rule stated above, the table points to p = 4; the analysis below nevertheless uses a VAR(1) model, i.e. with lag equal to one.
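A short Python sketch of the criterion, AIC = -2 log L + 2k, using the rounded log-likelihoods from the table above; with k = 5 it reproduces the AIC column. The second penalty, k = 5 + 25p (five intercepts plus 25 AR coefficients per lag for five series, ignoring the covariance terms, which are the same for every p), is our own assumption about a full parameter count. In both cases the lowest AIC is at p = 4.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion, AIC = -2 log L + 2 k; lower values are preferred."""
    return -2.0 * log_likelihood + 2.0 * num_params


# Rounded log-likelihoods of the VAR(p) fits reported in the table
llf = {1: 15890.0, 2: 15936.0, 3: 16109.0, 4: 16188.0}

# Penalty as reported in the table (NumParam = 5 for every lag)
aic_reported = {p: aic(value, 5) for p, value in llf.items()}

# Assumed alternative penalty: 5 intercepts plus 25 AR coefficients per lag
aic_full_count = {p: aic(value, 5 + 25 * p) for p, value in llf.items()}

best_reported = min(aic_reported, key=aic_reported.get)
best_full_count = min(aic_full_count, key=aic_full_count.get)

if __name__ == "__main__":
    print(aic_reported, "best lag:", best_reported)
    print(aic_full_count, "best lag:", best_full_count)
```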
VAR(1) Parameters Estimation
To estimate the model in MATLAB we follow these steps:
1. Import the stationary time series (the returns) from Excel into a matrix, with one series in each column and one row per observation.
2. Create the VAR model.
We want a VAR model with one lag, a constant and five series:
Model = vgxset('n',5,'nAR',1,'Constant',true)
3. Fit the model to the data.
We want the estimates of the constants, of the AR parameters and of the covariance matrix of the innovations:
[EstSpec,EstStdErrors,LLF,W] = vgxvarx(Model, DataMatrix);
and we display the results with
vgxdisp(EstSpec,EstStdErrors)
This gives the estimates of the parameters and the covariance matrix of the residuals.
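For an unrestricted VAR with the same regressors in every equation, equation-by-equation OLS gives the same coefficient estimates as conditional Gaussian maximum likelihood, so the estimation step can be sketched in plain Python as below. This is an illustration, not the vgxvarx implementation; the helper names and the simulated two-series data (c = (0.1, -0.2)', a stable A) are hypothetical.

```python
import random


def solve(M, v):
    """Solve the linear system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_var1(Y):
    """Equation-by-equation OLS for Y_t = c + A Y_{t-1} + e_t.

    Y is a list of observations (rows), one column per series.
    Returns the intercepts c, the coefficient matrix A and the residual covariance Sigma.
    """
    k = len(Y[0])
    X = [[1.0] + list(row) for row in Y[:-1]]   # constant and lagged values
    targets = Y[1:]
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k + 1)] for i in range(k + 1)]
    c, A, resid = [], [], []
    for eq in range(k):
        Xty = [sum(x[i] * y[eq] for x, y in zip(X, targets)) for i in range(k + 1)]
        beta = solve(XtX, Xty)
        c.append(beta[0])
        A.append(beta[1:])
        resid.append([y[eq] - sum(b * xi for b, xi in zip(beta, x)) for x, y in zip(X, targets)])
    T = len(targets)
    # Maximum likelihood style covariance estimate (divides by T)
    Sigma = [[sum(u * w for u, w in zip(resid[i], resid[j])) / T for j in range(k)] for i in range(k)]
    return c, A, Sigma


def simulate_var1(c, A, T, rng):
    """Simulate a VAR(1) with independent standard normal innovations (illustration only)."""
    y = [0.0] * len(c)
    out = []
    for _ in range(T):
        y = [ci + sum(a * v for a, v in zip(row, y)) + rng.gauss(0, 1) for ci, row in zip(c, A)]
        out.append(y)
    return out


if __name__ == "__main__":
    data = simulate_var1([0.1, -0.2], [[0.5, 0.1], [0.2, 0.3]], 2000, random.Random(42))
    c_hat, A_hat, Sigma_hat = fit_var1(data)
    print(c_hat, A_hat, Sigma_hat)
```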
Stability Check
Once the model is fitted, we can check its stability. Since the model has no MA terms (it is a pure AR model), it is invertible by definition.
[isStable, isInvertible] = vgxqual(EstSpec);
The two outputs are logical flags (1 = true, 0 = false) indicating whether the model is stable and whether it is invertible.
In our case the answer is (1, 1): the model is stable and invertible.
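For a VAR(1), stability means that all eigenvalues of A lie strictly inside the unit circle. A minimal two-series sketch, using the closed-form eigenvalues of a 2x2 matrix; the function names are hypothetical.

```python
import cmath


def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def is_stable_var1(A):
    """A VAR(1) is stable when every eigenvalue of A lies strictly inside the unit circle."""
    return all(abs(lam) < 1 for lam in eigenvalues_2x2(A))


if __name__ == "__main__":
    print(is_stable_var1([[0.5, 0.1], [0.2, 0.3]]))   # True: eigenvalues about 0.57 and 0.23
    print(is_stable_var1([[1.0, 0.0], [0.0, 0.5]]))   # False: one eigenvalue equals 1
```

For the five-series model a general eigenvalue routine (for example numpy.linalg.eigvals) would be needed, and a VAR(p) with p > 1 is checked through the eigenvalues of its companion matrix.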
Forecasts using a VAR model
We can use the estimated VAR model to forecast future values of the series studied.
[ypred,ycov] = vgxpred(EstSpec, 5, [], DataMatrix)
This call iterates the estimated model forward to produce forecasts of the futures returns for the next 5 periods, together with their covariances.
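The iteration behind the point forecasts of a VAR(1) can be sketched as follows (hypothetical names and coefficients): each forecast feeds the next one through y_hat(h) = c + A y_hat(h-1), starting from the last observation.

```python
def forecast_var1(c, A, y_last, horizon):
    """Iterated point forecasts of a VAR(1): y_hat(h) = c + A y_hat(h-1), with y_hat(0) = y_last."""
    forecasts = []
    y = list(y_last)
    for _ in range(horizon):
        y = [ci + sum(a * v for a, v in zip(row, y)) for ci, row in zip(c, A)]
        forecasts.append(y)
    return forecasts


if __name__ == "__main__":
    c = [0.1, -0.2]                 # hypothetical intercepts
    A = [[0.5, 0.1], [0.2, 0.3]]    # hypothetical stable coefficient matrix
    for h, f in enumerate(forecast_var1(c, A, [1.0, 1.0], 5), start=1):
        print(h, [round(v, 4) for v in f])
```

For a stable model the forecasts converge to the unconditional mean (I - A)^(-1) c, which is one reason why multi-step forecasts of returns quickly become uninformative.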
[Figure: VAR(1) return forecasts, one panel per series: CL, HO, GS, ET, NG]
Later in this project we could check whether these changes are consistent with the forecasts of our VEC models on closing prices.
Closing Prices Time Series
We have already seen with the ADF tests that the price series are not stationary. We want confirmation from the KPSS test, which assesses the null hypothesis that a univariate time series is stationary against the alternative that it is a nonstationary unit-root process; with 'trend',false, as below, the null is stationarity around a constant level rather than around a trend. We expect the price series to be integrated.
[h0,pVal0] = kpsstest(TimeSeries,'trend',false)
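A minimal sketch of the KPSS level-stationarity statistic (the 'trend',false case), assuming a Newey-West long-run variance with Bartlett weights and the bandwidth rule int(4 * (n/100)^(1/4)). The function name is hypothetical and the bandwidth rule is our choice, not necessarily the one kpsstest uses; 0.463 is the usual 5% critical value for the level case.

```python
def kpss_level_statistic(y, lags=None):
    """KPSS statistic for the null hypothesis of level stationarity (no trend term)."""
    n = len(y)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)   # assumed rule of thumb for the Bartlett bandwidth
    mean = sum(y) / n
    e = [v - mean for v in y]               # residuals from a regression on a constant
    partial_sums, running = [], 0.0
    for v in e:
        running += v
        partial_sums.append(running)
    # Newey-West estimate of the long-run variance with Bartlett weights
    long_run_var = sum(v * v for v in e) / n
    for j in range(1, lags + 1):
        gamma_j = sum(e[t] * e[t - j] for t in range(j, n)) / n
        long_run_var += 2 * (1 - j / (lags + 1)) * gamma_j
    return sum(s * s for s in partial_sums) / (n * n * long_run_var)


KPSS_LEVEL_CRITICAL_5PCT = 0.463  # reject level stationarity above this value

if __name__ == "__main__":
    ramp = [float(t) for t in range(200)]             # clearly non-stationary
    alternating = [(-1.0) ** t for t in range(100)]   # stationary, mean zero
    print(kpss_level_statistic(ramp), kpss_level_statistic(alternating, lags=0))
```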
KPSS Test (h = 1 means that the null hypothesis of stationarity is rejected)
Series        h   p-value
Crude Oil     1   >0.01
Ethanol       1   >0.01
Gasoline      1   >0.01
Heating Oil   1   >0.01
Natural Gas   1   >0.01
The null hypothesis of stationarity is rejected for all five futures price series, which supports the view that the processes are integrated. We may compute the order of integration of each series as the number of differences required to make it stationary.
Returning to the flow chart: having rejected stationarity and found evidence of integrated processes, we continue in the right-hand branch of the scheme and apply a first test of cointegration.
Engle-Granger Test for Cointegration
To find out whether a cointegration relationship exists, we run the Engle-Granger test and the associated t-test, this time on the whole matrix of futures prices.
The test is based on the regression:
Y(:,1) = Y(:,2:end)*b + X*a + e
On the left-hand side is the regressand, the first series; on the right-hand side are the regressors, series 2 to 5 in our case, and X collects the deterministic terms (here a constant). The key element is the estimated residual series: if it is stationary, the linear combination of the variables is stationary, i.e. the series are cointegrated.
[hEG,pValEG] = egcitest(DataMatrix,'test',{'t1'})
We obtain the following results:
H0 PValue t
Engle-Granger 1 0.0615 ---
t.statistic 1 ---- 0.1
Both tests indicate the presence of cointegration among the futures prices.
At this point, we want to identify the cointegration relationship.
We extract the coefficient vector b and the intercept c0 from the regression statistics returned by egcitest and form the linear combination given by the regression:
[~,~,~,~,reg] = egcitest(Y,'test','t1');
c0 = reg.coeff(1);
b = reg.coeff(2:5);
plot(Y*[1;-b]-c0,'LineWidth',2)
We obtain a new variable, the linear combination of the five futures.
The series appears relatively stationary, moving around zero with different volatility clusters. This is another indication of a cointegration relationship.
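A two-step Engle-Granger sketch in plain Python, under stated assumptions: the cointegrating regression includes only a constant (X is a constant, as above), and the residual test is the plain Dickey-Fuller t-statistic without lags. All names (`solve`, `df_statistic`, `engle_granger`, `simulate_pair`) and the simulated data are hypothetical. Residual-based tests use Engle-Granger/MacKinnon critical values, which are more negative than the ordinary Dickey-Fuller ones, so the statistic should not be compared with the usual DF table.

```python
import random


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def df_statistic(z):
    """Dickey-Fuller t-statistic (no constant, no lags) on a residual series."""
    lagged = z[:-1]
    diffs = [b - a for a, b in zip(z[:-1], z[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    s2 = sum((d - gamma * x) ** 2 for x, d in zip(lagged, diffs)) / (len(diffs) - 1)
    return gamma / (s2 / sxx) ** 0.5


def engle_granger(Y, first=0):
    """Step 1: OLS of column `first` on a constant and the other columns.
    Step 2: DF statistic on the residual spread. Returns (c0, b, spread, statistic)."""
    k = len(Y[0])
    others = [j for j in range(k) if j != first]
    X = [[1.0] + [row[j] for j in others] for row in Y]
    y = [row[first] for row in Y]
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    coef = solve(XtX, Xty)
    c0, b = coef[0], coef[1:]
    spread = [yi - c0 - sum(bj * xj for bj, xj in zip(b, x[1:])) for x, yi in zip(X, y)]
    return c0, b, spread, df_statistic(spread)


def simulate_pair(n, rng):
    """Hypothetical example data: x is a random walk and y = 1 + 2x + noise."""
    rows, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0, 1)
        rows.append([1.0 + 2.0 * level + rng.gauss(0, 1), level])
    return rows


if __name__ == "__main__":
    Y = simulate_pair(1000, random.Random(3))
    for first in range(2):
        c0, b, _, stat = engle_granger(Y, first)
        print(first, round(c0, 3), [round(v, 3) for v in b], round(stat, 2))
```

Looping `first` over all columns repeats the regression with each series in turn as regressand, which is the permutation exercise discussed under the limits of the Engle-Granger method.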
The models used for cointegrated systems are Vector Error Correction (VEC) models, also called cointegrated VAR models.
Vector Error Correction Models
Once a cointegration relationship has been determined, the remaining coefficients of the VEC model can be estimated by Ordinary Least Squares. Cointegrated variables share a common stochastic trend: deviations from their long-run equilibrium are gradually corrected, and this is expressed by the error correction term. A VEC(q) model, where q is the number of lags, has the form:
$$ \Delta y_t = A B' y_{t-1} + \sum_{i=1}^{q} B_i \, \Delta y_{t-i} + D x_t + \varepsilon_t $$
Estimating a VEC Model
After finding the cointegration relationship, we can estimate the coefficients of the model. The summation term of the VEC model is similar to a VAR in differences.
The term that really differs is $AB'y_{t-1}$: $B$ contains the cointegrating vectors and $A$ the speeds of adjustment to disequilibrium. $Dx_t$ represents exogenous variables (not present in our case).
The matrix product $AB'$ contains our error correction coefficients.
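A sketch of the second step of a two-step estimate for a bivariate VEC with one lagged difference, assuming the cointegrating vector is already known (for example from an Engle-Granger regression) and the deterministic part is a single constant per equation. The data-generating process in the example is hypothetical: y2 is a random walk and y1 closes half of the gap y1 - 2*y2 each period, so the true adjustment coefficients are A = (-0.5, 0)' and B' = (1, -2).

```python
import random


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_vec1(y1, y2, b, c0):
    """OLS of each difference on [1, z_{t-1}, dy1_{t-1}, dy2_{t-1}], with z = y1 - b*y2 - c0.
    Returns one list per equation: [constant, alpha, gamma_1, gamma_2]."""
    n = len(y1)
    z = [a - b * x - c0 for a, x in zip(y1, y2)]           # error-correction term
    d1 = [0.0] + [y1[t] - y1[t - 1] for t in range(1, n)]   # d1[0] is an unused placeholder
    d2 = [0.0] + [y2[t] - y2[t - 1] for t in range(1, n)]
    rows = range(2, n)
    X = [[1.0, z[t - 1], d1[t - 1], d2[t - 1]] for t in rows]
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(4)] for i in range(4)]
    estimates = []
    for d in (d1, d2):
        Xty = [sum(x[i] * d[t] for x, t in zip(X, rows)) for i in range(4)]
        estimates.append(solve(XtX, Xty))
    return estimates


def simulate_pair(n, rng):
    """Hypothetical DGP: y2 is a random walk; y1 corrects half of the gap y1 - 2*y2 each period."""
    y1, y2 = [0.0], [0.0]
    for _ in range(n - 1):
        gap = y1[-1] - 2.0 * y2[-1]
        y2.append(y2[-1] + rng.gauss(0, 1))
        y1.append(y1[-1] - 0.5 * gap + rng.gauss(0, 1))
    return y1, y2


if __name__ == "__main__":
    y1, y2 = simulate_pair(2000, random.Random(5))
    eq1, eq2 = fit_vec1(y1, y2, b=2.0, c0=0.0)
    print("alpha:", round(eq1[1], 3), round(eq2[1], 3))
```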
Estimated Residual Covariance Matrix
Simulations and Forecasts
Once the model coefficients are estimated, the underlying data-generating process can be simulated. For example, the script generates a single Monte Carlo path of forecasts.
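A single Monte Carlo path can be sketched as follows for a bivariate VEC without lagged differences, assuming Gaussian innovations with standard deviations `sigma` and correlation `rho`. All names and parameter values are hypothetical; the correlated shocks are generated with the Cholesky factor of the 2x2 covariance matrix.

```python
import random


def simulate_vec_path(y_last, alpha, beta, c0, sigma, rho, horizon, rng):
    """One Monte Carlo path of a bivariate VEC without lagged differences:
    dy_t = alpha * (beta' y_{t-1} - c0) + eps_t, with eps_t ~ N(0, Sigma).
    Sigma is built from the standard deviations `sigma` and the correlation `rho`."""
    s1, s2 = sigma
    y = list(y_last)
    path = []
    for _ in range(horizon):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Cholesky factor of the 2x2 covariance matrix applied to independent normals
        eps = (s1 * z1, s2 * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2))
        ect = beta[0] * y[0] + beta[1] * y[1] - c0        # error-correction term
        y = [y[i] + alpha[i] * ect + eps[i] for i in range(2)]
        path.append(y)
    return path


if __name__ == "__main__":
    path = simulate_vec_path([10.0, 0.0], [-0.5, 0.0], [1.0, -2.0], 0.0,
                             (0.01, 0.01), 0.3, 20, random.Random(7))
    print(path[-1])
```

Starting far from equilibrium, the error-correction term pulls the gap y1 - 2*y2 back towards zero, while the shocks make each simulated path different.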
Over the coming days we can compare the forecasts with the actual evolution of the markets, but the predictive value of such models clearly fades beyond one or two periods, especially with daily data and without seasonal or exogenous components.
Limits of Engle-Granger Regression
The Engle-Granger method has several limitations. First, it identifies only a single cointegration relationship. It also requires one of the variables to be chosen as the "first" (the regressand) among all the variables. This choice, which is usually arbitrary, affects both the test results and the model estimation.
We try to go a bit further by permuting the five series and estimating the cointegration relationship for each choice of regressand.
The table shows the results of the t statistic:
h   p-value
1 0.0010
1 0.0010
1 0.0063
1 0.0010
1 0.0010
In our case, it makes little difference which of the five series is chosen as regressand, with the other four as regressors.
In the plot of the five cointegration relationships obtained by the permutation, the different scales make the series hard to compare, but all of them appear stationary and mean-reverting.
Another limitation of the Engle-Granger method is that it is a two-step procedure: a first regression estimates the residual series, and a second regression tests it for a unit root. Errors in the first estimate are necessarily carried into the second.
Furthermore, the cointegration relationship estimated by the Engle-Granger method enters the definition of the VEC model, so the estimation of the VEC model also becomes a two-step procedure.
Johansen Test for Cointegration
The Johansen test for cointegration addresses many of the limitations of the Engle-Granger
method. It avoids the two-step estimators and provides comprehensive tests in the presence of
multiple cointegrating relationships.
Johansen's approach incorporates maximum likelihood testing into the estimation of the model itself, avoiding conditional estimates. The test also provides a framework for testing restrictions on the cointegrating relationships.
The key point of the Johansen method is the relationship between the rank of the impact matrix C = AB' and its eigenvalues. The eigenvalues depend on the form of the VEC model, and in particular on the composition of its deterministic terms. The method determines the cointegration rank by testing how many eigenvalues are statistically different from 0.
We now run the test for the cointegration rank using the default H1 model, in which the error-correction and deterministic terms take the form:
$$ A\,(B' y_{t-1} + c_0) + c_1 $$
[~,~,~,~,mles] = jcitest(Y,'model','H1','lags',1,'display','params');
The output name mles refers to the fact that the test procedure is based on maximum likelihood estimation.
The results of the Johansen test give us more information than the Engle-Granger test. We do not consider the case of rank 0 (no cointegration, i.e. a VAR in differences) or the case of rank 5 (full rank: the data are stationary in levels).
************************
Results Summary (Test 1)
Data: Y
Effective sample size: 1294
Model: H1
Lags: 1
Statistic: trace
Significance level: 0.05
r h stat cValue pValue eigVal
========================================
0 1 198.0842 95.7541 0.0010 0.0988
1 0 63.4945 69.8187 0.1443 0.0247
2 0 31.1656 47.8564 0.6613 0.0134
3 0 13.7345 29.7976 0.8550 0.0080
4 0 3.3247 15.4948 0.9503 0.0026
5 0 0.0048 3.8415 0.9445 0.0000
As expected, the null hypothesis is rejected for rank 0, while it is not rejected for ranks 1 to 4.
Read sequentially (stopping at the first rank whose null hypothesis is not rejected), the trace test points to a single cointegrating relationship, r = 1; the model retained below nevertheless uses rank = 4.
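The trace statistics in the output can be reproduced, up to rounding, from the printed eigenvalues with LR(r) = -T * sum over i = r+1..n of ln(1 - lambda_i), using the effective sample size T = 1294. The sketch below (hypothetical helper names) does this and applies the usual sequential rule: starting from r = 0, pick the first rank whose null hypothesis is not rejected.

```python
import math


def trace_statistics(eigenvalues, T):
    """Johansen trace statistics: LR(r) = -T * sum_{i=r+1..n} ln(1 - lambda_i), r = 0..n-1."""
    return [-T * sum(math.log(1.0 - lam) for lam in eigenvalues[r:])
            for r in range(len(eigenvalues))]


def select_rank(stats, critical_values):
    """Sequential rule: return the first r whose null hypothesis is not rejected."""
    for r, (stat, cv) in enumerate(zip(stats, critical_values)):
        if stat <= cv:
            return r
    return len(stats)


# Rounded eigenvalues and 5% critical values copied from the output above
eig = [0.0988, 0.0247, 0.0134, 0.0080, 0.0026, 0.0000]
cvalues = [95.7541, 69.8187, 47.8564, 29.7976, 15.4948, 3.8415]
stats = trace_statistics(eig, 1294)

if __name__ == "__main__":
    print([round(s, 2) for s in stats])
    print("rank:", select_rank(stats, cvalues))
```

With these values the rule stops at r = 1, since the statistic for r = 1 (63.49 in the output) is below its 5% critical value of 69.82.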
Parameters Estimation
Besides testing for cointegration, jcitest also returns maximum likelihood estimates of the coefficients of the VEC model (the mles output). We estimate the parameters for a VEC model with lag = 1 and rank = 4.
Comparing the Cointegration Analysis Strategies
Comparing the Engle-Granger and Johansen approaches may be difficult for several reasons. First, the two methods are essentially different and may draw different inferences from the same data.
The Engle-Granger two-step method first estimates the cointegration relationship and then the coefficients of the VEC model, which is very different from Johansen's maximum likelihood method.
However, the two approaches should provide broadly comparable results if both start from the same data and search for the same underlying relationships. Normalized cointegrating relationships found by either method should reflect the mechanisms of the data-generating process, and VEC models built on them should have comparable predictive power.
In our case, the cointegration relationships obtained with the Engle-Granger and Johansen tests are very similar. When the results converge we get an important confirmation, given that the two methodologies are different.
Conclusions
Although the forecasting power of econometric models is questionable, these models have possible practical uses.
In particular, the VEC models obtained with the Johansen procedure can also be used to make predictions and seem to be more accurate.
We could, for example, study the effect of these five series on the price of energy at different time frames (daily, hourly, high frequency). In doing so, we could add exogenous variables, such as the dollar index or meteorological factors.
Genetic algorithms (available in MATLAB and other software) could also be very useful to improve the effectiveness of these models.