Financial Econometric Models I
Financial Econometric Models Vincent JEANNIN – ESGF 5IFM
Q1 2012
vinzjeannin@hotmail.com
Summary of the session (est 3h) • Introduction & Objectives • Bibliography • OLS & Exploration
Introduction & Objectives
• What is a model?
• What is the point of writing models?
Describe data behaviour
Model data behaviour
Forecast data behaviour
• Acquire theoretical knowledge of Econometrics & Statistics
• Step by step from OLS to ANOVA on residuals
• Usage of R and Excel
$\text{Obs} = \text{Model} + \varepsilon$, with $\varepsilon$ being white noise
Bibliography
OLS & Exploration
Linear regression model
OLS: Ordinary Least Squares
Minimize the sum of the squared vertical distances between the observations and the linear approximation
$y = f(x) = ax + b$
Residual $\varepsilon$
Two parameters to estimate: • Intercept $b$ • Slope $a$
Minimising residuals
$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$$
When is E minimal?
When the partial derivatives with respect to a and b are 0
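$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$$
Quick high school reminder if necessary…
$$(y_i - a x_i - b)^2 = y_i^2 - 2a x_i y_i - 2b y_i + a^2 x_i^2 + 2ab x_i + b^2$$
Derivative with respect to a:
$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$$
$$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$$
$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$
Derivative with respect to b:
$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$$
$$\sum_{i=1}^{n} \left(-y_i + b + a x_i\right) = 0$$
$$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$$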
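The condition $\frac{\partial E}{\partial b} = 0$ leads easily to the intercept
$$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$$
$$a n \bar{x} + nb = n \bar{y}$$
$$a \bar{x} + b = \bar{y}$$
$$b = \bar{y} - a \bar{x}$$
The regression line goes through $(\bar{x}, \bar{y})$
The distance of this point to the line is indeed 0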
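With $b = \bar{y} - a \bar{x}$, the line reads
$$y = ax + \bar{y} - a\bar{x}$$
$$y - \bar{y} = a(x - \bar{x})$$
From the derivative with respect to a:
$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$$
$$\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0$$
$$\sum_{i=1}^{n} x_i (y_i - a x_i - \bar{y} + a \bar{x}) = 0$$
$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$
From the derivative with respect to b:
$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$$
$$\sum_{i=1}^{n} (y_i - b - a x_i) = 0$$
$$\sum_{i=1}^{n} (y_i - \bar{y} + a \bar{x} - a x_i) = 0$$
$$\sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$
$$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$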
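We have
$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$
and
$$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$
so
$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right)$$
$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) - \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$
$$\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$
Finally…
$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$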
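$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The numerator is $n$ times the covariance, the denominator is $n$ times the variance
$$a = \frac{Cov_{xy}}{\sigma_x^2}$$
$$b = \bar{y} - a \bar{x}$$
You can use the Excel functions INTERCEPT and SLOPE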
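As a quick check of the closed-form result, here is a minimal sketch in R. The `x` and `y` values are made up for illustration (they are not course data); the sketch compares the covariance/variance formulas with R's built-in `lm()`.

```r
# Illustrative data (assumption: invented for this sketch)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope: covariance over variance (the 1/n or 1/(n-1) factors cancel in the ratio)
a <- cov(x, y) / var(x)
# Intercept: the line goes through (mean(x), mean(y))
b <- mean(y) - a * mean(x)

# Reference fit with R's built-in OLS
fit <- lm(y ~ x)
print(c(intercept = b, slope = a))
print(coef(fit))
```

Both printouts should agree, which is the same check Excel's INTERCEPT and SLOPE would give on these numbers.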
Calculate the variances and the covariance of X = {1,2,3,3,1,2} and Y = {2,3,1,1,3,2}
You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P
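The same exercise can be sketched in R. One caveat: R's `var()` and `cov()` divide by n - 1, whereas the Excel `.P` functions divide by n, so the sketch rescales to get the population versions.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)
n <- length(X)

# Population versions (divide by n), matching VAR.P and COVARIANCE.P.
# var() and cov() use n - 1, hence the (n - 1) / n rescaling.
var_p_X  <- var(X) * (n - 1) / n
var_p_Y  <- var(Y) * (n - 1) / n
cov_p_XY <- cov(X, Y) * (n - 1) / n

print(c(var_X = var_p_X, var_Y = var_p_Y, cov_XY = cov_p_XY))
# Both variances are 2/3 and the covariance is -0.5
```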
Let’s assess the quality of the regression
Let’s calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient, PPMCC):
$$r = \frac{Cov_{xy}}{\sigma_x \sigma_y}$$
Value between -1 and 1
$|r| = 1$: perfect linear dependence
$r \approx 0$: no linear dependence
It gives an idea of the dispersion of the scatterplot
You can use the Excel function CORREL
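A minimal R sketch of the formula, reusing the X and Y series from the variance/covariance exercise. It computes r by hand with population moments and compares it with the built-in `cor()`; the two agree because the n versus n - 1 factors cancel in the ratio.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

# Population covariance and standard deviations (divide by n)
cov_p  <- mean((X - mean(X)) * (Y - mean(Y)))
sd_p_X <- sqrt(mean((X - mean(X))^2))
sd_p_Y <- sqrt(mean((Y - mean(Y))^2))

r_manual  <- cov_p / (sd_p_X * sd_p_Y)
r_builtin <- cor(X, Y)   # Pearson correlation by default

print(c(manual = r_manual, builtin = r_builtin))
# Both equal -0.75
```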
R=0.96
High quality
R=0.62
Poor quality
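What is good quality?
Slightly discretionary…
If $|r| \geq \frac{\sqrt{3}}{2} \approx 0.8660$ (equivalently $r^2 \geq 0.75$), the quality is considered acceptable
It’s widely accepted as the threshold between acceptable and poor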
The regression itself introduces a bias
Let’s introduce the coefficient of determination, R-squared
Total Dispersion = Dispersion Regression + Dispersion Residual
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$$
In other words, the part of the total dispersion explained by the regression
You can use the Excel function RSQ
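The decomposition can be checked numerically. The sketch below uses a small invented data set (an assumption, not course data), computes the three dispersions from an `lm()` fit and compares the resulting ratio with the R-squared reported by `summary()`.

```r
# Illustrative data (assumption: invented for this sketch)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 5.8)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

total      <- sum((y - mean(y))^2)      # total dispersion
regression <- sum((y_hat - mean(y))^2)  # dispersion explained by the regression
residual   <- sum((y - y_hat)^2)        # dispersion left in the residuals

r2 <- regression / total
print(c(total = total, regression = regression, residual = residual))
print(c(r2_manual = r2, r2_summary = summary(fit)$r.squared))
```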
In a simple linear regression with intercept, $R^2 = r^2$
Are a good correlation coefficient and a good coefficient of determination enough to accept the regression?
Not necessarily!
Residuals must carry no remaining structure; in other words, they must be white noise!
For every dataset of the quartet:
$\bar{x} = 9$
$\bar{y} = 7.5$
$y = 3 + 0.5x$
$r = 0.82$
$R^2 = 0.67$
Don’t get fooled by numbers!
Can you say at this stage which regression is the best?
Certainly not those on the right: you need a LINEAR dependence
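The figures quoted above match the well-known Anscombe quartet, which ships with base R as the `anscombe` data frame. Assuming that is the data behind the slide, the sketch below recomputes the summary statistics for the four sets.

```r
# `anscombe` is part of base R (package datasets): columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r         = cor(x, y),
    r2        = summary(fit)$r.squared)
})
colnames(stats) <- paste0("set", 1:4)

# Four nearly identical columns, despite very different scatterplots
print(round(stats, 2))
```

Plotting each pair, for instance with `plot(anscombe$x2, anscombe$y2)`, shows why the numbers alone are not enough.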
Is linear regression useless then?
Think about what you could do to the series
Polynomial transformation, log transformation, …
Otherwise, non-linear regressions, but that’s another story
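As an illustration of the log transformation, here is a minimal sketch on invented data with exponential growth (the growth rate 0.3, the scale 2 and the small alternating disturbance are all assumptions chosen for the example). A straight line fits the raw series poorly but fits its logarithm well.

```r
# Invented series: exponential growth with a small deterministic disturbance
x <- 1:10
y <- 2 * exp(0.3 * x) * (1 + 0.02 * rep(c(1, -1), 5))

fit_raw <- lm(y ~ x)        # linear regression on the raw series
fit_log <- lm(log(y) ~ x)   # linear regression after a log transformation

r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared

print(c(r2_raw = r2_raw, r2_log = r2_log))
print(coef(fit_log))  # slope close to the growth rate 0.3, intercept close to log(2)
```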
First application to financial markets
S&P / AmEx in 2011
$$R_{AmEx} = 0.06\% + 1.1046 \times R_{S\&P}$$
$$r = \frac{Cov_{AmEx,S\&P}}{\sigma_{AmEx} \sigma_{S\&P}} = 0.8501$$
$$R^2 = r^2 = 0.7227$$
Oops :-o
Is Excel wrong?
R-Squared has different calculation methods
Let’s accept this regression then, as the quality seems pretty good
How to use this?
• Forecasting? Not really… Both are random variables
• Hedging? Yes, but there is basis risk, and be careful with the residuals…
Let’s have a try!
In theory, what is the daily result of the hedge? The intercept
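To see why the hedge result is the intercept in theory, here is a minimal sketch on simulated returns. Everything in it is an assumption for illustration (the seed, the volatilities, the true intercept and slope); it does not use the AmEx and S&P series.

```r
set.seed(42)  # reproducible simulation
n <- 251
r_index <- rnorm(n, mean = 0, sd = 0.015)                       # simulated index returns
r_stock <- 0.0006 + 1.1 * r_index + rnorm(n, mean = 0, sd = 0.01)  # simulated stock returns

fit   <- lm(r_stock ~ r_index)
alpha <- unname(coef(fit)[1])
beta  <- unname(coef(fit)[2])

# Long 1 unit of stock, short beta units of index
hedged <- r_stock - beta * r_index

# In sample, the hedged return is the intercept plus the residual
print(c(mean_hedged = mean(hedged), alpha = alpha))
print(c(sd_hedged = sd(hedged), sd_resid = sd(resid(fit))))
```

The average hedged return equals the intercept, while its day-to-day variability is exactly that of the residuals: this is the basis risk mentioned above.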
Hedging $1.0M of AmEx Stocks with $1.1046M of S&P
It would have been too easy… Great differences… Why?
Sensitivity to the size of the sample
Heteroscedasticity
Let’s take a similar approach using proper statistics and econometrics software: R
• Free • Open Source • Developments shared by developers
> Val<-read.csv(file="C:/Users/Vinz/Desktop/Val.csv",head=TRUE,sep=",")
> summary(Val)
      SPX                  AMEX
 Min.   :-0.0666344   Min.   :-0.0883287
 1st Qu.:-0.0069082   1st Qu.:-0.0094580
 Median : 0.0010016   Median : 0.0013007
 Mean   : 0.0001249   Mean   : 0.0005891
 3rd Qu.: 0.0075235   3rd Qu.: 0.0102923
 Max.   : 0.0474068   Max.   : 0.0710967
Let’s begin with statistical exploration to get familiar with the series and the software
> hist(Val$AMEX, breaks=20, main="Distribution AMEX Returns")
> sd(Val$AMEX)
[1] 0.01915489
> hist(Val$SPX, breaks=20, main="Distribution SPX Returns")
> sd(Val$SPX)
[1] 0.01468776
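These are obviously negatively skewed distributions
> library(moments)  # provides skewness()
> skewness(Val$AMEX)
[1] -0.2453693
> skewness(Val$SPX)
[1] -0.4178701
Reminders
• Negative skew: long left tail, mass on the right, skewed to the left
• Positive skew: long right tail, mass on the left, skewed to the right
$$SKEW(X) = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^3\right] = \frac{E\left[(X - \bar{X})^3\right]}{\left(E\left[(X - \bar{X})^2\right]\right)^{3/2}}$$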
These are obviously leptokurtic distributions
> library(moments)
> kurtosis(Val$AMEX)
[1] 5.770583
> kurtosis(Val$SPX)
[1] 5.671254
Reminders
$$KURT(X) = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^4\right] = \frac{E\left[(X - \bar{X})^4\right]}{\left(E\left[(X - \bar{X})^2\right]\right)^{2}}$$
What is their K? (excess kurtosis)
Subtract 3 to make it relative to the normal distribution…
Excel function SKEW
R function skewness (package moments)
Quick check: what are the Skewness and Kurtosis of {1,2,-3,0,-2,1,1}?
Excel function KURT
R function kurtosis (package moments)
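For the quick check on {1,2,-3,0,-2,1,1}, here is a minimal base-R sketch of the moment formulas given in the reminders. The helper `central_moment` is a name introduced for this example; it divides by n, which is also what the `moments` package does. Excel's SKEW and KURT apply small-sample corrections (and KURT returns the excess kurtosis), so their values will differ slightly.

```r
x <- c(1, 2, -3, 0, -2, 1, 1)

# k-th central moment, dividing by n (hypothetical helper for this sketch)
central_moment <- function(x, k) mean((x - mean(x))^k)

skew <- central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
kurt <- central_moment(x, 4) / central_moment(x, 2)^2

print(c(skewness = skew, kurtosis = kurt, excess_kurtosis = kurt - 3))
# Kurtosis is 2.03 (excess -0.97); skewness is about -0.71
```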
By the way, what is the most platykurtic distribution in nature?
Toss it!
Head = Success = 1 / Tail = Failure = 0
> library(moments)
> toss<-rbinom(10000000,1,0.5)
> mean(toss)
[1] 0.5001777
> kurtosis(toss)
[1] 1.000001
> kurtosis(toss)-3
[1] -1.999999
> hist(toss, breaks=10, main="Tossing a coin 10 million times", xlab="Result of the trial", ylab="Occurrence")
> sum(toss)
[1] 5001777
50.01777% rate of success: fair or not fair? Trick coin?
Can be tested later with a Bayesian approach
On a perfect 50/50, Kurtosis would be 1, Excess Kurtosis -2: the minimum!
This is a Bernoulli trial
$B(n, p)$ with $0 < p < 1$, $p \in \mathbb{R}$, and $n \geq 1$ integer (a single toss is $n = 1$)
For one trial:
Mean: $p$
SD: $\sqrt{p(1 - p)}$
Skewness: $\frac{1 - 2p}{\sqrt{p(1 - p)}}$
Kurtosis: $\frac{1}{p(1 - p)} - 3$
Easy to demonstrate that the kurtosis is lowest when p = 0.5
A bit more complicated to demonstrate that it is the minimum for any distribution
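The claim about p = 0.5 can be checked numerically with the kurtosis formula above. This is only a sketch over a grid of p values (the grid step of 0.01 is an arbitrary choice), not a proof.

```r
# Kurtosis of a single Bernoulli trial, as given above
bern_kurtosis <- function(p) 1 / (p * (1 - p)) - 3

p <- seq(0.01, 0.99, by = 0.01)  # grid of success probabilities
k <- bern_kurtosis(p)

p_min <- p[which.min(k)]
print(c(p_at_minimum = p_min, kurtosis = min(k), excess_kurtosis = min(k) - 3))
# Minimum at p = 0.5: kurtosis 1, excess kurtosis -2
```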
Back to our series, a good tool is the BoxPlot
boxplot(Val$AMEX,Val$SPX, main="AMEX & S&P BoxPlots",
names=c("AMEX","SPX"),col="blue")
Too many outliers!
There should be 2 at most for the distributions to be normal
Fatter tails than the normal distribution
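Where does "2 at most" come from? A sketch of the arithmetic, assuming R's default boxplot whiskers (1.5 times the interquartile range beyond the quartiles) and the 251 daily observations quoted later in the session as the sample size:

```r
# Quartiles and interquartile range of a standard normal distribution
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1

# Default boxplot fences: 1.5 * IQR beyond the quartiles
lower_fence <- q1 - 1.5 * iqr

# Probability that a normal observation falls outside either fence (by symmetry)
p_outlier <- 2 * pnorm(lower_fence)

n <- 251  # sample size of the return series
print(c(share_outside = p_outlier, expected_outliers = n * p_outlier))
# About 0.7% of observations, i.e. fewer than 2 points out of 251
```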
Leptokurtic distributions
Negatively skewed distribution
Are they normal distributions?
Let’s compare them to normal distributions with the same mean and standard deviation, and draw the QQ plots
x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX))
# freq=FALSE plots densities, so the histogram and the normal curve share the same scale
hist(Val$AMEX, breaks=100, freq=FALSE, main="AmEx Returns / Normal Distribution", xlab="Return", ylab="Density")
lines(x,y1,type="l",lwd=3,col="red")
x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX))
hist(Val$SPX, breaks=20, freq=FALSE, main="S&P Returns / Normal Distribution", xlab="Return", ylab="Density")
lines(x,y1,type="l",lwd=3,col="red")
Excess kurtosis is obvious
Fatter and longer tails
Let’s have a look at their quantiles (inverse CDF) through a QQ plot
Let’s properly test the normality
> qqnorm(Val$AMEX)
> qqline(Val$AMEX)
> qqnorm(Val$SPX)
> qqline(Val$SPX)
Fatter tails
Many tests can be used…
• Kolmogorov-Smirnov • Jarque-Bera • Chi-Square • Shapiro-Wilk
Let’s try Kolmogorov-Smirnov
It measures the largest distance between the empirical CDF and the CDF of the reference distribution
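To make the statistic concrete, here is a minimal sketch that computes the Kolmogorov-Smirnov distance D by hand on a simulated sample (the seed and the sample size of 50 are assumptions for the example) and compares it with the value returned by `ks.test`.

```r
set.seed(1)            # reproducible simulated sample
x <- sort(rnorm(50))   # sorted observations
n <- length(x)

F_ref <- pnorm(x)      # reference CDF (standard normal) at each observation

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted observation,
# so the largest gap is checked on both sides of each jump
d_plus   <- max((1:n) / n - F_ref)
d_minus  <- max(F_ref - (0:(n - 1)) / n)
D_manual <- max(d_plus, d_minus)

D_builtin <- unname(ks.test(x, "pnorm")$statistic)
print(c(manual = D_manual, builtin = D_builtin))
```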
x=seq(-0.1,0.1,length=1000)  # grid covering the range of the daily returns
plot(ecdf(Val$AMEX), do.points=FALSE, col="red", lwd=3, main="Normal Distribution against AMEX - CDFs", xlab="x", ylab="P(X<=x)")
lines(x, pnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX)), col="blue", type="l", lwd=3)
x=seq(-0.1,0.1,length=1000)
plot(ecdf(Val$SPX), do.points=FALSE, col="red", lwd=3, main="Normal Distribution against S&P - CDFs", xlab="x", ylab="P(X<=x)")
lines(x, pnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX)), col="blue", type="l", lwd=3)
> ks.test(Val$SPX, "pnorm")
One-sample Kolmogorov-Smirnov test
data: Val$SPX
D = 0.4811, p-value < 2.2e-16
alternative hypothesis: two-sided
> ks.test(Val$AMEX, "pnorm")
One-sample Kolmogorov-Smirnov test
data: Val$AMEX
D = 0.4742, p-value < 2.2e-16
alternative hypothesis: two-sided
The null hypothesis (H0) is that the distribution is normal
Note: called without further arguments, "pnorm" is the standard normal N(0, 1); pass mean= and sd= to ks.test to compare with a normal distribution of the same mean and standard deviation as the series
Do we accept or reject H0 at the 95% confidence level?
H0 is rejected if the test statistic, D, is greater than the critical value obtained from a table
Sample size: 251
Critical value at 95%: $\frac{1.36}{\sqrt{251}} = 0.086$
Rejected or not?
Rejected! The series do not fit a normal distribution
The p-value was already giving the answer
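A minimal sketch of the critical value, plus one caveat worth checking on simulated data (the seed and the simulated volatility of 1.5% are assumptions for the example). `ks.test(x, "pnorm")` compares the sample with N(0, 1), so returns of the order of 1% produce a D close to 0.5 even when they are perfectly normal. Passing `mean` and `sd` compares with a normal of matching location and scale.

```r
n <- 251
d_crit <- 1.36 / sqrt(n)   # large-sample 95% critical value

set.seed(123)
r <- rnorm(n, mean = 0, sd = 0.015)   # simulated, genuinely normal "returns"

# Against the standard normal N(0, 1): the scale mismatch dominates
D_std <- unname(ks.test(r, "pnorm")$statistic)
# Against a normal with the sample's own mean and standard deviation
D_fit <- unname(ks.test(r, "pnorm", mean = mean(r), sd = sd(r))$statistic)

print(round(c(critical = d_crit, D_vs_N01 = D_std, D_vs_fitted = D_fit), 4))
```

When the mean and standard deviation are estimated from the same sample, the tabulated critical value is only approximate (it becomes conservative), so treat the comparison as indicative.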
OK, we now know a bit more about the two series we want to regress
> lm(Val$AMEX~Val$SPX)
Call:
lm(formula = Val$AMEX ~ Val$SPX)
Coefficients:
(Intercept)      Val$SPX
  0.0004505    1.1096287
plot(Val$SPX,Val$AMEX, main="S&P / AmEx", xlab="S&P", ylab="AmEx", col="red")
abline(lm(Val$AMEX~Val$SPX), col="blue")
$y = 110.96\% \times x + 0.045\%$
> Reg<-lm(Val$AMEX~Val$SPX)
> summary(Reg)
Call:
lm(formula = Val$AMEX ~ Val$SPX)
Residuals:
      Min        1Q    Median        3Q       Max
-0.030387 -0.006072 -0.000114  0.006624  0.027824
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004505  0.0006365   0.708     0.48
Val$SPX     1.1096287  0.0434231  25.554   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01008 on 249 degrees of freedom
Multiple R-squared: 0.7239, Adjusted R-squared: 0.7228
F-statistic: 653 on 1 and 249 DF, p-value: < 2.2e-16
The next important step is to analyse the residuals
They need to be white noise; you can get a first assessment from the quartiles
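The figures in the summary are tied together by simple arithmetic. The sketch below takes the slope estimate, its standard error and the degrees of freedom from the printout above and rebuilds the t value, the F-statistic and both R-squared values. The identities F = t² and R² = F / (F + df) hold for a simple regression with one regressor and an intercept.

```r
# Values copied from the summary(Reg) printout
est <- 1.1096287   # slope estimate
se  <- 0.0434231   # its standard error
df  <- 249         # residual degrees of freedom

t_val <- est / se                 # t value of the slope
f_val <- t_val^2                  # F-statistic (one regressor)
r2    <- f_val / (f_val + df)     # multiple R-squared
n     <- df + 2                   # number of observations
adj_r2 <- 1 - (1 - r2) * (n - 1) / df   # adjusted R-squared

print(round(c(t = t_val, F = f_val, R2 = r2, adj_R2 = adj_r2), 4))
```

Up to rounding, this reproduces the printed 25.554, 653, 0.7239 and 0.7228.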
layout(matrix(1:4,2,2))
plot(Reg)
The QQ plot compares the quantiles of the standardized residuals with those of a normal distribution
A perfect fit is a straight line
Left tail noticeably different
Residuals should be randomly distributed around the 0 horizontal line
To accept the regression, you need the residuals to be white noise
Their mean should be 0
You don’t want to see a trend, a dependence
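One point worth keeping in mind when reading this plot: with an intercept in the model, OLS residuals have a zero mean and zero linear correlation with the regressor by construction. The sketch below shows this on simulated data (the seed, sample size and coefficients are assumptions for the example), so what the eye should look for is curvature or a changing spread rather than a non-zero average.

```r
set.seed(7)                    # reproducible simulated data
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)    # linear model with white-noise errors

fit <- lm(y ~ x)
e   <- resid(fit)

# Both quantities are zero up to floating-point error, whatever the data
print(c(mean_residual = mean(e), cor_with_x = cor(e, x)))
```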
• Square root of the absolute standardized residuals as a function of the fitted values
• There should be no obvious trend in this plot
Nothing suggesting a white noise
Now showing leverage
Marginal importance of a point in the regression
Far points suggest outliers or a poor model
So do we accept the regression?
Probably not… But let’s check…
Kolmogorov-Smirnov on residuals
Resid<-resid(Reg)
ks.test(Resid, "pnorm")
One-sample Kolmogorov-Smirnov test
data: Resid
D = 0.4889, p-value < 2.2e-16
alternative hypothesis: two-sided
$$D_{critical} = \frac{1.36}{\sqrt{251}} = 0.086$$
Upper bound on D for H0 not to be rejected
Note: as before, "pnorm" without further arguments is the standard normal N(0, 1); pass mean= and sd= to compare with a normal distribution on the scale of the residuals
Rejected! Regressions between two different assets are very often poor
Heteroscedasticity
Basis risk if you hedge anyway
Conclusion
OLS
Residuals
Normality
Heteroscedasticity