financial econometric models i

Financial Econometric Models Vincent JEANNIN – ESGF 5IFM

Q1 2012

Summary of the session (est. 3h)
• Introduction & Objectives
• Bibliography
• OLS & Exploration

Introduction & Objectives

• What is a model?

• What is the point of writing models?

Describe data behaviour

Model data behaviour

• Acquire theory knowledge on Econometrics & Statistics

• Step by step from OLS to ANOVA on residuals

• Usage of R and Excel

Forecast data behaviour

$\text{Obs} = \text{Model} + \varepsilon$, with $\varepsilon$ being white noise

Bibliography

OLS & Exploration

Linear regression model

Minimize the sum of the squared vertical distances between the observations and the linear approximation

$y = f(x) = ax + b$

Residual $\varepsilon$

OLS: Ordinary Least Squares

Two parameters to estimate:
• Intercept α (written $b$ in the derivation below)
• Slope β (written $a$ in the derivation below)

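Minimising residuals

$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$$

When is E minimal?

When the partial derivatives with respect to a and b are 0

$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} \left(y_i - a x_i - b\right)^2$$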

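$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2 + 2 b x_i\right)$$

Quick high school reminder if necessary…

$$(y_i - a x_i - b)^2 = y_i^2 - 2 a x_i y_i - 2 b y_i + a^2 x_i^2 + 2 a b x_i + b^2$$

$$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$$

$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$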

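$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2 b + 2 a x_i\right)$$

$$\sum_{i=1}^{n} \left(-y_i + b + a x_i\right) = 0$$

$$a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$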

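$$a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$

Leads easily to the intercept

$$a n \bar{x} + n b = n \bar{y}$$

$$a \bar{x} + b = \bar{y}$$

The regression line goes through $(\bar{x}, \bar{y})$

The distance from this point to the line is indeed 0

$$\frac{\partial E}{\partial b} = 0 \;\Rightarrow\; b = \bar{y} - a \bar{x}$$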

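$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2 + 2 b x_i\right)$$

$$y = a x + \bar{y} - a \bar{x}$$

$$y - \bar{y} = a (x - \bar{x})$$

$$b = \bar{y} - a \bar{x}$$

$$\sum_{i=1}^{n} x_i \left(y_i - a x_i - b\right) = 0$$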

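$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2 b + 2 a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left(y_i - b - a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left(y_i - \bar{y} + a \bar{x} - a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$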

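$$\sum_{i=1}^{n} x_i \left(y_i - a x_i - \bar{y} + a \bar{x}\right) = 0$$

$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

$$\sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = \sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) - \sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right)$$

$$\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$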

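Finally…

We have

$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Numerator: Covariance, denominator: Variance (the common factor $1/n$ cancels)

$$a = \frac{Cov_{xy}}{\sigma_x^2}$$

$$b = \bar{y} - a \bar{x}$$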

You can use the Excel functions INTERCEPT and SLOPE
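The closed-form result can be checked numerically in R. The sketch below uses a small made-up dataset (the values of `x` and `y` are hypothetical, chosen only for illustration) and compares the covariance/variance formulas with R's built-in `lm`.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope a = Cov(x, y) / Var(x); R's cov() and var() both divide by n - 1,
# so the factor cancels in the ratio
a <- cov(x, y) / var(x)

# Intercept b = mean(y) - a * mean(x)
b <- mean(y) - a * mean(x)

# Reference fit with R's built-in linear model
fit <- lm(y ~ x)

print(c(intercept = b, slope = a))
print(coef(fit))
```

Both printed vectors should agree, since `lm` solves the same least-squares problem.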

Calculate the Variances and Covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}

You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P
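The same exercise can be sketched in R. One caveat worth stating: R's `var()` and `cov()` divide by n - 1, whereas the Excel `.P` functions divide by n, so the sketch rescales to the population versions. The variable names are illustrative.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)
n <- length(X)

# R's var() and cov() are sample versions (divide by n - 1);
# rescale by (n - 1) / n to get the population versions
var_x  <- var(X) * (n - 1) / n
var_y  <- var(Y) * (n - 1) / n
cov_xy <- cov(X, Y) * (n - 1) / n
print(c(var_x = var_x, var_y = var_y, cov_xy = cov_xy))

# Slope and intercept implied by a = Cov / Var and b = mean(Y) - a * mean(X)
a <- cov_xy / var_x
b <- mean(Y) - a * mean(X)
print(c(slope = a, intercept = b))
```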

Let’s assess the quality of the regression

Let’s calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient – PPMCC):

$$r = \frac{Cov_{xy}}{\sigma_x \sigma_y}$$

Value between -1 and 1

$|r| = 1$: perfect linear dependence

$r \approx 0$: no linear dependence

Gives an idea of the dispersion of the scatterplot

You can use the Excel function CORREL
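As a quick check of the definition, the sketch below computes r by hand for the X and Y series of the earlier exercise and compares it with R's `cor`. The helper `pop_sd` is a hypothetical name introduced here for the population standard deviation.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

# Population standard deviation (divides by n), as in STDEV.P
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Population covariance
cov_xy <- mean((X - mean(X)) * (Y - mean(Y)))

r_manual  <- cov_xy / (pop_sd(X) * pop_sd(Y))
r_builtin <- cor(X, Y)
print(c(manual = r_manual, builtin = r_builtin))
```

The two values agree because the choice between n and n - 1 cancels in the ratio.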

R=0.96

High quality

R=0.62

Poor quality

What is good quality?

Slightly discretionary…

$$r \geq \frac{\sqrt{3}}{2} \approx 0.8660$$

This is widely accepted as the threshold between acceptable and poor (it corresponds to $r^2 \geq 0.75$)

The regression itself introduces a bias

Let’s introduce the coefficient of determination R-Squared

Total Dispersion = Dispersion Regression + Dispersion Residual

$$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$$

In other words the part of the total dispersion explained by the regression

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

You can use the Excel function RSQ

In a simple linear regression with intercept $R^2 = r^2$
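The decomposition can be verified numerically. The sketch below reuses the X and Y series from the earlier exercise; the names `sst`, `ssr` and `sse` are illustrative labels for the total, regression and residual dispersions.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

fit   <- lm(Y ~ X)
y_hat <- fitted(fit)

sst <- sum((Y - mean(Y))^2)      # total dispersion
ssr <- sum((y_hat - mean(Y))^2)  # dispersion explained by the regression
sse <- sum((Y - y_hat)^2)        # residual dispersion

r2 <- ssr / sst
print(c(total = sst, regression = ssr, residual = sse))
print(c(R2 = r2, r_squared = cor(X, Y)^2))
```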

Is a good correlation coefficient and a good coefficient of determination enough to accept the regression?

Not necessarily!

Residuals must carry no remaining effect; in other words, they must be white noise!

$\bar{y} = 7.5$

$\bar{x} = 9$

$y = 3 + 0.5x$

$r = 0.82$

$R^2 = 0.67$

Don’t get fooled by numbers!

For every dataset of the quartet

Can you say at this stage which regression is the best?

Certainly not those on the right: you need a LINEAR dependence
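The summary statistics quoted here coincide with those of Anscombe's quartet, which ships with R as the `anscombe` data frame. Assuming that is the dataset behind the four charts (the slides do not name it), the following sketch recomputes the statistics for each of the four sets.

```r
# `anscombe` is part of base R (package datasets): columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r         = cor(x, y),
    R2        = cor(x, y)^2)
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 2))
```

The four columns come out almost identical, which is why only the plots reveal which regressions make sense.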

Is any linear regression useless?

Think what you could do to the series

Polynomial transformation, log transformation,…

Otherwise, non-linear regressions, but that’s another story
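A minimal sketch of the log-transformation idea, on a hypothetical noise-free exponential series (the constants 2 and 0.3 are made up for illustration): the raw series is not linear in x, but its logarithm is.

```r
# Hypothetical exponential relationship y = 2 * exp(0.3 * x)
x <- 1:20
y <- 2 * exp(0.3 * x)

fit_raw <- lm(y ~ x)       # linear fit on the raw series
fit_log <- lm(log(y) ~ x)  # linear fit after log transformation

# In a simple regression with intercept, R^2 equals the squared correlation
r2_raw <- cor(x, y)^2
r2_log <- cor(x, log(y))^2
print(c(raw = r2_raw, log = r2_log))

# The log fit recovers log(2) as intercept and 0.3 as slope
print(coef(fit_log))
```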

First application on financial market

S&P / AmEx in 2011

$$R_{AmEx} = 0.06\% + 1.1046 \times R_{S\&P}$$

$$r = \frac{Cov_{AmEx,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$$

$$R^2 = r^2 = 0.7227$$

Oops :-o

Is Excel wrong?

R-Squared has different calculation methods

Let’s accept the following regression then, as the quality seems pretty good

How to use this?

• Forecasting? Not really… Both are random variables

• Hedging? Yes, but with basis risk: be careful with the residuals…

Let’s have a try!

In theory, what is the daily result of the hedge? The intercept α

Hedging $1.0M of AmEx stock with $1.1046M of S&P
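To make the hedge arithmetic concrete, here is a sketch on simulated returns. The original price data is not reproduced here, so `r_spx` and `r_amex` are hypothetical series generated around the slide's coefficients, and the volatilities are assumptions. The position is long $1M of AmEx and short slope × $1M of S&P.

```r
set.seed(42)
n <- 251

# Simulated daily returns (hypothetical, for illustration only)
r_spx  <- rnorm(n, mean = 0, sd = 0.0147)
r_amex <- 0.0006 + 1.1046 * r_spx + rnorm(n, mean = 0, sd = 0.01)

fit   <- lm(r_amex ~ r_spx)
alpha <- unname(coef(fit)[1])
beta  <- unname(coef(fit)[2])

# Daily P&L: long $1M AmEx, short beta * $1M S&P
notional <- 1e6
pnl <- notional * (r_amex - beta * r_spx)

# In sample, the average daily P&L equals notional * alpha;
# the day-to-day variation around it is the residual (basis risk)
print(c(mean_pnl = mean(pnl), alpha_pnl = notional * alpha, sd_pnl = sd(pnl)))
```

The average is pinned to the intercept only in sample; out of sample the residuals drive the result, which is the point the following slides make.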

It would have been too easy… Great differences… Why?

Sensitivity to the size of the sample

Heteroscedasticity

Let’s take a similar approach using proper statistics and econometrics software: R

• Free
• Open Source
• Developments shared by developers

> Val <- read.csv(file="C:/Users/Vinz/Desktop/Val.csv", header=TRUE, sep=",")

> summary(Val)

      SPX                  AMEX
 Min.   :-0.0666344   Min.   :-0.0883287
 1st Qu.:-0.0069082   1st Qu.:-0.0094580
 Median : 0.0010016   Median : 0.0013007
 Mean   : 0.0001249   Mean   : 0.0005891
 3rd Qu.: 0.0075235   3rd Qu.: 0.0102923
 Max.   : 0.0474068   Max.   : 0.0710967

Let’s begin with statistical exploration to get familiar with the series and the software

> hist(Val$AMEX, breaks=20, main="Distribution AMEX Returns")

> sd(Val$AMEX)

[1] 0.01915489

> hist(Val$SPX, breaks=20, main="Distribution SPX Returns")

> sd(Val$SPX)

[1] 0.01468776

These distributions are clearly negatively skewed

> library(moments)  # provides skewness() and kurtosis()

> skewness(Val$AMEX)

[1] -0.2453693

> skewness(Val$SPX)

[1] -0.4178701

Reminders

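• Negative skew: long left tail, mass on the right, skew to the left
• Positive skew: long right tail, mass on the left, skew to the right

$$SKEW(X) = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^3\right] = \frac{E\left[(X - \bar{X})^3\right]}{\left(E\left[(X - \bar{X})^2\right]\right)^{3/2}}$$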

These distributions are clearly leptokurtic

> library(moments)

> kurtosis(Val$AMEX)

[1] 5.770583

> kurtosis(Val$SPX)

[1] 5.671254

Reminders

What is their K? (excess kurtosis)

Subtract 3 to make it relative to the normal distribution…

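$$KURT(X) = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^4\right] = \frac{E\left[(X - \bar{X})^4\right]}{\left(E\left[(X - \bar{X})^2\right]\right)^2}$$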

Excel function SKEW

R function skewness (package moments)

Quick check: what are the Skewness and Kurtosis of {1,2,-3,0,-2,1,1}?

Excel function KURT

R function kurtosis (package moments)
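The quick check can be worked through in base R directly from the moment definitions, without any package. The helper `central_moment` is a hypothetical name introduced for this sketch. Note that Excel's SKEW and KURT apply small-sample corrections (and KURT reports excess kurtosis), so their results will differ from these population-moment values.

```r
v <- c(1, 2, -3, 0, -2, 1, 1)

# k-th central moment, dividing by n (population version)
central_moment <- function(x, k) mean((x - mean(x))^k)

skew <- central_moment(v, 3) / central_moment(v, 2)^(3/2)
kurt <- central_moment(v, 4) / central_moment(v, 2)^2

print(c(skewness = skew, kurtosis = kurt, excess_kurtosis = kurt - 3))
```

Here the skewness is about -0.71 and the kurtosis is 2.03, so the excess kurtosis is -0.97: a mildly left-skewed, platykurtic sample.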

By the way, what is the most platykurtic distribution in nature?

Toss it!

Head = Success = 1 / Tail = Failure = 0

> require(moments)

> library(moments)

> toss<-rbinom(10000000,1,0.5)

> mean(toss)

[1] 0.5001777

> kurtosis(toss)

[1] 1.000001

> kurtosis(toss)-3

[1] -1.999999

> hist(toss, breaks=10, main="Tossing a coin 10 million times", xlab="Result of the trial", ylab="Occurrence")

> sum(toss)

[1] 5001777

50.01777% rate of success: fair or not fair? Trick coin ?

On a perfect 50/50, Kurtosis would be 1, Excess Kurtosis -2: the minimum!

This is a Bernoulli trial

$B(n, p)$

Mean: $p$

SD: $\sqrt{p(1 - p)}$

Skewness: $\dfrac{1 - 2p}{\sqrt{p(1 - p)}}$

Kurtosis: $\dfrac{1}{p(1 - p)} - 3$

It is easy to show that the kurtosis is lowest when p = 0.5

It is a bit more complicated to show that this is the minimum over all distributions

Can be tested later with a Bayesian approach

with $n > 1$, $n$ integer, and $0 < p < 1$, $p \in \mathbb{R}$
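The claim that p = 0.5 gives the lowest kurtosis can be illustrated by evaluating the formula 1 / (p(1 - p)) - 3 on a grid of probabilities. The function name `bern_kurtosis` and the grid are choices made for this sketch.

```r
# Kurtosis of a single Bernoulli trial as a function of p
bern_kurtosis <- function(p) 1 / (p * (1 - p)) - 3

p_grid <- seq(0.05, 0.95, by = 0.05)
k <- bern_kurtosis(p_grid)

# Probability at which the kurtosis is smallest on the grid
p_min <- p_grid[which.min(k)]

print(data.frame(p = p_grid, kurtosis = k, excess = k - 3))
print(p_min)
```

The minimum sits at p = 0.5, where the kurtosis is 1 and the excess kurtosis is -2, because p(1 - p) is largest there.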

Back to our series, a good tool is the BoxPlot

boxplot(Val$AMEX,Val$SPX, main="AMEX & S&P BoxPlots",

names=c("AMEX","SPX"),col="blue")

Too many outliers!

There should be 2 at most for the distribution to be normal

Fatter tails than the normal distribution

Leptokurtic distributions

Negatively skewed distribution
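The "2 at most" rule of thumb can be reproduced with a short calculation, assuming the boxplot uses the usual whiskers at 1.5 times the interquartile range (R's default) and a sample of 251 observations, the sample size quoted later for these series.

```r
# Quartiles and interquartile range of the standard normal distribution
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1

# Whisker limits of a boxplot with the default 1.5 * IQR rule
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Probability that a normal observation falls outside the whiskers
p_out <- pnorm(lower_fence) + (1 - pnorm(upper_fence))

# Expected number of outliers in a sample of 251 observations
expected_outliers <- 251 * p_out
print(c(p_out = p_out, expected_outliers = expected_outliers))
```

Under normality only about 0.7% of points fall outside the whiskers, i.e. fewer than 2 out of 251.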

Are they normal distributions?

Let’s compare them to normal distributions with same standard deviation and mean and make the QQ Plots

x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX))
# freq=FALSE plots densities, so the histogram and the normal curve share the same scale
hist(Val$AMEX, breaks=100, freq=FALSE, main="AmEx Returns / Normal Distribution", xlab="Return", ylab="Density")
lines(x,y1,type="l",lwd=3,col="red")

x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX))
hist(Val$SPX, breaks=20, freq=FALSE, main="S&P Returns / Normal Distribution", xlab="Return", ylab="Density")
lines(x,y1,type="l",lwd=3,col="red")

Excess kurtosis obvious

Fatter and longer tails

Let’s have a look at their CDF through a QQ plot

Let’s properly test the normality

> qqnorm(Val$AMEX)

> qqline(Val$AMEX)

> qqnorm(Val$SPX)

> qqline(Val$SPX)

Fatter tails

Many tests can be used…

• Kolmogorov-Smirnov
• Jarque-Bera
• Chi-Square
• Shapiro-Wilk

Let’s try Kolmogorov-Smirnov

It compares the distance between the empirical CDF and the CDF of the reference distribution

x=seq(-4,4,length=1000)
plot(ecdf(Val$AMEX), do.points=FALSE, col="red", lwd=3, main="Normal Distribution against AMEX - CDFs", xlab="x", ylab="P(X<=x)")
lines(x, pnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX)), col="blue", type="l", lwd=3)

x=seq(-4,4,length=1000)
plot(ecdf(Val$SPX), do.points=FALSE, col="red", lwd=3, main="Normal Distribution against S&P - CDFs", xlab="x", ylab="P(X<=x)")
lines(x, pnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX)), col="blue", type="l", lwd=3)

> ks.test(Val$SPX, "pnorm")

One-sample Kolmogorov-Smirnov test

data: Val$SPX

D = 0.4811, p-value < 2.2e-16

alternative hypothesis: two-sided

> ks.test(Val$AMEX, "pnorm")

One-sample Kolmogorov-Smirnov test

data: Val$AMEX

D = 0.4742, p-value < 2.2e-16

The null hypothesis (H0) is that the distribution is normal

Do we accept or reject H0 at the 95% confidence level?

The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table

Sample size: 251, so the critical value is $\dfrac{1.36}{\sqrt{251}} \approx 0.086$

Rejected or not?

Rejected! The series do not fit a normal distribution. The p-value was already giving the answer
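One point worth checking when reproducing this test: `ks.test(x, "pnorm")` with no further arguments compares the sample with the standard normal N(0, 1), not with a normal of the same mean and standard deviation. The sketch below, on simulated returns (hypothetical data, not the AmEx or S&P series), computes the critical value and contrasts the two calls. When mean and sd are estimated from the same sample, the usual KS p-value is only approximate.

```r
set.seed(1)

# Simulated daily returns, drawn from a normal distribution on purpose
x <- rnorm(251, mean = 0.0005, sd = 0.015)

# Approximate 5% critical value for the KS statistic
crit <- 1.36 / sqrt(length(x))

# Against the standard normal N(0, 1): large D simply because the scales differ
d_std <- unname(ks.test(x, "pnorm")$statistic)

# Against a normal with the sample's own mean and standard deviation
d_fit <- unname(ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)

print(c(critical = crit, D_standard = d_std, D_fitted = d_fit))
```

Even for genuinely normal returns, the first call yields a D far above the critical value, so the reference distribution's parameters matter when interpreting the result.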

OK, we now know a bit more about the 2 series we want to regress

> lm(Val$AMEX~Val$SPX)

lm(formula = Val$AMEX ~ Val$SPX)

Coefficients:
(Intercept)      Val$SPX
  0.0004505    1.1096287

plot(Val$SPX,Val$AMEX, main="S&P / AmEx", xlab="S&P", ylab="AmEx",

col="red")

abline(lm(Val$AMEX~Val$SPX), col="blue")

$y = 110.96\% \times x + 0.045\%$

> Reg<-lm(Val$AMEX~Val$SPX)

> summary(Reg)

lm(formula = Val$AMEX ~ Val$SPX)

Residuals:
      Min        1Q    Median        3Q       Max
-0.030387 -0.006072 -0.000114  0.006624  0.027824

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004505  0.0006365   0.708     0.48
Val$SPX     1.1096287  0.0434231  25.554   <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

Residual standard error: 0.01008 on 249 degrees of freedom

Multiple R-squared: 0.7239, Adjusted R-squared: 0.7228

F-statistic: 653 on 1 and 249 DF, p-value: < 2.2e-16
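The figures in this summary are linked by simple arithmetic, which can be checked from the printed values alone. The sketch below only reuses numbers from the output above; the variable names are illustrative, and n = 251 follows from 249 residual degrees of freedom plus 2 estimated parameters.

```r
# Values copied from the summary output above
est <- 1.1096287   # slope estimate
se  <- 0.0434231   # its standard error
r2  <- 0.7239      # multiple R-squared
n   <- 251         # observations

# t value = estimate / standard error
t_value <- est / se

# With a single regressor, the F-statistic is the square of the slope's t value
f_stat <- t_value^2

# Adjusted R-squared penalises for the number of parameters
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)

print(c(t_value = t_value, f_stat = f_stat, adj_r2 = adj_r2))
```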

The next important step is to analyse the residuals

They need to be white noise; you can get a first assessment from the quartiles

layout(matrix(1:4,2,2))

plot(Reg)

QQ Plot compares the CDF

A perfect fit is a line

Left tail noticeably different

Residuals should be randomly distributed around the 0 horizontal line

To accept or reject the regression you need residuals to be a white noise

Their mean should be 0

You don’t want to see a trend, a dependence

• Square root of the standardized residuals as a function of the fitted values

• There should be no obvious trend in this plot

Nothing suggesting a white noise

Now showing leverage

Marginal importance of a point in the regression

Far points suggest outlier or poor model

So do we accept the regression?

Probably not… But let’s check…

Kolmogorov-Smirnov on residuals

Resid<-resid(Reg)

ks.test(Resid, "pnorm")

One-sample Kolmogorov-Smirnov test

data: Resid

D = 0.4889, p-value < 2.2e-16

$$D = \frac{1.36}{\sqrt{251}} \approx 0.086$$

Upper bound on D for H0 to be accepted

Rejected! Regressions between 2 different assets are very often poor
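As a complement, the Shapiro-Wilk test listed earlier among the normality tests can also be applied to regression residuals. The sketch below uses simulated data only (hypothetical series, not the AmEx and S&P returns): one regression with normal noise and one with fat-tailed Student-t noise, to show how the test is called on residuals.

```r
set.seed(123)
n <- 251

# Simulated regressor and two responses (hypothetical, for illustration only)
x        <- rnorm(n, sd = 0.015)
y_normal <- 1.1 * x + rnorm(n, sd = 0.01)     # normal noise
y_fat    <- 1.1 * x + 0.01 * rt(n, df = 3)    # fat-tailed noise

res_normal <- resid(lm(y_normal ~ x))
res_fat    <- resid(lm(y_fat ~ x))

# Shapiro-Wilk: H0 is that the residuals are normally distributed
sw_normal <- shapiro.test(res_normal)
sw_fat    <- shapiro.test(res_fat)

print(c(normal_noise = sw_normal$p.value, fat_tails = sw_fat$p.value))
```

A small p-value leads to rejecting normality of the residuals, in the same spirit as the Kolmogorov-Smirnov check above.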

Heteroscedasticity

Basis risk if you hedge anyway

Conclusion

Residuals

Normality

Heteroscedasticity
