Financial Econometric Models I

Financial Econometric Models Vincent JEANNIN – ESGF 5IFM

Q1 2012

vinzjeannin@hotmail.com

Summary of the session (est. 3h)

• Introduction & Objectives
• Bibliography
• OLS & Exploration


Introduction & Objectives

• What is a model?

• What is the point of writing models?


Describe data behaviour

Model data behaviour

Forecast data behaviour

• Acquire theoretical knowledge of Econometrics & Statistics
• Step by step from OLS to ANOVA on residuals
• Usage of R and Excel

$\text{Obs} = \text{Model} + \varepsilon$, with $\varepsilon$ being white noise
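The decomposition of an observation into a model part and white noise can be illustrated with a small simulation. This is only a sketch: the linear "model" and all numeric values below are hypothetical and are not taken from the course data.

```r
# Simulate Obs = Model + epsilon with a hypothetical linear model
set.seed(1)
n     <- 250
x     <- seq_len(n)
model <- 2 + 0.05 * x                  # deterministic part (assumed for illustration)
eps   <- rnorm(n, mean = 0, sd = 1)    # white noise: zero mean, constant variance, independent
obs   <- model + eps

# If the model is right, what is left after removing it is the noise itself
leftover <- obs - model
mean(leftover)                         # should be close to 0
acf(leftover, plot = FALSE)$acf[2]     # lag-1 autocorrelation, should be close to 0
```

The rest of the session is about checking that what remains after fitting a model really looks like `leftover` here.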

Bibliography


OLS & Exploration


Linear regression model

Minimise the sum of the squared vertical distances between the observations and the linear approximation

$y = f(x) = ax + b$

Residual $\varepsilon$

OLS: Ordinary Least Squares


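Two parameters to estimate:

• Intercept α (written $b$ in the equations)
• Slope β (written $a$ in the equations)

Minimising residuals

$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$$

When is E minimal?

When the partial derivatives with respect to a and b are 0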


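$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$$

Quick high school reminder if necessary…

$$(y_i - a x_i - b)^2 = y_i^2 - 2 a x_i y_i - 2 b y_i + a^2 x_i^2 + 2 a b x_i + b^2$$

Partial derivative with respect to a:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2 + 2 b x_i\right) = 0$$

$$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$$

$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$

Partial derivative with respect to b:

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2 b + 2 a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left(-y_i + b + a x_i\right) = 0$$

$$a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$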
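The two first-order conditions form a 2×2 linear system in a and b, so they can be solved numerically. The sketch below does that on a small made-up sample (the x and y values are hypothetical) and compares the result with R's `lm`.

```r
# Hypothetical sample, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(x)

# Normal equations:
#   a * sum(x^2) + b * sum(x) = sum(x * y)
#   a * sum(x)   + b * n      = sum(y)
A   <- matrix(c(sum(x^2), sum(x),
                sum(x),   n), nrow = 2, byrow = TRUE)
rhs <- c(sum(x * y), sum(y))

sol <- solve(A, rhs)
a   <- sol[1]   # slope
b   <- sol[2]   # intercept

# lm() returns (intercept, slope) in that order
fit <- coef(lm(y ~ x))
c(slope = a, intercept = b)
fit
```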


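From $\frac{\partial E}{\partial b} = 0$:

$$a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$

Leads easily to the intercept

$$a n \bar{x} + n b = n \bar{y}$$

$$a \bar{x} + b = \bar{y}$$

$$b = \bar{y} - a \bar{x}$$

The regression line goes through $(\bar{x}, \bar{y})$

The distance from this point to the line is indeed 0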


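With $b = \bar{y} - a\bar{x}$, the line can be written:

$$y = a x + \bar{y} - a \bar{x}$$

$$y - \bar{y} = a (x - \bar{x})$$

From the derivative with respect to a:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2 + 2 b x_i\right) = 0$$

$$\sum_{i=1}^{n} x_i \left(y_i - a x_i - b\right) = 0$$

$$\sum_{i=1}^{n} x_i \left(y_i - a x_i - \bar{y} + a \bar{x}\right) = 0$$

$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

From the derivative with respect to b:

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2 b + 2 a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left(y_i - b - a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left(y_i - \bar{y} + a \bar{x} - a x_i\right) = 0$$

$$\sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

$$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$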

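We have

$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

and

$$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

so

$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right)$$

$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) - \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

$$\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$$

Finally…

$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$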


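$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The numerator is the covariance and the denominator is the variance, each multiplied by n (the factors cancel):

$$a = \frac{\mathrm{Cov}_{xy}}{\sigma_x^2}$$

$$b = \bar{y} - a \bar{x}$$

You can use the Excel functions INTERCEPT and SLOPE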
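The same two formulas can be checked in R. The sketch below uses simulated data (the coefficients 0.5 and 1.2 and the noise level are arbitrary assumptions) and compares the covariance/variance ratio with the coefficients returned by `lm`.

```r
# Simulated sample, for illustration only
set.seed(42)
x <- rnorm(100)
y <- 0.5 + 1.2 * x + rnorm(100, sd = 0.3)

# Slope = covariance / variance; the n or (n - 1) divisor cancels in the ratio
a <- cov(x, y) / var(x)
# Intercept from the fact that the line goes through (mean(x), mean(y))
b <- mean(y) - a * mean(x)

c(intercept = b, slope = a)
coef(lm(y ~ x))   # should agree with the line above
```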


Calculate the Variances and Covariance of X = {1,2,3,3,1,2} and Y = {2,3,1,1,3,2}

You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P
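The exercise can also be done in R. One caveat for this sketch: R's built-in `var` and `cov` divide by n − 1 (sample versions), whereas the `.P` Excel functions divide by n, so a small helper (`pop_cov`, a name introduced here for illustration) is used to get the population versions.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

# Population covariance: average product of deviations (divides by n)
pop_cov <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

var_x  <- pop_cov(X, X)   # population variance of X
var_y  <- pop_cov(Y, Y)   # population variance of Y
cov_xy <- pop_cov(X, Y)   # population covariance

c(var_x = var_x, var_y = var_y, cov_xy = cov_xy)

# Link with R's sample versions: multiply by (n - 1) / n
n <- length(X)
var(X) * (n - 1) / n
```

Both means are 2, the squared deviations sum to 4 for each series and the cross products sum to −3, which gives variances of 4/6 and a covariance of −3/6.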


Let’s assess the quality of the regression

Let’s calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient, PPMCC):

$$r = \frac{\mathrm{Cov}_{xy}}{\sigma_x \sigma_y}$$

Value between -1 and 1

$|r| = 1$: perfect linear dependence

$r \approx 0$: no linear dependence

Gives an idea of the dispersion of the scatterplot

You can use the Excel function CORREL
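As a quick check of the formula, here is a sketch applying it to the X and Y series of the previous exercise and comparing with R's built-in `cor`. The n or n − 1 divisor cancels between numerator and denominator, so the sample versions of `cov` and `sd` can be used directly.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

# r = Cov(x, y) / (sd(x) * sd(y)); the divisor convention cancels out
r_manual  <- cov(X, Y) / (sd(X) * sd(Y))
r_builtin <- cor(X, Y)

c(manual = r_manual, builtin = r_builtin)
```

Here the covariance is negative, so r is negative: Y tends to be low when X is high.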


R=0.96

High quality

R=0.62

Poor quality


What is good quality?

Slightly discretionary…

If $|r| \geq \frac{\sqrt{3}}{2} \approx 0.8660$ (equivalently $r^2 \geq 0.75$)

It’s widely accepted as the threshold between acceptable and poor


The regression itself introduces a bias

Let’s introduce the coefficient of determination R-Squared

Total Dispersion = Dispersion Regression + Dispersion Residual

$$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$$

In other words, the part of the total dispersion explained by the regression

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

You can use the Excel function RSQ
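The decomposition can be verified numerically. This is a sketch on simulated data (the coefficients and noise level are arbitrary assumptions); it computes the three sums of squares by hand and compares the resulting ratio with the R-squared reported by `summary`.

```r
# Simulated sample, for illustration only
set.seed(7)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)       # total dispersion
ssr <- sum((y_hat - mean(y))^2)   # dispersion explained by the regression
sse <- sum((y - y_hat)^2)         # residual dispersion

r2 <- ssr / sst
c(sst = sst, ssr_plus_sse = ssr + sse, r2 = r2)
summary(fit)$r.squared            # should match r2
cor(x, y)^2                       # also matches in a simple regression with intercept
```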


In a simple linear regression with intercept, $R^2 = r^2$

Is a good correlation coefficient and a good coefficient of determination enough to accept the regression?

Not necessarily!

Residuals need to carry no remaining effect; in other words, they must be white noise!


For every dataset of the quartet (these are the summary statistics of Anscombe’s quartet):

$\bar{x} = 9$

$\bar{y} = 7.5$

$\hat{y} = 3 + 0.5x$

$r = 0.82$

$R^2 = 0.67$

Don’t get fooled by numbers!

Can you say at this stage which regression is the best?

Certainly not those on the right: you need a LINEAR dependence
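The statistics quoted above match Anscombe's quartet, which ships with R as the built-in data frame `anscombe` (columns `x1`…`x4`, `y1`…`y4`). Assuming that is the dataset shown on the slide, the sketch below recomputes the statistics for each of the four series.

```r
data(anscombe)   # built-in dataset from the 'datasets' package

stats <- sapply(1:4, function(i) {
  x   <- anscombe[[paste0("x", i)]]
  y   <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r         = cor(x, y),
    r2        = summary(fit)$r.squared)
})

round(stats, 2)   # one column per dataset, near-identical rows

# The scatterplots are what tell the four datasets apart
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```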


Is any linear regression useless?

Think about what you could do to the series

Polynomial transformation, log transformation, …

Otherwise, non-linear regressions, but that’s another story
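As an illustration of the log transformation, here is a sketch on simulated data that grows exponentially with multiplicative noise (all parameters are hypothetical). A straight line fits the raw series poorly, but fits the logarithm of the series well.

```r
# Simulated exponential relationship, for illustration only
set.seed(3)
x <- seq(1, 10, length.out = 100)
y <- 2 * exp(0.4 * x) * exp(rnorm(100, sd = 0.1))   # multiplicative noise

fit_raw <- lm(y ~ x)        # linear fit on the raw series
fit_log <- lm(log(y) ~ x)   # linear fit after log transformation

r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
c(raw = r2_raw, log = r2_log)

# The slope of the log regression estimates the growth rate (0.4 here)
coef(fit_log)
```

A higher R-squared after transformation is not sufficient by itself; the residuals of the transformed regression still need to be examined.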


First application to financial markets

S&P / AmEx in 2011


$$R_{AmEx} = 0.06\% + 1.1046 \times R_{S\&P}$$

$$r = \frac{\mathrm{Cov}_{AmEx,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$$

$$R^2 = r^2 = 0.7227$$

Oops :-o

Is Excel wrong?

R-Squared has different calculation methods

Let’s accept the following regression then, as the quality seems pretty good


How to use this?

• Forecasting? Not really… Both are random variables
• Hedging? Yes, but with basis risk. Yes, but be careful with the residuals…

Let’s have a try!

In theory, what is the daily result of the hedge? 𝑎


Hedging $1.0M of AmEx Stocks with $1.1046M of S&P

It would have been too easy… Great differences… Why?

Sensitivity to the size of the sample

Heteroscedasticity
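To see why the hedge does not give a flat result, the sketch below simulates one year of daily returns and computes the daily P&L of a hedged position. The intercept 0.06% and slope 1.1046 are the Excel estimates quoted earlier; the volatilities, the seed and the normal distribution of returns are assumptions made for illustration, not the actual 2011 data.

```r
# Simulated returns, NOT the real S&P / AmEx series
set.seed(2011)
n      <- 251
r_spx  <- rnorm(n, mean = 0, sd = 0.0147)                     # assumed daily volatility
r_amex <- 0.0006 + 1.1046 * r_spx + rnorm(n, mean = 0, sd = 0.01)

fit   <- lm(r_amex ~ r_spx)
alpha <- unname(coef(fit)[1])
beta  <- unname(coef(fit)[2])

# Long $1M of AmEx, short beta x $1M of S&P
notional <- 1e6
pnl <- notional * (r_amex - beta * r_spx)

# By construction the hedged P&L is notional * (intercept + residual)
mean(pnl)   # equals notional * alpha
sd(pnl)     # driven entirely by the residuals: this is the basis risk
```

In this sketch the average daily result is the intercept times the notional, but each day's result moves with the residual, so the quality of the hedge depends on how well behaved the residuals are.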


Let’s take a similar approach using proper statistics and econometrics software

• Free
• Open Source
• Developments shared by developers

http://www.r-project.org/index.html

Let’s begin with statistical exploration to get familiar with the series and the software

> Val<-read.csv(file="C:/Users/Vinz/Desktop/Val.csv",header=TRUE,sep=",")
> summary(Val)
      SPX                  AMEX
 Min.   :-0.0666344   Min.   :-0.0883287
 1st Qu.:-0.0069082   1st Qu.:-0.0094580
 Median : 0.0010016   Median : 0.0013007
 Mean   : 0.0001249   Mean   : 0.0005891
 3rd Qu.: 0.0075235   3rd Qu.: 0.0102923
 Max.   : 0.0474068   Max.   : 0.0710967


> hist(Val$AMEX, breaks=20, main="Distribution AMEX Returns")
> sd(Val$AMEX)
[1] 0.01915489
> hist(Val$SPX, breaks=20, main="Distribution SPX Returns")
> sd(Val$SPX)
[1] 0.01468776


These are obviously negatively skewed distributions

> library(moments)
> skewness(Val$AMEX)
[1] -0.2453693
> skewness(Val$SPX)
[1] -0.4178701

Reminders

• Negative skew: long left tail, mass on the right, skew to the left
• Positive skew: long right tail, mass on the left, skew to the right

$$\mathrm{SKEW}(X) = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^3\right] = \frac{E\left[(X - \bar{X})^3\right]}{\left(E\left[(X - \bar{X})^2\right]\right)^{3/2}}$$


These are obviously leptokurtic distributions

> library(moments)
> kurtosis(Val$AMEX)
[1] 5.770583
> kurtosis(Val$SPX)
[1] 5.671254

Reminders

What is their K? (excess kurtosis)

Subtract 3 to make it relative to the normal distribution…

$$\mathrm{KURT}(X) = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^4\right] = \frac{E\left[(X - \bar{X})^4\right]}{\left(E\left[(X - \bar{X})^2\right]\right)^{2}}$$


Excel function SKEW

R function skewness (package moments)

Quick check: what are the Skewness and Kurtosis of {1,2,-3,0,-2,1,1}?


Excel function KURT

R function kurtosis (package moments)
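The quick check can be worked out directly from the moment formulas, without any package. The helper `m` below is a name introduced for this sketch; it computes central moments with a divisor of n, which is the convention of the formulas above.

```r
x <- c(1, 2, -3, 0, -2, 1, 1)

# k-th central moment (divides by n)
m <- function(k) mean((x - mean(x))^k)

skew   <- m(3) / m(2)^(3/2)   # skewness
kurt   <- m(4) / m(2)^2       # kurtosis (3 for a normal distribution)
excess <- kurt - 3            # excess kurtosis

c(skewness = skew, kurtosis = kurt, excess = excess)
```

The series has a negative skew and is platykurtic (kurtosis below 3). Note that Excel's SKEW and KURT apply small-sample corrections, and KURT reports excess kurtosis, so their results will differ from these population-moment values.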


By the way, what is the most platykurtic distribution in nature?

Toss it!

Head = Success = 1 / Tail = Failure = 0

> library(moments)
> toss<-rbinom(10000000,1,0.5)
> mean(toss)
[1] 0.5001777
> kurtosis(toss)
[1] 1.000001
> kurtosis(toss)-3
[1] -1.999999
> hist(toss, breaks=10, main="Tossing a coin 10 million times", xlab="Result of the trial", ylab="Occurrence")
> sum(toss)
[1] 5001777


50.01777% rate of success: fair or not fair? Trick coin?

Can be tested later with a Bayesian approach

On a perfect 50/50, Kurtosis would be 1, Excess Kurtosis -2: the minimum!

This is a Bernoulli trial, the single-trial case ($n = 1$) of the binomial $B(n, p)$, with $n$ a positive integer, $p \in \mathbb{R}$ and $0 < p < 1$. For one trial:

Mean: $p$

SD: $\sqrt{p(1-p)}$

Skewness: $\frac{1 - 2p}{\sqrt{p(1-p)}}$

Kurtosis: $\frac{1}{p(1-p)} - 3$

Easy to demonstrate that the Kurtosis is lowest if p = 0.5

A bit more complicated to demonstrate that this is the minimum for any distribution
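The claim for p = 0.5 can be checked numerically from the kurtosis formula of a single Bernoulli trial. The sketch below minimises that formula over p; `bern_kurt` is a helper name introduced here for illustration.

```r
# Kurtosis (not excess) of a single Bernoulli trial with success probability p
bern_kurt <- function(p) 1 / (p * (1 - p)) - 3

# p * (1 - p) is largest at p = 0.5, so the kurtosis is smallest there
opt <- optimize(bern_kurt, interval = c(0.01, 0.99))
opt$minimum     # numerically close to 0.5
opt$objective   # numerically close to 1

bern_kurt(0.5)      # kurtosis at p = 0.5
bern_kurt(0.5) - 3  # excess kurtosis at p = 0.5
```

Analytically, $p(1-p) \leq 1/4$ with equality at $p = 1/2$, so the kurtosis $1/(p(1-p)) - 3$ is at least $4 - 3 = 1$.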


Back to our series, a good tool is the BoxPlot

boxplot(Val$AMEX, Val$SPX, main="AMEX & S&P BoxPlots", names=c("AMEX","SPX"), col="blue")

Too many outliers!

There should be 2 at most for the series to be normal

Fatter tails than the normal distribution
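The figure of about two outliers can be recovered from the normal distribution. With R's default whiskers at 1.5 times the interquartile range, the sketch below computes the probability that a normal observation falls outside the whiskers and scales it by a sample size of 251 (the number of daily returns used later in the session). It is an approximation, since a boxplot uses the sample quartiles rather than the theoretical ones.

```r
n <- 251

# Standard normal quartiles are symmetric, so IQR = 2 * q3
q3          <- qnorm(0.75)
upper_fence <- q3 + 1.5 * (2 * q3)    # whisker limit in standard deviations

# Probability of falling outside either whisker under normality
p_out    <- 2 * pnorm(-upper_fence)
expected <- n * p_out

c(fence_in_sd = upper_fence, p_out = p_out, expected_outliers = expected)
```

Under normality roughly 0.7% of observations fall outside the whiskers, so fewer than two are expected in a sample of this size.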


Leptokurtic distributions

Negatively skewed distribution

Are they normal distributions?

Let’s compare them to normal distributions with the same mean and standard deviation and draw the QQ plots


x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX))
# freq=FALSE plots densities, so the histogram and the dnorm curve share the same scale
hist(Val$AMEX, breaks=100, freq=FALSE, main="AmEx Returns / Normal Distribution", xlab="Return", ylab="Density")
lines(x,y1,type="l",lwd=3,col="red")

x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX))
hist(Val$SPX, breaks=20, freq=FALSE, main="S&P Returns / Normal Distribution", xlab="Return", ylab="Density")
lines(x,y1,type="l",lwd=3,col="red")


Excess kurtosis obvious

Fatter and longer tails

Let’s have a look at their CDFs through QQ plots


Let’s properly test the normality

> qqnorm(Val$AMEX)

> qqline(Val$AMEX)

> qqnorm(Val$SPX)

> qqline(Val$SPX)

Fatter tails


Many tests can be used…

• Kolmogorov-Smirnov
• Jarque-Bera
• Chi-Square
• Shapiro-Wilk

Let’s try Kolmogorov-Smirnov

It measures the largest distance between the empirical CDF and the CDF of the reference distribution
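To make the statistic concrete, the sketch below computes D by hand on a simulated series and compares it with `ks.test`. The data are simulated normal returns (the mean, volatility and seed are assumptions), not the course series. It also shows that the reference distribution matters: `"pnorm"` with no further arguments means a standard normal N(0, 1).

```r
# Simulated daily returns, for illustration only
set.seed(10)
x  <- rnorm(251, mean = 0.0005, sd = 0.019)
n  <- length(x)
xs <- sort(x)

# Reference CDF: normal with the sample's mean and standard deviation
F0 <- pnorm(xs, mean = mean(x), sd = sd(x))

# The empirical CDF jumps at each point: compare just after and just before the jump
D <- max(pmax(seq_len(n) / n - F0, F0 - (seq_len(n) - 1) / n))

ks_fit <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))  # fitted normal
ks_std <- ks.test(x, "pnorm")                              # standard normal N(0, 1)

c(D_by_hand = D, D_fitted = unname(ks_fit$statistic), D_standard = unname(ks_std$statistic))

# Approximate 5% critical value for large n
crit <- 1.36 / sqrt(n)
crit
```

Against N(0, 1), a series whose standard deviation is around 0.02 gives a D close to 0.5 whatever its shape, because almost all of the reference distribution's mass lies outside the range of the data. One caveat: when the mean and standard deviation are estimated from the same sample, the usual critical values are only approximate (they tend to be conservative).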


x=seq(-4,4,length=1000)
plot(ecdf(Val$AMEX), do.points=FALSE, col="red", lwd=3, main="Normal Distribution against AMEX - CDFs", xlab="x", ylab="P(X<=x)")
lines(x, pnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX)), col="blue", type="l", lwd=3)

x=seq(-4,4,length=1000)
plot(ecdf(Val$SPX), do.points=FALSE, col="red", lwd=3, main="Normal Distribution against S&P - CDFs", xlab="x", ylab="P(X<=x)")
lines(x, pnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX)), col="blue", type="l", lwd=3)


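> ks.test(Val$SPX, "pnorm")

	One-sample Kolmogorov-Smirnov test

data:  Val$SPX
D = 0.4811, p-value < 2.2e-16
alternative hypothesis: two-sided

> ks.test(Val$AMEX, "pnorm")

	One-sample Kolmogorov-Smirnov test

data:  Val$AMEX
D = 0.4742, p-value < 2.2e-16

The null hypothesis (H0) is that the distribution is normal

Note: called with "pnorm" alone, ks.test compares the series with the standard normal N(0, 1), not with a normal that has the sample’s mean and standard deviation; for daily returns this largely explains D values close to 0.5. To test against a fitted normal, pass mean=mean(...) and sd=sd(...).

Do we accept or reject H0 at the 95% confidence level?

The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table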


Sample size: 251

$$\frac{1.36}{\sqrt{251}} \approx 0.086$$

Rejected or not?

Rejected! The series don’t fit a normal distribution

The p-value was already giving the answer


OK, we now know a bit more about the 2 series we want to regress

> lm(Val$AMEX~Val$SPX)

Call:
lm(formula = Val$AMEX ~ Val$SPX)

Coefficients:
(Intercept)      Val$SPX
  0.0004505    1.1096287

plot(Val$SPX, Val$AMEX, main="S&P / AmEx", xlab="S&P", ylab="AmEx", col="red")
abline(lm(Val$AMEX~Val$SPX), col="blue")

$y = 110.96\% \times x + 0.045\%$


> Reg<-lm(Val$AMEX~Val$SPX)
> summary(Reg)

Call:
lm(formula = Val$AMEX ~ Val$SPX)

Residuals:
      Min        1Q    Median        3Q       Max
-0.030387 -0.006072 -0.000114  0.006624  0.027824

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004505  0.0006365   0.708     0.48
Val$SPX     1.1096287  0.0434231  25.554   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01008 on 249 degrees of freedom
Multiple R-squared:  0.7239,	Adjusted R-squared:  0.7228
F-statistic:   653 on 1 and 249 DF,  p-value: < 2.2e-16

The next important step is to analyse the residuals

They need to be white noise; the quartiles give a first assessment


layout(matrix(1:4,2,2))

plot(Reg)


QQ Plot compares the CDF

A perfect fit is a line

Left tail noticeably different


Residuals should be randomly distributed around the 0 horizontal line

To accept the regression you need the residuals to be white noise

Their mean should be 0

You don’t want to see a trend, a dependence
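Some of these checks can be done numerically as well as visually. The sketch below uses simulated data (all parameters are assumptions) and adds a Ljung-Box test for serial correlation, which is not part of the session material but is available in base R as `Box.test`.

```r
# Simulated regression, for illustration only
set.seed(5)
x   <- rnorm(251, mean = 0, sd = 0.015)
y   <- 0.0005 + 1.1 * x + rnorm(251, mean = 0, sd = 0.01)
fit <- lm(y ~ x)
res <- resid(fit)

# With an intercept, OLS forces these two to be (numerically) zero,
# so they cannot by themselves validate the regression
mean(res)
cor(res, fitted(fit))

# Serial dependence is not forced to zero: test it explicitly
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
lb$p.value   # a small p-value would suggest autocorrelated residuals

# Visual check: residuals against fitted values, around the 0 line
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "blue")
```

The zero mean is therefore a necessary consequence of the fit rather than evidence of white noise; the informative checks are the absence of trend, of autocorrelation and of changing variance.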


• Square root of the standardized residuals as a function of the fitted values

• There should be no obvious trend in this plot

Nothing suggesting a white noise


Now showing leverage

Marginal importance of a point in the regression

Far points suggest outlier or poor model


So do we accept the regression?

Probably not… But let’s check…

Kolmogorov-Smirnov on residuals

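Resid<-resid(Reg)
ks.test(Resid, "pnorm")

	One-sample Kolmogorov-Smirnov test

data:  Resid
D = 0.4889, p-value < 2.2e-16

Note that with "pnorm" alone the residuals are compared with N(0, 1); standardise them, or pass mean and sd, to compare with a fitted normal.

$$D = \frac{1.36}{\sqrt{251}} \approx 0.086$$

Upper bound on D for H0 to be accepted

Rejected! Regressions between 2 different assets are very often poor

Heteroscedasticity

Basis risk if you hedge anyway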
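Heteroscedasticity can also be tested rather than only read off the plots. The sketch below is one possible approach, a White-type test written by hand: regress the squared residuals on the regressor and its square, and compare n times the R-squared of that auxiliary regression with a chi-squared distribution with 2 degrees of freedom. This test is not part of the session material, and the data are simulated with a noise level that deliberately grows with the size of the market move.

```r
# Simulated heteroscedastic data, for illustration only
set.seed(9)
n <- 251
x <- rnorm(n, mean = 0, sd = 0.015)
y <- 1.1 * x + rnorm(n, mean = 0, sd = 0.002 + abs(x))  # noise grows with |x|

fit <- lm(y ~ x)
u2  <- resid(fit)^2

# Auxiliary regression of squared residuals on x and x^2
aux     <- lm(u2 ~ x + I(x^2))
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

c(statistic = lm_stat, p_value = p_value)
```

A small p-value indicates that the residual variance depends on the regressor, which is one way the white-noise requirement can fail even when the residual mean is zero.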


Conclusion

OLS

Residuals

Normality

Heteroscedasticity
