applied statistics ii

Applied Statistics, Vincent JEANNIN – ESGF 4IFM Q1 2012

Upload: vincent-jeannin

Post on 29-May-2015


Category:

Economy & Finance



DESCRIPTION

Second course of Applied Statistics, MSc level in Business School

TRANSCRIPT

Page 1: Applied Statistics II

Applied Statistics Vincent JEANNIN – ESGF 4IFM

Q1 2012

vinzjeannin@hotmail.com

Page 2: Applied Statistics II

Summary of the session (est. 4.5h)

• R Step by Step
• Reminders of last session
• The Value at Risk
• OLS & Exploration


Page 3: Applied Statistics II

R Step by Step


http://www.r-project.org/

Downloadable for free (open source)

Page 4: Applied Statistics II


Main screen

Page 5: Applied Statistics II


Menu: File / New Script

Page 6: Applied Statistics II


Step 1: load your data

A CSV file saved from Excel is easy to import

Path: C:\Users\vin\Desktop

DATA<-read.csv(file="C:/Users/vin/Desktop/DataFile.csv",header=TRUE)

Note: 4 columns with headers

Page 7: Applied Statistics II


Run your instruction(s)

Page 8: Applied Statistics II


You can call variables anytime you want

Page 9: Applied Statistics II


Page 10: Applied Statistics II


summary(DATA)
Shows a quick summary of the distribution of all variables

      SPX              SPXr                AMEXr            AMEX
 Min.   : 86.43   Min.   :-0.0666344   Min.   : 97.6   Min.   :-0.0883287
 1st Qu.: 95.70   1st Qu.:-0.0069082   1st Qu.:104.7   1st Qu.:-0.0094580
 Median :100.79   Median : 0.0010016   Median :108.8   Median : 0.0013007
 Mean   : 99.67   Mean   : 0.0001249   Mean   :109.4   Mean   : 0.0005891
 3rd Qu.:103.75   3rd Qu.: 0.0075235   3rd Qu.:114.1   3rd Qu.: 0.0102923
 Max.   :107.21   Max.   : 0.0474068   Max.   :123.5   Max.   : 0.0710967

Note: judging by the values, in this file the column AMEXr holds the price levels and the column AMEX holds the returns

summary(DATA$SPX)
Shows a quick summary of the distribution of one variable

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  86.43   95.70  100.80   99.67  103.80  107.20

Careful with the following instructions:
min(DATA)
max(DATA)
They treat DATA as one single variable

> min(DATA)
[1] -0.08832874
> max(DATA)
[1] 123.4793

Mean & SD

> sd(DATA)
       SPX       SPXr      AMEXr       AMEX
4.92763551 0.01468776 6.03035318 0.01915489
> mean(DATA)
         SPX         SPXr        AMEXr         AMEX
9.967046e+01 1.249283e-04 1.093951e+02 5.890780e-04
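To the best of my knowledge, recent R releases no longer accept a whole data frame in mean() and sd(), although this worked in the release used for these slides. A minimal sketch of column-wise equivalents follows; the data frame `toy` and its values are made up for illustration and are not the course file.

```r
# Toy data frame standing in for DATA (values are made up for illustration)
toy <- data.frame(SPX  = c(100, 101, 99.5, 102),
                  SPXr = c(0.010, -0.015, 0.025, 0.002))

col_means <- sapply(toy, mean)   # one mean per column
col_sds   <- sapply(toy, sd)     # one standard deviation per column

# min() and max() pool every column together, as warned on the slide
overall_min <- min(toy)
overall_max <- max(toy)

print(col_means)
print(col_sds)
```

sapply() applies the function to each column in turn, which reproduces the per-column layout shown on the slide.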

Page 11: Applied Statistics II


Easy to show histogram

hist(DATA$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq", xlab="SPXr", col="blue")

Page 12: Applied Statistics II


Obvious Excess Kurtosis

Obvious Asymmetry

The corresponding functions don't exist directly in R…

However, some VNPs (Very Nice Programmers) built and shared an add-in

Package moments

Page 13: Applied Statistics II


Menu: Packages / Install Package(s)

• Choose whatever mirror (server) you want
• Usually France (Toulouse) is very good as it's a university server with all the packages available

Page 14: Applied Statistics II


Once installed, you can load the package with either of the following instructions (one is enough):

require(moments)

library(moments)

New functions can now be used!

Page 15: Applied Statistics II


> require(moments)

> library(moments)

> skewness(DATA)

SPX SPXr AMEXr AMEX

-0.6358029 -0.4178701 0.1876994 -0.2453693

> kurtosis(DATA)

SPX SPXr AMEXr AMEX

2.411177 5.671254 2.078366 5.770583

Btw, you can store any result in a variable

> Kur<-kurtosis(DATA$SPXr)

> Kur

[1] 5.671254

Page 16: Applied Statistics II


Lost?

Call the help! help(kurtosis)

The help page reminds you of the package and gives the syntax and the definition of the arguments

Page 17: Applied Statistics II


Let's store a few values

SPMean<-mean(DATA$SPXr)
SPSD<-sd(DATA$SPXr)
(package stats)

Build a sequence, the x axis

x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)

Build a normal density on these x

y1<-dnorm(x,mean=SPMean,sd=SPSD)
(package stats)

Display the histogram

hist(DATA$SPXr, breaks=25,main="S&P Returns / Normal Distribution",xlab="Returns",ylab="Occurrences", col="blue")
(package graphics)

Display on top of it the normal density

lines(x,y1,type="l",lwd=3,col="red")
(package graphics)

Note: the histogram shows counts while y1 is a density; adding freq=FALSE to hist() puts both on the same scale

Page 18: Applied Statistics II


Positive Excess Kurtosis & Negative Skew

Page 19: Applied Statistics II


Let's build a spread

Spd<-DATA$SPXr-DATA$AMEX

What is the mean?

Mean is linear: $E(aX + bY) = aE(X) + bE(Y)$

$E(X - Y) = E(X) - E(Y)$

Let's verify

> mean(DATA$SPXr)-mean(DATA$AMEX)-mean(Spd)
[1] 0

Page 20: Applied Statistics II


What is the standard deviation?

Is standard deviation linear? NO!

$VAR(aX + bY) = a^2\,VAR(X) + b^2\,VAR(Y) + 2ab\,COV(X, Y)$

For the spread, $a = 1$ and $b = -1$, hence the $-2\,COV$ term:

> (var(DATA$SPXr)+var(DATA$AMEX)-2*cov(DATA$SPXr,DATA$AMEX))^0.5
[1] 0.01019212
> sd(Spd)
[1] 0.01019212

Let's show the implication in a proper manner

Let's create a portfolio containing half of each stock

Page 21: Applied Statistics II


Portf<-0.5*DATA$SPXr+0.5*DATA$AMEX

plot(sd(DATA$SPXr),mean(DATA$SPXr),col="blue",ylim=c(0,0.0008),xlim=c(0.012,0.022),ylab="Return",xlab="Vol")

points(sd(DATA$AMEX),mean(DATA$AMEX),col="red")

points(sd(Portf),mean(Portf),col="green")

Page 22: Applied Statistics II


The efficient frontier

Page 23: Applied Statistics II


points(sd(0.1*DATA$SPXr+0.9*DATA$AMEX),mean(0.1*DATA$SPXr+0.9*DATA$AMEX),col="green")
points(sd(0.2*DATA$SPXr+0.8*DATA$AMEX),mean(0.2*DATA$SPXr+0.8*DATA$AMEX),col="green")
points(sd(0.3*DATA$SPXr+0.7*DATA$AMEX),mean(0.3*DATA$SPXr+0.7*DATA$AMEX),col="green")
points(sd(0.4*DATA$SPXr+0.6*DATA$AMEX),mean(0.4*DATA$SPXr+0.6*DATA$AMEX),col="green")
points(sd(0.6*DATA$SPXr+0.4*DATA$AMEX),mean(0.6*DATA$SPXr+0.4*DATA$AMEX),col="green")
points(sd(0.7*DATA$SPXr+0.3*DATA$AMEX),mean(0.7*DATA$SPXr+0.3*DATA$AMEX),col="green")
points(sd(0.8*DATA$SPXr+0.2*DATA$AMEX),mean(0.8*DATA$SPXr+0.2*DATA$AMEX),col="green")
points(sd(0.9*DATA$SPXr+0.1*DATA$AMEX),mean(0.9*DATA$SPXr+0.1*DATA$AMEX),col="green")
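The repeated points() calls can also be written as a loop over the weights. The sketch below is only an illustration: the two return series `ret_a` and `ret_b` are simulated stand-ins for DATA$SPXr and DATA$AMEX, and all names are hypothetical.

```r
set.seed(1)
# Simulated daily returns standing in for DATA$SPXr and DATA$AMEX (assumption)
ret_a <- rnorm(250, mean = 0.0001, sd = 0.015)
ret_b <- 1.1 * ret_a + rnorm(250, mean = 0.0005, sd = 0.010)

weights <- seq(0, 1, by = 0.1)            # share invested in asset A
vol <- numeric(length(weights))
ret <- numeric(length(weights))
for (k in seq_along(weights)) {
  w <- weights[k]
  portf  <- w * ret_a + (1 - w) * ret_b   # weighted portfolio returns
  vol[k] <- sd(portf)
  ret[k] <- mean(portf)
}

# Risk/return scatter: each point is one weighting of the two assets
plot(vol, ret, col = "green", xlab = "Vol", ylab = "Return")
```

The mean moves linearly with the weight while the volatility does not, which is what bends the set of points into a curve.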

Page 24: Applied Statistics II


plot(DATA$SPXr,DATA$AMEX)

abline(lm(DATA$AMEX~DATA$SPXr), col="blue")

Page 25: Applied Statistics II


LM stands for Linear Models

> lm(DATA$AMEX~DATA$SPXr)

Call:
lm(formula = DATA$AMEX ~ DATA$SPXr)

Coefficients:
(Intercept)    DATA$SPXr
  0.0004505    1.1096287

$y = 1.1096x + 0.045\%$

Will be used later for linear regression and hedging

Page 26: Applied Statistics II


Do you remember which distribution is the most platykurtic in nature?

Toss a coin: Head = Success = 1 / Tail = Failure = 0

Let's test the fairness

100 tosses… Else memory issue…

> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.5)
> mean(toss)
[1] 0.52
> kurtosis(toss)
[1] 1.006410
> kurtosis(toss)-3
[1] -1.993590
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 52

Page 27: Applied Statistics II


Density of a binomial distribution

$f(r \mid H = h, T = t) = \frac{(N + 1)!}{h!\,t!}\, r^h (1 - r)^t$

Let's plot this density with $h = 52$, $t = 48$, $N = 100$

N<-100
h<-52
t<-48
r<-seq(0,1,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density to have 52 heads out of 100 flips")

Page 28: Applied Statistics II


If the probability between 45% and 55% is significant we’ll accept the fairness

What do you think?
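One way to put a number on this question: the density $\frac{(N+1)!}{h!\,t!} r^h (1-r)^t$ is the Beta(h + 1, t + 1) density, so the probability that r lies between 45% and 55% can be read from R's pbeta(). This is a sketch with the counts from the slides (52 heads, 48 tails); the variable names are mine, and what counts as "significant" remains a judgment call.

```r
h <- 52; t <- 48   # heads and tails observed in the 100 tosses

# P(0.45 <= r <= 0.55) under the density (N+1)!/(h! t!) * r^h * (1-r)^t,
# which is the Beta(h + 1, t + 1) density
p_fair <- pbeta(0.55, h + 1, t + 1) - pbeta(0.45, h + 1, t + 1)

# Cross-check by integrating the slide's formula numerically
dens <- function(r) {
  exp(lfactorial(h + t + 1) - lfactorial(h) - lfactorial(t)) * r^h * (1 - r)^t
}
p_check <- integrate(dens, 0.45, 0.55)$value

print(c(p_fair = p_fair, p_check = p_check))
```

Re-running the same lines with h <- 72 and t <- 28 gives the corresponding probability for the second coin.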

Page 29: Applied Statistics II


What is the problem with this coin?

Toss it! Head = Success = 1 / Tail = Failure = 0

Let's test the fairness (assuming you don't know it's a trick)

100 tosses, assuming the probability of head is 0.7

> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.7)
> mean(toss)
[1] 0.72
> kurtosis(toss)
[1] 1.960317
> kurtosis(toss)-3
[1] -1.039683
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 72

Obvious fake!

Page 30: Applied Statistics II


If the probability between 45% and 55% is significant we’ll accept the fairness

N<-100

h<-72

t<-28

r<-seq(0.2,0.8,length=500)

y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t

plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")

Trick coin!

Page 31: Applied Statistics II

Reminders of last session


Snapshot, 4 moments of the Standard Normal Distribution:

Mean      0
SD        1
Skewness  0
Kurtosis  3

Page 32: Applied Statistics II


$P(X \le \mu) = 0.5$

$P(X \le \mu - \sigma) = 0.159$

$P(X \le \mu - 2\sigma) = 0.023$

$P(X \le \mu - 3\sigma) = 0.001$

$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$

$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$

$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 0.997$

$P(X \le \mu - 1.645\sigma) = 0.05$

$P(X \le \mu - 2.326\sigma) = 0.01$
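These reference probabilities can be recomputed with pnorm(), the normal CDF from the stats package. The short sketch below works on the standard normal, which is enough since the statements are expressed in numbers of standard deviations from the mean; the variable names are mine.

```r
# Left-tail probabilities P(X <= mu + k*sigma), computed on the standard normal
k <- c(0, -1, -2, -3, -1.645, -2.326)
left_tail <- pnorm(k)

# Central probabilities P(mu - k*sigma <= X <= mu + k*sigma) for k = 1, 2, 3
central <- pnorm(1:3) - pnorm(-(1:3))

print(round(left_tail, 3))
print(round(central, 3))
```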

Page 33: Applied Statistics II


Density: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Notation: $N(\mu, \sigma)$

CDF: $P(X \le x) = \phi(x) = \int_{-\infty}^{x} f(u)\,du$

Page 34: Applied Statistics II


Let $X \sim N(1, 1.5)$. Find $P(X \le 4.75)$

$P(X \le 4.75) = P\left(Y \le \frac{4.75 - 1}{1.5}\right)$ with $Y \sim N(0, 1)$

$P(Y \le 2.5) = ?$

Use the table!

$P(Y \le -2.5) = 0.0062$

$P(Y \le 2.5) = 1 - 0.0062 = 0.9938$

$P(X \le 4.75) = 0.9938$
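The table lookup can be checked in R. The sketch assumes, as in the exercise, that the second parameter of N(1, 1.5) is the standard deviation; the variable names are mine.

```r
# X ~ N(mean = 1, sd = 1.5); find P(X <= 4.75)
z <- (4.75 - 1) / 1.5                         # standardised value
p_table  <- pnorm(z)                          # what the N(0,1) table gives
p_direct <- pnorm(4.75, mean = 1, sd = 1.5)   # same probability without standardising

print(c(z = z, p_table = p_table, p_direct = p_direct))
```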

Page 35: Applied Statistics II


QQ Plot

> qqnorm(FCOJ$V1)
> qqline(FCOJ$V1)

Fat Tail

Page 36: Applied Statistics II


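Geometric Brownian Motion

Based on the Stochastic Differential Equation $ds_t = \mu s_t\,dt + \sigma s_t\,dW_t$

Discrete form: $ds_t = \mu s_t\,dt + \sigma s_t \sqrt{dt}\,\varepsilon$ with $\varepsilon \sim N(0, 1)$

B&S

CRR

$u = e^{\sigma\sqrt{t}}$

$d = \frac{1}{u} = e^{-\sigma\sqrt{t}}$

$S e^{rt} = pSu + (1 - p)Sd$

$e^{rt} = pu + (1 - p)d$

$p = \frac{e^{rt} - d}{u - d}$

$BV = \left(OpUp \cdot p + OpDown \cdot (1 - p)\right) e^{-rt}$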
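As a reminder of how the CRR formulas fit together, here is a one-step sketch for a call option. All inputs (spot, strike, volatility, rate, step length) are hypothetical values chosen for illustration, not figures from the course.

```r
S <- 150; K <- 150      # spot and strike (hypothetical)
sigma <- 0.30           # annual volatility (hypothetical)
r <- 0.02; t <- 0.25    # annual rate and step length in years (hypothetical)

u <- exp(sigma * sqrt(t))             # up factor
d <- 1 / u                            # down factor
p <- (exp(r * t) - d) / (u - d)       # risk-neutral probability of an up move

op_up   <- max(S * u - K, 0)          # call payoff after an up move
op_down <- max(S * d - K, 0)          # call payoff after a down move
bv <- (op_up * p + op_down * (1 - p)) * exp(-r * t)   # discounted expected payoff

print(c(u = u, d = d, p = p, bv = bv))
```

The probability p is the one that makes the expected growth of the underlying equal to the risk-free growth, which is the second CRR equation on the slide.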

Page 37: Applied Statistics II


Greeks Approximation – Taylor Expansion

$C(S + dS) \approx C + \Delta\,dS + \frac{1}{2}\gamma\,dS^2 + \frac{1}{6}Speed\,dS^3 + \frac{1}{24}Greek_{4th}\,dS^4 + \dots$

Page 38: Applied Statistics II


The Value at Risk

Estimate, with a specific confidence level (usually 95% or 99%), the worst possible loss. In other words, the point is to identify a particular point on the left of the distribution

3 Methods

• Historical
• Parametric
• Monte-Carlo

For now, we'll focus on VaR on one linear asset… FCOJ is back!

Page 39: Applied Statistics II


Historical VaR

• No assumption about the distribution
• Easy to implement and calculate
• Sensitive to the length of the history
• Sensitive to very extreme values

Let's get back to our FCOJ time series, last price is $150 cents

If we work on returns, we've seen the use of the PERCENTILE Excel function

• 1% Percentile is -5.22%, 99% Historical Daily VaR is -$7.83 cents
• 5% Percentile is -3.34%, 95% Historical Daily VaR is -$5.00 cents

Works as well on weekly, monthly, quarterly series
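In R, the percentile step can be done with quantile(). The sketch below uses simulated returns as a stand-in for the FCOJ series (an assumption, so the numbers will not match the slide); only the last price of 150 comes from the slides.

```r
set.seed(42)
# Simulated daily returns standing in for the FCOJ series (assumption)
returns <- rnorm(1000, mean = 0.001, sd = 0.02)
last_price <- 150                                  # last price, as on the slide

pct <- quantile(returns, probs = c(0.01, 0.05))    # 1% and 5% percentiles
var_99 <- unname(pct[1]) * last_price              # 99% daily VaR in price units
var_95 <- unname(pct[2]) * last_price              # 95% daily VaR in price units

print(c(VaR99 = var_99, VaR95 = var_95))
```

The VaR is simply the percentile return applied to the last price, which is how -5.22% becomes -7.83 on the slide.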

Page 40: Applied Statistics II


Historical VaR

It can also be computed on price variations instead of returns, but the result then depends on the price level! So beware of the bias.

• 1% Percentile in terms of price movement is -$8.11 cents
• 5% Percentile in terms of price movement is -$4.14 cents

Page 41: Applied Statistics II


Parametric VaR

• Easy to implement and calculate
• Assumes a particular shape of the distribution
• Not really sensitive to fat tails

FCOJ Mean Return: 0.1364%

FCOJ SD: 2.1664%

We already know:

$P(X \le \mu - 1.645\sigma) = 0.05$

$P(X \le \mu - 2.326\sigma) = 0.01$

Then:

$P(X \le -3.43\%) = 0.05$
VaR 95% (-$5.15 cents)

$P(X \le -4.90\%) = 0.01$
VaR 99% (-$7.35 cents)
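The same figures can be obtained with qnorm(), the normal quantile function. The sketch uses the mean, standard deviation and last price quoted on the slides; the variable names are mine, and the last two lines repeat the calculation with a zero mean, a common simplification.

```r
mu    <- 0.1364 / 100    # FCOJ mean daily return (from the slide)
sigma <- 2.1664 / 100    # FCOJ daily standard deviation (from the slide)
last_price <- 150        # last price (from the slides)

q95 <- qnorm(0.05, mean = mu, sd = sigma)   # 5% quantile of the return
q99 <- qnorm(0.01, mean = mu, sd = sigma)   # 1% quantile of the return

var_95 <- q95 * last_price
var_99 <- q99 * last_price

# Same calculation with the mean forced to 0
var_95_zero <- qnorm(0.05, mean = 0, sd = sigma) * last_price
var_99_zero <- qnorm(0.01, mean = 0, sd = sigma) * last_price

print(round(c(q95 = q95, q99 = q99), 4))
print(round(c(var_95, var_99, var_95_zero, var_99_zero), 2))
```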

Page 42: Applied Statistics II


Parametric VaR

Very often you assume anyway a 0 mean, therefore:

$P(X \le -3.57\%) = 0.05$
VaR 95% (-$5.36 cents)

$P(X \le -5.04\%) = 0.01$
VaR 99% (-$7.56 cents)

Lower values than the historical VaR

Problem with leptokurtic distributions: fat tails have little impact on the method

Page 43: Applied Statistics II


Monte Carlo VaR

Based on an assumption of a price process (for example GBM)

• Most efficient method when assets aren't linear
• Tough to implement
• Assumes a particular shape of the distribution

A great number of random simulations of the price process are run to build a distribution and read off the VaR

Page 44: Applied Statistics II


Monte Carlo VaR

Let's simulate 10,000 GBM, 252 steps and store the final result

library(sde)
FCOJ<-read.csv(file="C:/Users/Vinz/Desktop/FCOJStats.csv",header=FALSE,sep=",")
Drift<-mean(FCOJ$V1)
Volat<-sd(FCOJ$V1)
nbsim<-252
Spot<-150
Final<-rep(1,10000)
for(i in 1:10000){
Matr<-GBM(x=Spot,r=Drift, sigma=Volat,N=nbsim)
Final[i]<-Matr[nbsim+1]}
quantile(Final, 0.05)
quantile(Final, 0.01)

Don't be fooled by the 252, we're still making a daily simulation: what to change in the code to make it yearly?
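For readers without the sde package, the end value of each path can be drawn directly in base R from the lognormal solution of the GBM. This is a sketch under stated assumptions, not the course code: the drift and volatility are the FCOJ mean and standard deviation quoted earlier in the slides, used as daily GBM parameters, and the names `horizon` and `n_sim` are mine.

```r
set.seed(123)
drift <- 0.1364 / 100    # daily drift (FCOJ mean return from the slides)
volat <- 2.1664 / 100    # daily volatility (FCOJ SD from the slides)
spot  <- 150             # last price (from the slides)
n_sim <- 10000           # number of simulated end values
horizon <- 1             # horizon in days

# Terminal value of a GBM over the horizon (exact lognormal solution)
eps   <- rnorm(n_sim)
final <- spot * exp((drift - volat^2 / 2) * horizon +
                    volat * sqrt(horizon) * eps)

q <- quantile(final, probs = c(0.05, 0.01))
var_95 <- unname(q[1]) - spot   # 95% VaR over the horizon
var_99 <- unname(q[2]) - spot   # 99% VaR over the horizon

print(c(VaR95 = var_95, VaR99 = var_99))
```

Because only the final value matters for the VaR of a linear asset, the intermediate steps of each path are not needed here; setting drift to 0 gives the driftless variant.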

Page 45: Applied Statistics II


Monte Carlo VaR

> quantile(Final, 0.05)
    5%
144.93
> quantile(Final, 0.01)
      1%
142.7941

• 95% Daily VaR is -$5.07 cents
• 99% Daily VaR is -$7.21 cents

Let's take off the drift

Page 46: Applied Statistics II


Monte Carlo VaR

> quantile(Final, 0.05)
      5%
144.7583
> quantile(Final, 0.01)
      1%
142.6412

• 95% Daily VaR is -$5.24 cents
• 99% Daily VaR is -$7.36 cents

Page 47: Applied Statistics II


Comparison

Which is the best?

Page 48: Applied Statistics II


Going forward on the VaR

All methods give different but coherent values

Easy? Yes but…

• We've involved one asset only
• We've involved a linear asset

What about an option?

What about 2 assets?

Page 49: Applied Statistics II


Going forward on the VaR

Portfolio scale: what to look at to calculate the VaR?

Big question, is the VaR additive?

NO! Keywords for the future: covariance, correlation, diversification

Page 50: Applied Statistics II


Going forward on the VaR

Options: what to look at to calculate the VaR?

4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time

4 answers:
• Delta/Gamma approximation knowing the distribution of the underlying
• Rho approximation knowing the distribution of the underlying rate
• Vega approximation knowing the distribution of implied volatility
• Theta (time decay)

Yes but… Do the underlying price/rate/volatility vary independently?

Might be a bit more complicated than expected…

Page 51: Applied Statistics II

OLS & Exploration


Linear regression model

OLS: Ordinary Least Squares

Minimize the sum of the squared vertical distances between the observations and the linear approximation

$y = f(x) = ax + b$

Residual $\varepsilon$

Page 52: Applied Statistics II


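Two parameters to estimate:
• Intercept α (written $b$ in the formulas)
• Slope β (written $a$ in the formulas)

Minimising residuals

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$

When is E minimal?

When the partial derivatives with respect to a and b are 0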

Page 53: Applied Statistics II


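$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$

Quick high school reminder if necessary…

$(y_i - a x_i - b)^2 = y_i^2 - 2a x_i y_i - 2b y_i + a^2 x_i^2 + 2ab x_i + b^2$

$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$

$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$

$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$

$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$

$\sum_{i=1}^{n} \left(-y_i + b + a x_i\right) = 0$

$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$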

Page 54: Applied Statistics II


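From $\frac{\partial E}{\partial b}$:

$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$

Leads easily to the intercept

$a n \bar{x} + nb = n \bar{y}$

$a \bar{x} + b = \bar{y}$

$b = \bar{y} - a \bar{x}$

The regression line is going through $(\bar{x}, \bar{y})$

The distance of this point to the line is 0 indeed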

Page 55: Applied Statistics II


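$b = \bar{y} - a \bar{x}$

$y = ax + \bar{y} - a \bar{x}$

$y - \bar{y} = a(x - \bar{x})$

From $\frac{\partial E}{\partial a}$:

$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$

$\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0$

$\sum_{i=1}^{n} x_i (y_i - a x_i - \bar{y} + a \bar{x}) = 0$

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$

From $\frac{\partial E}{\partial b}$:

$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$

$\sum_{i=1}^{n} (y_i - b - a x_i) = 0$

$\sum_{i=1}^{n} (y_i - \bar{y} + a \bar{x} - a x_i) = 0$

$\sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$

$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$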

Page 56: Applied Statistics II


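We have

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$

and

$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right)$

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) - \sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$

$\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$

Finally…

$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$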

Page 57: Applied Statistics II


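$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

The numerator is the covariance and the denominator the variance (both up to the same factor $1/n$)

$a = \frac{Cov_{xy}}{\sigma_x^2}$

$b = \bar{y} - a \bar{x}$

You can use the Excel functions INTERCEPT and SLOPE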
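The closed-form slope and intercept can be compared with R's own least-squares fit. The data below are simulated for illustration (an assumption); cov() and var() both divide by n - 1, so their ratio matches the formula.

```r
set.seed(7)
# Simulated data with a known linear relation plus noise (assumption)
x <- rnorm(200)
y <- 0.5 + 1.1 * x + rnorm(200, sd = 0.3)

a <- cov(x, y) / var(x)       # slope: covariance over variance of x
b <- mean(y) - a * mean(x)    # intercept: the line goes through the means

fit <- lm(y ~ x)              # R's built-in least-squares fit
print(c(slope = a, intercept = b))
print(coef(fit))
```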

Page 58: Applied Statistics II


Calculate the Variances and Covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}

You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P
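The same exercise can be worked in R. One point to watch: R's var() and cov() divide by n - 1, whereas VAR.P and COVARIANCE.P divide by n, so the sketch below computes the population versions by hand. The variable names are mine.

```r
x <- c(1, 2, 3, 3, 1, 2)
y <- c(2, 3, 1, 1, 3, 2)
n <- length(x)

# Population (divide by n) versions, matching VAR.P / COVARIANCE.P
var_x  <- mean((x - mean(x))^2)
var_y  <- mean((y - mean(y))^2)
cov_xy <- mean((x - mean(x)) * (y - mean(y)))

# Sample versions from R, rescaled to the population convention
var_x_rescaled <- var(x) * (n - 1) / n

print(c(var_x = var_x, var_y = var_y, cov_xy = cov_xy))
```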

Page 59: Applied Statistics II


Let's assess the quality of the regression

Let's calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient – PPMCC):

$r = \frac{Cov_{xy}}{\sigma_x \sigma_y}$

Value between -1 and 1

$|r| = 1$: perfect linear dependence

$r \approx 0$: no linear dependence

Gives an idea of the dispersion of the scatterplot

You can use the Excel function CORREL

Page 60: Applied Statistics II


R=0.96

High quality

R=0.62

Poor quality

Page 61: Applied Statistics II


What is good quality?

Slightly discretionary…

$|r| \ge \frac{\sqrt{3}}{2} \approx 0.866$

It's largely admitted as the threshold between acceptable and poor

Page 62: Applied Statistics II


The regression itself introduces a bias

Let's introduce the coefficient of determination R-Squared

Total Dispersion = Dispersion Regression + Dispersion Residual

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$

In other words the part of the total dispersion explained by the regression

You can use the Excel function RSQ
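The decomposition of the total dispersion can be checked numerically. The sketch below uses simulated data (an assumption) and compares the ratio with the squared correlation coefficient, which coincides with it in a simple linear regression with intercept.

```r
set.seed(11)
# Simulated data (assumption): linear relation plus noise
x <- rnorm(100)
y <- 2 + 0.8 * x + rnorm(100, sd = 0.5)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

ss_total <- sum((y - mean(y))^2)       # total dispersion
ss_resid <- sum((y - y_hat)^2)         # residual dispersion
ss_regr  <- sum((y_hat - mean(y))^2)   # dispersion explained by the regression

r_squared <- ss_regr / ss_total
print(c(R2 = r_squared, r = cor(x, y)))
```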

Page 63: Applied Statistics II


In a simple linear regression with intercept 𝑅2 = 𝑟2

Is a good correlation coefficient and a good coefficient of determination enough to accept the regression?

Not necessarily!

Residuals need to have no effect, in other words to be white noise!

Page 64: Applied Statistics II


Page 65: Applied Statistics II


Don't get fooled by numbers!

For every dataset of the quartet:

$\bar{y} = 7.5$

$\bar{x} = 9$

$y = 3 + 0.5x$

$r = 0.82$

$R^2 = 0.67$

Can you say at this stage which regression is the best?

Certainly not those on the right: you need a LINEAR dependence
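The figures quoted here match Anscombe's quartet, which ships with R as the built-in data frame `anscombe` (columns x1 to x4 and y1 to y4). Identifying the slide's four datasets with this data frame is my assumption; the sketch recomputes the same summary for each set.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
stats <- sapply(1:4, function(k) {
  x <- anscombe[[paste0("x", k)]]
  y <- anscombe[[paste0("y", k)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r         = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 2))
```

Plotting each pair, for example with plot(anscombe$x2, anscombe$y2), shows how different the four scatterplots are despite the near-identical summaries.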

Page 66: Applied Statistics II


Is any linear regression useless?

Think what you could do to the series

Polynomial transformation, log transformation,…

Otherwise, non-linear regressions, but that's another story

Page 67: Applied Statistics II


First application on financial market

S&P / AmEx in 2011

Page 68: Applied Statistics II


$R_{AmEx} = 0.06\% + 1.1046 \cdot R_{S\&P}$

$r = \frac{Cov_{AmEx,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$

$R^2 = r^2 = 0.7227$

Oops :-o

Is Excel wrong?

R-Squared has different calculation methods

Let's accept the following regression then as the quality seems pretty good

Page 69: Applied Statistics II


How to use this?

• Forecasting? Not really… Both are random variables
• Hedging? Yes, but basis risk. Yes, but careful with the residuals…

Let’s have a try!

In theory, what is the daily result of the hedge? 𝑎

Page 70: Applied Statistics II


Hedging $1.0M of AmEx Stocks with $1.1046M of S&P

It would have been too easy… Great differences… Why?

• Sensitivity to the size of the sample
• Heteroscedasticity
• Basis Risk

Page 71: Applied Statistics II


The purpose was to see if the market has an effect on a particular stock

The dependence is obvious but the residuals are too volatile for any stable application

But attention!

We are looking for causation, not correlation!

Causation implies correlation

The converse is not true!

DON'T BE FOOLED BY PRETTY NUMBERS

Let's prove this…

Page 72: Applied Statistics II


Perfect linear dependence

Excellent R-Squared

Residuals are a white noise

What’s the problem then?

Page 73: Applied Statistics II


Do you really think fresh lemon reduces car fatalities?

Page 74: Applied Statistics II


Page 75: Applied Statistics II

Conclusion


R

Normal Distribution

VaR

OLS