Applied Statistics II

Vincent JEANNIN – ESGF 4IFM

Q1 2012

vinzjeannin@hotmail.com

Summary of the session (est. 4.5h)

• R Step by Step
• Reminders of last session
• The Value at Risk
• OLS & Exploration


R Step by Step


http://www.r-project.org/

Downloadable for free (open source)


Main screen


Menu: File / New Script


Step 1: load your data

An Excel CSV file is easy to import

Path C:\Users\vin\Desktop

DATA<-read.csv(file="C:/Users/vin/Desktop/DataFile.csv",header=T)

Note: 4 columns with headers


Run your instruction(s)


You can call variables anytime you want


summary(DATA) shows a quick summary of the distribution of all variables:

      SPX              SPXr                AMEXr            AMEX
 Min.   : 86.43   Min.   :-0.0666344   Min.   : 97.6   Min.   :-0.0883287
 1st Qu.: 95.70   1st Qu.:-0.0069082   1st Qu.:104.7   1st Qu.:-0.0094580
 Median :100.79   Median : 0.0010016   Median :108.8   Median : 0.0013007
 Mean   : 99.67   Mean   : 0.0001249   Mean   :109.4   Mean   : 0.0005891
 3rd Qu.:103.75   3rd Qu.: 0.0075235   3rd Qu.:114.1   3rd Qu.: 0.0102923
 Max.   :107.21   Max.   : 0.0474068   Max.   :123.5   Max.   : 0.0710967

Note: judging by the values, the column named AMEXr holds the AmEx price series and the column named AMEX holds its returns, which is why DATA$AMEX is used for returns in the rest of the session.

summary(DATA$SPX) shows a quick summary of the distribution of one variable:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  86.43   95.70  100.80   99.67  103.80  107.20

Careful using the following instructions, they consider DATA as one single variable:

> min(DATA)
[1] -0.08832874
> max(DATA)
[1] 123.4793

Mean & SD

> sd(DATA)
       SPX       SPXr      AMEXr       AMEX
4.92763551 0.01468776 6.03035318 0.01915489
> mean(DATA)
         SPX         SPXr        AMEXr         AMEX
9.967046e+01 1.249283e-04 1.093951e+02 5.890780e-04
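A side note for readers on a recent R installation (an assumption about your setup, the slides date from 2012): mean() and sd() no longer work column by column on a data frame; mean(DATA) returns NA with a warning and sd(DATA) stops with an error. A minimal sketch of the same per-column statistics, using a small made-up data frame DATA_demo since the course file is not available here:

```r
# Hypothetical stand-in for DATA (the real DataFile.csv is not available here)
DATA_demo <- data.frame(SPX  = c(100, 101, 99, 102),
                        SPXr = c(0.010, -0.020, 0.030, 0.005))

col_means <- colMeans(DATA_demo)    # mean of each column
col_sds   <- sapply(DATA_demo, sd)  # standard deviation of each column

print(col_means)
print(col_sds)
```

summary(), min() and max() behave as shown in the slides.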


Easy to show histogram

hist(DATA$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq",

xlab="SPXr", col="blue")


Obvious excess kurtosis, obvious asymmetry

The corresponding functions don’t exist directly in R…

However, some VNPs (Very Nice Programmers) built and shared an add-in: the package moments


Menu: Packages / Install Package(s)

• Choose whichever mirror (server) you want
• Usually France (Toulouse) is very good, as it’s a university server with all the packages available


Once installed, you can load a package with either of the following instructions (one of them is enough):

require(moments)

library(moments)

New functions can now be used!


> require(moments)

> library(moments)

> skewness(DATA)

SPX SPXr AMEXr AMEX

-0.6358029 -0.4178701 0.1876994 -0.2453693

> kurtosis(DATA)

SPX SPXr AMEXr AMEX

2.411177 5.671254 2.078366 5.770583

Btw, you can store any result in a variable

> Kur<-kurtosis(DATA$SPXr)

> Kur

[1] 5.671254
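To see what these two functions measure, here is a minimal base R sketch of the moment-based definitions (population moments, divided by n). It is written from the textbook formulas and is only assumed to match what the moments package computes; the function names skew_by_hand and kurt_by_hand and the sample x_demo are made up for the illustration. Note that this kurtosis is not the excess kurtosis: a normal distribution gives 3.

```r
# Moment-based skewness and kurtosis written out in base R
# (population moments, i.e. divided by n)
skew_by_hand <- function(x) {
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^1.5
}

kurt_by_hand <- function(x) {
  m2 <- mean((x - mean(x))^2)  # second central moment
  m4 <- mean((x - mean(x))^4)  # fourth central moment
  m4 / m2^2                    # not the excess: subtract 3 for the excess
}

x_demo <- c(0, 1, 0, 1)        # made-up sample, perfectly balanced
print(skew_by_hand(x_demo))
print(kurt_by_hand(x_demo))
```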


Lost? Call the help!

help(kurtosis)

The help page reminds you of the package, the syntax and the definition of the arguments


Let’s store a few values (package stats)

SPMean<-mean(DATA$SPXr)

SPSD<-sd(DATA$SPXr)

Build a sequence, the x axis

x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)

Build a normal density on these x (package stats)

y1<-dnorm(x,mean=SPMean,sd=SPSD)

Display the histogram (package graphics)

hist(DATA$SPXr, breaks=25,main="S&P Returns / Normal Distribution",xlab="Returns",ylab="Occurrences", col="blue")

Display on top of it the normal density (package graphics)

lines(x,y1,type="l",lwd=3,col="red")
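One practical detail when overlaying a density on a histogram: by default hist() plots counts, while dnorm() returns densities, so the two only share a scale by coincidence. A possible variant, sketched on made-up returns (ret_demo is simulated, not the course data), puts the histogram on a density scale with freq=FALSE:

```r
set.seed(1)
ret_demo <- rnorm(250, mean = 0, sd = 0.015)  # made-up daily returns

m <- mean(ret_demo)
s <- sd(ret_demo)
x <- seq(from = m - 4 * s, to = m + 4 * s, length.out = 500)
y1 <- dnorm(x, mean = m, sd = s)

# freq = FALSE rescales the bars so that their total area is 1,
# which is the scale of the normal density
h <- hist(ret_demo, breaks = 25, freq = FALSE,
          main = "Returns / Normal Distribution",
          xlab = "Returns", ylab = "Density", col = "blue")
lines(x, y1, lwd = 3, col = "red")
```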


Positive Excess Kurtosis & Negative Skew


Let’s build a spread

Spd<-DATA$SPXr-DATA$AMEX

What is the mean?

Mean is linear: $E(aX + bY) = aE(X) + bE(Y)$

$E(X - Y) = E(X) - E(Y)$

Let’s verify

> mean(DATA$SPXr)-mean(DATA$AMEX)-mean(Spd)

[1] 0


What is the standard deviation?

Is standard deviation linear? NO! $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$

For the spread, $a = 1$ and $b = -1$:

> (var(DATA$SPXr)+var(DATA$AMEX)-2*cov(DATA$SPXr,DATA$AMEX))^0.5

[1] 0.01019212

> sd(Spd)

[1] 0.01019212
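The same check can be run for any weights a and b. A minimal sketch on simulated series (X and Y are made-up returns, not the course data):

```r
set.seed(42)
X <- rnorm(500, sd = 0.015)            # made-up returns
Y <- 0.8 * X + rnorm(500, sd = 0.010)  # made-up returns, correlated with X
a <- 1
b <- -1                                # the spread X - Y

lhs <- var(a * X + b * Y)
rhs <- a^2 * var(X) + b^2 * var(Y) + 2 * a * b * cov(X, Y)
print(c(lhs, rhs))                     # identical up to rounding

# Standard deviation is not linear: these two numbers differ
print(c(sd(X - Y), sd(X) - sd(Y)))
```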

Let’s show the implication properly

Let’s create a portfolio containing half of each stock


Portf<-0.5*DATA$SPXr+0.5*DATA$AMEX

plot(sd(DATA$SPXr),mean(DATA$SPXr),col="blue",ylim=c(0,0.0008),xlim=c(0.012,0.022),ylab="Return",xlab="Vol")

points(sd(DATA$AMEX),mean(DATA$AMEX),col="red")

points(sd(Portf),mean(Portf),col="green")


The efficient frontier


points(sd(0.1*DATA$SPXr+0.9*DATA$AMEX),mean(0.1*DATA$SPXr+0.9*DATA$AMEX),col="green")

The same call is repeated with other weights to draw the rest of the frontier.
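Rather than typing one points() call per weight, the frontier can be traced with a loop. A minimal sketch on simulated returns; the series spx_r and amex_r and the grid of weights are assumptions made for the illustration, not the values used in the slides:

```r
set.seed(7)
spx_r  <- rnorm(250, mean = 0.0001, sd = 0.015)          # made-up returns
amex_r <- 0.0005 + 1.1 * spx_r + rnorm(250, sd = 0.010)  # made-up returns

weights <- seq(0, 1, by = 0.1)  # share invested in the first asset
vol <- numeric(length(weights))
ret <- numeric(length(weights))

for (k in seq_along(weights)) {
  w <- weights[k]
  mix <- w * spx_r + (1 - w) * amex_r  # portfolio returns for this weight
  vol[k] <- sd(mix)
  ret[k] <- mean(mix)
}

plot(vol, ret, col = "green", xlab = "Vol", ylab = "Return")
```

The return moves linearly with the weight, the volatility does not: this is the variance formula at work.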


plot(DATA$SPXr,DATA$AMEX)

abline(lm(DATA$AMEX~DATA$SPXr), col="blue")


LM stands for Linear Models

> lm(DATA$AMEX~DATA$SPXr)

Call:

lm(formula = DATA$AMEX ~ DATA$SPXr)

Coefficients:

(Intercept) DATA$SPXr

0.0004505 1.1096287

$y = 1.1096x + 0.045\%$

Will be used later for linear regression and hedging


Do you remember which distribution is the most platykurtic in nature?

Toss a coin: Head = Success = 1 / Tail = Failure = 0

Let’s test the fairness

100 tosses… Otherwise memory issues…

> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.5)
> mean(toss)
[1] 0.52
> kurtosis(toss)
[1] 1.006410
> kurtosis(toss)-3
[1] -1.993590
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 52


Probability density of r, the probability of head, given h heads and t tails (built on the binomial distribution):

$f(r \mid H = h, T = t) = \frac{(N + 1)!}{h!\,t!}\, r^h (1 - r)^t$

Let’s plot this density with $h = 52$, $t = 48$, $N = 100$

N<-100
h<-52
t<-48
r<-seq(0,1,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density of r given 52 heads out of 100 flips")


If the probability between 45% and 55% is significant we’ll accept the fairness

What do you think?
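To put a number on it, note that the density with the $(N+1)!/(h!\,t!)$ factor is a Beta density with parameters h+1 and t+1, so base R can integrate it with pbeta(). The sketch below first checks that identity, then computes the probability that r lies between 45% and 55% for 52 heads and 48 tails. Using dbeta() also avoids factorial(), which overflows to Inf beyond 170 in R. The variable names by_factorial, by_dbeta and p_fair are made up for the illustration.

```r
h <- 52
t <- 48
N <- h + t

# Same curve computed two ways: the factorial formula and dbeta()
r <- seq(0.01, 0.99, length.out = 99)
by_factorial <- factorial(N + 1) / (factorial(h) * factorial(t)) * r^h * (1 - r)^t
by_dbeta     <- dbeta(r, shape1 = h + 1, shape2 = t + 1)
print(all.equal(by_factorial, by_dbeta))

# Probability that r lies between 45% and 55%
p_fair <- pbeta(0.55, h + 1, t + 1) - pbeta(0.45, h + 1, t + 1)
print(p_fair)
```

Whether that probability counts as "significant" remains a judgment call, as the slide suggests.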


What is the problem with this coin?

Toss it! Head = Success = 1 / Tail = Failure = 0

Let’s test the fairness (assuming you don’t know it’s a trick coin)

100 tosses, assuming the probability of head is 0.7

> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.7)
> mean(toss)
[1] 0.72
> kurtosis(toss)
[1] 1.960317
> kurtosis(toss)-3
[1] -1.039683
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 72

Obvious fake!


If the probability between 45% and 55% is significant we’ll accept the fairness

N<-100

h<-72

t<-28

r<-seq(0.2,0.8,length=500)

y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t

plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")

Trick coin!

Reminders of last session


Snapshot, 4 moments of the standard normal distribution:

Mean: 0
SD: 1
Skewness: 0
Kurtosis: 3


$P(X \le \mu) = 0.5$

$P(X \le \mu - \sigma) = 0.159$

$P(X \le \mu - 2\sigma) = 0.023$

$P(X \le \mu - 3\sigma) = 0.001$

$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$

$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$

$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 0.997$

$P(X \le \mu - 1.645\sigma) = 0.05$

$P(X \le \mu - 2.326\sigma) = 0.01$
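These reference values can be recomputed with pnorm() and qnorm() from the stats package. A minimal sketch; mu and sigma are arbitrary here, since the probabilities do not depend on them:

```r
mu <- 0
sigma <- 1

# Left-tail probabilities at mu, mu - sigma, mu - 2 sigma, mu - 3 sigma
left_tail <- pnorm(mu - (0:3) * sigma, mean = mu, sd = sigma)

# Probability of falling within 1, 2 and 3 standard deviations of the mean
central <- pnorm(mu + (1:3) * sigma, mu, sigma) - pnorm(mu - (1:3) * sigma, mu, sigma)

# Quantiles used later for the VaR at 95% and 99%
z <- qnorm(c(0.05, 0.01))

print(round(left_tail, 3))
print(round(central, 3))
print(round(z, 3))
```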


Density: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Notation: $N(\mu, \sigma)$

CDF: $P(X \le x) = \phi(x) = \int_{-\infty}^{x} f(u)\,du$


Let $X \sim N(1, 1.5)$. Find $P(X \le 4.75)$

$P(X \le 4.75) = P\left(Y \le \frac{4.75 - 1}{1.5}\right)$ with $Y \sim N(0, 1)$

$P(Y \le 2.5) = ?$

Use the table!

$P(Y \le -2.5) = 0.0062$

$P(Y \le 2.5) = 1 - 0.0062 = 0.9938$

$P(X \le 4.75) = 0.9938$
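The table lookup can be cross-checked in R, either directly on X or after standardising. A minimal sketch (the names z, p_direct and p_standard are made up for the illustration):

```r
# Direct computation on X ~ N(1, 1.5)
p_direct <- pnorm(4.75, mean = 1, sd = 1.5)

# Same thing after standardising: Y = (X - 1) / 1.5 ~ N(0, 1)
z <- (4.75 - 1) / 1.5
p_standard <- pnorm(z)

print(c(z, p_direct, p_standard))
```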


QQ Plot

> qqnorm(FCOJ$V1)

> qqline(FCOJ$V1)

Fat tails


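Geometric Brownian Motion (B&S)

Based on the Stochastic Differential Equation $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$

Discrete form: $dS_t = \mu S_t\,dt + \sigma S_t \sqrt{dt}\,\varepsilon$ with $\varepsilon \sim N(0, 1)$

CRR

$u = e^{\sigma\sqrt{t}}$

$d = \frac{1}{u} = e^{-\sigma\sqrt{t}}$

$S e^{rt} = pSu + (1 - p)Sd$, hence $e^{rt} = pu + (1 - p)d$

$p = \frac{e^{rt} - d}{u - d}$

$BV = \left(OpUp \cdot p + OpDown \cdot (1 - p)\right) e^{-rt}$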
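The CRR formulas above fit in a few lines of R. A one-step sketch for a call option; all the inputs (spot, strike, volatility, rate, maturity) are made-up numbers chosen for the illustration:

```r
S <- 150        # spot (made up)
K <- 150        # strike (made up)
sigma <- 0.30   # annual volatility (made up)
r <- 0.02       # annual interest rate (made up)
t <- 1 / 12     # one step of one month

u <- exp(sigma * sqrt(t))          # up factor
d <- 1 / u                         # down factor
p <- (exp(r * t) - d) / (u - d)    # risk-neutral probability of an up move

op_up   <- max(S * u - K, 0)       # call payoff after an up move
op_down <- max(S * d - K, 0)       # call payoff after a down move

BV <- (op_up * p + op_down * (1 - p)) * exp(-r * t)
print(c(u = u, d = d, p = p, BV = BV))
```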


Greeks Approximation – Taylor Expansion

$C(S + dS) = C(S) + \Delta \cdot dS + \frac{1}{2} \gamma \cdot dS^2 + \frac{1}{6} Speed \cdot dS^3 + \frac{1}{24} Greek_{4th} \cdot dS^4 + \dots$
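To get a feel for how quickly the expansion converges, the sketch below compares a first-order (delta) and a second-order (delta and gamma) approximation with a full repricing. It assumes a Black-Scholes call without dividends; the function bs_call and all inputs are made up for the illustration.

```r
# Black-Scholes price of a European call (no dividends), mat in years
bs_call <- function(S, K, r, sigma, mat) {
  d1 <- (log(S / K) + (r + 0.5 * sigma^2) * mat) / (sigma * sqrt(mat))
  d2 <- d1 - sigma * sqrt(mat)
  S * pnorm(d1) - K * exp(-r * mat) * pnorm(d2)
}

S <- 100; K <- 100; r <- 0.01; sigma <- 0.2; mat <- 1  # made-up inputs
dS <- 5                                                # move of the underlying

d1 <- (log(S / K) + (r + 0.5 * sigma^2) * mat) / (sigma * sqrt(mat))
delta <- pnorm(d1)                          # first derivative w.r.t. S
gam   <- dnorm(d1) / (S * sigma * sqrt(mat))  # second derivative w.r.t. S

exact  <- bs_call(S + dS, K, r, sigma, mat)           # full repricing
order1 <- bs_call(S, K, r, sigma, mat) + delta * dS   # delta only
order2 <- order1 + 0.5 * gam * dS^2                   # delta and gamma

print(c(exact = exact, order1 = order1, order2 = order2))
```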


The Value at Risk

Estimate, at a given confidence level (usually 95% or 99%), the worst possible loss. In other words, the point is to identify a particular point in the left tail of the distribution

3 Methods

• Historical
• Parametric
• Monte Carlo

For now, we’ll focus on the VaR of one linear asset… FCOJ is back!


Historical VaR

• No assumption about the distribution
• Easy to implement and calculate
• Sensitive to the length of the history
• Sensitive to very extreme values

Let’s get back to our FCOJ time series, last price is $150 cents

If we work on returns, we’ve seen the use of the PERCENTILE Excel function

• 1% Percentile is -5.22%, 99% Historical Daily VaR is -$7.83 cents
• 5% Percentile is -3.34%, 95% Historical Daily VaR is -$5.00 cents

Works as well on weekly, monthly, quarterly series
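In R, the counterpart of the PERCENTILE function is quantile(); as far as I know its default method gives the same interpolation as Excel's PERCENTILE. A minimal sketch on simulated returns (ret_demo is made up, since the FCOJ file is not available here, so the figures will not match the slide):

```r
set.seed(123)
ret_demo <- rnorm(1000, mean = 0.001364, sd = 0.021664)  # made-up daily returns
spot <- 150                                              # last price, in cents

# 1% and 5% percentiles of the returns
pct <- quantile(ret_demo, probs = c(0.01, 0.05))

# Historical daily VaR at 99% and 95%, in cents
var_cents <- spot * pct

print(pct)
print(var_cents)
```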


Historical VaR

It can also be computed on price variations instead of returns, but the result then depends on the price level! So beware of the bias.

• 1% Percentile in terms of price movement is -$8.11 cents
• 5% Percentile in terms of price movement is -$4.14 cents


Parametric VaR

• Easy to implement and calculate
• Assumes a particular shape of the distribution
• Not really sensitive to fat tails

FCOJ Mean Return: 0.1364%

FCOJ SD: 2.1664%

We already know:

$P(X \le -1.645\sigma + \mu) = 0.05$

$P(X \le -2.326\sigma + \mu) = 0.01$

Then:

$P(X \le -3.43\%) = 0.05$

VaR 95% (-$5.15 cents)

$P(X \le -4.90\%) = 0.01$

VaR 99% (-$7.35 cents)


Parametric VaR

Very often you assume a 0 mean anyway, therefore:

$P(X \le -3.57\%) = 0.05$

VaR 95% (-$5.36 cents)

$P(X \le -5.04\%) = 0.01$

VaR 99% (-$7.56 cents)

Lower 99% value than the historical VaR

Problem with leptokurtic distributions: the method barely captures the impact of fat tails
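The parametric VaR is a one-liner once the mean and the SD are known. A minimal sketch using the FCOJ mean, SD and last price quoted in the slides, with qnorm() supplying the exact quantiles instead of the rounded -1.645 and -2.326; cents may therefore differ in the last digit from figures computed with rounded inputs.

```r
mu    <- 0.001364   # FCOJ mean daily return quoted in the slides
sigma <- 0.021664   # FCOJ daily SD quoted in the slides
spot  <- 150        # last price, in cents

z <- qnorm(c(0.05, 0.01))   # quantiles for the 95% and 99% VaR

# With the mean
ret_q     <- mu + z * sigma
var_cents <- spot * ret_q

# With a zero mean
ret_q0     <- z * sigma
var_cents0 <- spot * ret_q0

print(round(100 * ret_q, 2))    # return quantiles, in %
print(round(var_cents, 2))      # VaR in cents, with the mean
print(round(var_cents0, 2))     # VaR in cents, zero mean
```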


Monte Carlo VaR

Based on an assumption of a price process (for example GBM)

• Most efficient method when assets aren’t linear
• Tough to implement
• Assumes a particular shape of the distribution

A great number of random simulations of the price process builds a distribution, from which the VaR is read


Monte Carlo VaR

Let’s simulate 10,000 GBM, 252 steps and store the final result

library(sde)

FCOJ<-read.csv(file="C:/Users/Vinz/Desktop/FCOJStats.csv",header=FALSE,sep=",")

Drift<-mean(FCOJ$V1)
Volat<-sd(FCOJ$V1)
nbsim<-252
Spot<-150
Final<-rep(1,10000)
for(i in 1:10000){
Matr<-GBM(x=Spot,r=Drift, sigma=Volat,N=nbsim)
Final[i]<-Matr[nbsim+1]}
quantile(Final, 0.05)
quantile(Final, 0.01)

Don’t be fooled by the 252, we’re still making a daily simulation: what should be changed in the code to make it yearly?
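For readers without the sde package, here is a sketch of the same experiment in base R. It swaps the step-by-step simulation for the exact distribution of the GBM at the horizon, which is a different technique from the loop above but targets the same quantiles. The drift and volatility are the FCOJ daily mean and SD quoted in the slides; the name horizon and the seed are assumptions of this sketch.

```r
set.seed(2012)
drift <- 0.001364   # FCOJ daily mean return quoted in the slides
volat <- 0.021664   # FCOJ daily SD quoted in the slides
spot  <- 150
n_sim <- 10000
horizon <- 1        # in days, because drift and volat are daily figures

# Exact GBM value at the horizon:
# S_T = S_0 * exp((mu - sigma^2 / 2) * T + sigma * sqrt(T) * Z), Z ~ N(0, 1)
z <- rnorm(n_sim)
final <- spot * exp((drift - 0.5 * volat^2) * horizon + volat * sqrt(horizon) * z)

q <- quantile(final, c(0.05, 0.01))
print(q)
print(q - spot)     # daily VaR at 95% and 99%, in cents
```

In this sketch the horizon is an explicit parameter: setting horizon to 252 would turn the daily simulation into a yearly one. In the sde version, my understanding is that the equivalent knob is the T argument of GBM(), which the code above leaves at its default.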


Monte Carlo VaR

> quantile(Final, 0.05)
5%
144.93
> quantile(Final, 0.01)
1%
142.7941

• 95% Daily VaR is -$5.07 cents
• 99% Daily VaR is -$7.21 cents

Let’s take off the drift


Monte Carlo VaR


> quantile(Final, 0.05)
5%
144.7583
> quantile(Final, 0.01)
1%
142.6412

• 95% Daily VaR is -$5.24 cents
• 99% Daily VaR is -$7.36 cents


Comparison

Which is the best?


Going forward on the VaR

All methods give different but consistent values

Easy? Yes but…

• We’ve involved one asset only
• We’ve involved a linear asset

What about an option?

What about 2 assets?



Portfolio scale: what to look at to calculate the VaR?

Big question, is the VaR additive?

NO! Keywords for the future: covariance, correlation, diversification
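A small numerical illustration of why the VaR is not additive, using a zero-mean parametric VaR on two assets. All the inputs (volatilities, correlation, weights) are made up; the point is only that the covariance term makes the portfolio VaR smaller than the sum of the stand-alone VaRs as soon as the correlation is below 1.

```r
sd1 <- 0.015   # daily volatility of asset 1 (made up)
sd2 <- 0.019   # daily volatility of asset 2 (made up)
rho <- 0.85    # correlation between the two assets (made up)
w1 <- 0.5      # weight of asset 1
w2 <- 0.5      # weight of asset 2

z <- qnorm(0.05)   # 95% VaR quantile

# Stand-alone VaR of each position (as a positive loss, zero mean assumed)
var1 <- -z * w1 * sd1
var2 <- -z * w2 * sd2

# Portfolio VaR: the volatility comes from the variance formula
port_sd  <- sqrt(w1^2 * sd1^2 + w2^2 * sd2^2 + 2 * w1 * w2 * rho * sd1 * sd2)
var_port <- -z * port_sd

print(c(sum_of_vars = var1 + var2, portfolio = var_port))
```

With rho set to 1 the two numbers coincide: diversification only helps when the assets are not perfectly correlated.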



Options: what to look at to calculate the VaR?

4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time

4 answers:
• Delta/Gamma approximation knowing the distribution of the underlying
• Rho approximation knowing the distribution of the underlying rate
• Vega approximation knowing the distribution of implied volatility
• Theta (time decay)

Yes, but… Do the underlying price, rate and volatility vary independently?

Might be a bit more complicated than expected…

OLS & Exploration


Linear regression model

Minimize the sum of the squared vertical distances between the observations and the linear approximation

$y = f(x) = ax + b$

Residual $\varepsilon$

OLS: Ordinary Least Squares


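Two parameters to estimate:
• Intercept α ($b$ in the equations)
• Slope β ($a$ in the equations)

Minimising residuals

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$

When is E minimal?

When the partial derivatives with respect to a and b are 0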


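$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$

Quick high school reminder if necessary…

$(y_i - a x_i - b)^2 = y_i^2 - 2a x_i y_i - 2b y_i + a^2 x_i^2 + 2ab x_i + b^2$

$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$

$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$

$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$

$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$

$\sum_{i=1}^{n} \left(-y_i + b + a x_i\right) = 0$

$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$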


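$\frac{\partial E}{\partial b} = 0$ gives $a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$

Leads easily to the intercept

$a n \bar{x} + n b = n \bar{y}$

$a \bar{x} + b = \bar{y}$

$b = \bar{y} - a \bar{x}$

The regression line goes through $(\bar{x}, \bar{y})$

The distance of this point to the line is 0 indeed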


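With $b = \bar{y} - a \bar{x}$, the line $y = ax + \bar{y} - a \bar{x}$ becomes $y - \bar{y} = a(x - \bar{x})$

From $\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$:

$\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0$

$\sum_{i=1}^{n} x_i (y_i - a x_i - \bar{y} + a \bar{x}) = 0$

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$

From $\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$:

$\sum_{i=1}^{n} (y_i - b - a x_i) = 0$

$\sum_{i=1}^{n} (y_i - \bar{y} + a \bar{x} - a x_i) = 0$

$\sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$

$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$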



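We have

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$ and $\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right)$

$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) - \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$

$\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$

Finally…

$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$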


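$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

The numerator is n times the covariance, the denominator is n times the variance:

$a = \frac{Cov_{xy}}{\sigma_x^2}$

$b = \bar{y} - a \bar{x}$

You can use the Excel functions INTERCEPT and SLOPE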
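In R, the two closed-form results can be checked against lm(). A minimal sketch on simulated data (x and y are made up). Note that cov() and var() in R divide by n-1 rather than n, which does not matter here because the factor cancels in the ratio.

```r
set.seed(3)
x <- rnorm(100)
y <- 0.5 + 1.1 * x + rnorm(100, sd = 0.3)  # made-up linear relation plus noise

a <- cov(x, y) / var(x)       # slope
b <- mean(y) - a * mean(x)    # intercept

fit <- lm(y ~ x)

print(c(intercept = b, slope = a))
print(coef(fit))              # same two numbers
```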


Calculate the Variances and Covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}

You can use Excel function VAR.P, COVARIANCE.P and STDEV.P
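The same exercise can be done in R. One caveat: var(), cov() and sd() in R divide by n-1 (sample statistics), whereas VAR.P, COVARIANCE.P and STDEV.P divide by n (population statistics). The helper functions var_p and cov_p below are made up for the illustration and implement the population versions.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)
n <- length(X)

# Population versions (divide by n), like the .P functions in Excel
var_p <- function(v) mean((v - mean(v))^2)
cov_p <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

print(c(var_X = var_p(X), var_Y = var_p(Y), cov_XY = cov_p(X, Y)))

# Link with the built-in sample versions (divide by n - 1)
print(c(var(X) * (n - 1) / n, cov(X, Y) * (n - 1) / n))
```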


Let’s assess the quality of the regression

Let’s calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient – PPMCC):

$r = \frac{Cov_{xy}}{\sigma_x \sigma_y}$

Value between -1 and 1

$|r| = 1$: perfect linear dependence

$r \approx 0$: no linear dependence

Gives an idea of the dispersion of the scatterplot

You can use the Excel function CORREL


R=0.96

High quality

R=0.62

Poor quality


What is good quality?

Slightly discretionary…

$|r| \ge \frac{\sqrt{3}}{2} \approx 0.866$ is widely accepted as the threshold between acceptable and poor


The regression itself introduces a bias

Let’s introduce the coefficient of determination R-Squared

Total Dispersion = Dispersion Regression + Dispersion Residual

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$

In other words, the part of the total dispersion explained by the regression

You can use the Excel function RSQ
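The decomposition can be verified numerically. A minimal sketch on simulated data (x and y are made up), comparing the ratio of dispersions with the R-Squared reported by lm() and with the squared correlation coefficient:

```r
set.seed(5)
x <- rnorm(50)
y <- 2 + 0.7 * x + rnorm(50, sd = 0.5)   # made-up data

fit   <- lm(y ~ x)
y_hat <- fitted(fit)                     # values on the regression line

total      <- sum((y - mean(y))^2)       # total dispersion
regression <- sum((y_hat - mean(y))^2)   # dispersion explained by the regression
residual   <- sum((y - y_hat)^2)         # dispersion of the residuals

r2 <- regression / total

print(c(total = total, sum_of_parts = regression + residual))
print(c(r2 = r2, from_lm = summary(fit)$r.squared, r_squared = cor(x, y)^2))
```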


In a simple linear regression with intercept, $R^2 = r^2$

Are a good correlation coefficient and a good coefficient of determination enough to accept the regression?

Not necessarily!

Residuals must show no structure, in other words they must be white noise!


For every dataset of the quartet:

$\bar{x} = 9$, $\bar{y} = 7.5$, $y = 3 + 0.5x$, $r = 0.82$, $R^2 = 0.67$

Don’t get fooled by numbers!

Can you say at this stage which regression is the best?

Certainly not those on the right: you need a LINEAR dependence
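These figures match Anscombe's quartet, which ships with R as the built-in data frame anscombe (columns x1 to x4 and y1 to y4). Assuming that this is indeed the dataset plotted on the slide, the sketch below recomputes the summary numbers for the four datasets:

```r
data(anscombe)   # built-in dataset: columns x1..x4 and y1..y4

stats <- sapply(1:4, function(k) {
  x <- anscombe[[paste0("x", k)]]
  y <- anscombe[[paste0("y", k)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r         = cor(x, y),
    r_squared = summary(fit)$r.squared)
})

print(round(stats, 2))   # one column per dataset
```

Plotting each pair, for example plot(anscombe$x2, anscombe$y2), shows how different the four scatterplots are despite near-identical statistics.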


Is linear regression useless in that case?

Think about what you could do to the series

Polynomial transformation, log transformation,…

Otherwise, non-linear regressions, but that’s another story


First application on financial market

S&P / AmEx in 2011


$R_{AmEx} = 0.06\% + 1.1046 \cdot R_{S\&P}$

$r = \frac{Cov_{AmEx,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$

$R^2 = r^2 = 0.7227$

Oops :-o

Is Excel wrong?

R-Squared has different calculation methods

Let’s accept this regression then, as the quality seems pretty good


How to use this?

• Forecasting? Not really… Both are random variables
• Hedging? Yes, but beware of the basis risk and of the residuals…

Let’s have a try!

In theory, what is the daily result of the hedge? 𝑎


Hedging $1.0M of AmEx Stocks with $1.1046M of S&P

It would have been too easy… Great differences… Why?

• Sensitivity to the size of the sample
• Heteroscedasticity
• Basis risk
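The mechanics of such a hedge can be sketched on simulated returns. The series r_sp and r_amex below are made up (the real 2011 data is not available here), so only the structure matters: in sample, the daily result of the hedged position is the intercept plus the residual, and its volatility is entirely driven by the residuals.

```r
set.seed(11)
r_sp   <- rnorm(250, sd = 0.0147)                          # made-up index returns
r_amex <- 0.0006 + 1.1046 * r_sp + rnorm(250, sd = 0.010)  # made-up stock returns

fit   <- lm(r_amex ~ r_sp)
alpha <- unname(coef(fit)[1])   # intercept
beta  <- unname(coef(fit)[2])   # slope, used as the hedge ratio

notional <- 1e6                 # $1.0M of stock, hedged with beta x $1.0M of index

# Daily result: long the stock, short beta times the index
pnl <- notional * (r_amex - beta * r_sp)

print(c(mean_pnl = mean(pnl), notional_x_alpha = notional * alpha))
print(c(sd_pnl = sd(pnl), notional_x_sd_resid = notional * sd(residuals(fit))))
```

Out of sample the slope itself moves, which adds to the differences mentioned above.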


The purpose was to see if the market has an effect on a particular stock

The dependence is obvious but the residuals are too volatile for any stable application

But be careful!

We are looking for causation, not correlation!

Causation implies correlation

The converse is not true!

DON’T BE FOOLED BY PRETTY NUMBERS

Let’s prove this…


Perfect linear dependence

Excellent R-Squared

Residuals are a white noise

What’s the problem then?


Do you really think fresh lemon reduces car fatalities?


Conclusion


R

Normal Distribution

VaR

OLS
