
Lecture 5: Correlation and Regression

Dr Peter Wheale

A Scatter Plot of Monthly Returns

Interpretation of Correlation Coefficient

Correlation coefficient (r) and its interpretation:

r = +1: perfect positive correlation
0 < r < +1: positive linear relationship
r = 0: no linear relationship
-1 < r < 0: negative linear relationship
r = -1: perfect negative correlation

Scatter Plots and Correlation

Covariance of Rates of Return

$$\operatorname{Cov}_{1,2} = \frac{\sum_{t=1}^{n}\left(R_{t,1}-\bar{R}_1\right)\left(R_{t,2}-\bar{R}_2\right)}{n-1}$$

Example: Calculate the covariance between the returns on the two stocks indicated below:

Covariance Using Historical Data

Mean returns: stock 1 = 0.05, stock 2 = 0.07

Sum of the cross-products of deviations from the means = 0.0154

Cov1,2 = 0.0154 / (n – 1) = 0.0154 / 2 = 0.0077 (with n = 3 observations)
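A minimal Python sketch of this calculation follows. The return series are hypothetical, chosen only so that the means (0.05 and 0.07) and the sum of cross-products (0.0154) reproduce the figures above.

```python
# Sample covariance of two return series, matching the slide's formula:
# Cov(1,2) = sum((R_t1 - mean1) * (R_t2 - mean2)) / (n - 1)

def sample_covariance(returns_1, returns_2):
    n = len(returns_1)
    mean_1 = sum(returns_1) / n
    mean_2 = sum(returns_2) / n
    cross_products = sum((r1 - mean_1) * (r2 - mean_2)
                         for r1, r2 in zip(returns_1, returns_2))
    return cross_products / (n - 1)          # divide by n - 1 for a sample

stock_1 = [0.10, 0.00, 0.05]                 # hypothetical returns, mean = 0.05
stock_2 = [0.224, -0.084, 0.07]              # hypothetical returns, mean = 0.07
print(sample_covariance(stock_1, stock_2))   # ≈ 0.0077
```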

Sample Correlation Coefficient

Correlation, ρ, is a standardized measure of covariance and is bounded by +1 and –1

$$\rho_{1,2} = \frac{\operatorname{Cov}_{1,2}}{\sigma_1\,\sigma_2}$$

Example: The covariance of returns on two assets is 0.0051, σ1 = 7% and σ2 = 11%. Calculate ρ1,2.

$$\rho_{1,2} = \frac{0.0051}{0.07 \times 0.11} = 0.662$$
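The same arithmetic in a short Python sketch, using the figures from the example above:

```python
# Standardise covariance into a correlation: rho = Cov(1,2) / (sigma_1 * sigma_2)
cov_1_2 = 0.0051
sigma_1 = 0.07   # 7%
sigma_2 = 0.11   # 11%

rho_1_2 = cov_1_2 / (sigma_1 * sigma_2)
print(round(rho_1_2, 3))   # 0.662
```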

Testing H0: Correlation = 0

The test of whether the true correlation between two random variables is zero (i.e., there is no correlation) is a t-test based on the sample correlation coefficient, r. With n (pairs of) observations the test statistic is:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

Degrees of freedom is n – 2.

Example

Data: n = 10, r = 0.475. Determine if the sample correlation is significant at the 5% level of significance.

t = 0.475 × √8 / √(1 – 0.475²) = 1.3435 / 0.88 = 1.5267

The two-tailed critical t-values at a 5% level of significance with df = 8 (n – 2) are ±2.306.

Since –2.306 ≤ 1.5267 ≤ 2.306, the null hypothesis cannot be rejected, i.e. the correlation between variables X and Y is not significantly different from zero at a 5% significance level.
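A Python sketch of this test, using the example's numbers. scipy is assumed to be available for the critical value; one could equally hard-code ±2.306 from a t-table.

```python
# t-test of H0: correlation = 0, with n = 10 and r = 0.475.
# t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom.
import math
from scipy.stats import t as t_dist

n, r = 10, 0.475
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
critical = t_dist.ppf(0.975, n - 2)             # two-tailed 5% critical value, df = 8

print(round(t_stat, 4), round(critical, 3))     # 1.5267  2.306
print("reject H0" if abs(t_stat) > critical else "fail to reject H0")   # fail to reject H0
```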


Linear Regression

• Dependent variable: the variable whose changes you are trying to explain
• Independent variable: the variable used to explain the changes in the dependent variable
• Example: You want to predict housing starts using mortgage interest rates:
Independent variable = mortgage interest rates
Dependent variable = housing starts

Regression Equation

$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$

where:
Y_i = dependent variable
X_i = independent variable
b_0 = y-intercept
b_1 = slope coefficient
ε_i = error term
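A minimal sketch of this model form in Python. The coefficient values, X values, and noise level are hypothetical and purely illustrative.

```python
# Generate data from the model Y_i = b0 + b1 * X_i + eps_i.
import random

b0, b1 = 2.0, 0.5                      # hypothetical intercept and slope
random.seed(42)

x_values = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable
y_values = [b0 + b1 * x + random.gauss(0, 0.1)   # dependent variable = line + error term
            for x in x_values]
print(list(zip(x_values, y_values)))
```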

Assumptions of Linear Regression

• Linear relation between dependent and independent variables

• Independent variable uncorrelated with error term

• Expected value of error term is zero
• Variance of the error term is constant
• Error term is independently distributed
• Error term is normally distributed

Estimated Regression Coefficients

Estimated regression line is:

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$

where b̂0 is the estimated y-intercept and b̂1 is the estimated slope coefficient.

Estimating the slope coefficient:

$$\hat{b}_1 = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}$$

The intercept is then b̂0 = Ȳ – b̂1 X̄.

Example: Compute the slope coefficient and intercept term for the least squares regression equation using the following information:

Σ(Xi – X̄)(Yi – Ȳ) = 445, Σ(Xi – X̄)² = 374.50, and the sample means of X and Y are 25 and 75, respectively.

The slope coefficient, b1 = 445 / 374.5 = 1.188.

The intercept term, b0 = 75 – 1.188(25) = 45.3.
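The same calculation as a short Python sketch, starting from the summary statistics given in the example:

```python
# Least-squares slope and intercept from summary statistics:
# b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)**2),  b0 = Ybar - b1 * Xbar
sum_cross = 445.0      # sum of (X - Xbar)(Y - Ybar)
sum_sq_dev = 374.50    # sum of (X - Xbar)^2
x_bar, y_bar = 25.0, 75.0

b1 = sum_cross / sum_sq_dev
b0 = y_bar - b1 * x_bar
print(round(b1, 3), round(b0, 1))   # 1.188  45.3
```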

Calculating the Standard Error of the Estimate (SEE)

• SEE measures the accuracy of the prediction from a regression equation
• It is the standard deviation of the error term
• The lower the SEE, the greater the accuracy

$$\mathrm{SEE} = \sqrt{\frac{\mathrm{SSE}}{n-2}}$$

where: SSE = sum of squared errors
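A minimal Python sketch of the SEE calculation. The residuals below are hypothetical, since the slide gives no numeric example at this point.

```python
# SEE = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals.
import math

def standard_error_of_estimate(residuals):
    sse = sum(e**2 for e in residuals)
    return math.sqrt(sse / (len(residuals) - 2))

residuals = [0.5, -1.2, 0.8, -0.3, 0.2]              # hypothetical regression errors
print(round(standard_error_of_estimate(residuals), 4))  # ≈ 0.9055
```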

Interpreting the Coefficient of Determination (R2)

• R2 measures the percentage of the variation in the dependent variable that can be explained by the independent variable

• An R2 of 0.25 means the independent variable explains 25% of the variation in the dependent variable

Caution: You cannot conclude causation

Calculating the Coefficient of Determination (R2)

• For simple linear regression, R2 is the correlation coefficient (r) squared

Example: Correlation coefficient between X and Y, (r) = 0.50

Coefficient of determination = 0.50² = 0.25

Coefficient of Determination (R2)

R2 can also be calculated with SST, SSR, and SSE:

SS Total = SS Regression + SS Error

Total variation = explained variation + unexplained variation

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\mathrm{SST}-\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = \frac{\text{explained variation}}{\text{total variation}}$$
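A short Python sketch of this identity. The sums of squares below are hypothetical, chosen only to show that the two forms of the formula agree.

```python
# R^2 = SSR / SST = 1 - SSE / SST (explained / total variation).
sst = 100.0        # total variation (hypothetical)
sse = 75.0         # unexplained variation (hypothetical)
ssr = sst - sse    # explained variation

r_squared = ssr / sst
print(r_squared, 1 - sse / sst)   # 0.25 0.25, both forms agree
```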
