correlation and regression

Elementary Statistics Correlation and Regression

Upload: rhona-berg

Post on 04-Jan-2016


TRANSCRIPT

Page 1: Correlation and Regression

Elementary Statistics

Correlation and Regression

Page 2: Correlation and Regression

Correlation

A relationship between two variables

x = Explanatory (Independent) Variable
y = Response (Dependent) Variable

x                            y
Hours of Training            Number of Accidents
Shoe Size                    Height
Cigarettes smoked per day    Lung Capacity
Score on SAT                 Grade Point Average
Height                       IQ

What type of relationship exists between the two variables, and is the correlation significant?

Page 3: Correlation and Regression

Scatter Plots and Types of Correlation

Negative correlation: as x increases, y decreases

x = hours of training, y = number of accidents

[Scatter plot: Accidents (0 to 60) versus Hours of Training (0 to 20)]

Page 4: Correlation and Regression

Scatter Plots and Types of Correlation

Positive correlation: as x increases, y increases

x = SAT score, y = GPA

[Scatter plot: GPA (1.50 to 4.00) versus Math SAT (300 to 800)]

Page 5: Correlation and Regression

Scatter Plots and Types of Correlation

No linear correlation

x = height, y = IQ

[Scatter plot: IQ (80 to 160) versus Height (60 to 80)]

Page 6: Correlation and Regression

Correlation Coefficient

A measure of the strength and direction of a linear relationship between two variables

The range of r is from –1 to 1.

If r is close to 1 there is a strong positive correlation.

If r is close to –1 there is a strong negative correlation.

If r is close to 0 there is no linear correlation.

[Number line for r running from –1 through 0 to 1]

Page 7: Correlation and Regression

Application

x (Absences)    y (Final Grade)
 8              78
 2              92
 5              90
12              58
15              43
 9              74
 6              81

[Scatter plot: Final Grade (40 to 95) versus Absences (0 to 16)]

Page 8: Correlation and Regression

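Computation of r

n       x      y      xy     x^2     y^2
1       8     78     624      64    6084
2       2     92     184       4    8464
3       5     90     450      25    8100
4      12     58     696     144    3364
5      15     43     645     225    1849
6       9     74     666      81    5476
7       6     81     486      36    6561
Sum    57    516    3751     579   39898

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - 57^2}\,\sqrt{7(39898) - 516^2}} \approx -0.975$$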
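The column sums can be checked with a short script. This is a minimal sketch that assumes the usual computational (sums) formula for the sample correlation coefficient; the function name `pearson_r` is illustrative and does not come from the slides.

```python
import math

# Paired data from the slides: x = absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def pearson_r(xs, ys):
    """Sample correlation coefficient from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    sum_y2 = sum(yi * yi for yi in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))
```

The rounded result should agree with the value r = –0.975 used in the later slides.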

Page 9: Correlation and Regression

Hypothesis Test for Significance

r is the correlation coefficient for the sample. The correlation coefficient for the population is $\rho$ (rho).

For a two tail test for significance:

$H_0: \rho = 0$ (The correlation is not significant)

$H_a: \rho \neq 0$ (The correlation is significant)

For left tail and right tail tests of negative or positive significance:

$H_0: \rho \geq 0$, $H_a: \rho < 0$ (left tail)

$H_0: \rho \leq 0$, $H_a: \rho > 0$ (right tail)

The sampling distribution for r is a t-distribution with n – 2 d.f.

Standardized test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Page 10: Correlation and Regression

Test of Significance

You found the correlation between the number of times absent and a final grade to be r = –0.975. There were seven pairs of data. Test the significance of this correlation. Use $\alpha$ = 0.01.

1. Write the null and alternative hypothesis.

$H_0: \rho = 0$ (The correlation is not significant)

$H_a: \rho \neq 0$ (The correlation is significant)

2. State the level of significance.

$\alpha$ = 0.01

3. Identify the sampling distribution.

A t-distribution with 5 degrees of freedom

Page 11: Correlation and Regression

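4. Find the critical value.

Critical values: $\pm t_0 = \pm 4.032$

5. Find the rejection region.

Rejection regions: t < –4.032 or t > 4.032

[t-distribution curve with rejection regions shaded beyond –4.032 and 4.032]

6. Find the test statistic.

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.975}{\sqrt{\dfrac{1 - (-0.975)^2}{7 - 2}}} \approx -9.811$$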

Page 12: Correlation and Regression

[t-distribution curve with critical values –4.032 and 4.032 marked on either side of 0]

7. Make your decision.

t = –9.811 falls in the rejection region. Reject the null hypothesis.

8. Interpret your decision.

There is a significant correlation between the number of times absent and final grades.
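The eight steps can be traced in a few lines. The sketch below assumes the standard test statistic t = r / sqrt((1 – r^2) / (n – 2)) and takes the critical value 4.032 from the slides rather than computing it; the function name `correlation_t_statistic` is hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 d.f."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r = -0.975          # sample correlation from the slides
n = 7               # number of data pairs
t_critical = 4.032  # two-tailed critical value at alpha = 0.01, 5 d.f. (from the slides)

t = correlation_t_statistic(r, n)

# Two-tailed test: reject H0 when t falls beyond either critical value.
reject_null = abs(t) > t_critical
print(t, reject_null)
```

The statistic comes out near –9.81, far beyond –4.032, so the sketch reaches the same decision as the slides.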

Page 13: Correlation and Regression

The Line of Regression

Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.

The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.

The line of regression is:

$$\hat{y} = mx + b$$

The slope m is:

$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

The y-intercept is:

$$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}$$

Page 14: Correlation and Regression

[Scatter plot with regression line: revenue (180 to 260) versus Ad $ (1.5 to 3.0)]

$(x_i, y_i)$ = a data point

$(x_i, \hat{y}_i)$ = a point on the line with the same x-value

$y_i - \hat{y}_i$ = a residual

Page 15: Correlation and Regression

Write the equation of the line of regression with x = number of absences and y = final grade.

Calculate m and b from the column sums of the data table (n = 7): $\sum x = 57$, $\sum y = 516$, $\sum xy = 3751$, $\sum x^2 = 579$, $\sum y^2 = 39898$.

$$m = \frac{7(3751) - (57)(516)}{7(579) - 57^2} = \frac{-3155}{804} \approx -3.924$$

$$b = \frac{516}{7} - (-3.924)\,\frac{57}{7} \approx 105.667$$

The line of regression is: $\hat{y} = -3.924x + 105.667$
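As a check on the arithmetic, the slope and intercept can be computed directly from the data. This is a minimal sketch assuming the standard least-squares formulas; the function name `regression_line` is illustrative only.

```python
# Paired data from the slides: x = absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def regression_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


m, b = regression_line(x, y)
print(m, b)
```

Because the slides round m to –3.924 before computing b, the unrounded intercept from this sketch can differ from 105.667 in the third decimal place.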

Page 16: Correlation and Regression

The Line of Regression

m = –3.924 and b = 105.667

The line of regression is: $\hat{y} = -3.924x + 105.667$

[Scatter plot with regression line: Final Grade (40 to 95) versus Absences (0 to 16)]

Note that the point $(\bar{x}, \bar{y})$ = (8.143, 73.714) is on the line.

Page 17: Correlation and Regression

Predicting y Values

The regression line can be used to predict values of y for values of x falling within the range of the data.

The regression equation for number of times absent and final grade is:

$\hat{y} = -3.924x + 105.667$

Use this equation to predict the expected grade for a student with

(a) 3 absences (b) 12 absences

(a) $\hat{y}$ = –3.924(3) + 105.667 = 93.895

(b) $\hat{y}$ = –3.924(12) + 105.667 = 58.579
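The restriction to x values inside the range of the data can be made explicit in code. The sketch below uses the rounded coefficients from the slides; the function name `predict_grade` and the choice to raise an error for out-of-range input are assumptions made for illustration.

```python
# Rounded coefficients from the slides.
SLOPE = -3.924
INTERCEPT = 105.667

# Observed range of absences in the sample (smallest and largest x).
X_MIN, X_MAX = 2, 15


def predict_grade(absences):
    """Predict a final grade, refusing to extrapolate beyond the data."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError("x is outside the range of the data")
    return SLOPE * absences + INTERCEPT


grade_a = predict_grade(3)   # part (a)
grade_b = predict_grade(12)  # part (b)
print(round(grade_a, 3), round(grade_b, 3))
```

A call such as `predict_grade(20)` raises `ValueError`, since 20 absences lies outside the observed range of 2 to 15.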

Page 18: Correlation and Regression

The Coefficient of Determination

The coefficient of determination, $r^2$, is the ratio of explained variation in y to the total variation in y.

The correlation coefficient of number of times absent and final grade is r = –0.975. The coefficient of determination is $r^2 = (-0.975)^2 \approx 0.9506$.

Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
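The ratio in the definition can also be computed directly rather than by squaring r. The following sketch assumes an unrounded least-squares fit and the usual sums of squares about the mean; the variable names are illustrative.

```python
# Paired data from the slides: x = absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Unrounded least-squares slope and intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

predicted = [slope * xi + intercept for xi in x]

# Explained variation: predicted values about the mean of y.
explained_variation = sum((p - y_bar) ** 2 for p in predicted)
# Total variation: observed values about the mean of y.
total_variation = sum((yi - y_bar) ** 2 for yi in y)

r_squared = explained_variation / total_variation
print(round(r_squared, 4))
```

The ratio comes out close to 0.950. The small difference from 0.9506 arises because the slides square the already rounded value r = –0.975.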

Page 19: Correlation and Regression

The Standard Error of Estimate

The standard error of estimate, $s_e$, is the standard deviation of the observed $y_i$ values about the predicted value $\hat{y}_i$.

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

Page 20: Correlation and Regression

The Standard Error of Estimate

Calculate $\hat{y}$ for each x.

n      x     y    $\hat{y}$    $(y - \hat{y})^2$
1      8    78    74.275       13.8756
2      2    92    97.819       33.8608
3      5    90    86.047       15.6262
4     12    58    58.579        0.3352
5     15    43    46.807       14.4932
6      9    74    70.351       13.3152
7      6    81    82.123        1.2611
Sum                            92.767

$$s_e = \sqrt{\frac{92.767}{7 - 2}} \approx 4.307$$
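The table can be reproduced programmatically. This sketch uses the rounded regression coefficients from the slides so that the residuals match the tabulated values; the function name `standard_error_of_estimate` is hypothetical.

```python
import math

# Paired data from the slides: x = absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

# Rounded coefficients from the slides.
SLOPE = -3.924
INTERCEPT = 105.667


def standard_error_of_estimate(xs, ys, slope, intercept):
    """Standard deviation of observed y about the regression line (n - 2 d.f.)."""
    squared_residuals = [(yi - (slope * xi + intercept)) ** 2
                         for xi, yi in zip(xs, ys)]
    return math.sqrt(sum(squared_residuals) / (len(xs) - 2))


sum_sq = sum((yi - (SLOPE * xi + INTERCEPT)) ** 2 for xi, yi in zip(x, y))
s_e = standard_error_of_estimate(x, y, SLOPE, INTERCEPT)
print(round(sum_sq, 3), round(s_e, 3))
```

The sum of squared residuals should land near 92.767 and the standard error near 4.307, matching the table.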

Page 21: Correlation and Regression

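Prediction Intervals

Given a specific linear regression equation and $x_0$, a specific value of x, a c-prediction interval for y is:

$$\hat{y} - E < y < \hat{y} + E$$

where

$$E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$$

The point estimate is $\hat{y}$ and E is the maximum error of estimate.

Use a t-distribution with n – 2 degrees of freedom.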

Page 22: Correlation and Regression

Application

Construct a 90% prediction interval for a final grade when a student has been absent 6 times.

1. Find the point estimate:

$\hat{y}$ = –3.924(6) + 105.667 = 82.123

The point (6, 82.123) is the point on the regression line with x-coordinate of 6.

Page 23: Correlation and Regression

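Application

Construct a 90% prediction interval for a final grade when a student has been absent 6 times.

2. Find E, using $t_c$ = 2.015 (t-distribution with 5 degrees of freedom), $s_e$ = 4.307, $\bar{x}$ = 8.143, $\sum x = 57$ and $\sum x^2 = 579$:

$$E = 2.015(4.307)\sqrt{1 + \frac{1}{7} + \frac{7(6 - 8.143)^2}{7(579) - 57^2}} \approx 9.438$$

At the 90% level of confidence, the maximum error of estimate is 9.438.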

Page 24: Correlation and Regression

Application

Construct a 90% prediction interval for a final grade when a student has been absent 6 times.

3. Find the endpoints.

$\hat{y}$ – E = 82.123 – 9.438 = 72.685

$\hat{y}$ + E = 82.123 + 9.438 = 91.561

72.685 < y < 91.561

When x = 6, the 90% prediction interval is from 72.685 to 91.561.
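The three steps of the interval calculation can be combined into one function. This is a sketch under stated assumptions: the critical value `t_c = 2.015` is the standard table value for 5 degrees of freedom at 90% confidence and is not printed in the slides, the other inputs are the rounded values from the slides, and the function name `prediction_interval` is illustrative.

```python
import math

# Values taken from the slides (rounded as shown there).
n = 7
sum_x, sum_x2 = 57, 579
slope, intercept = -3.924, 105.667
s_e = 4.307

# Assumption: table value of t for 5 d.f. at 90% confidence (not in the slides).
t_c = 2.015


def prediction_interval(x0):
    """Return (lower, upper, E) for the prediction interval at x0."""
    y_hat = slope * x0 + intercept
    x_bar = sum_x / n
    spread = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    margin = t_c * s_e * math.sqrt(spread)
    return y_hat - margin, y_hat + margin, margin


lower, upper, E = prediction_interval(6)
print(lower, upper, E)
```

The endpoints agree with 72.685 and 91.561 to within rounding of the intermediate values.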