Correlation and Regression
TRANSCRIPT
-
8/12/2019 Corelation and Regression
1/33
Elementary Statistics, Larson Farber
9 Correlation and Regression
-
Correlation
Section 9.1
-
Correlation
A relationship between two variables
x: Explanatory (Independent) Variable
y: Response (Dependent) Variable
What type of relationship exists between the two variables, and is the correlation significant?

x                            y
Hours of Training            Number of Accidents
Shoe Size                    Height
Cigarettes smoked per day    Lung Capacity
Score on SAT                 Grade Point Average
Height                       IQ
-
Scatter Plots and Types of Correlation
Negative correlation: as x increases, y decreases.
x = hours of training, y = number of accidents
[Scatter plot: Accidents (0 to 60) versus Hours of Training (0 to 20)]
-
Scatter Plots and Types of Correlation
Positive correlation: as x increases, y increases.
x = SAT score, y = GPA
[Scatter plot: GPA (1.50 to 4.00) versus Math SAT (300 to 800)]
-
Scatter Plots and Types of Correlation
No linear correlation
x = height, y = IQ
[Scatter plot: IQ (80 to 160) versus Height (60 to 80)]
-
Correlation Coefficient
The correlation coefficient r is a measure of the strength and direction of a linear relationship between two variables.
The range of r is from -1 to 1.
If r is close to 1, there is a strong positive correlation.
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
-
Application
x = absences, y = final grade

Absences (x)   Final Grade (y)
8              78
2              92
5              90
12             58
15             43
9              74
6              81

[Scatter plot: Final Grade (40 to 95) versus Absences (0 to 16)]
-
Computation of r

       x     y     xy     x^2    y^2
1      8     78    624    64     6084
2      2     92    184    4      8464
3      5     90    450    25     8100
4      12    58    696    144    3364
5      15    43    645    225    1849
6      9     74    666    81     5476
7      6     81    486    36     6561
Sum    57    516   3751   579    39898
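The column sums feed the usual computational formula for the sample correlation coefficient, $r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$. The following is a minimal Python sketch of that calculation, assuming the seven (absences, final grade) pairs from the Application slide; the function name `correlation_coefficient` is illustrative and does not come from the slides.

```python
import math

# Absences (x) and final grades (y) from the Application slide.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r via the sum-based formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(x, y)
print(round(r, 3))  # -0.975
```

The negative sign indicates that grades tend to fall as absences rise.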
-
Hypothesis Test for Significance
r is the correlation coefficient for the sample. The correlation coefficient for the population is $\rho$ (rho).
The sampling distribution for r is a t-distribution with n - 2 d.f.
Standardized test statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
For a two-tailed test for significance:
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
For left-tailed and right-tailed tests of negative or positive significance:
$H_0: \rho \geq 0$, $H_a: \rho < 0$ (left-tailed)
$H_0: \rho \leq 0$, $H_a: \rho > 0$ (right-tailed)
-
Test of Significance
You found the correlation between the number of times absent and a final grade to be r = -0.975. There were seven pairs of data. Test the significance of this correlation. Use $\alpha = 0.01$.
1. Write the null and alternative hypotheses.
   $H_0: \rho = 0$ (The correlation is not significant)
   $H_a: \rho \neq 0$ (The correlation is significant)
2. State the level of significance.
   $\alpha = 0.01$
3. Identify the sampling distribution.
   A t-distribution with 5 degrees of freedom
-
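4. Find the critical value.
   Critical values: $\pm t_0 = \pm 4.032$
5. Find the rejection region.
   Rejection regions: $t < -4.032$ or $t > 4.032$
6. Find the test statistic.
   $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{-0.975}{\sqrt{\dfrac{1 - (-0.975)^2}{7 - 2}}} \approx -9.811$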
-
7. Make your decision.
   t = -9.811 falls in the rejection region (t < -4.032). Reject the null hypothesis.
8. Interpret your decision.
   There is a significant correlation between the number of times absent and final grades.
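The eight steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the critical value 4.032 is the table value quoted on the slides ($\alpha = 0.01$, two-tailed, 5 d.f.) and is hard-coded rather than computed, and the function name `t_statistic` is hypothetical.

```python
import math


def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 d.f."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = -0.975, 7
t_critical = 4.032  # table value from the slides (alpha = 0.01, two-tailed, 5 d.f.)

t = t_statistic(r, n)
reject_h0 = abs(t) > t_critical  # two-tailed rejection region

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

Because the test is two-tailed, the decision compares |t| with the critical value, so the sign of r does not affect whether the null hypothesis is rejected.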
-
Linear Regression
Section 9.2
-
The Line of Regression
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
The line of regression is: $\hat{y} = mx + b$
The slope m is: $m = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
The y-intercept is: $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\,\dfrac{\sum x}{n}$
-
[Scatter plot with regression line: revenue (180 to 260) versus Ad $ (1.5 to 3.0)]
$(x_i, y_i)$ = a data point
$(x_i, \hat{y}_i)$ = a point on the line with the same x-value
$d_i = y_i - \hat{y}_i$ = a residual
-
Calculate m and b.
Write the equation of the line of regression with x = number of absences and y = final grade.
From the table of sums: n = 7, $\sum x = 57$, $\sum y = 516$, $\sum xy = 3751$, $\sum x^2 = 579$, $\sum y^2 = 39898$.
$m = \dfrac{7(3751) - (57)(516)}{7(579) - (57)^2} = \dfrac{-3155}{804} \approx -3.924$
$b = \bar{y} - m\bar{x} = 73.714 - (-3.924)(8.143) \approx 105.667$
The line of regression is: $\hat{y} = -3.924x + 105.667$
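As a check on the slope and intercept, here is a short Python sketch that applies the sum-based least-squares formulas to the absences and final grade data. The function name `regression_line` is illustrative. With the unrounded slope the intercept comes out near 105.668 (the value in the Minitab output), while the slides' 105.667 results from using the rounded slope.

```python
# Absences (x) and final grades (y) from the slides.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def regression_line(xs, ys):
    """Return (m, b) of the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b


m, b = regression_line(x, y)
print(round(m, 3), round(b, 3))
```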
-
The Line of Regression
m = -3.924 and b = 105.667
The line of regression is: $\hat{y} = -3.924x + 105.667$
Note that the point $(\bar{x}, \bar{y})$ = (8.143, 73.714) is on the line.
[Scatter plot with regression line: Final Grade (40 to 95) versus Absences (0 to 16)]
-
Predicting y Values
The regression line can be used to predict values of y for values of x falling within the range of the data.
The regression equation for number of times absent and final grade is:
$\hat{y} = -3.924x + 105.667$
Use this equation to predict the expected grade for a student with
(a) 3 absences (b) 12 absences
(a) $\hat{y} = -3.924(3) + 105.667 = 93.895$
(b) $\hat{y} = -3.924(12) + 105.667 = 58.579$
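The restriction to x-values within the range of the data can be made explicit in code. The sketch below is one possible way to do it, not something prescribed by the slides: the helper `predict_grade` and the bounds `X_MIN` and `X_MAX` (the smallest and largest absence counts in the sample) are assumptions introduced for illustration.

```python
M, B = -3.924, 105.667  # slope and intercept of the regression line from the slides
X_MIN, X_MAX = 2, 15    # smallest and largest number of absences in the sample


def predict_grade(absences):
    """Predicted final grade; refuses to extrapolate outside the data range."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError("x is outside the range of the data")
    return M * absences + B


print(round(predict_grade(3), 3))   # 93.895
print(round(predict_grade(12), 3))  # 58.579
```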
-
Measures of Regression and Correlation
Section 9.3
-
The Coefficient of Determination
The coefficient of determination, $r^2$, is the ratio of explained variation in y to the total variation in y.
The correlation coefficient of number of times absent and final grade is r = -0.975. The coefficient of determination is $r^2 = (-0.975)^2 = 0.9506$.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
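To connect the definition (explained variation over total variation) with the number, the following Python sketch computes both variations directly from the absences and final grade data. Variable names are illustrative. Because it uses the unrounded regression line, the ratio comes out near 0.9502 rather than the 0.9506 obtained by squaring the rounded r.

```python
# Absences (x) and final grades (y) from the slides.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Unrounded least-squares slope and intercept.
m = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
     / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - m * x_bar

y_hat = [m * a + b0 for a in x]
explained = sum((p - y_bar) ** 2 for p in y_hat)  # explained variation
total = sum((v - y_bar) ** 2 for v in y)          # total variation

r_squared = explained / total
print(round(r_squared, 4))
```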
-
The Standard Error of Estimate
The standard error of estimate, $s_e$, is the standard deviation of the observed $y_i$ values about the predicted value $\hat{y}_i$ for a given $x_i$.
$s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
-
The Standard Error of Estimate
Calculate $(y_i - \hat{y}_i)^2$ for each x.

       x     y     y-hat     (y - y-hat)^2
1      8     78    74.275    13.8756
2      2     92    97.819    33.8608
3      5     90    86.047    15.6262
4      12    58    58.579    0.3352
5      15    43    46.807    14.4932
6      9     74    70.351    13.3152
7      6     81    82.123    1.2611
Sum                          92.767

$s_e = \sqrt{\dfrac{92.767}{7 - 2}} = 4.307$
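The table can be reproduced with a short Python sketch, assuming the rounded regression line $\hat{y} = -3.924x + 105.667$ used on the slides. Variable names are illustrative.

```python
import math

# Absences (x) and final grades (y) from the slides.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
M, B = -3.924, 105.667  # rounded slope and intercept from the slides

# Squared residual (y_i - y-hat_i)^2 for each data point.
squared_residuals = [(yi - (M * xi + B)) ** 2 for xi, yi in zip(x, y)]
sse = sum(squared_residuals)

# Divide by n - 2 because two parameters (m and b) were estimated.
s_e = math.sqrt(sse / (len(x) - 2))

print(round(sse, 3), round(s_e, 3))  # 92.767 4.307
```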
-
Prediction Intervals
Given a specific linear regression equation and $x_0$, a specific value of x, a c-prediction interval for y is:
$\hat{y} - E < y < \hat{y} + E$
where
$E = t_c\, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$
The point estimate is $\hat{y}$ and E is the maximum error of estimate.
Use a t-distribution with n - 2 degrees of freedom.
-
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
1. Find the point estimate:
   $\hat{y} = -3.924(6) + 105.667 = 82.123$
The point (6, 82.123) is the point on the regression line with x-coordinate of 6.
-
2. Find E:
   $E = t_c\, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = 2.015(4.307)\sqrt{1 + \dfrac{1}{7} + \dfrac{7(6 - 8.143)^2}{7(579) - (57)^2}} \approx 9.438$
At the 90% level of confidence, the maximum error of estimate is 9.438.
-
3. Find the endpoints.
   $\hat{y} - E = 82.123 - 9.438 = 72.685$
   $\hat{y} + E = 82.123 + 9.438 = 91.561$
   72.685 < y < 91.561
When x = 6, the 90% prediction interval is from 72.685 to 91.561.
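The three steps can be combined into one small Python function. This is a sketch under stated assumptions: the critical value $t_c = 2.015$ is the t-table value for 90% confidence with 5 d.f. and is passed in rather than computed, and the function name `prediction_interval` and its parameter names are hypothetical.

```python
import math


def prediction_interval(x0, n, sum_x, sum_x2, m, b, s_e, t_c):
    """Return (lower, upper, E) of the prediction interval for y at x0."""
    y_hat = m * x0 + b  # point estimate on the regression line
    x_bar = sum_x / n
    e = t_c * s_e * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat - e, y_hat + e, e


low, high, e = prediction_interval(
    x0=6, n=7, sum_x=57, sum_x2=579,
    m=-3.924, b=105.667, s_e=4.307, t_c=2.015,
)
print(round(low, 2), round(high, 2), round(e, 2))
```

The result may differ from the slide values in the third decimal place, depending on how intermediate quantities are rounded.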
-
Minitab Output
Regression Analysis
The regression equation is
y = 106 - 3.92x

Predictor   Coef      StDev    T       P
Constant    105.668   3.655    28.91   0.000
x           -3.9241   0.4019   -9.76   0.000

S = 4.307   R-Sq = 95.0%   R-Sq(adj) = 94.0%
-
Multiple Regression
Section 9.4
-
More Explanatory Variables

Absence   IQ     Grade
8         115    78
2         135    92
5         126    90
12        110    58
15        105    43
9         120    74
6         125    81
-
Minitab Output
Regression Analysis
The regression equation is
Grade = 52.7 - 2.65 absence + 0.357 IQ

Predictor   Coef      StDev     T       P
Constant    52.720    86.110    0.61    0.573
Absence     -2.652    2.111     -1.26   0.277
IQ          0.357     0.580     0.62    0.571

S = 4.603   R-Sq = 95.4%   R-Sq(adj) = 93.2%
-
Interpretation
The regression equation is
Grade = 52.7 - 2.65 absence + 0.357 IQ
When the other variables are 0, the predicted grade is 52.7.
If IQ is held constant, each additional absence decreases the predicted grade by 2.65 points.
If the number of absences is held constant and IQ is increased by one point, the predicted grade will increase by 0.357 points.
-
Predicting the Response Variable
The regression equation is
Grade = 52.7 - 2.65 absence + 0.357 IQ
Use the regression equation to predict a grade when a student is absent 5 times and has an IQ of 125.
Grade = 52.7 - 2.65(5) + 0.357(125) = 84.075 (about 84)
Use the regression equation to predict a grade when a student is absent 9 times and has an IQ of 120.
Grade = 52.7 - 2.65(9) + 0.357(120) = 71.69 (about 72)
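Substituting into the multiple regression equation is simple arithmetic, and a short Python sketch makes it easy to check. The function name `predict_grade_multiple` is illustrative; the coefficients are the rounded ones from the regression equation, Grade = 52.7 - 2.65 absence + 0.357 IQ.

```python
def predict_grade_multiple(absences, iq):
    """Predicted grade from the fitted equation with two explanatory variables."""
    return 52.7 - 2.65 * absences + 0.357 * iq


print(round(predict_grade_multiple(5, 125), 3))  # 84.075
print(round(predict_grade_multiple(9, 120), 3))  # 71.69
```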