chapter 5_ handouts.pdf
8/14/2019 CHAPTER 5_ handouts.pdf
1/9
QMT412 Pn. Sanizah's Notes 02/05/2013
CHAPTER 5
CORRELATION AND REGRESSION
Introduction
Correlation and Regression
Scatter Plot/Diagram
Coefficient of Correlation
Simple Linear Regression
Learning objectives
Explain the concept of correlation
Calculate Pearson's correlation coefficient and interpret the results
Calculate Spearman's rank correlation for qualitative and quantitative data and interpret the results
Determine the regression equation for a set of data and interpret the equation
Use the regression equation to forecast
Introduction
Correlation: Do you have a relationship?
(Between two quantitative variables, x and y)
If you have a relationship:
1) What is the direction? (+ or -)
2) What is the strength? (r: -1 to +1)
Note: Correlation measures LINEAR relationship.
If you have a significant correlation:
How well can you predict a subject's y-score if you know their x-score?
Correlation & Regression
Regression and correlation are two concepts used to describe the relationship between variables.
Correlation is a statistical method used to determine if a relationship between variables exists.
Regression is the statistical method used to describe the nature of the relationship between variables - that is, positive or negative, linear or nonlinear.
Independent and Dependent Variable
In this chapter, we want to study the relationship between 2 variables only.
Independent variable - x
Dependent variable - y
For example:
Expenditure (x) and revenue (y)
Price (x) and sales (y)
Number of days absent (x) and CGPA (y)
Age of a person (x) and his/her blood pressure (y)
Independent and Dependent Variable
Independent variable (x)
Also called the predictor, explanatory or manipulated variable: the variable in regression that can be controlled or manipulated.
Dependent variable (y)
Also called the response variable: the variable that cannot be controlled or manipulated.
Independent (x) vs. Dependent (y)
Independent (x): intentionally manipulated; controlled; varies at a known rate; cause
Dependent (y): intentionally left alone; measured; varies at an unknown rate; effect
Example: What affects a student's arrival to class?
Variables:
Type of School: FSPPP, Business School, FSKM
Type of Student: Gender? CGPA?
Class Time: Morning, Afternoon, Evening
Mode of Transportation: Motorcycle, Car, UiTM bus
Scatter Plot (scatter diagram)
A scatter plot is used to show the relationship between two variables.
The scatter plot is a visual way to describe the nature of the relationship between the independent variable (x) and the dependent variable (y).
Interpreting scatter plots:
Positive linear relationship
Negative linear relationship
Nonlinear relationship
No relationship
Scatter Plot Examples
[Figure: scatter plots of y against x. Panels: linear relationships (positive, negative); nonlinear (curvilinear) relationships (positive, negative); strong relationships; weak relationships; no relationship.]
Example 1 (pg. 134)
Draw a scatter diagram for the following data and state the type of relationship between the variables.
x   1   3   5    7    9    13   17
y   0   5   11   14   19   22   30
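A rough way to preview the scatter diagram for Example 1 without any plotting library is to mark each point on a character grid. The sketch below is only an illustration: the function name `text_scatter` and the grid size are assumptions made here, not part of the notes.

```python
def text_scatter(xs, ys, width=36, height=12):
    """Return a rough character-grid scatter plot of the points (xs, ys)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate onto the grid; row 0 is the top of the plot.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Data from Example 1
x = [1, 3, 5, 7, 9, 13, 17]
y = [0, 5, 11, 14, 19, 22, 30]
plot = text_scatter(x, y)
print(plot)
```

The points rise from the lower left to the upper right, which suggests a positive linear relationship.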
Correlation
The correlation coefficient measures the strength and direction of a LINEAR relationship between a pair of random variables.
The POPULATION correlation coefficient ρ (rho) measures the strength of the association between the variables.
The sample correlation coefficient r or r_s is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
Correlation Coefficient
r or r_s indicates:
strength of relationship (strong, weak, or none)
direction of relationship
positive (direct): variables move in the same direction
negative (inverse): variables move in opposite directions
r ranges in value from -1.0 to +1.0.
[Figure: scale of r from -1.0 (perfect negative) through 0.0 (no relationship) to +1.0 (perfect positive), with tick marks at -1.0, -0.8, -0.5, 0.0, +0.5, +0.8 and +1.0 and regions labelled very strong, strong, moderate and weak on both the negative and the positive side.]
Do Variables Relate to One Another?
Is teacher's pay related to performance?
Is exercise related to illness?
Is CO2 related to global warming?
Is TV viewing related to shoe size?
Is shoe size related to height?
Is height related to IQ?
Is cigarettes smoked per day related to lung capacity?
[Slide labels: Positive, Negative, Positive, Zero]
Positive correlation
Two variables move in the same direction.
Negative correlation
Two variables tend to move in opposite directions.
Methods for Calculating the Correlation Coefficient, r or r_s
Pearson Product-Moment Correlation Coefficient
Spearman Rank Correlation Coefficient
Pearson Coefficient of Correlation
Both variables must be quantitative and normally distributed.
Calculation for r:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
Example 2
Refer to Example 1. Compute the Pearson coefficient of correlation and interpret the result.
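As a hedged sketch of how Example 2 could be checked, the code below computes Pearson's r for the Example 1 data. It assumes the usual computational form of the formula, built from the sums Σx, Σy, Σxy, Σx² and Σy²; the function name `pearson_r` is made up for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from Example 1
x = [1, 3, 5, 7, 9, 13, 17]
y = [0, 5, 11, 14, 19, 22, 30]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The result is about 0.985, close to +1, which indicates a very strong positive linear relationship between x and y.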
The Spearman rank correlation coefficient
Spearman's rank correlation coefficient is a measure of association between two variables that are at least of ordinal scale (suitable for qualitative data).
It can also be applied to quantitative data, but the variables must first be ranked; the coefficient is then calculated from these rankings.
$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
where:
d = difference between two ranks
n = number of pairs of observations
NOTE: Be careful with tied observations.
How to calculate Spearman's rank correlation coefficient?
1. List each set of scores in a column.
2. Rank the two sets of scores.
3. Place the appropriate rank beside each score.
4. Head a column d and determine the difference in rank for each pair of scores.
(Note: The sum of the d column should always be 0.)
5. Square each number in the d column and sum the values (Σd²).
6. Use the formula to calculate the correlation coefficient.
Refer Example 5 pg. 140
Five students A, B, C, D, E are ranked in two subjects, statistics and computer programming, with the following results.
Calculate the Spearman's rank correlation coefficient.
Student   Statistics   Computer   d   d²
A         1            3
B         2            1
C         3            4
D         4            2
E         5            5
$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
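The d and d² columns for Example 5 can be filled in mechanically. The sketch below assumes the rank-difference formula r_s = 1 - 6Σd² / (n(n² - 1)) with no tied ranks; the function name `spearman_from_ranks` is an illustrative choice, not something from the notes.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's r_s from two lists of ranks (no ties assumed)."""
    n = len(rank_a)
    d = [a - b for a, b in zip(rank_a, rank_b)]
    # Sanity check from the procedure: the d column sums to 0.
    assert sum(d) == 0
    sum_d2 = sum(di ** 2 for di in d)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks of students A to E from Example 5
statistics_rank = [1, 2, 3, 4, 5]
computer_rank = [3, 1, 4, 2, 5]
r_s = spearman_from_ranks(statistics_rank, computer_rank)
print(f"r_s = {r_s}")
```

Here d = -2, 1, -1, 2, 0, so Σd² = 10 and r_s = 1 - 60/120 = 0.5, a moderate positive association between the two sets of ranks.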
Refer Example 6 pg. 141
x     y    Rank of x, Rx   Rank of y, Ry   d = Rx - Ry   d²
6.0   80
6.2   80
6.5   78
6.8   75
7.0   70
7.2   60
7.5   60
7.8   55
8.0   50
8.2   48
8.4   45
8.7   40
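Example 6 contains tied y values (80 twice, 60 twice). One common convention, assumed here, is to give tied observations the average of the ranks they would otherwise occupy; the textbook may treat ties differently. The sketch ranks from smallest to largest (also an assumption; ranking both variables from largest to smallest gives the same r_s) and then applies the rank-difference formula. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

# Data from Example 6
x = [6.0, 6.2, 6.5, 6.8, 7.0, 7.2, 7.5, 7.8, 8.0, 8.2, 8.4, 8.7]
y = [80, 80, 78, 75, 70, 60, 60, 55, 50, 48, 45, 40]

rx, ry = average_ranks(x), average_ranks(y)
d = [a - b for a, b in zip(rx, ry)]
sum_d2 = sum(di ** 2 for di in d)
n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"sum of d = {sum(d)}, sum of d^2 = {sum_d2}, r_s = {r_s:.3f}")
```

Under these assumptions Σd² = 569 and r_s is about -0.990, a very strong negative association. With ties the simple formula is only approximate, which is why the notes advise care with tied observations.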
The Regression Line
Regression indicates the degree to which the variation in one variable, Y, is related to or can be explained by the variation in another variable, X.
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables.
This equation is called the line of regression or least squares line.
The equation of a line may be written as:
y = a + bx
where b is the slope of the line and a is the y-intercept.
Regression line
Creates a line of best fit running through the data.
Analyzes the relationship between the two quantitative variables, X and Y.
a - intercept: if x = 0 is in the range, then a is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, then a has no practical interpretation.
b - slope: change in the mean of the distribution of the response produced by a unit change in x.
In y = a + bx, y is the dependent variable and x is the independent variable.
The Least Squares Regression Line
The values of a and b in the regression line y = a + bx can be calculated by using the least squares method (or method of least squares), given by the following formula:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \bar{y} - b\bar{x}$
Example 3: Application
x (Absences)   y (Final Grade)
8              78
2              92
5              90
12             58
15             43
9              74
6              81
[Figure: scatter plot of final grade (y) against absences (x)]
Calculate a and b.
Write the equation of the line of regression with x = number of absences and y = final grade.
        x    y     xy     x²    y²
1       8    78    624    64    6084
2       2    92    184    4     8464
3       5    90    450    25    8100
4       12   58    696    144   3364
5       15   43    645    225   1849
6       9    74    666    81    5476
7       6    81    486    36    6561
Total   57   516   3751   579   39898
The Line of Regression
[Figure: scatter plot of final grade against absences with the fitted regression line]
The line of regression is: y = -3.924x + 105.667
Note that the point (x̄, ȳ) = (8.143, 73.714) is on the line.
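The values of a and b for Example 3 can be reproduced with a few lines of code. This is a minimal sketch assuming the standard least squares formulas b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b·x̄; the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = y_bar - b * x_bar
    return a, b

# Data from Example 3
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
a, b = least_squares_line(absences, grades)
print(f"y = {b:.3f}x + {a:.3f}")

# The point of means (x_bar, y_bar) lies on the fitted line.
x_bar = sum(absences) / len(absences)
y_bar = sum(grades) / len(grades)
print(f"line at x_bar: {a + b * x_bar:.3f}, y_bar: {y_bar:.3f}")
```

At full precision the intercept prints as roughly 105.668; the 105.667 in the notes presumably comes from using the rounded slope and means in the hand calculation.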
Predicting y Values
The regression line can be used to predict values of y for values of x falling within the range of the data.
The regression equation for number of times absent and final grade is:
y = -3.924x + 105.667
Use this equation to predict the expected grade for a student with
(a) 3 absences (b) 12 absences
(a) y = -3.924(3) + 105.667 = 93.895
(b) y = -3.924(12) + 105.667 = 58.579
Coefficient of Determination
The coefficient of determination, r², measures the strength of the association and is the ratio of explained variation in y to the total variation in y.
Interpretation: the proportion of the variation in y that is explained by the variation in x.
Recall Example 3
The correlation coefficient of number of times absent and final grade is r = -0.975. The coefficient of determination is r² = (-0.975)² = 0.9506.
Interpretation: About 95.06% of the variation in final grades can be explained by the number of times a student is absent.
Note: The other 4.94% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
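The values of r and r² for Example 3 can be recomputed from the raw data. The sketch below assumes the usual sums-based formula for Pearson's r; the variable names are chosen for this illustration.

```python
from math import sqrt

# Data from Example 3
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

n = len(absences)
sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

# Pearson's r, then the coefficient of determination r^2
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.4f}")
```

The sign of r is negative because grades fall as absences rise. Squaring the unrounded r gives roughly 0.9502, slightly below the 0.9506 obtained in the notes by squaring the rounded value 0.975.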