correlation n regression
8/2/2019 Correlation n Regression
CORRELATION ANALYSIS AND REGRESSION ANALYSIS
-
Correlation
Correlation: A measure of association between two numerical variables.
Example (positive correlation): Typically, in the summer, as the temperature increases, people are thirstier.
-
Scatter Diagram
Scatter diagrams show the relationship between two variables in graphical form.
The diagram summarizes the nature of the relationship between the two variables: whether the relationship is positive or negative.
The diagram also indicates the magnitude of the relationship.
-
Scatter Diagrams with varied r values
[Figure: four scatter plots of Y against X, with r = +1 (r² = 1), r = -1 (r² = 1), r = +0.9 (r² = 0.81), and r = 0 (r² = 0).]
-
Specific Example
For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

Temperature (F)   Water Consumption (ounces)
75                16
83                20
85                25
85                27
92                32
97                48
99                48
-
How would you describe the graph?
-
How strong is the linear relationship?
-
Correlation Analysis
Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables.
Correlation can be used along with regression analysis to determine the nature of the relationship between variables.
The prominent correlation coefficients are:
1. The Pearson product moment correlation coefficient
-
Measuring the Relationship
Pearson's Sample Correlation Coefficient, r, measures the direction and the strength of the linear association between two numerical paired variables.
-
Direction of Association
[Figure: two scatter plots, one labeled Positive Correlation and one labeled Negative Correlation.]
-
Strength of Linear Association
r value   Interpretation
 1        perfect positive linear relationship
 0        no linear relationship
-1        perfect negative linear relationship
-
Strength of Linear Association
-
Other Strengths of Association
r value   Interpretation
0.9       strong association
0.5       moderate association
0.25      weak association
-
Other Strengths of Association
-
Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.
As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
-
Product Moment Correlation
From a sample of n observations, X and Y, the product moment correlation, r, can be calculated as:

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

r varies between -1.0 and +1.0.
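The product moment formula can be sketched in a few lines of Python. This is only an illustration: the function name pearson_r and the variable names are assumptions made for this sketch, and the data are the temperature and water consumption readings from the Specific Example slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation of two equal-length samples (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Temperature (F) and water consumption (ounces) from the Specific Example.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]

r = pearson_r(temperature, water)
print(round(r, 4))
```

For this data the result should come out close to +1, which matches the strong upward pattern the example describes.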
-
Ad Spending and Corresponding Sales of Royal Products

Company   Advertising Exp. (X), in crores   Sales (Y), in thousands
1         6                                 10
2         9                                 12
3         8                                 12
4         3                                 4
5         10                                12
6         4                                 6
7         5                                 8
8         2                                 2
9         11                                18
10        9                                 9
11        10                                17
12        2                                 2
-
Product Moment Correlation
The correlation coefficient may be calculated as follows, taking X as advertising expenditure and Y as sales:

\bar{X} = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583
\bar{Y} = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333

\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
= (6-6.58)(10-9.33) + (9-6.58)(12-9.33) + (8-6.58)(12-9.33) + (3-6.58)(4-9.33)
+ (10-6.58)(12-9.33) + (4-6.58)(6-9.33) + (5-6.58)(8-9.33) + (2-6.58)(2-9.33)
+ (11-6.58)(18-9.33) + (9-6.58)(9-9.33) + (10-6.58)(17-9.33) + (2-6.58)(2-9.33)
= -0.3886 + 6.4614 + 3.7914 + 19.0814 + 9.1314 + 8.5914 + 2.1014 + 33.5714 + 38.3214 - 0.7986 + 26.2314 + 33.5714
= 179.6668

\sum_{i=1}^{n} (X_i - \bar{X})^2
= (6-6.58)^2 + (9-6.58)^2 + (8-6.58)^2 + (3-6.58)^2 + (10-6.58)^2 + (4-6.58)^2
+ (5-6.58)^2 + (2-6.58)^2 + (11-6.58)^2 + (9-6.58)^2 + (10-6.58)^2 + (2-6.58)^2
= 0.3364 + 5.8564 + 2.0164 + 12.8164 + 11.6964 + 6.6564 + 2.4964 + 20.9764 + 19.5364 + 5.8564 + 11.6964 + 20.9764
= 120.9168

\sum_{i=1}^{n} (Y_i - \bar{Y})^2
= (10-9.33)^2 + (12-9.33)^2 + (12-9.33)^2 + (4-9.33)^2 + (12-9.33)^2 + (6-9.33)^2
+ (8-9.33)^2 + (2-9.33)^2 + (18-9.33)^2 + (9-9.33)^2 + (17-9.33)^2 + (2-9.33)^2
= 0.4489 + 7.1289 + 7.1289 + 28.4089 + 7.1289 + 11.0889 + 1.7689 + 53.7289 + 75.1689 + 0.1089 + 58.8289 + 53.7289
= 304.6668

Thus, r = \frac{179.6668}{\sqrt{(120.9168)(304.6668)}} = 0.9361
-
Rank correlation
Researchers often face situations where they have to make decisions based on data measured on an ordinal scale. In such cases, Spearman's rank correlation is appropriate for measuring the relationship between variables.
It can be calculated using the following formula:

r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}

where D is the difference between the two ranks given to an item and N is the number of items ranked.
-
The ranking of television models

Television Models   Existing System   New System
A                   3                 1
B                   5                 5
C                   10                9
D                   2                 3
E                   7                 2
F                   6                 4
G                   4                 6
H                   1                 7
I                   8                 10
J                   9                 8
-
Calculation of Rank correlation coefficient

Television Models   Existing System (X)   New System (Y)   D = (R1 - R2)   D^2
A                   3                     1                 2              4
B                   5                     5                 0              0
C                   10                    9                 1              1
D                   2                     3                -1              1
E                   7                     2                 5              25
F                   6                     4                 2              4
G                   4                     6                -2              4
H                   1                     7                -6              36
I                   8                     10               -2              4
J                   9                     8                 1              1
-
r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}
= 1 - ((6 × 80)/(10(100 - 1)))
= 1 - (480/990)
= 1 - 0.48
= 0.52
This indicates that there is a moderate positive correlation between the two variables. This means that both systems are giving broadly similar results.
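The rank correlation calculation can be checked with a short Python sketch. The function name spearman_rs and the list names are illustrative assumptions; the ranks are those of television models A to J. The formula as used here assumes there are no tied ranks, which holds for this data.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks without ties (illustrative helper)."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rank lists with at least two items")
    # Sum of squared rank differences, D^2, over all items.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Ranks of television models A to J under each system.
existing_system = [3, 5, 10, 2, 7, 6, 4, 1, 8, 9]
new_system = [1, 5, 9, 3, 2, 4, 6, 7, 10, 8]

rs = spearman_rs(existing_system, new_system)
print(round(rs, 4))  # 0.5152
```

Carried to more decimal places, 480/990 is about 0.485, so the coefficient is about 0.515, which rounds to the 0.52 shown in the hand calculation.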
-
Regression
Regression: Specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables.
-
Regression: 3 Main Purposes
To describe (or model)
To predict (or estimate)
To control (or administer)
-
Regression Analysis
Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
Determine whether the independent variables explain a significant variation in the dependent variable.
Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
Predict the values of the dependent variable.
-
Example
Plan an outdoor party.
Estimate the number of soft drinks to buy per person, based on how hot the weather is.
Use the Temperature/Water data and regression.
-
Real Life Applications
Estimating Seasonal Sales for Department Stores (Periodic)
-
Real Life Applications
Predicting Student Grades Based on Time Spent Studying
-
Practice Problems
Can the number of points scored in a basketball game be predicted by
The time a player plays in the game?
The player's height?
-
Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
-
Least squares method
The equation for the regression line assumed by the least squares method is

Y_i = a + bX_i + e_i, where e_i = Y_i - \hat{Y}_i

where
Y is the dependent variable
X is the independent variable
a is the Y-intercept
b is the slope of the line

b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}
a = \bar{Y} - b\bar{X}
-
Calculations for determining constants a and b

Man Hours (X)   Productivity in units (Y)   XY        X^2
3.6             9.3                         33.48     12.96
4.8             10.2                        48.96     23.04
2.4             9.7                         23.28     5.76
7.2             11.5                        82.8      51.84
6.9             12                          82.8      47.61
8.4             14.2                        119.28    70.56
10.7            18.6                        199.02    114.49
11.2            28.4                        318.08    125.44
6.1             13.2                        80.52     37.21
7.9             10.8                        85.32     62.41
9.5             22.7                        215.65    90.25
5.4             12.3                        66.42     29.16
ΣX = 84.1       ΣY = 172.9                  ΣXY = 1355.61   ΣX^2 = 670.73
-
b=1.768
a=2.01
Y=2.01+1.768X
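As a rough check on these constants, the least squares formulas for b and a can be sketched in Python. The function name least_squares and the variable names are illustrative assumptions, not part of the slides; the data are the man hours and productivity columns from the calculation table.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line Y = a + bX fitted by least squares (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(X)
    a = sum_y / n - b * (sum_x / n)
    return a, b


man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

a, b = least_squares(man_hours, productivity)
print(f"Y = {a:.2f} + {b:.3f}X")
```

With this data the intercept should come out close to 2.01 and the slope close to 1.769, so the slide's b = 1.768 appears to be truncated rather than rounded.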
-
The Strength of Association R²
R² = (Explained Variance) / (Total Variance)
Total Variance = (Explained Variance) + (Unexplained Variance)
Explained Variance = (Total Variance) - (Unexplained Variance)
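
The decomposition can be illustrated with the man hours and productivity data and the fitted line Y = 2.01 + 1.768X. This is a minimal sketch under two assumptions: sums of squared deviations stand in for the variances (the common divisor cancels in the ratio), and the rounded constants from the slide are used as given. All variable names are illustrative.

```python
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

# Rounded regression constants as stated on the slide (assumed, not refitted here).
a, b = 2.01, 1.768

mean_y = sum(productivity) / len(productivity)
predicted = [a + b * x for x in man_hours]

# Total variation: squared deviations of Y from its mean.
total = sum((y - mean_y) ** 2 for y in productivity)
# Unexplained variation: squared residuals around the fitted line.
unexplained = sum((y - p) ** 2 for y, p in zip(productivity, predicted))
# Explained variation, obtained as total minus unexplained.
explained = total - unexplained

r_squared = explained / total
print(round(r_squared, 2))
```

Under these assumptions roughly two thirds of the variation in productivity is accounted for by man hours.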