Correlation and Regression

8/2/2019 Correlation n Regression

CORRELATION ANALYSIS AND REGRESSION ANALYSIS

Correlation
Correlation: a measure of association between two numerical variables.
Example (positive correlation): typically, in summer, people become thirstier as the temperature increases.

Scatter Diagram
Scatter diagrams show the relationship between two variables in graphical form.
The diagram summarizes the nature of the relationship between the two variables:
Whether the relationship is positive or negative
How strong the relationship is, i.e. how closely the points cluster around a straight line

Scatter Diagrams with Varied r Values
[Figure: four scatter diagrams of Y against X, with r = +1 (r² = 1), r = -1 (r² = 1), r = +0.9 (r² = 0.81) and r = 0 (r² = 0).]

Specific Example
For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

Temperature (°F)   Water consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99                 48

How would you describe the graph?

How strong is the linear relationship?

Correlation Analysis
Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables.
Correlation can be used along with regression analysis to determine the nature of the relationship between variables.
The prominent correlation coefficients are:
1. The Pearson product moment correlation coefficient

Measuring the Relationship
Pearson's sample correlation coefficient, r, measures the direction and the strength of the linear association between two paired numerical variables.

Direction of Association
[Figure: scatter diagrams illustrating a positive correlation and a negative correlation.]

Strength of Linear Association
r value   Interpretation
+1        perfect positive linear relationship
0         no linear relationship
-1        perfect negative linear relationship

Other Strengths of Association
r value   Interpretation
0.9       strong association
0.5       moderate association
0.25      weak association

Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval- or ratio-scaled) variables, say X and Y.
Because it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.

Product Moment Correlation
From a sample of n paired observations of X and Y, the product moment correlation, r, can be calculated as:
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \, \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
r varies between -1.0 and +1.0.
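As a minimal sketch in plain Python (no third-party libraries assumed), the deviation-score formula can be applied directly to the temperature and water-consumption data from the earlier example. The function name `pearson_r` is our own, hypothetical choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation via the deviation-score formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

temperature = [75, 83, 85, 85, 92, 97, 99]  # °F
water = [16, 20, 25, 27, 32, 48, 48]        # ounces

r = pearson_r(temperature, water)
print(f"r = {r:.4f}")
```

This prints r = 0.9628, a strong positive linear relationship on the scale given in the "Other Strengths of Association" table.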

Ad Spending and Corresponding Sales of Royal Products
Company   Advertising Exp. (X)   Sales (Y)
1         6                      10
2         9                      12
3         8                      12
4         3                      4
5         10                     12
6         4                      6
7         5                      8
8         2                      2
9         11                     18
10        9                      9
11        10                     17
12        2                      2
Advertising expenditure in crores; sales in thousands.

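Product Moment Correlation
The correlation coefficient may be calculated as follows. Because r is symmetric in X and Y, this calculation takes the sales figures (10, 12, 12, ...) as X and the advertising figures (6, 9, 8, ...) as Y; the value of r is the same either way.
$$\bar{X} = \frac{10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2}{12} = 9.333$$
$$\bar{Y} = \frac{6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2}{12} = 6.583$$
$$\begin{aligned}
\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) &= (10 - 9.33)(6 - 6.58) + (12 - 9.33)(9 - 6.58) + (12 - 9.33)(8 - 6.58) + (4 - 9.33)(3 - 6.58) \\
&\quad + (12 - 9.33)(10 - 6.58) + (6 - 9.33)(4 - 6.58) + (8 - 9.33)(5 - 6.58) + (2 - 9.33)(2 - 6.58) \\
&\quad + (18 - 9.33)(11 - 6.58) + (9 - 9.33)(9 - 6.58) + (17 - 9.33)(10 - 6.58) + (2 - 9.33)(2 - 6.58) \\
&= -0.3886 + 6.4614 + 3.7914 + 19.0814 + 9.1314 + 8.5914 + 2.1014 + 33.5714 \\
&\quad + 38.3214 - 0.7986 + 26.2314 + 33.5714 \\
&= 179.6668
\end{aligned}$$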

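Product Moment Correlation
$$\begin{aligned}
\sum_{i=1}^{n}(X_i - \bar{X})^2 &= (10 - 9.33)^2 + (12 - 9.33)^2 + (12 - 9.33)^2 + (4 - 9.33)^2 + (12 - 9.33)^2 + (6 - 9.33)^2 \\
&\quad + (8 - 9.33)^2 + (2 - 9.33)^2 + (18 - 9.33)^2 + (9 - 9.33)^2 + (17 - 9.33)^2 + (2 - 9.33)^2 \\
&= 0.4489 + 7.1289 + 7.1289 + 28.4089 + 7.1289 + 11.0889 + 1.7689 + 53.7289 \\
&\quad + 75.1689 + 0.1089 + 58.8289 + 53.7289 \\
&= 304.6668
\end{aligned}$$
$$\begin{aligned}
\sum_{i=1}^{n}(Y_i - \bar{Y})^2 &= (6 - 6.58)^2 + (9 - 6.58)^2 + (8 - 6.58)^2 + (3 - 6.58)^2 + (10 - 6.58)^2 + (4 - 6.58)^2 \\
&\quad + (5 - 6.58)^2 + (2 - 6.58)^2 + (11 - 6.58)^2 + (9 - 6.58)^2 + (10 - 6.58)^2 + (2 - 6.58)^2 \\
&= 0.3364 + 5.8564 + 2.0164 + 12.8164 + 11.6964 + 6.6564 + 2.4964 + 20.9764 \\
&\quad + 19.5364 + 5.8564 + 11.6964 + 20.9764 \\
&= 120.9168
\end{aligned}$$
Thus,
$$r = \frac{179.6668}{\sqrt{(304.6668)(120.9168)}} = 0.9361$$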
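The hand calculation rounds the means to two decimals. A short Python check, using exact means and the same column roles as the worked calculation (X = 10, 12, 12, ...; Y = 6, 9, 8, ...), recomputes the three sums and r. The variable names are our own.

```python
from math import sqrt

# Column roles follow the worked calculation: X = sales figures, Y = advertising figures
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(x)
x_bar = sum(x) / n  # 112 / 12 = 9.333...
y_bar = sum(y) / n  # 79 / 12 = 6.583...

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / sqrt(s_xx * s_yy)

print(f"Sxy = {s_xy:.4f}, Sxx = {s_xx:.4f}, Syy = {s_yy:.4f}, r = {r:.4f}")
```

With exact means the sums come out as 179.6667, 304.6667 and 120.9167, matching the hand-calculated values to within rounding, and r = 0.9361.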

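Rank correlation
Researchers often face situations where they have to make decisions based on data measured on an ordinal scale. In such cases, Spearman's rank correlation is the appropriate measure of the relationship between the variables. Assuming there are no tied ranks, it can be calculated using the following formula:
$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where D is the difference between the two ranks given to each item and N is the number of items ranked.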

The Ranking of Television Models
Television model   Existing system   New system
A                  3                 1
B                  5                 5
C                  10                9
D                  2                 3
E                  7                 2
F                  6                 4
G                  4                 6
H                  1                 7
I                  8                 10
J                  9                 8

Calculation of the Rank Correlation Coefficient
Television model   Existing system (X)   New system (Y)   D = R1 - R2   D²
A                  3                     1                2             4
B                  5                     5                0             0
C                  10                    9                1             1
D                  2                     3                -1            1
E                  7                     2                5             25
F                  6                     4                2             4
G                  4                     6                -2            4
H                  1                     7                -6            36
I                  8                     10               -2            4
J                  9                     8                1             1
                                                                        ΣD² = 80

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 80}{10(100 - 1)} = 1 - \frac{480}{990} \approx 1 - 0.485 = 0.515 \approx 0.52$$
This indicates a moderate positive correlation between the two rankings: the existing and new systems rank the television models in broadly similar order.
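A minimal Python sketch of the same calculation, assuming both systems give complete rankings 1 to N with no ties (the case the formula covers). The function name `spearman_rs` is hypothetical.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two complete rankings without ties."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rankings must cover the same items")
    n = len(rank_x)
    # D is the difference between the two ranks of each item
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))

# Ranks of television models A..J under each system
existing = [3, 5, 10, 2, 7, 6, 4, 1, 8, 9]
new = [1, 5, 9, 3, 2, 4, 6, 7, 10, 8]

d_sq = sum((rx - ry) ** 2 for rx, ry in zip(existing, new))
rs = spearman_rs(existing, new)
print(f"sum of D^2 = {d_sq}, rs = {rs:.3f}")
```

This prints sum of D^2 = 80, rs = 0.515; rounded to two decimals that is the 0.52 obtained by hand.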

Regression
Regression: specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables.

Regression: 3 Main Purposes
To describe (or model)
To predict (or estimate)
To control (or administer)

Regression Analysis
Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
Determine whether the independent variables explain a significant variation in the dependent variable.
Determine how much of the variation in the dependent variable can be explained by the independent variables: the strength of the relationship.
Predict the values of the dependent variable.

Example
Plan an outdoor party.
Estimate the number of soft drinks to buy per person, based on how hot the weather is.
Use the temperature/water data and regression.
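A minimal sketch of that idea, assuming a least-squares straight line is fitted to the seven temperature/water observations from the specific example and then used at a forecast temperature of 90 °F (the forecast value and the helper `predict` are our own, hypothetical choices). It uses the deviation form of the slope, b = Sxy / Sxx, which is algebraically equivalent to the least squares formula.

```python
temperature = [75, 83, 85, 85, 92, 97, 99]  # X, °F
water = [16, 20, 25, 27, 32, 48, 48]        # Y, ounces per three hours outside

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(water) / n

# Least-squares slope and intercept: b = S_xy / S_xx, a = y_bar - b * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, water))
s_xx = sum((x - x_bar) ** 2 for x in temperature)
b = s_xy / s_xx
a = y_bar - b * x_bar

def predict(x):
    """Estimated water consumption (ounces) at temperature x (°F)."""
    return a + b * x

print(f"Y-hat = {a:.2f} + {b:.3f} X")
print(f"Estimate at 90 F: {predict(90):.1f} ounces per person")
```

The fitted slope is about 1.45 ounces per degree, and the estimate at 90 °F is about 33.8 ounces. Turning ounces into a number of soft drinks would also need a serving size, which the slides do not give, and temperatures far outside the observed 75 to 99 °F range would be an extrapolation.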

Real Life Applications
Estimating Seasonal Sales for Department Stores (Periodic)

Real Life Applications
Predicting Student Grades Based on Time Spent Studying

Practice Problems
Can the number of points scored in a basketball game be predicted by:
The time a player plays in the game?
The player's height?

Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship

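Least Squares Method
The regression line assumed by the least squares method is
$$Y_i = a + bX_i + e_i, \qquad e_i = Y_i - \hat{Y}_i$$
where Y is the dependent variable, X is the independent variable, a is the Y-intercept, b is the slope of the line, and $e_i$ is the residual, the difference between the observed value $Y_i$ and the fitted value $\hat{Y}_i = a + bX_i$. The least squares estimates of the constants are
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X}$$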

Calculations for Determining Constants a and b
Man hours (X)   Productivity in units (Y)   XY       X²
3.6             9.3                         33.48    12.96
4.8             10.2                        48.96    23.04
2.4             9.7                         23.28    5.76
7.2             11.5                        82.8     51.84
6.9             12                          82.8     47.61
8.4             14.2                        119.28   70.56
10.7            18.6                        199.02   114.49
11.2            28.4                        318.08   125.44
6.1             13.2                        80.52    37.21
7.9             10.8                        85.32    62.41
9.5             22.7                        215.65   90.25
5.4             12.3                        66.42    29.16
ΣX = 84.1       ΣY = 172.9                  ΣXY = 1355.61   ΣX² = 670.73

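Substituting the column totals (n = 12):
$$b = \frac{12(1355.61) - (84.1)(172.9)}{12(670.73) - (84.1)^2} = \frac{1726.43}{975.95} \approx 1.769$$
$$a = \bar{Y} - b\bar{X} = 14.408 - 1.769 \times 7.008 \approx 2.01$$
$$\hat{Y} = 2.01 + 1.769X$$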
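To check the arithmetic, a minimal Python sketch recomputes the column totals and the two constants from the man-hours table using the computational formula for b. The variable names are our own.

```python
# Man hours (X) and productivity in units (Y) from the table
x = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
y = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(x)
# Column totals (equal to 84.1, 172.9, 1355.61 and 670.73 up to float rounding)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Computational form of the least-squares estimates
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"b = {b:.3f}, a = {a:.3f}")
```

This gives b ≈ 1.769 and a ≈ 2.011, so the fitted line is approximately Y = 2.01 + 1.769X.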

The Strength of Association: R²
R² = (Explained Variance) / (Total Variance)
Total Variance = (Explained Variance) + (Unexplained Variance)
Explained Variance = (Total Variance) - (Unexplained Variance)
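A minimal sketch of this decomposition for the man-hours data, assuming the three "variances" are computed as sums of squared deviations (total, explained and unexplained sums of squares). Dividing all three by the same divisor would not change the ratio. The variable names are our own.

```python
# Man hours (X) and productivity in units (Y) from the least squares example
x = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
y = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit (deviation form, equivalent to the computational formula)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
fitted = [a + b * xi for xi in x]

# Sums of squared deviations standing in for the three "variances"
total = sum((yi - y_bar) ** 2 for yi in y)
explained = sum((fi - y_bar) ** 2 for fi in fitted)
unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, "
      f"unexplained = {unexplained:.2f}, R^2 = {r_squared:.3f}")
```

For these data R² comes out at about 0.667, so roughly two-thirds of the variation in productivity is accounted for by man hours. Because the line is fitted by least squares with an intercept, explained plus unexplained equals total, as the identity above states.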