# correlation n regression

Author: santosh-pandey

Post on 05-Apr-2018


• 8/2/2019 Correlation n Regression


CORRELATION ANALYSIS AND REGRESSION ANALYSIS


Correlation

Correlation: A measure of association between two numerical variables.

Example (positive correlation): Typically, in the summer, as the temperature increases, people are thirstier.


Scatter Diagram

Scatter diagrams show the relationship between two variables in graphical form.

The diagram summarizes the nature of the relationship between the two variables: whether it is positive or negative.

The diagram also indicates the magnitude of the relationship.


Scatter Diagrams with Varied r Values

[Figure: four scatter plots of Y against X, labeled r = +1 (r² = 1), r = -1 (r² = 1), r = +0.9 (r² = .81), and r = 0 (r² = 0).]


Specific Example

For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

| Temperature (°F) | Water Consumption (ounces) |
| --- | --- |
| 75 | 16 |
| 83 | 20 |
| 85 | 25 |
| 85 | 27 |
| 92 | 32 |
| 97 | 48 |
| 99 | 48 |


How would you describe the graph?


How strong is the linear relationship?


Correlation Analysis

Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables.

Correlation can be used along with regression analysis to determine the nature of the relationship between variables.

The prominent correlation coefficients are:

1. The Pearson product moment correlation coefficient


Measuring the Relationship

Pearson's Sample Correlation Coefficient, r, measures the direction and the strength of the linear association between two numerical paired variables.


Direction of Association

[Figure: two panels, labeled Positive Correlation and Negative Correlation.]


Strength of Linear Association

| r value | Interpretation |
| --- | --- |
| 1 | perfect positive linear relationship |
| 0 | no linear relationship |
| -1 | perfect negative linear relationship |


Other Strengths of Association

| r value | Interpretation |
| --- | --- |
| 0.9 | strong association |
| 0.5 | moderate association |
| 0.25 | weak association |


Product Moment Correlation

The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.

As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.


Product Moment Correlation

From a sample of n observations, X and Y, the product moment correlation, r, can be calculated as:

$$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$

r varies between -1.0 and +1.0.


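Ad Spending and Corresponding Sales of Royal Products

| Company | Advertising Exp (X) | Sales (Y) |
| --- | --- | --- |
| 1 | 6 | 10 |
| 2 | 9 | 12 |
| 3 | 8 | 12 |
| 4 | 3 | 4 |
| 5 | 10 | 12 |
| 6 | 4 | 6 |
| 7 | 5 | 8 |
| 8 | 2 | 2 |
| 9 | 11 | 18 |
| 10 | 9 | 9 |
| 11 | 10 | 17 |
| 12 | 2 | 2 |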


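Product Moment Correlation

The correlation coefficient may be calculated as follows. In this calculation X denotes sales and Y denotes advertising expenditure, the reverse of the labels in the table; r is symmetric in the two variables, so the result is unaffected.

$$
\bar{X} = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333
$$

$$
\bar{Y} = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583
$$

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) &= (10-9.33)(6-6.58) + (12-9.33)(9-6.58) \\
&\quad + (12-9.33)(8-6.58) + (4-9.33)(3-6.58) \\
&\quad + (12-9.33)(10-6.58) + (6-9.33)(4-6.58) \\
&\quad + (8-9.33)(5-6.58) + (2-9.33)(2-6.58) \\
&\quad + (18-9.33)(11-6.58) + (9-9.33)(9-6.58) \\
&\quad + (17-9.33)(10-6.58) + (2-9.33)(2-6.58) \\
&= -0.3886 + 6.4614 + 3.7914 + 19.0814 + 9.1314 + 8.5914 \\
&\quad + 2.1014 + 33.5714 + 38.3214 - 0.7986 + 26.2314 + 33.5714 \\
&= 179.6668
\end{aligned}
$$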


Product Moment Correlation

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= (10-9.33)^2 + (12-9.33)^2 + (12-9.33)^2 + (4-9.33)^2 \\
&\quad + (12-9.33)^2 + (6-9.33)^2 + (8-9.33)^2 + (2-9.33)^2 \\
&\quad + (18-9.33)^2 + (9-9.33)^2 + (17-9.33)^2 + (2-9.33)^2 \\
&= 0.4489 + 7.1289 + 7.1289 + 28.4089 + 7.1289 + 11.0889 \\
&\quad + 1.7689 + 53.7289 + 75.1689 + 0.1089 + 58.8289 + 53.7289 \\
&= 304.6668
\end{aligned}
$$

$$
\begin{aligned}
\sum_{i=1}^{n} (Y_i - \bar{Y})^2 &= (6-6.58)^2 + (9-6.58)^2 + (8-6.58)^2 + (3-6.58)^2 \\
&\quad + (10-6.58)^2 + (4-6.58)^2 + (5-6.58)^2 + (2-6.58)^2 \\
&\quad + (11-6.58)^2 + (9-6.58)^2 + (10-6.58)^2 + (2-6.58)^2 \\
&= 0.3364 + 5.8564 + 2.0164 + 12.8164 + 11.6964 + 6.6564 \\
&\quad + 2.4964 + 20.9764 + 19.5364 + 5.8564 + 11.6964 + 20.9764 \\
&= 120.9168
\end{aligned}
$$

Thus,

$$
r = \frac{179.6668}{\sqrt{(304.6668)(120.9168)}} = 0.9361
$$
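The hand calculation above can be cross-checked with a few lines of Python. This is a minimal sketch using only the standard library; the names `advertising`, `sales`, and `pearson_r` are illustrative and not taken from the slides.

```python
import math

# Advertising expenditure and sales for the 12 companies in the table.
advertising = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
sales = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]


def pearson_r(xs, ys):
    """Product moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (numerator of r).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (under the square root in the denominator).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(advertising, sales)
print(f"r = {r:.4f}")
```

Because the code works with unrounded means, its intermediate sums can differ slightly in the last decimals from the slide figures, which use means rounded to two places; the resulting r should agree with the slide value to about three decimal places.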


Rank correlation

Researchers often face situations where they have to make decisions based on data measured on an ordinal scale. In such cases, Spearman's rank correlation is appropriate for measuring the relationship between variables.

It can be calculated using the following formula:

$$
r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}
$$

where D is the difference between the two ranks given to an item and N is the number of ranked items.


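The Ranking of Television Models

| Television Model | Existing System | New System |
| --- | --- | --- |
| A | 3 | 1 |
| B | 5 | 5 |
| C | 10 | 9 |
| D | 2 | 3 |
| E | 7 | 2 |
| F | 6 | 4 |
| G | 4 | 6 |
| H | 1 | 7 |
| I | 8 | 10 |
| J | 9 | 8 |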


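Calculation of Rank Correlation Coefficient

| Television Model | Existing System (X) | New System (Y) | D = (R1 - R2) | D² |
| --- | --- | --- | --- | --- |
| A | 3 | 1 | 2 | 4 |
| B | 5 | 5 | 0 | 0 |
| C | 10 | 9 | 1 | 1 |
| D | 2 | 3 | -1 | 1 |
| E | 7 | 2 | 5 | 25 |
| F | 6 | 4 | 2 | 4 |
| G | 4 | 6 | -2 | 4 |
| H | 1 | 7 | -6 | 36 |
| I | 8 | 10 | -2 | 4 |
| J | 9 | 8 | 1 | 1 |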


$$
r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 80}{10(100 - 1)} = 1 - \frac{480}{990} = 1 - 0.48 = 0.52
$$

This indicates that there is a moderate positive correlation between the two variables. This means that the two systems give broadly similar, though not identical, rankings.
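The same rank correlation can be reproduced programmatically. The sketch below assumes no tied ranks, which holds for this table and is the case in which the formula above applies directly; the dictionary and function names are illustrative.

```python
# Ranks of the ten television models under each system, from the table.
existing = {"A": 3, "B": 5, "C": 10, "D": 2, "E": 7,
            "F": 6, "G": 4, "H": 1, "I": 8, "J": 9}
new = {"A": 1, "B": 5, "C": 9, "D": 3, "E": 2,
       "F": 4, "G": 6, "H": 7, "I": 10, "J": 8}


def sum_squared_rank_differences(rank_x, rank_y):
    """Sum of D^2, where D is the rank difference for each item."""
    return sum((rank_x[item] - rank_y[item]) ** 2 for item in rank_x)


def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation, assuming no tied ranks."""
    n = len(rank_x)
    sum_d2 = sum_squared_rank_differences(rank_x, rank_y)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


sum_d2 = sum_squared_rank_differences(existing, new)
rs = spearman_rs(existing, new)
print(f"sum of D^2 = {sum_d2}, r_s = {rs:.4f}")
```

Keeping the sum of D² as a separate step makes it easy to compare against the D² column of the table before trusting the final coefficient.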


Regression

Regression: Specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables.


Regression: 3 Main Purposes

To describe (or model)

To predict (or estimate)


Regression Analysis

Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:

Determine whether the independent variables explain a significant variation in the dependent variable.

Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.

Predict the values of the dependent variable.


Example

Plan an outdoor party.

Estimate the number of soft drinks to buy per person, based on how hot the weather is.

Use the Temperature/Water data and regression.
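One way to carry out this example is to fit a straight line to the seven temperature and water observations from the Specific Example slide and read off a prediction. The sketch below is illustrative only: the forecast temperature of 90 °F is a made-up input, not a value from the slides, and the function name `fit_line` is an assumption.

```python
# Temperature (°F) and water consumption (ounces) from the Specific Example slide.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = fit_line(temperature, water)

# Hypothetical forecast for the day of the party (an assumption, not slide data).
forecast_temp = 90
predicted_ounces = a + b * forecast_temp

print(f"water = {a:.2f} + {b:.3f} * temperature")
print(f"predicted consumption at {forecast_temp} F: {predicted_ounces:.1f} ounces")
```

A line fitted this way is only a reasonable guide inside the observed temperature range (75 to 99 °F); the negative intercept shows that extrapolating to much cooler days would give nonsensical predictions.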


Real Life Applications

Estimating Seasonal Sales for Department Stores (Periodic)


Real Life Applications

Predicting Student Grades Based on Time Spent Studying


Practice Problems

Can the number of points scored in a basketball game be predicted by

The time a player plays in the game?

The player's height?


Types of Regression Models

Positive Linear Relationship

Negative Linear Relationship

Relationship NOT Linear

No Relationship


Least Squares Method

The equation for the regression line assumed by the least squares method is

$$
Y_i = a + bX_i + e_i, \qquad e_i = Y_i - \hat{Y}_i
$$

where Y is the dependent variable, X is the independent variable, a is the Y-intercept, and b is the slope of the line:

$$
b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X}
$$


Calculations for Determining Constants a and b

| Man Hours (X) | Productivity in Units (Y) | XY | X² |
| --- | --- | --- | --- |
| 3.6 | 9.3 | 33.48 | 12.96 |
| 4.8 | 10.2 | 48.96 | 23.04 |
| 2.4 | 9.7 | 23.28 | 5.76 |
| 7.2 | 11.5 | 82.8 | 51.84 |
| 6.9 | 12 | 82.8 | 47.61 |
| 8.4 | 14.2 | 119.28 | 70.56 |
| 10.7 | 18.6 | 199.02 | 114.49 |
| 11.2 | 28.4 | 318.08 | 125.44 |
| 6.1 | 13.2 | 80.52 | 37.21 |
| 7.9 | 10.8 | 85.32 | 62.41 |
| 9.5 | 22.7 | 215.65 | 90.25 |
| 5.4 | 12.3 | 66.42 | 29.16 |
| ΣX = 84.1 | ΣY = 172.9 | ΣXY = 1355.61 | ΣX² |


b=1.768

a=2.01

Y=2.01+1.768X
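The constants can be recomputed from the man-hours table with the raw-sum formulas for b and a. This is a minimal sketch; the variable names are illustrative, and the sum of X² is computed from the data rather than read from the slide.

```python
# Man hours (X) and productivity in units (Y) from the calculation table.
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(man_hours)
sum_x = sum(man_hours)
sum_y = sum(productivity)
sum_xy = sum(x * y for x, y in zip(man_hours, productivity))
sum_x2 = sum(x * x for x in man_hours)

# Slope: b = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: a = mean(Y) - b * mean(X)
a = sum_y / n - b * (sum_x / n)

print(f"sum X = {sum_x:.1f}, sum Y = {sum_y:.1f}, sum XY = {sum_xy:.2f}")
print(f"Y = {a:.3f} + {b:.3f} X")
```

The slope comes out very close to 1.769, which the slide reports as 1.768, and the intercept close to 2.01; the small difference in the slope appears to come from truncation rather than rounding.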


The Strength of Association: R²

R² = (Explained Variance) / (Total Variance)

Total Variance = (Explained Variance) + (Unexplained Variance)

Explained Variance = (Total Variance) - (Unexplained Variance)
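As an illustration of this decomposition, the sketch below fits the least-squares line to the man-hours and productivity data from the earlier slide and splits the total variation into explained and unexplained parts. It works with sums of squared deviations, which is an assumption about what the slide means by "variance"; the variable names are illustrative.

```python
# Man hours (X) and productivity in units (Y) from the earlier calculation table.
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(man_hours)
mean_x = sum(man_hours) / n
mean_y = sum(productivity) / n

# Least-squares slope and intercept in deviation form.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(man_hours, productivity))
sxx = sum((x - mean_x) ** 2 for x in man_hours)
b = sxy / sxx
a = mean_y - b * mean_x

# Fitted values on the regression line.
fitted = [a + b * x for x in man_hours]

# Total, explained, and unexplained variation (sums of squares).
total = sum((y - mean_y) ** 2 for y in productivity)
explained = sum((f - mean_y) ** 2 for f in fitted)
unexplained = sum((y - f) ** 2 for y, f in zip(productivity, fitted))

r_squared = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

For a least-squares line with an intercept, the explained and unexplained parts add up to the total, so R² can equivalently be computed as 1 minus the unexplained share.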