Transcript
Page 1: Correlation & regression (2)
Page 2: Correlation & regression (2)

Correlation-Regression

Page 3: Correlation & regression (2)

It deals with association between two or more variables

Correlation analysis deals with covariation between two or more variables

Types
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear

Page 4: Correlation & regression (2)

Methods of Measuring Correlation
1. Graphic Method
2. Diagrammatic Method - Scatter Diagram
3. Algebraic Method
   a. Karl Pearson’s Coefficient of Correlation
   b. Spearman’s Rank Coefficient of Correlation
   c. Coefficient of Concurrent Deviations
   d. Least Squares Method

Page 5: Correlation & regression (2)

Karl Pearson’s Coefficient of Correlation

$$\gamma\ (\text{Gamma}) = \frac{\sum dx\,dy}{\sqrt{\sum dx^{2}\,\sum dy^{2}}} = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$

where dx = x - x̄, dy = y - ȳ, and Σ dx dy is the sum of products of deviations from the respective arithmetic means of both series.

Page 6: Correlation & regression (2)

Karl Pearson’s Coefficient of Correlation

After choosing assumed (working) means Ax and Ay, with dx = x - Ax and dy = y - Ay:

$$\gamma\ (\text{Gamma}) = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{\left[N\sum dx^{2} - (\sum dx)^{2}\right]\left[N\sum dy^{2} - (\sum dy)^{2}\right]}}$$

Σ dx dy = total of products of deviations from assumed means of x and y series
Σ dx = total of deviations of x series
Σ dy = total of deviations of y series
Σ dx² = total of squared deviations of x series
Σ dy² = total of squared deviations of y series
N = number of items (number of paired items)

Page 7: Correlation & regression (2)

Karl Pearson’s Coefficient of Correlation

After calculating assumed or working means Ax and Ay:

$$\gamma\ (\text{Gamma}) = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\left[\sum dx^{2} - \dfrac{(\sum dx)^{2}}{N}\right]\left[\sum dy^{2} - \dfrac{(\sum dy)^{2}}{N}\right]}}$$
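The assumed-mean formula above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `gamma_assumed_mean` is hypothetical, the data are the Q7 heights used later in this deck, and the assumed means (67 and 68, then 60 and 70) are arbitrary choices made only for illustration. The point it illustrates is that the result should not depend on which working means are picked.

```python
from math import sqrt


def gamma_assumed_mean(xs, ys, ax, ay):
    """Pearson's coefficient from deviations about assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]          # deviations of x from assumed mean
    dy = [y - ay for y in ys]          # deviations of y from assumed mean
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt((sum_dx2 - sum_dx ** 2 / n) * (sum_dy2 - sum_dy ** 2 / n))
    return numerator / denominator


# Q7 data from this deck; the assumed means are arbitrary (an assumption).
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

g_a = gamma_assumed_mean(x, y, ax=67, ay=68)
g_b = gamma_assumed_mean(x, y, ax=60, ay=70)
print(round(g_a, 4), round(g_b, 4))  # both values should agree
```

Working with whole-number assumed means keeps the hand arithmetic simple, which is presumably why the slides offer this form alongside the deviation-from-mean formula.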

Page 8: Correlation & regression (2)

Assumptions of Karl Pearson’s Coefficient of Correlation
1. A linear relationship exists between the variables.

Properties of Karl Pearson’s Coefficient of Correlation
1. Its value lies between +1 and -1.
2. Zero means no linear correlation.
3. γ (Gamma) = ±√(bxy × byx), where bxy and byx are the regression coefficients; γ takes their common sign.

Merit
Convenient for accurate interpretation, as it gives both the degree and the direction of the relationship between two variables.

Page 9: Correlation & regression (2)

Limitations
1. Assumes a linear relationship, even though the actual relationship may not be linear.
2. The method and process of calculation are difficult and time consuming.
3. Affected by extreme values in the distribution.

Page 10: Correlation & regression (2)

Probable Error of Karl Pearson’s Coefficient of Correlation

$$\text{Probable Error of } \gamma\ (\text{Gamma}) = 0.6745 \times \frac{1 - \gamma^{2}}{\sqrt{N}}$$
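As a quick illustration of the probable error formula, here is a small Python sketch. The function name `probable_error` is hypothetical, and the inputs (γ = 0.70, N = 12) are borrowed from the Q7 example that appears later in this deck; the slides themselves do not work this calculation.

```python
from math import sqrt


def probable_error(gamma, n):
    """Probable error of Pearson's coefficient: 0.6745 * (1 - gamma^2) / sqrt(N)."""
    return 0.6745 * (1 - gamma ** 2) / sqrt(n)


# gamma and N taken from the Q7 worked example (an illustrative choice).
pe = probable_error(0.70, 12)
print(round(pe, 4))
```

A smaller probable error (larger N, or γ closer to ±1) indicates a more reliable estimate of the coefficient.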

Page 11: Correlation & regression (2)

Q7. Calculate the coefficient of correlation for the following data.

X    65  63  67  64  68  62  70  66  68  67  69  71
Y    68  66  68  65  69  66  68  65  71  67  68  70

Ans.

$$\gamma\ (\text{Gamma}) = \frac{\sum dx\,dy}{\sqrt{\sum dx^{2}\,\sum dy^{2}}} = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$

Page 12: Correlation & regression (2)

Item              1      2      3      4      5      6      7      8      9     10     11     12    Sum
X                65     63     67     64     68     62     70     66     68     67     69     71    800
Y                68     66     68     65     69     66     68     65     71     67     68     70    811
dx = x - x̄    -1.67  -3.67   0.33  -2.67   1.33  -4.67   3.33  -0.67   1.33   0.33   2.33   4.33
dy = y - ȳ     0.42  -1.58   0.42  -2.58   1.42  -1.58   0.42  -2.58   3.42  -0.58   0.42   2.42
dx²            2.78  13.44   0.11   7.11   1.78  21.78  11.11   0.44   1.78   0.11   5.44  18.78  84.67
dy²            0.17   2.51   0.17   6.67   2.01   2.51   0.17   6.67  11.67   0.34   0.17   5.84  38.92
dx·dy         -0.69   5.81   0.14   6.89   1.89   7.39   1.39   1.72   4.56  -0.19   0.97  10.47  40.33

x̄ = 800 / 12 = 66.67, ȳ = 811 / 12 = 67.58

Σ dx dy = 40.33
Σ dx² × Σ dy² = 84.67 × 38.92 = 3294.9
√(Σ dx² × Σ dy²) = 57.40

Coefficient of correlation = 40.33 / 57.40 = 0.70
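The Q7 calculation can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_gamma` is hypothetical and simply mirrors the deviation-from-mean formula used on the slides.

```python
from math import sqrt


def pearson_gamma(xs, ys):
    """Karl Pearson's coefficient from deviations about the arithmetic means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


# Q7 data from the slides.
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

gamma = pearson_gamma(x, y)
print(round(gamma, 2))  # prints 0.7
```

Because the code carries full precision rather than two-decimal deviations, intermediate sums may differ from the table in the last digit, but the rounded coefficient agrees.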

Page 13: Correlation & regression (2)

Q8. The following information gives the ages of husbands and wives. Find the correlation coefficient.

Husband   23  27  28  29  30  31  33  35  36  39
Wife      18  22  23  24  25  26  28  29  30  32

γ (Gamma) = 0.9955, i.e. about 1.00

Page 14: Correlation & regression (2)

Item              1      2      3      4      5      6      7      8      9     10    Sum
X                23     27     28     29     30     31     33     35     36     39    311
Y                18     22     23     24     25     26     28     29     30     32    257
dx = x - x̄    -8.10  -4.10  -3.10  -2.10  -1.10  -0.10   1.90   3.90   4.90   7.90
dy = y - ȳ    -7.70  -3.70  -2.70  -1.70  -0.70   0.30   2.30   3.30   4.30   6.30
dx²           65.61  16.81   9.61   4.41   1.21   0.01   3.61  15.21  24.01  62.41  202.9
dy²           59.29  13.69   7.29   2.89   0.49   0.09   5.29  10.89  18.49  39.69  158.1
dx·dy         62.37  15.17   8.37   3.57   0.77  -0.03   4.37  12.87  21.07  49.77  178.3

x̄ = 311 / 10 = 31.10, ȳ = 257 / 10 = 25.70

Σ dx dy = 178.3
Σ dx² × Σ dy² = 202.9 × 158.1 = 32078.49
√(Σ dx² × Σ dy²) = 179.10

Coefficient of correlation = 178.3 / 179.10 = 0.9955 (1.00 to two decimal places)

Page 15: Correlation & regression (2)

Rank Correlation: Sometimes variables are not quantitative in nature but can be arranged in serial order, especially when dealing with attributes such as honesty, beauty, character, morality, etc. To deal with such situations, Charles Edward Spearman developed, in 1904, a formula for obtaining the correlation coefficient between the ranks of n individuals in two attributes under study, or between ranks given by two or three judges.

Page 16: Correlation & regression (2)

Rank coefficient of correlation

$$\rho\ (\text{rho}) = 1 - \frac{6\sum d^{2}}{N^{3} - N} = 1 - \frac{6\sum d^{2}}{N(N^{2} - 1)}$$

Σ d² = total of squared differences between ranks
N = number of items

Page 17: Correlation & regression (2)

Q9. Ten competitors in a cooking competition are ranked by three judges (P, Q and R) in the following way. Using the rank correlation method, find out which pair of judges has the nearest approach.

Competitor    P    Q    R
1             1    3    6
2             6    5    4
3             5    8    9
4            10    4    8
5             3    7    1
6             2   10    2
7             4    2    3
8             9    1   10
9             7    6    5
10            8    9    7

Page 18: Correlation & regression (2)

Competitor    P    Q    R   Rp-Rq   dpq²   Rq-Rr   dqr²   Rp-Rr   dpr²
1             1    3    6      -2      4      -3      9      -5     25
2             6    5    4       1      1       1      1       2      4
3             5    8    9      -3      9      -1      1      -4     16
4            10    4    8       6     36      -4     16       2      4
5             3    7    1      -4     16       6     36       2      4
6             2   10    2      -8     64       8     64       0      0
7             4    2    3       2      4      -1      1       1      1
8             9    1   10       8     64      -9     81      -1      1
9             7    6    5       1      1       1      1       2      4
10            8    9    7      -1      1       2      4       1      1
Total                           0    200       0    214       0     60

                     P & Q      Q & R      P & R
6Σ d²                 1200       1284        360
N³ - N                 990        990        990
6Σ d² / (N³ - N)    1.2121     1.2970     0.3636
ρ (rho)            -0.2121    -0.2970     0.6364

Judges P and R have the highest rank correlation, so this pair has the nearest approach.
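The three pairwise rank coefficients in Q9 can be recomputed with a short Python sketch. The function name `spearman_rho` and the dictionary layout are illustrative choices, and the formula used is the slides' 1 - 6Σd²/(N³ - N), which assumes there are no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank coefficient, 1 - 6*sum(d^2) / (N^3 - N); assumes no ties."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Ranks awarded by judges P, Q and R to the ten competitors (Q9 data).
P = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
Q = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
R = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho = {
    "PQ": spearman_rho(P, Q),
    "QR": spearman_rho(Q, R),
    "PR": spearman_rho(P, R),
}
closest_pair = max(rho, key=rho.get)  # pair with the highest rho
print({pair: round(value, 4) for pair, value in rho.items()}, closest_pair)
```

Taking the maximum of ρ picks the pair whose rankings agree most closely, which is how "nearest approach" is read in this exercise.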

Page 19: Correlation & regression (2)

Regression Analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from an independent variable.

Application: advertising v/s sales revenue

First used by Sir Francis Galton in 1877 for a study of the height of sons w.r.t. the height of fathers.

Page 20: Correlation & regression (2)

Regression Analysis: "regression" means going back, reverting to the former condition, or returning.

It refers to the functional relationship between x and y, and gives estimates of the value of the dependent variable y for given values of the independent variable x.

Example: relationship between income of employees and savings.

Regression coefficients can be used to calculate the correlation coefficient: γ (Gamma) = ±√(bxy × byx), taking the common sign of bxy and byx.

Page 21: Correlation & regression (2)

Types of Regression
1. Simple & Multiple Regression
2. Total or Partial
3. Linear / Non-linear

Methods of Regression Analysis
1. Scatter Diagram
2. Regression Equations
3. Regression Lines

Regression of y on x: y = a + bx
Regression of x on y: x = a + by

Page 22: Correlation & regression (2)

Regression coefficients

Coefficient of regression of x on y:

$$b_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (y-\bar{y})^{2}} = \frac{\sum dx\,dy}{\sum dy^{2}}$$

Coefficient of regression of y on x:

$$b_{yx} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^{2}} = \frac{\sum dx\,dy}{\sum dx^{2}}$$

Page 23: Correlation & regression (2)

Q2. From the data given below, find:
1. the two regression coefficients
2. the two regression equations
3. the coefficient of correlation between marks in Economics and Statistics
4. the most likely marks in Statistics when marks in Economics are 30

Let marks in Economics be x and marks in Statistics be y.

Marks in Eco     25  28  35  32  31  36  29  38  34  32
Marks in Stat    43  46  49  41  36  32  31  30  33  39

Page 24: Correlation & regression (2)

Marks in Eco (x)     25  28  35  32  31  36  29  38  34  32    Σx = 320    x̄ = 32
Marks in Stat (y)    43  46  49  41  36  32  31  30  33  39    Σy = 380    ȳ = 38

Page 25: Correlation & regression (2)

Marks in Eco (x)     25  28  35  32  31  36  29  38  34  32    Σx = 320    x̄ = 32
Marks in Stat (y)    43  46  49  41  36  32  31  30  33  39    Σy = 380    ȳ = 38
dx = x - x̄ = x - 32  -7  -4   3   0  -1   4  -3   6   2   0    Σdx = 0
dy = y - ȳ = y - 38   5   8  11   3  -2  -6  -7  -8  -5   1    Σdy = 0

Page 26: Correlation & regression (2)

Marks in Eco (x)      25   28   35   32   31   36   29   38   34   32    Σx = 320    x̄ = 32
Marks in Stat (y)     43   46   49   41   36   32   31   30   33   39    Σy = 380    ȳ = 38
dx = x - x̄ = x - 32   -7   -4    3    0   -1    4   -3    6    2    0    Σdx = 0
dy = y - ȳ = y - 38    5    8   11    3   -2   -6   -7   -8   -5    1    Σdy = 0
dx²                   49   16    9    0    1   16    9   36    4    0    Σdx² = 140
dy²                   25   64  121    9    4   36   49   64   25    1    Σdy² = 398
dx·dy                -35  -32   33    0    2  -24   21  -48  -10    0    Σdx·dy = -93

Page 27: Correlation & regression (2)

Regression coefficients

Coefficient of regression of x on y:

$$b_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (y-\bar{y})^{2}} = \frac{\sum dx\,dy}{\sum dy^{2}} = \frac{-93}{398} = -0.2337$$

Coefficient of regression of y on x:

$$b_{yx} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^{2}} = \frac{\sum dx\,dy}{\sum dx^{2}} = \frac{-93}{140} = -0.6643$$

Page 28: Correlation & regression (2)

Regression of x on y

x - x̄ = bxy (y - ȳ)
x - 32 = -0.2337 (y - 38)
x - 32 = -0.2337y + 0.2337 × 38
x - 32 = -0.2337y + 8.8806
x = -0.2337y + 32 + 8.8806
x = -0.2337y + 40.8806

Page 29: Correlation & regression (2)

Correlation Coefficient

γ = ±√(bxy × byx) = ±√(-0.2337 × -0.6643) = ±√0.1552 = ±0.394

Since byx and bxy are both negative, γ = -0.394.

Page 30: Correlation & regression (2)

Regression of y on x

y - ȳ = byx (x - x̄)
y - 38 = -0.6643 (x - 32)
y - 38 = -0.6643x + 0.6643 × 32
y = -0.6643x + 38 + 0.6643 × 32
y = -0.6643x + 38 + 21.2576
y = -0.6643x + 59.2576

Page 31: Correlation & regression (2)

To estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x. The required estimate is given by

y = -0.6643 × 30 + 59.2576 = -19.929 + 59.2576 = 39.3286
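The whole of Q2 (both regression coefficients, the correlation coefficient and the estimate at x = 30) can be recomputed with the following Python sketch. Variable names such as `eco`, `stat` and `y_at_30` are illustrative and not from the slides; the calculation follows the deviation formulas given above.

```python
from math import copysign, sqrt

# Q2 data: marks in Economics (x) and Statistics (y).
eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

n = len(eco)
x_bar = sum(eco) / n
y_bar = sum(stat) / n
dx = [x - x_bar for x in eco]
dy = [y - y_bar for y in stat]

sum_dxdy = sum(p * q for p, q in zip(dx, dy))
sum_dx2 = sum(p * p for p in dx)
sum_dy2 = sum(q * q for q in dy)

b_xy = sum_dxdy / sum_dy2  # regression coefficient of x on y
b_yx = sum_dxdy / sum_dx2  # regression coefficient of y on x

# gamma is the square root of the product, carrying the common sign.
gamma = copysign(sqrt(b_xy * b_yx), b_yx)

# Regression line of y on x, evaluated at x = 30.
y_at_30 = y_bar + b_yx * (30 - x_bar)

print(round(b_xy, 4), round(b_yx, 4), round(gamma, 3), round(y_at_30, 4))
```

The prediction uses the line of y on x because Statistics marks (y) are being estimated from Economics marks (x); the line of x on y would be the one to use for the reverse estimate.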

Page 32: Correlation & regression (2)

Sum of Squares x & y

$$SS_{xy} = \sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Sum of Squares xx

$$SS_{xx} = \sum (x-\bar{x})^{2} = \sum x^{2} - \frac{(\sum x)^{2}}{n}$$
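For readers who want the missing step, the computational form of SSxy follows from expanding the product and using x̄ = Σx / n and ȳ = Σy / n. This short derivation is an addition, not part of the original slides:

```latex
\begin{aligned}
SS_{xy} &= \sum (x-\bar{x})(y-\bar{y}) \\
        &= \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\,\bar{x}\,\bar{y} \\
        &= \sum xy - \frac{(\sum x)(\sum y)}{n} - \frac{(\sum x)(\sum y)}{n} + \frac{(\sum x)(\sum y)}{n} \\
        &= \sum xy - \frac{(\sum x)(\sum y)}{n}
\end{aligned}
```

Setting y = x in the same argument gives the SSxx form, Σx² - (Σx)² / n.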

Page 33: Correlation & regression (2)

advt sales

92 930

94 900

97 1020

98 990

100 1100

102 1050

104 1150

105 1120

105 1130

107 1200

107 1250

110 1220

Sales and advertising (advt) expenses are in Rs. 1,000. Develop a regression model.

Page 34: Correlation & regression (2)

$$b = \frac{SS_{xy}}{SS_{xx}}$$

y = a + bx
Σy = Σa + bΣx
Σy = n·a + bΣx
n·a = Σy - bΣx

$$a = \frac{\sum y - b\sum x}{n} = \frac{\sum y}{n} - \frac{b\sum x}{n}$$

Page 35: Correlation & regression (2)

advt (xi)   sales (yi)      x²        xy    ŷ (fit)   yi - ŷ (residual)
   92          930         8464     85560    902.40       27.60
   94          900         8836     84600    940.54      -40.54
   97         1020         9409     98940    997.75       22.25
   98          990         9604     97020   1016.82      -26.82
  100         1100        10000    110000   1054.96       45.04
  102         1050        10404    107100   1093.10      -43.10
  104         1150        10816    119600   1131.24       18.76
  105         1120        11025    117600   1150.31      -30.31
  105         1130        11025    118650   1150.31      -20.31
  107         1200        11449    128400   1188.45       11.55
  107         1250        11449    133750   1188.45       61.55
  110         1220        12100    134200   1245.66      -25.66
Σx = 1221   Σy = 13060   Σx² = 124581   Σxy = 1335420   Σŷ = 13059.99   Σ(yi - ŷ) = 0.01

Page 36: Correlation & regression (2)

advt (xi)  sales (yi)     x²       xy    yi - ȳ   (yi - ȳ)²   ŷ (fit)   yi - ŷ   (yi - ŷ)²    ŷ - ȳ    (ŷ - ȳ)²
   92         930        8464    85560  -158.33    25069.44    902.40    27.60     761.76   -185.93    34571.20
   94         900        8836    84600  -188.33    35469.44    940.54   -40.54    1643.49   -147.79    21842.87
   97        1020        9409    98940   -68.33     4669.44    997.75    22.25     495.06    -90.58     8205.34
   98         990        9604    97020   -98.33     9669.44   1016.82   -26.82     719.31    -71.51     5114.16
  100        1100       10000   110000    11.67      136.11   1054.96    45.04    2028.60    -33.37     1113.78
  102        1050       10404   107100   -38.33     1469.44   1093.10   -43.10    1857.61      4.77       22.72
  104        1150       10816   119600    61.67     3802.78   1131.24    18.76     351.94     42.91     1840.98
  105        1120       11025   117600    31.67     1002.78   1150.31   -30.31     918.70     61.98     3841.11
  105        1130       11025   118650    41.67     1736.11   1150.31   -20.31     412.50     61.98     3841.11
  107        1200       11449   128400   111.67    12469.44   1188.45    11.55     133.40    100.12    10023.35
  107        1250       11449   133750   161.67    26136.11   1188.45    61.55    3788.40    100.12    10023.35
  110        1220       12100   134200   131.67    17336.11   1245.66   -25.66     658.44    157.33    24751.68
Σ 1221      13060      124581  1335420     0.00   138966.667 13059.99     0.01   13769.21     -0.01   125191.6

Page 37: Correlation & regression (2)

$$\bar{x} = \frac{1221}{12} = 101.75$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1335420 - \frac{1221 \times 13060}{12} = 6565$$

$$SS_{xx} = \sum x^{2} - \frac{(\sum x)^{2}}{n} = 124581 - \frac{(1221)^{2}}{12} = 344.25$$

Page 38: Correlation & regression (2)

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$

y = a + bx
Σy = Σa + bΣx
Σy = n·a + bΣx
n·a = Σy - bΣx

$$a = \frac{\sum y - b\sum x}{n} = \frac{\sum y}{n} - \frac{b\sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$

Page 39: Correlation & regression (2)

Equation for the simple regression line: y = a + bx

y = -852.08 + 19.0704x for the regression of y on x
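The slope and intercept above can be reproduced from the raw advertising and sales figures. The following Python sketch uses the SSxy / SSxx shortcut formulas from the slides; the variable and function names (`advt`, `sales`, `predict`) are illustrative only.

```python
# Advertising spend (x) and sales (y), both in Rs. 1,000, from the slides.
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
sum_x, sum_y = sum(advt), sum(sales)
sum_xy = sum(x * y for x, y in zip(advt, sales))
sum_x2 = sum(x * x for x in advt)

ss_xy = sum_xy - sum_x * sum_y / n   # sum of cross-products of deviations
ss_xx = sum_x2 - sum_x ** 2 / n      # sum of squared deviations of x

b = ss_xy / ss_xx                    # slope
a = sum_y / n - b * sum_x / n        # intercept


def predict(x):
    """Fitted sales for a given advertising spend, using y = a + b*x."""
    return a + b * x


print(round(b, 4), round(a, 2), round(predict(92), 1))
```

The fitted value at x = 92 lines up with the first entry of the "fits" column in the table, which is a handy sanity check on the coefficients.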

Page 40: Correlation & regression (2)

For testing the fit

yi = value of y recorded in the given data
ȳ = mean (average) of y
ŷ = predicted value from the regression line
Deviation = (yi - ȳ) = difference between the actual value of y and the mean
Residual = (yi - ŷ) = gap (error, difference) between the actual value of y and the predicted value calculated from the regression line
Deviation of predicted value from mean = (ŷ - ȳ)
a = intercept on the y-axis
b = slope of the regression line

Page 41: Correlation & regression (2)

Total sum of squares = SST = Σ (yi - ȳ)²

Regression sum of squares = SSR = Σ (ŷ - ȳ)²

Error sum of squares = SSE = Σ (yi - ŷ)²

$$\text{Coefficient of determination} = \gamma^{2} = \frac{SSR}{SST}$$
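To make the three sums of squares concrete, here is a Python sketch that computes SST, SSR, SSE and the coefficient of determination for the advertising and sales data. Names such as `fitted` and `r_squared` are illustrative. The code carries full precision, so SSR comes out near 125197 rather than the 125191.6 shown in the table, which appears to reflect rounded intermediate values.

```python
# Advertising spend (x) and sales (y), both in Rs. 1,000, from the slides.
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar = sum(advt) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))
ss_xx = sum((x - x_bar) ** 2 for x in advt)
b = ss_xy / ss_xx
a = y_bar - b * x_bar

fitted = [a + b * x for x in advt]

sst = sum((y - y_bar) ** 2 for y in sales)               # total
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained by regression
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # residual (error)

r_squared = ssr / sst
print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 4))
```

For a least-squares line with an intercept, SST = SSR + SSE, so the coefficient of determination can equally be computed as 1 - SSE / SST.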

Page 42: Correlation & regression (2)

$$S_{yx} = \text{Standard Error of Estimate} = \sqrt{\frac{SSE}{n-2}}$$

To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero:

$$t = \frac{b - \beta}{S_b}$$

$$S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

Page 43: Correlation & regression (2)

H0: Slope of the regression line is zero (β = 0)
H1: Slope of the regression line is not zero (β ≠ 0)

Page 44: Correlation & regression (2)

$$S_{yx} = \text{Standard Error of Estimate} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y})^{2}}{n-2}} = \sqrt{\frac{13769.21}{12-2}} = \sqrt{1376.92} = 37.1068$$

$$SS_{xx} = \sum x^{2} - \frac{(\sum x)^{2}}{n} = 124581 - \frac{(1221)^{2}}{12} = 344.25$$

$$S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

Page 45: Correlation & regression (2)

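$$S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} \approx 9.53$$

As the calculated value of t is greater than the table value of t for 12 - 2 = 10 degrees of freedom (2.228 at the 5% level of significance, two-tailed), the null hypothesis is rejected: the slope is significantly different from zero.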
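The slope test can be sketched in Python as follows. Only the standard library is used, so the critical value is hard-coded: 2.228 is the standard two-tailed 5% t-table entry for 10 degrees of freedom, and both that significance level and the names `s_yx`, `s_b` and `t_critical` are assumptions made for illustration, since the slides do not state which table value they used.

```python
from math import sqrt

# Advertising spend (x) and sales (y), both in Rs. 1,000, from the slides.
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar = sum(advt) / n
y_bar = sum(sales) / n

ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))
ss_xx = sum((x - x_bar) ** 2 for x in advt)
b = ss_xy / ss_xx
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(advt, sales))
s_yx = sqrt(sse / (n - 2))   # standard error of estimate, n - 2 = 10 df
s_b = s_yx / sqrt(ss_xx)     # standard error of the slope

beta_0 = 0.0                 # slope under the null hypothesis
t = (b - beta_0) / s_b

t_critical = 2.228           # assumed: 5% two-tailed table value, 10 df
reject_null = abs(t) > t_critical
print(round(s_yx, 4), round(s_b, 4), round(t, 2), reject_null)
```

With a t statistic this far above the critical value, the conclusion does not hinge on the exact significance level chosen.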

Page 46: Correlation & regression (2)

Coefficient of Determination

Definition: The Coefficient of Determination, also known as R Squared, is interpreted as the goodness of fit of a regression. The higher the coefficient of determination, the larger the share of the variance of the dependent variable that is explained by the independent variable. The coefficient of determination is the overall measure of the usefulness of a regression.

For example, suppose r² is given as 0.95. This means that 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.

Page 47: Correlation & regression (2)

The Coefficient of Determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:

$$\text{Coefficient of Determination } \gamma^{2} = \frac{SSR}{SST}$$

Page 49: Correlation & regression (2)

Thank You

