# correlation & regression (2)


Post on 15-Nov-2014


Category:

## Education

Tags:

• rank correlation


DESCRIPTION

Correlation & regression - Unitedworld School of Business

TRANSCRIPT

## Correlation-Regression

Correlation and regression deal with the association between two or more variables.

Correlation analysis deals with covariation between two or more variables

## Types of correlation

1. Positive or negative
2. Simple or multiple
3. Linear or non-linear

## Methods of measuring correlation

1. Graphic method
2. Diagrammatic method: scatter diagram
3. Algebraic methods
   a. Karl Pearson's coefficient of correlation
   b. Spearman's rank coefficient of correlation
   c. Coefficient of concurrent deviations
   d. Least squares method

## Karl Pearson's Coefficient of Correlation

$$\gamma = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$

where $d_x = x - \bar{x}$, $d_y = y - \bar{y}$, and $\sum d_x d_y$ is the sum of products of deviations from the respective arithmetic means of both series.

When the deviations are taken from assumed (working) means $A_x$ and $A_y$, that is $d_x = x - A_x$ and $d_y = y - A_y$:

$$\gamma = \frac{N \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{\left[N \sum d_x^2 - (\sum d_x)^2\right]\left[N \sum d_y^2 - (\sum d_y)^2\right]}}$$

- $\sum d_x d_y$ = total of products of deviations from the assumed means of the x and y series
- $\sum d_x$ = total of deviations of the x series
- $\sum d_y$ = total of deviations of the y series
- $\sum d_x^2$ = total of squared deviations of the x series
- $\sum d_y^2$ = total of squared deviations of the y series
- $N$ = number of items (number of paired items)

Dividing numerator and denominator by $N$ gives the equivalent form:

$$\gamma = \frac{\sum d_x d_y - \dfrac{(\sum d_x)(\sum d_y)}{N}}{\sqrt{\left[\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}\right]\left[\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}\right]}}$$

Assumptions of Karl Pearson's coefficient of correlation

1. A linear relationship exists between the variables.

Properties of Karl Pearson's coefficient of correlation

1. Its value lies between +1 and -1.
2. Zero means no (linear) correlation.
3. $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the regression coefficients; $\gamma$ takes their common sign.

Merit

- Convenient for accurate interpretation, as it gives both the degree and the direction of the relationship between two variables.

Limitations

1. Assumes a linear relationship, even though one may not exist.
2. The method and process of calculation are difficult and time consuming.
3. Affected by extreme values in the distribution.

## Probable Error of Karl Pearson's Coefficient of Correlation

$$\text{P.E.}(\gamma) = 0.6745 \, \frac{1 - \gamma^2}{\sqrt{N}}$$

Q7. Calculate the coefficient of correlation for the following data.

| X | 65 | 63 | 67 | 64 | 68 | 62 | 70 | 66 | 68 | 67 | 69 | 71 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | 68 | 66 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 68 | 70 |

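Ans.

$$\gamma = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$

| No. | X | Y | dx = x - x̄ | dy = y - ȳ | dx² | dy² | dx·dy |
|---|---|---|---|---|---|---|---|
| 1 | 65 | 68 | -1.67 | 0.42 | 2.78 | 0.17 | -0.69 |
| 2 | 63 | 66 | -3.67 | -1.58 | 13.44 | 2.51 | 5.81 |
| 3 | 67 | 68 | 0.33 | 0.42 | 0.11 | 0.17 | 0.14 |
| 4 | 64 | 65 | -2.67 | -2.58 | 7.11 | 6.67 | 6.89 |
| 5 | 68 | 69 | 1.33 | 1.42 | 1.78 | 2.01 | 1.89 |
| 6 | 62 | 66 | -4.67 | -1.58 | 21.78 | 2.51 | 7.39 |
| 7 | 70 | 68 | 3.33 | 0.42 | 11.11 | 0.17 | 1.39 |
| 8 | 66 | 65 | -0.67 | -2.58 | 0.44 | 6.67 | 1.72 |
| 9 | 68 | 71 | 1.33 | 3.42 | 1.78 | 11.67 | 4.56 |
| 10 | 67 | 67 | 0.33 | -0.58 | 0.11 | 0.34 | -0.19 |
| 11 | 69 | 68 | 2.33 | 0.42 | 5.44 | 0.17 | 0.97 |
| 12 | 71 | 70 | 4.33 | 2.42 | 18.78 | 5.84 | 10.47 |
| Sum | 800 | 811 | | | 84.67 | 38.92 | 40.33 |
| Mean | 66.67 | 67.58 | | | | | |

Σ dx² × Σ dy² = 84.67 × 38.92 = 3294.9

√(Σ dx² × Σ dy²) = 57.40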
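The column sums above can be cross-checked with a short script. This is a minimal sketch in plain Python, not part of the original slides; the function names `pearson` and `probable_error` are illustrative. It applies the deviation-from-mean formula and the probable-error formula to the Q7 data, and the coefficient it prints should agree with the hand-worked value up to rounding.

```python
from math import sqrt

def pearson(x, y):
    """Karl Pearson's coefficient from deviations about the actual means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def probable_error(r, n):
    """Probable error of the coefficient: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

# Q7 data as given in the question
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

r = pearson(x, y)
pe = probable_error(r, len(x))
print(f"coefficient = {r:.4f}, probable error = {pe:.4f}")
```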

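Coefficient of correlation = 40.33 / 57.40 = 0.70

Q8. The following information gives the ages of husbands and wives. Find the correlation coefficient.

| Husband | 23 | 27 | 28 | 29 | 30 | 31 | 33 | 35 | 36 | 39 |
|---|---|---|---|---|---|---|---|---|---|---|
| Wife | 18 | 22 | 23 | 24 | 25 | 26 | 28 | 29 | 30 | 32 |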

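Ans. γ = 0.99 (0.9955 before truncation), with husband's age as X and wife's age as Y.

| No. | X | Y | dx = x - x̄ | dy = y - ȳ | dx² | dy² | dx·dy |
|---|---|---|---|---|---|---|---|
| 1 | 23 | 18 | -8.10 | -7.70 | 65.61 | 59.29 | 62.37 |
| 2 | 27 | 22 | -4.10 | -3.70 | 16.81 | 13.69 | 15.17 |
| 3 | 28 | 23 | -3.10 | -2.70 | 9.61 | 7.29 | 8.37 |
| 4 | 29 | 24 | -2.10 | -1.70 | 4.41 | 2.89 | 3.57 |
| 5 | 30 | 25 | -1.10 | -0.70 | 1.21 | 0.49 | 0.77 |
| 6 | 31 | 26 | -0.10 | 0.30 | 0.01 | 0.09 | -0.03 |
| 7 | 33 | 28 | 1.90 | 2.30 | 3.61 | 5.29 | 4.37 |
| 8 | 35 | 29 | 3.90 | 3.30 | 15.21 | 10.89 | 12.87 |
| 9 | 36 | 30 | 4.90 | 4.30 | 24.01 | 18.49 | 21.07 |
| 10 | 39 | 32 | 7.90 | 6.30 | 62.41 | 39.69 | 49.77 |
| Sum | 311 | 257 | | | 202.9 | 158.1 | 178.3 |
| Mean | 31.10 | 25.70 | | | | | |

Σ dx² × Σ dy² = 202.9 × 158.1 = 32078.49

√(Σ dx² × Σ dy²) = 179.10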
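The assumed-mean form of Karl Pearson's formula should give the same coefficient whichever working means are chosen. The sketch below, in plain Python, illustrates this on the husband and wife ages; the function name and the working means (30 and 25, then 35 and 20) are arbitrary choices made for this example and do not come from the slides.

```python
from math import sqrt

def pearson_assumed_mean(x, y, ax, ay):
    """Pearson's coefficient using deviations from assumed means ax and ay."""
    n = len(x)
    dx = [xi - ax for xi in x]
    dy = [yi - ay for yi in y]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return numerator / denominator

husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]

# Two different (hypothetical) choices of working means
r_a = pearson_assumed_mean(husband, wife, 30, 25)
r_b = pearson_assumed_mean(husband, wife, 35, 20)
print(f"{r_a:.4f} {r_b:.4f}")
```

Both calls should agree with each other and with the value obtained from deviations about the actual means, because the correction terms in the numerator and denominator remove the effect of the chosen working mean.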

Coefficient of correlation = 178.3 / 179.10 = 0.9955, that is 1.00 to two decimal places.

## Rank Correlation

Sometimes variables are not quantitative in nature but can be arranged in serial order. This arises especially when dealing with attributes such as honesty, beauty, character, or morality. To deal with such situations, Charles Edward Spearman developed, in 1904, a formula for obtaining the correlation coefficient between the ranks of n individuals in two attributes under study, or between the ranks given by two or three judges.

Rank coefficient of correlation:

$$\rho = 1 - \frac{6 \sum d^2}{N^3 - N} = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$

where $\sum d^2$ is the total of squared differences between ranks and $N$ is the number of items.

Q9. Ten competitors in a cooking competition are ranked by three judges in the following way. Using the rank correlation method, find out which pair of judges has the nearest approach.

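| Competitor | P | Q | R |
|---|---|---|---|
| 1 | 1 | 3 | 6 |
| 2 | 6 | 5 | 4 |
| 3 | 5 | 8 | 9 |
| 4 | 10 | 4 | 8 |
| 5 | 3 | 7 | 1 |
| 6 | 2 | 10 | 2 |
| 7 | 4 | 2 | 3 |
| 8 | 9 | 1 | 10 |
| 9 | 7 | 6 | 5 |
| 10 | 8 | 9 | 7 |

Ans.

| Competitor | P | Q | R | Rp - Rq | dpq² | Rq - Rr | dqr² | Rp - Rr | dpr² |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 6 | -2 | 4 | -3 | 9 | -5 | 25 |
| 2 | 6 | 5 | 4 | 1 | 1 | 1 | 1 | 2 | 4 |
| 3 | 5 | 8 | 9 | -3 | 9 | -1 | 1 | -4 | 16 |
| 4 | 10 | 4 | 8 | 6 | 36 | -4 | 16 | 2 | 4 |
| 5 | 3 | 7 | 1 | -4 | 16 | 6 | 36 | 2 | 4 |
| 6 | 2 | 10 | 2 | -8 | 64 | 8 | 64 | 0 | 0 |
| 7 | 4 | 2 | 3 | 2 | 4 | -1 | 1 | 1 | 1 |
| 8 | 9 | 1 | 10 | 8 | 64 | -9 | 81 | -1 | 1 |
| 9 | 7 | 6 | 5 | 1 | 1 | 1 | 1 | 2 | 4 |
| 10 | 8 | 9 | 7 | -1 | 1 | 2 | 4 | 1 | 1 |
| Sum | | | | 0 | 200 | 0 | 214 | 0 | 60 |

| | P and Q | Q and R | P and R |
|---|---|---|---|
| Σ d² | 200 | 214 | 60 |
| 6 Σ d² | 1200 | 1284 | 360 |
| N³ - N | 990 | 990 | 990 |
| 6 Σ d² / (N³ - N) | 1.21 | 1.297 | 0.3636 |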
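The rank differences tabulated above can be reproduced programmatically. The following is a minimal sketch in plain Python, assuming no tied ranks (which holds for this data); the function name `spearman_rho` and the dictionary of judge pairs are illustrative names, not from the slides.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (N^3 - N), no ties assumed."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks given by judges P, Q and R to the ten competitors
P = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
Q = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
R = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho = {
    "P and Q": spearman_rho(P, Q),
    "Q and R": spearman_rho(Q, R),
    "P and R": spearman_rho(P, R),
}
# The pair with the largest coefficient has the nearest approach
closest_pair = max(rho, key=rho.get)
for pair, value in rho.items():
    print(f"{pair}: {value:.4f}")
print("nearest approach:", closest_pair)
```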

ρ (rho): P and Q = -0.21, Q and R = -0.297, P and R = 0.636. The pair P and R has the highest coefficient, so judges P and R have the nearest approach.

## Regression Analysis

Regression analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from an independent variable.

- Application: advertising vs. sales revenue.
- First used by Sir Francis Galton in 1877 for a study of the height of sons with respect to the height of fathers.
- The word "regression" means going back, reverting to the former condition, or returning.
- It refers to the functional relationship between x and y, and to estimates of the value of the dependent variable y for given values of the independent variable x.
- Example: the relationship between the income of employees and their savings.
- Regression coefficients can be used to calculate the correlation coefficient: $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$.

Types of regression

1. Simple and multiple regression
2. Total or partial
3. Linear or non-linear

Methods of regression analysis

1. Scatter diagram
2. Regression equations
3. Regression lines

Regression equations

- Regression of y on x: $y = a + bx$
- Regression of x on y: $x = a + by$

Regression coefficients

Coefficient of regression of x on y:

$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2}$$

Coefficient of regression of y on x:

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2}$$

Q2. From the data given below find:

1. the two regression coefficients
2. the two regression equations
3. the coefficient of correlation between marks in Economics and Statistics
4. the most likely marks in Statistics when marks in Economics are 30

Let marks in Economics be x and marks in Statistics be y.

| Marks in Eco (x) | Marks in Stat (y) | dx = x - x̄ = x - 32 | dy = y - ȳ = y - 38 | dx² | dy² | dx·dy |
|---|---|---|---|---|---|---|
| 25 | 43 | -7 | 5 | 49 | 25 | -35 |
| 28 | 46 | -4 | 8 | 16 | 64 | -32 |
| 35 | 49 | 3 | 11 | 9 | 121 | 33 |
| 32 | 41 | 0 | 3 | 0 | 9 | 0 |
| 31 | 36 | -1 | -2 | 1 | 4 | 2 |
| 36 | 32 | 4 | -6 | 16 | 36 | -24 |
| 29 | 31 | -3 | -7 | 9 | 49 | 21 |
| 38 | 30 | 6 | -8 | 36 | 64 | -48 |
| 34 | 33 | 2 | -5 | 4 | 25 | -10 |
| 32 | 39 | 0 | 1 | 0 | 1 | 0 |
| Σx = 320 | Σy = 380 | Σdx = 0 | Σdy = 0 | Σdx² = 140 | Σdy² = 398 | Σdx·dy = -93 |

x̄ = 320 / 10 = 32 and ȳ = 380 / 10 = 38.
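The quantities asked for in Q2 follow directly from the table totals. As a cross-check, here is a minimal Python sketch that recomputes them from the raw marks; the variable and function names (`b_xy`, `b_yx`, `stat_from_eco`) are illustrative and not part of the slides.

```python
from math import sqrt, copysign

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # x
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # y

n = len(eco)
x_bar, y_bar = sum(eco) / n, sum(stat) / n
dx = [x - x_bar for x in eco]
dy = [y - y_bar for y in stat]

s_dxdy = sum(a * b for a, b in zip(dx, dy))
s_dx2 = sum(a * a for a in dx)
s_dy2 = sum(b * b for b in dy)

b_xy = s_dxdy / s_dy2  # regression coefficient of x on y
b_yx = s_dxdy / s_dx2  # regression coefficient of y on x

# The correlation coefficient takes the common sign of the two coefficients
r = copysign(sqrt(b_xy * b_yx), b_yx)

def stat_from_eco(x):
    """Regression line of y on x: y - y_bar = b_yx * (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)

estimate = stat_from_eco(30)
print(f"b_xy = {b_xy:.4f}, b_yx = {b_yx:.4f}, r = {r:.3f}, estimate = {estimate:.4f}")
```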

## Regression coefficients

Using Σ dx dy = -93, Σ dx² = 140 and Σ dy² = 398:

Coefficient of regression of x on y:

$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{-93}{398} = -0.2337$$

Coefficient of regression of y on x:

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{-93}{140} = -0.6643$$

Regression of x on y:

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

$$x - 32 = -0.2337(y - 38) = -0.2337y + 0.2337 \times 38 = -0.2337y + 8.8806$$

$$x = -0.2337y + 32 + 8.8806$$

$$x = -0.2337y + 40.8806$$

Correlation coefficient:

$$\gamma = \pm\sqrt{b_{xy} \times b_{yx}} = \pm\sqrt{(-0.2337)(-0.6643)} = \pm\sqrt{0.1552} = -0.394$$

The negative root is taken since $b_{yx}$ and $b_{xy}$ are both negative.

Regression of y on x:

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

$$y - 38 = -0.6643(x - 32)$$

$$y - 38 = -0.6643x + 0.6643 \times 32$$

$$y = -0.6643x + 38 + 21.2576$$

$$y = -0.6643x + 59.2576$$

To estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x. The required estimate is:

$$y = -0.6643 \times 30 + 59.2576 = -19.929 + 59.2576 = 39.3286$$

## Sum of Squares

Sum of squares of x and y:

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Sum of squares of x:

$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

Sales and advertising expenses are given below in Rs. 1000. Develop a regression model.

| Advt (x) | Sales (y) |
|---|---|
| 92 | 930 |
| 94 | 900 |
| 97 | 1020 |
| 98 | 990 |
| 100 | 1100 |
| 102 | 1050 |
| 104 | 1150 |
| 105 | 1120 |
| 105 | 1130 |
| 107 | 1200 |
| 107 | 1250 |
| 110 | 1220 |

The slope is

$$b = \frac{SS_{xy}}{SS_{xx}}$$

For the line $y = a + bx$, summing over all n observations gives

$$\sum y = \sum a + b \sum x = n a + b \sum x$$

$$n a = \sum y - b \sum x$$

$$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n}$$


Here xi = advt, yi = sales, ŷ = predicted (fitted) value, and yi - ŷ = residual.

| advt (x) | sales (y) | x² | xy | yi - ȳ | (yi - ȳ)² | ŷ = fits | yi - ŷ | (yi - ŷ)² | ŷ - ȳ | (ŷ - ȳ)² |
|---|---|---|---|---|---|---|---|---|---|---|
| 92 | 930 | 8464 | 85560 | -158.33 | 25069.44 | 902.4 | 27.6 | 761.76 | -185.93 | 34571.20 |
| 94 | 900 | 8836 | 84600 | -188.33 | 35469.44 | 940.54 | -40.54 | 1643.49 | -147.79 | 21842.87 |
| 97 | 1020 | 9409 | 98940 | -68.33 | 4669.44 | 997.75 | 22.25 | 495.06 | -90.58 | 8205.34 |
| 98 | 990 | 9604 | 97020 | -98.33 | 9669.44 | 1016.82 | -26.82 | 719.31 | -71.51 | 5114.16 |
| 100 | 1100 | 10000 | 110000 | 11.67 | 136.11 | 1054.96 | 45.04 | 2028.60 | -33.37 | 1113.78 |
| 102 | 1050 | 10404 | 107100 | -38.33 | 1469.44 | 1093.1 | -43.1 | 1857.61 | 4.77 | 22.72 |
| 104 | 1150 | 10816 | 119600 | 61.67 | 3802.78 | 1131.24 | 18.76 | 351.94 | 42.91 | 1840.98 |
| 105 | 1120 | 11025 | 117600 | 31.67 | 1002.78 | 1150.31 | -30.31 | 918.70 | 61.98 | 3841.11 |
| 105 | 1130 | 11025 | 118650 | 41.67 | 1736.11 | 1150.31 | -20.31 | 412.50 | 61.98 | 3841.11 |
| 107 | 1200 | 11449 | 128400 | 111.67 | 12469.44 | 1188.45 | 11.55 | 133.40 | 100.12 | 10023.35 |
| 107 | 1250 | 11449 | 133750 | 161.67 | 26136.11 | 1188.45 | 61.55 | 3788.40 | 100.12 | 10023.35 |
| 110 | 1220 | 12100 | 134200 | 131.67 | 17336.11 | 1245.66 | -25.66 | 658.44 | 157.33 | 24751.68 |
| Σx = 1221 | Σy = 13060 | Σx² = 124581 | Σxy = 1335420 | 0.00 | 138966.667 | 13059.99 | 0.01 | 13769.21 | -0.01 | 125191.6 |
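The slope and intercept behind the fitted column can be recomputed from the raw advertising and sales figures. The sketch below is plain Python with illustrative variable names (`ss_xy`, `ss_xx`, `a`, `b`); it uses the sum-of-squares shortcut formulas rather than any statistics library.

```python
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]                 # x, Rs. 1000
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]  # y, Rs. 1000

n = len(advt)
sum_x, sum_y = sum(advt), sum(sales)
sum_xy = sum(x * y for x, y in zip(advt, sales))
sum_x2 = sum(x * x for x in advt)

ss_xy = sum_xy - sum_x * sum_y / n   # SSxy = Σxy - (Σx)(Σy)/n
ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx = Σx² - (Σx)²/n

b = ss_xy / ss_xx                    # slope
a = (sum_y - b * sum_x) / n          # intercept

fitted = [a + b * x for x in advt]   # predicted sales for each advt value
print(f"SSxy = {ss_xy:.2f}, SSxx = {ss_xx:.2f}, b = {b:.4f}, a = {a:.2f}")
```

Because the script carries full floating-point precision, its fitted values can differ from the tabulated ones in the last decimal place.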

$$\bar{x} = \frac{\sum x}{n} = \frac{1221}{12} = 101.75$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1335420 - \frac{1221 \times 13060}{12} = 6565$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$

From $y = a + bx$, $\sum y = n a + b \sum x$, so $n a = \sum y - b \sum x$ and

$$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$

Equation of the simple regression line $y = a + bx$, for the regression of y on x:

$$y = -852.08 + 19.0704x$$

## Testing the Fit

- $y_i$ = value of y recorded in the given data
- $\bar{y}$ = mean (average) of y
- $\hat{y}$ = predicted value from the regression line
- Deviation = $(y_i - \bar{y})$ = difference between the actual value of y and the mean
- Residual = $(y_i - \hat{y})$ = gap (error, difference) between the actual value of y and the predicted value calculated from the regression line
- Deviation of the predicted value from the mean = $(\hat{y} - \bar{y})$
- a = intercept on the y-axis
- b = slope of the regression line

Total sum of squares = $SST = \sum (y_i - \bar{y})^2$

Regression sum of squares = $SSR = \sum (\hat{y} - \bar{y})^2$

Error sum of squares = $SSE = \sum (y_i - \hat{y})^2$
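The three sums of squares can be computed for the advertising and sales data with a few lines of Python. This is a sketch with illustrative names, not the slides' own method; it refits the line at full precision, so SSR comes out a few units away from the tabulated 125191.6, which was built from rounded fitted values. For a least-squares line, SST should equal SSR + SSE up to floating-point error.

```python
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar, y_bar = sum(advt) / n, sum(sales) / n

# Least-squares slope and intercept from deviations about the means
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))
ss_xx = sum((x - x_bar) ** 2 for x in advt)
b = ss_xy / ss_xx
a = y_bar - b * x_bar

y_hat = [a + b * x for x in advt]

sst = sum((y - y_bar) ** 2 for y in sales)              # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat)) # error sum of squares

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```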

Coefficient of determination:

$$\gamma^2 = \frac{SSR}{SST}$$

Standard error of estimate:

$$S_{yx} = \sqrt{\frac{SSE}{n - 2}}$$

To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero:

$$t = \frac{b - \beta}{S_b}, \qquad S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

- H0: the slope of the regression line is zero
- H1: the slope of the regression line is not zero

For the advertising and sales data (n = 12):

$$S_{yx} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n - 2}} = \sqrt{\frac{13769.21}{12 - 2}} = \sqrt{1376.92} = 37.1068$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$

$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} = 9.53$$

As the calculated value of t is more than the table value of t for 12 - 2 = 10 degrees of freedom (for example, 2.228 at the 5% level of significance, two-tailed), the null hypothesis is rejected.

## Coefficient of Determination

Definition: the coefficient of determination, also known as R squared, is interpreted as the goodness of fit of a regression. The higher the coefficient of determination, the larger the share of the variance of the dependent variable that is explained by the independent variable. It is the overall measure of the usefulness of a regression.

For example, suppose r² is given as 0.95. This means that 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.

The coefficient of determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:

$$\gamma^2 = \frac{SSR}{SST}$$
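The slope test and the coefficient of determination can be evaluated directly from the summary figures of the advertising and sales example. The Python sketch below takes those figures as given rather than recomputing them. The critical value 2.228 (two-tailed, 5% level, 10 degrees of freedom) is an assumption added for illustration, since the slides do not state a significance level.

```python
from math import sqrt

# Summary figures taken from the worked example
n = 12
b = 19.0704        # fitted slope
ss_xx = 344.25     # SSxx
sse = 13769.21     # error sum of squares
ssr = 125191.6     # regression sum of squares (as tabulated)
sst = 138966.667   # total sum of squares

s_yx = sqrt(sse / (n - 2))   # standard error of estimate
s_b = s_yx / sqrt(ss_xx)     # standard error of the slope
t = (b - 0) / s_b            # test statistic under H0: slope = 0

r_squared = ssr / sst        # coefficient of determination

T_CRITICAL = 2.228           # assumed: two-tailed 5% critical value for 10 d.f.
reject_h0 = abs(t) > T_CRITICAL

print(f"Syx = {s_yx:.4f}, Sb = {s_b:.4f}, t = {t:.2f}, R^2 = {r_squared:.4f}")
print("reject H0:", reject_h0)
```

With these figures R squared comes out at about 0.90, so roughly 90% of the variation in sales is explained by advertising expenses.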