correlation & regression (2)
Correlation & Regression - Unitedworld School of Business


Correlation-Regression

Correlation deals with the association between two or more variables.
Correlation analysis deals with the covariation between two or more variables.
Types:
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear

Methods of Measuring Correlation
1. Graphic method
2. Diagrammatic method: scatter diagram
3. Algebraic methods
   a. Karl Pearson’s coefficient of correlation
   b. Spearman’s rank correlation coefficient
   c. Coefficient of concurrent deviations
   d. Least squares method

Karl Pearson’s Coefficient of Correlation
$$\gamma = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$
where $d_x = x - \bar{x}$, $d_y = y - \bar{y}$, and $\sum d_x d_y$ is the sum of the products of the deviations from the respective arithmetic means of both series.
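The deviation formula can be checked with a minimal Python sketch. It assumes two equal-length lists of numbers. The function names `pearson_r` and `pearson_r_sigma` are illustrative. The second function uses population standard deviations (dividing by N), which is what makes the N σx σy form equal to the first.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 items)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]   # deviations from the mean of x
    dy = [yi - y_bar for yi in y]   # deviations from the mean of y
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

def pearson_r_sigma(x, y):
    """Equivalent form: sum(dx*dy) / (N * sigma_x * sigma_y), population sigmas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sigma_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    sum_dxdy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sum_dxdy / (n * sigma_x * sigma_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly positive: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly negative: -1.0
```

If either series is constant, its deviations are all zero and the division fails. This matches the fact that γ is undefined in that case.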

Karl Pearson’s Coefficient of Correlation (using assumed or working means $A_x$ and $A_y$)
$$\gamma = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[N\sum d_x^2 - \left(\sum d_x\right)^2\right]\left[N\sum d_y^2 - \left(\sum d_y\right)^2\right]}}$$
where $d_x = x - A_x$ and $d_y = y - A_y$, and:
- Σdxdy = total of the products of deviations from the assumed means of the x and y series
- Σdx = total of the deviations of the x series
- Σdy = total of the deviations of the y series
- Σdx² = total of the squared deviations of the x series
- Σdy² = total of the squared deviations of the y series
- N = number of items (number of paired items)

Karl Pearson’s Coefficient of Correlation (equivalent form, using assumed or working means $A_x$ and $A_y$)
$$\gamma = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sqrt{\left[\sum d_x^2 - \dfrac{\left(\sum d_x\right)^2}{N}\right]\left[\sum d_y^2 - \dfrac{\left(\sum d_y\right)^2}{N}\right]}}$$
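Both assumed-mean formulas can be sketched in Python as below, assuming plain lists as input. The helper names are hypothetical. Running them on the Q7 data with two different working means shows that the choice of Ax and Ay does not change γ. That is why any convenient whole number can be used in hand calculations.

```python
import math

def _sums(x, y, ax, ay):
    """Totals of deviations about the assumed (working) means ax and ay."""
    dx = [xi - ax for xi in x]
    dy = [yi - ay for yi in y]
    return (sum(dx), sum(dy),
            sum(a * b for a, b in zip(dx, dy)),
            sum(a * a for a in dx),
            sum(b * b for b in dy))

def r_assumed_mean(x, y, ax, ay):
    """(N*Sdxdy - Sdx*Sdy) / sqrt((N*Sdx2 - Sdx^2) * (N*Sdy2 - Sdy^2))."""
    n = len(x)
    s_dx, s_dy, s_dxdy, s_dx2, s_dy2 = _sums(x, y, ax, ay)
    num = n * s_dxdy - s_dx * s_dy
    den = math.sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return num / den

def r_assumed_mean_divided(x, y, ax, ay):
    """(Sdxdy - Sdx*Sdy/N) / sqrt((Sdx2 - Sdx^2/N) * (Sdy2 - Sdy^2/N))."""
    n = len(x)
    s_dx, s_dy, s_dxdy, s_dx2, s_dy2 = _sums(x, y, ax, ay)
    num = s_dxdy - s_dx * s_dy / n
    den = math.sqrt((s_dx2 - s_dx ** 2 / n) * (s_dy2 - s_dy ** 2 / n))
    return num / den

# Q7 data, with whole-number working means chosen for easy hand arithmetic
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
results = {}
for ax, ay in [(66, 68), (70, 65)]:
    results[(ax, ay)] = (r_assumed_mean(x, y, ax, ay),
                         r_assumed_mean_divided(x, y, ax, ay))
    print(ax, ay, [round(v, 4) for v in results[(ax, ay)]])
```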

Assumptions of Karl Pearson’s Coefficient of Correlation
1. A linear relationship exists between the variables.

Properties of Karl Pearson’s Coefficient of Correlation
1. Its value lies between -1 and +1.
2. Zero means no linear correlation.
3. $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the regression coefficients; γ takes their common sign.

Merit
It gives both the degree and the direction of the relationship between two variables, which makes interpretation convenient.

Limitations
1. It assumes a linear relationship, even though the relationship may not be linear.
2. The method and process of calculation are difficult and time-consuming.
3. It is affected by extreme values in the distribution.

Probable Error of Karl Pearson’s Coefficient of Correlation
$$\text{PE}(\gamma) = 0.6745 \times \frac{1 - \gamma^2}{\sqrt{N}}$$
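Here is a short Python sketch of the probable error, assuming γ and N are already known. `probable_error` is an illustrative name. The values γ = 0.70 and N = 12 are borrowed from Q7 purely as an example.

```python
import math

def probable_error(r, n):
    """Probable error of Pearson's coefficient: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

pe = probable_error(0.70, 12)   # example values taken from Q7
print(round(pe, 4))
```

The interval γ ± PE (here roughly 0.70 ± 0.10) is usually read as the range within which the correlation coefficient of the population may be expected to lie.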

Q7. Calculate the coefficient of correlation for the following data:
X: 65 63 67 64 68 62 70 66 68 67 69 71
Y: 68 66 68 65 69 66 68 65 71 67 68 70
Ans.
$$\gamma = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$

Item   X     Y     dx = x - x̄   dy = y - ȳ   dx²     dy²     dx·dy
1      65    68    -1.67         0.42         2.78    0.17    -0.69
2      63    66    -3.67        -1.58        13.44    2.51     5.81
3      67    68     0.33         0.42         0.11    0.17     0.14
4      64    65    -2.67        -2.58         7.11    6.67     6.89
5      68    69     1.33         1.42         1.78    2.01     1.89
6      62    66    -4.67        -1.58        21.78    2.51     7.39
7      70    68     3.33         0.42        11.11    0.17     1.39
8      66    65    -0.67        -2.58         0.44    6.67     1.72
9      68    71     1.33         3.42         1.78   11.67     4.56
10     67    67     0.33        -0.58         0.11    0.34    -0.19
11     69    68     2.33         0.42         5.44    0.17     0.97
12     71    70     4.33         2.42        18.78    5.84    10.47
Sum    800   811                              84.67   38.92   40.33
Mean   66.67 67.58

Σdx·dy = 40.33
Σdx² × Σdy² = 84.67 × 38.92 = 3294.9
√(Σdx² Σdy²) = 57.40
Coefficient of correlation γ = 40.33 / 57.40 = 0.70
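The Q7 table can be reproduced with a short Python sketch that follows the same steps: means, deviations, the three column totals, then γ. Variable names are illustrative.

```python
import math

x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n           # about 66.67 and 67.58
dx = [xi - x_bar for xi in x]                   # deviations from x_bar
dy = [yi - y_bar for yi in y]                   # deviations from y_bar

sum_dx2 = sum(d * d for d in dx)                # about 84.67
sum_dy2 = sum(d * d for d in dy)                # about 38.92
sum_dxdy = sum(a * b for a, b in zip(dx, dy))   # about 40.33
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"sum dx^2 = {sum_dx2:.2f}, sum dy^2 = {sum_dy2:.2f}, sum dx*dy = {sum_dxdy:.2f}")
print(f"r = {r:.2f}")
```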

Q8. The following data give the ages of husbands and wives. Find the correlation coefficient.
Husband: 23 27 28 29 30 31 33 35 36 39
Wife: 18 22 23 24 25 26 28 29 30 32
Ans. γ ≈ 0.996

Item   X (husband)   Y (wife)   dx = x - x̄   dy = y - ȳ   dx²     dy²     dx·dy
1      23            18         -8.10        -7.70        65.61   59.29   62.37
2      27            22         -4.10        -3.70        16.81   13.69   15.17
3      28            23         -3.10        -2.70         9.61    7.29    8.37
4      29            24         -2.10        -1.70         4.41    2.89    3.57
5      30            25         -1.10        -0.70         1.21    0.49    0.77
6      31            26         -0.10         0.30         0.01    0.09   -0.03
7      33            28          1.90         2.30         3.61    5.29    4.37
8      35            29          3.90         3.30        15.21   10.89   12.87
9      36            30          4.90         4.30        24.01   18.49   21.07
10     39            32          7.90         6.30        62.41   39.69   49.77
Sum    311           257                                 202.9   158.1   178.3
Mean   31.10         25.70

Σdx² × Σdy² = 202.9 × 158.1 = 32078.49
√(Σdx² Σdy²) = 179.10
Coefficient of correlation γ = 178.3 / 179.10 ≈ 0.996 (1.00 to two decimal places)
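As a cross-check on Q8, this Python sketch uses the second form of the formula, γ = Σdxdy / (N σx σy). It takes σ from `statistics.pstdev`, the standard library's population standard deviation, which divides by N. Variable names are illustrative.

```python
import statistics

husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]

n = len(husband)
mean_h = statistics.mean(husband)      # 31.1
mean_w = statistics.mean(wife)         # 25.7
sigma_h = statistics.pstdev(husband)   # population standard deviation (divides by N)
sigma_w = statistics.pstdev(wife)
sum_dxdy = sum((h - mean_h) * (w - mean_w) for h, w in zip(husband, wife))

r = sum_dxdy / (n * sigma_h * sigma_w)
print(round(r, 4))
```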

Rank Correlation: Sometimes variables are not quantitative in nature but can be arranged in serial order, especially when dealing with attributes such as honesty, beauty, character and morality. To deal with such situations, Charles Edward Spearman developed, in 1904, a formula for obtaining the correlation coefficient between the ranks of N individuals in two attributes under study, or between the ranks given by different judges.

Rank coefficient of correlation
$$\rho = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$
where Σd² = total of the squared differences between paired ranks and N = number of items.
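Here is a minimal Python sketch of Spearman's formula. It assumes each argument is a complete ranking 1..N with no tied ranks. Tied ranks need average ranks and, strictly, a correction that this sketch does not apply. `spearman_rho` is an illustrative name.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (N^3 - N).

    Assumes both lists are complete rankings 1..N with no ties.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of the same length (N >= 2)")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))   # identical rankings: 1.0
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # reversed rankings: -1.0
```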

Q9. Ten competitors in a cooking competition are ranked by three judges, P, Q and R, as shown below. Using the rank correlation method, find which pair of judges has the nearest approach.
Competitor   P    Q    R
1            1    3    6
2            6    5    4
3            5    8    9
4            10   4    8
5            3    7    1
6            2    10   2
7            4    2    3
8            9    1    10
9            7    6    5
10           8    9    7

Competitor   P    Q    R    Rp-Rq   d²(P,Q)   Rq-Rr   d²(Q,R)   Rp-Rr   d²(P,R)
1            1    3    6    -2        4       -3        9       -5       25
2            6    5    4     1        1        1        1        2        4
3            5    8    9    -3        9       -1        1       -4       16
4            10   4    8     6       36       -4       16        2        4
5            3    7    1    -4       16        6       36        2        4
6            2    10   2    -8       64        8       64        0        0
7            4    2    3     2        4       -1        1        1        1
8            9    1    10    8       64       -9       81       -1        1
9            7    6    5     1        1        1        1        2        4
10           8    9    7    -1        1        2        4        1        1
Total (N = 10)               0      200        0      214        0       60

                        P & Q     Q & R     P & R
6Σd²                    1200      1284      360
N³ - N = 1000 - 10      990       990       990
6Σd² / (N³ - N)         1.212     1.297     0.364
ρ (rho)                 -0.212    -0.297    0.636

Judges P and R have the highest ρ (0.636, the only positive value), so that pair has the nearest approach.
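The three pairwise comparisons in Q9 can be automated with the sketch below. It repeats the illustrative `spearman_rho` helper so that it runs on its own, then picks the pair of judges with the largest ρ.

```python
from itertools import combinations

def spearman_rho(rank_a, rank_b):
    """1 - 6*sum(d^2) / (N^3 - N), assuming untied rankings 1..N."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks given to competitors 1..10 by each judge (from Q9)
ranks = {
    "P": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Q": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "R": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# rho for every pair of judges: (P, Q), (P, R), (Q, R)
rho = {(j1, j2): spearman_rho(ranks[j1], ranks[j2])
       for j1, j2 in combinations(ranks, 2)}
for pair, value in rho.items():
    print(pair, round(value, 3))

best_pair = max(rho, key=rho.get)   # highest rank correlation = nearest approach
print("Nearest approach:", best_pair)
```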

Regression analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from an independent variable.
Application: advertising vs. sales revenue.
The term was first used by Sir Francis Galton (1877) in his studies of heredity, such as the height of sons relative to the height of their fathers.

Regression Analysis: "regression" literally means going back, reverting or returning to a former condition.
In statistics it refers to the functional relationship between x and y, used to estimate the value of the dependent variable y for given values of the independent variable x.
Example: the relationship between the income of employees and their savings.
Regression coefficients can be used to calculate the correlation coefficient: $\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$, taking the common sign of $b_{xy}$ and $b_{yx}$.

Types of Regression
1. Simple and multiple regression
2. Total or partial regression
3. Linear or non-linear regression

Methods of Regression Analysis
1. Scatter diagram
2. Regression equations
3. Regression lines
Regression of y on x: $y = a + bx$
Regression of x on y: $x = a + by$

Regression coefficients
Coefficient of regression of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2}$$
Coefficient of regression of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2}$$
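Here is a minimal Python sketch of the two regression coefficients, with γ recovered from them. It assumes equal-length numeric lists, and the names are illustrative. `math.copysign` gives the square root the common sign of b_xy and b_yx.

```python
import math

def regression_coefficients(x, y):
    """Return (b_xy, b_yx) from deviations about the means.

    b_xy = sum(dx*dy) / sum(dy^2)   (regression of x on y)
    b_yx = sum(dx*dy) / sum(dx^2)   (regression of y on x)
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    b_xy = sum_dxdy / sum(d * d for d in dy)
    b_yx = sum_dxdy / sum(d * d for d in dx)
    return b_xy, b_yx

def r_from_coefficients(b_xy, b_yx):
    """gamma = +/- sqrt(b_xy * b_yx), with the common sign of the coefficients."""
    return math.copysign(math.sqrt(b_xy * b_yx), b_yx)

b_xy, b_yx = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b_xy, b_yx, r_from_coefficients(b_xy, b_yx))
```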

Q2. From the data given below, find:
1. the two regression coefficients,
2. the two regression equations,
3. the coefficient of correlation between marks in Economics and Statistics, and
4. the most likely marks in Statistics when the marks in Economics are 30.
Let the marks in Economics be x and the marks in Statistics be y.
Marks in Eco 25 28 35 32 31 36 29 38 34 32
Marks in Stat 43 46 49 41 36 32 31 30 33 39


x (Eco)   y (Stat)   dx = x - 32   dy = y - 38   dx²   dy²   dx·dy
25        43         -7             5            49    25    -35
28        46         -4             8            16    64    -32
35        49          3            11             9   121     33
32        41          0             3             0     9      0
31        36         -1            -2             1     4      2
36        32          4            -6            16    36    -24
29        31         -3            -7             9    49     21
38        30          6            -8            36    64    -48
34        33          2            -5             4    25    -10
32        39          0             1             0     1      0
Σx = 320  Σy = 380   Σdx = 0       Σdy = 0       Σdx² = 140   Σdy² = 398   Σdxdy = -93
x̄ = 320 / 10 = 32, ȳ = 380 / 10 = 38

Regression coefficients
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{-93}{398} = -0.2337$$
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{-93}{140} = -0.6643$$

Regression of x on y:
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$x - 32 = -0.2337(y - 38) = -0.2337y + 0.2337 \times 38 = -0.2337y + 8.8806$$
$$x = -0.2337y + 32 + 8.8806 = -0.2337y + 40.8806$$

Correlation coefficient:
$$\gamma = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{(-0.2337)(-0.6643)} = -\sqrt{0.1552} = -0.394$$
The negative root is taken because $b_{xy}$ and $b_{yx}$ are both negative.

Regression of y on x:
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$y - 38 = -0.6643(x - 32) = -0.6643x + 0.6643 \times 32 = -0.6643x + 21.2576$$
$$y = -0.6643x + 38 + 21.2576 = -0.6643x + 59.2576$$

To estimate the most likely marks in Statistics (y) when the marks in Economics (x) are 30, we use the regression line of y on x. The required estimate is
$$y = -0.6643 \times 30 + 59.2576 = -19.929 + 59.2576 = 39.3286$$
i.e. about 39 marks.
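The whole of Q2 can be reproduced with a short Python sketch. Variable names are illustrative, and the data are those given in the question.

```python
import math

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]     # x: marks in Economics
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]    # y: marks in Statistics

n = len(eco)
x_bar, y_bar = sum(eco) / n, sum(stat) / n          # 32 and 38
dx = [x - x_bar for x in eco]
dy = [y - y_bar for y in stat]
sum_dxdy = sum(a * b for a, b in zip(dx, dy))       # -93
sum_dx2 = sum(a * a for a in dx)                    # 140
sum_dy2 = sum(b * b for b in dy)                    # 398

b_xy = sum_dxdy / sum_dy2    # regression coefficient of x on y
b_yx = sum_dxdy / sum_dx2    # regression coefficient of y on x
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)     # both negative, so r < 0

# Regression of x on y: x = a_xy + b_xy * y
a_xy = x_bar - b_xy * y_bar
# Regression of y on x: y = a_yx + b_yx * x
a_yx = y_bar - b_yx * x_bar
predicted_stat = a_yx + b_yx * 30   # most likely Statistics marks when Economics = 30

print(f"b_xy = {b_xy:.4f}, b_yx = {b_yx:.4f}, r = {r:.3f}")
print(f"x = {a_xy:.4f} + ({b_xy:.4f}) y")
print(f"y = {a_yx:.4f} + ({b_yx:.4f}) x; estimate at x = 30: {predicted_stat:.2f}")
```

Without intermediate rounding, the intercepts come out as 40.8794 and 59.2571. The slides get 40.8806 and 59.2576 because they round the coefficients to four decimals first. The estimate at x = 30 is 39.33 either way.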

Sum of Squares: x and y
$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\sum x \sum y}{n}$$
Sum of Squares: x and x
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$
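Here is a small Python sketch of the two shortcut formulas, checked against the deviation definitions on made-up data. The function names `ss_xy` and `ss_xx` are illustrative.

```python
def ss_xy(x, y):
    """Shortcut: sum(xy) - (sum x)(sum y)/n, equal to sum((x - x_bar)(y - y_bar))."""
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

def ss_xx(x):
    """Shortcut: sum(x^2) - (sum x)^2/n, equal to sum((x - x_bar)^2)."""
    n = len(x)
    return sum(a * a for a in x) - sum(x) ** 2 / n

# Compare the shortcuts with the deviation definitions on a small data set
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
dev_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
dev_xx = sum((a - x_bar) ** 2 for a in x)
print(ss_xy(x, y), dev_xy)
print(ss_xx(x), dev_xx)
```

The shortcut forms avoid fractional deviations in hand work. In floating-point code with large values, the deviation form is the numerically safer of the two.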

Sales and advertising expenses (in Rs. 1000). Develop a regression model.
Advt (x)   Sales (y)
92         930
94         900
97         1020
98         990
100        1100
102        1050
104        1150
105        1120
105        1130
107        1200
107        1250
110        1220

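$$b = \frac{SS_{xy}}{SS_{xx}}, \qquad y = a + bx$$
Summing $y = a + bx$ over all n observations gives the normal equation
$$\sum y = na + b\sum x$$
so that
$$na = \sum y - b\sum x, \qquad a = \frac{\sum y - b\sum x}{n} = \frac{\sum y}{n} - \frac{b\sum x}{n}$$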
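The derivation above translates directly into a least-squares fit. Here is a minimal Python sketch, assuming paired numeric lists; `least_squares_line` is an illustrative name.

```python
def least_squares_line(x, y):
    """Fit y = a + b x with b = SSxy / SSxx and a = (sum y - b sum x) / n."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need paired data with at least two points")
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(p * p for p in x) - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = (sum_y - b * sum_x) / n     # from the normal equation sum y = n a + b sum x
    return a, b

print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))   # points on y = 1 + 2x: (1.0, 2.0)
```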


x = advt, y = sales (Rs. 1000); ŷ = fitted (predicted) value; yi - ŷ = residual
x     y      x²      xy        yi-ȳ      (yi-ȳ)²     ŷ         yi-ŷ     (yi-ŷ)²    ŷ-ȳ       (ŷ-ȳ)²
92    930    8464    85560     -158.33   25069.44    902.40    27.60    761.76     -185.93   34571.20
94    900    8836    84600     -188.33   35469.44    940.54   -40.54    1643.49    -147.79   21842.87
97    1020   9409    98940     -68.33    4669.44     997.75    22.25    495.06     -90.58    8205.34
98    990    9604    97020     -98.33    9669.44     1016.82  -26.82    719.31     -71.51    5114.16
100   1100   10000   110000    11.67     136.11      1054.96   45.04    2028.60    -33.37    1113.78
102   1050   10404   107100    -38.33    1469.44     1093.10  -43.10    1857.61    4.77      22.72
104   1150   10816   119600    61.67     3802.78     1131.24   18.76    351.94     42.91     1840.98
105   1120   11025   117600    31.67     1002.78     1150.31  -30.31    918.70     61.98     3841.11
105   1130   11025   118650    41.67     1736.11     1150.31  -20.31    412.50     61.98     3841.11
107   1200   11449   128400    111.67    12469.44    1188.45   11.55    133.40     100.12    10023.35
107   1250   11449   133750    161.67    26136.11    1188.45   61.55    3788.40    100.12    10023.35
110   1220   12100   134200    131.67    17336.11    1245.66  -25.66    658.44     157.33    24751.68
Σ:    1221   13060   124581    1335420   0.00        138966.67   13059.99   0.01   13769.21   -0.01   125191.6
      (Σx)   (Σy)    (Σx²)     (Σxy)     Σ(yi-ȳ)     (SST)       (Σŷ)       Σ(yi-ŷ) (SSE)     Σ(ŷ-ȳ)  (SSR)

$$\bar{x} = \frac{\sum x}{n} = \frac{1221}{12} = 101.75$$
$$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 1335420 - \frac{1221 \times 13060}{12} = 6565$$
$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 124581 - \frac{1221^2}{12} = 344.25$$
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$
From $y = a + bx$, summing gives $\sum y = na + b\sum x$, so $na = \sum y - b\sum x$ and
$$a = \frac{\sum y - b\sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$

Equation of the simple regression line $y = a + bx$ (regression of y on x):
$$y = -852.08 + 19.0704\,x$$
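Here is a short Python sketch that reproduces the hand calculation for the advertising and sales data from the raw figures. Names are illustrative.

```python
# Advertising expenses (x) and sales (y), both in Rs. 1000, from the data table above
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
sum_x, sum_y = sum(advt), sum(sales)                  # 1221 and 13060
sum_x2 = sum(x * x for x in advt)                     # 124581
sum_xy = sum(x * y for x, y in zip(advt, sales))      # 1335420

ss_xy = sum_xy - sum_x * sum_y / n                    # 6565
ss_xx = sum_x2 - sum_x ** 2 / n                       # 344.25
b = ss_xy / ss_xx                                     # slope
a = (sum_y - b * sum_x) / n                           # intercept, from sum(y) = n*a + b*sum(x)

fitted = [a + b * x for x in advt]                    # y-hat for each observation
residuals = [y - f for y, f in zip(sales, fitted)]    # y - y-hat

print(f"y = {a:.2f} + {b:.4f} x")
print("fitted value at x = 92:", round(fitted[0], 2))
print("sum of residuals:", round(sum(residuals), 6))
```

Both variables are in Rs. 1000, so the slope can be read as follows: within the observed range of 92 to 110, each extra Rs. 1000 of advertising is associated with roughly Rs. 19,070 more sales.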

For testing the fit:
- $y_i$ = recorded (observed) value of y in the given data
- $\bar{y}$ = mean (average) of y
- $\hat{y}$ = predicted value from the regression line
- Deviation $= y_i - \bar{y}$ = difference between the actual value of y and its mean
- Residual $= y_i - \hat{y}$ = gap (error) between the actual value of y and the value predicted by the regression line
- Deviation of the predicted value from the mean $= \hat{y} - \bar{y}$
- $a$ = intercept on the y-axis
- $b$ = slope of the regression line

Total sum of squares: $SST = \sum (y_i - \bar{y})^2$
Regression sum of squares: $SSR = \sum (\hat{y} - \bar{y})^2$
Error sum of squares: $SSE = \sum (y_i - \hat{y})^2$
Coefficient of determination:
$$\gamma^2 = \frac{SSR}{SST}$$
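The three sums of squares and γ² for the advertising and sales data can be computed with the Python sketch below. Names are illustrative, and the line is refitted inside the block so it runs on its own.

```python
# Advertising (x) and sales (y), both in Rs. 1000, from the data table above
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar = sum(advt) / n
y_bar = sum(sales) / n

# Least-squares line y_hat = a + b x
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))
ss_xx = sum((x - x_bar) ** 2 for x in advt)
b = ss_xy / ss_xx
a = y_bar - b * x_bar
y_hat = [a + b * x for x in advt]

sst = sum((y - y_bar) ** 2 for y in sales)                # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in y_hat)                # regression (explained) sum of squares
sse = sum((y - f) ** 2 for y, f in zip(sales, y_hat))     # error (residual) sum of squares
r_squared = ssr / sst                                     # coefficient of determination

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
print(f"coefficient of determination = {r_squared:.4f}")
```

For a least-squares line with an intercept, SST = SSR + SSE, and the second print makes this easy to check. The SSR total in the table (125191.6) differs slightly from the exact value because the fitted values there were rounded to two decimals.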

Standard error of estimate:
$$S_{yx} = \sqrt{\frac{SSE}{n-2}}$$
To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero:
$$t = \frac{b - \beta}{S_b}, \qquad S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$

H0: the slope of the regression line is zero (β = 0)
H1: the slope of the regression line is not zero (β ≠ 0)

$$S_{yx} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n-2}} = \sqrt{\frac{13769.21}{12-2}} = \sqrt{1376.92} = 37.1068$$
$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 124581 - \frac{1221^2}{12} = 344.25$$
$$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}} = \frac{37.1068}{\sqrt{344.25}} = 2.000$$
$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{2.000} \approx 9.54$$
The calculated value of t (9.54) is greater than the table value of t for 12 - 2 = 10 degrees of freedom (2.228 at the 5% level of significance, two-tailed). The null hypothesis is therefore rejected: there is a significant linear relationship between advertising expenses and sales.
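Here is a Python sketch of the slope t-test for the same data. The critical value 2.228 is the standard two-tailed 5% table value for 10 degrees of freedom, hard-coded here as an assumption. With SciPy installed, `scipy.stats.t.ppf(0.975, 10)` returns the same figure.

```python
import math

# Advertising (x) and sales (y), both in Rs. 1000, from the data table above
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar, y_bar = sum(advt) / n, sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in advt)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))
b = ss_xy / ss_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(advt, sales))

s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
s_b = s_yx / math.sqrt(ss_xx)     # standard error of the slope b
t = (b - 0) / s_b                 # test statistic under H0: beta = 0

T_CRITICAL = 2.228   # assumed table value: two-tailed, 5% level, n - 2 = 10 df
print(f"Syx = {s_yx:.4f}, Sb = {s_b:.4f}, t = {t:.2f}")
print("reject H0:", abs(t) > T_CRITICAL)
```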

Coefficient of Determination
Definition: The coefficient of determination, also known as R squared, measures the goodness of fit of a regression. The higher the coefficient of determination, the larger the share of the variance in the dependent variable that is explained by the independent variable. It is the overall measure of the usefulness of a regression. For example, if r² = 0.95, then 95% of the variation in the dependent variable is explained by the independent variable, which indicates a good regression.

The coefficient of determination is calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:
$$\gamma^2 = \frac{SSR}{SST}$$

Campus Overview
Ahmedabad: 907/A Uvarshad, Gandhinagar Highway, Ahmedabad – 382422.
Kolkata: Infinity Benchmark, 10th Floor, Plot G1, Block EP & GP, Sector V, Salt-Lake, Kolkata – 700091.
Mumbai: Goldline Business Centre, Linkway Estate, Next to Chincholi Fire Brigade, Malad (West), Mumbai – 400 064.

Thank You