CORRELATION ANALYSIS NDIM


Post on 18-Jul-2020


Page 1: CORRELATION ANALYSIS · Correlation a LINEAR association between two random variables Correlation analysis show us how to determine both the nature and strength of relationship between

CORRELATION ANALYSIS

NDIM

Introduction

Correlation is a LINEAR association between two random variables.

Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.

When variables are dependent on time, correlation is applied.

Correlation lies between -1 and +1.

A zero correlation indicates that there is no linear relationship between the variables.

A correlation of -1 indicates a perfect negative correlation.

A correlation of +1 indicates a perfect positive correlation.

Types of Correlation

There are three types of correlation: Type 1, Type 2 and Type 3.

Type 1

Positive: two related variables are such that when one increases (decreases), the other also increases (decreases).

Negative: two variables are such that when one increases (decreases), the other decreases (increases).

No correlation: both variables are independent.

Perfect: the correlation is exactly +1 or -1.

Type 2

Linear: when plotted on a graph, the points tend to lie on a straight line.

Non-linear: when plotted on a graph, the points do not lie on a straight line.


Type 3

Simple: only two variables are studied, one independent and one dependent.

Multiple: one dependent variable and more than one independent variable.

Partial: one dependent variable and more than one independent variable, but only one independent variable is considered and the other independent variables are held constant.

Methods of Studying Correlation

Scatter Diagram Method

Karl Pearson's Coefficient of Correlation Method

Spearman's Rank Correlation Method

[Figure: two scatter plots of Symptom Index (vertical axis) against dose in mg (horizontal axis). Left panel: Drug A, very good fit. Right panel: Drug B, moderate fit.]

Correlation: Linear Relationships

Strong relationship = good linear fit

Points clustered closely around a line show a strong correlation, and the line is a good predictor (good fit) of the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).

Coefficient of Correlation

A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.

Represented by "r"

r lies between -1 and +1

r gives both the magnitude and the direction of the relationship

-1 ≤ r ≤ +1

The + and - signs are used for positive linear correlations and negative linear correlations, respectively.

$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Shared variability of the X and Y variables is on the top; individual variability of the X and Y variables is on the bottom.
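The formula for r on this slide can be evaluated directly from the five raw sums. The sketch below is one possible way to do that in Python; the function name `pearson_r` is an illustrative choice, not something from the slides, and the data are the five (X, Y) pairs used in the worked regression examples later in this transcript.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums (n, sum X, sum Y, sum XY, sum X^2, sum Y^2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Top: shared variability of X and Y.
    numerator = n * sum_xy - sum_x * sum_y
    # Bottom: individual variability of X and of Y.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Data from the worked regression examples in this transcript.
X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
r = pearson_r(X, Y)
print(round(r, 2))  # 144 / sqrt(134 * 194), about 0.89
```

For these data the numerator is 5(168) - (24)(29) = 144 and the denominator is the square root of 134 × 194, so r is roughly 0.89.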

Interpreting Correlation Coefficient r

strong correlation: r > .70 or r < -.70

moderate correlation: r is between .30 and .70, or r is between -.30 and -.70

weak correlation: r is between 0 and .30, or r is between 0 and -.30
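These cut-offs translate into a small lookup. The sketch below is only illustrative: the function name `describe_strength` is hypothetical, and because the slide does not say which band the boundary values .30 and .70 belong to, the code assumes they count as moderate. It also reports r = 0 separately, following the earlier statement that a zero correlation indicates no relationship.

```python
def describe_strength(r):
    """Label a correlation coefficient using the slide's rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the sign gives direction, the size gives strength
    if size == 0:
        return "none"
    if size > 0.70:
        return "strong"
    if size >= 0.30:  # assumption: boundary values are treated as moderate
        return "moderate"
    return "weak"

def describe_direction(r):
    """Positive, negative, or no direction, from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

print(describe_strength(0.89), describe_direction(0.89))
print(describe_strength(-0.45), describe_direction(-0.45))
```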

Spearman's rank coefficient

A method to determine correlation when the data is not available in numerical form; as an alternative, the method of rank correlation is used. When the values of the two variables are converted to their ranks and the correlation is obtained from those ranks, the correlation is known as rank correlation.

Computation of Rank Correlation

Spearman's rank correlation coefficient ρ can be calculated when:

Actual ranks are given

Ranks are not given, but grades are given and are not repeated

Ranks are not given, and grades are given and are repeated
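The slides do not show the formula for ρ, so the following is a sketch under stated assumptions rather than the method the deck uses. It converts values to ranks (rank 1 for the smallest value; tied values share the average of the ranks they occupy, which is the usual convention for repeated grades) and then applies Pearson's formula to the ranks. For the case with no repeated values it also shows the standard shortcut ρ = 1 - 6Σd² / (n(n² - 1)), where d is the difference between paired ranks. All function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank from smallest (1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Rank correlation as Pearson's r on the ranks (works with repeated values)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    var_x = sum((a - mean_rx) ** 2 for a in rx)
    var_y = sum((b - mean_ry) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all values of a variable are tied")
    return cov / math.sqrt(var_x * var_y)

def spearman_rho_no_ties(xs, ys):
    """Shortcut 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without repeated values."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Data from the worked regression examples in this transcript (no repeats).
X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
rho = spearman_rho(X, Y)
print(round(rho, 2), round(spearman_rho_no_ties(X, Y), 2))
```

Here the ranks are (2, 1, 4, 3, 5) for X and (3, 1, 4, 2, 5) for Y, so Σd² = 2 and both functions give ρ = 0.9. Ranking from the largest value instead gives the same ρ as long as both variables are ranked in the same direction.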

REGRESSION ANALYSIS

Algebraic method

1. Least Squares Method:

The regression equation of X on Y is:

X = a + bY

Where,

X = Dependent variable and Y = Independent variable

The regression equation of Y on X is:

Y = a + bX

Where,

Y = Dependent variable

X = Independent variable

Simple Linear Regression

[Figure: scatter plot with a fitted straight line. Horizontal axis: Independent variable (x). Vertical axis: Dependent variable (y). Annotations: a (y intercept), b = slope = ∆y/∆x, є.]

The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.

Simple regression fits a straight line to the data.

y = a + bX ± є

a = y intercept

b = slope = ∆y/∆x

є = error term

Example 1: From the following data, obtain the regression equations using the method of Least Squares.

X    3    2    7    4    8

Y    6    1    8    5    9

Solution:

  X     Y     XY    X²    Y²

  3     6     18     9    36

  2     1      2     4     1

  7     8     56    49    64

  4     5     20    16    25

  8     9     72    64    81

$\sum X = 24$, $\sum Y = 29$, $\sum XY = 168$, $\sum X^2 = 142$, $\sum Y^2 = 207$

The two normal equations for the regression of Y on X are:

$$\sum Y = na + b\sum X$$

$$\sum XY = a\sum X + b\sum X^2$$

Substituting the values from the table we get

29 = 5a + 24b ............ (i)

168 = 24a + 142b, that is, 84 = 12a + 71b ............ (ii)

Multiplying equation (i) by 12 and equation (ii) by 5:

348 = 60a + 288b ............ (iii)

420 = 60a + 355b ............ (iv)

Subtracting (iii) from (iv) gives 72 = 67b, so b = 1.07. Substituting b = 1.07 into (i) gives

a = 0.66 and b = 1.07
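The same pair of normal equations can be solved programmatically. The sketch below is a minimal illustration that uses Cramer's rule on the 2 × 2 system; the function name `fit_by_normal_equations` is hypothetical. It keeps full precision, so the intercept comes out near 0.64 rather than the 0.66 obtained above, where b is rounded to 1.07 before a is computed.

```python
def fit_by_normal_equations(xs, ys):
    """Solve sum(Y) = n*a + b*sum(X) and sum(XY) = a*sum(X) + b*sum(X^2) for (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Determinant of the coefficient matrix [[n, sum_x], [sum_x, sum_x2]].
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all X values are identical; the line is not determined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det  # intercept
    b = (n * sum_xy - sum_x * sum_y) / det       # slope
    return a, b

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]

# Regression of Y on X: Y = a + bX
a_yx, b_yx = fit_by_normal_equations(X, Y)
# Regression of X on Y: swap the roles of the two variables, X = a + bY
a_xy, b_xy = fit_by_normal_equations(Y, X)

print(round(a_yx, 2), round(b_yx, 2))
print(round(a_xy, 2), round(b_xy, 2))
```

Swapping the arguments gives the regression of X on Y, because its normal equations have the same form with X and Y interchanged.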

By putting the values of a and b in the regression equation of Y on X we get

Y = 0.66 + 1.07X

Now, to find the regression equation of X on Y, the two normal equations are

$$\sum X = na + b\sum Y$$

$$\sum XY = a\sum Y + b\sum Y^2$$

Substituting the values in the equations we get

24 = 5a + 29b ............ (i)

168 = 29a + 207b ............ (ii)

Multiplying equation (i) by 29 and equation (ii) by 5:

696 = 145a + 841b ............ (iii)

840 = 145a + 1035b ............ (iv)

Subtracting (iii) from (iv) gives 144 = 194b, and solving we get

a = 0.49 and b = 0.74

Substituting the values of a and b in the regression equation of X on Y:

X = 0.49 + 0.74Y

2. Deviation from the Arithmetic Mean Method:

The calculations by the least squares method are quite cumbersome when the values of X and Y are large, so the work can be simplified by using this method. The formulas for the regression equations by this method are:

Regression equation of X on Y:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

where $b_{xy}$ and $b_{yx}$ are the regression coefficients,

$$b_{xy} = \frac{\sum xy}{\sum y^2} \quad \text{and} \quad b_{yx} = \frac{\sum xy}{\sum x^2},$$

with $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

Example 2: From the previous data, obtain the regression equations by taking deviations from the actual means of the X and Y series.

X    3    2    7    4    8

Y    6    1    8    5    9

Solution:

Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

  X     Y      x       y       x²       y²       xy

  3     6    -1.8     0.2     3.24     0.04    -0.36

  2     1    -2.8    -4.8     7.84    23.04    13.44

  7     8     2.2     2.2     4.84     4.84     4.84

  4     5    -0.8    -0.8     0.64     0.64     0.64

  8     9     3.2     3.2    10.24    10.24    10.24

$\sum X = 24$, $\sum Y = 29$, $\sum x = 0$, $\sum y = 0$, $\sum x^2 = 26.8$, $\sum y^2 = 38.8$, $\sum xy = 28.8$

Regression Equation of X on Y is

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y}), \qquad b_{xy} = \frac{\sum xy}{\sum y^2}$$

$$X - 4.8 = \frac{28.8}{38.8}\,(Y - 5.8)$$

$$X - 4.8 = 0.74\,(Y - 5.8)$$

$$X = 0.74Y + 0.49 \qquad \text{(I)}$$

Regression Equation of Y on X is

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X}), \qquad b_{yx} = \frac{\sum xy}{\sum x^2}$$

$$Y - 5.8 = \frac{28.8}{26.8}\,(X - 4.8)$$

$$Y - 5.8 = 1.07\,(X - 4.8)$$

$$Y = 1.07X + 0.66 \qquad \text{(II)}$$
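The deviation-from-mean calculation can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `regression_from_mean_deviations` is hypothetical. It returns the two means and the two regression coefficients, from which either regression line follows.

```python
def regression_from_mean_deviations(xs, ys):
    """Return (mean_x, mean_y, b_xy, b_yx) using deviations from the actual means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]  # x = X - mean of X
    dev_y = [y - mean_y for y in ys]  # y = Y - mean of Y
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_x2 = sum(dx * dx for dx in dev_x)
    sum_y2 = sum(dy * dy for dy in dev_y)
    b_xy = sum_xy / sum_y2  # regression coefficient of X on Y
    b_yx = sum_xy / sum_x2  # regression coefficient of Y on X
    return mean_x, mean_y, b_xy, b_yx

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
mean_x, mean_y, b_xy, b_yx = regression_from_mean_deviations(X, Y)

# X on Y: X = mean_x + b_xy * (Y - mean_y), so the intercept is mean_x - b_xy * mean_y
intercept_x_on_y = mean_x - b_xy * mean_y
# Y on X: Y = mean_y + b_yx * (X - mean_x), so the intercept is mean_y - b_yx * mean_x
intercept_y_on_x = mean_y - b_yx * mean_x

print(round(b_xy, 2), round(intercept_x_on_y, 2))
print(round(b_yx, 2), round(intercept_y_on_x, 2))
```

Because the code does not round the coefficients before computing the intercepts, the Y on X intercept comes out near 0.64; the 0.66 in the worked example results from using b = 1.07 rounded to two decimals.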

It will be observed that these regression equations are the same as those obtained by the direct method.

3. Deviation from Assumed Mean Method:

When the actual means of the X and Y variables are fractions, the calculations can be simplified by taking the deviations from assumed means.

The Regression Equation of X on Y:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

The Regression Equation of Y on X:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

But here the values of $b_{xy}$ and $b_{yx}$ will be calculated by the following formulas, where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means:

$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$

$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$

Example 3: From the data given in the previous example, calculate the regression equations by assuming 7 as the mean of the X series and 6 as the mean of the Y series.

Solution:

Here $d_x = X - 7$ (deviation from assumed mean 7) and $d_y = Y - 6$ (deviation from assumed mean 6).

  X     Y     dx    dx²    dy    dy²    dxdy

  3     6     -4    16      0     0       0

  2     1     -5    25     -5    25     +25

  7     8      0     0      2     4       0

  4     5     -3     9     -1     1      +3

  8     9      1     1      3     9      +3

$\sum X = 24$, $\sum Y = 29$, $\sum d_x = -11$, $\sum d_x^2 = 51$, $\sum d_y = -1$, $\sum d_y^2 = 39$, $\sum d_x d_y = 31$

The Regression Coefficient of X on Y:

$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$

$$b_{xy} = \frac{5(31) - (-11)(-1)}{5(39) - (-1)^2}$$

$$b_{xy} = \frac{155 - 11}{195 - 1}$$

$$b_{xy} = \frac{144}{194} = 0.74$$

The means are

$$\bar{X} = \frac{\sum X}{N} = \frac{24}{5} = 4.8, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{29}{5} = 5.8$$

The Regression equation of X on Y:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

$$X - 4.8 = 0.74\,(Y - 5.8)$$

$$X = 0.74Y + 0.49$$

The Regression coefficient of Y on X:

$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$

$$b_{yx} = \frac{5(31) - (-11)(-1)}{5(51) - (-11)^2}$$

$$b_{yx} = \frac{155 - 11}{255 - 121}$$

$$b_{yx} = \frac{144}{134} = 1.07$$

The Regression Equation of Y on X:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$Y - 5.8 = 1.07\,(X - 4.8)$$

$$Y = 1.07X + 0.66$$

It will be observed that these regression equations are the same as those obtained by the least squares method and by the deviation from the arithmetic mean method.
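The assumed-mean formulas can be checked with a short Python sketch. The function name `coefficients_from_assumed_means` is hypothetical and the code is only an illustration of the formulas above. It also shows that the choice of assumed means does not affect the result, which is why convenient whole numbers can be used.

```python
def coefficients_from_assumed_means(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) using deviations from assumed (not actual) means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx = sum(dx)
    sum_dy = sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    # Shared numerator: N * sum(dx*dy) - sum(dx) * sum(dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)  # X on Y
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)  # Y on X
    return b_xy, b_yx

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]

# Assumed means 7 and 6, as in the example.
b_xy, b_yx = coefficients_from_assumed_means(X, Y, 7, 6)
# Any other assumed means give the same coefficients.
b_xy_alt, b_yx_alt = coefficients_from_assumed_means(X, Y, 5, 5)

print(round(b_xy, 2), round(b_yx, 2))
print(round(b_xy_alt, 2), round(b_yx_alt, 2))
```

With assumed means 7 and 6 the numerator is 5(31) - (-11)(-1) = 144, giving 144/194 and 144/134, which round to 0.74 and 1.07.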