correlation analysis and regression analysis...2019/06/21  · correlation analysis and regression...

Post on 18-Jul-2020


Correlation Analysis and Regression Analysis

L. W. Dasanayake

Department of Economics

University of Kelaniya

• Regression analysis deals with the nature of the relationship between variables.

• Correlation analysis is concerned with measuring the strength, or "closeness", of the relationship between two variables.

Regression Analysis

• Simple Linear Regression analysis

• Multiple Regression analysis

Correlation Analysis

• Simple Correlation analysis

• Multiple correlation analysis

• Partial Correlation analysis


Simple Correlation analysis

• Graphical Model

o Simple Graph

o Scatter Diagram

• Mathematical Model

o Karl Pearson’s Coefficient of Correlation

o Charles Spearman’s Coefficient

Scatter Diagram

A scatter diagram is a graph of observed points, where each point represents a pair of X and Y values as a coordinate. It portrays the relationship between the two variables graphically.

[Figure: scatter diagram of y plotted against x]

Simple Correlation Analysis (Karl Pearson’s Coefficient of Correlation, r)

• Simple Correlation Analysis - Concerned with providing a statistical measure of the strength of the relationship between two variables (Independent variable and Dependent variable)

• Correlation coefficient (r) provides a numerical summary measure of the degree of the correlation between two variables. ( -1 ≤ r ≤ +1)

• The sign of r shows the direction of the relationship.

o Positive and negative correlation

Positive correlation: The correlation is said to be positive if the values of the two variables change in the same direction.

Ex. Public Exp. and sales; height and weight; study time and grades.

Negative correlation: The correlation is said to be negative if the values of the two variables change in opposite directions.

Ex. Price and demand; alcohol consumption and driving ability.

Interpretation of Correlation Coefficient (r)

• The value of the correlation coefficient r ranges from -1 to +1.

• If r = +1, the correlation between the two variables is said to be perfect and positive.

• If r = -1, the correlation between the two variables is said to be perfect and negative.

• If r = 0, there is no linear correlation between the variables.

• The closer the absolute value of the coefficient is to 1, the stronger the relationship; the closer it is to 0, the weaker the relationship.

• The sign of the coefficient (positive or negative) indicates the direction of the relationship.
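The rules above can be sketched as a small Python helper. This is only an illustration: the function name `describe_r` and the 0.7 cut-off between "high" and "low" are assumptions made for the example, since the slides do not state a numerical threshold.

```python
def describe_r(r, strong_cutoff=0.7):
    """Describe a correlation coefficient in words.

    strong_cutoff is a hypothetical threshold chosen for illustration;
    the slides do not define where "high" ends and "low" begins.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    # The sign gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    if abs(r) == 1:
        return f"perfect and {direction} correlation"
    # The absolute value gives the strength of the relationship.
    strength = "high" if abs(r) >= strong_cutoff else "low"
    return f"{strength} {direction} correlation"


print(describe_r(1.0))
print(describe_r(-0.9))
print(describe_r(0.2))
```

The labels mirror the wording used for the scatter-diagram panels (perfect, high, low, none); a different cut-off can be passed through `strong_cutoff`.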

Interpretation of Correlation Coefficient (r), continued

[Figures: scatter diagrams illustrating perfect and positive correlation, high positive correlation, low positive correlation, no correlation, perfect and negative correlation, high negative correlation, and low negative correlation]

Karl Pearson’s Coefficient of Correlation – r (Simple Correlation)

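$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} $$

n – sample size

y_i – dependent variable

x_i – independent variable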

Example: The following data record the annual profits (millions) made by a company over 6 years, along with the amount spent on research and development (millions) in each year. Comment on the relationship between the two variables.

Year    Research & Development spending (millions)    Annual profit (millions)
2013     2                                             20
2014     3                                             25
2015     5                                             34
2016     4                                             30
2017    11                                             40
2018     5                                             31

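Year     x     y     x²     xy     y²
2013     2    20      4     40    400
2014     3    25      9     75    625
2015     5    34     25    170   1156
2016     4    30     16    120    900
2017    11    40    121    440   1600
2018     5    31     25    155    961
Total   30   180    200   1000   5642

$$ r = \frac{6 \times 1000 - 30 \times 180}{\sqrt{\left[6 \times 200 - 30^2\right]\left[6 \times 5642 - 180^2\right]}} = \frac{600}{660} = 0.909 $$

Since r = 0.909 is close to +1, there is a strong positive linear relationship between research and development spending and annual profit.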
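The calculation above can be checked with a short Python sketch that applies the same raw-sum formula. The function name `pearson_r` and the variable names are chosen for this example only; the data are the six (x, y) pairs from the table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from raw sums, as in the formula above."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero when either variable is constant;
    # r is undefined in that case and this sketch does not handle it.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


rd_spending = [2, 3, 5, 4, 11, 5]     # x: millions spent on R&D, 2013-2018
profit = [20, 25, 34, 30, 40, 31]     # y: annual profit in millions

r = pearson_r(rd_spending, profit)
print(round(r, 3))  # 0.909
```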


Spearman’s Coefficient of correlation (𝑟𝑠)

• In some situations the values of the variables X and Y are expressed in rank order form.

• The measure of correlation which deals with this type of situation is Spearman’s rank correlation Coefficient.

• The value of 𝑟𝑠 varies within the range -1 to +1.

Charles Spearman’s Coefficient of correlation

$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$

D – difference between each pair of x and y ranks

n – number of paired values of x and y

Example: In a survey, ten popular television programmes were ranked in order by groups of men and women as shown below. Is there a significant relationship between the ranking of programmes by men and women?

Television Program    Ranking by men    Ranking by women
 1                     1                 5
 2                     5                10
 3                     8                 6
 4                     7                 4
 5                     2                 7
 6                     3                 2
 7                    10                 9
 8                     4                 8
 9                     6                 1
10                     9                 3

Television Program    Ranking by men    Ranking by women     D    D²
 1                     1                 5                  -4    16
 2                     5                10                  -5    25
 3                     8                 6                   2     4
 4                     7                 4                   3     9
 5                     2                 7                  -5    25
 6                     3                 2                   1     1
 7                    10                 9                   1     1
 8                     4                 8                  -4    16
 9                     6                 1                   5    25
10                     9                 3                   6    36
Total                                                             158

$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 158}{10(100 - 1)} = 1 - \frac{948}{990} = 0.0424 $$

Since r_s is very close to 0, the rankings of TV programmes by men and by women are essentially unrelated.
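The same result can be reproduced with a minimal Python sketch of the rank-difference formula. The function name `spearman_rs` is chosen for this example; the formula as written assumes there are no tied ranks, which holds for this data.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation from rank differences (no tied ranks)."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("rank lists must be paired and hold at least two values")
    # D is the difference between the two ranks given to the same item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


men = [1, 5, 8, 7, 2, 3, 10, 4, 6, 9]      # ranking by men, programmes 1-10
women = [5, 10, 6, 4, 7, 2, 9, 8, 1, 3]    # ranking by women, programmes 1-10

rs = spearman_rs(men, women)
print(round(rs, 4))  # 0.0424
```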

Multiple Correlation

• It is a study of more than two variables.

• One is the dependent variable and the others are independent variables.

• It studies the combined impact of the independent variables on the dependent variable.

• Direction between variables: only positive correlation.

• Degree between variables: the coefficient ranges between 0 and 1 (0 ≤ R ≤ 1).

Options of Multiple Correlation

RX.YZ = The combined impact of the independent variables Y and Z on the dependent variable X.

RZ.XY = The combined impact of the independent variables X and Y on the dependent variable Z.

RY.XZ = The combined impact of the independent variables X and Z on the dependent variable Y.

Coefficient of Multiple Correlation RX.YZ

$$ R_{X.YZ} = \sqrt{\frac{r_{XY}^2 + r_{XZ}^2 - 2\, r_{XY}\, r_{XZ}\, r_{YZ}}{1 - r_{YZ}^2}} $$

Where RX.YZ = multiple correlation coefficient

r_XY, r_XZ, r_YZ = simple correlation coefficients
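As a rough illustration of how the formula is applied, the sketch below computes RX.YZ from three simple correlation coefficients. The function name `multiple_r` and the input values 0.60, 0.70 and 0.40 are assumptions for the example, not results taken from any data set.

```python
from math import sqrt


def multiple_r(r_xy, r_xz, r_yz):
    """Multiple correlation of X on Y and Z from the simple coefficients."""
    if abs(r_yz) >= 1:
        # The denominator 1 - r_yz^2 would be zero: Y and Z carry
        # exactly the same information, so the formula breaks down.
        raise ValueError("r_yz must lie strictly between -1 and +1")
    r_squared = (r_xy ** 2 + r_xz ** 2 - 2 * r_xy * r_xz * r_yz) / (1 - r_yz ** 2)
    return sqrt(r_squared)


# Illustrative inputs only (hypothetical values).
R = multiple_r(r_xy=0.60, r_xz=0.70, r_yz=0.40)
print(round(R, 3))  # 0.782
```

When Y and Z are uncorrelated (r_yz = 0), the formula reduces to the square root of r_xy² + r_xz².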

Partial Correlation (First Order Correlation)

• It is a study of more than two variables.

• One is the dependent variable and the others are independent variables.

• It studies the partial impact of one independent variable on the dependent variable while the other independent variables are held constant.

• The direction between variables may be positive or negative.

• The coefficient ranges between -1 and +1 (-1 ≤ r ≤ 1).

Options of Partial Correlation Coefficient

rxy.z = The partial impact of variable Y on variable X, keeping variable Z constant.

ryz.x = The partial impact of variable Z on variable Y, keeping variable X constant.

rxz.y = The partial impact of variable Z on variable X, keeping variable Y constant.

Coefficient of Partial Correlation rxy.z

$$ r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^2\right)\left(1 - r_{yz}^2\right)}} $$

Where rxy.z = partial correlation between X and Y, with Z held constant

r_xy, r_xz, r_yz = simple correlation coefficients

Exercise: Compute the partial correlation coefficients for the given values.

r_xy = 0.60, r_xz = 0.70, r_yz = 0.40
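One way to work through this exercise is the Python sketch below, which applies the first-order partial correlation formula to each pair in turn. The helper name `partial_r` and its argument names are chosen for this example; the three input values are the ones given above.

```python
from math import sqrt


def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b with c held constant (r_ab.c)."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


r_xy, r_xz, r_yz = 0.60, 0.70, 0.40

r_xy_z = partial_r(r_xy, r_xz, r_yz)   # X and Y, Z held constant
r_xz_y = partial_r(r_xz, r_xy, r_yz)   # X and Z, Y held constant
r_yz_x = partial_r(r_yz, r_xy, r_xz)   # Y and Z, X held constant

print(round(r_xy_z, 3))  # 0.489
print(round(r_xz_y, 3))  # 0.627
print(round(r_yz_x, 3))  # -0.035
```

In this sketch the Y and Z correlation drops from 0.40 to about -0.035 once X is held constant, which suggests that their simple correlation is largely explained by their shared relationship with X.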
