correlation analysis and regression analysis...2019/06/21 · correlation analysis and regression...
Correlation Analysis and
Regression Analysis
L. W. Dasanayake
Department of Economics
University of Kelaniya
• Regression analysis deals with the nature of the relationship between variables.
• Correlation analysis is concerned with measuring the strength, or “closeness”, of the relationship between two variables.
Regression Analysis
• Simple Linear Regression analysis
• Multiple Regression analysis
Correlation Analysis
• Simple Correlation analysis
• Multiple correlation analysis
• Partial Correlation analysis
Simple Correlation analysis
• Graphical Model
o Simple Graph
o Scatter Diagram
• Mathematical Model
o Karl Pearson’s Coefficient of Correlation
o Charles Spearman’s Coefficient
[Figure: scatter diagram of y plotted against x]
Scatter Diagram
Scatter Diagram is a graph of observed plotted points where each point
represents the values of X and Y as a coordinate. It portrays the relationship
between these two variables graphically.
Simple Correlation Analysis (Karl Pearson’s Coefficient of Correlation, r)
• Simple Correlation Analysis - Concerned with providing a statistical measure of the strength of the relationship between two variables (Independent variable and Dependent variable)
• Correlation coefficient (r) provides a numerical summary measure of the degree of the correlation between two variables. ( -1 ≤ r ≤ +1)
• The sign of r shows the direction of the relationship
o Positive and negative correlation
Positive Correlation: The correlation is said to be positive if the values of the two variables change in the same direction.
Ex. Public Exp. and sales, height and weight, study time and grades.
Negative Correlation: The correlation is said to be negative if the values of the two variables change in opposite directions.
Ex. Price and demand, alcohol consumption and driving ability.
Interpretation of Correlation Coefficient (r)
• The value of correlation coefficient ‘r’ ranges from -1 to +1
• If r = +1, then the correlation between the two variables is
said to be perfect and positive
• If r = -1, then the correlation between the two variables is
said to be perfect and negative
• If r = 0, then there is no linear correlation between the variables
• The closer the coefficient is to +1 or -1, the stronger the relationship; the closer it is to 0, the weaker the relationship.
• The sign of the coefficient (positive or negative) indicates the direction of the relationship.
Interpretation of Correlation Coefficient (r) (continued)
[Figure: scatter diagrams illustrating, in order: perfect and positive correlation, high positive correlation, low positive correlation, no correlation, perfect and negative correlation, high negative correlation, low negative correlation]
Karl Pearson’s Coefficient of Correlation – r (Simple Correlation)
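$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$
n = sample size
y_i = dependent variable
x_i = independent variable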
Example: The following data record the annual profits (millions) made by a company over 6 years, along with the amount spent on research and development (millions) for each year. Comment on the relationship between the two variables.

Year    Millions spent on Research & Development    Annual profit (millions)
2013     2                                           20
2014     3                                           25
2015     5                                           34
2016     4                                           30
2017    11                                           40
2018     5                                           31
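Year     x     y     x²     xy     y²
2013     2    20      4     40    400
2014     3    25      9     75    625
2015     5    34     25    170   1156
2016     4    30     16    120    900
2017    11    40    121    440   1600
2018     5    31     25    155    961
Total   30   180    200   1000   5642

Here x is the amount spent on research and development and y is the annual profit, so n = 6, Σx = 30, Σy = 180, Σx² = 200, Σxy = 1000 and Σy² = 5642.

$$r = \frac{6(1000) - (30)(180)}{\sqrt{\left[6(200) - 30^2\right]\left[6(5642) - 180^2\right]}} = \frac{600}{\sqrt{(300)(1452)}} = \frac{600}{660} = 0.909$$

Since r = 0.909 is close to +1, there is a strong positive correlation between research and development spending and annual profit.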
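The calculation above can be checked with a few lines of Python. This is only a minimal sketch of the raw-sums formula for r; the function name `pearson_r` and the variable names are illustrative choices, not part of the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the raw-sums formula (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, with n >= 2")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Data from the example: R&D spending (x) and annual profit (y), 2013-2018
rd_spend = [2, 3, 5, 4, 11, 5]
profit = [20, 25, 34, 30, 40, 31]

r = pearson_r(rd_spend, profit)
print(round(r, 3))  # 0.909
```

Working from the column totals, as the slide does, and working from the raw data give the same value, because the function simply rebuilds those totals.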
Spearman’s Coefficient of correlation (𝑟𝑠)
• In some situations the values of the variables X and Y are expressed in rank order form.
• The measure of correlation which deals with this type of situation is Spearman’s rank correlation Coefficient.
• The value of 𝑟𝑠 varies within the range -1 to +1.
Charles Spearman’s Coefficient of correlation
$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
D = difference between each pair of x and y ranks
n = number of paired values of x and y
Example: In a survey, ten popular television programmes were ranked in order by groups of men and women as shown below. Is there a significant relationship between the ranking of programmes by men and women?
Television Program    Ranking by men    Ranking by women
 1                     1                 5
 2                     5                10
 3                     8                 6
 4                     7                 4
 5                     2                 7
 6                     3                 2
 7                    10                 9
 8                     4                 8
 9                     6                 1
10                     9                 3
Television Program    Ranking by men    Ranking by women     D     D²
 1                     1                 5                   -4     16
 2                     5                10                   -5     25
 3                     8                 6                    2      4
 4                     7                 4                    3      9
 5                     2                 7                   -5     25
 6                     3                 2                    1      1
 7                    10                 9                    1      1
 8                     4                 8                   -4     16
 9                     6                 1                    5     25
10                     9                 3                    6     36
ΣD² = 158

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 158}{10(100 - 1)} = 1 - \frac{948}{990} = 0.0424$$

Since r_s is very close to 0, the rankings of TV programs by men and by women are essentially unrelated.
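The rank-difference calculation can be reproduced with a short Python sketch. The function name `spearman_rs` is an illustrative choice, and the sketch assumes the inputs are already ranks with no ties, which is the case the formula is written for.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation via 1 - 6*sum(D^2) / (n(n^2 - 1)).

    Assumes both inputs are already ranks with no tied values.
    """
    if len(ranks_x) != len(ranks_y) or len(ranks_x) < 2:
        raise ValueError("rank lists must have the same length, with n >= 2")
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Rankings of the ten television programmes from the example
men = [1, 5, 8, 7, 2, 3, 10, 4, 6, 9]
women = [5, 10, 6, 4, 7, 2, 9, 8, 1, 3]

rs = spearman_rs(men, women)
print(round(rs, 4))  # 0.0424
```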
Multiple Correlation
• It is a study of more than two variables.
• One is the dependent variable and the others are independent variables.
• It studies the combined (multiple) impact of the independent variables on the dependent variable.
• Direction between variables: only positive correlation.
• Degree between variables: the correlation ranges between 0 and 1 (0 ≤ R ≤ 1).
Options of Multiple Correlation
RX.YZ = The multiple impact of the independent variables Y and Z on the dependent variable X.
RZ.XY = The multiple impact of the independent variables X and Y on the dependent variable Z.
RY.XZ = The multiple impact of the independent variables X and Z on the dependent variable Y.
Coefficient of Multiple Correlation RX.YZ
$$R_{X.YZ} = \sqrt{\frac{r_{XY}^2 + r_{XZ}^2 - 2\,r_{XY}\,r_{XZ}\,r_{YZ}}{1 - r_{YZ}^2}}$$
Where RX.YZ = multiple correlation coefficient
rXY, rXZ, rYZ = simple correlation coefficients
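As a rough illustration of how this formula is applied, the sketch below computes R for X on Y and Z from three simple correlation coefficients. The function name `multiple_r` and the input values (0.60, 0.70, 0.40) are illustrative assumptions, not a worked example from the slides.

```python
import math

def multiple_r(r_xy, r_xz, r_yz):
    """Multiple correlation of X on Y and Z from three simple correlations."""
    if abs(r_yz) >= 1:
        raise ValueError("R is undefined when Y and Z are perfectly correlated")
    r_squared = (r_xy ** 2 + r_xz ** 2 - 2 * r_xy * r_xz * r_yz) / (1 - r_yz ** 2)
    if not 0 <= r_squared <= 1:
        # The three inputs cannot all hold for one real data set
        raise ValueError("inconsistent simple correlation coefficients")
    return math.sqrt(r_squared)

# Illustrative inputs only (hypothetical values)
R = multiple_r(r_xy=0.60, r_xz=0.70, r_yz=0.40)
print(round(R, 3))  # 0.782
```

Because R is a square root, it can never be negative, which is why multiple correlation only ranges from 0 to 1.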
Partial Correlation (First Order Correlation)
• A study of more than two variables.
• One is the dependent variable and the others are independent variables.
• It studies the partial impact of one independent variable on the dependent variable, keeping the other independent variables constant.
• The direction between variables may be positive or negative.
• The correlation ranges between -1 and +1 (-1 ≤ r ≤ 1).
Options of Partial Correlation Coefficient
rxy.z = The partial impact of variable Y on variable X, keeping variable Z constant.
ryz.x = The partial impact of variable Z on variable Y, keeping variable X constant.
rxz.y = The partial impact of variable Z on variable X, keeping variable Y constant.
Coefficient of Partial Correlation rxy.z
$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{\left(1 - r_{xz}^2\right)\left(1 - r_{yz}^2\right)}}$$
Where rxy.z = partial correlation between X and Y, keeping Z constant
rxy, rxz, ryz = simple correlation coefficients
Compute the partial correlation coefficients for the given values:
rxy = 0.60, rxz = 0.70, ryz = 0.4
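One way to check hand calculations for this exercise is the short Python sketch below. The function name `partial_r` and its argument names are illustrative assumptions. The same first-order formula is reused for all three coefficients by changing which variable is held constant.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation between A and B, holding C constant."""
    denominator = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    if denominator == 0:
        raise ValueError("undefined when C is perfectly correlated with A or B")
    return (r_ab - r_ac * r_bc) / denominator

# Simple correlation coefficients given in the exercise
r_xy, r_xz, r_yz = 0.60, 0.70, 0.40

r_xy_z = partial_r(r_xy, r_xz, r_yz)  # X and Y, holding Z constant
r_xz_y = partial_r(r_xz, r_xy, r_yz)  # X and Z, holding Y constant
r_yz_x = partial_r(r_yz, r_xy, r_xz)  # Y and Z, holding X constant

print(round(r_xy_z, 3), round(r_xz_y, 3), round(r_yz_x, 3))  # 0.489 0.627 -0.035
```

Unlike the multiple correlation coefficient, a partial correlation can be negative, as the third value shows.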