correlation & regression
Name of Institution
CORRELATION & REGRESSION ANALYSIS
CORRELATION
• When the relationship is of quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.
• The measure of correlation called the coefficient of correlation indicates the strength & direction of relationship between two variables.
• The coefficient of correlation between two variables x and y is denoted by r, rxy, or ρ.
• It lies between -1 and +1.
• If r = 0, the variables are said to be uncorrelated: there is no linear relationship between them, although they need not be independent.
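A zero coefficient rules out only a linear relationship. As a minimal sketch (the data and the helper name `pearson_r` are illustrative assumptions, not taken from the slides), the pairs below satisfy y = x² exactly, yet r works out to 0:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: y is completely determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0: no linear relationship, even though y depends on x
```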
TYPES OF CORRELATION
I) Based on Direction:
Positive Correlation: When an increase (or decrease) in the value of one variable is accompanied by a corresponding increase (or decrease) in the value of the other variable.
Negative Correlation: When an increase (or decrease) in the value of one variable is accompanied by a corresponding decrease (or increase) in the value of the other variable.
II) Based on Degree: High, Moderate, Low.
METHODS OF STUDYING CORRELATION
1) Scatter Diagram Method.
2) Karl Pearson’s Coefficient of Correlation.
3) Spearman’s Rank Correlation Coefficient.
SCATTER DIAGRAM
• The simplest method for studying correlation in two variables is a special type of dot chart called Dotogram or Scatter Diagram.
• In this method given data are plotted in the form of dots, for each pair of X and Y.
• The more the plotted points scatter over the chart, the lesser is the degree of relationship between two variables.
• The more closely the points cluster around a straight line, the higher the degree of relationship.
[Figure: scatter diagrams of Y against X in four panels: perfect negative correlation (r = -1), no correlation (r = 0), perfect positive correlation (r = +1), and no correlation (r = 0).]
Advantages:
1. It is readily comprehensible and enables us to form a rough idea of the nature of the relationship between the two variables x and y.
2. It is not affected by extreme observations.
Disadvantages:
1. It is not a suitable method if the number of observations is fairly large.
2. It is only a rough measure of correlation; the exact magnitude cannot be known from it.
KARL PEARSON COEFFICIENT OF CORRELATION
• Also known as Pearsonian Coefficient of Correlation.
• It describes the degree & direction of relationship between two variables X and Y.
• It is denoted by the symbol ‘r’.
• The value of Pearson’s coefficient of correlation lies between -1 and +1.
• If X and Y are independent variables then coefficient of correlation is zero.
PEARSON FORMULA
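• The correlation coefficient, denoted by r, is given by any of the following forms:
First form:
$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$$
Second form:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
Third form:
$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}}$$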
Ques 1. Calculate Karl Pearson coefficient of correlation.
X Y
12 14
9 8
8 6
10 9
11 11
13 12
7 3
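One way to check a hand calculation for Ques 1 is to evaluate the sums-of-products form of Pearson's formula directly. The sketch below is only an illustration; the variable names are assumptions, and the data are the seven (X, Y) pairs from the table above.

```python
import math

# Data from Ques 1.
x = [12, 9, 8, 10, 11, 13, 7]
y = [14, 8, 6, 9, 11, 12, 3]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                 # 70 and 63
sum_xy = sum(a * b for a, b in zip(x, y))     # 676
sum_x2 = sum(a * a for a in x)                # 728
sum_y2 = sum(b * b for b in y)                # 651

# r = [Sxy - Sx*Sy/n] / sqrt([Sx2 - Sx^2/n] * [Sy2 - Sy^2/n])
numerator = sum_xy - sum_x * sum_y / n        # 46
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator

print(round(r, 4))  # approximately 0.9485
```

The result is close to +1, which indicates a high degree of positive correlation between X and Y.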
Ques 2. A financial analyst wanted to find out whether inventory turnover influences a company’s earnings per share. A random sample of 7 companies listed on the stock exchange was selected, and the following data were recorded for each. Find the correlation coefficient.
Company   Inventory turnover   Earnings per share (%)
A 4 11
B 5 9
C 7 13
D 8 7
E 6 13
F 3 8
G 5 8
Ques 3. The following table gives the indices of industrial production and number of registered unemployed people (in lakhs). Calculate Karl Pearson’s coefficient of correlation.
Index of production   No. of unemployed
100 15
102 12
104 13
107 11
105 12
112 12
103 19
99 26
SPEARMAN CORRELATION
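• Rank X and Y separately.
• The largest value gets rank 1, the second largest rank 2, and so on.
• Formula:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}, \quad \text{where } d = \text{Rank } X - \text{Rank } Y$$
• For tied ranks:
$$r_s = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$$
Here m is the number of times a value is repeated.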
Question 1) Calculate the coefficient of correlation for the following heights (in inches) of fathers (X) and sons (Y).
X Y
65 67
66 68
67 65
67 68
68 72
69 72
70 69
72 71
Question 2) Find rank correlation coefficient between x and y.
X Y
85 18.3
91 20.8
56 16.9
72 15.7
95 19.2
76 18.1
89 17.5
51 14.9
59 18.9
90 15.4
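Since no value repeats in either column of Question 2, the basic rank formula applies without a tie correction. A minimal sketch, assuming the ranking convention stated earlier (largest value gets rank 1); the helper `ranks_desc` is a hypothetical name and works only when there are no ties:

```python
def ranks_desc(values):
    """Rank values with the largest as 1. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

# Data from Question 2.
x = [85, 91, 56, 72, 95, 76, 89, 51, 59, 90]
y = [18.3, 20.8, 16.9, 15.7, 19.2, 18.1, 17.5, 14.9, 18.9, 15.4]

rank_x = ranks_desc(x)
rank_y = ranks_desc(y)
n = len(x)

# d = Rank X - Rank Y for each pair; the formula needs the sum of d squared.
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))   # 74
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(r_s, 4))  # 74 and approximately 0.5515
```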
Question 3) Obtain the rank correlation coefficient for the following data.
X Y
68 62
64 58
75 68
50 45
64 81
80 60
75 68
40 48
55 50
64 70
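The data in Question 3 contain repeated values (75 twice and 64 three times in X, 68 twice in Y), so the tied-ranks formula is the relevant one. The sketch below is one possible way to apply it, assuming the usual convention that tied values share the average of the ranks they occupy; the helper names `average_ranks` and `tie_correction` are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank with the largest value as 1; ties share the mean of their ranks."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1           # first rank position held by v
        m = order.count(v)                   # number of times v is repeated
        ranks[v] = first + (m - 1) / 2       # mean of first .. first + m - 1
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every repeated value."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

# Data from Question 3.
x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
n = len(x)

rank_x = average_ranks(x)
rank_y = average_ranks(y)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))   # 72.0
correction = tie_correction(x) + tie_correction(y)                # 2.5 + 0.5 = 3.0

r_s = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))
print(round(r_s, 4))  # approximately 0.5455
```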
REGRESSION
• Regression analysis provides a mathematical model of the relationship between two variables, in which one is independent and one is dependent.
• If X and Y are two variables, then we have two regression lines:-
(a) Regression line of X on Y.
(b) Regression line of Y on X.
Regression line X on Y.
The regression line of X on Y is given by:-
X= a + b Y
where b is called the regression coefficient of X on Y, denoted by bxy.
Here, Y is the independent variable and X is the dependent variable.
Normal equations to estimate a and b are:-
$$\sum XY = a \sum Y + b \sum Y^2$$
$$\sum X = na + b \sum Y$$
Another form of regression equation X on Y is :-
$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}), \quad \text{where } b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
Regression line Y on X.
The regression line of Y on X is given by:-
Y= a + b X
where b is called the regression coefficient of Y on X, denoted by byx.
Here, X is the independent variable and Y is the dependent variable.
Normal equations to estimate a and b are:-
$$\sum XY = a \sum X + b \sum X^2$$
$$\sum Y = na + b \sum X$$
Another form of regression equation Y on X is :-
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}), \quad \text{where } b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
Properties of regression lines and coefficients
• Both the regression lines pass through the point $(\bar{x}, \bar{y})$.
• The correlation coefficient is the geometric mean of the two regression coefficients of X and Y, i.e. $r = \pm\sqrt{b_{xy}\, b_{yx}}$, with the sign taken from the regression coefficients.
• If one of the regression coefficients is greater than 1, the other must be less than 1.
• bxy, byx and the correlation coefficient (r) have the same sign.
For example, if bxy = -0.664 and byx = -0.234,
then r = -(0.664 × 0.234)^(1/2) = -0.394.
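The geometric-mean property follows directly from the alternative forms of the two regression coefficients. A short derivation, assuming $\sigma_x$ and $\sigma_y$ are both non-zero:

$$b_{xy}\, b_{yx} = \left(r\,\frac{\sigma_x}{\sigma_y}\right)\left(r\,\frac{\sigma_y}{\sigma_x}\right) = r^2 \quad\Longrightarrow\quad r = \pm\sqrt{b_{xy}\, b_{yx}}$$

Because $r^2 \le 1$, the product $b_{xy}\, b_{yx}$ cannot exceed 1, which is why both coefficients cannot be greater than 1 in magnitude at the same time. Since $\sigma_x$ and $\sigma_y$ are positive, both coefficients also carry the sign of r.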
QUESTION 1) You are given the following information about advertising expenditure and sales.
Advertisement(x) Sales(y)
A.M 10 90
S.D 3 12
And r = 0.8
(a) Obtain the two regression lines.
(b) Find the likely sales when the advertisement budget is Rs 15 lakhs.
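When only the means, standard deviations and r are given, both lines can be written from the forms Y − Ȳ = byx (X − X̄) and X − X̄ = bxy (Y − Ȳ). The following is a rough sketch of that calculation for Question 1; the variable names are assumptions chosen for illustration.

```python
# Summary statistics from Question 1.
mean_x, mean_y = 10.0, 90.0    # arithmetic means of advertisement and sales
sd_x, sd_y = 3.0, 12.0         # standard deviations
r = 0.8

# Regression coefficients.
b_yx = r * sd_y / sd_x         # Y on X: 3.2
b_xy = r * sd_x / sd_y         # X on Y: 0.2

# Rewrite each line in intercept form.
a_yx = mean_y - b_yx * mean_x  # Y = 58 + 3.2 X
a_xy = mean_x - b_xy * mean_y  # X = -8 + 0.2 Y

# Part (b): predict sales from advertisement, so use the line of Y on X.
sales_at_15 = a_yx + b_yx * 15

print(f"Y = {a_yx:.1f} + {b_yx:.1f} X")
print(f"X = {a_xy:.1f} + {b_xy:.1f} Y")
print(f"Estimated sales at X = 15: {sales_at_15:.1f}")  # about 106
```

The line of Y on X is used for part (b) because sales is the variable being estimated from a known advertisement budget.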
QUESTION 2) The two regression lines are given by:-
3 X + 12 Y = 19
9 X +3 Y = 46
And σx = 4.
Obtain:-
(a) Mean values of X and Y.
(b) The value of correlation coefficient.
(c) Standard deviation of y.
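Question 2 does not say which equation is the line of Y on X. One common approach, sketched below under that assumption, is to try one assignment and swap if the product of the two regression coefficients exceeds 1 (which is impossible, since the product equals r²). The means come from the point of intersection, because both lines pass through (X̄, Ȳ). All variable names here are illustrative.

```python
import math

# Each line written as a*X + b*Y = c.
a1, b1, c1 = 3.0, 12.0, 19.0
a2, b2, c2 = 9.0, 3.0, 46.0
sigma_x = 4.0

# (a) Means: solve the two equations simultaneously (Cramer's rule).
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det     # 5
mean_y = (a1 * c2 - a2 * c1) / det     # 1/3

# (b) Try line 1 as Y on X and line 2 as X on Y.
b_yx = -a1 / b1                        # slope of Y in terms of X: -0.25
b_xy = -b2 / a2                        # slope of X in terms of Y: -1/3
if b_yx * b_xy > 1:                    # invalid assignment, so swap the roles
    b_yx = -a2 / b2
    b_xy = -b1 / a1

# r takes the common sign of the regression coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# (c) From b_yx = r * sigma_y / sigma_x.
sigma_y = b_yx * sigma_x / r

print(mean_x, round(mean_y, 4), round(r, 4), round(sigma_y, 4))
```

With these equations the first assignment is already valid, giving r ≈ -0.2887 and σy ≈ 3.4641.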
Question 3. For the following data,
Obtain the two regression equations and hence find the correlation coefficient.
X 1 2 3 4 5
Y 2 5 3 8 7
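For raw data such as this, the constants a and b of each line can be found by solving the two normal equations. The sketch below does that with Cramer's rule; `fit_line` is a hypothetical helper, not something defined in the slides, and it is called twice with the roles of X and Y exchanged.

```python
import math

def fit_line(u, v):
    """Solve the normal equations for v = a + b*u:
         sum(v)   = n*a      + b*sum(u)
         sum(u*v) = a*sum(u) + b*sum(u^2)
    """
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    det = n * suu - su * su
    a = (sv * suu - su * suv) / det
    b = (n * suv - su * sv) / det
    return a, b

# Data from Question 3.
x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]

a_yx, b_yx = fit_line(x, y)   # Y on X: Y = 1.1 + 1.3 X
a_xy, b_xy = fit_line(y, x)   # X on Y: X = 0.5 + 0.5 Y

# r is the geometric mean of the coefficients, with their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X = {a_xy:.2f} + {b_xy:.2f} Y")
print(round(r, 4))  # approximately 0.8062
```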
Question 4. The following data gives the ages and blood pressure of 10 women.
(i) Find the correlation coefficient between age and blood pressure.
(ii) Determine the regression equation of blood pressure on age.
(iii) Estimate the blood pressure of a woman whose age is 45 years.
Age 56 42 36 47 49 42 60 72 63 55
B.P 147 125 118 128 145 140 155 160 149 150
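All three parts of Question 4 can be worked from the deviations about the means, using the second form of Pearson's formula and the line Y − Ȳ = byx (X − X̄). This is a sketch for checking a hand calculation; the variable names are assumptions.

```python
import math

# Data from Question 4: X = age, Y = blood pressure.
age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]
n = len(age)

mean_age = sum(age) / n      # 52.2
mean_bp = sum(bp) / n        # 141.7

# Sums of squared deviations and of cross products.
sxx = sum((a - mean_age) ** 2 for a in age)
syy = sum((b - mean_bp) ** 2 for b in bp)
sxy = sum((a - mean_age) * (b - mean_bp) for a, b in zip(age, bp))

# (i) Correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# (ii) Regression of blood pressure on age.
b_yx = sxy / sxx
intercept = mean_bp - b_yx * mean_age

# (iii) Estimate at age 45.
bp_at_45 = intercept + b_yx * 45

print(round(r, 4))                                   # approximately 0.8917
print(f"BP = {intercept:.2f} + {b_yx:.2f} * Age")
print(round(bp_at_45, 1))                            # approximately 133.7
```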