Linear Correlation and Regression - CA Study Web


Post on 22-Aug-2020


LINEAR CORRELATION AND REGRESSION

CORRELATION

Correlation is a statistical tool which studies the relationship between two variables. Correlation analysis involves various methods and techniques used for studying and measuring the extent of the relationship between two variables.

Correlation expresses the relationship between groups of items, not between individual items. The relationship between the two variables is not functional.

In other words, correlation analysis is a statistical procedure by which we can determine the degree of association or relationship between two or more variables. The amount of correlation in a sample (of data) is measured by the sample coefficient of correlation, which is generally denoted by r or by ρ.

A relationship between two variables such that a change in one variable results in a positive or negative change in the other variable, and a greater change in one variable results in a correspondingly greater or smaller change in the other variable, is known as correlation.

COEFFICIENT OF CORRELATION

Correlation may be defined as a tendency towards interrelated variation, and the coefficient of correlation is a measure of such a tendency, i.e., the degree to which the two variables are interrelated is measured by a coefficient which is called the coefficient of correlation. It gives the degree of correlation.

198 STATISTICS (CPT)


The coefficient of correlation between the two variables x and y is generally denoted by r, $r_{xy}$, r(x, y), ρ(x, y) or ρ.

PROPERTIES OF COEFFICIENT OF CORRELATION

1. It is a measure of the closeness of a fit in a relative sense.

2. The correlation coefficient lies between -1 and +1, i.e., $-1 \le r \le 1$.

3. The correlation is perfect and positive if r = 1, and it is perfect and negative if r = -1.

4. If r = 0, then there is no linear correlation between the two variables and the variables are said to be uncorrelated. (Independent variables always have r = 0, but r = 0 alone does not guarantee independence.)

5. The correlation coefficient is a pure number, and its magnitude is not affected by a change of origin and scale.

This property states that if the original pair of variables x and y is changed to a new pair of variables u and v by effecting a change of origin and scale for both x and y, i.e.

$$u = \frac{x - a}{b} \;\text{ or }\; u = \frac{x}{b} - \frac{a}{b}; \qquad v = \frac{y - c}{d} \;\text{ or }\; v = \frac{y}{d} - \frac{c}{d},$$

where a and c are the origins of x and y and b and d are the respective scales, then we have:

$$r_{xy} = \frac{bd}{|b|\,|d|}\, r_{uv} \qquad \ldots (A)$$

Here $r_{xy}$ and $r_{uv}$ are the coefficients of correlation between 'x and y' and 'u and v' respectively. Equation (A) shows that numerically the two correlation coefficients remain equal, and they have opposite signs only when b and d, the two scales, differ in sign.

6. It is a relative measure of association between two or more variables.

Example 1. Given that the coefficient of correlation between x and y is 0.6, write down the correlation coefficient between u and v where

(a) 2u - 3x + 4 = 0 and 4v - 16y + 11 = 0

(b) 2u + 3x + 4 = 0 and 4v - 16y + 11 = 0.

Solution. (a) 2u - 3x + 4 = 0 gives u = (3/2)x - 2, and 4v - 16y + 11 = 0 gives v = 4y - 11/4.

We know that if u = (1/b)x - (a/b) and v = (1/d)y - (c/d), then

$$r_{xy} = \frac{bd}{|b|\,|d|}\, r_{uv} \quad\text{or}\quad r_{uv} = \frac{|b|\,|d|}{bd}\, r_{xy}.$$

Here 1/b = 3/2 and 1/d = 4, i.e. b = 2/3 and d = 1/4, and $r_{xy}$ = 0.6.

Since b and d are of the same sign, $r_{uv} = r_{xy}$. Indeed,

$$r_{uv} = \frac{|b|\,|d|}{bd}\, r_{xy} = 0.6 \times \frac{(2/3)(1/4)}{(2/3)(1/4)} = 0.6.$$

(b) 2u + 3x + 4 = 0 gives u = (-3/2)x - 2, and 4v - 16y + 11 = 0 gives v = 4y - 11/4.

Hence 1/b = -3/2 and 1/d = 4, i.e. b = -2/3 and d = 1/4. Since b and d are of opposite signs, we have $r_{uv} = -r_{xy} = -0.6$.
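The effect of a change of origin and scale can also be checked numerically. The following is a minimal sketch: the data values and the helper name `pearson_r` are illustrative assumptions, not taken from the text, while the two transformations are those of Example 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative data (an assumption; any non-constant data would do).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 5]
r_xy = pearson_r(x, y)

# Case (a): u = (3/2)x - 2 and v = 4y - 11/4, scale factors of the same sign.
u_a = [1.5 * xi - 2 for xi in x]
v = [4 * yi - 11 / 4 for yi in y]
r_uv_a = pearson_r(u_a, v)

# Case (b): u = (-3/2)x - 2, scale factors of opposite sign.
u_b = [-1.5 * xi - 2 for xi in x]
r_uv_b = pearson_r(u_b, v)

print(round(r_xy, 4), round(r_uv_a, 4), round(r_uv_b, 4))
```

In case (a) the coefficient is unchanged, and in case (b) only its sign flips, which agrees with equation (A).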

BIVARIATE DISTRIBUTION

There are two types of distributions.

(i) Univariate Distribution. These are the distributions in which there is only one variable, such as the heights of the students of a class or the marks obtained by the students of a class.

(ii) Bivariate Distribution. A distribution involving two variables is called a bivariate distribution. For example,

1. The heights and weights of the students of a class in a school.

2. The daily petrol used by a scooter owner and the mileage covered by it.

Let $(x_i, y_j)$, i = 1, 2, 3, ..., m; j = 1, 2, ..., n, be a bivariate distribution. If the pair $(x_i, y_j)$ occurs $f_{ij}$ times, then $f_{ij}$ is called the frequency of the pair $(x_i, y_j)$ and

$$\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = N = \text{the total frequency}.$$

In this section, we shall be dealing with a bivariate population. In a bivariate population we are interested to know whether there exists some sort of functional relationship between the two variables involved.

COVARIANCE

Before we study correlation analysis, we introduce the concept of covariance between two quantitative variables X and Y. Let the corresponding values of the two variables X and Y on the given set of n units of observation be given by the ordered pairs

(x1, y1), (x2, y2), (x3, y3) ..... (xn, yn).

Then the covariance between X and Y is denoted by Cov. (X, Y). It is defined as

$$\text{Cov.}(X, Y) = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad \ldots (1)$$

where $\bar{x}$ and $\bar{y}$ are the means of the X and Y series respectively, i.e.,

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}.$$

Covariance is an “absolute measure”.

Calculation of Covariance

The above formula for the calculation of covariance is cumbersome and leaves more room for error. We now give a formula which makes the calculations easier to carry out and which also reduces the chances of an error.

The formula is

$$\text{Cov.}(X, Y) = \frac{1}{n} \left[ \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i \right] \qquad \ldots (2)$$

Equivalently,

$$\text{Cov.}(X, Y) = E(XY) - E(X)\,E(Y) \qquad \ldots (3)$$

where E(X), E(Y) and E(XY) are the expectations of X, Y and XY respectively.

Example 2. Calculate the covariance of the following pairs of observations of the two variates X and Y:

(1, 6), (2, 9), (3, 6), (4, 7), (5, 8), (6, 5), (7, 12), (8, 3), (9, 17), (10, 1).

Solution. Here n = 10 and

$\sum x_i$ = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55.

$\sum y_i$ = 6 + 9 + 6 + 7 + 8 + 5 + 12 + 3 + 17 + 1 = 74.

$\sum x_i y_i$ = 1 x 6 + 2 x 9 + 3 x 6 + 4 x 7 + 5 x 8 + 6 x 5 + 7 x 12 + 8 x 3 + 9 x 17 + 10 x 1

= 6 + 18 + 18 + 28 + 40 + 30 + 84 + 24 + 153 + 10 = 411.

$$\text{Cov.}(X, Y) = \frac{1}{n} \left[ \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i \right] = \frac{1}{10} \left[ 411 - \frac{1}{10} \times 55 \times 74 \right] = \frac{1}{10} (411 - 407) = 0.4$$

Example 3. Calculate the covariance of the following pairs of observations of the variables X and Y:

(15, 44), (20, 43), (25, 45), (30, 37), (40, 34), (50, 37).

Solution. Here

$$\bar{x} = \frac{15 + 20 + 25 + 30 + 40 + 50}{6} = \frac{180}{6} = 30; \qquad \bar{y} = \frac{44 + 43 + 45 + 37 + 34 + 37}{6} = \frac{240}{6} = 40.$$

Here $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - 30)(y_i - 40)$ = -60 - 30 - 25 + 0 - 60 - 60 = -235 and n = 6.

$$\text{Cov.}(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{-235}{6} = -39.17 \text{ nearly}.$$
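The two covariance formulas can be compared on the data of Examples 2 and 3. This is a minimal sketch; the function names `covariance_definition` and `covariance_shortcut` are illustrative choices, not names used in the text.

```python
def covariance_definition(xs, ys):
    """Formula (1): mean of the products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """Formula (2): (1/n) * [sum(x*y) - (1/n) * sum(x) * sum(y)]."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / n


# Data of Example 2 and Example 3.
example_2 = [(1, 6), (2, 9), (3, 6), (4, 7), (5, 8),
             (6, 5), (7, 12), (8, 3), (9, 17), (10, 1)]
example_3 = [(15, 44), (20, 43), (25, 45), (30, 37), (40, 34), (50, 37)]

results = {}
for name, pairs in (("Example 2", example_2), ("Example 3", example_3)):
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    results[name] = (covariance_definition(xs, ys), covariance_shortcut(xs, ys))
    print(name, round(results[name][0], 2), round(results[name][1], 2))
```

Both functions divide by n, as the text does; many software libraries divide by n - 1 by default, so their output can differ from the values above.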

TYPES OF CORRELATIONS

Positive Correlation

If the values of the two variables deviate in the same direction, i.e., an increase in the value of one variable results, on an average, in a corresponding increase in the values of the other variable, or a decrease in the value of one variable, say X, results, on an average, in a corresponding decrease in the other variable, say Y, the correlation is said to be positive or direct. The height and weight of a growing child is an example of positive correlation.

Negative or Inverse Correlation

The correlation is said to be negative or inverse if the two variables X and Y deviate in opposite directions, i.e., if an increase (or decrease) in the values of the variable X results, on the average, in a corresponding decrease (or increase) in the values of the other variable Y.

The price and demand of a commodity, or the expenditure and saving of a person, are examples of negative or inverse correlation.

Linear Correlation

If the plotted points $(x_i, y_i)$ lie on or near a straight line, then the correlation between the variables is said to be linear (see Fig. 7.1). For example, the correlation between the savings and earnings of a man is a linear correlation.

Perfectly Linear Correlation

When all the plotted points lie exactly on a straight line, then the correlation is said to be perfectly linear (see Fig. 4.1).


Perfect Correlation

If the deviation in one variable is followed by a corresponding and proportional deviation in the other, then the correlation is said to be perfect correlation.

Direct or Perfect Positive Correlation

If the correlation is perfectly linear and the line runs from the lower left hand corner to the upper right hand corner, then it is called direct or perfect positive correlation.

In other words, if the equal proportional changes in the two variates x and y are in the same direction, then it is a perfect positive correlation (see Fig. 4.1).

In this case r = 1 or ρ(x, y) = 1.

Inverse or Perfect Negative Correlation

It is a perfectly linear correlation in which the line runs from the upper left hand corner to the lower right hand corner (see Fig. 4.2).

In other words, if the equal proportional changes in the two variates x and y are in opposite directions, then the relation between x and y is inverse or perfect negative correlation.

In this case r = -1, i.e., ρ(x, y) = -1.


High Degree Positive Correlation

If the plotted points $(x_i, y_i)$, i = 1, 2, 3, ..., n, fall in a narrow band and the points rise from the lower left hand corner to the upper right hand corner, then there is a high degree of positive correlation between x and y (see Fig. 4.3).

High Degree Negative Correlation

If the plotted points $(x_i, y_i)$, i = 1, 2, ..., n, fall in a narrow band running from the upper left hand corner to the lower right hand corner, then there is a high degree of negative correlation between x and y (see Fig. 4.4).

Low Degree Positive or Negative Correlation

Figures 4.5 and 4.6 show a low degree of positive and negative correlation between the variates x and y, because the points are scattered far away from the line.

No Correlation


If the plotted points lie scattered all over the graph paper, then there is no correlation between the two variables (see Fig. 4.7). Here r = 0. Such variables are often loosely described as independent, although r = 0 by itself only rules out a linear relationship.

Notes. 1. The more closely the plotted points lie along a straight line, the greater is the magnitude of the correlation coefficient.

2. If the dots are widely scattered, then the correlation between the variables is poor.

METHODS TO STUDY CORRELATION

The intensity of correlation between two variates x and y can be measured by the following methods.

(i) Scatter Diagram Method or Dot Diagram Method

(ii) Karl Pearson's Coefficient of Correlation Method

(iii) Spearman's Rank Correlation Method

(iv) Concurrent Deviation Method

(v) Two-way Frequency Table Method

SCATTER DIAGRAM OR DOT DIAGRAM METHOD

A scatter diagram is a graphical method of showing the correlation between the two variables x and y. Let $(x_i, y_i)$, i = 1, 2, 3, ..., n, be a bivariate distribution. Let the values of the variables x and y be plotted along the x-axis and y-axis in a coordinate plane, choosing a suitable scale so that it covers the range of the data of both the variates (series) under consideration. Then to every ordered pair $(x_i, y_i)$ there corresponds a point or a dot in the coordinate plane. The diagram of dots or points so obtained is called a scatter diagram or a dot diagram.

The scatter diagram may indicate both the degree and the type of correlation. From a scatter diagram we can form a fairly good, though rough, idea about the relationship between the two variables. The different types of correlation are depicted by means of scatter diagrams as shown in Figures 4.1, 4.2, 4.3, 4.4, 4.5, 4.6 and 4.7.

KARL PEARSON’S COEFFICIENT OF CORRELATION


Karl Pearson (1857 - 1936) was a great statistician. He gave the following mathematical formula for measuring the magnitude of the linear correlation between two variables. If X and Y are two variables, then the correlation coefficient ρ(X, Y) between them is given by:

FIRST FORM:

$$\rho(X, Y) = \frac{\text{Cov.}(X, Y)}{\sqrt{\text{Var.}(X) \times \text{Var.}(Y)}} \qquad \ldots (1)$$

SECOND FORM:

$$\rho(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\bar{x}$, $\bar{y}$ are the means of the X and Y series respectively.

ρ(X, Y) is also written as $r_{xy}$ or r or simply ρ.

THIRD FORM:

$$\rho(X, Y) = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left[ \sum x_i^2 - \dfrac{(\sum x_i)^2}{n} \right] \left[ \sum y_i^2 - \dfrac{(\sum y_i)^2}{n} \right]}}$$

This formula saves a lot of computational labour. It reduces the error due to computation and rounding off.

Example 4. Calculate the coefficient of correlation for the following data:

(1, 2), (2, 4), (3, 8), (4, 7), (5, 10), (6, 5), (7, 14), (8, 16), (9, 2), (10, 20).

Solution. Here n = 10.

$\sum x_i$ = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55

$\sum y_i$ = 2 + 4 + 8 + 7 + 10 + 5 + 14 + 16 + 2 + 20 = 88

$\sum x_i^2$ = 1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 + 81 + 100 = 385

$\sum y_i^2$ = 4 + 16 + 64 + 49 + 100 + 25 + 196 + 256 + 4 + 400 = 1114

$\sum x_i y_i$ = 2 + 8 + 24 + 28 + 50 + 30 + 98 + 128 + 18 + 200 = 586

Now

$$\rho(X, Y) = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left[ \sum x_i^2 - \dfrac{(\sum x_i)^2}{n} \right] \left[ \sum y_i^2 - \dfrac{(\sum y_i)^2}{n} \right]}} = \frac{586 - \dfrac{55 \times 88}{10}}{\sqrt{\left[ 385 - \dfrac{55 \times 55}{10} \right] \left[ 1114 - \dfrac{88 \times 88}{10} \right]}}$$

$$= \frac{586 - 484}{\sqrt{(385 - 302.5)(1114 - 774.4)}} = \frac{102}{\sqrt{(82.5)(339.6)}} = \frac{102}{\sqrt{28017}} = \frac{102}{167.38} = 0.61 \text{ nearly}.$$

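The above example shows that the computation is not simple. We give below another method to make the computation simpler.

Direct Method

If X and Y are two variates having means $\bar{x}$ and $\bar{y}$ respectively, then

FOURTH FORM:

$$\rho(X, Y) = \frac{\sum dx \, dy}{\sqrt{\sum dx^2 \times \sum dy^2}}$$

where $dx = x_i - \bar{x}$, $dy = y_i - \bar{y}$, $dx^2 = (x_i - \bar{x})^2$ and $dy^2 = (y_i - \bar{y})^2$.

It can also be written as:

FIFTH FORM:

$$r_{xy} = \frac{\sum dx \, dy}{n \, \sigma_x \, \sigma_y},$$

where n is the number of observations in the X or Y series, and $\sigma_x$ and $\sigma_y$ are the standard deviations of the X and Y series respectively.

The following formula can also be deduced from the above formula:

SIXTH FORM:

$$\rho(X, Y) = \frac{\sum dx \, dy - \dfrac{\sum dx \sum dy}{n}}{\sqrt{\left[ \sum dx^2 - \dfrac{(\sum dx)^2}{n} \right] \left[ \sum dy^2 - \dfrac{(\sum dy)^2}{n} \right]}}$$

where n is the number of observations. When dx and dy are measured from the actual means, $\sum dx = \sum dy = 0$ and this reduces to the fourth form; the sixth form is therefore of use when the deviations are taken from assumed means.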
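The forms above can be checked against one another on the data of Example 4. The sketch below implements the third form (raw sums) and the fourth form (deviations from the means); the function names are illustrative assumptions.

```python
from math import sqrt

# Data of Example 4.
data = [(1, 2), (2, 4), (3, 8), (4, 7), (5, 10),
        (6, 5), (7, 14), (8, 16), (9, 2), (10, 20)]


def pearson_from_sums(pairs):
    """Third form: uses only the sums of x, y, x^2, y^2 and xy."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


def pearson_from_deviations(pairs):
    """Fourth form: uses deviations dx, dy from the actual means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    dx = [x - mean_x for x, _ in pairs]
    dy = [y - mean_y for _, y in pairs]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


r_sums = pearson_from_sums(data)
r_dev = pearson_from_deviations(data)
print(round(r_sums, 2), round(r_dev, 2))
```

Both functions should give the same value, agreeing with the hand computation of about 0.61.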

Merits and Limitations of Pearson’s Correlation Co-efficient:

Karl Pearson's co-efficient of correlation is the most widely used measure for expressing the relationship between two variables. Both the degree and the direction of the relationship between the variables can be obtained from it. However, it has the following limitations.

(1) It is based on the assumption of linearity of the relationship between the variables.

(2) The computation by this method is difficult compared to other methods.

(3) The correlation co-efficient is highly influenced by extreme pairs of observations.

(4) It is often difficult to interpret the correlation co-efficient correctly.

Spearman’s Method of Rank Correlation:

Prof. Charles Edward Spearman gave a method of finding the correlation co-efficient between two attributes. In this method ranks are used instead of values to find the correlation co-efficient, and hence the method is known as the method of rank correlation. We know that qualitative phenomena cannot be expressed numerically, but it is convenient to assign ranks to them. For example, suppose there are 10 competitors in a beauty contest. It is inconvenient to give marks to these competitors; instead they can easily be assigned ranks such as first rank, second rank, etc. If two judges have given ranks to the same participants, then we may be interested in knowing how far the two judges agree in assigning ranks. This can be measured by the co-efficient of rank correlation. The method of rank correlation can thus be used for finding the relationship between two qualitative phenomena like honesty, intelligence, poverty, etc.

This method is also useful when the variation in the series is large. Pearson's method becomes tedious in such situations.

For finding the correlation co-efficient, first of all ranks are given separately to the values of each series. In each series the highest value is given the first rank, the next value is given the second rank, and so on. Thus each value is given a rank. If two values are equal, then ambiguity arises. In such cases both values are bracketed and each is given a rank equal to the average of the ranks they would otherwise occupy. For example, suppose the marks of 8 students in statistics are 63, 65, 87, 24, 61, 57, 50, 61. If we want to give ranks to these students, then the student securing 87 marks will be given the first rank. The second rank will be given to the student getting 65 marks, and the third rank to the student getting 63 marks. Now there are two students getting 61 marks. They can each be given the rank (4 + 5)/2 = 4.5. The next student, getting 57 marks, will be given the sixth rank, and so on.

After assigning ranks in the two series, the differences in the corresponding ranks of the two series are obtained for all the pairs. These differences are denoted by d and the value of $\sum d^2$ is obtained. The co-efficient of rank correlation is then calculated by the following formula:

$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where n = number of pairs.

When finding the rank correlation co-efficient in cases where some observations are repeated (tied), the following adjustment or correction is necessary.

To $\sum d^2$, the quantity $\frac{m}{12}(m^2 - 1)$ is added, where m is the number of times an item is repeated. The correction factor should be added for each repeated value. The formula for the rank correlation co-efficient will then be as follows:

$$r = 1 - \frac{6 \left[ \sum d^2 + \dfrac{m_1}{12}(m_1^2 - 1) + \dfrac{m_2}{12}(m_2^2 - 1) + \cdots \right]}{n(n^2 - 1)}$$

The value of the correlation co-efficient by Spearman's method also lies between -1 and +1. If the ranks are the same for each pair of the two series, then each value of d = 0, hence $\sum d^2$ = 0 and the value of r = +1, which shows perfect positive correlation between the two variables. If the ranks are exactly in reverse order for each pair of the two series, then the value of r = -1, which shows perfect negative correlation between the variables.
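The ranking rule and the rank correlation formula, including the correction for repeated values, can be sketched as follows. The function names are illustrative assumptions, and the judges' scores in the usage part are made-up data; only the list of marks comes from the text.

```python
def average_ranks(values):
    """Rank values so that the highest gets rank 1; tied values share
    the average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(m * (m * m - 1) / 12 for m in counts.values() if m > 1)


def spearman_rank_r(xs, ys):
    """r = 1 - 6 * (sum d^2 + tie corrections) / (n (n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    sum_d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Marks from the text: the two students with 61 marks share rank 4.5.
marks = [63, 65, 87, 24, 61, 57, 50, 61]
mark_ranks = average_ranks(marks)
print(mark_ranks)

# Hypothetical scores by two judges (illustrative data only).
judge_a = [9, 7, 8, 5, 6]
judge_b = [8, 6, 9, 4, 7]
r_judges = spearman_rank_r(judge_a, judge_b)
print(round(r_judges, 2))
```

Libraries often compute the rank correlation as Pearson's coefficient applied to the ranks, which can differ slightly from the corrected formula above when ties are present.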

Merits and Limitations of Rank Correlation Method:

Merits

(1) This method is easier to understand and apply compared to Karl Pearson’s method.

(2) When the data are of a qualitative nature, like honesty, beauty, intelligence, etc., this method is convenient.

(3) When the dispersion in a series is more this method is useful.

(4) When the ranks are given instead of values then this is the only method that can be used.

Limitations:

(1) This method does not give accurate results as compared to Pearson’s method.

(2) When there are more observations, it is tedious to assign ranks.

(3) The method cannot be used for data given in a bivariate frequency distribution.

Now we shall use this method in some of the examples.

Method of Concurrent Deviations:

A very simple and easy method, based on the direction of change of the variables, is the method of concurrent deviations. In this method the direction of change in the value of the variable x is noted by comparing it with the preceding value. Similarly the direction of change in the value of y is also noted. The direction of change of each value of x will be either + or -. Similarly the direction of change of each value of y will be either + or -. For any pair of values of x and y, if the changes are in the same direction, the deviations are said to be concurrent. The following formula is used for finding the correlation co-efficient by this method.

$$r = \pm \sqrt{\pm \frac{2c - n}{n}}$$

where r = correlation co-efficient,

c = number of pairs of concurrent deviations, i.e. number of pairs having the same sign,

n = number of pairs of deviations compared.

Note:

(i) If the value of x increases compared to the preceding value, the deviation of x is taken as +; if it decreases, the deviation is taken as -; and if there is no increase or decrease, the sign = is taken. Similarly the signs of the deviations of y are noted. These directions of change are denoted by Cx and Cy.

(ii) A pair of observations having the same signs of deviations is called a concurrent deviation. c denotes the number of such concurrent deviations.

(iii) The sign of r depends upon the sign of 2c - n. If 2c - n is positive, r is positive, and if 2c - n is negative, r is negative. The same sign is used inside the square root so that the quantity under the root is non-negative.

(iv) It should be noted that n is the number of pairs of deviations and not the number of pairs of observations. In fact, it is one less than the number of pairs of observations.

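ILLUSTRATION 7 : Find the co-efficient of correlation by the method of concurrent deviations.

| Year | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Import | 28 | 82 | 50 | 170 | 170 | 260 | 240 | 300 | 320 | 350 | 300 |
| Export | 120 | 82 | 38 | 32 | 57 | 55 | 57 | 50 | 60 | 50 | 62 |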

Ans.

| Year | Import x | Export y | Deviation of x (Cx) | Deviation of y (Cy) | Concurrent deviation |
|---|---|---|---|---|---|
| 1991 | 28 | 120 | | | |
| 1992 | 82 | 82 | + | - | - |
| 1993 | 50 | 38 | - | - | + |
| 1994 | 170 | 32 | + | - | - |
| 1995 | 170 | 57 | = | + | - |
| 1996 | 260 | 55 | + | - | - |
| 1997 | 240 | 57 | - | + | - |
| 1998 | 300 | 50 | + | - | - |
| 1999 | 320 | 60 | + | + | + |
| 2000 | 350 | 50 | + | - | - |
| 2001 | 300 | 62 | - | + | - |

In the last column, + marks a concurrent deviation, so c = 2.

Here n = 10 and c = 2. Since 2c - n = 4 - 10 = -6 is negative, the negative sign is taken both inside and outside the square root:

$$r = \pm \sqrt{\pm \frac{2c - n}{n}} = -\sqrt{-\frac{4 - 10}{10}} = -\sqrt{\frac{6}{10}} = -\sqrt{0.6} = -0.77$$
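The steps of Illustration 7 can be sketched in code as follows. The function names are illustrative, and counting a pair in which both series are unchanged ("=" and "=") as concurrent is an assumption of this sketch; no such pair occurs in the data above.

```python
from math import sqrt


def signs_of_change(series):
    """Sign of each value's change relative to the preceding value."""
    signs = []
    for previous, current in zip(series, series[1:]):
        if current > previous:
            signs.append("+")
        elif current < previous:
            signs.append("-")
        else:
            signs.append("=")
    return signs


def concurrent_deviation_r(xs, ys):
    """Return (c, n, r) with r = +/- sqrt(+/- (2c - n) / n)."""
    cx = signs_of_change(xs)
    cy = signs_of_change(ys)
    n = len(cx)  # number of pairs of deviations, one less than the observations
    # Assumption: identical signs, including "=" with "=", count as concurrent.
    c = sum(1 for a, b in zip(cx, cy) if a == b)
    inner = (2 * c - n) / n
    r = sqrt(abs(inner))
    return c, n, (r if inner >= 0 else -r)


# Import and export figures of Illustration 7 (1991 to 2001).
imports_series = [28, 82, 50, 170, 170, 260, 240, 300, 320, 350, 300]
exports_series = [120, 82, 38, 32, 57, 55, 57, 50, 60, 50, 62]

c, n, r = concurrent_deviation_r(imports_series, exports_series)
print(c, n, round(r, 2))
```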

Merits and limitations of concurrent deviation method

Merits:

(1) The method is simple and easy.

(2) If the number of pairs is very large, the method can be easily used.


Limitations:

(1) As actual values are not used it is not a reliable method.

(2) The method gives only a rough idea of co-efficient of correlation.

4. Interpretation of Correlation Co-efficient :

The correlation co-efficient expresses the degree and direction of the relationship between the variables. Having obtained the value of the correlation co-efficient, it is essential to interpret it. The sign of the correlation co-efficient indicates the direction of the relationship, while the numerical value indicates the closeness of the relationship. However, the interpretation of the correlation co-efficient depends largely upon experience in dealing with such problems. The following general rules are useful in interpreting the value of the correlation co-efficient.

(1) Interpretation of r = +1: r = +1 shows perfect positive correlation between two variables. For such variables an increase in the value of one variable is associated with a proportional increase in the value of the other variable. The points on the scatter diagram for such variables lie on a straight line in increasing order.

(2) Interpretation of r = -1: r = -1 shows perfect negative correlation between two variables. For such variables an increase in the value of one variable is associated with a proportional decrease in the value of the other variable. The points on the scatter diagram for such variables lie on one straight line in decreasing order.

(3) Interpretation of r = 0: r = 0 shows absence of a linear relationship between the variables. Such variables are said to be linearly uncorrelated, and the points on the scatter diagram show no linear trend. Note that r = 0 does not by itself establish that the variables are independent, since a non-linear relationship may still exist.

(4) If the value of r is nearer to +1 or -1, the relationship between the variables is closer, and if the value of r is nearer to zero, the relationship is less close.

(5) The closeness of the relationship between the variables is not proportional to the value of r: r = 0.8 does not indicate that the relation is two times closer than when r = 0.4. It indicates only that the relationship is closer than when r = 0.4.

(6) Before interpreting the value of r, we should examine whether there exists a cause and effect relationship between the variables.

(7) In estimating the population correlation co-efficient from the value of the sample correlation co-efficient, the probable error of r should also be taken into consideration.

Now, we shall discuss about probable error of the correlation co-efficient.


5. Probable Error :

Generally we obtain the correlation co-efficient of a sample drawn from a bivariate population. If different samples of the same size are drawn from a given population, we get different values of r. In general these values of r differ from the actual value of the population correlation co-efficient. The average of the absolute differences between the correlation co-efficients obtained from all possible samples and the population correlation co-efficient is known as the probable error of the correlation co-efficient. The value of the probable error depends upon the size of the sample. If the sample is large, the value of the probable error is small. From the value of the sample correlation co-efficient we can estimate the population correlation co-efficient, and with the help of the probable error we can determine whether the correlation in the population is significant or not.

If a sample of size n is drawn from a bivariate population and if r is its correlation co-efficient, then the probable error (P.E.) can be found by the following formula:

$$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$

The following rules can be applied to judge whether the correlation in the population is significant or not:

(1) If |r| < P.E., there is no evidence of correlation in the population, i.e. the correlation in the population is not significant.

(2) If |r| > 6 (P.E.), there is evidence of significant correlation in the population.

Moreover, with the help of the probable error we can determine the limits within which the population correlation co-efficient is expected to lie.

The probable limits of the population correlation co-efficient are r ± P.E.
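The probable error formula and the two rules can be sketched as below. The function names are illustrative, and the label "inconclusive" for values lying between P.E. and 6 (P.E.) is an assumption of this sketch, since the rules above do not name that case. The inputs r = 0.61 and n = 10 are borrowed from Example 4 purely for illustration.

```python
from math import sqrt


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


def judge_significance(r, n):
    """Apply the two rules of thumb based on the probable error."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    # Assumption: the text gives no rule for P.E. <= |r| <= 6 P.E.
    return "inconclusive"


def probable_limits(r, n):
    """Limits r - P.E. and r + P.E. for the population co-efficient."""
    pe = probable_error(r, n)
    return r - pe, r + pe


pe_example = probable_error(0.61, 10)
verdict_example = judge_significance(0.61, 10)
lower, upper = probable_limits(0.61, 10)
print(round(pe_example, 3), verdict_example, round(lower, 3), round(upper, 3))
```

With only ten pairs, a sample value of r = 0.61 is larger than P.E. but smaller than 6 (P.E.), so neither rule settles the question.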

6. Co-efficient of Determination

We know that the study of correlation and regression is based on the assumption that the relationship between x and y is linear or close to linear. If this assumption is correct, our study may be regarded as valid, and if it is not correct, the study is not valid. A very powerful tool for checking this is the co-efficient of determination, denoted by $R^2$. The value of $R^2$ can be proved equal to $r^2$, i.e. the square of the correlation co-efficient between the variables. If $R^2$ is nearer to 1, the assumption about the linear relationship between the variables may be regarded as valid. On the other hand, if $R^2$ is nearer to zero, the assumption about the linear relationship may be regarded as invalid.

The co-efficient of determination is also a more useful and readily comprehensible measure, since it gives the percentage of variation in the dependent variable that is accounted for by the independent variable.


If $R^2$ = 0.64, it means that only 64% of the variation in the dependent variable has been explained by the independent variable and the remaining 36% of the variation is due to other factors. Thus the total variation in the dependent variable y can be divided into two parts:

(i) the variation due to the independent variable x, which is expressed as explained variance;

(ii) the variation due to other factors, which is expressed as unexplained variance.

If the arithmetic mean of y is $\bar{y}$ and the estimated value of y derived from the regression line is denoted by y', then

Total Variation = $\sum (y - \bar{y})^2$

Explained Variation = $\sum (y' - \bar{y})^2$

Unexplained Variation = $\sum (y - y')^2$

$$\sum (y - \bar{y})^2 = \sum (y' - \bar{y})^2 + \sum (y - y')^2$$

The co-efficient of determination is

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}} = \frac{\sum (y' - \bar{y})^2}{\sum (y - \bar{y})^2}$$

The co-efficient of determination $R^2$ always lies between 0 and 1, i.e. it is non-negative, and as such it does not tell us about the direction of the relationship (whether it is positive or negative) between the two variables.

The co-efficient of non-determination is 1 - R2.
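The decomposition of the total variation, and the equality of $R^2$ and $r^2$, can be checked numerically. The sketch below reuses the data of Example 4 and assumes the usual least-squares regression line of y on x, y' = a + bx with slope b = Cov. (X, Y) / Var. (X), which has not yet been derived at this point of the text; the variable names are illustrative.

```python
from math import sqrt

# Data of Example 4, reused here for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 8, 7, 10, 5, 14, 16, 2, 20]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Assumed least-squares line of y on x: y' = a + b x.
b = sxy / sxx
a = mean_y - b * mean_x
estimates = [a + b * x for x in xs]

total_variation = syy
explained_variation = sum((e - mean_y) ** 2 for e in estimates)
unexplained_variation = sum((y - e) ** 2 for y, e in zip(ys, estimates))

r = sxy / sqrt(sxx * syy)
r_squared = explained_variation / total_variation      # co-efficient of determination
non_determination = unexplained_variation / total_variation

print(round(r_squared, 4), round(r * r, 4), round(non_determination, 4))
```

For these data roughly 37% of the variation in y is explained by x, consistent with r being about 0.61.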


LINEAR REGRESSION

Meaning of Regression:

When there are simultaneous changes in the values of two variables, and when the changes in one variable are due to the changes in the other variable, they are said to be correlated, and the correlation co-efficient expresses the extent of the relationship between them. But if we want to know the value of one variable when the value of the other variable is given, correlation analysis cannot help us. For example, if we are given the figures regarding rainfall and yield of rice for the last 10 years, we can find the correlation co-efficient between them. But that cannot help us in estimating the yield of the current year when we know the rainfall. For estimating the value of one variable for a given value of another variable, we must find some functional relationship between the variables. Regression is a statistical technique with the help of which the functional relationship between two variables can be established, and which helps us in estimating the unknown value of one variable for a known value of the other variable. The word regression was first used by Sir Francis Galton at the end of the nineteenth century. He used the word regression while studying the relationship between the heights of fathers and the heights of sons.

In the study of regression a mathematical model is used for representing the relationship between two variables, i.e. some mathematical equation is obtained to represent the relationship. If there is a cause and effect relationship between two variables, a change in the value of one variable will result in a corresponding change in the value of the other variable. The variable in which we make changes is called the causal variable or independent variable. Generally it is denoted by x. By making changes in the values of the causal variable x, the other variable, in the form of effect, also changes. The variable in the form of effect is called the dependent variable and usually it is denoted by y. In the relationship between income and expenditure, income is the independent variable (x) and expenditure is the dependent variable (y). In the study of rainfall and the yield of rice, the amount of rainfall is the independent variable, while the yield of rice is the dependent variable. In the regression model the dependent variable y is expressed as a function of the independent variable x. Thus regression is a relationship between two variables determined by an appropriate mathematical function. If this relationship is represented by some curve, it is called curvilinear regression, and if the relationship is represented by a straight line it is called linear regression. In this chapter we shall study the linear regression model between a dependent variable and an independent variable.

In most cases the independent and dependent variables can be easily decided, as in the income and expenditure of families or the rainfall and yield of a crop mentioned above. Denoting the independent variable by x and the dependent variable by y, we obtain the regression line of y on x.

In some cases it is not easy to determine the independent and dependent variables. For example, in the study of the relationship between demand and price, one can think in the following two ways: (1) demand depends upon price, or (2) price depends upon demand. Similarly, in the study of the relationship between height and weight the independent and dependent variables cannot be categorically decided. In such cases the two variables are found to be mutually dependent, and it becomes difficult to determine which variable should be regarded as dependent and which as independent. In such cases two regression lines are obtained.

8. Regression Lines:

We know that the simplest method of studying the relationship between two variables is the method of scatter diagram. In the figure a scatter diagram is shown for two variables. Generally all the points of the scatter diagram do not lie on one straight line, and hence the line around which most of the points lie may be regarded as a line showing the relationship between the variables. A number of such lines can be thought of, and we must find the best line out of all of them. The well known mathematical principle of Least Squares can be used to obtain such a line. The line obtained by the least squares principle is known as the Line of Best Fit. It is also called the best estimating line or the regression line. Thus the regression line is the best average line obtained by the least squares principle.

Taking x as the independent variable and y as the dependent variable, the line obtained by the least squares principle is called the regression line of y on x. We know that the equation of any straight line can be written in the form y = a + bx. Hence the line obtained by the least squares principle from the points of the scatter diagram can be represented as y = a + bx. More specifically, this line will be represented as $y = a + b_{yx} x$.


In cases in which y is taken as the independent variable and x as the dependent variable, the line obtained by the least squares principle is called the regression line of x on y, and we shall write its equation as $x = A + b_{xy} y$.

Thus for two mutually related variables we can obtain two regression lines.
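As a sketch of how the least squares principle leads to the line of y on x, the derivation below minimises the sum of squared vertical deviations. The symbol S for that sum is introduced here only for illustration; a and b stand for the intercept and slope of the line y = a + bx.

```latex
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting the partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum y = n a + b \sum x
\qquad
\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum xy = a \sum x + b \sum x^2

% Solving the two equations for b and a
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2},
\qquad
a = \bar{y} - b \bar{x}
```

The line of x on y follows in the same way by minimising the horizontal deviations $\sum (x_i - A - B y_i)^2$, which is why the two lines generally differ.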

9. Equations of Lines of Regression:

The equation of the regression line of y on x obtained by the least squares principle will be as follows:

Equation of regression line of y on x

$y = a + b_{yx} \cdot x$ ....(1)

Similarly, if the regression line of x on y is obtained by applying the least squares principle, its equation will be as follows:

Equation of regression line of x on y

$x = A + b_{xy} \cdot y$ ....(2)

It is clear that in the regression line of y on x, x is the independent variable and y is the dependent variable. Hence when we want to estimate a value of y for a given value of x, we should use this equation. Similarly in the regression line of x on y, y is the independent variable and x is the dependent variable. Hence for obtaining an estimated value of x for a given value of y we should use the second equation, i.e. the equation of the regression line of x on y.

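In the regression line of y on x, i.e. $y = a + b_{yx} x$, $b_{yx}$ is called the regression co-efficient of y on x and its value can be obtained as follows:

$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{S_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

After obtaining the value of $b_{yx}$, the value of a can be found by using the relation $a = \bar{y} - b_{yx} \bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively. Similarly in the regression equation of x on y, i.e. $x = A + b_{xy} y$, $b_{xy}$ is called the regression co-efficient of x on y and its value can be obtained as follows:

$$b_{xy} = \frac{\operatorname{Cov}(x, y)}{S_y^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$

From the value of $b_{xy}$ the value of A can be found as $A = \bar{x} - b_{xy} \bar{y}$.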
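The two regression lines can be computed directly from these formulae. The following is a minimal sketch in Python; the function name `regression_lines` and the five data pairs are made up for illustration and do not come from the text.

```python
from statistics import mean

def regression_lines(xs, ys):
    """Return (a, b_yx, A, b_xy) for the lines y = a + b_yx*x and x = A + b_xy*y."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of products and squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b_yx = sxy / sxx              # regression co-efficient of y on x
    b_xy = sxy / syy              # regression co-efficient of x on y
    a = y_bar - b_yx * x_bar      # intercept of the line of y on x
    A = x_bar - b_xy * y_bar      # intercept of the line of x on y
    return a, b_yx, A, b_xy

# Illustrative data (hypothetical)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b_yx, A, b_xy = regression_lines(xs, ys)
print(f"y on x: y = {a:.2f} + {b_yx:.2f} x")
print(f"x on y: x = {A:.2f} + {b_xy:.2f} y")
```

Note that the two lines are different: the first is used only for estimating y from x, the second only for estimating x from y.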


10. Calculations of Regression Co-efficients :

From the regression equation of y on x, i.e. $y = a + b_{yx} x$, it can be seen that the regression co-efficient of y on x, i.e. $b_{yx}$, denotes the change in the value of y for a unit change in the value of x. In the regression equation of x on y, i.e. $x = A + b_{xy} y$, the regression co-efficient of x on y, i.e. $b_{xy}$, denotes the change in the value of x for a unit change in the value of y. In short, the regression co-efficient between two variables is a numerical measure showing the change in the value of one variable for a unit change in the value of the other variable.

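The following formulae are useful for calculating $b_{yx}$ and $b_{xy}$.

For calculating $b_{yx}$:

$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{S_x^2} = r\,\frac{S_y}{S_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n S_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

$$b_{yx} = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$$

$$b_{yx} = \frac{n \sum uv - \sum u \sum v}{n \sum u^2 - (\sum u)^2}, \quad \text{where } u = x - A,\; v = y - B$$

$$b_{yx} = \frac{n \sum uv - \sum u \sum v}{n \sum u^2 - (\sum u)^2} \times \frac{C_y}{C_x}, \quad \text{where } u = \frac{x - A}{C_x},\; v = \frac{y - B}{C_y}$$

$$b_{yx} = \frac{n \sum fuv - \sum fu \sum fv}{n \sum fu^2 - (\sum fu)^2} \times \frac{C_y}{C_x} \quad \text{(for bivariate table)}$$

$$a = \bar{y} - b_{yx} \bar{x}$$

For calculating $b_{xy}$:

$$b_{xy} = \frac{\operatorname{Cov}(x, y)}{S_y^2} = r\,\frac{S_x}{S_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n S_y^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$

$$b_{xy} = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2}$$

$$b_{xy} = \frac{n \sum uv - \sum u \sum v}{n \sum v^2 - (\sum v)^2}, \quad \text{where } u = x - A,\; v = y - B$$

$$b_{xy} = \frac{n \sum uv - \sum u \sum v}{n \sum v^2 - (\sum v)^2} \times \frac{C_x}{C_y}, \quad \text{where } u = \frac{x - A}{C_x},\; v = \frac{y - B}{C_y}$$

$$b_{xy} = \frac{n \sum fuv - \sum fu \sum fv}{n \sum fv^2 - (\sum fv)^2} \times \frac{C_x}{C_y} \quad \text{(for bivariate table)}$$

$$A = \bar{x} - b_{xy} \bar{y}$$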
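The short-cut formula with coded values u = (x - A)/Cx and v = (y - B)/Cy can be checked against the direct formula. The sketch below is illustrative only: the helper name `slope`, the data, and the chosen constants A, B, Cx, Cy are assumptions, not values from the text.

```python
def slope(us, vs):
    """(n*sum(uv) - sum(u)*sum(v)) / (n*sum(u^2) - (sum(u))^2): co-efficient of v on u."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    return (n * suv - su * sv) / (n * suu - su ** 2)

# Hypothetical data
xs = [110, 120, 130, 140, 150]
ys = [25, 35, 40, 50, 55]

# Assumed origin and scale for coding
A, Cx = 130, 10
B, Cy = 40, 5
us = [(x - A) / Cx for x in xs]
vs = [(y - B) / Cy for y in ys]

b_direct = slope(xs, ys)              # b_yx from the raw values
b_coded = slope(us, vs) * Cy / Cx     # b_yx recovered from the coded values
print(b_direct, b_coded)
```

The coded values keep the sums small, which is the practical reason for the short-cut method; the factor Cy/Cx restores the original scale.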

Properties of regression co-efficients:

(1) The product of regression co-efficients is equal to the square of the correlation co-efficient.

We have $b_{yx} = r\,\dfrac{S_y}{S_x}$ and $b_{xy} = r\,\dfrac{S_x}{S_y}$, so

$$b_{yx} \times b_{xy} = r\,\frac{S_y}{S_x} \times r\,\frac{S_x}{S_y} = r^2$$

Thus the correlation co-efficient is the geometric mean of the two regression co-efficients, $r = \pm\sqrt{b_{yx}\, b_{xy}}$, taking the sign common to the two regression co-efficients.

(2) $b_{yx}$, $b_{xy}$ and r always have the same sign.

$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{S_x^2}, \quad b_{xy} = \frac{\operatorname{Cov}(x, y)}{S_y^2} \quad \text{and} \quad r = \frac{\operatorname{Cov}(x, y)}{S_x S_y}$$

As $S_x$ and $S_y$ are always positive, the signs of $b_{yx}$, $b_{xy}$ and r depend upon the sign of Cov(x, y). If Cov(x, y) is positive, $b_{yx}$, $b_{xy}$ and r are positive, and if Cov(x, y) is negative, $b_{yx}$, $b_{xy}$ and r are negative. Thus all three always have the same sign.

(3) If two variables have a perfect relationship, one regression co-efficient is the reciprocal of the other.

For a perfect relationship r = ± 1.

Now $b_{yx} \times b_{xy} = r^2 = (\pm 1)^2 = 1$

$$\therefore b_{yx} = \frac{1}{b_{xy}}$$

(4) The product of regression co-efficients is $r^2$, which cannot exceed 1. Hence if one regression co-efficient is numerically greater than 1, the other regression co-efficient must be numerically less than 1.

(5) The regression co-efficients are independent of change of origin, but not of scale.
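The properties above can be checked numerically. The following sketch uses made-up data and a hypothetical helper `coefficients`; it illustrates properties (1), (2) and (5) and is not a proof.

```python
from math import sqrt

def coefficients(xs, ys):
    """Return (b_yx, b_xy, r), using divisor n for covariance and variances."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / var_x, cov / var_y, cov / sqrt(var_x * var_y)

# Hypothetical data with a negative relationship
xs = [1, 2, 3, 4, 5]
ys = [9, 7, 6, 4, 4]
b_yx, b_xy, r = coefficients(xs, ys)

# Property (1): product of the regression co-efficients equals r squared
print(b_yx * b_xy, r ** 2)

# Property (2): all three carry the same sign (negative here)
print(b_yx, b_xy, r)

# Property (5): adding 50 to every value (change of origin) leaves b_yx unchanged,
# while multiplying x by 10 (change of scale) divides b_yx by 10 and leaves r unchanged
shifted = coefficients([x + 50 for x in xs], [y + 50 for y in ys])
scaled = coefficients([10 * x for x in xs], ys)
print(shifted[0], scaled[0], scaled[2])
```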

11. Correlation and Regression:

The relationship between two variables can be studied with the help of correlation and regression. The correlation co-efficient is a numerical measure which shows the degree and direction of the relationship, while regression gives a functional relationship between the variables and helps us in estimating the value of one variable for a given value of another variable.

We have seen that for given bivariate data we can draw two regression lines. These regression lines intersect each other at $(\bar{x}, \bar{y})$. Hence if we know the equations of the two regression lines, we can obtain the mean values of x and y by solving the equations simultaneously. When the two regression lines coincide, the correlation between the variables is perfect, i.e. either perfect positive or perfect negative. If the two lines are perpendicular to each other, the two variables are uncorrelated (r = 0). Zero correlation means only that there is no linear relationship; it does not by itself prove that the variables are independent.
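Recovering the means and the correlation co-efficient from two given regression lines can be sketched as follows. The function name `means_and_r` and the two example lines (y = 4 + 0.5x and x = 1 + 0.5y) are hypothetical, chosen only to illustrate the steps.

```python
from math import sqrt, copysign

def means_and_r(a, b_yx, A, b_xy):
    """Solve y = a + b_yx*x and x = A + b_xy*y for their intersection; also return r."""
    product = b_yx * b_xy
    if product < 0 or product > 1:
        raise ValueError("not a valid pair of regression co-efficients")
    if product == 1:
        raise ValueError("lines coincide (r = +1 or -1); the means are not determined")
    # Substitute x = A + b_xy*y into the first equation and solve for y
    y_bar = (a + b_yx * A) / (1 - product)
    x_bar = A + b_xy * y_bar
    # r is the geometric mean of the co-efficients, with their common sign
    r = copysign(sqrt(product), b_yx)
    return x_bar, y_bar, r

x_bar, y_bar, r = means_and_r(a=4, b_yx=0.5, A=1, b_xy=0.5)
print(x_bar, y_bar, r)
```

The validity check reflects the property that the product of the two regression co-efficients equals r squared and so must lie between 0 and 1.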

The greater the angle between the two regression lines, the lesser is the correlation between the variables. Thus the angle between the two regression lines can be taken as a measure of the correlation co-efficient between the variables.
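The link between the angle and r can be illustrated with a short sketch. It assumes the lines are drawn with x on the horizontal axis, so the line of y on x has direction (1, b_yx) and the line of x on y has direction (b_xy, 1); the function name and the sample values of r are made up for illustration.

```python
from math import acos, degrees, hypot

def angle_between_regression_lines(r, s_x, s_y):
    """Acute angle (in degrees) between the two regression lines."""
    b_yx = r * s_y / s_x
    b_xy = r * s_x / s_y
    d1 = (1.0, b_yx)   # y on x: y changes by b_yx per unit change in x
    d2 = (b_xy, 1.0)   # x on y: x changes by b_xy per unit change in y
    cos_t = abs(d1[0] * d2[0] + d1[1] * d2[1]) / (hypot(*d1) * hypot(*d2))
    return degrees(acos(min(1.0, cos_t)))  # clamp guards against rounding above 1

for r in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(r, round(angle_between_regression_lines(r, 1.0, 1.0), 2))
```

With equal standard deviations the angle falls from 90 degrees at r = 0 to 0 degrees at r = 1, matching the statement that a wider angle means weaker correlation.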

Difference between correlation and regression:

(1) Correlation: It gives a numerical measure of the linear relationship between the variables.
Regression: It gives a functional relationship between the variables, and this relationship helps us in estimating the value of one variable for a given value of another variable.

(2) Correlation: The correlation co-efficient is always between –1 and +1.
Regression: One regression co-efficient can be greater than 1.

(3) Correlation: The correlation co-efficient is independent of change of origin and scale.
Regression: Regression co-efficients are independent of change of origin but not of scale.

(4) Correlation: The correlation co-efficient can be obtained from the regression co-efficients.
Regression: Regression co-efficients cannot be obtained from the correlation co-efficient alone.

Explained and Unexplained variance

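Suppose $y = a + b_{yx} x$ is the regression line of y on x, and $\bar{x}$ and $\bar{y}$ are the means of x and y respectively. For a value $x_i$ of x, we can obtain an estimated value $y'_i$ of y. It can be easily verified that

$$\sum (y_i - \bar{y})^2 = \sum (y_i - y'_i)^2 + \sum (y'_i - \bar{y})^2$$

$$\therefore \frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum (y_i - y'_i)^2}{n} + \frac{\sum (y'_i - \bar{y})^2}{n}$$

Here

$\dfrac{\sum (y_i - \bar{y})^2}{n}$ is called the total variance,

$\dfrac{\sum (y'_i - \bar{y})^2}{n}$ is called the explained variance,

and $\dfrac{\sum (y_i - y'_i)^2}{n}$ is called the unexplained variance.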

Thus,

Total variance = unexplained variance + explained variance

The ratio of explained variance to total variance is called the co-efficient of determination and it is denoted by $R^2$.

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}} = \frac{\sum (y'_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$

The proportion of explained variance indicates the strength of the relationship between the variables.

$K^2 = 1 - R^2$ is called the co-efficient of non-determination. The co-efficient of non-determination indicates the proportion of total variance not explained by the independent variable.

$$K^2 = 1 - R^2 = \frac{\sum (y_i - y'_i)^2}{\sum (y_i - \bar{y})^2}$$
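The split of total variance into explained and unexplained parts can be verified on a small data set. This is a minimal sketch; the function name `variance_decomposition` and the data are hypothetical.

```python
def variance_decomposition(xs, ys):
    """Return (total, explained, unexplained) variances for the line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b_yx = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b_yx * x_bar
    fitted = [a + b_yx * x for x in xs]          # estimated values y'
    total = sum((y - y_bar) ** 2 for y in ys) / n
    explained = sum((f - y_bar) ** 2 for f in fitted) / n
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / n
    return total, explained, unexplained

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
total, explained, unexplained = variance_decomposition(xs, ys)
r_squared = explained / total        # co-efficient of determination
k_squared = unexplained / total      # co-efficient of non-determination
print(total, explained, unexplained, r_squared, k_squared)
```

For this data 60% of the variance of y is explained by x and 40% is left unexplained, and the two proportions add up to 1.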


12. Utility of Study of Regression:

Regression is a functional relationship between the variables. The following are the main uses of its study:

(1) It helps us in estimating the value of one variable for a given value of the other variable.

(2) It helps us in estimating the change in the value of one variable for a unit increase in the value of the other variable.

(3) The error committed in the estimated value can be known by regression analysis.
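The three uses can be sketched from summary figures alone. The function name and the numbers below are hypothetical; the error measure used is the standard error of estimate of y on x, $S_y\sqrt{1 - r^2}$, which is a standard result assumed here rather than one derived in this chapter.

```python
from math import sqrt

def estimate_y(x, x_bar, y_bar, s_x, s_y, r):
    """Estimate y for a given x from means, standard deviations and r."""
    b_yx = r * s_y / s_x                    # change in y per unit change in x
    y_hat = y_bar + b_yx * (x - x_bar)      # line of y on x passes through the means
    std_error = s_y * sqrt(1 - r ** 2)      # typical size of the estimation error
    return y_hat, b_yx, std_error

# Hypothetical summary figures
y_hat, b_yx, std_error = estimate_y(x=60, x_bar=50, y_bar=30, s_x=5, s_y=4, r=0.75)
print(y_hat, b_yx, round(std_error, 4))
```

A stronger correlation shrinks the standard error, so estimates from the regression line become more reliable as r approaches +1 or -1.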

* * *


CLASS WORK

1. The formula of correlation coefficient was developed in the year

(a) 1910 (b) 1890 (c) 1908 (d) none of the above.

2. The idea of product moment correlation was given by

(a) R.A. Fisher (b) Sir Francis Galton

(c) Karl Pearson (d) Spearman

3. If the correlation between two variables X and Y is negative, the regression coefficient of Y on X is

(a) positive (b) negative

(c) not certain (d) none of the above

4. The unit of correlation coefficients is

(a) kg (b) per cent

(c) not existing (d) none of the above.

5. The value of correlation co-efficient r is from

(a) 0 to 1 (b) –1 to 0 (c) –1 to 1 (d) – 3 to 3

6. If r = 1, the relation between the two variables X and Y is :

(a) Y is proportional to X (b) Y is inversely proportional to X

(c) Y is equal to X (d) none of the above.

7. If r = 0, the variables X and Y are

(a) linearly related (b) independent

(c) not linearly related (d) none of the above.

8. If r = –1, the relation between X and Y is of the type:

(a) When Y increases, X also increases (b) When Y decreases, X also decreases

(c) X is equal to – Y (d) when Y increases, X decreases proportionally

9. The correlation co-efficient between x and y is 0.78, hence the correlation co-efficient between –x and – y is

(a) – 0.78 (b) 0.78

(c) – 0.78 or 0.78 (d) none of them


10. If the points of the scatter diagram are not in a straight line, they indicate ......... of the linear relationship.

(a) positive (b) negative (c) absence (d) can't say

11. If $\sum d^2 = 0$ then there is ................ correlation between the two variables.

(a) no (b) perfect negative

(c) perfect positive (d) partial correlation

12. The correlation co-efficient 0.6 indicates twice the relationship than the correlation co-efficient 0.3.

(a) yes (b) no (c) may be (d) can't say

13. If two variables are independent.

(a) r =1 (b) r = 0 (c) r = –1 (d) r = (a) and (c)

14. The correlation co-efficient between x and y is 0.6, hence the correlation co-efficient between x + 0.2 and y + 0.2 is

(a) 0.8 (b) –0.6 (c) 0.6 (d) 0.4

15. The correlation co-efficient between heights and weights of 1000 persons is 0.6 when heights and weights are expressed in inches and in lbs respectively. If heights and weights are expressed in cms. and kg. respectively, the correlation co-efficient is

(a) 0.6 (b) –0.6

(c) can't say (d) none of them

16. If r = 0, the two variables are

(a) uncorrelated (b) perfectly related

(c) linearly independent (d) none of them

17. A positive significant correlation between the number of shoes produced and the steel produced per year is :

(a) a nonsense correlation (b) a spurious correlation

(c) a meaningless correlation (d) all the above.

18. Given the expected values for two variables X and Y as : E(X) = 8, E(X²) = 10, E(Y) = 3, E(Y²) = 20 and E(XY) = 16. We conclude that

(a) correlation coefficient will be positive (b) correlation coefficient will be negative

(c) expected values are incompatible (d) none of the above

19. If r is the correlation coefficient between X and Y, the correlation coefficient between aX + b and Y is (a > 0).

(a) ar (b) ar + b (c) a²r (d) r


20. If two variables are perfectly positively correlated, r =

(a) –1 (b) 1 (c) 0 (d) between 0 to 1

21. If two variables are negatively correlated, r =

(a) –1 (b) +1

(c) between 0 and 1 (d) between –1 and 0

22. If the simultaneous changes in two variables are in the same direction, the correlation coefficient between them is

(a) –1 (b) 1

(c) between 0 and 1 (d) between –1 and 0

23. If the simultaneous changes in two variables are in opposite direction, the correlation coefficient between them is

(a) –1 (b) 1

(c) between 0 and 1 (d) between –1 and 0

24. The correlation co-efficient r is independent of change of

(a) origin (b) scale

(c) (a) and (b) both (d) none of them

25. In rank correlation, if $\sum d^2 = 0$ then r =

(a) –1 (b) +1 (c) 0 (d) none of them

26. If correlation co-efficient r between x and y is 0.5 then r between 4x and y is

(a) 0.5 (b) –0.5 (c) 2.0 (d) none of them

27. If correlation co-efficient r between x and y is 0.5 then r between 4x + 2 and y is

(a) 2.2 (b) 0.5 (c) –0.5 (d) 0

28. If correlation co-efficient r between x and y is 0.5 then r between – x and – y is

(a) –1 (b) 0.5 (c) –0.5 (d) 0

29. If correlation co-efficient r between x and y is 0.5 then r between x and – y is

(a) –1 (b) 0.5 (c) –0.5 (d) 0

30. If all the points in a scatter diagram are in one line then r =

(a) –1 (b) 1 (c) (a) and (b) (d) (a) or (b)

31. The value of r² lies between

(a) 0 and 1 (b) –1 and 1 (c) –1 and 0 (d) none of them

32. If r is the simple correlation, the quantity r2 is known as

(a) coefficient of determination (b) coefficient of non determination

(c) coefficient of alienation (d) none of the above


33. If r is the simple correlation, the quantity (1 – r2) is called :

(a) coefficient of determination (b) coefficient of nondetermination

(c) coefficient of alienation (d) none of the above

34. The ranks in Statistics and Accountancy of 10 students are given in brackets.

(8, 3), (1, 5), (6, 2), (3, 9), (2, 10), (5, 1), (4, 6), (10, 4), (9, 7), (7, 8).

The rank correlation co-efficient is

(a) 0.32 (b) 0.16 (c) – 0.32 (d) 0.5

35. The ranks of ten students in two subjects A and B are as follows.

A 3 5 8 4 7 10 2 1 6 9

B 7 6 2 8 4 3 10 9 5 1

Spearman's rank correlation co-efficient is

(a) –0.94 (b) 0.94 (c) 0.5 (d) 0

36. The rank correlation co-efficient between marks in statistics and marks in accountancy of 10 students is 0.4. Later on it was noticed that the difference in ranks of a student was taken as 3 instead of 7. The correct value of the correlation co-efficient is

(a) –0.4 (b) 0.4 (c) 0.16 (d) 0.5

37. If the sum of squares of differences in ranks of two variables x and y is 126 and the correlation co-efficient is – 0.5, the number of pairs is

(a) 8 (b) 7 (c) 5 (d) 10

38. The co-efficient of correlation from the following results is

Average of x = 10.5, Average of y = 13.9, S.d. of x = 3.5, S.d. of y = 4.1, n = 10, $\sum xy$ = 1364

(a) 0.6 (b) 0.66 (c) –0.66 (d) 0

39. The co-efficient of correlation from the following results is

n = 10, $\sum x$ = 140, $\sum y$ = 150, $\sum (x - 10)^2$ = 180, $\sum (y - 15)^2$ = 215, $\sum (x - 10)(y - 15)$ = 60

(a) –0.5 (b) 0.91 (c) –0.9 (d) none of them

40. If x and y are measured from their respective means, the correlation co-efficient from the following data is

n = 20, $\sum xy$ = 400, Sx = 16, Sy = 2.5

(a) –0.5 (b) 0.5 (c) –1 (d) 1

41. The correlation co-efficient between x and y from the following data is

n = 8, $\sum x$ = 408, $\sum y$ = 272, $\sum (x - 51)^2$ = 42, $\sum (y - 34)^2$ = 60, $\sum (x - 51)(y - 34)$ = –16

(a) –0.32 (b) 0.32 (c) –1 (d) 1


42. For a bivariate sample n = 16, r = 0.8, the probable error of r is

(a) –0.06 (b) 0.06 (c) –1 (d) 1

43. For a bivariate sample r = 0.5 and the probable error of r = 0.125, the value of number of pairs is

(a) 16 (b) 6 (c) 10 (d) 12

44. Find co-efficient of correlation by method of concurrent deviations.

x : 78 80 84 89 81 83 87 94 91

y : 190 178 158 132 164 152 128 88 98

(a) –0.32 (b) 0.32 (c) –1 (d) 1

45. The co-efficient of correlation between two variables x and y is 0.8 and their covariance is 20. If the variance of the x series is 16, the standard deviation of the y series is

(a) 32 (b) 12 (c) –6.25 (d) 6.25

46. The correlation co-efficient between two variables is 0.8. What is the co-efficient of determination?

(a) 0.32 (b) 0.64 (c) 6.4 (d) 0.8

47. Find the co-efficient of correlation, using the method of concurrent deviations, between supply and demand of an item for a ten year period given below.

Year : 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990

Supply : 125 160 164 174 155 170 165 162 172 175

Demand : 115 125 192 190 165 174 124 127 152 169

(a) 0.75 (b) –0.75 (c) –1 (d) 1

48. The co-efficient of correlation by concurrent deviations for the following data is

Supply : 65 40 35 75 63 80 35 20 80 60 50

Demand : 60 55 50 56 30 70 40 35 80 75 80

(a) –0.32 (b) –0.5 (c) –1 (d) 0.89

49. Given that the correlation coefficient between x and y is 0.5, what is the correlation coefficient between 2x – 4 and 3 – 2y ?

(a) –0.5 (b) 0.5 (c) –1 (d) 1

50. If the covariance between two variables is positive, it means that

(a) the variables would change in the same direction
(b) the variables would change in the opposite direction
(c) the variables would not change
(d) none of the above.

51. If the correlation between two variables is zero, it implies that :

(a) two variables are independent
(b) two variables do not have negative correlation
(c) the two variables are not linearly related
(d) all the above.

52. Correlation between the direction of deviations is calculated by the method of :

(a) product moments (b) rank correlation

(c) coefficient of concurrent deviation (d) Kendall's

53. The term regression was introduced by :

(a) R.A. Fisher (b) Sir Francis Galton

(c) Karl Pearson (d) None of the above

54. If X and Y are two variables, there can be at most :

(a) One regression line (b) two regression lines

(c) three regression lines (d) an infinite number of regression lines

55. In a regression line of Y on X, the variable X is known as :

(a) independent variable (b) regressor

(c) explanatory variable (d) all the above

56. Regression equation is also named as :

(a) prediction equation (b) estimating equation

(c) line of average relationship (d) all the above

57. In the regression line Y = a + bX, b is called

(a) slope of the line (b) intercept of the line

(c) neither (a) nor (b) (d) both (a) and (b)

58. If bYX and bXY are two regression coefficients, they have

(a) same sign (b) opposite sign

(c) either same or opposite signs (d) nothing can be said

59. The property that bYX and bXY and r have same signs, is called

(a) fundamental property (b) signature property

(c) magnitude property (d) none of the above.

60. The property that the average of the two regression coefficients is always greater than or equal to the correlation coefficient is called

(a) fundamental property (b) signature property

(c) magnitude property (d) mean property


61. If byx = – 0.8, bxy = – 0.45, then r =

(a) 0.5 (b) –0.6 (c) 0.6 (d) –0.36

62. The two regression co-efficients are 0.8 and 0.2, hence correlation co-efficient is

(a) 0.4 (b) – 0.16 (c) 0.16 (d) –0.4

63. One of the regression co-efficients of two perfectly correlated variables is 0.5, hence the other regression co-efficient is

(a) 0.5 (b) –0.5 (c) 2 (d) –2

64. The two regression lines are perpendicular to each other, hence the correlation co-efficient between them is

(a) 0.5 (b) –1 (c) 1 (d) 0

65. The greater the angle between the regression lines, .......... the correlation between the variables.

(a) lesser (b) higher (c) medium (d) none of them

66. If bYX > 1, then bXY is

(a) less than 1 (b) greater than 1

(c) equal to 1 (d) equal to 0

67. The property bYX > 1 implies that bXY < 1 is known as :

(a) fundamental property (b) signature property

(c) magnitude property (d) none of the above

68. If X and Y are independent, the value of regression coefficient bYX is equal to :

(a) 0 (b) 1 (c) (d) any positive value

69. The regression equation of y on x is y = – 3 + 0.5x and that of x on y is x = – 7 + By.

If the correlation co-efficient between x and y is 0.1, then B =

(a) 0.5 (b) –0.5 (c) 0.02 (d) –0.02

70. The probable change in the value of y with unit change in the value of x can be given by

(a) bxy (b) byx (c) r (d) none of them

71. Two regression lines intersect each other at

(a) x, y (b) Sx, Sy (c) r (d) $\bar{x}, \bar{y}$

72. The property if X and Y are independent, then bYX = bXY = 0 is called :

(a) fundamental property (b) mean property

(c) independence property (d) magnitude property


73. The lines of regression intersect at the point :

(a) (X, Y) (b) $(\bar{x}, \bar{y})$ (c) (0, 0) (d) (1, 1)

74. The coordinates $(\bar{x}, \bar{y})$ satisfy the lines of regression of :

(a) Y on X (b) X on Y

(c) both the regression lines (d) none of the two regression lines

75. If r = ± 1, the two lines of regression are :

(a) coincident (b) parallel

(c) perpendicular to each other (d) none of the above

76. If r = 1, the angle between the two lines of regression is :

(a) zero degree (b) ninety degree

(c) sixty degree (d) thirty degree

77. If r = 0, the lines of regression are :

(a) coincident (b) parallel

(c) perpendicular to each other (d) none of the above

78. If r = 0, the angle between the two lines of regression is:

(a) zero degree (b) ninety degree

(c) sixty degree (d) thirty degree

79. If a constant 50 is subtracted from each of the value of X and Y, the regression coefficient is:

(a) reduced by 50 (b) (1/50)th of the original regression coefficient

(c) increased by 50 (d) not changed

80. If each observation in the set of values (X, Y) is divided by 100, the regression coefficient of Y on X is

(a) increased by 100 (b) decreased by 100

(c) (1/100)th of bYX (d) none of the above

81. If each value of the X variate is divided by 5 and of Y by 10, then bYX by coded values is

(a) same as bYX (b) half of bYX

(c) twice of bYX (d) none of the above

82. If each value of X is divided by 2 and of Y is multiplied by 2, then bYX by coded values is

(a) same as bYX (b) half of bYX

(c) four times of bYX (d) eight times of bYX

83. If from each value of X and Y, constant 25 is subtracted and then each value is divided by 10, the coded bYX is

(a) same as bYX (b) 2½ times of bYX

(c) 25 times of bYX (d) 10 times of bYX

84. Two regression lines are 3x = y and 4y = 3x and Sx = 2, then r =

(a) 0.05 (b) –0.05 (c) 0.5 (d) 0

85. Two regression lines are 3x = y and 4y = 3x and Sx = 2, then Sy =

(a) 2 (b) –2 (c) 3 (d) 5

86. The regression equation of y on x from the following data is

            x       y
Average     25.5    40
S.d.        2.4     6
Correlation co-efficient = 0.8

(a) y = – 11 + 2x (b) y = – 11 – 2x

(c) y = 11 + 2x (d) y = – 11 – 2x

87. From the following data the estimated value of x for y = 25 is

            x       y
Average     25.5    40
S.d.        2.4     6
Correlation co-efficient = 0.8

(a) 2.07 (b) 207 (c) –20.7 (d) 20.7

88. From the following information, the estimated price in Bombay when the price in Calcutta is ₹ 70 is

Average price in Calcutta = ₹ 65; Average price in Bombay = ₹ 67
S. D. of price in Calcutta = ₹ 2.5; S. D. of price in Bombay = ₹ 3.5
Correlation co-efficient between prices = 0.8

(a) 72.6 (b) 73.6 (c) 71.6 (d) 70.6

89. If $\bar{x}$ = 168, $\bar{y}$ = 65, Sx = 6, Sy = 14, r = 0.6 then the estimated value of x when y = 80 is

(a) 172.9 (b) 173.9 (c) 171.9 (d) 170.9

90. If $\bar{x}$ = 168, $\bar{y}$ = 65, Sx = 6, Sy = 14, r = 0.6 then the estimated value of y when x = 155 is

(a) 46.8 (b) 48.6 (c) 46.6 (d) 48.8

91. If each value of X is multiplied by 10 and of Y by 20, the regression coefficient bXY by coded values is

(a) same as bXY (b) half of bXY

(c) four times of bXY (d) one fourth of bXY

92. If the two lines of regression are perpendicular to each other, the relation between the two regression coefficients is:

(a) bYX = bXY (b) bYX . bXY = 1

(c) bYX < bXY (d) bYX = bXY = 0

93. If the two lines of regression are coincident, the relation between the two regression coefficients is :

(a) bYX = bXY (b) bYX . bXY = 1

(c) bYX < bXY (d) bYX = – bXY

94. Regression coefficient is independent of change of

(a) Origin (b) scale

(c) both origin and scale (d) neither origin nor scale

95. The geometric mean of the two regression coefficients byx and bxy is equal to

(a) r (b) r2 (c) 1 (d) none of the above

96. If the two lines of regression are x + 2y – 5 = 0 and 2x + 3y – 8 = 0, the regression line of y on x is:

(a) x + 2y – 5 = 0 (b) 2x + 3y – 8 = 0

(c) any of the two lines (d) none of the above

97. If the two lines of regression are x + 2y – 5 = 0 and 2x + 3y – 8 = 0, the means of X and Y are

(a) –3, 4 (b) 2, 4 (c) 1, 2 (d) none of the above

98. The means of x and y are 16 and 20 respectively. Their standard deviations are 6 and 8 respectively. The correlation co-efficient between them is 0.6. The equation of regression line of y on x is

(a) y = 7.2 + 0.8x (b) y = –7.2 + 0.8x

(c) y = 7.2 – 0.8x (d) y = –7.2 – 0.8x

99. The means of x and y are 16 and 20 respectively. Their standard deviations are 6 and 8 respectively.

The correlation co-efficient between them is 0.6. The equations of regression line of x on y is

(a) x = –7 + 0.45y (b) x = 7 – 0.45y

(c) x = –7 – 0.45y (d) x = 7 + 0.45y

100. The following information is obtained regarding age of husbands and wives.

Average age of husbands = 30.3 years Average age of wives = 24.8 years

S.D. of age of husbands = 5.4 years S.D. of age of wives = 4.5 years

The correlation co-efficient between age of husbands and wives = 0.8. If the age of a husband is 50 years, then the estimated age of his wife is

(a) 28 (b) 30 (c) 35 (d) 38


101. The equations of the lines of regression for two variables x and y are 3x + 2y – 26 = 0 and 6x + y – 31 = 0, and the variance of x is 25. Then x̄, ȳ are

(a) 4, 7 (b) 5, 6 (c) 4, 6 (d) 6, 7

102. The equations of the lines of regression for two variables x and y are 3x + 2y – 26 = 0 and 6x + y – 31 = 0, and the variance of x is 25. Then r is

(a) 0.5 (b) –0.5 (c) 0 (d) 1

103. The equations of the lines of regression for two variables x and y are 3x + 2y – 26 = 0 and 6x + y – 31 = 0, and the variance of x is 25. Then σy is

(a) 15 (b) 20 (c) 25 (d) 35
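Questions 101 to 103 share one pair of lines, so they can be worked together: the means are the intersection of the two lines, the product of the two regression coefficients must lie between 0 and 1 (which decides which line is "y on x"), and σy follows from byx = r·σy/σx. The sketch below illustrates these steps under those standard relations; the function names are hypothetical.

```python
import math


def solve_means(l1, l2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 (Cramer's rule)."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def regression_coefficients(l1, l2):
    """Try each line as 'y on x'; keep the assignment with 0 <= byx * bxy <= 1."""
    for y_line, x_line in ((l1, l2), (l2, l1)):
        b_yx = -y_line[0] / y_line[1]   # slope after solving the line for y
        b_xy = -x_line[1] / x_line[0]   # slope after solving the line for x
        if 0 <= b_yx * b_xy <= 1:
            return b_yx, b_xy
    raise ValueError("the two lines cannot both be regression lines")


# 3x + 2y = 26 and 6x + y = 31, with variance of x equal to 25
lines = ((3, 2, 26), (6, 1, 31))
mean_x, mean_y = solve_means(*lines)
b_yx, b_xy = regression_coefficients(*lines)
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # r takes the sign of the coefficients
sd_x = math.sqrt(25)
sd_y = b_yx * sd_x / r                            # from byx = r * sd_y / sd_x

print(mean_x, mean_y, round(r, 2), round(sd_y, 2))
```

The same routine applies to any question in this set that gives two regression lines in the form ax + by = c.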

104. The two regression equations are x = 5y – 7 and y = 0.1x + 1.7. The values of x̄, ȳ are

(a) 3, 3 (b) 3, 2 (c) 2, 5 (d) 3, 5

105. The two regression equations are x = 5y – 7 and y = 0.1x + 1.7. The value of r is

(a) 0.71 (b) 0.75 (c) 0.25 (d) 0.35

106. The regression lines are X + 2Y – 5 = 0 and 2X + 3Y – 8 = 0, and V(X) = 12. The value of V(Y) is

(a) 16 (b) 4 (c) 3/4 (d) 4/3

107. If the two lines of regression are x = (–1/18)y + l and y = –2x + m, and the mean of the distribution is at (–1, 2), the values of l and m are:

(a) l = 8/9, m = –5 (b) l = 9/8, m = –3

(c) l = –10/9, m = –4 (d) l = –8/9, m = 0

108. If Y = mX + 4 and X = 4Y + 5 are the regression lines of Y on X and X on Y, then m lies between the values

(a) 0 and 1 (b) 0 and 0.5

(c) 0 and 0.25 (d) none of the above

109. Compute the value of y for x = 48 on the basis of the following information.

        x    y
Mean   40   45
S.D.   10    9

Karl Pearson's correlation co-efficient between x and y = 0.50

(a) 48 (b) 48.6 (c) 25 (d) 35

110. If the two lines of regression in a bivariate distribution are X + 9Y = 7 and Y + 4X = 16, then Sx: Sy is

(a) 3 : 2 (b) 2 : 3 (c) 9 : 4 (d) 4 : 9

111. Regression coefficient is independent of the change of

(a) scale (b) origin

(c) both origin and scale (d) neither origin nor scale.

* * *

ANSWER KEYS

HOME WORK - 1

1. Bivariate Data are the data collected for

(a) Two variables (b) More than two variables

(c) Two variables at the same point of time (d) Two variables at different points of time.

2. For a bivariate frequency table having (p × q) classification, the total number of cells is

(a) p (b) p+q (c) q (d) pq

3. Some of the cell frequencies in a bivariate frequency table may be

(a) Negative (b) Zero (c) a or b (d) None of these

4. For a p x q bivariate frequency table, the maximum number of marginal distributions is

(a) p (b) p+q (c) 1 (d) 2

5. For a p x q classification of bivariate data, the maximum number of conditional distributions is

(a) p (b) p+q (c) pq (d) p or q

6. Correlation analysis aims at

(a) Predicting one variable for a given value of the other variable

(b) Establishing relation between two variables

(c) Measuring the extent of relation between two variables

(d) Both (b) and (c).

7. Regression analysis is concerned with

(a) Establishing a mathematical relationship between two variables

(b) Measuring the extent of association between two variables

(c) Predicting the value of the dependent variable for a given value of the independent variable

(d) Both (a) and (c).

8. What is spurious correlation?

(a) It is a bad relation between two variables.

(b) It is very low correlation between two variables.

(c) It is the correlation between two variables having no causal relation.

(d) It is a negative correlation.

9. Scatter diagram is considered for measuring

(a) Linear relationship between two variables (b) Curvilinear relationship between two variables

(c) Neither (a) nor (b) (d) Both (a) and (b).

10. If the plotted points in a scatter diagram lie from upper left to lower right, then the correlation is

(a) Positive (b) Zero (c) Negative (d) None of these.

11. If the plotted points in a scatter diagram are evenly distributed, then the correlation is

(a) Zero (b) Negative (c) Positive (d) (a) or (b).

12. If all the plotted points in a scatter diagram lie on a single line, then the correlation is

(a) Perfect positive (b) Perfect negative

(c) Both (a) and (b) (d) Either (a) or (b).

13. The correlation between shoe-size and intelligence is

(a) Zero (b) Positive (c) Negative (d) None of these.

14. The correlation between the speed of an automobile and the distance travelled by it after applying the brakes is

(a) Negative (b) Zero (c) Positive (d) None of these.

15. Scatter diagram helps us to

(a) Find the nature of correlation between two variables

(b) Compute the extent of correlation between two variables

(c) Obtain the mathematical relationship between two variables

(d) Both (a) and (c).

16. Pearson’s correlation coefficient is used for finding

(a) Correlation for any type of relation (b) Correlation for linear relation only

(c) Correlation for curvilinear relation only (d) Both (b) and (c).

17. Product moment correlation coefficient is considered for

(a) Finding the nature of correlation (b) Finding the amount of correlation

(c) Both (a) and (b) (d) Either (a) or (b).

18. If the value of correlation coefficient is positive, then the points in a scatter diagram tend to cluster

(a) From lower left corner to upper right corner (b) From lower left corner to lower right corner

(c) From lower right corner to upper left corner (d) From lower right corner to upper right corner.

19. When r = 1, all the points in a scatter diagram would lie

(a) On a straight line directed from lower left to upper right

(b) On a straight line directed from upper left to lower right

(c) On a straight line

(d) Both (a) and (b).

20. Product moment correlation coefficient may be defined as the ratio of

(a) The product of standard deviations of the two variables to the covariance between them

(b) The covariance between the variables to the product of the variances of them

(c) The covariance between the variables to the product of their standard deviations

(d) Either (b) or (c).

21. The covariance between two variables is

(a) Strictly positive (b) Strictly negative

(c) Always 0 (d) Either positive or negative or zero.

22. The coefficient of correlation between two variables

(a) Can have any unit.

(b) Is expressed as the product of units of the two variables

(c) Is a unit free measure

(d) None of these.

23. What are the limits of the correlation coefficient?

(a) No limit (b) -1 and 1

(c) 0 and 1, including the limits (d) -1 and 1, including the limits

24. In case the correlation coefficient between two variables is 1, the relationship between the two variables would be

(a) y = a + bx (b) y = a + bx, b > 0

(c) y = a + bx, b < 0 (d) y = a + bx, both a and b being positive.

25. If the relationship between two variables x and y is given by 2x + 3y + 4 = 0, then the value of the correlation coefficient between x and y is

(a) 0 (b) 1 (c) -1 (d) negative.

26. For finding correlation between two attributes, we consider

(a) Pearson’s correlation coefficient

(b) Scatter diagram

(c) Spearman’s rank correlation coefficient

(d) Coefficient of concurrent deviations.

27. For finding the degree of agreement about beauty between two Judges in a Beauty Contest, we use

(a) Scatter diagram (b) Coefficient of rank correlation

(c) Coefficient of correlation (d) Coefficient of concurrent deviation.

28. If there is a perfect disagreement between the marks in Geography and Statistics, then what would be the value of rank correlation coefficient?

(a) Any value (b) Only 1 (c) Only -1 (d) (b) or (c)

29. When we are not concerned with the magnitude of the two variables under discussion, we consider

(a) Rank correlation coefficient (b) Product moment correlation coefficient

(c) Coefficient of concurrent deviation (d) (a) or (b) but not (c).

30. What is the quickest method to find correlation between two variables?

(a) Scatter diagram (b) Method of concurrent deviation

(c) Method of rank correlation (d) Method of product moment correlation

31. What are the limits of the coefficient of concurrent deviations?

(a) No limit

(b) Between -1 and 0, including the limiting values

(c) Between 0 and 1, including the limiting values

(d) Between -1 and 1, the limiting values inclusive

32. If there are two variables x and y, then the number of regression equations could be

(a) 1 (b) 2 (c) Any number (d) 3.

33. Since Blood Pressure of a person depends on age, we need to consider

(a) The regression equation of Blood Pressure on age

(b) The regression equation of age on Blood Pressure

(c) Both (a) and (b)

(d) Either (a) or (b).

34. The method applied for deriving the regression equations is known as

(a) Least squares (b) Concurrent deviation

(c) Product moment (d) Normal equation.

35. The difference between the observed value and the estimated value in regression analysis is known as

(a) Error (b) Residue (c) Deviation (d) (a) or (b).

36. The errors in case of regression equations are

(a) Positive (b) Negative (c) Zero (d) All these.

37. The regression line of y on x is derived by

(a) The minimisation of vertical distances in the scatter diagram

(b) The minimisation of horizontal distances in the scatter diagram

(c) Both (a) and (b)

(d) (a) or (b).

38. The two lines of regression become identical when

(a) r = 1 (b) r = -1 (c) r = 0 (d) (a) or (b).

39. What are the limits of the two regression coefficients?

(a) No limit (b) Must be positive

(c) One positive and the other negative

(d) Product of the regression coefficients must be numerically less than unity.

40. The regression coefficients remain unchanged due to a

(a) Shift of origin (b) Shift of scale

(c) Both (a) and (b) (d) (a) or (b).

41. If the coefficient of correlation between two variables is -0.9, then the coefficient of determination is

(a) 0.9 (b) 0.81 (c) 0.1 (d) 0.19.

42. If the coefficient of correlation between two variables is 0.7 then the percentage of variation unaccounted for is

(a) 70% (b) 30% (c) 51% (d) 49%

43. If for two variables x and y, the covariance, variance of x and variance of y are 40, 16 and 256 respectively, what is the value of the correlation coefficient?

(a) 0.01 (b) 0.625 (c) 0.4 (d) 0.5
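The product moment definition, r = cov(x, y) / (σx·σy), takes standard deviations in the denominator, so the variances given here must be square-rooted first. A small sketch of that step, with a hypothetical helper name:

```python
import math


def correlation_from_cov(cov_xy, var_x, var_y):
    """r = cov(x, y) / (sd_x * sd_y), with the standard deviations taken from the variances."""
    return cov_xy / math.sqrt(var_x * var_y)


r = correlation_from_cov(cov_xy=40, var_x=16, var_y=256)
print(r)        # 0.625
print(r ** 2)   # coefficient of determination: 0.390625
```

Dividing by the product of the variances instead (16 × 256) would give about 0.01, which is the slip that option (a) is designed to catch.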

44. If cov(x, y) = 15, what restrictions should be put for the standard deviations of x and y?

(a) No restriction.

(b) The product of the standard deviations should be more than 15.

(c) The product of the standard deviations should be less than 15.

(d) The sum of the standard deviations should be less than 15.

45. If the covariance between two variables is 20 and the variance of one of the variables is 16, what would be the variance of the other variable?

(a) More than 100 (b) More than 10

(c) Less than 10 (d) More than 1.25

46. If y = a + bx, then what is the coefficient of correlation between x and y?

(a) 1 (b) -1

(c) 1 or -1 according as b > 0 or b < 0 (d) none of these.

47. If r = 0.6 then the coefficient of non-determination is

(a) 0.4 (b) -0.6 (c) 0.36 (d) 0.64

48. If u + 5x = 6 and 3y - 7v = 20 and the correlation coefficient between x and y is 0.58, then what would be the correlation coefficient between u and v?

(a) 0.58 (b) -0.58 (c) -0.84 (d) 0.84

49. If the relation between x and u is 3x + 4u + 7 = 0 and the correlation coefficient between x and y is -0.6, then what is the correlation coefficient between u and y?

(a) -0.6 (b) 0.8 (c) 0.6 (d) -0.8

50. From the following data

x : 2 3 5 4 7

y : 4 6 7 8 10

the coefficient of correlation was found to be 0.93. What is the correlation between u and v as given below?

u : -3 -2 0 -1 2

v : -4 -2 -1 0 2

(a) -0.93 (b) 0.93 (c) 0.57 (d) -0.57

51. Referring to the data presented in Q. No. 50, what would be the correlation between u and v?

u : 10 15 25 20 35

v : -24 -36 -42 -48 -60

(a) -0.6 (b) 0.6 (c) -0.93 (d) 0.93
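Questions 50 and 51 turn on how the correlation coefficient behaves under a change of origin and scale: shifting either variable leaves r unchanged, and rescaling changes at most its sign. The sketch below computes Pearson's r directly for the three data sets quoted above; `pearson_r` is an illustrative helper, not a function named in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [2, 3, 5, 4, 7]
y = [4, 6, 7, 8, 10]
u50, v50 = [-3, -2, 0, -1, 2], [-4, -2, -1, 0, 2]               # u = x - 5, v = y - 8
u51, v51 = [10, 15, 25, 20, 35], [-24, -36, -42, -48, -60]      # u = 5x,   v = -6y

print(round(pearson_r(x, y), 2))       # 0.93
print(round(pearson_r(u50, v50), 2))   # 0.93  (origin shifted only)
print(round(pearson_r(u51, v51), 2))   # -0.93 (scale factors of opposite sign)
```

The sign flips in the last case because one scale factor is positive and the other negative.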

52. If the sum of squares of differences of ranks, given by two judges A and B, of 8 students is 21, what is the value of the rank correlation coefficient?

(a) 0.7 (b) 0.65 (c) 0.75 (d) 0.8

53. If the rank correlation coefficient between marks in management and mathematics for a group of students is 0.6 and the sum of squares of the differences in ranks is 66, what is the number of students in the group?

(a) 10 (b) 9 (c) 8 (d) 11

54. While computing the rank correlation coefficient between profit and investment for the last 6 years of a company, the difference in rank for a year was taken as 3 instead of 4. What is the rectified rank correlation coefficient if it is known that the original value of the rank correlation coefficient was 0.4?

(a) 0.3 (b) 0.2 (c) 0.25 (d) 0.28
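Questions 52 to 54 all use Spearman's formula, r = 1 – 6Σd² / (n(n² – 1)). For a correction problem such as question 54, one approach is to invert the formula to recover Σd², swap the wrong squared difference for the right one, and recompute. A sketch under that reading, with hypothetical function names:

```python
def rank_correlation(sum_d2, n):
    """Spearman's coefficient from the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def sum_d2_from(r, n):
    """Invert the formula to recover the sum of squared rank differences."""
    return (1 - r) * n * (n ** 2 - 1) / 6


# Question 52: n = 8, sum of squared differences = 21
r_52 = rank_correlation(21, 8)

# Question 54: n = 6, reported r = 0.4, a difference of 4 was recorded as 3
wrong_sum = sum_d2_from(0.4, 6)
corrected_sum = wrong_sum - 3 ** 2 + 4 ** 2
r_54 = rank_correlation(corrected_sum, 6)

print(round(r_52, 2), round(r_54, 2))   # 0.75 0.2
```

Question 53 can be checked the same way by trying each listed n in `rank_correlation(66, n)`.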

55. For 10 pairs of observations, the number of concurrent deviations was found to be 4. What is the value of the coefficient of concurrent deviation?

(a) 0.2 (b) 0.2 (c) 1/3 (d) -1/3
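The coefficient of concurrent deviations is usually written as r = ±√(±(2c – m)/m), where c is the number of concurrent deviations and m is the number of pairs of deviations compared (one less than the number of observations); the sign of (2c – m) is carried both inside and outside the root. A short sketch of that rule, with an illustrative function name:

```python
from math import copysign, sqrt


def concurrent_deviation_coefficient(c, n_pairs):
    """Coefficient of concurrent deviations for c concurrent deviations among n_pairs observations."""
    m = n_pairs - 1                 # number of deviations compared
    inner = (2 * c - m) / m
    # take the root of the magnitude and restore the sign of (2c - m)
    return copysign(sqrt(abs(inner)), inner)


r_c = concurrent_deviation_coefficient(c=4, n_pairs=10)
print(round(r_c, 4))   # -0.3333
```

With c = 4 and m = 9 the inner term is –1/9, so the coefficient is –1/3.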

56. The coefficient of concurrent deviation for p pairs of observations was found to be 1/√3. If the number of concurrent deviations was found to be 6, then the value of p is

(a) 10 (b) 9 (c) 8 (d) none of these

57. What is the value of correlation coefficient due to Pearson on the basis of the following data:

x : -5 -4 -3 -2 -1 0 1 2 3 4 5

y : 27 18 11 6 3 2 3 6 11 18 27

(a) 1 (b) -1 (c) 0 (d) -0.5
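The y values in question 57 follow y = x² + 2 exactly, a perfect but non-linear relationship, which makes it a useful check on what Pearson's coefficient actually measures. The sketch below computes r from the definition; `pearson_r` is a hypothetical helper written for this illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [27, 18, 11, 6, 3, 2, 3, 6, 11, 18, 27]

print(all(yi == xi ** 2 + 2 for xi, yi in zip(x, y)))   # True: y is an exact function of x
print(pearson_r(x, y))                                  # 0.0
```

Because the x values are symmetric about zero, the positive and negative products cancel and the covariance vanishes even though y depends entirely on x.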

58. Following are the two normal equations obtained for deriving the regression line of y on x:

5a + 10b = 40

10a + 25b = 95

The regression line of y on x is given by

(a) 2x+3y=5 (b) 2y+3x=5 (c) y = 2 + 3x (d) y=3+5x
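For the line y = a + bx, the least squares normal equations take the form n·a + b·Σx = Σy and a·Σx + b·Σx² = Σxy, so the coefficients in question 58 can be read as n = 5, Σx = 10, Σy = 40, Σx² = 25 and Σxy = 95. The sketch below solves the pair by Cramer's rule; the function and parameter names are assumptions for illustration.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve n*a + sum_x*b = sum_y and sum_x*a + sum_x2*b = sum_xy for (a, b)."""
    det = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


a, b = solve_normal_equations(n=5, sum_x=10, sum_y=40, sum_x2=25, sum_xy=95)
print(f"y = {a:g} + {b:g}x")   # y = 2 + 3x
```

Substituting a = 2 and b = 3 back into both equations gives 40 and 95, which confirms the solution.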

59. If the regression line of y on x and of x on y are given by 2x + 3y = -1 and 5x + 6y = -1, then the arithmetic means of x and y are given by

(a) (1, -1) (b) (-1, 1) (c) (-1, -1) (d) (2, 3)

60. Given the regression equations as 3x + y = 13 and 2x + 5y = 20, which one is the regression equation of y on x?

(a) 1st equation (b) 2nd equation

(c) both (a) and (b) (d) none of these.

61. Given the following equations: 2x - 3y = 10 and 3x + 4y = 15, which one is the regression equation of x on y?

(a) 1st equation (b) 2nd equation

(c) both the equations (d) none of these

62. If u = 2x + 5 and v = -3y - 6 and the regression coefficient of y on x is 2.4, what is the regression coefficient of v on u?

(a) 3.6 (b) -3.6 (c) 2.4 (d) -2.4
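Regression coefficients are unaffected by a change of origin but not by a change of scale: if u = p·x + const and v = q·y + const, then bvu = (q/p)·byx. A minimal sketch of that rule applied to question 62, with hypothetical names:

```python
def transformed_b_vu(b_yx, x_scale, y_scale):
    """Regression coefficient of v on u when u = x_scale*x + const and v = y_scale*y + const."""
    return (y_scale / x_scale) * b_yx


# u = 2x + 5, v = -3y - 6, byx = 2.4
b_vu = transformed_b_vu(2.4, x_scale=2, y_scale=-3)
print(round(b_vu, 2))   # -3.6
```

The additive constants 5 and –6 play no part, which is the "independent of origin" half of the property.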

63. If 4y - 5x = 15 is the regression line of y on x and the coefficient of correlation between x and y is 0.75, what is the value of the regression coefficient of x on y?

(a) 0.45 (b) 0.9375 (c) 0.6 (d) none of these

64. If the regression line of y on x and that of x on y are given by y = -2x + 3 and 8x = -y + 3 respectively, what is the coefficient of correlation between x and y?

(a) 0.5 (b) -1/ 2 (c) - 0.5 (d) none of these

65. If the regression coefficient of y on x, the coefficient of correlation between x and y and the variance of y are -3/4, -√3/2 and 4 respectively, what is the variance of x?

(a) 2/ 3/2 (b) 16/3 (c) 4/3 (d) 4

66. If y = 3x + 4 is the regression line of y on x and the arithmetic mean of x is -1, what is the arithmetic mean of y?

(a) 1 (b) -1 (c) 7 (d) none of these

* * *

ANSWER KEYS

HOME WORK - 2

1. Regression analysis is concerned with

(a) Establishing a mathematical relationship between two variables.

(b) Measuring the extent of association between two variables.

(c) Predicting the value of the dependent variable for a given value of the

independent variable.

(d) Both (a) and (c).

2. The two variables are known to be _______ if the movement on the part of one variable does not produce any movement of the other variable in a particular direction.

(a) Correlated (b) Positive correlated

(c) Negative correlated (d) Uncorrelated

3. A small value of r indicates only a _________ linear type of relationship between the variables.

(a) Good (b) Poor (c) Maximum (d) Highest

4. If for two variables x and y, the covariance, variance of x and variance of y are 40, 16 and 256 respectively, what is the value of the correlation coefficient?

(a) 0.01 (b) 0.625 (c) 0.4 (d) 0.5

5. Karl Pearson’s coefficient is defined from

(a) Ungrouped data. (b) Grouped data. (c) Both. (d) None.

6. For finding the degree of agreement about beauty between two Judges in a Beauty Contest, we use ______.

(a) Scatter diagram (b) Coefficient of rank correlation

(c) Coefficient of correlation (d) Coefficient of concurrent deviation

7. When r = 0 then cov(x, y) is equal to

(a) + 1 (b) – 1 (c) 0 (d) None of these.

8. The line x = a + by represents the regression equation of

(a) y on x (b) x on y

(c) Both of above. (d) None of above.

9. In the case of ‘the ages of husbands and wives’, the correlation is ______.

(a) Positive (b) Negative (c) Zero (d) One

10. Maximum value of Rank Correlation coefficient is

(a) –1 (b) +1 (c) 0 (d) None of these.

11. The method applied for deriving the regression equations is known as

(a) Least squares. (b) Concurrent deviation.

(c) Product moment. (d) Normal equation.

12. The minimum value of correlation coefficient is

(a) 0 (b) –2 (c) 1 (d) –1

13. The two lines of regression become identical when

(a) r = 1 (b) r = –1 (c) r = 0 (d) (a) or (b)

14. The two variables are known to be _________ if the movement on the part of one variable does not produce any movement of the other variable in a particular direction.

(a) Correlated (b) Uncorrelated

(c) Positive correlated (d) Negative correlated

15. The correlation between demand and price (for normal goods) is ________.

(a) Zero (b) Positive (c) Negative (d) None of these

16. If the coefficient of correlation between two variables is –0.3, then the coefficient of determination is

(a) 0.3 (b) 0.09 (c) 0.7 (d) 0.9

17. If the coefficient of correlation between two variables is 0.6, then the percentage of variation accounted for is

(a) 60% (b) 40% (c) 64% (d) 36%

18. The coefficient of correlation

(a) Has no limits. (b) Can be less than one.

(c) Can be more than one. (d) Varies between ± 1.

19. Which of the following statements is not false?

(a) Scatter diagram fails to measure the extent of relationship between the variables.

(b) Scatter diagram can measure correlation only when the variables are having a linear relationship.

(c) Scatter diagram can measure correlation only when the variables are having a non-linear relationship.

(d) None of these.

20. If two variables x and y are independent then the correlation coefficient between x and y is______.

(a) Positive (b) Negative (c) Zero (d) One

21. The correlation between height and intelligence is ________.

(a) Zero (b) Positive (c) Negative (d) None of these

22. If x denotes the height of a group of students expressed in cm and y denotes their weight expressed in kg, then the correlation coefficient between height and weight

(a) Would be shown in kg. (b) Would be shown in cm.

(c) Would be shown in kg. and cm. (d) Would be free from any unit.

23. The correlation between sale of cold drinks and day temperature is ________.

(a) Zero (b) Positive (c) Negative (d) None of these

24. In case of a ________, plotted points on a scatter diagram lie from lower left corner to upper right corner.

(a) Zero correlation (b) Negative correlation

(c) Positive correlation (d) Simple correlation

25. The coefficient of correlation between two variables is 0.5, then the coefficient of determination is

(a) 0.5 (b) 0.25 (c) –0.5 (d) 0.5

26. In case of a _______, plotted points on a scatter diagram concentrate from upper left to lower right.

(a) Zero correlation (b) Negative correlation

(c) Positive correlation (d) Multiple correlation

27. The correlation between Employment and Purchasing power is ______ .

(a) Positive (b) Negative (c) Zero (d) None of these

28. If the coefficient of correlation between two variables is –0.4, then the coefficient of determination is

(a) 0.6 (b) 0.16 (c) 0.4 (d) 0.2

29. The correlation is said to be positive

(a) When the values of two variables move in the same direction.

(b) When the values of two variables move in the opposite direction.

(c) When the values of two variable would not change.

(d) None of these.

30. Coefficient of determination is defined as

(a) r³ (b) 1 – r² (c) 1 + r² (d) r²

31. Regression coefficient is independent of the change of

(a) Scale. (b) Origin

(c) Both origin and scale. (d) Neither origin nor scale

32. In case of _______, plotted points on a scatter diagram would be equally distributed without depicting any particular pattern.

(a) Zero correlation (b) Positive correlation

(c) Negative correlation (d) Simple correlation

33. If the coefficient of correlation between two variables is 0.6, then the percentage of variation unaccounted for is

(a) 60% (b) 40% (c) 64% (d) 36%

34. The errors in case of regression equation are

(a) Positive (b) Negative (c) Zero (d) All these

35. The regression equations are 8x – 10y + 66 = 0 and 40x – 18y = 214. Find the coefficient of correlation.

(a) 4/5 (b) –4/5 (c) 3/5 (d) –1

36. r, bxy, byx all have _____________ sign.

(a) Different (b) Same (c) Both (d) None of these

37. The two regression lines obtained from certain data were y = x + 5 and 16x = 9y – 94. Find the variance of x if the variance of y is 16.

(a) 4/16 (b) 9 (c) 1 (d) 5/16

38. For a group of 8 students, the sum of squares of differences in ranks for Accounts and Economics marks was found to be 50. What is the rank correlation coefficient?

(a) 0.50 (b) 0.40 (c) 0.30 (d) 0.20

39. Correlation coefficient is not a pure number

(a) True (b) False (c) Both (d) None of these

40. The quickest method to find correlation between two variables is:

(a) Scatter diagram

(b) Method of concurrent deviation

(c) Method of Rank Correlation

(d) Method of Product moment Correlation

41. In rank correlation coefficient the association need not be linear

(a) False (b) True (c) Both (d) None of these

42. Find the coefficient of correlation when its probable error is 0.2 and the number of pairs of items is 9.

(a) 0.505 (b) 0.332 (c) 0.414 (d) None of these
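The probable error of the correlation coefficient is conventionally taken as P.E. = 0.6745·(1 – r²)/√n. Solving that for r gives a direct check on a question of this type; the sketch below assumes the non-negative root is wanted, and the function name is illustrative.

```python
import math


def r_from_probable_error(pe, n):
    """Invert PE = 0.6745 * (1 - r**2) / sqrt(n), taking the non-negative root."""
    return math.sqrt(1 - pe * math.sqrt(n) / 0.6745)


r = r_from_probable_error(pe=0.2, n=9)
print(round(r, 3))   # 0.332
```

Plugging the result back into the formula reproduces a probable error of 0.2.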

43. If one of the regression coefficients is greater than unity, then the other is less than unity.

(a) True (b) False (c) Both (d) None of these

44. The scatter diagram method is not suitable for finding the coefficient of correlation if the number of observations is very large

(a) True (b) False (c) Both (d) None of these

45. For a set of 100 observations, taking the assumed mean as 4, the sum of the deviations is –11 cm and the sum of squares of these deviations is 257 cm². Find the coefficient of variation.

(a) 41.13% (b) 14.13% (c) 25.13% (d) 52.13%
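With deviations d taken from an assumed mean A, the usual shortcut is mean = A + Σd/n and variance = Σd²/n – (Σd/n)², and the coefficient of variation is 100·σ/mean. The sketch below applies those formulas to the figures in question 45; the helper name is hypothetical, and the result may differ from a printed option in the second decimal place because of rounding in hand computation.

```python
import math


def cv_from_assumed_mean(assumed_mean, n, sum_d, sum_d2):
    """Coefficient of variation (in percent) from deviations about an assumed mean."""
    mean = assumed_mean + sum_d / n
    variance = sum_d2 / n - (sum_d / n) ** 2
    return 100 * math.sqrt(variance) / mean


cv = cv_from_assumed_mean(assumed_mean=4, n=100, sum_d=-11, sum_d2=257)
print(round(cv, 1))   # 41.1
```

Here the mean works out to 3.89 and the variance to 2.5579, giving a coefficient of variation of roughly 41.1%.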

46. The coefficient of rank correlation of marks obtained by 10 students in English and Economics was found to be 0.5. It was later discovered that the difference in ranks in the two subjects obtained by one student was wrongly taken as 3 instead of 7. Find the correct coefficient of rank correlation.

(a) 0.514 (b) 0.26 (c) 0.15 (d) None of these

47. If 2x + 5y – 9 = 0 and 3x – y – 5 = 0 are two regression equations, then find the values of the mean of x and the mean of y.

(a) 1,2 (b) 2,2 (c) 2,1 (d) 1,1

48. After settlement, the average weekly wage in a factory has increased from Rs. 8 to Rs. 12 and the standard deviation has increased from 2 to 2.5. Find the coefficient of variation after the settlement.

(a) 25% (b) 20.83% (c) 24.04% (d) 26.30%

49. If r = 0.8 then coefficient of determination shall be

(a) 0.64 (b) 0.40 (c) 0.60 (d) 0.80

50. For a group of 8 students, the sum of squares of differences in ranks for Economics and English marks was 50. The value of rank correlation coefficient is ________.

(a) 0.40 (b) 0.50 (c) 0.30 (d) None of these

51. For a group of 8 students, the sum of squares of differences in ranks for Economics and Commerce marks was 50. The value of rank correlation coefficient is equal to

(a) 0.50 (b) 0.40 (c) 0.60 (d) None of these

52. In the rank correlation method, the sum of differences of ranks is

(a) 1 (b) –1 (c) 0 (d) Cannot say

53. The relation between the production of Pig iron and Soot content in a factory is

(a) Positive (b) Negative (c) 0 (d) None of these

54. If the relation between two random variables x and y is 2x + 3y = 4, then the correlation coefficient between them is

(a) –2/3 (b) 1 (c) –1 (d) None of these

55. For a two way frequency table having (m×n) classification the total number of cells is

(a) m (b) n (c) m + n (d) mn

56. For an m×n two way or bivariate frequency table, the maximum number of marginal distributions is

(a) 1 (b) 2 (c) m+n (d) m.n

57. The correlation coefficient r is the ………….. of the two regression coefficients.

(a) G.M. (b) H.M.

(c) Arithmetic Mean (d) None of these

58. If r = 0, then

(a) There is a perfect correlation between x & y. (b) x and y are not correlated.

(c) There is a positive correlation between x & y. (d) Do not exist.

59. If Covariance (x, y) < 0, then the relation between the two variables is

(a) Positive (b) Negative (c) (a) or (b) (d) None of these

60. Consider the two regression lines 3x+2y = 26 & 6x + y = 31. Find the mean values of x and y.

(a) x = 4 & y = 7 (b) x = 7 & y = 4

(c) x = 5 & y = 6 (d) None of these

61. Consider the two regression lines 3x + 2y = 26 & 6x + y = 31. Find the correlation coefficient between x & y.

(a) 0.5 (b) –0.5 (c) 0.6 (d) None of these

62. Two regression lines are

(a) Reversible (b) not reversible

(c) cannot say (d) None of these

63. The two regression lines are 5x = 22 + y & 64x = 24 + 45y. Find the Standard Deviation of y from the given information.

(a) 4 (b) 5

(c) Cannot be determined (d) None of these

64. Which one of the following is a true statement?

(a) 1/2 (bxy + byx) = r (b) 1/2 (bxy + byx) < r

(c) 1/2 (bxy + byx) > r (d) None of these

65. The correlation between two variables x and y is found to be 0.4. What is the correlation between 2x and (–y)?

(a) 0.4 (b) -0.4 (c) 0.6 (d) None of these

66. Find the coefficient of correlation between the following set of observation:

(a) 1 (b) -1 (c) 0 (d) None of these

67. Find the correlation coefficient between the following set of observation.

(a) 1 (b) - 1 (c) 0 (d) None of these

68. For the bivariate data [(x, y)] = [(20, 5), (21, 4), (22, 3)], the correlation coefficient between x and y is

(a) 0 (b) 1 (c) –1 (d) 0.5

69. The regression of y on x is 2y + 3x = 4 and the correlation coefficient between x and y is 0.8. This statement is

(a) True (b) False (c) Cannot say (d) None of these

70. The correlation coefficient of 3x and –2y is the same as the correlation coefficient of x and y. This statement is

(a) True (b) False (c) Cannot say (d) None of these

71. When the correlation coefficient r = + 1, then the two regression lines are

(a) Perpendicular to each other (b) Coincide

(c) Parallel to each other (d) Do not exist

72. Find the correlation coefficient between the following set of observation.

(a) 1 (b) –1 (c) 0 (d) None of these

73. The value of Spearman’s rank correlation coefficient for a certain number of observations was found to be 2/3. The sum of the squares of differences between the corresponding ranks was 55. Find the number of pairs.

(a) 10 (b) 12 (c) 11 (d) None of these

74. The co–variance between the two variables is

(a) always positive (b) always negative

(c) always 0 (d) Either positive or negative or Zero

75. The coefficient of regression of Y on X is byx = 1.2. If U = and V= find bvu

(a) 0.9 (b) 0.8 (c) 0.7 (d) None of these

76. Two regression coefficient bxy and byx are 1.2 and –0.5. This is

(a) True (b) False

(c) Either (a) or (b) (d) None of these

77. Two lines of regression are given by 5x + 7y – 22 = 0 and 6x + 2y – 22 = 0. If the variance of y is 15, find the standard deviation of x.

(a) (b) (c) (d)

* * *

ANSWER KEYS

 1 D    17 D    33 C    49 A    65 B
 2 D    18 D    34 D    50 A    66 A
 3 B    19 A    35 C    51 B    67 B
 4 B    20 C    36 B    52 C    68 C
 5 C    21 A    37 B    53 A    69 B
 6 B    22 D    38 B    54 C    70 B
 7 C    23 B    39 B    55 D    71 B
 8 B    24 C    40 B    56 B    72 C
 9 A    25 B    41 B    57 A    73 A
10 B    26 B    42 B    58 B    74 D
11 A    27 A    43 A    59 B    75 B
12 D    28 B    44 A    60 A    76 B
13 D    29 A    45 A    61 B    77 C
14 B    30 B    46 B    62 A
15 C    31 B    47 C    63 C
16 B    32 A    48 B    64 C

* * *
