Chapter 6

1


Intercept, A, and Gradient of the Regression Line, B

2

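y = A + Bx

where y is the dependent variable (DV), x is the independent variable (IV), A is the y-intercept or constant term, and B is the gradient or slope of the regression line.

[Figure: the regression line y = A + Bx drawn with x on the x-axis and y on the y-axis; A marks the y-intercept (constant term) and B is the gradient or slope of the line.]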

3

The equation y = A + Bx is sometimes called a deterministic model because it gives an exact relationship between y and x.

In reality, however, the observed value yobs usually differs slightly from the value ypre predicted by the equation.

So we write y = A + Bx + e, where e is a random error term that accounts for this difference (see slide 5 if you do not understand this concept).

A and B are population parameters, and the corresponding regression line is called the population regression line. The values of A and B in the population are called the true values of the y-intercept and slope.

However, population data are difficult to obtain, so we use sample data to estimate the population parameters. The values calculated from sample data are therefore estimates: the y-intercept and slope for the sample data are denoted a and b, and yo denotes the predicted or estimated value of y for a given x. The equation yo = a + bx is called the estimated regression model; it gives the regression of y on x based on sample data.
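To make the distinction between the population parameters A and B and the sample estimates a and b concrete, here is a small simulation sketch. It assumes hypothetical true values A = 2 and B = 0.5, x values spread evenly between 0 and 10, and errors e drawn from a normal distribution with mean 0 and standard deviation 1; none of these numbers come from this chapter. It draws samples from y = A + Bx + e and computes a and b with the least squares formulas given later in this chapter. Larger samples should usually give estimates closer to the true values, although any single sample can still miss.

```python
import random

# Hypothetical population parameters, assumed for illustration only.
A_TRUE = 2.0
B_TRUE = 0.5


def least_squares(xs, ys):
    """Return the least squares estimates (a, b) for the line yo = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    return a, b


def draw_sample(n, rng):
    """Draw n points from y = A + Bx + e, with e ~ Normal(0, 1) (an assumption)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [A_TRUE + B_TRUE * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 1000):
    a, b = least_squares(*draw_sample(n, rng))
    print(f"n = {n:4d}: a = {a:.3f} (A = {A_TRUE}), b = {b:.3f} (B = {B_TRUE})")
```

Each run of the loop plays the role of one researcher with one sample: a and b change from sample to sample, while A and B stay fixed.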

4

Example 1

Income, x (hundreds of RM) | Food expenditure, y (hundreds of RM)
35 | 9
49 | 15
21 | 7
39 | 11
15 | 5
28 | 8
25 | 9

5

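Scatter Plot for Example 1

[Figure: scatter plot of the Example 1 data with the regression line. At a given value x1, the error e is the vertical distance between the observed value yobs and the predicted value ypre on the line.]

e (error) = yobs – ypre, or equivalently e = y – yo

where ypre is the y value predicted by the regression line (the best straight line) and yobs is the actual y value obtained.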

6

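Error Sum of Squares, SSE

For the best straight line (the least squares line), the sum of the errors is always zero, i.e. Σe = Σ(y – yo) = 0.

So, to find the line that best fits the points, we cannot minimize the sum of the errors: positive and negative errors cancel out, and every line passing through the point ($\bar{x}$, $\bar{y}$) has Σe = 0, so this criterion cannot single out one best line. Instead, we minimize the error sum of squares, SSE:

$$\mathrm{SSE} = \sum e^2 = \sum (y - y_o)^2$$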
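Both claims on this slide, that the errors of the least squares line sum to zero and that this line has the smallest SSE, can be checked numerically. The sketch below uses the Example 1 data, computes the least squares line with the deviation formulas for SSxy and SSxx given on the following slides, and compares its SSE with that of two slightly shifted lines. The helper names errors and sse are our own, chosen for illustration.

```python
# Example 1 data: monthly income x and food expenditure y (hundreds of RM).
xs = [35, 49, 21, 39, 15, 28, 25]
ys = [9, 15, 7, 11, 5, 8, 9]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least squares slope and intercept (formulas from the following slides).
b_ls = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs))
a_ls = mean_y - b_ls * mean_x


def errors(a, b):
    """Errors e = y - yo for the line yo = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(a, b):
    """Error sum of squares, SSE = sum of e**2."""
    return sum(e ** 2 for e in errors(a, b))


print(f"sum of errors:         {sum(errors(a_ls, b_ls)):.2e}")  # zero apart from rounding
print(f"SSE, least squares:    {sse(a_ls, b_ls):.4f}")
print(f"SSE, a increased 0.5:  {sse(a_ls + 0.5, b_ls):.4f}")
print(f"SSE, b increased 0.05: {sse(a_ls, b_ls + 0.05):.4f}")
```

Raising a by 0.5 increases SSE by exactly 7 × 0.5² = 1.75 here, because the errors of the least squares line sum to zero; changing either coefficient moves the line away from the minimum SSE.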

The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares regression line.

For the least squares regression line, yo = a + bx, where

$$b = \frac{SS_{xy}}{SS_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\,\bar{x}$$

and $\bar{y}$ is the mean of the y scores and $\bar{x}$ is the mean of the x scores.
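The formulas for a and b can be derived with one standard calculus argument, sketched below: treat SSE as a function of a and b, set both partial derivatives to zero (the normal equations) and solve for a and b.

```latex
\begin{align*}
\mathrm{SSE}(a,b) &= \sum (y - a - bx)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial a} &= -2\sum (y - a - bx) = 0
  \;\Rightarrow\; \sum y = na + b\sum x
  \;\Rightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial b} &= -2\sum x\,(y - a - bx) = 0
  \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2 \\
\text{substituting } a:\quad \sum xy &= \bar{y}\sum x + b\Bigl(\sum x^2 - \bar{x}\sum x\Bigr) \\
\Rightarrow\; b &= \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}
  = \frac{SS_{xy}}{SS_{xx}}
\end{align*}
```

The first condition, Σ(y – a – bx) = 0, is the same statement as Σe = 0, which is why the errors of the least squares line always sum to zero.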

7

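$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

SSxy can be positive or negative.

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

SSxx is never negative; it is positive unless all the x values are equal.

Equivalently, in terms of deviations from the means:

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) \qquad SS_{xx} = \sum (x - \bar{x})^2$$

where $\bar{y}$ is the mean of the y scores and $\bar{x}$ is the mean of the x scores.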
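The shortcut formulas and the deviation formulas for SSxy and SSxx are algebraically identical. A quick way to convince yourself is to compute both for the Example 1 data; the function names below are hypothetical helpers, not from any library.

```python
def ss_shortcut(xs, ys):
    """SSxy and SSxx from the computational (shortcut) formulas."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return ss_xy, ss_xx


def ss_deviation(xs, ys):
    """SSxy and SSxx from the deviation-from-the-mean formulas."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    return ss_xy, ss_xx


# Example 1 data: income x and food expenditure y (hundreds of RM).
xs = [35, 49, 21, 39, 15, 28, 25]
ys = [9, 15, 7, 11, 5, 8, 9]

print(ss_shortcut(xs, ys))
print(ss_deviation(xs, ys))

# A decreasing pattern gives a negative SSxy, while SSxx stays positive.
print(ss_shortcut([1, 2, 3], [3, 2, 1]))  # (-2.0, 2.0)
```

The shortcut form is usually quicker by hand because it needs only the column totals. On a computer with large values, the deviation form is less prone to rounding error, since the shortcut form subtracts two large, nearly equal numbers.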

8

Example 1

Income, x | Food expenditure, y | xy | x²
35 | 9 | 315 | 1225
49 | 15 | 735 | 2401
21 | 7 | 147 | 441
39 | 11 | 429 | 1521
15 | 5 | 75 | 225
28 | 8 | 224 | 784
25 | 9 | 225 | 625
Σx = 212 | Σy = 64 | Σxy = 2150 | Σx² = 7222

Step 1: Compute Σx, Σy, $\bar{x}$ and $\bar{y}$.

$$\sum x = 212, \qquad \sum y = 64$$

$$\bar{x} = \frac{\sum x}{n} = \frac{212}{7} = 30.2857, \qquad \bar{y} = \frac{\sum y}{n} = \frac{64}{7} = 9.1429$$

Step 2: Compute Σxy and Σx². From the xy and x² columns of the table above, Σxy = 2150 and Σx² = 7222.

9

Step 3: Compute SSxy and SSxx.

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 2150 - \frac{(212)(64)}{7} = 211.7143$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7222 - \frac{(212)^2}{7} = 801.4286$$

Step 4: Compute ‘a’ and ‘b’

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{211.7143}{801.4286} = .2642$$

$$a = \bar{y} - b\,\bar{x} = 9.1429 - (.2642)(30.2857) = 1.1414$$

The estimated regression model ypre = a + bx is ypre = 1.1414 + .2642x
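Steps 1 to 4 translate almost line for line into code. The sketch below is one way to do it in plain Python (the function name fit_line is our own); it reproduces the Example 1 result and can be reused to check hand calculations.

```python
# Example 1 data: x = monthly income, y = food expenditure (hundreds of RM).
incomes = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]


def fit_line(xs, ys):
    """Least squares estimates (a, b), following Steps 1-4 of the slides."""
    n = len(xs)
    # Step 1: sums and means.
    sum_x, sum_y = sum(xs), sum(ys)
    mean_x, mean_y = sum_x / n, sum_y / n
    # Step 2: sum of cross-products and sum of squares of x.
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Step 3: SSxy and SSxx (shortcut formulas).
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    # Step 4: slope and intercept.
    b = ss_xy / ss_xx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(incomes, food)
print(f"ypre = {a:.4f} + {b:.4f}x")  # ypre = 1.1422 + 0.2642x
```

Carried at full precision, the intercept comes out as about 1.1422 rather than 1.1414. The hand calculation rounds b to .2642 before computing a, which accounts for the small difference.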

10

This gives the regression of food expenditure on income. Using this estimated regression model, we can find the predicted value of y for any specific value of x.

For example, if the monthly income is RM3500, then x = 35 (in hundreds of RM) and ypre = 1.1414 + (.2642)(35) = 10.3884 hundred RM = RM1038.84.

But the actual y value when x = 35 is 9, i.e. RM900.

So the prediction has an error of e = yobs – ypre = RM900 – RM1038.84 = –RM138.84. This negative error indicates that the predicted value of y is greater than the actual value of y. Thus, if we use the regression model, this household's food expenditure is overestimated by RM138.84.
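The prediction and its error can be reproduced with a few lines of code. This sketch uses the coefficients rounded to four decimals, as on the slides, and the sign convention e = yobs – ypre; the function name predict is a hypothetical helper.

```python
# Estimated regression model from Example 1, coefficients rounded as on the slides.
a = 1.1414
b = 0.2642


def predict(income_hundreds):
    """Predicted food expenditure (hundreds of RM) for an income in hundreds of RM."""
    return a + b * income_hundreds


x = 35          # monthly income RM3500, in hundreds
y_obs = 9       # actual food expenditure RM900, in hundreds
y_pre = predict(x)
error = y_obs - y_pre  # e = yobs - ypre

print(f"predicted: RM{y_pre * 100:.2f}")  # predicted: RM1038.84
print(f"error:     RM{error * 100:.2f}")  # error:     RM-138.84
```

A negative error means the model overestimates; a positive error would mean it underestimates.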

What does the model predict when income = RM0? Calculate the predicted food expenditure.

11

Maths | Science
32 | 45
67 | 56
23 | 12
86 | 79
65 | 73
55 | 65
32 | 40
67 | 77
90 | 87
31 | 40
56 | 49
77 | 82
10 | 13
75 | 76
67 | 68
77 | 79
34 | 45
28 | 31
44 | 49

Exercise 1

Calculate the regression equation for the Maths (x scores) and Science (y scores) marks.

12

Maths | History
32 | 70
67 | 30
23 | 80
86 | 45
65 | 35
55 | 65
32 | 65
67 | 35
90 | 10
31 | 70
56 | 49
77 | 42
10 | 90
75 | 40
67 | 59
77 | 51
18 | 81
28 | 55
44 | 49

Exercise 2

Calculate the regression equation for the Maths (x scores) and History (y scores) marks.

13

Exercise 3

Calculate the regression equation for the graph of IQ ranges (x-axis) against the correlation coefficients (r) between Overall Creativity (OC) and Overall Achievement (OA) (y-axis), based on the data on page 77 (Graph 7.1) (Palaniappan, 2006).

