[revised]simple linear regression and correlation

# BY : SOEWONO, DRS. SIMPLE LINEAR REGRESSION AND CORRELATION

Upload: adam-zakiy

Post on 25-Dec-2015


Page 1: [Revised]Simple Linear Regression and Correlation

BY: SOEWONO, DRS.

SIMPLE LINEAR REGRESSION AND CORRELATION

SUMMARY SIMPLE LINEAR REGRESSION

⇩ Managerial decisions are often based on the relationship between two or more variables. An objective approach is to collect data on the variables and then use statistical procedures to determine how they are related.

⇩ Regression analysis is a statistical procedure that can be used to develop a mathematical equation showing how variables are related.

⇩ In this section we consider the simplest type of regression, called simple linear regression.

What is SLR?

⇩ A situation involving one independent and one dependent variable, in which the relationship between the variables is approximated by a straight line, is called simple linear regression (SLR).

⇩ Regression analysis involving two or more independent variables is called multiple regression analysis.

THE LEAST SQUARES METHOD

⇩ A graph of the available data in which the independent variable appears on the horizontal axis and the dependent variable appears on the vertical axis is called a scatter diagram (scattergram).

⇩ The least squares method is a procedure used to find the straight line that provides the best approximation for the relationship between the independent and dependent variables.

The least squares method provides an estimated regression equation that minimizes the sum of squared deviations between the observed values of the dependent variable and the estimated values of the dependent variable.

No other straight line produces a sum of squared deviations (errors) as small as that of the least squares line.

In this sense, the regression line is the line that fits the data best.

SIMPLE LINEAR REGRESSION

Regression analysis is a statistical procedure that can be used to develop a mathematical equation showing how variables are related.

THE SIMPLEST TYPE OF REGRESSION?

A situation involving one independent and one dependent variable, in which the relationship between the variables is approximated by a straight line, is called simple linear regression.

LEAST SQUARES METHOD

1. The least squares method (LSM) is a procedure used to find the straight line that provides the best approximation for the relationship between the independent and dependent variables. This line is called the least squares line, the fitted line, or the regression line.

2. The least squares method provides an estimated regression equation that minimizes the sum of squared deviations between the observed values of the dependent variable (yi) and the estimated values of the dependent variable (ŷi).

• The differences between the points and the line are called residuals.

• The minimized sum of squared differences is called SSE, the sum of squares for error (also called the sum of squares due to error).

No other straight line will produce a sum of squares for error as small as that of the least squares line.
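The claim that no other straight line gives a smaller SSE can be checked numerically. The following is a minimal sketch, not part of the original slides: the data are hypothetical, and the function names `fit_line` and `sse` are illustrative. It fits the least squares line with the standard closed-form coefficients and then compares its SSE with the SSE of several nearby lines.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals of the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sse(xs, ys, a, b)

# Nudge the intercept and slope away from the fitted values; every
# perturbed line should have a larger SSE than the least squares line.
perturbed = [
    sse(xs, ys, a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best:.2f}")
print(f"smallest SSE among perturbed lines = {min(perturbed):.2f}")
```

A grid of perturbations is only a spot check, not a proof; the proof is the calculus argument that sets the partial derivatives of the sum of squares to zero.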


1. Introduction

John Maynard Keynes, a great British economist, wanted to explain fluctuations in consumer spending. He believed that consumer spending was one of the keys to understanding economic booms and busts. Keynes hypothesized that household income was the primary determinant of household spending: when income goes up, people spend more; when income drops, they spend less.

A simple algebraic representation of Keynes’s theory is:

Y = α + β X

where Y is consumer spending and X is income; α and β are two unknown parameters that describe the relationship between income and consumption. Income is the explanatory variable, because changes in income explain changes in spending. Spending is the dependent variable, because spending depends on income. The sign of the parameter β tells us whether there is a positive or negative relationship between the explanatory and dependent variables.

[Figure: three scatter plots of Y against X. Positive relationship: Y increases as X increases. Negative relationship: Y decreases as X increases. No relationship: Y isn’t affected by X.]

The term regression was first used as a statistical concept in 1877 by Sir Francis Galton.

Galton made a study that showed that the height of children born to tall parents will tend to move back, or “regress”, toward the mean height of the population. Galton called the line describing this relationship a “line of regression”.

He designated the word regression as the name of the general process of predicting one variable (the height of the children) from another (the height of the parent).

In regression analysis, we shall develop an estimating equation, that is, a mathematical formula that relates the known variables to the unknown variable.

2. Type of Regression Model

The first step in any regression analysis is to assemble the data. The next step is to plot the data.

The term “scatter plot” is used for the representation of data pairs (x, y), where y is the dependent (response) variable and x is the explanatory (regressor, predictor, or independent) variable.

In general a data set will consist of n observation points: (x1, y1), (x2, y2), …, (xn, yn).

A bivariate data set is needed to determine the equation, or model, that relates the explanatory variable x to the dependent variable y.

A situation involving one explanatory and one dependent variable, in which the relationship between the variables is approximated by a straight line, is called simple linear regression. The regression line can be used to estimate the value of y for any given value of x. Regression analysis involving two or more explanatory variables is called multiple regression analysis.

3. The Linear Regression model

Clearly any regression line must pass as close as possible to all of the data points. The model is generally used to provide an estimate of y for any given value of x, so the error, that is, the difference between the value of y actually observed and the corresponding value of y proposed by the model, should be as small as possible for every data point.

The equation of the straight-line model can be written in many ways. Commonly used forms are:

y = m x + n ; y = a + b x

In the context of regression analysis the equation of the straight line is usually written as:

yi = β0 + β1 xi ; i = 1, 2, …, n

where β0 is the intercept and β1 is the slope of the line.

In practice, a perfect straight line passing through all the observation points never occurs.

The model for the perfect straight line plus an error, which may be positive or negative, may be written as:

yi = β0 + β1 xi + ei ; i = 1, 2, …, n

The true value of the regression constant, β0, is estimated from the sample data as b0 or a. The true value of the slope, β1, is estimated from the sample data as b1 or b.

The estimated value of y when x = xi is denoted by ŷi and may be calculated by using the equation:

ŷi = b0 + b1 xi or ŷi = a + b xi
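The distinction between the model with its error term and the estimated line can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the "true" parameters, the estimates, the error spread, and the helper name `predict` are all hypothetical values chosen for illustration, not results from the slides.

```python
import random

# Hypothetical "true" parameters and estimates, chosen only for illustration.
BETA0, BETA1 = 2.0, 0.5   # population intercept and slope (normally unknown)
A, B = 1.9, 0.52          # sample estimates a (= b0) and b (= b1)

rng = random.Random(0)    # fixed seed so the sketch is reproducible

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
# Random errors e_i, which may be positive or negative.
errors = [rng.gauss(0.0, 0.3) for _ in xs]
# Observed values follow the model y_i = beta0 + beta1 * x_i + e_i.
ys = [BETA0 + BETA1 * x + e for x, e in zip(xs, errors)]


def predict(x, a=A, b=B):
    """Estimated value y_hat = a + b * x from the fitted line."""
    return a + b * x


for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  observed y = {y:.3f}  estimated y_hat = {predict(x):.3f}")
```

The observed values scatter around the true line because of the error term, while the estimated values lie exactly on the fitted line.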

Page 15: [Revised]Simple Linear Regression and Correlation

#

[Figure: a line ŷ = a + b x, with a = ȳ − b x̄, fitted to the data (xi, yi), i = 1, 2, …, n, by the method of least squares. Each observed point (xi, yi) lies a vertical distance di = yi − ŷi = ei from the fitted point (xi, ŷi) on the line, and the line passes through the point (x̄, ȳ).]

Adrien Marie Legendre

VARIABLE IN REGRESSION ANALYSIS

| X | Y |
|---|---|
| Predictor | Predicted |
| Independent variable | Dependent variable |
| Explanatory variable | Explained variable |
| Stimulus | Response |
| Exogenous | Endogenous |
| Known variable | Unknown variable |

1. Regression analysis is predicting one variable from the other, using an estimated straight line that summarizes the relationship between the variables.

2. Linear regression analysis is predicting one variable from the other, when the two have a linear relationship.

3. Each of your data points has a residual, which tells you how far the point is above (or below, if negative) the line.

RESIDUAL = ACTUAL Y − PREDICTED Y (Ŷ) = Y − (a + b X)
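The residual formula can be applied point by point. The sketch below uses the five observations from Exercise 2 of these slides together with a hypothetical line (a = 30, b = −2) picked by eye; that line is an assumption for illustration and is not the least squares fit. The function name `residual` is also illustrative.

```python
# Five observations from Exercise 2 of these slides.
xs = [2, 3, 5, 1, 8]
ys = [25, 25, 20, 30, 16]

# Hypothetical line chosen by eye for illustration; NOT the least squares fit.
a, b = 30.0, -2.0


def residual(x, y, a, b):
    """Residual = actual Y minus predicted Y, i.e. Y - (a + b*X)."""
    return y - (a + b * x)


residuals = [residual(x, y, a, b) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

print(residuals)  # [-1.0, 1.0, 0.0, 2.0, 2.0]
print(sse)        # 10.0
```

A positive residual means the point lies above the line, a negative one means it lies below, and zero means the point is on the line.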


4. The Least Squares Method (Method of Least Squares)

Simple linear regression analysis is concerned with finding the straight line that fits the data best. Best fit means that we wish to find the straight line for which the differences between the actual values $Y_i$ and the values $\hat{Y}_i$ predicted from the fitted line of regression are as small as possible. Because these differences will be positive for some observations and negative for others, mathematically we minimize the sum of the squared differences:

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where $Y_i$ = actual value of Y for observation i and $\hat{Y}_i$ = predicted value of Y for observation i.

Since $\hat{Y}_i = a + b X_i$, we are minimizing

$$\sum_{i=1}^{n} (Y_i - a - b X_i)^2$$

A mathematical technique that determines the values of a and b that minimize this sum is known as the least squares method. Let

$$e_i = Y_i - \hat{Y}_i = Y_i - a - b X_i$$

Hence,

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a - b X_i)^2 \qquad (*)$$

To minimize (*), we take the partial derivatives with respect to a and b and set them equal to zero. We get:

$$\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (Y_i - a - b X_i) = 0 \qquad (1)$$

$$\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} X_i (Y_i - a - b X_i) = 0 \qquad (2)$$

From (1):

$$\sum Y_i = n a + b \sum X_i \quad\Rightarrow\quad a = \bar{Y} - b \bar{X}$$

From (2):

$$\sum X_i Y_i = a \sum X_i + b \sum X_i^2$$

Substituting $a = \bar{Y} - b \bar{X}$ gives

$$b = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2}$$

Hence, the equation of the line of best fit can be written as

$$\hat{Y} = a + b X \quad\text{or, equivalently,}\quad \hat{Y} - \bar{Y} = b (X - \bar{X})$$

This line is called the line of regression of Y on X. The other line, known as the line of regression of X on Y, is obtained in the same way with the roles of X and Y interchanged:

$$\hat{X} - \bar{X} = b' (Y - \bar{Y}), \qquad b' = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum Y_i^2 - (\sum Y_i)^2}$$
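The least squares coefficients of Y on X follow from setting the two partial derivatives of the sum of squared errors to zero, which gives the slope b = (n ΣXY − ΣX ΣY) / (n ΣX² − (ΣX)²) and the intercept a = Ȳ − b X̄. The sketch below implements these standard formulas and applies them to the father and son heights from Example 2 of these slides; the function name `least_squares_fit` is illustrative. It also checks that the fitted line satisfies both normal equations.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) of the regression line of Y on X: y_hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = sum_y / n - b * sum_x / n      # a = mean(Y) - b * mean(X)
    return a, b


# Heights of fathers (X) and their oldest sons (Y), from Example 2.
fathers = [68, 64, 70, 72, 69, 74]
sons = [67, 68, 69, 73, 66, 70]

a, b = least_squares_fit(fathers, sons)
residuals = [y - (a + b * x) for x, y in zip(fathers, sons)]

# Normal equation (1): the residuals sum to zero.
check_1 = sum(residuals)
# Normal equation (2): the residuals are uncorrelated with X.
check_2 = sum(x * e for x, e in zip(fathers, residuals))

print(f"y_hat = {a:.3f} + {b:.4f} x")
print(f"sum of residuals = {check_1:.2e}, sum of x*residual = {check_2:.2e}")
```

Both checks come out as zero up to floating-point rounding, which is exactly what equations (1) and (2) require of the minimizing a and b.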


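5. The Coefficient of Determination

The coefficient of determination can be defined as:

$$r^2 = \frac{\text{regression sum of squares}}{\text{total sum of squares}} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (*)$$

where

$$SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$

and SSE is the error sum of squares, so that SST = SSR + SSE.

The sample coefficient of correlation r may be obtained from equation (*), so that:

$$r = \pm\sqrt{r^2} = \pm\sqrt{\frac{SSR}{SST}}$$

where r takes the sign of the slope b.

The sample coefficient of correlation r can also be computed directly using the following formula:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$

or, using the “calculator” formula:

$$r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}}$$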
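The two routes to r can be compared numerically. The sketch below assumes the standard definitions SST = Σ(Y − Ȳ)², SSR = Σ(Ŷ − Ȳ)², and SSE = Σ(Y − Ŷ)², computes r² = SSR/SST for the father and son heights of Example 2, and then checks the result against the calculator formula r = (n ΣXY − ΣX ΣY) / √([n ΣX² − (ΣX)²][n ΣY² − (ΣY)²]). The variable names are illustrative.

```python
import math

# Heights of fathers (X) and their oldest sons (Y), from Example 2.
fathers = [68, 64, 70, 72, 69, 74]
sons = [67, 68, 69, 73, 66, 70]
n = len(fathers)

mean_x = sum(fathers) / n
mean_y = sum(sons) / n

# Least squares line of Y on X, written in deviation form.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fathers, sons))
sxx = sum((x - mean_x) ** 2 for x in fathers)
b = sxy / sxx
a = mean_y - b * mean_x
fitted = [a + b * x for x in fathers]

# Decomposition of the total variation in Y.
sst = sum((y - mean_y) ** 2 for y in sons)             # total
ssr = sum((f - mean_y) ** 2 for f in fitted)           # explained by the line
sse = sum((y - f) ** 2 for y, f in zip(sons, fitted))  # left in the residuals

r_squared = ssr / sst
r_from_r2 = math.copysign(math.sqrt(r_squared), b)     # r takes the sign of b

# "Calculator" formula, using raw sums only.
sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_xx = sum(x * x for x in fathers)
sum_yy = sum(y * y for y in sons)
r_calc = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"r^2 = {r_squared:.4f}, r = {r_from_r2:.4f}, calculator r = {r_calc:.4f}")
```

The calculator formula avoids computing the means and the fitted values, which is why it is convenient for hand computation from a table of totals.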


EXERCISES

1. The director of Graduate Studies at a large college of business would like to be able to predict the Grade Point Index (GPI) of students in an MBA program based on the Graduate Management Aptitude Test (GMAT) score. A sample of 15 students who had completed 2 years in the program is selected; the results are as follows:

Relating GPI to GMAT score

| Observation | GMAT score | GPI |
|---|---|---|
| 1 | 688 | 3.72 |
| 2 | 647 | 3.44 |
| 3 | 652 | 3.21 |
| 4 | 608 | 3.29 |
| 5 | 680 | 3.91 |
| 6 | 617 | 3.28 |
| 7 | 557 | 3.02 |
| 8 | 599 | 3.13 |
| 9 | 616 | 3.45 |
| 10 | 594 | 3.33 |
| 11 | 567 | 3.07 |
| 12 | 542 | 2.86 |
| 13 | 551 | 2.91 |
| 14 | 573 | 2.79 |
| 15 | 536 | 3.00 |

(a) Plot a scatter diagram (scattergram, scatter plot).
(b) Use the least squares method to find the regression coefficients a and b.
(c) Use the regression model to predict the GPI for a student with a GMAT score of 600.

2. Given are five observations taken for two variables X and Y.

| Observation i | Xi | Yi |
|---|---|---|
| 1 | 2 | 25 |
| 2 | 3 | 25 |
| 3 | 5 | 20 |
| 4 | 1 | 30 |
| 5 | 8 | 16 |

(a) Develop a scatter plot for these data.
(b) Use the method of least squares to compute an estimated regression equation for the data.

3. The following data were collected regarding the monthly starting salaries and the Grade Point Averages (GPA) for undergraduate students who had obtained a degree in political science.

| GPA (x) | Monthly salary in $ (y) |
|---|---|
| 2.6 | 1100 |
| 3.4 | 1400 |
| 3.6 | 1800 |
| 3.2 | 1300 |
| 3.5 | 1600 |
| 2.9 | 1200 |

(a) Develop a scattergram for these data.
(b) Use the least squares method to develop the estimated regression equation.
(c) Predict the monthly starting salary for a student with a 3.0 GPA and for a student with a 3.5 GPA.

4. A real estate agent would like to predict the selling price of single-family homes. After careful consideration, he concludes that the variable likely to be most closely related to the selling price is the size of the house. As an experiment, he takes a random sample of 15 recently sold houses and records the selling price (in $1,000) and the size (in 100 ft²) of each. These data are shown in the accompanying table. Find the sample regression line for these data.

| House size (x) | Selling price (y) |
|---|---|
| 20.0 | 89.5 |
| 14.8 | 79.9 |
| 20.5 | 83.1 |
| 12.5 | 56.9 |
| 18.0 | 66.6 |
| 14.3 | 82.5 |
| 27.5 | 126.3 |
| 16.5 | 79.3 |
| 24.3 | 119.9 |
| 20.2 | 87.6 |
| 22.0 | 112.6 |
| 19.0 | 120.8 |
| 12.3 | 78.5 |
| 14.0 | 74.3 |
| 16.7 | 74.8 |

5. Students in a small class were polled by a surveyor attempting to establish a relationship between hours of study in the week immediately preceding a major midterm exam and the marks received on the exam. The surveyor gathered the data listed in the accompanying table.

| Hours of study (x) | Exam score (y) |
|---|---|
| 25 | 93 |
| 12 | 57 |
| 18 | 55 |
| 26 | 90 |
| 19 | 82 |
| 20 | 95 |
| 23 | 95 |
| 15 | 80 |
| 22 | 85 |
| 8 | 61 |

(a) Find the equation of the regression line to help predict the exam score on the basis of study hours.
(b) If a student studies for 16 hours, what is the predicted exam score?

EXAMPLES

1. Suppose the data in the table below represent the grade point averages of 15 recent graduates and their starting annual salaries.

04/19/2023 28SWN/PROBABILITY AND STATISTIC

| GPA | Starting salary |
|---|---|
| 2.95 | 18.5 |
| 3.20 | 20.0 |
| 3.40 | 21.1 |
| 3.60 | 22.4 |
| 3.20 | 21.2 |
| 2.85 | 15.0 |
| 3.10 | 18.0 |
| 2.85 | 18.8 |
| 3.05 | 15.7 |
| 2.70 | 14.4 |
| 2.75 | 15.5 |
| 3.10 | 17.2 |
| 3.15 | 19.0 |
| 2.95 | 17.2 |
| 2.75 | 16.8 |

a) Determine a regression equation for average starting salary as a function of grade point average.

b) Determine the sample coefficient of correlation r (correlation coefficient)

Note: regression equation = regression line = least squares line = least squares prediction equation

The methodology used to obtain this line is called the least squares method (method of least squares)

| GPA (X) | Salary (Y) | XY | X² | Y² | Estimated salary (Ŷ) |
|---|---|---|---|---|---|
| 2.95 | 18.5 | 54.575 | 8.7025 | 342.25 | 17.32 |
| 3.20 | 20.0 | 64.000 | 10.2400 | 400.00 | 19.35 |
| 3.40 | 21.1 | 71.740 | 11.5600 | 445.21 | 20.98 |
| 3.60 | 22.4 | 80.640 | 12.9600 | 501.76 | 22.60 |
| 3.20 | 21.2 | 67.840 | 10.2400 | 449.44 | 19.35 |
| 2.85 | 15.0 | 42.750 | 8.1225 | 225.00 | 16.51 |
| 3.10 | 18.0 | 55.800 | 9.6100 | 324.00 | 18.54 |
| 2.85 | 18.8 | 53.580 | 8.1225 | 353.44 | 16.51 |
| 3.05 | 15.7 | 47.885 | 9.3025 | 246.49 | 18.13 |
| 2.70 | 14.4 | 38.880 | 7.2900 | 207.36 | 15.29 |
| 2.75 | 15.5 | 42.625 | 7.5625 | 240.25 | 15.70 |
| 3.10 | 17.2 | 53.320 | 9.6100 | 295.84 | 18.54 |
| 3.15 | 19.0 | 59.850 | 9.9225 | 361.00 | 18.95 |
| 2.95 | 17.2 | 50.740 | 8.7025 | 295.84 | 17.32 |
| 2.75 | 16.8 | 46.200 | 7.5625 | 282.24 | 15.70 |
| Total: 45.6 | 270.8 | 830.425 | 139.5100 | 4970.12 | 270.79 |

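a) Hence, the estimated regression equation is obtained from the totals of the table (n = 15, ∑X = 45.6, ∑Y = 270.8, ∑XY = 830.425, ∑X² = 139.51):

$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2} = \frac{15(830.425) - (45.6)(270.8)}{15(139.51) - (45.6)^2} = \frac{107.895}{13.29} \approx 8.12$$

$$a = \bar{Y} - b \bar{X} = \frac{270.8}{15} - 8.12 \cdot \frac{45.6}{15} \approx -6.63$$

$$\hat{Y} = -6.63 + 8.12 X$$

b) Correlation coefficient (with ∑Y² = 4970.12):

$$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}} = \frac{107.895}{\sqrt{(13.29)(1219.16)}} \approx 0.848$$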
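The worksheet for Example 1 can be reproduced in a few lines. This is a sketch rather than the author's own computation: it recomputes the column totals from the GPA and salary data of the example, applies the standard least squares and calculator formulas, and regenerates the estimated salary column. All variable names are illustrative.

```python
import math

# GPA (X) and starting salary (Y) of the 15 graduates in Example 1.
gpa = [2.95, 3.20, 3.40, 3.60, 3.20, 2.85, 3.10, 2.85,
       3.05, 2.70, 2.75, 3.10, 3.15, 2.95, 2.75]
salary = [18.5, 20.0, 21.1, 22.4, 21.2, 15.0, 18.0, 18.8,
          15.7, 14.4, 15.5, 17.2, 19.0, 17.2, 16.8]

n = len(gpa)

# Column totals of the worksheet: X, Y, XY, X^2, Y^2.
sum_x = sum(gpa)
sum_y = sum(salary)
sum_xy = sum(x * y for x, y in zip(gpa, salary))
sum_xx = sum(x * x for x in gpa)
sum_yy = sum(y * y for y in salary)

# Least squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Estimated salary column and the sample correlation coefficient.
estimates = [a + b * x for x in gpa]
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)

print(f"totals: {sum_x:.1f} {sum_y:.1f} {sum_xy:.3f} {sum_xx:.4f} {sum_yy:.2f}")
print(f"Y_hat = {a:.2f} + {b:.2f} X,  r = {r:.3f}")
for x, y, y_hat in zip(gpa, salary, estimates):
    print(f"{x:.2f}  {y:5.1f}  {y_hat:6.2f}")
```

The fitted values sum to the same total as the observed salaries, apart from rounding in the printed column, because the least squares residuals always sum to zero.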

2. The heights of fathers, X, and the heights of their oldest sons when grown, Y, are given as measurements to the nearest inch.

| X | 68 | 64 | 70 | 72 | 69 | 74 |
|---|---|---|---|---|---|---|
| Y | 67 | 68 | 69 | 73 | 66 | 70 |

a) Construct a scattergram (scatter plot).
b) Find the equation of the least squares regression line.
c) Compute the coefficient of correlation r.

SOLUTION

a) Draw the scatter plot yourself in a rectangular (orthogonal) coordinate system.

b)

| X | Y | X² | Y² | XY |
|---|---|---|---|---|
| 68 | 67 | 4624 | 4489 | 4556 |
| 64 | 68 | 4096 | 4624 | 4352 |
| 70 | 69 | 4900 | 4761 | 4830 |
| 72 | 73 | 5184 | 5329 | 5256 |
| 69 | 66 | 4761 | 4356 | 4554 |
| 74 | 70 | 5476 | 4900 | 5180 |
| Totals: 417 | 413 | 29041 | 28459 | 28728 |

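Hence, the equation of the regression line is:

$$b = \frac{6(28728) - (417)(413)}{6(29041) - (417)^2} = \frac{147}{357} \approx 0.412, \qquad a = \frac{413}{6} - 0.412 \cdot \frac{417}{6} \approx 40.2$$

$$\hat{Y} = 40.2 + 0.412 X$$

c) The coefficient of correlation r:

$$r = \frac{6(28728) - (417)(413)}{\sqrt{\left[6(29041) - (417)^2\right]\left[6(28459) - (413)^2\right]}} = \frac{147}{\sqrt{(357)(185)}} \approx 0.572$$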

3. A company would like to predict how the trainees in its salesmanship course will perform. At the beginning of their two-month course, the trainees are given an aptitude test. This is the X-score shown below. The subsequent sales record of each trainee is kept and constitutes the Y-value.

| X | 18 | 26 | 28 | 34 | 36 | 42 | 48 | 52 | 54 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|
| Y | 54 | 64 | 54 | 62 | 68 | 70 | 76 | 66 | 76 | 74 |

a. Plot these data.

b. Find the regression line relating performance on the test to sales.

c. Compute the coefficient of correlation.

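Solution

Tabulate the data to compute ∑X, ∑Y, ∑X², and ∑XY.

| X | Y | X² | XY | Y² |
|---|---|---|---|---|
| 18 | 54 | 324 | 972 | 2916 |
| 26 | 64 | 676 | 1664 | 4096 |
| 28 | 54 | 784 | 1512 | 2916 |
| 34 | 62 | 1156 | 2108 | 3844 |
| 36 | 68 | 1296 | 2448 | 4624 |
| 42 | 70 | 1764 | 2940 | 4900 |
| 48 | 76 | 2304 | 3648 | 5776 |
| 52 | 66 | 2704 | 3432 | 4356 |
| 54 | 76 | 2916 | 4104 | 5776 |
| 60 | 74 | 3600 | 4440 | 5476 |
| ∑X = 398 | ∑Y = 664 | ∑X² = 17524 | ∑XY = 27268 | ∑Y² = 44680 |

From these totals b and a can be computed:

$$b = \frac{10(27268) - (398)(664)}{10(17524) - (398)^2} = \frac{8408}{16836} \approx 0.4994, \qquad a = \frac{664}{10} - 0.4994 \cdot \frac{398}{10} \approx 46.52$$

$$\hat{Y} = 46.52 + 0.4994 X$$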
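Once the column totals are tabulated, the raw data are no longer needed to obtain the line. The sketch below works directly from the totals of this example (n = 10, ∑X = 398, ∑Y = 664, ∑XY = 27268, ∑X² = 17524, ∑Y² = 44680) and also returns the correlation coefficient asked for in part c. The function name `fit_from_totals` and the aptitude score of 40 used for the prediction are hypothetical choices for illustration.

```python
import math


def fit_from_totals(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Return (a, b, r) computed from the worksheet totals alone."""
    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_xx - sum_x ** 2
    syy = n * sum_yy - sum_y ** 2
    b = sxy / sxx                     # slope
    a = (sum_y - b * sum_x) / n       # intercept
    r = sxy / math.sqrt(sxx * syy)    # sample correlation coefficient
    return a, b, r


# Totals from the tabulation of aptitude scores (X) and sales (Y).
a, b, r = fit_from_totals(
    n=10, sum_x=398, sum_y=664, sum_xy=27268, sum_xx=17524, sum_yy=44680
)

# Predicted sales for a hypothetical trainee with an aptitude score of 40.
predicted = a + b * 40

print(f"Y_hat = {a:.2f} + {b:.4f} X")
print(f"r = {r:.3f}")
print(f"predicted Y at X = 40: {predicted:.1f}")
```

Working from totals mirrors the hand calculation, which makes it a convenient cross-check on the tabulation itself.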


1. The advertising expense and profit of a company for each of five years are given below:

| Advertising expense (in thousands, $) | Profit (in thousands, $) |
|---|---|
| 2 | 15 |
| 7 | 50 |
| 10 | 110 |
| 15 | 220 |
| 20 | 200 |

a. Find the equation of the least squares line.
b. If the advertising expense is to be $9,000 for a particular year, predict the profit for that year.

2. The mathematics SAT scores and the college grade point averages of 8 students are given below:

| Math SAT score (x) | College grade point average (y) |
|---|---|
| 600 | 3.20 |
| 550 | 3.00 |
| 500 | 3.00 |
| 650 | 3.50 |
| 625 | 2.80 |
| 480 | 2.60 |
| 700 | 3.60 |
| 580 | 3.10 |

Calculate the correlation coefficient r.

3. A personnel manager wants to predict the salary for a systems analyst based on the number of years of experience. A random sample of 12 systems analysts produces the following results:

| Years of experience | Salary (thousands) |
|---|---|
| 5.5 | 19.9 |
| 9.0 | 25.5 |
| 4.0 | 23.9 |
| 8.0 | 24.0 |
| 9.5 | 22.5 |
| 3.0 | 20.5 |
| 7.0 | 21.0 |
| 1.5 | 17.7 |
| 8.5 | 30.0 |
| 7.5 | 25.0 |
| 9.5 | 21.0 |
| 6.0 | 18.6 |

a. Find the least squares line.

b. Predict the salary of a systems analyst with 5 years of experience.