Multiple Regression (Reduced Set with MiniTab Examples)

Upload: baina

Post on 22-Feb-2016

TRANSCRIPT

Page 1: Multiple Regression (Reduced Set with  MiniTab  Examples)

1 Slide

Multiple Regression (Reduced Set with MiniTab Examples)

Chapter 15
BA 303

Page 2: Multiple Regression (Reduced Set with  MiniTab  Examples)

2 Slide

MULTIPLE REGRESSION

Page 3: Multiple Regression (Reduced Set with  MiniTab  Examples)

3 Slide

Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.

ŷ = b0 + b1x1 + b2x2 + . . . + bpxp

Page 4: Multiple Regression (Reduced Set with  MiniTab  Examples)

4 Slide

Least Squares Method

Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

Page 5: Multiple Regression (Reduced Set with  MiniTab  Examples)

5 Slide

Multiple Regression Model

Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

Page 6: Multiple Regression (Reduced Set with  MiniTab  Examples)

6 Slide

Multiple Regression Model

Exper. (Yrs.)   Test Score   Salary ($000s)
4               78           24.0
7               100          43.0
1               86           23.7
5               82           34.3
8               86           35.8
10              84           38.0
0               75           22.2
1               80           23.1
6               83           30.0
6               91           33.0
9               88           38.0
2               73           26.6
10              75           36.2
5               81           31.6
6               74           29.0
8               87           34.0
4               79           30.1
6               94           33.9
3               70           28.2
3               89           30.0

Page 7: Multiple Regression (Reduced Set with  MiniTab  Examples)

7 Slide

Multiple Regression Model

Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:

y = β0 + β1x1 + β2x2 + ε

where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test

Page 8: Multiple Regression (Reduced Set with  MiniTab  Examples)

8 Slide

Solving for the Estimates of β0, β1, β2

Salary = 3.174 + 1.4039 YearsExp + 0.25089 ApScore

Note: Predicted salary will be in thousands of dollars.
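The estimates above come from MiniTab. As a rough cross-check, the least squares criterion can be solved directly from the normal equations (X'X)b = X'y. The sketch below is an illustration only, not the MiniTab procedure: the helper names `solve` and `fit_ols` are made up for this example, and the data are the 20 observations transcribed from the data slide. The printed coefficients should come out close to the equation on this slide.

```python
# Programmer salary data transcribed from the slides (20 observations).
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Return [b0, b1, ..., bp] minimizing sum((y_i - yhat_i)^2)."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols([exper, score], salary)
print(f"Salary = {b0:.3f} + {b1:.4f} YearsExp + {b2:.5f} ApScore")
```

Solving the normal equations by hand is fine for a small, well-behaved problem like this one; statistical packages generally use more numerically stable decompositions.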

Page 9: Multiple Regression (Reduced Set with  MiniTab  Examples)

9 Slide

MULTIPLE COEFFICIENT OF DETERMINATION

Page 10: Multiple Regression (Reduced Set with  MiniTab  Examples)

10 Slide

Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Page 11: Multiple Regression (Reduced Set with  MiniTab  Examples)

11 Slide

SSR, SSE, and SST

[MiniTab output with SSR, SSE, and SST marked]

Page 12: Multiple Regression (Reduced Set with  MiniTab  Examples)

12 Slide

Multiple Coefficient of Determination

R² = SSR/SST

Page 13: Multiple Regression (Reduced Set with  MiniTab  Examples)

13 Slide

Adjusted Multiple Coefficient of Determination

$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$

where p is the number of independent variables in the regression equation.

Page 14: Multiple Regression (Reduced Set with  MiniTab  Examples)

14 Slide

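$R^2$ and $R_a^2$

$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{500.33}{599.79} = 0.834$

$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - (1 - 0.834)\,\frac{20 - 1}{20 - 2 - 1} = 0.81447$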
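The two calculations on this slide can be sketched in a few lines. The function names are hypothetical; the inputs (SSR = 500.33, SST = 599.79, n = 20, p = 2) are the values used on the slide.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


ssr, sst = 500.33, 599.79  # sums of squares from the slide
n, p = 20, 2               # 20 programmers, 2 independent variables

r2 = r_squared(ssr, sst)
r2_adj = adjusted_r_squared(r2, n, p)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```

The slide's 0.81447 uses R² already rounded to 0.834; carrying the unrounded ratio through gives a value that differs slightly in the fourth decimal place.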

Page 15: Multiple Regression (Reduced Set with  MiniTab  Examples)

15 Slide

TESTING FOR SIGNIFICANCE

Page 16: Multiple Regression (Reduced Set with  MiniTab  Examples)

16 Slide

Testing for Significance: F Test

The F test is referred to as the test for overall significance.

The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

Page 17: Multiple Regression (Reduced Set with  MiniTab  Examples)

17 Slide

Testing for Significance: F Test

Hypotheses
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.

Test Statistic
F = MSR/MSE

Rejection Rule
Reject H0 if p-value < α or if F > Fα, where Fα is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.

Page 18: Multiple Regression (Reduced Set with  MiniTab  Examples)

18 Slide

F Test for Overall Significance

Say α = 0.05, is the regression significant overall?
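One way to work through this question by hand is sketched below. It assumes the usual definitions MSR = SSR/p and MSE = SSE/(n - p - 1), takes SSR and SST from the R² slide, and obtains SSE from SST = SSR + SSE. The tail-probability helper is a closed form that holds only when the numerator has exactly 2 d.f., which happens to be the case here; it is not a general F distribution routine, and its name is made up for this example.

```python
def f_statistic(ssr, sse, n, p):
    """F = MSR / MSE with MSR = SSR / p and MSE = SSE / (n - p - 1)."""
    msr = ssr / p
    mse = sse / (n - p - 1)
    return msr / mse


def f_upper_tail_two_numerator_df(f, df2):
    """P(F > f) when the numerator has exactly 2 d.f. (special case only)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


ssr, sst = 500.33, 599.79   # from the R^2 slide
sse = sst - ssr             # SST = SSR + SSE
n, p, alpha = 20, 2, 0.05

F = f_statistic(ssr, sse, n, p)
p_value = f_upper_tail_two_numerator_df(F, n - p - 1)
print(f"F = {F:.2f}, p-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

For models with a different number of independent variables, the p-value would come from an F table or a statistics package rather than this shortcut.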

Page 19: Multiple Regression (Reduced Set with  MiniTab  Examples)

19 Slide

Testing for Significance: t Test

A separate t test is conducted for each of the independent variables in the model.

The t test is used to determine whether each of the individual independent variables is significant.

We refer to each of these t tests as a test for individual significance.

Page 20: Multiple Regression (Reduced Set with  MiniTab  Examples)

20 Slide

Testing for Significance: t Test

Hypotheses
H0: βi = 0
Ha: βi ≠ 0

Test Statistic
$t = \frac{b_i}{s_{b_i}}$

Rejection Rule
Reject H0 if p-value < α or if t < -tα/2 or t > tα/2, where tα/2 is based on a t distribution with n - p - 1 degrees of freedom.

Page 21: Multiple Regression (Reduced Set with  MiniTab  Examples)

21 Slide

t Test for Significance of Individual Parameters

Say α = 0.05, which parameters are significant?
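The MiniTab output for this slide did not survive in the transcript, so the sketch below recomputes the t statistics from the data. It assumes the standard result that the estimated standard error of b_i is the square root of MSE times the i-th diagonal element of (X'X)^-1, which is not stated on the slides. The critical value 2.110 is the tabled t value for α/2 = 0.025 with 17 d.f.; `invert` is a hypothetical helper written for this example.

```python
import math

# Programmer salary data transcribed from the slides (20 observations).
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


rows = [[1.0, float(a), float(b)] for a, b in zip(exper, score)]
n, k = len(rows), len(rows[0])  # k = p + 1 coefficients
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, salary)) for i in range(k)]
inv = invert(xtx)
coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]

# MSE = SSE / (n - p - 1); standard errors from the diagonal of (X'X)^-1.
resid = [y - sum(c * x for c, x in zip(coef, r)) for r, y in zip(rows, salary)]
mse = sum(e * e for e in resid) / (n - k)
std_err = [math.sqrt(mse * inv[i][i]) for i in range(k)]
t_stats = [c / s for c, s in zip(coef, std_err)]

T_CRIT = 2.110  # tabled t value for alpha/2 = 0.025 with 17 d.f.
for name, b, s, t in zip(["Constant", "YearsExp", "ApScore"], coef, std_err, t_stats):
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:9s} b = {b:8.4f}  s_b = {s:7.4f}  t = {t:6.2f}  {verdict}")
```

The rows of interest are the two slope coefficients, since the hypotheses on the previous slide concern the independent variables.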

Page 22: Multiple Regression (Reduced Set with  MiniTab  Examples)

22 Slide

MULTICOLLINEARITY

Page 23: Multiple Regression (Reduced Set with  MiniTab  Examples)

23 Slide

Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated, it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

Every attempt should be made to avoid including independent variables that are highly correlated.

Page 24: Multiple Regression (Reduced Set with  MiniTab  Examples)

24 Slide

Multicollinearity

The Variance Inflation Factor (VIF) measures how much the variance of the coefficient for an independent variable is inflated by one or more of the other independent variables.

This inflation of the variance means that the independent variable is highly correlated with at least one other independent variable.
• VIF around 1 = no multicollinearity (good)
• VIF much greater than 1 = multicollinearity (bad)
• “much greater” is subjective!
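The slides do not give a formula for VIF. The sketch below assumes the usual definition, VIF = 1 / (1 - R²), where R² comes from regressing one independent variable on the others. With only two independent variables that R² is just the squared correlation between them, so both variables share the same VIF. The helper name `pearson_r` is made up for this example; the data are the experience and test-score columns from the programmer survey.

```python
import math

# Independent variables from the programmer salary survey.
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


r = pearson_r(exper, score)
vif = 1.0 / (1.0 - r ** 2)  # two-predictor special case of 1 / (1 - R_j^2)
print(f"correlation = {r:.3f}, VIF = {vif:.3f}")
```

A VIF this close to 1 suggests that experience and test score are only weakly correlated in this sample.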

Page 25: Multiple Regression (Reduced Set with  MiniTab  Examples)

25 Slide

Multicollinearity

VIF values are not available in Excel.

MiniTab:

Page 26: Multiple Regression (Reduced Set with  MiniTab  Examples)

26 Slide

ESTIMATION AND PREDICTION

Page 27: Multiple Regression (Reduced Set with  MiniTab  Examples)

27 Slide

Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.

We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of ŷ as the point estimate.
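As a small illustration of the substitution step, the sketch below plugs values into the estimated equation Salary = 3.174 + 1.4039 YearsExp + 0.25089 ApScore from the earlier slide. The inputs (5 years of experience, aptitude score of 80) are made-up values chosen for this example, and `predict_salary` is a hypothetical helper name.

```python
def predict_salary(years_exp, ap_score):
    """Point estimate of salary in $1000s from the estimated regression equation."""
    return 3.174 + 1.4039 * years_exp + 0.25089 * ap_score


# Illustrative inputs only; not taken from the slides.
estimate = predict_salary(5, 80)
print(f"Point estimate: {estimate:.2f} (thousands of dollars)")
```

The same number serves as the point estimate for both the mean salary and an individual salary at those x values; the confidence and prediction intervals around it differ.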

Page 28: Multiple Regression (Reduced Set with  MiniTab  Examples)

28 Slide

PI and CI Using MiniTab

Page 29: Multiple Regression (Reduced Set with  MiniTab  Examples)

29 Slide

CATEGORICAL VARIABLES

Page 30: Multiple Regression (Reduced Set with  MiniTab  Examples)

30 Slide

Categorical Independent Variables

In many situations we must work with categorical independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, x2 might represent gender where x2 = 0 indicates male and x2 = 1 indicates female.

In this case, x2 is called a dummy or indicator variable.

Page 31: Multiple Regression (Reduced Set with  MiniTab  Examples)

31 Slide

Categorical Independent Variables

Programmer Salary Survey

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($000) for each of the sampled 20 programmers are shown on the next slide.

Page 32: Multiple Regression (Reduced Set with  MiniTab  Examples)

32 Slide

Exper. (Yrs.)   Test Score   Degr.   Salary ($000s)
4               78           No      24.0
7               100          Yes     43.0
1               86           No      23.7
5               82           Yes     34.3
8               86           Yes     35.8
10              84           Yes     38.0
0               75           No      22.2
1               80           No      23.1
6               83           No      30.0
6               91           Yes     33.0
9               88           Yes     38.0
2               73           No      26.6
10              75           Yes     36.2
5               81           No      31.6
6               74           No      29.0
8               87           Yes     34.0
4               79           No      30.1
6               94           Yes     33.9
3               70           No      28.2
3               89           No      30.0

Categorical Independent Variables

If grad degree, Degr = 1. If no grad degree, Degr = 0.
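The coding rule can be sketched as a one-line mapping. The Yes/No values below are the Degr. column transcribed from this slide, in the same row order as the data table.

```python
# Degr. column from the slide, rows 1-20.
degree = ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes",
          "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]

# Dummy variable: 1 if the programmer has a graduate degree, 0 otherwise.
degr = [1 if d == "Yes" else 0 for d in degree]
print(degr)
```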

Page 33: Multiple Regression (Reduced Set with  MiniTab  Examples)

33 Slide

Exper. (Yrs.)   Test Score   Degr.   Salary ($000s)
4               78           0       24.0
7               100          1       43.0
1               86           0       23.7
5               82           1       34.3
8               86           1       35.8
10              84           1       38.0
0               75           0       22.2
1               80           0       23.1
6               83           0       30.0
6               91           1       33.0
9               88           1       38.0
2               73           0       26.6
10              75           1       36.2
5               81           0       31.6
6               74           0       29.0
8               87           1       34.0
4               79           0       30.1
6               94           1       33.9
3               70           0       28.2
3               89           0       30.0

Categorical Independent Variables

Page 34: Multiple Regression (Reduced Set with  MiniTab  Examples)

34 Slide

Estimated Regression Equation

ŷ = b0 + b1x1 + b2x2 + b3x3

where:
ŷ = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree, 1 if individual does have a graduate degree

x3 is a dummy variable.

Page 35: Multiple Regression (Reduced Set with  MiniTab  Examples)

35 Slide

Categorical Independent Variables

Page 36: Multiple Regression (Reduced Set with  MiniTab  Examples)

36 Slide

Categorical Independent Variables

Page 37: Multiple Regression (Reduced Set with  MiniTab  Examples)

37 Slide

Categorical Independent Variables

Page 38: Multiple Regression (Reduced Set with  MiniTab  Examples)

38 Slide

More Complex Categorical Variables

If a categorical variable has k levels, k - 1 dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C could be represented by x1 and x2 values of (0, 0) for A, (1, 0) for B, and (0,1) for C.

Care must be taken in defining and interpreting the dummy variables.

Page 39: Multiple Regression (Reduced Set with  MiniTab  Examples)

39 Slide

More Complex Categorical Variables

For example, a variable indicating level of education could be represented by x1 and x2 values as follows:

Highest Degree   x1   x2
Bachelor’s       0    0
Master’s         1    0
Ph.D.            0    1
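The k - 1 coding in the table can be sketched as a small function. The function name and the plain-apostrophe level labels are choices made for this example; the first level in the list plays the role of the baseline, coded as all zeros.

```python
def dummy_code(value, levels):
    """Code a categorical value as k - 1 dummies; levels[0] is the baseline."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels[1:])


LEVELS = ["Bachelor's", "Master's", "Ph.D."]  # k = 3 levels, so 2 dummy variables

for degree in LEVELS:
    x1, x2 = dummy_code(degree, LEVELS)
    print(f"{degree:11s} x1 = {x1}  x2 = {x2}")
```

With this coding, each dummy coefficient is interpreted relative to the baseline level, which is why care is needed in choosing and describing the baseline.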

Page 40: Multiple Regression (Reduced Set with  MiniTab  Examples)

40 Slide

AND RESIDUALS

Page 41: Multiple Regression (Reduced Set with  MiniTab  Examples)

41 Slide

Assumptions About the Error Term ε

The error ε is a random variable with mean of zero.

The variance of ε, denoted by σ², is the same for all values of the independent variables.

The values of ε are independent.

The error ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp.

Page 42: Multiple Regression (Reduced Set with  MiniTab  Examples)

42 Slide

Standardized Residual Plot Against ŷ

Standardized residuals are frequently used in residual plots for purposes of:
• Identifying outliers (typically, standardized residuals < -2 or > +2)
• Providing insight about the assumption that the error term has a normal distribution
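The slides do not show how standardized residuals are computed. A common definition, assumed here, divides each residual e_i by s·sqrt(1 - h_i), where s is the standard error of the estimate and h_i is the leverage of observation i (the i-th diagonal element of X(X'X)^-1X'). The sketch below applies that definition to the two-variable programmer model and lists any observations outside ±2; `invert` is a hypothetical helper, and MiniTab's own output may be formatted differently.

```python
import math

# Programmer salary data transcribed from the slides (20 observations).
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


rows = [[1.0, float(a), float(b)] for a, b in zip(exper, score)]
n, k = len(rows), len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, salary)) for i in range(k)]
inv = invert(xtx)
coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]

fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
resid = [y - f for y, f in zip(salary, fitted)]
s = math.sqrt(sum(e * e for e in resid) / (n - k))  # standard error of the estimate

# Leverage h_i = x_i' (X'X)^-1 x_i for each observation.
leverage = [sum(r[i] * inv[i][j] * r[j] for i in range(k) for j in range(k))
            for r in rows]
std_resid = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, leverage)]

outliers = [i + 1 for i, z in enumerate(std_resid) if abs(z) > 2]
print("Observations with |standardized residual| > 2:", outliers)
```

Plotting `std_resid` against `fitted` gives the kind of residual plot named on this slide.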

Page 43: Multiple Regression (Reduced Set with  MiniTab  Examples)

43 Slide

Standardized Residual Plot Against ŷ

Page 44: Multiple Regression (Reduced Set with  MiniTab  Examples)

44 Slide

Residuals

Page 45: Multiple Regression (Reduced Set with  MiniTab  Examples)

45 Slide