multiple regression continued…

Post on 23-Feb-2016

Multiple Regression continued…

STAT E-150 Statistical Methods

When we discussed simple linear regression, we briefly introduced prediction intervals and confidence intervals:

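Confidence Intervals and Prediction Intervals

Let x* be a specific value of x. The predicted value of y is

ŷ = b0 + b1x*

We can create two different intervals:

a prediction interval for an individual value of y at x*

a confidence interval for the mean value of y at x*

The basic format for an interval is

ŷ ± t* · SE

where t* is the critical value of the t distribution with n – 2 degrees of freedom.

When we want to find a mean predicted value,

SE = s · sqrt( 1/n + (x* – x̄)² / Σ(x – x̄)² )

When we want to find an individual predicted value,

SE = s · sqrt( 1 + 1/n + (x* – x̄)² / Σ(x – x̄)² )

where s is the standard error of the regression.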
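As a rough illustration of the two intervals for simple linear regression, here is a minimal Python sketch. It assumes the usual textbook standard errors (the individual-value version adds 1 under the square root). The data are invented for illustration and are not the course data, and the critical value 2.306 is the 95% t value for 8 degrees of freedom.

```python
import math

# Hypothetical (x, y) data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the regression, with n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

T_STAR = 2.306  # 95% critical value of t with df = n - 2 = 8


def interval(x_star, individual):
    """Return (lower, upper): a prediction interval if individual is True,
    otherwise a confidence interval for the mean response."""
    y_hat = b0 + b1 * x_star
    extra = 1.0 if individual else 0.0
    se = s * math.sqrt(extra + 1.0 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - T_STAR * se, y_hat + T_STAR * se


ci = interval(4.0, individual=False)
pi = interval(4.0, individual=True)
print("95% CI for the mean:      ", ci)
print("95% PI for an individual: ", pi)
```

Both intervals are centered at the same predicted value; the prediction interval is always the wider of the two.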

Let us return to our earlier discussion of the age of adolescent mothers and the weight of their babies. We found that there was a linear relationship between these variables:

weight = 245.15 age – 1163.45

 

How can we use this model to make predictions?

Suppose we want to predict the weight of a baby born to a mother who is 16 years old. When we analyze the data, we can choose to save the predicted values, the confidence interval and the prediction interval for each predictor value. The results will appear in the datasheet, with columns for the x-value, the predicted y-value, the 95% confidence interval, and the 95% prediction interval.

What weight is expected for a baby of a 16-year-old mother? 2759 g
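The predicted value can also be checked by hand from the fitted line. The short Python sketch below simply evaluates the equation weight = 245.15 age – 1163.45 at age 16; the function name is an illustrative choice, not part of the course materials.

```python
def predicted_weight(age_years):
    """Predicted birthweight in grams from the fitted line in the text."""
    return 245.15 * age_years - 1163.45


weight_16 = predicted_weight(16)
print(f"Predicted weight at age 16: {weight_16:.2f} g")  # rounds to 2759 g
```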

What is the prediction interval estimate for the weight of a baby of a 16-year-old mother? 2251.24 to 3266.66 g

What does it tell you? We are 95% confident that the birthweight of a baby born to a 16-year-old mother is between 2251.24 and 3266.66 g.

What is the confidence interval estimate for the mean weight of babies of 16-year-old mothers? 2575.59 to 2942.31 g

What does it tell you? We are 95% confident that the mean birthweight of babies born to 16-year-old mothers is between 2575.59 and 2942.31 g.

The 95% confidence interval is (2575.59, 2942.31)

The 95% prediction interval is (2251.24, 3266.66)

Which interval is wider? Why?

The prediction interval is wider, because means vary less than individual values: the prediction interval must allow for the scatter of individual babies around the mean as well as the uncertainty in the estimated mean.
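A quick way to compare the two intervals is to compute the center and the margin of error of each. The Python sketch below uses only the endpoints reported above; the helper name is an illustrative choice.

```python
ci = (2575.59, 2942.31)  # 95% confidence interval for the mean weight
pi = (2251.24, 3266.66)  # 95% prediction interval for an individual weight


def center_and_margin(interval):
    """Return the midpoint and half-width of an interval."""
    lower, upper = interval
    return (lower + upper) / 2, (upper - lower) / 2


ci_center, ci_margin = center_and_margin(ci)
pi_center, pi_margin = center_and_margin(pi)

print(f"CI: center {ci_center:.2f}, margin {ci_margin:.2f}")
print(f"PI: center {pi_center:.2f}, margin {pi_margin:.2f}")
```

Both intervals are centered at the predicted value of about 2759 g; only the margin of error differs.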

In the data concerning body fat percentages in men, the predictor variables were waist and height, and we found a regression equation which we can now use to make predictions:

%BodyFat = 1.773 waist – .601 height – 3.110

We can find prediction intervals and confidence intervals as we did when we used a single predictor.

Suppose we want to predict the body fat percentage associated with a waist size of 34 inches and a height of 6 feet. We can proceed as we did with a single predictor, by entering these values in the data window, and then saving the results of the linear regression analysis.

When you scroll to the right, you will see these results:

What is the predicted body fat %? 13.874%

What is the prediction interval? What does it tell you?

The 95% prediction interval is (5.05, 22.69). We are 95% confident that a man who is 6 feet tall and has a 34-inch waist will have a body fat percentage between 5.05 and 22.69.

What is the confidence interval? What does it tell you?

The 95% confidence interval is (13.10, 14.65). We are 95% confident that the mean body fat percentage for men who are 6 feet tall and have a 34-inch waist is between 13.10 and 14.65.
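The predicted body fat percentage can be approximated directly from the regression equation %BodyFat = 1.773 waist – .601 height – 3.110. The Python sketch below assumes that both waist and height are measured in inches, so 6 feet is entered as 72; that assumption is consistent with the reported prediction. Because the printed coefficients are rounded, the hand calculation differs slightly from the software value of 13.874.

```python
INCHES_PER_FOOT = 12


def predicted_body_fat(waist_in, height_in):
    """Predicted body fat percentage from the rounded coefficients in the text."""
    return 1.773 * waist_in - 0.601 * height_in - 3.110


prediction = predicted_body_fat(34, 6 * INCHES_PER_FOOT)
print(f"Predicted body fat: {prediction:.3f}%")  # close to the reported 13.874%
```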

Models with Categorical Predictors

Categorical (or qualitative) variables can also be included in multiple regression models. These variables are coded as numbers so that we can employ the methods we have discussed. These coded values are called indicator variables or dummy variables.

They are often coded using 0 and 1, where

0 = absence, or "no"

1 = presence, or "yes"

Example: One way colleges measure success is by graduation rates. The Education Trust publishes 6-year graduation rates along with other college characteristics on its website, www.collegeresults.org.


Here is a sample of the data, which represents a random sample of 22 colleges selected from the 1037 colleges in the United States with enrollments under 5000 students:

We define these variables:

y = 6-year graduation rate
x1 = median SAT score of students accepted to the college
x2 = student-related expense per full-time student (in dollars)
x3 = indicator for the type of college (1 = single-sex, 0 = coeducational)

The regression model is y = β0 + β1x1 + β2x2 + β3x3 + ε

For single-sex colleges, the intercept is β0 + β3:

Rate = β0 + β1 SAT + β2 Expense + β3(1) + ε = (β0 + β3) + β1 SAT + β2 Expense + ε

For coeducational colleges:

Rate = β0 + β1 SAT + β2 Expense + β3(0) + ε = β0 + β1 SAT + β2 Expense + ε

In either case, the slopes are determined using data from both types of colleges.

In other words, the coefficient of the indicator variable represents the difference in intercepts for the regression equations for the two types of colleges.
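To make the 0/1 coding concrete, here is a minimal Python sketch that turns a categorical college type into an indicator variable. The college names and the helper function are hypothetical and are not taken from the data set.

```python
# Hypothetical colleges, invented for illustration only.
colleges = [
    ("College A", "single-sex"),
    ("College B", "coeducational"),
    ("College C", "coeducational"),
]


def single_sex_indicator(college_type):
    """Code the category as 1 for single-sex and 0 for coeducational."""
    return 1 if college_type == "single-sex" else 0


x3 = [single_sex_indicator(college_type) for _, college_type in colleges]
print(x3)  # [1, 0, 0]
```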

What are the hypotheses?

H0: β1 = β2 = β3 = 0
Ha: The coefficients are not all zero

Here is part of the SPSS analysis:

ANOVA (b)

Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   .795             3    .265          37.164   .000 (a)
  Residual     .128             18   .007
  Total        .923             21

a. Predictors: (Constant), x3, x1, x2
b. Dependent Variable: y

Coefficients (a)

Model          B          Std. Error   Beta    t        Sig.
1 (Constant)   -.391      .198                 -1.977   .064
  x1           .001       .000         .608    3.305    .004
  x2           6.969E-6   .000         .297    1.547    .139
  x3           .125       .059         .209    2.102    .050

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent Variable: y

What is your conclusion?


Since F is large (37.164) and p is close to 0 (p < .001), the null hypothesis is rejected. We can conclude that there is a linear relationship between the 6-year graduation rate and at least one of the predictors: the median SAT score, the student-related expense per full-time student, and whether the college is single-sex or coeducational.
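The entries of the ANOVA table are related by simple arithmetic, which the Python sketch below reproduces from the rounded values in the SPSS output. Because the sums of squares are printed to three decimals, the recomputed F is close to, but not exactly, the reported 37.164. The R² line is the standard ratio of the regression sum of squares to the total; it does not appear in the output shown here.

```python
ss_regression, df_regression = 0.795, 3
ss_residual, df_residual = 0.128, 18
ss_total = 0.923

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic for the overall test of H0: beta1 = beta2 = beta3 = 0.
f_statistic = ms_regression / ms_residual

# Proportion of the variation in y explained by the model.
r_squared = ss_regression / ss_total

print(f"MS regression = {ms_regression:.3f}")
print(f"MS residual   = {ms_residual:.3f}")
print(f"F             = {f_statistic:.2f}")
print(f"R squared     = {r_squared:.3f}")
```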

What is the regression equation?

y = .001x1 + .00000697x2 + .125x3 - .391

For single-sex colleges:

y = .001x1 + .00000697x2 + .125(1) - .391

y = .001x1 + .00000697x2 - .266

For coed colleges:

y = .001x1 + .00000697x2 - .391

What is the meaning of the coefficient β3?

We can interpret the value .125 as the "correction" we would make to the predicted graduation rate to incorporate the difference associated with having only male or only female students. Equivalently, it is the difference in intercepts for the two different types of colleges.
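The role of the indicator coefficient can be seen by evaluating the fitted equation for two colleges that differ only in type. In the Python sketch below the SAT score and expense are hypothetical values chosen for illustration. Because the SPSS output rounds the SAT coefficient to .001, the individual predictions are only rough, but the difference between them does not depend on that rounding.

```python
def predicted_rate(sat, expense, single_sex):
    """Predicted 6-year graduation rate from the fitted equation in the text."""
    x3 = 1 if single_sex else 0
    return 0.001 * sat + 6.969e-6 * expense + 0.125 * x3 - 0.391


# Hypothetical predictor values, not taken from the data set.
sat, expense = 1100, 12000

coed = predicted_rate(sat, expense, single_sex=False)
single = predicted_rate(sat, expense, single_sex=True)
difference = single - coed

print(f"Difference (single-sex minus coed): {difference:.3f}")  # 0.125
```

Whatever values are chosen for SAT and expense, the two predictions differ by the coefficient of x3.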

Interaction and Collinearity

If the change in the mean y-value associated with a 1-unit increase in one predictor variable depends on the value of a second predictor variable, there is interaction between the two predictor variables. If we represent the variables as x1 and x2, the interaction can be modeled by including their product, x1x2, as a predictor variable.

The regression model for two predictor variables would now include a cross-product term:

Y = β0 + β1x1 + β2x2 + β3x1x2 + ε

where

β1 + β3x2 represents the change in the mean of Y for every one-unit increase in x1, keeping x2 fixed

β2 + β3x1 represents the change in the mean of Y for every one-unit increase in x2, keeping x1 fixed

If you find that there is a linear association, be sure to check the coefficient of the interaction term: if it is significant, the effect of each predictor depends on the value of the other.
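A small numerical check may help show what the cross-product term does. The Python sketch below uses hypothetical coefficients (they are not estimates from any data set in these notes) and confirms that the change in the mean of Y for a one-unit increase in x1 equals β1 + β3x2, so it changes with x2.

```python
# Hypothetical coefficients, chosen only to illustrate the algebra.
B0, B1, B2, B3 = 2.0, 0.5, 1.5, 0.25


def mean_y(x1, x2):
    """Mean response under the interaction model."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_in_x1(x2):
    """Change in the mean response when x1 goes from 0 to 1 with x2 held fixed."""
    return mean_y(1.0, x2) - mean_y(0.0, x2)


for x2 in (0.0, 2.0, 4.0):
    print(f"x2 = {x2}: slope in x1 = {slope_in_x1(x2)}, B1 + B3*x2 = {B1 + B3 * x2}")
```

Without the interaction term (B3 = 0) the slope in x1 would be the same at every value of x2.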

We determine collinearity by examining a correlation matrix:

Correlations

                                Height   Waist
Pearson Correlation   Pct BF    -.029    .824
                      Height    1.000    .187
                      Waist     .187     1.000
Sig. (1-tailed)       Pct BF    .322     .000
                      Height    .        .002
                      Waist     .002     .
N                     Pct BF    250      250
                      Height    250      250
                      Waist     250      250

What is the correlation between Pct BF and Height? -.029. Is this value significant? No; p = .322

Pct BF and Waist? .824. Is this value significant? Yes; p < .001 (SPSS displays .000)

Height and Waist? .187. Is this value significant? Yes; p = .002

It is important to note that this information only refers to the pair of variables in question, without regard to the influences of other variables.

Another way to assess collinearity:

VIF is the Variance Inflation Factor, which indicates whether a predictor has a strong linear relationship with the other predictors. There is reason for concern if the largest VIF is greater than 5.

The Tolerance statistic is the reciprocal of the VIF. There is a serious problem if this value is less than .2.

Coefficients (a)

Model          B        Std. Error   Beta    t        Sig.   Tolerance   VIF
1 (Constant)   -3.110   7.687                -.405    .686
  Waist        1.773    .072         .859    24.768   .000   .965        1.036
  Height       -.601    .110         -.190   -5.470   .000   .965        1.036

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are the collinearity statistics.

a. Dependent Variable: Pct BF
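With only two predictors, the Tolerance and VIF values can be recovered from the correlation between them, because the R² from regressing one predictor on the other is the square of that correlation. The Python sketch below uses the Height and Waist correlation of .187 from the correlation matrix and the usual definitions Tolerance = 1 – R² and VIF = 1 / Tolerance; the variable names are illustrative.

```python
r_height_waist = 0.187  # correlation between the two predictors

# R squared from regressing one predictor on the other.
r_squared = r_height_waist ** 2

tolerance = 1 - r_squared
vif = 1 / tolerance

print(f"Tolerance = {tolerance:.3f}")  # 0.965
print(f"VIF       = {vif:.3f}")        # 1.036
```

Both values are far from the warning thresholds (VIF greater than 5, Tolerance less than .2), so collinearity is not a concern for this model.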

 
