Chapter 14 Multiple Regression Models

Upload: coleen-cobb

Post on 16-Jan-2016


Page 1: Chapter 14 Multiple Regression Models. 2  A general additive multiple regression model, which relates a dependent variable y to k predictor variables

Chapter 14

Multiple Regression Models


2 Copyright (c) 2001 Brooks/Cole, a division of Thomson Learning, Inc.

Multiple Regression Models

A general additive multiple regression model, which relates a dependent variable $y$ to $k$ predictor variables $x_1, x_2, \ldots, x_k$, is given by the model equation

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$$

The random deviation $e$ is assumed to be normally distributed with mean value 0 and variance $\sigma^2$ for any particular values of $x_1, x_2, \ldots, x_k$. This implies that for fixed $x_1, x_2, \ldots, x_k$ values, $y$ has a normal distribution with variance $\sigma^2$ and

$$(\text{mean } y \text{ value for fixed } x_1, x_2, \ldots, x_k \text{ values}) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$


Multiple Regression Models

The $\beta_i$'s are called population regression coefficients; each $\beta_i$ can be interpreted as the true average change in $y$ when the predictor $x_i$ increases by 1 unit and the values of all the other predictors remain fixed.

The deterministic portion $\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is called the population regression function.
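As a minimal sketch of this interpretation, the Python snippet below evaluates a population regression function and checks that raising one predictor by 1 unit, with the others held fixed, changes the mean y value by that predictor's coefficient. The coefficient values and the helper name `mean_y` are hypothetical, chosen only for illustration.

```python
def mean_y(alpha, betas, xs):
    """Population regression function: alpha + beta_1*x_1 + ... + beta_k*x_k."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical coefficients for k = 3 predictors (not taken from the slides).
alpha = 2.0
betas = [0.5, -1.25, 3.0]

x_before = [10.0, 4.0, 1.0]
x_after = [10.0, 5.0, 1.0]   # only x_2 increases by 1 unit

change = mean_y(alpha, betas, x_after) - mean_y(alpha, betas, x_before)
print(change)  # equals beta_2, the coefficient of the predictor that changed
```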


Polynomial Regression Models

The kth-degree polynomial regression model

$$y = \alpha + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + e$$

is a special case of the general multiple regression model with $x_1 = x$, $x_2 = x^2$, …, $x_k = x^k$. The population regression function (mean value of $y$ for fixed values of the predictors) is

$$\alpha + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k$$

The most important special case other than simple linear regression ($k = 1$) is the quadratic regression model $y = \alpha + \beta_1 x + \beta_2 x^2 + e$. This model replaces the line of mean values $\alpha + \beta x$ with a parabolic curve of mean values $\alpha + \beta_1 x + \beta_2 x^2$. If $\beta_2 > 0$, the curve opens upward, whereas if $\beta_2 < 0$, the curve opens downward.
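The "special case" relationship can be sketched in a few lines of Python: the powers of x are simply treated as the predictors of the general model. The function names and the quadratic coefficients below are hypothetical and only illustrate the idea.

```python
def polynomial_predictors(x, k):
    """Return [x, x**2, ..., x**k], playing the role of x_1, ..., x_k."""
    return [x ** j for j in range(1, k + 1)]

def mean_y(alpha, betas, xs):
    """General population regression function: alpha + sum of beta_i * x_i."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical quadratic model (k = 2) with beta_2 > 0, so the curve opens upward.
alpha, betas = 1.0, [-4.0, 2.0]

for x in [0.0, 1.0, 2.0]:
    print(x, mean_y(alpha, betas, polynomial_predictors(x, 2)))
```

With these assumed coefficients the mean value is lower at x = 1 than at x = 0 or x = 2, which is the upward-opening shape described for a positive quadratic coefficient.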


Interaction

If the change in the mean $y$ value associated with a 1-unit increase in one independent variable depends on the value of a second independent variable, there is interaction between these two variables. When the variables are denoted by $x_1$ and $x_2$, such interaction can be modeled by including $x_1 x_2$, the product of the variables that interact, as a predictor variable.
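A short Python sketch can make the definition concrete: with a product term in the model, the change in mean y per unit of the first variable depends on the value of the second. The coefficients and function names are hypothetical.

```python
def mean_y(x1, x2, alpha=1.0, b1=2.0, b2=0.5, b3=1.5):
    """Mean y with an interaction term: alpha + b1*x1 + b2*x2 + b3*x1*x2."""
    return alpha + b1 * x1 + b2 * x2 + b3 * x1 * x2

def change_per_unit_x1(x2):
    """Change in mean y when x1 goes from 0 to 1 with x2 held fixed."""
    return mean_y(1.0, x2) - mean_y(0.0, x2)

print(change_per_unit_x1(0.0))  # b1 + b3 * 0
print(change_per_unit_x1(2.0))  # b1 + b3 * 2
```

If the assumed `b3` were set to 0, both calls would return the same value, which corresponds to a model without interaction.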


Qualitative Predictor Variables

Up to now, we have only considered the inclusion of quantitative (numerical) predictor variables in a multiple regression model. Qualitative (categorical) predictor variables can also be included. Two types are very common:

Dichotomous variable: One with just two possible categories, coded 0 and 1.
  Examples: Gender {male, female}
            Marriage status {married, not-married}

Ordinal variables: Categorical variables that have a natural ordering.
  Examples: Activity level {light, moderate, heavy}, coded respectively as 1, 2 and 3
            Education level {none, elementary, secondary, college, graduate}, coded respectively as 1, 2, 3, 4, 5 (or, for that matter, any 5 consecutive integers)
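The codings above can be sketched as simple lookup tables. The dictionary names, the record layout, and the choice of which category gets 0 or 1 are assumptions made for illustration; only the category labels and the integer codes for activity level come from the slide.

```python
# Hypothetical coding tables; category labels follow the examples above.
MARRIED_CODE = {"not-married": 0, "married": 1}          # dichotomous: 0/1
ACTIVITY_CODE = {"light": 1, "moderate": 2, "heavy": 3}  # ordinal: ordered integers

def encode(record):
    """Turn a dict of category labels into numeric predictor values."""
    return [MARRIED_CODE[record["marriage"]], ACTIVITY_CODE[record["activity"]]]

print(encode({"marriage": "married", "activity": "heavy"}))
```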


Least Squares Estimates

According to the principle of least squares, the fit of a particular estimated regression function $a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$ to the observed data is measured by the sum of squared deviations between the observed $y$ values and the $y$ values predicted by the estimated function:

$$\sum \left[ y - (a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k) \right]^2$$

The least squares estimates of $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ are those values of $a, b_1, b_2, \ldots, b_k$ that make this sum of squared deviations as small as possible.
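The criterion itself is easy to evaluate in code. The sketch below only computes the sum of squared deviations for a given set of candidate coefficients; it does not perform the minimization, which in practice is left to a statistics package. The data are hypothetical and were generated exactly from y = 1 + 2·x1 + 3·x2 with no random deviation, so the generating coefficients give a sum of zero.

```python
def sum_squared_deviations(a, bs, rows, ys):
    """Sum over observations of [y - (a + b1*x1 + ... + bk*xk)]^2."""
    total = 0.0
    for xs, y in zip(rows, ys):
        predicted = a + sum(b * x for b, x in zip(bs, xs))
        total += (y - predicted) ** 2
    return total

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]

print(sum_squared_deviations(1, [2, 3], rows, ys))    # the generating coefficients
print(sum_squared_deviations(1, [2, 2.5], rows, ys))  # a slightly different candidate
```

Any candidate other than the generating coefficients gives a larger sum on this data, which is what it means for the least squares estimates to make the sum "as small as possible".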


Predicted Values & Residuals

The first predicted value $\hat{y}_1$ is obtained by taking the values of the predictor variables $x_1, x_2, \ldots, x_k$ for the first sample observation and substituting these values into the estimated regression function. Doing this successively for the remaining observations yields the predicted values $\hat{y}_2, \hat{y}_3, \ldots, \hat{y}_n$ (sometimes referred to as the fitted values or fits).

The residuals are then the differences $y_1 - \hat{y}_1, y_2 - \hat{y}_2, \ldots, y_n - \hat{y}_n$ between the observed and predicted $y$ values.


Sums of Squares

The residual (or error) sum of squares, SSResid, and total sum of squares, SSTo, are given by

$$\text{SSResid} = \sum (y - \hat{y})^2 \qquad \text{SSTo} = \sum (y - \bar{y})^2$$

where $\bar{y}$ is the mean of the $y$ observations in the sample.

The number of degrees of freedom associated with SSResid is $n - (k + 1)$, because $k + 1$ df are lost in estimating the $k + 1$ coefficients $\alpha, \beta_1, \beta_2, \ldots, \beta_k$.


Estimate for $\sigma^2$

An estimate of the random deviation variance $\sigma^2$ is given by

$$s_e^2 = \frac{\text{SSResid}}{n - (k + 1)}$$

and $s_e = \sqrt{s_e^2}$ is the estimate of $\sigma$.


Coefficient of Multiple Determination, $R^2$

The coefficient of multiple determination, $R^2$, interpreted as the proportion of variation in observed $y$ values that is explained by the fitted model, is

$$R^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$$
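The quantities defined on this and the preceding slides (SSResid, SSTo, the estimate of σ, and R²) can be computed together. The following is a minimal sketch on hypothetical observed and predicted values; the function name `fit_summary` and the data are assumptions for illustration, not output from the lung capacity example.

```python
import math

def fit_summary(y, y_hat, k):
    """Return SSResid, SSTo, s_e and R^2 for observed y and predicted y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_to = sum((yi - y_bar) ** 2 for yi in y)
    s_e = math.sqrt(ss_resid / (n - (k + 1)))  # estimate of sigma, df = n - (k + 1)
    r2 = 1 - ss_resid / ss_to                  # coefficient of multiple determination
    return ss_resid, ss_to, s_e, r2

# Hypothetical values for n = 5 observations from a model with k = 2 predictors.
y = [3.0, 5.0, 4.0, 6.0, 7.0]
y_hat = [3.5, 4.5, 4.0, 6.5, 6.5]

ss_resid, ss_to, s_e, r2 = fit_summary(y, y_hat, k=2)
print(ss_resid, ss_to, round(s_e, 4), r2)
```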


Adjusted $R^2$

Generally, a model with large $R^2$ and small $s_e$ is desirable. If a large number of variables (relative to the number of data points) is used, those conditions may be satisfied, but the model will be unrealistic and difficult to interpret.

To sort out this problem, computer packages sometimes compute a quantity called the adjusted $R^2$,

$$\text{adjusted } R^2 = 1 - \left[ \frac{n - 1}{n - (k + 1)} \right] \frac{\text{SSResid}}{\text{SSTo}}$$

Notice that when a large number of variables are used to build the model, this value will be substantially lower than $R^2$ and give a better indication of the usability of the model.
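Because SSResid/SSTo equals 1 − R², the adjusted value can be computed from R², n and k alone. The sketch below applies this to two steps of the stepwise Minitab output reported later in these slides (step 1: R-Sq = 74.06 with k = 1; step 4: R-Sq = 83.23 with k = 4; n = 40 cases used in both). The function name is hypothetical, and because the inputs are the rounded percentages from the output, the results agree with the reported R-Sq(adj) values only up to rounding.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (1 - R^2)."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

# Rounded R-Sq values taken from the stepwise output (n = 40 cases used).
step1 = adjusted_r2(0.7406, n=40, k=1)
step4 = adjusted_r2(0.8323, n=40, k=4)

print(round(100 * step1, 1), round(100 * step4, 1))
```

Both results land close to the R-Sq(adj) figures of 73.38 and 81.32 shown in that output.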


F Distributions

F distributions are similar to chi-square distributions, but have two parameters: the numerator degrees of freedom, $df_{num}$, and the denominator degrees of freedom, $df_{den}$.


The F Test for Model Utility

The regression sum of squares, denoted by SSRegr, is defined by

SSRegr = SSTo - SSResid


The F Test for Model Utility

When all $k$ $\beta_i$'s are zero in the model

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$$

and when the distribution of $e$ is normal with mean 0 and variance $\sigma^2$ for any particular values of $x_1, x_2, \ldots, x_k$, the statistic

$$F = \frac{\text{SSRegr} / k}{\text{SSResid} / [n - (k + 1)]}$$

has an F probability distribution based on $k$ numerator df and $n - (k + 1)$ denominator df.


The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$

Null hypothesis: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

(There is no useful linear relationship between y and any of the predictors.)

Alternative hypothesis: $H_a$: At least one among $\beta_1, \beta_2, \ldots, \beta_k$ is not zero

(There is a useful linear relationship between y and at least one of the predictors.)


The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$

Test statistic:

$$F = \frac{\text{SSRegr} / k}{\text{SSResid} / [n - (k + 1)]}$$

where SSRegr = SSTo - SSResid.

An alternate formula:

$$F = \frac{R^2 / k}{(1 - R^2) / [n - (k + 1)]}$$
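The alternate formula is convenient when only R² is reported. As a hedged illustration, the sketch below applies it to the four-predictor model from the stepwise Minitab output later in these slides (R-Sq = 83.23, k = 4, n = 40 cases used). The slides do not show Minitab's own F value for this model, so the number computed here is derived from the rounded R-Sq and should be read as approximate; the function name is hypothetical.

```python
def f_statistic_from_r2(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

# Rounded R-Sq from the stepwise output: 83.23% with k = 4 predictors, n = 40.
f_value = f_statistic_from_r2(0.8323, n=40, k=4)
numerator_df, denominator_df = 4, 40 - (4 + 1)

print(round(f_value, 1), numerator_df, denominator_df)
```

The resulting statistic would be compared with upper-tail areas of the F distribution with 4 numerator and 35 denominator degrees of freedom.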


The F Test for Utility of the Model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$

The test is upper-tailed, and the information in the table of values that capture specified upper-tail F curve areas is used to obtain a bound or bounds on the P-value using numerator df = $k$ and denominator df = $n - (k + 1)$.

Assumptions: For any particular combination of predictor variable values, the distribution of e, the random deviation, is normal with mean 0 and constant variance.


An Example

During a summer NSF program for teachers of statistics, the participants were asked to break into groups and develop a project similar in scope to what we would like to have our students develop. One of these groups decided that it would study the lung capacity of adult humans, measured in liters. The sample of adults was not particularly easy to obtain on campus during the summer, so we “shanghaied” everyone who was willing to stand still, be measured and be interviewed. We used borrowed equipment (an antique liquid displacement apparatus) to collect the data.


An Example

This group recorded a number of variables including gender (m or f), age (yrs), height (in), weight (lbs), waist (in), chest girth (in), smoking (Y or N), and activity level (1 - light, 2 - medium, 3 - heavy), along with the lung capacity (liters).

The code for gender is 0 = Female, 1 = Male.

The code for smoking is 0 = No, 1 = Yes.

The data follow on the next slides.


An Example - The Data

Id  Sex  Age     Ht   Wt  Chest   Waist  Act  Smoke  Lung
 1    0   52     68  125  31.25   28.75    2      0   4.3
 2    0   48     77  200     37      36    3      0  5.75
 3    0   26     67  135  29.75      28    2      0   4.4
 4    1   45     66  163     35    34.5    2      1   2.2
 5    1   49     63  145     33    30.5    1      0   2.4
 6    1   43     62  130     31    29.5    2      0     3
 7    0   33     73  185     37      38    3      0   5.7
 8    1   52   62.5  209   36.5    40.5    2      1   2.5
 9    0   26   70.5  160   34.5      31    3      0   4.5
10    1   49     61  125     31      28    2      0  3.15
11    1   42     58  180   40.5      42    2      0  2.25
12    1   20     61  130  30.25   30.25    1      0   2.8
13    0   23     71  185  33.75   33.75    3      0     5
14    0   21     68  135     30    29.5    2      0  4.15
15    0   19     66  135  30.75      29    2      0  3.65


An Example - The Data

Id  Sex  Age     Ht   Wt  Chest   Waist  Act  Smoke  Lung
16    1   20  64.25  119  26.75      26    3      0   3.8
17    1   32     65  112  28.25    25.5    2      0     3
18    0   55     65  130   30.5      32    1      0  2.45
19    0   56     74  165     37   35.25    3      0   5.2
20    0   52     70  225   43.5   42.25    1      1   3.2
21    0   47     68  155   34.5  33.875    2      0   3.9
22    1   14     63   87  24.75    22.5    1      0  2.95
23    0   37     74  235     38      38    1      1   5.7
24    1   14     59   95     27    25.5    2      0   2.6
25    1   18     66  123     28    24.5    2      0  3.25
26    1   23     66  140   27.5   29.25    3      0   3.8
27    1   18   65.5  118   25.5      24    1      0   2.5
28    1   20     65  185     35   32.75    1      0  3.35
29    0   20     66  135  31.75    29.5    3      0     4
30    0   24     74  185     34      37    2      0  5.15


An Example - The Data

Id  Sex  Age     Ht   Wt  Chest   Waist  Act  Smoke  Lung
31    1   19     66  125     31      33    2      1   3.5
32    1   36     67  130   29.5      24    2      1  3.25
33    1   19     66  125   31.5    28.5    2      2   3.5
34    1   20     68  175   32.5    31.5    2      2   3.9
35    1   26     69  118   28.5    25.5    2      2   4.2
36    0   49     71  306     47      53    1      1   4.2
37    0      69      160  32.25    32.5    2      0   3.4
38    0   56     69  175  38.75      37    2      0   3.6
39    0   37     70  185     36    34.5    2      0   5.4
40    1   45     62  152   31.5      30    1      0  2.45
41    1   46     64  125     31    27.5    2      1  2.45

Row 37 has one missing value in the source: the single value 69 belongs to either Age or Ht.


Analysis - 1st with Minitab

Regression Analysis: Capacity versus Age, Height, ...

The regression equation is
Capacity = - 6.17 - 0.0140 Age + 0.149 Height + 0.00636 Weight - 0.0087 Chest - 0.0220 Waist + 0.343 Activity - 0.109 Smoke - 0.409 Gender

40 cases used 1 cases contain missing values

Predictor        Coef     SE Coef       T      P
Constant       -6.172       2.653   -2.33  0.027
Age         -0.014032    0.007000   -2.00  0.054
Height        0.14856     0.03503    4.24  0.000
Weight       0.006359    0.006094    1.04  0.305
Chest        -0.00867     0.05791   -0.15  0.882
Waist        -0.02197     0.04557   -0.48  0.633
Activity       0.3427      0.1282    2.67  0.012
Smoke         -0.1092      0.1491   -0.73  0.469
Gender        -0.4086      0.2757   -1.48  0.148

S = 0.4607   R-Sq = 84.3%   R-Sq(adj) = 80.2%


Analysis - 2nd with Minitab

Notice that the P-values on the right suggest that only the predictors height (P-value = 0.000) and activity level (P-value = 0.012) are significant at the 0.05 level of significance. The only other variables that seem possibly significant are age (P-value = 0.054) and gender (P-value = 0.148).

When stepwise regression techniques are applied using Minitab, the variables that remain significant are height, activity level, age and gender.

The output is on the next two slides.


Analysis - 2nd with Minitab

Stepwise Regression: Capacity versus Age, Height, ...

Alpha-to-Enter: 0.1   Alpha-to-Remove: 0.1

Response is Capacity on 8 predictors, with N = 40
N(cases with missing observations) = 1   N(all cases) = 41

Step              1        2        3        4
Constant    -10.251   -9.759   -9.787   -6.929

Height        0.209    0.191    0.198    0.161
T-Value       10.42     9.87    10.43     6.55
P-Value       0.000    0.000    0.000    0.000

Activity                0.35     0.31     0.30
T-Value                 2.87     2.60     2.67
P-Value                0.007    0.013    0.011


Analysis - 2nd with Minitab

Step              1        2        3        4

Age                           -0.0109  -0.0137
T-Value                         -1.96    -2.54
P-Value                         0.057    0.016

Gender                                   -0.47
T-Value                                  -2.24
P-Value                                  0.032

S             0.534    0.490    0.472    0.448
R-Sq          74.06    78.78    80.84    83.23
R-Sq(adj)     73.38    77.63    79.24    81.32
C-p            15.1      7.8      5.8      3.0


Analysis - 2nd with Minitab

The resulting Minitab output from the regression analysis using those 4 predictors follows.

Regression Analysis: Capacity versus Height, Activity, Gender, Age

The regression equation is
Capacity = - 6.93 + 0.161 Height + 0.302 Activity - 0.466 Gender - 0.0137 Age

40 cases used 1 cases contain missing values

Predictor        Coef     SE Coef       T      P
Constant       -6.929       1.708   -4.06  0.000
Height        0.16079     0.02454    6.55  0.000
Activity       0.3025      0.1133    2.67  0.011
Gender        -0.4658      0.2082   -2.24  0.032
Age         -0.013744    0.005404   -2.54  0.016

S = 0.4477   R-Sq = 83.2%   R-Sq(adj) = 81.3%
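To connect this output back to predicted values and residuals, the sketch below substitutes the first observation from the data slides into the fitted equation. It uses the rounded coefficients from the regression equation line and assumes that the Gender predictor is the Sex column and Activity is the Act column of the data table; the function name is hypothetical.

```python
def predicted_capacity(height, activity, gender, age):
    """Fitted four-predictor equation, using the rounded coefficients reported above."""
    return -6.93 + 0.161 * height + 0.302 * activity - 0.466 * gender - 0.0137 * age

# Observation 1 from the data slides: Sex = 0, Age = 52, Ht = 68, Act = 2, Lung = 4.3.
y_hat = predicted_capacity(height=68, activity=2, gender=0, age=52)
residual = 4.3 - y_hat   # observed minus predicted

print(round(y_hat, 2), round(residual, 2))
```

Repeating this for each of the 40 complete cases would give the fitted values and residuals that Minitab plots in the diagnostic graphs.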


Analysis - 2nd with Minitab

Consider the following graphs: residuals vs. fits and the normal plot of the residuals.

[Figure: Residuals Versus the Fitted Values (response is Capacity). Horizontal axis: Fitted Value; vertical axis: Residual.]


Analysis - 2nd with Minitab

[Figure: Normal Probability Plot of the Residuals (response is Capacity). Horizontal axis: Residual; vertical axis: Normal Score.]


Analysis - 2nd with Minitab

Notice that both of these graphs appear to indicate that the assumptions made were justifiable. This multilinear model appears to provide a reasonably acceptable model for estimating lung capacity.


Analysis - 3rd with Minitab

A number of the members of the project team felt that other variables, specifically the height/weight and chest/waist ratios as well as the square of the chest girth multiplied by the height, might be better predictor variables.

When these three combination variables were calculated and added to height, activity level, age and gender, the following Minitab output was obtained.
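The three combination variables are straightforward to compute from the recorded measurements. The sketch below does so for the first observation on the data slides; the function name is hypothetical, and the variable names simply mirror the HT/WT, CH/Waist and c2h labels used in the Minitab output.

```python
def combination_variables(height, weight, chest, waist):
    """Return (height/weight ratio, chest/waist ratio, chest^2 * height)."""
    return height / weight, chest / waist, chest ** 2 * height

# Observation 1 from the data slides: Ht = 68, Wt = 125, Chest = 31.25, Waist = 28.75.
ht_wt, ch_waist, c2h = combination_variables(68, 125, 31.25, 28.75)

print(round(ht_wt, 3), round(ch_waist, 3), c2h)
```

Note that chest² × height takes values in the tens of thousands for these data, so any coefficient attached to it would be expected to be very small in magnitude.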


Analysis - 3rd with Minitab

Regression Analysis: Capacity versus Height, Activity, ...

The regression equation is
Capacity = - 6.22 + 0.160 Height + 0.307 Activity - 0.469 Gender - 0.0150 Age - 1.04 HT/WT + 0.01 CH/Waist - 0.000002 c2h

40 cases used 1 cases contain missing values

Predictor          Coef      SE Coef       T      P
Constant         -6.220        2.111   -2.95  0.006
Height          0.16012      0.02915    5.49  0.000
Activity         0.3072       0.1211    2.54  0.016
Gender          -0.4686       0.2245   -2.09  0.045
Age           -0.015039     0.006613   -2.27  0.030
HT/WT            -1.042        1.574   -0.66  0.512
CH/Waist          0.011        1.305    0.01  0.993
c2h         -0.00000221   0.00000737   -0.30  0.766

S = 0.4635   R-Sq = 83.6%   R-Sq(adj) = 80.0%


Analysis - 3rd with Minitab

None of these three variables appeared to be significant. The fact that girth² × height, which would be approximately proportional to the volume of the body, was not significant came as a surprise to the members of the team.

As a side note, the literature on spirography suggests that height is the most significant factor in lung capacity and this was what this particular study indicated after it was completely analyzed.