Overview of our study of the multiple linear regression model


Upload: howie

Post on 02-Feb-2016


Page 1: Overview of our study of the   multiple linear regression model

Overview of our study of the multiple linear regression model

Regression models with more than one slope parameter

Page 2: Overview of our study of the   multiple linear regression model

Is brain and body size predictive of intelligence?

• Sample of n = 38 college students

• Response (y): intelligence based on PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale.

• Potential predictor (x1): Brain size based on MRI scans (given as count/10,000).

• Potential predictor (x2): Height in inches.

• Potential predictor (x3): Weight in pounds.

Example 1

Page 3: Overview of our study of the   multiple linear regression model

Scatter matrix plot

Example 1

[Figure: scatter plot matrix of PIQ, Brain, Height, and Weight]

Page 4: Overview of our study of the   multiple linear regression model

Scatter matrix plot

Example 1

[Figure: the scatter plot matrix of PIQ, Brain, Height, and Weight, with panel rows labeled PIQ, Brain, Height and panel columns labeled Brain, Height, Weight]

Page 5: Overview of our study of the   multiple linear regression model

Scatter matrix plot

• Illustrates the marginal relationships between each pair of variables without regard to the other variables.

• The challenge is how the response y relates to all three predictors simultaneously.

Page 6: Overview of our study of the   multiple linear regression model

A multiple linear regression model with three quantitative predictors

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

where …

• $y_i$ is intelligence (PIQ) of student i

• $x_{i1}$ is brain size (MRI) of student i

• $x_{i2}$ is height (Height) of student i

• $x_{i3}$ is weight (Weight) of student i

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 1
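To make the model above concrete, here is a minimal sketch of how the intercept and three slopes could be estimated by least squares, using only the Python standard library. The data rows are made up for illustration (they are not the 38 students in the study), the responses are generated without error from known coefficients so the fit can be checked, and the helper names `solve` and `fit_ols` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares estimates for y_i = b0 + b1*x_i1 + ... + e_i via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


# Hypothetical (brain, height, weight) rows, NOT the study data.
rows = [(81.7, 64.5, 118), (103.8, 73.3, 143), (96.5, 68.8, 172),
        (95.2, 65.0, 147), (92.9, 69.0, 146), (99.1, 64.5, 138)]
# Responses generated exactly from known (made-up) coefficients.
true_b = [10.0, 2.0, -1.5, 0.25]
y = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], r)) for r in rows]

b_hat = fit_ols(rows, y)
print([round(b, 4) for b in b_hat])
```

With real data the responses include the error term, so the estimates would only approximate the underlying coefficients; statistical software also reports the standard errors and tests shown in the output that follows.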

Page 7: Overview of our study of the   multiple linear regression model

Some research questions

• Which predictors – brain size, height, or weight – explain some variation in PIQ?

• What is the effect of brain size on PIQ, after taking into account height and weight?

• What is the PIQ of an individual with a given brain size, height, and weight?

Example 1

Page 8: Overview of our study of the   multiple linear regression model

Example 1

The regression equation is
PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

Predictor  Coef    SE Coef  T      P
Constant   111.35  62.97    1.77   0.086
Brain      2.0604  0.5634   3.66   0.001
Height     -2.732  1.229    -2.22  0.033
Weight     0.0006  0.1971   0.00   0.998

S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%

Analysis of Variance
Source          DF  SS       MS      F     P
Regression      3   5572.7   1857.6  4.74  0.007
Residual Error  34  13321.8  391.8
Total           37  18894.6

Source  DF  Seq SS
Brain   1   2697.1
Height  1   2875.6
Weight  1   0.0
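The summary statistics in this output all follow from the analysis of variance table. As a rough check, the sketch below recomputes them from the sums of squares for Example 1, using the usual definitions (MSR = SSR/df, MSE = SSE/df, F = MSR/MSE, S = sqrt(MSE), R-Sq = SSR/SSTO). The variable names are illustrative.

```python
import math

# Values copied from the Example 1 output.
n, p = 38, 4                       # observations; parameters (intercept + 3 slopes)
ssr, sse, ssto = 5572.7, 13321.8, 18894.6

msr = ssr / (p - 1)                # regression mean square, 3 df
mse = sse / (n - p)                # error mean square, 34 df
f_stat = msr / mse                 # overall F statistic
s = math.sqrt(mse)                 # residual standard error S
r_sq = ssr / ssto                  # R-Sq
r_sq_adj = 1 - (n - 1) / (n - p) * (sse / ssto)   # R-Sq(adj)

print(f"MSR = {msr:.1f}, MSE = {mse:.1f}, F = {f_stat:.2f}")
print(f"S = {s:.2f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The recomputed values agree with the reported F = 4.74, S = 19.79, R-Sq = 29.5% and R-Sq(adj) = 23.3%. The sequential sums of squares (2697.1 + 2875.6 + 0.0) also add up to the regression sum of squares, 5572.7.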

Page 9: Overview of our study of the   multiple linear regression model

Baby bird breathing habits in burrows?

• Experiment with n = 120 nestling bank swallows

• Response (y): % increase in “minute ventilation”, Vent, i.e., total volume of air breathed per minute

• Potential predictor (x1): percentage of oxygen, O2, in the air the baby birds breathe

• Potential predictor (x2): percentage of carbon dioxide, CO2, in the air the baby birds breathe

Example 2

Page 10: Overview of our study of the   multiple linear regression model

Scatter matrix plot

Example 2

[Figure: scatter plot matrix of Vent, O2, and CO2]

Page 11: Overview of our study of the   multiple linear regression model

Three-dimensional scatter plot

[Figure: three-dimensional scatter plot of Vent against O2 and CO2]

Example 2

Page 12: Overview of our study of the   multiple linear regression model

A first order model with two quantitative predictors

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where …

• $y_i$ is percentage increase in minute ventilation

• $x_{i1}$ is percentage of oxygen

• $x_{i2}$ is percentage of carbon dioxide

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 2

Page 13: Overview of our study of the   multiple linear regression model

Some research questions

• Is oxygen related to minute ventilation, after taking into account carbon dioxide?

• Is carbon dioxide related to minute ventilation, after taking into account oxygen?

• What is the mean minute ventilation of all nestling bank swallows whose breathing air consists of 15% oxygen and 5% carbon dioxide?

Example 2

Page 14: Overview of our study of the   multiple linear regression model

Example 2

The regression equation is
Vent = 86 - 5.33 O2 + 31.1 CO2

Predictor  Coef    SE Coef  T      P
Constant   85.9    106.0    0.81   0.419
O2         -5.330  6.425    -0.83  0.408
CO2        31.103  4.789    6.50   0.000

S = 157.4   R-Sq = 26.8%   R-Sq(adj) = 25.6%

Analysis of Variance
Source          DF   SS       MS      F      P
Regression      2    1061819  530909  21.44  0.000
Residual Error  117  2897566  24766
Total           119  3959385

Source  DF  Seq SS
O2      1   17045
CO2     1   1044773
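One of the research questions for Example 2 asks for the mean minute ventilation when the air contains 15% oxygen and 5% carbon dioxide. A minimal sketch of the point estimate, plugging those values into the fitted coefficients from the output (the function name `mean_vent` is illustrative):

```python
# Coefficients from the Example 2 output.
b0, b_o2, b_co2 = 85.9, -5.330, 31.103


def mean_vent(o2_pct, co2_pct):
    """Estimated mean % increase in minute ventilation at the given gas percentages."""
    return b0 + b_o2 * o2_pct + b_co2 * co2_pct


estimate = mean_vent(15, 5)
print(f"Estimated mean Vent at 15% O2 and 5% CO2: {estimate:.1f}")
```

This gives only the point estimate (about 161.5). A confidence interval for the mean response also needs the standard error of the fit, which the output shown here does not report.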

Page 15: Overview of our study of the   multiple linear regression model

Is baby’s birth weight related to smoking during pregnancy?

• Sample of n = 32 births

• Response (y): birth weight in grams of baby

• Potential predictor (x1): length of gestation in weeks

• Potential predictor (x2): smoking status of mother (yes or no)

Example 3

Page 16: Overview of our study of the   multiple linear regression model

Scatter matrix plot

[Figure: scatter plot matrix of Weight, Gest, and Smoking]

Example 3

Page 17: Overview of our study of the   multiple linear regression model

A first order model with one binary predictor

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where …

• $y_i$ is birth weight of baby i

• $x_{i1}$ is length of gestation of baby i

• $x_{i2}$ = 1, if mother smokes and $x_{i2}$ = 0, if not

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 3

Page 18: Overview of our study of the   multiple linear regression model

Estimated first order model with one binary predictor

[Figure: Weight (grams) versus Gestation (weeks), plotted by smoking status (0 or 1)]

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Example 3
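Because Smoking takes only the values 0 and 1, the single fitted equation describes two parallel lines in gestation. Substituting each value into the rounded equation gives, as a worked sketch:

```latex
\begin{aligned}
\text{Smoking} = 0:\quad & \widehat{\text{Weight}} = -2390 + 143\,\text{Gest} \\
\text{Smoking} = 1:\quad & \widehat{\text{Weight}} = (-2390 - 245) + 143\,\text{Gest} = -2635 + 143\,\text{Gest}
\end{aligned}
```

The lines share the slope 143 and differ only in intercept, so the Smoking coefficient (-245) is the estimated difference in mean birth weight, in grams, between smokers and non-smokers at any fixed length of gestation.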

Page 19: Overview of our study of the   multiple linear regression model

Some research questions

• Is baby’s birth weight related to smoking during pregnancy?

• How is birth weight related to gestation, after taking into account smoking status?

Example 3

Page 20: Overview of our study of the   multiple linear regression model

Example 3

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor  Coef     SE Coef  T      P
Constant   -2389.6  349.2    -6.84  0.000
Gest       143.100  9.128    15.68  0.000
Smoking    -244.54  41.98    -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Analysis of Variance
Source          DF  SS       MS       F       P
Regression      2   3348720  1674360  125.45  0.000
Residual Error  29  387070   13347
Total           31  3735789

Source   DF  Seq SS
Gest     1   2895838
Smoking  1   452881

Page 21: Overview of our study of the   multiple linear regression model

Compare three treatments (A, B, C) for severe depression

• Random sample of n = 36 severely depressed individuals.

• y = measure of treatment effectiveness

• x1 = age (in years)

• x2 = 1 if patient received A and 0, if not

• x3 = 1 if patient received B and 0, if not

Example 4

Page 22: Overview of our study of the   multiple linear regression model

Compare three treatments (A, B, C) for severe depression

[Figure: scatter plot of y versus age, with points labeled by treatment (A, B, C)]

Example 4

Page 23: Overview of our study of the   multiple linear regression model

A second order model with one quantitative predictor, a three-group qualitative variable, and interactions

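$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \epsilon_i$$

where …

• $y_i$ is treatment effectiveness for patient i

• $x_{i1}$ is age of patient i

• $x_{i2}$ = 1, if treatment A and $x_{i2}$ = 0, if not

• $x_{i3}$ = 1, if treatment B and $x_{i3}$ = 0, if not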

Example 4

Page 24: Overview of our study of the   multiple linear regression model

The estimated regression function

[Figure: y versus age with one fitted line per treatment (A, B, C); the lines are annotated y = 47.5 + 0.33x, y = 6.21 + 1.03x, and y = 28.9 + 0.52x]

Example 4

Regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3
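The three lines annotated on the plot can be recovered from the single regression equation by substituting the indicator coding for each treatment (x2 = 1 for A, x3 = 1 for B, both 0 for C). A minimal sketch using the rounded coefficients; the names `coding` and `line_for` are illustrative:

```python
# Rounded coefficients from the regression equation for Example 4.
b0, b_age, b_x2, b_x3, b_age_x2, b_age_x3 = 6.21, 1.03, 41.3, 22.7, -0.703, -0.510

# Indicator coding from the model definition: (x2, x3) for each treatment.
coding = {"A": (1, 0), "B": (0, 1), "C": (0, 0)}


def line_for(treatment):
    """Return (intercept, slope) of the fitted line in age for one treatment."""
    x2, x3 = coding[treatment]
    intercept = b0 + b_x2 * x2 + b_x3 * x3
    slope = b_age + b_age_x2 * x2 + b_age_x3 * x3
    return intercept, slope


lines = {t: line_for(t) for t in coding}
for t, (a, b) in lines.items():
    print(f"Treatment {t}: y = {a:.2f} + {b:.2f} age")
```

This matches the plot annotations: treatment A corresponds to y = 47.5 + 0.33x, treatment B to y = 28.9 + 0.52x, and treatment C (the baseline, with both indicators 0) to y = 6.21 + 1.03x. The interaction coefficients are what allow the slopes to differ between treatments.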

Page 25: Overview of our study of the   multiple linear regression model

Potential research questions

• Does the effectiveness of the treatment depend on age?

• Is one treatment superior to the other treatments for all ages?

• What is the effect of age on the effectiveness of the treatment?

Example 4

Page 26: Overview of our study of the   multiple linear regression model

Regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Predictor  Coef     SE Coef  T      P
Constant   6.211    3.350    1.85   0.074
age        1.03339  0.07233  14.29  0.000
x2         41.304   5.085    8.12   0.000
x3         22.707   5.091    4.46   0.000
agex2      -0.7029  0.1090   -6.45  0.000
agex3      -0.5097  0.1104   -4.62  0.000

S = 3.925   R-Sq = 91.4%   R-Sq(adj) = 90.0%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      5   4932.85  986.57  64.04  0.000
Residual Error  30  462.15   15.40
Total           35  5395.00

Source  DF  Seq SS
age     1   3424.43
x2      1   803.80
x3      1   1.19
agex2   1   375.00
agex3   1   328.42

Example 4

Page 27: Overview of our study of the   multiple linear regression model

How is the length of a bluegill fish related to its age?

• In 1981, n = 78 bluegills randomly sampled from Lake Mary in Minnesota.

• y = length (in mm)

• x1 = age (in years)

Example 5

Page 28: Overview of our study of the   multiple linear regression model

Scatter plot

[Figure: scatter plot of length versus age]

Example 5

Page 29: Overview of our study of the   multiple linear regression model

A second order polynomial model with one quantitative predictor

$$y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \epsilon_i$$

where …

• $y_i$ is length of bluegill (fish) i (in mm)

• $x_i$ is age of bluegill (fish) i (in years)

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 5

Page 30: Overview of our study of the   multiple linear regression model

Estimated regression function

[Figure: regression plot of length versus age with the fitted quadratic curve]

length = 13.6224 + 54.0493 age - 4.71866 age**2

S = 10.9061   R-Sq = 80.1 %   R-Sq(adj) = 79.6 %

Example 5

Page 31: Overview of our study of the   multiple linear regression model

Potential research questions

• How is the length of a bluegill fish related to its age?

• What is the length of a randomly selected five-year-old bluegill fish?

Example 5

Page 32: Overview of our study of the   multiple linear regression model

The regression equation is
length = 148 + 19.8 c_age - 4.72 c_agesq

Predictor  Coef     SE Coef  T       P
Constant   147.604  1.472    100.26  0.000
c_age      19.811   1.431    13.85   0.000
c_agesq    -4.7187  0.9440   -5.00   0.000

S = 10.91   R-Sq = 80.1%   R-Sq(adj) = 79.6%

Analysis of Variance
Source          DF  SS     MS     F       P
Regression      2   35938  17969  151.07  0.000
Residual Error  75  8921   119
Total           77  44859
...
Predicted Values for New Observations
New  Fit     SE Fit  95.0% CI          95.0% PI
1    165.90  2.77    (160.39, 171.42)  (143.49, 188.32)

Values of Predictors for New Observations
New  c_age  c_agesq
1    1.37   1.88
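The predicted values in this output answer the question about a five-year-old bluegill. As a rough check, the sketch below evaluates the uncentered fitted quadratic from the regression plot at age 5 and rebuilds both intervals from the reported S and SE Fit. The critical value `t_crit` is an assumption: it is the approximate 0.975 quantile of the t distribution with 75 degrees of freedom taken from a t-table, not a number given in the output.

```python
import math


def fitted_length(age):
    """Fitted quadratic from the regression plot (uncentered age, length in mm)."""
    return 13.6224 + 54.0493 * age - 4.71866 * age ** 2


fit = fitted_length(5)

# Quantities reported in the output.
s, se_fit = 10.91, 2.77
# Assumption: t(0.975; 75 df) is about 1.992 (from a t-table).
t_crit = 1.992

# Confidence interval for the mean length of all five-year-old bluegills.
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
# Prediction interval for one new fish: adds the variance of a single observation.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"Fit = {fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The fit agrees with the reported 165.90, and the intervals agree with the reported ones to within rounding. Since the new observation has c_age = 1.37, the centered predictor appears to be age minus a sample mean of about 3.63 years, with c_agesq = 1.37² ≈ 1.88.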

Example 5

Page 33: Overview of our study of the   multiple linear regression model

The good news!

• Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model:

– same assumptions, same model checking

– (adjusted) $R^2$

– t-tests and t-intervals for one slope

– prediction (confidence) intervals for (mean) response

Page 34: Overview of our study of the   multiple linear regression model

New things we need to learn!

• The above research scenarios (models) and a few more

• The “general linear test” which helps to answer many research questions

• F-tests for more than one slope

• Interactions between two or more predictor variables

• Identifying influential data points
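As a preview of the general linear test, its statistic is commonly written as F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F], where F is the full model and R is the reduced model implied by the null hypothesis. The sketch below applies it to Example 1, testing whether Height and Weight can both be dropped once Brain is in the model. It uses only the sums of squares reported in that example's output; the variable names are illustrative.

```python
# Example 1 quantities: sequential sums of squares and the full-model error line.
seq_ss = {"Brain": 2697.1, "Height": 2875.6, "Weight": 0.0}
sse_full, df_full = 13321.8, 34          # full model: Brain, Height, Weight

# The reduced model drops Height and Weight (H0: both slopes are 0), so its SSE
# is larger by the sequential SS those two terms add after Brain.
extra_ss = seq_ss["Height"] + seq_ss["Weight"]
sse_reduced, df_reduced = sse_full + extra_ss, 36

f_star = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(f"F* = {f_star:.2f} on {df_reduced - df_full} and {df_full} df")
```

The statistic is then compared with an F distribution with 2 and 34 degrees of freedom. The sketch stops short of a P-value because that needs an F-distribution routine outside the standard library.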

Page 35: Overview of our study of the   multiple linear regression model

New things we need to learn!

• Detection of correlated predictors (“multicollinearity”) using “variance inflation factors”, and the limitations such predictors cause

• Selection of variables from a large set of variables for inclusion in a model (“stepwise regression” and “best subsets regression”)
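For reference, the variance inflation factor mentioned in the list above is usually defined, for the k-th predictor, as follows. This is the standard textbook form rather than something stated on the slides:

```latex
\mathrm{VIF}_k = \frac{1}{1 - R_k^2}
```

Here $R_k^2$ is the $R^2$ obtained by regressing predictor $x_k$ on all the other predictors. A predictor that is uncorrelated with the others has $R_k^2 = 0$ and a VIF of 1, and the VIF grows without bound as $R_k^2$ approaches 1.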