Page 1: Overview of our study of the multiple linear regression model

Regression models with more than one slope parameter

Page 2

Is brain and body size predictive of intelligence?

• Sample of n = 38 college students

• Response (y): intelligence based on PIQ (performance IQ) scores from the revised Wechsler Adult Intelligence Scale.

• Potential predictor (x1): Brain size based on MRI scans (given as count/10,000).

• Potential predictor (x2): Height in inches.

• Potential predictor (x3): Weight in pounds.

Example 1

Page 3

Scatter matrix plot

Example 1

[Figure: scatter matrix plot of PIQ, Brain, Height, and Weight]

Page 4

Scatter matrix plot

Example 1

[Figure: the same scatter matrix plot of PIQ, Brain, Height, and Weight]

Page 5

Scatter matrix plot

• Illustrates the marginal relationships between each pair of variables without regard to the other variables.

• The challenge is to determine how the response y relates to all three predictors simultaneously.

Page 6

A multiple linear regression model with three quantitative predictors

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

where …

• $y_i$ is intelligence (PIQ) of student i

• $x_{i1}$ is brain size (MRI) of student i

• $x_{i2}$ is height (Height) of student i

• $x_{i3}$ is weight (Weight) of student i

Example 1

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
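The coefficients in a model like this are estimated by least squares: stack a column of ones (for $\beta_0$) and one column per predictor into a design matrix X, then choose the coefficient vector b that minimizes the sum of squared residuals. The sketch below illustrates only the mechanics, under a loud assumption: the data are synthetic (randomly generated, noise-free), not the 38 students from Example 1, and `beta_true` is an arbitrary choice for the demo.

```python
import numpy as np

# Synthetic illustration only: these values are NOT the 38 students in Example 1,
# and beta_true is an arbitrary choice made for this demo.
rng = np.random.default_rng(seed=0)
n = 38
brain = rng.uniform(80, 110, size=n)    # hypothetical MRI count / 10,000
height = rng.uniform(62, 77, size=n)    # hypothetical height (inches)
weight = rng.uniform(105, 190, size=n)  # hypothetical weight (pounds)

beta_true = np.array([110.0, 2.0, -2.7, 0.0])  # beta0, beta1, beta2, beta3

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), brain, height, weight])

# Noise-free response, so least squares should recover beta_true up to
# floating-point error; real data would add an error term epsilon_i.
y = X @ beta_true

# Least squares: choose b to minimise the sum of squared residuals ||y - X b||^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))
```

With real (noisy) data the residual degrees of freedom are n - 4 = 34 here, one lost per estimated coefficient, which matches the Residual Error DF in the Example 1 output.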

Page 7

Some research questions

• Which predictors – brain size, height, or weight – explain some variation in PIQ?

• What is the effect of brain size on PIQ, after taking into account height and weight?

• What is the PIQ of an individual with a given brain size, height, and weight?

Example 1

Page 8

Example 1

The regression equation is
PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

Predictor   Coef     SE Coef   T       P
Constant    111.35   62.97     1.77    0.086
Brain       2.0604   0.5634    3.66    0.001
Height      -2.732   1.229     -2.22   0.033
Weight      0.0006   0.1971    0.00    0.998

S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression       3    5572.7    1857.6   4.74   0.007
Residual Error   34   13321.8   391.8
Total            37   18894.6

Source   DF   Seq SS
Brain    1    2697.1
Height   1    2875.6
Weight   1    0.0
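The summary line and the t column are all functions of the ANOVA and coefficient tables. As a check on how they relate, this short sketch recomputes R-Sq, R-Sq(adj), the overall F statistic, S, and one t statistic from the Example 1 numbers above. It is an illustrative recomputation with our own variable names, not the software's code.

```python
import math

# Values copied from the Example 1 ANOVA table.
ssr, df_reg = 5572.7, 3      # regression SS and df (three predictors)
sse, df_err = 13321.8, 34    # residual (error) SS and df (n - 4 = 38 - 4)
ssto, df_tot = 18894.6, 37   # total SS and df (n - 1)

msr = ssr / df_reg           # mean square regression
mse = sse / df_err           # mean square error, the estimate of sigma^2

r_sq = ssr / ssto                      # R-Sq
r_sq_adj = 1 - mse / (ssto / df_tot)   # R-Sq(adj) penalises extra predictors
f_stat = msr / mse                     # overall F for H0: beta1 = beta2 = beta3 = 0
s = math.sqrt(mse)                     # S, the estimated error standard deviation

# Each T in the coefficient table is Coef / SE Coef.
t_brain = 2.0604 / 0.5634

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, "
      f"F = {f_stat:.2f}, S = {s:.2f}, t(Brain) = {t_brain:.2f}")
```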

Page 9

Baby bird breathing habits in burrows?

• Experiment with n = 120 nestling bank swallows

• Response (y): % increase in “minute ventilation” (Vent), i.e., total volume of air breathed per minute

• Potential predictor (x1): percentage of oxygen (O2) in the air the baby birds breathe

• Potential predictor (x2): percentage of carbon dioxide (CO2) in the air the baby birds breathe

Example 2

Page 10

Scatter matrix plot

Example 2

[Figure: scatter matrix plot of Vent, O2, and CO2]

Page 11

Three-dimensional scatter plot

[Figure: three-dimensional scatter plot of Vent against O2 and CO2]

Example 2

Page 12

A first order model with two quantitative predictors

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where …

• $y_i$ is the percentage increase in minute ventilation

• $x_{i1}$ is the percentage of oxygen

• $x_{i2}$ is the percentage of carbon dioxide

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 2

Page 13

Some research questions

• Is oxygen related to minute ventilation, after taking into account carbon dioxide?

• Is carbon dioxide related to minute ventilation, after taking into account oxygen?

• What is the mean minute ventilation of all nestling bank swallows whose breathing air contains 15% oxygen and 5% carbon dioxide?

Example 2

Page 14

Example 2

The regression equation is
Vent = 86 - 5.33 O2 + 31.1 CO2

Predictor   Coef     SE Coef   T       P
Constant    85.9     106.0     0.81    0.419
O2          -5.330   6.425     -0.83   0.408
CO2         31.103   4.789     6.50    0.000

S = 157.4   R-Sq = 26.8%   R-Sq(adj) = 25.6%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression       2     1061819   530909   21.44   0.000
Residual Error   117   2897566   24766
Total            119   3959385

Source   DF   Seq SS
O2       1    17045
CO2      1    1044773
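The third research question for Example 2 asks for a mean response at 15% oxygen and 5% carbon dioxide. A point estimate comes straight from the fitted equation; this minimal sketch plugs those values into the full-precision coefficients from the Coef column (the helper name `mean_vent` is ours). A confidence interval for that mean would also need the standard error of the fit, which this output does not show.

```python
# Estimated coefficients from the Example 2 output (Coef column).
b0, b_o2, b_co2 = 85.9, -5.330, 31.103

def mean_vent(o2_pct: float, co2_pct: float) -> float:
    """Point estimate of the mean % increase in minute ventilation."""
    return b0 + b_o2 * o2_pct + b_co2 * co2_pct

est = mean_vent(15, 5)   # 15% oxygen, 5% carbon dioxide
print(round(est, 1))
```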

Page 15

Is a baby’s birth weight related to smoking during pregnancy?

• Sample of n = 32 births

• Response (y): birth weight of the baby, in grams

• Potential predictor (x1): smoking status of mother (yes or no)

• Potential predictor (x2): length of gestation in weeks

Example 3

Page 16

Scatter matrix plot

[Figure: scatter matrix plot of Weight, Gest, and Smoking]

Example 3

Page 17

A first order model with one binary predictor

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where …

• $y_i$ is birth weight of baby i

• $x_{i1}$ is length of gestation of baby i

• $x_{i2} = 1$ if the mother smokes and $x_{i2} = 0$ if not

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 3

Page 18

Estimated first order model with one binary predictor

[Figure: Weight (grams) versus Gestation (weeks), grouped by Smoking (0, 1)]

The regression equation is
Weight = -2390 + 143 Gest - 245 Smoking

Example 3
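Because Smoking is a 0/1 indicator, the fitted equation describes two parallel lines in Gest: nonsmokers have intercept $b_0$, smokers have intercept $b_0 + b_2$, and both share the slope $b_1$. A small sketch using the full-precision coefficients from the Example 3 output (the helper name `predicted_weight` is ours):

```python
# Estimated coefficients from the Example 3 output (Coef column).
b0, b_gest, b_smoke = -2389.6, 143.100, -244.54

def predicted_weight(gest_weeks: float, smoker: int) -> float:
    """Estimated mean birth weight (grams); smoker is the 0/1 indicator x2."""
    return b0 + b_gest * gest_weeks + b_smoke * smoker

# Both groups share the slope b_gest, so the lines are parallel and the
# vertical gap between them is b_smoke grams at every gestation length.
for weeks in (36, 38, 40):
    nonsmoker = predicted_weight(weeks, 0)
    smoker = predicted_weight(weeks, 1)
    print(weeks, round(nonsmoker, 1), round(smoker, 1), round(smoker - nonsmoker, 2))
```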

Page 19

Some research questions

• Is a baby’s birth weight related to smoking during pregnancy?

• How is birth weight related to gestation, after taking into account smoking status?

Example 3

Page 20

Example 3

The regression equation is
Weight = -2390 + 143 Gest - 245 Smoking

Predictor   Coef      SE Coef   T       P
Constant    -2389.6   349.2     -6.84   0.000
Gest        143.100   9.128     15.68   0.000
Smoking     -244.54   41.98     -5.83   0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Analysis of Variance
Source           DF   SS        MS        F        P
Regression       2    3348720   1674360   125.45   0.000
Residual Error   29   387070    13347
Total            31   3735789

Source    DF   Seq SS
Gest      1    2895838
Smoking   1    452881
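When a single predictor is entered last, its sequential sum of squares is the extra sum of squares for that predictor, and the resulting partial F statistic (1 numerator df) equals the square of its t statistic. Smoking is entered after Gest above, so a quick recomputation from the Example 3 output should show t² ≈ F up to rounding. A sketch:

```python
# Values copied from the Example 3 output.
coef_smoke, se_smoke = -244.54, 41.98
seq_ss_smoke = 452881          # Seq SS for Smoking, entered after Gest
sse, df_err = 387070, 29       # residual SS and df

mse = sse / df_err
t_smoke = coef_smoke / se_smoke            # T column: Coef / SE Coef
partial_f = (seq_ss_smoke / 1) / mse       # extra SS / 1 df, over MSE

print(round(t_smoke, 2), round(t_smoke ** 2, 2), round(partial_f, 2))
```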

Page 21

Compare three treatments (A, B, C) for severe depression

• Random sample of n = 36 severely depressed individuals.

• y = measure of treatment effectiveness

• x1 = age (in years)

• x2 = 1 if the patient received A and 0 if not

• x3 = 1 if the patient received B and 0 if not

Example 4

Page 22

[Figure: scatter plot of y versus age, with points labeled by treatment (A, B, C)]

Compare three treatments (A, B, C) for severe depression

Example 4

Page 23

A second order model with one quantitative predictor, a three-group qualitative variable, and interactions

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \epsilon_i$$

where …

• $y_i$ is treatment effectiveness for patient i

• $x_{i1}$ is age of patient i

• $x_{i2} = 1$ if treatment A and $x_{i2} = 0$ if not

• $x_{i3} = 1$ if treatment B and $x_{i3} = 0$ if not

Example 4
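Substituting the indicator values into this model shows that it fits a separate straight line in age for each treatment, with treatment C (both indicators 0) as the reference group. This derivation is our own worked step from the model above:

```latex
\begin{aligned}
\text{Treatment A } (x_{i2}=1,\ x_{i3}=0):&\quad E(y_i) = (\beta_0 + \beta_2) + (\beta_1 + \beta_{12})\,x_{i1}\\
\text{Treatment B } (x_{i2}=0,\ x_{i3}=1):&\quad E(y_i) = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13})\,x_{i1}\\
\text{Treatment C } (x_{i2}=0,\ x_{i3}=0):&\quad E(y_i) = \beta_0 + \beta_1\,x_{i1}
\end{aligned}
```

So $\beta_{12}$ and $\beta_{13}$ measure how much the age slopes for A and B differ from the slope for C; if both are 0, the three lines are parallel.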

Page 24

The estimated regression function

[Figure: y versus age for treatments A, B, and C, with the three estimated regression lines]

y = 47.5 + 0.33x (treatment A; x = age)

y = 6.21 + 1.03x (treatment C)

y = 28.9 + 0.52x (treatment B)

Example 4

Regression equation is y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3
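The three lines in the figure follow from the single regression equation by plugging in the indicator values. A sketch using the full-precision coefficients from the Example 4 coefficient table (the dictionary name `lines` is ours):

```python
# Estimated coefficients from the Example 4 output (Coef column).
b0, b_age = 6.211, 1.03339
b_x2, b_x3 = 41.304, 22.707
b_age_x2, b_age_x3 = -0.7029, -0.5097

# (intercept, slope in age) for each treatment.
lines = {
    "A": (b0 + b_x2, b_age + b_age_x2),   # x2 = 1, x3 = 0
    "B": (b0 + b_x3, b_age + b_age_x3),   # x2 = 0, x3 = 1
    "C": (b0, b_age),                     # x2 = 0, x3 = 0 (reference group)
}

for trt, (intercept, slope) in lines.items():
    print(f"{trt}: y = {intercept:.2f} + {slope:.3f} age")
```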

Page 25

Potential research questions

• Does the effectiveness of the treatment depend on age?

• Is one treatment superior to the others for all ages?

• What is the effect of age on the effectiveness of the treatment?

Example 4

Page 26

Regression equation is y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Predictor   Coef      SE Coef   T       P
Constant    6.211     3.350     1.85    0.074
age         1.03339   0.07233   14.29   0.000
x2          41.304    5.085     8.12    0.000
x3          22.707    5.091     4.46    0.000
agex2       -0.7029   0.1090    -6.45   0.000
agex3       -0.5097   0.1104    -4.62   0.000

S = 3.925   R-Sq = 91.4%   R-Sq(adj) = 90.0%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       5    4932.85   986.57   64.04   0.000
Residual Error   30   462.15    15.40
Total            35   5395.00

Source   DF   Seq SS
age      1    3424.43
x2       1    803.80
x3       1    1.19
agex2    1    375.00
agex3    1    328.42
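Whether the effect of age differs among the three treatments (equivalently, whether the treatment differences change with age) corresponds to $H_0: \beta_{12} = \beta_{13} = 0$. Because the two interaction terms are listed last, their sequential sums of squares add up to the extra sum of squares for that test. A sketch of the resulting partial F statistic from the numbers above; comparing it with the F(2, 30) distribution to get a p-value is left out here.

```python
# Values copied from the Example 4 output. The interaction terms are entered
# last, so their Seq SS together form SSR(agex2, agex3 | age, x2, x3).
seq_ss_agex2, seq_ss_agex3 = 375.00, 328.42
sse, df_err = 462.15, 30

extra_ss = seq_ss_agex2 + seq_ss_agex3   # extra SS for the two interaction terms
df_num = 2                               # two slopes tested: beta12 = beta13 = 0
mse_full = sse / df_err                  # MSE of the full model

f_stat = (extra_ss / df_num) / mse_full
print(round(f_stat, 2))
```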

Example 4

Page 27

How is the length of a bluegill fish related to its age?

• In 1981, n = 78 bluegills were randomly sampled from Lake Mary in Minnesota.

• y = length (in mm)

• x1 = age (in years)

Example 5

Page 28

Scatter plot

[Figure: scatter plot of length versus age]

Example 5

Page 29

A second order polynomial model with one quantitative predictor

$$y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \epsilon_i$$

where …

• $y_i$ is length of bluegill (fish) i (in mm)

• $x_i$ is age of bluegill (fish) i (in years)

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Example 5

Page 30

Estimated regression function

[Figure: Regression Plot of length versus age, showing the fitted quadratic curve]

length = 13.6224 + 54.0493 age - 4.71866 age**2

S = 10.9061   R-Sq = 80.1 %   R-Sq(adj) = 79.6 %

Example 5
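The fitted quadratic can be evaluated directly. This sketch computes fitted lengths at ages 1 through 6 and the age at which the fitted parabola turns over ($-b_1 / (2 b_{11})$); at age 5 it gives about 165.90 mm, the same value reported as Fit for a five-year-old in the later output. The function name `fitted_length` is ours, and the turning point describes the fitted curve only, not bluegill growth beyond the plotted ages.

```python
# Uncentered quadratic fit reported with the Example 5 regression plot.
b0, b1, b11 = 13.6224, 54.0493, -4.71866

def fitted_length(age: float) -> float:
    """Estimated mean length (mm) of bluegills of the given age (years)."""
    return b0 + b1 * age + b11 * age ** 2

for age in range(1, 7):
    print(age, round(fitted_length(age), 1))

# b11 < 0, so the fitted parabola opens downward; its vertex is at -b1 / (2 * b11).
turning_age = -b1 / (2 * b11)
print(round(turning_age, 2))
```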

Page 31

Potential research questions

• How is the length of a bluegill fish related to its age?

• What is the length of a randomly selected five-year-old bluegill fish?

Example 5

Page 32

The regression equation is
length = 148 + 19.8 c_age - 4.72 c_agesq

Predictor   Coef      SE Coef   T        P
Constant    147.604   1.472     100.26   0.000
c_age       19.811    1.431     13.85    0.000
c_agesq     -4.7187   0.9440    -5.00    0.000

S = 10.91   R-Sq = 80.1%   R-Sq(adj) = 79.6%

Analysis of Variance
Source           DF   SS      MS      F        P
Regression       2    35938   17969   151.07   0.000
Residual Error   75   8921    119
Total            77   44859
...
Predicted Values for New Observations
New   Fit      SE Fit   95.0% CI           95.0% PI
1     165.90   2.77     (160.39, 171.42)   (143.49, 188.32)

Values of Predictors for New Observations
New   c_age   c_agesq
1     1.37    1.88

Example 5
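The two 95% intervals in the output can be reproduced, up to rounding, from Fit, SE Fit, and S. A minimal sketch, assuming the usual formulas: Fit ± t × SE Fit for the mean response, and Fit ± t × sqrt(S² + SE Fit²) for a new observation. To stay within the standard library, the t critical value for 75 error degrees of freedom is hard-coded as an approximate, assumed value; it could instead be computed, e.g. with `scipy.stats.t.ppf(0.975, 75)`.

```python
import math

# From the Example 5 output: Fit and SE Fit for the new observation, and
# S = sqrt(MSE) with 75 error degrees of freedom (n - 3 = 78 - 3).
fit, se_fit, s = 165.90, 2.77, 10.91

# Assumed approximate 97.5th percentile of the t distribution with 75 df.
t_crit = 1.9921

# Confidence interval for the MEAN length of all five-year-old bluegills.
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)

# Prediction interval for ONE new five-year-old bluegill: adds the error variance.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

The prediction interval is much wider because it must cover the scatter of individual fish (S) as well as the uncertainty in the estimated mean (SE Fit).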

Page 33

The good news!

• Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model:

– same assumptions, same model checking

– (adjusted) R²

– t-tests and t-intervals for one slope

– prediction (confidence) intervals for (mean) response

Page 34

New things we need to learn!

• The above research scenarios (models) and a few more

• The “general linear test,” which helps to answer many research questions

• F-tests for more than one slope

• Interactions between two or more predictor variables

• Identifying influential data points
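The general linear test compares a full model (F) with a reduced model (R) that imposes the null hypothesis. One common way to write its statistic is shown first; the lines after it are our own illustrative calculation, using only numbers from the Example 1 output, for $H_0: \beta_2 = \beta_3 = 0$ (Height and Weight add nothing once Brain is in the model). The reduced model then contains Brain only, so its error sum of squares is SSTO minus the Seq SS for Brain.

```latex
\begin{aligned}
F^* &= \frac{\bigl(SSE(R) - SSE(F)\bigr)/(df_R - df_F)}{SSE(F)/df_F} \\[4pt]
SSE(R) &= 18894.6 - 2697.1 = 16197.5, \quad df_R = 36 \\
SSE(F) &= 13321.8, \quad df_F = 34 \\
F^* &= \frac{(16197.5 - 13321.8)/2}{13321.8/34} = \frac{1437.85}{391.82} \approx 3.67
\end{aligned}
```

Under $H_0$, $F^*$ is compared with an F distribution with 2 and 34 degrees of freedom.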

Page 35

New things we need to learn!

• Detection of correlated predictors (“multicollinearity”) using “variance inflation factors,” and the limitations that multicollinearity causes

• Selection of variables from a large set of variables for inclusion in a model (“stepwise regression” and “best subsets regression”)
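For the first item, one standard definition of the variance inflation factor for predictor $x_k$, given here as background in notation chosen for this sketch, is:

```latex
\mathrm{VIF}_k = \frac{1}{1 - R_k^2}
```

where $R_k^2$ is the R² obtained by regressing $x_k$ on all the other predictors. $\mathrm{VIF}_k$ is the factor by which the variance of the estimated slope $b_k$ is inflated relative to the case where $x_k$ is uncorrelated with the other predictors; a value near 1 indicates little correlation, and larger values indicate more.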

