Overview of our study of the multiple linear regression model
Post on 02-Feb-2016
Regression models with more than one slope parameter
Are brain and body size predictive of intelligence?
• Sample of n = 38 college students
• Response (y): intelligence based on PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale.
• Potential predictor (x1): Brain size based on MRI scans (given as count/10,000).
• Potential predictor (x2): Height in inches.
• Potential predictor (x3): Weight in pounds.
Example 1
[Figure: scatter matrix plot of PIQ, Brain, Height, and Weight]
Scatter matrix plot
• Illustrates the marginal relationships between each pair of variables without regard to the other variables.
• The challenge is how the response y relates to all three predictors simultaneously.
A multiple linear regression model with three quantitative predictors
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$
where …
• yi is intelligence (PIQ) of student i
• xi1 is brain size (MRI) of student i
• xi2 is height (Height) of student i
• xi3 is weight (Weight) of student i
and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
Some research questions
• Which predictors – brain size, height, or weight – explain some variation in PIQ?
• What is the effect of brain size on PIQ, after taking into account height and weight?
• What is the PIQ of an individual with a given brain size, height, and weight?
Example 1
The regression equation is
PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

Predictor   Coef     SE Coef   T       P
Constant    111.35   62.97     1.77    0.086
Brain       2.0604   0.5634    3.66    0.001
Height      -2.732   1.229     -2.22   0.033
Weight      0.0006   0.1971    0.00    0.998

S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression       3    5572.7    1857.6   4.74   0.007
Residual Error   34   13321.8   391.8
Total            37   18894.6

Source   DF   Seq SS
Brain    1    2697.1
Height   1    2875.6
Weight   1    0.0
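The summary statistics in the output above follow from the analysis of variance table by the same formulas used in simple linear regression, with the degrees of freedom adjusted for the number of estimated coefficients. The sketch below recomputes S, R-Sq, R-Sq(adj), and F from the printed sums of squares. The variable names are illustrative and not part of the original output, and small differences from the printed values can arise because the inputs are already rounded.

```python
import math

# Values copied from the Example 1 analysis of variance table.
n = 38                   # students in the sample
p = 4                    # estimated coefficients: intercept plus three slopes
ss_regression = 5572.7
ss_error = 13321.8
ss_total = 18894.6

df_regression = p - 1    # 3
df_error = n - p         # 34

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

s = math.sqrt(ms_error)                                   # printed as S
r_sq = ss_regression / ss_total                           # printed as R-Sq
r_sq_adj = 1 - (n - 1) / (n - p) * ss_error / ss_total    # printed as R-Sq(adj)
f_stat = ms_regression / ms_error                         # printed as F

print(f"S = {s:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.2f}")
```

The adjusted R-Sq is noticeably lower than R-Sq here because the sample is small relative to the number of predictors.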
Baby bird breathing habits in burrows?
• Experiment with n = 120 nestling bank swallows
• Response (y): % increase in “minute ventilation”, Vent, i.e., total volume of air breathed per minute
• Potential predictor (x1): percentage of oxygen, O2, in the air the baby birds breathe
• Potential predictor (x2): percentage of carbon dioxide, CO2, in the air the baby birds breathe
Example 2
[Figure: scatter matrix plot of Vent, O2, and CO2]
[Figure: three-dimensional scatter plot of Vent against O2 and CO2]
A first order model with two quantitative predictors
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$
where …
• yi is percentage increase in minute ventilation
• xi1 is percentage of oxygen
• xi2 is percentage of carbon dioxide
and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
Some research questions
• Is oxygen related to minute ventilation, after taking into account carbon dioxide?
• Is carbon dioxide related to minute ventilation, after taking into account oxygen?
• What is the mean minute ventilation of all nestling bank swallows whose breathing air is composed of 15% oxygen and 5% carbon dioxide?
Example 2
The regression equation is
Vent = 86 - 5.33 O2 + 31.1 CO2

Predictor   Coef     SE Coef   T       P
Constant    85.9     106.0     0.81    0.419
O2          -5.330   6.425     -0.83   0.408
CO2         31.103   4.789     6.50    0.000

S = 157.4   R-Sq = 26.8%   R-Sq(adj) = 25.6%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression       2     1061819   530909   21.44   0.000
Residual Error   117   2897566   24766
Total            119   3959385

Source   DF   Seq SS
O2       1    17045
CO2      1    1044773
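The research question about air composed of 15% oxygen and 5% carbon dioxide can be given a point answer by plugging those values into the estimated regression equation. The sketch below does only that. The helper name `estimated_mean_vent` is hypothetical, and the code gives a point estimate only; a confidence interval for the mean would need information that is not in the printed output.

```python
# Coefficients copied from the Example 2 output (Constant, O2, CO2).
B0, B_O2, B_CO2 = 85.9, -5.330, 31.103


def estimated_mean_vent(o2_pct, co2_pct):
    """Point estimate of mean Vent at the given O2 and CO2 percentages."""
    return B0 + B_O2 * o2_pct + B_CO2 * co2_pct


# The setting named in the research question: 15% oxygen, 5% carbon dioxide.
fit = estimated_mean_vent(15, 5)
print(f"Estimated mean Vent at 15% O2 and 5% CO2: {fit:.1f}")
```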
Is baby’s birth weight related to smoking during pregnancy?
• Sample of n = 32 births
• Response (y): birth weight in grams of baby
• Potential predictor (x1): length of gestation in weeks
• Potential predictor (x2): smoking status of mother (yes or no)
Example 3
[Figure: scatter matrix plot of Weight, Gest, and Smoking]
A first order model with one binary predictor
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$
where …
• yi is birth weight of baby i
• xi1 is length of gestation of baby i
• xi2 = 1, if mother smokes and xi2 = 0, if not
and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
Estimated first order model with one binary predictor
[Figure: fitted lines of Weight (grams) against Gestation (weeks), one line for each smoking status (0, 1)]
The regression equation is Weight = - 2390 + 143 Gest - 245 Smoking
Some research questions
• Is baby’s birth weight related to smoking during pregnancy?
• How is birth weight related to gestation, after taking into account smoking status?
Example 3
The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor   Coef      SE Coef   T       P
Constant    -2389.6   349.2     -6.84   0.000
Gest        143.100   9.128     15.68   0.000
Smoking     -244.54   41.98     -5.83   0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Analysis of Variance
Source           DF   SS        MS        F        P
Regression       2    3348720   1674360   125.45   0.000
Residual Error   29   387070    13347
Total            31   3735789

Source    DF   Seq SS
Gest      1    2895838
Smoking   1    452881
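With one binary predictor and no interaction term, the estimated equation describes two parallel lines: the same slope on gestation, with intercepts that differ by the Smoking coefficient. The sketch below makes this explicit using the printed coefficients. The function name `fitted_weight` and the 38-week example value are illustrative choices, not taken from the original.

```python
# Coefficients copied from the Example 3 output (Constant, Gest, Smoking).
B0, B_GEST, B_SMOKING = -2389.6, 143.100, -244.54


def fitted_weight(gest_weeks, smoker):
    """Fitted birth weight in grams; smoker is True (x2 = 1) or False (x2 = 0)."""
    x2 = 1 if smoker else 0
    return B0 + B_GEST * gest_weeks + B_SMOKING * x2


# Each group has its own intercept but shares the slope on gestation.
intercept_nonsmoker = B0
intercept_smoker = B0 + B_SMOKING

# Illustrative gestation length (assumed here, not from the source).
weeks = 38
gap = fitted_weight(weeks, True) - fitted_weight(weeks, False)

print(f"Non-smoker line: Weight = {intercept_nonsmoker:.1f} + {B_GEST:.1f} Gest")
print(f"Smoker line:     Weight = {intercept_smoker:.1f} + {B_GEST:.1f} Gest")
print(f"Smoker minus non-smoker at {weeks} weeks: {gap:.2f} grams")
```

Because the lines are parallel, the fitted difference between the two groups is the same at every gestation length.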
Compare three treatments (A, B, C) for severe depression
• Random sample of n = 36 severely depressed individuals.
• y = measure of treatment effectiveness
• x1 = age (in years)
• x2 = 1 if patient received A and 0, if not
• x3 = 1 if patient received B and 0, if not
Example 4
[Figure: scatter plot of y against age, with points labeled by treatment (A, B, C)]
A second order model with one quantitative predictor, a three-group qualitative variable, and interactions
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \epsilon_i$
where …
• yi is treatment effectiveness for patient i
• xi1 is age of patient i
• xi2 = 1, if treatment A and xi2 = 0, if not
• xi3 = 1, if treatment B and xi3 = 0, if not
Example 4
The estimated regression function
[Figure: fitted lines of y against age for treatments A, B, and C]
y = 47.5 + 0.33x (treatment A)
y = 6.21 + 1.03x (treatment C)
y = 28.9 + 0.52x (treatment B)
Regression equation is y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3
Potential research questions
• Does the effectiveness of the treatment depend on age?
• Is one treatment superior to the other treatments for all ages?
• What is the effect of age on the effectiveness of the treatment?
Example 4
Regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Predictor   Coef      SE Coef   T       P
Constant    6.211     3.350     1.85    0.074
age         1.03339   0.07233   14.29   0.000
x2          41.304    5.085     8.12    0.000
x3          22.707    5.091     4.46    0.000
agex2       -0.7029   0.1090    -6.45   0.000
agex3       -0.5097   0.1104    -4.62   0.000

S = 3.925   R-Sq = 91.4%   R-Sq(adj) = 90.0%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       5    4932.85   986.57   64.04   0.000
Residual Error   30   462.15    15.40
Total            35   5395.00

Source   DF   Seq SS
age      1    3424.43
x2       1    803.80
x3       1    1.19
agex2    1    375.00
agex3    1    328.42
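Setting the indicator variables to 0 or 1 turns the single fitted equation into one line per treatment: the indicator coefficients shift the intercept and the interaction coefficients shift the slope on age. The sketch below derives the three lines from the printed coefficients. The dictionary keys and the helper name `line_for` are illustrative, not part of the original output.

```python
# Coefficients copied from the Example 4 output.
COEF = {
    "const": 6.211,
    "age": 1.03339,
    "x2": 41.304,      # indicator for treatment A
    "x3": 22.707,      # indicator for treatment B
    "agex2": -0.7029,  # age-by-A interaction
    "agex3": -0.5097,  # age-by-B interaction
}


def line_for(treatment):
    """Return (intercept, slope) of the fitted line for treatment 'A', 'B', or 'C'."""
    x2 = 1 if treatment == "A" else 0
    x3 = 1 if treatment == "B" else 0
    intercept = COEF["const"] + COEF["x2"] * x2 + COEF["x3"] * x3
    slope = COEF["age"] + COEF["agex2"] * x2 + COEF["agex3"] * x3
    return intercept, slope


lines = {t: line_for(t) for t in ("A", "B", "C")}
for t, (b0, b1) in lines.items():
    print(f"Treatment {t}: y = {b0:.2f} + {b1:.2f} age")
```

Treatment C is the baseline group (both indicators equal to 0), so its line uses the Constant and age coefficients unchanged.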
How is the length of a bluegill fish related to its age?
• In 1981, n = 78 bluegills randomly sampled from Lake Mary in Minnesota.
• y = length (in mm)
• x1 = age (in years)
Example 5
[Figure: scatter plot of length against age]
A second order polynomial model with one quantitative predictor
$y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \epsilon_i$
where …
• yi is length of bluegill (fish) i (in mm)
• xi is age of bluegill (fish) i (in years)
and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
Estimated regression function
[Figure: regression plot of length against age with the fitted quadratic curve]
length = 13.6224 + 54.0493 age - 4.71866 age**2
S = 10.9061   R-Sq = 80.1%   R-Sq(adj) = 79.6%
Potential research questions
• How is the length of a bluegill fish related to its age?
• What is the length of a randomly selected five-year-old bluegill fish?
Example 5
The regression equation is
length = 148 + 19.8 c_age - 4.72 c_agesq

Predictor   Coef      SE Coef   T        P
Constant    147.604   1.472     100.26   0.000
c_age       19.811    1.431     13.85    0.000
c_agesq     -4.7187   0.9440    -5.00    0.000

S = 10.91   R-Sq = 80.1%   R-Sq(adj) = 79.6%

Analysis of Variance
Source           DF   SS      MS      F        P
Regression       2    35938   17969   151.07   0.000
Residual Error   75   8921    119
Total            77   44859
...
Predicted Values for New Observations
New   Fit      SE Fit   95.0% CI           95.0% PI
1     165.90   2.77     (160.39, 171.42)   (143.49, 188.32)

Values of Predictors for New Observations
New   c_age   c_agesq
1     1.37    1.88
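The fit of 165.90 for the new observation can be checked against both printed forms of the quadratic model. The sketch below evaluates the uncentered equation at age 5 and the centered equation at the printed predictor values. It assumes that `c_age` is age minus the sample mean age and that the new observation is the five-year-old fish from the research question; neither is stated in the output. The function names are hypothetical.

```python
def length_uncentered(age):
    """Fitted length (mm) from the equation in terms of age and age**2."""
    return 13.6224 + 54.0493 * age - 4.71866 * age ** 2


def length_centered(c_age, c_agesq):
    """Fitted length (mm) from the equation in terms of c_age and c_agesq."""
    return 147.604 + 19.811 * c_age - 4.7187 * c_agesq


# Assumed: the new observation is a five-year-old bluegill.
fit_uncentered = length_uncentered(5)

# Predictor values as printed in the output (rounded to two decimals).
fit_centered = length_centered(1.37, 1.88)

print(f"Uncentered fit at age 5: {fit_uncentered:.2f} mm")
print(f"Centered fit at c_age = 1.37, c_agesq = 1.88: {fit_centered:.2f} mm")
```

The two values agree with the printed fit up to the rounding of the displayed predictor values, which is consistent with centering changing the coefficients but not the fitted curve.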
The good news!
• Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model:
– same assumptions, same model checking
– (adjusted) R²
– t-tests and t-intervals for one slope
– prediction (confidence) intervals for (mean) response
New things we need to learn!
• The above research scenarios (models) and a few more
• The “general linear test” which helps to answer many research questions
• F-tests for more than one slope
• Interactions between two or more predictor variables
• Identifying influential data points
• Detection of correlated predictors (“multicollinearity”) using “variance inflation factors”, and the limitations they cause
• Selection of variables from a large set of variables for inclusion in a model (“stepwise regression” and “best subsets regression”)