multiple linear regression v lecture 9 - github pages
TRANSCRIPT
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.1
Lecture 9Multiple Linear Regression VReading: Chapter 13
STAT 8020 Statistical Methods IISeptember 17, 2020
Whitney HuangClemson University
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.2
Agenda
1 Model Diagnostics: Influential Points
2 Non-Constant Variance & Transformation
3 Regression with Both Quantitative and QualitativePredictors
4 Polynomial Regression
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.3
Leverage
Recall in MLR that Y = X(XTX)−1XTY = HY where H isthe hat-matrix
The leverage value for the ith observation is defined as:
hi = Hii
Can show that Var(ei) = σ2(1− hi), where ei = Yi − Yi isthe residual for the ith observation
1n ≤ hi ≤ 1, 1 ≤ i ≤ n and h =
∑ni=1
hi
n = pn ⇒ a “rule of
thumb" is that leverages of more than 2pn should be looked
at more closely
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.4
Leverage Values of Species ∼ Elev + Adj
●
●
●●
●
●● ●● ● ●
●
●●
●
●
●●
●
●●
●
● ●●● ●● ● ●
0 500 1000 1500
0
1000
2000
3000
4000
5000
Elevation
Adj
acen
t
●
●
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.5
Studentized Residuals
As we have seen Var(ei) = σ2(1− hi), this suggests the use ofri = ei
σ√
(1−hi)
ri’s are called studentized residuals. ri’s are sometimespreferred in residual plots as they have been standardizedto have equal variance.
If the model assumptions are correct then Var(ri) = 1 andCorr(ei, ej) tends to be small
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.6
Studentized Residuals of Species ∼ Elev + Adj
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
0 5 10 15 20 25 30
−2
−1
0
1
2
3
Studentized Residuals
r i
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.7
Studentized Deleted Residuals
For a given model, exclude the observation i andrecompute β(i), σ(i) to obtain Yi(i)
The observation i is an outlier if Yi(i) − Yi is “large”
Can showVar(Yi(i) − Yi) = σ2
(i)
(1 + xTi (XT
(i)X(i))−1xi
)=
σ2(i)
1−hi
Define the Studentized Deleted Residuals as
ti =Yi(i) − Yiσ2(i)/1− hi
=Yi(i) − Yi
MSE(i)(1− hi)−1
which are distributed as a tn−p−1 if the model is correctand ε ∼ N(0, σ2I)
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.8
Jackknife Residuals of Species ∼ Elev + Adj
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
0 5 10 15 20 25 30
−2
−1
0
1
2
3
4
5
Jacknife Residuals
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.9
Influential Observations
DFFITS
Difference between the fitted values Yi and the predictedvalues Yi(i)
DFFITSi =Yi−Yi(i)√MSE(i)hi
Concern if absolute value greater than 1 for small datasets, or greater than 2
√p/n for large data sets
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.10
DFFITS of Species ∼ Elev + Adj
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
16
25Threshold: 0.63
−1
0
1
0 10 20 30Observation
DF
FIT
S
Influence Diagnostics for Species
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.11
Residual Plot of Species ∼ Elev + Adj
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
0 100 200 300 400
−100
−50
0
50
100
150
200
Residuals
Y
e
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.12
Residual Plot After Square Root Transformation
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
5 10 15 20
−4
−2
0
2
4
6
Residuals
Y
e
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.13
Regression with Both Quantitative and Qualitative Predictors
Multiple Linear Regression
Y = β0 + β1X1 + β2X2 + · · ·+ βp−1Xp−1 + ε, ε ∼ N(0, σ2)
X1, X2, · · · , Xp−1 are the predictors.
Question: What if some of the predictors are qualitative(categorical) variables?
⇒We will need to create dummy (indicator) variables forthose categorical variables
Example: We can encode Gender into 1 (Female) and 0(Male)
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.14
Salaries for Professors Data Set
The 2008-09 nine-month academic salary for AssistantProfessors, Associate Professors and Professors in acollege in the U.S. The data were collected as part of theon-going effort of the college’s administration to moni-tor salary differences between male and female facultymembers.
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.15
Predictors
We have three categorical variables, namely, rank,discipline, and sex.
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.16
Dummy Variable
For binary categorical variables:
Xsex =
{0 if sex = male,1 if sex = female.
Xdiscip =
{0 if discip = A,1 if discip = B.
For categorical variable with more than two categories:
Xrank1 =
{0 if rank = Assistant Prof,1 if rank = Associated Prof.
Xrank2 =
{0 if rank = Associated Prof,1 if rank = Full Prof.
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.17
Design Matrix
With the design matrix X, we can now use method ofleast squares to fit the model Y = Xβ + ε
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.18
Model Fit
Question: Interpretation of these dummy variables (e.g.βrankAssocProf)?
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.19
lm(salary ∼ sex ∗ yrs.since.phd)
0 10 20 30 40 50
100
150
200
yrs.since.phd
●
●
MaleFemale
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.20
Polynomial Regression
Suppose we would like to model the relationship betweenresponse Y and a predictor X as a pth degree polynomial in X:
Yi = β0 + β1Xi + β2X2i + · · ·+ βpX
pi + ε
We can treat polynomial regression as a special case ofmultiple linear regression. In specific, the design matrix takesthe following form:
X =
1 X1 X2
1 · · · Xp1
1 X2 X22 · · · Xp
2... · · ·
. . ....
...1 Xn X2
n · · · Xpn
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.21
Housing Values in Suburbs of Boston Data Set
10 20 30
10
20
30
40
50
Lower status of the population (percent)
Med
ian
valu
e of
ow
ner−
occu
pied
hom
es
Multiple LinearRegression V
Model Diagnostics:Influential Points
Non-ConstantVariance &Transformation
Regression with BothQuantitative andQualitative Predictors
Polynomial Regression
9.22
Polynomial Regression Fits
10 20 30
10
20
30
40
50
Lower status of the population (percent)
Med
ian
valu
e of
ow
ner−
occu
pied
hom
es