![Page 1: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/1.jpg)
Multiple Regression
Goals, Implementation, Assumptions
![Page 2: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/2.jpg)
Goals of Regression: description, inference, and prediction (forecasting)
![Page 3: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/3.jpg)
Examples
![Page 4: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/4.jpg)
Why is there a need for more than one predictor variable?
As the examples above show:
More than one variable influences a response variable.
Predictors may themselves be correlated.
What is the independent contribution of each variable to explaining the variation in the response variable?
![Page 5: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/5.jpg)
Three fundamental aspects of linear regression:
1) Model selection: what is the most parsimonious set of predictors that explains the most variation in the response variable?
2) Evaluation of assumptions: have we met the assumptions of the regression model?
3) Model validation
![Page 6: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/6.jpg)
The multiple regression model
Express a p-variable regression model as a series of equations, one per observation.
These n equations, condensed into matrix form, give the familiar general linear model.
The coefficients are known as partial regression coefficients.
![Page 7: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/7.jpg)
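The p-variable Regression Model
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
This model gives the expected value of Y conditional on the fixed values of $X_2, X_3, \ldots, X_p$, plus error.
$\beta_1$: intercept
$\beta_2, \ldots, \beta_p$: partial regression slope coefficients
$\varepsilon_i$: residual term associated with the ith observation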
![Page 8: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/8.jpg)
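Matrix Representation
The regression model is best described as a system of equations:
$$\begin{aligned}
Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_p X_{p1} + \varepsilon_1 \\
Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_p X_{p2} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_p X_{pn} + \varepsilon_n
\end{aligned}$$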
![Page 9: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/9.jpg)
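We can re-write these equations in matrix form:
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
1 & X_{21} & X_{31} & \cdots & X_{p1} \\
1 & X_{22} & X_{32} & \cdots & X_{p2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{2n} & X_{3n} & \cdots & X_{pn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
$$\underset{(n \times 1)}{\mathbf{Y}} = \underset{(n \times p)}{\mathbf{X}}\;\underset{(p \times 1)}{\boldsymbol{\beta}} + \underset{(n \times 1)}{\boldsymbol{\varepsilon}}$$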
![Page 10: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/10.jpg)
Summary of Terms
$\mathbf{Y}$ = $n \times 1$ column vector of observations on the response variable
$\mathbf{X}$ = $n \times p$ matrix containing the $n$ observations on the $p - 1$ independent variables $X_2, \ldots, X_p$; its first column of 1's represents the intercept term $\beta_1$
$\boldsymbol{\beta}$ = $p \times 1$ column vector of unknown parameters $\beta_1, \beta_2, \ldots, \beta_p$, where $\beta_1$ is the intercept term and $\beta_2, \ldots, \beta_p$ are the partial regression coefficients
$\boldsymbol{\varepsilon}$ = $n \times 1$ column vector of residuals $\varepsilon_i$
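As a minimal sketch (plain Python lists, no numerical library assumed), the design matrix $\mathbf{X}$ can be assembled from predictor columns by prepending the column of 1's for the intercept. Multiplying it by a parameter vector gives the systematic part $\mathbf{X}\boldsymbol{\beta}$, to which the residuals $\boldsymbol{\varepsilon}$ would be added. The helper names `design_matrix` and `mat_vec`, the data, and the values in `beta` are all hypothetical.

```python
def design_matrix(*predictors):
    """Return the n x p design matrix: a leading column of 1's, then X2, ..., Xp."""
    n = len(predictors[0])
    if any(len(col) != n for col in predictors):
        raise ValueError("every predictor needs the same number of observations n")
    return [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]


def mat_vec(X, beta):
    """Compute X @ beta for a list-of-rows matrix and a list vector."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Hypothetical data: n = 4 observations on p - 1 = 2 predictors.
x2 = [1.0, 2.0, 3.0, 4.0]
x3 = [0.5, 0.1, 0.9, 0.3]
X = design_matrix(x2, x3)
n, p = len(X), len(X[0])
beta = [2.0, 0.5, -1.0]          # hypothetical (p x 1) parameter vector
systematic = mat_vec(X, beta)    # (n x 1): expected value of Y given X

print(n, p)    # 4 3
print(X[0])    # [1.0, 1.0, 0.5]
print(systematic)
```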
![Page 11: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/11.jpg)
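A Partial Regression Model
Burst = 1.21 + 2.1 Femur Length - 0.25 Tail Length + 1.0 Toe Velocity
Here Burst is the response variable, 1.21 is the intercept, 2.1, -0.25 and 1.0 are the partial regression coefficients, and Femur Length, Tail Length and Toe Velocity are the predictor variables.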
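A short sketch of what "partial" means here: if one predictor increases by one unit while the others are held fixed, the fitted Burst changes by exactly that predictor's coefficient. The function name and the predictor values are hypothetical, and the source gives no units.

```python
def predict_burst(femur_length, tail_length, toe_velocity):
    """Fitted Burst from the example model (coefficients copied from the slide)."""
    return 1.21 + 2.1 * femur_length - 0.25 * tail_length + 1.0 * toe_velocity


# Hypothetical predictor values (units are not given in the source).
base = predict_burst(femur_length=1.0, tail_length=2.0, toe_velocity=3.0)
# Increase femur length by one unit while holding tail length and toe velocity fixed.
bumped = predict_burst(femur_length=2.0, tail_length=2.0, toe_velocity=3.0)

print(round(base, 2))           # 5.81
print(round(bumped - base, 2))  # 2.1, the partial regression coefficient for femur length
```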
![Page 12: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/12.jpg)
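Assumption 1. The expected value of the residual vector is 0:
$$E(\boldsymbol{\varepsilon}) = E\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$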
![Page 13: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/13.jpg)
Assumption 2. There is no correlation between the ith and jth residual terms:
$$E(\varepsilon_i \varepsilon_j) = 0, \qquad i \ne j$$
![Page 14: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/14.jpg)
Assumption 3. The residuals exhibit constant variance:
$$E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \sigma^2 \mathbf{I}$$
![Page 15: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/15.jpg)
Assumption 4. The covariance between the X's and the residual terms is 0:
$$\operatorname{cov}(\mathbf{X}, \boldsymbol{\varepsilon}) = 0$$
This is usually satisfied if the predictor variables are fixed and non-stochastic.
![Page 16: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/16.jpg)
Assumption 5. The rank of the data matrix $\mathbf{X}$ is $p$, the number of columns, and $p < n$, the number of observations:
$$r(\mathbf{X}) = p$$
There are no exact linear relationships among the X variables (the assumption of no multicollinearity).
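Assumption 5 can be checked numerically. The sketch below computes the rank of a design matrix by Gaussian elimination with partial pivoting. The tolerance of `1e-10`, the function name `matrix_rank`, and both example matrices are assumptions for illustration. `X_bad` has an exact linear relationship ($X_3 = 2X_2$), so its rank falls below $p$.

```python
def matrix_rank(X, tol=1e-10):
    """Rank of X by Gaussian elimination with partial pivoting (X is not modified)."""
    m = [[float(v) for v in row] for row in X]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        piv = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) <= tol:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[piv] = m[piv], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 5]
X_ok = [[1, a, b] for a, b in zip(x2, x3)]
X_bad = [[1, a, 2 * a] for a in x2]  # X3 = 2 * X2: an exact linear relationship

for name, X in (("X_ok", X_ok), ("X_bad", X_bad)):
    n, p = len(X), len(X[0])
    print(name, "rank =", matrix_rank(X), "p =", p, "n =", n)
```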
![Page 17: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/17.jpg)
If these assumptions hold, then the OLS estimators belong to the class of linear unbiased estimators.
They are also the minimum variance estimators within that class.
![Page 18: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/18.jpg)
What does it mean for the OLS estimators to be BLUE, and why does it matter? It allows us to compute a number of statistics from OLS estimation.
![Page 19: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/19.jpg)
An estimator $\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$ iff it is:
1) Linear
2) Unbiased, i.e., $E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$
3) Minimum variance in the class of all linear unbiased estimators
The unbiased and minimum variance properties mean that OLS estimators are efficient estimators.
If one or more of the assumptions are not met, then the OLS estimators are no longer BLUE.
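A small Monte Carlo sketch can make "unbiased" concrete. It holds the X values fixed and draws independent errors with mean 0 and constant variance. Under those conditions, the OLS estimates averaged over many simulated samples should sit close to the true coefficients. For brevity it uses the two-parameter case (intercept plus one slope), whose OLS solution has a closed form. The true values, sample size, number of replicates and seed are arbitrary assumptions, and `fit_simple` and `average_estimates` are hypothetical helper names.

```python
import random


def fit_simple(x, y):
    """Closed-form OLS for p = 2 (intercept b1 and slope b2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def average_estimates(beta1=1.0, beta2=0.5, sigma=1.0, n=30, reps=2000, seed=1):
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]  # drawn once, then held fixed (Assumption 4)
    total1 = total2 = 0.0
    for _ in range(reps):
        # Independent errors with mean 0 and constant variance (Assumptions 1-3).
        y = [beta1 + beta2 * xi + rng.gauss(0, sigma) for xi in x]
        b1, b2 = fit_simple(x, y)
        total1 += b1
        total2 += b2
    return total1 / reps, total2 / reps


print(average_estimates())  # close to (1.0, 0.5); exact digits depend on the seed
```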
![Page 20: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/20.jpg)
Does it matter?
Yes: it means we require an alternative method for characterizing the association between our Y and X variables.
![Page 21: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/21.jpg)
OLS Estimation
Sample-based counterpart to the population regression model:
$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$
OLS requires choosing the values of $\mathbf{b}$ such that the error sum-of-squares (SSE) is as small as possible.
![Page 22: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/22.jpg)
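The Normal Equations
$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b})$$
We need to differentiate with respect to the unknowns ($\mathbf{b}$) and set the result to zero:
$$\frac{\partial\, SSE}{\partial \mathbf{b}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0}$$
This yields $p$ simultaneous equations in $p$ unknowns, also known as the normal equations.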
![Page 23: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/23.jpg)
Matrix form of the Normal Equations
$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$$
![Page 24: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/24.jpg)
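The solution for the b's
It should be apparent how to solve for the unknown parameters: pre-multiply both sides by the inverse of $\mathbf{X}'\mathbf{X}$, which exists when $r(\mathbf{X}) = p$ (Assumption 5):
$$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$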
![Page 25: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/25.jpg)
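Solution Continued
From the properties of inverses we note that:
$$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$$
$$\mathbf{I}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
This is the fundamental outcome of OLS theory.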
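A minimal sketch of this result in plain Python. It builds $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{Y}$ and then solves the normal equations $\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$ by Gauss-Jordan elimination. It does not form $(\mathbf{X}'\mathbf{X})^{-1}$ explicitly: the resulting $\mathbf{b}$ is the same, and avoiding the explicit inverse is the usual numerical practice. The helper names and the noise-free data are hypothetical, so $\mathbf{b}$ should recover the generating coefficients up to rounding.

```python
def solve(a, rhs):
    """Solve the square system a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular; check Assumption 5 (rank of X)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def ols(X, y):
    """Return b = (X'X)^{-1} X'Y by solving the normal equations X'X b = X'Y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from Y = 1 + 2 X2 - 3 X3.
x2 = [1, 2, 3, 4, 5, 6]
x3 = [2, 1, 4, 3, 6, 5]
X = [[1.0, u, v] for u, v in zip(x2, x3)]
y = [1 + 2 * u - 3 * v for u, v in zip(x2, x3)]
b = ols(X, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, -3.0]
```

If a predictor is an exact linear function of the others, `solve` raises `ValueError` instead of returning meaningless numbers.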
![Page 26: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/26.jpg)
Assessment of "Goodness-of-Fit": use the $R^2$ statistic.
It represents the proportion of variability in the response variable that is accounted for by the regression model.
$0 \le R^2 \le 1$. A good fit of the model means that $R^2$ will be close to one; a poor fit means that $R^2$ will be near 0.
![Page 27: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/27.jpg)
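$R^2$: Multiple Coefficient of Determination
$$R^2 = 1 - \frac{(\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}})}{(\mathbf{Y} - \bar{Y})'(\mathbf{Y} - \bar{Y})}$$
Alternative Expressions
$$R^2 = 1 - \frac{SSE}{SST} \qquad\qquad R^2 = \frac{SSR}{SST}$$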
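As a quick numerical check, the two alternative expressions give the same number for an OLS fit that includes an intercept. The sketch below fits such a model with a small normal-equations solver and computes both forms. The data and the helper names (`solve`, `ols`, `r_squared_forms`) are hypothetical.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def ols(X, y):
    """b from the normal equations X'X b = X'Y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared_forms(X, y):
    """Return (1 - SSE/SST, SSR/SST) for the OLS fit of y on X."""
    b = ols(X, y)
    y_hat = [sum(x * bj for x, bj in zip(row, b)) for row in X]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    return 1 - sse / sst, ssr / sst


# Hypothetical data: intercept column plus two predictors.
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [4.1, 3.9, 7.2, 6.1, 10.3, 14.8, 9.9, 14.2]
X = [[1.0, u, v] for u, v in zip(x2, x3)]
r2_from_sse, r2_from_ssr = r_squared_forms(X, y)
print(round(r2_from_sse, 6), round(r2_from_ssr, 6))  # the two forms agree
```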
![Page 28: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/28.jpg)
Critique of $R^2$ in Multiple Regression
$R^2$ is inflated by increasing the number of parameters in the model.
One should also analyze the residual variance from the model (MSE).
Alternatively, use the adjusted $R^2$.
![Page 29: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/29.jpg)
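Adjusted $R^2$
$$\bar{R}^2 = 1 - \frac{(\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}})/(n - p)}{(\mathbf{Y} - \bar{Y})'(\mathbf{Y} - \bar{Y})/(n - 1)}$$
For $p > 1$, $\bar{R}^2 \le R^2$.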
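Dividing the numerator and denominator by SST gives the equivalent form $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - p)$. The sketch below uses that form with hypothetical numbers. Adding a fourth predictor ($p$ goes from 4 to 5) raises $R^2$ only slightly, from 0.800 to 0.805, and the adjusted value goes down.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (SSE/(n-p)) / (SST/(n-1)) = 1 - (1 - R^2)(n - 1)/(n - p)."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    return 1 - (1 - r2) * (n - 1) / (n - p)


# Hypothetical values: n = 20 observations.
print(round(adjusted_r_squared(0.800, n=20, p=4), 4))  # 0.7625
print(round(adjusted_r_squared(0.805, n=20, p=5), 4))  # 0.753
```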
![Page 30: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/30.jpg)
How does adjusted R-square work?
The total sum-of-squares is fixed, because it is independent of the number of variables.
The numerator, SSE, never increases (and usually decreases) as the number of variables increases.
$R^2$ is therefore artificially inflated by adding explanatory variables to the model.
Use adjusted $R^2$ to compare different regression models, because adjusted $R^2$ takes into account the number of predictors in the model.
![Page 31: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/31.jpg)
Statistical Inference and Hypothesis Testing
Our goal may be: 1) hypothesis testing and 2) interval estimation.
Hence we need to impose distributional assumptions on the residuals.
It turns out that the probability distribution of the OLS estimators depends on the probability distribution of the residuals, $\boldsymbol{\varepsilon}$.
![Page 32: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/32.jpg)
Recap of Assumptions
Normality of the residuals means that the elements of $\mathbf{b}$ are normally distributed.
The b's are unbiased.
If these hold, then we can perform several hypothesis tests.
![Page 33: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/33.jpg)
ANOVA Approach
Decompose the total sums-of-squares into components relating to explained variance (regression) and unexplained variance (error).
![Page 34: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/34.jpg)
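ANOVA Table

| Source of variation | Sums-of-squares | df | Mean square | F-ratio |
|---|---|---|---|---|
| Regression | $SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2$ | $p - 1$ | $MSR = \dfrac{\mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2}{p - 1}$ | $MSR/MSE$ |
| Residual | $SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$ | $n - p$ | $MSE = \dfrac{\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}}{n - p}$ | |
| Total | $SST = \mathbf{Y}'\mathbf{Y} - n\bar{Y}^2$ | $n - 1$ | | |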
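The table can be computed directly from its matrix expressions. The sketch below does this for hypothetical data: it forms $\mathbf{b}'\mathbf{X}'\mathbf{Y}$, $\mathbf{Y}'\mathbf{Y}$ and $n\bar{Y}^2$, then builds each row. Because $\mathbf{b}$ solves the normal equations, $\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$ should match the directly computed sum of squared residuals. The helper names are hypothetical.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def ols(X, y):
    """b from the normal equations X'X b = X'Y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def anova_table(X, y):
    """Sums-of-squares, df, mean squares and F, using the matrix forms in the table."""
    n, p = len(X), len(X[0])
    b = ols(X, y)
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    bxy = sum(bi * v for bi, v in zip(b, xty))  # b'X'Y
    yy = sum(yi * yi for yi in y)               # Y'Y
    ny2 = n * (sum(y) / n) ** 2                 # n * Ybar^2
    ssr, sse, sst = bxy - ny2, yy - bxy, yy - ny2
    msr, mse = ssr / (p - 1), sse / (n - p)
    return {"SSR": ssr, "SSE": sse, "SST": sst, "df": (p - 1, n - p, n - 1),
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Hypothetical data.
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [4.1, 3.9, 7.2, 6.1, 10.3, 14.8, 9.9, 14.2]
X = [[1.0, u, v] for u, v in zip(x2, x3)]
table = anova_table(X, y)
print("Regression", round(table["SSR"], 3), table["df"][0], round(table["MSR"], 3), round(table["F"], 3))
print("Residual  ", round(table["SSE"], 3), table["df"][1], round(table["MSE"], 3))
print("Total     ", round(table["SST"], 3), table["df"][2])
```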
![Page 35: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/35.jpg)
Test of Null Hypothesis
Tests the null hypothesis:
$$H_0\colon \beta_2 = \beta_3 = \cdots = \beta_p = 0$$
This null hypothesis is known as a joint or simultaneous hypothesis, because it tests the values of all $\beta_i$ ($i = 2, \ldots, p$) simultaneously. It tests the overall significance of the regression model.
![Page 36: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/36.jpg)
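The F-test statistic and $R^2$ vary directly:
$$\begin{aligned}
F &= \frac{(\mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2)/(p - 1)}{(\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y})/(n - p)}
 = \frac{SSR/(p - 1)}{SSE/(n - p)}
 = \frac{SSR/(p - 1)}{(SST - SSR)/(n - p)} \\
&= \frac{n - p}{p - 1}\cdot\frac{SSR/SST}{1 - SSR/SST}
 = \frac{n - p}{p - 1}\cdot\frac{R^2}{1 - R^2}
\end{aligned}$$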
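The identity can be checked with hypothetical ANOVA quantities: SSR = 60, SSE = 40 (so SST = 100 and $R^2 = 0.6$), $n = 25$, $p = 3$. The function names are hypothetical.

```python
def f_from_sums(ssr, sse, n, p):
    """F = [SSR/(p - 1)] / [SSE/(n - p)]."""
    return (ssr / (p - 1)) / (sse / (n - p))


def f_from_r2(r2, n, p):
    """F = (n - p)/(p - 1) * R^2 / (1 - R^2)."""
    return (n - p) / (p - 1) * r2 / (1 - r2)


print(round(f_from_sums(60.0, 40.0, n=25, p=3), 6))  # 16.5
print(round(f_from_r2(0.6, n=25, p=3), 6))           # 16.5
```

Because $R^2/(1 - R^2)$ increases with $R^2$, a larger $R^2$ always gives a larger F for fixed $n$ and $p$.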
![Page 37: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/37.jpg)
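Tests of Hypotheses about the true $\boldsymbol{\beta}$
Assume the regression coefficients are normally distributed:
$$\mathbf{b} \sim N\!\left(\boldsymbol{\beta},\, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$$
$$\operatorname{cov}(\mathbf{b}) = E\left[(\mathbf{b} - \boldsymbol{\beta})(\mathbf{b} - \boldsymbol{\beta})'\right] = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$
The estimate of $\sigma^2$ is $s^2$:
$$s^2 = \frac{(\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b})}{n - p}$$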
![Page 38: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/38.jpg)
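Test Statistic
$$t = \frac{b_i - \beta_i}{s\sqrt{c_{ii}}}$$
where $c_{ii}$ is the element in the ith row and ith column of $(\mathbf{X}'\mathbf{X})^{-1}$. The statistic follows a t distribution with $n - p$ df; to test $H_0\colon \beta_i = 0$, set $\beta_i = 0$.
A $100(1 - \alpha)\%$ confidence interval for $\beta_i$ is obtained from
$$b_i \pm t_{\alpha/2;\,n-p}\; s\sqrt{c_{ii}}$$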
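The sketch below computes $s$, $s\sqrt{c_{ii}}$, the t statistics for $H_0\colon \beta_i = 0$, and the confidence intervals. It inverts $\mathbf{X}'\mathbf{X}$ explicitly because the diagonal elements $c_{ii}$ are needed. Python's standard library has no t-distribution quantile function, so the critical value is passed in as `t_crit`. The value 2.776 used below is the tabulated $t_{0.025;\,4}$ for a 95% interval with $n - p = 4$ df. The data and function names are hypothetical.

```python
import math


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [a | I]."""
    k = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[k:] for row in m]


def coefficient_tests(X, y, t_crit):
    """Return b, standard errors s*sqrt(c_ii), t statistics, confidence intervals, and s."""
    n, p = len(X), len(X[0])
    c = inverse([[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)])
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    b = [sum(cij * v for cij, v in zip(c_row, xty)) for c_row in c]  # b = (X'X)^{-1} X'Y
    resid = [yi - sum(x * bj for x, bj in zip(row, b)) for row, yi in zip(X, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))              # s^2 = SSE / (n - p)
    se = [s * math.sqrt(c[i][i]) for i in range(p)]
    t = [bi / sei for bi, sei in zip(b, se)]                        # tests H0: beta_i = 0
    ci = [(bi - t_crit * sei, bi + t_crit * sei) for bi, sei in zip(b, se)]
    return b, se, t, ci, s


# Hypothetical data: one predictor, so n - p = 6 - 2 = 4 df.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
X = [[1.0, xi] for xi in x]
b, se, t, ci, s = coefficient_tests(X, y, t_crit=2.776)
for name, bi, sei, ti, (lo, hi) in zip(["b1", "b2"], b, se, t, ci):
    print(name, round(bi, 4), round(sei, 4), round(ti, 2), (round(lo, 4), round(hi, 4)))
```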
![Page 39: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/39.jpg)
Model Comparisons
Our interest is in parsimonious modeling: we seek a minimum set of X variables to predict variation in the Y response variable.
The goal is to reduce the number of predictor variables to arrive at a more parsimonious description of the data.
Does leaving out one of the b's significantly diminish the variance explained by the model?
Compare a saturated model to an unsaturated model. Note that there are many possible unsaturated models.
![Page 40: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/40.jpg)
General Philosophy
Let SSE(r) designate the error sum-of-squares for the reduced model and SSE(f) that for the full model; $\mathrm{SSE}(r) \ge \mathrm{SSE}(f)$.
The saturated model contains $p$ parameters; the reduced model contains $k < p$ parameters.
If we assume the errors are normally distributed with mean 0 and variance $\sigma^2$, then we can compare the two models.
![Page 41: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/41.jpg)
Model Comparison
Compare the saturated model with the reduced model, using the SSE terms as the basis for comparison. Hence,
$$F = \frac{[\mathrm{SSE}(r) - \mathrm{SSE}(f)]/(p - k)}{\mathrm{SSE}(f)/(n - p)}$$
follows an F-distribution with $(p - k)$ and $(n - p)$ df. If $F_{obs} > F_{critical}$, we reject the reduced model as a parsimonious model: the omitted $b_i$ must be included in the model.
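A sketch of the comparison with hypothetical SSE values: a full model with $p = 5$ parameters, a reduced model with $k = 3$, and $n = 30$. The critical value has to come from an F table. The 3.39 used below is the usual tabulated 5% point for (2, 25) df; check it against your own table before relying on it. The function name `partial_f` is hypothetical.

```python
def partial_f(sse_reduced, sse_full, n, p, k):
    """F = [(SSE(r) - SSE(f)) / (p - k)] / [SSE(f) / (n - p)], on (p - k, n - p) df."""
    if not k < p < n:
        raise ValueError("need k < p < n")
    if sse_reduced < sse_full:
        raise ValueError("SSE(r) cannot be smaller than SSE(f) for nested OLS models")
    return ((sse_reduced - sse_full) / (p - k)) / (sse_full / (n - p))


f_obs = partial_f(sse_reduced=50.0, sse_full=40.0, n=30, p=5, k=3)
f_critical = 3.39  # tabulated F(0.05; 2, 25), supplied by hand
print(round(f_obs, 4))     # 3.125
print(f_obs > f_critical)  # False: the reduced model is not rejected
```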
![Page 42: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/42.jpg)
How Many Predictors to Retain? A Short Course in Model Selection
Several options:
Sequential selection: backward selection, forward selection, stepwise selection
All possible subsets: MAXR, MINR, RSQUARE, ADJUSTED RSQUARE, CP
![Page 43: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/43.jpg)
Sequential Methods
Forward, stepwise and backward selection procedures entail "partialling out" the predictor variables. They are based on the partial correlation coefficient:
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
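"Partialling out" can be made concrete. The partial correlation $r_{12.3}$ equals the ordinary correlation between two sets of residuals: those from regressing variable 1 on variable 3, and those from regressing variable 2 on variable 3. The sketch below computes $r_{12.3}$ both ways on hypothetical data. The helper names are also hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr_formula(r12, r13, r23):
    """r_{12.3} = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2))."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def residuals_on(v, z):
    """Residuals from the simple OLS regression of v on z (with intercept): z is partialled out."""
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    slope = sum((zi - mz) * (vi - mv) for vi, zi in zip(v, z)) / sum((zi - mz) ** 2 for zi in z)
    return [vi - mv - slope * (zi - mz) for vi, zi in zip(v, z)]


# Hypothetical data for variables 1, 2 and 3.
x1 = [2.0, 3.1, 6.3, 5.9, 10.2, 9.8, 14.1, 13.0]
x2 = [1.0, 2.5, 2.9, 4.2, 4.8, 6.5, 6.9, 8.3]
x3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

r12_3 = partial_corr_formula(pearson(x1, x2), pearson(x1, x3), pearson(x2, x3))
r12_3_resid = pearson(residuals_on(x1, x3), residuals_on(x2, x3))
print(round(r12_3, 6), round(r12_3_resid, 6))  # the two values agree
```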
![Page 44: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/44.jpg)
Forward Selection
A "build-up" procedure: add predictors until the "best" regression model is obtained.
![Page 45: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/45.jpg)
Outline of Forward Selection
1) No variables are included in the regression equation.
2) Calculate the correlations of all predictors with the dependent variable.
3) Enter the predictor variable with the highest correlation into the regression model if its corresponding partial F-value exceeds a predetermined threshold.
4) Calculate the regression equation with the predictor.
5) Select the predictor variable with the highest partial correlation to enter next.
![Page 46: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/46.jpg)
Forward Selection Continued
Compare the partial F-test value (called $F_H$, also known as "F-to-enter") to a predetermined tabulated F-value (called $F_C$):
If $F_H > F_C$, include the variable with the highest partial correlation and return to step 5.
If $F_H < F_C$, stop and retain the regression equation as calculated.
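A minimal sketch of forward selection, under two assumptions. First, a fixed `f_enter = 4.0` stands in for the tabulated $F_C$; in practice it would come from an F table for the current df. Second, the candidate with the highest partial correlation is found as the one that most reduces SSE. This is equivalent, because the squared partial correlation of a candidate with Y equals its fractional reduction in SSE given the predictors already in the model. At the first step, with no predictors entered, this is simply the predictor most highly correlated with Y. The data are simulated, and all function names are hypothetical.

```python
import random


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def sse(cols, y):
    """SSE of the OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(x * bj for x, bj in zip(r, b))) ** 2 for r, yi in zip(X, y))


def forward_selection(predictors, y, f_enter=4.0):
    """Return indices of the predictors in the order they enter; f_enter plays the role of F_C."""
    n, chosen = len(y), []
    while len(chosen) < len(predictors):
        sse_now = sse([predictors[j] for j in chosen], y)
        # Smallest SSE after adding j <=> highest partial correlation of j with y.
        trial = {j: sse([predictors[i] for i in chosen + [j]], y)
                 for j in range(len(predictors)) if j not in chosen}
        j_best = min(trial, key=trial.get)
        df_error = n - (len(chosen) + 2)  # n minus (intercept + current predictors + candidate)
        f_h = (sse_now - trial[j_best]) / (trial[j_best] / df_error)  # partial F ("F-to-enter")
        if f_h <= f_enter:
            break  # F_H does not exceed F_C: stop
        chosen.append(j_best)
    return chosen


# Simulated data: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
n = 40
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
x3 = [rng.uniform(0, 1) for _ in range(n)]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
selected = forward_selection([x1, x2, x3], y)
print(selected)  # indices into [x1, x2, x3]
```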
![Page 47: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/47.jpg)
Backward Selection
A "deconstruction" approach.
1) Begin with the saturated (full) regression model.
2) Compute the drop in $R^2$ from eliminating each predictor variable, and the corresponding partial F-test value; treat each variable as if it were the last to enter the regression equation.
3) Compare the lowest partial F-test value (designated $F_L$) to the critical value of F (designated $F_C$):
a. If $F_L < F_C$, remove the variable, recompute the regression equation using the remaining predictor variables, and return to step 2.
b. If $F_L > F_C$, adopt the regression equation as calculated.
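The backward steps can be sketched the same way. Each predictor's partial F is computed as if it had entered last: the increase in SSE from dropping it, divided by the MSE of the current model. The weakest predictor is then removed while $F_L < F_C$. As before, `f_remove = 4.0` is an assumed stand-in for the tabulated $F_C$, and the data and helper names are hypothetical.

```python
import random


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def sse(cols, y):
    """SSE of the OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(x * bj for x, bj in zip(r, b))) ** 2 for r, yi in zip(X, y))


def backward_selection(predictors, y, f_remove=4.0):
    """Start from the saturated model and drop the weakest predictor while F_L < F_C."""
    n, kept = len(y), list(range(len(predictors)))
    while kept:
        sse_full = sse([predictors[j] for j in kept], y)
        mse = sse_full / (n - len(kept) - 1)  # df = n - p, with p = len(kept) + 1
        # Partial F for each predictor, treated as if it were the last to enter.
        f_vals = {j: (sse([predictors[i] for i in kept if i != j], y) - sse_full) / mse
                  for j in kept}
        j_low = min(f_vals, key=f_vals.get)  # the predictor with F_L
        if f_vals[j_low] >= f_remove:
            break  # F_L is not below F_C: adopt the current equation
        kept.remove(j_low)
    return kept


# Simulated data: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
n = 40
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
x3 = [rng.uniform(0, 1) for _ in range(n)]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
kept = backward_selection([x1, x2, x3], y)
print(kept)  # indices into [x1, x2, x3]
```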
![Page 48: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/48.jpg)
Stepwise Selection
1) Calculate the correlations of all predictors with the response variable.
2) Select the predictor variable with the highest correlation and regress Y on $X_i$. Retain the predictor if there is a significant F-test value.
3) Calculate the partial correlations of all variables not in the equation with the response variable. Select as the next predictor to enter the one with the highest partial correlation; call this predictor $X_j$.
4) Compute the regression equation with both $X_i$ and $X_j$ entered. Retain $X_j$ if its partial F-value exceeds the tabulated F with (1, n - 2 - 1) df.
5) Now determine whether $X_i$ warrants retention: compute its partial F-value as if $X_j$ had been entered into the equation first.
![Page 49: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/49.jpg)
Stepwise Continued
Retain $X_i$ if its F-value exceeds the tabulated F value.
Enter a new variable $X_k$ and compute the regression with three predictors, then compute the partial F-values for $X_i$, $X_j$ and $X_k$.
Determine whether any should be retained by comparing each observed partial F with the critical F.
6) Retain the regression equation when no other predictor can be entered into or removed from the model.
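Stepwise selection combines the two previous ideas: an entry step followed by a re-check of every variable already in the model. A minimal sketch follows. The thresholds `f_enter = 4.0` and `f_remove = 3.9` are assumed stand-ins for tabulated F values; F-to-remove is kept at or below F-to-enter so that a variable cannot be removed at the same step it entered. The `max_steps` guard against cycling, the data, and all function names are hypothetical additions.

```python
import random


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def sse(cols, y):
    """SSE of the OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(x * bj for x, bj in zip(r, b))) ** 2 for r, yi in zip(X, y))


def stepwise_selection(predictors, y, f_enter=4.0, f_remove=3.9, max_steps=50):
    n, model = len(y), []
    for _ in range(max_steps):
        changed = False
        # Entry step: the candidate with the highest partial correlation (smallest new SSE).
        candidates = [j for j in range(len(predictors)) if j not in model]
        if candidates:
            sse_now = sse([predictors[j] for j in model], y)
            sse_new = {j: sse([predictors[i] for i in model + [j]], y) for j in candidates}
            j_best = min(sse_new, key=sse_new.get)
            f_h = (sse_now - sse_new[j_best]) / (sse_new[j_best] / (n - len(model) - 2))
            if f_h > f_enter:
                model.append(j_best)
                changed = True
        # Removal step: re-test each variable in the model as if it had entered last.
        if len(model) > 1:
            sse_full = sse([predictors[j] for j in model], y)
            mse = sse_full / (n - len(model) - 1)
            f_vals = {j: (sse([predictors[i] for i in model if i != j], y) - sse_full) / mse
                      for j in model}
            j_low = min(f_vals, key=f_vals.get)
            if f_vals[j_low] < f_remove:
                model.remove(j_low)
                changed = True
        if not changed:
            break  # no predictor can be entered or removed
    return model


# Simulated data: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
n = 40
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
x3 = [rng.uniform(0, 1) for _ in range(n)]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
model = stepwise_selection([x1, x2, x3], y)
print(model)  # indices into [x1, x2, x3]
```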
![Page 50: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/50.jpg)
All possible subsets
All-subset regression computes all possible 1-, 2-, 3-, … variable models and ranks them with an optimality criterion, e.g., Mallows' $C_p$:
$$C_p = \frac{s^2 (n - p)}{\hat{\sigma}^2} - (n - 2p), \qquad p = k + 1$$
where $s^2$ is the residual variance for the reduced model (with $k$ predictors) and $\hat{\sigma}^2$ is the residual variance for the full model.
![Page 51: Mult reg](https://reader030.vdocuments.mx/reader030/viewer/2022012913/548ebe3cb47959c8558b483a/html5/thumbnails/51.jpg)
Mallows' $C_p$
Estimates the standardized total squared error (bias plus variance) of the fitted values. Choose a model where $C_p \approx p$.
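Since $s^2(n - p)$ is just the reduced model's SSE, $C_p$ can be computed from that SSE and the full model's MSE. For the full model itself, $C_p = p$ exactly. A reduced model whose $C_p$ is well above $p$ suggests bias from omitted predictors. The numbers below are hypothetical, and `mallows_cp` is a hypothetical helper name.

```python
def mallows_cp(sse_subset, p, sigma2_full, n):
    """C_p = s^2 (n - p) / sigma_hat^2 - (n - 2p) = SSE_p / sigma_hat^2 - (n - 2p).

    sse_subset:  SSE of the candidate model with p parameters (k predictors + intercept).
    sigma2_full: residual variance (MSE) of the full model.
    """
    return sse_subset / sigma2_full - (n - 2 * p)


# Hypothetical: n = 30; the full model has 6 parameters and SSE = 48.0.
n, p_full, sse_full = 30, 6, 48.0
sigma2 = sse_full / (n - p_full)               # 2.0
print(mallows_cp(sse_full, p_full, sigma2, n))  # 6.0: the full model always gives C_p = p
print(mallows_cp(60.0, 4, sigma2, n))           # 8.0: C_p well above p = 4
```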