Lecture 25
• Multiple Regression Diagnostics (Sections 19.4-19.5)
• Polynomial Models (Section 20.2)
19.4 Regression Diagnostics - II
• The conditions required for the model assessment to apply must be checked.
  – Is the error variable normally distributed? Draw a histogram of the residuals.
  – Is the regression function correctly specified as a linear function of x1, …, xk (E(εi) = 0)? Plot the residuals versus the x's and versus ŷ.
  – Is the error variance constant? Plot the residuals versus ŷ.
  – Are the errors independent? Plot the residuals versus the time periods.
  – Can we identify outliers and influential observations?
  – Is multicollinearity a problem?
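A minimal sketch of these residual checks outside JMP, in Python with statsmodels and matplotlib; the data here are synthetic stand-ins, since the course data files are not reproduced in this transcript:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Synthetic data standing in for a real data set
rng = np.random.default_rng(0)
X = sm.add_constant(rng.uniform(0, 10, (100, 2)))
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 1, 100)
fit = sm.OLS(y, X).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
ax1.hist(fit.resid, bins=15)              # is the error variable normal?
ax1.set_title("Histogram of residuals")
ax2.scatter(fit.fittedvalues, fit.resid)  # constant variance? E(eps) = 0?
ax2.axhline(0, linestyle="--", color="gray")
ax2.set_xlabel("y-hat")
ax2.set_title("Residuals versus y-hat")
plt.tight_layout()
plt.show()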
Effects of Violated Assumptions
• Curvature (E(εi) ≠ 0): the slopes βj are no longer meaningful.
  (Potential remedy: transformations of responses and predictors)
• Violations of other assumptions: tests, p-values, and CIs are no longer accurate. That is, inference is invalidated. (Remedies may be difficult.)
Influential Observation
• Influential observation: An observation is influential if removing it would markedly change the results of the analysis.
• In order to be influential, a point must either (i) be an outlier in terms of the relationship between its y and x’s, or (ii) have unusually distant x’s (high leverage) and not fall exactly into the relationship between y and x’s that the rest of the data follows.
Simple Linear Regression Example
• Data in salary.jmp. Y = Weekly Salary, X = Years of Experience.
[Figure: Bivariate Fit of Weekly Salary (0–700) by Years of Experience (0–45)]
Identification of Influential Observations
• Cook’s distance is a measure of the influence of a point – the effect that omitting the observation has on the estimated regression coefficients.
• Use Save Columns, Cook’s D Influence to obtain Cook’s Distance.
• Plot Cook’s Distances: Graph, Overlay Plot, put Cook’s D Influence in Y and leave X blank (plots Cook’s D against row number).
Cook’s Distance
• Rule of thumb: an observation with Cook’s Distance Di > 1 has high influence. You should also be concerned about any observation that has Di < 1 but a much bigger Di than any other observation. Ex. 19.2:
[Figure: Overlay Plot of Cook’s D Influence Price versus Rows; all Di below about 0.25]
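For reference, a hedged sketch of the same computation in Python (statsmodels), on synthetic salary-like data rather than salary.jmp itself; get_influence().cooks_distance yields the values JMP saves as Cook’s D Influence:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 45, 50)                # years of experience (synthetic)
y = 150 + 10 * x + rng.normal(0, 40, 50)  # weekly salary (synthetic)
fit = sm.OLS(y, sm.add_constant(x)).fit()

cooks_d, _ = fit.get_influence().cooks_distance
print("largest Cook's D:", cooks_d.max())
print("rows with D_i > 1:", np.where(cooks_d > 1)[0])  # rule-of-thumb flag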
Strategy for dealing with influential observations/outliers
• Do the conclusions change when the observation is deleted?
  – If No: proceed with the observation included. Study the observation to see if anything can be learned.
  – If Yes: is there reason to believe the case belongs to a population other than the one under investigation?
    • If Yes: omit the case and proceed.
    • If No: does the case have unusually “distant” independent variables?
      – If Yes: omit the case and proceed. Report conclusions for the reduced range of explanatory variables.
      – If No: not much can be said. More data are needed to resolve the questions.
Multicollinearity
• Multicollinearity: Condition in which independent variables are highly correlated.
• Exact collinearity: Y = Weight, X1 = Height in inches, X2 = Height in feet. Since x1 = 12x2, the fitted equations
  ŷ = 1.5 + 2.5x1 + 0x2 and ŷ = 1.5 + 0.5x1 + 24x2
  provide the same predictions.
• Multicollinearity causes two kinds of difficulties:
  – The t statistics appear to be too small.
  – The coefficients cannot be interpreted as “slopes”.
Multicollinearity Diagnostics
• Diagnostics:
  – High correlation between independent variables
  – Counterintuitive signs on regression coefficients
  – Low values for t-statistics despite a significant overall fit, as measured by the F statistic.
Diagnostics: Multicollinearity
• Example 19.2: Predicting house price (Xm19-02) – A real estate agent believes that a house selling price can be
predicted using the house size, number of bedrooms, and lot size.
– A random sample of 100 houses was drawn and data recorded.
– Analyze the relationship among the four variables
  Price    Bedrooms   H Size   Lot Size
  124100   3          1290     3900
  218300   4          2080     6600
  117800   3          1250     3750
  ...      ...        ...      ...
• The proposed model is
  PRICE = β0 + β1·BEDROOMS + β2·H-SIZE + β3·LOTSIZE + ε
  The model is valid, but no variable is significantly related to the selling price?!
Diagnostics: Multicollinearity
Summary of Fit
  RSquare                     0.559998
  RSquare Adj                 0.546248
  Root Mean Square Error      25022.71
  Mean of Response            154066
  Observations (or Sum Wgts)  100

Analysis of Variance
  Source    DF  Sum of Squares  Mean Square  F Ratio  Prob > F
  Model      3  7.65017e10      2.5501e10    40.7269  <.0001
  Error     96  6.0109e+10      626135896
  C. Total  99  1.36611e11

Parameter Estimates
  Term        Estimate   Std Error  t Ratio  Prob>|t|
  Intercept   37717.595  14176.74    2.66    0.0091
  Bedrooms    2306.0808  6994.192    0.33    0.7423
  House Size  74.296806  52.97858    1.40    0.1640
  Lot Size    -4.363783  17.024     -0.26    0.7982

• Multicollinearity is found to be a problem. The correlation matrix:
            Price   Bedrooms  H Size  Lot Size
  Price     1
  Bedrooms  0.6454  1
  H Size    0.7478  0.8465    1
  Lot Size  0.7409  0.8374    0.9936  1
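The same correlation check can be done outside JMP. A sketch in Python with pandas, where the file name and column labels are assumptions about how the Xm19-02 data might be exported:

import pandas as pd

df = pd.read_csv("Xm19-02.csv")  # assumed columns: Price, Bedrooms, H Size, Lot Size
print(df[["Bedrooms", "H Size", "Lot Size"]].corr().round(4))
# H Size and Lot Size correlate at about 0.99: severe multicollinearity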
Remedying Violations of the Required Conditions
• Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.
• The transformations can improve the linear relationship between the dependent variable and the independent variables.
• Many computer software systems allow us to make the transformations easily.
• A brief list of transformations:
  » y' = log y (for y > 0)
    • Use when σε increases with y, or
    • Use when the error distribution is positively skewed.
  » y' = y²
    • Use when σ²ε is proportional to E(y), or
    • Use when the error distribution is negatively skewed.
  » y' = y^(1/2) (for y > 0)
    • Use when σ²ε is proportional to E(y).
  » y' = 1/y
    • Use when σ²ε increases significantly when y increases beyond some critical value.
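A small sketch of the first transformation in the list, on synthetic data where σε grows with y; after y' = log y the residual spread stabilizes:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, 200))  # spread grows with y

X = sm.add_constant(x)
raw_fit = sm.OLS(y, X).fit()          # heteroscedastic residuals
log_fit = sm.OLS(np.log(y), X).fit()  # y' = log y stabilizes the variance
print(raw_fit.rsquared, log_fit.rsquared)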
Reducing Nonnormality by Transformations: Example
[Figure: before-and-after transformation plots, not reproduced in the transcript]
Durbin-Watson Test: Are the Errors Autocorrelated?
• This test detects first order autocorrelation between consecutive residuals in a time series.
• If autocorrelation exists, the error variables are not independent.
Positive First Order Autocorrelation
[Figure: residuals versus time, with consecutive residuals staying on the same side of 0]
Positive first order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).
Negative First Order Autocorrelation
[Figure: residuals versus time, with consecutive residuals alternating around 0]
Negative first order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).
Durbin-Watson Test in JMP
• H0: No first-order autocorrelation.
  H1: First-order autocorrelation.
• Use Row Diagnostics, Durbin-Watson test in JMP after fitting the model.
• Autocorrelation is an estimate of the correlation between errors.

  Durbin-Watson  Number of Obs.  AutoCorrelation  Prob<DW
  0.5931403      20              0.5914           0.0002
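Outside JMP, the same statistic is one call in Python's statsmodels; the file and column names below are assumptions about the layout of the ski-resort example that follows (Xm19-03):

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("Xm19-03.csv")  # assumed columns: Tickets, Snowfall, Temperature
fit = smf.ols("Tickets ~ Snowfall + Temperature", data=df).fit()
print("d =", durbin_watson(fit.resid))  # d far below 2 -> positive autocorrelation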
• Example 19.3 (Xm19-03): How does the weather affect the sales of lift tickets in a ski resort?
  – Data on the past 20 years of ticket sales, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
  – The model hypothesized was
    TICKETS = β0 + β1·SNOWFALL + β2·TEMPERATURE + ε
  – Regression analysis yielded the Durbin-Watson results shown above.
Testing the Existence of Autocorrelation, Example
20.1 Introduction
• Regression analysis is one of the most commonly used techniques in statistics.
• It is considered powerful for several reasons:
  – It can cover a variety of mathematical models:
    • linear relationships
    • non-linear relationships
    • nominal independent variables
  – It provides efficient methods for model building.
Curvature: Midterm Problem 10
[Figure: Bivariate Fit of MPG City by Weight (lb); the residual plot versus Weight (lb) shows a curved pattern]
Remedy I: Transformations
• Use Tukey’s Bulging Rule to choose a transformation.
[Figure: Bivariate Fit of 1/MPG City by Weight (lb); the residual plot versus Weight (lb) no longer shows curvature]
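A sketch of this remedy in Python; the cars data file and its column names are placeholders for the midterm data set:

import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")       # assumed columns: MPGCity, WeightLb
cars["GPM"] = 1.0 / cars["MPGCity"]  # y' = 1/y, per Tukey's bulging rule
fit = smf.ols("GPM ~ WeightLb", data=cars).fit()
print(fit.params)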
Remedy II: Polynomial Models
• Multiple regression model: y = β0 + β1x1 + β2x2 + … + βpxp + ε
• Polynomial model in one predictor: y = β0 + β1x + β2x² + … + βpx^p + ε
Quadratic Regression
[Figure: Bivariate Fit of MPG City by Weight (lb) with fitted quadratic curve; the residuals versus Weight (lb) show no remaining pattern]

Parameter Estimates
  Term                    Estimate    Std Error   t Ratio   Prob>|t|
  Intercept               40.166608   0.90223     44.52     <.0001
  Weight(lb)              -0.006894   0.00032     -21.52    <.0001
  (Weight(lb)-2809.5)^2   0.000003    4.634e-7    6.38      <.0001
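The same quadratic fit can be reproduced in Python. Note that JMP centers the squared term at the mean weight (2809.5), which the sketch below imitates; the file and column names are placeholders:

import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")  # assumed columns: MPGCity, WeightLb
fit = smf.ols("MPGCity ~ WeightLb + I((WeightLb - 2809.5)**2)", data=cars).fit()
print(fit.summary())  # the squared term's t ratio tests for curvature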
Polynomial Models with One Predictor Variable
• First order model (p = 1):
  y = β0 + β1x + ε
• Second order model (p = 2):
  y = β0 + β1x + β2x² + ε
  (the parabola opens downward if β2 < 0 and upward if β2 > 0)
• Third order model (p = 3):
  y = β0 + β1x + β2x² + β3x³ + ε
  (β3 < 0 and β3 > 0 produce opposite S-shaped curves)
Interaction
• Two independent variables x1 and x2 interact if the effect of x1 on y is influenced by the value of x2.
• Interaction can be brought into the multiple linear regression model by including the independent variable x1·x2.
• Example: Predicted Income = 1000 + 2000·Educ + 100·IQ + 10·IQ·Educ
Interaction Cont.
• Model: y = β0 + β1x1 + β2x2 + β3x1x2 + ε
• “Slope” for x1 = E(y | x1+1, x2) − E(y | x1, x2) = β1 + β3x2
• Is the expected income increase from an extra year of education higher for people with IQ 100 or with IQ 130 (or is it the same)?
  Predicted Income = 1000 + 2000·Educ + 100·IQ + 10·IQ·Educ
  Here the “slope” for Educ is 2000 + 10·IQ, so the increase is larger for IQ 130.
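A sketch of fitting this interaction model in Python; "income.csv" and its columns are hypothetical stand-ins for the example's data:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("income.csv")  # assumed columns: Income, Educ, IQ
fit = smf.ols("Income ~ Educ + IQ + Educ:IQ", data=df).fit()
b = fit.params
# "Slope" for Educ is b_Educ + b_EducIQ * IQ, so it depends on IQ:
print("at IQ 100:", b["Educ"] + b["Educ:IQ"] * 100)
print("at IQ 130:", b["Educ"] + b["Educ:IQ"] * 130)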
• First order model with two predictors and interaction:
  y = β0 + β1x1 + β2x2 + β3x1x2 + ε
  For fixed values of x2, the line in x1 changes in both intercept and slope:
  X2 = 1: y = [β0 + β2(1)] + [β1 + β3(1)]x1 + ε
  X2 = 2: y = [β0 + β2(2)] + [β1 + β3(2)]x1 + ε
  X2 = 3: y = [β0 + β2(3)] + [β1 + β3(3)]x1 + ε
  [Figure: three non-parallel lines against x1]
  The two variables interact to affect the value of y.
• First order model:
  y = β0 + β1x1 + β2x2 + ε
  The effect of one predictor variable on y is independent of the effect of the other predictor variable on y:
  X2 = 1: y = [β0 + β2(1)] + β1x1 + ε
  X2 = 2: y = [β0 + β2(2)] + β1x1 + ε
  X2 = 3: y = [β0 + β2(3)] + β1x1 + ε
  [Figure: three parallel lines against x1]
Polynomial Models with Two Predictor Variables
• Second order model:
  y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + ε
  For each fixed value of x2, y is a parabola in x1 with a shifted intercept:
  X2 = 1: y = [β0 + β2(1) + β4(1²)] + β1x1 + β3x1² + ε
  X2 = 2: y = [β0 + β2(2) + β4(2²)] + β1x1 + β3x1² + ε
  X2 = 3: y = [β0 + β2(3) + β4(3²)] + β1x1 + β3x1² + ε
  [Figure: three parallel parabolas against x1]
• Second order model with interaction:
  y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + ε
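Finally, a sketch of fitting the second order model with interaction via a formula; the data file and column names are placeholders:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")  # assumed columns: y, x1, x2
fit = smf.ols("y ~ x1 + x2 + I(x1**2) + I(x2**2) + x1:x2", data=df).fit()
print(fit.params)  # estimates of beta0 ... beta5 in the model above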