Outliers and Influential Data Points in Regression Analysis
James P. Stevens
Sujin Jang
November 10, 2008
Beware of Outliers
• Regression is sensitive to outliers
  – Important to detect outliers and influential points
• Summary stats can be misleading…
  – Important to explore the data, rather than relying on just 1-2 summary stats
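To make the sensitivity concrete, here is a minimal sketch on made-up data (the numbers are illustrative assumptions, not from the slides). It fits a least-squares line twice, once on clean data and once after corrupting a single observation, and compares the slopes. The helper name `fit_line` is hypothetical.

```python
import numpy as np

# Made-up illustrative data: y is roughly 2*x + 1 with small noise.
x = np.arange(1.0, 11.0)  # 1, 2, ..., 10
noise = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05])
y = 2.0 * x + 1.0 + noise


def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


slope_clean, intercept_clean = fit_line(x, y)

# Corrupt one observation (a hypothetical recording error on the last case).
y_bad = y.copy()
y_bad[-1] = 60.0

slope_bad, intercept_bad = fit_line(x, y_bad)

print(f"slope without the bad point: {slope_clean:.2f}")
print(f"slope with the bad point:    {slope_bad:.2f}")
```

One corrupted case out of ten is enough to move the fitted slope substantially, which is why a plot of the data is worth more than one or two summary numbers.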
Types of Outliers
• Classifying Outliers:
  – Outliers in the space of outcomes (outliers on y)
  – Outliers in the space of predictors (outliers on x)
So what should we do?
• Ways of Detecting Outliers:
  – Studentized residuals for outliers on y
  – Mahalanobis distance & hat matrix for outliers in the space of predictors
BUT… The points they identify will not necessarily be influential in affecting the regression coefficients…
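The detection measures named above can be sketched with NumPy alone. This is an illustrative sketch, not the presenter's code: the function name `outlier_diagnostics`, the simulated data, and the two planted outliers are all assumptions. The slides do not say whether "studentized" means internally or externally studentized residuals, so both are computed.

```python
import numpy as np


def outlier_diagnostics(X, y):
    """X: (n, k) predictors without an intercept column; y: (n,) outcomes.

    Returns leverages h, internally studentized residuals r, externally
    studentized (deleted) residuals t, and squared Mahalanobis distances d2.
    """
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = k + 1                              # number of fitted coefficients

    # Hat matrix H = Xd (Xd'Xd)^-1 Xd'; its diagonal holds the leverages.
    H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)
    h = np.diag(H)

    resid = y - H @ y
    mse = resid @ resid / (n - p)

    # Outliers on y: residuals scaled by their estimated standard errors.
    r = resid / np.sqrt(mse * (1.0 - h))
    t = r * np.sqrt((n - p - 1) / (n - p - r**2))

    # Outliers on x: squared Mahalanobis distance from the predictor centroid.
    centered = X - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    return h, r, t, d2


# Simulated data with two planted cases (assumed for illustration only).
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))
X[5] = [6.0, -6.0]  # far from the other cases in predictor space
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
y[12] += 10.0       # far from the regression surface on y

h, r, t, d2 = outlier_diagnostics(X, y)
print("case with largest |studentized residual|:", int(np.argmax(np.abs(t))))
print("case with largest leverage:", int(np.argmax(h)))
```

Leverage and Mahalanobis distance carry the same information here, since h_i = 1/n + d2_i/(n - 1) when the model includes an intercept. Note that case 5 follows the assumed model despite its extreme predictors, so it is flagged on x but need not be influential.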
Cook’s Distance: Identifying Influential Points
• A measure of the change in the regression coefficients that would occur if the case were omitted.
  – Affected by the case being an outlier both on y and in the set of predictors
  – Measures the joint (combined) influence of the case being an outlier on y and on x
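As a sketch of what "change in the regression coefficients if the case were omitted" means numerically, the code below computes Cook's distance two ways: from the usual closed form based on residuals and leverages, and by actually refitting with each case deleted. The function names and the small data set are assumptions made for illustration.

```python
import numpy as np


def cooks_distance(Xd, y):
    """Closed form: D_i = e_i^2 * h_i / (p * MSE * (1 - h_i)^2)."""
    n, p = Xd.shape
    XtX = Xd.T @ Xd
    beta = np.linalg.solve(XtX, Xd.T @ y)
    resid = y - Xd @ beta
    mse = resid @ resid / (n - p)
    # Leverages: diagonal of the hat matrix, without forming all of it.
    h = np.einsum("ij,ji->i", Xd, np.linalg.solve(XtX, Xd.T))
    return resid**2 * h / (p * mse * (1.0 - h) ** 2)


def cooks_distance_by_deletion(Xd, y):
    """Same quantity, obtained by refitting with each case left out."""
    n, p = Xd.shape
    XtX = Xd.T @ Xd
    beta = np.linalg.solve(XtX, Xd.T @ y)
    resid = y - Xd @ beta
    mse = resid @ resid / (n - p)
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)[0]
        diff = beta_i - beta  # shift in coefficients when case i is dropped
        d[i] = diff @ XtX @ diff / (p * mse)
    return d


# Made-up data: ten cases near y = 2x + 1, plus one case that is extreme
# on x (x = 20) and also far from the line on y (y = 20 rather than ~41).
noise = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05])
x = np.append(np.arange(1.0, 11.0), 20.0)
y = np.append(2.0 * np.arange(1.0, 11.0) + 1.0 + noise, 20.0)
Xd = np.column_stack([np.ones_like(x), x])

D = cooks_distance(Xd, y)
D_check = cooks_distance_by_deletion(Xd, y)
print("most influential case:", int(np.argmax(D)))
```

The two computations agree, which shows that the closed form really does summarize the coefficient shift from deleting a case. A common rule of thumb, not stated in the slides, treats values above 1 as large.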
Now what?
Step 1. Detect
Step 2. Isolate
Step 3. Examine
  - Are they qualitatively different?
  - Are they influential?
Step 4. Delete or retain as you see fit … or try both
Another thing to consider: influential “clusters”?
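The steps above can be sketched as a small workflow: flag cases, pull them out for inspection, and fit the model both with and without them. Everything here is an assumption for illustration: the data are made up, the helper names `ols` and `cooks_d` are hypothetical, and the cutoff of 1 on Cook's distance is a common rule of thumb rather than something the slides prescribe.

```python
import numpy as np


def ols(Xd, y):
    """Least-squares coefficients for design matrix Xd."""
    return np.linalg.lstsq(Xd, y, rcond=None)[0]


def cooks_d(Xd, y):
    """Cook's distance from residuals and leverages."""
    n, p = Xd.shape
    resid = y - Xd @ ols(Xd, y)
    mse = resid @ resid / (n - p)
    h = np.einsum("ij,ji->i", Xd, np.linalg.solve(Xd.T @ Xd, Xd.T))
    return resid**2 * h / (p * mse * (1.0 - h) ** 2)


# Made-up data: ten cases near y = 2x + 1 and one discrepant case.
noise = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05])
x = np.append(np.arange(1.0, 11.0), 20.0)
y = np.append(2.0 * np.arange(1.0, 11.0) + 1.0 + noise, 20.0)
Xd = np.column_stack([np.ones_like(x), x])

# Step 1. Detect: flag cases whose Cook's distance exceeds the assumed cutoff.
D = cooks_d(Xd, y)
flagged = np.flatnonzero(D > 1.0)

# Step 2. Isolate: pull out the flagged cases so they can be inspected.
for i in flagged:
    print(f"case {i}: x = {x[i]}, y = {y[i]}, Cook's D = {D[i]:.2f}")

# Step 3. Examine: how much do the coefficients move without these cases?
keep = np.ones(len(y), dtype=bool)
keep[flagged] = False
beta_with = ols(Xd, y)
beta_without = ols(Xd[keep], y[keep])

# Step 4. Try both: report the fit with and without the flagged cases.
print("intercept, slope with all cases:     ", np.round(beta_with, 2))
print("intercept, slope without flagged:    ", np.round(beta_without, 2))
```

Whether to delete or retain still depends on what the flagged cases turn out to be; the code only makes the consequence of each choice visible.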