![Page 1: Screening the Data Tedious but essential!. Missing Data Missing Not at Random (MNAR) Missing at Random (MAR) Missing Completely at Random (MCAR)](https://reader035.vdocuments.mx/reader035/viewer/2022062304/56649dc55503460f94ab7e1b/html5/thumbnails/1.jpg)
Screening the Data
Tedious but essential!
Missing Data
• Missing Not at Random (MNAR)
• Missing at Random (MAR)
• Missing Completely at Random (MCAR)
Missing Not at Random (MNAR)
• There are missing cases on Y.
• Missingness is related to the value of Y.
• Faculty salaries – those with high salaries may be reluctant to reveal them.
• Estimates of mean Y will be biased if you use just the available data.
Missing at Random (MAR)
• Missingness on Y is not related to the value of Y.
• Or it is related, but only through other variables on which we have data.
• Faculty salary is related to rank.
• Higher rank = higher salary.
• If missingness is random within each rank, within-rank estimates will be unbiased.
• Overall mean = weighted sum of within-rank estimates.
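The weighted-sum idea can be sketched in a few lines. This is a Python illustration rather than SAS, and the ranks, salaries, and the function name `weighted_mean_by_group` are all made up for the example. It assumes each rank's weight is its share of all faculty, respondents or not.

```python
from statistics import mean

# Hypothetical data: (rank, salary in $1000s); None marks an unreported salary.
faculty = [
    ("assistant", 60), ("assistant", 64), ("assistant", 62), ("assistant", None),
    ("associate", 80), ("associate", 84), ("associate", None),
    ("full", 110), ("full", None), ("full", None),
]

def weighted_mean_by_group(rows):
    """Sum of within-group means, each weighted by the group's share of all cases."""
    n_total = len(rows)
    overall = 0.0
    for group in sorted({g for g, _ in rows}):
        in_group = [y for g, y in rows if g == group]
        observed = [y for y in in_group if y is not None]
        # The weight counts everyone in the group, not just those with data.
        overall += (len(in_group) / n_total) * mean(observed)
    return overall

naive = mean(y for _, y in faculty if y is not None)
adjusted = weighted_mean_by_group(faculty)
print(f"available-case mean: {naive:.2f}")
print(f"rank-weighted mean:  {adjusted:.2f}")
```

Because the higher ranks are missing more often in this toy data, the available-case mean comes out lower than the rank-weighted mean.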
Missing Completely at Random (MCAR)
• There is no variable, observed or not, that is related to missingness of Y.
• Ideal, not likely ever absolutely true.
Finding Patterns of Missingness
• There is specialized software. You do not have it.
• Can use SAS.
• Can use SPSS with home license code.
• Create a missingness dummy variable.
• 0 = not missing, 1 = missing
• Relate missingness to other variables.
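As a rough illustration of the dummy-variable approach, here is a Python sketch (not SAS). The data and the names `salary`, `years`, and `salary_miss` are hypothetical. It codes missingness as 0/1 and correlates that dummy with another variable.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: salary (None = missing) and years of service.
salary = [52, None, 61, 58, None, 70, None, 66]
years = [3, 25, 8, 5, 30, 12, 28, 10]

# Missingness dummy: 0 = not missing, 1 = missing.
salary_miss = [1 if s is None else 0 for s in salary]

r = pearson_r(salary_miss, years)
print(f"r(missingness, years of service) = {r:.2f}")
```

A sizable correlation like the one in this toy data suggests the data are not MCAR, since missingness depends on an observed variable.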
Dealing with MCAR Data
• Delete Cases: Will create no bias, but will lower power and precision.
• Mean Substitution: For each missing value, substitute the group mean on that variable. No bias for means, but will reduce standard deviations.
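The shrinking standard deviation is easy to see numerically. The sketch below is Python with made-up scores for a single group; it fills each missing value with the mean of the observed values and compares the summary statistics.

```python
from statistics import mean, stdev

# Hypothetical scores for one group; None marks a missing value.
scores = [4, 7, None, 9, 5, None, 10, 7]

observed = [s for s in scores if s is not None]
fill = mean(observed)

# Mean substitution: every missing value gets the observed mean.
imputed = [fill if s is None else s for s in scores]

print(f"mean before: {mean(observed):.3f}   after: {mean(imputed):.3f}")
print(f"SD before:   {stdev(observed):.3f}   after: {stdev(imputed):.3f}")
```

The imputed cases add nothing to the sum of squared deviations but do add to n, so the mean is unchanged while the standard deviation drops.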
Dealing with MCAR Data
• Regression: For each missing score, develop a multiple regression to predict score from other variables. Impute that predicted score. Regression towards mean will reduce variability.
Dealing with MAR Data
• Deletion of Variables: If another variable can serve as a proxy.
• Multiple Imputation – specialized software, may eliminate bias
– Involves resampling techniques to generate several sets of predictions of missing scores
– Analyze each set and then average the results across sets.
Dealing with MNAR Data
• Sophisticated methods may reduce, but not eliminate, bias.
• Pairwise Correlation Matrix – use as input to multivariate procedures. Different correlations will be based on different subsets of the data. Can produce very strange results, not recommended.
Missing Item Data Within Unidimensional Scale
• Assume each item measures the same construct.
• For each subject, compute the mean of the items that do have data.
• Set to missing the scale scores for subjects who have answered fewer than a threshold number of items.
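A minimal Python sketch of this scoring rule follows. The function name `scale_score` and the `min_answered` threshold are illustrative choices, not something the slides specify.

```python
def scale_score(items, min_answered):
    """Mean of the answered items, or None if too few items were answered.

    items: responses for one subject, with None for unanswered items.
    min_answered: hypothetical threshold; pick it to suit the scale.
    """
    answered = [x for x in items if x is not None]
    if len(answered) < min_answered:
        return None  # too little data: treat the scale score as missing
    return sum(answered) / len(answered)

# Three-item scale, requiring at least two answered items.
print(scale_score([4, 5, None], min_answered=2))     # mean of the two answers
print(scale_score([4, None, None], min_answered=2))  # below threshold
```

Using the mean rather than the sum keeps scores on the same metric regardless of how many items a subject answered.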
Identifying Outliers
• Univariate: Box and whiskers plots
• Multivariate: Compute Mahalanobis Distance or Leverage. Investigate cases with high values. Use outlier dummy variable to compare outliers with inliers.
• Regression Diagnostics:
o Leverage: Cases with unusual values on the predictor variables
Outliers
o Standardized Residuals: Cases whose actual Y is far from predicted Y.
o Cook’s D: Cases whose values give them great influence on the regression solution.
Dealing with Outliers
• Investigate: May be bad data. May be able to correct the data, may not. May represent cases not properly considered part of the population of interest.
• Out-of-Range Values: Even if not outliers, these are bad data that need correction.
Dealing with Outliers
• Set to Missing: If all else fails.
• Delete the Case: For example, if convinced the respondent was not even reading the questions.
– “I frequently visit planets outside of our solar system.”
– “I make all of my own clothes.”
• Delete the Variable: Last resort when it has many cases with missing data.
Dealing with Outliers
• Transform the Variable: If outliers are valid but contributing to skewness.
• Change the Score: For example, reduce very high score to value a small bit higher than the remaining highest score. See Howell’s discussion of “Winsorizing.”
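The "change the score" rule can be sketched as follows. This is a Python illustration of the rule as worded on the slide, not necessarily the procedure Howell describes. The `cutoff` that defines "very high" and the `step` that defines "a small bit higher" are hypothetical parameters the analyst must choose.

```python
def pull_in_high_scores(scores, cutoff, step=1):
    """Replace scores above `cutoff` with (highest remaining score + step)."""
    kept = [s for s in scores if s <= cutoff]
    new_value = max(kept) + step
    return [s if s <= cutoff else new_value for s in scores]

# Hypothetical scores with one extreme value.
raw = [12, 15, 14, 13, 60]
adjusted = pull_in_high_scores(raw, cutoff=20)
print(adjusted)
```

The adjusted case is still the highest score, so its rank order is preserved, but it no longer dominates the mean and variance.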
Assumptions of the Analysis
• Check Outliers First: Dealing with outliers may resolve the problems below.
• Normality: Look at plots and measures of skewness and kurtosis. Ignore tests of significance, like Kolmogorov-Smirnov. May need to use a different analysis.
• Homogeneity of Variance: Does the variance differ considerably across groups? May need to transform or use different analysis.
Assumptions of the Analysis
• Homoscedasticity: Carefully inspect the residuals. May need to transform data or use a different analysis.
• Homogeneity of Variance/Covariance Matrices (across groups): Box’s M.
• Sphericity: For univariate-approach related samples ANOVA. Check with Mauchly’s Test. Correct the df or use a multivariate approach instead.
Assumptions of the Analysis
• Homogeneity of Regression: In ANCOV, we assume the relationship between Y and the predictors is constant across groups. Test the Groups x Predictor(s) interactions.
• Linear Relationships: Look at plots. If necessary, transform variables or use curvilinear techniques.
Multicollinearity
• One predictor is nearly perfectly correlated with the other predictors.
• Makes the regression coefficients unstable across random samples from the same population.
• Complicates the interpretation of unique effects.
Detecting Multicollinearity
• For each predictor, compute the R² between it and the other predictors. If very high (.9 or more), there is a problem.
• SAS will compute tolerance = (1 – that R²). If very low, there is a problem.
• If R² = 1, the correlation matrix is singular and cannot be inverted, so the analysis crashes.
– Predictors = Verbal SAT, Math SAT, Total SAT.
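The SAT example can be checked numerically. The Python sketch below uses made-up scores and hypothetical helper names (`pearson_r`, `det3`). It builds Total as Verbal + Math and shows that the determinant of the 3 × 3 correlation matrix is zero apart from floating-point error, which is what "singular" means.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical SAT scores; Total is an exact linear combination of the others.
verbal = [500, 620, 550, 700, 480, 640]
math_scores = [530, 600, 610, 680, 450, 590]
total = [v + m for v, m in zip(verbal, math_scores)]

predictors = [verbal, math_scores, total]
corr = [[pearson_r(a, b) for b in predictors] for a in predictors]
det = det3(corr)
print(f"determinant of correlation matrix: {det:.2e}")
```

Dropping any one of the three predictors removes the exact dependency and makes the matrix invertible again.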
Variance Inflation Factor
• VIF = 1/tolerance. If high, there is a problem.
• How High?
• Some say 10, some say 5, a few say 2.5.
• If R² = .9, tolerance = .1, VIF = 10.
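The arithmetic linking R², tolerance, and VIF can be written out as a small Python sketch. The function names are illustrative. The loop translates each rule-of-thumb VIF cutoff back into the R² it implies.

```python
def tolerance(r2):
    """Tolerance = 1 - R^2 of a predictor regressed on the other predictors."""
    return 1.0 - r2

def vif(r2):
    """Variance inflation factor = 1 / tolerance."""
    return 1.0 / tolerance(r2)

# Translate each VIF cutoff from the slide into the R^2 it corresponds to.
for cutoff in (10, 5, 2.5):
    r2 = 1 - 1 / cutoff
    print(f"VIF cutoff {cutoff}: R^2 = {r2:.2f}, tolerance = {tolerance(r2):.2f}")
```

So the strictest cutoff mentioned, 2.5, flags a predictor once the other predictors explain 60% of its variance.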
Dealing with Multicollinearity
• Drop a Predictor – may resolve the problem.
• Combine Predictors – into a composite variable
• Principal Components Analysis – conduct the analysis on the resulting weighted linear combinations of the variables. Can then transform the results back to the original variables.
SAS 1
• Look at the command lines in the SAS program.
• Always give every case a unique ID number, so you can locate it later.
• Label variables if their SAS name is not informative.
• input ID 1-3 @5 (Q1-Q138) (1.);
label Q1='Sex' Q3 = 'Age';
SAS 2
• Recode values that represent missing data.
• On several variables, such as “number of biological brothers,” response 5 was “do not know.”
• if Q15 = 5 then Q15 = . ;
if Q16 = 5 then Q16 = . ;
SAS 3 & 4
• Transform variable to reduce positive skewness
• age_sr = sqrt(Q3); age_log = log10(Q3); age_inv = -1/(Q3);
• Dichotomize variable – transformation of last resort.
• if q3 = 1 then age_di = 1; else if q3 > 1 then age_di = 2;
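To see how the square root, log, and negative inverse transformations act on a positively skewed variable, here is a Python sketch with made-up ages. The `skewness` function is the simple moment coefficient. That is an assumption of this example, and it will not match SAS output exactly, because PROC MEANS applies a small-sample correction.

```python
from math import sqrt, log10

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical ages with a long right tail.
ages = [18, 18, 19, 19, 19, 20, 20, 21, 22, 25, 31, 45]

results = {
    "raw": skewness(ages),
    "sqrt": skewness([sqrt(a) for a in ages]),
    "log10": skewness([log10(a) for a in ages]),
    "neg_inverse": skewness([-1 / a for a in ages]),
}
for name, g1 in results.items():
    print(f"{name:12s} skewness = {g1:.2f}")
```

Each successive transformation is stronger than the last, so the skewness falls at every step. Pick the mildest one that brings skewness into an acceptable range.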
SAS 5 & 6
• Create composite variable
• SIBS = Q15 + Q16;
• Transform to reduce positive skewness
• sibs_sr = sqrt(sibs); sibs_log = log10(sibs); sibs_in = -1/sibs;
SAS 7
• Create mental variable and associated missingness variable.
• MENTAL = Q62 + Q65 + Q67;
MentalMiss = 0;
if Mental = . then MentalMiss = 1;
SAS 8
• Transform to reduce negative skewness
• Mental2 = Mental*Mental; Mental3 = Mental**3; Ment_exp = EXP(Mental);
R_Ment = 13 - Mental;
R_Ment_sr = sqrt(R_Ment); R_Ment_log = log10(R_Ment);
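The two strategies for negative skewness, powering up and reflecting before a root or log, can be compared in a short Python sketch. The scores are made up and assume a maximum of 12, so that 13 − Mental is always at least 1. The `skewness` function is the uncorrected moment coefficient, which is an assumption of this example.

```python
from math import sqrt, log10

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical scores piled up near the maximum of 12, with a tail of low scores.
mental = [12, 12, 11, 12, 10, 11, 12, 9, 12, 11, 6, 3]

squared = [m * m for m in mental]       # powering up stretches the high end
reflected = [13 - m for m in mental]    # reflection turns negative skew positive
reflected_sr = [sqrt(r) for r in reflected]
reflected_log = [log10(r) for r in reflected]

for name, data in [("mental", mental), ("squared", squared),
                   ("reflected", reflected), ("reflected sqrt", reflected_sr),
                   ("reflected log10", reflected_log)]:
    print(f"{name:16s} skewness = {skewness(data):.2f}")
```

Remember that after reflection high scores mean low standing on the original variable, so the signs of correlations involving the reflected variable are reversed.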
SAS 9
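• Dichotomize Mental
• if 0 LE Mental LE 9 then Ment_di=1;
else if Mental > 9 then Ment_di=2;
• Be careful – SAS treats a missing value as lower than any number in comparisons, so a test such as Mental LE 9 without the lower bound would put missing cases in group 1.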
SAS 10
• Check for missing data and out-of-range values.
• proc means min max n nmiss; var q1-q10 q50-q70; run;
SAS 11
• Check for skewness & kurtosis
• proc means min max n nmiss skewness kurtosis;
var Q3 age_sr -- Mental Mental2 -- R_Ment_log; run;
SAS 12
• Check distributions of variables with few values
• proc freq; tables q3 age_di sibs mental ment_di; run;
SAS 13
• Locate cases with bad data
• data duh; set delmel; if q9 > 3;
proc print; var q9; id id; run;
• Case 159 has an out-of-range value on item Q9.
SAS 14
• Check correlates of missingness.
• proc corr nosimple data=delmel; var MentalMiss; with Q1 Q3 Q5 Q6 sibs; run;
• MentalMiss negatively correlated with sibs.
• Duh, some subjects have missing data on number of brothers or number of sisters.
• Instead of Mental = Q62+Q65+Q67, use Mental = Mean(of Q62 Q65 Q67);
Multidimensional Outliers
• Investigate observations with leverage greater than 2p/n, “where n is the number of observations used to fit the model, and p is the number of parameters in the model.”
• 4 variables: Q1 Q3 Q6 mental + intercept
• 193 observations
• 2*5/193 = .052
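The cutoff arithmetic, and what leverage responds to, can be sketched in Python. To stay within the standard library, the leverage part uses a single predictor plus intercept (so p = 2), where leverage has the closed form 1/n + (x − mean)² / Σ(x − mean)². The ages and function names are made up for illustration. The four-predictor model on the slide needs matrix algebra, which SAS does for you.

```python
def leverage_cutoff(n_obs, n_params):
    """Rule-of-thumb cutoff 2p/n for flagging high-leverage cases."""
    return 2 * n_params / n_obs

def simple_regression_leverage(x):
    """Leverage for each case in a one-predictor regression with intercept."""
    n = len(x)
    m = sum(x) / n
    sxx = sum((v - m) ** 2 for v in x)
    return [1 / n + (v - m) ** 2 / sxx for v in x]

# The slide's model: 4 predictors + intercept = 5 parameters, 193 observations.
print(f"slide cutoff: {leverage_cutoff(193, 5):.3f}")

# Hypothetical ages; one student is much older than the rest.
ages = [18, 19, 19, 20, 20, 21, 21, 22, 23, 40]
h = simple_regression_leverage(ages)
cut = leverage_cutoff(len(ages), 2)
flagged = [a for a, hi in zip(ages, h) if hi > cut]
print(f"toy cutoff: {cut:.2f}, flagged ages: {flagged}")
```

Leverage depends only on the predictor values, not on the outcome, which is why any variable can stand in as the outcome when the goal is just to obtain leverage.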
SAS 15
• Identify multivariate outliers
• proc reg data=delmel; model id = Q1 Q3 Q6 mental; output out=hat H=Leverage; run;
data outliers; set hat; if leverage > .052;
SAS 15
• Identify multivariate outliers
• proc print; var id Q1 Q3 Q6 mental leverage; run;
proc means mean; var Q1 Q3 Q6 mental; run;
• As a group, the outliers are older than the overall sample.
• All three students aged 25 or older are included among the outliers.
Survey Scoundrels
• These sloths do not even read the questions, they just answer randomly to get whatever incentive is available for completing the survey.
• My daughter’s shock upon discovering this.
• Monitor how long it takes respondents to complete the survey.
Items to Help Detect Scoundrels
• Repeat same item, compare responses.
• “I frequently visit with aliens from other planets.”
• “I make all of my own clothes.”
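One way such items might be scored is sketched below in Python. Everything specific here is an assumption for illustration: the 1 to 5 agreement scale, the item keys (`q10`, `q10_repeat`, `aliens`, `own_clothes`), and the thresholds `max_gap` and `agree_at`.

```python
def flag_careless(resp, repeat_pair=("q10", "q10_repeat"),
                  bogus_items=("aliens", "own_clothes"),
                  max_gap=1, agree_at=4):
    """Return a list of reasons to suspect a respondent was not reading the items.

    resp: dict of item name -> response on a hypothetical 1-5 agreement scale.
    """
    reasons = []
    first, second = (resp[k] for k in repeat_pair)
    # A repeated item should get (nearly) the same answer both times.
    if abs(first - second) > max_gap:
        reasons.append("inconsistent repeat")
    # Agreeing with an implausible item suggests random responding.
    for item in bogus_items:
        if resp[item] >= agree_at:
            reasons.append(f"endorsed {item}")
    return reasons

careful = {"q10": 4, "q10_repeat": 4, "aliens": 1, "own_clothes": 1}
suspect = {"q10": 5, "q10_repeat": 1, "aliens": 5, "own_clothes": 2}
print(flag_careless(careful))
print(flag_careless(suspect))
```

A flag is a reason to investigate the case, not proof. Someone really might make all of their own clothes.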