![Page 1: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/1.jpg)
Copyright © SAS Institute Inc. All rights reserved.
Missing Data? Two SAS Procedures to the Rescue
HPIMPUTE and SURVEYIMPUTE
Melodie Rush
Customer Success Principal Data Scientist
Connect with me:
LinkedIn: https://www.linkedin.com/in/melodierush
Twitter: @Melodie_Rush
![Page 2: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/2.jpg)
AGENDA
• Introduction: What, Why and How
• Proc HPIMPUTE: Syntax, Imputation Options, Examples
• Proc SURVEYIMPUTE: Syntax, Imputation Options, Examples
![Page 3: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/3.jpg)
What is Missing Data? Definition
In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. - Wikipedia
![Page 4: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/4.jpg)
What is Missing Data? SAS
Missing Value
A missing value indicates that no data value is stored for the variable in the current observation. There are three kinds of missing values:
• numeric
• character
• special numeric
By default, SAS prints a missing numeric value as a single period (.) and a missing character value as a blank space. See Creating Special Missing Values for more information about special numeric missing values.
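As a minimal sketch of the three kinds of missing values listed above, the following DATA step creates one of each and checks them with the MISSING function. The data set name work.miss_demo and all variable names are hypothetical and do not come from the presentation.

```sas
/* Hypothetical one-row data set illustrating the three kinds of missing values */
data work.miss_demo;
   length name $ 8;
   x    = .;     /* ordinary numeric missing value */
   y    = .A;    /* special numeric missing value (.A through .Z, or ._) */
   name = ' ';   /* character missing value (blank) */

   /* MISSING() returns 1 when its argument is missing and 0 otherwise */
   x_missing    = missing(x);
   y_missing    = missing(y);
   name_missing = missing(name);
run;

proc print data=work.miss_demo noobs;
run;
```

All three flags are 1 in the printed row, because MISSING treats ordinary, special, and character missing values alike.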
![Page 5: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/5.jpg)
What should you do about missing values?
• Replace with constant or zero
• Replace with mean or mode
• Replace using an imputation method
• Remove observation(s)
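The simpler options above can be sketched with Base SAS and PROC STDIZE before turning to the dedicated imputation procedures. This is only an illustration: the data sets work.have, work.complete_cases, work.zero_filled, and work.mean_filled and the variable x are hypothetical names.

```sas
/* Hypothetical input: x has one missing value */
data work.have;
   input id x;
   datalines;
1 10
2 .
3 30
;
run;

/* Remove observations that have a missing x */
data work.complete_cases;
   set work.have;
   if missing(x) then delete;
run;

/* Replace a missing x with a constant (here zero) */
data work.zero_filled;
   set work.have;
   if missing(x) then x = 0;
run;

/* Replace a missing x with the mean of the nonmissing values;
   REPONLY replaces missing values without standardizing the data */
proc stdize data=work.have out=work.mean_filled reponly method=mean;
   var x;
run;
```

With these three rows, the mean-filled value for id 2 is 20, the average of 10 and 30.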
![Page 6: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/6.jpg)
Proc HPIMPUTE
![Page 7: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/7.jpg)
Proc HPIMPUTE
1. Syntax
2. Imputation Options
3. Other Options
![Page 8: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/8.jpg)
Proc HPIMPUTE
The HPIMPUTE procedure executes high-performance numeric variable imputation. It:
• takes only numeric variables.
• runs in either single-machine mode or distributed mode.
HPIMPUTE Procedure Documentation
![Page 9: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/9.jpg)
Proc HPIMPUTE: Syntax
proc hpimpute options;
input variables;
impute variables </ options>;
performance <performance options>;
id variables;
freq variable;
code <options>;
<…>
run;
![Page 10: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/10.jpg)
Proc HPIMPUTE: Imputation Methods
• VALUE
– Replaces missing values with the specified value
• MEAN
– Replaces missing values with the algebraic mean of the variable
• RANDOM
– Replaces missing values with a random value that is drawn between the minimum and the maximum of the variable
• PMEDIAN
– Replaces missing values with the pseudomedian of the variable
![Page 11: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/11.jpg)
Proc HPIMPUTE: Example Data
• 6 variables
• First 4 have missing values
• Fifth is the frequency variable
• Last is an index variable
Example Code and Documentation
![Page 12: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/12.jpg)
Proc HPIMPUTE: Example Code – Value Method
Replaces missing values with the specified value
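The code shown on this slide did not survive extraction. A minimal sketch of what a VALUE imputation might look like follows. The data values are invented for illustration and are not the presenter's data; only the layout (four variables with missing values, a frequency variable, an index variable) follows the example data description. The names work.ex1 and work.out_value are assumptions.

```sas
/* Hypothetical data: a-d contain missing values, freq is a frequency, id is an index */
data work.ex1;
   input a b c d freq id;
   datalines;
2 3 1 1 2 1
. 2 4 2 3 2
3 . 5 . 1 3
5 6 . 8 2 4
. 7 9 10 1 5
;
run;

/* Replace every missing value of a, b, c, and d with the constant 0.1 */
proc hpimpute data=work.ex1 out=work.out_value;
   id id;
   input a b c d;
   impute a b c d / value=0.1;
run;

proc print data=work.out_value noobs;
run;
```

The output data set is expected to carry an indicator (M_ prefix) and an imputed variable (IM_ prefix) for each imputed input, matching the names that appear in the CODE statement results later in the deck.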
![Page 13: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/13.jpg)
Proc HPIMPUTE: Example Results – Value Method
Variable Name
Indicator Name
Imputed Variable Name
Number Imputed
![Page 14: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/14.jpg)
Proc HPIMPUTE: Example Output Data – Value Method
![Page 15: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/15.jpg)
Proc HPIMPUTE: Example Code – Mean Method
Replaces missing values with the algebraic mean of the variable
![Page 16: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/16.jpg)
Proc HPIMPUTE: Example Results – Mean Method
Variable Name
Indicator Name
Imputed Variable Name
Number Imputed
![Page 17: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/17.jpg)
Proc HPIMPUTE: Example Output Data – Mean Method
![Page 18: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/18.jpg)
Proc HPIMPUTE: Example Code – Random Method
Replaces missing values with a random value that is drawn between the minimum and the maximum of the variable
![Page 19: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/19.jpg)
Proc HPIMPUTE: Example Results – Random Method
Variable Name
Indicator Name
Imputed Variable Name
Number Imputed
![Page 20: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/20.jpg)
Proc HPIMPUTE: Example Output Data – Random Method
![Page 21: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/21.jpg)
Proc HPIMPUTE: Example Code – Pseudo Median Method
Replaces missing values with the pseudo median of the variable
![Page 22: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/22.jpg)
Proc HPIMPUTE: Example Results – Pseudo Median Method
Variable Name
Indicator Name
Imputed Variable Name
Number Imputed
![Page 23: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/23.jpg)
Proc HPIMPUTE: Example Output Data – Pseudo Median Method
![Page 24: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/24.jpg)
Proc HPIMPUTE: ID Statement
![Page 25: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/25.jpg)
Proc HPIMPUTE: ID Statement
• The optional ID statement lists one or more variables from the input data set that are transferred to the output data set.
• The ID statement accepts numeric and character variables.
![Page 26: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/26.jpg)
Proc HPIMPUTE: FREQ Statement
• The variable in the FREQ statement identifies a numeric variable in the data set that contains the frequency of occurrence for each observation.
• PROC HPIMPUTE treats each observation as if it appeared n times, where n is the value of the FREQ variable for the observation.
![Page 27: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/27.jpg)
Proc HPIMPUTE: FREQ Statement
• If the frequency value is not an integer, it is truncated to an integer.
• If the frequency value is less than 1 or missing, the observation is not used in the analysis.
• When the FREQ statement is not specified, each observation is assigned a frequency of 1.
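A small sketch may make the effect of the FREQ statement concrete. The data set work.freq_demo and its values are hypothetical. If each observation counts as many times as its frequency, as described above, the mean of x would be (1×1 + 10×3)/4 = 7.75 rather than the unweighted 5.5, so 7.75 is the value one would expect to see imputed.

```sas
/* Hypothetical data: x has one missing value, freq holds the frequency */
data work.freq_demo;
   input x freq id;
   datalines;
1 1 1
10 3 2
. 1 3
;
run;

/* Mean imputation in which each observation counts freq times */
proc hpimpute data=work.freq_demo out=work.out_freq;
   id id;
   input x;
   impute x / method=mean;
   freq freq;
run;

proc print data=work.out_freq noobs;
run;
```

Removing the FREQ statement and rerunning gives the unweighted comparison, which is what the next slides contrast.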
![Page 28: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/28.jpg)
Proc HPIMPUTE: FREQ Statement Results
Results with FREQ Statement / Results without FREQ Statement
![Page 29: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/29.jpg)
Proc HPIMPUTE: FREQ Statement Results
Results with FREQ Statement / Results without FREQ Statement
![Page 30: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/30.jpg)
Proc HPIMPUTE: Syntax – CODE Statement
proc hpimpute data=ex1 out=out1;
id id;
input a b c d;
impute a / value=0.1;
impute b / method=pmedian;
impute c / method=random;
impute d / method=mean;
code file='c:/temp/hpimpute.sas';
run;
The CODE statement generates SAS DATA step code that mimics the computations that are performed when the IMPUTE statement runs in single-machine mode and uses a single thread.
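One way the generated file might be used is to apply the same imputation rules to another data set by including it in a DATA step. The sketch below is an assumption about typical usage, not something shown in the presentation. The data sets work.train and work.new_data, the macro variable codefile, and the file location in the WORK library directory are all hypothetical.

```sas
/* Hypothetical training data and new data with the same variables */
data work.train;
   input a d id;
   datalines;
1 2 1
. 4 2
3 . 3
;
run;

data work.new_data;
   input a d id;
   datalines;
. . 101
7 8 102
;
run;

/* Write the generated code into the WORK directory (hypothetical location) */
%let codefile = %sysfunc(pathname(work))/hpimpute_code.sas;

proc hpimpute data=work.train out=work.train_imputed;
   id id;
   input a d;
   impute a / value=0.1;
   impute d / method=mean;
   code file="&codefile";
run;

/* Apply the saved imputation rules to the new data */
data work.new_scored;
   set work.new_data;
   %include "&codefile";
run;
```

Because the imputation values are written into the generated code as constants, the new data are filled with values learned from the training data rather than recomputed.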
![Page 31: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/31.jpg)
Proc HPIMPUTE: Results – CODE Statement
%let HPDM_seed=5;
if a = . then do;
M_a = 1;
IM_a = 0.1;
end;
else do;
M_a = 0;
IM_a = a;
end;
length M_a IM_a 8;
if b = . then do;
M_b = 1;
IM_b = 3;
end;
else do;
M_b = 0;
IM_b = b;
end;
length M_b IM_b 8;
![Page 32: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/32.jpg)
Proc HPIMPUTE: Results – CODE Statement
HPDM_vmin = 1;
HPDM_vmax = 10;
if c = . then do;
M_c = 1;
IM_c = HPDM_vmin + (HPDM_vmax - HPDM_vmin)*ranuni(&HPDM_seed);
end;
else do;
M_c = 0;
IM_c = c;
end;
length M_c IM_c 8;
if d = . then do;
M_d = 1;
IM_d = 5.5;
end;
else do;
M_d = 0;
IM_d = d;
end;
length M_d IM_d 8;
![Page 33: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/33.jpg)
Proc HPIMPUTE: Syntax – PERFORMANCE Statement
proc hpimpute data=ex1 out=out1;
id id;
input a b c d;
impute a / value=0.1;
impute b / method=pmedian;
impute c / method=random;
impute d / method=mean;
performance nodes=0;
run;
• Defines performance parameters for multithreaded and distributed computing, passes variables that describe the distributed computing environment, and requests detailed results about the performance characteristics of the HPIMPUTE procedure.
• Also use the PERFORMANCE statement to control whether the HPIMPUTE procedure executes in single-machine or distributed mode.
Performance Statement Documentation
![Page 34: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/34.jpg)
Proc HPIMPUTE: Results – Performance Statement
![Page 35: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/35.jpg)
Proc HPIMPUTE: Syntax – Performance Statement
Running in a high-performance environment
option set=GRIDHOST="&GRIDHOST";
option set=GRIDINSTALLLOC="&GRIDINSTALLLOC";
proc hpimpute data=ex1 out=out1;
id id;
input a b c d;
impute a / value=0.1;
impute b / method=pmedian;
impute c / method=random;
impute d / method=mean;
performance nodes=2 details
host="&GRIDHOST" install="&GRIDINSTALLLOC";
run;
![Page 36: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/36.jpg)
Proc HPIMPUTE: Results – Performance Statement
Running in a high-performance environment
![Page 37: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/37.jpg)
Proc SURVEYIMPUTE
![Page 38: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/38.jpg)
SURVEY Procedures (SAS/STAT)
➢ SURVEYSELECT: Sample selection
➢ SURVEYIMPUTE: Imputation
➢ SURVEYMEANS: Descriptive statistics
➢ SURVEYFREQ: Frequency tables
➢ SURVEYREG: Linear models
➢ SURVEYLOGISTIC: Logistic regression
➢ SURVEYPHREG: Proportional hazards
![Page 39: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/39.jpg)
Proc SURVEYIMPUTE
1. Syntax
2. Imputation Options
3. Analyzing Results
![Page 40: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/40.jpg)
Handling Missing Values in Survey Data: Analysis of Imputed Data
• How are the data collected?
• How are the missing values imputed?
Different imputation methods require different analysis techniques
![Page 41: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/41.jpg)
Handling Missing Values in Survey Data: The Nonresponse Problem
ID  Income  Tax Return
1   40      42
2   120     116
3   60      55
4   80      84
5   .       410
6   370     320
7   210     230
Unreported income for ID 5: 450
Average household income = 147 (six respondents only)
Average household income = 190 (if the unreported income of 450 is included)
• Prevention is the best solution for nonresponse
• Information is the best tool for imputation
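The respondent-only average can be reproduced with a short sketch. The income values are the ones on the slide; pairing the tax return values with IDs in the order listed is an assumption, and the names work.income_demo and tax_return are hypothetical.

```sas
/* Values from the slide; income for ID 5 is missing */
data work.income_demo;
   input id income tax_return;
   datalines;
1 40 42
2 120 116
3 60 55
4 80 84
5 . 410
6 370 320
7 210 230
;
run;

/* PROC MEANS skips missing values: N = 6, mean = 880/6 = 146.67 */
proc means data=work.income_demo n nmiss mean maxdec=2;
   var income;
run;
```

Adding the unreported 450 raises the total to 1330 and the mean to 1330/7 = 190, which is the gap that imputation tries to close by using auxiliary information such as the tax return.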
![Page 42: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/42.jpg)
PROC SURVEYIMPUTE
The SURVEYIMPUTE procedure imputes missing values of an item in a sample survey by replacing them with observed values from the same item.
Imputation methods include
• Single and Multiple Hot-Deck Imputation
• Approximate Bayesian Bootstrap (ABB) Imputation
• Fully Efficient Fractional Imputation (FEFI)
• Fractional Hot-Deck Imputation (FHDI)
PROC SURVEYIMPUTE Documentation
![Page 43: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/43.jpg)
Handling Missing Values in Survey Data: PROC SURVEYIMPUTE Syntax
proc surveyimpute options;
cluster variables;
repweights variables;
strata variables;
weight variable;
cells variables;
var variables;
by variables;
class variables;
id variable;
output options;
<…>
run;
![Page 44: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/44.jpg)
Proc SURVEYIMPUTE: Syntax – Method=HotDeck
Imputation techniques that use observed values from the sample to impute (fill in) missing values are known as hot-deck imputation.
proc surveyimpute data=work.surveyimpute;
var income;
output out=hotdeck;
run;
![Page 45: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/45.jpg)
Proc SURVEYIMPUTE: Example Results – Method=HotDeck
![Page 46: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/46.jpg)
Proc SURVEYIMPUTE: Example Output Data – Method=HotDeck
![Page 47: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/47.jpg)
Proc SURVEYIMPUTE: Hot-Deck Imputation
[Diagram panels: Data, Imputation Cells, Donors, Recipients]
![Page 48: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/48.jpg)
Proc SURVEYIMPUTE: Syntax – Method=HotDeck
Imputation techniques that use observed values from the sample to impute (fill in) missing values are known as hot-deck imputation.
The SELECTION= option modifies the donor selection.
proc surveyimpute data=work.surveyimpute
method=hotdeck(selection=SRSWOR)
ndonors=1 seed=8523;
cells cell2;
var income;
id ID;
output out=hotdeck donorid;
run;
![Page 49: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/49.jpg)
Proc SURVEYIMPUTE: Example Results – Method=HotDeck
![Page 50: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/50.jpg)
Proc SURVEYIMPUTE: Example Output Data – Method=HotDeck
![Page 51: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/51.jpg)
Proc SURVEYIMPUTE: Syntax – Method=HotDeck Selection=ABB
The SELECTION= option modifies the donor selection.
SELECTION=ABB requests hot-deck donor selection by using the approximate Bayesian bootstrap method. For more information, see the section Approximate Bayesian Bootstrap.
proc surveyimpute data=work.surveyimpute
method=hotdeck(selection=abb)
ndonors=1 seed=8523;
cells cell2;
var income;
id ID;
output out=hotdeckb donorid;
run;
![Page 52: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/52.jpg)
Proc SURVEYIMPUTE: Approximate Bayesian Bootstrap
[Diagram panels: Imputation Cells, Donor Pool, Donors, Recipients; selection steps labeled SRSWR]
![Page 53: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/53.jpg)
Proc SURVEYIMPUTE: Example Results – Method=HotDeck Selection=ABB
![Page 54: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/54.jpg)
Proc SURVEYIMPUTE: Example Output Data – Method=HotDeck Selection=ABB
![Page 55: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/55.jpg)
Proc SURVEYIMPUTE: Fully Efficient Fractional Imputation (FEFI)
• Uses multiple donor units for a recipient unit.
• The number of donor units for a recipient unit is equal to the number of observed levels for the missing items.
• Each donor donates a fraction of the original weight of the recipient unit such that the sum of the fractional weights from all the donors is equal to the original weight of the recipient.
• Does not introduce additional variability that is caused by the selection of donor units.
• One disadvantage is that it can greatly increase the size of the imputed data set.
Fully Efficient Fractional Imputation Documentation
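The weight-splitting idea in the bullets above can be written out as a small worked example. This is a sketch under stated assumptions: a single categorical item, and fractions taken proportional to the weighted donor counts of each observed level in the imputation cell. The symbols and numbers are hypothetical and are not taken from the presentation.

```latex
% Recipient i has weight w_i. Its imputation cell has observed levels k = 1..K
% with weighted donor counts N_k (assumed notation).
w^{*}_{ik} = w_i \,\frac{N_k}{\sum_{l=1}^{K} N_l},
\qquad
\sum_{k=1}^{K} w^{*}_{ik} = w_i .
% Example: w_i = 1 and two observed levels, 40 and 80, with N_{40} = 2 and N_{80} = 1,
% give w^{*}_{i,40} = 2/3 and w^{*}_{i,80} = 1/3.
```

In this example the recipient contributes two rows to the imputed data set, one per observed level, which is why the imputed data can grow quickly when there are many levels.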
![Page 56: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/56.jpg)
Proc SURVEYIMPUTE: Syntax – Method=FEFI
Fully Efficient Fractional Imputation
The CLASS statement is required for FEFI.
proc surveyimpute data=work.surveyimpute
method=FEFI;
cells cell2;
var income;
class income;
id ID;
output out=FEFI;
run;
![Page 57: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/57.jpg)
Handling Missing Values in Survey Data: Fully Efficient Fractional Imputation
[Diagram panels: Imputation Cells, Donors, Recipients, Imputed Data]
![Page 58: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/58.jpg)
Proc SURVEYIMPUTE: Example Results – Method=FEFI
![Page 59: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/59.jpg)
Proc SURVEYIMPUTE: Example Output Data – Method=FEFI
![Page 60: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/60.jpg)
Proc SURVEYIMPUTE: Fractional Hot-Deck Imputation (FHDI)
• Uses multiple donor units for a recipient unit.
• Each donor donates a fraction of the original weight of the recipient unit such that the sum of the fractional weights from all the donors is equal to the original weight of the recipient.
• The fraction of the recipient weight that a donor unit contributes to the recipient unit is known as the fractional weight.
• The donors are selected by using probability proportional to size (PPS) selection in which the two-stage FEFI weights are used as the size measure.
• FHDI is useful for reducing the size of the imputed data when two-stage FEFI creates many imputed rows.
– FHDI follows the same imputation steps as those of two-stage FEFI, but FHDI selects a subset of second-stage donor cells from all possible second-stage donor cells for the imputation.
Fractional Hot-Deck Imputation Documentation
![Page 61: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/61.jpg)
Proc SURVEYIMPUTE: Syntax – Method=FHDI
Fractional Hot-Deck Imputation
At least 2 missing values for each row (one continuous with a binned version)
proc surveyimpute data=work.surveyimpute2
method=FHDI ndonors=3 seed=8523;
cells cell2;
var income age (clevvar=agegroup);
class income;
id ID;
output out=FHDI;
run;
![Page 62: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/62.jpg)
Handling Missing Values in Survey Data: Data – Method=FHDI
![Page 63: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/63.jpg)
Proc SURVEYIMPUTE: Method=FHDI
![Page 64: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/64.jpg)
Proc SURVEYIMPUTE: Example Results – Method=FHDI
![Page 65: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/65.jpg)
Proc SURVEYIMPUTE: Example Output Data – Method=FHDI
![Page 66: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/66.jpg)
Proc SURVEYIMPUTE: Example Output Data – Method=FHDI
![Page 67: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/67.jpg)
Handling Missing Values in Survey Data: Hot-Deck Analysis Statements
Ignore the imputation variance
proc surveymeans data=hotdeck3;
var income;
repweights RepWt_: /Jkcoefs=0.857;
run;
![Page 68: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/68.jpg)
Handling Missing Values in Survey Data: FEFI Analysis Statements
Use the WEIGHT and REPWEIGHTS statements
proc surveymeans data=fefi;
var income;
weight ImpWt;
repweights ImpRepWt_: / jkcoefs=0.857;
run;
![Page 69: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/69.jpg)
Handling Missing Values in Survey Data: Comparing the Estimates
Estimates for Average Income
Imputation Method   Estimate   Standard Error
No Missing          190.00     61.10
No Imputation       146.70     50.97
Hot-Deck            178.57     53.60
FEFI                159.04     54.43
FHDI*               167.71     27.25
▪ Same analysis but different results
* FHDI is based on a different data set with 20 rows versus 7 for the other methods
![Page 70: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/70.jpg)
Handling Missing Values in Survey Data: Handling Nonresponse in SAS/STAT
• PROC SURVEYIMPUTE is the tool for imputing missing values from complex surveys
• FEFI introduces no additional variability from the imputation and is the preferred method for survey data
• FHDI is the preferred method for continuous data
• The analysis technique should be tailored to both the survey design and the imputation method
![Page 71: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/71.jpg)
Resources: Where to learn more
![Page 72: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/72.jpg)
Where to learn more? SAS Documentation
• Working with Missing Data in SAS
• Proc HPIMPUTE Documentation
• Proc SURVEYIMPUTE Documentation
• Handling Missing Values in Survey Data (Video)
• Proc SURVEYIMPUTE References
![Page 73: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/73.jpg)
Where to learn more? Papers
• Mukhopadhyay, P. K. (2016). “Survey Data Imputation with PROC SURVEYIMPUTE” In Proceedings of the SAS Global Forum 2016 Conference. Cary, NC: SAS Institute Inc.
• Stokes, Maura (and Statistical R&D Staff). “SAS/STAT 14.1: Methods for Massive, Missing, or Multifaceted Data” In Proceedings of the SAS Global Forum 2015 Conference. Cary NC: SAS Institute Inc.
• Cutler, D. Richard. “Machine Learning and Predictive Analytics in SAS® Enterprise Miner™ and SAS/STAT® Software” In the Proceedings of the SAS Global Forum 2019 Conference. Cary NC: SAS Institute Inc.
![Page 74: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/74.jpg)
Where to learn more? Book
Complex Survey Data Analysis with SAS
![Page 75: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/75.jpg)
FIND YOUR USER GROUP
sas.com/usersgroups
You should do the following (if you’re not already):
◊ Tap into local resources
◊ Learn from other SAS Users’ experiences
◊ Connect with the local SAS Users’ network
![Page 76: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/76.jpg)
ARE YOU AN EXPLORER?
Whether you’re a modeler, programmer, or administrator, everyone is welcome on SAS Analytics Explorers!
More ways to:
◊ Learn SAS
◊ Get support
◊ Connect with users across the US
Ready to become an explorer? Got questions?
explorers.sas.com
![Page 77: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/77.jpg)
DON’T BE SHY, ASK THE EXPERT
Tips & tricks webinars on a variety of SAS topics, plus get all your questions answered live by a SAS expert.
sas.com/asktheexpert
![Page 78: Missing Data? Two SAS Procedures to the Rescue](https://reader031.vdocuments.mx/reader031/viewer/2022012019/61687246d394e9041f6fa25a/html5/thumbnails/78.jpg)
sas.com
Thank you for your time and attention!
Questions?
Connect with me:
LinkedIn: https://www.linkedin.com/in/melodierush
Twitter: @Melodie_Rush