nearest neighbor matching


Upload: trish

Post on 22-Feb-2016


Page 1: Nearest neighbor matching

Nearest neighbor matching

USING THE GREEDY MATCH MACRO

Note: Much of the code was originally written by Lori Parsons: http://www2.sas.com/proceedings/sugi26/p214-26.pdf

This code has been written with simplicity as a primary concern. If you do not have a large number of controls, you may want to modify it.

Page 2: Nearest neighbor matching

/* Define the library for formats */

LIBNAME saslib "G:\oldpeople\sasdata\" ;

OPTIONS NOFMTERR FMTSEARCH = (saslib) ;

Page 3: Nearest neighbor matching

/* Define the library for study data */

LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;

Page 4: Nearest neighbor matching

Include the Macro

%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearestmacro.sas' ;

Page 5: Nearest neighbor matching

%propen(libname, dsname, idvariable, dependent, propensity)

LIBNAME = directory for data sets
DSNAME = dataset with study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable
PROPENSITY = propensity score produced in logistic regression

Page 6: Nearest neighbor matching

FOR EXAMPLE

%propen(study,allpropen,id,athome,prob);

Remember, we already have the study.allpropen dataset with the propensity score (prob) from the PROC LOGISTIC we just did.
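For reference, the earlier PROC LOGISTIC step might have looked roughly like the sketch below. Only athome, prob and study.allpropen come from these slides; the input data set (study.people) and the covariates (age, female, ervisits) are assumed names for illustration.

```sas
/* Hypothetical sketch: study.people and the covariate names are assumptions */
proc logistic data = study.people descending ;
   /* DESCENDING models the probability that athome = 1 */
   model athome = age female ervisits ;
   /* P= names the variable that will hold the predicted probability */
   output out = study.allpropen p = prob ;
run ;
```

The OUTPUT statement is what puts the propensity score (prob) on every record, which is the data set the macro call above expects.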

Page 7: Nearest neighbor matching

Explaining the macro

A Challenge

Page 8: Nearest neighbor matching

%macro propen(lib,dsn,id,depend,prob);

Data in5 ;
set &lib..&dsn ;

Creates a temporary data set

Page 9: Nearest neighbor matching

Propensity scores rounded to 5, then 4, 3, 2 and 1 decimals

%Do countr = 1 %to 5 ;
%let digits = %eval(6 - &countr) ;
%let roundto = %eval(10**&digits) ;
%let roundto = %sysevalf(1/&roundto) ;
%let nextin = %eval(&digits - 1) ;

Page 10: Nearest neighbor matching

MACRO NOTES

%Do countr = 1 %to 5 ; /* Starts %DO loop */

Use %EVAL function to do integer arithmetic

%let digits = %eval(6 - &countr) ;

Use %SYSEVALF function to do non-integer (floating-point) arithmetic
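To see what the loop variables hold on each pass, a small stand-alone macro can write them to the log. This is only a sketch: the macro name showround is made up for illustration, and the arithmetic is copied from the slide.

```sas
/* Hypothetical helper (not part of the original macro): prints loop values to the log */
%macro showround ;
   %do countr = 1 %to 5 ;
      %let digits  = %eval(6 - &countr) ;     /* 5, 4, 3, 2, 1 */
      %let roundto = %eval(10**&digits) ;     /* integer power of ten */
      %let roundto = %sysevalf(1/&roundto) ;  /* rounding unit, needs floating point */
      %let nextin  = %eval(&digits - 1) ;     /* suffix of the next input data set */
      %put countr=&countr digits=&digits roundto=&roundto nextin=&nextin ;
   %end ;
%mend showround ;

%showround
```

On the last pass digits is 1 and nextin is 0, so records still unmatched after the one-decimal pass end up in a data set named in0.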

Page 11: Nearest neighbor matching

/* Output control to one data set, intervention to another */

/* Create random number to sort within group */

Page 12: Nearest neighbor matching

Create 2 data sets

DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
SET in&digits ;

We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.

We only keep four variables.

Page 13: Nearest neighbor matching

Assignment statements

randnum = RANUNI(0) ;
&prob = ROUND(&prob,&roundto) ;

Create a random number and round the propensity score to a set number of digits
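A quick way to see why coarser rounding produces more matches is to round two nearby scores at different precisions. The two values below are invented for illustration.

```sas
/* Illustrative values only: two propensity scores that are close but not equal */
data _null_ ;
   a = 0.4512 ;
   b = 0.4538 ;
   do roundto = 0.001, 0.01 ;
      ra = round(a, roundto) ;
      rb = round(b, roundto) ;
      same = (ra = rb) ;          /* 1 if the rounded scores would match */
      put roundto= ra= rb= same= ;
   end ;
run ;
```

At three decimals the scores stay apart (0.451 versus 0.454); at two decimals both become 0.45, so these two subjects would be eligible to match on that pass.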

Page 14: Nearest neighbor matching

Output to Case Data set …

IF &depend = 1 THEN DO ;
id_y = &id ;
depend_y = &depend ;
OUTPUT yes1 ;
END ;

We need to rename the dependent & id variables or they’ll get overwritten when the case and control data sets are merged

Page 15: Nearest neighbor matching

… Or output control data set

ELSE IF &depend = 0 THEN DO ;

id_n = &id ;
depend_n = &depend ;
OUTPUT no1 ;
END ;

Notice the data sets were named no1 and yes1. It becomes evident why shortly.

Page 16: Nearest neighbor matching

/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */

Page 17: Nearest neighbor matching

%Do i = 1 %to 20 ;
%let j = %eval(&i + 1) ;

proc sort data = yes&i ;
by &prob randnum ;

data yes&i yes&j ;
set yes&i ;
by &prob ;
if first.&prob then output yes&i ;
else output yes&j ;

NOTE: Matching without replacement

Page 18: Nearest neighbor matching

Same thing for controls

proc sort data = no&i ;
by &prob randnum ;

data no&i no&j ;
set no&i ;
by &prob ;
if first.&prob then output no&i ;
else output no&j ;

The randnum ensures that subjects with matching scores are pulled at random

Page 19: Nearest neighbor matching

Merge matches, end loop

DATA match&i ;
MERGE yes&i (in= ina) no&i (in= inb) ;
BY &prob ;
IF ina AND inb ;
run ;
%END ;
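One pass of the loop above can be tried outside the macro on a toy example. The sketch below hard-codes i = 1 and uses invented IDs and scores; the variable names follow the slides, with prob standing in for &prob.

```sas
/* Toy cases and controls (invented data) with already-rounded scores */
data yes1 ;
   input id_y prob ;
   randnum = ranuni(0) ;
   datalines ;
1 0.45
2 0.45
3 0.60
;

data no1 ;
   input id_n prob ;
   randnum = ranuni(0) ;
   datalines ;
11 0.45
12 0.60
13 0.60
14 0.70
;

/* Keep one randomly chosen case per score in yes1, push the rest to yes2 */
proc sort data = yes1 ; by prob randnum ; run ;
data yes1 yes2 ;
   set yes1 ;
   by prob ;
   if first.prob then output yes1 ;
   else output yes2 ;
run ;

/* Same for controls */
proc sort data = no1 ; by prob randnum ; run ;
data no1 no2 ;
   set no1 ;
   by prob ;
   if first.prob then output no1 ;
   else output no2 ;
run ;

/* One-to-one merge: only scores present in both groups survive */
data match1 ;
   merge yes1 (in = ina) no1 (in = inb) ;
   by prob ;
   if ina and inb ;
run ;

proc print data = match1 ; run ;
```

With this toy data match1 should hold two pairs, one at 0.45 and one at 0.60. The leftover case at 0.45 sits in yes2 for the next pass, but all of the 0.45 controls are used up, so it will not find a partner at this rounding level.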

Page 20: Nearest neighbor matching

/* Adds all matches into a single data set */

 DATA allmatches ;

SET %DO k = 1 %TO 20 ; match&k %END ; ; /* second semicolon ends the SET statement */

Concatenate all data sets with matches (N=20)

Page 21: Nearest neighbor matching

Create two data sets with IDs

DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
SET allmatches ;

Page 22: Nearest neighbor matching

Create one file of all matched IDs

DATA matchfile ;
SET allyes allno ;

And sort it …

proc sort data = matchfile ;
by &id &depend ;

Page 23: Nearest neighbor matching

proc sort data = in&digits ;
by &id &depend ;

Page 24: Nearest neighbor matching

DATA MATCHES&DIGITS IN&NEXTIN ;
MERGE IN&DIGITS (IN = INA) MATCHFILE (IN = INB) ;
BY &ID &DEPEND ;
IF INA AND INB THEN OUTPUT MATCHES&DIGITS ;
ELSE OUTPUT IN&NEXTIN ;

/* Creates a data set of all subjects with n-digit match */
/* Creates a second data set of subjects with no match */

Page 25: Nearest neighbor matching

TITLE "MATCHES &ROUNDTO " ;
PROC FREQ DATA = MATCHES&DIGITS ;
TABLES &DEPEND ;
RUN ;
%END ;

JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH

End loop. Now match to 4 decimal places, etc.

Page 26: Nearest neighbor matching

/* Adds 1- to 5-digit matches into a single data set */

 data &lib..finalset ;

set %do m = 1 %to 5 ; matches&m %end ; ; /* second semicolon ends the SET statement */

Page 27: Nearest neighbor matching

One final check & done!

Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset ;
tables &depend ;
run ;
%mend propen ;
run ;

Page 28: Nearest neighbor matching

Did it work?

                       QUINTILES                       NEAREST NEIGHBOR
Variable      At Home   Not Home   Prob       At Home   Not Home   Prob
Age           79.2      79.3       .60        79.1      79.1       .76
ER visits     4.5****   3.8****    .0001      4.2       4.2        .88
Female        52%       54%        .36        50%       50%        .74
Race                               .97                             .67

** P < .01   **** P < .0001
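Comparisons like these could be produced with standard procedures run on the matched file. The sketch below is an assumption about how that might be done, not the code behind the table: the covariate names age, ervisits and female are hypothetical, while study.finalset and athome come from the macro call shown earlier.

```sas
/* Hypothetical balance checks on the matched data; covariate names are assumed */
proc ttest data = study.finalset ;
   class athome ;          /* compares the at-home and not-at-home groups */
   var age ervisits ;
run ;

proc freq data = study.finalset ;
   tables female * athome / chisq ;   /* chi-square test for a categorical covariate */
run ;
```

Large p-values after matching suggest the two groups are balanced on that covariate.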

Page 29: Nearest neighbor matching

Model Comparison

TEST                Without Matching   Quintile Matching   Nearest Neighbor
Likelihood Ratio    643.1              180.8               186.6
Score               582.4              176.0               181.4
Wald                485.6              165.7               170.4

Page 30: Nearest neighbor matching

Odds ratio

No Match    Quintiles    Nearest Neighbor
.154        .281         .269
6.5 : 1     3.6 : 1      3.7 : 1
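The ratios in the last row appear to be the reciprocals of the odds ratios above them. A short DATA step, offered here only as a check on that reading, reproduces the arithmetic:

```sas
/* Reciprocal of each odds ratio from the slide, shown to one decimal */
data _null_ ;
   do oddsratio = 0.154, 0.281, 0.269 ;
      inverse = 1 / oddsratio ;
      put oddsratio= inverse= 4.1 ;
   end ;
run ;
```

The reciprocals round to 6.5, 3.6 and 3.7, matching the slide.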

Page 31: Nearest neighbor matching

How near?

Decimals    # Matches
5           902
4           143
3           143
2           101
1           38