
Posted on 21-Dec-2015


Merging in SAS

• These slides show alternatives for merging two data sets using the IN= data set option (see the SAS OnlineDoc: “Base SAS” > “SAS Language Reference: Dictionary” > “Data Set Options” > “IN=”).

• In the slides, the red data goes into the merged data set; the greyed-out observations are left out.
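A minimal sketch of what the IN= flags hold, assuming two tiny hypothetical data sets DATA_A and DATA_B keyed by ID (the output name FLAGGED and the variables FROM_A and FROM_B are hypothetical too). SAS sets each IN= variable to 1 when that data set contributes to the current observation and to 0 otherwise, and it drops IN= variables from the output automatically, so the step copies them into ordinary variables to make them visible:

```sas
/* Hypothetical input data sets, entered already sorted by ID */
data data_a;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
;
run;

data data_b;
    input id v3 v4;
    datalines;
1 343 343
2 85 4234
;
run;

data flagged;
    merge data_a (in=a) data_b (in=b);
    by id;
    /* a and b are 1 if that data set contributed to this BY group, 0 if not. */
    /* IN= variables are not written to FLAGGED, so copy them to keep them.   */
    from_a = a;
    from_b = b;
run;

proc print data=flagged;
run;
```

In FLAGGED, ID 1 (only in DATA_B) has FROM_A=0, and ID 3 (only in DATA_A) has FROM_B=0; the subsetting IF statements on the following slides test exactly these flags.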


The perfect merge

Dataset A        Dataset B
ID  V1    V2     ID  V3    V4
1   123   123    1   343   343
2   421   434    2   85    4234
3   129   436    3   325   434
4   122   767    4   763   234
5   232   34     5   229   324
6   534   435    6   554   324
7   343   89     7   884   34
8   324   6787   8   895   342
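The perfect merge can be reproduced with a short sketch. The data set names DATA_A, DATA_B and PERFECT are hypothetical; the values are the ones on the slide. Because every ID appears in both data sets, no subsetting IF statement is needed and the result has eight complete observations:

```sas
/* Dataset A from the slide (hypothetical name DATA_A), entered sorted by ID */
data data_a;
    input id v1 v2;
    datalines;
1 123 123
2 421 434
3 129 436
4 122 767
5 232 34
6 534 435
7 343 89
8 324 6787
;
run;

/* Dataset B from the slide (hypothetical name DATA_B), entered sorted by ID */
data data_b;
    input id v3 v4;
    datalines;
1 343 343
2 85 4234
3 325 434
4 763 234
5 229 324
6 554 324
7 884 34
8 895 342
;
run;

/* Match-merge by ID: every ID is in both data sets, so V1 to V4 are all filled */
data perfect;
    merge data_a data_b;
    by id;
run;
```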


Not so perfect (if a or b;)

Dataset A (in=a) Dataset B (in=b)
ID  V1    V2     ID  V3    V4     Kept
                 1   343   343    yes
2   421   434    2   85    4234   yes
3   129   436                     yes
4   122   767    4   763   234    yes
                 5   229   324    yes
6   534   435    6   554   324    yes
7   343   89                      yes
8   324   6787   8   895   342    yes
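A sketch of the "not so perfect" case, using the slide's values and hypothetical names (DATA_A, DATA_B, ALL_IDS). DATA_A lacks IDs 1 and 5, DATA_B lacks IDs 3 and 7; `if a or b;` keeps all eight IDs, with missing values (.) wherever one data set did not contribute:

```sas
/* Dataset A (in=a): IDs 1 and 5 are absent; entered sorted by ID */
data data_a;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
4 122 767
6 534 435
7 343 89
8 324 6787
;
run;

/* Dataset B (in=b): IDs 3 and 7 are absent; entered sorted by ID */
data data_b;
    input id v3 v4;
    datalines;
1 343 343
2 85 4234
4 763 234
5 229 324
6 554 324
8 895 342
;
run;

data all_ids;
    merge data_a (in=a) data_b (in=b);
    by id;
    if a or b;   /* true for every merged observation, so all 8 IDs are kept */
run;
```

Every observation produced by MERGE comes from at least one data set, so `if a or b;` never drops anything; it has the same effect as writing no IF statement at all.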


If a and b; (both datasets contribute)

Dataset A (in=a) Dataset B (in=b)
ID  V1    V2     ID  V3    V4     Kept
                 1   343   343    no
2   421   434    2   85    4234   yes
3   129   436                     no
4   122   767    4   763   234    yes
                 5   229   324    no
6   534   435    6   554   324    yes
7   343   89                      no
8   324   6787   8   895   342    yes
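The same slide data can illustrate `if a and b;`. In this sketch (hypothetical names DATA_A, DATA_B and BOTH), only the IDs contributed by both data sets survive, which is the SAS counterpart of an inner join:

```sas
/* Dataset A (in=a) from the slide; entered sorted by ID */
data data_a;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
4 122 767
6 534 435
7 343 89
8 324 6787
;
run;

/* Dataset B (in=b) from the slide; entered sorted by ID */
data data_b;
    input id v3 v4;
    datalines;
1 343 343
2 85 4234
4 763 234
5 229 324
6 554 324
8 895 342
;
run;

data both;
    merge data_a (in=a) data_b (in=b);
    by id;
    if a and b;   /* keep only IDs present in both data sets */
run;
```

With these data, BOTH contains IDs 2, 4, 6 and 8, and none of V1 to V4 is missing.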


If a; (must be in dataset A)

Dataset A (in=a) Dataset B (in=b)
ID  V1    V2     ID  V3    V4     Kept
                 1   343   343    no
2   421   434    2   85    4234   yes
3   129   436        .     .      yes
4   122   767    4   763   234    yes
                 5   229   324    no
6   534   435    6   554   324    yes
7   343   89         .     .      yes
8   324   6787   8   895   342    yes
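A sketch of `if a;` on the slide data (hypothetical names DATA_A, DATA_B and KEEP_A). Every ID from Dataset A is kept, and V3 and V4 are missing for IDs 3 and 7, which Dataset B does not contain; this corresponds to a left join:

```sas
/* Dataset A (in=a) from the slide; entered sorted by ID */
data data_a;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
4 122 767
6 534 435
7 343 89
8 324 6787
;
run;

/* Dataset B (in=b) from the slide; entered sorted by ID */
data data_b;
    input id v3 v4;
    datalines;
1 343 343
2 85 4234
4 763 234
5 229 324
6 554 324
8 895 342
;
run;

data keep_a;
    merge data_a (in=a) data_b (in=b);
    by id;
    if a;   /* keep IDs found in Dataset A, matched or not */
run;
```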


If b; (must be in dataset B)

Dataset A (in=a) Dataset B (in=b)
ID  V1    V2     ID  V3    V4     Kept
    .     .      1   343   343    yes
2   421   434    2   85    4234   yes
3   129   436                     no
4   122   767    4   763   234    yes
    .     .      5   229   324    yes
6   534   435    6   554   324    yes
7   343   89                      no
8   324   6787   8   895   342    yes
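The mirror image, `if b;`, sketched on the same slide data (hypothetical names DATA_A, DATA_B and KEEP_B). Every ID from Dataset B is kept, and V1 and V2 are missing for IDs 1 and 5, which Dataset A does not contain:

```sas
/* Dataset A (in=a) from the slide; entered sorted by ID */
data data_a;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
4 122 767
6 534 435
7 343 89
8 324 6787
;
run;

/* Dataset B (in=b) from the slide; entered sorted by ID */
data data_b;
    input id v3 v4;
    datalines;
1 343 343
2 85 4234
4 763 234
5 229 324
6 554 324
8 895 342
;
run;

data keep_b;
    merge data_a (in=a) data_b (in=b);
    by id;
    if b;   /* keep IDs found in Dataset B, matched or not */
run;
```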


Notes

• The examples assume there is a unique identifier. It can be a single variable (e.g., CRSP's PERMNO or Compustat's GVKEY) or a combination of variables (e.g., PERMNO and DATE in a panel data set).

• Assumption: both data sets are sorted by the unique identifier(s). MERGE with a BY statement requires this; if a data set is not sorted, SAS stops with the error "BY variables are not properly sorted".
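Both assumptions can be checked before merging. A minimal sketch, assuming a hypothetical panel data set YOURDATA keyed by PERMNO and DATE (the return variable RET and the output DUP_KEYS are hypothetical too): PROC SORT establishes the order that MERGE with BY needs, and the FIRST./LAST. flags then pick out every observation whose key is not unique.

```sas
/* Hypothetical panel data: PERMNO 10001 on 2015-01-02 appears twice on purpose */
data yourdata;
    input permno date :yymmdd8. ret;
    format date yymmdd10.;
    datalines;
10001 20150105 0.01
10001 20150102 0.02
10002 20150102 -0.01
10001 20150102 0.03
;
run;

/* Sort by the identifier(s), in the same order the BY statement will use */
proc sort data=yourdata;
    by permno date;
run;

/* A key is unique only if its observation is both first and last in its BY group */
data dup_keys;
    set yourdata;
    by permno date;
    if not (first.date and last.date);
run;
```

If DUP_KEYS is not empty, the identifier is not unique. When both data sets in a MERGE contain repeated BY values, SAS writes the note "MERGE statement has more than one data set with repeats of BY values" to the log, which is worth checking for as well.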


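Sample code

```sas
proc sort data=yourdata;
    by permno date;
run;

proc sort data=otherdata;
    by permno date;
run;

data newdata;
    merge yourdata (in=a) otherdata (in=b);
    by permno date;  /* BY variables in the same order as the sort BY variables */
    /* Below this, write ONE control statement, for example:   */
    /* if a;        keep observations found in yourdata        */
    /* if b;        keep observations found in otherdata       */
    /* if a and b;  keep observations found in both            */
    /* if not a;    keep observations found only in otherdata  */
    /* if not b;    keep observations found only in yourdata   */
    /* With no control statement, every observation is kept.   */
run;
```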
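The last two control statements, `if not a;` and `if not b;`, have no slide of their own. A sketch with small hypothetical data sets YOURDATA and OTHERDATA keyed by ID (the variables X and Y and the outputs ONLY_YOURS and ONLY_OTHERS are hypothetical too) produces both subsets in one pass, using explicit OUTPUT statements instead of a subsetting IF:

```sas
/* Hypothetical inputs, entered sorted by ID */
data yourdata;
    input id x;
    datalines;
1 10
2 20
3 30
;
run;

data otherdata;
    input id y;
    datalines;
2 200
3 300
4 400
;
run;

/* One pass, two outputs:                                         */
/*   not b: the ID is only in YOURDATA   (same rows as "if not b;") */
/*   not a: the ID is only in OTHERDATA  (same rows as "if not a;") */
data only_yours only_others;
    merge yourdata (in=a) otherdata (in=b);
    by id;
    if not b then output only_yours;
    if not a then output only_others;
run;
```

Here ONLY_YOURS holds ID 1 and ONLY_OTHERS holds ID 4. Because the step uses explicit OUTPUT statements, the matched IDs 2 and 3 are written to neither data set.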


Typical problems

• If both data sets were complete (both contain the same observed units), the IF statements would be unnecessary: "if a and b;" would be equivalent to leaving the statement out altogether.

• If you omit the BY statement (there is no identifier, but you somehow know that each row of one data set corresponds to the same row of the other), SAS performs a one-to-one merge: the data sets are simply "glued" side by side, row by row.

• Common mishaps: the BY variables are defined differently across data sets (for example, with different lengths); SAS will still merge the data sets, but it writes a WARNING to the log. Another common mishap is having variables with the same name that are not the ID: one of them will be overwritten (when both data sets contribute, the value from the data set listed later in the MERGE statement wins).
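The same-name mishap is easy to reproduce. A minimal sketch, assuming two hypothetical data sets that both carry a non-ID variable PRICE: in the first step the value from DATA_B, listed last in the MERGE statement, replaces the value from DATA_A; in the second, the RENAME= data set option gives each copy its own name (PRICE_A and PRICE_B are hypothetical names).

```sas
/* Hypothetical inputs: PRICE exists in both, entered sorted by ID */
data data_a;
    input id price;
    datalines;
1 10
2 20
;
run;

data data_b;
    input id price;
    datalines;
1 11
2 21
;
run;

/* Problem: CLASH keeps only DATA_B's PRICE (11 and 21); DATA_A's values are lost */
data clash;
    merge data_a data_b;
    by id;
run;

/* Fix: rename the non-ID variables so both values survive */
data no_clash;
    merge data_a (rename=(price=price_a))
          data_b (rename=(price=price_b));
    by id;
run;
```

By default SAS overwrites such a variable without a warning; setting `options msglevel=i;` before the step makes SAS write an INFO message to the log when a variable is overwritten this way.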
