12. Combining SAS datasets

GIORGIO RUSSOLILLO - Cours de préparation à la certification SAS «Base Programming»

maths.cnam.fr/IMG/pdf/sas_12_cle4b9e45.pdf


Post on 18-Apr-2020


Appending datasets in different situations

    PROC PRINT DATA=Lib9_3.emps; RUN;
    PROC PRINT DATA=Lib9_3.emps2008; RUN;
    PROC PRINT DATA=Lib9_3.emps2009; RUN;
    PROC PRINT DATA=Lib9_3.emps2010; RUN;

PROC APPEND

    PROC APPEND BASE=base-dataset DATA=data-dataset;
    RUN;

-  BASE= names the dataset to which observations are added
-  DATA= names the dataset containing the observations that are added to the base dataset

    DATA work.emps;
        SET Lib9_3.emps;
    RUN;
    PROC APPEND BASE=work.emps DATA=Lib9_3.emps2008; RUN;
    PROC PRINT DATA=work.emps; RUN;

SAS does not create a new dataset: PROC APPEND adds the observations of a dataset at the end of the base-dataset
-  Only two datasets can be used at a time in one step
-  The observations in the base-dataset are not read
-  The variable information in the descriptor portion of the base-dataset cannot change
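The points above can be tried on a small self-contained sketch. The datasets work.base_demo and work.new_demo and their variables are hypothetical, created here only for illustration; they are not part of the course library.

```sas
/* Hypothetical demo datasets (not part of Lib9_3) */
DATA work.base_demo;
    INPUT First $ Salary;
    DATALINES;
Ann 30000
Bob 32000
;
RUN;

DATA work.new_demo;
    INPUT First $ Salary;
    DATALINES;
Cleo 35000
;
RUN;

/* work.base_demo is updated in place: no new dataset is created */
PROC APPEND BASE=work.base_demo DATA=work.new_demo;
RUN;

/* work.base_demo now holds three observations, the appended one last */
PROC PRINT DATA=work.base_demo;
RUN;
```

Because the base-dataset is modified directly, working on a copy (as the slide does with work.emps) protects the original data.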

A second example

    PROC APPEND BASE=work.emps DATA=Lib9_3.emps2009; RUN;
    PROC PRINT DATA=work.emps; RUN;

If the base-dataset contains variables that are not present in the data-dataset, SAS assigns missing values to those variables in the appended observations

The FORCE option

    PROC APPEND BASE=work.emps DATA=Lib9_3.emps2010; RUN;
    PROC PRINT DATA=work.emps; RUN;

This program will not do its job: see the log

    PROC APPEND BASE=base-dataset DATA=data-dataset FORCE;
    RUN;

We need the FORCE option when the data-dataset contains variables that
-  Are not in the base-dataset
-  Are of a different type (character/numeric)
-  Are longer than the variables in the base-dataset

What SAS does when the FORCE option is used

    PROC APPEND BASE=work.emps DATA=Lib9_3.emps2010 FORCE; RUN;
    PROC PRINT DATA=work.emps; RUN;
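A minimal sketch of two of the FORCE cases, using hypothetical datasets work.base_f and work.new_f (assumed names, not from the course library). With FORCE, a variable that exists only in the data-dataset is dropped, and a longer character value is truncated to the length defined in the base-dataset; SAS reports both in the log as warnings.

```sas
/* Hypothetical base dataset: Name has length 4 */
DATA work.base_f;
    LENGTH Name $ 4;
    INPUT Name $ Salary;
    DATALINES;
Ann 30000
;
RUN;

/* Hypothetical data dataset: Name is longer and City is an extra variable */
DATA work.new_f;
    LENGTH Name $ 10 City $ 8;
    INPUT Name $ Salary City $;
    DATALINES;
Alexander 40000 Paris
;
RUN;

/* Without FORCE this step would append nothing, because Name is longer
   in work.new_f and City does not exist in work.base_f */
PROC APPEND BASE=work.base_f DATA=work.new_f FORCE;
RUN;

/* The appended Name is truncated to 4 characters and City is not added */
PROC PRINT DATA=work.base_f;
RUN;
```

The descriptor portion of the base-dataset stays unchanged in every case, which is why the extra variable cannot be kept.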

Using the DATA step for concatenating datasets

    DATA output-dataset;
        SET dataset1 dataset2 …;
        <additional SAS statements>;
    RUN;

-  output-dataset names the dataset to be created
-  dataset1 dataset2 are the datasets to be read

This program creates a new dataset in which the observations of each successive dataset are added after those of the previous dataset, in the order listed in the SET statement

How the DATA step works with two datasets:
-  SAS reads the variables from the first dataset and creates the PDV
-  SAS reads the variables from the second dataset and adds any new ones, which completes the PDV
-  SAS reads each observation in the first dataset, writes it to the PDV and from there to the output dataset
-  SAS reads each observation in the second dataset, writes it to the PDV and from there to the output dataset

The new dataset contains all of the variables and observations from all of the input datasets
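The concatenation steps can be illustrated with a small sketch. The datasets work.a and work.b are hypothetical and exist only for this example; work.b carries one variable (Dept) that work.a lacks.

```sas
/* Hypothetical demo datasets */
DATA work.a;
    INPUT ID Name $;
    DATALINES;
1 Ann
2 Bob
;
RUN;

DATA work.b;
    INPUT ID Name $ Dept $;
    DATALINES;
3 Cleo Sales
;
RUN;

/* The PDV holds ID, Name and Dept. Observations from work.a come first
   and have a missing (blank) Dept; the observation from work.b follows. */
DATA work.ab;
    SET work.a work.b;
RUN;

PROC PRINT DATA=work.ab;
RUN;
```

Unlike PROC APPEND, this reads every observation of every input dataset and writes a new dataset, so neither input is modified.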

Concatenating datasets with different variables

    PROC PRINT DATA=Lib9_3.empsdk; RUN;
    PROC PRINT DATA=Lib9_3.empsfr; RUN;
    PROC PRINT DATA=Lib9_3.empscn; RUN;
    PROC PRINT DATA=Lib9_3.empsjp; RUN;

Same name (empsdk, empsfr)

    DATA empsall1;
        SET Lib9_3.empsdk Lib9_3.empsfr;
    RUN;
    PROC PRINT DATA=empsall1; RUN;

Different name (empscn, empsjp)

    DATA empsall2;
        SET Lib9_3.empscn Lib9_3.empsjp;
    RUN;
    PROC PRINT DATA=empsall2; RUN;

RENAME= option

You can rename a variable using the RENAME= option in a DATA step

    (RENAME=(old-name1=new-name1 old-name2=new-name2 … old-nameN=new-nameN))

-  old-name: the variable to be renamed
-  new-name: the new name for the variable
-  You can use RENAME= in different statements of a DATA step (e.g. MERGE)

    DATA empsall2;
        SET Lib9_3.empscn Lib9_3.empsjp(RENAME=(Region=Country));
    RUN;
    PROC PRINT DATA=empsall2; RUN;

N.B.: The RENAME= option does not rename the variable in the input dataset, but it tells SAS which slot in the PDV to use when building the observations

How concatenation works if there are common variables

If variables from different input datasets share the same name and have:
-  A different type attribute: SAS stops processing the DATA step and issues an error message
-  Different length, label, format or informat attributes: SAS takes the attributes of the variable in the first dataset

    PROC CONTENTS DATA=Lib9_3.empscn; RUN;
    DATA empscn;
        LENGTH Country $5;
        SET Lib9_3.empscn;
    RUN;
    PROC PRINT DATA=empscn; RUN;
    DATA empsall3;
        SET empscn Lib9_3.empsdk;
    RUN;
    PROC PRINT DATA=empsall3; RUN;

Appending vs Concatenating (1)

Appending vs Concatenating (2)

Interleaving datasets

DATA step for interleaving

    DATA dataset;
        SET SET-dataset1 SET-dataset2 ..;
        BY <DESCENDING> BY-variable(s);
        <additional statements>;
    RUN;

-  dataset: names the dataset to be created
-  SET-dataset: the datasets to be read
-  BY-variable(s): variables used to interleave the observations

    DATA empsall4;
        SET Lib9_3.empscn Lib9_3.empsjp(RENAME=(Region=Country));
        BY First;
    RUN;
    PROC PRINT DATA=empsall4; RUN;

N.B.: Each input dataset must be sorted by the BY-variable(s)

-  SAS orders observations with duplicate BY-values in different datasets according to the order in which their datasets are listed in the SET statement
-  SAS keeps observations with duplicate BY-values within a dataset in the order in which they appear in that dataset
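The ordering rules for interleaving can be checked on a small sketch. The datasets work.x and work.y and the variable Site are hypothetical, introduced only to mark where each observation comes from.

```sas
/* Hypothetical demo datasets, deliberately unsorted */
DATA work.x;
    INPUT First $ Site $;
    DATALINES;
Dana X
Adam X
;
RUN;

DATA work.y;
    INPUT First $ Site $;
    DATALINES;
Carl Y
Adam Y
;
RUN;

/* Each input must be sorted by the BY variable before interleaving */
PROC SORT DATA=work.x; BY First; RUN;
PROC SORT DATA=work.y; BY First; RUN;

DATA work.xy;
    SET work.x work.y;
    BY First;
RUN;

/* Resulting order: Adam X, Adam Y, Carl Y, Dana X.
   For the duplicate BY value Adam, work.x comes first because it is
   listed first in the SET statement. */
PROC PRINT DATA=work.xy;
RUN;
```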

One-to-one reading

    DATA output-dataset;
        SET input-dataset1;
        SET input-dataset2;
    RUN;

-  output-dataset names the dataset to be created
-  input-dataset1 and input-dataset2 specify the datasets to be read

-  The new dataset contains all the variables from all the input datasets.
-  If the datasets contain variables that have the same name, the values read from the last dataset overwrite the values read from earlier datasets.
-  The first observation in one dataset is joined with the first observation in the other, and so on.
-  The DATA step stops after reading the last observation from the smallest dataset. Hence, the number of observations in the new dataset equals the number of observations in the smallest dataset (the one with the fewest observations).

    DATA emps_OTO;
        SET Lib9_3.empsau;
        SET Lib9_3.PhoneH;
    RUN;
    PROC PRINT DATA=emps_OTO; RUN;

One-to-one reading

    DATA emps_OTO;
        SET Lib9_3.emps;
        SET Lib9_3.emps2010;
    RUN;
    PROC PRINT DATA=emps_OTO; RUN;
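The overwrite rule and the stopping rule of one-to-one reading can be seen in a small sketch. The datasets work.three and work.two are hypothetical; both contain a variable named ID so that the overwrite is visible.

```sas
/* Hypothetical demo datasets: three observations and two observations */
DATA work.three;
    INPUT ID Name $;
    DATALINES;
1 Ann
2 Bob
3 Cleo
;
RUN;

DATA work.two;
    INPUT ID Phone $;
    DATALINES;
7 555-0101
8 555-0102
;
RUN;

DATA work.oto;
    SET work.three;   /* reads ID and Name */
    SET work.two;     /* overwrites ID, adds Phone */
RUN;

/* work.oto has 2 observations: (7, Ann, 555-0101) and (8, Bob, 555-0102).
   The step stops on the third iteration, when the second SET statement
   finds no more observations in work.two. */
PROC PRINT DATA=work.oto;
RUN;
```

No BY statement is involved, so observations are paired purely by position, whatever their ID values are.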

Match-merging

Match-merging means combining observations from two or more datasets into a single observation in a new dataset according to the values of a common variable

    DATA output-dataset;
        MERGE input-dataset1 input-dataset2;
        BY <DESCENDING> BY-variable(s);
    RUN;

REMEMBER TO SORT THE INPUT DATASETS! If you don't, SAS gives you an error

-  output-dataset names the dataset to be created
-  input-dataset1 and input-dataset2 specify the datasets to be read. If you specify only one dataset, the MERGE statement behaves like the SET statement
-  BY-variable(s): the variables whose values are used to match observations. They must have the same type in all the datasets to be merged
-  The <DESCENDING> option must be used if the BY-variable is sorted in descending order in the input datasets
-  Basic match-merging produces an output dataset that contains values from all observations in all input datasets
-  If an input dataset doesn't have any observations for a particular value of the BY-variable, then the observation in the output dataset contains missing values for the variables that are unique to that input dataset
-  Match-merging can also be done using a PROC SQL step
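As a sketch of the PROC SQL route, a full join gives the same rows as a basic match-merge when each BY value appears at most once in at least one of the inputs. The datasets work.emp_demo and work.phone_demo and their variables are hypothetical. PROC SQL does not require the inputs to be sorted.

```sas
/* Hypothetical demo datasets */
DATA work.emp_demo;
    INPUT EmpID Name $;
    DATALINES;
101 Ann
102 Bob
;
RUN;

DATA work.phone_demo;
    INPUT EmpID Phone $;
    DATALINES;
102 555-0102
103 555-0103
;
RUN;

PROC SQL;
    CREATE TABLE work.merged_sql AS
    /* COALESCE keeps the key whichever side supplied the row */
    SELECT COALESCE(a.EmpID, b.EmpID) AS EmpID, a.Name, b.Phone
    FROM work.emp_demo AS a
    FULL JOIN work.phone_demo AS b
        ON a.EmpID = b.EmpID
    ORDER BY 1;
QUIT;

/* Three rows: 101 (no Phone), 102 (Name and Phone), 103 (no Name) */
PROC PRINT DATA=work.merged_sql;
RUN;
```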

One-to-one match merging: Ex. #1

    DATA emps_OTO;
        SET Lib9_3.empsau;
        SET Lib9_3.PhoneH;
    RUN;
    PROC PRINT DATA=emps_OTO; RUN;

    DATA emps_OTO;
        MERGE Lib9_3.empsau Lib9_3.PhoneH;
        BY EmpID;
    RUN;
    PROC PRINT DATA=emps_OTO; RUN;

One-to-One match merging: Ex. #2

    PROC SORT DATA=Lib9_3.emps OUT=sortemps; BY First; RUN;
    PROC SORT DATA=Lib9_3.emps2010 OUT=sortemps2010; BY First; RUN;

    DATA emps_OTO;
        SET sortemps;
        SET sortemps2010;
    RUN;
    PROC PRINT DATA=emps_OTO; RUN;

    DATA emps_OTO;
        MERGE sortemps sortemps2010;
        BY First;
    RUN;
    PROC PRINT DATA=emps_OTO; RUN;

One to Many match-merging

    DATA emps_OTM;
        MERGE Lib9_3.empsau Lib9_3.PhoneHW;
        BY EmpID;
    RUN;
    PROC PRINT DATA=emps_OTM; RUN;

How SAS processes the step: compilation phase

In the compilation phase, SAS:
-  Creates the PDV by reading the descriptor portions of the datasets in the order in which they are listed in the MERGE statement
-  Assigns a tracking pointer to each dataset

How SAS processes the step: execution phase (1)

SAS reads the first observation of each dataset: they are in the same BY-group

SAS writes the contents of the PDV to the output dataset

The values remain in the PDV (as the input datasets are SAS datasets) and SAS starts the second iteration

How SAS processes the step: execution phase (2)

SAS reads the second observation of each dataset: since the observation in the second dataset is in the same BY-group as the observation in the PDV, SAS overwrites the values of Type and Phone in the PDV

SAS writes the contents of the PDV to the output dataset

The values remain in the PDV and SAS starts the third iteration

How SAS processes the step: execution phase (3)

SAS examines the second observation from the first dataset and the third observation from the second dataset

Since they are not in the same BY-group as the observation in the PDV, SAS reinitializes the PDV to missing values

SAS reads the current observations from the two datasets into the PDV, and from there writes them to the output dataset

The values remain in the PDV and SAS starts the fourth iteration

How SAS processes the step: execution phase (4)

SAS continues processing in this way until the end of both datasets
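One way to watch these iterations is to write the PDV to the log at the end of each pass. The sketch below uses hypothetical datasets work.emp_t and work.phone_t that only mimic the one-to-many situation; the variable names Type and Phone follow the slide, the rest is assumed.

```sas
/* Hypothetical demo datasets: one row per employee, several phones per employee */
DATA work.emp_t;
    INPUT EmpID Name $;
    DATALINES;
101 Ann
102 Bob
;
RUN;

DATA work.phone_t;
    INPUT EmpID Type $ Phone $;
    DATALINES;
101 Home 555-0101
101 Work 555-0111
102 Home 555-0102
;
RUN;

DATA work.otm_trace;
    MERGE work.emp_t work.phone_t;
    BY EmpID;
    /* Writes every PDV variable to the log, including the automatic
       variables _N_, FIRST.EmpID and LAST.EmpID */
    PUT _ALL_;
RUN;
```

In the log, the second iteration still shows Name=Ann (the value is retained within the BY-group) while Type and Phone carry the values of the second phone record.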

No matches case

What happens when there are groups (levels) of the BY-variable which are not shared by the input datasets?

    DATA emps_NM;
        MERGE Lib9_3.empsau Lib9_3.PhoneC;
        BY EmpID;
    RUN;
    PROC PRINT DATA=emps_NM; RUN;

No match case: How SAS processes the compilation phase

In the compilation phase, SAS:
-  Creates the PDV by reading the datasets in the order in which they are listed in the MERGE statement
-  Assigns a tracking pointer to each dataset

No match case: How SAS processes the execution phase (1)

SAS reads the first observation of each dataset: they are in the same BY-group

SAS writes the contents of the PDV to the output dataset

The values remain in the PDV and SAS starts the second iteration

How SAS processes the execution phase (2)

SAS examines the second observation of each dataset: since they are not in the same BY-group as the observation in the PDV, SAS reinitializes the PDV to missing values

SAS chooses the observation in the first dataset (121151 < 121152), and writes it to the PDV and then to the output dataset

The values remain in the PDV and SAS starts the third iteration

How SAS processes the execution phase (3)

SAS examines the third observation of the first dataset and the second observation of the second dataset: since they are not in the same BY-group as the observation in the PDV, SAS reinitializes the PDV to missing values

SAS reads the current observations from the two datasets into the PDV, and then writes them to the output dataset

The values remain in the PDV and SAS starts the fourth iteration

How SAS processes the execution phase (4)

SAS continues processing in this way until the end of both datasets

The output dataset contains:
-  Matching observations, which contain data from both datasets
-  Non-matching observations, which contain data from only one of the datasets

Selecting only matches (or non-matches)

Using the IN= option in the MERGE statement together with an IF statement allows selecting only matches or only non-matches

    (IN=variable)

variable: the name you give to a temporary variable in the PDV which equals 1 if the dataset it is attached to in the MERGE statement contributes to the current observation of the output dataset, and 0 otherwise

    DATA emps_onlyM;
        MERGE Lib9_3.empsau(IN=emps) Lib9_3.PhoneC(IN=cell);
        BY EmpID;
        IF (emps AND cell);
    RUN;
    PROC PRINT DATA=emps_onlyM; RUN;

    DATA emps_onlyNM;
        MERGE Lib9_3.empsau(IN=emps) Lib9_3.PhoneC(IN=cell);
        BY EmpID;
        IF (emps=0 OR cell=0);
    RUN;
    PROC PRINT DATA=emps_onlyNM; RUN;
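The same IN= technique can isolate the non-matches coming from one side only. The sketch below is an illustration under assumed names: the datasets work.emp_in and work.phone_in and the IN= variables inEmp and inPhone are hypothetical.

```sas
/* Hypothetical demo datasets, already sorted by EmpID */
DATA work.emp_in;
    INPUT EmpID Name $;
    DATALINES;
101 Ann
102 Bob
;
RUN;

DATA work.phone_in;
    INPUT EmpID Phone $;
    DATALINES;
102 555-0102
103 555-0103
;
RUN;

/* Keep only employees with no phone record */
DATA work.emps_no_phone;
    MERGE work.emp_in(IN=inEmp) work.phone_in(IN=inPhone);
    BY EmpID;
    /* Subsetting IF: true only when the row comes from work.emp_in alone */
    IF inEmp AND NOT inPhone;
RUN;

/* One observation: EmpID 101, Ann, with a missing Phone */
PROC PRINT DATA=work.emps_no_phone;
RUN;
```

The IN= variables exist only in the PDV; they are not written to the output dataset.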

Many to Many match-merging

    DATA emps_MTM;
        MERGE Lib9_3.empsAUUS Lib9_3.PhoneO;
        BY Country;
    RUN;
    PROC PRINT DATA=emps_MTM; RUN;

N.B.:
-  This DATA step produces a dataset with six observations
-  If we want a 12-observation dataset (every combination of observations which share the same level) we must use PROC SQL
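A reduced sketch of the difference, on hypothetical datasets work.left_demo and work.right_demo (smaller than the course data, so the counts differ from the six and twelve quoted above): an SQL join pairs every left row with every right row that shares the same Country.

```sas
/* Hypothetical demo datasets: two AU rows and one US row on each side */
DATA work.left_demo;
    INPUT Country $ Name $;
    DATALINES;
AU Ann
AU Bob
US Cleo
;
RUN;

DATA work.right_demo;
    INPUT Country $ Phone $;
    DATALINES;
AU 555-0101
AU 555-0102
US 555-0201
;
RUN;

PROC SQL;
    CREATE TABLE work.mtm_sql AS
    SELECT l.Country, l.Name, r.Phone
    FROM work.left_demo AS l
    INNER JOIN work.right_demo AS r
        ON l.Country = r.Country;
QUIT;

/* 2 x 2 rows for AU plus 1 x 1 row for US: 5 rows in total.
   A DATA step with MERGE ... BY Country on the same data would give
   only 3 observations (2 for AU, 1 for US). */
PROC PRINT DATA=work.mtm_sql;
RUN;
```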

Using data manipulation techniques with a data merge

Who purchased what?

    PROC PRINT DATA=Lib9_3.Customer(OBS=20); RUN;
    PROC CONTENTS DATA=Lib9_3.Customer; RUN;

    PROC PRINT DATA=Lib9_3.Order_fact(OBS=20); RUN;
    PROC CONTENTS DATA=Lib9_3.Order_fact; RUN;

    PROC SORT DATA=Lib9_3.Order_fact OUT=Order_fact;
        BY Customer_ID;
    RUN;
    DATA CustOrd;
        MERGE Lib9_3.Customer Work.Order_fact;
        BY Customer_ID;
    RUN;
    PROC PRINT DATA=CustOrd(OBS=30); RUN;

In the dataset CustOrd the same Customer ID is repeated for each order

Outputting two datasets

We want to divide the merged dataset into two datasets: Orders (in which we retain only the variables Customer_Name, Quantity and Total_Retail_Price) and NoOrders (in which we retain only the variables Customer_Name and Birth_Date).
-  In the first dataset we put the observations which contain information about the orders
-  In the second dataset we put the observations about customers who made no orders.

    DATA Orders(KEEP=Customer_Name Quantity Total_Retail_Price)
         NoOrders(KEEP=Customer_Name Birth_Date);
        MERGE Lib9_3.Customer Work.Order_fact(IN=order);
        BY Customer_ID;
        IF order=1 THEN OUTPUT Orders;
        ELSE OUTPUT NoOrders;
    RUN;
    PROC PRINT DATA=Orders; RUN;
    PROC PRINT DATA=NoOrders; RUN;

Summing up the # of orders for each customer

We also want to create a third dataset, called Summary, which contains two variables: Customer_Name and a new variable named NumberOrders, which counts the number of orders for each customer

    DATA Orders(KEEP=Customer_Name Quantity Total_Retail_Price)
         NoOrders(KEEP=Customer_Name Birth_Date)
         Summary(KEEP=Customer_Name NumberOrders);
        MERGE Lib9_3.Customer Work.Order_fact(IN=order);
        BY Customer_ID;
        IF order=1 THEN DO;
            OUTPUT Orders;
            /* Reset the counter at the first order of each customer */
            IF First.Customer_ID THEN NumberOrders=0;
            /* Sum statement: NumberOrders is retained across iterations */
            NumberOrders+1;
            /* Write one summary row after the last order of the customer */
            IF Last.Customer_ID THEN OUTPUT Summary;
        END;
        ELSE OUTPUT NoOrders;
    RUN;
    PROC PRINT DATA=Summary; RUN;
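The counting pattern (FIRST., sum statement, LAST.) can be tried in isolation. The sketch below uses a hypothetical dataset work.orders_demo with assumed values, already sorted by Customer_ID.

```sas
/* Hypothetical demo data: customer 4 has two orders, customer 9 has one */
DATA work.orders_demo;
    INPUT Customer_ID Quantity;
    DATALINES;
4 1
4 2
9 1
;
RUN;

DATA work.count_demo(KEEP=Customer_ID NumberOrders);
    SET work.orders_demo;
    BY Customer_ID;
    IF FIRST.Customer_ID THEN NumberOrders = 0;  /* restart the count */
    NumberOrders + 1;                            /* sum statement, value retained */
    IF LAST.Customer_ID THEN OUTPUT;             /* one row per customer */
RUN;

/* Two observations: Customer_ID 4 with NumberOrders 2,
   and Customer_ID 9 with NumberOrders 1 */
PROC PRINT DATA=work.count_demo;
RUN;
```

The sum statement is what makes the counter survive from one iteration to the next; an ordinary assignment such as NumberOrders = NumberOrders + 1 would be reset to missing at the start of every iteration.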