planting your rows: using sas ® formats to make the generation of zero-filled rows in tables less...
TRANSCRIPT
Planting Your Rows: Using SAS® Formats to Make the Generation of Zero-Filled Rows in Tables
Less Thorny
Kathy Hardis FraemanHASUG Q3 2012
“Planting your rows” with SAS® Formats
One of many clever tricks that can be done with SAS formats
Techniques used for this trick are not unique to this trick, can be applied to other programming tricks
I ♥ SAS formats!
When to use this trick with SAS formats
Need tables with all possible values of a variable or variables
All possible values of the variable don’t exist in the data
3
Solution without using SAS Formats
“Hard code” all possible values of a variable in the program
Disadvantages–Tedious–Potential for large number of variables and/or values
–“Possible values” could change over time
4
Solution using SAS Formats
Attach a format to each variable with all possible values described
Advantages–“Dynamic”–Only need to include variable values in the SAS format
5
SAS Programming Techniques
1. Saving a permanent SAS format library2. Attach a format to a variable3. Determine the format attached to a
variable–SCL–SQL
4. Identify the data values defined in the attached format using CNTLOUT
6
Technique 1Saving a permanent SAS format library
libname library ">>pathname<<";
proc format library = library; . . . run;
7
Permanent SAS format library
8
• Named “formats”
• Stored as a SAS Catalog
SAMPLE DATA
Obs employee year num dollar 1 Hall FY 2008 10 $10,000.00 2 Hall FY 2010 15 $15,500.00 3 Oates FY 2008 8 $500.00 4 Brooks FY 2008 15 $11,111.00 5 Brooks FY 2010 20 $12,345.67 6 Abbott FY 2008 50 $75,757.00 7 Abbott FY 2010 75 $99,999.99 8 Costello FY 2008 33 $33,333.00 9 Costello FY 2010 44 $44,444.44
9
Variables with attached formats
EMPLOYEE and YEAR are not “text” variables
Variables have attached formats–EMPLOYEE has an attached format emplfmt.–YEAR has an attached format yearfmt.
10
SAS Formats attached to variablesproc format library = library; value emplfmt 1 = "Hall" 2 = "Oates"
3 = "Brooks" 4 = "Dunn"
5 = "Abbott" 6 = "Costello" ;
value yearfmt 2008 = "FY 2008" 2009 = "FY 2009"
2010 = "FY 2010" ;
run;11
Technique 2Attach format to the variable
data in.sales; set sales; format employee emplfmt. year yearfmt. ; run;
12
Obs employee year num dollar 1 Hall FY 2008 10 $10,000.00 2 Hall FY 2010 15 $15,500.00 3 Oates FY 2008 8 $500.00 4 Brooks FY 2008 15 $11,111.00 5 Brooks FY 2010 20 $12,345.67 6 Abbott FY 2008 50 $75,757.00 7 Abbott FY 2010 75 $99,999.99 8 Costello FY 2008 33 $33,333.00 9 Costello FY 2010 44 $44,444.44
Dunn hasn’t sold anything No sales by anyone in FY 2009 Oates didn’t sell anything in 201013
REPORT WITH MISSING ROWS Number Amount of Year Employee of Sales Sales ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FY 2008 Hall 10 $10,000.00 Oates 8 $500.00 Brooks 15 $11,111.00 Abbott 50 $75,757.00 Costello 33 $33,333.00 FY 2010 Hall 15 $15,500.00 Brooks 20 $12,345.67 Abbott 75 $99,999.99 Costello 44 $44,444.44
14
Technique 3Determine a Variable’s Format
Don’t need to “hard code” the name of a variables format unless in a FORMAT statement
Can use two techniques to determine a variable’s format–%SYSFUNC with SCL–A dictionary table with PROC SQL
15
Method 1 -- %SYSFUNC with SAS SCL
%let dsid = %sysfunc(open(IN.SALES, i)); %let varnum = %sysfunc(varnum(&dsid,EMPLOYEE)); %let format = %sysfunc(varfmt(&dsid, &varnum)); %let rc = %sysfunc(close(&dsid)); %put EMPLOYEE VARIABLE FORMAT = &format;
999 %put EMPLOYEE VARIABLE FORMAT = &format; EMPLOYEE VARIABLE FORMAT = EMPLFMT. Note that the “.” in the format name is included in the
macro variable &format
16
Method 2 – PROC SQL DICTIONARY TABLES
proc sql; create table formats as select format from dictionary.columns where upcase(libname) = 'IN' and upcase(memname) = 'SALES' and
upcase(name) = ('EMPLOYEE') ; quit;
17
PROC SQL Dictionary Table Output
proc print data=formats; run;
Obs format 1 EMPLFMT.
Again note that the “.” in the format name is included in the value of the variable
18
Technique 4Identify all values of a SAS format
CNTLOUT Option of PROC FORMAT
–FMTNAME – name of the format–START – starting value of the format–LABEL – descriptive label associated with the value of START
19
PROC FORMAT Code
proc format library = library cntlout = formatlib (keep = fmtname start label); run;
proc print data = formatlib; run;
20
Output data set FORMATLIB
Obs FMTNAME START LABEL 1 EMPLFMT 1 Hall 2 EMPLFMT 2 Oates 3 EMPLFMT 3 Brooks 4 EMPLFMT 4 Dunn 5 EMPLFMT 5 Abbott 6 EMPLFMT 6 Costello 7 YEARFMT 2008 FY 2008 8 YEARFMT 2009 FY 2009 9 YEARFMT 2010 FY 2010
21
Output data set FORMATLIB
Note that the variable with the name of the format FMTNAME does not have a “.” at the end
Can add a “.” to the end
fmtname = cats(fmtname,’.’);
22
Putting it all together – finding all possible values for variables using attached formats
Uses previously described techniques
Can be done with SCL or SQL
Complete code is provided
23
SAS Macro to put it all together
Two macros, one for each technique–%SYSFUNC with SCL–PROC SQL Dictionary tables
Both macros use CNTLOUTBoth macros input name of variable, output
values of variable in a data set
%getfmt1(var=employee, outvals=empvals);%getfmt1(var=year, outvals=yearvals);
24
Method 1 – SCL%macro getfmt1(var=, outvals=); %let dsid = %sysfunc(open(IN.SALES,i)); /* Input data is hard coded*/
%let varnum = %sysfunc(varnum(&dsid, &var)); %let varfmt = %sysfunc(varfmt(&dsid, &varnum)); %let rc = %sysfunc(close(&dsid));
%put &varfmt;
proc format library = library cntlout = &outvals (keep = fmtname start label where = (cats(fmtname,'.') = "&varfmt"));
run;
title "Data set &outvals from macro GETFMT1 -- SYSFUNC and SCL"; proc print data = &outvals;
run; %mend getfmt1;%getfmt1(var=employee, outvals=empvals);%getfmt1(var=year, outvals=yearvals);
25
Method 2 – SQL (part 1)
%macro getfmt2(var=, outvals=); proc sql; create table &var.fmt as select format
from dictionary.columns where upcase(libname) = 'IN' and /* Input data is hard coded*/ upcase(memname) = 'SALES' and
upcase(name) = upcase("&var") ;
quit;
data _null_; set &var.fmt; call symputX("varfmt", format, 'L');
run;
%put &varfmt;
26
Method 2 – SQL (part 2)
proc format library = library cntlout = &outvals (keep = fmtname start label where = (cats(fmtname,'.') =
"&varfmt")); run;
title "Data set &outvals from macro GETFMT2 SQL Dictionary Table";
proc print data = &outvals; run;
%mend getfmt2;
%getfmt2(var=employee, outvals=empvals);%getfmt2(var=year, outvals=yearvals);
27
Output Data from %GETFMT1 (SCL) Data set empvals from macro GETFMT1 -- SYSFUNC and SCL
Obs FMTNAME START LABEL
1 EMPLFMT 1 Hall 2 EMPLFMT 2 Oates 3 EMPLFMT 3 Brooks 4 EMPLFMT 4 Dunn 5 EMPLFMT 5 Abbot 6 EMPLFMT 6 Costello
Data set yearvals from macro GETFMT1 -- SYSFUNC and SCL
Obs FMTNAME START LABEL
1 YEARFMT 2008 FY 2008 2 YEARFMT 2009 FY 2009 3 YEARFMT 2010 FY 201028
Output Data from %GETFMT2 (SQL) Data set empvals from macro GETFMT2 SQL Dictionary Table
Obs FMTNAME START LABEL
1 EMPLFMT 1 Hall 2 EMPLFMT 2 Oates 3 EMPLFMT 3 Brooks 4 EMPLFMT 4 Dunn 5 EMPLFMT 5 Abbot 6 EMPLFMT 6 Costello
Data set yearvals from macro GETFMT2 SQL Dictionary Table
Obs FMTNAME START LABEL
1 YEARFMT 2008 FY 2008 2 YEARFMT 2009 FY 2009 3 YEARFMT 2010 FY 201029
Data set with all possible values of both variablesUses data sets created by macros
–EMPVALS – all possible values of EMPLOYEE
–YEARVALS – all possible values of YEAR
30
Data set with all possible values of both variablesCombined with SQL join to create a data
set will all 6 possible values of the variable EMPLOYEE with all 3 possible values of the variable YEAR
Name of combined data set is ALLROWSALLROWS has 6 x 3 = 18 observations
31
SQL Join to create ALLROWSdata yearvals_mod (keep = year a); set yearvals;
year = input(trim(left(start)),8.); a = 1; run;
data empvals_mod (keep = employee b); set empvals;
employee = input(trim(left(start)),8.); b = 1; run;
proc sql; create table allrows as
select year, employee from yearvals_mod y, empvals_mod e where y.a = e.b; quit;
proc sort data=allrows; by year employee;
run;
32
Data set ALLROWS All rows needed for table
Formats will automatically be added later in the merge
Obs year employee
1 2008 1 2 2008 2 3 2008 3 4 2008 4 5 2008 5 6 2008 6 7 2009 1 8 2009 2 9 2009 3 10 2009 4 11 2009 5 12 2009 6 13 2010 1 14 2010 2 15 2010 3 16 2010 4 17 2010 5 18 2010 633
Create data set for reportproc sort data=in.sales out=sales; by year employee;
run;
data sales_all; merge allrows (in=a)
sales (in=s); by year employee;
/*------------------------------*/ /* Zero-fill the missing rows /*------------------------------*/ if a and ^s then do; num = 0; dollar = 0; end; run;
34
Data set SALES_ALL Input data for report Obs year employee num dollar 1 FY 2008 Hall 10 $10,000.00 2 FY 2008 Oates 8 $500.00 3 FY 2008 Brooks 15 $11,111.00 4 FY 2008 Dunn 0 $0.00 5 FY 2008 Abbott 50 $75,757.00 6 FY 2008 Costello 33 $33,333.00 7 FY 2009 Hall 0 $0.00 8 FY 2009 Oates 0 $0.00 9 FY 2009 Brooks 0 $0.00 10 FY 2009 Dunn 0 $0.00 11 FY 2009 Abbott 0 $0.00 12 FY 2009 Costello 0 $0.00 13 FY 2010 Hall 15 $15,500.00 14 FY 2010 Oates 0 $0.00 15 FY 2010 Brooks 20 $12,345.67 16 FY 2010 Dunn 0 $0.00 17 FY 2010 Abbott 75 $99,999.99 18 FY 2010 Costello 44 $44,444.4435
REPORT WITHOUT MISSING ROWS Number Amount of Year Employee Sales Sales ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FY 2008 Hall 10 $10,000.00 Oates 8 $500.00 Brooks 15 $11,111.00 Dunn 0 $0.00 Abbott 50 $75,757.00 Costello 33 $33,333.00
FY 2009 Hall 0 $0.00 Oates 0 $0.00 Brooks 0 $0.00 Dunn 0 $0.00 Abbott 0 $0.00 Costello 0 $0.00
FY 2010 Hall 15 $15,500.00 Oates 0 $0.00 Brooks 20 $12,345.67 Dunn 0 $0.00 Abbott 75 $99,999.99 Costello 44 $44,444.4436
Conclusions
Use of SAS formats can improve the programming process
Techniques used for this programming trick can be adapted to other programming tricks
37