The Power of Proc Tabulate
Part 1: The Basics
David M. Rivard
Introduction
Provides the capability to produce customized, professional tables for use in presentations and reports
Syntax based on the Table Producing Language developed by the U.S. Department of Labor
Reports summary statistics which are available in Freq, Means, and Summary procedures
Handles multiple variables in rows and columns and multiple levels within the rows and columns
Series Presentation Outline
Proc Tabulate will be presented as a series because the syntax for the procedure may appear complex and intimidating
Part 1
  Examples of tables produced by the Tabulate and Means procedures
  Suggested basic steps to begin developing a table
  General coding format
  Creating and customizing a simple table
Dataset Attributes: 428 Records
Name          Type       Label
Cyl           Numeric    Cyl
Dealer $      Numeric    Dealer $
Doors         Character  Doors
Engine        Numeric    Engine
HP            Numeric    HP
Length        Numeric    Length
MPG City      Numeric    MPG City
MPG Hwy       Numeric    MPG Hwy
MSP $         Numeric    MSP $
Manufacturer  Character  Manufacturer
Model         Character  Model
Type          Character  Type
Weight        Numeric    Weight
Whel Base     Numeric    Whel Base
Width         Numeric    Width

NAME: 2004 New Car and Truck Data
TYPE: Sample
SIZE: 428 observations, 19 variables
DESCRIPTIVE ABSTRACT: Specifications are given for 428 new vehicles for the 2004 year. The variables recorded include price, measurements relating to the size of the vehicle, and fuel efficiency.
SOURCE: _Kiplinger's Personal Finance_, December 2003, vol. 57, no. 12, pp. 104-123, http://www.kiplinger.com
Examples of Proc Means
proc means data = sasuser.cars;
  by type;
  var cyl mpg_city mpg_hwy weight length;
run;

Type=Domestic
Variable  Label       N      Mean      Std Dev  Minimum  Maximum
Cyl       Cyl       145  6.110345    1.5416317        4       10
MPG_City  MPG_City  140  19.32143    3.8910853       10       29
MPG_Hwy   MPG_Hwy   140  26.35714    5.2602369       12       37
Weight    Weight    145   3765.65   859.739863     2348     7190
Length    Length    129  192.1008   14.2362022      150      227

Type=Import
Variable  Label       N      Mean      Std Dev  Minimum  Maximum
Cyl       Cyl       283   5.60424    1.6391432       -1       12
MPG_City  MPG_City  274  20.48175    5.7399087       12       60
MPG_Hwy   MPG_Hwy   274  27.18613    5.8970121       14       66
Weight    Weight    281   3479.98   685.355679     1850     5590
Length    Length    273  181.8315   11.4862848      143      208

proc means data = sasuser.cars;
  class type;
  var cyl mpg_city mpg_hwy weight length;
run;

Type      N Obs  Variable  Label       N      Mean   Std Dev  Minimum  Maximum
Domestic    145  Cyl       Cyl       145  6.110345  1.541632        4       10
                 MPG_City  MPG_City  140  19.32143  3.891085       10       29
                 MPG_Hwy   MPG_Hwy   140  26.35714  5.260237       12       37
                 Weight    Weight    145   3765.65  859.7399     2348     7190
                 Length    Length    129  192.1008   14.2362      150      227
Import      283  Cyl       Cyl       283   5.60424  1.639143       -1       12
                 MPG_City  MPG_City  274  20.48175  5.739909       12       60
                 MPG_Hwy   MPG_Hwy   274  27.18613  5.897012       14       66
                 Weight    Weight    281   3479.98  685.3557     1850     5590
                 Length    Length    273  181.8315  11.48628      143      208
proc means data = sasuser.cars mean;
  class type doors;
  by cyl;
  var mpg_city mpg_hwy weight length;
run;

Cyl=4
Type      Doors  N Obs  Variable    Mean
Domestic  2dr       10  MPG_City    23.1
                        MPG_Hwy       31
                        Weight    2985.3
                        Length     182.1
          4dr       26  MPG_City  24.577
                        MPG_Hwy   32.077
                        Weight    2911.1
                        Length    177.83
          5dr        1  MPG_City      26
                        MPG_Hwy       33
                        Weight      2691
                        Length       168
Import    2dr       38  MPG_City  25.257
                        MPG_Hwy     32.2
                        Weight    2809.1
                        Length    171.97
          4dr       63  MPG_City      25
                        MPG_Hwy   31.559
                        Weight    2918.6
                        Length    175.83

Cyl=5
Type      Doors  N Obs  Variable    Mean
Import    2dr        2  MPG_City    20.5
                        MPG_Hwy       27
                        Weight      3450
                        Length       186
          4dr        5  MPG_City    19.6
                        MPG_Hwy     26.8
                        Weight    3750.8
                        Length     183.4

Cyl=6
Type      Doors  N Obs  Variable    Mean
Domestic  2dr       17  MPG_City      18
                        MPG_Hwy   25.118
                        Weight    3721.8
                        Length    189.53
          4dr       48  MPG_City  18.813
                        MPG_Hwy   26.271
                        Weight    3717.7
                        Length     195.7
Import    2dr       46  MPG_City  18.435
                        MPG_Hwy   25.478
                        Weight    3504.2
                        Length    179.35
          4dr       79  MPG_City  18.494
                        MPG_Hwy   25.278
                        Weight    3707.5
                        Length    186.81

Cyl=8
Type      Doors  N Obs  Variable    Mean
Domestic  2dr       15  MPG_City  15.214
                        MPG_Hwy     20.5
                        Weight    4588.2
                        Length    195.27
          4dr       26  MPG_City  16.125
                        MPG_Hwy   22.417
                        Weight    4486.2
                        Length    207.39
Import    2dr       17  MPG_City  15.353
                        MPG_Hwy   21.353
                        Weight    4361.2
                        Length    188.47
          4dr       29  MPG_City  16.357
                        MPG_Hwy   22.714
                        Weight    4297.9
                        Length    193.54
proc means data = sasuser.cars mean;
  class type doors cyl;
  var mpg_city mpg_hwy weight length;
run;

Type      Doors  Cyl  N Obs  Variable    Mean
Domestic  2dr      4     10  MPG_City    23.1
                             MPG_Hwy       31
                             Weight    2985.3
                             Length     182.1
                   6     17  MPG_City      18
                             MPG_Hwy   25.118
                             Weight    3721.8
                             Length    189.53
                   8     15  MPG_City  15.214
                             MPG_Hwy     20.5
                             Weight    4588.2
                             Length    195.27
                  10      2  MPG_City       .
                             MPG_Hwy        .
                             Weight      5300
                             Length     201.5
          4dr      4     26  MPG_City  24.577
                             MPG_Hwy   32.077
                             Weight    2911.1
                             Length    177.83
                   6     48  MPG_City  18.813
                             MPG_Hwy   26.271
                             Weight    3717.7
                             Length     195.7
                   8     26  MPG_City  16.125
                             MPG_Hwy   22.417
                             Weight    4486.2
                             Length    207.39
          5dr      4      1  MPG_City      26
                             MPG_Hwy       33
                             Weight      2691
                             Length       168
Import    2dr      3      1  MPG_City      60
                             MPG_Hwy       66
                             Weight      1850
What else can I do?
With BY-group processing, the data set had to be re-sorted by the new BY variables every time a change was made
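The re-sorting step is not shown on the slides. A minimal sketch of what it might look like for the BY cyl example, assuming a hypothetical output data set name (work.cars_by_cyl) so the original data is left untouched:

```sas
/* BY-group processing expects the data in BY-variable order, so sort first. */
/* work.cars_by_cyl is a hypothetical name chosen for this sketch.           */
proc sort data = sasuser.cars out = work.cars_by_cyl;
  by cyl;
run;

proc means data = work.cars_by_cyl mean;
  class type doors;
  by cyl;
  var mpg_city mpg_hwy weight length;
run;
```

Changing the BY variable means changing and rerunning the sort as well, which is the overhead the slide refers to.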
Example of Proc Tabulate
            MPG_City                           MPG_Hwy
            Cyl                                Cyl
                3      4      5      6     10      3      4      5      6      8     10     12
             Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean
Type
Domestic        .  24.22      .   18.6      .      .  31.81      .  25.97  21.71      .      .
Import         60   25.1  19.86  18.47      .     66   31.8  26.86  25.35   22.2      .     19
By making simple code changes you can slice and dice your table just as you would an Excel pivot table. The best part is that SAS does it for you in one procedure, versus exporting the data to Excel and creating the pivot table there. And how the data is sorted does not matter!
How to Begin
Determine what statistics to present
Sketch a draft of the report
Generate the basic code
Test, retest, and verify the results using a subset of the data
Continue developing the code and creating the table in stages
Run the program on the complete data
Clean up the final appearance of the table
Include ODS functionality (discussed in the last part of the series)
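One way to test on a subset, sketched here under the assumption that the first 50 observations are representative enough for a quick check (the cutoff of 50 is arbitrary and not from the presentation), is the OBS= data set option:

```sas
/* Trial run on the first 50 observations only; 50 is an arbitrary choice. */
proc tabulate data = sasuser.cars(obs = 50);
  class type;
  var mpg_city mpg_hwy;
  table type, (mpg_city mpg_hwy)*mean;
run;
```

Removing the (obs = 50) option reruns the same table on the complete data.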
General Coding Format
PROC TABULATE DATA = mydata <options>;
  CLASS variables;
  VAR variables;
  TABLE page dimension, row dimension, column dimension / <options>;
RUN;
Basic Syntax Rules
CLASS: categories (numeric or character)
VAR: used in the analysis (numeric)
TABLE: constructs the appearance
The table’s page, row, and column dimensions are separated by commas
The asterisk (*) is an operator that is used to:
  Add another CLASS variable split
  Include another variable
  Add a statistic to a variable
  Designate a format
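The uses of the asterisk listed above can be seen together in one TABLE statement. This is only an illustrative sketch using the cars data; the f=6.2 format is an assumed choice, not one taken from the slides:

```sas
proc tabulate data = sasuser.cars;
  class type doors;
  var mpg_city;
  table type*doors,            /* rows: each Type split by Doors        */
        mpg_city*mean*f=6.2;   /* column: variable * statistic * format */
run;
```

The comma separates the row dimension from the column dimension, while each asterisk nests one element inside the one before it.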
Let Us Begin
Cylinders
MPG City MPG HWY
Domestic 2dr
4dr
All
Import 2dr
4dr
All
proc tabulate data = sasuser.cars;
  class type;
  var mpg_city mpg_hwy;
  table type, (mpg_city mpg_hwy)*mean;
run;

          MPG_City  MPG_Hwy
              Mean     Mean
Type
Domestic     19.32    26.36
Import       20.48    27.19

proc tabulate data = sasuser.cars;
  class type;
  var mpg_city mpg_hwy;
  table (mpg_city mpg_hwy)*mean;
run;

MPG_City  MPG_Hwy
    Mean     Mean
   20.09    26.91

proc tabulate data = sasuser.cars;
  class type cyl;
  var mpg_city mpg_hwy;
  table type, (mpg_city mpg_hwy)*cyl*mean;
run;

            MPG_City                           MPG_Hwy
            Cyl                                Cyl
                3      4      5      6     10      3      4      5      6      8     10     12
             Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean   Mean
Type
Domestic        .  24.22      .   18.6      .      .  31.81      .  25.97  21.71      .      .
Import         60   25.1  19.86  18.47      .     66   31.8  26.86  25.35   22.2      .     19
proc tabulate data = sasuser.cars;
  class type cyl;
  var mpg_city mpg_hwy;
  table type*cyl, (mpg_city mpg_hwy)*mean;
run;

               MPG_City  MPG_Hwy
                   Mean     Mean
Type      Cyl
Domestic    4     24.22    31.81
            6      18.6    25.97
            8     15.79    21.71
           10         .        .
Import      3        60       66
            4      25.1     31.8
            5     19.86    26.86
            6     18.47    25.35
            8     15.98     22.2
           12        13       19

proc tabulate data = sasuser.cars;
  class type cyl;
  var mpg_city mpg_hwy;
  table type*cyl, (mpg_city mpg_hwy)*mean n='type';
run;

               MPG_City  MPG_Hwy  type
                   Mean     Mean
Type      Cyl
Domestic    4     24.22    31.81    37
            6      18.6    25.97    65
            8     15.79    21.71    41
           10         .        .     2
Import      3        60       66     1
            4      25.1     31.8   101
            5     19.86    26.86     7
            6     18.47    25.35   125
            8     15.98     22.2    46
           12        13       19     3
proc tabulate data = sasuser.cars;
  class type cyl;
  var mpg_city mpg_hwy;
  table type*(cyl all), (mpg_city mpg_hwy)*mean;
run;

               MPG_City  MPG_Hwy
                   Mean     Mean
Type      Cyl
Domestic    4     24.22    31.81
            6      18.6    25.97
            8     15.79    21.71
           10         .        .
          All     19.32    26.36
Import      3        60       66
            4      25.1     31.8
            5     19.86    26.86
            6     18.47    25.35
            8     15.98     22.2
           12        13       19
          All     20.48    27.19

proc tabulate data = sasuser.cars;
  class type cyl;
  var mpg_city mpg_hwy;
  table (type all)*(cyl all), (mpg_city mpg_hwy)*mean;
run;

               MPG_City  MPG_Hwy
                   Mean     Mean
Type      Cyl
Domestic    4     24.22    31.81
            6      18.6    25.97
            8     15.79    21.71
           10         .        .
          All     19.32    26.36
Import      3        60       66
            4      25.1     31.8
            5     19.86    26.86
            6     18.47    25.35
            8     15.98     22.2
           12        13       19
          All     20.48    27.19
All         3        60       66
            4     24.85     31.8
            5     19.86    26.86
            6     18.52    25.56
            8     15.89    21.98
           10         .        .
           12        13       19
          All     20.09    26.91
Proc Tabulate provides flexibility
Remember Our Original Table?
Cylinders
MPG City MPG HWY
Domestic 2dr
4dr
All
Import 2dr
4dr
All
proc tabulate data = sasuser.cars;
  class type doors cyl;
  var mpg_city mpg_hwy;
  table type*(doors all), cyl*(mpg_city mpg_hwy)*mean;
run;

                 Cyl
                         3                 4                 5                 6
                  MPG_City  MPG_Hwy MPG_City  MPG_Hwy MPG_City  MPG_Hwy MPG_City  MPG_Hwy
                      Mean     Mean     Mean     Mean     Mean     Mean     Mean     Mean
Type      Doors
Domestic  2dr            .        .     23.1       31        .        .       18    25.12
          4dr            .        .    24.58    32.08        .        .    18.81    26.27
          5dr            .        .       26       33        .        .        .        .
          All            .        .    24.22    31.81        .        .     18.6    25.97
Import    2dr           60       66    25.26     32.2     20.5       27    18.43    25.48
          4dr            .        .       25    31.56     19.6     26.8    18.49    25.28
          All           60       66     25.1     31.8    19.86    26.86    18.47    25.35
proc tabulate data = sasuser.cars;
  class type doors cyl;
  var mpg_city mpg_hwy;
  table type='Manufacturer'*(doors='Body Style' all='Overall'),
        cyl*(mpg_city='City' mpg_hwy='Hwy')*mean=' '
        / box = 'MPG Analysis';
run;

MPG Analysis              Cyl
                                 3             4             5             6             8             10
                             City    Hwy   City    Hwy   City    Hwy   City    Hwy   City    Hwy   City    Hwy
Manufacturer  Body Style
Domestic      2dr               .      .   23.1     31      .      .     18  25.12  15.21   20.5      .      .
              4dr               .      .  24.58  32.08      .      .  18.81  26.27  16.13  22.42      .      .
              5dr               .      .     26     33      .      .      .      .      .      .      .      .
              Overall           .      .  24.22  31.81      .      .   18.6  25.97  15.79  21.71      .      .
Import        2dr              60     66  25.26   32.2   20.5     27  18.43  25.48  15.35  21.35      .      .
              4dr               .      .     25  31.56   19.6   26.8  18.49  25.28  16.36  22.71      .      .
              Overall          60     66   25.1   31.8  19.86  26.86  18.47  25.35  15.98   22.2      .      .
This is okay but let us make it better
I should have written ‘Average MPG Analysis’
Now we have a table worth presenting
proc tabulate data = sasuser.cars;
  class type doors cyl;
  var mpg_city mpg_hwy;
  table type=' '*(doors=' ' all='Overall'),
        cyl=' Engine Cylinders'*(mpg_city='City' mpg_hwy='Hwy')*mean=' '
        / box = 'MPG Analysis';
run;

MPG Analysis       Engine Cylinders
                          3             4             5             6             8             10
                      City    Hwy   City    Hwy   City    Hwy   City    Hwy   City    Hwy   City    Hwy
Domestic  2dr            .      .   23.1     31      .      .     18  25.12  15.21   20.5      .      .
          4dr            .      .  24.58  32.08      .      .  18.81  26.27  16.13  22.42      .      .
          5dr            .      .     26     33      .      .      .      .      .      .      .      .
          Overall        .      .  24.22  31.81      .      .   18.6  25.97  15.79  21.71      .      .
Import    2dr           60     66  25.26   32.2   20.5     27  18.43  25.48  15.35  21.35      .      .
          4dr            .      .     25  31.56   19.6   26.8  18.49  25.28  16.36  22.71      .      .
          Overall       60     66   25.1   31.8  19.86  26.86  18.47  25.35  15.98   22.2      .      .
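The earlier slide note says the box label should have read ‘Average MPG Analysis’. A sketch of that one change, assuming everything else in the final TABLE statement stays as presented:

```sas
proc tabulate data = sasuser.cars;
  class type doors cyl;
  var mpg_city mpg_hwy;
  table type=' '*(doors=' ' all='Overall'),
        cyl='Engine Cylinders'*(mpg_city='City' mpg_hwy='Hwy')*mean=' '
        / box = 'Average MPG Analysis';  /* label text taken from the slide note */
run;
```

Because the mean=' ' label blanks out the statistic name, the box text is the only place the reader learns that the cells are averages.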
The ‘Missing’ Option
The data contains variables with missing values
Our MPG results would be incorrect if we included these variables as CLASS variables
WHY? If an observation contains a missing value for any of the CLASS variables, then that observation is ignored (This was not the case with our sample table since no missing values occurred in the chosen CLASS variables)
Using the ‘Missing’ option in the proc statement will tell SAS to treat missing CLASS values as valid categories, so all observations are included regardless of missing values
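A minimal sketch of the option in use. The sample tables did not need it, so this assumes a situation where one of the CLASS variables does contain missing values; the N statistic is added here only to make the observation counts visible:

```sas
/* MISSING keeps observations whose CLASS values are missing */
/* and shows the missing level as its own category.          */
proc tabulate data = sasuser.cars missing;
  class type cyl;
  var mpg_city mpg_hwy;
  table type*(cyl all), (mpg_city mpg_hwy)*(n mean);
run;
```

Comparing the N column with and without the option is a quick way to see whether any observations were being dropped.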
What to Expect Next
We have only scratched the surface with PROC TABULATE
In the next presentation we will add the following:
  Formats
  Percents and percent of sums
  Other statistics such as min, max, standard deviation, and so on
  ODS (Output Delivery System) function for stylized tables
  Inserting logos and pictures
A Bonus for Attending
data guess;
  do row = 0.0 to 3.4 by 0.1;
    do column = 0.00 to 0.09 by 0.01;
      z = row + column;
      prob = probnorm(z);
      output;
    end;
  end;
run;

proc tabulate data = guess;
  class row column;
  var prob;
  table row, column*prob=''*sum=''*f=5.4 / rtspace=5;
  label row = 'Z' column = 'Guess What This Does';
run;
Can you guess what the code does?
Standard Normal Distribution
Z          0    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0        0.5   0.504   0.508   0.512   0.516  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871   0.591  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443   0.648  0.6517
0.4   0.6554  0.6591  0.6628  0.6664    0.67  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915   0.695  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157   0.719  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.758  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881   0.791  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315   0.834  0.8365  0.8389
1     0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749   0.877   0.879   0.881   0.883
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962   0.898  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357   0.937  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744   0.975  0.9756  0.9761  0.9767
2     0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826   0.983  0.9834  0.9838  0.9842  0.9846   0.985  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887   0.989
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918   0.992  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938   0.994  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959   0.996  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969   0.997  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979   0.998  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3     0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989   0.999   0.999
3.1    0.999  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
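A single entry of the table can be spot-checked directly. This small sketch is not part of the presentation; it evaluates the same PROBNORM function at z = 1.96, which should agree with the entry in row 1.9, column 0.06:

```sas
data _null_;
  p = probnorm(1.96);   /* P(Z <= 1.96) for a standard normal Z */
  put p= 6.4;           /* write the value to the log           */
run;
```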
What does this code do?
data quess;
  do df = 1 to 30 by 1;
    do prob = 0.80, 0.85, 0.90, 0.95, 0.975, 0.99;
      t = tinv(prob, df);
      output;
    end;
  end;
run;

proc tabulate data = quess;
  class df prob;
  var t;
  table df, prob*t=''*mean=''*f=7.4 / rtspace=7 box = 'T';
  label prob = 'Probability';
run;
T    Probability
df       0.8     0.85      0.9     0.95    0.975     0.99
1     1.3764   1.9626   3.0777   6.3138   12.706   31.821
2     1.0607   1.3862   1.8856     2.92   4.3027   6.9646
3     0.9785   1.2498   1.6377   2.3534   3.1824   4.5407
4      0.941   1.1896   1.5332   2.1318   2.7764   3.7469
5     0.9195   1.1558   1.4759    2.015   2.5706   3.3649
6     0.9057   1.1342   1.4398   1.9432   2.4469   3.1427
7      0.896   1.1192   1.4149   1.8946   2.3646    2.998
8     0.8889   1.1081   1.3968   1.8595    2.306   2.8965
9     0.8834   1.0997    1.383   1.8331   2.2622   2.8214
10    0.8791   1.0931   1.3722   1.8125   2.2281   2.7638
11    0.8755   1.0877   1.3634   1.7959    2.201   2.7181
12    0.8726   1.0832   1.3562   1.7823   2.1788    2.681
13    0.8702   1.0795   1.3502   1.7709   2.1604   2.6503
14    0.8681   1.0763    1.345   1.7613   2.1448   2.6245
15    0.8662   1.0735   1.3406   1.7531   2.1314   2.6025
16    0.8647   1.0711   1.3368   1.7459   2.1199   2.5835
17    0.8633    1.069   1.3334   1.7396   2.1098   2.5669
18     0.862   1.0672   1.3304   1.7341   2.1009   2.5524
19     0.861   1.0655   1.3277   1.7291    2.093   2.5395
20      0.86    1.064   1.3253   1.7247    2.086    2.528
21    0.8591   1.0627   1.3232   1.7207   2.0796   2.5176
22    0.8583   1.0614   1.3212   1.7171   2.0739   2.5083
23    0.8575   1.0603   1.3195   1.7139   2.0687   2.4999
24    0.8569   1.0593   1.3178   1.7109   2.0639   2.4922
25    0.8562   1.0584   1.3163   1.7081   2.0595   2.4851
26    0.8557   1.0575    1.315   1.7056   2.0555   2.4786
27    0.8551   1.0567   1.3137   1.7033   2.0518   2.4727
28    0.8546    1.056   1.3125   1.7011   2.0484   2.4671
29    0.8542   1.0553   1.3114   1.6991   2.0452    2.462
30    0.8538   1.0547   1.3104   1.6973   2.0423   2.4573
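As with the normal table, one cell can be checked on its own. This sketch, which is not from the slides, evaluates TINV at probability 0.975 with 10 degrees of freedom, which should agree with the df = 10 row under the 0.975 column:

```sas
data _null_;
  t = tinv(0.975, 10);  /* 97.5th percentile of Student's t with 10 df */
  put t= 7.4;           /* write the value to the log                  */
run;
```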
Resources
Park, Hun Myoung (joint Ph.D. student in Public Policy, Department of Political Science and School of Public and Environmental Affairs, Indiana University). SAS Tabulate. http://www.masil.org/sas/tabulate.html
Pass, Ray and Sandy McNeill. PROC TABULATE: Doin’ It in Style
Haworth, Lauren. Anyone Can Learn PROC TABULATE, v2.0. http://www2.sas.com/proceedings/sugi27/p060-27.pdf
Why Use Proc Tabulate. http://support.sas.com/publishing/pubcat/chaps/56514.pdf
Johnson, Roger W. (Department of Mathematics and Computer Science, South Dakota School of Mines and Technology). 2004 New Car and Truck Data. http://www.amstat.org/publications/jse/jse_data_archive.html
Bilenas, Jonas V. (JP Morgan Chase, Wilmington, DE). Making Sense of PROC TABULATE. http://www2.sas.com/proceedings/sugi30/243-30.pdf
Thank you for attending, and happy Proc Tabulating