PROC TABULATE in Action

Susan Sepanik
New York Area SAS Users Group
June 23, 2009

Topics

- What is PROC TABULATE?
- Why use PROC TABULATE?
- PROC TABULATE Syntax
- PROC TABULATE Example
- Additional Resources

What is PROC TABULATE?

PROC TABULATE generates customized tables of descriptive statistics

It creates many of the same statistics as PROC MEANS and PROC FREQ

But, you can:
- Decide what goes in the rows and columns
- Do analysis on several variables in one table
- Decide how you want to classify variables
- Format it all into a ready-to-present table

Why use PROC TABULATE?

To examine raw data
- Checking counts and simple descriptive data in easy-to-read tables

To present data internally
- Answer specific analysis questions for a colleague or meeting

Sample Data

Original Data Set (abbreviated):

STUDID  SCHOOL    YEAR  ATTRATE  TSCORE  TMISS
1       School A  2005  0.95     655     0
1       School A  2006  0.97     673     0
2       School B  2005  0.87     565     0
2       School B  2006  0.85     .       1
3       School C  2005  0.82     503     0
3       School B  2006  0.89     501     0
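
To follow along, the data set can be read in with a DATA step. This is a minimal sketch, assuming the data set name schooldata used in the PROC TABULATE calls and the 20 records listed in the appendix; the comma-delimited layout and the length of SCHOOL are choices made for this illustration only.

```sas
/* Sketch: build the example data set from the appendix records.      */
/* Assumptions: comma-delimited input, SCHOOL stored as 8 characters. */
DATA schooldata;
    INFILE DATALINES DSD;      /* DSD: comma-delimited values */
    LENGTH SCHOOL $ 8;
    INPUT STUDID SCHOOL $ YEAR ATTRATE TSCORE TMISS;
    DATALINES;
1,School A,2005,0.95,655,0
1,School A,2006,0.97,673,0
2,School B,2005,0.87,565,0
2,School B,2006,0.85,.,1
3,School C,2005,0.82,503,0
3,School B,2006,0.89,501,0
4,School A,2005,0.9,524,0
4,School A,2006,0.91,.,1
5,School B,2005,0.27,489,0
5,School B,2006,0.95,522,0
6,School C,2006,0.88,495,0
7,School C,2005,0.77,669,0
7,School C,2006,0.73,690,0
8,School C,2005,0.99,620,0
8,School C,2006,0.96,632,0
9,School C,2005,0.68,.,1
10,School A,2005,0.92,543,0
10,School A,2006,0.96,565,0
11,School B,2005,0.89,.,1
11,School B,2006,0.92,578,0
;
RUN;
```

A period in the TSCORE position is read as a missing numeric value, which is what TMISS = 1 flags.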

Analysis Questions

1. Do we have the correct number of students per school and school year?

2. Does any school have particularly high or low test scores in a given year?

3. Does any school or school year have particularly high levels of missing test scores?

4. Are the attendance rates at each school for each year what we would expect?

Getting Started

PROC TABULATE Syntax

PROC TABULATE DATA = dataset;
    VAR analysis-variable-list;
    CLASS classification-variable-list;
    TABLE page-dimension, row-dimension, column-dimension;
RUN;
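
The commas in the TABLE statement separate the dimensions: with no comma the expression is the column dimension, with one comma it is rows then columns, and with two commas the first expression becomes the page dimension. A minimal sketch of the three-dimension form, assuming the schooldata data set used in the examples that follow:

```sas
/* Sketch: one page per YEAR, one row per SCHOOL, one column of means. */
PROC TABULATE DATA = schooldata;
    VAR TSCORE;
    CLASS SCHOOL YEAR;
    TABLE YEAR,            /* page dimension   */
          SCHOOL,          /* row dimension    */
          TSCORE*MEAN;     /* column dimension */
RUN;
```

The examples in this presentation use only the row and column dimensions.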

Simple PROC TABULATE

PROC TABULATE DATA = schooldata;
    CLASS SCHOOL YEAR;
    TABLE SCHOOL, YEAR;
RUN;

              YEAR
            2005   2006
               N      N
SCHOOL
School A    3.00   3.00
School B    3.00   4.00
School C    4.00   3.00
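
Since PROC TABULATE produces many of the same statistics as PROC FREQ, the counts above can be cross-checked with a two-way frequency table. A minimal sketch, assuming the same schooldata data set:

```sas
/* Sketch: the same SCHOOL-by-YEAR counts from PROC FREQ.           */
/* NOROW, NOCOL and NOPERCENT suppress the percentages so that only */
/* the frequency counts are printed.                                */
PROC FREQ DATA = schooldata;
    TABLES SCHOOL*YEAR / NOROW NOCOL NOPERCENT;
RUN;
```

The cell frequencies should agree with the N values in the PROC TABULATE table.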

Adding a New Variable: Syntax (TSCORE)

PROC TABULATE DATA = schooldata;
    VAR TSCORE;
    CLASS SCHOOL YEAR;
    TABLE SCHOOL, YEAR*N YEAR*TSCORE*MEAN;
RUN;

Adding a New Variable: Output (TSCORE)

              YEAR              YEAR
            2005   2006     2005     2006
                           TSCORE   TSCORE
               N      N      Mean     Mean
SCHOOL
School A    3.00   3.00    574.00   619.00
School B    3.00   4.00    527.00   533.67
School C    4.00   3.00    597.33   605.67

Adding Total: Syntax (ALL)

PROC TABULATE DATA = schooldata;
    VAR TSCORE;
    CLASS SCHOOL YEAR;
    TABLE SCHOOL ALL, YEAR*(N TSCORE*MEAN);
RUN;

Adding Total: Output (ALL)

                        YEAR
                2005             2006
               N   TSCORE       N   TSCORE
                     Mean             Mean
SCHOOL
School A    3.00   574.00    3.00   619.00
School B    3.00   527.00    4.00   533.67
School C    4.00   597.33    3.00   605.67
All        10.00   571.00   10.00   582.00

Adding New Statistics: Syntax (PCTN, NMISS)

PROC TABULATE DATA = schooldata;
    VAR TSCORE;
    CLASS SCHOOL YEAR;
    TABLE SCHOOL ALL, YEAR*(N COLPCTN TSCORE*(MEAN NMISS));
RUN;

Adding New Statistics: Output (PCTN, NMISS)

                                           YEAR
                         2005                                2006
                N  ColPctN      TSCORE              N  ColPctN      TSCORE
                               Mean   NMiss                        Mean   NMiss
SCHOOL
School A     3.00    30.00   574.00    0.00      3.00    30.00   619.00    1.00
School B     3.00    30.00   527.00    1.00      4.00    40.00   533.67    1.00
School C     4.00    40.00   597.33    1.00      3.00    30.00   605.67    0.00
All         10.00   100.00   571.00    2.00     10.00   100.00   582.00    2.00
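
The MEAN and NMISS columns can be cross-checked against PROC MEANS, which produces many of the same statistics. A minimal sketch, assuming the same schooldata data set; the TYPES statement and MAXDEC= option are choices made for this illustration:

```sas
/* Sketch: means and missing counts of TSCORE by year and school.     */
/* TYPES requests the YEAR totals (the "All" row) and the YEAR*SCHOOL */
/* breakdown; MAXDEC=2 matches the two decimals shown above.          */
PROC MEANS DATA = schooldata N NMISS MEAN MAXDEC=2;
    CLASS YEAR SCHOOL;
    VAR TSCORE;
    TYPES YEAR YEAR*SCHOOL;
RUN;
```

One difference to keep in mind: the N cells in the PROC TABULATE table count observations, which corresponds to the "N Obs" column in PROC MEANS, while the N statistic in PROC MEANS counts only non-missing TSCORE values.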

Changing Columns and Rows (A New Sketch)

Changing Columns and Rows: Syntax

PROC TABULATE DATA = schooldata;
    VAR TSCORE;
    CLASS YEAR SCHOOL;
    TABLE YEAR*(SCHOOL ALL),
          N PCTN<SCHOOL ALL> TSCORE*(MEAN NMISS);
RUN;

Changing Columns and Rows: Output

                      N     PctN      TSCORE
                                     Mean   NMiss
YEAR  SCHOOL
2005  School A     3.00    30.00   574.00    0.00
      School B     3.00    30.00   527.00    1.00
      School C     4.00    40.00   597.33    1.00
      All         10.00   100.00   571.00    2.00
2006  School A     3.00    30.00   619.00    1.00
      School B     4.00    40.00   533.67    1.00
      School C     3.00    30.00   605.67    0.00
      All         10.00   100.00   582.00    2.00
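
The denominator definition in angle brackets controls what each percentage is a percentage of. A minimal sketch that contrasts the two forms, assuming the same schooldata data set; the two TABLE statements are added here only for comparison:

```sas
/* Sketch: PCTN with and without a denominator definition. */
PROC TABULATE DATA = schooldata;
    CLASS YEAR SCHOOL;

    /* No denominator definition: each count is divided by the */
    /* total number of observations in the table.              */
    TABLE YEAR*(SCHOOL ALL), N PCTN;

    /* <SCHOOL ALL>: each count is divided by the total across */
    /* schools within the same year.                           */
    TABLE YEAR*(SCHOOL ALL), N PCTN<SCHOOL ALL>;
RUN;
```

With the 20 observations in this data set, the first table should report School A in 2005 as 3/20 = 15 percent and each "All" row as 50 percent, whereas the second reports 3/10 = 30 percent and 100 percent, as in the output above.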

Adding More Variables: Syntax (TMISS ATTRATE)

PROC TABULATE DATA = schooldata;
    VAR TSCORE TMISS ATTRATE;
    CLASS SCHOOL YEAR;
    TABLE YEAR*(SCHOOL ALL),
          N PCTN<SCHOOL ALL> TSCORE*MEAN TMISS*MEAN ATTRATE*MEAN;
RUN;

Adding More Variables: Output (TMISS ATTRATE)

                      N     PctN   TSCORE   TMISS   ATTRATE
                                     Mean    Mean      Mean
YEAR  SCHOOL
2005  School A     3.00    30.00   574.00    0.00      0.92
      School B     3.00    30.00   527.00    0.33      0.68
      School C     4.00    40.00   597.33    0.25      0.82
      All         10.00   100.00   571.00    0.20      0.81
2006  School A     3.00    30.00   619.00    0.33      0.95
      School B     4.00    40.00   533.67    0.25      0.90
      School C     3.00    30.00   605.67    0.00      0.86
      All         10.00   100.00   582.00    0.20      0.90

Changing Headings: Syntax

Removing unneeded headings:
    YEAR=' '

Changing headings:
    ATTRATE='Average Attendance Rate'

Changing or removing statistic headings:
    KEYLABEL N='Total Students' MEAN=' ';

Adding a table title:
    / BOX = 'Average Test Scores and Attendance Rates by School and Year';

Changing Headings: Syntax

PROC TABULATE DATA = schooldata;
    VAR ATTRATE TSCORE TMISS;
    CLASS SCHOOL YEAR;
    TABLE YEAR=' '*(SCHOOL=' ' ALL='All Schools'),
          N PCTN<SCHOOL ALL>
          TSCORE='Average Test Score'*MEAN
          TMISS='Percentage Missing Test Scores'*MEAN
          ATTRATE='Average Attendance Rate'*MEAN
          / BOX = 'Average Test Scores and Attendance Rates by School and Year';
    KEYLABEL N='Total Students' PCTN='Percentage of Students' MEAN=' ';
RUN;

Changing Headings: Output

Average Test Scores and     Total     Percentage   Average     Percentage of  Average
Attendance Rates by School  Students  of Students  Test Score  Missing Test   Attendance
and Year                                                       Scores         Rate
2005  School A              3.00      30.00        574.00      0.00           0.92
      School B              3.00      30.00        527.00      0.33           0.68
      School C              4.00      40.00        597.33      0.25           0.82
      All Schools           10.00     100.00       571.00      0.20           0.81
2006  School A              3.00      30.00        619.00      0.33           0.95
      School B              4.00      40.00        533.67      0.25           0.90
      School C              3.00      30.00        605.67      0.00           0.86
      All Schools           10.00     100.00       582.00      0.20           0.90

Formatting Values: Syntax

Formatting all numeric cells:
    FORMAT=12.0;

Formatting specific variables:
    *F=12.1
    *F=PERCENT12.0

Specifying the number of spaces for all row headings:
    RTS=25

Creating your own formats:
    PROC FORMAT;
        PICTURE PCTPIC LOW-HIGH =' 000%';
    RUN;

    *F=PCTPIC.
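
A picture format is used for PCTN because PCTN values are already on a 0 to 100 scale, while the PERCENT format multiplies by 100 and so suits proportions such as a mean of ATTRATE or TMISS. A minimal sketch of the difference, using hypothetical values chosen for illustration and the PCTPIC definition from this slide:

```sas
/* Sketch: compare the two kinds of percent formatting in the log. */
PROC FORMAT;
    PICTURE PCTPIC LOW-HIGH = ' 000%';
RUN;

DATA _NULL_;
    pct  = 30;      /* hypothetical PCTN-style value (0-100 scale)   */
    rate = 0.923;   /* hypothetical proportion, like a mean ATTRATE  */
    PUT pct= PCTPIC.;        /* picture format: appends % as is      */
    PUT rate= PERCENT12.1;   /* PERCENT format: multiplies by 100    */
RUN;
```

Applying PERCENT12.0 to a PCTN value of 30 would display 3000%, which is why the custom picture is needed for that column.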

Formatting Values: Syntax

/* Define the picture format before the step that uses it */
PROC FORMAT;
    PICTURE PCTPIC LOW-HIGH =' 000%';
RUN;

PROC TABULATE DATA = schooldata FORMAT=12.0;
    VAR ATTRATE TSCORE TMISS;
    CLASS SCHOOL YEAR;
    TABLE YEAR=' '*(SCHOOL=' ' ALL='All Schools'),
          N PCTN<SCHOOL ALL>*F=PCTPIC.
          TSCORE='Average Test Score'*F=12.1*MEAN
          TMISS='Percent Missing Test Scores'*MEAN*F=PERCENT12.0
          ATTRATE='Average Attendance Rate'*MEAN*F=PERCENT12.1
          / BOX = 'Average Test Scores and Attendance Rates by School and Year' RTS=25;
    KEYLABEL PCTN='Percent of Students' N='Total Students' MEAN=' ';
RUN;

Formatting Values: Output

Average Test Scores and     Total     Percent of   Average     Percent        Average
Attendance Rates by School  Students  Students     Test Score  Missing Test   Attendance
and Year                                                       Scores         Rate
2005  School A              3         30%          574.0       0%             92.3%
      School B              3         30%          527.0       33%            67.7%
      School C              4         40%          597.3       25%            81.5%
      All Schools           10        100%         571.0       20%            80.6%
2006  School A              3         30%          619.0       33%            94.7%
      School B              4         40%          533.7       25%            90.3%
      School C              3         30%          605.7       0%             85.7%
      All Schools           10        100%         582.0       20%            90.2%

Creating an Excel File: Syntax

ODS HTML FILE = 'PROCTABULATE.XLS' STYLE = MINIMAL;

/* PCTPIC must already be defined with PROC FORMAT */
PROC TABULATE DATA = schooldata;
    VAR ATTRATE TSCORE TMISS;
    CLASS SCHOOL YEAR;
    TABLE YEAR=' '*(SCHOOL=' ' ALL='All Schools'),
          (N PCTN<SCHOOL ALL>*F=PCTPIC.
           TSCORE='Average Test Score'*F=12.1*MEAN
           TMISS='Percent Missing Test Scores'*MEAN*F=PERCENT12.0
           ATTRATE='Average Attendance Rate'*MEAN*F=PERCENT12.1)
          / BOX = 'Average Test Scores and Attendance Rates by School and Year' RTS=25;
    KEYLABEL PCTN='Percent of Students' N='Total Students' MEAN=' ';
RUN;

ODS HTML CLOSE;
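
The approach above writes HTML to a file with an .XLS extension, which Excel then opens. In later SAS releases (this sketch assumes SAS 9.4 or newer, which postdates this presentation) the ODS EXCEL destination can write a native .xlsx workbook instead. The file name and the shortened TABLE statement are illustrative only:

```sas
/* Sketch: same idea with the ODS EXCEL destination (SAS 9.4+).   */
/* Assumption: schooldata exists; file name is hypothetical.      */
PROC FORMAT;
    PICTURE PCTPIC LOW-HIGH = ' 000%';
RUN;

ODS EXCEL FILE = 'PROCTABULATE.xlsx';

PROC TABULATE DATA = schooldata;
    VAR ATTRATE TSCORE;
    CLASS SCHOOL YEAR;
    TABLE YEAR=' '*(SCHOOL=' ' ALL='All Schools'),
          (N PCTN<SCHOOL ALL>*F=PCTPIC.
           TSCORE='Average Test Score'*F=12.1*MEAN
           ATTRATE='Average Attendance Rate'*MEAN*F=PERCENT12.1);
    KEYLABEL PCTN='Percent of Students' N='Total Students' MEAN=' ';
RUN;

ODS EXCEL CLOSE;
```

Either way, the destination must be closed before the file is opened in Excel.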

Creating an Excel File: Output

Average Test Scores and     Total     Percent of   Average     Percent        Average
Attendance Rates by School  Students  Students     Test Score  Missing Test   Attendance
and Year                                                       Scores         Rate
2005  School A              3         30%          574         0%             92.30%
      School B              3         30%          527         33%            67.70%
      School C              4         40%          597.3       25%            81.50%
      All Schools           10        100%         571         20%            80.60%
2006  School A              3         30%          619         33%            94.70%
      School B              4         40%          533.7       25%            90.30%
      School C              3         30%          605.7       0%             85.70%
      All Schools           10        100%         582         20%            90.20%

Conclusion

PROC TABULATE generates customized tables of descriptive statistics

You can format the output into ready-to-present Excel tables

The best way to create a table is:
- Start simple with which variables you want in the columns and rows
- Add more statistics and variable relationships as needed
- Finish by formatting titles and values

Additional Resources

Base SAS 9.1.3 Procedures Guide, Volume 3

“Making Sense of PROC TABULATE (Updated for SAS9)” by Jonas V. Bilenas, JP Morgan Chase. Paper 230-2007.
http://www2.sas.com/proceedings/forum2007/230-2007.pdf

“Anyone Can Learn PROC TABULATE” by Lauren Haworth, Genentech, Inc. Paper 60-27.
www2.sas.com/proceedings/sugi27/p060-27.pdf

PROC TABULATE by Example, by Lauren E. Haworth, SAS Institute Inc., 1999.

Thank you!

Appendix: Entire Original Data Set

STUDID  SCHOOL    YEAR  ATTRATE  TSCORE  TMISS
1       School A  2005  0.95     655     0
1       School A  2006  0.97     673     0
2       School B  2005  0.87     565     0
2       School B  2006  0.85     .       1
3       School C  2005  0.82     503     0
3       School B  2006  0.89     501     0
4       School A  2005  0.9      524     0
4       School A  2006  0.91     .       1
5       School B  2005  0.27     489     0
5       School B  2006  0.95     522     0

Appendix: Entire Original Data Set (continued):

STUDID  SCHOOL    YEAR  ATTRATE  TSCORE  TMISS
6       School C  2006  0.88     495     0
7       School C  2005  0.77     669     0
7       School C  2006  0.73     690     0
8       School C  2005  0.99     620     0
8       School C  2006  0.96     632     0
9       School C  2005  0.68     .       1
10      School A  2005  0.92     543     0
10      School A  2006  0.96     565     0
11      School B  2005  0.89     .       1
11      School B  2006  0.92     578     0