Advanced SAS Programming Techniques

A Workshop Presented to the Alaska Chapter
The American Fisheries Society

E. Barry Moser
Department of Experimental Statistics
Louisiana State University and Louisiana State University Agricultural Center
Baton Rouge, LA 70803
Phone: 504-388-8376
FAX: 504-388-8344
E-mail: barry@stat.lsu.edu

September 29-October 3, 1997

Contents

1 Introduction

2 The DATA Step
   2.1 The DATA STEP process
      2.1.1 An implicit loop
      2.1.2 RETURN, DELETE, and OUTPUT
      2.1.3 Compound Statements
      2.1.4 Data Set Options
      2.1.5 DROP, KEEP, and RETAIN
   2.2 Input/Output
      2.2.1 List Input
      2.2.2 Column Input
      2.2.3 Pointer Control and Formatted Input
      2.2.4 The PUT Statement
      2.2.5 SAS Formats and Informats
   2.3 SAS Functions
      2.3.1 Mathematical Functions
      2.3.2 Random Number Generators
      2.3.3 String Functions
      2.3.4 Date and Time Functions
      2.3.5 PUT and INPUT Functions
   2.4 Looping and Arrays
      2.4.1 Univariate and Multivariate Data Views
      2.4.2 Indeterminant DO Loops
   2.5 The NULL Data Set
   2.6 Data Step Examples
      2.6.1 Simple Random Sampling Without Replacement
      2.6.2 Data Recoding

3 Working With Files
   3.1 External Files
      3.1.1 FTP Access
      3.1.2 WWW Access
   3.2 Including External SAS Code
   3.3 The SAS Data Library
      3.3.1 The LIBNAME Statement
      3.3.2 Library Procedures
   3.4 File Import/Export/Transport
      3.4.1 Import/Export
      3.4.2 Transport
   3.5 The X Files

4 The Macro Language
   4.1 Macro Variables
   4.2 Macro Procedures
   4.3 Bootstrap Example
   4.4 Cluster Dendrogram

5 SAS Special Files
   5.1 Autoexec.sas
   5.2 Config.sas
   5.3 Profile.sct

6 SAS Internet Tools
   6.1 Capturing OUTPUT for the Web

Chapter 1

Introduction

The SAS¹ system, composed of many diverse components, is a very powerful programming environment, data management and data analysis environment, and report generation and graphics presentation environment. This manuscript was developed for a short course in "advanced SAS." Obviously the coverage will have to be quite limited. The coverage is designed around material that I have encountered through my teaching, research, and statistical consulting work and that I believe will be relevant and useful for others dealing with basic data management and statistical analysis needs. This manuscript is not intended as a SAS language or SAS system reference manual; it hardly scratches the surface. Nor is it designed to show how to do statistical analysis with the SAS system.

The manuscript will first focus on the data step, as a lot of the power of the SAS environment can be demonstrated through the data step. Next, the input/output and library system will be discussed. Later the macro language will be introduced. Finally, several chapters dealing with various parts of the system, graphics, data analysis procedures, and the internet will be introduced.

As this is an "advanced" course, some items will be introduced before they are actually covered in some detail. This was done on purpose to avoid completely artificial-looking or contrived examples (although a few do exist, sorry). Further, since not all of the "basics" are covered, keep copies of the SAS manuals available. The best way to learn the SAS system and to benefit from this course is to experiment with the examples and to create your own. Again, DO NOT hesitate to modify the examples and to create new ones.

¹ SAS, SAS/BASE, SAS/GRAPH, SAS/ACCESS, SAS/ASSIST, SAS/FSP, SAS/INSIGHT, SAS/OR, SAS/ETS, SAS/IntrNet, SAS/IML, and SAS/STAT are registered trademarks or copyrights of SAS Institute, Inc., Cary, NC.

Chapter 2

The DATA Step

2.1 The DATA STEP process

2.1.1 An implicit loop

To understand much of what happens in the data step, one first needs to understand its overall design. When originally conceived, the SAS data step was designed to get data stored in some "raw" format into the SAS data format and to perform any transformations and computations on that data prior to data analysis with procedures that would follow. Thus, the data step was designed with an implicit loop around the data input. That is, rather than the programmer having to explicitly write a loop around the input code, as would need to be done with FORTRAN (and most other programming languages), the loop was already assumed to be needed and was, therefore, automatically supplied. At the end of the implied loop, the resulting data, in the form of variables, is output to the new SAS data set. The programmer writes the code needed to process a single observation of data, and the data step then automatically repeats this same code for each observation, outputting each new observation in turn into the new SAS data set. This same basic process is also followed when a SAS data set is created from an existing SAS data set, such as when several SAS data sets are concatenated or merged together. The example below illustrates the basic looping process using a portion of the flier data set.

   Title2 "Simple Data Step";
   Data One;
      Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
      Datalines;
   7 21 74 2 7 3 0 5 3.5 1.4 57.00
   1 9 76 2 3 2 0 1 5.3 3.0 0.00
   12 18 74 2 4 1 0 5 5.4 3.4 83.20
   2 15 76 2 1 1 0 5 6.0 5.7 111.00
   9 13 75 2 2 2 1 5 10.1 23.4 203.00
   ;
   Proc Print Data=One;
   Run;

   Data Step Examples
   Simple Data Step

   OBS  MO  DAY  YR  AR  ST  SEX  AGE  SN    LT    WT     TSL
     1   7   21  74   2   7    3    0   5   3.5   1.4   57.00
     2   1    9  76   2   3    2    0   1   5.3   3.0    0.00
     3  12   18  74   2   4    1    0   5   5.4   3.4   83.20
     4   2   15  76   2   1    1    0   5   6.0   5.7  111.00
     5   9   13  75   2   2    2    1   5  10.1  23.4  203.00
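One way to watch the implicit loop at work is to write the automatic variable _N_, which counts iterations of the data step, to the SAS log on each pass. The following is a minimal sketch and not part of the original workshop material; the data set name Trace and the shortened variable list are assumptions chosen only for illustration.

```sas
Title2 "Tracing the Implicit Loop";
Data Trace;                                  /* Trace is a hypothetical data set name */
   Input Mo Day Yr;                          /* reads one observation per iteration */
   Put "Iteration " _N_ ": " Mo= Day= Yr=;   /* writes one line to the SAS log per pass */
   Datalines;
7 21 74
1 9 76
12 18 74
;
Run;
```

Each pass through the step executes the INPUT and PUT statements once and then, at the bottom of the step, writes the current observation to Trace. The log should therefore contain one PUT line per data line, and the step ends when INPUT finds no more data to read.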

2.1.2 RETURN, DELETE, and OUTPUT

The behavior of the data step loop can be modified by several statements. The RETURN statement causes execution of a loop to "return" to a specific point in the loop. When the loop is the implicit data step loop, execution returns immediately to the beginning of the data step loop. If no OUTPUT statements are contained in the data step, then the RETURN statement also outputs the current observation, in whatever state it is in, to the SAS data set.

   Title2 "RETURN Statement";
   Data One;
      Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
      If TSL=0 Then RETURN;
      ConvFact=Lt/TSL;
      Datalines;
   7 21 74 2 7 3 0 5 3.5 1.4 57.00
   1 9 76 2 3 2 0 1 5.3 3.0 0.00
   12 18 74 2 4 1 0 5 5.4 3.4 83.20
   2 15 76 2 1 1 0 5 6.0 5.7 111.00
   9 13 75 2 2 2 1 5 10.1 23.4 203.00
   ;
   Proc Print Data=One;
   Run;

   Data Step Examples
   RETURN Statement

   OBS  MO  DAY  YR  AR  ST  SEX  AGE  SN    LT    WT     TSL  CONVFACT
     1   7   21  74   2   7    3    0   5   3.5   1.4   57.00  0.061404
     2   1    9  76   2   3    2    0   1   5.3   3.0    0.00         .
     3  12   18  74   2   4    1    0   5   5.4   3.4   83.20  0.064904
     4   2   15  76   2   1    1    0   5   6.0   5.7  111.00  0.054054
     5   9   13  75   2   2    2    1   5  10.1  23.4  203.00  0.049754

Notice that the conversion factor, CONVFACT, for the second observation is missing (represented by a period). This observation could be dropped from the data set by several methods. We'll consider a couple to further illustrate the behavior of the data step. The OUTPUT statement can be used to output an observation to a data set or data sets. When it is present in a data step, observations will ONLY BE OUTPUT when the OUTPUT statement is executed. Consider the example below using both the RETURN and OUTPUT statements.

   Title2 "RETURN and OUTPUT Statements";
   Data One;
      Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
      If TSL=0 Then RETURN;
      ConvFact=Lt/TSL;
      OUTPUT;
      Datalines;
   7
