Understanding SAS Data Step Processing

Alan C. Elliott, stattutorials.com

Post on 22-Feb-2016

Reading Raw Data

• Using the following SAS program:

DATA NEW;
INPUT ID $ AGE TEMPC;
TEMPF=TEMPC*(9/5)+32;
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
proc print;
run;

Overview of SAS Data Step

Compile Phase (Look at Syntax)

Execution Phase (Read data, Calculate)

Output Phase (Create Data Set)

Compile Phase

DATA NEW;
INPUT ID $ AGE TEMPC;
TEMPF=TEMPC*(9/5)+32;
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
proc print;
run;

SAS checks the syntax of the program.
• Identifies type and length of each variable
• Does any variable need conversion?

If everything is okay, proceed to the next step.

If errors are discovered, SAS attempts to interpret what you mean. If SAS can’t correct the error, it prints an error message to the log.
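The conversion question above can be illustrated with a minimal sketch. The data set name CONVERT_DEMO and the variables AGETEXT and AGE_NEXT are assumptions made for illustration and are not part of the original program. AGETEXT is deliberately read as character and then used in arithmetic, so SAS plans an automatic character-to-numeric conversion and reports it as a NOTE in the log.

```sas
/* Hypothetical example: AGETEXT is deliberately read as character */
data convert_demo;
  input id $ agetext $;
  /* A character variable used in arithmetic triggers an automatic
     character-to-numeric conversion, reported as a NOTE in the log */
  age_next = agetext + 1;
  datalines;
0001 24
0002 35
;
run;

proc print data=convert_demo;
run;
```

AGE_NEXT is numeric because its type is fixed by the first statement in which it appears.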

Create Input Buffer

• SAS creates an input buffer
• INPUT BUFFER contains data as it is read in

DATALINES;
0001 24 37.3
0002 35 38.2
;

INPUT BUFFER

Column    1  2  3  4  5  6  7  8  9 10 11 12
Content   0  0  0  1     2  4     3  7  .  3
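One way to look at the input buffer yourself is the automatic variable _INFILE_, which refers to the contents of the current input buffer. The following is a small sketch rather than part of the original slides; it uses DATA _NULL_ so that no data set is created and writes each buffered record to the log.

```sas
/* Sketch: write the contents of the input buffer to the log */
data _null_;
  input id $ age tempc;
  /* _INFILE_ holds the record most recently read into the input buffer */
  put 'Input buffer: ' _infile_;
  datalines;
0001 24 37.3
0002 35 38.2
;
run;
```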

Execution Phase

• PROGRAM DATA VECTOR (PDV) is created and contains information about the variables

• The PDV holds two automatic variables, _N_ and _ERROR_, and a position for each of the four variables in the DATA step.

• SAS sets _N_ = 1 and _ERROR_ = 0 (no initial error) and sets the remaining variables to missing (a period for the numeric variables, a blank for the character variable ID).

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0              .    .      .

Buffer to PDV

Buffer (reads 1st record):

Column    1  2  3  4  5  6  7  8  9 10 11 12
Content   0  0  0  1     2  4     3  7  .  3

PDV (TEMPF is initially missing):

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0        0001  24   37.3   .

If there is an executable statement, SAS processes it. Here it processes the code TEMPF=TEMPC*(9/5)+32;

PDV (with calculated value):

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0        0001  24   37.3   99.14
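The PDV can be inspected at each stage with PUT _ALL_, which writes every variable in the PDV, including _N_ and _ERROR_, to the log. The sketch below is an illustration added here, not part of the original slides; the label strings are arbitrary.

```sas
/* Sketch: dump the PDV to the log at three points in each iteration */
data _null_;
  put 'Before INPUT: ' _all_;   /* start of iteration: variables are missing */
  input id $ age tempc;
  put 'After INPUT:  ' _all_;   /* ID, AGE and TEMPC filled from the buffer */
  tempf = tempc*(9/5) + 32;
  put 'After TEMPF:  ' _all_;   /* TEMPF now holds the calculated value */
  datalines;
0001 24 37.3
0002 35 38.2
;
run;
```

Reading the log lines in order should reproduce the PDV pictures on this slide: TEMPF is missing after the INPUT statement and filled after the assignment.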

Output Phase

• The values in the PDV are written to the output data set (NEW) as the first observation:

From PDV:

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0        0001  24   37.3   99.14

Write data to data set:

ID    AGE  TEMPC  TEMPF
0001  24   37.3   99.14

This is the first record in the output data set named “NEW.” Note that _N_ and _ERROR_ are dropped.
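Because _N_ is dropped, a program that needs the iteration number in the output has to copy it into an ordinary variable. The sketch below assumes the hypothetical names NEW_OBS and OBSNUM. It also writes the observation with an explicit OUTPUT statement; without one, SAS writes the observation automatically at the end of the step, as described above.

```sas
/* Sketch: keep the iteration counter and write the observation explicitly */
data new_obs;                 /* NEW_OBS is a hypothetical data set name */
  input id $ age tempc;
  tempf = tempc*(9/5) + 32;
  obsnum = _n_;               /* copy _N_ into a regular variable so it is kept */
  output;                     /* explicit OUTPUT replaces the automatic one */
  datalines;
0001 24 37.3
0002 35 38.2
;
run;

proc print data=new_obs;
run;
```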

Exceptions to Missing in PDV

• Some data values are not initially set to missing in the PDV:
– variables in a RETAIN statement
– variables created in a SUM statement
– data elements in a _TEMPORARY_ array
– variables created with options in the FILE or INFILE statements
• These exceptions are covered later.

Initial values usually set to missing in PDV:

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0              .    .      .
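As a brief illustration of the first two exceptions, the sketch below uses a RETAIN statement and a sum statement. The names RUNNING, MAX_AGE and TOTAL_AGE are assumptions chosen for this example only.

```sas
/* Sketch: variables that are not reset to missing on each iteration */
data running;                    /* RUNNING is a hypothetical data set name */
  input id $ age tempc;
  retain max_age;                /* retained across iterations, starts as missing */
  max_age = max(max_age, age);   /* MAX ignores missing arguments */
  total_age + age;               /* sum statement: starts at 0 and is retained */
  datalines;
0001 24 37.3
0002 35 38.2
;
run;

proc print data=running;
run;
```

With the two records shown, TOTAL_AGE is 24 and then 59, and MAX_AGE is 24 and then 35, because neither variable is reset at the top of the second iteration.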

Next data record read

• Once SAS finishes reading the first data record, it continues the same process and reads the second record, sending results to the output data set (named NEW in this case).

• …and so on for all records.

ID    AGE  TEMPC  TEMPF
0001  24   37.3   99.14
0002  35   38.2   100.76

Descriptor Information

• For each SAS data set, SAS creates and maintains descriptor information:
– data set attributes
– variable attributes
– the name of the data set
– member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

Data Set Description

proc datasets;
contents data=new;
run;

Contents output… (abbreviated)

#  Name  Member Type  File Size  Last Modified
1  NEW   DATA         5120       20Nov13:08:59:32

Alternate program

proc contents data=new;
run;
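Two further variations on PROC CONTENTS may be useful; this is a sketch that assumes the data set NEW from the original program already exists. The VARNUM option lists variables in creation order, and OUT= saves the descriptor information to a data set. The name META is hypothetical.

```sas
/* List variables in creation order rather than alphabetically */
proc contents data=new varnum;
run;

/* Save descriptor information to a data set (META is a hypothetical name) */
proc contents data=new out=meta(keep=name type length varnum) noprint;
run;

proc print data=meta;
run;
```

In the OUT= data set, TYPE is coded as 1 for numeric and 2 for character variables.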

Description output continued…

Data Set Name        WORK.NEW                        Observations          2
Member Type          DATA                            Variables             4
Engine               V9                              Indexes               0
Created              Wed, Nov 20, 2013 08:59:32 AM   Observation Length    32
Last Modified        Wed, Nov 20, 2013 08:59:32 AM   Deleted Observations  0
Protection                                           Compressed            NO
Data Set Type                                        Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

Description output continued…

Alphabetic List of Variables and Attributes

#  Variable  Type  Len
2  AGE       Num   8
1  ID        Char  8
3  TEMPC     Num   8
4  TEMPF     Num   8
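The lengths in this listing are fixed during compilation, so they can only be changed by a statement that SAS sees before the variable is first used. The sketch below assumes a hypothetical data set name NEW4 and places a LENGTH statement ahead of INPUT so that ID is stored with length 4 instead of the default 8.

```sas
/* Sketch: set the length of ID before INPUT defines it */
data new4;                    /* NEW4 is a hypothetical data set name */
  length id $ 4;              /* must precede INPUT to take effect */
  input id $ age tempc;
  tempf = tempc*(9/5) + 32;
  datalines;
0001 24 37.3
0002 35 38.2
;
run;

proc contents data=new4;
run;
```

The PROC CONTENTS listing for NEW4 should then report ID as Char with Len 4.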

Original Program

DATA NEW;
INPUT ID $ AGE TEMPC;
TEMPF=TEMPC*(9/5)+32;
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
proc print;
run;

Program output

Obs  ID    AGE  TEMPC  TEMPF
1    0001  24   37.3   99.14
2    0002  35   38.2   100.76

Example of Error

DATA NEW;
INPUT ID $ AGE TEMPC;
TEMPF=TEMPC*(9/5)+32
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
proc print;
run;

proc datasets;
contents data=new;
run;

Missing semicolon: the TEMPF assignment statement has no terminating semicolon.

76   DATA NEW;
77   INPUT ID $ AGE TEMPC;
78   TEMPF=TEMPC*(9/5)+32
79   DATALINES;
     ---------
     22
80   0001 24 37.3
     ----
     180
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.

ERROR 180-322: Statement is not valid or it is used out of proper order.

81   0002 35 38.2
82   ;
83   run;

ERROR: No DATALINES or INFILE statement.

Error found during compilation

Summary - Compilation Phase

• During Compilation
– Checks syntax
– Identifies type and length of each new variable (is a data type conversion needed?)
– Creates input buffer if there is an INPUT statement for an external file
– Creates the Program Data Vector (PDV)
– Creates descriptor information for data sets and variable attributes
– Other options not discussed here: DROP; KEEP; RENAME; RETAIN; WHERE; LABEL; LENGTH; FORMAT; ARRAY; BY; ATTRIB; END=, IN=, FIRST., LAST., POINT=
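Several of the statements in the last item are declarative: SAS acts on them during compilation rather than executing them for each record. The sketch below shows three of them (DROP, LABEL and FORMAT) applied to the original program; the data set name NEW_F and the label text are assumptions for illustration.

```sas
/* Sketch: declarative statements handled at compile time */
data new_f;                            /* NEW_F is a hypothetical data set name */
  input id $ age tempc;
  tempf = tempc*(9/5) + 32;
  drop tempc;                          /* TEMPC stays in the PDV but is not written out */
  label tempf = 'Temperature (F)';     /* stored in the descriptor information */
  format tempf 6.1;                    /* display format, also stored in the descriptor */
  datalines;
0001 24 37.3
0002 35 38.2
;
run;

proc print data=new_f label;
run;
```

TEMPC can still be used in the TEMPF calculation because DROP only affects what is written to the output data set.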

Summary – Execution Phase

1. The DATA step iterates once for each observation being created.
2. Each time the DATA statement executes, _N_ is incremented by 1.
3. Newly created variables are set to missing in the PDV.
4. SAS reads a data record from a raw data file into the input buffer (there are other possibilities not discussed here).
5. SAS executes any other programming statements for the current record.
6. At the end of the DATA step statements (RUN;) SAS writes an observation to the SAS data set (OUTPUT PHASE).
7. SAS returns to the top of the DATA step (Step 2 above).
8. The DATA step terminates when there is no more data.
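The iteration described in these steps can be traced in the log with a small sketch (an illustration added here, not part of the original slides). It writes a message at the top and at the bottom of each pass through the DATA step.

```sas
/* Sketch: trace each iteration of the DATA step in the log */
data _null_;
  put 'Top of DATA step: ' _n_=;
  input id $ age tempc;
  tempf = tempc*(9/5) + 32;
  put '  End of iteration: ' _n_= id= tempf=;
  datalines;
0001 24 37.3
0002 35 38.2
;
run;
```

With two data records, the "Top of DATA step" message appears three times: on the third pass the INPUT statement finds no more data, so the step stops before reaching the second PUT statement.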

End