sas training - 101

© 2006, Cognizant Technology Solutions. All Rights Reserved. The information contained herein is subject to change without notice. SAS Training 101 1

Upload: saurabhdhapodkar

Post on 04-Mar-2015


Page 1: SAS Training - 101


SAS Training 101


Page 2: SAS Training - 101

Module 1
• Introduction to SAS
• Getting/Extracting Data in/from SAS
• Working with the Data

Module 2
• Introduction to SAS Proc Statements
• Combining and Modifying SAS Datasets

Module 3
• Proc SQL
• Arrays / DO-END
• Retain / First. Last.

Agenda

Page 3: SAS Training - 101

Introduction to SAS
• Getting Started with SAS environment
• The two parts of a SAS program
• Reading the SAS Log
• SAS Dataset

Getting/Extracting Data in/from SAS
• SAS Data Libraries
• Importing Data
• Exporting Data

Working with the Data
• Data Step OPTIONS
• Using IF-THEN Statements
• Using RETAIN and SUM Statements
• PROC PRINT and PROC CONTENTS

Agenda – Module 1

Page 4: SAS Training - 101

A programming environment and language for data manipulation and analysis

Data Warehousing - Easily access, manage and analyze data from many sources

Analytical Solutions - From simple to advanced statistics

Business Solutions - Manages and reports on data from many sources

What is SAS ?

Page 5: SAS Training - 101

Interactive windows provide the interface to SAS

Navigating the SAS Windowing Environment

• Editor window: write programs
• Submit (Run) button: execute the SAS program
• Log window: contains information about the processing of the SAS program, including warning and error messages
• Output window: contains reports generated by SAS procedures and DATA steps
• Explorer window: view SAS datasets

Getting Started With SAS

Page 6: SAS Training - 101

Select the Explorer tab in the SAS window bar to open the Explorer window

Functionality of the SAS explorer is similar to explorers for window-based systems

Select View > Explorer

Expand and collapse directories on the left. Drill-down and open specific files in the right

Right-click on a SAS dataset and select properties

– Provides general information about the dataset

Double click on the dataset to open it in VIEWTABLE window

– Can be used to edit datasets, create datasets and customize view of a SAS dataset

Exploring SAS Libraries

Page 7: SAS Training - 101

Select File > Open, or click the Open icon, and select the file D:\Projects\......

Click the Submit icon or select Run > Submit* to submit the program for execution

Enhanced Editor

Access and edit existing SAS programs

Write new SAS programs

Submit SAS programs

Save SAS programs to a file

* Programs can also be executed without opening them in the SAS environment using batch submit

Open a SAS Program

Running a SAS Program

Page 8: SAS Training - 101

Log Window
• An audit trail of the SAS session
• Contains programming statements as submitted
• Contains notes about files read, records read, and program execution and results
• Contains warning and error messages

Output Window
• Accumulates output in the order in which it is generated

Select Edit > Clear All to clear the contents of the window

Log and Output windows are open by default. They can also be accessed by selecting Window > Log and Window > Output respectively

LOG and OUTPUT windows

Page 9: SAS Training - 101

Raw Data or SAS Data Set -> DATA step -> SAS Data Set -> PROC step -> Output

A SAS program is a sequence of steps that the user submits for execution

• DATA steps are used to CREATE SAS datasets
• PROC steps are used to PROCESS SAS datasets

SAS Statements
• Usually begin with an identifying keyword
• Always end with a semicolon
• Text that begins with /* and ends with */ is treated as a comment

SAS Syntax Rules
• SAS statements can be in uppercase or lowercase
• One or more blanks or special characters can be used to separate words
• Statements can begin and end in any column
• A single statement can span multiple lines
• Several statements can be on the same line

SAS Programs
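
The structure described above can be sketched with a very small program: one DATA step that creates a dataset and one PROC step that processes it. The dataset name work.demo and its values are made up purely for illustration.

```sas
/* DATA step: creates the SAS dataset work.demo
   (hypothetical name and values, for illustration only) */
data work.demo;
    input Name $ Score;
    datalines;
Asha 82
Ravi 91
;
run;

/* PROC step: processes the dataset created above.
   Statements may be written in any case and may span several lines. */
PROC PRINT
    data=work.demo;
run;
```

After submitting, the Log window should show notes for both steps and the Output window should list the two observations.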

Page 10: SAS Training - 101

DATA steps

• Begin with DATA statements

• Read and Modify data

• Create a SAS data set

PROC steps

• Begin with PROC statements

• Performs specific analysis or function

• Produces results or reports

PROC steps can create data sets

A step ends when SAS encounters the start of a new step (a DATA or PROC statement) or a RUN statement

A DATA step executes statement by statement, once for each observation

DATA and PROC steps

Page 11: SAS Training - 101

Syntax errors include
• Misspelled keywords
• Missing or invalid punctuation
• Invalid options

When SAS encounters a syntax error, it identifies the error and writes its location and an explanation to the SAS log

The program below contains two deliberate syntax errors:

daat work.staff;                  /* misspelled keyword: DATA */
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
      JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff        /* missing semicolon */
run;

Diagnosing and Correcting Syntax Errors

Debugging a SAS Program
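
For comparison, here is a sketch of the same program with both syntax errors corrected. To keep it runnable without an external file, it assumes the raw data is supplied in-stream with DATALINES instead of the 'raw-data-file' placeholder; the two data rows are taken from the staff table shown later in this deck and are laid out in the columns that the INPUT statement expects.

```sas
/* Corrected: DATA is spelled properly */
data work.staff;
    /* Assumption: in-stream data replaces infile 'raw-data-file' */
    input LastName $ 1-20 FirstName $ 21-30
          JobTitle $ 36-43 Salary 54-59;
    datalines;
TORRES              JAN            Pilot             50000
LANGKAMM            SARAH          Mechanic          80000
;
run;

/* Corrected: the PROC PRINT statement now ends with a semicolon */
proc print data=work.staff;
run;
```

After resubmitting, the log should contain notes rather than error messages for both steps.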

Page 12: SAS Training - 101

data work.staff;
infile 'raw-data-file;            /* closing quote is missing */
input LastName $ 1-20 FirstName $ 21-30
      JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff;
run;

proc means data=work.staff mean max;
class JobTitle;
var Salary;
run;

Submitting a SAS Program That Contains Unbalanced Quotes

Open and submit the code where the closing quote for the INFILE statement is missing

Submit the program and browse the SAS log

There are no notes in the SAS log because all the SAS statements after the INFILE statement have become part of the quoted string

To correct the problem in the Windows environment, click the break icon

Select Cancel Submitted Statements in the Tasking Manager window and select ok

Canceling Submitted Statements

Page 13: SAS Training - 101

SAS Data Sets:

The data portion of a SAS dataset is a rectangular table of data values; the descriptor portion is the header

Variable names:

LastName   FirstName  JobTitle  Salary

Variable values:

TORRES     JAN        Pilot     50000
LANGKAMM   SARAH      Mechanic  80000
SMITH      MICHAEL    Mechanic  40000
WAGSCHAL   NADJA      Pilot     77500
TOERMOEN   JOCHEN     Pilot     65000

LastName, FirstName and JobTitle hold character values; Salary holds numeric values

Variables (Columns) : Correspond to fields of data, and each data column is named

Observations (Rows) : Correspond to records or data lines

SAS Dataset

Page 14: SAS Training - 101

Variable Names

• Can be up to 32 characters long
• Can be uppercase, lowercase or mixed-case. Variable names are not case-sensitive.
• Must start with a letter or underscore. Subsequent characters can be letters, underscores or numeric digits (no special characters)

Examples

Valid names: Data_5, bad, cub2c3

Invalid names: Data 5, 1bad, count # 5

Variable Values

Variable Types

• Character: Contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes
• Numeric: Stored as floating point numbers in 8 bytes of storage by default

Date is stored as a numeric variable in SAS. Conversely, any numeric variable may be interpreted as a date. Internally, a date value is an integer which represents the number of days since January 1, 1960

SAS allows dates to be read and output in various formats

Commonly used ones are:

Stored Value  Format      Displayed Value
0             MMDDYY8.    01/01/60
0             MMDDYY10.   01/01/1960
-1            DATE9.      31DEC1959
365           DDMMYY10.   31/12/1960

The today() function returns the current date

A date literal is specified as '<formatted date>'d, e.g. '31DEC1959'd

Variable Names and Values
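
The relationship between stored date values and formats can be checked with a short DATA _NULL_ step that writes to the log. This is only a sketch; the variable names epoch, literal and now are invented for the example.

```sas
data _null_;
    epoch   = 0;              /* stored value for 01 January 1960 */
    literal = '31DEC1959'd;   /* date literal: one day before day 0 */
    now     = today();        /* current date as a number of days */

    put epoch= mmddyy10.;     /* stored 0 displayed with MMDDYY10. */
    put literal=;             /* no format: the raw stored integer */
    put literal= date9.;      /* the same value displayed with DATE9. */
    put now= date9.;
run;
```

The first three PUT statements should reproduce rows of the table above; the last one depends on the day the program is run.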

Page 16: SAS Training - 101

A SAS data library is a collection of SAS files that are recognized as a unit by SAS

SAS data libraries are identified by assigning a library reference name

On invoking SAS, one automatically has access to a temporary and a permanent SAS data library

Work - Temporary library

Sasuser - Permanent library

One can also create and access new permanent libraries

The work library and its SAS data-files are deleted after the SAS session ends

SAS datasets in permanent libraries are saved after the SAS session ends

libname sample "C:\mysasfiles";

The statement above assigns the library reference name Sample to a SAS data library, which holds a collection of SAS files

SAS Data Libraries

Page 17: SAS Training - 101

data PS_AA_team;

input NAME $ Age prior_work_ex $;

datalines;

Sayaji 40 Y

Vikrant 30 Y

Yashjit 30 Y

Hita . Y

Tuhin 20 N

Sharmila . N

Aditi . N

Shikha . Y

Anirban 30 Y

Lata . Y

Deepak 20 N

Ambrish 20 N

Vaibhav 20 N

;

run;

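• The DATALINES (or CARDS) statement is used to enter data directly in the program

• The default type of a variable is numeric; a $ after the variable name makes it character

• A missing numeric value needs to be entered as "."

• The default length for character variables is 8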

Creating Data

Page 18: SAS Training - 101

General form of an informat:

$informat-namew.d

$ indicates a character informat

informat-name names the informat

w is an optional field width

. is the required delimiter

d optionally specifies the number of decimal places for numeric informats

Informat statement

Page 19: SAS Training - 101

7. or 7.0 reads seven columns of numeric data.

7.2 reads seven columns of numeric data and, if the data value contains no decimal point, inserts one so that the last two digits become decimal places.

$5. reads five columns of character data and removes leading blanks.

$CHAR5. reads five columns of character data and preserves leading blanks.

COMMA7. reads seven columns of numeric data and removes selected nonnumeric characters, such as dollar signs and commas.

MMDDYY10. reads dates of the form 01/20/2000

Selected Informats
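
A quick way to see what these informats do is to apply them to text values with the INPUT function and write the results to the log. The sketch below uses made-up sample strings; the variable names a, b, c and d are arbitrary.

```sas
data _null_;
    a = input('1234567', 7.);          /* plain numeric, seven columns */
    b = input('1234567', 7.2);         /* no decimal point in the data, so two decimals are implied */
    c = input('$12,345', comma7.);     /* dollar sign and comma are removed */
    d = input('01/20/2000', mmddyy10.);/* date read as a SAS date value */

    put a= b= c=;
    put d= date9.;                     /* display the date value with a date format */
run;
```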

Page 20: SAS Training - 101

List directed input - data must be separated by a delimiter; must read in all variables. In case of delimited data the data values are separated by a specially designated character called the delimiter. For example, in case of comma separated values, the comma separates individual data values from each other.

Column input - data in fixed columns;must know where data starts and ends; can read in selected variables. In fixed format files the data values are placed at pre-specified column addresses in the data file.

Informat - alternative to column input; most flexible; must be used for special data

Input data can have variable names as part of the data. If the names of the variables are given in the topmost row of the file, one can use PROC IMPORT.

Choosing a method for raw data:

• Fixed format: INFILE / INPUT (@ signifies the start of the data value)
• Delimited, variable names available: PROC IMPORT (or use the Import Wizard)
• Delimited, variable names not available: INFILE / INPUT with the DLM option

Importing Data
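
The three input styles described above can be compared side by side. The following sketch reads small in-stream samples (invented names and values) with list input, column input, and formatted input using informats.

```sas
/* List input: values separated by blanks, read in order */
data list_in;
    input Name $ Age;
    datalines;
Asha 30
Ravi 25
;
run;

/* Column input: each value sits in fixed columns */
data column_in;
    input Name $ 1-5 Age 7-8;
    datalines;
Asha  30
Ravi  25
;
run;

/* Formatted input: @ sets the starting column, the informat sets how to read */
data formatted_in;
    input @1 Name $5. @7 Joined mmddyy10.;
    format Joined date9.;
    datalines;
Asha  01/20/2000
Ravi  12/31/1999
;
run;
```

Formatted input is the only one of the three that can read the date column directly, which is why informats are described as necessary for special data.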

Page 21: SAS Training - 101

Raw Files

Infile "X:\raw-file"
    LRECL = <length-of-observation> MISSOVER;
Input @<start-of-var1> var1 <length-of-var1>.
      @<start-of-var2> var2 <length-of-var2>.
      .
      .
      @<start-of-var3> var3 $<length-of-var3>.
;

To read a fixed format raw file, one needs to know the exact position where each variable starts and the length of the variable

For every character variable, the $ symbol is used when declaring its length

If no $ symbol is used, the variable is taken as numeric by default

The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached. If SAS reaches the end of the row without finding values for all fields, variables without values are set to missing.

FIRSTOBS = tells SAS at which line to begin reading data

OBS = specifies the last line to be read (this equals the number of lines read when FIRSTOBS = 1)

DLM = specifies the delimiter used

Importing Data (Fixed Format / Delimited)
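
As a sketch of how the INFILE options above work together on delimited data, the step below reads comma-separated in-stream lines, skips a header line with FIRSTOBS=, and relies on MISSOVER for a short record. The dataset name docs and its values are invented for the example.

```sas
data docs;
    /* DLM=',' sets the delimiter, FIRSTOBS=2 skips the header line,
       MISSOVER sets City to missing when a line ends early */
    infile datalines dlm=',' firstobs=2 missover;
    input DocId Spec :$20. City :$20.;
    datalines;
DocId,Spec,City
101,Cardiology,Pune
102,Neurology
;
run;

proc print data=docs;
run;
```

Without MISSOVER, SAS would move to the next line to look for the missing City value of the second record.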

Page 22: SAS Training - 101

data <dataset>;

infile "X:\YYY.txt"

LRECL = 99 MISSOVER;

input @1 DOCID 9.

@10 SPEC $30.

@40 STREET $25.

@65 CITY $20.

@85 STATE $2.

@87 ZIP $3.

@90 PHONE 10. ;

run;

Example: Convert a fixed format file (YYY.txt) to a SAS dataset.

Layout of YYY.txt

Start  End  Length  Type  Variable  Description
1      9    9       Num   DOCID     Doctor ID
10     39   30      Char  SPEC      Speciality
40     64   25      Char  STREET    Address - Street
65     84   20      Char  CITY      Address - City
85     86   2       Char  STATE     Address - State
87     89   3       Char  ZIP       Address - ZIP
90     99   10      Num   PHONE     Telephone Number

Page 23: SAS Training - 101

PROC IMPORT OUT=SAS-data-set

DATAFILE='external-file-name'

DBMS=file-type;

GETNAMES=YES;

RUN;

General form of the IMPORT procedure

PROC IMPORT datafile='D:\fun\Ritesh Training\comp.csv' out=yyy

DBMS=CSV REPLACE;

GETNAMES=YES;

RUN;

Example Code

PROC IMPORT

Page 24: SAS Training - 101

PROC IMPORT OUT=SAS-data-set
DATAFILE='external-file-name'
DBMS=delimited-file-type REPLACE;   /* e.g. CSV, TAB or DLM */
GETNAMES=YES;
RUN;

PROC IMPORT with slight change can read the delimited file. General format is:

PROC IMPORT datafile = 'D:\fun\Ritesh Training\Broker comp file.txt' out=xxx

DBMS=TAB REPLACE;

GETNAMES=YES;

run;

Example: Following code converts tab delimited file to SAS dataset

Delimited Text Files
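
For delimiters other than comma or tab, PROC IMPORT can be used with DBMS=DLM together with a DELIMITER= statement. The sketch below is self-contained under one assumption: it first writes a small pipe-delimited file to a temporary fileref (rawpipe, an invented name) so that no real path is needed, and then imports it into work.people.

```sas
/* Assumption: a temporary file stands in for a real delimited file */
filename rawpipe temp;

data _null_;
    file rawpipe;
    put 'Name|Age';      /* header row supplies the variable names */
    put 'Asha|30';
    put 'Ravi|25';
run;

/* DBMS=DLM with DELIMITER= reads a file with an arbitrary delimiter */
proc import datafile=rawpipe out=work.people dbms=dlm replace;
    delimiter='|';
    getnames=yes;
run;

proc print data=work.people;
run;
```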

Page 25: SAS Training - 101

Select the type of raw file which is to be imported

Browse to the raw file

The Import Wizard is a graphical interface provided by SAS to convert a raw data file to a SAS dataset. It can only convert delimited and Excel files to SAS files.

IMPORT Wizard

Page 26: SAS Training - 101

Enter the library name and the dataset name under which you want to save the SAS dataset

Press “Finish” to convert the raw file to a SAS dataset

Import Wizard basically first generates PROC IMPORT code and then executes it. You can save the code that the wizard generates.

IMPORT Wizard

Page 27: SAS Training - 101

The following code segment illustrates the use of the EXPORT procedure in SAS to output a file in the CSV format.

PROC EXPORT DATA= <Name of Dataset> OUTFILE= <Output Filename> DBMS=CSV REPLACE;

RUN;

Note: The output filename should be given under quotes with the full path

Example Code

SAS dataset can be converted into other file formats by using either “proc export” or the SAS “export wizard”

Exporting Data From SAS
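
A concrete instance of the PROC EXPORT form above might look like the following sketch. The dataset work.people and the fileref outcsv are invented for the example; a temporary fileref is used instead of a quoted full path so that the code does not depend on any particular folder.

```sas
/* Small dataset to export (hypothetical values) */
data work.people;
    input Name $ Age;
    datalines;
Asha 30
Ravi 25
;
run;

/* Assumption: a temporary file is used in place of a real output path */
filename outcsv temp;

proc export data=work.people outfile=outcsv dbms=csv replace;
run;
```

To write to a permanent location, the fileref would be replaced by the quoted full path of the output file, as the note above says.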

Page 28: SAS Training - 101

Step 1: Click on file and select “Export Data”

Step 2: Select the Data to be exported

SAS “export wizard” allows us to convert a SAS dataset into other file formats without having to write any code.

EXPORT wizard

Page 29: SAS Training - 101

The SAS “export wizard” also allows us to save the corresponding “proc export” code

Step 3: Select the file format

Step 4: Specify the output filename and its location

Step 5: Enter the filename to save the code for export

EXPORT wizard

Page 31: SAS Training - 101

• SAS language has 3 types of options:

• System options – they have the most global influence (stay in effect for the duration of your job/session) and affect how SAS operates. They are issued when you invoke SAS or when you use the OPTIONS statement

• Statement options – they appear in individual statements and influence how SAS runs that particular DATA or PROC step. DATA=, for example, is a statement option telling SAS which dataset to use for a procedure

• Data set options – they affect only how SAS reads or writes an individual data set. You can use data set options in DATA or PROC steps. Simply put the option between parentheses directly following the data set name. Examples:

   • KEEP = variable-list
   • DROP = variable-list
   • RENAME = (oldvar = newvar)
   • FIRSTOBS = n
   • IN = new_var_name

Using SAS Data Set Options
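
The data set options listed above can be tried on a small dataset. In this sketch the dataset staff, its variables and its values are all invented; the point is only where the options are placed, in parentheses right after the dataset name.

```sas
data staff;
    input Name $ Dept $ Salary;
    datalines;
Asha Sales 50000
Ravi Ops 42000
Mina Sales 61000
;
run;

/* FIRSTOBS= and RENAME= act on the dataset being read,
   KEEP= acts on the dataset being written */
data subset (keep=Name AnnualPay);
    set staff (firstobs=2 rename=(Salary=AnnualPay));
run;

/* Data set options also work in PROC steps */
proc print data=staff (drop=Dept);
run;
```

Here subset should contain only the second and third observations, with Salary renamed to AnnualPay.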

Page 32: SAS Training - 101

• The PUT function is used to convert variables from numeric to character, and the INPUT function is used for the reverse

Character to Numeric: newvar = INPUT (oldvar, informat);
Numeric to Character: newvar = PUT (oldvar, format);

Examples

Character to Numeric: newB = INPUT (VarB, 1.);
Numeric to Character: newD = PUT (VarD, 2.);

PUT/INPUT Functions
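
The two conversions can be seen in a short step that writes its results to the log. VarB and VarD follow the slide; the sample values and the variables total and tag are made up to show that the converted values behave as numeric and character respectively.

```sas
data _null_;
    VarB = '7';                 /* character value */
    VarD = 42;                  /* numeric value */

    newB = input(VarB, 1.);     /* character to numeric, using an informat */
    newD = put(VarD, 2.);       /* numeric to character, using a format */

    total = newB + 1;           /* arithmetic works on the numeric result */
    tag   = 'ID-' || newD;      /* concatenation works on the character result */

    put newB= total= newD= tag=;
run;
```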

Page 33: SAS Training - 101

Basic form:
IF condition THEN action;

If Model = 'Mustang' Then Make = 'Ford';

• You can use symbolic or mnemonic operators

Symbolic   Mnemonic
=          EQ
^=, ~=     NE
>          GT
<          LT
>=         GE
<=         LE

Note: in a DATA step, <> is the MAX operator, so it should not be used to mean "not equal"

• You may also use the IN operator to make comparisons

Example:

If Model IN ('Corvette', 'Camaro') Then Make = 'Chevrolet';

Using IF-THEN Statements

Page 34: SAS Training - 101

• A single IF-THEN statement can have only one action. To execute more than one action, add DO and END

Example:
If Model = 'Mustang' Then DO;
   Make = 'Ford';
   Size = 'Compact';
End;

• To test more than one condition, combine the conditions with AND / OR

Example:
If Model = 'Mustang' and Year < 1975 Then Status = 'Classic';

Using IF-THEN Statements

Page 35: SAS Training - 101

Basic form:

IF condition THEN action;

ELSE IF condition THEN action;

ELSE action;

• Else is automatically executed for all observations failing to satisfy any of the previous IF statements

• Else statement is simply an IF-THEN statement with an ELSE tacked onto the front

Using IF-THEN-ELSE Statements

Page 36: SAS Training - 101

Data from a survey of home improvements, containing owner’s name, description of work done and cost of improvement. Group the cost into High, Medium, Low.

Gregory cabinet facelift          2000
Molly   bathroom addition         11350
Luther  paint exterior            3910
Susan   second floor              75362.9

Code:

Data home_cost;
Infile 'C:\Home_data.dat';
Input Owner $ 1-7 Description $ 9-33 Cost;
Length CostGrp $ 6;        /* long enough for 'medium' */
If Cost < 2000 Then CostGrp = 'low';
Else if Cost < 10000 Then CostGrp = 'medium';
Else CostGrp = 'high';
Run;

Example

Page 37: SAS Training - 101

Often you want to use some of the observations of the dataset and exclude the rest

Use the subsetting IF statement in a DATA step

• Basic form: IF expression;
• Example: If sex = 'f';

Alternatively, use the DELETE statement

• Example: If sex = 'm' Then delete;

Use IF when it is easier to specify a condition for including observations

Use DELETE when it is easier to specify a condition for excluding observations

Subsetting your data
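
Both approaches can be compared on a small dataset. In this sketch the dataset people and its values are invented; with only the two values 'f' and 'm' present, the two subsetting steps should keep the same observations.

```sas
data people;
    input Name $ Sex $;
    datalines;
Asha f
Ravi m
Mina f
;
run;

/* Subsetting IF: keep only observations for which the condition is true */
data females_if;
    set people;
    if Sex = 'f';
run;

/* DELETE: drop observations for which the condition is true */
data females_delete;
    set people;
    if Sex = 'm' then delete;
run;
```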

Page 38: SAS Training - 101

• When reading raw data, SAS sets the value of all variables to missing at the start of each iteration of the DATA step.

• With the RETAIN statement, a variable keeps its value from the previous iteration of the DATA step

• Basic form: RETAIN variables;

RETAIN variables initial-value;

• A sum statement also retains values from the previous iteration of the DATA step, but you use it for cases where you simply want to cumulatively add the value of an expression to a variable

• Basic form: variable + expression;

Using RETAIN and SUM statements

Page 39: SAS Training - 101

Data from a baseball game containing the date the game was played, the team played, and the hits and runs for the game

6-19 Columbia Peaches 8 3

6-20 Columbia Peaches 3 4

7-1 Plains Peanuts 10 5

7-2 Plains Peanuts 2 3

7-4 Sacremento 10 10

7-5 Sacremento 12 8

Team wants two additional variables – cumulative number of runs for the season and maximum number of runs in a game to date.

Example

Page 40: SAS Training - 101

Data games;

Infile 'C:\Games.dat';

Input Month 1 Day 3-4 Team $6-25 Hits 27-28 Runs 30-31;

RETAIN MaxRuns;

MaxRuns = Max (MaxRuns, Runs);

RunsToDate + Runs;

Run;

Example (Contd)..
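
The same logic can be run without an external file. This sketch is a variation on the step above, not the original program: it assumes the game data is supplied in-stream as comma-separated values (so no column alignment is needed) and reads the date as a single character variable GameDate, which is an invented name.

```sas
data games;
    infile datalines dlm=',';
    input GameDate :$5. Team :$20. Hits Runs;

    retain MaxRuns;                    /* keep MaxRuns across iterations */
    MaxRuns = max(MaxRuns, Runs);      /* MAX ignores the initial missing value */
    RunsToDate + Runs;                 /* sum statement: running total, starts at 0 */
    datalines;
6-19,Columbia Peaches,8,3
6-20,Columbia Peaches,3,4
7-1,Plains Peanuts,10,5
7-2,Plains Peanuts,2,3
7-4,Sacremento,10,10
7-5,Sacremento,12,8
;
run;

proc print data=games;
run;
```

For these six games, RunsToDate should step through 3, 7, 12, 15, 25, 33 and MaxRuns through 3, 4, 5, 5, 10, 10.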

Page 41: SAS Training - 101

Questions ??????

Page 43: SAS Training - 101

Introduction to SAS Proc Statements
» Proc Sort
» Proc Means
» Proc Freq
» Proc Summary
» Proc Transpose

Combining and Modifying SAS Datasets
» Set statement
» Merge statement

Agenda – Module 2

Page 44: SAS Training - 101

Start with the keyword PROC

E.g.:

• PROC CONTENTS DATA = Sales_force_team;

SAS will use the most recently created dataset if the DATA= option is not specified

BY statement
» required only for PROC SORT
» everywhere else it is optional, and SAS performs a separate analysis for each combination of values of the BY variables

SAS Procedures

Page 46: SAS Training - 101

Default sorting is ascending

Form of PROC SORT statement

PROC SORT Data = data-name;
BY variable-1 variable-2 variable-3 … variable-n;
RUN;

NODUPKEY eliminates observations having the same values for the BY variables

PROC SORT Data = data-name Out = data-name NODUPKEY ;

Sorting in descending order: place DESCENDING before each variable to be sorted in descending order

BY variable-1 DESCENDING variable-2 DESCENDING variable-3 ;

PROC SORT

Page 47: SAS Training - 101

data marine;

input NAME $ FAMILY $ length ; datalines;

beluga whale 15

whale shark 40

basking shark 30

gray whale 50

mako shark 12

sperm whale 60

dwarf shark .5

whale shark 40

humpback . 50

blue whale 100

killer whale 30

;

run;

PROC SORT data = marine out = seasort NODUPKEY ;

BY family DESCENDING length;

PROC PRINT data = seasort;

TITLE 'Whales and Sharks';

run;

Whales and Sharks

Obs  Name      Family  Length
1    humpback  .        50.0
2    whale     shark    40.0
3    basking   shark    30.0
4    mako      shark    12.0
5    dwarf     shark     0.5
6    blue      whale   100.0
7    sperm     whale    60.0
8    gray      whale    50.0
9    killer    whale    30.0
10   beluga    whale    15.0

OUTPUT

PROC SORT … Example

Page 49: SAS Training - 101

Form of PROC MEANS statement

PROC MEANS Data = data-name options;

BY variable-list;

VAR variable-list;

RUN ;

If PROC MEANS is used with no other option it gives number of non-missing values, mean, std, min and max for all variables

Writing summary statistic into a SAS dataset

PROC MEANS Data = zoo NOPRINT;

VAR lions tigers bears;

OUTPUT OUT = zoosum MEAN ( lions bears ) = Avglionwt Avgbearwt

SUM ( tigers ) = Tottigerwt;

RUN ;

PROC MEANS

Page 50: SAS Training - 101

data cake;

input LastName $ 1-12 Age 13-14 PresentScore 16-17 TasteScore 19-20 Flavor $ 23-32 Layers 34 ;

datalines;

Orlando     27 93 80  Vanilla    1
Ramey       32 84 72  Rum        2
Goldston    46 68 75  Vanilla    1
Roe         38 79 73  Vanilla    2
Larsen      23 77 84  Chocolate  .
Davis       51 86 91  Spice      3
Strickland  19 82 79  Chocolate  1
Nguyen      57 77 84  Vanilla    .
Hildenbrand 33 81 83  Chocolate  1
Byron       62 72 87  Vanilla    2
Sanders     26 56 79  Chocolate  1
Jaeger      43 66 74             1
Davis       28 69 75  Chocolate  2
Conrad      69 85 94  Vanilla    1
Walters     55 67 72  Chocolate  2
Rossburger  28 78 81  Spice      2
Matthew     42 81 92  Chocolate  2
Becker      36 62 83  Spice      2
Anderson    27 87 85  Chocolate  1
Merritt     62 73 84  Chocolate  1
;

proc means data=cake n mean max min range std fw=8;

var PresentScore TasteScore;

title 'Summary of Presentation and Taste Scores';

run;

Summary of Presentation and Taste Scores

The MEANS Procedure

Variable       N    Mean     Maximum  Minimum  Range   Std Dev
PresentScore   20   76.150   93.000   56.000   37.000  9.376
TasteScore     20   81.350   94.000   72.000   22.000  6.611

OUTPUT

PROC MEANS … Example

Page 52: SAS Training - 101

Form of PROC FREQ statement

PROC FREQ Data = data-name options;

BY variable-list;

OUTPUT statistic-keyword(s) <OUT=SAS-data-set>;

TABLES request(s) </ option(s)>;

RUN ;

To do this | Use this statement
Calculate separate frequency or cross-tabulation tables for each BY group | BY
Create an output data set that contains specified statistics | OUTPUT
Specify frequency or cross-tabulation tables and request tests and measures of association | TABLES

PROC FREQ

Page 53: SAS Training - 101

data color;

input Region Eyes $ Hair $ Count @@;

label eyes='Eye Color' hair='Hair Color' region='Geographic Region';

datalines;

1 blue fair 23 1 blue red 7 1 blue medium 24

1 blue dark 11 1 green fair 19 1 green red 7

1 green medium 18 1 green dark 14 1 brown fair 34

1 brown red 5 1 brown medium 41 1 brown dark 40

1 brown black 3 2 blue fair 46 2 blue red 21

2 blue medium 44 2 blue dark 40 2 blue black 6

2 green fair 50 2 green red 31 2 green medium 37

2 green dark 23 2 brown fair 56 2 brown red 42

2 brown medium 53 2 brown dark 54 2 brown black 13

;

proc freq data=color;

weight count;

tables eyes hair eyes*hair/out=freqcnt outexpect

sparse;

title 'Eye and Hair Color of European Children';

run;

proc print data=freqcnt noobs;

title2 'Output Data Set from PROC FREQ';

run;

The TABLES statement requests three tables:

• Eyes and Hair frequencies

• Eyes by Hair cross-tabulation.

OUT = creates FREQCNT data set that contains cross-tabulation table frequencies.

OUTEXPECT stores expected cell frequencies

SPARSE stores zero cell counts in FREQCNT

PROC FREQ … Example

Page 55: SAS Training - 101

Form of PROC SUMMARY statement

PROC SUMMARY <option(s)> <statistic-keyword(s)>;

CLASS variable(s) </ option(s)>;

VAR variable(s);

OUTPUT <OUT=SAS-data-set><output-statistic-specification(s)> <id-group-specification(s)> <maximum-id-specification(s)> <minimum-id-specification(s)></ option(s)> ;

RUN;

To do this | Use this statement
Calculate separate statistics for each BY group | BY
Create an output data set that contains specified statistics | OUTPUT
Specify the grouping variables | CLASS
List the variables to be summarized | VAR

PROC SUMMARY

Page 56: SAS Training - 101

data color;

input Region Eyes $ Hair $ Count @@;

label eyes='Eye Color' hair='Hair Color' region='Geographic Region';

datalines;

1 blue fair 23 1 blue red 7 1 blue medium 24

1 blue dark 11 1 green fair 19 1 green red 7

1 green medium 18 1 green dark 14 1 brown fair 34

1 brown red 5 1 brown medium 41 1 brown dark 40

1 brown black 3 2 blue fair 46 2 blue red 21

2 blue medium 44 2 blue dark 40 2 blue black 6

2 green fair 50 2 green red 31 2 green medium 37

2 green dark 23 2 brown fair 56 2 brown red 42

2 brown medium 53 2 brown dark 54 2 brown black 13

;

proc summary data=color;

class eyes hair;

var count;

Output out = Summary (drop=_freq_) sum=;

run;

PROC SUMMARY … Example

Page 58: SAS Training - 101

Used to transpose SAS datasets (turning observations into variables or variables into observations)

Basic form

PROC TRANSPOSE DATA = oldname OUT = newname;

BY variable-list;

ID variable;

VAR variable-list;

To do this | Use this statement
Used if you have any grouping variables that you want to retain as variables. These variables are included in the transposed data set, but are not themselves transposed | BY
Names the variable whose formatted values will become the new variable names. In the absence of an ID statement, the new variables will be named COL1, COL2, and so on | ID
Names the variables whose values you want to transpose | VAR

Changing observations to variables using PROC TRANSPOSE

Page 59: SAS Training - 101

data color;

input Region Eyes $ Hair $ Count @@;

label eyes='Eye Color' hair='Hair Color' region='Geographic Region';

datalines;

1 blue fair 23 1 blue red 7 1 blue medium 24

1 blue dark 11 1 green fair 19 1 green red 7

1 green medium 18 1 green dark 14 1 brown fair 34

1 brown red 5 1 brown medium 41 1 brown dark 40

1 brown black 3 2 blue fair 46 2 blue red 21

2 blue medium 44 2 blue dark 40 2 blue black 6

2 green fair 50 2 green red 31 2 green medium 37

2 green dark 23 2 brown fair 56 2 brown red 42

2 brown medium 53 2 brown dark 54 2 brown black 13

;

proc sort data=color;    /* BY-group processing needs sorted data */
by eyes hair;
run;

proc transpose data=color out = transpose;
by eyes hair;
id Region;
var count;
run;

PROC TRANSPOSE … Example

Page 61: SAS Training - 101

• To read a SAS data set - start with DATA statement specifying the name of the new SAS data set. Then follow with the SET statement specifying the name of the old SAS dataset you want to read

DATA new-data-set;

SET data-set;

• To stack data sets (appending) – With two or more datasets (that have all or most of the same variables but different observations), in addition to reading the data, the SET statement concatenates the datasets one on top of the other

DATA new-data-set;

SET data-set-1 data-set-n;

Using SET Statement

Page 62: SAS Training - 101

• The datasets you want to stack are already sorted by some important variable

• Simple stacking would result in unsorting

• Option 1 – Do a simple stacking and then use Proc SORT

• Recommended Option – Use a BY statement with your SET statement

DATA new-data-set;

SET data-set-1 data-set-n;

BY variable-list;

• Before you can use the BY statement, the datasets must be sorted by the BY variables

Interleaving data sets using SET Statement
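
A small sketch of interleaving, using two invented datasets (north and south) that are each already sorted by Id:

```sas
data north;
    input Id Region $;
    datalines;
1 N
4 N
;
run;

data south;
    input Id Region $;
    datalines;
2 S
3 S
;
run;

/* SET with BY interleaves the inputs, so the result stays sorted by Id */
data interleaved;
    set north south;
    by Id;
run;

proc print data=interleaved;
run;
```

Without the BY statement the result would list both north observations first (Id 1, 4) and then the south ones (Id 2, 3); with it, the observations should come out in Id order 1, 2, 3, 4.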

Page 64: SAS Training - 101

• First sort all datasets by the common variable(s)

• Basic form

DATA new-data-set;

MERGE data-set-1 data-set-n;

BY variable-list;

• If the datasets being merged have variables with same names (besides the BY variables), then the variables from the second dataset will overwrite any variables having the same name in the first data set.

• All observations from both the data sets are included in the final data set, irrespective of whether they had a match or not

One to One Match Merge

Page 65: SAS Training - 101

• Each observation in dataset 1 matches with more than one observation in dataset 2

• Basic form

DATA new-data-set;

MERGE data-set-1 data-set-n;

BY variable-list;

• The order of the datasets in the MERGE statement does not matter to SAS, i.e., a one to many merge is same as many to one merge

• One to many merge cannot be done without a BY statement. Without any BY variables for matching, SAS simply joins together the first observation from each data set, then the second observation from each data set and so on.

One to Many Match Merge

Page 66: SAS Training - 101

• We merge the data with certain conditions like: having the data in one file only, the data common to all datasets, the data in one file not present in other

• Basic form

DATA new-data-set;

MERGE data-set-1 (in = a) data-set-2 (in = b);

BY variable-list;

IF condition;

• Various conditions used while merging data sets are:
   • IF a or b: Union of two datasets
   • IF a and b: Intersection of two datasets
   • IF a and not b: Data in one file not present in other

Various ways of merging data sets
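The conditions on this slide can be tried on two tiny data sets. The names a_list, b_list, in_both and only_in_a are hypothetical; the IN= variables a and b follow the slide.

```sas
/* Hypothetical inputs, already in id order so no PROC SORT is needed */
data a_list;
   input id;
   datalines;
1
2
3
;
run;

data b_list;
   input id;
   datalines;
2
3
4
;
run;

/* Intersection: keep ids present in both inputs (2 and 3) */
data in_both;
   merge a_list (in = a) b_list (in = b);
   by id;
   if a and b;
run;

/* Keep ids present in a_list but not in b_list (1) */
data only_in_a;
   merge a_list (in = a) b_list (in = b);
   by id;
   if a and not b;
run;
```

A subsetting IF with no THEN clause keeps the observation only when the condition is true.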

Page 67: SAS Training - 101

• Say, you want to compare each observation in a group to the group’s mean

• Summarize your data using PROC MEANS and write the results in a new dataset

• Merge the summarized data back with the original data using a one-to-many match merge

Merging Summary statistics with the original data
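One possible way to carry out these three steps is sketched below. The data set scores and the variables team, score, team_mean and diff are assumptions made for the example.

```sas
/* Hypothetical detail data */
data scores;
   input team $ score;
   datalines;
A 10
A 20
B 30
B 50
;
run;

proc sort data=scores; by team; run;

/* NOPRINT suppresses the report; OUTPUT writes one row per team */
proc means data=scores noprint;
   by team;
   var score;
   output out=team_means (drop=_type_ _freq_) mean=team_mean;
run;

/* One-to-many match merge: each score gets its team's mean */
data scores_vs_mean;
   merge scores team_means;
   by team;
   diff = score - team_mean;
run;

proc print data=scores_vs_mean noobs;
run;
```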

Page 68: SAS Training - 101

• MERGE cannot be used as there are no common variables.

• You can use two SET statements

DATA new-data-set;

IF _N_ = 1 THEN SET summary-data-set;

SET original-data-set;

• Original-data-set is the data set with more than one observation, and summary-data-set is the data set with a single observation. SAS reads the original data set with a normal SET statement. It reads the summary data set with another SET statement, but only in the first iteration of the DATA step. Because variables read with SET are retained automatically, the summary values are carried onto every observation in the new data set

Combining a grand total with the original data
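A minimal sketch of the two-SET technique, assuming a hypothetical data set sales with variables rep and amount. The grand total is computed with PROC MEANS and then attached to every observation.

```sas
/* Hypothetical detail data */
data sales;
   input rep $ amount;
   datalines;
Ann 100
Bob 300
Cy 600
;
run;

/* Summary data set with a single observation holding the grand total */
proc means data=sales noprint;
   var amount;
   output out=total (keep=grand_total) sum=grand_total;
run;

/* grand_total is read once and retained for all later iterations */
data share;
   if _n_ = 1 then set total;
   set sales;
   pct_of_total = amount / grand_total;
run;

proc print data=share noobs;
run;
```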

Page 69: SAS Training - 101

• SAS language has 3 types of options:

• System options – they have the most global influence (stay in effect for the duration of your job/session) and affect how SAS operates. They are issued when you invoke SAS or when you use OPTIONS statement

• Statement options – they appear in individual statements and influence how SAS runs that particular DATA or PROC step. DATA=, for example, is a statement option telling SAS which dataset to use for a procedure

• Data set options – they affect only how SAS reads or writes an individual data set. You can use data set options in DATA steps and PROC steps. Simply put the options in parentheses directly after the data set name. Examples:

• KEEP = variable-list
• DROP = variable-list
• RENAME = (oldvar = newvar)
• FIRSTOBS = n
• IN = new-var-name

Using SAS Data Set Options
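The data set options listed on this slide might be used as in the sketch below. The data set people and its variables are hypothetical.

```sas
/* Hypothetical input data */
data people;
   input id name $ age height;
   datalines;
1 Abe 34 70
2 Bea 29 64
3 Cara 41 67
;
run;

/* Options on the input data set: start at observation 2, drop HEIGHT,
   and rename AGE to AGE_YEARS as the data are read */
data subset;
   set people (firstobs=2 drop=height rename=(age=age_years));
run;

/* Data set options also work on a data set named in a PROC statement */
proc print data=people (keep=id name) noobs;
run;
```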

Page 70: SAS Training - 101

• Can be used while combining two datasets, to track which of the original data sets contributed to each observation

• Unlike most variables, IN= variables are temporary, existing only during the current DATA step

• SAS gives the IN= variables a value of 0 or 1 (1 implying that the dataset did contribute to the current observation and a value of 0 means that it did not)

Tracking and selecting observations with the IN = Option
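Because IN= variables are not written to the output data set, their values have to be copied into an ordinary variable if the source is to be kept. A small sketch, with hypothetical data sets cats and dogs and a hypothetical variable source:

```sas
data cats;
   input name $;
   datalines;
Leo
Tigra
;
run;

data dogs;
   input name $;
   datalines;
Rex
;
run;

/* from_cats and from_dogs are temporary 0/1 flags set by SAS */
data pets;
   set cats (in = from_cats) dogs (in = from_dogs);
   length source $ 4;
   if from_cats then source = 'cats';
   else if from_dogs then source = 'dogs';
run;

proc print data=pets noobs;
run;
```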

Page 71: SAS Training - 101

• To create multiple datasets in a single DATA step, simply put more than one data set name in your DATA statement

• Example

DATA lions tigers bears;

• In the above example, SAS would create 3 identical data sets

• To create different datasets, use the OUTPUT statement

• Basic form

OUTPUT data-set-name;

• Example

IF family = "Ursidae" THEN OUTPUT bears;

Writing multiple data sets using the OUTPUT statement
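A runnable sketch of routing observations to different data sets with OUTPUT. The input data set zoo, the animal names and the family value "Felidae" are made up for the example; "Ursidae" is taken from the slide.

```sas
/* Hypothetical input data */
data zoo;
   input animal $ family :$10.;
   datalines;
Leo Felidae
Baloo Ursidae
Shere Felidae
;
run;

/* Two output data sets; each observation goes to at most one of them */
data cats bears;
   set zoo;
   if family = "Ursidae" then output bears;
   else if family = "Felidae" then output cats;
run;
```

Once an OUTPUT statement appears in a DATA step, the automatic output at the end of the step is turned off, so observations matching neither condition are written nowhere.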

Page 72: SAS Training - 101

• To write several observations for each pass through the DATA step, put an OUTPUT statement in a DO loop or just use several OUTPUT statements

• Example - Say we want to generate data points for plotting the equation y = x²

DATA generate;

DO x = 1 TO 6;

y = x ** 2;

OUTPUT;

END;

• Since the OUTPUT statement is within the DO loop, an observation is created each time through the loop. Without the OUTPUT statement, SAS would have written only one observation at the end of the DATA step

Making several observations from one using the OUTPUT statement
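The other approach mentioned on this slide, several OUTPUT statements in one pass, could look like the sketch below. The data sets wide and long and their variables are hypothetical.

```sas
/* Hypothetical input: one row per id with two quarterly amounts */
data wide;
   input id q1 q2;
   datalines;
1 10 20
2 30 40
;
run;

/* Two OUTPUT statements write two observations per input row */
data long (keep=id quarter amount);
   set wide;
   quarter = 1; amount = q1; output;
   quarter = 2; amount = q2; output;
run;

proc print data=long noobs;
run;
```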

Page 73: SAS Training - 101

• Functions are used to modify or transform the values in observations

• To extract a portion of a data value:

new_variable = SUBSTR (variable, starting-position, length)

• To check the length of values:

new_variable = LENGTH (variable)

• To remove all blanks within values (use COMPBL to reduce runs of blanks to a single blank):

new_variable = COMPRESS (variable)

• To extract the nth piece of a value that is split by delimiter characters such as "-", "(" or "_":

new_variable = SCAN (variable, n, 'delimiters')

• To extract the month, year or day part of a date:

new_variable = MONTH (variable), YEAR (variable) or DAY (variable)

• When a variable holds both date and time, e.g. "23Apr06 00:00:00", the date part is extracted using:

new_variable = DATEPART (variable)

Some useful functions used in SAS
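The functions on this slide can be tried in a DATA _NULL_ step that writes its results to the log. The sample value 'AB-1234 X' and all variable names are hypothetical.

```sas
data _null_;
   code    = 'AB-1234 X';
   part    = substr(code, 4, 4);       /* 4 characters from position 4: '1234' */
   len     = length(code);             /* 9: length without trailing blanks */
   squeeze = compress(code);           /* all blanks removed: 'AB-1234X' */
   second  = scan(code, 2, '-');       /* second piece split on '-': '1234 X' */
   dt      = '23APR2006:00:00:00'dt;   /* a datetime constant */
   d       = datepart(dt);             /* date portion as a SAS date value */
   m       = month(d);                 /* 4 */
   y       = year(d);                  /* 2006 */
   put part= len= squeeze= second= m= y= d=date9.;
run;
```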

Page 74: SAS Training - 101

Module 1
• Introduction to SAS
• Getting/Extracting Data in/from SAS
• Working with the Data

Module 2
• Introduction to SAS Proc Statements
• Combining and Modifying SAS Datasets

Module 3
• Proc SQL
• Arrays / DO-END
• Retain / First. Last.

Agenda

Page 75: SAS Training - 101

Proc SQL

Arrays / DO-END

Retain / First. Last.

Agenda – Module 3

Page 76: SAS Training - 101

What can SQL do?

» Selecting

» Ordering/sorting

» Subsetting

» Restructuring

» Creating table/view

» Joining/Merging

» Transforming variables

» Editing

PROC SQL – What?

Page 77: SAS Training - 101

The Advantage of using SQL

» Combined functionality

» Faster for smaller tables

» SQL code is more portable for non-SAS applications

» Does not require presorting

» Does not require common variable names to join on (the join columns need the same type and, ideally, the same length)

PROC SQL – Why?

Page 78: SAS Training - 101

PROC SQL;

SELECT DISTINCT rating FROM MFE.MOVIES;

QUIT;

The simplest PROC SQL query needs three statements

By default, PROC SQL prints the query result; use the NOPRINT option to suppress the output.

Begin with PROC SQL and end with QUIT; rather than RUN;

At least one SELECT … FROM statement is needed

DISTINCT is an option that removes duplicate rows

Selecting Data
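A hedged sketch of the NOPRINT option in use. Since MFE.MOVIES is not available here, a hypothetical WORK table movies stands in for it, and the macro variable n_ratings is an illustrative name.

```sas
/* Hypothetical stand-in for MFE.MOVIES */
data movies;
   input title :$20. rating $;
   datalines;
Alpha PG
Bravo R
Charlie PG
;
run;

/* NOPRINT suppresses the printed result; INTO stores it in a macro variable */
proc sql noprint;
   select count(distinct rating)
      into :n_ratings
      from movies;
quit;

%put Number of distinct ratings: &n_ratings;
```

NOPRINT is typically paired with INTO or CREATE TABLE, since otherwise the query result is discarded.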

Page 79: SAS Training - 101

PROC SQL ;

SELECT *

FROM MFE.MOVIES

ORDER BY category;

QUIT;

Line breaks have no effect on SAS statements, so the single SELECT statement can be spread over three lines

SELECT * selects all variables from the dataset MFE.MOVIES

Put ORDER BY after FROM

The data are sorted by the variable “category”

Ordering/Sorting Data

Page 80: SAS Training - 101

PROC SQL;

SELECT title, category

FROM MFE.MOVIES

WHERE category CONTAINS 'Action';

QUIT;

Use a comma (,) to separate the selected variables

CONTAINS in a WHERE clause works only with character variables

Also try WHERE UPCASE(category) LIKE '%ACTION%';

Use the percent sign (%) as a wildcard character with the LIKE operator.

Sub-Setting Data: Character searching in WHERE

Page 81: SAS Training - 101

PROC SQL;

SELECT title, category, rating

FROM MFE.MOVIES

WHERE category =* 'Drana';

QUIT;

Always put WHERE after FROM

=* is the sounds-like operator

It searches the movie category for phonetic variations of “drama”, which also helps catch spelling variations such as 'Drana'

Sub-Setting Data: Phonetic Matching in WHERE

Page 82: SAS Training - 101

PROC SQL;

CREATE TABLE ACTION AS

SELECT title, category

FROM MFE.MOVIES

WHERE category CONTAINS 'Action';

QUIT;

CREATE TABLE … AS can be placed in front of any SELECT … FROM statement to build a SAS data set.

With SELECT alone, the results of a query are converted to an output object (printed). Query results can also be stored as data. The CREATE TABLE statement creates a table with the results of a query. The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can be used in later SQL statements or in other SAS steps.

This produces a new dataset (table) ACTION in the WORK library, with no printed output

Creating New Data: Create Table

Page 83: SAS Training - 101

PROC SQL;

SELECT *

FROM MFE.CUSTOMERS, MFE.MOVIES;

QUIT;

Terminology: Join (Merge) datasets (tables)

No prior sorting is required, which is one advantage over a DATA step MERGE

Use a comma (,) to separate the two datasets in FROM

Without WHERE, all possible combinations of rows from the two tables are produced, and all columns are included

Turn on the HTML result option for better display: Tools/Options/Preferences…/Results/ check Create HTML/OK

Join Tables (Merge datasets): Cartesian Join
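Adding a WHERE clause turns the Cartesian product into a matched join. The sketch below uses two hypothetical WORK tables, customers and rentals, joined on an assumed key cust_id; the real columns of MFE.CUSTOMERS and MFE.MOVIES are not shown in the slides.

```sas
/* Hypothetical tables; neither is sorted by the join key */
data customers;
   input cust_id name $;
   datalines;
1 Abe
2 Bea
;
run;

data rentals;
   input cust_id title :$20.;
   datalines;
2 Alpha
1 Bravo
2 Charlie
;
run;

/* Only row pairs with matching cust_id survive the WHERE clause */
proc sql;
   select c.name, r.title
      from customers as c, rentals as r
      where c.cust_id = r.cust_id
      order by c.name, r.title;
quit;
```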

Page 84: SAS Training - 101

PROC SQL;

SELECT *,

COUNT(title) AS notitle,

MAX(year) AS most_recent,

MIN(year) AS earliest,

SUM(length) AS total_length,

NMISS(rating) AS nomissing

FROM MFE.MOVIES

GROUP BY rating;

QUIT;

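Simple summary functions are available

All of these functions can be applied within groups using GROUP BY

Because SELECT * is combined with summary functions here, SAS remerges each group's statistics onto every row of that group

NMISS counts missing values, so the column named nomissing holds the number of missing ratings

Transforming Data: Summarizing Data using SQL functions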
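To get one summary row per group instead, select only the GROUP BY column and the summary functions. A sketch on a hypothetical WORK table movies (the columns title, rating, year and runtime are assumptions):

```sas
/* Hypothetical stand-in for MFE.MOVIES; Charlie has a missing runtime */
data movies;
   input title :$20. rating $ year runtime;
   datalines;
Alpha PG 1999 95
Bravo R 2004 120
Charlie PG 2001 .
;
run;

/* One output row per rating */
proc sql;
   select rating,
          count(title)   as n_titles,
          max(year)      as most_recent,
          min(year)      as earliest,
          sum(runtime)   as total_runtime,
          nmiss(runtime) as n_missing_runtime
      from movies
      group by rating;
quit;
```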

Page 85: SAS Training - 101

Proc SQL

Arrays / DO-END

Retain / First. Last.

Agenda – Module 3

Page 86: SAS Training - 101

You can use arrays to simplify programs that

» perform repetitive calculations

» create many variables with the same attributes

» read data

» rotate SAS data sets by making variables into observations or observations into variables

» compare variables

» perform a table lookup.

Array Processing

Page 87: SAS Training - 101

An array in SAS provides a means for repetitively processing variables using a do-loop. Arrays are merely a convenient way of grouping variables, and do not persist beyond the data step in which they are used

SAS arrays can be used for simple repetitive tasks, reshaping data sets, and remembering values from observation-to-observation

Arrays can be used to allow some traditional matrix-style programming techniques to be used in the data step

In short, a SAS array

» is a temporary grouping of SAS variables that are arranged in a particular order

» is identified by an array name

» exists only for the duration of the current DATA step

» is not a variable.

Each value in an array is

» called an element

» identified by a subscript that represents the position of the element in the array.

What Is a SAS Array?

Page 88: SAS Training - 101

ARRAY name<{nelem}> <$> <elements> <(initial-values)>;

Examples:
• array x x1-x3;
• array check{5} _temporary_;
• array miss{4} _temporary_ (9 9 99 9);
• array dept $ dept1-dept4 ('Sales', 'Research', 'Training');
• array value{3}; * generates value1, value2 and value3;

All variables in an array must have the same type (numeric or character)

An array name can't have the same name as a variable

You must explicitly state the number of elements when using _temporary_; in other cases SAS figures it out from context, generating new variables if necessary.

Array Statement: Syntax
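A short sketch that combines two of the forms above: an array over existing variables and a _temporary_ array with initial values used as a lookup table. The data set scores, the variables s1-s3 and the missing-value codes are hypothetical.

```sas
/* Hypothetical survey data where 9, 9 and 99 are "missing" codes */
data scores;
   input id s1 s2 s3;
   datalines;
1 9 4 99
2 3 9 5
;
run;

data cleaned (drop=i);
   set scores;
   array s{3} s1-s3;
   array miss{3} _temporary_ (9 9 99);   /* code for each item, not written out */
   do i = 1 to 3;
      if s{i} = miss{i} then s{i} = .;   /* recode to a SAS missing value */
   end;
run;

proc print data=cleaned noobs;
run;
```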

Page 89: SAS Training - 101

[Diagram: the array CONTRIB groups the variables QTR1, QTR2, QTR3 and QTR4 as its first to fourth elements; the array references are CONTRIB{1}, CONTRIB{2}, CONTRIB{3} and CONTRIB{4}.]

What is a SAS Array?

Page 90: SAS Training - 101

The ARRAY statement defines the elements in an array. These elements will be processed as a group. You refer to elements of the array by the array name and subscript.

ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;


The ARRAY Statement

Page 91: SAS Training - 101

The ARRAY statement

» must contain all numeric or all character elements

» must be used to define an array before the array name can be referenced

» creates variables if they do not already exist in the PDV

» is a compile-time statement.

The ARRAY Statement

Page 92: SAS Training - 101

Write an ARRAY statement that defines the four quarterly contribution variables as elements of an array.

array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;

[Diagram: Qtr1, Qtr2, Qtr3 and Qtr4 are the first, second, third and fourth elements of the array CONTRIB.]

Defining an Array

Page 93: SAS Training - 101

Variables that are elements of an array need not have similar, related or numbered names.

array Contrib2{4} Q1 Qrtr2 ThrdQ Qtr4;

[Diagram: Q1, Qrtr2, ThrdQ and Qtr4 are the first, second, third and fourth elements of the array CONTRIB2.]

Defining an Array

Page 94: SAS Training - 101

Array processing often occurs within DO loops. An iterative DO loop that processes an array has the following form:

DO index-variable=1 TO number-of-elements-in-array;
   additional SAS statements using array-name{index-variable}…
END;

To execute the loop as many times as there are elements in the array, specify that the values of index-variable range from 1 to number-of-elements-in-array.

Processing an Array

Page 95: SAS Training - 101

array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;
do Qtr=1 to 4;
   Contrib{Qtr}=Contrib{Qtr}*1.25;
end;

[Diagram: as the index variable Qtr takes the values 1, 2, 3 and 4, the array reference CONTRIB{QTR} resolves to CONTRIB{1}, CONTRIB{2}, CONTRIB{3} and CONTRIB{4}, that is, to the first (Qtr1), second (Qtr2), third (Qtr3) and fourth (Qtr4) elements.]

Processing an Array

Page 96: SAS Training - 101

data charity(drop=Qtr);
   set prog2.donate;
   array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;
   do Qtr=1 to 4;
      Contrib{Qtr}=Contrib{Qtr}*1.25;
   end;
run;

When Qtr=1, the assignment is Contrib{1}=Contrib{1}*1.25; which is equivalent to Qtr1=Qtr1*1.25;

Performing Repetitive Calculations

Page 97: SAS Training - 101

When Qtr=2, the assignment in the same DATA step is Contrib{2}=Contrib{2}*1.25; which is equivalent to Qtr2=Qtr2*1.25;

Performing Repetitive Calculations

Page 98: SAS Training - 101

When Qtr=3, the assignment in the same DATA step is Contrib{3}=Contrib{3}*1.25; which is equivalent to Qtr3=Qtr3*1.25;

Performing Repetitive Calculations

Page 99: SAS Training - 101

When Qtr=4, the assignment in the same DATA step is Contrib{4}=Contrib{4}*1.25; which is equivalent to Qtr4=Qtr4*1.25;

Performing Repetitive Calculations

Page 100: SAS Training - 101

proc print data=charity noobs;
run;

Partial PROC PRINT Output

ID        Qtr1     Qtr2     Qtr3     Qtr4
E00224   15.00    41.25    27.50        .
E00367   43.75    60.00    50.00    37.50
E00441       .    78.75   111.25   112.50
E00587   20.00    23.75    37.50    36.25
E00598    5.00    10.00     7.50     1.25

Performing Repetitive Calculations

Page 101: SAS Training - 101

Calculate the percentage that each quarter's contribution represents of the employee's total annual contribution. Base the percentage only on the employee's actual contribution and ignore the company contributions.

Partial Listing of prog2.donate

ID       Qtr1   Qtr2   Qtr3   Qtr4
E00224     12     33     22      .
E00367     35     48     40     30

Creating Variables with Arrays

Page 102: SAS Training - 101

data percent(drop=Qtr);
   set prog2.donate;
   Total=sum(of Qtr1-Qtr4);
   array Contrib{4} Qtr1-Qtr4;
   array Percent{4};
   do Qtr=1 to 4;
      Percent{Qtr}=Contrib{Qtr}/Total;
   end;
run;

The second ARRAY statement creates four numeric variables: Percent1, Percent2, Percent3, and Percent4.

c07s3d1.sas

Creating Variables with Arrays

Page 103: SAS Training - 101

proc print data=percent noobs;
   var ID Percent1-Percent4;
   format Percent1-Percent4 percent6.;
run;

Partial PROC PRINT Output

ID       Percent1   Percent2   Percent3   Percent4
E00224        18%        49%        33%          .
E00367        23%        31%        26%        20%
E00441          .        26%        37%        37%
E00587        17%        20%        32%        31%
E00598        21%        42%        32%         5%

Creating Variables with Arrays

Page 104: SAS Training - 101

Calculate the difference in each employee's actual contribution from one quarter to the next.

Partial Listing of prog2.donate

ID       Qtr1   Qtr2   Qtr3   Qtr4
E00224     12     33     22      .
E00367     35     48     40     30

[Diagram: the first difference is Qtr2 minus Qtr1, the second is Qtr3 minus Qtr2, and the third is Qtr4 minus Qtr3.]

Creating Variables with Arrays

Page 105: SAS Training - 101

data change(drop=i);
   set prog2.donate;
   array Contrib{4} Qtr1-Qtr4;
   array Diff{3};
   do i=1 to 3;
      Diff{i}=Contrib{i+1}-Contrib{i};
   end;
run;

c07s3d2.sas

Creating Variables with Arrays

Page 106: SAS Training - 101

When i=1, the assignment in the DATA step change is Diff{1}=Contrib{2}-Contrib{1}; which is equivalent to Diff1=Qtr2-Qtr1;

Creating Variables with Arrays

Page 107: SAS Training - 101

When i=2, the assignment in the DATA step change is Diff{2}=Contrib{3}-Contrib{2}; which is equivalent to Diff2=Qtr3-Qtr2;

Creating Variables with Arrays

Page 108: SAS Training - 101

When i=3, the assignment in the DATA step change is Diff{3}=Contrib{4}-Contrib{3}; which is equivalent to Diff3=Qtr4-Qtr3;

Creating Variables with Arrays

Page 109: SAS Training - 101

proc print data=change noobs;
   var ID Diff1-Diff3;
run;

Partial PROC PRINT Output

ID       Diff1   Diff2   Diff3
E00224      21     -11       .
E00367      13      -8     -10
E00441       .      26       1
E00587       3      11      -1
E00598       4      -2      -5

Creating Variables with Arrays

Page 110: SAS Training - 101

Determine the difference between employee contributions and last year's average quarterly goals of $10, $15, $5, and $10 per employee.

data compare(drop=Qtr Goal1-Goal4);
   set prog2.donate;
   array Contrib{4} Qtr1-Qtr4;
   array Diff{4};
   array Goal{4} Goal1-Goal4 (10,15,5,10);
   do Qtr=1 to 4;
      Diff{Qtr}=Contrib{Qtr}-Goal{Qtr};
   end;
run;

Assigning Initial Values

Page 111: SAS Training - 101

proc print data=compare noobs;
   var ID Diff1 Diff2 Diff3 Diff4;
run;

Partial PROC PRINT Output

ID       Diff1   Diff2   Diff3   Diff4
E00224       2      18      17       .
E00367      25      33      35      20
E00441       .      48      84      80
E00587       6       4      25      19
E00598      -6      -7       1      -9

Assigning Initial Values

Page 112: SAS Training - 101

Proc SQL

Arrays / DO-END

Retain / First. Last.

Agenda – Module 3

Page 113: SAS Training - 101

_N_ and _ERROR_

» _N_ indicates the number of times SAS has looped through the DATA step (not necessarily equal to the observation number)

» _ERROR_ has a value of 1 if there is a data error for that observation and 0 if there isn’t

FIRST.variable and LAST.variable

» FIRST.variable and LAST.variable are available when using a BY statement in a DATA step.

» FIRST.variable has a value of 1 when SAS is processing an observation with the first occurrence of a new value for that variable, and a value of 0 for the other observations.

» Similarly, LAST.variable has a value of 1 for an observation with the last occurrence of a value for that variable, and 0 otherwise.

Using SAS Automatic Variables
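A minimal sketch of FIRST. and LAST. used to produce one summary row per BY group. The data set visits and the variables person, amount, n_visits and total are hypothetical.

```sas
/* Hypothetical data, already sorted by person */
data visits;
   input person amount;
   datalines;
1 10
1 20
2 5
2 5
2 15
;
run;

data totals (keep=person n_visits total);
   set visits;
   by person;
   if first.person then do;   /* reset counters at the start of each group */
      n_visits = 0;
      total = 0;
   end;
   n_visits + 1;              /* sum statements retain their values */
   total + amount;
   if last.person then output;   /* write one row per person */
run;

proc print data=totals noobs;
run;
```

If the input were not already in person order, a PROC SORT by person would be needed before the DATA step.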

Page 114: SAS Training - 101

data real_life;
   input person topicA;
cards;
1 0
1 1
3 -1
1 0
2 0
1 1
2 -1
2 -1
3 0
3 1
4 0
1 1
4 1
4 0
2 -1
4 0
4 0
1 -1
;
run;

The goal is to compare each observation with the previous and the next observation. If they are the same then flag the observation….

Use of Retain, first. and last.

Page 115: SAS Training - 101

We need to number the observations within each person. We will use first.person to do this, so we must first sort the data by person. Then we create the variable count, which enumerates the observations within each person.

proc sort data=real_life out=sort_real;
   by person;
run;

data count_real;
   set sort_real;
   retain count;
   by person;
   if first.person then count = 0;
   count = count + 1;
run;

proc print data=count_real noobs;
run;

….Using first.

Page 116: SAS Training - 101

We now convert the data set from long to wide.

data wide_real;
   set count_real;
   array AtopicA(6) topicA_1-topicA_6;
   retain topicA_1-topicA_6;
   by person;
   if first.person then do;
      do i = 1 to 6;
         AtopicA[i] = .;
      end;
   end;
   AtopicA(count) = topicA;     /* looping across values in the variable count */
   if last.person then output;  /* outputs only the last obs per person */
run;

proc print data=wide_real noobs;
   var person topicA_1-topicA_6;
run;

Note: We are using first.person and last.person, but we do not need to re-sort the data since it is already sorted by person.

….Use of both first. and last.

Page 117: SAS Training - 101

Now, let's find the people who have the same value for 3 observations in a row.

data three;
   set wide_real;
   array topic(6) topicA_1-topicA_6;
   do i = 2 to 5;
      if topic[i-1] ne . & topic[i] ne . & topic[i+1] ne .
         & topic[i]=topic[i-1] & topic[i]=topic[i+1] then flagA=1;
   end;
   if flagA=. then flagA=0;
run;

proc print data=three noobs;
   var person topicA_1-topicA_6 flagA;
run;

….Use of both first. and last.

Page 118: SAS Training - 101


Thank you !

© 2006, Cognizant Technology Solutions. Confidential