Structuring SAS Data Sets and Working with Multiple Observations per Subject

Timothy Forsyth, Ashok Viswanathan, Debbie McCullough, Donn Garvert


Post on 22-Feb-2016


Page 1: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Structuring SAS Data Sets and Working with Multiple Observations per Subject

Timothy Forsyth, Ashok Viswanathan, Debbie McCullough, Donn Garvert

Page 2: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Why Restructure?

Some operations are more convenient when there is one observation per subject.
Example: regression analysis

Some operations are more convenient when there are several observations per subject.
Example: plotting individual observations as a time series

[Figure: Participant #206. Average positive mood and average negative mood (vertical axis, 0.0 to 3.0) plotted against day since injury (horizontal axis, days 1 to 16).]

Page 3: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Converting Data Set with One Observation per Subject to a Data Set with Several Observations per Subject

• First method: using arrays
  – Arrays can be used to transpose the data set
  – Gives the user more control

• Original data:

Page 4: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Converting Data Set with One Observation per Subject to a Data Set with Several Observations per Subject

• Transposing the entire data set

Page 5: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Converting a Data Set with Several Observations per Subject to a Data Set with One Observation per Subject

• Transposing the entire data set back to original form

Page 6: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Converting a Data Set with Several Observations per Subject to a Data Set with One Observation per Subject

• Key statements:
  • RETAIN: the new variables are not in the data set rainfall1, so we must retain their values; otherwise SAS resets them to missing at the start of each iteration of the DATA step.
  • CALL MISSING: sets any number of numeric or character variables to missing in a single call.
  • In our example, CALL MISSING resets all five month variables to missing at once, so if a data point is missing somewhere between month 1 and month 5 for a subject, that month stays missing. We use the OF keyword so that SAS applies the call to the whole list of 5 values.
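The code for this slide did not survive in the transcript. The statements described above could fit together roughly as follows. This is a minimal sketch, not the presenters' code: it assumes rainfall1 holds one observation per city and month, with a numeric month (1 to 5) and a variable rain, and the names rainfall1_sorted, rainfall_wide, and rain_by_month are hypothetical.

```sas
/* Sketch only: assumes rainfall1 has city, month (1-5), and rain. */
proc sort data=rainfall1 out=rainfall1_sorted;    /* hypothetical name */
   by city month;
run;

data rainfall_wide;                                /* hypothetical name */
   set rainfall1_sorted;
   by city;
   array rain_by_month{5} month_1-month_5;
   /* keep the month variables from one observation to the next */
   retain month_1-month_5;
   /* start each city with all five months missing */
   if first.city then call missing(of month_1-month_5);
   /* place this observation's value in the slot for its month */
   rain_by_month{month} = rain;
   /* write one observation per city once all its months are read */
   if last.city then output;
   keep city month_1-month_5;
run;
```

Without the CALL MISSING at the start of each city, a month that has no observation would carry over the retained value from the previous city.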

Page 7: Structuring SAS Data Sets and Working with Multiple Observations per Subject

PROC TRANSPOSE: Converting a Data Set with One Observation per Subject to a Data Set with Several Observations per Subject

• Without options, PROC TRANSPOSE produces output with default variable names (_NAME_ and COL1) that can be confusing, but the data will be correct:

Page 8: Structuring SAS Data Sets and Working with Multiple Observations per Subject

PROC TRANSPOSE: Converting a Data Set with One Observation per Subject to a Data Set with Several Observations per Subject

• When we add options to PROC TRANSPOSE we can get a cleaner-looking data set:
  • Renamed _NAME_ to "month"
  • Renamed COL1 to "rainfall"
  • Dropped subjects with missing data
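The options listed above might be combined as in the following sketch. It is an assumption-laden illustration rather than the slide's code: it assumes a wide data set named rainfall with variables city and month_1-month_5, and the names rainfall_sorted and rainfall_long are hypothetical. Here the WHERE= option drops individual observations whose transposed value is missing.

```sas
/* Sketch only: assumes rainfall has city and month_1-month_5. */
proc sort data=rainfall out=rainfall_sorted;      /* hypothetical name */
   by city;
run;

proc transpose data=rainfall_sorted
               out=rainfall_long (rename=(col1=rainfall)
                                  where=(rainfall is not missing))
               name=month;      /* names the _NAME_ column "month" */
   by city;
   var month_1-month_5;
run;
```

The NAME= option replaces the default _NAME_ column name, RENAME= replaces COL1, and WHERE= refers to the variable by its new name.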

Page 9: Structuring SAS Data Sets and Working with Multiple Observations per Subject

PROC TRANSPOSE: Converting a Data Set with Several Observations per Subject to a Data Set with One Observation per Subject

• Transposing the rainfall data back to original form

Page 10: Structuring SAS Data Sets and Working with Multiple Observations per Subject

PROC TRANSPOSE: Converting a Data Set with Several Observations per Subject to a Data Set with One Observation per Subject

• PREFIX= option: this was not used in the example, but it can be used as an option in this conversion
• Useful when your ID variable is numeric
• For example, if month were listed as 1, 2, 3, 4, 5 we could use PREFIX=Month
  ▪ This will result in the variables Month1, Month2, …, Month5
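A minimal sketch of the PREFIX= option in use, assuming a long data set rainfall1 with variables city, a numeric month (1 to 5), and rain. The output names rainfall1_sorted and rainfall_wide are hypothetical.

```sas
/* Sketch only: assumes rainfall1 has city, numeric month, and rain. */
proc sort data=rainfall1 out=rainfall1_sorted;    /* hypothetical name */
   by city month;
run;

proc transpose data=rainfall1_sorted
               out=rainfall_wide (drop=_name_)    /* hypothetical name */
               prefix=Month;
   by city;
   id month;     /* values 1-5 become the suffix of each new name */
   var rain;
run;
```

Each city becomes one observation. A city with no observation for a given month has a missing value in that month's variable.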

Page 11: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Chapter 24: Multiple Observations per Subject

Data sets often have two or more observations per subject (or other grouping):

• Patients who make repeated visits to the doctor's office or clinic
• Sales at a store on a given day
• Inches of rainfall over many months in a given city (current example)

This is known as longitudinal data.

SAS processes data one observation at a time, so special techniques are needed to perform calculations across observations.

I will cover:

1) Identifying the first and last observation in a group.

2) A few different ways to count the number of occurrences of a subject (or other grouping). In our example we will count, for a given city, the number of months that had their average rainfall recorded.

Page 12: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Goal: Create a Variable That Counts the Number of Times Each City Has Had the Rainfall Recorded in a Five-Month Period

Listing of SAS Data Set Rainfall

City         Rainfall   Ave Temp   Average Sunlight
Alameda         5.8       53.6           8.9
Fremont         5.2       52.9           8.7
Fremont         4.5       57.6           9.5
Fremont         3.3       59.7          10.1
Fremont         2.2       60.6          11.0
Fremont         1.8       63.9          11.5
Hayward         1.4       60.4           9.3
Hayward         3.6       57.8          10.1
Hayward         2.0       62.5          10.2
Hayward         4.1       55.2          10.8
Hayward         5.7       53.6          11.2
Oakland         3.1       59.2           9.5
Oakland         3.4       60.1          10.1
Sunnyvale       5.7       54.2           9.4
Sunnyvale       3.4       56.8           9.8
Sunnyvale       4.9       55.2          10.3
Sunnyvale       5.0       52.9          10.7

Three-step process:

Step 1) Sort the data first by the grouping variable (City) and then by the counting variable (Rainfall).

Step 2) Create First. and Last. variables.

Step 3) Count the months of recorded rainfall using either a DATA step or proc sql.

Page 13: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Step One: proc sort. Sort by Grouping Variable and Then by Counting Variable

Before sorting:

city         Rainfall
Alameda         5.8
Fremont         5.2
Fremont         4.5
Fremont         3.3
Fremont         2.2
Fremont         1.8
Hayward         1.4
Hayward         3.6
Hayward         2.0
Hayward         4.1
Hayward         5.7
Oakland         3.1
Oakland         3.4
Sunnyvale       5.7
Sunnyvale       3.4
Sunnyvale       4.9
Sunnyvale       5.0

  proc sort data=rainfall5;
     by city rainfall;
  run;

  proc print data=rainfall5;
     var city rainfall;
  run;

After sorting:

city         Rainfall
Alameda         5.8
Fremont         1.8
Fremont         2.2
Fremont         3.3
Fremont         4.5
Fremont         5.2
Hayward         1.4
Hayward         2.0
Hayward         3.6
Hayward         4.1
Hayward         5.7
Oakland         3.1
Oakland         3.4
Sunnyvale       3.4
Sunnyvale       4.9
Sunnyvale       5.0
Sunnyvale       5.7

Data set rainfall5 has been sorted first by city and second by rainfall.

Page 14: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Step Two: Create First. and Last. Variables

  data rainfall_last rainfall_first;
     set rainfall5;
     by city;
     if last.city then output rainfall_last;
     else if first.city then output rainfall_first;
  run;

  proc print data=rainfall_first;
  run;

Listing of First_City

city         Rainfall
Fremont         1.8
Hayward         1.4
Oakland         3.1
Sunnyvale       3.4

A SET statement followed by a BY statement (here, by city;) creates two temporary SAS variables, first.var and last.var. These are logical variables: they equal 1 if true and 0 if false. In our case SAS generates first.city and last.city. Because of the ELSE, Alameda, whose single observation is both first and last, is written only to rainfall_last, which is why it does not appear in the listing above.

Page 15: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Logical Variables first.var and last.var

  data rainfall_last rainfall_first;
     set rainfall5;
     by city;
     put city= rainfall= first.city= last.city=;
     if last.city then output rainfall_last;
     else if first.city then output rainfall_first;
  run;

Contents from the log:

  city=Alameda Rainfall=5.8 FIRST.city=1 LAST.city=1
  city=Fremont Rainfall=1.8 FIRST.city=1 LAST.city=0
  city=Fremont Rainfall=2.2 FIRST.city=0 LAST.city=0
  city=Fremont Rainfall=3.3 FIRST.city=0 LAST.city=0
  city=Fremont Rainfall=4.5 FIRST.city=0 LAST.city=0
  city=Fremont Rainfall=5.2 FIRST.city=0 LAST.city=1
  city=Hayward Rainfall=1.4 FIRST.city=1 LAST.city=0
  city=Hayward Rainfall=2 FIRST.city=0 LAST.city=0

Observation 1 for Alameda is both the first and the last observation for that city, so first.city=1 (true) and last.city=1 (true). Observation 1 for Fremont has first.city=1 and last.city=0, and so on.

Page 16: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Step Three: Use Data Step to Count the Number of Months of Recorded Rainfall

  data months_of_rec_rainfall;
     set rainfall5;
     by city;
     if first.city then N_months_rec = 0;
     N_months_rec + 1;
     if last.city then output;
  run;

  title 'Listing of Counts';
  proc print data=months_of_rec_rainfall label noobs;
     label N_months_rec = '# of Months Recorded';
     var City Rainfall N_months_rec;
  run;

Listing of Counts

City         Rainfall   # of Months Recorded
Alameda         5.8              1
Fremont         5.2              5
Hayward         5.7              5
Oakland         3.4              2
Sunnyvale       5.7              4

Predicted (actual) counts: Alameda 1 (1), Fremont 5 (5), Hayward 5 (5), Oakland 2 (2), Sunnyvale 4 (4).

Page 17: Structuring SAS Data Sets and Working with Multiple Observations per Subject

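A Look at the Previous SAS Code

  data months_of_rec_rainfall;
     set rainfall5;
     by city;
     if first.city then N_months_rec = 0;   /* 1) */
     N_months_rec + 1;                      /* 2) */
     if last.city then output;              /* 3) */
  run;

  title 'Listing of Counts';
  proc print data=months_of_rec_rainfall label noobs;
     label N_months_rec = '# of Months Recorded';
     var City Rainfall N_months_rec;
  run;

1) Initialize the counter at zero at the start of each city.
2) Sum statement: adds 1 to the counter and automatically retains its value from one observation to the next.
3) Conditional statement: writes one observation per city, at the city's last observation, when the count is complete.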

Page 18: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Counting the Number of Months with Recorded Rainfall Using proc sql

  proc sql;
     create table Months_Rec_Rain as
     select city, count(city) as months_rec_rain
     from rainfall5
     group by city;
  quit;

city         months_rec_rain
Alameda             1
Fremont             5
Hayward             5
Oakland             2
Sunnyvale           4

The proc sql gives the same result as the data step.

Page 19: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Summary Chapter 24: Part One

It was shown, via a three step process, how to create a variable to count the number of occurrences of a grouping variable. The example shown dealt with the number of months of recorded rainfall in a given city. These techniques have utility in the medical or clinical setting.

Page 20: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Preparing the Data: Converting Data Set to Longitudinal Data (An Alternative Method)

SAS CODE

  /* create longitudinal data, eliminating missing values */
  data rainfall1;
     set rainfall;
     array rainfall_array{5} month_1-month_5;
     array temp_array{5} temp_1-temp_5;
     array hours_array{5} hours_1-hours_5;
     do month = 1 to 5;
        /* LEAVE exits the loop at the first missing month, so any
           later months for that city are dropped as well */
        if missing(rainfall_array{month}) then leave;
        Rain = rainfall_array{month};
        AveTemp = temp_array{month};
        AverageSunlight = hours_array{month};
        output;
     end;
     keep city Rain month AveTemp AverageSunlight;
  run;

  proc print data=rainfall1;
  run;

SAS OUTPUT

Obs   city       month   Rain   AveTemp   Average Sunlight
  1   Hayward      1      1.4     60.4          9.3
  2   Hayward      2      3.6     57.8         10.1
  3   Hayward      3      2.0     62.5         10.2
  4   Hayward      4      4.1     55.2         10.8
  5   Hayward      5      5.7     53.6         11.2
  6   Oakland      1      3.1     59.2          9.5
  7   Oakland      2      3.4     60.1         10.1
  8   Alameda      1      5.8     53.6          8.9
  9   Sunnyval     1      5.7     54.2          9.4
 10   Sunnyval     2      3.4     56.8          9.8
 11   Sunnyval     3      4.9     55.2         10.3
 12   Sunnyval     4      5.0     52.9         10.7
 13   Fremont      1      5.2     52.9          8.7
 14   Fremont      2      4.5     57.6          9.5
 15   Fremont      3      3.3     59.7         10.1
 16   Fremont      4      2.2     60.6         11.0
 17   Fremont      5      1.8     63.9         11.5

Page 21: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Counting the Number of Months That Data Has Been Recorded for Each City

A FEW POINTS TO NOTE:

A proc freq approach can be used in addition to a proc means approach

We present here the proc means approach

Both ways can give us the same result/output

CODE

  proc means data=rainfall1 nway noprint;
     class city;
     output out=counts (rename=(_freq_=N_Recorded)
                        drop=_type_);
  run;

  proc print data=counts;
  run;

Things to notice: nway, noprint, rename
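The slides mention a proc freq approach but do not show it. A minimal sketch of what it might look like follows, assuming the same rainfall1 data set; the output name counts_freq is hypothetical. The OUT= data set from a TABLES statement contains the variables COUNT and PERCENT, so COUNT is renamed to match the proc means example.

```sas
/* Sketch only: counts_freq is a hypothetical output name. */
proc freq data=rainfall1 noprint;
   tables city / out=counts_freq (rename=(count=N_Recorded)
                                  drop=percent);
run;

proc print data=counts_freq;
run;
```

This yields one observation per city, with N_Recorded holding the number of observations for that city.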

Page 22: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Output

The SAS System          14:46 Sunday, May 30, 2010

Obs   city       N_Recorded   _STAT_     month      Rain     AveTemp   Average Sunlight
  1   Alameda        1         N        1.00000   1.00000    1.0000        1.0000
  2   Alameda        1         MIN      1.00000   5.80000   53.6000        8.9000
  3   Alameda        1         MAX      1.00000   5.80000   53.6000        8.9000
  4   Alameda        1         MEAN     1.00000   5.80000   53.6000        8.9000
  5   Alameda        1         STD       .         .          .             .
  6   Fremont        5         N        5.00000   5.00000    5.0000        5.0000
  7   Fremont        5         MIN      1.00000   1.80000   52.9000        8.7000
  8   Fremont        5         MAX      5.00000   5.20000   63.9000       11.5000
  9   Fremont        5         MEAN     3.00000   3.40000   58.9400       10.1600
 10   Fremont        5         STD      1.58114   1.45430    4.0685        1.1261
 11   Hayward        5         N        5.00000   5.00000    5.0000        5.0000
 12   Hayward        5         MIN      1.00000   1.40000   53.6000        9.3000
 13   Hayward        5         MAX      5.00000   5.70000   62.5000       11.2000
 14   Hayward        5         MEAN     3.00000   3.36000   57.9000       10.3200
 15   Hayward        5         STD      1.58114   1.71552    3.6469        0.7259
 16   Oakland        2         N        2.00000   2.00000    2.0000        2.0000
 17   Oakland        2         MIN      1.00000   3.10000   59.2000        9.5000
 18   Oakland        2         MAX      2.00000   3.40000   60.1000       10.1000
 19   Oakland        2         MEAN     1.50000   3.25000   59.6500        9.8000
 20   Oakland        2         STD      0.70711   0.21213    0.6364        0.4243
 21   Sunnyval       4         N        4.00000   4.00000    4.0000        4.0000
 22   Sunnyval       4         MIN      1.00000   3.40000   52.9000        9.4000
 23   Sunnyval       4         MAX      4.00000   5.70000   56.8000       10.7000
 24   Sunnyval       4         MEAN     2.50000   4.75000   54.7750       10.0500
 25   Sunnyval       4         STD      1.29099   0.96782    1.6460        0.5686

Page 23: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Counting the Differences Between Values in the Longitudinal Data Set

We would like to see the differences in values by city and by month for the following variables:

• The average rainfall recorded (given by 'Rain')
• The average temperature recorded (given by 'AveTemp')
• The average sunlight recorded (given by 'AverageSunlight')

Page 24: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Preparing data for handling differences being analyzed

  proc sort data=rainfall1 out=rainfall1;
     by city month;
  run;

  data last;
     set rainfall1;
     by city;
     put city= month= first.city= last.city=;
     if last.city;
  run;

Page 25: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Log File for the Code Using first.city and last.city

Log file:

  NOTE: There were 17 observations read from the data set WORK.RAINFALL1.
  NOTE: The data set WORK.RAINFALL1 has 17 observations and 5 variables.
  NOTE: PROCEDURE SORT used (Total process time):
        real time           0.01 seconds
        cpu time            0.00 seconds

  61
  62   data last;
  63   set rainfall1;
  64   by city;
  65   put city=month=first.city = last.city=;
  66   if last.city;
  67   run;

  city=Alameda month=1 FIRST.city=1 LAST.city=1
  city=Fremont month=1 FIRST.city=1 LAST.city=0
  city=Fremont month=2 FIRST.city=0 LAST.city=0
  city=Fremont month=3 FIRST.city=0 LAST.city=0
  city=Fremont month=4 FIRST.city=0 LAST.city=0
  city=Fremont month=5 FIRST.city=0 LAST.city=1
  city=Hayward month=1 FIRST.city=1 LAST.city=0
  city=Hayward month=2 FIRST.city=0 LAST.city=0
  city=Hayward month=3 FIRST.city=0 LAST.city=0
  city=Hayward month=4 FIRST.city=0 LAST.city=0
  city=Hayward month=5 FIRST.city=0 LAST.city=1
  city=Oakland month=1 FIRST.city=1 LAST.city=0
  city=Oakland month=2 FIRST.city=0 LAST.city=1
  city=Sunnyval month=1 FIRST.city=1 LAST.city=0
  city=Sunnyval month=2 FIRST.city=0 LAST.city=0
  city=Sunnyval month=3 FIRST.city=0 LAST.city=0
  city=Sunnyval month=4 FIRST.city=0 LAST.city=1
  NOTE: There were 17 observations read from the data set WORK.RAINFALL1.
  NOTE: The data set WORK.LAST has 5 observations and 5 variables.
  NOTE: DATA statement used (Total process time):
        real time           0.01 seconds
        cpu time            0.01 seconds

Page 26: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Capturing Differences Generated From Observed Values by City and Month

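  data difference;
     set rainfall1;
     by city;
     /* cities with a single observation have no difference to compute */
     if first.city and last.city then delete;
     /* LAG runs on every remaining observation, so it returns the
        value from the previous observation */
     Diff_Rain = rain - lag(rain);
     Diff_AveTemp = AveTemp - lag(AveTemp);
     Diff_AverageSunlight = AverageSunlight - lag(AverageSunlight);
     /* the first observation of a city would be compared with the
        previous city, so it is not written out */
     if not first.city then output;
  run;

  proc print data=difference;
  run;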

Page 27: Structuring SAS Data Sets and Working with Multiple Observations per Subject

A Complete Set of Differences Generated – Resulting Output

The SAS System          14:46 Sunday, May 30, 2010

Obs   city       month   Rain   AveTemp   Average Sunlight   Diff_Rain   Diff_AveTemp   Diff_AverageSunlight
  1   Fremont      2      4.5     57.6          9.5             -0.7          4.7               0.8
  2   Fremont      3      3.3     59.7         10.1             -1.2          2.1               0.6
  3   Fremont      4      2.2     60.6         11.0             -1.1          0.9               0.9
  4   Fremont      5      1.8     63.9         11.5             -0.4          3.3               0.5
  5   Hayward      2      3.6     57.8         10.1              2.2         -2.6               0.8
  6   Hayward      3      2.0     62.5         10.2             -1.6          4.7               0.1
  7   Hayward      4      4.1     55.2         10.8              2.1         -7.3               0.6
  8   Hayward      5      5.7     53.6         11.2              1.6         -1.6               0.4
  9   Oakland      2      3.4     60.1         10.1              0.3          0.9               0.6
 10   Sunnyval     2      3.4     56.8          9.8             -2.3          2.6               0.4
 11   Sunnyval     3      4.9     55.2         10.3              1.5         -1.6               0.5
 12   Sunnyval     4      5.0     52.9         10.7              0.1         -2.3               0.4

Page 28: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Capturing the Differences of the First and Last Observations for a Given City

  data first_last;
     set rainfall1;
     by city;
     if first.city and last.city then delete;
     /* LAG runs only on the first and last observation of each city,
        so at the last observation it returns the first one's value */
     if first.city or last.city then do;
        diff_rain = rain - lag(rain);
        diff_temp = avetemp - lag(avetemp);
        diff_sunlight = averagesunlight - lag(averagesunlight);
     end;
     if last.city then output;
  run;

  proc print data=first_last;
  run;

 

Page 29: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Capturing the Differences of the First and Last Observations for a Given City

Obs   city       month   Rain   AveTemp   Average Sunlight   diff_rain   diff_temp   diff_sunlight
  1   Fremont      5      1.8     63.9         11.5             -3.4        11.0          2.8
  2   Hayward      5      5.7     53.6         11.2              4.3        -6.8          1.9
  3   Oakland      2      3.4     60.1         10.1              0.3         0.9          0.6
  4   Sunnyval     4      5.0     52.9         10.7             -0.7        -1.3          1.3

Page 30: Structuring SAS Data Sets and Working with Multiple Observations per Subject

The RETAIN Statement

• Using a RETAIN statement is one of the best ways to "remember" values from previous observations

• Variables that do not come from SAS data sets are set to a missing value during each iteration of the DATA step

• A RETAIN statement allows you to tell SAS not to do this
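A small self-contained illustration of the difference, using made-up values read with DATALINES. The data set and variable names (retain_demo, running_total, not_retained) are hypothetical and not part of the slides.

```sas
/* Illustration only: the three values below are made up. */
data retain_demo;
   input x @@;
   retain running_total 0;                 /* kept across iterations */
   running_total = running_total + x;
   /* not_retained is reset to missing at the start of every
      iteration, and missing + x is missing */
   not_retained = not_retained + x;
   datalines;
1 2 3
;
run;

proc print data=retain_demo noobs;
run;
```

In the printed result running_total accumulates (1, 3, 6), while not_retained is missing on every observation and the log notes that missing values were generated.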

Page 31: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Using a RETAIN Statement to Compute Difference Between First and Last Observation

  data new_data_set;
     set old_data_set;
     by ID;
     if first.ID and last.ID then delete;

     retain First_Var_1 First_Var_2 … First_Var_n;

     if first.ID then do;
        First_Var_1 = Var_1;
        First_Var_2 = Var_2;
        …
        First_Var_n = Var_n;
     end;

     if last.ID then do;
        Diff_Var_1 = Var_1 - First_Var_1;
        Diff_Var_2 = Var_2 - First_Var_2;
        …
        Diff_Var_n = Var_n - First_Var_n;
        output;
     end;
     drop First_: ;
  run;

Need to sort by the ID, or whichever variable you are grouping by!

The variables named in the RETAIN statement are not set back to missing values at the start of each iteration.

Page 32: Structuring SAS Data Sets and Working with Multiple Observations per Subject

What Happens?

The RETAIN statement ensures your variables are not set back to missing values.

When processing the first observation in a BY group, the retained variables are set to the respective variable values.

At the last observation in the BY group, the retained first values are subtracted from the respective last values.

Page 33: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Example: Computing Differences Between the First and Last Observation in a BY Group Using RETAIN Statement

  proc sort data=rainfall;
     by City Month;
  run;

  data last;
     set rainfall;
     by City;
     put City= Month= First.City= Last.City=;
     if last.City;
  run;

  data first_last;
     set rainfall;
     by City;
     if first.city and last.city then delete;

     retain First_Rainfall First_Temp First_Hours;

     if first.City then do;
        First_Rainfall = Rainfall;
        First_Temp = Temp;
        First_Hours = Hours;
     end;

     if last.City then do;
        Diff_Rainfall = Rainfall - First_Rainfall;
        Diff_Temp = Temp - First_Temp;
        Diff_Hours = Hours - First_Hours;
        output;
     end;
     drop First_: ;
  run;

Page 34: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Our Output

Displaying the RETAIN Statement

City         Month   Rainfall   Temp   Hours   Diff_Rainfall   Diff_Temp   Diff_Hours
Fremont        5       1.8      63.9   11.5        -3.4           11.0         2.8
Hayward        5       5.7      53.6   11.2         4.3           -6.8         1.9
Oakland        5       3.6      60.3   10.8         0.5            1.1         1.3
Sunnyvale      4       5.0      52.9   10.7        -0.7           -1.3         1.3

Page 35: Structuring SAS Data Sets and Working with Multiple Observations per Subject

How does RETAIN compare to LAG?

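Output from the LAG function:

city         month   Rain   Temp   Sunlight   diff_rain   diff_temp   diff_Sunlight
Fremont        5      1.8   63.9     11.5        -3.4        11.0          2.8
Hayward        5      5.7   53.6     11.2         4.3        -6.8          1.9
Oakland        2      3.4   60.1     10.1         0.3         0.9          0.6
Sunnyvale      4      5.0   52.9     10.7        -0.7        -1.3          1.3

Output from the RETAIN statement:

City         Month   Rainfall   Temp   Hours   Diff_Rainfall   Diff_Temp   Diff_Hours
Fremont        5       1.8      63.9   11.5        -3.4           11.0         2.8
Hayward        5       5.7      53.6   11.2         4.3           -6.8         1.9
Oakland        5       3.6      60.3   10.8         0.5            1.1         1.3
Sunnyvale      4       5.0      52.9   10.7        -0.7           -1.3         1.3

The two approaches agree for every city except Oakland. That difference comes from the input data rather than the technique: the LAG example reads rainfall1, whose last Oakland observation is month 2, while the RETAIN example reads rainfall, which also has an Oakland observation for month 5.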

Page 36: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Using Retained Variable to “Remember” a Previous Value

Suppose you want to know whether a certain variable value is the maximum of all of your observations.

The RETAIN statement lets us find this easily by preserving a variable's value from the previous iteration.

Page 37: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Example: Finding the Maximum Rainfall, Temperature and Hours of Daylight of all Observations

  data Maximums;
     set rainfall;
     retain Max_Rainfall Max_Temp Max_Hours;
     /* MAX ignores missing values, so the first observation
        initializes each running maximum */
     Max_Rainfall = max(Max_Rainfall, Rainfall);
     Max_Temp = max(Max_Temp, Temp);
     Max_Hours = max(Max_Hours, Hours);
  run;

  title "Displaying the RETAIN Statement- Finding Maximums";

  proc print data=Maximums noobs;
  run;

Page 38: Structuring SAS Data Sets and Working with Multiple Observations per Subject

Our Output

Displaying the RETAIN Statement- Finding Maximums

city         Month   Rainfall   Temp   Hours   Max_Rainfall   Max_Temp   Max_Hours
Alameda        1       5.8      53.6    8.9        5.8          53.6        8.9
Fremont        1       5.2      52.9    8.7        5.8          53.6        8.9
Fremont        2       4.5      57.6    9.5        5.8          57.6        9.5
Fremont        3       3.3      59.7   10.1        5.8          59.7       10.1
Fremont        4       2.2      60.6   11.0        5.8          60.6       11.0
Fremont        5       1.8      63.9   11.5        5.8          63.9       11.5
Hayward        1       1.4      60.4    9.3        5.8          63.9       11.5
Hayward        2       3.6      57.8   10.1        5.8          63.9       11.5
Hayward        3       2.0      62.5   10.2        5.8          63.9       11.5
Hayward        4       4.1      55.2   10.8        5.8          63.9       11.5
Hayward        5       5.7      53.6   11.2        5.8          63.9       11.5
Oakland        1       3.1      59.2    9.5        5.8          63.9       11.5
Oakland        2       3.4      60.1   10.1        5.8          63.9       11.5
Oakland        5       3.6      60.3   10.8        5.8          63.9       11.5
Sunnyvale      1       5.7      54.2    9.4        5.8          63.9       11.5
Sunnyvale      2       3.4      56.8    9.8        5.8          63.9       11.5
Sunnyvale      3       4.9      55.2   10.3        5.8          63.9       11.5
Sunnyvale      4       5.0      52.9   10.7        5.8          63.9       11.5
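The retained variables hold a running maximum, so only the last observation carries the maximum over the whole data set. If just that final row is wanted, one possible variation (a sketch, not part of the original slides) uses the END= option of the SET statement; the names overall_max and last_obs are hypothetical.

```sas
/* Sketch only: assumes rainfall has Rainfall, Temp, and Hours. */
data overall_max;                       /* hypothetical name */
   set rainfall end=last_obs;           /* last_obs = 1 on the final row */
   retain Max_Rainfall Max_Temp Max_Hours;
   Max_Rainfall = max(Max_Rainfall, Rainfall);
   Max_Temp     = max(Max_Temp, Temp);
   Max_Hours    = max(Max_Hours, Hours);
   if last_obs then output;             /* keep only the final maximums */
   keep Max_Rainfall Max_Temp Max_Hours;
run;

proc print data=overall_max noobs;
run;
```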