SAS® 101
Based on
Learning SAS by Example:
A Programmer's Guide
Chapters 21, 22, & 23
By Tasha Chapman, Oregon Health Authority
Topics covered…
All the leftovers!
Infile options
Missover
LRECL=/Pad/Truncover
FirstObs=/OBS=
Topics covered…
Advanced formats and informats
User created informats
Reading both character and numeric data
Formats within formats
Saving formats
Using formats as look-up tables
CNTLIN and CNTLOUT datasets
Topics covered…
Transposing data
PROC Transpose
Other topics – Saving and storing macros
%Include
Autocall library
Stored compiled macros
Infile options
PROC Import with a twist (redux)
Run PROC Import
Copy the SAS log to the Program Editor
PROC Import will create a DATA step with INFILE and
INPUT statements in the log
Delete any non-SAS code
Modify informats, formats, and lengths (as needed)
Run the new code
From Week 3 – Chapters 5 & 6
Infile statement options
From Week 3 – Chapters 5 & 6
Common INFILE options
Option      Purpose
dsd         Stands for delimiter-sensitive data. Changes the default delimiter from blank to comma. Two consecutive delimiters are treated as a missing value between them. Quotes are stripped from character values.
dlm=        Stands for delimiter. Specifies alternate delimiter(s).
missover    At the end of an input line of raw data, sets the remaining variables to missing if there are more variables than data values.
lrecl=      Stands for logical record length. Specifies the record length of the raw data file (necessary if records are longer than the default, which was 256 bytes in releases before SAS 9.4).
pad         Pads the input records with blanks out to the end of the logical record length.
truncover   Essentially has the effect of both the PAD and MISSOVER options combined.
firstobs=   Specifies which record number in the raw data file is the first observation of data. Useful if the raw data includes headers.
obs=        Specifies the record number of the last record to read. Useful if you only want to read a select number of observations.
MISSOVER
Missing data results in a shorter-than-expected line.
The MISSOVER option fills in the blanks at the end of the line with missing values.
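A minimal sketch of the short-line situation described above, using in-stream data so it runs on its own. The dataset name work.scores, the variable names, and the values are hypothetical; only the DSD and MISSOVER options come from the slides.

```sas
/* Hypothetical data: the second record is missing its last value */
data work.scores;
   infile datalines dsd missover;
   input Name :$10. Test1 Test2 Test3;
datalines;
Ann,90,85,88
Bob,75,80
Cy,60,70,65
;
run;

/* Bob's Test3 is set to missing instead of being read from the next line */
proc print data=work.scores;
run;
```

Without MISSOVER, SAS would move on to the next record to look for a Test3 value for Bob, so the third record would not be read correctly.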
FIRSTOBS=
First row contains header information.
The FIRSTOBS=2 option starts reading the raw data file on the second row.
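A short sketch of FIRSTOBS= and OBS= working together, again with in-stream data. The dataset name, the column names, and the extra Bend row are made up for illustration.

```sas
/* Record 1 is a header row; OBS=4 stops reading after record 4 */
data work.cities;
   infile datalines dsd firstobs=2 obs=4;
   input City :$10. High Low;
datalines;
City,High,Low
Eugene,68,46
Portland,62,44
Salem,65,45
Bend,70,38
;
run;

/* Records 2 through 4 are read: Eugene, Portland, and Salem */
proc print data=work.cities;
run;
```

Note that OBS= is the record number of the last record to read, not a count of observations, so FIRSTOBS=2 with OBS=4 yields three observations.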
Advanced formats and informats
PROC Format (redux)
value $gender
The VALUE statement begins a new format
Can create more than one format per PROC Format
$gender is the name of the new format
The format name begins with a $ to indicate that the format is to be applied to character data
Each entry maps an input value (left of the equals sign) to an output value (right of the equals sign)
From Week 3 – Chapters 5 & 6
What are informats? (redux)
Informats are instructions that tell SAS how to read a data value
Can be as simple as w.d
3.1 tells SAS to read '123' as 12.3
$3. tells SAS to read '123' as '123' and store it as character data
Excellent for reading dates, dollars, and percents
MMDDYY8. tells SAS to read '12/26/07' and store it as 17526 (a SAS date that can be used for calculations, etc.)
From Week 3 – Chapters 5 & 6
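The three informats above can be tried without a raw data file. This sketch uses the INPUT function rather than an INPUT statement (an assumption made only to keep the example self-contained); the variable names are hypothetical.

```sas
data _null_;
   Num     = input('123', 3.1);            /* implied decimal place: 12.3 */
   Chr     = input('123', $3.);            /* stays character: 123 */
   SasDate = input('12/26/07', mmddyy8.);  /* SAS date, assuming the default YEARCUTOFF= */
   /* Write the raw values, then the date again with a readable format */
   put Num= Chr= SasDate=;
   put SasDate=date9.;
run;
```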
Creating informats
invalue score
The INVALUE statement creates informats
A dollar sign ($) indicates the informat will create character variables (i.e., the output value will be character)
Absence of a dollar sign indicates the informat will create numeric variables (as in this example)
Each entry maps an input value to an output value
Creating informats
Survey scale entered as character values
(SA = Strongly Agree, A = Agree, etc.)
Want to convert to numeric Likert-type scale
Creating informats
Use PROC Format to create the informat score
Apply the informat while reading in the raw data
Creating informats
UPCASE option
Converts all input strings to uppercase before they are compared to ranges
JUST option
Left justifies all input strings before they are compared to ranges
Useful options as raw data may be messy (mixed case, leading blanks, etc.)
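A sketch of the survey example, assuming a five-point scale. Only SA and A are spelled out on the slide; the codes N, D, and SD, their numeric values, and the dataset and variable names are assumptions.

```sas
proc format;
   /* UPCASE and JUST clean up mixed case and leading blanks before matching */
   invalue score (upcase just)
      'SA'  = 5
      'A'   = 4
      'N'   = 3
      'D'   = 2
      'SD'  = 1
      other = .;
run;

data work.survey;
   input ID $ Q1 :score. Q2 :score.;
datalines;
001 SA a
002 d N
003 sd X
;
run;

/* Q1 and Q2 are numeric; the unrecognized response X becomes missing */
proc print data=work.survey;
run;
```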
Creating informats
Dataset of patient temperature readings
Normal temperature coded as “N”
Actual temperature entered if not normal
Both character and numeric data in same field
Creating informats
Use PROC Format to create the informat tempfmt
Numeric temperatures within valid range will be read as written.
“N” will be converted to 98.6.
Any other values (including numeric temps outside valid range) will be converted to missing.
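One way the tempfmt informat might be written. The valid range of 96 to 106 is an assumption (the slide does not state it), as are the dataset name and sample readings.

```sas
proc format;
   invalue tempfmt (upcase)
      96 - 106 = _same_   /* assumed valid range: keep the value as written */
      'N'      = 98.6     /* normal temperature */
      other    = .;       /* anything else becomes missing */
run;

data work.temperatures;
   input Temp :tempfmt. @@;
datalines;
101 N 97.3 n 67 104.5
;
run;

proc print data=work.temperatures;
run;
```

Here the reading 67 falls outside the assumed range and is stored as missing, while both N and n are stored as 98.6 because of the UPCASE option.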
Formats within formats
Formats and informats can be nested within each other
Useful for applying multiple types of formats (e.g. picture and value formats) to the same variable depending on the data value
Formats within formats
Phone directory dataset
Some provided full phone numbers
Want to show as (999) 999-9999
Some provided extensions
Want to show as x9999
Some have no phone number
Want to show as “Unlisted”
Formats within formats
(503) 373-1793
x1793
Applies format based on data value
Saving and storing formats
PROC Format saves user-created formats to a catalog
Usually these catalogs are in the WORK library, and are deleted at the end of each SAS session
However, formats can easily be saved and stored in other, permanent libraries
Saving and storing formats
Save a format to a permanent library using the library= option
This will create a formats catalog (called “formats” by default) in the mylib folder
Saving and storing formats
To use the saved formats in another program, use the fmtsearch= option to add that catalog to the list of available formats
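A sketch of saving a format and finding it again. To keep it runnable anywhere, the mylib folder here is a scratch subfolder of WORK created through the DLCREATEDIR option; that path, the subfolder name myfmts, and the $gender values are assumptions, and in practice mylib would point at a permanent folder.

```sas
/* Stand-in location: a real program would use a permanent folder */
options dlcreatedir;
libname mylib "%sysfunc(pathname(work))/myfmts";

/* LIBRARY= writes the format to the catalog MYLIB.FORMATS */
proc format library=mylib;
   value $gender 'M' = 'Male'
                 'F' = 'Female';
run;

/* In a later program: assign the libref again, then add it to the search list */
options fmtsearch=(mylib);

data _null_;
   g = 'F';
   put g=$gender.;
run;
```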
CNTLIN/CNTLOUT
Can create a format from a dataset using the CNTLIN= option in PROC Format
Can create a dataset from a format using the CNTLOUT= option in PROC Format
CNTLIN/CNTLOUT
Have a dataset of ICD9 codes and descriptions
Want to convert this to a SAS format
CNTLIN/CNTLOUT
The input dataset has to have specific variables:
FMTNAME – name of the format
START – the single value to be formatted (or start value if the beginning of a range of values)
END (optional) – the end value of a range of values to be formatted
LABEL – the formatted value
TYPE (optional) – type of format, C for character, N for numeric
CNTLIN/CNTLOUT
Original dataset
Ready to be made into a format
CNTLIN/CNTLOUT
Use PROC Format to convert the dataset to a format
CNTLIN/CNTLOUT
Use PROC Format to convert a format to a dataset
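The CNTLIN and CNTLOUT steps above might look like the following. The three ICD9 rows, the dataset names, and the format name icd9fmt are illustrative assumptions, not the data shown on the slides.

```sas
/* Hypothetical lookup data: code and description */
data work.icd9;
   infile datalines dsd truncover;
   input Code :$5. Description :$40.;
datalines;
250,Diabetes mellitus
401,Essential hypertension
493,Asthma
;
run;

/* Rename to the variable names PROC Format expects and add FMTNAME and TYPE */
data work.icd9_cntlin;
   set work.icd9 (rename=(Code=Start Description=Label));
   retain FmtName 'icd9fmt' Type 'C';
run;

/* Dataset to format */
proc format cntlin=work.icd9_cntlin;
run;

/* Format back to dataset */
proc format cntlout=work.icd9_out;
   select $icd9fmt;
run;

/* Use the new format as a look-up table */
data _null_;
   Code = '401';
   put Code=$icd9fmt.;
run;
```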
Transposing data
Transposing data
Transposing is converting variables to observations and vice versa
Multiple ways of restructuring and transposing data
PROC Transpose
DATA step – Arrays and DO Loops
Transposing data – basic example
Input (original) dataset
Two variables
Seven observations
Output (transposed) dataset
Seven variables
Two observations
Transposing data – basic example
data= Input (original) dataset
out= Output (transposed) dataset
Transposing data – basic example
Name of transposed variables stored in “_NAME_” column
Transposing data – basic example
New variables generically named COL1, COL2, etc.
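A minimal sketch of the basic example: two variables and seven observations in, two observations out. The dataset names are hypothetical, and the values are borrowed from the Eugene rows that appear later in this deck.

```sas
data work.week;
   input DayOfWeek HighTemp;
datalines;
1 68
2 65
3 66
4 63
5 60
6 63
7 65
;
run;

/* No VAR or ID statement: every numeric variable becomes a row */
proc transpose data=work.week out=work.week_t;
run;

/* Two rows (DayOfWeek and HighTemp), with _NAME_ and COL1 through COL7 */
proc print data=work.week_t;
run;
```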
VAR statement
The VAR statement specifies which variables should be transposed
If omitted, by default PROC Transpose will only transpose numeric variables
ID statement
The ID statement specifies which variables should be used to name the new columns
If the value is not a valid variable name (e.g., starts with a number), SAS will convert it to a valid name (e.g., by adding a leading underscore)
ID statement
The variable names can be modified with the prefix=, delimiter=, or suffix= options
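A sketch of VAR, ID, and prefix= used together. The dataset names are hypothetical and the values are sample readings for a single city.

```sas
data work.week;
   input DayOfWeek HighTemp;
datalines;
1 68
2 65
3 66
4 63
5 60
6 63
7 65
;
run;

/* The DayOfWeek values 1 through 7 name the new columns; PREFIX= turns them into Day1 through Day7 */
proc transpose data=work.week out=work.week_wide prefix=Day;
   id DayOfWeek;
   var HighTemp;
run;

proc print data=work.week_wide;
run;
```

Without prefix=, the numeric ID values would be converted to valid names with a leading underscore (_1, _2, and so on).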
Transposing data – BY groups
Two temperatures (HighTemp and LowTemp)
Three cities (Eugene, Portland, and Salem)
BY statement
Can specify more than one BY group variable
Data must be sorted by BY group variable(s)
City      _NAME_    Day1  Day2  Day3  Day4  Day5  Day6  Day7
Eugene    HighTemp  68    65    66    63    60    63    65
Eugene    LowTemp   46    41    45    44    45    43    44
Portland  HighTemp  62    63    61    62    60    62    66
Portland  LowTemp   44    43    42    39    44    45    45
Salem     HighTemp  65    66    62    60    58    62    68
Salem     LowTemp   45    42    43    41    41    45    46
BY statement
DayOfWeek  City      HighTemp  LowTemp
1          Eugene    68        46
2          Eugene    65        41
3          Eugene    66        45
4          Eugene    63        44
5          Eugene    60        45
6          Eugene    63        43
7          Eugene    65        44
1          Portland  62        44
2          Portland  63        43
3          Portland  61        42
4          Portland  62        39
5          Portland  60        44
6          Portland  62        45
7          Portland  66        45
1          Salem     65        45
2          Salem     66        42
3          Salem     62        43
4          Salem     60        41
5          Salem     58        41
6          Salem     62        45
7          Salem     68        46
NAME= option
Use the name= option to name the variable containing the name of the transposed variable (_NAME_ column)
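A sketch that combines the BY statement with the name= option, using the first two days for each city. The dataset names and the new column name Measure are assumptions; the rows are deliberately entered out of city order to show why the sort is needed.

```sas
data work.temps;
   input DayOfWeek City :$10. HighTemp LowTemp;
datalines;
1 Portland 62 44
2 Portland 63 43
1 Eugene 68 46
2 Eugene 65 41
1 Salem 65 45
2 Salem 66 42
;
run;

/* BY-group processing requires sorted data */
proc sort data=work.temps;
   by City;
run;

/* One row per city per transposed variable; NAME= renames the _NAME_ column */
proc transpose data=work.temps out=work.temps_wide prefix=Day name=Measure;
   by City;
   id DayOfWeek;
   var HighTemp LowTemp;
run;

proc print data=work.temps_wide;
run;
```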
Saving and storing macros
Saving and storing macros
Need to store and share macro code
Multiple ways to save and store macros for future use
%Include
Autocall facility
Stored compiled macro facility
Saving and storing macros
Which method to choose depends on your needs and operating environment
SAS recommends:
Don't store macros while still in development
If you are running production-level jobs using name-style macros, consider stored compiled macros
If you are letting a group of users share macros, consider the autocall facility
LIBNAME trick (redux)
Save your commonly used and/or password-protected LIBNAME statements in a text file (using Notepad)
Use a %include statement to reference the text file at the beginning of every SAS program
SAS will include the code in the text file as if it were part of your program.
From Week 2 – Chapters 3 & 4
%Include
Save your macro definitions in a text file
Use %include to reference the file at the start of every program
%Include
Advantages:
Easy and straightforward approach
Excellent first step towards starting a macro library
Disadvantages:
The macro definition is compiled every time the %include is executed (inefficient)
If efficiency is an issue, each file should contain only one macro (which would result in multiple files to include)
Requires you to know where the physical text files are stored
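A self-contained sketch of the %include approach. So that it runs anywhere, the macro source file is first written into the WORK folder by a DATA step; the file name hello.sas, the fileref mymac, and the macro %hello are all hypothetical. Normally the text file would already exist in a known location.

```sas
/* Stand-in for a text file of macro definitions saved earlier */
filename mymac "%sysfunc(pathname(work))/hello.sas";

data _null_;
   file mymac;
   /* Single quotes keep the macro statements from executing here */
   put '%macro hello(name=World);';
   put '   %put Hello, &name.!;';
   put '%mend hello;';
run;

/* At the start of a program: bring in the definition, then call the macro */
%include mymac;
%hello(name=SAS)
```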
Autocall facility
An autocall library is a directory containing individual files
Similar in concept to %include, but files stored as SAS files
Each file contains one macro definition
The name of the file must be the same as the macro name
An autocall library can also be a SAS catalog
Autocall facility
Save the SAS code for your macro using the macro name as the program file name
To avoid confusion, this folder should have nothing but autocall macros
Autocall facility
To use the macro later…
Reference the folder storing the autocall macros with a FILEREF (created with a filename statement)
Not a libref!
Autocall facility
mautosource option turns on the autocall macro facility
mautolocdisplay option (optional) displays the location of the source code in the log when the macro is called
sasautos= option tells SAS where the autocall macros are stored
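A sketch of the autocall setup described above. For portability it treats the WORK folder as the autocall folder and writes the macro file there first; the fileref automac, the file greet.sas, and the macro %greet are hypothetical, and a real autocall library would be a dedicated permanent folder.

```sas
/* Fileref (not a libref) pointing at the folder of autocall macros */
filename automac "%sysfunc(pathname(work))";

/* Stand-in for a saved program file: file name matches the macro name */
data _null_;
   file "%sysfunc(pathname(work))/greet.sas";
   put '%macro greet(name=World);';
   put '   %put Greetings, &name.!;';
   put '%mend greet;';
run;

/* Keep the SAS-supplied autocall library (SASAUTOS) in the search list too */
options mautosource mautolocdisplay sasautos=(automac sasautos);

/* No %include needed: SAS finds greet.sas and compiles it on first use */
%greet(name=SAS)
```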
Autocall facility
Advantages:
Macros stored as SAS code – can use enhanced editor to modify them
User-defined macros stored in a standard location
No need to remember multiple file names when calling macros
Macro code only compiled the first time it is used in a session (efficient)
Easy to share
Autocall facility
Disadvantages:
Because macro code only compiled once per session, this can be difficult during editing phase
Stored compiled macros
Macros are always compiled before they are executed
Compiled macros are stored in a catalog called SASMACR
In a typical session, this catalog is stored in the WORK library
However, this catalog can be stored in a more permanent library for future use
Stored compiled macros
Create a library to store the SASMACR catalog
Stored compiled macros
mstored option turns the stored compiled macro facility on
sasmstore= option identifies the library where the SASMACR catalog will be stored
Stored compiled macros
Submit the definition of the macro you want to store
store option tells SAS to store this macro
source option (optional) stores the source code with the compiled code
des= option (optional) assigns a descriptive title for the macro entry in the SAS catalog
Stored compiled macros
To use the macro later…
mstored option turns the stored compiled macro facility on
sasmstore= option identifies the library where the SASMACR catalog is stored
Stored compiled macros
SASMACR catalog available to view in the Explorer Window
Description stored as a file property (Right-click Properties)
Stored compiled macros
If the source option was used during macro storage, the source code can be retrieved using %copy
(Code will be printed to log)
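The steps on the preceding slides could be strung together roughly as follows. The libref macstore, its scratch location under WORK (created through the DLCREATEDIR option), the macro %hello, and its description are assumptions; a real setup would point the libref at a permanent folder.

```sas
/* Stand-in library for the SASMACR catalog */
options dlcreatedir;
libname macstore "%sysfunc(pathname(work))/macstore";

/* Turn the facility on and say where the catalog lives */
options mstored sasmstore=macstore;

/* STORE saves the compiled macro; SOURCE keeps the code; DES= adds a description */
%macro hello(name=World) / store source des='Prints a greeting to the log';
   %put Hello, &name.!;
%mend hello;

/* Later sessions need only the LIBNAME and OPTIONS statements before calling */
%hello(name=SAS)

/* Retrieve the stored source code (written to the log) */
%copy hello / source;
```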
Stored compiled macros
Advantages:
Macro programs only compiled once
Compile and store is faster
Can store more than one macro per catalog
Keeping track of macros is easy
Source code does not have to be stored with SASMACR catalog
But for maintenance purposes, it is recommended
Stored compiled macros
Disadvantages:
Cannot recreate source statements from a compiled macro
Cannot be moved directly to other operating systems
Must be saved and recompiled under new OS at any new location
May need to be recompiled for new releases of SAS
Saving and storing macros
If macros are stored in multiple locations, SAS will search for macro definitions in this order:
WORK.SASMACR catalog
Stored compiled macros
Autocall macros
That's all, folks!
You Did It!