Module 1: The DATA Step (1, 2, 3) - Auburn University


Introduction to SAS Programming and Applications

Module 1 : THE DATA STEP (1, 2, 3)

MARK CARPENTER, Ph.D.

Slide 1-1

Keywords: DATA, INFILE, INPUT, FILENAME, DATALINES

Procedures: PRINT

Pre-Lecture Preparation: Create a directory on your local hard drive called “i:\stat6110\module1”. Download the SAS programs “Ex1_1.sas” and “Ex1_2.sas” into this directory. Create a raw text file called ex1_1.txt and a comma-delimited file called ex1_1.csv in this directory, each consisting of the following 3 data lines:

ex1_1.txt     ex1_1.csv
1 18 92       1,18,92
2 21 88       2,21,88
3 26 98       3,26,98

SAS Programs: module1_examples1.sas and module1_examples2.sas

SAS Documentation: http://support.sas.com/documentation/onlinedoc/base/index.html#base94


•  DATA step: a programming language that you use to manipulate and manage your data.

•  SAS procedures: software tools for data analysis and reporting.

•  Macro facility: a tool for extending and customizing SAS software programs and for reducing text in your programs.

•  DATA step debugger: a programming tool that helps you find logic problems in DATA step programs.

•  Output Delivery System (ODS): a system that delivers output in a variety of easy-to-access formats, such as SAS data sets, procedure output files, or Hypertext Markup Language (HTML).

•  SAS windowing environment: an interactive, graphical user interface that enables you to easily run and test your SAS programs.


The title of this session is “DATA STEP Programming (1, 2, 3)”. The “1, 2, 3” refers to the three basic elements required to produce a SAS data set with the DATA step. These three elements are:

(1)   DATA STEP: begins with a DATA statement and the name of the new SAS data set.

(2)   DATA Source: when importing data, the INFILE and FILENAME statements tell SAS the location and type of the data. A SET statement is used when the source is another SAS data set.

(3)   DATA Structure: the INPUT, INFORMAT, and FORMAT statements tell SAS the structure of the data. When the source is an existing SAS data set, the INPUT, INFILE, INFORMAT, and FILENAME statements are not needed.
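A minimal sketch of the “1, 2, 3” in code, assuming the hypothetical data set names Scores and ScoresCopy. The first step supplies all three elements for raw in-stream data; the second takes an existing SAS data set as its source, so no INFILE or INPUT statement is needed. PROC PRINT then lists the result.

```sas
/* (1) DATA statement: names the new SAS data set (hypothetical name Scores) */
DATA Scores;
   /* (2) Data source: in-stream data lines; an external file or a
      fileref would be named on INFILE instead */
   INFILE DATALINES;
   /* (3) Data structure: three numeric variables read with list input */
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;

/* (1) names the new data set; (2) SET names an existing SAS data set as
   the source; (3) is not needed because SAS already knows its structure */
DATA ScoresCopy;
   SET Scores;
RUN;

/* List the observations of the copy */
PROC PRINT DATA=ScoresCopy;
RUN;
```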


What is a DATA STEP?

•  The primary method for creating a SAS data set with Base SAS software.

•  A DATA step is a group of SAS language statements that begins with a DATA statement. The group contains other programming statements that manipulate existing SAS data sets or create SAS data sets from raw data files.

•  A DATA step creates a SAS data set. This data set can be a SAS data file or a SAS view. A SAS data file stores data values, while a SAS view stores instructions for retrieving and processing data. When a SAS view can be used in the same way as a SAS data file, as is true in most cases, these notes use the broader term SAS data set.
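To make the file-versus-view distinction concrete, here is a minimal sketch using the hypothetical names Scores, ScoresView, and Passed. The VIEW= option after the slash in the DATA statement stores the step's instructions as a DATA step view instead of storing data values; those instructions run each time the view is read, for example by PROC PRINT.

```sas
/* A SAS data file: the data values themselves are stored */
DATA Scores;
   INFILE DATALINES;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;

/* A SAS view: only the instructions are stored.
   The name after VIEW= must match a name in the DATA statement. */
DATA ScoresView / VIEW=ScoresView;
   SET Scores;
   Passed = (Exam1 >= 90);   /* 1 if Exam1 is at least 90, else 0 */
RUN;

/* Reading the view executes its stored instructions */
PROC PRINT DATA=ScoresView;
RUN;
```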


Some DATA Step Statements (sometimes optional)

DATA Statement: Begins a DATA step and provides names for any output SAS data sets, views, or programs.

INFILE Statement: Specifies an external file to read with an INPUT statement. Can be used with or without a FILENAME statement (see below).

INPUT Statement: Describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.

FILENAME Statement: Associates a SAS fileref with an external file or an output device, disassociates a fileref and external file, or lists attributes of external files. Can be used for either importing or exporting files.

DATALINES Statement: Specifies that data lines follow for the current DATA step. Useful when working with small data sets and when SAS programs are shared by more than one person, because the data travel with the program.

INFORMAT/FORMAT Statements: Described below.


DATA Step when pre-existing SAS data sets are used as input:

All of the above statements are used when importing data from an external source. When you use a SAS data set as input to a DATA step, the description of the data set is already available to SAS. In your DATA step, use a SET, MERGE, MODIFY, or UPDATE statement to read the SAS data set. Use SAS programming statements to process the data and create an output SAS data set.
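A minimal sketch of reading an existing SAS data set with SET and then processing it with programming statements. The names Scores, Adults, and Curved, the age cutoff of 21, and the five-point curve are hypothetical choices made only for illustration.

```sas
/* Existing SAS data set that will be used as input */
DATA Scores;
   INFILE DATALINES;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;

/* SET reads Scores; no INFILE or INPUT is needed because the
   variable descriptions come with the SAS data set */
DATA Adults;
   SET Scores;
   IF Age >= 21;          /* subsetting IF: keep only ages 21 and up */
   Curved = Exam1 + 5;    /* new variable computed from an existing one */
RUN;

PROC PRINT DATA=Adults;
RUN;
```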


Some DATA Step Statements (cont): INFORMAT and FORMAT Statements

Data values come in many forms: character strings, dates, plain numbers, and numbers in scientific notation. We refer to the format of incoming data from external sources as an INFORMAT and the format of variables in SAS data sets as a FORMAT. Sometimes it is necessary to specify these during DATA step processing.

INFORMAT Statement: specifies any special formats for incoming data during the import process in a DATA step. For example, incoming data may contain special characters that SAS uses for other purposes ($, &, etc.), and dates contain characters such as slashes or month names spelled out. The optional INFORMAT statement tells SAS what to expect in the incoming variables. Note: the format in the resulting data set does not necessarily reflect the incoming format.

FORMAT Statement: specifies how variables are displayed in the SAS data set produced by a DATA step. For example, a date coming in (INFORMAT) might be of the form February 15, 1963, but once the SAS data set is produced (the DATA step is completed), the format can be changed to 02/15/63.
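A minimal sketch of the two statements working together, using the hypothetical variables BirthDate and Salary. The DATE9. informat reads values such as 15FEB1963, and COMMA10. reads values containing a dollar sign and commas. The FORMAT statement controls only how the stored numbers are displayed, so the date prints as 02/15/63 even though it arrived in a different form.

```sas
DATA Employees;
   /* INFORMATs: how the incoming raw values are written */
   INFORMAT BirthDate DATE9. Salary COMMA10.;
   /* FORMATs: how the stored values are displayed */
   FORMAT BirthDate MMDDYY8. Salary DOLLAR10.2;
   INFILE DATALINES;
   INPUT ID BirthDate Salary;
   DATALINES;
1 15FEB1963 $1,250
2 04JUL1970 $980
;
RUN;

PROC PRINT DATA=Employees;
RUN;
```

Internally BirthDate is stored as a number (a SAS date value), and Salary as the plain number 1250 or 980. Only the display changes with FORMAT.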


Example 1.1: Simple DATA steps to import a raw data set containing 3 data lines and 3 variables. To demonstrate different methods, this is done with seven different DATA steps, Example1_1a-g, as described below:

Example1_1a: imports from in-stream data lines using the DATALINES statement.

Example1_1b: imports from a raw external data file called “ex1_1.txt”.

Example1_1c: same as above, but the external file is located at a URL.

Example1_1d: same as above, but adds the step of using FILENAME.

Example1_1e: uses FILENAME to demonstrate a different use (a file on the local hard drive).

Example1_1f: uses the SET statement to create a SAS data set from an existing SAS data set.

Example1_1g: uses the SET statement to create a SAS data set from several existing SAS data sets.

Example 1.1.a: Simple DATA Step Using DATALINES

The following DATA step creates a SAS data set called “Example1_1a”. This data set contains three observations of three numeric variables: ID, Age, and Exam1.

DATA Example1_1a;
   INFILE DATALINES;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;

These lines make up the DATA step that produces the SAS data set “Example1_1a”:

•  DATA Example1_1a; begins a DATA step and provides names for any output SAS data sets.

•  INFILE DATALINES; INFILE usually specifies an external file to read with an INPUT statement, but in this case it specifies the special case of data lines within the DATA step.

•  INPUT ID Age Exam1; describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.

•  DATALINES; specifies that data lines follow for the current DATA step. This is useful when working with small data sets and when SAS programs are shared by more than one person, because the data travel with the program.

•  The lines after DATALINES are the actual data that make up the final data set. The semicolon that ends the data must be on its own line, immediately following the last data line.

Note: If the data are contained in a file external to SAS or in an existing SAS data set, the DATALINES statement and the data lines are not needed.
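Because a semicolon marks the end of the data lines, a data value that itself contains a semicolon would end the data early. SAS provides the DATALINES4 statement for that case: the data then end at a line holding four semicolons. A minimal sketch with the hypothetical data set Comments and variable Comment:

```sas
DATA Comments;
   /* Comma-delimited, so the blank inside a comment is kept */
   INFILE DATALINES DLM=',';
   /* The : modifier reads Comment up to the next delimiter, length 20 */
   INPUT ID Comment :$20.;
   DATALINES4;
1,late; excused
2,on time
;;;;
RUN;
```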

Example 1.1.b: Simple DATA Step from an External Data Source

The following DATA step creates a SAS data set called “Example1_1b”. It produces a data set identical to Example1_1a, but it reads the data from a text file located on the local hard drive.

DATA Example1_1b;
   INFILE 'i:\stat6110\module1\ex1_1.txt';
   INPUT ID Age Exam1;
RUN;

•  The full path name to the file on the hard drive is in quotes.

•  Note: either single quotes or double quotes will work here. With single quotes, SAS does not resolve special characters such as & and % and treats the string literally. With double quotes, SAS resolves any macro references it finds in the string, such as &name for a macro variable.

•  RUN is not strictly necessary, but when SAS reads this statement it officially ends the DATA step statements and SAS begins to process the step. Without RUN, the step is processed when SAS reaches the next step boundary, such as a DATA or PROC statement.
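A minimal sketch of the single- versus double-quote rule, assuming a hypothetical macro variable named course and a hypothetical data set QuoteDemo. The same string is assigned twice; only the double-quoted copy has &course replaced by its value.

```sas
/* Hypothetical macro variable holding part of a path */
%LET course = stat6110;

DATA QuoteDemo;
   Literal  = '&course\module1';   /* single quotes: & is not resolved */
   Resolved = "&course\module1";   /* double quotes: becomes stat6110\module1 */
   PUT Literal= Resolved=;         /* write both values to the SAS log */
RUN;
```

In an INFILE statement the same idea lets part of a path come from a macro variable, for example INFILE "i:\&course\module1\ex1_1.txt"; which would only work with double quotes.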

Example 1.1.c: Simple DATA Step from URL

The following DATA step creates a SAS data set called “Example1_1c”. It produces a data set identical to Example1_1a and Example1_1b, but it reads the data from a text file located at the indicated URL.

DATA Example1_1c;
   INFILE 'http://www.auburn.edu/~carpedm/courses/notes/module1/ex1_1.txt'
          DEVICE=URL;
   INPUT ID Age Exam1;
RUN;

Notice how DEVICE=URL goes after the quoted string that points SAS to the data file.

Example 1.1.d: Simple DATA Step from URL Using FILENAME

The following DATA step creates a SAS data set called “Example1_1d”. It produces a data set identical to Example1_1a, b, and c, but it reads the data from a text file through the fileref created with the FILENAME statement. The FILENAME statement associates the name “FromWeb” with the file located at the indicated URL. Note: SAS must be told that the file will be found through a URL by including the keyword URL in the statement; by default, SAS assumes the file is located on a local or virtual hard drive.

FILENAME FromWeb URL 'http://www.auburn.edu/~carpedm/courses/stat6110/notes/module1/ex1_1.txt';

DATA Example1_1d;
   INFILE FromWeb;
   INPUT ID Age Exam1;
RUN;

Example 1.1.e: Using FILENAME for a Local Hard Drive

The following DATA step creates a SAS data set called “Example1_1e”. It produces a data set identical to Example1_1a, b, c, and d, but it reads the data from a text file located on the hard drive at the path indicated in quotes. The DATA step uses the fileref “FromHD”, which was created by the preceding FILENAME statement. Note: because the file is located on a local hard drive, SAS does not need to be told about any special device such as URL in the previous example.

FILENAME FromHD 'i:\stat6110\module1\ex1_1.txt';

DATA Example1_1e;
   INFILE FromHD;
   INPUT ID Age Exam1;
RUN;

Example 1.1.f & g: Simple DATA Step from an Existing SAS Data Set

The following DATA step creates a SAS data set called “Example1_1f” from the existing SAS data set Example1_1e using the SET statement. Example1_1g demonstrates how several SAS data sets can be combined (concatenated) by placing a list of SAS data sets in the SET statement; it therefore contains 9 observations, the 3 from each data set stacked in the order listed.

DATA Example1_1f;
   SET Example1_1e;
RUN;

DATA Example1_1g;
   SET Example1_1a Example1_1b Example1_1c;
RUN;

The SET and MERGE statements will be covered in greater detail in Module 2: Combining and Sorting SAS Data Sets.

DATA Statement Syntax: SAS documentation lists several forms (1-6) of DATA statement syntax to reflect different situations, but we look only at the first form here (_null_ data sets, data views, stored programs, passwords, etc., will be discussed later).

Form 1: DATA <data-set-name-1 <(data-set-options-1)>> <...data-set-name-n <(data-set-options-n)>> </ <DEBUG><NESTING><STACK = stack-size>> <NOLIST>;

As Form 1 indicates, several data sets can be produced during one DATA step, so to keep it simple for now we examine the Revised Form 1 syntax:

Revised Form 1: DATA <data-set-name <(data-set-options)>>;

(data-set-options) specifies optional arguments that the DATA step applies when it writes observations to the output.
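A minimal sketch of Form 1 with two output data sets and one data set option, assuming the hypothetical names Young and Older and using the OUTPUT statement, which these notes have not introduced yet. KEEP= in parentheses applies only to the data set it follows, so Young keeps just ID and Exam1 while Older keeps all three variables.

```sas
/* Form 1: two output data sets named in one DATA statement */
DATA Young (KEEP=ID Exam1) Older;
   INFILE DATALINES;
   INPUT ID Age Exam1;
   /* OUTPUT writes the current observation to the named data set */
   IF Age < 21 THEN OUTPUT Young;
   ELSE OUTPUT Older;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;
```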


INFILE Statement: Specifies an external file to read with an INPUT statement.

Valid in: DATA step

Category: File-handling

Type: Executable

Operating environment: The INFILE statement contains operating environment-specific material. See the SAS documentation for your operating environment before using this statement.

See: INFILE Statement under Windows, UNIX, and z/OS

Syntax:
INFILE file-specification <device-type> <options>;
INFILE DBMS-specifications;

file-specification: identifies the source of the data, for example a quoted file location, a fileref, or DATALINES.

device-type: specifies the type of device or the access method that is used if the fileref points to an input or output device or location that is not a physical file: FTP, URL, socket, etc.

options: for example, delimiter=',' to read comma-delimited data; many others are available.
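As one more illustration of INFILE options, here is a minimal sketch that combines DLM=',' with FIRSTOBS=, which tells SAS the record number at which to start reading. The data set name WithHeader and the header line of variable names are hypothetical; the same options would apply to an external CSV file whose first line holds variable names.

```sas
DATA WithHeader;
   /* DLM=',' : values are separated by commas
      FIRSTOBS=2: start at the second record, skipping the header line */
   INFILE DATALINES DLM=',' FIRSTOBS=2;
   INPUT ID Age Exam1;
   DATALINES;
ID,Age,Exam1
1,18,92
2,21,88
3,26,98
;
RUN;
```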


INPUT Statement: Describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.

Valid in: DATA step

Category: File-handling

Type: Executable

Syntax: INPUT <specification(s)> <@ | @@>;

specification(s): a variable or list of variables, along with informats. The default is numeric (BEST12.).

$: specifies to store the variable value as a character value rather than as a numeric value. Tip: if the variable was previously defined as character, the $ is not necessary.
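A minimal sketch of the $ specification, using a hypothetical character variable Name in place of ID. With list input and no informat or LENGTH statement, a character variable gets a default length of 8, so longer values would be truncated.

```sas
DATA Students;
   INFILE DATALINES;
   /* The $ after Name makes it a character variable;
      Age and Exam1 remain numeric */
   INPUT Name $ Age Exam1;
   DATALINES;
Ann 18 92
Bob 21 88
Cara 26 98
;
RUN;
```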


FILENAME Statement: Associates a SAS fileref with an external file or an output device, disassociates a fileref and external file, or lists attributes of external files.

Valid in: Anywhere

Category: Data Access

See: FILENAME Statement under Windows, UNIX, and z/OS

Syntax:

FILENAME fileref <device-type> 'external-file' <ENCODING='encoding-value'> <options> <operating-environment-options>;

Simplified form:

FILENAME fileref <device-type> 'external-file';


FILENAME fileref <device-type> 'external-file';

fileref: any SAS name that you use when you assign a new fileref.

Tip: The association between a fileref and an external file lasts only for the duration of the SAS session, or until you change it or discontinue it by using another FILENAME statement. You can change the fileref for a file as often as you want.

device-type: specifies the type of device or the access method that is used if the fileref points to an input or output device or location that is not a physical file. Examples: URL, FTP, etc.
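A minimal sketch of a fileref's full life cycle that needs no file on disk, assuming the hypothetical fileref Scratch and data set FromTemp. The TEMP device type creates a temporary external file that exists only as long as the fileref is assigned. The FILE and PUT statements, not otherwise covered in this module, write to it (FILENAME used for exporting); INFILE reads it back (importing); and FILENAME ... CLEAR disassociates the fileref.

```sas
/* TEMP device type: a temporary external file managed by SAS */
FILENAME Scratch TEMP;

/* Export: write three raw data lines through the fileref */
DATA _NULL_;
   FILE Scratch;
   PUT '1 18 92';
   PUT '2 21 88';
   PUT '3 26 98';
RUN;

/* Import: read them back through the same fileref */
DATA FromTemp;
   INFILE Scratch;
   INPUT ID Age Exam1;
RUN;

/* Disassociate the fileref from the file */
FILENAME Scratch CLEAR;
```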


Example 1_2a-g: repeats of Example 1_1a-g, but the data are comma delimited.

The following DATA step creates a SAS data set called “Example1_2a”. This data set contains three observations of three numeric variables: ID, Age, and Exam1.

DATA Example1_2a;
   INFILE DATALINES delimiter=',';
   INPUT ID Age Exam1;
   DATALINES;
1,18,92
2,21,88
3,26,98
;
RUN;
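With delimiter=',' alone, SAS treats consecutive commas as a single delimiter, so a missing value in the middle of a line would shift the remaining values to the left. The DSD option instead treats two commas in a row as a missing value and also sets the default delimiter to a comma. A minimal sketch with the hypothetical data set CommaMissing, where the second line has no Age:

```sas
DATA CommaMissing;
   /* DSD: consecutive commas mean a missing value,
      and the comma becomes the default delimiter */
   INFILE DATALINES DSD;
   INPUT ID Age Exam1;
   DATALINES;
1,18,92
2,,88
3,26,98
;
RUN;
```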