LEARN SAS within 7 weeks: Part2 (Introduction to SAS – The Data Step)

Post on 11-Apr-2015

DESCRIPTION

The CARDS statement; the INFILE and INPUT statements; How to Input Another SAS Data Set (the LIBNAME statement); More on LIBNAME and LIBREF; the SET statement; the FILE and PUT statements; Data Input/Output from ASCII to ASCII; Advanced INPUT Features; How to Handle Missing Values; How to Describe SAS Data Sets; How to Label Variables; How to Label a Data Set; The PROC CONTENTS Procedure; How to Use FORMAT to Document Variable Values; Using the VIEWTABLE; Minimizing the Space Taken by a SAS Data Set.

TRANSCRIPT

<p>Week 8 Unit 4</p>
<p>Introduction to SAS: The DATA Step</p>
<p>SAS for Data Management</p>
<p>Week 8: Introduction to SAS: The Data Step</p>
<p>Welcome. As mentioned in the introduction to this unit (click on the Unit 4 tab), the two principal building blocks of a SAS program are the DATA step and the PROC step. This reading is a detailed introduction to the DATA step. The emphasis is on using the DATA step to read, display, and write data. The DATA step can also be used for other tasks, such as simulations, but these are beyond the scope of this course.</p>
<p>Goals of Week 8: Introduction to SAS: The Data Step</p>
<p>1. To understand the nature and purposes of the DATA step;</p>
<p>2. To be able to read data into SAS from a variety of sources (instream, external file, other SAS data set);</p>
<p>3. To appreciate, and be competent in, the formatting of data for ease of readability;</p>
<p>4. To be able to view data;</p>
<p>5. To be able to write SAS data out to a variety of destinations;</p>
<p>6. To be familiar with the SAS VIEWTABLE feature and to appreciate that it is not recommended for data editing; and</p>
<p>7. To appreciate, and be competent in, minimizing the storage space taken by SAS data.</p>
<p>Week 8 Outline: Introduction to SAS: The Data Step</p>
<p>1. How SAS Represents Data</p>
<p>2. How to Input Data Instream (the CARDS statement)</p>
<p>3. How to Input Data Stored in Text Format (INFILE and INPUT)</p>
<p>4. How to Input Another SAS Data Set (the LIBNAME statement)</p>
<p>5. More on LIBNAME and LIBREF</p>
<p>6. How to Read and Write From One or More SAS Data Sets to Another (the SET statement)</p>
<p>7. Writing Data to ASCII from SAS (the FILE and PUT statements)</p>
<p>8. Data Input/Output from ASCII to ASCII</p>
<p>9. The INPUT Command: a. List input; b. Character ($) and Imbedded Blanks (&amp;); c. Column or Formatted Input; d. Easy Column Input Using the At Symbol (@)</p>
<p>10. Advanced INPUT Features: a. Reading Data With Multiple Lines Per Record (# and Slash); b. Reading Multiple Records from the Same Line of Data; c. Reading Varying Numbers of Lines per Record</p>
<p>11. How to Handle Missing Values: a. SAS Missing Value Codes; b. The MISSING Statement; c. The INVALIDDATA Option</p>
<p>12. How to Describe SAS Data Sets: a. How to Label Variables; b. How to Label a Data Set; c. The PROC CONTENTS Procedure; d. How to Use FORMAT to Document Variable Values; e. Using the VIEWTABLE</p>
<p>13. Minimizing the Space Taken by a SAS Data Set</p>
<p>1. How SAS Represents Data</p>
<p>SAS represents data in tabular or rectangular form, where each column represents a field or variable, which must be named, and each row represents a record or observation. Observations are numbered sequentially. When data are sorted on some field, such as age, the observations are renumbered sequentially after sorting. The observation number is not stored with the data, but is printed or displayed as a convenience.</p>
<p>Typical Listing of Data in SAS Procedure:</p>
<table>
<tr><th>Obs</th><th>sid</th><th>age</th><th>height</th></tr>
<tr><td>1</td><td>1</td><td>17</td><td>56</td></tr>
<tr><td>2</td><td>2</td><td>26</td><td>62</td></tr>
<tr><td>3</td><td>3</td><td>41</td><td>60</td></tr>
<tr><td>4</td><td>4</td><td>29</td><td>66</td></tr>
</table>
<p>Listing from Print Procedure Displayed in HTML Table Format:</p>
<table>
<tr><th>Obs</th><th>sid</th><th>age</th><th>height</th></tr>
<tr><td>1</td><td>1</td><td>17</td><td>56</td></tr>
<tr><td>2</td><td>2</td><td>26</td><td>62</td></tr>
<tr><td>3</td><td>3</td><td>41</td><td>60</td></tr>
<tr><td>4</td><td>4</td><td>29</td><td>66</td></tr>
</table>
<p>View of Data using SAS VIEWTABLE:</p>
<p>The DATA step is the most common method of data input or output from the SAS system. The DATA step consists of several SAS statements, where the particular statements required depend upon the source of data input. All DATA steps begin with the keyword DATA.</p>
<p>2. 
How to Input Data Instream (the CARDS statement)</p>
<p>When you have a small amount of data that can be typed directly into a program, you may choose instream data entry using the CARDS statement. This is most common when trying a small example or testing a new program.</p>
<p>The following example creates a temporary SAS dataset called A1 with 3 variables and 4 observations.</p>
<p>DATA A1; /* A1 is name of new dataset */
INPUT SID AGE HEIGHT; /* INPUT specifies variable names */
CARDS; /* CARDS indicates data follows */
1 17 56
2 26 62
3 41 60
4 29 66
; /* The semicolon indicates end of data */
RUN; /* RUN indicates end of data step */</p>
<p>Notice the provision of /* comments */ to explain the meaning of the code. The DATA statement names the dataset to be created. The INPUT statement names the variables or fields that are to be read. The CARDS statement indicates that data lines follow, and the semicolon (;) on the line after the data indicates the end of the data lines. A RUN statement is used at the end of each DATA or PROC step in SAS so that the group of statements will be executed. It is optional if the DATA step is followed by another DATA step or PROC step, but you must have it at the end of a program or the last step will not be executed.</p>
<p>3. How to Input Data Stored in Text Format (the INFILE and INPUT statements)</p>
<p>More commonly, data are read in from other sources, such as ASCII data files or other SAS data files, rather than appearing instream in the program. 
The basic syntax of a DATA step when reading the data from an ASCII file is as follows:</p>
<p>DATA NEW1; /* NEW1 is the name of the new SAS data set */
INFILE 'C:\TEMP\RAW.DTA'; /* specifies the file RAW.DTA on C:\TEMP */
INPUT VAR1 VAR2; /* specifies names for variables */
RUN;</p>
<p>The INFILE statement identifies an ASCII data file, on any drive or directory, by specifying the appropriate path. The path and filename must be enclosed in quotes (single or double). Many options are available to tailor the INFILE statement to a particular data set. For example, the number of columns to be read can be controlled with a line size (LINESIZE=) or logical record length (LRECL=) specification on the INFILE statement. For more details see the SAS Language Guide or SAS HELP.</p>
<p>The INFILE statement is followed by an INPUT statement that specifies the correspondence between the variable names assigned in SAS and the columns in the ASCII data file. This is where variable names are assigned. This statement will be discussed in more detail later.</p>
<p>4. How to Input Another SAS Data Set (the LIBNAME Statement)</p>
<p>When the data file to be input is itself a SAS data file, the DATA step takes on a slightly different form. A SAS data file already has the columns identified with variable names, so the INPUT statement is not needed. The following example reads a previously stored SAS data file called EXAMPLE3 and creates a temporary SAS data file called A2.</p>
<p>LIBNAME SDATA 'C:\TEMP'; /* specifies location of SAS data files */
DATA A2; /* names new dataset to be created */
SET SDATA.EXAMPLE3; /* names SAS dataset to be read */
... /* (other SAS statements here) */
RUN;</p>
<p>The LIBNAME statement simply pairs a nickname (SAS calls this the libref) with a pointer to the path (the drive and directory) where the SAS data set is stored. 
Consider the LIBNAME statement LIBNAME sdata 'c:\temp'; The nickname (called the libref in SAS) is sdata. Thus, sdata is the nickname for the path c:\temp.</p>
<p>The SET statement names the SAS data set that is to be read in. When a single-level name (a single word with no dot) is used in creating a dataset, as with A2 in this example, it is saved as a "working" (meaning temporary) data set while you are running the SAS system. As soon as you close SAS, the "working" data sets are erased. Working data sets are stored in the SAS WORK library. You can view active SAS libraries in the Explorer window.</p>
<p>To save a SAS data set as a permanent data set, one that will still be there after you exit from the SAS software, a two-level name (libref.dsn) must be given in the DATA statement. The first part of the name (the library reference or libref) must match exactly the nickname (which points to the path comprised of drive and subdirectory) assigned in a LIBNAME statement.</p>
<p>In order to create a permanent (saved) SAS data set, you need to run the following lines in a SAS Program Editor window. This example saves a copy of a temporary SAS dataset.</p>
<p>LIBNAME IN 'A:\HW3'; /* the location, in single quotes, of the physical directory where you would like to save the permanent SAS data set */
DATA IN.A2; /* the name you would like to call your permanent SAS data set; the libref (IN) before the dot (.) must match the name you wrote on the LIBNAME statement */
SET A2; /* the name (A2) of the temporary SAS data set that you want to save */
RUN;</p>
<p>Stored or permanent SAS data files all have an automatic filename extension added. You will see this extension when you look at the file in Windows Explorer or My Computer. 
This extension is assigned by SAS, and is not specified in any SAS statements. In version 8, the extension added is .sas7bdat.</p>
<p>Icon and name for saved V8 SAS data set, as seen in Windows Explorer.</p>
<p>The location, or path (disk drive and directory), of SAS data files is specified in a LIBNAME statement. If you double-click on this icon, SAS will open and the data file will be displayed in VIEWTABLE format.</p>
<p>DO NOT change the name of a SAS data file in Windows Explorer or My Computer. Information on the external file name is saved within the file. If you rename A2.sas7bdat to A3.sas7bdat, you will get an error message when you try to open or use the file in SAS.</p>
<p>5. More on LIBNAME and LIBREF</p>
<p>You can think of the directories on hard disks or floppy disks as libraries for storing data. The LIBNAME statement is simply a pointer, an instruction that says "I'm pointing to a location." The location that is pointed to is a directory and subdirectory path, enclosed in single or double quotes (I recommend double quotes). The statement gives a convenient way of defining a code word or library reference (SDATA and IN, in the above examples) that refers to a specific location (library) for reading and/or storing SAS data files.</p>
<p>Libname IN "z:\bigelow\consulting\jurgens 2003\sasdata";</p>
<p>LIBNAME informs SAS that an address (where stuff can be found) is being provided.</p>
<p>Here it is given the nickname (libref) IN. 
z:\bigelow\consulting\jurgens 2003\sasdata is the actual directory and subdirectory path location.</p>
<p>Libraries can also be defined from the toolbar, using the New Library button.</p>
<p>The New Library button lets you define the LIBREF (or code word for that library), the ENGINE (or data format), and the PATH (drive and directory).</p>
<p>TIP: The advantage of using a LIBNAME statement within a program is that the definition of the library becomes part of your program, and will be redefined each time the program is run. If you use the toolbar to set your library, you must remember to set up your libraries each time you re-open SAS.</p>
<p>You must have a separate library defined for each version (engine) of SAS.</p>
<p>Older versions of SAS stored data in different formats. SAS refers to these as engines. For example, version 6.12 of SAS used a default extension of .SD2. Earlier DOS versions (6.04) of SAS used the extension .SSD. If you know you are reading SAS data files that were saved with an earlier version of SAS, you must store these data sets in a different directory or subdirectory from V8 SAS data files. A separate LIBNAME statement must be used for each (sub)directory.</p>
<p>For example, the following lines could be used to read an old SAS data set (version 6.12) and save a copy of it in the new SAS (version 8.2) format:</p>
<p>LIBNAME OLD V612 'C:\OLDSAS'; /* Old uses v612 engine, .sd2 format */
LIBNAME NEW V8 'C:\TEMP'; /* New v8 engine, .sas7bdat */
DATA NEW.D1;
SET OLD.D1;
RUN;</p>
<p>Two LIBNAME statements name two directories. The first, OLD, contains the file D1.SD2 in version 6.12 format. The second, NEW, points to C:\TEMP, where the new data set, D1.sas7bdat, will be saved. 
The engine or version of SAS that created the data set (in this example, v612 and v8) can be named before the path specification on the LIBNAME statement. If you are unsure of the engine, it is not required, as long as only one type of SAS file can be found in that directory.</p>
<p>Take care that data stored by older versions of SAS, or in other formats that will be used in SAS, are stored in separate directories; otherwise you will get an error message indicating that the data cannot be read.</p>
<p>Do not use the engine names for library names. Note that the SAS engine names begin with V for version. Therefore, avoid using a library name such as Vnnn, where nnn is a number. A list of engine names can be found in the New Library window.</p>
<p>6. How to Read and Write Data from One or More SAS Data Sets to Another (the SET statement)</p>
<p>When data are already in SAS format, use a SET statement after the DATA statement to point to the SAS data set you are reading from.</p>
<p>The next example reads two SAS data files and concatenates them, storing the result as a single new SAS data set in the same directory. If you want to store the new data file in a different location (directory), a separate LIBNAME statement is required.</p>
<p>LIBNAME SDATA 'C:\TEMP\'; /* specifies location of SAS data */
DATA SDATA.NEW1; /* creates a file named NEW1.SAS7BDAT on C:\TEMP */
SET SDATA.TEST1 SDATA.TEST2; /* concatenates files TEST1 and TEST2 */
/* other SAS instructions would go here */
RUN;</p>
<p>The SET statement in the DATA step can list a single SAS data file or many files. Various options are available on the SET statement to tailor how the files will be combined. The SET statement may also be replaced by a MERGE statement when data records are to be combined on a record-by-record basis. 
Each of these applications will be discussed in greater detail in a later section.</p>
<p>7. Writing Data to ASCII Files from SAS (the FILE and PUT statements)</p>
<p>It is also possible to create ASCII files from SAS datasets. This can be useful for transferring data into other programs for specific applications. Creation of ASCII output data files from SAS data sets makes use of a combination of the LIBNAME and SE...</p>
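<p>As a minimal sketch of the FILE and PUT pattern named in this section's title, the step below reads the temporary data set A1 created in the CARDS example and writes one text line per observation. The output path C:\TEMP\A1.TXT is hypothetical, and the use of DATA _NULL_ assumes that no new SAS data set is wanted; neither appears in the notes above.</p>

```sas
DATA _NULL_;               /* _NULL_: run the step without creating a data set */
  SET A1;                  /* read the temporary data set from the CARDS example */
  FILE 'C:\TEMP\A1.TXT';   /* hypothetical ASCII output file */
  PUT SID AGE HEIGHT;      /* list-style PUT: values separated by single blanks */
RUN;
```

<p>To write from a permanent data set instead, a LIBNAME statement and a two-level name on the SET statement would take the place of A1.</p>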