
Upload: gladys-york

Post on 28-Dec-2015


Chapter 3: Reading and Processing Data

3.1 Processing SAS Data Sets

3.2 Processing External Files

3.1 Processing SAS Data Sets

Objectives

Use SAS file I/O functions to manipulate SAS data sets.

Retrieve metadata.

Managing SAS Data Sets

The Orion Star programmers need macros to perform the following data management tasks:

1. Test the existence of a data set.

2. Determine the number of observations in a data set.

3. Determine the age of a data set.

4. Archive a data set.

5. Create a data set for every worksheet in an Excel workbook.

They decided to use the SAS File I/O functions and metadata to accomplish these tasks.


Using Functions to Manipulate Files

SAS supports different ways to manipulate and obtain information about SAS files and other files. Many of these techniques require a DATA step or PROC step to be part of the SAS code.

Some functions, generally used in the DATA step and SCL, permit direct access to files. These functions, when used with the macro facility, enable the same direct access without introducing additional program steps.

The functions can be categorized into two groups:

SAS file I/O functions

external file functions

SAS File I/O Functions

Functions to access a SAS data set: EXIST, OPEN, CLOSE

Functions to access data set descriptor information: DSNAME, VARNUM, ATTRC, ATTRN

Functions to access data library information: LIBREF, PATHNAME
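As a rough illustration of the descriptor and library functions listed above, the following sketch calls several of them through %SYSFUNC. It assumes the SASHELP.CLASS sample data set and the WORK library, neither of which is part of the Orion Star data; they are used only because they are available in a typical SAS session.

```sas
/* Assumption: sashelp.class and the WORK library stand in for real data. */
%let rc=%sysfunc(libref(work));             /* 0 means the libref is assigned */
%put NOTE: LIBREF return code is &rc;
%put NOTE: WORK points to %sysfunc(pathname(work));

%let dsid=%sysfunc(open(sashelp.class));    /* 0 means the open failed */
%put NOTE: Data set name is %sysfunc(dsname(&dsid));
%put NOTE: AGE is variable number %sysfunc(varnum(&dsid,age));
%let rc=%sysfunc(close(&dsid));             /* always release the identifier */
```

VARNUM returns 0 when the variable is not in the data set, so the same call can double as a test for a variable's existence.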

Task 1: Determine Data Set Existence

Use the EXIST function to test for the existence of a data set before progressing further into a macro program.

%macro printds(dset);
   %if %sysfunc(exist(&dset))=0 %then %do;
      %put ERROR: Data set &dset does not exist.;
      %put ERROR- Macro will terminate now.;
      %return;
   %end;
   proc print data=&dset (obs=10) noobs;
      title "First 10 Observations from &dset";
   run;
%mend printds;

m203d01

Task 1: Determine Data Set Existence

Partial SAS Log

%printds(orion.daily_sales)
NOTE: There were 10 observations read from the data set ORION.DAILY_SALES.
NOTE: PROCEDURE PRINT used:
      real time 0.01 seconds
      cpu time 0.00 seconds

29 %printds(orion.daily)
ERROR: Data set orion.daily does not exist.
       Macro will terminate now.

m203d01

Task 2: Obtain Attribute Information

The Orion Star programmers found that a data set often exists but is empty. They want to verify that a data set is not empty before performing further processing.

The following steps provide data set attribute information:

1. Open the data set using the OPEN function.

2. Retrieve a numeric attribute using the ATTRN function.

3. Retrieve a character attribute using the ATTRC function.

4. Close the data set using the CLOSE function.
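The four steps above can be sketched as a single macro. This is a minimal illustration, not one of the course programs: the macro name DSINFO is hypothetical, SASHELP.CLASS is assumed as a convenient test data set, and the check for a zero identifier is an added safeguard.

```sas
/* Hypothetical macro: open, query, and close a data set in one place. */
%macro dsinfo(dsn);
   %local dsid nobs label rc;
   %let dsid=%sysfunc(open(&dsn));               /* Step 1: open */
   %if &dsid=0 %then %do;
      %put ERROR: Could not open &dsn..;
      %put ERROR- %sysfunc(sysmsg());
      %return;
   %end;
   %let nobs=%sysfunc(attrn(&dsid,nlobs));       /* Step 2: numeric attribute */
   %let label=%sysfunc(attrc(&dsid,label));      /* Step 3: character attribute */
   %let rc=%sysfunc(close(&dsid));               /* Step 4: close */
   %put NOTE: &dsn has &nobs observation(s). Label: &label;
%mend dsinfo;

%dsinfo(sashelp.class)   /* assumption: sample data set shipped with SAS */
```

Checking the identifier before calling ATTRN or ATTRC avoids passing 0 to those functions when the data set is missing or locked. A label that contains macro triggers or unbalanced punctuation would need macro quoting, which this sketch omits.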


Step 1: Open the SAS Data Set

The OPEN function opens a SAS data set and returns a unique numeric data set identifier. The data set identifier, a nonzero positive number, is used in most other SAS File I/O functions. The OPEN function returns 0 if the data set cannot be opened.

General form of the OPEN function:

OPEN(data-set-name)

Partial SAS Log

4 %let dsid=%sysfunc(open(orion.daily_sales));
5 %put dsid=&dsid;
dsid=1

Step 2: Use the ATTRN Function

The ATTRN function returns the value of a numeric attribute of a data set.

General form of the ATTRN function:

ATTRN(data-set-identifier, attribute-name)

Selected attribute-name values and descriptions:

CRDTE      creation date (SAS datetime value)

MODTE      last modified date (SAS datetime value)

NVARS      number of variables

ISINDEX    whether the data set is indexed (0 or 1)

NLOBS      number of non-deleted observations
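Because CRDTE and MODTE are returned as raw datetime values, it can help to format them as they are retrieved. The sketch below uses the optional format argument of %SYSFUNC for that purpose. It assumes SASHELP.CLASS as a stand-in data set.

```sas
/* Assumption: sashelp.class is used only as a readily available data set. */
%let dsid=%sysfunc(open(sashelp.class));
%let created=%sysfunc(attrn(&dsid,crdte),datetime20.);  /* formatted datetime */
%let nvars=%sysfunc(attrn(&dsid,nvars));                /* plain number */
%let rc=%sysfunc(close(&dsid));
%put NOTE: Created &created with &nvars variables.;
```

Without the second argument, &created would hold the number of seconds since 01JAN1960, which is what a later calculation such as DATEPART expects.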

Step 3: Use the ATTRC Function

The ATTRC function returns the value of a character attribute of a data set.

General form of the ATTRC function:

ATTRC(data-set-identifier, attribute-name)

Selected attribute-name values and descriptions:

SORTEDBY   BY variables (if data set is sorted)

LABEL      data set label

MEM        data set name

LIB        current libref for the data set

Step 4: Close the SAS Data Set

The CLOSE function closes a SAS data set. The CLOSE function returns 0 if the operation was successful and returns a nonzero value if it was not successful.

General form of the CLOSE function:

CLOSE(data-set-identifier)

Partial SAS Log

6 %let dsidc=%sysfunc(close(&dsid));
7 %put dsidc=&dsidc;
dsidc=0

It is important to close all SAS data sets as soon as they are no longer needed by the application.

Obtaining Number of Observations

Use the NLOBS attribute to obtain the number of observations in a data set and assign this value to a macro variable.

%macro numobs(dsn);
   %local dsid nobs dsidc;
   %let dsn=%upcase(&dsn);
   %let dsid=%sysfunc(open(&dsn));
   %let nobs=%sysfunc(attrn(&dsid,nlobs));
   %let dsidc=%sysfunc(close(&dsid));
   %if &nobs=0 %then %do;
      %put ERROR: &dsn contains 0 Observations.;
      %put ERROR- PROC PRINT will not execute.;
      %return;
   %end;
   proc print data=&dsn (obs=10) noobs;
      title "First 10 Observations";
      title2 "&dsn Contains &nobs Observations";
   run;
%mend numobs;

m203d02

Obtaining Number of Observations

Partial SAS Log

231 %numobs(orion.daily_sales)
NOTE: There were 10 observations read from the data set ORION.DAILY_SALES.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds

232 %numobs(orion.no_rows)
ERROR: ORION.NO_ROWS contains 0 Observations.
       PROC PRINT will not execute.

m203d02

Obtaining Number of Observations

PROC PRINT Output

First 10 Observations
ORION.DAILY_SALES Contains 58 Observations

Product_ID    Product_Name                                  Total_Retail_Price

220200200024  Pro Fit Gel Gt 2030 Women's Running Shoes                $178.50
220200100092  Big Guy Men's Air Terra Sebec Shoes                       $83.00
240200100043  Bretagne Performance Tg Men's Golf Shoes L.              $282.40
220100700024  Armadillo Road Dmx Women's Running Shoes                  $99.70
220200300157  Hardcore Men's Street Shoes Large                        $220.20
240200100051  Bretagne Stabilites 2000 Goretex Shoes                   $420.90
220200100035  Big Guy Men's Air Deschutz Viii Shoes                    $125.20
220200100090  Big Guy Men's Air Terra Reach Shoes                      $177.20
220200200018  Lulu Men's Street Shoes                                  $132.80
240200100052  Bretagne Stabilities Tg Men's Golf Shoes                  $99.70

m203d02

3.01 Quiz

1. Open the program m203a01.

2. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff.

What is the value of &SORTED?

m203a01

%let dsn=orion.staff;
%let openrc=%sysfunc(open(&dsn));

%let sorted= ;

%let closerc=%sysfunc(close(&openrc));

%put Data set &dsn is sorted by &sorted..;

3.01 Quiz – Correct Answer

1. Open the program m203a01.

2. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff.

What is the value of &SORTED? Employee_ID

m203a01

%let dsn=orion.staff;
%let openrc=%sysfunc(open(&dsn));

%let sorted=%sysfunc(attrc(&openrc,sortedby));

%let closerc=%sysfunc(close(&openrc));

%put Data set &dsn is sorted by &sorted..;

Task 3: Determine the Age of a SAS Data Set

The Orion Star programmers need a way to determine when to refresh a data set. They decided to use the CRDTE attribute to calculate the age of a data set.

m203d03

%macro age(dsn);
   %local dsid crdate dsidc days;
   %let dsid=%sysfunc(open(&dsn));
   %let crdate=%sysfunc(attrn(&dsid,crdte));
   %let dsidc=%sysfunc(close(&dsid));
   %let days=%sysevalf("&sysdate9"d - %sysfunc(datepart(&crdate)));
   %if &days > 0 %then %do;
      %put WARNING: &dsn is &days day(s) old. It is being recreated.;
      data &dsn;
         infile 'orders03.dat';
         input Order_ID Order_Type Order_Date : date9.;
         format Order_Date date9.;
      run;
   %end;
   %else %put NOTE: &dsn is current.;
%mend age;

Task 3: Determine the Age of a SAS Data Set

Partial SAS Log

22 %age(orion.orders03)
WARNING: orion.orders03 is 1 day(s) old. It is being recreated.

NOTE: The infile 'orders03.dat' is:
      Filename=C:\workshop\orders03.dat,
      RECFM=V,LRECL=256,File Size (bytes)=2496,
      Last Modified=31Jan2008:18:09:56,
      Create Time=16Jun2008:17:09:05

NOTE: 104 records were read from the infile 'orders03.dat'.
      The minimum record length was 22.
      The maximum record length was 22.
NOTE: The data set ORION.ORDERS03 has 104 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time 0.14 seconds
      cpu time 0.06 seconds

23 %age(orion.orders03)
NOTE: orion.orders03 is current.

m203d03

Task 4: Archive a SAS Data Set

Because many of Orion Star’s macro applications refresh SAS data sets, the programmers want to archive the current data set before the data set is refreshed. They decided to concatenate today’s date to the end of the data set name, using the RENAME and TODAY functions.

General form of the RENAME function:

RENAME(old-name, new-name)

General form of the TODAY function:

TODAY( )

Task 4: Archive a SAS Data Set

m203d04

%let newname=daily_sales_%sysfunc(today(), date9.);
%let rc=%sysfunc(rename(orion.daily_sales, &newname));

proc contents data=orion._all_ nods;
run;

Partial PROC CONTENTS Output

#  Name                   Member Type  File Size  Last Modified

1  COUNTRY                DATA             17408  01Jul08:23:11:48
   COUNTRY                INDEX            17408  01Jul08:23:11:48
2  CUSTOMER               DATA             33792  30Jul08:22:28:42
3  CUSTOMER_DIM           DATA             33792  14Dec07:09:05:44
4  CUSTOMER_TYPE          DATA             17408  30Jul08:01:29:54
   CUSTOMER_TYPE          INDEX             9216  30Jul08:01:29:54
5  DAILY_SALES_07OCT2008  DATA              9216  21Aug08:14:18:18
6  ORDER_FACT             DATA             66560  10Jul08:19:45:26
7  SALES                  DATA             25600  27Jul08:21:40:55
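The archive step can be wrapped in a macro that checks for the data set first and reports the return code from RENAME. The following is a sketch under stated assumptions: the macro name ARCHIVE and the WORK.DEMO data set are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical helper: archive a data set by appending today's date. */
%macro archive(dsn);
   %local newname rc;
   %if %sysfunc(exist(&dsn))=0 %then %do;
      %put ERROR: Data set &dsn does not exist.;
      %return;
   %end;
   /* The new name is a one-level name and must not exceed 32 characters. */
   %let newname=%scan(&dsn,-1,.)_%sysfunc(today(),date9.);
   %let rc=%sysfunc(rename(&dsn,&newname));
   %if &rc ne 0 %then %put ERROR: Rename failed: %sysfunc(sysmsg());
   %else %put NOTE: &dsn was renamed to &newname..;
%mend archive;

data work.demo;   /* throwaway data set for the demonstration */
   x=1;
run;

%archive(work.demo)
```

Running the macro twice on the same day for the same data set is expected to fail the second time, because the dated name already exists. A time stamp could be appended if more than one archive per day is needed.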

Task 5: Create Data Sets from Worksheets

The Orion Star programmers need a macro to import every worksheet in a given Excel workbook.

[Figure: the %READXLS macro reads the worksheets Australia$ and UnitedStates$ in Sales.xls and creates the data sets Australia and UnitedStates.]

Task 5: Create Data Sets from Worksheets

The programmers will use SAS session metadata that is available via PROC SQL DICTIONARY tables or Sashelp views.

The metadata includes information on the following:

SAS files

external files

macro variables

system options, titles, and footnotes
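As a small illustration of the session metadata described above, the query below reads DICTIONARY.TABLES, the PROC SQL counterpart of the sashelp.vtable view. It assumes SASHELP.CLASS as the table of interest. Librefs and member names are stored in uppercase in these tables, so the comparison values are uppercase.

```sas
/* Assumption: SASHELP.CLASS is used only as a table that normally exists. */
proc sql;
   select libname, memname, nobs, nvar, crdate format=datetime20.
      from dictionary.tables
      where libname='SASHELP' and memname='CLASS';
quit;
```

The same columns are available from sashelp.vtable in a DATA step or procedure, which is the route this chapter takes.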

Task 5: Create Data Sets from Worksheets

The macro will incorporate these elements:

SAS/ACCESS LIBNAME statement

sashelp.vtable

an iterative %DO loop

indirect macro variable references

Reading Excel Files Using the LIBNAME Statement

The SAS/ACCESS LIBNAME statement extends the LIBNAME statement to support assigning a library reference name (libref) to Microsoft Excel workbooks.

This enables you to reference worksheets directly in a DATA step or SAS procedure. Each worksheet in the Excel workbook is treated as though it were a SAS data set.

libname xlsdata 's:\workshop\c3\sales.xls';

proc contents data=xlsdata._all_;
run;

m203d05

Reading Excel Files Using the LIBNAME Statement

Partial PROC CONTENTS Output

The CONTENTS Procedure

Directory

Libref         XLSDATA
Engine         EXCEL
Physical Name  sales.xls
User           Admin

#  Name           Member Type  DBMS Member Type

1  Australia$     DATA         TABLE
2  UnitedStates$  DATA         TABLE

Reading Excel Files Using the LIBNAME Statement

All worksheets will be referenced with a SAS two-level name, that is, libref.data-set-name. If the worksheet name contains special characters, you must use the SAS name literal construct of "name"n.

data australia;
   set xlsdata.'Australia$'n;
run;

m203d05

Using SAS Session Metadata

Use sashelp.vtable to create a series of macro variables that contain the member names.

data _null_;
   set sashelp.vtable end=last;
   where libname="XLSDATA";
   call symputx(cats('sheet', _n_), memname);
   if last then call symputx('n', _n_);
run;

Partial SAS Log

473 %put _user_;
GLOBAL SHEET1 Australia$
GLOBAL SHEET2 UnitedStates$
GLOBAL N 2

m203d05
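The same series of macro variables can also be built with PROC SQL instead of a DATA step. This alternative is not part of the course programs. It assumes that the XLSDATA libref is already assigned and that the SAS release supports the open-ended INTO range (`:sheet1-`), which was added in later SAS 9 releases.

```sas
/* Alternative sketch: assumes the XLSDATA libref from the LIBNAME statement. */
proc sql noprint;
   select memname
      into :sheet1-              /* creates SHEET1, SHEET2, ... as needed */
      from dictionary.tables
      where libname='XLSDATA';
quit;

%let n=&sqlobs;                  /* number of rows returned by the query */
%put NOTE: &n worksheet(s) found.;
```

One difference in behavior: when the library has no members, the DATA step never reaches the `if last` statement and N is not created, whereas here N is set to 0.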

3.02 Quiz

Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.

data _null_;
   set sashelp.vtable;
   where libname='ORION';
   call symputx(cats('dsn', _N_), ??????????);
run;

%put _user_;

m203a02

3.02 Quiz – Correct Answer

Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.

data _null_;
   set sashelp.vtable end=last;
   where libname='ORION';
   call symputx(cats('dsn', _N_), memname);
run;

m203a02

Iterative %DO Loops (Review)

The iterative %DO statement executes a section of a macro repetitively, based on the value of an index variable.

General form of the iterative %DO statement:

%DO index-variable=start %TO stop <%BY increment>;
   text
%END;

%macro putloop;
   %do i=1 %to &n;
      %put Sheet&i is &&sheet&i;
   %end;
%mend putloop;

m203d05

Indirect Macro Variable References (Review)

The indirect reference causes a second scan of the macro variable reference.

Reference: &&sheet&i

1st scan: &sheet1

2nd scan: Australia$

Partial Symbol Table

Variable  Value

I         1

SHEET1    Australia$

SHEET2    UnitedStates$
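The two scans can be reproduced in open code. The sketch below assigns the symbol table values by hand, with I set to 2 rather than 1 so that the result differs from the slide.

```sas
/* Values typed in by hand to mirror the symbol table shown above. */
%let sheet1=Australia$;
%let sheet2=UnitedStates$;
%let i=2;

/* 1st scan: && resolves to &, &i resolves to 2, leaving &sheet2. */
/* 2nd scan: &sheet2 resolves to UnitedStates$.                   */
%put &&sheet&i;
```

Writing `&sheet&i` instead would fail, because the macro processor would try to resolve a variable named SHEET before appending the value of I.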

3.03 Quiz

How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following DO loop?

%do i=1 %to &n;
   %put The values of the macro variables are __________ ;
%end;

3.03 Quiz – Correct Answer

How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following DO loop?

%do i=1 %to &n;
   %put The names of the macro variables are &&dsn&i ;
%end;

Processing a Data Library

Use a %DO loop to generate a DATA step and a PROC PRINT step for every worksheet in an Excel workbook.

m203d06

%macro readxls(workbook);
   libname xlsdata "&workbook";
   data _null_;
      set sashelp.vtable end=last;
      where libname="XLSDATA";
      call symputx(cats('sheet', _n_), memname, 'L');
      if last then call symputx('n', _n_, 'L');
   run;
   %do i=1 %to &n;
      %let len=%eval(%length(&&sheet&i)-1);
      %let dsn=%substr(&&sheet&i,1,&len);
      data work.&dsn;
         set xlsdata."&&sheet&i"n;
      run;
      proc print data=work.&dsn;
      run;
   %end;
   libname xlsdata clear;
%mend readxls;

%readxls(sales.xls)

The %LENGTH function is used to return the number of characters in &&SHEET&I.

The %EVAL function enables subtraction of 1 from that length to create a macro variable LEN that is the length of the spreadsheet name without the $.

The %SUBSTR function extracts &LEN characters from the worksheet name, beginning at position 1. The result is assigned to the macro variable DSN.
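The three function calls can be traced with a single worksheet name. In this sketch the macro variable SHEET is a stand-in for &&SHEET&I, and its value is typed in by hand.

```sas
%let sheet=Australia$;                   /* stand-in for &&sheet&i        */
%let len=%eval(%length(&sheet)-1);       /* 10 characters minus 1 gives 9 */
%let dsn=%substr(&sheet,1,&len);         /* first 9 characters: Australia */
%put len=&len dsn=&dsn;
```

The approach relies on the dollar sign being the last character of every worksheet name. A name that ends in some other character would lose that character instead.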

Processing a Data Library

Partial SAS Log

m203d06

%readxls(sales.xls)

NOTE: There were 63 observations read from the data set XLSDATA.'Australia$'n.
NOTE: The data set WORK.AUSTRALIA has 63 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

NOTE: There were 63 observations read from the data set WORK.AUSTRALIA.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds

NOTE: There were 102 observations read from the data set XLSDATA.'UnitedStates$'n.
NOTE: The data set WORK.UNITEDSTATES has 102 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

NOTE: There were 102 observations read from the data set WORK.UNITEDSTATES.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

Exercise

This exercise reinforces the concepts discussed previously.

3.2 Processing External Files

Objectives

Use external file functions to examine files that are not SAS files.

Processing External Files

The Orion Star programmers want to reduce redundant code when reading multiple external files into SAS data sets. The applications should be able to process the files in a given directory and subdirectory in order to accomplish the following tasks:

1. Process all DAT files.

2. Import all CSV files.

3. Read every worksheet in all of the Excel workbooks.

They decided to use the external file functions to accomplish these three tasks.


External File Functions

Functions to access a directory: DOPEN, DNUM, DREAD, DCLOSE

Functions to access an external file: FILEEXIST and FEXIST, FILENAME, FOPEN, FCLOSE

Functions to read from or write to an external file: FREAD, FGET, FPUT and FWRITE
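To show how the file-level functions fit together, the sketch below prints the first record of an external file from within the macro facility. The macro name FIRSTLINE is hypothetical, and the path in the call is taken from the Task 3 log earlier in this chapter, so it will need to be adjusted for other environments.

```sas
/* Hypothetical macro: display the first record of an external file. */
%macro firstline(path);
   %local fref rc fid line;
   %if %sysfunc(fileexist(&path))=0 %then %do;
      %put ERROR: &path does not exist.;
      %return;
   %end;
   %let rc=%sysfunc(filename(fref,&path));   /* SAS generates the fileref */
   %let fid=%sysfunc(fopen(&fref));          /* 0 means the open failed   */
   %if &fid=0 %then %do;
      %put ERROR: %sysfunc(sysmsg());
      %let rc=%sysfunc(filename(fref));
      %return;
   %end;
   %if %sysfunc(fread(&fid))=0 %then %do;    /* load a record into the buffer */
      %let rc=%sysfunc(fget(&fid,line,200)); /* copy up to 200 characters     */
      %put NOTE: First record: %superq(line);
   %end;
   %let rc=%sysfunc(fclose(&fid));
   %let rc=%sysfunc(filename(fref));         /* deassign the fileref */
%mend firstline;

%firstline(C:\workshop\orders03.dat)   /* path as shown in the Task 3 log */
```

FILEEXIST takes a physical path, while FEXIST takes a fileref, which is why the existence test here comes before the FILENAME call. %SUPERQ is used when printing the record so that any ampersands or percent signs in the data are not resolved.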

Processing External Files

The DOPEN, DNUM, and DREAD functions enable access to all the external files found in a given directory. Use these steps for processing files from a directory:

1. Use the FILENAME function to assign a fileref to the directory.

2. Use the DOPEN function to open the directory.

3. Use the DNUM function to identify how many members are in the directory.

4. Use the DREAD function to extract each member name.

5. Process the external files.

6. Use the DCLOSE function to close the directory.

%SYSFUNC is required to execute these functions within the macro facility.

Steps 1 and 2: Access a Directory

For applications to extract information about a directory and its contents, it is necessary to first open the directory using the DOPEN function. If it is successful, the function returns a directory identifier.

m203d07

%macro direxist(dir);
   %local fileref rc did didc;
   %let rc=%sysfunc(filename(fileref,&dir));
   %let did=%sysfunc(dopen(&fileref));
   %if &did=0 %then %do;
      %put ERROR: Directory does not exist;
      %return;
   %end;
   %put NOTE: Directory ID is &did ;
   %let didc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(fileref));
%mend direxist;

%direxist(s:\workshop)

Steps 1 and 2: Access a Directory

Partial SAS Log

50 %direxist(s:\workshop)
NOTE: Directory ID is 1
51
52 %direxist(s:\bad directory)
ERROR: Directory does not exist

m203d07

Steps 3 and 4: Identify Members in a Directory

To extract a list of member names, use the DNUM and DREAD functions.

%macro dirlist(dir);
   %local fileref rc did dnum dmem memname didc;
   %let rc=%sysfunc(filename(fileref,&dir));
   %let did=%sysfunc(dopen(&fileref));
   %if &did=0 %then %do;
      %put ERROR: Directory does not exist;
      %return;
   %end;
   %let dnum=%sysfunc(dnum(&did));
   %do dmem=1 %to &dnum;
      %let memname=%sysfunc(dread(&did,&dmem));
      %put &memname;
   %end;
   %let didc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(fileref));
%mend dirlist;

%dirlist(s:\workshop)

m203d08

3.04 Quiz

Open the program m203d08, submit it, and investigate the log.

1. Are the extensions of the raw data files in uppercase or lowercase?

2. Are the extensions of the Excel workbooks in uppercase or lowercase?

3.04 Quiz – Correct Answer

Open the program m203d08, submit it, and investigate the log.

1. Are the extensions of the raw data files in uppercase or lowercase?

The extension, dat, is in lowercase.

2. Are the extensions of the Excel workbooks in uppercase or lowercase?

The extension, xls, is in lowercase.

Steps 3 and 4: Identify Members in a Directory

Partial SAS Log

m203d08

442 %dirlist(s:\workshop)
age.sas
attrc.sas
attrn.sas
between.sas
C2
C3
C4
C5
charlist.sas
club_members.sas7bdat
country.sas7bdat
country_lookup.sas7bdat
customer.sas7bdat
customer_dim.sas7bdat
customer_type.sas7bdat
daily_sales.sas7bdat
daily_sales.xls
delsql.sas
delvars.sas

Task 1: Reading All DAT Files in a Directory

This demonstration illustrates reading each raw data file in a directory into a SAS data set.

m203d09
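The demonstration program itself is not reproduced in this transcript. As a rough sketch of the idea, the macro below extends the DIRLIST pattern to run a DATA step for each member whose extension is DAT. It is not the m203d09 program: the macro name READDAT is hypothetical, every file is assumed to share the record layout of orders03.dat from the AGE macro, and each file name before the extension is assumed to be a valid SAS data set name.

```sas
/* Hypothetical sketch, not the course program m203d09. */
%macro readdat(dir);
   %local fileref rc did dnum dmem memname dsn didc;
   %let rc=%sysfunc(filename(fileref,&dir));
   %let did=%sysfunc(dopen(&fileref));
   %if &did=0 %then %do;
      %put ERROR: Directory does not exist;
      %return;
   %end;
   %let dnum=%sysfunc(dnum(&did));
   %do dmem=1 %to &dnum;
      %let memname=%sysfunc(dread(&did,&dmem));
      /* %UPCASE makes the extension test case-insensitive. */
      %if %upcase(%scan(&memname,-1,.))=DAT %then %do;
         %let dsn=%scan(&memname,1,.);      /* file name without extension */
         data work.&dsn;
            infile "&dir\&memname";
            /* Assumption: same layout as orders03.dat. */
            input Order_ID Order_Type Order_Date : date9.;
            format Order_Date date9.;
         run;
      %end;
   %end;
   %let didc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(fileref));
%mend readdat;

%readdat(s:\workshop)
```

The backslash in the INFILE path matches the Windows paths used throughout this chapter. Member names that contain hyphens or other operators would need macro quoting in the %IF comparison, which this sketch leaves out.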

Task 2: Read All Excel Workbooks

The Orion Star programmers want a single macro to import all Excel files found in a given directory.

[Figure: the %READXLS macro reads the workbooks order_type.xls, customertype.xls, daily_sales.xls, OrderFact.xls, and Sales.xls and creates the data sets order_type, customertype, sales, orderfact, and daily_sales.]

Task 2: Read All Excel Workbooks

Currently the READXLS macro accepts a single workbook name as a parameter. The programmers want to enhance the macro to read all workbooks in a directory.

Partial SAS Code

%macro readxls(dir);
   %local fileref rc did dnum dmem memname len dsn didc;
   %let rc=%sysfunc(filename(fileref,&dir));
   %let did=%sysfunc(dopen(&fileref));
   %if &did=0 %then %do;
      %put ERROR: Directory does not exist;
      %return;
   %end;
   %let dnum=%sysfunc(dnum(&did));
   %do dmem=1 %to &dnum;
      %let memname=%sysfunc(dread(&did,&dmem));
      %if %upcase(%scan(&memname,-1,.))=XLS %then %do;

m203d10

Task 2: Reading All Excel Files in a Directory

This demonstration illustrates reading all of the worksheets in all workbooks in a directory into a SAS data set.

m203d10

Task 3: Read All Excel Files in Subdirectories

To implement subdirectory recursion, use the %SCAN function to extract the second word of the member name where the period is the delimiter. If the second word is XLS, then read the Excel spreadsheet. If the second word resolves to null, there is no extension, so the first word identifies a subdirectory. Therefore, call the macro again.

Partial SAS Code

%else %if %scan(&memname,2,.)= %then %readxls(&dir\&memname);

m203d11
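The difference between scanning for the second word and scanning for the last word is easy to check in open code. The sketch below uses two member names from the directory listing earlier in this section. It then adds a hypothetical function-style macro, ISDIR, as one possible way to test for a subdirectory directly with DOPEN instead of relying on the absence of an extension.

```sas
%let file=daily_sales.xls;
%let folder=C2;
%put [%scan(&file,2,.)] [%scan(&folder,2,.)];     /* [xls] []   */
%put [%scan(&file,-1,.)] [%scan(&folder,-1,.)];   /* [xls] [C2] */

/* Hypothetical helper: resolves to 1 if the path opens as a directory, else 0. */
%macro isdir(path);
   %local fref rc did result;
   %let rc=%sysfunc(filename(fref,&path));
   %let did=%sysfunc(dopen(&fref));         /* 0 for files and missing paths */
   %let result=%eval(&did > 0);
   %if &did %then %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(fref));        /* deassign the fileref */
   &result
%mend isdir;

%put NOTE: isdir=%isdir(s:\workshop);
```

The extension test treats any member without a period as a subdirectory, so a file with no extension would trigger a recursive call that ends in the "Directory does not exist" message. A DOPEN-based test avoids that case at the cost of one extra open and close per member.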

Task 3: Reading All Excel Files in Subdirectories

This demonstration illustrates reading all of the worksheets in all workbooks in a directory and a subdirectory into a SAS data set.

m203d11

Exercise

This exercise reinforces the concepts discussed previously.