Transcript

SAS Programming

SAS Data Mart

Outline

• Access different format of data for SAS• SAS data mart• SAS data manipulation

2

Making Use of SAS Enterprise Guide CodeImport a text file Example: Orders.txt

Import an Excel file Example: SupplyInfo.xls

3

Learn from ExamplesSAS Help Contents -> Learning to use SAS -> Sample SAS

Programs -> Base SAS “Base Usage Guide Examples”

– Chapter 3, 4

4

Import an Excel Sheetproc import out=work.commrex

datafile ="C:\Lin\Shared\ISQS6339\Commrex_3358.xls" dbms=excel replace;sheet="Company";getnames=yes;mixed=no;scantext=yes;usedate=yes;scantime=yes;run;proc print data=work.commrex;run;

5

Excel SAS/ACCESS LIBNAME Enginelibname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_6339.xls';

proc print data=xlsdata.New1;

run;

6

Exercise 10: SAS Data Step Programming Import and print text file Userlog00.txt

Import and print Excel sheet customers in Afs_customers.xls

Send the Word file containing the Screenshots of two outputs to [email protected]

7

8

SAS Data MartDefinitions with minor differences

A repository for data to be used as a source for business intelligence Object-oriented data repository, held for decision support and BI needs.

A data mart is a specialized version of a data warehouse. Like data warehouses, data marts contain a snapshot of operational data that

helps business people to strategize based on analyses of past trends and experiences.

The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data.

A data mart configuration emphasizes easy access to relevant information.

9

SAS Data Mart Structures for Different PurposesLongitudinal analysis Data mart has observations over time The generalized form of the data structure: time <-- value -->

category Example: Beer sales

Transactional data, and aggregated levels Example: web log

The order of rows matters Cross-sectional dimension data – ranked by categories Interleaved time series – ranked by time

10

AuctionDescription: This is data from 19 livestock auction markets. The columns

include: the number of head of different livestock sold (in thousands) including CATTLE, CALVES, HOGS, and SHEEP, the cost of operation of the auction market (in thousands of dollars) (COST), and the market identifier (MARKETID).

The object is to use multiple linear regression to describe the relationship between the cost of operations to the number of livestock sold in the various classes. COST will be the dependent variable and CATTLE, CALVES, HOGS, and SHEEP the independent variables.

An additional variable, VOLUME, is the total of all major livestock sold in each market. It is the sum of the variables CATTLE, CALVES, HOGS, and SHEEP, and can be used to demonstrate an exact linear dependency between independent variables. rows, columns.

Regression, Graphic Analysis

11

Beer SalesDescription: Beer sales records monthly sales of beer

in hectoliters, along with the average high and low temperatures in the region, over a period of five years.

The object is to see how beer sales change over time. You can also consider the relationships between beer sales and temperatures.

4 rows, 60 columns.Regression, Time Series

12

SAS Data Mart Topics

Data accessing Data manipulation – Proc SQLData transposingData transformations

Interval-scaled observations processing– Derived variables– Time intervals

Define start and end of the interval Define how intervals are measured

– Binning Observations into groups– Transformations of distributions– Replacing missing values

Categorical observations processing

Coding for predictive modelingData preparation for different formats of data martsData samplingModel scoring and automation

13

Data Transpose

Basic transposing tasks (Ch14) From a multi-rows-per-subject to a one-row-per-subject From a one-row-per-subject to a multi-rows-per-subject Creating key-value tables

Transposing longitudinal data (Ch15) From STANDARD to INTERLEAVED and back From CROSS SECTIONAL to STANDARD and back Complex scenarios

Macro

14

Data Transformations

Transformations of Interval-Scaled Variables (Ch16) Simple derived variables Derived relative variables Time intervals Binning observations into groups Transformations of distributions Replacing missing values

Transformations of Categorical Variables (Ch17) Derived variables Combining categories Dummy coding of categorical variables Multidimensional categorical variables

15

Exercise 11

Data transpose. Write a SAS code to transpose SAS data set Auction in the shared directory.

Transpose without other parameters Drop some variables Remove the generated variables

Input data. Userlog0.txt, Userlog1.txt

16

SAS Programming Illustrative Examples

Data sets merge

SAS Procedures (Proc) Proc Freq Proc Means

Demonstration: QUERYLOG.TXT

17

PROC FREQ

data SummerSchool; input Gender $ Internship $ Enrollment $ Count @@; datalines; boys yes yes 35 boys yes no 29 boys no yes 14 boys no no 27 girls yes yes 32 girls yes no 10 girls no yes 53 girls no no 23;proc freq data=SummerSchool order=data; weight count; tables Internship*Enrollment / chisq;

run;

18

PROC FREQ

PROC SQL;CREATE VIEW WORK.SORT

AS SELECT AnonID, QueryDate FROM SASUSER.IMPW_0012;

QUIT;

PROC freq DATA=WORK.SORTORDER=INTERNAL;

TABLES AnonID / NOPERCENT NOCUM SCORES=TABLE; TABLES QueryDate / NOPERCENT NOCUM SCORES=TABLE;RUN;

19

PROC MEANS

PROC SQL;CREATE VIEW WORK.SORTTempTableSorted

AS SELECT QueryTime, AnonID, QueryDate FROM SASUSER.IMPW_0012;QUIT;

PROC MEANS DATA=WORK.SORTTempTableSorted (drop= _way_ _type_)FW=12 PRINTALLTYPES CHARTYPE NWAY VARDEF=DF

MIN MAX N;VAR QueryTime;CLASS AnonID / ORDER=UNFORMATTED ASCENDING;CLASS QueryDate / ORDER=UNFORMATTED ASCENDING;

OUTPUT OUT=QUERYLOG(LABEL="Summary Statistics for SASUSER.IMPW_0012")MIN()= MAX()= N()=

/ AUTONAME AUTOLABEL WAYS INHERIT;RUN;

20

Case Study – Building a Customer Data Mart

Business Context To create an analytical model that predicts customers that have a high

probability to leave the company. To create a data mart that reflects the properties and behaviors of customers. The data mart will also contain a variable that presents the fact that a

customer has left the company at a certain point in time. Five SAS datasets

Customer – demographic and customer baseline data Accounts – information customer accounts Leasing – data on leasing information Callcenter – transactional data on call center contacts Scores – transactional data of value segment scores

The datasets have the unique CUSTID variable.

21

Data Mart ProgramsCreate aggregation on a subject level of ACCOUNTS, LEASING,

CALLCENTERS and SCORES tables ACCOUNT

– Derive sums per customer– Calculate an account type split based on the balance

LEASING– Summarize the value and the annual rate per customer

CALLCENTER– Aggregate the call center data over all customers using the FREQ proc– Calculate the frequencies for the category COMPLAINT

SCORES– Contains the VALUESEGMENT for each customer and month. If a customer cancelled his contract, he

got an “8.lost”, or he got “1.Gold”, etc.– Create three output data sets: value segments in actual month, one month ago, and in a future month

Join the resulting tables together and create a CUSTOMER DATA MART table.

22

Exercise 12

Use PROC FREQ to generate a summary statistics table from 6339EX12.xls dataset

Hints Define path:

– libname xlsdata ‘<path>\6339EX12.xls'; Define the worksheet as ‘weblog$’n Column: Page, Row: UserID

Explore the data mart created by the code of Ch25.sas Use SAS Enterprise Guide to create joint tables from the dataset Try One-way Frequency and Summary Statistics functions with the

ACCOUNTS data set

23

More TopicsMultiple Interval-Scaled Observation per Subject

Multiple Categorical per Subject

Coding for Predictive Modeling

Data Preparation for Multi-Rows-per-Subject and Longitudinal Data Marts

Sampling

Scoring and Automation

Do’s and Don’ts

24

Base Procedures Guide Examples

Elementary Statistics Procedures, Chapter 1 Reporting Procedures, Chapter 2 APPEND Procedure, Chapter 5 CALENDAR Procedure, Chapter 7 CHART Procedure, Chapter 9 COMPARE Procedure, Chapter 11 CORR Procedure, Chapter 15 FORMAT Procedure, Chapter 18 FORMS Procedure, Chapter 19 FREQ Procedure, Chapter 20 MEANS Procedure, Chapter 21 PLOT Procedure, Chapter 25 PRINT Procedure, Chapter 27 RANK Procedure, Chapter 29 SQL Procedure, Chapter 34 STANDARD Procedure, Chapter 35 TABULATE Procedure, Chapter 37 TIMEPLOT Procedure, Chapter 40 TRANSPOSE Procedure, Chapter 41 UNIVARIATE Procedure, Chapter 42

25

Base Usage Guide Examples

What is the SAS System?, Chapter 1 Introduction to DATA Step Processing, Chapter 2 Starting with Raw Data, Chapter 3 Starting with SAS Data Sets, Chapter 4 Understanding DATA Step Processing, Chapter 5 Working with Numeric Variables, Chapter 6 Working with Character Variables, Chapter 7 Acting on Selected Observations, Chapter 8 Creating Subsets of Observations, Chapter 9 Grouped or Sorted Observations, Chapter 10 More than One Obs. in a Calculation, Chapter 11 Finding Shortcuts in Programming, Chapter 12 Working with Dates in the SAS System, Chapter 13 Concatenating SAS Data Sets, Chapter 15 Interleaving SAS Data Sets, Chapter 16 Merging SAS Data Sets, Chapter 17 Updating SAS Data Sets, Chapter 18 Manipulating SAS Data Sets, Chapter 19 Understanding and Enhancing Your Output, Chapter 20 Printing Detailed Reports, Chapter 24 Creating Summary Tables, Chapter 25 Mailing Labels and Other Forms, Chapter 26 Plotting Relationships Between Variables, Chap. 27 Charting the Values of a Variable, Chapter 28 Writing Output, Chapter 29 Customizing Output, Chapter 30

26

SQL ExamplesDemonstrates Basic SQL SELECT Statements Demonstrates Basic SQL CREATE Statements Demonstrates Basic SQL DELETE Statements Demonstrates Basic SQL Dictionary Statements Demonstrates Basic SQL Insert Statements Demonstrates SQL Join Performance with Macros Demonstrates Basic SQL JOIN Features Demonstrates SQL Set Operators Introduction to the SQL Procedure, Chapter 1 Retrieving Data with the SELECT Statement, Chapter 2 Datasets Used in SQL Usage Guide Demonstrates Basic SQL UPDATE Statements Demonstrates SQL View Capabilities Demonstrates SQL View Update Capabilities Demonstrates Recursive Joins with PROC SQL Demonstrates Nested Statistics with SQL Demonstrates CASE Statement with PROC SQL Demonstrates the Product of Columns with PROC SQL Demonstrates Merging Tables with Proc SQL Demonstrates Use of Macros with PROC SQL Demonstrates Use of Cartesian Product with PROC SQL More Uses of the Cartesian Product with PROC SQL Demonstrates Use of Subqueries with PROC SQL Demonstrates Relationship Between Data Step and SQL


Top Related