sas programming sas data mart. outline access different format of data for sas sas data mart sas...

Download SAS Programming SAS Data Mart. Outline Access different format of data for SAS SAS data mart SAS data manipulation 2

Post on 23-Dec-2015

230 views

Category:

Documents

1 download

Embed Size (px)

TRANSCRIPT

  • Slide 1
  • SAS Programming SAS Data Mart
  • Slide 2
  • Outline Access different format of data for SAS SAS data mart SAS data manipulation 2
  • Slide 3
  • Making Use of SAS Enterprise Guide Code Import a text file Example: Orders.txt Import an Excel file Example: SupplyInfo.xls 3
  • Slide 4
  • Learn from Examples SAS Help Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS Base Usage Guide Examples Chapter 3, 4 4
  • Slide 5
  • Import an Excel Sheet proc import out=work.commrex datafile ="C:\Lin\Shared\ISQS6339\Commrex_3358.xls" dbms=excel replace; sheet="Company"; getnames=yes; mixed=no; scantext=yes; usedate=yes; scantime=yes; run; proc print data=work.commrex; run; 5
  • Slide 6
  • Excel SAS/ACCESS LIBNAME Engine libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_6339.xls'; proc print data=xlsdata.New1; run; 6
  • Slide 7
  • Exercise 10: SAS Data Step Programming Import and print text file Userlog00.txt Import and print Excel sheet customers in Afs_customers.xls Send the Word file containing the Screenshots of two outputs to zhangxi.lin@hotmail.com 7
  • Slide 8
  • 8 SAS Data Mart Definitions with minor differences A repository for data to be used as a source for business intelligence Object-oriented data repository, held for decision support and BI needs. A data mart is a specialized version of a data warehouse. Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information.
  • Slide 9
  • 9 SAS Data Mart Structures for Different Purposes Longitudinal analysis Data mart has observations over time The generalized form of the data structure: time category Example: Beer sales Transactional data, and aggregated levels Example: web log The order of rows matters Cross-sectional dimension data ranked by categories Interleaved time series ranked by time
  • Slide 10
  • 10 Auction Description: This is data from 19 livestock auction markets. The columns include: the number of head of different livestock sold (in thousands) including CATTLE, CALVES, HOGS, and SHEEP, the cost of operation of the auction market (in thousands of dollars) (COST), and the market identifier (MARKETID). The object is to use multiple linear regression to describe the relationship between the cost of operations to the number of livestock sold in the various classes. COST will be the dependent variable and CATTLE, CALVES, HOGS, and SHEEP the independent variables. An additional variable, VOLUME, is the total of all major livestock sold in each market. It is the sum of the variables CATTLE, CALVES, HOGS, and SHEEP, and can be used to demonstrate an exact linear dependency between independent variables. rows, columns. Regression, Graphic Analysis
  • Slide 11
  • 11 Beer Sales Description: Beer sales records monthly sales of beer in hectoliters, along with the average high and low temperatures in the region, over a period of five years. The object is to see how beer sales change over time. You can also consider the relationships between beer sales and temperatures. 4 rows, 60 columns. Regression, Time Series
  • Slide 12
  • 12 SAS Data Mart Topics Data accessing Data manipulation Proc SQL Data transposing Data transformations Interval-scaled observations processing Derived variables Time intervals Define start and end of the interval Define how intervals are measured Binning Observations into groups Transformations of distributions Replacing missing values Categorical observations processing Coding for predictive modeling Data preparation for different formats of data marts Data sampling Model scoring and automation
  • Slide 13
  • 13 Data Transpose Basic transposing tasks (Ch14) From a multi-rows-per-subject to a one-row-per-subject From a one-row-per-subject to a multi-rows-per-subject Creating key-value tables Transposing longitudinal data (Ch15) From STANDARD to INTERLEAVED and back From CROSS SECTIONAL to STANDARD and back Complex scenarios Macro
  • Slide 14
  • 14 Data Transformations Transformations of Interval-Scaled Variables (Ch16) Simple derived variables Derived relative variables Time intervals Binning observations into groups Transformations of distributions Replacing missing values Transformations of Categorical Variables (Ch17) Derived variables Combining categories Dummy coding of categorical variables Multidimensional categorical variables
  • Slide 15
  • 15 Exercise 11 Data transpose. Write a SAS code to transpose SAS data set Auction in the shared directory. Transpose without other parameters Drop some variables Remove the generated variables Input data. Userlog0.txt, Userlog1.txt
  • Slide 16
  • 16 SAS Programming Illustrative Examples Data sets merge SAS Procedures (Proc) Proc Freq Proc Means Demonstration: QUERYLOG.TXT
  • Slide 17
  • 17 PROC FREQ data SummerSchool; input Gender $ Internship $ Enrollment $ Count @@; datalines; boys yes yes 35 boys yes no 29 boys no yes 14 boys no no 27 girls yes yes 32 girls yes no 10 girls no yes 53 girls no no 23 ; proc freq data=SummerSchool order=data; weight count; tables Internship*Enrollment / chisq; run;
  • Slide 18
  • 18 PROC FREQ PROC SQL; CREATE VIEW WORK.SORT AS SELECT AnonID, QueryDate FROM SASUSER.IMPW_0012; QUIT; PROC freq DATA=WORK.SORT ORDER=INTERNAL; TABLES AnonID / NOPERCENT NOCUM SCORES=TABLE; TABLES QueryDate / NOPERCENT NOCUM SCORES=TABLE; RUN;
  • Slide 19
  • 19 PROC MEANS PROC SQL; CREATE VIEW WORK.SORTTempTableSorted AS SELECT QueryTime, AnonID, QueryDate FROM SASUSER.IMPW_0012; QUIT; PROC MEANS DATA=WORK.SORTTempTableSorted (drop= _way_ _type_) FW=12 PRINTALLTYPES CHARTYPE NWAYVARDEF=DF MIN MAX N; VAR QueryTime; CLASS AnonID /ORDER=UNFORMATTED ASCENDING; CLASS QueryDate /ORDER=UNFORMATTED ASCENDING; OUTPUT OUT=QUERYLOG(LABEL="Summary Statistics for SASUSER.IMPW_0012") MIN()= MAX()= N()= / AUTONAME AUTOLABEL WAYS INHERIT; RUN;
  • Slide 20
  • 20 Case Study Building a Customer Data Mart Business Context To create an analytical model that predicts customers that have a high probability to leave the company. To create a data mart that reflects the properties and behaviors of customers. The data mart will also contain a variable that presents the fact that a customer has left the company at a certain point in time. Five SAS datasets Customer demographic and customer baseline data Accounts information customer accounts Leasing data on leasing information Callcenter transactional data on call center contacts Scores transactional data of value segment scores The datasets have the unique CUSTID variable.
  • Slide 21
  • 21 Data Mart Programs Create aggregation on a subject level of ACCOUNTS, LEASING, CALLCENTERS and SCORES tables ACCOUNT Derive sums per customer Calculate an account type split based on the balance LEASING Summarize the value and the annual rate per customer CALLCENTER Aggregate the call center data over all customers using the FREQ proc Calculate the frequencies for the category COMPLAINT SCORES Contains the VALUESEGMENT for each customer and month. If a customer cancelled his contract, he got an 8.lost, or he got 1.Gold, etc. Create three output data sets: value segments in actual month, one month ago, and in a future month Join the resulting tables together and create a CUSTOMER DATA MART table.
  • Slide 22
  • 22 Exercise 12 Use PROC FREQ to generate a summary statistics table from 6339EX12.xls dataset Hints Define path: libname xlsdata \6339EX12.xls'; Define the worksheet as weblog$n Column: Page, Row: UserID Explore the data mart created by the code of Ch25.sas Use SAS Enterprise Guide to create joint tables from the dataset Try One-way Frequency and Summary Statistics functions with the ACCOUNTS data set
  • Slide 23
  • 23 More Topics Multiple Interval-Scaled Observation per Subject Multiple Categorical per Subject Coding for Predictive Modeling Data Preparation for Multi-Rows-per-Subject and Longitudinal Data Marts Sampling Scoring and Automation Dos and Donts
  • Slide 24
  • 24 Base Procedures Guide Examples Elementary Statistics Procedures, Chapter 1 Reporting Procedures, Chapter 2 APPEND Procedure, Chapter 5 CALENDAR Procedure, Chapter 7 CHART Procedure, Chapter 9 COMPARE Procedure, Chapter 11 CORR Procedure, Chapter 15 FORMAT Procedure, Chapter 18 FORMS Procedure, Chapter 19 FREQ Procedure, Chapter 20 MEANS Procedure, Chapter 21 PLOT Procedure, Chapter 25 PRINT Procedure, Chapter 27 RANK Procedure, Chapter 29 SQL Procedure, Chapter 34 STANDARD Procedure, Chapter 35 TABULATE Procedure, Chapter 37 TIMEPLOT Procedure, Chapter 40 TRANSPOSE Procedure, Chapter 41 UNIVARIATE Procedure, Chapter 42
  • Slide 25
  • 25 Base Usage Guide Examples What is the SAS System?, Chapter 1 Introduction to DATA Step Processing, Chapter 2 Starting with Raw Data, Chapter 3 Starting with SAS Data Sets, Chapter 4 Understanding DATA Step Processing, Chapter 5 Working with Numeric Variables, Chapter 6 Working with Character Variables, Chapter 7 Acting on Selected Observations, Chapter 8 Creating Subsets of Observations, Chapte

Recommended

View more >