# How to start using SAS

Tina Tian

Post on 05-Jan-2016


TRANSCRIPT


The topics

- An overview of the SAS system
- Reading raw data / creating SAS data sets
- Combining SAS data sets and match-merging SAS data sets
- Formatting data
- Some simple statistical analysis procedures

Basic Screen Navigation

Main:
- Editor: contains the SAS program to be submitted.
- Log: contains information about the processing of the SAS program, including any warning and error messages.
- Output: contains reports generated by SAS procedures and DATA steps.

Side:
- Explorer: navigate to other objects such as libraries.
- Results: navigate your Output window.

SAS programs

- A SAS program is a sequence of steps that the user submits for execution.
- DATA steps are typically used to create SAS data sets.
- PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, sort data, and analyze data).

SAS Data Libraries

- A SAS data library is a collection of SAS files that SAS recognizes as a unit.
- A SAS data set is one type of SAS file stored in a data library.
- The Work library is a temporary library: when SAS is closed, all the data sets in the Work library are deleted. To keep a data set, create a permanent SAS data set in your own library.

SAS Data Libraries

Identify or create SAS data libraries by assigning each a library reference name (libref) with the LIBNAME statement:

    LIBNAME libref 'file-folder-location';

E.g.:

    LIBNAME readData 'C:\temp\sas class\readData';

Rules for naming a libref:
- The name must be 8 characters or fewer.
- The name must begin with a letter or underscore.
- The remaining characters must be letters, numbers, or underscores.
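To make the difference between the temporary Work library and a permanent library concrete, here is a minimal sketch. The libref `mylib`, the folder path, and the data set names are hypothetical; the folder must already exist on your machine.

```sas
/* Hypothetical folder; substitute one that exists on your machine. */
LIBNAME mylib 'C:\temp\sas class';

/* One-level name: stored in Work and deleted when the SAS session ends. */
DATA scratch;
    x = 1;
RUN;

/* Two-level name (libref.data-set): stored permanently in the folder above. */
DATA mylib.keepme;
    SET scratch;
RUN;
```

After restarting SAS and resubmitting the LIBNAME statement, `mylib.keepme` should still be available, while `scratch` is gone.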

Reading internal raw data in SAS system

To put small amounts of raw data directly in the SAS program and create a SAS data set, you must:
- Start a DATA step and name the SAS data set being created with the DATA statement.
- Describe how to read the data fields from the raw data with the INPUT statement.
- Use the DATALINES statement to indicate internal data.

The RUN statement marks the end of a step.

Reading internal raw data in SAS system

Example:

    DATA dog1;
        INPUT ID Age Gender $ Income;
        DATALINES;
    1 10 m 2300
    2 13 f 1500
    3 12 f 1700
    4 9 m 100
    5 13 m 1000
    ;
    RUN;

The semicolon that ends the data must be on its own line, after the last data line.

Reading external raw data files into SAS system

To create a SAS data set from a raw data file, you must:
- Start a DATA step and name the SAS data set being created (DATA statement).
- Identify the location of the raw data file to read (INFILE statement).
- Describe how to read the data fields from the raw data file (INPUT statement).

The RUN statement marks the end of a step.

Reading external raw data file into SAS system

    LIBNAME readData 'C:\temp\sas class';
    DATA readData.dog1;
        INFILE 'C:\temp\sas class\dog.txt';
        INPUT ID Age Gender $ Income;
    RUN;

- The LIBNAME statement assigns the libref readData to a data library.
- The DATA statement creates a permanent SAS data set named dog1.
- The INFILE statement points to a raw data file.
- The INPUT statement:
  - names the SAS variables
  - identifies the variables as character or numeric ($ indicates character data)
  - specifies the locations of the fields in the raw data
  - can be specified as column, formatted, list, or named input
- The RUN statement marks the end of a step.
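The example above uses list input. As a rough sketch of two of the other INPUT styles mentioned, the following reads in-stream data with column input and with formatted input. The data set names, the `Visit` variable, and the data values are made up for illustration.

```sas
/* Column input: each field is read from fixed column positions. */
DATA dog_col;
    INPUT ID 1-2 Age 4-5 Gender $ 7 Income 9-12;
    DATALINES;
01 10 m 2300
02 13 f 1500
;
RUN;

/* Formatted input: informats say how to interpret each field.
   +1 skips the blank column between fields. */
DATA dog_fmt;
    INPUT ID 2. +1 Visit MMDDYY10. +1 Income COMMA6.;
    FORMAT Visit DATE9.;
    DATALINES;
01 01/05/2016 2,300
02 01/06/2016 1,500
;
RUN;
```

Column and formatted input are mainly useful when fields are not separated by blanks or contain characters (such as commas in numbers) that list input cannot read directly.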

Reading Delimited or PC Database Files with the IMPORT Procedure

If your data file has the proper extension, use the simplest form of the IMPORT procedure:

    PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
    RUN;

| Type of file | Extension | DBMS identifier |
|---|---|---|
| Comma-delimited | .csv | CSV |
| Tab-delimited | .txt | TAB |
| Excel | .xls | EXCEL |
| Lotus files | .wk1, .wk3, .wk4 | WK1, WK3, WK4 |
| Delimiters other than commas or tabs | | DLM |

Example:

    PROC IMPORT DATAFILE='c:\temp\sale.xls' OUT=readData.import1 DBMS=EXCEL;
    RUN;

Reading Delimited or PC Database Files with the IMPORT Procedure

If your file does not have the proper extension, or it uses a delimiter other than commas or tabs, you must use the DBMS= option and the DELIMITER= statement:

    PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
        DELIMITER = 'delimiter-character';
    RUN;

Example:

    PROC IMPORT DATAFILE='c:\temp\sale.txt' OUT=readData.import2 DBMS=DLM;
        DELIMITER = '&';
    RUN;

Reading Files with the IMPORT Procedure

If your file does not have the proper extension, or it uses a delimiter other than commas or tabs, you must use the DBMS= option and the DELIMITER= statement:

    PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
        DELIMITER = 'delimiter-character';
    RUN;

Example:

    PROC IMPORT DATAFILE = 'C:\sas class\readData\import2.txt' OUT = readData.sasfile DBMS = DLM;
        DELIMITER = '&';
    RUN;
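For the common comma-delimited case, a sketch might look like the following. The file path and the output data set name `work.sales` are hypothetical; REPLACE and GETNAMES= are standard PROC IMPORT features that the slides do not cover.

```sas
/* Hypothetical file path. REPLACE overwrites work.sales if it already exists. */
PROC IMPORT DATAFILE='C:\temp\sale.csv'
            OUT=work.sales
            DBMS=CSV
            REPLACE;
    GETNAMES=YES;   /* take variable names from the first row of the file */
RUN;

/* Check the variable names and types that PROC IMPORT chose. */
PROC CONTENTS DATA=work.sales;
RUN;
```

Running PROC CONTENTS right after an import is a quick way to confirm that each column was read as the type you expected.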

Format in SAS data set

Standard formats (selected):
- Character: $w.
- Date, time and datetime: DATEw., MMDDYYw., TIMEw.d
- Numeric: COMMAw.d, DOLLARw.d

Use the FORMAT statement:

    PROC PRINT DATA=sales;
        VAR Name DateReturned CandyType Profit;
        FORMAT DateReturned DATE9. Profit DOLLAR6.2;
    RUN;
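A self-contained sketch of standard formats in action follows. The `sales` rows here are invented for illustration and are not the data behind the slide; DOLLAR8.2 is used to leave room for the dollar sign and decimals.

```sas
/* Illustrative data only; the :MMDDYY10. informat reads the date field. */
DATA sales;
    INPUT Name $ DateReturned :MMDDYY10. Profit;
    DATALINES;
Adriana 03/21/2016 17.50
Nathan 03/22/2016 8.25
;
RUN;

/* Without the FORMAT statement, DateReturned would print as a raw day count. */
PROC PRINT DATA=sales;
    FORMAT DateReturned DATE9. Profit DOLLAR8.2;
RUN;
```

A FORMAT statement inside a PROC step applies only to that step's output; placing it in a DATA step stores the format with the data set.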

Format in SAS data set

Create your own custom formats in two steps:
- Create the format using PROC FORMAT and the VALUE statement.
- Assign the format to the variable using the FORMAT statement.

General form of a simple PROC FORMAT step:

    PROC FORMAT;
        VALUE name range-1 = 'formatted-text-1'
                   range-2 = 'formatted-text-2';
    RUN;

The name in the VALUE statement is the name of the format you are creating. It cannot be longer than eight characters (SAS 9 and later allow longer names) and must not start or end with a number. If the format is for character data, the name must start with a $.

Format in SAS data set

Example:

    /* Step 1: Create the formats for certain variables */
    PROC FORMAT;
        VALUE $genFmt 'm' = 'Male'
                      'f' = 'Female';
        VALUE polFmt 1 = 'likes'
                     2 = "don't care"
                     3 = 'dislikes'
                     9 = 'no answer';
    RUN;

    /* Step 2: Assign the formats to the variables */
    DATA Mydata.dog123(replace=yes);
        SET Mydata.dog123;
        FORMAT Gender $genFmt. Policy polFmt.;
    RUN;
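Ranges in a VALUE statement can also cover intervals of numeric values. The sketch below groups ages into two bands; the format name `ageGrp` and the cut point are hypothetical, and `sashelp.class` is a sample data set shipped with SAS rather than one from the slides.

```sas
/* LOW and HIGH stand for the smallest and largest values;
   -< excludes the upper endpoint of the range. */
PROC FORMAT;
    VALUE ageGrp LOW -< 13  = 'Under 13'
                 13 - HIGH  = '13 or older';
RUN;

/* The stored Age values are unchanged; only the display is grouped. */
PROC FREQ DATA=sashelp.class;
    TABLES Age;
    FORMAT Age ageGrp.;
RUN;
```

Because PROC FREQ groups by formatted value, this produces a two-row table without creating a new variable.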

Format in SAS data set

Permanently store formats in a SAS catalog by:
- Creating a format catalog file with the LIB= option in the PROC FORMAT statement
- Setting the format search options

Example:

    LIBNAME Mydata 'C:\sas class\Format';
    OPTIONS FMTSEARCH=(Mydata.dogfmt);
    PROC FORMAT LIB=Mydata.dogfmt;
        VALUE $genFmt 'm' = 'Male'
                      'f' = 'Female';
    RUN;

Read formats:

    OPTIONS NOFMTERR;
    OPTIONS FMTSEARCH=(Mydata.dogfmt);

Combining SAS Data Sets: Concatenating and Interleaving

- Use the SET statement in a DATA step to concatenate SAS data sets.
- Use the SET and BY statements in a DATA step to interleave SAS data sets.

Combining SAS Data Sets: Concatenating and Interleaving

General form of a DATA step concatenation:

    DATA new-SAS-data-set;
        SET SAS-data-set1 SAS-data-set2;
    RUN;

Example:

    DATA mydata.dog12;
        SET dog1 mydata.dog2;
    RUN;

Combining SAS Data Sets: Concatenating and Interleaving

General form of a DATA step interleave:

    DATA new-data-set;
        SET SAS-data-set1 SAS-data-set2;
        BY BY-variable;
    RUN;

Sort all the SAS data sets first by using PROC SORT.

Example:

    PROC SORT DATA=dog1 OUT=dog1_sorted;
        BY ID;
    RUN;

    /* mydata.dog2 must also be sorted by ID */
    DATA mydata.dog12;
        SET dog1_sorted mydata.dog2;
        BY ID;
    RUN;
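The difference between the two forms is easiest to see on tiny inputs. This sketch uses two made-up data sets, `a` and `b`, that are already sorted by ID.

```sas
DATA a;
    INPUT ID Name $;
    DATALINES;
1 Ann
3 Cal
;
RUN;

DATA b;
    INPUT ID Name $;
    DATALINES;
2 Bob
4 Dee
;
RUN;

/* Concatenate: all rows of a, then all rows of b (ID order 1 3 2 4). */
DATA stacked;
    SET a b;
RUN;

/* Interleave: rows are taken in BY-variable order (ID order 1 2 3 4). */
DATA interleaved;
    SET a b;
    BY ID;
RUN;
```

Both results contain the same four observations; only their order differs.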

Match-Merging SAS Data Sets

- One-to-one match merge
- One-to-many match merge
- Many-to-many match merge

The SAS statements for all three types of match merge have the same form:

    DATA new-data-set;
        MERGE SAS-data-set-1 SAS-data-set-2 SAS-data-set-3;
        BY by-variable(s);   /* the variable(s) that control which observations to match */
    RUN;

Merging SAS Data Sets: A More Complex Example

    /* To match-merge the data sets by the common variable EmpID,
       the data sets must be ordered by EmpID */
    PROC SORT DATA=combData.groupsched;
        BY EmpID;
    RUN;

Example: merge two data sets to get the names of the group team members scheduled to fly next week.

combData.employee

| EmpID | LastName |
|---|---|
| E00632 | Strauss |
| E01483 | Lee |
| E01996 | Nick |
| E04064 | Waschk |

combData.groupsched

| EmpID | FlightNum |
|---|---|
| E04064 | 5105 |
| E00632 | 5250 |
| E01996 | 5501 |

Merging SAS Data Sets: A More Complex Example

    /* Simply merge two data sets */
    DATA combData.nextweek;
        MERGE combData.employee combData.groupsched;
        BY EmpID;
    RUN;

combData.nextweek

| EmpID | LastName | FlightNum |
|---|---|---|
| E00632 | Strauss | 5250 |
| E01483 | Lee | |
| E01996 | Nick | 5501 |
| E04064 | Waschk | 5105 |

Merging SAS Data Sets: A More Complex Example

Eliminating nonmatches

Use the IN= data set option to determine which data set(s) contributed to the current observation.

General form of the IN= data set option:

    SAS-data-set (IN=variable)

The variable is a temporary numeric variable that has two possible values:
- 0 indicates that the data set did not contribute to the current observation.
- 1 indicates that the data set did contribute to the current observation.

Merging SAS Data Sets: A More Complex Example

    /* Exclude employees who are not scheduled to fly next week. */
    LIBNAME combData 'K:\sas class\merge';
    DATA combData.nextweek;
        MERGE combData.employee
              combData.groupsched (IN=InSched);
        BY EmpID;
        IF InSched=1;   /* keep the observation only when this is true */
    RUN;

combData.nextweek

| EmpID | LastName | FlightNum |
|---|---|---|
| E00632 | Strauss | 5250 |
| E01996 | Nick | 5501 |
| E04064 | Waschk | 5105 |

Merging SAS Data Sets: A More Complex Example

    /* Find employees who are not in the scheduled flight group. */
    LIBNAME combData 'K:\sas class\merge';
    DATA combData.nextweek;
        MERGE combData.employee (IN=InEmp)
              combData.groupsched (IN=InSched);
        BY EmpID;
        IF InEmp=1;     /* true: the employee data set contributed */
        IF InSched=0;   /* false: the schedule data set did not contribute */
    RUN;

combData.nextweek

| EmpID | LastName | FlightNum |
|---|---|---|
| E01483 | Lee | |
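The two IN= examples can be combined into one DATA step that routes matches and nonmatches to separate data sets. This is only a sketch: it rebuilds the employee and schedule tables in the Work library from the values shown in the slides, and the output names `matched` and `nomatch` are hypothetical.

```sas
DATA employee;
    INPUT EmpID $ LastName $;
    DATALINES;
E00632 Strauss
E01483 Lee
E01996 Nick
E04064 Waschk
;
RUN;

DATA groupsched;
    INPUT EmpID $ FlightNum;
    DATALINES;
E04064 5105
E00632 5250
E01996 5501
;
RUN;

/* employee is already in EmpID order; groupsched is not. */
PROC SORT DATA=groupsched;
    BY EmpID;
RUN;

/* IN= variables exist only during the step and are not written out. */
DATA matched nomatch;
    MERGE employee (IN=InEmp) groupsched (IN=InSched);
    BY EmpID;
    IF InEmp AND InSched THEN OUTPUT matched;
    ELSE OUTPUT nomatch;
RUN;
```

With these inputs, `matched` holds the three scheduled employees and `nomatch` holds the one employee (Lee) without a flight.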

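Different Types of Merges in SAS: One-to-Many Merging

    DATA work.three;
        MERGE work.one work.two;
        BY X;
    RUN;

Work.one

| X | Y |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |

Work.two

| X | Z |
|---|---|
| 1 | A1 |
| 1 | A2 |
| 2 | B1 |
| 3 | C1 |
| 3 | C2 |

Work.three

| X | Y | Z |
|---|---|---|
| 1 | A | A1 |
| 1 | A | A2 |
| 2 | B | B1 |
| 3 | C | C1 |
| 3 | C | C2 |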

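Different Types of Merges in SAS: Many-to-Many Merging

    DATA work.three;
        MERGE work.one work.two;
        BY X;
    RUN;

Work.one

| X | Y |
|---|---|
| 1 | A1 |
| 1 | A2 |
| 2 | B1 |
| 2 | B2 |

Work.two

| X | Z |
|---|---|
| 1 | AA1 |
| 1 | AA2 |
| 1 | AA3 |
| 2 | BB1 |
| 2 | BB2 |

Work.three

| X | Y | Z |
|---|---|---|
| 1 | A1 | AA1 |
| 1 | A2 | AA2 |
| 1 | A2 | AA3 |
| 2 | B1 | BB1 |
| 2 | B2 | BB2 |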
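To try the many-to-many case yourself, the two input tables can be entered as in-stream data. This sketch simply re-creates the tables shown in the slide.

```sas
DATA one;
    INPUT X Y $;
    DATALINES;
1 A1
1 A2
2 B1
2 B2
;
RUN;

DATA two;
    INPUT X Z $;
    DATALINES;
1 AA1
1 AA2
1 AA3
2 BB1
2 BB2
;
RUN;

/* Within each BY group, rows are paired in order; when one data set
   runs out of rows, its last values are carried forward. */
DATA three;
    MERGE one two;
    BY X;
RUN;

PROC PRINT DATA=three;
RUN;
```

Note that MERGE pairs rows positionally within each BY group and does not produce every combination of rows. SAS also writes a note to the log when more than one data set has repeated BY values, so check the log after a merge like this.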

Some simple analysis procedures

- The PRINT Procedure
- The CONTENTS Procedure
- The FREQ Procedure
- The SORT Procedure
- The MEANS Procedure
- The CORR Procedure
- The TTEST Procedure
- The ANOVA Procedure

The PRINT Procedure

The PRINT procedure prints the observations in a SAS data set.

General form of a simple PROC PRINT step:

    PROC PRINT DATA = SAS-data-set;
        VAR variable(s);
        SUM variable(s);
    RUN;

- The VAR statement specifies which variables to print and in what order.
- The SUM statement prints totals for the listed numeric variables.

The CONTENTS Procedure

The CONTENTS procedure shows the contents of a SAS data set and prints the directory of the SAS data library.

General form of a simple PROC CONTENTS step:

    PROC CONTENTS DATA = SAS-data-set;
    RUN;

The SORT Procedure

The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables.

General form of a simple PROC SORT step:

    PROC SORT DATA = SAS-data-set;
        BY variable-1;
    RUN;

The MEANS Procedure

The MEANS procedure provides descriptive statistics for variables across all observations.

General form of a simple PROC MEANS step:

    PROC MEANS DATA = SAS-data-set;
        CLASS variable(s);
        VAR variable(s);
    RUN;
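As a sketch of the general form, the step below requests a few statistics by group. It uses `sashelp.class`, a sample data set shipped with SAS (not one from the slides), and the statistic keywords in the PROC statement are an optional addition.

```sas
/* N, MEAN, STD, MIN and MAX limit the output to these statistics. */
PROC MEANS DATA=sashelp.class N MEAN STD MIN MAX;
    CLASS Sex;            /* one set of statistics per value of Sex */
    VAR Height Weight;    /* numeric variables to summarize */
RUN;
```

Omitting the CLASS statement gives statistics over all observations; omitting VAR summarizes every numeric variable.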

The FREQ Procedure

The FREQ procedure produces one-way to n-way frequency and crosstabulation (contingency) tables.

General form of a simple PROC FREQ step:

    PROC FREQ DATA = SAS-data-set;
        TABLES requests < / options >;
    RUN;

The TABLES statement requests one-way to n-way frequency and crosstabulation tables and statistics for those tables.

The TTEST Procedure

The TTEST procedure performs t tests for one sample, two samples, and paired observations.

General form of a simple PROC TTEST step:

    PROC TTEST DATA = SAS-data-set H0=m;
        VAR variable(s);
    RUN;

    PROC TTEST DATA = SAS-data-set;
        VAR variable(s);
        CLASS variable;
    RUN;

- Use the H0= option to specify the hypothesized mean in the one-sample t test.
- Use the CLASS statement to identify the two groups in the two-sample t test.
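The general forms above cover the one-sample and two-sample cases. For paired observations, PROC TTEST provides the PAIRED statement; the sketch below uses invented before/after measurements, and the data set and variable names are hypothetical.

```sas
/* Each pair of values is one subject: Before, then After.
   The trailing @@ lets one data line hold several observations. */
DATA pairs;
    INPUT Before After @@;
    DATALINES;
120 118 132 127 125 126 140 133 128 124
;
RUN;

/* Tests whether the mean of (Before - After) differs from zero. */
PROC TTEST DATA=pairs;
    PAIRED Before*After;
RUN;
```

The paired test works on the within-subject differences, so both variables must be on the same observation.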

The ANOVA Procedure

The ANOVA procedure performs one-way analysis of variance (ANOVA) for balanced data.

General form of a simple PROC ANOVA step:

    PROC ANOVA DATA = SAS-data-set;
        CLASS variable(s);
        MODEL dependents = effects;
    RUN;
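A small worked sketch of a one-way ANOVA on balanced data follows. The data set `scores`, its variables, and its values are invented for illustration; the MEANS statement with the TUKEY option is an optional extra that is not in the slides.

```sas
/* Three groups with three observations each (balanced). */
DATA scores;
    INPUT Group $ Score @@;
    DATALINES;
A 10 A 12 A 11 B 14 B 15 B 13 C 9 C 8 C 10
;
RUN;

PROC ANOVA DATA=scores;
    CLASS Group;              /* the grouping (classification) variable */
    MODEL Score = Group;      /* one-way model */
    MEANS Group / TUKEY;      /* pairwise comparisons of group means */
RUN;
QUIT;
```

PROC ANOVA is an interactive procedure, so a QUIT statement is used here to end it explicitly.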

Some simple analysis procedures

- The UNIVARIATE Procedure
- The REG Procedure
- The LOGISTIC Procedure

The UNIVARIATE Procedure

The UNIVARIATE procedure provides descriptive statistics, histograms, quantile-quantile plots (Q-Q plots) and probability plots.

General form of a simple PROC UNIVARIATE step:

    PROC UNIVARIATE DATA = SAS-data-set;
        VAR variables;
        HISTOGRAM;
        QQPLOT;
    RUN;
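One common use of this procedure is checking whether a variable looks normally distributed. The sketch below uses `sashelp.class`, a sample data set shipped with SAS rather than one from the slides; the NORMAL options are additions to the general form above.

```sas
/* NORMAL on the PROC statement requests tests for normality. */
PROC UNIVARIATE DATA=sashelp.class NORMAL;
    VAR Height;
    HISTOGRAM Height / NORMAL;                   /* overlay a fitted normal curve */
    QQPLOT Height / NORMAL(MU=EST SIGMA=EST);    /* reference line from estimated mean and SD */
RUN;
```

Points that follow the reference line in the Q-Q plot suggest the normality assumption is reasonable.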

The REG Procedure

The REG procedure is one of many regression procedures in the SAS System. It allows several MODEL statements and gives additional regression diagnostics, especially for detecting collinearity. It also creates plots of model summary statistics and regression diagnostics.

    PROC REG;
        MODEL dependents=independents;
        PLOT;
    RUN;

An example

    PROC REG DATA=water;
        MODEL Water = Temperature Days Persons / VIF;
        MODEL Water = Temperature Production Days / VIF;
    RUN;

    PROC REG DATA=water;
        MODEL Water = Temperature Production Days;
        PLOT STUDENT.*PREDICTED.;   /* studentized residuals vs. predicted values */
        PLOT STUDENT.*NPP.;         /* normal P-P plot of the studentized residuals */
        PLOT R.*NQQ.;               /* normal Q-Q plot of the residuals */
    RUN;

The LOGISTIC Procedure

For binary or ordinal responses with continuous independent variables:

    PROC LOGISTIC < options >;
        MODEL dependents=independents < / options >;
    RUN;

For binary or ordinal responses with categorical independent variables:

    PROC LOGISTIC < options >;
        CLASS categorical-variables < / options >;
        MODEL dependents=independents < / options >;
    RUN;

Example

    PROC LOGISTIC DATA=Mydata2.pain;
        CLASS Treatment Sex;
        MODEL Pain = Treatment Sex Treatment*Sex Age Duration;
    RUN;

- SAS steps begin with a DATA statement or a PROC statement. DATA steps read raw data or an existing SAS data set into SAS and let you edit it to create a new SAS data set. PROC steps then process the data set you created in the DATA step, for example to produce more complicated statistical analyses.
- SAS uses data libraries to store data sets.
- You can think of a SAS data library as a drawer in a filing cabinet and a SAS data set as one of the file folders in the drawer.
- The Work library is the default library in SAS, and it is temporary. When SAS is closed, all the data sets in the Work library are deleted. If you want to save a data set to continue working with it later, create a permanent SAS data set via a library.
- If you put your raw data directly in your SAS program, your data are internal. You may want to do this when you have small amounts of data, or when you are testing a program with a small test data set.
- The data set name must begin with a letter or underscore, and the remaining characters must be letters, numbers, or underscores.
- SAS built-in formats/functions.
- Create your own custom formats when you use a lot of coded data. Formats can remind you of the meaning behind each category. Note that formats do not change the actual value of the variable, just how it is displayed.
- If the format is for character data, its name must start with a $.
- The DATA= option specifies the input data set.
- The observations are assumed to be random samples drawn from normally distributed populations; this assumption can be checked using the UNIVARIATE procedure.
- The keywords NPP. and NQQ. can be used with any of the preceding variables to construct normal P-P or Q-Q plots.
- VIF (variance inflation factor) for Persons = 6.6267 and for Production = 6.6596; both exceed 5.
