sas session

Upload: sudhir-singh

Post on 13-Jan-2016


TRANSCRIPT

  • Introduction to SAS

  • Outline

    The SAS programming environment and language
    - Introduction to the SAS Window Environment
    - The structure and components of SAS programs
    - SAS Data Sets
    - SAS libraries: temporary and permanent

    Working with data sets in SAS
    - Getting Your Data into SAS
    - Modifying SAS Data Sets
    - Combining (Appending & Merging)
    - Selecting, Sorting, Printing

    Using Basic Statistical Summary Procedures
    - Means Procedure
    - Freq Procedure
    - Plot Procedure
    - Univariate Procedure

  • SAS (Statistical Analysis System)

    The SAS programming environment and language, used to read, process and analyze data:
    - Introduction to the SAS Window Environment
    - The structure and components of SAS programs
    - SAS Data Sets
    - SAS libraries: temporary and permanent

  • The SAS Window Environment

    Editor Window

    Log Window

    Output

    Explorer Window

    Results

    SAS Command Bar, Toolbar, Pull-down Menus

  • Window Environment

    Editor Window: a text editor used to type, edit, and submit (execute) SAS programs, and to edit other text files such as raw data files.

    Log Window: lists the program statements that were processed, along with notes, warnings and errors.

    Output Window: shows the printable results, if any, produced by the program.

    Results Window: lists each part of your results in an outline form.

    Explorer Window: gives you easy access to your SAS files and libraries.

  • SAS Programs

    SAS programs are used to access, manage, analyze, and present data. A SAS program, written in the SAS language, is a sequence of statements executed in order. As with any language, there are a few rules to follow when writing SAS statements. SAS program files use the extension .sas.

    A sample SAS program:

```sas
DATA demo;
   INPUT sale year;
   DATALINES;
20 2001
34 2002
15 2003
21 2004
30 2005
;
PROC PRINT DATA=demo;
RUN;
```

  • SAS programs

    Syntax rules for SAS statements:
    - Free format: a statement can start in any column and can span multiple lines, and multiple statements can be on the same line.
    - SAS keywords and names do not differentiate between upper and lower case (character data values do).
    - Statements usually begin with an identifying keyword.
    - Every statement ends with a semicolon.

    To add comments, use either option:
    - start with an asterisk (*) and end with a semicolon (;)
    - start with a slash asterisk (/*) and end with an asterisk slash (*/)

    Possible errors, indicated in the Log window:
    - Misspelled keywords
    - Missing or invalid punctuation (a missing semicolon is common)
    - Invalid options
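    The rules above can be seen together in one small program. It reuses the sale/year input from the sample program and writes keywords in mixed case. One statement is split over two lines, two statements share a line, and both comment styles appear. This is a minimal sketch, and the data set name rules_demo is hypothetical.

```sas
* A statement comment: starts with an asterisk, ends with a semicolon;
/* A block comment: starts with slash-asterisk,
   ends with asterisk-slash, and can span lines */
Data rules_demo;                  /* keywords are not case-sensitive */
   INPUT sale
         year;                    /* one statement spread over two lines */
   datalines;
20 2001
34 2002
;
proc PRINT data=RULES_DEMO; run;  /* two statements on one line; RULES_DEMO names the same data set as rules_demo */
```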

  • Flow of a SAS program (diagram)

    DATA step: Raw Data → Read in Data → Process Data (create new variables) → Output Data (create SAS data set)

    PROCs: Analyze Data Using Statistical Procedures

  • SAS programs

    2 basic steps in SAS programs:
    - DATA steps: read and modify data, create new data; begin with a DATA statement.
    - PROC steps: perform a specific analysis or function, produce results or a report; begin with a PROC statement.

```sas
/* RECORDS is a fileref: it must first be assigned to the raw data file with a FILENAME statement */
data bankacct;
   infile records;
   input Name $ 1-10 AccountType $ 12-20 Deposit 22-25;
run;
proc print data=bankacct;
run;
proc means data=bankacct;
   var deposit;
run;
```

  • SAS Programs

    The end of a DATA or PROC step is indicated by:
    - a RUN statement (most steps)
    - a QUIT statement (some procedures, e.g. PROC SQL and PROC DATASETS)
    - the beginning of another step (a DATA or PROC statement)

    Output generated from a SAS program:
    - SAS log: information about the processing of the SAS program, including any warnings or error messages, accumulated in the order the DATA and PROC steps are submitted.
    - SAS output: reports generated by the SAS procedures, accumulated in the order they are generated.
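    The two kinds of step endings can be contrasted directly. PROC PRINT runs when SAS reaches RUN. PROC SQL executes each statement as it is submitted and stays active until QUIT. This is a minimal sketch, assuming the DEMO data set from the sample program exists. The table name demo_sql is hypothetical.

```sas
proc print data=demo;
run;                          /* RUN ends most steps */

proc sql;
   create table demo_sql as   /* PROC SQL runs each statement immediately */
      select year, sale
      from demo
      where sale > 20;
quit;                         /* QUIT (not RUN) ends PROC SQL */
```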

  • SAS Data Sets

    Before you can do any analysis, write a report, or do anything else with your data, you must read the data into SAS.

    Before SAS can analyze your data, the data must be in the form of a SAS data set. A SAS data set is similar to an Excel worksheet:
    - It is made up of rows and columns: rows are called observations, columns are called variables.
    - An observation is all the information for one entity (employee, company, country).
    - SAS processes data one observation at a time.

  • SAS Data Sets

    There are two types of variables:
    - Character: can include letters, numbers, symbols, etc., e.g. Emp ID, Color Code
    - Numeric: floating-point numbers, e.g. age, salary, temperature

    Rules for SAS variable names:
    - must be 32 characters or fewer in length
    - must start with a letter or an underscore ( _ )
    - can contain only letters, numerals, or underscores ( _ )
    - can contain upper- and lowercase letters

    Each SAS data set is stored in a SAS data library.
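    The naming rules can be tried in a short DATA step. This is a sketch, assuming the standard Base SAS naming rules (system option VALIDVARNAME=V7). Some interfaces relax these rules. The data set and variable names are hypothetical, and the invalid names are left commented out so the step still runs.

```sas
data name_rules;
   _flag = 1;            /* valid: starts with an underscore */
   Income_2016 = 25000;  /* valid: letters, numerals, underscores */
   income_2016 = 30000;  /* same variable as Income_2016: names are not case-sensitive */
   /* 2016_income = 1;   invalid: starts with a numeral */
   /* job-status = 'H';  invalid: a hyphen is not allowed */
run;
```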

  • SAS Data Libraries

    A SAS library is simply a location where SAS data sets (as well as other types of SAS files) are stored.

    A library is identified by an assigned library reference name (libref). Depending on the libref you use when you create a file, you can store SAS files temporarily or permanently:
    - Temporary: the Work library. SAS data files are deleted when the session ends. A libref is not necessary.
    - Permanent: e.g. the SASUSER library. SAS data sets are saved after the session ends. You can create and access your own libraries, e.g. LIBNAME test 'H:\lab class';

  • Referencing SAS Files

    To reference a SAS file, you use a two-level name, libref.filename.

    In the two-level name, libref is the name for the SAS library that contains the file, and filename is the name of the file itself.

    A period separates the libref and filename. To reference temporary SAS files, specify the default libref Work, a period, and the filename (e.g. Work.one), or simply use a one-level name (the filename only).

    Referencing a SAS file in any library other than Work indicates that the SAS file is stored permanently.
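    Here is a short sketch of one-level and two-level names. It reuses the libref TEST and the path from the slides, and that folder must already exist on your machine. The data set name scores and its values are hypothetical.

```sas
libname test 'H:\lab class';   /* path taken from the slides; point it at a folder that exists */

data work.scores;              /* two-level name: temporary WORK library */
   input id score;
   datalines;
1 90
2 75
;
run;

proc print data=scores;        /* one-level name: SAS reads WORK.SCORES */
run;

data test.scores;              /* two-level name with TEST: stored permanently */
   set scores;
run;
```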

  • Working with data sets in SAS

    - Getting data into SAS
    - Modifying SAS data sets: selecting variables/observations, sorting, combining (appending & merging)
    - Printing

  • Getting data into SAS

    There are different methods for getting your data into SAS:
    - Importing an existing data set: using the Import menu option, or using PROC IMPORT
    - Entering raw data manually: using the Table Editor, or using the DATA step

  • Using the Import Data menu option (Import Wizard)

    1. File → Import Data
    2. Standard data source: select the file format
    3. Specify the file location, or Browse to select the file
    4. Create a name for the new SAS data set and specify its location (permanent or temporary)

    Example: demo1.xls, demo1.txt

    S. No.   Age   Gender   Income   Job Status
    1        20    F        10000    L
    2        55    M        25000    M
    3        40    F        35000    H
    4        .     M        18000    M
    5        35    M        12000    L
    6        24    M        16000    M
    7        38    F        30000    H
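    The wizard steps above have a code counterpart in PROC IMPORT. This is a minimal sketch with several assumptions. It assumes demo1.txt is a blank-delimited file in the current working folder. It also assumes the first row uses single-word names (S_No Age Gender Income Job_Status) rather than "S. No." and "Job Status", because names containing spaces would not split cleanly on blanks. The output name WORK.DEMO1 is also an assumption.

```sas
proc import datafile='demo1.txt'   /* hypothetical location of the file shown above */
            out=work.demo1
            dbms=dlm
            replace;
   delimiter=' ';                   /* values are separated by blanks */
   getnames=yes;                    /* the first row holds the variable names */
run;

proc print data=work.demo1;
run;
```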


  • Manually Entering Raw Data Files using the Table Editor.

    1. Tools → Table Editor
    2. Enter data manually into the table: observations in rows, variables in columns
    3. Right-click a column → Column Attributes: Variable Name, Variable Label, Type (Character/Numeric), Format, Informat
       Note: informats determine how raw data is read; formats determine how a variable is displayed.
    4. Close the window → Save Changes → Yes, then specify the file name and directory
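    The informat/format distinction can also be applied in a DATA step rather than the Table Editor. In this sketch, the informat DATE9. turns text such as 15JAN2014 into a SAS date value. The FORMAT statement then controls only how Hired and Salary are displayed. The data set staff and its values are hypothetical.

```sas
data staff;
   input Name $ Hired :date9. Salary;    /* informat: read text like 15JAN2014 as a SAS date value */
   format Hired date9. Salary dollar10.; /* formats: display the date as 15JAN2014 and Salary as $25,000 */
   datalines;
Leena 15JAN2014 25000
Ajay 03MAR2015 35000
;
run;

proc print data=staff;
run;
```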


  • Manually Entering Raw Data Files in a SAS program using the DATA Step

    A few examples:

```sas
/* Reading raw data separated by spaces */
data one;
   input S_No Age Gender $ Income Job_Status $;
   datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
proc print data=one;
   title 'Problem 1';
run;
```


  • Reading data arranged in columns

```sas
/* Reading data arranged in columns */
data two;
   input S_No 1 Age 2-3 Gender $ 4 Income 5-9 Job_Status $ 10;
   datalines;
120F10000L
255M25000M
340F35000H
4. M18000M
535M12000L
624M16000M
738F30000H
;
proc print data=two;
   title 'Problem 2';
run;
```

  • Reading selected variables from your data

```sas
/* Reading selected variables: only columns 1 and 5-9 are read */
data three;
   input S_No 1 income 5-9;
   datalines;
120F10000L
255M25000M
340F35000H
4. M18000M
535M12000L
624M16000M
738F30000H
;
proc print data=three;
   title 'Problem 3';
run;
```


  • Creating a permanent data set

```sas
/* Creating a permanent data set */
libname test 'H:\lab class';   /* the folder must already exist */
data test.one;
   input S_No Age Gender $ Income Job_Status $;
   datalines;
1 25 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 25 M 16000 M
7 40 F 30000 H
;
proc print data=test.one;
   title 'Problem 4';
run;
```


  • Modifying SAS Data Sets

    Create a new SAS data set using an existing SAS data set as input. Syntax:

```sas
DATA new_data;
   SET old_data;
RUN;
```

    By default, the SET statement reads all observations and variables from the old data set into the output data set. Example:

```sas
data new;
   set test.one;
run;
proc print data=new;
   title 'original data';
run;
```

  • Modifying SAS Data Sets: Selecting Variables

    Use DROP and KEEP to determine which variables are written to the new SAS data set.
    - DROP and KEEP as statements. Syntax: DROP V1 V2;  KEEP V3 V4;
    - DROP= and KEEP= options in the SET statement. Syntax: SET old_data (KEEP=V1);  SET old_data (DROP=V1);

    Conditional processing uses IF-THEN-ELSE logic. Syntax: IF condition THEN statement; ELSE statement;
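    An ELSE IF chain lets one variable take more than two values. The LENGTH statement fixes the variable's length up front. Without it, the first value assigned ('High', 4 characters) would set the length and truncate 'Medium'. This is a minimal sketch, assuming TEST.ONE from Problem 4 exists. The cut-offs and the variable income_band are hypothetical.

```sas
data graded;
   set test.one;
   length income_band $ 6;                     /* long enough for 'Medium' */
   if income >= 30000 then income_band = 'High';
   else if income >= 15000 then income_band = 'Medium';
   else income_band = 'Low';
run;

proc print data=graded;
run;
```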

  • Selecting variables: examples

```sas
/* Keep variables S_No and Age only in the new file */
data new1;
   set test.one;
   keep s_no age;
run;
proc print data=new1;
run;

/* OR */
data new1;
   set test.one (keep=s_no age);
run;

/* The same file can be obtained by dropping the variables Income, Gender and Job_Status */
data new2;
   set test.one;
   drop income gender job_status;
run;
proc print data=new2;
run;
```


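  • Subsetting observations and creating new variables

```sas
/* Creating a new data set where income is greater than 20,000 */
data new3;
   set test.one;
   if income > 20000;
run;
proc print data=new3;
run;

/* Creating new variables from the existing variables: recode Age into a new variable */
data new;
   set test.one;
   if missing(age) then age_code = .;   /* a missing age would otherwise fall into the "not > 35" group */
   else if age > 35 then age_code = 2;
   else age_code = 1;
   drop age;
   title 'Problem 6';
run;
proc print data=new;
run;
```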


  • Modifying SAS Data Sets

    Subsetting rows (observations)

    Using an IF statement: only writes observations for which an expression is true to the new data set.
    General form: IF expression;   Example: IF Job_Status = 'H';

    Using the WHERE= option in the SET statement: only reads rows for which the expression is true from the input data set.
    General form: SET input_data_set (WHERE=(expression));   Example: SET test.one (WHERE=(Job_Status='H'));

    Comparison: the resulting output data sets are equivalent, but
    - IF statement: all rows are read from the input data set
    - WHERE= option: only rows where the expression is true are read from the input data set
    The difference in processing time matters when working with big data sets.

  • Modifying SAS Data Sets

```sas
/* Subsetting the observations */
data new4;
   set test.one;
   if job_status = 'H';
run;
proc print data=new4;
run;

/* OR */
data new5;
   set test.one (where=(Job_Status = 'H'));
run;
proc print data=new5;
run;

/* Conditionally deleting an observation */
data new6;
   set test.one;
   if job_status = 'H' then delete;
run;
proc print data=new6;
run;
```
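    WHERE can also be written as a statement rather than a SET option. In that form it works in PROC steps as well, filtering the rows a procedure sees without creating a new data set. This is a minimal sketch, assuming TEST.ONE from Problem 4. The output name new7 is hypothetical.

```sas
data new7;
   set test.one;
   where job_status = 'H' and income > 20000;   /* only matching rows are read */
run;

proc print data=test.one;
   where gender = 'F';                          /* filter applied inside the PROC step */
run;
```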

  • Modifying SAS Data Sets: Sorting Data on particular variable(s)

    General form:
    PROC SORT DATA=old_data OUT=new_data;
       BY variable1 variable2;
    RUN;

    This sorts data by variable1 and then by variable2. By default, SAS sorts data in ascending order. Put the DESCENDING option before a variable in the BY statement to sort numbers from high to low and letters from Z to A. Without OUT=, the input data set is replaced by the sorted version. Example:

```sas
data one;            /* to read the data */
   set test.one;
run;
proc sort data=one out=two;
   by income;
run;
proc print data=one;
run;
proc print data=two;
run;
```
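    DESCENDING applies only to the variable that immediately follows it. The sketch below therefore sorts Gender in ascending order (F before M) and, within each gender, Income from high to low. It assumes TEST.ONE from Problem 4, and the output name sorted is hypothetical.

```sas
proc sort data=test.one out=sorted;
   by gender descending income;   /* DESCENDING affects Income only */
run;

proc print data=sorted;
run;
```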

  • Modifying SAS Data Sets: Concatenating (or Appending)

    Stacks each data set on top of the other. If one data set does not have a variable that the other data sets do, that variable is set to missing in the new data set for the observations from that data set.

    Form:
    DATA output_data_set;
       SET data1 data2;
    RUN;

    PROC APPEND may also be used. Form:
    PROC APPEND BASE=old_data DATA=new_data;
    RUN;

  • Concatenating (or Appending)

```sas
/* Reading a second data set to be added to the old one */
data two;
   input S_No Age Gender $ Income Job_Status $;
   datalines;
8 25 F 10000 L
9 50 M 35000 H
10 35 F 25000 M
;
data three;
   set test.one two;
run;
proc print data=three;
run;

/* OR, using PROC APPEND: it adds the observations of TWO to the end of ONE in place */
data one;            /* to read the data */
   set test.one;
run;
proc append base=one data=two;
run;
proc print data=one;
run;
```
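    Here is the missing-variable rule from the previous slide in practice. The second input has no Age variable, so the stacked observation it contributes gets a missing Age. This is a minimal sketch, assuming TEST.ONE from Problem 4. The data sets extra and stacked and their values are hypothetical.

```sas
data extra;                 /* hypothetical data set with no Age variable */
   input S_No Gender $ Income Job_Status $;
   datalines;
11 M 22000 M
;
run;

data stacked;
   set test.one extra;      /* Age is set to missing for the observation from EXTRA */
run;

proc print data=stacked;
run;
```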

  • Modifying SAS Data Sets: Merging Data Sets

    Merging usually needs one common variable. In a one-to-one merge, a single record in one data set corresponds to a single record in each of the other data sets.

    Form:
    DATA output_data_set;
       MERGE input_data1 input_data2;
       BY variable1;
    RUN;

    For a match-merge (with a BY statement), the data must be sorted by the BY variable(s) before merging (PROC SORT).

  • Merging Data Sets

```sas
/* Reading the new data set containing the new variable to be added */
data two;
   input S_No Name $;
   datalines;
1 Leena
2 Ajay
3 Sunita
4 Gopal
5 Sachin
6 Tanay
7 Mamta
;
/* One-to-one merge (no BY statement): observations are combined by position */
data three;
   merge test.one two;
run;
proc print data=three;
run;

/* Reading the new data set containing the new variable to be added, in a different order */
data two;
   input S_No Name $;
   datalines;
1 Leena
2 Ajay
4 Gopal
5 Sachin
7 Mamta
3 Sunita
6 Tanay
;
proc sort data=two;   /* a match-merge needs both inputs sorted by S_No */
   by S_No;
run;
data three;
   merge test.one two;
   by S_No;           /* match merge */
run;
proc print data=three;
run;
```
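    A match-merge can also record which input each observation came from. The IN= data set option creates a temporary 0/1 flag per input, and a subsetting IF on those flags keeps only the key values present in both. This is a minimal sketch, assuming TEST.ONE from Problem 4. The lookup table names, its rows (including the unmatched S_No 9), and the flag names are hypothetical.

```sas
data names;                          /* hypothetical lookup table; S_No 9 has no match */
   input S_No Name $;
   datalines;
3 Sunita
1 Leena
9 Ravi
;
run;

proc sort data=names;                /* match-merge inputs must be sorted by the BY variable */
   by S_No;
run;
proc sort data=test.one out=people;
   by S_No;
run;

data matched;
   merge people(in=in_people) names(in=in_names);
   by S_No;
   if in_people and in_names;        /* keep only S_No values present in both */
run;

proc print data=matched;
run;
```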

  • Using Basic Statistical Summary Procedures

    - Print Procedure
    - Means Procedure
    - Freq Procedure
    - Chart Procedure
    - Plot Procedure
    - Univariate Procedure

  • Print Procedure

    PROC PRINT is used to print data to the output window. By default, it prints all observations and variables in the SAS data set.

    Form: PROC PRINT DATA=input_data options; RUN;

    Some options:
    - input_data (OBS=n): specifies the number of observations to be printed
    - NOOBS: suppresses printing the observation number
    - LABEL: prints the labels instead of the variable names

    Optional SAS statements:
    - SUM variable1 variable2 variable3; prints the sum of the listed variables at the bottom of the output
    - VAR variable1 variable2 variable3; prints only the listed variables in the output


  • Proc Print

```sas
/* Printing selected variables (use of VAR, WHERE, SUM, NOOBS, N, printing selected observations, HTML output) */
ods html file='try.html';
proc print data=test.one noobs n;
   var S_No Age Income;     /* Income included so the SUM total is printed under its column */
   where age > 30;
   title 'Proc print usage';
   sum income;
run;
ods html close;

proc print data=test.one (firstobs=2 obs=4);
   var S_No Age Income;
run;
```
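    The LABEL option listed above only has an effect when labels exist. A LABEL statement inside PROC PRINT assigns them for that one report without changing the data set. This is a minimal sketch, assuming TEST.ONE from Problem 4, and the label texts are hypothetical.

```sas
proc print data=test.one label noobs;
   label Income = 'Annual income'
         Job_Status = 'Job status';   /* labels apply to this report only */
   var S_No Income Job_Status;
run;
```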

  • Means Procedure

    PROC MEANS is used to get simple summary statistics for numeric variables. General form:
    PROC MEANS DATA=input_data_set options;
       (optional statements such as CLASS and VAR)
    RUN;

    With no options or optional statements, the Means procedure prints the number of non-missing values, the mean, the standard deviation, the minimum, and the maximum for all numeric variables.

    Some of the options: MAX, MIN, MEAN, MEDIAN, N (number of non-missing values), NMISS (number of missing values), RANGE, STDDEV, SUM.

    Example:

```sas
proc means data=test.one;
   class gender;
   var income age;
run;
```
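    Listing statistic keywords on the PROC MEANS statement replaces the default set with exactly those statistics. MAXDEC= limits the number of decimal places printed. This is a minimal sketch, assuming TEST.ONE from Problem 4. It counts the missing Age value separately through NMISS.

```sas
proc means data=test.one n nmiss mean median std min max maxdec=1;
   class gender;        /* one row of statistics per gender */
   var income age;
run;
```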


  • FREQ Procedure

    PROC FREQ is used to generate frequency tables. Its most common use is to create tables showing the distribution of categorical variables (character, or numeric with few distinct values).

    General form:
    PROC FREQ DATA=input_data_set;
       TABLES variable-combinations;
    RUN;

    Use a BY statement (after sorting the data by that variable) to get a separate table, with its own percentages, within each category of a variable.

    Example:

```sas
proc freq data=test.one;
   tables gender gender*job_status;
run;
```
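    There are two ways to get percentages within categories. The NOCOL and NOPERCENT options trim a two-way table down to frequencies and row percentages. Alternatively, a BY statement gives each gender its own one-way table, which requires sorting first. This is a minimal sketch, assuming TEST.ONE from Problem 4. The output name by_gender is hypothetical.

```sas
/* Row percentages only: share of each job status within each gender */
proc freq data=test.one;
   tables gender*job_status / nocol nopercent;
run;

/* BY-group processing needs data sorted by the BY variable */
proc sort data=test.one out=by_gender;
   by gender;
run;

proc freq data=by_gender;
   by gender;
   tables job_status;   /* one table, with its own percentages, per gender */
run;
```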


  • Chart Procedure

    PROC CHART is used to create frequency bar charts. General form:
    PROC CHART DATA=input_data;
       VBAR variable_list / options;
    RUN;

```sas
/* Use of PROC CHART to create a frequency bar chart for a character variable */
proc chart data=test.one;
   vbar job_status;
run;

/* Use of PROC CHART to create a frequency bar chart for a numeric variable */
ods graphics on;
ods html;
proc chart data=test.one;
   vbar income / discrete;   /* other options include MIDPOINTS=values and LEVELS=n */
run;
ods graphics off;
ods html close;
```
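    The MIDPOINTS= option mentioned in the comment above groups a numeric variable into bars at chosen centers. HBAR with SUMVAR= and TYPE=MEAN charts an average instead of a count. This is a minimal sketch, assuming TEST.ONE from Problem 4. The midpoint values are an illustrative choice.

```sas
proc chart data=test.one;
   vbar income / midpoints=10000 to 40000 by 10000;   /* bars centered at 10000, 20000, 30000, 40000 */
   hbar job_status / sumvar=income type=mean;         /* bar length = mean income per job status */
run;
```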


  • Plot Procedure

    PROC PLOT is used to create basic scatter plots of the data. General form:
    PROC PLOT DATA=input_data_set;
       PLOT vertical_variable * horizontal_variable / options;
    RUN;

    By default, SAS uses letters to mark points on plots: A for a single observation, B for two observations at the same point, etc.

    To specify a different character to represent a point:
    PLOT vertical_variable * horizontal_variable = '*';

    To plot more than one variable on the vertical axis:
    PLOT vertical_variable1 * horizontal_variable = '2' vertical_variable2 * horizontal_variable = '1' / OVERLAY;

    Use PROC GPLOT or PROC SGPLOT for more sophisticated plots.


  • Example

```sas
* Create data for variables x and y;
data generate;
   do x = 1 to 8;
      y1 = x**2;
      y2 = x**3;
      output;
   end;
run;
proc print data=generate;
   title 'generated data';
run;

/* A simple scatter plot */
proc plot data=generate;
   plot y1*x;
run;

proc plot data=generate;
   plot y1*x='1' y2*x='2' / overlay;
run;
```
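    The previous slide points to PROC SGPLOT for better-looking plots. This sketch draws the same overlay with graphical markers, and each SCATTER statement adds one series to the same axes. It assumes the GENERATE data set from the example above and an environment with ODS Graphics available. The legend texts are arbitrary.

```sas
proc sgplot data=generate;
   scatter x=x y=y1 / legendlabel='y1 = x**2';   /* first series */
   scatter x=x y=y2 / legendlabel='y2 = x**3';   /* second series, same axes */
run;
```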


  • Univariate Procedure

    PROC UNIVARIATE is used to examine the distribution of data. It produces summary statistics for each variable, including the mean, median, mode, standard deviation, skewness, kurtosis, quantiles, etc.

    General form:
    PROC UNIVARIATE DATA=input_data_set options;
       VAR variable1 variable2 variable3;
    RUN;

    If the VAR statement is not used, summary statistics are produced for all numeric variables in the input data set.

    Options include:
    - PLOT: produces a stem-and-leaf plot, a box plot, and a normal probability plot
    - NORMAL: produces tests of normality

    Example:

```sas
proc univariate data=test.one;
   var age;
run;
```
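    The sketch below combines the two options listed above with a HISTOGRAM statement whose NORMAL option overlays a fitted normal curve. It assumes TEST.ONE from Problem 4. With only seven observations the normality tests are illustrative rather than meaningful.

```sas
proc univariate data=test.one normal plot;   /* NORMAL: normality tests; PLOT: stem-and-leaf, box and probability plots */
   var income;
   histogram income / normal;                /* histogram with a fitted normal density curve */
run;
```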

