1-reading raw data in sas week1

Upload: vikyanii

Post on 02-Apr-2018

  • 7/27/2019 1-Reading Raw Data in SAS Week1

    SAS - Basic Concepts

    What is SAS?

    SAS (originally "Statistical Analysis System") is a suite of software products.

    Each product is associated with a set of functionalities.

    SAS is capable of efficiently handling very large data sets.

    Some products in the SAS System

    Core of the SAS System: the basic software needed to make SAS run

    Base SAS Software: the SAS language, the DATA step, and basic procedures

    SAS/STAT: procedures for various statistical analyses

    SAS/GRAPH: procedures and options to create graphs

    SAS/ETS: econometrics and time series analysis

    SAS/OR: operations research (optimization, linear programming, etc.)

    SAS/EIS: Enterprise Information Systems for OLAP models

    Enterprise Miner: a data-mining package with various techniques

    SAS/IntrNet: web-based application and portal development

    Analyst/AF/FSP/Other: front-end based features

    Program Editor Window

    Write your code in this window.

    Explorer and Results windows

    Log Window

    View the log created by the program execution.

    Output Window

    View the values of a data set (for example, a PROC PRINT listing) in this window.

    Components of SAS Programs

    DATA steps typically create or modify SAS data sets. You can use DATA steps to

    put your data into a SAS data set

    compute values

    check for and correct errors in your data

    produce new SAS data sets by subsetting, merging, and updating existing data sets.

    PROC (procedure) steps are prewritten routines that enable you to analyze and process the data in a SAS data set. You can use PROC steps to

    create a report that lists the data

    produce descriptive statistics

    create a summary report

    produce plots and charts.

    Characteristics of SAS Programs

    SAS programs consist of SAS statements.

    A SAS statement has two important characteristics:

    It usually begins with a SAS keyword.

    It always ends with a semicolon.

    SAS statements are free-format. This means that

    they can begin and end anywhere on a line

    one statement can continue over several lines

    several statements can be on one line.

    Blanks or special characters separate "words" in a SAS statement.

    Overview of Data Sets

    Descriptor Portion

    The descriptor portion of a SAS data set contains information about the data set, including

    the name of the data set

    the date and time that the data set was created

    the number of observations

    the number of variables.

    Data Portion

    The data portion contains the actual data values, arranged in rows (observations) and columns (variables).

    Variable Attributes

    Name

    Type

    Length

    Format

    Informat

    Label

    SAS Libraries

    Every SAS file is stored in a SAS library, which is a collection of SAS files. A SAS data library is the highest level of organization for information within SAS.

    General form, basic LIBNAME statement:

    LIBNAME libref 'SAS-data-library';

    where

    libref is 1 to 8 characters long, begins with a letter or underscore, and contains only letters, numbers, or underscores.

    SAS-data-library is the name of a SAS data library in which SAS data files are stored. The specification of the physical name of the library differs by operating environment.

    Storing Files Temporarily or Permanently

    Storing files temporarily:

    If you don't specify a library name when you create a file (or if you specify the library name Work), the file is stored in the temporary SAS data library.

    When you end the session, the temporary library and all of its files are deleted. Temporary SAS libraries last only for the current SAS session.

    Storing files permanently:

    To store files permanently in a SAS data library, you specify a library name other than the default library name Work.

    Permanent SAS libraries are available to you during subsequent SAS sessions.
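    A minimal sketch of the two cases. The folder c:\sasdata and the names mylib and scores are hypothetical; point the LIBNAME statement at a folder that exists on your system:

```sas
/* Hypothetical folder: replace with an existing directory on your machine */
libname mylib 'c:\sasdata';

/* One-level name: stored in the temporary Work library
   and deleted when the SAS session ends */
data scores;
  input id score;
  datalines;
1 90
2 85
;
run;

/* Libref other than Work: stored permanently in the folder behind mylib */
data mylib.scores;
  set work.scores;
run;
```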

    Referencing SAS Files

    To reference a permanent SAS data set in your SAS programs, you use a two-level name:

    libref.filename

    In the two-level name, libref is the name of the SAS data library that contains the file, and filename is the name of the file itself. A period separates the libref and filename.

    To reference temporary SAS files, you can specify the default libref Work, a period, and the filename, for example, the two-level name Work.Test.

    Alternatively, you can use a one-level name (the filename only) to reference a file in a temporary SAS library. When you specify a one-level name, the default libref Work is assumed.

    If the USER library is assigned, SAS uses the User library rather than the Work library for one-level names. User is a permanent library.

    So referencing a SAS file in any library except Work indicates that the SAS file is stored permanently.
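    A short sketch of these naming rules, using a hypothetical data set named test. Assuming no USER library is assigned, both PROC PRINT steps print the same Work data set:

```sas
/* One-level name on creation: stored as work.test */
data test;
  x = 1;
run;

/* Two-level name */
proc print data=work.test;
run;

/* One-level name: Work is assumed (unless a USER library is assigned) */
proc print data=test;
run;
```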

    Example of a SAS Data set

    Observations are the rows and variables are the columns:

    Obs   ID   NAME   HT   WT
      1   53   Lucy   42   41
      2   54   Tom    46   54
      3   55   Dan    43    .
      4   56   Tim    45   56
      5   57          42   48
      6   58   Mary   48   43

    ID, HT and WT are numeric variables.

    NAME is a character variable.

    A missing character value is represented by a blank.

    A missing numeric value is represented by a period (.).
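    One way to build this example data set is column input with in-stream data. This is a sketch; the data set name heights is an assumption. Because each field occupies fixed columns, the blank NAME on the fifth record is read as a missing character value and the period in WT becomes a missing numeric value:

```sas
/* Column input: each field is read from fixed columns */
data heights;
  input ID 1-2 Name $ 4-7 HT 9-10 WT 12-13;
  datalines;
53 Lucy 42 41
54 Tom  46 54
55 Dan  43  .
56 Tim  45 56
57      42 48
58 Mary 48 43
;
run;

proc print data=heights;
run;
```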

    CREATING LIST REPORTS

    Basic Report

    You can easily list the contents of a SAS data set by using a simple program like the one shown below.

    libname clinic 'your-SAS-data-library';

    proc print data=clinic.admit;

    run;

    You can produce column totals for numeric variables within your report.

    libname clinic 'your-SAS-data-library';

    proc print data=clinic.admit;

    sum fee;

    run;

    Selected Observations and Variables

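    You can choose the observations and variables that appear in your report. In addition, you can remove the default Obs column that displays observation numbers. The VAR statement selects variables, the WHERE statement selects observations, and the NOOBS option removes the Obs column.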

    libname clinic 'your-SAS-data-library';

    proc print data=clinic.admit noobs;

    var age height weight fee;

    where age>30;

    run;

    Specifying WHERE Expressions

    Symbol     Meaning                    Example
    = or eq    equal to                   where name='Jones, C.';
    ^= or ne   not equal to               where temp ne 212;
    > or gt    greater than               where income>20000;
    < or lt    less than                  where partno lt "BG05";
    >= or ge   greater than or equal to   where id>='1543';

    More Operators

    Using the CONTAINS Operator

    The CONTAINS operator selects observations that include the specified substring. The mnemonic equivalent for the CONTAINS operator is ?.

    where firstname CONTAINS 'Jon';

    where firstname ? 'Jon';

    IN operator

    where actlevel in ('LOW','MOD');

    where fee in (124.80,178.20);

    BETWEEN-AND operator

    where date between '21Dec2010'd and '20Jan2011'd;

    LIKE operator (_ matches exactly one character, % matches any number of characters)

    where name like '_uj%';
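    A sketch that tries several of these operators on SASHELP.CLASS, a sample data set shipped with SAS (variables Name, Sex, Age, Height, Weight); the particular conditions are illustrative only:

```sas
proc print data=sashelp.class;
  where name contains 'J';        /* same as: where name ? 'J'; */
run;

proc print data=sashelp.class;
  where age in (11, 12) and weight between 80 and 100;
run;

proc print data=sashelp.class;
  where name like '_a%';          /* _ = exactly one character, % = any number */
run;
```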

    CREATING SAS DATA SETS FROM RAW DATA

    Steps to Create a SAS Data Set

    To do this                      Use this SAS statement   Example
    Reference a SAS data library    LIBNAME statement        libname libref 'SAS-data-library';
    Reference an external file      FILENAME statement       filename tests 'c:\users\tmill.dat';
    Name a SAS data set             DATA statement           data clinic.stress;
    Identify an external file       INFILE statement         infile tests obs=10;
    Describe data                   INPUT statement          input ID 1-4 Age 6-7 ...;
    Execute the DATA step           RUN statement            run;
    List the data                   PROC PRINT statement     proc print data=clinic.stress;
    Execute the final program step  RUN statement            run;
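    The statements in the table fit together as follows. This sketch swaps the slide's path for a temporary file (FILENAME ... TEMP) and writes a few hypothetical fixed-column records into it first, so it runs without an existing c:\users\tmill.dat; it also uses the Work library instead of clinic:

```sas
/* Temporary external file stands in for 'c:\users\tmill.dat' */
filename tests temp;

/* Create a small fixed-column raw data file (hypothetical values) */
data _null_;
  file tests;
  put '1001 27';
  put '1002 34';
  put '1003 45';
run;

data stress;              /* Work library instead of clinic */
  infile tests obs=2;     /* read only the first 2 records while testing */
  input ID 1-4 Age 6-7;
run;

proc print data=stress;
run;
```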

    Steps to Create a SAS Data Set

    Using a LIBNAME Statement

    libname taxes 'c:\users\acct\qtr1\report';

    Using a FILENAME Statement

    filename exer 'c:\users\exer.dat';

    Naming the Data Set

    DATA SAS-data-set-1 <...SAS-data-set-n>;

    Rules for SAS Names

    SAS data set names

    can be 1 to 32 characters long

    must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)

    can continue with any combination of numbers, letters, or underscores.

    Steps to Create a SAS Data Set

    Specifying the Raw Data File

    INFILE file-specification <options>;

    Describing the Data

    General form, INPUT statement using column input:

    INPUT variable <$> startcol-endcol . . . ;

    where

    variable is the SAS name that you assign to the field

    the dollar sign ($) identifies the variable type as character (if the variable is numeric, then nothing appears here)

    startcol represents the starting column for this variable

    endcol represents the ending column for this variable.

    Steps to Create a SAS Data Set

    filename exer 'c:\users\exer.dat';

    data exercise;

    infile exer;

    input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;

    run;

    When you use column input, you can

    read any or all fields from the raw data file

    read the fields in any order

    specify only the starting column for values that occupy only one column.

    input ActLevel $ 9-12 Sex $ 14 Age 6-7;
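    A self-contained sketch of these column-input rules, using in-stream data instead of the exer file; the two records are hypothetical. The ID field in columns 1-4 is simply not read, the fields are read out of order, and Sex gives only its single column:

```sas
data exercise;
  input ActLevel $ 9-12 Sex $ 14 Age 6-7;   /* any order; Sex is one column */
  datalines;
2458 27 HIGH M
2462 34 LOW  F
;
run;

proc print data=exercise;
run;
```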

    Verifying the Data

    Whenever you use the DATA step to read raw data:

    Write the DATA step using the OBS= option in the INFILE statement.

    Submit the DATA step.

    Check the log for messages.

    View the resulting data set.

    Remove the OBS= option and re-submit the DATA step.

    Check the log again.

    View the resulting data set again.

    Creating and Modifying Variables

    General form, assignment statement:

    variable=expression;

    where

    variable names a new or existing variable

    expression is any valid SAS expression.

    SAS Expressions

    An expression is a sequence of operands and operators that form a set of instructions. The instructions are performed to produce a new value:

    Operands are variable names or constants. They can be numeric, character, or both.

    Operators are special-character operators, grouping parentheses, or functions.

    Using Operators in SAS Expressions

    Operator   Action            Example        Priority
    -          negative prefix   negative=-x;   I
    **         exponentiation    raise=x**y;    I
    *          multiplication    mult=x*y;      II
    /          division          divide=x/y;    II
    +          addition          sum=x+y;       III
    -          subtraction       diff=x-y;      III

    When you use more than one arithmetic operator in an expression,

    operations of priority I are performed before operations of priority II, and so on

    consecutive operations that have the same priority are performed

    from right to left within priority I

    from left to right within priorities II and III

    you can use parentheses to control the order of operations.
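    A quick check of these rules; the values in the comments follow from the priority and direction rules above:

```sas
data _null_;
  a = 2 + 3 * 4;     /* priority II before III: 2 + 12 = 14 */
  b = (2 + 3) * 4;   /* parentheses first: 5 * 4 = 20 */
  c = 2 ** 3 ** 2;   /* priority I, right to left: 2 ** 9 = 512 */
  d = 20 / 4 / 2;    /* priority II, left to right: 5 / 2 = 2.5 */
  put a= b= c= d=;   /* write the four values to the log */
run;
```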

    Reading In stream Data

    To read instream data, you use

    a DATALINES statement as the last statement in the DATA step (except for the RUN statement), immediately preceding the data lines

    a null statement (a single semicolon) to indicate the end of the input data.

    Using Data step for Internal raw data

    Internal raw data

    Use DATALINES (or its alias CARDS) to indicate that the data is internal.

    data cities;

    input City $ Rank ;

    datalines;

    Mumbai 1

    Delhi 2

    Chennai 3

    Calcutta 4

    ;

    run ;

    Steps to Create a Raw Data File

    data _null_;

    set clinic.stress;

    file 'c:\clinic\patients\stress.dat';

    put id 1-4 name 6-25 resthr 27-29 maxhr 31-33

    rechr 35-37 timemin 39-40 timesec 42-43

    tolerance 45 totaltime 47-49;

    run;

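    Using the _NULL_ Keyword

    The keyword _NULL_ enables you to use the DATA step without actually creating a SAS data set.

    A SET statement specifies the SAS data set that you want to read from.

    The FILE statement names the external file to write, and the PUT statement writes each variable to the specified columns.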

    UNDERSTANDING DATA STEP PROCESSING

    Compilation phase

    A SAS DATA step is processed in two phases: the compilation phase and the execution phase.

    During the compilation phase, each statement is scanned for syntax errors. Most syntax errors prevent further processing of the DATA step.

    When the compilation phase is complete, the descriptor portion of the new data set is created.

    Execution phase

    If the DATA step compiles successfully, then the execution phase begins. During the execution phase, the DATA step reads and processes the input data.

    The DATA step executes once for each record in the input file.

    Flow of DATA step processing (from the slide diagram):

    Compilation phase: compile the program.

    Execution phase:

    1. Initialize variables to missing.

    2. Execute the INPUT statement.

    3. End of file? If yes, go on to the next step.

    4. If no, execute the other statements.

    5. Output the observation to the SAS data set and return to step 1.

    Compilation Phase In detail

    Input Buffer

    At the beginning of the compilation phase, the input buffer (an area of memory) is created to hold a record from the external file.

    The input buffer is created only when raw data is read, not when a SAS data set is read.

    The term input buffer refers to a logical concept; it is not a physical storage area.

    Program Data Vector

    The program data vector (PDV) is the area of memory where SAS builds a data set, one observation at a time.

    Like the term input buffer, the term program data vector refers to a logical concept.

    The program data vector contains two automatic variables that can be used for processing but which are not written to the data set as part of an observation:

    _N_ counts the number of times that the DATA step begins to execute.

    _ERROR_ signals the occurrence of an error that is caused by the data during execution. The default value is 0, which means there is no error. When one or more errors occur, the value is set to 1.
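    A small sketch that makes the automatic variables visible. The data set name ages and its values are hypothetical; the value xx is deliberately invalid for the numeric variable Age, so _ERROR_ is set to 1 on that iteration and Age is set to missing. PUTLOG writes the PDV values to the log, yet the resulting data set contains only Name and Age:

```sas
data ages;
  input Name $ Age;
  putlog _n_= _error_= Name= Age=;   /* show PDV values in the log */
  datalines;
Ann 12
Bob xx
Cy 14
;
run;
```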

    Syntax Checking

    During the compilation phase, SAS scans each

    statement looking for following syntax errors. missing or misspelled keywords

    invalid variable names

    missing or invalid punctuation

    invalid options.

    Data Set Variables As the INPUT statement is compiled, a slot is added to the

    program data vector for each variable in the new data set

    variable attributes such as length and type are determined the

    first time a variable is encountered. Any variables that are created with an assignment statement in

    the DATA step are also added

    Execution Phase In detail

    Initializing Variables

    At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0.

    The remaining variables are initialized to missing:

    missing numeric values are represented by periods (.)

    missing character values are represented by blanks.

    Input Data

    The INFILE statement identifies the location of the raw data.

    Input Pointer

    The INPUT statement uses an input pointer to keep track of its position. The input pointer starts at column 1 of the first record, unless otherwise directed.

    Execution Phase - End of the DATA Step

    1. The values in the program data vector are written to the output data set as the first observation.

    2. The value of _N_ is set to 2 and control returns to the top of the DATA step.

    3. The variable values in the program data vector are reset to missing.

    4. The automatic variables _N_ and _ERROR_ are not reset to missing; they retain their values.

    5. The DATA step works like a loop, repetitively executing statements to read data values and create observations one by one.

    End-of-File Marker: The Ultimate End!

    The execution phase continues the iterations until the end-of-file marker is reached in the raw data file.

    The order in which variables are defined in the DATA step determines the order in which the variables are stored in the data set. In the example below, Total is stored after Item, IDnum, InStock, and BackOrd.

    data perm.update;

    infile invent;

    input Item $ 1-13 IDnum $ 15-19

    InStock 21-22 BackOrd 24-25;

    Total=instock+backord;

    run;

    Methods for getting your data into the SAS system

    Entering data directly into a SAS data set

    Creating SAS data sets from raw files

    using the DATA step

    using the IMPORT procedure

    Converting other software's data files into SAS data sets

    Reading other software's data files directly

    Different types of input

    There are three types of input:

    Column input

    List input

    Formatted input

    Free-Format Data

    What is free-format data?

    Data that is not arranged in columns. The fields are often separated by blanks or by some other delimiter.

    Fixed-Field Data

    What is fixed-field data?

    Data that is arranged in columns or fixed fields. You can specify a beginning and ending column for each field.

    Reading Free-Format Data

    Using List Input

    General form, INPUT statement using list input:

    INPUT variable <$>;

    where

    variable specifies the variable whose value the INPUT statement is to read

    $ specifies that the variable is a character variable.

    Because list input, by default, does not specify column locations,

    all fields must be separated by at least one blank or other delimiter

    fields must be read in order from left to right

    you cannot skip or re-read fields.

    Using Data step for External raw data

    Use the INFILE statement to tell SAS the file name and path.

    data cities;

    infile "C:\training\sample1.txt";

    input City $ Rank;

    run;

    Text in the external file:

    Mumbai 1

    Delhi 2

    Chennai 3

    Calcutta 4

    Log:

    NOTE: The infile "C:\training\sample1.txt" is: File Name=C:\training\sample1.txt, RECFM=V, LRECL=256

    NOTE: 4 records were read from the infile "C:\training\sample1.txt".

    The minimum record length was 8.

    The maximum record length was 10.

    NOTE: The data set WORK.CITIES has 4 observations and 2 variables.

    NOTE: DATA statement used:

    real time 0.25 seconds

    cpu time 0.11 seconds

    Working with Delimiters

    Use the DLM= option in the INFILE statement to specify a delimiter other than a blank (the default).

    Example:

    data perm.survey;

    infile credit dlm=',';

    input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

    run;

    Reading Raw data separated by spaces

    List input

    data runners;

    input name $ surname $ age runtime1 runtime2;

    datalines;

    Scott A 15

    23.3 21.5

    Mark . 13 25.2 24.1

    ;

    run;

    NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

    NOTE: The data set WORK.RUNNERS has 2 observations and 5 variables.

    Requirements for reading data with list input:

    all missing data must be indicated by a period

    all values are separated by at least one space

    character data are eight characters or fewer

    character values should not have embedded spaces.

    Reading Raw data separated by spaces

    NOTE: Invalid data for runtime2 in line 228 1-7.

    RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-

    228 Michael M 14 12 .

    name=Jon surname=K age=13 runtime1=25.1 runtime2=. _ERROR_=1 _N_=3

    NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

    NOTE: The data set WORK.RUNNERS has 4 observations and 5 variables.

    data runners;

    input name $ surname $ age runtime1

    runtime2 ;

    datalines;

    Scott A 15

    22.0 21.9

    Mark . 13 25.2 24.1

    Jon K 13 25.1

    Michael M 14 12 .

    ;

    run ;

    data runners;

    input name $ surname $ age runtime1

    runtime2 ;

    datalines;

    Scott A 15

    22.0 21.9 Mark . 13 25.2 24.1

    Michael M 14 12 .

    ;

    run ;

    Reading Missing Values at the End of a Record

    MISSOVER option:

    If the missing values occur at the end of the record, you can use the MISSOVER option in the INFILE statement to read the missing values at the end of the record.

    The MISSOVER option prevents SAS from going to another record if, when using list input, it does not find values in the current line for all the INPUT statement variables. At the end of the current record, values that are expected but not found are set to missing.

    The MISSOVER option works only for missing values that occur at the end of the record.
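    A sketch reusing two of the runner records from the earlier slide. With MISSOVER, the short Jon record gets a missing runtime2 instead of SAS reading the next line to fill it:

```sas
data runners;
  infile datalines missover;   /* stay on the current record */
  input name $ surname $ age runtime1 runtime2;
  datalines;
Jon K 13 25.1
Mark . 13 25.2 24.1
;
run;
```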

    Reading Missing Values at the Beginning or Middle of a Record

    The DSD Option

    You can use the DSD option in the INFILE statement to correctly read such raw data. The DSD option

    sets the default delimiter to a comma

    treats two consecutive delimiters as a missing value

    removes quotation marks from values.

    If the data uses multiple delimiters or a single delimiter other than a comma, then simply specify the delimiter value(s) with the DLM= option.

    data perm.survey;

    infile credit dsd dlm='*';

    input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

    run;
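    A self-contained sketch of the DSD behavior with in-stream data and the default comma delimiter. The survey values are hypothetical: the second record has an empty Age field, and the third has a quoted Gender value and two empty card fields:

```sas
data survey;
  infile datalines dsd;   /* comma delimiter, ",," = missing, quotes removed */
  input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
  datalines;
M,27,1,8,0,0
F,,1,5,1,2
"F",44,,,1,4
;
run;
```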

    The LENGTH Statement

    The variable attributes are defined when the variable is first encountered in the DATA step.
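    Because attributes are fixed at the first encounter, a LENGTH statement placed before the INPUT statement sets the length of a character variable; without it, list input gives a character variable a default length of 8, and a longer value would be truncated. A minimal sketch with hypothetical values:

```sas
data cities;
  length City $ 12;        /* must come before INPUT to take effect */
  input City $ Rank;
  datalines;
Mumbai 1
Bhubaneswar 5
;
run;
```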

    Modifying List Input

    The ampersand (&) modifier

    is used to read character values that contain embedded blanks. The & indicates that a character value that is being read with list input might contain one or more single embedded blanks.

    The value is read until two or more consecutive blanks are encountered.

    The colon (:) modifier

    enables you to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks.

    The colon (:) indicates that values are read until a blank (or other delimiter) is encountered, and then an informat is applied.

    input Rank City & $12. Pop86 : comma.;
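    A sketch with two illustrative records showing both modifiers together. Note the two blanks after each city name, which end the & read, and the embedded commas that the COMMA informat removes:

```sas
data topcities;
  input Rank City & $12. Pop86 : comma.;
  datalines;
1 NEW YORK  7,262,700
2 LOS ANGELES  3,259,340
;
run;
```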

    Creating Free-Format Data

    The PUT statement can also be used with list output to create free-format raw data files.

    data _null_;

    set perm.finance;

    file 'c:\data\findat2' dlm=',';

    put ssn name salary date : date9.;

    run;

    Alternatively, the EXPORT procedure writes a SAS data set to an external file:

    PROC EXPORT DATA=SAS-data-set OUTFILE='filename';

    RUN;

    Reading Raw Data in Fixed Fields

    Column Input Features

    It can be used to read character variable values that contain embedded blanks.

    input Name $ 1-25;

    No placeholder is required for missing data. A blank field is read as missing and does not cause other fields to be read incorrectly.

    Fields or parts of fields can be re-read.

    Fields do not have to be separated by blanks or other delimiters.
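    A sketch of these features with hypothetical records: Name keeps its embedded blanks, Initial re-reads column 1, and the second record's blank Age field (columns 27-28) is read as missing without disturbing Sex in column 30:

```sas
data people;
  input Name $ 1-25 Initial $ 1 Age 27-28 Sex $ 30;   /* Initial re-reads column 1 */
  datalines;
Mary Ann Smith            34 F
Joe Lee                      M
;
run;
```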

    Identifying Standard and Nonstandard Numeric Data

    Standard numeric data values can contain only

    numbers

    decimal points

    numbers in scientific, or E, notation (23E4)

    minus signs and plus signs.

    Some examples of standard numeric data are 15, -15, 15.4, +.05, 1.54E3, and -1.54E-3.

    Nonstandard numeric data includes

    values that contain special characters, such as percent signs (%), dollar signs ($), and commas (,)

    date and time values

    data in fraction, integer binary, real binary, and hexadecimal forms.

    Using Formatted Input

    Whenever you encounter raw data that is organized into fixed fields, you can use

    column input to read standard data only

    formatted input to read both standard and nonstandard data.

    General Form of the INPUT Statement Using Formatted Input

    INPUT <pointer-control> variable informat.;

    where

    pointer-control positions the input pointer on a specified column

    variable is the name of the variable that is being created

    informat is the special instruction that specifies how SAS reads raw data.
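
    A minimal sketch of formatted input on fixed fields, assuming a hypothetical layout (Item in columns 1-8, Qty in 9-11, Price in 12-18). No pointer control is given, so each informat starts where the previous field ended. Qty is standard data read with w., and Price is nonstandard data read with COMMAw..

```sas
/* Hypothetical layout: Item 1-8, Qty 9-11, Price 12-18 */
data work.inventory;
   input Item $8. Qty 3. Price comma7.;
   datalines;
Bolts    12 $1,250
Nuts    150   $975
;
run;
```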

    @n Column Pointer Control

    Using the @n Column Pointer Control

    The @n is an absolute pointer control that moves the input pointer to a specific column number.

    The @ moves the pointer to column n, which is the first column of the field that is being read.

    You can use the @n to move a pointer forward or backward when reading a record.

    INPUT @n variable informat.;

    input @9 FirstName $5. @1 LastName $7.;
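
    A runnable version of the statement above, assuming a made-up layout with LastName in columns 1-7 and FirstName in columns 9-13. The pointer jumps forward to column 9 first, then back to column 1.

```sas
/* Hypothetical layout: LastName in columns 1-7, FirstName in columns 9-13 */
data work.names;
   input @9 FirstName $5. @1 LastName $7.;   /* forward to 9, then back to 1 */
   datalines;
JORDAN  SALLY
LEE     DIANA
;
run;
```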

    The +n Pointer Control

    The +n pointer control moves the input pointer forward to a column number that is relative to the current position.

    The + moves the pointer forward n columns.

    INPUT +n variable informat.;
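
    The same made-up layout can be read left to right with a relative skip instead of absolute columns. This is a sketch under the same assumptions (LastName in columns 1-7, FirstName in 9-13).

```sas
/* Same hypothetical layout, read left to right */
data work.names2;
   /* after $7. the pointer is at column 8; +1 moves it to column 9 */
   input LastName $7. +1 FirstName $5.;
   datalines;
JORDAN  SALLY
LEE     DIANA
;
run;
```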

    Using Informats

    An informat is an instruction that tells SAS how to read raw data. SAS provides many informats for reading standard and nonstandard data values.

    Note that

    each informat contains a w value to indicate the width of the raw data field

    each informat also contains a period, which is a required delimiter

    for some informats, the optional d value specifies the number of implied decimal places

    informats for reading character data always begin with a dollar sign ($).

    Examples of informats: PERCENTw.d, DATEw., NENGOw., $BINARYw., DATETIMEw., PDw.d, HEXw., PERCENTw., $w., JULIANw., TIMEw., COMMAw.d, MMDDYYw., w.d

    Reading Character Values

    The $w. informat enables you to read character data. The w represents the field width of the data value (the total number of columns that contain the raw data field).

    Reading Standard Numeric Data

    The informat for reading standard numeric data is the w.d informat.

    The w specifies the field width of the raw data value, the period serves as a delimiter, and the d optionally specifies the number of implied decimal places for the value.

    The w.d informat ignores any specified d value if the data already contains a decimal point.
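
    A short sketch of $w. and w.d together, assuming a made-up layout (Code in columns 1-4, Amount in columns 5-10). On the first record the 2 in 6.2 supplies two implied decimal places, so 012345 becomes 123.45. On the second record the value already contains a decimal point, so d is ignored and the result is also 123.45.

```sas
/* Hypothetical layout: Code in columns 1-4, Amount in columns 5-10 */
data work.amounts;
   input Code $4. Amount 6.2;
   datalines;
A1  012345
B2  123.45
;
run;
```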

    Reading Nonstandard Numeric Data

    The COMMAw.d informat is used to read numeric values and to remove embedded

    blanks

    commas

    dashes

    dollar signs

    percent signs

    right parentheses

    left parentheses, which are converted to minus signs.

    The COMMAw.d informat consists of

    1. the informat name COMMA

    2. w., a value that specifies the width of the field to be read (including dollar signs, decimal places, or other special characters), followed by a period

    3. d, an optional value that specifies the number of implied decimal places for a value (not necessary if the value already contains decimal places).
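
    A minimal sketch of COMMAw. at work. The data set name and values are made up, and modified list input (the colon) is used so each value is read up to the next blank. The dollar sign and commas are removed, and the parentheses around 1,250 turn it into a negative number.

```sas
data work.pay;
   input Salary :comma10.;   /* COMMA strips $ , and ( ) */
   datalines;
$52,000
(1,250)
;
run;
```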

    Reading Date and Time Values

    How SAS Stores Date Values

    When you use a SAS informat to read a date, SAS converts it to a numeric date value. A SAS date value is the number of days from January 1, 1960, to the given date.

    How SAS Stores Time Values

    SAS stores time values similar to the way it stores date values. A SAS time value is stored as the number of seconds since midnight.

    A SAS datetime is a special value that combines both date and time information. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time.
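
    The stored numbers can be checked directly. This sketch (data set and variable names are assumptions) reads a date with DATE9. and a time with TIME8., then combines them into a datetime with the DHMS function. January 1, 1960 is day 0, and January 1, 2000 is day 14610 (40 years of 365 days plus 10 leap days). Noon is 43200 seconds after midnight, so the second datetime is 14610 * 86400 + 43200 = 1262347200.

```sas
data work.stored;
   input Day date9. +1 Clock time8.;
   /* datetime = seconds since 01JAN1960:00:00:00 */
   Stamp = dhms(Day, 0, 0, Clock);
   datalines;
01JAN1960 00:01:00
01JAN2000 12:00:00
;
run;
```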

    Date and Time Informats

    MMDDYYw. Informat: reads mmddyy or mmddyyyy

    Date Expression    SAS Date Informat
    101599             MMDDYY6.
    10/15/99           MMDDYY8.
    10 15 99           MMDDYY8.
    10-15-1999         MMDDYY10.

    DATEw. Informat: reads ddmmmyy or ddmmmyyyy

    Date Expression    SAS Date Informat
    30May00            DATE7.
    30May2000          DATE9.
    30-May-2000        DATE11.
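
    A sketch that reads several of the expressions from these tables on one made-up record, using @n to point at each field. Three different MMDDYY layouts all produce the same SAS date. The century for the two-digit year 99 comes from the YEARCUTOFF= system option; with the default setting it resolves to 1999.

```sas
data work.dates;
   /* columns: 1-6, 8-15, 17-26, 28-36 */
   input @1 A mmddyy6. @8 B mmddyy8. @17 C mmddyy10. @28 D date9.;
   format A B C D date9.;
   datalines;
101599 10/15/99 10-15-1999 30May2000
;
run;
```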

    Date and Time Informats

    TIMEw. Informat

    Reads hh:mm:ss.ss, where

    hh is an integer from 00 to 23, representing the hour

    mm is an integer from 00 to 59, representing the minute

    ss.ss is an optional field that represents seconds and hundredths of seconds.

    Five is the minimum acceptable field width for the TIMEw. informat.

    Time Expression    SAS Time Informat
    17:00:01.34        TIME11.
    17:00              TIME5.
    2:34               TIME5.
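
    Reading two of the expressions from the table shows the stored values. The data set name is an assumption. 17:00:01.34 becomes 17 * 3600 + 1.34 = 61201.34 seconds, and 2:34 becomes 2 * 3600 + 34 * 60 = 9240 seconds.

```sas
data work.times;
   input @1 T1 time11. @13 T2 time5.;
   format T1 time11.2 T2 time5.;
   datalines;
17:00:01.34 2:34
;
run;
```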

    Date and Time Informats

    The WEEKDATEw. Format

    The WEEKDATEw. format writes date values in the form day-of-week, month-name dd, yy (or yyyy).

    The WORDDATEw. Format

    The WORDDATEw. format is similar to the WEEKDATEw. format, but it does not display the day of the week or the two-digit year values.

    FORMAT Statement              Result
    format datein weekdate3.;     Mon
    format datein weekdate6.;     Monday
    format datein weekdate17.;    Monday, Apr 5, 99
    format datein weekdate21.;    Monday, April 5, 1999

    FORMAT Statement              Result
    format datein worddate3.;     Apr
    format datein worddate5.;     April
    format datein worddate14.;    April 15, 1999
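
    To see these formats on an actual value, the sketch below stores the formatted text in character variables with the PUT function. April 5, 1999 was a Monday. The data set and variable names are assumptions, and STRIP removes the alignment blanks from the wider formats.

```sas
data work.formatted;
   datein   = '05APR1999'd;                   /* a Monday */
   ShortDay = put(datein, weekdate3.);
   FullDay  = strip(put(datein, weekdate29.));
   ShortMon = put(datein, worddate3.);
   FullWord = strip(put(datein, worddate18.));
run;
```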

    Line Pointer Controls

    When SAS reads raw data values, it keeps track of its position with an input pointer.

    We have used column pointer controls and column specifications to determine the column placement of the input pointer.

    We can also position the input pointer on a specific record by using a line pointer control.

    There are two types of line pointer controls:

    The forward slash (/) specifies a line location that is relative to the current one.

    The #n specifies the absolute number of the line to which you want to move the pointer.

    Reading Multiple Records Sequentially

    The Forward Slash (/) Line Pointer Control

    Refer to the embedded word doc for illustration

    Note that the raw data file must contain the same number of records for each observation that is being created.
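
    A minimal sketch of the / line pointer control, assuming each observation spans exactly three records (names, phone numbers, and the data set name are made up). Each / is placed after the variables on the current record and advances the pointer to the next one.

```sas
data work.contacts;
   /* record 1: names; record 2: phone; record 3: city and state */
   input LastName $ FirstName $ /
         Phone $ /
         City $ State $;
   datalines;
Lee Ann
555-0101
Dallas TX
Ortiz Luis
555-0199
Austin TX
;
run;
```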

    Reading Multiple Records Sequentially

    The #n Line Pointer Control

    The #n specifies the absolute number of the line to which you want to move the input pointer.

    The #n pointer control can read records in any order.

    Refer to the embedded word doc for illustration
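
    The same made-up three-record layout can be read out of order with #n. This is a sketch under the same assumptions. Because #n is absolute, the third record is read first, then the first, then the second. The highest #n used (3) tells SAS how many records make up one observation.

```sas
data work.contacts2;
   /* #n is absolute: read record 3, then record 1, then record 2 */
   input #3 City $ State $
         #1 LastName $ FirstName $
         #2 Phone $;
   datalines;
Lee Ann
555-0101
Dallas TX
Ortiz Luis
555-0199
Austin TX
;
run;
```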

    Creating a Single Observation from Multiple Records

    The forward slash (/) specifies a line location that is relative to the current one. The / advances the input pointer to the next record.

    The / line pointer control only moves the input pointer forward and must be specified after the instructions for reading the values in the current record.

    Note that the raw data file must contain the same number of records for each observation that is being created.

    Reading Multiple Records Non-Sequentially

    The #n Line Pointer Control

    The #n specifies the absolute number of the line to which you want to move the input pointer.

    The #n pointer control can read records in any order.

    It must be specified before the instructions for reading values in a specific record.

    Points to Remember

    Because the / pointer control can only move forward, the pointer control is specified after the values in the current record are read.

    The #n pointer control can read records in any order and must be specified before the variable names are defined.

    A semicolon should be placed at the end of the complete INPUT statement.

    Creating Multiple Observations from a Single Record

    SAS provides two line-hold specifiers. A line-hold specifier holds the current record for the next INPUT statement.

    The trailing at sign (@) holds the input record for the execution of the next INPUT statement.

    The double trailing at sign (@@) holds the input record for the execution of the next INPUT statement, even across iterations of the DATA step.

    The term trailing indicates that the @ or @@ must be the last item that is specified in the INPUT statement, e.g. input Name $20. @; or input Name $20. @@;

    Using the Double Trailing At Sign (@@) to Hold the Current Record

    Typically, each time a DATA step executes, the INPUT statement reads a new record. When the trailing @@ is used, the INPUT statement holds the current record and reads the next value.

    The double trailing at sign (@@)

    holds the data line in the input buffer across multiple executions of the DATA step

    typically is used to read multiple SAS observations from a single data line

    should not be used with the @ pointer control, with column input, or with the MISSOVER option.
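
    A minimal sketch of @@, assuming made-up ID and score pairs packed several to a line. Each DATA step iteration reads one pair and creates one observation. The record is held until the pointer passes its end, so the two data lines yield four observations.

```sas
data work.scores;
   input Id $ Score @@;   /* hold the line for the next iteration */
   datalines;
A 90 B 85 C 77
D 68
;
run;
```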

    Using the Double Trailing At Sign (@@) to Hold the Current Record

    A record that is being held by the double trailing at sign (@@) is not released until one of the following events occurs:

    The input pointer moves past the end of the record. Then the input pointer moves down to the next record.

    An INPUT statement that has no line-hold specifier executes.

    Using the Single Trailing At Sign (@) to Hold the Current Record

    Like the double trailing @@, the single trailing @

    enables the next INPUT statement to read from the same record

    releases the current record when a subsequent INPUT statement executes without a line-hold specifier.

    It's easy to distinguish between the trailing @@ and the trailing @ by remembering that

    the double trailing at sign (@@) holds a record across multiple iterations of the DATA step until the end of the record is reached

    the single trailing at sign (@) releases a record when control returns to the top of the DATA step.
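
    A common use of the single trailing @ is to read a record type first and then choose how to read the rest of the same record. In this sketch the layout and the meaning of types A and B are hypothetical. Type A amounts carry a dollar sign and commas, while type B amounts have two implied decimal places. The second INPUT has no line-hold specifier, so the record is released before the next iteration.

```sas
data work.orders;
   input Type $1. @;                          /* hold the record */
   if Type = 'A' then input @3 Amount comma7.;
   else input @3 Amount 8.2;                  /* releases the record */
   datalines;
A $12,500
B 00012345
;
run;
```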

    THANK YOU

    HAPPY LEARNING