enterprise miner

Upload: shiva-kumar

Post on 14-Apr-2018


  • 7/27/2019 Enterprise Miner

    1/63

    SAS Enterprise Miner

    Release 4.3

    A brief overview: analysis of the Donor Recapture Case (Case 3)

    Kevin Garsek Class of 2006

  • 2/63

    Importing Base Data

    SAS's main drawback is that if any field in a record has a null or blank value, the modeling tools disregard the entire record.

    In this case, if we were unable to manipulate the data, the number of usable records would decrease dramatically.

    We can fight back by recoding the data, as will be shown in the import step.
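    To make the drawback concrete, here is a minimal Python sketch (not SAS) of complete-case filtering, where a record survives only if every field is non-blank. The field values in the toy records are hypothetical.

```python
# Minimal sketch: how many records survive complete-case (listwise) deletion.
# The toy records below are hypothetical, for illustration only.

def is_blank(value):
    """Treat None or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def complete_cases(records):
    """Keep only records in which no field is blank."""
    return [r for r in records if not any(is_blank(v) for v in r.values())]

records = [
    {"OSOURCE": "AMH", "MAILCODE": "",  "DOB": "3712"},
    {"OSOURCE": "",    "MAILCODE": " ", "DOB": "5202"},
    {"OSOURCE": "BRY", "MAILCODE": "B", "DOB": ""},
    {"OSOURCE": "GRI", "MAILCODE": "B", "DOB": "2801"},
]

kept = complete_cases(records)
print(f"{len(kept)} of {len(records)} records survive")  # 1 of 4 records survive
```

    A single blank in any column is enough to lose the whole record, which is why recoding blanks before modeling preserves so much of the data.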

  • 3/63

    Base SAS Interface Screen

  • 4/63

    Importing Charity Data

    Text Editor

  • 5/63

    Text Editor

    We will use the text editor in Base SAS to import the Charity Case data. To use this editor, simply type as you would in any text editor.

  • 6/63

    Text Editor

    A line by line example of the code that we will use is as follows:

    libname charity 'C:\Documents and Settings\Kevin\Desktop\Datamining\charity.1';
        Points SAS at the master folder on your local PC where the raw data is housed.

    data charity.raw;
        Tells SAS to create a new data set named RAW in the CHARITY library.

    infile 'chr\2.dat' missover firstobs=2;
        Tells SAS which file in the data subfolder to read into the new data set. MISSOVER assigns missing values instead of jumping to the next line when a line runs out of values, and FIRSTOBS=2 starts reading at the second line of the file.

    input OSOURCE $;
        Names the data column OSOURCE; the $ tells SAS that it is character data (if it is left out, SAS assumes the data is numeric).

    OSOURCE_D = 0;
        Because missing data is prevalent, this creates a new dummy variable named OSOURCE_D and sets it to 0 for every record.

    if trim(OSOURCE) = "" then do;
        TRIM removes trailing blanks, and IF ... THEN DO opens a conditional block that compensates for blank data.

    OSOURCE = "0";
        Sets every missing value in the OSOURCE column to "0".

    OSOURCE_D = 1;
        Sets the new dummy variable to 1 when OSOURCE was blank in the input file.

    end;
        Closes the DO block. All of the code from infile through end; can be written on a single line in the text editor.

    run;
        Ends the DATA step so that SAS executes it when the code is submitted.
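    For readers coming from R or Python, the same recode-plus-flag logic can be sketched in Python. This is an illustrative translation, not SAS output, and the sample column values are hypothetical.

```python
# Minimal Python sketch of the recode-plus-flag logic in the DATA step above:
# OSOURCE_D = 0; if trim(OSOURCE) = "" then do; OSOURCE = "0"; OSOURCE_D = 1; end;

def recode_missing(value, fill="0"):
    """Return (recoded_value, missing_flag) for one character field."""
    if value is None or value.strip() == "":
        return fill, 1   # blank -> fill value, dummy flag set to 1
    return value, 0      # present -> unchanged, dummy flag stays 0

column = ["AMH", "", "  ", "BRY"]  # hypothetical OSOURCE values
recoded = [recode_missing(v) for v in column]
osource = [v for v, _ in recoded]
osource_d = [flag for _, flag in recoded]
print(osource)    # ['AMH', '0', '0', 'BRY']
print(osource_d)  # [0, 1, 1, 0]
```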

  • 7/63

    Importing Charity Data

    The slide shows the completed code. Because the same block is repeated for each variable, the code can easily be written in Excel using the & concatenation operator and then pasted into the text editor. Moving the writing process to Excel saves considerable time during this laborious process.
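    If you would rather script the repetition than build it in Excel, a small generator can do the same concatenation. This is a hypothetical Python helper, not part of SAS; it assumes, as the listing above suggests, that each variable's block begins with its own infile statement, and the variable name and file path passed in are placeholders.

```python
# Hypothetical helper: emit one single-line SAS recode block per variable,
# doing the same job as concatenating strings with & in Excel.
# Variable names and file paths passed in are placeholders.

def sas_recode_block(var, infile_path):
    """Build the single-line SAS block for one character variable."""
    return (
        f"infile '{infile_path}' missover firstobs=2; "
        f"input {var} $; "
        f"{var}_D = 0; "
        f'if trim({var}) = "" then do; {var} = "0"; {var}_D = 1; end;'
    )

def sas_data_step(library, dataset, specs):
    """Wrap the per-variable blocks in a DATA step that ends with run;."""
    lines = [f"data {library}.{dataset};"]
    lines += [sas_recode_block(var, path) for var, path in specs]
    lines.append("run;")
    return "\n".join(lines)

print(sas_data_step("charity", "raw", [("OSOURCE", r"chr\2.dat")]))
```

    Adding a variable is then one more (name, path) pair in the list rather than another row of Excel formulas.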

  • 8/63

    Importing Charity Data

    Once the code is complete, right-click in the text editor and select Submit All. This tells SAS to read through the code in the editor and execute it. Be prepared: because of the large size of the data, this will take considerable time to complete.

  • 9/63

    Starting Enterprise Miner from Base SAS module

    You should now have a fully working dataset, and you are now ready to open Enterprise Miner by following the subsequent slides.

  • 10/63

    Starting Enterprise Miner from Base SAS module

  • 11/63

    Starting Enterprise Miner from Base SAS module

  • 12/63

    Binding Data to Program

    This is an exasperating activity

    Even for someone who took a SAS training course in Enterprise Miner

    The documentation is pathetic

    I'll document each step carefully in case this ever happens to you

  • 13/63

    Name Project Charity and Drag Input Data Node to Workspace

  • 14/63

    Bind Data to Project

    Right click on tools to get this menu.

  • 15/63

    Bind Data to Project

    Left click on Initialization, then left click to edit.

  • 16/63

    Bind Data to Project

    Right click select; browse for library RDATA; click ok

  • 17/63

    Bind Data to Project

    Gotcha: you must select RAW and press Enter even though it is the only data set in RDATA

  • 18/63

    Change to Larger Sample

    Left click Change; we changed the sample size to 10,000 to give the low-response items representation

  • 19/63

    Success!

  • 20/63

    Click Variables Tab

    Notice that some variables have been rejected. This is typically because the column has only one value throughout, e.g., a dummy variable that is always 0 because that field had no missing values in the input data.
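    A quick way to spot such columns before loading the data is to count distinct values per column. The Python sketch below is illustrative only; the column names X_D, Y_D, Z_D and their values are hypothetical.

```python
# Minimal sketch: flag columns that hold a single distinct value, the kind of
# variable the slide says Enterprise Miner rejects. Data are hypothetical.

def constant_columns(rows):
    """Return the names of columns whose values never vary across rows."""
    if not rows:
        return []
    return [name for name in rows[0] if len({row[name] for row in rows}) == 1]

rows = [
    {"X_D": 0, "Y_D": 1, "Z_D": 0},
    {"X_D": 0, "Y_D": 0, "Z_D": 0},
    {"X_D": 0, "Y_D": 1, "Z_D": 0},
]
print(constant_columns(rows))  # ['X_D', 'Z_D']
```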

  • 21/63

    Then Bad Things Happen

    Who knows why.

    If I hadn't taken the course, the slides would stop here.

    That's the only reason I know what to do.

    I'll document this also, in case it happens to you.

  • 22/63

    Crash Recovery

    Right click on top level icon; select explore

  • 23/63

    Crash Recovery

    Open emproj and delete all files with the extension .lck; then open the user subfolder and delete everything in it
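    The same cleanup can be scripted. Below is a hypothetical Python sketch, not an Enterprise Miner utility: the folder names emproj and user come from the slide, the default subfolder spelling is an assumption, and Enterprise Miner should be closed before it runs.

```python
# Hypothetical cleanup for the manual crash-recovery steps above.
from pathlib import Path
import shutil

def clean_project(emproj: Path, user_subfolder: str = "user") -> list:
    """Delete *.lck files in emproj and everything inside its user subfolder.

    Returns the names of the top-level items removed.
    """
    removed = []
    # Step 1: remove stale lock files left behind by the crash.
    for lock in list(emproj.glob("*.lck")):
        lock.unlink()
        removed.append(lock.name)
    # Step 2: empty the user subfolder but keep the folder itself.
    user_dir = emproj / user_subfolder
    if user_dir.is_dir():
        for item in list(user_dir.iterdir()):
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()
            removed.append(item.name)
    return removed
```

    For example, `clean_project(Path("Charity") / "emproj")`, where the project location is hypothetical; point it at your own project folder.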

  • 24/63

    Analysis Resumes

    We'll have a look at MAILCODE.

    Enterprise Miner has some neat graphical tools that are easy to use.

    The simplest and easiest are part of the data input tool.

  • 25/63

    A Histogram

    Right click item, select view distribution of MAILCODE from drop down menu

  • 26/63

    Histogram of Mailcode

    SAS has classified as missing some data that R accepted and used!

  • 27/63

    Must Identify TARGET_D as Target

    Right click the row item in the Model Role column, select Change Model Role from the drop down menu, then select target from the next drop down menu

  • 28/63

    Histogram of Target

    This is what makes the problem hard: extremely low response rate!

  • 29/63

    Save changes!

  • 30/63

    Add Data Partition Node

    Drag down from tool bar above and connect line by dragging the mouse.

  • 31/63

    This is What it Does

    We will choose to use an 80%/20% training/validation allocation.

    Close box, right click, click Run on drop down menu.
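    Conceptually, the partition is a random split of the records. A minimal Python sketch of a simple random 80%/20% split follows; the seed is arbitrary, and Enterprise Miner's other sampling options (such as stratification) are not modeled.

```python
# Minimal sketch of a simple random 80%/20% training/validation split,
# analogous to what the Data Partition node does. The seed is arbitrary.
import random

def partition(records, train_frac=0.8, seed=12345):
    """Shuffle a copy of records and split it into (train, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, valid = partition(range(10_000))
print(len(train), len(valid))  # 8000 2000
```

    Fixing the seed makes the split reproducible, so reruns of the models are compared on the same validation records.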

  • 32/63

    Design Philosophy

    Click the lower Tools tab. Note the tools on the left: one drags a tool to the worksheet and connects it with arrows. We'll now drag in and connect Regression.

  • 33/63

    Regression

    We chose stepwise selection, with validation error as the selection criterion. That mimics what we did in R.
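    For reference, the core idea of choosing variables by validation error can be sketched in Python. This shows forward selection, the add-only half of stepwise selection (true stepwise can also drop variables), and `validation_error` is a hypothetical callable that would fit a model on a given variable set and score it on the validation partition.

```python
# Minimal sketch of forward selection scored on validation error.
from typing import Callable, FrozenSet, List, Tuple

def forward_select(candidates: List[str],
                   validation_error: Callable[[FrozenSet[str]], float]
                   ) -> Tuple[List[str], float]:
    selected: List[str] = []
    best_err = validation_error(frozenset())
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining variable; keep the one that helps most.
        err, var = min((validation_error(frozenset(selected + [v])), v)
                       for v in remaining)
        if err >= best_err:
            break  # no addition lowers validation error, so stop
        selected.append(var)
        remaining.remove(var)
        best_err = err
    return selected, best_err

# Toy error surface: A and B help, C hurts (purely illustrative numbers).
def toy_error(vars_):
    return 10.0 - 3.0 * ("A" in vars_) - 1.0 * ("B" in vars_) + 0.5 * ("C" in vars_)

print(forward_select(["A", "B", "C"], toy_error))  # (['A', 'B'], 6.0)
```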

  • 34/63

    Regression

    Right hand click on the Regression node and select run

  • 35/63

    Regression

    Regression is highlighted in green while running

  • 36/63

    Regression

    Let's take a look at the results; SAS has a very different interpretation of important variables than the R analysis

  • 37/63

    Regression

    The error rate is not that bad, but the significant variables are not necessarily easily interpretable.

  • 38/63

    Regression

    Let's try it again with a few changes to the model selection

  • 39/63

    Regression

    Again, we get results, but nothing easily interpretable.

  • 40/63

    Regression

    Let's limit the regression to those variables determined by R to be significant. To do this, we will again right hand click on Regression and select Open.

  • 41/63

    Regression

    Then go to the Variables tab. Right hand click under the Status column for each unneeded variable and set the status to don't use.

  • 42/63

    Regression

    In addition to limiting our variables to those from the R results, we are going to add an interaction as well as a squared variable. The first step is to add the squared term by adding a Transform Variables node, right hand clicking on the node, and selecting Open.

  • 43/63

    Regression

    From the variables tab, we will right hand click on DOB and select Transform.

  • 44/63

    Regression

    We will now select square. This will create a new variable, DOB_L1S6, which will then be used in our next regression.

  • 45/63

    Regression

    Our next step is to create an interaction. To do this, go back to the main diagram and double click on Regression. This should bring you into the Model Manager, where you will click on the Interaction Builder icon.

  • 46/63

    Regression

    On this screen, hold down the Ctrl key to highlight both Lastgift and Pepstrfl. Next, press the Cross button to create the new interaction variable. The new variable should be added to the available terms window and should be used in subsequent regressions.
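    Outside Enterprise Miner, the squared term and the interaction are simple column arithmetic. The Python sketch below is hypothetical: the names DOB_SQ and LASTGIFT_X_PEPSTRFL are made up (SAS names its squared term DOB_L1S6), PEPSTRFL is assumed to be coded 0/1 already, and the sample values are invented.

```python
# Minimal sketch of the two derived terms: DOB squared and the
# LASTGIFT x PEPSTRFL cross product. Names and values are hypothetical.

def add_terms(row):
    """Return a copy of row with a squared term and an interaction term."""
    out = dict(row)
    out["DOB_SQ"] = row["DOB"] ** 2
    out["LASTGIFT_X_PEPSTRFL"] = row["LASTGIFT"] * row["PEPSTRFL"]
    return out

print(add_terms({"DOB": 3712, "LASTGIFT": 10.0, "PEPSTRFL": 1}))
```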

  • 47/63

    Regression

    Results! While the initial bar graph may look complex, this is how SAS handles character data: it expands each character variable into dummy variables.
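    The general idea of dummy coding can be sketched in Python: each level except a reference level gets its own 0/1 column. This is illustrative only; the choice of the last sorted level as reference is an assumption, and SAS's own default parameterization may differ (for example, effect coding).

```python
# Minimal sketch of dummy (reference-level) coding for a character variable.

def dummy_code(values, reference=None):
    """Return (kept_levels, 0/1 matrix) with one column per non-reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[-1]  # assumed baseline: the last level in sort order
    kept = [lvl for lvl in levels if lvl != reference]
    return kept, [[int(v == lvl) for lvl in kept] for v in values]

levels, matrix = dummy_code(["A", "B", "C", "A"])
print(levels)  # ['A', 'B']
print(matrix)  # [[1, 0], [0, 1], [0, 0], [1, 0]]
```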

  • 48/63

    Regression

    Now, looking at the table of coefficient estimates, we have interpretable results!

  • 49/63

    Regression

    For those who are interested, you can look at the Code tab and see the actual SAS code that you would have to write if you were to program this regression manually.

  • 50/63

    Regression

    Let's add another level of analysis and try to rid the data of outliers. To do this, you will need to insert a Filter Outliers node between the Transform Variables and Regression nodes.

  • 51/63

    Regression

    Double click on the Filter Outliers node and then go to the Settings tab. I have used the above settings, but feel free to experiment for the best outcome. Once you have completed this step, run the regression.
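    The slide's exact settings are not reproduced here, but one common outlier rule drops values that lie more than k standard deviations from the mean. The Python sketch below assumes that rule and k = 3 purely for illustration.

```python
# Minimal sketch of a standard-deviation outlier filter. The rule and the
# default k=3 are assumptions, not the Filter Outliers node's settings.
from statistics import mean, pstdev

def filter_outliers(values, k=3.0):
    """Keep values within k population standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)  # no spread, nothing to filter
    return [v for v in values if abs(v - mu) <= k * sigma]

print(filter_outliers([10] * 20 + [1000]))  # twenty 10s; the 1000 is dropped
```

    Raising k keeps more of the data; lowering it trims more aggressively, which is the trade-off to experiment with.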

  • 52/63

    Moving On, Try a Tree


  • 53/63

    The tree itself is on the next slide.

    Does this look familiar?

    This shows the same pattern as Fig. 22, Learning and Validation MSE, of Topic 2, Bias-Variance Tradeoff.

    Tree


  • 54/63

    SAS does have some great graphics! Below is the tree, which is typically presentable to a general audience.

    Tree

  • 55/63

    Moving On, Try a Neural Net


  • 56/63

    Net

    We will use the defaults for this round of processing. During the run, we see the graphic below.


  • 57/63

    Net

    The results: decent output, but very difficult to explain to a general audience.

  • 58/63

    Assessment Tool

    The assessment tool is supposed to give lift charts.

    Apparently it only does so for a binary response.

    The menu item is blank for our predictive models, which have a continuous target.

    The tool is good for easily comparing the error rates of different models.
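    For a binary response, lift compares the response rate among the highest-scored records with the overall response rate. A minimal Python sketch of that calculation follows; the scores, 0/1 labels, and the 10% default cutoff are hypothetical.

```python
# Minimal sketch of lift for a binary (0/1) response.

def lift(scores, responded, top_frac=0.1):
    """Response rate in the top-scored fraction divided by the overall rate."""
    base_rate = sum(responded) / len(responded)
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody responded")
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_frac))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    return top_rate / base_rate

scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1,   1,   0,   0,   0,   0,   0,   0,   0,   0]
print(lift(scores, responded, top_frac=0.2))  # 5.0
```

    Because the calculation needs a responded/did-not-respond label, it has no direct analogue for a continuous target, which matches the blank menu item.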

  • 59/63

    Assessment Tool

  • 60/63

    Assessment Tool

    When you double click on the node, you will see the following:

    Tool              Root ASE    Root ASE squared (ASE)
    Tree              4.457445    19.86881593
    Regression        4.421218    19.5471686
    Neural Network    4.455325    19.84992086
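    The second column is simply the first one squared, i.e. the average squared error (ASE) itself, so both columns rank the models identically and Regression has the lowest error of the three. A minimal Python sketch of both measures, with a check against the table values:

```python
# Minimal sketch of average squared error (ASE) and root ASE.
from math import sqrt, isclose

def ase(actual, predicted):
    """Average squared error over paired observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_ase(actual, predicted):
    """Square root of ASE, in the units of the target."""
    return sqrt(ase(actual, predicted))

# Root ASE values from the table; squaring each reproduces the second column.
table = {
    "Tree": (4.457445, 19.86881593),
    "Regression": (4.421218, 19.5471686),
    "Neural Network": (4.455325, 19.84992086),
}
for tool, (r, r_sq) in table.items():
    print(tool, isclose(r ** 2, r_sq, rel_tol=1e-7))  # each line ends in True
```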

  • 61/63

    Assessment Tool

    As for lift charts, they are unavailable for this analysis.

  • 62/63

    Done!

    The intention was to illustrate the interface, not to assess SAS's Enterprise Miner per se.

    With more effort to fix the missing-value problems on input, better results can surely be achieved.

    With more experience, many of the false steps would not have occurred.

  • 63/63

    Looping and Control

    SAS's biggest deficiency is the lack of looping and control structures.

    This affects all of SAS, not just Enterprise Miner.

    Any data manipulation, such as fixing missing values, must be done by hand, one variable at a time. R has a huge advantage here!