SAS Stat Studio v3.1

Upload: rajesh-kumar

Post on 14-Apr-2018


  • 7/29/2019 SAS Stat Studio v3.1

    1/69

    Sun Li, Senior Statistician, Nov 2009


    What is Stat Studio? - An introduction

    An overview of Stat Studio interface

    Getting started with examples

    Creating and editing data to understand Data Table

    Exploratory Data Analysis

    One /Two /Three dimensional plots

    Distribution analysis

    Data smoothing - LOESS

    Model fitting - Generalized Linear Models

    Linear Model

    Poisson Regression


    New statistical software in SAS 9.2 that blends

    Interactivity of SAS/Insight

    Dynamically linked statistical graphics

    Standard statistical modeling

    Extensibility & flexibility of SAS/IML

    Matrix programming language

    Analytical power of SAS/STAT procedures

    What is best? Point-and-click capabilities for graphical exploration and modeling. That means: no programming is required to use SAS!


    Stat Studio as an Insight Successor

    Stat Studio is client software that can connect to SAS servers, which might be running on a different computer than Stat Studio.

    Stat Studio is programmable, and therefore extensible. For example, you can program your graphics by adding legends, curves, and other features in order to better analyze and visualize your data.

    Stat Studio contains many features that are not available in SAS/INSIGHT, such as robust regression models and generalized linear models with a multinomial response.


    Stat Studio as a Programming Environment


    Open a Stat Studio session

    Start -> All Programs -> SAS -> Stat Studio 3.1


    Important notes

    Installation requirements:

    Stat Studio requires that you have a license for Base SAS, SAS/STAT, and SAS/IML.

    Stat Studio runs on a PC in the Microsoft Windows operating environment.

    Language support:

    If you need to open a data set containing Chinese, Japanese, or Korean characters, it is important that you configure the Regional and Language Options in the Windows Control Panel for the appropriate country.


    To create a new data set

    To understand a data table

    To change variable / observation properties

    To select data


    To create a new data set: File > New > Data Set


    Step 1: Type Employee in the Name field. In the Type field, select Character. Click OK.


    Step 2: Create a new variable. Select Edit > Variables > New Variables. Type Quarter in the Name field. Select Nominal from the Measure Level menu. Click OK.

    Step 3: Create a new variable. Select Edit > Variables > New Variables. Type Sales in the Name field. In the Label field, type Sales in thousands. In the Format list, select Dollar. Type 4 in the W field. Click OK.


    Step 4: Enter the data

    File > Save

    For editing data:


    To understand a Data Table:

    Variable name

    Variable measure level

    Variable headings

    Observation number

    Indicator of whether the observation is included in plots and analyses, e.g. the 4th observation is excluded from plots; the 5th observation is excluded from analyses.

    Observation headings


    Variable Properties: Click a column, then right-click it.

    Alternatively, from the main menu, select Edit > Variables.

    Generate _OBSTAT_ variable:

    The _OBSTAT_ variable is a character variable of length 20. It captures the state of observations, including the color and shape of markers and whether an observation is selected. The first few characters encode the state of binary options such as whether an observation is selected. A character is 1 if the corresponding property is true and 0 if the related property is false.


    Observation Properties: Click a row, then right-click it.

    Alternatively, from the main menu, select Edit > Observations.

    Examine Selected Observation:


    Data selection: The left portion of the figure shows a data table that has 3394 selected observations; none of the 36 variables are selected.

    The right figure shows that 6 variables are selected, but none of the 6188 observations are selected.

    Question: How many observations and variables are selected?


    After data selection: Select File > New > Data Set from Selected Data from the main menu. A new data table appears, containing only the selected subset of the original data. You can then start working on this new data set.


    What is Stat Studio? - An introduction

    An overview of Stat Studio interface

    Getting started with examples

    Creating and editing data

    Exploratory Data Analysis

    One /Two /Three dimensional plots

    Distribution analysis

    Data smoothing

    Model fitting - Generalized Linear Models


    Demo: Open a SAS data set

    Open an Excel data set

    Open a .txt data set

    Open a SAS data set: Select File > Open > File. Click Go to Installation Directory near the bottom of the dialog box.

    Double-click on the Data Sets folder.

    Select the Hurricanes.sas7bdat file.

    Click Open.


    Open an Excel data set: Select File > Open > File. Go to the folder where you saved the downloaded training data.

    Select the ibmff.xls file, then click Open. Check Import Options, then click OK. The file can be saved as a SAS data set by clicking File > Save as File.


    Open a text data set: Select File > Open > File. Go to the folder where you saved the downloaded training data.

    Select the sp1.txt file, then click OK.

    Further coding work is needed.


    Exploratory Data Analysis

    One /Two /Three dimensional plots

    Creating a histogram (& a matrix of histograms)

    Creating a scatter plot (& a matrix of scatter plots)

    Creating a rotating scatter plot

    Distribution analysis

    Data smoothing - LOESS


    Creating a histogram: Open the Hurricanes.sas7bdat data set.

    Select Graph > Histogram from the menu. Select the latitude variable, and click Set X. Then click OK.


    You can click on a histogram bar to select the observations contained in that bin, or hold down the CTRL key to select observations in multiple bins. The observations categorized under the selected bins will be selected in the data window.
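
The linked selection described above can be sketched in a few lines of Python. This only illustrates the idea and is not how Stat Studio implements it: the latitude values, the bin start, and the bin width are made-up assumptions, and `bin_index` and `select_by_bins` are hypothetical helpers.

```python
def bin_index(value, start, width):
    """Index of the bin [start + i*width, start + (i+1)*width) that holds value."""
    return int((value - start) // width)


def select_by_bins(values, start, width, chosen_bins):
    """Return the row indices whose value falls in any of the chosen bins."""
    return [
        row
        for row, value in enumerate(values)
        if value is not None and bin_index(value, start, width) in chosen_bins
    ]


# Hypothetical latitude values (None marks a missing value).
latitude = [12.5, 18.0, 23.4, 27.9, None, 31.2, 33.3, 41.0]

# Clicking one bar selects one bin; CTRL-clicking adds further bins.
one_bar = select_by_bins(latitude, start=10, width=10, chosen_bins={2})
two_bars = select_by_bins(latitude, start=10, width=10, chosen_bins={1, 3})

print(one_bar)
print(two_bars)
```

The returned row indices play the role of the selected observations that appear highlighted in the data window.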


    Demo: Edit Graph Area Properties to add title and background color. Edit Plot Area Properties to change bin color and show labels. Edit Axis Properties to change tick positions.


    You can create a matrix of plots for a set of variables.

    Select a set of variables in a data table

    Draw graph from Graph menu


    Creating a scatter plot: Open the Hurricanes.sas7bdat data set.

    Select Graph > Scatter Plot from the menu. Select the variable wind_kts, and click Set Y. Select the variable minpressure, and click Set X. Then click OK.


    A matrix of scatter plots for selected variables.


    Creating a rotating scatter plot: Open the Hurricanes.sas7bdat data set.

    Select Graph > Rotating Plot from the menu. Select the variable wind_kts, and click Set Z. Select the variable latitude, and click Set Y. Select the variable longitude, and click Set X. Then click OK.


    A better visualization


    Exploratory Data Analysis

    One /Two /Three dimensional plots

    Distribution analysis

    Descriptive statistics

    Robust location (Trimmed/Winsorized mean).

    Distributional modeling

    Data smoothing - LOESS


    Descriptive statistics: Open the Hurricanes.sas7bdat data set.

    Select Analysis > Distribution Analysis > Descriptive Statistics. Select the variable pressure_outer_isobar, and click Set Y. Click the Tables tab. Select Extreme Values and Missing Values. Then click OK.
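
As a rough illustration of what the Extreme Values and Missing Values tables summarize, the sketch below counts missing values and lists the lowest and highest observations. The pressure values are invented for the example, and `extreme_and_missing` is a hypothetical helper, not a Stat Studio or SAS function.

```python
def extreme_and_missing(values, k=5):
    """Summarize missing values and the k lowest / k highest observations."""
    if not values or k < 1:
        raise ValueError("need at least one value and k >= 1")
    present = sorted(v for v in values if v is not None)
    n_missing = len(values) - len(present)
    return {
        "n_missing": n_missing,
        "pct_missing": 100.0 * n_missing / len(values),
        "lowest": present[:k],
        "highest": present[-k:],
    }


# Hypothetical pressure readings; None marks a missing value.
pressure = [1008, None, 1010, 1012, None, 1005, 1013, 1009, 1011, 1002]
summary = extreme_and_missing(pressure, k=3)
print(summary)
```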


    Robust location (trimmed/Winsorized mean): Open the Hurricanes.sas7bdat data set.

    Select Analysis > Distribution Analysis > Location and Scale Statistics. Select the variable pressure_outer_isobar, and click Set Y.

    Click the Tables tab. Select Modes. Select Robust location (trimmed/Winsorized mean). Select Robust scale.

    Click OK.

    Note:

    1) The trimmed mean is computed after the k smallest and k largest observations are deleted from the sample.

    2) The Winsorized mean is computed after the k smallest observations are replaced by the (k+1)st smallest observation, and the k largest observations are replaced by the (k+1)st largest observation.
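
The two definitions in the note can be written out directly. This is a minimal sketch with an invented sample that contains one outlier; the function names are illustrative and are not part of Stat Studio.

```python
def trimmed_mean(values, k):
    """Mean after deleting the k smallest and k largest observations."""
    xs = sorted(values)
    if k < 0 or 2 * k >= len(xs):
        raise ValueError("k must satisfy 0 <= 2k < n")
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest values by the (k+1)st smallest
    and the k largest values by the (k+1)st largest."""
    xs = sorted(values)
    n = len(xs)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2k < n")
    low, high = xs[k], xs[n - k - 1]
    replaced = [low] * k + xs[k:n - k] + [high] * k
    return sum(replaced) / n


# Invented sample with one extreme value (95).
sample = [2, 4, 5, 6, 7, 8, 9, 10, 12, 95]

print(sum(sample) / len(sample))   # ordinary mean, pulled up by the outlier
print(trimmed_mean(sample, 1))
print(winsorized_mean(sample, 1))
```

Both robust estimates stay close to the bulk of the data, whereas the ordinary mean is dominated by the single extreme value.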


    Distributional modeling

    Open the Hurricanes.sas7bdat data set

    Select Analysis > Distribution Analysis > Distributional Modeling. Select the variable pressure_outer_isobar, and click Set Y. Click the Estimator tab. Keep all defaults.

    Click the Plots tab, and select all plots. Then click OK.


    Exploratory Data Analysis

    One /Two /Three dimensional plots

    Distribution analysis

    Data smoothing - LOESS (or LOWESS)


    Data smoothing techniques are used to eliminate noise and extract real trends and patterns.

    LOESS: locally estimated scatterplot smoothing (LOWESS stands for locally weighted scatterplot smoothing).

    A specific width of points along the x axis is selected adjacent to the point being predicted, and a low-degree polynomial (often just linear) is fit through that subset of the data.

    Data requirements: densely sampled data sets, typically continuous numeric data, although discrete numeric data can be used.

    Limitations: computationally intensive, and it does not yield a ready-to-use formula.

    The related SAS procedure is the LOESS procedure in SAS/STAT.
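
The local fitting idea can be sketched as follows: for each target point, take the nearest fraction of the data along the x axis and fit a weighted straight line through it. This is a simplified illustration, not the LOESS procedure itself. The tricube weight function, the rounding of the neighborhood size, and the name `loess_point` are assumptions made for the example.

```python
def loess_point(xs, ys, x0, fraction=0.5):
    """Local linear estimate at x0 from the nearest `fraction` of the points.

    Assumes xs and ys have the same length and contain at least two points.
    """
    n = len(xs)
    q = max(2, min(n, int(fraction * n)))  # size of the local neighborhood
    # Indices of the q points closest to x0 along the x axis.
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:q]
    h = max(abs(xs[i] - x0) for i in nearest)  # neighborhood half-width
    if h == 0:
        return sum(ys[i] for i in nearest) / q
    # Tricube weights: near 1 close to x0, falling to 0 at distance h.
    w = [(1 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in nearest]
    if sum(w) == 0:  # every neighbor sits exactly at distance h
        w = [1.0] * q
    sw = sum(w)
    # Weighted least-squares line through the neighborhood.
    mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Sanity check on exactly linear data: a local line reproduces y = 2x + 1.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(loess_point(xs, ys, 3.5, fraction=0.5))

# Smoothed curve evaluated at every observed x.
smooth = [loess_point(xs, ys, x, fraction=0.5) for x in xs]
```

A smaller `fraction` uses fewer neighbors and follows local variation more closely; a larger one gives a smoother curve.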


    Open the Miningx.sas7bdat data set.

    Select Analysis > Data Smoothing > Loess. Select the variable driltime, and click Set Y. Select the variable depth, and click Set X. Click the Plots tab. Select Raw residuals vs. Explanatory, and Smoothing criterion vs. Smoothing parameter.


    To Compare Smoothers: The predicted value at a point x is determined by a weighted average of observations near x. The number of observations used to form the predicted value depends on the smoothing parameter.

    For these data, the optimal smoothing parameter was approximately 0.131. This value results in a smoother that varies with the hardness of the underlying rock strata.

    While 0.131 is a global minimum of the AICC function, there might be a local minimum at a larger value of the smoothing parameter. Using a larger value results in a smoother that is less sensitive to local variation in rock hardness.
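
For reference, the bias-corrected AIC that is commonly used as a loess smoothing criterion can be written as below. This assumes that the AICC mentioned here is the Hurvich, Simonoff, and Tsai form. The notation is introduced only for this illustration: $n$ is the number of observations, $L$ is the smoothing matrix (fitted values $\hat{y} = Ly$), and $\hat{\sigma}^{2} = \frac{1}{n}\sum_{i}(y_i - \hat{y}_i)^2$.

```latex
\mathrm{AICC} = \log\left(\hat{\sigma}^{2}\right) + 1
  + \frac{2\left(\operatorname{tr}(L) + 1\right)}{n - \operatorname{tr}(L) - 2}
```

A smaller smoothing parameter lowers $\hat{\sigma}^{2}$ but raises $\operatorname{tr}(L)$, so the criterion trades goodness of fit against the complexity penalty. This is why it can have more than one local minimum.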


    Click on the scatter plot of driltime versus depth to activate that window.

    Select Analysis > Data Smoothing > Loess. Click the Method tab. Click Exhaustive search for minimum. Click Restrict search range and type 0.5 for the Lower bound. Click the Plots tab, and clear Raw residuals vs. Explanatory.

    The Exhaustive search for minimum option is computationally expensive. With 80 observations and a lower bound of 0.5, the search evaluates loess models with at least 40 (0.5 × 80) points in the local neighborhoods.


    The new loess smoother indicates that the drilling time varies roughly linearly at depths between 0 and 200 feet, and linearly (with a different slope) at depths greater than 300 feet. Between 200 and 300 feet, the response varies nonlinearly.

    Penner and Watts (1991) suggest that air forced through the drill shaft is able to expel debris from the hole at depths less than 200 feet, but at greater depths more and more debris falls back into the hole, thus reducing the drill's efficiency.


    What is Stat Studio? - An introduction

    An overview of Stat Studio interface

    Getting started with examples

    Creating and editing data

    Exploratory Data Analysis

    Model fitting - Generalized Linear Models

    Linear Model

    Poisson Regression


    The generalized linear model is a generalization of the traditional linear model. It differs from a linear model in that the mean of the response is related to the linear predictor through a function called the link function, and the response distribution need not be normal.

    The related SAS procedure is the GENMOD procedure in SAS/STAT.
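
In symbols, the relationship can be sketched as below, where $\mu$ is the expected value of the response $Y$, $X\beta$ is the linear predictor, and $g$ is the link function. The two special cases are the models covered in this section; the notation is added here for illustration.

```latex
\begin{aligned}
g(\mu) &= X\beta, \qquad \mu = \mathrm{E}[Y] \\
\text{Linear model:}\quad g(\mu) &= \mu \quad \text{(identity link, normal response)} \\
\text{Poisson regression:}\quad g(\mu) &= \log(\mu) \quad \text{(log link, Poisson response)}
\end{aligned}
```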


    Open the Drug.sas7bdat data set

    This experiment was carried out to evaluate the effect of four drugs with three experimentally induced diseases. Each drug-by-disease combination was applied to six randomly selected dogs. The response variable, changbp, is the increase in systolic blood pressure due to the treatment. The variables drug and disease are classification variables: their values identify distinct levels or groups.


    Step 1: Exploring the data

    1. Variable properties: Select the variables drug and disease.

    Right-click the selected variables.

    Choose Nominal instead of Interval.


    2. Box plots: Select Graph > Box Plot. Create a box plot of change_bp vs. drug.

    Right-click near the center of the plot, and select Plot Area Properties. Select Mean: with one standard deviation. Click OK, and create a box plot of change_bp vs. disease.


    Step 2: Create an initial model. Select Analysis > Model Fitting > Generalized Linear Models. Select change_bp, and click Add Y. Select drug and disease, and click Add Class.


    Click the Effects tab. Select drug and disease from the Explanatory Variable list. Select Cross from the Standard Effects list. Click Add.

    The Method tab enables you to specify aspects of the generalized linear model such as the response distribution and the link function. The default distribution for the response is the normal distribution, and the default link function is the identity function.

    You do not need to modify this tab for linear models.

    Click the Tables tab. In Type 3 Analysis of Contrasts, clear Wald, and select Likelihood Ratio. Click OK.


    Step 3: Revise the model. Reopen the GLM analysis menu.

    Click the Effects tab. Select drug & disease from the Effects in Model list. Click Remove, then click OK to rerun the model.

    Note: The items on the Analysis menu are not available if the output window is active.


    Poisson Model - For a response variable Y with expected value $\mu$, a Poisson model is:

    $$\log(\mu) = X\beta$$

    Sometimes the counts represent the number of events that occurred during an observed time period. Some counts might correspond to longer time periods than others do. In this situation, you want to model the rate at which the events occur.

    When you model a rate, you are modeling the number of events, Y, per unit of time, T. The model can be rewritten as:

    $$\log(\mu) = \log(T) + X\beta$$

    $\log(T)$ is called an offset variable.
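
To see what the offset does, consider the simplest case, in which the linear predictor is a single intercept, so that log(mu_i) = log(T_i) + beta0. The sketch below fits beta0 by Newton's method. The counts and exposure times are invented, and `fit_log_rate` is a hypothetical helper rather than anything provided by Stat Studio or GENMOD.

```python
import math


def fit_log_rate(counts, exposures, iterations=25):
    """Newton's method for beta0 in log(mu_i) = log(T_i) + beta0.

    Assumes positive exposures and at least one nonzero count.
    """
    total_y = sum(counts)
    total_t = sum(exposures)
    beta0 = 0.0
    for _ in range(iterations):
        fitted_total = math.exp(beta0) * total_t  # sum of fitted means mu_i
        # Newton step: score sum(y_i - mu_i) divided by information sum(mu_i).
        beta0 += (total_y - fitted_total) / fitted_total
    return beta0


# Hypothetical data: event counts and months of observation for three units.
y = [3, 7, 20]
months = [150, 350, 1000]

beta0 = fit_log_rate(y, months)
rate = math.exp(beta0)  # estimated events per month
print(beta0, rate)
```

In this intercept-only case the estimate has a closed form, exp(beta0) = sum(y) / sum(T), which shows that the offset turns a model for counts into a model for events per unit of time.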


    Open the Ship.sas7bdat data set

    The response variable, y, is the number of damage incidents that occurred during the number of months that ship was in service (contained in the months variable). The quantity log(months) is an offset variable for this model. The three classification variables are type, year, and period.


    Step 1: Explore the data

    1. Create the ratio of y to months: Select Analysis > Variable Transformation. Use the Y / X transformation from the Two Variable family. Select y to be Y and months to be X, then click Next. Enter IncidentsPerMonth as the Name, then click Finish.


    2. Create box plots of IncidentsPerMonth vs. different explanatory variables.


    Step 2: Create the offset variable. Select Analysis > Variable Transformation. The transformation log(Y+a) is highlighted by default. Click Next. Select the months variable, and click Set Y. Click Finish.
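
The two variable transformations used in Steps 1 and 2 amount to a ratio and a logarithm. The sketch below applies them to a few invented rows. The assumption that the constant a in log(Y+a) is 0 is mine, and `add_rate_and_offset` is a hypothetical helper.

```python
import math


def add_rate_and_offset(rows, a=0.0):
    """Return copies of rows with IncidentsPerMonth (Y / X) and Logmonths (log(Y+a))."""
    result = []
    for row in rows:
        months = row["months"]
        if months <= 0 or months + a <= 0:
            raise ValueError("months must be positive to form the ratio and the log")
        new_row = dict(row)
        new_row["IncidentsPerMonth"] = row["y"] / months
        new_row["Logmonths"] = math.log(months + a)
        result.append(new_row)
    return result


# Invented rows: damage incidents (y) and months in service.
ships = [
    {"y": 0, "months": 127},
    {"y": 3, "months": 1095},
    {"y": 6, "months": 2244},
]

transformed = add_rate_and_offset(ships)
for row in transformed:
    print(row)
```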


    Step 3: Model the data. Select Analysis > Model Fitting > Generalized Linear Models. Select y, and click Add Y. Select type, year, and period, and click Add Class. Click the Tables tab. Select Likelihood Ratio to request Type 3 contrasts, and clear Wald.


    Click the Roles tab. Select Logmonths, and click Set Offset. Click the Methods tab. Select Poisson for Response Distribution. Click OK to run the analysis.

    When the response distribution is Poisson, the default link function is the natural log. Consequently, you do not need to change the Link function value.


    Step 4: Model overdispersion. Reopen the GLM analysis menu.

    Click the Methods tab. Select Pearson chi-square/DF for the Estimate scale parameter as field. Click OK to rerun the model.
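
The quantity behind this option can be sketched as follows. For a Poisson model the variance equals the mean, so the Pearson chi-square statistic is the sum of (y - mu)^2 / mu, and dividing it by the residual degrees of freedom gives a dispersion estimate. The observed and fitted values below are invented, and `pearson_dispersion` is a hypothetical helper.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom (Poisson variance)."""
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / df


# Invented counts and fitted means from a model with two parameters.
observed = [2, 5, 9, 14]
fitted = [4.0, 4.0, 10.0, 12.0]

dispersion = pearson_dispersion(observed, fitted, n_params=2)
print(dispersion)
```

A ratio well above 1 suggests overdispersion relative to the Poisson assumption. Scaling by this estimate leaves the fitted means unchanged but widens the standard errors by the square root of the ratio.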


    What is Stat Studio? - An introduction

    An overview of Stat Studio interface

    Getting started with examples

    Creating and editing data to understand Data Table

    Exploratory Data Analysis

    One /Two /Three dimensional plots

    Distribution analysis

    Data smoothing - LOESS

    Model fitting - Generalized Linear Models

    Linear Model

    Poisson Regression


    Thank you!