model lifecycle

Upload: madangarli

Post on 08-Apr-2018

TRANSCRIPT

  • 8/6/2019 Model Lifecycle

    Model Lifecycle

    Ajit Ghanekar

  • Model Life Cycle
    • Model Development
    • Model Validation
    • Model Assessment
    • Model Monitoring

  • Model Development

  • Model Development - Process
    • Understanding of Business Pains and Available Data
    • Identification of Objective and Expected Outcome
    • Formulation of Modeling Approach and Data Requirement
    • Identification of Analysis Tool and I/O Requirement

  • Model Development - Difficulties
    • Voluminous Data
    • Missing Data Elements
    • Lack of Data Insight
    • Inter-Correlated Characteristics
    • And many more

  • Model Development - SEMMA Methodology
    Sample → Explore → Modify → Model → Assess

  • Sample - Rationale
    • A sample is a manageable amount of data for model development
    • The sample is supposed to represent the population
    • It is enough to develop the model on the sample
    • A model developed on a representative sample is expected to be valid for the population

  • Sample - Techniques
    Popular Sampling Techniques
    • Simple Random Sampling
      • With Replacement (SRSWR)
      • Without Replacement (SRSWOR)
    • Stratified Sampling
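
A minimal sketch of the three sampling schemes listed above, using only Python's standard library. The population of "good" and "bad" accounts, the stratum key and the 20% sampling fraction are illustrative assumptions, not taken from the slides.

```python
import random
from collections import defaultdict

def srswr(population, n, rng):
    """Simple random sampling with replacement: an item may be drawn more than once."""
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, rng):
    """Simple random sampling without replacement: each item is drawn at most once."""
    return rng.sample(population, n)

def stratified(population, key, fraction, rng):
    """Draw the same fraction from every stratum; key(item) names the stratum."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # keep at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 90 "good" and 10 "bad" accounts.
population = [("good", i) for i in range(90)] + [("bad", i) for i in range(10)]

with_repl = srswr(population, 20, rng)
without_repl = srswor(population, 20, rng)
strat = stratified(population, key=lambda row: row[0], fraction=0.2, rng=rng)
```

Stratified sampling keeps the 90/10 mix of the population in the sample, which simple random sampling only does on average.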

  • Sample - Data Partitioning
    • Avoids over-fitting of the model
    • Allows validating a model
    • Allows comparison of models

  • Sample - Data Partitioning
    Divide the sample randomly into three parts. Suggested division:

    Data Type         Purpose           Suggested
    Training Data     Build Model       60%
    Validation Data   Validate Model    30%
    Testing Data      Compare Models    10%
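
The suggested 60/30/10 division could be sketched as below. The function name `partition`, the seed and the use of `range(1000)` as stand-in data are assumptions made for illustration.

```python
import random

def partition(rows, fractions=(0.6, 0.3, 0.1), seed=0):
    """Shuffle rows and cut them into training, validation and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # random division, repeatable via the seed
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder, so no row is dropped
    return train, valid, test

# Stand-in data: 1000 row identifiers.
train, valid, test = partition(range(1000))
```

Taking the testing part as the remainder guarantees that every row lands in exactly one part, even when the fractions do not divide the sample size evenly.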

  • Explore - Rationale
    Provides preliminary insights into the data. Preliminary insights include:
    • Possible causal relationships
    • Correlated characteristics
    • Central tendency and dispersion
    • And many more

  • Explore - Techniques
    • Statistical Charts
      • Histogram
      • P-P Plot / Q-Q Plot
      • Box Chart
    • Preliminary Data Analysis
      • Mean / Median / Mode
      • Symmetry / Kurtosis
      • Variance
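
The preliminary data analysis measures above could be computed as in this sketch. Symmetry is measured here by moment-based skewness and kurtosis is reported as excess kurtosis; both are one convention among several, and the sample values are made up for illustration.

```python
import statistics

def describe(values):
    """Return location, spread and shape measures for a numeric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments dividing by n; other conventions divide by n - 1.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "skewness": m3 / m2 ** 1.5,               # 0 for a symmetric sample
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,    # 0 for a normal distribution
    }

# Hypothetical sample with one large value pulling the right tail.
summary = describe([2, 3, 3, 4, 5, 5, 5, 6, 7, 20])
```

A mean above the median together with positive skewness, as in this sample, is the kind of preliminary insight that suggests a transformation in the Modify stage.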

  • Modify - Techniques
    • Imputation
      • Missing Data Analysis
    • Standardization
      • Standardize data
      • Normalization
      • Log Transform
      • Logit Transform
      • Probit Transform
    • Data Reduction
      • Principal Component Analysis
      • Canonical Correlation
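
A few of the Modify techniques above, sketched with the standard library. Mean imputation is only one of several imputation choices and is an assumption here, as are the function names and the income figures.

```python
import math
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def log_transform(values):
    """Natural log; valid only for strictly positive values."""
    return [math.log(v) for v in values]

def logit(p):
    """Log-odds of a proportion p with 0 < p < 1."""
    return math.log(p / (1.0 - p))

incomes = [30.0, None, 45.0, 60.0, None, 45.0]  # hypothetical data with gaps
filled = impute_mean(incomes)
z = standardize(filled)
```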

  • Model - Rationale
    • Establishes causal relationship between independent characteristics and Target
    • Can preserve the relationship in a precise and concise mathematical function
    • Provides a unique measurement scale in the form of a weighted sum of characteristics, where weights are data dependent
    • Model may satisfy one of the objectives
      • Classification
      • Prediction
      • Forecasting
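
To illustrate "a weighted sum of characteristics, where weights are data dependent", here is a minimal sketch that estimates the weight of a single characteristic by ordinary least squares and then scores a new case. The choice of least squares and the data points are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def score(weights, intercept, characteristics):
    """Weighted sum of characteristics: the model's measurement scale."""
    return intercept + sum(w * c for w, c in zip(weights, characteristics))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data lying on y = 1 + 2x
intercept, slope = fit_line(xs, ys)
prediction = score([slope], intercept, [5.0])
```

Different data would give a different slope and intercept, which is the sense in which the weights are data dependent.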

  • Model - Techniques
    • For Classification
      • Classification Trees
      • Logistic Regression
      • Neural Network
    • For Prediction
      • Regression Trees
      • Linear Regression
      • Neural Network
    • For Forecasting
      • ARIMA Models
      • Smoothing Techniques
        • Exponential Smoothing
        • Holt-Winters Smoothing
        • Moving Average Smoothing
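
Two of the smoothing techniques above are simple enough to sketch directly. The series, the window of 3 and the smoothing constant `alpha = 0.5` are illustrative assumptions.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [10.0, 12.0, 14.0, 13.0, 15.0]  # hypothetical series
ma3 = moving_average(demand, 3)                   # [12.0, 13.0, 14.0]
ses = exponential_smoothing(demand, alpha=0.5)    # [10.0, 11.0, 12.5, 12.75, 13.875]
```

A larger `alpha` makes the smoothed series follow recent observations more closely; a smaller one gives more weight to history.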

  • Model Validation

  • Model Validation - Rationale
    • Check for Model Accuracy
    • Check for Over-fitting of Model
    • Check for Model Validity across Population
    • Check for Predictability of Model

  • Model Validation - Process
    • Training Data
      • Compute Predicted Outcome based on established Decision Rule
      • Compare Predicted Outcome with historical Outcome
      • Measure efficiency of Model
      • Check for unconsumed Information
      • Measure gain over random model
    • Validation Data
      • Compute Predicted Outcome based on established Decision Rule
      • Compare Predicted Outcome with historical Outcome
      • Measure efficiency of Model
      • Measure gain over random model

  • Model Validation - Techniques
    • Checking Accuracy of Model
      • Confusion Matrix
      • Mean Squared Error (MSE)
    • Checking Efficiency of Model
      • R² and Adjusted R²
    • Checking for Unconsumed Information
      • Using Error Plots
    • Gain over Random Model
      • Lift Chart
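
The numeric measures above follow standard definitions and could be computed as in this sketch. The actual and predicted values and the single-predictor assumption are made up for illustration; the residuals are what an error plot would display.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R² penalised for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

actual = [3.0, 5.0, 7.0, 9.0, 11.0]       # hypothetical historical outcomes
predicted = [2.5, 5.5, 7.0, 9.5, 10.5]    # hypothetical model output

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_predictors=1)
residuals = [a - p for a, p in zip(actual, predicted)]  # input to an error plot
```

Residuals that show a pattern when plotted against the predictions point to information the model has not yet consumed.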

  • Model Validation - Error Plots

  • Model Validation - Lift Chart
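
The chart itself did not survive in this transcript. As a rough sketch of the numbers behind a lift chart, the function below ranks cases by model score and divides the response rate among the top-scored cases by the overall response rate, which is what a random model would achieve. The scores, outcomes and the choice of five bins are illustrative assumptions.

```python
def cumulative_lift(scores, outcomes, n_bins=10):
    """Cumulative lift per bin: response rate in the top-scored cases
    divided by the overall response rate."""
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    overall_rate = sum(ranked) / len(ranked)
    lifts = []
    for b in range(1, n_bins + 1):
        top = ranked[:round(b * len(ranked) / n_bins)]
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts

# Hypothetical model scores and observed outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lifts = cumulative_lift(scores, outcomes, n_bins=5)
```

The lift in the last bin is always 1, because by then every case has been included; values above 1 in the early bins are the gain over a random model.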

  • Model Validation - Confusion Matrix

                        Predicted Positive   Predicted Negative
    Actual Positive     True Positive        False Negative
    Actual Negative     False Positive       True Negative
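
A minimal sketch of how the four cells are counted for a binary outcome, and how accuracy follows from them. The label coding (1 = positive) and the two lists are assumptions for the example.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def accuracy(counts):
    """Share of cases on the diagonal (true positives and true negatives)."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

actual = [1, 1, 1, 0, 0, 0, 0, 1]      # hypothetical historical outcomes
predicted = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical model predictions
cells = confusion_matrix(actual, predicted)
acc = accuracy(cells)
```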

  • Model Assessment & Deployment

  • Model Assessment & Deployment
    • Multiple competing models exist for the same problem
    • A common metric is needed for comparison
    • The best model is considered the Champion Model
    • The best model is used for scoring on current data
    • Model is deployed as
      • Web Service
      • PMML Code
      • C / SAS code / R code
      • ETL Job

  • Metric for Model Comparison
    • Test data is used for model comparison
    • Test data is scored using the various models
    • The following metrics are compared for all models
      • Lift Achieved / Net Gain
      • Accuracy of Models
      • Adjusted R²
    • The best model is determined based on the above metrics
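
Champion selection on test data could look like the sketch below: every competing model scores the same test data, one common metric is computed for each, and the best one wins. The three cutoff rules standing in for competing models, the test data, and the use of accuracy as the common metric are all assumptions for illustration.

```python
def accuracy(actual, predicted):
    """Share of cases where the prediction matches the actual outcome."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

def pick_champion(models, test_inputs, test_outcomes, metric=accuracy):
    """Score the same test data with every model; return the best name and all results."""
    results = {}
    for name, predict in models.items():
        predictions = [predict(x) for x in test_inputs]
        results[name] = metric(test_outcomes, predictions)
    champion = max(results, key=results.get)
    return champion, results

# Hypothetical competing models: three decision rules with different cutoffs.
models = {
    "cutoff_0.3": lambda x: int(x >= 0.3),
    "cutoff_0.5": lambda x: int(x >= 0.5),
    "cutoff_0.7": lambda x: int(x >= 0.7),
}
test_inputs = [0.1, 0.2, 0.4, 0.45, 0.55, 0.6, 0.8, 0.9]
test_outcomes = [0, 0, 0, 0, 1, 1, 1, 1]

champion, results = pick_champion(models, test_inputs, test_outcomes)
```

Passing the metric as a parameter keeps the comparison fair: lift or adjusted R² could be swapped in, as long as the same metric is applied to every model.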

  • Model Monitoring

  • Model Monitoring
    • Model performance is not static
    • Model performance is constantly changing
    • Model performance always depends on
      • Changing Population
      • Changing Characteristics
    • The population is always changing
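
The slides do not name a monitoring technique. One simple possibility, sketched here purely as an assumption, is to track accuracy per scoring period and flag periods that fall too far below the accuracy measured at validation time. The period labels, records, baseline and tolerance are all hypothetical.

```python
def accuracy_by_period(records):
    """records: iterable of (period, actual, predicted). Returns {period: accuracy}."""
    totals = {}
    for period, actual, predicted in records:
        hits, count = totals.get(period, (0, 0))
        totals[period] = (hits + (actual == predicted), count + 1)
    return {period: hits / count for period, (hits, count) in totals.items()}

def periods_needing_review(accuracies, baseline, tolerance):
    """Periods whose accuracy fell more than `tolerance` below the baseline."""
    return [p for p, acc in accuracies.items() if acc < baseline - tolerance]

# Hypothetical scoring log: (period, actual outcome, predicted outcome).
records = [
    ("P1", 1, 1), ("P1", 0, 0), ("P1", 1, 1), ("P1", 0, 0),
    ("P2", 1, 1), ("P2", 0, 1), ("P2", 1, 1), ("P2", 0, 0),
    ("P3", 1, 0), ("P3", 0, 1), ("P3", 1, 1), ("P3", 0, 0),
]
per_period = accuracy_by_period(records)
flagged = periods_needing_review(per_period, baseline=0.9, tolerance=0.1)
```

A flagged period is a prompt to check whether the population or its characteristics have shifted, and whether the model should go back through development.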
