7-lecture jan 15th 2014

Upload: abhas-agarwal

Post on 03-Jun-2018


  • 8/11/2019 7-Lecture Jan 15th 2014

    1/40

    Data Mining

    Lecture #1 : Jan 15th 2014


    Introduction

    Data is produced at a phenomenal rate

    Our ability to store has grown

    Users expect more sophisticated information. How?

    UNCOVER HIDDEN INFORMATION

    DATA MINING


    Data Mining works with Warehouse Data

    Data Warehousing provides the Enterprise with a memory

    Data Mining provides the Enterprise with intelligence


    Database Processing vs. Data Mining

                Database Processing       Data Mining

    Query       Well defined              Poorly defined
                SQL                       No precise query language

    Output      Precise                   Fuzzy
                Subset of database        Not a subset of database


    Data Mining: a Business Process

    Data mining is a business process that interacts with other business processes.

    Data mining starts with data, then through analysis informs or inspires action, which in turn creates data that begets more data mining.

    Organizations wanting to excel do not view data mining as a sideshow. It readily fits in with other strategies for understanding markets and customers.


    Data Mining: Large Amounts of Data

    i. How much is a lot of data?

    ii. Excel: max rows possible? A very versatile tool for working with relatively small amounts of data.

    iii. In the early days of data mining (1960s and 70s) data was scarce, and some of the techniques were developed in that period.

    iv. Today computing power is readily available, and a large amount of data is not a handicap; it is an advantage.

    Data mining techniques work better with a large sample population.


    Data Mining: Meaningful Patterns and Rules

    i. Business operations: generate the data and the patterns at the same time.

    ii. Data mining: the goal is to find patterns that are useful to the business. Helping the business is more important than amusing the miner.

    iii. Call center application: classifies customers as Green, Amber and Red for targeting retention and facilitating customer acquisition, the goal being to offer better customer value.

    iv. Companies are generating business models centered around data mining.


    Data Mining and Customer Relationship Management

    Firms of all sizes need to create one-to-one relationships with customers and form a learning relationship with them.

    Firms are learning to look at the value of each customer individually, to focus on profitable customers.

    Moving from segmentation to personalization requires changes throughout the organization, especially in marketing, sales and customer support.

    The shift is from a delivery-centered to a customer-centered organization.

    Data mining is only a collection of tools and techniques to support a customer-centric organization.


    What is Data Mining

    Narrow sense: a collection of tools and techniques to support the business.

    Broader sense: an attitude that business actions should be based on learning, that informed decisions are better than uninformed decisions, and that measuring results is beneficial to the business.

    It is a business process and methodology for applying analytical tools and techniques.


    Making Money or Losing Money

    Home equity loans generate revenue for banks like Fidelity Investments.

    Bill-paying service: should it be discontinued because it is losing money? Customers perceive it as a value-added service.

    A customer owns a house and has a large outstanding credit card debt. What should the bank do?


    Bank of America Case Study

    BofA wanted to boost its home equity loan business.

    Using common sense, the message was:

    People with college-age children want to borrow against home equity to pay tuition bills.

    People with high but variable incomes want to use home equity to smooth out peaks and valleys in their income stream.


    Bank of America Case Study

    Data from 42 systems of record was cleansed. Some records dated back to 1914. Customer records had about 250 fields.

    Decision tree techniques were applied to the customers: those who had taken up the product offering as well as those who had spurned it. Rules were discovered, and a "good prospect" flag was generated by a data mining model.

    Sequential patterns were studied: when does the customer want the loan?

    Clustering was done. 14 clusters were generated; one or two had intriguing properties:

        In one cluster, 39% of customers had both business and personal accounts.

        The cluster accounted for 25%+ of the customers who had been classified as responders.

    People may be using home equity loans to start a business.

    Message: use your equity to do what you always wanted to do.


    Virtuous Cycle

    1. Identify business opportunities

    2. Mine data to transform it into actionable information

    3. Act on the information

    4. Measure results

    5. GO TO STEP 1 (infinite loop)

    Focus on business results rather than amusing the data miner


    Data Mining and Marketing Tests

                            Picked by model: NO                  Picked by model: YES

    Message: YES            Control Group                        Target Group
                            Chosen at random                     Chosen by model
                            Receives message                     Receives message
                            Response measures message            Response measures message
                            without model                        with model

    Message: NO             Holdout Group                        Modeled Holdout Group
                            Chosen at random                     Chosen by model
                            Receives no message                  Receives no message
                            Response measures                    Response measures model
                            background response                  without message
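One way to read the four test groups is to compare their response rates pairwise. The sketch below is only illustrative: the group sizes and responder counts are invented, and the names `Group`, `model_lift`, `message_effect` and `message_effect_on_modeled` are hypothetical labels, not terms from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One cell of the 2x2 test design: group size and number of responders."""
    size: int
    responders: int

    @property
    def rate(self) -> float:
        # Response rate of this group.
        return self.responders / self.size


# Hypothetical counts, invented purely for illustration.
control = Group(size=10_000, responders=200)          # chosen at random, receives message
target = Group(size=10_000, responders=500)           # chosen by model, receives message
holdout = Group(size=10_000, responders=50)           # chosen at random, no message
modeled_holdout = Group(size=10_000, responders=150)  # chosen by model, no message

# Effect of the model among people who received the message.
model_lift = target.rate / control.rate
# Effect of the message among randomly chosen people.
message_effect = control.rate - holdout.rate
# Effect of the message among people the model picked.
message_effect_on_modeled = target.rate - modeled_holdout.rate

print(f"model lift with message:       {model_lift:.2f}")
print(f"message effect (random):       {message_effect:.3f}")
print(f"message effect (model-picked): {message_effect_on_modeled:.3f}")
```

Comparing the target group with the modeled holdout matters because a good model may simply pick people who would have responded anyway; the holdout groups separate that background response from the effect of the message.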


    Data Mining Systems vs Operational Systems


    Chapter 2

    Data Mining Applications in Marketing and Customer Relationship Management


    Customer Lifecycle

    Data mining is applied across the life cycle of the customer relationship. Five major phases:

    Prospects: people in the target market

    Responders: prospects who have exhibited interest

    New Customers: responders who make a commitment

    Established Customers

    Former Customers


    Customer Lifecycle Stages


    Subscription vs Event Based Relationships

    Event-Based Relationships

    Transactions, e.g. purchasing a prepaid mobile card

    Companies communicate via broadcasts

    Encourage customers to visit websites

    Subscription-Based Relationships

    Postpaid mobile contract

    Contracts enable a learning relationship

    Customers can be studied over time


    Customer Experience: newspaper subscribers


    Data Mining Process: Customer Acquisition

    Who are the prospects?

    The prospect base may change over time

    Will the past be a good predictor of the future?

    Prospects in a new geography may differ from current customers

    Changes to products and services may bring in a different target audience


    Data Mining: Role in Customer Acquisition

    Data availability limits the role of data mining.

    Response modeling is used for channels such as direct mail and telemarketing, as the cost of contact is relatively high.

    Data availability falls into 3 categories:

        Source of prospect

        Appended individual/household data

        Appended demographic data at the geographical level

    Typically, prospect lists are purchased.

    Modeling may be required to shortlist customers for direct marketing, perhaps based on demographic data.

    The echo effect is a challenge to building models. For example, a prospect receives an e-mail but responds over the phone.


    Data Mining: Role in Customer Activation

    Activation is an operational process. How can data mining help?

    Activation provides a view of new customers at the point they start. This is a very important perspective, and as a data source it needs to be preserved.

    Customer activation provides the initial conditions of the customer relationship. Such initial conditions are often useful predictors of long-term customer behaviour.


    Activation Funnel: home delivery newspaper subscribers

    PROSPECTS/LEADS         New sales leads come through many channels

    ORDERS                  Only leads with verifiable addresses and credit cards become orders

    SUBSCRIPTIONS           Only orders with routable addresses become subscriptions

    PAID SUBSCRIPTIONS      Only some subscriptions are paid

    Data mining can play a role in understanding whether or not customers are moving through the process the way they should be, or what characteristics cause a customer to fail during the activation stage.
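A first step in studying the funnel is to measure how many customers survive each stage. The following is a minimal sketch; the stage counts are invented for illustration and `stage_conversion` is a hypothetical helper name.

```python
# Hypothetical counts at each stage of the activation funnel.
funnel = [
    ("leads", 10_000),
    ("orders", 6_000),
    ("subscriptions", 5_400),
    ("paid subscriptions", 4_320),
]


def stage_conversion(stages):
    """Return (from_stage, to_stage, conversion_rate) for each consecutive pair of stages."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(stages, stages[1:])
    ]


rates = stage_conversion(funnel)
for src, dst, rate in rates:
    print(f"{src} -> {dst}: {rate:.0%}")

# Share of the original leads that end up as paid subscriptions.
overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1%}")
```

A stage with an unusually low conversion rate is a natural place to look for the characteristics that cause customers to drop out.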


    Data Mining: Customer Relationship Management

    Primary goal is to increase customer value

    Up-selling: buy a more expensive model

    Cross-selling: broaden the customer relationship

    Usage stimulation: loyalty points

    Customer value calculation: assign a future expected value to each customer

    Customer options vs simplicity?

    Are data mining and personalization compatible?


    Data Mining: Customer Relationship Management

    Data mining helps dig out customer affinities

    Data mining can play a key role in understanding the operational side of the business

    Customer retention is the key

        Predictive modeling is often applied in this area

        Techniques include survival analysis, or comparing long-standing customers with customers with short tenures

    Win-back

        Why did customers leave? (analyze customer complaints)

        Tends to depend more on operational strategies


    Targeting Customers

    A nationwide publication determined that its readers have the following characteristics:

        59% of readers are college educated

        46% have professional or executive occupations

        21% have household income >= $75K

        7% have household income >= $100K

    Targeting objective:

    i. Any suggestions for increasing revenue for the publication?


    Targeting Customers

    A nationwide publication determined that its readers have the following characteristics:

        59% of readers are college educated

        46% have professional or executive occupations

        21% have household income >= $75K

        7% have household income >= $100K

    Targeting objective:

    i. Increase circulation amongst prospects matching the profile

    ii. Sell advertising space to businesses wanting to reach such an audience

    iii. Next steps?


    Who Fits the Profile

    Who matches the profile better?

        Amy: professional, college educated, earns $80K pa

        Bob: high school graduate, earns $50K pa

    How will you make the comparisons?


    Who Fits the Profile

    Observations?

    Any room for improvement?


    Who Fits the Profile

    US population figures:

        College educated = 20.3%

        Professional/Executive = 19.2%

        Income > $75K = 9.5%

        Income > $100K = 2.4%


    Who Fits the Profile


    Who Fits the Profile

    New (index-based) scores relate the publication's target audience to the US population, hence they make more sense
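The scoring tables from the slides did not survive extraction, so the sketch below is one plausible reconstruction rather than the lecture's exact method. It assumes each characteristic is a yes/no trait, that a person scores the readership share of whichever value they have, and that the index-based variant divides that share by the corresponding US population share. Bob's occupation is not stated on the slide and is assumed here to be non-professional; the function names are hypothetical.

```python
# Readership and US population shares taken from the slides.
readership = {"college": 0.59, "professional": 0.46, "income_75k": 0.21, "income_100k": 0.07}
population = {"college": 0.203, "professional": 0.192, "income_75k": 0.095, "income_100k": 0.024}

# True means the person has the trait.
amy = {"college": True, "professional": True, "income_75k": True, "income_100k": False}
# Assumption: Bob is not in a professional/executive occupation.
bob = {"college": False, "professional": False, "income_75k": False, "income_100k": False}


def naive_score(person):
    """Sum the readership share matching the person's value for each trait."""
    return sum(
        readership[t] if has else 1 - readership[t]
        for t, has in person.items()
    )


def index_score(person):
    """Sum readership share divided by population share for the person's value of each trait."""
    total = 0.0
    for t, has in person.items():
        if has:
            total += readership[t] / population[t]
        else:
            total += (1 - readership[t]) / (1 - population[t])
    return total


for name, person in (("Amy", amy), ("Bob", bob)):
    print(f"{name}: naive={naive_score(person):.2f}  index={index_score(person):.2f}")
```

Under these assumptions the naive score ranks Bob above Amy, because most readers are not in the high-income brackets, while the index-based score ranks Amy first because her traits are far more common among readers than in the general population.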


    Data Mining & Direct Marketing

    Advertising: reaches prospects about whom nothing is known as individuals

    Direct marketing: requires, at a minimum, a phone number or email ID

    Countries have restrictions on the use of data

    Household-level data can be used directly for a rough-cut segmentation based on income and car ownership. Is this dataset the right size?


    Response Modeling

    Campaign response rates are in the low single digits

    Models help improve response rates to direct solicitation by estimating:

        Likelihood of response

        Ranking of prospects

    Data mining techniques are extensively applied to response modeling.

    Direct solicitation is an expensive process and must conform to resource constraints (budgets)
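Ranking under a budget can be sketched as follows. The scores, the cost per contact and the helper name `select_prospects` are all hypothetical; the scores stand in for the output of whatever response model is used.

```python
def select_prospects(scores, cost_per_contact, budget):
    """Rank prospects by model score (highest first) and keep as many as the budget allows."""
    max_contacts = int(budget // cost_per_contact)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [prospect_id for prospect_id, _ in ranked[:max_contacts]]


# Hypothetical response likelihoods produced by some response model.
scores = {"p1": 0.02, "p2": 0.11, "p3": 0.07, "p4": 0.01, "p5": 0.05}

# With 2.0 per contact and a budget of 6.0, only three prospects can be contacted.
mailing_list = select_prospects(scores, cost_per_contact=2.0, budget=6.0)
print(mailing_list)
```

Only the ordering of the scores matters for this selection, which is why a model that ranks prospects well can be useful even if its estimated likelihoods are not well calibrated.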


    Response Modeling

    [Figure: concentration plotted against penetration]

    Lift = concentration / penetration

    Benefit = concentration - penetration


    Max Benefit

    Max benefit occurs at the penetration where the vertical distance between the curves is greatest

    The KS statistic is also at its maximum there

    The split point results in a good list and a bad list of prospects

    It maximizes the unweighted average of sensitivity and specificity

    Sensitivity: likelihood that a true case is correctly diagnosed (in the medical world) = true positives / (false negatives + true positives)

    Specificity: proportion of actual negatives that get a negative result = true negatives / (true negatives + false positives)

    The max benefit point also minimizes the expected loss
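The relationship between penetration, concentration, lift and benefit can be worked through on a small example. In this sketch, penetration is taken to be the cumulative share of prospects contacted and concentration the cumulative share of all responders captured; the decile counts are invented and `gains_table` is a hypothetical helper name.

```python
def gains_table(responders_per_bin, bin_size):
    """For each cumulative depth, return (penetration, concentration, lift, benefit)."""
    total_responders = sum(responders_per_bin)
    total_prospects = bin_size * len(responders_per_bin)
    rows = []
    cum_responders = 0
    for i, r in enumerate(responders_per_bin, start=1):
        cum_responders += r
        penetration = (i * bin_size) / total_prospects      # share of prospects contacted
        concentration = cum_responders / total_responders   # share of responders captured
        rows.append((
            penetration,
            concentration,
            concentration / penetration,   # lift
            concentration - penetration,   # benefit
        ))
    return rows


# Hypothetical responders in each decile of 1000 prospects, best-scored decile first.
deciles = [30, 20, 15, 9, 8, 6, 5, 4, 2, 1]
rows = gains_table(deciles, bin_size=1000)

# The max-benefit point is the depth with the largest gap between the two curves.
best = max(rows, key=lambda row: row[3])
print(f"max benefit {best[3]:.2f} at penetration {best[0]:.0%} (lift {best[2]:.2f})")
```

With these made-up numbers the gap peaks at the third decile; contacting prospects beyond that depth still finds responders, but at a rate below the overall average.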


    Confusion Matrix

                        Model Prediction
                        NO                  YES

    Actual    NO        True Negative       False Positive
              YES       False Negative      True Positive

    Sensitivity = True Positives / (False Negatives + True Positives)

    Specificity = True Negatives / (True Negatives + False Positives)
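The two formulas can be checked on a small confusion matrix. The counts below are invented for illustration, and the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Share of actual YES cases that the model predicts as YES."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual NO cases that the model predicts as NO."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts.
tn, fp = 850, 50   # actual NO:  predicted NO, predicted YES
fn, tp = 40, 60    # actual YES: predicted NO, predicted YES

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
# Unweighted average of the two rates.
balanced = (sens + spec) / 2

print(f"sensitivity = {sens:.3f}")
print(f"specificity = {spec:.3f}")
print(f"unweighted average = {balanced:.3f}")
```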