7-lecture jan 15th 2014
-
8/11/2019 7-Lecture Jan 15th 2014
1/40
Data Mining
Lecture #1: Jan 15th 2014
-
Introduction
Data is produced at a phenomenal rate
Our ability to store it has grown
Users expect more sophisticated information. How?
UNCOVER HIDDEN INFORMATION
DATA MINING
-
Data Mining Works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
-
Database Processing vs. Data Mining
Database Processing
Query: well defined; SQL
Output: precise; a subset of the database
Data Mining
Query: poorly defined; no precise query language
Output: fuzzy; not a subset of the database
-
Data Mining: A Business Process
Business process: data mining is a business process that interacts with other business processes.
Data mining starts with data; then, through analysis, it informs or inspires action, which in turn creates data that begets data mining.
Organizations wanting to excel do not view data mining as a sideshow. It readily fits in with other strategies for understanding markets and customers.
-
Data Mining: Large Amounts of Data
i. How much is a lot of data?
ii. Excel: what is the maximum number of rows possible? Excel is a very versatile tool for working with relatively small amounts of data.
iii. In the early days of data mining (1960s and 70s), data was scarce, and some of the techniques were developed in that period.
iv. Today computing power is readily available, and a large amount of data is not a handicap; it is an advantage.
Data mining techniques work better with a large sample population.
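One way to see why a larger sample helps is to look at how uncertainty shrinks with sample size. The Python sketch below uses the standard binomial standard-error formula sqrt(p(1 - p)/n) for an estimated response rate. The 2% rate and the sample sizes are made-up illustrations, not figures from the lecture.

```python
import math


def response_rate_std_error(rate: float, n: int) -> float:
    """Standard error of a response rate estimated from n prospects (binomial proportion)."""
    return math.sqrt(rate * (1.0 - rate) / n)


# Hypothetical campaign with a 2% response rate, estimated from samples of different sizes.
rate = 0.02
for n in (1_000, 10_000, 100_000):
    se = response_rate_std_error(rate, n)
    # Approximate 95% interval: rate +/- 1.96 standard errors.
    low, high = rate - 1.96 * se, rate + 1.96 * se
    print(f"n={n:>7,}: std error={se:.4f}, ~95% interval=({low:.4f}, {high:.4f})")
```

Each tenfold increase in n shrinks the standard error by a factor of about sqrt(10), or roughly 3.2. Patterns found in a large sample are therefore much less likely to be noise.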
-
Data Mining: Meaningful Patterns and Rules
i. Business operations: generate the data as well as the patterns at the same time.
ii. Data mining: the goal is to find patterns that are useful to the business. Helping the business is more important than amusing the miner.
iii. Call center application: classifies customers as Green, Amber and Red for targeting retention and facilitating customer acquisition, the goal being to offer better customer value.
iv. Companies are building business models centered around data mining.
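The call center example can be sketched as banding a model score into three flags. The lecture does not say how customers are scored or what each color means. This Python sketch therefore assumes a hypothetical attrition-risk score between 0 and 1, hypothetical thresholds (0.3 and 0.7), and Green = low risk, Red = high risk.

```python
from enum import Enum


class RetentionFlag(Enum):
    GREEN = "green"  # assumed: low attrition risk
    AMBER = "amber"  # assumed: moderate risk, candidate for a retention offer
    RED = "red"      # assumed: high risk, priority for retention action


# Hypothetical cut-offs on a model's attrition-risk score in [0, 1].
AMBER_THRESHOLD = 0.3
RED_THRESHOLD = 0.7


def classify(risk_score: float) -> RetentionFlag:
    """Map a risk score to the traffic-light flag shown to call center staff."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= RED_THRESHOLD:
        return RetentionFlag.RED
    if risk_score >= AMBER_THRESHOLD:
        return RetentionFlag.AMBER
    return RetentionFlag.GREEN


# Hypothetical callers and scores.
for customer, score in [("C001", 0.12), ("C002", 0.45), ("C003", 0.81)]:
    print(customer, classify(score).name)
```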
-
Data Mining and Customer Relationship Management
Firms of all sizes need to create one-to-one relationships with customers and form a learning relationship with them.
Firms are learning to look at the value of each customer individually, in order to focus on profitable customers.
Moving from segmentation to personalization requires changes throughout the organization, especially in marketing, sales and customer support.
The shift is from a delivery-centered to a customer-centered organization.
Data mining is only a collection of tools and techniques to support a customer-centric organization.
-
What is Data Mining?
Narrow sense: a collection of tools and techniques to support the business.
Broader sense: an attitude that business actions should be based on learning, that informed decisions are better than uninformed decisions, and that measuring results is beneficial to the business.
It is a business process and methodology for applying analytical tools and techniques.
-
Making Money or Losing Money
Home equity loans generate revenues for banks like Fidelity Investments.
Bill paying service: should it be discontinued because it is losing money? Customers perceive it as a value-added service.
A customer owns a house and has a large outstanding credit card debt: what should the bank do?
-
Bank of America Case Study
BofA wanted to boost its home equity loan business.
Using common sense, the message was:
People with college-age children want to borrow against home equity to pay tuition bills.
People with high but variable incomes want to use home equity to smooth out peaks and valleys in their income stream.
-
Bank of America Case Study
Data from 42 systems of record was cleansed. Some records dated back to 1914. Customer records had about 250 fields.
Decision tree techniques were applied to the customers: both those who had taken up the product offering and those who spurned it. Rules were discovered, and a data mining model generated a good-prospect flag.
Sequential patterns were studied: when does the customer want the loan?
Clustering was done. 14 clusters were generated; one or two had intriguing properties. In one cluster, 39% of customers had both business and personal accounts.
This cluster accounted for 25%+ of the customers who had been classified as responders.
People may be using home equity loans to start a business.
Message: use your equity to do what you always wanted to do.
-
Virtuous Cycle
1. Identify business opportunities
2. Mine data to transform it into actionable information
3. Act on the information
4. Measure results
5. Go to step 1 (infinite loop)
Focus on business results rather than amusing the data miner.
-
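Data Mining and Marketing Tests
Four test groups, classified by whether the group receives the message and whether it is picked by the model:
Control Group (message: yes; picked by model: no): chosen at random, receives the message. Response measures the message without the model.
Target Group (message: yes; picked by model: yes): chosen by the model, receives the message. Response measures the message with the model.
Holdout Group (message: no; picked by model: no): chosen at random, receives no message. Response measures the background response.
Modeled Holdout Group (message: no; picked by model: yes): chosen by the model, receives no message. Response measures the model without the message.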
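The four groups are analyzed by comparing response rates between pairs of cells:

- Control vs. holdout isolates the effect of the message.
- Target vs. control isolates the effect of the model.
- Target vs. modeled holdout shows how much the message adds among the prospects the model picks.

The Python sketch below uses made-up counts and assumes equal group sizes. It is not based on results from the lecture.

```python
from dataclasses import dataclass


@dataclass
class CampaignGroup:
    """One cell of the marketing test design."""
    name: str
    size: int        # prospects in the group
    responders: int  # prospects in the group who responded

    @property
    def response_rate(self) -> float:
        return self.responders / self.size


# Hypothetical results; in practice these come from the campaign.
control = CampaignGroup("control: random, message", 10_000, 150)
target = CampaignGroup("target: model, message", 10_000, 400)
holdout = CampaignGroup("holdout: random, no message", 10_000, 50)
modeled_holdout = CampaignGroup("modeled holdout: model, no message", 10_000, 120)

# Effect of the message on a random audience: control vs. holdout.
message_effect = control.response_rate - holdout.response_rate
# Effect of the model when the message is sent: target vs. control.
model_effect = target.response_rate - control.response_rate
# Effect of the message among model-picked prospects: target vs. modeled holdout.
message_effect_on_modeled = target.response_rate - modeled_holdout.response_rate

for group in (control, target, holdout, modeled_holdout):
    print(f"{group.name}: {group.response_rate:.2%}")
print(f"message effect (random audience): {message_effect:+.2%}")
print(f"model effect (with message):      {model_effect:+.2%}")
print(f"message effect (model-picked):    {message_effect_on_modeled:+.2%}")
```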
-
Data Mining Systems vs. Operational Systems
-
Chapter 2
Data Mining Applications in Marketing and Customer Relationship Marketing
-
Customer Lifecycle
Data mining applications follow the life cycle of the customer relationship, which has five major phases:
Prospects: people in the target market
Responders: prospects who have exhibited interest
New customers: responders who make a commitment
Established customers
Former customers
-
Customer Lifecycle Stages
-
Subscription vs. Event-Based Relationships
Event-Based Relationships
Transactions, e.g. purchasing a mobile prepaid card
Companies communicate via broadcasts
Companies encourage customers to visit websites
Subscription-Based Relationships
Postpaid mobile contract
Contracts enable a learning relationship
Customers can be studied over time
-
Customer Experience: Newspaper Subscribers
-
Data Mining Process: Customer Acquisition
Who are the prospects?
The prospect base may change over time.
Will the past be a good predictor of the future?
Prospects in a new geography may differ from current customers.
Changes to products and services may bring in a different target audience.
-
Data Mining: Role in Customer Acquisition
Data availability limits the role of data mining.
Response modeling is used for channels such as direct mail and telemarketing, as the cost of contact is relatively high.
Data availability falls into 3 categories:
Source of prospect
Appended individual/household data
Appended demographic data at the geographical level
Typically, prospect lists are purchased.
Modeling may be required to shortlist customers for direct marketing, perhaps based on demographic data.
The echo effect is a challenge to building models. For example, a prospect receives an e-mail but responds over the phone.
-
Data Mining: Role in Customer Activation
Activation is an operational process; how can data mining help?
Activation provides a view of new customers at the point they start. This is a very important perspective, and as a data source it needs to be preserved.
Customer activation provides the initial conditions of the customer relationship. Such initial conditions are often useful predictors of long-term customer behaviour.
-
Activation Funnel: Home Delivery Newspaper Subscribers
Prospects/Leads: new sales leads come through many channels.
Orders: only leads with verifiable addresses and credit cards become orders.
Subscriptions: only orders with routable addresses become subscriptions.
Paid Subscriptions: only some subscriptions are paid.
Data mining can play a role in understanding whether or not customers are moving through the process the way they should be, or what characteristics cause a customer to fail during the activation stage.
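A natural first step in analyzing the funnel is to measure the conversion rate between adjacent stages. That shows where customers drop out. This Python sketch uses hypothetical stage counts, not figures from the newspaper example.

```python
# Hypothetical counts at each stage of the activation funnel.
funnel = [
    ("prospects/leads", 10_000),
    ("orders", 6_000),
    ("subscriptions", 5_400),
    ("paid subscriptions", 4_050),
]


def stage_conversion(stages):
    """Return (from_stage, to_stage, conversion rate) for each adjacent pair of stages."""
    return [
        (prev_name, name, count / prev_count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for prev_name, name, rate in stage_conversion(funnel):
    print(f"{prev_name} -> {name}: {rate:.1%}")
print(f"overall: {funnel[-1][1] / funnel[0][1]:.1%}")
```

Comparing these rates across customer segments, for example by sales channel, is one way to find the characteristics that make customers fail during activation.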
-
Data Mining: Customer Relationship Management
The primary goal is to increase customer value:
Up-selling: the customer buys a more expensive model
Cross-selling: broaden the customer relationship
Usage stimulation: e.g. loyalty points
Customer value calculation: assign a future expected value to each customer
Customer options vs. simplicity?
Are data mining and personalization compatible?
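The lecture does not give a formula for a customer's future expected value. The Python sketch below shows one simple, assumed model:

- A constant monthly margin.
- A constant probability that the customer stays each month.
- A monthly discount rate.

All parameter values are hypothetical.

```python
def expected_customer_value(monthly_margin: float,
                            monthly_retention: float,
                            monthly_discount_rate: float,
                            horizon_months: int) -> float:
    """Expected discounted margin from one customer over a fixed horizon.

    Assumed model: the customer stays each month with a constant probability,
    contributes a fixed margin in every month they are still a customer, and
    each future month is discounted at a constant monthly rate.
    """
    value = 0.0
    survival = 1.0  # probability the customer is still active in this month
    for month in range(1, horizon_months + 1):
        survival *= monthly_retention
        value += monthly_margin * survival / (1.0 + monthly_discount_rate) ** month
    return value


# Two hypothetical customers with the same margin but different retention.
loyal = expected_customer_value(20.0, 0.98, 0.01, 24)
fickle = expected_customer_value(20.0, 0.85, 0.01, 24)
print(f"loyal: {loyal:.2f}, fickle: {fickle:.2f}")
```

In practice, the retention probability would itself come from a model, such as a survival or churn model, rather than a single constant.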
-
Data Mining: Customer Relationship Management
Data mining helps dig out customer affinities.
Data mining can play a key role in understanding the operational side of the business.
Customer retention is key; predictive modeling is often applied in this area.
Techniques: survival analysis, or comparing long-standing customers with customers with short tenures.
Win-back: why did customers leave? (Analyze customer complaints.)
Win-back tends to depend more on operational strategies.
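Survival analysis is named above as a retention technique. A common starting point is the Kaplan-Meier estimator. It estimates the probability of remaining a customer beyond each tenure, and it correctly handles customers who are still active (censored). Below is a minimal pure-Python sketch with hypothetical tenure data.

```python
from collections import Counter


def kaplan_meier(tenures, stopped):
    """Kaplan-Meier survival curve for customer tenure.

    tenures: tenure of each customer (e.g. months since activation).
    stopped: True if that customer has left, False if still active (censored).
    Returns [(tenure, probability of still being a customer after that tenure)]
    for every tenure at which at least one customer stopped.
    """
    stops = Counter(t for t, s in zip(tenures, stopped) if s)
    leaving_risk_set = Counter(tenures)  # stopped or censored, each leaves after its tenure
    at_risk = len(tenures)
    survival = 1.0
    curve = []
    for t in sorted(leaving_risk_set):
        if stops[t]:
            survival *= 1.0 - stops[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving_risk_set[t]
    return curve


def median_tenure(curve):
    """First tenure at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # more than half the customers are still active


# Hypothetical tenures (months) and whether each customer has stopped.
tenures = [1, 2, 2, 3, 4, 5]
stopped = [True, True, False, True, False, True]
curve = kaplan_meier(tenures, stopped)
print(curve, median_tenure(curve))
```

To compare long-standing and short-tenure customers, the same function can be run separately on each group and the resulting curves compared.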
-
Targeting Customers
A nationwide publication determined that its readers have the following characteristics:
59% of readers are college educated
46% have professional or executive occupations
21% have household income >= $75K
7% have household income >= $100K
Targeting objective:
i. Any suggestions for increasing revenue for the publication?
-
Targeting Customers
A nationwide publication determined that its readers have the following characteristics:
59% of readers are college educated
46% have professional or executive occupations
21% have household income >= $75K
7% have household income >= $100K
Targeting objectives:
i. Increase circulation among prospects matching the profile
ii. Sell advertising space to businesses wanting to reach such an audience
iii. Next steps?
-
Who Fits the Profile
Who matches the profile better?
Amy: professional, college educated, earns $80K p.a.
Bob: high school graduate, earns $50K p.a.
How will you make the comparisons?
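The slide leaves the comparison method open. The Python sketch below shows one simple scoring rule, which is an assumption and not necessarily the lecture's method:

- For each trait the person has, add the share of readers with that trait.
- For each trait they lack, add the share of readers without it.

The sketch also assumes Bob is not in a professional or executive occupation, which the slide does not state.

```python
# Reader profile from the slide: share of readers with each trait.
READER_PROFILE = {
    "college_educated": 0.59,
    "professional_or_executive": 0.46,
    "income_75k_plus": 0.21,
    "income_100k_plus": 0.07,
}


def profile_score(person: dict) -> float:
    """Assumed rule: add the reader share for each trait the person has,
    and the complementary share (1 - share) for each trait they lack."""
    return sum(
        share if person[trait] else 1.0 - share
        for trait, share in READER_PROFILE.items()
    )


amy = {"college_educated": True, "professional_or_executive": True,
       "income_75k_plus": True, "income_100k_plus": False}    # earns $80K
bob = {"college_educated": False, "professional_or_executive": False,
       "income_75k_plus": False, "income_100k_plus": False}   # earns $50K (occupation assumed)

print(f"Amy: {profile_score(amy):.2f}, Bob: {profile_score(bob):.2f}")
```

Under this rule Bob scores higher than Amy (2.67 vs. 2.19). Most readers lack each of the high-end traits, so matching the majority on every "no" adds up. The raw shares also ignore how common each trait is in the general population.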
-
Who Fits the Profile
Observations?
Any room for improvement?
-
Who Fits the Profile
US population figures:
College educated = 20.3%
Professional/Executive = 19.2%
Income > $75K = 9.5%
Income > $100K = 2.4%
-
Who Fits the Profile
-
Who Fits the Profile
The new (index-based) scores relate the publication's target audience to the US population, hence they make more sense.
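An index divides the share of readers with a trait by the share of the US population with it. An index above 1 means the trait is over-represented among readers. The Python sketch below uses the slide's reader and population figures, with these assumptions:

- Each person's indexes are summed across traits.
- For a trait the person lacks, the complementary shares are used.
- The slide's "> $75K" and ">= $75K" thresholds are treated as the same.
- Bob is not professional or executive.

This is not necessarily the exact calculation shown in the lecture.

```python
# Share of readers and of the US population with each trait (figures from the slides).
READER_SHARE = {
    "college_educated": 0.59,
    "professional_or_executive": 0.46,
    "income_75k_plus": 0.21,
    "income_100k_plus": 0.07,
}
US_SHARE = {
    "college_educated": 0.203,
    "professional_or_executive": 0.192,
    "income_75k_plus": 0.095,
    "income_100k_plus": 0.024,
}


def index_score(person: dict) -> float:
    """Sum of per-trait indexes: reader share / population share for traits the
    person has, (1 - reader share) / (1 - population share) for traits they lack."""
    total = 0.0
    for trait, reader in READER_SHARE.items():
        us = US_SHARE[trait]
        total += reader / us if person[trait] else (1.0 - reader) / (1.0 - us)
    return total


amy = {"college_educated": True, "professional_or_executive": True,
       "income_75k_plus": True, "income_100k_plus": False}
bob = {"college_educated": False, "professional_or_executive": False,
       "income_75k_plus": False, "income_100k_plus": False}

print(f"Amy: {index_score(amy):.2f}, Bob: {index_score(bob):.2f}")
```

With indexes, Amy's traits are the ones that set readers apart from the general population, so she now scores far higher than Bob (about 8.47 vs. 3.01).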
-
Data Mining & Direct Marketing
Advertising: reaches prospects about whom nothing is known as individuals.
Direct marketing: requires at minimum a phone number or email address.
Countries have restrictions on the use of data.
Household-level data can be used directly for a rough-cut segmentation based on income and car ownership. Is this dataset the right size?
-
Response Modeling
Campaign response rates are in the low single digits (percent).
Models help improve response rates to direct solicitation by estimating:
Likelihood of response
Ranking of prospects
Data mining techniques are extensively applied to response modeling.
Direct solicitation is an expensive process and must conform to resource constraints (budgets).
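A budget constraint turns a response model into a ranking problem: contact the highest-likelihood prospects until the money runs out. In this Python sketch, the scores, the cost per contact, and the budget are all hypothetical, and every contact is assumed to cost the same.

```python
def select_prospects(scores, cost_per_contact, budget):
    """Return prospect ids to contact: highest modeled likelihood first,
    as many as the budget allows (fixed cost per contact assumed)."""
    max_contacts = int(budget // cost_per_contact)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_contacts]


def expected_responses(scores, selected):
    """Sum of modeled response probabilities for the selected prospects."""
    return sum(scores[pid] for pid in selected)


# Hypothetical model scores (probability of response) for five prospects.
scores = {"p1": 0.02, "p2": 0.09, "p3": 0.05, "p4": 0.01, "p5": 0.03}
chosen = select_prospects(scores, cost_per_contact=0.50, budget=1.50)
print(chosen, expected_responses(scores, chosen))
```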
-
Response Modeling
[Chart: concentration plotted against penetration]
Lift = concentration/penetration
Benefit = concentration - penetration
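The formulas above can be computed directly from a scored prospect list. This Python sketch assumes the usual cumulative-gains reading of the terms:

- Penetration is the fraction of the ranked list that is contacted.
- Concentration is the fraction of all responders captured in that contacted portion.

Under these definitions, lift equals the response rate among contacted prospects divided by the overall response rate. The scores and outcomes are hypothetical.

```python
def gains_table(scores, responded, cutoffs=(0.1, 0.2, 0.3, 0.5, 1.0)):
    """For each penetration (fraction of the list contacted, best scores first),
    return (penetration, concentration, lift, benefit). Cutoffs must be > 0."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_responders = sum(r for _, r in ranked)
    rows = []
    for penetration in cutoffs:
        k = round(penetration * n)                 # prospects contacted
        captured = sum(r for _, r in ranked[:k])   # responders among them
        concentration = captured / total_responders
        rows.append((penetration, concentration,
                     concentration / penetration,   # lift
                     concentration - penetration))  # benefit
    return rows


# Hypothetical model scores and actual responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for row in gains_table(scores, responded, cutoffs=(0.2, 0.4, 1.0)):
    print("penetration={:.0%} concentration={:.0%} lift={:.2f} benefit={:.2f}".format(*row))
```

Contacting the whole list always gives lift 1 and benefit 0. The model's value shows up at the smaller penetrations.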
-
Max Benefit
Max Benefit = the penetration at which the perpendicular distance between the curves is maximum
The KS statistic is also maximum at this point
The split point divides the prospects into a good list and a bad list
It maximizes the unweighted average of sensitivity and specificity
Sensitivity: the probability that an actual positive is correctly identified (in the medical world, that someone who has the condition tests positive) = true positives/(false negatives + true positives)
Specificity: the proportion of actual negatives (in the medical world, people without the condition) who get a negative result = true negatives/(true negatives + false positives)
The max benefit point also minimizes the expected loss
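Why do max benefit and max KS coincide? Assume concentration is the fraction of all responders captured, which equals the sensitivity or true positive rate (TPR). Assume penetration is the fraction of the list contacted. Let p be the overall response rate and FPR the false positive rate.

The fraction contacted is then p·TPR + (1 - p)·FPR. So:

benefit = TPR - p·TPR - (1 - p)·FPR = (1 - p)(TPR - FPR)

TPR - FPR is the KS distance at that cut. It also equals sensitivity + specificity - 1, so maximizing it maximizes their unweighted average.

The Python sketch below checks this numerically on hypothetical data.

```python
def cut_point_stats(scores, responded):
    """For every cut k of the list ranked by score (best first), return
    (k, benefit, ks): benefit = concentration - penetration and
    ks = TPR - FPR for the rule "contact the top k"."""
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    positives = sum(ranked)
    negatives = n - positives
    tp = fp = 0
    stats = []
    for k, is_responder in enumerate(ranked, start=1):
        if is_responder:
            tp += 1
        else:
            fp += 1
        concentration = tp / positives          # equals sensitivity (TPR)
        penetration = k / n
        ks = tp / positives - fp / negatives    # sensitivity + specificity - 1
        stats.append((k, concentration - penetration, ks))
    return stats


# Hypothetical scores and outcomes (1 = responded); overall response rate p = 0.3.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
stats = cut_point_stats(scores, responded)
best_by_benefit = max(stats, key=lambda row: row[1])
best_by_ks = max(stats, key=lambda row: row[2])
print(best_by_benefit, best_by_ks)
```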
-
Confusion Matrix
(rows: actual; columns: model prediction)
              Predicted NO      Predicted YES
Actual NO     True Negative     False Positive
Actual YES    False Negative    True Positive
Sensitivity = True Positives / (False Negatives + True Positives)
Specificity = True Negatives / (True Negatives + False Positives)
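The confusion matrix cells and the two ratios can be counted directly from actual outcomes and model predictions. This Python sketch uses hypothetical labels: 10 prospects, where the model predicts YES for the top 4.

```python
def confusion_matrix(actual, predicted):
    """Count TN, FP, FN, TP for binary labels (truthy = YES)."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a:
            counts["TP" if p else "FN"] += 1
        else:
            counts["FP" if p else "TN"] += 1
    return counts


def sensitivity(cm):
    """True positives / (false negatives + true positives)."""
    return cm["TP"] / (cm["FN"] + cm["TP"])


def specificity(cm):
    """True negatives / (true negatives + false positives)."""
    return cm["TN"] / (cm["TN"] + cm["FP"])


# Hypothetical outcomes (1 = YES) and predictions.
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cm = confusion_matrix(actual, predicted)
print(cm, sensitivity(cm), specificity(cm))
```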