big data monetization in telecoms

© 2014 Beyond the Arc, Inc. Customer Experience & Strategic Communications July 15, 2014 Big Data Monetization in Telecoms Addressing your biggest business challenges with Data Science Workshop


Page 1: Big Data Monetization in Telecoms

Beyond the Arc, Inc. © 2014 Beyond the Arc, Inc.

Customer Experience & Strategic Communications

July 15, 2014

Big Data Monetization in Telecoms

Addressing your biggest business challenges

with Data Science

Workshop

Page 2: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

About Beyond the Arc

2

Beyond the Arc is a Berkeley-based customer experience consultancy that helps businesses use Big Data to:

• Transform the customer experience

• Streamline operations

• Develop the products of the future

Page 3: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Who am I?

Brandon Purcell

Data Science Team Lead at Beyond the Arc

• Manage a team of data scientists who specialize in translating business challenges into data challenges, then translating data solutions back into implementable business solutions

• Trader on the CBOE and American Stock Exchange

• Peace Corps volunteer in Benin (West Africa)

• MBA from Haas School of Business at UC Berkeley

• BA from Dartmouth College

3

Page 4: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Who are you?

Tell us about yourselves

• Name and current position

• Background – how did you find your way into Big Data?

• What are you hoping to get out of today’s workshop?

• Something about yourself that you don’t usually reveal to people you’ve just met

4

Page 5: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Workshop goal #1

5

Our goal today is to teach you a standard framework for data

mining and apply it to your specific Big Data business challenges

Page 6: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Workshop goal #2

6

Have fun! (We are in Vegas, after all)

Page 7: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Agenda

7

Time | Items | Goals
1:30 to 1:45 | Introductions | Introduce ourselves; review workshop goals
1:45 to 2:00 | What is Big Data? | Define Big Data; identify your top Big Data challenges
2:00 to 3:00 | CRISP-DM: Business Understanding, Data Understanding, Data Preparation | Introduction to Data Mining and CRISP-DM; apply CRISP-DM to your real-world challenges
3:00 to 3:30 | Break | Relax and reenergize
3:30 to 4:00 | CRISP-DM: Modeling, Evaluation, Deployment | Apply CRISP-DM to your real-world challenges
4:00 to 4:30 | Wild-card discussion and conclusion | Discuss top-of-mind topics; wrap up the workshop


Page 9: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

What is Big Data and how can we monetize it?

9

Page 10: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

What is Big Data?

10

Page 11: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

and how can we monetize it?

11

Page 12: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

What Big Data business challenges are you facing?

12

In other words, why are you here?

• Tell us one of your most pressing Big Data business challenges and we will attempt to address it today

• Note: “I need a Big Data solution” is not an adequate answer

◦ Why do you need it?

◦ What real business problem will it address?

◦ How will an effective solution impact your bottom line?

Page 13: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 13

Look for these criteria

Business problems with:

• Clear business objective

• Available data

• Feeds into existing business process and makes it better

Page 14: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 14

Selecting your project

• What are your most pressing problems…

• about which you have data…

• that we can address in a short time (30-90 days)?


Page 16: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

What Big Data business challenges are you facing?

Common examples of Big Data business challenges:

• Retention – How can we stop our customers from leaving?

• Cross-sell and up-sell – How can we make our customer relationships more profitable?

• Product development – How do I know what people want? How do I improve an existing product or service?

• Operational efficiency – Where is there waste in our operations? How can we cut costs?

• Predictive maintenance – Can we predict and therefore prevent outages before they occur?

• Compliance – How can we prevent complaints and compliance issues before they occur?

16

Page 17: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Big Data opportunities for telecoms

17


Page 19: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 19

Introduction to Data Mining

and CRISP-DM

Page 20: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

What is data mining?

• Data mining means:

o finding patterns in your data

o which you can use

o to do your business better

20

Page 21: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Data mining algorithms are only tools

• Data Mining algorithms are incredibly smart data-wise, but incredibly dumb business-wise.

• Algorithms find patterns in data.

• We’re looking for patterns in business and customer behavior.

• Only significant and actionable patterns are interesting; computers can’t decide that.

21

Page 22: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Okay, how do we do it?

• Create models to understand and predict behavior; apply the models by generating a score for each customer

• Deploy these models immediately, tested and targeted across the customer base

• Evaluate change in customer behavior

• Repeat this process to learn for the future, which leads to…

22

Page 23: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Decision Models

• Most exciting results of data mining are decision models.

• These are executable objects that can be put to work wherever appropriate in the business.

• For example, decision models can score each customer with:

o Their risk to default on payment

o Their propensity to buy a new product/service

o The likelihood that they will close their account

23

Page 24: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Applications

• Propensity to

o Make a purchase

o Take up offer

o Default on payment

o Cancel a contract

• Likelihood of risk

o Credit risk

o Fraud risk

• Cross-Sell / Up-Sell

o Next best activities

o Next best offer

• Segmentation

o Groups of customers

o Groups of products

o Groups of communities

• Campaign Strategy

o Creative Campaign strategy

o Call Centre strategy

o Direct Mailing strategy

o Viral Marketing strategy

24

Page 25: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Likelihood to purchase

Customer1 ... 90%
Customer2 ... 50%
Customer3 ... 80%
Customer4 ... 10%
Customer5 ... 5%
Customer6 ... 95%

Scoring each customer

• Predictive models are scored using current customer data

• Each customer is given a unique score (e.g., cross-selling likelihood, risk, potential revenue)

25

Data Warehouse

Data Mining Model
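
The scoring step on this slide can be sketched in a few lines of Python. This is only an illustration: the field names, weights, and intercept below are hypothetical stand-ins for whatever a trained model would supply, and a logistic function is assumed as the scoring formula.

```python
import math

# Hypothetical coefficients; in practice these come from a trained model.
WEIGHTS = {"monthly_spend": 0.03, "support_calls": -0.4, "has_data_plan": 1.2}
INTERCEPT = -2.0


def score(customer):
    """Return a 0-1 purchase likelihood using a logistic function."""
    z = INTERCEPT + sum(WEIGHTS[field] * customer[field] for field in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank_customers(customers):
    """Score every customer and sort from most to least likely to purchase."""
    scored = [(cid, score(fields)) for cid, fields in customers.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative current customer data (made up for this sketch).
customers = {
    "Customer1": {"monthly_spend": 120, "support_calls": 0, "has_data_plan": 1},
    "Customer2": {"monthly_spend": 60, "support_calls": 1, "has_data_plan": 0},
    "Customer3": {"monthly_spend": 20, "support_calls": 4, "has_data_plan": 0},
}

for cid, likelihood in rank_customers(customers):
    print(f"{cid}: {likelihood:.0%}")
```

The ranked list is what a campaign would consume: start with the highest scores and work down until the budget runs out.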

Page 26: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 26

Business Examples

of Predictive Analytics

Page 27: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Customer Acquisition

• Problem: We need more customers

• Resolution: Send promotions to those most likely to accept our offer

• Process:

o Get data showing who accepted the same or similar offer in the past

o Purchase demographic data about people

o Build models that match buyers and demographics

o Use the models to rank the prospects

Page 28: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Cross-sell and Up-sell

• Problem: Let’s make the customers we have more profitable by selling them more products or products with a higher margin

• Resolution: Predict which ones are most likely to buy and why. Present targeted offers to those most likely to buy.

• Process:

o Build predictive models from historical data

o Use the models to understand reasons for purchases

o Predict buying propensity and reason for each customer

Page 29: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Churn Analysis – Customer Retention

29

• Problem: Customers are leaving

• Resolution: Predict which ones are most likely to leave and why. Prepare targeted retention messages and incentives, so the call center is ready when customers call.

• Process:

o Using historical data, build models predicting which customers are most likely to leave and why

o Predict risk level and reasons for each customer and store in database, along with prepared response text
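
One way to picture the stored record (risk level, reason, and prepared response text) is the sketch below. The risk drivers, weights, and response wording are hypothetical; a real project would derive the drivers from a model built on historical churn data.

```python
# Hypothetical risk drivers and weights, for illustration only.
RISK_WEIGHTS = {
    "dropped_calls": 0.08,
    "billing_disputes": 0.25,
    "contract_ending_soon": 0.30,
}

# Hypothetical prepared response text, keyed by the top reason.
RESPONSES = {
    "dropped_calls": "Apologize for network issues and offer a service credit.",
    "billing_disputes": "Review recent bills and suggest a better-fitting plan.",
    "contract_ending_soon": "Offer a renewal incentive before the contract ends.",
}


def churn_record(customer_id, features):
    """Build the row the call center would look up for one customer."""
    contributions = {
        name: weight * features.get(name, 0)
        for name, weight in RISK_WEIGHTS.items()
    }
    risk = min(1.0, sum(contributions.values()))  # cap the score at 1.0
    top_reason = max(contributions, key=contributions.get)
    return {
        "customer_id": customer_id,
        "risk": round(risk, 2),
        "reason": top_reason,
        "response": RESPONSES[top_reason],
    }


record = churn_record(
    "C-1001",
    {"dropped_calls": 5, "billing_disputes": 1, "contract_ending_soon": 0},
)
print(record)
```

Storing the reason next to the score is what lets the call center open with the right message instead of a generic offer.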

Page 30: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 30

CRISP-DM:

Cross-Industry Standard Process

for Data Mining

Page 31: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

CRISP-DM Process

What it is:

• Reliable and repeatable process for doing data mining projects

• Used across industries

Six phases:

• Business Understanding

• Data Understanding

• Data Preparation

• Modeling

• Evaluation

• Deployment

31

Page 32: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 32

Business Understanding

Page 33: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Focus of Business Understanding

• Understand project objectives and requirements from a business perspective

• Convert this knowledge into:

o Data mining problem definition

o Preliminary plan designed to achieve the objectives

33

Page 34: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 34

Business Understanding Tasks

• Determine Business Objectives

• Assess Situation

• Determine Data Mining Goal

• Produce Project Plan

Page 35: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 35

Tasks with details

• Determine Business Objectives

o Describe the business owner’s primary objectives

o Describe the criteria for a successful/useful outcome

• Assess Situation

o Identify resources, assumptions, constraints, and risks

o Do a cost-benefit analysis

• Determine Data Mining Goals

o Describe outputs and define success criteria

• Produce Project Plan

o Specify steps, including selection of tools and techniques

o List stages and dependencies

Page 36: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 36

Business Understanding – Key Questions

• Imagine that you are starting a new data mining project.

• What questions would you ask to understand the business owner’s needs and expectations?

Page 37: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 37

Business Understanding – Key Questions

• Who are my business partners?

• What is important to them?

• What resources do I need? Have available?

• What assumptions am I making?

• What constraints should I consider?

• What are my data mining goals?

• How will I know if I’ve achieved them?

• What’s the timeline? Budget?

Page 38: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 38

Exercise – Business Understanding

Answer as many of these key questions as possible

• Who are my business partners?

• What is important to them?

• What resources do I need? Have available?

• What assumptions am I making?

• What constraints should I consider?

• What are my data mining goals?

• How will I know if I’ve achieved them?

• What’s the timeline? Budget?

You have 10 minutes

Page 39: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 39

Data Understanding

Page 40: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Focus of Data Understanding

• Begins with data collection

• Followed by activities to:

o Get familiar with the data

o Identify data quality problems

o Discover first insights into the data

o Detect interesting subsets to form hypotheses for hidden information

40

Page 41: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 41

Data Understanding Tasks

• Collect Data

• Describe Data

• Explore Data

• Verify Data Quality

Page 42: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 42

Tasks with details

• Collect data

o Acquire data necessary for project

o Integration of multiple data sources may be necessary

• Describe data

o High level report on data properties

• Explore data

o Identify key attributes

o Identify interesting subsets

• Verify data quality

o Determine whether data is complete

o Determine steps for Data Preparation
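
As a rough illustration of the "verify data quality" task, the sketch below reports how complete each field is. The records and field names are made up, and treating `None` and empty strings as missing is an assumption; your data may encode missing values differently.

```python
def completeness_report(records, fields):
    """Return the share of non-missing values for each field (0.0 to 1.0)."""
    total = len(records)
    report = {}
    for field in fields:
        # Assumption: None and "" both mean "missing".
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = present / total if total else 0.0
    return report


# Made-up sample rows for illustration.
records = [
    {"customer_id": 1, "plan": "prepaid", "monthly_spend": 25.0},
    {"customer_id": 2, "plan": "", "monthly_spend": 40.0},
    {"customer_id": 3, "plan": "postpaid", "monthly_spend": None},
    {"customer_id": 4, "plan": "postpaid"},
]

report = completeness_report(records, ["customer_id", "plan", "monthly_spend"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

Fields with low completeness are candidates for the Data Preparation decisions that follow (discard, impute, or source from elsewhere).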

Page 43: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 43

Data Understanding – Key Questions

When starting a new data mining project with a new data source, what key questions would you ask?

Page 44: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 44

Data Understanding – Key Questions

• What data do I have?

• What data do I need?

• Where do I acquire it and how do I get access?

• How often is it updated?

• How clean is it?

• How and when was it collected?

• How far back does it go?

• Is the data internal? External? Mixed?

• What’s the security protocol? Can I share it / take it home?

• Can the data be improved at all going forward?

Page 45: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 45

Data Understanding – Key Questions

Tell us a little bit about your data

• What data do I have?

• What data do I need?

• Where do I acquire it and how do I get access?

• How often is it updated?

• How clean is it?

• How and when was it collected?

• How far back does it go?

• Is the data internal? External? Mixed?

• What’s the security protocol? Can I share it / take it home?

• Can the data be improved at all going forward?

You have 10 minutes

Page 46: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 46

Data Preparation

Page 47: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Focus of Data Preparation

• Covers all activities needed to construct final dataset

• Tasks are likely to be performed multiple times, and not in any set order; they include:

o Table, record, and attribute selection

o Transformation and cleaning of data

90% of an analyst’s time is spent on Data Prep

47

Page 48: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 48

Data Preparation Tasks

• Select Data

• Clean Data

• Construct Data

• Integrate Data

• Format Data

You will typically spend most of your time on this step

Page 49: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 49

Tasks with details

• Select data

o Decide which data is necessary for analysis

• Clean data

o Improve data quality so it can be used for modeling

• Construct data

o Derive new attributes

o Transform existing values

• Integrate data

o Combine information from multiple tables

• Format data

o Prepare data for tool use
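
The tasks above can be sketched end to end on a toy dataset. Everything here is illustrative: the tables, field names, and the choice to impute missing minutes with the median are assumptions, not a recommended recipe.

```python
from statistics import median

# Made-up usage rows (one is an exact duplicate, one has a missing value).
usage = [
    {"customer_id": 1, "minutes": 300, "data_gb": 2.0},
    {"customer_id": 2, "minutes": None, "data_gb": 5.5},
    {"customer_id": 2, "minutes": None, "data_gb": 5.5},
    {"customer_id": 3, "minutes": 120, "data_gb": 0.5},
]
billing = {1: 45.0, 2: 80.0, 3: 20.0}  # customer_id -> monthly bill


def prepare(usage_rows, billing_by_id):
    # Clean: drop exact duplicate rows while preserving order.
    seen, rows = set(), []
    for row in usage_rows:
        key = (row["customer_id"], row["minutes"], row["data_gb"])
        if key not in seen:
            seen.add(key)
            rows.append(row)

    # Clean: impute missing minutes with the median of the known values.
    fill = median(r["minutes"] for r in rows if r["minutes"] is not None)

    prepared = []
    for r in rows:
        bill = billing_by_id[r["customer_id"]]  # Integrate: join billing data
        prepared.append({
            "customer_id": r["customer_id"],
            "minutes": r["minutes"] if r["minutes"] is not None else fill,
            "data_gb": r["data_gb"],
            "monthly_bill": bill,
            # Construct: derive a new attribute from existing ones.
            "bill_per_gb": round(bill / r["data_gb"], 2),
        })
    return prepared


prepared = prepare(usage, billing)
for row in prepared:
    print(row)
```

Note that the input rows are left untouched; the function returns a new list, which makes it safe to rerun as preparation steps are revisited.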

Page 50: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 50

Data Preparation – Key Questions

When preparing data sources for analysis, what questions would you need to ask?

Page 51: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 51

Data Preparation – Key Questions

• What should my data look like to enable me to do the analysis?

• What specific data do I need to select for the analysis? Why?

• Which fields are necessary for my analysis?

• What data am I lacking?

• Do I have enough data?

• Do I have duplicates?

• What fields do I need to derive?

• What do I do with null values? (discard, impute, etc.)

• Do I need to combine data from multiple sources?

Page 52: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 52

Data Preparation – Key Questions

What are 3 data prep steps you will need to accomplish?

• What should my data look like to enable me to do the analysis?

• What specific data do I need to select for the analysis? Why?

• Which fields are necessary for my analysis?

• What data am I lacking?

• Do I have enough data?

• Do I have duplicates?

• What fields do I need to derive?

• What do I do with null values? (discard, impute, etc.)

• Do I need to combine data from multiple sources?

You have 5 minutes


Page 54: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 54

Break


Page 56: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 56

Modeling

Page 57: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 57

Translating business problems to data mining goals

Problem | Example
Predict (Propensity) | Which people will become customers? Who will leave the company?
Predict (Estimation) | What do we forecast as a future network load?
Classify | How to profile or describe the customers who belong to known groups of interest (e.g., high profit/low profit/loss making)?
Segment | Which customers form groups which have highly similar members?
Associate | Which products or services are bought together?
Sequence | Find the most common sequences of events and see how they play out for a given customer’s behavior.

Page 58: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 58

Matching data mining goals to analytic approaches

Problem | Example | Approach
Predict (Propensity) | Which people will become customers? Who will leave the company? | C5.0, C&RT, Neural Networks
Predict (Estimation) | What do we forecast as a future network load? | C&RT, Neural Networks, Linear Regression
Classify | How to profile or describe the customers who belong to known groups of interest (e.g., high profit/low profit/loss making)? | Neural Networks, C5.0, C&RT, Logistic Regression
Segment | Which customers form groups which have highly similar members? | Kohonen (Self-Organizing) Mapping, K-Means Clustering, Two-Step Clustering, C5.0, C&RT
Associate | Which products or services are bought together? | Apriori, GRI
Sequence | Find the most common sequences of events and see how they play out for a given customer’s behavior. | Sequence (CARMA)
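
To make the "Associate" row concrete, here is a simplified sketch that counts which pairs of services appear together and reports support and confidence. It is plain pairwise counting, not a full Apriori implementation, and the baskets and the 0.4 support threshold are made up.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support >= min_support:
            # Confidence: of the baskets with the antecedent, how many
            # also contain the consequent.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Made-up service bundles, one set per customer.
baskets = [
    {"voice", "data"},
    {"voice", "data", "tv"},
    {"voice", "sms"},
    {"data", "tv"},
    {"voice", "data"},
]

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules with high confidence in one direction but not the other (here, tv buyers and data) are the ones that suggest a targeted cross-sell.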

Page 59: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Focus of Modeling

• Select and apply the modeling techniques; typically, there are several techniques for the same type of data mining problem

• Some techniques have specific requirements on the form of data…

• So it’s often necessary to go back to the data preparation phase

59

Page 60: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 60

Modeling Tasks

• Select Modeling Technique

• Generate Test Design

• Build Model

• Assess Model

Page 61: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 61

Tasks with details

• Select Modeling Technique

o Select the specific modeling technique

• Generate Test Design

o Separate the dataset into train and test sets

o Build the model on the train set

o Estimate its quality on the test set

• Build Model

o Run the modeling tool on the dataset

• Assess Model

o Use evaluation criteria to assess the model or models
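
The test design above can be sketched as follows. The "model" here is deliberately a toy (a single threshold on support calls) and the data is synthetic; the point is only the sequence: split, build on the train set, assess on the test set.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, threshold):
    """Share of rows where 'calls >= threshold' matches the churn label."""
    correct = sum(1 for calls, churned in rows if (calls >= threshold) == churned)
    return correct / len(rows)


def fit_threshold(train):
    """Toy model: pick the threshold with the best accuracy on the train set."""
    candidates = sorted({calls for calls, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


# Synthetic (support_calls, churned) pairs, for illustration only.
data = [(calls, calls >= 3) for calls in range(6)] * 5

train, test = train_test_split(data)
threshold = fit_threshold(train)          # build on the train set
test_accuracy = accuracy(test, threshold)  # assess on unseen rows
print(f"threshold={threshold}, test accuracy={test_accuracy:.2f}")
```

Fixing the random seed keeps the split reproducible, which matters when several candidate models need to be compared on the same test rows.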

Page 62: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Frame the business problem

• Understand the “real” problem (not just the data mining/modeling tasks).

• In some cases, studying the problem may reveal that a model or other sophisticated analysis is not needed.

• In most cases, the model will only be one part of a larger solution.

• A predictive model or clustering mechanism must be aligned with the structure of the overall solution.

62

Page 63: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Validate your approach

Data mining models always require validation

o Test model performance against new/unseen data to prove that it works

o This means partition your data into “Training” and “Validation” sets

63

Page 64: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 64

Modeling – Key Questions

When selecting a modeling technique, what key questions would you ask?

Page 65: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 65

Modeling – Key Questions

• What type of problem am I trying to solve?

• What measurement type is my target field?

• What measurement types are my input fields?

• How should I partition the data?

• What models should I use?

• How will I evaluate them?

• How will I explain models to my leadership?

Page 66: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 66

Modeling – Key Questions

Which model might you use and why?

• What type of problem am I trying to solve?

• What measurement type is my target field?

• What measurement types are my input fields?

• How should I partition the data?

• What models should I use?

• How will I evaluate them?

• How will I explain models to my leadership?

You have 5 minutes to select a model

Page 67: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 67

Evaluation

Page 68: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Focus of Evaluation

• At this phase, one or more high quality models have been built

• Before proceeding, thoroughly evaluate the models to be certain they achieve the business objectives

• Determine if there is an important business issue that has not been sufficiently addressed

• At the end of this phase, reach a decision on the use of the data mining results

68

Page 69: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 69

Evaluation Tasks

• Evaluate Results

• Review Process

• Determine Next Steps

Page 70: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 70

Tasks with details

• Evaluate Results

o Assess degree to which model meets business objectives

• Review Process

o Check to see if an important factor or task has been overlooked

o Conduct QA

• Determine Next Steps

o Decide whether or not to move to deployment
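
One possible way to connect model results to a business objective is lift: how much better the top-scored customers respond than the customer base as a whole. The sketch below assumes lift is the agreed measure and uses made-up scores and outcomes; the slides do not prescribe a specific metric.

```python
def lift_at(scored, fraction=0.2):
    """Response rate in the top-scored fraction divided by the overall rate.

    scored: list of (score, responded) pairs, where responded is 0 or 1.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(responded for _, responded in ranked[:top_n]) / top_n
    overall_rate = sum(responded for _, responded in ranked) / len(ranked)
    if overall_rate == 0:
        return 0.0  # nobody responded, so lift is not meaningful
    return top_rate / overall_rate


# Made-up model scores and observed responses.
scored = [
    (0.95, 1), (0.90, 1), (0.80, 0), (0.60, 1), (0.50, 0),
    (0.40, 0), (0.30, 0), (0.20, 1), (0.10, 0), (0.05, 0),
]

lift = lift_at(scored, fraction=0.2)
print(f"Lift in top 20%: {lift:.1f}x")
```

A success criterion set during Business Understanding (for example, a minimum lift for the campaign to pay for itself) gives this number something to be judged against before deciding whether to deploy.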

Page 71: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 71

Evaluation – Key Questions

Once you’ve modeled the data, what questions would you ask to evaluate your findings?

Page 72: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 72

Evaluation – Key Questions

• How do I evaluate the results?

• What should I do if I get no results?

• What tools do I use?

• Which model is best?

• Does the model make sense from a business standpoint?

• Are the results in an actionable form?

Page 73: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 73

Evaluation – Key Questions

How will you evaluate your results?

• How do I evaluate the results?

• What should I do if I get no results?

• What tools do I use?

• Which model is best?

• Does the model make sense from a business standpoint?

• Are the results in an actionable form?

You have 5 minutes

Page 74: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 74

Deployment

Page 75: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Focus of Deployment

• Creation of model is generally not end of project

• Even if purpose of model is to increase knowledge of data, this knowledge will need to be organized and presented in a way business owner can use it

• Depending on requirements, the deployment phase can be as simple as generating a report, or as complex as implementing a repeatable data mining process

• In many cases, it will be the business owner, not the data analyst, who will carry out deployment steps

• Business owner needs to understand up front what actions need to be carried out to make use of models

75

Page 76: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 76

Deployment Tasks

• Plan Deployment

• Plan Monitoring and Maintenance

• Produce Final Report

• Review Project

Page 77: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 77

Tasks with details

• Plan Deployment

o Summarize your deployment strategy/steps

• Plan Monitoring and Maintenance

o Summarize your monitoring and maintenance strategy/steps

• Produce Final Report

o Produce a final written report

• Review Project

o Assess what happened

o Identify areas for improvement

Page 78: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 78

Deployment – Key Questions About Findings

Before presenting your findings, what questions would you want to answer?

Page 79: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 79

Deployment – Key Questions About Findings

• What are the insights that emerged from this analysis?

• Do they answer the business problem that I set out to solve?

• Do they answer another business problem?

• Are they actionable?

• Who is affected by this? (audience for presentation)

• Are there political sensitivities about these insights?

• What is the most compelling way to present these to my management team?

• How will we measure success?

Hint: The “so what” needs to be upfront; include recommendations for action as appropriate.

Page 80: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 80

Deployment – Key Questions About Findings

What would a key insight look like and how would you put it into action?

• What are the insights that emerged from this analysis?

• Do they answer the business problem that I set out to solve?

• Do they answer another business problem?

• Are they actionable?

• Who is affected by this? (audience for presentation)

• Are there political sensitivities about these insights?

• What is the most compelling way to present these to my management team?

• How will we measure success?

You have 5 minutes to complete this

Page 81: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

Congratulations! You have used the CRISP-DM

process to solve a Big Data business problem!

Next Steps:

• Collect your notes and present to leadership and relevant stakeholders

• Initiate project, adhering to CRISP-DM process

• Measure success

• Bask in professional glory

• Retire early and move to Kauai

81


Page 83: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 83

Wild-Card

Discussion

Page 84: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 84

Brandon Purcell 510.926.2694 [email protected]

Office: 877.676.3743

Web: beyondthearc.com

Blog: beyondthearc.com/blog

Thank you

Please keep in touch!

Page 85: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 85

Appendix

Page 86: Big Data Monetization in Telecoms

Beyond the Arc, Inc.

CRISP-DM Process

Business Understanding

• Determine Business Objectives

o Background

o Business Objectives

o Business Success Criteria

• Assess Situation

o Inventory of Resources

o Requirements, Assumptions, and Constraints

o Risks and Contingencies

o Terminology

o Costs and Benefits

• Determine Data Mining Goal

o Data Mining Goals

o Data Mining Success Criteria

• Produce Project Plan

o Project Plan

o Initial Assessment of Tools and Techniques

Data Understanding

• Collect Initial Data

o Initial Data Collection Report

• Describe Data

o Data Description Report

• Explore Data

o Data Exploration Report

• Verify Data Quality

o Data Quality Report

Data Preparation

• Data Set

o Data Set Description

• Select Data

o Rationale for Inclusion / Exclusion

• Clean Data

o Data Cleaning Report

• Construct Data

o Derived Attributes

o Generated Records

• Integrate Data

o Merged Data

• Format Data

o Reformatted Data

Modeling

• Select Modeling Technique

o Modeling Technique

o Modeling Assumptions

• Generate Test Design

o Test Design

• Build Model

o Parameter Settings

o Models

o Model Description

• Assess Model

o Model Assessment

o Revised Parameter Settings

Evaluation

• Evaluate Results

o Assessment of Data Mining Results w.r.t. Business Success Criteria

o Approved Models

• Review Process

o Review of Process

• Determine Next Steps

o List of Possible Actions

o Decision

Deployment

• Plan Deployment

o Deployment Plan

• Plan Monitoring and Maintenance

o Monitoring and Maintenance Plan

• Produce Final Report

o Final Report

o Final Presentation

• Review Project

o Experience Documentation

Page 87: Big Data Monetization in Telecoms

Beyond the Arc, Inc. 87

Model Matrix

Output (Target) | Input | Clear results structure? | Algorithm | Goal | Results
Flag or Nominal | Flag, or Nominal and Continuous | Yes | C5.0, CHAID, C&RT, QUEST | Predict / Profile | Rule Set or Decision Tree with predicted category and associated confidence (CHAID, C&RT, QUEST can be built interactively)
Flag or Nominal | Flag, or Nominal and Continuous | No | Decision List, Neural Network, SVM, Bayes Net, Discriminant, KNN | Predict / Profile | Ranked set of rows, more opaque solution structure
Flag or Nominal | Flag, or Nominal and Continuous | No | SLRM | Campaign modeling | Prediction of response yes/no
Continuous | Continuous and Flag or Nominal | Yes | C&RT, CHAID | Predict / Profile | Decision Tree with mean predictions and associated variance (CHAID, C&RT, QUEST can be built interactively)
Continuous | Continuous (and Flag or Nominal) | Yes | Linear Regression | Predict | Equation for mean prediction with coefficients
Continuous | Continuous (and Flag or Nominal) | No | Neural Network, Generalized Linear Model, SVM, KNN | Predict | Numeric prediction
Flag or Nominal | Continuous (and Flag or Nominal) | Sort of | Logistic Regression | Predict | Equation for prediction of probability and associated coefficients
Flag or Nominal or Continuous | Continuous, Flag, or Nominal | No | Neural Network | Predict | Prediction and relative importance of input variables, but no equation or tree (black box solution)
None | Continuous, Flag, or Nominal | Sort of | Kohonen Map | Cluster | Cluster Membership represented as X and Y coordinates
None | Numeric | Yes | K-Means | Cluster | Cluster Membership
None | Continuous, Flag, or Nominal | Yes | Two-Step | Cluster | Cluster Membership

Page 88: Big Data Monetization in Telecoms

Model Matrix – Specialty algorithms

Output (Target) | Input | Clear results structure? | Algorithm | Goal | Results
Flag or Nominal | Flag or Nominal | Yes | Apriori, CARMA | Associate | Association with confidence
Flag or Nominal | Flag or Nominal with time sequence | Yes | Sequence (CARMA) | Sequence | Sequence Association with confidence
Continuous | Continuous | Sort of | Time Series, Exp Smoothing, ARIMA | Forecast | Equation and future predictions with confidence intervals, line graph
Numeric | Continuous | Yes | Cox Regression | Time until something happens | Predicted time to event
None | Continuous | Sort of | Factor / PCA | Reduce number of variables, remove correlation | Variable groupings that make up factors, Continuous score for each factor to use in next model
None | Continuous, Flag, or Nominal | Sort of | Feature Selection, Anomaly Detection | Detect data problems | Filter node for variables, flag for outlier cases

Page 89: Big Data Monetization in Telecoms

Approaches to business problems – C5.0

Approach: C5.0

More examples
• Key behaviors of customers who are likely to leave
• Customer acquisition profiles
• Improve profitability with a targeted message to customers
• Discover best niche markets
• Discover unusual segments for better business strategy
• Identify most important variables from a larger set by using those which appear higher in the tree and by using the Rule Set generator

Strengths
• Gives a clear explanation in the form of a rule set or decision tree.
• Works well with: complicated data, nonlinear data
• Works with small-cell designs
• Allows for multiple rules to fire and can select the best rule by voting
• Can help you generate hypotheses and insight from the rules
• Important variables may be combined with a derive node and / or dropped in later models to gain increased insight.

Watch out for…
• The rule set is not built directly from the tree but from a subset of the tree that may be better for real data.
• If two numeric input variables are highly correlated, or if two symbolic variables are closely related, C5.0 will use only one of them for the tree and drop the other. Classification results are just as good, but it may be the case that the dropped variable is easier to obtain for deployment. You may wish to test the data in C&RT as well to see if there is a variable substitution effect as described above.

Page 90: Big Data Monetization in Telecoms

Approaches to business problems – C&RT

Approach: C&RT

More examples
• Key behaviors of customers who are likely to leave
• Customer acquisition profiles
• Improve profitability with a targeted message to customers
• Discover best niche markets
• Discover unusual segments for better business strategy
• Identify most important variables from a larger set by using those which appear higher in the tree

Strengths
• Gives a clear explanation in the form of a rule set or decision tree.
• Works well with: complicated data, nonlinear data
• Works with small-cell designs
• Accepts a numeric or a symbolic target
• You can explore surrogate variables the model would have used if the variable it did use was not available

Watch out for…
• Only gives binary splits in the tree
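
To make the "binary splits" point concrete, here is a minimal sketch of how a single binary split on a numeric input can be chosen. It assumes Gini impurity as the split criterion, which is the usual choice for classification trees in the CART family; the field names and the eight example rows are made up for illustration and are not from the workshop data.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_binary_split(values, labels):
    """Try every midpoint between distinct sorted values and return the
    (threshold, weighted_impurity) pair with the lowest impurity."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical example: dropped calls per month vs. whether the customer left.
dropped_calls = [0, 1, 1, 2, 5, 6, 7, 9]
churned = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

threshold, impurity = best_binary_split(dropped_calls, churned)
print(f"split at dropped_calls <= {threshold}, weighted Gini = {impurity}")
```

A full tree repeats this search on each resulting branch, so an input with three natural groups needs two successive binary splits rather than one three-way split.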

Page 91: Big Data Monetization in Telecoms

Approaches to business problems – Linear Regression

Approach: Linear Regression

More examples
• Detect fraudulent transactions by looking at outliers and poorly predicted cases
• Predict customer behavior within the observed range of example data
• Which variables are most important?
• What-if scenarios by substituting new values into the regression equation
• What is the amount of expected change in the outcome variable when one of the inputs changes?

Strengths
• Works well with linear data
• Gives an easily understood equation with effects for each input, controlling for the other variables

Watch out for…
• Cannot make reliable predictions outside the observed range of inputs
• Cannot have a two-category or dummy-coded target; use Logistic Regression instead.
• Cannot have highly correlated input variables; use PCA to remove this problem.
• Does not work well with: complicated data, nonlinear data, time series forecasting
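
The what-if use and the "observed range" warning can be sketched together. The example below fits a one-input least-squares line and refuses to predict outside the range of the training inputs. The data, the field names, and the `predict` guard are illustrative assumptions, not part of the workshop material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(intercept, slope, x, x_min, x_max):
    """What-if prediction, allowed only inside the observed input range."""
    if not x_min <= x <= x_max:
        raise ValueError(f"{x} is outside the observed range [{x_min}, {x_max}]")
    return intercept + slope * x


# Hypothetical data: monthly minutes used vs. monthly bill.
minutes = [100, 200, 300, 400, 500]
bill = [25.0, 35.0, 45.0, 55.0, 65.0]

intercept, slope = fit_line(minutes, bill)
what_if = predict(intercept, slope, 350, min(minutes), max(minutes))
print(f"bill = {intercept:.2f} + {slope:.3f} * minutes; at 350 minutes: {what_if:.2f}")
```

The slope answers the "expected change in the outcome when one input changes" question directly: it is the change in the predicted bill per additional minute.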

Page 92: Big Data Monetization in Telecoms

Approaches to business problems – Logistic Regression

Approach: Logistic Regression

More examples
• Predict probability of one behavior vs. another (e.g., cancel vs. active)
• “What if…” scenarios
• Discover influential factors pertaining to desired outcome
• Predict probabilities for multi-category targets (e.g., cancel, active, terminate)

Strengths
• Gives an equation evaluating tradeoffs (changes in odds) for a given combination of inputs.
• Allows for many dummy-coded variables as inputs
• Widely understood

Watch out for…
• The equation gives the natural log of the odds, which is not always easily understood
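
Because the log-odds scale is hard to read, it often helps to convert it. The sketch below takes a fitted equation and turns its output into a probability, and turns one coefficient into an odds ratio. The intercept, the coefficients, and the two inputs are hypothetical values chosen for illustration, not results from a real model.

```python
import math


def log_odds(intercept, coefs, inputs):
    """Value of the logistic regression equation: ln(p / (1 - p))."""
    return intercept + sum(b * x for b, x in zip(coefs, inputs))


def probability(z):
    """Convert log-odds back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted model for "cancel vs. active".
intercept = -2.0
coefs = [0.8, -0.05]   # [complaints last quarter, tenure in months]
customer = [3, 24]     # 3 complaints, 24 months of tenure

z = log_odds(intercept, coefs, customer)
p = probability(z)

# exp(coefficient) is the multiplicative change in the odds of cancelling
# for one additional complaint, holding tenure fixed.
odds_ratio_complaint = math.exp(coefs[0])

print(f"log-odds = {z:.2f}, probability = {p:.3f}, "
      f"odds ratio per complaint = {odds_ratio_complaint:.2f}")
```

Reporting the probability and the odds ratio alongside the raw coefficients is usually easier for a business audience to act on.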

Page 93: Big Data Monetization in Telecoms

Approaches to business problems – Neural Network

Approach: Neural Network

More examples
• Use propensity of customer to churn to select best offer for save
• Evaluate risk of customer to defect
• Discover unusual customers potentially associated with fraud
• Identify best potential customers
• Identify most important variables using relative importance measure

Strengths
• Works well with: complicated data, nonlinear data
• Often has high accuracy
• Has many different topology methodologies to pick from

Watch out for…
• Gives a “black box” or unexplainable solution.
• Too many input categories may cause the model to overfit the data

Page 94: Big Data Monetization in Telecoms

Approaches to business problems – Kohonen Map

Approach: Kohonen Map

More examples
• Discover similarities in customer behavior
• Create new segment variable to use as input for further analysis
• Use segments to create targeted messages

Strengths
• Works well with: complicated data, nonlinear data
• Based on similar patterns
• You may experiment in determining the ideal number of clusters by exploring more than one map layout
• You may use rule induction methods or graphical techniques to profile segments

Watch out for…
• Cluster understanding is still necessary to interpret findings

Page 95: Big Data Monetization in Telecoms

Approaches to business problems – K-Means Clustering

Approach: K-Means Clustering

More examples
• Create new segment variable to use as input for further analysis
• Use segments to create targeted messages

Strengths
• Searches for cases that are close together in multi-dimensional space using a distance measure
• Uses fewer data passes than traditional hierarchical clustering

Watch out for…
• User must determine ideal number of clusters by exploring more than one solution (choosing different values of K and re-running the analysis)
• Cluster understanding is still necessary to interpret findings
• You may use rule induction methods or graphical techniques to profile segments
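
The "choose different values of K and re-run" step can be sketched as follows. This is a bare-bones K-Means under simplifying assumptions: squared Euclidean distance, the first K rows as starting centroids, and a fixed number of iterations. The six two-dimensional points are invented for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Return (labels, centroids, inertia) for a simple K-Means run.
    Assumption: the first k points serve as the initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical customers described by two standardized usage measures.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

results = {k: kmeans(points, k) for k in (1, 2, 3)}
for k, (labels, _, inertia) in results.items():
    print(f"K={k}: within-cluster sum of squares = {inertia:.3f}, labels = {labels}")
```

Comparing the within-cluster sum of squares across runs shows where adding another cluster stops paying off; here the large drop happens between K=1 and K=2.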

Page 96: Big Data Monetization in Telecoms

Approaches to business problems – Two-Step Clustering

Approach: Two-Step Clustering

More examples
• Create new segment variable to use as input for further analysis
• Use segments to create targeted messages

Strengths
• Searches for cases that are close together in space
• Uses only 1 data pass
• Uses statistical criterion to determine ideal number of clusters

Watch out for…
• Data should be randomized before the analysis
• Cluster understanding is still necessary to interpret findings
• You may use rule induction methods or graphical techniques to profile segments

Page 97: Big Data Monetization in Telecoms

Approaches to business problems – Apriori

Approach: Apriori

More examples
• Market Basket Analysis
• Identify behaviors associated with a particular outcome
• Identify features used together
• Identify “hot spots”

Strengths
• Data can be transactional or tabular
• A time field may be incorporated to tell the model when events happen at the same time.
• Can control interestingness of rules from a variety of perspectives
• Can evaluate flags as true-only or as presence and absence

Watch out for…
• Only symbolic inputs and conclusions
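
The "association with confidence" output can be illustrated with the two basic measures behind association rules, support and confidence. The sketch below only scores rules between pairs of items and is not the full Apriori search, which also builds larger itemsets level by level. The baskets and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical transactional data: products held by five customers.
baskets = [
    {"voice", "data", "sms"},
    {"voice", "data"},
    {"data", "roaming"},
    {"voice", "data", "roaming"},
    {"voice", "sms"},
]


def count(itemset):
    """Number of baskets that contain every item in the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def pair_rules(min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) between
    pairs of items that pass both thresholds."""
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        support = both / len(baskets)
        if support < min_support:
            continue  # infrequent pairs are pruned before scoring rules
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = both / count({antecedent})
            if confidence >= min_confidence:
                found.append((antecedent, consequent, support, confidence))
    return found


rules = pair_rules(min_support=0.4, min_confidence=0.7)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Raising or lowering the two thresholds is one simple way to control how many rules come back and how interesting they are.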

Page 98: Big Data Monetization in Telecoms

Approaches to business problems – GRI

Approach: GRI

More examples
• Market Basket Analysis
• Identify thresholds of inputs associated with particular behaviors
• Identify behaviors associated with a particular outcome
• Identify features used together

Strengths
• Data can be transactional or tabular
• Can control length, coverage and confidence of rules
• Can evaluate flags as true-only or as presence and absence

Watch out for…
• While inputs may be symbolic or numeric, conclusions may only be symbolic

Page 99: Big Data Monetization in Telecoms

Approaches to business problems – Sequence Detection

Approach: Sequence Detection

More examples
• Use insight to streamline business process.
• Discover unusual sequences indicating areas for business improvement

Strengths
• Order of events must be recorded though not necessarily timing.
• Thousands of sequences can be evaluated
• Uses the CARMA algorithm

Watch out for…
• It may take a long time or a lot of memory

Page 100: Big Data Monetization in Telecoms

Approaches to business problems – Factor/PCA

Approach: Factor / Principal Components Analysis (PCA)

More examples
• Remove correlation between independent variables
• Discover which inputs are most important for each underlying factor

Strengths
• Place the PCA nugget in your stream followed by a Type node. Use the variables created by the PCA node as inputs into your model. Remove (set to None) the variables that were used as input for the PCA model.

Watch out for…
• Can make direct interpretation of the effect of inputs very awkward (what is the meaning of a unit change in a factor score?). Consider using the mean of the variables with the highest inputs on each factor, excluding the variables used on other factors, to create an interpretable set of variables.
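
The "remove correlation" use can be shown with a small sketch for the two-variable case, where the principal axes follow from a single rotation angle. The inputs are centered but not standardized, which is a simplifying assumption, and the usage figures and field names are made up for illustration.

```python
import math


def covariance(us, vs):
    """Sample covariance of two equally long lists."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (n - 1)


def pca_2d(xs, ys):
    """Rotate two centered variables onto their principal axes and
    return the two lists of component scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cx = [x - mean_x for x in xs]
    cy = [y - mean_y for y in ys]
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the first principal axis for a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    score1 = [a * c + b * s for a, b in zip(cx, cy)]
    score2 = [-a * s + b * c for a, b in zip(cx, cy)]
    return score1, score2


# Hypothetical, strongly correlated inputs.
voice_minutes = [120, 300, 450, 200, 520, 80]
sms_count = [100, 260, 400, 210, 470, 90]

factor1, factor2 = pca_2d(voice_minutes, sms_count)
print(f"covariance of inputs: {covariance(voice_minutes, sms_count):.1f}")
print(f"covariance of scores: {covariance(factor1, factor2):.6f}")
```

The scores carry the same information as the original pair but are uncorrelated, which is why they can replace the original variables as inputs to a later regression model.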