data mining for business analydaytics - week 1 (1)
DESCRIPTION
data mining basicsTRANSCRIPT
Introduction to Data Mining
Data Mining for Business Analytics
Introduction to Data Science
Kent State University
Spring 2015 – Class 1
These slides incorporate the result of input/collaborations with Maytal Saar-Tsechansky,
Claudia Perlich, and Foster Provost.
An example business problem
• TelCo, a major telecommunications firm, wants to investigate its problem with customer attrition, or “churn”
• Let’s consider this for now as a marketing problem only
How would you go about targeting some customers with
a special offer, prior to contract expiration? Think about
what data should be available for you to use.
Moneyball
•The story of Oakland A's general manager Billy Beane's successful attempt to put together a baseball club on a budget by employing computer-based data analysis to draft his players.
Roles in Data Science
• Data Scientist• Understand the potential
• Can translate from business to execution
• Ability to evaluate proposal and execution
• Can do the actual modeling
• Applied statistician x computer scientist
Roles in Data Science
• Data Scientist• Understand the potential• Can translate from business to execution• Ability to evaluate proposal and execution• Can do the actual modeling• Applied statistician x computer scientist
• Collaborator in a data-centric project• Can translate from business to the execution
• Managing a data-mining project• Understanding the potential• Ability to evaluate a proposal and execution• Ability to interface with broad variety of people
• Strategist, Investor, …• Envisions opportunities, come up with novel ideas, evaluate the promise of new ideas, design data
science project / companies conceptually
Learning Goals
• Approach business problems data-analytically• Think carefully & systematically about whether & how data can improve
performance
• Be able to interact competently on the topic of data mining for business analytics
• Know the basics of the data mining processes, techniques, and concepts well enough
• Receive hands-on experience mining data• You should be able to follow up on ideas or opportunities that present
themselves
From data & business to strategy
HURRICANE FRANCES was on its way, barreling across the Caribbean, threatening a direct hit on Florida's Atlantic coast. Residents made for higher ground, but far away, in Bentonville, Ark., executives at Wal-Mart Stores decided that the situation offered a great opportunity for one of their newest data-driven weapons, something that the company calls predictive technology.
A week ahead of the storm's landfall, Linda M. Dillman, Wal-Mart's chief information officer, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier. Backed by the trillions of bytes' worth of shopper history that is stored in Wal-Mart's data warehouse, she felt that the company could "start redicting what's going to happen, instead of waiting for it to happen," as she put it.
From NY Times
Why data mining? Why now?
• Confluence of 4 technical advances• Storage
• Disk densities have been doubling each year
• A $100 disk today has over 1,000,000x more capacity/$ than the disks of 30 years ago
• Networking• Data can be transferred easily between collection, storage, and use eBusiness systems do it as a
matter of course (w/ fewer data errors)
• Algorithms• Advanced algorithms from machine learning, pattern recognition, and applied statistics have
become mature enough for mainstream use
• Computing power• Processing power has been doubling every 1.5 years or so (Moore’s law)
• Laptops are more powerful than the supercomputers of yesteryear
• Note: all four of these are essential for effective, successful data mining
What really is Data Mining
A process for using information technology to extract useful (non-trivial, hopefully actionable) knowledge from large bodies of data
• A set of principles, concepts, and techniques that structure thinking and analysis of data
• Extracts useful information and knowledge from large volumes of data by following a process with reasonably well defined steps
• Changes the way you think about data and its role in business
Data Mining Process Outline
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Types of Data Mining TasksMany business problems have as an important component one of these data mining tasks:
• Affinity grouping (a.k.a. “associations”, “market-basket analysis”)• What items are commonly purchased together?
• Similarity Matching• What other companies are like our best small business customers?
• Description/Profiling• What does “normal behavior” look like?
• Clustering• Do my customers form natural groups?
• Predictive Modeling (including causal modeling & link prediction)• Will customer X churn next month/default on her loan?
• How much would prospect X spend?
• Who might be good “friends” on our social networking site?
Un
sup
erv
ise
dS
up
erv
ise
d
This is NOT a course about…
• Statistics
• Database Querying• SQL
• Data Warehousing
• Regression Analysis• Explanatory vs. Predictive Modeling
Data Mining versus…
• Data Warehousing / Storage• Data warehouses coalesce data from across an enterprise, often from
multiple transaction-processing systems
• Querying / Reporting (SQL, Excel, QBE, other GUI-based querying)• Very flexible interface to ask factual questions about data
• No modeling or sophisticated pattern finding
• Most of the cool visualizations
• OLAP – On-line Analytical Processing• OLAP provides easy-to-use GUI to explore large data collections
• Exploration is manual; no modeling
• Dimensions of analysis preprogrammed into OLAP system
Data Mining versus…
• Traditional statistical analysis• Mainly based on hypothesis testing or estimation / quantification of
uncertainty
• Should be used to follow-up on data mining’s hypothesis generation
• Automated statistical modeling (e.g., advanced regression)• This is data mining, one type – usually based on linear models
• Massive databases allow non-linear alternatives
Answering business questions with these
techniques…
• Who are the most profitable customers?• Database querying
• Is there really a difference between profitable customers and the average customer?
• Statistical hypothesis testing
• But who really are these customers? Can I characterize them?• OLAP (manual search), Data mining (automated pattern finding)
• Will some particular new customer be profitable? How much revenue should I expect this customer to generate?
• Data mining (predictive modeling)