can we automate predictive analytics
Post on 08-Aug-2015
TRANSCRIPT
CAN WE AUTOMATE PREDICTIVE ANALYTICS? Thomas W. Dinsmore
OPEN DATA SCIENCE CONFERENCE, BOSTON 2015
@opendatasci
Can we automate predictive analytics?
• Buzz about automation
• Degrees of automation
• Some history
• Where we are today
• The last mile
• The impact of automation
When will most expert level data scientist tasks…be automated?
[Chart: poll responses. Categories: Now, Future, Never; values in extraction order: 19%, 76%, 5%.]
[Chart: responses by years until automation. Categories: 1-2, 2-5, 5-10, 10-20, 20-50, >50; values in extraction order: 6%, 8%, 16%, 28%, 14%, 4%.]
Source: kdnuggets.com
“We are convinced the machine can do better than human anesthesiologists”
– Mark Ansermino, Director of Pediatric Anesthesia, University of British Columbia
Levels of Autonomy
• Level 0: Driver completely controls
• Level 1: Individual controls automated
• Level 2: At least two controls automated together
• Level 3: Driver can cede control under certain conditions
• Level 4: Vehicle controls all functions for the entire trip
National Highway Traffic Safety Administration
1995: Unica PRW
• Optimized neural network specification
• 1998: branded as Model One
• Automated model selection
• Now called IBM PredictiveInsight (Enterprise Marketing Management)
Late 1990s: MarketSwitch
• “Fire your SAS programmers!”
• “Russian rocket scientists”
• Bought by Experian
• Automation replaced by services
Late 1990s: KXEN
• Structural risk minimization for model selection
• Original release: rudimentary UI
• Repositioned as an easy-to-use tool for marketers
• SAP purchased for $40 million in 2013
SAS and SPSS
SAS Rapid Modeler
• Add-in to SAS Enterprise Miner
• Macros for outlier ID, missing value treatment, variable selection and model selection
• User specifies data set, response measure and depth of search
SPSS Modeler
• Automated data prep features handle missing value treatment, outlier ID, date/time prep, binning, etc.
• Auto Classifier, Auto Numeric and Auto Cluster handle model selection across defined search plan
Open Source
caret
• R package
• Suite of tools to automate model selection
• Includes preprocessing tools for tasks like dummy coding and feature selection
• Supports 40+ R packages, ~ 200 techniques
MLBase
• Joint project of AMPLab and Brown DMRG
• Develop scalable machine learning platform on Spark
• ML Optimizer translates user spec into a test plan
• Currently in development (alpha release postponed from 2014)
Startups
DataRobot
• Builds smart test plans
• Seeded with library of Kaggle-winning techniques
• Users can add or extend techniques with R or Python
• Leverages clusters to quickly run large-scale experiments
• User controls depth of automation
• Designed for rapid model deployment and integration
Levels of Autonomy
• Level 0: Analyst completely controls
• Level 1: Individual features automated
• Level 2: At least two features automated together
• Level 3: Analyst can cede control under certain conditions
• Level 4: Platform controls all functions end to end
Predictive Analytics Platforms
Level 4 Automated Analytics
Model Scoring
• Predictive models developed offline
• Models uploaded through PMML
• Scoring built into an automated process
Unsupervised Learning
• Anomaly detection
• Social networks
• Topic modeling or taste profiles for personalization
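The model scoring path on this slide (models developed offline, uploaded through PMML, scoring built into an automated process) might look roughly like the sketch below. It is illustrative only: the PMML document, its field names and its coefficients are made up, and the parser handles just one case, a linear RegressionTable with NumericPredictor terms. A production scoring engine would cover far more of the PMML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document, standing in for a model exported by an offline tool.
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Read the intercept and the numeric coefficients from a PMML string."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefs


def score(model, record):
    """Apply the linear model to one input record (a dict of field values)."""
    intercept, coefs = model
    return intercept + sum(c * record[name] for name, c in coefs.items())


# The automated process: load the uploaded model once, then score each record.
model = load_linear_model(PMML_DOC)
batch = [{"age": 40.0, "income": 50000.0}, {"age": 20.0, "income": 0.0}]
scores = [score(model, r) for r in batch]
```

The point of the split is that the analyst never touches the scoring loop: a retrained model arrives as a new PMML file and the same process picks it up.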
“Data science is 1% science and 99% data.”
Data sources are complex and diverse
Enterprise data:
It’s still a mess.
For good results, analytic methods require specific transformations
• Logistic Regression: dummy code categorical predictors
• Naive Bayes Classifier: bin numeric predictors
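The two transformations named on this slide, dummy coding and binning, can be sketched in a few lines. This is a minimal standard-library illustration, not the routine any particular platform uses; the function names, the sample values and the bin edges are all assumptions.

```python
from bisect import bisect_right


def dummy_code(values, drop_first=True):
    """Turn a categorical column into 0/1 indicator columns.

    Dropping the first level keeps it as the reference category, which
    avoids perfect collinearity with the intercept in a regression.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


def bin_numeric(values, edges):
    """Map each number to the index of its bin, given ascending bin edges."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical sample columns.
regions = ["east", "west", "north", "east"]
ages = [17, 25, 42, 70]

dummies = dummy_code(regions)                     # input for logistic regression
age_bins = bin_numeric(ages, edges=[18, 35, 65])  # input for naive Bayes
```

Because each method wants a different preparation of the same raw columns, a transformation step fixed before modeling cannot suit every candidate technique.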
We can pre-build data source connections
Conventional Wisdom
• For good results, make the data perfect, e.g.:
• Find and remove anomalies
• Replace missing data
• Consumes time, but worth it
The Right Way
• Investigate and act on anomalies, but do not remove them
• Use techniques that can handle missing data
• Your predictive model has to work with dirty data; you should too
Work with data “as is”
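One way to act on anomalies and missing values without removing them is to keep every row and attach flags that a downstream model can use. The sketch below is an assumption-laden illustration, not a method from the talk: it uses the modified z-score based on the median absolute deviation, with the conventional 0.6745 constant and 3.5 cutoff, and the function name and sample values are hypothetical.

```python
from statistics import median


def annotate(values, threshold=3.5):
    """Flag missing and anomalous values instead of dropping the rows."""
    present = [v for v in values if v is not None]
    center = median(present)
    mad = median(abs(v - center) for v in present)
    rows = []
    for v in values:
        missing = v is None
        # Modified z-score; skipped when the value is missing or MAD is zero.
        outlier = (
            not missing
            and mad > 0
            and 0.6745 * abs(v - center) / mad > threshold
        )
        rows.append({"value": v, "missing": missing, "outlier": outlier})
    return rows


# Hypothetical column with one missing value and one extreme value.
incomes = [52, 48, None, 50, 51, 49, 400]
rows = annotate(incomes)
```

All seven rows survive. The flags become extra predictors, so the model sees the same kind of dirty data in training that it will see in production.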
Bring data transformation into the test and learn cycle
[Diagram, "The Conventional Wisdom": Data Marshaling → Data Cleansing → Data Transformation → { Model Training, repeated; bracket labeled "Test and Learn" } → Model Selection]
Bring data transformation into the test and learn cycle
[Diagram: Data Marshaling → Data Cleansing → { Data Transformation and Model Training, repeated; bracket labeled "Test and Learn" } → Model Selection]
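Treating the transformation as one more thing to search over could look like the sketch below. It is a toy under stated assumptions: the two transformations, the two deliberately simple models, the data and the function names are all hypothetical, and a single holdout set stands in for a proper validation scheme.

```python
import math
from itertools import product

# Candidate transformations and model trainers (hypothetical, for illustration).
TRANSFORMS = {"raw": lambda x: x, "log": math.log}


def train_majority(xs, ys):
    """Baseline: always predict the most common training label."""
    majority = max(set(ys), key=ys.count)
    return lambda x: majority


def train_nearest_centroid(xs, ys):
    """Predict the class whose mean feature value is closest."""
    centroids = {}
    for c in set(ys):
        members = [x for x, y in zip(xs, ys) if y == c]
        centroids[c] = sum(members) / len(members)
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


TRAINERS = {"majority": train_majority, "nearest_centroid": train_nearest_centroid}


def run_search(train, holdout):
    """Score every (transformation, model) pair on the holdout set."""
    results = {}
    for (t_name, f), (m_name, trainer) in product(TRANSFORMS.items(), TRAINERS.items()):
        predict = trainer([f(x) for x, _ in train], [y for _, y in train])
        hits = sum(predict(f(x)) == y for x, y in holdout)
        results[(t_name, m_name)] = hits / len(holdout)
    best = max(results, key=results.get)
    return best, results


# Hypothetical skewed data: (feature, label) pairs.
train = [(1, 0), (2, 0), (3, 0), (4, 0), (8, 1), (16, 1), (1000, 1)]
holdout = [(2, 0), (5, 0), (12, 1), (20, 1), (500, 1)]
best, results = run_search(train, holdout)
```

On this data the same nearest-centroid model scores 0.6 on the raw feature and 1.0 on the log-transformed one, which is the argument of the slide in miniature: the right transformation is discovered inside the loop, not fixed before it.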
“The doctor will see you now.”
How often are results of your analytics used?
[Chart: response scale Always, Most of the time, Sometimes, Rarely, Never; values in extraction order: 1%, 5%, 28%, 50%, 16%.]
2013 Rexer Data Miners Survey
Why your analysis isn’t used
• You do not understand the client’s business problem
• You do not understand the deployment environment
• The client does not understand your work
Automation lets data scientists spend more time collaborating, less time crunching
[Diagram: four activities (Define the problem, Wrangle the data, Develop models, Explain your work) shown in two layouts labeled "From this:" and "To this:"; the layouts did not survive extraction.]
Can we automate predictive analytics?
• Buzz about automation
• Degrees of automation
• Some history
• Where we are today
• The last mile
• The impact of automation
• We already have — almost
• The last mile is a steep challenge
• Automation will not replace data scientists — it will make them more effective
Questions
Thank You
The Big Analytics Blog: www.thomaswdinsmore.com
email: thomaswdinsmore@gmail.com
@thomaswdinsmore