A Few Challenges to Make Machine Learning Easy
DESCRIPTION
Dr. Francisco J Martin: In the age of data, Machine Learning is the key component to make data-driven decisions, develop smart applications, and build predictive analytics. However, Machine Learning is complex. The current tools are complicated and do not scale well. Most solutions are costly, easily involving hundreds of thousands of dollars and substantial resources. Additionally, experts with industry experience are very scarce. BigML is building a scalable, cloud-based service that makes Machine Learning easy or, at least, lowers the barriers that most developers and business folks face when learning from data. In this talk, I will first demo BigML and then describe our efforts, highlight some of the key findings, and discuss some of the technical, user, and business challenges of developing a Machine Learning service for the masses.
TRANSCRIPT
Challenges to Make Machine Learning Easy
Francisco J Martin, Ph.D., BigML Co-founder & CEO
ACM San Francisco Bay Area Professional Chapter, eBay Whitman Campus
June 3rd, 2013 | BigML Inc, 2013
Sampling the Audience
Expert: Has published papers at KDD, ICML, NIPS, etc., or developed their own ML algorithms used at large scale.
Aficionado: Understands the pros and cons of different techniques and/or can tweak algorithms as needed.
Practitioner: Very familiar with ML packages (Weka, scikit-learn, R, etc.).
Newbie: Just taking the Coursera ML class or reading an introductory book on ML.
Absolute beginner: ML sounds like science fiction.
Data, data everywhere: A special report on managing information
Why make ML easy?
In the age of data, Machine Learning is the key component to:
‣ make data-driven decisions
‣ develop smart applications
‣ build predictive analytics
However, Machine Learning is COMPLEX:
‣ tools are complicated and do not scale well
‣ solutions are costly
‣ experts with industry experience are scarce
http://ttic.uchicago.edu/~samory/
BigML: A cloud-based service that makes Machine Learning SIMPLE
$ bigmler --train customer2012.csv \
    --test new_customers.csv \
    --objective churn
>>> from bigml.api import BigML
>>> api = BigML()
>>> source = api.create_source("s3://bigml-public/csv/sales.csv")
>>> dataset = api.create_dataset(source)
>>> model = api.create_model(dataset)
$ curl "https://bigml.io/model?$BIGML_AUTH" \
    -X POST \
    -H "content-type: application/json" \
    -d '{"dataset": "dataset/50ca447b3b56356ae0000029"}'
Agenda
‣ BigML web-based interface (10-15 min)
‣ BigML API, API Bindings, BigMLer (5 min)
>>> from bigml.api import BigML
>>> api = BigML()
>>> source = api.create_source("s3://bigml-public/csv/iris.csv")
>>> dataset = api.create_dataset(source)
>>> model = api.create_model(dataset)
$ curl "https://bigml.io/dataset?$BIGML_AUTH" \
    -X POST \
    -H "content-type: application/json" \
    -d '{"source": "source/50ca447b3b56356ae0000029"}'
‣ Challenges (10-15 min)
  #1 Machine Learning Breadth and Depth
  #2 User Diversity
  #3 Simplicity
  #4 Scalability
  #5 Measuring Impact
  #6 Pricing
‣ Questions (10-15 min)
How it works
BigML Resources
‣ Sources (local and remote): csv, arff, xls; https, s3, azure, odata
‣ Datasets: stream histograms, statistics
‣ Models: interactive, compoundable, Random Decision Forests; actionable: exportable to rules, code, PMML
‣ Predictions: form-based predictions, question by question, local predictions
‣ Evaluations: classification, regression, comparison
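Datasets are summarized with stream histograms. The slide does not say which algorithm BigML uses; the sketch below assumes a Ben-Haim and Tom-Tov style streaming histogram (Y. Ben-Haim and E. Tom-Tov, "A Streaming Parallel Decision Tree Algorithm", JMLR 2010), and the class and method names are hypothetical. Each bin is a (centroid, count) pair; when a new value pushes the histogram past `max_bins`, the two closest bins are replaced by their count-weighted average, so memory stays bounded while the total count and mean are preserved. Histograms built on separate chunks of data can be merged the same way, which suits parallel, scalable processing.

```python
import bisect


class StreamingHistogram:
    """Hypothetical bounded-memory histogram: sorted [centroid, count] bins."""

    def __init__(self, max_bins: int) -> None:
        self.max_bins = max_bins
        self.bins: list = []  # kept sorted by centroid

    def insert(self, value: float) -> None:
        centroids = [c for c, _ in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += 1  # exact hit: just bump the count
            return
        self.bins.insert(i, [value, 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self) -> None:
        # Find the adjacent pair of bins whose centroids are closest together.
        gaps = [self.bins[j + 1][0] - self.bins[j][0] for j in range(len(self.bins) - 1)]
        j = gaps.index(min(gaps))
        (c1, k1), (c2, k2) = self.bins[j], self.bins[j + 1]
        # Replace the pair with a single bin at their count-weighted centroid.
        self.bins[j:j + 2] = [[(c1 * k1 + c2 * k2) / (k1 + k2), k1 + k2]]

    def merge(self, other: "StreamingHistogram") -> "StreamingHistogram":
        """Combine two histograms, e.g. built by workers on different chunks."""
        combined = StreamingHistogram(self.max_bins)
        combined.bins = sorted([list(b) for b in self.bins + other.bins])
        while len(combined.bins) > combined.max_bins:
            combined._merge_closest()
        return combined

    def count(self) -> int:
        return sum(k for _, k in self.bins)

    def mean(self) -> float:
        return sum(c * k for c, k in self.bins) / self.count()


# Usage: summarize two halves of a stream separately, then merge them.
h1, h2 = StreamingHistogram(32), StreamingHistogram(32)
for v in range(1, 501):
    h1.insert(float(v))
for v in range(501, 1001):
    h2.insert(float(v))
h = h1.merge(h2)
print(len(h.bins), h.count(), round(h.mean(), 6))
```

The merged histogram keeps at most 32 bins for 1,000 values, yet its count is exact and its mean matches the true mean up to floating-point rounding; quantiles and split points are only approximate.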
BigML API
BigML
‣ 3,500+ users
‣ 35,000+ models
Challenges
#1 Machine Learning breadth and depth
#2 User Diversity
#3 Simplicity
#4 Scalability
#5 Measuring Machine Learning Impact
#6 Pricing
#1 machine learning breadth and depth
#1 Supervised learning
#2 Unsupervised learning
#3 Semi-supervised learning
#4 Reinforcement learning
#5 Learning to Learn
...or you can deal with that!
The stages of an ML application
‣ Phrase a problem as an ML task
‣ Data Wrangling
‣ Feature Engineering
‣ Learn from Data
‣ Pre-evaluate
‣ Measure Impact
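The "Pre-evaluate" stage (and the Evaluations resource for classification and regression) comes down to scoring a model on held-out examples whose true values are known, before it is deployed. The sketch below computes a few standard metrics in plain Python; the function names and the churn labels are hypothetical, and it does not claim to mirror how BigML computes its Evaluations. Comparing two models means computing the same metrics for both on the same held-out data.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted):
    """Counts of (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def precision_recall(actual, predicted, positive):
    """Precision and recall for one class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_absolute_error(actual, predicted):
    """Regression: average absolute difference between truth and prediction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Regression: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Usage with hypothetical held-out churn labels.
actual = ["churn", "stay", "stay", "churn", "stay"]
predicted = ["churn", "stay", "churn", "stay", "stay"]
print(accuracy(actual, predicted))                      # 0.6
print(precision_recall(actual, predicted, "churn"))     # (0.5, 0.5)
print(confusion_matrix(actual, predicted))
```

Accuracy alone can hide which mistakes a model makes; the per-class precision and recall show that this hypothetical model misses half of the customers who actually churn.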
Problems / Techniques / Applications
Problems: Classification, Regression, Clustering, Density Estimation, Manifold learning, Active learning, etc.
Applications: churn prevention, date matching, decision making, diagnostics, fraud detection, detecting tumors, detecting investment opportunities, human body pose estimation, pedestrian tracking, predictive analytics, recommendation systems, risk analysis, spam detection, etc.
By solving just a couple of problems with a few techniques, thousands of applications can be developed.
Why Trees first?
‣ Understanding the past
‣ Predicting the future
Why Trees?
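The case for trees is that one model both explains the past (each path reads as a rule) and predicts the future (one question per node). The sketch below is a minimal CART-style classifier using Gini impurity and a depth limit; it is not BigML's algorithm, and the churn data and feature names (`support_calls`, `months`) are hypothetical. `to_rules` shows one thing "exportable to rules" can mean: every root-to-leaf path becomes an IF ... THEN rule.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [y for r, y in zip(rows, labels) if r[feature] <= t]
            right = [y for r, y in zip(rows, labels) if r[feature] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, t)
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Grow a tree recursively; leaves are labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # every feature is constant here: cannot split further
        return majority
    _, feature, t = split
    sides = {"left": [], "right": []}
    for r, y in zip(rows, labels):
        sides["left" if r[feature] <= t else "right"].append((r, y))
    node = {"feature": feature, "threshold": t}
    for side, pairs in sides.items():
        node[side] = grow([r for r, _ in pairs], [y for _, y in pairs], depth + 1, max_depth)
    return node


def predict(node, row):
    """Walk from the root to a leaf, answering one question at a time."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


def to_rules(node, conditions=()):
    """Export every root-to-leaf path as a human-readable IF ... THEN rule."""
    if not isinstance(node, dict):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {node}"]
    f, t = node["feature"], node["threshold"]
    return (to_rules(node["left"], conditions + (f"{f} <= {t}",))
            + to_rules(node["right"], conditions + (f"{f} > {t}",)))


# Usage with a hypothetical, tiny churn dataset.
rows = [
    {"support_calls": 0.0, "months": 24.0},
    {"support_calls": 1.0, "months": 36.0},
    {"support_calls": 5.0, "months": 3.0},
    {"support_calls": 4.0, "months": 6.0},
]
labels = ["stay", "stay", "churn", "churn"]
tree = grow(rows, labels)
for rule in to_rules(tree):
    print(rule)
print(predict(tree, {"support_calls": 3.0, "months": 12.0}))
```

Real tree learners add pruning, missing-value handling, and categorical splits, but the same structure is what makes a tree readable: the printed rules are the model.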
A Machine Learning application requires more tasks than just learning from data, and those other tasks are even more important.
Solving just one more problem will enable a huge number of additional applications.
What problem(s) to tackle next and which techniques to use?
#2 user diversity
‣ Experts
‣ Aficionados
‣ Practitioners
‣ Newbies
‣ Absolute beginners
How should we prioritize what to build next: more features for the experts, or more simplification for the newbies?
[Figure: time-to-productivity (y-axis) vs. expertise (x-axis)]
[Figure: size vs. actual size, both axes ranging from MBs to PBs]
Most users believe their data is much bigger than it really is.
[Figure: number of jobs (y-axis) vs. size of job (x-axis)]
#3 simplicity
“Any fool can make something complicated. It takes a genius to make it simple.”
― Woody Guthrie
Simple means much more than an easy-to-use interface. It should be simple to:
‣ install
‣ configure
‣ use
‣ train
‣ understand
‣ test
‣ pre-evaluate
‣ measure impact
‣ deploy
‣ scale
‣ access programmatically (API)
#4 scalability
‣ 1 JOB from 1 USER
‣ N CONCURRENT JOBS from 1 CUSTOMER
‣ N JOBS from M CUSTOMERS
Infrastructure
#5 measuring machine learning impact
Measuring “actual” impact is complex and goes beyond traditional performance evaluation.
Imagine that an algorithm predicts that user Alice is going to buy a Magic Potion.
‣ But Magic Potions are out of stock.
‣ Should we blame:
  ‣ the algorithm for the “false positive” prediction?
  ‣ the data scientist for not including stock availability as a feature?
  ‣ operations for running out of stock on things that customers want to buy?
The stages of an ML research program
[Figure from Kiri Wagstaff, “Machine Learning that Matters,” ICML 2012]
Very inspirational!!!
The stages of an ML application
‣ Phrase a problem as an ML task
‣ Data Wrangling
‣ Feature Engineering
‣ Learn from Data
‣ Pre-evaluate
‣ Measure Impact !!!!!
#6 pricing
Pre-pay-as-you-go
Subscriptions
Machine Learning made easy?
You can deal with this... (BigML 1-click model)
...or you can deal with that!
[Figure: ease-of-use (y-axis) over time (x-axis), 2013]
Machine Learning made Easy!!!
[Figure: ease-of-use (y-axis) over time (x-axis), 2013 to 2018]
Questions
Learning from Data
Based on Learning from Data by Y. Abu-Mostafa, M. Magdon-Ismail and H. Lin
‣ Unknown model f: X → Y (example: ideal credit approval formula)
‣ Training examples (x1, l1), (x2, l2), ..., (xN, lN) (example: historical records of credit customers); each example x1 ... xN is a row of features f1, f2, ..., fn plus a label
‣ Models M (example: set of candidate credit approval formulas)
‣ Learning algorithm: uses the training examples to choose a final model from M
‣ Final model g ≈ f (example: learned credit approval formula)
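The generic setup above can be made concrete with the perceptron learning algorithm, which Abu-Mostafa et al. use as their first example for credit approval: the candidate formulas M are linear scoring rules h(x) = sign(w · (1, x)), and the algorithm picks g from M by repeatedly correcting one misclassified training example. This is a sketch only; the applicant features (income in $10k, years in current job) and the labels (+1 approve, -1 deny) are hypothetical.

```python
def sign(v: float) -> int:
    return 1 if v > 0 else -1


def score(w, x):
    """w . (1, x): the first weight acts as a bias (approval threshold)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def perceptron(examples, max_updates=1000):
    """Perceptron learning algorithm over hypotheses h(x) = sign(w . (1, x))."""
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(max_updates):
        wrong = [(x, l) for x, l in examples if sign(score(w, x)) != l]
        if not wrong:
            return w  # every training example is classified correctly
        x, l = wrong[0]
        # Nudge w toward classifying (x, l) correctly: w <- w + l * (1, x)
        w = [w[0] + l] + [wi + l * xi for wi, xi in zip(w[1:], x)]
    return w


# Hypothetical historical credit records: ((income_10k, years_in_job), label)
examples = [((8.0, 5.0), 1), ((6.0, 3.0), 1), ((2.0, 1.0), -1), ((1.0, 0.5), -1)]
g = perceptron(examples)  # the learned credit approval formula
print(g, [sign(score(g, x)) for x, _ in examples])
```

The perceptron only halts on its own when the training data are linearly separable (as these four records are); `max_updates` caps the work otherwise, in which case the returned weights may still misclassify some examples.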