Machine Learning Streams with Spark 1.0
Seattle Spark Meetup
Machine Learning Streams with Spark 1.0
Drew Minkin, Principal Program Manager, Ubix Labs
A Frost Venture Partners Company 01.14 | Revision 10.0 | Confidential and Proprietary Information
AGENDA
• Machine Learning and Business Analytics
• Streams and Real Time Analytics
• Deep Dive into MLlib
Machine Learning and Business Analytics
Machine Learning is Not A Spectator Sport
Machine Learning and Data Science
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
The Analytics Spectrum
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
[Figure: the analytics spectrum. Axis labels: Reactive to Proactive; Production to Research. Stages: Descriptive, Predictive, Prescriptive. Labels: Graph, Data Management, Simulation, Process Improvement, Content Delivery, Knowledge Management, Data Modeling, Visualization, Data Quality, Monitoring, Analysis, Optimization, Algorithms, Trialing, Statistics, Domain Expertise, Integration, Big Data, Collaboration]
Five Families of Algorithms
http://en.wikipedia.org/wiki/Wu_Xing
Association
Classification
Estimation
Forecasting
Clustering
Classification
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
Target a Discrete Answer – Yes/No
§ Find All Columns Driving its Value
§ Use model to score new records
§ Many Different Measures of Accuracy
§ Quick and Improving Iterations
§ Most Actionable Types of Models
Examples
§ Hospital Readmission
§ Equipment Failure
§ Likelihood to purchase
Credit Scoring Banding
Association and Sequencing
http://38.media.tumblr.com/tumblr_m81wcfIO3V1qmzwx0o1_1280.jpg
§ Transactions and items in
§ Rules, Sequences and Itemsets out
Examples
§ Collaborative Filtering
§ Identify cross-sell
§ Identify sequential, next-sale
§ Make purchase recommendations
§ Complex event associations
Recommender Systems
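A minimal sketch of "transactions and items in, rules and itemsets out", in plain Python. The baskets, thresholds, and function names are hypothetical and only illustrate how support and confidence are counted for pairs of items; this is not an MLlib API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (not from the talk).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def itemset_counts(baskets, size):
    """Count how many baskets contain each itemset of the given size."""
    counts = Counter()
    for basket in baskets:
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return counts

def rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pair rules above both thresholds."""
    n = len(baskets)
    singles = itemset_counts(baskets, 1)
    pairs = itemset_counts(baskets, 2)
    found = []
    for (a, b), count in pairs.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / singles[(lhs,)]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                found.append((lhs, rhs, support, confidence))
    return sorted(found)

for lhs, rhs, support, confidence in rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule is directional: here "bread -> milk" passes the confidence threshold while "bread -> beer" does not, even though the bread and beer pair meets the support threshold.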
Forecasting and Time Series
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
• Input of measure over time and related series
• Predictions generated for short term trends
• Based on cycles and events
Examples
§ Workforce Optimization
§ Timing Purchasing Decisions
§ Optimizing Maintenance Windows
§ Material Cost Planning
§ Equipment Usage Planning
Demand Sensing
Estimation and Regression
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
Predicting a Continuous Distribution
§ Many Different Measures of Accuracy
§ Quick and Improving Iterations
§ Most Actionable Types of Models
Examples
§ Length Of Stay Estimation
§ Customer Lifetime Value
Pricing Optimization
Clustering
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
§ Hard and Soft Groupings
§ Profiles of Subgroups
§ Likenesses and Differences
Examples
• Marketing Campaigns
• Reward Programs
• Equipment Utilization
• Process Improvement Analysis
Market Segmentation
Combining Algorithms in Harmony
http://en.wikipedia.org/wiki/Wu_Xing
Streams and Real Time Analytics
AGENDA
• The Challenges of Scaling Analytics
• Classes of Analytics Complexity
• Spark vs. Storm, etc.
• Stream Paradigms and Spark
Streams and Real Time Analytics
Will Business Run out of Modeling Opportunities?
The Approaching Crisis for Machine Learning
Hype vs. Reality in Scaling Data Science
http://www.kdnuggets.com/2013/04/poll-results-largest-dataset-analyzed-data-mined.html
2009 vs. 2014 Scaling Data Science
http://www.kdnuggets.com
Spectrum of Stream Based Analytics
[Figure: latency (Months, Days, Hours, Minutes, Seconds, 100 ms, < 1 ms) against events/sec (0, 10, 10^2, 10^3, 10^4, 10^5, 10^6). Regions shown: Big Data, NoSQL, RDBMS, Business Monitoring, Machine Monitoring, Real Time Monitoring, Web Analytics, EDW Analytics, Operational Analytics]
http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx
Challenges of Stream Based Applications
http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx
[Figure labels: Devices, Sensors, Web servers, Feeds; Complex Analytics & Mining]
Challenges of Stream Based Applications
http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx
Hopping Windows
Tumbling Windows
§ Event Synchronization
§ Latency
§ Time Window Management
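The two window types named on this slide can be sketched in plain Python. This is an illustration only, assuming integer timestamps, windows that start at non-negative multiples of the hop, and half-open intervals [start, start + size); the function names are my own and are not a Spark Streaming API.

```python
def tumbling_windows(events, size):
    """Assign each (timestamp, value) event to exactly one fixed-size window."""
    windows = {}
    for ts, value in events:
        start = (ts // size) * size  # start of the only window containing ts
        windows.setdefault(start, []).append(value)
    return dict(sorted(windows.items()))

def hopping_windows(events, size, hop):
    """Windows of length `size` that start every `hop`; they overlap when hop < size."""
    windows = {}
    for ts, value in events:
        start = (ts // hop) * hop  # latest window start at or before ts
        # Walk back through every earlier window that still covers ts.
        while start >= 0 and start > ts - size:
            windows.setdefault(start, []).append(value)
            start -= hop
    return dict(sorted(windows.items()))

# Hypothetical events: (timestamp in seconds, reading).
events = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

print(tumbling_windows(events, size=2))
print(hopping_windows(events, size=4, hop=2))
```

With a tumbling window every event is counted once. With a hopping window of size 4 and hop 2, most events land in two windows, which is the source of the extra bookkeeping that "time window management" refers to.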
Deep Dive into MLlib
AGENDA
• Architecture
• Descriptive Analytics
• Predictive Analytics
• Prescriptive Analytics
Deep Dive into MLlib
MLlib Descriptive Analytics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
MLlib Descriptive Analytics - Data Types
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Vectors • Dense
MLlib Descriptive Analytics - Data Types
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Vectors • Sparse
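To make the dense and sparse distinction concrete, here is a plain-Python sketch. It assumes a sparse vector is stored as (size, indices, values), the same three pieces MLlib's sparse vectors take; the helper names are hypothetical.

```python
def to_sparse(dense):
    """Keep only the nonzero entries as (size, indices, values)."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values

def to_dense(size, indices, values):
    """Expand (size, indices, values) back into a full list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

def sparse_dot(sparse, dense):
    """Dot product that touches only the stored nonzero entries."""
    _, indices, values = sparse
    return sum(v * dense[i] for i, v in zip(indices, values))

dense_vector = [1.0, 0.0, 3.0]
sparse_vector = to_sparse(dense_vector)
print(sparse_vector)
print(sparse_dot(sparse_vector, [2.0, 5.0, 4.0]))
```

The sparse form pays off when most entries are zero: storage and the dot product scale with the number of nonzeros rather than with the vector length.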
MLlib Descriptive Analytics - Data Types
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Linear Algebra
• CoordinateMatrix
• DistributedMatrix
• IndexedRow
• IndexedRowMatrix
• MatrixEntry
• RowMatrix
MLlib Descriptive Analytics – Summary Statistics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
• Sample size
• Maximum value of each column
• Sample mean vector
• Minimum value of each column
• Number of nonzero elements
• Sample variance vector
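The six column statistics listed above can be sketched on a small in-memory matrix. This plain-Python version assumes at least two rows and uses the unbiased (n - 1) sample variance; the function and key names are my own, and the code does not reproduce MLlib's distributed implementation.

```python
def column_summary(rows):
    """Column-wise summary of a list of equal-length numeric rows."""
    n = len(rows)
    cols = list(zip(*rows))  # transpose rows into columns
    mean = [sum(c) / n for c in cols]
    variance = [
        sum((x - m) ** 2 for x in c) / (n - 1)  # unbiased sample variance
        for c, m in zip(cols, mean)
    ]
    return {
        "count": n,
        "max": [max(c) for c in cols],
        "mean": mean,
        "min": [min(c) for c in cols],
        "numNonzeros": [sum(1 for x in c if x != 0) for c in cols],
        "variance": variance,
    }

# Hypothetical data: three observations, two columns.
rows = [[1.0, 0.0], [2.0, 4.0], [3.0, 0.0]]
summary = column_summary(rows)
for name, value in summary.items():
    print(name, value)
```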
MLlib Descriptive Analytics - SVD
http://public.lanl.gov/mewall/kluwer2002.html
Singular Value Decomposition Can Collapse Sparse Matrices to Denser Forms
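As a rough illustration of the collapsing idea, the sketch below finds only the largest singular value and its two vectors by power iteration. Power iteration is a stand-in technique chosen for brevity, not how MLlib computes an SVD, and it assumes a nonzero matrix whose top singular value is distinct; the names are hypothetical.

```python
import math

def top_singular_triplet(a, iterations=100):
    """Approximate the largest singular value of `a` with its left and right vectors."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols  # arbitrary unit-length starting vector
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]  # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v

# Hypothetical sparse matrix: mostly zeros, rank one.
matrix = [
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
]
sigma, u, v = top_singular_triplet(matrix)
# Rebuild the matrix from one number and two short dense vectors.
rebuilt = [[sigma * ui * vj for vj in v] for ui in u]
print(sigma)
print(rebuilt)
```

Here nine mostly-zero cells are described by one singular value and two three-element vectors. For real data several triplets are kept, and the result is an approximation rather than an exact copy.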
MLlib Descriptive Analytics – PCA
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Principal Component Analysis reduces dimensionality by projecting features onto the directions of greatest variance
MLlib Predictive Analytics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
MLlib Predictive Analytics – Bayesian Classifier
http://xkcd.com/1132/
MLlib Predictive Analytics – Logistic Regression
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Granddaddy of Algorithms
§ Coefficients from states or exact values
§ Small scores can make big changes
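A small sketch of how a fitted logistic model turns coefficients into a probability. The weights, intercept, and feature meanings are hypothetical: one feature is a state encoded as 0 or 1, the other is an exact numeric value. The scoring function is the standard logistic (sigmoid) function, written here in plain Python rather than with MLlib.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, intercept, features):
    """Probability of the positive class for one record."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical model: [is_smoker (state, 0 or 1), prior_visits (exact value)].
weights = [1.0, 0.5]
intercept = -1.0

baseline = score(weights, intercept, [1.0, 0.0])    # linear score 0.0
one_more = score(weights, intercept, [1.0, 1.0])    # linear score 0.5
far_out = score(weights, intercept, [1.0, 10.0])    # linear score 5.0
even_further = score(weights, intercept, [1.0, 11.0])  # linear score 5.5

print(baseline, one_more)
print(far_out, even_further)
```

Near a linear score of zero the curve is steepest, so one extra visit moves the probability by roughly twelve points; far from zero the same step barely registers. That is one reading of "small scores can make big changes".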
MLlib Predictive Analytics - SVM
http://www.youtube.com/watch?v=3liCbRZPrZA http://www.projectrho.com/public_html/rocket/fasterlight.php
Linear Support Vector Machine for classifiers
Behold the “kernel trick”
MLlib Predictive Analytics – Regression
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
§ Linear
§ Ridge
§ Lasso (Least Absolute Shrinkage & Selection Operator)
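The three variants differ only in the penalty added to the squared-error loss. A common textbook form is shown below; the symbols (weights w, n training pairs (x_i, y_i), regularization strength lambda) are my notation, and scaling constants vary between libraries.

```latex
\min_{w}\; \frac{1}{2n}\sum_{i=1}^{n}\left(w^{\top}x_i - y_i\right)^2 + \lambda\,R(w),
\qquad
R(w) =
\begin{cases}
0 & \text{linear (least squares)}\\
\tfrac{1}{2}\lVert w\rVert_2^2 & \text{ridge}\\
\lVert w\rVert_1 & \text{lasso}
\end{cases}
```

The L1 penalty can drive individual weights exactly to zero, which is where the "selection" in the lasso name comes from; the L2 penalty shrinks weights without eliminating them.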
MLlib Predictive Analytics – K-means
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
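A minimal plain-Python sketch of the k-means loop (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. The data and the hand-picked starting centers are hypothetical; a production implementation would also choose starting centers carefully and stop on convergence.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, centers, iterations=10):
    """Run a fixed number of assign-then-update rounds."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: squared_distance(p, centers[k]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean_point(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical data: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
print(clusters)
```

This produces hard groupings: every point belongs to exactly one cluster.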
MLlib Predictive Analytics – Matrix Factorization
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Collaborative Filtering
Alternating Least Squares (ALS)
Prescriptive Analytics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
MLlib Prescriptive Analytics – Gradient Descent
http://bleedingedgemachine.blogspot.com/2012/12/gradient-descent.html http://kungfupanda.wikia.com/wiki/Monkey
Linear and Nonlinear Optimization
Minimize smooth functions without constraints
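Unconstrained minimization of a smooth function can be sketched as plain batch gradient descent with a fixed step size. The objective, step size, and names below are hypothetical, and this is not the MLlib optimizer, which works on distributed data.

```python
def gradient_descent(gradient, start, step_size=0.1, iterations=100):
    """Repeatedly step against the gradient from a starting point."""
    x = list(start)
    for _ in range(iterations):
        g = gradient(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x

def grad_f(p):
    """Gradient of the hypothetical objective f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = p
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]

minimum = gradient_descent(grad_f, start=[0.0, 0.0])
print(minimum)
```

For this bowl-shaped objective each step shrinks the distance to the minimum at (3, -1) by a constant factor. A step size that is too large would overshoot and diverge, which is why the step size is the main tuning knob.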
MLlib Prescriptive Analytics – L-BFGS
http://graphics.utdallas.edu/sites/default/files/gpucvt.png
Limited-Memory BFGS
§ Nonlinear
§ Minimize Smoothing
§ Constraint is Memory
Notes from the MLlib Streams Field
MLlib Predictive Analytics – K Nearest Neighbor
http://www.youtube.com/watch?v=3liCbRZPrZA http://www.projectrho.com/public_html/rocket/fasterlight.php
Variation for classifiers
MLlib – A Call to Action
http://www.fanpop.com/clubs/voltron/images/2172709/title/original-fanart http://adventuretime.wikia.com/wiki/Princess_Monster_Wife
Coming Soon
• Decision Trees
• Model Performance Tools
It Takes A Village
• Time Series
• Ensemble MLI