Real-World Machine Learning - Leverage the Features of MapR Converged Data Platform

© 2016 MapR Technologies. MapR Confidential. Real-World Machine Learning - Leverage the Features of MapR Converged Data Platform. Mathieu Dumoulin ([email protected]), Mateusz Dymczyk ([email protected]). Hadoop Summit Tokyo 2016

Upload: hadoop-summit

Post on 07-Jan-2017


TRANSCRIPT

Page 1

Real-World Machine Learning - Leverage the Features of MapR Converged Data Platform

Mathieu Dumoulin ([email protected])

Mateusz Dymczyk ([email protected])

Hadoop Summit Tokyo 2016

Page 2

Mathieu Dumoulin, Data Engineer

• Master’s degree in text classification on Hadoop at Fujitsu Canada’s Innovation Lab

• In Tokyo, I’ve worked as Data Scientist, Search Engineer and Data Engineer

• My favorite ML libs are Scikit-Learn and H2O

• I love Japanese food, especially nabe (hot pot) and shabu-shabu.

Page 3

Mateusz Dymczyk, Software Engineer

• About me

Page 4

A Machine Learning Pipeline

Image from scikit-learn.org
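The pipeline pictured on this slide (preprocess, fit, predict) can be sketched in a few lines of dependency-free Python. This is a hypothetical illustration, not the scikit-learn API: the `Standardizer`, `NearestCentroid`, and `Pipeline` classes are stand-ins invented here to show how each stage learns its parameters from training data and then reapplies them to new data.

```python
from statistics import mean, pstdev


class Standardizer:
    """Scale each feature to zero mean and unit variance, learned on training data."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [mean(c) for c in cols]
        # Guard against constant columns, whose standard deviation is 0.
        self.stds = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [[(x - m) / s for x, m, s in zip(r, self.means, self.stds)] for r in rows]


class NearestCentroid:
    """Predict the label whose training centroid is closest (squared Euclidean distance)."""

    def fit(self, rows, labels):
        groups = {}
        for r, label in zip(rows, labels):
            groups.setdefault(label, []).append(r)
        self.centroids = {lab: [mean(c) for c in zip(*rs)] for lab, rs in groups.items()}
        return self

    def predict(self, rows):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda lab: dist(r, self.centroids[lab])) for r in rows]


class Pipeline:
    """Chain a preprocessing step and a model, in the spirit of a scikit-learn pipeline."""

    def __init__(self, scaler, model):
        self.scaler, self.model = scaler, model

    def fit(self, rows, labels):
        # The scaler is fitted on training data only, then reused unchanged at predict time.
        self.scaler.fit(rows)
        self.model.fit(self.scaler.transform(rows), labels)
        return self

    def predict(self, rows):
        return self.model.predict(self.scaler.transform(rows))


if __name__ == "__main__":
    X = [[1.0, 200.0], [1.2, 210.0], [5.0, 900.0], [5.5, 950.0]]
    y = ["small", "small", "large", "large"]
    pipe = Pipeline(Standardizer(), NearestCentroid()).fit(X, y)
    print(pipe.predict([[1.1, 205.0], [5.2, 920.0]]))
```

Keeping the scaler inside the pipeline means the exact same preprocessing is applied at training and prediction time, which is the property that matters once the model leaves the notebook.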

Page 5

… Meets the Real World

Must be integrated with a production system

Page 6

… Meets the Real World

Data comes from many sources, maybe very large

Data isn’t always labeled!

Must be integrated with a production system

Page 7

… Meets the Real World

Data comes from many sources, maybe very large

Needs ETL and cleaning

Finding the best algorithm and parameters can use a lot of CPU

Data isn’t always labeled!

Page 8

… Meets the Real World

Data comes from many sources, maybe very large

Needs ETL and cleaning

Finding the best algorithm and parameters can use a lot of CPU

Data isn’t always labeled!

From production systems? Is it real time?

Must be integrated with a production system

The predictions are used by another system...
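To see why finding the best algorithm and parameters is CPU-hungry, count the fits an exhaustive grid search needs: one per parameter combination per cross-validation fold. A minimal plain-Python sketch; the parameter names and values in the grid are hypothetical.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the parameter grid as a list of dicts."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]


def grid_search_fits(param_grid, n_folds):
    """Number of model fits for exhaustive grid search with k-fold cross-validation."""
    return len(expand_grid(param_grid)) * n_folds


# Hypothetical grid for a tree-ensemble model.
grid = {
    "max_depth": [3, 5, 10, 20],
    "n_trees": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1],
}

if __name__ == "__main__":
    # 4 * 3 * 3 = 36 combinations, times 5 folds = 180 fits.
    print(grid_search_fits(grid, n_folds=5))
```

Adding one more parameter with five candidate values multiplies the total by five, so the cost grows multiplicatively with every knob you search over.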

Page 9

Doing Machine Learning here...

Page 10

Is very different than here

Page 11

We don’t have better algorithms, we just have more data.

Peter Norvig, Director of Research at Google

Page 12

Machine Learning at scale matters

Growing number of ML use cases at successful companies:

Anomaly Detection, Customer 360, Fraud Detection, Log Security Analysis, Recommender, Sensor Data (IoT), Personalized Offers, Ad Tech

Page 13

ML at scale matters… but it’s HARD

Ref: http://advancedspark.com/ , https://github.com/fluxcapacitor/pipeline

Page 14

There must be a better way...

Page 15

A platform for big data ML

• ML projects can start simple and show value

• Just works: integrates with existing systems and tools

• Integrates common technologies, not just YARN

• Easy, unified administration

• Share the cluster (multi-tenancy)

• Keeps your data safe and secure

What’s an ideal big data platform for ML?

Page 16

MapR Converged Data Platform

Page 17

MapR Converged Data Platform

Open Source Engines & Tools, Commercial Engines & Applications

Utility-Grade Platform Services

Data Processing

Enterprise Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)

Global Namespace, High Availability, Data Protection, Self-healing, Unified Security, Real-time, Multi-tenancy

Search & Others

Cloud & Managed Services

Custom Apps

Unified Management and Monitoring

Page 18

Unique MapR features useful for ML

● MapR-FS and NFS mount

● Topologies

● Mirrors and Snapshots

● Reliability

● Multi-tenancy

● Data Governance

● Security

Page 19

MapR MCS (MapR Control System)

● Unified view

● Easy use of features

● REST API and maprcli utility
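The maprcli commands are also reachable over the REST API, which makes cluster administration scriptable from any language. A minimal Python sketch, assuming the commonly documented layout `https://<host>:8443/rest/<command path>` (for example `volume list`); verify the port and path against the documentation for your MapR version. The host name and credentials below are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def rest_url(host, command, params=None, port=8443):
    """Map a maprcli-style command such as 'volume list' to a REST endpoint URL (assumed layout)."""
    path = "/".join(command.split())
    query = "?" + urllib.parse.urlencode(params) if params else ""
    return f"https://{host}:{port}/rest/{path}{query}"


def rest_request(url, user, password):
    """Build a GET request carrying HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    url = rest_url("ip-10-0-0-110", "volume list")
    print(url)
    req = rest_request(url, "mapr", "<password>")
    # urllib.request.urlopen(req) would send it; clusters using self-signed
    # certificates typically also need an ssl context passed via context=.
```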

Page 20

NFS Mount

Mount the cluster as a regular folder

$> sudo mount -o hard,nolock ip-10-0-0-110:/mapr /mapr
$> ll /mapr/hadoopsummit/
total 3
drwxr-xr-x. 3 mapr mapr 1 Oct 13 11:21 apps
drwxr-xr-x. 2 mapr mapr 0 Oct 13 11:12 hbase
drwxr-xr-x. 3 root root 1 Oct 13 11:21 installer
drwxr-xr-x. 2 mapr mapr 0 Oct 13 11:14 opt
drwxrwxrwx. 2 mapr mapr 1 Oct 14 10:41 tmp
drwxr-xr-x. 6 mapr mapr 4 Oct 14 10:52 user
drwxr-xr-x. 3 mapr mapr 1 Oct 13 11:13 var
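Once the cluster is mounted, any program can read and write it with ordinary file I/O and no Hadoop client library. A minimal Python sketch; the base path reuses the example cluster name `hadoopsummit` and user `mapr` from this slide and is an assumption about your own layout.

```python
import csv
import tempfile
from pathlib import Path

# Mount point and cluster name from the example above; adjust for your cluster.
MAPR_HOME = Path("/mapr/hadoopsummit/user/mapr")


def save_rows(rows, relative_path, base=MAPR_HOME):
    """Write rows as CSV with standard file I/O; under /mapr this goes to MapR-FS via NFS."""
    path = Path(base) / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def load_rows(relative_path, base=MAPR_HOME):
    """Read the CSV back as a list of string lists."""
    with (Path(base) / relative_path).open(newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Demo against a local temporary directory so it runs without a cluster.
    with tempfile.TemporaryDirectory() as tmp:
        save_rows([["id", "label"], ["1", "spam"]], "datasets/sample.csv", base=tmp)
        print(load_rows("datasets/sample.csv", base=tmp))
```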

Page 21

MapR NFS and Volumes

[mapr@ip-10-0-0-110 mapr]$ pwd
/mapr/hadoopsummit/user/mapr

Page 24

Match a Volume to a Topology

Match data to nodes or groups of nodes precisely
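Placement on a topology is set per volume. As a sketch, assuming the `maprcli volume create` options `-name`, `-path`, and `-topology`, here is a Python helper that builds the command and only runs it when asked; the volume name `ml_training`, mount path `/projects/ml/training`, and topology `/data/gpu-nodes` are hypothetical.

```python
import shlex
import subprocess


def volume_create_cmd(name, mount_path, topology):
    """Build a `maprcli volume create` command that places the volume's data on a node topology."""
    return ["maprcli", "volume", "create",
            "-name", name,
            "-path", mount_path,
            "-topology", topology]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (requires maprcli on the node)."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical: keep training data on a group of nodes registered under /data/gpu-nodes.
    run(volume_create_cmd("ml_training", "/projects/ml/training", "/data/gpu-nodes"))
```

Defaulting to a dry run keeps the helper safe to try on a laptop; set `dry_run=False` on a cluster node once the printed command looks right.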

Page 25

CRISP-DM (Cross-Industry Standard Process for Data Mining) Model

● Industry-standard process model

● Full project view, from business idea to production deployment

● Realistic: many iterations between phases

Page 26

MapR Features for Data Understanding

Data Collection:

• NFS Mount

• POSIX Client

• MapR Streams (Kafka API)

• MapR-DB (HBase API)

Data Exploration:

• <Insert your favorite tool>

Page 27

MapR Features for Data Preparation

Data Cleaning and Feature Engineering (ETL):

• NFS Mount, POSIX Client

• Snapshots

• StreamSets Data Collector with MapR support

• Apache Spark

• <Your favorite tool>
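Snapshots give cleaning and feature-engineering jobs a frozen, read-only input, so a training set can be rebuilt exactly later. A sketch assuming the `maprcli volume snapshot create` command with `-volume` and `-snapshotname`, and the `.snapshot` directory MapR exposes at the root of a mounted volume; the volume `churn`, its mount path, and the snapshot name are hypothetical.

```python
from pathlib import PurePosixPath


def snapshot_create_cmd(volume, snapshot_name):
    """Build the maprcli command that takes a point-in-time snapshot of a volume."""
    return ["maprcli", "volume", "snapshot", "create",
            "-volume", volume, "-snapshotname", snapshot_name]


def snapshot_path(volume_mount, snapshot_name, relative_path):
    """Path of a file as it existed when the snapshot was taken (read-only)."""
    return PurePosixPath(volume_mount) / ".snapshot" / snapshot_name / relative_path


if __name__ == "__main__":
    # Hypothetical volume "churn", seen through the NFS mount of cluster "hadoopsummit".
    print(snapshot_create_cmd("churn", "etl-2016-10-14"))
    print(snapshot_path("/mapr/hadoopsummit/projects/churn", "etl-2016-10-14", "raw/events.csv"))
```

Pointing the ETL job at the snapshot path rather than the live directory means new data arriving during the run cannot change its input.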

Page 28

MapR Features for Modeling

MapR does not “do” machine learning; that’s your job!

• MapR Filesystem

• NFS mount/POSIX client

• Mirrors and Snapshots

• Topologies

• Use your existing tools
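Because MapR-FS is reachable as a normal directory, existing tools can write model artifacts straight to it, and snapshots or mirrors of that volume then capture every model version. A hypothetical convention, sketched in Python: pickle each trained model next to a small JSON metadata file in a new numbered version directory. The layout is an illustration, not a MapR feature.

```python
import json
import pickle
import tempfile
from pathlib import Path


def _version_dirs(root):
    """Numbered version directories (1, 2, 3, ...) directly under root."""
    return [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]


def save_model(model, metrics, models_dir):
    """Pickle the model and write its metrics to meta.json in the next version directory."""
    root = Path(models_dir)
    root.mkdir(parents=True, exist_ok=True)
    next_version = max((int(p.name) for p in _version_dirs(root)), default=0) + 1
    version_dir = root / str(next_version)
    version_dir.mkdir()
    with (version_dir / "model.pkl").open("wb") as f:
        pickle.dump(model, f)
    (version_dir / "meta.json").write_text(json.dumps(metrics))
    return version_dir


def load_latest(models_dir):
    """Return (model, metrics) from the highest-numbered version directory."""
    latest = max(_version_dirs(Path(models_dir)), key=lambda p: int(p.name))
    with (latest / "model.pkl").open("rb") as f:
        model = pickle.load(f)
    return model, json.loads((latest / "meta.json").read_text())


if __name__ == "__main__":
    # A temporary directory stands in for a path such as /mapr/hadoopsummit/user/mapr/models.
    with tempfile.TemporaryDirectory() as tmp:
        save_model({"weights": [0.1, 0.2]}, {"note": "toy model"}, tmp)
        print(load_latest(tmp))
```

Only unpickle files from locations you trust; pickle can execute code on load.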

Page 29

MapR Features for Evaluation

- Collect data
- Explore data

• MapR-FS

• Mirrors

• Snapshots

• Supports any tool
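Whichever tool trains the model, evaluation comes down to comparing its predictions with held-out labels; keeping that test set in a snapshot means every candidate is scored on exactly the same data. A minimal plain-Python sketch of the usual binary-classification metrics; the labels in the usage line are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Report 0.0 instead of dividing by zero when there are no positive predictions/labels.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```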

Page 30

MapR Features for Deployment

- Collect data
- Explore data

• NFS/POSIX client

• Mirrors

• Snapshots

• Microservice model *

• MapR-DB, MapR Streams

• Security

* Check out the converged application blueprint: https://www.mapr.com/appblueprint/overview
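In the microservice model, a scoring service reads requests from one stream topic and publishes predictions to another, so the model can be replaced without touching its producers or consumers. The sketch below is hypothetical: `queue.Queue` objects stand in for MapR Streams topics (named in MapR's `/stream-path:topic` form, and read through the Kafka API in a real deployment), and the threshold rule is a placeholder for a trained model.

```python
import json
import queue

# Hypothetical topic names in MapR Streams' "/stream-path:topic" form; plain queues stand in for them.
REQUESTS_TOPIC = "/apps/scoring:requests"
PREDICTIONS_TOPIC = "/apps/scoring:predictions"


def score(features):
    """Placeholder model: flag a transaction whose amount exceeds a fixed threshold."""
    return "fraud" if features["amount"] > 1000 else "ok"


def serve(requests, predictions, max_messages, timeout=1.0):
    """Consume JSON requests, score them, and publish JSON predictions.

    Stops after max_messages or when no request arrives within timeout seconds.
    """
    for _ in range(max_messages):
        try:
            msg = requests.get(timeout=timeout)
        except queue.Empty:
            break
        event = json.loads(msg)
        predictions.put(json.dumps({"id": event["id"], "label": score(event)}))


if __name__ == "__main__":
    topics = {REQUESTS_TOPIC: queue.Queue(), PREDICTIONS_TOPIC: queue.Queue()}
    for i, amount in enumerate([20, 5000]):
        topics[REQUESTS_TOPIC].put(json.dumps({"id": i, "amount": amount}))
    serve(topics[REQUESTS_TOPIC], topics[PREDICTIONS_TOPIC], max_messages=2)
    while not topics[PREDICTIONS_TOPIC].empty():
        print(topics[PREDICTIONS_TOPIC].get())
```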

Page 31

Converged Data Platform for Machine Learning

• Features that work together to support all phases of real production ML

• Supports all the tools you know and state-of-the-art frameworks

• Easier to manage, more robust and secure.

• MapR is made for the enterprise

Page 32

Demo: ML with H2O on MapR

Page 33

Demo of H2O on MapR: Features in Action

Page 34

Q & A

@mapr

[email protected]

Engage with us!

mapr-technologies