h2o in knime: integrating high performance machine learning · h2o in knime •offer our users...

39
© 2018 KNIME AG. All Rights Reserved. H2O in KNIME: Integrating High Performance Machine Learning Jo-Fai Chow (H2O.ai), Marten Pfannenschmidt (KNIME), Christian Dietz (KNIME)

Upload: others

Post on 27-Apr-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved.

H2O in KNIME: Integrating High Performance Machine Learning

Jo-Fai Chow (H2O.ai), Marten Pfannenschmidt (KNIME),Christian Dietz (KNIME)

Page 2: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

2© 2018 KNIME AG. All Rights Reserved.

H2O for High Performance Machine Learning

Page 3: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

3

Page 4: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

4

Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms

https://www.kdnuggets.com/2018/02/gartner-2018-mq-data-science-machine-learning-changes.html

Page 5: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java

Object

YourImagination

Data Prep Export:Plain Old Java

Object

Local

SQL

High Level Architecture

5

Page 6: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java

Object

YourImagination

Data Prep Export:Plain Old Java

Object

Local

SQL

High Level Architecture

6

Multiple Interfaces

Page 7: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

Supervised Learning

• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie

• Naïve Bayes

Statistical Analysis

Ensembles

• Distributed Random Forest: Classification or regression models

• Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations

Deep Neural Networks

• Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations

Algorithms Overview

Unsupervised Learning

• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k

Clustering

Dimensionality Reduction

• Principal Component Analysis: Linearly transforms correlated variables to independent components

• Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

Anomaly Detection

• Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning

7

Page 8: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

Scientific Advisory Council

8

Page 9: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

H2O on a Single Machine

CPU

Model Building

H2O

Page 10: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

H2O on a Multi-Node Cluster

YARN

CPU CPU CPU

Model Building

H2O Distributed In-Memory

SQL NFS

S3

Firewall or Cloud

Page 11: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

Distributed Algorithms

• Foundation for In-Memory Distributed Algorithm Calculation - Distributed Data Frames and columnar compression

• All algorithms are distributed in H2O: GBM, GLM, DRF, Deep Learning and more. Fine-grained map-reduce iterations.

• Only enterprise-grade, open-source distributed algorithms in the market

User Benefits

Advantageous Foundation

• “Out-of-box” functionalities for all algorithms (NO MORE SCRIPTING) and uniform interface across all languages: R, Python, Java

• Designed for all sizes of data sets, especially large data• Highly optimized Java code for model exports• In-house expertise for all algorithms

Parallel Parse into Distributed Rows

Fine Grain Map Reduce Illustration: Scalable Distributed Histogram Calculation for GBM

Fou

nd

atio

n fo

r D

istr

ibu

ted

Alg

ori

thm

s

11

Page 12: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

Performance Benchmark (GBM)by Szilard Pafka (https://github.com/szilard/benchm-ml)

12

X = million of samples

Time (s)

AUC

• H2O Gradient Boosting Machine (GBM) is the fastest algorithm in the test of 10M Samples.

• Area Under Curve (AUC)

• Comparable with xgboost

• Better than R/Spark MLLib

• Benchmark results for other algorithms (Generalized Linear Model, Random Forest etc.) are available in Szilard’s GitHub repository.

Page 13: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

13© 2018 KNIME AG. All Rights Reserved.

H2O in KNIME

Page 14: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 14

H2O in KNIME

• Offer our users high-performance machine learning algorithms from H2O in KNIME

• Allow to mix & match with other KNIME functionality

– Data wrangling KNIME Analytics Platform functionality

– KNIME Big-Data Connectors

– Text Mining, Image Processing, Cheminformatics, …

– and more!

Page 15: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 15

H2O in KNIME

Live Demo

Page 16: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

16© 2018 KNIME AG. All Rights Reserved.

Customer prediction with H2O in KNIME

Page 17: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 17

Kaggle.com

Get the data Build a model Make a submission

Page 18: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 18

The competition

Date Store ID Visitors

2016-01-01 ba937bf13d40fb24 28

… … …

2017-04-22 324f7c39a8410e7c 216

Date Store ID Visitors

2017-04-23 e8ed9335d0c38333 ?

… … …

2017-05-31 8f13ef0f5e8c64dd ?

Provided data:• Number of visitors• Reservations• Store information• Calendar date info

Page 19: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 19

Solving a Kaggle competition with KNIME and H2O

Data preparation

Model training

Model optimization

Model evaluation

Deployment

Page 20: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 20

Solving a Kaggle competition with KNIME and H2O

Data preparation

Model training

Model optimization

Model evaluation

Deployment

Page 21: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 21

Data preparation with KNIME Nodes

Page 22: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 22

The blended dataset

Date Store ID Visitors Mean visitors Area Reservations Holiday Genre …

2016-01-01 ba937bf13d40fb24 28 23.8 Area1 15 False Bar …

… … … … … … … … …

2017-04-22 324f7c39a8410e7c 216 45.3 Area2 22 True Karaoke …

Date Store ID Visitors Mean visitors Area Reservations Holiday Genre …

2017-04-23 e8ed9335d0c38333 ? 12.5 Area1 10 False Italian …

… … … … … … … … …

2017-05-31 8f13ef0f5e8c64dd ? 54.1 Area7 20 False Izakaya …

Data used for model training

Data used for Kaggle prediction

Page 23: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 23

Solving a Kaggle competition with KNIME and H2O

Data preparation

Model training

Model optimization

Model evaluation

Deployment

Page 24: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 24

Solving a Kaggle competition with KNIME and H2O

Data preparation

Model training

Model optimization

Model evaluation

Deployment

Page 25: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 25

Modeling with the new H2O nodes

Page 26: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 26

Modeling with the new H2O nodes

Page 27: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 27

Modeling with the new H2O nodes

Page 28: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 28

Modeling with the new H2O nodes

Page 29: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 29

Solving a Kaggle competition with KNIME and H2O

Data preparation

Model training

Model optimization

Model evaluation

Deployment

Page 30: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 30

Solving a Kaggle competition with KNIME and H2O

.

Data preparation

Model training

Model optimization

Model evaluation

Deployment

Page 31: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

31© 2018 KNIME AG. All Rights Reserved.

What’s Cooking?

Page 32: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 32

What’s Cooking: Scoring with H2O MOJOs on Spark

• Scoring using MOJOs

Page 33: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 33

What’s Cooking: Scoring with H2O MOJOs on Spark

• New: Scoring using MOJOs on Spark

Page 34: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 34

What’s Cooking: H2O Sparkling Water in KNIME

Page 35: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 35

What’s Cooking: H2O Sparkling Water in KNIME

Page 36: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 36

What’s Cooking: H2O Sparkling Water in KNIME

Page 37: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 37

What’s Cooking: H2O Sparkling Water in KNIME

Page 38: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

© 2018 KNIME AG. All Rights Reserved. 38

What’s Cooking: H2O Sparkling Water in KNIME

Page 39: H2O in KNIME: Integrating High Performance Machine Learning · H2O in KNIME •Offer our users high-performance machine learning algorithms from H2O in KNIME •Allow to mix & match

39© 2018 KNIME AG. All Rights Reserved.

The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States.

KNIME® is also registered in Germany.