Page 1: cuML: A Library for GPU Accelerated Machine Learning · 2019-03-29 · GPU-accelerated machine learning at every layer Scikit-learn-like interface for data scientists utilizing cuDF&

Onur Yilmaz, Ph.D. | [email protected] | Senior ML/DL Scientist and Engineer

Corey Nolet | [email protected] | Data Scientist and Senior Engineer

cuML: A Library for GPU Accelerated Machine Learning


About Us

Corey Nolet

Data Scientist & Senior Engineer on the RAPIDS cuML team at NVIDIA

Focuses on building and scaling machine learning algorithms to support extreme data loads at light-speed

Over a decade of experience building massive-scale exploratory data science & real-time analytics platforms for HPC environments in the defense industry

Working towards PhD in Computer Science, focused on unsupervised representation learning

Onur Yilmaz, Ph.D.

Senior ML/DL Scientist and Engineer on the RAPIDS cuML team at NVIDIA

Focuses on building single- and multi-GPU machine learning algorithms to support extreme data loads at light-speed

Ph.D. in computer engineering, focusing on ML for finance.


Agenda

• Introduction to cuML

• Architecture Overview

• cuML Deep Dive

• Benchmarks

• cuML Roadmap


Introduction

“Details are confusing. It is only by selection, by elimination, by emphasis, that we get to the real meaning of things.”

~ Georgia O'Keeffe, Mother of American Modernism


Realities of Data


Problem: Data sizes continue to grow


min(variance)

min(bias)


Histograms / Distributions

Dimension Reduction

Feature Selection

Remove Outliers

Sampling

Massive Dataset

Better to start with as much data as possible and explore / preprocess to scale to performance needs.

Iterate. Cross Validate & Grid Search.

Iterate some more.

Meet reasonable speed vs accuracy tradeoff

Hours? Days?

Time Increases


ML Workflow Stifles Innovation: It Requires Exploration and Iterations

All Data

ETL

Manage Data

Structured Data Store

Feature Engineering

Training

Model Training

Tuning & Selection

Evaluate

Inference

Deploy

Accelerating just `Model Training` does have benefit but doesn’t address the whole problem


End-to-End acceleration is needed

Iterate … Cross Validate … Grid Search … Iterate some more.
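The loop named on this slide (iterate, cross validate, grid search) multiplies the number of model fits, which is why per-fit time ends up dominating the workflow. A minimal Python sketch of that arithmetic follows. The parameter grid, the fold count, and the per-fit timings are hypothetical values chosen for illustration, not figures from the talk.

```python
def count_model_fits(param_grid, n_folds):
    """Number of fits an exhaustive grid search with k-fold CV performs."""
    n_candidates = 1
    for values in param_grid.values():
        n_candidates *= len(values)
    # Every candidate is trained once per fold.
    return n_candidates * n_folds


def search_hours(param_grid, n_folds, seconds_per_fit):
    """Total wall-clock hours if fits run one after another."""
    return count_model_fits(param_grid, n_folds) * seconds_per_fit / 3600


# Hypothetical search space: 3 * 3 * 2 = 18 candidate settings.
param_grid = {
    "max_depth": [4, 8, 16],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 500],
}

n_fits = count_model_fits(param_grid, n_folds=5)
hours_slow = search_hours(param_grid, n_folds=5, seconds_per_fit=600)  # assumed 10 min per fit
hours_fast = search_hours(param_grid, n_folds=5, seconds_per_fit=15)   # assumed 15 s per fit

print(n_fits, hours_slow, hours_fast)
```

Under these assumed numbers, one pass over the grid is the difference between most of a working day and under half an hour, and each further "iterate some more" repeats that cost.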


Architecture

“More data requires better approaches!”

~ Xavier Amatriain, CTO, CurAI


RAPIDS: OPEN GPU DATA SCIENCE

cuDF, cuML, and cuGraph mimic well-known libraries

[Diagram: pipeline stages Data Preparation, Model Training, and Visualization. Stack components shown: Python, Dask, DL Frameworks, cuDNN, RAPIDS (cuDF, cuML, cuGraph), Apache Arrow, CUDA.]

• cuDF: Pandas-like
• cuML: Scikit-Learn-like
• cuGraph: NetworkX-like


HIGH-LEVEL APIs

[Diagram: layered stack]

• Python: Dask Multi-GPU ML (Dask-cuML) and Scikit-Learn-Like (cuML)
• CUDA/C++ (libcuml): ML Algorithms, ML Primitives, Multi-Node & Multi-GPU Communications
• Hosts: Host 1 and Host 2, each with GPU1, GPU2, GPU3, GPU4


cuML API

GPU-accelerated machine learning at every layer

• Python: Scikit-learn-like interface for data scientists utilizing cuDF & NumPy
• Algorithms: CUDA C++ API for developers to utilize accelerated machine learning algorithms
• Primitives: Reusable building blocks for composing machine learning algorithms


Primitives: GPU-accelerated math optimized for feature matrices

Categories:

• Linear Algebra
• Statistics
• Matrix / Math
• Random
• Distance / Metrics
• Objective Functions
• Sparse Conversions
• More to come!

Operations listed:

• Element-wise operations
• Matrix multiply
• Norms
• Eigen Decomposition
• SVD/RSVD
• Transpose
• QR Decomposition


Algorithms: GPU-accelerated Scikit-Learn

• Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors
• Statistical Inference: Kalman Filtering, Bayesian Inference, Gaussian Mixture Models, Hidden Markov Models
• Clustering: K-Means, DBSCAN, Spectral Clustering
• Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding
• Timeseries Forecasting: ARIMA, Holt-Winters
• Recommendations: Implicit Matrix Factorization
• Cross Validation
• Hyper-parameter Tuning
• More to come!


HIGH-LEVEL APIs

[Diagram: Python layer (Dask Multi-GPU ML, Scikit-Learn-Like) over CUDA/C++ layer (ML Algorithms, ML Primitives, Multi-Node / Multi-GPU Communications), running on Host 1 and Host 2 with GPU1 to GPU4 each. Annotations: Data Distribution, Model Parallelism.]

• Portability
• Efficiency
• Speed


Dask cuML: Distributed Data-parallelism Layer

• Distributed computation scheduler for Python

• Scales up and out

• Distributes data across processes

• Enables model-parallel cuML algorithms
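The data-parallel idea in the bullets above can be sketched in plain Python: split the rows across workers, let each worker compute a small partial result on its own chunk, then combine the partials. This sketch does not use Dask or cuML. The function names (`partition`, `local_stats`, `combine`) are hypothetical, and the "workers" are simulated by a list comprehension in a single process.

```python
def partition(rows, n_workers):
    """Split rows into n_workers contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def local_stats(chunk):
    """Work one worker does on its own chunk: partial count and partial sum."""
    return len(chunk), sum(chunk)


def combine(partials):
    """Reduce step: merge per-worker partials into the global mean."""
    total_n = sum(n for n, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_n


data = [float(i) for i in range(1, 11)]        # 1.0 .. 10.0
chunks = partition(data, n_workers=4)          # stand-in for distributing data
partials = [local_stats(c) for c in chunks]    # stand-in for per-GPU work
global_mean = combine(partials)

print([len(c) for c in chunks], global_mean)
```

Only the small `(count, sum)` pairs cross worker boundaries, not the rows themselves, which is what lets this pattern scale both up and out.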


ML Technology Stack

• Python: Dask cuML, Dask cuDF, cuDF, NumPy
• Cython
• cuML Algorithms
• cuML Prims
• CUDA Libraries: Thrust, CUB, cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBlas
• CUDA


cuML Deep Dive

“I would posit that every scientist is a data scientist.”

~ Arun Subramaniyan, V.P. of Data Science & Analytics, Baker Hughes, a GE Company


Linear Regression (OLS): Python Layer

cuDF

Pandas


cuML

Scikit-Learn


Linear Regression (OLS): cuML Algorithms, CUDA C++ Layer


Linear Regression (OLS): cuML ML-Prims, CUDA C++ Layer


[Diagram: Matrix A with columns c1, c2, c3, …, cN, and Vector b]
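The deep dive frames OLS as a problem over a feature matrix A and a target vector b. One textbook way to solve it is through the normal equations, (AᵀA) w = Aᵀb, sketched below in plain Python with a scikit-learn-style `fit` / `predict` pair. The transcript does not say which solver cuML uses, so treat the method as an illustrative assumption. `TinyLinearRegression` and `_solve` are hypothetical names and are not part of cuML or scikit-learn.

```python
def _solve(M, v):
    """Solve M w = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


class TinyLinearRegression:
    """Hypothetical teaching class that follows the fit/predict convention."""

    def fit(self, X, y):
        # Prepend a column of ones so the first weight is the intercept.
        A = [[1.0] + [float(v) for v in row] for row in X]
        k = len(A[0])
        ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
        atb = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
        w = _solve(ata, atb)
        self.intercept_, self.coef_ = w[0], w[1:]
        return self

    def predict(self, X):
        return [self.intercept_ + sum(c * v for c, v in zip(self.coef_, row))
                for row in X]


# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
model = TinyLinearRegression().fit(X, y)
prediction = model.predict([[6, 7]])
print(model.intercept_, model.coef_, prediction)
```

The products AᵀA and Aᵀb are sums over rows, so they are the kind of matrix-multiply and reduction work that the primitives layer is described as accelerating.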


Benchmarks


ALGORITHMS: Benchmarked on DGX1


UMAP: Released in 0.6!


cuDF + XGBoost: DGX-2 vs Scale Out CPU Cluster

• Full end-to-end pipeline
• Leveraging Dask + cuDF
• Store each GPU's results in system memory, then read back in
• Arrow to DMatrix (CSR) for XGBoost


cuDF + XGBoost: Scale Out GPU Cluster vs DGX-2

[Chart: time in seconds (axis 0 to 350) for 5x DGX-1 and DGX-2, split into ETL+CSV (s), ML Prep (s), and ML (s)]

• Full end-to-end pipeline
• Leveraging Dask for multi-node + cuDF
• Store each GPU's results in system memory, then read back in
• Arrow to DMatrix (CSR) for XGBoost


cuDF + XGBoost: Fully In-GPU Benchmarks

• Full end-to-end pipeline
• Leveraging Dask cuDF
• No data prep time; all in memory
• Arrow to DMatrix (CSR) for XGBoost


XGBoost: Multi-node, Multi-GPU Performance

[Bar chart, axis 0 to 2500. Bar values are paired with the configuration labels in the order both appear in the transcript.]

• 20 CPU Nodes: 2290
• 30 CPU Nodes: 1956
• 50 CPU Nodes: 1999
• 100 CPU Nodes: 1948
• DGX-2: 169
• 5x DGX-1: 157

Benchmark

200GB CSV dataset; Data preparation includes joins, variable transformations.

CPU Cluster Configuration

CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark

DGX Cluster Configuration

DGX nodes on InfiniBand network


Single Node Multi-GPU

Linear Regression

• Reduction: 40 min -> 1 min
• Size: 225 GB
• System: DGX-2

tSVD

• Reduction: 1.6 hrs -> 1.5 min
• Size: 220 GB
• System: DGX-2

Nearest Neighbors

• Reduction: 4+ hrs -> 30 sec
• Size: 128 GB
• System: DGX-1

Will be released in 0.6
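The reductions quoted on this slide can be turned into speedup factors with simple arithmetic. The sketch below uses only the before and after times stated above. The "4+ hrs" figure is treated as exactly 4 hours, so the Nearest Neighbors result is a lower bound.

```python
MINUTE = 60
HOUR = 3600


def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


# (before, after) in seconds, taken from the slide.
benchmarks = {
    "Linear Regression": (40 * MINUTE, 1 * MINUTE),
    "tSVD": (96 * MINUTE, 90),            # 1.6 hrs = 96 min; 1.5 min = 90 s
    "Nearest Neighbors": (4 * HOUR, 30),  # "4+ hrs": lower bound
}

speedups = {name: speedup(b, a) for name, (b, a) in benchmarks.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x")
```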


Roadmap

“Data science is the fourth pillar of the scientific method!”

~ Jensen Huang


CUML: Single GPU and XGBoost

[Table: columns SG, MG, MGMN; the per-cell marks did not survive extraction. Rows:]

• Gradient Boosted Decision Trees (GBDT)
• GLM
• Logistic Regression
• Random Forest (regression)
• K-Means
• K-NN
• DBSCAN
• UMAP
• ARIMA
• Kalman Filter
• Holt-Winters
• Principal Components
• Singular Value Decomposition


DASK-CUML: OLS, tSVD, and KNN in RAPIDS 0.6


DASK-CUML: K-Means*, DBSCAN & PCA in RAPIDS 0.7/0.8

• Deprecating the current K-Means in 0.6 for a new K-Means built on ML-Prims


cuML 0.6

New Algorithms

• Stochastic Gradient Descent [Single GPU]

• UMAP [Single GPU]

• Linear Regression (OLS) [Single Node, Multi-GPU]

• Truncated SVD [Single Node, Multi-GPU]

Will be released with RAPIDS 0.6 on Friday!

Notable Improvements

• Exposing support for hyperparameter tuning

• Removing external requirement on FAISS

• Lowered Nearest Neighbors memory requirement


Thank you!

https://rapids.ai

https://github.com/cuml

https://github.com/dask-cuml

Corey Nolet: @cjnolet

Onur Yilmaz: @Onur02128993