GPU Computing with Python and Anaconda: The Next Frontier

Upload: nvidia

Post on 21-Jan-2018

Page 1: GPU Computing with Python and Anaconda: The Next Frontier

© 2017 Anaconda, Inc. - Confidential & Proprietary

GPU Computing with Python and Anaconda: The Next Frontier

Accelerate. Connect. Empower.

Stan Seibert

Director of Community Innovation

Page 2: GPU Computing with Python and Anaconda: The Next Frontier

GPUs & Python: A Great Combination

• Python is becoming the glue that binds data science
• Rapid integration empowers data scientists to combine new technologies
• This is our goal for Anaconda:
  • Free distribution of Python and R for Win/Mac/Linux
  • Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...

Page 3: GPU Computing with Python and Anaconda: The Next Frontier

Deep Learning: An Early Success

[Figure: diagram with four blocks labeled ReLU]

• Powerful machine learning technique
• Many great open source options
• Every major package has a Python interface
• Very compute intensive
➡ Perfect for GPU acceleration

Page 4: GPU Computing with Python and Anaconda: The Next Frontier

Numba: JIT Python Compilation

• Compile numerical Python functions for CPU or GPU
• Based on the LLVM compiler library
• Great for rapid, custom algorithm development
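As a rough illustration of the kind of numerical function Numba targets, here is a minimal CPU sketch. The function name `l2_norm` is made up for this example, and the `try`/`except` fallback is an assumption added so the snippet still runs (uncompiled) where Numba is not installed. GPU targets use Numba's separate `numba.cuda` decorators and need a CUDA device, so they are not shown.

```python
import math

try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for this sketch: fall back to plain Python without Numba.
    def njit(func):
        return func


@njit
def l2_norm(values):
    """Euclidean norm as an explicit loop, the style a JIT compiler handles well."""
    total = 0.0
    for v in values:
        total += v * v
    return math.sqrt(total)


# A tuple of floats works both with and without Numba.
result = l2_norm((3.0, 4.0))
print(result)
```

The first call triggers compilation for the argument types it sees; later calls with the same types reuse the compiled code.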

Page 5: GPU Computing with Python and Anaconda: The Next Frontier

Problem: An Ecosystem of Silos?

[Diagram: separate GPU applications (ETL/Data Prep, Database, Machine Learning, Visualization), each with its own Data]

Page 6: GPU Computing with Python and Anaconda: The Next Frontier

Problem: An Ecosystem of Silos?

[Diagram: separate GPU applications (ETL/Data Prep, Database, Machine Learning, Visualization), each with its own Data, linked by three "CPU transfer" arrows]

Page 7: GPU Computing with Python and Anaconda: The Next Frontier

Problem: An Ecosystem of Silos?

[Diagram: separate GPU applications (ETL/Data Prep, Database, Machine Learning, Visualization), each with its own Data, linked by three "CPU transfer" arrows]

Why do GPU applications share data through slow CPU memory?

Page 8: GPU Computing with Python and Anaconda: The Next Frontier

GPU Open Analytics Initiative

Goal: Standardize data exchange between GPU analytics applications

Current Members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock

http://gpuopenanalytics.com/

Page 9: GPU Computing with Python and Anaconda: The Next Frontier

Streamlining the Data Science Pipeline

[Diagram: pipeline stages GPU Database, Python Data Transformation, Generalized Linear Model; data exchange labels GDF, Packed Array, Apache Arrow]

All data stays on the GPU

Page 10: GPU Computing with Python and Anaconda: The Next Frontier

GPU Dataframe (GDF)

• A format for tabular data in GPU memory
• Exchange GDF between different libraries
• Move between processes using CUDA IPC
• Based on Apache Arrow
• Code in separate library
• Work in progress to move functionality into Arrow project
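To make "based on Apache Arrow" more concrete: Arrow's columnar layout stores each column as a contiguous values buffer plus a validity bitmap that marks nulls. The sketch below mimics that layout in ordinary host memory using only the standard library. The `Column` class is a hypothetical helper for illustration, not the GDF or Arrow API, and real GDF buffers live in GPU memory.

```python
from array import array


class Column:
    """Hypothetical column: contiguous values buffer plus validity bitmap."""

    def __init__(self, typecode, items):
        # Null entries (None) get a placeholder 0 in the values buffer.
        self.values = array(typecode, [0 if x is None else x for x in items])
        self.validity = bytearray((len(items) + 7) // 8)
        for i, x in enumerate(items):
            if x is not None:
                # Arrow numbers bits least-significant first; 1 means valid.
                self.validity[i // 8] |= 1 << (i % 8)

    def __len__(self):
        return len(self.values)

    def is_valid(self, i):
        return bool((self.validity[i // 8] >> (i % 8)) & 1)

    def to_list(self):
        return [self.values[i] if self.is_valid(i) else None
                for i in range(len(self))]


# A table is then a set of named columns of equal length.
table = {
    "id": Column("q", [1, 2, 3, 4]),
    "score": Column("d", [0.5, None, 2.5, None]),
}
print(table["score"].to_list())
print(bin(table["score"].validity[0]))
```

Because every column is a flat buffer with a known type, a library can hand over the buffers themselves rather than converting row by row, which is what makes a shared format attractive for exchange.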

Page 11: GPU Computing with Python and Anaconda: The Next Frontier

PyGDF: Python GPU Dataframes

• A Python library for manipulating GPU Dataframes:
  • Create from NumPy arrays and Pandas Dataframes
  • Exchange between processes
  • Math operations
  • Sort, Filter, Join, Group By
• Ideal for data manipulation and feature engineering stages between data source and machine learning
• Not intended to replace dedicated database applications
• Interoperates with our Python compiler for GPU: Numba
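For readers unfamiliar with dataframe operations, the following CPU-only sketch shows what a filter followed by a group-by aggregation computes on column-oriented data. It uses plain Python lists and hypothetical helper names (`filter_rows`, `group_by_sum`); it is not the PyGDF API and says nothing about how PyGDF implements these operations on the GPU.

```python
from collections import defaultdict

# Hypothetical columnar table: one Python list per column.
columns = {
    "store": ["a", "b", "a", "c", "b", "a"],
    "units": [3, 5, 2, 7, 1, 4],
}


def filter_rows(table, column, predicate):
    """Keep the rows where predicate(table[column][i]) is true."""
    keep = [i for i, v in enumerate(table[column]) if predicate(v)]
    return {name: [col[i] for i in keep] for name, col in table.items()}


def group_by_sum(table, key, value):
    """Sum `value` within each distinct `key`; keys come back sorted."""
    totals = defaultdict(int)
    for k, v in zip(table[key], table[value]):
        totals[k] += v
    keys = sorted(totals)
    return {key: keys, value: [totals[k] for k in keys]}


filtered = filter_rows(columns, "units", lambda u: u >= 2)
summary = group_by_sum(filtered, "store", "units")
print(summary)
```

Each step takes whole columns in and returns whole columns out, which is the shape of work that suits feature engineering between a data source and a machine learning stage.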

Page 12: GPU Computing with Python and Anaconda: The Next Frontier

PyGDF: Group By Performance

GPU speedup becomes very large above 10 million elements

Aggregation functions are extremely efficient on the GPU

Page 13: GPU Computing with Python and Anaconda: The Next Frontier

Dask: Distributed Computing

• Scalable execution of task graphs, from single computers to 1000+ node clusters
• Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!
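The two ideas on this slide, task graphs and resource-aware placement, can be sketched without Dask itself. The graph below uses the dictionary form Dask documents for its graphs (a key maps to a literal or to a tuple of a callable and its arguments). Everything else is a toy stand-in invented for this example: the `needs_gpu` set, the `workers` table, and the `pick_worker` and `execute` helpers are assumptions, not Dask's scheduler.

```python
from operator import add, mul

# Task graph in dict form: key -> literal, or (callable, *args).
graph = {
    "x": 2,
    "y": 3,
    "sum": (add, "x", "y"),
    "scaled": (mul, "sum", 10),
}

# Hypothetical annotations: which tasks need a GPU, what each worker offers.
needs_gpu = {"scaled"}
workers = {"cpu-node": {"GPU": 0}, "gpu-node": {"GPU": 1}}


def pick_worker(key):
    """Return the first worker whose resources satisfy the task."""
    for name, resources in workers.items():
        if key not in needs_gpu or resources["GPU"] > 0:
            return name
    raise RuntimeError(f"no worker can run {key!r}")


def execute(graph, key, cache, placement):
    """Evaluate `key` recursively, recording where each task would run."""
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        resolved = [
            execute(graph, a, cache, placement)
            if isinstance(a, str) and a in graph else a
            for a in args
        ]
        placement[key] = pick_worker(key)
        cache[key] = func(*resolved)
    else:
        cache[key] = task  # literal value, nothing to schedule
    return cache[key]


placement = {}
result = execute(graph, "scaled", {}, placement)
print(result, placement)
```

In this toy run the GPU-tagged task lands on the only worker that advertises a GPU, while the untagged task goes to the first worker available, which is the behavior the slide describes for heterogeneous clusters.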

Page 14: GPU Computing with Python and Anaconda: The Next Frontier

The Future

• In flight:

• Merger of common code into Apache Arrow GPU support

• Node.js interface to GDF (Graphistry)

• Dask GDF: Distributed GPU dataframe

• Other potential future projects:

• Tensor exchange between Python GPU libraries

• GPU shared memory service (Plasma for GPU)

• Can we improve the interaction of unified memory and IPC?

• What do you want to see?

Page 15: GPU Computing with Python and Anaconda: The Next Frontier

Learn More

GPU Open Analytics Website: http://gpuopenanalytics.com

GOAI Github Organization: https://github.com/gpuopenanalytics/

GOAI Google Group: https://groups.google.com/forum/#!forum/gpuopenanalytics