Deep Learning at Scale

Proprietary and confidential. Do not distribute. Nervana’s Deep Learning Platform. MAKING MACHINES SMARTER.™ Hanlin Tang, PhD, Algorithms Engineer

Uploaded by: nervana-systems

Posted on 16-Apr-2017


TRANSCRIPT

Page 1: Deep Learning at Scale

Proprietary and confidential. Do not distribute.

Nervana’s Deep Learning Platform

MAKING MACHINES SMARTER.™

Hanlin Tang, PhD, Algorithms Engineer

Page 2

Facebook DeepMask

Silver et al., 2016

The Atlantic, March 2016

“The error rate has been cut by a factor of two in all the languages, more than a factor of two in many cases. That’s mostly due to deep learning and the way we have optimized it …”

Alex Acero, Siri Senior Director, Apple. Article in Backchannel/WIRED, Aug 2016

Deep Learning

Page 3

neon deep learning framework

train, deploy, explore

nervana engine

Fastest deep learning framework

nervana cloud

Page 4

• Unprecedented computing power
• 10x speedup over current Maxwell GPUs (~55 TeraOps)
• 32 GB High-Bandwidth Memory
• Six bi-directional high-bandwidth links for 3D torus interconnect
• 8 chips in a box, seamlessly scale to multiple chassis
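Six bidirectional links per chip match what a 3D torus requires: every node connects to its -1 and +1 neighbor along each of x, y, and z, and the edges wrap around. A minimal Python sketch of that neighbor rule follows, assuming chips are addressed by (x, y, z) coordinates in a regular torus. The function name `torus_neighbors` and the 4 x 4 x 4 size are hypothetical, and nothing here models the Nervana Engine's actual routing or chassis layout.

```python
def torus_neighbors(node, dims):
    """Return the six neighbors of `node` = (x, y, z) in a 3D torus of size `dims`.

    Each axis wraps around (coordinate modulo the axis length), so a node has
    one neighbor in the -1 and +1 direction along x, y and z: six links total.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, +1):
            coord = list(node)
            # Wraparound: stepping off one edge re-enters at the opposite edge.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical torus size, for illustration only
    # (0, 0, 0) links to (3,0,0), (1,0,0), (0,3,0), (0,1,0), (0,0,3), (0,0,1)
    for neighbor in torus_neighbors((0, 0, 0), dims):
        print(neighbor)
```

When every dimension has size 3 or more, each node gets six distinct neighbors; in a dimension of size 2 the -1 and +1 neighbors coincide.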

Page 5

https://github.com/NervanaSystems/neon

Page 6

• https://github.com/NervanaSystems/ModelZoo
• Pre-trained weights and models

SegNet

Deep Speech 2

Skip-thought

Autoencoders

Deep Dream

Page 7

Badrinarayanan et al., 2015

Page 8

          neon (ms)  Caffe (ms)  Speed-up
Forward         101         719      7.1x
Backward        164         746      4.5x
Total           265        1455      5.5x
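The speed-up column is Caffe time divided by neon time for each phase. A short Python check of that arithmetic, using only the millisecond timings reported on this slide (the helper name `speedup` is illustrative):

```python
# Timings in milliseconds as reported on the slide: (neon, Caffe).
timings_ms = {
    "Forward": (101, 719),
    "Backward": (164, 746),
    "Total": (265, 1455),
}


def speedup(neon_ms, caffe_ms):
    """Ratio of Caffe time to neon time; values above 1 mean neon is faster."""
    return caffe_ms / neon_ms


if __name__ == "__main__":
    for phase, (neon_ms, caffe_ms) in timings_ms.items():
        print(f"{phase:<8} {speedup(neon_ms, caffe_ms):.1f}x")
```

Rounded to one decimal place, the ratios reproduce the 7.1x, 4.5x, and 5.5x figures in the table.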

Page 9

neon v1.6 + mgpu v1.6

neon v2.0
• Modular dataloader (aeon)
• Neural machine translation model

neon v3.0
• Nervana Graph
• TensorFlow interoperability
• Graph-enabled models
• Distributed computing

Page 10

Page 11

“Training neural networks is a dark art.”

Hyperparameters:
• Number and type of units/layers
• Convolution filter size
• Weight initialization
• Optimization method
• Learning rate schedule
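The slide lists the hyperparameters but not how to search over them. One common, simple approach is random search: sample each hyperparameter independently, evaluate the configuration, and keep the best one. The Python sketch below shows that generic technique, not Nervana's tuning method. The value ranges, `SEARCH_SPACE`, and `toy_objective` (a stand-in for training a model and returning validation error) are all hypothetical.

```python
import math
import random

# Hypothetical search space built from the hyperparameters on the slide.
# The candidate values are illustrative assumptions, not recommendations.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5],
    "layer_type": ["conv", "affine"],
    "filter_size": [3, 5, 7],
    "weight_init": ["gaussian", "uniform", "glorot"],
    "optimizer": ["sgd_momentum", "rmsprop", "adam"],
    "lr_schedule": ["constant", "step_decay"],
}


def sample_config(rng):
    """Draw one configuration: each categorical choice independently at random."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    # Learning rate sampled log-uniformly between 1e-4 and 1e-1.
    config["learning_rate"] = 10 ** rng.uniform(-4, -1)
    return config


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations; return the lowest-scoring one."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_config, best_score = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Hypothetical stand-in for 'train the model, return validation error'."""
    return abs(math.log10(config["learning_rate"]) + 2.5) + 0.1 * config["num_layers"]


if __name__ == "__main__":
    best, score = random_search(toy_objective, n_trials=50)
    print(best, round(score, 3))
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude rather than clustering them near the upper bound of the range.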

Page 12

Page 13

Command-line client

Web interface

Page 14

Nervana in action

Healthcare: Tumor detection

Automotive: Speech interfaces

Finance: Time-series search engine

[Figure: positive and negative example images]

Agricultural Robotics

Oil & Gas

[Figure: positive and negative example images]

Proteomics: Sequence analysis

[Figure: query and results]

Page 15