Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
DESCRIPTION
As the data world undergoes its Cambrian explosion phase, our data tools need to become more advanced to keep pace. Deep Learning has emerged as a key tool in the non-linear arms race of machine learning. In this session we will take a look at how we parallelize Deep Belief Networks in Deep Learning on Hadoop’s next generation YARN framework with Iterative Reduce. We’ll also look at some real world examples of processing data with Deep Learning such as image classification and natural language processing.
TRANSCRIPT
![Page 1: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/1.jpg)
Deep Learning on Hadoop
Scaleout Deep Learning on YARN
![Page 2: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/2.jpg)
Adam Gibson
• Email:
• Twitter: @agibsonccc
• Github: https://github.com/agibsonccc
• Slideshare: http://slideshare.net/agibsonccc/
• Instructor at http://zipfianacademy.com/
• Wired Coverage: http://www.wired.com/2014/06/skymind-deep-learning/
![Page 3: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/3.jpg)
Josh Patterson
• Email:
• Twitter: @jpatanooga
• Github: https://github.com/jpatanooga
• Past
  • Published in IAAI-09: “TinyTermite: A Secure Routing Algorithm”
  • Grad work in Meta-heuristics, Ant-algorithms
  • Tennessee Valley Authority (TVA): Hadoop and the Smartgrid
  • Cloudera: Principal Solution Architect
• Today: Patterson Consulting
![Page 4: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/4.jpg)
Overview
• What is Deep Learning?
• Deep Belief Networks
• Implementation on Hadoop/YARN
• Results
![Page 5: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/5.jpg)
What is Deep Learning?
![Page 6: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/6.jpg)
What is Deep Learning?
• An algorithm that tries to learn simple features in lower layers
• And more complex features in higher layers
![Page 7: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/7.jpg)
Interesting Properties of Deep Learning
• Reduces a problem with overfitting in neural networks.
• Introduces new techniques for "unsupervised feature learning"
  • Introduces new, more automatic ways to figure out the parts of your data you should feed into your learning algorithm.
![Page 8: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/8.jpg)
Chasing Nature
• Learning sparse representations of auditory signals
  • Leads to filters that closely correspond to neurons in early audio processing in mammals
• When applied to speech
  • Learned representations showed a striking resemblance to the cochlear filters in the auditory cortex
![Page 9: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/9.jpg)
Yann LeCun on Deep Learning
• Has become the dominant method for acoustic modeling in speech recognition
• Quickly becoming the dominant method for several vision tasks such as
  • object recognition
  • object detection
  • semantic segmentation
![Page 10: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/10.jpg)
Deep Belief Networks
![Page 11: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/11.jpg)
What is a Deep Belief Network?
• Generative probabilistic model
• Composed of one visible layer and many hidden layers
• Each hidden layer learns the relationship between units in the lower layer
• Higher layer representations tend to become more complex
![Page 12: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/12.jpg)
Restricted Boltzmann Machines
• Unsupervised model: does feature learning by repeated sampling of the input data.
• Learns how to reconstruct data for good feature detection.
• RBMs have different formulas for different kinds of data:
  • Binary
  • Continuous
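A minimal sketch of how a binary RBM can learn to reconstruct its input by repeated sampling. It assumes one-step contrastive divergence (CD-1), which the slide does not name explicitly, and it is plain Python for illustration, not DL4J code. The names `BinaryRBM`, `cd1_update`, and `train` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Draw one Bernoulli sample per probability."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


class BinaryRBM:
    """Tiny binary RBM trained with one step of contrastive divergence (assumed)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # w[i][j] connects visible unit i to hidden unit j.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden units, reconstruct, then re-infer.
        v1 = self.visible_probs(sample(h0, self.rng))
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])
        # Squared reconstruction error, returned only for monitoring.
        return sum((a - b) ** 2 for a, b in zip(v0, v1))


def train(rbm, data, epochs=500, lr=0.1):
    """Return the mean reconstruction error of each epoch."""
    errors = []
    for _ in range(epochs):
        errors.append(sum(rbm.cd1_update(v, lr) for v in data) / len(data))
    return errors


# Two toy binary patterns; the RBM should learn to reconstruct both.
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
rbm = BinaryRBM(n_visible=4, n_hidden=2)
errors = train(rbm, data)
```

The reconstruction error is only a rough progress signal. A continuous-valued RBM would replace the Bernoulli visible units with a different conditional distribution, which is not shown here.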
![Page 13: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/13.jpg)
DeepLearning4J
• Implementation in Java
• Self-contained & built on Akka, Hazelcast, Jblas
• Distributed to run faster and with more features than current Theano-based implementations.
• Talks to any data source, expects one format.
![Page 14: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/14.jpg)
Vectorized Implementation
• Handles lots of data concurrently.
  • Any number of examples at once, but the code does not change.
• Faster: Allows for native/GPU execution.
• One format: Everything is a matrix.
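To illustrate the "any number of examples at once" point, here is a small sketch in plain Python. It assumes a single sigmoid layer and treats a batch as a matrix with one example per row. The helper names `matmul` and `layer_forward` are hypothetical and are not part of DL4J.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def layer_forward(batch, weights, bias):
    """Apply one sigmoid layer to every row of `batch` with a single matrix product."""
    pre = matmul(batch, weights)
    return [[1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(row, bias)]
            for row in pre]


weights = [[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]]  # 3 inputs -> 2 hidden units
bias = [0.0, 0.1]

one_example = [[1.0, 0.0, 1.0]]
many_examples = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

# The same function handles one row or many rows; only the matrix shape changes.
single = layer_forward(one_example, weights, bias)
batched = layer_forward(many_examples, weights, bias)
```

Because the whole batch goes through one matrix product, a native or GPU matrix library could replace `matmul` without touching `layer_forward`.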
![Page 15: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/15.jpg)
DL4J vs Theano Perf
• GPUs are inherently faster than normal native execution.
• Theano is not distributed, and GPUs have very low RAM.
• DL4J allows for situations where you have to “throw CPUs at it.”
![Page 16: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/16.jpg)
What are Good Applications for Deep Learning?
• Image Processing
  • High MNIST Scores
• Audio Processing
  • Current Champ on TIMIT dataset
• Text / NLP Processing
  • Word2vec, etc.
![Page 17: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/17.jpg)
Deep Learning on Hadoop
![Page 18: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/18.jpg)
Past Work: Parallel Iterative Algorithms on YARN
• Started with
  • Parallel linear, logistic regression
  • Parallel Neural Networks
• Packaged in Metronome
  • 100% Java, ASF 2.0 Licensed, on GitHub
![Page 19: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/19.jpg)
Parameter Averaging
• McDonald, 2010
  • Distributed Training Strategies for the Structured Perceptron
• Langford, 2007
  • Vowpal Wabbit
• Jeff Dean’s Work on Parallel SGD
  • Downpour SGD
![Page 20: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/20.jpg)
MapReduce vs. Parallel Iterative
[Figure: MapReduce (Input, then Map / Map / Map, then Reduce / Reduce, then Output) compared with a parallel iterative model (Processor / Processor / Processor across Superstep 1, Superstep 2, and so on)]
![Page 21: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/21.jpg)
SGD: Serial vs Parallel
[Figure: Serial SGD (Training Data feeding a single Model) compared with parallel SGD (Split 1, Split 2, Split 3, … feeding Worker 1, Worker 2, … Worker N; each worker produces a Partial Model, and the Master holds the Global Model)]
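The worker/master layout in this slide can be sketched as a loop of supersteps: each worker runs SGD on its own split starting from the global model, and the master averages the partial models. This is a toy illustration under stated assumptions: it uses least-squares linear regression rather than a Deep Belief Network, it runs the "workers" sequentially in one process, and the names `sgd_pass`, `average`, and `train` are hypothetical rather than the Iterative Reduce API.

```python
def sgd_pass(weights, split, lr):
    """One worker pass: plain SGD for least-squares linear regression on one split."""
    w = list(weights)
    for features, target in split:
        error = sum(wi * xi for wi, xi in zip(w, features)) - target
        w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


def average(models):
    """Master step: element-wise mean of the workers' partial models."""
    return [sum(column) / len(models) for column in zip(*models)]


def train(splits, n_features, supersteps=50, lr=0.05):
    global_model = [0.0] * n_features
    for _ in range(supersteps):
        # Each worker starts from the current global model and sees only its split.
        partial_models = [sgd_pass(global_model, split, lr) for split in splits]
        # The master combines the partial models into the next global model.
        global_model = average(partial_models)
    return global_model


# Toy data generated from y = 2*x0 - 1*x1, dealt round-robin into 3 splits.
records = [((float(a), float(b)), 2.0 * a - 1.0 * b)
           for a in range(-2, 3) for b in range(-2, 3)]
splits = [records[i::3] for i in range(3)]
model = train(splits, n_features=2)
```

On this noiseless toy data the averaged model should approach the generating weights `[2.0, -1.0]`. In a real cluster each `sgd_pass` call would run on a separate worker, and only the partial models would cross the network.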
![Page 22: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/22.jpg)
Managing Resources
• Running through YARN on Hadoop is important
  • Allows for workflow scheduling
  • Allows for scheduler oversight
• Allows the jobs to be first-class citizens on Hadoop
  • And share resources nicely
![Page 23: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/23.jpg)
Parallelizing Deep Belief Networks
• Two-phase training
  • Pre-train
  • Fine-tune
• Each phase can do multiple passes over the dataset
• Entire network is averaged at master
![Page 24: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/24.jpg)
PreTrain and Lots of Data
• We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks
  • Allows for the use of far less unlabeled data
  • Allows us to more easily model the massive amounts of structured data in HDFS
![Page 25: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/25.jpg)
Results
![Page 26: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/26.jpg)
DBNs on IR Performance
• Faster to train.
• Parameter averaging is an automatic form of regularization.
• Adagrad with IR allows for better generalization of different features and even pacing.
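For readers unfamiliar with Adagrad, here is a minimal sketch of the standard update rule, which gives each parameter its own step size based on its gradient history. It shows one plausible reading of "even pacing": parameters with very different gradient scales end up moving at a similar rate. The `Adagrad` class below is a hypothetical illustration, not the implementation used in the talk.

```python
import math


class Adagrad:
    """Per-parameter step size: lr / sqrt(sum of squared past gradients)."""

    def __init__(self, n_params, lr=0.1, eps=1e-8):
        self.lr = lr
        self.eps = eps  # avoids division by zero before any gradient is seen
        self.hist = [0.0] * n_params

    def step(self, params, grads):
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.hist[i] += g * g
            updated.append(p - self.lr * g / (math.sqrt(self.hist[i]) + self.eps))
        return updated


opt = Adagrad(n_params=2, lr=0.1)
params = [1.0, 1.0]
# The first parameter sees large gradients, the second sees small ones.
for _ in range(3):
    params = opt.step(params, grads=[10.0, 0.1])
```

With constant gradients as in this toy run, both parameters take almost identical steps even though their raw gradients differ by a factor of 100.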
![Page 27: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/27.jpg)
Scale Out Metrics
• Batches of records can be processed by as many workers as there are data splits
• Message passing overhead is minimal
• Exhibits linear scaling
  • Example: 3x workers, 3x faster learning
![Page 28: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/28.jpg)
Usage From Command Line
• Run Deep Learning on Hadoop
  yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]
• Evaluate model
  ./score_model.sh [props file]
![Page 29: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/29.jpg)
Handwriting Renders
![Page 30: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/30.jpg)
Faces Renders
![Page 31: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/31.jpg)
…In Which We Gather Lots of Cat Photos
![Page 32: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/32.jpg)
Future Direction
• GPUs
• Better vectorization tooling
• Move YARN version back over to JBLAS for matrices
![Page 33: Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop](https://reader035.vdocuments.mx/reader035/viewer/2022081412/53fdd9ed8d7f72a81c8b4b4d/html5/thumbnails/33.jpg)
References
• “A Fast Learning Algorithm for Deep Belief Nets”, Hinton, G. E., Osindero, S. and Teh, Y. - Neural Computation (2006)
• “Large Scale Distributed Deep Networks”, Dean, Corrado, Monga - NIPS (2012)
• “Visually Debugging Restricted Boltzmann Machine Training with a 3D Example”, Yosinski, Lipson - Representation Learning Workshop (2012)