![Page 1: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/1.jpg)
Reza Zadeh
Matrix Computations and Neural Networks in Spark
@Reza_Zadeh | http://reza-zadeh.com
Paper: http://arxiv.org/abs/1509.02256
Joint work with many folks on the paper.
![Page 2: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/2.jpg)
Training Neural Networks
Datasets are growing faster than processing speeds
Solution is to parallelize on clusters and GPUs
How do we train neural networks on these things?
![Page 3: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/3.jpg)
Outline
» Resilient Distributed Datasets and Spark
» Key idea behind mllib.linalg
» The Three Dimensions of Machine Learning
» Neural Networks in Spark 1.5
» Local Linear Algebra
» Future of Deep Learning in Spark
![Page 4: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/4.jpg)
Traditional Network Programming
Message-passing between nodes (e.g. MPI)
Very difficult to do at scale:
» How to split problem across nodes?
  • Must consider network & data locality
» How to deal with failures? (inevitable at scale)
» Even worse: stragglers (node not failed, but slow)
» Ethernet networking not fast
» Have to write programs for each machine
Rarely used in commodity datacenters
![Page 5: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/5.jpg)
Spark Computing Engine
Extends a programming language with a distributed collection data-structure
» “Resilient distributed datasets” (RDD)
Open source at Apache
» Most active community in big data, with 75+ companies contributing
Clean APIs in Java, Scala, Python, R
![Page 6: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/6.jpg)
Resilient Distributed Datasets (RDDs)
Main idea: Resilient Distributed Datasets
» Immutable collections of objects, spread across cluster
» Statically typed: RDD[T] has objects of type T

    import org.apache.spark.SparkContext

    val sc = new SparkContext()
    val lines = sc.textFile("log.txt") // RDD[String]

    // Transform using standard collection operations (lazily evaluated)
    val errors = lines.filter(_.startsWith("ERROR"))
    val messages = errors.map(_.split('\t')(2))

    // Saving the result kicks off a computation
    messages.saveAsTextFile("errors.txt")
![Page 7: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/7.jpg)
MLlib: Available algorithms
» classification: logistic regression, linear SVM, naïve Bayes, least squares, classification tree, neural networks
» regression: generalized linear models (GLMs), regression tree
» collaborative filtering: alternating least squares (ALS), non-negative matrix factorization (NMF)
» clustering: k-means||
» decomposition: SVD, PCA
» optimization: stochastic gradient descent, L-BFGS
![Page 8: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/8.jpg)
Key idea in mllib.linalg
![Page 9: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/9.jpg)
Key Idea
» Distribute matrix across the cluster
» Ship computations involving matrix to cluster
» Keep vector operations local to driver
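
The key idea above can be sketched in plain Scala, with ordinary collections standing in for the cluster. This is an illustration under stated assumptions, not MLlib code: the object `RowPartitionedMultiply`, the `Row` alias and the `multiply` helper are hypothetical names, and each inner `Seq` plays the role of one worker's partition.

```scala
object RowPartitionedMultiply {
  // One matrix row together with its row index. Hypothetical type for this sketch.
  type Row = (Int, Array[Double])

  // Dot product of one matrix row with the local vector.
  def dot(row: Array[Double], v: Array[Double]): Double =
    row.zip(v).map { case (a, b) => a * b }.sum

  // Each partition computes its slice of A * v using its own copy of v.
  // Only the small per-row results come back to the "driver", where they
  // are ordered by row index into a local vector.
  def multiply(partitions: Seq[Seq[Row]], v: Array[Double]): Array[Double] = {
    val partial: Seq[(Int, Double)] =
      partitions.flatMap(part => part.map { case (i, row) => (i, dot(row, v)) })
    partial.sortBy(_._1).map(_._2).toArray
  }
}
```

The matrix rows never leave their partition; only the vector (size n) is shipped out and only the result (size n) is collected.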
![Page 10: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/10.jpg)
Simple Observation
Matrices are often quadratically larger than vectors
A: n x n (matrix), O(n^2)
v: n x 1 (vector), O(n)
Even n = 1 million makes a cluster necessary
![Page 11: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/11.jpg)
Distributing Matrices
How to distribute a matrix across machines?
» By Entries (CoordinateMatrix)
» By Rows (RowMatrix)
» By Blocks (BlockMatrix)
All of Linear Algebra to be rebuilt using these partitioning schemes
As of version 1.3
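
The three partitioning schemes can be sketched on a small local matrix. This is a plain-Scala illustration, not the MLlib classes named above: `Entry`, `byEntries`, `byRows` and `byBlocks` are hypothetical names, and in Spark each resulting record would be one element of an RDD. The sketch assumes a non-empty, rectangular matrix.

```scala
object MatrixLayouts {
  // One non-zero cell: (row, column, value).
  final case class Entry(i: Int, j: Int, value: Double)

  // By entries: a flat collection of the non-zero cells.
  def byEntries(a: Array[Array[Double]]): Seq[Entry] =
    for {
      i <- a.indices
      j <- a(i).indices
      if a(i)(j) != 0.0
    } yield Entry(i, j, a(i)(j))

  // By rows: each record is one full row, keyed by its index.
  def byRows(a: Array[Array[Double]]): Seq[(Int, Array[Double])] =
    a.indices.map(i => (i, a(i)))

  // By blocks: each record is a sub-matrix keyed by (blockRow, blockCol).
  def byBlocks(a: Array[Array[Double]], size: Int): Map[(Int, Int), Array[Array[Double]]] = {
    val corners = for {
      bi <- 0 until a.length by size
      bj <- 0 until a(0).length by size
    } yield (bi, bj)
    corners.map { case (bi, bj) =>
      val block = a.slice(bi, bi + size).map(_.slice(bj, bj + size))
      (bi / size, bj / size) -> block
    }.toMap
  }
}
```

Entries suit very sparse matrices, rows suit tall matrices with short rows, and blocks suit matrices that are large in both dimensions.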
![Page 12: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/12.jpg)
Distributing Matrices
Even the simplest operations require thinking about communication, e.g. multiplication
How many different matrix multiplies needed?
» At least one per pair of {Coordinate, Row, Block, LocalDense, LocalSparse} = 10
» More because multiplies not commutative
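
One way to read the count of 10 (an interpretation, the slide does not spell it out) is as the number of unordered pairs of distinct types among the five representations. Because A·B and B·A are different operations, counting ordered pairs gives more:

```latex
\begin{align*}
\binom{5}{2} &= \frac{5 \cdot 4}{2} = 10 && \text{unordered pairs of distinct types} \\
5 \cdot 4 &= 20 && \text{ordered pairs of distinct types} \\
5^2 &= 25 && \text{ordered pairs, including same-type multiplies}
\end{align*}
```

These are counts of possible combinations only; they say nothing about which multiplies are actually implemented.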
![Page 13: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/13.jpg)
Example mllib.linalg algorithms
» Matrix Multiplication
» Singular Value Decomposition (SVD)
» QR decomposition
» Optimization primitives
» … yours?
Simple idea goes a long way
![Page 14: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/14.jpg)
The Three Dimensions of Scalable Machine Learning (and Deep Learning)
![Page 15: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/15.jpg)
ML Objectives
Almost all machine learning objectives are optimized using this update: [equation shown on the original slide]
w is a vector of dimension d; we’re trying to find the best w via optimization
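
The update equation itself did not survive in this text. A common form that fits the description is the gradient descent step below; treat it as an assumption rather than a transcription of the slide. Here α (step size), g (gradient of the per-example loss) and (x_i, y_i) (the i-th training example) are symbols introduced for illustration; w, d and n are from the slides.

```latex
w \leftarrow w - \alpha \sum_{i=1}^{n} g(w;\, x_i, y_i), \qquad w \in \mathbb{R}^{d}
```

The sum runs over all n data points, and each term depends on only one example.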
![Page 16: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/16.jpg)
Scaling Dimensions
1) Data size: n
2) Number of models: hyperparameters
3) Model size: d
![Page 17: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/17.jpg)
Neural Networks Data Scaling
![Page 18: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/18.jpg)
Separable Updates
Can be generalized for
» Unconstrained optimization
» Smooth or non-smooth
» LBFGS, Conjugate Gradient, Accelerated Gradient methods, …
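
A separable update is one whose gradient is a sum of per-example terms, so each partition can sum its own terms and the driver only adds d-dimensional partial sums. The sketch below shows this for plain gradient descent on a squared loss. It uses ordinary Scala collections in place of RDDs, and `SeparableGradient`, `Example`, `pointGradient`, `gradient` and `step` are hypothetical names chosen for illustration; the squared loss is an assumption, not something the slides specify.

```scala
object SeparableGradient {
  // One labeled example: feature vector x and target y.
  type Example = (Array[Double], Double)

  // Gradient of the squared loss 0.5 * (w.x - y)^2 for a single example.
  def pointGradient(w: Array[Double], ex: Example): Array[Double] = {
    val (x, y) = ex
    val residual = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
    x.map(_ * residual)
  }

  // Element-wise sum of two vectors; returns a new array.
  def add(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (ai, bi) => ai + bi }

  // Each partition sums its own per-example gradients; the driver then
  // adds the partial sums, one d-dimensional vector per partition.
  def gradient(w: Array[Double], partitions: Seq[Seq[Example]]): Array[Double] = {
    val zero = Array.fill(w.length)(0.0)
    partitions
      .map(part => part.map(pointGradient(w, _)).foldLeft(zero)(add))
      .foldLeft(zero)(add)
  }

  // One gradient descent step with step size alpha.
  def step(w: Array[Double], partitions: Seq[Seq[Example]], alpha: Double): Array[Double] =
    w.zip(gradient(w, partitions)).map { case (wi, gi) => wi - alpha * gi }
}
```

The amount of data sent to the driver per step depends on d and the number of partitions, not on n.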
![Page 19: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/19.jpg)
Local Linear Algebra
![Page 20: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/20.jpg)
Why local linear algebra?
GPU and CPU hardware acceleration can be tricky on the JVM
Forward and backward propagation in neural networks, and many other places, need local acceleration
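
To see why forward propagation leans on local linear algebra, here is a minimal sketch of one fully connected layer: a matrix-vector multiply plus a bias, followed by an activation. It is written in unaccelerated plain Scala, with `DenseLayer`, `forward` and the sigmoid activation all assumed for illustration. The inner multiply is the part one would hand to an optimized native library.

```scala
object DenseLayer {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Forward pass of one fully connected layer: a = sigmoid(W x + b).
  // W is stored as one Array per output unit.
  def forward(w: Array[Array[Double]], b: Array[Double], x: Array[Double]): Array[Double] =
    w.zip(b).map { case (row, bias) =>
      // This dot product is the hot loop that benefits from hardware acceleration.
      sigmoid(row.zip(x).map { case (wi, xi) => wi * xi }.sum + bias)
    }
}
```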
![Page 21: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/21.jpg)
Comprehensive Benchmarks
From our paper: http://arxiv.org/abs/1509.02256
![Page 22: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/22.jpg)
Future of Spark and Neural Networks
![Page 23: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/23.jpg)
Coming for Neural Networks
Future versions will include
» Convolutional Neural Networks (Goal: 1.6)
» Dropout, different layer types (Goal: 1.6)
» Recurrent Neural Networks and LSTMs (Goal: 1.7)
![Page 24: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/24.jpg)
Spark Community
Most active open source community in big data
200+ developers, 50+ companies contributing
[Bar chart: contributors in past year; surviving labels include Giraph and Storm]
![Page 25: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/25.jpg)
Continuing Growth
[Chart: contributors per month to Spark; source: ohloh.net]
![Page 26: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/26.jpg)
Conclusions
![Page 27: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/27.jpg)
Spark and Research
Spark has all its roots in research, so we hope to keep incorporating new ideas!
![Page 28: Matrix Computations and Neural Networks in Sparkrezab/slides/stratany2015.pdf · Spark Computing Engine Extends a programming language with a distributed collection data-structure](https://reader030.vdocuments.mx/reader030/viewer/2022041016/5ec8f6caa1e6465def38cb79/html5/thumbnails/28.jpg)
Conclusion
Data flow engines are becoming an important platform for machine learning
More info: spark.apache.org