Deep Learning and Recurrent Neural Networks in the Enterprise

StampedeCon, St. Louis 2016

Josh Patterson, Skymind

Presenter: Josh Patterson

Past

Research in Swarm Algorithms: Real-time optimization techniques in mesh sensor networks

TVA / NERC: Smart Grid, Sensor Collection, and Big Data

Cloudera: Principal SA, Working with Fortune 500

Patterson Consulting: Working with Fortune 500 on Big Data, ML

Today

Skymind, Director Field Engineering

@jpatanooga / DL4J Co-creator

Co-author of the upcoming O’Reilly book “Deep Learning: A Practitioner’s Approach”

Topics

• What is Deep Learning?
• DL4J
• Recurrent Neural Network Applications

WHAT IS DEEP LEARNING?

Defining Deep Learning

• Higher neuron counts than in previous-generation neural networks

• Different and evolved ways to connect layers inside neural networks

• More computing power to train

• Automated feature learning

Automated Feature Learning

• Deep Learning can be thought of as workflows for automated feature construction
  – From “feature construction” to “feature learning”

• As Yann LeCun says:
  – “machines that learn to represent the world”

These are the features learned at each neuron in a Restricted Boltzmann Machine (RBM).

These features are passed to higher-level RBMs, which learn more complicated patterns.

(Figure: learned features forming part of the “7” digit)
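As a rough illustration of how an RBM learns per-neuron features like these, here is a minimal plain-Python sketch of one contrastive divergence (CD-1) update for a binary RBM. It uses reconstruction probabilities rather than sampled visible states in the negative phase, a common simplification. The function names, learning rate, and toy data are illustrative assumptions; this is not DL4J's RBM implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_update(v0, W, b, c, lr=0.1, rng=random):
    """One CD-1 step on a single binary visible vector v0; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)
    # Positive phase: sample binary hidden states from p(h | v0).
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: reconstruct visibles (as probabilities), then re-infer hiddens.
    v1 = visible_probs(h0, W, b)
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(v0)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1

def reconstruction_error(v, W, b, c):
    # Squared error between v and its deterministic (mean-field) reconstruction.
    v1 = visible_probs(hidden_probs(v, W, c), W, b)
    return sum((a - r) ** 2 for a, r in zip(v, v1))
```

Training on a couple of binary patterns (for example `[1, 1, 0, 0]` and `[0, 0, 1, 1]`) for a few thousand CD-1 steps drives the reconstruction error down; each column `W[:][j]` is then the “feature” that hidden neuron `j` has learned. Stacking RBMs, as the slide describes, means feeding one RBM's hidden probabilities to the next as its visible input.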

Unreasonable Effectiveness: Benchmark Records

1. Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
2. Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
3. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
4. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
5. Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
6. English to French translation (Sutskever et al., Google, NIPS 2014)
7. Audio onset detection (Marchi et al., ICASSP 2014)
8. Social signal classification (Brueckner & Schuller, ICASSP 2014)
9. Arabic handwriting recognition (Bluche et al., DAS 2014)
10. TIMIT phoneme recognition (Graves et al., ICASSP 2013)
11. Optical character recognition (Breuel et al., ICDAR 2013)
12. Image caption generation (Vinyals et al., Google, 2014)
13. Video to textual description (Donahue et al., 2014)
14. Syntactic parsing for Natural Language Processing (Vinyals et al., Google, 2014)
15. Photo-real talking heads (Soong and Wang, Microsoft, 2014)

Four Major Architectures

• Deep Belief Networks
• Convolutional Neural Networks
• Recurrent Neural Networks
• Recursive Neural Networks

Quick Usage Guide

• If I have timeseries or audio input
  – I should use a Recurrent Neural Network
  – Examples: fraud detection, anomaly detection

• If I have image input
  – I should use a Convolutional Neural Network

• If I have video input
  – I should use a hybrid Convolutional + Recurrent architecture!

Convolutional Generated Art

The More Things Change…

• Deep Learning is still trying to answer the same fundamental questions, such as:
  – “Is this image a face?”

• The difference is that Deep Learning makes hard questions easier to answer, with better architectures and more computing power
  – We do this by matching the correct architecture with the right problem

Building Deep Neural Networks with DL4J

• DL4J: “The Hadoop of Deep Learning”
  – Java, Scala, and Python APIs
  – ASF 2.0 licensed

• Java implementation
  – Parallelization (YARN + Spark)
  – GPU support

• Also supports multiple GPUs per host

• Runtime neutral
  – Local
  – Hadoop / YARN + Spark

• https://github.com/deeplearning4j/deeplearning4j


DL4J Workflow Toolchain

ETL (DataVec) → Vectorization (DataVec) → Modeling (DL4J) → Evaluation (Arbiter)

Execution Platforms: Spark, Single Machine

ND4J - Linear Algebra Runtime: CPU, GPU

ND4J: The Need for Speed

• JavaCPP (Cython for Java)
  – Auto-generates JNI bindings for C++ by parsing classes
  – Allows for easy maintenance and deployment of C++ binaries in Java

• CPU backends
  – OpenMP (multithreading within native operations)
  – OpenBLAS or MKL (BLAS operations)
  – SIMD extensions

• GPU backends
  – DL4J supports CUDA 7.5 at the moment, and will support CUDA 8.0 as soon as it comes out
  – Leverages cuDNN as well

Prepping Data is Time Consuming

http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75

Preparing Data for Modeling is Hard

DataVec

• DataVec is a tool for machine learning ETL (Extract, Transform, Load) operations
  – Spark-enabled and focused on supporting DL4J

• Also performs vectorization
  – Image, CSV, sequences (timeseries), and more

• Open source, ASF 2.0 licensed
  – https://github.com/deeplearning4j/DataVec
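To make “vectorization” concrete, here is a minimal plain-Python sketch of what a CSV vectorization step does conceptually: parse each record into numeric features, one-hot encode an integer class label, and apply a simple min-max transform. It does not use the DataVec API; the column layout (numeric features plus one integer label column, last by default) and the function names are assumptions for illustration.

```python
import csv
import io

def vectorize_csv(text, num_classes, label_index=-1):
    """Turn CSV text into (features, one_hot_label) pairs.

    Assumes every column except label_index is numeric and the label column
    holds an integer class id in [0, num_classes).
    """
    examples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        label_col = label_index % len(row)  # resolve negative indices
        label = int(row[label_col])
        features = [float(v) for k, v in enumerate(row) if k != label_col]
        one_hot = [0.0] * num_classes
        one_hot[label] = 1.0
        examples.append((features, one_hot))
    return examples

def min_max_normalize(examples):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*(features for features, _ in examples)))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    out = []
    for features, label in examples:
        scaled = [(x - lo) / (hi - lo) if hi > lo else 0.0
                  for x, lo, hi in zip(features, lows, highs)]
        out.append((scaled, label))
    return out
```

For example, the line `5.1,3.5,0` becomes the feature vector `[5.1, 3.5]` with label `[1.0, 0.0, 0.0]` when `num_classes=3`. In DL4J workflows, DataVec performs this kind of step (at scale, optionally on Spark) before data reaches the network.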

USING DL4J FOR RECURRENT NEURAL NETWORK APPLICATIONS

Transactional Data Explosion

• 2,500 exabytes of new information in 2012, with the Internet as the primary driver
• Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year

Source: IDC White Paper, sponsored by EMC. “As the Economy Contracts, the Digital Universe Expands.” May 2009.

(Figure: relational vs. transactional data (logs, sensors), with a “You” marker)

NERC Sensor Data Collection

openPDC PMU Data Collection circa 2009

• 120 sensors
• 30 samples/second
• 4.3B samples/day
• Housed in Hadoop
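A quick back-of-the-envelope check relates these figures. 120 sensors at 30 samples/second yields 120 × 30 × 86,400 = 311,040,000 sensor readings per day. The 4.3B samples/day figure therefore presumably counts individual measured values rather than whole readings; about 14 values per reading would reconcile the two numbers. That per-reading count is our inference, not a number from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def readings_per_day(sensors, samples_per_second):
    """Whole sensor readings produced per day."""
    return sensors * samples_per_second * SECONDS_PER_DAY

def implied_values_per_reading(total_per_day, sensors, samples_per_second):
    """Values per reading needed to reach total_per_day (an inferred quantity)."""
    return total_per_day / readings_per_day(sensors, samples_per_second)

print(readings_per_day(120, 30))                             # 311040000
print(round(implied_values_per_reading(4.3e9, 120, 30), 1))  # 13.8
```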

Sensor Timeseries Classification with RNNs

• Recurrent Neural Networks have the ability to model change of input over time

• Older techniques (mostly) do not retain the time domain
  – Hidden Markov Models do, but are more limited

• Key takeaway:
  – For working with timeseries data, RNNs will be more accurate
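An RNN can “model change of input over time” because it carries a hidden state from one timestep to the next. Below is a minimal plain-Python sketch of a vanilla (Elman) RNN forward pass, h_t = tanh(W_x·x_t + W_h·h_{t-1} + b). The weights are arbitrary illustrative values. DL4J's recurrent layers (such as LSTM) use gated cells, which this sketch does not show.

```python
import math

def matvec(M, v):
    # Dense matrix-vector product with lists of lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, W_x, W_h, b, h0=None):
    """Run a vanilla RNN over the sequence xs; return the hidden state at every step."""
    h = h0 if h0 is not None else [0.0] * len(b)
    states = []
    for x in xs:
        pre = [a + r + bias for a, r, bias in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]  # new state depends on input AND previous state
        states.append(h)
    return states

# Same inputs, different order: the final hidden states differ.
W_x, W_h, b = [[1.0]], [[0.5]], [0.0]
early = rnn_forward([[1.0], [0.0]], W_x, W_h, b)[-1]
late = rnn_forward([[0.0], [1.0]], W_x, W_h, b)[-1]
```

Both sequences have the same sum and mean, so an order-agnostic summary feature cannot distinguish them. The recurrent state can, which is the property that makes RNNs a good fit for timeseries.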

RNN Architectures

• Standard supervised learning
• Image captioning
• Sentiment analysis
• Video captioning, natural language translation
• Part-of-speech tagging
• Generative models for text
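These applications appear to follow the familiar one-to-many / many-to-one / many-to-many taxonomy of RNN architectures. In practice, the main difference between architectures is which hidden states get a prediction. Sentiment analysis (many-to-one) reads out only the final state, while part-of-speech tagging (many-to-many) reads out one prediction per step. The sketch below assumes a vanilla RNN with arbitrary weights and a hypothetical linear readout `W_y`; a softmax is omitted for brevity.

```python
import math

def rnn_states(xs, W_x, W_h, b):
    """Vanilla RNN: return the hidden state after each input in xs."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        # The comprehension reads the previous h before h is rebound.
        h = [math.tanh(sum(w * xi for w, xi in zip(W_x[j], x))
                       + sum(u * hk for u, hk in zip(W_h[j], h))
                       + b[j])
             for j in range(len(b))]
        states.append(h)
    return states

def readout(h, W_y):
    # Linear output layer: one score per class.
    return [sum(w * hk for w, hk in zip(row, h)) for row in W_y]

def many_to_one(xs, W_x, W_h, b, W_y):
    # e.g. sentiment analysis: a single prediction for the whole sequence
    return readout(rnn_states(xs, W_x, W_h, b)[-1], W_y)

def many_to_many(xs, W_x, W_h, b, W_y):
    # e.g. part-of-speech tagging: one prediction per input step
    return [readout(h, W_y) for h in rnn_states(xs, W_x, W_h, b)]
```

The last element of `many_to_many` equals the `many_to_one` output, so the two architectures share the same recurrence and differ only in where the loss and predictions are attached.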

Anomaly Detection

• Model the normal patterns in the data

• Autoencoders give us the ability to evaluate data the model hasn’t seen before
  – Find anomalous patterns in sequences
  – Can also use RNNs for pattern classification

• Interesting industry applications
  – Telecom
  – Financial services
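One common way to turn an autoencoder into an anomaly detector is to score each input by its reconstruction error and flag inputs whose error is far above what is typical for normal data. The sketch below uses a tiny hand-set linear autoencoder: it encodes 2-D points onto the direction (1, 1)/√2 and decodes them back, so it runs without a training loop. In practice the weights would be learned from normal data. The mean + k·std threshold rule is one simple choice among several.

```python
import math
import statistics

# Hand-set tied weights (an assumption for illustration): "normal" points lie
# near the line y = x, so the 1-D code z reconstructs them almost perfectly.
W = [1 / math.sqrt(2), 1 / math.sqrt(2)]

def encode(x):
    return sum(w * xi for w, xi in zip(W, x))  # 2-D input -> 1-D code

def decode(z):
    return [w * z for w in W]  # 1-D code -> 2-D reconstruction

def reconstruction_error(x):
    return sum((xi - ri) ** 2 for xi, ri in zip(x, decode(encode(x))))

def fit_threshold(normal_data, k=3.0):
    """Threshold = mean + k * (population) std of reconstruction error on normal data."""
    errors = [reconstruction_error(x) for x in normal_data]
    return statistics.mean(errors) + k * statistics.pstdev(errors)

def is_anomaly(x, threshold):
    return reconstruction_error(x) > threshold
```

A point such as `[1.0, -1.0]` lies off the learned pattern and reconstructs to `[0.0, 0.0]`, giving an error of 2.0, far above a threshold fitted on points near y = x. For sequences, the same scoring idea applies with a recurrent autoencoder in place of the linear one.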

Audio Applications

• Text-to-speech
• Recognize specific songs / audio
• Enables natural language interfaces

“Google is living a few years in the future and sending the rest of us messages.” (Doug Cutting, 2013)

• However
  – Most organizations are not built like Google
  – (and Jeff Dean does not work at your company…)

• Anyone building Next-Gen infrastructure has to consider these things

Certified on Two Hadoop Distributions

• Running Spark on Hadoop via YARN gives us
  – Concurrent sharing of cluster resources between heterogeneous workloads
  – Access to the YARN scheduler capabilities
  – Better control of executors in Spark
  – Kerberos support for security

• Certified on CDH 5.4
• Certified on HDP 2.4
  – [Coming later this month]

Questions?

Thank you for your time and attention

“Deep Learning: A Practitioner’s Approach” (O’Reilly, October 2016)

Running DL4J Workflows on Spark

• DataVec is built to scale out via Spark RDDs
  – RDD<LabeledPoint>
  – RDD<DataSet>

• DL4J uses the same MultiLayerConfiguration as the single-host version
  – Uses SparkDl4jMultiLayer to drive the training on Spark
  – Performs parameter averaging

spark-submit --class io.skymind.spark.dl4j.datavec.BasicDataVecExample --master yarn --num-executors 1 --properties-file ./spark_extra.props ./Skymind_spark-1.0-SNAPSHOT.jar
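Parameter averaging can be sketched in a few lines. Each worker starts from the same parameters and trains on its own data shard; the driver then replaces the model parameters with the element-wise average of the workers' results and repeats. The sketch below is plain Python with a toy linear model and a hypothetical `local_sgd` step standing in for real training. It is not the SparkDl4jMultiLayer API, and it runs the workers sequentially rather than on Spark executors.

```python
def local_sgd(params, shard, lr=0.01, epochs=1):
    """Hypothetical worker step: SGD on a 1-D linear model y = w*x + c."""
    w, c = params
    for _ in range(epochs):
        for x, y in shard:
            err = (w * x + c) - y
            w -= lr * err * x  # gradient of 0.5 * err**2 with respect to w
            c -= lr * err      # gradient of 0.5 * err**2 with respect to c
    return [w, c]

def average_params(param_sets):
    """Element-wise mean of the parameter vectors returned by the workers."""
    n = len(param_sets)
    return [sum(values) / n for values in zip(*param_sets)]

def train_with_averaging(params, shards, rounds=500, lr=0.01):
    for _ in range(rounds):
        # On a cluster, each local_sgd call would run on a separate executor,
        # all starting from the same broadcast parameters.
        results = [local_sgd(list(params), shard, lr) for shard in shards]
        params = average_params(results)
    return params
```

With two shards drawn from y = 2x + 1, repeated rounds converge to w ≈ 2, c ≈ 1. The design trade-off is communication: workers exchange parameters once per round instead of once per gradient step, at the cost of some staleness between averages.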
