Diving into Deep Learning (Silicon Valley Code Camp 2017)


Page 1: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Diving into Deep Learning

Silicon Valley Code Camp

10/08/2017 PayPal, San Jose

Oswald Campesato

[email protected]

Page 2: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Overview

intro to AI/ML/DL

linear regression

activation functions

cost functions

gradient descent

back propagation

hyper-parameters

what are CNNs

Android and DL

Page 3: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The Data/AI Landscape

Page 4: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Gartner 2016: Where is Deep Learning?

Page 5: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Gartner 2017: Deep Learning (YES!)

Page 6: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The Official Start of AI (1956)

Page 7: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Neural Network with 3 Hidden Layers

Page 8: Diving into Deep Learning (Silicon Valley Code Camp 2017)

AI/ML/DL: How They Differ

Traditional AI (20th century):

based on collections of rules

Led to expert systems in the 1980s

The era of LISP and Prolog

Page 9: Diving into Deep Learning (Silicon Valley Code Camp 2017)

AI/ML/DL: How They Differ

Machine Learning:

Started in the 1950s (approximate)

Alan Turing and “learning machines”

Data-driven (not rule-based)

Many types of algorithms

Involves optimization

Page 10: Diving into Deep Learning (Silicon Valley Code Camp 2017)

AI/ML/DL: How They Differ

Deep Learning:

Started in the 1950s (approximate)

The “perceptron” (basis of NNs)

Data-driven (not rule-based)

large (even massive) data sets

Involves neural networks (CNNs: ~1970s)

Lots of heuristics

Heavily based on empirical results

Page 11: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The Rise of Deep Learning

Massive and inexpensive computing power

Huge volumes of data/Powerful algorithms

The “big bang” in 2009:

"deep-learning neural networks and NVidia GPUs"

Google Brain used NVidia GPUs (2009)

Page 12: Diving into Deep Learning (Silicon Valley Code Camp 2017)

AI/ML/DL: Commonality

All of them involve a model

A model represents a system

Goal: a good predictive model

The model is based on:

Many rules (for AI)

data and algorithms (for ML)

large sets of data (for DL)

Page 13: Diving into Deep Learning (Silicon Valley Code Camp 2017)

A Basic Model in Machine Learning

Let’s perform the following steps:

1) Start with a simple model (2 variables)

2) Generalize that model (n variables)

3) See how it might apply to a NN

Page 14: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Linear Regression

One of the simplest models in ML

Fits a line (y = m*x + b) to data in 2D

Finds the best line by minimizing MSE:

m (slope) has a closed-form solution: m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

b (intercept) also has a closed-form solution: b = mean(y) - m * mean(x)
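
A minimal NumPy sketch of this closed-form fit (the square-feet/price data below is made up for illustration):

import numpy as np

# made-up data: square feet (x) vs. price in $1000s (y)
x = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
y = np.array([200, 280, 370, 450, 540], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# closed-form least-squares solution for y = m*x + b
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print("slope m =", m, "intercept b =", b)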

Page 15: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Linear Regression in 2D: example

Page 16: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Sample Cost Function #1 (MSE)

Page 17: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Linear Regression: example #1

One feature (independent variable):

X = number of square feet

Predicted value (dependent variable):

Y = cost of a house

A very “coarse grained” model

We can devise a much better model

Page 18: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Linear Regression: example #2

Multiple features:

X1 = # of square feet

X2 = # of bedrooms

X3 = # of bathrooms (dependency?)

X4 = age of house

X5 = cost of nearby houses

X6 = corner lot (or not): Boolean

a much better model (6 features)

Page 19: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Linear Multivariate Analysis

General form of multivariate equation:

Y = w1*x1 + w2*x2 + . . . + wn*xn + b

w1, w2, . . . , wn are numeric values

x1, x2, . . . , xn are variables (features)

Properties of variables:

Can be independent (Naïve Bayes)

weak/strong dependencies can exist

Page 20: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Neural Network with 3 Hidden Layers

Page 21: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Neural Networks: equations

Node “values” in first hidden layer:

N1 = w11*x1+w21*x2+…+wn1*xn

N2 = w12*x1+w22*x2+…+wn2*xn

N3 = w13*x1+w23*x2+…+wn3*xn

. . .

Nn = w1n*x1+w2n*x2+…+wnn*xn

Similar equations for other pairs of layers

Page 22: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Neural Networks: Matrices

From inputs to first hidden layer:

Y1 = W1*X + B1 (X/Y1/B1: vectors; W1: matrix)

From first to second hidden layers:

Y2 = W2*Y1 + B2 (Y1/Y2/B2: vectors; W2: matrix)

From second to third hidden layers:

Y3 = W3*Y2 + B3 (Y2/Y3/B3: vectors; W3: matrix)

Apply an “activation function” to y values
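
A minimal NumPy sketch of this forward pass, assuming a sigmoid activation and made-up layer sizes (4 inputs, hidden layers of 5, 4, and 3 nodes):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=4)            # input vector (4 features)
sizes = [4, 5, 4, 3]              # input layer plus three hidden layers

# small random initial weights, as suggested for backpropagation
weights = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i])) for i in range(3)]
biases = [np.zeros(sizes[i + 1]) for i in range(3)]

y = x
for W, B in zip(weights, biases):
    y = sigmoid(W @ y + B)        # Y = activation(W*X + B), layer by layer

print(y)                          # activations of the third hidden layer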

Page 23: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Neural Networks (general)

Multiple hidden layers:

Layer composition is your decision

Activation functions: sigmoid, tanh, ReLU

https://en.wikipedia.org/wiki/Activation_function

Back propagation (1980s)

https://en.wikipedia.org/wiki/Backpropagation

=> Initial weights: small random numbers

Page 24: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Euler’s Function

Page 25: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The sigmoid Activation Function

Page 26: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The tanh Activation Function

Page 27: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The ReLU Activation Function

Page 28: Diving into Deep Learning (Silicon Valley Code Camp 2017)

The softmax Activation Function

Page 29: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Activation Functions in Python

import numpy as np

# W (a weight matrix) and x (an input vector) are assumed to be defined elsewhere

# Python sigmoid example:
z = 1 / (1 + np.exp(-np.dot(W, x)))

# Python tanh example:
z = np.tanh(np.dot(W, x))

# Python ReLU example:
z = np.maximum(0, np.dot(W, x))
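
The deck also lists softmax (Pages 28 and 30); a comparable snippet, not part of the original fragment, might look like this:

# Python softmax example (numerically stable form):
logits = np.dot(W, x)
z = np.exp(logits - np.max(logits))
z = z / np.sum(z)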

Page 30: Diving into Deep Learning (Silicon Valley Code Camp 2017)

What’s the “Best” Activation Function?

Initially: sigmoid was popular

Then: tanh became popular

Now: ReLU is preferred (better results)

Softmax: for FC (fully connected) layers

NB: sigmoid and tanh are used in LSTMs

Page 31: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Even More Activation Functions!

https://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons

https://medium.com/towards-data-science/activation-functions-and-its-types-which-is-better-a9a5310cc8f

https://medium.com/towards-data-science/multi-layer-neural-networks-with-sigmoid-function-deep-learning-for-rookies-2-bf464f09eb7f

Page 32: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Sample Cost Function #1 (MSE)

Page 33: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Sample Cost Function #2

Page 34: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Sample Cost Function #3

Page 35: Diving into Deep Learning (Silicon Valley Code Camp 2017)

How to Select a Cost Function

1) Depends on the learning type:

=> supervised/unsupervised/RL

2) Depends on the activation function

3) Other factors

Example:

cross-entropy cost function for supervised learning on multiclass classification
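
For instance, a minimal sketch of the cross-entropy cost on a single 3-class example (the probabilities are made up):

import numpy as np

y_true = np.array([0.0, 1.0, 0.0])   # one-hot true label
y_pred = np.array([0.2, 0.7, 0.1])   # softmax output of the network

# cross-entropy: -sum over classes of y_true * log(y_pred)
loss = -np.sum(y_true * np.log(y_pred))
print(loss)   # ~0.357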

Page 36: Diving into Deep Learning (Silicon Valley Code Camp 2017)

GD versus SGD

SGD (Stochastic Gradient Descent):

+ each update uses a single sample or a small SUBSET of the dataset

+ the subset version is aka Minibatch Stochastic Gradient Descent

GD (Gradient Descent):

+ involves the ENTIRE dataset

More details:

http://cs229.stanford.edu/notes/cs229-notes1.pdf
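
A minimal sketch of one epoch of minibatch SGD on a linear model (NumPy; the synthetic data, learning rate, and batch size are made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3      # synthetic targets

w, b = np.zeros(3), 0.0
lr, batch_size = 0.1, 16

idx = rng.permutation(len(X))                 # shuffle once per epoch
for start in range(0, len(X), batch_size):
    batch = idx[start:start + batch_size]
    Xb, yb = X[batch], y[batch]
    err = Xb @ w + b - yb                     # error on this minibatch only
    w -= lr * (Xb.T @ err) / len(batch)       # gradient step (halved-MSE convention)
    b -= lr * err.mean()

Plain GD would instead compute err over the entire X and y for every update.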

Page 37: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Setting up Data & the Model

Normalize the data (DL only):

Subtract the ‘mean’ and divide by stddev

[Central Limit Theorem]

Initial weight values for NNs:

Random numbers between -1 and 1

More details:

http://cs231n.github.io/neural-networks-2/#losses
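
A minimal sketch of both steps in NumPy (the raw feature matrix and the layer size are made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 4))    # made-up raw feature matrix

# normalize: subtract the mean and divide by the standard deviation, per feature
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# initial weights for a layer with 4 inputs and 8 nodes: random numbers in [-1, 1]
W1 = rng.uniform(-1, 1, size=(8, 4))
B1 = np.zeros(8)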

Page 38: Diving into Deep Learning (Silicon Valley Code Camp 2017)

What are Hyper Parameters?

Higher-level concepts about the model, such as its complexity or capacity to learn

Cannot be learned directly from the data in the standard model training process

Must be predefined

Page 39: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Hyper Parameters (examples)

# of hidden layers in a neural network

the learning rate (in many models)

the dropout rate

# of leaves or depth of a tree

# of latent factors in a matrix factorization

# of clusters in a k-means clustering

Page 40: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Hyper Parameter: dropout rate

"dropout" refers to dropping out units (both hidden and visible) in a neural network

a regularization technique for reducing overfitting in neural networks

prevents complex co-adaptations on training data

a very efficient way of performing model averaging with neural networks
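
A minimal sketch of (inverted) dropout on one layer's activations, with a made-up dropout rate of 0.5:

import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=10)     # outputs of some hidden layer (made up)
dropout_rate = 0.5

# randomly drop units; scale the survivors so the expected activation is unchanged
keep = rng.random(activations.shape) >= dropout_rate
dropped = activations * keep / (1.0 - dropout_rate)

# at test/inference time dropout is disabled and the full activations are used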

Page 41: Diving into Deep Learning (Silicon Valley Code Camp 2017)

How Many Layers in a DNN?

Algorithm #1 (from Geoffrey Hinton):

1) add layers until you start overfitting your training set

2) now add dropout or some other regularization method

Algorithm #2 (Yoshua Bengio):

"Add layers until the test error does not improve anymore."

Page 42: Diving into Deep Learning (Silicon Valley Code Camp 2017)

How Many Hidden Nodes in a DNN?

Based on a relationship between:

# of input and # of output nodes

Amount of training data available

Complexity of the cost function

The training algorithm

Page 43: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs versus RNNs

CNNs (Convolutional NNs):

Good for image processing

2000: CNNs processed 10-20% of all checks

=> Approximately 60% of all NNs

RNNs (Recurrent NNs):

Good for NLP and audio

Page 44: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: Convolution Calculations

https://docs.gimp.org/en/plug-in-convmatrix.html

Page 45: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: Convolution Matrices (examples)

Sharpen:

Blur:

Page 46: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: Convolution Matrices (examples)

Edge detect:

Emboss:
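
The matrices themselves appeared only as images on these slides; a sketch using two commonly cited 3x3 kernels and a plain NumPy convolution (stride 1, no padding; as in most CNNs, the kernel is not flipped) follows:

import numpy as np

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((8, 8))   # made-up grayscale image
print(convolve2d(image, sharpen).shape)           # (6, 6)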

Page 47: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: Sample Convolutions/Filters

Page 48: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: Max Pooling Example
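
The worked example was an image on the slide; a minimal sketch of 2x2 max pooling (stride 2) on a made-up 4x4 feature map:

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 0],
                        [7, 2, 9, 8],
                        [4, 1, 3, 5]])

# 2x2 max pooling with stride 2: keep the maximum of each 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 4]
                #  [7 9]]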

Page 49: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: convolution-pooling (1)

Page 50: Diving into Deep Learning (Silicon Valley Code Camp 2017)

CNNs: convolution and pooling (2)

Page 51: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Sample CNN in Keras (fragment)

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta

input_shape = (3, 32, 32)   # channels-first: three-channel 32x32 images
nb_classes = 10

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
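
The fragment stops after the first convolution/pooling/dropout block; one plausible continuation (not shown on the slide) is a fully connected head with a softmax output, compiled with the Adadelta optimizer the fragment imports:

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))   # softmax on the final fully connected layer

model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(),
              metrics=['accuracy'])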

Page 52: Diving into Deep Learning (Silicon Valley Code Camp 2017)

GANs: Generative Adversarial Networks

Page 53: Diving into Deep Learning (Silicon Valley Code Camp 2017)

GANs: Generative Adversarial Networks

Make imperceptible changes to images

Can consistently defeat all NNs

Can have extremely high error rate

Some images create optical illusions

https://www.quora.com/What-are-the-pros-and-cons-of-using-generative-adversarial-networks-a-type-of-neural-network

Page 54: Diving into Deep Learning (Silicon Valley Code Camp 2017)

GANs: Generative Adversarial Networks

Create your own GANs:

https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners

https://github.com/jonbruner/generative-adversarial-networks

GANs from MNIST:

http://edwardlib.org/tutorials/gan

Page 55: Diving into Deep Learning (Silicon Valley Code Camp 2017)

GANs: Generative Adversarial Networks

GANs, Graffiti, and Art:

https://thenewstack.io/camouflaged-graffiti-road-signs-can-fool-machine-learning-models/

GANs and audio:

https://www.technologyreview.com/s/608381/ai-shouldnt-believe-everything-it-hears

Houdini algorithm: https://arxiv.org/abs/1707.05373

Page 56: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Deep Learning Playground

TF playground home page:

http://playground.tensorflow.org

Demo #1:

https://github.com/tadashi-aikawa/typescript-playground

Converts playground to TypeScript

Page 57: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Android and Deep Learning (1)

Option #1: generate the model outside of Android

(use a Python script)

Option #2: use a pre-trained model

Option #3: use an existing apk with DL

Option #4: use TensorFlow Lite APIs (when?)

Page 58: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Android and Deep Learning (2)

Generate the model outside of Android

Perform the following steps:

Create an app in Android Studio

generate a (“.pb”) model (via Python script)

Copy the model into the assets folder

Compile and deploy to a device

Page 59: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Android and Deep Learning (3)

Android app with pre-configured model

Download/uncompress this sample:

http://nilhcem.com/android/custom-tensorflow-classifier

Open the project in Android Studio

Compile and deploy to an Android device

Page 60: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Android and Deep Learning (4)

TensorFlow Lite: Google I/O (release date?)

A subset of the TensorFlow APIs (which ones?)

Provides “regular” TensorFlow APIs for apps

Does not require Python scripts (?)

Page 61: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Deep Learning and Art

“Convolutional Blending” images:

=> 19-layer Convolutional Neural Network

www.deepart.io

Bots created their own language:

https://www.recode.net/2017/3/23/14962182/ai-learning-language-open-ai-research

https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic

Page 62: Diving into Deep Learning (Silicon Valley Code Camp 2017)

What Do I Learn Next?

PGMs (Probabilistic Graphical Models)

MC (Markov Chains)

MCMC (Markov Chains Monte Carlo)

HMMs (Hidden Markov Models)

RL (Reinforcement Learning)

Hopfield Nets

Neural Turing Machines

Autoencoders

Hypernetworks

Pixel Recurrent Neural Networks

Bayesian Neural Networks

SVMs

Page 63: Diving into Deep Learning (Silicon Valley Code Camp 2017)

Some Recent Books

1) HTML5 Canvas and CSS3 Graphics (2013)

2) jQuery, CSS3, and HTML5 for Mobile (2013)

3) HTML5 Pocket Primer (2013)

4) jQuery Pocket Primer (2013)

5) HTML5 Mobile Pocket Primer (2014)

6) D3 Pocket Primer (2015)

7) Python Pocket Primer (2015)

8) SVG Pocket Primer (2016)

9) CSS3 Pocket Primer (2016)

10) Android Pocket Primer (2017)

11) Angular Pocket Primer (2017)

Page 65: Diving into Deep Learning (Silicon Valley Code Camp 2017)

About Me: Training

=> Deep Learning and TensorFlow

=> Mobile and TensorFlow

=> Python and TensorFlow

=> Python and Keras

=> R Programming

=> Angular 4 (with RxJS)