

Deep Learning

Kevin Patel

June 24, 2016


Outline

1 Motivation

2 Background of Neural Networks
  Perceptron
  Feedforward Neural Networks

3 Deep Learning
  Greedy Layerwise Unsupervised Pretraining

4 Conclusion


Human Beings and Computer Science

What qualities of human beings make them good computer scientists?

Impatience

Laziness


Machine Translation


Information Retrieval


Sentiment Analysis


Human Beings and Computer Science

What qualities of human beings make them good computer scientists?

Impatience

Laziness

Major reasons for the evolution of the field of Artificial Intelligence


Know the limits

AI always fascinates people

Hyped by movies

Important to know what is actually possible

Also helps to decide where to push


Classification of AI systems

Two major categories based on how they work:

Rule-based

Statistical

Statistical systems are further classified based on their data usage:

Supervised

Unsupervised

Statistical systems are also classified based on the problem solved:

Classification

Regression

...


A Refresher in Linear Algebra

Vector representations

Vector operations

Similarity between vectors


A Refresher in Optimization

Minima and Maxima

Local and Global Minima

Partial Differentiation from Calculus


The Ultimate Computing Machine

The award for the most amazing computing machine goes to

Human Brain

Who gave it this award?

We, the researchers of AI, did

How exactly?

By constantly trying to imitate it


Perceptron Algorithm

Given a set of input/label pairs (x1, y1), . . . , (xn, yn)

Learn a function that classifies the inputs

Learn a set of weights (w1, . . . , wm) for the input features

f(x) = 1 if ∑_{i=1}^{m} w_i x_i > 0, and 0 otherwise
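As a quick illustration (a minimal Python sketch, assuming numpy; not part of the original slides), the decision function is one line:

import numpy as np

def predict(w, x):
    # Perceptron decision rule: 1 if the weighted sum is positive, 0 otherwise
    return 1 if np.dot(w, x) > 0 else 0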


Perceptron Algorithm

[Figure: a perceptron: inputs x0, x1, x2, . . . , xn with weights w1, . . . , wn feed a weighted sum ∑, followed by an activation function]


Perceptron Algorithm

[Figure: the classical approach: a problem's input passes through hand-crafted features into a trainable classifier with learned weights, which produces the output]


Perceptron Example

The perceptron calculates y = ∑_{i=1}^{m} w_i × x_i + b

This is similar to y = m × x + c, the equation of a line in 2D (a hyperplane in general)

It divides the input feature space into two regions: the positive and the negative class region


Training a Perceptron

Algorithm 2.1: Perceptron Algorithm(D)
  comment: Initialize the feature weights to zero or randomly
  w = zeros()
  for each (x, y) ∈ D do
      comment: Calculate prediction
      t = f(∑_{i=1}^{m} w_i x_i)
      comment: Update the feature weights
      w = w + α(y − t)x
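A runnable Python sketch of Algorithm 2.1 (numpy assumed; the outer epochs loop is an addition, since the pseudocode shows a single pass over D):

import numpy as np

def train_perceptron(D, alpha=1.0, epochs=10):
    # D is a list of (x, y) pairs; initialize the feature weights to zero
    w = np.zeros(len(D[0][0]))
    for _ in range(epochs):
        for x, y in D:
            x = np.asarray(x, dtype=float)
            t = 1 if np.dot(w, x) > 0 else 0   # prediction t = f(sum_i w_i x_i)
            w = w + alpha * (y - t) * x        # update w = w + alpha * (y - t) * x
    return w

# Example: the OR gate data used on the next slides
print(train_perceptron([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]))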


Training a Perceptron: Example

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1

Train a perceptron for the OR gate

Learn weights w = [w1, w2]


Training a Perceptron: Example

Initialize weights to zero: w1 = 0, w2 = 0 (taking α = 1)

Input: x1 = 0, x2 = 0 (y = 0)

t = f(w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 0) = f(0) = 0

w = w + α(y − t)x = w + (0 − 0)x = w


Training a Perceptron: Example

Current weights: w1 = 0, w2 = 0

Input: x1 = 0, x2 = 1 (y = 1)

t = f(w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 1) = f(0) = 0

w = w + α(y − t)x = w + (1 − 0)x = w + x

w = [0, 1]


Training a Perceptron: Example

Current weights: w1 = 0, w2 = 1

Input: x1 = 1, x2 = 0 (y = 1)

t = f(w1 × x1 + w2 × x2) = f(0 × 1 + 1 × 0) = f(0) = 0

w = w + α(y − t)x = w + (1 − 0)x = w + x

w = [1, 1]


Training a Perceptron: Example

Current weights: w1 = 1, w2 = 1

Input: x1 = 1, x2 = 1 (y = 1)

t = f(w1 × x1 + w2 × x2) = f(1 × 1 + 1 × 1) = f(2) = 1

w = w + α(y − t)x = w + (1 − 1)x = w
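A quick sanity check (not from the slides) that the learned weights w = [1, 1] reproduce the OR truth table:

w = [1, 1]
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    t = 1 if w[0] * x1 + w[1] * x2 > 0 else 0
    print(x1, x2, '->', t)   # prints 0, 1, 1, 1: exactly the OR column y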


Training a Perceptron: Example

[Figure: the four OR inputs in the (x1, x2) plane with the learned linear decision boundary; red circles indicate 1, white circles indicate 0]


Disadvantages of Perceptron Algorithm

Cannot learn functions that are not linearly separable

The famous XOR example


Disadvantages of Perceptron Algorithm

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

Train a perceptron to replicate the XOR gate

Learn weights w1, w2


Disadvantages of Perceptron Algorithm

[Figure: the four XOR inputs in the (x1, x2) plane; red circles indicate 1, white circles indicate 0; no single straight line separates the two classes]


Disadvantages of Perceptron Algorithm

A single perceptron cannot learn an XOR function

Need multiple perceptrons

What about a hierarchy of perceptrons connected together?


Multilayer Perceptron for XOR Problem

[Figure: a two-layer perceptron for XOR. Hidden unit y1 takes x1 with weight w1 = −1 and x2 with weight w2 = 1; hidden unit y2 takes x2 with weight w3 = −1 and x1 with weight w4 = 1; the output y combines y1 and y2 with weights w5 = 1 and w6 = 1]
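Tracing the figure's weights in code (a sketch assuming the same strict threshold at zero as before) confirms that this network computes XOR:

def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    y1 = step(-1 * x1 + 1 * x2)    # hidden unit y1: w1 = -1, w2 = 1
    y2 = step(1 * x1 - 1 * x2)     # hidden unit y2: w4 = 1, w3 = -1
    return step(1 * y1 + 1 * y2)   # output y: w5 = 1, w6 = 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, '->', xor_net(x1, x2))   # prints 0, 1, 1, 0: the XOR column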


Feedforward Neural Networks

Basically, a hierarchy of perceptrons

With much smoother activation functions, such as:

Sigmoid

Tanh

ReLU

...
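Their standard definitions, as a small Python sketch (numpy assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise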


Feedforward Neural Networks (contd.)

[Figure: a feedforward network with an input layer, hidden layers 1 and 2, and an output layer]


Feedforward Neural Network: Forward Propagation

Let X = (x1, . . . , xn) be the set of input features

Hidden layer activations: a_j = f(∑_{i=1}^{n} W_{ji} x_i), ∀j ∈ {1, . . . , h}


Feedforward Neural Network: Forward Propagation

Let a = (a1, . . . , ah) be the set of hidden layer features

Output neurons: o_k = g(∑_{j=1}^{h} U_{kj} a_j), ∀k ∈ {1, . . . , K}
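Both layers of the forward pass in one Python sketch (numpy assumed; tanh for f and sigmoid for g are arbitrary choices, with W of shape h × n and U of shape K × h):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, U):
    a = np.tanh(W @ x)   # hidden activations a_j = f(sum_i W_ji x_i)
    o = sigmoid(U @ a)   # outputs o_k = g(sum_j U_kj a_j)
    return a, o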


Feedforward Neural Network: Learning Algorithm

Adjust the weights W and U to minimize the error on the training set

Define the error as the squared loss between the prediction and the true output:

E = (1/2) Error² = (1/2)(y − o)²    (1)

The gradient w.r.t. the output is

∂E/∂o_k = (1/2) × 2 × (y_k − o_k) × (−1) = (o_k − y_k)    (2)


Feedforward Neural Network: Learning Algorithm

We have the errors calculated at the output neurons

Send the error down to the lower layers


Feedforward Neural Network: Learning Algorithm

Calculate the gradient w.r.t. the parameters U

∂E/∂o_k = (o_k − y_k)

o_k = g(∑_{j=1}^{h} U_{kj} a_j), ∀k ∈ {1, . . . , K}

∂E/∂U_{kj} = ∂E/∂o_k × g′(∑_{j=1}^{h} U_{kj} a_j) × a_j    (3)

The update for U_{kj} will be

U_{kj} = U_{kj} − η × ∂E/∂U_{kj}    (4)


Feedforward Neural Network: Learning Algorithm

How do we update the parameters W?

a_j = f(∑_{i=1}^{n} W_{ji} x_i)

o_k = g(∑_{j=1}^{h} U_{kj} a_j)

Substituting for a_j: o_k = g(∑_{j=1}^{h} U_{kj} f(∑_{i=1}^{n} W_{ji} x_i))

Calculate the gradient w.r.t. a_j


Feedforward Neural Network: Learning Algorithm

o_k = g(∑_{j=1}^{h} U_{kj} a_j)

We have already calculated ∂E/∂o_k

∂E/∂a_j = ∑_{k=1}^{K} ∂E/∂o_k × g′(∑_{j=1}^{h} U_{kj} a_j) × U_{kj}    (5)


Feedforward Neural Network: Backpropagation of errors

[Figure: the network, with the parameter updates indicated by red lines]


Feedforward Neural Network: Backpropagation of errors

Errors are now accumulated at the hidden layer neurons


Feedforward Neural Network: Backpropagation of errors

We have calculated the error accumulated at each hidden neuron, ∂E/∂a_j

Use this to update the parameters W

a_j = f(∑_{i=1}^{n} W_{ji} x_i), ∀j ∈ {1, . . . , h}

∂E/∂W_{ji} = ∂E/∂a_j × f′(∑_{i=1}^{n} W_{ji} x_i) × x_i    (6)

The update for W_{ji} will be

W_{ji} = W_{ji} − η × ∂E/∂W_{ji}    (7)
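Equations (2)-(7) combined into one gradient-descent step on a single example (a sketch assuming sigmoid for both f and g, whose derivative g′(z) = g(z)(1 − g(z)) keeps the code short):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W, U, eta=0.1):
    a = sigmoid(W @ x)                 # forward: hidden layer a_j
    o = sigmoid(U @ a)                 # forward: output layer o_k
    delta_o = (o - y) * o * (1 - o)    # (dE/do_k) times g', from eq. (2)
    grad_U = np.outer(delta_o, a)      # dE/dU_kj, eq. (3)
    delta_a = U.T @ delta_o            # dE/da_j, eq. (5)
    delta_h = delta_a * a * (1 - a)    # times f', as in eq. (6)
    grad_W = np.outer(delta_h, x)      # dE/dW_ji, eq. (6)
    U -= eta * grad_U                  # eq. (4)
    W -= eta * grad_W                  # eq. (7)
    return W, U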


Feedforward Neural Network: Backpropagation of errors

[Figure: the network, with the parameter updates indicated by red lines]


Feedforward Neural Network

The training proceeds in an online fashion

Minibatches are also used (i.e., parameters are updated after seeing k examples)

Monitor the error on a validation set after each complete sweep of the training set

Training repeats until the error on the validation set stops decreasing (see the sketch below)
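A sketch of this training schedule, reusing sigmoid and backprop_step from the sketch above (the patience threshold is an assumption; for brevity, the k examples of a minibatch are applied as k per-example updates rather than one averaged gradient):

import numpy as np

def validation_error(val_set, W, U):
    # mean squared error, with the same sigmoid layers as backprop_step
    err = 0.0
    for x, y in val_set:
        o = sigmoid(U @ sigmoid(W @ x))
        err += 0.5 * np.sum((y - o) ** 2)
    return err / len(val_set)

def train(train_set, val_set, W, U, k=32, eta=0.1, patience=3):
    best, bad = float('inf'), 0
    while bad < patience:
        for i in range(0, len(train_set), k):      # parameters updated after k examples
            for x, y in train_set[i:i + k]:
                W, U = backprop_step(x, y, W, U, eta)
        err = validation_error(val_set, W, U)      # monitor after each complete sweep
        if err < best:
            best, bad = err, 0
        else:
            bad += 1                               # stop once it no longer decreases
    return W, U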


Problems with Feedforward Networks

Vanishing gradients

Getting stuck in local optima


Problems with Feedforward Networks (contd.)

[Figure: a deep feedforward network with an input layer, hidden layers 1 through 8, and an output layer]


Problems with Feedforward Networks (contd.)

Can we instead start at a relatively good position?

Then even small updates will not be much of an issue

Increased number of parameters

Solution to both: Unsupervised Learning


Advantages of Unsupervised Learning

Unsupervised learning requires no labels

And there is a lot of unlabeled data; is the whole of the Internet enough?

But this data has no labels

One option: use the input itself as the label


AutoEncoders

Two functions: an encoder fθ and a decoder gθ

Datapoint x's representation: h = fθ(x)

Reconstruction using the decoder: r = gθ(h)

Goal: Minimize reconstruction error
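A minimal numpy sketch of this setup (a single-hidden-layer autoencoder with sigmoid encoder and decoder, trained by gradient descent on the squared reconstruction error; every concrete choice here is an assumption):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_step(x, W_enc, W_dec, eta=0.1):
    h = sigmoid(W_enc @ x)            # encoder: h = f_theta(x)
    r = sigmoid(W_dec @ h)            # decoder: r = g_theta(h)
    delta_r = (r - x) * r * (1 - r)   # gradient of 0.5 * ||x - r||^2 through g
    grad_dec = np.outer(delta_r, h)
    delta_h = (W_dec.T @ delta_r) * h * (1 - h)
    grad_enc = np.outer(delta_h, x)
    W_dec -= eta * grad_dec           # one step toward lower reconstruction error
    W_enc -= eta * grad_enc
    return W_enc, W_dec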


AutoEncoder Example


AutoEncoder Trivial Solution

With a large enough hidden layer, the autoencoder can simply learn the identity mapping: perfect reconstruction, but no useful representation


Regularized AutoEncoders

Sparse AutoEncoders

Denoising AutoEncoders

Contractive AutoEncoders


Sparse AutoEncoder


Denoising AutoEncoder


Contractive AutoEncoder


Greedy Layerwise Unsupervised Pretraining

The main reason for the major success of Deep Learning

Definition:

Greedy layerwise: one layer is trained at a time, each according to a local optimum

Unsupervised: the intermediate layers do not need the final labels for training

Pretraining: after this is done, another supervised training step is applied to the entire network
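A sketch of the whole scheme, reusing sigmoid and autoencoder_step from the autoencoder sketch above (the initialization scale, epoch count, and learning rate are assumptions):

import numpy as np

def greedy_pretrain(X, layer_sizes, epochs=10, eta=0.1):
    # X: list of input vectors; layer_sizes: one hidden size per layer to pretrain
    weights, inputs = [], [np.asarray(x, dtype=float) for x in X]
    for h in layer_sizes:
        n = len(inputs[0])
        W_enc = 0.01 * np.random.randn(h, n)   # this layer's autoencoder, trained locally
        W_dec = 0.01 * np.random.randn(n, h)
        for _ in range(epochs):                # greedy: optimize this layer alone
            for x in inputs:
                W_enc, W_dec = autoencoder_step(x, W_enc, W_dec, eta)
        weights.append(W_enc)                  # keep the encoder, discard the decoder
        inputs = [sigmoid(W_enc @ x) for x in inputs]   # codes feed the next layer
    return weights   # use these to initialize the deep net, then fine-tune with labels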


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: the target deep network: input layer, hidden layers 1, 2, and 3, and an output layer]


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: step 1: an autoencoder from the input layer to hidden layer 1 and back to the input layer]


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: step 2: an autoencoder from hidden layer 1 to hidden layer 2 and back to hidden layer 1]


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: step 3: an autoencoder from hidden layer 2 to hidden layer 3 and back to hidden layer 2]


Advanced Architectures and Optimization Algorithms

Architectures

Convolutional Neural Networks

Recurrent Neural Networks

Recursive Neural Networks

Optimization Algorithms

RMSprop

AdaGrad

AdaDelta


Conclusion

Motivated AI in general

Discussed perceptron and feedforward neural networks

Understood the shortcomings of naive attempts at deep networks

Understood Greedy Layerwise Unsupervised Pretraining


Thank You
