

Deep Learning

Kevin Patel

June 24, 2016


Outline

1 Motivation

2 Background of Neural Networks
  Perceptron
  Feedforward Neural Networks

3 Deep Learning
  Greedy Layerwise Unsupervised Pretraining

4 Conclusion


Human Beings and Computer Science

What qualities of human beings make them good computer scientists?

Impatience

Laziness


Machine Translation


Information Retrieval


Sentiment Analysis


Human Beings and Computer Science

What qualities of human beings make them good computer scientists?

Impatience

Laziness

Major reasons for the evolution of the field of Artificial Intelligence


Know the limits

AI always fascinates people

Hyped by movies

Important to know what is actually possible

Also helps to decide where to push


Classification of AI systems

Two major categories based on how they work:

Rule-based

Statistical

Statistical systems are further classified based on their data usage:

Supervised

Unsupervised

Statistical systems are also classified based on the problem solved:

Classification

Regression

...


A Refresher in Linear Algebra

Vector representations

Vector operations

Similarity between vectors


A Refresher in Optimization

Minima and Maxima

Local and Global Minima

Partial Differentiation from Calculus


The Ultimate Computing Machine

The award for the most amazing computing machine goes to

Human Brain

Who gave it this award?

We, the researchers of AI, did

How exactly?

By constantly trying to imitate it


Perceptron Algorithm

Given a set of input/label pairs (x1, y1), . . . , (xn, yn)

Learn a function that classifies the inputs

Learn a set of weights (w1, . . . , wm) for the input features

f(x) = 1 if ∑_{i=1}^{m} w_i x_i > 0, and 0 otherwise
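As a quick illustration (a minimal Python sketch, assuming numpy; not part of the original slides), the decision function is one line:

import numpy as np

def predict(w, x):
    # Perceptron decision rule: 1 if the weighted sum is positive, 0 otherwise
    return 1 if np.dot(w, x) > 0 else 0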


Perceptron Algorithm

[Figure: a perceptron: inputs x0, x1, x2, . . . , xn with weights w1, . . . , wn feed a weighted sum ∑, followed by an activation function]


Perceptron Algorithm

[Figure: the classical approach: a problem's input passes through hand-crafted features into a trainable classifier with learned weights, which produces the output]


Perceptron Example

The perceptron calculates y = ∑_{i=1}^{m} w_i × x_i + b

This is similar to y = m × x + c, the equation of a line in 2D (a hyperplane in general)

It divides the input feature space into two regions: the positive and the negative class region


Training a Perceptron

Algorithm 2.1: Perceptron Algorithm(D)
  comment: Initialize the feature weights to zero or randomly
  w = zeros()
  for each (x, y) ∈ D do
      comment: Calculate prediction
      t = f(∑_{i=1}^{m} w_i x_i)
      comment: Update the feature weights
      w = w + α(y − t)x
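A runnable Python sketch of Algorithm 2.1 (numpy assumed; the outer epochs loop is an addition, since the pseudocode shows a single pass over D):

import numpy as np

def train_perceptron(D, alpha=1.0, epochs=10):
    # D is a list of (x, y) pairs; initialize the feature weights to zero
    w = np.zeros(len(D[0][0]))
    for _ in range(epochs):
        for x, y in D:
            x = np.asarray(x, dtype=float)
            t = 1 if np.dot(w, x) > 0 else 0   # prediction t = f(sum_i w_i x_i)
            w = w + alpha * (y - t) * x        # update w = w + alpha * (y - t) * x
    return w

# Example: the OR gate data used on the next slides
print(train_perceptron([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]))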


Training a Perceptron: Example

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1

Train a perceptron for the OR gate

Learn weights w = [w1, w2]


Training a Perceptron: Example

Initialize weights to zero: w1 = 0, w2 = 0 (taking α = 1)

Input: x1 = 0, x2 = 0 (y = 0)

t = f(w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 0) = f(0) = 0

w = w + α(y − t)x = w + (0 − 0)x = w


Training a Perceptron: Example

Current weights: w1 = 0, w2 = 0

Input: x1 = 0, x2 = 1 (y = 1)

t = f(w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 1) = f(0) = 0

w = w + α(y − t)x = w + (1 − 0)x = w + x

w = [0, 1]


Training a Perceptron: Example

Current weights: w1 = 0, w2 = 1

Input: x1 = 1, x2 = 0 (y = 1)

t = f(w1 × x1 + w2 × x2) = f(0 × 1 + 1 × 0) = f(0) = 0

w = w + α(y − t)x = w + (1 − 0)x = w + x

w = [1, 1]


Training a Perceptron: Example

Current weights: w1 = 1, w2 = 1

Input: x1 = 1, x2 = 1 (y = 1)

t = f(w1 × x1 + w2 × x2) = f(1 × 1 + 1 × 1) = f(2) = 1

w = w + α(y − t)x = w + (1 − 1)x = w
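A quick sanity check (not from the slides) that the learned weights w = [1, 1] reproduce the OR truth table:

w = [1, 1]
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    t = 1 if w[0] * x1 + w[1] * x2 > 0 else 0
    print(x1, x2, '->', t)   # prints 0, 1, 1, 1: exactly the OR column y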


Training a Perceptron: Example

[Figure: the four OR inputs in the (x1, x2) plane with the learned linear decision boundary; red circles indicate 1, white circles indicate 0]


Disadvantages of Perceptron Algorithm

Cannot learn functions that are not linearly separable

The famous XOR example


Disadvantages of Perceptron Algorithm

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

Train a perceptron to replicate the XOR gate

Learn weights w1, w2


Disadvantages of Perceptron Algorithm

[Figure: the four XOR inputs in the (x1, x2) plane; red circles indicate 1, white circles indicate 0; no single straight line separates the two classes]


Disadvantages of Perceptron Algorithm

A single perceptron cannot learn an XOR function

Need multiple perceptrons

What about a hierarchy of perceptrons connected together?


Multilayer Perceptron for XOR Problem

[Figure: a two-layer perceptron for XOR. Hidden unit y1 takes x1 with weight w1 = −1 and x2 with weight w2 = 1; hidden unit y2 takes x2 with weight w3 = −1 and x1 with weight w4 = 1; the output y combines y1 and y2 with weights w5 = 1 and w6 = 1]
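Tracing the figure's weights in code (a sketch assuming the same strict threshold at zero as before) confirms that this network computes XOR:

def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    y1 = step(-1 * x1 + 1 * x2)    # hidden unit y1: w1 = -1, w2 = 1
    y2 = step(1 * x1 - 1 * x2)     # hidden unit y2: w4 = 1, w3 = -1
    return step(1 * y1 + 1 * y2)   # output y: w5 = 1, w6 = 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, '->', xor_net(x1, x2))   # prints 0, 1, 1, 0: the XOR column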


Feedforward Neural Networks

Basically, a hierarchy of perceptrons

With much smoother activation functions, such as:

Sigmoid

Tanh

ReLU

...
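Their standard definitions, as a small Python sketch (numpy assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise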


Feedforward Neural Networks (contd.)

[Figure: a feedforward network with an input layer, hidden layers 1 and 2, and an output layer]


Feedforward Neural Network: Forward Propagation

Let X = (x1, . . . , xn) be the set of input features

Hidden layer activations: a_j = f(∑_{i=1}^{n} W_{ji} x_i), ∀j ∈ {1, . . . , h}


Feedforward Neural Network: Forward Propagation

Let a = (a1, . . . , ah) be the set of hidden layer features

Output neurons: o_k = g(∑_{j=1}^{h} U_{kj} a_j), ∀k ∈ {1, . . . , K}
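Both layers of the forward pass in one Python sketch (numpy assumed; tanh for f and sigmoid for g are arbitrary choices, with W of shape h × n and U of shape K × h):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, U):
    a = np.tanh(W @ x)   # hidden activations a_j = f(sum_i W_ji x_i)
    o = sigmoid(U @ a)   # outputs o_k = g(sum_j U_kj a_j)
    return a, o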


Feedforward Neural Network: Learning Algorithm

Adjust the weights W and U to minimize the error on the training set

Define the error as the squared loss between the prediction and the true output:

E = (1/2) Error² = (1/2)(y − o)²    (1)

The gradient w.r.t. the output is

∂E/∂o_k = (1/2) × 2 × (y_k − o_k) × (−1) = (o_k − y_k)    (2)


Feedforward Neural Network: Learning Algorithm

We have the errors calculated at the output neurons

Send the error down to the lower layers


Feedforward Neural Network: Learning Algorithm

Calculate the gradient w.r.t. the parameters U

∂E/∂o_k = (o_k − y_k)

o_k = g(∑_{j=1}^{h} U_{kj} a_j), ∀k ∈ {1, . . . , K}

∂E/∂U_{kj} = ∂E/∂o_k × g′(∑_{j=1}^{h} U_{kj} a_j) × a_j    (3)

The update for U_{kj} will be

U_{kj} = U_{kj} − η × ∂E/∂U_{kj}    (4)


Feedforward Neural Network: Learning Algorithm

How do we update the parameters W?

a_j = f(∑_{i=1}^{n} W_{ji} x_i)

o_k = g(∑_{j=1}^{h} U_{kj} a_j)

Substituting for a_j: o_k = g(∑_{j=1}^{h} U_{kj} f(∑_{i=1}^{n} W_{ji} x_i))

Calculate the gradient w.r.t. a_j


Feedforward Neural Network: Learning Algorithm

o_k = g(∑_{j=1}^{h} U_{kj} a_j)

We have already calculated ∂E/∂o_k

∂E/∂a_j = ∑_{k=1}^{K} ∂E/∂o_k × g′(∑_{j=1}^{h} U_{kj} a_j) × U_{kj}    (5)


Feedforward Neural Network: Backpropagation of errors

[Figure: the network, with the parameter updates indicated by red lines]


Feedforward Neural Network: Backpropagation of errors

Errors are now accumulated at the hidden layer neurons


Feedforward Neural Network: Backpropagation of errors

We have calculated the error accumulated at each hidden neuron, ∂E/∂a_j

Use this to update the parameters W

a_j = f(∑_{i=1}^{n} W_{ji} x_i), ∀j ∈ {1, . . . , h}

∂E/∂W_{ji} = ∂E/∂a_j × f′(∑_{i=1}^{n} W_{ji} x_i) × x_i    (6)

The update for W_{ji} will be

W_{ji} = W_{ji} − η × ∂E/∂W_{ji}    (7)
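Equations (2)-(7) combined into one gradient-descent step on a single example (a sketch assuming sigmoid for both f and g, whose derivative g′(z) = g(z)(1 − g(z)) keeps the code short):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W, U, eta=0.1):
    a = sigmoid(W @ x)                 # forward: hidden layer a_j
    o = sigmoid(U @ a)                 # forward: output layer o_k
    delta_o = (o - y) * o * (1 - o)    # (dE/do_k) times g', from eq. (2)
    grad_U = np.outer(delta_o, a)      # dE/dU_kj, eq. (3)
    delta_a = U.T @ delta_o            # dE/da_j, eq. (5)
    delta_h = delta_a * a * (1 - a)    # times f', as in eq. (6)
    grad_W = np.outer(delta_h, x)      # dE/dW_ji, eq. (6)
    U -= eta * grad_U                  # eq. (4)
    W -= eta * grad_W                  # eq. (7)
    return W, U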


Feedforward Neural Network: Backpropagation of errors

[Figure: the network, with the parameter updates indicated by red lines]


Feedforward Neural Network

The training proceeds in an online fashion

Minibatches are also used (i.e., parameters are updated after seeing k examples)

Monitor the error on a validation set after each complete sweep of the training set

Training repeats until the error on the validation set stops decreasing (see the sketch below)
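A sketch of this training schedule, reusing sigmoid and backprop_step from the sketch above (the patience threshold is an assumption; for brevity, the k examples of a minibatch are applied as k per-example updates rather than one averaged gradient):

import numpy as np

def validation_error(val_set, W, U):
    # mean squared error, with the same sigmoid layers as backprop_step
    err = 0.0
    for x, y in val_set:
        o = sigmoid(U @ sigmoid(W @ x))
        err += 0.5 * np.sum((y - o) ** 2)
    return err / len(val_set)

def train(train_set, val_set, W, U, k=32, eta=0.1, patience=3):
    best, bad = float('inf'), 0
    while bad < patience:
        for i in range(0, len(train_set), k):      # parameters updated after k examples
            for x, y in train_set[i:i + k]:
                W, U = backprop_step(x, y, W, U, eta)
        err = validation_error(val_set, W, U)      # monitor after each complete sweep
        if err < best:
            best, bad = err, 0
        else:
            bad += 1                               # stop once it no longer decreases
    return W, U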


Problems with Feedforward Networks

Vanishing gradients

Getting stuck in local optima


Problems with Feedforward Networks (contd.)

[Figure: a deep feedforward network with an input layer, hidden layers 1 through 8, and an output layer]


Problems with Feedforward Networks (contd.)

Can we instead start at a relatively good position?

Then even small updates will not be much of an issue

Increased number of parameters

Solution to both: Unsupervised Learning


Advantages of Unsupervised Learning

Unsupervised learning requires no labels

And there is a lot of unlabeled data; is the whole of the Internet enough?

But this data has no labels

One option: use the input itself as the label


AutoEncoders

Two functions: an encoder fθ and a decoder gθ

Datapoint x's representation: h = fθ(x)

Reconstruction using the decoder: r = gθ(h)

Goal: Minimize reconstruction error
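A minimal numpy sketch of this setup (a single-hidden-layer autoencoder with sigmoid encoder and decoder, trained by gradient descent on the squared reconstruction error; every concrete choice here is an assumption):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_step(x, W_enc, W_dec, eta=0.1):
    h = sigmoid(W_enc @ x)            # encoder: h = f_theta(x)
    r = sigmoid(W_dec @ h)            # decoder: r = g_theta(h)
    delta_r = (r - x) * r * (1 - r)   # gradient of 0.5 * ||x - r||^2 through g
    grad_dec = np.outer(delta_r, h)
    delta_h = (W_dec.T @ delta_r) * h * (1 - h)
    grad_enc = np.outer(delta_h, x)
    W_dec -= eta * grad_dec           # one step toward lower reconstruction error
    W_enc -= eta * grad_enc
    return W_enc, W_dec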


AutoEncoder Example


AutoEncoder Trivial Solution

With a large enough hidden layer, the autoencoder can simply learn the identity mapping: perfect reconstruction, but no useful representation


Regularized AutoEncoders

Sparse AutoEncoders

Denoising AutoEncoders

Contractive AutoEncoders


Sparse AutoEncoder


Denoising AutoEncoder


Contractive AutoEncoder


Greedy Layerwise Unsupervised Pretraining

The main reason for the major success of Deep Learning

Definition:

Greedy layerwise: one layer is trained at a time, each according to a local optimum

Unsupervised: the intermediate layers do not need the final labels for training

Pretraining: after this is done, another supervised training step is applied to the entire network
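A sketch of the whole scheme, reusing sigmoid and autoencoder_step from the autoencoder sketch above (the initialization scale, epoch count, and learning rate are assumptions):

import numpy as np

def greedy_pretrain(X, layer_sizes, epochs=10, eta=0.1):
    # X: list of input vectors; layer_sizes: one hidden size per layer to pretrain
    weights, inputs = [], [np.asarray(x, dtype=float) for x in X]
    for h in layer_sizes:
        n = len(inputs[0])
        W_enc = 0.01 * np.random.randn(h, n)   # this layer's autoencoder, trained locally
        W_dec = 0.01 * np.random.randn(n, h)
        for _ in range(epochs):                # greedy: optimize this layer alone
            for x in inputs:
                W_enc, W_dec = autoencoder_step(x, W_enc, W_dec, eta)
        weights.append(W_enc)                  # keep the encoder, discard the decoder
        inputs = [sigmoid(W_enc @ x) for x in inputs]   # codes feed the next layer
    return weights   # use these to initialize the deep net, then fine-tune with labels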


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: the target deep network: input layer, hidden layers 1, 2, and 3, and an output layer]


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: step 1: an autoencoder from the input layer to hidden layer 1 and back to the input layer]


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: step 2: an autoencoder from hidden layer 1 to hidden layer 2 and back to hidden layer 1]


Stacked AutoEncoders

Greedy Layerwise Unsupervised Pretraining in action

[Figure: step 3: an autoencoder from hidden layer 2 to hidden layer 3 and back to hidden layer 2]


Advanced Architectures and Optimization Algorithms

Architectures

Convolutional Neural Networks

Recurrent Neural Networks

Recursive Neural Networks

Optimization Algorithms

RMSprop

AdaGrad

AdaDelta


Conclusion

Motivated AI in general

Discussed perceptron and feedforward neural networks

Understood the shortcomings of naive attempts at deep networks

Understood Greedy Layerwise Unsupervised Pretraining


Thank You
