cs 7643: deep learning - georgia institute of …...cs 7643: deep learning dhruv batra georgia tech...

91
CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: Computational Graphs Notation + example Computing Gradients Forward mode vs Reverse mode AD

Upload: others

Post on 20-Apr-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

CS 7643: Deep Learning

Dhruv Batra Georgia Tech

Topics: – Computational Graphs

– Notation + example– Computing Gradients

– Forward mode vs Reverse mode AD

Page 2: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Administrativia• HW1 Released

– Due: 09/22

• PS1 Solutions– Coming soon

(C) Dhruv Batra 2

Page 3: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Project• Goal

– Chance to try Deep Learning– Combine with other classes / research / credits / anything

• You have our blanket permission• Extra credit for shooting for a publication

– Encouraged to apply to your research (computer vision, NLP, robotics,…)

– Must be done this semester.

• Main categories– Application/Survey

• Compare a bunch of existing algorithms on a new application domain of your interest

– Formulation/Development• Formulate a new model or algorithm for a new or old problem

– Theory• Theoretically analyze an existing algorithm

(C) Dhruv Batra 3

Page 4: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Administrativia• Project Teams Google Doc

– https://docs.google.com/spreadsheets/d/1AaXY0JE4lAbHvoDaWlc9zsmfKMyuGS39JAn9dpeXhhQ/edit#gid=0

– Project Title– 1-3 sentence project summary TL;DR– Team member names + GT IDs

(C) Dhruv Batra 4

Page 5: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Recap of last time

(C) Dhruv Batra 5

Page 6: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

How do we compute gradients?• Manual Differentiation

• Symbolic Differentiation

• Numerical Differentiation

• Automatic Differentiation– Forward mode AD– Reverse mode AD

• aka “backprop”

(C) Dhruv Batra 6

Page 7: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Any DAG of differentiable modules is allowed!

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 7

Computational Graph

Page 8: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Directed Acyclic Graphs (DAGs)• Exactly what the name suggests

– Directed edges– No (directed) cycles– Underlying undirected cycles okay

(C) Dhruv Batra 8

Page 9: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Directed Acyclic Graphs (DAGs)• Concept

– Topological Ordering

(C) Dhruv Batra 9

Page 10: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Directed Acyclic Graphs (DAGs)

(C) Dhruv Batra 10

Page 11: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Computational Graphs• Notation #1

(C) Dhruv Batra 11

f(x1, x2) = x1x2 + sin(x1)

Page 12: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Computational Graphs• Notation #2

(C) Dhruv Batra 12

f(x1, x2) = x1x2 + sin(x1)

Page 13: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Example

(C) Dhruv Batra 13

f(x1, x2) = x1x2 + sin(x1)

+

sin( )

x1 x2

*

Page 14: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Logistic Regression as a Cascade

(C) Dhruv Batra 14

Given a library of simple functions

Compose into a

complicate function� log

✓1

1 + e�w

|x

w

|x

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Page 15: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Forward mode vs Reverse Mode• Key Computations

(C) Dhruv Batra 15

Page 16: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

16

g

Forward mode AD

Page 17: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

17

g

Reverse mode AD

Page 18: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Example: Forward mode AD

(C) Dhruv Batra 18

f(x1, x2) = x1x2 + sin(x1)

+

sin( )

x1 x2

*

Page 19: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

(C) Dhruv Batra 19

+

sin( )

x1 x2

*

x1 x1

w1 = cos(x1)x1

x2

w2 = x1x2 + x1x2

w3 = w1 + w2

Example: Forward mode ADf(x1, x2) = x1x2 + sin(x1)

Page 20: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

(C) Dhruv Batra 20

+

sin( )

x1 x2

*

x1 x1

w1 = cos(x1)x1

x2

w2 = x1x2 + x1x2

w3 = w1 + w2

Example: Forward mode ADf(x1, x2) = x1x2 + sin(x1)

Page 21: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Example: Reverse mode AD

(C) Dhruv Batra 21

f(x1, x2) = x1x2 + sin(x1)

+

sin( )

x1 x2

*

Page 22: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

(C) Dhruv Batra 22

Example: Reverse mode ADf(x1, x2) = x1x2 + sin(x1)

+

sin( )

x1 x2

*

w3 = 1

w1 = w3 w2 = w3

x1 = w1 cos(x1) x1 = w2x2 x2 = w2x1

Page 23: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Forward Pass vs Forward mode AD vs Reverse Mode AD

(C) Dhruv Batra 23

+

sin( )

x1 x2

*

w3 = 1

w1 = w3 w2 = w3

x1 = w2x2 x2 = w2x1x1 = w1 cos(x1)

+

sin( )

x2

*

x1 x1 x2

w2 = x1x2 + x1x2

w3 = w1 + w2

w1 = cos(x1)x1

x1

+

sin( )

x1 x2

*

f(x1, x2) = x1x2 + sin(x1)

Page 24: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Forward mode vs Reverse Mode• What are the differences?

• Which one is more memory efficient (less storage)? – Forward or backward?

(C) Dhruv Batra 24

Page 25: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Forward mode vs Reverse Mode• What are the differences?

• Which one is more memory efficient (less storage)? – Forward or backward?

• Which one is faster to compute? – Forward or backward?

(C) Dhruv Batra 25

Page 26: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Plan for Today• (Finish) Computing Gradients

– Forward mode vs Reverse mode AD– Patterns in backprop– Backprop in FC+ReLU NNs

• Convolutional Neural Networks

(C) Dhruv Batra 26

Page 27: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 28: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 29: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

add gate: gradient distributor

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 30: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

add gate: gradient distributorQ: What is a max gate?

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 31: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

add gate: gradient distributormax gate: gradient router

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 32: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

add gate: gradient distributormax gate: gradient routerQ: What is a mul gate?

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 33: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

add gate: gradient distributormax gate: gradient routermul gate: gradient switcher

Patterns in backward flow

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 34: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

+

Gradients add at branches

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 35: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Duality in Fprop and Bprop

(C) Dhruv Batra 35

+

+

FPROP BPROPSUM

COPY

Page 36: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

36

Graph (or Net) object (rough psuedo code)

Modularized implementation: forward / backward API

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 37: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

37

(x,y,z are scalars)

x

y

z*

Modularized implementation: forward / backward API

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 38: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

38

(x,y,z are scalars)

x

y

z*

Modularized implementation: forward / backward API

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 39: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

39

Example: Caffe layers

Caffe is licensed under BSD 2-Clause

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 40: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

40

* top_diff (chain rule)

Caffe is licensed under BSD 2-Clause

Caffe Sigmoid Layer

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 41: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

(C) Dhruv Batra 41

Page 42: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

(C) Dhruv Batra 42

Page 43: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Key Computation in DL: Forward-Prop

(C) Dhruv Batra 43Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Page 44: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Key Computation in DL: Back-Prop

(C) Dhruv Batra 44Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Page 45: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

f(x) = max(0,x)(elementwise)

4096-d input vector

4096-d output vector

Jacobian of ReLU

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 46: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

46

f(x) = max(0,x)(elementwise)

4096-d input vector

4096-d output vector

Q: what is the size of the Jacobian matrix?

Jacobian of ReLU

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 47: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

47

f(x) = max(0,x)(elementwise)

4096-d input vector

4096-d output vector

Q: what is the size of the Jacobian matrix?[4096 x 4096!]

Jacobian of ReLU

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 48: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

i.e. Jacobian would technically be a[409,600 x 409,600] matrix :\

f(x) = max(0,x)(elementwise)

4096-d input vector

4096-d output vector

Q: what is the size of the Jacobian matrix?[4096 x 4096!]

in practice we process an entire minibatch (e.g. 100) of examples at one time:

Jacobian of ReLU

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 49: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Q: what is the size of the Jacobian matrix?[4096 x 4096!]

Q2: what does it look like?

f(x) = max(0,x)(elementwise)

4096-d input vector

4096-d output vector

Jacobian of ReLU

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 50: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Jacobians of FC-Layer

(C) Dhruv Batra 50

Page 51: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Jacobians of FC-Layer

(C) Dhruv Batra 51

Page 52: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Jacobians of FC-Layer

(C) Dhruv Batra 52

Page 53: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Neural Networks(without the brain stuff)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 54: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

54

Example: 200x200 image40K hidden units

~2B parameters!!!

- Spatial correlation is local- Waste of resources + we have not enough training samples anyway..

Fully Connected Layer

Slide Credit: Marc'Aurelio Ranzato

Page 55: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

55

Example: 200x200 image40K hidden unitsFilter size: 10x10

4M parameters

Note: This parameterization is good when input image is registered (e.g., face recognition).

Locally Connected Layer

Slide Credit: Marc'Aurelio Ranzato

Page 56: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

56

STATIONARITY? Statistics is similar at different locations

Note: This parameterization is good when input image is registered (e.g., face recognition).

Example: 200x200 image40K hidden unitsFilter size: 10x10

4M parameters

Locally Connected Layer

Slide Credit: Marc'Aurelio Ranzato

Page 57: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

57

Share the same parameters across different locations (assuming input is stationary):Convolutions with learned kernels

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato

Page 58: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutions for mathematicians

(C) Dhruv Batra 58

Page 59: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

(C) Dhruv Batra 59

"Convolution of box signal with itself2" by Convolution_of_box_signal_with_itself.gif: Brian Ambergderivative work: Tinos (talk) - Convolution_of_box_signal_with_itself.gif. Licensed under CC BY-SA 3.0 via Commons -

https://commons.wikimedia.org/wiki/File:Convolution_of_box_signal_with_itself2.gif#/media/File:Convolution_of_box_signal_with_itself2.gif

Page 60: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutions for computer scientists

(C) Dhruv Batra 60

Page 61: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutions for programmers

(C) Dhruv Batra 61

Page 62: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolution Explained• http://setosa.io/ev/image-kernels/

• https://github.com/bruckner/deepViz

(C) Dhruv Batra 62

Page 63: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 63

Page 64: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 64

Page 65: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 65

Page 66: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 66

Page 67: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 67

Page 68: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 68

Page 69: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 69

Page 70: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 70

Page 71: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 71

Page 72: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 72

Page 73: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 73

Page 74: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 74

Page 75: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 75

Page 76: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 76

Page 77: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 77

Page 78: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Mathieu et al. “Fast training of CNNs through FFTs” ICLR 2014

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 78

Page 79: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

* -1 0 1-1 0 1-1 0 1

=

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 79

Page 80: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

Learn multiple filters.

E.g.: 200x200 image100 FiltersFilter size: 10x10

10K parameters

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 80

Page 81: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

30721

Fully Connected Layer32x32x3 image -> stretch to 3072 x 1

10 x 3072 weights

activationinput

110

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 82: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

30721

Fully Connected Layer32x32x3 image -> stretch to 3072 x 1

10 x 3072 weights

activationinput

1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)

110

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 83: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

83

Convolutional Layer

Page 84: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

84

Convolutional Layer

Page 85: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer32x32x3 image -> preserve spatial structure

width

height

depth

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 86: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer

5x5x3 filter

32x32x3 image

Convolve the filter with the imagei.e. “slide over the image spatially, computing dot products”

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 87: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer

5x5x3 filter

32x32x3 image

Convolve the filter with the imagei.e. “slide over the image spatially, computing dot products”

Filters always extend the full depth of the input volume

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 88: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image(i.e. 5*5*3 = 75-dimensional dot product + bias)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 89: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation map

1

28

28

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 90: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation maps

1

28

28

consider a second, green filter

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 91: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward

32

32

3

Convolution Layer

activation maps

6

28

28

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:

We stack these up to get a “new image” of size 28x28x6!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n