Deep Neural Networks Are Our Friends — lxmls.it.pt/2016/deep-neural-networks-are-our-friends.pdf


Deep Neural Networks Are Our Friends

Wang Ling

Outline

● Part I - Neural Networks are our friends
○ Numbers are our friends
○ Operators are our friends
○ Functions are our friends
○ Parameters are our friends
○ Cost Functions are our friends
○ Optimizers are our friends
○ Gradients are our friends
○ Computation Graphs are our friends

● Part II - Into Deep Learning
○ Nonlinear Neural Models
○ Multilayer Perceptrons
○ Using Discrete Variables
○ Example Applications

Numbers are our friends

Numbers are our friends

Abby Cadabby

How many apples does Abby have?

Numbers are our friends
● Types of Numbers:
○ Integers: 5
○ Rationals: 1/2
○ Reals: 1.4e10 ...

Operators are our friends

4

Bert

Operators are our friends

41

Bert

If Abby has 4 apples, and gives Bert 1 apple, how many apples will

Abby have?

Operators are our friends

3 1

Bert

Operators are our friends
● Arithmetic Operators
○ Addition: 23 + 12 = 35
○ Subtraction: 31 - 15 = 16
○ Multiplication: 4 x 5 = 20
○ Division: 20 / 5 = 4

Functions are our friends

41

Functions are our friends

4

5?

1

If Bert always returns 3 bananas for each apple, how many bananas will

Abby receive for 2 apples?

Functions are our friends

y = 3x

● Input, x - Number of Apples given by Abby

Functions are our friends

y = 3x

● Input, x - Number of Apples given by Abby

● Output, y - Number of Bananas received by Abby
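The mapping above is just a function of one input; as a minimal sketch (the function name `bananas` is ours, not from the slides):

```python
# y = 3x: Bert returns 3 bananas for each apple Abby gives him.
def bananas(x):
    return 3 * x
```

So bananas(1) gives 3, matching the worked example on the next slides.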

Functions are our friends

4

5?

1

y = 3x

Functions are our friends

4

5?

1

y = 3x , x =1

Functions are our friends

4

53

1

y = 3x, x = 1 → y = 3

Functions are our friends
y = 3x

Functions are our friends
y = 3x

Cookie Monster

Functions are our friends
y = 3x → y = ??

Functions are our friends
y = ??

0

1

Functions are our friendsy = ??

0

1

16

5

Functions are our friendsy = ??

0

1

16

5

20

6

Functions are our friendsy = ??

0

1

16

5

20

6

?

3

If Abby gives Cookie Monster 3 apples, how many bananas

does she get?

Parameters are our friends

y = 3x + 1

● Input
● Output

Parameters are our friends

y = wx + b

● Input
● Output
● Parameters

Input - Fixed, comes from data
Parameters - Need to be estimated

Parameters are our friends
y = wx + b

0

1

16

5

20

6

?

3

Data

Parameters are our friendsy = wx + b

0

1

16

5

20

6

?

3

Parameters are our friendsy = wx + b

?

3

x y

1 0

5 16

6 20

Parameters are our friends

y = wx + bx y

1 0

5 16

6 20

Data Model

Parameters are our friends

y = wx + bx y

1 0

5 16

6 20

Data Model

How to find the parameters w and b?

Parameters are our friends

y = wx + bx y

1 0

5 16

6 20

Data Model

Model Candidate 1

x y ŷ
1 0 1
5 16 5
6 20 6

y = 1x + 0

Parameters are our friends

y = wx + bx y

1 0

5 16

6 20

Data ModelModel

Candidate 1x y ŷ

1 0 1

5 16 5

6 20 6

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Parameters are our friends

y = wx + bx y

1 0

5 16

6 20

Data ModelModel

Candidate 1x y ŷ

1 0 1

5 16 5

6 20 6

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Which one is better?

Cost functions are our friends

yn = wxn + b

n x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1x y ŷ

1 0 1

5 16 5

6 20 6

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1x y ŷ

1 0 1

5 16 5

6 20 6

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n x y ŷ (y-ŷ)²
0 1 0 1 1
1 5 16 5
2 6 20 6

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n x y ŷ (y-ŷ)²
0 1 0 1 1
1 5 16 5 121
2 6 20 6

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n x y ŷ (y-ŷ)²
0 1 0 1 1
1 5 16 5 121
2 6 20 6 196

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1

Model Candidate 2 x y ŷ

1 0 4

5 16 12

6 20 14

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n x y ŷ (y-ŷ)²
0 1 0 1 1
1 5 16 5 121
2 6 20 6 196

C(1,0) = 318

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1

n x y ŷ (y-ŷ)²

0 1 0 1 1

1 5 16 5 121

2 6 20 6 196

Model Candidate 2

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

C(1,0) = 318

n x y ŷ (y-ŷ)²
0 1 0 4 16
1 5 16 12 16
2 6 20 14 36

C(2,2) = 68
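The comparison above can be checked with a few lines of code; `cost` is our name for C(w,b), a sketch assuming the slide's three data points:

```python
# C(w, b) = sum over the data of (y - yhat)^2, with yhat = w*x + b
data = [(1, 0), (5, 16), (6, 20)]  # (x, y) pairs from the slides

def cost(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in data)
```

cost(1, 0) is 318 and cost(2, 2) is 68, so Candidate 2 is the better model.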

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data ModelModel

Candidate 1

Model Candidate 2

y = 1x + 0

y = 2x + 2

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

318

68

C(1,0)

C(2,2)

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data Model

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

Cost functions are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data Model

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

How to find the parameters w and b?

Optimizers are our friends

yn = wxn + bn x y

0 1 0

1 5 16

2 6 20

Data Model

Cost

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

Optimizer

arg min C(w,b)w,b∈[-∞,∞]

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68

w

b

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68

w

b

2

2

68

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68w1,b1 = 3,2 : C(w1,b1) = ?

w

b

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68w1,b1 = 3,2 : C(w1,b1) = 26

n x y ŷ (y-ŷ)²

0 1 0 5 25

1 5 16 17 1

2 6 20 20 0

C(3,2) 26

w

b

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68w1,b1 = 3,2 : C(w1,b1) = 26

n x y ŷ (y-ŷ)

0 1 0 5 25

1 5 16 17 1

2 6 20 20 0

C(3,2) 26

w

b

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 4,2 : C(w2,b2) = ??

w

b

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 4,2 : C(w2,b2) = 136

w

b

n x y ŷ (y-ŷ)²

0 1 0 6 36

1 5 16 22 64

2 6 20 26 36

C(4,2) 136

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w1,b1 = 3,2 : C(w1,b1) = 26

w

b

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 3,3 : C(w2,b2) = 41

w

b

n x y ŷ (y-ŷ)²

0 1 0 6 36

1 5 16 18 4

2 6 20 21 1

C(3,3) 41

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w1,b1 = 3,2 : C(w1,b1) = 26

w

b

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 3,1 : C(w2,b2) = 17

w

b

n x y ŷ (y-ŷ)²

0 1 0 4 16

1 5 16 16 0

2 6 20 19 1

C(3,1) 17

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w2,b2 = 3,1 : C(w2,b2) = 17

w

b

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w2,b2 = 3,1 : C(w2,b2) = 17

w

b

w3,b3 = 3,0 : C(w3,b3) = 13

n x y ŷ (y-ŷ)²

0 1 0 3 9

1 5 16 15 1

2 6 20 18 4

C(3,0) 13

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13w4,b4 = 3,-1 : C(w4,b4) = 17

n x y ŷ (y-ŷ)²

0 1 0 2 4

1 5 16 14 4

2 6 20 17 9

C(3,-1) 17

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13w4,b4 = 2,0 : C(w4,b4) = 104

n x y ŷ (y-ŷ)²

0 1 0 2 4

1 5 16 10 36

2 6 20 12 64

C(2,0) 104

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13
w4,b4 = 4,0 : C(w4,b4) = 48

n x y ŷ (y-ŷ)²
0 1 0 4 16
1 5 16 20 16
2 6 20 24 16

C(4,0) = 48

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w?,b? = 4,-2 : C(w?,b?) = ??

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

n x y ŷ (y-ŷ)²

0 1 0 2 4

1 5 16 18 4

2 6 20 22 4

C(4,-2) 12

2

w?,b? = 4,-2 : C(w?,b?) = 12

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13

Search Problem

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w3,b3 = 3,0 : C(w3,b3) = 13w4,b4 = 3.01,0 : C(w4,b4) = 12.82

n x y ŷ (y-ŷ)²

0 1 0 3.01 9.06

1 5 16 15.01 0.98

2 6 20 18.01 3.96

C(3.01,0) 12.82

2

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w*,b* = 4,-2 : C(w*,b*) = 12

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w*,b* = 4,-2 : C(w*,b*) = 12

y = wx + b

Optimizers are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

w*,b* = 4,-4 : C(w*,b*) = 0

y = wx + b
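Treating the optimizer as a search problem, the simplest (and most expensive) strategy is brute force over a grid; a sketch, with the grid bounds our own choice:

```python
data = [(1, 0), (5, 16), (6, 20)]

def cost(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in data)

# exhaustive search over an integer grid for arg min C(w, b)
best = min(((w, b) for w in range(-10, 11) for b in range(-10, 11)),
           key=lambda p: cost(*p))
```

The search finds (4, -4) with cost 0, the minimum shown on the slide.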

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

Should be used sparingly

y = wx + b

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

hw = 1

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

hw = 1
C(w0+hw,b0) = C(3,2) = 26

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

hw = 1
C(w0+hw,b0) = C(3,2) = 26

rw = (C(w0+hw,b0) - C(w0,b0)) / hw = (C(3,2) - C(2,2)) / 1 = -42

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

hw = 1, r = -42
hw = 0.1, r = -98
hw = 0.01, r = -104
hw = 0.001, r = -104

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

hw = 1, r = -42
hw = 0.1, r = -98
hw = 0.01, r = -104
hw = 0.001, r = -104

hw → 0, r = ∂C/∂w (w0,b0)
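The finite-difference ratio r can be computed directly; a sketch (`finite_diff_w` is our name):

```python
data = [(1, 0), (5, 16), (6, 20)]

def cost(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in data)

def finite_diff_w(w, b, hw):
    # r = (C(w + hw, b) - C(w, b)) / hw, which tends to dC/dw as hw -> 0
    return (cost(w + hw, b) - cost(w, b)) / hw
```

At (w0, b0) = (2, 2), hw = 1 gives -42, and shrinking hw approaches the true derivative -104.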

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

∂C/∂w = ∂/∂w ∑n (ŷn - yn)²

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

∂C/∂w = ∂/∂w ∑n (ŷn - yn)² = ∑n 2(ŷn - yn)xn

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68

∂C/∂w = ∂/∂w ∑n (ŷn - yn)² = ∑n 2(ŷn - yn)xn

hw → 0, rw = ∂C/∂w (w0,b0) = -104

n x y ŷ (ŷ-y) 2(ŷ-y)x
0 1 0 4 4 8
1 5 16 12 -4 -40
2 6 20 14 -6 -72

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w

b

y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

2

2

68

∂C/∂w = ∂/∂w ∑n (ŷn - yn)² = ∑n 2(ŷn - yn)xn

∂C/∂b = ∂/∂b ∑n (ŷn - yn)² = ∑n 2(ŷn - yn)

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68

hw → 0, rw = ∂C/∂w (w0,b0) = -104

n x y ŷ (ŷ-y) 2(ŷ-y)
0 1 0 4 4 8
1 5 16 12 -4 -8
2 6 20 14 -6 -12

hb → 0, rb = ∂C/∂b (w0,b0) = -12

Gradients are our friendsOptimizer

arg min C(w,b)w,b∈[-∞,∞]

w0,b0 = 2,2 : C(w0,b0) = 68

hw → 0, rw = ∂C/∂w (w0,b0) = -104

hb → 0, rb = ∂C/∂b (w0,b0) = -12

w

b

y = wx + b

2

w1 = w0 - η rw

b1 = b0 - η rb    (η → Learning Rate)
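One full gradient-descent loop, as a sketch; the learning rate 0.01 and the iteration count are our choices, not from the slides:

```python
data = [(1, 0), (5, 16), (6, 20)]
w, b = 2.0, 2.0      # w0, b0 from the slides
lr = 0.01            # learning rate

for _ in range(20000):
    # analytic gradients of C(w, b) = sum (w*x + b - y)^2
    gw = sum(2 * (w * x + b - y) * x for x, y in data)
    gb = sum(2 * (w * x + b - y) for x, y in data)
    w -= lr * gw
    b -= lr * gb
```

The loop converges to w = 4, b = -4, the exact fit y = 4x - 4 found on the next slide.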

Gradients are our friendsy = 4x-4

Data

0

1

16

5

20

6

?

3

Gradients are our friendsy = 4x-4

Data

0

1

16

5

20

6

8

3

Computation Graphs are our friends

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

∂C/∂w = ∂/∂w ∑n (ŷn - yn)² = ∑n 2(ŷn - yn)xn

∂C/∂b = ∂/∂b ∑n (ŷn - yn)² = ∑n 2(ŷn - yn)

y = wx + b

Easy!

Computation Graphs are our friends

Harder!

y = wx + b + tanh(yx + b)²

Computation Graphs are our friends

Computation Graphs can

compute gradients for you!

y = wx + b + tanh(yx + b)²

Computation Graphs are our friends

C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

∂C

∂w=

∂∑(ŷn-yn)

∂wn = ∑-2(ŷn-yn)xn

n

∂C

∂b=

∂∑(ŷn-yn) 2

∂bn = ∑-2(ŷn-yn)

n

y = wx + b

2

Computation Graphs are our friends

C(w,b) = ∑(yn-ŷn)n∈{0,1,2}

2

∂C

∂w=

∂(ŷn-yn)

∂ynn

= ∑-2(ŷn-yn)xn n

2

= ∑-2(ŷn-yn) n

y = wx + b

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn

∂yn

∂b∑

Computation Graphs are our friends

C(w,b) = ∑(yn-ŷn)n∈{0,1,2}

2

∂C

∂w=

∂(ŷn-yn)

∂ynn

2

y = wx + b

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn ∂b∑ ∂yn

Computation Graphs are our friends

C(w,b) = ∑(yn-ŷn)n∈{0,1,2}

2

∂C

∂w=

∂(ŷn-yn)

∂ynn

2

y = o + bo = wx

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn ∂b∑ ∂yn

Computation Graphs are our friends

C(w,b) = ∑n∈{0,1,2} cn

∂C

∂w=

∂ynn

2

c = d²
d = y - ŷ
y = o + b
o = wx

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn ∂b∑ ∂yn

2

∂(ŷn-yn)

Computation Graphs are our friends

C(w,b) = ∑n∈{0,1,2} cn

∂C/∂w = ∑n (∂cn/∂dn)(∂dn/∂yn)(∂yn/∂on)(∂on/∂w)

c = d²
d = y - ŷ
y = o + b
o = wx

∂C/∂b = ∑n (∂cn/∂dn)(∂dn/∂yn)(∂yn/∂b)

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

Add

Product

Sub

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

Add

Product

forward(x,y) → z
backward(x,y,dz) → dx,dy

Sub

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

Add

Product

forward(x,y) : return x - y
backward(x,y,dz) : return dz, -dz

Sub
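In code, one graph operation is a pair of methods with exactly the slide's signature; a sketch for Sub:

```python
class Sub:
    @staticmethod
    def forward(x, y):
        return x - y

    @staticmethod
    def backward(x, y, dz):
        # dz is the gradient arriving from above; by the chain rule
        # d(x - y)/dx = 1 and d(x - y)/dy = -1
        return dz, -dz
```

The same pattern covers the Add, Product, and Power 2 boxes in the graph.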

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

Add

Product

forward(x,y) : return x - y
backward(x,y,dz) : return dz, -dz

Sub

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

Add

Product

forward(x,y) : return x - y
backward(x,y,dz) : return 1, -1

Sub ∂dn

∂ŷn

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷy = o + bo = wx

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

Add

Product

o

w x

Product

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

c = dd = y - ŷ

∂on

∂w∑

∂C

∂b

2

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

o

w x

Product

b

Add

y

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂cn

∂dnn

∂on

∂w∑

∂C

∂b

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

o

w x

Product

b

Add

y ŷ

d c

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0}

∂C

∂w=

∂cn

∂dnn

∂on

∂w∑

∂C

∂b

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0}

∂C

∂w=

∂cn

∂dnn

∂on

∂w∑

∂C

∂b

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

Input

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0}

∂C

∂w=

∂cn

∂dnn

∂on

∂w∑

∂C

∂b

∂dn

∂yn

∂yn

∂on

= ∂cn

∂dnn

∑ ∂dn

∂yn

∂yn

∂b

Power 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

Input

Parameters

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:
1 - Initialize inputs

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:
1 - Initialize inputs
2 - Initialize variables

Variables

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables

Variables

2 values: x and dx

0,0

0,0

0,00,0 0,0

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:
1 - Initialize inputs
2 - Initialize variables
3 - Topologically sort variables

0,0

0,0

0,00,0 0,0

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

1st

2nd

3rd 4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

10,0

0,0

0,00,0 0,0

1st

2nd

3rd4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

10,0

12,0

0,00,0 0,0

1st

2nd

3rd4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

1st

2nd

3rd4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

2,0

0,00,0 0,0

1st

2nd

3rd4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

10,0

2,0

0,00,0 0,0

1st

2nd

3rd4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

Computation Graphs are our friendsPower 2

Sub

o

Add

y

d c Id CForward:

1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

Computation Graphs are our friends

o

y

d c CForward:

1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

1st

2nd

3rd4th 5th

Computation Graphs are our friendsPower 2

Sub

o

Add

y

d c Add CForward:

1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

g0,0

Add

s 0,0

Computation Graphs are our friends

o

y

d c CForward:

1-Initialize inputs2-Initialize variables3-Topological Sort variables

0,0

0,0

0,00,0 0,0

g0,0

s 0,0

1st

2nd

3rd

4th

5th 6th 7th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:
1 - Initialize inputs
2 - Initialize variables
3 - Topologically sort variables
4 - For each variable in topological order, run the forward method of all operations that link to them

0,0

0,0

0,00,0 0,0

1st

2nd

3rd

4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them

10,0

0,0

0,00,0 0,0

1st

2nd

3rd

4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them

10,0

12,0

0,00,0 0,0

1st

2nd

3rd

4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them

10,0

12,0

-4,00,0 0,0

1st

2nd

3rd

4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them

10,0

12,0

-4,016,0 0,0

1st

2nd

3rd

4th 5th

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them

10,0

12,0

-4,016,0

1st

2nd

3rd

4th 5th16,0

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them

5-Set gradients to final variables

10,0

12,0

-4,016,0

1st

2nd

3rd

4th 5th16,1

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,016,0

1st

2nd

3rd

4th 5th16,1

C = c ⇒ ∂C/∂c = 1

dc = dC · ∂C/∂c

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,016,1

1st

2nd

3rd

4th 5th16,1

∂C

∂c C=c =1

dc = dC ∂C

∂c

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,016,1

1st

2nd

3rd

4th 5th16,1

c = d² ⇒ ∂c/∂d = 2d

dd = dc · ∂c/∂d

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,016,1

1st

2nd

3rd

4th 5th16,1

c = d2

dd = dc ∂c

∂d

∂c

∂d= 2 x -4

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,016,1

1st

2nd

3rd

4th 5th16,1

c = d2

dd = dc ∂c

∂d

∂c

∂d= -8

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,-816,1

1st

2nd

3rd

4th 5th16,1

c = d2

dd = dc ∂c

∂d

∂c

∂d= -8

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,0

-4,-816,1

1st

2nd

3rd

4th 5th16,1

d = y - ŷ ∂d

∂y= 1

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,0

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

d = y - ŷ ∂d

∂y= 1

dy = dd ∂d

∂y

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

y = o + b

∂y

∂o= 1

do = dy ∂y

∂o

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

y = o + b

∂y

∂o= 1

∂y

∂b= 1

bt+1 = b - dy ∂y

∂b

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

y = o + b

∂y

∂o= 1

∂y

∂b= 1

bt+1 = b - dy ∂y

∂b

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

y = o + b

∂y

∂o= 1

∂y

∂b= 1

bt+1 = b - ∂c

∂d

∂d∂y

∂y∂b

∂C

∂c

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

y = o + b

∂y

∂o= 1

∂y

∂b= 1

bt+1 = b - ∂C

∂b

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52

2

Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological

order, run the forward method of all operations that link to them (Forward)

5-Set gradients to final variables6-run the operations backward method

in reverse order (Backward)10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

o = wx

∂o

∂w= x

wt+1 = w - do ∂o

∂w

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52.8

2.2

Forward:
1 - Initialize inputs
2 - Initialize variables
3 - Topologically sort variables
4 - For each variable in topological order, run the forward method of all operations that link to them (Forward)
5 - Set gradients on the final variables
6 - Run the operations' backward methods in reverse order (Backward)
7 - Update parameters

10,-8

12,-8

-4,-816,1

1st

2nd

3rd

4th 5th16,1

o = wx

∂o

∂w= x

wt+1 = w - do ∂o

∂w

Computation Graphs are our friendsPower 2

Sub

o

w x

Product

b

Add

y ŷ

d c Id C

16

52.8

2.210,-8

12,-8

-4,-816,1 16,1

o = wx

∂o

∂w= x

wt+1 = w - do ∂o

∂w
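The whole forward/backward pass for one data point (x = 5, target ŷ = 16, w = b = 2) can be written out by hand; this sketch reproduces the values annotated on the graph:

```python
w, b = 2.0, 2.0
x, yhat = 5.0, 16.0   # input and target

# forward pass: o = w*x, y = o + b, d = y - yhat, c = d^2
o = w * x             # 10
y = o + b             # 12
d = y - yhat          # -4
c = d ** 2            # 16, the cost C for this example

# backward pass, in reverse topological order
dc = 1.0              # dC/dc, since C = c
dd = dc * 2 * d       # dc/dd = 2d
dy = dd * 1.0         # dd/dy = 1
do = dy * 1.0         # dy/do = 1
db = dy * 1.0         # dy/db = 1
dw = do * x           # do/dw = x
```

This gives dw = -40 and db = -8, the gradients used in the parameter update.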

Existing Tools:
- Tensorflow ( https://www.tensorflow.org )
- Torch ( https://github.com/torch/nn )
- CNN ( https://github.com/clab/cnn )
- JNN ( https://github.com/wlin12/JNN )
- Theano ( http://deeplearning.net/software/theano/ )

Into Deep Learning

Nonlinear Neural Modelsy = 4x-4

Data

0

1

16

5

20

6

?

3

Nonlinear Neural Models

Data

0

1

16

5

20

6

?

3

There is a limit to the number of bananas I can give you

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

Data

x

y y = 4x-4

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

x

y y = 4x-4

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

x

y y = 2x+3

Model Problem

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

x

y y = 2x+3

Model Problem

Underfitting

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

x

y y = ???

Can we learn arbitrary functions?

Nonlinear Neural Models

y = (w1x + b1)s1 + (w2x+b2)s2

Use different linear functions depending on the value of x?

Nonlinear Neural Models

y = (w1x + b1)s1 + (w2x+b2)s2
s1 = 1 if x < 6 and 0 otherwise
s2 = 1 if x >= 6 and 0 otherwise

Nonlinear Neural Models

y = (w1x + b1)s1 + (w2x+b2)s2

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (4x - 4)s1 + (0x+20)s2

s1 = 1 if x < 6 and 0 otherwise
s2 = 1 if x >= 6 and 0 otherwise
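With hard 0/1 switches, the piecewise model can be checked directly; a sketch:

```python
# y = (4x - 4)*s1 + (0x + 20)*s2 with s1 = [x < 6], s2 = [x >= 6]
def model(x):
    s1 = 1 if x < 6 else 0
    s2 = 1 if x >= 6 else 0
    return (4 * x - 4) * s1 + (0 * x + 20) * s2
```

It reproduces every point in the data table.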

Nonlinear Neural Models

s = σ(wx + b)

σ(t) = 1 / (1 + e^(-t))

Nonlinear Neural Models

s = (1000x)

x = 0.1 then (1000x) = 1

x = -0.1 then (1000x) = 0
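A note of caution when trying this: with arguments like ±3000, a naive 1/(1 + e^(-t)) overflows in floating point, so a numerically stable sigmoid is needed; a sketch:

```python
import math

def sigmoid(t):
    # stable in both tails: never calls exp on a large positive argument
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)
```

sigmoid(1000 * 0.1) is 1 and sigmoid(1000 * -0.1) is 0 to machine precision, as the slide claims.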


Nonlinear Neural Models

s = σ(1000x - 6000)

x = 6.1 then σ(1000x - 6000) ≈ 1

x = 5.9 then σ(1000x - 6000) ≈ 0

Nonlinear Neural Models

y = (w1x + b1)s1 + (w2x+b2)s2

s1 = σ(w3x + b3)
s2 = σ(w4x + b4)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (4x - 4)s1 + (0x+20)s2

s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (4x - 4)s1 + (0x+20)s2

s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (16)s1 + (0x+20)s2

s1 = (-1000x + 6000)s2 = (1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (16)s1 + (20)s2

s1 = (-1000x + 6000)s2 = (1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (16)s1 + (20)s2

s1 = (1000)s2 = (1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (16)s1 + (20)s2

s1 = (1000)s2 = (-1000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (16)1 + (20)0

s1 = (1000)s2 = (-1000)

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = 16

s1 = σ(1000)
s2 = σ(-1000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (4x - 4)s1 + (0x+20)s2

s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (32)s1 + (0x+20)s2

s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (32)s1 + (20)s2

s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (32)s1 + (20)s2

s1 = σ(-3000)
s2 = σ(1000x - 6000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (32)s1 + (20)s2

s1 = σ(-3000)
s2 = σ(3000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = (32)0 + (20)1

s1 = σ(-3000)
s2 = σ(3000)

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data y = 20

s1 = σ(-3000)
s2 = σ(3000)

Nonlinear Neural Models

Data

0

1

16

5

20

6

?

3

If you give me too many apples, I will give them to...

Nonlinear Neural Models

Data

0

1

16

5

20

6

?

3

Count Von Count

Nonlinear Neural Models

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

x

y y = (4x - 4)s1 + (0x+20)s2

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y y = (4x - 4)s1 + (0x+20)s2

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)
s2 = ????
s3 = σ(1000x - 15000)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)
s2 = not s1 and not s3
s3 = σ(1000x - 15000)

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

Layer 1 Perceptron

Layer 1 Perceptron

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

Layer 2 Perceptron

Layer 1 Perceptron

Layer 1 Perceptron

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)
s2 = not s1 and not s3
s3 = σ(1000x - 15000)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(1000x - 15000)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-1000x + 6000)
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(1000x - 15000)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(-4000) = 0

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(-4000) = 0

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0
s2 = σ(500)
s3 = σ(-4000) = 0

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0
s2 = σ(500) = 1
s3 = σ(-4000) = 0

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)0 + (20)1 + (1)0

s1 = σ(-5000) = 0
s2 = σ(500) = 1
s3 = σ(-4000) = 0

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = 20

s1 = σ(-5000) = 0
s2 = σ(500) = 1
s3 = σ(-4000) = 0

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(1000x - 15000)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-1000x + 6000)
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(1000x - 15000)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-13000) = 0
s2 = σ(-1000s1 - 1000s3 + 500)
s3 = σ(4000) = 1

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-13000) = 0
s2 = σ(0 - 1000 + 500)
s3 = σ(4000) = 1

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-13000) = 0
s2 = σ(-500) = 0
s3 = σ(4000) = 1

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)0 + (20)0 + (1)1

s1 = σ(-13000) = 0
s2 = σ(-500) = 0
s3 = σ(4000) = 1

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = 1

s1 = σ(-13000) = 0
s2 = σ(-500) = 0
s3 = σ(4000) = 1

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3
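The fitted model walked through above can be evaluated directly. A sketch using the slide's weights (note that x = 15 sits exactly on the s3 threshold, so the fit at that boundary point is only approximate):

```python
import math

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def mlp(x):
    # gates: s1 fires for x < 6, s3 for x > 15, s2 when neither fires
    s1 = sigmoid(-1000 * x + 6000)
    s3 = sigmoid(1000 * x - 15000)
    s2 = sigmoid(-1000 * s1 - 1000 * s3 + 500)
    return (4 * x - 4) * s1 + 20 * s2 + 1 * s3

for x, y in [(1, 0), (5, 16), (6, 20), (9, 20), (11, 20), (19, 1)]:
    print(x, round(mlp(x), 3))
```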

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

Layer 2 Perceptron

Layer 1 Perceptron

Layer 1 Perceptron

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

x

s1

s3

s2

w4x

b4

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

x

s2

w4x

b4

w7x

b5

s1

s3

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

x

s2

s1

s3

w6s3w5s1

b5

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

x

s2

s1

s3x < 6 x > 15

!(x > 15) & !(x < 6)

Multilayer Perceptrons

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

x

s2

s1

s3x < 6 x > 15

x∈[6,15]

Multilayer Perceptrons

x

s2

s1

s3x < 6 x > 15

x∈[6,15]

s4

x∈]-∞,6] & ]15,∞]

Multilayer Perceptrons

x

s5

s1

s2x < 6 x > 15

x∈[6,15]

s3 x > 2

s4 x < 3

s7

s6

s7

x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]

Multilayer Perceptrons

x

s5

s1

s2x < 6 x > 15

x∈[6,15]

s3 x > 2

s4 x < 3

s7

s6

s7

x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Multilayer Perceptrons

x

s5

s1

s2x < 6 x > 15

x∈[6,15]

s3 x > 2

s4 x < 3

s7

s6

s7

x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

And(s1,s2) = σ(1000s1 + 1000s2 - 1500)
Or(s1,s2) = σ(1000s1 + 1000s2 - 500)
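The And/Or constructions can be verified against their truth tables. A sketch (again with my own stable sigmoid helper):

```python
import math

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def And(a, b):
    # fires only when both inputs are ~1: 1000 + 1000 - 1500 = 500 > 0
    return sigmoid(1000 * a + 1000 * b - 1500)

def Or(a, b):
    # fires when at least one input is ~1: 1000 - 500 = 500 > 0
    return sigmoid(1000 * a + 1000 * b - 500)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(And(a, b)), round(Or(a, b)))
```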

Multilayer Perceptrons

x

s5

s1

s2

s3

s4

s7

s6

s7

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Layer 3 (Xor Combinations)

s8

s9

sa

sb

Multilayer Perceptrons

x

s5

s1

s2

s3

s4

s7

s6

s7

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Layer 3 (Xor Combinations)

s8

s9

sa

sb

Xor(s1,s2) = Or(And(s1,!s2), And(!s1,s2))
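The Xor composition can be checked the same way. The `Not` gate below is my own assumption (built with the same steep-sigmoid construction; it is not given on the slides):

```python
import math

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def And(a, b):
    return sigmoid(1000 * a + 1000 * b - 1500)

def Or(a, b):
    return sigmoid(1000 * a + 1000 * b - 500)

def Not(a):
    # hypothetical negation gate, same construction: fires when a is ~0
    return sigmoid(-1000 * a + 500)

def Xor(a, b):
    # Xor(s1,s2) = Or(And(s1,!s2), And(!s1,s2)), as on the slide
    return Or(And(a, Not(b)), And(Not(a), b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(Xor(a, b)))
```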

Multilayer Perceptrons

x

s5

s1

s2

s3

s4

s7

s6

s7

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Layer 3 (Xor Combinations)

s8

s9

sa

sb

Xor(s1,s2) = Or(s5, s6)

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

Universal approximator

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

but...

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

No guarantee that the best function will be found

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y = 0s5 + 16s6 + 20s7

y

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y

y = 0s5 + 16s6 + 20s7

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y

y = 0s5 + 16s6 + 20s7

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

Overfitting

y = 0s5 + 16s6 + 20s7

Multilayer Perceptrons

y

Model Problem

Task Complexity

Model Complexity

Multilayer Perceptrons

Task Complexity

Model Complexity

Underfitting

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression · MLP 1 Layer · MLP 2 Layer · MLP 3 Layer

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression · Linear Regression + more features

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression · MLP 1 Layer · MLP 2 Layer · MLP 3 Layer

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression · MLP 1 Layer · MLP 2 Layer · MLP 3 Layer

Sentiment analysis

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression · MLP 1 Layer · MLP 2 Layer · MLP 3 Layer

Sentiment analysis

Machine Translation

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Data

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Data

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Data

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

y y

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 2 4

y y

Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 2 4

y y

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Model Bias

Multilayer Perceptrons

Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Model Bias:
L1 & L2 Regularization
Stochastic Dropout (Srivastava et al, 2014)
Model Structure (CNNs, RNNs)

Multilayer Perceptrons

Regularization

C(w,b) = ∑n (yn - ŷn)² + β(w² + b²)

β = Regularization constant
n ∈ {0,1,2}
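The regularized cost can be sketched for the simple linear model y = wx + b, using the first three data points from the deck (the squared L2 penalty form is assumed):

```python
import numpy as np

def cost(w, b, xs, ys, beta):
    # squared error plus an L2 penalty that favors small weights
    preds = w * xs + b
    return np.sum((ys - preds) ** 2) + beta * (w ** 2 + b ** 2)

xs = np.array([1.0, 5.0, 6.0])
ys = np.array([0.0, 16.0, 20.0])
print(cost(4.0, -4.0, xs, ys, beta=0.0))  # perfect fit: error term is 0
print(cost(4.0, -4.0, xs, ys, beta=0.1))  # same fit, but weights are penalized
```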

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

y

Regularization

Multilayer Perceptrons

x

s5

s1

s2x > 1 nothing

x∈]-∞,1]

s3 nothing

s4 x < 6

s7

s6

nothing x∈[6,∞]

y

Regularization

Multilayer Perceptrons

x

s5

s1

s2x > 1 nothing

x∈]-∞,1]

s3 nothing

s4 x < 6

s7

s6

nothing x∈[6,∞]

y

Regularization

Find solutions that require less effort

Multilayer Perceptrons

x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

y

Stochastic Dropout (Srivastava et al, 2014)

Multilayer Perceptrons

Stochastic Dropout (Srivastava et al, 2014)

x

s5

s1

s2x > 1 0

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

0 0

y

Multilayer Perceptrons

Stochastic Dropout (Srivastava et al, 2014)

x

s5

s1

s2x > 1 0

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

0 0

y

Find robust models
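Dropout can be sketched as a random mask over the hidden units. This uses the common "inverted dropout" rescaling, which is my choice here rather than something stated on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, train=True):
    # during training, zero each hidden unit with probability p and
    # rescale the survivors so the expected activation is unchanged
    if not train:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = np.ones(8)
print(dropout(h, p=0.5))        # some units zeroed, survivors scaled to 2.0
print(dropout(h, train=False))  # at test time the layer is untouched
```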

Multilayer Perceptrons

Model Structure

Weighted sum of linear functions VS MLP

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

Multilayer Perceptrons

Model Structure

Weighted sum of linear functions VS MLP

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

Convolutional Vs RNNs

Multilayer Perceptrons

s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)

x

s2

s1

s3

w6s3w5s1

b5

Representation

Multilayer Perceptrons

s1 = σ(W3x + b3)
s2 = σ(W4s1 + b4)

Representation

s1

s2

2

1

1xx

s2

s1

s3

Multilayer Perceptrons

Representation

s1

s2

1000

1000

100x

s1 = σ(Ws2 + b)

Multilayer Perceptrons

Representation

s1

s2

1000

1000

100x

s1 = σ(Ws2 + b)

TensorFlow Code

s1 = tf.matmul(x, W1) + b1

s1 = tf.nn.sigmoid(s1)

s2 = tf.matmul(s1, W2) + b2

s2 = tf.nn.sigmoid(s2)
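The TF1-style lines above assume `x`, `W1`, `b1`, `W2`, `b2` are already defined. An equivalent plain-NumPy forward pass, with toy layer sizes of my own choosing, looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# toy sizes standing in for the 100 -> 1000 -> 1000 layers sketched above
x = rng.standard_normal((1, 100))
W1, b1 = 0.1 * rng.standard_normal((100, 1000)), np.zeros(1000)
W2, b2 = 0.1 * rng.standard_normal((1000, 1000)), np.zeros(1000)

s1 = sigmoid(x @ W1 + b1)   # s1 = sigma(W1 x + b1)
s2 = sigmoid(s1 @ W2 + b2)  # s2 = sigma(W2 s1 + b2)
print(s2.shape)  # (1, 1000)
```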

Multilayer Perceptrons

Using Discrete Variables

Data

0

1

16

5

20

6

?

3

Using Discrete Variables

Data

0

1

16

5

20

6

?

3

?

Using Discrete Variables

x

s5

s1

s2

s3

s4

s7

s6

y

Number of fruit to offer

Number of fruit received

Using Discrete Variables

x

y

Number of fruit to offer

Number of fruit received

s1

s2

Using Discrete Variables

x

y

Number of fruit to offer

uType of fruit to offer

v Number of fruit receivedType of fruit received

s1

s2

Using Discrete Variables

x

y

Number of fruit to offer

uType of fruit to offer

v Number of fruit receivedType of fruit received

s1

s2

u∈{Apple, Banana, Coconut}

v∈{Apple, Banana, Coconut}

Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

V = 3

Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

V = 3

Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

Embedding for u Size = 4

V = 3

Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

Embedding for u

Banana

Size = 4

V = 3

Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

0 0.1 -0.4 0.2 0.5

1 0.4 1.4 -1.0 0.1

2 1.1 0.9 1.1 0.5

u

Embedding for u

1

Size = 4

V = 3

Using Discrete Variables

Lookup Tables

u

Embedding for u

1

Lookup

Size = 4
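A lookup table is just row selection in an embedding matrix. A sketch with the V = 3, size-4 table from the slides:

```python
import numpy as np

# lookup table from the slide: V = 3 fruit types, embedding size 4
E = np.array([
    [0.1, -0.4,  0.2, 0.5],   # Apple   (index 0)
    [0.4,  1.4, -1.0, 0.1],   # Banana  (index 1)
    [1.1,  0.9,  1.1, 0.5],   # Coconut (index 2)
])

vocab = {"Apple": 0, "Banana": 1, "Coconut": 2}

# looking up "Banana" selects row 1
e_u = E[vocab["Banana"]]
print(e_u)  # [ 0.4  1.4 -1.   0.1]
```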

Using Discrete Variables

x

y

Number of fruit to offer

uType of fruit to offer

v Number of fruit receivedType of fruit received

s1

s2

u∈{Apple, Banana, Coconut}

v∈{Apple, Banana, Coconut}

eu

Lookup

Using Discrete Variables

Softmax

V = 3

Apple Banana Coconut

w1 0.1 -0.4 0.2

w2 0.4 1.4 -1.0

w3 1.1 0.9 1.1

w4 1.3 0.1 0.4

Using Discrete Variables

Softmax

Input vector: Size = 4

V = 3

Apple Banana Coconut

w1 0.1 -0.4 0.2

w2 0.4 1.4 -1.0

w3 1.1 0.9 1.1

w4 1.3 0.1 0.4

Using Discrete Variables

Softmax

Input vector Size = 4

logits Size = V

V = 3

Apple Banana Coconut

w1 0.1 -0.4 0.2

w2 0.4 1.4 -1.0

w3 1.1 0.9 1.1

w4 1.3 0.1 0.4

Using Discrete Variables

Softmax

Input Vector

Logits

V = 3

Apple Banana Coconut

w1 0.1 -0.4 0.2

w2 0.4 1.4 -1.0

w3 1.1 0.9 1.1

w4 1.3 0.1 0.4

s1

s2

s3

s4

d1

d2

d3

1 -1 -2

Using Discrete Variables

Softmax

Input Vector

Logits

V = 3

Apple Banana Coconut

w1 0.1 -0.4 0.2

w2 0.4 1.4 -1.0

w3 1.1 0.9 1.1

w4 1.3 0.1 0.4

s1

s2

s3

s4

d1

d2

d3

1 -1 -2

p1

p2

p3

0.84 0.11 0.04

Using Discrete Variables

Softmax

Input Vector

Logits

V = 3

Apple Banana Coconut

w1 0.1 -0.4 0.2

w2 0.4 1.4 -1.0

w3 1.1 0.9 1.1

w4 1.3 0.1 0.4

s1

s2

s3

s4

d1

d2

d3

1 -1 -2

p1

p2

p3

0.84 0.11 0.04

Apple
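The softmax step above can be reproduced with the slide's logits (1, -1, -2); note the third probability rounds to 0.04:

```python
import numpy as np

def softmax(d):
    # subtract the max logit for numerical stability; output sums to 1
    e = np.exp(d - np.max(d))
    return e / e.sum()

# logits from the slide for Apple, Banana, Coconut
p = softmax(np.array([1.0, -1.0, -2.0]))
print(np.round(p, 2))  # [0.84 0.11 0.04]
print(["Apple", "Banana", "Coconut"][int(np.argmax(p))])  # Apple
```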

Using Discrete Variables

x

y

Number of fruit to offer

uType of fruit to offer

v Number of fruit receivedType of fruit received

s1

s2

u∈{Apple, Banana, Coconut}

v∈{Apple, Banana, Coconut}

eu

Softmax

Lookup

Using Discrete Variables

x

y

Number of fruit to offer

uType of fruit to offer

v Number of fruit receivedType of fruit received

s1

s2

u∈{Apple, Banana, Coconut}

v∈{Apple, Banana, Coconut}

eu

Softmax

Lookup

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

NNP VBZ TO VB NNS CC NNS

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2 Word Embeddings

Non-Linear Layer 1s1

s2 Non-Linear Layer 2

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2 Word Embeddings

Non-Linear Layer 1s1

s2 Non-Linear Layer 2

VB Softmax

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2 Word Embeddings

Non-Linear Layer 1s1

s2 Non-Linear Layer 2

VB Softmax

Example Applications

Window-based Tagging (Collobert et al, 2011)
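The window-based tagger can be sketched end to end: look up the five word embeddings around the target word, concatenate them, apply two nonlinear layers, and softmax over the tag set. All sizes and word ids below are toy values of my own, not the dimensions used by Collobert et al:

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, H, n_tags = 10, 4, 8, 5          # toy vocabulary/embedding/tag sizes
E = rng.standard_normal((V, d))        # word embeddings
W1 = rng.standard_normal((5 * d, H))   # layer 1 over a 5-word window
W2 = rng.standard_normal((H, n_tags))  # layer 2 -> tag logits

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag(window_ids):
    # concatenate the embeddings of the 5 words around the target word
    h = np.concatenate([E[i] for i in window_ids])
    s1 = sigmoid(h @ W1)
    return softmax(s1 @ W2)  # distribution over tags for the center word

p = tag([3, 1, 4, 1, 5])  # hypothetical word ids for a 5-word window
print(p.shape, round(float(p.sum()), 6))
```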

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

Context

Predict

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

e-4 e-3 e-2 e-1

s1

s2

Softmax

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.2<s>

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.2 0.1

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.2 0.1 0.3

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.2 0.1 0.3 0.5 0.7 0.4 0.2

0.000378
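The rescoring step multiplies the model's per-word probabilities into one sentence score; the per-word values shown on the slides are illustrative. A minimal sketch with hypothetical probabilities:

```python
import math

def sentence_score(word_probs):
    # a language model scores a sentence as the product of per-word probabilities
    return math.prod(word_probs)

# hypothetical per-word probabilities for a three-word sentence
print(sentence_score([0.2, 0.1, 0.3]))
```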

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas 0.000378

Abby dislikes to drink apples and bananas 0.00012

John does to eat coconuts and bananas 0.00003

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

Context

Predict

Translation

Source

Abby gosta de comer macas e bananas

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

Translation

macas

e-4 e-3 e-2 e-1

s1

s2

f-1

Example Applications

Translation Rescoring (Devlin et al, 2014)

Translation Score (BLEU) Arabic - English Chinese - English

Best Rescored System 52.8 34.7

1st OpenMT12 49.5 32.6

Hierarchical 43.4 30.1

Deep Neural Networks are our friends?

Convolutional Neural Network

Deep Neural Networks are our friends?

Convolutional Neural Network

x1 x2 x3 x4

x5 x6 x7 x8

x9 x10 x11 x12

x13 x14 x15 x16

4x4 image

Deep Neural Networks are our friends?

Convolutional Neural Network

x1 x2 x3 x4

x5 x6 x7 x8

x9 x10 x11 x12

x13 x14 x15 x16

4x4 image

z1

x1

x2

...

x11

z1

w9

w1

Deep Neural Networks are our friends?

Convolutional Neural Network

x1 x2 x3 x4

x5 x6 x7 x8

x9 x10 x11 x12

x13 x14 x15 x16

4x4 image

z1 z2

x2

x3

...

x12

z1

w1

w9

Deep Neural Networks are our friends?

Convolutional Neural Network

x1 x2 x3 x4

x5 x6 x7 x8

x9 x10 x11 x12

x13 x14 x15 x16

4x4 image

z1 z2

z3 z4

Deep Neural Networks are our friends?

Convolutional Neural Network

x1 x2 x3 x4

x5 x6 x7 x8

x9 x10 x11 x12

x13 x14 x15 x16

4x4 image

z1 z2

z3 z4

z1

z2

z3

z4

y Is this a cat?
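The slides above can be sketched as code: one 3x3 filter with shared weights slides over the 4x4 image, producing the 2x2 feature map z1..z4, which a final unit turns into the "is this a cat?" score. The random weights here are placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = rng.standard_normal((4, 4))  # the 4x4 input image
w = rng.standard_normal((3, 3))  # one 3x3 filter, shared at every position

# sliding a 3x3 filter over a 4x4 image gives a 2x2 output map (z1..z4)
z = np.array([[sigmoid(np.sum(x[i:i + 3, j:j + 3] * w)) for j in range(2)]
              for i in range(2)])

# classifier on top of the flattened 2x2 feature map
v = rng.standard_normal(4)
y = sigmoid(z.flatten() @ v)
print(z.shape, float(y))
```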
