[PR12] Capsule Networks - Jaejun Yoo


TRANSCRIPT

Page 1:

Capsule Networks

Understanding together with PR12

Jaejun Yoo, Ph.D. Candidate @ KAIST

PR12, 17th Dec, 2017

Page 2:

Today’s contents

Dynamic Routing Between Capsules

by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Oct. 2017: https://arxiv.org/abs/1710.09829

NIPS 2017 Paper

Page 3:

Convolutional Neural Networks

What is the problem with CNNs?

Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

1) If the images have rotation, tilt, or any other different orientation, then CNNs perform poorly.
2) Each layer of a CNN understands an image at a much more granular level (slow increase in receptive field).

DATA AUGMENTATION, MAX POOLING

Page 4:

Convolutional Neural Networks

What is the problem with CNNs?

"Pooling helps in creating the positional invariance. This invariance also triggers false positives for images which have the components of a ship but not in the correct order."

This was never the intention of the pooling layer!
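To make the lost-position point concrete, here is a minimal numpy sketch (not from the slides; `max_pool` is an illustrative toy): two feature maps that detect the same part at different positions pool to identical outputs.

```python
import numpy as np

def max_pool(fmap, k=2):
    """Naive k x k max pooling with stride k (illustrative toy, not a CNN layer)."""
    h, w = fmap.shape
    return fmap.reshape(h // k, k, w // k, k).max(axis=(1, 3))

# The same feature detected at two different positions...
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0

# ...pools to the identical output: positional information is discarded.
print(np.array_equal(max_pool(a), max_pool(b)))  # True
```

This is exactly the invariance the quote describes: after pooling, the network can no longer tell whether the parts were in the correct arrangement.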

Page 6:

Convolutional Neural Networks

What we need: EQUIVARIANCE (not invariance)

"Equivariance makes a CNN understand the rotation or proportion change and adapt itself accordingly so that the spatial positioning inside an image is not lost."

Page 7:

Capsules

"A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part."

An 8D capsule, e.g.: hue, position, size, orientation, deformation, texture, etc.

Page 8:

Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets

Capsules

An 8D vector, e.g.: hue, position, size, orientation, deformation, texture, etc.

Inverse Rendering

Pages 9-12:

Equivariance of Capsules

Page 14:

Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418

Routing by Agreement

Page 15:

Aurélien Géron, 2017

Primary Capsules


Page 18:

Predict Next Layer's Output

One transformation matrix Wi,j per part/whole pair (i, j):

ûj|i = Wi,j ui
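A minimal numpy sketch of this prediction step (shapes, names, and random values are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, d_in = 3, 8     # e.g., 3 primary capsules, 8D each
n_out, d_out = 2, 16  # e.g., 2 next-layer capsules, 16D each

# One learned transformation matrix W[i, j] per part/whole pair (i, j).
W = rng.normal(size=(n_in, n_out, d_out, d_in))
u = rng.normal(size=(n_in, d_in))  # primary capsule outputs u_i

# Prediction vectors u_hat[i, j] = W[i, j] @ u[i]
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (3, 2, 16)
```

Each primary capsule i thus predicts what every next-layer capsule j should output, given its own pose.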


Page 21:

Compute Next Layer's Output

(Figure: primary capsules and their predicted outputs)

Pages 22-23:

Routing by Agreement

Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.

Pages 24-25:

Routing Weights

bi,j = 0 for all i, j
ci = softmax(bi)

(Figure: initially every routing weight is 0.5, so each primary capsule is routed equally to both next-layer capsules.)

Pages 26-28:

Compute Next Layer's Output

sj = weighted sum of the predicted outputs ûj|i (round #1: all routing weights are 0.5)
vj = squash(sj)

Actual outputs of the next layer capsules (round #1)
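The squash nonlinearity (Eq. 1 in the paper) scales short vectors toward zero and long vectors toward unit length, so a capsule's norm can act as a probability; a small numpy sketch:

```python
import numpy as np

def squash(s, eps=1e-9):
    """squash(s) = (||s||^2 / (1 + ||s||^2)) * (s / ||s||), Eq. 1 in the paper."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

v = squash(np.array([3.0, 4.0]))  # ||s|| = 5
print(np.linalg.norm(v))          # 25/26, close to 1 for long vectors
print(np.linalg.norm(squash(np.array([0.1, 0.0]))))  # ~0.0099, near 0 for short ones
```

The direction of sj is preserved; only its length is rescaled into [0, 1).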

Pages 29-32:

Update Routing Weights

Agreement: bi,j += ûj|i · vj

The dot product is large when the prediction ûj|i agrees with the actual output vj of round #1, and small when it disagrees.
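Putting the slides' steps together, a full routing pass can be sketched as follows. This is a hedged reimplementation of the paper's routing procedure in plain numpy (names and shapes are illustrative):

```python
import numpy as np

def squash(s, eps=1e-9):
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def route(u_hat, n_iters=3):
    """Dynamic routing over prediction vectors u_hat of shape (n_in, n_out, d_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits: b_ij = 0 for all i, j
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_i = softmax(b_i)
        s = np.einsum('ij,ijk->jk', c, u_hat)  # s_j = weighted sum of predictions
        v = squash(s)                          # v_j = squash(s_j)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)  # b_ij += u_hat_j|i . v_j
    return v, c

rng = np.random.default_rng(0)
v, c = route(rng.normal(size=(3, 2, 4)))  # 3 primary capsules, 2 output capsules
print(v.shape, c.shape)  # (2, 4) (3, 2)
```

After a few iterations, the routing weights c concentrate on the output capsules whose actual outputs agree with each primary capsule's prediction.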

Pages 33-36:

Compute Next Layer's Output

sj = weighted sum of the predicted outputs ûj|i (round #2: the routing weights have shifted, e.g., to 0.1/0.2 and 0.8/0.9)
vj = squash(sj)

Actual outputs of the next layer capsules (round #2)

Pages 37-39:

Handling Crowded Scenes

Is this an upside down house? Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away): the scene is parsed as a house and a boat.

Page 40:

Classification CapsNet

The ℓ2 norm of each output capsule's activity vector gives the estimated class probability.

Pages 41-42:

Training

To allow multiple classes, minimize the margin loss:

Lk = Tk max(0, m+ - ||vk||)² + λ (1 - Tk) max(0, ||vk|| - m-)²

Tk = 1 iff class k is present

In the paper: m- = 0.1, m+ = 0.9, λ = 0.5

Translated to English: "If an object of class k is present, then ||vk|| should be no less than 0.9. If not, then ||vk|| should be no more than 0.1."
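A direct numpy transcription of the margin loss with the paper's constants (the function name and the test vectors are illustrative):

```python
import numpy as np

def margin_loss(v_norms, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """L_k = T_k max(0, m+ - ||v_k||)^2 + lambda (1 - T_k) max(0, ||v_k|| - m-)^2"""
    present = T * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norms - m_neg) ** 2
    return np.sum(present + absent)

# Class 0 present with a long vector, class 1 absent with a short one: zero loss.
print(margin_loss(np.array([0.95, 0.05]), np.array([1.0, 0.0])))  # 0.0
# The wrong way around: large loss.
print(margin_loss(np.array([0.05, 0.95]), np.array([1.0, 0.0])))
```

Because each class gets its own term Lk, several classes can be present at once, unlike a softmax over classes.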

Pages 43-44:

Regularization by Reconstruction

A feedforward neural network (the decoder) reconstructs the input image from the output capsules.

Loss = margin loss + α reconstruction loss

The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
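The combined objective is then a one-liner; a sketch with the paper's α (array shapes and names are illustrative):

```python
import numpy as np

def total_loss(margin, x, x_rec, alpha=0.0005):
    """Loss = margin loss + alpha * sum of squared pixel differences."""
    return margin + alpha * np.sum((x - x_rec) ** 2)

x = np.ones((28, 28))       # input image (MNIST-sized, illustrative)
x_rec = np.zeros((28, 28))  # a (bad) reconstruction
print(total_loss(0.0, x, x_rec))  # 0.0005 * 784 = 0.392
```

The small α keeps the reconstruction term from dominating the margin loss during training.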

Page 45:

A CapsNet for MNIST

(Figure 1 from the paper)

Page 46:

A CapsNet for MNIST – Decoder

(Figure 2 from the paper)

Page 47:

Interpretable Activation Vectors

(Figure 4 from the paper)

Page 48:

Pros

● Reaches high accuracy on MNIST, with promising results on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)

Page 49:

Cons

● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop (in the routing-by-agreement algorithm)
● A CapsNet cannot see two very close identical objects
  ○ This is called “crowding”, and it has been observed as well in human vision

Page 50:

Results

What the individual dimensions of a capsule represent

Page 51:

Results

MultiMNIST: Segmenting Highly Overlapping Digits

Page 52:

Remaining Questions

Do capsules really work the way real neurons do?

Perceptual illusions

Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 38(6), 483-484.

Page 53:

References

• https://arxiv.org/abs/1710.09829 (paper)
• https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/
• https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
• https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
• https://www.youtube.com/watch?v=pPN8d0E3900 (video)
• https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets (slides)