Dynamic Routing Between Capsules - Heidelberg University

Explainable Machine Learning

6/19/18 Michael Dorkenwald

Introduction

Dynamic Routing Between Capsules by Sara Sabour, Nicholas Frosst, Geoffrey Hinton

from October 2017

Geoffrey Hinton

● Significant contributions to the backpropagation algorithm

● Idea for AlexNet

● Invented Dropout

List of Content

● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion

Convolutional Neural Networks (CNN)

● Special type of multi-layer neural networks, constructed to recognize visual patterns directly from pixel images

Feature Maps of CNNs

Max-Pooling Layer

● Dimension reduction

● Selective routing of features

Loses positional information
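
The loss of positional information can be illustrated with a small sketch. It assumes NumPy and a non-overlapping 2x2 pooling window; the helper `max_pool_2x2` and the toy feature maps are hypothetical and not taken from the slides. Two feature maps that differ only in where the activation sits inside a window give the same pooled output.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2D array with even side lengths."""
    h, w = x.shape
    # Split rows and columns into blocks of 2 and take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two 4x4 feature maps with one activation at different positions
# inside the same top-left 2x2 window.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)

# The inputs differ, but pooling maps them to the same output.
print(np.array_equal(a, b))                # False
print(np.array_equal(pooled_a, pooled_b))  # True
```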

Achievements of CNNs

Spatial Relation

Motivation

Hinton: “The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.”

Looking for equivariance: changes in viewpoint lead to corresponding changes in neural activities

Computer Graphics

● Construct a visual image (rendering) from an abstract representation of an object

Inverse Graphics

● Reverse process: start from the image and get the parameters through inverse rendering

Capsule Network

A capsule network is a neural network that tries to perform inverse graphics

Capsule

● A group of neurons

● Goal: predict the presence and the instantiation parameters of a specific entity at a given location

● Presence is represented by the length of the activity vector (probability)

● Instantiation parameters are:

– position, size, orientation, deformation, hue, texture, etc.

Primary Capsule Activities

Input: Image features

1st layer: Convolutional layer with ReLU activation

2nd layer: Convolutional capsule layer

Squashing function

Squashing Function

● The length of the output vector represents the probability that the entity is present

● Apply non-linearity to the whole capsule
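
A minimal sketch of the squashing non-linearity, assuming NumPy. It follows the form given in the original paper, v = (‖s‖² / (1 + ‖s‖²)) · s / ‖s‖. The function name `squash` and the small `eps` term (added only to avoid division by zero) are choices made for this illustration. Short vectors shrink to a length near 0 and long vectors approach a length of 1, while the direction is kept.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Rescale vector(s) s so the length lies in [0, 1) and the direction is kept."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    # eps is an illustrative safeguard against dividing by zero.
    return scale * s / np.sqrt(sq_norm + eps)

short = squash(np.array([0.1, 0.0]))
long = squash(np.array([30.0, 40.0]))

print(np.linalg.norm(short))  # close to 0
print(np.linalg.norm(long))   # close to 1
```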

Capsules

Inverse Rendering

Image

Capsule activations

Capsules

Inverse Rendering

Image

Equivariance

Capsule activations

Dynamic Routing

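● Prediction vector $\hat{u}_{j|i} = W_{ij} u_i$

– With the previous capsule output $u_i$ and transformation matrix $W_{ij}$

● The capsule output of the next layer: $s_j = \sum_i c_{ij} \hat{u}_{j|i}$, $v_j = \mathrm{squash}(s_j)$

– With $v_j$ the output of the next layer and $c_{ij}$ the coupling coefficients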
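
The prediction step can be sketched as follows, assuming NumPy. The layer sizes, the random values, and the uniform coupling weights are illustrative assumptions, not numbers from the slides. Each lower capsule i multiplies its output by one transformation matrix per higher capsule j, and the higher capsule sums the weighted predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_in, num_out, dim_in, dim_out = 6, 2, 8, 16  # illustrative sizes

u = rng.normal(size=(num_in, dim_in))                    # outputs u_i of the lower layer
W = rng.normal(size=(num_in, num_out, dim_out, dim_in))  # one matrix W_ij per capsule pair

# Prediction vectors: u_hat[i, j] = W[i, j] @ u[i]
u_hat = np.einsum("ijkl,il->ijk", W, u)

# With uniform weights, each s_j is an equally weighted sum of its predictions.
c = np.full((num_in, num_out), 1.0 / num_out)
s = np.einsum("ij,ijk->jk", c, u_hat)

print(u_hat.shape, s.shape)  # (6, 2, 16) (2, 16)
```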

Dynamic Routing

● Coupling coefficients $c_{ij}$

– determined by the iterative dynamic routing process

– computed by a ‘routing softmax’ whose initial logits $b_{ij}$ are the log prior probabilities that capsule i should be coupled to capsule j: $c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$

● Agreement

– is simply the scalar product $a_{ij} = v_j \cdot \hat{u}_{j|i}$

Dynamic Routing

Routing Algorithm:
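
A minimal sketch of routing by agreement, assuming NumPy and the quantities named on the previous slides (logits b_ij, coupling coefficients c_ij, prediction vectors, squash). The function names, the toy prediction vectors, and the default of 3 iterations are assumptions made for this illustration, not the authors' implementation.

```python
import numpy as np

def squash(s, eps=1e-9):
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iterations=3):
    """u_hat: prediction vectors of shape (num_in, num_out, dim_out)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits b_ij start at 0
    for _ in range(num_iterations):
        c = softmax(b, axis=1)                     # c_i = softmax(b_i) over output capsules
        s = np.einsum("ij,ijk->jk", c, u_hat)      # s_j = weighted sum of predictions
        v = squash(s)                              # v_j = squash(s_j)
        b = b + np.einsum("ijk,jk->ij", u_hat, v)  # add agreement (scalar product)
    return v, c

# Toy input: three lower capsules agree on output capsule 0
# and disagree on output capsule 1.
u_hat = np.zeros((3, 2, 2))
u_hat[:, 0] = [2.0, 0.0]
u_hat[0, 1] = [2.0, 0.0]
u_hat[1, 1] = [-2.0, 0.0]
u_hat[2, 1] = [0.0, 0.1]

v, c = dynamic_routing(u_hat)
print(np.round(c, 3))                 # coupling shifts toward output capsule 0
print(np.linalg.norm(v, axis=-1))     # capsule 0 ends up with the longer output
```

In this toy case the predictions for capsule 0 cluster tightly, so its coupling coefficients grow with each iteration, while the conflicting predictions for capsule 1 largely cancel.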

Dynamic Routing

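Loss function:

● For each digit capsule k, the loss function is the margin loss

$L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$

● where $T_k = 1$ when digit k is present, $m^+ = 0.9$ and $m^- = 0.1$; $\lambda$ down-weights the loss for absent digit classes. Default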
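
A minimal sketch of the margin loss, assuming NumPy. The values m+ = 0.9 and m− = 0.1 come from the slide; the down-weighting factor `lam = 0.5` for absent classes is the value used in the original paper and is an assumption here, since the slide text does not state it. The function name and the example lengths are hypothetical.

```python
import numpy as np

def margin_loss(v_norm, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """v_norm: capsule lengths ||v_k|| per class; target: one-hot T_k."""
    # Penalize a present digit whose capsule is shorter than m_plus.
    present = target * np.maximum(0.0, m_plus - v_norm) ** 2
    # Penalize an absent digit whose capsule is longer than m_minus (weighted by lam).
    absent = lam * (1.0 - target) * np.maximum(0.0, v_norm - m_minus) ** 2
    return np.sum(present + absent)

target = np.array([1.0, 0.0, 0.0])
v_norm = np.array([0.95, 0.05, 0.5])  # third capsule is wrongly active

loss = margin_loss(v_norm, target)
print(loss)  # about 0.08, all of it from the third capsule
```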

Capsules

Hierarchy of parts

Capsules

Inverse Rendering

Capsules

Inverse Rendering

Predicted outputs

Capsules

Inverse Rendering

Agreement

Should only be routed to 7

Predicted outputs

Capsules

Inverse Rendering

Predicted outputs

Dynamic Routing:

● b_ij = 0 for all i, j

● c_i = softmax(b_i)

Coupling coefficients in the figure: 0.5, 0.5, 0.5, 0.5

Capsules

Inverse Rendering

Predicted outputs

s_j = weighted sum

v_j = squash(s_j)

Output for round #1

Coupling coefficients in the figure: 0.5, 0.5, 0.5, 0.5

Agreement huge

Disagreement small

Capsules

Inverse Rendering

Predicted outputs

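Dynamic Routing:

● b_ij updated by the agreement from round #1 (initially b_ij = 0 for all i, j)

● c_i = softmax(b_i)

Updated coupling coefficients in the figure: 0.8, 0.2, 0.1, 0.9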

Capsules

Inverse Rendering

Predicted outputs

s_j = weighted sum

v_j = squash(s_j)

Output for round #2

Coupling coefficients in the figure: 0.8, 0.2, 0.1, 0.9

Clustering on agreement

What really happens:

Mean

Clustering on agreement

What really happens:

Weighted mean

Classification

Inverse Rendering

Loss function

Capsule Network Architecture

Reconstruction

Inverse Rendering

Loss function

Decoder

Neural Net

Reconstruction

Reconstruction

● Decoder structure to reconstruct a digit from the DigitCaps layer

Reconstruction as a regularization method

● Force the digit capsules to encode the instantiation parameters of the input digit

● Minimize the sum of squared differences between the outputs of the logistic units and the pixel intensities

● Loss = margin loss + a · reconstruction loss

with scaling factor a = 0.0005
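
A minimal sketch of the combined loss, assuming NumPy and a flattened 28x28 MNIST image (784 pixels). The function names, the parameter name `recon_weight` (standing in for a = 0.0005), and the toy image and margin-loss value are hypothetical. The example shows why the reconstruction term is scaled down: unscaled, the summed squared error over 784 pixels would dominate the margin loss.

```python
import numpy as np

def reconstruction_loss(reconstruction, image):
    """Sum of squared differences between decoder outputs and pixel intensities."""
    return np.sum((reconstruction - image) ** 2)

def total_loss(margin, reconstruction, image, recon_weight=0.0005):
    """Margin loss plus the scaled-down reconstruction loss."""
    return margin + recon_weight * reconstruction_loss(reconstruction, image)

image = np.zeros(784)
image[:100] = 1.0                     # toy "digit": 100 bright pixels
reconstruction = np.full(784, 0.5)    # uninformative decoder output

raw = reconstruction_loss(reconstruction, image)
loss = total_loss(0.08, reconstruction, image)
print(raw)   # 196.0
print(loss)  # about 0.178
```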

Capsules on MNIST

● Images have been shifted by up to 2 pixels in each direction with zero padding; no other data augmentation or model averaging

● Baseline: standard CNN with three convolutional layers (256, 256 and 128 channels, 5x5 kernels, stride 1) followed by 2 fully connected layers (328 and 192 units, with dropout)

● Number of parameters: baseline 35.4M, CapsNet 8.2M, and 6.8M without the reconstruction subnetwork

Individual Dimensions of a Capsule

Robustness to Affine Transformations

● Trained a CapsNet and a traditional CNN (with pooling) on a padded and translated MNIST training set

● Tested both networks on affNIST (MNIST digits with a random small affine transformation)

● Under-trained CapsNet (99.23% accuracy on the expanded MNIST test set) achieved 79% on affNIST

● Traditional CNN (99.22%) with a similar number of parameters achieved 66%

MultiMNIST

● Create the MultiMNIST dataset by overlaying a digit on top of another digit of a different class

● For each digit in MNIST they generate 1K MultiMNIST examples

● Training set size 60M, test set size 10M

MultiMNIST

CIFAR10

● Slight modification of the simple model they used for MNIST, with 3 routing iterations

● Achieved 10.6% test error

● About what standard CNNs achieved when they were first applied to CIFAR10

Conclusion

● Achieved state of the art accuracy on MNIST

● Spatial relations are preserved (equivariance)

– Promising for object detection and segmentation

● Dynamic Routing works great for overlapping digits

● Robustness to affine transformations

● Activation vectors are easier to interpret (scale, thickness, rotation, etc.)

● Ability to analyze the hierarchy of objects

Conclusion

● Not state of the art on CIFAR10

● No results on larger data sets (e.g. ImageNet)

● Slow in training, because of the inner loop in Dynamic Routing

Sources

● Slide 3: https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

● Slide 5: https://www.mathworks.com/discovery/convolutional-neural-network.html

● Slide 6 and 7: https://cs231n.github.io/convolutional-networks/#pool

● Slide 11: https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/clyj4jv/

● Slide 7: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/pooling_layer.html

● Slide 13 and 14: https://kndrck.co/posts/capsule_networks_explained/

● Slide 8: https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

● Slide 9: https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952