Dynamic Routing Between Capsules · 2018-04-19
TRANSCRIPT
Dynamic Routing Between Capsules
Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy
Problems & Results
Object classification in images without losing information about important parts of the picture.
smallNORB: images of 3D objects in 5 classes; 50 toys photographed from different angles
● CapsNet (2017): 2.7% error
● State of the art (2017): 2.56% error
● CapsNet (2018): 1.4% error
MNIST: handwritten digit classification. Result: 0.25% error (state of the art)
How would ConvNets achieve rotational invariance?
Problem with CNN
Traditional ConvNet
● Translational invariance: Max Pooling
● Susceptible to affine transformations
● Max Pooling throws away information
● Human brains don’t work like that
Capsule Networks
● Equivariance by “Routing by Agreement”
● Equivariance keeps track of where something is in the image
● Robust to affine transformations
● Makes biological sense
● Achieves inverse rendering (with capsules)
Motivation
Rendering vs. Inverse Rendering
● A capsule is a group of neurons that outputs a vector activation
● The vector represents features of the object
● A capsule represents the inverse graphics of an image patch
● Orientation of the vector: represents properties of the entity
● Length of the vector: represents the existence of the entity
What is a capsule?
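The length of the activation vector is kept in [0, 1) by the paper's "squash" nonlinearity, v = (‖s‖² / (1 + ‖s‖²)) · s/‖s‖, so it can be read as the probability that the entity exists. A minimal plain-Python sketch (list-based toy code, not the paper's implementation):

```python
import math

def squash(s):
    """Scale vector s so its length lies in [0, 1).
    Short vectors shrink toward 0; long vectors approach unit length.
    Direction (the capsule's pose parameters) is preserved."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm)
    return [scale * x / norm for x in s]
```

For example, squash([3.0, 4.0]) has length 25/26 ≈ 0.96, while squash([0.1, 0.0]) has length ≈ 0.01, so only a confident capsule's output is near unit length.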
A Toy Example
Slides heavily inspired by Aurélien Géron [2]
Predict Next Layer’s Output
Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.
Routing Algorithm
● û(j|i) = W(ij) u(i): predict next layer's output
● c(ij) = softmax(b(ij)): routing coefficient from capsule i to parent capsule j
● s(j) = Σ(i) c(ij) û(j|i), v(j) = squash(s(j)): output of capsule j (parent)
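The prediction step is just a learned linear transform: each lower capsule i multiplies its output u(i) by a matrix W(ij) to predict what parent j should output. A sketch with a hypothetical 2x2 matrix (values invented purely for illustration):

```python
def predict_parent(W_ij, u_i):
    """Prediction vector u_hat_{j|i} = W_ij @ u_i.
    W_ij is the learned transform from lower capsule i to parent capsule j."""
    return [sum(w * x for w, x in zip(row, u_i)) for row in W_ij]

# Toy example: a made-up transform that doubles the first pose coordinate.
W = [[2.0, 0.0],
     [0.0, 1.0]]
prediction = predict_parent(W, [1.0, 1.0])  # -> [2.0, 1.0]
```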
Routing Weights
0.5 0.5
0.5 0.5
Compute Next Layer’s Output
Agreement! → large routing weight
Update Routing Weights
Disagreement! → small routing weight
Routing Weights
0.2 0.8
0.1 0.9
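The update loop illustrated in these slides is the paper's routing-by-agreement procedure: logits b(ij) start at zero, coupling coefficients c(ij) are their softmax over parents, and each iteration adds the agreement û(j|i) · v(j) to b(ij). A plain-Python sketch on toy 2-D vectors (not an efficient implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s):
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    n = math.sqrt(sq)
    return [(sq / (1.0 + sq)) * x / n for x in s]

def route(u_hat, iterations=3):
    """Dynamic routing by agreement.
    u_hat[i][j] = prediction vector from lower capsule i for parent capsule j.
    Returns parent outputs v and coupling coefficients c."""
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start at zero
    for _ in range(iterations):
        c = [softmax(row) for row in b]       # couplings sum to 1 over parents
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        for i in range(n_in):
            for j in range(n_out):
                # agreement between prediction and actual parent output
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

With two lower capsules whose predictions agree for parent 0 but cancel out for parent 1, the couplings shift toward parent 0 after a few iterations, mirroring the 0.5/0.5 → 0.2/0.8 weight tables in the slides.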
● 70,000 handwritten digits
● 28x28 grayscale images
● Digit classification (10 classes)
The MNIST Dataset
Architecture
Image: 28x28
Conv1: 256x20x20 (256 kernels of 9x9)
Conv2: 256x6x6 (256 kernels of 9x9, stride 2)
reshape
Primary capsules: 32x8x6x6 (32 capsule types, 8-D vectors, 6x6 grid)
DigitCaps: 10x16 (10 classes, 16-D vectors)
Wij: [8x16]
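The feature-map sizes above follow from valid (unpadded) convolution arithmetic, which is easy to sanity-check:

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

h1 = conv_out(28, 9)             # Conv1: 9x9 kernels on 28x28 -> 20x20 (x256 channels)
h2 = conv_out(h1, 9, stride=2)   # primary-capsule conv: 9x9, stride 2 -> 6x6
n_primary = 32 * h2 * h2         # 32 capsule types per grid position
```

This gives h1 = 20, h2 = 6, and n_primary = 1152, matching the 256x20x20 and 32x8x6x6 shapes above; each of the 1152 primary capsules then uses its own 8x16 W(ij) per digit capsule.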
Loss Function
● A decoder is used to reconstruct object from capsule representation
● Reconstruction loss: mean-squared error
● Encourages capsules to encode the instantiation parameters of the input digit
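The reconstruction term is only auxiliary; the paper's main classification objective is a per-class margin loss on the digit-capsule lengths, L(k) = T(k) max(0, m⁺ − ‖v(k)‖)² + λ (1 − T(k)) max(0, ‖v(k)‖ − m⁻)² with m⁺ = 0.9, m⁻ = 0.1, λ = 0.5, to which the reconstruction MSE is added at a small weight (0.0005 in the paper). A sketch of the margin term:

```python
def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over digit capsules.
    lengths[k] = ||v_k|| (existence probability for class k); target = true class.
    The true class is pushed above m_pos, all others below m_neg."""
    loss = 0.0
    for k, length in enumerate(lengths):
        if k == target:
            loss += max(0.0, m_pos - length) ** 2
        else:
            loss += lam * max(0.0, length - m_neg) ** 2
    return loss
```

A confident correct prediction (true-class length above 0.9, others below 0.1) incurs zero loss, so training pressure comes only from under-confident or wrong capsules.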
[Figure: input digits and their reconstructions]
Reconstruction
Baseline CNN: 35.4M parameters
CapsNet (with reconstruction): 8.2M parameters
CapsNet (without reconstruction): 6.8M parameters
MNIST Result
l = label, p = prediction, r = reconstruction target
Predicted 3, reconstructed from 5; predicted 3, reconstructed from 3
MNIST Results
Capsule Interpretation
Training data: expanded and translated MNIST. Test data: MNIST with small random affine transformations.
Accuracy                   Traditional CNN   CapsNet
Expanded & translated      99.22%            99.23%
Affine transformations     66%               79%
MNIST Results continued
● Two digits fused together
● Each digit has 80% overlap
● Training size: 60M, Testing size: 10M
5,0 6,7 4,9
MultiMNIST
White = input
Red = digit 1 reconstruction
Green = digit 2 reconstruction
L:(l1,l2) = labels for digit 1 and digit 2
R:(r1,r2) = digits used for reconstruction
MultiMNIST Results
MultiMNIST classification error (%):
CNN: 8.5
CapsNet (1 routing iteration): 7.1
CapsNet (3 routing iterations): 5.2
CIFAR10: 60,000 32x32 colour images in 10 classes (airplane, bird, cat, deer, dog, frog, horse, etc.)
● Result: 10.6% error
● State of the art: ~2.5% error
SVHN: Street View House Numbers
● Result: 4.3% error
● State of the art: 1.69% error
smallNORB: images of 3D objects in 5 classes; 50 toys photographed from different angles
● Result: 2.7% error
● State of the art: 2.56% error
● CapsNet (2018): 1.4% error
Other Datasets
Pros:
● Requires less training data
● Position and pose are preserved (equivariance)
● Robust to affine transformations
● Activation vectors are easy to interpret
● Fewer trainable parameters required (77% fewer for MNIST)
● Great for overlapping objects
● Good for dealing with segmentation
Cons:
● Computationally heavy
● CapsNet cannot represent two instances of the same class at the same location
● Likes to account for everything in the image
● Requires a lot of further research
Discussion
● Capsule: A group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part
● The vector parameters could be rotation, position, size, texture, …
● Dynamic routing sends information to higher layers when a capsule's predictions agree with the parent's output
● Achieves inverse rendering
● Equivariance: keeps track of where the entity is in the image
Summary
[1] Awesome Capsule Networks. (https://github.com/aisummary/awesome-capsule-networks)
[2] Capsule Networks (CapsNets) - Tutorial. (https://www.youtube.com/watch?v=pPN8d0E3900)
[3] Understanding Hinton’s Capsule Networks. Part IV: CapsNet Architecture.
(https://medium.com/@pechyonkin/part-iv-capsnet-architecture-6a64422f7dce)
[4] Geoffrey Hinton talk "What is wrong with convolutional neural nets ?"
(https://www.youtube.com/watch?v=rTawFwUvnLE)
Additional Information on CapsNet