Dynamic Routing Between Capsules · 2018-04-19
TRANSCRIPT
Dynamic Routing Between Capsules
Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy
Problems & Results
Object classification in images without losing information about important parts of the picture.
smallNORB: images of 3D objects in 5 classes; 50 toys photographed from different angles
● CapsNet (2017): 2.7% error
● State of the art (2017): 2.56% error
● CapsNet (2018): 1.4% error
MNIST: handwritten digit classification. Result: 0.25% error (state of the art)
How would ConvNets achieve rotational invariance?
Problem with CNN
Traditional ConvNet
● Translational invariance: Max Pooling
● Susceptible to affine transformations
● Max Pooling throws away information
● Human brains don’t work like that
Capsule Networks
● Equivariance by “Routing by Agreement”
● Equivariance keeps track of where something is in the image
● Robust to affine transformations
● Makes biological sense
● Achieves inverse rendering (with capsules)
Motivation
Rendering vs. Inverse Rendering
● A capsule is a group of neurons that outputs a vector activation
● The vector represents features of the object
● A capsule represents the inverse graphics of an image patch
● Orientation of the vector: represents properties of the entity
● Length of the vector: represents the existence of the entity
What is a capsule?
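The length of the activation vector is kept in [0, 1) by the paper's "squash" nonlinearity, v = (‖s‖² / (1 + ‖s‖²)) · s/‖s‖, so it can be read as the probability that the entity exists. A minimal plain-Python sketch (list-based toy code, not the paper's implementation):

```python
import math

def squash(s):
    """Scale vector s so its length lies in [0, 1).
    Short vectors shrink toward 0; long vectors approach unit length.
    Direction (the capsule's pose parameters) is preserved."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm)
    return [scale * x / norm for x in s]
```

For example, squash([3.0, 4.0]) has length 25/26 ≈ 0.96, while squash([0.1, 0.0]) has length ≈ 0.01, so only a confident capsule's output is near unit length.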
A Toy Example
Slides heavily inspired by Aurélien Géron [2]
Predict Next Layer’s Output
Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.
Routing Algorithm
● û(j|i) = W(ij) u(i): predict next layer's output
● c(ij) = softmax(b(ij)): routing coefficient from capsule i to parent capsule j
● s(j) = Σ(i) c(ij) û(j|i), v(j) = squash(s(j)): output of capsule j (parent)
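The prediction step is just a learned linear transform: each lower capsule i multiplies its output u(i) by a matrix W(ij) to predict what parent j should output. A sketch with a hypothetical 2x2 matrix (values invented purely for illustration):

```python
def predict_parent(W_ij, u_i):
    """Prediction vector u_hat_{j|i} = W_ij @ u_i.
    W_ij is the learned transform from lower capsule i to parent capsule j."""
    return [sum(w * x for w, x in zip(row, u_i)) for row in W_ij]

# Toy example: a made-up transform that doubles the first pose coordinate.
W = [[2.0, 0.0],
     [0.0, 1.0]]
prediction = predict_parent(W, [1.0, 1.0])  # -> [2.0, 1.0]
```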
Routing Weights
0.5 0.5
0.5 0.5
Compute Next Layer’s Output
Agreement! → large routing weight
Update Routing Weights
Disagreement! → small routing weight
Routing Weights
0.2 0.8
0.1 0.9
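The update loop illustrated in these slides is the paper's routing-by-agreement procedure: logits b(ij) start at zero, coupling coefficients c(ij) are their softmax over parents, and each iteration adds the agreement û(j|i) · v(j) to b(ij). A plain-Python sketch on toy 2-D vectors (not an efficient implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s):
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    n = math.sqrt(sq)
    return [(sq / (1.0 + sq)) * x / n for x in s]

def route(u_hat, iterations=3):
    """Dynamic routing by agreement.
    u_hat[i][j] = prediction vector from lower capsule i for parent capsule j.
    Returns parent outputs v and coupling coefficients c."""
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start at zero
    for _ in range(iterations):
        c = [softmax(row) for row in b]       # couplings sum to 1 over parents
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        for i in range(n_in):
            for j in range(n_out):
                # agreement between prediction and actual parent output
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

With two lower capsules whose predictions agree for parent 0 but cancel out for parent 1, the couplings shift toward parent 0 after a few iterations, mirroring the 0.5/0.5 → 0.2/0.8 weight tables in the slides.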
● 70,000 handwritten digits
● 28x28 grayscale images
● Digit classification (10 classes)
The MNIST Dataset
Architecture
Image: 28x28
Conv1: 256x20x20 (256 kernels of 9x9)
Conv2: 256x6x6 (256 kernels of 9x9, stride 2)
reshape
Primary capsules: 32x8x6x6 (32 capsule types, 8-D vectors, 6x6 grid)
DigitCaps: 10x16 (10 classes, 16-D vectors)
Wij: [8x16]
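The feature-map sizes above follow from valid (unpadded) convolution arithmetic, which is easy to sanity-check:

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

h1 = conv_out(28, 9)             # Conv1: 9x9 kernels on 28x28 -> 20x20 (x256 channels)
h2 = conv_out(h1, 9, stride=2)   # primary-capsule conv: 9x9, stride 2 -> 6x6
n_primary = 32 * h2 * h2         # 32 capsule types per grid position
```

This gives h1 = 20, h2 = 6, and n_primary = 1152, matching the 256x20x20 and 32x8x6x6 shapes above; each of the 1152 primary capsules then uses its own 8x16 W(ij) per digit capsule.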
Loss Function
● A decoder is used to reconstruct object from capsule representation
● Reconstruction loss: mean-squared error
● Encourages capsules to encode the instantiation parameters of the input digit
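The reconstruction term is only auxiliary; the paper's main classification objective is a per-class margin loss on the digit-capsule lengths, L(k) = T(k) max(0, m⁺ − ‖v(k)‖)² + λ (1 − T(k)) max(0, ‖v(k)‖ − m⁻)² with m⁺ = 0.9, m⁻ = 0.1, λ = 0.5, to which the reconstruction MSE is added at a small weight (0.0005 in the paper). A sketch of the margin term:

```python
def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over digit capsules.
    lengths[k] = ||v_k|| (existence probability for class k); target = true class.
    The true class is pushed above m_pos, all others below m_neg."""
    loss = 0.0
    for k, length in enumerate(lengths):
        if k == target:
            loss += max(0.0, m_pos - length) ** 2
        else:
            loss += lam * max(0.0, length - m_neg) ** 2
    return loss
```

A confident correct prediction (true-class length above 0.9, others below 0.1) incurs zero loss, so training pressure comes only from under-confident or wrong capsules.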
[Figure: input digits and their reconstructions]
Reconstruction
Baseline CNN: 35.4M parameters
CapsNet (with reconstruction): 8.2M parameters
CapsNet (without reconstruction): 6.8M parameters
MNIST Result
l = label, p = prediction, r = reconstruction target
Predicted 3, reconstructed from 5; predicted 3, reconstructed from 3
MNIST Results
Capsule Interpretation
Training data: expanded and translated MNIST. Test data: MNIST with small random affine transformations.
Accuracy                   Traditional CNN   CapsNet
Expanded & translated      99.22%            99.23%
Affine transformations     66%               79%
MNIST Results continued
● Two digits fused together
● Each digit has 80% overlap
● Training size: 60M, Testing size: 10M
5,0 6,7 4,9
MultiMNIST
White = input
Red = digit 1 reconstruction
Green = digit 2 reconstruction
L:(l1,l2) = labels for digit 1 and digit 2
R:(r1,r2) = digits used for reconstruction
MultiMNIST Results
MultiMNIST classification error (%):
CNN: 8.5
CapsNet (1 routing iteration): 7.1
CapsNet (3 routing iterations): 5.2
CIFAR10: 60,000 32x32 colour images in 10 classes (airplane, bird, cat, deer, dog, frog, horse, etc.)
● Result: 10.6% error
● State of the art: ~2.5% error
SVHN: Street View House Numbers
● Result: 4.3% error
● State of the art: 1.69% error
smallNORB: images of 3D objects in 5 classes; 50 toys photographed from different angles
● Result: 2.7% error
● State of the art: 2.56% error
● CapsNet (2018): 1.4% error
Other Datasets
Pros:
● Requires less training data
● Position and pose are preserved (equivariance)
● Robust to affine transformations
● Activation vectors are easy to interpret
● Fewer trainable parameters required (77% fewer for MNIST)
● Great for overlapping objects
● Good for dealing with segmentation
Cons:
● Computationally heavy
● CapsNet cannot represent two instances of the same class at the same location
● Likes to account for everything in the image
● Requires a lot of further research
Discussion
● Capsule: A group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part
● The vector parameters could be rotation, position, size, texture, …
● Dynamic routing sends information to higher layers when a capsule's predictions agree with the parent's output
● Achieves inverse rendering
● Equivariance: keeps track of where the entity is in the image
Summary
[1] Awesome Capsule Networks. (https://github.com/aisummary/awesome-capsule-networks)
[2] Capsule Networks (CapsNets) - Tutorial. (https://www.youtube.com/watch?v=pPN8d0E3900)
[3] Understanding Hinton’s Capsule Networks. Part IV: CapsNet Architecture.
(https://medium.com/@pechyonkin/part-iv-capsnet-architecture-6a64422f7dce)
[4] Geoffrey Hinton talk "What is wrong with convolutional neural nets ?"
(https://www.youtube.com/watch?v=rTawFwUvnLE)
Additional Information on CapsNet