[PR12] Capsule Networks - Jaejun Yoo


TRANSCRIPT

Page 1:

Capsule Networks

Understanding together with PR12

Jaejun Yoo, Ph.D. Candidate @ KAIST

PR12, 17th Dec, 2017

Page 2:

Today’s contents

Dynamic Routing Between Capsules

by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Oct. 2017: https://arxiv.org/abs/1710.09829

NIPS 2017 Paper

Page 3:

Convolutional Neural Networks

What is the problem with CNNs?

Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

1) If the images have rotation, tilt, or any other different orientation, then CNNs perform poorly.
2) Each layer of a CNN understands an image at a much more granular level (slow increase in receptive field).

DATA AUGMENTATION, MAX POOLING

Page 4:

Convolutional Neural Networks

What is the problem with CNNs?

"Pooling helps in creating the positional invariance. This invariance also triggers false positives for images which have the components of a ship but not in the correct order."

This was never the intention of the pooling layer!
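To make the lost-position point concrete, here is a minimal numpy sketch (not from the slides; `max_pool` is an illustrative toy): two feature maps that detect the same part at different positions pool to identical outputs.

```python
import numpy as np

def max_pool(fmap, k=2):
    """Naive k x k max pooling with stride k (illustrative toy, not a CNN layer)."""
    h, w = fmap.shape
    return fmap.reshape(h // k, k, w // k, k).max(axis=(1, 3))

# The same feature detected at two different positions...
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0

# ...pools to the identical output: positional information is discarded.
print(np.array_equal(max_pool(a), max_pool(b)))  # True
```

This is exactly the invariance the quote describes: after pooling, the network can no longer tell whether the parts were in the correct arrangement.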

Page 6:

Convolutional Neural Networks

What we need: EQUIVARIANCE (not invariance)

"Equivariance makes a CNN understand the rotation or proportion change and adapt itself accordingly so that the spatial positioning inside an image is not lost."

Page 7:

Capsules

"A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part."

An 8D capsule, e.g.: hue, position, size, orientation, deformation, texture, etc.

Page 8:

Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets

Capsules

An 8D vector, e.g.: hue, position, size, orientation, deformation, texture, etc.

Inverse Rendering

Pages 9-12:

Equivariance of Capsules

Page 14:

Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418

Routing by Agreement

Page 15:

Aurélien Géron, 2017

Primary Capsules


Page 18:

Predict Next Layer's Output

One transformation matrix Wi,j per part/whole pair (i, j):

ûj|i = Wi,j ui
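A minimal numpy sketch of this prediction step (shapes, names, and random values are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, d_in = 3, 8     # e.g., 3 primary capsules, 8D each
n_out, d_out = 2, 16  # e.g., 2 next-layer capsules, 16D each

# One learned transformation matrix W[i, j] per part/whole pair (i, j).
W = rng.normal(size=(n_in, n_out, d_out, d_in))
u = rng.normal(size=(n_in, d_in))  # primary capsule outputs u_i

# Prediction vectors u_hat[i, j] = W[i, j] @ u[i]
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (3, 2, 16)
```

Each primary capsule i thus predicts what every next-layer capsule j should output, given its own pose.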


Page 21:

Compute Next Layer's Output

(Figure: primary capsules and their predicted outputs)

Pages 22-23:

Routing by Agreement

Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.

Pages 24-25:

Routing Weights

bi,j = 0 for all i, j
ci = softmax(bi)

(Figure: initially every routing weight is 0.5, so each primary capsule is routed equally to both next-layer capsules.)

Pages 26-28:

Compute Next Layer's Output

sj = weighted sum of the predicted outputs ûj|i (round #1: all routing weights are 0.5)
vj = squash(sj)

Actual outputs of the next layer capsules (round #1)
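The squash nonlinearity (Eq. 1 in the paper) scales short vectors toward zero and long vectors toward unit length, so a capsule's norm can act as a probability; a small numpy sketch:

```python
import numpy as np

def squash(s, eps=1e-9):
    """squash(s) = (||s||^2 / (1 + ||s||^2)) * (s / ||s||), Eq. 1 in the paper."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

v = squash(np.array([3.0, 4.0]))  # ||s|| = 5
print(np.linalg.norm(v))          # 25/26, close to 1 for long vectors
print(np.linalg.norm(squash(np.array([0.1, 0.0]))))  # ~0.0099, near 0 for short ones
```

The direction of sj is preserved; only its length is rescaled into [0, 1).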

Pages 29-32:

Update Routing Weights

Agreement: bi,j += ûj|i · vj

The dot product is large when the prediction ûj|i agrees with the actual output vj of round #1, and small when it disagrees.
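Putting the slides' steps together, a full routing pass can be sketched as follows. This is a hedged reimplementation of the paper's routing procedure in plain numpy (names and shapes are illustrative):

```python
import numpy as np

def squash(s, eps=1e-9):
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def route(u_hat, n_iters=3):
    """Dynamic routing over prediction vectors u_hat of shape (n_in, n_out, d_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits: b_ij = 0 for all i, j
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_i = softmax(b_i)
        s = np.einsum('ij,ijk->jk', c, u_hat)  # s_j = weighted sum of predictions
        v = squash(s)                          # v_j = squash(s_j)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)  # b_ij += u_hat_j|i . v_j
    return v, c

rng = np.random.default_rng(0)
v, c = route(rng.normal(size=(3, 2, 4)))  # 3 primary capsules, 2 output capsules
print(v.shape, c.shape)  # (2, 4) (3, 2)
```

After a few iterations, the routing weights c concentrate on the output capsules whose actual outputs agree with each primary capsule's prediction.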

Pages 33-36:

Compute Next Layer's Output

sj = weighted sum of the predicted outputs ûj|i (round #2: the routing weights have shifted, e.g., to 0.1/0.2 and 0.8/0.9)
vj = squash(sj)

Actual outputs of the next layer capsules (round #2)

Pages 37-39:

Handling Crowded Scenes

Is this an upside down house? Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away): the scene is parsed as a house and a boat.

Page 40:

Classification CapsNet

The ℓ2 norm of each output capsule's activity vector gives the estimated class probability.

Pages 41-42:

Training

To allow multiple classes, minimize the margin loss:

Lk = Tk max(0, m+ - ||vk||)² + λ (1 - Tk) max(0, ||vk|| - m-)²

Tk = 1 iff class k is present

In the paper: m- = 0.1, m+ = 0.9, λ = 0.5

Translated to English: "If an object of class k is present, then ||vk|| should be no less than 0.9. If not, then ||vk|| should be no more than 0.1."
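A direct numpy transcription of the margin loss with the paper's constants (the function name and the test vectors are illustrative):

```python
import numpy as np

def margin_loss(v_norms, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """L_k = T_k max(0, m+ - ||v_k||)^2 + lambda (1 - T_k) max(0, ||v_k|| - m-)^2"""
    present = T * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norms - m_neg) ** 2
    return np.sum(present + absent)

# Class 0 present with a long vector, class 1 absent with a short one: zero loss.
print(margin_loss(np.array([0.95, 0.05]), np.array([1.0, 0.0])))  # 0.0
# The wrong way around: large loss.
print(margin_loss(np.array([0.05, 0.95]), np.array([1.0, 0.0])))
```

Because each class gets its own term Lk, several classes can be present at once, unlike a softmax over classes.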

Pages 43-44:

Regularization by Reconstruction

A feedforward neural network (the decoder) reconstructs the input image from the output capsules.

Loss = margin loss + α reconstruction loss

The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
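The combined objective is then a one-liner; a sketch with the paper's α (array shapes and names are illustrative):

```python
import numpy as np

def total_loss(margin, x, x_rec, alpha=0.0005):
    """Loss = margin loss + alpha * sum of squared pixel differences."""
    return margin + alpha * np.sum((x - x_rec) ** 2)

x = np.ones((28, 28))       # input image (MNIST-sized, illustrative)
x_rec = np.zeros((28, 28))  # a (bad) reconstruction
print(total_loss(0.0, x, x_rec))  # 0.0005 * 784 = 0.392
```

The small α keeps the reconstruction term from dominating the margin loss during training.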

Page 45:

A CapsNet for MNIST

(Figure 1 from the paper)

Page 46:

A CapsNet for MNIST – Decoder

(Figure 2 from the paper)

Page 47:

Interpretable Activation Vectors

(Figure 4 from the paper)

Page 48:

Pros

● Reaches high accuracy on MNIST, with promising results on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)

Page 49:

Cons

● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop (in the routing-by-agreement algorithm)
● A CapsNet cannot see two very close identical objects
  ○ This is called “crowding”, and it has been observed as well in human vision

Page 50:

Results

What the individual dimensions of a capsule represent

Page 51:

Results

MultiMNIST: Segmenting Highly Overlapping Digits

Page 52:

Remaining Questions

Do capsules really work the way real neurons do?

Perceptual illusions

Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 38(6), 483-484.

Page 53:

References

• https://arxiv.org/abs/1710.09829 (paper)
• https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/
• https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
• https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
• https://www.youtube.com/watch?v=pPN8d0E3900 (video)
• https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets (slides)