![Page 1: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/1.jpg)
Article overview by Ilya Kuzovkin
K. He, X. Zhang, S. Ren and J. Sun Microsoft Research
Computational Neuroscience Seminar University of Tartu
2016
Deep Residual Learning for Image Recognition ILSVRC 2015
MS COCO 2015 WINNER
![Page 2: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/2.jpg)
THE IDEA
![Page 3: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/3.jpg)
1000 classes
![Page 4: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/4.jpg)
2012
8 layers 15.31% error
![Page 5: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/5.jpg)
2012
8 layers 15.31% error
2013
9 layers, 2x params 11.74% error
![Page 6: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/6.jpg)
2012
8 layers 15.31% error
2013 2014
9 layers, 2x params 11.74% error
19 layers 7.41% error
![Page 7: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/7.jpg)
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
?
![Page 8: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/8.jpg)
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
Is learning better networks as easy as stacking more layers?
?
![Page 9: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/9.jpg)
?
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
Is learning better networks as easy as stacking more layers?
Vanishing / exploding gradients
![Page 10: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/10.jpg)
?
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
Is learning better networks as easy as stacking more layers?
Vanishing / exploding gradients
Normalized initialization & intermediate normalization
![Page 11: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/11.jpg)
?
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
Is learning better networks as easy as stacking more layers?
Vanishing / exploding gradients
Normalized initialization & intermediate normalization
Degradation problem
![Page 12: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/12.jpg)
Degradation problem
“with the network depth increasing, accuracy gets saturated”
![Page 13: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/13.jpg)
Not caused by overfitting:
Degradation problem
“with the network depth increasing, accuracy gets saturated”
![Page 14: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/14.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Tested
![Page 15: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/15.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
IdentityTested
Tested
![Page 16: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/16.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
Identity
Same performance
Tested
Tested
![Page 17: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/17.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
Identity
Same performance
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Trained
Tested
Tested
Tested
![Page 18: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/18.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
Identity
Same performance
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Trained
Worse!
Tested
Tested
Tested
![Page 19: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/19.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
Identity
Same performance
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Trained
Worse!
Tested
Tested
Tested
“Our current solvers on hand are unable to find solutions that are comparably good or
better than the constructed solution (or unable to do so in feasible time)”
![Page 20: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/20.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
Identity
Same performance
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Trained
Worse!
Tested
Tested
Tested
“Our current solvers on hand are unable to find solutions that are comparably good or
better than the constructed solution (or unable to do so in feasible time)”
“Solvers might have difficulties in approximating identity mappings by
multiple nonlinear layers”
![Page 21: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/21.jpg)
Conv
Conv
Conv
Conv
Trained
Accuracy X%
Conv
Conv
Conv
Conv
Identity
Identity
Identity
Identity
Same performance
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Conv
Trained
Worse!
Tested
Tested
Tested
“Our current solvers on hand are unable to find solutions that are comparably good or
better than the constructed solution (or unable to do so in feasible time)”
“Solvers might have difficulties in approximating identity mappings by
multiple nonlinear layers”
Add explicit identity connections and “solvers may simply drive the weights of
the multiple nonlinear layers toward zero”
![Page 22: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/22.jpg)
Add explicit identity connections and “solvers may simply drive the weights of the multiple
nonlinear layers toward zero”
is the true function we want to learn
![Page 23: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/23.jpg)
Add explicit identity connections and “solvers may simply drive the weights of the multiple
nonlinear layers toward zero”
is the true function we want to learn
Let’s pretend we want to learn
instead.
![Page 24: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/24.jpg)
Add explicit identity connections and “solvers may simply drive the weights of the multiple
nonlinear layers toward zero”
is the true function we want to learn
Let’s pretend we want to learn
instead.
The original function is then
![Page 25: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/25.jpg)
![Page 26: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/26.jpg)
Network can decide how deep it needs to be…
![Page 27: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/27.jpg)
Network can decide how deep it needs to be…
“The identity connections introduce neither extra parameter nor computation complexity”
![Page 28: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/28.jpg)
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
?
![Page 29: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/29.jpg)
2012
8 layers 15.31% error
2013 2014 2015
9 layers, 2x params 11.74% error
19 layers 7.41% error
152 layers 3.57% error
![Page 30: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/30.jpg)
EXPERIMENTS AND DETAILS
![Page 31: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/31.jpg)
• Lots of convolutional 3x3 layers • VGG complexity is 19.6 billion FLOPs
34-layer-ResNet is 3.6 bln. FLOPs
![Page 32: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/32.jpg)
• Lots of convolutional 3x3 layers • VGG complexity is 19.6 billion FLOPs
34-layer-ResNet is 3.6 bln. FLOPs !• Batch normalization • SGD with batch size 256 • (up to) 600,000 iterations • LR 0.1 (divided by 10 when error plateaus) • Momentum 0.9 • No dropout • Weight decay 0.0001
![Page 33: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/33.jpg)
• Lots of convolutional 3x3 layers • VGG complexity is 19.6 billion FLOPs
34-layer-ResNet is 3.6 bln. FLOPs !• Batch normalization • SGD with batch size 256 • (up to) 600,000 iterations • LR 0.1 (divided by 10 when error plateaus) • Momentum 0.9 • No dropout • Weight decay 0.0001 !• 1.28 million training images • 50,000 validation • 100,000 test
![Page 34: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/34.jpg)
• 34-layer ResNet has lower training error. This indicates that the degradation problem is well addressed and we manage to obtain accuracy gains from increased depth.
![Page 35: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/35.jpg)
• 34-layer ResNet has lower training error. This indicates that the degradation problem is well addressed and we manage to obtain accuracy gains from increased depth. !
• 34-layer-ResNet reduces the top-1 error by 3.5%
![Page 36: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/36.jpg)
• 34-layer ResNet has lower training error. This indicates that the degradation problem is well addressed and we manage to obtain accuracy gains from increased depth. !
• 34-layer-ResNet reduces the top-1 error by 3.5% !
• 18-layer ResNet converges faster and thus ResNet eases the optimization by providing faster convergence at the early stage.
![Page 37: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/37.jpg)
GOING DEEPER
![Page 38: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/38.jpg)
Due to time complexity the usual building block is replaced by Bottleneck Block
50 / 101 / 152 - layer ResNets are build from those blocks
![Page 39: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/39.jpg)
![Page 40: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/40.jpg)
![Page 41: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/41.jpg)
ANALYSIS ON CIFAR-10
![Page 42: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/42.jpg)
![Page 43: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/43.jpg)
![Page 44: Paper overview: "Deep Residual Learning for Image Recognition"](https://reader030.vdocuments.mx/reader030/viewer/2022020103/589cdd341a28abf86d8b4743/html5/thumbnails/44.jpg)
ImageNet Classification 2015 1st 3.57% error
ImageNet Object Detection 2015 1st 194 / 200 categories
ImageNet Object Localization 2015 1st 9.02% error
COCO Detection 2015 1st 37.3%
COCO Segmentation 2015 1st 28.2%
http://research.microsoft.com/en-us/um/people/kahe/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf