ILSVRC 2015 CLS-LOC: Multi-Class AttentionNet.
D. Yoo¹, K. Paeng¹, S. Park¹, S. Hwang², H. E. Kim², J. Lee², M. Jang², A. S. Paek², K. K. Kim¹, S. D. Kim¹, I. S. Kweon¹. ¹KAIST, ²Lunit Inc.

TRANSCRIPT
State-of-the-art methods for object localization.

1) Box-regression with a CNN.
[Szegedy et al., NIPS'13], DeepMultiBox [Erhan et al., CVPR'14], OverFeat [Sermanet et al., ICLR'14], …
(−) Direct mapping from an image to an exact bounding box is relatively difficult for a CNN.
[Figure: a CNN regressing an input image directly to the box corners (X1, Y1) and (X2, Y2).]
2) Region proposal + classifier.
R-CNN [Girshick et al., CVPR'14], Fast R-CNN [Girshick, ICCV'15], Faster R-CNN [Ren et al., NIPS'15], DeepMultiBox [Erhan et al., CVPR'14], …
(−) Prone to focus on a discriminative part (e.g. the face) rather than the entire object (e.g. the whole human body).
Idea: Ensemble of weak directions.
Starting from an initial box, the model repeatedly moves the box along weak, quantized directions; a stop signal marks when each corner has reached the object.
Model:
Rather than a CNN regression model, we use a CNN classification model.
Define weak directions: fixed length, and quantized.
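To make the quantized formulation concrete, here is an illustrative sketch of how a ground-truth direction label for the top-left corner might be derived. The direction set, class ids, and stop threshold are assumptions for this sketch, not values from the slides.

```python
# Hypothetical label quantization for the top-left corner:
# 0 = stop, 1 = move right, 2 = move down, 3 = move diagonally down-right.

def quantize_tl_direction(curr, target, stop_thresh=5.0):
    """Map the offset from the current top-left corner to a direction class.

    Assumes the initial box encloses the object, so the top-left corner only
    ever needs to stay, move right, move down, or move down-right.
    """
    dx = target[0] - curr[0]  # > 0: corner must move right
    dy = target[1] - curr[1]  # > 0: corner must move down
    if abs(dx) < stop_thresh and abs(dy) < stop_thresh:
        return 0  # close enough: emit the stop signal
    if dx >= stop_thresh and dy >= stop_thresh:
        return 3  # diagonal down-right
    if dx >= stop_thresh:
        return 1  # right only
    return 2      # down only
```

Because the target is one of a handful of classes rather than a real-valued coordinate, each prediction is an easy classification problem.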
Strengths over the previous methods.
Box-regression: (−) relatively difficult for a CNN.
Weak direction: (+) relatively easy for a CNN.
R-CNN: (−) focuses on distinctive parts.
Stop signal: (+) supervision with a clear terminal point.
AttentionNet: two output layers on a shared CNN, one for each corner (top-left and bottom-right).
AttentionNet: iterative classification.
Each iteration crops the current box, resizes it, and feeds it through the CNN, which outputs one quantized direction per corner. The corners are moved accordingly and the new crop is re-fed, until either both corners output the stop signal ("Detected.") or the box is rejected ("Reject.").
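The iterative loop can be sketched as follows. `predict`, the action names, and the fixed `STEP` length are illustrative stand-ins for the CNN's quantized-direction outputs, not the authors' code.

```python
# Sketch of AttentionNet-style iterative localization.

STEP = 10  # assumed fixed step length in pixels

def iterate_box(box, predict, max_iters=50):
    """Shrink an (x1, y1, x2, y2) box until both corners predict 'stop'.

    `predict(box)` returns (tl_action, br_action); each action is 'stop',
    'reject', or a quantized move for that corner.
    """
    for _ in range(max_iters):
        tl, br = predict(box)
        if tl == "reject" or br == "reject":
            return None  # box abandoned ("Reject.")
        if tl == "stop" and br == "stop":
            return box   # terminal point reached ("Detected.")
        x1, y1, x2, y2 = box
        if tl in ("right", "diag"):
            x1 += STEP
        if tl in ("down", "diag"):
            y1 += STEP
        if br in ("left", "diag"):
            x2 -= STEP
        if br in ("up", "diag"):
            y2 -= STEP
        box = (x1, y1, x2, y2)
    return box
```

Each step only needs a coarse classification; accuracy comes from aggregating many weak moves rather than from one precise regression.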
Initial box proposal:
Boxes satisfying .
Each proposal is then rejected, continued, or finally detected by the iterative AttentionNet process.
Multi-{scale, aspect ratio} sliding-window search using a fully-convolutional network.
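A minimal sketch of multi-{scale, aspect-ratio} sliding-window generation. The slides give no scales, aspect ratios, or strides, so the values below are purely illustrative assumptions.

```python
# Hypothetical dense window generation over an image.

def sliding_windows(img_w, img_h, scales=(128, 256), aspects=(1.0, 2.0), stride=64):
    """Return (x1, y1, x2, y2) windows covering the image at each
    scale/aspect-ratio combination."""
    boxes = []
    for s in scales:
        for a in aspects:
            # Keep window area roughly s*s while varying the aspect ratio a = w/h.
            w, h = int(s * a ** 0.5), int(s / a ** 0.5)
            for y in range(0, img_h - h + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    boxes.append((x, y, x + w, y + h))
    return boxes
```

In practice a fully-convolutional network evaluates all such windows in one pass; the explicit loop above just enumerates the positions that pass would cover.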
Initial detection and refinement.
Extension to multiple classes.
A single AttentionNet handles one class. The multi-class AttentionNet shares one CNN trunk across all classes (Class 1, Class 2, …, Class N), with class-wise direction layers and a single classification layer on top.
Final architecture.
Trunk: GoogLeNet [Szegedy et al., CVPR'15].
Directional layers: one pair per class, Conv8-Ck-TL and Conv8-Ck-BR for k = 1, …, N, each 1×1×1,024×4, predicting one of four quantized actions per corner (top-left: {•, ↘, →, ↓}; bottom-right: {•, ↑, ←, ↖}).
Classification layer: Conv8-CLS, 1×1×1,024×(N+1), over the N classes C1, C2, …, CN plus one extra output.
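As a shape check, the output heads above amount to, per pooled 1,024-d feature, a 1,024×4 linear map per class per corner plus a 1,024×(N+1) classification map. The numpy sketch below (random weights, illustrative only) verifies the dimensions:

```python
import numpy as np

N = 1000  # ILSVRC CLS-LOC has 1,000 classes

rng = np.random.default_rng(0)
feat = rng.standard_normal(1024)                # pooled 1,024-d CNN feature

# A 1x1x1,024x4 conv head applied to a pooled feature is a 1,024x4 linear map.
W_tl = rng.standard_normal((N, 1024, 4))        # class-wise TL direction heads
W_br = rng.standard_normal((N, 1024, 4))        # class-wise BR direction heads
W_cls = rng.standard_normal((1024, N + 1))      # shared classification head

tl_logits = np.einsum('d,cdk->ck', feat, W_tl)  # (N, 4) TL actions per class
br_logits = np.einsum('d,cdk->ck', feat, W_br)  # (N, 4) BR actions per class
cls_logits = feat @ W_cls                       # (N + 1,) class scores
```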
Training multi-class AttentionNet.
• Pre-training: GoogLeNet [Szegedy et al., CVPR'15] on the ILSVRC-CLS dataset.
• Fine-tuning:
  • Number of epochs: 5.
  • Number of training regions: 22M (randomly generated).
  • Learning rate of the classification layer: 0.01.
  • Learning rate of the 2K (= 1K + 1K) directional layers: 0.01.
  • Learning rate of the layers from conv1 to conv21: 0.001.
The training loss is the equally weighted sum

$$\mathit{Loss} = \tfrac{1}{3}\,\mathit{Loss}_{TL} + \tfrac{1}{3}\,\mathit{Loss}_{BR} + \tfrac{1}{3}\,\mathit{Loss}_{CLS},$$

where

$$\mathit{Loss}_{TL} = \frac{1}{N} \sum_{i=1}^{N} \left[\, t^{TL}_{c_i} \neq 0 \,\right] \cdot \mathrm{SoftMaxLoss}\!\left(y^{TL}_{c_i},\, t^{TL}_{c_i}\right),$$

$$\mathit{Loss}_{BR} = \frac{1}{N} \sum_{i=1}^{N} \left[\, t^{BR}_{c_i} \neq 0 \,\right] \cdot \mathrm{SoftMaxLoss}\!\left(y^{BR}_{c_i},\, t^{BR}_{c_i}\right),$$

$$\mathit{Loss}_{CLS} = \mathrm{SoftMaxLoss}\!\left(y^{CLS},\, t^{CLS}\right).$$
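A minimal numpy sketch of these masked losses (illustrative, not the authors' code). `y_*` are logits, `t_*` are integer targets, and a corner target of 0 means "not this class" and is masked out, as in the indicator brackets above:

```python
import numpy as np

def softmax_loss(logits, target):
    """Cross-entropy of a softmax over `logits` at index `target`."""
    z = logits - logits.max()  # subtract max for numerical stability
    return -(z[target] - np.log(np.exp(z).sum()))

def corner_loss(y, t):
    """Masked mean softmax loss over N class-wise corner heads.

    y: (N, 4) direction logits per class; t: (N,) targets, 0 = masked out.
    """
    n = len(t)
    return sum(softmax_loss(y[i], t[i]) for i in range(n) if t[i] != 0) / n

def total_loss(y_tl, t_tl, y_br, t_br, y_cls, t_cls):
    """Equally weighted sum of the TL, BR, and CLS losses."""
    return (corner_loss(y_tl, t_tl) + corner_loss(y_br, t_br)
            + softmax_loss(y_cls, t_cls)) / 3.0
```

Note the corner losses still divide by N even though masked terms contribute zero, matching the 1/N factor in the equations.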
Test: given the top-5 class predictions, we detect those classes with AttentionNet.
• Top-5 class prediction (7% error): ensemble of GoogLeNet, GoogLeNet-BN, and VGG-16.
• Number of multi-{scale, aspect ratio} inputs: 6.
Results on validation set.

| Method | Top-5 CLS-LOC error |
| --- | --- |
| OverFeat [Sermanet et al., ICLR'14] | 30.00% |
| VGG [Simonyan and Zisserman, ICLR'15] | 26.90% |
| GoogLeNet [Szegedy et al., CVPR'15] | 26.70% (test set) |
| A single Multi-class AttentionNet, without test augmentation | 16.11% |
| A single Multi-class AttentionNet, with test augmentation (original and flip) | 14.96% |

Note that we use a SINGLE Multi-class AttentionNet.
Related publication:
Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S. Paek, In So Kweon, "AttentionNet: Aggregating Weak Directions for Accurate Object Detection," in ICCV, 2015.