uc berkeley fully convolutional networks for semantic segmentation jonathan long* evan shelhamer*...
TRANSCRIPT
![Page 1: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/1.jpg)
UC Berkeley
Fully Convolutional Networksfor Semantic Segmentation
Jonathan Long* Evan Shelhamer* Trevor Darrell
1Presented by: Gordon Christie
Slide credit: Jonathan Long
![Page 2: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/2.jpg)
Overview
• Reinterpret standard classification convnets as “Fully convolutional” networks (FCN) for semantic segmentation
• Use AlexNet, VGG, and GoogleNet in experiments• Novel architecture: combine information from different
layers for segmentation• State-of-the-art segmentation for PASCAL VOC
2011/2012, NYUDv2, and SIFT Flow at the time• Inference less than one fifth of a second for a typical
image
2Slide credit: Jonathan Long
![Page 3: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/3.jpg)
pixels in, pixels outmonocular depth estimation (Liu et al. 2015)
boundary prediction (Xie & Tu 2015)
semanticsegmentation
3Slide credit: Jonathan Long
![Page 4: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/4.jpg)
4
“tabby cat”
1000-dim vector
< 1 millisecond
convnets perform classification
end-to-end learning
Slide credit: Jonathan Long
![Page 5: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/5.jpg)
R-CNN
5
many seconds
“cat”
“dog”
R-CNN does detection
Slide credit: Jonathan Long
![Page 6: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/6.jpg)
6
R-CNN
figure: Girshick et al.
Slide credit: Jonathan Long
![Page 7: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/7.jpg)
7
< 1/5 second
end-to-end learning
???
Slide credit: Jonathan Long
![Page 8: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/8.jpg)
“tabby cat”
8
a classification network
Slide credit: Jonathan Long
![Page 9: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/9.jpg)
9
becoming fully convolutional
Slide credit: Jonathan Long
![Page 10: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/10.jpg)
10
becoming fully convolutional
Slide credit: Jonathan Long
![Page 11: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/11.jpg)
11
upsampling output
Slide credit: Jonathan Long
![Page 12: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/12.jpg)
conv, pool,nonlinearity
upsampling
pixelwiseoutput + loss
end-to-end, pixels-to-pixels network
12Slide credit: Jonathan Long
![Page 13: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/13.jpg)
Dense Predictions
• Shift-and-stitch: trick that yields dense predictions without interpolation
• Upsampling via deconvolution• Shift-and-stitch used in preliminary experiments, but not
included in final model• Upsampling found to be more effective and efficient
13
![Page 14: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/14.jpg)
Classifier to Dense FCN
• Convolutionalize proven classification architectures: AlexNet, VGG, and GoogLeNet (reimplementation)
• Remove classification layer and convert all fully connected layers to convolutions
• Append 1x1 convolution with channel dimensions and predict scores at each of the coarse output locations (21 categories + background for PASCAL)
14
![Page 15: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/15.jpg)
Classifier to Dense FCN
Cast ILSVRC classifiers into FCNs and compare performance on validation set of PASCAL 2011
15
![Page 16: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/16.jpg)
spectrum of deep features
combine where (local, shallow) with what (global, deep)
fuse features into deep jet
(cf. Hariharan et al. CVPR15 “hypercolumn”) 16Slide credit: Jonathan Long
![Page 17: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/17.jpg)
skip layers
skip to fuse layers!
interp + sum
interp + sum
dense output 17
end-to-end, joint learningof semantics and location
Slide credit: Jonathan Long
![Page 18: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/18.jpg)
skip layers
18
![Page 19: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/19.jpg)
Comparison of skip FCNs
19
Results on subset of validation set of PASCAL VOC 2011
![Page 20: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/20.jpg)
stride 32
no skips
stride 16
1 skip
stride 8
2 skips
ground truthinput image
skip layer refinement
20Slide credit: Jonathan Long
![Page 21: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/21.jpg)
training + testing- train full image at a time without patch sampling - reshape network to take input of any size- forward time is ~150ms for 500 x 500 x 21 output
21Slide credit: Jonathan Long
![Page 22: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/22.jpg)
Results – PASCAL VOC 2011/12
VOC 2011: 8498 training images (from additional labeled data
22
![Page 23: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/23.jpg)
Results – NYUDv2
23
1449 RGB-D images with pixelwise labels 40 categories
![Page 24: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/24.jpg)
Results – SIFT Flow2688 images with pixel labels
33 semantic categories, 3 geometric categoriesLearn both label spaces jointly
learning and inference have similar performance and computation as independent models
24
![Page 25: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/25.jpg)
FCN SDS* Truth Input
25
Relative to prior state-of-the-art SDS:
20% relative improvementfor mean IoU
286× faster
*Simultaneous Detection and Segmentation Hariharan et al. ECCV14Slide credit: Jonathan Long
![Page 26: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/26.jpg)
leaderboard
== segmentation with Caffe
26
FCNFCNFCNFCNFCNFCNFCNFCNFCNFCNFCN
FCNFCNFCN
FCN
Slide credit: Jonathan Long
![Page 27: UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide](https://reader033.vdocuments.mx/reader033/viewer/2022061612/5697bfa31a28abf838c96df1/html5/thumbnails/27.jpg)
conclusion
fully convolutional networks are fast, end-to-end models for pixelwise problems
- code in Caffe branch (merged soon)- models for PASCAL VOC, NYUDv2,
SIFT Flow, PASCAL-Context
27
caffe.berkeleyvision.org
github.com/BVLC/caffefcn.berkeleyvision.org
Slide credit: Jonathan Long