object detection, deep learning, and r-cnnsshapiro/ee562/notes/vision2-18.pdfregion-based...
TRANSCRIPT
![Page 1: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/1.jpg)
Object detection, deep learning, and R-CNNs
Partly from Ross Girshick
Microsoft Research
Now at Facebook
![Page 2: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/2.jpg)
Outline
• Object detection• the task, evaluation, datasets
• Convolutional Neural Networks (CNNs)• overview and history
• Region-based Convolutional Networks (R-CNNs)
![Page 3: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/3.jpg)
Image classification
• 𝐾 classes
• Task: assign correct class label to the whole image
Digit classification (MNIST) Object recognition (Caltech-101)
![Page 4: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/4.jpg)
Classification vs. Detection
Dog
DogDog
![Page 5: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/5.jpg)
Problem formulation
person
motorbike
Input Desired output
{ airplane, bird, motorbike, person, sofa }
![Page 6: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/6.jpg)
Evaluating a detector
Test image (previously unseen)
![Page 7: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/7.jpg)
First detection ...
‘person’ detector predictions
0.9
![Page 8: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/8.jpg)
Second detection ...
0.9
0.6
‘person’ detector predictions
![Page 9: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/9.jpg)
Third detection ...
0.9
0.6
0.2
‘person’ detector predictions
![Page 10: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/10.jpg)
Compare to ground truth
ground truth ‘person’ boxes
0.9
0.6
0.2
‘person’ detector predictions
![Page 11: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/11.jpg)
Sort by confidence
... ... ... ... ...
✓ ✓ ✓
0.9 0.8 0.6 0.5 0.2 0.1
truepositive(high overlap)
falsepositive(no overlap,low overlap, or duplicate)
X X X
![Page 12: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/12.jpg)
Evaluation metric
... ... ... ... ...
0.9 0.8 0.6 0.5 0.2 0.1
✓ ✓ ✓X X X
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@𝑡 =#𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡
#𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡 + #𝑓𝑎𝑙𝑠𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡
𝑟𝑒𝑐𝑎𝑙𝑙@𝑡 =#𝑡𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠@𝑡
#𝑔𝑟𝑜𝑢𝑛𝑑 𝑡𝑟𝑢𝑡ℎ 𝑜𝑏𝑗𝑒𝑐𝑡𝑠
𝑡
✓✓ + X
![Page 13: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/13.jpg)
Evaluation metric
Average Precision (AP)0% is worst100% is best
mean AP over classes(mAP)
... ... ... ... ...
0.9 0.8 0.6 0.5 0.2 0.1
✓ ✓ ✓X X X
![Page 14: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/14.jpg)
Pedestrians
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005
AP ~77%More sophisticated methods: AP ~90%
(a) average gradient image over training examples(b) each “pixel” shows max positive SVM weight in the block centered on that pixel(c) same as (b) for negative SVM weights(d) test image(e) its R-HOG descriptor(f) R-HOG descriptor weighted by positive SVM weights(g) R-HOG descriptor weighted by negative SVM weights
![Page 15: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/15.jpg)
Overview of HOG Method
1. Compute gradients in the region to be described
2. Put them in bins according to orientation
3. Group the cells into large blocks
4. Normalize each block
5. Train classifiers to decide if these are parts of a human
![Page 16: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/16.jpg)
Details
• Gradients[-1 0 1] and [-1 0 1]T were good enough filters.
• Cell HistogramsEach pixel within the cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation. (9 channels worked)
• BlocksGroup the cells together into larger blocks, either R-HOGblocks (rectangular) or C-HOG blocks (circular).
![Page 17: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/17.jpg)
More Details
• Block Normalization
They tried 4 different kinds of normalization.Let be the block to be normalized and e be a small constant.
![Page 18: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/18.jpg)
Example: Dalal-Triggs pedestrian detector
1. Extract fixed-sized (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of gradient) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
![Page 19: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/19.jpg)
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
![Page 20: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/20.jpg)
uncentered
centered
cubic-corrected
diagonal
Sobel
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
Outperforms
![Page 21: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/21.jpg)
• Histogram of gradient orientations
• Votes weighted by magnitude
• Bilinear interpolation between cells
Orientation: 9 bins (for unsigned angles)
Histograms in 8x8 pixel cells
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
![Page 22: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/22.jpg)
Normalize with respect to surrounding cells
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
![Page 23: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/23.jpg)
X=
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
# features = 15 x 7 x 9 x 4 = 3780
# cells
# orientations
# normalizations by neighboring cells
![Page 24: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/24.jpg)
Training set
![Page 25: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/25.jpg)
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
pos w neg w
![Page 26: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/26.jpg)
pedestrian
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
![Page 27: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/27.jpg)
Detection examples
![Page 28: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/28.jpg)
![Page 29: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/29.jpg)
Deformable Parts Model
• Takes the idea a little further
• Instead of one rigid HOG model, we have multiple HOG models in a spatial arrangement
• One root part to find first and multiple other parts in a tree structure.
![Page 30: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/30.jpg)
The Idea
Articulated parts model• Object is configuration of parts
• Each part is detectable
Images from Felzenszwalb
![Page 31: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/31.jpg)
Deformable objects
Images from Caltech-256
Slide Credit: Duan Tran
![Page 32: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/32.jpg)
Deformable objects
Images from D. Ramanan’s datasetSlide Credit: Duan Tran
![Page 33: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/33.jpg)
How to model spatial relations?• Tree-shaped model
![Page 34: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/34.jpg)
![Page 35: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/35.jpg)
Hybrid template/parts model
Detections
Template Visualization
Felzenszwalb et al. 2008
![Page 36: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/36.jpg)
Pictorial Structures Model
Appearance likelihood Geometry likelihood
![Page 37: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/37.jpg)
Results for person matching
39
![Page 38: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/38.jpg)
Results for person matching
40
![Page 39: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/39.jpg)
BMVC 2009
![Page 40: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/40.jpg)
2012 State-of-the-art Detector:Deformable Parts Model (DPM)
42Felzenszwalb et al., 2008, 2010, 2011, 2012
Lifetime
Achievement
1. Strong low-level features based on HOG2. Efficient matching algorithms for deformable part-based
models (pictorial structures)3. Discriminative learning with latent variables (latent SVM)
![Page 41: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/41.jpg)
Why did gradient-based models work?
Average gradient image
![Page 42: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/42.jpg)
Generic categories
Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?PASCAL Visual Object Categories (VOC) dataset
![Page 43: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/43.jpg)
Generic categoriesWhy doesn’t this work (as well)?
Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?PASCAL Visual Object Categories (VOC) dataset
![Page 44: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/44.jpg)
Quiz time(Back to Girshick)
![Page 45: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/45.jpg)
Warm up
This is an average image of which object class?
![Page 46: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/46.jpg)
Warm up
pedestrian
![Page 47: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/47.jpg)
A little harder
?
![Page 48: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/48.jpg)
A little harder
?Hint: airplane, bicycle, bus, car, cat, chair, cow, dog, dining table
![Page 49: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/49.jpg)
A little harder
bicycle (PASCAL)
![Page 50: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/50.jpg)
A little harder, yet
?
![Page 51: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/51.jpg)
A little harder, yet
?Hint: white blob on a green background
![Page 52: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/52.jpg)
A little harder, yet
sheep (PASCAL)
![Page 53: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/53.jpg)
Impossible?
?
![Page 54: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/54.jpg)
Impossible?
dog (PASCAL)
![Page 55: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/55.jpg)
Impossible?
dog (PASCAL)Why does the mean look like this?
There’s no alignment between the examples!How do we combat this?
![Page 56: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/56.jpg)
PASCAL VOC detection history
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mea
n A
vera
ge P
reci
sio
n (
mA
P)
year
DPM
DPM,HOG+BOW
DPM,MKL
DPM++DPM++,MKL,SelectiveSearch
Selective Search,DPM++,MKL
41%41%37%
28%23%
17%
![Page 57: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/57.jpg)
Part-based models & multiple features (MKL)
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mea
n A
vera
ge P
reci
sio
n (
mA
P)
year
DPM
DPM,HOG+BOW
DPM,MKL
DPM++DPM++,MKL,SelectiveSearch
Selective Search,DPM++,MKL
41%41%37%
28%23%
17%
![Page 58: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/58.jpg)
Kitchen-sink approaches
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mea
n A
vera
ge P
reci
sio
n (
mA
P)
year
DPM
DPM,HOG+BOW
DPM,MKL
DPM++DPM++,MKL,SelectiveSearch
Selective Search,DPM++,MKL
41%41%37%
28%23%
17%
increasing complexity & plateau
![Page 59: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/59.jpg)
Region-based Convolutional Networks (R-CNNs)
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mea
n A
vera
ge P
reci
sio
n (
mA
P)
year
DPM
DPM,HOG+BOW
DPM,MKL
DPM++DPM++,MKL,SelectiveSearch
Selective Search,DPM++,MKL
41%41%37%
28%23%
17%
53%
62%
R-CNN v1
R-CNN v2
[R-CNN. Girshick et al. CVPR 2014]
![Page 60: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/60.jpg)
0%
10%
20%
30%
40%
50%
60%
70%
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
mea
n A
vera
ge P
reci
sio
n (
mA
P)
year
~1 year
~5 years
Region-based Convolutional Networks (R-CNNs)
[R-CNN. Girshick et al. CVPR 2014]
![Page 61: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/61.jpg)
Convolutional Neural Networks
• Overview
![Page 62: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/62.jpg)
Standard Neural Networks
𝒙 = 𝑥1, … , 𝑥784𝑇 𝑧𝑗 = 𝑔(𝒘𝑗
𝑇𝒙) 𝑔 𝑡 =1
1 + 𝑒−𝑡
“Fully connected”
![Page 63: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/63.jpg)
From NNs to Convolutional NNs
• Local connectivity
• Shared (“tied”) weights
• Multiple feature maps
• Pooling
![Page 64: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/64.jpg)
Convolutional NNs
• Local connectivity
• Each green unit is only connected to (3)neighboring blue units
compare
![Page 65: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/65.jpg)
Convolutional NNs
• Shared (“tied”) weights
• All green units share the same parameters 𝒘
• Each green unit computes the same function,but with a different input window
𝑤1
𝑤2
𝑤3
𝑤1
𝑤2
𝑤3
![Page 66: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/66.jpg)
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,but with a different input window
𝑤1
𝑤2
𝑤3
![Page 67: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/67.jpg)
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,but with a different input window
𝑤1
𝑤2
𝑤3
![Page 68: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/68.jpg)
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,but with a different input window
𝑤1
𝑤2
𝑤3
![Page 69: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/69.jpg)
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,but with a different input window
𝑤1
𝑤2
𝑤3
![Page 70: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/70.jpg)
Convolutional NNs
• Convolution with 1-D filter: [𝑤3, 𝑤2, 𝑤1]
• All green units share the same parameters 𝒘
• Each green unit computes the same function,but with a different input window𝑤1
𝑤2
𝑤3
![Page 71: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/71.jpg)
Convolutional NNs
• Multiple feature maps
• All orange units compute the same functionbut with a different input windows
• Orange and green units compute different functions
𝑤1
𝑤2
𝑤3
𝑤′1𝑤′2𝑤′3
Feature map 1(array of greenunits)
Feature map 2(array of orangeunits)
![Page 72: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/72.jpg)
Convolutional NNs
• Pooling (max, average)
1
4
0
3
4
3
• Pooling area: 2 units
• Pooling stride: 2 units
• Subsamples feature maps
![Page 73: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/73.jpg)
Image
Pooling
Convolution
2D input
![Page 74: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/74.jpg)
1989
Backpropagation applied to handwritten zip code recognition, Lecun et al., 1989
![Page 75: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/75.jpg)
Historical perspective – 1980
![Page 76: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/76.jpg)
Historical perspective – 1980Hubel and Wiesel1962
Included basic ingredients of ConvNets, but no supervised learning algorithm
![Page 77: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/77.jpg)
Supervised learning – 1986
Early demonstration that error backpropagation can be usedfor supervised training of neural nets (including ConvNets)
Gradient descent training with error backpropagation
![Page 78: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/78.jpg)
Supervised learning – 1986
“T” vs. “C” problem Simple ConvNet
![Page 79: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/79.jpg)
Practical ConvNets
Gradient-Based Learning Applied to Document Recognition, Lecun et al., 1998
![Page 80: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/80.jpg)
Demo
• http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
• ConvNetJS by Andrej Karpathy (Ph.D. student at Stanford)
Software libraries
• Caffe (C++, python, matlab)
• Torch7 (C++, lua)
• Theano (python)
![Page 81: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/81.jpg)
The fall of ConvNets
• The rise of Support Vector Machines (SVMs)
• Mathematical advantages (theory, convex optimization)
• Competitive performance on tasks such as digit classification
• Neural nets became unpopular in the mid 1990s
![Page 82: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/82.jpg)
The key to SVMs
• It’s all about the features
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005
SVM weights(+) (-)
HOG features
![Page 83: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/83.jpg)
Core idea of “deep learning”
• Input: the “raw” signal (image, waveform, …)
• Features: hierarchy of features is learned from the raw input
![Page 84: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/84.jpg)
• If SVMs killed neural nets, how did they come back (in computer vision)?
![Page 85: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/85.jpg)
What’s new since the 1980s?
• More layers• LeNet-3 and LeNet-5 had 3 and 5 learnable layers
• Current models have 8 – 20+
• “ReLU” non-linearities (Rectified Linear Unit)• 𝑔 𝑥 = max 0, 𝑥
• Gradient doesn’t vanish
• “Dropout” regularization
• Fast GPU implementations
• More data
𝑥
𝑔(𝑥)
![Page 86: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/86.jpg)
What else? Object Proposals• Sliding window based object detection
• Object proposals• Fast execution
• High recall with low # of candidate boxes
ImageFeature
ExtractionClassificaiton
Iterate over window size, aspect ratio, and location
ImageFeature
ExtractionClassificaiton
Object Proposal
![Page 87: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/87.jpg)
The number of contours wholly enclosed by a bounding box is indicative of the likelihood of the box containing an object.
![Page 88: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/88.jpg)
Ross’s Own System: Region CNNs
![Page 89: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/89.jpg)
Competitive Results
![Page 90: Object detection, deep learning, and R-CNNsshapiro/EE562/notes/Vision2-18.pdfRegion-based Convolutional Networks (R-CNNs) [R-CNN. Girshick et al. CVPR 2014] ... Backpropagation applied](https://reader035.vdocuments.mx/reader035/viewer/2022081616/601f141f1401e05e457fcd76/html5/thumbnails/90.jpg)
Top Regions for Six Object Classes