![Page 1: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/1.jpg)
Rodrigo Benenson
Detecting objects at 100 Hz with rigid templates
Markus MathiasRadu Timofte
Marco PedersoliShanshan Zhang
Mohamed OmranJan HosangPiotr Dollar
Luc Van GoolTinne Tuytelaars
Bernt Schiele(and other authors)
![Page 2: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/2.jpg)
![Page 3: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/3.jpg)
1) Who is familiar with the “HOG descriptor” ?
2) Who is already familiar with the “ICF detectors” ?
3) Who has needed to use a detector ?
4) Who has trained a detector him/herself ?
5) Who has written a paper about detection ?
6) Any burning question ? Shoot now !
![Page 4: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/4.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 7
The next 45 minutes:
How to design a detectorrunning at 100 Hz (CPU only),
step by step
https://youtu.be/scDyiDl7D7c
![Page 5: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/5.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 8
Why care about 100 Hz detection ?
● When detection is only a component of a larger system
● When computational power matters (e.g. cell phones !)
● When latency matters
● For certain categories, can be both fast and reach competitive detection quality (e.g. pedestrian, face, and traffic signs)
● Used as class-specific object proposals for convnets
![Page 6: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/6.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 9
How to reach high detection speed ?
I only know three ways:
1) Wait 6 months
2) Make a better fit between algorithm and hardware
3) Design an algorithm that computes less
![Page 7: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/7.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 10
How to reach high detection quality ?
● “A + B features”; boring, and slower than A or B.
● Find the core ingredient, then optimize it.
● Do not hand-design, arbitrary choices are most likely wrong.
![Page 8: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/8.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 11
Running example:Pedestrian detection
![Page 9: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/9.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 12
Pedestrian detection is harder than you might think
INRIA training examples
![Page 10: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/10.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 13
Pedestrian detection is harder than you might think
Which of these is not a pedestrian ? (Caltech test set)
![Page 11: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/11.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 14
Pedestrian detection is an interesting problem
● Large variance for intra-class appearance
● Strong illumination changes
● Deformations
● Occlusions
● (Interest on small instances)
● No structural variations(number of wings in an airplane)
![Page 12: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/12.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 17
http://europa.informatik.uni-freiburg.de
![Page 13: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/13.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 18
Part I: Quality (first)
Part II: Speed (given a quality target)
![Page 14: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/14.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 19
Part I
Qualityaka “what makes strong rigid templates ?”
Viola & Jones 2001Dollar et al. 2009, 2014
Benenson et al. 2013, 2014, 2015
![Page 15: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/15.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 25
HOG: Histogram of oriented gradients
● Good features are key
● Gradient orientation is a good feature
● Fully hand-crafted [Dalal & Triggs 2005]
![Page 16: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/16.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 26
DPM: Deformable part models
● Good (pre-convnet) generic detector● Components● Latent variables estimation● Many free parameters (despite best effort from authors)● Are parts good to handle deformation ? [Hariharan et al. 2014]● Components do not share training samples [Zhu et al. 2014]
[Felzenszwalb et al. 2008; 30 Hz Sadeghi & Forsyth 2014]
![Page 17: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/17.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 27
In this talk, we focus on rigid templates
● Simpler● Faster● Deformations modeled via components
![Page 18: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/18.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 29
Everything old is new againBest rigid templates (I know), are Viola&Jones++
[Viola & Jones 2001; Dollar et al. 2009; Benenson et al. 2013, 2015]
![Page 19: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/19.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 30
Strong detection with (shallow) boosted decision trees
![Page 20: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/20.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 31
+1 -1 +1 -1
score= w1⋅h1+
Strong detection with (shallow) boosted decision trees
![Page 21: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/21.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 32
+1 -1 +1 -1 +1 -1 +1 -1
score= w2⋅h2+w1⋅h1+
Strong detection with (shallow) boosted decision trees
![Page 22: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/22.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 33
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
Strong detection with (shallow) boosted decision trees
[ChnFtrs, Dollar et al. 2009; Viola & Jones 2001] (~4 Hz on GPU)
![Page 23: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/23.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 34
Less is moreSophisticated featuresDeformable partsDeeper architecturesNon-linear classifiersRicher training dataGeometric priorsMotion information
INRIA dataset
Better
Better
![Page 24: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/24.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 35
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
Using oriented gradients channels is key
![Page 25: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/25.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 36
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
Using oriented gradients channels is key
![Page 26: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/26.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 37
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
Using oriented gradients channels is key
![Page 27: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/27.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 38
HOG != ICF
● No clipping ● No fix cells nor blocks● No cell nor block normalization● Not only oriented gradient● More machine learning, fewer hand-crafted choices
vs
![Page 28: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/28.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 39
HOG is dead, long live the HOG !
Dalal & Triggs2005
(ICF variants)
![Page 29: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/29.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 40
Additional ingredients for detection quality
![Page 30: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/30.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 42
The feature pool matters
Which feature pool use ?
![Page 31: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/31.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 43
All rectangles > all squares > random rectangles
![Page 32: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/32.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 44
Irregular cells are better than regular ones
![Page 33: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/33.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 45
One template cannot detect at multiple scales
✔✔ ✘
![Page 34: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/34.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 46
One template cannot detect at multiple scales
![Page 35: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/35.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 47
One model for all scales is sub-optimal
Scale 1(64x128 pixels)
vs
Scales 1,2,4,8
![Page 36: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/36.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 48
Multi-scales model is better than single scale
![Page 37: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/37.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 49
Multi-scales model is better than single scale
![Page 38: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/38.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 50
INRIA dataset ETH dataset
We improve over previous best
Caltech dataset
[Benenson et al. 2013]
![Page 39: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/39.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 51
Pooling ⇨ Filtering + response reading
Plenty of possible filter banks:
[Zhang et al. 2015]
![Page 40: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/40.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 52
Filtered channels further push quality
![Page 41: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/41.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 53
Normalized gradient is better than raw gradient
For datasets with difficult illumination,gradient normalization is crucial(37 23 MR, without with normalization)→ →
, gradient magnitude
, 10x10 local average
![Page 42: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/42.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 54
More and better training data is a game changer
![Page 43: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/43.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 55
1. More and better training data
2. Filtered feature channels (one more layer)
3. Use a large features pool
4. Use more than one model size
5. Do local features normalization
Additional ingredients for detection quality
![Page 44: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/44.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 56
![Page 45: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/45.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 57
Only pedestrians ?
![Page 46: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/46.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 58
![Page 47: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/47.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 59
[Mathias et al. 2014]
![Page 48: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/48.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 60
For multi-view detection we use components
mirrored mirrored
(+20°,-20°)(-60°,-20°)(-100°,-60°) (+20°,+60°) (+60°,+100°)
![Page 49: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/49.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 61
![Page 50: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/50.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 62
Part IISpeed, speed, speed
aka “techniques to make your processor happier”
Viola & Jones 2001Bourdev & Brandt 2005
Benenson et al. 2012Dollar et al. 2010, 2012, 2014
Sadeghi & Forsyth 2013Costea et al. 2015
![Page 51: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/51.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 63
(Fast != Scalable)
[Branch&Rank, Lehmann et al. 2014]
![Page 52: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/52.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 64
CPU != GPU
![Page 53: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/53.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 65
How to make detectionfaster ?
![Page 54: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/54.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 66
What slows down fastHOG ?
[Prisacariu and Reid 2009]
![Page 55: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/55.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 67
How to make features computation
faster ?
![Page 56: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/56.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 68
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
![Page 57: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/57.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 69
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
Faster
Fewer
![Page 58: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/58.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 70
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
Faster
Fewer
![Page 59: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/59.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 71
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
How to quickly compute the input to the trees ?
![Page 60: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/60.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 72
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
⋯
How to quickly compute the input to the trees ?
![Page 61: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/61.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 73
How to quickly compute the input to the trees ?
![Page 62: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/62.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 74
Single scale features are a fix bottleneck
● Use minimal set of feature channels● Show-off your low-level optimization skills !● This stage is very CPU and GPU friendly
![Page 63: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/63.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 75
Integral channels allow to handle any rectangle
Also known as “integral images”, “summed area table”.
Computing table is O(N),but CPU and GPU unfriendly
Reading a feature is O(1)
![Page 64: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/64.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 76
Integral channels allow to handle any rectangle
Also known as “integral images”, “summed area table”.
Computing table is O(N),but CPU and GPU unfriendly Will pay-off when
handling scales
![Page 65: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/65.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 78
We can go faster is we only use few shapes
Computing table is O(N),but CPU and GPU friendly
Reading a feature is O(1)(and 4x faster than integral channels)
Small drop in quality, for high speed gains(3 square sizes already good enough)
![Page 66: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/66.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 79
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
![Page 67: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/67.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 80
One template cannot detect at multiple scales
✔✔ ✘
![Page 68: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/68.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 81
Traditionally, features are computed many times
~50 scales
![Page 69: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/69.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 82
Traditionally, features are computed many times
~50 scales
![Page 70: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/70.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 83
We can invert the relation
1 model, 50 image scales
50 models, 1 image scale
![Page 71: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/71.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 84
Training one model per scale is too expensive
~50 scales
![Page 72: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/72.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 85
5 models, 1 image scale
We propose a method to reduce training time 10x
50 models, 1 image scale
≈
![Page 73: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/73.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 86
Features can be approximated across scales
~5 scales ~50 scales
≈
[Dollar et al. 2010]
![Page 74: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/74.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 87
We transfer test time computation to training time
1 model, 5 image scales
5 models, 1 image scale
(3x reduction in features computation)
![Page 75: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/75.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 88
50 models, 1 image scale
At runtime, we use as many models as scales
5 models, 1 image scale
≈
[Benenson et al. 2012]
![Page 76: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/76.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 89
![Page 77: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/77.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 90
Detecting without resizing improves quality
![Page 78: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/78.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 91
One can also use hybrids
[Costea et al. 2015]
5 models, 3 image scale
= 15 effective scales
![Page 79: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/79.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 93
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
Faster
Fewer
![Page 80: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/80.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 94
There are many cascade types:1. Soft cascade2. Hard cascade3. Cross-talk cascade4. NMS cascade(and many others not covered here)
![Page 81: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/81.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 95
Cascades are all about detecting dead ends early
We want to stop wasting computation in doomed candidates as early as possible (CPU, GPU).
![Page 82: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/82.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 96
Cascades are all about stopping at an early stage
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
![Page 83: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/83.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 97
Cascades are all about stopping at an early stage
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
![Page 84: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/84.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 98
Cascades are all about stopping at an early stage
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1+1 +1 +1 +1
![Page 85: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/85.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 99
Cascades are all about stopping at an early stage
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
score=
⋯
⋯ +wN⋅hNw2⋅h2+w1⋅h1+
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1+1 +1 +1 +1
Can we stop now ?
![Page 86: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/86.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 100
Soft-cascades are post-hoc training
[Bourdev & Brandt 2005]
![Page 87: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/87.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 101
Soft-cascades are post-hoc training
2k 30→ : For a detector with 2k weak classifier using soft-cascade, the average number of evaluated weak classifiers is ~30.
![Page 88: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/88.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 102
Soft-cascades are post-hoc training
[Bourdev & Brandt 2005]
Soft-cascades are convenient, but:● Needs a good validation set to adjust properly● Are conceptually wrong, it asks too much of the later weak classifiers
![Page 89: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/89.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 103
Hard-cascades are created during training
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1+1 +1 +1 +1
Tree 1 Tree N
![Page 90: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/90.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 104
Hard-cascades are created during training
Tree 1 Tree N
![Page 91: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/91.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 105
Hard-cascades are created during training
● Number of stages needs to be decided at training time
● The second stage only sees the difficult cases that pass the first stage, thus focuses its resources on the problem of interest
Tree 1 Tree N
![Page 92: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/92.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 106
Soft-cascade != Hard-cascade
![Page 93: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/93.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 107
There are many cascade types:1. Soft cascade2. Hard cascade3. Cross-talk cascade4. NMS cascade(and many others not covered here)
Soft- and Hard-cascades ignore the neighbor detections
![Page 94: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/94.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 108
Neighbor windows score trajectories are very correlated
Both across space and scale, one detection tells a lot about its neighbors
![Page 95: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/95.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 109
Crosstalk cascades exploits neighbors to minimize computation
Crosstalk is designed around a simple logic:
1) Start with coarse grid of active detections (the rest is dormant)
2) If a detection is under-performing, do early stop
3) If a detection is promising, trigger is dormant neighbors
4) If a detection is less promising than neighbors, do early stop
(Neighbors are across space and scales)
Crosstalk detector runs at ~30Hz on CPU (GPU).
[Dollar et al. 2012]
![Page 96: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/96.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 110
NMS cascades are simplified Crosstalk cascades
Crosstalk is designed around a simple logic:
1) Start with coarse grid of active detections (the rest is dormant)
2) If a detection is under-performing, do early stop
3) If a detection is promising, trigger is dormant neighbors
4) If a detection is less promising than neighbors, do early stop
![Page 97: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/97.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 111
NMS cascades are simplified Crosstalk cascades
Tree 1 Tree N
![Page 98: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/98.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 112
NMS cascades are simplified Crosstalk cascades
Tree 1 Tree N
Lets do NMS now !
[Costea et al. 2015]
~10x reduction
![Page 99: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/99.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 113
1. Soft cascade2. Hard cascade3. Cross-talk cascade4. NMS cascade(and many others not covered here)
Soft + NMS cascade is easy to implement, and provides large speed gains (10x~30x).
![Page 100: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/100.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 114
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
![Page 101: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/101.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 115
One row one person scale→
![Page 102: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/102.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 116
One row one person scale→
![Page 103: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/103.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 117
One row one person scale→
![Page 104: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/104.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 118
One row one person scale→
When ground plane assumption holds, easy 5x reduction in search space
[Park et al. 2010; Sudowe & Leibe 2011]
![Page 105: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/105.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 119
But we still slide along each row, can we do better ?
![Page 106: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/106.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 120
But we still slide along each row, can we do better ?
Yes, if stereo images are available
![Page 107: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/107.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 121
Stereo data can be leveraged for fast detection
Using stereo input, object regions can be estimated directly from the stereo pair, without computing a pixel-wise depth map.
Object regions (stixels) estimation runs at 200Hz on CPU.
30x reduction in search space compared to full image search.
[Benenson et al. 2011, 2012]
![Page 108: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/108.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 123
1. Features aggregation2. Scales handling3. Cascades4. Geometric prior
![Page 109: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/109.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 124
Aspect Relativespeed
Absolute speed
Baseline detector 1× 1.4 Hz
+ Scales handling 2× 2.8 Hz+ Soft-cascade 20× 50 Hz+ Geometric prior (ground plane) 2× 100 Hz+ Geometric prior (stixels) 1.6× 160 Hz
Final CPU + GPU ICF detector - 160 Hz
[Benenson et al. 2012]
![Page 110: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/110.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 125
Fast ICF detectors are implementable and re-implementable
● ACF, LDCF, https://github.com/pdollar/toolbox (CPU)● VeryFast, Roerei, https://bitbucket.org/rodrigob/doppia (GPU)● ICF http://opencv.org (CPU), also http://libccv.org (CPU)
● 100 fps results also reached by: Costea et al. 2015 CPU, and Runia et al. 2015 GPU
![Page 111: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/111.jpg)
Rodrigo Benenson | ICCV 2015 Tutorial 126
Four lessons to remember
1. The right design of features is key for high quality
2. The right design minimizes hand-crafting
3. Fast features computation is 80% of the effort for high speed detection
4. Proper cascades and search space tunning will do the rest
![Page 112: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/112.jpg)
Rodrigo Benensonhttp://rodrigob.github.com
?
![Page 113: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015](https://reader033.vdocuments.mx/reader033/viewer/2022051903/5ff4ba6941e0c0221e487808/html5/thumbnails/113.jpg)