detecting objects at 100 hz with rigid...

113
Rodrigo Benenson Detecting objects at 100 Hz with rigid templates Markus Mathias Radu Timofte Marco Pedersoli Shanshan Zhang Mohamed Omran Jan Hosang Piotr Dollar Luc Van Gool Tinne Tuytelaars Bernt Schiele (and other authors)

Upload: others

Post on 15-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson

Detecting objects at 100 Hz with rigid templates

Markus MathiasRadu Timofte

Marco PedersoliShanshan Zhang

Mohamed OmranJan HosangPiotr Dollar

Luc Van GoolTinne Tuytelaars

Bernt Schiele(and other authors)

Page 2: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015
Page 3: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

1) Who is familiar with the “HOG descriptor” ?

2) Who is already familiar with the “ICF detectors” ?

3) Who has needed to use a detector ?

4) Who has trained a detector him/herself ?

5) Who has written a paper about detection ?

6) Any burning question ? Shoot now !

Page 4: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 7

The next 45 minutes:

How to design a detectorrunning at 100 Hz (CPU only),

step by step

https://youtu.be/scDyiDl7D7c

Page 5: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 8

Why care about 100 Hz detection ?

● When detection is only a component of a larger system

● When computational power matters (e.g. cell phones !)

● When latency matters

● For certain categories, can be both fast and reach competitive detection quality (e.g. pedestrian, face, and traffic signs)

● Used as class-specific object proposals for convnets

Page 6: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 9

How to reach high detection speed ?

I only know three ways:

1) Wait 6 months

2) Make a better fit between algorithm and hardware

3) Design an algorithm that computes less

Page 7: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 10

How to reach high detection quality ?

● “A + B features”; boring, and slower than A or B.

● Find the core ingredient, then optimize it.

● Do not hand-design, arbitrary choices are most likely wrong.

Page 8: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 11

Running example:Pedestrian detection

Page 9: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 12

Pedestrian detection is harder than you might think

INRIA training examples

Page 10: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 13

Pedestrian detection is harder than you might think

Which of these is not a pedestrian ? (Caltech test set)

Page 11: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 14

Pedestrian detection is an interesting problem

● Large variance for intra-class appearance

● Strong illumination changes

● Deformations

● Occlusions

● (Interest on small instances)

● No structural variations(number of wings in an airplane)

Page 12: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 17

http://europa.informatik.uni-freiburg.de

Page 13: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 18

Part I: Quality (first)

Part II: Speed (given a quality target)

Page 14: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 19

Part I

Qualityaka “what makes strong rigid templates ?”

Viola & Jones 2001Dollar et al. 2009, 2014

Benenson et al. 2013, 2014, 2015

Page 15: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 25

HOG: Histogram of oriented gradients

● Good features are key

● Gradient orientation is a good feature

● Fully hand-crafted [Dalal & Triggs 2005]

Page 16: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 26

DPM: Deformable part models

● Good (pre-convnet) generic detector● Components● Latent variables estimation● Many free parameters (despite best effort from authors)● Are parts good to handle deformation ? [Hariharan et al. 2014]● Components do not share training samples [Zhu et al. 2014]

[Felzenszwalb et al. 2008; 30 Hz Sadeghi & Forsyth 2014]

Page 17: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 27

In this talk, we focus on rigid templates

● Simpler● Faster● Deformations modeled via components

Page 18: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 29

Everything old is new againBest rigid templates (I know), are Viola&Jones++

[Viola & Jones 2001; Dollar et al. 2009; Benenson et al. 2013, 2015]

Page 19: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 30

Strong detection with (shallow) boosted decision trees

Page 20: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 31

+1 -1 +1 -1

score= w1⋅h1+

Strong detection with (shallow) boosted decision trees

Page 21: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 32

+1 -1 +1 -1 +1 -1 +1 -1

score= w2⋅h2+w1⋅h1+

Strong detection with (shallow) boosted decision trees

Page 22: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 33

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

Strong detection with (shallow) boosted decision trees

[ChnFtrs, Dollar et al. 2009; Viola & Jones 2001] (~4 Hz on GPU)

Page 23: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 34

Less is moreSophisticated featuresDeformable partsDeeper architecturesNon-linear classifiersRicher training dataGeometric priorsMotion information

INRIA dataset

Better

Better

Page 24: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 35

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

Using oriented gradients channels is key

Page 25: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 36

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

Using oriented gradients channels is key

Page 26: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 37

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

Using oriented gradients channels is key

Page 27: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 38

HOG != ICF

● No clipping ● No fix cells nor blocks● No cell nor block normalization● Not only oriented gradient● More machine learning, fewer hand-crafted choices

vs

Page 28: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 39

HOG is dead, long live the HOG !

Dalal & Triggs2005

(ICF variants)

Page 29: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 40

Additional ingredients for detection quality

Page 30: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 42

The feature pool matters

Which feature pool use ?

Page 31: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 43

All rectangles > all squares > random rectangles

Page 32: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 44

Irregular cells are better than regular ones

Page 33: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 45

One template cannot detect at multiple scales

✔✔ ✘

Page 34: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 46

One template cannot detect at multiple scales

Page 35: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 47

One model for all scales is sub-optimal

Scale 1(64x128 pixels)

vs

Scales 1,2,4,8

Page 36: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 48

Multi-scales model is better than single scale

Page 37: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 49

Multi-scales model is better than single scale

Page 38: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 50

INRIA dataset ETH dataset

We improve over previous best

Caltech dataset

[Benenson et al. 2013]

Page 39: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 51

Pooling ⇨ Filtering + response reading

Plenty of possible filter banks:

[Zhang et al. 2015]

Page 40: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 52

Filtered channels further push quality

Page 41: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 53

Normalized gradient is better than raw gradient

For datasets with difficult illumination,gradient normalization is crucial(37 23 MR, without with normalization)→ →

, gradient magnitude

, 10x10 local average

Page 42: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 54

More and better training data is a game changer

Page 43: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 55

1. More and better training data

2. Filtered feature channels (one more layer)

3. Use a large features pool

4. Use more than one model size

5. Do local features normalization

Additional ingredients for detection quality

Page 44: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 56

Page 45: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 57

Only pedestrians ?

Page 46: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 58

Page 47: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 59

[Mathias et al. 2014]

Page 48: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 60

For multi-view detection we use components

mirrored mirrored

(+20°,-20°)(-60°,-20°)(-100°,-60°) (+20°,+60°) (+60°,+100°)

Page 49: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 61

Page 50: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 62

Part IISpeed, speed, speed

aka “techniques to make your processor happier”

Viola & Jones 2001Bourdev & Brandt 2005

Benenson et al. 2012Dollar et al. 2010, 2012, 2014

Sadeghi & Forsyth 2013Costea et al. 2015

Page 51: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 63

(Fast != Scalable)

[Branch&Rank, Lehmann et al. 2014]

Page 52: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 64

CPU != GPU

Page 53: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 65

How to make detectionfaster ?

Page 54: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 66

What slows down fastHOG ?

[Prisacariu and Reid 2009]

Page 55: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 67

How to make features computation

faster ?

Page 56: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 68

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Page 57: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 69

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Faster

Fewer

Page 58: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 70

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Faster

Fewer

Page 59: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 71

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

How to quickly compute the input to the trees ?

Page 60: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 72

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

How to quickly compute the input to the trees ?

Page 61: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 73

How to quickly compute the input to the trees ?

Page 62: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 74

Single scale features are a fix bottleneck

● Use minimal set of feature channels● Show-off your low-level optimization skills !● This stage is very CPU and GPU friendly

Page 63: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 75

Integral channels allow to handle any rectangle

Also known as “integral images”, “summed area table”.

Computing table is O(N),but CPU and GPU unfriendly

Reading a feature is O(1)

Page 64: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 76

Integral channels allow to handle any rectangle

Also known as “integral images”, “summed area table”.

Computing table is O(N),but CPU and GPU unfriendly Will pay-off when

handling scales

Page 65: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 78

We can go faster is we only use few shapes

Computing table is O(N),but CPU and GPU friendly

Reading a feature is O(1)(and 4x faster than integral channels)

Small drop in quality, for high speed gains(3 square sizes already good enough)

Page 66: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 79

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Page 67: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 80

One template cannot detect at multiple scales

✔✔ ✘

Page 68: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 81

Traditionally, features are computed many times

~50 scales

Page 69: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 82

Traditionally, features are computed many times

~50 scales

Page 70: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 83

We can invert the relation

1 model, 50 image scales

50 models, 1 image scale

Page 71: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 84

Training one model per scale is too expensive

~50 scales

Page 72: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 85

5 models, 1 image scale

We propose a method to reduce training time 10x

50 models, 1 image scale

Page 73: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 86

Features can be approximated across scales

~5 scales ~50 scales

[Dollar et al. 2010]

Page 74: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 87

We transfer test time computation to training time

1 model, 5 image scales

5 models, 1 image scale

(3x reduction in features computation)

Page 75: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 88

50 models, 1 image scale

At runtime, we use as many models as scales

5 models, 1 image scale

[Benenson et al. 2012]

Page 76: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 89

Page 77: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 90

Detecting without resizing improves quality

Page 78: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 91

One can also use hybrids

[Costea et al. 2015]

5 models, 3 image scale

= 15 effective scales

Page 79: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 93

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Faster

Fewer

Page 80: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 94

There are many cascade types:1. Soft cascade2. Hard cascade3. Cross-talk cascade4. NMS cascade(and many others not covered here)

Page 81: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 95

Cascades are all about detecting dead ends early

We want to stop wasting computation in doomed candidates as early as possible (CPU, GPU).

Page 82: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 96

Cascades are all about stopping at an early stage

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

Page 83: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 97

Cascades are all about stopping at an early stage

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

Page 84: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 98

Cascades are all about stopping at an early stage

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

-1 -1 -1 -1 -1 -1 -1 -1 -1 -1+1 +1 +1 +1

Page 85: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 99

Cascades are all about stopping at an early stage

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

score=

⋯ +wN⋅hNw2⋅h2+w1⋅h1+

-1 -1 -1 -1 -1 -1 -1 -1 -1 -1+1 +1 +1 +1

Can we stop now ?

Page 86: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 100

Soft-cascades are post-hoc training

[Bourdev & Brandt 2005]

Page 87: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 101

Soft-cascades are post-hoc training

2k 30→ : For a detector with 2k weak classifier using soft-cascade, the average number of evaluated weak classifiers is ~30.

Page 88: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 102

Soft-cascades are post-hoc training

[Bourdev & Brandt 2005]

Soft-cascades are convenient, but:● Needs a good validation set to adjust properly● Are conceptually wrong, it asks too much of the later weak classifiers

Page 89: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 103

Hard-cascades are created during training

-1 -1 -1 -1 -1 -1 -1 -1 -1 -1+1 +1 +1 +1

Tree 1 Tree N

Page 90: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 104

Hard-cascades are created during training

Tree 1 Tree N

Page 91: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 105

Hard-cascades are created during training

● Number of stages needs to be decided at training time

● The second stage only sees the difficult cases that pass the first stage, thus focuses its resources on the problem of interest

Tree 1 Tree N

Page 92: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 106

Soft-cascade != Hard-cascade

Page 93: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 107

There are many cascade types:1. Soft cascade2. Hard cascade3. Cross-talk cascade4. NMS cascade(and many others not covered here)

Soft- and Hard-cascades ignore the neighbor detections

Page 94: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 108

Neighbor windows score trajectories are very correlated

Both across space and scale, one detection tells a lot about its neighbors

Page 95: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 109

Crosstalk cascades exploits neighbors to minimize computation

Crosstalk is designed around a simple logic:

1) Start with coarse grid of active detections (the rest is dormant)

2) If a detection is under-performing, do early stop

3) If a detection is promising, trigger is dormant neighbors

4) If a detection is less promising than neighbors, do early stop

(Neighbors are across space and scales)

Crosstalk detector runs at ~30Hz on CPU (GPU).

[Dollar et al. 2012]

Page 96: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 110

NMS cascades are simplified Crosstalk cascades

Crosstalk is designed around a simple logic:

1) Start with coarse grid of active detections (the rest is dormant)

2) If a detection is under-performing, do early stop

3) If a detection is promising, trigger is dormant neighbors

4) If a detection is less promising than neighbors, do early stop

Page 97: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 111

NMS cascades are simplified Crosstalk cascades

Tree 1 Tree N

Page 98: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 112

NMS cascades are simplified Crosstalk cascades

Tree 1 Tree N

Lets do NMS now !

[Costea et al. 2015]

~10x reduction

Page 99: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 113

1. Soft cascade2. Hard cascade3. Cross-talk cascade4. NMS cascade(and many others not covered here)

Soft + NMS cascade is easy to implement, and provides large speed gains (10x~30x).

Page 100: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 114

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Page 101: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 115

One row one person scale→

Page 102: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 116

One row one person scale→

Page 103: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 117

One row one person scale→

Page 104: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 118

One row one person scale→

When ground plane assumption holds, easy 5x reduction in search space

[Park et al. 2010; Sudowe & Leibe 2011]

Page 105: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 119

But we still slide along each row, can we do better ?

Page 106: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 120

But we still slide along each row, can we do better ?

Yes, if stereo images are available

Page 107: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 121

Stereo data can be leveraged for fast detection

Using stereo input, object regions can be estimated directly from the stereo pair, without computing a pixel-wise depth map.

Object regions (stixels) estimation runs at 200Hz on CPU.

30x reduction in search space compared to full image search.

[Benenson et al. 2011, 2012]

Page 108: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 123

1. Features aggregation2. Scales handling3. Cascades4. Geometric prior

Page 109: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 124

Aspect Relativespeed

Absolute speed

Baseline detector 1× 1.4 Hz

+ Scales handling 2× 2.8 Hz+ Soft-cascade 20× 50 Hz+ Geometric prior (ground plane) 2× 100 Hz+ Geometric prior (stixels) 1.6× 160 Hz

Final CPU + GPU ICF detector - 160 Hz

[Benenson et al. 2012]

Page 110: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 125

Fast ICF detectors are implementable and re-implementable

● ACF, LDCF, https://github.com/pdollar/toolbox (CPU)● VeryFast, Roerei, https://bitbucket.org/rodrigob/doppia (GPU)● ICF http://opencv.org (CPU), also http://libccv.org (CPU)

● 100 fps results also reached by: Costea et al. 2015 CPU, and Runia et al. 2015 GPU

Page 111: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benenson | ICCV 2015 Tutorial 126

Four lessons to remember

1. The right design of features is key for high quality

2. The right design minimizes hand-crafting

3. Fast features computation is 80% of the effort for high speed detection

4. Proper cascades and search space tunning will do the rest

Page 112: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015

Rodrigo Benensonhttp://rodrigob.github.com

?

Page 113: Detecting objects at 100 Hz with rigid templatesmp7.watson.ibm.com/ICCV2015/slides/2015_12_11_1400... · Viola & Jones 2001 Dollar et al. 2009, 2014 Benenson et al. 2013, 2014, 2015