object co-detection · • better detection accuracy than single -image methods • better matching...

39
Object Co-detection Yingze (Sid) Bao, Yu Xiang, Silvio Savarese Midwest Workshop 2012, ECCV 2012

Upload: others

Post on 20-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Object Co-detection

Yingze (Sid) Bao, Yu Xiang, Silvio SavareseMidwest Workshop 2012, ECCV 2012

Page 2: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Object Detection: A Review

2

Page 3: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Object Co-Detection• Detect objects in multiple images• Establish object correspondences• Estimate object viewpoint changes

Id: 1

Id: 2

A Coherent Algorithm

3

Page 4: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Motivations• Better detection accuracy than single-image methods

4

Page 5: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Motivations• Better detection accuracy than single-image methods• Better matching accuracy than low-level methods

Where is the car?

5

Page 6: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

• Better detection accuracy than single-image methods• Better matching accuracy than low-level methods• Tracking by Detection

– Co-detection provides consistent detection across frames

Motivations

6

Page 7: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Motivations• Better detection accuracy than single-image methods• Better matching accuracy than low-level methods• Tracking by Detection• Semantic Structure From Motion

R, t

R, tBao, Bagra, Chao, Savarese, CVPR 2011 CORP 2012 CVPR 2012 7

Page 8: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Related Problems• Object detection in a single image

• Viola et al. 2001• Fergus et al. 2003• Leibe et al. 2004 • Dalal et al. 2005• Savarese et al. 2006• Felzenszwalb et al. 2009• etc….

8

Page 9: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Related Problems• Object detection in a single image• Single instance detection

– Low level image features– Small pose variation– Rich texture

Lowe 1999

• Lowe 1999• Berg et al. 2005• Ferrari et al. 2006• Nister et al. 2006• Rothganger et al.2006• Hsiao et al. 2010• etc.

9

Page 10: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Related Problems• Object detection in a single image• Single instance detection• Co-segmentation

– No semantic information– Hard to handle objects with different poses

• Rother et al. 2006• Batra et al. 2010• Hochbaum et al. 2009• etc.

Rother et al. 2006

10

Page 11: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Related Problems• Object detection in a single image• Single instance detection• Co-segmentation• Region matching

– No semantic information– May require epipolar geometry validation– Sensitive to segmentation noise

• Schaffalitzky et al. 2001• Tuytelaars et al. 2004• Matas et al. 2004• Toshev et al. 2007• etc.

Toshev et al. 2007 11

Page 12: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Object Co-detection is challenging

12

Page 13: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Pose Variation & Self-occlusion

13

Page 14: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Object Representation

id, location, pose, visibility and scale

• 𝑝𝑝 : part • 𝑟𝑟 : root filter• 𝑉𝑉 : view point𝑂𝑂 = 𝑟𝑟,𝑉𝑉, 𝑝𝑝1, 𝑝𝑝2, … , 𝑝𝑝𝑛𝑛

14

Page 15: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Pose Variation & Self-occlusion

15

Page 16: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Goal of object co-detection

• Identify matching objects in every input images respectively

• 𝑂𝑂𝑘𝑘: an object in image 𝐼𝐼𝑘𝑘

• 𝐸𝐸: energy based on the co-detection model– What is the co-detection model?

𝑂𝑂1,𝑂𝑂2, … , 𝑂𝑂𝐾𝐾 = 𝑎𝑎𝑟𝑟𝑎𝑎max{𝑂𝑂𝑘𝑘}

𝐸𝐸(𝑂𝑂1,𝑂𝑂2, … , 𝑂𝑂𝐾𝐾; 𝑰𝑰)

16

Page 17: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Co-detection Model

• In the case of 2 input images

𝑝𝑝 : part 𝑟𝑟 : bounding box𝑉𝑉 : view point

𝐸𝐸𝑢𝑢𝑛𝑛𝑢𝑢𝑢𝑢

17

Page 18: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Co-detection Model

• In the case of 2 input images

Account for visibility!

𝑝𝑝 : part 𝑟𝑟 : bounding box𝑉𝑉 : view point

Rectification

Match

Rectification

𝐸𝐸𝑚𝑚𝑚𝑚𝑢𝑢𝑚𝑚𝑚

𝐸𝐸𝑢𝑢𝑛𝑛𝑢𝑢𝑢𝑢

18

Page 19: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Co-detection Model

• In the case of 2 input images

Account for visibility!

𝑝𝑝 : part 𝑟𝑟 : bounding box𝑉𝑉 : view point

Rectification

Match

Rectification

𝐸𝐸𝑚𝑚𝑚𝑚𝑢𝑢𝑚𝑚𝑚

𝐸𝐸𝑢𝑢𝑛𝑛𝑢𝑢𝑢𝑢

19

Page 20: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Set of objects Set of input images

20

Page 21: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

• Unitary potential– Single image object detector

Xiang & Savarese CVPR 12

Good with many part-based object detection models!• Fergus et al. 03• Leibe et al. 04• Silvio & Feifei 06• Kushal et. al. 07• Chiu et. al. 07• Sun et al. 09• Felzenszwalb 09• Filder 09

21

Page 22: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

• Matching potential• Decompose into pair-wise terms

rectification rectification

Match by concatenation of feature vectors (HOG, sift, color etc..)

22

Page 23: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Single Image Detector False Alarms

23

Page 24: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Inference

• Loopy model– approximating inference with 2 steps

• Step 1: �𝑂𝑂 objects with high unitary potential

• Step 2: 𝑂𝑂∗ = 𝑎𝑎𝑟𝑟𝑎𝑎max�𝑂𝑂

All possible configurationsof matching objects in different images

Goal: detectmatching objects

24

Page 25: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Learn the model• Learn parameters for

– Standard learning process in a part-based object detection model

• Learn parameters for– Learning weights w of different matching cues (e.g. HOG,

sift, color) in a SVM learning framework

25

Page 26: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

26

Page 27: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

2D Object Part Representation– A Simplification

• Treat different view points as different object categories– Cannot match objects with large pose variations

• More choices to compute– Fergus et al. CVPR’03– Leibe et al. 04– Felzenszwalb 09– Etc. 27

Page 28: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Experiments

28

Page 29: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Experiments

• 3 Datasets– Cars

• 300 image pairs• Pandey et al. 2009, Bao et al. 2010

– Pedestrians • 200 image pairs• Ess et al. 2007

– 3d objects • ~400 image pairs for 8 categories each• Savarese & Fei-fei. 2006

29

Page 30: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Car dataset

(2d object representation applied)

Single Img. Det. Co-detector30

Page 31: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Pedestrian dataset

(2d object representation applied)

Single Img. Det. Co-detector31

Page 32: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

3d object dataset

Single Img. Det. Co-detector

Shoe model

SIFT match

32

Page 33: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

33

Page 34: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Quantitative Evaluation

• Object detection• Pose estimation• Single instance detection

(More results in the paper)

34

Page 35: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

45

50

55

60

65

70

75

80

85

Car Pedestrain 3D Object

Aver

age

Prec

isio

n (%

)

Datasets

Object Detection Accuracy

Single Image detector

Co-detetion

Car, Pedestrian: Felzenszwalb et al. 20093D Object: Xiang & Savarese 2012

49.8 53

.5

59.7 62

.7

82.3

83.0

35

Page 36: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

606570

75

80

85

90

95

100

Accu

racy

(%)

3D Object Dataset

Pose Estimation Accuracy

Single Image Estimator

Object Co-detection

Xiang & Savarese 2012

36

Page 37: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

0

20

40

60

80

100

Aver

age

Prec

isio

n (%

)

(same poses)

Detecting the same instance

Lowe 2004

Co-detection

37

Page 38: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

Why the co-detection is better at the instance detection task?

38

Page 39: Object co-detection · • Better detection accuracy than single -image methods • Better matching accuracy than low -level methods • Tracking by Detection • Semantic Structure

• Object co-detection problem– A generalization of the object detection problem

• Our solution– Exploit existing object representation models– Measure object similarity by parts

• Experiments– Superior performance in extensive tests

• Acknowledgement

39