Detecting Object Instances Without Discriminative Features

Edward Hsiao

June 19, 2013

Thesis Committee: Martial Hebert, Chair

Alexei Efros Takeo Kanade

Andrew Zisserman, University of Oxford 1

Object Instance Detection

Find this object under arbitrary viewpoint, lighting, clutter and occlusions 2

3

4

Robotic Manipulation

5

Scene Understanding

6

Scene Understanding

Stove Refrigerator

Microwave Coffee maker

Paper towel

Dishwasher

Faucet

7

Visual Search

8

Recognition Using Discriminative Features [SIFT, Lowe 2004]

model, test image 9

Extract Keypoints 10

Generate 1-To-1 Correspondences 11

Enforce Geometric Constraints 12

Recognized Object 13

Failure of Feature Matching

14

test image model

0 correct correspondences

Overview Lack of Discriminative Features

Ambiguous Keypoint Features

Feature-poor objects

Occlusions

15

Ambiguous Keypoint Features

17

Repeated Patterns

18

Failure of Discriminative Matching

[Diagram: an image keypoint descriptor compared against the model descriptors (mdesc1, mdesc2, ...) of the geometric model. One-to-one matching: mdesc1 or mdesc2?]

Most approaches discard ambiguous features 21
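To make the failure concrete, here is a minimal sketch of one-to-one matching with a nearest-neighbor ratio test, which is one common way ambiguous features end up discarded. The function name, the toy 2-D descriptors and the 0.8 ratio are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def ratio_test_matches(image_desc, model_desc, ratio=0.8):
    """Return (image_index, model_index) pairs that pass the ratio test.

    Assumes at least two model descriptors; `ratio` is an illustrative value.
    """
    matches = []
    for i, d in enumerate(image_desc):
        # Euclidean distance from this image descriptor to every model descriptor.
        dists = np.linalg.norm(model_desc - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Keep the match only if the best is clearly closer than the runner-up.
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy data: model descriptors 0 and 1 mimic a repeated pattern (nearly identical).
model_desc = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
image_desc = np.array([[1.0, 0.005], [0.0, 0.9]])
matches = ratio_test_matches(image_desc, model_desc)
```

In this toy case the image descriptor that falls between the two repeated model descriptors is dropped, and only the unambiguous one survives.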

Quantized Matching

[Diagram: an image keypoint descriptor compared against the quantized model descriptors (qdesc1, qdesc2, ...) of the geometric model.]

Preserve ambiguity of match until geometric verification 23
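The idea of preserving ambiguity can be sketched as follows: descriptors are quantized to their nearest codeword, and every model feature sharing a codeword with an image feature is kept as a candidate for later geometric verification. The codebook, the helper names and the toy descriptors are hypothetical; the slides do not specify how the quantization is built.

```python
import numpy as np

def quantize(desc, codebook):
    """Index of the nearest codebook entry for each descriptor."""
    # Pairwise distances, shape (num_descriptors, num_codewords).
    dists = np.linalg.norm(desc[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def candidate_correspondences(image_desc, model_desc, codebook):
    """All (image_index, model_index) pairs that share a codeword."""
    image_words = quantize(image_desc, codebook)
    model_words = quantize(model_desc, codebook)
    return [(i, j)
            for i, wi in enumerate(image_words)
            for j, wj in enumerate(model_words)
            if wi == wj]

# Hypothetical two-word codebook and the same toy descriptors as a repeated pattern.
codebook = np.array([[1.0, 0.0], [0.0, 1.0]])
model_desc = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
image_desc = np.array([[1.0, 0.005], [0.0, 0.9]])
candidates = candidate_correspondences(image_desc, model_desc, codebook)
```

Here the ambiguous image descriptor keeps both of its plausible model matches, so a later geometric check can decide between them.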

Detection Performance

CMU Grocery Dataset: 620 images, 10 household objects

[Bar chart: Average Precision (higher is better), axis from 0 to 0.9. Legend: one-to-one matching [Collet et al. 2009], quantized matching]

24

Failure of Feature Matching

test image model

0 correct correspondences 25

Keypoint Comparison

Success Failure

26

Uninformative Keypoints

27

“Informative” Keypoints

980 keypoints 10 keypoints

Keypoints contained entirely within the object 30

“Informative” Keypoints

980 keypoints 10 keypoints

Keypoints due to specularities 31

Fewer keypoints More keypoints

Feature-richness

32

Feature Matching Experiment

39

At least 5 good correspondences between all pairs of images

Fewer keypoints More keypoints

Works Fails

40

Fewer keypoints More keypoints

Feature-rich Feature-poor

43

Overview Lack of Discriminative Features

Ambiguous Keypoint Features

Feature-poor objects

Occlusions

45

46

Feature-poor Objects Shape Matching

Template shape Input window Matched shape

47

Representing Feature-poor Objects

Sparse Edge Points: [Berg 2005], [Leordeanu 2007], [Duchenne 2009], [Hinterstoisser 2011]

Lines & Contour Fragments: [Ferrari 2006 & 2008], [Opelt 2006], [Srinivasan 2010]

Histogram of Oriented Gradients (HOG): [Dalal and Triggs 2005], [Lai 2011]

48

Sparse Edge Points

49

Local information: gradient orientation and color

Sparse Edge Points (legend: Matched, Not matched)

50

Edge connectivity is lost

52

Lines & Contour Fragments

53

Lines & Contour Fragments

Line fitting is brittle

Difficult to parameterize

Dependent on edge extraction

Splines sensitive to occlusions

54

Histogram of Oriented Gradients

56

Coarse statistics of gradient orientation and magnitude

Histogram of Oriented Gradients

Corrupted by background clutter Ambiguous shape

patch HOG HOG patch

57

Gradient Networks Our Approach

1. Match shape explicitly 2. Enforce connectivity without extracting edges

59

Gradient Networks Overview

Shape template Input window

60

Gradient Networks Local Shape Potential

How well does each pixel match locally? 62

Gradient Networks Predicted Shape Match

Find long connected components which follow shape 63

Local Shape Potential

Distance to template Local orientation

Color Edge potential 64

Local Orientation Potential

67

model test

local orientation potential

Local Shape Potential

70

Gradient Networks

Each pixel $p$ is a node in the network 71

Connect each node to neighbors in tangent direction 72

[Diagram labels: p, q, pQ0, pQ1]

Find paths in the network that match the shape well 73
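As a rough illustration of connecting each node to neighbors in the tangent direction, the sketch below steps one unit along the tangent (taken as perpendicular to the gradient orientation) on either side of a pixel and rounds to the nearest pixel. The function name and the rounding are assumptions; the slides do not say how the neighboring pixels are actually selected.

```python
import math

def tangent_neighbors(x, y, gradient_angle):
    """Pixels one unit step from (x, y) on either side along the tangent.

    gradient_angle is the gradient orientation in radians. The tangent is
    assumed to be perpendicular to the gradient, and each step is rounded
    to the nearest pixel (a simplification chosen for this sketch).
    """
    tx = -math.sin(gradient_angle)
    ty = math.cos(gradient_angle)
    forward = (round(x + tx), round(y + ty))
    backward = (round(x - tx), round(y - ty))
    return forward, backward

# A horizontal gradient (angle 0) gives a vertical tangent.
vertical = tangent_neighbors(5, 5, 0.0)
# A vertical gradient (angle pi/2) gives a horizontal tangent.
horizontal = tangent_neighbors(5, 5, math.pi / 2)
```

Paths through the network can then be followed by repeatedly moving from a pixel to one of its tangent neighbors.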

Message Passing [Bhat et al. 2010]

[Diagram labels: shape similarity, local shape potential, message from left, message from right]

74

Initially, it is just the local shape potential 75

Predicted Shape Match

Local shape potential Predicted match

Message passing

79

CMU Kitchen Occlusion Dataset

• 1600 images of 8 feature-poor objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions

80

Objects Example images

Shape Matching Results

Template Input window Local shape potential

Predicted match 81

Object Detection Sliding Window

84

Detection Performance

86

False positives with shape only

Object False positive window

GN point-wise confidences

88

Interior Appearance

Object False positive window

GN point-wise confidences

89

BaRT: Boundary and Region Templates

90

Boundary

Explicit shape: rLINE2D and GN 92

Region

Consider appearance within the object interior: HOG and color

94

BaRT

Combines explicit boundary and region information

96

HOG Uniform Regions

Uniform regions not represented well

97

HOG Normalization

98

Each cell normalized with respect to magnitude of neighbors

HOG Normalization

99

Amplifies noise if magnitude close to 0

Uniform Regions

100

Learning?

HOG + SVM Multiple images

weight = 0

HOG + exemplar SVM Single image

weight = random

101

Modify HOG Normalization

Modified HOG HOG

Set cell to zero if normalization below threshold

104
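The modified normalization can be sketched as below: a cell histogram is divided by the gradient energy of its neighborhood, unless that energy falls below a threshold, in which case the cell is set to zero instead of amplifying noise. The L2 norm, the threshold value and the function name are assumptions made for illustration only.

```python
import numpy as np

def normalize_cell(cell_hist, neighborhood_hists, threshold=0.1):
    """Normalize a cell by its neighborhood, or zero it in uniform regions.

    `threshold` is a hypothetical value; the slides do not state one.
    """
    # L2 norm over all histograms in the neighborhood (assumed form).
    norm = np.sqrt(sum(np.sum(h ** 2) for h in neighborhood_hists))
    if norm < threshold:
        # Very little gradient energy: dividing would only amplify noise.
        return np.zeros_like(cell_hist)
    return cell_hist / norm

textured = np.array([3.0, 4.0])
uniform = np.array([0.003, 0.004])
textured_out = normalize_cell(textured, [textured])
uniform_out = normalize_cell(uniform, [uniform])
```

With standard normalization the near-zero cell would come out looking as strong as the textured one; with the threshold it stays at zero.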

Matching Uniform Regions

[Figure: test image with matching results for HOG and Ours]

More accurate confidences in uniform regions

107

Example Detections

detection zoomed in boundary (GN)

region (HOG+color) 108

Detection Performance

112

Detection Performance

113

Detection Performance Under Different Occlusion Levels

114

Detection Performance Under Different Occlusion Levels

115

Overview Lack of Discriminative Features

Ambiguous Keypoint Features

Feature-poor objects

Occlusions

116

Occlusions

117

Occlusions

118

Occlusions happen in 3D

119

Occlusions happen in 3D

120

Occlusions happen in 3D

121

Occlusions happen in 3D

122

Occlusion Reasoning

Matched Not matched

Which of these hypotheses is most likely? 123

Occlusion Reasoning

Local Coherency: Fransens '06, Wang '09

Learn Occlusion Structure: Gao '11, Kwak '11

Object Detection Depth Ordering: Wu '05, Wang '11

127

Structure of Occlusions

$V_i$: binary variable that equals 1 if point $X_i$ is visible

Occlusion Conditional Likelihood: probability a point is visible given the visibility labeling of all other points

$O_c$: occlusion under a given camera viewpoint $c$

128

Matched Not matched

Occlusion Reasoning Per Environment

[Figure labels: $obj_H$, $obj_W$, $obj_L$]

Estimate of object dimensions

Distribution of object dimensions for a given environment

129

Occlusion Model

[Diagram labels: Occluder, Object, $obj_W$, $obj_H$, $w$, $h$]

133

Occlusion Conditional Likelihood

Integral Geometry 134

$A_{V_j,O_c}$: Area covering all positions where $X_j$ is visible and object occluded 135

$A_{V_i,V_j,O_c}$: Area covering all positions where $X_i$ and $X_j$ are visible and object occluded 138
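One plausible way to read the two areas, stated here as an assumption because the slide text does not spell out the formula: the occlusion conditional likelihood is the fraction of occluder positions that leave $X_j$ visible which also leave $X_i$ visible.

```latex
% Assumed form: ratio of the two areas named on the slides.
P(V_i = 1 \mid V_j = 1, O_c) \;=\; \frac{A_{V_i, V_j, O_c}}{A_{V_j, O_c}}
```

Under this reading the value lies between 0 and 1, since every position counted in the numerator is also counted in the denominator.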

Occlusion Conditional Likelihood

139

𝑿𝒋

Occlusion Conditional Likelihood Under Different Viewpoints

140

Occlusion Conditional Likelihood Under Different Viewpoints

141

Occlusion Conditional Likelihood Penalty (OCLP), $f_{\mathrm{OCLP}}$

[Figure: point $X_i$; legend: Matched, Not matched]

High penalty if unlikely to be occluded by a valid object on same support surface 142

Low penalty if likely to be occluded by a valid object on same support surface 143

Example Detections

145

Detection Performance

146

Detection Performance Under Different Occlusion Levels

147

Limitation

Misclassifications can have impact on distribution

Binary Matching Pattern Occlusion Conditional Likelihood

149

Occlusion Efficient Subwindow Search (OESS)

Probabilistic Matching Pattern 150

OESS for True Positive

Occlusion can be explained well 151

OESS for True Positive

95% explained 152

OESS for False Positive

153

OESS for False Positive

Only 50% explained 154

OESS Scoring Matching Pattern

[Diagram: points labeled $p = 1$ or $p = 0$, with per-point scores +1, +1, -1, -1]

score = (1) + (1) + (-1) + (-1) = 0 155

[Diagram: with an occluding block, per-point scores +1, +1, +1, -1; one point is rewarded]

score = (1) + (1) + (1) + (-1) = 2 156

[Diagram: with an occluding block, per-point scores -1, +1, +1, -1; one point is penalized]

score = (-1) + (1) + (1) + (-1) = 0 157
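The three score computations above can be reproduced with a small sketch. It assumes the rule suggested by the arithmetic: outside the occluding block a matched point scores +1 and an unmatched point -1, and inside the block the signs flip. The point layout, the block coordinates and the function names are hypothetical.

```python
def point_score(matched, inside_block):
    """+1 when the block agrees with the matching outcome, -1 otherwise."""
    if inside_block:
        # An occluded point is expected to be unmatched.
        return -1 if matched else 1
    # A visible point is expected to be matched.
    return 1 if matched else -1

def pattern_score(points, block=None):
    """points: list of (x, y, matched); block: (x0, y0, x1, y1) or None."""
    total = 0
    for x, y, matched in points:
        inside = (block is not None
                  and block[0] <= x <= block[2]
                  and block[1] <= y <= block[3])
        total += point_score(matched, inside)
    return total

# Two matched (p = 1) and two unmatched (p = 0) points on a toy 2x2 layout.
points = [(0, 0, True), (1, 0, True), (0, 1, False), (1, 1, False)]
no_block = pattern_score(points)                  # no occluder hypothesized
good_block = pattern_score(points, (0, 1, 0, 1))  # covers one unmatched point
bad_block = pattern_score(points, (0, 0, 0, 1))   # also covers a matched point
```

The three values come out as 0, 2 and 0, matching the slide arithmetic: covering an unmatched point is rewarded, while covering a matched point is penalized.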

OESS

Reformulate as Efficient Subwindow Search (ESS) 158

Find best occluder object 159

Remove all explained points 160

Iterate 161

Final prediction 164
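The find, remove and iterate loop can be sketched with an exhaustive box search standing in for ESS. This is deliberately not the branch-and-bound search itself: it enumerates every axis-aligned box over the point coordinates, which is only feasible for toy inputs. The gain function (unmatched points covered minus matched points covered) and all names are assumptions.

```python
from itertools import product

def box_gain(points, box):
    """Unmatched points covered minus matched points covered."""
    x0, y0, x1, y1 = box
    gain = 0
    for x, y, matched in points:
        if x0 <= x <= x1 and y0 <= y <= y1:
            gain += -1 if matched else 1
    return gain

def greedy_occluders(points):
    """Repeatedly pick the best box and drop the unmatched points it explains."""
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    # Brute-force candidate set: every box with corners on point coordinates.
    boxes = [(x0, y0, x1, y1)
             for x0, x1 in product(xs, xs) if x0 <= x1
             for y0, y1 in product(ys, ys) if y0 <= y1]
    remaining = list(points)
    chosen = []
    while True:
        best = max(boxes, key=lambda b: box_gain(remaining, b))
        if box_gain(remaining, best) <= 0:
            return chosen  # nothing left that a box explains
        chosen.append(best)
        x0, y0, x1, y1 = best
        # Keep matched points and any unmatched point outside the chosen box.
        remaining = [(x, y, m) for x, y, m in remaining
                     if m or not (x0 <= x <= x1 and y0 <= y <= y1)]

# Toy 3x3 pattern: an unmatched 2x2 corner plus one isolated unmatched point.
points = [(0, 0, False), (1, 0, False), (2, 0, True),
          (0, 1, False), (1, 1, False), (2, 1, True),
          (0, 2, True), (1, 2, True), (2, 2, False)]
occluders = greedy_occluders(points)
```

On this toy pattern the loop first picks the box over the unmatched corner, then a second box for the isolated point, and stops once no box has positive gain.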

Results

groundtruth predicted oboxes boundary region window detection

165

Occlusion Prediction Performance

Average Intersection over Union (IoU): predicted vs. groundtruth 170
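For reference, intersection over union between a predicted and a groundtruth occlusion mask can be computed as in the sketch below. Representing occlusion as binary masks and averaging per image are assumptions about the evaluation protocol, which the slides do not describe.

```python
import numpy as np

def mask_iou(predicted, groundtruth):
    """Intersection over union of two binary masks of equal shape."""
    predicted = np.asarray(predicted, dtype=bool)
    groundtruth = np.asarray(groundtruth, dtype=bool)
    union = np.logical_or(predicted, groundtruth).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    intersection = np.logical_and(predicted, groundtruth).sum()
    return float(intersection) / float(union)

def average_iou(pairs):
    """Mean IoU over (predicted, groundtruth) mask pairs."""
    return sum(mask_iou(p, g) for p, g in pairs) / len(pairs)

pred = [[1, 1, 0], [0, 0, 0]]
gt = [[0, 1, 1], [0, 0, 0]]
iou = mask_iou(pred, gt)
mean_iou = average_iou([(pred, gt), (gt, gt)])
```

The toy masks overlap in one of three covered cells, so their IoU is one third.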

Detection Performance

171

172

Summary Lack of Discriminative Features

Ambiguous Keypoint Features

Feature-poor objects: Gradient Networks, Boundary and Region Templates

Occlusions: Occlusion Conditional Likelihood, Occlusion Efficient Subwindow Search

173

Main Contributions Ambiguous Keypoint Features

Making specific features less discriminative 174

Main Contributions Representing Feature-poor Objects

Gradient Networks: Explicit shape matching without extracting edges

Boundary and Region Templates: Capture explicit boundary and region information

175

Main Contributions Occlusion Reasoning

Occlusion Conditional Likelihood

Representing occlusion structure under arbitrary viewpoint

Occlusion Efficient Subwindow Search

Directly search for occluding blocks to explain matching pattern

178

Acknowledgements

181

Martial Hebert Alexei Efros Takeo Kanade Andrew Zisserman

182

183

184

Background

Augmented Reality

185

3D model Target environment

Instance vs. Category Recognition

187

Instance: Arbitrary viewpoint and lighting; single image per view

Category: Intra-class variations; many images per view

Ambiguous Viewpoint

188

Failure of SIFT Matching

189

Invariant Approaches

190

Future Directions

Fine-grained verification

Scalability 3D

191

Fine-grained Verification

192

Scalability

193

3D

194

195

Datasets

CMU Grocery Dataset

• 620 images of household objects
– 10 objects
• 25 single instance, 25 double instance
• 12 with ground truth pose
– Clutter, viewpoint, lighting, occlusion

CMU Kitchen Occlusion Dataset

• 1600 images of 8 household objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions

197 Hsiao and Hebert, CVPR 2012.

Objects Example images

198

Gradient Networks

Local Shape Potential

199

Region of influence Appearance Edge

Local Appearance

200

Gradient Orientation Color

Potentials

201

Pairwise

Unary

Message Passing

202

Shape Similarity

Probability Calibration

Scheirer et al. CVPR 2012

[Plot: score distributions for NOT Object and Object. x-axis: scores. y-axes: Density of NOT Object, Probability of Object. Curves: Weibull fit to tail of negative distribution, CDF of NOT Object.]

203

Soft Shape Model

204

Additional Results

205

Color Potential

206

LINE2D Similarity

207

LINE2D (Hinterstoisser et al., PAMI 2011)

$$\mathrm{score}_{\mathrm{LINE2D}} = \sum_{i=1}^{N} \cos(\Delta\theta_i)$$

Model point $p_i$

$\Delta\theta_i$: difference between the quantized gradient orientation of model point $p_i$ and the quantized gradient orientation of the best matching image point in a local neighborhood

$\cos(0^\circ) = 1.00$

$\cos(45^\circ) = 0.71$

Robust LINE2D Similarity

208

rLINE2D (Hsiao and Hebert, CVPR 2012)

$$\mathrm{score}_{\mathrm{rLINE2D}} = \sum_{i=1}^{N} \delta(\Delta\theta_i = 0)$$

Model point $p_i$

$\Delta\theta_i$: difference between the quantized gradient orientation of model point $p_i$ and the quantized gradient orientation of the best matching image point in a local neighborhood
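The difference between the two similarity scores can be seen on a toy example. The sketch assumes the orientation differences are already computed and given in degrees; the function names are illustrative.

```python
import math

def line2d_score(delta_thetas_deg):
    """Sum of cosines of the orientation differences (LINE2D-style score)."""
    return sum(math.cos(math.radians(d)) for d in delta_thetas_deg)

def rline2d_score(delta_thetas_deg):
    """Count of exact quantized-orientation agreements (rLINE2D-style score)."""
    return sum(1 for d in delta_thetas_deg if d == 0)

# Four model points: two exact matches and two that are off by one 45-degree step.
deltas = [0, 0, 45, 45]
soft = line2d_score(deltas)
strict = rline2d_score(deltas)
```

The cosine score still gives each 45-degree mismatch about 0.71 of a full vote, whereas the robust variant counts only the exact agreements.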

Message Passing Iterations

209

Probability Calibration

210

F-Measure of Shape Matching

211

Single View

212

Multiple View

213

Detection Rate @ 1.0 FPPI

214

Detection Rate @ 1.0 FPPI

215

False Positives

216

217

BaRT

Grid Optimization

Un-optimized : 57 cells Optimized : 60 cells

218

HOG Normalization

219

Amplifies noise in uniform region!

HOG Normalization

220

Sensitive to shading effects!

HOG Normalization Pedestrians

221

Average Precision

222

Single View

223

Multiple View

224

False positives

Match both boundary and region

225

BaRT False Positives Insufficient edge evidence

Unlikely occlusion configuration

Region information is only informative after there is a plausible hypothesis based on the boundary

226

227

Occlusion Reasoning

Occlusion Model

228

Occlusion Scoring

Sliding window

Object detector

Occlusion hypothesis (binary)

Score of window

Occlusion model

229

Occlusion Conditional Likelihood

230

Occlusion Conditional Likelihood

Approximation

Analytic Approximate

231

Distribution of Physical Dimensions

Household Objects

232

Occlusion Statistics

233

Validity of Occlusion Model

234

Occlusion Penalty

Occlusion Prior Penalty (OPP)

Occlusion Conditional Likelihood Penalty (OCLP)

235

Average Precision

236

Performance vs. Occlusion

237

Learning from Data

238

Parameter Sensitivity

239

240

OESS

Occlusion Upper Bound

241

OESS Algorithm

242

OESS vs. Brute Force

243

Occlusion Prediction

244

Object Detection Performance

245

246

Ambiguous Features

Problem

• Not enough correct matches

Result of our system Difficult to obtain matches

Discriminative hierarchical matching (DHM)

Model features (Level 0)

Quantized features (Level 1)

Quantized features (Level 2)

Image features

discriminative match

discriminative match

discriminative match

Candidate correspondences

aggregate

DHM example

All features

DHM result

DHM – 11 correct matches (soymilk can)

Ratio test – 3 correct matches (soymilk can)

Simulated Affine (SA)

Morel & Yu 2009

Baseline systems

• Gordon & Lowe
– SIFT + RANSAC
– Levenberg-Marquardt non-linear optimization

• Enhanced PnP (EPnP)
– Gordon & Lowe
– EPnP non-iterative pose estimation algorithm

• Collet et al.
– Gordon & Lowe
– Mean-shift spatial clustering of image features

Averaged precision-recall

Average Precision

Object detection results

Failure cases

Pose ambiguity

Repeated patterns

Extreme lighting, occlusion, viewpoint, etc.
