Detecting Object Instances Without Discriminative Features

Edward Hsiao

June 19, 2013

Thesis Committee:

Martial Hebert, Chair

Alexei Efros

Takeo Kanade

Andrew Zisserman, University of Oxford

Object Instance Detection

Find this object under arbitrary viewpoint, lighting, clutter and occlusions

Robotic Manipulation

Scene Understanding

Stove Refrigerator

Microwave Coffee maker

Paper towel

Dishwasher

Faucet

Visual Search

Recognition Using Discriminative Features

model test image

[SIFT, Lowe 2004]

Extract Keypoints

test image

[SIFT, Lowe 2004]

Generate 1-To-1 Correspondences

test image

[SIFT, Lowe 2004]

Enforce Geometric Constraints

test image

[SIFT, Lowe 2004]

Recognized Object

test image

[SIFT, Lowe 2004]

Failure of Feature Matching

test image model

0 correct correspondences

Overview Lack of Discriminative Features

Ambiguous Keypoint Features

Feature-poor objects

Occlusions

Repeated Patterns

Failure of Discriminative Matching

Geometric model

mdesc2

Model descriptors

mdesc1

Image keypoint descriptor

Geometric model

mdesc2

Model descriptors

mdesc1

One-to-one matching

Most approaches discard ambiguous features
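One common way ambiguous features get discarded in one-to-one matching is the nearest/second-nearest ratio test from [SIFT, Lowe 2004]. The sketch below is a minimal illustration, assuming descriptors are plain NumPy rows; the function name, the 0.8 ratio, and the toy 2-D descriptors are illustrative assumptions, not values from the slides.

```python
import numpy as np

def ratio_test_matches(image_desc, model_desc, ratio=0.8):
    """Return (image_idx, model_idx) pairs that pass the ratio test.

    A match is kept only if the nearest model descriptor is clearly
    closer than the second nearest; ambiguous features are dropped.
    """
    # Pairwise Euclidean distances, shape (n_image, n_model).
    dists = np.linalg.norm(image_desc[:, None, :] - model_desc[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(dists.shape[0]):
        best, second = order[i, 0], order[i, 1]
        if dists[i, best] < ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches

# Toy data (hypothetical): model descriptors 1 and 2 mimic a repeated pattern.
model = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.1]])
image = np.array([[0.05, 0.0], [1.0, 0.05]])
matches = ratio_test_matches(image, model)
print(matches)
```

The second image descriptor lies between two near-identical model descriptors, so the ratio test rejects it even though either candidate might be geometrically correct.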

Quantized Matching

Geometric model

qdesc2

Quantized model descriptors

qdesc1

Quantized Matching

Geometric model

qdesc2

Quantized model descriptors

qdesc1

Quantized matching

Preserve ambiguity of match until geometric verification
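A minimal sketch of the idea of preserving ambiguity: quantize both image and model descriptors to a codebook and keep every model feature that shares a codeword as a candidate, leaving the decision to geometric verification. The codebook, function names, and toy descriptors are assumptions for illustration; the slides do not specify how quantization is done.

```python
import numpy as np

def quantize(desc, codebook):
    """Assign each descriptor to the index of its nearest codeword."""
    d = np.linalg.norm(desc[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def quantized_candidates(image_desc, model_desc, codebook):
    """Return all (image_idx, model_idx) pairs that share a codeword."""
    image_words = quantize(image_desc, codebook)
    model_words = quantize(model_desc, codebook)
    candidates = []
    for i, word in enumerate(image_words):
        # Keep every model feature with the same codeword (one-to-many).
        for m in np.flatnonzero(model_words == word):
            candidates.append((i, int(m)))
    return candidates

# Hypothetical codebook and descriptors; model 1 and 2 are near-duplicates.
codebook = np.array([[0.0, 0.0], [1.0, 0.0]])
model = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.1]])
image = np.array([[0.05, 0.0], [1.0, 0.05]])
candidates = quantized_candidates(image, model, codebook)
print(candidates)
```

Here the ambiguous image descriptor keeps both of its plausible model matches instead of being discarded.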

Detection Performance

Average Precision (higher is better)

CMU Grocery Dataset

620 images, 10 household objects

one-to-one matching

[Collet et al. 2009]

quantized matching

Failure of Feature Matching

test image

0 correct correspondences

Keypoint Comparison

Success Failure

Uninformative Keypoints

“Informative” Keypoints

980 keypoints 10 keypoints

Keypoints contained entirely within the object

“Informative” Keypoints

980 keypoints 10 keypoints

Keypoints due to specularities

Fewer keypoints More keypoints

Feature-richness

Feature Matching Experiment

At least 5 good correspondences between all pairs of images

Works Fails

Feature-rich Feature-poor

Occlusions

Feature-poor Objects Shape Matching

Template shape Input window Matched shape

Representing Feature-poor Objects

Sparse Edge Points [Berg 2005], [Leordeanu 2007],

[Duchenne 2009], [Hinterstoisser 2011]

Lines & Contour Fragments [Ferrari 2006 & 2008],

[Opelt 2006], [Srinivasan 2010]

Histogram of Oriented Gradients (HOG) [Dalal and Triggs 2005], [Lai 2011]

Sparse Edge Points

Local information: gradient orientation and color

Sparse Edge Points Matched Not matched

Sparse Edge Points

Edge connectivity is lost

Matched Not matched

Lines & Contour Fragments

Line fitting is brittle

Difficult to parameterize

Dependent on edge extraction

Splines sensitive to occlusions

Histogram of Oriented Gradients

Coarse statistics of gradient orientation and magnitude

Corrupted by background clutter Ambiguous shape

patch HOG HOG patch

Gradient Networks Our Approach

1. Match shape explicitly 2. Enforce connectivity without extracting edges

Gradient Networks Overview

Shape template Input window

Gradient Networks Local Shape Potential

How well does each pixel match locally?

Gradient Networks Predicted Shape Match

Find long connected components which follow shape

Local Shape Potential

Distance to template Local orientation

Color Edge potential

Local Orientation Potential

model test

local orientation potential

Gradient Networks

Each pixel is a node in the network

Gradient Networks

Connect each node to neighbors in tangent direction
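As a rough illustration of connecting a pixel to its neighbors in the tangent direction, the sketch below rotates the gradient angle by 90 degrees and rounds to the nearest 8-connected offset. The function name, the angle convention (radians from the +column axis, rows increasing downward), and the restriction to two immediate neighbors are assumptions, not details given in the slides.

```python
import math

def tangent_neighbors(row, col, gradient_angle):
    """Return the two 8-connected neighbors of (row, col) along the tangent.

    The tangent is taken perpendicular to the image gradient; the offset is
    rounded to the nearest pixel step in each axis.
    """
    tangent = gradient_angle + math.pi / 2.0
    d_col = int(round(math.cos(tangent)))
    d_row = int(round(math.sin(tangent)))
    return (row + d_row, col + d_col), (row - d_row, col - d_col)

# A horizontal gradient (vertical edge) links the pixel to the pixels above and below.
vertical_edge = tangent_neighbors(5, 5, 0.0)
# A diagonal gradient links the pixel to its diagonal neighbors.
diagonal_edge = tangent_neighbors(5, 5, math.pi / 4.0)
print(vertical_edge, diagonal_edge)
```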

Gradient Networks

Find paths in the network that match the shape well

Message Passing Local shape potential

shape similarity

local shape potential

message from left

message from right

[Bhat et al. 2010]

Message Passing Local shape potential

Initially, it is just the local shape potential
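The slides do not give the update equations, so the following is only a toy sketch of the general idea on a 1-D chain of nodes: each node combines its local shape potential with a message from the left and a message from the right, and with zero iterations the result is just the local potential. The multiplicative update rule and the function name are assumptions chosen so that a low-potential node blocks messages and long connected runs score higher.

```python
def pass_messages(phi, iterations):
    """Toy message passing on a chain of local shape potentials phi."""
    n = len(phi)
    left = [0.0] * n   # message arriving from the left neighbor
    right = [0.0] * n  # message arriving from the right neighbor
    for _ in range(iterations):
        new_left = [0.0] * n
        new_right = [0.0] * n
        for i in range(1, n):
            # A neighbor forwards its own potential plus what it received.
            new_left[i] = phi[i - 1] * (1.0 + left[i - 1])
        for i in range(n - 2, -1, -1):
            new_right[i] = phi[i + 1] * (1.0 + right[i + 1])
        left, right = new_left, new_right
    # Shape similarity: local potential reinforced by both messages.
    return [phi[i] * (1.0 + left[i] + right[i]) for i in range(n)]

phi = [1.0, 1.0, 1.0, 0.0, 1.0]
initial = pass_messages(phi, 0)
after_two = pass_messages(phi, 2)
print(initial, after_two)
```

In this toy setting the three connected nodes reinforce each other, while the isolated node after the gap keeps only its local potential.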

Message Passing

Local shape potential

Message Passing

Predicted Shape Match

Local shape potential Predicted match

Message passing

CMU Kitchen Occlusion Dataset

• 1600 images of 8 feature-poor objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions

Objects Example images

Shape Matching Results

Template Input window Local shape potential

Predicted match

Object Detection Sliding Window

better

False positives with shape only

Object False positive window

GN point-wise confidences

Interior Appearance

Object False positive window

GN point-wise confidences

BaRT Boundary and Region Templates

Boundary

Explicit shape: rLINE2D and GN

Region

Consider appearance within the object interior HOG and color

Combines explicit boundary and region information

HOG Uniform Regions

Uniform regions not represented well

HOG Normalization

Each cell normalized with respect to magnitude of neighbors

HOG Normalization

Amplifies noise if magnitude close to 0

Uniform Regions

Learning?

HOG + SVM Multiple images

weight = 0

HOG + exemplar SVM Single image

weight = random

Modify HOG Normalization

Modified HOG HOG

Set cell to zero if normalization below threshold
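The effect of the modified normalization can be sketched as follows: standard L2 normalization rescales even near-zero histograms to unit length, which amplifies noise in uniform regions, while the modified version zeroes the block when the normalization factor falls below a threshold. The function name, the block shape, and the threshold value are illustrative assumptions.

```python
import numpy as np

def hog_normalize(block, threshold=None, eps=1e-12):
    """L2-normalize a block of cell histograms (shape: cells x bins).

    If a threshold is given and the block norm is below it, return zeros
    instead of dividing by a tiny number.
    """
    block = np.asarray(block, dtype=float)
    norm = np.linalg.norm(block)
    if threshold is not None and norm < threshold:
        return np.zeros_like(block)
    return block / (norm + eps)

# Hypothetical near-uniform block: tiny gradient magnitudes (noise only).
uniform = np.full((4, 9), 1e-4)
standard = hog_normalize(uniform)                  # noise scaled to unit norm
modified = hog_normalize(uniform, threshold=0.05)  # suppressed to zero
print(np.linalg.norm(standard), np.linalg.norm(modified))
```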

Matching Uniform Regions

Ours HOG

Test image:

HOG Ours

Ours HOG

Test image:

HOG Ours

Ours HOG

More accurate confidences in uniform regions

Test image:

HOG Ours

Example Detections

detection zoomed in boundary (GN)

region (HOG+color)

Example Detections

Detection Performance Under Different Occlusion Levels

Occlusions

Occlusions happen in 3D

Occlusion Reasoning

Matched Not matched

Which of these hypotheses is most likely?

Occlusion Reasoning

Local Coherency Fransens '06, Wang '09

Learn Occlusion Structure Gao '11, Kwak '11

Object Detection Depth Ordering Wu '05, Wang '11

Structure of Occlusions

V_i: binary variable that equals 1 if point X_i is visible

Probability a point is visible given the visibility labeling of all other points

Occlusion Conditional Likelihood

Occlusion under a given camera view point c

Matched Not matched

Occlusion Reasoning Per Environment

objW, objL

Estimate of object dimensions Distribution of object dimensions for a given environment

Occlusion Model

Occluder

Object

Occlusion Model objW

Occluder

Object

Integral Geometry

$A_{V_j, O_c}$: area covering all positions where $X_j$ is visible and object occluded

$A_{V_i, V_j, O_c}$: area covering all positions where $X_i$ and $X_j$ are visible and object occluded
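To make the two areas concrete, the sketch below counts occluder positions on an integer grid in the image plane and takes their ratio as an estimate of the probability that one point is visible given that another is. Treating the conditional likelihood as this ratio, using a fixed-size axis-aligned occluder, and working in 2-D rather than projecting 3-D occluders under a camera viewpoint are all simplifying assumptions; every name and dimension here is hypothetical.

```python
def covers(x0, y0, w, h, pt):
    """True if the occluder with corner (x0, y0) and size w x h hides pt."""
    return x0 <= pt[0] < x0 + w and y0 <= pt[1] < y0 + h

def visibility_given(xi, xj, obj_box, occ_w, occ_h):
    """Estimate P(xi visible | xj visible, object occluded) by counting."""
    ox0, oy0, ox1, oy1 = obj_box
    both = given = 0
    for x0 in range(ox0 - occ_w, ox1 + 1):
        for y0 in range(oy0 - occ_h, oy1 + 1):
            # The occluder must overlap the object's bounding box.
            occludes = (x0 < ox1 and x0 + occ_w > ox0 and
                        y0 < oy1 and y0 + occ_h > oy0)
            if occludes and not covers(x0, y0, occ_w, occ_h, xj):
                given += 1   # positions where xj is visible, object occluded
                if not covers(x0, y0, occ_w, occ_h, xi):
                    both += 1  # positions where xi is visible as well
    return both / given

obj_box = (0, 0, 10, 10)
p_near = visibility_given((6, 5), (5, 5), obj_box, 4, 4)
p_far = visibility_given((1, 1), (5, 5), obj_box, 4, 4)
print(p_near, p_far)
```

In this toy setup a point next to a visible point is more likely to be visible than a distant one, which is the kind of structure the occlusion model is meant to capture.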

Occlusion Conditional Likelihood Under Different Viewpoints

Occlusion Conditional Likelihood Penalty (OCLP)

f_OCLP: high penalty if unlikely to be occluded by a valid object on same support surface

Matched Not matched

f_OCLP: low penalty if likely to be occluded by a valid object on same support surface

X_i Matched Not matched

Example Detections

Detection Performance Under Different Occlusion Levels

Limitation

Binary Matching Pattern Occlusion Conditional Likelihood

Limitation

Misclassifications can have impact on distribution

Binary Matching Pattern Occlusion Conditional Likelihood

Occlusion Efficient Subwindow Search (OESS)

Probabilistic Matching Pattern

OESS for True Positive

Occlusion can be explained well (95% explained)

OESS for False Positive

Only 50% explained

OESS Scoring Matching Pattern

p = 1

p = 0

No occluding block: score = (1) + (1) + (-1) + (-1) = 0

Occluding block (rewarded): score = (1) + (1) + (1) + (-1) = 2

Occluding block (penalized): score = (-1) + (1) + (1) + (-1) = 0
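The three score sums above are consistent with a simple rule: a point contributes +1 when its match status agrees with the occlusion hypothesis (matched and uncovered, or unmatched and covered by the block) and -1 otherwise. The sketch below implements that rule; extending it to probabilities via 2p - 1 and the function name are assumptions, not something the slides state.

```python
def oess_score(match_prob, inside_block):
    """Score a matching pattern against a hypothesized occluding block."""
    score = 0.0
    for p, covered in zip(match_prob, inside_block):
        vote = 2.0 * p - 1.0  # +1 for matched (p = 1), -1 for unmatched (p = 0)
        # Points inside the block are expected to be unmatched, so flip the sign.
        score += -vote if covered else vote
    return score

pattern = [1, 1, 0, 0]  # two matched points, two unmatched points
no_block = oess_score(pattern, [False, False, False, False])
rewarded = oess_score(pattern, [False, False, True, False])
penalized = oess_score(pattern, [True, False, True, False])
print(no_block, rewarded, penalized)
```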

Reformulate as Efficient Subwindow Search (ESS)

Find best occluder object

Remove all explained points

Iterate

Final prediction
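The loop of finding the best occluder, removing the explained points, and iterating can be sketched as below. Note that this sketch swaps in an exhaustive search over integer boxes in place of the branch-and-bound Efficient Subwindow Search, purely to keep the example short; the gain function, the stopping rule, and all names are assumptions.

```python
from itertools import product

def best_box(points, probs, size):
    """Brute-force search for the box whose covered points gain the most."""
    best_gain, best = 0.0, None
    for x0, y0, x1, y1 in product(range(size), repeat=4):
        if x1 < x0 or y1 < y0:
            continue
        # Covered unmatched points (p = 0) add +1, matched points (p = 1) add -1.
        gain = sum(1.0 - 2.0 * p for (x, y), p in zip(points, probs)
                   if x0 <= x <= x1 and y0 <= y <= y1)
        if gain > best_gain:
            best_gain, best = gain, (x0, y0, x1, y1)
    return best

def greedy_occluders(points, probs, size, max_boxes=5):
    """Repeatedly pick the best occluding box and drop the points it explains."""
    points, probs = list(points), list(probs)
    boxes = []
    while len(boxes) < max_boxes:
        box = best_box(points, probs, size)
        if box is None:  # no box improves the score any further
            break
        boxes.append(box)
        x0, y0, x1, y1 = box
        keep = [not (x0 <= x <= x1 and y0 <= y <= y1) for x, y in points]
        points = [pt for pt, k in zip(points, keep) if k]
        probs = [p for p, k in zip(probs, keep) if k]
    return boxes

# Hypothetical matching pattern on a 4 x 4 grid.
points = [(0, 0), (1, 0), (2, 2), (3, 3), (0, 3)]
probs = [0, 0, 1, 0, 1]
boxes = greedy_occluders(points, probs, size=4)
print(boxes)
```

The exhaustive search is quartic in the grid size, which is exactly the cost a branch-and-bound formulation is meant to avoid.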

Results

groundtruth predicted oboxes boundary region window detection

Results

Occlusion Prediction Performance

predicted vs. groundtruth

Average Intersection over Union (IoU)
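For reference, average IoU between predicted and groundtruth occlusion can be computed as in the sketch below. It assumes occlusion is represented as boolean masks of equal shape; the slides do not say whether masks or boxes were used, and the function names and toy masks are illustrative.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two boolean masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

def average_iou(pred_masks, gt_masks):
    """Mean IoU over paired predicted and groundtruth masks."""
    return sum(mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)) / len(pred_masks)

pred_masks = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
gt_masks = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
avg = average_iou(pred_masks, gt_masks)
print(avg)
```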

Summary Lack of Discriminative Features

Gradient Networks

Boundary and Region Templates

Occlusion Efficient Subwindow Search

Occlusions

Main Contributions Ambiguous Keypoint Features

Making specific features less discriminative

Main Contributions Representing Feature-poor Objects

Gradient Networks: explicit shape matching without extracting edges

Boundary and Region Templates: capture explicit boundary and region information

Main Contributions Occlusion Reasoning

Representing occlusion structure under arbitrary viewpoint

Directly search for occluding blocks to explain matching pattern

Acknowledgements

Martial Hebert Alexei Efros Takeo Kanade Andrew Zisserman

Background

Augmented Reality

3D model Target environment

Instance vs. Category Recognition

Instance Arbitrary viewpoint and lighting

Single image per view

Category Intra-class variations

Many images per view

Ambiguous Viewpoint

Failure of SIFT Matching

Invariant Approaches

Future Directions

Fine-grained verification

Scalability 3D

Fine-grained Verification

Scalability

Datasets

CMU Grocery Dataset

• 620 images of household objects
– 10 objects
• 25 single instance, 25 double instance
• 12 with ground truth pose
– Clutter, viewpoint, lighting, occlusion

CMU Kitchen Occlusion Dataset

• 1600 images of 8 household objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions

Hsiao and Hebert, CVPR 2012.

Objects Example images

Gradient Networks

Region of influence Appearance Edge

Local Appearance

Gradient Orientation Color

Potentials

Pairwise

Message Passing

Shape Similarity

Probability Calibration

scores

Scheirer et al. CVPR 2012

NOT Object Object

Probability of Object

Weibull fit to tail of negative distribution

CDF of NOT Object
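One way to read the calibration step: once a Weibull distribution has been fit to the tail of the NOT Object scores, its CDF evaluated at a new score can serve as the probability that the score is too high to be a non-object. The sketch below only evaluates that CDF; the shape, scale, and location values are hypothetical placeholders rather than fitted parameters, and the fitting step itself is omitted.

```python
import math

def weibull_cdf(x, shape, scale, loc=0.0):
    """CDF of a Weibull distribution with the given shape, scale, and location."""
    if x <= loc:
        return 0.0
    return 1.0 - math.exp(-(((x - loc) / scale) ** shape))

def probability_of_object(score, shape=2.0, scale=0.2, loc=0.5):
    """Calibrated probability: CDF of the negative-tail fit at this score.

    The default parameters are made up for illustration only.
    """
    return weibull_cdf(score, shape, scale, loc)

low = probability_of_object(0.55)
high = probability_of_object(0.95)
print(low, high)
```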

Soft Shape Model

Additional Results

Color Potential

LINE2D Similarity

LINE2D (Hinterstoisser et al., PAMI 2011)

$\mathrm{score}_{LINE2D} = \sum_{i=1}^{N} \cos(\Delta\theta_i)$

$\cos(0^\circ) = 1.00$, $\cos(45^\circ) = 0.71$

$p_i$: model point

$\Delta\theta_i$: difference between the quantized gradient orientation of model point $p_i$ and the quantized gradient orientation of the best matching image point in a local neighborhood

Robust LINE2D Similarity

rLINE2D (Hsiao and Hebert, CVPR 2012)

$\mathrm{score}_{rLINE2D} = \sum_{i=1}^{N} \delta(\Delta\theta_i = 0)$

$p_i$ and $\Delta\theta_i$ are defined as for LINE2D.
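The difference between the LINE2D and Robust LINE2D (rLINE2D) similarities can be seen on a small example: LINE2D still gives partial credit (cos 45° ≈ 0.71) to points whose quantized orientation is off by one bin, while rLINE2D counts only exact matches. The sketch assumes the per-point orientation differences are already available in degrees; the function names and sample values are illustrative.

```python
import math

def line2d_score(delta_theta_deg):
    """Sum of cosines of the per-point orientation differences."""
    return sum(math.cos(math.radians(d)) for d in delta_theta_deg)

def rline2d_score(delta_theta_deg):
    """Count of model points whose quantized orientation matches exactly."""
    return sum(1 for d in delta_theta_deg if d == 0)

# Hypothetical differences for four model points: two exact, two off by 45 degrees.
deltas = [0, 0, 45, 45]
soft = line2d_score(deltas)
robust = rline2d_score(deltas)
print(round(soft, 2), robust)
```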

Message Passing Iterations

Probability Calibration

F-Measure of Shape Matching

Single View

Multiple View

Detection Rate @ 1.0 FPPI

False Positives

Grid Optimization

Un-optimized: 57 cells

Optimized: 60 cells

HOG Normalization

Amplifies noise in uniform region!

HOG Normalization

Sensitive to shading effects!

HOG Normalization Pedestrians

Average Precision

Single View

Multiple View

False positives

Match both boundary and region

BaRT False Positives Insufficient edge evidence

Unlikely occlusion configuration

Region information is only informative after there is a plausible hypothesis based on the boundary

Occlusion Reasoning

Occlusion Model

Occlusion Scoring

Sliding window

Object detector

Occlusion hypothesis (binary)

Score of window

Occlusion model

Approximation

Analytic Approximate

Distribution of Physical Dimensions

Household Objects

Occlusion Statistics

Validity of Occlusion Model

Occlusion Penalty

Occlusion Prior Penalty (OPP)

Average Precision

Performance vs. Occlusion

Learning from Data

Parameter Sensitivity

Occlusion Upper Bound

OESS Algorithm

OESS vs. Brute Force

Occlusion Prediction

Object Detection Performance

Ambiguous Features

Problem

• Not enough correct matches

Result of our system Difficult to obtain matches

Discriminative hierarchical matching (DHM)

Model features (Level 0)

Quantized features (Level 1)

Quantized features (Level 2)

Image features

discriminative match

Candidate correspondences

aggregate

DHM example

All features

DHM result

DHM – 11 correct matches (soymilk can)

Ratio test – 3 correct matches (soymilk can)

Simulated Affine (SA)

Morel & Yu 2009

Baseline systems

• Gordon & Lowe
– SIFT + RANSAC
– Levenberg-Marquardt non-linear optimization

• Enhanced PnP (EPnP)
– Gordon & Lowe
– EPnP non-iterative pose estimation algorithm

• Collet et al.
– Gordon & Lowe
– Mean-shift spatial clustering of image features

Averaged precision-recall

Average Precision

Object detection results

Failure cases

Pose ambiguity

Repeated patterns

Extreme lighting, occlusion, viewpoint, etc.
