Semantic Parsing for Priming Object Detection in RGB-D Scenes

Cesar Cadena and Jana Kosecka

3rd Workshop on Semantic Perception, Mapping and Exploration (SPME), Karlsruhe, Germany, 2013

Motivation

5/5/2013

Long-term robotic operation

The semantic information about the surrounding environment is important for high-level robotic tasks.

It is difficult to know a priori all the possible instances or classes of objects that the robot will find during real operation.

Even if we know many of them, it is unreasonable and expensive to run all the specific object detectors at the same time.

Motivation

However:

There are things we can assume to be present (almost) always.

Generic “detachable” objects also share some characteristics.

Urban: Ground, Buildings, Sky, Objects

Indoors: Ground, Walls, Ceiling, Objects

Today: Ground – Structure – Furniture – Props

Our problem: efficiently segment RGB+3D scenes into these general classes, to be used as a prior for task-specific detectors.

NYU Depth v2

1449 labeled frames. 26 scene classes. The labeling spans 894 different classes.

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, in ECCV, 2012.

Thanks to N. Silberman for providing the mapping from 894 to 4 classes.

The System

[Figure: system diagram with blocks labeled Semantic Segmentation, MAP, and Marginals]

Different approaches

N. Silberman et al., ECCV 2012

C. Couprie et al., CoRR 2013

X. Ren et al., CVPR 2012

D. Munoz et al., ECCV 2010

I. Endres and D. Hoiem, ECCV 2010

They have at least one of:

Expensive over-segmentation

Expensive features

Expensive inference

Our approach

[Figure: the Semantic Segmentation block expanded into Conditional Random Fields, with parts labeled Preprocessing, Graph Structure, Potentials, and Inference, and outputs MAP and Marginals]

Outline

(1) Preprocessing

(2) Graph Structure

(3) Potentials

(4) Inference

(5) Results

(6) Conclusions

Preprocessing: Over-segmentation

SLIC superpixels

R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, PAMI, 2012.

Graph Structure

Classical choice on images

Graph Structure: Our choice

Minimum Spanning Tree over 3D
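As a rough illustration of this graph choice, the sketch below builds a minimum spanning tree over superpixel centroids in 3D with Prim's algorithm. The input layout (one 3D centroid per superpixel), the function name `mst_over_3d`, and the use of Euclidean distance between centroids as the edge weight are assumptions made for illustration; the slides do not state how edges are weighted.

```python
import math

def mst_over_3d(centroids):
    """Return MST edges (i, j) over 3D points using Prim's algorithm.

    centroids: list of (x, y, z) tuples, one per superpixel (hypothetical input).
    Edge weight is the Euclidean distance between centroids (an assumption).
    """
    n = len(centroids)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n            # tree node providing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Update the connection cost of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centroids[u], centroids[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges

# Toy example: two groups of centroids separated by a depth gap.
points = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.0),
          (0.0, 0.0, 3.0), (0.1, 0.0, 3.0)]
tree = mst_over_3d(points)
```

In this toy scene the tree has four edges and only one of them crosses the depth gap, so each group stays internally connected.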

Potentials: Pairwise CRFs
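For orientation, a pairwise CRF over a graph with nodes V and edges E is commonly written as follows. The notation (x for the labels, z for the observations, φ for unary and ψ for pairwise potentials, Z for the partition function) is a standard one assumed here and may differ from the notation used in the talk.

```latex
p(\mathbf{x} \mid \mathbf{z}) = \frac{1}{Z(\mathbf{z})}
  \exp\Big( -\sum_{i \in \mathcal{V}} \phi(x_i, \mathbf{z})
            -\sum_{(i,j) \in \mathcal{E}} \psi(x_i, x_j, \mathbf{z}) \Big)

% MAP labeling and per-node marginals derived from the same distribution
\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} \; p(\mathbf{x} \mid \mathbf{z}),
\qquad
p(x_i \mid \mathbf{z}) = \sum_{\mathbf{x} \setminus x_i} p(\mathbf{x} \mid \mathbf{z})
```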

Potentials: unary

Ratio: (frequency of label j in a k-NN query) / (frequency of label j in the database)

J. Tighe and S. Lazebnik, Superparsing: Scalable nonparametric image parsing with superpixels, ECCV 2010.

The database is a kd-tree of features from training data.
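A k-NN based unary term built from the two label frequencies above, taken here as a ratio, can be sketched as follows. This is an illustration only: it uses a brute-force neighbor search instead of a kd-tree, and the function name `knn_label_ratio`, the toy 2D features, the normalization of the ratios to sum to one, the `eps` smoothing, and the negative-log energy form are all assumptions not taken from the slides.

```python
import math
from collections import Counter

def knn_label_ratio(query, database, labels, k, eps=1e-6):
    """Score each label j by (frequency of j among the k nearest
    neighbors of `query`) / (frequency of j in the whole database).

    database: list of feature tuples from training data (toy data here).
    labels:   list of class names, one per database entry.
    Returns the ratios normalized to sum to one (an assumption).
    """
    # Brute-force k-NN; a kd-tree would return the same neighbors faster.
    order = sorted(range(len(database)),
                   key=lambda i: math.dist(query, database[i]))
    knn_counts = Counter(labels[i] for i in order[:k])
    db_counts = Counter(labels)
    ratios = {}
    for label, total in db_counts.items():
        f_knn = knn_counts[label] / k          # frequency in the query result
        f_db = total / len(labels)             # frequency in the database
        ratios[label] = (f_knn + eps) / f_db   # eps keeps the later log finite
    norm = sum(ratios.values())
    return {label: r / norm for label, r in ratios.items()}

# Toy 2D features for the four general classes.
db = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # ground
      (1.0, 1.0), (1.1, 1.0),               # structure
      (2.0, 0.0),                           # furniture
      (3.0, 3.0), (3.1, 3.0)]               # props
lab = ["ground"] * 3 + ["structure"] * 2 + ["furniture"] + ["props"] * 2

probs = knn_label_ratio((0.05, 0.05), db, lab, k=3)
unary = {label: -math.log(p) for label, p in probs.items()}  # energy form
```

Dividing by the database frequency keeps frequent classes from dominating simply because they contribute more training samples.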

Features (12D)

From image:

mean of Lab color space (3D)

vertical pixel location (1D)

entropy from vanishing points (1D)

From 3D:

height and depth (2D)

mean and std of differences on depth (2D)

local planarity (1D)

neighboring planarity (1D)

vertical orientation (1D)

Features

From image: entropy from vanishing points

From 3D: mean and std of differences on depth, local planarity, neighboring planarity, vertical orientation

Potentials: pairwise

Lab color

Inference

We use belief propagation:

Exact results for MAP and marginals

Efficient computation

Thanks to our graph structure choice!
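Belief propagation is exact on a tree because two sweeps of messages (leaves to root, then root to leaves) are enough to account for every node. The sketch below computes exact marginals with the sum-product variant. The function name `tree_marginals`, the single symmetric pairwise table shared by all edges, and the toy scores are assumptions for illustration, not the potentials used in the talk.

```python
def tree_marginals(n_nodes, edges, unary, pairwise):
    """Exact marginals on a tree by sum-product belief propagation.

    edges:    list of (i, j) pairs forming a tree over nodes 0..n_nodes-1.
    unary:    unary[i][a] = non-negative score of node i taking label a.
    pairwise: pairwise[a][b] = non-negative score of neighboring labels
              (a, b), shared by all edges and assumed symmetric.
    """
    n_labels = len(unary[0])
    neighbors = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Visit nodes from the root (node 0) outwards; parents come first.
    order, parent = [], {0: None}
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in neighbors[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    messages = {}  # messages[(src, dst)][b]: what src says about dst's label b

    def send(src, dst):
        msg = []
        for b in range(n_labels):
            total = 0.0
            for a in range(n_labels):
                incoming = 1.0
                for w in neighbors[src]:
                    if w != dst:
                        incoming *= messages[(w, src)][a]
                total += unary[src][a] * pairwise[a][b] * incoming
            msg.append(total)
        scale = sum(msg)
        messages[(src, dst)] = [m / scale for m in msg]  # rescale for stability

    for u in reversed(order):          # upward pass: leaves to root
        if parent[u] is not None:
            send(u, parent[u])
    for u in order:                    # downward pass: root to leaves
        for v in neighbors[u]:
            if parent.get(v) == u:
                send(u, v)

    marginals = []
    for i in range(n_nodes):
        belief = list(unary[i])
        for w in neighbors[i]:
            belief = [belief[a] * messages[(w, i)][a] for a in range(n_labels)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals

# Chain 0 - 1 - 2 with two labels; neighboring nodes prefer equal labels.
unary = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]
marg = tree_marginals(3, [(0, 1), (1, 2)], unary, pairwise)
```

Replacing the inner sum with a maximum (and keeping track of the maximizing label) gives the max-product variant, which yields the MAP labeling instead of marginals.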

Results: NYU-D v2 Dataset

[Figure: ground truth (GT) and MAP labeling]

[Figure: confusion matrix and comparisons]

[Figure: some failures, ground truth (GT) and MAP labeling]

Marginal probabilities

They provide very useful information for specific tasks, e.g.:

Specific object detection

Support inference

[Figure: per-class marginal maps P(Ground), P(Structure), P(Furniture), P(Props)]
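One way such marginals could prime a specific detector is to run it only on superpixels where the relevant general class is likely. The sketch below is a hypothetical illustration: the threshold, the per-superpixel dictionary layout, and the function name `candidate_regions` are assumptions, not part of the presented system.

```python
def candidate_regions(marginals, target_class, threshold=0.5):
    """Return ids of superpixels whose marginal for `target_class`
    reaches `threshold`, sorted by decreasing probability.

    marginals: {superpixel_id: {class_name: probability}} (assumed layout).
    """
    scored = [(probs[target_class], sp_id)
              for sp_id, probs in marginals.items()
              if probs[target_class] >= threshold]
    # Most promising regions first, so a costly detector can stop early.
    return [sp_id for _, sp_id in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Toy marginals over the four general classes for three superpixels.
marginals = {
    0: {"Ground": 0.90, "Structure": 0.05, "Furniture": 0.03, "Props": 0.02},
    1: {"Ground": 0.05, "Structure": 0.10, "Furniture": 0.25, "Props": 0.60},
    2: {"Ground": 0.02, "Structure": 0.08, "Furniture": 0.10, "Props": 0.80},
}
props_regions = candidate_regions(marginals, "Props")  # [2, 1]
```

A detector for small movable objects would then be evaluated on superpixels 2 and 1 only, skipping the region that is almost certainly ground.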

Conclusions

We have presented a computationally efficient approach to semantic segmentation for priming object detection in indoor scenes.

Our approach effectively uses 3D and image cues.

Depth discontinuities are evidence for occlusions.

The MST over 3D keeps intra-class components coherently connected.

Discussion

Approach: Silberman et al. 2012 | Couprie et al. 2013 | Ours

Features: Bunch of engineered features (>1000D) | Learned features (>1000D) | Select meaningful features (12D)

Local classifier: Logistic Regression | Neural Networks | k-NN

Graph structure: Dense connections (image) | None | MST over 3D

Thanks!!

Cesar Cadena and Jana Kosecka

Funded by the US Army Research Office Grant W911NF-1110476.

Working on:

People detection by Shenghui Zhou

Multi-view and video