
A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method

YUZHUO REN AND C.-C. JAY KUO

Media Communications Lab

University of Southern California

15 July 2016 Seminar

Outline

• Introduction
  • Problem Statement
  • Applications
  • Challenges

• Related Work

• Proposed Method

• Conclusion


Problem Statement

Indoor Layout Estimation:

[Figure: Input Image; Desired Output shown as Layout: Segmentation Representation and Layout: Corner Representation]

Applications


Indoor scene understanding from a single image is a challenging yet important problem in many applications including:

• Indoor Robotics

• Real Estate

• Virtual Interior Design

[Example images: Indoor Robotics; Real Estate; Virtual Interior Design]

Challenges


Indoor scene understanding from a single image is challenging, mainly due to:

• Poor illumination

• Cluttered objects

• Different viewpoints

• Occlusions

[Example images: Lots of objects; Viewpoint variations; Occlusion & Poor illumination]

Assumption


Indoor scene understanding from a single image is generally based on the so-called “Manhattan World” assumption:

The surfaces in the scene are aligned with three main directions that are orthogonal to each other.

Vanishing Point


Receding parallel lines converge in the distance at eye level. The point where they meet is called a vanishing point.
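The definition above can be illustrated with a little projective geometry. The following is a minimal sketch, not the detector used in this work: it assumes each image line is given by two points and computes where two such lines meet using homogeneous coordinates. The function names `line_through` and `intersect` are hypothetical.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through points p and q."""
    (x1, y1), (x2, y2) = p, q
    # Cross product of (x1, y1, 1) and (x2, y2, 1).
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    # Cross product of the two line vectors gives the homogeneous intersection point.
    x = b1 * c2 - b2 * c1
    y = c1 * a2 - c2 * a1
    w = a1 * b2 - a2 * b1
    if abs(w) < eps:
        return None  # the vanishing point lies at infinity
    return (x / w, y / w)


# Two receding edges, for example the two sides of a corridor floor.
left_edge = line_through((0, 0), (2, 1))
right_edge = line_through((0, 4), (2, 3))
vanishing_point = intersect(left_edge, right_edge)
```

In practice many noisy line segments vote for each vanishing point; this sketch only covers the two-line case.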

Dataset


[Figure: sample image from a dataset, with its Layout Ground Truth and Object Label]

Dataset


There are several datasets including:

Dataset   Published Year   Gray/Color   Image Number   Scene Category   Object Label   Layout Label
UCB       2009             Gray         340            N/A              x              √
UIUC      2009             Color        314            N/A              √              √
3DGP      2013             Color        963            3                √              √
LSUN      2016             Color        5394           8                x              √

UIUC Dataset


• Published in ICCV 2009

• 314 Images

• Color Images

• Layout Ground Truth

• Object Label

[Figure: Image, Layout Ground Truth, Object Label]

LSUN Dataset


• Published in CVPR 2016 workshop

• 5394 Images

• Color Images

• Layout Ground Truth

• 8 Scene Types

[Figure: Image, Layout Ground Truth]



Evaluation Metric


1 Pixel-wise Error:
  • Search for the best one-to-one surface mapping
  • Compute the percentage of pixels that have the wrong labels
  • Penalize unmatched regions

2 Corner Error:
  • Search for the best one-to-one corner mapping
  • The error is the distance from the ground truth corner
  • The error is normalized by the image resolution

[Figure: Result vs. Ground Truth]
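The two metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the official evaluation code: the best one-to-one mapping is found by brute force over permutations (feasible because a room layout has at most five surfaces and a handful of corners), both layouts are assumed to have the same number of corners, and "image resolution" is taken to mean the image diagonal. The function names are hypothetical.

```python
from itertools import permutations
from math import hypot


def pixel_error(pred, gt):
    """Fraction of mislabeled pixels under the best one-to-one label mapping."""
    pred_ids = sorted({v for row in pred for v in row})
    gt_ids = sorted({v for row in gt for v in row})
    # Pad with None so every predicted label gets a slot; a predicted label
    # mapped to None is unmatched and all of its pixels count as errors.
    slots = gt_ids + [None] * max(0, len(pred_ids) - len(gt_ids))
    total = sum(len(row) for row in gt)
    best_correct = 0
    for perm in permutations(slots, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(
            mapping[p] == g
            for pred_row, gt_row in zip(pred, gt)
            for p, g in zip(pred_row, gt_row)
        )
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / total


def corner_error(pred, gt, width, height):
    """Mean matched corner distance divided by the image diagonal (an assumption)."""
    diagonal = hypot(width, height)
    best_total = min(
        sum(hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(perm, gt))
        for perm in permutations(pred)
    )
    return best_total / (len(gt) * diagonal)


# Label ids differ between prediction and ground truth; only the regions matter.
pe = pixel_error([[0, 0, 1, 1]], [[5, 5, 5, 7]])
ce = corner_error([(0, 0), (3, 4)], [(3, 4), (0, 3)], width=3, height=4)
```

For larger label sets, an assignment solver would replace the permutation search; brute force is kept here to stay dependency-free.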


Related Work


• Traditional Methods:
  • Hand-crafted features: vanishing lines, line membership features, geometric context labels, object locations, etc.
  • Structured regressor to rank layouts

• Fully Convolutional Networks (FCN) Based Methods:
  • Apply FCN to learn “Informative Edges” and use edge-based features and line membership features in structured regressor learning, by Mallya et al., ICCV 2015
  • Apply FCN to learn surface segmentation and use the surface belief map to rank layouts, by Dasgupta et al., CVPR 2016

Related Work


• Traditional Methods: Structured Learning

Hedau, Varsha, Derek Hoiem, and David Forsyth. "Recovering the spatial layout of cluttered rooms." ICCV, 2009

[Figure: example input X and output layout Y]

Related Work




[Figure: input X and candidate layouts Y(i)]

Score(i) = f(X, Y(i))

Y = the candidate Y(i) with the highest score
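The selection rule above amounts to an argmax over candidate layouts. The sketch below assumes a linear scoring function, f(X, Y) = w · ψ(X, Y), which is a common choice in structured learning but is not stated on the slide; the weights and feature vectors are made-up illustrative values.

```python
def score(weights, features):
    """Linear score w . psi(X, Y); the linear form is an assumption of this sketch."""
    return sum(w * f for w, f in zip(weights, features))


def best_layout(weights, candidate_features):
    """Return (index, score) of the highest-scoring candidate layout Y(i)."""
    scores = [score(weights, feats) for feats in candidate_features]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


# Hypothetical learned weights and per-candidate feature vectors psi(X, Y(i)).
weights = [1.0, -0.5]
candidates = [[0.2, 0.4], [0.9, 0.2], [0.5, 1.0]]
index, value = best_layout(weights, candidates)
```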

Related Work


• Traditional Methods: Structured Learning
  • Assumption: Manhattan World Assumption


Related Work



Line Segment Detection → Vanishing Point Estimation → Layout Generation → Evaluate Box Layout → Pick Highest Score Box Layout

Related Work



Visual Result: Best Cases

Related Work



Visual Result: Worst Cases

Related Work


• Improve Features
  • Surface Label (ICCV2009)
  • Orientation Map (CVPR2009)
  • Manhattan Junctions (CVPR2013)

• Improve Layout Proposals
  • Volume Reasoning (NIPS2010)
  • Generative Model (CVPR2012)
  • 3D Geometric Phrases (CVPR2013)
  • Box in the Box (CVPR2013)
  • Rent 3D (CVPR2015)
  • Informative Edge (ICCV2015)
  • Surface Norm (CVPR2015)

Related Work


Method                           Pixel-wise Error
Surface Label (ICCV2009)         0.2120
Orientation Map (CVPR2009)       0.1860
Volume Reasoning (NIPS2010)      0.1620
Manhattan Junctions (CVPR2013)   0.1340
3DGP (CVPR2013)                  0.1740
Box in Box (CVPR2013)            0.1360

Related Work


• FCN to learn “Informative Edges”

• Use edge-based feature and line membership feature in structured regressor learning

[Figure: Vanishing Line; Informative Edge Maps; Generate Candidate Layouts]

Mallya, Arun, and Svetlana Lazebnik. "Learning Informative Edge Maps for Indoor Scene Layout Prediction." ICCV 2015.

Related Work


Saumitro Dasgupta, Kuan Fang, Kevin Chen, and Silvio Savarese. “DeLay: Robust spatial layout estimation for cluttered indoor scenes.” CVPR 2016.

• Apply FCN (FCN8s) to learn surface segmentation

• Use surface belief map to rank layouts


Overview of Our Method

Input → Step 1: Coarse Layout Estimation → Step 2: Layout Refinement → Result


Step 1: Coarse Layout Estimation (1)


Step 1: Coarse Layout Estimation (2)

Multi-task Fully Convolutional Networks (FCN)*

• Two tasks: Coarse layout and semantic surface

• Architecture: VGG-16 structure, 32 pixel output stride

• Training images: 4000 LSUN 2016 training images resized to 404x404

* Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR 2015.


Step 1: Coarse Layout Estimation (3)

Multi-task Fully Convolutional Networks (FCN)*

• Network initialization: a model trained on the NYUD v2 indoor dataset for the 40-class semantic segmentation task

• Base learning rate: 10e-4

* Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR 2015.
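Joint training of the two tasks can be pictured as minimizing a combined loss. The sketch below is framework-free and only illustrative: it assumes each task uses a per-pixel cross-entropy and that the two losses are summed with weights. The slides do not state the loss or its weights, so `layout_weight` and `surface_weight` are hypothetical.

```python
from math import log


def cross_entropy(probs, labels):
    """Mean per-pixel cross-entropy; probs[i] is the class distribution of pixel i."""
    return -sum(log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def multitask_loss(layout_probs, layout_labels, surface_probs, surface_labels,
                   layout_weight=1.0, surface_weight=1.0):
    """Weighted sum of the coarse-layout loss and the semantic-surface loss."""
    return (layout_weight * cross_entropy(layout_probs, layout_labels)
            + surface_weight * cross_entropy(surface_probs, surface_labels))


# Two pixels for the layout task (layout vs. background), one for the surface task.
loss = multitask_loss(
    layout_probs=[[0.5, 0.5], [0.5, 0.5]], layout_labels=[0, 1],
    surface_probs=[[1.0, 0.0, 0.0, 0.0, 0.0]], surface_labels=[0],
)
```

Both task heads would share the VGG-16 trunk, so gradients from either loss update the shared features.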


Step 1: Coarse Layout Estimation (4)

Semantic Surface Re-Labeling

• Original Label
  • Not consistent

• New Label
  • Consistent among surfaces
  • 1 -> Frontal wall
  • 2 -> Left wall
  • 3 -> Right wall
  • 4 -> Floor
  • 5 -> Ceiling
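Re-labeling can be sketched as applying a per-image lookup from original label ids to the consistent ids listed above. How that lookup is derived for each image is not described on the slide, so the `original_to_new` dictionary below is a hypothetical example.

```python
# Consistent label ids, as listed on the slide.
FRONTAL_WALL, LEFT_WALL, RIGHT_WALL, FLOOR, CEILING = 1, 2, 3, 4, 5


def relabel(label_map, original_to_new):
    """Return a copy of label_map with every original id replaced by its new id."""
    return [[original_to_new[v] for v in row] for row in label_map]


# Hypothetical image in which original id 1 marks the left wall,
# id 2 the frontal wall, and id 3 the floor.
original = [[1, 1, 2],
            [3, 3, 3]]
consistent = relabel(original, {1: LEFT_WALL, 2: FRONTAL_WALL, 3: FLOOR})
```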

[Figure: Original Label vs. New Label]


Step 1: Coarse Layout Estimation (5)

Visual Results

[Figure: examples shown as Image, Informative Edge*, Our Result]

* Arun Mallya and Svetlana Lazebnik. “Learning Informative Edge Maps for Indoor Scene Layout Prediction.” ICCV 2015.


Step 1: Coarse Layout Estimation (6)

Quantitative Results

               FCN (ICCV2015)    MFCN1 (ours)      MFCN2 (ours)
Metrics        ODS      OIS      ODS      OIS      ODS      OIS
UIUC dataset   0.255    0.263    0.265    0.284    0.265    0.291

• FCN: jointly train coarse layout and geometric context label (ICCV 2015)

• MFCN1: jointly train coarse layout and semantic surface, original size

• MFCN2: jointly train coarse layout and semantic surface, resize to 404
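The slides report ODS and OIS without defining them. The sketch below follows the usual edge-detection benchmark convention as an assumption: ODS is the best F-measure with one threshold fixed for the whole dataset, and OIS is the F-measure obtained when each image uses its own best threshold. It takes per-image true positive, false positive, and false negative counts as given, so the tolerance-based matching of edge pixels is left out.

```python
def f_measure(tp, fp, fn):
    """F-measure from raw counts; equals 2PR / (P + R)."""
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def ods_ois(counts):
    """counts[i][t] = (tp, fp, fn) for image i at threshold index t."""
    n_thresholds = len(counts[0])
    # ODS: aggregate counts over all images at each threshold, keep the best.
    ods = max(
        f_measure(*(sum(img[t][k] for img in counts) for k in range(3)))
        for t in range(n_thresholds)
    )
    # OIS: each image contributes the counts from its own best threshold.
    best = [max(img, key=lambda c: f_measure(*c)) for img in counts]
    ois = f_measure(*(sum(c[k] for c in best) for k in range(3)))
    return ods, ois


# Two images, two thresholds each (made-up counts).
ods, ois = ods_ois([[(8, 2, 2), (5, 0, 5)],
                    [(4, 6, 1), (4, 1, 1)]])
```

By construction OIS is at least as large as ODS, which matches the pattern in the table.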



Step 2: Layout Refinement (1)



[Figure: layout refinement from Input to Result, with stages Critical Line Detection, Layout Model, and Scoring Layout Hypotheses; example hypothesis scores: 0.574, 0.476, 0.326, 0.211, …]


Step 2: Layout Refinement (4)

Critical Line Detection

• Vanishing line and vanishing point detection

• Binarize coarse layout (threshold = 0.1) and erode by 3 pixels

• Sample vanishing lines inside the binary map as critical lines
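The binarize, erode, and sample steps can be sketched in plain Python. Several details are assumptions made for illustration: erosion uses a square window and treats pixels outside the image as background, and a candidate line is judged by the fraction of sampled points that fall inside the binary map. The function names are hypothetical.

```python
def binarize(prob_map, threshold=0.1):
    """1 where the coarse layout probability exceeds the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def erode(mask, radius=3):
    """Binary erosion with a (2*radius + 1) square window (window shape is assumed)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    # Pixels closer than `radius` to the border stay 0 (outside counts as background).
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            if all(mask[yy][xx]
                   for yy in range(y - radius, y + radius + 1)
                   for xx in range(x - radius, x + radius + 1)):
                out[y][x] = 1
    return out


def inside_fraction(mask, p, q, samples=50):
    """Fraction of points sampled along segment p-q, given as (x, y), that hit the mask."""
    hits = 0
    for i in range(samples + 1):
        t = i / samples
        x = round(p[0] + t * (q[0] - p[0]))
        y = round(p[1] + t * (q[1] - p[1]))
        if 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]:
            hits += 1
    return hits / (samples + 1)


# Toy coarse layout: a horizontal band of high probability, eroded by 1 pixel.
prob = [[0.8 if 2 <= y <= 4 else 0.05 for _ in range(7)] for y in range(7)]
mask = erode(binarize(prob), radius=1)
on_band = inside_fraction(mask, (1, 3), (5, 3))
off_band = inside_fraction(mask, (1, 2), (5, 2))
```

A vanishing line would be kept as a critical line when its inside fraction is high enough; the cut-off is a design choice not given on the slide.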

[Figure: Critical Line Detection, showing Input, Vanishing Line, and Binarize]



• Handling undetected lines: least-squares fitting of the coarse layout

[Figure: Input Image, Coarse Layout, Vanishing Lines]
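When no vanishing line is detected for a layout boundary, a line can be fitted directly to the coarse layout response. The sketch below uses a probability-weighted total least-squares fit (the principal axis of the pixel cloud) so that vertical boundaries are handled too. The slide does not say which least-squares variant is used, so this choice, the reuse of the 0.1 threshold, and the name `fit_line` are assumptions.

```python
from math import atan2, cos, sin


def fit_line(prob_map, threshold=0.1):
    """Fit a line to pixels whose probability exceeds the threshold.

    Returns (cx, cy, dx, dy): a point on the line and a unit direction vector.
    Assumes at least one pixel exceeds the threshold.
    """
    pts = [(x, y, p) for y, row in enumerate(prob_map)
           for x, p in enumerate(row) if p > threshold]
    total = sum(p for _, _, p in pts)
    # Weighted centroid: the fitted line passes through it.
    cx = sum(x * p for x, _, p in pts) / total
    cy = sum(y * p for _, y, p in pts) / total
    # Weighted second moments around the centroid.
    sxx = sum(p * (x - cx) ** 2 for x, _, p in pts)
    syy = sum(p * (y - cy) ** 2 for _, y, p in pts)
    sxy = sum(p * (x - cx) * (y - cy) for x, y, p in pts)
    # Orientation of the principal axis minimizes squared orthogonal distances.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    return cx, cy, cos(theta), sin(theta)


# Toy coarse layout with response along the main diagonal.
diagonal = [[1.0 if x == y else 0.0 for x in range(5)] for y in range(5)]
cx, cy, dx, dy = fit_line(diagonal)
```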



• Handling occluded lines

[Figure: Coarse Layout; Vanishing Lines; Occluded Lines extension and fill in]



Scoring Layout Hypotheses

• P: Coarse layout probability output

• L: Layout binary map, dilated by 3 pixels (1: layout pixel, 0: background pixel)

• N: Number of layout pixels in L

• S: Score function value

[Figure: P and L]
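The score formula itself did not survive on the slide. One score consistent with the definitions of P, L, N, and S is the mean coarse-layout probability over the layout pixels, S = (1/N) Σ P(i, j) · L(i, j); the sketch below assumes that form, and the function names are hypothetical.

```python
def layout_score(P, L):
    """Mean of P over the pixels where the binary layout map L is 1."""
    n = sum(v for row in L for v in row)  # N: number of layout pixels in L
    if n == 0:
        return 0.0
    total = sum(p * l for p_row, l_row in zip(P, L) for p, l in zip(p_row, l_row))
    return total / n


def best_hypothesis(P, hypotheses):
    """Index of the layout hypothesis with the highest score."""
    return max(range(len(hypotheses)), key=lambda i: layout_score(P, hypotheses[i]))


# Toy probability map and two candidate layout maps.
P = [[0.9, 0.1],
     [0.8, 0.0]]
left_column = [[1, 0], [1, 0]]
right_column = [[0, 1], [0, 1]]
scores = [layout_score(P, left_column), layout_score(P, right_column)]
winner = best_hypothesis(P, [left_column, right_column])
```

Dividing by N keeps hypotheses with longer boundaries from scoring higher simply because they cover more pixels.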


Step 2: Layout Refinement (8)

Scoring Layout Hypotheses

[Figure: layout hypotheses with Score = 0.574, 0.476, 0.326, 0.211]

[Figure: Image, Coarse Layout, and layout hypotheses with Score = 0.209, 0.156, 0.132]




Performance Results: LSUN 2016 Dataset

Method                              Pixel-wise Error   Corner Error
Baseline (Hedau et al. ICCV09)      0.2423             0.1548
UIUC (Mallya et al. ICCV2015)       0.1671             0.1102
DeLay (Dasgupta et al. CVPR2016)    0.1063             0.0820
Ours                                0.0757             0.0523


Performance Results: UIUC Dataset

Method                              Pixel-wise Error
Baseline (Hedau et al. ICCV09)      0.2120
UIUC (Mallya et al. ICCV2015)       0.1283
DeLay (Dasgupta et al. CVPR2016)    0.0973
Ours (ACCV 2016, in submission)     0.0867


Visual Results: Best Cases (1)

[Figure: Image, Coarse Layout, Our Result]


Visual Results: Best Cases (2)

[Figure: Image, Coarse Layout, Our Result]


Visual Results: Worst Cases (1)

[Figure: Image, Coarse Layout, Our Result]


Visual Results: Worst Cases (2)

[Figure: Image, Coarse Layout, Our Result]



Conclusion

• A simple coarse-to-fine indoor layout estimation framework is proposed.

• The effectiveness of a multi-task FCN for coarse layout learning is demonstrated (i.e., jointly learning the coarse layout and semantic surfaces).

• A coarse layout probability based score function is used to score layout hypotheses.

• Further improvement may be achieved by incorporating object information and adding training samples for rare layout types.


Thank You!


References

• V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. ICCV, 2009.

• J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. CVPR, 2015.

• A. Mallya and S. Lazebnik. Learning informative edge maps for indoor scene layout prediction. ICCV, 2015.

• S. Dasgupta, et al. DeLay: Robust spatial layout estimation for cluttered indoor scenes. CVPR, 2016.
