A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method
YUZHUO REN AND C.-C. JAY KUO
Media Communications Lab
University of Southern California
Outline
• Introduction
  • Problem Statement
  • Applications
  • Challenges
• Related Work
• Proposed Method
• Conclusion
15 July 2016 Seminar 2
Problem Statement
Indoor Layout Estimation:
Input Image
Desired Output
Layout: Segmentation Representation
Layout: Corner Representation
Applications
Indoor scene understanding from a single image is a challenging yet important problem in many applications, including:
• Indoor Robotics
• Real Estate
• Virtual Interior Design
Challenges
Indoor scene understanding from a single image is challenging, mainly because of:
• Poor illumination
• Cluttered objects
• Different viewpoints
• Occlusions
Assumption
Indoor scene understanding from a single image is generally based on the so-called “Manhattan World” assumption:
The scene is composed of surfaces aligned with three dominant, mutually orthogonal directions.
Vanishing Point
Receding parallel lines appear to converge in the distance (at eye level for horizontal lines). The point where they meet is called a vanishing point.
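The definition above can be made concrete with homogeneous coordinates: the line through two image points is their cross product, and two lines meet at the cross product of the lines. The following is a minimal sketch, not the detector used in the talk; the function names and the segment coordinates are made up for illustration.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Homog = Tuple[float, float, float]
Segment = Tuple[Point, Point]


def cross(a: Homog, b: Homog) -> Homog:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p: Point, q: Point) -> Homog:
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a: Segment, seg_b: Segment,
                    eps: float = 1e-9) -> Optional[Point]:
    """Intersection of the lines supporting two image segments.

    Returns None when the two lines are parallel in the image,
    i.e. the vanishing point lies at infinity.
    """
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Two floor edges of a receding corridor (hypothetical pixel coordinates).
left_edge = ((0.0, 100.0), (40.0, 60.0))
right_edge = ((200.0, 100.0), (160.0, 60.0))
print(vanishing_point(left_edge, right_edge))
```

In practice many noisy segments vote for each vanishing point, so a real detector would cluster or robustly fit rather than intersect a single pair.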
Dataset
Sample image from a dataset (figure panels: Image, Layout Ground Truth, Object Label)
There are several datasets, including:
Dataset   Published Year   Gray/Color   Image Number   Scene Category   Object Label   Layout Label
UCB       2009             Gray         340            N/A              x              √
UIUC      2009             Color        314            N/A              √              √
3DGP      2013             Color        963            3                √              √
LSUN      2016             Color        5394           8                x              √
UIUC Dataset
• Published in ICCV 2009
• 314 Images
• Color Images
• Layout Ground Truth
• Object Label
Figure panels: Image, Layout Ground Truth, Object Label
LSUN Dataset
• Published in CVPR 2016 workshop
• 5394 Images
• Color Images
• Layout Ground Truth
• 8 Scene Types
Figure panels: Image, Layout Ground Truth
Evaluation Metric
1. Pixel-wise Error:
• Search for the best one-to-one surface mapping
• Compute the percentage of pixels that have the wrong labels
• Penalize unmatched regions
2. Corner Error:
• Search for the best one-to-one corner mapping
• The error is the distance from the ground-truth corner
• The error is normalized by the image resolution
Figure panels: Result, Ground Truth
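Both metrics can be sketched with a brute-force search over one-to-one mappings, which is affordable because a room layout has only a handful of surfaces and corners. This is an illustrative sketch, not the official evaluation code: normalizing by the image diagonal and requiring equal corner counts are assumptions, and the function names are made up.

```python
from itertools import permutations
from math import hypot
from typing import List, Sequence, Tuple

LabelMap = List[List[int]]
Corner = Tuple[float, float]


def pixel_error(pred: LabelMap, gt: LabelMap) -> float:
    """Fraction of wrongly labeled pixels under the best one-to-one
    mapping from predicted surface ids to ground-truth surface ids."""
    pred_ids = sorted({v for row in pred for v in row})
    gt_ids = sorted({v for row in gt for v in row})
    # Pad with None so surplus surfaces stay unmatched and count as errors.
    size = max(len(pred_ids), len(gt_ids))
    pred_slots = pred_ids + [None] * (size - len(pred_ids))
    gt_slots = gt_ids + [None] * (size - len(gt_ids))
    total = sum(len(row) for row in gt)
    best_hits = 0
    for perm in permutations(gt_slots):
        mapping = dict(zip(pred_slots, perm))
        hits = sum(1 for p_row, g_row in zip(pred, gt)
                   for p, g in zip(p_row, g_row) if mapping[p] == g)
        best_hits = max(best_hits, hits)
    return 1.0 - best_hits / total


def corner_error(pred: Sequence[Corner], gt: Sequence[Corner],
                 width: int, height: int) -> float:
    """Mean corner distance under the best one-to-one matching,
    normalized by the image diagonal (assumed normalization)."""
    if len(pred) != len(gt):
        raise ValueError("this sketch assumes equal corner counts")
    diagonal = hypot(width, height)
    best = min(
        sum(hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(perm, gt))
        for perm in permutations(pred)
    )
    return best / (len(gt) * diagonal)


pred_map = [[1, 1, 2], [1, 1, 2]]
gt_map = [[5, 5, 5], [5, 7, 7]]
print(pixel_error(pred_map, gt_map))
print(corner_error([(0.0, 0.0)], [(3.0, 4.0)], width=30, height=40))
```

For larger numbers of surfaces or corners, an assignment solver would replace the permutation search.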
Related Work
• Traditional Methods:
  • Hand-crafted features: vanishing lines, line membership features, geometric context labels, object locations, etc.
  • Structured regressor to rank layouts
• Fully Convolutional Network (FCN) Based Methods:
  • Apply an FCN to learn “Informative Edges” and use edge-based and line membership features in structured regressor learning (Mallya et al., ICCV 2015)
  • Apply an FCN to learn surface segmentation and use the surface belief map to rank layouts (Dasgupta et al., CVPR 2016)
Related Work
• Traditional Methods: Structured Learning
Hedau, Varsha, Derek Hoiem, and David Forsyth. "Recovering the spatial layout of cluttered rooms." ICCV, 2009
X = [input image], Y(i) = [i-th candidate layout]
Score(i) = f(X, Y(i))
Y = the candidate Y(i) with the highest score
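The selection rule on this slide is an argmax over candidate layouts. A minimal sketch follows, assuming a linear scoring function f(X, Y) = w · ψ(X, Y); the weights, the feature vectors, and all names below are made up for illustration and are not taken from the cited paper's code.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def linear_score(weights: Sequence[float], psi: Sequence[float]) -> float:
    """f(X, Y) = w . psi(X, Y) for a joint feature vector psi."""
    return sum(w * v for w, v in zip(weights, psi))


def pick_best_layout(
    x: object,
    candidates: Sequence[Hashable],
    psi_fn: Callable[[object, Hashable], Sequence[float]],
    weights: Sequence[float],
) -> Tuple[Hashable, float]:
    """Score every candidate Y(i) and return the one with the highest score."""
    scored = [(linear_score(weights, psi_fn(x, y)), y) for y in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical joint features, precomputed per candidate for one image.
features: Dict[str, Tuple[float, float]] = {
    "A": (0.2, 0.1),
    "B": (0.6, 0.4),
    "C": (0.3, 0.9),
}
weights = (1.0, 0.5)  # hypothetical learned weights

best, score = pick_best_layout(
    features, ["A", "B", "C"], lambda x, y: x[y], weights
)
print(best, score)
```

In structured learning the weights are trained so that the ground-truth layout outscores the other candidates; that training step is omitted here.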
Related Work
• Traditional Methods: Structured Learning
• Assumption: Manhattan World Assumption
Hedau, Varsha, Derek Hoiem, and David Forsyth. "Recovering the spatial layout of cluttered rooms." ICCV, 2009
Related Work
• Traditional Methods: Structured Learning
1. Line Segment Detection
2. Vanishing Point Estimation
3. Layout Generation
4. Evaluate Box Layout
5. Pick Highest Score Box Layout
Hedau, Varsha, Derek Hoiem, and David Forsyth. "Recovering the spatial layout of cluttered rooms." ICCV, 2009
Related Work
Visual Results: Best Cases and Worst Cases (figures)
Hedau, Varsha, Derek Hoiem, and David Forsyth. "Recovering the spatial layout of cluttered rooms." ICCV, 2009
Related Work
• Improve Features
  • Surface Label (ICCV 2009)
  • Orientation Map (CVPR 2009)
  • Manhattan Junctions (CVPR 2013)
• Improve Layout Proposals
  • Volume Reasoning (NIPS 2010)
  • Generative Model (CVPR 2012)
  • 3D Geometric Phrases (CVPR 2013)
  • Box in the Box (CVPR 2013)
  • Rent 3D (CVPR 2015)
  • Informative Edge (ICCV 2015)
  • Surface Norm (CVPR 2015)
Related Work
Method                            Pixel-wise Error
Surface Label (ICCV 2009)         0.2120
Orientation Map (CVPR 2009)       0.1860
Volume Reasoning (NIPS 2010)      0.1620
Manhattan Junctions (CVPR 2013)   0.1340
3DGP (CVPR 2013)                  0.1740
Box in Box (CVPR 2013)            0.1360
Related Work
• FCN to learn “Informative Edges”
• Use edge-based and line membership features in structured regressor learning
Figure panels: Vanishing Line, Informative Edge Maps, Generate Candidate Layouts
Mallya, Arun, and Svetlana Lazebnik. "Learning Informative Edge Maps for Indoor Scene Layout Prediction." ICCV 2015.
Related Work
Dasgupta, Saumitro, Kuan Fang, Kevin Chen, and Silvio Savarese. "DeLay: Robust spatial layout estimation for cluttered indoor scenes." CVPR 2016.
• Apply FCN (FCN8s) to learn surface segmentation
• Use surface belief map to rank layouts
Overview of Our Method
Input -> Step 1: Coarse Layout Estimation -> Step 2: Layout Refinement -> Result
Step 1: Coarse Layout Estimation (2)
Multi-task Fully Convolutional Networks (FCN)*
• Two tasks: coarse layout and semantic surface
• Architecture: VGG-16 structure, 32-pixel output stride
• Training images: 4000 LSUN 2016 training images resized to 404x404
* Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR 2015.
Step 1: Coarse Layout Estimation (3)
Multi-task Fully Convolutional Networks (FCN)*
• Network initialization: weights from a model trained on the NYUD v2 indoor dataset for the 40-class semantic segmentation task
• Base learning rate: 10e-4
* Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR 2015.
Step 1: Coarse Layout Estimation (4)
Semantic Surface Re-Labeling
• Original Label
  • Not consistent
• New Label
  • Consistent among surfaces
  • 1 -> Frontal wall
  • 2 -> Left wall
  • 3 -> Right wall
  • 4 -> Floor
  • 5 -> Ceiling
Figure rows: Original Label, New Label
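Once the role of each original surface id in an image is known, applying the new labeling scheme is a per-pixel lookup. The sketch below shows only that lookup; how the per-image mapping is obtained is not described on the slide, so the `mapping` argument and the example ids are hypothetical.

```python
from typing import Dict, List

# Consistent label scheme from the slide.
SURFACES: Dict[int, str] = {
    1: "frontal wall",
    2: "left wall",
    3: "right wall",
    4: "floor",
    5: "ceiling",
}


def relabel(label_map: List[List[int]],
            mapping: Dict[int, int]) -> List[List[int]]:
    """Rewrite a surface label map using a per-image id mapping.

    `mapping` sends each original id to one of the ids in SURFACES.
    """
    used = {v for row in label_map for v in row}
    missing = used - set(mapping)
    if missing:
        raise KeyError(f"no new label for original ids: {sorted(missing)}")
    invalid = set(mapping.values()) - set(SURFACES)
    if invalid:
        raise ValueError(f"unknown target labels: {sorted(invalid)}")
    return [[mapping[v] for v in row] for row in label_map]


# Hypothetical image where original id 1 is the left wall,
# id 2 is the frontal wall, and id 3 is the floor.
original = [[1, 1, 2],
            [3, 3, 3]]
print(relabel(original, {1: 2, 2: 1, 3: 4}))
```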
Step 1: Coarse Layout Estimation (5) Visual Results
Figure columns: Image, Informative Edge*, Our Result
* Arun Mallya and Svetlana Lazebnik. “Learning Informative Edge Maps for Indoor Scene Layout Prediction.” ICCV 2015.
Step 1: Coarse Layout Estimation (6) Quantitative Results
Results on the UIUC dataset:
Method             ODS     OIS
FCN (ICCV 2015)    0.255   0.263
MFCN1 (ours)       0.265   0.284
MFCN2 (ours)       0.265   0.291
• FCN: jointly train coarse layout and geometric context label (ICCV 2015)
• MFCN1: jointly train coarse layout and semantic surface, original size
• MFCN2: jointly train coarse layout and semantic surface, resize to 404
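ODS and OIS are the usual edge-detection F-measures: ODS (optimal dataset scale) uses the single threshold that is best for the whole dataset, while OIS (optimal image scale) picks the best threshold per image. The sketch below assumes per-image, per-threshold counts of true positives, false positives, and false negatives are already available; the counts shown are made up, and the matching of predicted to ground-truth edge pixels is omitted.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ods(counts: List[List[Counts]]) -> float:
    """Best F-measure when one threshold is shared by all images.

    counts[i][t] holds the counts for image i at threshold index t.
    """
    n_thresholds = len(counts[0])
    best = 0.0
    for t in range(n_thresholds):
        tp = sum(img[t][0] for img in counts)
        fp = sum(img[t][1] for img in counts)
        fn = sum(img[t][2] for img in counts)
        best = max(best, f_measure(tp, fp, fn))
    return best


def ois(counts: List[List[Counts]]) -> float:
    """F-measure when each image uses its own best threshold."""
    tp = fp = fn = 0
    for img in counts:
        b_tp, b_fp, b_fn = max(img, key=lambda c: f_measure(*c))
        tp, fp, fn = tp + b_tp, fp + b_fp, fn + b_fn
    return f_measure(tp, fp, fn)


# Two images, two thresholds each (hypothetical counts).
counts = [
    [(8, 2, 2), (5, 0, 5)],
    [(4, 6, 0), (4, 1, 0)],
]
print(ods(counts), ois(counts))
```

Because OIS may choose a different threshold for every image, it is never lower than ODS on the same counts, which matches the pattern in the table.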
Step 2: Layout Refinement (2)
Figure labels: Layout Model, Image
Step 2: Layout Refinement (3)
Input -> Critical Line Detection -> Scoring Layout Hypotheses -> Result
Example hypothesis scores: 0.574, 0.476, 0.326, 0.211, …
Step 2: Layout Refinement (4)
Critical Line Detection
• Vanishing line and vanishing point detection
• Binarize the coarse layout (threshold = 0.1) and erode by 3 pixels
• Sample vanishing lines inside the binary map as critical lines
Figure panels: Input, Vanishing Line, Binarize
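The binarize-and-erode step can be sketched in plain Python. The threshold of 0.1 and the 3-pixel erosion come from the slide; the strict `>` comparison, the square structuring element, treating pixels outside the image as background, and the `fraction_inside` helper for testing a candidate line are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def binarize(prob: List[List[float]], threshold: float = 0.1) -> Mask:
    """1 where the coarse layout probability exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in prob]


def erode(mask: Mask, radius: int = 3) -> Mask:
    """Binary erosion with a (2*radius+1) square window.

    A pixel survives only if every pixel in its window is inside the
    image and set; pixels outside the image count as background.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ))
    return out


def fraction_inside(mask: Mask, pixels: Sequence[Tuple[int, int]]) -> float:
    """Share of a candidate line's (x, y) pixels that fall inside the mask."""
    if not pixels:
        return 0.0
    h, w = len(mask), len(mask[0])
    inside = sum(1 for x, y in pixels
                 if 0 <= y < h and 0 <= x < w and mask[y][x])
    return inside / len(pixels)


# Toy 5x5 probability map with a confident 3x3 block in the middle.
prob = [[0.5 if 1 <= y <= 3 and 1 <= x <= 3 else 0.0 for x in range(5)]
        for y in range(5)]
core = erode(binarize(prob), radius=1)
print(core)
print(fraction_inside(core, [(2, 2), (2, 3), (2, 4)]))
```

A vanishing line would then be kept as a critical line when enough of its pixels lie inside the eroded map; the cutoff for "enough" is not given on the slide.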
Step 2: Layout Refinement (5)
Critical Line Detection
• Handling undetected lines: least-squares fitting of the coarse layout
Figure panels: Input Image, Coarse Layout, Vanishing Lines
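When no vanishing line is detected along a boundary, a line can be fitted directly to the coarse layout response. The slide does not say which least-squares variant is used; the sketch below assumes a probability-weighted orthogonal (total) least-squares fit, chosen here because it also handles near-vertical wall boundaries. The threshold and function name are illustrative.

```python
from math import atan2, cos, sin
from typing import List, Tuple


def fit_line(prob: List[List[float]], threshold: float = 0.1
             ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Fit a line to pixels whose probability exceeds the threshold.

    Returns (point_on_line, unit_direction) in (x, y) image coordinates.
    The line passes through the weighted centroid along the principal
    axis of the weighted pixel scatter.
    """
    pts = [(x, y, p) for y, row in enumerate(prob)
           for x, p in enumerate(row) if p > threshold]
    if len(pts) < 2:
        raise ValueError("need at least two pixels above the threshold")
    total = sum(p for _, _, p in pts)
    cx = sum(p * x for x, _, p in pts) / total
    cy = sum(p * y for _, y, p in pts) / total
    sxx = sum(p * (x - cx) ** 2 for x, _, p in pts)
    syy = sum(p * (y - cy) ** 2 for _, y, p in pts)
    sxy = sum(p * (x - cx) * (y - cy) for x, y, p in pts)
    # Orientation of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (cos(theta), sin(theta))


# Toy response: a diagonal ridge in a 5x5 map.
ridge = [[1.0 if x == y else 0.0 for x in range(5)] for y in range(5)]
print(fit_line(ridge))
```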
Step 2: Layout Refinement (6)
Critical Line Detection
• Handling occluded lines
Figure panels: Coarse Layout, Vanishing Lines, Occluded Lines extension and fill in
Step 2: Layout Refinement (7)
Scoring Layout Hypotheses
• P: Coarse layout probability output
• L: Layout binary map, dilated by 3 pixels (1: layout pixel, 0: background pixel)
• N: Number of layout pixels in L
• S: Score function value
Figure labels: P, L
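The formula for S did not survive in this transcript. One reading that is consistent with the definitions of P, L, and N is the mean coarse-layout probability over the layout pixels, S = (1/N) Σ P(i, j) · L(i, j); this form is an assumption, not a quotation from the slide. The sketch below implements that reading, with a square dilation window and function names chosen for illustration.

```python
from typing import List

Mask = List[List[int]]


def dilate(mask: Mask, radius: int = 3) -> Mask:
    """Binary dilation with a (2*radius+1) square window."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ))
    return out


def layout_score(prob: List[List[float]], layout: Mask,
                 radius: int = 3) -> float:
    """Assumed score: average of P over the dilated layout pixels."""
    dilated = dilate(layout, radius)
    n = sum(map(sum, dilated))  # N: number of layout pixels in L
    if n == 0:
        return 0.0
    total = sum(p * l for p_row, l_row in zip(prob, dilated)
                for p, l in zip(p_row, l_row))
    return total / n


# Toy coarse layout: strong response along the middle column.
prob = [[0.1, 0.8, 0.1] for _ in range(3)]
aligned = [[0, 1, 0] for _ in range(3)]      # hypothesis on the ridge
misaligned = [[1, 0, 0] for _ in range(3)]   # hypothesis off the ridge
print(layout_score(prob, aligned, radius=0))
print(layout_score(prob, misaligned, radius=0))
```

Under this reading, a hypothesis whose boundary lines sit on high-probability pixels scores higher, so ranking hypotheses by S favors layouts that agree with the coarse estimate.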
Step 2: Layout Refinement (8) Scoring Layout Hypotheses
Example hypothesis scores: 0.574, 0.476, 0.326, 0.211
Figure rows (columns: Image, Coarse Layout, three scored hypotheses):
Row 1 scores: 0.209, 0.156, 0.132
Row 2 scores: 0.188, 0.168, 0.148
Row 3 scores: 0.259, 0.208, 0.187
Performance Results
LSUN 2016 Dataset
Method                               Pixel-wise Error   Corner Error
Baseline (Hedau et al., ICCV 2009)   0.2423             0.1548
UIUC (Mallya et al., ICCV 2015)      0.1671             0.1102
DeLay (Dasgupta et al., CVPR 2016)   0.1063             0.0820
Ours                                 0.0757             0.0523
Performance Results
UIUC Dataset
Method                               Pixel-wise Error
Baseline (Hedau et al., ICCV 2009)   0.2120
UIUC (Mallya et al., ICCV 2015)      0.1283
DeLay (Dasgupta et al., CVPR 2016)   0.0973
Ours (ACCV 2016, in submission)      0.0867
Visual Results: Best Cases (1), (2) and Worst Cases (1), (2)
Figure columns: Image, Coarse Layout, Our Result
Conclusion
• A simple coarse-to-fine indoor layout estimation framework is proposed.
• The effectiveness of multi-task FCN for coarse layout learning is demonstrated (i.e., jointly learn coarse layout and semantic surface).
• A score function based on the coarse layout probability is used to score layout hypotheses.
• Further improvement may be possible by incorporating object information and increasing the number of training samples for rare layout types.
Thank You!
References
• V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. ICCV, 2009.
• J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. CVPR, 2015.
• A. Mallya and S. Lazebnik. Learning informative edge maps for indoor scene layout prediction. ICCV, 2015.
• S. Dasgupta, K. Fang, K. Chen, and S. Savarese. DeLay: Robust spatial layout estimation for cluttered indoor scenes. CVPR, 2016.