![Page 1: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/1.jpg)
3D Scene Understanding
from RGB-D Images
Thomas Funkhouser
![Page 2: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/2.jpg)
Disclaimer: I am talking about the work of these people …
Shuran Song
Manolis Savva Angel Chang
Yinda Zhang Maciej Halber
Fisher Yu
Andy Zeng Kyle Genova
(Slide labels: Current Ph.D. Students, Recent Ph.D. Student, Current Postdocs)
![Page 3: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/3.jpg)
Motivation
Help devices with RGB-D cameras understand their 3D environments
• Robot manipulation
• Augmented reality
• Virtual reality
• Personal assistance
• Surveillance
• Navigation
• Mapping
• Games
• etc.
![Page 4: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/4.jpg)
Goal
Given an RGB-D image, infer a complete, annotated 3D representation
Input: RGB-D image, Color (RGB) and Depth (D)
Output: complete, annotated 3D representation
Bed
Door
Nightstand Nightstand
Bench
Wall
Wall Picture
Pillow
Free space
![Page 5: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/5.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Side view
Input: RGB-D Image
![Page 6: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/6.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Rotating side view
Input: RGB-D Image
![Page 7: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/7.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Top view
Input: RGB-D Image
![Page 8: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/8.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Top view
Input: RGB-D Image
Annotations: Beyond Field of View
![Page 9: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/9.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Top view
Input: RGB-D Image
Annotations: Beyond Field of View, Occluded Regions
![Page 10: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/10.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Top view
Input: RGB-D Image
Annotations: Missing Depths, Beyond Field of View, Occluded Regions
![Page 11: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/11.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Top view
Input: RGB-D Image
Annotations: Missing Depths, Structure, Free space, Beyond Field of View, Occluded Regions
![Page 12: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/12.jpg)
Problem
Challenge: get only partial observation of scene, must infer the rest
Top view
Input: RGB-D Image
Objects: Bed, Door, Nightstand, Nightstand, Bench, Wall, Wall, Picture, Pillow
Annotations: Missing Depths, Semantics, Structure, Free space, Beyond Field of View, Occluded Regions
![Page 13: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/13.jpg)
Talk Outline
Introduction
Three recent projects
• Deep depth completion [CVPR 2018]
• Semantic scene completion [CVPR 2017]
• Semantic view extrapolation [CVPR 2018]
Common themes
Future work
![Page 14: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/14.jpg)
Talk Outline (Part 1)
Introduction
Three recent projects
• Deep depth completion [CVPR 2018]
• Semantic scene completion [CVPR 2017]
• Semantic view extrapolation [CVPR 2018]
Common themes
Future work
Yinda Zhang and Thomas Funkhouser,
“Deep Depth Completion of a Single RGB-D Image,”
CVPR 2018 (spotlight on Tuesday)
![Page 15: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/15.jpg)
Deep Depth Completion
Goal: estimate depths missing from an RGB-D image
Color (RGB)
Raw Depth (D)
Output Depth (D)
![Page 16: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/16.jpg)
Deep Depth Completion
Goal: estimate depths missing from an RGB-D image
Color (RGB)
Raw Depth (D) from Intel R200 camera
Missing
Depth
Shiny
Surfaces
Bright
illumination
Distant
Surfaces
Thin
Structures
Black
Surfaces
![Page 17: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/17.jpg)
Deep Depth Completion
Motivation: help downstream applications “understand” the 3D environment
Raw Depth Output Depth
RGB-D images shown as colored 3D point clouds
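A minimal sketch of how an RGB-D image can be turned into a colored 3D point cloud, assuming a pinhole camera. The intrinsics `fx`, `fy`, `cx`, `cy` and the convention that a depth of 0 marks a missing measurement are assumptions for illustration; the talk does not state them.

```python
import numpy as np

def depth_to_point_cloud(depth, color, fx, fy, cx, cy):
    """Back-project valid depth pixels to 3D and attach their colors."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column, pixel row
    valid = depth > 0                                # assumed: 0 means missing depth
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)             # (N, 3) camera-space points
    colors = color[valid]                            # (N, 3) matching RGB values
    return points, colors

# Tiny example: a 2x2 depth image with one missing pixel.
depth = np.array([[1.0, 0.0],
                  [2.0, 2.0]])
color = np.zeros((2, 2, 3), dtype=np.uint8)
points, colors = depth_to_point_cloud(depth, color, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Pixels with missing depth simply produce no point, which is why raw sensor depth yields point clouds with holes.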
![Page 18: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/18.jpg)
Deep Depth Completion
Previous work on depth estimation (from RGB):
• Deeper Depth Prediction [Laina, 2016]
• Harmonizing Overcomplete Predictions [Chakrabarti, 2016]
Previous work on depth completion (from RGB-D):
• Joint Bilateral Filter [Silberman, 2012]
• Sparsity Invariant CNNs [Uhrig, 2017]
![Page 19: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/19.jpg)
Deep Depth Completion
Problem: estimating depth from color requires global scene understanding
Input Color → FCN → Output Depth
![Page 20: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/20.jpg)
Deep Depth Completion
Approach: estimate local surface normals from color,
and then solve for depths globally with system of equations
Input Color → FCN → Surface Normals
Surface Normals + Input Depth → System of Equations → Output Depth
![Page 21: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/21.jpg)
Deep Depth Completion
Rationale 1: estimating surface normals is easier than estimating depths
• Constant within planar regions
• Determined by local shading (for diffuse surfaces)
• Often associated with specific textures
Color Estimated Surface Normals
Y. Zhang, S. Song, E. Yumer, M. Savva, J.-Y. Lee, H. Jin, T. Funkhouser, “Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks,” CVPR 2017
![Page 22: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/22.jpg)
Deep Depth Completion
Rationale 2: depths can be estimated robustly from normals
• Solution is unique for each continuously connected component (up to scale)
(Figure: point p with neighbors q and r, and surface normal N(p))
Non-linear system of equations:
N(p) = (v(p,q) × v(p,r)) / ||v(p,q) × v(p,r)||
Linear approximation:
N(p) · v(p,q) = 0
N(p) · v(p,r) = 0
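To see why the approximation is linear in the unknown depths, it may help to write the tangent vectors out. The notation below is an assumption for illustration, not taken from the slides: $D(p)$ is the unknown depth at pixel $p$, $r(p)$ is the known back-projection ray of that pixel, and $P(p)$ is the resulting 3D point.

```latex
\begin{aligned}
P(p) &= D(p)\, r(p), \qquad v(p,q) = P(q) - P(p) \\
N(p) \cdot v(p,q) &= D(q)\,\bigl(N(p) \cdot r(q)\bigr) - D(p)\,\bigl(N(p) \cdot r(p)\bigr) = 0
\end{aligned}
```

With the normals $N(p)$ fixed, each such constraint is a linear equation in $D(p)$ and $D(q)$. Multiplying all depths by a constant leaves the constraints satisfied, which matches the "up to scale" remark above.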
![Page 23: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/23.jpg)
Deep Depth Completion
Rationale 2: depths can be estimated robustly from normals
• Solution is unique for each continuously connected component (up to scale)
(Figure: point p with neighbors q and r, and surface normal N(p))
![Page 24: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/24.jpg)
Deep Depth Completion
Rationale 2: depths can be estimated robustly from normals
• Real-world scenes generally have few (often one) continuously connected components
![Page 25: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/25.jpg)
Deep Depth Completion
Rationale 2: depths can be estimated robustly from normals
• We use observed depths and smoothness constraints to guarantee a solution
(Figure: point p with neighbors q and r, and surface normal N(p))
![Page 26: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/26.jpg)
Deep Depth Completion
Rationale 2: depths can be estimated robustly from normals
• Solving the linearized equations guarantees a globally optimal solution
Input Color → FCN → Surface Normals
Surface Normals + Input Depth → Linear System of Equations → Output Depth
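The linear system can be sketched as an ordinary least-squares problem. This is a minimal illustration, not the authors' implementation: the weights `w_data`, `w_normal`, and `w_smooth`, the dense solver, and the back-projection rays `rays` (so that the 3D point at pixel p is depth times ray) are all assumptions. It combines three kinds of rows: observed depths, the linearized normal constraint N(p) · v(p,q) = 0, and a weak smoothness term.

```python
import numpy as np

def complete_depth(depth_obs, normals, rays, w_data=1.0, w_normal=1.0, w_smooth=1e-3):
    """Solve for per-pixel depth by linear least squares.

    depth_obs: (H, W) observed depth, 0 where missing (assumed convention).
    normals:   (H, W, 3) unit surface normals, e.g. predicted from color.
    rays:      (H, W, 3) back-projection rays, so that P(p) = D(p) * rays[p].
    """
    h, w = depth_obs.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, rhs = [], []

    def add_row(entries, b):
        row = np.zeros(h * w)
        for j, c in entries:
            row[j] += c
        rows.append(row)
        rhs.append(b)

    for y in range(h):
        for x in range(w):
            p = idx[y, x]
            if depth_obs[y, x] > 0:
                # Data term: stay close to the observed depth.
                add_row([(p, w_data)], w_data * depth_obs[y, x])
            for dy, dx in ((0, 1), (1, 0)):          # right and down neighbors
                yy, xx = y + dy, x + dx
                if yy >= h or xx >= w:
                    continue
                q = idx[yy, xx]
                n = normals[y, x]
                # Normal term: N(p) . (D(q) r(q) - D(p) r(p)) = 0
                add_row([(q, w_normal * (n @ rays[yy, xx])),
                         (p, -w_normal * (n @ rays[y, x]))], 0.0)
                # Smoothness term: D(q) - D(p) = 0, weakly weighted.
                add_row([(q, w_smooth), (p, -w_smooth)], 0.0)

    A = np.vstack(rows)
    b = np.array(rhs)
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d.reshape(h, w)

# Synthetic check: a slanted plane seen by a pinhole camera, right part unobserved.
h, w, f, cx, cy = 4, 5, 5.0, 2.0, 1.5
ys, xs = np.mgrid[0:h, 0:w]
rays = np.stack([(xs - cx) / f, (ys - cy) / f, np.ones((h, w))], axis=-1)
n = np.array([0.3, 0.0, -1.0])
n /= np.linalg.norm(n)
d_true = (n @ np.array([0.0, 0.0, 2.0])) / (rays @ n)   # plane through (0, 0, 2)
depth_obs = d_true.copy()
depth_obs[:, 2:] = 0.0                                   # simulate missing depths
normals = np.broadcast_to(n, (h, w, 3))
# With exact normals the smoothness term is not needed, so it is switched off here.
completed = complete_depth(depth_obs, normals, rays, w_smooth=0.0)
```

Because every term is linear in the depths, the least-squares minimum is global, which is the point made on the slide. A practical version would presumably use a sparse solver rather than the dense matrix built here.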
![Page 27: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/27.jpg)
Deep Depth Completion: Data
Where to get real training/test data?
Panels: Color, Raw Depth (with missing depth)
![Page 28: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/28.jpg)
Deep Depth Completion: Data
Where to get real training/test data?
• Complete depths by rendering RGB-D SLAM surface reconstructions (ScanNet, Matterport3D)
Panels: Color, Raw Depth, ScanNet Surface Reconstruction
A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Niessner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017
![Page 29: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/29.jpg)
Deep Depth Completion: Data
Where to get real training/test data?
• Complete depths by rendering RGB-D SLAM surface reconstructions (ScanNet, Matterport3D)
Panels: Color, Raw Depth, ScanNet Surface Reconstruction
A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Niessner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017
![Page 30: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/30.jpg)
Deep Depth Completion: Data
Where to get real training/test data?
• Complete depths by rendering RGB-D SLAM surface reconstructions (ScanNet, Matterport3D)
Panels: Color, Raw Depth, Rendered Depth, ScanNet Surface Reconstruction
A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Niessner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017
![Page 31: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/31.jpg)
Deep Depth Completion: Results
Comparisons to other depth completion methods:
[5] J. T. Barron and B. Poole. The fast bilateral solver. ECCV 2016.
[6] D. Garcia. Robust smoothing of gridded data in one and higher dimensions with missing values. Comp. stat. & data anal., 2010.
[13] Y. Zhang et al. Physically-based rendering for indoor scene understanding using convolutional neural networks. CVPR 2017.
[20] D. Ferstl et al. Image guided depth upsampling using anisotropic total generalized variation. ICCV 2013.
[64] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. ECCV 2012.
![Page 32: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/32.jpg)
Deep Depth Estimation: Results
Comparison to other depth estimation methods:
Methods compared: Laina [37], Chakr. [7]
[7] A. Chakrabarti et al. Depth from a single image by harmonizing overcomplete local network predictions. NIPS 2016.
[37] I. Laina et al. Deeper depth prediction with fully convolutional residual networks. 3DV 2016.
![Page 33: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/33.jpg)
Deep Depth Completion: Results
Intel RealSense R200 examples:
Panels: Color Image, Sensor Depth, Completed Depth, Sensor Point Cloud, Completed Point Cloud
![Page 34: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/34.jpg)
Deep Depth Completion: Results
Intel RealSense R200 examples:
Panels: Color Image, Sensor Depth, Completed Depth, Sensor Point Cloud, Completed Point Cloud
![Page 35: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/35.jpg)
Talk Outline (Part 2)
Introduction
Three recent projects
• Deep depth completion [CVPR 2018]
• Semantic scene completion [CVPR 2017]
• Semantic view extrapolation [CVPR 2018]
Common themes
Future work
Shuran Song, Fisher Yu, Andy Zeng, Angel Chang, Manolis Savva, and Thomas Funkhouser,
“Semantic Scene Completion from a Single Depth Image,”
CVPR 2017 (oral)
![Page 36: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/36.jpg)
Semantic Scene Completion
Goal: estimate the semantics and geometry occluded from a depth camera
Input: single-view depth map (from an RGB-D image)
Output: semantic scene completion
![Page 37: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/37.jpg)
Semantic Scene Completion
Formulation: given a depth image, label all voxels by semantic class
3D scene regions: visible surface, free space, occluded space, outside view, outside room
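The regions named on this slide can be illustrated by testing voxel centers against the depth image. The sketch below is only an illustration of that partition under stated assumptions: a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a hole-free depth image, voxel centers already in camera coordinates, and a surface tolerance `tol`. It does not handle "outside room", which needs knowledge of the room layout.

```python
import numpy as np

FREE, SURFACE, OCCLUDED, OUTSIDE_VIEW = 0, 1, 2, 3

def classify_voxels(centers, depth, fx, fy, cx, cy, tol=0.02):
    """Label voxel centers (N, 3), in camera coordinates, against a depth image."""
    h, w = depth.shape
    labels = np.full(len(centers), OUTSIDE_VIEW)
    x, y, z = centers[:, 0], centers[:, 1], centers[:, 2]
    in_front = z > 0
    u = np.full(len(centers), -1)
    v = np.full(len(centers), -1)
    # Project voxel centers that lie in front of the camera to pixel coordinates.
    u[in_front] = np.round(fx * x[in_front] / z[in_front] + cx).astype(int)
    v[in_front] = np.round(fy * y[in_front] / z[in_front] + cy).astype(int)
    in_view = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = depth[v[in_view], u[in_view]]    # observed depth along each voxel's ray
    zi = z[in_view]
    labels[in_view] = np.where(zi < d - tol, FREE,
                               np.where(zi > d + tol, OCCLUDED, SURFACE))
    return labels

# A flat wall 2 m away, and five probe voxels.
depth = np.full((5, 5), 2.0)
centers = np.array([[0.0, 0.0, 1.0],    # in front of the wall
                    [0.0, 0.0, 2.0],    # on the wall
                    [0.0, 0.0, 3.0],    # behind the wall
                    [10.0, 0.0, 1.0],   # far off to the side
                    [0.0, 0.0, -1.0]])  # behind the camera
labels = classify_voxels(centers, depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Semantic scene completion then has to assign a class to voxels in all of these regions, including the occluded ones the sensor never saw.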
![Page 38: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/38.jpg)
Semantic Scene Completion
Formulation: given a depth image, label all voxels by semantic class
3D scene regions: visible surface, free space, occluded space, outside view, outside room
![Page 39: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/39.jpg)
Semantic Scene Completion
Prior work: segmentation OR completion
• Surface segmentation: Silberman et al.
• Scene completion: Firman et al.
• Semantic scene completion: this paper
The occupancy and the object identity are tightly intertwined!
![Page 40: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/40.jpg)
Semantic Scene Completion
Approach: end-to-end 3D deep network (SSCNet)
Input: single-view depth map
Output: volumetric occupancy + semantics
Prediction: N+1 classes
Simultaneously predict voxel occupancy and semantic classes in a single forward pass.
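One way to read the N+1 formulation: each voxel gets a score for N object classes plus one "empty" class, so a single argmax gives both occupancy and semantics. The sketch below assumes class index 0 is the empty class and a `(N+1, D, H, W)` score layout; both are assumptions made for illustration.

```python
import numpy as np

def decode_prediction(scores):
    """scores: (N+1, D, H, W) per-voxel class scores; class 0 is assumed 'empty'."""
    labels = scores.argmax(axis=0)   # semantic label per voxel, in 0..N
    occupancy = labels != 0          # a voxel is occupied if it is not 'empty'
    return occupancy, labels

# Two voxels, N = 2 object classes plus 'empty'.
scores = np.zeros((3, 1, 1, 2))
scores[0, 0, 0, 0] = 5.0             # first voxel: empty wins
scores[2, 0, 0, 1] = 5.0             # second voxel: class 2 wins
occupancy, labels = decode_prediction(scores)
```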
![Page 41: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/41.jpg)
Semantic Scene Completion: Network Architecture
![Page 42: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/42.jpg)
Semantic Scene Completion: Network Architecture
![Page 43: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/43.jpg)
Semantic Scene Completion: Network Architecture
Voxel size: 0.02 m
![Page 44: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/44.jpg)
Semantic Scene Completion: Network Architecture
Voxel size: 0.02 m
Panels: View, Standard TSDF
![Page 45: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/45.jpg)
Semantic Scene Completion: Network Architecture
Encode 3D space using flipped TSDF
Voxel size: 0.02 m
Panels: View, Standard TSDF, Flipped TSDF
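A small sketch of the difference between the two encodings, assuming the flip takes the form sign(d) · (d_max − |d|) on the truncated distance, normalized here to [-1, 1]. The exact definition and the truncation distance `d_max` are assumptions, not stated on the slide.

```python
import numpy as np

def tsdf(signed_dist, d_max):
    """Standard truncated signed distance, normalized to [-1, 1]."""
    return np.clip(signed_dist / d_max, -1.0, 1.0)

def flipped_tsdf(signed_dist, d_max):
    """Flipped variant: magnitude is largest at the surface, zero at truncation."""
    t = tsdf(signed_dist, d_max)
    sign = np.where(t >= 0, 1.0, -1.0)   # treat exactly-on-surface as positive
    return sign * (1.0 - np.abs(t))

# Signed distances (metres) sampled along one ray, surface at 0.
dist = np.array([-0.5, -0.1, 0.0, 0.1, 0.5])
standard = tsdf(dist, d_max=0.2)
flipped = flipped_tsdf(dist, d_max=0.2)
```

In the standard encoding the values saturate far from the surface; in the flipped one they peak at the surface and fall to zero away from it, so the strongest signal sits where the geometry is.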
![Page 46: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/46.jpg)
Semantic Scene Completion: Network Architecture
Extract features for different physical scales
Voxel size: 0.02 m
Receptive fields: 0.98 m, 1.62 m, 2.26 m
![Page 47: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/47.jpg)
Semantic Scene Completion: Network Architecture
Dilated Convolutions
Larger receptive field with same number of parameters and same output resolution!
Receptive Field = 7x7x7
Parameters = 27
(Figure legend: learnable parameter, receptive field)
F. Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016
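The numbers on this slide follow from the usual receptive-field formula for a single dilated convolution. The sketch assumes a 3x3x3 kernel with dilation 3, which is one combination that reproduces the slide's figures; the slide itself does not state the kernel size or dilation. The parameter count is per input/output channel pair and ignores the bias.

```python
def dilated_receptive_field(kernel_size, dilation):
    """Extent, along one axis, covered by a single dilated convolution."""
    return dilation * (kernel_size - 1) + 1

def conv3d_parameters(kernel_size):
    """Weights of a cubic 3D kernel (one channel pair, no bias)."""
    return kernel_size ** 3

def metres_to_voxels(extent_m, voxel_size_m):
    """Convert a physical receptive field to a voxel count."""
    return round(extent_m / voxel_size_m)

rf = dilated_receptive_field(kernel_size=3, dilation=3)   # 7, i.e. 7x7x7 in 3D
params = conv3d_parameters(kernel_size=3)                 # 27
rf_plain = dilated_receptive_field(kernel_size=3, dilation=1)
voxels = metres_to_voxels(0.98, 0.02)
```

Dilation changes `rf` but not `params`, and with suitable padding it leaves the output resolution unchanged. At the 0.02 m voxel size used in the talk, a 0.98 m receptive field spans 49 voxels.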
![Page 48: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/48.jpg)
Semantic Scene Completion: Data
Where to get training data?
NYUv2: small number of objects labeled with CAD models
(suitable for testing, not training)
N. Silberman, P. Kohli, D. Hoiem, R. Fergus, Indoor Segmentation and Support Inference from RGBD Images, ECCV 2012
R. Guo, C. Zou, D. Hoiem, Predicting Complete 3D models of Indoor Scenes, arXiv 2015
![Page 49: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/49.jpg)
Semantic Scene Completion: Data
SUNCG dataset
• 46K houses
• 50K floors
• 400K rooms
• 5.6M object instances
![Page 50: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/50.jpg)
Semantic Scene Completion: Data
SUNCG dataset
Panels: synthetic camera views, depth, ground truth semantic scene completion
![Page 51: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/51.jpg)
Semantic Scene Completion: Experiments
Pre-train on SUNCG; fine-tune and test on NYUv2
![Page 52: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/52.jpg)
Semantic Scene Completion: Results
Panels: Input Color, Input Depth, Our Result, Ground Truth
![Page 53: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/53.jpg)
Semantic Scene Completion: Results
Panels: Input Color, Input Depth, Our Result, Ground Truth
![Page 54: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/54.jpg)
Semantic Scene Completion: Results
Result 1: better than previous volumetric completion algorithms
Comparison to previous algorithms for volumetric completion
![Page 55: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/55.jpg)
Semantic Scene Completion: Results
Result 2: better than previous semantic labeling algorithms
Comparison to previous algorithms for semantic labeling with 3D model fitting
![Page 56: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/56.jpg)
Talk Outline (Part 3)
Introduction
Three recent projects
• Deep depth completion [CVPR 2018]
• Semantic scene completion [CVPR 2017]
• Semantic view extrapolation [CVPR 2018]
Common themes
Future work
Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, and Thomas Funkhouser,
“Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View,”
CVPR 2018 (oral)
![Page 57: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/57.jpg)
Semantic View Extrapolation
Goal: given an RGB-D image, predict 3D structure and semantics outside view
Input: RGB-D Image
Output 1: 3D structure (360°)
Output 2: semantic segmentation (360°)
Labels: bed, nightstand, door, chair, ceiling, floor
![Page 58: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/58.jpg)
Semantic View Extrapolation
Input:
RGB-D Image
![Page 59: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/59.jpg)
Semantic View Extrapolation
Input: RGB-D Image
Output: 360° panorama with 3D structure & semantics
Labels: Wall, Window, Bed, Nightstand
![Page 60: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/60.jpg)
Semantic View Extrapolation
Prior work: extrapolating appearance (color) outside field of view
Pathak et al. CVPR 2017
![Page 61: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/61.jpg)
Semantic View Extrapolation
Our work: predicting 3D structure and semantics for full 360° panorama
Outputs: 3D structure and semantic segmentation (360°)
Labels: bed, nightstand, door, chair, ceiling, floor
![Page 62: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/62.jpg)
Semantic View Extrapolation
3D structure representation: plane equation per pixel (normal and offset)
Plane equation: ax + by + cz - d = 0
(a, b, c) = normal, d = plane offset from origin
Similar to first project
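A per-pixel plane is enough to recover that pixel's 3D point: intersect the viewing ray with the plane. The sketch below assumes the camera sits at the origin and that each pixel has a known ray direction `ray`; these names are hypothetical and used only for illustration.

```python
import numpy as np

def point_from_plane(normal, offset, ray):
    """Intersect the ray t * ray (from the origin) with the plane normal . P = offset."""
    denom = normal @ ray
    if abs(denom) < 1e-8:
        raise ValueError("ray is parallel to the plane")
    t = offset / denom
    return t * ray

# A plane z = 2 seen along a slightly tilted ray.
normal = np.array([0.0, 0.0, 1.0])   # (a, b, c)
offset = 2.0                          # d
ray = np.array([0.5, 0.0, 1.0])
point = point_from_plane(normal, offset, ray)
residual = normal @ point - offset    # ax + by + cz - d at the recovered point
```

The recovered point satisfies the plane equation by construction, so predicting (a, b, c, d) per pixel implicitly predicts depth as well.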
![Page 63: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/63.jpg)
Semantic View Extrapolation: Network Architecture
Losses:
• Pixel-wise loss
• Adversarial loss
• Scene attribute losses: scene category, object distribution
![Page 64: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/64.jpg)
Semantic View Extrapolation: Training Objectives
![Page 65: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/65.jpg)
Semantic View Extrapolation: Training Objectives
Pixel-wise loss: every pixel is correct (prediction vs. ground truth)
• Loses the ability to generalize.
• Hard even for humans to do.
![Page 66: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/66.jpg)
Semantic View Extrapolation: Training Objectives
Every pixel is correct
Prediction is plausible: adversarial loss (Goodfellow et al. 2014)
G: generator, D: discriminator (real or fake?)
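The "real or fake" objective can be sketched with binary cross-entropy on the discriminator's output. This is a minimal illustration, not the loss used in the talk: it assumes D outputs a probability in (0, 1) and uses the non-saturating generator loss that Goodfellow et al. suggest as an alternative to the minimax form. All function names are hypothetical.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one probability p against a 0/1 target."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """D should output 1 on ground-truth panoramas and 0 on generated ones."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_adv_loss(d_fake):
    """Non-saturating form: G is rewarded when D labels its output as real."""
    return bce(d_fake, 1.0)

# An undecided discriminator (0.5 on both inputs) gives 2 * ln 2.
print(discriminator_loss(0.5, 0.5))
# The generator's loss shrinks as its prediction looks more plausible to D.
print(generator_adv_loss(0.1), generator_adv_loss(0.9))
```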
![Page 67: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/67.jpg)
Semantic View Extrapolation: Training Objectives
Every pixel is correct
Prediction is plausible
Similar scene attributes: scene category, object distribution
[Figure: object distribution histograms over classes (wall, floor, ceiling, …, chair, …) for prediction and ground truth]
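One way to read the object distribution attribute is as a normalized histogram of semantic labels, compared between prediction and ground truth. The sketch below assumes that reading and an L1 distance between histograms; the talk does not specify the exact formulation, and the names `object_distribution` and `distribution_l1` are hypothetical.

```python
from collections import Counter

def object_distribution(label_map, classes):
    """Fraction of pixels assigned to each class in a 2D label map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts[c] for c in classes)
    if total == 0:
        return [0.0] * len(classes)
    return [counts[c] / total for c in classes]

def distribution_l1(pred_dist, gt_dist):
    """L1 distance between two class histograms (assumed comparison)."""
    return sum(abs(p - g) for p, g in zip(pred_dist, gt_dist))

classes = ["wall", "floor", "ceiling", "chair"]
gt = [["wall", "wall"], ["floor", "chair"]]
# Same objects, shifted by one pixel: wrong per pixel, identical as a histogram.
pred = [["wall", "wall"], ["chair", "floor"]]
print(object_distribution(gt, classes))
print(distribution_l1(object_distribution(pred, classes),
                      object_distribution(gt, classes)))
```

A histogram comparison like this tolerates small spatial shifts that a pixel-wise loss would penalize.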
![Page 68: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/68.jpg)
Semantic View Extrapolation: Training Objectives
Every pixel is correct
Prediction is plausible
Similar scene attributes: object distribution, scene category
[Figure: prediction vs. ground truth]
![Page 69: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/69.jpg)
Semantic View Extrapolation: Training Objectives
Every pixel is correct
Similar scene attributes
Prediction is plausible
![Page 71: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/71.jpg)
Semantic View Extrapolation: Data
Where to get training/test data?
[Figure: 3D structure and semantic segmentation; labels: bed, nightstand, door, chair, ceiling, floor]
![Page 72: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/72.jpg)
Semantic View Extrapolation: Data
Matterport3D dataset
Matterport Camera
3D Building Reconstruction
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, Y. Zhang, “Matterport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
![Page 74: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/74.jpg)
Semantic View Extrapolation: Data
Matterport3D dataset
Matterport Camera
RGB-D Panorama
with Semantics
3D Building Reconstruction
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, Y. Zhang, “Matterport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
![Page 75: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/75.jpg)
Semantic View Extrapolation: Experiments
Pre-train on SUNCG
58,866 synthetic panoramas
Fine-tune and test on Matterport3D
5,315 real panoramas
![Page 76: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/76.jpg)
Semantic View Extrapolation: Results
Input Observation
![Page 77: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/77.jpg)
Semantic View Extrapolation: Results
[Figure: prediction; labels: ceiling, bed, wall, floor]
![Page 78: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/78.jpg)
Semantic View Extrapolation: Results
[Figure: prediction vs. ground truth; labels: bed, object, window]
![Page 83: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/83.jpg)
Semantic View Extrapolation: Results
Comparison to alternative completion methods
[Figure: bar charts of semantic accuracy (IoU) and 3D structure error (L2) for Nearest, Two-Step, and Ours]
[Figure: qualitative comparison; panels: input, image inpainting, two-step approach, ours]
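For reference, the semantic accuracy metric named in the chart, intersection over union (IoU), can be computed per class and averaged. The sketch below is a generic illustration on small label maps; the averaging scheme (mean over classes present in either map) is an assumption, and the evaluation protocol behind the reported numbers is not given in the slides.

```python
def class_iou(pred, gt, cls):
    """IoU of one class between two 2D label maps; None if the class is absent."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += int(p == cls and g == cls)
            union += int(p == cls or g == cls)
    return inter / union if union else None

def mean_iou(pred, gt, classes):
    """Average IoU over the classes that appear in either map."""
    scores = [class_iou(pred, gt, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0

pred = [["bed", "bed"], ["wall", "floor"]]
gt = [["bed", "wall"], ["wall", "floor"]]
print(class_iou(pred, gt, "bed"))                    # 1 shared pixel of 2
print(mean_iou(pred, gt, ["bed", "wall", "floor", "window"]))
```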
![Page 84: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/84.jpg)
Summary
Scene understanding from partial observation …
Input: RGB-D image
Output: complete, annotated 3D representation (structure, free space, semantics)
[Figure labels: bed, door, nightstand, bench, wall, picture, pillow]
![Page 85: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/85.jpg)
Talk Outline
Introduction
Three recent projects
• Deep depth completion [CVPR 2018]
• Semantic scene completion [CVPR 2017]
• Semantic view extrapolation [CVPR 2018]
Common themes
Future work
![Page 86: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/86.jpg)
Common Themes
Geometric representation
• Choice of 3D representation is critical
• Choosing the most obvious representation is usually not best
Large-scale context
• Global context is very important … even for simply estimating depth
• Can leverage larger contexts with global minimization, dilated convolutions, etc.
3D Dataset curation
• Synthetic 3D datasets are very useful for training
• Real 3D datasets are important for testing; more are needed
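To make the point about larger contexts concrete, the receptive field of a stack of stride-1 convolutions grows by (kernel - 1) × dilation per layer. The sketch below is a generic calculation, not a description of any network in the talk; the layer configurations are made-up examples.

```python
def receptive_field(layers):
    """Receptive field (in pixels, one axis) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

# Three plain 3x3 convolutions vs. three 3x3 convolutions with dilations 1, 2, 4.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))
print(receptive_field([(3, 1), (3, 2), (3, 4)]))
```

With the same number of weights, the dilated stack sees roughly twice as far in each direction, which is one way to bring in more context without downsampling.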
![Page 88: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/88.jpg)
Common Themes: Geometric representation
Examples: surface normals, plane equations, flipped TSDF
![Page 90: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/90.jpg)
Common Themes: Large-scale context
Examples: dilated convolutions, global solution to a linear system of equations, panoramic representations
![Page 92: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/92.jpg)
Common Themes
Largest 3D datasets available today for indoor environments

| Scale | Synthetic | RGB-D Image | RGB-D Video |
| --- | --- | --- | --- |
| Object | ShapeNet | Intel RealSense | Redwood |
| Room | SUNCG | SUN RGB-D | ScanNet |
| Multiroom | SUNCG | Matterport3D | SUN3D |
![Page 94: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/94.jpg)
Future work
Large-scale scenes
Self-supervision
Active sensing
![Page 95: 3D Scene Understanding from RGB-D Imagesfunk/bridges18.pdfIntel RealSense R200 examples: Talk Outline (Part 2) Introduction Three recent projects •Deep depth completion [CVPR 2018]](https://reader033.vdocuments.mx/reader033/viewer/2022053001/5f055e6a7e708231d4129e66/html5/thumbnails/95.jpg)
Acknowledgments
Princeton students and postdocs:
• Angel X. Chang, Kyle Genova, Maciej Halber, Manolis Savva, Elena Sizikova, Shuran Song, Fisher Yu, Yinda Zhang, Andy Zeng
Google collaborators:
• Martin Bokeloh, Alireza Fathi, Sean Fanello, Aleksey Golovinskiy, Shahram Izadi, Sameh Khamis, Adarsh Kowdle, Johnny Lee, Christoph Rhemann, Jurgen Sturm, Vladimir Tankovich, Julien Valentin, Stefan Welker
Other collaborators:
• Angela Dai, Vladlen Koltun, Matthias Niessner, Alberto Rodriguez, Silvio Savarese, Yifei Shi, Jianxiong Xiao, Kai Xu
Data:
• SUN3D, NYU, Trimble, Planner5D, Matterport
Funding:
• NSF, Google, Intel, Facebook, Amazon, Adobe, Pixar
Thank You!