sun3d: a database of big spaces reconstructed using sfm and...

1
hotel_umd/maryland_hotel3 (JSON XML ) Loaded 0 374 Image Depth Hybrid Clear Delete pillow: 2 pillow: 1 pillow: 3 bed headboard: 1 bed headboard: 2 bed: 1 bed: 2 wall: 1 pillow: 4 lamp: 1 night stand: 1 telephone: 1 sofa chair floor wall: window side wall: pillar 1 wall: pillar 2 curtain: 1 test wall: window 2 wall: window 3 SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels Jianxiong Xiao (Princeton) Andrew Owens (MIT) Antonio Torralba (MIT) Motivation References Generalized Bundle Adjustment Multi-view Object Annotation SUN3D: A Place-centric 3D Database “Data-driven” Brute-force SfM VISI N GROUP http://sun3d.cs.princeton.edu 1 meter UW NYU SUN3D Berkeley 0 10000 20000 0% 5% 10% 15% number of frames (30 fps) 0 100 200 300 400 500 0% 5% 10% 15% 20% area coverage (meter 2 ) −90 −60 −30 0 30 60 90 0% 5% 10% camera tilt angle (degree) −90 −60 −30 0 30 60 90 0% 5% 10% camera roll angle (degree) 0 0.5 1 1.5 2 2.5 0% 5% 10% 15% 20% camera height (meter) 0 1000 2000 3000 4000 5000 0% 5% 10% 15% 20% # frames taken within 1m 0 1 2 3 4 5 6 7 8 9 10 0% 5% 10% 15% 20% 25% distance to surface (meter) 0 1 2 3 4 5 6 7 8 9 10 0% 5% 10% 15% 20% 25% mean distance to surface (meter) 0% 25% 50% 75% 100% 0% 5% 10% 15% 20% area explored by the observer 0 1000 2000 3000 4000 0% 10% 20% 30% 40% # frames that a cell is visible 0 10 20 30 40 50 60 70 0% 5% 10% 15% 20% 25% # viewing directions 0% 25% 50% 75% 100% 0% 10% 20% 30% 40% pixels with valid depth Geometric Statistics others corridor(5%) restroom(5%) classroom(6%) office(6%) dorm(6%) lounge(9%) hotel(9%) lab(9%) apartment(17%) conference room(19%) dorm room(3%) corridor(2%) kitchen(2%) playroom(3%) living room(3%) staircase(4%) bedroom(13%) bathroom(20%) others conference room(8%) restroom(6%) escalator(6%) cubicle office(5%) basement(4%) attic(2%) laundromat(2%) View category distribution. Scene Statistics Raw Depth Improved Depth Image Difference Depth Raw Normals Improved Normals Raw Phong Improved Phong Raw TSDF (Ours) Cross-bilateral Filter apartment conference room conference hall restroom classroom dorm hotel room lab lounge office 57 m 2 47 m 2 130 m 2 23 m 2 109 m 2 33 m 2 41 m 2 55 m 2 124 m 2 49 m 2 R 2 ,t 2 R 1 ,t 1 Frame 1 Frame 2 Frame 3 R 3 ,t 3 X 1 X 2 X 3 Frame 1 Frame 2 Frame 3 R 3 ,t 3 X 1 X 2 X 3 R 2 ,t 2 R 1 ,t 1 Bounding Box Constraints ï ï ï ï 0 ï ï ï ï 0 wall painting suitcase toilet headboard chair pillow bathtub curtain bed door cabinet Data & Source Code Available [1] SUN Database: Large-scale Scene Recognition from Abbey to Zoo. J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. CVPR2010. [2] Indoor Segmentation and Support Inference from RGBD Images. N. Silberman, P. Kohli, D. Hoiem and R. Fergus. ECCV2012. [3] Shape Anchors for Data-driven Multi-view Reconstruction. A. Owens, J. Xiao, A. Torralba and W. T. Freeman. ICCV2013. [4] Recognizing Scene Viewpoint using Panoramic Place Representa- tion. J. Xiao, K. A. Ehinger, A. Oliva and A. Torralba. CVPR2012. Source Code: Automatic RGB-D SfM Generalized Bundle Adjustment TSDF Depth Improvement Online Annotation Tool IO functions to read SUN3D Ψ(X, s)= max 0, X - s 2 , -X - s 2 Frame 1448 Frame 1188 BOW td-idf scores 2D View-based SUN database 3D Place-centric SUN3D database SUN3D Data: RGB-D Video for Big Spaces Camera Poses Refined Depth Maps Object Labels (in progress) Point Cloud Corrected Reconstruction and Object Annotations (Colored Based on Semantic Categories) Visualization of the Constraints With Object-to-Object Correspondences Without Object Constraint Segmented Point Cloud Merged from Different Views Idea: Turn object annotations into bounding box constraints to fix loop closure errors. Constrain the points for each object instance to lie inside a single bounding box. NYU Depth V2 SUN3D raw video yes yes coverage area part of a room whole room or multi-rooms typical length hundreds of frames thousands of frames camera poses no yes object label sparse frames whole video instance label within one frame within the whole video depth improvement cross-bilateral filtering multi-frame TSDF filling Differences between NYU and SUN3D Key Frame Key Frame Propagation Result Frame 76 Frame 121 Frame 96 More Propagation Results Existing scene datasets contain only a limited set of views of a place, and they lack representa- tions of complete 3D spaces. We introduce SUN3D, a large-scale RGB-D video database with camera poses and object labels, capturing the full 3D extent of big spaces. Place category distribution Data Capturing The tasks that go into creating such a dataset are difficult in isolation: Labeling videos is painstaking SfM is unreliable for large spaces. But if we combine them together, we make the dataset construction task easier. We use 3D to make object annota- tion easier, and we use object labels to correct large camera pose errors. min c pV(c) ( ˜ x c p - K [R c |t c ] X p 2 + λ 0 ˜ X c p - [R c |t c ] X p 2 )+ λ 1 o c pL(o,c) Ψ([R o |t o ][R c |t c ] -1 ˜ X c p , s o ) 2 where Semantic Object Labeling Good Reconstruction Semantic Segmentation errors Bad Reconstruction TSDF-based Depth Map Improvement RGB-D bundle adjustment Bounding box costs c p V ( c ) o c p L ( o,c ) Use conservative loop closing (high precision and low recall) Capture video in many passes so that similar views appear frequently, and loop closing succeeds.

Upload: others

Post on 28-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SUN3D: A Database of Big Spaces Reconstructed using SfM and …vision.princeton.edu/projects/2013/SUN3D/poster.pdf · 2013-12-07 · SUN3D: A Database of Big Spaces Reconstructed

hotel_umd/maryland_hotel3 (JSON XML) Loaded

0374

Image Depth Hybrid Clear Delete pillow: 2pillow: 1pillow: 3bed headboard: 1bed headboard: 2bed: 1bed: 2wall: 1pillow: 4lamp: 1night stand: 1telephone: 1sofa chairfloorwall: window sidewall: pillar 1wall: pillar 2curtain: 1testwall: window 2wall: window 3

SUN3D: A Database of Big Spaces Reconstructed using SfM and Object LabelsJianxiong Xiao (Princeton) Andrew Owens (MIT) Antonio Torralba (MIT)

Motivation

References

Generalized Bundle Adjustment

Multi-view Object Annotation

SUN3D: A Place-centric 3D Database

“Data-driven” Brute-force SfM

VISI N GROUP

http://sun3d.cs.princeton.edu

1 meter

UW NYU SUN3DBerkeley

0 10000 200000%

5%

10%

15%

number of frames (30 fps)0 100 200 300 400 500

0%

5%

10%

15%

20%

area coverage (meter2)−90 −60 −30 0 30 60 90

0%

5%

10%

camera tilt angle (degree)−90 −60 −30 0 30 60 90

0%

5%

10%

camera roll angle (degree)

0 0.5 1 1.5 2 2.50%

5%

10%

15%

20%

camera height (meter)0 1000 2000 3000 4000 5000

0%

5%

10%

15%

20%

# frames taken within 1m0 1 2 3 4 5 6 7 8 9 10

0%

5%

10%

15%

20%

25%

distance to surface (meter)0 1 2 3 4 5 6 7 8 9 10

0%

5%

10%

15%

20%

25%

mean distance to surface (meter)

0% 25% 50% 75% 100%0%

5%

10%

15%

20%

area explored by the observer0 1000 2000 3000 4000

0%

10%

20%

30%

40%

# frames that a cell is visible0 10 20 30 40 50 60 70

0%

5%

10%

15%

20%

25%

# viewing directions0% 25% 50% 75% 100%

0%

10%

20%

30%

40%

pixels with valid depth

Geometric Statistics

otherscorridor(5%)

restroom(5%)classroom(6%)

office(6%)dorm(6%)lounge(9%)

hotel(9%)lab(9%)

apartment(17%)

conference room(19%)

dorm room(3%)corridor(2%)kitchen(2%)

playroom(3%)living room(3%)

staircase(4%)

bedroom(13%)

bathroom(20%)others

conference room(8%)restroom(6%)

escalator(6%)cubicle office(5%)basement(4%)

attic(2%)laundromat(2%)

View category distribution.

Scene Statistics

Raw Depth

Improved Depth

Image

Difference Depth

Raw Normals

Improved Normals

Raw Phong

Improved Phong

Raw TSDF (Ours) Cross-bilateral Filter

apartment conference room conference hall restroom classroom dorm hotel room lab lounge office57 m2 47 m2 130 m2 23 m2 109 m2 33 m2 41 m2 55 m2 124 m2 49 m2

R2,t2

R1,t1

Frame 1

Frame 2

Frame 3R3,t3

X1

X2

X3

Frame 1 Frame 2 Frame 3R3,t3

X1

X2 X3

R2,t2 R1,t1

Bounding Box Constraints

0

0wall painting suitcase toilet

headboard chair pillow bathtub

curtain bed door cabinet

Data & Source Code Available

[1] SUN Database: Large-scale Scene Recognition from Abbey to Zoo. J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. CVPR2010.[2] Indoor Segmentation and Support Inference from RGBD Images. N. Silberman, P. Kohli, D. Hoiem and R. Fergus. ECCV2012.[3] Shape Anchors for Data-driven Multi-view Reconstruction. A. Owens, J. Xiao, A. Torralba and W. T. Freeman. ICCV2013.[4] Recognizing Scene Viewpoint using Panoramic Place Representa-tion. J. Xiao, K. A. Ehinger, A. Oliva and A. Torralba. CVPR2012.

Source Code:• Automatic RGB-D SfM• Generalized Bundle Adjustment• TSDF Depth Improvement• Online Annotation Tool• IO functions to read SUN3D

Ψ(X, s) =∥∥∥max

(0,X− s

2,−X− s

2

)∥∥∥

Frame 1448 Frame 1188 BOW td-idf scores

2D View-based SUN database 3D Place-centric SUN3D database

SUN3D Data:• RGB-D Video for Big Spaces• Camera Poses• Refined Depth Maps• Object Labels (in progress)• Point Cloud

Corrected Reconstruction and Object Annotations (Colored Based on Semantic Categories)

Visualization of the Constraints

With Object-to-Object CorrespondencesWithout Object Constraint

Segmented Point Cloud Merged from Different Views

Idea: Turn object annotations into bounding box constraints to fix loop closure errors. Constrain the points for each object instance to lie inside a single bounding box.

NYU Depth V2 SUN3Draw video yes yes

coverage area part of a room whole room or multi-roomstypical length hundreds of frames thousands of framescamera poses no yesobject label sparse frames whole video

instance label within one frame within the whole videodepth improvement cross-bilateral filtering multi-frame TSDF filling

Differences between NYU and SUN3D

Key Frame Key FramePropagation Result

Frame 76 Frame 121Frame 96

Propagation ResultPropagation Result

More Propagation Results

Existing scene datasets contain only a limited set of views of a place, and they lack representa-tions of complete 3D spaces. We introduce SUN3D, a large-scale RGB-D video database with camera poses and object labels, capturing the full 3D extent of big spaces.

Place category distribution

Data Capturing

The tasks that go into creating such a dataset are di�cult in isolation:• Labeling videos is painstaking• SfM is unreliable for large spaces. But if we combine them together, we make the dataset construction task easier. We use 3D to make object annota-tion easier, and we use object labels to correct large camera pose errors.

min∑c

∑p∈V(c)

(∥∥x̃c

p−K [Rc|tc]Xp

∥∥2+λ0

∥∥∥X̃cp−[Rc|tc]Xp

∥∥∥2

)+λ1

∑o

∑c

∑p∈L(o,c)

Ψ([Ro|to] [Rc|tc]−1X̃c

p, so)2

where

SemanticObject

Labeling

Good Reconstruction Semantic Segmentation

errors

Bad Reconstruction

err

TSDF-based Depth Map Improvement

RGB-D bundle adjustment Bounding box costs

c p∈V(c) o c p∈L(o,c)

• Use conservative loop closing (high precision and low recall)• Capture video in many passes so that similar views appear frequently, and loop closing succeeds.