PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding
Yinda Zhang1 Shuran Song1 Ping Tan2 Jianxiong Xiao1
1Princeton University 2National University of Singapore
Abstract
We observe that the small field-of-view in standard cameras is one of the main reasons that contextual information is not as useful as it should be for object detection. To overcome this limitation, we propose a whole-room context model in 3D for a 360° full-view panorama. From an input panorama, our method outputs a 3D bounding box of the room and all major objects inside, together with their semantic categories (Fig. 1). To train our model, we construct an annotated panorama dataset and reconstruct the 3D model from single-view using manual annotation. Experiments show that our model can recognize objects using only 3D contextual information without any image feature for categorization, and still achieve a comparable performance with the state-of-the-art object detector that only uses image features.
1. Introduction
While the past decade has witnessed rapid progress on bottom-up object detection methods, the improvement brought by top-down context cues has been rather limited. In contrast, there is strong psychophysical evidence that context plays a crucial role in scene understanding for humans. We believe that one of the main reasons for this gap is that the field of view (FOV) of a typical camera is only about 15% of that of the human vision system. Therefore, we advocate the use of panoramic images in scene understanding, which nowadays can be easily obtained with camera arrays, special lenses, and automatic image stitching.
2. Method and Results
Our method first generates scene hypotheses (room layout and objects) in a bottom-up fashion from image evidence, and then evaluates them holistically using top-down information learned from our dataset. In a panorama, we can see the whole scene, and characteristic scene objects such as beds and sofas are usually visible despite occlusion, so that we can jointly optimize the room layout and object detection to exploit the contextual information in a variety of ways with its full strength.
[Figure 1 panels: "Input: a single-view panorama", "Output: object detection", "Output: 3D reconstruction". Object labels shown: bed, sofa, nightstand, painting, tv, mirror, door, window, desk, chair.]
Figure 1. Input and output. Taking a full-view panorama as input, our algorithm can detect all the objects inside the panorama and represent each of them as a bounding box in 3D, which also enables 3D reconstruction from a single view.
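The generate-then-evaluate idea described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cuboid` and `SceneHypothesis` types, the axis-aligned box simplification, and the `toy_context_score` function are all hypothetical stand-ins. In particular, the toy score only counts objects that lie inside the room box, whereas the paper evaluates hypotheses with top-down information learned from an annotated dataset.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Cuboid:
    """A labeled 3D box. Axis-aligned here, which is an assumption of this sketch."""
    label: str
    min_corner: Point
    max_corner: Point


@dataclass(frozen=True)
class SceneHypothesis:
    """One whole-room hypothesis: a room box plus candidate object boxes."""
    room: Cuboid
    objects: Tuple[Cuboid, ...]


def is_inside(room: Cuboid, obj: Cuboid) -> bool:
    """True if obj lies entirely within room along all three axes."""
    return all(
        r_lo <= o_lo and o_hi <= r_hi
        for r_lo, r_hi, o_lo, o_hi in zip(
            room.min_corner, room.max_corner, obj.min_corner, obj.max_corner
        )
    )


def toy_context_score(hyp: SceneHypothesis) -> float:
    """Hypothetical placeholder for a learned scorer: counts objects inside the room."""
    return float(sum(is_inside(hyp.room, o) for o in hyp.objects))


def pick_best(
    hypotheses: Sequence[SceneHypothesis],
    score: Callable[[SceneHypothesis], float],
) -> SceneHypothesis:
    """Evaluate every whole-room hypothesis and return the highest-scoring one."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    return max(hypotheses, key=score)
```

Passing the scorer as an argument keeps the selection step independent of how hypotheses are evaluated, so a learned context model could replace the toy score without changing `pick_best`.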
Some results for both bedrooms and living rooms are shown in Fig. 2, where we can see that the algorithm performs reasonably well. Using only 3D contextual information without any image feature for categorization, we can still achieve performance comparable to state-of-the-art object detectors (DPM) that use image features.
[Figure 2 legend categories: room, painting, nightstand, door, mirror, cabinet, tv, desk, window, bed, chair, wardrobe, sofa, tv stand, coffee table, dining table, end table. The 3D plots and their axis ticks are omitted.]
Figure 2. Example results. The first column shows the input panorama and the output object detection results. The second column shows the generated cuboid hypotheses. The third column shows the results visualized in 3D.