panocontext: a whole-room 3d context model for panoramic...

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding

Yinda Zhang1 Shuran Song1 Ping Tan2 Jianxiong Xiao1

1Princeton University 2National University of Singapore

Abstract

We observe that the small field-of-view in standard cameras is one of the main reasons that contextual information is not as useful as it should be for object detection. To overcome this limitation, we propose a whole-room context model in 3D for a 360° full-view panorama. From an input panorama, our method outputs a 3D bounding box of the room and all major objects inside, together with their semantic categories (Fig. 1). To train our model, we construct an annotated panorama dataset and reconstruct the 3D model from a single view using manual annotation. Experiments show that our model can recognize objects using only 3D contextual information, without any image feature for categorization, and still achieve performance comparable to the state-of-the-art object detector that uses only image features.

1. Introduction

While the past decade has witnessed rapid progress on bottom-up object detection methods, the improvement brought by top-down context cues has been rather limited. In contrast, there is strong psychophysical evidence that context plays a crucial role in scene understanding for humans. We believe that one of the main reasons for this gap is that the field of view (FOV) of a typical camera is only about 15% of that of the human vision system. Therefore, we advocate the use of panoramic images in scene understanding, which nowadays can be easily obtained by camera arrays, special lenses, and automatic image stitching.

2. Method and Results

Our method first generates scene hypotheses (room layout and objects) in a bottom-up fashion from image evidence, and then evaluates them holistically using top-down information learned from our dataset. In a panorama, we can see the whole scene, and characteristic scene objects such as beds and sofas are usually visible despite occlusion, so that we can jointly optimize the room layout and object detection to exploit the contextual information in a variety of ways with its full strength.

[Figure 1 panels: Input: a single-view panorama; Output: 3D reconstruction; Output: object detection. Object labels shown: bed, sofa, nightstand, painting, tv, mirror, door, window, desk, chair.]

Figure 1. Input and output. Taking a full-view panorama as input, our algorithm can detect all the objects inside the panorama and represent each of them as a bounding box in 3D, which also enables 3D reconstruction from a single view.
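The generate-then-evaluate idea described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the `Cuboid` and `SceneHypothesis` types, the `evidence` field, and the containment-based `context_score` are all hypothetical stand-ins, chosen only to show how a bottom-up score and a top-down context term might be combined to rank whole-room hypotheses.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Cuboid:
    """Axis-aligned 3D box with a semantic label (hypothetical representation)."""
    label: str
    center: Vec3
    size: Vec3
    evidence: float  # bottom-up image evidence score, assumed to be given


@dataclass(frozen=True)
class SceneHypothesis:
    """One candidate explanation of the panorama: a room box plus object boxes."""
    room: Cuboid
    objects: Tuple[Cuboid, ...]


def inside_room(room: Cuboid, obj: Cuboid) -> bool:
    """Return True if obj lies fully inside room along every axis."""
    return all(
        abs(o_c - r_c) + o_s / 2 <= r_s / 2 + 1e-9
        for o_c, o_s, r_c, r_s in zip(obj.center, obj.size, room.center, room.size)
    )


def context_score(h: SceneHypothesis) -> float:
    """Toy top-down term (an assumption): +1 per object inside the room, -1 otherwise."""
    return sum(1.0 if inside_room(h.room, o) else -1.0 for o in h.objects)


def holistic_score(h: SceneHypothesis, weight: float = 1.0) -> float:
    """Combine summed bottom-up evidence with the weighted top-down context term."""
    bottom_up = h.room.evidence + sum(o.evidence for o in h.objects)
    return bottom_up + weight * context_score(h)


def best_hypothesis(hyps: Iterable[SceneHypothesis], weight: float = 1.0) -> SceneHypothesis:
    """Pick the whole-room hypothesis with the highest holistic score."""
    return max(hyps, key=lambda h: holistic_score(h, weight))
```

In this sketch a hypothesis containing an object with strong image evidence can still lose to a simpler one if that object violates the room geometry, which is the kind of interaction a whole-room context model is meant to capture. A learned scoring function would replace the hand-written `context_score`.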

Some results for both bedrooms and living rooms are shown in Fig. 2, where we can see that the algorithm performs reasonably well. Using only 3D contextual information, without any image feature for categorization, we can still achieve performance comparable to state-of-the-art object detectors (DPM) that use image features.
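To make concrete what categorizing an object without image features could look like, the following toy sketch labels a cuboid from geometry alone with a nearest-prototype rule. This is an illustrative assumption, not the model used in the paper: the feature layout, the `PROTOTYPES` table, and its numbers are made-up placeholders rather than values from the dataset.

```python
import math
from typing import Dict, Tuple

# Hypothetical geometric features of a cuboid, in metres:
# (width, depth, height, height of the box centre above the floor)
Features = Tuple[float, float, float, float]

# Placeholder prototypes for illustration only; not measured from any dataset.
PROTOTYPES: Dict[str, Features] = {
    "bed": (2.0, 1.6, 0.6, 0.3),
    "nightstand": (0.5, 0.4, 0.5, 0.25),
    "painting": (0.8, 0.05, 0.6, 1.6),
}


def categorize(features: Features, prototypes: Dict[str, Features] = PROTOTYPES) -> str:
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))
```

A thin box mounted high above the floor lands nearest the `painting` prototype, while a large low box lands nearest `bed`; no pixel values are consulted at any point.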

[Figure 2 legend categories: room, painting, nightstand, door, mirror, cabinet, tv, desk, window, bed, chair, wardrobe, sofa, tv stand, coffee table, dining table, end table.]

Figure 2. Example results. The first column is the input panorama and the output object detection results. The second column shows generated cuboid hypotheses. The third column is the results visualized in 3D.
