render for cnn - home | computer science and...

51
Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views Hao Su * Charles R. Qi * Yangyan Li Leonidas J. Guibas

Upload: hoanghanh

Post on 25-May-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Render for CNN: Viewpoint Estimation in Images Using CNNs Trained

with Rendered 3D Model Views

Hao Su*

Charles R. Qi*

Yangyan LiLeonidas J. Guibas

Page 2: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

ILSVRC Image Classification Top-5 Error (%)

0

5

10

15

20

25

30

2010 2011 2012 2013 2014 2015

28.225.8

16.4

11.7

6.73.6

Top

-5 E

rro

r (%

)

Page 3: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Go beyond 2D Image Classification

car• 3D bounding box

• 3D alignment

• 3D model retrieval

Page 4: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Go beyond 2D Image Classification

car

3D Viewpoint Estimation

azimuth

elevation

in-plane rotation

Page 5: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

3D Viewpoint Estimation in the Wild

Images inthe WildModels

unknown

Page 6: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

3D Perception in the Wild

Images in theWild

Modelsunknown

AlexNet [Krizhevsky et al.]

Learn from Data

AlexNet [Krizhevsky et al.]

Page 7: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

However.. Accurate Label Acquisition is Expensive

What’s the camera viewpoint angles tothe SUV in the image?

Page 8: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

PASCAL3D+ dataset [Xiang et al.]

However.. Accurate Label Acquisition is Expensive

Page 9: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

PASCAL3D+ dataset [Xiang et al.]

However.. Accurate Label Acquisition is Expensive

Step1:Choose similar model

Page 10: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

PASCAL3D+ dataset [Xiang et al.]

However.. Accurate Label Acquisition is Expensive

Step2:Coarse ViewpointLabeling

Step1:Choose similar model

Page 11: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

PASCAL3D+ dataset [Xiang et al.]

However.. Accurate Label Acquisition is Expensive

Step2:Coarse ViewpointLabeling

Step3:Label keypointsFor alignment

Step1:Choose similar model

Annotation takes~1 min per object

Page 12: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

30K images with viewpoint labels in PASCAL3D+dataset [Xiang et al.]

High-cost Label Acquisition High-capacity Model

60M parameters. AlexNet [Krizhevsky et al.]

How to get MORE images with ACCURATE viewpoint labels?

Page 13: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Manual alignmentby annotators

Auto alignment through rendering

Page 14: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

http://shapenet.cs.stanford.edu

#total models

#m

od

els

per

clas

s

1,000 1,000,000

PSB 05’

SHREC 14’

10

100

1,000

ModelNet 15’

Good News: ShapeNet

3M models in total330K from 4K categories annotated

ShapeNet(going on)

Page 15: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Key Idea: Render for CNN

Rendering

Viewpoint

Synthetic Images

Training

ShapeNet

Page 16: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Key Idea: Render for CNN

Viewpoint

Real Images

Testing

Page 17: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Rendering

I want data!

How to render data with both quantity and

quality?

Page 18: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Synthesize: Scalability vs Quality

Scalability

Qu

ali

ty

Low High

High

Low

Ideal

Page 19: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Synthesize: Scalability vs Quality

Scalability

Qu

ali

ty

Low High

High

Low

Ideal

Page 20: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Synthesize: Scalability vs Quality

Scalability

Qu

ali

ty

Low High

High

Low

Ideal

Previous works

Page 21: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Synthesize: Scalability vs Quality

Scalability

Qu

ali

ty

Low High

High

Low

Ideal

Sweet spot

Previous works

Page 22: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Synthesize: Scalability vs Quality

Scalability

Qu

ali

ty

Low High

High

Low

Ideal

Sweet spot

Previous works

Story Time!

Page 23: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

A “Data Engineering” Journey

• 80K rendered chair images

• Metric: 16-view classification accuracy tested on real images

At beginning..

• Lighting: 4 fixed point light sources on the sphere

• Background: clean

Page 24: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

A “Data Engineering” Journey

ConvNet:Ah ha, I know!Viewpoint is justthe brightnesspattern!

47% on real test set

95% on synthetic val set

Page 25: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

A “Data Engineering” Journey

ConvNet:Ah ha, I know!Viewpoint is justthe brightnesspattern!

47% on real test set

95% on synthetic val set

Page 26: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

47% -> 74%

A “Data Engineering” Journey

Randomizelighting

ConvNet: hmm.. viewpoint is not the brightnesspattern. Maybe it’s the contour?

Page 27: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

47% -> 74%

A “Data Engineering” Journey

Randomizelighting

ConvNet: hmm.. viewpoint is not the brightnesspattern. Maybe it’s the contour?

Page 28: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

A “Data Engineering” Journey

74% -> 86%

Add backgrounds

ConvNet: It becomes really hard! Let me lookmore into the picture.

Page 29: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

A “Data Engineering” Journey

bbox croptexture

86% -> 93%

Page 30: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

A “Data Engineering” Journey

bbox croptexture

ConvNet: the mapping becomes hard. I have tolearn harder to get it right!

86% -> 93%

Key Lesson: Don’t give CNN a chance to “cheat” - it’s very goodat it. When there is no way to cheat, true learning starts.

Page 31: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Render for CNN Image Synthesis Pipeline

3D model

Rendering Add bkg Crop

Sample lighting andcamera params

Sample bkg. ImageAlpha-blending

Sample croppingparams

Hyper-parameters estimation from real images

Page 32: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Render for CNN Image Synthesis Pipeline

3D model

Rendering

Sample lighting andcamera params

Page 33: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Rendering

Camera params KDE from PASCAL3D+ train set

Lighting paramsRandomly sampled

• Number of light sources

• Light distances

• Light energies

• Light positions

• Light types

Page 34: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Render for CNN Image Synthesis Pipeline

3D model

Rendering Add bkg

Sample lighting andcamera params

Sample bkg. ImageAlpha-blending

Page 35: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Background Composition

• Simple but effective!

• Backgrounds randomly sampled from SUN397 dataset [Xiao et al.]

• Alpha blending composition for natural boundaries

Page 36: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Render for CNN Image Synthesis Pipeline

3D model

Rendering Add bkg Crop

Sample lighting andcamera params

Sample bkg. ImageAlpha-blending

Sample croppingparams

Page 37: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Image Cropping

Cropping patterns KDE from PASCAL3D+ train set

Page 38: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Image Cropping

Cropping patterns KDE from PASCAL3D+ train set

Page 39: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

2.4M Synthesized Images for 12 Categories

• High scalability

• High quality• Overfit-resistant• Accurate labels

Page 40: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Results

Page 41: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

3D Viewpoint Estimation Evaluation

Metric: median angle error (lower the better)

Real test images from PASCAL3D+ dataset

Page 42: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

3D Viewpoint Estimation Evaluation

Metric: viewpoint accuracy and median angle error (lower the better)

Real test images from PASCAL3D+ dataset

Our model trained on rendered images outperforms state-of-the-art model trained on real images in PASCAL3D+.

8

9

10

11

12

13

14

15

16

Vps&Kps(CVPR15)

RenderForCNN(Ours)

Vie

wp

oin

t M

edia

n E

rro

r

Page 43: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

How many 3D models are necessary?

55

60

65

70

75

80

85

90

10 91 1000 6928

#models (for one category)

Acc

ura

cy

10 vs 1000 models20% + difference

Page 44: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

3D Viewpoint Estimation

Page 45: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

0 90 180 270 360 0 90 180 270 360 0 90 180 270 360

0 90 180 270 360

0

90

180

270

Ground truth view

Estimated viewconfidence

airplane bicyclebicycle

boat motorbikecar

Azimuth Viewpoint Estimation

Page 46: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

table chair

monitor

0

90

180

270

Ground truth view

Estimated viewconfidence

chair

sofa sofa

Azimuth Viewpoint Estimation

Page 47: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Failure Cases

sofa occluded by people

car occluded by motorbike

ambiguous car viewpoint

ambiguous chair viewpoint

multiple cars

multiple chairs

Page 48: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Limitations of Current Synthesis Pipeline

• Modeling Occlusions?

• Modeling Background Context?

• Shape database augmentation by interpolation?

Page 49: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Render for CNN – Beyond Viewpoint

• 3D model retrieval• Joint Embedding [Li et al sigasia15]

• Object detection• Segmentation• Intrinsic image decomposition

• Controlled experiments for DL• Vision algorithm verification

Page 50: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

Conclusion

Images rendered from 3D models can be effectively used to train CNNs, especially for 3D tasks. State-of-the-art result has been achieved.

Keys to success

• Quantity: Large scale 3D model collection (ShapeNet)

• Quality: Overfit-resistant, scalable image synthesis pipeline

http://shapenet.cs.stanford.edu

Page 51: Render for CNN - Home | Computer Science and …cseweb.ucsd.edu/~haosu/slides/iccv_renderforcnn_website.pdfA “Data Engineering” Journey •80K rendered chair images •Metric:

THE END

THANK YOU!