![Page 1: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/1.jpg)
Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning
Allan Zelener
Dissertation Proposal
December 12th 2016
![Page 2: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/2.jpg)
Overview
1. Introduction to 3D Object Identification
2. Completed Work
• Part-based Object Classification of Vehicle Point Clouds.
• CNN-based Object Segmentation in LIDAR with Missing Points.
3. Proposed Work
• Joint localization, segmentation, classification, and 3D pose estimation.
• Depth-sensitive localization.
• Depth-sensitive subpixel methods for segmentation.
• Spatial transformers for pose estimation.
• Domain adaptation and shape completion from synthetic data.
• Timeline for completion.
![Page 3: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/3.jpg)
Identifying 3D Objects
• Real world objects have a 3D shape and a position in a 3D scene.
• Objects may be oriented with respect to some reference pose.
• These object properties are associated with their semantic class.
![Page 4: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/4.jpg)
Identifying 3D Objects
![Page 5: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/5.jpg)
Identifying Objects in 2D Images
Fei-Fei, Karpathy, Johnson (http://cs231n.stanford.edu/slides/winter1516_lecture13.pdf)
![Page 6: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/6.jpg)
Identifying 3D Objects in 2D Images
• 3D oriented CAD models mapped to 2D image regions.
• Approximate 3D shape based on selected models.
• Relative 3D position and scale may still be ambiguous.
• Visual perspective cues required to estimate object properties.
Yu et al., ObjectNet3D: A Large Scale Database for 3D Object Recognition
![Page 7: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/7.jpg)
Identifying 3D Objects in 3D Images
Song et al., SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
• 3D sensors provide accurate pointwise depth measurements.
• Object position and scale can be determined from a single 3D image.
![Page 8: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/8.jpg)
Challenges in 3D Images
• Missing measurements due to sensor properties.
• Partial 3D data based on limited viewpoints.
• Difficult large-scale annotation compared to 2D images.
• Feature representations for 3D properties.
Manual Labeling of 3D Point Cloud
![Page 9: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/9.jpg)
Completed Work
• Classification of Vehicle Parts in Unstructured 3D Point Clouds
• RANSAC point clustering for planar parts.
• Part-based structured model for classifying parts and overall object class.
Classification of Vehicle Parts in Unstructured 3D Point Clouds, Zelener, Mordohai, and Stamos, 3DV, 2014.
![Page 10: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/10.jpg)
Local Feature Extraction
• Density weighted spin images.
• Dense sampling of keypoints on a uniformly spaced voxel grid.
• Normals oriented away from the object centroid.
• K-means clustering to generate a bag-of-words codebook.
• Baseline object descriptor is normalized count vector of codebook features.
K-Means Spin Image Codebook ($k = 50$)
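The baseline descriptor above can be sketched as follows. This is a minimal illustration, assuming the codebook has already been learned by k-means and that "normalized" means dividing the counts by their sum; the function name `bow_descriptor` and the toy 2D features are hypothetical stand-ins for real spin images.

```python
import numpy as np

def bow_descriptor(features, codebook):
    """Normalized count vector of nearest-codeword assignments.

    features: (N, D) local descriptors (stand-ins for spin images).
    codebook: (k, D) cluster centers, assumed to come from k-means.
    """
    # Squared Euclidean distance from every feature to every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    # L1 normalization is an assumption; the slides only say "normalized".
    return counts / counts.sum()

# Toy example with k = 2 codewords (the slides use k = 50).
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, 0.1]])
descriptor = bow_descriptor(features, codebook)
```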
![Page 11: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/11.jpg)
Automatic Part Segmentation
• Iterative RANSAC plane fitting.
• Candidate planes from faces of convex hull.
• Robust re-estimation of planes using PCA.
• For vehicles, five planar parts cover most of the surface.
Colored by Segmentation Order
Convex Hull Examples
![Page 12: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/12.jpg)
Part-Level Features
• Spin image bag-of-words.
• Average height $\bar{h}$.
• Horizontal/vertical indicator $I(\mathbf{n}) = \begin{cases} 0, & \text{if } \mathbf{n}^T \mathbf{z} > \cos\frac{\pi}{4} \\ 1, & \text{otherwise} \end{cases}$
• Mean, median, and max of plane fit errors.
• Eigenvalues from plane fitting $\lambda_1, \lambda_2, \lambda_3$ (in descending order).
• Linearity ($\lambda_1 - \lambda_2$) and planarity ($\lambda_2 - \lambda_3$).
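A minimal sketch of how several of these part-level features could be computed from the points of one planar part. It assumes that height is the z coordinate, that the plane is fit by PCA, and that the normal's sign is ignored by taking an absolute value (PCA alone does not orient the normal); `part_features` is a hypothetical name.

```python
import numpy as np

def part_features(points, up=np.array([0.0, 0.0, 1.0])):
    """Average height, orientation indicator, linearity, and planarity."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    lam = evals[::-1]                    # lambda_1 >= lambda_2 >= lambda_3
    normal = evecs[:, 0]                 # direction of least variance
    # Assumption: abs() removes the sign ambiguity of the PCA normal.
    indicator = 0 if abs(normal @ up) > np.cos(np.pi / 4) else 1
    return {
        "avg_height": float(points[:, 2].mean()),
        "indicator": indicator,
        "linearity": float(lam[0] - lam[1]),
        "planarity": float(lam[1] - lam[2]),
    }

# A flat, roof-like patch at height 1.5.
xs, ys = np.meshgrid(np.linspace(0, 3, 7), np.linspace(0, 1, 3))
roof = np.stack([xs.ravel(), ys.ravel(), np.full(xs.size, 1.5)], axis=1)
roof_feats = part_features(roof)
```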
![Page 13: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/13.jpg)
Pairwise Part Features
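• Dot product of normals, $\mathbf{n}_1^T \mathbf{n}_2$
• Absolute difference in average heights, $|\bar{h}_1 - \bar{h}_2|$
• Distance between centroids, $\|\mathbf{c}_1 - \mathbf{c}_2\|$
• Closest distance between points, $\min_{i \in P_1, j \in P_2} \|\mathbf{p}_{1,i} - \mathbf{p}_{2,j}\|$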
• Coplanarity as mean, median, and max cross-plane fit errors.
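The first four pairwise features can be sketched directly from their definitions. This is an illustrative sketch, assuming height is the z coordinate and that each part comes with a precomputed unit normal; `pairwise_features` is a hypothetical name, and the coplanarity errors are omitted.

```python
import numpy as np

def pairwise_features(P1, n1, P2, n2):
    """Geometric features relating two parts given points and normals."""
    h1, h2 = P1[:, 2].mean(), P2[:, 2].mean()
    c1, c2 = P1.mean(axis=0), P2.mean(axis=0)
    # All pairwise point distances; fine for small parts, O(|P1||P2|) memory.
    dists = np.linalg.norm(P1[:, None, :] - P2[None, :, :], axis=2)
    return {
        "normal_dot": float(n1 @ n2),
        "height_diff": float(abs(h1 - h2)),
        "centroid_dist": float(np.linalg.norm(c1 - c2)),
        "closest_dist": float(dists.min()),
    }

# Two parallel horizontal strips, two units apart in height.
lower = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
upper = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
z_axis = np.array([0.0, 0.0, 1.0])
pair = pairwise_features(lower, z_axis, upper, z_axis)
```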
![Page 14: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/14.jpg)
Structured Part Modeling
• Generalized HMM as sequence of parts and final class variable.
• Trained discriminatively by structured averaged perceptron.
• Parts reordered in sequence based on 𝐼(𝑛) and average height.
[Figure: graphical model with nodes a1, a2, …, an, inputs x1, x2, …, xn, and class variable c]
![Page 15: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/15.jpg)
Experimental Results for Part Classification
• Evaluation on Ottawa dataset with 155 sedans and 67 SUVs.
• Structured part modeling provides increased performance for part classification.
• Manual segmentation provides an increase in classification of all parts per object.
Part Classification Comparison
![Page 16: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/16.jpg)
Experimental Results for Object Classification
• SP gives significant gains over baseline perceptron model.
• Manual segmentation with SP exceeds unstructured baselines.
Sedan vs SUV Object Classification
[Table panels: No Part Segmentation; Part Segmentation]
![Page 17: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/17.jpg)
Comparison Between Automatic and Manual Segmentation
• Under-segmentation from unbounded plane fitting.
• Merged semantic part classes like roof-hood and roof-trunk.
• Inconsistent labeling behavior at boundaries and noisy points.
Automatic
Manual
![Page 18: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/18.jpg)
Conclusions for Part-based Classification
PROS
• RANSAC segmentation is robust to many complexities of 3D data.
• Structured part-based method shows improvement over bag-of-words with local features.
• Pairwise features based on geometric properties improve classification performance.
CONS
• RANSAC segmentation is not equivalent to semantic segmentation.
• Labeling ground truth parts for every possible object class may be infeasible.
• RANSAC segmentation, features, and structure model are determined before training the classifier.
![Page 19: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/19.jpg)
CNN-Based Object Segmentation
• Segmentation on LIDAR scanning grid with missing points.
• CNN training procedure for LIDAR data.
• CNN-based features extracted from small set of initial feature maps for 3D images.
CNN-Based Object Segmentation in Urban LIDAR with Missing Points, Zelener and Stamos, 3DV, 2016.
![Page 20: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/20.jpg)
Missing Points in LIDAR
• Contiguous LIDAR scanlines form 2.5D grid of scanner measurements.
• Laser reflection causes missing points on objects in the grid.
• We can label and infer over these positions.
Missing Points in Gray on Scanning Grid
Missing Points on Vehicles are Labeled
![Page 21: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/21.jpg)
Preprocessing Pipeline
• Sample positive and negative locations in large LIDAR scene piece.
• Extract 𝑀 × 𝑀 patch as input to CNN.
• Predict labels for central 𝐾 × 𝐾 region, 𝐾 ≤ 𝑀. (𝑀 = 64, 𝐾 = 8)
![Page 22: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/22.jpg)
Initial Feature Maps
• Compute normalized feature maps from 3D points in 𝑀 × 𝑀 patch.
• Assume values are distributed ~𝒩(0,1) within each patch, truncated to [−6, 6].
• Missing data given max value (6) in clip range.
[Figure: relative depth and relative height feature maps, color scale from −6 to 6]
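One way to read the normalization above is as a per-patch standardization followed by clipping. The sketch below assumes the mean and standard deviation are taken over the valid (non-missing) points of the patch, which the slides do not state explicitly; `normalize_patch` is a hypothetical name.

```python
import numpy as np

def normalize_patch(values, missing, clip=6.0):
    """Standardize one feature map within a patch and fill missing points.

    values:  (M, M) raw feature (e.g. depth or height); may hold NaN where missing.
    missing: (M, M) boolean mask, True where the scanner returned no point.
    """
    valid = values[~missing]
    if valid.size == 0:
        return np.full(values.shape, clip)
    # Assumption: statistics come from valid points only.
    z = (values - valid.mean()) / (valid.std() + 1e-8)
    z = np.clip(z, -clip, clip)
    z[missing] = clip          # missing data gets the max value of the clip range
    return z

raw = np.array([[1.0, 2.0], [3.0, np.nan]])
mask = np.array([[False, False], [False, True]])
normalized = normalize_patch(raw, mask)
```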
![Page 23: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/23.jpg)
Initial Feature Maps
• Angle and missing mask describe sensor properties.
• Angle normalized as before and missing mask in {0,1}.
[Figure: angle feature map (color scale from −6 to 6) and missing mask (0 or 1)]
![Page 24: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/24.jpg)
Initial Feature Maps
• Signed Angle from Hadjiliadis and Stamos. 3DPVT 2010.
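[Figure: signed angle feature map (color scale from −6 to 6) and diagram of consecutive points $p_1, p_2, p_3$ along the scanning direction, with vectors $v_1, v_2$ and axis $\hat{z}$]
$\mathrm{SignedAngle}(p_2) = \operatorname{acos}(\hat{z} \cdot \hat{v}_2) \cdot \operatorname{sgn}(v_1 \cdot v_2)$
• Horizontal surfaces at 90 degrees.
• Vertical surfaces at 0 degrees.
• Sharp changes yield negative sign.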
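The signed angle formula can be checked numerically with a short sketch. It assumes $v_1 = p_2 - p_1$ and $v_2 = p_3 - p_2$ for consecutive points along a scanline, which is a reading of the figure labels rather than something the slides spell out; `signed_angle` is a hypothetical name.

```python
import numpy as np

def signed_angle(p1, p2, p3, z_hat=np.array([0.0, 0.0, 1.0])):
    """Signed angle at p2 for three consecutive scanline points (radians)."""
    v1 = p2 - p1               # assumed definition of v1
    v2 = p3 - p2               # assumed definition of v2
    v2_hat = v2 / np.linalg.norm(v2)
    # Clip guards against round-off pushing the dot product outside [-1, 1].
    angle = np.arccos(np.clip(z_hat @ v2_hat, -1.0, 1.0))
    return angle * np.sign(v1 @ v2)

flat = signed_angle(np.array([0.0, 0, 0]), np.array([1.0, 0, 0]), np.array([2.0, 0, 0]))
wall = signed_angle(np.array([0.0, 0, 0]), np.array([0.0, 0, 1]), np.array([0.0, 0, 2]))
fold = signed_angle(np.array([0.0, 0, 0]), np.array([1.0, 0, 0]), np.array([0.5, 0, 1]))
```

Here `flat` corresponds to a horizontal surface, `wall` to a vertical one, and `fold` to a sharp change where the scan direction reverses.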
![Page 25: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/25.jpg)
Model Overview
• Baseline CNN architecture.
• ReLU nonlinear activation functions.
• L2-regularization on affine layers.
• Dropout regularization on final layer.
• Predict binary label for each point in the 𝐾 × 𝐾 target.
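• Total model loss is
$$\mathcal{L}(\mathbf{x}, \mathbf{y}) = -\sum_{k=1}^{K^2} \left[ y_k \log p_k + (1 - y_k) \log(1 - p_k) \right] + \frac{\lambda}{2} \sum_{l=1}^{L} \|W_l\|_2^2$$
The first term is the binary cross entropy and the second is the L2-regularization.
[Figure: Input Patch (64, 64, 5) → Conv 5 × 5 → (32, 32, 32) → Conv 5 × 5 → (16, 16, 64) → Affine → 512 → Affine → 64 (= 𝐾²) → Output Labels]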
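The total loss can be evaluated directly from its two terms. The sketch below is a plain NumPy illustration, not the training code; `model_loss` is a hypothetical name, and the small epsilon used to keep the logarithms finite is an added assumption.

```python
import numpy as np

def model_loss(p, y, weights, lam):
    """Binary cross entropy over the K*K target plus L2 weight penalty.

    p: (K*K,) predicted probabilities, y: (K*K,) binary labels,
    weights: list of weight arrays W_l, lam: regularization strength.
    """
    eps = 1e-12                               # assumption: numerical guard
    p = np.clip(p, eps, 1.0 - eps)
    bce = -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    l2 = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return bce + l2

K = 8
probs = np.full(K * K, 0.5)
labels = np.zeros(K * K)
loss = model_loss(probs, labels, [np.ones((2, 2))], lam=0.1)
```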
![Page 26: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/26.jpg)
Results from Vehicle Point Detection using CNN [patch size = 64 x 64, target size = 8 x 8]
nyc_0 (in-sample) test piece
nyc_1 test piece
True Positive – Yellow, True Negative – Dark Blue, False Positive – Cyan, False Negative – Orange
![Page 27: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/27.jpg)
True Positive – Yellow, True Negative – Dark Blue, False Positive – Cyan, False Negative – Orange
nyc_0 (in-sample) test: Recall .85, Precision .73
![Page 28: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/28.jpg)
True Positive – Yellow, True Negative – Dark Blue, False Positive – Cyan, False Negative – Orange
nyc_1 test: Recall .85, Precision .73
![Page 29: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/29.jpg)
Experimental Results
Input Feature Map Comparison
D – Depth, H – Height, A – Angle, S – Signed Angle, M – Missing Mask
![Page 30: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/30.jpg)
Impact of Using Missing Point Labels
• Training with missing point labels improves precision.
• Missing point labels allow for complete segmentation.
DHASM with Missing Point Labels
DHASM with No Missing Point Labels
![Page 31: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/31.jpg)
Experimental Results
Use of Missing Point Labels
NML – No Missing Labels
![Page 32: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/32.jpg)
Conclusions for CNN-Based Segmentation
• CNN for LIDAR learned using a sampling based training pipeline.
• We can predict class labels over missing points in LIDAR.
• Incorporating missing points improves precision.
• Input feature maps that describe 3D shape and sensor properties have a significant effect on performance.
![Page 33: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/33.jpg)
Proposed Work
• Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in 3D images.
• Examine design and structure of CNN components for 3D images:
• Depth-sensitive localization.
• Depth-sensitive subpixel methods for segmentation.
• Spatial transformer for pose estimation.
• Utilize domain adaptation from synthetic data for auxiliary training data and missing point reconstruction.
![Page 34: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/34.jpg)
Novelty of Proposed Work
• Multi-task model for all tasks.
• Previous models only address up to three of the proposed tasks.
• Addition of 3D object pose estimation.
• Improve performance on all tasks by adapting current state-of-the-art techniques to the domain of 3D objects.
• Balance between 2.5D image and 3D voxel representation.
• Incorporation of additional datasets.
• Comparison across urban LIDAR and indoor RGB-D domains.
• Missing point estimation from synthetic data or multi-view reconstruction.
• Domain adaptation from synthetic datasets.
![Page 35: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/35.jpg)
2D Object Localization in LIDAR (In Progress)
• Preliminary results at 0.8 confidence threshold.
• Based on YOLO single-shot architecture.
• Can be used for region proposal or extended to 3D bounding boxes.
![Page 36: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/36.jpg)
Google Street View Dataset: Ground Truth Pose Labeling
• Automatic fit of bounding boxes.
• PCA to fit non-axis-aligned boxes.
• Manual tool to:
(a) select the front face (shown in a different color) for orientation (a default is selected automatically);
(b) change the size, position, or orientation of boxes in case of incomplete objects.
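A PCA box fit of this kind might look like the sketch below. It assumes upright objects, so PCA is applied only in the ground plane and the box is rotated about the vertical axis; `fit_oriented_box` is a hypothetical name and this is not the labeling tool's actual code.

```python
import numpy as np

def fit_oriented_box(points):
    """Fit an upright box whose heading follows the main ground-plane axis."""
    xy = points[:, :2]
    mean_xy = xy.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov((xy - mean_xy).T))
    major = evecs[:, 1]                      # direction of largest variance
    yaw = np.arctan2(major[1], major[0])
    # Rotation by -yaw maps world offsets into the box frame.
    R = np.array([[np.cos(yaw), np.sin(yaw)], [-np.sin(yaw), np.cos(yaw)]])
    local = (xy - mean_xy) @ R.T
    lo, hi = local.min(axis=0), local.max(axis=0)
    z_lo, z_hi = points[:, 2].min(), points[:, 2].max()
    center_xy = mean_xy + ((lo + hi) / 2) @ R
    return {
        "center": np.array([center_xy[0], center_xy[1], (z_lo + z_hi) / 2]),
        "size": np.array([hi[0] - lo[0], hi[1] - lo[1], z_hi - z_lo]),
        "yaw": yaw,
    }

# Synthetic 4 x 2 x 1.5 box footprint rotated by 30 degrees.
theta = np.pi / 6
u, w = np.meshgrid(np.linspace(-2, 2, 21), np.linspace(-1, 1, 11))
direction = np.array([np.cos(theta), np.sin(theta)])
normal = np.array([-np.sin(theta), np.cos(theta)])
xy = u.reshape(-1, 1) * direction + w.reshape(-1, 1) * normal + np.array([5.0, 3.0])
z = np.tile([0.0, 1.5], len(xy) // 2 + 1)[: len(xy)]
box = fit_oriented_box(np.column_stack([xy, z]))
```

PCA recovers the heading only up to a 180 degree flip, which is consistent with the manual front-face selection step listed above.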
![Page 37: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/37.jpg)
Multi-task Model for Object Identification
• A shared representation can be applied to multiple tasks.
• Tasks: Object localization, segmentation, classification, and pose estimation.
• Error signal for each task trains weights for shared representation.
Source: Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades
![Page 38: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/38.jpg)
Multi-task Model for Object Identification
• Straightforward extension to orientation estimation.
• Assume objects are upright, estimate rotation about gravity axis.
Source: Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades
![Page 39: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/39.jpg)
Localization for 3D Objects in Voxel Space
• 3D voxel input representation (TSDF).
• Voxel gives relative position, anchor box gives shape prior.
• Network estimates adjustments for box position and dimensions.
Source: Song and Xiao, Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
![Page 40: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/40.jpg)
Depth-Sensitive Localization
• We aim to maintain a non-volumetric 2.5D input representation.
• Partition viewing volume and consider localization in depth slices.
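[Figure: 2.5D input of shape $(W, H, F_{in})$ → 2D conv → output of shape $(X, Y, Z \times A \times 6)$, with depth slices $z_1, \dots, z_4$ and an anchor box of dimensions $(a_x, a_y, a_z)$]
Conv 3D Box:
$b_{\hat{x}} = x_i + d_x$, $b_{\hat{y}} = y_i + d_y$, $b_{\hat{z}} = z_i + d_z$
$b_{\text{width}} = a_x \cdot s_x$, $b_{\text{height}} = a_y \cdot s_y$, $b_{\text{depth}} = a_z \cdot s_z$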
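The box equations can be applied to a whole output tensor at once. The sketch below assumes the last axis of the $(X, Y, Z \times A \times 6)$ output is ordered by depth slice, then anchor, then $(d_x, d_y, d_z, s_x, s_y, s_z)$; that layout, the `cell_centers` input, and the name `decode_boxes` are all assumptions made for illustration.

```python
import numpy as np

def decode_boxes(pred, anchors, cell_centers):
    """Turn network outputs into box positions and dimensions.

    pred:         (X, Y, Z*A*6) raw outputs.
    anchors:      (A, 3) anchor dimensions (a_x, a_y, a_z).
    cell_centers: (X, Y, Z, 3) reference positions (x_i, y_i, z_i).
    """
    X, Y, _ = pred.shape
    Z, A = cell_centers.shape[2], anchors.shape[0]
    pred = pred.reshape(X, Y, Z, A, 6)       # assumed channel layout
    offsets, scales = pred[..., :3], pred[..., 3:]
    positions = cell_centers[:, :, :, None, :] + offsets    # b = cell + d
    dims = anchors[None, None, None, :, :] * scales         # b = a * s
    return positions, dims

X, Y, Z, A = 2, 2, 4, 3
anchors = np.array([[1.0, 1.0, 1.0], [2.0, 1.5, 4.0], [0.5, 1.8, 0.5]])
cell_centers = np.random.default_rng(0).uniform(size=(X, Y, Z, 3))
# Zero offsets and unit scales should return the cell centers and anchors.
pred = np.tile(np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]), (X, Y, Z * A))
positions, dims = decode_boxes(pred, anchors, cell_centers)
```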
![Page 41: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/41.jpg)
Subpixel Convolutions
• Pooled CNN features can still encode higher resolution information.
• Upscale back through “deconvolution” or subpixel convolution.
• Used in state-of-the-art segmentation networks.
Source: Shi et al., Is the deconvolution layer the same as a convolutional layer?
[Figure panels: Padded Image, Zero-padded Sub-Pixel Image, Subpixel Filter, Filter Activations]
![Page 42: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/42.jpg)
Subpixel Convolutions
• Independent subpixel filter weights can be separated.
• All convolutions are in low resolution then interleaved to upsample at the end of the network.
Source: Shi et al., Is the deconvolution layer the same as a convolutional layer?
[Figure panels: Padded Image, Separate Filters, Filter Activations, Combined Filter Activations]
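The interleaving step that upsamples low-resolution activations can be sketched as a pure reshape. This assumes a channels-last layout in which the $r^2$ subpixel positions are the outer part of the channel axis; frameworks differ on this ordering, and `pixel_shuffle` here is a hypothetical NumPy helper rather than a library call.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Interleave (H, W, C*r*r) activations into an (H*r, W*r, C) output."""
    H, W, Crr = x.shape
    C = Crr // (r * r)
    x = x.reshape(H, W, r, r, C)        # (row offset i, column offset j, channel)
    x = x.transpose(0, 2, 1, 3, 4)      # (H, i, W, j, C)
    return x.reshape(H * r, W * r, C)   # out[h*r + i, w*r + j] = x[h, w, i, j]

low_res = np.arange(2 * 3 * 8, dtype=float).reshape(2, 3, 8)   # C = 2, r = 2
high_res = pixel_shuffle(low_res, r=2)
```

Each of the $r^2$ filter groups fills one subpixel position of every $r \times r$ output block, so all convolutions stay at low resolution.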
![Page 43: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/43.jpg)
Position-sensitive Score Maps
• Subpixel-like features can be specialized for a given task.
Source: Dai et al., R-FCN: Object Detection via Region-based Fully Convolutional Networks
![Page 44: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/44.jpg)
Depth-sensitive Score Maps
• We can extend this approach to be depth-sensitive.
[Figure: feature maps → conv producing $k^3(C + 1)$ depth-sensitive score maps (top-left-back, top-left-center, …, bottom-right-center, bottom-right-front) → pool into $k \times k \times k$ bins → vote → $C + 1$ class scores]
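One possible reading of depth-sensitive pooling and voting is sketched below. The slides do not specify how pixels are assigned to depth bins, so this sketch assumes each pixel in a region votes into the bin given by its row, column, and measured depth, with empty bins contributing zero; `depth_sensitive_vote` and all of its parameters are hypothetical.

```python
import numpy as np

def depth_sensitive_vote(score_maps, depth, roi, z_range, k, num_classes):
    """Pool k*k*k bin-specific score maps over a region, then average the bins.

    score_maps: (H, W, k*k*k*(C+1)); depth: (H, W) depth per pixel.
    roi: (r0, r1, c0, c1) pixel bounds; z_range: (z0, z1) depth bounds.
    """
    r0, r1, c0, c1 = roi
    z0, z1 = z_range
    C1 = num_classes + 1
    maps = score_maps.reshape(score_maps.shape[0], score_maps.shape[1], k, k, k, C1)
    sums = np.zeros((k, k, k, C1))
    counts = np.zeros((k, k, k))
    for r in range(r0, r1):
        for c in range(c0, c1):
            z = depth[r, c]
            if not (z0 <= z < z1):
                continue                       # pixel lies outside the 3D region
            i = (r - r0) * k // (r1 - r0)
            j = (c - c0) * k // (c1 - c0)
            d = min(int((z - z0) / (z1 - z0) * k), k - 1)
            sums[i, j, d] += maps[r, c, i, j, d]   # only the map for this bin
            counts[i, j, d] += 1
    pooled = sums / np.maximum(counts, 1)[..., None]
    return pooled.mean(axis=(0, 1, 2))         # vote: average over all bins

k, C = 2, 2
H = W = 4
maps = np.tile(np.arange(C + 1, dtype=float), (H, W, k ** 3))
depth = np.where(np.add.outer(np.arange(H), np.arange(W)) % 2 == 0, 0.25, 0.75)
scores = depth_sensitive_vote(maps, depth, (0, 4, 0, 4), (0.0, 1.0), k, C)
```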
![Page 45: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/45.jpg)
Spatial Transformers for Pose Estimation
• General method for parameterized transforms between feature maps.
• Interpolation of transformed sampling grid.
• Estimated transformation is related to 3D object pose.
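The sampling-grid and interpolation steps can be sketched for a 2D affine transform on a single-channel map. This is a simplified illustration, assuming normalized coordinates in [−1, 1] and border clamping; the function names are hypothetical and no specific library's API is implied.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Source (x, y) coordinates in [-1, 1] for each output pixel."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)    # (H, W, 3)
    return grid @ theta.T                                    # (H, W, 2)

def bilinear_sample(img, grid):
    """Bilinearly interpolate img (H, W) at the grid's source coordinates."""
    H, W = img.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)          # clamping keeps samples at the border
    wy = np.clip(y - y0, 0.0, 1.0)
    top = img[y0, x0] * (1 - wx) + img[y0, x0 + 1] * wx
    bottom = img[y0 + 1, x0] * (1 - wx) + img[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy

feature_map = np.arange(12, dtype=float).reshape(3, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mirror = np.array([[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
same = bilinear_sample(feature_map, affine_grid(identity, 3, 4))
flipped = bilinear_sample(feature_map, affine_grid(mirror, 3, 4))
```

In a spatial transformer the parameters `theta` would be predicted by the network, which is what ties the estimated transformation to object pose.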
![Page 46: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/46.jpg)
Complete Model Sketch
[Figure: 2.5D image feature maps → conv → shared feature maps → down convs → multi-scale depth-sensitive localization → ROI pooling and spatial transformer → depth-sensitive segmentation, classification, pose estimation]
![Page 47: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/47.jpg)
Timeline for Completion
• December 2016
• Select and prepare new datasets for experiments.
• Annotate Street View dataset with object bounding boxes.
• Extend current localization and segmentation implementations for baselines.
• Begin implementation of classification and pose estimation baselines.
• January 2017
• Complete implementation of baseline models and begin training models for evaluation on a chosen dataset.
• Implement baseline multi-task model.
![Page 48: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/48.jpg)
Timeline for Completion
• February 2017
• Begin experiments with architectures using:
• Depth-sensitive localization.
• Depth-sensitive subpixel convolution for segmentation.
• 3D object pose estimation with spatial transformers.
• March 2017
• Prepare paper for ICCV 2017 submission including experiments on:
• Multi-task learning for 3D object identification.
• One of the proposed depth-sensitive experimental architectures.
• Consider additional experiments on domain adaptation and missing point reconstruction.
![Page 49: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/49.jpg)
Timeline for Completion
• April 2017
• Dissertation writing.
• Continuation of experiments.
• May 2017
• Dissertation defense.
• Prepare paper submission to 3DV 2017 containing additional experiments.
![Page 50: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/50.jpg)
Additional Slides
![Page 51: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/51.jpg)
Google Street View Dataset
• Google R5 Street View Dataset
• All but two pieces of NYC 0 used for training.
• Remaining runs used for evaluation.
![Page 52: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/52.jpg)
KITTI Dataset
• 3D bounding boxes for vehicles, cyclists, and pedestrians in LIDAR.
• Precise segmentation labels not included in benchmark.
![Page 53: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/53.jpg)
Synthia Dataset
• Synthetic urban scenes for simulated RGB-D scans.
• Exact labels for semantic segmentation but 3D poses are not given.
• Domain adaptation required for effective use on real-world data.
• Missing point reconstruction task can be simulated.
![Page 54: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/54.jpg)
Indoor RGB-D Datasets
• SUN RGB-D and SceneNN.
• Class, segmentation, and oriented 3D bounding boxes included.
• Reconstructed shape can be used for missing points.
![Page 55: Object Localization, Segmentation, Classification, and Pose …€¦ · •Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in](https://reader030.vdocuments.mx/reader030/viewer/2022040612/5f0261de7e708231d403ff8c/html5/thumbnails/55.jpg)
Assumptions for Proposed Work
• Single 3D image from LIDAR sensor sweep or RGB-D camera.
• Excludes video, multiview registration, and volumetric sensors.
• Possible shape completion only for missing (non-occluded) scan points.
• Excluding complete volumetric shape reconstruction and database matching.
Hua et al., SceneNN: A Scene Meshes Dataset with aNNotations
Wu et al., 3D ShapeNets: A Deep Representation for Volumetric Shapes