l12 6d pose estimation ii - github pages
TRANSCRIPT
![Page 1: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/1.jpg)
L11: 6D Pose Estimation(Cont’)
Hao Su
Machine Learning meets Geometry
Ack: Minghua Liu and Jiayuan Gu for helping to prepare slides
![Page 2: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/2.jpg)
We are talking about object pose
2
Figure from https://paperswithcode.com/task/6d-pose-estimation
Review
![Page 3: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/3.jpg)
6D Pose Estimation
3
recognize the 3D location and orientation of an object relative to a canonical frame
canonical frame
pose 1
pose 2
Review
![Page 4: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/4.jpg)
Agenda
• Introduction
• Rigid transformation estimation- Closed-form solution given correspondences- Iterative closet point (ICP)
• Learning-based approaches- Direct approaches- Indirect approaches
4 Review
![Page 5: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/5.jpg)
Rigid Transformation Estimation
Review
![Page 6: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/6.jpg)
Correspondence
6
Rigid transformation T(p) = Rp + t
source
target
q1 = T(p1) = Rp1 + t
q2 = T(p2) = Rp2 + t
qn = T(pn) = Rpn + t
…
3n equations from n pairs of points
pi
qi
Review
![Page 7: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/7.jpg)
Umeyama’s Algorithm
• Known: ,
• Objective:
• Solution
- (SVD)
- (flip the sign of the last column of if )
-
P = {pi} Q = {qi}
minR,t
n
∑i=1
∥Rpi + t − qi∥2
n
∑i=1
(qi − q̄i)(pi − p̄i)T = UΣVT
R = UVT Vdet(R) = − 1
t =∑n
i=1 qi
n−
∑ni=1 Rpi
n:= q̄ − Rp̄
7 Review
![Page 8: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/8.jpg)
Direct Approaches
Review
![Page 9: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/9.jpg)
Direct Approaches• Input: cropped image/point cloud/depth map of a
single object• Output: (R, t)
9
Point cloud
Depth map
Image
Neural Network (R, t)
Review
![Page 10: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/10.jpg)
Indirect Approaches
![Page 11: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/11.jpg)
Indirect Approaches
• Input: cropped image/point cloud/depth map of a single object
• Output: corresponding pairs - points in canonical frame - points in camera frame - estimate by solving
{(pi, qi)}{pi}
{qi}(R, t)
minR,t
n
∑i=1
∥Rpi + t − qi∥2
11
![Page 12: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/12.jpg)
Two Categories of Indirect Approaches
• If points in the canonical frame are known, predict their corresponding locations in the camera frame
• If points in the camera frame are known, predict their corresponding locations in the canonical frame
12
![Page 13: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/13.jpg)
Given Points in the Canonical Frame, Predict Corresponding Location in the Camera Frame
• Recall: Three correspondences are enough • Which points in the canonical frame should be given?
- Choice by PVN3D: keypoints in the canonical frame
13He et al., “PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation”, CVPR 2020
![Page 14: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/14.jpg)
• Option1: bounding box vertices
• Option 2: farthest point sampling (FPS) over CAD object model
Keypoint Selection
14Chen et al., “G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features”, CVPR 2020
![Page 15: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/15.jpg)
Example: PVN3D
15
Get point-wise features by fusing color and geometry features
![Page 16: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/16.jpg)
Example: PVN3D
16
For each keypoint:• Voting: for each point in the camera frame, predict its offset to the keypoint (in the camera frame)
• Clustering: find one location according to all the candidates
Keypoint
![Page 17: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/17.jpg)
Given Points in the Camera Frame, Predict Corresponding Location in the Canonical Frame
• Which points in the camera frame should be given?- Choice by NOCS: every point in the camera frame
17Wang, He, et al. "Normalized object coordinate space for category-level 6d object pose and size estimation." CVPR 2019.
3D point in the camera frame(2D visible pixel with depth)
3D point in the (normalized) canonical frame
![Page 18: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/18.jpg)
Example: NOCS
18Wang, He, et al. "Normalized object coordinate space for category-level 6d object pose and size estimation." CVPR 2019.
Output NOCS Map H × W × 3
Input Image
Note: the object is normalized to have unit diagonal of bounding box in the canonical space, so the canonical space is called “Normalized Object Canonical Space” (NOCS)
![Page 19: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/19.jpg)
Example: NOCS for Symmetric Objects
• Given equivalent GT rotations (finite symmetry order n),
we can generate n equivalent NOCS maps
• Similar to shape-agnostic loss in direct approaches, we can use Min of N loss
ℛ = {R1GT, R2
GT, ⋯, RnGT}
19Wang, He, et al. "Normalized object coordinate space for category-level 6d object pose and size estimation." CVPR 2019.
![Page 20: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/20.jpg)
Umeyama’s Algorithm with Unknown Scale
• However, the target points in the canonical space of NOCS are normalized, and thus we also need to predict the scale factor
• Similarity transformation estimation (rigid transformation + uniform scale factor)
• Closed-form solution- Umeyama algorithm: http://web.stanford.edu/class/
cs273/refs/umeyama.pdf- Similar to the counterpart without scale
20 Read by yourself
![Page 21: L12 6D Pose Estimation II - GitHub Pages](https://reader034.vdocuments.mx/reader034/viewer/2022050207/626dd1382337570c142959c1/html5/thumbnails/21.jpg)
Tips for Homework 2
• For learning-based approaches- Start with direct approaches- Crop the point cloud of each object from GT depth
map given GT segmentation mask- Train a neural network, e.g. PointNet, with shape-
agnostic loss- Improve the results considering symmetry
21