![Page 1: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/1.jpg)
No.4
動きからの3次元復元
Structure from Motion and SLAM
担当教員:向川康博・田中賢一郎
Computer Vison 2019
![Page 2: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/2.jpg)
Slide credits
• Special Thanks: Some slides are adopted from other instructors’ slides.
– Tomokazu Sato, former NAIST CV1 class
– James Tompkin, Brown CSCI 1430 Fall 2017
– Ioannis Gkioulekas, CMU 16-385 Spring 2018
– We also thank many other instructors for sharing their slides.
2
![Page 3: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/3.jpg)
Today’s topics
3
• Reminder: calibration and stereo
• 2-view Structure from Motion (SfM)
• Multi-view SfM• Large-scale SfM
• visual SLAM
![Page 4: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/4.jpg)
Direct Sparse Odometry
4
https://www.youtube.com/watch?v=C6-xwSOOdqQ
![Page 5: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/5.jpg)
Recap: Calibration and Stereo
• Camera calibration
• Stereo
6
image 2image 1
𝑋
𝒙 = 𝑴𝑿
camera center
image plane
principal axis
𝒙 = 𝑴𝑿
known knownestimate
known known estimate
Solve Perspective n-Points (PnP) problem
Solve Triangulation problem
![Page 6: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/6.jpg)
Structure(scene geometry)
Motion(camera geometry)
Measurements
Pose Estimation known estimate3D to 2D
correspondences
Triangulation estimate known2D to 2D
correspondences
Reconstruction estimate estimate2D to 2D
correspondences
Reconstruction(再構成)
![Page 7: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/7.jpg)
How do you reconstruct?
14
We don’t know both scene geometry and camera geometry
![Page 8: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/8.jpg)
Reconstruction Problem
• Problems so far
• Can we jointly estimate M and X ?
15
𝒙 = 𝑴𝑿known estimateestimate
𝒙 = 𝑴𝑿known knownestimate
𝒙 = 𝑴𝑿known known estimate
Camera calibration Triangulation (Stereo)
![Page 9: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/9.jpg)
2-view SfM
Structure from Motion
16
![Page 10: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/10.jpg)
Two-view SfM
1. Compute the Fundamental Matrix 𝑭 from
points correspondences
8-point algorithm
Image feature matching
Estimate 𝑭 from matched pairs
𝒑′⊺𝑭𝒑 = 0
![Page 11: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/11.jpg)
Good feature point
• Mini-report:2– Which is the good feature point to find
corresponding point uniquely? What is the reason?
(1) corner
(2) edge
(3) flat
18
(1)
(2)
(3)
![Page 12: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/12.jpg)
Good feature point
(1) corner
easy to find
(2) edge
difficult due to aperture problem(窓問題)
(3) flat
difficult because no clues
19
(1)
(2)
(3)
![Page 13: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/13.jpg)
Aperture problem (窓問題)
• Human eyes interpret conveniently.
• Corresponding point cannot be uniquely determined.
20
ambiguity
?
?
?
barber-pole illusion
![Page 14: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/14.jpg)
Interest operators
• Corner detectors to find small areas which are suitable as feature points
– Moravec corner detector (1980)
– Harris corner detector (1988)
– KLT(Kanade-Locus-Tomasi) corner detector(1991)
• Check whether small area contains edges in multiple directions
Bad feature points: corresponding point cannot be determined uniquely.
Good feature point
![Page 15: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/15.jpg)
Harris corner detector
• Principal component analysis of gradient in small area
• Classification by Eigenvalues 𝜆1and 𝜆21. X and Y direction differentiation
𝑿 = Τ𝝏𝑰 𝝏𝒙 𝒀 = Τ𝝏𝑰 𝝏𝒚
2. Calculation of variance and covariance
𝐀 = 𝐗𝟐⨂𝐰 𝐁 = 𝐘𝟐⨂𝐰 𝐂 = 𝐗𝐘 ⨂𝐰
𝐰𝐮,𝐯 = 𝐞𝐱𝐩 − (𝐮𝟐 + 𝐯𝟐)/𝟐𝛔𝟐
3. Eigenvalues of the variance-covariance matrix
𝑴 =𝑨 𝑪𝑪 𝑩
𝝀𝟏, 𝝀𝟐
Principal component analysisof image gradient
![Page 16: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/16.jpg)
Harris corner detector
• Eigenvalue-based classification
Flat
𝜆1and 𝜆2 are smalledge
𝜆1 ≫ 𝜆2
Corner𝜆1and 𝜆2 are large
edge
𝜆2≫
𝜆1
λ2
λ1
![Page 17: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/17.jpg)
Image feature descriptor
- SIFT -
• Features that are invariant to changes in scale and rotation of the image
• Intensity gradient histogram– Rotate coordinate axis in gradient direction (invariant to rotational
change)
– Normalize the vector sum (to reduce the influence of lighting changes)
0 2p
Image gradient Angle histogram
Detect peak position(gradient direction)
![Page 18: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/18.jpg)
Image feature descriptor
- SIFT -
• Accumulate while overlapping gradient information around key points with Gaussian function
• Histogram in 8 directions every 4 × 4 = 16 blocks
– 128 dimensional feature vector
Keypoint description
Keypoint
Gradient direction
Gaussian
![Page 19: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/19.jpg)
Two-view SfM
1. Compute the Fundamental Matrix 𝑭 from
points correspondences
8-point algorithm
Image feature matching
Estimate 𝑭 from matched pairs
𝒑′⊺𝑭𝒑 = 0
![Page 20: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/20.jpg)
RANSAC (RANdom SAmpling Consensus)
28
Eliminating outliers (外れ値).
1. Randomly sample two points from given points
2. Count inliers for the line that passesselected two points.
3. Repeat 1 and 2 for given number of iterations.
4. Select the line that maximize the number of inliers.
line fitting example 2
4
1 2
![Page 21: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/21.jpg)
• Randomly sampled 8 points
8-point algorithm
29
Do the remaining points agree?
![Page 22: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/22.jpg)
Two-view SfM
2. Compute the camera matrices 𝑴 from the
Fundamental matrix
𝑴 = 𝑰|𝟎 and 𝑴′ = 𝒆𝒙 𝑭|𝒆′
1. Compute the Fundamental Matrix 𝑭 from
points correspondences
8-point algorithm
![Page 23: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/23.jpg)
Estimation of camera pose from pairs of feature points
33
Even if we do not know 3Dpositions of feature points, we canestimate camera’s rotation 𝐑 andtranslation direction 𝐭 by epipolarconstraint from multiple 2Dpositions of corresponding featurepairs.
It should be noted that, we cannot recover the scale of 𝐭 (actual distance between 𝑐0 and 𝑐1).
𝑴 = 𝑰|𝟎 and 𝑴′ = 𝒆𝒙 𝑭|𝒆′
東武ワールドスクエア
![Page 24: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/24.jpg)
Unknown real size
• Miniature effect
34https://ameblo.jp/9999999999999999999/entry-10332221773.html
![Page 25: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/25.jpg)
Front/back ambiguity• Find the configuration where the
points is in front of both cameras
35
![Page 26: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/26.jpg)
Two-view SfM
2. Compute the camera matrices 𝑴 from the
Fundamental matrix
𝑴 = 𝑰|𝟎 and 𝑴′ = 𝒆𝒙 𝑭|𝒆′
1. Compute the Fundamental Matrix 𝑭 from
points correspondences
8-point algorithm
3. For each point correspondence, compute the point 𝑿 in
3D space
Triangulate with 𝒙 = 𝑴𝑿 and 𝒙′ = 𝑴′𝑿
![Page 27: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/27.jpg)
Structure from motion ambiguity
• If we scale the entire scene by some factor 𝑘 and, at the same time, scale the camera matrices by the factor of 1/𝑘, the projections of the scene points in the image remain exactly the same:
It is impossible to recover the absolute scale of the scene!
𝒙 = 𝑴𝑿 =1
𝑘𝑴 𝑘𝑿
![Page 28: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/28.jpg)
Structure from motion ambiguity
• If we scale the entire scene by some factor 𝑘 and, at the same time, scale the camera matrices by the factor of 1/𝑘, the projections of the scene points in the image remain exactly the same
• More generally: if we transform the scene using a transformation 𝑸 and apply the inverse transformation to the camera matrices, then the images do not change
𝒙 = 𝑴𝑿 = 𝑴𝑸−𝟏 𝑸𝑿
![Page 29: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/29.jpg)
Reconstruction ambiguity
39
Calibrated cameras(similarity projection ambiguity)
Uncalibrated cameras(projective projection ambiguity)
![Page 30: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/30.jpg)
Multi-view SfM
40
![Page 31: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/31.jpg)
Projective structure from motion
• Given: 𝑚 images of 𝑛 fixed 3D points
• 𝒙𝒊𝒋 = 𝑴𝒊𝑿𝒋, 𝑖 = 1,… ,𝑚, 𝑗 = (1,… , 𝑛)
• Problem: estimate 𝑚 projection matrices 𝑴𝒊 and
𝑛 3D points 𝑿𝒋 from the 𝑚𝑛 correspondences 𝒙𝒊𝒋
𝒙𝟏𝒋
𝒙𝟐𝒋
𝒙𝟑𝒋
𝑿𝒋
𝑴𝟏
𝑴𝟐
𝑴𝟑
![Page 32: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/32.jpg)
Sequential SfM
Multi-view, ordered images
43
![Page 33: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/33.jpg)
2. Compute the camera matrices 𝑀 from the
Fundamental matrix
𝑴 = 𝑰|𝟎 and 𝑴′ = 𝒆𝒙 𝑭|𝒆′
1. Compute the Fundamental Matrix F from
points correspondences
8-point algorithm
3. For each point correspondence, compute the point 𝑿 in
3D space
Triangulate with 𝒙 = 𝑴𝑿 and 𝒙′ = 𝑴′𝑿
Initialization
• Initialize by 2-view SfM
![Page 34: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/34.jpg)
45
Image 1 Image 2
If we know camera poses for a pair of image 1 and 2, we can continue to estimate camera poses and 3-D structure for new input by repeating ‘mapping’ and ‘localization’.
Idea for sequential SfM
Initial guess
mapping
localization
![Page 35: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/35.jpg)
Problem of accumulative errorsby chaining relative poses
46
Image 1Image 2
Image 3
1% scale error 1% scale error 1% scale error
Image 4
If relative camera poses are estimated with +1% biased scale error for each pair, the scale error at 100 frame will be 1.01100 = 2.70 = 270%.*This kind of effect is called as scale drift.
Image 100
![Page 36: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/36.jpg)
Bundle adjustment
• Non-linear method for refining structure and motion
• Minimizing reprojection error
x1j
x2j
x3j
P1
P2P3
P1Xj
P2XjP3Xj
𝐸 𝑴,𝑿 =
𝑖=1
𝑚
𝑗=1
𝑛
𝐷 𝒙𝑖𝑗 ,𝑴𝑖𝑿𝑗2
𝑋𝑗
![Page 37: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/37.jpg)
Large-scale SfM
Multi-view, non-ordered images
48
![Page 38: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/38.jpg)
Photo Tourism
![Page 39: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/39.jpg)
Feature detection
Detect features using SIFT [Lowe, IJCV 2004]
![Page 40: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/40.jpg)
Feature description
Describe features using SIFT [Lowe, IJCV 2004]
![Page 41: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/41.jpg)
Feature matching
Refine matching using RANSAC to estimate fundamental
matrix between each pair
![Page 42: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/42.jpg)
Incremental structure from motion
![Page 43: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/43.jpg)
Final reconstruction
![Page 44: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/44.jpg)
![Page 45: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/45.jpg)
visual SLAM
Simultaneous Localization and Mapping
59
![Page 46: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/46.jpg)
Relationship between SfM and v-SLAM
SfM v-SLAM
SfM v-SLAM
Input Images (non-ordered is OK) Video
Real-time Not necessary Required
Output Batch Successive
Available information for estimation
All frames Data acquired before the target frame
Recovery from failure Easier Not easy
Feature tracking Not easy (for non-successive image input)
Easier
Accumulation of errors Smaller Bigger / Faster
Offline, high accuracy Real-time, successive output
60
![Page 47: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/47.jpg)
61
Image 1 Image 2
If we know camera poses for a pair of image 1 and 2, we can continue to estimate camera poses and 3-D structure for new input by repeating ‘mapping’ and ‘localization’.
Idea for sequential SfM
Initial guess
mapping
localization
![Page 48: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/48.jpg)
62
Basic pipeline of sequential visual SLAM
Iterative process
Feature point tracking
Estimation of 3D points
Addition and deletion of feature points
Camera pose initialization by Two-view SfM
Camera pose estimation
(option) Local / Global bundle adjustment
![Page 49: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/49.jpg)
Local bundle adjustment is asynchronously processed forminimizing accumulation of errors using selected key-frames in order not to prevent real-time processing.
Parallel Tracking and Mapping
Initialization by two-view SfM
Feature point tracking
Camera pose estimation
Bundle adjustment for selected key-frames
+Addition of map points
Tracking thread Mapping thread
Selection of key-frames
63
![Page 50: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/50.jpg)
Reduction of accumulation errors / re-start from tracking failure
Loop closing, re-localization
Basic flow1. Find similar image of current input from already observed images2. Find corresponding points between current and selected images. 3. Optimize data using bundle adjustment / Estimate camera pose.
True camera path
Estimated camera path
64
![Page 51: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/51.jpg)
LSD-SLAM: Large Scale Direct Monocular SLAM
https://www.youtube.com/watch?v=GnuQzP3gty4 66
![Page 52: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/52.jpg)
Final report
• Explain the reason of these strange views
74
![Page 53: Human Computer Interaction 1. overview and history of HCIomilab.naist.jp/class/CV/2019/2019-CV-No4-SfM.pdf · No.4 動きからの3次元復元 Structure from Motion and SLAM 担当教員:向川康博・田中賢一郎](https://reader030.vdocuments.mx/reader030/viewer/2022041214/5e030c1fd9e2ea2f204182ed/html5/thumbnails/53.jpg)
Inconsistent?
• Physically consistent
• Humans interpret conveniently.
– 90 degree, rectangle, parallel,...
– Estimation of skewed shape
75
𝒙 = 𝑴𝑿
𝒙 = 𝑴𝑸−𝟏𝑸𝑿
𝑸𝑿