motion capture from rgb-d camera -...
TRANSCRIPT
![Page 1: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/1.jpg)
Motion Capture from RGB-D Camera
Ruigang Yang
University of Kentucky
![Page 2: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/2.jpg)
Motion/Performance Capture
Marker
Optical Inertial/ Mechnical
Markerless
Multi-View Single-View
Depth Video
![Page 3: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/3.jpg)
Mocap with a Single Depth Sensor Discriminative
• Shotton et al.,CVPR, 2011
• …
Generative
• Ganapathi et al. CVPR 2010
• …
Images from Christian Theobalt
![Page 4: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/4.jpg)
The “Kinect” Approach
• Body Part Recognition
right elbow
right hand left shoulder neck
Slides from Shotton et al. 2011 CVPR Presentation
![Page 5: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/5.jpg)
infer body parts per pixel
cluster pixels to hypothesize body joint positions
The Kinect pose estimation pipeline
capture depth image &
remove bg
fit model & track skeleton
![Page 6: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/6.jpg)
Classifying pixels
• Compute 𝑃(𝐶𝑥|𝜔𝑥)
– pixels 𝑥
– body part 𝐶𝑥
– image window 𝜔𝑥
• Discriminative approach
– learn classifier 𝑃(𝐶𝑥|𝜔𝑥) from training data
image windows move with classifier
![Page 7: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/7.jpg)
Fast depth image features
• Depth comparisons
– very fast to compute
input depth image
x Δ
x Δ
x Δ
x
Δ
x
Δ
x
Δ
𝑓 𝐼, x = 𝑑𝐼 x − 𝑑𝐼(x + Δ)
image depth
image coordinate
offset depth
feature response
Background pixels d = large constant
Δ =𝐯
𝑑𝐼 x
![Page 8: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/8.jpg)
Decision tree classification Image Pixel 𝑥
no
Toy example: distinguish left (L) and right (R) sides of the body
no yes
yes
L R
P(c)
L R
P(c)
L R
P(c)
f(I, x; Δ1) > θ1
f(I, x; Δ2) > θ2
![Page 9: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/9.jpg)
Training decision trees
Qn = (I, x)
f(I, x; Δn) > θn
no yes
c
Pr(c)
body part c
Pn(c)
c
Pl(c)
Take (Δ, θ) that maximises information gain:
n
l r
Goal: drive entropy at leaf nodes to zero
reduce entropy
[Breiman et al. 84]
Δ𝐸 = −𝑄l𝑄𝑛
𝐸(Ql) −𝑄r𝑄𝑛
𝐸(Qr)
![Page 10: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/10.jpg)
• Trained on different random subset of images – “bagging” helps avoid over-fitting
• Average tree posteriors
Decision forest classifier [Breiman 01]
……… tree 1 tree T
c
P1(c) c
PT(c)
(𝐼, x) (𝐼, x)
𝑃 𝑐 𝐼, x =1
𝑇 𝑃𝑡(𝑐|𝐼, x)
𝑇
𝑡=1
![Page 11: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/11.jpg)
• Define 3D world space density:
• Mean shift for mode detection
𝑓𝑐 𝑥 ∝ 𝜔𝑖𝑐exp(−𝑥 −𝑥𝑖
𝑏𝑐
2)𝑖
Body parts to joint hypotheses
3. hypothesize body joints
…
1 2
bandwidth
3D Coordinates
pixel weight
inferred probability
depth at i th pixel
𝜔𝑖𝑐 = 𝑃 𝑐 𝐼, 𝑥𝑖 𝑑𝐼 𝑥𝑖2
![Page 12: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/12.jpg)
From proposals to skeleton
• Input – 3D joint hypotheses
– kinematic constraints
– temporal coherence
• Output – full skeleton
– higher accuracy
– invisible joints 4. track skeleton
1
2
3
![Page 13: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/13.jpg)
Synthetic training data
Train invariance to:
Record mocap 500k frames
distilled to 100k poses
Retarget to several models
Render (depth, body parts) pairs
![Page 14: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/14.jpg)
Synthetic vs real data
synthetic (train & test)
real (test)
![Page 15: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/15.jpg)
Number of trees ground truth
1 tree 3 trees 6 trees
inferred body parts (most likely)
40%
45%
50%
55%
1 2 3 4 5 6
Ave
rage
per
-cla
ss a
ccu
racy
Number of trees
![Page 16: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/16.jpg)
front view top view side view
input depth inferred body parts
inferred joint positions
![Page 17: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/17.jpg)
front view top view side view
input depth inferred body parts
inferred joint positions
![Page 18: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/18.jpg)
Joint prediction accuracy
0.00.10.20.30.40.50.60.70.80.91.0
Ce
nte
r H
ead
Cen
ter
Ne
ck
Lef
t S
ho
uld
er
Rig
ht
Sh
ou
lder
Lef
t E
lbo
w
Rig
ht
Elb
ow
Lef
t W
rist
Rig
ht
Wri
st
Lef
t H
and
Rig
ht
Han
d
Lef
t K
nee
Rig
ht
Kn
ee
Lef
t A
nkl
e
Rig
ht
An
kle
Lef
t F
oo
t
Rig
ht
Fo
ot
Mea
n A
P
Ave
rag
e p
reci
sio
n
Joint prediction from ground truth body parts
Joint prediction from inferred body parts
![Page 19: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/19.jpg)
Summary
Pros:
–Frame-by-frame gives robustness
–Fast, simple machine learning
Cons:
–Accuracy can be improved.
![Page 20: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/20.jpg)
Generative Approaches • Kinematic model
• Maximize model-to-observation consistency
Single Templates (normally in a tree structure) [Poppe et al. 2007]
Observation
Template
Statistical Models [Anguelov et al. 2005, Hasler et al. 2009]
![Page 21: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/21.jpg)
Linear Blend Skinning (LBS)
• Mesh + Skeleton
• Each vertex is controlled by several neighboring bones
𝑣𝑖 = 𝛼𝑖,𝑘𝑇𝑘𝑘
𝒗𝑖0
Bone transformations
Skinning weights
![Page 22: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/22.jpg)
Parametric Models
Images from Christian Theobalt
![Page 23: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/23.jpg)
Pose Parameterization
Images from Christian Theobalt
![Page 24: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/24.jpg)
Twist-based Representation of Transformations [Murray et al., 1994]
• 𝑇𝑥 = 𝑒𝝃𝒙 𝜃𝑥
– 𝝃𝒙: the twist representing rotation axis
– 𝜃𝑥: rotation angle
• Linearization
– 𝑇𝑥 ≈ (𝐼 + 𝝃𝒙 𝜃𝑥) if 𝜃𝑥 is small
𝒙
𝒚
𝒛
𝜃𝑥 𝜃𝑦
𝜃𝑧
𝝃𝒙
𝝃𝒛 =𝝎𝒛
−𝝎𝒛 × 𝒑𝒛
𝝃𝒚
World coordinate
𝒑𝒛
𝜃𝑦
𝜃𝑥
𝜃𝑧
Direction
![Page 25: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/25.jpg)
Model-to-Observation Consistency
• Silhouette
• Texture
• Depth
Image-based
Depth-based
[Gall et al. 2009, Gall et al. 2010, Liu et al. 2013, etc.]
[Ganapathi et al. 2010, Ye et al. 2011 Baak et al. 2011 Helten et al. 2013 Wei et al. 2013 Ye et al. 2014 etc.]
![Page 26: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/26.jpg)
Depth consistency and pose update
• Typically using ICP – [Ganapathi et al. 2012, Helten et al. 2013, Wei et al. 2013]
• Limitation: sensitive to local minima
Estimate pose Template vertices
Closest points
Bone (Skeleton)
![Page 27: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/27.jpg)
Soft correspondences association • Gaussian Mixture Model
– Template vertices are Gaussian centroids
– Observed points are sampling from the GMM
𝒑 𝒙𝒏 = 𝟏− 𝒖
𝑴𝒑 𝒙𝒏 𝒗𝒎
𝑴
𝒎=𝟏
+𝒖
𝑵
• Pose estimation = find the pose that gives the 𝒗𝒎 that achieves maximum joint probability 𝒑 𝒙𝒏𝑛
Template vertices
Observed points
Bone (Skeleton)
𝒑 𝒙𝒏|𝒗𝟏 𝒑 𝒙𝒏|𝒗𝟐
𝒑 𝒙𝒏|𝒗𝟑
Uniform distribution for outliers
![Page 28: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/28.jpg)
Maximize the joint probability
• Log likelihood
𝑬 𝚯, 𝑺, 𝝈𝟐 = 𝐥𝐨𝐠 𝟏− 𝒖
𝑴𝒑 𝒙𝒏 𝒗𝒎
𝑴
𝒎=𝟏
+𝒖
𝑵
𝑵
𝒏=𝟏
• Solve parameters (pose 𝜣) of 𝒗𝒎 via EM – Negative complete log likelihood
𝑸 𝜣, 𝑺, 𝝈𝟐 ∝ 𝒑 𝒗𝒎 𝒙𝒏𝒏,𝒎
𝒙𝒏 − 𝒗𝒎 𝜣𝟐
Linearization 𝒑 𝒗𝒎 𝒙𝒏𝒏,𝒎
𝒙𝒏 − 𝑳 𝒗𝒐, 𝜣𝐩𝐫𝐞𝐯, 𝜟𝜣
𝟐
Linear Blend Skinning
Incremental pose update
Most recent pose Template vertices in reference pose
Posterior
![Page 29: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/29.jpg)
Pose Energy Function
• Negative complete log likelihood
𝒑 𝒗𝒎 𝒙𝒏𝒏,𝒎
𝒙𝒏 − 𝑳 𝒗𝒐, 𝜣𝐩𝐫𝐞𝐯, 𝜟𝜣
𝟐
• Regularization
– Small pose update 𝜟𝜣 𝟐
– Prediction via auto-regression
𝜣𝐩𝐫𝐞𝐯 + 𝜟𝜣 −𝜣𝐩𝐫𝐞𝐝𝟐
![Page 30: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/30.jpg)
Pose Estimation
• Initialize the pose 𝜣𝒕 (e.g. from previous frame)
• Iterate until convergence
– Compute template vertices {𝒗𝒎} via LBS
– E-step: compute posterior
– M-step: minimize the pose energy function in previous slide over the pose update 𝜟𝜣
– Increment the pose 𝜣𝒕 with 𝜟𝜣
![Page 31: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/31.jpg)
Template-Subject Consistency
• Body size and shape consistency between template and the subject is critical.
• Therefore, estimate
– body size: limb length scales (higher or shorter)
– shape: Vertex displacements (fatter or slimmer)
• System workflow
Shape Adaptation
Body Size (Limb Lengths) Adaptation Initialization (One Frame)
Pose Estimation Adjusted Template
Live data
![Page 32: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/32.jpg)
Limb length scales
• Represent the vertex as a function of the limb length scales
• Differential bone coordinates [Straka et al. 2012]
– Template vertex 𝒗𝒎= LinearFunction(scales)
Vertex
Differential bone coordinate (single control bone case)
Joint
Joint Parent
![Page 33: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/33.jpg)
Limb length scales estimation
• Iterate between pose estimation and scale estimation
• Scale energy function
– Negative complete log likelihood with 𝒗𝒎 = Linearfunction(scales)
– Regularization
• Symmetric bones have similar scales
• Connected bones have similar scales
![Page 34: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/34.jpg)
Limb length scales estimation (cont.)
Initial Pose Only Pose + Scale
![Page 35: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/35.jpg)
Shape Adaptation (cont.)
• Update for each 5 frames
Scale adapted only (frame 9)
First update (frame 10)
Stable (frame 40)
No shape adapt With shape adapt No shape adapt With shape adapt
![Page 36: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/36.jpg)
Evaluations
Comparison in terms of joint distance errors (unit = meter)
Comparison in terms of marker distance errors (unit = meter)
Ye Ye et al. 2014
![Page 37: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/37.jpg)
Qualitative Evaluations
Comparisons with KinectSDK
Kin
ect
SDK
O
urs
![Page 38: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/38.jpg)
Qualitative Evaluations - Video
![Page 39: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/39.jpg)
Application: shape collection registration
• Align a single skin template to a collection of meshes
Init
ial
Alig
ned
![Page 40: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/40.jpg)
Applications Health Care Virtual Try-On
Garments
Tan et al. 2013 Virtual Accessory
Mao et al. 2014
Reflexion Health
Jintronix
![Page 41: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/41.jpg)
Summary
• Advantages
– Metric Input/Output
– Fast and robust algorithms
• Challenges
– Outdoor
– Large Deformation
– Crowd Mocap
![Page 42: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/42.jpg)
Acknowledgment
• Christian Theobalt
• J. Shotton et al.
• Mao Ye
![Page 43: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/43.jpg)
References
• Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
• [Gall et al 2009] J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel, Motion capture using joint skeleton tracking and surface estimation, in IEEE CVPR, 2009
• [Cui et al 2012] Y. Cui, W. Chang, T. N•oll, and D. Stricker, Kinectavatar: Fully automatic body capture using a single kinect, in ACCV 2012 Workshop on Color Depth Fusion in Computer Vision, 2012
• [Change et al 2011] W. Chang and M. Zwicker, Global registration of dynamic range scans for articulated model reconstruction, ACM TOG, 2011
• [Weiss et al 2011] A. Weiss, D. Hirshberg, and M. J. Black, Home 3D body scans from noisy image and range data, in ICCV, 2011.
• [Strake et al 2012] M. Straka, S. Hauswiesner, M. R•uther, and H. Bischof, Simultaneous shape and pose adaption of articulated models using linear optimization, in ECCV, 2012
• [Baak et al 2011] A. Baak, M. M•uller, G. Bharaj, H.-P. Seidel, and C. Theobalt, A data-driven approach for realtime full body pose reconstruction from a depth camera, in ICCV, 2011
• [Ganapathi et al 2010] V. Ganapathi, C. Plagemann, D. Koller, and S. Thrun, Real time motion capture using a single time-ofight camera, in IEEE CVPR, 2010
• [Ganapathi et al 2012] V. Ganapathi, C. Plagemann, D. Koller and S. Thrun, Real Time Human Pose Tracking from Range Data, in ECCV, 2012
Mao Ye University of Kentucky 53
![Page 44: Motion Capture from RGB-D Camera - uni-bonn.depages.iai.uni-bonn.de/gall_juergen/tutorials/slides14/cvpr2014... · –very fast to compute input depth image x Δ Δ x Δ x x Δ x](https://reader035.vdocuments.mx/reader035/viewer/2022070821/5f1ebb40633f9e655e4a66e8/html5/thumbnails/44.jpg)
• [de Aguiar et al 2008] E. de Aguiar, C. Stoll, C. Theobalt, N. Ahmed, H. Seidel, and S. Thrun. 2008. Performance capture from sparse multi-view video. In ACM SIGGRAPH 2008
• [Vlasic et al 2008] D. Vlasic, I. Baran, W. Matusik, and J. Popovic, Articulated mesh animation from multi-view silhouettes," in ACM SIGGRAPH 2008
• [Ballan et al 2008] L. Ballan and G. M. Cortelazzo, Marker-less Motion Capture of Skinned Models in a Four Camera Set-up using Optical Flow and Silhouettes, in 3DPVT, 2008
• [Gall et al 2011] J Gall, A Fossati L Van Gool, Functional categorization of objects using real-time markerless motion capture, in CVPR, 2011
• [Liu et al 2013] Y. Liu, J. Gall, C. Stoll, Q. Dai, H.P. Seidel, C. Theobalt, Markerless Motion Capture of Multiple Characters Using Multiview Image Segmentation. IEEE TPAMI, 2013
• [Li et al 2013] H. Li, E. Vouga, A. Gudym, J. Barron, L. Luo and G. Gusev, 3D Self-Portraits , in ACM SIGGRAPH Asia, 2013
• [Li et al 2009] H. Li, B. Adams, L. Guibas, M. Pauly. Robust Single-View Geometry and Motion Reconstruction, in ACM SIGGRAPH, 2009
• [Shotton et al 2011] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, Real-time human pose recognition in parts from single depth images, in CVPR, 2011
• [Helten et al 2013a] T. Helten, A. Baak, G. Bharaj, M. Muller, H.P. Seidel, C. Theobalt, Personalization and Evaluation of a Real-time Depth-based Full Body Tracker, in 3DV, 2013
• [Helten et al 2013b] T. Helten, A. Baak, M. Muller, C. Theobalt, Full-Body Human Motion Capture from Monocular Depth Images, in LNCS, 2013
• [Ye et al 2011] M. Ye, X. Wang, R. Yang, L. Ren and M. Pollefeys. Accurate 3D Pose Estimation from a Single Depth Image. In ICCV, 2011
• [Ye and Yang 2014] M. Ye and R. Yang, Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera, in CVPR 2014
• [Mueslund et al 2006] T. B. Moeslund, A. Hilton, and V. Kr•uger, A survey of advances in vision-based human motion capture and analysis, in CVIU, 2006
54
References (Cont.)