TRANSCRIPT
Learning Human Pose and Motion Models for Animation
Aaron Hertzmann, University of Toronto
Animation is maturing …
… but it’s still hard to create
Keyframe animation
Keyframe animation
http://www.cadtutor.net/dd/bryce/anim/anim.html
Keyframe poses q1, q2, q3, … are interpolated into a continuous motion curve q(t).
Characters are very complex
Woody:
• 200 facial controls
• 700 controls in his body
http://www.pbs.org/wgbh/nova/specialfx2/mcqueen.html
Motion capture
[Images from NYU and UW]
Motion capture
Mocap is not a panacea
Goal: model human motion
What motions are likely?
Applications:• Computer animation• Computer vision
Related work: physical models
• Accurate, in principle
• Too complex to work with
  (but see [Liu, Hertzmann, Popović 2005])
• Computationally expensive
Related work: motion graphs
Input: raw motion capture
“Motion graph”(slide from J. Lee)
Approach: statistical models of motions
Learn a PDF over motions, and synthesize from this PDF [Brand and Hertzmann 1999]
What PDF do we use?
Style-Based Inverse Kinematics
with: Keith Grochow, Steve Martin, Zoran Popović
Motivation
Body parameterization
Pose at time t: qt
Root pos./orientation (6 DOFs)
Joint angles (29 DOFs)
Motion: X = [q1, …, qT]
Forward kinematics
Pose to 3D positions:
qt → FK → [xi, yi, zi]t
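As a concrete illustration of the FK mapping, here is a minimal sketch for a planar chain (the function and the two-link example are hypothetical, for illustration only; the talk's FK maps a full-body pose qt, including root position/orientation, to 3D joint positions):

```python
import numpy as np

def fk_planar(joint_angles, link_lengths):
    """Map a pose vector of joint angles to joint positions for a
    planar kinematic chain rooted at the origin (2D for brevity)."""
    positions, pos, total = [], np.zeros(2), 0.0
    for q, length in zip(joint_angles, link_lengths):
        total += q  # angles accumulate down the chain
        pos = pos + length * np.array([np.cos(total), np.sin(total)])
        positions.append(pos.copy())
    return np.array(positions)

# Two-link arm: first joint at +90 degrees, elbow bent back -90 degrees.
pts = fk_planar([np.pi / 2.0, -np.pi / 2.0], [1.0, 1.0])
# pts[0] ≈ (0, 1); pts[1] ≈ (1, 1)
```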
Problem Statement
Generate a character pose based on a chosen style subject to constraints
Constraints
Degrees of freedom (DOFs) q
Approach
Off-line learning: motion data → learning → style
Real-time pose synthesis: style + constraints → synthesis → pose
Features
y(q) = [q, orientation(q), velocity(q)] = [q0, q1, q2, …, r0, r1, r2, v0, v1, v2, …]
Goals for the PDF
• Learn PDF from any data
• Smooth and descriptive
• Minimal parameter tuning
• Real-time synthesis
Mixtures-of-Gaussians
GPLVM
[Figure: a 2D latent space (axes x1, x2) mapped to a higher-dimensional feature space (axes y1, y2, y3)]
Gaussian Process Latent Variable Model [Lawrence 2004]
x ~ N(0, I)
y ~ GP(x; β)
Learning: arg max p(X, β | Y) = arg max p(Y | X, β) p(X)
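A rough sketch of what this learning problem looks like numerically, assuming an RBF kernel and off-the-shelf optimization (all function names, kernel parameters, and data here are illustrative, not from the talk):

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, beta):
    # k(x, x') = beta0 * exp(-(beta1/2) ||x - x'||^2), plus noise 1/beta2 on the diagonal
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return beta[0] * np.exp(-0.5 * beta[1] * sq) + np.eye(len(X)) / beta[2]

def neg_log_posterior(x_flat, Y, latent_dim, beta):
    # -ln p(Y | X, beta) - ln p(X), dropping additive constants:
    #   (D/2) ln|K| + (1/2) tr(K^-1 Y Y^T) + (1/2) ||X||^2
    N, D = Y.shape
    X = x_flat.reshape(N, latent_dim)
    K = rbf_kernel(X, beta)
    _, logdet = np.linalg.slogdet(K)
    data_term = 0.5 * D * logdet + 0.5 * np.trace(np.linalg.solve(K, Y) @ Y.T)
    prior_term = 0.5 * np.sum(X ** 2)  # from x ~ N(0, I)
    return data_term + prior_term

# Toy data: 20 feature vectors of dimension 5, embedded in a 2D latent space.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 5))
beta = [1.0, 1.0, 100.0]
x0 = 0.1 * rng.standard_normal(20 * 2)
res = minimize(neg_log_posterior, x0, args=(Y, 2, beta),
               method="L-BFGS-B", options={"maxiter": 50})
```

In practice the kernel hyperparameters β are optimized jointly with X; they are held fixed here to keep the sketch short.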
Scaled Outputs
Different DOFs have different “importances”
Solution: scale the RBF kernel function k(x, x′) per output dimension:
ki(x, x′) = k(x, x′) / wi²
Equivalently: learn x → Wy, where W = diag(w1, w2, …, wD)
Precision in Latent Space
The learned GP gives both a predicted pose f(x; θ) and a prediction variance σ²(x) at every latent point x.

SGPLVM Objective Function
L_IK(x, y; θ) = ||W(y − f(x; θ))||² / (2σ²(x)) + (D/2) ln σ²(x)
where D is the dimension of the feature vector y.
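The SGPLVM inverse-kinematics objective L_IK(x, y) = ||W(y − f(x))||² / (2σ²(x)) + (D/2) ln σ²(x) is easy to evaluate once the GP mean and variance are in hand. A hypothetical sketch (the stand-in functions f and sigma2 below are illustrative, not the real SGPLVM posterior):

```python
import numpy as np

def sgplvm_ik_objective(y, x, gp_mean, gp_var, W):
    """L_IK(x, y) = ||W (y - f(x))||^2 / (2 sigma^2(x)) + (D/2) ln sigma^2(x).
    gp_mean and gp_var stand in for the learned GP's f(x) and sigma^2(x)."""
    D = len(y)
    resid = W @ (y - gp_mean(x))
    var = gp_var(x)
    return resid @ resid / (2.0 * var) + 0.5 * D * np.log(var)

# Hypothetical stand-ins for a learned model:
f = lambda x: np.array([x[0], x[1], x[0] + x[1]])
sigma2 = lambda x: 1.0 + x @ x   # variance grows away from the data
W = np.eye(3)
val = sgplvm_ik_objective(np.zeros(3), np.zeros(2), f, sigma2, W)
# At a perfect fit with unit variance, both terms vanish, so val == 0
```

The variance term is what pulls synthesized poses toward well-modeled regions of the latent space.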
[Figure: a pose y = f(x; θ) is synthesized from a latent point x (axes x1, x2) into feature space (axes y1, y2, y3), subject to constraints C]
Baseball Pitch
Track Start
Jump Shot
Style interpolation
Given two styles θ1 and θ2, can we “interpolate” them?
p1(y) = exp(−L_IK(y; θ1))
p2(y) = exp(−L_IK(y; θ2))

Style interpolation
Naive blend: p(y) ∝ p1(y)^(1−s) · p2(y)^s

Style interpolation in log space
p(y) ∝ exp(−((1−s) L_IK(y; θ1) + s L_IK(y; θ2)))
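Log-domain interpolation amounts to minimizing the blended objective (1−s)·L1(y) + s·L2(y). A minimal sketch with toy quadratic objectives standing in for the two styles' L_IK terms (all names and values here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Toy quadratic stand-ins for L_IK(y; theta_1) and L_IK(y; theta_2):
def L1(y):
    return 0.5 * np.sum((y - 1.0) ** 2)   # "style 1" prefers poses near +1

def L2(y):
    return 0.5 * np.sum((y + 1.0) ** 2)   # "style 2" prefers poses near -1

def blended_objective(y, s):
    # log-domain interpolation: -ln[p1(y)^(1-s) p2(y)^s] = (1-s) L1(y) + s L2(y)
    return (1.0 - s) * L1(y) + s * L2(y)

# At s = 0.25 the optimum is the weighted average 0.75*(+1) + 0.25*(-1) = 0.5.
y_star = minimize(blended_objective, np.zeros(2), args=(0.25,)).x
```

Sweeping s from 0 to 1 moves the synthesized pose smoothly from one style to the other.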
Interactive Posing
Interactive Posing
Interactive Posing
Multiple motion style
Realtime Motion Capture
Style Interpolation
Trajectory Keyframing
Posing from an Image
Modeling motion
• The GPLVM doesn’t model motions
• Velocity features are a hack
How do we model and learn dynamics?
Gaussian Process Dynamical Models
with: David Fleet, Jack Wang
Dynamical models
xt → xt+1
Hidden Markov Model (HMM)
Linear Dynamical Systems (LDS)
[van Overschee et al ‘94; Doretto et al ‘01]
Switching LDS[Ghahramani and Hinton ’98; Pavlovic et al ‘00; Li et al ‘02]
Nonlinear Dynamical Systems[e.g., Ghahramani and Roweis ‘00]
Dynamical models
Gaussian Process Dynamical Model (GPDM)
Marginalize out the mapping parameters, and then optimize the latent positions to simultaneously minimize pose reconstruction error and (dynamic) prediction error on the training data.
pose reconstruction
latent dynamics
Latent dynamical model:
xt = f(xt−1; A) + nx,t
yt = g(xt; B) + ny,t
Assume i.i.d. Gaussian noise, with Gaussian priors on A and B.

Dynamics
Marginalizing over A, the latent dynamic process on X = [x1, …, xN] has a similar form:
p(X | ᾱ) ∝ |KX|^(−d/2) exp(−½ tr(KX⁻¹ Xout Xoutᵀ)) p(x1)
where Xout = [x2, …, xN]ᵀ, and KX is a kernel matrix defined by kernel function kX(x, x′) with hyperparameters ᾱ.
Subspace dynamical model:
Markov Property
Remark: Conditioned on the mapping parameters A, the dynamical model is first-order Markov, but marginalizing over A introduces longer-range temporal dependence.
Learning
To estimate the latent coordinates and kernel hyperparameters, we minimize −ln p(X, ᾱ, β̄ | Y) with respect to X, ᾱ, and β̄.

GPDM posterior:
p(X, ᾱ, β̄ | Y) ∝ p(Y | X, β̄) · p(X | ᾱ) · p(ᾱ) p(β̄)
(reconstruction likelihood × dynamics likelihood × priors)
where Y is the training motions, X the latent trajectories, and ᾱ, β̄ the hyperparameters.
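The GPDM objective is a sum of two GP negative log-likelihood terms (reconstruction and dynamics) plus hyperparameter priors. A hypothetical sketch under illustrative RBF kernels and 1/a priors (all names and values are assumptions, not the talk's implementation):

```python
import numpy as np

def rbf_kernel(X, k0, k1, k2):
    # k(x, x') = k0 * exp(-(k1/2) ||x - x'||^2), plus noise 1/k2 on the diagonal
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return k0 * np.exp(-0.5 * k1 * sq) + np.eye(len(X)) / k2

def gp_neg_log_lik(K, T):
    # -ln of a zero-mean GP likelihood over the columns of T, up to constants:
    # (cols/2) ln|K| + (1/2) tr(K^-1 T T^T)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * T.shape[1] * logdet + 0.5 * np.trace(np.linalg.solve(K, T) @ T.T)

def gpdm_neg_log_posterior(X, Y, alpha, beta):
    recon = gp_neg_log_lik(rbf_kernel(X, *beta), Y)                # p(Y | X, beta)
    dynamics = gp_neg_log_lik(rbf_kernel(X[:-1], *alpha), X[1:])   # p(X | alpha)
    priors = np.sum(np.log(alpha)) + np.sum(np.log(beta))          # p(a) ∝ 1/a
    return recon + dynamics + priors

# Toy data: 10 latent points in 3D, 10 poses in 6D.
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))
Y = rng.standard_normal((10, 6))
val = gpdm_neg_log_posterior(X, Y, (1.0, 1.0, 100.0), (1.0, 1.0, 100.0))
```

Learning would minimize this jointly over X and the hyperparameters, e.g. with L-BFGS.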
Motion Capture Data
56 joint angles + 3 global translational velocities + 3 global orientations, from the CMU motion capture database
~2.5 gait cycles (157 frames)
Learned latent coordinates (1st-order prediction, RBF kernel)
3D GPLVM Latent Coordinates
large “jumps” in latent space
Reconstruction Variance
Volume visualization of the reconstruction variance σ²(x).
(1st-order prediction, RBF kernel)
Motion Simulation
Animation of mean motion (200 step sequence)
initial state
Random trajectories from MCMC (~1 gait cycle, 60 steps)
Simulation: 1st-Order Mean Prediction
Red: 200 steps of mean prediction
Green: 60-step MCMC mean
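Mean-prediction simulation just iterates the GP posterior mean of the dynamics. A sketch on toy circular data standing in for a gait cycle (the function, kernel settings, and data are illustrative assumptions):

```python
import numpy as np

def gp_mean_rollout(X_train, x0, steps, k0=1.0, k1=2.0, jitter=1e-4):
    """Iterate the GP posterior mean of the learned dynamics:
    x_{t+1} = k(x_t)^T K^{-1} X_out, with K built from X_in = X_train[:-1]."""
    X_in, X_out = X_train[:-1], X_train[1:]
    sq = np.sum((X_in[:, None] - X_in[None, :]) ** 2, axis=-1)
    K = k0 * np.exp(-0.5 * k1 * sq) + jitter * np.eye(len(X_in))
    Kinv_out = np.linalg.solve(K, X_out)
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        k_star = k0 * np.exp(-0.5 * k1 * np.sum((X_in - traj[-1]) ** 2, axis=-1))
        traj.append(k_star @ Kinv_out)
    return np.array(traj)

# Toy "gait cycle": latent points on a circle; roll out from the first point.
angles = np.linspace(0.0, 2.0 * np.pi, 13)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
traj = gp_mean_rollout(circle, circle[0], steps=10)
```

MCMC sampling, by contrast, would add the predictive noise at each step rather than following the mean.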
Animation
Missing Data
50 of 147 frames dropped (almost a full gait cycle)
spline interpolation
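The spline baseline's failure mode can be reproduced on a toy signal (toy data, not the CMU sequence): with nearly a full cycle missing, a cubic spline bridges the gap smoothly instead of reproducing the oscillation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Toy 1-DOF "joint angle": ~2.5 cycles over 147 frames, with 50 frames dropped.
t = np.arange(147)
signal = np.sin(2.0 * np.pi * t / 60.0)
observed = np.ones(147, dtype=bool)
observed[50:100] = False  # drop 50 contiguous frames (almost a full cycle)

# Spline baseline: fit the observed frames, evaluate at the missing ones.
spline = CubicSpline(t[observed], signal[observed])
gap_error = np.max(np.abs(spline(t[~observed]) - signal[~observed]))
```

A learned dynamical model can instead fill the gap with a plausible cycle, which is the point of the RBF-dynamics comparison.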
Missing Data: RBF Dynamics
Determining hyperparameters
Hyperparameter settings compared: GPDM, Neil’s parameters, MCEM
Data: six distinct walkers
Where do we go from here?
Let’s look at some limitations of the model
60 Hz vs. 120 Hz
What do we want?
Phase
Variation
[Figure: a walk cycle traced in the (x1, x2) latent space]
Branching motions
Walk Run
Stylistic variation
Current work: manifold GPs
Latent space (x) Data space (y)
Summary
GPLVM and GPDM provide priors from small data sets
Dependence on initialization, hyperpriors, latent dimensionality
Open problems: modeling data topology and stylistic variation