Bilinear models and Riemannian metrics for
motion classification
Fabio Cuzzolin
Microsoft Research, Cambridge, UK
11/7/2006
Myself
- Master's thesis on gesture recognition at the University of Padova
- Visiting student, ESSRL, Washington University in St. Louis
- Ph.D. thesis on the theory of belief functions
- Young researcher in Milan with the Image and Sound Processing group
- Post-doc at UCLA in the Vision Lab
My research
- Discrete mathematics: linear independence on lattices
- Belief functions and imprecise probabilities: geometric approach, algebraic analysis, combinatorial analysis
- Computer vision: object and body tracking, data association, gesture and action recognition, identity recognition
Today's talk
- Motion classification is one of the most popular vision problems
- Applications: surveillance, biometrics, human-computer interaction
- Issue: influence of nuisance factors -> bilinear models for invariant gaitID
- Issue: choice of the distance function -> learning Riemannian metrics for motion classification
Outline
Bilinear models for invariant gaitID: the identity recognition problem, view-invariance in gaitID, bilinear models, HMMs and a three-layer model, four experiments on the Mobo database
Riemannian metrics for classification: distances between dynamical models, learning a metric from a training set, pullback metrics, spaces of linear systems and the Fisher metric, experiments on scalar models
GaitID
- Biometrics are increasingly popular
- Cooperative methods: face recognition, retinal analysis
- Surveillance context: non-cooperative users
- The problem: recognizing the identity of humans from their gait
- Methods: dimensionality reduction, silhouette analysis
- Issues: nuisance factors, viewpoint dependence
A brief review
Gait signatures:
- silhouettes [Collins 02, Wang 03]
- optical flow, velocity moments, shape symmetry, static body parameters
"Baseline" algorithm [Sarkar 05]: computes similarity scores between a probe sequence and each gallery (training) sequence by pairwise frame correlation
Methodologies: mostly pattern recognition after dimensionality reduction:
- eigenspaces [Abdelkader 01], PCA/MDA [Tolliver 03, Han 04]
- stochastic models (HMMs) [Kale 02, Debrunner 00], KL-divergence between Markov models
The view-invariance issue
- Many different nuisance factors are involved: viewpoint, illumination, clothes, shoes, carried objects, trajectory
- Issue: view-invariance; possible approaches: 3D tracking, virtual view reconstruction, static body parameters
Approaches to view-invariant gaitID
- [Cunado 99]: "evidence gathering" technique: coupled oscillators, Fourier description, inclination of thigh and leg
- [Urtasun, Fua 04]: fitting 3D temporal motion models to synchronized video sequences; motion parameters: coefficients of the singular value decomposition of the estimated model angles
- [Bhanu, Han 02]: matching a 3D kinematic model to 2D silhouettes, then extracting a number of feature angles from the fitted model
- [Kale 03]: synthetic side view of the moving person using a single camera
- [Shakhnarovich 01]: view normalization from volumetric intersection of the visual hulls
- [Johnson, Bobick 01]: static body parameters recovered across multiple views
Bilinear models
- From view-invariance to "style" invariance: motions usually possess several labels: action, identity, viewpoint, emotional state, etc.
- Bilinear models (Tenenbaum) can be used to separate the influence of two of those factors, called "style" and "content" (the label to classify)
- y^{sc} is a training set of k-dimensional observations with labels s and c
- b^c is a parameter vector representing content, while A^s is a style-specific linear map from the content space onto the observation space:

y^{sc} = A^s b^c
Bilinear models
The "content" of an observation can be thought of as a vector b^c in an abstract "content space" of some dimension J. Observations y^{sc} are then derived from the content vector linearly, through a map A^s which depends on the "style" parameter s.
Learning an asymmetric bilinear model
Given a training set of observations y^{sc}, an asymmetric bilinear model can be fitted to the data through the SVD Y = U S V' of the stacked observation matrix

Y = \begin{pmatrix} y^{11} & \cdots & y^{1C} \\ \vdots & & \vdots \\ y^{S1} & \cdots & y^{SC} \end{pmatrix}

The asymmetric model can be written as Y = AB, where A = [A^1, ..., A^S]' stacks the style matrices and B = [b^1, ..., b^C] collects the content vectors. The least-squares optimal style and content parameters are

A = [U S]_{\text{columns } 1..J}, \quad B = [V']_{\text{rows } 1..J}
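As an illustration, here is a minimal numpy sketch of this SVD-based fit. The block layout of Y_blocks, the toy dimensions, and the function name fit_asymmetric are choices made for this example, not code from the original work.

```python
import numpy as np

def fit_asymmetric(Y_blocks, J):
    """Y_blocks: (S, C, k) array of k-dim observations y^{sc}; J = content dimension."""
    S, C, k = Y_blocks.shape
    # Stack observations into the (S*k) x C matrix Y, one block row per style.
    Y = Y_blocks.transpose(0, 2, 1).reshape(S * k, C)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U * s)[:, :J]   # stacked style matrices [A^1; ...; A^S]
    B = Vt[:J, :]        # content vectors b^1..b^C as columns
    return A, B

# Usage: A.reshape(S, k, J)[s] recovers the style map A^s, and
# A.reshape(S, k, J)[s] @ B[:, c] approximates the observation y^{sc}.
rng = np.random.default_rng(0)
Yb = rng.normal(size=(5, 4, 30))   # toy data: 5 styles, 4 contents, k = 30
A, B = fit_asymmetric(Yb, J=3)
```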
Content classification of unknown style
- Consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint)
- When new motions are acquired in which a known person is walking from a different viewpoint (unknown style)...
- ... an iterative EM procedure can be set up to classify the content (identity)
- E step: estimation of p(c | s̃), the probability of the content given the current estimate s̃ of the style
- M step: estimation of the linear map A^{s̃} for the unknown style s̃, under the Gaussian observation model

p(y \mid \tilde s, c) \propto \exp\left( -\| y - A^{\tilde s} b^c \|^2 / 2\sigma^2 \right)
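A possible implementation of this EM loop, under the Gaussian model above, might look as follows. The uniform content prior, the fixed variance sigma2, the random initialization, and the weighted least-squares form of the M step are assumptions of this sketch, not details taken from the talk.

```python
import numpy as np

def classify_content(y, B, sigma2=1.0, n_iter=20, seed=0):
    """y: k-dim test observation; B: (J, C) matrix of learned content vectors."""
    k, (J, C) = y.shape[0], B.shape
    rng = np.random.default_rng(seed)
    A_tilde = rng.normal(size=(k, J))          # init map for the unknown style
    for _ in range(n_iter):
        # E step: p(c | s~, y) from the Gaussian likelihoods (uniform prior on c).
        sq = np.sum((y[:, None] - A_tilde @ B) ** 2, axis=0)
        logp = -sq / (2 * sigma2)
        p = np.exp(logp - logp.max())
        p /= p.sum()
        # M step: weighted least-squares refit of the style matrix
        # (pinv because a single observation makes the system rank-deficient).
        M = (B * p) @ B.T                      # sum_c p_c b^c (b^c)'
        A_tilde = np.outer(y, B @ p) @ np.linalg.pinv(M)
    return int(np.argmax(p)), A_tilde          # most likely content (identity)
```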
Hidden Markov models
- Finite-state representation of an observation process; the state process {X_k} is a Markov chain
- Given a sequence of observations (feature matrix)... EM algorithm for parameter learning (Moore)
- A -> transition probabilities (motion dynamics); C -> means of the state-output distributions (poses)
Motions as stacked HMMs
- Interpretation of the C matrix: its columns are the means of the output distributions associated with the states of the model
- In gaitID (cyclic motions) the dynamics is the same for all sequences, so A is neglected
- A sequence can then be represented as a collection of poses: the stacked columns of its C matrix
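For instance, assuming the hmmlearn package for the EM learning step, this second-layer encoding could be sketched like this; the number of states and the crude state-reordering trick are illustrative choices.

```python
import numpy as np
from hmmlearn import hmm   # assumed available; any EM-based HMM fit would do

N_STATES = 4               # arbitrary choice for illustration

def sequence_to_vector(X):
    """X: (n_frames, n_features) feature matrix of one sequence."""
    model = hmm.GaussianHMM(n_components=N_STATES, covariance_type="diag",
                            n_iter=50, random_state=0)
    model.fit(X)                       # EM learning of A and C
    C = model.means_                   # rows = mean poses, one per state
    # State labels are arbitrary: fix an ordering so vectors are comparable.
    order = np.argsort(C[:, 0])
    return C[order].reshape(-1)        # stacked observation vector
```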
Three-layer model
1. First layer (feature representation): projection of the contour of the silhouette onto a sheaf of lines passing through its center
2. Second layer: each sequence is encoded as a Markov model, and its C matrix is stacked into an observation vector
3. Third layer: a bilinear model is trained over those vectors
MOBO database
- 25 people performing 4 different walking actions, viewed from 6 cameras
- Each sequence has three labels: action, ID, view
Four experiments
We can then set up four experiments in which one label is chosen as content, another one as style, and the remaining one is considered a nuisance factor:

experiment                        | content | style  | nuisance
view-invariant action recognition | action  | view   | ID
ID-invariant action recognition   | action  | ID     | view
action-invariant gaitID           | ID      | action | view
view-invariant gaitID             | ID      | view   | action
Results – ID versus VIEW
Compared performances with the "baseline" algorithm and straight k-NN on sequence HMMs
Results – ID versus action
Performance of the bilinear classifier in the ID vs action experiment as a function of the nuisance (view = 1:5), averaged over all possible choices of the test action. The average best-match performance of the bilinear classifier is shown in solid red (minimum and maximum in magenta); the best-3-matches ratio is in dotted red. The average performance of the KL nearest-neighbor classifier is shown in solid black (minimum and maximum in blue). Pure chance is in dashed black.
Feature extraction
- Type 1: projection of the contour of the silhouette onto a sheaf of lines passing through the center
- Type 2: size functions [Frosini 90]
- Type 3: Lee's moments
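Under one plausible reading of the Type 1 feature (recording the extent of the contour along each line of the sheaf), a sketch might look like this; the number of lines L and the use of the centroid as center are assumptions of this example.

```python
import numpy as np

def contour_projection(contour, L=16):
    """contour: (n_points, 2) array of silhouette boundary points."""
    center = contour.mean(axis=0)               # center of the sheaf
    d = contour - center
    angles = np.pi * np.arange(L) / L           # L line directions
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    proj = d @ dirs.T                           # signed projections onto each line
    return proj.max(axis=0) - proj.min(axis=0)  # extent of the contour per line
```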
Results - influence of features
Left: ID-invariant action recognition using the bilinear classifier. The entire dataset is considered, regardless of the viewpoint. The correct-classification percentage is shown as a function of the test identity, in black for models using Lee's features and in red for contour projections; the related mean levels are drawn as dotted lines. Right: view-invariant action recognition.
Conclusions
- Nuisance factors are of paramount importance in gaitID
- Bilinear and multilinear models provide a way to separate different factors
- Proposed a three-layer model in which sequences are represented through HMMs
- Some approaches to view-invariance are expensive and sensitive
- Experiments on the Mobo database show how effective separating the factors is for motion classification
- Future: multilinear models, testing on more realistic setups (many factors, UCF database)
Riemannian metrics for classification
Distances between dynamical models, learning a metric from a training set, pullback metrics, spaces of linear systems and the Fisher metric, experiments on scalar models
Distances between dynamical models
- Problem: motion classification
- Approach: representing each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA, or AR, linear model
- Classification is then reduced to finding a suitable distance function in the space of dynamical models
- We can then use this distance in any distance-based classification scheme: k-NN, SVM, etc.
A review of the literature
Some distances have been proposed:
- A family of probability distributions depending on an n-dimensional parameter can in fact be regarded as an n-dimensional manifold, with the Fisher information matrix as metric tensor [Amari]:

g_{ij}(\theta) = E\left[ \partial_i \log p(x; \theta) \, \partial_j \log p(x; \theta) \right]

- Kullback-Leibler divergence
- Gap metric [Zames, El-Sakkary]: compares the graphs associated with linear systems thought of as input-output maps
- Cepstrum norm [Martin]
- Subspace angles between the column spaces of the observability matrices
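For example, a subspace-angles distance in the spirit of the Martin / De Cock-De Moor construction can be sketched as follows, assuming scipy.linalg.subspace_angles is available; the observability depth and the exact combination of the angles are illustrative choices, not the precise definition used in the literature.

```python
import numpy as np
from scipy.linalg import subspace_angles   # SciPy >= 1.0

def observability(A, C, depth=10):
    """Extended observability matrix [C; CA; CA^2; ...] of a model (A, C)."""
    blocks, M = [], C
    for _ in range(depth):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def subspace_distance(A1, C1, A2, C2, depth=10):
    th = subspace_angles(observability(A1, C1, depth),
                         observability(A2, C2, depth))
    # Distance from the principal angles; 0 for identical column spaces.
    return np.sqrt(-np.log(np.prod(np.cos(th) ** 2)))
```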
Learning metrics from a training set
- All those metrics are task-specific; besides, it makes no sense to choose a single distance for all possible classification problems, as labels can be assigned arbitrarily to dynamical systems, no matter what the underlying structure is
- When some a-priori information is available (a training set)... we can learn in a supervised fashion the "best" metric for the classification problem!
- A feasible approach: volume minimization of pullback metrics
Learning distances
- Of course, many unsupervised algorithms take an input dataset and embed it in some other space, implicitly learning a metric (LLE, Laplacian eigenmaps, etc.); however, they fail to learn a full metric for the whole input space, learning only the images of a set of samples
- [Xing, Jordan]: maximizes classification performance over linear maps y = A^{1/2} x -> optimal Mahalanobis distance; reduces to convex optimization
- [Shental et al.]: relevant component analysis: changes the feature space by a global linear transformation which assigns large weights to "relevant dimensions" and low weights to irrelevant dimensions
Learning pullback metrics
- Some notions of differential geometry give us a tool to build a parameterized family of metrics
- Consider then a family of diffeomorphisms F between the original space M and a metric space N
- The diffeomorphism F induces on M a family of pullback metrics
- The geodesics of the pullback metric are the liftings of the geodesics associated with the original metric
[Diagram: F maps M to the metric space N, carrying the metric of N back onto M]
Pullback metrics - detail
Diffeomorphism from M to N:

F: M \to N, \quad m \mapsto F(m)

Push-forward map:

F_*: T_m M \to T_{F(m)} N, \quad v \mapsto F_* v

Given a metric g on N, g: TN \times TN \to \mathbb{R}, the pullback metric on M is

(F^* g)_m(u, v) = g_{F(m)}(F_* u, F_* v)
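Numerically, the pullback construction amounts to sandwiching the metric of N between Jacobians of F. A minimal sketch, with finite differences standing in for the exact Jacobian:

```python
import numpy as np

def jacobian(F, m, eps=1e-6):
    """Central-difference Jacobian of F at m; column j = dF/dm_j."""
    m = np.asarray(m, dtype=float)
    cols = [(F(m + eps * e) - F(m - eps * e)) / (2 * eps)
            for e in np.eye(len(m))]
    return np.stack(cols, axis=1)

def pullback_metric(F, g, m):
    """g(x): metric matrix at x in N; returns the pulled-back matrix at m,
    i.e. J_F(m)' g(F(m)) J_F(m)."""
    J = jacobian(F, m)
    return J.T @ g(F(m)) @ J
```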
Inverse volume
The inverse volume of the pullback metric g̃ around the training set {m_k, k = 1..N} is

O(D) = \frac{\sum_{k=1}^{N} \det\big(\tilde g(m_k)\big)^{-1/2}}{\int_M \det\big(\tilde g(m)\big)^{-1/2} \, dm}
Inverse volume maximization
- The natural criterion would be to optimize the classification performance directly
- In a nonlinear setup this is hard to formulate and solve
- It is reasonable to choose a different but related objective function
- Effect: finding the manifold which best interpolates the data (i.e., forcing the geodesics to pass through "crowded" regions)
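A sketch of how this criterion could be scored and optimized, reusing pullback_metric from the sketch above; the grid search and the unnormalized sum (the integral in the denominator is dropped) are simplifications made for this example, not the original optimization scheme.

```python
import numpy as np

def inverse_volume(F, g, samples):
    """Sum of inverse volume elements of the pullback metric at the samples."""
    return sum(1.0 / np.sqrt(np.linalg.det(pullback_metric(F, g, m)))
               for m in samples)

def best_parameter(make_F, g, samples, grid):
    """make_F(p) builds the diffeomorphism F_p; grid is a list of candidate p."""
    return max(grid, key=lambda p: inverse_volume(make_F(p), g, samples))
```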
Space of AR(2) models
- Given an input sequence, we can identify the parameters of the linear model which best describes it
- We chose the class of autoregressive models of order 2, AR(2); its Fisher metric is

g(a_1, a_2) = \frac{1}{(1 + a_2)(1 - a_1 - a_2)(1 + a_1 - a_2)} \begin{pmatrix} 1 - a_2 & a_1 \\ a_1 & 1 - a_2 \end{pmatrix}
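As a quick sanity check, the metric above can be evaluated numerically; this sketch only assumes the formula as reconstructed here, and verifies positive definiteness at one interior point of the stationarity triangle.

```python
import numpy as np

def fisher_ar2(a1, a2):
    """Fisher metric of an AR(2) model at (a1, a2), valid inside the
    stationarity triangle, where all three denominator factors are positive."""
    den = (1 + a2) * (1 - a1 - a2) * (1 + a1 - a2)
    return np.array([[1 - a2, a1], [a1, 1 - a2]]) / den

g = fisher_ar2(0.3, -0.2)
assert np.all(np.linalg.eigvalsh(g) > 0)   # positive definite inside the triangle
```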
Fisher metric on AR(2)
To get a distance: compute the geodesics of the pullback metric on M.
Space of M(1,1,1) models
- Consider instead the class of stable discrete-time linear systems of order 1:

x(k+1) = a\,x(k) + b\,u(k), \qquad y(k) = c\,x(k)

- After choosing the canonical setting c = 1, the transfer function becomes h(z) = b / (z - a)
- Under stability (|a| < 1) and minimality (b ≠ 0) this family forms a manifold with two connected components:

M(1,1,1) = \{(a, b) : |a| < 1, b > 0\} \cup \{(a, b) : |a| < 1, b < 0\}
Fisher tensor: in suitable coordinates (r, θ), with θ = \arctan\big(a / \sqrt{1 - a^2}\big) and r a reparameterization of b, the Fisher tensor takes the diagonal form

g(r, \theta) = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}
Families of diffeomorphisms
We chose two different families of diffeomorphisms, one for each model space:
- For AR(2) systems: a parametric family F_p(m) acting on the model coordinates (m_1, m_2, m_3), with parameter vector p = (λ_1, λ_2, λ_3)
- For M(1,1,1) systems: a parametric family F_p(r, b) acting on the (r, b) coordinates
Classification of scalar models
- Recognition of actions and identities from image sequences; we used the Mobo database
- Scalar feature, AR(2) and M(1,1,1) models
- Compared the performance of all known distances with the pullback Fisher metric
- Built the geodesic distance and used the NN algorithm to classify new sequences
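An end-to-end sketch of this pipeline, with a least-squares AR(2) fit and plain Euclidean distance in parameter space standing in for the learned pullback geodesic distance (both are stand-ins chosen for this example, not the method of the talk):

```python
import numpy as np

def fit_ar2(y):
    """Least-squares AR(2) fit of a scalar sequence y; returns (a1, a2)."""
    Y = y[2:]
    X = np.stack([y[1:-1], y[:-2]], axis=1)
    a, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return a

def nn_classify(test_seq, train_seqs, labels, dist=None):
    """1-NN in model space under a pluggable distance function."""
    dist = dist or (lambda u, v: np.linalg.norm(u - v))
    m = fit_ar2(test_seq)
    d = [dist(m, fit_ar2(s)) for s in train_seqs]
    return labels[int(np.argmin(d))]
```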
Results - action
- Action recognition performance, all views considered: second-best distance function
- Action recognition performance, all views considered: pullback Fisher metric
- Action recognition, view 5 only: difference between the classification rates of the pullback metric and the second-best distance
Results – action 2
Recognition performance of the second-best distance (blue) and the optimal pullback metric (red) for increasing size of the training set. Panels: view 1, view 5, view 3, view 6.
Effect of the training set
- The size of the training set obviously affects the recognition rate
- Systems of the class M(1,1,1); increasing size of the training set on the abscissae
- Panels: all views considered; view 2 only
Conclusions
- Movements can be represented as dynamical systems; motion classification then reduces to finding a distance between dynamical models
- Given a training set of such models, we can learn the "best" metric for a given classification problem... and use it to classify new sequences
- Pullback metrics induced by the Fisher-metric structure on linear models are a possible choice
- Designed a family of diffeomorphisms for each model space
- Future: multidimensional observations, better objective function