CVPR 2008 Tutorial: Generalized Principal Component Analysis

Generalized Principal Component Analysis Tutorial @ CVPR 2008
Yi Ma, ECE Department, University of Illinois at Urbana-Champaign
René Vidal, Center for Imaging Science, Institute for Computational Medicine, Johns Hopkins University


DESCRIPTION

Part I: page 1, Part II: page 12, Part III: page 43, Part IV: page 80, Part V: page 116

TRANSCRIPT

Page 1: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis Tutorial @ CVPR 2008

Yi Ma

ECE Department

University of Illinois

Urbana Champaign

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 2: CVPR2008 tutorial generalized pca

Data segmentation and clustering

• Given a set of points, separate them into multiple groups

• Discriminative methods: learn boundary

• Generative methods: learn a mixture model using, e.g., Expectation Maximization

Page 3: CVPR2008 tutorial generalized pca

Dimensionality reduction and clustering

• In many problems the data are high-dimensional: dimensionality can be reduced using, e.g., Principal Component Analysis

• Image compression

• Recognition– Faces (Eigenfaces)

• Image segmentation– Intensity (black-white)

– Texture

Page 4: CVPR2008 tutorial generalized pca

Segmentation problems in dynamic vision

• Segmentation of video and dynamic textures

• Segmentation of rigid-body motions

Page 5: CVPR2008 tutorial generalized pca

Segmentation problems in dynamic vision

• Segmentation of rigid-body motions from dynamic textures

Page 6: CVPR2008 tutorial generalized pca

Clustering data on non-Euclidean spaces

• Clustering data on non-Euclidean spaces – Mixtures of linear spaces

– Mixtures of algebraic varieties

– Mixtures of Lie groups

• “Chicken-and-egg” problems– Given segmentation, estimate models

– Given models, segment the data

– Initialization?

• Need to combine– Algebra/geometry, dynamics and statistics

Page 7: CVPR2008 tutorial generalized pca

Outline of the tutorial

• Introduction (8.00-8.15)

• Part I: Theory (8.15-9.45)

– Basic GPCA theory and algorithms (8.15-9.00)

– Advanced statistical methods for GPCA (9.00-9.45)

• Questions (9.45-10.00)

• Break (10.00-10.30)

• Part II: Applications (10.30-12.00)

– Applications to motion and video segmentation (10.30-11.15)

– Applications to image representation & segmentation (11.15-12.00)

• Questions (12.00-12.15)

Page 8: CVPR2008 tutorial generalized pca

Part I: Theory

• Introduction to GPCA (8.00-8.15)

• Basic GPCA theory and algorithms (8.15-9.00)

– Review of PCA and extensions

– Introductory cases: line, plane and hyperplane segmentation

– Segmentation of a known number of subspaces

– Segmentation of an unknown number of subspaces

• Advanced statistical methods for GPCA (9.00-9.45)

– Lossy coding of samples from a subspace

– Minimum coding length principle for data segmentation

– Agglomerative lossy coding for subspace clustering

Page 9: CVPR2008 tutorial generalized pca

Part II: Applications in computer vision

• Applications to motion & video segmentation (10.30-11.15)

– 2-D and 3-D motion segmentation

– Temporal video segmentation

– Dynamic texture segmentation

• Applications to image representation and segmentation

(11.15-12.00)

– Multi-scale hybrid linear models for sparse

image representation

– Hybrid linear models for image segmentation

Page 10: CVPR2008 tutorial generalized pca

References: Springer-Verlag 2008

Page 11: CVPR2008 tutorial generalized pca

Slides, MATLAB code, papers

Slides: http://www.vision.jhu.edu/gpca/cvpr08-tutorial-gpca.htm

Code: http://perception.csl.uiuc.edu/gpca

Page 12: CVPR2008 tutorial generalized pca

Part I

Generalized Principal Component Analysis

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 13: CVPR2008 tutorial generalized pca

Principal Component Analysis (PCA)

• Given a set of points x1, x2, …, xN

– Geometric PCA: find a subspace S passing through them

– Statistical PCA: find projection directions that maximize the variance

• Solution (Beltrami’1873, Jordan’1874, Hotelling’33, Eckart-Householder-Young’36)

• Applications: data compression, regression, computer vision (eigenfaces), pattern recognition, genomics

Basis for S
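A minimal numpy sketch of the geometric solution via the SVD (illustrative only, not the tutorial's MATLAB code; the function name `pca_basis` and the choice of d are assumptions):

```python
import numpy as np

def pca_basis(X, d):
    """Return the mean and a d-dimensional orthonormal basis for the subspace
    fit to the columns of X (one data point per column), via the SVD."""
    mu = X.mean(axis=1, keepdims=True)                   # center the data
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, U[:, :d]                                  # mean and principal directions

# Example: 200 noisy points near a 2-D plane in R^3
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((2, 200)), 0.01 * rng.standard_normal((1, 200))])
mu, B = pca_basis(X, d=2)
print(B.shape)   # (3, 2): orthonormal basis for the estimated plane
```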

Page 14: CVPR2008 tutorial generalized pca

Extensions of PCA

• Higher order SVD (Tucker’66, Davis’02)

• Independent Component Analysis (Comon ’94)

• Probabilistic PCA (Tipping-Bishop ’99)

– Identify subspace from noisy data

– Gaussian noise: standard PCA

– Noise in exponential family (Collins et al.’01)

• Nonlinear dimensionality reduction

– Multidimensional scaling (Torgerson’58)

– Locally linear embedding (Roweis-Saul ’00)

– Isomap (Tenenbaum ’00)

• Nonlinear PCA (Scholkopf-Smola-Muller ’98)

– Identify nonlinear manifold by applying PCA to data embedded in a high-dimensional space

• Principal Curves and Principal Geodesic Analysis (Hastie-Stuetzle ’89, Tibshirani ’92, Fletcher ’04)

Page 15: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis

• Given a set of points lying in multiple subspaces, identify

– The number of subspaces and their dimensions

– A basis for each subspace

– The segmentation of the data points

• “Chicken-and-egg” problem

– Given segmentation, estimate subspaces

– Given subspaces, segment the data

Page 16: CVPR2008 tutorial generalized pca

Prior work on subspace clustering

• Iterative algorithms:

– K-subspace (Ho et al. ’03),

– RANSAC, subspace selection and growing (Leonardis et al. ’02)

• Probabilistic approaches: learn the parameters of a mixture

model using e.g. EM

– Mixtures of PPCA: (Tipping-Bishop ‘99):

– Multi-Stage Learning (Kanatani’04)

• Initialization

– Geometric approaches: 2 planes in R3 (Shizawa-Maze ’91)

– Factorization approaches: independent subspaces of equal

dimension (Boult-Brown ‘91, Costeira-Kanade ‘98, Kanatani ’01)

– Spectral clustering based approaches: (Yan-Pollefeys’06)

Page 17: CVPR2008 tutorial generalized pca

Basic ideas behind GPCA

• Towards an analytic solution to subspace clustering

– Can we estimate ALL models simultaneously using ALL data?

– When can we do so analytically? In closed form?

– Is there a formula for the number of models?

• Will consider the most general case

– Subspaces of unknown and possibly different dimensions

– Subspaces may intersect arbitrarily (not only at the origin)

• GPCA is an algebraic geometric approach to data segmentation

– Number of subspaces = degree of a polynomial

– Subspace basis = derivatives of a polynomial

– Subspace clustering is algebraically equivalent to

• Polynomial fitting

• Polynomial differentiation

Page 18: CVPR2008 tutorial generalized pca

Applications of GPCA in computer vision

• Geometry

– Vanishing points

• Image compression

• Segmentation

– Intensity (black-white)

– Texture

– Motion (2-D, 3-D)

– Video (host-guest)

• Recognition

– Faces (Eigenfaces)

• Man - Woman

– Human Gaits

– Dynamic Textures

• Water-bird

• Biomedical imaging

• Hybrid systems identification

Page 19: CVPR2008 tutorial generalized pca

Introductory example: algebraic clustering in 1D

• Number of groups?

Page 20: CVPR2008 tutorial generalized pca

Introductory example: algebraic clustering in 1D

• How to compute n, c, b’s?

– Number of clusters

– Cluster centers

– Solution is unique if

– Solution is closed form if
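In symbols (reconstructing the slide's missing formulas under the standard 1-D GPCA example; the symbols n, c_j, b_k follow the slide's "n, c, b's"):

```latex
% Every noise-free sample x_i drawn from one of n centers c_1,...,c_n satisfies
p_n(x) \;=\; \prod_{j=1}^{n}(x - c_j) \;=\; x^n + b_{n-1}x^{n-1} + \cdots + b_1 x + b_0 \;=\; 0 .
% Stacking one row per sample gives a linear system in the coefficients:
\begin{bmatrix} x_1^{n} & x_1^{n-1} & \cdots & 1 \\ \vdots & & & \vdots \\ x_N^{n} & x_N^{n-1} & \cdots & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ b_{n-1} \\ \vdots \\ b_0 \end{bmatrix} \;=\; 0 ,
% so the coefficients come from the null space of the embedded data (least squares
% with noise), n is the smallest degree for which such a polynomial exists, and the
% cluster centers c_1,...,c_n are recovered as the n roots of p_n.
```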

Page 21: CVPR2008 tutorial generalized pca

Introductory example: algebraic clustering in 2D

• What about dimension 2?

• What about higher dimensions?

– Complex numbers in higher dimensions?

– How to find roots of a polynomial of quaternions?

• Instead

– Project data onto one or two dimensional space

– Apply same algorithm to projected data

Page 22: CVPR2008 tutorial generalized pca

Representing one subspace

• One plane

• One line

• One subspace can be represented with

– Set of linear equations

– Set of polynomials of degree 1

Page 23: CVPR2008 tutorial generalized pca

De Morgan’s rule

Representing n subspaces

• Two planes

• One plane and one line

– Plane:

– Line:

• A union of n subspaces can be represented with a set of

homogeneous polynomials of degree n

Page 24: CVPR2008 tutorial generalized pca

Veronese map

Fitting polynomials to data points

• Polynomials can be written linearly in terms of the vector of coefficients by using the polynomial (Veronese) embedding

• Coefficients of the polynomials can be computed from the null space of the embedded data

– Solve using least squares

– N = #data points
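Spelling this out (the notation ν_n for the Veronese map follows the GPCA papers; the slide's own formula did not survive extraction):

```latex
p(\mathbf{x}) \;=\; \mathbf{c}^{\top}\nu_n(\mathbf{x}) \;=\; 0 \quad \text{for every data point } \mathbf{x}_i
\;\;\Longrightarrow\;\;
L_n\,\mathbf{c} \;=\; 0, \qquad
L_n \;=\; \begin{bmatrix}\nu_n(\mathbf{x}_1)^{\top}\\ \vdots\\ \nu_n(\mathbf{x}_N)^{\top}\end{bmatrix},
% With noise, c is taken as the right singular vector of L_n associated with the
% smallest singular value (the least-squares solution).
```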

Page 25: CVPR2008 tutorial generalized pca

Finding a basis for each subspace

• Case of hyperplanes:

– Only one polynomial

– Number of subspaces

– Basis are normal vectors

• Problems

– Computing roots may be sensitive to noise

– The estimated polynomial may not perfectly factor with noisy data

– Cannot be applied to subspaces of different dimensions

• Polynomials are estimated up to change of basis, hence they may not factor,

even with perfect data

Polynomial Factorization (GPCA-PFA) [CVPR 2003]
• Find the roots of a polynomial of degree n in one variable

• Solve linear systems in variables

• Solution obtained in closed form for

Page 26: CVPR2008 tutorial generalized pca

Finding a basis for each subspace

• To learn a mixture of subspaces we just need one positive example per class

Polynomial Differentiation (GPCA-PDA) [CVPR’04]

Page 27: CVPR2008 tutorial generalized pca

Choosing one point per subspace

• With noise and outliers

– The zero set of the polynomials may not be a perfect union of subspaces

– Normals can be estimated correctly by choosing points optimally

• Distance to closest subspace without knowing

segmentation?

Page 28: CVPR2008 tutorial generalized pca

GPCA for hyperplane segmentation

• Coefficients of the polynomial can be computed from null

space of embedded data matrix

– Solve using least squares

– N = #data points

• Number of subspaces can be computed from the rank of

embedded data matrix

• Normal to the subspaces can be computed

from the derivatives of the polynomial
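A self-contained numpy sketch of this pipeline for the hyperplane case (fit one vanishing polynomial, differentiate it at each sample, group points by their normals). This is a toy illustration under ideal data, not the released MATLAB code; function names are assumptions:

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, n):
    """Degree-n Veronese (polynomial) embedding of the columns of X (K x N)."""
    K, _ = X.shape
    exps = [np.bincount(c, minlength=K)
            for c in combinations_with_replacement(range(K), n)]
    V = np.stack([np.prod(X ** e[:, None], axis=0) for e in exps])
    return V, exps

def hyperplane_gpca(X, n):
    """Segment points drawn from a union of n hyperplanes through the origin.
    Returns one unit normal per hyperplane and a label for every point."""
    V, exps = veronese(X, n)                     # embedded data, M_n x N
    _, _, Vt = np.linalg.svd(V.T, full_matrices=False)
    c = Vt[-1]                                   # coefficients of the vanishing polynomial
    K, N = X.shape
    G = np.zeros((K, N))                         # gradient of p at every sample
    for coef, e in zip(c, exps):
        for k in range(K):
            if e[k] == 0:
                continue
            ek = e.copy(); ek[k] -= 1
            G[k] += coef * e[k] * np.prod(X ** ek[:, None], axis=0)
    G /= np.linalg.norm(G, axis=0, keepdims=True) + 1e-12
    # pick n well-separated gradients as representative normals, assign by |cosine|
    reps = [G[:, 0]]
    for _ in range(1, n):
        sim = np.max(np.abs(np.stack([r @ G for r in reps])), axis=0)
        reps.append(G[:, np.argmin(sim)])
    reps = np.stack(reps)
    labels = np.argmax(np.abs(reps @ G), axis=0)
    return reps, labels

# Example: two lines through the origin in R^2 (hyperplanes of R^2)
t = np.random.default_rng(1).standard_normal(100)
X = np.hstack([np.vstack([t, 2 * t]), np.vstack([t, -0.5 * t])])
normals, labels = hyperplane_gpca(X, n=2)
```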

Page 29: CVPR2008 tutorial generalized pca

GPCA for subspaces of different dimensions

• There are multiple polynomials

fitting the data

• The derivative of each

polynomial gives a different

normal vector

• Can obtain a basis for the

subspace by applying PCA to

normal vectors

Page 30: CVPR2008 tutorial generalized pca

GPCA for subspaces of different dimensions

• Apply polynomial embedding to projected data

• Obtain multiple subspace model by polynomial fitting

– Solve to obtain

– Need to know number of subspaces

• Obtain bases & dimensions by polynomial differentiation

• Optimally choose one point per subspace using distance

Page 31: CVPR2008 tutorial generalized pca

An example

• Given data lying in the union

of the two subspaces

• We can write the union as

• Therefore, the union can be represented with the two

polynomials

Page 32: CVPR2008 tutorial generalized pca

An example

• Can compute polynomials from

• Can compute normals from

Page 33: CVPR2008 tutorial generalized pca

Dealing with high-dimensional data

• Minimum number of points

– K = dimension of ambient space

– n = number of subspaces

• In practice the dimension of each subspace ki is much smaller than K

– Number and dimensions of the subspaces are preserved by a linear projection onto a subspace of dimension

– Can remove outliers by robustly fitting the subspace

• Open problem: how to choose the projection?

– PCA?

Subspace 1

Subspace 2

Page 34: CVPR2008 tutorial generalized pca

GPCA with spectral clustering

• Spectral clustering

– Build a similarity matrix between pairs of points

– Use eigenvectors to cluster data

• How to define a similarity for subspaces?

– Want points in the same subspace to be close

– Want points in different subspaces to be far

• Use GPCA to get basis

• Distance: subspace angles
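A small numpy sketch of one plausible way to build such a similarity, assuming every point (or local neighborhood) has already been assigned an orthonormal subspace basis; the exp(-Σ sin²θ) weighting is a common choice, not necessarily the slide's exact formula:

```python
import numpy as np

def subspace_angles(B1, B2):
    """Principal angles between two subspaces given orthonormal bases (columns)."""
    s = np.clip(np.linalg.svd(B1.T @ B2, compute_uv=False), -1.0, 1.0)
    return np.arccos(s)

def subspace_affinity(bases):
    """Affinity matrix: entries are large when the angles between subspaces are small."""
    n = len(bases)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i, j] = np.exp(-np.sum(np.sin(subspace_angles(bases[i], bases[j])) ** 2))
    return W   # feed W to any standard spectral-clustering routine

# Example: angles between two random planes in R^3
B1 = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 2)))[0]
B2 = np.linalg.qr(np.random.default_rng(1).standard_normal((3, 2)))[0]
print(np.degrees(subspace_angles(B1, B2)))
```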

Page 35: CVPR2008 tutorial generalized pca

Comparison of PFA, PDA, K-sub, EM

(Plot: error in the normals [degrees] vs. noise level [%], comparing PFA, K-sub, PDA, EM, PDA+K-sub, PDA+EM, and PDA+K-sub+EM.)

Page 36: CVPR2008 tutorial generalized pca

Null(L_n), where L_n = [ν_n(x_1), . . . , ν_n(x_N)].

Dealing with outliers

• GPCA with perfect data

• GPCA with outliers

• GPCA fails because PCA fails: seek a robust estimate of Null(L_n), where L_n = [ν_n(x_1), . . . , ν_n(x_N)]

Page 37: CVPR2008 tutorial generalized pca

Three approaches to tackle outliers

• Probability-based: small-probability samples

– Probability plots: [Healy 1968, Cox 1968]

– PCs: [Rao 1964, Gnanadesikan & Kettenring 1972]

– M-estimators: [Huber 1981, Campbell 1980]

– Multivariate-trimming (MVT): [Gnanadesikan & Kettenring 1972]

• Influence-based: large influence on model parameters

– Parameter difference with and without a sample:[Hampel et al. 1986, Critchley 1985]

• Consensus-based: samples not consistent with models of high consensus

– Hough: [Ballard 1981, Lowe 1999]

– RANSAC: [Fischler & Bolles 1981, Torr 1997]

– Least Median Estimate (LME):[Rousseeuw 1984, Steward 1999]

Page 38: CVPR2008 tutorial generalized pca

Robust GPCA

Page 39: CVPR2008 tutorial generalized pca

Robust GPCA

Simulation on Robust GPCA (parameters fixed at 0.3 rad and 0.4)

• RGPCA – Influence

• RGPCA - MVT

(Figure panels: results at 12%, 32%, and 48% outliers for RGPCA-Influence and RGPCA-MVT.)

Page 40: CVPR2008 tutorial generalized pca

Robust GPCA

Comparison with RANSAC

• Accuracy

• Speed

(Figure panels: subspace arrangements (2,2,1) in R3, (4,2,2,1) in R5, and (5,5,5) in R6.)

Arrangement   (2,2,1) in R3   (4,2,2,1) in R5   (5,5,5) in R6
RANSAC        44 s            5.1 min           3.4 min
MVT           46 s            23 min            8 min
Influence     3 min           58 min            146 min

Table: Average time of RANSAC and RGPCA with 24% outliers.

Page 41: CVPR2008 tutorial generalized pca

Summary

• GPCA: algorithm for clustering subspaces

– Deals with unknown and possibly different dimensions

– Deals with arbitrary intersections among the subspaces

• Our approach is based on

– Projecting data onto a low-dimensional subspace

– Fitting polynomials to projected subspaces

– Differentiating polynomials to obtain a basis

• Applications in image processing and computer vision

– Image segmentation: intensity and texture

– Image compression

– Face recognition under varying illumination

Page 42: CVPR2008 tutorial generalized pca

For more information,

Vision, Dynamics and Learning Lab

@

Johns Hopkins University

Thank You!

Page 43: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis via Lossy Coding and Compression

Yi Ma

Image Formation & Processing Group, Beckman Institute
Decision & Control Group, Coordinated Science Laboratory
Electrical & Computer Engineering Department
University of Illinois at Urbana-Champaign

Page 44: CVPR2008 tutorial generalized pca

MOTIVATION

PROBLEM FORMULATION AND EXISTING APPROACHES

SEGMENTATION VIA LOSSY DATA COMPRESSION

SIMULATIONS (AND EXPERIMENTS)

CONCLUSIONS AND FUTURE DIRECTIONS

OUTLINE

Page 45: CVPR2008 tutorial generalized pca

MOTIVATION – Motion Segmentation in Computer Vision

The “chicken-and-egg” difficulty:
– Knowing the segmentation, estimating the motions is easy;
– Knowing the motions, segmenting the features is easy.

Goal: Given a sequence of images of multiple moving objects, determine:
1. the number and types of motions (rigid-body, affine, linear, etc.)
2. the features that belong to the same motion.

A Unified Algebraic Approach to 2D and 3D Motion Segmentation, [Vidal-Ma, ECCV’04]


Page 46: CVPR2008 tutorial generalized pca

MOTIVATION – Image Segmentation

features

Computer Human

Goal: segment an image into multiple regions with homogeneous texture.

Difficulty: A mixture of models of different dimensions or complexities.

Multiscale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 47: CVPR2008 tutorial generalized pca

MOTIVATION – Video Segmentation

Goal: segment a video sequence into segments with “stationary” dynamics

Identification of Hybrid Linear Systems via Subspace Segmentation, [Huang-Wagner-Ma, C

Model: different segments as outputs from different (linear) dynamical systems:


Page 48: CVPR2008 tutorial generalized pca

MOTIVATION – Massive Multivariate Mixed Data

Face database

Hand written digits

Hyperspectral images

Microarrays

Articulated motions


Page 49: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – Problem Formulation

Difficulties:
– the dimensions of the subspaces can be different
– the data can be corrupted by noise or contaminated by outliers
– the number and dimensions of subspaces may be unknown

Assumption: the data are noisy samples from an arrangement of linear subspaces:

noise-free samples noisy samples samples with outliers

Page 50: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – Statistical Approaches

Assume that the data are i.i.d. samples from a mixture of probabilistic distributions:

Essentially iterate between data segmentation and model estimation.

Solutions:
• Expectation Maximization (EM) for the maximum-likelihood estimate [Dempster et al. ’77], e.g., Probabilistic PCA [Tipping-Bishop ’99]:

• K-Means for a minimax-like estimate [Forgy’65, Jancey’66, MacQueen’67], e.g., K-Subspaces [Ho and Kriegman’03]:

Page 51: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – An Algebro-Geometric Approach

Idea: a union of linear subspaces is an algebraic set -- the zero set of a set of (homogeneous) polynomials:

Complexity exponential in the dimension and number of subspaces.

Solution:
• Identify the set of polynomials of degree n that vanish on the union of subspaces

• Gradients of the vanishing polynomials are normals to the subspaces

Generalized Principal Component Analysis, [Vidal-Ma-Sastry, IEEE Transactions PAMI’05]

Page 52: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – An Information-Theoretic Approach

Problem: If the number/dimension of subspaces not given and data corrupted

by noise and outliers, how to determine the optimal subspaces that fit the data?

Solutions: Model Selection Criteria?
– Minimum message length (MML) [Wallace-Boulton’68]
– Minimum description length (MDL) [Rissanen’78]
– Bayesian information criterion (BIC)
– Akaike information criterion (AIC) [Akaike’77]
– Geometric AIC [Kanatani’03], Robust AIC [Torr’98]

Key idea (MDL):
• a good balance between model complexity and data fidelity.
• minimize the length of codes that describe the model and the data:

with a quantization error optimal for the model.

Page 53: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION

Questions:

– What is the “gain” or “loss” of segmenting or merging data?

– How does tolerance of error affect segmentation results?

Basic idea: is the number of bits required to store “the whole” more than “the sum of its parts”?

Page 54: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Problem Formulation

– A coding scheme maps a set of vectors to a sequence of bits from which we can decode them. The coding length is denoted as L(·).

– Given a set of real-valued mixed data the optimal segmentation

minimizes the overall coding length:

where

Page 55: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Coding Length for Multivariate Data

Theorem. Given data W and an allowed distortion ε², the coding length L(W) given below is the number of bits needed to encode the data s.t. the mean squared error is at most ε².

A nearly optimal bound for even a small number of vectors drawn from a subspace or a Gaussian source.

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]
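The slide's formula did not survive extraction; for reference, here is the coding-length function as recalled from the cited PAMI paper (the notation, and the omission of the extra term for non-zero-mean data, are assumptions):

```latex
L_{\varepsilon}(W) \;=\; \frac{m+n}{2}\,\log_2 \det\!\Big(I \;+\; \frac{n}{\varepsilon^{2} m}\, W W^{\top}\Big),
\qquad W = [\,w_1,\ldots,w_m\,] \subset \mathbb{R}^{n},
% the (nearly optimal) number of bits needed to code the m vectors up to
% mean squared error \varepsilon^2.
```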

Page 56: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Two Coding Schemes

Goal: code the data s.t. the mean squared error is at most ε²

Linear subspace Gaussian source

Page 57: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Properties of the Coding Length

2. Asymptotic Property:

At high SNR, this is the optimal rate distortion for a Gaussian source.

1. Commutative Property: For high-dimensional data, computing the coding length only needs the kernel matrix:

3. Invariant Property: Harmonic analysis is useful for data compression only when the data are non-Gaussian or nonlinear … so is segmentation!

Page 58: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Why Segment?

partitioning:

sifting:

Page 59: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Probabilistic Segmentation?

Assign the i-th point to the j-th group with probability π_ij (the entries of Π).

Theorem. The expected coding length of the segmented data is a concave function in Π over the domain of a convex polytope.

Minima are reached at the vertices of the polytope -- no probabilistic segmentation!

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 60: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Segmentation & Channel Capacity

A MIMO additive white Gaussian noise (AWGN) channel

has the capacity:

If allowing probabilistic grouping of transmitters, the expected capacity

is a concave function in Π over a convex polytope. Maximizing such a capacity is a convex problem.

On Coding and Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI

Page 61: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – A Greedy (Agglomerative) Algorithm

Objective: minimizing the overall coding length

Input:

while true do
    choose two sets S_i, S_j such that the change in coding length L(S_i ∪ S_j) − L(S_i) − L(S_j) is minimal
    if this change is negative then merge S_i and S_j
    else break
    endif
end
Output: the resulting segmentation

“Bottom-up” merge


Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
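A compact numpy sketch of this bottom-up merging, using a simplified surrogate of the coding-length function from the cited Ma-Derksen-Hong-Wright paper (zero-mean coding length plus a label-cost term); the function names and stopping rule are assumptions, not the paper's exact implementation:

```python
import numpy as np

def coding_length(W, eps):
    """Approximate #bits to code the columns of W (n x m) up to distortion eps^2
    (simplified, zero-mean version of the Ma-Derksen-Hong-Wright formula)."""
    n, m = W.shape
    G = np.eye(n) + (n / (eps ** 2 * m)) * (W @ W.T)
    _, logdet = np.linalg.slogdet(G)
    return 0.5 * (m + n) * logdet / np.log(2)

def total_length(groups, eps, N):
    """Coding length of a segmentation: group lengths plus membership (label) bits."""
    L = 0.0
    for W in groups:
        m = W.shape[1]
        L += coding_length(W, eps) - m * np.log2(m / N)
    return L

def greedy_merge(groups, eps):
    """Bottom-up merging: repeatedly merge the pair that reduces total length most."""
    N = sum(W.shape[1] for W in groups)
    while len(groups) > 1:
        best, best_gain = None, 0.0
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                merged = np.hstack([groups[i], groups[j]])
                gain = (total_length([groups[i], groups[j]], eps, N)
                        - total_length([merged], eps, N))
                if gain > best_gain:
                    best, best_gain = (i, j), gain
        if best is None:                      # no merge reduces the coding length
            break
        i, j = best
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] \
                 + [np.hstack([groups[i], groups[j]])]
    return groups

# Toy usage: start from singleton groups of 2-D points and merge
pts = np.random.default_rng(0).standard_normal((2, 20))
result = greedy_merge([pts[:, [i]] for i in range(20)], eps=0.1)
```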

Page 62: CVPR2008 tutorial generalized pca

SIMULATIONS – Mixture of Almost Degenerate Gaussians

Noisy samples from two lines and one plane in R3

Given Data Segmentation Results

ε0 = 0.01

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

ε0 = 0.08

Page 63: CVPR2008 tutorial generalized pca

SIMULATIONS – “Phase Transition”

Rate vs. distortion (ε0 = 0.08)

#groups vs. distortion

Stability: the same segmentation for ε across 3 orders of magnitude!

(Phase analogy: steam, water, ice cubes.)

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 64: CVPR2008 tutorial generalized pca

SIMULATIONS – Comparison with EM

100 x d uniformly distributed random samples from each subspace, corrupted with 4% noise. Classification rate averaged over 25 trials for each case.

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 65: CVPR2008 tutorial generalized pca

SIMULATIONS – Comparison with EM

Segmenting three degenerate or non-degenerate Gaussian clusters over 50 trials.

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 66: CVPR2008 tutorial generalized pca

SIMULATIONS – Robustness with Outliers

(Figure panels: 35.8%, 45.6%, 71.5%, and 73.6% outliers.)

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 67: CVPR2008 tutorial generalized pca

SIMULATIONS – Affine Subspaces with Outliers

(Figure panels: 35.8%, 45.6%, 66.2%, and 69.1% outliers.)

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 68: CVPR2008 tutorial generalized pca

SIMULATIONS – Piecewise-Linear Approximation of Manifolds

Swiss roll Mobius strip Torus Klein bottle

Page 69: CVPR2008 tutorial generalized pca

SIMULATIONS – Summary

– The minimum coding length objective automatically addresses the

model selection issue: the optimal solution is very stable and robust.

– The segmentation/merging is physically meaningful (measured in bits).

The results resemble phase transition in statistical physics.

– The greedy algorithm is scalable (polynomial in both K and N) and

converges well when ε is not too small w.r.t. the sample density.

Page 70: CVPR2008 tutorial generalized pca

Clustering from a Classification Perspective

Solution: Knowing the class distributions, the optimal classifier is the maximum a posteriori (MAP) classifier:

Difficulties: How to learn the two distributions from samples?(parametric, non-parametric, model selection, high-dimension, outliers…)

Goal: Construct a classifier such that the misclassification error is minimized.

Assumption: The training data are drawn from a distribution

Page 71: CVPR2008 tutorial generalized pca

MINIMUM INCREMENTAL CODING LENGTH – Problem Formulation

Ideas: Using the lossy coding length

as a surrogate for the Shannon lossless coding length w.r.t. true distributions.

Classification Criterion: Minimum Incremental Coding Length (MICL)

The additional number of bits needed to encode the test sample with the j-th training set is:
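The formula itself is not reproduced in the transcript; a hedged restatement of the MICL criterion as described in the cited NIPS’07 paper (the exact form of the label-cost term is from memory):

```latex
\hat{j}(x) \;=\; \arg\min_{j}\; \delta L_{\varepsilon}(x, j)
\;=\; \arg\min_{j}\; \Big[\, L_{\varepsilon}\big(\mathcal{X}_j \cup \{x\}\big)
\;-\; L_{\varepsilon}\big(\mathcal{X}_j\big) \;-\; \log_2 \hat{\pi}_j \,\Big],
% where X_j is the j-th training set and \hat{\pi}_j its empirical prior
% (the -log_2 term is the cost of coding the class label).
```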

Page 72: CVPR2008 tutorial generalized pca

MICL (“Michael”) – Asymptotic Properties

Theorem: As the number of samples goes to infinity, the MICL criterion converges with probability one to the following criterion:

where

is the “number of effective parameters” of the j-th model (class).

Theorem: The MICL classifier converges to the above asymptotic form at the rate of for some constant .

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 73: CVPR2008 tutorial generalized pca

SIMULATIONS – Interpolation and Extrapolation via MICL

MICL

SVM

k-NN

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 74: CVPR2008 tutorial generalized pca

SIMULATIONS – Improvement over MAP and RDA [Friedman 1989]

Two Gaussians in R2: isotropic (left), anisotropic (right); 500 trials.

Three Gaussians in Rn: dim = n, dim = n/2, dim = 1; 500 trials.

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 75: CVPR2008 tutorial generalized pca

SIMULATIONS – Local and Kernel MICL

LMICL k-NN

KMICL-RBF SVM-RBF

Local MICL (LMICL): Applying MICL locally to the k-nearest neighbors of the test sample (frequentist + Bayesian).

Kernel MICL (KMICL): Incorporating MICL with a nonlinear kernel naturally through the identity (“kernelized” RDA):

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 76: CVPR2008 tutorial generalized pca

CONCLUSIONS

Assumptions: Data are in a high-dimensional space but have low-dimensional structures (subspaces or submanifolds).

Compression => Clustering & Classification:
– Minimum (incremental) coding length subject to distortion.
– Asymptotically optimal clustering and classification.
– Greedy clustering algorithm (bottom-up, agglomerative).
– MICL corroborates MAP, RDA, k-NN, and kernel methods.

Applications (Next Lectures):
– Video segmentation, motion segmentation (Vidal)
– Image representation & segmentation (Ma)
– Others: microarray clustering, recognition of faces and handwritten digits (Ma)

Page 77: CVPR2008 tutorial generalized pca

FUTURE DIRECTIONS

Theory
– More complex structures: manifolds, systems, random fields…
– Regularization (ridge, lasso, banding, etc.)
– Sparse representation and subspace arrangements

Computation
– Global optimality (random techniques, convex optimization…)
– Scalability: random sampling, approximation…

Future Application Domains
– Image/video/audio classification, indexing, and retrieval
– Hyper-spectral images and videos
– Biomedical images, microarrays
– Autonomous navigation, surveillance, and 3D mapping
– Identification of hybrid linear/nonlinear systems

Page 78: CVPR2008 tutorial generalized pca

REFERENCES & ACKNOWLEDGMENT

References:
– Segmentation of Multivariate Mixed Data via Lossy Data Compression, Yi Ma, Harm Derksen, Wei Hong, John Wright, PAMI, 2007.
– Classification via Minimum Incremental Coding Length (MICL), John Wright et al., NIPS, 2007.
– Website: http://perception.csl.uiuc.edu/coding/home.htm

People:
– John Wright, PhD Student, ECE Department, University of Illinois
– Prof. Harm Derksen, Mathematics Department, University of Michigan
– Allen Yang (UC Berkeley) and Wei Hong (Texas Instruments R&D)
– Zhouchen Lin and Harry Shum, Microsoft Research Asia, China

Funding:
– ONR YIP N00014-05-1-0633
– NSF CAREER IIS-0347456, CCF-TF-0514955, CRS-EHS-0509151

Page 79: CVPR2008 tutorial generalized pca

“The whole is more than the sum of its parts.”

– Aristotle

Questions, please?


Yi Ma, CVPR 2008

Page 80: CVPR2008 tutorial generalized pca

Part II

Applications of GPCA in Computer Vision

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 81: CVPR2008 tutorial generalized pca

Part II: Applications in computer vision

• Applications to motion & video segmentation (10.30-11.15)

– 2-D and 3-D motion segmentation

– Temporal video segmentation

– Dynamic texture segmentation

• Applications to image representation and segmentation

(11.15-12.00)

– Multi-scale hybrid linear models for sparse

image representation

– Hybrid linear models for image segmentation

Page 82: CVPR2008 tutorial generalized pca

Applications to motion and video segmentation

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 83: CVPR2008 tutorial generalized pca

3-D motion segmentation problem

• Given a set of point correspondences in multiple views, determine

– Number of motion models

– Motion model: affine, homography, fundamental matrix, trifocal tensor

– Segmentation: model to which each pixel belongs

• Mathematics of the problem depends on

– Number of frames (2, 3, multiple)

– Projection model (affine, perspective)

– Motion model (affine, translational, homography, fundamental matrix, etc.)

– 3-D structure (planar or not)

Page 84: CVPR2008 tutorial generalized pca

Taxonomy of problems

• 2-D Layered representation
– Probabilistic approaches: Jepson-Black’93, Ayer-Sawhney’95, Darrel-Pentland’95, Weiss-Adelson’96, Weiss’97, Torr-Szeliski-Anandan’99

– Variational approaches: Cremers-Soatto ICCV’03

– Initialization: Wang-Adelson’94, Irani-Peleg’92, Shi-Malik‘98, Vidal-Singaraju’05-’06

• Multiple rigid motions in two perspective views– Probabilistic approaches: Feng-Perona’98, Torr’98

– Particular cases: Izawa-Mase’92, Shashua-Levin’01, Sturm’02,

– Multibody fundamental matrix: Wolf-Shashua CVPR’01, Vidal et al. ECCV’02, CVPR’03, IJCV’06

– Motions of different types: Vidal-Ma-ECCV’04, Rao-Ma-ICCV’05

• Multiple rigid motions in three perspective views– Multibody trifocal tensor: Hartley-Vidal-CVPR’04

• Multiple rigid motions in multiple affine views– Factorization-based: Costeira-Kanade’98, Gear’98, Wu et al.’01, Kanatani et al.’01-02-04

– Algebraic: Yan-Pollefeys-ECCV’06, Vidal-Hartley-CVPR’04

• Multiple rigid motions in multiple perspective views

– Schindler et al. ECCV’06, Li et al. CVPR’07

Page 85: CVPR2008 tutorial generalized pca

A unified approach to motion segmentation

• Estimation of multiple motion models equivalent to

estimation of one multibody motion model

– Eliminate feature clustering: multiplication

– Estimate a single multibody motion model: polynomial fitting

– Segment multibody motion model: polynomial differentiation

chicken-and-egg

Page 86: CVPR2008 tutorial generalized pca

A unified approach to motion segmentation

• Applies to most motion models in computer vision

• All motion models can be segmented algebraically by– Fitting multibody model: real or complex polynomial to all data

– Fitting individual model: differentiate polynomial at a data point

Page 87: CVPR2008 tutorial generalized pca

Segmentation of 3-D translational motions

• Multiple epipoles (translation)

• Epipolar constraint: plane in

– Plane normal = epipoles

– Data = epipolar lines

• Multibody epipolar constraint

• Epipoles are derivatives of the multibody epipolar constraint at the epipolar lines

Page 88: CVPR2008 tutorial generalized pca

Segmentation of 3-D translational motions

Page 89: CVPR2008 tutorial generalized pca

Single-body factorization

• Affine camera model

– p = point

– f = frame

• Motion of one rigid-body lives in a 4-D subspace (Boult and Brown ’91, Tomasi and Kanade ’92)

– P = #points

– F = #frames

Structure = 3D surface

Motion = camera position and orientation
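In symbols (the standard Tomasi-Kanade affine factorization, stated here for reference; the slide's own notation is not reproduced):

```latex
W \;=\; \begin{bmatrix}\mathbf{x}_{11}&\cdots&\mathbf{x}_{1P}\\ \vdots& &\vdots\\ \mathbf{x}_{F1}&\cdots&\mathbf{x}_{FP}\end{bmatrix}
\;\in\;\mathbb{R}^{2F\times P}
\;=\; \underbrace{M}_{2F\times 4}\;\underbrace{S}_{4\times P},
\qquad \operatorname{rank}(W)\le 4,
% where each x_{fp} in R^2 is the image of point p in frame f, M stacks the affine
% camera matrices, and S holds the 3-D points in homogeneous coordinates; hence the
% trajectories of one rigid-body motion span a subspace of dimension at most 4.
```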

Page 90: CVPR2008 tutorial generalized pca

Multi-body factorization

• Given n rigid motions

• Motion segmentation is obtained from

– Leading singular vector of (Boult and Brown ’91)

– Shape interaction matrix (Costeira & Kanade ’95, Gear ’94)

– Number of motions (if fully-dimensional)

• Motion subspaces need to be independent (Kanatani ’01)

Page 91: CVPR2008 tutorial generalized pca

Multi-body factorization

• Sensitive to noise

– Kanatani (ICCV ’01): use model selection to scale Q

– Wu et al. (CVPR’01): project data onto subspaces and iterate

• Fails with partially dependent motions

– Zelnik-Manor and Irani (CVPR’03)

• Build similarity matrix from normalized Q

• Apply spectral clustering to similarity matrix

– Yan and Pollefeys (ECCV’06)

• Local subspace estimation + spectral clustering

– Kanatani (ECCV’04)

• Assume degeneracy is known: pure translation in the image

• Segment data by multi-stage optimization (multiple EM problems)

• Cannot handle missing data

– Gruber and Weiss (CVPR’04)

• Expectation Maximization

Page 92: CVPR2008 tutorial generalized pca

PowerFactorization+GPCA

• A motion segmentation algorithm that

– Is provably correct with perfect data

– Handles both independent and degenerate motions

– Handles both complete and incomplete data

• Project trajectories onto a 5-D subspace of

– Complete data: PCA or SVD

– Incomplete data: PowerFactorization

• Cluster projected subspaces using GPCA

– Handles both independent and degenerate motions

– Non-iterative: can be used to initialize EM

Page 93: CVPR2008 tutorial generalized pca

Projection onto a 5-D subspace

• Motion of one rigid-body lives in

4-D subspace of

• By projecting onto a 5-D

subspace of

– Number and dimensions of

subspaces are preserved

– Motion segmentation is

equivalent to clustering

subspaces of dimension

2, 3 or 4 in

– Minimum #frames = 3

(CK needs a minimum of 2n

frames for n motions)

– Can remove outliers by robustly

fitting the 5-D subspace using

Robust SVD (DeLaTorre-Black)

• What projection to use?

– PCA: 5 principal components

– RPCA: with outliers

Motion 1

Motion 2

Page 94: CVPR2008 tutorial generalized pca

Projection onto a 5-D subspace

PowerFactorization algorithm:

• Complete data

– Given A solve for B

– Orthonormalize B

– Given B solve for A

– Iterate

• Converges to a rank-r approximation with rate

Given , factor it as

• Incomplete data

• It diverges in some cases

• Works well with up to 30% of missing data

Linear problem
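A numpy sketch of the alternating scheme described above, assuming every row and column of the data matrix has at least a few observed entries; it is not the authors' implementation, and the parameter names are illustrative:

```python
import numpy as np

def power_factorization(W, mask, r, n_iter=100):
    """Rank-r factorization W ~ A @ B.T using only the observed entries
    (mask[i, j] == True where W[i, j] is known)."""
    m, n = W.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((n, r))
    for _ in range(n_iter):
        # Given A, solve a least-squares problem per column over its observed rows
        for j in range(n):
            rows = mask[:, j]
            B[j] = np.linalg.lstsq(A[rows], W[rows, j], rcond=None)[0]
        B, _ = np.linalg.qr(B)                 # orthonormalize B
        # Given B, solve a least-squares problem per row over its observed columns
        for i in range(m):
            cols = mask[i, :]
            A[i] = np.linalg.lstsq(B[cols], W[i, cols], rcond=None)[0]
    return A, B

# Example: recover a rank-3 matrix from ~70% observed entries
rng = np.random.default_rng(0)
W_true = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
mask = rng.random(W_true.shape) < 0.7
A, B = power_factorization(W_true, mask, r=3)
```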

Page 95: CVPR2008 tutorial generalized pca

Motion segmentation using GPCA

• Apply polynomial embedding to 5-D points

Veronese map

Page 96: CVPR2008 tutorial generalized pca

Hopkins 155 motion segmentation database

• Collected 155 sequences– 120 with 2 motions

– 35 with 3 motions

• Types of sequences
– Checkerboard sequences: mostly full-dimensional and independent motions
– Traffic sequences: mostly degenerate (linear, planar) and partially dependent motions
– Articulated sequences: mostly full-dimensional and partially dependent motions

• Point correspondences
– In a few cases, provided by Kanatani & Pollefeys
– In most cases, extracted semi-automatically with OpenCV

Page 97: CVPR2008 tutorial generalized pca

Experimental results: Hopkins 155 database

• 2 motions, 120 sequences, 266 points, 30 frames

Page 98: CVPR2008 tutorial generalized pca

Experimental results: Hopkins 155 database

• 3 motions, 35 sequences, 398 points, 29 frames

Page 99: CVPR2008 tutorial generalized pca

Experimental results: missing data sequences

• There is no clear correlation between amount of missing data and

percentage of misclassification

• This could be because convergence of PF depends more on “where”

missing data is located than on “how much” missing data there is

Page 100: CVPR2008 tutorial generalized pca

Conclusions

• For two motions

– Algebraic methods (GPCA and LSA) are more accurate than

statistical methods (RANSAC and MSL)

– LSA performs better on full and independent sequences, while

GPCA performs better on degenerate and partially dependent

– LSA is sensitive to dimension of projection: d=4n better than d=5

– MSL is very slow, RANSAC and GPCA are fast

• For three motions

– GPCA is not very accurate, but is very fast

– MSL is the most accurate, but it is very slow

– LSA is almost as accurate as MSL and almost as fast as GPCA

Page 101: CVPR2008 tutorial generalized pca

Segmentation of Dynamic Textures

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 102: CVPR2008 tutorial generalized pca

Modeling a dynamic texture: fixed boundary

• Examples of dynamic textures:

• Model temporal evolution as the output of a linear

dynamical system (LDS): Soatto et al. ‘01

dynamics

appearance

images

z_{t+1} = A z_t + v_t,    y_t = C z_t + w_t
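A minimal numpy sketch of the usual SVD-based (suboptimal) way to fit such an LDS to a stack of vectorized frames; this is not the tutorial's code, and the state dimension r is an illustrative parameter:

```python
import numpy as np

def learn_dynamic_texture(Y, r):
    """Fit y_t = C z_t + w_t,  z_{t+1} = A z_t + v_t  to frames Y (pixels x T)
    with an r-dimensional state, using the standard SVD-based recipe."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :r]                               # appearance (observability) basis
    Z = np.diag(S[:r]) @ Vt[:r]                # state trajectory, r x T
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])   # least-squares dynamics estimate
    return A, C, Z

# Example usage: each column of Y is one vectorized frame
Y = np.random.default_rng(0).standard_normal((1000, 60))
A, C, Z = learn_dynamic_texture(Y, r=10)
```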

Page 103: CVPR2008 tutorial generalized pca

Segmenting non-moving dynamic textures

• One dynamic texture lives in the observability subspace

• Multiple textures live in multiple subspaces

• Cluster the data using GPCA

water

steam

z_{t+1} = A z_t + v_t,    y_t = C z_t + w_t

Page 104: CVPR2008 tutorial generalized pca

Segmenting moving dynamic textures

Page 105: CVPR2008 tutorial generalized pca

Segmenting moving dynamic textures

Ocean-bird

Page 106: CVPR2008 tutorial generalized pca

Level-set intensity-based segmentation

• Chan-Vese energy functional

• Implicit methods
– Represent C as the zero level set of an implicit function φ, i.e. C = {(x, y) : φ(x, y) = 0}

• Solution
– The solution is obtained by gradient descent on φ
– c1 and c2 are the mean intensities inside and outside the contour C.
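The functional itself is not reproduced in the transcript; in its standard form (the weights μ, λ1, λ2 are the usual ones, not necessarily those used on the slide):

```latex
E(c_1, c_2, C) \;=\; \mu\,\mathrm{Length}(C)
\;+\; \lambda_1 \int_{\mathrm{inside}(C)} |I(x,y) - c_1|^2 \,dx\,dy
\;+\; \lambda_2 \int_{\mathrm{outside}(C)} |I(x,y) - c_2|^2 \,dx\,dy .
```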

Page 107: CVPR2008 tutorial generalized pca

Dynamics & intensity-based energy

• We represent the intensities of the pixels in the images as

the output of a mixture of AR models of order p

• We propose the following spatial-temporal extension of the

Chan-Vese energy functional

where

Page 108: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Given the ARX parameters, we can solve for the implicit

function by solving the PDE

• Given the implicit function φ, we can solve for the ARX

parameters of the jth region by solving the linear system

Page 109: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Fixed boundary segmentation results and comparison

Ocean-smoke Ocean-dynamics Ocean-appearance

Page 110: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Moving boundary segmentation results and comparison

Ocean-fire

Page 111: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Results on a real sequence

Raccoon on River

Page 112: CVPR2008 tutorial generalized pca

Temporal video segmentation

• Segmenting N=30 frames of a

sequence containing n=3

scenes

– Host

– Guest

– Both

• Image intensities are output of

linear system

• Apply GPCA to fit n=3

observability subspaces

dynamics

appearance, images

x_{t+1} = A x_t + v_t,    y_t = C x_t + w_t

Page 113: CVPR2008 tutorial generalized pca

Temporal video segmentation

• Segmenting N=60 frames of a

sequence containing n=3

scenes

– Burning wheel

– Burnt car with people

– Burning car

• Image intensities are the output of a linear system

• Apply GPCA to fit n=3 observability subspaces

dynamics, appearance, images

x_{t+1} = A x_t + v_t,    y_t = C x_t + w_t

Page 114: CVPR2008 tutorial generalized pca

Conclusions

• Many problems in computer vision can be posed as subspace clustering problems

– Temporal video segmentation

– 2-D and 3-D motion segmentation

– Dynamic texture segmentation

– Nonrigid motion segmentation

• These problems can be solved using GPCA: an algorithm for clustering subspaces

– Deals with unknown and possibly different dimensions

– Deals with arbitrary intersections among the subspaces

• GPCA is based on

– Projecting data onto a low-dimensional subspace

– Recursively fitting polynomials to projected subspaces

– Differentiating polynomials to obtain a basis

Page 115: CVPR2008 tutorial generalized pca

For more information,

Vision, Dynamics and Learning Lab

@

Johns Hopkins University

Thank You!

Page 116: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis for Image Representation & Segmentation

Yi Ma

Control & Decision, Coordinated Science Laboratory
Image Formation & Processing Group, Beckman Institute
Department of Electrical & Computer Engineering
University of Illinois at Urbana-Champaign

Page 117: CVPR2008 tutorial generalized pca

INTRODUCTION

GPCA FOR LOSSY IMAGE REPRESENTATION

IMAGE SEGMENTATION VIA LOSSY COMPRESSION

OTHER APPLICATIONS

CONCLUSIONS AND FUTURE DIRECTIONS

Page 118: CVPR2008 tutorial generalized pca

Introduction – Image Representation via Linear Transformations

better representations?

pixel-based representation

three matrices of RGB values

linear transformation → a more compact representation

Page 119: CVPR2008 tutorial generalized pca

Introduction

Fixed Orthogonal Bases (representation, approximation, compression)
- Discrete Fourier transform (DFT) or discrete cosine transform (DCT) (Ahmed ’74): JPEG.
- Wavelets (multi-resolution) (Daubechies’88, Mallat’92): JPEG-2000.
- Curvelets and contourlets (Candes & Donoho’99, Do & Vetterli’00)

Non-orthogonal Bases (for redundant representations)
- Extended lapped transforms, frames, sparse representations (Lp geometry)…

Discrete Fourier transform (DFT)

Wavelet transform

6.25% coefficients.

Page 120: CVPR2008 tutorial generalized pca

Introduction

Adaptive Bases (optimal if imagery data are uni-modal)
- Karhunen-Loeve transform (KLT), also known as PCA (Pearson’1901, Hotelling’33, Jolliffe’86)

stack

adaptive bases

Page 121: CVPR2008 tutorial generalized pca

Introduction – Principal Component Analysis (PCA): Dimensionality Reduction

Find a low-dimensional representation (model) for high-dimensional data.

Variations of PCA
– Nonlinear Kernel PCA (Scholkopf-Smola-Muller’98)
– Probabilistic PCA (Tipping-Bishop’99, Collins et al.’01)
– Higher-Order SVD (HOSVD) (Tucker’66, Davis’02)
– Independent Component Analysis (Hyvarinen-Karhunen-Oja’01)

Principal Component Analysis (Pearson’1901, Hotelling’1933, Eckart & Young’1936) or Karhunen-Loeve transform (KLT).

Basis for S SVD

Page 122: CVPR2008 tutorial generalized pca

Distribution of the first three principal components of the Baboon image: A clear multi-modal distribution

Hybrid Linear Models – Multi-Modal Characteristics

Page 123: CVPR2008 tutorial generalized pca

Vector Quantization (VQ)

- multiple 0-dimensional affine subspaces (i.e. cluster means)

- existing clustering algorithms are iterative (EM, K-means)

Hybrid Linear Models – Multi-Modal Characteristics

Page 124: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Versus Linear Models

A single linear model

(Figure: the stacked image data modeled by a single linear subspace vs. by hybrid linear models.)

Page 125: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Characteristics of Natural Images

(Table: characteristics of image models – Fourier (DCT), wavelets, curvelets, random fields, PCA/KLT, VQ, and hybrid linear – compared along the axes 1D, 2D, multivariate (vector-valued), hybrid (multi-modal), hierarchical (multi-scale), and high-dimensional.)

We need a new & simple paradigm to effectively account for all these characteristics simultaneously.


Page 126: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Subspace Estimation and Segmentation

“Chicken-and-Egg” Coupling
– Given segmentation, estimate subspaces
– Given subspaces, segment the data

Hybrid Linear Models (or Subspace Arrangements)
– the number of subspaces is unknown
– the dimensions of the subspaces are unknown
– the bases of the subspaces are unknown
– the segmentation of the data points is unknown

Page 127: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Recursive GPCA (an Example)

Page 128: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Effective Dimension

Model Selection (for Noisy Data): balance model complexity against data fidelity.

(Terms in the effective-dimension formula: dimension of each subspace, number of points in each subspace, number of subspaces, total number of points.)

Model selection criterion: minimizing effective dimension subject to a given error tolerance (or PSNR)

Page 129: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Simulation Results (5% Noise)

ED=3

ED=2.0067

ED=1.6717

Page 130: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Subspaces of the Barbara Image

Page 131: CVPR2008 tutorial generalized pca

PCA (8x8)

Haar Wavelet, GPCA (8x8)

Original

Hybrid Linear Models – Lossy Image Representation (Baboon)

DCT (JPEG)

GPCA

Page 132: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Algorithm Diagram

Diagram for a level-3 implementation of hybrid linear models for image representation

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 133: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – The Baboon Image

downsample by two, twice

segmentation of 2 by 2 blocks

The Baboon image

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 134: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Comparison with Other Methods

The Baboon image

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 135: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Image Approximation

Comparison with level-3 wavelet (7.5% coefficients)

Level-3 bior-4.4 wavelets PSNR=23.94

Level-3 hybrid linear model PSNR=24.64

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 136: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Block Size Effect

Some problems with the multi-scale hybrid linear model:
1. it has a minor block effect;
2. it is computationally more costly (than Fourier, wavelets, PCA);
3. it does not fully exploit spatial smoothness as wavelets do.

The Baboon image

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 137: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – The Wavelet Domain

The Baboon image

segmentation at each scale

LH HH

HL

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 138: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Wavelets vs. Hybrid Linear Wavelets

The Baboon image

Advantages of the hybrid linear model in the wavelet domain:
1. it eliminates the block effect;
2. it is computationally less costly (than in the spatial domain);
3. it achieves higher PSNR.

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 139: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Visual Comparison

Comparison among several models (7.5% coefficients)

Original Image

Wavelets PSNR=23.94

Hybrid model in spatial domain, PSNR=24.64

Hybrid model in wavelet domain, PSNR=24.88

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 140: CVPR2008 tutorial generalized pca

Image Segmentation – via Lossy Data Compression

stack


Page 141: CVPR2008 tutorial generalized pca

APPLICATIONS – Texture-Based Image Segmentation

Naïve approach:
– Take a 7x7 Gaussian window around every pixel.
– Stack these windows as vectors.
– Cluster the vectors using our algorithm (sketched after the citation below).

A few results:

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
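A small numpy sketch of the feature-extraction step described above; the window size default and the Gaussian weighting σ are illustrative assumptions:

```python
import numpy as np

def window_features(img, w=7, sigma=None):
    """Stack a Gaussian-weighted (w x w) window around every pixel of a
    grayscale image into a feature vector (one column per pixel)."""
    r = w // 2
    sigma = sigma if sigma is not None else w / 4.0        # illustrative default
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)).ravel()
    padded = np.pad(img, r, mode='reflect')
    H, W = img.shape
    feats = np.empty((w * w, H * W))
    k = 0
    for i in range(H):
        for j in range(W):
            feats[:, k] = g * padded[i:i + w, j:j + w].ravel()
            k += 1
    return feats

# Example usage on a toy image; the columns of F are then clustered
img = np.random.default_rng(0).random((32, 32))
F = window_features(img)      # 49 x 1024 feature matrix
```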

Page 142: CVPR2008 tutorial generalized pca

APPLICATIONS – Distribution of Texture Features

Question: why does such a simple algorithm work at all?

Answer: Compression (MDL/MCL) is well suited to mid-level texture segmentation.

Using a single representation (e.g. windows, filterbank responses) for textures of different complexity ⇒ redundancy and degeneracy, which can be exploited for clustering / compression.


Above: singular values of feature vectors from two different segments of the image at left.

Page 143: CVPR2008 tutorial generalized pca

APPLICATIONS – Compression-based Texture Merging (CTM)

Problem with the naïve approach: strong edges, segment boundaries.

Solution: low-level, edge-preserving over-segmentation into small homogeneous regions.

Simple features: stacked Gaussian windows (7x7 in our experiments).

Merge adjacent regions to minimize coding length (“compress” the features).


Page 144: CVPR2008 tutorial generalized pca

APPLICATIONS – Hierarchical Image Segmentation via CTM

Lossy coding with varying distortion ε (ε = 0.1, 0.2, 0.4) => a hierarchy of segmentations

Page 145: CVPR2008 tutorial generalized pca

APPLICATIONS – CTM: Qualitative Results

Page 146: CVPR2008 tutorial generalized pca

APPLICATIONS – CTM: Quantitative Evaluation and Comparison

PRI: Probabilistic Rand Index [Pantofaru 2005]
VoI: Variation of Information [Meila 2005]
GCE: Global Consistency Error [Martin 2001]
BDE: Boundary Displacement Error [Freixenet 2002]

Berkeley Image Segmentation Database

Unsupervised Segmentation of Natural Images via Lossy Data Compression, CVIU, 2008

Page 147: CVPR2008 tutorial generalized pca

Other Applications: Multiple Motion Segmentation (on Hopkins155)

Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08


Two Motions: MSL 4.14%, LSA 3.45%, ALC 2.40%; works with up to 25% outliers.
Three Motions: MSL 8.32%, LSA 9.73%, ALC 6.26%.

Page 148: CVPR2008 tutorial generalized pca

Other Applications – Clustering of Microarray Data

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 149: CVPR2008 tutorial generalized pca

Other Applications – Clustering of Microarray Data

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 150: CVPR2008 tutorial generalized pca

Other Applications – Supervised Classification

Premises: Data lie on an arrangement of subspaces

Unsupervised Clustering– Generalized PCA

Supervised Classification– Sparse Representation

Page 151: CVPR2008 tutorial generalized pca

Other Applications – Robust Face Recognition

Robust Face Recognition via Sparse Representation, to appear in PAMI 2008

Page 152: CVPR2008 tutorial generalized pca

Other Applications: Robust Motion Segmentation (on Hopkins155)

Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08

Dealing with incomplete or mistracked features, with the dataset 80% corrupted!

Page 153: CVPR2008 tutorial generalized pca

Three Measures of Sparsity: Bits, L_0 and L1-Norm

Reason: High-dimensional data, like images, do have compact, compressible, sparse structures, in terms of their geometry, statistics, and semantics.

Page 154: CVPR2008 tutorial generalized pca

Conclusions

Most imagery data are high-dimensional, statistically or geometrically heterogeneous, and have multi-scale structures.

Imagery data require hybrid models that can adaptively represent different subsets of the data with different (sparse) linear models.

Mathematically, it is possible to estimate and segment hybrid (linear) models non-iteratively. GPCA offers one such method.

Hybrid models lead to new paradigms, new principles, andnew applications for image representation, compression, and segmentation.

Page 155: CVPR2008 tutorial generalized pca

Future Directions

Mathematical Theory
– Subspace arrangements (algebraic properties).
– Extension of GPCA to more complex algebraic varieties (e.g., hybrid multilinear, high-order tensors).
– Representation & approximation of vector-valued functions.

Computation & Algorithm Development
– Efficiency, noise sensitivity, outlier elimination.
– Other ways to combine with wavelets and curvelets.

Applications to Other Data
– Medical imaging (ultra-sonic, MRI, diffusion tensor…)
– Satellite hyper-spectral imaging.
– Audio, video, faces, and digits.
– Sensor networks (location, temperature, pressure, RFID…)
– Bioinformatics (gene expression data…)

Page 156: CVPR2008 tutorial generalized pca

Acknowledgement

People
– Wei Hong, Allen Yang, John Wright, University of Illinois
– Rene Vidal of Biomedical Engineering Dept., Johns Hopkins University
– Kun Huang of Biomedical & Informatics Science Dept., Ohio State University

Funding
– Research Board, University of Illinois at Urbana-Champaign
– National Science Foundation (NSF CAREER IIS-0347456)
– Office of Naval Research (ONR YIP N000140510633)
– National Science Foundation (NSF CRS-EHS0509151)
– National Science Foundation (NSF CCF-TF0514955)

Page 157: CVPR2008 tutorial generalized pca

Thank You!

Generalized Principal Component Analysis: Modeling and Segmentation of Multivariate Mixed Data

Rene Vidal, Yi Ma, and Shankar Sastry. Springer-Verlag, to appear.

Yi Ma, CVPR 2008