CVPR 2008 Tutorial: Generalized Principal Component Analysis

Generalized Principal Component Analysis Tutorial @ CVPR 2008
Yi Ma, ECE Department, University of Illinois at Urbana-Champaign
René Vidal, Center for Imaging Science, Institute for Computational Medicine, Johns Hopkins University


DESCRIPTION

Part I: page 1, Part II: page 12, Part III: page 43, Part IV: page 80, Part V: page 116

TRANSCRIPT

Page 1: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis Tutorial @ CVPR 2008

Yi Ma

ECE Department

University of Illinois

Urbana Champaign

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 2: CVPR2008 tutorial generalized pca

Data segmentation and clustering

• Given a set of points, separate them into multiple groups

• Discriminative methods: learn boundary

• Generative methods: learn a mixture model using, e.g., Expectation Maximization

Page 3: CVPR2008 tutorial generalized pca

Dimensionality reduction and clustering

• In many problems the data are high-dimensional: dimensionality can be reduced using, e.g., Principal Component Analysis

• Image compression

• Recognition– Faces (Eigenfaces)

• Image segmentation– Intensity (black-white)

– Texture

Page 4: CVPR2008 tutorial generalized pca

Segmentation problems in dynamic vision

• Segmentation of video and dynamic textures

• Segmentation of rigid-body motions

Page 5: CVPR2008 tutorial generalized pca

Segmentation problems in dynamic vision

• Segmentation of rigid-body motions from dynamic textures

Page 6: CVPR2008 tutorial generalized pca

Clustering data on non-Euclidean spaces

• Clustering data on non-Euclidean spaces – Mixtures of linear spaces

– Mixtures of algebraic varieties

– Mixtures of Lie groups

• “Chicken-and-egg” problems– Given segmentation, estimate models

– Given models, segment the data

– Initialization?

• Need to combine– Algebra/geometry, dynamics and statistics

Page 7: CVPR2008 tutorial generalized pca

Outline of the tutorial

• Introduction (8.00-8.15)

• Part I: Theory (8.15-9.45)

– Basic GPCA theory and algorithms (8.15-9.00)

– Advanced statistical methods for GPCA (9.00-9.45)

• Questions (9.45-10.00)

• Break (10.00-10.30)

• Part II: Applications (10.30-12.00)

– Applications to motion and video segmentation (10.30-11.15)

– Applications to image representation & segmentation (11.15-12.00)

• Questions (12.00-12.15)

Page 8: CVPR2008 tutorial generalized pca

Part I: Theory

• Introduction to GPCA (8.00-8.15)

• Basic GPCA theory and algorithms (8.15-9.00)

– Review of PCA and extensions

– Introductory cases: line, plane and hyperplane segmentation

– Segmentation of a known number of subspaces

– Segmentation of an unknown number of subspaces

• Advanced statistical methods for GPCA (9.00-9.45)

– Lossy coding of samples from a subspace

– Minimum coding length principle for data segmentation

– Agglomerative lossy coding for subspace clustering

Page 9: CVPR2008 tutorial generalized pca

Part II: Applications in computer vision

• Applications to motion & video segmentation (10.30-11.15)

– 2-D and 3-D motion segmentation

– Temporal video segmentation

– Dynamic texture segmentation

• Applications to image representation and segmentation

(11.15-12.00)

– Multi-scale hybrid linear models for sparse

image representation

– Hybrid linear models for image segmentation

Page 10: CVPR2008 tutorial generalized pca

References: Springer-Verlag 2008

Page 11: CVPR2008 tutorial generalized pca

Slides, MATLAB code, papers

Slides: http://www.vision.jhu.edu/gpca/cvpr08-tutorial-gpca.htm

Code: http://perception.csl.uiuc.edu/gpca

Page 12: CVPR2008 tutorial generalized pca

Part I

Generalized Principal Component Analysis

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 13: CVPR2008 tutorial generalized pca

Principal Component Analysis (PCA)

• Given a set of points x1, x2, …, xN

– Geometric PCA: find a subspace S passing through them

– Statistical PCA: find projection directions that maximize the variance

• Solution (Beltrami’1873, Jordan’1874, Hotelling’33, Eckart-Householder-Young’36)

• Applications: data compression, regression, computer vision (eigenfaces), pattern recognition, genomics

Basis for S
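A minimal numpy sketch of the geometric solution via the SVD (illustrative only, not the tutorial's MATLAB code; the function name `pca_basis` and the choice of d are assumptions):

```python
import numpy as np

def pca_basis(X, d):
    """Return the mean and a d-dimensional orthonormal basis for the subspace
    fit to the columns of X (one data point per column), via the SVD."""
    mu = X.mean(axis=1, keepdims=True)                   # center the data
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, U[:, :d]                                  # mean and principal directions

# Example: 200 noisy points near a 2-D plane in R^3
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((2, 200)), 0.01 * rng.standard_normal((1, 200))])
mu, B = pca_basis(X, d=2)
print(B.shape)   # (3, 2): orthonormal basis for the estimated plane
```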

Page 14: CVPR2008 tutorial generalized pca

Extensions of PCA

• Higher order SVD (Tucker’66, Davis’02)

• Independent Component Analysis (Comon ’94)

• Probabilistic PCA (Tipping-Bishop ’99)

– Identify subspace from noisy data

– Gaussian noise: standard PCA

– Noise in exponential family (Collins et al.’01)

• Nonlinear dimensionality reduction

– Multidimensional scaling (Torgerson’58)

– Locally linear embedding (Roweis-Saul ’00)

– Isomap (Tenenbaum ’00)

• Nonlinear PCA (Scholkopf-Smola-Muller ’98)

– Identify nonlinear manifold by applying PCA to data embedded in a high-dimensional space

• Principal Curves and Principal Geodesic Analysis (Hastie-Stuetzle ’89, Tibshirani ’92, Fletcher ’04)

Page 15: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis

• Given a set of points lying in multiple subspaces, identify

– The number of subspaces and their dimensions

– A basis for each subspace

– The segmentation of the data points

• “Chicken-and-egg” problem

– Given segmentation, estimate subspaces

– Given subspaces, segment the data

Page 16: CVPR2008 tutorial generalized pca

Prior work on subspace clustering

• Iterative algorithms:

– K-subspace (Ho et al. ’03),

– RANSAC, subspace selection and growing (Leonardis et al. ’02)

• Probabilistic approaches: learn the parameters of a mixture

model using e.g. EM

– Mixtures of PPCA: (Tipping-Bishop ‘99):

– Multi-Stage Learning (Kanatani’04)

• Initialization

– Geometric approaches: 2 planes in R3 (Shizawa-Maze ’91)

– Factorization approaches: independent subspaces of equal

dimension (Boult-Brown ‘91, Costeira-Kanade ‘98, Kanatani ’01)

– Spectral clustering based approaches: (Yan-Pollefeys’06)

Page 17: CVPR2008 tutorial generalized pca

Basic ideas behind GPCA

• Towards an analytic solution to subspace clustering

– Can we estimate ALL models simultaneously using ALL data?

– When can we do so analytically? In closed form?

– Is there a formula for the number of models?

• Will consider the most general case

– Subspaces of unknown and possibly different dimensions

– Subspaces may intersect arbitrarily (not only at the origin)

• GPCA is an algebraic geometric approach to data segmentation

– Number of subspaces = degree of a polynomial

– Subspace basis = derivatives of a polynomial

– Subspace clustering is algebraically equivalent to

• Polynomial fitting

• Polynomial differentiation

Page 18: CVPR2008 tutorial generalized pca

Applications of GPCA in computer vision

• Geometry

– Vanishing points

• Image compression

• Segmentation

– Intensity (black-white)

– Texture

– Motion (2-D, 3-D)

– Video (host-guest)

• Recognition

– Faces (Eigenfaces)

• Man - Woman

– Human Gaits

– Dynamic Textures

• Water-bird

• Biomedical imaging

• Hybrid systems identification

Page 19: CVPR2008 tutorial generalized pca

Introductory example: algebraic clustering in 1D

• Number of groups?

Page 20: CVPR2008 tutorial generalized pca

Introductory example: algebraic clustering in 1D

• How to compute n, c, b’s?

– Number of clusters

– Cluster centers

– Solution is unique if

– Solution is closed form if
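In symbols (reconstructing the slide's missing formulas under the standard 1-D GPCA example; the symbols n, c_j, b_k follow the slide's "n, c, b's"):

```latex
% Every noise-free sample x_i drawn from one of n centers c_1,...,c_n satisfies
p_n(x) \;=\; \prod_{j=1}^{n}(x - c_j) \;=\; x^n + b_{n-1}x^{n-1} + \cdots + b_1 x + b_0 \;=\; 0 .
% Stacking one row per sample gives a linear system in the coefficients:
\begin{bmatrix} x_1^{n} & x_1^{n-1} & \cdots & 1 \\ \vdots & & & \vdots \\ x_N^{n} & x_N^{n-1} & \cdots & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ b_{n-1} \\ \vdots \\ b_0 \end{bmatrix} \;=\; 0 ,
% so the coefficients come from the null space of the embedded data (least squares
% with noise), n is the smallest degree for which such a polynomial exists, and the
% cluster centers c_1,...,c_n are recovered as the n roots of p_n.
```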

Page 21: CVPR2008 tutorial generalized pca

Introductory example: algebraic clustering in 2D

• What about dimension 2?

• What about higher dimensions?

– Complex numbers in higher dimensions?

– How to find roots of a polynomial of quaternions?

• Instead

– Project data onto one or two dimensional space

– Apply same algorithm to projected data

Page 22: CVPR2008 tutorial generalized pca

Representing one subspace

• One plane

• One line

• One subspace can be represented with

– Set of linear equations

– Set of polynomials of degree 1

Page 23: CVPR2008 tutorial generalized pca

De Morgan’s rule

Representing n subspaces

• Two planes

• One plane and one line

– Plane:

– Line:

• A union of n subspaces can be represented with a set of

homogeneous polynomials of degree n

Page 24: CVPR2008 tutorial generalized pca

Veronese map

Fitting polynomials to data points

• Polynomials can be written linearly in terms of the vector of coefficients by using the polynomial (Veronese) embedding

• Coefficients of the polynomials can be computed from the null space of the embedded data

– Solve using least squares

– N = #data points
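Spelling this out (the notation ν_n for the Veronese map follows the GPCA papers; the slide's own formula did not survive extraction):

```latex
p(\mathbf{x}) \;=\; \mathbf{c}^{\top}\nu_n(\mathbf{x}) \;=\; 0 \quad \text{for every data point } \mathbf{x}_i
\;\;\Longrightarrow\;\;
L_n\,\mathbf{c} \;=\; 0, \qquad
L_n \;=\; \begin{bmatrix}\nu_n(\mathbf{x}_1)^{\top}\\ \vdots\\ \nu_n(\mathbf{x}_N)^{\top}\end{bmatrix},
% With noise, c is taken as the right singular vector of L_n associated with the
% smallest singular value (the least-squares solution).
```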

Page 25: CVPR2008 tutorial generalized pca

Finding a basis for each subspace

• Case of hyperplanes:

– Only one polynomial

– Number of subspaces

– Basis are normal vectors

• Problems

– Computing roots may be sensitive to noise

– The estimated polynomial may not perfectly factor with noisy data

– Cannot be applied to subspaces of different dimensions

• Polynomials are estimated up to change of basis, hence they may not factor,

even with perfect data

Polynomial Factorization (GPCA-PFA) [CVPR 2003]
• Find the roots of a polynomial of degree n in one variable

• Solve linear systems in variables

• Solution obtained in closed form for

Page 26: CVPR2008 tutorial generalized pca

Finding a basis for each subspace

• To learn a mixture of subspaces we just need one positive example per class

Polynomial Differentiation (GPCA-PDA) [CVPR’04]

Page 27: CVPR2008 tutorial generalized pca

Choosing one point per subspace

• With noise and outliers

– The zero set of the polynomials may not be a perfect union of subspaces

– Normals can be estimated correctly by choosing points optimally

• Distance to closest subspace without knowing

segmentation?

Page 28: CVPR2008 tutorial generalized pca

GPCA for hyperplane segmentation

• Coefficients of the polynomial can be computed from null

space of embedded data matrix

– Solve using least squares

– N = #data points

• Number of subspaces can be computed from the rank of

embedded data matrix

• Normal to the subspaces can be computed

from the derivatives of the polynomial
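A self-contained numpy sketch of this pipeline for the hyperplane case (fit one vanishing polynomial, differentiate it at each sample, group points by their normals). This is a toy illustration under ideal data, not the released MATLAB code; function names are assumptions:

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, n):
    """Degree-n Veronese (polynomial) embedding of the columns of X (K x N)."""
    K, _ = X.shape
    exps = [np.bincount(c, minlength=K)
            for c in combinations_with_replacement(range(K), n)]
    V = np.stack([np.prod(X ** e[:, None], axis=0) for e in exps])
    return V, exps

def hyperplane_gpca(X, n):
    """Segment points drawn from a union of n hyperplanes through the origin.
    Returns one unit normal per hyperplane and a label for every point."""
    V, exps = veronese(X, n)                     # embedded data, M_n x N
    _, _, Vt = np.linalg.svd(V.T, full_matrices=False)
    c = Vt[-1]                                   # coefficients of the vanishing polynomial
    K, N = X.shape
    G = np.zeros((K, N))                         # gradient of p at every sample
    for coef, e in zip(c, exps):
        for k in range(K):
            if e[k] == 0:
                continue
            ek = e.copy(); ek[k] -= 1
            G[k] += coef * e[k] * np.prod(X ** ek[:, None], axis=0)
    G /= np.linalg.norm(G, axis=0, keepdims=True) + 1e-12
    # pick n well-separated gradients as representative normals, assign by |cosine|
    reps = [G[:, 0]]
    for _ in range(1, n):
        sim = np.max(np.abs(np.stack([r @ G for r in reps])), axis=0)
        reps.append(G[:, np.argmin(sim)])
    reps = np.stack(reps)
    labels = np.argmax(np.abs(reps @ G), axis=0)
    return reps, labels

# Example: two lines through the origin in R^2 (hyperplanes of R^2)
t = np.random.default_rng(1).standard_normal(100)
X = np.hstack([np.vstack([t, 2 * t]), np.vstack([t, -0.5 * t])])
normals, labels = hyperplane_gpca(X, n=2)
```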

Page 29: CVPR2008 tutorial generalized pca

GPCA for subspaces of different dimensions

• There are multiple polynomials

fitting the data

• The derivative of each

polynomial gives a different

normal vector

• Can obtain a basis for the

subspace by applying PCA to

normal vectors

Page 30: CVPR2008 tutorial generalized pca

GPCA for subspaces of different dimensions

• Apply polynomial embedding to projected data

• Obtain multiple subspace model by polynomial fitting

– Solve to obtain

– Need to know number of subspaces

• Obtain bases & dimensions by polynomial differentiation

• Optimally choose one point per subspace using distance

Page 31: CVPR2008 tutorial generalized pca

An example

• Given data lying in the union

of the two subspaces

• We can write the union as

• Therefore, the union can be represented with the two

polynomials

Page 32: CVPR2008 tutorial generalized pca

An example

• Can compute polynomials from

• Can compute normals from

Page 33: CVPR2008 tutorial generalized pca

Dealing with high-dimensional data

• Minimum number of points

– K = dimension of ambient space

– n = number of subspaces

• In practice the dimension of each subspace ki is much smaller than K

– Number and dimensions of the subspaces are preserved by a linear projection onto a subspace of dimension

– Can remove outliers by robustly fitting the subspace

• Open problem: how to choose the projection?

– PCA?

Subspace 1

Subspace 2

Page 34: CVPR2008 tutorial generalized pca

GPCA with spectral clustering

• Spectral clustering

– Build a similarity matrix between pairs of points

– Use eigenvectors to cluster data

• How to define a similarity for subspaces?

– Want points in the same subspace to be close

– Want points in different subspaces to be far

• Use GPCA to get basis

• Distance: subspace angles
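A small numpy sketch of one plausible way to build such a similarity, assuming every point (or local neighborhood) has already been assigned an orthonormal subspace basis; the exp(-Σ sin²θ) weighting is a common choice, not necessarily the slide's exact formula:

```python
import numpy as np

def subspace_angles(B1, B2):
    """Principal angles between two subspaces given orthonormal bases (columns)."""
    s = np.clip(np.linalg.svd(B1.T @ B2, compute_uv=False), -1.0, 1.0)
    return np.arccos(s)

def subspace_affinity(bases):
    """Affinity matrix: entries are large when the angles between subspaces are small."""
    n = len(bases)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i, j] = np.exp(-np.sum(np.sin(subspace_angles(bases[i], bases[j])) ** 2))
    return W   # feed W to any standard spectral-clustering routine

# Example: angles between two random planes in R^3
B1 = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 2)))[0]
B2 = np.linalg.qr(np.random.default_rng(1).standard_normal((3, 2)))[0]
print(np.degrees(subspace_angles(B1, B2)))
```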

Page 35: CVPR2008 tutorial generalized pca

Comparison of PFA, PDA, K-sub, EM

(Plot: error in the normals [degrees] vs. noise level [%], comparing PFA, K-sub, PDA, EM, PDA+K-sub, PDA+EM, and PDA+K-sub+EM.)

Page 36: CVPR2008 tutorial generalized pca

Null(L_n), where L_n = [ν_n(x_1), . . . , ν_n(x_N)].

Dealing with outliers

• GPCA with perfect data

• GPCA with outliers

• GPCA fails because PCA fails: seek a robust estimate of Null(L_n), where L_n = [ν_n(x_1), . . . , ν_n(x_N)]

Page 37: CVPR2008 tutorial generalized pca

Three approaches to tackle outliers

• Probability-based: small-probability samples

– Probability plots: [Healy 1968, Cox 1968]

– PCs: [Rao 1964, Gnanadesikan & Kettenring 1972]

– M-estimators: [Huber 1981, Campbell 1980]

– Multivariate-trimming (MVT): [Gnanadesikan & Kettenring 1972]

• Influence-based: large influence on model parameters

– Parameter difference with and without a sample:[Hampel et al. 1986, Critchley 1985]

• Consensus-based: samples not consistent with models of high consensus

– Hough: [Ballard 1981, Lowe 1999]

– RANSAC: [Fischler & Bolles 1981, Torr 1997]

– Least Median Estimate (LME):[Rousseeuw 1984, Steward 1999]

Page 38: CVPR2008 tutorial generalized pca

Robust GPCA

Page 39: CVPR2008 tutorial generalized pca

Robust GPCA

Simulation on Robust GPCA (parameters fixed at 0.3 rad and 0.4)

• RGPCA – Influence

• RGPCA - MVT

(Figure panels: results at 12%, 32%, and 48% outliers for RGPCA-Influence and RGPCA-MVT.)

Page 40: CVPR2008 tutorial generalized pca

Robust GPCA

Comparison with RANSAC

• Accuracy

• Speed

(Figure panels: subspace arrangements (2,2,1) in R3, (4,2,2,1) in R5, and (5,5,5) in R6.)

Arrangement   (2,2,1) in R3   (4,2,2,1) in R5   (5,5,5) in R6
RANSAC        44 s            5.1 min           3.4 min
MVT           46 s            23 min            8 min
Influence     3 min           58 min            146 min

Table: Average time of RANSAC and RGPCA with 24% outliers.

Page 41: CVPR2008 tutorial generalized pca

Summary

• GPCA: algorithm for clustering subspaces

– Deals with unknown and possibly different dimensions

– Deals with arbitrary intersections among the subspaces

• Our approach is based on

– Projecting data onto a low-dimensional subspace

– Fitting polynomials to projected subspaces

– Differentiating polynomials to obtain a basis

• Applications in image processing and computer vision

– Image segmentation: intensity and texture

– Image compression

– Face recognition under varying illumination

Page 42: CVPR2008 tutorial generalized pca

For more information,

Vision, Dynamics and Learning Lab

@

Johns Hopkins University

Thank You!

Page 43: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis via Lossy Coding and Compression

Yi Ma

Image Formation & Processing Group, Beckman Institute
Decision & Control Group, Coordinated Science Laboratory
Electrical & Computer Engineering Department
University of Illinois at Urbana-Champaign

Page 44: CVPR2008 tutorial generalized pca

MOTIVATION

PROBLEM FORMULATION AND EXISTING APPROACHES

SEGMENTATION VIA LOSSY DATA COMPRESSION

SIMULATIONS (AND EXPERIMENTS)

CONCLUSIONS AND FUTURE DIRECTIONS

OUTLINE

Page 45: CVPR2008 tutorial generalized pca

MOTIVATION – Motion Segmentation in Computer Vision

The “chicken-and-egg” difficulty:
– Knowing the segmentation, estimating the motions is easy;
– Knowing the motions, segmenting the features is easy.

Goal: Given a sequence of images of multiple moving objects, determine:
1. the number and types of motions (rigid-body, affine, linear, etc.)
2. the features that belong to the same motion.

A Unified Algebraic Approach to 2D and 3D Motion Segmentation, [Vidal-Ma, ECCV’04]


Page 46: CVPR2008 tutorial generalized pca

MOTIVATION – Image Segmentation

features

Computer Human

Goal: segment an image into multiple regions with homogeneous texture.

Difficulty: A mixture of models of different dimensions or complexities.

Multiscale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 47: CVPR2008 tutorial generalized pca

MOTIVATION – Video Segmentation

Goal: segment a video sequence into segments with “stationary” dynamics

Identification of Hybrid Linear Systems via Subspace Segmentation, [Huang-Wagner-Ma, C

Model: different segments as outputs from different (linear) dynamical systems:


Page 48: CVPR2008 tutorial generalized pca

MOTIVATION – Massive Multivariate Mixed Data

Face database

Hand written digits

Hyperspectral images

Microarrays

Articulated motions


Page 49: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – Problem Formulation

Difficulties:
– the dimensions of the subspaces can be different
– the data can be corrupted by noise or contaminated by outliers
– the number and dimensions of subspaces may be unknown

Assumption: the data are noisy samples from an arrangement of linear subspaces:

noise-free samples noisy samples samples with outliers

Page 50: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – Statistical Approaches

Assume that the data are i.i.d. samples from a mixture of probabilistic distributions:

Essentially iterate between data segmentation and model estimation.

Solutions:
• Expectation Maximization (EM) for the maximum-likelihood estimate [Dempster et al. ’77], e.g., Probabilistic PCA [Tipping-Bishop ’99]:

• K-Means for a minimax-like estimate [Forgy’65, Jancey’66, MacQueen’67], e.g., K-Subspaces [Ho and Kriegman’03]:

Page 51: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – An Algebro-Geometric Approach

Idea: a union of linear subspaces is an algebraic set -- the zero set of a set of (homogeneous) polynomials:

Complexity exponential in the dimension and number of subspaces.

Solution:
• Identify the set of polynomials of degree n that vanish on the union of subspaces

• Gradients of the vanishing polynomials are normals to the subspaces

Generalized Principal Component Analysis, [Vidal-Ma-Sastry, IEEE Transactions PAMI’05]

Page 52: CVPR2008 tutorial generalized pca

SUBSPACE SEGMENTATION – An Information-Theoretic Approach

Problem: If the number/dimension of subspaces not given and data corrupted

by noise and outliers, how to determine the optimal subspaces that fit the data?

Solutions: Model Selection Criteria?
– Minimum message length (MML) [Wallace-Boulton’68]
– Minimum description length (MDL) [Rissanen’78]
– Bayesian information criterion (BIC)
– Akaike information criterion (AIC) [Akaike’77]
– Geometric AIC [Kanatani’03], Robust AIC [Torr’98]

Key idea (MDL):
• a good balance between model complexity and data fidelity.
• minimize the length of codes that describe the model and the data:

with a quantization error optimal for the model.

Page 53: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION

Questions:

– What is the “gain” or “loss” of segmenting or merging data?

– How does tolerance of error affect segmentation results?

Basic idea: is the number of bits required to store “the whole” more than “the sum of its parts”?

Page 54: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Problem Formulation

– A coding scheme maps a set of vectors to a sequence of bits from which we can decode them. The coding length is denoted as L(·).

– Given a set of real-valued mixed data the optimal segmentation

minimizes the overall coding length:

where

Page 55: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Coding Length for Multivariate Data

Theorem. Given data W and an allowed distortion ε², the coding length L(W) given below is the number of bits needed to encode the data s.t. the mean squared error is at most ε².

A nearly optimal bound for even a small number of vectors drawn from a subspace or a Gaussian source.

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]
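The slide's formula did not survive extraction; for reference, here is the coding-length function as recalled from the cited PAMI paper (the notation, and the omission of the extra term for non-zero-mean data, are assumptions):

```latex
L_{\varepsilon}(W) \;=\; \frac{m+n}{2}\,\log_2 \det\!\Big(I \;+\; \frac{n}{\varepsilon^{2} m}\, W W^{\top}\Big),
\qquad W = [\,w_1,\ldots,w_m\,] \subset \mathbb{R}^{n},
% the (nearly optimal) number of bits needed to code the m vectors up to
% mean squared error \varepsilon^2.
```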

Page 56: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Two Coding Schemes

Goal: code the data s.t. the mean squared error is at most ε²

Linear subspace Gaussian source

Page 57: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Properties of the Coding Length

2. Asymptotic Property:

At high SNR, this is the optimal rate distortion for a Gaussian source.

1. Commutative Property: For high-dimensional data, computing the coding length only needs the kernel matrix:

3. Invariant Property: Harmonic analysis is useful for data compression only when the data are non-Gaussian or nonlinear … so is segmentation!

Page 58: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Why Segment?

partitioning:

sifting:

Page 59: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Probabilistic Segmentation?

Assign the i-th point to the j-th group with probability π_ij (the entries of Π).

Theorem. The expected coding length of the segmented data is a concave function in Π over the domain of a convex polytope.

Minima are reached at the vertices of the polytope -- no probabilistic segmentation!

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 60: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – Segmentation & Channel Capacity

A MIMO additive white Gaussian noise (AWGN) channel

has the capacity:

If allowing probabilistic grouping of transmitters, the expected capacity

is a concave function in Π over a convex polytope. Maximizing such a capacity is a convex problem.

On Coding and Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI

Page 61: CVPR2008 tutorial generalized pca

LOSSY DATA COMPRESSION – A Greedy (Agglomerative) Algorithm

Objective: minimizing the overall coding length

Input:

while true do
    choose two sets S_i, S_j such that the change in coding length L(S_i ∪ S_j) − L(S_i) − L(S_j) is minimal
    if this change is negative then merge S_i and S_j
    else break
    endif
end
Output: the resulting segmentation

“Bottom-up” merge


Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
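A compact numpy sketch of this bottom-up merging, using a simplified surrogate of the coding-length function from the cited Ma-Derksen-Hong-Wright paper (zero-mean coding length plus a label-cost term); the function names and stopping rule are assumptions, not the paper's exact implementation:

```python
import numpy as np

def coding_length(W, eps):
    """Approximate #bits to code the columns of W (n x m) up to distortion eps^2
    (simplified, zero-mean version of the Ma-Derksen-Hong-Wright formula)."""
    n, m = W.shape
    G = np.eye(n) + (n / (eps ** 2 * m)) * (W @ W.T)
    _, logdet = np.linalg.slogdet(G)
    return 0.5 * (m + n) * logdet / np.log(2)

def total_length(groups, eps, N):
    """Coding length of a segmentation: group lengths plus membership (label) bits."""
    L = 0.0
    for W in groups:
        m = W.shape[1]
        L += coding_length(W, eps) - m * np.log2(m / N)
    return L

def greedy_merge(groups, eps):
    """Bottom-up merging: repeatedly merge the pair that reduces total length most."""
    N = sum(W.shape[1] for W in groups)
    while len(groups) > 1:
        best, best_gain = None, 0.0
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                merged = np.hstack([groups[i], groups[j]])
                gain = (total_length([groups[i], groups[j]], eps, N)
                        - total_length([merged], eps, N))
                if gain > best_gain:
                    best, best_gain = (i, j), gain
        if best is None:                      # no merge reduces the coding length
            break
        i, j = best
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] \
                 + [np.hstack([groups[i], groups[j]])]
    return groups

# Toy usage: start from singleton groups of 2-D points and merge
pts = np.random.default_rng(0).standard_normal((2, 20))
result = greedy_merge([pts[:, [i]] for i in range(20)], eps=0.1)
```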

Page 62: CVPR2008 tutorial generalized pca

SIMULATIONS – Mixture of Almost Degenerate Gaussians

Noisy samples from two lines and one plane in R3

Given Data Segmentation Results

ε0 = 0.01

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

ε0 = 0.08

Page 63: CVPR2008 tutorial generalized pca

SIMULATIONS – “Phase Transition”

Rate vs. distortion (ε0 = 0.08)

#groups vs. distortion

Stability: the same segmentation for ε across 3 orders of magnitude!

(Phase analogy: steam, water, ice cubes.)

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 64: CVPR2008 tutorial generalized pca

SIMULATIONS – Comparison with EM

100 x d uniformly distributed random samples from each subspace, corrupted with 4% noise. Classification rate averaged over 25 trials for each case.

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 65: CVPR2008 tutorial generalized pca

SIMULATIONS – Comparison with EM

Segmenting three degenerate or non-degenerate Gaussian clusters over 50 trials.

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 66: CVPR2008 tutorial generalized pca

SIMULATIONS – Robustness with Outliers

(Figure panels: 35.8%, 45.6%, 71.5%, and 73.6% outliers.)

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 67: CVPR2008 tutorial generalized pca

SIMULATIONS – Affine Subspaces with Outliers

(Figure panels: 35.8%, 45.6%, 66.2%, and 69.1% outliers.)

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 68: CVPR2008 tutorial generalized pca

SIMULATIONS – Piecewise-Linear Approximation of Manifolds

Swiss roll Mobius strip Torus Klein bottle

Page 69: CVPR2008 tutorial generalized pca

SIMULATIONS – Summary

– The minimum coding length objective automatically addresses the

model selection issue: the optimal solution is very stable and robust.

– The segmentation/merging is physically meaningful (measured in bits).

The results resemble phase transition in statistical physics.

– The greedy algorithm is scalable (polynomial in both K and N) and

converges well when ε is not too small w.r.t. the sample density.

Page 70: CVPR2008 tutorial generalized pca

Clustering from a Classification Perspective

Solution: Knowing the class distributions, the optimal classifier is the maximum a posteriori (MAP) classifier:

Difficulties: How to learn the two distributions from samples?(parametric, non-parametric, model selection, high-dimension, outliers…)

Goal: Construct a classifier such that the misclassification error is minimized.

Assumption: The training data are drawn from a distribution

Page 71: CVPR2008 tutorial generalized pca

MINIMUM INCREMENTAL CODING LENGTH – Problem Formulation

Ideas: Using the lossy coding length

as a surrogate for the Shannon lossless coding length w.r.t. true distributions.

Classification Criterion: Minimum Incremental Coding Length (MICL)

The additional number of bits needed to encode the test sample with the j-th training set is:
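The formula itself is not reproduced in the transcript; a hedged restatement of the MICL criterion as described in the cited NIPS’07 paper (the exact form of the label-cost term is from memory):

```latex
\hat{j}(x) \;=\; \arg\min_{j}\; \delta L_{\varepsilon}(x, j)
\;=\; \arg\min_{j}\; \Big[\, L_{\varepsilon}\big(\mathcal{X}_j \cup \{x\}\big)
\;-\; L_{\varepsilon}\big(\mathcal{X}_j\big) \;-\; \log_2 \hat{\pi}_j \,\Big],
% where X_j is the j-th training set and \hat{\pi}_j its empirical prior
% (the -log_2 term is the cost of coding the class label).
```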

Page 72: CVPR2008 tutorial generalized pca

MICL (“Michael”) – Asymptotic Properties

Theorem: As the number of samples goes to infinity, the MICL criterion converges with probability one to the following criterion:

where

is the “number of effective parameters” of the j-th model (class).

Theorem: The MICL classifier converges to the above asymptotic form at the rate of for some constant .

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 73: CVPR2008 tutorial generalized pca

SIMULATIONS – Interpolation and Extrapolation via MICL

MICL

SVM

k-NN

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 74: CVPR2008 tutorial generalized pca

SIMULATIONS – Improvement over MAP and RDA [Friedman 1989]

Two Gaussians in R2: isotropic (left), anisotropic (right); 500 trials.

Three Gaussians in Rn: dim = n, dim = n/2, dim = 1; 500 trials.

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 75: CVPR2008 tutorial generalized pca

SIMULATIONS – Local and Kernel MICL

LMICL k-NN

KMICL-RBF SVM-RBF

Local MICL (LMICL): Applying MICL locally to the k-nearest neighbors of the test sample (frequentist + Bayesian).

Kernel MICL (KMICL): Incorporating MICL with a nonlinear kernel naturally through the identity (“kernelized” RDA):

Minimum Incremental Coding Length (MICL), [Wright, Ma, et al., NIPS’07]

Page 76: CVPR2008 tutorial generalized pca

CONCLUSIONS

Assumptions: Data are in a high-dimensional space but have low-dimensional structures (subspaces or submanifolds).

Compression => Clustering & Classification:
– Minimum (incremental) coding length subject to distortion.
– Asymptotically optimal clustering and classification.
– Greedy clustering algorithm (bottom-up, agglomerative).
– MICL corroborates MAP, RDA, k-NN, and kernel methods.

Applications (Next Lectures):
– Video segmentation, motion segmentation (Vidal)
– Image representation & segmentation (Ma)
– Others: microarray clustering, recognition of faces and handwritten digits (Ma)

Page 77: CVPR2008 tutorial generalized pca

FUTURE DIRECTIONS

Theory
– More complex structures: manifolds, systems, random fields…
– Regularization (ridge, lasso, banding, etc.)
– Sparse representation and subspace arrangements

Computation
– Global optimality (random techniques, convex optimization…)
– Scalability: random sampling, approximation…

Future Application Domains
– Image/video/audio classification, indexing, and retrieval
– Hyper-spectral images and videos
– Biomedical images, microarrays
– Autonomous navigation, surveillance, and 3D mapping
– Identification of hybrid linear/nonlinear systems

Page 78: CVPR2008 tutorial generalized pca

REFERENCES & ACKNOWLEDGMENT

References:
– Segmentation of Multivariate Mixed Data via Lossy Data Compression, Yi Ma, Harm Derksen, Wei Hong, John Wright, PAMI, 2007.
– Classification via Minimum Incremental Coding Length (MICL), John Wright et al., NIPS, 2007.
– Website: http://perception.csl.uiuc.edu/coding/home.htm

People:
– John Wright, PhD Student, ECE Department, University of Illinois
– Prof. Harm Derksen, Mathematics Department, University of Michigan
– Allen Yang (UC Berkeley) and Wei Hong (Texas Instruments R&D)
– Zhouchen Lin and Harry Shum, Microsoft Research Asia, China

Funding:
– ONR YIP N00014-05-1-0633
– NSF CAREER IIS-0347456, CCF-TF-0514955, CRS-EHS-0509151

Page 79: CVPR2008 tutorial generalized pca

“The whole is more than the sum of its parts.”

– Aristotle

Questions, please?


Yi Ma, CVPR 2008

Page 80: CVPR2008 tutorial generalized pca

Part II

Applications of GPCA in Computer Vision

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 81: CVPR2008 tutorial generalized pca

Part II: Applications in computer vision

• Applications to motion & video segmentation (10.30-11.15)

– 2-D and 3-D motion segmentation

– Temporal video segmentation

– Dynamic texture segmentation

• Applications to image representation and segmentation

(11.15-12.00)

– Multi-scale hybrid linear models for sparse

image representation

– Hybrid linear models for image segmentation

Page 82: CVPR2008 tutorial generalized pca

Applications to motion and video segmentation

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 83: CVPR2008 tutorial generalized pca

3-D motion segmentation problem

• Given a set of point correspondences in multiple views, determine

– Number of motion models

– Motion model: affine, homography, fundamental matrix, trifocal tensor

– Segmentation: model to which each pixel belongs

• Mathematics of the problem depends on

– Number of frames (2, 3, multiple)

– Projection model (affine, perspective)

– Motion model (affine, translational, homography, fundamental matrix, etc.)

– 3-D structure (planar or not)

Page 84: CVPR2008 tutorial generalized pca

Taxonomy of problems

• 2-D Layered representation
– Probabilistic approaches: Jepson-Black’93, Ayer-Sawhney’95, Darrel-Pentland’95, Weiss-Adelson’96, Weiss’97, Torr-Szeliski-Anandan’99

– Variational approaches: Cremers-Soatto ICCV’03

– Initialization: Wang-Adelson’94, Irani-Peleg’92, Shi-Malik‘98, Vidal-Singaraju’05-’06

• Multiple rigid motions in two perspective views– Probabilistic approaches: Feng-Perona’98, Torr’98

– Particular cases: Izawa-Mase’92, Shashua-Levin’01, Sturm’02,

– Multibody fundamental matrix: Wolf-Shashua CVPR’01, Vidal et al. ECCV’02, CVPR’03, IJCV’06

– Motions of different types: Vidal-Ma-ECCV’04, Rao-Ma-ICCV’05

• Multiple rigid motions in three perspective views– Multibody trifocal tensor: Hartley-Vidal-CVPR’04

• Multiple rigid motions in multiple affine views– Factorization-based: Costeira-Kanade’98, Gear’98, Wu et al.’01, Kanatani et al.’01-02-04

– Algebraic: Yan-Pollefeys-ECCV’06, Vidal-Hartley-CVPR’04

• Multiple rigid motions in multiple perspective views

– Schindler et al. ECCV’06, Li et al. CVPR’07

Page 85: CVPR2008 tutorial generalized pca

A unified approach to motion segmentation

• Estimation of multiple motion models equivalent to

estimation of one multibody motion model

– Eliminate feature clustering: multiplication

– Estimate a single multibody motion model: polynomial fitting

– Segment multibody motion model: polynomial differentiation

chicken-and-egg

Page 86: CVPR2008 tutorial generalized pca

A unified approach to motion segmentation

• Applies to most motion models in computer vision

• All motion models can be segmented algebraically by– Fitting multibody model: real or complex polynomial to all data

– Fitting individual model: differentiate polynomial at a data point

Page 87: CVPR2008 tutorial generalized pca

Segmentation of 3-D translational motions

• Multiple epipoles (translation)

• Epipolar constraint: plane in

– Plane normal = epipoles

– Data = epipolar lines

• Multibody epipolar constraint

• Epipoles are derivatives of the multibody epipolar constraint at the epipolar lines

Page 88: CVPR2008 tutorial generalized pca

Segmentation of 3-D translational motions

Page 89: CVPR2008 tutorial generalized pca

Single-body factorization

• Affine camera model

– p = point

– f = frame

• Motion of one rigid-body lives in a 4-D subspace (Boult and Brown ’91, Tomasi and Kanade ’92)

– P = #points

– F = #frames

Structure = 3D surface

Motion = camera position and orientation
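In symbols (the standard Tomasi-Kanade affine factorization, stated here for reference; the slide's own notation is not reproduced):

```latex
W \;=\; \begin{bmatrix}\mathbf{x}_{11}&\cdots&\mathbf{x}_{1P}\\ \vdots& &\vdots\\ \mathbf{x}_{F1}&\cdots&\mathbf{x}_{FP}\end{bmatrix}
\;\in\;\mathbb{R}^{2F\times P}
\;=\; \underbrace{M}_{2F\times 4}\;\underbrace{S}_{4\times P},
\qquad \operatorname{rank}(W)\le 4,
% where each x_{fp} in R^2 is the image of point p in frame f, M stacks the affine
% camera matrices, and S holds the 3-D points in homogeneous coordinates; hence the
% trajectories of one rigid-body motion span a subspace of dimension at most 4.
```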

Page 90: CVPR2008 tutorial generalized pca

Multi-body factorization

• Given n rigid motions

• Motion segmentation is obtained from

– Leading singular vector of (Boult and Brown ’91)

– Shape interaction matrix (Costeira & Kanade ’95, Gear ’94)

– Number of motions (if fully-dimensional)

• Motion subspaces need to be independent (Kanatani ’01)

Page 91: CVPR2008 tutorial generalized pca

Multi-body factorization

• Sensitive to noise

– Kanatani (ICCV ’01): use model selection to scale Q

– Wu et al. (CVPR’01): project data onto subspaces and iterate

• Fails with partially dependent motions

– Zelnik-Manor and Irani (CVPR’03)

• Build similarity matrix from normalized Q

• Apply spectral clustering to similarity matrix

– Yan and Pollefeys (ECCV’06)

• Local subspace estimation + spectral clustering

– Kanatani (ECCV’04)

• Assume degeneracy is known: pure translation in the image

• Segment data by multi-stage optimization (multiple EM problems)

• Cannot handle missing data

– Gruber and Weiss (CVPR’04)

• Expectation Maximization

Page 92: CVPR2008 tutorial generalized pca

PowerFactorization+GPCA

• A motion segmentation algorithm that

– Is provably correct with perfect data

– Handles both independent and degenerate motions

– Handles both complete and incomplete data

• Project trajectories onto a 5-D subspace of

– Complete data: PCA or SVD

– Incomplete data: PowerFactorization

• Cluster projected subspaces using GPCA

– Handles both independent and degenerate motions

– Non-iterative: can be used to initialize EM

Page 93: CVPR2008 tutorial generalized pca

Projection onto a 5-D subspace

• Motion of one rigid-body lives in

4-D subspace of

• By projecting onto a 5-D

subspace of

– Number and dimensions of

subspaces are preserved

– Motion segmentation is

equivalent to clustering

subspaces of dimension

2, 3 or 4 in

– Minimum #frames = 3

(CK needs a minimum of 2n

frames for n motions)

– Can remove outliers by robustly

fitting the 5-D subspace using

Robust SVD (DeLaTorre-Black)

• What projection to use?

– PCA: 5 principal components

– RPCA: with outliers

Motion 1

Motion 2

Page 94: CVPR2008 tutorial generalized pca

Projection onto a 5-D subspace

PowerFactorization algorithm:

• Complete data

– Given A solve for B

– Orthonormalize B

– Given B solve for A

– Iterate

• Converges to a rank-r approximation with rate

Given , factor it as

• Incomplete data

• It diverges in some cases

• Works well with up to 30% of missing data

Linear problem
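A numpy sketch of the alternating scheme described above, assuming every row and column of the data matrix has at least a few observed entries; it is not the authors' implementation, and the parameter names are illustrative:

```python
import numpy as np

def power_factorization(W, mask, r, n_iter=100):
    """Rank-r factorization W ~ A @ B.T using only the observed entries
    (mask[i, j] == True where W[i, j] is known)."""
    m, n = W.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((n, r))
    for _ in range(n_iter):
        # Given A, solve a least-squares problem per column over its observed rows
        for j in range(n):
            rows = mask[:, j]
            B[j] = np.linalg.lstsq(A[rows], W[rows, j], rcond=None)[0]
        B, _ = np.linalg.qr(B)                 # orthonormalize B
        # Given B, solve a least-squares problem per row over its observed columns
        for i in range(m):
            cols = mask[i, :]
            A[i] = np.linalg.lstsq(B[cols], W[i, cols], rcond=None)[0]
    return A, B

# Example: recover a rank-3 matrix from ~70% observed entries
rng = np.random.default_rng(0)
W_true = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
mask = rng.random(W_true.shape) < 0.7
A, B = power_factorization(W_true, mask, r=3)
```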

Page 95: CVPR2008 tutorial generalized pca

Motion segmentation using GPCA

• Apply polynomial embedding to 5-D points

Veronese map

Page 96: CVPR2008 tutorial generalized pca

Hopkins 155 motion segmentation database

• Collected 155 sequences– 120 with 2 motions

– 35 with 3 motions

• Types of sequences
– Checkerboard sequences: mostly full-dimensional and independent motions
– Traffic sequences: mostly degenerate (linear, planar) and partially dependent motions
– Articulated sequences: mostly full-dimensional and partially dependent motions

• Point correspondences
– In a few cases, provided by Kanatani & Pollefeys
– In most cases, extracted semi-automatically with OpenCV

Page 97: CVPR2008 tutorial generalized pca

Experimental results: Hopkins 155 database

• 2 motions, 120 sequences, 266 points, 30 frames

Page 98: CVPR2008 tutorial generalized pca

Experimental results: Hopkins 155 database

• 3 motions, 35 sequences, 398 points, 29 frames

Page 99: CVPR2008 tutorial generalized pca

Experimental results: missing data sequences

• There is no clear correlation between amount of missing data and

percentage of misclassification

• This could be because convergence of PF depends more on “where”

missing data is located than on “how much” missing data there is

Page 100: CVPR2008 tutorial generalized pca

Conclusions

• For two motions

– Algebraic methods (GPCA and LSA) are more accurate than

statistical methods (RANSAC and MSL)

– LSA performs better on full and independent sequences, while

GPCA performs better on degenerate and partially dependent

– LSA is sensitive to dimension of projection: d=4n better than d=5

– MSL is very slow, RANSAC and GPCA are fast

• For three motions

– GPCA is not very accurate, but is very fast

– MSL is the most accurate, but it is very slow

– LSA is almost as accurate as MSL and almost as fast as GPCA

Page 101: CVPR2008 tutorial generalized pca

Segmentation of Dynamic Textures

René Vidal

Center for Imaging Science

Institute for Computational Medicine

Johns Hopkins University

Page 102: CVPR2008 tutorial generalized pca

Modeling a dynamic texture: fixed boundary

• Examples of dynamic textures:

• Model temporal evolution as the output of a linear

dynamical system (LDS): Soatto et al. ‘01

dynamics

appearance

images

z_{t+1} = A z_t + v_t,    y_t = C z_t + w_t
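A minimal numpy sketch of the usual SVD-based (suboptimal) way to fit such an LDS to a stack of vectorized frames; this is not the tutorial's code, and the state dimension r is an illustrative parameter:

```python
import numpy as np

def learn_dynamic_texture(Y, r):
    """Fit y_t = C z_t + w_t,  z_{t+1} = A z_t + v_t  to frames Y (pixels x T)
    with an r-dimensional state, using the standard SVD-based recipe."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :r]                               # appearance (observability) basis
    Z = np.diag(S[:r]) @ Vt[:r]                # state trajectory, r x T
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])   # least-squares dynamics estimate
    return A, C, Z

# Example usage: each column of Y is one vectorized frame
Y = np.random.default_rng(0).standard_normal((1000, 60))
A, C, Z = learn_dynamic_texture(Y, r=10)
```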

Page 103: CVPR2008 tutorial generalized pca

Segmenting non-moving dynamic textures

• One dynamic texture lives in the observability subspace

• Multiple textures live in multiple subspaces

• Cluster the data using GPCA

water

steam

z_{t+1} = A z_t + v_t,    y_t = C z_t + w_t

Page 104: CVPR2008 tutorial generalized pca

Segmenting moving dynamic textures

Page 105: CVPR2008 tutorial generalized pca

Segmenting moving dynamic textures

Ocean-bird

Page 106: CVPR2008 tutorial generalized pca

Level-set intensity-based segmentation

• Chan-Vese energy functional

• Implicit methods
– Represent C as the zero level set of an implicit function φ, i.e. C = {(x, y) : φ(x, y) = 0}

• Solution
– The solution is obtained by gradient descent on φ
– c1 and c2 are the mean intensities inside and outside the contour C.
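The functional itself is not reproduced in the transcript; in its standard form (the weights μ, λ1, λ2 are the usual ones, not necessarily those used on the slide):

```latex
E(c_1, c_2, C) \;=\; \mu\,\mathrm{Length}(C)
\;+\; \lambda_1 \int_{\mathrm{inside}(C)} |I(x,y) - c_1|^2 \,dx\,dy
\;+\; \lambda_2 \int_{\mathrm{outside}(C)} |I(x,y) - c_2|^2 \,dx\,dy .
```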

Page 107: CVPR2008 tutorial generalized pca

Dynamics & intensity-based energy

• We represent the intensities of the pixels in the images as

the output of a mixture of AR models of order p

• We propose the following spatial-temporal extension of the

Chan-Vese energy functional

where

Page 108: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Given the ARX parameters, we can solve for the implicit

function by solving the PDE

• Given the implicit function φ, we can solve for the ARX

parameters of the jth region by solving the linear system

Page 109: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Fixed boundary segmentation results and comparison

Ocean-smoke Ocean-dynamics Ocean-appearance

Page 110: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Moving boundary segmentation results and comparison

Ocean-fire

Page 111: CVPR2008 tutorial generalized pca

Variational segmentation of dynamic textures

• Results on a real sequence

Raccoon on River

Page 112: CVPR2008 tutorial generalized pca

Temporal video segmentation

• Segmenting N=30 frames of a

sequence containing n=3

scenes

– Host

– Guest

– Both

• Image intensities are output of

linear system

• Apply GPCA to fit n=3

observability subspaces

dynamics

appearance, images

x_{t+1} = A x_t + v_t,    y_t = C x_t + w_t

Page 113: CVPR2008 tutorial generalized pca

Temporal video segmentation

• Segmenting N=60 frames of a

sequence containing n=3

scenes

– Burning wheel

– Burnt car with people

– Burning car

• Image intensities are the output of a linear system

• Apply GPCA to fit n=3 observability subspaces

dynamics, appearance, images

x_{t+1} = A x_t + v_t,    y_t = C x_t + w_t

Page 114: CVPR2008 tutorial generalized pca

Conclusions

• Many problems in computer vision can be posed as subspace clustering problems

– Temporal video segmentation

– 2-D and 3-D motion segmentation

– Dynamic texture segmentation

– Nonrigid motion segmentation

• These problems can be solved using GPCA: an algorithm for clustering subspaces

– Deals with unknown and possibly different dimensions

– Deals with arbitrary intersections among the subspaces

• GPCA is based on

– Projecting data onto a low-dimensional subspace

– Recursively fitting polynomials to projected subspaces

– Differentiating polynomials to obtain a basis

Page 115: CVPR2008 tutorial generalized pca

For more information,

Vision, Dynamics and Learning Lab

@

Johns Hopkins University

Thank You!

Page 116: CVPR2008 tutorial generalized pca

Generalized Principal Component Analysis for Image Representation & Segmentation

Yi Ma

Control & Decision, Coordinated Science Laboratory
Image Formation & Processing Group, Beckman Institute
Department of Electrical & Computer Engineering
University of Illinois at Urbana-Champaign

Page 117: CVPR2008 tutorial generalized pca

INTRODUCTION

GPCA FOR LOSSY IMAGE REPRESENTATION

IMAGE SEGMENTATION VIA LOSSY COMPRESSION

OTHER APPLICATIONS

CONCLUSIONS AND FUTURE DIRECTIONS

Page 118: CVPR2008 tutorial generalized pca

Introduction – Image Representation via Linear Transformations

better representations?

pixel-based representation

three matrices of RGB values

linear transformation → a more compact representation

Page 119: CVPR2008 tutorial generalized pca

Introduction

Fixed Orthogonal Bases (representation, approximation, compression)
- Discrete Fourier transform (DFT) or discrete cosine transform (DCT) (Ahmed ’74): JPEG.
- Wavelets (multi-resolution) (Daubechies’88, Mallat’92): JPEG-2000.
- Curvelets and contourlets (Candes & Donoho’99, Do & Vetterli’00)

Non-orthogonal Bases (for redundant representations)
- Extended lapped transforms, frames, sparse representations (Lp geometry)…

Discrete Fourier transform (DFT)

Wavelet transform

6.25% coefficients.

Page 120: CVPR2008 tutorial generalized pca

Introduction

Adaptive Bases (optimal if imagery data are uni-modal)
- Karhunen-Loeve transform (KLT), also known as PCA (Pearson’1901, Hotelling’33, Jolliffe’86)

stack

adaptive bases

Page 121: CVPR2008 tutorial generalized pca

Introduction – Principal Component Analysis (PCA): Dimensionality Reduction

Find a low-dimensional representation (model) for high-dimensional data.

Variations of PCA
– Nonlinear Kernel PCA (Scholkopf-Smola-Muller’98)
– Probabilistic PCA (Tipping-Bishop’99, Collins et al.’01)
– Higher-Order SVD (HOSVD) (Tucker’66, Davis’02)
– Independent Component Analysis (Hyvarinen-Karhunen-Oja’01)

Principal Component Analysis (Pearson’1901, Hotelling’1933, Eckart & Young’1936) or Karhunen-Loeve transform (KLT).

Basis for S SVD

Page 122: CVPR2008 tutorial generalized pca

Distribution of the first three principal components of the Baboon image: A clear multi-modal distribution

Hybrid Linear Models – Multi-Modal Characteristics

Page 123: CVPR2008 tutorial generalized pca

Vector Quantization (VQ)

- multiple 0-dimensional affine subspaces (i.e. cluster means)

- existing clustering algorithms are iterative (EM, K-means)

Hybrid Linear Models – Multi-Modal Characteristics

Page 124: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Versus Linear Models

A single linear model

(Figure: the stacked image data modeled by a single linear subspace vs. by hybrid linear models.)

Page 125: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Characteristics of Natural Images

(Table: characteristics of image models – Fourier (DCT), wavelets, curvelets, random fields, PCA/KLT, VQ, and hybrid linear – compared along the axes 1D, 2D, multivariate (vector-valued), hybrid (multi-modal), hierarchical (multi-scale), and high-dimensional.)

We need a new & simple paradigm to effectively account for all these characteristics simultaneously.


Page 126: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Subspace Estimation and Segmentation

“Chicken-and-Egg” Coupling
– Given segmentation, estimate subspaces
– Given subspaces, segment the data

Hybrid Linear Models (or Subspace Arrangements)
– the number of subspaces is unknown
– the dimensions of the subspaces are unknown
– the bases of the subspaces are unknown
– the segmentation of the data points is unknown

Page 127: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Recursive GPCA (an Example)

Page 128: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Effective Dimension

Model Selection (for Noisy Data): balance model complexity against data fidelity.

(Terms in the effective-dimension formula: dimension of each subspace, number of points in each subspace, number of subspaces, total number of points.)

Model selection criterion: minimizing effective dimension subject to a given error tolerance (or PSNR)

Page 129: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Simulation Results (5% Noise)

ED=3

ED=2.0067

ED=1.6717

Page 130: CVPR2008 tutorial generalized pca

Hybrid Linear Models – Subspaces of the Barbara Image

Page 131: CVPR2008 tutorial generalized pca

PCA (8x8)

Haar Wavelet, GPCA (8x8)

Original

Hybrid Linear Models – Lossy Image Representation (Baboon)

DCT (JPEG)

GPCA

Page 132: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Algorithm Diagram

Diagram for a level-3 implementation of hybrid linear models for image representation

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 133: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – The Baboon Image

downsample by two, twice

segmentation of 2 by 2 blocks

The Baboon image

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 134: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Comparison with Other Methods

The Baboon image

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 135: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Image Approximation

Comparison with level-3 wavelet (7.5% coefficients)

Level-3 bior-4.4 wavelets PSNR=23.94

Level-3 hybrid linear model PSNR=24.64

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 136: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Block Size Effect

Some problems with the multi-scale hybrid linear model:
1. it has a minor block effect;
2. it is computationally more costly (than Fourier, wavelets, PCA);
3. it does not fully exploit spatial smoothness as wavelets do.

The Baboon image

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 137: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – The Wavelet Domain

The Baboon image

segmentation at each scale

LH HH

HL

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 138: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Wavelets vs. Hybrid Linear Wavelets

The Baboon image

Advantages of the hybrid linear model in the wavelet domain:
1. it eliminates the block effect;
2. it is computationally less costly (than in the spatial domain);
3. it achieves higher PSNR.

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 139: CVPR2008 tutorial generalized pca

Multi-Scale Implementation – Visual Comparison

Comparison among several models (7.5% coefficients)

Original Image

Wavelets PSNR=23.94

Hybrid model in spatial domain, PSNR=24.64

Hybrid model in wavelet domain, PSNR=24.88

Multi-Scale Hybrid Linear Models for Lossy Image Representation, [Hong-Wright-Ma, TIP’06]

Page 140: CVPR2008 tutorial generalized pca

Image Segmentation – via Lossy Data Compression

stack


Page 141: CVPR2008 tutorial generalized pca

APPLICATIONS – Texture-Based Image Segmentation

Naïve approach:
– Take a 7x7 Gaussian window around every pixel.
– Stack these windows as vectors.
– Cluster the vectors using our algorithm (sketched after the citation below).

A few results:

Segmentation of Multivariate Mixed Data via Lossy Coding and Compression, [Ma-Derksen-Hong-Wright, PAMI’07]
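A small numpy sketch of the feature-extraction step described above; the window size default and the Gaussian weighting σ are illustrative assumptions:

```python
import numpy as np

def window_features(img, w=7, sigma=None):
    """Stack a Gaussian-weighted (w x w) window around every pixel of a
    grayscale image into a feature vector (one column per pixel)."""
    r = w // 2
    sigma = sigma if sigma is not None else w / 4.0        # illustrative default
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)).ravel()
    padded = np.pad(img, r, mode='reflect')
    H, W = img.shape
    feats = np.empty((w * w, H * W))
    k = 0
    for i in range(H):
        for j in range(W):
            feats[:, k] = g * padded[i:i + w, j:j + w].ravel()
            k += 1
    return feats

# Example usage on a toy image; the columns of F are then clustered
img = np.random.default_rng(0).random((32, 32))
F = window_features(img)      # 49 x 1024 feature matrix
```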

Page 142: CVPR2008 tutorial generalized pca

APPLICATIONS – Distribution of Texture Features

Question: why does such a simple algorithm work at all?

Answer: Compression (MDL/MCL) is well suited to mid-level texture segmentation.

Using a single representation (e.g. windows, filterbank responses) for textures of different complexity ⇒ redundancy and degeneracy, which can be exploited for clustering / compression.


Above: singular values of feature vectors from two different segments of the image at left.

Page 143: CVPR2008 tutorial generalized pca

APPLICATIONS – Compression-based Texture Merging (CTM)

Problem with the naïve approach: strong edges, segment boundaries.

Solution: low-level, edge-preserving over-segmentation into small homogeneous regions.

Simple features: stacked Gaussian windows (7x7 in our experiments).

Merge adjacent regions to minimize coding length (“compress” the features).


Page 144: CVPR2008 tutorial generalized pca

APPLICATIONS – Hierarchical Image Segmentation via CTM

Lossy coding with varying distortion ε (ε = 0.1, 0.2, 0.4) => a hierarchy of segmentations

Page 145: CVPR2008 tutorial generalized pca

APPLICATIONS – CTM: Qualitative Results

Page 146: CVPR2008 tutorial generalized pca

APPLICATIONS – CTM: Quantitative Evaluation and Comparison

PRI: Probabilistic Rand Index [Pantofaru 2005]
VoI: Variation of Information [Meila 2005]
GCE: Global Consistency Error [Martin 2001]
BDE: Boundary Displacement Error [Freixenet 2002]

Berkeley Image Segmentation Database

Unsupervised Segmentation of Natural Images via Lossy Data Compression, CVIU, 2008

Page 147: CVPR2008 tutorial generalized pca

Other Applications: Multiple Motion Segmentation (on Hopkins155)

Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08


Two Motions: MSL 4.14%, LSA 3.45%, ALC 2.40%; works with up to 25% outliers.
Three Motions: MSL 8.32%, LSA 9.73%, ALC 6.26%.

Page 148: CVPR2008 tutorial generalized pca

Other Applications – Clustering of Microarray Data

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 149: CVPR2008 tutorial generalized pca

Other Applications – Clustering of Microarray Data

Segmentation of Multivariate Mixed Data, [Ma-Derksen-Hong-Wright, PAMI’07]

Page 150: CVPR2008 tutorial generalized pca

Other Applications – Supervised Classification

Premises: Data lie on an arrangement of subspaces

Unsupervised Clustering– Generalized PCA

Supervised Classification– Sparse Representation

Page 151: CVPR2008 tutorial generalized pca

Other Applications – Robust Face Recognition

Robust Face Recognition via Sparse Representation, to appear in PAMI 2008

Page 152: CVPR2008 tutorial generalized pca

Other Applications: Robust Motion Segmentation (on Hopkins155)

Shankar Rao, Roberto Tron, Rene Vidal, and Yi Ma, to appear in CVPR’08

Dealing with incomplete or mistracked features, with the dataset 80% corrupted!

Page 153: CVPR2008 tutorial generalized pca

Three Measures of Sparsity: Bits, L_0 and L1-Norm

Reason: High-dimensional data, like images, do have compact, compressible, sparse structures, in terms of their geometry, statistics, and semantics.

Page 154: CVPR2008 tutorial generalized pca

Conclusions

Most imagery data are high-dimensional, statistically or geometrically heterogeneous, and have multi-scale structures.

Imagery data require hybrid models that can adaptively represent different subsets of the data with different (sparse) linear models.

Mathematically, it is possible to estimate and segment hybrid (linear) models non-iteratively. GPCA offers one such method.

Hybrid models lead to new paradigms, new principles, andnew applications for image representation, compression, and segmentation.

Page 155: CVPR2008 tutorial generalized pca

Future Directions

Mathematical Theory
– Subspace arrangements (algebraic properties).
– Extension of GPCA to more complex algebraic varieties (e.g., hybrid multilinear, high-order tensors).
– Representation & approximation of vector-valued functions.

Computation & Algorithm Development
– Efficiency, noise sensitivity, outlier elimination.
– Other ways to combine with wavelets and curvelets.

Applications to Other Data
– Medical imaging (ultra-sonic, MRI, diffusion tensor…)
– Satellite hyper-spectral imaging.
– Audio, video, faces, and digits.
– Sensor networks (location, temperature, pressure, RFID…)
– Bioinformatics (gene expression data…)

Page 156: CVPR2008 tutorial generalized pca

Acknowledgement

People
– Wei Hong, Allen Yang, John Wright, University of Illinois
– Rene Vidal of Biomedical Engineering Dept., Johns Hopkins University
– Kun Huang of Biomedical & Informatics Science Dept., Ohio State University

Funding
– Research Board, University of Illinois at Urbana-Champaign
– National Science Foundation (NSF CAREER IIS-0347456)
– Office of Naval Research (ONR YIP N000140510633)
– National Science Foundation (NSF CRS-EHS0509151)
– National Science Foundation (NSF CCF-TF0514955)

Page 157: CVPR2008 tutorial generalized pca

Thank You!

Generalized Principal Component Analysis: Modeling and Segmentation of Multivariate Mixed Data

Rene Vidal, Yi Ma, and Shankar Sastry. Springer-Verlag, to appear.

Yi Ma, CVPR 2008