part 2: unsupervised learning machine learning techniques

72
Part 2: Unsupervised Learning Machine Learning Techniques for Computer Vision Microsoft Research Cambridge ECCV 2004, Prague Christopher M. Bishop

Upload: butest

Post on 11-May-2015

1.522 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Part 2: Unsupervised Learning Machine Learning Techniques

Part 2: Unsupervised Learning

Machine Learning Techniques

for Computer Vision

Microsoft Research Cambridge

ECCV 2004, Prague

Christopher M. Bishop

Page 2: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Overview of Part 2

• Mixture models• EM• Variational Inference• Bayesian model complexity• Continuous latent variables

Page 3: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

The Gaussian Distribution

• Multivariate Gaussian

• Maximum likelihood

meancovariance

Page 4: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Gaussian Mixtures

• Linear super-position of Gaussians

• Normalization and positivity require

Page 5: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Example: Mixture of 3 Gaussians

Page 6: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Maximum Likelihood for the GMM

• Log likelihood function

• Sum over components appears inside the log– no closed form ML solution

Page 7: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM Algorithm – Informal Derivation

Page 8: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM Algorithm – Informal Derivation

• M step equations

Page 9: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM Algorithm – Informal Derivation

• E step equation

Page 10: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM Algorithm – Informal Derivation

• Can interpret the mixing coefficients as prior probabilities

• Corresponding posterior probabilities (responsibilities)

Page 11: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Old Faithful Data Set

Duration of eruption (minutes)

Time betweeneruptions (minutes)

Page 12: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 13: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 14: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 15: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 16: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 17: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 18: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Latent Variable View of EM

• To sample from a Gaussian mixture:– first pick one of the components with probability – then draw a sample from that component– repeat these two steps for each new data point

Page 19: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Latent Variable View of EM

• Goal: given a data set, find • Suppose we knew the colours

– maximum likelihood would involve fitting each component to the corresponding cluster

• Problem: the colours are latent (hidden) variables

Page 20: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Incomplete and Complete Data

completeincomplete

Page 21: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Latent Variable Viewpoint

Page 22: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Latent Variable Viewpoint

• Binary latent variables describing which component generated each data point

• Conditional distribution of observed variable

• Prior distribution of latent variables

• Marginalizing over the latent variables we obtain

Page 23: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Graphical Representation of GMM

Page 24: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Latent Variable View of EM

• Suppose we knew the values for the latent variables– maximize the complete-data log likelihood

– trivial closed-form solution: fit each component to the corresponding set of data points

• We don’t know the values of the latent variables– however, for given parameter values we can compute

the expected values of the latent variables

Page 25: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Posterior Probabilities (colour coded)

Page 26: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Over-fitting in Gaussian Mixture Models

• Infinities in likelihood function when a component ‘collapses’ onto a data point:

with

• Also, maximum likelihood cannot determine the number K of components

Page 27: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Cross Validation

• Can select model complexity using an independent validation data set

• If data is scarce use cross-validation:– partition data into S subsets– train on S1 subsets – test on remainder– repeat and average

• Disadvantages– computationally expensive – can only determine one or

two complexity parameters

Page 28: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bayesian Mixture of Gaussians

• Parameters and latent variables appear on equal footing• Conjugate priors

Page 29: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Data Set Size

• Problem 1: learn the functionfor from 100 (slightly) noisy examples– data set is computationally small but statistically large

• Problem 2: learn to recognize 1,000 everyday objects from 5,000,000 natural images– data set is computationally large but statistically small

• Bayesian inference – computationally more demanding than ML or MAP

(but see discussion of Gaussian mixtures later)– significant benefit for statistically small data sets

Page 30: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Variational Inference

• Exact Bayesian inference intractable• Markov chain Monte Carlo

– computationally expensive– issues of convergence

• Variational Inference – broadly applicable deterministic approximation– let denote all latent variables and parameters– approximate true posterior using a simpler

distribution – minimize Kullback-Leibler divergence

Page 31: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

General View of Variational Inference

• For arbitrary

where

• Maximizing over would give the true posterior– this is intractable by definition

Page 32: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Variational Lower Bound

Page 33: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Factorized Approximation

• Goal: choose a family of q distributions which are:– sufficiently flexible to give good approximation– sufficiently simple to remain tractable

• Here we consider factorized distributions

• No further assumptions are required!• Optimal solution for one factor, keeping the remainder fixed

– coupled solutions so initialize then cyclically update– message passing view (Winn and Bishop, 2004)

Page 34: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 35: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Lower Bound

• Can also be evaluated• Useful for maths/code verification• Also useful for model comparison:

Page 36: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Illustration: Univariate Gaussian

• Likelihood function

• Conjugate prior • Factorized variational distribution

Page 37: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Initial Configuration

Page 38: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

After Updating

Page 39: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

After Updating

Page 40: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Converged Solution

Page 41: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Variational Mixture of Gaussians

• Assume factorized posterior distribution

• No other approximations needed!

Page 42: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Variational Equations for GMM

Page 43: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Lower Bound for GMM

Page 44: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

VIBES

Bishop, Spiegelhalter and Winn (2002)

Page 45: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

ML Limit

• If instead we choose

we recover the maximum likelihood EM algorithm

Page 46: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bound vs. K for Old Faithful Data

Page 47: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bayesian Model Complexity

Page 48: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Sparse Bayes for Gaussian Mixture

• Corduneanu and Bishop (2001)• Start with large value of K

– treat mixing coefficients as parameters– maximize marginal likelihood– prunes out excess components

Page 49: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 50: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 51: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Summary: Variational Gaussian Mixtures

• Simple modification of maximum likelihood EM code• Small computational overhead compared to EM• No singularities• Automatic model order selection

Page 52: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Continuous Latent Variables

• Conventional PCA– data covariance matrix

– eigenvector decomposition

• Minimizes sum-of-squares projection– not a probabilistic model– how should we choose L ?

Page 53: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Probabilistic PCA

• Tipping and Bishop (1998)• L dimensional continuous latent space

• D dimensional data space

PCA

factor analysis

Page 54: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Probabilistic PCA

• Marginal distribution

• Advantages– exact ML solution– computationally efficient EM algorithm– captures dominant correlations with few parameters– mixtures of PPCA– Bayesian PCA– building block for more complex models

Page 55: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 56: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 57: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 58: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 59: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 60: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 61: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

EM for PCA

Page 62: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bayesian PCA

• Bishop (1998)• Gaussian prior over columns of

• Automatic relevance determination (ARD)

ML PCA Bayesian PCA

Page 63: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Non-linear Manifolds

• Example: images of a rigid object

Page 64: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bayesian Mixture of BPCA Models

Page 65: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 66: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Flexible Sprites

• Jojic and Frey (2001)• Automatic decomposition of video sequence into

– background model– ordered set of masks (one per object per frame)– foreground model (one per object per frame)

Page 67: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 68: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Transformed Component Analysis

• Generative model

• Now include transformations (translations)

• Extend to L layers• Inference intractable so

use variational framework

Page 69: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Page 70: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bayesian Constellation Model

• Li, Fergus and Perona (2003)• Object recognition from small training sets• Variational treatment of fully Bayesian model

Page 71: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Bayesian Constellation Model

Page 72: Part 2: Unsupervised Learning Machine Learning Techniques

Machine Learning Techniques for Computer Vision (ECCV 2004)

Christopher M. Bishop

Summary of Part 2

• Discrete and continuous latent variables – EM algorithm

• Build complex models from simple components– represented graphically– incorporates prior knowledge

• Variational inference– Bayesian model comparison