04 CV MIL: Fitting Probability Models


TRANSCRIPT

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Chapter 4: Fitting probability models

Please send errata to s.prince@cs.ucl.ac.uk


Structure


• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach

• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Maximum Likelihood

As the name suggests, we find the parameters under which the data are most likely. We have assumed that the data points are independent (hence the product of individual likelihoods).

Predictive Density: evaluate a new data point under the probability distribution with the best-fitting parameters.
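The ML criterion and its predictive density were equation images on the original slide; a reconstruction in the book's notation (approximate, not the slide verbatim):

\hat{\theta} = \operatorname{argmax}_{\theta} \left[ \prod_{i=1}^{I} Pr(x_i \mid \theta) \right]

Pr(x^{*} \mid \hat{\theta})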


Maximum a posteriori (MAP): Fitting

As the name suggests, we find the parameters which maximize the posterior probability. Again we have assumed that the data points are independent.

Since the denominator does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
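Putting the criterion together (again a reconstruction, since the slide's equations were images):

\hat{\theta} = \operatorname{argmax}_{\theta} \left[ Pr(\theta \mid x_{1 \ldots I}) \right]
            = \operatorname{argmax}_{\theta} \left[ \frac{\prod_{i=1}^{I} Pr(x_i \mid \theta) \, Pr(\theta)}{Pr(x_{1 \ldots I})} \right]
            = \operatorname{argmax}_{\theta} \left[ \prod_{i=1}^{I} Pr(x_i \mid \theta) \, Pr(\theta) \right]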

Maximum a posteriori: Predictive Density

Evaluate a new data point under the probability distribution with the MAP parameters.


Bayesian Approach: Fitting

Compute the posterior distribution over possible parameter values using Bayes' rule.

Principle: why pick one set of parameters? There are many values that could have explained the data, so we try to capture all of the possibilities.
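The rule itself, reconstructed in the book's notation:

Pr(\theta \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} Pr(x_i \mid \theta) \, Pr(\theta)}{Pr(x_{1 \ldots I})}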


Bayesian Approach: Predictive Density

• Each possible parameter value makes a prediction
• Some parameter values are more probable than others

Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities.
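Written out (a reconstruction of the slide's equation):

Pr(x^{*} \mid x_{1 \ldots I}) = \int Pr(x^{*} \mid \theta) \, Pr(\theta \mid x_{1 \ldots I}) \, d\theta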


Predictive densities for 3 methods

Maximum a posteriori: evaluate the new data point under the probability distribution with the MAP parameters.

Maximum likelihood: evaluate the new data point under the probability distribution with the ML parameters.

Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.


How to rationalize the different forms?

Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
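Viewed this way, all three predictive densities take the same integral form; for ML/MAP the sketch is:

Pr(x^{*} \mid x_{1 \ldots I}) = \int Pr(x^{*} \mid \theta) \, \delta(\theta - \hat{\theta}) \, d\theta = Pr(x^{*} \mid \hat{\theta})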



Structure


• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach

• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Univariate Normal Distribution

The univariate normal distribution describes a single continuous variable. It takes two parameters, μ and σ² > 0.
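The pdf and the shorthand were images in the deck; reconstructed (standard form, notation follows the book):

Pr(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]

or, for short, Pr(x) = \mathrm{Norm}_x[\mu, \sigma^2].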

Normal Inverse Gamma Distribution

Defined on two variables, μ and σ² > 0. It has four parameters: α, β, γ > 0 and δ.
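The density was an image; as best it can be reconstructed from the book's definition (treat the exact constants as my reconstruction rather than the slide verbatim):

Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left( \frac{1}{\sigma^2} \right)^{\alpha+1} \exp\left[ -\frac{2\beta + \gamma(\delta-\mu)^2}{2\sigma^2} \right]

or, for short, Pr(\mu, \sigma^2) = \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha, \beta, \gamma, \delta].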


Fitting normal distribution: ML

As the name suggests, we find the parameters under which the data are most likely. The likelihood of each data point is given by the normal pdf.
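The criterion, reconstructed in the book's notation:

\hat{\mu}, \hat{\sigma}^2 = \operatorname{argmax}_{\mu, \sigma^2} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^2] \right]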


Fitting normal distribution: ML

The plotted surface shows the likelihood as a function of the possible parameter values; the ML solution is at the peak.


Fitting normal distribution: ML

Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize its logarithm.
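Written out (an approximate reconstruction of the slide's equations):

\hat{\mu}, \hat{\sigma}^2 = \operatorname{argmax}_{\mu, \sigma^2} \left[ \sum_{i=1}^{I} \log\!\big( \mathrm{Norm}_{x_i}[\mu, \sigma^2] \big) \right]

where

\sum_{i=1}^{I} \log\!\big( \mathrm{Norm}_{x_i}[\mu, \sigma^2] \big) = -\frac{I}{2}\log(2\pi) - \frac{I}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{I}(x_i - \mu)^2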

Why the logarithm?

The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.

Fitting normal distribution: ML

How do we maximize a function? Take the derivative and set it to zero.
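For the mean, for example, this gives (my working, following the slide's outline):

\frac{\partial}{\partial \mu} \sum_{i=1}^{I} \log\!\big( \mathrm{Norm}_{x_i}[\mu, \sigma^2] \big) = \frac{1}{\sigma^2} \sum_{i=1}^{I} (x_i - \mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i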

Fitting normal distribution: ML

Maximum likelihood solution; it should look familiar.
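Reconstructed in the book's notation, it is the sample mean and the sample variance:

\hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i, \qquad \hat{\sigma}^2 = \frac{1}{I}\sum_{i=1}^{I} (x_i - \hat{\mu})^2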


Least Squares

Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion.
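Concretely, dropping the terms of the log likelihood that do not depend on μ (a reconstruction of the argument):

\hat{\mu} = \operatorname{argmax}_{\mu} \left[ -\sum_{i=1}^{I} (x_i - \mu)^2 \right] = \operatorname{argmin}_{\mu} \left[ \sum_{i=1}^{I} (x_i - \mu)^2 \right]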

Fitting normal distribution: MAP

As the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.
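The MAP criterion for the normal, reconstructed in the book's notation:

\hat{\mu}, \hat{\sigma}^2 = \operatorname{argmax}_{\mu, \sigma^2} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^2] \; Pr(\mu, \sigma^2) \right]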


Fitting normal distribution: MAP (prior)

Use the conjugate prior, the normal scaled inverse gamma defined above.


Fitting normal distribution: MAP

[Figure: the likelihood, the prior, and the resulting posterior.]


Fitting normal distribution: MAP

Again we maximize the logarithm; this does not change the position of the maximum.


Fitting normal distribution: MAP solution

The MAP mean can be rewritten as a weighted sum of the data mean and the prior mean.
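The solution, reconstructed from the mode of the conjugate posterior (my derivation, so treat it as a sketch of the slide's equations):

\hat{\mu} = \frac{\sum_{i=1}^{I} x_i + \gamma\delta}{I + \gamma}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{I} (x_i - \hat{\mu})^2 + 2\beta + \gamma(\delta - \hat{\mu})^2}{I + 3 + 2\alpha}

and, writing \bar{x} for the data mean,

\hat{\mu} = \frac{I}{I + \gamma}\,\bar{x} + \frac{\gamma}{I + \gamma}\,\delta .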


Fitting normal distribution: MAP

[Figure: MAP fits with 50 data points, 5 data points, and 1 data point.]
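A minimal Python sketch of the ML and MAP fits, assuming the reconstructed formulas above; the function names and the prior settings are illustrative, not from the slides. With fewer data points the MAP estimate is pulled more strongly towards the prior mean delta.

import numpy as np

def fit_normal_ml(x):
    # Maximum likelihood: sample mean and (biased) sample variance.
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    return mu, var

def fit_normal_map(x, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0):
    # MAP with a normal-scaled inverse gamma prior, using the formulas sketched above.
    x = np.asarray(x, dtype=float)
    I = len(x)
    mu = (x.sum() + gamma * delta) / (I + gamma)
    var = (((x - mu) ** 2).sum() + 2 * beta + gamma * (delta - mu) ** 2) / (I + 3 + 2 * alpha)
    return mu, var

rng = np.random.default_rng(0)
for n in (50, 5, 1):
    data = rng.normal(loc=2.0, scale=1.0, size=n)
    print(n, fit_normal_ml(data), fit_normal_map(data))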

Fitting normal: Bayesian approach

Compute the posterior distribution using Bayes' rule. The two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
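The conjugate update, reconstructed (the posterior is again a normal inverse gamma; the tilde parameters are my rendering of the slide's "where" definitions):

Pr(\mu, \sigma^2 \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^2] \; \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha, \beta, \gamma, \delta]}{Pr(x_{1 \ldots I})} = \mathrm{NormInvGam}_{\mu,\sigma^2}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}]

where

\tilde{\alpha} = \alpha + \frac{I}{2}, \quad \tilde{\gamma} = \gamma + I, \quad \tilde{\delta} = \frac{\gamma\delta + \sum_i x_i}{\gamma + I}, \quad \tilde{\beta} = \beta + \frac{\sum_i x_i^2}{2} + \frac{\gamma\delta^2}{2} - \frac{\tilde{\gamma}\tilde{\delta}^2}{2}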


Fitting normal: Bayesian approach (predictive density)

Take a weighted sum (an integral) of the predictions from the different parameter values.
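As an integral over both parameters (a reconstruction); the book evaluates it in closed form as a ratio of normal-inverse-gamma normalizing constants, which is where the cancelling constants come in:

Pr(x^{*} \mid x_{1 \ldots I}) = \int\!\!\int Pr(x^{*} \mid \mu, \sigma^2) \, Pr(\mu, \sigma^2 \mid x_{1 \ldots I}) \, d\mu \, d\sigma^2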


Fitting normal: Bayesian Approach

[Figure: Bayesian fits with 50 data points, 5 data points, and 1 data point.]


Structure


• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach

• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution


Categorical Distribution

The categorical distribution describes a situation with K possible outcomes, x = 1 … K. It takes K parameters λ_1 … λ_K, where each λ_k ≥ 0 and the parameters sum to one. Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
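Its probability function and shorthand, reconstructed in the book's notation:

Pr(x = k) = \lambda_k, \qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1

or, for short, Pr(x) = \mathrm{Cat}_x[\lambda_{1 \ldots K}].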


Dirichlet Distribution

Defined over K values λ_1 … λ_K, where each λ_k ≥ 0 and the values sum to one. It has K parameters α_k > 0.
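Its density, reconstructed from the standard definition:

Pr(\lambda_{1 \ldots K}) = \frac{\Gamma\!\left( \sum_{k=1}^{K} \alpha_k \right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}

or, for short, Pr(\lambda_{1 \ldots K}) = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}].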


Categorical distribution: ML

Maximize the product of the individual likelihoods.
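In symbols (reconstructed; N_k denotes the number of times category k was observed):

\hat{\lambda}_{1 \ldots K} = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda_{1 \ldots K}] \right] = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{k=1}^{K} \lambda_k^{N_k} \right]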

Categorical distribution: ML

Instead, maximize the log probability, adding a Lagrange multiplier to ensure that the parameters sum to one. Take the derivative, set it to zero and re-arrange:
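Reconstructed working (ν is my symbol for the Lagrange multiplier, not necessarily the slide's):

L = \sum_{k=1}^{K} N_k \log \lambda_k + \nu \left( \sum_{k=1}^{K} \lambda_k - 1 \right) \quad\Rightarrow\quad \hat{\lambda}_k = \frac{N_k}{\sum_{m=1}^{K} N_m}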

Categorical distribution: MAP


MAP criterion:
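Reconstructed in the book's notation (the slide's equation was an image):

\hat{\lambda}_{1 \ldots K} = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda_{1 \ldots K}] \; \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}] \right] = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{k=1}^{K} \lambda_k^{N_k + \alpha_k - 1} \right]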

Categorical distribution: MAP

Take the derivative, set it to zero and re-arrange. With a uniform prior (α_1…K = 1), this gives the same result as maximum likelihood.
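The MAP solution, reconstructed:

\hat{\lambda}_k = \frac{N_k + \alpha_k - 1}{\sum_{m=1}^{K} (N_m + \alpha_m - 1)}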

Categorical Distribution

[Figure: the observed data, five samples from the prior, and five samples from the posterior.]


Categorical Distribution: Bayesian approach

Compute the posterior distribution over the parameters. The two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
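The conjugate update, reconstructed:

Pr(\lambda_{1 \ldots K} \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda_{1 \ldots K}] \; \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}]}{Pr(x_{1 \ldots I})} = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\tilde{\alpha}_{1 \ldots K}], \qquad \tilde{\alpha}_k = \alpha_k + N_k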

Categorical Distribution: Bayesian approach

Compute the predictive distribution. Again, the two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
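Reconstructed, the integral has a simple closed form:

Pr(x^{*} = k \mid x_{1 \ldots I}) = \int Pr(x^{*} = k \mid \lambda_{1 \ldots K}) \, Pr(\lambda_{1 \ldots K} \mid x_{1 \ldots I}) \, d\lambda_{1 \ldots K} = \frac{N_k + \alpha_k}{\sum_{m=1}^{K} (N_m + \alpha_m)}

A minimal Python sketch under these reconstructed formulas; the counts and the uniform prior are hypothetical, chosen only for illustration.

import numpy as np

counts = np.array([3.0, 0.0, 1.0, 6.0])   # hypothetical N_k for K = 4 outcomes
alphas = np.ones_like(counts)             # uniform Dirichlet prior, alpha_k = 1

ml = counts / counts.sum()                                      # maximum likelihood
map_est = (counts + alphas - 1) / (counts + alphas - 1).sum()   # MAP: equals ML when alpha_k = 1
bayes = (counts + alphas) / (counts + alphas).sum()             # Bayesian predictive

print(ml)
print(map_est)
print(bayes)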

ML / MAP vs. Bayesian

[Figure: the MAP/ML predictive density compared with the Bayesian predictive density.]

Conclusion


• Three ways to fit probability distributions
  • Maximum likelihood
  • Maximum a posteriori
  • Bayesian approach

• Two worked examples
  • Normal distribution (ML gives least squares)
  • Categorical distribution
