04 cv mil_fitting_probability_models

Computer vision: models, learning and inference
Chapter 4: Fitting probability models
©2011 Simon J.D. Prince
Please send errata to [email protected]


Page 1: 04 cv mil_fitting_probability_models

Computer vision: models, learning and inference

Chapter 4 Fitting probability models

Please send errata to [email protected]

Page 2: 04 cv mil_fitting_probability_models

Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 3: 04 cv mil_fitting_probability_models

Maximum Likelihood

As the name suggests, we find the parameters under which the data are most likely.

We have assumed that the data are independent (hence the product).

Predictive density: evaluate a new data point under the probability distribution with the best parameters.
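In symbols, with independent data x_1...x_I and parameters θ, the criterion and the predictive density take the standard forms:

$$\hat{\boldsymbol{\theta}} = \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ Pr(x_{1\ldots I}|\boldsymbol{\theta}) \right] = \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ \prod_{i=1}^{I} Pr(x_i|\boldsymbol{\theta}) \right]$$

and for a new point x* the predictive density is $Pr(x^*|\hat{\boldsymbol{\theta}})$.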

Page 4: 04 cv mil_fitting_probability_models

Maximum a posteriori (MAP): Fitting

As the name suggests, we find the parameters which maximize the posterior probability.

Again we have assumed that the data are independent.
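Using Bayes' rule, the MAP criterion is:

$$\hat{\boldsymbol{\theta}} = \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ Pr(\boldsymbol{\theta}|x_{1\ldots I}) \right] = \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ \frac{\prod_{i=1}^{I} Pr(x_i|\boldsymbol{\theta})\, Pr(\boldsymbol{\theta})}{Pr(x_{1\ldots I})} \right]$$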

Page 5: 04 cv mil_fitting_probability_models

Maximum a posteriori (MAP): Fitting

As the name suggests, we find the parameters which maximize the posterior probability.

Since the denominator doesn't depend on the parameters, we can instead maximize the product of the likelihood and the prior.
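That is:

$$\hat{\boldsymbol{\theta}} = \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ \prod_{i=1}^{I} Pr(x_i|\boldsymbol{\theta})\, Pr(\boldsymbol{\theta}) \right]$$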

Page 6: 04 cv mil_fitting_probability_models

Maximum a posteriori

Predictive Density:

Evaluate new data point under probability distribution with MAP parameters


Page 7: 04 cv mil_fitting_probability_models

Bayesian Approach: Fitting

Compute the posterior distribution over possible parameter values using Bayes’ rule:

Principle: why pick one set of parameters? There are many values that could have explained the data. Try to capture all of the possibilities.

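Concretely, by Bayes' rule:

$$Pr(\boldsymbol{\theta}|x_{1\ldots I}) = \frac{\prod_{i=1}^{I} Pr(x_i|\boldsymbol{\theta})\, Pr(\boldsymbol{\theta})}{Pr(x_{1\ldots I})}$$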

Page 8: 04 cv mil_fitting_probability_models

Bayesian Approach: Predictive Density

• Each possible parameter value makes a prediction
• Some parameters are more probable than others

Make a prediction that is an infinite weighted sum (integral) of the predictions for each parameter value, where weights are the probabilities

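The predictive density is therefore an integral over the parameters:

$$Pr(x^*|x_{1\ldots I}) = \int Pr(x^*|\boldsymbol{\theta})\, Pr(\boldsymbol{\theta}|x_{1\ldots I})\, d\boldsymbol{\theta}$$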

Page 9: 04 cv mil_fitting_probability_models

Predictive densities for 3 methods

Maximum a posteriori:

Evaluate new data point under probability distribution with MAP parameters

Maximum likelihood:

Evaluate new data point under probability distribution with ML parameters

Bayesian:

Calculate a weighted sum (integral) of the predictions from all possible values of the parameters


Page 10: 04 cv mil_fitting_probability_models

How to rationalize different forms?

Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e., delta functions)

Predictive densities for 3 methods

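With a delta function at the estimate, all three predictive densities share the same integral form; for ML and MAP the integral collapses back to evaluation at the point estimate:

$$\int Pr(x^*|\boldsymbol{\theta})\, \delta\!\left(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\right) d\boldsymbol{\theta} = Pr(x^*|\hat{\boldsymbol{\theta}})$$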

Page 11: 04 cv mil_fitting_probability_models

Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 12: 04 cv mil_fitting_probability_models

Univariate Normal Distribution

The univariate normal distribution describes a single continuous variable. It takes two parameters, μ and σ² > 0.
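Its density and the shorthand notation are:

$$Pr(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right], \qquad Pr(x) = \mathrm{Norm}_x[\mu,\sigma^2]$$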

Page 13: 04 cv mil_fitting_probability_models

Normal Inverse Gamma Distribution

Defined on two variables, μ and σ² > 0. It has four parameters α, β, γ > 0 and δ.
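Its density and shorthand (the standard conjugate-prior form for the normal's parameters):

$$Pr(\mu,\sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^2}\right)^{\alpha+1} \exp\!\left[ -\frac{2\beta + \gamma(\delta-\mu)^2}{2\sigma^2} \right], \qquad Pr(\mu,\sigma^2) = \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta]$$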

Page 14: 04 cv mil_fitting_probability_models

Fitting normal distribution: ML

As the name suggests, we find the parameters under which the data are most likely. The likelihood is given by the normal pdf evaluated at each data point.
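The criterion:

$$\hat{\mu},\hat{\sigma}^2 = \operatorname*{argmax}_{\mu,\sigma^2} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu,\sigma^2] \right]$$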

Page 15: 04 cv mil_fitting_probability_models

Fitting normal distribution: ML


Page 16: 04 cv mil_fitting_probability_models

Fitting a normal distribution: ML

Plotted surface of likelihoods as a function of possible parameter values

ML Solution is at peak


Page 17: 04 cv mil_fitting_probability_models

Fitting normal distribution: ML

Algebraically, we can write the criterion either as the likelihood itself or, alternatively, as the logarithm of the likelihood.
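In symbols:

$$\hat{\mu},\hat{\sigma}^2 = \operatorname*{argmax}_{\mu,\sigma^2} \left[ \sum_{i=1}^{I} \log \mathrm{Norm}_{x_i}[\mu,\sigma^2] \right] = \operatorname*{argmax}_{\mu,\sigma^2} \left[ -0.5\, I \log(2\pi) - 0.5\, I \log \sigma^2 - 0.5 \sum_{i=1}^{I} \frac{(x_i-\mu)^2}{\sigma^2} \right]$$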

Page 18: 04 cv mil_fitting_probability_models

Why the logarithm?

The logarithm is a monotonic transformation.

Hence, the position of the peak stays in the same place

But the log likelihood is easier to work with.

Page 19: 04 cv mil_fitting_probability_models

Fitting normal distribution: ML

How do we maximize a function? Take the derivative, set it to zero, and solve.
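For the mean, for example, this gives:

$$\frac{\partial}{\partial \mu} \sum_{i=1}^{I} \log \mathrm{Norm}_{x_i}[\mu,\sigma^2] = \sum_{i=1}^{I} \frac{x_i-\mu}{\sigma^2} = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i$$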

Page 20: 04 cv mil_fitting_probability_models

Fitting normal distribution: ML

Maximum likelihood solution:

Should look familiar!

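Carrying this out for both parameters gives:

$$\hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i, \qquad \hat{\sigma}^2 = \frac{1}{I}\sum_{i=1}^{I} (x_i - \hat{\mu})^2$$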

Page 21: 04 cv mil_fitting_probability_models

Least Squares

Maximum likelihood for the normal distribution...

...gives the 'least squares' fitting criterion.
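Dropping terms that do not depend on μ:

$$\hat{\mu} = \operatorname*{argmax}_{\mu} \left[ -\sum_{i=1}^{I} (x_i-\mu)^2 \right] = \operatorname*{argmin}_{\mu} \left[ \sum_{i=1}^{I} (x_i-\mu)^2 \right]$$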

Page 22: 04 cv mil_fitting_probability_models

Fitting normal distribution: MAP

As the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.
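With the normal-scaled inverse gamma prior introduced on the next slide, the criterion is:

$$\hat{\mu},\hat{\sigma}^2 = \operatorname*{argmax}_{\mu,\sigma^2} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu,\sigma^2]\; \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta] \right]$$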

Page 23: 04 cv mil_fitting_probability_models

Fitting normal distribution: MAP

Prior: use the conjugate prior, the normal-scaled inverse gamma.

Page 24: 04 cv mil_fitting_probability_models

Fitting normal distribution: MAP

(Figure: likelihood, prior, and posterior.)


Page 25: 04 cv mil_fitting_probability_models

Fitting normal distribution: MAP

Again maximize the log – does not change position of maximum


Page 26: 04 cv mil_fitting_probability_models

Fitting normal distribution: MAP

MAP solution: the mean can be rewritten as a weighted sum of the data mean and the prior mean.
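Setting the derivatives of the log posterior to zero gives:

$$\hat{\mu} = \frac{\sum_{i=1}^{I} x_i + \gamma\delta}{I + \gamma}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{I} (x_i-\hat{\mu})^2 + 2\beta + \gamma(\delta-\hat{\mu})^2}{I + 3 + 2\alpha}$$

so the mean is a weighted sum of the data mean and the prior mean δ:

$$\hat{\mu} = \frac{I}{I+\gamma}\,\bar{x} + \frac{\gamma}{I+\gamma}\,\delta, \qquad \bar{x} = \frac{1}{I}\sum_{i=1}^{I} x_i$$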

Page 27: 04 cv mil_fitting_probability_models

Fitting normal distribution: MAP

(Figure: MAP fits with 50 data points, 5 data points, and 1 data point.)

Page 28: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian approach (fitting)

Compute the posterior distribution using Bayes’ rule:

Page 29: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian approach (fitting)

Compute the posterior distribution using Bayes’ rule:

The two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.

Page 30: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian approach (fitting)

Compute the posterior distribution using Bayes' rule; the result is again a normal-scaled inverse gamma, with updated parameters given below.
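In symbols, the standard conjugate update:

$$Pr(\mu,\sigma^2|x_{1\ldots I}) = \frac{\prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu,\sigma^2]\; \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta]}{Pr(x_{1\ldots I})} = \mathrm{NormInvGam}_{\mu,\sigma^2}[\tilde{\alpha},\tilde{\beta},\tilde{\gamma},\tilde{\delta}]$$

where

$$\tilde{\alpha} = \alpha + \frac{I}{2}, \qquad \tilde{\gamma} = \gamma + I, \qquad \tilde{\delta} = \frac{\gamma\delta + \sum_i x_i}{\gamma + I}, \qquad \tilde{\beta} = \frac{\sum_i x_i^2}{2} + \beta + \frac{\gamma\delta^2}{2} - \frac{\left(\gamma\delta + \sum_i x_i\right)^2}{2(\gamma + I)}$$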

Page 31: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian approach (predictive density)

Take weighted sum of predictions from different parameter values:


Page 32: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian approach (predictive density)

Take weighted sum of predictions from different parameter values:


Page 33: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian approach (predictive density)

Take a weighted sum of the predictions from different parameter values, as written out below.
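The weighted sum is an integral over the posterior; with the conjugate prior it has a closed form (a Student-t-like density built from the normal inverse gamma normalizing constants):

$$Pr(x^*|x_{1\ldots I}) = \int\!\!\int Pr(x^*|\mu,\sigma^2)\, Pr(\mu,\sigma^2|x_{1\ldots I})\, d\mu\, d\sigma^2$$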

Page 34: 04 cv mil_fitting_probability_models

Fitting normal: Bayesian Approach

(Figure: Bayesian fits with 50 data points, 5 data points, and 1 data point.)
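A minimal Python sketch of the three fits for the normal, assuming the normal-scaled inverse gamma parameterization above; the data values and hyperparameters (alpha, beta, gamma, delta) are illustrative, and the Student-t form of the predictive is the standard conjugate result:

```python
import numpy as np
from scipy.stats import t as student_t

# Sketch: ML, MAP, and Bayesian fits of a univariate normal with a
# normal-scaled inverse gamma prior. Data and hyperparameters are
# illustrative (hypothetical), not taken from the slides.
x = np.array([2.1, 1.7, 2.5, 1.9, 2.3])          # observed data
I = len(x)
alpha, beta, gamma, delta = 1.0, 1.0, 1.0, 0.0   # prior hyperparameters

# Maximum likelihood: sample mean and (biased) sample variance.
mu_ml = x.mean()
var_ml = ((x - mu_ml) ** 2).mean()

# MAP: mode of likelihood * prior (formulas above).
mu_map = (x.sum() + gamma * delta) / (I + gamma)
var_map = (((x - mu_map) ** 2).sum() + 2 * beta
           + gamma * (delta - mu_map) ** 2) / (I + 3 + 2 * alpha)

# Bayesian: conjugate update of the normal-scaled inverse gamma parameters.
alpha_n = alpha + I / 2
gamma_n = gamma + I
delta_n = (gamma * delta + x.sum()) / gamma_n
beta_n = ((x ** 2).sum() / 2 + beta + gamma * delta ** 2 / 2
          - (gamma * delta + x.sum()) ** 2 / (2 * gamma_n))

# The Bayesian predictive integrates over (mu, sigma^2); for this
# conjugate model it is a Student-t (standard result).
predictive = student_t(df=2 * alpha_n, loc=delta_n,
                       scale=np.sqrt(beta_n * (gamma_n + 1) / (alpha_n * gamma_n)))

x_star = 2.0
print("ML: ", mu_ml, var_ml)
print("MAP:", mu_map, var_map)
print("Bayesian predictive at x*:", predictive.pdf(x_star))
```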

Page 35: 04 cv mil_fitting_probability_models

Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 36: 04 cv mil_fitting_probability_models

Categorical Distribution

The categorical distribution describes a situation with K possible outcomes. It takes K parameters λ_1, ..., λ_K, where each λ_k ≥ 0 and the parameters sum to one.

Alternatively, each data point can be thought of as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
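The probability mass function and shorthand notation:

$$Pr(x=k|\boldsymbol{\lambda}) = \lambda_k, \qquad Pr(x|\boldsymbol{\lambda}) = \mathrm{Cat}_x[\boldsymbol{\lambda}], \qquad \lambda_k \ge 0,\ \sum_{k=1}^{K} \lambda_k = 1$$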

Page 37: 04 cv mil_fitting_probability_models

Dirichlet Distribution

Defined over K values λ_1, ..., λ_K, where each λ_k ≥ 0 and the values sum to one. It has K parameters α_k > 0.
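Its density and shorthand notation:

$$Pr(\lambda_{1\ldots K}) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}, \qquad Pr(\lambda_{1\ldots K}) = \mathrm{Dir}_{\lambda_{1\ldots K}}[\alpha_{1\ldots K}]$$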

Page 38: 04 cv mil_fitting_probability_models

Categorical distribution: ML


Maximize product of individual likelihoods
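Writing N_k for the number of observations of category k:

$$\hat{\boldsymbol{\lambda}} = \operatorname*{argmax}_{\boldsymbol{\lambda}} \left[ \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\boldsymbol{\lambda}] \right] = \operatorname*{argmax}_{\boldsymbol{\lambda}} \left[ \prod_{k=1}^{K} \lambda_k^{N_k} \right]$$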

Page 39: 04 cv mil_fitting_probability_models

Categorical distribution: ML

Instead, maximize the log probability. The criterion combines the log likelihood with a Lagrange multiplier term that ensures the parameters sum to one. Take the derivative, set it to zero, and re-arrange:
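In symbols (ν is the Lagrange multiplier):

$$L = \sum_{k=1}^{K} N_k \log \lambda_k + \nu\!\left( \sum_{k=1}^{K} \lambda_k - 1 \right), \qquad \hat{\lambda}_k = \frac{N_k}{\sum_{m=1}^{K} N_m}$$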

Page 40: 04 cv mil_fitting_probability_models

Categorical distribution: MAP


MAP criterion:
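With a Dirichlet prior:

$$\hat{\boldsymbol{\lambda}} = \operatorname*{argmax}_{\boldsymbol{\lambda}} \left[ \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\boldsymbol{\lambda}]\; \mathrm{Dir}_{\boldsymbol{\lambda}}[\alpha_{1\ldots K}] \right]$$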

Page 41: 04 cv mil_fitting_probability_models

Categorical distribution: MAP

Take the derivative, set it to zero, and re-arrange. With a uniform prior (α_k = 1 for all k), this gives the same result as maximum likelihood.
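The MAP solution:

$$\hat{\lambda}_k = \frac{N_k + \alpha_k - 1}{\sum_{m=1}^{K} \left( N_m + \alpha_m - 1 \right)}$$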

Page 42: 04 cv mil_fitting_probability_models

Categorical Distribution

(Figure: observed data, five samples from the prior, and five samples from the posterior.)

Page 43: 04 cv mil_fitting_probability_models

Categorical Distribution: Bayesian approach

Compute the posterior distribution over the parameters. The two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
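By conjugacy, the posterior is another Dirichlet:

$$Pr(\boldsymbol{\lambda}|x_{1\ldots I}) = \frac{\prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\boldsymbol{\lambda}]\; \mathrm{Dir}_{\boldsymbol{\lambda}}[\alpha_{1\ldots K}]}{Pr(x_{1\ldots I})} = \mathrm{Dir}_{\boldsymbol{\lambda}}[\tilde{\alpha}_{1\ldots K}], \qquad \tilde{\alpha}_k = \alpha_k + N_k$$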

Page 44: 04 cv mil_fitting_probability_models

Categorical Distribution: Bayesian approach

Compute the predictive distribution. Again, the two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
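Integrating the categorical likelihood against the Dirichlet posterior gives:

$$Pr(x^*=k|x_{1\ldots I}) = \int Pr(x^*=k|\boldsymbol{\lambda})\, Pr(\boldsymbol{\lambda}|x_{1\ldots I})\, d\boldsymbol{\lambda} = \frac{N_k + \alpha_k}{\sum_{m=1}^{K} \left( N_m + \alpha_m \right)}$$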

Page 45: 04 cv mil_fitting_probability_models

ML / MAP vs. Bayesian

(Figure: MAP/ML point estimate vs. Bayesian predictive.)
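A minimal Python sketch contrasting the three estimates for the categorical case; the counts and Dirichlet hyperparameters below are illustrative:

```python
import numpy as np

# Sketch: ML, MAP, and Bayesian predictive for a categorical distribution
# with a Dirichlet prior. Counts and hyperparameters are illustrative.
counts = np.array([3.0, 0.0, 1.0, 5.0, 2.0, 4.0])  # N_k: observations per category
alpha = np.full(6, 2.0)                             # Dirichlet parameters alpha_k

# Maximum likelihood: relative frequencies N_k / sum_m N_m.
lam_ml = counts / counts.sum()

# MAP: mode of the Dirichlet posterior, (N_k + alpha_k - 1) normalized.
lam_map = (counts + alpha - 1) / (counts + alpha - 1).sum()

# Bayesian predictive: (N_k + alpha_k) normalized, i.e. the posterior mean
# of lambda, obtained by integrating over the Dirichlet posterior.
pred_bayes = (counts + alpha) / (counts + alpha).sum()

print("ML:       ", lam_ml)
print("MAP:      ", lam_map)
print("Bayesian: ", pred_bayes)
```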

Page 46: 04 cv mil_fitting_probability_models

Conclusion

• Three ways to fit probability distributions:
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Two worked examples:
  – Normal distribution (ML gives least squares)
  – Categorical distribution