Computer vision: models, learning and inference
Chapter 4: Fitting probability models
Please send errata to s.prince@cs.ucl.ac.uk
Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
Maximum Likelihood

As the name suggests, we find the parameters under which the data are most likely. We have assumed that the data points are independent (hence the product over data points).

Predictive density: evaluate a new data point under the probability distribution with the best-fitting parameters.
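In symbols (a sketch in the book's notation, with data x_{1...I} and parameters θ):

\[
\hat{\theta} = \operatorname*{argmax}_{\theta}\ \prod_{i=1}^{I} Pr(x_i \mid \theta),
\qquad \text{predictive density: } Pr(x^{*} \mid \hat{\theta}).
\]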
Maximum a posteriori (MAP)

Fitting: as the name suggests, we find the parameters that maximize the posterior probability. Again we have assumed that the data points are independent.

Since the denominator does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
Maximum a posteriori (MAP)

Predictive density: evaluate a new data point under the probability distribution with the MAP parameters.
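In symbols (a sketch):

\[
\hat{\theta} = \operatorname*{argmax}_{\theta}\ Pr(\theta \mid x_{1\ldots I})
= \operatorname*{argmax}_{\theta}\ \prod_{i=1}^{I} Pr(x_i \mid \theta)\, Pr(\theta),
\qquad \text{predictive density: } Pr(x^{*} \mid \hat{\theta}).
\]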
Bayesian Approach

Fitting: compute the posterior distribution over possible parameter values using Bayes' rule.

Principle: why pick one set of parameters? Many parameter values could have explained the data, so we try to capture all of the possibilities.
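Bayes' rule for the parameters, written out:

\[
Pr(\theta \mid x_{1\ldots I}) = \frac{\prod_{i=1}^{I} Pr(x_i \mid \theta)\, Pr(\theta)}{Pr(x_{1\ldots I})}.
\]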
Bayesian Approach

Predictive density:
• Each possible parameter value makes a prediction.
• Some parameter values are more probable than others.
The overall prediction is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities.
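In symbols:

\[
Pr(x^{*} \mid x_{1\ldots I}) = \int Pr(x^{*} \mid \theta)\, Pr(\theta \mid x_{1\ldots I})\, d\theta.
\]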
Predictive densities for 3 methods
Maximum likelihood: evaluate the new data point under the probability distribution with the ML parameters.
Maximum a posteriori: evaluate the new data point under the probability distribution with the MAP parameters.
Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.
Predictive densities for 3 methods

How do we rationalize the different forms? Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
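In this view, the ML and MAP predictive densities are special cases of the Bayesian one:

\[
Pr(x^{*} \mid x_{1\ldots I}) = \int Pr(x^{*} \mid \theta)\, \delta(\theta - \hat{\theta})\, d\theta = Pr(x^{*} \mid \hat{\theta}).
\]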
Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
Univariate Normal Distribution

The univariate normal distribution describes a single continuous variable x. It takes two parameters, the mean μ and the variance σ² > 0. For short we write Norm_x[μ, σ²].
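The density itself is the familiar bell curve:

\[
Pr(x \mid \mu, \sigma^{2}) = \mathrm{Norm}_{x}[\mu, \sigma^{2}]
= \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right).
\]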
Normal Inverse Gamma Distribution

Defined over two variables, μ and σ² > 0, and has four parameters α, β, γ > 0 and δ. For short we write NormInvGam_{μ,σ²}[α, β, γ, δ].
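One common form of the density in this α, β, γ, δ parameterization (a sketch):

\[
Pr(\mu, \sigma^{2}) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}}\,\frac{\beta^{\alpha}}{\Gamma(\alpha)}\left(\frac{1}{\sigma^{2}}\right)^{\alpha+1}\exp\!\left(-\frac{2\beta + \gamma(\delta-\mu)^{2}}{2\sigma^{2}}\right).
\]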
Fitting normal distribution: ML

As the name suggests, we find the parameters under which the data are most likely. The likelihood of each data point is given by the normal pdf.
Fitting normal distribution: ML

The surface of likelihoods is plotted as a function of the possible parameter values; the ML solution is at the peak.
Fitting normal distribution: ML

Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize the logarithm of this product.
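Written out (a sketch):

\[
\hat{\mu}, \hat{\sigma}^{2} = \operatorname*{argmax}_{\mu, \sigma^{2}}\ \prod_{i=1}^{I} \mathrm{Norm}_{x_{i}}[\mu, \sigma^{2}]
= \operatorname*{argmax}_{\mu, \sigma^{2}}\ \sum_{i=1}^{I} \log \mathrm{Norm}_{x_{i}}[\mu, \sigma^{2}],
\]
where the log likelihood is

\[
L = -\frac{I}{2}\log(2\pi) - \frac{I}{2}\log\sigma^{2} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{I}(x_{i}-\mu)^{2}.
\]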
Why the logarithm?
The logarithm is a monotonic transformation.
Hence, the position of the peak stays in the same place, but the log likelihood is easier to work with.
Fitting normal distribution: ML
How do we maximize a function? Take the derivative and set it to zero.
Fitting normal distribution: ML

Solving gives the maximum likelihood solution, which should look familiar!
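Concretely, the ML estimates are the sample mean and the (biased) sample variance:

\[
\hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_{i},
\qquad
\hat{\sigma}^{2} = \frac{1}{I}\sum_{i=1}^{I}\left(x_{i}-\hat{\mu}\right)^{2}.
\]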
Least Squares

Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion: maximizing the likelihood of the mean is equivalent to minimizing the sum of squared deviations from it.
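A minimal NumPy sketch of this ML fit (the function name and example data are illustrative, not from the book):

import numpy as np

def fit_normal_ml(x):
    """Maximum likelihood fit of a univariate normal.

    Returns the sample mean and the *biased* variance (divide by I,
    not I-1), which is the ML estimate."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    var_hat = np.mean((x - mu_hat) ** 2)   # equivalently x.var(ddof=0)
    return mu_hat, var_hat

# The ML mean also minimizes the sum of squared residuals,
# which is the least-squares connection mentioned above.
data = np.array([1.2, 0.7, 1.9, 1.4, 0.9])
mu_hat, var_hat = fit_normal_ml(data)
print(mu_hat, var_hat)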
Fitting normal distribution: MAP

Fitting: as the name suggests, we find the parameters that maximize the posterior probability. The likelihood is the normal pdf.
Fitting normal distribution: MAP

Prior: use the conjugate prior, the normal scaled inverse gamma distribution.
Fitting normal distribution: MAP
(Figure: the likelihood, the prior, and the resulting posterior.)
Fitting normal distribution: MAP
Again we maximize the logarithm, which does not change the position of the maximum.
Fitting normal distribution: MAP

MAP solution: the mean can be rewritten as a weighted sum of the data mean and the prior mean.
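A sketch of the MAP estimates under the normal inverse gamma prior with parameters α, β, γ, δ (treat the exact form as an assumption of this parameterization):

\[
\hat{\mu} = \frac{\sum_{i=1}^{I} x_{i} + \gamma\delta}{I + \gamma} = \frac{I\bar{x} + \gamma\delta}{I + \gamma},
\qquad
\hat{\sigma}^{2} = \frac{\sum_{i=1}^{I}\left(x_{i}-\hat{\mu}\right)^{2} + 2\beta + \gamma\left(\delta-\hat{\mu}\right)^{2}}{I + 3 + 2\alpha}.
\]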
Fitting normal distribution: MAP
(Figure: MAP fits with 50 data points, 5 data points, and 1 data point.)
Fitting normal: Bayesian approach

Fitting: compute the posterior distribution over the parameters μ and σ² using Bayes' rule. The two normalizing constants MUST cancel out, or the left-hand side would not be a valid pdf. Because the prior is conjugate, the posterior is again a normal inverse gamma distribution with updated parameters, sketched below.
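A sketch of the conjugate update in the α, β, γ, δ parameterization above (treat the exact expressions as assumptions):

\[
Pr(\mu, \sigma^{2} \mid x_{1\ldots I}) = \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}],
\]
where

\[
\tilde{\alpha} = \alpha + \frac{I}{2}, \quad
\tilde{\gamma} = \gamma + I, \quad
\tilde{\delta} = \frac{\gamma\delta + \sum_{i} x_{i}}{\gamma + I}, \quad
\tilde{\beta} = \frac{\sum_{i} x_{i}^{2}}{2} + \beta + \frac{\gamma\delta^{2}}{2} - \frac{\left(\gamma\delta + \sum_{i} x_{i}\right)^{2}}{2(\gamma + I)}.
\]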
Fitting normal: Bayesian approach

Predictive density: take a weighted sum (an integral) of the predictions from the different parameter values, weighting each by its posterior probability. Because the prior is conjugate, the integral can be evaluated in closed form.
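In symbols:

\[
Pr(x^{*} \mid x_{1\ldots I}) = \iint Pr(x^{*} \mid \mu, \sigma^{2})\, Pr(\mu, \sigma^{2} \mid x_{1\ldots I})\, d\mu\, d\sigma^{2}.
\]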
Fitting normal: Bayesian Approach
(Figure: Bayesian fits with 50 data points, 5 data points, and 1 data point.)
Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
Categorical Distribution

The categorical distribution describes a situation with K possible outcomes x = 1 ... K. It takes K parameters λ_1 ... λ_K, which are non-negative and sum to one. For short we write Cat_x[λ]. Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0, 0, 0, 1, 0].
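Written out (a sketch):

\[
Pr(x = k \mid \lambda_{1\ldots K}) = \lambda_{k}, \qquad \lambda_{k} \ge 0, \qquad \sum_{k=1}^{K}\lambda_{k} = 1.
\]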
Dirichlet Distribution

Defined over K values λ_1 ... λ_K that are non-negative and sum to one. It has K parameters α_k > 0. For short we write Dir_λ[α_1 ... α_K].
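The density (a sketch):

\[
Pr(\lambda_{1\ldots K}) = \frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_{k}\right)}{\prod_{k=1}^{K}\Gamma(\alpha_{k})}\prod_{k=1}^{K}\lambda_{k}^{\alpha_{k}-1}.
\]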
Categorical distribution: ML

In principle we maximize the product of the individual likelihoods; in practice we instead maximize the log probability, adding a Lagrange multiplier to ensure that the parameters sum to one. Taking the derivative, setting it to zero, and rearranging gives the solution below.
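A sketch of the criterion and its solution, where N_k counts how many times outcome k was observed and ν is the Lagrange multiplier:

\[
L = \sum_{i=1}^{I}\log \mathrm{Cat}_{x_{i}}[\lambda_{1\ldots K}] + \nu\!\left(\sum_{k=1}^{K}\lambda_{k} - 1\right)
= \sum_{k=1}^{K} N_{k}\log\lambda_{k} + \nu\!\left(\sum_{k=1}^{K}\lambda_{k} - 1\right),
\qquad
\hat{\lambda}_{k} = \frac{N_{k}}{\sum_{m=1}^{K} N_{m}}.
\]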
Categorical distribution: MAP

MAP criterion: maximize the posterior, i.e. the product of the categorical likelihood and the Dirichlet prior. Taking the derivative, setting it to zero, and rearranging gives the solution below. With a uniform prior (α_1...K = 1), this gives the same result as maximum likelihood.
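A sketch of the MAP solution under a Dirichlet prior with parameters α_1 ... α_K:

\[
\hat{\lambda}_{k} = \frac{N_{k} + \alpha_{k} - 1}{\sum_{m=1}^{K}\left(N_{m} + \alpha_{m} - 1\right)}
= \frac{N_{k} + \alpha_{k} - 1}{I - K + \sum_{m=1}^{K}\alpha_{m}}.
\]

With α_1...K = 1 the numerator and denominator reduce to N_k and I, recovering the ML estimate.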
Categorical Distribution

(Figure: observed data, five samples from the prior, and five samples from the posterior.)
Categorical Distribution: Bayesian approach

Compute the posterior distribution over the parameters; the two normalizing constants MUST cancel out, or the left-hand side would not be a valid pdf.
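A sketch of the conjugate update:

\[
Pr(\lambda_{1\ldots K} \mid x_{1\ldots I}) = \mathrm{Dir}_{\lambda_{1\ldots K}}[\alpha_{1} + N_{1},\ \ldots,\ \alpha_{K} + N_{K}].
\]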
Categorical Distribution: Bayesian approach

Compute the predictive distribution; again, the two normalizing constants MUST cancel out, or the left-hand side would not be a valid pdf.
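A sketch of the resulting predictive density, which follows from the Dirichlet posterior:

\[
Pr(x^{*} = k \mid x_{1\ldots I}) = \int Pr(x^{*} = k \mid \lambda_{1\ldots K})\, Pr(\lambda_{1\ldots K} \mid x_{1\ldots I})\, d\lambda_{1\ldots K}
= \frac{N_{k} + \alpha_{k}}{\sum_{m=1}^{K}\left(N_{m} + \alpha_{m}\right)}.
\]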
ML / MAP vs. Bayesian
(Figure: predictive densities from the MAP/ML point estimates compared with the Bayesian predictive density.)
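A minimal NumPy sketch of this comparison for the categorical example (the function name, example counts, and prior values are illustrative, not from the book):

import numpy as np

def categorical_predictives(counts, alpha):
    """Predictive probabilities for a categorical variable.

    counts : observed count N_k for each of the K outcomes
    alpha  : Dirichlet prior parameters alpha_k (here alpha_k >= 1)
    Returns (ML, MAP, Bayesian) predictive vectors."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    ml = counts / counts.sum()                                 # N_k / I
    map_ = (counts + alpha - 1) / (counts + alpha - 1).sum()   # posterior mode
    bayes = (counts + alpha) / (counts + alpha).sum()          # posterior mean
    return ml, map_, bayes

# With little data, the Bayesian prediction is pulled towards the prior,
# while ML can assign zero probability to outcomes it has never seen.
print(categorical_predictives([3, 1, 0], alpha=[1.0, 1.0, 1.0]))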
Conclusion
• Three ways to fit probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Two worked examples
  – Normal distribution (ML = least squares)
  – Categorical distribution