04 CV MIL: Fitting Probability Models


TRANSCRIPT

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Chapter 4: Fitting probability models

Please send errata to s.prince@cs.ucl.ac.uk


Structure


• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach

• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Maximum Likelihood

As the name suggests, we find the parameters under which the data are most likely. We have assumed that the data points are independent (hence the product of individual likelihoods).

Predictive Density: evaluate a new data point under the probability distribution with the best-fitting parameters.
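The ML criterion and its predictive density were equation images on the original slide; a reconstruction in the book's notation (approximate, not the slide verbatim):

\hat{\theta} = \operatorname{argmax}_{\theta} \left[ \prod_{i=1}^{I} Pr(x_i \mid \theta) \right]

Pr(x^{*} \mid \hat{\theta})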


Maximum a posteriori (MAP): Fitting

As the name suggests, we find the parameters which maximize the posterior probability. Again we have assumed that the data points are independent.

Since the denominator does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
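Putting the criterion together (again a reconstruction, since the slide's equations were images):

\hat{\theta} = \operatorname{argmax}_{\theta} \left[ Pr(\theta \mid x_{1 \ldots I}) \right]
            = \operatorname{argmax}_{\theta} \left[ \frac{\prod_{i=1}^{I} Pr(x_i \mid \theta) \, Pr(\theta)}{Pr(x_{1 \ldots I})} \right]
            = \operatorname{argmax}_{\theta} \left[ \prod_{i=1}^{I} Pr(x_i \mid \theta) \, Pr(\theta) \right]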

Maximum a posteriori: Predictive Density

Evaluate a new data point under the probability distribution with the MAP parameters.


Bayesian Approach: Fitting

Compute the posterior distribution over possible parameter values using Bayes' rule.

Principle: why pick one set of parameters? There are many values that could have explained the data, so we try to capture all of the possibilities.
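The rule itself, reconstructed in the book's notation:

Pr(\theta \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} Pr(x_i \mid \theta) \, Pr(\theta)}{Pr(x_{1 \ldots I})}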


Bayesian Approach: Predictive Density

• Each possible parameter value makes a prediction
• Some parameter values are more probable than others

Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities.
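Written out (a reconstruction of the slide's equation):

Pr(x^{*} \mid x_{1 \ldots I}) = \int Pr(x^{*} \mid \theta) \, Pr(\theta \mid x_{1 \ldots I}) \, d\theta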


Predictive densities for 3 methods

Maximum a posteriori: evaluate the new data point under the probability distribution with the MAP parameters.

Maximum likelihood: evaluate the new data point under the probability distribution with the ML parameters.

Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.


How to rationalize the different forms?

Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
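Viewed this way, all three predictive densities take the same integral form; for ML/MAP the sketch is:

Pr(x^{*} \mid x_{1 \ldots I}) = \int Pr(x^{*} \mid \theta) \, \delta(\theta - \hat{\theta}) \, d\theta = Pr(x^{*} \mid \hat{\theta})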



Structure


• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach

• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Univariate Normal Distribution

The univariate normal distribution describes a single continuous variable. It takes two parameters, μ and σ² > 0.
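The pdf and the shorthand were images in the deck; reconstructed (standard form, notation follows the book):

Pr(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]

or, for short, Pr(x) = \mathrm{Norm}_x[\mu, \sigma^2].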

Normal Inverse Gamma Distribution

Defined on two variables, μ and σ² > 0. It has four parameters: α, β, γ > 0 and δ.
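The density was an image; as best it can be reconstructed from the book's definition (treat the exact constants as my reconstruction rather than the slide verbatim):

Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left( \frac{1}{\sigma^2} \right)^{\alpha+1} \exp\left[ -\frac{2\beta + \gamma(\delta-\mu)^2}{2\sigma^2} \right]

or, for short, Pr(\mu, \sigma^2) = \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha, \beta, \gamma, \delta].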


Fitting normal distribution: ML

As the name suggests, we find the parameters under which the data are most likely. The likelihood of each data point is given by the normal pdf.
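The criterion, reconstructed in the book's notation:

\hat{\mu}, \hat{\sigma}^2 = \operatorname{argmax}_{\mu, \sigma^2} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^2] \right]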


Fitting normal distribution: ML

The plotted surface shows the likelihood as a function of the possible parameter values; the ML solution is at the peak.


Fitting normal distribution: ML

Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize its logarithm.
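Written out (an approximate reconstruction of the slide's equations):

\hat{\mu}, \hat{\sigma}^2 = \operatorname{argmax}_{\mu, \sigma^2} \left[ \sum_{i=1}^{I} \log\!\big( \mathrm{Norm}_{x_i}[\mu, \sigma^2] \big) \right]

where

\sum_{i=1}^{I} \log\!\big( \mathrm{Norm}_{x_i}[\mu, \sigma^2] \big) = -\frac{I}{2}\log(2\pi) - \frac{I}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{I}(x_i - \mu)^2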

Why the logarithm?

The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.

Fitting normal distribution: ML

How do we maximize a function? Take the derivative and set it to zero.
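For the mean, for example, this gives (my working, following the slide's outline):

\frac{\partial}{\partial \mu} \sum_{i=1}^{I} \log\!\big( \mathrm{Norm}_{x_i}[\mu, \sigma^2] \big) = \frac{1}{\sigma^2} \sum_{i=1}^{I} (x_i - \mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i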

Fitting normal distribution: ML

Maximum likelihood solution; it should look familiar.
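Reconstructed in the book's notation, it is the sample mean and the sample variance:

\hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i, \qquad \hat{\sigma}^2 = \frac{1}{I}\sum_{i=1}^{I} (x_i - \hat{\mu})^2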


Least Squares

Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion.
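Concretely, dropping the terms of the log likelihood that do not depend on μ (a reconstruction of the argument):

\hat{\mu} = \operatorname{argmax}_{\mu} \left[ -\sum_{i=1}^{I} (x_i - \mu)^2 \right] = \operatorname{argmin}_{\mu} \left[ \sum_{i=1}^{I} (x_i - \mu)^2 \right]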

Fitting normal distribution: MAP

As the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.
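The MAP criterion for the normal, reconstructed in the book's notation:

\hat{\mu}, \hat{\sigma}^2 = \operatorname{argmax}_{\mu, \sigma^2} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^2] \; Pr(\mu, \sigma^2) \right]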


Fitting normal distribution: MAP (prior)

Use the conjugate prior, the normal scaled inverse gamma defined above.


Fitting normal distribution: MAP

[Figure: the likelihood, the prior, and the resulting posterior.]


Fitting normal distribution: MAP

Again we maximize the logarithm; this does not change the position of the maximum.


Fitting normal distribution: MAP solution

The MAP mean can be rewritten as a weighted sum of the data mean and the prior mean.
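The solution, reconstructed from the mode of the conjugate posterior (my derivation, so treat it as a sketch of the slide's equations):

\hat{\mu} = \frac{\sum_{i=1}^{I} x_i + \gamma\delta}{I + \gamma}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{I} (x_i - \hat{\mu})^2 + 2\beta + \gamma(\delta - \hat{\mu})^2}{I + 3 + 2\alpha}

and, writing \bar{x} for the data mean,

\hat{\mu} = \frac{I}{I + \gamma}\,\bar{x} + \frac{\gamma}{I + \gamma}\,\delta .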


Fitting normal distribution: MAP

[Figure: MAP fits with 50 data points, 5 data points, and 1 data point.]
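A minimal Python sketch of the ML and MAP fits, assuming the reconstructed formulas above; the function names and the prior settings are illustrative, not from the slides. With fewer data points the MAP estimate is pulled more strongly towards the prior mean delta.

import numpy as np

def fit_normal_ml(x):
    # Maximum likelihood: sample mean and (biased) sample variance.
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    return mu, var

def fit_normal_map(x, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0):
    # MAP with a normal-scaled inverse gamma prior, using the formulas sketched above.
    x = np.asarray(x, dtype=float)
    I = len(x)
    mu = (x.sum() + gamma * delta) / (I + gamma)
    var = (((x - mu) ** 2).sum() + 2 * beta + gamma * (delta - mu) ** 2) / (I + 3 + 2 * alpha)
    return mu, var

rng = np.random.default_rng(0)
for n in (50, 5, 1):
    data = rng.normal(loc=2.0, scale=1.0, size=n)
    print(n, fit_normal_ml(data), fit_normal_map(data))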

Fitting normal: Bayesian approach

Compute the posterior distribution using Bayes' rule. The two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
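The conjugate update, reconstructed (the posterior is again a normal inverse gamma; the tilde parameters are my rendering of the slide's "where" definitions):

Pr(\mu, \sigma^2 \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^2] \; \mathrm{NormInvGam}_{\mu,\sigma^2}[\alpha, \beta, \gamma, \delta]}{Pr(x_{1 \ldots I})} = \mathrm{NormInvGam}_{\mu,\sigma^2}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}]

where

\tilde{\alpha} = \alpha + \frac{I}{2}, \quad \tilde{\gamma} = \gamma + I, \quad \tilde{\delta} = \frac{\gamma\delta + \sum_i x_i}{\gamma + I}, \quad \tilde{\beta} = \beta + \frac{\sum_i x_i^2}{2} + \frac{\gamma\delta^2}{2} - \frac{\tilde{\gamma}\tilde{\delta}^2}{2}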


Fitting normal: Bayesian approach (predictive density)

Take a weighted sum (an integral) of the predictions from the different parameter values.
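As an integral over both parameters (a reconstruction); the book evaluates it in closed form as a ratio of normal-inverse-gamma normalizing constants, which is where the cancelling constants come in:

Pr(x^{*} \mid x_{1 \ldots I}) = \int\!\!\int Pr(x^{*} \mid \mu, \sigma^2) \, Pr(\mu, \sigma^2 \mid x_{1 \ldots I}) \, d\mu \, d\sigma^2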


Fitting normal: Bayesian Approach

[Figure: Bayesian fits with 50 data points, 5 data points, and 1 data point.]


Structure


• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach

• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution


Categorical Distribution

The categorical distribution describes a situation with K possible outcomes, x = 1 … K. It takes K parameters λ_1 … λ_K, where each λ_k ≥ 0 and the parameters sum to one. Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
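Its probability function and shorthand, reconstructed in the book's notation:

Pr(x = k) = \lambda_k, \qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1

or, for short, Pr(x) = \mathrm{Cat}_x[\lambda_{1 \ldots K}].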


Dirichlet Distribution

Defined over K values λ_1 … λ_K, where each λ_k ≥ 0 and the values sum to one. It has K parameters α_k > 0.
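Its density, reconstructed from the standard definition:

Pr(\lambda_{1 \ldots K}) = \frac{\Gamma\!\left( \sum_{k=1}^{K} \alpha_k \right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}

or, for short, Pr(\lambda_{1 \ldots K}) = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}].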


Categorical distribution: ML

Maximize the product of the individual likelihoods.
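In symbols (reconstructed; N_k denotes the number of times category k was observed):

\hat{\lambda}_{1 \ldots K} = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda_{1 \ldots K}] \right] = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{k=1}^{K} \lambda_k^{N_k} \right]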

Categorical distribution: ML

Instead, maximize the log probability, adding a Lagrange multiplier to ensure that the parameters sum to one. Take the derivative, set it to zero and re-arrange:
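Reconstructed working (ν is my symbol for the Lagrange multiplier, not necessarily the slide's):

L = \sum_{k=1}^{K} N_k \log \lambda_k + \nu \left( \sum_{k=1}^{K} \lambda_k - 1 \right) \quad\Rightarrow\quad \hat{\lambda}_k = \frac{N_k}{\sum_{m=1}^{K} N_m}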

Categorical distribution: MAP


MAP criterion:
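Reconstructed in the book's notation (the slide's equation was an image):

\hat{\lambda}_{1 \ldots K} = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda_{1 \ldots K}] \; \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}] \right] = \operatorname{argmax}_{\lambda_{1 \ldots K}} \left[ \prod_{k=1}^{K} \lambda_k^{N_k + \alpha_k - 1} \right]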

Categorical distribution: MAP

Take the derivative, set it to zero and re-arrange. With a uniform prior (α_1…K = 1), this gives the same result as maximum likelihood.
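The MAP solution, reconstructed:

\hat{\lambda}_k = \frac{N_k + \alpha_k - 1}{\sum_{m=1}^{K} (N_m + \alpha_m - 1)}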

Categorical Distribution

[Figure: the observed data, five samples from the prior, and five samples from the posterior.]


Categorical Distribution: Bayesian approach

Compute the posterior distribution over the parameters. The two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
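The conjugate update, reconstructed:

Pr(\lambda_{1 \ldots K} \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda_{1 \ldots K}] \; \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}]}{Pr(x_{1 \ldots I})} = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\tilde{\alpha}_{1 \ldots K}], \qquad \tilde{\alpha}_k = \alpha_k + N_k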

Categorical Distribution: Bayesian approach

Compute the predictive distribution. Again, the two normalizing constants must cancel out, or the left-hand side would not be a valid pdf.
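Reconstructed, the integral has a simple closed form:

Pr(x^{*} = k \mid x_{1 \ldots I}) = \int Pr(x^{*} = k \mid \lambda_{1 \ldots K}) \, Pr(\lambda_{1 \ldots K} \mid x_{1 \ldots I}) \, d\lambda_{1 \ldots K} = \frac{N_k + \alpha_k}{\sum_{m=1}^{K} (N_m + \alpha_m)}

A minimal Python sketch under these reconstructed formulas; the counts and the uniform prior are hypothetical, chosen only for illustration.

import numpy as np

counts = np.array([3.0, 0.0, 1.0, 6.0])   # hypothetical N_k for K = 4 outcomes
alphas = np.ones_like(counts)             # uniform Dirichlet prior, alpha_k = 1

ml = counts / counts.sum()                                      # maximum likelihood
map_est = (counts + alphas - 1) / (counts + alphas - 1).sum()   # MAP: equals ML when alpha_k = 1
bayes = (counts + alphas) / (counts + alphas).sum()             # Bayesian predictive

print(ml)
print(map_est)
print(bayes)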

ML / MAP vs. Bayesian

[Figure: the MAP/ML predictive density compared with the Bayesian predictive density.]

Conclusion


• Three ways to fit probability distributions
  • Maximum likelihood
  • Maximum a posteriori
  • Bayesian approach

• Two worked examples
  • Normal distribution (ML gives least squares)
  • Categorical distribution
