
NASSP Masters 5003F - Computational Astronomy - 2010

Lecture 6

• Objective functions for model fitting:

– Sum of squared residuals (=> the ‘method of least squares’).

– Likelihood

• Hypothesis testing


Model fitting – reminder of the terminology:

• We have data $y_i$ at samples of some independent variable $x_i$.

• The model is our estimate of the parent or truth function.

• Let’s express the model $m(x_i)$ as a function of a few parameters $\theta_1, \theta_2, \ldots, \theta_M$.

• Finding the ‘best fit’ model then just means finding best estimates of the $\boldsymbol{\theta}$. (Bold is shorthand for the list of parameters.)

• Knowledge of physics informs choice of m, θ.

The parent function – what we’d like to find out (but never can, exactly).


Naive best fit calculation:

• The residuals for a particular model are $y_i - m_i$.

• To ‘thread the model through the middle of the noise’, we want the magnitudes of all residuals to be small.

• A reasonable way (not the only way) to achieve this is to define a sum of squared residuals as our objective function:

$U_{ls} = \sum_{i=1}^{N} (y_i - m_i)^2$

• Fitting by minimizing this objective function is called the method of least squares. It is extremely common.

• NOTE! This approach IGNORES possible uncertainties in x.

But what if the noise is not homogeneous?

• Some bits clearly have higher σ than others.

• Answer: weight by $1/\sigma_i^2$:

$U = \sum_{i=1}^{N} \frac{(y_i - m_i)^2}{\sigma_i^2}$

• This form of U is sometimes called $\chi^2$ (pronounced ‘kai squared’).

• To use it, we need to know the $\sigma_i$.
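As a minimal sketch of the two objective functions just described, the following Python uses only the standard library. The function names and the three-point data set are made up for illustration and are not part of the lecture.

```python
def sum_squared_residuals(y, m):
    """Unweighted least-squares objective: sum over i of (y_i - m_i)^2."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, m))


def chi_squared(y, m, sigma):
    """Variance-weighted objective: sum over i of (y_i - m_i)^2 / sigma_i^2."""
    return sum(((yi - mi) / si) ** 2 for yi, mi, si in zip(y, m, sigma))


# Made-up data, model values and uncertainties (illustration only).
y = [1.0, 2.0, 4.0]
m = [1.0, 3.0, 3.0]
sigma = [1.0, 2.0, 0.5]

u_ls = sum_squared_residuals(y, m)   # residuals 0, -1, +1
u_chi2 = chi_squared(y, m, sigma)    # the low-sigma third point dominates
```

The second and third points have residuals of the same size, but the weighting makes the point with the smaller σ contribute far more to χ².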



Simple example: $m_i = \theta_1 + \theta_2 s_i$

[Figure panels: the model (red is $s_i$, green the flat background); the data $y_i$; contour map of $U_{ls}$, with the truth values marked.]

An even simpler example:

• Last lecture, I noted that there do exist cases in which we can directly invert

$\nabla U = 0$

• For least squares, this happens if the model is a linear function of the parameters $\theta_i$ (a polynomial in x is one such model).

• Expansion of grad U in this case gives a set of M linear equations in the M parameters called the normal equations.

• It is easy to solve these to get the $\theta_i$.

Simplest example of all: fitting a straight line.

• Called linear regression by the statisticians.

• There is a huge amount of literature on it.

• Normal equations for a line model $m_i = \theta_1 + \theta_2 x_i$ (intercept $\theta_1$, slope $\theta_2$) turn out to be:

$\theta_1 N + \theta_2 \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i$

$\theta_1 \sum_{i=1}^{N} x_i + \theta_2 \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i$

• Polynomial is an easy extension to this.
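The normal equations for a straight line are a 2×2 linear system, so they can be solved in closed form. The sketch below assumes the unweighted case (all σ equal) and the parametrisation intercept + slope·x. The function name `fit_line` and the test data are illustrative only.

```python
def fit_line(x, y):
    """Solve the unweighted normal equations for m_i = intercept + slope * x_i.

    Returns (intercept, slope).
    """
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # Determinant of the 2x2 normal-equation matrix [[n, sx], [sx, sxx]].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")

    slope = (n * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return intercept, slope


# Noise-free data lying exactly on y = 1 + 2x, so the fit should recover it.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(x, y)
```

For a weighted fit, each sum would carry a factor of 1/σ_i² and N would become the sum of the weights.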


$\chi^2$ for Poisson data – possible, but problematic.

• Choose data $y_i$ as estimator for $\sigma_i^2$?

– No - can have zero values in denominator.

• Choose (evolving) model as estimator for $\sigma_i^2$?

– No - gives a biased result.

• Better: Mighell formula

• Unbiased, but no good for goodness-of-fit.

– Use Mighell to fit θ, then standard U for “goodness of fit” (GOF).

Mighell K J, Ap J 518, 380 (1999)
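The Mighell formula itself did not survive in this text. The form usually quoted from the cited paper is $\chi^2_\gamma = \sum_i [y_i + \min(y_i, 1) - m_i]^2 / (y_i + 1)$. Treat that expression as an assumption to check against Mighell (1999). The sketch below implements it as stated here, with a made-up function name and data.

```python
def chi_squared_gamma(y, m):
    """Mighell-style statistic for Poisson counts y and model values m.

    Assumed form (to be checked against Mighell 1999):
        sum over i of [y_i + min(y_i, 1) - m_i]^2 / (y_i + 1)
    The denominator y_i + 1 is never zero, even for empty bins.
    """
    return sum((yi + min(yi, 1) - mi) ** 2 / (yi + 1) for yi, mi in zip(y, m))


# Made-up counts (including an empty bin) and model values.
counts = [0, 1, 4]
model = [1.0, 1.0, 4.0]
u_mighell = chi_squared_gamma(counts, model)
```

Unlike a data-weighted χ², this statistic stays finite when a bin contains zero counts.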

Another choice of U: likelihood.

• Likelihood is best illustrated by Poisson data.

• Consider a single Poisson random variable y: its PDF is

$p(y|m) = \frac{m^y e^{-m}}{y!}$

where m here plays the role of the expectation value of y.

• We’re used to thinking of this as a function just of one variable, i.e. y;

– but it is really a function of both y and m.



PDF for y vs likelihood for θ.

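Probability $p(y|\theta) = \theta^y e^{-\theta} / y!$ (viewed as a function of y at fixed θ)

Likelihood $p(y|\theta) = \theta^y e^{-\theta} / y!$ (the same expression, viewed as a function of θ at fixed y)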

The likelihood function.

• Before, we thought “given m, let us apply the PDF to obtain the probability of getting between y and y+dy.”

• Now we are saying “well we know y, we just measured it. We don’t know m. But surely the PDF taken as a function of m indicates the probability density for m.”

• Problems with this:

– Likelihood function is not necessarily normalized, like a ‘proper’ PDF;

– What assurance do we have that the true PDF for m has this shape??

Likelihood continued.

• Usually we have many (N) samples $y_i$. Can we arrive at a single likelihood for all samples taken together?

$p_{total} = \prod_{i=1}^{N} p(y_i|m_i)$

• (Note that we’ve stopped talking just about Poisson data now – this expression is valid for any form of p.)

• Sometimes easier to deal with the log-likelihood L:

$L_{total} = \sum_{i=1}^{N} L(y_i|m_i)$, where $L(y_i|m_i) = \ln p(y_i|m_i)$.
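For Poisson data the joint log-likelihood can be written down directly. The sketch below assumes independent samples and strictly positive model values. The function names and the numbers are illustrative. `math.lgamma(y + 1)` supplies ln(y!).

```python
import math


def poisson_log_likelihood(y, m):
    """ln p(y|m) = y ln(m) - m - ln(y!) for one count y and expectation m > 0."""
    return y * math.log(m) - m - math.lgamma(y + 1)


def total_log_likelihood(ys, ms):
    """Joint log-likelihood: the sum of the per-sample log-likelihoods."""
    return sum(poisson_log_likelihood(y, m) for y, m in zip(ys, ms))


# Made-up counts and model expectation values.
counts = [2, 0, 5]
model = [1.5, 0.5, 4.0]
log_l_total = total_log_likelihood(counts, model)
```

Working with the sum of logarithms rather than the product of probabilities avoids numerical underflow when N is large.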

Likelihood continued

• To get the best-fit model m, we need to maximize the likelihood (or equivalently, the log likelihood).

• If we want an objective function to minimize, it is convenient to choose –L.

• Can show that for Gaussian data, minimizing –L is equivalent to minimizing the variance-weighted sum of squared residuals (= chi squared) given before.

– Proof left as an exercise!
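For readers who want to check their answer to the exercise, here is one possible outline. It assumes independent Gaussian samples whose $\sigma_i$ are known and do not depend on the parameters θ.

```latex
% Gaussian PDF for one sample:
p(y_i | m_i) = \frac{1}{\sigma_i \sqrt{2\pi}}
               \exp\!\left[ -\frac{(y_i - m_i)^2}{2 \sigma_i^2} \right]

% Negative joint log-likelihood:
-L_{total} = -\sum_{i=1}^{N} \ln p(y_i | m_i)
           = \frac{1}{2} \sum_{i=1}^{N} \frac{(y_i - m_i)^2}{\sigma_i^2}
             + \sum_{i=1}^{N} \ln\!\left( \sigma_i \sqrt{2\pi} \right)
           = \frac{1}{2} \chi^2 + \mathrm{constant}
```

The second sum does not depend on θ, so the θ that minimizes –L is the same θ that minimizes chi squared.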



Poissonian/likelihood version of slide 3

[Figure panels: the model (red is $s_i$, green the flat background); the data $y_i$; map of the joint likelihood L.]

What if also errors in $x_i$?

• Tricky… Bayes better in this case.


What next?

• In fitting a model, we want (amplifying a bit on lecture 4):

1. The best fit values of the parameters;

2. Then we want to know if these values are good enough!

• If not: need to go back to the drawing board and choose a new model.

3. If the model passes, want uncertainties in the best-fit parameters.

• (I’ll put this off to a later lecture…)

• Number 1 is accomplished. √


How to tell if our model is correct.

• Supposing our model is absolutely accurate.

• The U value we calculate is, nevertheless, a random variable: each fresh set of data will give rise to a slightly different value of U.

• In other words, U, even in the case of a perfectly accurate model, will have some spread – in fact, like any other random variable, it will have a PDF.

– This PDF is sometimes calculable from first principles (if not, one can do a Monte Carlo to estimate it).
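The Monte Carlo route can be sketched in a few lines. This toy version assumes a perfect model with no fitted parameters (the truth is taken to be zero everywhere), Gaussian noise of known σ, and the variance-weighted U. The function name, seed and sample sizes are arbitrary choices for illustration.

```python
import random


def simulate_u(n_points, n_trials, sigma=1.0, seed=0):
    """Draw n_trials fresh data sets and return the U value of each.

    Assumes the model equals the truth (zero here), so every residual is
    pure Gaussian noise with standard deviation sigma.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        u = sum((rng.gauss(0.0, sigma) / sigma) ** 2 for _ in range(n_points))
        values.append(u)
    return values


u_values = simulate_u(n_points=10, n_trials=5000)
mean_u = sum(u_values) / len(u_values)

# Fraction of simulated U values at or above some measured value:
# an empirical estimate of how often a perfect model does this badly.
u_measured = 18.0
fraction_above = sum(u >= u_measured for u in u_values) / len(u_values)
```

A histogram of `u_values` approximates the perfect-model PDF of U. With real fitting, each simulated data set would also be refitted before U is computed.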


How to tell if our model is correct.

• The procedure is:

– First calculate the PDF for U in the ‘perfect fit’ case;

– From this curve, obtain the value of the PDF at our best-fit value of U;

– If $p(U_{best\,fit})$ is very small, it is unlikely that our model is correct.

– Note that $\chi^2$ cannot be negative; nor can –L for discrete data such as Poisson counts, where every p ≤ 1.

– A model which is a less than ideal match to the truth function will always generate U values with a PDF displaced to higher values of U.


Perfect vs. imperfect p(U):


A perfect model gives this shape PDF

PDF for imperfect model is ALWAYS displaced to higher U.

Goodness of model continued

• Because plausible Us are >= 0, and because an imperfect model always gives higher U, we prefer to

– generate the survival function for the perfect model;

– that tells us the probability of a perfect model giving us the measured value of U or higher.

• This procedure is called hypothesis testing.

• Because we make the hypothesis:

– “Suppose our model is correct. What sort of U value should we expect to find?”

• We’ll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.


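Perfect-model p(U)s:

• If we use the variance-weighted least-squares U (also known as $\chi^2$), this is easy, because p(U) is known for this:

$p(U) = \frac{U^{\nu/2 - 1} \exp(-U/2)}{2^{\nu/2}\, \Gamma(\nu/2)}$

where

– Γ is the gamma function

$\Gamma(x) = \int_0^\infty dt\, t^{x-1} e^{-t}$

– and ν is called the degrees of freedom.

• Note: the PDF has a peak at U ~ ν (the mean is exactly ν; the maximum is at ν − 2 for ν ≥ 2).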

What are degrees of freedom?

• The easiest way to illustrate what degrees of freedom are is to try fitting a polynomial of higher and higher order to a set of noisy data.

• The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals ($\chi^2$) will be, until…

• when M=N (i.e. the number of parameters, polynomial orders in this case, equals the number of data points), the model will go through every point exactly. $\chi^2$ will equal 0.


Degrees of freedom

• Defined as N-M: number of data points minus number of parameters fitted.

• It is sometimes convenient to define a reduced chi squared

$\chi^2_{reduced} = \frac{1}{N - M} \chi^2$

– PDF for $\chi^2_{reduced}$ should of course peak at about 1.

– There is no advantage in using this for minimization rather than the ‘raw’ $\chi^2$.
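A small sketch of the reduced chi squared, with a guard for the case M ≥ N where no degrees of freedom remain. The function name and the numbers are illustrative only.

```python
def reduced_chi_squared(chi2, n_points, n_params):
    """Return chi2 / (N - M), the chi squared per degree of freedom."""
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi2 / dof


# Example: 20 data points, a 2-parameter model, and a raw chi squared of 18.
chi2_red = reduced_chi_squared(18.0, n_points=20, n_params=2)
```

Here there are 18 degrees of freedom, so a raw value of 18 corresponds to a reduced value of 1, about what a good model is expected to give.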

‘Survival function’ for U.

• Remember the survival function of a PDF is defined as

$P(x > x_0) = \int_{x_0}^{\infty} dx\, p(x)$

• For $\chi^2$ this is

$P(>U) = 1 - \frac{\Gamma(\nu/2,\, U/2)}{\Gamma(\nu/2)}$

• where Γ written with 2 arguments like this is called the incomplete gamma function:

$\Gamma(a, x) = \int_0^x dt\, t^{a-1} e^{-t}$
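The survival function can be evaluated with the standard library alone by summing the power series for the incomplete gamma function. This is a sketch and not a production routine: the series converges slowly for large x, and the final subtraction loses precision when the survival probability is tiny. The function names are made up. In practice a library routine such as `scipy.stats.chi2.sf` would normally be used.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-12, max_terms=10000):
    """Gamma(a, x) / Gamma(a), with Gamma(a, x) the integral from 0 to x.

    Uses the series  Gamma(a, x) = x^a e^(-x) * sum_n x^n / (a (a+1) ... (a+n)).
    """
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= x / (a + n)
        total += term
    # Prefactor x^a e^(-x) / Gamma(a), computed in logs to avoid overflow.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_survival(u, dof):
    """Probability that a perfect model gives a chi squared of u or higher."""
    return 1.0 - regularized_lower_gamma(dof / 2.0, u / 2.0)


# Example: chi squared of 18 with 18 degrees of freedom.
p_value = chi2_survival(18.0, 18)
```

For ν = 2 the result reduces to exp(−U/2), which gives a convenient check on the implementation.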
